How to make numba @jit use all CPU cores (parallelize numba @jit)


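I am using numba's @jit decorator to add two numpy arrays in Python. With @jit the performance is much higher than with plain Python.

However, it does not use all CPU cores, even when I pass @numba.jit(nopython=True, parallel=True, nogil=True).

Is there a way to make numba @jit use all CPU cores?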


import time
import numpy as np
import numba

SIZE = 2147483648 * 6

a = np.full(SIZE, 1, dtype=np.int32)
b = np.full(SIZE, 1, dtype=np.int32)
c = np.ndarray(SIZE, dtype=np.int32)

@numba.jit(nopython=True, parallel=True, nogil=True)
def add(a, b, c):
    for i in range(SIZE):
        c[i] = a[i] + b[i]

start = time.time()
add(a, b, c)
end = time.time()

print(end - start)
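
You can pass parallel=True to any numba jitted function, but that does not mean it will always use all cores. numba relies on heuristics to decide what can run in parallel, and sometimes those heuristics find nothing in the code to parallelize. There is currently a pull request that makes numba issue a warning when it could not make the code "parallel". So the parameter means "please execute in parallel if possible", not "enforce parallel execution".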

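However, if you know for certain that your code can be parallelized, you can always use threads or processes manually. The following simply adapts the example of using multi-threading from the numba docs: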

#!/usr/bin/env python
from __future__ import print_function,division,absolute_import

import math
import threading
from timeit import repeat

import numpy as np
from numba import jit

nthreads = 4
size = 10**7  # CHANGED

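# CHANGED
def func_np(a, b):
    """Control function using Numpy."""
    return a + b

# CHANGED
@jit('void(double[:], double[:], double[:])', nopython=True, nogil=True)
def inner_func_nb(result, a, b):
    """Function under test."""
    for i in range(len(result)):
        result[i] = a[i] + b[i]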

def timefunc(correct, s, func, *args, **kwargs):
    """Benchmark *func* and print out its runtime."""
    print(s.ljust(20), end=" ")
    # Make sure the function is compiled before we start the benchmark
    res = func(*args, **kwargs)
    if correct is not None:
        assert np.allclose(res, correct), (res, correct)
    # time it
    print('{:>5.0f} ms'.format(min(repeat(lambda: func(*args, **kwargs),
                                          number=5, repeat=2)) * 1000))
    return res

def make_singlethread(inner_func):
    """Run the given function inside a single thread."""
    def func(*args):
        length = len(args[0])
        result = np.empty(length, dtype=np.float64)
        # Run the kernel once over the whole arrays, writing into result
        inner_func(result, *args)
        return result
    return func

def make_multithread(inner_func, numthreads):
    """
    Run the given function inside *numthreads* threads, splitting its
    arguments into equal-sized chunks.
    """
    def func_mt(*args):
        length = len(args[0])
        result = np.empty(length, dtype=np.float64)
        args = (result,) + args
        chunklen = (length + numthreads - 1) // numthreads
        # Create argument tuples for each input chunk
        chunks = [[arg[i * chunklen:(i + 1) * chunklen] for arg in args]
                  for i in range(numthreads)]
        # Spawn one thread per chunk
        threads = [threading.Thread(target=inner_func, args=chunk)
                   for chunk in chunks]
        for thread in threads:
            thread.start()
        # Wait for every chunk to finish before returning the result
        for thread in threads:
            thread.join()
        return result
    return func_mt

func_nb = make_singlethread(inner_func_nb)
func_nb_mt = make_multithread(inner_func_nb,nthreads)

a = np.random.rand(size)
b = np.random.rand(size)

correct = timefunc(None, "numpy (1 thread)", func_np, a, b)
timefunc(correct, "numba (1 thread)", func_nb, a, b)
timefunc(correct, "numba (%d threads)" % nthreads, func_nb_mt, a, b)


numpy (1 thread)       539 ms
numba (1 thread)       536 ms
numba (4 threads)      442 ms
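
The chunking step inside make_multithread can be looked at in isolation. The sketch below reuses its ceiling-division slicing. The names split and add_chunk are hypothetical, and np.add stands in for the jitted kernel so that the snippet runs without numba. Because basic slices of a NumPy array are views, each thread writes straight into the shared result array.

```python
import threading

import numpy as np


def add_chunk(result, a, b):
    # out= writes into the view, and therefore into the parent array
    np.add(a, b, out=result)


def split(arrays, numthreads):
    length = len(arrays[0])
    chunklen = (length + numthreads - 1) // numthreads  # ceiling division
    return [[arr[i * chunklen:(i + 1) * chunklen] for arr in arrays]
            for i in range(numthreads)]


a = np.arange(10, dtype=np.float64)
b = np.ones(10, dtype=np.float64)
result = np.empty(10, dtype=np.float64)

chunks = split((result, a, b), 4)
chunk_lengths = [len(chunk[0]) for chunk in chunks]

threads = [threading.Thread(target=add_chunk, args=chunk) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(chunk_lengths)  # the last chunk is shorter when the length is not divisible
print(result)
```

With 10 elements and 4 threads the chunk length is 3, so the last thread only gets the single leftover element.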



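You will only see a big improvement from parallel execution when the function is more complex and the actual computation takes significant time compared with loading and storing the array elements. The example in the numba docs is one of those: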

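def func_np(a, b):
    """Control function using Numpy."""
    return np.exp(2.1 * a + 3.2 * b)

@jit('void(double[:], double[:], double[:])', nopython=True, nogil=True)
def inner_func_nb(result, a, b):
    """Function under test."""
    for i in range(len(result)):
        result[i] = math.exp(2.1 * a[i] + 3.2 * b[i])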


func_nb = make_singlethread(inner_func_nb)
func_nb_mt2 = make_multithread(inner_func_nb,2)
func_nb_mt3 = make_multithread(inner_func_nb,3)
func_nb_mt4 = make_multithread(inner_func_nb,4)

a = np.random.rand(size)
b = np.random.rand(size)

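correct = timefunc(None, "numpy (1 thread)", func_np, a, b)
timefunc(correct, "numba (1 thread)", func_nb, a, b)
timefunc(correct, "numba (2 threads)", func_nb_mt2, a, b)
timefunc(correct, "numba (3 threads)", func_nb_mt3, a, b)
timefunc(correct, "numba (4 threads)", func_nb_mt4, a, b)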


numpy (1 thread)      3422 ms
numba (1 thread)      2959 ms
numba (2 threads)     1555 ms
numba (3 threads)     1080 ms
numba (4 threads)      797 ms
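
To compare the two benchmarks, the timings above can be turned into speedups relative to the single-threaded numba run. This is only a small arithmetic sketch. The numbers are copied from the outputs above, and the helper name speedups is hypothetical.

```python
# Timings in milliseconds, copied from the two benchmark outputs above.
exp_ms = {1: 2959, 2: 1555, 3: 1080, 4: 797}  # math.exp kernel
add_ms = {1: 536, 4: 442}                     # plain addition kernel


def speedups(timings_ms):
    """Return {threads: speedup relative to the 1-thread numba run}."""
    baseline = timings_ms[1]
    return {n: baseline / ms for n, ms in timings_ms.items()}


exp_speedup = speedups(exp_ms)
add_speedup = speedups(add_ms)

for n, s in exp_speedup.items():
    print(f"exp, {n} thread(s): {s:.2f}x speedup, {s / n:.0%} efficiency")
print(f"add, 4 threads: {add_speedup[4]:.2f}x speedup")
```

With these numbers the exp kernel reaches roughly 3.7x on four threads, while plain addition stays near 1.2x.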
