python - Cython function pointer dereference time (compared with calling the function directly)

I have some Cython code that performs extremely repetitive pixel operations on NumPy arrays (representing BGR images), of the following form:

cimport cython
import numpy as np
cimport numpy as cnp

ctypedef double (*blend_type)(double, double)  # function pointer

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
cdef cnp.ndarray[cnp.float64_t, ndim=3] blend_it(const double[:, :, :] array_1, const double[:, :, :] array_2, const blend_type blendfunc, const double opacity):
  # the base layer is a (array_1)
  # the blend layer is b (array_2)
  # base layer is below blend layer
  cdef Py_ssize_t y_len = array_1.shape[0]
  cdef Py_ssize_t x_len = array_1.shape[1]
  cdef Py_ssize_t a_channels = array_1.shape[2]
  cdef Py_ssize_t b_channels = array_2.shape[2]
  cdef cnp.ndarray[cnp.float64_t, ndim=3] result = np.zeros((y_len, x_len, a_channels), dtype=np.float64)
  cdef double[:, :, :] result_view = result  # writable 3D view onto result
  cdef Py_ssize_t x, y, c
  cdef double a, b

  for y in range(y_len):
    for x in range(x_len):
      for c in range(3):  # iterate over BGR channels first
        # calculate channel values via blend mode
        a = array_1[y, x, c]
        b = array_2[y, x, c]
        result_view[y, x, c] = blendfunc(a, b)
        # many other operations involving result_view...
  return result

Here blendfunc refers to another Cython function, such as the following overlay_pix:

cdef double overlay_pix(double a,double b):
  if a < 0.5:
    return 2*a*b
  else:
    return 1 - 2*(1 - a)*(1 - b)
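
As a quick sanity check of the overlay formula, here is a minimal pure-Python sketch. It is not part of the original code: `overlay_pix_py` is a hypothetical name, and the inputs are assumed to be normalized to the range [0, 1]. It illustrates that the two branches agree at a = 0.5, so the blend has no jump at the threshold.

```python
import math

def overlay_pix_py(a, b):
    """Pure-Python mirror of the overlay_pix formula; assumes a and b lie in [0, 1]."""
    if a < 0.5:
        return 2 * a * b                  # dark base: multiply-like branch
    return 1 - 2 * (1 - a) * (1 - b)      # light base: screen-like branch

# Just below the threshold and exactly at it, the result is (approximately) b.
b = 0.3
below = overlay_pix_py(0.5 - 1e-12, b)
at = overlay_pix_py(0.5, b)
print(below, at)
```

Because this runs in the interpreter, it is only useful for checking values, not for timing.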

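The point of using a function pointer is to avoid rewriting a huge block of repetitive code over and over for each blend mode (and there are a lot of them). So I created an interface like this for each blend mode, which saves me the trouble: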

def overlay(double[:, :, :] array_1, double[:, :, :] array_2, double opacity = 1.0):
  return blend_it(array_1,array_2,overlay_pix,opacity)

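However, this seems to cost me some time! I've noticed that for very large images (for example, 8K images and larger), a substantial amount of time is lost when blend_it calls blendfunc instead of calling overlay_pix directly. I assume this is because blend_it has to dereference the function pointer on every iteration instead of having the function immediately available, but I'm not sure.

The time loss is not ideal, but I certainly don't want to rewrite blend_it over and over for each blend mode. Is there any way to avoid the time loss? Is there some way to turn the function pointer into a local function outside the loop and then access it faster inside the loop?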

Best answer

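@ead's answer says two things:

> C may be able to optimize it to a direct call. Except in fairly simple cases, I don't think this is generally true, and it doesn't appear to be true for the compiler and code the OP is using.
> In C++ you would use templates instead. This is indeed true, and because template types are always known at compile time, optimization is usually easy.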

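Cython and C++ templates are a bit of a mess together, so I don't think you want to use them here. However, Cython does have a template-like feature called fused types. You can use fused types to get the compile-time optimization, as demonstrated below. The rough outline of the code is:

> Define a cdef class containing a staticmethod cdef function for each of the operations you want to perform.
> Define a fused type containing all of the cdef classes. (This is the limitation of this approach: it isn't easily extensible, so if you want to add operations you have to edit the code.)
> Define a function that takes a dummy argument of your fused type. Use this type to call the staticmethod.
> Define wrapper functions. You need to use the explicit [type] syntax to get this to work.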

Code:

import cython

cdef class Plus:
    @staticmethod
    cdef double func(double x):
        return x+1    

cdef class Minus:
    @staticmethod
    cdef double func(double x):
        return x-1

ctypedef fused pick_func:
    Plus
    Minus

cdef run_func(double [::1] x,pick_func dummy):
    cdef int i
    with cython.boundscheck(False),cython.wraparound(False):
        for i in range(x.shape[0]):
            x[i] = cython.typeof(dummy).func(x[i])
    return x.base

def run_func_plus(x):
    return run_func[Plus](x,Plus())

def run_func_minus(x):
    return run_func[Minus](x,Minus())

For comparison, the equivalent code using function pointers is

cdef double add_one(double x):
    return x+1

cdef double minus_one(double x):
    return x-1

cdef run_func_ptr(double [::1] x,double (*f)(double)):
    cdef int i
    with cython.boundscheck(False),cython.wraparound(False):
        for i in range(x.shape[0]):
            x[i] = f(x[i])
    return x.base

def run_func_ptr_plus(x):
    return run_func_ptr(x,add_one)

def run_func_ptr_minus(x):
    return run_func_ptr(x,minus_one)

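Comparing the two with timeit, I get about a 2.5x speed-up for the fused-type version over the function-pointer version. This suggests that the function pointers aren't being optimized away for me (although I haven't tried changing compiler settings to improve this).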

import numpy as np
import example

# show the two methods give the same answer
print(example.run_func_plus(np.ones((10,))))
print(example.run_func_minus(np.ones((10,))))

print(example.run_func_ptr_plus(np.ones((10,))))
print(example.run_func_ptr_minus(np.ones((10,))))

from timeit import timeit

# timing comparison
print(timeit("""run_func_plus(x)""","""from example import run_func_plus
from numpy import zeros
x = zeros((10000,))
""",number=10000))

print(timeit("""run_func_ptr_plus(x)""","""from example import run_func_ptr_plus
from numpy import zeros
x = zeros((10000,))
""",number=10000))
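
The speed-up quoted above is a ratio of two timeit measurements. Below is a minimal, self-contained sketch of how such a ratio could be computed. It uses hypothetical pure-Python stand-ins (`run_direct`, `run_indirect`, `best_time`) rather than the compiled `example` module, so the numbers it prints say nothing about the Cython code; it only illustrates the measurement pattern of taking the minimum over several rounds.

```python
from timeit import repeat

def add_one(x):
    return x + 1

def run_direct(values):
    # The operation is written straight into the loop body.
    return [v + 1 for v in values]

def run_indirect(values, func):
    # The operation is passed in as an argument, like the function-pointer version.
    return [func(v) for v in values]

def best_time(stmt, number=50, rounds=3):
    # Take the minimum over several rounds to reduce noise from other processes.
    return min(repeat(stmt, repeat=rounds, number=number))

values = [0.0] * 10_000
t_direct = best_time(lambda: run_direct(values))
t_indirect = best_time(lambda: run_indirect(values, add_one))
print(f"direct: {t_direct:.4f}s  indirect: {t_indirect:.4f}s  "
      f"ratio: {t_indirect / t_direct:.2f}x")
```

To measure the real thing, the two lambdas would be replaced by calls to `run_func_plus` and `run_func_ptr_plus` from the compiled module.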
