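I need to generate a one-dimensional array in which repeated sequences of integers are separated by a random number of zeros.
So far I am using the following code: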

    from random import normalvariate
    import numpy as np

    # np.int was removed in NumPy 1.24; the builtin int is equivalent here
    regular_sequence = np.array([1, 2, 3, 4, 5], dtype=int)
    n_iter = 10
    lag_mean = 10  # mean length of zeros sequence
    lag_sd = 1     # standard deviation of zeros sequence length

    # Sequence of lag lengths
    lag_seq = [int(round(normalvariate(lag_mean, lag_sd))) for x in range(n_iter)]

    # Generate list of concatenated zeros and regular sequences
    seq = [np.concatenate((np.zeros(x, dtype=int), regular_sequence)) for x in lag_seq]
    seq = np.concatenate(seq)

It works, but it looks very slow when I need many long sequences. So, how can I optimize it?

Solution
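You can pre-compute the indices where the repeated regular_sequence elements are to be placed, and then set those indices with regular_sequence in a vectorized manner. To pre-compute those indices, you can use np.cumsum to get the start of each regular_sequence chunk, and then add a contiguous range of integers spanning the size of regular_sequence to get all the indices that are to be updated. The implementation would look something like this: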

    # Size of regular_sequence
    N = regular_sequence.size

    # Use cumsum to pre-compute the start of every occurrence of regular_sequence
    offset_arr = np.cumsum(lag_seq)
    idx = np.arange(offset_arr.size)*N + offset_arr

    # Set up output array
    out = np.zeros(idx.max() + N, dtype=regular_sequence.dtype)

    # Broadcast the start indices to include the entire length of regular_sequence
    # to get all positions where regular_sequence elements are to be set
    np.put(out, idx[:,None] + np.arange(N), regular_sequence)

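
To make the index arithmetic concrete, here is a minimal sketch that runs the same steps on tiny inputs. The values of regular_sequence and lag_seq, and the name all_idx, are made up for illustration and are not part of the original post.

```python
import numpy as np

# Hypothetical small inputs, chosen only so the result is easy to read
regular_sequence = np.array([1, 2, 3])
lag_seq = [2, 3]

N = regular_sequence.size

# Cumulative count of zeros that precede each block
offset_arr = np.cumsum(lag_seq)

# Start of block k = zeros so far + k earlier blocks of length N
idx = np.arange(offset_arr.size) * N + offset_arr

# One row of target positions per block, via broadcasting
all_idx = idx[:, None] + np.arange(N)

out = np.zeros(idx.max() + N, dtype=regular_sequence.dtype)

# np.put flattens all_idx and cycles through regular_sequence as needed
np.put(out, all_idx, regular_sequence)

print(idx)      # [2 8]
print(all_idx)  # [[ 2  3  4]
                #  [ 8  9 10]]
print(out)      # [0 0 1 2 3 0 0 0 1 2 3]
```

Since np.put repeats the values when there are fewer of them than indices, a single copy of regular_sequence is enough to fill every block.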
Runtime test:

    def original_app(lag_seq, regular_sequence):
        seq = [np.concatenate((np.zeros(x, dtype=int), regular_sequence)) for x in lag_seq]
        return np.concatenate(seq)

    def vectorized_app(lag_seq, regular_sequence):
        N = regular_sequence.size
        offset_arr = np.cumsum(lag_seq)
        idx = np.arange(offset_arr.size)*N + offset_arr
        out = np.zeros(idx.max() + N, dtype=regular_sequence.dtype)
        np.put(out, idx[:,None] + np.arange(N), regular_sequence)
        return out

    In [64]: # Setup inputs
        ...: regular_sequence = np.array([1,2,3,4,5], dtype=int)
        ...: n_iter = 1000
        ...: lag_mean = 10 # mean length of zeros sequence
        ...: lag_sd = 1 # standard deviation of zeros sequence length
        ...:
        ...: # Sequence of lag lengths
        ...: lag_seq = [int(round(normalvariate(lag_mean,lag_sd))) for x in range(n_iter)]
        ...:

    In [65]: out1 = original_app(lag_seq, regular_sequence)

    In [66]: out2 = vectorized_app(lag_seq, regular_sequence)

    In [67]: %timeit original_app(lag_seq, regular_sequence)
    100 loops, best of 3: 4.28 ms per loop

    In [68]: %timeit vectorized_app(lag_seq, regular_sequence)
    1000 loops, best of 3: 294 µs per loop

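
The session computes out1 and out2 but does not compare them, so a quick equivalence check may be worth adding before trusting the timings. The following is a self-contained sketch. The fixed seed and the variable name same are assumptions added here for reproducibility, and it assumes every lag is non-negative, which is effectively always the case with a mean of 10 and a standard deviation of 1.

```python
from random import normalvariate, seed

import numpy as np


def original_app(lag_seq, regular_sequence):
    # Loop-based version: build each zeros + sequence chunk, then join them
    seq = [np.concatenate((np.zeros(x, dtype=int), regular_sequence)) for x in lag_seq]
    return np.concatenate(seq)


def vectorized_app(lag_seq, regular_sequence):
    # Index-based version: compute every target position, then fill in one call
    N = regular_sequence.size
    offset_arr = np.cumsum(lag_seq)
    idx = np.arange(offset_arr.size) * N + offset_arr
    out = np.zeros(idx.max() + N, dtype=regular_sequence.dtype)
    np.put(out, idx[:, None] + np.arange(N), regular_sequence)
    return out


seed(0)  # assumption: fixed seed so repeated runs use the same lags
regular_sequence = np.array([1, 2, 3, 4, 5], dtype=int)
lag_seq = [int(round(normalvariate(10, 1))) for _ in range(1000)]

out1 = original_app(lag_seq, regular_sequence)
out2 = vectorized_app(lag_seq, regular_sequence)

# Both approaches should give arrays of the same length and content
same = np.array_equal(out1, out2)
print(same)
```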