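I need to generate a 1D array in which repeated copies of an integer sequence are separated by a random number of zeros.
So far I am using the following code: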
- import numpy as np
- from random import normalvariate
- # np.int was removed in NumPy 1.24; the builtin int is the equivalent dtype
- regular_sequence = np.array([1,2,3,4,5],dtype=int)
- n_iter = 10
- lag_mean = 10 # mean length of zeros sequence
- lag_sd = 1 # standard deviation of zeros sequence length
- # Sequence of lags lengths
- lag_seq = [int(round(normalvariate(lag_mean,lag_sd))) for x in range(n_iter)]
- # Generate list of concatenated zeros and regular sequences
- seq = [np.concatenate((np.zeros(x,dtype=int),regular_sequence)) for x in lag_seq]
- seq = np.concatenate(seq)
It works, but it seems very slow when I need many long sequences. So, how can I optimize it?
Solution
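You can pre-compute the indices at which the repeated regular_sequence elements are to be placed, and then set those indices from regular_sequence in a vectorized manner. To pre-compute the indices, use np.cumsum on lag_seq to get the start of each regular_sequence block (shifted by the length of the blocks already placed before it), then add a range of consecutive integers spanning the size of regular_sequence to get every index to be updated. So the implementation would look like this: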
- # Size of regular_sequence
- N = regular_sequence.size
- # Use cumsum to pre-compute start of every occurrence of regular_sequence
- offset_arr = np.cumsum(lag_seq)
- idx = np.arange(offset_arr.size)*N + offset_arr
- # Setup output array
- out = np.zeros(idx.max() + N,dtype=regular_sequence.dtype)
- # Broadcast the start indices to include entire length of regular_sequence
- # to get all positions where regular_sequence elements are to be set
- np.put(out,idx[:,None] + np.arange(N),regular_sequence)
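To make the index arithmetic easier to follow, here is a small worked example. The inputs (a lag_seq of [2, 3] and a three-element regular_sequence) are made up for illustration and are not taken from the question; the steps are the same as in the code above.

```python
import numpy as np

# Hypothetical small inputs, chosen only so the indices are easy to follow
regular_sequence = np.array([1, 2, 3])
lag_seq = [2, 3]

N = regular_sequence.size
# Total number of zeros that precede each block: [2 5]
offset_arr = np.cumsum(lag_seq)
# Add the length of the blocks already placed to get each block's start: [2 8]
idx = np.arange(offset_arr.size) * N + offset_arr
# Broadcast the starts against 0..N-1 to get every target position:
# [[2 3 4], [8 9 10]]
positions = idx[:, None] + np.arange(N)

out = np.zeros(idx.max() + N, dtype=regular_sequence.dtype)
# np.put repeats the values as needed, so each row of positions gets 1, 2, 3
np.put(out, positions, regular_sequence)
print(out)  # [0 0 1 2 3 0 0 0 1 2 3]
```

The result matches what the concatenation approach gives for the same inputs: two zeros, the sequence, three zeros, the sequence.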
Runtime test:
- def original_app(lag_seq,regular_sequence):
-     # Loop-based approach from the question
-     seq = [np.concatenate((np.zeros(x,dtype=int),regular_sequence)) for x in lag_seq]
-     return np.concatenate(seq)
- def vectorized_app(lag_seq,regular_sequence):
-     # Index-based approach: compute all target positions, then fill them at once
-     N = regular_sequence.size
-     offset_arr = np.cumsum(lag_seq)
-     idx = np.arange(offset_arr.size)*N + offset_arr
-     out = np.zeros(idx.max() + N,dtype=regular_sequence.dtype)
-     np.put(out,idx[:,None] + np.arange(N),regular_sequence)
-     return out
- In [64]: # Setup inputs
- ...: regular_sequence = np.array([1,2,3,4,5],dtype=int)
- ...: n_iter = 1000
- ...: lag_mean = 10 # mean length of zeros sequence
- ...: lag_sd = 1 # standard deviation of zeros sequence length
- ...:
- ...: # Sequence of lags lengths
- ...: lag_seq = [int(round(normalvariate(lag_mean,lag_sd))) for x in range(n_iter)]
- ...:
- In [65]: out1 = original_app(lag_seq,regular_sequence)
- In [66]: out2 = vectorized_app(lag_seq,regular_sequence)
- In [67]: %timeit original_app(lag_seq,regular_sequence)
- 100 loops,best of 3: 4.28 ms per loop
- In [68]: %timeit vectorized_app(lag_seq,regular_sequence)
- 1000 loops,best of 3: 294 µs per loop
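The session above computes out1 and out2 but does not compare them. A minimal sketch of such a check follows, assuming the two functions as defined above. The max(0, ...) guard on the lag lengths is an addition of this sketch, not part of the original code: it only ensures np.zeros never receives a negative length, which is extremely unlikely with these parameters anyway.

```python
import numpy as np
from random import normalvariate

def original_app(lag_seq, regular_sequence):
    # Loop-based approach: build each zeros+sequence chunk, then join them
    seq = [np.concatenate((np.zeros(x, dtype=regular_sequence.dtype), regular_sequence))
           for x in lag_seq]
    return np.concatenate(seq)

def vectorized_app(lag_seq, regular_sequence):
    # Index-based approach: compute every target position, then fill in one call
    N = regular_sequence.size
    offset_arr = np.cumsum(lag_seq)
    idx = np.arange(offset_arr.size) * N + offset_arr
    out = np.zeros(idx.max() + N, dtype=regular_sequence.dtype)
    np.put(out, idx[:, None] + np.arange(N), regular_sequence)
    return out

regular_sequence = np.array([1, 2, 3, 4, 5])
# max(0, ...) is a guard added for this sketch (assumption, not in the original)
lag_seq = [max(0, int(round(normalvariate(10, 1)))) for _ in range(1000)]

out1 = original_app(lag_seq, regular_sequence)
out2 = vectorized_app(lag_seq, regular_sequence)
same = np.array_equal(out1, out2)
print(same)  # True
```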