How to access multiple columns in a rolling operator?


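I want to do some rolling-window calculations in pandas that need to work on two columns at the same time. A simple example makes the problem clear: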

    import pandas as pd

    df = pd.DataFrame({
        'x': [1,2,3,1,5,4,6,7,9],'y': [4,9,2]
    })
    windowSize = 4
    result = []
    for i in range(1, len(df) + 1):
        if i < windowSize:
            # Not enough rows yet for a full window
            result.append(None)
        else:
            x = df.x.iloc[i - windowSize:i]
            y = df.y.iloc[i - windowSize:i]
            m = y.mean()
            # Ratio of the x sum where y is above the window mean
            # to the x sum where it is not
            r = sum(x[y > m]) / sum(x[y <= m])
            result.append(r)
    print(result)

Is there a way to solve this in pandas without the for loop? Any help is appreciated.

Best answer
Here is a vectorized approach using NumPy tools:

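    import numpy as np

    windowSize = 4
    a = df.values
    # 2D views of the rolling windows over each column
    X = strided_app(a[:,0], windowSize, 1)
    Y = strided_app(a[:,1], windowSize, 1)
    M = Y.mean(1)                           # per-window mean of y
    mask = Y > M[:,None]                    # True where y exceeds its window mean
    sums = np.einsum('ij,ij->i', X, mask)   # per-window sum of x where mask is True
    rest_sums = X.sum(1) - sums             # per-window sum of x where y <= mean
    out = sums / rest_sums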

strided_app is taken from here.
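The helper strided_app is not defined in this post. A minimal sketch of what such a helper might look like is given below, assuming it takes a 1D array, a window length L and a step S, and returns a 2D view of overlapping windows. The signature is inferred from how the function is called above, and the sample array is made up for illustration.

```python
import numpy as np

def strided_app(a, L, S):
    """Return a read-only 2D view of windows of length L taken every S elements.

    Assumed signature, inferred from the calls in the answer.
    """
    nrows = (a.size - L) // S + 1
    n = a.strides[0]  # byte step between consecutive elements of `a`
    return np.lib.stride_tricks.as_strided(
        a, shape=(nrows, L), strides=(S * n, n), writeable=False)

# Hypothetical sample data, not taken from the question.
x = np.array([1, 2, 3, 1, 5, 4, 6, 7, 9])
windows = strided_app(x, 4, 1)
print(windows.shape)
print(windows[0], windows[-1])
```

Because the result is a view, no window data is copied. Using `a.strides[0]` rather than the item size means the helper also works on a column slice such as `a[:,1]` of a 2D array.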

Runtime test

Approaches:

    # @kazemakase's solution
    def rolling_window_sum(df, windowSize=4):
        rw = rolling_window(df.values.T, windowSize)
        m = np.mean(rw[1], axis=-1, keepdims=True)
        a = np.sum(rw[0] * (rw[1] > m), axis=-1)
        b = np.sum(rw[0] * (rw[1] <= m), axis=-1)
        result = a / b
        return result

    # Proposed in this post
    def strided_einsum(df, windowSize=4):
        a = df.values
        X = strided_app(a[:,0], windowSize, 1)
        Y = strided_app(a[:,1], windowSize, 1)
        M = Y.mean(1)
        mask = Y > M[:,None]
        sums = np.einsum('ij,ij->i', X, mask)
        rest_sums = X.sum(1) - sums
        out = sums / rest_sums
        return out
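The rolling_window helper used by the first approach is not shown either. One common way to write such a helper with stride tricks is sketched below. This is an assumption about its behavior (windows taken along the last axis), not necessarily the exact function the other answer used, and the input array is hypothetical.

```python
import numpy as np

def rolling_window(a, window):
    """View the last axis of `a` as overlapping windows of length `window`.

    Assumed behavior: output shape is a.shape[:-1] + (n - window + 1, window).
    """
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(
        a, shape=shape, strides=strides, writeable=False)

# Hypothetical 2-row input standing in for df.values.T
data = np.array([[1, 2, 3, 4, 5],
                 [9, 8, 7, 6, 5]])
rw = rolling_window(data, 3)
print(rw.shape)
print(rw[1])
```

With a 2-row input, `rw[0]` and `rw[1]` hold the windows of the first and second column respectively, which is how rolling_window_sum indexes the result.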

Timings:

    In [46]: df = pd.DataFrame(np.random.randint(0,(1000000,2)))

    In [47]: %timeit rolling_window_sum(df)
    10 loops, best of 3: 90.4 ms per loop

    In [48]: %timeit strided_einsum(df)
    10 loops, best of 3: 62.2 ms per loop

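To squeeze out more performance, the Y.mean(1) part, which is basically a windowed summation, can be computed with SciPy's 1D uniform filter. Thus, for windowSize = 4, M could alternatively be computed as: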

    from scipy.ndimage import uniform_filter1d as unif1d

    # Drop the edge positions whose window would extend past the array
    M = unif1d(a[:,1].astype(float), windowSize)[2:-1]

The performance improvement is noticeable:

    In [65]: %timeit strided_einsum(df)
    10 loops, best of 3: 61.5 ms per loop

    In [66]: %timeit strided_einsum_unif_filter(df)
    10 loops, best of 3: 49.4 ms per loop
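The timing above calls strided_einsum_unif_filter, whose body is not shown in the post. A minimal sketch of how it might be assembled from the pieces above follows. It assumes a 2D array with x in column 0 and y in column 1, generalizes the `[2:-1]` trim to other window sizes (the `left`/`right` arithmetic is this sketch's own), and uses randomly generated, hypothetical data.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided
from scipy.ndimage import uniform_filter1d

def strided_app(a, L, S):
    # Assumed helper: 2D view of windows of length L taken every S elements.
    nrows = (a.size - L) // S + 1
    n = a.strides[0]
    return as_strided(a, shape=(nrows, L), strides=(S * n, n), writeable=False)

def strided_einsum_unif_filter(values, windowSize=4):
    """values: 2D array, x in column 0 and y in column 1 (assumed layout)."""
    X = strided_app(values[:, 0], windowSize, 1)
    Y = strided_app(values[:, 1], windowSize, 1)
    # Windowed mean of y from the uniform filter. Keep only positions whose
    # window lies fully inside the array; for windowSize=4 this is [2:-1].
    left = windowSize // 2
    right = windowSize - 1 - left
    full = uniform_filter1d(values[:, 1].astype(float), windowSize)
    M = full[left:len(full) - right]
    mask = Y > M[:, None]
    sums = np.einsum('ij,ij->i', X, mask)
    rest_sums = X.sum(1) - sums
    return sums / rest_sums

# Hypothetical data: positive integers so the denominator is never zero.
rng = np.random.default_rng(0)
values = rng.integers(1, 10, size=(20, 2))
out = strided_einsum_unif_filter(values)
print(out.shape)  # one ratio per full window: (17,)
```

One caveat worth keeping in mind: the filter computes its means with floating-point arithmetic, so for window sizes where those means are not exactly representable, a y value equal to its window mean may land on the other side of the `>` comparison than it does with Y.mean(1).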
