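I have a pandas DataFrame with more than 400,000 rows, and I want to compute the interquartile range for each row, but my code raises the following error:
cannot do a non empty take from an empty axes
My code: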
def calIQR(x):
    x = x.dropna()
    return (np.percentile(x, 75), np.percentile(x, 25))

df["count"] = df.iloc[:, 2:64].apply(calIQR, axis=1)
I am running Python 2.7.13.
Every row contains some NaN values, but I am sure that no row consists entirely of NaN.
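Best answer
I think the problem is rows in which every value in columns 2 through 63 is NaN: for those rows x.dropna() returns an empty Series, and np.percentile fails on it.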
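A row can hold values elsewhere and still be entirely NaN inside the selected columns, so it may be worth checking the slice itself. Here is a minimal sketch on made-up data; the frame `demo`, its column names, and the slice `2:4` are illustrative assumptions (with the real data the slice would be `2:64`):

```python
import numpy as np
import pandas as pd

# Illustrative data: row 1 has values only outside the selected columns.
demo = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "v1": [0.1, np.nan, 0.3],
    "v2": [0.4, np.nan, np.nan],
})

# Look only at the columns that are passed to apply().
selected = demo.iloc[:, 2:4]

# True for rows where every selected value is NaN.
all_nan = selected.isna().all(axis=1)

# Index labels of rows that would become empty after dropna().
empty_rows = demo.index[all_nan].tolist()
print(empty_rows)
```

If the list is not empty, those are the rows on which the percentile call receives an empty Series.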
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.random((5, 5)))
df.loc[3, [3, 4]] = np.nan
df.loc[2] = np.nan
print(df)
          0         1         2         3         4
0  0.543405  0.278369  0.424518  0.844776  0.004719
1  0.121569  0.670749  0.825853  0.136707  0.575093
2       NaN       NaN       NaN       NaN       NaN
3  0.978624  0.811683  0.171941       NaN       NaN
4  0.431704  0.940030  0.817649  0.336112  0.175410
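def calIQR(x):
    x = x.dropna()
    return (np.percentile(x, 75), np.percentile(x, 25))

# with real data change 2:4 to 2:64
# dropna(how='all') removes rows that are entirely NaN in the selected columns
df["count"] = df.iloc[:, 2:4].dropna(how='all').apply(calIQR, axis=1)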
print (df)
          0         1         2         3         4  \
0  0.543405  0.278369  0.424518  0.844776  0.004719
1  0.121569  0.670749  0.825853  0.136707  0.575093
2       NaN       NaN       NaN       NaN       NaN
3  0.978624  0.811683  0.171941       NaN       NaN
4  0.431704  0.940030  0.817649  0.336112  0.175410

                             count
0  (0.739711496927,0.529582226142)
1    (0.65356621375,0.30899313104)
2                              NaN
3  (0.171941012733,0.171941012733)
4  (0.697265021613,0.456496307285)
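Alternatively, use Series.quantile, which skips NaN values, so dropna is not needed and a row that is entirely NaN yields (nan, nan):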
def calIQR(x):
    return (x.quantile(.75), x.quantile(.25))

# with real data change 2:4 to 2:64
df["count"] = df.iloc[:, 2:4].apply(calIQR, axis=1)
print (df)
          0         1         2         3         4  \
0  0.543405  0.278369  0.424518  0.844776  0.004719
1  0.121569  0.670749  0.825853  0.136707  0.575093
2       NaN       NaN       NaN       NaN       NaN
3  0.978624  0.811683  0.171941       NaN       NaN
4  0.431704  0.940030  0.817649  0.336112  0.175410

                                      count
0   (0.7397114969272109,0.5295822261418257)
1    (0.653566213750024,0.3089931310399766)
2                                 (nan,nan)
3   (0.1719410127325942,0.1719410127325942)
4  (0.6972650216127702,0.45649630728485585)
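Both versions above return the pair of quartiles rather than their difference. If a single interquartile range per row is what is needed (75th percentile minus 25th percentile), one possible sketch uses DataFrame.quantile with axis=1, which avoids a Python-level apply over 400,000 rows. The column name "iqr" is an assumption for illustration, and the slice 2:4 would again become 2:64 on the real data:

```python
import numpy as np
import pandas as pd

# Same sample frame as above.
np.random.seed(100)
df = pd.DataFrame(np.random.random((5, 5)))
df.loc[3, [3, 4]] = np.nan
df.loc[2] = np.nan

# Columns to summarise; with real data change 2:4 to 2:64.
cols = df.iloc[:, 2:4]

# Row-wise quantiles; NaN values are skipped.
# The result is indexed by the quantile (0.25, 0.75), with one column per original row.
q = cols.quantile([0.25, 0.75], axis=1)

# Interquartile range per row; rows that are entirely NaN stay NaN.
df["iqr"] = q.loc[0.75] - q.loc[0.25]
print(df["iqr"])
```

For row 3 only one non-NaN value remains in the slice, so both quartiles coincide and the range is 0.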