I'm new to pandas.
I have a dataframe that looks like this (only bigger):
        Horses  RaceDate  Position
1     RedHorse    1/2/00         2
2    BlueHorse    1/2/00         6
3  YellowHorse    1/2/00         7
4     RedHorse   15/1/00         3
I want to add columns for previous results, so that my dataframe might end up looking like this:
        Horses  RaceDate  Position  PrevPosition
1     RedHorse    1/2/00         2             3
2    BlueHorse    1/2/00         6             -
3  YellowHorse    1/2/00         7             -
4     RedHorse   15/1/00         3             -
I tried the following:
def prevRuns(horseName, raceDate):
    horseDf = df.loc[df['Horse'] == horseName]
    currentRace = horseDf.index[horseDf['RaceDate'] == raceDate]
    if len(horseDf.index) >= currentRace:
        return horseDf.at[currentRace+1, 'Position']
    else:
        return 0

df['prevRun'] = df['Horse'].apply(prevRuns, raceDate=df['RaceDate'])
But it doesn't work:
ValueError: Can only compare identically-labeled Series objects
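A minimal sketch that reproduces the error outside the function, using hypothetical sample rows and a column named 'Horse' to match the code above. It mimics what prevRuns sees on each call: the rows for one horse, plus the full RaceDate column, which apply passes through unchanged as the raceDate keyword argument.

```python
import pandas as pd

# Hypothetical sample data; the column is named 'Horse' to match the question's code
df = pd.DataFrame(
    {
        "Horse": ["RedHorse", "BlueHorse", "YellowHorse", "RedHorse"],
        "RaceDate": ["1/2/00", "1/2/00", "1/2/00", "15/1/00"],
        "Position": [2, 6, 7, 3],
    },
    index=[1, 2, 3, 4],
)

# Inside prevRuns: the rows for a single horse (index labels 1 and 4)
horseDf = df.loc[df["Horse"] == "RedHorse"]
# What apply(..., raceDate=df['RaceDate']) hands over: the whole column (labels 1-4)
raceDate = df["RaceDate"]

error_message = None
try:
    # Comparing two Series whose index labels differ raises ValueError
    horseDf["RaceDate"] == raceDate
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Comparing against a single date value instead of a whole Series would avoid this particular error.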
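Why doesn't this work?
Is there a more elegant way to accomplish what I'm trying to do?
Best answer
It fails because Series.apply forwards the keyword argument as-is: raceDate is the entire df['RaceDate'] column on every call, so horseDf['RaceDate'] == raceDate compares a filtered Series against the full one, and == between Series with different index labels raises that ValueError. A simpler approach is to use groupby with shift: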
import pandas as pd

# convert dates (day-first) to datetime and sort newest race first
df['RaceDate'] = pd.to_datetime(df['RaceDate'], dayfirst=True)
df = df.sort_values('RaceDate', ascending=False)

# within each horse, shift(-1) pulls up the position from the next, older, race
df['PrevPosition'] = df.groupby('Horses')['Position'].shift(-1)
print(df)
        Horses    RaceDate  Position  PrevPosition
1     RedHorse  2000-02-01         2           3.0
2    BlueHorse  2000-02-01         6           NaN
3  YellowHorse  2000-02-01         7           NaN
4     RedHorse  2000-01-15         3           NaN
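A self-contained sketch of the same groupby/shift idea, built from the sample rows in the question. As a variation on the answer, it sorts oldest-first and uses shift(1) rather than sorting newest-first with shift(-1); both put the previous race's position on each row. The kind="stable" sort and the final sort_index() (to restore the original row order) are additions of this sketch, not part of the answer.

```python
import pandas as pd

# Sample data from the question (column named 'Horses', as in the tables)
df = pd.DataFrame(
    {
        "Horses": ["RedHorse", "BlueHorse", "YellowHorse", "RedHorse"],
        "RaceDate": ["1/2/00", "1/2/00", "1/2/00", "15/1/00"],
        "Position": [2, 6, 7, 3],
    },
    index=[1, 2, 3, 4],
)

# Dates are day-first: 15/1/00 is 15 January 2000
df["RaceDate"] = pd.to_datetime(df["RaceDate"], dayfirst=True)

# Oldest race first, so within each horse the previous race is the row before
df = df.sort_values("RaceDate", kind="stable")
df["PrevPosition"] = df.groupby("Horses")["Position"].shift(1)

# Put rows back in their original order for display
df = df.sort_index()
print(df)
```

For earlier runs, the same pattern with shift(2), shift(3), and so on gives the second and third previous positions.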