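While trying to get a lagged variable by group (which is not possible using lag alone), the suggested solution was to pull out the data, lag over the distinct rows, and then join it back.
I would prefer to do this without creating an intermediate object, and I would like to do it in the middle of a chain. However, it does not work the way I expect, and the problem seems to be some interaction between using `.` and a nested chain inside left_join.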
    require(tidyverse)
    #> Loading required package: tidyverse

    df <- data.frame(
      Team   = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
      Date   = c("2016-05-10", "2016-05-10", "2016-05-10", "2016-05-10",
                 "2016-05-12", "2016-05-12", "2016-05-12",
                 "2016-05-15", "2016-05-15",
                 "2016-05-30", "2016-05-30"),
      Points = c(1, 4, 3, 2, 1, 5, 6, 1, 2, 3, 9)
    )

    # This works:
    df %>%
      left_join(x = .,
                y = df %>% distinct(Team, Date) %>% mutate(Date_Lagged = lag(Date)))
    #> Joining, by = c("Team", "Date")
    #>    Team       Date Points Date_Lagged
    #> 1     A 2016-05-10      1        <NA>
    #> 2     A 2016-05-10      4        <NA>
    #> 3     A 2016-05-10      3        <NA>
    #> 4     A 2016-05-10      2        <NA>
    #> 5     B 2016-05-12      1  2016-05-10
    #> 6     B 2016-05-12      5  2016-05-10
    #> 7     B 2016-05-12      6  2016-05-10
    #> 8     C 2016-05-15      1  2016-05-12
    #> 9     C 2016-05-15      2  2016-05-12
    #> 10    D 2016-05-30      3  2016-05-15
    #> 11    D 2016-05-30      9  2016-05-15

    # And this works:
    df %>%
      left_join(x = ., y = .)
    #> Joining, by = c("Team", "Date", "Points")
    #>    Team       Date Points
    #> 1     A 2016-05-10      1
    #> 2     A 2016-05-10      4
    #> 3     A 2016-05-10      3
    #> 4     A 2016-05-10      2
    #> 5     B 2016-05-12      1
    #> 6     B 2016-05-12      5
    #> 7     B 2016-05-12      6
    #> 8     C 2016-05-15      1
    #> 9     C 2016-05-15      2
    #> 10    D 2016-05-30      3
    #> 11    D 2016-05-30      9

    # This doesn't work despite the fact that `.` is df.
    df %>%
      left_join(x = .,
                y = . %>% distinct(Team, Date) %>% mutate(Date_Lagged = lag(Date)))
    #> Error in UseMethod("tbl_vars"): no applicable method for 'tbl_vars' applied to an object of class "c('fseq','function')"

    # Desired output
    distinct(df, Team, Date) %>%
      mutate(Date_Lagged = lag(Date)) %>%
      right_join(., df) %>%
      select(Team, Date, Points, Date_Lagged)
    #> Joining, by = c("Team", "Date")
    #>    Team       Date Points Date_Lagged
    #> 1     A 2016-05-10      1        <NA>
    #> 2     A 2016-05-10      4        <NA>
    #> 3     A 2016-05-10      3        <NA>
    #> 4     A 2016-05-10      2        <NA>
    #> 5     B 2016-05-12      1  2016-05-10
    #> 6     B 2016-05-12      5  2016-05-10
    #> 7     B 2016-05-12      6  2016-05-10
    #> 8     C 2016-05-15      1  2016-05-12
    #> 9     C 2016-05-15      2  2016-05-12
    #> 10    D 2016-05-30      3  2016-05-15
    #> 11    D 2016-05-30      9  2016-05-15
Created on 2018-06-12 by the reprex package (v0.2.0).
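Solution
To make your code work, wrap the `.` in the y argument in braces, as shown below. A pipeline whose left-hand side is a bare `.` is not evaluated; it defines a function, which is the 'fseq' object named in the error message. `{.}` is an ordinary expression, so it is evaluated to df before being piped on.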
    df %>%
      left_join(x = .,
                y = {.} %>% distinct(Team, Date) %>% mutate(Date_Lagged = lag(Date)))
    Joining, by = c("Team", "Date")
       Team       Date Points Date_Lagged
    1     A 2016-05-10      1        <NA>
    2     A 2016-05-10      4        <NA>
    3     A 2016-05-10      3        <NA>
    4     A 2016-05-10      2        <NA>
    5     B 2016-05-12      1  2016-05-10
    6     B 2016-05-12      5  2016-05-10
    7     B 2016-05-12      6  2016-05-10
    8     C 2016-05-15      1  2016-05-12
    9     C 2016-05-15      2  2016-05-12
    10    D 2016-05-30      3  2016-05-15
    11    D 2016-05-30      9  2016-05-15
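The effect of the braces can be seen in isolation, without dplyr. The following is a minimal sketch using only magrittr; the vector `x` and the use of `list()` as a stand-in for `left_join()` are assumptions made for illustration, not part of the original question.

```r
library(magrittr)

x <- c(9, 16)  # illustrative data, not from the question

# Bare `.` on the left of the inner pipe: magrittr builds a function
# (a functional sequence, class "fseq") instead of evaluating anything,
# so `total` ends up holding a function.
bare <- x %>% list(value = ., total = . %>% sum())
class(bare$total)
is.function(bare$total)

# `{.}` is an ordinary expression, so it is evaluated to `x` first and
# the inner pipe then runs immediately.
braced <- x %>% list(value = ., total = {.} %>% sum())
braced$total

# Equivalent alternative: build the functional sequence, then call it on `.`
applied <- x %>% list(value = ., total = (. %>% sum())(.))
applied$total
```

In the failing call from the question, `left_join()` receives such a function as `y`, which is why the method dispatch on `tbl_vars` fails.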
Alternatively, you can do this:
    df %>%
      left_join(df %>% distinct(Team, Date) %>% mutate(Date_Lagged = lag(Date)))
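Referring to `df` by name only helps when the left-hand side of the join is a named object. For the mid-chain case the question asks about, where the intermediate result has no name, the brace form still applies. The sketch below assumes the data from the question; the `filter()` step is a hypothetical addition that stands in for whatever comes earlier in the chain, and `stringsAsFactors = FALSE` and the explicit `by` are added only to keep the result predictable across R versions.

```r
library(dplyr)

# Data as in the question
df <- data.frame(
  Team   = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
  Date   = c(rep("2016-05-10", 4), rep("2016-05-12", 3),
             rep("2016-05-15", 2), rep("2016-05-30", 2)),
  Points = c(1, 4, 3, 2, 1, 5, 6, 1, 2, 3, 9),
  stringsAsFactors = FALSE
)

result <- df %>%
  filter(Points > 1) %>%            # hypothetical earlier step in the chain
  left_join(
    x  = .,
    # `{.}` is evaluated to the filtered data before the inner pipe runs
    y  = {.} %>% distinct(Team, Date) %>% mutate(Date_Lagged = lag(Date)),
    by = c("Team", "Date")
  )

result
```

Here both sides of the join are derived from the filtered data, so no intermediate object has to be created.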