Nested pipe chains inside dplyr / left_join

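While trying to get a grouped lagged variable (which is not possible with lag alone), the suggested solution was to pull out the data, lag over the distinct rows, and then join the result back.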

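I would prefer to do this without creating an intermediate object, and I want to do it in the middle of a chain. However, it does not work the way I expect, and the problem seems to be some interaction between using `.` and the nested chain inside left_join.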

require(tidyverse)
#> Loading required package: tidyverse
df <- data.frame(Team = c("A","A","A","A","B","B","B","C","C","D","D"),
                 Date = c("2016-05-10","2016-05-10","2016-05-10","2016-05-10",
                          "2016-05-12","2016-05-12","2016-05-12",
                          "2016-05-15","2016-05-15",
                          "2016-05-30","2016-05-30"),
                 Points = c(1,4,3,2,1,5,6,1,2,3,9)
)


#This works:
df %>% left_join(x = .,y = df %>% 
                   distinct(Team,Date) %>% 
                   mutate(Date_Lagged = lag(Date)))
#> Joining, by = c("Team", "Date")
#>    Team       Date Points Date_Lagged
#> 1     A 2016-05-10      1        <NA>
#> 2     A 2016-05-10      4        <NA>
#> 3     A 2016-05-10      3        <NA>
#> 4     A 2016-05-10      2        <NA>
#> 5     B 2016-05-12      1  2016-05-10
#> 6     B 2016-05-12      5  2016-05-10
#> 7     B 2016-05-12      6  2016-05-10
#> 8     C 2016-05-15      1  2016-05-12
#> 9     C 2016-05-15      2  2016-05-12
#> 10    D 2016-05-30      3  2016-05-15
#> 11    D 2016-05-30      9  2016-05-15

#And this works:
df %>% left_join(x = .,y = .)
#> Joining, by = c("Team", "Date", "Points")
#>    Team       Date Points
#> 1     A 2016-05-10      1
#> 2     A 2016-05-10      4
#> 3     A 2016-05-10      3
#> 4     A 2016-05-10      2
#> 5     B 2016-05-12      1
#> 6     B 2016-05-12      5
#> 7     B 2016-05-12      6
#> 8     C 2016-05-15      1
#> 9     C 2016-05-15      2
#> 10    D 2016-05-30      3
#> 11    D 2016-05-30      9

#This doesn't work despite the fact that `.` is df.  
df %>% left_join(x = .,y = . %>% 
                   distinct(Team,Date) %>% 
                   mutate(Date_Lagged = lag(Date)))
#> Error in UseMethod("tbl_vars"): no applicable method for 'tbl_vars' applied to an object of class "c('fseq','function')"



#Desired output
distinct(df,Team,Date) %>%
  mutate(Date_Lagged = lag(Date)) %>%
  right_join(.,df) %>%
  select(Team,Date,Points,Date_Lagged)
#> Joining, by = c("Team", "Date")
#>    Team       Date Points Date_Lagged
#> 1     A 2016-05-10      1        <NA>
#> 2     A 2016-05-10      4        <NA>
#> 3     A 2016-05-10      3        <NA>
#> 4     A 2016-05-10      2        <NA>
#> 5     B 2016-05-12      1  2016-05-10
#> 6     B 2016-05-12      5  2016-05-10
#> 7     B 2016-05-12      6  2016-05-10
#> 8     C 2016-05-15      1  2016-05-12
#> 9     C 2016-05-15      2  2016-05-12
#> 10    D 2016-05-30      3  2016-05-15
#> 11    D 2016-05-30      9  2016-05-15

Created on 2018-06-12 by the reprex package (v0.2.0).

Solution

To make your code work, wrap the `.` in the y argument in curly braces, like this:
df %>% left_join(x = .,y = {.} %>% 
                   distinct(Team,Date) %>% 
                   mutate(Date_Lagged = lag(Date)))

Joining, by = c("Team", "Date")
   Team       Date Points Date_Lagged
1     A 2016-05-10      1        <NA>
2     A 2016-05-10      4        <NA>
3     A 2016-05-10      3        <NA>
4     A 2016-05-10      2        <NA>
5     B 2016-05-12      1  2016-05-10
6     B 2016-05-12      5  2016-05-10
7     B 2016-05-12      6  2016-05-10
8     C 2016-05-15      1  2016-05-12
9     C 2016-05-15      2  2016-05-12
10    D 2016-05-30      3  2016-05-15
11    D 2016-05-30      9  2016-05-15
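
The class named in the error message (`fseq`, `function`) points to the cause: magrittr does not evaluate a chain whose left-hand side is the bare placeholder `.`, it stores it as a reusable function. Wrapping the placeholder in braces makes the left-hand side an ordinary expression, so the chain is evaluated on the data. The sketch below shows both cases in isolation; the `toy` data frame and the name `distinct_teams` are made up for illustration and are not part of the question.

```r
library(dplyr)  # dplyr re-exports the magrittr pipe %>%

# Hypothetical toy data, not the data from the question.
toy <- data.frame(Team = c("A", "A", "B"),
                  Points = c(1, 4, 3),
                  stringsAsFactors = FALSE)

# A chain that starts with a bare `.` is not run; it becomes a function.
distinct_teams <- . %>% distinct(Team)
class(distinct_teams)
#> [1] "fseq"     "function"

# The stored chain can be called later like any other function.
distinct_teams(toy)

# Inside another pipe, {.} is evaluated to the piped data first,
# so the nested chain yields a data frame instead of a function.
parts <- toy %>% list(x = ., y = {.} %>% distinct(Team))
is.data.frame(parts$y)
#> [1] TRUE
```

This is why `y = . %>% ...` handed left_join a function, while `y = {.} %>% ...` hands it a data frame.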

Alternatively, you can do this:

df %>% left_join(df %>% 
                   distinct(Team,Date) %>% 
                   mutate(Date_Lagged = lag(Date)))
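
Referring to `df` by name only helps when the chain starts from `df` itself. If the join has to happen after earlier steps in a chain, one option is to move the lag-and-join into a small helper function, which avoids the nested placeholder entirely. This is a minimal sketch: the helper name `add_lagged_date`, the toy data, and the `filter` step are assumptions for illustration and do not come from the original answer.

```r
library(dplyr)

# Hypothetical helper: lag Date over the distinct Team/Date rows,
# then join the lagged value back onto every row of the input.
add_lagged_date <- function(data) {
  lagged <- data %>%
    distinct(Team, Date) %>%
    mutate(Date_Lagged = lag(Date))
  # Naming the keys explicitly avoids the "Joining, by = ..." message.
  left_join(data, lagged, by = c("Team", "Date"))
}

# Hypothetical toy data with the same column names as the question.
toy <- data.frame(Team = c("A", "A", "B", "C"),
                  Date = c("2016-05-10", "2016-05-10", "2016-05-12", "2016-05-15"),
                  Points = c(1, 4, 3, 2),
                  stringsAsFactors = FALSE)

# The helper can sit in the middle of a chain like any other verb.
result <- toy %>%
  filter(Points > 0) %>%
  add_lagged_date()
result
```

Because the data arrives as a normal function argument, nothing inside the helper depends on how magrittr treats `.`.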
