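I want to add a column to my table using ALTER TABLE and UPDATE statements, rather than recreating the whole table.
When I use a subquery in my UPDATE statement, I don't get the output I expect.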
Setting up reproducible data

    library(dplyr)
    library(dbplyr)
    library(DBI)
    con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    copy_to(con, iris[c(1, 2, 51), ], "iris")
    tbl(con, "iris")
    # # Source:   table<iris> [?? x 5]
    # # Database: sqlite 3.19.3 []
    #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    #          <dbl>       <dbl>        <dbl>       <dbl> <chr>
    # 1          5.1         3.5          1.4         0.2 setosa
    # 2          4.9         3.0          1.4         0.2 setosa
    # 3          7.0         3.2          4.7         1.4 versicolor
Creating the new column in a separate table

    DBI::dbSendQuery(con, "CREATE TABLE new_table AS
      SELECT t2.new_col FROM iris t1
      INNER JOIN (SELECT Species, sum(`Sepal.Width`) AS new_col
                  FROM iris GROUP BY Species) t2
      ON t1.Species = t2.Species")
    tbl(con, "new_table")
    # # Source:   table<new_table> [?? x 1]
    # # Database: sqlite 3.19.3 []
    #   new_col
    #     <dbl>
    # 1     6.5
    # 2     6.5
    # 3     3.2
Creating the new column in the old table

    DBI::dbSendQuery(con, "ALTER TABLE iris ADD COLUMN new_col DOUBLE")
Trying to insert the new column from new_table

    DBI::dbSendQuery(con, "UPDATE iris SET new_col = (SELECT new_col FROM new_table)")
    tbl(con, "iris")
    # # Source:   table<iris> [?? x 6]
    # # Database: sqlite 3.19.3 []
    #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    new_col
    #          <dbl>       <dbl>        <dbl>       <dbl> <chr>        <dbl>
    # 1          5.1         3.5          1.4         0.2 setosa         6.5
    # 2          4.9         3.0          1.4         0.2 setosa         6.5
    # 3          7.0         3.2          4.7         1.4 versicolor     6.5
As you can see, new_col contains only the value 6.5, whereas I expected 3.2 in the last row. How can I fix this?
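Rows in a SQL table have no inherent order, so you cannot assign a "vector" of values the way you would in R. In SQLite, an uncorrelated subquery in SET is reduced to a single value (its first row), which is why every row received 6.5. You can, however, modify your query slightly: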
    library(dplyr)
    library(DBI)
    con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    copy_to(con, iris[c(1, 2, 51), ], "iris")
Creating a separate table with the aggregated data

    DBI::dbSendQuery(con, "CREATE TABLE new_table AS
      SELECT Species, sum(`Sepal.Width`) AS new_col
      FROM iris GROUP BY Species")
    tbl(con, "new_table")
    #> # Source:   table<new_table> [?? x 2]
    #> # Database: sqlite 3.22.0 []
    #>   Species    new_col
    #>   <chr>        <dbl>
    #> 1 setosa         6.5
    #> 2 versicolor     3.2
Creating the new column in the old table

    DBI::dbSendQuery(con, "ALTER TABLE iris ADD COLUMN new_col DOUBLE")
Moving the data into the original table with a correlated subquery

    DBI::dbSendQuery(con, "UPDATE iris SET new_col =
      (SELECT new_col FROM new_table t2 WHERE iris.Species = t2.Species)")
    tbl(con, "iris")
    #> # Source:   table<iris> [?? x 6]
    #> # Database: sqlite 3.22.0 []
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    new_col
    #>          <dbl>       <dbl>        <dbl>       <dbl> <chr>        <dbl>
    #> 1          5.1         3.5          1.4         0.2 setosa         6.5
    #> 2          4.9         3            1.4         0.2 setosa         6.5
    #> 3          7           3.2          4.7         1.4 versicolor     3.2
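To see why the uncorrelated subquery from the question fills every row with the same value, here is a minimal sketch. It assumes an in-memory SQLite database; the table `target` and its `id` column are made up for illustration. It uses `DBI::dbExecute()` for the statements, since that function is meant for SQL that returns no result set.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Three candidate values, like new_table in the question
dbWriteTable(con, "new_table", data.frame(new_col = c(6.5, 6.5, 3.2)))
# Hypothetical three-row table that should receive the values
dbWriteTable(con, "target", data.frame(id = 1:3))

# Used as an expression, the subquery collapses to a single value
scalar <- dbGetQuery(con, "SELECT (SELECT new_col FROM new_table) AS v")

# The same single value is therefore assigned to every row of target
dbExecute(con, "ALTER TABLE target ADD COLUMN new_col DOUBLE")
dbExecute(con, "UPDATE target SET new_col = (SELECT new_col FROM new_table)")
res <- dbGetQuery(con, "SELECT id, new_col FROM target")

dbDisconnect(con)
```

Nothing in the subquery refers to the row being updated, so the database has no way to pick a different value per row. The `WHERE iris.Species = t2.Species` condition in the answer supplies exactly that link.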
If you have several computed columns, you can use UPDATE … SET (c1, c2, …) = (…) like this:
    library(dplyr)
    library(dbplyr)
    library(DBI)
    con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    copy_to(con, iris[c(1, 2, 51), ], "iris")
    DBI::dbSendQuery(con, "CREATE TABLE aggs AS
      SELECT Species, SUM(`Sepal.Width`) AS sw_sum, AVG(`Sepal.Width`) AS sw_avg
      FROM iris GROUP BY Species")
    tbl(con, "aggs")
    #> # Source:   table<aggs> [?? x 3]
    #> # Database: sqlite 3.22.0 []
    #>   Species    sw_sum sw_avg
    #>   <chr>       <dbl>  <dbl>
    #> 1 setosa        6.5   3.25
    #> 2 versicolor    3.2   3.2
    DBI::dbSendQuery(con, "ALTER TABLE iris ADD COLUMN sw_sum DOUBLE")
    DBI::dbSendQuery(con, "ALTER TABLE iris ADD COLUMN sw_avg DOUBLE")
    DBI::dbSendQuery(con, "UPDATE iris SET (sw_sum, sw_avg) =
      (SELECT sw_sum, sw_avg FROM aggs WHERE iris.Species = aggs.Species)")
    tbl(con, "iris")
    #> # Source:   table<iris> [?? x 7]
    #> # Database: sqlite 3.22.0 []
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species  sw_sum sw_avg
    #>          <dbl>       <dbl>        <dbl>       <dbl> <chr>     <dbl>  <dbl>
    #> 1          5.1         3.5          1.4         0.2 setosa      6.5   3.25
    #> 2          4.9         3            1.4         0.2 setosa      6.5   3.25
    #> 3          7           3.2          4.7         1.4 versico…    3.2   3.2
This also works in Postgres, but may not work in SQL Server.
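Where the row-value form SET (c1, c2) = (…) is not accepted, one possible fallback is a separate correlated subquery per column. The sketch below runs this against in-memory SQLite only; that the same statement is accepted by SQL Server is an assumption and has not been checked here.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Same three rows as in the examples above
df <- iris[c(1, 2, 51), ]
df$Species <- as.character(df$Species)
dbWriteTable(con, "iris", df, row.names = FALSE)

dbExecute(con, "CREATE TABLE aggs AS
  SELECT Species, SUM(`Sepal.Width`) AS sw_sum, AVG(`Sepal.Width`) AS sw_avg
  FROM iris GROUP BY Species")
dbExecute(con, "ALTER TABLE iris ADD COLUMN sw_sum DOUBLE")
dbExecute(con, "ALTER TABLE iris ADD COLUMN sw_avg DOUBLE")

# One correlated scalar subquery per target column
dbExecute(con, "UPDATE iris SET
  sw_sum = (SELECT aggs.sw_sum FROM aggs WHERE aggs.Species = iris.Species),
  sw_avg = (SELECT aggs.sw_avg FROM aggs WHERE aggs.Species = iris.Species)")

res <- dbGetQuery(con, "SELECT Species, sw_sum, sw_avg FROM iris ORDER BY rowid")
dbDisconnect(con)
```

The price is that aggs is looked up once per column rather than once per row, which is usually acceptable for a small lookup table.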
Actually, in this case the intermediate table is not needed:
    library(dplyr)
    library(dbplyr)
    library(DBI)
    con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    copy_to(con, iris[c(1, 2, 51), ], "iris")
    DBI::dbSendQuery(con, "ALTER TABLE iris ADD COLUMN sw_sum DOUBLE")
    DBI::dbSendQuery(con, "ALTER TABLE iris ADD COLUMN sw_avg DOUBLE")
    DBI::dbSendQuery(con, "UPDATE iris SET (sw_sum, sw_avg) =
      (SELECT sw_sum, sw_avg
       FROM (SELECT Species, SUM(`Sepal.Width`) AS sw_sum, AVG(`Sepal.Width`) AS sw_avg
             FROM iris GROUP BY Species) aggs
       WHERE iris.Species = aggs.Species)")
    tbl(con, "iris")
    #> # Source:   table<iris> [?? x 7]
    #> # Database: sqlite 3.22.0 []
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species  sw_sum sw_avg
    #>          <dbl>       <dbl>        <dbl>       <dbl> <chr>     <dbl>  <dbl>
    #> 1          5.1         3.5          1.4         0.2 setosa      6.5   3.25
    #> 2          4.9         3            1.4         0.2 setosa      6.5   3.25
    #> 3          7           3.2          4.7         1.4 versico…    3.2   3.2
However, an intermediate table can be helpful in other situations, for example when it is created in R, as in the linked question.
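As a rough illustration of that last case, the aggregate can be computed in R, uploaded as an intermediate table, and then joined back with the same correlated UPDATE. This is only a sketch under assumptions: it uses base R's `aggregate()` and `DBI::dbWriteTable()` on an in-memory SQLite database, and the linked question may build its table differently.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Same three rows as in the examples above
df <- iris[c(1, 2, 51), ]
df$Species <- as.character(df$Species)
dbWriteTable(con, "iris", df, row.names = FALSE)

# Compute the per-species sum in R instead of in SQL
aggs <- aggregate(Sepal.Width ~ Species, data = df, FUN = sum)
names(aggs)[2] <- "new_col"

# Upload the R result as the intermediate table
dbWriteTable(con, "new_table", aggs, row.names = FALSE)

# Pull the values into the original table, matching on Species
dbExecute(con, "ALTER TABLE iris ADD COLUMN new_col DOUBLE")
dbExecute(con, "UPDATE iris SET new_col =
  (SELECT t2.new_col FROM new_table t2 WHERE t2.Species = iris.Species)")

res <- dbGetQuery(con, "SELECT Species, new_col FROM iris ORDER BY rowid")
dbDisconnect(con)
```

This route is convenient when the new values come from something that is awkward to express in SQL, such as a model prediction, as long as the uploaded table carries a key column to match on.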