df <- data.frame(name = c("john","david","callum","joanna","allison","slocum","lisa"), id = 1:7)
df
     name id
1    john  1
2   david  2
3  callum  3
4  joanna  4
5 allison  5
6  slocum  6
7    lisa  7

I have a vector of regular expressions that I want to look for in the df$name variable:
vec <- c("lis","^jo","um$")
The output I would like is:

     name id group
1    john  1     2
2   david  2    NA
3  callum  3     3
4  joanna  4     2
5 allison  5     1
6  slocum  6     3
7    lisa  7     1

I can get it like this:

df$group <- ifelse(grepl("lis", df$name), 1,
            ifelse(grepl("^jo", df$name), 2,
            ifelse(grepl("um$", df$name), 3, NA)))
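However, I would like to do this directly from 'vec'. I am generating different values reactively in a Shiny app. Can I assign the group based on the index in vec?

Also, if something like the following happens, the group should be that of the first pattern that matches. For example, 'callum' is TRUE for both 'all' and 'um$', but it should get group 1 here.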
vec <- c("all","um$")
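# Build a names-by-patterns logical matrix, then take the first TRUE column per row
df$group <- apply(Vectorize(grepl, "pattern")(vec, df$name), 1, function(ii) which(ii)[1])

# Result with vec <- c("lis","^jo","um$"):
#      name id group
# 1    john  1     2
# 2   david  2    NA
# 3  callum  3     3
# 4  joanna  4     2
# 5 allison  5     1
# 6  slocum  6     3
# 7    lisa  7     1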
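The one-liner above can be unpacked into two steps, which may make the tie-breaking easier to see. This is only a sketch in base R; the intermediate names `hits` and `first_match` are made up for illustration, and it uses the second `vec` from the question so that 'callum' matches two patterns.

```r
# Example data mirroring the question
df <- data.frame(name = c("john","david","callum","joanna","allison","slocum","lisa"),
                 id = 1:7, stringsAsFactors = FALSE)
vec <- c("all", "um$")

# One logical column per pattern: hits[i, j] is TRUE when vec[j] matches df$name[i]
hits <- sapply(vec, grepl, x = df$name)

# For each row, the index of the first matching pattern (NA when nothing matches)
first_match <- unname(apply(hits, 1, function(row) which(row)[1]))

df$group <- first_match
df
```

Because `which()` returns the matching column indices in increasing order, taking `[1]` keeps the earliest pattern in `vec`, so 'callum' ends up in group 1 rather than group 2.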
Using a named vector and merging on it:

names(vec) <- seq_along(vec)
# grep() returns row positions; the merge works here because id equals the row position
df <- merge(df, stack(Vectorize(grep, "pattern", SIMPLIFY = FALSE)(vec, df$name)),
            by.x = "id", by.y = "values", all.x = TRUE)
df[!duplicated(df$id), ]  # to keep only the first match

#   id    name  ind
# 1  1    john    2
# 2  2   david <NA>
# 3  3  callum    3
# 4  4  joanna    2
# 5  5 allison    1
# 6  6  slocum    3
# 7  7    lisa    1
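The merge above joins on `id`, which happens to equal the row position in the example data. If that does not hold for your data, a variant that looks up row positions directly might be safer. The following is a sketch under that assumption, not the answer's own method; the ids 101:107 and the name `hits` are hypothetical.

```r
# Hypothetical data where id is NOT the row position
df <- data.frame(name = c("john","david","callum","joanna","allison","slocum","lisa"),
                 id = 101:107, stringsAsFactors = FALSE)
vec <- c("lis", "^jo", "um$")

# Long table of (row position, pattern index), in the order of vec
hits <- stack(lapply(setNames(vec, seq_along(vec)), grep, x = df$name))

# Keep only the first (earliest-pattern) hit for each row position
hits <- hits[!duplicated(hits$values), ]

# Look up each row's pattern index by position; rows without a hit get NA
df$group <- as.integer(as.character(hits$ind))[match(seq_len(nrow(df)), hits$values)]
df
```

`stack()` stores the pattern index as a factor in `ind`, hence the conversion through `as.character()` before `as.integer()`.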
A for loop:

df$group <- NA
# Go through vec in reverse so that earlier patterns overwrite later ones
for (i in rev(seq_along(vec))) {
  TFvec <- grepl(vec[i], df$name)
  df$group[TFvec] <- i
}
df

#      name id group
# 1    john  1     2
# 2   david  2    NA
# 3  callum  3     3
# 4  joanna  4     2
# 5 allison  5     1
# 6  slocum  6     3
# 7    lisa  7     1
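Since `vec` is generated reactively in the Shiny app, it may be convenient to wrap the loop in a small function that takes the patterns as an argument. This is a minimal sketch; `first_group` is a hypothetical helper name, not something from the question or from any package.

```r
# Hypothetical helper: for each element of x, return the index of the first
# pattern in `patterns` that matches it, or NA if none match
first_group <- function(x, patterns) {
  group <- rep(NA_integer_, length(x))
  # Reverse order: later patterns are written first, earlier ones overwrite them
  for (i in rev(seq_along(patterns))) {
    group[grepl(patterns[i], x)] <- i
  }
  group
}

# Tie-breaking case from the question: 'callum' matches both "all" and "um$"
first_group(c("callum", "allison", "slocum", "lisa"), c("all", "um$"))
```

With these inputs 'callum' gets group 1, because the assignment for "all" runs after the one for "um$" and overwrites it. In the app, the call would presumably look like `df$group <- first_group(df$name, vec)` with whatever `vec` is current.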
Or you can use outer with stri_match_first_regex from stringi:

library(stringi)
# Rows are names, columns are patterns; a cell is NA where the pattern does not match
match.mat <- outer(df$name, vec, stri_match_first_regex)
df$group <- apply(match.mat, 1, function(ii) which(!is.na(ii))[1])  # [1] for first match in `vec`

#      name id group
# 1    john  1     2
# 2   david  2    NA
# 3  callum  3     3
# 4  joanna  4     2
# 5 allison  5     1
# 6  slocum  6     3
# 7    lisa  7     1