Regex – How to create a new column in a data frame based on a partial string match against another column in R

I have a data frame in R with two columns, GL and GLDESC, and I want to add a third column called KIND based on some of the data in the GLDESC column.

The data frame looks like this:

GL                             GLDESC
1 515100         Payroll-Indir Salary Labor
2 515900 Payroll-Indir Compensated Absences
3 532300                           Bulk Gas
4 539991                     Area Charge In
5 551000        Repairs & Maint-Spare Parts
6 551100                 Supplies-Operating
7 551300                        Consumables

For each row of the data frame:

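>If GLDESC contains the word Payroll anywhere in the string, I want KIND to be Payroll
>If GLDESC contains the word Gas anywhere in the string, I want KIND to be Materials
>In all other cases, I want KIND to be Other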

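I looked for similar examples on stackoverflow but could not find any. I also read up on switch, grep, apply and regular expressions in R, trying to match only part of the GLDESC column and then fill the KIND column with the kind of account, but I was unable to make it work.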

Any help would be greatly appreciated.

Thanks :D

Since you have only two conditions, you can use a nested ifelse:
#random data; it wasn't easy to copy-paste yours  
DF <- data.frame(GL = sample(10),
                 GLDESC = paste(sample(letters, 10),
                                c("gas", "payroll12", "GaSer", "asdf", "qweaa", "PayROll-12", "asdfg", "GAS--2", "fghfgh", "qweee"),
                                sample(letters, 10), sep = " "))

# "gas" is tested first; rows that do not match fall through to the "payroll" test
DF$KIND <- ifelse(grepl("gas", DF$GLDESC, ignore.case = TRUE), "Materials",
                  ifelse(grepl("payroll", DF$GLDESC, ignore.case = TRUE), "Payroll", "Other"))

DF
#   GL         GLDESC      KIND
#1   8        e gas l Materials
#2   1  c payroll12 y   Payroll
#3  10      m GaSer v Materials
#4   6       t asdf n     Other
#5   2      w qweaa t     Other
#6   4 r PayROll-12 q   Payroll
#7   9      n asdfg a     Other
#8   5     d GAS--2 w Materials
#9   7     s fghfgh e     Other
#10  3      g qweee k     Other
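
For reference, here is a minimal sketch of the same nested ifelse applied to the sample rows from the question rather than to random data. The data frame name `gl_df` is made up for this illustration, and the GL and GLDESC values are typed in by hand from the table above.

```r
# Rebuild the question's sample data by hand (gl_df is a hypothetical name)
gl_df <- data.frame(
  GL = c(515100, 515900, 532300, 539991, 551000, 551100, 551300),
  GLDESC = c("Payroll-Indir Salary Labor",
             "Payroll-Indir Compensated Absences",
             "Bulk Gas",
             "Area Charge In",
             "Repairs & Maint-Spare Parts",
             "Supplies-Operating",
             "Consumables"),
  stringsAsFactors = FALSE
)

# Test for "payroll" first, then "gas"; everything else becomes "Other"
gl_df$KIND <- ifelse(grepl("payroll", gl_df$GLDESC, ignore.case = TRUE), "Payroll",
              ifelse(grepl("gas", gl_df$GLDESC, ignore.case = TRUE), "Materials",
                     "Other"))

print(gl_df)
```

With this data the first two rows should come out as Payroll, the "Bulk Gas" row as Materials, and the remaining four rows as Other.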

EDIT 10/3/2016 (..received more attention than expected)

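A possible solution for handling more patterns is to iterate over all the patterns and, whenever there is a match, progressively reduce the number of comparisons still to be made: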

ff = function(x,patterns,replacements = patterns,fill = NA,...)
{
    stopifnot(length(patterns) == length(replacements))

    ans = rep_len(as.character(fill),length(x))    
    empty = seq_along(x)

    for(i in seq_along(patterns)) {
        greps = grepl(patterns[[i]],x[empty],...)
        ans[empty[greps]] = replacements[[i]]  
        empty = empty[!greps]
    }

    return(ans)
}

ff(DF$GLDESC, c("gas", "payroll"), c("Materials", "Payroll"), "Other", ignore.case = TRUE)
# [1] "Materials" "Payroll"   "Materials" "Other"     "Other"     "Payroll"   "Other"     "Materials" "Other"     "Other"

ff(c("pat1a pat2","pat1a pat1b","pat3","pat4"),c("pat1a|pat1b","pat2","pat3"),c("1","2","3"),fill = "empty")
#[1] "1"     "1"     "3"     "empty"

ff(c("pat1a pat2","pat1a pat1b","pat3","pat4"),c("pat2","pat1a|pat1b","pat3"),c("2","1","3"),fill = "empty")
#[1] "2"     "1"     "3"     "empty"
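
One thing to keep in mind: the patterns above match substrings, so "gas" also matches "GaSer". If only the whole word should count, as the question's wording suggests, a word-boundary pattern may help. The sketch below is an assumption about what is wanted, not part of the original answer; the vector `descs` and the names `loose` and `strict` are made up for illustration.

```r
# Hypothetical descriptions used to compare substring and whole-word matching
descs <- c("Bulk Gas", "GaSer", "GAS--2", "Gasoline")

# Substring match: any occurrence of "gas", regardless of case
loose <- grepl("gas", descs, ignore.case = TRUE)

# Whole-word match: \\b marks a word boundary (perl = TRUE uses PCRE)
strict <- grepl("\\bgas\\b", descs, ignore.case = TRUE, perl = TRUE)

print(data.frame(descs, loose, strict))
```

Here `loose` is TRUE for every element, while `strict` is TRUE only for "Bulk Gas" and "GAS--2". The same pattern can be used in the nested ifelse, with `perl = TRUE` added to the `grepl` call.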
