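I have a data frame in R with 2 columns, GL and GLDESC, and I want to add a 3rd column called KIND based on some of the data in the GLDESC column.
The data frame looks like this: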
           GL  GLDESC
    1  515100  Payroll-Indir Salary Labor
    2  515900  Payroll-Indir Compensated Absences
    3  532300  Bulk Gas
    4  539991  Area Charge In
    5  551000  Repairs & Maint-Spare Parts
    6  551100  Supplies-Operating
    7  551300  Consumables
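For each row of the data frame:
- If GLDESC contains the word "Payroll" anywhere in the string, I want KIND to be "Payroll"
- If GLDESC contains the word "Gas" anywhere in the string, I want KIND to be "Materials"
- In all other cases, I want KIND to be "Other"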
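I looked for similar examples on Stack Overflow but could not find any. I also looked in R for Dummies at switch, grep, apply and regular expressions, trying to match only part of the GLDESC column and then fill the KIND column with the corresponding kind, but I was unable to make it work.
Any help would be appreciated.
Thanks :D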
Since you have only two conditions, you can use a nested ifelse:
    # random data; it wasn't easy to copy-paste yours
    DF <- data.frame(GL = sample(10),
                     GLDESC = paste(sample(letters, 10),
                                    c("gas", "payroll12", "GaSer", "asdf", "qweaa",
                                      "PayROll-12", "asdfg", "GAS--2", "fghfgh", "qweee"),
                                    sample(letters, 10), sep = " "))

    # first test for "gas"; only if that fails, test for "payroll"; otherwise "Other"
    DF$KIND <- ifelse(grepl("gas", DF$GLDESC, ignore.case = TRUE), "Materials",
                      ifelse(grepl("payroll", DF$GLDESC, ignore.case = TRUE), "Payroll", "Other"))

    DF
    #   GL         GLDESC      KIND
    #1   8        e gas l Materials
    #2   1  c payroll12 y   Payroll
    #3  10      m GaSer v Materials
    #4   6       t asdf n     Other
    #5   2      w qweaa t     Other
    #6   4 r PayROll-12 q   Payroll
    #7   9      n asdfg a     Other
    #8   5     d GAS--2 w Materials
    #9   7     s fghfgh e     Other
    #10  3      g qweee k     Other
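Applied to the data from the question, the same nested ifelse might look like the sketch below. The name DF2 is only illustrative, and the values are transcribed by hand from the question, so treat this as an assumption about how the real data frame is laid out.

```r
# Data transcribed from the question (DF2 is a hypothetical name)
DF2 <- data.frame(
  GL = c(515100, 515900, 532300, 539991, 551000, 551100, 551300),
  GLDESC = c("Payroll-Indir Salary Labor",
             "Payroll-Indir Compensated Absences",
             "Bulk Gas",
             "Area Charge In",
             "Repairs & Maint-Spare Parts",
             "Supplies-Operating",
             "Consumables"),
  stringsAsFactors = FALSE
)

# "Payroll" is tested first, then "Gas"; everything else becomes "Other"
DF2$KIND <- ifelse(grepl("payroll", DF2$GLDESC, ignore.case = TRUE), "Payroll",
                   ifelse(grepl("gas", DF2$GLDESC, ignore.case = TRUE), "Materials", "Other"))

DF2$KIND
# [1] "Payroll"   "Payroll"   "Materials" "Other"     "Other"     "Other"     "Other"
```

The order of the two tests only matters for rows that match both patterns; none of the rows shown in the question do.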
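Edit 10/3/2016 (..received more attention than expected)
A possible solution to handle more patterns is to iterate over all the patterns and, whenever there is a match, progressively reduce the number of comparisons by dropping the elements that have already been matched: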
    ff = function(x, patterns, replacements = patterns, fill = NA, ...)
    {
        stopifnot(length(patterns) == length(replacements))

        ans = rep_len(as.character(fill), length(x))  # result, pre-filled with 'fill'
        empty = seq_along(x)                          # indices not matched yet

        for(i in seq_along(patterns)) {
            greps = grepl(patterns[[i]], x[empty], ...)
            ans[empty[greps]] = replacements[[i]]
            empty = empty[!greps]                     # keep only the unmatched indices
        }

        return(ans)
    }

    ff(DF$GLDESC, c("gas", "payroll"), c("Materials", "Payroll"), "Other", ignore.case = TRUE)
    # [1] "Materials" "Payroll"   "Materials" "Other"     "Other"     "Payroll"   "Other"     "Materials" "Other"     "Other"

    ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"),
       c("pat1a|pat1b", "pat2", "pat3"),
       c("1", "2", "3"), fill = "empty")
    #[1] "1"     "1"     "3"     "empty"

    ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"),
       c("pat2", "pat1a|pat1b", "pat3"),
       c("2", "1", "3"), fill = "empty")
    #[1] "2"     "1"     "3"     "empty"