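I would like to split a string on a character while keeping that character in the second resulting string. I can do almost everything I need, except that I lose the character I pass to strsplit, which I guess is called the delimiter.
Is there a way to ask strsplit to retain the delimiter, or must I use some kind of regular expression? Thank you for any advice. This seems like a very basic question; sorry if it is a duplicate. I would prefer to use base R.
Here is an example showing what I have so far: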
    my.table <- read.table(text = '
      model npar AICc
      AA(~region+state+county+city)BB(~region+state+county+city)CC(~1) 17 11111.11
      AA(~region+state+county)BB(~region+state+county)CC(~123) 14 22222.22
      AA(~region+state)BB(~region+state)CC(~33) 13 33333.33
      AA(~region)BB(~region)CC(~4321) 6 44444.44
    ', header = TRUE, stringsAsFactors = FALSE)

    # header = TRUE is needed here so the first row is read as column names
    desired.result <- read.table(text = '
      model CC npar AICc
      AA(~region+state+county+city)BB(~region+state+county+city) CC(~1) 17 11111.11
      AA(~region+state+county)BB(~region+state+county) CC(~123) 14 22222.22
      AA(~region+state)BB(~region+state) CC(~33) 13 33333.33
      AA(~region)BB(~region) CC(~4321) 6 44444.44
    ', header = TRUE, stringsAsFactors = FALSE)

    # Splitting on the literal "CC(" consumes the delimiter
    split.model <- strsplit(my.table$model, 'CC\\(')

    split.models <- matrix(unlist(split.model), ncol = 2, byrow = TRUE,
                           dimnames = list(NULL, c("model", "CC")))

    desires.result2 <- data.frame(split.models, my.table[, 2:ncol(my.table)])
    desires.result2

    #                                                        model     CC npar     AICc
    # 1 AA(~region+state+county+city)BB(~region+state+county+city)    ~1)   17 11111.11
    # 2           AA(~region+state+county)BB(~region+state+county)  ~123)   14 22222.22
    # 3                         AA(~region+state)BB(~region+state)   ~33)   13 33333.33
    # 4                                     AA(~region)BB(~region) ~4321)    6 44444.44
Solution
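The basic idea is to pass strsplit a regular expression that uses lookaround assertions to get the desired result. However, this is trickier than simply combining strsplit with a positive lookahead. Read this excellent post by @JoshO'Brien for an explanation. The pattern below pairs a lookbehind for ")" with a lookahead for "CC", so the match is zero-width and no characters are consumed by the split.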
    pattern <- "(?<=\\))(?=CC)"
    strsplit(my.table$model, pattern, perl = TRUE)
    # [[1]]
    # [1] "AA(~region+state+county+city)BB(~region+state+county+city)"
    # [2] "CC(~1)"
    #
    # [[2]]
    # [1] "AA(~region+state+county)BB(~region+state+county)"
    # [2] "CC(~123)"
    #
    # [[3]]
    # [1] "AA(~region+state)BB(~region+state)" "CC(~33)"
    #
    # [[4]]
    # [1] "AA(~region)BB(~region)" "CC(~4321)"
Of course, I leave the do.call(rbind, ...) and cbind steps to you to arrive at the final desired.result.
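For completeness, the remaining do.call(rbind, ...) and cbind steps might look something like the sketch below. It rebuilds my.table from the question so that it runs on its own. The names pieces and result are illustrative and do not come from the original posts, and this is only one reasonable way to assemble the data frame in base R.

```r
# Rebuild the question's input so the sketch is self-contained.
my.table <- read.table(text = '
  model npar AICc
  AA(~region+state+county+city)BB(~region+state+county+city)CC(~1) 17 11111.11
  AA(~region+state+county)BB(~region+state+county)CC(~123) 14 22222.22
  AA(~region+state)BB(~region+state)CC(~33) 13 33333.33
  AA(~region)BB(~region)CC(~4321) 6 44444.44
', header = TRUE, stringsAsFactors = FALSE)

# Zero-width split point: preceded by ")" and followed by "CC".
pattern <- "(?<=\\))(?=CC)"
pieces <- strsplit(my.table$model, pattern, perl = TRUE)  # 'pieces' is an illustrative name

# Stack the length-2 character vectors into a 4 x 2 character matrix.
split.models <- do.call(rbind, pieces)
colnames(split.models) <- c("model", "CC")

# Bind the split columns to the remaining columns of the original table.
result <- cbind(as.data.frame(split.models, stringsAsFactors = FALSE),
                my.table[, c("npar", "AICc")])
result
```

This assumes every model string splits into exactly two pieces. If some rows lacked a "CC" term, do.call(rbind, ...) would recycle values within the shorter vectors, so checking lengths(pieces) first would be prudent.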