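Does anyone know of a function that converts a text representation of a number into an actual number, e.g. "twenty thousand three hundred and five" into 20305? I have numbers written out as words in data frame rows and want to convert them to numeric values.
In the qdap package you can replace numbers written as digits with words (e.g. 1001 becomes "one thousand one"), but not the other way around: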
    library(qdap)
    replace_number("I like 346457 ice cream cones.")
    [1] "I like three hundred forty six thousand four hundred fifty seven ice cream cones."
Solution
Here's a start that should get you up to the hundreds of thousands.
    word2num <- function(word){
        wsplit <- strsplit(tolower(word)," ")[[1]]
        one_digits <- list(zero=0, one=1, two=2, three=3, four=4, five=5,
                           six=6, seven=7, eight=8, nine=9)
        teens <- list(eleven=11, twelve=12, thirteen=13, fourteen=14, fifteen=15,
                      sixteen=16, seventeen=17, eighteen=18, nineteen=19)
        ten_digits <- list(ten=10, twenty=20, thirty=30, forty=40, fifty=50,
                           sixty=60, seventy=70, eighty=80, ninety=90)
        doubles <- c(teens,ten_digits)
        out <- 0
        i <- 1
        while(i <= length(wsplit)){
            j <- 1   # number of words consumed in this step
            # Look up the value of the current word
            if(i==1 && wsplit[i]=="hundred")
                temp <- 100
            else if(i==1 && wsplit[i]=="thousand")
                temp <- 1000
            else if(wsplit[i] %in% names(one_digits))
                temp <- as.numeric(one_digits[wsplit[i]])
            else if(wsplit[i] %in% names(teens))
                temp <- as.numeric(teens[wsplit[i]])
            else if(wsplit[i] %in% names(ten_digits))
                temp <- (as.numeric(ten_digits[wsplit[i]]))
            # Combine it with the following word, if that word is a multiplier
            if(i < length(wsplit) && wsplit[i+1]=="hundred"){
                if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
                    out <- out + 100*temp
                else
                    out <- 100*(out + temp)
                j <- 2
            }
            else if(i < length(wsplit) && wsplit[i+1]=="thousand"){
                if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
                    out <- out + 1000*temp
                else
                    out <- 1000*(out + temp)
                j <- 2
            }
            else if(i < length(wsplit) && wsplit[i+1] %in% names(doubles)){
                # Shorthand such as "four fifty seven": treat "four" as 400
                temp <- temp*100
                out <- out + temp
            }
            else{
                out <- out + temp
            }
            i <- i + j
        }
        return(list(word,out))
    }
Results:
    > word2num("fifty seven")
    [[1]]
    [1] "fifty seven"
    [[2]]
    [1] 57

    > word2num("four fifty seven")
    [[1]]
    [1] "four fifty seven"
    [[2]]
    [1] 457

    > word2num("six thousand four fifty seven")
    [[1]]
    [1] "six thousand four fifty seven"
    [[2]]
    [1] 6457

    > word2num("forty six thousand four fifty seven")
    [[1]]
    [1] "forty six thousand four fifty seven"
    [[2]]
    [1] 46457

    > word2num("forty six thousand four hundred fifty seven")
    [[1]]
    [1] "forty six thousand four hundred fifty seven"
    [[2]]
    [1] 46457

    > word2num("three forty six thousand four hundred fifty seven")
    [[1]]
    [1] "three forty six thousand four hundred fifty seven"
    [[2]]
    [1] 346457
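I can tell you already that this won't work for word2num("four hundred thousand fifty"), because it doesn't know how to handle consecutive "hundred" and "thousand" terms, but the algorithm can probably be modified. Anyone should feel free to edit this if they have improvements, or to build on it in their own answer. I just thought this was a fun problem to play with for a little while.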
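One possible way to handle consecutive "hundred" and "thousand" words is to keep two running values: the group currently being built and the total of the groups already closed off by a scale word. The sketch below is a separate rewrite under that assumption, not a patch to word2num. The names words_to_number, small and scales are hypothetical, made up for illustration. It assumes fully spelled-out input, so the shorthand "four fifty seven" is not supported here (it would come out as 61).

```r
# Hypothetical helper, not part of the answer above: builds up a group value
# and flushes it into the total whenever a scale word ("thousand") appears.
words_to_number <- function(phrase) {
  small <- c(zero = 0, one = 1, two = 2, three = 3, four = 4, five = 5,
             six = 6, seven = 7, eight = 8, nine = 9, ten = 10,
             eleven = 11, twelve = 12, thirteen = 13, fourteen = 14,
             fifteen = 15, sixteen = 16, seventeen = 17, eighteen = 18,
             nineteen = 19, twenty = 20, thirty = 30, forty = 40,
             fifty = 50, sixty = 60, seventy = 70, eighty = 80, ninety = 90)
  scales <- c(thousand = 1e3, million = 1e6)

  # Lower-case, treat hyphens as spaces, split on whitespace, drop "and".
  tokens <- strsplit(trimws(gsub("-", " ", tolower(phrase))), "\\s+")[[1]]
  tokens <- tokens[tokens != "and"]

  total <- 0   # sum of groups already closed by a scale word
  group <- 0   # group currently being built
  for (tok in tokens) {
    if (tok %in% names(small)) {
      group <- group + small[[tok]]
    } else if (tok == "hundred") {
      group <- max(group, 1) * 100              # a bare "hundred" counts as 100
    } else if (tok %in% names(scales)) {
      total <- total + max(group, 1) * scales[[tok]]
      group <- 0
    } else {
      stop("unrecognised word: ", tok)
    }
  }
  total + group
}

# Applying it to a data frame column of written-out numbers
df <- data.frame(words = c("fifty seven",
                           "four hundred thousand fifty",
                           "twenty thousand three hundred and five"),
                 stringsAsFactors = FALSE)
df$value <- vapply(df$words, words_to_number, numeric(1), USE.NAMES = FALSE)
```

With this input, df$value should come out as 57, 400050 and 20305. Stopping on an unknown word is a design choice; returning NA instead may suit messy data frame columns better.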