I have a huge CSV file, about 9 GB in size, and 16 GB of RAM. I followed the
advice from the page and implemented it as shown below.
If you get the error that R cannot allocate a vector of length x, close out of R and add the following line to the "Target" field: --max-vsize=500M
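I still get the error and warnings below. How should I read the 9 GB file into R? I have 64-bit R 3.3.1, and I am running the command below in RStudio 0.99.903. The machine runs Windows Server 2012 R2 Standard, a 64-bit operating system.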
> memory.limit()
[1] 16383
> answer=read.csv("C:/Users/a-vs/results_20160291.csv")
Error: cannot allocate vector of size 500.0 Mb
In addition: There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
2: In scan(file = file, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
3: In scan(file = file, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
4: In scan(file = file, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
5: In scan(file = file, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
6: In scan(file = file, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
7: In scan(file = file, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
8: In scan(file = file, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
9: In scan(file = file, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
10: In scan(file = file, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
11: In scan(file = file, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
12: In scan(file = file, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
------- Update 1
My first attempt, based on the suggested answer:
> thefile=fread("C:/Users/a-vs/results_20160291.csv", header = T)
Read 44099243 rows and 36 (of 36) columns from 9.399 GB file in 00:13:34
Warning messages:
1: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv", :
  Reached total allocation of 16383Mb: see help(memory.size)
2: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv", :
  Reached total allocation of 16383Mb: see help(memory.size)
------- Update 2
My second attempt, based on the suggested answer, is as follows:
> thefile2 <- read.csv.ffdf(file="C:/Users/a-vs/results_20160291.csv", header=TRUE, VERBOSE=TRUE,
+                           first.rows=-1, next.rows=50000, colClasses=NA)
read.table.ffdf 1..
Error: cannot allocate vector of size 125.0 Mb
In addition: There were 14 warnings (use warnings() to see them)
How can I read this file into a single object so that I can analyze the entire dataset in one go?
------- Update 3
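We bought an expensive machine with 10 cores and 256 GB of RAM. This is not the most efficient solution, but it will work at least for the near future. I looked at the answers below, and I don't think they solve my problem :( I do appreciate these answers. I want to perform market basket analysis, and I don't think there is any way to do that other than keeping my data in RAM.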
Make sure that you are using 64-bit R, not just 64-bit Windows, so that you can increase your RAM allocation to the full 16 GB.
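One quick way to confirm that the R session itself is 64-bit is to check the pointer size and the architecture R reports. This is a minimal sketch using base R only; the variable names pointer_bytes and is_64bit are illustrative and do not come from the question.

```r
# A 64-bit build of R uses 8-byte pointers; a 32-bit build uses 4-byte pointers.
pointer_bytes <- .Machine$sizeof.pointer
is_64bit <- pointer_bytes == 8

# R.version$arch reports the architecture this R build targets.
cat("Architecture:", R.version$arch, "\n")
cat("64-bit R:", is_64bit, "\n")
```

If is_64bit comes back FALSE, the session is a 32-bit build regardless of the operating system, and it cannot address the full 16 GB.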
In addition, you can read the file in chunks:
file_in <- file("in.csv", "r")
chunk_size <- 100000  # choose the best size for you
x <- readLines(file_in, n = chunk_size)
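The readLines call above fetches only the first chunk. To work through a whole file, the same call can be repeated on the open connection until it returns nothing, keeping only a running summary in memory. The following is a rough sketch of that loop in base R; the demo file, its id and value columns, and the tiny chunk size are all made up for illustration.

```r
# Build a small demo CSV in a temporary file so the sketch is self-contained.
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:25, value = (1:25) * 2), demo_path, row.names = FALSE)

chunk_size <- 10  # tiny for the demo; something like 100000 suits real data
con <- file(demo_path, "r")

# Read the header line once so that every chunk can reuse the column names.
col_names <- strsplit(readLines(con, n = 1), ",")[[1]]
col_names <- gsub('"', "", col_names, fixed = TRUE)

total_rows <- 0
value_sum <- 0
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break  # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = col_names)
  # Keep only a running summary; the chunk itself is discarded each iteration.
  total_rows <- total_rows + nrow(chunk)
  value_sum <- value_sum + sum(chunk$value)
}
close(con)

cat("Rows processed:", total_rows, "\n")
```

Splitting on lines like this assumes that no quoted field contains an embedded newline. It also only helps when the analysis can be expressed as a per-chunk computation; it does not by itself produce one in-memory object holding all rows.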
You can use data.table to read and manipulate large files more efficiently:
require(data.table)
fread("in.csv", header = TRUE)
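If the analysis needs only some of the 36 columns, loading just those can reduce the memory footprint considerably. Here is a hedged sketch using fread's nrows and select arguments; it assumes the data.table package is installed, and the demo file with its order_id, item, and note columns is hypothetical rather than taken from the question.

```r
# Run only if data.table is available; otherwise the sketch is skipped.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Build a small demo CSV in a temporary file so the sketch is self-contained.
  demo_path <- tempfile(fileext = ".csv")
  demo <- data.frame(order_id = 1:6,
                     item = c("a", "b", "a", "c", "b", "a"),
                     note = "unused")
  write.csv(demo, demo_path, row.names = FALSE)

  # Peek at the first few rows to inspect column names and types cheaply.
  preview <- fread(demo_path, nrows = 3)

  # Load only the columns the analysis actually needs.
  basket <- fread(demo_path, select = c("order_id", "item"))
  print(dim(basket))
}
```

Previewing with nrows first makes it possible to decide which columns to keep before committing memory to the full read.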
If needed, you can leverage storage memory with ff:
library("ff")
x <- read.csv.ffdf(file="file.csv", first.rows=10000, colClasses=NA)