Reading a huge CSV in R

I have a huge CSV file, about 9 GB in size, and 16 GB of RAM. I followed the advice on this page and applied it as follows:
If you get the error that R cannot allocate a vector of length x, close out of R and add the following line to the "Target" field:
--max-vsize=500M

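I still get the error and warnings below. How should I read this 9 GB file into R? I am using 64-bit R 3.3.1 and running the commands below in RStudio 0.99.903, on Windows Server 2012 R2 Standard (64-bit OS).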

> memory.limit()
[1] 16383
> answer=read.csv("C:/Users/a-vs/results_20160291.csv")
Error: cannot allocate vector of size 500.0 Mb
In addition: There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In scan(file = file,what = what,sep = sep,quote = quote,... :
  Reached total allocation of 16383Mb: see help(memory.size)
2: In scan(file = file,... :
  Reached total allocation of 16383Mb: see help(memory.size)
3: In scan(file = file,... :
  Reached total allocation of 16383Mb: see help(memory.size)
4: In scan(file = file,... :
  Reached total allocation of 16383Mb: see help(memory.size)
5: In scan(file = file,... :
  Reached total allocation of 16383Mb: see help(memory.size)
6: In scan(file = file,... :
  Reached total allocation of 16383Mb: see help(memory.size)
7: In scan(file = file,... :
  Reached total allocation of 16383Mb: see help(memory.size)
8: In scan(file = file,... :
  Reached total allocation of 16383Mb: see help(memory.size)
9: In scan(file = file,... :
  Reached total allocation of 16383Mb: see help(memory.size)
10: In scan(file = file,... :
  Reached total allocation of 16383Mb: see help(memory.size)
11: In scan(file = file,... :
  Reached total allocation of 16383Mb: see help(memory.size)
12: In scan(file = file,... :
  Reached total allocation of 16383Mb: see help(memory.size)

------- Update 1

My first attempt, based on the suggested answer:

> thefile=fread("C:/Users/a-vs/results_20160291.csv",header = T)
Read 44099243 rows and 36 (of 36) columns from 9.399 GB file in 00:13:34
Warning messages:
1: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv",:
  Reached total allocation of 16383Mb: see help(memory.size)
2: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv",:
  Reached total allocation of 16383Mb: see help(memory.size)
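A rough back-of-envelope estimate helps explain why even fread runs into the 16 GB ceiling. The sketch below assumes, purely for illustration, that every one of the 36 columns is stored as an 8-byte double; the real column types are not stated, and character columns usually cost more.

```r
# Back-of-envelope size of the table fread reported (44,099,243 rows x 36 columns).
# Assumption: every column is an 8-byte double; character columns would need more.
n_rows <- 44099243
n_cols <- 36
cells  <- n_rows * n_cols      # total number of values
bytes  <- cells * 8            # 8 bytes per double
gib    <- bytes / 1024^3       # convert bytes to GiB
cat(sprintf("cells: %.0f, approx. %.2f GiB\n", cells, gib))
# prints "cells: 1587572748, approx. 11.83 GiB"
```

Under that assumption the finished table alone takes roughly three quarters of the 16383 Mb limit, before counting any temporary copies made while parsing, which is consistent with the allocation warnings above.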

------- Update 2

My second attempt, based on the suggested answer, was as follows:

thefile2 <- read.csv.ffdf(file="C:/Users/a-vs/results_20160291.csv",header=TRUE,VERBOSE=TRUE,first.rows=-1,next.rows=50000,colClasses=NA)
read.table.ffdf 1..
Error: cannot allocate vector of size 125.0 Mb
In addition: There were 14 warnings (use warnings() to see them)

How can I read this file into a single object so that I can analyze the entire data set at once?

------- Update 3

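We bought an expensive machine with 10 cores and 256 GB of RAM. It is not the most efficient solution, but it will work, at least for the near future. I looked at the answers below and I don't think they solve my problem :( I do appreciate them. I want to perform market basket analysis, and I don't see any way to do that other than keeping my data in RAM.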

------- Answer

Make sure you are using 64-bit R, not just 64-bit Windows, so that you can raise R's memory allocation to the full 16 GB.
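To confirm which build RStudio is actually running, one option is to look at the pointer size reported by `.Machine`: a 64-bit build uses 8-byte pointers. A small sketch (the helper name `is_64bit_r` is hypothetical):

```r
# Hypothetical helper: TRUE when the running R build is 64-bit (8-byte pointers).
is_64bit_r <- function() .Machine$sizeof.pointer == 8

cat("Architecture:", R.version$arch, "\n")
cat("64-bit R:", is_64bit_r(), "\n")
```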

You can also read the file in chunks:

file_in    <- file("in.csv", "r")                 # open a read connection to the file
chunk_size <- 100000                              # choose the best size for you
x          <- readLines(file_in, n = chunk_size)  # next chunk_size lines, as character strings
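The snippet reads a single chunk. A minimal sketch of the full loop might look like the following; it assumes a header line, comma-separated fields, and no quoted commas or embedded newlines, and the function name `count_rows_in_chunks` and its row-counting "work" are placeholders for whatever per-chunk processing you need.

```r
# Minimal sketch: stream a CSV through memory one chunk of lines at a time.
# Assumptions: first line is a header; fields are comma-separated; no field
# contains quoted commas or embedded newlines (readLines splits on newlines).
count_rows_in_chunks <- function(path, chunk_size = 100000) {
  con <- file(path, "r")
  on.exit(close(con))                              # close even if an error occurs
  header    <- readLines(con, n = 1)               # consume the header line once
  col_names <- strsplit(header, ",", fixed = TRUE)[[1]]
  total_rows <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)        # up to chunk_size more lines
    if (length(lines) == 0) break                  # end of file reached
    chunk <- read.csv(text = lines, header = FALSE, col.names = col_names)
    # Placeholder for real per-chunk work (aggregate, filter, write out, ...).
    total_rows <- total_rows + nrow(chunk)
  }
  total_rows
}
```

Calling `count_rows_in_chunks("in.csv")` keeps at most `chunk_size` rows in memory at once. For files with quoted fields, a CSV-aware reader such as `data.table::fread` or `ff::read.csv.ffdf` is safer than splitting lines by hand.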

You can use data.table to read and manipulate large files more efficiently:

library(data.table)
dt <- fread("in.csv", header = TRUE)  # keep the result in a data.table
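If only some of the columns are needed (for market basket analysis, perhaps just a transaction ID and an item column), reading only those columns can shrink the in-memory table considerably. A sketch using fread's `nrows` and `select` arguments; the function name `read_needed_columns` and the column names in the usage comment are hypothetical:

```r
library(data.table)

# Hypothetical helper: preview the file cheaply, then read only the listed columns.
read_needed_columns <- function(path, cols, preview_rows = 1000) {
  preview <- fread(path, nrows = preview_rows)     # small read to learn names and types
  missing <- setdiff(cols, names(preview))
  if (length(missing) > 0) {
    stop("columns not found: ", paste(missing, collapse = ", "))
  }
  fread(path, select = cols)                       # full read, selected columns only
}

# Usage (hypothetical column names):
# dt <- read_needed_columns("in.csv", c("order_id", "item"))
```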

If needed, you can use ff, which keeps the data in files on disk instead of in RAM:

library("ff")
x <- read.csv.ffdf(file = "file.csv", first.rows = 10000, colClasses = NA)  # columns are stored in ff files on disk
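An ffdf object is usually processed piece by piece, so that only one block of rows is materialized as an ordinary data.frame at a time. A minimal sketch using ff's `chunk()`; the helper name `sum_column_ffdf` and the idea of summing one numeric column are illustrative assumptions, not part of the original answer:

```r
library(ff)

# Hypothetical helper: sum one numeric column of a CSV held as an ffdf,
# pulling one chunk of rows into RAM at a time.
sum_column_ffdf <- function(path, column) {
  x <- read.csv.ffdf(file = path, header = TRUE, first.rows = 10000,
                     next.rows = 50000, colClasses = NA)
  total <- 0
  for (i in chunk(x)) {
    block <- x[i, ]                      # this chunk only, as a regular data.frame
    total <- total + sum(block[[column]])
  }
  total
}
```

Note that this style only helps when the analysis can be expressed chunk by chunk; an algorithm that needs all rows in memory at once will still hit the RAM limit.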
