我有一个巨大的xlsx文件(约127 MB),并希望使用Spreadsheet :: Excel模块阅读,但我在2GB RAM机器上得到“Out of Memory”错误. (注意脚本适用于较小的excel 2007文件)
有没有办法在不达到内存限制的情况下逐行读取excel文件.搜索谷歌我遇到了http://discuss.joelonsoftware.com/default.asp?joel.3.160328.14,但我不熟悉如何将电子表格存储到标量.有人可以将excel 2007文件作为标量和打印单元格值进行读取.
下面是我在较小的电子表格上运行的当前脚本.
#!/usr/bin/perl use Excel::Writer::XLSX; use Spreadsheet::XLSX; my $workbook = Excel::Writer::XLSX->new('Book1.xlsx'); my $worksheet = $workbook->add_worksheet(); # use strict; my $excel = Spreadsheet::XLSX -> new ('Book2.xlsx'); my $date_format = $workbook->add_format(); $date_format->set_num_format('dd/mm/yy hh:mm'); # Columns of interest @columns=(0,1,2,5,9,10,12,13,31); @reportlist=("string1","String2","String3"); @actuallist=("ModifiedString1","ModifiedString2","ModifiedString3"); $max_list=$#reportlist; foreach my $sheet (@{$excel -> {Worksheet}}) { printf("Sheet: %s\n",$sheet->{Name}); $sheet -> {MaxRow} ||= $sheet -> {MinRow}; foreach my $row ($sheet -> {MinRow} .. $sheet -> {MaxRow}) { $sheet -> {MaxCol} ||= $sheet -> {MinCol}; for ($c=0;$c<=$#columns;$c++){ $col=$columns[$c]; my $cell = $sheet -> {Cells} [$row] [$col]; if($col==0){ $cell->{Val}=~ s/\ GMT\+11\:00//g; $worksheet->write($row,$c,$cell->{Val},$date_format); } if ($cell) { $worksheet->write($row,$cell -> {Val}); for($z=0;$z<=$#reportisplist;$z++){ if(($cell->{Val})=~ m/$reportlist[$z]/i){ $worksheet->write($row,$actuallist[$z]); } } } } } } $workbook->close();
解决方法
我正在开发一个新模块,用于使用Perl快速,高效地读取Excel xlsx文件.它还没有在CPAN上(它需要更多的工作),但你可以在
GitHub上获得它.
以下是如何使用它的示例:
use strict; use warnings; use Excel::Reader::XLSX; my $reader = Excel::Reader::XLSX->new(); my $workbook = $reader->read_file( 'Book1.xlsx' ); if ( !defined $workbook ) { die $reader->error(),"\n"; } for my $worksheet ( $workbook->worksheets() ) { my $sheetname = $worksheet->name(); print "Sheet = $sheetname\n"; while ( my $row = $worksheet->next_row() ) { while ( my $cell = $row->next_cell() ) { my $row = $cell->row(); my $col = $cell->col(); my $value = $cell->value(); print " Cell ($row,$col) = $value\n"; } } } __END__
更新:此模块从未达到CPAN质量.请尝试Spreadsheet::ParseXLSX.