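I am trying to delete specific lines from a 12 GB text file.
The sed on HP-UX has no -i option, and alternatives such as writing to a temporary file won't work: I only have 20 GB of space in total, and the text file already takes up 12 GB of it.
Given the space constraint, I am trying to do this in Perl.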
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Tie::File;

    tie my @lines, 'Tie::File', 'test.txt'
        or die "$!\n";

    $#lines -= 9;

    untie @lines;
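Tie::File is never the answer.

- It is insanely slow.
- It can consume more memory than simply slurping the whole file into memory, even if you limit the size of its buffer.

You are running into both of these problems. Evaluating $#lines requires knowing how many lines the file has, so Tie::File reads the entire file and stores an index entry for every line in memory. On a 64-bit build of Perl this takes 28 bytes per line (not counting any overhead in the memory allocator).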
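To get a feel for the scale, here is a rough back-of-the-envelope estimate. The 80-byte average line length is purely an assumption for illustration (the question does not say how long the lines are), and 12 GB is taken as 12 × 10^9 bytes. Only the 28 bytes per line comes from the figure above.

```latex
\begin{align*}
N &\approx \frac{12 \times 10^{9}\ \text{bytes}}{80\ \text{bytes/line}}
   = 1.5 \times 10^{8}\ \text{lines} \\
M_{\text{index}} &\approx N \times 28\ \text{bytes/line}
   = 4.2 \times 10^{9}\ \text{bytes} \approx 4.2\ \text{GB}
\end{align*}
```

Under that assumption the index alone would need several gigabytes of RAM before allocator overhead, and shorter lines make it worse, since the index grows with the number of lines rather than with the file size.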
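To remove the last nine lines, as the code in the question does, use File::ReadBackwards to find where the file should now end, then truncate it there:

    use strict;
    use warnings;
    use File::ReadBackwards qw( );

    my $qfn = '...';

    my $pos;
    {
        my $bw = File::ReadBackwards->new($qfn)
            or die("Can't open \"$qfn\": $!\n");

        # Step backwards over the last 9 lines (fewer if the file is shorter).
        for (1..9) {
            defined( my $line = $bw->readline() )
                or last;
        }

        # Offset of the start of the earliest line read, i.e. the new end of file.
        $pos = $bw->tell();
    }

    # Can't use $bw->get_handle because it's a read-only handle.
    truncate($qfn, $pos)
        or die("Can't truncate \"$qfn\": $!\n");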
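To remove lines from anywhere in the file (line 9 in this example), read and write the same file through two separate handles. The write position can never overtake the read position, so nothing is overwritten before it has been read, and no temporary file is needed:

    use strict;
    use warnings;

    my $qfn = '...';

    # Two independent handles on the same file: one to read, one to write.
    open(my $fh_src, '<:raw', $qfn)
        or die("Can't open \"$qfn\": $!\n");
    open(my $fh_dst, '+<:raw', $qfn)
        or die("Can't open \"$qfn\": $!\n");

    while (<$fh_src>) {
        next if $. == 9;  # Or "if /keyword/", or whatever condition you want.
        print($fh_dst $_)
            or die($!);
    }

    # Cut off the leftover bytes past the last line written.
    truncate($fh_dst, tell($fh_dst))
        or die($!);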
The following optimized version assumes there is only one line (or one contiguous block of lines) to remove:
    use strict;
    use warnings;
    use Fcntl qw( SEEK_CUR SEEK_SET );

    use constant BLOCK_SIZE => 4*1024*1024;

    my $qfn = 'file';

    # Read handle and read/write handle on the same file.
    open(my $fh_src, '<:raw', $qfn)
        or die("Can't open \"$qfn\": $!\n");
    open(my $fh_dst, '+<:raw', $qfn)
        or die("Can't open \"$qfn\": $!\n");

    # Find the offset at which the line to remove starts.
    my $dst_pos;
    while (1) {
        $dst_pos = tell($fh_src);
        defined( my $line = <$fh_src> )
            or do { $dst_pos = undef; last; };

        last if $. == 9;  # Or "if /keyword/", or whatever condition you want.
    }

    if (defined($dst_pos)) {
        # We're switching from buffered I/O to unbuffered I/O,
        # so we need to move the system file pointer from where the
        # buffered read left off to where we actually finished reading.
        sysseek($fh_src, tell($fh_src), SEEK_SET)
            or die($!);

        sysseek($fh_dst, $dst_pos, SEEK_SET)
            or die($!);

        # Shift the rest of the file down over the removed line, block by block.
        while (1) {
            my $rv = sysread($fh_src, my $buf, BLOCK_SIZE);
            die($!) if !defined($rv);
            last if !$rv;

            # syswrite may write fewer bytes than requested, so loop until done.
            my $written = 0;
            while ($written < length($buf)) {
                my $rv = syswrite($fh_dst, $buf, length($buf)-$written, $written);
                die($!) if !defined($rv);
                $written += $rv;
            }
        }

        # Must use sysseek instead of tell with sysread/syswrite.
        truncate($fh_dst, sysseek($fh_dst, 0, SEEK_CUR))
            or die($!);
    }