bash - Deleting specific lines from a 12 GB file
I am trying to delete specific lines from a 12 GB text file.
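The sed -i option is not available to me on HP-UX, and other approaches such as saving to a temporary file will not work because I only have 20 GB of space available, 12 GB of which is already used by the text file.
Given the space constraints, I am trying to do this with Perl.
This solution works for deleting the last 9 lines from the 12 GB file.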
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::File;
tie my @lines,'Tie::File','test.txt' or die "$!\n";
$#lines -= 9;
untie @lines;
I would like to modify the code above to delete any specific line number.
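Tie::File is never the answer.
- It is insanely slow.
- It can consume more memory than simply reading the entire file into memory, even if you limit the size of its buffer.
You are hitting both of these problems. Your code has to visit every line of the file, so Tie::File reads the entire file and stores the index of every line in memory. On a 64-bit build of Perl this takes 28 bytes per line (not counting any overhead in the memory allocator).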
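To get a feel for what 28 bytes per line means at this scale, here is a back-of-the-envelope sketch. The helper name estimate_index_bytes and the 80-byte average line length are assumptions made for illustration; only the 28-byte figure comes from the text above.

```perl
use strict;
use warnings;

# Figure quoted above: bytes of index kept per line on a 64-bit Perl.
use constant INDEX_BYTES_PER_LINE => 28;

# Hypothetical helper: estimate the index size for a file of $file_bytes
# bytes whose lines average $avg_line_bytes bytes (newline included).
sub estimate_index_bytes {
    my ($file_bytes, $avg_line_bytes) = @_;
    my $lines = int($file_bytes / $avg_line_bytes);
    return $lines * INDEX_BYTES_PER_LINE;
}

# Assumed numbers: a 12 GiB file with 80-byte lines.
my $bytes = estimate_index_bytes(12 * 1024**3, 80);
printf("about %.1f GiB of index\n", $bytes / 1024**3);
```

With these assumed numbers the index alone comes to roughly 4.2 GiB, before any allocator overhead; shorter lines make it proportionally worse.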
To delete the last 9 lines of the file, you can use the following:
use File::ReadBackwards qw( );

my $qfn = '...';

my $pos;
{
    my $bw = File::ReadBackwards->new($qfn)
        or die("Can't open \"$qfn\": $!\n");

    for (1..9) {
        defined( my $line = $bw->readline() )
            or last;
    }

    $pos = $bw->tell();
}

# Can't use $bw->get_handle because it's a read-only handle.
truncate($qfn, $pos)
    or die("Can't truncate \"$qfn\": $!\n");
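File::ReadBackwards is a CPAN module and may not be installed on every system. As a rough sketch of the same idea using only core Perl, the offset to truncate at can be found by scanning backwards from the end of the file in fixed-size chunks. This is a swapped-in technique, not the module's implementation; the helper name offset_of_last_lines and the 64 KiB chunk size are assumptions made for illustration.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );

use constant CHUNK => 64 * 1024;   # Assumed chunk size.

# Hypothetical helper: return the byte offset at which the last $n lines
# of $qfn begin (0 if the file has $n lines or fewer).
sub offset_of_last_lines {
    my ($qfn, $n) = @_;

    open(my $fh, '<:raw', $qfn)
        or die("Can't open \"$qfn\": $!\n");

    my $size = -s $fh;
    return $size if $n <= 0;

    # Skip the final byte: if it is a newline it only terminates the
    # last line, and if it is not a newline it cannot start a line.
    my $end   = $size - 1;
    my $found = 0;

    while ($end > 0) {
        my $start = $end > CHUNK ? $end - CHUNK : 0;
        my $len   = $end - $start;

        seek($fh, $start, SEEK_SET)
            or die($!);
        my $got = read($fh, my $buf, $len);
        die($!) if !defined($got);
        die("Short read\n") if $got != $len;

        # Walk the newlines in this chunk from right to left.
        my $i = $len;
        while (($i = rindex($buf, "\n", $i - 1)) >= 0) {
            return $start + $i + 1 if ++$found == $n;
            last if $i == 0;
        }

        $end = $start;
    }

    return 0;
}
```

The result would then be passed to truncate in the same way as $pos above, for example truncate($qfn, offset_of_last_lines($qfn, 9)).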
To delete arbitrary lines, you can use the following:
my $qfn = '...';

open(my $fh_src, '<:raw', $qfn)
    or die("Can't open \"$qfn\": $!\n");
open(my $fh_dst, '+<:raw', $qfn)
    or die("Can't open \"$qfn\": $!\n");

while (<$fh_src>) {
    next if $. == 9;  # Or "if /keyword/", or whatever condition you want.

    print($fh_dst $_)
        or die($!);
}

truncate($fh_dst, tell($fh_dst))
    or die($!);
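This works in place because the write position can never get ahead of the read position, so no unread data is overwritten. As a sketch of how the same loop might be generalized to several line numbers at once, the version below wraps it in a function; the name delete_lines and its argument list are hypothetical, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the given 1-based line numbers from $qfn
# in place, using the same two-handle approach as above.
sub delete_lines {
    my ($qfn, @line_nums) = @_;
    my %skip = map { $_ => 1 } @line_nums;

    open(my $fh_src, '<:raw', $qfn)
        or die("Can't open \"$qfn\": $!\n");
    open(my $fh_dst, '+<:raw', $qfn)
        or die("Can't open \"$qfn\": $!\n");

    while (my $line = <$fh_src>) {
        next if $skip{$.};   # $. is the current line number of $fh_src.

        print($fh_dst $line)
            or die($!);
    }

    # Cut off the leftover tail once the kept lines have been moved up.
    truncate($fh_dst, tell($fh_dst))
        or die($!);
    close($fh_dst)
        or die($!);
    close($fh_src);

    return;
}
```

A call such as delete_lines('file', 9, 120) would then drop lines 9 and 120 in a single pass. Note that an interruption midway leaves the file partially rewritten, which is the price of not using a temporary copy.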
The following optimized version assumes there is only one line (or one block of lines) to delete:
use Fcntl qw( SEEK_CUR SEEK_SET );

use constant BLOCK_SIZE => 4*1024*1024;

my $qfn = 'file';

# The source handle is read-only; the destination handle must be opened
# read-write ('+<') so the file can be rewritten without being clobbered.
open(my $fh_src, '<:raw', $qfn)
    or die("Can't open \"$qfn\": $!\n");
open(my $fh_dst, '+<:raw', $qfn)
    or die("Can't open \"$qfn\": $!\n");

# Find the offset at which the line to delete begins.
my $dst_pos;
while (1) {
    $dst_pos = tell($fh_src);
    defined( my $line = <$fh_src> )
        or do {
            $dst_pos = undef;
            last;
        };

    last if $. == 9;  # Or "if /keyword/", or whatever condition you want.
}

if (defined($dst_pos)) {
    # We're switching from buffered I/O to unbuffered I/O,
    # so we need to move the system file pointer from where the
    # buffered read left off to where we actually finished reading.
    sysseek($fh_src, tell($fh_src), SEEK_SET)
        or die($!);

    sysseek($fh_dst, $dst_pos, SEEK_SET)
        or die($!);

    # Copy the rest of the file over the deleted line, one block at a time.
    while (1) {
        my $rv = sysread($fh_src, my $buf, BLOCK_SIZE);
        die($!) if !defined($rv);
        last if !$rv;

        my $written = 0;
        while ($written < length($buf)) {
            my $rv = syswrite($fh_dst, $buf, length($buf)-$written, $written);
            die($!) if !defined($rv);
            $written += $rv;
        }
    }

    # Must use sysseek instead of tell with sysread/syswrite.
    truncate($fh_dst, sysseek($fh_dst, 0, SEEK_CUR))
        or die($!);
}