I have a text file (basically an error log containing dates, timestamps, and some data) in the following format:
mm/dd/yy 12:00:00:0001
This is line 1
This is line 2

mm/dd/yy 12:00:00:0004
This is line 3
This is line 4
This is line 5

mm/dd/yy 12:00:00:0004
This is line 6
This is line 7
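I am new to Perl and need to write a script that searches the file for timestamps and merges the data recorded under identical timestamps.
For the sample above, I expect the following output.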
mm/dd/yy 12:00:00:0001
This is line 1
This is line 2

mm/dd/yy 12:00:00:0004
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
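What is the best way to do this?
Solution
I once had to do this on some very large files in which the timestamps were not in order, and I did not want to hold everything in memory. I got the job done with a three-pass solution: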
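>Tag each input line with its timestamp and save it in a temporary file
>Sort the temporary file with a fast sorter, such as sort(1)
>Turn the sorted file back into the starting format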
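One caveat about the sort pass: sort(1) compares the tags as plain text, so mm/dd/yy stamps only come out in chronological order when they fall in the same year. If the order across years matters, one option is to tag with a rearranged key instead. The sketch below is only an illustration; `sortable_key` is a hypothetical helper, not part of the original script, and it assumes all two-digit years fall in the same century.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "mm/dd/yy hh:mm:ss:ffff" into
# "yy/mm/dd hh:mm:ss:ffff" so that a plain text comparison
# matches chronological order.
# Assumption: all two-digit years belong to the same century.
sub sortable_key {
    my ($timestamp) = @_;
    my ($mm, $dd, $yy, $time) =
        $timestamp =~ m|^(\d\d)/(\d\d)/(\d\d) (\d\d:\d\d:\d\d:\d{4})$|
        or return;    # not a timestamp: return nothing
    return "$yy/$mm/$dd $time";
}

my @stamps = ('01/01/70 12:00:00:0004', '12/31/69 23:59:59:0001');

# Sorted by the rearranged key, the 1969 stamp comes first.
print "$_\n" for sort { sortable_key($a) cmp sortable_key($b) } @stamps;
```

Writing this key into the tag (and keeping the original stamp next to it) would let the external sort stay a plain text sort.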
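This was fast enough for my task: I could let it run while I went for a cup of coffee. If you need the results really quickly, you may have to do something more.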
use strict;
use warnings;

use File::Temp qw(tempfile);

my( $temp_fh, $temp_filename ) = tempfile( UNLINK => 1 );

# read each line, tag with timestamp, and write to temp file
# will sort and undo later.
my $current_timestamp = '';
LINE: while( <DATA> )
    {
    chomp;
    if( m|^\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d\d\d\d$| ) # timestamp line
        {
        $current_timestamp = $_;
        next LINE;
        }
    elsif( m|\S| ) # line with non-whitespace (not a "blank line")
        {
        print $temp_fh "[$current_timestamp] $_\n";
        }
    else # blank lines
        {
        next LINE;
        }
    }

close $temp_fh or die "Could not close $temp_filename: $!";

# sort the file by lines using some very fast sorter
system( "sort", qw(-o sorted.txt), $temp_filename ) == 0
    or die "sort failed: $?";

# read the sorted file and turn back into starting format
open my($in), "<", 'sorted.txt' or die "Could not read sorted.txt: $!";

$current_timestamp = '';
while( <$in> )
    {
    # skip any line that does not carry a "[timestamp] " tag
    my( $timestamp, $line ) = m/\[(.*?)] (.*)/ or next;
    if( $timestamp ne $current_timestamp )
        {
        $current_timestamp = $timestamp;
        print $/, $timestamp, $/;
        }
    print $line, $/;
    }

close $in;
unlink $temp_filename, 'sorted.txt';

__END__
01/01/70 12:00:00:0004
This is line 3
This is line 4
This is line 5

01/01/70 12:00:00:0001
This is line 1
This is line 2

01/01/70 12:00:00:0004
This is line 6
This is line 7
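For a log that does fit comfortably in memory, the temporary file and the external sort are probably more than is needed. A minimal sketch of an in-memory alternative follows, assuming the same input format as above; `merge_by_timestamp` is a hypothetical name chosen for this example. It keeps timestamps in the order they first appear and does not sort them.

```perl
use strict;
use warnings;

# Group log lines by timestamp, keeping everything in memory.
# Takes a reference to an array of chomped lines.
# Returns (\@order, \%lines_for): the timestamps in first-seen order
# and the data lines collected under each timestamp.
sub merge_by_timestamp {
    my ($lines) = @_;
    my (@order, %lines_for);
    my $current;
    for my $line (@$lines) {
        if ($line =~ m|^\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d{4}$|) {
            # timestamp line: remember it, record it once
            $current = $line;
            push @order, $current unless exists $lines_for{$current};
            $lines_for{$current} ||= [];
        }
        elsif ($line =~ /\S/ and defined $current) {
            # data line: file it under the most recent timestamp
            push @{ $lines_for{$current} }, $line;
        }
        # blank lines (and data before any timestamp) are skipped
    }
    return (\@order, \%lines_for);
}

my @input = (
    '01/01/70 12:00:00:0001', 'This is line 1', 'This is line 2', '',
    '01/01/70 12:00:00:0004', 'This is line 3', 'This is line 4',
    'This is line 5', '',
    '01/01/70 12:00:00:0004', 'This is line 6', 'This is line 7',
);

my ($order, $lines_for) = merge_by_timestamp(\@input);
for my $timestamp (@$order) {
    print "$timestamp\n";
    print "$_\n" for @{ $lines_for->{$timestamp} };
    print "\n";
}
```

The trade-off is the one described in the answer: the hash holds every line at once, so this only suits files that are small relative to available memory.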