Regular expressions - Perl script to search a file for a pattern and the lines that follow it

I have a text file (basically an error log containing dates, timestamps, and some data) in the following format:

mm/dd/yy 12:00:00:0001  
This is line 1
This is line 2

mm/dd/yy 12:00:00:0004  
This is line 3
This is line 4
This is line 5


mm/dd/yy 12:00:00:0004
This is line 6
This is line 7

I am new to Perl and need to write a script that searches the file for timestamps and merges the data that appears under the same timestamp.

For the sample above, I expect the following output:

mm/dd/yy 12:00:00:0001  
This is line 1
This is line 2

mm/dd/yy 12:00:00:0004  
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7

What is the best way to do this?

Solution

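I once had to do this on some very large files whose timestamps were not in order, and I did not want to hold everything in memory. I got the job done with a three-pass solution: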

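1. Tag each input line with its timestamp and save it to a temporary file
2. Sort the temporary file with a fast sorter such as sort(1)
3. Turn the sorted file back into the starting format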

This was fast enough for my task: I could let it run while I went for a cup of coffee. If you need the results very quickly, you may have to do more work.

use strict;
use warnings;
use File::Temp qw(tempfile);

my( $temp_fh,$temp_filename )  = tempfile( UNLINK => 1 );

# read each line, tag it with its timestamp, and write it to the temp file;
# the file is sorted and the tagging undone later.
my $current_timestamp = '';
LINE: while( <DATA> )
    {
    chomp;
    s/\s+$//; # strip trailing whitespace so identical timestamps compare equal

    if( m|^\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d\d\d\d$| ) # timestamp line
        {
        $current_timestamp = $_;
        next LINE;
        }
    elsif( m|\S| ) # line with non-whitespace (not a "blank line")
        {
        print $temp_fh "[$current_timestamp] $_\n";
        }
    else # blank lines
        {
        next LINE;
        }
    }

close $temp_fh;

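# sort the file by lines using some very fast sorter
# (sort(1) compares whole lines, so lines under one timestamp are also
# ordered by their text, not by their position in the input)
system( "sort",qw(-o sorted.txt),$temp_filename ) == 0
    or die "sort failed: $?";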

# read the sorted file and turn back into starting format
open my($in),"<",'sorted.txt' or die "Could not read sorted.txt: $!";

$current_timestamp = '';
while( <$in> )
    {
    my( $timestamp,$line ) = m/\[(.*?)] (.*)/;
    if( $timestamp ne $current_timestamp )
        {
        $current_timestamp = $timestamp;
        print $/,$timestamp,$/;
        }

    print $line,$/;
    }

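close $in;

# the variable declared above is $temp_filename; $temp_file does not
# exist and would not compile under "use strict"
unlink $temp_filename,'sorted.txt';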

__END__
01/01/70 12:00:00:0004
This is line 3
This is line 4
This is line 5

01/01/70 12:00:00:0001
This is line 1
This is line 2


01/01/70 12:00:00:0004
This is line 6
This is line 7
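
The three-pass approach above exists to avoid holding the whole log in memory. When the file is small enough to fit, a simpler option is to group the lines in a hash of arrays. The following is a minimal sketch under that assumption, not part of the original answer: the name merge_by_timestamp and the in-memory sample string are made up for illustration, and the timestamp pattern is the same one used above.

```perl
use strict;
use warnings;

# Hypothetical helper: read a log from a filehandle and return the
# merged text, one block per distinct timestamp.
sub merge_by_timestamp {
    my ($fh) = @_;
    my ( %lines_for, @order );    # data lines per timestamp, first-seen order
    my $current;

    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\s+$//;        # trailing whitespace would split equal timestamps

        if ( $line =~ m|^\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d\d\d\d$| ) {
            $current = $line;
            if ( !exists $lines_for{$current} ) {
                push @order, $current;
                $lines_for{$current} = [];
            }
        }
        elsif ( $line =~ /\S/ && defined $current ) {
            push @{ $lines_for{$current} }, $line;    # data line
        }
        # blank lines (and data before any timestamp) are skipped
    }

    # each block is "timestamp\nline\n...\n"; blocks are separated by a blank line
    return join "\n", map { join "\n", $_, @{ $lines_for{$_} }, '' } @order;
}

# Assumed sample input, mirroring the data used in the answer above.
my $sample = <<'END';
01/01/70 12:00:00:0004
This is line 3
This is line 4
This is line 5

01/01/70 12:00:00:0001
This is line 1
This is line 2


01/01/70 12:00:00:0004
This is line 6
This is line 7
END

open my $sample_fh, '<', \$sample or die "Could not open in-memory sample: $!";
print merge_by_timestamp($sample_fh);
close $sample_fh;
```

Unlike the sort-based version, this sketch emits timestamps in the order they first appear and keeps the data lines in their input order. Replacing @order with sort @order in the final map would give lexically sorted timestamps instead.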
