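I need to grep complete stack traces out of a log file by keyword.
This code works correctly, but it is slow on large files (the bigger the file, the slower it gets).
I think the best fix would be to improve the regexp so that it finds the keyword itself, but I could not get that to work.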
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $regexp;
    my $stacktrace;

    undef $/;    # slurp mode: each <> read returns a whole file

    $regexp = shift;
    $regexp = quotemeta($regexp);

    while (<>) {
        while (
            $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                   (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                   (?<THREAD>.*?)\/
                   (?<CLASS>.*?)\s-\s
                   (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx
          )
        {
            $stacktrace = $&;
            if ( $+{MESSAGE} =~ /$regexp/ ) {
                print "$stacktrace";
            }
        }
    }
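Usage: ./grep_log4j.pl <pattern> <file>
Example: ./grep_log4j.pl Exception sample.log
I think the problem is in $stacktrace = $&; because if I remove that line and simply print every matched entry, the script runs quickly.
Version of the script that prints all matches: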
    #!/usr/bin/perl
    use strict;
    use warnings;

    undef $/;

    while (<>) {
        while (
            $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                   (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                   (?<THREAD>.*?)\/
                   (?<CLASS>.*?)\s-\s
                   (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx
          )
        {
            print_result();
        }
    }

    sub print_result {
        print "LEVEL: $+{LEVEL}\n";
        print "TIMESTAMP: $+{TIMESTAMP}\n";
        print "THREAD: $+{THREAD}\n";
        print "CLASS: $+{CLASS}\n";
        print "MESSAGE: $+{MESSAGE}\n";
    }
Usage: ./grep_log4j.pl <file>
Example: ./grep_log4j.pl sample.log
Log4j pattern: %-1p %d %t/%c{1} - %m%n
Sample log file:
    I 111012 141506.000 thread/class - Received message: something
    E 111012 141606.000 thread/class - Failed handling mobile request
    java.lang.NullPointerException
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
    at java.lang.Thread.run(Thread.java:619)
    W 111012 141706.000 thread/class - Received message: something
    E 111012 141806.000 thread/class - Failed with Exception
    java.lang.NullPointerException
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
    at java.lang.Thread.run(Thread.java:619)
    D 111012 141906.000 thread/class - Received message: something
    S 111012 142006.000 thread/class - Received message: something
    I 111012 142106.000 thread/class - Received message: something
    I 111013 142206.000 thread/class - Metrics:0/1
You can find my regexp on http://gskinner.com/RegExr/ by searching for the keyword log4j.
Solution
You are using:
    $/ = undef;
This makes perl read the entire file into memory.
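To make the effect of $/ concrete, here is a minimal sketch that reads the same data twice from an in-memory filehandle, once in slurp mode and once line by line. The helper name count_reads and the three sample lines are made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many reads it takes to consume $data,
# and record the length of the longest chunk returned by a single read.
sub count_reads {
    my ($data, $slurp) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
    local $/ = $slurp ? undef : "\n";    # undef = slurp, "\n" = line mode
    my ($reads, $longest) = (0, 0);
    while (my $chunk = <$fh>) {
        $reads++;
        $longest = length $chunk if length $chunk > $longest;
    }
    close $fh;
    return ($reads, $longest);
}

my $data = "I 111012 141506.000 thread/class - first\n"
         . "E 111012 141606.000 thread/class - second\n"
         . "W 111012 141706.000 thread/class - third\n";

my ($slurp_reads, $slurp_longest) = count_reads($data, 1);
my ($line_reads,  $line_longest)  = count_reads($data, 0);

printf "slurp mode: %d read(s), longest chunk %d bytes\n",
    $slurp_reads, $slurp_longest;
printf "line mode:  %d read(s), longest chunk %d bytes\n",
    $line_reads, $line_longest;
```

In slurp mode the single chunk is as large as the whole input, so the memory needed grows with the log file. In line mode the largest chunk is only as long as the longest line.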
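I would process the file line by line instead, like this (assuming the stack trace belongs to the message on the line directly above it):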
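    my $regexp = quotemeta(shift);    # keyword from the command line, as in the question
    my $matched;                      # true while inside an entry whose first line matched

    while (<>) {
        if (m/^(?<LEVEL>\S+) \s+
               (?<TIMESTAMP>\d+ \s+ [\d.]+) \s+
               (?<THREADCLASS>\S+) \s+ - \s+
               (?<REST>.*)/x)
        {
            my %captures = %+;    # copy now: the next successful match resets %+
            $matched = ($captures{REST} =~ /$regexp/);
            if ($matched) {
                print "LEVEL: $captures{LEVEL}\n";
                ...
            }
        }
        elsif ($matched) {
            print;    # continuation line (stack trace) of a matched entry
        }
    }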
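The same flag-based idea can be tried without a log file. The sketch below wraps it in a function and feeds it an in-memory sample. The name grep_entries is an assumption for illustration, and the header pattern is narrowed to the level letters and timestamp layout seen in the question's sample log.

```perl
use strict;
use warnings;

# Hypothetical helper: return every log entry (header line plus the
# continuation lines that follow it) whose header line contains $keyword.
sub grep_entries {
    my ($fh, $keyword) = @_;
    my $pattern = quotemeta $keyword;    # treat the keyword as literal text
    my $matched = 0;                     # are we inside a matching entry?
    my @out;
    while (my $line = <$fh>) {
        # Header line: level letter, 6-digit date, time with milliseconds.
        if ($line =~ /^[EWDI] \s \d{6} \s \d{6}\.\d{3} \s/x) {
            $matched = $line =~ /$pattern/ ? 1 : 0;
        }
        push @out, $line if $matched;    # header or continuation line
    }
    return @out;
}

my $log = <<'END';
I 111012 141506.000 thread/class - Received message: something
E 111012 141806.000 thread/class - Failed with Exception
java.lang.NullPointerException
at java.lang.Thread.run(Thread.java:619)
D 111012 141906.000 thread/class - Received message: something
END

open my $fh, '<', \$log or die "cannot open in-memory handle: $!";
my @hits = grep_entries($fh, 'Exception');
close $fh;

print @hits;
```

Only the header line is tested against the keyword here, so a keyword that appears solely inside the stack trace is not found. That is the assumption stated above: the trace is tied to the message on the line before it.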
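Here is a general technique for parsing multi-line blocks.
The following loop reads STDIN one line at a time and passes each complete block of the log file to the subroutine process: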
    my $first;
    my $stack = "";

    while (<STDIN>) {
        if (m/^\S /) {
            # a new entry starts here: hand over the previous one
            process($first, $stack) if $first;
            $first = $_;
            $stack = "";
        }
        else {
            $stack .= $_;
        }
    }
    process($first, $stack) if $first;    # flush the last entry

    sub process {
        my ($first, $stack) = @_;
        # ... do whatever you want here ...
    }
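As one possible way to fill in process for the original task, the sketch below keeps the same block-collecting loop but reads from any filehandle and calls a callback for each block. The callback searches the first line and the stack lines together. The name each_block, the callback, and the in-memory sample are assumptions for illustration.

```perl
use strict;
use warnings;

# Same block-collecting loop, but reading from any filehandle and calling
# a caller-supplied callback (hypothetical name) for each complete block.
sub each_block {
    my ($fh, $process) = @_;
    my $first;
    my $stack = "";
    while (my $line = <$fh>) {
        if ($line =~ m/^\S /) {          # a new entry starts here
            $process->($first, $stack) if defined $first;
            $first = $line;
            $stack = "";
        }
        else {
            $stack .= $line;             # continuation of the current entry
        }
    }
    $process->($first, $stack) if defined $first;    # flush the last block
    return;
}

my $log = <<'END';
I 111012 141506.000 thread/class - Received message: something
E 111012 141606.000 thread/class - Failed handling mobile request
java.lang.NullPointerException
at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
W 111012 141706.000 thread/class - Received message: something
END

my $keyword = quotemeta 'NullPointerException';
my @found;

open my $fh, '<', \$log or die "cannot open in-memory handle: $!";
each_block($fh, sub {
    my ($first, $stack) = @_;
    my $block = $first . $stack;
    push @found, $block if $block =~ /$keyword/;    # search header and trace
});
close $fh;

print @found;
```

Because the whole block is searched, a keyword that occurs only in the stack trace still selects the entry, and only one block is held in memory at a time.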