根据
perlre,以下代码应该花几秒钟执行:
$time perl -E '$_="((()" . ("a") x 18; say "ok" if m{ \(([^()]+|\( [^()]* \))+\)}x;' real 0m0.006s user 0m0.000s sys 0m0.005s
文件说:
Consider how the pattern above detects no-match on
((()aaaaaaaaaaaaaaaaaa
in several seconds,but that each extra
letter doubles this time.
如图所示,我的笔记本电脑只需要几分之一秒.
即使在百万分之一的情况下运行也会在半秒内完成:
$time perl -E '$_="((()" . ("a") x 1000000; say "ok" if m{ \(([^()]+|\( [^()]* \))+\)}x;' real 0m0.454s user 0m0.446s sys 0m0.008s
我在这里想念的是什么?
解决方法
你可以用来弄清楚正则表达式引擎正在做什么的一个技巧是:
use re 'debug';
例如.:
use strict; use warnings; use re 'debug'; my $str = "a" x 18; $str =~ m{ \(([^()]+|\( [^()]* \))+\)}x;
然后,这将打印正则表达式引擎实际执行的操作:
Compiling REx " \(([^()]+|\( [^()]* \))+\)" Final program: 1: EXACT <(> (3) 3: CURLYX[0] {1,32767} (40) 5: OPEN1 (7) 7: BRANCH (20) 8: PLUS (37) 9: ANYOF[^()][{above_bitmap_all}] (0) 20: BRANCH (FAIL) 21: EXACT <(> (23) 23: STAR (35) 24: ANYOF[^()][{above_bitmap_all}] (0) 35: EXACT <)> (37) 37: CLOSE1 (39) 39: WHILEM[1/3] (0) 40: NOTHING (41) 41: EXACT <)> (43) 43: END (0) anchored "(" at 0 floating ")" at 2..2147483647 (checking floating) minlen 3 Matching REx " \(([^()]+|\( [^()]* \))+\)" against "aaaaaaaaaaaaaaaaaa" Intuit: trying to determine minimum start position... doing 'check' fbm scan,[2..18] gave -1 Did not find floating substr ")"... Match rejected by optimizer Freeing REx: " \(([^()]+|\( [^()]* \))+\)"
如果你重新添加括号,你会得到不同的结果 – 我有大约2000步来处理正则表达式.每增加一个字母 – 大约300步,这会变得更长.
所以我会说是 – 灾难性的回溯正在发生,但你可能会发现处理器(以及正则表达式引擎优化)意味着时间要短得多.
use strict; use warnings; use re 'debug'; my $str = "((()"."a" x 100_000; $str =~ m{ \(([^()]+|\( [^()]* \))+\)}x;
运行时间要长得多 – 但至少有一部分是因为在屏幕上打印文本实际上相当“昂贵”.
我的测试表明(“a”的数量)
10 : 1221 lines of output (broadly correlated with regex steps) 11 : 1324 (+103) 12 : 1467 (+143) 13 : 1590 (+129) 14 : 1728 (+138) 15 : 1852 (+124) 20 : 2630 (approx +155/extra) 30 : 4536 (+190/extra) 40 : 6940 (+240/extra) 50 : 9846 (+290/extra) 100 - 31,846 (+440/extra letter)
所以肯定看起来像是指数行为 – 但在这两种情况下,处理时间仍然相当快,我将其归因于更快的cpus(并且可能更好地优化正则表达式引擎)