perl - How do I extract all quotations from a text?

I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.

Example 1:

echo '"HAL," noted Frank, "said that everything was going extremely well."' | SimpleGrepSedPerlOrPythonOneLiner

stdout:

"HAL,"
"said that everything was going extremely well."

Example 2:

cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner

stdout:

"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"

etc.

(link to the corresponding text).

Solution

I like this one:

perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'

It's a little verbose, but it handles escaped quotes and backtracking better than the simplest implementation. Here is what it says:

my $re = qr{
   "               # Begin it with literal quote
   ( 
     (?>           # prevent backtracking once the alternation has been
                   # satisfied. It either agrees or it does not. This expression
                   # only needs one direction, or we fail out of the branch

         [^"\\]    # a character that is not a dquote or a backslash
     |   \\+       # OR if a backslash, then any number of backslashes followed by
         [^"]      # something that is not a quote
     |   \\        # OR again a backslash
         (?>\\\\)* # followed by any number of *pairs* of backslashes (as units)
         "         # and a quote
     )*            # any number of *set* qualifying phrases
  )                # all batched up together
  "                # Ended by a literal quote
}x;
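Since the question also allows Python, here is a rough Python sketch of the same idea. It is not a literal port of the Perl pattern: it swaps in the more common "ordinary character, or backslash plus any one character" form, which does not rely on atomic groups (Python's `re` only gained `(?>...)` in version 3.11). The names `QUOTED` and `extract_quotes` are made up for this illustration.

```python
import re

# Escape-aware pattern for double-quoted strings.
# Assumption: a backslash always escapes the single character after it.
QUOTED = re.compile(r'''
    "               # opening quote
    (               # capture the body of the quotation
      (?:
          [^"\\]    # a character that is neither a quote nor a backslash
        | \\.       # or a backslash followed by any one character
      )*
    )
    "               # closing quote
''', re.VERBOSE | re.DOTALL)


def extract_quotes(text):
    """Return the body of every double-quoted string found in text."""
    return QUOTED.findall(text)


if __name__ == "__main__":
    sample = r'He said "a \"nested\" word" and then "plain".'
    for body in extract_quotes(sample):
        print(f'"{body}"')
```

The two alternatives inside the group can never start on the same character, so the engine has nothing to retry at each step; that is the property the atomic group enforces explicitly in the Perl version.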

If you don't need that much power (say the text is likely to be plain dialogue rather than structured quoted strings), then

/"([^"]*)"/

probably works about as well as anything else.
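For comparison, a minimal Python sketch of the simple pattern, applied to the text from Example 1. The helper names `simple_quotes` and `print_quotes` are hypothetical. Unlike `perl -ne`, which works line by line, this reads the whole input at once, so a quotation that spans several lines would also be matched.

```python
import io
import re
import sys

# Simplest form: everything between one double quote and the next.
SIMPLE = re.compile(r'"([^"]*)"')


def simple_quotes(text):
    """Return the body of every "..." pair; backslash escapes are not honoured."""
    return SIMPLE.findall(text)


def print_quotes(stream, out):
    """Read all text from stream and write each quotation on its own line."""
    for body in simple_quotes(stream.read()):
        out.write(f'"{body}"\n')


if __name__ == "__main__":
    demo = io.StringIO(
        '"HAL," noted Frank, "said that everything was going extremely well."\n'
    )
    print_quotes(demo, sys.stdout)
```

To use it as a filter in a pipeline, `print_quotes(sys.stdin, sys.stdout)` should be enough.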
