regex – Regular expression for identifying in-text citations

I am trying to write a regular expression that captures in-text citations.

Here are a few example sentences containing in-text citations:

… and the reported results in (Nivre et al.,2007) were not representative …

… two systems used a Markov chain approach (Sagae and Tsujii 2007).

Nivre (2007) showed that …

… for attaching and labeling dependencies (Chen et al.,2007; Dredze et al.,2007).

Currently, my regex is

\(\D*\d\d\d\d\)

which matches examples 1-3 but not example 4. How can I modify it so that it also captures example 4?

Thanks!

I recently used something like this for that purpose:

#!/usr/bin/env perl

use 5.010;
use utf8;
use strict;
use autodie;
use warnings qw< FATAL all >;
use open qw< :std IO :utf8 >;

my $citation_rx = qr{
    \( (?:
        \s*

        # optional author list
        (?: 
            # has to start capitalized
            \p{Uppercase_Letter}        

            # then have a lower case letter, or maybe an apostrophe
            (?=  [\p{Lowercase_Letter}\p{Quotation_Mark}] )

            # before a run of letters and admissible punctuation
            [\p{Alphabetic}\p{Dash_Punctuation}\p{Quotation_Mark}\s,.] +

        ) ?  # keep this "?" (the hook) only if the author list should be optional

        # a reasonable year
        \b (18|19|20) \d\d 

        # citation series suffix, up to a six-parter
        [a-f] ?         \b                 

        # trailing semicolon to separate multiple citations
        ; ?  
        \s*
    ) +
    \)
}x;

while (<DATA>) {
    while (/$citation_rx/gp) {
        say ${^MATCH};
    } 
} 

__END__
... and the reported results in (Nivré et al.,2007) were not representative ...
... two systems used a Markov chain approach (Sagae and Tsujii 2007).
Nivre (2007) showed that ...
... for attaching and labelling dependencies (Chen et al.,2007; Dreǳe et al.,2007).

When run, it produces:

(Nivré et al.,2007)
(Sagae and Tsujii 2007)
(2007)
(Chen et al.,2007; Dreǳe et al.,2007)
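
If you would rather stay close to the pattern in the question, here is a minimal sketch of a lighter alternative. It lets the "non-digits followed by a four-digit year" part repeat, with an optional semicolon between repeats, so that several citations can share one pair of parentheses. The names $simple_rx and find_citations are made up for illustration, and this pattern is far more permissive than the one above: it checks neither capitalization nor whether the year is plausible.

```perl
#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;

# Looser variant of the question's \(\D*\d\d\d\d\): one or more
# "author text, then a four-digit year" groups inside one pair of parentheses.
my $simple_rx = qr{
    \(
    (?:
        [^()\d]*    # author text: anything except digits and parentheses
        \d{4}       # four-digit year
        [a-f]?      # optional series suffix, as in 2007a
        ;?          # optional separator before the next citation
        \s*
    )+
    \)
}x;

# Hypothetical helper: return every parenthesized citation group in a string.
sub find_citations {
    my ($text) = @_;
    return $text =~ /($simple_rx)/g;
}

my @samples = (
    '... and the reported results in (Nivre et al.,2007) were not representative ...',
    '... two systems used a Markov chain approach (Sagae and Tsujii 2007).',
    'Nivre (2007) showed that ...',
    '... for attaching and labeling dependencies (Chen et al.,2007; Dredze et al.,2007).',
);

say for map { find_citations($_) } @samples;
```

On the four example sentences from the question, this prints each parenthesized group whole, including both citations of example 4 in a single match.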
