I'm having trouble converting a Perl regular expression to Python. The text I want to match has the following pattern:
Author(s) : Firstname Lastname
            Firstname Lastname
            Firstname Lastname
            Firstname Lastname
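In Perl I was able to match this and extract the authors with
/Author\(s\) :((.+\n)+?)/
When I try
re.compile(r'Author\(s\) :((.+\n)+?)')
in Python, it matches the first author twice and ignores the rest.
Can anyone explain what I'm doing wrong here?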
Solution
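On why the original pattern seems to match the first author twice, here is a minimal sketch. The sample text and the indentation of its continuation lines are assumptions. In Python's `re`, the lazy `+?` repeats the inner group as few times as possible, and since nothing follows it in the pattern, one repetition is enough. Both capture groups therefore hold the same first line, and `findall` returns them as a pair.

```python
import re

# Hypothetical sample; the indentation of the continuation lines is assumed.
sample = (
    "Author(s) : Firstname Lastname\n"
    "            Second Author\n"
    "            Third Author\n"
)

pattern = re.compile(r'Author\(s\) :((.+\n)+?)')

# findall returns one tuple per match, with one entry per capture group.
pairs = pattern.findall(sample)
print(pairs)
```

Making the quantifier greedy would not be enough on its own, because `.+\n` would then also consume any non-empty lines that follow the author block. The pattern needs something that marks where the block ends.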
You can do it like this:
# find lines with authors
import re

# multiline string to simulate possible input
text = '''
Stuff before
This won't be matched...
Author(s) : Firstname Lastname
            Firstname Lastname
            Firstname Lastname
            Firstname Lastname
Other(s) : Something else we won't match
           More shenanigans....
Only the author names will be matched.
'''

# run the regex to pull author lines from the sample input
authors = re.search(r'Author\(s\)\s*:\s*(.*?)^[^\s]', text, re.DOTALL | re.MULTILINE).group(1)
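The regex above matches the starting text (Author(s), whitespace, a colon, whitespace) and then lazily captures everything up to the next line that begins with a non-whitespace character. In effect it collects the rest of that line plus all following lines that start with whitespace, which gives the result below: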
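Both flags matter here. As a small sketch (the `sample` string is made up for illustration): `re.DOTALL` lets `.` cross newlines so the lazy group can span several lines, and `re.MULTILINE` lets `^` match at the start of every line rather than only at the start of the string. Dropping either one should leave the pattern with no match on input like this.

```python
import re

# Illustrative input; spacing is assumed.
sample = (
    "Author(s) : Firstname Lastname\n"
    "            Firstname Lastname\n"
    "Other(s) : Something else\n"
)
pattern = r'Author\(s\)\s*:\s*(.*?)^[^\s]'

# With both flags the lazy group spans lines and stops before "Other(s)".
both = re.search(pattern, sample, re.DOTALL | re.MULTILINE)
# Without DOTALL, "." cannot cross the first newline.
no_dotall = re.search(pattern, sample, re.MULTILINE)
# Without MULTILINE, "^" only matches at the very start of the string.
no_multiline = re.search(pattern, sample, re.DOTALL)

print(repr(both.group(1)))
print(no_dotall, no_multiline)
```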
'''Firstname Lastname
            Firstname Lastname
            Firstname Lastname
            Firstname Lastname
'''
You can then use the following regex to split that result into individual authors:
# grab authors from the lines
import re

authors = '''Firstname Lastname
            Firstname Lastname
            Firstname Lastname
            Firstname Lastname
'''

# run the regex to pull a list of individual authors from the author lines
authors = re.findall(r'^\s*(.+?)\s*$', authors, re.MULTILINE)
which gives the list of authors:
['Firstname Lastname', 'Firstname Lastname', 'Firstname Lastname', 'Firstname Lastname']
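If a second regex feels like overkill, the same split can probably be done with plain string methods. This is an alternative sketch rather than the approach above; `block` is a hypothetical name for the text captured in the first step.

```python
# Hypothetical captured block; the indentation of continuation lines is assumed.
block = '''Firstname Lastname
            Firstname Lastname
            Firstname Lastname
            Firstname Lastname
'''

# Strip each line and drop any that are empty after stripping.
authors = [line.strip() for line in block.splitlines() if line.strip()]
print(authors)
```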
Combined example code:
import re

text = '''
Stuff before
This won't be matched...
Author(s) : Firstname Lastname
            Firstname Lastname
            Firstname Lastname
            Firstname Lastname
Other(s) : Something else we won't match
           More shenanigans....
Only the author names will be matched.
'''

# stage 1: capture the block of author lines
stage1 = re.compile(r'Author\(s\)\s*:\s*(.*?)^[^\s]', re.DOTALL | re.MULTILINE)
# stage 2: pull one stripped name per line (raw string avoids invalid-escape warnings)
stage2 = re.compile(r'^\s*(.+?)\s*$', re.MULTILINE)

preliminary = stage1.search(text).group(1)
authors = stage2.findall(preliminary)
This sets authors to:
['Firstname Lastname', 'Firstname Lastname', 'Firstname Lastname', 'Firstname Lastname']
Success!
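One caveat: `^[^\s]` needs a later line that starts with a non-whitespace character, so the search finds nothing when the author block is the last thing in the text, and `.group(1)` then fails on `None`. Below is a hedged sketch of a wrapper that tolerates both cases. `extract_authors` is a hypothetical helper name, and the lookahead `(?=^\S|\Z)` is a substitution for the original terminator, not part of the answer's pattern.

```python
import re

# The lazy capture ends at the next line starting with non-whitespace,
# or at the end of the string if no such line exists.
BLOCK = re.compile(r'Author\(s\)\s*:\s*(.*?)(?=^\S|\Z)', re.DOTALL | re.MULTILINE)
NAME = re.compile(r'^\s*(.+?)\s*$', re.MULTILINE)

def extract_authors(text):
    """Return the list of author names, or [] if no author block is found."""
    match = BLOCK.search(text)
    if match is None:
        return []
    return NAME.findall(match.group(1))

# Author block at the very end of the input (no terminating line after it).
tail_sample = "Author(s) : Firstname Lastname\n            Firstname Lastname\n"
print(extract_authors(tail_sample))
print(extract_authors("no authors here"))
```

The lookahead also leaves the first character of the following line unconsumed, which might matter if the same text is scanned for several blocks in a row.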