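I'm trying to parse some traffic-violation sentences with pyparsing. When I use grammar.searchString(sentence) it works fine, but when I use parseString it throws a ParseException. Can anyone help me and tell me what's wrong with my code?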
from pyparsing import Or,Literal,oneOf,OneOrMore,nums,alphas,Regex,Word,\
SkipTo,LineEnd,originalTextFor,Optional,ZeroOrMore,Keyword,Group
import pyparsing as pp
from nltk.tag import pos_tag
sentences = ['Failure to control vehicle speed on highway to avoid collision','Failure to stop at stop sign','Introducing additives into special fuel by unauthorized person and contrary to regulations','driver fail to stop at yield sign at nearest pointf approaching traffic view when req. for safety','Operating unregistered motor vehicle on highway','Exceeding maximum speed: 39 MPH in a posted 30 MPH zone']
for sentence in sentences:
    words = pos_tag(sentence.split())
    # print(words)
    verbs = [word for word, pos in words if pos in ['VB', 'VBD', 'VBG']]
    nouns = [word for word, pos in words if pos == 'NN']
    adjectives = [word for word, pos in words if pos == 'JJ']
    adjectives.append('great')  # initializing: keeps the list non-empty
    verbs.append('get')  # initializing: keeps the list non-empty
    object_generator = oneOf('for to')
    location_generator = oneOf('at in into on onto over within')
    speed_generator = oneOf('MPH KM/H')
    noun = oneOf(nouns)
    adjective = oneOf(adjectives)
    location = location_generator + pp.Group(Optional(adjective) + noun)
    action = oneOf(verbs)
    speed = Word(nums) + speed_generator
    grammar = action | location | speed
    parsed = grammar.parseString(sentence)
    print(parsed)
Error traceback:
Traceback (most recent call last):
  File "script3.py", line 35, in <module>
    parsed = grammar.parseString(sentence)
  File "/Users/alana/anaconda/lib/python2.7/site-packages/pyparsing.py", line 1032, in parseString
    raise exc
pyparsing.ParseException: Expected Re:('control|avoid|get') (at char 0), (line:1, col:1)
Best answer
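searchString works because it skips over any text that doesn't match your grammar. parseString is more particular: the grammar has to match starting at the very first character of the input string (and, if you pass parseAll=True, it must also consume the whole string). In your case the grammar is a little hard to pin down, because it is generated automatically from NLTK's analysis of each input sentence (an interesting approach, by the way). Just printing the grammar itself may give you some insight into what strings it is looking for. For example, I'd guess that NLTK tags 'Failure' in your first example as a noun, but none of the 3 expressions in your grammar starts with a noun, so parseString fails. The traceback reflects this: at char 0 the parser was looking for one of the verbs (control, avoid, get) and found 'Failure' instead.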
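To see the difference in isolation, here is a minimal sketch that does not depend on NLTK. The word lists are hypothetical stand-ins for the ones your script builds from the tagger output; the point is only that searchString finds matches in the middle of the sentence, while parseString gives up at char 0 because 'Failure' fits none of the alternatives.

```python
import pyparsing as pp

# Hypothetical fixed vocabularies standing in for the NLTK-derived lists
action = pp.oneOf("control avoid get")
location = pp.oneOf("at in into on onto over within") + pp.oneOf("highway sign")
grammar = action | location

sentence = "Failure to control vehicle speed on highway to avoid collision"

# searchString tries the grammar at every position and keeps each hit
found = [list(match) for match in grammar.searchString(sentence)]
print(found)

# parseString must match at the very start of the string
failed_at = None
try:
    grammar.parseString(sentence)
except pp.ParseException as err:
    failed_at = err.loc  # index where matching gave up
print("parseString failed at char", failed_at)
```

With these lists the first print should show three hits ('control', 'on highway', 'avoid') and the second should report char 0, the same position as in your traceback.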
You may want to print the lists of nouns, adjectives, and verbs that NLTK found, and then look at how they map onto the grammar you generate.
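A sketch of that diagnosis, assuming a hypothetical tagging of the first sentence (the tags NLTK actually assigns may differ, so check your own `words` output). It buckets the words the same way as the question's loop, prints the lists and the resulting grammar, then uses `matches(..., parseAll=False)` to ask each of the three alternatives whether it can match at the start of the sentence.

```python
import pyparsing as pp

# Hypothetical pos_tag() output for the first sentence; real NLTK tags may differ
tagged = [("Failure", "NN"), ("to", "TO"), ("control", "VB"),
          ("vehicle", "NN"), ("speed", "NN"), ("on", "IN"),
          ("highway", "NN"), ("to", "TO"), ("avoid", "VB"),
          ("collision", "NN")]
sentence = " ".join(word for word, _ in tagged)

# Same bucketing as the question's loop, including the placeholder words
verbs = [w for w, pos in tagged if pos in ("VB", "VBD", "VBG")] + ["get"]
nouns = [w for w, pos in tagged if pos == "NN"]
adjectives = [w for w, pos in tagged if pos == "JJ"] + ["great"]
print("verbs:", verbs)
print("nouns:", nouns)
print("adjectives:", adjectives)

# Rebuild the three alternatives and print the combined grammar
action = pp.oneOf(verbs)
location = pp.oneOf("at in into on onto over within") + pp.Group(
    pp.Optional(pp.oneOf(adjectives)) + pp.oneOf(nouns))
speed = pp.Word(pp.nums) + pp.oneOf("MPH KM/H")
grammar = action | location | speed
print("grammar:", grammar)

# Can any alternative match at the start of the sentence?
starts = {name: expr.matches(sentence, parseAll=False)
          for name, expr in [("action", action), ("location", location),
                             ("speed", speed)]}
print("first word:", tagged[0], starts)
```

Under this assumed tagging 'Failure' lands in the noun list, and all three alternatives report False, which is exactly why parseString cannot get started.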
You could also try using Python's built-in sum() to combine the results of multiple matches within a sentence:
grammar = action("action") | Group(location)("location") | Group(speed)("speed")
# parsed = grammar.parseString(sentence)  # fails unless the sentence starts with a match
parsed = sum(grammar.searchString(sentence))  # merge all matches into one ParseResults
print(parsed.dump())
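One caveat with sum(): when searchString() finds nothing, sum() returns its default start value 0, and calling .dump() on an int raises AttributeError. A self-contained sketch (again with hypothetical fixed word lists in place of the NLTK-derived ones, and a hypothetical helper name all_matches) passes an empty ParseResults as the start value so that every sentence yields a ParseResults:

```python
import pyparsing as pp

# Hypothetical fixed vocabularies in place of the NLTK-derived lists
action = pp.oneOf("control avoid get")
location = pp.oneOf("at in into on onto over within") + pp.oneOf("highway sign")
speed = pp.Word(pp.nums) + pp.oneOf("MPH KM/H")
grammar = action("action") | pp.Group(location)("location") | pp.Group(speed)("speed")


def all_matches(sentence):
    """Concatenate every searchString() hit into a single ParseResults."""
    # Starting sum() from an empty ParseResults instead of the default 0
    # means a sentence with no matches still returns an object with .dump()
    return sum(grammar.searchString(sentence), pp.ParseResults())


for s in ["Failure to control vehicle speed on highway to avoid collision",
          "Exceeding maximum speed: 39 MPH in a posted 30 MPH zone",
          "Failure to stop at stop sign"]:
    print(all_matches(s).asList())
```

For these three sentences the sketch should print the 'control' / 'on highway' / 'avoid' matches, the two speed groups, and an empty list for the sentence none of the hypothetical vocabulary covers.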