I am using the Stanford dependency parser from Python 3 to parse sentences; it returns a dependency graph.
import pickle
from nltk.parse.stanford import StanfordDependencyParser
parser = StanfordDependencyParser('stanford-parser-full-2015-12-09/stanford-parser.jar','stanford-parser-full-2015-12-09/stanford-parser-3.6.0-models.jar')
sentences = ["I am going there","I am asking a question"]
with open("save.p","wb") as f:
    pickle.dump(parser.raw_parse_sents(sentences),f)
It raises an error:
AttributeError: Can't pickle local object 'DependencyGraph.__init__.
I would like to know whether the dependency graph can be saved, with or without pickle.
Best answer
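The error message points at a local object created inside DependencyGraph.__init__, and pickle cannot serialize a function that is defined inside another function. Below is a minimal sketch of that failure mode. The Graph class is a hypothetical stand-in, not NLTK's actual implementation; it only assumes that the nodes live in a defaultdict whose default factory is a lambda.

```python
import pickle
from collections import defaultdict


class Graph:
    """Hypothetical stand-in for illustration, not NLTK's DependencyGraph."""

    def __init__(self):
        # The lambda is created inside __init__, so pickle sees a "local object".
        self.nodes = defaultdict(lambda: {"word": None, "deps": []})


def try_pickle(obj):
    """Return "ok" if obj can be pickled, otherwise the error text."""
    try:
        pickle.dumps(obj)
        return "ok"
    except (AttributeError, pickle.PicklingError) as exc:
        return f"{type(exc).__name__}: {exc}"


graph = Graph()
graph.nodes[1]["word"] = "going"

failure = try_pickle(graph)   # fails because of the local lambda
plain = dict(graph.nodes)     # a plain dict carries no default factory
restored = pickle.loads(pickle.dumps(plain))

print(failure)
print(restored == plain)
```

Plain data (dicts, lists, strings) pickles without trouble, which is why exporting the graph to a text format is a workable way around the error.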
Continuing from the instructions to get a parsed output:
1. Output the DependencyGraph in CoNLL format and write it to a file
(see "What is CoNLL data format?" and "What does the dependency-parse output of TurboParser mean?")
$ export STANFORDTOOLSDIR=$HOME
$ export CLASSPATH=$STANFORDTOOLSDIR/stanford-parser-full-2015-12-09/stanford-parser.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-12-09/stanford-parser-3.6.0-models.jar
$ python
>>> from nltk.parse.stanford import StanfordDependencyParser
>>> dep_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> sent = "The quick brown fox jumps over the lazy dog."
>>> output = next(dep_parser.raw_parse(sent))
>>> type(output)
<class 'nltk.parse.dependencygraph.DependencyGraph'>
>>> with open('sent.conll', 'w') as fout:
...     fout.write(output.to_conll(4))  # 4-column style assumed here; to_conll also accepts 3 or 10
2. Read the CoNLL file into a DependencyGraph in NLTK
>>> from nltk.parse.dependencygraph import DependencyGraph
>>> output = DependencyGraph.load('sent.conll')  # Loads any CoNLL file, which may contain one or more sentences.
>>> output  # list of DependencyGraphs
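The two steps above can be sketched end to end without the Stanford jars. This is only an illustration: the two parses are hand-written 4-column CoNLL strings (word, tag, head, relation) for the sentences from the question, not real parser output, and the temporary file path and the choice of the 4-column style are assumptions. With the real parser the graphs would come from raw_parse_sents instead.

```python
import os
import tempfile

from nltk.parse.dependencygraph import DependencyGraph

# Hand-written parses for illustration only (word, tag, head, relation).
conll_sentences = [
    "I\tPRP\t3\tnsubj\nam\tVBP\t3\taux\ngoing\tVBG\t0\tROOT\nthere\tRB\t3\tadvmod\n",
    "I\tPRP\t3\tnsubj\nam\tVBP\t3\taux\nasking\tVBG\t0\tROOT\n"
    "a\tDT\t5\tdet\nquestion\tNN\t3\tdobj\n",
]
graphs = [DependencyGraph(s) for s in conll_sentences]

# Save: each to_conll(4) block already ends with a newline, so joining the
# blocks with "\n" leaves exactly one blank line between sentences.
path = os.path.join(tempfile.mkdtemp(), "sent.conll")
with open(path, "w") as fout:
    fout.write("\n".join(g.to_conll(4) for g in graphs))

# Load: DependencyGraph.load returns one graph per blank-line-separated block.
loaded = DependencyGraph.load(path)

round_trip_ok = [g.to_conll(4) for g in loaded] == [g.to_conll(4) for g in graphs]
print(len(loaded), round_trip_ok)
print(loaded[0].root["word"])
```

The file is plain text, so it can also be inspected or edited by hand, which pickle would not allow.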
Note that the output of StanfordParser is an nltk.tree.Tree, not a DependencyGraph (mentioned here in case someone posts a similar question about trees).
$ export STANFORDTOOLSDIR=$HOME
$ export CLASSPATH=$STANFORDTOOLSDIR/stanford-parser-full-2015-12-09/stanford-parser.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-12-09/stanford-parser-3.6.0-models.jar
$ python
>>> from nltk.parse.stanford import StanfordParser
>>> parser = StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> list(parser.raw_parse("the quick brown fox jumps over the lazy dog"))
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']), Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])])]
>>> output = list(parser.raw_parse("the quick brown fox jumps over the lazy dog"))
>>> type(output[0])
<class 'nltk.tree.Tree'>
For an nltk.tree.Tree, you can output it as a bracketed parse string and read that string back into a Tree object:
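>>> from nltk import Tree
>>> output[0]
Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']), Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])])
>>> str(output[0])
'(ROOT\n  (NP\n    (NP (DT the) (JJ quick) (JJ brown) (NN fox))\n    (NP\n      (NP (NNS jumps))\n      (PP (IN over) (NP (DT the) (JJ lazy) (NN dog))))))'
>>> parsed_sent = str(output[0])
>>> type(parsed_sent)
<class 'str'>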
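The string round trip for a Tree can be sketched as follows. To keep the example runnable without the Stanford jars, the tree is built from the bracketed parse shown above rather than from the parser; the temporary file name is an assumption made for illustration.

```python
import os
import tempfile

from nltk import Tree

# Bracketed parse copied from the str(output[0]) result above.
bracketed = (
    "(ROOT (NP (NP (DT the) (JJ quick) (JJ brown) (NN fox)) "
    "(NP (NP (NNS jumps)) (PP (IN over) (NP (DT the) (JJ lazy) (NN dog))))))"
)
tree = Tree.fromstring(bracketed)

# Save the tree as its bracketed string.
path = os.path.join(tempfile.mkdtemp(), "sent.mrg")
with open(path, "w") as fout:
    fout.write(str(tree))

# Read the string back and rebuild the Tree object.
with open(path) as fin:
    restored = Tree.fromstring(fin.read())

same = restored == tree
leaves = restored.leaves()
print(same)
print(leaves)
```

Tree.fromstring accepts the multi-line form that str() produces, so the saved file needs no post-processing before it is read back.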