This paper studies paraphrase collocation extraction, using the verb-object ("OBJ") relation as a case study. Specifically, the proposed method recasts paraphrase collocation extraction as a binary classification problem and combines multiple features based on translation, a thesaurus, polarity words, and web mining. Experimental results show that the binary classification approach is effective for extracting paraphrase collocations, and that each of the features helps improve extraction performance. With the proposed method, more than 280,000 pairs of paraphrase collocations are extracted, with a precision above 70%. Further experiments show that the extracted paraphrase collocations make it possible to generate paraphrases for about 40% of sentences, which demonstrates the practical value of the method.
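To make the binary-classification framing concrete, the following is a minimal sketch rather than the paper's implementation. It assumes each candidate pair of verb-object collocations has already been scored on four features, one per feature family named above. The `CandidatePair` class, the feature values, the toy English examples, and the choice of logistic regression as the classifier are all illustrative assumptions; the abstract does not specify how the features are computed or which classifier is used.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidatePair:
    """A candidate pair of verb-object collocations (hypothetical container)."""
    left: tuple    # (verb, object)
    right: tuple   # (verb, object)
    # Assumed feature scores in [0, 1], one per feature family in the abstract.
    translation: float
    thesaurus: float
    polarity: float
    web: float

    def features(self):
        return [self.translation, self.thesaurus, self.polarity, self.web]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(pair, weights, bias):
    """Estimated probability that the two collocations are paraphrases."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, pair.features())))


def train(pairs, labels, epochs=2000, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * 4
    bias = 0.0
    for _ in range(epochs):
        for pair, y in zip(pairs, labels):
            err = score(pair, weights, bias) - y  # gradient of log loss w.r.t. the logit
            weights = [w - lr * err * v for w, v in zip(weights, pair.features())]
            bias -= lr * err
    return weights, bias


def is_paraphrase(pair, weights, bias, threshold=0.5):
    return score(pair, weights, bias) >= threshold


# Toy training data: invented purely for illustration, not taken from the paper.
pairs = [
    CandidatePair(("solve", "problem"), ("resolve", "issue"), 0.9, 0.8, 1.0, 0.7),
    CandidatePair(("raise", "price"), ("increase", "price"), 0.8, 0.9, 1.0, 0.6),
    CandidatePair(("solve", "problem"), ("cause", "problem"), 0.2, 0.1, 0.0, 0.3),
    CandidatePair(("raise", "price"), ("raise", "child"), 0.3, 0.2, 1.0, 0.1),
]
labels = [1, 1, 0, 0]  # 1 = paraphrase collocations, 0 = not

weights, bias = train(pairs, labels)
predictions = [is_paraphrase(p, weights, bias) for p in pairs]
```

In this framing every candidate pair gets an independent yes/no decision, so adding or removing a feature family only changes the length of the feature vector. That makes it straightforward to measure how much each family contributes, which is the kind of comparison the abstract reports.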