Drawing on linguistic knowledge such as semantics and syntax, a statistical parsing model based on dependency relations is proposed, and parsing experiments are conducted with the refined model. The results show the following. Clustering words by dependency relations and mutual information addresses the model's data sparseness problem. The model can take several kinds of semantic dependency relations into account at the same time. Because it is a lexicalized parsing model, it can be combined with word segmentation and part-of-speech (POS) tagging to perform parsing. The problems that arise in probabilistic context-free grammar from the context-freeness assumption on probabilities and from the ancestor-node independence assumption are effectively resolved in this model. The model achieves a precision of 86.96% and a recall of 85.25%, and its combined F-measure is 4.75% higher than that of the head-driven parsing model introduced by Collins.
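For readers who want to relate the reported precision and recall to a single combined score, the following minimal sketch computes the F-measure as a weighted harmonic mean. It assumes the balanced form (beta = 1), which the abstract does not state explicitly; the function name `f_measure` and the `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F_beta).

    Assumption: beta = 1 gives the balanced F-measure; the abstract
    does not say which weighting the authors used.
    """
    if precision <= 0 or recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_f = f_measure(86.96, 85.25)
print(f"F = {reported_f:.2f}")  # prints "F = 86.10"
```

Under this assumption the combined score is about 86.10%. The abstract does not specify whether the 4.75% improvement over the Collins model is measured in absolute percentage points or as a relative gain, so the baseline value is not derived here.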