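This paper proposes a basic method for automatically discovering collocation patterns of connectives. A large-scale corpus is built and word-segmented, and the connectives in it are tagged automatically and then proofread manually. Three key parameters of connective collocation are evaluated (collocation distance, collocation strength as an MI value, and collocation strength as a Z value), and thresholds are set; any pattern that exceeds the thresholds is automatically taken as a candidate collocation pattern. In experiments, the tagging accuracy was 88.75%, indicating that the method works well. Using this method, a large number of previously unnoticed syntactic collocation patterns were discovered, which actively promotes the development of a high-quality connective knowledge base and is of considerable significance for the automatic syntactic and semantic analysis of compound sentences.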
This paper presents a method for the automatic discovery of collocation patterns of connectives. A large corpus was built and segmented with an automatic Chinese word segmentation system, and its connectives were tagged and manually proofread. Thresholds were set, and collocations whose parameters exceeded them were taken as candidate collocation patterns. The tagging accuracy was 88.75%, which indicates that the method is feasible. Many syntactic patterns were discovered in this research, which will promote the building of a high-quality knowledge base of connectives and is of vital significance for the automatic syntactic and semantic analysis of compound sentences.
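The abstract does not give the formulas or threshold values, so the following is only a minimal sketch of how the three parameters might be computed over a segmented corpus. It assumes the textbook corpus-linguistics definitions, MI = log2(O/E) and Z = (O - E)/sqrt(E), with O the number of sentences in which connective a precedes connective b and E = f(a)·f(b)/N the count expected by chance over N sentences. The function names `collocation_stats` and `candidate_patterns`, the threshold defaults, and the reading of the distance threshold as an upper bound are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def collocation_stats(sentences, connectives):
    """Compute distance, MI and Z for ordered connective pairs.

    sentences: list of token lists (already word-segmented).
    connectives: set of tokens tagged as connectives.
    Formulas are assumed textbook definitions, not the paper's own.
    """
    n = len(sentences)
    single = Counter()          # sentences containing each connective
    pair = Counter()            # sentences where a precedes b
    dist = defaultdict(list)    # token distance of first a..b occurrence

    for tokens in sentences:
        hits = [(i, t) for i, t in enumerate(tokens) if t in connectives]
        single.update({t for _, t in hits})
        seen = set()
        for (i, a), (j, b) in combinations(hits, 2):
            if a == b or (a, b) in seen:
                continue        # count each ordered pair once per sentence
            seen.add((a, b))
            pair[(a, b)] += 1
            dist[(a, b)].append(j - i)

    stats = {}
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n
        stats[(a, b)] = {
            "freq": observed,
            "mean_distance": sum(dist[(a, b)]) / len(dist[(a, b)]),
            "mi": math.log2(observed / expected),
            "z": (observed - expected) / math.sqrt(expected),
        }
    return stats


def candidate_patterns(stats, min_mi=1.0, min_z=2.0, max_distance=10):
    """Keep pairs passing all thresholds (default values are placeholders)."""
    return sorted(
        p for p, s in stats.items()
        if s["mi"] >= min_mi and s["z"] >= min_z
        and s["mean_distance"] <= max_distance
    )


# Toy corpus for illustration only; not data from the paper.
corpus = [
    ["因为", "下雨", "所以", "取消"],
    ["因为", "生病", "所以", "请假"],
    ["虽然", "很", "累", "但是", "开心"],
    ["他", "来", "了"],
    ["所以", "走", "吧"],
    ["天气", "很", "好"],
]
tagged = {"因为", "所以", "虽然", "但是"}
for pattern in candidate_patterns(collocation_stats(corpus, tagged), min_z=1.0):
    print(pattern)
```

Counting each ordered pair once per sentence keeps O, f(a), f(b) and N on the same unit (sentences), so the expected count is comparable with the observed one. On a real corpus the thresholds would need to be tuned against the manually proofread annotations.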