Identifying nominalization compounds is a difficult problem in Chinese compound recognition. The difficulty is that when a verb and a noun co-occur, the expression may be either a verb phrase or a nominalization compound. Traditional approaches to Chinese compound recognition typically rely only on statistical features drawn from a corpus, and their results are often unsatisfactory. This paper uses a Maximum Entropy model to decide whether a co-occurring verb-noun pair is a nominalization compound, adding lexical features and Web features to the baseline contextual features. The experimental results are encouraging: Precision reaches 86.31% and Recall reaches 70.00%.
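The classification setup described above can be illustrated with a minimal sketch. It assumes a binary decision (compound vs. verb phrase), for which a Maximum Entropy model is equivalent to logistic regression, and trains it with plain stochastic gradient ascent. The feature template, the `web_score` value, the hyperparameters, and the four toy samples are all hypothetical and invented for illustration; they are not the paper's features, data, or training procedure.

```python
import math

def extract_features(verb, noun, left, right, web_score):
    """Hypothetical feature template for one verb-noun candidate."""
    return {
        "bias": 1.0,
        "left=" + left: 1.0,     # contextual: token before the pair
        "right=" + right: 1.0,   # contextual: token after the pair
        "verb=" + verb: 1.0,     # lexical: the verb itself
        "noun=" + noun: 1.0,     # lexical: the noun itself
        "web_score": web_score,  # stand-in for a Web-derived statistic in [0, 1]
    }

def predict_prob(weights, feats):
    """P(compound | features) under a binary maximum entropy model."""
    z = sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=300, lr=0.1, l2=0.01):
    """Fit weights by stochastic gradient ascent on the L2-regularized log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for feats, label in samples:
            p = predict_prob(weights, feats)
            for f, v in feats.items():
                w = weights.get(f, 0.0)
                weights[f] = w + lr * ((label - p) * v - l2 * w)
    return weights

def precision_recall(gold, predicted):
    """Precision and recall for the positive (compound) class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy samples, invented for illustration: label 1 = compound, 0 = verb phrase.
TOY_SAMPLES = [
    (extract_features("学习", "方法", "的", "很", 0.9), 1),
    (extract_features("管理", "系统", "的", "是", 0.8), 1),
    (extract_features("学习", "英语", "在", "。", 0.1), 0),
    (extract_features("管理", "公司", "他", "。", 0.2), 0),
]
```

Calling `train(TOY_SAMPLES)` returns a weight dictionary; thresholding `predict_prob` at 0.5 gives predictions that `precision_recall` can score against gold labels. A production system would more likely use an existing maximum entropy or logistic regression implementation than a hand-written training loop.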