The low coverage of Chinese FrameNet leaves many lexical units in Chinese sentences unregistered (unknown), which severely restricts frame-semantic analysis for Chinese. To identify frames for unknown lexical units, this paper draws on the word-sense information in Tongyici CiLin and proposes two methods: one based on average semantic similarity and one based on a maximum entropy (ME) model. Both use feature selection that combines static and dynamic features. Experiments show that both methods can effectively select frames for unknown lexical units: the similarity-based method achieves 78.61% accuracy when the Top-4 candidates are considered, while the ME-based method achieves a Top-1 accuracy of 87.29% on the same test set and 75% on a separate set of news texts.
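The average-similarity idea can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that word similarity is the fraction of shared hierarchy levels between two five-level Tongyici CiLin codes, and that a frame's score is the mean similarity between the unknown word and the frame's known lexical units. The function names, the toy lexicon, and the codes below are all hypothetical.

```python
from typing import Dict, List, Tuple

# End offsets of the five hierarchy levels in a CiLin code such as "Aa01A01=".
# The trailing marker character is ignored in this sketch.
LEVEL_ENDS = (1, 2, 4, 5, 7)


def code_similarity(a: str, b: str) -> float:
    """Fraction of leading hierarchy levels shared by two codes (assumed measure)."""
    shared = 0
    for end in LEVEL_ENDS:
        if a[:end] != b[:end]:
            break
        shared += 1
    return shared / len(LEVEL_ENDS)


def word_similarity(w1: str, w2: str, cilin: Dict[str, List[str]]) -> float:
    """Best similarity over all sense-code pairs; 0.0 if either word has no code."""
    return max(
        (code_similarity(a, b) for a in cilin.get(w1, []) for b in cilin.get(w2, [])),
        default=0.0,
    )


def rank_frames(
    target: str,
    frames: Dict[str, List[str]],
    cilin: Dict[str, List[str]],
    k: int = 4,
) -> List[Tuple[str, float]]:
    """Return the Top-k frames by average similarity to their lexical units."""
    scores = []
    for frame, units in frames.items():
        if not units:
            continue  # a frame without known lexical units cannot be scored
        avg = sum(word_similarity(target, u, cilin) for u in units) / len(units)
        scores.append((frame, avg))
    # Highest score first; ties broken by frame name for determinism.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:k]


# Hypothetical toy data: the codes are made up for illustration only.
toy_cilin = {
    "purchase": ["He03A01="],
    "buy": ["He03A01="],
    "acquire": ["He03A05="],
    "walk": ["Fb01A01="],
    "run": ["Fb01B01="],
}
toy_frames = {
    "Commerce_buy": ["buy", "acquire"],
    "Self_motion": ["walk", "run"],
}

top = rank_frames("purchase", toy_frames, toy_cilin, k=4)
print(top[0][0])  # best-scoring frame for the unknown word "purchase"
```

Returning a ranked Top-k list rather than a single frame mirrors the way the similarity-based result is reported (Top-4), where a prediction counts as correct if the true frame appears among the candidates.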