Automatic Word Sense Discrimination for Chinese Adjectives

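ISSN: 1003-0077
Journal: Journal of Chinese Information Processing (《中文信息学报》)
Date: 0
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Institute of Computational Linguistics, Peking University, Beijing 100871; [2] Key Laboratory of Computational Linguistics (Ministry of Education), Peking University, Beijing 100871
Funding: National 973 Program project (2004CB318102); National Natural Science Foundation of China project (60775031); National Social Science Foundation of China project (08BYY060); Special Fund for Authors of National Excellent Doctoral Dissertations (200514)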

Keywords: computer application, Chinese information processing, knowledge acquisition, word sense discrimination, feature selection, EM algorithm

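Chinese abstract (in translation):

Word sense knowledge acquisition is the foundation and starting point for tasks such as building word sense knowledge bases and word sense disambiguation. At present this work relies almost entirely on the wisdom and insight of human experts, so meaning computation over large-scale text lacks objectivity and consistency. Taking mid- and high-frequency Chinese adjectives as samples, this paper mines word sense features in depth and uses an iterative EM algorithm with a parameter initialization step to automatically discover and discriminate word senses in real text. The word sense discrimination algorithm uses easily obtained word-form features, collocation features drawn from a large-scale corpus, and attribute-host relation features drawn from web corpora, in place of the syntactic structure features that earlier work found hard to obtain. It further uses HowNet to optimize the selection of word-form features. The work can be applied in areas such as information retrieval, can help revise and supplement existing dictionaries, and the approach can be extended to other Chinese word classes.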

English abstract:

Lexical knowledge acquisition is the bottleneck for many tasks, such as word sense disambiguation and lexical knowledge base construction. This paper introduces an automatic word sense discrimination method for Chinese mid- and high-frequency adjectives. We employ the EM algorithm and exploit Chinese character features, contextual bag-of-words features, and host-attribute pair features instead of less reliable syntactic information. We further optimize the selection of morphological features by utilizing HowNet. The experimental results show that the discovered word sense distinctions differ from those in existing Chinese lexicons and could be used for lexicon modification and expansion; the approach could also be extended to other types of Chinese words.
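
To make the idea of EM-based word sense discrimination with an initialization step more concrete, here is a minimal sketch. It is not the paper's implementation: the record above does not give the model, the feature extraction, or the initialization procedure, so the sketch assumes a generic mixture-of-multinomials model over bag-of-features contexts, and it assumes initialization from small hand-picked seed feature sets. The function name `em_sense_clusters`, the parameters `n_iter` and `alpha`, and the English-glossed toy features are all hypothetical.

```python
import math
from collections import Counter


def em_sense_clusters(contexts, seeds, n_iter=20, alpha=0.1):
    """Soft-cluster the contexts of one target word into len(seeds) senses.

    contexts: list of feature lists, one per occurrence of the target word.
    seeds:    list of feature sets, one per sense, used only for initialization
              (an assumption; the paper's own initialization is not described here).
    Returns (labels, responsibilities).
    """
    k = len(seeds)
    n = len(contexts)
    vocab = sorted({f for c in contexts for f in c})
    counts = [Counter(c) for c in contexts]

    # Initialization: soft assignments proportional to 1 + overlap with each seed set.
    resp = []
    for c in counts:
        scores = [1.0 + sum(c[f] for f in seed) for seed in seeds]
        total = sum(scores)
        resp.append([s / total for s in scores])

    for _ in range(n_iter):
        # M-step: re-estimate smoothed sense priors and per-sense feature distributions.
        prior = [(sum(r[j] for r in resp) + alpha) / (n + alpha * k) for j in range(k)]
        cond = []
        for j in range(k):
            weighted = {}
            for r, c in zip(resp, counts):
                for f, cnt in c.items():
                    weighted[f] = weighted.get(f, 0.0) + r[j] * cnt
            denom = sum(weighted.values()) + alpha * len(vocab)
            cond.append({f: (weighted.get(f, 0.0) + alpha) / denom for f in vocab})

        # E-step: posterior probability of each sense given each context (in log space).
        new_resp = []
        for c in counts:
            logp = [
                math.log(prior[j]) + sum(cnt * math.log(cond[j][f]) for f, cnt in c.items())
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            z = sum(unnorm)
            new_resp.append([u / z for u in unnorm])
        resp = new_resp

    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, resp


# Toy data: contexts of a hypothetical two-sense adjective ("hard": difficult vs. solid).
contexts = [
    ["problem", "exam"],
    ["exam", "question"],
    ["problem", "question"],
    ["rock", "surface"],
    ["surface", "wood"],
    ["rock", "wood"],
]
seeds = [{"problem"}, {"rock"}]  # one seed feature per assumed sense
labels, resp = em_sense_clusters(contexts, seeds)
print(labels)
```

In this sketch the seed sets only bias the starting point; contexts that contain no seed feature, such as `["exam", "question"]`, are pulled toward a sense through features they share with other contexts. In a setting like the one the abstract describes, the feature lists would presumably hold word-form, collocation, and attribute-host features rather than toy glosses.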
