Chinese Word Segmentation Based on the Alternative Covering Algorithm

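ISSN: 1000-7024
Journal: Computer Engineering and Design (计算机工程与设计)
Date: 0
Pages: 1355-1361
Language: Chinese
Classification: TP39 [Automation and Computer Technology - Computer Application Technology; Automation and Computer Technology - Computer Science and Technology]
Author affiliations: [1] School of Computer Science and Technology, Anhui University, Hefei, Anhui 230039; [2] Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei, Anhui 230039
Funding: National Natural Science Foundation of China (60773114); Key Research Fund of the Anhui Provincial Department of Education (2006kj013A); Anhui University Talent Team Construction Fund (02203105)
Related project: Research on coercion-resistant electronic voting protocols based on secure multi-party computation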

Keywords: Chinese word segmentation, cover, alternative covering algorithm, mutual information, overlapping ambiguity

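Chinese abstract:

Chinese word segmentation is a prerequisite and foundation of natural language processing. This work implements Chinese word segmentation with the alternative covering algorithm, which has good classification performance. Segmentation is treated as a character classification process: each character is placed in the context of the two adjacent characters before and after it, and its class is judged within that context. A character either stands alone, joins the preceding character, joins the following character, or joins both the preceding and following characters. The model is trained on the annotated People's Daily corpus and requires no dictionary. It handles the overlapping ambiguity problem in Chinese word segmentation well and reaches a segmentation accuracy of 90.6%.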

English abstract:

Chinese word segmentation is very important in natural language processing. Here, Chinese word segmentation is regarded as a character classification process. Each character is placed in a linguistic context that covers the four characters around it. Every character belongs to one of four categories: standing alone, connecting with the preceding character, connecting with the following character, or connecting with both the preceding and following characters. The category of each character is determined using the alternative covering algorithm, which has good classification performance. The method gathers statistics from a large annotated corpus and does not need a dictionary. It handles overlapping ambiguity well and achieves 90.6% accuracy.
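
The four-category character labeling described in the abstracts can be illustrated with a minimal Python sketch. It covers only the labeling scheme and its inverse (rebuilding words from labels), not the alternative covering classifier itself, which the record does not specify. The label names, the function names, the padding symbol, and the reading of "four characters around it" as two characters on each side are all assumptions made for illustration.

```python
# Hypothetical label names for the four categories named in the abstract.
ALONE, PREV, NEXT, BOTH = "alone", "prev", "next", "both"


def words_to_labels(words):
    """Turn a segmented sentence (list of words) into one label per character."""
    labels = []
    for word in words:
        last = len(word) - 1
        for i in range(len(word)):
            joins_prev = i > 0      # character continues a word begun earlier
            joins_next = i < last   # character is followed by more of the word
            if joins_prev and joins_next:
                labels.append(BOTH)
            elif joins_prev:
                labels.append(PREV)
            elif joins_next:
                labels.append(NEXT)
            else:
                labels.append(ALONE)
    return labels


def labels_to_words(chars, labels):
    """Rebuild words: a word ends at any character that does not join the next one."""
    words, current = [], ""
    for ch, label in zip(chars, labels):
        current += ch
        if label in (ALONE, PREV):
            words.append(current)
            current = ""
    if current:  # tolerate a label sequence that ends mid-word
        words.append(current)
    return words


def context_window(chars, i, width=2, pad="#"):
    """Return the characters from i-width to i+width, padded at sentence edges.

    width=2 is an assumption based on "four characters around it".
    """
    padded = [pad] * width + list(chars) + [pad] * width
    return padded[i:i + 2 * width + 1]


if __name__ == "__main__":
    segmented = ["中文", "分词", "是", "基础"]
    sentence = "".join(segmented)
    labels = words_to_labels(segmented)
    print(labels)
    print(labels_to_words(sentence, labels))
    print(context_window(sentence, 0))
```

In a full system, a classifier trained on the annotated corpus would predict each label from features of the context window, and `labels_to_words` would turn the predicted labels back into a segmentation.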
