
Research on Chinese Word Segmentation Based on "Stable String" Instances

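ISSN: 1003-0077
Journal: 中文信息学报 (Journal of Chinese Information Processing)
Year: 2012
Pages: 59-64
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] College of Computer Science, Beijing University of Technology, Beijing 100022; [2] Institute of Language Information Processing, Beijing Language and Culture University, Beijing 100083
Funding: National Natural Science Foundation of China (60872121)
Related project: Research on Chinese Discourse Structure Based on Generalized Topics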

Authors: 修驰 (Xiu Chi), 宋柔 (Song Rou)

Keywords: Chinese word segmentation (CWS), CRF, stable string, segmentation ambiguity, machine learning

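Chinese abstract:

In recent Chinese word segmentation research, methods based on the conditional random field (CRF) model have received wide attention. However, this approach has problems in handling ambiguous segmentation: although CRF eliminates most of the original segmentation ambiguities, it introduces more new segmentation errors. This paper attempts to find a simple machine learning method, based on instances of "stable strings", to resolve segmentation ambiguity. Experimental results show that the method resolves the original segmentation ambiguities simply and effectively, without producing more new ambiguous segmentations.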

English abstract:

Chinese word segmentation based on CRF (Conditional Random Field) has attracted the most attention in recent research. However, this method has certain defects in handling word segmentation ambiguity: it eliminates most of the original ambiguity errors at the cost of introducing more new errors. In this paper, we propose a simple, example-based machine learning method, based on stable strings, to deal with the problem of word segmentation ambiguity. The experimental results indicate that the stable-string-based method resolves the ambiguity simply and effectively, and that it does not introduce more new errors.
