A Hybrid Word Segmentation Algorithm Based on a Pinyin First-Letter Index

ISSN: 1003-3254
Journal: Computer Systems & Applications (《计算机系统应用》)
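Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Computer Science, Central China Normal University, Wuhan 430079; [2] School of Computer Science, Hubei University of Technology, Wuhan 430068
Funding: Ministry of Education Social Science Fund (13YJAZH117); National Social Science Fund (14BYY093)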

Keywords: Chinese word segmentation, Pinyin index, bidirectional matching, ambiguity resolution

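Abstract (translated from the Chinese):

Automatic Chinese word segmentation underlies web text mining and other Chinese information processing applications, and the rapid growth of those applications places higher demands on segmentation technology. This paper proposes a new segmentation algorithm, FPLS. Its dictionary uses the first letter of a word's Pinyin as the first-level index of the word table and the word's character count as the second-level index. The algorithm applies bidirectional matching and introduces rules to resolve ambiguous segmentations. Compared with existing fast segmentation algorithms, it offers high segmentation efficiency and high accuracy.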

English abstract:

Chinese automatic segmentation is the basis of web text mining and other Chinese information processing applications. The rapid growth of these applications places higher requirements on Chinese automatic segmentation. This paper presents a new segmentation algorithm, FPLS, whose dictionary uses the first letter of a word's Pinyin as the first-level index and the word's character count as the second-level index. The algorithm employs a bidirectional matching method and rules to resolve segmentation ambiguity. Compared with existing fast segmentation algorithms, FPLS achieves higher accuracy and efficiency.
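
The abstract's two-level index and bidirectional matching can be illustrated with a minimal Python sketch. This is not the paper's FPLS implementation. The tiny `PINYIN_INITIAL` table, all function names, the `max_len` parameter, and the tie-breaking rule in `segment` are assumptions made for illustration, since the abstract does not state the actual disambiguation rules.

```python
from collections import defaultdict

# Hypothetical toy table: character -> first letter of its Pinyin.
# A real system would derive this from a complete Pinyin table.
PINYIN_INITIAL = {"研": "y", "究": "j", "生": "s", "命": "m", "起": "q", "源": "y"}


def build_index(words):
    """Two-level index: Pinyin initial of the first character -> word length -> words."""
    index = defaultdict(lambda: defaultdict(set))
    for w in words:
        index[PINYIN_INITIAL[w[0]]][len(w)].add(w)
    return index


def in_dict(index, word):
    """Look a candidate up by first-letter bucket, then by length bucket."""
    letter = PINYIN_INITIAL.get(word[0])
    if letter is None:
        return False
    return word in index.get(letter, {}).get(len(word), ())


def forward_match(index, text, max_len):
    """Forward maximum matching; unknown single characters pass through."""
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or in_dict(index, piece):
                out.append(piece)
                i += n
                break
    return out


def backward_match(index, text, max_len):
    """Backward maximum matching, scanning from the end of the text."""
    out, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            piece = text[j - n:j]
            if n == 1 or in_dict(index, piece):
                out.append(piece)
                j -= n
                break
    return out[::-1]


def segment(index, text, max_len=4):
    """Run both directions; if they disagree, apply a placeholder rule.

    Assumed rule (not from the paper): prefer the result with fewer words,
    and fall back to the backward result on a tie.
    """
    fwd = forward_match(index, text, max_len)
    bwd = backward_match(index, text, max_len)
    if fwd == bwd:
        return fwd
    return fwd if len(fwd) < len(bwd) else bwd
```

With the toy dictionary `["研究", "研究生", "生命", "起源"]`, forward matching splits 研究生命起源 as 研究生 / 命 / 起源, backward matching gives 研究 / 生命 / 起源, and the assumed rule returns the backward result. Bucketing by first letter and then by length means each candidate lookup touches only words that could possibly match.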

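Project associated with this paper

Research on Automatic Identification of Relation Words in Chinese Compound Sentences Based on Dependency Syntax Trees

Journal papers: 4

Journal papers from the same project

Automatic Labeling of Relation Categories in Two-Clause Non-Saturated Marked Compound Sentences

Dependency Tree Feature Analysis of Relation Words in Chinese Compound Sentences

An Overlapping Community Detection Algorithm for Weighted Networks Based on Joint Attraction Degree Expansion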

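Journal information

Computer Systems & Applications (《计算机系统应用》)
China Science and Technology Core Journal

Supervising body: Chinese Academy of Sciences
Sponsor: Institute of Software, Chinese Academy of Sciences
Editor-in-chief: Su Zhenze (苏振泽)
Address: P.O. Box 8718, Beijing
Postal code: 100190
Email: csa@iscas.ac.cn
Phone: 010-62661041

International standard serial number (ISSN): 1003-3254
Domestic serial number (CN): 11-2854/TP
Postal distribution code: 82-558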

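Indexed in domestic and international databases:
Index Copernicus (Poland); Cambridge Scientific Abstracts (USA); China Science and Technology Core Journals (China); Peking University Core Journals, 2000 edition (China)

Times cited: 15201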