Research on the LCS-Based Chinese Abbreviation Field Matching Problem

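ISSN: 1002-4026
Journal: 山东科学 (Shandong Science)
Date: 0
Pages: 52-56
Language: Chinese
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Information Center, People's Procuratorate of Shandong Province, Jinan, Shandong 250014; [2] School of Computer Science and Technology, Shandong Economic University, Jinan, Shandong 250014
Funding: National Natural Science Foundation of China (60603077); Shandong Provincial Natural Science Foundation, Youth Fund (Q2007G04); Shandong Provincial Department of Education Research Program (J07YJ11)
Related project: Research on B-spline surface representation over meshes of arbitrary topology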

Authors: Xing Xiaohui (邢晓辉), Liu Hui (刘慧)

Keywords: information retrieval, Chinese abbreviation field match, longest common subsequence, word segmentation

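Chinese abstract (translated):

Given the importance of Chinese field matching in information retrieval and the increasingly complex demands placed on retrieval, this paper is the first to propose and implement a Chinese abbreviation field matching model based on the longest common subsequence (LCS). The model avoids the cumbersome word segmentation step and simplifies the field matching process. Experiments on a subset of web pages from the CWT100G dataset show that the method performs fairly stably and retrieves well, and that for longer abbreviation fields in particular it outperforms the traditional word segmentation model based on string matching.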

English abstract:

We present and implement, for the first time, a Longest Common Subsequence (LCS) based Chinese abbreviation field match model, motivated by the significance of Chinese field matching in information retrieval and by increasingly complicated search demands. The model avoids the cumbersome word segmentation step and simplifies the field matching process. Experiments on a subset of web pages from the CWT100G dataset show that the approach is stable in performance and gives good retrieval results, and that it is superior to the traditional string-match-based word segmentation model, especially for longer Chinese abbreviation fields.
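
The abstract does not spell out the matching model, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an abbreviation matches a full field when every character of the abbreviation appears in the field in the same order, which is equivalent to the LCS length being equal to the abbreviation length. The function names `lcs_length`, `abbreviation_score`, and `matches`, and the `threshold` parameter, are hypothetical and introduced here for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)  # DP row for the previous character of a
    for ch_a in a:
        curr = [0]  # column 0: empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common subsequence ending before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip one character from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def abbreviation_score(abbr: str, field: str) -> float:
    """Fraction of the abbreviation's characters covered by the LCS.

    Hypothetical scoring rule, assumed for this sketch only.
    """
    if not abbr:
        return 0.0
    return lcs_length(abbr, field) / len(abbr)


def matches(abbr: str, field: str, threshold: float = 1.0) -> bool:
    """True when the score reaches the threshold (1.0 = full in-order coverage)."""
    return abbreviation_score(abbr, field) >= threshold


if __name__ == "__main__":
    for abbr, field in [("北大", "北京大学"), ("中科院", "中国科学院"), ("北大", "清华大学")]:
        print(abbr, field, abbreviation_score(abbr, field), matches(abbr, field))
```

Because the comparison works character by character, no word segmentation of either string is needed, which is the simplification the abstract highlights. Lowering `threshold` below 1.0 would tolerate abbreviations with a character that does not occur in the full field, at the cost of more false matches.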
