Measuring semantic similarity between words is a classic and active problem in natural language processing, and its results bear on applications such as word sense disambiguation, machine translation, ontology mapping, and computational linguistics. This paper proposes a novel approach to measuring word semantic similarity by combining evidence theory with a knowledge base. First, we extract evidence from the general-purpose ontology WordNet; second, we analyze the reasonableness of the extracted evidence with scatter plots; third, we generate basic probability assignments by statistics and piecewise linear interpolation; fourth, we obtain a global basic probability assignment by integrating evidence conflict resolution, importance assignment, and the Dempster-Shafer combination rule; finally, we quantify word semantic similarity on that basis. On the R&G(65) data set, evaluated by 5-fold cross validation, the correlation between our results and human judgments reaches 0.912, which is 0.4 percentage points higher than the best existing method P&S and 7%-13% higher than the classical methods relHS, distJC, simLC, simL, and simR. The method also performs well on M&C(30) and WordSim353, with correlations of 0.915 and 0.941, respectively, and its running efficiency is comparable to that of the classical methods. These results show that using evidence theory to measure word semantic similarity is reasonable and effective.
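The abstract names WordNet as the evidence source but does not spell out the concrete features. As a minimal, hypothetical sketch (Python with NLTK; the features `path_len` and `lch_depth` and the noun-only restriction are our assumptions, not the paper's definitions), evidence extraction for a word pair might look like this:

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def wordnet_evidence(w1, w2):
    """Collect simple structural features from WordNet for a word pair.

    Hypothetical evidence: the shortest path length between any two noun
    senses, and the depth of their lowest common hypernym.
    """
    evidence = {"path_len": None, "lch_depth": None}
    for s1 in wn.synsets(w1, pos=wn.NOUN):
        for s2 in wn.synsets(w2, pos=wn.NOUN):
            d = s1.shortest_path_distance(s2)
            if d is None:
                continue
            if evidence["path_len"] is None or d < evidence["path_len"]:
                evidence["path_len"] = d
                lch = s1.lowest_common_hypernyms(s2)
                evidence["lch_depth"] = max(h.min_depth() for h in lch) if lch else 0
    return evidence

print(wordnet_evidence("car", "automobile"))  # shared synset -> path_len 0
```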
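For the basic probability assignment step, the abstract says masses are generated by statistics plus piecewise linear interpolation. Below is a sketch under assumed anchor points: the grid `xs` and the masses `ms` are invented for illustration, standing in for values that would be estimated from training statistics in the paper.

```python
import numpy as np

# Assumed anchors: raw evidence values (e.g. a WordNet path length) paired
# with masses for the "similar" hypothesis, estimated from training data.
xs = np.array([0.0, 2.0, 4.0, 8.0, 16.0])
ms = np.array([0.90, 0.70, 0.40, 0.15, 0.05])

def bpa_from_evidence(x):
    """Turn one raw evidence value into a BPA via piecewise linear interpolation."""
    m_sim = float(np.interp(x, xs, ms))
    # Leftover mass goes to the whole frame (ignorance), so masses sum to 1.
    return {frozenset({"similar"}): m_sim,
            frozenset({"similar", "dissimilar"}): 1.0 - m_sim}

print(bpa_from_evidence(3.0))  # interpolates between the x=2 and x=4 anchors
```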
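The fusion step integrates conflict handling, importance assignment, and the Dempster-Shafer combination rule. The sketch below implements the textbook versions over a two-hypothesis frame: Shafer's discounting as one common way to encode evidence importance, and Dempster's rule with the usual conflict normalization; whether the paper uses exactly these variants is an assumption.

```python
from itertools import product

FRAME = frozenset({"similar", "dissimilar"})

def discount(m, alpha):
    """Shafer discounting: scale masses by reliability alpha and move the
    remainder to the whole frame. One standard way to weight evidence."""
    out = {fs: alpha * w for fs, w in m.items()}
    out[FRAME] = out.get(FRAME, 0.0) + (1.0 - alpha)
    return out

def combine_dempster(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal elements and
    renormalize by 1 - K, where K is the mass falling on the empty set."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence (K = 1)")
    return {fs: w / (1.0 - conflict) for fs, w in combined.items()}

# Two evidence sources with different assumed reliabilities, then fused.
m1 = discount({frozenset({"similar"}): 0.8, FRAME: 0.2}, alpha=0.9)
m2 = discount({frozenset({"similar"}): 0.6, FRAME: 0.4}, alpha=0.7)
print(combine_dempster(m1, m2))  # global BPA over the frame
```

The belief (or pignistic probability) of the "similar" hypothesis under the combined global BPA can then be read off as the similarity score.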