Research on Feature Text Selection in Entity Disambiguation

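ISSN: 1672-9722
Journal: 《计算机与数字工程》 (Computer & Digital Engineering)
Date: 0
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: National University of Defense Technology, Changsha 410073
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61472436, 61532001, 61303190).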

Keywords: entity disambiguation, feature text, data cleaning

Abstract (translated from the Chinese):

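In entity disambiguation, feature text is the text fed into an entity disambiguation system to represent entity mentions and candidate entities, and its quality has an important effect on disambiguation performance. This paper studies the selection of feature text. Targeting the characteristics of web text, it jointly considers special characters in the text, the position of the feature text, whether the feature text contains the entity mention, and the length of individual sentences. The text is filtered and processed on this basis to produce feature text, with the aim of improving disambiguation. The selection method is validated on two entity ranking models, the Deep Structured Semantic Model (DSSM) and the Vector Similarity Model (VSM). The results show that feature text filtering improves ranking accuracy on DSSM, with gains of 12.2%, 12.3% and 12.2% on P@3, P@5 and P@10, respectively. Among the individual steps, special-character processing yields a 5.5% improvement on VSM. The experiments indicate that reasonable filtering and cleaning of feature text helps improve the candidate entity ranking step of entity disambiguation.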
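The four selection factors named in the abstract (special characters, position, presence of the mention, sentence length) can be illustrated with a minimal sketch. The abstract gives no thresholds, character sets, or function names, so everything below (`clean_special`, `select_feature_text`, `MIN_LEN`, `MAX_LEN`, `MAX_POSITION`, and the regular expression) is a hypothetical assumption for illustration, not the authors' implementation.

```python
import re

# Hypothetical thresholds: the abstract does not state concrete values.
MIN_LEN = 10       # shortest sentence (in characters) worth keeping
MAX_LEN = 200      # longest sentence (in characters) worth keeping
MAX_POSITION = 10  # only consider the first N sentences of a document

# Assumed definition of "special character": anything that is not a word
# character, whitespace, or common punctuation.
_SPECIAL = re.compile(r"""[^\w\s.,;:!?'"()-]""")


def clean_special(sentence: str) -> str:
    """Replace special characters with spaces, then collapse whitespace."""
    return re.sub(r"\s+", " ", _SPECIAL.sub(" ", sentence)).strip()


def select_feature_text(sentences, mention, min_len=MIN_LEN,
                        max_len=MAX_LEN, max_position=MAX_POSITION):
    """Return cleaned sentences that pass all four assumed filters."""
    selected = []
    for position, raw in enumerate(sentences):
        if position >= max_position:          # position filter
            break
        sentence = clean_special(raw)         # special-character handling
        if mention.lower() not in sentence.lower():
            continue                          # must contain the mention
        if not (min_len <= len(sentence) <= max_len):
            continue                          # single-sentence length filter
        selected.append(sentence)
    return selected


example_sentences = [
    "Michael Jordan ★★ played for the Chicago Bulls.",
    "Click here >>> to share!",
    "Jordan",
    "He also owns a team.",
]
feature_text = select_feature_text(example_sentences, "Jordan")
```

In this example only the first sentence survives: the second and fourth lack the mention, and the third is shorter than the assumed minimum length.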

English abstract:

In an entity disambiguation task, feature text is the input to the entity disambiguation system that represents the mentioned entity and the candidate entities. The quality of the feature text affects entity disambiguation performance. Feature text selection for web text is studied with respect to special tokens, text location, whether the text contains the mention, and sentence length. Experiments are conducted on DSSM (Deep Structured Semantic Model) and VSM (Vector Similarity Model). Results on DSSM show increases of 12.2%, 12.3% and 12.2% on P@3, P@5 and P@10, respectively. Special token preprocessing increased VSM precision by 5.5%. Feature text selection helps semantic understanding in entity disambiguation.
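The abstract reports P@3, P@5 and P@10 without defining them. A minimal sketch follows, assuming that P@k is the fraction of mentions whose gold entity appears among the top k ranked candidates. This is a common reading for entity ranking, but it is an assumption here, and the function name `precision_at_k` is hypothetical.

```python
def precision_at_k(ranked_candidates, gold_entities, k):
    """Fraction of mentions whose gold entity is in the top-k candidates.

    ranked_candidates: one list per mention, best candidate first.
    gold_entities: the correct entity for each mention, in the same order.
    """
    if len(ranked_candidates) != len(gold_entities):
        raise ValueError("need one gold entity per ranked candidate list")
    if not gold_entities:
        return 0.0
    hits = sum(
        1
        for candidates, gold in zip(ranked_candidates, gold_entities)
        if gold in candidates[:k]
    )
    return hits / len(gold_entities)


# Toy data: two mentions, each with three ranked candidates.
rankings = [["a", "b", "c"], ["x", "y", "z"]]
gold = ["b", "z"]
p_at_1 = precision_at_k(rankings, gold, 1)
p_at_2 = precision_at_k(rankings, gold, 2)
p_at_3 = precision_at_k(rankings, gold, 3)
```

Under this definition the metric cannot decrease as k grows, so a gain of similar size at each of P@3, P@5 and P@10 would suggest that the correct entity moves up within the top few positions.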
