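Research on Internet-Based Recognition of Business Organization Names
  • Journal: Journal of the China Society for Scientific and Technical Information (情报学报)
  • Date: 0
  • Pages: 851-860
  • Language: Chinese
  • Classification: F716 [Economics and Management: Industrial Economics]
  • Author affiliations: [1] School of Business, Anhui University, Hefei 230039; [2] School of Management, University of Science and Technology of China, Hefei 230026; [3] School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026
  • Funding: National Natural Science Foundation of China project "Research on Ontology- and Entity-Driven Mechanisms for Acquiring Enterprise Competitive Intelligence in the Web Environment" (No. 70803001); Open Project of the National Laboratory of Pattern Recognition (No. 20090029); Youth Innovation Fund of the University of Science and Technology of China
  • Related project: Research on Ontology- and Entity-Driven Mechanisms for Acquiring Enterprise Competitive Intelligence in the Web Environment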
Chinese abstract (in translation):

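The Internet has become one of the main sources that enterprises and organizations use to obtain intelligence about their competitors. Building Web-based systems that acquire competitor intelligence automatically has therefore become an urgent need for enterprises. In such systems, recognizing business organization names is fundamental, because it provides the basis for identifying competitors and for extracting further intelligence. This paper proposes a new Internet-based method for recognizing business organization names. The method takes into account the semantic association between a business organization name and its context, and it recognizes names by combining semantic annotation with a hidden Markov model. We evaluate the proposed algorithm on a dataset of real Chinese Web pages. We also compare it, in terms of recall, precision, and F-measure, with three baselines: CHMM (organization name recognition based on cascaded hidden Markov models), MEM (organization name recognition based on a maximum entropy model), and SVM (organization name recognition based on support vector machines). The experimental results show that the proposed algorithm improves business organization name recognition and has good general applicability.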
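The abstract says that recognition combines semantic annotation with a hidden Markov model, but it gives no model details. The sketch below is a rough illustration only and rests on several assumptions. Each token is assumed to have already been mapped to a semantic class; the classes `PLACE`, `WORD`, and `ORG_SUFFIX` are hypothetical. The HMM's hidden states are assumed to be BIO-style labels for organization names. Standard Viterbi decoding then picks the most likely label sequence. All probabilities are invented toy values, not parameters from the paper, and the `viterbi` helper is hypothetical.

```python
import math

# Hypothetical toy HMM (not the paper's model): hidden states are BIO labels
# for organization names; observations are semantic annotation classes
# assigned to each token beforehand (e.g. a place name, an ordinary word,
# or an organization suffix such as "有限公司").
STATES = ("O", "B-ORG", "I-ORG")

START = {"O": 0.7, "B-ORG": 0.3, "I-ORG": 0.0}

TRANS = {
    "O":     {"O": 0.8, "B-ORG": 0.2, "I-ORG": 0.0},
    "B-ORG": {"O": 0.1, "B-ORG": 0.0, "I-ORG": 0.9},
    "I-ORG": {"O": 0.4, "B-ORG": 0.0, "I-ORG": 0.6},
}

EMIT = {
    "O":     {"PLACE": 0.1, "WORD": 0.8, "ORG_SUFFIX": 0.1},
    "B-ORG": {"PLACE": 0.6, "WORD": 0.4, "ORG_SUFFIX": 0.0},
    "I-ORG": {"PLACE": 0.1, "WORD": 0.4, "ORG_SUFFIX": 0.5},
}


def log(p):
    """Log-probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations):
    """Return the most likely hidden-state sequence for the observation symbols."""
    # best[t][s]: best log-probability of any path that ends in state s at step t
    best = [{s: log(START[s]) + log(EMIT[s][observations[0]]) for s in STATES}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in STATES:
            # Choose the predecessor state that maximizes the path score into s
            prev, score = max(
                ((p, best[t - 1][p] + log(TRANS[p][s])) for p in STATES),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(EMIT[s][observations[t]])
            back[t][s] = prev
    # Trace back from the highest-scoring final state
    last = max(STATES, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

Consider a token sequence such as "据 / 合肥 / 某某科技 / 有限公司 / 宣布", annotated as `["WORD", "PLACE", "WORD", "ORG_SUFFIX", "WORD"]`. Under these toy parameters, `viterbi` returns `["O", "B-ORG", "I-ORG", "I-ORG", "O"]`, which marks the span from the place name through the suffix as one organization name. The sketch works in log space so that long Web-page token sequences do not underflow.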

English abstract:

The Internet has become one of the major sources from which enterprises and organizations acquire competitive intelligence, and many enterprises urgently need a Web-based system for acquiring competitor intelligence. In such a system, a fundamental issue is recognizing business organization names on the Internet, because this is the basis for identifying competitors and extracting further intelligence from the Web. In this paper, we present a new approach to recognizing business organization names on the Internet. The approach considers the semantic relationship between business organization names and their context in Web pages, and it recognizes the names by integrating semantic annotation with a Hidden Markov Model (HMM). We conduct an experiment on a real dataset consisting of a large number of Chinese Web pages. We then compare the performance of our approach with three competing algorithms, CHMM, MEM, and SVM, in terms of recall, precision, and F-measure. The results show that our approach improves the recognition of business organization names. Moreover, it is a general-purpose algorithm that suits different types of business organization recognition tasks.
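The comparison is reported in terms of recall, precision, and F-measure. The following is a minimal sketch of how these metrics are commonly computed for named-entity recognition. It assumes exact matching of entity spans, given as hypothetical `(start, end)` token offsets, and the balanced F-measure F = 2PR / (P + R). The abstract does not state the paper's actual matching criterion, and the `prf` helper is hypothetical.

```python
def prf(gold, predicted):
    """Exact-match precision, recall and balanced F-measure over entity spans.

    gold, predicted: iterables of hypothetical (start, end) span tuples.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)  # spans whose boundaries match exactly
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Take three gold spans `{(1, 4), (7, 9), (12, 15)}` and two predicted spans `{(1, 4), (7, 10)}`. Only one prediction matches exactly, so precision is 0.5, recall is about 0.333, and the F-measure is about 0.4. Under this exact-match rule, a prediction with a wrong boundary counts against both precision and recall.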
