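A Review of Text Topic Identification Methods Based on Graph Mining (基于图挖掘的文本主题识别方法研究综述)
  • ISSN: 1001-8867
  • Journal: Journal of Library Science in China (《中国图书馆学报》)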
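  • Classification: G252.8 [Culture and Science: Library Science]
  • Author affiliation: [1] National Science Library, Chinese Academy of Sciences, Beijing 100190
  • Funding: One of the research outputs of the National Natural Science Foundation of China project "Research on Text Topic Centrality Computation Methods Based on Language Networks" (No. 61075047)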
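Chinese abstract (English translation):

Based on a literature survey, this paper groups graph-mining-based methods for text topic identification into three types: centrality methods, cohesive subgraph detection, and graph clustering. Cohesive subgraph detection is further divided into methods based on cliques and methods based on quasi-cliques; graph clustering is further divided into methods that cluster on graph topology and methods that cluster on node attributes. Centrality methods identify text topics by comparing the importance of term nodes in the text network, whereas cohesive subgraph detection and graph clustering identify the core topics of a text from the attribute similarity of the term nodes and edges in the text graph. Given the characteristics of language text networks, several questions need further study: how to construct a complex text relation graph that reveals the syntactic, co-occurrence, and semantic relations among terms at the same time; how to identify cohesive subgroups in that graph from term associations and graph topology; and by what criteria those cohesive subgroups should be clustered to reveal the core topics of a text.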
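
As a rough illustration of the centrality approach described above, the sketch below builds a term co-occurrence graph and ranks terms by degree centrality. It is a minimal example under stated assumptions, not a method taken from the paper: the function names `build_cooccurrence_graph` and `degree_centrality`, the sentence-level co-occurrence window, and the toy sentences are all hypothetical, and degree centrality stands in for whichever centrality measure a given study actually uses.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Link two terms whenever they appear in the same sentence (assumed window)."""
    graph = defaultdict(set)
    for terms in sentences:
        # Deduplicate terms within a sentence, then connect every pair.
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Made-up input: each inner list holds the terms extracted from one sentence.
sentences = [
    ["graph", "mining", "topic"],
    ["topic", "centrality"],
    ["graph", "topic", "clustering"],
]

graph = build_cooccurrence_graph(sentences)
scores = degree_centrality(graph)

# Rank terms from most to least central; ties are broken alphabetically.
ranked = sorted(scores, key=lambda term: (-scores[term], term))
print(ranked)
```

In this toy input, "topic" co-occurs with every other term and therefore ranks first. A real pipeline would also need term extraction and, most likely, edge weights.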

English abstract:

With the development of the internet, the volume of electronic text is growing rapidly. These text resources, especially scientific journal papers, contain rich semantic and link information. How to present their core topics quickly and accurately, so as to assist researchers and improve research efficiency, has become an urgent issue in text mining. The nodes and edges of a graph can represent the terms of a text and the relations between them, so many researchers have tried to combine graph mining with natural language processing to identify text themes. This paper surveys and analyzes these studies and summarizes their advantages and disadvantages in order to provide a reference for further research. At present, the studies focus on representing text as a relation graph, and on theme identification based on centrality, on cohesive subgraph detection, or on graph clustering. Theme identification based on cohesive subgraph detection mainly recognizes clique or quasi-clique subgraphs that represent the core content of the text. Theme identification based on graph clustering uses two kinds of methods: one relies on the graph's topological structure, and the other considers topological structure and node attributes simultaneously. We mainly analyzed the clustering models, the algorithms, and the evaluation criteria for clustering results. Methods based on frequency statistics and external dictionaries are relatively mature and are often used as benchmarks. Centrality methods have improved greatly, but their algorithmic efficiency still needs work. Methods based on graph mining have already shown advantages and are worth deeper exploration. The language network of a text has its own characteristics. Various relations exist between terms, for example co-occurrence, syntactic, and semantic relations. How to construct a complex text network that reveals all of these relations at the same time is one direction for future research.
Further studies need to address how to identify cohesive subgraph in complex text network according to relations between terms a
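
To make the clique-based variant of cohesive subgraph detection more concrete, the sketch below enumerates maximal cliques in a small term graph using the basic Bron-Kerbosch recursion. This is an illustrative assumption rather than an algorithm the surveyed studies are confirmed to use: the function name `maximal_cliques` and the toy `term_graph` are hypothetical, and quasi-clique methods, which relax the requirement that every pair of terms be linked, are not covered.

```python
def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    `graph` maps each term to the set of terms it is linked to (undirected).
    """
    cliques = []

    def expand(current, candidates, excluded):
        # No candidates and no excluded nodes left: `current` is maximal.
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for node in list(candidates):
            expand(
                current | {node},
                candidates & graph[node],
                excluded & graph[node],
            )
            # Mark `node` as processed so the same clique is not reported twice.
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return cliques


# Made-up undirected term graph (adjacency sets).
term_graph = {
    "graph": {"mining", "topic", "clustering"},
    "mining": {"graph", "topic"},
    "topic": {"graph", "mining", "centrality", "clustering"},
    "centrality": {"topic"},
    "clustering": {"graph", "topic"},
}

# Largest cliques first; each clique is a candidate "core topic" term group.
cliques = sorted(maximal_cliques(term_graph), key=lambda c: (-len(c), c))
print(cliques)
```

The basic recursion is exponential in the worst case, so larger term graphs would likely call for pivoting or a dedicated graph library.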

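Journal information
  • Journal of Library Science in China (《中国图书馆学报》)
  • PKU Core Journal (2014 edition)
  • Supervising body: Ministry of Culture of the People's Republic of China
  • Sponsors: Library Society of China; National Library of China
  • Editor-in-chief: Han Yongjin (韩永进)
  • Address: 33 Zhongguancun South Street, Haidian District, Beijing
  • Postal code: 100034
  • Email: jlis.cn@nlc.gov.cn
  • Telephone: 010-88545141
  • International standard serial number: ISSN 1001-8867
  • Domestic unified serial number: CN 11-2746/G2
  • Postal distribution code: 2-408
  • Awards: National Periodical Award (Top 100 Key Periodicals); National Outstanding Library Science Journal
  • Indexed in: Chinese Humanities and Social Sciences Core Journals; PKU Core Journals (2000, 2004, 2008, 2011, and 2014 editions); journals funded by the National Social Science Fund of China; National Philosophy and Social Sciences Academic Journal Database
  • Citations: 41018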