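As network technology has developed, the analysis of online public opinion has attracted growing attention. The text classification and clustering techniques long used in online public opinion analysis take the word as the smallest unit of analysis, which makes the relationships between words hard to capture. This paper introduces the core of a public opinion analysis system: extracting the intrinsic features of a text on the basis of a concept network, which can effectively improve the accuracy of online public opinion analysis. Concept disambiguation is used to map a text onto concepts in the concept network, sememes serve as the smallest units for expressing a concept, and statistical methods select the set of highly weighted sememes as the intrinsic features of the text.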
With the development of networks, the analysis of public opinion is receiving more and more attention. Text classification and clustering techniques are conventionally implemented at the word level, which cannot express the relationships between different words. Working within a concept network can effectively improve the accuracy of public opinion analysis. This paper proposes a method based on a concept network for extracting the core features of a text. Words are mapped to concepts through a series of steps, including concept mapping and concept disambiguation, and similarity is then computed over sememes, the basic units from which concepts are composed. Finally, the sememes whose similarity exceeds a threshold are taken as the core features of the text.
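The pipeline outlined in the abstract (map words to concepts, disambiguate, then keep the highest-scoring sememes) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `LEXICON`, the overlap-based disambiguation rule, the use of relative frequency as the sememe score, and the function names `disambiguate` and `core_features`.

```python
from collections import Counter

# Hypothetical toy lexicon: each word maps to its candidate concepts,
# and each concept is represented as a set of sememes.
LEXICON = {
    "bank": [frozenset({"institution", "finance"}), frozenset({"land", "water"})],
    "loan": [frozenset({"finance", "lend"})],
    "interest": [frozenset({"finance", "money"}), frozenset({"emotion", "attention"})],
    "rate": [frozenset({"quantity", "finance"})],
}


def disambiguate(word, context):
    """Pick the candidate concept whose sememes overlap most with the context.

    Assumed rule: count the sememes of every candidate concept of the
    other context words, then score each candidate of `word` by those counts.
    """
    context_sememes = Counter()
    for other in context:
        if other != word:
            for concept in LEXICON.get(other, []):
                context_sememes.update(concept)
    return max(LEXICON[word], key=lambda c: sum(context_sememes[s] for s in c))


def core_features(words, threshold):
    """Return the sememes whose relative frequency is at least `threshold`."""
    known = [w for w in words if w in LEXICON]
    counts = Counter()
    for w in known:
        counts.update(disambiguate(w, known))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: n / total for s, n in counts.items() if n / total >= threshold}


if __name__ == "__main__":
    text = ["bank", "loan", "interest", "rate"]
    print(core_features(text, threshold=0.3))
```

In this toy setting the financial sense of "bank" is selected because its sememes recur among the neighbouring words, and only the sememe shared by all four words passes the threshold. A real system would replace the dictionary with an actual concept network and the frequency score with whatever weighting or similarity measure the method defines.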