People increasingly use the Internet (review sites, forums, discussion groups, blogs) to express their experiences and opinions on almost any subject. This user-generated content contains valuable sentiment information. Mining it automatically and efficiently is a challenging problem with promising applications in areas such as enterprise business intelligence and public opinion analysis. Text sentiment analysis, a technology that aims to mine and analyze the sentiment information contained in text, extends and deepens traditional topic detection and tracking (TDT) research and offers it new ideas and methods. Its foundation is the computation of word semantic orientation. This paper proposes a scalable framework for computing word semantic orientation, in which the computation is formulated as an optimization problem. As an instance of the framework, the authors first build an undirected word graph using several word similarity measures, then partition the graph with an objective function based on the 'minimum cut' criterion, and solve the resulting optimization problem with simulated annealing. Experimental results show that the framework is reasonable and that the algorithm outperforms existing counterparts.
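The pipeline described above (word graph, minimum-cut objective, simulated annealing) can be illustrated with a minimal sketch. It is not the authors' implementation. The similarity weights are invented toy values, and the seed words with fixed polarity, the geometric cooling schedule, and all function and parameter names are assumptions made for illustration. Seeds are assumed because an unconstrained minimum cut would trivially place every word on one side.

```python
import math
import random


def cut_weight(weights, labels):
    """Sum the weights of edges whose endpoints carry different labels."""
    return sum(w for (u, v), w in weights.items() if labels[u] != labels[v])


def anneal_min_cut(words, weights, seeds, t_start=1.0, t_end=1e-3,
                   cooling=0.95, steps_per_temp=50, rng=None):
    """Approximate a minimum cut of the word graph by simulated annealing.

    words   -- list of vertex names
    weights -- dict mapping (word, word) pairs to similarity weights
    seeds   -- dict of words with fixed polarity (+1 or -1); an assumption
    """
    rng = rng or random.Random(0)
    free = [w for w in words if w not in seeds]
    labels = dict(seeds)
    for w in free:
        labels[w] = rng.choice([1, -1])      # random initial polarity
    cost = cut_weight(weights, labels)
    best_labels, best_cost = dict(labels), cost
    if not free:
        return best_labels, best_cost

    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            w = rng.choice(free)
            labels[w] = -labels[w]           # propose flipping one word
            new_cost = cut_weight(weights, labels)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, and accept
            # worse moves with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best_labels, best_cost = dict(labels), cost
            else:
                labels[w] = -labels[w]       # reject: undo the flip
        t *= cooling                         # geometric cooling (assumed)
    return best_labels, best_cost


# Toy graph with invented similarity values, for illustration only.
words = ["good", "excellent", "pleasant", "bad", "terrible", "awful"]
weights = {
    ("good", "excellent"): 0.9, ("good", "pleasant"): 0.8,
    ("excellent", "pleasant"): 0.7, ("bad", "terrible"): 0.9,
    ("bad", "awful"): 0.8, ("terrible", "awful"): 0.7,
    ("pleasant", "awful"): 0.1, ("excellent", "terrible"): 0.1,
}
seeds = {"good": 1, "bad": -1}

labels, cost = anneal_min_cut(words, weights, seeds)
print(labels, cost)
```

In this sketch a word's semantic orientation is simply the side of the cut it ends up on. Recomputing the full cut weight at every step keeps the code short; a practical version would update the cost incrementally from the edges incident to the flipped word.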