Electronic supplementary material: The online version of this article (doi:10.1007/s11390-008-9125-z) contains supplementary material, which is available to authorized users.
Document subjectivity analysis has become an important aspect of web text content mining. The problem is similar to traditional text categorization, so many related classification techniques can be adapted to it. There is one significant difference, however: more linguistic or semantic information is required to better estimate the subjectivity of a document. In this paper we therefore focus on two aspects: how to extract useful and meaningful language features, and how to construct appropriate language models efficiently for this special task. For the first issue, we apply a Global-Filtering and Local-Weighting strategy to select and evaluate language features among n-grams of different orders and within various distance windows. For the second issue, we adopt Maximum Entropy (MaxEnt) modeling to construct our language model framework. Besides the classical MaxEnt models, we also construct two kinds of improved models, with Gaussian and exponential priors respectively. Detailed experiments reported in this paper show that, with well-selected and well-weighted language features, MaxEnt models with exponential priors are significantly better suited to the text subjectivity analysis task.
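The idea of collecting n-gram features of different orders within a distance window can be sketched as follows. This is a minimal illustration only: the function name and the exact window semantics (the leading token plus any `order - 1` companions drawn from the next `window - 1` positions) are assumptions, not the paper's Global-Filtering and Local-Weighting procedure itself.

```python
from collections import Counter
from itertools import combinations

def window_ngrams(tokens, order, window):
    """Collect n-grams of the given order whose member tokens all lie
    within a distance window of the leading token; non-contiguous
    (skip-gram-style) combinations are allowed inside the window."""
    feats = Counter()
    n = len(tokens)
    for i in range(n):
        if order == 1:
            feats[(tokens[i],)] += 1
            continue
        # candidate companion positions inside the window after i
        span = range(i + 1, min(n, i + window))
        for combo in combinations(span, order - 1):
            feats[(tokens[i],) + tuple(tokens[j] for j in combo)] += 1
    return feats
```

With `window=2` this reduces to ordinary contiguous bigrams; widening the window adds longer-distance pairs, which is the kind of trade-off the feature-selection stage would then filter and weight.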
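For the modeling side, a binary MaxEnt classifier is equivalent to logistic regression, and the two priors correspond to familiar penalties: a Gaussian prior yields an L2 penalty, while an exponential prior yields an L1-style penalty. The sketch below shows this correspondence under those standard assumptions; it is not the paper's implementation, and the hyperparameters are illustrative.

```python
import numpy as np

def maxent_nll_grad(w, X, y, prior="gaussian", alpha=0.1):
    """Negative log-likelihood and gradient of a binary MaxEnt
    (logistic-regression) model, with a penalty derived from the prior:
    Gaussian prior -> L2 penalty; exponential prior -> L1-style penalty."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))          # model probabilities
    nll = -np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y)
    if prior == "gaussian":
        nll += 0.5 * alpha * np.sum(w ** 2)
        grad += alpha * w
    elif prior == "exponential":
        nll += alpha * np.sum(np.abs(w))
        grad += alpha * np.sign(w)
    return nll, grad

def fit(X, y, prior="gaussian", alpha=0.1, lr=0.1, steps=500):
    """Plain gradient descent on the penalized objective."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        _, g = maxent_nll_grad(w, X, y, prior, alpha)
        w -= lr * g
    return w
```

The exponential-prior penalty drives uninformative feature weights toward zero more aggressively than the Gaussian one, which is one plausible reason such models could fit a sparse subjectivity feature space better.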