

Hot Topic Detection in Mobile Complaint Texts Based on the LDA Model

ISSN: 1003-3513
Journal: Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
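Classification: TP391 [Automation and Computer Technology / Computer Application Technology; Automation and Computer Technology / Computer Science and Technology]
Affiliations: [1] School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018; [2] China Jiliang University, Hangzhou 310018
Funding: This paper is one of the outcomes of the Young Scientists Fund of the National Natural Science Foundation of China project "A Computational Model of Chinese Metaphor Incorporating Embodied Cognition Mechanisms and Its Implementation" (Grant No. 61103101), the Young Scientists Fund of the National Natural Science Foundation of China project "Research on Automatic Chinese Sentence-Group Segmentation Algorithms Based on Markov Trees and DRT" (Grant No. 61202281), and the Ministry of Education Humanities and Social Sciences Research Youth Fund project "Research on Chinese Metaphor Computation for Information Processing" (Grant No. 10YJCZH052).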

Authors: Fang Xiaofei[1], Huang Xiaoxi[1], Wang Rongbo[1], Chen Zhiqun[1], Wang Xiaohua[1,2]

Keywords: Mobile complaints; K-means; Topic detection; LDA model

Chinese abstract (translated):

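[Objective] Apply Chinese information processing and topic detection and tracking methods to find valuable information in large volumes of mobile complaint texts. [Methods] Starting from an analysis of the characteristics of complaint texts, we first cluster the texts with k-means. We then model each cluster with LDA to extract topics, compute the weight of every word in a topic from three aspects (word frequency, word span, and word length), take the highest-weighted word as the topic's label, and compute each topic's mean document distribution probability. Among topics sharing the same label, we first remove the duplicates by keeping the topic with the largest mean. We then compute the document support rate of every remaining topic, use it as the topic's popularity, and use that popularity to distinguish hot topics from ordinary topics. [Results] Modeling the complaint texts over time and comparing ordinary and hot topics shows that the document support rate of hot topics is at least three times that of ordinary topics, and its trend over time is also higher than that of ordinary topics, which indicates that the proposed algorithm is effective. [Limitations] Semantic relationships between topics are not considered. [Conclusions] This preliminary use of the LDA model to detect topics in mobile complaints is reasonable and effective, and can serve as a reference for future research in this field.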
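
The abstract names three features for weighting the words of a topic (frequency, span, and length) but does not say how they are combined. The sketch below illustrates the labeling step under stated assumptions and is not the authors' formula: word span uses the common definition (last position - first position + 1) / document length, averaged over the documents that contain the word; frequency and length are normalized by their maxima among the candidates; and the three features are combined linearly with hypothetical equal coefficients `a`, `b`, `c`. Documents are assumed to be already word-segmented, `candidates` stands for a topic's top LDA words, and `word_span` and `label_topic` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple


def word_span(word: str, tokens: List[str]) -> float:
    """Span of `word` in one segmented document: (last - first + 1) / len(tokens).

    Assumed definition; returns 0.0 if the word does not occur.
    """
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    if not positions:
        return 0.0
    return (positions[-1] - positions[0] + 1) / len(tokens)


def label_topic(
    candidates: List[str],
    docs: List[List[str]],
    a: float = 1 / 3,  # hypothetical weight for frequency
    b: float = 1 / 3,  # hypothetical weight for span
    c: float = 1 / 3,  # hypothetical weight for word length
) -> Tuple[str, Dict[str, float]]:
    """Score a topic's candidate words and return (label, weights).

    candidates: top words of one LDA topic (assumed non-empty)
    docs: segmented documents of the k-means cluster the topic came from
    """
    freq = Counter(tok for doc in docs for tok in doc)
    max_freq = max(freq[w] for w in candidates) or 1  # avoid division by zero
    max_len = max(len(w) for w in candidates)  # Chinese word length = character count

    weights = {}
    for w in candidates:
        # Average span over the documents that actually contain the word.
        spans = [word_span(w, doc) for doc in docs if w in doc]
        span = sum(spans) / len(spans) if spans else 0.0
        weights[w] = a * freq[w] / max_freq + b * span + c * len(w) / max_len

    # The highest-weighted word becomes the topic label (ties: first candidate wins).
    label = max(candidates, key=weights.__getitem__)
    return label, weights


if __name__ == "__main__":
    docs = [
        ["流量", "扣费", "异常", "流量"],  # "data traffic", "billing", "abnormal", "data traffic"
        ["套餐", "流量", "扣费"],          # "plan", "data traffic", "billing"
        ["信号", "差"],                    # "signal", "poor"
    ]
    label, weights = label_topic(["流量", "扣费", "套餐"], docs)
    print(label)  # 流量
```

A linear combination keeps each feature's contribution easy to inspect; a product of the three features would be an equally plausible reading of the abstract.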

English abstract:

[Objective] This paper aims to extract valuable information from a large number of mobile complaint texts with the help of Chinese information processing technologies. [Methods] First, we analyzed the characteristics of the complaint texts and then clustered them with the k-means algorithm. Second, we extracted topics from the texts of each cluster with the LDA model. Meanwhile, we calculated the weight of each word in each topic, as well as the mean of each topic's document probability distribution. Third, among topics sharing the same label we kept the one with the highest mean, and used document support rates to identify the trending topics. [Results] The document support rates of the hot topics identified in this study were at least three times those of ordinary topics. [Limitations] We did not investigate the semantic relationships among the topics. [Conclusions] The LDA model is an effective method for detecting hot topics in mobile complaints and offers a reference for future studies.
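
The de-duplication and hot-topic steps summarized in the abstract can be sketched as below. The abstract does not define the document support rate or the hot/ordinary cut-off, so both are assumptions here: a document is taken to support a topic when its topic probability is at least a hypothetical threshold `tau`, the support rate is the number of supporting documents divided by the total number of documents, and `hot_threshold` is an illustrative cut-off. `Topic`, `dedupe_by_label`, `support_rate`, and `split_topics` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Topic:
    label: str              # highest-weighted word of the topic
    doc_probs: List[float]  # P(topic | doc) for each document in the topic's cluster

    @property
    def mean_prob(self) -> float:
        """Mean document distribution probability of the topic."""
        return sum(self.doc_probs) / len(self.doc_probs)


def dedupe_by_label(topics: List[Topic]) -> List[Topic]:
    """For each label, keep only the topic with the largest mean probability."""
    best: Dict[str, Topic] = {}
    for t in topics:
        if t.label not in best or t.mean_prob > best[t.label].mean_prob:
            best[t.label] = t
    return list(best.values())


def support_rate(topic: Topic, total_docs: int, tau: float = 0.5) -> float:
    """Assumed definition: share of all documents with P(topic | doc) >= tau."""
    return sum(1 for p in topic.doc_probs if p >= tau) / total_docs


def split_topics(
    topics: List[Topic], total_docs: int, hot_threshold: float = 0.25, tau: float = 0.5
) -> Tuple[List[str], List[str], Dict[str, float]]:
    """Return (hot labels, ordinary labels, support rate per label), each list by descending rate."""
    rates = {t.label: support_rate(t, total_docs, tau) for t in dedupe_by_label(topics)}
    hot = sorted((lab for lab, r in rates.items() if r >= hot_threshold), key=lambda lab: -rates[lab])
    ordinary = sorted((lab for lab, r in rates.items() if r < hot_threshold), key=lambda lab: -rates[lab])
    return hot, ordinary, rates


if __name__ == "__main__":
    topics = [
        Topic("流量", [0.9, 0.8, 0.6, 0.3]),  # mean 0.65, kept
        Topic("流量", [0.4, 0.5]),            # same label, mean 0.45, dropped
        Topic("信号", [0.7, 0.2, 0.1]),
        Topic("套餐", [0.55, 0.6]),
    ]
    hot, ordinary, rates = split_topics(topics, total_docs=10)
    print(hot, ordinary)  # ['流量'] ['套餐', '信号']
```

Counting a document as supporting every topic whose probability clears `tau`, rather than only its single dominant topic, is a choice of this sketch; either reading fits the abstract's wording.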
