Traditional topic mining models generally extract only document-level topics from interactive text. To also mine session topics and to make the mining model more generally applicable, this paper proposes a session topic generation model for interactive text based on dialogue content. First, by analyzing the characteristics of interactive text and building on the concept of a topic tree, a dialogue generation tree with a five-layer structure is defined. On this basis, a session topic generation model (ST-LDA) is constructed on top of LDA. Finally, Gibbs sampling is used to perform inference for ST-LDA, yielding the session topics and their probability distributions. Validation on real data shows that ST-LDA can effectively mine session topics from interactive text. In addition, the results can reduce the complexity of classification algorithms and allow topic-participant associations to be traced back, and the approach shows good general applicability.
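For orientation, the inference step mentioned above can be illustrated with a minimal sketch of collapsed Gibbs sampling for standard LDA. This is not the ST-LDA model itself: the five-layer dialogue generation tree and any session-level variables are omitted because the abstract does not specify them. The function name `lda_gibbs`, the hyperparameter defaults, and the integer word-id input format are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for plain LDA (illustrative; not ST-LDA).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic distributions and
    per-topic word distributions, estimated from the final sample.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                              # n_k
    assignments = []

    # Random initialisation of the topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
         for w in range(vocab_size)]
        for t in range(num_topics)
    ]
    return theta, phi
```

Calling `lda_gibbs([[0, 1, 0, 1], [2, 3, 2, 3]], num_topics=2, vocab_size=4)` returns one topic distribution per document and one word distribution per topic. A session-level model in the spirit of ST-LDA would presumably add further count tables for the extra layers, but that structure is not described here.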