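With the rise of computational sociology, using data mining to analyze social sentiment has become a recent research focus. Current research mainly targets modern text, and sentiment analysis of short texts such as ancient poetry is comparatively rare. This paper proposes CATL-PCO, a transfer learning model based on short-text feature extension, which analyzes the sentiment of poems to gain further insight into the society and culture of their time. The model first extends the feature vectors of ancient texts based on frequent word pairs, then uses transfer learning to build three classifiers whose votes determine the final sentiment result. CATL-PCO first addresses the feature sparsity of short ancient texts and, on that basis, the difficulty of analyzing the sentiment of ancient poetry caused by the scarcity of modern translations, so that it can accurately analyze the sentiment orientation of classical poems and, from the perspective of computational sociology, deepen the understanding of Chinese history. Experiments show that, when the training set consists of Chinese Tang poems, the proposed method classifies the sentiment of Tang poetry accurately and can be applied to sentiment analysis of the various periods of the Tang and Song dynasties and to the analysis of representative schools.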
With the rise of computational social science, analyzing social sentiment with data mining methods has attracted widespread attention and has become a research hot spot in recent years. Existing research on sentiment analysis focuses mainly on modern text and rarely addresses ancient short-text literature. This paper proposes CATL-PCO (Correlation Analysis Transfer Learning-Probability Co-occurrence), a transfer learning model based on short-text feature extension. By analyzing sentiment in ancient literature, the paper aims to reveal social and cultural development in the ancient era. CATL-PCO expands the feature vectors of ancient texts using frequent word pairs and uses transfer learning to train three sentiment classifiers. It addresses both the sparsity of short-text feature vectors and the scarcity of modern translations, which in turn improves the understanding of Chinese history. Experiments demonstrate the effectiveness of the proposed method on a dataset of Chinese poems from the Tang Dynasty. In addition, different periods of the Tang and Song Dynasties and different genres are analyzed in detail.
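The two mechanical steps named in the abstract, extending a sparse short-text representation with frequent word pairs and combining three classifiers by vote, can be illustrated with a minimal sketch. It is not the CATL-PCO implementation: the function names (`frequent_pairs`, `expand`, `majority_vote`), the raw-count threshold `min_count`, and the toy tokens are all assumptions made for illustration, and the transfer learning stage and the probabilistic co-occurrence weighting suggested by the model's name are not modeled here.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(docs, min_count=2):
    """Return unordered word pairs that co-occur in at least min_count documents.

    Assumption: a raw document-level count stands in for whatever
    co-occurrence statistic the paper actually uses.
    """
    counts = Counter()
    for doc in docs:
        # Sort the distinct tokens so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return {pair for pair, c in counts.items() if c >= min_count}


def expand(doc, pairs):
    """Append the missing partner of every frequent pair that doc half-matches."""
    present = set(doc)
    extra = set()
    for a, b in pairs:
        if a in present and b not in present:
            extra.add(b)
        elif b in present and a not in present:
            extra.add(a)
    return list(doc) + sorted(extra)


def majority_vote(labels):
    """Return the label predicted by most classifiers."""
    return Counter(labels).most_common(1)[0][0]


# Toy corpus of tokenized poems (hypothetical data).
corpus = [
    ["moon", "frost", "home"],
    ["moon", "home", "river"],
    ["moon", "wine"],
]
pairs = frequent_pairs(corpus, min_count=2)
expanded = expand(["moon", "wine"], pairs)

# Stand-ins for the outputs of three trained sentiment classifiers.
final_label = majority_vote(["positive", "negative", "positive"])
```

In this toy corpus only the pair ("home", "moon") occurs in two poems, so the short poem containing "moon" gains "home" as an extra feature. With three classifiers and two sentiment classes a vote can never tie, which is one plausible reason for using an odd number of classifiers.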