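The abundance of Internet text has opened new possibilities for sentiment analysis research. This article studies the recognition of irony as a rhetorical device in Chinese. Using an inductive approach, it proposes a system of features that mark the occurrence of irony in Chinese and designs the corresponding algorithms. Relevant material is crawled from the Internet to build a document collection, which is then used to train a Logistic model for irony recognition. The model's validity is verified through the statistical significance of the model itself, its recognition performance, and a comparison with manually annotated results. Significance tests show that "deviation between the intended meaning and the literal meaning" and "tension from emotional change" are the two most important features of irony in online Chinese text. The model's recall of 71.2% and classification accuracy of 60.3% are comparable to the best results reported in recent years, in China and abroad, for similar problems in English, Italian, and other languages.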
The emergence of large quantities of Internet text has opened new possibilities for sentiment analysis research. To address irony recognition in Chinese, this paper proposes a set of features that characterize irony and designs corresponding algorithms. Documents are crawled from the Internet to build a corpus, on which a Logistic model for irony recognition is trained. The model's effectiveness is verified by comparing its recognition results with manual annotation. Tests show that "deviation between intended meaning and literal meaning" and "emotion fluctuation" are the two main features characterizing Chinese irony in Internet text. The model achieves a recall of 71.2% and a classification accuracy of 60.3%, which is comparable to the best recent results reported for similar research on English and Italian and indicates that the model is effective.
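As a rough illustration of the pipeline summarized above (feature scores fed to a Logistic model, then evaluated by recall and classification accuracy), the following minimal sketch trains a two-feature logistic classifier by gradient descent. It is not the paper's implementation: the feature names `meaning_deviation` and `emotion_fluctuation`, the toy feature values, the learning rate, and the 0.5 decision threshold are all assumptions chosen for demonstration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(irony | x) - label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (ironic) if the estimated probability reaches the threshold."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


def recall_and_accuracy(y_true, y_pred):
    """Recall on the ironic class (label 1) and overall accuracy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, correct / len(y_true)


# Hypothetical toy data: each row is [meaning_deviation, emotion_fluctuation],
# both scaled to [0, 1]; label 1 marks a sentence annotated as ironic.
X = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.6, 0.7],
     [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.4, 0.1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(X, y)
y_pred = [predict(w, b, x) for x in X]
recall, accuracy = recall_and_accuracy(y, y_pred)
print(f"weights={w}, bias={b:.3f}, recall={recall:.3f}, accuracy={accuracy:.3f}")
```

In this sketch, recall is the share of manually annotated ironic sentences that the model also flags, while accuracy is the share of all sentences labeled correctly. The two can diverge considerably on real data, as the reported 71.2% and 60.3% suggest. Evaluating on the training rows, as done here, only checks that the code runs; a real study would score held-out annotated documents.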