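Community question answering (CQA) sites have become an important channel for obtaining information on the Web, but the quality of their content varies widely. This paper studies methods for evaluating answer quality in CQA communities. It examines the Baidu Knows (百度知道) community in particular and builds a large-scale corpus from it. Based on the characteristics of Baidu Knows, the paper proposes temporal features, question-granularity features, and features of Baidu Knows community users, so that answer quality is evaluated from more perspectives. Within a classification-learning framework, the paper combines these three new feature groups with classic textual and link features to classify answers as high-quality or not. Experiments on the large-scale corpus show that, on top of textual and link features, the temporal and question-granularity features effectively improve answer quality evaluation. The quality scores produced by the framework are also found to predict best answers effectively.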
Community Question Answering (CQA) has become increasingly important for information access by Web users. However, CQA content quality varies dramatically, from excellent answers to abuse and spam. This work investigates methods of answer quality analysis for CQA. In particular, it focuses on Baidu Knows, the largest Chinese CQA portal on the Web. A large-scale corpus was constructed by collecting data from the portal, and three new kinds of features are proposed: temporal (sequence-based) features, features at the granularity of the question, and Baidu Knows-specific user-based features. To separate high-quality answers from the rest, a learning-based classification method combines the proposed features with traditional textual and link-based features. Experimental results show that the proposed features improve classification performance over the traditional features alone. In addition, the answer quality analysis framework achieves high accuracy in predicting best answers.
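The classification setup described above can be illustrated with a minimal sketch. It is not the paper's implementation: the record keys (`text`, `user_points`, `minutes_after_question`), the four concrete features, the clipping constants, the toy data, and the use of plain logistic regression are all assumptions made for illustration, and link-based features are left out for brevity.

```python
import math

def build_features(answers):
    """Map the answers to one question onto feature vectors scaled to [0, 1].

    The record keys used here are hypothetical, not taken from the paper.
    """
    lengths = [len(a["text"]) for a in answers]
    longest = max(max(lengths), 1)
    by_time = sorted(range(len(answers)),
                     key=lambda i: answers[i]["minutes_after_question"])
    position = {idx: pos for pos, idx in enumerate(by_time)}
    return [
        [
            min(lengths[i], 200) / 200,          # textual: clipped answer length
            min(a["user_points"], 1000) / 1000,  # user-based: clipped community points
            position[i] / len(answers),          # temporal: order of arrival
            lengths[i] / longest,                # question granularity: length vs. sibling answers
        ]
        for i, a in enumerate(answers)
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, x):
    # weights[0] is the bias term; the rest align with the feature vector.
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))

def log_loss(weights, rows, labels):
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(score(weights, x))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

def train(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression with batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = sigmoid(score(weights, x)) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def best_answer(weights, answers):
    """Return the index of the answer with the highest quality score."""
    scores = [sigmoid(score(weights, x)) for x in build_features(answers)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data (invented): two questions, label 1 marks a high-quality answer.
questions = [
    [
        {"text": "Restart the router, then renew the DHCP lease from the network settings panel.",
         "user_points": 820, "minutes_after_question": 12},
        {"text": "no idea", "user_points": 15, "minutes_after_question": 2},
        {"text": "same problem here", "user_points": 40, "minutes_after_question": 30},
    ],
    [
        {"text": "try google", "user_points": 5, "minutes_after_question": 1},
        {"text": "Boil the noodles for eight minutes, drain them, and toss with sesame oil.",
         "user_points": 450, "minutes_after_question": 9},
    ],
]
labels = [1, 0, 0, 0, 1]

rows = [x for q in questions for x in build_features(q)]
weights = train(rows, labels)
print("predicted best answer for question 0:", best_answer(weights, questions[0]))
```

Scoring every answer to a question and taking the highest score is one simple way to turn a quality classifier into a best-answer predictor; whether the paper uses exactly this rule is not stated in the abstract.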