在论坛话题中识别答案是面向论坛的问答对挖掘中的核心问题.在论坛话题的讨论中通常存在隐式的结构,这种结构信息非常有助于最佳答案的定位和识别.本文提出了一种基于中文论坛话题段落划分的答案识别方法:首先将论坛话题重新组织为若干段落的集合,并基于此划分提取一组能够反映话题讨论逻辑结构的特征.在此基础上给出了一种可以根据候选答案所在段落类别实现模型选择的答案识别策略,从而避免了噪声信息对模型预测的误导.实验结果表明本文的答案识别方法非常适用于面向在线论坛的问答资源挖掘工作.
Detecting answers in the threads is an essential task for the online forum oriented question-answer (QA) pair mining. In the forum threads, there normally exist implicit discussion structures with the valuable indication for locating the best answers. This paper proposes a thread segmentation based answer detecting approach: a forum thread is reorganized into several segments, and a group of features reflecting the discussion structures are extracted based on the segmentation results. Utilizing the segment information, a strategy is put forward to find the best answers. By evaluating the candidate answers in different types of segments with different models, the strategy filters the samples that mislead the decision. The experimental results show that our approach is promising for mining the QA resource in the online forums.