Topics are often omitted at the beginning of Chinese punctuation clauses (PCs). To recover these missing clause-initial topics more accurately, the evaluation function used to assess candidate topic clauses (CTCs) in the topic clause (TC) identification task needs to be improved. To this end, two features are proposed: a context similarity feature for the generated topic clause, and a local similarity feature for the adjacency between the topic string and the comment. Corresponding evaluation functions are designed for these features. Experimental results show that using the two evaluation functions together increases the accuracy of TC identification by 5.72 percentage points.