Text segmentation has important applications in information extraction and retrieval, automatic summarization, discourse analysis, and many other fields. The objects of segmentation include static written text, speech transcripts, and dynamic text, and the granularity of segmentation varies with its purpose. Assessing segmentation accuracy calls not only for direct evaluation but, more importantly, for indirect evaluation. Drawing on an extensive body of literature, this paper surveys the segmentation approaches and evaluation methods in common use, analyzes the current state of research on text segmentation, identifies the problems that remain, and outlines directions for future research.