Chunk parsing is an important technique in natural language processing research, and it rests on a well-designed, effective chunk description scheme. Building on the results and experience of previous research, this paper proposes a topology-based description scheme for Chinese base chunks. The scheme uses lexical association information to determine three basic topological structures. This yields a sound set of criteria for judging the internal cohesion of a base chunk and establishes a bridge between a chunk's syntactic form and its semantic content. With this scheme, the procedure for automatically extracting a base-chunk-annotated corpus and a corresponding lexical association knowledge base from the existing Chinese syntactic treebank TCT is greatly simplified. This work lays a foundation for further research on automatic Chinese base chunk parsing and on an interactive, co-evolving approach in which base chunk parsing and lexical association knowledge acquisition improve each other.