Using the large-scale Tsinghua Treebank, this paper counts the syntactic constituents that Chinese words serve as within syntactic structures, derives the syntactic function distribution of Chinese words, and defines the complexity of that distribution. When Chinese words are ranked by the complexity of their syntactic function distribution, the relationship between this complexity and the number of words exhibits Lotka's phenomenon. On the one hand, this finding reveals how Chinese words are distributed across syntactic structures, which is of considerable value for research on the Chinese language. On the other hand, it has important implications for research in Chinese information processing, including part-of-speech tagging, automatic disambiguation, and syntactic analysis.
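The pipeline summarized above (count the syntactic functions each word fills, score each word's complexity, then check for a Lotka-type relation between complexity and word count) can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation. The abstract does not give the complexity formula, so the code assumes, purely for illustration, that complexity is the number of distinct syntactic functions a word is observed in. The observations are made-up toy data, not Tsinghua Treebank data, and all function names are hypothetical. Lotka's law is taken in its usual form y = C / x^n and fitted by least squares on log-log values.

```python
import math
from collections import Counter, defaultdict

# Made-up (word, syntactic function) pairs; NOT taken from the Tsinghua Treebank.
TOY_OBSERVATIONS = [
    ("研究", "subject"),
    ("研究", "predicate"),
    ("研究", "object"),
    ("研究", "predicate"),
    ("的", "auxiliary"),
    ("很", "adverbial"),
]


def function_distribution(observations):
    """Map each word to a Counter of the syntactic functions it fills."""
    dist = defaultdict(Counter)
    for word, function in observations:
        dist[word][function] += 1
    return dist


def complexity(function_counts):
    """Assumed complexity: the number of distinct syntactic functions.

    This is a stand-in; the paper's actual definition is not stated
    in the abstract.
    """
    return len(function_counts)


def words_per_complexity(dist):
    """Count how many words fall at each complexity value."""
    return Counter(complexity(counts) for counts in dist.values())


def fit_lotka(freq):
    """Fit y = C / x**n by least squares on (log x, log y).

    `freq` maps complexity x to the number of words y. Returns (C, n).
    """
    points = [(math.log(x), math.log(y)) for x, y in freq.items() if x > 0 and y > 0]
    if len(points) < 2:
        raise ValueError("need at least two complexity levels to fit")
    mean_x = sum(px for px, _ in points) / len(points)
    mean_y = sum(py for _, py in points) / len(points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


if __name__ == "__main__":
    dist = function_distribution(TOY_OBSERVATIONS)
    freq = words_per_complexity(dist)
    c, n = fit_lotka(freq)
    print(dict(freq), c, n)
```

On real treebank data, the observations would come from the annotated constituent labels, and the fitted exponent n together with a goodness-of-fit test would indicate how closely the counts follow a Lotka-type curve.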