Frequency statistics of Tibetan syllables make it possible to determine accurately how often the ideographic characters among Tibetan syllable characters are used. Taking a balanced Tibetan corpus of 150 million Tibetan characters, the large basic Tibetan corpus, as the statistical source, this paper proposes a method for identifying Tibetan syllables that treats non-Tibetan characters and 93 special Tibetan characters as syllable delimiters. Software for counting Tibetan syllable frequencies was designed and implemented, and the statistical results and the types of erroneous syllables that occurred were analyzed from several points of view.
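The delimiter-based identification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software. The 93 special Tibetan characters are not listed here, so `SPECIAL_DELIMITERS` is a small hypothetical subset chosen for illustration. The function names are likewise assumptions, as is the use of the Unicode Tibetan block (U+0F00 to U+0FFF) to decide what counts as a Tibetan character.

```python
from collections import Counter

# Assumption: an illustrative subset only. The paper uses 93 special
# Tibetan characters as delimiters; the full list is not given here.
SPECIAL_DELIMITERS = {
    "\u0F0B",  # TIBETAN MARK INTERSYLLABIC TSHEG
    "\u0F0C",  # TIBETAN MARK DELIMITER TSHEG BSTAR
    "\u0F0D",  # TIBETAN MARK SHAD
    "\u0F0E",  # TIBETAN MARK NYIS SHAD
    "\u0F14",  # TIBETAN MARK GTER TSHEG
}


def is_tibetan(ch: str) -> bool:
    """Assumption: 'Tibetan character' means a code point in U+0F00..U+0FFF."""
    return "\u0F00" <= ch <= "\u0FFF"


def is_delimiter(ch: str) -> bool:
    """A delimiter is any non-Tibetan character or a listed special character."""
    return (not is_tibetan(ch)) or ch in SPECIAL_DELIMITERS


def split_syllables(text: str) -> list[str]:
    """Collect maximal runs of non-delimiter characters as syllables."""
    syllables = []
    current = []
    for ch in text:
        if is_delimiter(ch):
            if current:
                syllables.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        syllables.append("".join(current))
    return syllables


def syllable_frequencies(text: str) -> Counter:
    """Count how often each identified syllable occurs in the text."""
    return Counter(split_syllables(text))
```

Calling `syllable_frequencies(corpus_text).most_common(20)` would list the twenty most frequent syllables. Runs that are not valid Tibetan syllables are counted as they appear, so a separate validity check would be needed to classify erroneous syllables.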