Frequency statistics for Tibetan characters (syllables) are a basic research topic in Tibetan information processing and are important for applications such as Tibetan spell checking and dictionary construction. Based on the characteristics of Tibetan syllables and the encoding features of the Unicode Tibetan basic set, this paper proposes a method for computing Tibetan character frequencies by computer, and designs and implements Tibetan character frequency statistics software. Tests on a sample Tibetan corpus confirmed the correctness of the method.
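The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible approach, not the authors' implementation. It assumes that a syllable is a maximal run of code points in the range U+0F40 to U+0FBC of the Unicode Tibetan block (base letters, vowel signs, and subjoined letters), and that everything else, including the tsheg (U+0F0B) and the shad (U+0F0D), acts as a delimiter. The names `SYLLABLE` and `count_syllables` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a Tibetan syllable is a maximal run of code points in
# U+0F40..U+0FBC (letters, vowel signs, subjoined letters). This range is
# approximate; it also covers a few rarely used signs in the block.
SYLLABLE = re.compile(r"[\u0F40-\u0FBC]+")


def count_syllables(text: str) -> Counter:
    """Return a Counter mapping each Tibetan syllable to its frequency.

    Any character outside the assumed range (tsheg, shad, spaces, digits,
    Latin text) is treated as a syllable boundary.
    """
    return Counter(SYLLABLE.findall(text))


if __name__ == "__main__":
    # Three syllables separated by tsheg (U+0F0B), terminated by shad (U+0F0D).
    sample = (
        "\u0F56\u0F7C\u0F51\u0F0B"   # first syllable + tsheg
        "\u0F61\u0F72\u0F42\u0F0B"   # second syllable + tsheg
        "\u0F56\u0F7C\u0F51\u0F0D"   # first syllable again + shad
    )
    for syllable, freq in count_syllables(sample).most_common():
        print(syllable, freq)
```

A real corpus would likely need Unicode normalization and handling of special cases such as Sanskrit transliteration stacks before counting; those steps are omitted here.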