Frequency Statistics of Tibetan Syllable Words

ISSN: 1003-0077
Journal: Journal of Chinese Information Processing (《中文信息学报》)
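Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: Research Center for Tibetan Information Technology, Tibet University, Lhasa, Tibet 850000
Related funding: This work is an interim result of the following projects: 2013 National Natural Science Foundation of China project "Research on Basic Theory and Key Technologies of Cross-Language Social Public Opinion Analysis" (No. 61331013); 2015 National Natural Science Foundation of China project "Research on Big Data Processing in Deep-Learning-Based Monitoring of Tibetan Online Public Opinion" (No. 61540060); 2015 Tibet Autonomous Region Higher Education Humanities and Social Sciences Research project "Frequency Statistics of Tibetan Syllable Words" (No. sk2015-06)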

Keywords: Tibetan, syllable word, frequency, statistics

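Chinese abstract (translated):

Frequency statistics of Tibetan syllable words show precisely how often the meaning-bearing syllables among them are used. Using the Large Tibetan Basic Corpus, a balanced Tibetan corpus of 150 million Tibetan characters, as the statistical source, the article proposes identifying Tibetan syllable words by treating non-Tibetan characters and 93 special Tibetan characters as syllable delimiters. It designs and implements software for counting syllable-word frequencies, and analyzes the statistical results and the types of erroneous syllables from several perspectives.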

English abstract:

Frequency statistics of Tibetan syllable words make it possible to accurately determine the usage frequency of ideographic syllables among Tibetan syllable words. Taking the Large Tibetan Corpus Base, a balanced Tibetan corpus of 150 million Tibetan characters, as the statistical source, this paper proposes a method for recognizing Tibetan syllable words that uses non-Tibetan characters and 93 special Tibetan characters as syllable delimiters. Software for Tibetan syllable word frequency statistics was designed and implemented, and the statistical results and the types of erroneous syllables that occurred were analyzed from different points of view.
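
The delimiter-based approach described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' software. It assumes that a "Tibetan character" is any code point in the Unicode Tibetan block (U+0F00 to U+0FFF), and it uses only four punctuation marks as stand-ins for the 93 special delimiter characters, whose full list the abstract does not give. The names `ASSUMED_DELIMITERS`, `split_syllables`, and `count_syllables` are hypothetical.

```python
from collections import Counter

# Assumption: "Tibetan character" means a code point in the Unicode Tibetan block.
TIBETAN_BLOCK = range(0x0F00, 0x1000)

# Assumption: an illustrative subset only. The abstract mentions 93 special
# Tibetan characters but does not list them.
ASSUMED_DELIMITERS = {
    "\u0F0B",  # TIBETAN MARK INTERSYLLABIC TSHEG
    "\u0F0C",  # TIBETAN MARK DELIMITER TSHEG BSTAR
    "\u0F0D",  # TIBETAN MARK SHAD
    "\u0F0E",  # TIBETAN MARK NYIS SHAD
}


def is_delimiter(ch):
    """A character ends a syllable if it is non-Tibetan or an assumed delimiter."""
    return ord(ch) not in TIBETAN_BLOCK or ch in ASSUMED_DELIMITERS


def split_syllables(text):
    """Split text into runs of Tibetan characters separated by delimiters."""
    syllables, current = [], []
    for ch in text:
        if is_delimiter(ch):
            if current:  # close the syllable collected so far
                syllables.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # flush a trailing syllable with no closing delimiter
        syllables.append("".join(current))
    return syllables


def count_syllables(text):
    """Return a Counter mapping each syllable to its number of occurrences."""
    return Counter(split_syllables(text))
```

Calling `count_syllables(text).most_common(10)` on a corpus string would list the ten most frequent syllables under these assumptions. Detecting erroneous syllables, which the paper also analyzes, is outside the scope of this sketch.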
