藏文人名识别是藏文信息处理领域研究的难点之一,其识别效果直接影响到藏文自动分词的精度和相关应用系统的性能,包括藏汉翻译、藏文信息检索、文本分类等.该文在分析藏文人名构成规律和特点的基础上,提出了一种最大熵和条件随机场相融合的藏文人名识别方法.实验表明,该方法可以获取较好的识别效果,在我们的测试集上F-测度值到达了93.08%.
Tibetan person name recognition is one of the most difficult tasks in the area of Tibetan information pro- cessing, with a direct impact on the precision of Tibetan word segmentation. Based on the analysis of wording rules and features of Tibetan names, this paper proposes a method combining maximum entropy and conditional random fields to identify Tibetan person names. The experiment shows that this approach works significant well reaching 93. 08% in Fl-measure.