In automatic Chinese word segmentation and part-of-speech tagging systems, the electronic dictionary is a core component and one of the main factors affecting system performance. This paper describes the query functions an electronic dictionary should provide and the organizational structures commonly used, and proposes an extensible dictionary mechanism that combines a system dictionary with a user dictionary. The system dictionary uses a character-by-character binary search structure indexed by a hash of the first character; the user dictionary uses a linked-list structure, also indexed by a hash of the first character. Experiments show that the mechanism is extensible and practical.
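The two-part structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and method names (`SystemDictionary`, `UserDictionary`, `ExtensibleDictionary`, `add_user_word`), the part-of-speech tags, and the choice to consult the user dictionary before the system dictionary are all assumptions made for illustration. A built-in `dict` stands in for the first-character hash table.

```python
class SystemDictionary:
    """Read-only dictionary: first character -> sorted block of (word, tag)."""

    def __init__(self, entries):
        grouped = {}
        for word, tag in entries:
            grouped.setdefault(word[0], {})[word] = tag
        # Each block is sorted by word so it can be searched by bisection.
        self._blocks = {c: sorted(ws.items()) for c, ws in grouped.items()}

    @staticmethod
    def _narrow(block, lo, hi, pos, ch):
        """Shrink [lo, hi) to the words whose character at index pos is ch."""
        def char_at(k):
            return block[k][0][pos:pos + 1]  # "" if the word is too short

        a, b = lo, hi
        while a < b:  # lower bound
            m = (a + b) // 2
            if char_at(m) < ch:
                a = m + 1
            else:
                b = m
        start, b = a, hi
        while a < b:  # upper bound
            m = (a + b) // 2
            if char_at(m) <= ch:
                a = m + 1
            else:
                b = m
        return start, a

    def lookup(self, word):
        if not word:
            return None
        block = self._blocks.get(word[0])
        if block is None:
            return None
        lo, hi = 0, len(block)
        # Binary search one character at a time, from the second character on.
        for pos in range(1, len(word)):
            lo, hi = self._narrow(block, lo, hi, pos, word[pos])
            if lo == hi:
                return None
        # All words in [lo, hi) start with `word`; an exact match sorts first.
        return block[lo][1] if block[lo][0] == word else None


class _Node:
    def __init__(self, word, tag, next_node):
        self.word, self.tag, self.next = word, tag, next_node


class UserDictionary:
    """Updatable dictionary: first character -> singly linked list of entries."""

    def __init__(self):
        self._heads = {}

    def add(self, word, tag):
        # Insert at the head of the chain for this first character.
        self._heads[word[0]] = _Node(word, tag, self._heads.get(word[0]))

    def lookup(self, word):
        node = self._heads.get(word[0]) if word else None
        while node is not None:
            if node.word == word:
                return node.tag
            node = node.next
        return None


class ExtensibleDictionary:
    """System dictionary plus user dictionary (user entries checked first)."""

    def __init__(self, system_entries):
        self._system = SystemDictionary(system_entries)
        self._user = UserDictionary()

    def add_user_word(self, word, tag):
        self._user.add(word, tag)

    def lookup(self, word):
        tag = self._user.lookup(word)
        return tag if tag is not None else self._system.lookup(word)
```

In this sketch the sorted blocks make system lookups logarithmic in the block size at each character position, while the linked lists let user words be added without rebuilding the sorted blocks, which is one plausible reading of why the mechanism is described as extensible.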