Taking the phonetic features of Uyghur into account, and with the aim of building a Uyghur phoneme corpus while reducing manual workload, we implemented an automatic phoneme segmentation algorithm using the HTK toolkit. First, we completed the preparatory work, including text design, sound recording and manual labelling, designed a set of context attributes, and obtained an HMM for each phoneme through training. We then automatically segmented arbitrary input speech sentences into their constituent phonemes. Finally, we analysed the segmentation accuracy, the problems encountered and possible countermeasures. In practice, this strategy saved a great deal of time and labour in building the corpus and improved the consistency and accuracy of the annotation in the speech corpus.
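The segmentation-accuracy analysis mentioned above could be sketched as follows. This is only an illustration, not the paper's evaluation procedure: it assumes the manual and automatic segmentations are stored as HTK-style label lines (`start end phone`, with times in 100 ns units) and that accuracy is measured as the share of phoneme boundaries falling within a tolerance of the manual boundary. The 20 ms tolerance, the function names and the placeholder phone labels `a`, `b`, `c` are assumptions introduced for this example.

```python
from typing import List, Tuple

# HTK label files express times in units of 100 ns.
HTK_UNITS_PER_SECOND = 10_000_000

Segment = Tuple[float, float, str]  # (start in s, end in s, phone label)


def parse_htk_label(text: str) -> List[Segment]:
    """Parse HTK-style label lines 'start end phone' into segments in seconds."""
    segments: List[Segment] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        start, end = int(parts[0]), int(parts[1])
        segments.append(
            (start / HTK_UNITS_PER_SECOND, end / HTK_UNITS_PER_SECOND, parts[2])
        )
    return segments


def boundary_accuracy(
    manual: List[Segment], auto: List[Segment], tolerance_s: float = 0.020
) -> float:
    """Fraction of internal boundaries placed within tolerance_s of the manual ones.

    Assumes both segmentations contain the same phone sequence, as is the case
    when the automatic pass aligns a known transcription.
    """
    if [p for _, _, p in manual] != [p for _, _, p in auto]:
        raise ValueError("phone sequences differ; boundaries cannot be paired")
    # Internal boundaries are the end times of every segment except the last.
    manual_bounds = [end for _, end, _ in manual[:-1]]
    auto_bounds = [end for _, end, _ in auto[:-1]]
    if not manual_bounds:
        return 1.0  # a single segment has no internal boundary to misplace
    hits = sum(
        abs(m - a) <= tolerance_s for m, a in zip(manual_bounds, auto_bounds)
    )
    return hits / len(manual_bounds)


# Hypothetical three-phone utterance; labels and times are made up.
manual_text = "0 1000000 a\n1000000 2500000 b\n2500000 4000000 c"
auto_text = "0 1100000 a\n1100000 3000000 b\n3000000 4000000 c"

accuracy = boundary_accuracy(parse_htk_label(manual_text), parse_htk_label(auto_text))
```

In this made-up example the first boundary is off by 10 ms and the second by 50 ms, so one of the two boundaries falls within the assumed 20 ms tolerance. Reporting the figure at several tolerances would give a fuller picture of where the automatic segmentation deviates from manual labelling.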