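To address the low detection accuracy of existing unsupervised query-by-example spoken term detection, this paper proposes a method based on posterior features and principal component analysis. The method first trains a GMM on unlabeled speech to obtain sequences of Gaussian-component posterior feature vectors for the spectral parameters of the training data. A hierarchical clustering algorithm detects boundaries in these sequences to produce acoustic segments, K-means clusters all segments and assigns labels, and the segments and labels are used to train posterior-based acoustic segment models (ASMs). The ASMs convert the Gaussian-component posteriors of the query and the test documents into new posteriors, which are then refined with principal component analysis: the dimensionality of the probability vectors is kept unchanged while noise is removed, improving the robustness and discriminability of the posterior feature vectors. Finally, the query is retrieved with segmental dynamic time warping. Experiments show that the method achieves significantly higher retrieval accuracy than existing methods.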
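The first step, turning spectral frames into Gaussian-component posterior vectors, can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM whose parameters are already trained; the function name `gaussian_posteriorgram` and the toy parameters are hypothetical and are not taken from the paper.

```python
import math


def gaussian_posteriorgram(frames, weights, means, variances):
    """Return one posterior vector per frame for a diagonal-covariance GMM."""
    posteriorgram = []
    for x in frames:
        # Log of weight * N(x | mean, diag(var)) for each component.
        log_joint = []
        for w, mu, var in zip(weights, means, variances):
            log_pdf = -0.5 * sum(
                math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            log_joint.append(math.log(w) + log_pdf)
        # Normalise with log-sum-exp so small likelihoods do not underflow.
        peak = max(log_joint)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint))
        posteriorgram.append([math.exp(v - log_norm) for v in log_joint])
    return posteriorgram


# Hypothetical 2-component GMM over 2-dimensional frames (illustration only).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [2.9, 3.1], [1.5, 1.5]]
posteriors = gaussian_posteriorgram(frames, weights, means, variances)
```

Each row of `posteriors` sums to one. Frames close to a component mean put almost all of their mass on that component, and the third frame, which is equidistant from both means, splits its mass evenly.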
This paper presents a study of using posterior features and principal component analysis to improve unsupervised query-by-example spoken term detection. A Gaussian Mixture Model is trained without any transcription information to label speech frames with Gaussian posteriorgrams. Through hierarchical agglomerative clustering and K-means, the boundaries and labels are obtained to train acoustic segment models (ASMs). The ASM posteriorgrams are then refined by principal component analysis, and segmental dynamic time warping is applied to match the query posteriorgrams against the test posteriorgrams and to locate possible occurrences of the query term. Experimental results show that the proposed method consistently outperforms the traditional method.
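The matching step can be sketched as follows. This is a simplified subsequence DTW that lets the query start and end anywhere in the test utterance. It is not the segmental DTW with band constraints and path-length normalisation that the paper uses. The frame distance, the negative log of the inner product of two posterior vectors, is an assumption, since the abstract does not state which distance is used. All names are hypothetical.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """Distance between two posterior vectors: -log of their inner product."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), eps))


def subsequence_dtw(query, test):
    """Return (cost, start, end) of the best match of query inside test."""
    n, m = len(query), len(test)
    inf = float("inf")
    # cost[i][j]: best accumulated distance aligning query[:i + 1]
    # to a span of the test utterance that ends at frame j.
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: test frame at which that best alignment begins.
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        # The first query frame may align to any test frame at no extra cost.
        cost[0][j] = frame_distance(query[0], test[j])
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], start[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = best_cost + frame_distance(query[i], test[j])
            start[i][j] = best_start
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end


# Toy posteriorgrams over three hypothetical units A, B and C.
A, B, C = [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]
query = [A, B]
test = [C, C, A, B, C]
match_cost, match_start, match_end = subsequence_dtw(query, test)
```

In this toy example the query pattern A, B occurs at test frames 2 and 3, so the lowest-cost alignment spans exactly those frames. A practical system would rank several candidate regions instead of returning a single best span.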