A speaker segmentation method based on a sparse neural network is presented. A sparse neural network with a single hidden layer extracts speaker-factor features from the supervector features of the speech signal. K-means clustering then assigns a label to every speech frame, and these frame labels are used to segment the different speakers. Dropout is introduced when training the sparse network to counter over-fitting. Experiments on multi-speaker audio streams constructed from the TIMIT database show two results. First, increasing the number of hidden nodes in the sparse network improves segmentation performance. Second, the proposed method performs noticeably better than both the Bayesian information criterion (BIC) method and the sparse auto-encoder method.
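The pipeline described above (hidden-layer features, dropout during training, K-means frame labels, then segments) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The following are all assumptions made only for this illustration: the sigmoid activation, the inverted-dropout form, the farthest-point K-means initialisation, the synthetic Gaussian frames standing in for supervector features, and the untrained random weights. The function names `hidden_features`, `kmeans` and `labels_to_segments` are hypothetical. The sparsity penalty and the training loop are not shown.

```python
import numpy as np

def hidden_features(x, W, b, drop_p=0.0, rng=None):
    """Sigmoid activations of a single hidden layer (assumed activation).

    With drop_p > 0 (training only), inverted dropout zeroes each hidden unit
    with probability drop_p and rescales the survivors by 1 / (1 - drop_p).
    """
    h = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    if drop_p > 0.0:
        rng = rng if rng is not None else np.random.default_rng()
        mask = rng.random(h.shape) >= drop_p
        h = h * mask / (1.0 - drop_p)
    return h

def kmeans(feats, k, n_iter=50):
    """Plain K-means on per-frame features; returns one label per frame."""
    # Assumed deterministic init: start at frame 0, then repeatedly add the
    # frame farthest from all centres chosen so far.
    centres = [feats[0]]
    for _ in range(1, k):
        d = np.min([np.sum((feats - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(feats[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        dist = ((feats[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep an empty cluster's centre unchanged instead of averaging nothing.
        new = np.array([feats[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels

def labels_to_segments(labels):
    """Merge runs of identical frame labels into (start, end_exclusive, label)."""
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start, t, int(labels[start])))
            start = t
    return segments

rng = np.random.default_rng(0)
# Synthetic stand-in for supervector frames: 30 frames of one speaker,
# then 20 frames of another (hypothetical data, not TIMIT).
x = np.vstack([rng.normal(-2.0, 0.3, (30, 8)), rng.normal(2.0, 0.3, (20, 8))])
W = rng.normal(0.0, 0.5, (8, 16))  # untrained, hypothetical weights
b = np.zeros(16)
h = hidden_features(x, W, b)       # inference: dropout disabled
segments = labels_to_segments(kmeans(h, k=2))
print(segments)
```

Here dropout is applied only when `drop_p > 0`, which matches the abstract's point that it is used during training. At segmentation time the features are computed without it. The speaker change points are simply the boundaries between consecutive segments.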