This paper presents a speaker diarization and localization speech processing system based on Multiple Distant Microphones (MDM), built for the evaluations of the U.S. National Institute of Standards and Technology (NIST). For the speaker diarization task posed in the NIST Rich Transcription evaluation, it proposes an improved diarization method that combines time delay estimation with clustering, reducing the complexity of diarization and improving its accuracy while keeping performance stable. It also proposes a new algorithm that constructs a matrix equation from the time delays between adjacent array elements, from which the direction angles of multiple speakers can be obtained. The algorithms are validated on real speech data collected in a standard meeting environment. The accuracy of the proposed diarization algorithm is close to that of current mainstream speaker diarization systems, and the direction-angle error of the localization algorithm is within 3°. The results show that, under appropriate conditions, an MDM system can serve as a suitable speech input device for multi-speaker, multi-party meeting environments.
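To make the link between time delay and direction angle concrete, the following is a minimal sketch rather than the method proposed in the paper. It assumes a GCC-PHAT delay estimator (the abstract does not name the estimator used), a single pair of adjacent microphones, a far-field source, and a sound speed of 343 m/s. The function names `gcc_phat_delay` and `delay_to_angle` are hypothetical, and the paper's matrix-equation construction for multiple speakers is not reproduced here.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` using GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)                 # zero-pad to avoid circular wrap-around
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)          # cross-power spectrum
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)           # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def delay_to_angle(tau, spacing, c=SPEED_OF_SOUND):
    """Far-field direction angle in degrees, measured from array broadside.

    Uses tau = spacing * sin(theta) / c for one pair of adjacent microphones.
    """
    ratio = np.clip(c * tau / spacing, -1.0, 1.0)  # guard against |ratio| > 1
    return float(np.degrees(np.arcsin(ratio)))


if __name__ == "__main__":
    fs = 16000                              # sampling rate in Hz (illustrative)
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(4096)         # synthetic stand-in for speech
    delay_samples = 3
    sig = np.concatenate((np.zeros(delay_samples), ref[:-delay_samples]))

    tau = gcc_phat_delay(sig, ref, fs, max_tau=0.001)
    print("estimated delay (s):", tau)
    print("direction angle (deg):", delay_to_angle(tau, spacing=0.2))
```

With several sources active at once, a single correlation peak per microphone pair is no longer sufficient, which is the situation the paper's matrix equation over adjacent-element delays is meant to address.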