Dongli Research Big Data Discovery System (DRDS)


A Two-Level Unsupervised Audio Segmentation Algorithm

ISSN: 1003-0077
Journal: Journal of Chinese Information Processing (《中文信息学报》)
Date: not given
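Classification: TP391 [Automation and Computer Technology / Computer Application Technology; Automation and Computer Technology / Computer Science and Technology]
Author affiliation: [1] High-Tech Innovation Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100080
Funding: National Natural Science Foundation of China (60475014)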

Keywords: artificial intelligence, pattern recognition, two-level unsupervised audio segmentation, modified generalized likelihood ratio, region level, boundary level

Chinese abstract:

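This paper proposes a two-level unsupervised audio segmentation algorithm for detecting changes in acoustic characteristics, such as changes of speaker, environment, or channel, in an audio stream. The method divides segmentation into two levels: a region level and a boundary level. By moving a fixed detection window, it quickly locates the regions that contain acoustic change points; within each potential change region, it then combines the T² statistic with the Bayesian Information Criterion (BIC) to determine segment boundaries quickly. At the region level, a modified generalized log-likelihood ratio criterion is applied to detect potential change regions; it requires no threshold while still ensuring a low miss rate. Experimental results on the 1997 Hub4 Mandarin broadcast speech corpus show that, compared with the traditional hybrid segmentation algorithm, the proposed algorithm runs faster and improves the recall of acoustic change points by 10.5%.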
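The region-level search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes frame-level feature vectors (for example MFCCs) modeled by full-covariance Gaussians, it uses the standard GLR distance because the abstract does not say how the criterion was modified, and it stands in for "no threshold" with simple local-maximum picking on the distance curve. The function names and the `win`/`step` values are hypothetical.

```python
import numpy as np


def half_n_logdet(x: np.ndarray) -> float:
    """(n/2) * log|Sigma| with Sigma the ML covariance of frames x (shape n x d).

    This is the only data-dependent term of a Gaussian's maximized
    log-likelihood; the constant terms cancel inside the GLR distance.
    """
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("singular covariance; use longer windows")
    return 0.5 * x.shape[0] * logdet


def glr_distance(left: np.ndarray, right: np.ndarray) -> float:
    """Standard (unmodified) GLR distance -log(lambda) between two windows."""
    both = np.vstack([left, right])
    return half_n_logdet(both) - half_n_logdet(left) - half_n_logdet(right)


def candidate_regions(features: np.ndarray, win: int = 100, step: int = 10):
    """Slide two adjacent fixed-length windows; keep every local GLR maximum.

    Hypothetical threshold-free rule: each local peak of the distance curve
    yields a candidate region (start, end) spanning both windows. Keeping all
    peaks favours a low miss rate; false alarms are left to the boundary level.
    """
    centres = list(range(win, len(features) - win + 1, step))
    dist = [glr_distance(features[c - win:c], features[c:c + win]) for c in centres]
    regions = []
    for i in range(1, len(dist) - 1):
        if dist[i] > dist[i - 1] and dist[i] >= dist[i + 1]:
            regions.append((centres[i] - win, centres[i] + win))
    return regions


if __name__ == "__main__":
    # Synthetic 2-D "features" with a single change point at frame 400.
    rng = np.random.default_rng(0)
    feats = np.vstack([rng.normal(0.0, 1.0, (400, 2)), rng.normal(3.0, 1.0, (400, 2))])
    for start, end in candidate_regions(feats):
        print(start, end)
```

On homogeneous stretches the distance curve still has small noisy peaks, so this rule returns many regions. In this sketch that is deliberate: the first pass trades false alarms for a low miss rate, and a boundary-level test is expected to reject most of them.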

English abstract:

We propose a two-level unsupervised audio segmentation method that effectively detects acoustic changes of speaker, environment, and channel in a continuous audio stream. We divide the change detection process into two levels: a region level, which detects potential change regions containing candidate acoustic change points, and a boundary level, which searches for and refines the true change points. At the region level, we apply a modified Generalized Likelihood Ratio (GLR) metric over consecutive local windows to find potential change regions without setting any threshold. At the boundary level, we combine the T² statistic with the Bayesian Information Criterion (BIC) to locate segment boundaries within the potential change regions. Experimental results on the 1997 Broadcast News Hub4-NE Mandarin corpus show that the proposed scheme improves recall by nearly 10.5% over a conventional hybrid segmentation algorithm.
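For the boundary level, here is a minimal sketch of combining T² with BIC inside one candidate region. It assumes a common T²-BIC arrangement: Hotelling's T² proposes the most likely split point, and the standard ΔBIC test accepts that split only if ΔBIC > 0. As a simplifying assumption, T² uses the covariance of the whole region for both sides, so only one matrix inverse is needed. The abstract does not give the authors' exact procedure, so the function names and the `margin` and `lam` defaults are illustrative.

```python
import numpy as np


def half_n_logdet(x: np.ndarray) -> float:
    """(n/2) * log|Sigma| with Sigma the ML covariance of frames x (shape n x d)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("singular covariance; segment too short")
    return 0.5 * x.shape[0] * logdet


def delta_bic(x: np.ndarray, t: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting x at frame t; a positive value favours a change."""
    n, d = x.shape
    # Penalty for the extra mean vector and full covariance of the two-model split.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return half_n_logdet(x) - half_n_logdet(x[:t]) - half_n_logdet(x[t:]) - lam * penalty


def t2_candidate(x: np.ndarray, margin: int) -> int:
    """Split index maximising Hotelling's T^2 between x[:t] and x[t:].

    margin (>= 1) keeps at least that many frames on each side. One covariance
    (of the whole region) is inverted once, and running sums give both means.
    """
    n = x.shape[0]
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(x, rowvar=False, bias=True)))
    csum = np.cumsum(x, axis=0)
    total = csum[-1]
    best_t, best_t2 = margin, -np.inf
    for t in range(margin, n - margin + 1):
        diff = csum[t - 1] / t - (total - csum[t - 1]) / (n - t)
        t2 = (t * (n - t) / n) * float(diff @ cov_inv @ diff)
        if t2 > best_t2:
            best_t, best_t2 = t, t2
    return best_t


def refine_boundary(region: np.ndarray, margin: int = 20, lam: float = 1.0):
    """Return the accepted boundary index within region, or None if BIC rejects it."""
    t = t2_candidate(region, margin)
    return t if delta_bic(region, t, lam) > 0 else None


if __name__ == "__main__":
    # Synthetic candidate region whose true boundary is at frame 150.
    rng = np.random.default_rng(1)
    region = np.vstack([rng.normal(0.0, 1.0, (150, 2)), rng.normal(2.5, 1.0, (150, 2))])
    print(refine_boundary(region))
```

The T² scan is cheap because it needs only running means, while the more expensive ΔBIC test (three covariance determinants) runs once per region instead of at every frame. Raising `lam` makes the BIC test stricter, giving fewer false alarms but more misses; λ = 1 corresponds to the unweighted BIC penalty.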
