Audio-visual underdetermined blind source separation algorithm based on Gaussian potential function

ISSN: 1673-5447
Journal: China Communications
Date: 2014.6.22
Pages: 71-80
Classification: O11 [Science: Mathematics; Science: Fundamental Mathematics] TP391.4 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Department of Electronic Information Engineering, Nanchang University, Nanchang 330031, China; [2] National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and Telecommunications, Beijing 100876, China
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61162014, 61210306074); the Natural Science Foundation of Jiangxi Province of China (Grant No. 20122BAB201025); the Foundation for Young Scientists of Jiangxi Province (Jinggang Star) (Grant No. 20122BCB23002)
Related project: Research on blind separation of underdetermined convolutive speech mixtures based on audio-visual information fusion and its application in robot auditory systems

Authors: Cao Kang, Wu Kangrui, Yu Tenglong, Zhou Nanrun

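Keywords: separation algorithm, Gaussian, potential function, parameter initialization, parameter estimation, audio-visual, mixing parameters, mixtures, underdetermined blind source separation, interaural time difference, interaural level difference, visual information, Gaussian potential function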

Abstract:

Most existing algorithms for the underdetermined blind source separation (UBSS) problem are two-stage algorithms, consisting of mixing parameter estimation followed by source estimation. In the mixing parameter estimation stage, the traditional clustering algorithms proposed previously are sensitive to the initialization of the mixing parameters. To reduce this sensitivity, we propose a new UBSS algorithm for anechoic speech mixtures that uses visual information to initialize the mixing parameters, namely the interaural time difference (ITD) and the interaural level difference (ILD). In our algorithm, the video signals are used to estimate the distances between the microphones and the sources, from which estimates of the ITD and ILD are obtained. Under the sparsity assumption in the time-frequency domain, the Gaussian potential function algorithm then estimates the mixing parameters, using these ITD and ILD estimates as initial values. Finally, time-frequency masking recovers the sources by evaluating the various ITDs and ILDs. Experimental results demonstrate the competitive performance of the proposed algorithm compared with the baseline algorithms.
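
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 1/distance amplitude model, the sign conventions for ILD and ITD, the speed of sound, and the scale values are all assumptions made for illustration. In particular, a Gaussian-kernel mean-shift update stands in for the Gaussian potential function step, whose exact form the abstract does not give.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def init_mixing_params(d1, d2, c=SPEED_OF_SOUND):
    """Return initial (ILD, ITD) pairs, one per source.

    d1[j], d2[j]: distance in metres from source j to microphone 1 and 2,
    hypothetically measured from video. Assumes anechoic propagation with
    amplitude decaying as 1/distance (an assumption, not from the paper).
    """
    # ILD as amplitude ratio mic2/mic1 = d1/d2; ITD as extra delay at mic 2.
    return [(a / b, (b - a) / c) for a, b in zip(d1, d2)]


def _sq_dist(point, centre, scales):
    """Squared distance between two (ILD, ITD) pairs after per-axis scaling."""
    return sum(((p - m) / s) ** 2 for p, m, s in zip(point, centre, scales))


def refine(points, centres, scales, n_iter=10):
    """Gaussian-kernel mean-shift refinement of each centre (illustrative only)."""
    centres = list(centres)
    for _ in range(n_iter):
        for j, centre in enumerate(centres):
            weights = [math.exp(-0.5 * _sq_dist(p, centre, scales)) for p in points]
            total = sum(weights)
            if total > 0.0:
                # Move the centre to the Gaussian-weighted mean of the observations.
                centres[j] = tuple(
                    sum(w * p[k] for w, p in zip(weights, points)) / total
                    for k in range(2)
                )
    return centres


def assign_sources(points, centres, scales):
    """Binary time-frequency masking: index of the closest centre per point."""
    return [
        min(range(len(centres)), key=lambda j: _sq_dist(p, centres[j], scales))
        for p in points
    ]


# Two sources, with made-up distances to the two microphones.
centres = init_mixing_params(d1=[1.0, 2.0], d2=[2.0, 1.0])
# Made-up (ILD, ITD) observations, one per time-frequency point.
points = [(0.50, 0.0029), (0.52, 0.0030), (2.00, -0.0029), (1.90, -0.0030)]
scales = (0.2, 0.001)  # assumed kernel widths for the ILD and ITD axes
centres = refine(points, centres, scales)
labels = assign_sources(points, centres, scales)
```

Here `labels[i]` is the index of the source to which time-frequency point `i` is assigned. Keeping only the points labelled `j` in the mixture's time-frequency representation yields a binary mask for source `j`. The refinement starts from the distance-based estimates rather than from random values, which is the role the abstract assigns to the visual information.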

