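A binary time-frequency masking method based on source time-delay estimation is proposed, which achieves underdetermined blind separation of more than three speech source signals from three received signals. Exploiting the W-disjoint orthogonality of speech signals, the relative time-delay sequence with which each source signal arrives at the receiving array is estimated in the time-frequency domain. Based on these estimated time-delay sequences, a maximum likelihood algorithm then partitions the time-frequency domain into mutually disjoint sets of time-frequency points, equal in number to the source signals, each set containing (approximately) all time-frequency components of exactly one source signal. The source signal corresponding to each set is then recovered in turn by binary time-frequency masking. The performance of the method was verified by subjective listening, and its segmental signal-to-noise ratio gain is at least 13 dB. Compared with the degenerate unmixing estimation technique (DUET), the dissimilarity between the separated signals obtained by the proposed method and the actual source signals is reduced by about 3 dB.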
A binary time-frequency masking method based on time-delay estimation is proposed for underdetermined blind source separation. The method can separate more than three speech source signals using only three receiving array elements. First, the relative time-delay sequences of all sources are estimated in the time-frequency domain by exploiting the W-disjoint orthogonality of speech signals. Second, based on the estimated time-delay sequences, the maximum likelihood method is used to estimate the support domain of each signal. The time-frequency components in each support domain belong approximately to a single signal, and different support domains are mutually disjoint. Finally, the time-frequency representation of each signal is obtained by time-frequency masking, from which the time-domain source signals are recovered. Informal subjective listening tests validate the method, and the segmental signal-to-noise ratio gain is at least 13 dB. Compared with the degenerate unmixing estimation technique (DUET), the proposed method reduces the dissimilarity between the separated signals and the actual source signals by about 3 dB.
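
The delay-then-mask pipeline summarized above can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several simplifying assumptions: only two sensors are used instead of three, the time-frequency coefficients are synthesized directly with exactly disjoint supports (ideal W-disjoint orthogonality), the candidate source delays are taken as known rather than estimated, and all delays are small enough that the inter-sensor phase does not wrap. Under the further assumption of equal-variance Gaussian errors on the per-bin delay estimates, the maximum likelihood assignment reduces to choosing the nearest candidate delay. All function names (`make_mixture`, `estimate_delays`, `assign_bins`, `separate`) are hypothetical.

```python
import numpy as np

def make_mixture(delays, n_freq=64, n_frames=50, seed=0):
    """Synthesize TF-domain sources with disjoint supports and a 2-sensor mixture."""
    rng = np.random.default_rng(seed)
    # Angular frequencies in rad/sample; the DC bin is skipped to avoid dividing by 0.
    omega = np.pi * np.arange(1, n_freq + 1) / n_freq
    # Each TF bin is owned by exactly one source (assumed ideal disjointness).
    owner = rng.integers(0, len(delays), size=(n_freq, n_frames))
    coeff = (rng.standard_normal((n_freq, n_frames))
             + 1j * rng.standard_normal((n_freq, n_frames)))
    sources = np.stack([np.where(owner == j, coeff, 0) for j in range(len(delays))])
    # Sensor 1 is the reference; sensor 2 sees each source delayed by delays[j] samples.
    x1 = sources.sum(axis=0)
    phase = np.exp(-1j * omega[None, :, None] * np.asarray(delays)[:, None, None])
    x2 = (sources * phase).sum(axis=0)
    return sources, x1, x2, omega

def estimate_delays(x1, x2, omega):
    """Per-bin relative delay (in samples) from the inter-sensor phase difference."""
    return -np.angle(x2 * np.conj(x1)) / omega[:, None]

def assign_bins(delay_map, candidate_delays):
    """Label each bin with the nearest candidate delay (ML under equal-variance
    Gaussian delay errors, which is an assumption of this sketch)."""
    cand = np.asarray(candidate_delays)[:, None, None]
    return np.argmin(np.abs(delay_map[None] - cand), axis=0)

def separate(x1, labels, n_sources):
    """Apply one binary mask per source to the reference-sensor TF representation."""
    return np.stack([np.where(labels == j, x1, 0) for j in range(n_sources)])
```

A possible usage: call `make_mixture([-0.8, 0.0, 0.7])` to obtain three sources observed at two sensors, pass `x1`, `x2`, and `omega` to `estimate_delays`, label the bins with `assign_bins`, and recover the per-source representations with `separate`. With real recordings the time-frequency representations would come from a short-time Fourier transform, and the masked representations would be inverted back to the time domain.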