Dongli Research Big Data Discovery System (DRDS)

Robust Front-End for Speech Recognition Based on Computational Auditory Scene Analysis and Speaker Model Information

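ISSN: 0254-4156
Journal: Acta Automatica Sinica
Date: 0
Pages: 410-416
Language: Chinese
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; [2] Nokia Research Center China, Beijing 100176; [3] Digital Content Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Funding: Supported by the National Basic Research Program of China (973 Program) (2004CB318105), the National Natural Science Foundation of China (60675026, 60121302, 90820011), and the National High Technology Research and Development Program of China (863 Program) (20060101Z4073, 2006AA01Z194).
Related project: Research on a segment-model Mandarin LVCSR method guided by speech knowledge and a global optimality criterion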

Keywords: computational auditory scene analysis (CASA), speech segregation, robust speech recognition, factorial-max vector quantization (MAXVQ), speaker recognition

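Chinese abstract (translated):

Conventional noise-robust algorithms cannot solve the robustness problem of automatic speech recognition (ASR) systems when the background contains human speech. This paper proposes a mixed-speech separation system based on computational auditory scene analysis (CASA) and speaker model information. Within the CASA framework, the system uses speaker model information and factorial-max vector quantization (MAXVQ) to estimate real-valued masks, effectively separating the target speaker's speech from two-speaker mixtures and thereby providing a robust recognition front-end for ASR systems. Evaluation on the Speech Separation Challenge (SSC) dataset shows that, compared with the baseline system, the proposed system improves speech recognition accuracy by 15.68%. The experimental results also confirm the effectiveness of the proposed multi-speaker recognition and real-valued mask estimation.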

English abstract:

Conventional noise-robust speech recognition systems do not work well when human speech is present in the background. In this paper, a speech segregation system based on computational auditory scene analysis (CASA) and speaker models is proposed to solve this problem. By using speaker models and factorial-max vector quantization (MAXVQ) to estimate real-valued masks within the CASA framework, a robust front-end for speech recognition is constructed. Evaluations on the Speech Separation Challenge (SSC) data showed that the proposed system achieved a 15.68% improvement over the baseline system. The evaluation results also confirmed the validity of the multi-speaker recognition and real-valued mask estimation modules.
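
The mask-estimation idea in the abstract can be illustrated with a minimal sketch. It assumes each speaker model is reduced to a small codebook of natural-log power spectra (the toy values are invented), that a mixture frame is approximated by the element-wise maximum of one codeword per speaker (the MAXVQ assumption), and that the real-valued mask is the target's share of the summed power. The function names and the mask formula are illustrative assumptions, not the paper's implementation.

```python
import math
from itertools import product


def maxvq_decode(frame, codebook_a, codebook_b):
    """Find the codeword pair whose element-wise max best fits the frame.

    All spectra are lists of log-power values, one per frequency channel.
    Returns the indices (i, j) into codebook_a and codebook_b.
    """
    best = None
    pairs = product(enumerate(codebook_a), enumerate(codebook_b))
    for (i, ca), (j, cb) in pairs:
        # Squared error between the observed frame and the max-combination.
        err = sum((x - max(a, b)) ** 2 for x, a, b in zip(frame, ca, cb))
        if best is None or err < best[0]:
            best = (err, i, j)
    return best[1], best[2]


def real_valued_mask(target_cw, interferer_cw):
    """Soft mask in [0, 1] per channel (assumed formula).

    Equals exp(a) / (exp(a) + exp(b)) for log powers a (target) and
    b (interferer), written as a logistic function of a - b.
    """
    return [1.0 / (1.0 + math.exp(b - a))
            for a, b in zip(target_cw, interferer_cw)]


# Hypothetical toy codebooks: two codewords per speaker, three channels.
codebook_a = [[5.0, 1.0, 0.0], [0.0, 4.0, 1.0]]  # target speaker
codebook_b = [[1.0, 0.0, 6.0], [2.0, 2.0, 2.0]]  # interfering speaker
frame = [5.1, 0.9, 5.8]                          # observed mixture frame

pair = maxvq_decode(frame, codebook_a, codebook_b)
mask = real_valued_mask(codebook_a[pair[0]], codebook_b[pair[1]])
# Apply the mask to the mixture's linear power to estimate the target.
target_power = [m * math.exp(x) for m, x in zip(mask, frame)]
print(pair, [round(m, 3) for m in mask])
```

In this toy frame the target dominates the first channel and the interferer dominates the last, so the mask is close to 1 and close to 0 there. The exhaustive pair search costs the product of the two codebook sizes per frame, which is acceptable only for small codebooks.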

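Projects with papers in the same journal

Speech, Image, and Visual Computing

Journal papers: 155; conference papers: 74; books: 5

Research on key technologies and systems for spoken language translation in networked application environments

Journal papers: 13

Research on a segment-model Mandarin LVCSR method guided by speech knowledge and a global optimality criterion

Journal papers: 21; conference papers: 25; awards: 2; patents: 3

Research on new speech separation methods based on objective quality assessment and audio scene analysis

Journal papers: 36; conference papers: 22; awards: 2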

Journal papers from the same project

Linguistic theory based contextual evidence mining for statistical Chinese co-reference resolution

Efficient Night Gait Recogniti

Motion Detection in driving En

Topology-preserved Diffusion D

Dynamic Similarity Kernel for

Discriminative analysis of ear

Gait Recognition Based on Fusi

Fast global Motion Estimation

Interchange Format Based Chinese and English Language Gen

The Parallel Genetic Algorithm

Regional Coherence Changes in

Handedness-Related Functional

Regional Homogeneity Based App

Semantic-based Surveillance Vi

Altered Resting State Function

An Approach to Automatic Acqui

Characterizing The Dynamic Con

Combining Local Features for R

Structure and Motion of Nonrig

A Robust Method to Recognize Cr

Reconstruction of Structured S

Pixon-Based Image Segmentation

Altered Functional Connectivit

Can Image Measurement Errors b

Functional Dysconnectivity of

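Advances in Mandarin large-vocabulary continuous speech recognition systems

A study of duration and pitch in Mandarin prosodic phrases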

A SVD-Based Watermarking Schem

Esub8: A novel tool to predict

Affine Invariant Classificatio

Catadioptric Camera Calibratio

Alignment of Two Dimensional E

Region Growing Method for the

Global motion Model for stereo

Intensity-Based Robust Similar

A Combinational Feature Select

Silhouette Analysis based Gait

An Evolutionary Tabu Search fo

A Modified Gabor Filter Design

Brief Review of Invariant Text

A New Easy Camera Calibration

Personal Identification Based

Illumination Invariant Face Re

Visual Vehicle Tracking Algori

A New Constraint on the Imaged

Fingerprint Registration by Ma

Non-rigid Registration of Medi

Quantitative Analysis along th

Exploring Susceptibility Genes

A Co-Chunk based method for Sp

The Impossibility of Affine Re

A Chinese sentence paraphrasing method for spoken language translation

A hierarchical self-organizing

Single View Metrology from Sce

Volumetric Segmentation of the

Changes in Hippocampal Connect

Functional Disintegration in P

Asymmetry of Prefrontal Cortic

Automatic Gait Recognition bas

Normalization-cooperated gradi

PnP Revisited

Head tracking Using Shape and

Parametric Reconstruction of G

Improving Depth Resolution of

Characterizing Anatomical Diff

Implementation and Experimenta

A Neural Network Approach to S

A Comparative Study of Global

The Number of Independent Krup

A New Robust Circular Gabor Ba

Modulation of Functional Conne

Toward Practical Spoken Langua

Discriminative Analysis of Rel

The LLE and a Linear Mapping

The Relationship within and be

Side and Handedness Effects on

Dual Cross-Media Relevance Mod

Human Behavior Consistent Rele

A Mid-Level Scene Change Repre

Shape and Texture Based Deform

Image segmentation Using Co-EM

Multiple Similarities Based Ke

Automatic Moving Object Segmen

Semantic Knowledge Extraction

Local Intensity Variation Anal

Efficient Iris Recognition by

Design and Performance Studies

Improving Iris Recognition Acc

Camera Calibration and 3D Reco

Automated Short Proof Generati

Face annotation for family pho

Ensemble-based Discriminant Le

Research on robust parsing of spoken Chinese

3D Model Base Vehicle Tracking

Recent Developments in Human M

Kernel based Nonlinear Discrim

A Modified Temporal Self-Corre

Asymmetry Analysis of Cingulum

A Real-time Detecting Tracking

Single-Trial Variable Model fo

Do Singular Values Contain Ad

Spontaneous Activity Associate

A Note on the Convergence of t

Camera Calibration with Moving

People Tracking Based on Motio

Hippocampal Shape Analysis of

The Invariant Representations

Protein Folding Simulations of

Multi-context Fuzzy Clustering

Widespread Functional Dysconne

Prosody Conversion from Neutra

A Note on the Number of Soluti

A General Sufficient Condition

A Split-merge Based Region Gro

Whole Brain Functional Connect

Improving Performance of Refle

Learning Multi-View Face Subsp

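Chinese base NP recognition: an error-driven classifier combination method

A phrase translation pair extraction method based on a "relaxed scale"

An MRI study of gray matter volume abnormalities in Alzheimer's disease

Automatic discrimination of Alzheimer's disease based on MRI hippocampal shape features

An image registration method based on the Fourier-Mellin transform and extensions of its application

Fast global motion estimation based on a three-parameter model

Mandarin spoken digit string recognition based on a segment model with posterior probability decoding

How to increase the transparency of artificial neural networks

A microphone array post-filtering speech enhancement algorithm based on multiple statistical models and human auditory characteristics

A Chinese spoken language understanding method based on semantic classification trees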

The Number of Independent Kruppa Constraints from N Images

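A hierarchical syntactic parsing method for long Chinese sentences incorporating punctuation processing

A study on comprehensively evaluating the classification ability of kernel functions using statistical methods

A hybrid learning algorithm for ellipsoidal basis function neural networks

A broadband MUSIC sound source localization method using auditory filters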

Auditory filter based broadband MUSIC algorithm for sound source localization

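A signal subspace microphone array speech enhancement algorithm based on auditory perceptual properties

An improved monaural mixed speech separation method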

Unsupervised Learning of Gaussian Mixture Model with Application to Image Segmentation

From English pitch accent detection to Mandarin stress detection, where is the difference?

Monaural Voiced Speech Segregation Based on Dynamic Harmonic Function

Monaural speech separation based on MAXVQ and CASA for robust speech recognition

Research on a speech recognition decoding algorithm integrating induced probability

Integrating Induced Probability into Decoding for Large Vocabulary Continuous Speech Recognition

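Automatic detection of Mandarin prosodic breaks based on complementary models

A multichannel speech enhancement algorithm with F-norm-based signal subspace dimension estimation

Research on a prosody-dependent Mandarin speech recognition system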

Mandarin stress detection using acoustic, lexical and syntactic features

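Research on tone recognition in continuous Mandarin speech based on multi-space probability distributions

A monaural voiced speech separation system with improved harmonic grouping rules

Mandarin prosodic break classification based on the prosodic break hierarchy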

Monaural voiced speech segregation based on elaborate harmonic grouping strategies

Mandarin stress detection based on complementary models

Mandarin stress detection based on acoustic features and lexical and syntactic features

Robust front-end for speech recognition based on computational auditory scene analysis and speaker m

A modified monaural mixture speech separation method

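A signal subspace speech enhancement algorithm based on a Gaussian-Laplacian-Gamma model and the auditory masking effect of the human ear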

Perceptual properties based signal subspace microphone array speech enhancement algorithm

Monaural mixed speech separation based on multi-pitch tracking

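An interactive spoken language translation method with human-machine cooperation

A person name translation system based on multi-model fusion

Multi-system combination for machine translation based on confusion network decoding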

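Journal information

Acta Automatica Sinica
Chinese Science and Technology Core Journal

Supervising organization: Chinese Academy of Sciences
Sponsors: Chinese Association of Automation; Institute of Automation, Chinese Academy of Sciences
Editor-in-chief: Wang Feiyue
Address: 16 Donghuangchenggen North Street, Beijing
Postal code: 100717
Email: aas@ia.ac.cn
Phone: 010-64019820

International standard serial number: ISSN 0254-4156
Domestic unified serial number: CN 11-2109/TP
Postal distribution code: 2-180

Awards:
National Outstanding Journal Award in 1997; second prize of the Chinese Academy of Sciences Outstanding Journal Award in 1985, 1990, 1996, and 2000; National Journal Award in 2002

Indexed in domestic and international databases:
Mathematical Reviews (online edition, USA), Zentralblatt MATH (Germany), Scopus (Netherlands), Engineering Index (USA), Japan Science and Technology Agency database (Japan), Chinese Science and Technology Core Journals (China), Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)

Citations: 27550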