Bayesian feature enhancement using a mixture of unscented transformations for uncertainty decoding of noisy speech

Yusuke Shinohara, Masami Akamine

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

A new parameter estimation method for Model-Based Feature Enhancement (MBFE) is presented. The conventional MBFE uses the vector Taylor series to calculate the parameters of non-linearly transformed distributions, but the linearization degrades performance. We instead use the unscented transformation to estimate the parameters, propagating a minimal number of samples through the nonlinear transformation. By avoiding the linearization, the parameters are estimated more accurately. Experimental results on Aurora2 show that the proposed method reduces the word error rate by a relative 8.48% compared with the conventional MBFE, with only a modest increase in computational cost.
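The core idea in the abstract, estimating the moments of a nonlinearly transformed Gaussian by propagating a small set of sigma points instead of linearizing, can be illustrated with a short sketch. This is a minimal illustration and not the paper's implementation: the log-spectral mismatch function, the fixed noise mean, the basic 2n+1-point unscented transformation, and all names and numeric values below (unscented_transform, noise_mean, kappa) are standard textbook forms or hypothetical choices assumed for the example.

import numpy as np

def unscented_transform(mu, P, f, kappa=1.0):
    # Propagate a Gaussian N(mu, P) through a nonlinear function f
    # using 2n+1 sigma points (basic unscented transformation).
    n = mu.shape[0]
    # Matrix square root of (n + kappa) * P via Cholesky factorization.
    S = np.linalg.cholesky((n + kappa) * P)
    # Sigma points: the mean, plus/minus each column of the square-root matrix.
    sigma = np.vstack([mu, mu + S.T, mu - S.T])            # shape (2n+1, n)
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))      # sigma-point weights
    w[0] = kappa / (n + kappa)
    # Propagate each sigma point through the nonlinearity.
    Y = np.array([f(x) for x in sigma])                    # shape (2n+1, m)
    mu_y = w @ Y                                           # transformed mean
    D = Y - mu_y
    P_y = (w[:, None] * D).T @ D                           # transformed covariance
    return mu_y, P_y

# Hypothetical log-spectral mismatch: noisy = clean + log(1 + exp(noise - clean)),
# with a fixed noise mean, used here only to exercise the transform.
noise_mean = np.zeros(2)
f = lambda x: x + np.log1p(np.exp(noise_mean - x))

mu_x = np.array([3.0, 1.0])      # illustrative clean-speech Gaussian mean
P_x = np.diag([0.5, 0.8])        # illustrative clean-speech covariance
mu_y, P_y = unscented_transform(mu_x, P_x, f)
print(mu_y, P_y)

In a mixture setting, a sketch like this would be applied per Gaussian component, giving the transformed means and covariances that a vector-Taylor-series linearization would otherwise approximate.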

Original language: English
Title of host publication: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009
Pages: 4569-4572
Number of pages: 4
DOIs
Publication status: Published - 2009
Externally published: Yes
Event: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009 - Taipei, Taiwan, Province of China
Duration: 2009 Apr 19 - 2009 Apr 24

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 09/4/19 - 09/4/24

Keywords

  • Feature enhancement
  • Noisy speech recognition
  • Uncertainty decoding
  • Unscented transformation
  • Vector Taylor series

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
