Feature enhancement by speaker-normalized SPLICE for robust speech recognition

Yusuke Shinohara, Takashi Masuko, Masami Akamine

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

The SPLICE method of feature enhancement is known for its strong performance. It learns a mapping from noisy to clean feature vectors given a set of stereo training data. However, feature vector variation caused by speaker changes conceals the noise-induced variation, which is what SPLICE training is meant to capture. In this paper, an improvement of SPLICE by means of speaker normalization is proposed. The training data is first normalized with respect to speaker variation, and a mapping is learned afterward. CMLLR with a GMM as its target is used for the speaker normalization, where the GMM representing a standard speaker is learned via a novel variant of speaker adaptive training. The proposed method was evaluated on Aurora2 and achieved a relative word error rate reduction of 38% over conventional SPLICE.
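
As a rough illustration of the mapping the abstract describes, the sketch below implements a common bias-only form of SPLICE: a GMM over noisy features gives posteriors p(k | y), each component k gets a correction vector r_k estimated from stereo (clean, noisy) pairs, and the enhanced feature is y + sum_k p(k | y) r_k. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (gmm_posteriors, train_splice_bias, enhance) are hypothetical, the GMM parameters are set by hand rather than trained, the data is synthetic, and the speaker-normalization step (CMLLR toward a standard-speaker GMM) proposed in the paper is not shown.

```python
import numpy as np


def gmm_posteriors(y, weights, means, variances):
    """Posteriors p(k | y_n) under a diagonal-covariance GMM, shape (N, K)."""
    diff = y[:, None, :] - means[None, :, :]  # (N, K, D)
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def train_splice_bias(clean, noisy, weights, means, variances):
    """Estimate one correction vector r_k per GMM component from stereo data."""
    post = gmm_posteriors(noisy, weights, means, variances)  # (N, K)
    num = post.T @ (clean - noisy)                           # (K, D)
    den = post.sum(axis=0)[:, None]                          # (K, 1)
    return num / np.maximum(den, 1e-10)


def enhance(noisy, bias, weights, means, variances):
    """Apply the learned mapping: x_hat = y + sum_k p(k | y) r_k."""
    post = gmm_posteriors(noisy, weights, means, variances)
    return noisy + post @ bias


# Synthetic stereo data (assumption): the "noise" is a constant offset of +2.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 3))
noisy = clean + 2.0

# Hand-set two-component GMM over the noisy features (assumption, not trained).
weights = np.array([0.5, 0.5])
means = np.array([[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]])
variances = np.ones((2, 3))

bias = train_splice_bias(clean, noisy, weights, means, variances)
enhanced = enhance(noisy, bias, weights, means, variances)
```

In this toy setting the distortion is the same for every frame, so every component learns the same correction. With real data the corrections differ per component, and speaker-dependent variation in the features can blur them, which is the problem the proposed speaker normalization addresses.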

Original language: English
Title of host publication: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Pages: 4881-4884
Number of pages: 4
DOIs: https://doi.org/10.1109/ICASSP.2008.4518751
Publication status: Published - 2008 Sep 16
Externally published: Yes
Event: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP - Las Vegas, NV, United States
Duration: 2008 Mar 31 to 2008 Apr 4

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Country: United States
City: Las Vegas, NV
Period: 08/3/31 to 08/4/4

Keywords

  • Feature enhancement
  • Robust speech recognition
  • SPLICE
  • Speaker adaptive training
  • Speaker normalization

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Shinohara, Y., Masuko, T., & Akamine, M. (2008). Feature enhancement by speaker-normalized SPLICE for robust speech recognition. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP (pp. 4881-4884). [4518751] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2008.4518751