Speech recognition in a home environment using parallel decoding with GMM-based noise modeling

Kohei Machida, Takashi Nose, Akinori Ito

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a method for noise-robust speech recognition in a home environment based on noise modeling and parallel decoding. The proposed method rests on three basic ideas. First, we model the noise signals observed in the environment using a Gaussian mixture model (GMM). Second, we generate multiple noise-reduced signals using the mean vectors of the GMM and decode these signals in parallel. Third, we choose the best recognition result from among the multiple results based on their confidence scores. The proposed method is simple and straightforward, yet it is more effective than simple noise reduction. Experiments showed that the method is effective not only for noise signals from the database but also for noise observed in a real home environment.
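The three steps described in the abstract can be illustrated with a short, standard-library Python sketch. It is not the authors' implementation and rests on several assumptions: the noise GMM is trained on log mel filterbank (FBANK) frames (FBANK appears among the keywords), "noise reduction using the mean vectors" is taken to mean power-domain subtraction of each mean from the noisy frames, and `decode` is a hypothetical recognizer wrapper that returns a hypothesis and its confidence score. The names `fit_noise_gmm`, `subtract_noise` and `recognize` are illustrative only.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical decoder interface (assumption): features in, (hypothesis, confidence) out.
Decoder = Callable[[List[Vector]], Tuple[str, float]]


def _log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def fit_noise_gmm(noise: List[Vector], k: int, iters: int = 20, var_floor: float = 1e-3):
    """Step 1: fit a k-component diagonal GMM to noise frames with plain EM."""
    n, d = len(noise), len(noise[0])
    means = [list(noise[(j * n) // k]) for j in range(k)]  # deterministic initialization
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: component posteriors per frame, computed with log-sum-exp for stability.
        resp = []
        for x in noise:
            logp = [math.log(weights[j]) + _log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logp)
            probs = [math.exp(lp - top) for lp in logp]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and floored variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # avoid division by zero
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, noise)) / nj for t in range(d)]
            variances[j] = [max(var_floor,
                                sum(r[j] * (x[t] - means[j][t]) ** 2
                                    for r, x in zip(resp, noise)) / nj)
                            for t in range(d)]
    return weights, means, variances


def subtract_noise(frames: List[Vector], noise_mean: Vector, floor: float = 0.01) -> List[Vector]:
    """Step 2 (assumed scheme): power-domain subtraction of one log-FBANK noise mean."""
    cleaned = []
    for frame in frames:
        row = []
        for y_log, n_log in zip(frame, noise_mean):
            y_pow, n_pow = math.exp(y_log), math.exp(n_log)
            # Flooring keeps the power positive so the log stays finite.
            row.append(math.log(max(y_pow - n_pow, floor * y_pow)))
        cleaned.append(row)
    return cleaned


def recognize(frames: List[Vector], noise_means: List[Vector], decode: Decoder) -> Tuple[str, float]:
    """Steps 2-3: decode one noise-reduced copy per GMM mean; keep the most confident result."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda mu: decode(subtract_noise(frames, mu)), noise_means))
    return max(results, key=lambda r: r[1])
```

In use, `weights, means, _ = fit_noise_gmm(noise_frames, k=4)` would be followed by `recognize(utterance_frames, means, my_decoder)`, where `my_decoder` is a hypothetical wrapper around whichever recognizer is available. The thread pool only gives real parallelism when the wrapped decoder releases the GIL or runs in an external process; with a pure-Python decoder the candidates are effectively decoded one after another.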

Original language: English
Title of host publication: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9786163618238
Publication status: Published - 2014 Feb 12
Event: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014 - Chiang Mai, Thailand
Duration: 2014 Dec 9 to 2014 Dec 12

Publication series

Name: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014

Other

Other: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014
Country: Thailand
City: Chiang Mai
Period: 14/12/9 to 14/12/12

Keywords

  • Confidence measure
  • FBANK
  • Gaussian Mixture Model
  • Noise modeling
  • Speech recognition in noise

ASJC Scopus subject areas

  • Signal Processing
  • Information Systems
