Leveraging a small corpus by different frame shifts for training of a speech recognizer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

During feature extraction for speech recognition, a window function is first applied to the input waveform to extract a temporally limited spectrum. By shifting the window function by a short time period, the temporal change of the speech spectrum can be analyzed. This time period is called “the frame shift,” and it is usually 5 to 10 ms. In this paper, the frame shift is reconsidered from two aspects. The first is whether 10 ms is an appropriate frame shift. Frame-based processing assumes that the speech spectrum changes slowly compared with the frame shift, an assumption that does not hold for some kinds of consonants, such as plosives. This paper shows experimentally that feature values fluctuate considerably depending on the starting position of the first frame. A training method is then proposed that uses temporally shifted samples as independent samples, to compensate for the feature fluctuation caused by differences in where framing begins. The second aspect is that the frame shift could be longer if this fluctuation can be compensated for. To test this, an experiment was conducted in which the frame shift was varied from 10 to 60 ms. A 40 ms frame shift outperformed a 10 ms frame shift, and a 50 ms frame shift gave recognition performance comparable to that of a 10 ms frame shift.
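The idea of treating temporally shifted framings as separate training samples can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not state the window type, window length, feature type, or how many start offsets were used, so the Hamming window, the 25 ms window length, the log-energy feature, the 10 ms offset step, and the function names `frame_log_energy` and `shifted_training_views` are all assumptions made for this example.

```python
import math


def hamming(n):
    """Hamming window of length n (assumed window type)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_log_energy(x, frame_len, frame_shift, offset=0):
    """Log energy of each windowed frame; the first frame starts at `offset`.

    Log energy stands in for a real feature vector (an assumption);
    the framing logic is what the sketch is meant to show.
    """
    window = hamming(frame_len)
    feats = []
    start = offset
    while start + frame_len <= len(x):
        energy = sum((x[start + i] * window[i]) ** 2 for i in range(frame_len))
        feats.append(math.log(energy + 1e-10))  # small floor avoids log(0)
        start += frame_shift
    return feats


def shifted_training_views(x, sample_rate, frame_ms=25.0, shift_ms=40.0, step_ms=10.0):
    """Return one feature sequence per start offset in [0, frame shift).

    Each sequence frames the same waveform with a different starting
    position and can be used as an additional training sample.
    """
    frame_len = round(sample_rate * frame_ms / 1000)
    frame_shift = round(sample_rate * shift_ms / 1000)
    step = round(sample_rate * step_ms / 1000)
    return [
        frame_log_energy(x, frame_len, frame_shift, offset)
        for offset in range(0, frame_shift, step)
    ]


# Usage: one second of a synthetic 440 Hz tone sampled at 16 kHz.
signal = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
views = shifted_training_views(signal, 16000)
```

With a 40 ms frame shift and a 10 ms offset step, one utterance yields four differently framed feature sequences, which is how a long frame shift could still expose the recognizer to features computed at many starting positions.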

Original language: English
Title of host publication: Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing - Proceeding of the Fourteenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing
Editors: Lakhmi C. Jain, Pei-Wei Tsai, Akinori Ito, Jeng-Shyang Pan
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 82-89
Number of pages: 8
ISBN (Print): 9783030037475
Publication status: Published - 2019
Event: 14th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2018 - Sendai, Japan
Duration: 2018 Nov 26 to 2018 Nov 28

Publication series

Name: Smart Innovation, Systems and Technologies
Volume: 110
ISSN (Print): 2190-3018
ISSN (Electronic): 2190-3026

Other

Other: 14th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2018
Country/Territory: Japan
City: Sendai
Period: 18/11/26 to 18/11/28

Keywords

  • Frame shift
  • Speech recognition
  • Windowing

ASJC Scopus subject areas

  • Decision Sciences (all)
  • Computer Science (all)
