Improved reference speaker weighting using aspect model

Seong Jun Hahm, Yuichi Ohkawa, Masashi Ito, Motoyuki Suzuki, Akinori Ito, Shozo Makino

Research output: Contribution to journalArticlepeer-review

1 Citation (Scopus)


We propose an improved reference speaker weighting (RSW) and speaker cluster weighting (SCW) approach that uses an aspect model. The concept of the approach is that the adapted model is a linear combination of a few latent reference models obtained from a set of reference speakers. The aspect model has specific latent-space characteristics that differ from orthogonal basis vectors of eigenvoice. The aspect model is a "mixture-of-mixture" model. We first calculate a small number of latent reference models as mixtures of distributions of the reference speaker's models, and then the latent reference models are mixed to obtain the adapted distribution. The mixture weights are calculated based on the expectation maximization (EM) algorithm. We use the obtained mixture weights for interpolating mean parameters of the distributions. Both training and adaptation are performed based on likelihood maximization with respect to the training and adaptation data, respectively. We conduct a continuous speech recognition experiment using a Korean database (KAIST-TRADE). The results are compared to those of a conventional MAP, MLLR, RSW, eigenvoice and SCW. Absolute word accuracy improvement of 2.06 point was achieved using the proposed method, even though we use only 0.3 s of adaptation data.

Original languageEnglish
Pages (from-to)1927-1935
Number of pages9
JournalIEICE Transactions on Information and Systems
Issue number7
Publication statusPublished - 2010 Jul


  • Aspect model
  • Latent reference model
  • Reference speaker weighting
  • Speaker adaptation

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence


Dive into the research topics of 'Improved reference speaker weighting using aspect model'. Together they form a unique fingerprint.

Cite this