TY - GEN
T1 - One sentence voice adaptation using GMM-based frequency-warping and shift with a sub-band basis spectrum model
AU - Tamura, Masatsune
AU - Morita, Masahiro
AU - Kagoshima, Takehiko
AU - Akamine, Masami
PY - 2011
Y1 - 2011
N2 - This paper presents a rapid voice adaptation algorithm using GMM-based frequency warping and shift with parameters of a sub-band basis spectrum model (SBM)[1]. The SBM parameter represents the shape of a speech spectrum. It is calculated by fitting a sub-band basis to the log-spectrum. Since the parameter is a frequency-domain representation, frequency warping can be applied directly to the SBM parameter. A frequency warping function that minimizes the distance between source and target SBM parameter pairs in each mixture component of a GMM is derived using a dynamic programming (DP) algorithm. The proposed method is evaluated in a unit-selection-based voice adaptation framework applied to a unit-fusion-based text-to-speech synthesizer. The experimental results show that the proposed adaptation method is effective for rapid voice adaptation using just one sentence, compared to the conventional GMM-based linear transformation of mel-cepstra.
AB - This paper presents a rapid voice adaptation algorithm using GMM-based frequency warping and shift with parameters of a sub-band basis spectrum model (SBM)[1]. The SBM parameter represents the shape of a speech spectrum. It is calculated by fitting a sub-band basis to the log-spectrum. Since the parameter is a frequency-domain representation, frequency warping can be applied directly to the SBM parameter. A frequency warping function that minimizes the distance between source and target SBM parameter pairs in each mixture component of a GMM is derived using a dynamic programming (DP) algorithm. The proposed method is evaluated in a unit-selection-based voice adaptation framework applied to a unit-fusion-based text-to-speech synthesizer. The experimental results show that the proposed adaptation method is effective for rapid voice adaptation using just one sentence, compared to the conventional GMM-based linear transformation of mel-cepstra.
KW - frequency warping
KW - sub-band basis spectrum model
KW - unit fusion speech synthesis
KW - voice adaptation
UR - http://www.scopus.com/inward/record.url?scp=80051619373&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80051619373&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2011.5947510
DO - 10.1109/ICASSP.2011.5947510
M3 - Conference contribution
AN - SCOPUS:80051619373
SN - 9781457705397
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5124
EP - 5127
BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Y2 - 22 May 2011 through 27 May 2011
ER -