Photo-realistic expressive text to talking head synthesis

Vincent Wan, Robert Anderson, Art Blokland, Norbert Braunschweiler, Langzhou Chen, Bala Krishna Kolluru, Javier Latorre, Ranniery Maia, Björn Stenger, Kayoko Yanagisawa, Yannis Stylianou, Masami Akamine, Mark J.F. Gales, Roberto Cipolla

Research output: Conference article, peer-reviewed

15 citations (Scopus)


A controllable computer-animated avatar that could serve as a natural user interface for computers is demonstrated. Driven by text and emotion input, it generates expressive speech with corresponding facial movements. To create the avatar, HMM-based text-to-speech synthesis is combined with active appearance model (AAM)-based facial animation. The novelty lies in the degree of control achieved over the expressiveness of both the speech and the face while keeping the controls simple. Controllability is achieved by training both the speech and facial parameters within a cluster adaptive training (CAT) framework. CAT creates a continuous, low-dimensional eigenspace of expressions, which allows the creation of expressions of different intensity (including ones more intense than those in the original recordings) and the combination of different expressions to create new ones. Results on an emotion-recognition task show that recognition rates given the synthetic output are comparable to those given the original videos of the speaker.
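The abstract's key idea, a continuous low-dimensional eigenspace of expressions, can be sketched as simple linear algebra. The following toy example (NumPy, with made-up matrix sizes and weight vectors, not values from the paper) illustrates how scaling an expression's offset from neutral changes its intensity, including extrapolating beyond the recorded intensity, and how blending two weight vectors yields a new expression:

```python
import numpy as np

# Hypothetical illustration of a CAT-style expression eigenspace.
# Each expression corresponds to a weight vector in a low-dimensional
# space; synthesis parameters are obtained as M @ weights, where the
# columns of M are cluster mean vectors learned during training.
rng = np.random.default_rng(0)
dim_params, dim_space = 8, 3                       # toy sizes
M = rng.standard_normal((dim_params, dim_space))   # toy cluster means

neutral = np.array([1.0, 0.0, 0.0])                # assumed weight vectors
happy = np.array([1.0, 0.9, 0.1])
sad = np.array([1.0, -0.3, 0.8])

def expression_params(weights):
    """Map eigenspace weights to speech/face parameter means."""
    return M @ weights

def scaled(expr, alpha):
    """Scale the offset from neutral: alpha > 1 extrapolates to a more
    intense expression than recorded; 0 < alpha < 1 attenuates it."""
    return neutral + alpha * (expr - neutral)

mild_happy = expression_params(scaled(happy, 0.5))
intense_happy = expression_params(scaled(happy, 1.5))

# Blending two expressions in the same space creates a new one.
happy_sad_mix = expression_params(0.5 * happy + 0.5 * sad)
```

Because the space is continuous, any weight vector is a valid expression, which is what makes intensity control and expression mixing possible with a handful of scalar controls.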

Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2013
Event: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France
Duration: 25 Aug 2013 - 29 Aug 2013

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
