Multi-task learning of hierarchical vision-language representation

Duy Kien Nguyen, Takayuki Okatani

Research output: Conference contribution

9 citations (Scopus)

Abstract

It is still challenging to build an AI system that can perform tasks involving vision and language at a human level. So far, researchers have singled out individual tasks, designing a network for each and training it on that task's dedicated datasets. Although this approach has seen a certain degree of success, it makes it difficult to understand the relations among different tasks and to transfer the knowledge learned for one task to others. We propose a multi-task learning approach that learns a vision-language representation shared by many tasks from their diverse datasets. The representation is hierarchical, and the prediction for each task is computed from the representation at its corresponding level of the hierarchy. We show through experiments that our method consistently outperforms previous single-task-learning methods on image caption retrieval, visual question answering, and visual grounding. We also analyze the learned hierarchical representation by visualizing attention maps generated in our network.

Original language: English
Title of host publication: Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Publisher: IEEE Computer Society
Pages: 10484-10493
Number of pages: 10
ISBN (electronic): 9781728132938
DOI
Publication status: Published - Jun 2019
Event: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 - Long Beach, United States
Duration: 16 Jun 2019 to 20 Jun 2019

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2019-June
ISSN (print): 1063-6919

Conference

Conference: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Country/Territory: United States
City: Long Beach
Period: 16 Jun 2019 to 20 Jun 2019

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
