Learning to describe E-commerce images from noisy online data

Takuya Yashima, Naoaki Okazaki, Kentaro Inui, Kota Yamaguchi, Takayuki Okatani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Recent studies show successful results in generating a proper language description for a given image, where the focus is on detecting and describing the contextual relationships in the image, such as the kind of object, the relationship between two objects, or the action. In this paper, we turn our attention to more subjective components of descriptions that contain rich expressions to modify objects, namely attribute expressions. We start by collecting a large number of product images from the online market site Etsy, and consider learning a language generation model using a popular combination of a convolutional neural network (CNN) and a recurrent neural network (RNN). Our Etsy dataset contains unique noise characteristics that often arise in online markets. We first apply natural language processing techniques to extract high-quality, learnable examples from the real-world noisy data. We then learn a generation model from product images with associated title descriptions, and examine how e-commerce-specific metadata and fine-tuning improve the generated expressions. The experimental results suggest that we are able to learn from the noisy online data and produce product descriptions that are closer to human-written descriptions, with possibly subjective attribute expressions.

Original language: English
Title of host publication: Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers
Editors: Ko Nishino, Shang-Hong Lai, Vincent Lepetit, Yoichi Sato
Publisher: Springer-Verlag
Pages: 85-100
Number of pages: 16
ISBN (Print): 9783319541921
DOIs: https://doi.org/10.1007/978-3-319-54193-8_6
Publication status: Published - 2017 Jan 1
Event: 13th Asian Conference on Computer Vision, ACCV 2016 - Taipei, Taiwan, Province of China
Duration: 2016 Nov 20 to 2016 Nov 24

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10115 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th Asian Conference on Computer Vision, ACCV 2016
Country: Taiwan, Province of China
City: Taipei
Period: 16/11/20 to 16/11/24

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Yashima, T., Okazaki, N., Inui, K., Yamaguchi, K., & Okatani, T. (2017). Learning to describe E-commerce images from noisy online data. In K. Nishino, S-H. Lai, V. Lepetit, & Y. Sato (Eds.), Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers (pp. 85-100). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10115 LNCS). Springer-Verlag. https://doi.org/10.1007/978-3-319-54193-8_6