Large-scale taxonomy problem: A mixed machine learning approach

Quentin Labernia, Yashio Kabashima, Michimasa Irie, Toshiyuki Oike, Kohei Asano, Jinhee Chun, Takeshi Tokuyama

Research output: Contribution to journal › Conference article › peer-review


The Rakuten Data Challenge poses the Large-Scale Taxonomy Challenge. Given a large number of product titles and the category paths leading to these products, we want to predict the category path of a given product based only on its title. The provided paths are structured as a forest of 14 trees. The learning process is split into two steps: we first retrieve the tree the input belongs to and then predict the category path within that tree. We describe the data embedding, which is an important task in this challenge, and then introduce the so-called two-step architecture. The original idea is based on a deep neural network model. We also introduce a modified second step, since the original second step is not efficient enough. This modified technique uses multiple sets of random forest classifiers to navigate inside each tree.
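The two-step routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `TokenVoteClassifier` and `TwoStepPathPredictor`, the `STOP` marker, and the toy data are all hypothetical, and a simple token-voting classifier stands in for both the deep neural network and the random forest classifiers so that the example needs only the standard library. The sketch shows only the control flow: pick a tree first, then descend node by node inside it.

```python
from collections import Counter, defaultdict

STOP = "<stop>"  # hypothetical marker meaning "the path ends at this node"


class TokenVoteClassifier:
    """Stand-in classifier (NOT a random forest): each title token votes
    for the labels it co-occurred with during training."""

    def fit(self, titles, labels):
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for title, label in zip(titles, labels):
            for token in title.lower().split():
                self.token_counts[token][label] += 1
        return self

    def predict(self, title):
        scores = Counter()
        for token in title.lower().split():
            scores.update(self.token_counts.get(token, {}))
        if not scores:  # no known token: fall back to the most frequent label
            return self.label_counts.most_common(1)[0][0]
        return scores.most_common(1)[0][0]


class TwoStepPathPredictor:
    """Step 1 picks the tree (root category); step 2 walks down that tree
    with one classifier per visited node."""

    def fit(self, titles, paths):
        # Step 1: a single classifier over the tree roots.
        self.root_clf = TokenVoteClassifier().fit(titles, [p[0] for p in paths])
        # Step 2: group training examples by path prefix, one classifier each.
        grouped = defaultdict(lambda: ([], []))
        for title, path in zip(titles, paths):
            for depth in range(1, len(path) + 1):
                next_label = path[depth] if depth < len(path) else STOP
                grouped[path[:depth]][0].append(title)
                grouped[path[:depth]][1].append(next_label)
        self.node_clfs = {
            prefix: TokenVoteClassifier().fit(node_titles, node_labels)
            for prefix, (node_titles, node_labels) in grouped.items()
        }
        return self

    def predict(self, title):
        path = (self.root_clf.predict(title),)
        while path in self.node_clfs:
            next_label = self.node_clfs[path].predict(title)
            if next_label == STOP:
                break
            path += (next_label,)
        return path


# Toy data (invented for illustration): two trees instead of the challenge's 14.
titles = ["red cotton shirt", "blue denim jeans",
          "usb phone charger", "leather phone case"]
paths = [("clothing", "tops"), ("clothing", "bottoms"),
         ("electronics", "chargers"), ("electronics", "cases")]

model = TwoStepPathPredictor().fit(titles, paths)
print(model.predict("cotton shirt"))
print(model.predict("phone charger"))
```

Keying each second-step classifier by its path prefix keeps every decision local to one node of one tree, which is the property the abstract attributes to navigating inside each tree. A classifier of another kind could be substituted for `TokenVoteClassifier` as long as it offers the same `fit` and `predict` methods.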

Original language: English
Journal: CEUR Workshop Proceedings
Publication status: Published - 2018
Event: 2018 SIGIR Workshop On eCommerce, eCom 2018 - Ann Arbor, United States
Duration: 2018 Jul 12 → …


Keywords

  • Deep neural network
  • Machine learning
  • Natural language processing
  • Random forest

ASJC Scopus subject areas

  • Computer Science (all)

