Abstract
The Rakuten Data Challenge poses the Large-Scale Taxonomy Challenge. Given a large number of product titles and the category paths leading to these products, the goal is to predict the category path of a given product based only on its title. The provided paths are structured as a forest of 14 trees. The learning process is split into two steps: we first retrieve the tree the input belongs to, and then predict the category path within that tree. We describe the data embedding, which is an important task in this challenge, and then introduce the so-called two-step architecture. The original idea is based on a deep neural network model. We also introduce a modified second step, since the original second step is not efficient enough. This modified technique uses multiple sets of random forest classifiers to navigate inside each tree.
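
The two-step routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `TokenVoteClassifier` is a hypothetical toy stand-in for the deep neural network and random forest models, the second step is simplified to a single classifier per tree rather than multiple sets of classifiers navigating level by level, and the `>`-separated path strings and sample titles are assumptions made for illustration.

```python
from collections import Counter, defaultdict


class TokenVoteClassifier:
    """Toy stand-in model (hypothetical): each title token votes for the
    labels it co-occurred with during training."""

    def __init__(self):
        self.votes = defaultdict(Counter)  # token -> label counts
        self.prior = Counter()             # label frequencies, used as fallback

    def fit(self, titles, labels):
        for title, label in zip(titles, labels):
            self.prior[label] += 1
            for token in title.lower().split():
                self.votes[token][label] += 1
        return self

    def predict(self, title):
        tally = Counter()
        for token in title.lower().split():
            tally.update(self.votes.get(token, {}))
        if not tally:  # no known token: fall back to the most frequent label
            tally = self.prior
        return tally.most_common(1)[0][0]


class TwoStepClassifier:
    """Step 1 picks the tree (root category); step 2 picks the full path
    using a model trained only on titles from that tree."""

    def __init__(self, make_model=TokenVoteClassifier):
        self.make_model = make_model
        self.root_model = None
        self.path_models = {}  # root label -> model over paths in that tree

    def fit(self, titles, paths):
        # Assumption: a path is a '>'-separated string whose first element
        # identifies the tree.
        roots = [path.split(">")[0] for path in paths]
        self.root_model = self.make_model().fit(titles, roots)

        grouped = defaultdict(lambda: ([], []))
        for title, path, root in zip(titles, paths, roots):
            grouped[root][0].append(title)
            grouped[root][1].append(path)
        for root, (tree_titles, tree_paths) in grouped.items():
            self.path_models[root] = self.make_model().fit(tree_titles, tree_paths)
        return self

    def predict(self, title):
        root = self.root_model.predict(title)          # step 1: retrieve the tree
        return self.path_models[root].predict(title)   # step 2: path inside it
```

Because the model factory is passed in as `make_model`, the toy classifier could be swapped for any object exposing the same `fit` and `predict` methods, for example a wrapper around a text embedding followed by a random forest.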
Original language | English |
---|---|
Journal | CEUR Workshop Proceedings |
Volume | 2319 |
Publication status | Published - 2018 |
Event | 2018 SIGIR Workshop On eCommerce, eCom 2018 - Ann Arbor, United States; Duration: 2018 Jul 12 → … |
Keywords
- Deep neural network
- Machine learning
- Natural language processing
- Random forest
ASJC Scopus subject areas
- Computer Science(all)