The Rakuten Data Challenge proposes the Large-Scale Taxonomy Challenge: given a large number of product titles and the category paths leading to these products, predict the category path of a product from its title alone. The provided paths are structured as a forest of 14 trees. We split the learning process into two steps: we first retrieve the tree to which the input belongs and then predict the category path within that tree. We describe the data embedding, which is an important task in this challenge, and then introduce the so-called two-step architecture. The original idea is based on a deep neural network model. Because the original second step is not efficient enough, we also introduce a modified second step. This latter technique uses multiple sets of random forest classifiers to navigate inside each tree.
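The two-step routing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `TokenVoteClassifier` below is a hypothetical toy stand-in for both the deep neural network and the random forest classifiers, the `>`-separated path format and the sample titles are assumptions made for illustration, and the title embedding is reduced to lowercase whitespace tokens.

```python
from collections import Counter, defaultdict


class TokenVoteClassifier:
    """Toy stand-in classifier (hypothetical): scores each label by how often
    the query tokens appeared in that label's training titles."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.label_counts = Counter()             # label -> number of titles

    def fit(self, titles, labels):
        for title, label in zip(titles, labels):
            self.label_counts[label] += 1
            self.token_counts[label].update(title.lower().split())
        return self

    def predict(self, title):
        tokens = title.lower().split()

        def score(label):
            overlap = sum(self.token_counts[label][t] for t in tokens)
            # Ties on token overlap are broken by label frequency.
            return (overlap, self.label_counts[label])

        return max(sorted(self.label_counts), key=score)


class TwoStepPathClassifier:
    """Step 1 selects the tree (root category); step 2 selects a full
    category path using a model trained only on that tree's examples."""

    def fit(self, titles, paths):
        # Assumption: a path is written "root>child>...>leaf".
        roots = [path.split(">")[0] for path in paths]
        self.tree_model = TokenVoteClassifier().fit(titles, roots)

        grouped = defaultdict(lambda: ([], []))
        for title, path, root in zip(titles, paths, roots):
            grouped[root][0].append(title)
            grouped[root][1].append(path)

        # One second-step model per tree in the forest.
        self.path_models = {
            root: TokenVoteClassifier().fit(tree_titles, tree_paths)
            for root, (tree_titles, tree_paths) in grouped.items()
        }
        return self

    def predict(self, title):
        root = self.tree_model.predict(title)         # step 1: pick the tree
        return self.path_models[root].predict(title)  # step 2: pick the path


# Made-up training data with two trees (roots "1" and "2").
titles = [
    "red cotton shirt",
    "blue denim jeans",
    "usb phone charger",
    "wireless phone headset",
]
paths = ["1>10>100", "1>11>110", "2>20>200", "2>21>210"]

model = TwoStepPathClassifier().fit(titles, paths)
```

Calling `model.predict("green cotton shirt")` first routes the title to tree `1` and then chooses a path among that tree's examples only. Restricting the second step to a single tree keeps each per-tree model small, which is the motivation for splitting the problem in two.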
ASJC Scopus subject areas
- General Computer Science