Cross-lingual learning-to-rank with shared representations

Shota Sasaki, Shuo Sun, Shigehiko Schamoni, Kevin Duh, Kentaro Inui

Research output: Chapter in Book/Report/Conference proceedingConference contribution

14 Citations (Scopus)

Abstract

Cross-lingual information retrieval (CLIR) is a document retrieval task where the documents are written in a language different from that of the user's query. This is a challenging problem for data-driven approaches due to the general lack of labeled training data. We introduce a large-scale dataset derived from Wikipedia to support CLIR research in 25 languages. Further, we present a simple yet effective neural learning-to-rank model that shares representations across languages and reduces the data requirement. This model can exploit training data in, for example, Japanese-English CLIR to improve the results of Swahili-English CLIR.

Original languageEnglish
Title of host publicationShort Papers
PublisherAssociation for Computational Linguistics (ACL)
Pages458-463
Number of pages6
ISBN (Electronic)9781948087292
Publication statusPublished - 2018
Event2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018 - New Orleans, United States
Duration: 2018 Jun 12018 Jun 6

Publication series

NameNAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
Volume2

Conference

Conference2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018
CountryUnited States
CityNew Orleans
Period18/6/118/6/6

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Computer Science Applications

Fingerprint Dive into the research topics of 'Cross-lingual learning-to-rank with shared representations'. Together they form a unique fingerprint.

Cite this