PointWise HSIC: A linear-time kernelized co-occurrence norm for sparse linguistic expressions

Sho Yokoi, Sosuke Kobayashi, Kenji Fukumizu, Jun Suzuki, Kentaro Inui

Research output: Conference contribution

Abstract

In this paper, we propose a new kernel-based co-occurrence measure that can be applied to sparse linguistic expressions (e.g., sentences) with a very short learning time, as an alternative to pointwise mutual information (PMI). As well as deriving PMI from mutual information, we derive this new measure from the Hilbert-Schmidt independence criterion (HSIC); thus, we call the new measure the pointwise HSIC (PHSIC). PHSIC can be interpreted as a smoothed variant of PMI that allows various similarity metrics (e.g., sentence embeddings) to be plugged in as kernels. Moreover, PHSIC can be estimated by simple and fast (linear in the size of the data) matrix calculations regardless of whether we use linear or nonlinear kernels. Empirically, in a dialogue response selection task, PHSIC is learned thousands of times faster than an RNN-based PMI while outperforming PMI in accuracy. In addition, we also demonstrate that PHSIC is beneficial as a criterion of a data selection task for machine translation owing to its ability to give high (low) scores to a consistent (inconsistent) pair with other pairs.
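The estimation step described in the abstract can be illustrated with a minimal sketch. It assumes each expression is already represented by an explicit finite-dimensional feature vector (for example, a sentence embedding used with a linear kernel), and it uses the standard empirical cross-covariance form of a pointwise HSIC score. The abstract does not spell out the estimator, so the function names `fit_phsic` and `phsic_score` and this particular formulation are illustrative assumptions, not the authors' code.

```python
import numpy as np


def fit_phsic(X, Y):
    """Fit the statistics needed to score pairs.

    X: (n, dx) feature vectors of the first elements of n observed pairs.
    Y: (n, dy) feature vectors of the second elements of the same pairs.
    Assumption: features are explicit vectors (linear kernel on embeddings).
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    # Empirical cross-covariance matrix, shape (dx, dy).
    # A single pass over the data: cost grows linearly with n.
    C = (X - x_mean).T @ (Y - y_mean) / X.shape[0]
    return x_mean, y_mean, C


def phsic_score(x, y, x_mean, y_mean, C):
    """Score one pair (x, y): centered x, cross-covariance, centered y."""
    return float((x - x_mean) @ C @ (y - y_mean))
```

In this sketch, fitting is one matrix product over the n training pairs, and scoring a new pair costs only a dx-by-dy bilinear form. Averaging the scores over the training pairs gives the squared Frobenius norm of `C`, which is the empirical HSIC under a linear kernel, mirroring the way averaging PMI over the data gives mutual information.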

Original language: English
Title of host publication: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun'ichi Tsujii
Publisher: Association for Computational Linguistics
Pages: 1763-1775
Number of pages: 13
ISBN (electronic): 9781948087841
Publication status: Published - 2020/1/1
Event: 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 - Brussels, Belgium
Duration: 2018/10/31 to 2018/11/4

Publication series

Name: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018

Conference

Conference: 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
Country: Belgium
City: Brussels
Period: 18/10/31 to 18/11/4

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

