Interpretable adversarial perturbation in input embedding space for text

Motoki Sato, Jun Suzuki, Hiroyuki Shindo, Yuji Matsumoto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Citations (Scopus)

Abstract

Following great success in the image processing field, the idea of adversarial training has been applied to tasks in the natural language processing (NLP) field. One promising approach directly applies adversarial training developed in the image processing field to the input word embedding space instead of the discrete input space of texts. However, this approach gives up interpretability, such as the ability to generate adversarial texts, in exchange for significantly improving the performance of NLP tasks. This paper restores interpretability to such methods by restricting the directions of perturbations toward the existing words in the input embedding space. As a result, we can straightforwardly reconstruct each input with perturbations to an actual text by considering the perturbations to be the replacement of words in the sentence while maintaining or even improving the task performance.
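The core idea in the abstract, restricting perturbations to directions that point toward existing words, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy vocabulary, the gradient values, and the choice to weight each direction by its alignment with the loss gradient and scale the weights to norm `epsilon` are all assumptions made for illustration.

```python
import math

def unit_directions(embeddings, word_id):
    """Unit vectors pointing from the word at word_id toward every other word."""
    base = embeddings[word_id]
    directions = {}
    for k, vec in enumerate(embeddings):
        if k == word_id:
            continue  # no direction from a word to itself
        diff = [v - b for v, b in zip(vec, base)]
        norm = math.sqrt(sum(x * x for x in diff))
        if norm > 0.0:
            directions[k] = [x / norm for x in diff]
    return directions

def restricted_perturbation(embeddings, word_id, grad, epsilon=1.0):
    """Combine word-directed unit vectors, weighted by their alignment with grad.

    Returns (perturbation, weights); weights maps candidate word id -> weight.
    The weighting scheme is an assumption of this sketch.
    """
    directions = unit_directions(embeddings, word_id)
    # Alignment of each direction with the loss gradient (a dot product).
    scores = {k: sum(d_i * g_i for d_i, g_i in zip(d, grad))
              for k, d in directions.items()}
    total = math.sqrt(sum(s * s for s in scores.values()))
    dim = len(embeddings[word_id])
    if total == 0.0:
        return [0.0] * dim, {k: 0.0 for k in scores}
    # Scale the weight vector so that its norm equals epsilon.
    weights = {k: epsilon * s / total for k, s in scores.items()}
    perturbation = [sum(weights[k] * directions[k][i] for k in directions)
                    for i in range(dim)]
    return perturbation, weights

def most_likely_replacement(weights):
    """Read the perturbation as a word replacement: the most heavily weighted target."""
    return max(weights, key=weights.get)

# Toy 2-D vocabulary (hypothetical values, for illustration only).
vocab = ["good", "great", "bad"]
embeddings = [[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
grad = [-1.0, 0.0]  # assumed gradient of the loss w.r.t. the embedding of "good"

perturbation, weights = restricted_perturbation(embeddings, 0, grad, epsilon=0.5)
print(vocab[most_likely_replacement(weights)])  # prints "bad"
```

Because every component of the perturbation is tied to a specific target word, the largest weight can be read directly as "replace this word with that one", which is the kind of interpretability the abstract refers to. An unrestricted perturbation in embedding space has no such reading.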

Original language: English
Title of host publication: Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018
Editors: Jerome Lang
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 4323-4330
Number of pages: 8
ISBN (Electronic): 9780999241127
Publication status: Published - 2018
Event: 27th International Joint Conference on Artificial Intelligence, IJCAI 2018 - Stockholm, Sweden
Duration: 2018 Jul 13 to 2018 Jul 19

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2018-July
ISSN (Print): 1045-0823

Other

Other: 27th International Joint Conference on Artificial Intelligence, IJCAI 2018
Country: Sweden
City: Stockholm
Period: 18/7/13 to 18/7/19

ASJC Scopus subject areas

  • Artificial Intelligence
