TY - GEN
T1 - Attention is not only a weight: Analyzing transformers with vector norms
T2 - 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
AU - Kobayashi, Goro
AU - Kuribayashi, Tatsuki
AU - Yokoi, Sho
AU - Inui, Kentaro
N1 - Funding Information:
We would like to thank the anonymous reviewers of EMNLP 2020 and the ACL 2020 Student Research Workshop (SRW), and the SRW mentor Junjie Hu, for their insightful comments. We also thank the members of the Tohoku NLP Laboratory for helpful comments. This work was supported by JSPS KAKENHI Grant Number JP19H04162. This work was also partially supported by a Bilateral Joint Research Program between the RIKEN AIP Center and Tohoku University.
Publisher Copyright:
© 2020 Association for Computational Linguistics.
PY - 2020
Y1 - 2020
N2 - Attention is a key component of Transformers, which have recently achieved considerable success in natural language processing. Hence, attention is being extensively studied to investigate various linguistic capabilities of Transformers, focusing on analyzing the parallels between attention weights and specific linguistic phenomena. This paper shows that attention weights are only one of the two factors that determine the output of attention and proposes a norm-based analysis that incorporates the second factor, the norm of the transformed input vectors. The findings of our norm-based analyses of BERT and a Transformer-based neural machine translation system include the following: (i) contrary to previous studies, BERT pays little attention to special tokens, and (ii) reasonable word alignment can be extracted from the attention mechanisms of the Transformer. These findings provide insights into the inner workings of Transformers.
AB - Attention is a key component of Transformers, which have recently achieved considerable success in natural language processing. Hence, attention is being extensively studied to investigate various linguistic capabilities of Transformers, focusing on analyzing the parallels between attention weights and specific linguistic phenomena. This paper shows that attention weights are only one of the two factors that determine the output of attention and proposes a norm-based analysis that incorporates the second factor, the norm of the transformed input vectors. The findings of our norm-based analyses of BERT and a Transformer-based neural machine translation system include the following: (i) contrary to previous studies, BERT pays little attention to special tokens, and (ii) reasonable word alignment can be extracted from the attention mechanisms of the Transformer. These findings provide insights into the inner workings of Transformers.
UR - http://www.scopus.com/inward/record.url?scp=85108701909&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108701909&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85108701909
T3 - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 7057
EP - 7075
BT - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
Y2 - 16 November 2020 through 20 November 2020
ER -