TY - GEN
T1 - Improving neural machine translation by incorporating hierarchical subword features
AU - Morishita, Makoto
AU - Suzuki, Jun
AU - Nagata, Masaaki
N1 - Publisher Copyright:
© 2018 COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings. All rights reserved.
PY - 2018
Y1 - 2018
N2 - This paper focuses on subword-based Neural Machine Translation (NMT). We hypothesize that in the NMT model, the appropriate subword units for the following three modules (layers) can differ: (1) the encoder embedding layer, (2) the decoder embedding layer, and (3) the decoder output layer. We find that the subword units based on Sennrich et al. (2016) have the property that a large vocabulary is a superset of a small vocabulary, and we modify the NMT model so that it can incorporate several different subword units in a single embedding layer. We refer to these small subword features as hierarchical subword features. To empirically investigate our assumption, we compare the performance of several different subword units and hierarchical subword features for both the encoder and decoder embedding layers. We confirmed that incorporating hierarchical subword features into the encoder consistently improves BLEU scores on the IWSLT evaluation datasets.
AB - This paper focuses on subword-based Neural Machine Translation (NMT). We hypothesize that in the NMT model, the appropriate subword units for the following three modules (layers) can differ: (1) the encoder embedding layer, (2) the decoder embedding layer, and (3) the decoder output layer. We find that the subword units based on Sennrich et al. (2016) have the property that a large vocabulary is a superset of a small vocabulary, and we modify the NMT model so that it can incorporate several different subword units in a single embedding layer. We refer to these small subword features as hierarchical subword features. To empirically investigate our assumption, we compare the performance of several different subword units and hierarchical subword features for both the encoder and decoder embedding layers. We confirmed that incorporating hierarchical subword features into the encoder consistently improves BLEU scores on the IWSLT evaluation datasets.
UR - http://www.scopus.com/inward/record.url?scp=85084056115&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084056115&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85084056115
T3 - COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
SP - 618
EP - 629
BT - COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
A2 - Bender, Emily M.
A2 - Derczynski, Leon
A2 - Isabelle, Pierre
PB - Association for Computational Linguistics (ACL)
T2 - 27th International Conference on Computational Linguistics, COLING 2018
Y2 - 20 August 2018 through 26 August 2018
ER -