Bibliography
[1] 慧立, 彦悰, and 道宣. 大慈恩寺三藏法師傳. Volume 2. 中华书局, 2000 (cited
on page 27).
[2] 中国翻译协会. 2019 中国语言服务行业发展报告. 中国翻译协会, 2019 (cited
on page 27).
[3] 姚恺 and 赵军. “改革探讨创新进发展——全国翻译专业学位研究生
教育 2019 年会综述”. In: 中国翻译, 2019 (cited on page 27).
[4] James Knowlson. Universal Language Schemes in England and France 1600-1800.
University of Toronto Press, 1975 (cited on page 27).
[5] Claude E. Shannon. “A mathematical theory of communication”. In: volume 27.
3. Bell System Technical Journal, 1948, pages 379–423 (cited on page 27).
[6] Claude E. Shannon and Warren Weaver. The Mathematical Theory of Communi-
cation. University of Illinois Press, 1949 (cited on page 27).
[7] Warren Weaver. “Translation”. In: volume 14. 15-23. Cambridge: Technology Press,
MIT, 1955, page 10 (cited on page 27).
[8] Noam Chomsky. “Syntactic Structures”. In: volume 33. 3. Language, 1957 (cited
on pages 28, 99).
[9] Peter F. Brown, John Cocke, Stephen Della Pietra, Vincent J. Della Pietra, Freder-
ick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. “A Statistical
Approach to Machine Translation”. In: volume 16. 2. Computational Linguistics,
1990, pages 79–85 (cited on pages 29, 41, 161).
[10] Peter F. Brown, Stephen Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer.
“The Mathematics of Statistical Machine Translation: Parameter Estimation”. In:
volume 19. 2. Computational Linguistics, 1993, pages 263–311 (cited on pages 29,
145, 146, 161, 163, 181, 183, 186, 190, 229).
[11] Makoto Nagao. “A framework of a mechanical translation between Japanese and
English by analogy principle”. In: Artificial and human intelligence, 1984, pages 351–
354 (cited on pages 29, 40).
[12] Satoshi Sato and Makoto Nagao. “Toward Memory-based Translation”. In: Inter-
national Conference on Computational Linguistics, 1990, pages 247–252 (cited on
page 29).
[13] Sergei Nirenburg. “Knowledge-based machine translation”. In: volume 4. 1. Springer,
1989, pages 5–24 (cited on page 35).
[14] William John Hutchins. Machine translation: past, present, future. Ellis Horwood
Chichester, 1986 (cited on page 35).
[15] Michael Zarechnak. “The history of machine translation”. In: volume 1979. Ma-
chine Translation, 1979, pages 1–87 (cited on page 35).
[16] 冯志伟. 机器翻译研究. 中国对外翻译出版公司, 2004 (cited on page 36).
[17] Dan Jurafsky and James H. Martin. Speech and language processing: an introduc-
tion to natural language processing, computational linguistics, and speech recog-
nition, 2nd Edition. Prentice Hall, Pearson Education International, 2009 (cited on
pages 36, 66, 146).
[18] . 述语 (CTRDL)”. In:
volume 5. 4. 中文信息学报, 1991 (cited on page 39).
[19] 姚天顺 and 唐泓英. “基于搭配词典的词汇语义驱动算法”. In: volume 6. A01. 软件
学报, 1995, pages 78–85 (cited on page 39).
[20] William A. Gale and Kenneth W. Church. “A program for aligning sentences in
bilingual corpora”. In: volume 19. 1. Computational Linguistics, 1993, pages 75–
102 (cited on page 41).
[21] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. “Sequence to Sequence Learning
with Neural Networks”. In: Advances in Neural Information Processing Systems,
2014, pages 3104–3112 (cited on pages 42, 348, 359).
[22] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. “Neural Machine Trans-
lation by Jointly Learning to Align and Translate”. In: International Conference on
Learning Representations, 2015 (cited on pages 42, 198, 343, 348, 359, 369, 374,
386, 398, 477, 633).
[23] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan
Gomez, Lukasz Kaiser, and Illia Polosukhin. “Attention is All You Need”. In:
Advances in Neural Information Processing Systems, 2017, pages 5998–6008
(cited on pages 42, 78, 198, 338, 349, 352, 386, 410, 411, 426, 514, 525).
[24] Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin.
“Convolutional Sequence to Sequence Learning”. In: volume 70. International Con-
ference on Machine Learning, 2017, pages 1243–1252 (cited on pages 42, 349, 352,
387, 394).
[25] Thang Luong, Hieu Pham, and Christopher D. Manning. “Effective Approaches to
Attention-based Neural Machine Translation”. In: Conference on Empirical Meth-
ods in Natural Language Processing, 2015, pages 1412–1421 (cited on pages 42,
359, 369, 386, 398, 506).
[26] Philipp Koehn. Statistical Machine Translation. Cambridge University Press, 2010
(cited on page 44).
[27] Philipp Koehn. Neural Machine Translation. Cambridge University Press, 2020
(cited on page 44).
[28] Christopher D. Manning and Hinrich Schütze. Founda-
tions of statistical natural language processing. Massachusetts Institute of Tech-
nology Press, 1999 (cited on page 44).
[29] 宗成庆. 统计自然语言处理. 清华大学出版社, 2013 (cited on page 44).
[30] Ian J. Goodfellow, Yoshua Bengio, and Aaron C. Courville. Deep Learning. MIT
Press, 2016 (cited on pages 44, 343).
[31] Yoav Goldberg. “Neural network methods for natural language processing”. In: vol-
ume 10. 1. Morgan & Claypool Publishers, 2017, pages 1–309 (cited on pages 44,
343).
[32] 周志华. 机器学习. 清华大学出版社, 2016 (cited on pages 44, 96).
[33] 李航. 统计学习方法. 清华大学出版社, 2019 (cited on pages 44, 96, 97).
[34] 邱锡鹏. 神经网络与深度学习. 机械工业出版社, 2020 (cited on page 44).
[35] 魏宗舒. 概率论与数理统计教程: 第二版. 北京: 高等教育出版社, 2011 (cited
on page 48).
[36] Andrei Nikolaevich Kolmogorov and Albert T. Bharucha-Reid. Foundations of the
theory of probability: Second English Edition. Courier Dover Publications, 2018
(cited on page 48).
[37] 刘克. 实用马尔可夫决策过程. 清华大学出版社, 2004 (cited on page 60).
[38] A. Barbour and Sidney Resnick. “Adventures in Stochastic Processes.” In: vol-
ume 88. Journal of the American Statistical Association, Dec. 1993, page 1474
(cited on page 60).
[39] Irving J Good. “The population frequencies of species and the estimation of popu-
lation parameters”. In: volume 40. 3-4. Oxford University Press, 1953, pages 237–
264 (cited on page 63).
[40] William A. Gale and Geoffrey Sampson. “Good-Turing Frequency Estimation With-
out Tears”. In: volume 2. 3. Journal of Quantitative Linguistics, 1995, pages 217–
237 (cited on page 63).
[41] Reinhard Kneser and Hermann Ney. “Improved backing-off for M-gram language
modeling”. In: International Conference on Acoustics, Speech, and Signal Process-
ing, 1995, pages 181–184 (cited on page 64).
[42] Stanley F. Chen and Joshua Goodman. “An empirical study of smoothing tech-
niques for language modeling”. In: volume 13. 4. Computer Speech & Language,
1999, pages 359–393 (cited on pages 64, 66, 78).
[43] Hermann Ney and Ute Essen. “On smoothing techniques for bigram-based natu-
ral language modelling”. In: International Conference on Acoustics, Speech, and
Signal Processing, 1991, pages 825–828 (cited on page 65).
[44] Hermann Ney, Ute Essen, and Reinhard Kneser. “On structuring probabilistic de-
pendences in stochastic language modelling”. In: volume 8. 1. Computer Speech
& Language, 1994, pages 1–38 (cited on pages 65, 66).
[45] Kenneth Heafield. “KenLM: Faster and Smaller Language Model Queries”. In: An-
nual Meeting of the Association for Computational Linguistics, 2011, pages 187–
197 (cited on pages 66, 78).
[46] Andreas Stolcke. “SRILM - an extensible language modeling toolkit”. In: Interna-
tional Conference on Spoken Language Processing, 2002 (cited on page 66).
[47] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction
to Algorithms. The MIT Press and McGraw-Hill Book Company, 1989 (cited on
page 69).
[48] Shimon Even. Graph algorithms. Cambridge University Press, 2011 (cited on page 72).
[49] Robert Endre Tarjan. “Depth-First Search and Linear Graph Algorithms”. In: vol-
ume 1. 2. SIAM Journal on Computing, 1972, pages 146–160 (cited on page 72).
[50] Ashish Sabharwal and Bart Selman. “S. Russell, P. Norvig, Artificial Intelligence:
A Modern Approach, Third Edition”. In: volume 175. 5-6. Artificial Intelligence,
2011, pages 935–937 (cited on page 74).
[51] Sartaj Sahni and Ellis Horowitz. Fundamentals of Computer Algorithms. Computer
Science Press, 1978 (cited on page 74).
[52] Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. “A Formal Basis for the Heuris-
tic Determination of Minimum Cost Paths”. In: volume 4. 2. IEEE Transactions on
Systems Science and Cybernetics, 1968, pages 100–107 (cited on page 75).
[53] Bruce T. Lowerre. The HARPY speech recognition system. Carnegie Mellon Uni-
versity, 1976 (cited on page 75).
[54] Christopher M. Bishop. Neural networks for pattern recognition. Oxford university
press, 1995 (cited on page 75).
[55] Karl Johan Åström. “Optimal control of Markov processes with incomplete state in-
formation”. In: volume 10. 1. Journal of Mathematical Analysis and Applications,
1965, pages 174–205 (cited on page 75).
[56] Richard E. Korf. “Real-time heuristic search”. In: volume 42. 2. Artificial Intelli-
gence, 1990, pages 189–211 (cited on page 75).
[57] Liang Huang, Kai Zhao, and Mingbo Ma. “When to Finish? Optimal Beam Search
for Neural Text Generation (modulo beam size)”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2017, pages 2134–2139 (cited on pages 77,
477).
[58] Yilin Yang, Liang Huang, and Mingbo Ma. “Breaking the Beam Search Curse: A
Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Transla-
tion”. In: Annual Meeting of the Association for Computational Linguistics, 2018,
pages 3054–3059 (cited on pages 77, 477, 479).
[59] F. Jelinek. “Interpolated estimation of Markov source parameters from sparse data”.
In: Pattern Recognition in Practice, 1980, pages 381–397 (cited on page 78).
[60] S. Katz. “Estimation of probabilities from sparse data for the language model com-
ponent of a speech recognizer”. In: volume 35. 3. IEEE Transactions on Acoustics,
Speech, and Signal Processing, 1987, pages 400–401 (cited on page 78).
[61] Timothy C. Bell, John G. Cleary, and Ian H. Witten. Text compression. Prentice
Hall, 1990 (cited on page 78).
[62] I.H. Witten and T.C. Bell. “The zero-frequency problem: estimating the probabili-
ties of novel events in adaptive text compression”. In: volume 37. 4. IEEE Trans-
actions on Information Theory, 1991, pages 1085–1094 (cited on page 78).
[63] Joshua T. Goodman. “A bit of progress in language modeling”. In: volume 15. 4.
Computer Speech & Language, 2001, pages 403–434 (cited on page 78).
[64] Katrin Kirchhoff and Mei Yang. “Improved Language Modeling for Statistical Ma-
chine Translation”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2005, pages 125–128 (cited on page 78).
[65] Ruhi Sarikaya and Yonggang Deng. “Joint Morphological-Lexical Language Mod-
eling for Machine Translation”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2007, pages 145–148 (cited on page 78).
[66] Philipp Koehn and Hieu Hoang. “Factored Translation Models”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2007, pages 868–876 (cited
on page 78).
[67] Marcello Federico and Mauro Cettolo. “Efficient Handling of N-gram Language
Models for Statistical Machine Translation”. In: Annual Meeting of the Association
for Computational Linguistics, 2007, pages 88–95 (cited on page 78).
[68] Marcello Federico and Nicola Bertoldi. “How Many Bits Are Needed To Store
Probabilities for Phrase-Based Translation?” In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2006, pages 94–101 (cited on page 78).
[69] David Talbot and Miles Osborne. “Smoothed Bloom Filter Language Models: Tera-
Scale LMs on the Cheap”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2007, pages 468–476 (cited on page 78).
[70] David Talbot and Miles Osborne. “Randomised Language Modelling for Statistical
Machine Translation”. In: Annual Meeting of the Association for Computational
Linguistics, 2007, pages 512–519 (cited on page 78).
[71] Kun Jing and Jungang Xu. “A Survey on Neural Network Language Models.” In:
arXiv preprint arXiv:1906.03591, 2019 (cited on page 78).
[72] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. “A neu-
ral probabilistic language model”. In: volume 3. 6. Journal of Machine Learning
Research, 2003, pages 1137–1155 (cited on pages 78, 126, 288, 334).
[73] Tomas Mikolov, Martin Karafiát, Lukás Burget, Jan Cernocký, and Sanjeev Khu-
danpur. “Recurrent neural network based language model”. In: International Speech
Communication Association, 2010, pages 1045–1048 (cited on pages 78, 288, 336).
[74] Martin Sundermeyer, Ralf Schlüter, and Hermann Ney. “LSTM Neural Networks
for Language Modeling”. In: International Speech Communication Association,
2012, pages 194–197 (cited on page 78).
[75] Franz Josef Och, Nicola Ueffing, and Hermann Ney. “An Efficient A* Search Algo-
rithm for Statistical Machine Translation”. In: Proceedings of the ACL Workshop
on Data-Driven Methods in Machine Translation, 2001 (cited on page
78).
[76] Ye-Yi Wang and Alex Waibel. “Decoding Algorithm in Statistical Machine Trans-
lation”. In: Morgan Kaufmann Publishers, 1997, pages 366–372 (cited on pages 78,
229).
[77] Christoph Tillmann, Stephan Vogel, Hermann Ney, and Alex Zubiaga. “A DP-
based Search Using Monotone Alignments in Statistical Translation”. In: Morgan
Kaufmann Publishers, 1997, pages 289–296 (cited on pages 78, 228).
[78] Ulrich Germann, Michael Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada.
“Fast Decoding and Optimal Decoding for Machine Translation”. In: Morgan Kauf-
mann Publishers, 2001, pages 228–235 (cited on pages 78, 179).
[79] Ulrich Germann. “Greedy decoding for statistical machine translation in almost
linear time”. In: Annual Meeting of the Association for Computational Linguistics,
2003, pages 1–8 (cited on pages 78, 179).
[80] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Fed-
erico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens,
Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. “Moses: Open
Source Toolkit for Statistical Machine Translation”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2007 (cited on pages 78, 198, 215, 228,
472, 475, 631).
[81] Philipp Koehn. “Pharaoh: A Beam Search Decoder for Phrase-Based Statistical
Machine Translation Models”. In: volume 3265. Springer, 2004, pages 115–124
(cited on pages 78, 198, 228, 472).
[82] S. Bangalore and G. Riccardi. “A finite-state approach to machine translation”. In:
Annual Meeting of the Association for Computational Linguistics, 2001, pages 381–
388 (cited on page 78).
[83] Srinivas Bangalore and Giuseppe Riccardi. “Stochastic Finite-State Models for
Spoken Language Machine Translation”. In: volume 17. 3. Machine Translation,
2002, pages 165–184 (cited on page 78).
[84] Ashish Venugopal, Andreas Zollmann, and Vogel Stephan. “An Efficient Two-
Pass Approach to Synchronous-CFG Driven Statistical MT”. In: Annual Meeting
of the Association for Computational Linguistics, 2007, pages 500–507 (cited on
page 78).
[85] Andreas Zollmann, Ashish Venugopal, Matthias Paulik, and Stephan Vogel. “The
Syntax Augmented MT (SAMT) System at the Shared Task for the 2007 ACL
Workshop on Statistical Machine Translation”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2007, pages 216–219 (cited on pages 78,
632).
[86] Yang Liu, Qun Liu, and Shouxun Lin. “Tree-to-String Alignment Template for
Statistical Machine Translation”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2006 (cited on pages 78, 251, 278).
[87] Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei
Wang, and Ignacio Thayer. “Scalable Inference and Training of Context-Rich Syn-
tactic Translation Models”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2006 (cited on pages 78, 251, 258, 278).
[88] David Chiang. “A Hierarchical Phrase-Based Model for Statistical Machine Trans-
lation”. In: Annual Meeting of the Association for Computational Linguistics, 2005,
pages 263–270 (cited on pages 78, 236, 278).
[89] Rico Sennrich, Barry Haddow, and Alexandra Birch. “Neural Machine Translation
of Rare Words with Subword Units”. In: Annual Meeting of the Association for
Computational Linguistics, 2016 (cited on pages 81, 435, 436, 480, 549).
[90] 刘挺, 吴岩, and 王开铸. “最大概率分词问题及其解法”. In: 06. 哈尔滨工业大
学学报, 1998, pages 37–41 (cited on page 86).
[91] 丁洁. “基于最大概率分词算法的中文分词方法研究”. In: 21. 科技信息, 2010,
pages I0075–I0075 (cited on page 86).
[92] Richard Bellman. “Dynamic programming”. In: volume 153. 3731. Science, 1966,
pages 34–37 (cited on page 86).
[93] Kevin Humphreys, Robert J. Gaizauskas, Saliha Azzam, Charles Huyck, Brian
Mitchell, Hamish Cunningham, and Yorick Wilks. University of Sheffield: Descrip-
tion of the LaSIE-II system as used for MUC-7. Annual Meeting of the Association
for Computational Linguistics, 1998 (cited on page 88).
[94] George Krupka and Kevin Hausman. “IsoQuest Inc.: Description of the NetOwl™
Extractor System as Used for MUC-7”. In: Annual Meeting of the Association for
Computational Linguistics, 1998 (cited on page 88).
[95] William J Black, Fabio Rinaldi, and David Mowatt. “FACILE: Description of the
NE System Used for MUC-7”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 1998 (cited on page 88).
[96] Sean R Eddy. “Hidden Markov models.” In: volume 6. 3. Current Opinion in Struc-
tural Biology, 1996, pages 361–365 (cited on pages 88, 90).
[97] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. “Conditional
Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data”.
In: Proceedings of the Eighteenth International Conference on Machine Learning,
2001, pages 282–289 (cited on pages 88, 94, 95, 106).
[98] Jagat Narain Kapur. Maximum-entropy models in science and engineering. John
Wiley & Sons, 1989 (cited on page 88).
[99] Marti A. Hearst, Susan T Dumais, Edgar Osuna, John Platt, and Bernhard Scholkopf.
“Support vector machines”. In: volume 13. 4. IEEE Intelligent Systems & Their
Applications, 1998, pages 18–28 (cited on page 88).
[100] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu,
and Pavel Kuksa. “Natural Language Processing (almost) from Scratch”. In: vol-
ume 12. 1. Journal of Machine Learning Research, 2011, pages 2493–2537 (cited
on pages 88, 343, 406, 552).
[101] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami,
and Chris Dyer. “Neural Architectures for Named Entity Recognition”. In: Annual
Meeting of the Association for Computational Linguistics, 2016, pages 260–270
(cited on page 89).
[102] Leonard E Baum and Ted Petrie. “Statistical Inference for Probabilistic Functions
of Finite State Markov Chains”. In: volume 37. 6. Annals of Mathematical Statistics,
1966, pages 1554–1563 (cited on page 90).
[103] Leonard E Baum, Ted Petrie, George Soules, and Norman Weiss. “A maximization
technique occurring in the statistical analysis of probabilistic functions of Markov
chains”. In: volume 41. 1. Annals of Mathematical Statistics, 1970, pages 164–171
(cited on pages 90, 92).
[104] Arthur P Dempster, Nan M Laird, and Donald B Rubin. “Maximum likelihood
from incomplete data via the EM algorithm”. In: volume 39. 1. Journal of the Royal
Statistical Society: Series B (Methodological), 1977, pages 1–22 (cited on page 92).
[105] Andrew Viterbi. “Error bounds for convolutional codes and an asymptotically op-
timum decoding algorithm”. In: volume 13. 2. IEEE Transactions on Information
Theory, 1967, pages 260–269 (cited on page 92).
[106] Peter Harrington. “机器学习实战”. In: 人民邮电出版社, 2013 (cited on
page 97).
[107] Andrew Y Ng and Michael I Jordan. “On Discriminative vs. Generative Classi-
fiers: A comparison of logistic regression and naive Bayes”. In: MIT Press, 2001,
pages 841–848 (cited on page 106).
[108] Christopher D Manning, Hinrich Schütze, and Prabhakar Raghavan. Introduction
to information retrieval. Cambridge university press, 2008 (cited on page 106).
[109] Adam Berger, Stephen A Della Pietra, and Vincent J Della Pietra. “A maximum
entropy approach to natural language processing”. In: volume 22. 1. Computational
linguistics, 1996, pages 39–71 (cited on page 106).
[110] Tom Mitchell. Machine Learning. McGraw Hill, 1997 (cited on page 106).
[111] Franz Josef Och and Hermann Ney. “Discriminative Training and Maximum En-
tropy Models for Statistical Machine Translation”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2002, pages 295–302 (cited on pages 106,
207).
[112] Liang Huang. “Coling 2008: Advanced Dynamic Programming in Computational
Linguistics: Theory, Algorithms and Applications-Tutorial notes”. In: International
Conference on Computational Linguistics, 2008 (cited on page 106).
[113] Mehryar Mohri, Fernando Pereira, and Michael Riley. “Speech recognition with
weighted finite-state transducers”. In: Springer, 2008, pages 559–584 (cited on
page 106).
[114] Alfred V Aho and Jeffrey D Ullman. The theory of parsing, translation, and com-
piling. Prentice-Hall Englewood Cliffs, NJ, 1973 (cited on page 106).
[115] Thorsten Brants. “TnT - A Statistical Part-of-Speech Tagger”. In: Annual Meeting
of the Association for Computational Linguistics, 2000, pages 224–231 (cited on
page 106).
[116] Yoshimasa Tsuruoka and Jun’ichi Tsujii. “Chunk Parsing Revisited”. In: Annual
Meeting of the Association for Computational Linguistics, 2005, pages 133–140
(cited on page 106).
[117] Sujian Li, Houfeng Wang, Shiwen Yu, and Chengsheng Xin. “News-Oriented Au-
tomatic Chinese Keyword Indexing”. In: Annual Meeting of the Association for
Computational Linguistics, 2003, pages 92–97 (cited on page 106).
[118] Noam Chomsky. Lectures on government and binding: The Pisa lectures. Walter
de Gruyter, 1993 (cited on page 106).
[119] Zhiheng Huang, Wei Xu, and Kai Yu. “Bidirectional LSTM-CRF Models for Se-
quence Tagging”. In: CoRR, 2015 (cited on page 106).
[120] Jason PC Chiu and Eric Nichols. “Named entity recognition with bidirectional
LSTM-CNNs”. In: volume 4. MIT Press, 2016, pages 357–370 (cited on page 106).
[121] Andrej Zukov Gregoric, Yoram Bachrach, and Sam Coope. “Named Entity Recog-
nition With Parallel Recurrent Neural Networks”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2018, pages 69–74 (cited on page 107).
[122] Jing Li, Aixin Sun, Jianglei Han, and Chenliang Li. “A Survey on Deep Learning
for Named Entity Recognition”. In: volume PP. 99. IEEE Transactions on Knowl-
edge and Data Engineering, 2020, pages 1–1 (cited on page 107).
[123] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. “BERT: Pre-
training of deep bidirectional transformers for language understanding”. In: Annual
Meeting of the Association for Computational Linguistics, 2019, pages 4171–4186
(cited on pages 107, 127, 475, 493, 548, 552–554, 575).
[124] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. “Improving
language understanding by generative pre-training”. In: 2018 (cited on pages 107,
127, 552–554).
[125] Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guil-
laume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer,
and Veselin Stoyanov. “Unsupervised Cross-lingual Representation Learning at
Scale”. In: Annual Meeting of the Association for Computational Linguistics, 2020,
pages 8440–8451 (cited on page 107).
[126] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-jing Zhu. “Bleu: a Method
for Automatic Evaluation of Machine Translation”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2002, pages 311–318 (cited on pages 109,
117, 118).
[127] Kenneth W Church and Eduard H Hovy. “Good applications for crummy machine
translation”. In: volume 8. 4. Springer, 1993, pages 239–258 (cited on page 110).
[128] John B. Carroll. “An experiment in evaluating the quality of translations”. In:
volume 9. 3-4. Mechanical Translation and Computational Linguistics, 1966,
pages 55–66 (cited on
page 113).
[129] John S. White, Theresa A. O'Connell, and Francis E. O'Mara. “The ARPA MT eval-
uation methodologies: evolution, lessons, and future approaches”. In: Proceedings
of the First Conference of the Association for Machine Translation in the Americas,
1994 (cited on pages 113, 114).
[130] Keith J. Miller and Michelle Vanni. “Inter-rater Agreement Measures, and the Re-
finement of Metrics in the PLATO MT Evaluation Paradigm”. In: The tenth Ma-
chine Translation Summit, 2005, pages 125–132 (cited on page 113).
[131] Margaret King, Andrei Popescu-Belis, and Eduard Hovy. “FEMTI: creating and
using a framework for MT evaluation”. In: Proceedings of MT Summit IX, New
Orleans, LA, 2003, pages 224–231 (cited on page
113).
[132] Mark A. Przybocki, Kay Peterson, Sebastien Bronsart, and Gregory A. Sanders.
“The NIST 2008 Metrics for machine translation challenge - overview, methodol-
ogy, metrics, and results”. In: volume 23. 2-3. Machine Translation, 2009, pages 71–
103 (cited on page 114).
[133] Florence Reeder. “Direct application of a language learner test to MT evaluation”.
In: Proceedings of AMTA, 2006 (cited on page 114).
[134] Chris Callison-Burch, Cameron S. Fordyce, Philipp Koehn, Christof Monz, and
Josh Schroeder. “(Meta-) Evaluation of Machine Translation”. In: Annual Meeting
of the Association for Computational Linguistics, 2007, pages 136–158 (cited on
page 114).
[135] Chris Callison-Burch, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut,
and Lucia Specia. “Findings of the 2012 Workshop on Statistical Machine Transla-
tion”. In: Annual Meeting of the Association for Computational Linguistics, 2012,
pages 10–51 (cited on page 114).
[136] Adam Lopez. “Putting Human Assessments of Machine Translation Systems in Or-
der”. In: Annual Meeting of the Association for Computational Linguistics, 2012,
pages 1–9 (cited on page 114).
[137] Philipp Koehn. “Simulating human judgment in machine translation evaluation
campaigns”. In: International Workshop on Spoken Language Translation, 2012,
pages 179–184 (cited on page 115).
[138] Ondrej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias
Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo
Negri, Matt Post, Carolina Scarton, Lucia Specia, and Marco Turchi. “Findings of
the 2015 Workshop on Statistical Machine Translation”. In: Annual Meeting of the
Association for Computational Linguistics, 2015, pages 1–46 (cited on page 115).
[139] Shujian Huang and Kevin Knight. Machine Translation: 15th China Conference,
CCMT 2019, Nanchang, China, September 27–29, 2019, Revised Selected Papers.
Volume 1104. Springer Nature, 2019 (cited on page 115).
[140] Dan Jurafsky. Speech & language processing. Pearson Education India, 2000 (cited
on pages 116, 599).
[141] Christoph Tillmann, Stephan Vogel, Hermann Ney, Arkaitz Zubiaga, and Hassan
Sawaf. “Accelerated DP based search for statistical translation”. In: European Con-
ference on Speech Communication and Technology, 1997 (cited on page 116).
[142] Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul.
“A study of translation edit rate with targeted human annotation”. In: volume 200.
6. Proceedings of association for machine translation in the Americas, 2006 (cited
on page 116).
[143] Nancy Chinchor. “MUC-4 evaluation metrics”. In: Annual Meeting of the Associ-
ation for Computational Linguistics, 1992, pages 22–29 (cited on page 118).
[144] David Chiang, Steve DeNeefe, Yee Seng Chan, and Hwee Tou Ng. “Decomposabil-
ity of Translation Metrics for Improved Evaluation and Efficient Algorithms”. In:
Annual Meeting of the Association for Computational Linguistics, 2008, pages 610–
619 (cited on page 118).
[145] Matt Post. “A Call for Clarity in Reporting BLEU Scores”. In: Annual Meeting
of the Association for Computational Linguistics, 2018, pages 186–191 (cited on
page 118).
[146] Satanjeev Banerjee and Alon Lavie. “METEOR: An Automatic Metric for MT
Evaluation with Improved Correlation with Human Judgments”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2005, pages 65–72 (cited on
page 119).
[147] Michael J. Denkowski and Alon Lavie. “METEOR-NEXT and the METEOR Para-
phrase Tables: Improved Evaluation Support for Five Target Languages”. In: An-
nual Meeting of the Association for Computational Linguistics, 2010, pages 339–
342 (cited on page 122).
[148] Michael J. Denkowski and Alon Lavie. “Meteor 1.3: Automatic Metric for Reliable
Optimization and Evaluation of Machine Translation Systems”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2011, pages 85–91 (cited on
page 122).
[149] Michael J. Denkowski and Alon Lavie. “Meteor Universal: Language Specific
Translation Evaluation for Any Target Language”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2014, pages 376–380 (cited on page 122).
[150] Shiwen Yu. “Automatic evaluation of output quality for Machine Translation sys-
tems”. In: volume 8. 1-2. Machine Translation, 1993, pages 117–126 (cited on page 122).
[151] Ming Zhou, Bo Wang, Shujie Liu, Mu Li, Dongdong Zhang, and Tiejun Zhao. “Di-
agnostic Evaluation of Machine Translation Systems Using Automatically Con-
structed Linguistic Check-Points”. In: International Conference on Computational
Linguistics, 2008, pages 1121–1128 (cited on page 123).
[152] Joshua Albrecht and Rebecca Hwa. “A Re-examination of Machine Learning Ap-
proaches for Sentence-Level MT Evaluation”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2007 (cited on page 123).
[153] Joshua Albrecht and Rebecca Hwa. “Regression for Sentence-Level MT Evalua-
tion with Pseudo References”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2007 (cited on page 123).
[154] Ding Liu and Daniel Gildea. “Source-Language Features and Maximum Correla-
tion Training for Machine Translation Evaluation”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2007, pages 41–48 (cited on page 124).
[155] Jesús Giménez and Lluís Màrquez. “Heterogeneous Automatic MT Evaluation
Through Non-Parametric Metric Combinations”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2008, pages 319–326 (cited on page 124).
[156] Markus Dreyer and Daniel Marcu. “HyTER: Meaning-Equivalent Semantics for
Translation Evaluation”. In: Annual Meeting of the Association for Computational
Linguistics, 2012, pages 162–171 (cited on page 124).
[157] Ondrej Bojar, Matous Machácek, Ales Tamchyna, and Daniel Zeman. “Scratching
the Surface of Possible Translations”. In: volume 8082. Springer, 2013, pages 465–
474 (cited on page 125).
[158] Ying Qin and Lucia Specia. “Truly Exploring Multiple References for Machine
Translation Evaluation”. In: European Association for Machine Translation, 2015
(cited on page 126).
[159] Boxing Chen and Hongyu Guo. “Representation Based Translation Evaluation Met-
rics”. In: Annual Meeting of the Association for Computational Linguistics, 2015,
pages 150–155 (cited on page 126).
[160] Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christo-
pher D. Manning. “Semi-Supervised Recursive Autoencoders for Predicting Sen-
timent Distributions”. In: Annual Meeting of the Association for Computational
Linguistics, 2011, pages 151–161 (cited on pages 126, 127).
[161] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning,
Andrew Y. Ng, and Christopher Potts. “Recursive Deep Models for Semantic Com-
positionality Over a Sentiment Treebank”. In: Annual Meeting of the Association
for Computational Linguistics, 2013, pages 1631–1642 (cited on page 126).
[162] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. “Efficient Estimation
of Word Representations in Vector Space”. In: arXiv preprint arXiv:1301.3781,
2013 (cited on pages 127, 343).
[163] Quoc Le and Tomas Mikolov. “Distributed representations of sentences and docu-
ments”. In: International conference on machine learning, 2014, pages 1188–1196
(cited on pages 127, 551).
[164] Ben Athiwaratkun and Andrew Gordon Wilson. “Multimodal Word Distributions”.
In: Annual Meeting of the Association for Computational Linguistics, 2017, pages 1645–
1656 (cited on page 127).
[165] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark,
Kenton Lee, and Luke Zettlemoyer. “Deep Contextualized Word Representations”.
In: Annual Conference of the North American Chapter of the Association for Com-
putational Linguistics, 2018, pages 2227–2237 (cited on pages 127, 343, 552, 553,
580).
[166] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. “GloVe: Global
Vectors for Word Representation”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2014, pages 1532–1543 (cited on pages 127, 340, 343).
[167] Ryan Kiros, Yukun Zhu, Russ R Salakhutdinov, Richard Zemel, Raquel Urtasun,
Antonio Torralba, and Sanja Fidler. “Skip-thought vectors”. In: Advances in neural
information processing systems, 2015, pages 3294–3302 (cited on page 127).
[168] Junki Matsuo, Mamoru Komachi, and Katsuhito Sudoh. “Word-Alignment-Based
Segment-Level Machine Translation Evaluation using Word Embeddings”. In: vol-
ume abs/1704.00380. CoRR, 2017 (cited on page 127).
[169] Francisco Guzmán, Shafiq Joty, Lluís Màrquez, and Preslav Nakov. “Machine
translation evaluation with neural networks”. In: volume 45. Computer Speech &
Language, 2017, pages 180–200 (cited on page 128).
[170] Karl Pearson. “Notes on the history of correlation”. In: volume 13. 1. JSTOR, 1920,
pages 25–45 (cited on page 128).
[171] Deborah Coughlin. “Correlating automated and human assessments of machine
translation quality”. In: 2003 (cited on pages 128, 129).
[172] Andrei Popescu-Belis. “An experiment in comparative evaluation: humans vs. com-
puters”. In: Proceedings of the Ninth Machine Translation Summit. New Orleans,
2003 (cited on page 128).
[173] Christopher Culy and Susanne Z Riehemann. “The limits of N-gram translation
evaluation metrics”. In: MT Summit IX, 2003, pages 71–78 (cited on page 129).
[174] Andrew Finch, Yasuhiro Akiba, and Eiichiro Sumita. “Using a paraphraser to im-
prove machine translation evaluation”. In: International Joint Conference on Natu-
ral Language Processing, 2004 (cited on page 129).
[175] Olivier Hamon and Djamel Mostefa. “The Impact of Reference Quality on Auto-
matic MT Evaluation”. In: International conference on machine learning, 2008,
pages 39–42 (cited on page 129).
[176] George Doddington. “Automatic evaluation of machine translation quality using
n-gram co-occurrence statistics”. In: Proceedings of the second international con-
ference on Human Language Technology Research, 2002, pages 138–145 (cited
on page 129).
[177] Chris Callison-Burch, Miles Osborne, and Philipp Koehn. “Re-evaluation the role
of bleu in machine translation research”. In: 11th Conference of the European Chap-
ter of the Association for Computational Linguistics, 2006 (cited on page 129).
[178] Hirotugu Akaike. “A new look at the statistical model identification”. In: volume 19.
6. IEEE, 1974, pages 716–723 (cited on page 129).
[179] Bradley Efron and Robert Tibshirani. An Introduction to the Bootstrap. Springer,
1993 (cited on page 130).
[180] Philipp Koehn. “Statistical Significance Tests for Machine Translation Evaluation”.
In: Conference on Empirical Methods in Natural Language Processing, 2004,
pages 388–395 (cited on page 130).
[181] Eric W Noreen. Computer-intensive methods for testing hypotheses. Wiley New
York, 1989 (cited on page 130).
[182] Stefan Riezler and John T. Maxwell III. “On Some Pitfalls in Automatic Evalua-
tion and Significance Testing for MT”. In: Annual Meeting of the Association for
Computational Linguistics, 2005, pages 57–64 (cited on page 130).
[183] Taylor Berg-Kirkpatrick, David Burkett, and Dan Klein. “An Empirical Investiga-
tion of Statistical Significance in NLP”. In: Annual Meeting of the Association for
Computational Linguistics, 2012, pages 995–1005 (cited on pages 130, 131).
[184] Michael Gamon, Anthony Aue, and Martine Smets. “Sentence-level MT evalua-
tion without reference translations: Beyond language modeling”. In: Proceedings
of EAMT, 2005, pages 103–111 (cited on page 135).
[185] Christopher Quirk. “Training a Sentence-Level Machine Translation Confidence
Measure”. In: European Language Resources Association, 2004 (cited on page 135).
[186] Douglas A. Jones, Edward Gibson, Wade Shen, Neil Granoien, Martha Herzog,
Douglas A. Reynolds, and Clifford J. Weinstein. “Measuring human readability
of machine generated text: three case studies in speech recognition and machine
translation”. In: IEEE, 2005, pages 1009–1012 (cited on page
136).
[187] Carolina Scarton, Marcos Zampieri, Mihaela Vela, Josef van Genabith, and Lucia
Specia. “Searching for Context: a Study on Document-Level Labels for Transla-
tion Quality Estimation”. In: European Association for Machine Translation, 2015
(cited on page 136).
[188] Pablo Fetter, Frédéric Dandurand, and Peter Regel-Brietzmann. “Word graph rescor-
ing using confidence measures”. In: volume 1. Proceedings of the Fourth International
Conference on Spoken Language Processing, 1996, pages 10–13 (cited on page 137).
[189] Ergun Biçici. “Referential Translation Machines for Quality Estimation”. In: An-
nual Meeting of the Association for Computational Linguistics, 2013, pages 343–
351 (cited on pages 137, 138).
[190] José Guilherme Camargo de Souza, Christian Buck, Marco Turchi, and Matteo Ne-
gri. “FBK-UEdin Participation to the WMT13 Quality Estimation Shared Task”. In:
Annual Meeting of the Association for Computational Linguistics, 2013, pages 352–
358 (cited on page 137).
[191] Ergun Biçici and Andy Way. “Referential Translation Machines for Predicting
Translation Quality”. In: Annual Meeting of the Association for Computational
Linguistics, 2014, pages 313–321 (cited on pages 137, 141).
[192] José Guilherme Camargo de Souza, Jesús González-Rubio, Christian Buck, Marco
Turchi, and Matteo Negri. “FBK-UPV-UEdin participation in the WMT14 Quality
Estimation shared-task”. In: Annual Meeting of the Association for Computational
Linguistics, 2014, pages 322–328 (cited on pages 137, 138).
[193] Miquel Esplà-Gomis, Felipe Sánchez-Martínez, and Mikel L. Forcada. “UAlacant
word-level machine translation quality estimation system at WMT 2015”. In: An-
nual Meeting of the Association for Computational Linguistics, 2015, pages 309–
315 (cited on page 137).
[194] Julia Kreutzer, Shigehiko Schamoni, and Stefan Riezler. “QUality Estimation from
ScraTCH (QUETCH): Deep Learning for Word-level Translation Quality Estima-
tion”. In: Annual Meeting of the Association for Computational Linguistics, 2015,
pages 316–322 (cited on page 138).
[195] André F. T. Martins, Ramón Fernández Astudillo, Chris Hokamp, and Fabio Ke-
pler. “Unbabel’s Participation in the WMT16 Word-Level Translation Quality Es-
timation Shared Task”. In: Annual Meeting of the Association for Computational
Linguistics, 2016, pages 806–811 (cited on page 138).
[196] Zhiming Chen, Yiming Tan, Chenlin Zhang, Qingyu Xiang, Lilin Zhang, Maoxi
Li, and Mingwen Wang. “Improving Machine Translation Quality Estimation with
Neural Network Features”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2017, pages 551–555 (cited on page 138).
[197] Julia Kreutzer, Shigehiko Schamoni, and Stefan Riezler. “Quality estimation from
scratch (quetch): Deep learning for word-level translation quality estimation”. In:
Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015, pages 316–
322 (cited on page 138).
[198] Kashif Shah, Varvara Logacheva, Gustavo Paetzold, Frédéric Blain, Daniel Beck,
Fethi Bougares, and Lucia Specia. “SHEF-NN: Translation Quality Estimation
with Neural Networks”. In: Annual Meeting of the Association for Computational
Linguistics, 2015, pages 342–347 (cited on page 138).
[199] Carolina Scarton, Daniel Beck, Kashif Shah, Karin Sim Smith, and Lucia Specia.
“Word embeddings and discourse information for Quality Estimation”. In: Annual
Meeting of the Association for Computational Linguistics, 2016, pages 831–837
(cited on page 138).
[200] Amal Abdelsalam, Ondrej Bojar, and Samhaa El-Beltagy. “Bilingual Embeddings
and Word Alignments for Translation Quality Estimation”. In: Annual Meeting
of the Association for Computational Linguistics, 2016, pages 764–771 (cited on
page 138).
[201] Prasenjit Basu, Santanu Pal, and Sudip Kumar Naskar. “Keep It or Not: Word Level
Quality Estimation for Post-Editing”. In: Annual Meeting of the Association for
Computational Linguistics, 2018, pages 759–764 (cited on page 138).
[202] Hou Qi. “NJU Submissions for the WMT19 Quality Estimation Shared Task”. In:
Annual Meeting of the Association for Computational Linguistics, 2019, pages 95–
100 (cited on page 138).
[203] Junpei Zhou, Zhisong Zhang, and Zecong Hu. “SOURCE: SOURce-Conditional
Elmo-style Model for Machine Translation Quality Estimation”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2019, pages 106–111 (cited
on page 138).
[204] Chris Hokamp. “Ensembling Factored Neural Machine Translation Models for Au-
tomatic Post-Editing and Quality Estimation”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2017, pages 647–654 (cited on page 138).
[205] Ziyang Wang, Hui Liu, Hexuan Chen, Kai Feng, Zeyang Wang, Bei Li, Chen Xu,
Tong Xiao, and Jingbo Zhu. “NiuTrans Submission for CCMT19 Quality Estima-
tion Task”. In: Springer, 2019, pages 82–92 (cited on page
138).
[206] Fábio Kepler, Jonay Trénous, Marcos Treviso, Miguel Vera, António Góis, M Amin
Farajian, António V Lopes, and André FT Martins. “Unbabel’s Participation in the
WMT19 Translation Quality Estimation Shared Task”. In: 2019, pages 78–84 (cited
on page 138).
[207] Elizaveta Yankovskaya, Andre Tättar, and Mark Fishel. “Quality Estimation and
Translation Metrics via Pre-trained Word and Sentence Embeddings”. In: Annual
Meeting of the Association for Computational Linguistics, 2019, pages 101–105
(cited on page 138).
[208] Hyun Kim, Joon-Ho Lim, Hyun-Ki Kim, and Seung-Hoon Na. “QE BERT: Bilin-
gual BERT Using Multi-task Learning for Neural Quality Estimation”. In: An-
nual Meeting of the Association for Computational Linguistics, 2019, pages 85–89
(cited on page 138).
[209] Silja Hildebrand and Stephan Vogel. “MT Quality Estimation: The CMU System
for WMT’13”. In: Annual Meeting of the Association for Computational Linguis-
tics, 2013, pages 373–379 (cited on page 138).
[210] André F. T. Martins, Ramón Astudillo, Chris Hokamp, and Fabio Kepler. “Unbabel's
participation in the WMT16 word-level translation quality estimation shared task”.
In: Proceedings of the First Conference on Machine Translation, 2016, pages 806–
811 (cited on page 138).
[211] Ding Liu and Daniel Gildea. “Syntactic Features for Evaluation of Machine Trans-
lation”. In: Annual Meeting of the Association for Computational Linguistics, 2005,
pages 25–32 (cited on page 140).
[212] Jesús Giménez and Lluís Màrquez. “Linguistic Features for Automatic Evaluation
of Heterogenous MT Systems”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2007, pages 256–264 (cited on page 140).
[213] Sebastian Padó, Daniel M. Cer, Michel Galley, Dan Jurafsky, and Christopher D.
Manning. “Measuring machine translation quality as semantic equivalence: A met-
ric based on entailment features”. In: volume 23. 2-3. Machine Translation, 2009,
pages 181–193 (cited on page 140).
[214] Karolina Owczarzak, Josef van Genabith, and Andy Way. “Dependency-Based Au-
tomatic Evaluation for Machine Translation”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2007, pages 80–87 (cited on page 140).
[215] Karolina Owczarzak, Josef van Genabith, and Andy Way. “Labelled Dependencies
in Machine Translation Evaluation”. In: Annual Meeting of the Association for
Computational Linguistics, 2007, pages 104–111 (cited on page
140).
[216] Hui Yu, Xiaofeng Wu, Jun Xie, Wenbin Jiang, Qun Liu, and Shouxun Lin. “RED:
A Reference Dependency Based MT Evaluation Metric”. In: Annual Meeting of
the Association for Computational Linguistics, 2014, pages 2042–2051 (cited on
page 140).
[217] Rafael E. Banchs and Haizhou Li. “AM-FM: A Semantic Framework for Trans-
lation Quality Assessment”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2011, pages 153–158 (cited on page 140).
[218] Florence Reeder. “Measuring MT adequacy using latent semantic analysis”. In:
Proceedings of the 7th Conference of the Association for Machine Translation of
the Americas. Cambridge, Massachusetts, 2006, pages 176–184 (cited on page 140).
[219] Chi-kiu Lo, Meriem Beloucif, Markus Saers, and Dekai Wu. “XMEANT: Bet-
ter semantic MT evaluation without reference translations”. In: Annual Meeting
of the Association for Computational Linguistics, 2014, pages 765–771 (cited on
page 140).
[220] David Vilar, Jia Xu, Luis Fernando D’Haro, and Hermann Ney. “Error Analysis of
Statistical Machine Translation Output”. In: European Language Resources Asso-
ciation (ELRA), 2006, pages 697–702 (cited on page 140).
[221] Maja Popovic and Aljoscha Burchardt. “From human to automatic error clas-
sification for machine translation output”. In: European Association for Machine
Translation, 2011 (cited on page 140).
[222] Ângela Costa, Wang Ling, Tiago Luís, Rui Correia, and Luísa Coheur. “A linguisti-
cally motivated taxonomy for Machine Translation error analysis”. In: volume 29.
2. Machine Translation, 2015, pages 127–161 (cited on page 140).
[223] Arle Lommel, Aljoscha Burchardt, Maja Popovic, Kim Harris, Eleftherios Avramidis,
and Hans Uszkoreit. “Using a new analytic measure for the annotation and analy-
sis of MT errors on real data”. In: European Association for Machine Translation,
2014, pages 165–172 (cited on page 140).
[224] Maja Popovic, Adrià de Gispert, Deepa Gupta, Patrik Lambert, Hermann Ney, José
B. Mariño, Marcello Federico, and Rafael E. Banchs. “Morpho-syntactic Informa-
tion for Automatic Error Analysis of Statistical Machine Translation Output”. In:
Annual Meeting of the Association for Computational Linguistics, 2006, pages 1–
6 (cited on page 140).
[225] Maja Popovic and Hermann Ney. “Word Error Rates: Decomposition over POS
classes and Applications for Error Analysis”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2007, pages 48–55 (cited on page 140).
[226] Meritxell González, Laura Mascarell, and Lluís Màrquez. “tSEARCH: Flexible
and Fast Search over Automatic Translations for Improved Quality/Error Analy-
sis”. In: Annual Meeting of the Association for Computational Linguistics, 2013,
pages 181–186 (cited on page 140).
[227] Alex Kulesza and Stuart Shieber. “A learning approach to improving sentence-
level MT evaluation”. In: Proceedings of the 10th International Conference on
Theoretical and Methodological Issues in Machine Translation, 2004 (cited on
page 141).
[228] Simon Corston-Oliver, Michael Gamon, and Chris Brockett. “A machine learning
approach to the automatic evaluation of machine translation”. In: Annual Meeting
of the Association for Computational Linguistics, 2001, pages 148–155 (cited on
page 141).
[229] Joshua S Albrecht and Rebecca Hwa. “Regression for machine translation evalu-
ation at the sentence level”. In: volume 22. 1-2. Springer, 2008, page 1 (cited on
page 141).
[230] Kevin Duh. “Ranking vs. regression in machine translation evaluation”. In: Pro-
ceedings of the Third Workshop on Statistical Machine Translation, 2008, pages 191–
194 (cited on page 141).
[231] Boxing Chen, Hongyu Guo, and Roland Kuhn. “Multi-level evaluation for ma-
chine translation”. In: Proceedings of the Tenth Workshop on Statistical Machine
Translation, 2015, pages 361–365 (cited on page 141).
[232] Franz Josef Och. “Minimum Error Rate Training in Statistical Machine Transla-
tion”. In: Annual Meeting of the Association for Computational Linguistics, 2003,
pages 160–167 (cited on pages 141, 219, 571).
[233] Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang
Liu. “Minimum Risk Training for Neural Machine Translation”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2016 (cited on pages 141,
377, 450, 453, 478, 479).
[234] Xiaodong He and Li Deng. “Maximum expected bleu training of phrase and lexi-
con translation models”. In: Annual Meeting of the Association for Computational
Linguistics, 2012, pages 292–301 (cited on page 141).
[235] Markus Freitag, Isaac Caswell, and Scott Roy. “APE at Scale and Its Implications
on MT Evaluation Biases”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2019, pages 34–44 (cited on page 141).
[236] Ergun Biçici, Declan Groves, and Josef van Genabith. “Predicting sentence trans-
lation quality using extrinsic and language independent features”. In: volume 27.
3-4. Machine Translation, 2013, pages 171–192 (cited on page 141).
[237] Ergun Biçici, Qun Liu, and Andy Way. “Referential Translation Machines for Pre-
dicting Translation Quality and Related Statistics”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2015, pages 304–308 (cited on page 141).
[238] Kevin Knight. “Decoding Complexity in Word-Replacement Translation Models”.
In: volume 25. 4. Computational Linguistics, 1999, pages 607–615 (cited on pages 158,
179, 223).
[239] Claude Elwood Shannon. “Communication theory of secrecy systems”. In: vol-
ume 28. 4. Bell System Technical Journal, 1949, pages 656–715 (cited on page 161).
[240] Franz Josef Och and Hermann Ney. “A Systematic Comparison of Various Sta-
tistical Alignment Models”. In: volume 29. 1. Computational Linguistics, 2003,
pages 19–51 (cited on pages 164, 178, 186, 195, 632).
[241] Robert C. Moore. “Improving IBM Word Alignment Model 1”. In: Annual Meeting
of the Association for Computational Linguistics, 2004, pages 518–525 (cited on
page 178).
[242] 肖桐, 李天宁, 陈如山, 朱靖波, and 王会珍. “面向统计机器翻译的重对齐方法
研究”. In: volume 24. 110–116. 中文信息学报, 2010 (cited on page 178).
[243] Hua Wu and Haifeng Wang. “Improving Statistical Word Alignment with Ensem-
ble Methods”. In: volume 3651. International Joint Conference on Natural Lan-
guage Processing, 2005, pages 462–473 (cited on page 178).
[244] Ye-Yi Wang and Wayne Ward. “Grammar Inference and Statistical Machine Trans-
lation”. In: Carnegie Mellon University, 1999 (cited on page 178).
[245] Ido Dagan, Kenneth Ward Church, and William Gale. “Robust Bilingual Word
Alignment for Machine Aided Translation”. In: Very Large Corpora, 1993 (cited
on page 178).
[246] Abraham Ittycheriah and Salim Roukos. “A Maximum Entropy Word Aligner for
Arabic-English Machine Translation”. In: Annual Meeting of the Association for
Computational Linguistics, 2005 (cited on page
178).
[247] William A. Gale and Kenneth Ward Church. “Identifying Word Correspondences
in Parallel Texts”. In: Morgan Kaufmann, 1991 (cited on page 178).
[248] Tong Xiao and Jingbo Zhu. “Unsupervised sub-tree alignment for tree-to-tree trans-
lation”. In: volume 48. Journal of Artificial Intelligence Research, 2013, pages 733–
782 (cited on pages 178, 268, 269).
[249] Percy Liang, Benjamin Taskar, and Dan Klein. “Alignment by Agreement”. In:
Annual Meeting of the Association for Computational Linguistics, 2006 (cited on
page 178).
[250] Chris Dyer, Victor Chahuneau, and Noah A. Smith. “A Simple, Fast, and Effective
Reparameterization of IBM Model 2”. In: Annual Meeting of the Association for
Computational Linguistics, 2013, pages 644–648 (cited on pages 178, 212, 632).
[251] Benjamin Taskar, Simon Lacoste-Julien, and Dan Klein. “A Discriminative Match-
ing Approach to Word Alignment”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2005, pages 73–80 (cited on pages 178, 212).
[252] Alexander Fraser and Daniel Marcu. “Measuring Word Alignment Quality for Sta-
tistical Machine Translation”. In: volume 33. 3. Computational Linguistics, 2007,
pages 293–303 (cited on page 178).
[253] John DeNero and Dan Klein. “Tailoring Word Alignments to Syntactic Machine
Translation”. In: Annual Meeting of the Association for Computational Linguistics,
2007 (cited on page 178).
[254] Paul C. Davis, Zhuli Xie, and Kevin Small. “All Links are not the Same: Evaluating
Word Alignments for Statistical Machine Translation”. In: Machine Translation
Summit XI, 2007 (cited on page 178).
[255] , , , , and . 词对
”. In: volume 23. 88-94. 中文信息学报, 2009 (cited on page 178).
[256] Shi Feng, Shujie Liu, Mu Li, and Ming Zhou. “Implicit Distortion and Fertility
Models for Attention-based Encoder-Decoder NMT Model”. In: volume abs/1601.03317.
CoRR, 2016 (cited on page 179).
[257] Raghavendra Udupa, Tanveer A. Faruquie, and Hemanta Kumar Maji. “An Algo-
rithmic Framework for Solving the Decoding Problem in Statistical Machine Trans-
lation”. In: International Conference on Computational Linguistics, 2004 (cited on
page 179).
[258] Sebastian Riedel and James Clarke. “Revisiting Optimal Decoding for Machine
Translation IBM Model 4”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2009 (cited on page 179).
[259] Raghavendra Udupa and Hemanta Kumar Maji. “Computational Complexity of
Statistical Machine Translation”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2006 (cited on page 179).
[260] Gregor Leusch, Evgeny Matusov, and Hermann Ney. “Complexity of Finding the
BLEU-optimal Hypothesis in a Confusion Network”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2008, pages 839–847 (cited on page 179).
[261] Noah Fleming, Antonina Kolokolova, and Renesa Nizamee. “Complexity of align-
ment and decoding problems: restrictions and approximations”. In: volume 29. 3-4.
Machine Translation, 2015, pages 163–187 (cited on page 179).
[262] Stephan Vogel, Hermann Ney, and Christoph Tillmann. “HMM-Based Word Align-
ment in Statistical Translation”. In: International Conference on Computational
Linguistics, 1996, pages 836–841 (cited on pages 181, 185).
[263] D. C. Brown. “Decentering Distortion of Lenses”. In: volume 32. Photogrammetric
Engineering, 1966, pages 444–462 (cited on page 198).
[264] David Claus and Andrew W. Fitzgibbon. “A Rational Function Lens Distortion
Model for General Cameras”. In: IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, 2005, pages 213–219 (cited on page 198).
[265] Jerneja Žganec Gros. “MSD Recombination Method in Statistical Machine Transla-
tion”. In: volume 1060. American Institute of Physics, 2008, pages 186–189 (cited
on page 198).
[266] Deyi Xiong, Qun Liu, and Shouxun Lin. “Maximum Entropy Based Phrase Re-
ordering Model for Statistical Machine Translation”. In: Annual Meeting of the
Association for Computational Linguistics, 2006 (cited on pages 198, 216, 228).
[267] Franz Josef Och and Hermann Ney. “The Alignment Template Approach to Sta-
tistical Machine Translation”. In: volume 30. 4. Computational Linguistics, 2004,
pages 417–449 (cited on pages 198, 216, 228).
[268] Shankar Kumar and William J. Byrne. “Local Phrase Reordering Models for Sta-
tistical Machine Translation”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2005, pages 161–168 (cited on pages 198, 216, 228).
[269] Peng Li, Yang Liu, Maosong Sun, Tatsuya Izuha, and Dakun Zhang. “A Neural
Reordering Model for Phrase-based Translation”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2014, pages 1897–1907 (cited on pages
198,
217).
[270] David Chiang, Adam Lopez, Nitin Madnani, Christof Monz, Philip Resnik, and
Michael Subotin. “The Hiero Machine Translation System: Extensions, Evaluation,
and Analysis”. In: Annual Meeting of the Association for Computational Linguis-
tics, 2005, pages 779–786 (cited on page 198).
[271] Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, and Richard Socher.
“Non-Autoregressive Neural Machine Translation”. In: International Conference
on Learning Representations, 2018 (cited on pages 198, 382, 476, 486, 488–490).
[272] Andrew J. Viterbi. “Error bounds for convolutional codes and an asymptotically
optimum decoding algorithm”. In: volume 13. 2. IEEE Transactions on Information
Theory, 1967, pages 260–269 (cited on page 207).
[273] Philipp Koehn and Kevin Knight. “Estimating Word Translation Probabilities from
Unrelated Monolingual Corpora Using the EM Algorithm”. In: AAAI Press, 2000,
pages 711–715 (cited on page 212).
[274] Franz Josef Och and Hermann Ney. “A Comparison of Alignment Models for
Statistical Machine Translation”. In: Morgan Kaufmann, 2000, pages 1086–1090
(cited on page 212).
[275] Kevin Knight. “Learning a translation lexicon from monolingual corpora”. In: An-
nual Meeting of the Association for Computational Linguistics, 2002, pages 9–16
(cited on page 213).
[276] M. J. D. Powell. “An efficient method for finding the minimum of a function of
several variables without calculating derivatives”. In: volume 7. 2. The Computer
Journal, 1964, pages 155–162 (cited on page 220).
[277] David Chiang, Yuval Marton, and Philip Resnik. “Online Large-Margin Training
of Syntactic and Structural Translation Features”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2008, pages 224–233 (cited on pages 222,
229).
[278] Mark Hopkins and Jonathan May. “Tuning as Ranking”. In: Annual Meeting of
the Association for Computational Linguistics, 2011, pages 1352–1362 (cited on
pages 222, 229).
[279] Franz Josef Och and Hans Weber. “Improving Statistical Natural Language Trans-
lation with Categories and Rules”. In: Annual Meeting of the Association for Com-
putational Linguistics, 1998, pages 985–989 (cited on page 228).
[280] Franz Josef Och. “Statistical machine translation: from single-word models to align-
ment templates”. PhD thesis. RWTH Aachen University, 2002 (cited on page 228).
[281] Ye-Yi Wang and Alex Waibel. “Modeling with Structures in Statistical Machine
Translation”. In: Annual Meeting of the Association for Computational Linguistics,
1998, pages 1357–1363 (cited on pages 228, 278).
[282] Taro Watanabe, Eiichiro Sumita, and Hiroshi G. Okuno. “Chunk-Based Statistical
Translation”. In: Annual Meeting of the Association for Computational Linguistics,
2003, pages 303–310 (cited on page 228).
[283] Daniel Marcu. “Towards a Unified Approach to Memory- and Statistical-Based
Machine Translation”. In: Morgan Kaufmann Publishers, 2001, pages 378–385
(cited on page 228).
[284] Philipp Koehn, Franz Josef Och, and Daniel Marcu. “Statistical Phrase-Based Trans-
lation”. In: Annual Meeting of the Association for Computational Linguistics, 2003
(cited on pages 228, 229).
[285] Richard Zens, Franz Josef Och, and Hermann Ney. “Phrase-Based Statistical Ma-
chine Translation”. In: Annual Conference on Artificial Intelligence, 2002, pages 18–
32 (cited on pages 228, 570).
[286] Richard Zens and Hermann Ney. “Improvements in Phrase-Based Statistical Ma-
chine Translation”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2004, pages 257–264 (cited on page 228).
[287] Daniel Marcu and Daniel Wong. “A Phrase-Based, Joint Probability Model for
Statistical Machine Translation”. In: Conference on Empirical Methods in Natural
Language Processing, 2002, pages 133–139 (cited on page 228).
[288] John DeNero, Dan Gillick, James Zhang, and Dan Klein. “Why Generative Phrase
Models Underperform Surface Heuristics”. In: Annual Meeting of the Association
for Computational Linguistics, 2006, pages 31–38 (cited on page 228).
[289] Germán Sanchis-Trilles, Daniel Ortiz-Martínez, Jesús González-Rubio, Jorge Gon-
zález, and Francisco Casacuberta. “Bilingual segmentation for phrasetable pruning
in Statistical Machine Translation”. In: Conference of the European Association for
Machine Translation, 2011, pages 257–264 (cited on page 228).
[290] Graeme W. Blackwood, Adrià de Gispert, and William Byrne. “Phrasal Segmenta-
tion Models for Statistical Machine Translation”. In: International Conference on
Computational Linguistics, 2008, pages 19–22 (cited on page 228).
[291] Deyi Xiong, Min Zhang, and Haizhou Li. “Learning Translation Boundaries for
Phrase-Based Decoding”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2010, pages 136–144 (cited on page 228).
[292] Christoph Tillmann. “A Unigram Orientation Model for Statistical Machine Transla-
tion”. In: Annual Meeting of the Association for Computational Linguistics, 2004
(cited on page 228).
[293] Masaaki Nagata, Kuniko Saito, Kazuhide Yamamoto, and Kazuteru Ohashi. “A
Clustered Global Phrase Reordering Model for Statistical Machine Translation”.
In: Annual Meeting of the Association for Computational Linguistics, 2006 (cited
on page 228).
[294] Richard Zens and Hermann Ney. “Discriminative Reordering Models for Statistical
Machine Translation”. In: Annual Meeting of the Association for Computational
Linguistics, 2006, pages 55–63 (cited on page 228).
[295] Spence Green, Michel Galley, and Christopher D. Manning. “Improved Models of
Distortion Cost for Statistical Machine Translation”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2010, pages 867–875 (cited on page 228).
[296] Colin Cherry. “Improved Reordering for Phrase-Based Translation using Sparse
Features”. In: Annual Meeting of the Association for Computational Linguistics,
2013, pages 22–31 (cited on page 228).
[297] Matthias Huck, Joern Wuebker, Felix Rietig, and Hermann Ney. “A Phrase Orien-
tation Model for Hierarchical Machine Translation”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2013, pages 452–463 (cited on page 228).
[298] Matthias Huck, Stephan Peitz, Markus Freitag, and Hermann Ney. “Discriminative
Reordering Extensions for Hierarchical Phrase-Based Machine Translation”. In:
Annual Conference of the European Association for Machine Translation, 2012
(cited on page 228).
[299] Vinh Van Nguyen, Akira Shimazu, Minh Le Nguyen, and Thai Phuong Nguyen.
“Improving a Lexicalized Hierarchical Reordering Model Using Maximum En-
tropy”. In: Machine Translation Summit XII, 2009 (cited on page 228).
[300] Arianna Bisazza and Marcello Federico. “A Survey of Word Reordering in Statis-
tical Machine Translation: Computational Models and Language Phenomena”. In:
volume 42. 2. Computational Linguistics, 2016, pages 163–205 (cited on page 228).
[301] Fei Xia and Michael C. McCord. “Improving a Statistical MT System with Au-
tomatically Learned Rewrite Patterns”. In: International Conference on Computa-
tional Linguistics, 2004 (cited on page 228).
[302] Michael Collins, Philipp Koehn, and Ivona Kucerova. “Clause Restructuring for
Statistical Machine Translation”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2005, pages 531–540 (cited on page 228).
[303] Chao Wang, Michael Collins, and Philipp Koehn. “Chinese Syntactic Reordering
for Statistical Machine Translation”. In: Annual Meeting of the Association for
Computational Linguistics, 2007, pages 737–745 (cited on pages 228, 556).
[304] Xianchao Wu, Katsuhito Sudoh, Kevin Duh, Hajime Tsukada, and Masaaki Na-
gata. “Extracting Pre-ordering Rules from Predicate-Argument Structures”. In: An-
nual Meeting of the Association for Computational Linguistics, 2011, pages 29–37
(cited on page 228).
[305] Christoph Tillmann and Hermann Ney. “Word Re-ordering and DP-based Search
in Statistical Machine Translation”. In: Morgan Kaufmann, 2000, pages 850–856
(cited on page 229).
[306] Wade Shen, Brian Delaney, and Timothy R. Anderson. “An efficient graph search
decoder for phrase-based statistical machine translation”. In: International Workshop
on Spoken Language Translation, 2006, pages 197–204 (cited on page 229).
[307] Robert C. Moore and Chris Quirk. “Faster Beam-Search Decoding for Phrasal Sta-
tistical Machine Translation”. In: Machine Translation Summit XI, 2007 (cited on
page 229).
[308] Kenneth Heafield, Michael Kayser, and Christopher D. Manning. “Faster Phrase-
Based Decoding by Refining Feature State”. In: Annual Meeting of the Association
for Computational Linguistics, 2014, pages 130–135 (cited on page 229).
[309] Joern Wuebker, Hermann Ney, and Richard Zens. “Fast and Scalable Decoding
with Language Model Look-Ahead for Phrase-based Statistical Machine Transla-
tion”. In: Annual Meeting of the Association for Computational Linguistics, 2012,
pages 28–32 (cited on page 229).
[310] Richard Zens and Hermann Ney. “Improvements in dynamic programming beam
search for phrase-based statistical machine translation”. In: International Workshop
on Spoken Language Translation, 2008, pages 198–205 (cited on page 229).
[311] Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada,
Alexander M. Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng,
Viren Jain, Zhen Jin, and Dragomir R. Radev. “A Smorgasbord of Features for
Statistical Machine Translation”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2004, pages 161–168 (cited on pages 229, 279).
[312] David Chiang, Kevin Knight, and Wei Wang. “11,001 New Features for Statistical
Machine Translation”. In: Annual Meeting of the Association for Computational
Linguistics, 2009, pages 218–226 (cited on page 229).
[313] Daniel Gildea. “Loosely Tree-Based Alignment for Machine Translation”. In: An-
nual Meeting of the Association for Computational Linguistics, 2003, pages 80–87
(cited on page 229).
[314] Phil Blunsom, Trevor Cohn, and Miles Osborne. “A Discriminative Latent Variable
Model for Statistical Machine Translation”. In: Annual Meeting of the Association
for Computational Linguistics, 2008, pages 200–208 (cited on page 229).
[315] Phil Blunsom, Trevor Cohn, Chris Dyer, and Miles Osborne. “A Gibbs Sampler for
Phrasal Synchronous Grammar Induction”. In: Annual Meeting of the Association
for Computational Linguistics, 2009, pages 782–790 (cited on page 229).
[316] Trevor Cohn and Phil Blunsom. “A Bayesian Model of Syntax-Directed Tree to
String Grammar Induction”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2009, pages 352–361 (cited on page 229).
[317] David A. Smith and Jason Eisner. “Minimum Risk Annealing for Training Log-
Linear Models”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2006 (cited on page 229).
[318] Zhifei Li and Jason Eisner. “First- and Second-Order Expectation Semirings with
Applications to Minimum-Risk Training on Translation Forests”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2009, pages 40–51 (cited on
page 229).
[319] Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. “Online Large-
Margin Training for Statistical Machine Translation”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2007, pages 764–773 (cited on page 229).
[320] Markus Dreyer and Yuanzhe Dong. “APRO: All-Pairs Ranking Optimization for
MT Tuning”. In: Annual Meeting of the Association for Computational Linguistics,
2015, pages 1018–1023 (cited on page 229).
[321] Tong Xiao, Derek F. Wong, and Jingbo Zhu. “A Loss-Augmented Approach to
Training Syntactic Machine Translation Systems”. In: volume 24. 11. IEEE Trans-
actions on Audio, Speech, and Language Processing, 2016, pages 2069–2083 (cited
on page 229).
[322] Harold Charles Daumé III. Practical structured learning techniques for natural
language processing. University of Southern California, 2006 (cited on page 229).
[323] Holger Schwenk, Marta R. Costa-jussà, and José A. R. Fonollosa. “Smooth Bilin-
gual N-Gram Translation”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2007, pages 430–438 (cited on page 229).
[324] Boxing Chen, Roland Kuhn, George Foster, and Howard Johnson. “Unpacking
and Transforming Feature Functions: New Ways to Smooth Phrase Tables”. In:
Machine Translation Summit, 2011 (cited on page 229).
[325] Nan Duan, Hong Sun, and Ming Zhou. “Translation Model Generalization using
Probability Averaging for Machine Translation”. In: International Conference on
Computational Linguistics, 2010 (cited on page 229).
[326] Christopher Quirk and Arul Menezes. “Do we need phrases? Challenging the con-
ventional wisdom in Statistical Machine Translation”. In: Annual Meeting of the
Association for Computational Linguistics, 2006 (cited on page 229).
[327] José B. Mariño, Rafael E. Banchs, Josep Maria Crego, Adrià de Gispert, Patrik
Lambert, José A. R. Fonollosa, and Marta R. Costa-jussà. “N-gram-based Machine
Translation”. In: volume 32. 4. Computational Linguistics, 2006, pages 527–549
(cited on page 229).
[328] Richard Zens, Daisy Stanton, and Peng Xu. “A Systematic Comparison of Phrase
Table Pruning Techniques”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2012, pages 972–983 (cited on pages 229, 481).
[329] Howard Johnson, Joel D. Martin, George F. Foster, and Roland Kuhn. “Improving
Translation Quality by Discarding Most of the Phrasetable”. In: Annual Meeting
of the Association for Computational Linguistics, 2007, pages 967–975 (cited on
pages 229, 481).
[330] Wang Ling, João Graça, Isabel Trancoso, and Alan W. Black. “Entropy-based Prun-
ing for Phrase-based Machine Translation”. In: Annual Meeting of the Association
for Computational Linguistics, 2012, pages 962–971 (cited on pages 229, 481).
[331] Luke S. Zettlemoyer and Robert C. Moore. “Selective Phrase Pair Extraction for
Improved Statistical Machine Translation”. In: Annual Meeting of the Association
for Computational Linguistics, 2007, pages 209–212 (cited on page 229).
[332] Matthias Eck, Stephan Vogel, and Alex Waibel. “Translation Model Pruning via
Usage Statistics for Statistical Machine Translation”. In: Annual Meeting of the
Association for Computational Linguistics, 2007, pages 21–24 (cited on page 229).
[333] Chris Callison-Burch, Colin J. Bannard, and Josh Schroeder. “Scaling Phrase-Based
Statistical Machine Translation to Larger Corpora and Longer Phrases”. In: Annual
Meeting of the Association for Computational Linguistics, 2005, pages 255–262
(cited on page 229).
[334] Richard Zens and Hermann Ney. “Efficient Phrase-Table Representation for Ma-
chine Translation with Applications to Online MT and Speech Translation”. In: An-
nual Meeting of the Association for Computational Linguistics, 2007, pages 492–
499 (cited on page 229).
[335] Ulrich Germann. “Dynamic Phrase Tables for Machine Translation in an Interac-
tive Post-editing Scenario”. In: Association for Machine Translation in the Ameri-
cas, 2014 (cited on page 229).
[336] David Chiang. “Hierarchical Phrase-Based Translation”. In: volume 33. 2. Compu-
tational Linguistics, 2007, pages 201–228 (cited on pages 236, 241, 251).
[337] John Cocke and J. T. Schwartz. Programming Languages and Their Compilers: Pre-
liminary Notes. Courant Institute of Mathematical Sciences, New York University,
1970 (cited on page 243).
[338] Daniel H. Younger. “Recognition and Parsing of Context-Free Languages in Time
n³”. In: volume 10. 2. Information and Control, 1967, pages 189–208 (cited on
page 243).
[339] Tadao Kasami. “An efficient recognition and syntax-analysis algorithm for context-
free languages”. In: Coordinated Science Laboratory Report no. R-257, 1966 (cited
on page 243).
[340] Liang Huang and David Chiang. “Better k-best Parsing”. In: Annual Meeting of the
Association for Computational Linguistics, 2005, pages 53–64 (cited on page 246).
[341] Dekai Wu. “Stochastic Inversion Transduction Grammars and Bilingual Parsing of
Parallel Corpora”. In: volume 23. 3. Computational Linguistics, 1997, pages 377–
403 (cited on pages 251, 278).
[342] Liang Huang, Kevin Knight, and Aravind Joshi. “Statistical syntax-directed trans-
lation with extended domain of locality”. In: Computationally Hard Problems &
Joint Inference in Speech & Language Processing, 2006, pages 66–73 (cited on
page 251).
[343] Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. “What’s in a
translation rule?” In: Proceedings of the Human Language Technology Conference
of the North American Chapter of the Association for Computational Linguistics,
2004, pages 273–280 (cited on pages 251, 258).
[344] Jason Eisner. “Learning Non-Isomorphic Tree Mappings for Machine Translation”.
In: Annual Meeting of the Association for Computational Linguistics, 2003, pages 205–
208 (cited on page 251).
[345] Min Zhang, Hongfei Jiang, AiTi Aw, Haizhou Li, Chew Lim Tan, and Sheng Li.
“A Tree Sequence Alignment-based Tree-to-Tree Translation Model”. In: Annual
Meeting of the Association for Computational Linguistics, 2008, pages 559–567
(cited on page 251).
[346] Daniel Marcu, Wei Wang, Abdessamad Echihabi, and Kevin Knight. “SPMT: Sta-
tistical Machine Translation with Syntactified Target Language Phrases”. In: An-
nual Meeting of the Association for Computational Linguistics, 2006, pages 44–52
(cited on pages 264, 278).
[347] Nianwen Xue, Fei Xia, Fu-Dong Chiou, and Martha Palmer. “Building a large an-
notated Chinese corpus: the Penn Chinese treebank”. In: volume 11. 2. Journal of
Natural Language Engineering, 2005, pages 207–238 (cited on page 265).
[348] Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. “Building a
Large Annotated Corpus of English: The Penn Treebank”. In: volume 19. 2. Com-
putational Linguistics, 1993, pages 313–330 (cited on page 265).
[349] Hao Zhang, Liang Huang, Daniel Gildea, and Kevin Knight. “Synchronous Bi-
narization for Machine Translation”. In: Annual Meeting of the Association for
Computational Linguistics, 2006 (cited on pages 266, 277).
[350] Tong Xiao, Mu Li, Dongdong Zhang, Jingbo Zhu, and Ming Zhou. “Better Syn-
chronous Binarization for Machine Translation”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2009, pages 362–370 (cited on pages 266,
277).
[351] Dan Klein and Christopher D. Manning. “Accurate Unlexicalized Parsing”. In: An-
nual Meeting of the Association for Computational Linguistics, 2003, pages 423–
430 (cited on page 266).
[352] Yang Liu, Yajuan Lü, and Qun Liu. “Improving Tree-to-Tree Translation with
Packed Forests”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2009, pages 558–566 (cited on pages 266, 278).
[353] Declan Groves, Mary Hearne, and Andy Way. “Robust Sub-Sentential Alignment
of Phrase-Structure Trees”. In: International Conference on Computational Linguis-
tics, 2004 (cited on page 268).
[354] Jun Sun, Min Zhang, and Chew Lim Tan. “Discriminative Induction of Sub-Tree
Alignment using Limited Labeled Data”. In: International Conference on Compu-
tational Linguistics, 2010, pages 1047–1055 (cited on pages 268, 269).
[355] Yang Liu, Tian Xia, Xinyan Xiao, and Qun Liu. “Weighted Alignment Matrices
for Statistical Machine Translation”. In: Annual Meeting of the Association for
Computational Linguistics, 2009, pages 1017–1026 (cited on pages 268, 269).
[356] Jun Sun, Min Zhang, and Chew Lim Tan. “Exploring Syntactic Structural Fea-
tures for Sub-Tree Alignment Using Bilingual Tree Kernels”. In: Annual Meeting
of the Association for Computational Linguistics, 2010, pages 306–315 (cited on
page 269).
[357] Dan Klein and Christopher D. Manning. “Parsing and Hypergraphs”. In: volume 65.
3. New Developments in Parsing Technology, 2001, pages 123–134 (cited on page 270).
[358] Joshua Goodman. “Semiring Parsing”. In: volume 25. 4. Computational Linguis-
tics, 1999, pages 573–605 (cited on page 271).
[359] Jason Eisner. “Parameter Estimation for Probabilistic Finite-State Transducers”.
In: Annual Meeting of the Association for Computational Linguistics, 2002, pages 1–
8 (cited on page 271).
[360] Jingbo Zhu and Tong Xiao. “Improving Decoding Generalization for Tree-to-String
Translation”. In: Annual Meeting of the Association for Computational Linguistics,
2011, pages 418–423 (cited on pages 275, 278).
[361] Hiyan Alshawi, Adam L. Buchsbaum, and Fei Xia. “A Comparison of Head Trans-
ducers and Transfer for a Limited Domain Translation Application”. In: Morgan
Kaufmann Publishers, 1997, pages 360–365 (cited on page 278).
[362] Dekai Wu. “Trainable Coarse Bilingual Grammars for Parallel Text Bracketing”.
In: Third Workshop on Very Large Corpora, 1995 (cited on page 278).
[363] Dekai Wu and Hongsing Wong. “Machine Translation with a Stochastic Grammat-
ical Channel”. In: Morgan Kaufmann Publishers, 1998, pages 1408–1415 (cited on
page 278).
[364] J. A. Sánchez and J. M. Benedí. “Obtaining Word Phrases with Stochastic Inversion
Transduction Grammars for Phrase-based Statistical Machine Translation”. In: An-
nual Meeting of the Association for Computational Linguistics, 2006 (cited on
page 278).
[365] Hao Zhang, Chris Quirk, Robert C. Moore, and Daniel Gildea. “Bayesian Learning
of Non-Compositional Phrases with Synchronous Parsing”. In: Annual Meeting of
the Association for Computational Linguistics, 2008 (cited on page 278).
[366] Andreas Zollmann, Ashish Venugopal, Franz Josef Och, and Jay M. Ponte. “A
Systematic Comparison of Phrase-Based, Hierarchical and Syntax-Augmented Sta-
tistical MT”. In: International Conference on Computational Linguistics, 2008,
pages 1145–1152 (cited on page 278).
[367] Taro Watanabe, Hajime Tsukada, and Hideki Isozaki. “Left-to-Right Target Gener-
ation for Hierarchical Phrase-Based Translation”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2006 (cited on page 278).
[368] Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. “What’s in a trans-
lation rule?” In: Annual Meeting of the Association for Computational Linguistics,
2004, pages 273–280 (cited on page 278).
[369] Bryant Huang and Kevin Knight. “Relabeling Syntax Trees to Improve Syntax-
Based Machine Translation Quality”. In: Annual Meeting of the Association for
Computational Linguistics, 2006 (cited on page 278).
[370] Steve DeNeefe, Kevin Knight, Wei Wang, and Daniel Marcu. “What Can Syntax-
Based MT Learn from Phrase-Based MT?” In: Annual Meeting of the Association
for Computational Linguistics, 2007, pages 755–763 (cited on pages 278, 535).
[371] Ding Liu and Daniel Gildea. “Improved Tree-to-String Transducer for Machine
Translation”. In: Annual Meeting of the Association for Computational Linguistics,
2008, pages 62–69 (cited on page 278).
[372] Andreas Zollmann and Ashish Venugopal. “Syntax Augmented Machine Transla-
tion via Chart Parsing”. In: Annual Meeting of the Association for Computational
Linguistics, 2006, pages 138–141 (cited on page 278).
[373] Yuval Marton and Philip Resnik. “Soft Syntactic Constraints for Hierarchical Phrased-
Based Translation”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2008, pages 1003–1011 (cited on page 278).
[374] Rebecca Nesson, Stuart M. Shieber, and Alexander Rush. “Induction of probabilis-
tic synchronous tree-insertion grammars for machine translation”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2006 (cited on page 278).
[375] Min Zhang, Hongfei Jiang, Ai Ti Aw, Jun Sun, Sheng Li, and Chew Lim Tan.
A Tree-to-Tree Alignment-based Model for Statistical Machine Translation. 2007
(cited on page 278).
[376] Haitao Mi, Liang Huang, and Qun Liu. “Forest-Based Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2008, pages 192–199
(cited on page 278).
[377] Haitao Mi and Liang Huang. “Forest-based Translation Rule Extraction”. In: An-
nual Meeting of the Association for Computational Linguistics, 2008, pages 206–
214 (cited on page 278).
[378] Jiajun Zhang, Feifei Zhai, and Chengqing Zong. “Augmenting String-to-Tree Trans-
lation Models with Fuzzy Use of Source-side Syntax”. In: Annual Meeting of the
Association for Computational Linguistics, 2011, pages 204–215 (cited on page 278).
[379] Martin Popel, David Mareček, Nathan Green, and Zdeněk Žabokrtský. “Influence
of Parser Choice on Dependency-Based MT”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2011, pages 433–439 (cited on page 278).
[380] Tong Xiao, Jingbo Zhu, Hao Zhang, and Muhua Zhu. “An Empirical Study of
Translation Rule Extraction with Multiple Parsers”. In: Chinese Information Pro-
cessing Society of China, 2010, pages 1345–1353 (cited on page 278).
[381] Feifei Zhai, Jiajun Zhang, Yu Zhou, and Chengqing Zong. “Unsupervised Tree
Induction for Tree-based Translation”. In: volume 1. Transactions of the Association
for Computational Linguistics, 2013, pages 243–254 (cited on page 278).
[382] Christopher Quirk and Arul Menezes. “Dependency treelet translation: the con-
vergence of statistical and example-based machine-translation?” In: volume 20. 1.
Machine Translation, 2006, pages 43–65 (cited on page 279).
[383] Deyi Xiong, Qun Liu, and Shouxun Lin. “A Dependency Treelet String Correspon-
dence Model for Statistical Machine Translation”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2007, pages 40–47 (cited on page 279).
[384] Dekang Lin. “A Path-based Transfer Model for Machine Translation”. In: Interna-
tional Conference on Computational Linguistics, 2004 (cited on page 279).
[385] Yuan Ding and Martha Palmer. “Machine Translation Using Probabilistic Syn-
chronous Dependency Insertion Grammars”. In: Annual Meeting of the Associ-
ation for Computational Linguistics, 2005, pages 541–548 (cited on page 279).
[386] Hongshen Chen, Jun Xie, Fandong Meng, Wenbin Jiang, and Qun Liu. “A Depen-
dency Edge-based Transfer Model for Statistical Machine Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2014, pages 1103–1113
(cited on page 279).
[387] Jinsong Su, Yang Liu, Haitao Mi, Hongmei Zhao, Yajuan Lv, and Qun Liu. “Dependency-
Based Bracketing Transduction Grammar for Statistical Machine Translation”. In:
Chinese Information Processing Society of China, 2010, pages 1185–1193 (cited
on page 279).
[388] Jun Xie, Jinan Xu, and Qun Liu. “Augment Dependency-to-String Translation with
Fixed and Floating Structures”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2014, pages 2217–2226 (cited on page 279).
[389] Liangyou Li, Andy Way, and Qun Liu. “Dependency Graph-to-String Translation”.
In: Annual Meeting of the Association for Computational Linguistics, 2015, pages 33–
43 (cited on page 279).
[390] Haitao Mi and Qun Liu. “Constituency to Dependency Translation with Forests”.
In: Annual Meeting of the Association for Computational Linguistics, 2010, pages 1433–
1442 (cited on page 279).
[391] Zhaopeng Tu, Yang Liu, Young-Sook Hwang, Qun Liu, and Shouxun Lin. “Depen-
dency Forest for Statistical Machine Translation”. In: International Conference on
Computational Linguistics, 2010, pages 1092–1100 (cited on page 279).
[392] Srinivas Bangalore, German Bordel, and Giuseppe Riccardi. “Computing consen-
sus translation from multiple machine translation systems”. In: IEEE Workshop on
Automatic Speech Recognition and Understanding, 2001, pages 351–354 (cited on
page 279).
[393] Antti-Veikko I. Rosti, Necip Fazil Ayan, Bing Xiang, Spyridon Matsoukas, Richard
M. Schwartz, and Bonnie J. Dorr. “Combining Outputs from Multiple Machine
Translation Systems”. In: Annual Meeting of the Association for Computational
Linguistics, 2007, pages 228–235 (cited on page 279).
[394] Tong Xiao, Jingbo Zhu, and Tongran Liu. “Bagging and boosting statistical ma-
chine translation systems”. In: volume 195. Artificial Intelligence, 2013, pages 496–
527 (cited on pages 279, 478, 495).
[395] Yang Feng, Yang Liu, Haitao Mi, Qun Liu, and Yajuan Lü. “Lattice-based System
Combination for Statistical Machine Translation”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2009, pages 1105–1113 (cited on page 279).
[396] Xiaodong He, Mei Yang, Jianfeng Gao, Patrick Nguyen, and Robert C. Moore.
“Indirect-HMM-based Hypothesis Alignment for Combining Outputs from Ma-
chine Translation Systems”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2008, pages 98–107 (cited on page 279).
[397] Chi-Ho Li, Xiaodong He, Yupeng Liu, and Ning Xi. “Incremental HMM Align-
ment for MT System Combination”. In: Annual Meeting of the Association for
Computational Linguistics, 2009, pages 949–957 (cited on page 279).
[398] Yang Liu, Haitao Mi, Yang Feng, and Qun Liu. “Joint Decoding with Multiple
Translation Models”. In: Annual Meeting of the Association for Computational
Linguistics, 2009, pages 576–584 (cited on page 279).
[399] Mu Li, Nan Duan, Dongdong Zhang, Chi-Ho Li, and Ming Zhou. “Collaborative
Decoding: Partial Hypothesis Re-ranking Using Translation Consensus between
Decoders”. In: Annual Meeting of the Association for Computational Linguistics,
2009, pages 585–592 (cited on page 279).
[400] Tong Xiao, Jingbo Zhu, Chunliang Zhang, and Tongran Liu. “Syntactic Skeleton-
Based Translation”. In: AAAI Conference on Artificial Intelligence, 2016, pages 2856–
2862 (cited on pages 279, 535).
[401] Eugene Charniak. “Immediate-Head Parsing for Language Models”. In: Morgan
Kaufmann Publishers, 2001, pages 116–123 (cited on page 279).
[402] Libin Shen, Jinxi Xu, and Ralph M. Weischedel. “A New String-to-Dependency
Machine Translation Algorithm with a Target Dependency Language Model”. In:
Annual Meeting of the Association for Computational Linguistics, 2008, pages 577–
585 (cited on page 279).
[403] Tong Xiao, Jingbo Zhu, and Muhua Zhu. “Language Modeling for Syntax-Based
Machine Translation Using Tree Substitution Grammars: A Case Study on Chinese-
English Translation”. In: volume 10. 4. ACM Transactions on Asian Language
Information Processing (TALIP), 2011, pages 1–29 (cited on pages 279, 499).
[404] Peter F. Brown, Vincent J. Della Pietra, Peter V. De Souza, Jennifer C. Lai, and
Robert L. Mercer. “Class-based n-gram models of natural language”. In: volume 18.
4. Computational Linguistics, 1992, pages 467–479 (cited on page 288).
[405] Tomas Mikolov and Geoffrey Zweig. “Context dependent recurrent neural net-
work language model”. In: IEEE Spoken Language Technology Workshop, 2012,
pages 234–239 (cited on pages 288, 343).
[406] Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. “Recurrent Neural Network
Regularization”. In: arXiv: Neural and Evolutionary Computing, 2014 (cited on
page 288).
[407] Julian G. Zilly, Rupesh Kumar Srivastava, Jan Koutník, and Jürgen Schmidhuber.
“Recurrent Highway Networks”. In: International Conference on Machine Learn-
ing, 2016 (cited on page 288).
[408] Stephen Merity, Nitish Shirish Keskar, and Richard Socher. “Regularizing and op-
timizing LSTM language models”. In: International Conference on Learning Rep-
resentations, 2017 (cited on page 288).
[409] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya
Sutskever. “Language models are unsupervised multitask learners”. In: volume 1.
8. OpenAI Blog, 2019, page 9 (cited on pages 288, 436).
[410] Atılım Güneş Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jef-
frey Mark Siskind. “Automatic differentiation in machine learning: a survey”. In:
volume 18. 1. Journal of Machine Learning Research, 2017, pages 5595–5637
(cited on pages 317, 319).
[411] Ning Qian. “On the momentum term in gradient descent learning algorithms”. In:
volume 12. 1. Neural Networks, 1999, pages 145–151 (cited on page 320).
[412] John C. Duchi, Elad Hazan, and Yoram Singer. “Adaptive Subgradient Methods for
Online Learning and Stochastic Optimization”. In: volume 12. Journal of Machine
Learning Research, 2011, pages 2121–2159 (cited on pages 320, 321).
[413] Matthew D. Zeiler. “ADADELTA: An Adaptive Learning Rate Method”. In: arXiv
preprint arXiv:1212.5701, 2012 (cited on page 320).
[414] Tijmen Tieleman and Geoffrey Hinton. “Lecture 6.5-rmsprop: Divide the gradient
by a running average of its recent magnitude”. In: volume 4. 2. COURSERA: Neu-
ral networks for machine learning, 2012, pages 26–31 (cited on pages 320, 322).
[415] Diederik P. Kingma and Jimmy Ba. “Adam: A Method for Stochastic Optimiza-
tion”. In: International Conference on Learning Representations, 2015 (cited on
pages 320, 322, 377).
[416] Timothy Dozat. “Incorporating Nesterov Momentum into Adam”. In: International
Conference on Learning Representations, 2016 (cited on page 320).
[417] Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. “On the Convergence of Adam
and Beyond”. In: International Conference on Learning Representations, 2018 (cited
on page 320).
[418] Tong Xiao, Jingbo Zhu, Tongran Liu, and Chunliang Zhang. “Fast Parallel Train-
ing of Neural Language Models”. In: International Joint Conference on Artificial
Intelligence, 2017, pages 4193–4199 (cited on pages 324, 380).
[419] Sergey Ioffe and Christian Szegedy. “Batch Normalization: Accelerating Deep Net-
work Training by Reducing Internal Covariate Shift”. In: volume 37. International
Conference on Machine Learning, 2015, pages 448–456 (cited on page 325).
[420] Lei Jimmy Ba, Jamie Ryan Kiros, and Geoffrey Hinton. “Layer Normalization”.
In: volume abs/1607.06450. CoRR, 2016 (cited on pages 325, 422, 514).
[421] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learn-
ing for Image Recognition”. In: IEEE Conference on Computer Vision and Pattern
Recognition, 2016, pages 770–778 (cited on pages 325, 387, 395, 422, 514).
[422] Ngoc-quan Pham, German Kruszewski, and Gemma Boleda. “Convolutional Neu-
ral Network Language Models”. In: Conference on Empirical Methods in Natural
Language Processing, 2016 (cited on page 338).
[423] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean.
“Distributed Representations of Words and Phrases and their Compositionality”.
In: Conference on Neural Information Processing Systems, 2013, pages 3111–3119
(cited on pages 340, 343).
[424] Raha Moraffah, Mansooreh Karami, Ruocheng Guo, Adrienne Raglin, and Huan
Liu. “Causal Interpretability for Machine Learning-Problems, Methods and Eval-
uation”. In: volume 22. 1. ACM SIGKDD Conference on Knowledge Discovery
and Data Mining, 2020, pages 18–33 (cited on page 343).
[425] Boris Kovalerchuk, Muhammad Ahmad, and Ankur Teredesai. “Survey of explain-
able machine learning with visual and granular methods beyond quasi-explanations”.
In: volume abs/2009.10221. ArXiv, 2020 (cited on page 343).
[426] Finale Doshi-Velez and Been Kim. “Towards A Rigorous Science of Interpretable
Machine Learning”. In: arXiv preprint arXiv:1702.08608, 2017 (cited on page 343).
[427] Philip Arthur, Graham Neubig, and Satoshi Nakamura. “Incorporating Discrete
Translation Lexicons into Neural Machine Translation”. In: Conference on Empir-
ical Methods in Natural Language Processing, 2016, pages 1557–1567 (cited on
page 343).
[428] Jiacheng Zhang, Yang Liu, Huanbo Luan, Jingfang Xu, and Maosong Sun. “Prior
Knowledge Integration for Neural Machine Translation using Posterior Regulariza-
tion”. In: Annual Meeting of the Association for Computational Linguistics, 2017,
pages 1514–1523 (cited on pages 343, 386).
[429] Felix Stahlberg, Eva Hasler, Aurelien Waite, and Bill Byrne. “Syntactically Guided
Neural Machine Translation”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2016 (cited on page 343).
[430] Anna Currey and Kenneth Heafield. “Incorporating source syntax into transformer-
based neural machine translation”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2019, pages 24–33 (cited on page 343).
[431] Baosong Yang, Derek Wong, Tong Xiao, Lidia Chao, and Jingbo Zhu. “Towards
Bidirectional Hierarchical Representations for Attention-based Neural Machine
Translation”. In: Conference on Empirical Methods in Natural Language Process-
ing, 2017, pages 1432–1441 (cited on pages 343, 386, 530).
[432] David Mareček and Rudolf Rosa. “Extracting syntactic trees from transformer en-
coder self-attentions”. In: Conference on Empirical Methods in Natural Language
Processing, 2018, pages 347–349 (cited on page 343).
[433] Terra Blevins, Omer Levy, and Luke Zettlemoyer. “Deep RNNs encode soft hierar-
chical syntax”. In: Annual Meeting of the Association for Computational Linguis-
tics, 2018 (cited on page 343).
[434] Youzheng Wu, Xugang Lu, Hitoshi Yamamoto, Shigeki Matsuda, Chiori Hori,
and Hideki Kashioka. “Factored Language Model based on Recurrent Neural Net-
work”. In: International Conference on Computational Linguistics, 2012 (cited on
page 343).
[435] Heike Adel, Ngoc Vu, Katrin Kirchhoff, Dominic Telaar, and Tanja Schultz. “Syn-
tactic and Semantic Features For Code-Switching Factored Language Models”. In:
volume 23. IEEE/ACM Transactions on Audio, Speech, and Language Processing,
2015, pages 431–440 (cited on page 343).
[436] Tian Wang and Kyunghyun Cho. “Larger-Context Language Modelling”. In: An-
nual Meeting of the Association for Computational Linguistics, 2015 (cited on
page 343).
[437] Sungjin Ahn, Heeyoul Choi, Tanel Pärnamaa, and Yoshua Bengio. “A Neural Knowl-
edge Language Model”. In: arXiv preprint arXiv:1608.00318, 2016 (cited on page 343).
[438] Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush. “Character-
Aware Neural Language Models”. In: AAAI Conference on Artificial Intelligence,
2016 (cited on page 343).
[439] Kyuyeon Hwang and Wonyong Sung. “Character-level language modeling with
hierarchical recurrent neural networks”. In: International Conference on Acoustics,
Speech and Signal Processing, 2017, pages 5720–5724 (cited on page 343).
[440] Yasumasa Miyamoto and Kyunghyun Cho. “Gated Word-Character Recurrent Lan-
guage Model”. In: Conference on Empirical Methods in Natural Language Process-
ing, 2016, pages 1992–1997 (cited on page 343).
[441] Lyan Verwimp, Joris Pelemans, Hugo Van Hamme, and Patrick Wambacq. “Character-
Word LSTM Language Models”. In: Annual Conference of the European Associa-
tion for Machine Translation, 2017 (cited on page 343).
[442] Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. “Hybrid speech recog-
nition with Deep Bidirectional LSTM”. In: IEEE Workshop on Automatic Speech
Recognition and Understanding, 2013, pages 273–278 (cited on page 343).
[443] Jetic Gu, Hassan S. Shavarani, and Anoop Sarkar. “Top-down Tree Structured
Decoding with Syntactic Connections for Neural Machine Translation and Pars-
ing”. In: Conference on Empirical Methods in Natural Language Processing, 2018,
pages 401–413 (cited on pages 343, 386).
[444] Pengcheng Yin, Chunting Zhou, Junxian He, and Graham Neubig. “StructVAE:
Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing”.
In: Annual Meeting of the Association for Computational Linguistics, 2018 (cited
on page 343).
[445] Roee Aharoni and Yoav Goldberg. “Towards String-To-Tree Neural Machine Trans-
lation”. In: Annual Meeting of the Association for Computational Linguistics, 2017
(cited on pages 343, 534).
[446] Jasmijn Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, and Khalil Sima’an.
“Graph Convolutional Encoders for Syntax-aware Neural Machine Translation”.
In: Conference on Empirical Methods in Natural Language Processing, 2017 (cited
on page 343).
[447] Rik Koncel-Kedziorski, Dhanush Bekal, Yi Luan, Mirella Lapata, and Hannaneh
Hajishirzi. “Text Generation from Knowledge Graphs with Graph Transformers”.
In: Annual Conference of the North American Chapter of the Association for Com-
putational Linguistics, 2019 (cited on page 343).
[448] Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. “Learned in
Translation: Contextualized Word Vectors”. In: Conference on Neural Information
Processing Systems, 2017, pages 6294–6305 (cited on pages 343, 553).
[449] Jacob Devlin, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard M. Schwartz,
and John Makhoul. “Fast and Robust Neural Network Joint Models for Statistical
Machine Translation”. In: Annual Meeting of the Association for Computational
Linguistics, 2014, pages 1370–1380 (cited on pages 348, 387).
[450] Holger Schwenk. “Continuous Space Translation Models for Phrase-Based Statisti-
cal Machine Translation”. In: International Conference on Computational Linguis-
tics, 2012, pages 1071–1080 (cited on page 348).
[451] Nal Kalchbrenner and Phil Blunsom. “Recurrent Continuous Translation Models”.
In: Annual Meeting of the Association for Computational Linguistics, 2013, pages 1700–
1709 (cited on pages 348, 359, 387, 394).
[452] Sepp Hochreiter. “The Vanishing Gradient Problem During Learning Recurrent
Neural Nets and Problem Solutions”. In: volume 6. 2. International Journal of Un-
certainty, Fuzziness and Knowledge-Based Systems, 1998, pages 107–116 (cited
on page 348).
[453] Yoshua Bengio, Patrice Y. Simard, and Paolo Frasconi. “Learning long-term de-
pendencies with gradient descent is difficult”. In: volume 5. 2. IEEE Transactions
on Neural Networks, 1994, pages 157–166 (cited on page 348).
[454] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi,
Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff
Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan
Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian,
Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick,
Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. “Google’s Neu-
ral Machine Translation System: Bridging the Gap between Human and Machine
Translation”. In: volume abs/1609.08144. CoRR, 2016 (cited on pages 349, 359,
374, 375, 394, 476).
[455] Felix Stahlberg. “Neural Machine Translation: A Review”. In: volume 69. Journal
of Artificial Intelligence Research, 2020, pages 343–418 (cited on pages 349, 404,
479).
[456] Luisa Bentivogli, Arianna Bisazza, Mauro Cettolo, and Marcello Federico. “Neu-
ral versus Phrase-Based Machine Translation Quality: a Case Study”. In: Annual
Meeting of the Association for Computational Linguistics, 2016, pages 257–267
(cited on pages 350, 351).
[457] Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark,
Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis,
Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank
Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong Zhang,
Zhirui Zhang, and Ming Zhou. “Achieving Human Parity on Automatic Chinese
to English News Translation”. In: volume abs/1803.05567. CoRR, 2018 (cited on
pages 350, 351, 546, 557).
[458] Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey,
George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish
Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Zhifeng Chen, Yonghui Wu, and Mac-
duff Hughes. “The Best of Both Worlds: Combining Recent Advances in Neural
Machine Translation”. In: Annual Meeting of the Association for Computational
Linguistics, 2018, pages 76–86 (cited on pages 352, 483, 510).
[459] Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, and Tie-Yan Liu.
“Layer-Wise Coordination between Encoder and Decoder for Neural Machine Trans-
lation”. In: Conference on Neural Information Processing Systems, 2018 (cited on
page 352).
[460] Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. “Self-Attention with Relative
Position Representations”. In: Proceedings of the Human Language Technology
Conference of the North American Chapter of the Association for Computational
Linguistics, 2018, pages 464–468 (cited on pages 352, 416, 429, 497, 502, 503,
505).
[461] Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek Wong, and
Lidia Chao. “Learning Deep Transformer Models for Machine Translation”. In: An-
nual Meeting of the Association for Computational Linguistics, 2019, pages 1810–
1822 (cited on pages 352, 423, 426, 429, 497, 514, 516, 519, 520, 522, 524, 525).
[462] Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang,
and Jingbo Zhu. “Shallow-to-Deep Training for Neural Machine Translation”. In:
Conference on Empirical Methods in Natural Language Processing, 2020 (cited on
pages 352, 429, 502, 519, 524, 526).
[463] Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, and Weihua
Luo. “Multiscale Collaborative Deep Models for Neural Machine Translation”. In:
Annual Meeting of the Association for Computational Linguistics, 2020 (cited on
pages 352, 429).
[464] Yanyang Li, Qiang Wang, Tong Xiao, Tongran Liu, and Jingbo Zhu. “Neural Ma-
chine Translation with Joint Representation”. In: AAAI Conference on Artificial
Intelligence, 2020, pages 8285–8292 (cited on pages 355, 542, 544).
[465] Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio.
“On the Properties of Neural Machine Translation: Encoder-Decoder Approaches”.
In: Annual Meeting of the Association for Computational Linguistics, 2014, pages 103–
111 (cited on page 359).
[466] Sébastien Jean, KyungHyun Cho, Roland Memisevic, and Yoshua Bengio. “On
Using Very Large Target Vocabulary for Neural Machine Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2015, pages 1–10 (cited
on pages 359, 434, 480).
[467] Sepp Hochreiter and Jürgen Schmidhuber. “Long Short-Term Memory”. In: vol-
ume 9. 8. Neural Computation, 1997, pages 1735–1780 (cited on page 363).
[468] Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Dzmitry Bahdanau, Fethi
Bougares, Holger Schwenk, and Yoshua Bengio. “Learning Phrase Representa-
tions using RNN Encoder-Decoder for Statistical Machine Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2014, pages 1724–1734
(cited on page 365).
[469] Rico Sennrich, Orhan Firat, Kyunghyun Cho, Barry Haddow, Alexandra Birch, Ju-
lian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli
Barone, Jozef Mokry, and Maria Nadejde. “Nematus: a Toolkit for Neural Ma-
chine Translation”. In: Annual Conference of the European Association for Ma-
chine Translation, 2017, pages 65–68 (cited on pages 374, 633).
[470] Xavier Glorot and Yoshua Bengio. “Understanding the difficulty of training deep
feedforward neural networks”. In: volume 9. International Conference on Artificial
Intelligence and Statistics, 2010, pages 249–256 (cited on pages 377, 520, 521).
[471] Hirotugu Akaike. “Fitting autoregressive models for prediction”. In: volume 21. 1.
Annals of the Institute of Statistical Mathematics, 1969, pages 243–247 (cited on
page 382).
[472] Yanyang Li, Tong Xiao, Yinqiao Li, Qiang Wang, Changming Xu, and Jingbo Zhu.
“A Simple and Effective Approach to Coverage-Aware Neural Machine Transla-
tion”. In: Annual Meeting of the Association for Computational Linguistics, 2018,
pages 292–297 (cited on pages 385, 473, 477, 479, 552).
[473] Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. “Modeling
Coverage for Neural Machine Translation”. In: Annual Meeting of the Association
for Computational Linguistics, 2016 (cited on pages 385, 476, 477, 552).
[474] Biao Zhang and Rico Sennrich. “A Lightweight Recurrent Network for Sequence
Modeling”. In: Annual Meeting of the Association for Computational Linguistics,
2019, pages 1538–1548 (cited on page 386).
[475] Tao Lei, Yu Zhang, and Yoav Artzi. “Training RNNs as Fast as CNNs”. In: vol-
ume abs/1709.02755. CoRR, 2017 (cited on page 386).
[476] Biao Zhang, Deyi Xiong, Jinsong Su, Qian Lin, and Huiji Zhang. “Simplifying
Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Net-
works”. In: Conference on Empirical Methods in Natural Language Processing,
2018, pages 4273–4283 (cited on page 386).
[477] Xing Wang, Zhengdong Lu, Zhaopeng Tu, Hang Li, Deyi Xiong, and Min Zhang.
“Neural Machine Translation Advised by Statistical Machine Translation”. In: AAAI
Conference on Artificial Intelligence, 2017, pages 3330–3336 (cited on page 386).
[478] Wei He, Zhongjun He, Hua Wu, and Haifeng Wang. “Improved Neural Machine
Translation with SMT Features”. In: AAAI Conference on Artificial Intelligence,
2016, pages 151–157 (cited on page 386).
[479] Xintong Li, Guanlin Li, Lemao Liu, Max Meng, and Shuming Shi. “On the Word
Alignment from Neural Machine Translation”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2019, pages 1293–1303 (cited on page 386).
[480] Yau-Shian Wang, Hung-yi Lee, and Yun-Nung Chen. “Tree Transformer: Integrat-
ing Tree Structures into Self-Attention”. In: Conference on Empirical Methods in
Natural Language Processing, 2019, pages 1061–1070 (cited on page 386).
[481] Xinyi Wang, Hieu Pham, Pengcheng Yin, and Graham Neubig. “A Tree-based De-
coder for Neural Machine Translation”. In: Conference on Empirical Methods in
Natural Language Processing, 2018, pages 4772–4777 (cited on pages 386, 535).
[482] Jiajun Zhang and Chengqing Zong. “Bridging Neural Machine Translation and
Bilingual Dictionaries”. In: volume abs/1610.07272. CoRR, 2016 (cited on page 386).
[483] Xiangyu Duan, Baijun Ji, Hao Jia, Min Tan, Min Zhang, Boxing Chen, Weihua Luo,
and Yue Zhang. “Bilingual Dictionary Based Neural Machine Translation without
Using Parallel Sentences”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2020, pages 1570–1579 (cited on page 386).
[484] Qian Cao and Deyi Xiong. “Encoding Gated Translation Memory into Neural Ma-
chine Translation”. In: Conference on Empirical Methods in Natural Language Pro-
cessing, 2018, pages 3042–3047 (cited on page 386).
[485] Haitao Mi, Zhiguo Wang, and Abe Ittycheriah. “Supervised Attentions for Neural
Machine Translation”. In: Annual Meeting of the Association for Computational
Linguistics, 2016, pages 2283–2288 (cited on page 386).
[486] Lemao Liu, Masao Utiyama, Andrew M. Finch, and Eiichiro Sumita. “Neural Ma-
chine Translation with Supervised Attention”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2016, pages 3093–3102 (cited on page 386).
[487] Lesly Miculicich Werlen, Dhananjay Ram, Nikolaos Pappas, and James Hender-
son. “Document-Level Neural Machine Translation with Hierarchical Attention
Networks”. In: Conference on Empirical Methods in Natural Language Process-
ing, 2018, pages 2947–2954 (cited on pages 386, 601, 603).
[488] Elena Voita, Pavel Serdyukov, Rico Sennrich, and Ivan Titov. “Context-Aware
Neural Machine Translation Learns Anaphora Resolution”. In: Annual Meeting
of the Association for Computational Linguistics, 2018, pages 1264–1274 (cited
on pages 386, 600–603).
[489] Bei Li, Hui Liu, Ziyang Wang, Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu,
and Changliang Li. “Does Multi-Encoder Help? A Case Study on Context-Aware
Neural Machine Translation”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2020, pages 3512–3518 (cited on pages 386, 448, 602, 607).
[490] Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and
Kevin J. Lang. “Phoneme recognition using time-delay neural networks”. In: vol-
ume 37. International Conference on Acoustics, Speech and Signal Processing,
1989, pages 328–339 (cited on page 387).
[491] Yann LeCun, Bernhard Boser, John Denker, Don Henderson, Richard E. Howard,
Wayne E. Hubbard, and Larry Jackel. “Backpropagation Applied to Handwritten
Zip Code Recognition”. In: volume 1. Neural Computation, 1989, pages 541–551
(cited on page 387).
[492] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. “Gradient-based
learning applied to document recognition”. In: volume 86. 11. Proceedings of the
IEEE, 1998, pages 2278–2324 (cited on pages 387, 405).
[493] Yu Zhang, William Chan, and Navdeep Jaitly. “Very deep convolutional networks
for end-to-end speech recognition”. In: International Conference on Acoustics,
Speech and Signal Processing, 2017, pages 4845–4849 (cited on page 387).
[494] Li Deng, Ossama Abdel-Hamid, and Dong Yu. “A deep convolutional neural net-
work using heterogeneous pooling for trading acoustic invariance with phonetic
confusion”. In: International Conference on Acoustics, Speech and Signal Process-
ing, 2013, pages 6669–6673 (cited on page 387).
[495] Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. “A Convolutional Neu-
ral Network for Modelling Sentences”. In: Annual Meeting of the Association for
Computational Linguistics, 2014, pages 655–665 (cited on pages 387, 392, 402,
406).
[496] Yoon Kim. “Convolutional Neural Networks for Sentence Classification”. In: Con-
ference on Empirical Methods in Natural Language Processing, 2014, pages 1746–
1751 (cited on pages 387, 392, 393, 402, 406).
[497] Mingbo Ma, Liang Huang, Bowen Zhou, and Bing Xiang. “Dependency-based
Convolutional Neural Networks for Sentence Embedding”. In: Annual Meeting
of the Association for Computational Linguistics, 2015, pages 174–179 (cited on
page 387).
[498] Cícero Nogueira dos Santos and Maira Gatti. “Deep Convolutional Neural Net-
works for Sentiment Analysis of Short Texts”. In: International Conference on
Computational Linguistics, 2014, pages 69–78 (cited on pages 387, 392).
[499] Mingxuan Wang, Zhengdong Lu, Hang Li, Wenbin Jiang, and Qun Liu. “genCNN:
A Convolutional Architecture for Word Sequence Prediction”. In: Annual Meeting
of the Association for Computational Linguistics, 2015, pages 1567–1576 (cited
on page 387).
[500] Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. “Language Mod-
eling with Gated Convolutional Networks”. In: volume 70. International Confer-
ence on Machine Learning, 2017, pages 933–941 (cited on pages 387, 395, 396).
[501] Jonas Gehring, Michael Auli, David Grangier, and Yann N. Dauphin. “A Con-
volutional Encoder Model for Neural Machine Translation”. In: Annual Meeting
of the Association for Computational Linguistics, 2017, pages 123–135 (cited on
pages 387, 394).
[502] Lukasz Kaiser, Aidan N. Gomez, and François Chollet. “Depthwise Separable Con-
volutions for Neural Machine Translation”. In: International Conference on Learn-
ing Representations, 2018 (cited on pages 387, 394, 402).
[503] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-
Yang Fu, and Alexander C. Berg. “SSD: Single Shot MultiBox Detector”. In: vol-
ume 9905. European Conference on Computer Vision, 2016, pages 21–37 (cited
on page 388).
[504] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards
Real-Time Object Detection with Region Proposal Networks”. In: volume 39. 6.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, pages 1137–
1149 (cited on pages 388, 596).
[505] Rie Johnson and Tong Zhang. “Effective Use of Word Order for Text Categoriza-
tion with Convolutional Neural Networks”. In: Proceedings of the Human Lan-
guage Technology Conference of the North American Chapter of the Association
for Computational Linguistics, 2015, pages 103–112 (cited on pages 392, 402).
[506] Thien Huu Nguyen and Ralph Grishman. “Relation Extraction: Perspective from
Convolutional Neural Networks”. In: Proceedings of the Human Language Tech-
nology Conference of the North American Chapter of the Association for Compu-
tational Linguistics, 2015, pages 39–48 (cited on page 392).
[507] Felix Wu, Angela Fan, Alexei Baevski, Yann Dauphin, and Michael Auli. “Pay
Less Attention with Lightweight and Dynamic Convolutions”. In: International
Conference on Learning Representations, 2019 (cited on pages 394, 402, 404, 405,
429, 483, 508, 510).
[508] Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob Fergus. “End-To-End
Memory Networks”. In: Conference on Neural Information Processing Systems,
2015, pages 2440–2448 (cited on page 395).
[509] Md. Amirul Islam, Sen Jia, and Neil Bruce. “How much Position Information Do
Convolutional Neural Networks Encode?” In: International Conference on Learn-
ing Representations, 2020 (cited on page 396).
[510] Ilya Sutskever, James Martens, George E. Dahl, and Geoffrey Hinton. “On the im-
portance of initialization and momentum in deep learning”. In: International Con-
ference on Machine Learning, 2013, pages 1139–1147 (cited on page 400).
[511] Yoshua Bengio, Nicolas Boulanger-Lewandowski, and Razvan Pascanu. “Advances
in optimizing recurrent networks”. In: International Conference on Acoustics, Speech
and Signal Processing, 2013, pages 8624–8628 (cited on page 401).
[512] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan
Salakhutdinov. “Dropout: A Simple Way to Prevent Neural Networks from Over-
fitting”. In: volume 15. Journal of Machine Learning Research, 2014, pages 1929–
1958 (cited on pages 401, 426, 445).
[513] François Chollet. “Xception: Deep Learning with Depthwise Separable Convolu-
tions”. In: IEEE Conference on Computer Vision and Pattern Recognition, 2017,
pages 1800–1807 (cited on pages 402, 540).
[514] Andrew Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang,
Tobias Weyand, Marco Andreetto, and Hartwig Adam. “MobileNets: Efficient Con-
volutional Neural Networks for Mobile Vision Applications”. In: CoRR, 2017 (cited
on page 402).
[515] Rie Johnson and Tong Zhang. “Deep Pyramid Convolutional Neural Networks for
Text Categorization”. In: Annual Meeting of the Association for Computational
Linguistics, 2017, pages 562–570 (cited on page 402).
[516] Laurent Sifre and Stéphane Mallat. “Rigid-motion scattering for image classifica-
tion”. PhD thesis. École Polytechnique, 2014 (cited on page 402).
[517] Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. “DeepFace:
Closing the Gap to Human-Level Performance in Face Verification”. In: IEEE
Conference on Computer Vision and Pattern Recognition, 2014, pages 1701–1708
(cited on page 405).
[518] Yu-hsin Chen, Ignacio Lopez-Moreno, Tara Sainath, Mirkó Visontai, Raziel Al-
varez, and Carolina Parada. “Locally-connected and convolutional neural networks
for small footprint speaker recognition”. In: Conference of the International Speech
Communication Association, 2015, pages 1136–1140 (cited on page 405).
[519] Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, and Zicheng
Liu. “Dynamic Convolution: Attention Over Convolution Kernels”. In: IEEE Con-
ference on Computer Vision and Pattern Recognition, 2020, pages 11027–11036
(cited on page 405).
[520] Peng Zhou, Suncong Zheng, Jiaming Xu, Zhenyu Qi, Hongyun Bao, and Bo Xu.
“Joint Extraction of Multiple Relations and Entities by Using a Hybrid Neural Net-
work”. In: volume 10565. Springer, 2017, pages 135–146 (cited on page 406).
[521] Yubo Chen, Liheng Xu, Kang Liu, Daojian Zeng, and Jun Zhao. “Event Extraction
via Dynamic Multi-Pooling Convolutional Neural Networks”. In: Annual Meeting
of the Association for Computational Linguistics, 2015, pages 167–176 (cited on
page 406).
[522] Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. “Relation Clas-
sification via Convolutional Deep Neural Network”. In: International Conference
on Computational Linguistics, 2014, pages 2335–2344 (cited on page 406).
[523] Thien Huu Nguyen and Ralph Grishman. “Event Detection and Domain Adaptation
with Convolutional Neural Networks”. In: Annual Meeting of the Association for
Computational Linguistics, 2015, pages 365–371 (cited on page 406).
[524] Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. “Recurrent Convolutional Neural
Networks for Text Classification”. In: AAAI Conference on Artificial Intelligence,
2015, pages 2267–2273 (cited on page 406).
[525] Tao Lei, Regina Barzilay, and Tommi S. Jaakkola. “Molding CNNs for text: non-
linear, non-consecutive convolutions”. In: Conference on Empirical Methods in
Natural Language Processing, 2015, pages 1565–1575 (cited on page 406).
[526] Emma Strubell, Patrick Verga, David Belanger, and Andrew McCallum. “Fast and
Accurate Entity Recognition with Iterated Dilated Convolutions”. In: Conference
on Empirical Methods in Natural Language Processing, 2017, pages 2670–2680
(cited on page 406).
[527] Xuezhe Ma and Eduard H. Hovy. “End-to-end Sequence Labeling via Bi-directional
LSTM-CNNs-CRF”. In: Annual Meeting of the Association for Computational
Linguistics, 2016 (cited on page 406).
[528] Peng-Hsuan Li, Ruo-Ping Dong, Yu-Siang Wang, Ju-Chieh Chou, and Wei-Yun
Ma. “Leveraging Linguistic Structures for Named Entity Recognition with Bidi-
rectional Recursive Neural Networks”. In: Conference on Empirical Methods in
Natural Language Processing, 2017, pages 2664–2669 (cited on page 406).
[529] Changhan Wang, Kyunghyun Cho, and Douwe Kiela. “Code-Switched Named En-
tity Recognition with Embedding Attention”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2018, pages 154–158 (cited on page 406).
[530] Zhouhan Lin, Minwei Feng, Cícero Nogueira dos Santos, Mo Yu, Bing Xiang,
Bowen Zhou, and Yoshua Bengio. “A Structured Self-Attentive Sentence Embed-
ding”. In: International Conference on Learning Representations, 2017 (cited on
pages 408, 518).
[531] Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer,
and Alexander Ku. “Image Transformer”. In: volume abs/1802.05751. CoRR, 2018
(cited on page 410).
[532] Linhao Dong, Shuang Xu, and Bo Xu. “Speech-Transformer: A No-Recurrence
Sequence-to-Sequence Model for Speech Recognition”. In: International Confer-
ence on Acoustics, Speech and Signal Processing, 2018, pages 5884–5888 (cited
on page 410).
[533] Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu,
Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, and Ruoming Pang. “Con-
former: Convolution-augmented Transformer for Speech Recognition”. In: Inter-
national Speech Communication Association, 2020, pages 5036–5040 (cited on
pages 410, 508).
[534] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbig-
niew Wojna. “Rethinking the Inception Architecture for Computer Vision”. In:
IEEE Conference on Computer Vision and Pattern Recognition, 2016, pages 2818–
2826 (cited on pages 426, 441).
[535] Ashish Vaswani, Samy Bengio, Eugene Brevdo, François Chollet, Aidan Gomez,
Stephan Gouws, Llion Jones, Lukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan
Sepassi, Noam Shazeer, and Jakob Uszkoreit. “Tensor2Tensor for Neural Machine
Translation”. In: Association for Machine Translation in the Americas, 2018, pages 193–
199 (cited on pages 428, 429, 476, 515, 633).
[536] Matthieu Courbariaux and Yoshua Bengio. “BinaryNet: Training Deep Neural Net-
works with Weights and Activations Constrained to +1 or -1”. In: volume abs/1602.02830.
CoRR, 2016 (cited on page 428).
[537] Ye Lin, Yanyang Li, Tengbo Liu, Tong Xiao, Tongran Liu, and Jingbo Zhu. “To-
wards Fully 8-bit Integer Inference for the Transformer Model”. In: International
Joint Conference on Artificial Intelligence, 2020, pages 3759–3765 (cited on pages 428,
429).
[538] Tong Xiao, Yinqiao Li, Jingbo Zhu, Zhengtao Yu, and Tongran Liu. “Sharing Atten-
tion Weights for Fast Transformer”. In: International Joint Conference on Artificial
Intelligence, 2019, pages 5292–5298 (cited on pages 428, 429, 481–483).
[539] Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, and Ivan Titov. “An-
alyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the
Rest Can Be Pruned”. In: Annual Meeting of the Association for Computational
Linguistics, 2019, pages 5797–5808 (cited on pages 429, 482, 499, 544).
[540] Biao Zhang, Deyi Xiong, and Jinsong Su. “Accelerating Neural Transformer via
an Average Attention Network”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2018, pages 1789–1798 (cited on pages 429, 483).
[541] Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, and Jingbo Zhu.
“Weight Distillation: Transferring the Knowledge in Neural Network Parameters”.
In: volume abs/2009.09152. ArXiv, 2020 (cited on page 429).
[542] Zhanghao Wu, Zhijian Liu, Ji Lin, Yujun Lin, and Song Han. “Lite Transformer
with Long-Short Range Attention”. In: International Conference on Learning Rep-
resentations, 2020 (cited on pages 429, 509).
[543] Nikita Kitaev, Lukasz Kaiser, and Anselm Levskaya. “Reformer: The Efficient
Transformer”. In: International Conference on Learning Representations, 2020 (cited
on pages 429, 483, 512, 601).
[544] Myle Ott, Sergey Edunov, David Grangier, and Michael Auli. “Scaling Neural Ma-
chine Translation”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2018 (cited on page 429).
[545] Aishwarya Bhandare, Vamsi Sripathi, Deepthi Karkada, Vivek Menon, Sun Choi,
Kushal Datta, and Vikram Saletore. “Efficient 8-Bit Quantization of Transformer
Neural Machine Language Translation Model”. In: volume abs/1906.00532. CoRR,
2019 (cited on pages 429, 485, 499).
[546] Abigail See, Minh-Thang Luong, and Christopher D. Manning. “Compression of
Neural Machine Translation Models via Pruning”. In: International Conference on
Computational Linguistics, 2016, pages 291–301 (cited on page 429).
[547] Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. “Distilling the Knowledge in a
Neural Network”. In: volume abs/1503.02531. CoRR, 2015 (cited on pages 429,
458, 459, 482, 499, 562, 563).
[548] Yoon Kim and Alexander Rush. “Sequence-Level Knowledge Distillation”. In:
Conference on Empirical Methods in Natural Language Processing, 2016, pages 1317–
1327 (cited on pages 429, 459).
[549] Yun Chen, Yang Liu, Yong Cheng, and Victor O. K. Li. “A Teacher-Student Frame-
work for Zero-Resource Neural Machine Translation”. In: Annual Meeting of the
Association for Computational Linguistics, 2017, pages 1925–1935 (cited on pages 429,
561, 562).
[550] Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc Le, and Ruslan
Salakhutdinov. “Transformer-XL: Attentive Language Models Beyond a Fixed-
Length Context”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2019, pages 2978–2988 (cited on pages 429, 502, 505).
[551] Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, and Cho-Jui Hsieh. “Learning to
Encode Position for Transformer with Continuous Dynamical Model”. In: vol-
ume abs/2003.09229. ArXiv, 2020 (cited on pages 429, 506).
[552] Ganesh Jawahar, Benoît Sagot, and Djamé Seddah. “What Does BERT Learn about
the Structure of Language?” In: Annual Meeting of the Association for Computa-
tional Linguistics, 2019 (cited on pages 429, 508).
[553] Baosong Yang, Zhaopeng Tu, Derek Wong, Fandong Meng, Lidia Chao, and Tong
Zhang. “Modeling Localness for Self-Attention Networks”. In: Annual Meeting of
the Association for Computational Linguistics, 2018, pages 4449–4458 (cited on
pages 429, 506, 508).
[554] Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, and Zhaopeng Tu.
“Convolutional Self-Attention Networks”. In: Annual Meeting of the Association
for Computational Linguistics, 2019, pages 4040–4045 (cited on pages 429, 508).
[555] Qiang Wang, Fuxue Li, Tong Xiao, Yanyang Li, Yinqiao Li, and Jingbo Zhu. “Multi-
layer Representation Fusion for Neural Machine Translation”. In: volume abs/2002.06714.
International Conference on Computational Linguistics, 2018 (cited on pages 429,
514, 516).
[556] Ankur Bapna, Mia Xu Chen, Orhan Firat, Yuan Cao, and Yonghui Wu. “Training
Deeper Neural Machine Translation Models with Transparent Attention”. In: An-
nual Meeting of the Association for Computational Linguistics, 2018, pages 3028–
3033 (cited on pages 429, 514, 516, 518, 519).
[557] Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Shuming Shi, and Tong Zhang. “Exploit-
ing Deep Representations for Neural Machine Translation”. In: Annual Meeting of
the Association for Computational Linguistics, 2018, pages 4253–4262 (cited on
pages 429, 514, 516, 520, 524).
[558] Xing Wang, Zhaopeng Tu, Longyue Wang, and Shuming Shi. “Exploiting Senten-
tial Context for Neural Machine Translation”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2019 (cited on pages 429, 514).
[559] Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Longyue Wang, Shuming Shi, and Tong
Zhang. “Dynamic Layer Aggregation for Neural Machine Translation with Routing-
by-Agreement”. In: AAAI Conference on Artificial Intelligence, 2019, pages 86–
93 (cited on pages 429, 514, 516, 519).
[560] Mercedes Garcia-Martinez, Loïc Barrault, and Fethi Bougares. “Factored Neural
Machine Translation Architectures”. In: International Workshop on Spoken Lan-
guage Translation (IWSLT’16), 2016 (cited on page 434).
[561] Jason Lee, Kyunghyun Cho, and Thomas Hofmann. “Fully Character-Level Neural
Machine Translation without Explicit Segmentation”. In: volume 5. Transactions
of the Association for Computational Linguistics, 2017, pages 365–378 (cited on
pages 434, 564, 580).
[562] Minh-Thang Luong and Christopher Manning. “Achieving Open Vocabulary Neu-
ral Machine Translation with Hybrid Word-Character Models”. In: Annual Meeting
of the Association for Computational Linguistics, 2016 (cited on pages 435, 633).
[563] Philip Gage. “A new algorithm for data compression”. In: volume 12. The C Users
Journal archive, 1994, pages 23–38 (cited on page 435).
[564] Taku Kudo. “Subword Regularization: Improving Neural Network Translation Mod-
els with Multiple Subword Candidates”. In: Annual Meeting of the Association for
Computational Linguistics, 2018, pages 66–75 (cited on pages 436, 438).
[565] Mike Schuster and Kaisuke Nakajima. “Japanese and Korean voice search”. In:
IEEE International Conference on Acoustics, Speech and Signal Processing, 2012,
pages 5149–5152 (cited on page 436).
[566] Taku Kudo and John Richardson. “SentencePiece: A simple and language indepen-
dent subword tokenizer and detokenizer for Neural Text Processing”. In: Confer-
ence on Empirical Methods in Natural Language Processing, 2018, pages 66–71
(cited on page 438).
[567] Ivan Provilkov, Dmitrii Emelianenko, and Elena Voita. “BPE-Dropout: Simple
and Effective Subword Regularization”. In: Annual Meeting of the Association
for Computational Linguistics, 2020, pages 1882–1892 (cited on page 438).
[568] Xuanli He, Gholamreza Haffari, and Mohammad Norouzi. “Dynamic Program-
ming Encoding for Subword Segmentation in Neural Machine Translation”. In: An-
nual Meeting of the Association for Computational Linguistics, 2020, pages 3042–
3051 (cited on page 438).
[569] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. “Deep learning”. In: volume 521.
7553. Nature, 2015, pages 436–444 (cited on pages 441, 450).
[570] Geoffrey Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan
Salakhutdinov. “Improving neural networks by preventing co-adaptation of feature
detectors”. In: volume abs/1207.0580. CoRR, 2012 (cited on page 443).
[571] Mathias Müller, Annette Rios, and Rico Sennrich. “Domain Robustness in Neural
Machine Translation”. In: Association for Machine Translation in the Americas,
2020, pages 151–164 (cited on page 445).
[572] Nicholas Carlini and David Wagner. “Towards Evaluating the Robustness of Neu-
ral Networks”. In: IEEE Symposium on Security and Privacy, 2017, pages 39–57
(cited on page 445).
[573] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. “Deep-
Fool: A Simple and Accurate Method to Fool Deep Neural Networks”. In: IEEE
Conference on Computer Vision and Pattern Recognition, 2016, pages 2574–2582
(cited on pages 445, 469).
[574] Yong Cheng, Lu Jiang, and Wolfgang Macherey. “Robust Neural Machine Trans-
lation with Doubly Adversarial Inputs”. In: Annual Meeting of the Association for
Computational Linguistics, 2019, pages 4324–4333 (cited on pages 445, 448).
[575] Anh Mai Nguyen, Jason Yosinski, and Jeff Clune. “Deep neural networks are easily
fooled: High confidence predictions for unrecognizable images”. In: IEEE Confer-
ence on Computer Vision and Pattern Recognition, 2015, pages 427–436 (cited on
pages 445, 469).
[576] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan,
Ian J. Goodfellow, and Rob Fergus. “Intriguing properties of neural networks”. In:
International Conference on Learning Representations, 2014 (cited on page 445).
[577] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. “Explaining and Har-
nessing Adversarial Examples”. In: International Conference on Learning Repre-
sentations, 2015 (cited on pages 445–447).
[578] Robin Jia and Percy Liang. “Adversarial Examples for Evaluating Reading Com-
prehension Systems”. In: Conference on Empirical Methods in Natural Language
Processing, 2017, pages 2021–2031 (cited on pages 446, 469).
[579] Giannis Bekoulis, Johannes Deleu, Thomas Demeester, and Chris Develder. “Ad-
versarial training for multi-context joint entity and relation extraction”. In: Confer-
ence on Empirical Methods in Natural Language Processing, 2018, pages 2830–
2836 (cited on page 446).
[580] Michihiro Yasunaga, Jungo Kasai, and Dragomir Radev. “Robust Multilingual Part-
of-Speech Tagging via Adversarial Training”. In: Annual Conference of the North
American Chapter of the Association for Computational Linguistics, 2018, pages 976–
986 (cited on page 446).
[581] Yonatan Belinkov and Yonatan Bisk. “Synthetic and Natural Noise Both Break
Neural Machine Translation”. In: International Conference on Learning Represen-
tations, 2018 (cited on page 446).
[582] Paul Michel, Xian Li, Graham Neubig, and Juan Miguel Pino. “On Evaluation of
Adversarial Perturbations for Sequence-to-Sequence Models”. In: Annual Confer-
ence of the North American Chapter of the Association for Computational Linguis-
tics, 2019, pages 3103–3114 (cited on page 446).
[583] Zhitao Gong, Wenlu Wang, Bo Li, Dawn Song, and Wei-Shinn Ku. “Adversarial Texts
with Gradient Methods”. In: volume abs/1801.07175. ArXiv, 2018 (cited on page 446).
[584] Vaibhav, Sumeet Singh, Craig Stewart, and Graham Neubig. “Improving Robust-
ness of Machine Translation with Synthetic Noise”. In: Annual Conference of the
North American Chapter of the Association for Computational Linguistics, 2019,
pages 1916–1920 (cited on page 446).
[585] Antonios Anastasopoulos, Alison Lui, Toan Nguyen, and David Chiang. “Neural
Machine Translation of Text from Non-Native Speakers”. In: Annual Conference
of the North American Chapter of the Association for Computational Linguistics,
2019, pages 3070–3080 (cited on page 446).
[586] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. “Semantically Equiva-
lent Adversarial Rules for Debugging NLP models”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2018, pages 856–865 (cited on page 446).
[587] Suranjana Samanta and Sameep Mehta. “Towards Crafting Text Adversarial Sam-
ples”. In: volume abs/1707.02812. CoRR, 2017 (cited on page 447).
[588] Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi.
“Deep Text Classification Can be Fooled”. In: International Joint Conference on
Artificial Intelligence, 2018, pages 4208–4215 (cited on page 447).
[589] Javid Ebrahimi, Daniel Lowd, and Dejing Dou. “On Adversarial Examples for
Character-Level Neural Machine Translation”. In: International Conference on Com-
putational Linguistics, 2018, pages 653–663 (cited on page 447).
[590] Fei Gao, Jinhua Zhu, Lijun Wu, Yingce Xia, Tao Qin, Xueqi Cheng, Wengang
Zhou, and Tie-Yan Liu. “Soft Contextual Data Augmentation for Neural Machine
Translation”. In: Annual Meeting of the Association for Computational Linguistics,
2019, pages 5539–5544 (cited on pages 447, 550, 580).
[591] Zhengli Zhao, Dheeru Dua, and Sameer Singh. “Generating Natural Adversarial
Examples”. In: International Conference on Learning Representations, 2018 (cited
on page 447).
[592] Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, and Yang Liu. “Towards
Robust Neural Machine Translation”. In: Annual Meeting of the Association for
Computational Linguistics, 2018, pages 1756–1766 (cited on pages 447, 585).
[593] Hairong Liu, Mingbo Ma, Liang Huang, Hao Xiong, and Zhongjun He. “Robust
Neural Machine Translation with Joint Textual and Phonetic Embedding”. In: An-
nual Meeting of the Association for Computational Linguistics, 2019, pages 3044–
3049 (cited on page 448).
[594] Stanley Chen and Ronald Rosenfeld. “A Gaussian prior for smoothing maximum
entropy models”. In: Carnegie Mellon University, School of Computer Science,
1999 (cited on page 448).
[595] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol.
“Extracting and composing robust features with denoising autoencoders”. In: In-
ternational Conference on Machine Learning, 2008 (cited on pages 449, 548, 549).
[596] Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, and Hang Li. “Neural ma-
chine translation with reconstruction”. In: volume 31. 1. AAAI Conference on Ar-
tificial Intelligence, 2017 (cited on page 449).
[597] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. “Scheduled Sam-
pling for Sequence Prediction with Recurrent Neural Networks”. In: Annual Con-
ference on Neural Information Processing Systems, 2015, pages 1171–1179 (cited
on pages 450, 451, 479).
[598] Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. “Se-
quence Level Training with Recurrent Neural Networks”. In: International Confer-
ence on Learning Representations, 2016 (cited on pages 450, 469, 479).
[599] Chen Xu, Bojie Hu, Yufan Jiang, Kai Feng, Zeyang Wang, Shen Huang, Qi Ju, Tong
Xiao, and Jingbo Zhu. “Dynamic Curriculum Learning for Low-Resource Neural
Machine Translation”. In: International Committee on Computational Linguistics,
2020, pages 3977–3989 (cited on pages 451, 466).
[600] Lijun Wu, Yingce Xia, Fei Tian, Li Zhao, Tao Qin, Jianhuang Lai, and Tie-Yan Liu.
“Adversarial Neural Machine Translation”. In: volume 95. Proceedings of Machine
Learning Research. Asian Conference on Machine Learning, 2018, pages 534–549
(cited on page 452).
[601] Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe,
Joelle Pineau, Aaron C. Courville, and Yoshua Bengio. “An Actor-Critic Algo-
rithm for Sequence Prediction”. In: International Conference on Learning Repre-
sentations, 2017 (cited on pages 453, 455, 469).
[602] Sham M. Kakade. “A Natural Policy Gradient”. In: Advances in Neural Informa-
tion Processing Systems, 2001, pages 1531–1538 (cited on page 454).
[603] Peter Henderson, Joshua Romoff, and Joelle Pineau. “Where Did My Optimum
Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient
Methods”. In: volume abs/1810.02525. CoRR, 2018 (cited on page 454).
[604] Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. “Understanding
Back-Translation at Scale”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2018, pages 489–500 (cited on pages 454, 546, 547).
[605] Wouter Kool, Herke van Hoof, and Max Welling. “Stochastic Beams and Where To
Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replace-
ment”. In: volume 97. Proceedings of Machine Learning Research. International
Conference on Machine Learning, 2019, pages 3499–3508 (cited on page 454).
[606] Richard Sutton and Andrew Barto. Reinforcement learning: An introduction. MIT
Press, 2018 (cited on page 456).
[607] David Silver, Aja Huang, Chris Maddison, Arthur Guez, Laurent Sifre, George
van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas Panneer-
shelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalch-
brenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu,
Thore Graepel, and Demis Hassabis. “Mastering the game of Go with deep neural
networks and tree search”. In: volume 529. 7587. Nature, 2016, pages 484–489
(cited on page 457).
[608] Wojciech Zaremba, Tomas Mikolov, Armand Joulin, and Rob Fergus. “Learning
Simple Algorithms from Examples”. In: volume 48. JMLR Workshop and Confer-
ence Proceedings. International Conference on Machine Learning, 2016, pages 421–
429 (cited on page 457).
[609] Andrew Ng, Daishi Harada, and Stuart Russell. “Policy Invariance Under Reward
Transformations: Theory and Application to Reward Shaping”. In: International
Conference on Machine Learning, 1999, pages 278–287 (cited on page 457).
[610] Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, and
Joseph E Gonzalez. “Train large, then compress: Rethinking model size for effi-
cient training and inference of transformers”. In: arXiv preprint arXiv:2002.11794,
2020 (cited on page 459).
[611] Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, and
Joseph E. Gonzalez. “Train Large, Then Compress: Rethinking Model Size for Effi-
cient Training and Inference of Transformers”. In: volume abs/2002.11794. CoRR,
2020 (cited on page 459).
[612] Sauleh Eetemadi, William Lewis, Kristina Toutanova, and Hayder Radha. “Survey
of data-selection methods in statistical machine translation”. In: volume 29. 3-4.
Machine Translation, 2015, pages 189–223 (cited on page 462).
[613] Denny Britz, Quoc Le, and Reid Pryzant. “Effective domain mixing for neural ma-
chine translation”. In: Proceedings of the Second Conference on Machine Transla-
tion, 2017, pages 118–126 (cited on pages 462, 578).
[614] Amittai Axelrod, Xiaodong He, and Jianfeng Gao. “Domain Adaptation via Pseudo
In-Domain Data Selection”. In: Conference on Empirical Methods in Natural Lan-
guage Processing, 2011, pages 355–362 (cited on page 463).
[615] Amittai Axelrod, Philip Resnik, Xiaodong He, and Mari Ostendorf. “Data Selec-
tion With Fewer Words”. In: Conference on Empirical Methods in Natural Lan-
guage Processing, 2015, pages 58–65 (cited on page 463).
[616] Rui Wang, Masao Utiyama, Lemao Liu, Kehai Chen, and Eiichiro Sumita. “In-
stance Weighting for Neural Machine Translation Domain Adaptation”. In: Con-
ference on Empirical Methods in Natural Language Processing, 2017, pages 1482–
1488 (cited on pages 463, 611).
[617] Saab Mansour, Joern Wuebker, and Hermann Ney. “Combining translation and
language model scoring for domain-specific data filtering”. In: International Work-
shop on Spoken Language Translation, 2011, pages 222–229 (cited on page 463).
[618] Boxing Chen and Fei Huang. “Semi-supervised Convolutional Networks for Trans-
lation Adaptation with Tiny Amount of In-domain Data”. In: The SIGNLL Confer-
ence on Computational Natural Language Learning, 2016, pages 314–323 (cited
on page 463).
[619] Boxing Chen, Roland Kuhn, George Foster, Colin Cherry, and Fei Huang. “Bilin-
gual methods for adaptive training data selection for machine translation”. In: As-
sociation for Machine Translation in the Americas, 2016, pages 93–103 (cited on
page 463).
[620] Boxing Chen, Colin Cherry, George Foster, and Samuel Larkin. “Cost Weighting
for Neural Machine Translation Domain Adaptation”. In: Annual Meeting of the
Association for Computational Linguistics, 2017, pages 40–46 (cited on page 463).
[621] Mirela-Stefania Duma and Wolfgang Menzel. “Automatic Threshold Detection for
Data Selection in Machine Translation”. In: Proceedings of the Second Conference
on Machine Translation, 2017, pages 483–488 (cited on page 463).
[622] Ergun Biçici and Deniz Yuret. “Instance Selection for Machine Translation using
Feature Decay Algorithms”. In: Proceedings of the Sixth Workshop on Statistical
Machine Translation, 2011, pages 272–283 (cited on page 463).
[623] Alberto Poncelas, Gideon Maillette de Buy Wenniger, and Andy Way. “Feature
decay algorithms for neural machine translation”. In: European Association for
Machine Translation, 2018 (cited on page 463).
[624] Xabier Soto, Dimitar Sht. Shterionov, Alberto Poncelas, and Andy Way. “Selecting
Backtranslated Data from Multiple Sources for Improved Neural Machine Transla-
tion”. In: Annual Meeting of the Association for Computational Linguistics, 2020,
pages 3898–3908 (cited on page 463).
[625] Marlies van der Wees, Arianna Bisazza, and Christof Monz. “Dynamic Data Se-
lection for Neural Machine Translation”. In: Conference on Empirical Methods in
Natural Language Processing, 2017, pages 1400–1410 (cited on pages 463, 579).
[626] Wei Wang, Taro Watanabe, Macduff Hughes, Tetsuji Nakagawa, and Ciprian Chelba.
“Denoising Neural Machine Translation Training with Trusted Data and Online
Data Selection”. In: Proceedings of the Third Conference on Machine Translation,
2018, pages 133–143 (cited on pages 463, 464).
[627] Rui Wang, Masao Utiyama, and Eiichiro Sumita. “Dynamic Sentence Sampling for
Efficient Training of Neural Machine Translation”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2018, pages 298–304 (cited on page 463).
[628] Huda Khayrallah and Philipp Koehn. “On the Impact of Various Types of Noise on
Neural Machine Translation”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2018, pages 74–83 (cited on page 464).
[629] Lluís Formiga and José A. R. Fonollosa. “Dealing with Input Noise in Statistical
Machine Translation”. In: International Conference on Computational Linguistics,
2012, pages 319–328 (cited on page 464).
[630] Lei Cui, Dongdong Zhang, Shujie Liu, Mu Li, and Ming Zhou. “Bilingual Data
Cleaning for SMT using Graph-based Random Walk”. In: Annual Meeting of the
Association for Computational Linguistics, 2013, pages 340–345 (cited on page 464).
[631] Mohammed Mediani. “Learning from Noisy Data in Statistical Machine Trans-
lation”. PhD thesis. Karlsruhe Institute of Technology, Germany, 2017 (cited on
page 464).
[632] Spencer Rarrick, Chris Quirk, and Will Lewis. “MT detection in web-scraped par-
allel corpora”. In: Machine Translation, 2011, pages 422–430 (cited on page 464).
[633] Kaveh Taghipour, Shahram Khadivi, and Jia Xu. “Parallel corpus refinement as an
outlier detection algorithm”. In: Machine Translation, 2011, pages 414–421 (cited
on page 464).
[634] Hainan Xu and Philipp Koehn. “Zipporah: a Fast and Scalable Data Cleaning Sys-
tem for Noisy Web-Crawled Parallel Corpora”. In: Conference on Empirical Meth-
ods in Natural Language Processing, 2017 (cited on page 464).
[635] Marine Carpuat, Yogarshi Vyas, and Xing Niu. “Detecting Cross-Lingual Semantic
Divergence for Neural Machine Translation”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2017, pages 69–79 (cited on page 464).
[636] Yogarshi Vyas, Xing Niu, and Marine Carpuat. “Identifying Semantic Divergences
in Parallel Text without Annotations”. In: Annual Conference of the North Ameri-
can Chapter of the Association for Computational Linguistics, 2018, pages 1503–
1515 (cited on page 464).
[637] Wei Wang, Isaac Caswell, and Ciprian Chelba. “Dynamically Composing Domain-
Data Selection with Clean-Data Selection by “Co-Curricular Learning” for Neural
Machine Translation”. In: Annual Meeting of the Association for Computational
Linguistics, 2019, pages 1282–1292 (cited on pages 464, 466).
[638] Jingbo Zhu, Huizhen Wang, and Eduard H. Hovy. “Multi-Criteria-Based Strategy
to Stop Active Learning for Data Annotation”. In: International Conference on
Computational Linguistics, 2008, pages 1129–1136 (cited on page 464).
[639] Jingbo Zhu and Matthew Ma. “Uncertainty-based active learning with instability
estimation for text classification”. In: volume 8. 4. ACM Transactions on Speech
and Language Processing, 2012, 5:1–5:21 (cited on page 465).
[640] Jingbo Zhu, Huizhen Wang, Tianshun Yao, and Benjamin K. Tsou. “Active Learn-
ing with Sampling by Uncertainty and Density for Word Sense Disambiguation
and Text Classification”. In: International Conference on Computational Linguis-
tics, 2008, pages 1137–1144 (cited on page 465).
[641] Ming Liu, Wray L. Buntine, and Gholamreza Haffari. “Learning to Actively Learn
Neural Machine Translation”. In: The SIGNLL Conference on Computational Nat-
ural Language Learning, 2018, pages 334–344 (cited on page 465).
[642] Yuekai Zhao, Haoran Zhang, Shuchang Zhou, and Zhihua Zhang. “Active Learning
Approaches to Enhancing Neural Machine Translation: An Empirical Study”. In:
Conference on Empirical Methods in Natural Language Processing, 2020, pages 1796–
1806 (cited on page 465).
[643] Álvaro Peris and Francisco Casacuberta. “Active Learning for Interactive Neural
Machine Translation of Data Streams”. In: The SIGNLL Conference on Computa-
tional Natural Language Learning, 2018, pages 151–160 (cited on pages 465, 478).
[644] Marco Turchi, Matteo Negri, M. Amin Farajian, and Marcello Federico. “Contin-
uous Learning from Human Post-Edits for Neural Machine Translation”. In: vol-
ume 108. The Prague Bulletin of Mathematical Linguistics, 2017, pages 233–244
(cited on page 465).
[645] Álvaro Peris and Francisco Casacuberta. “Online learning for effort reduction in in-
teractive neural machine translation”. In: volume 58. Computer Speech Language,
2019, pages 98–126 (cited on pages 465, 614).
[646] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. “Curricu-
lum learning”. In: volume 382. ACM International Conference Proceeding Series.
International Conference on Machine Learning, 2009, pages 41–48 (cited on page 465).
[647] Emmanouil Antonios Platanios, Otilia Stretcu, Graham Neubig, Barnabás Póczos,
and Tom M. Mitchell. “Competence-based Curriculum Learning for Neural Ma-
chine Translation”. In: Conference of the North American Chapter of the Associa-
tion for Computational Linguistics: Human Language Technologies, 2019, pages 1162–
1172 (cited on page 466).
[648] Tom Kocmi and Ondrej Bojar. “Curriculum Learning and Minibatch Bucketing
in Neural Machine Translation”. In: International Conference Recent Advances in
Natural Language Processing, 2017, pages 379–386 (cited on page 466).
[649] Xuan Zhang, Pamela Shapiro, Gaurav Kumar, Paul McNamee, Marine Carpuat,
and Kevin Duh. “Curriculum Learning for Domain Adaptation in Neural Machine
Translation”. In: Annual Conference of the North American Chapter of the Associ-
ation for Computational Linguistics, 2019, pages 1903–1915 (cited on pages 466,
579).
[650] Xuan Zhang, Gaurav Kumar, Huda Khayrallah, Kenton Murray, Jeremy Gwinnup,
Marianna J Martindale, Paul McNamee, Kevin Duh, and Marine Carpuat. “An em-
pirical exploration of curriculum learning for neural machine translation”. In: arXiv
preprint arXiv:1811.00739, 2018 (cited on pages 466, 469).
[651] Yikai Zhou, Baosong Yang, Derek Wong, Yu Wan, and Lidia S. Chao. “Uncertainty-
Aware Curriculum Learning for Neural Machine Translation”. In: Annual Meeting
of the Association for Computational Linguistics, 2020, pages 6934–6944 (cited
on page 466).
[652] Zhizhong Li and Derek Hoiem. “Learning without Forgetting”. In: volume 40. 12.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, pages 2935–
2947 (cited on page 467).
[653] Amal Rannen Triki, Rahaf Aljundi, Matthew Blaschko, and Tinne Tuytelaars. “En-
coder Based Lifelong Learning”. In: IEEE International Conference on Computer
Vision, 2017, pages 1329–1337 (cited on page 467).
[654] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H.
Lampert. “iCaRL: Incremental Classifier and Representation Learning”. In: IEEE
Conference on Computer Vision and Pattern Recognition, 2017, pages 5533–5542
(cited on page 468).
[655] Francisco Castro, Manuel Marín-Jiménez, Nicolás Guil, Cordelia Schmid, and Kar-
teek Alahari. “End-to-End Incremental Learning”. In: volume 11216. Lecture Notes
in Computer Science. European Conference on Computer Vision, 2018, pages 241–
257 (cited on page 468).
[656] Andrei Rusu, Neil Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirk-
patrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. “Progressive neu-
ral networks”. In: arXiv preprint arXiv:1606.04671, 2016 (cited on page 468).
[657] Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha,
Andrei Rusu, Alexander Pritzel, and Daan Wierstra. “PathNet: Evolution Channels
Gradient Descent in Super Neural Networks”. In: volume abs/1701.08734. CoRR,
2017 (cited on page 468).
[658] Paul Michel and Graham Neubig. “MTNT: A Testbed for Machine Translation of
Noisy Text”. In: Conference on Empirical Methods in Natural Language Process-
ing, 2018, pages 543–553 (cited on page 469).
[659] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. “Delving into Transferable
Adversarial Examples and Black-box Attacks”. In: International Conference on
Learning Representations, 2017 (cited on page 469).
[660] Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. “Adversarial Examples: Attacks
and Defenses for Deep Learning”. In: volume 30. 9. IEEE Transactions on Neural
Networks and Learning Systems, 2019, pages 2805–2824 (cited on page 469).
[661] Xiaoyong Yuan, Pan He, Xiaolin Li, and Dapeng Wu. “Adaptive Adversarial At-
tack on Scene Text Recognition”. In: IEEE Conference on Computer Communica-
tions, 2020, pages 358–363 (cited on page 469).
[662] Stéphane Ross, Geoffrey Gordon, and Drew Bagnell. “A Reduction of Imitation
Learning and Structured Prediction to No-Regret Online Learning”. In: volume 15.
JMLR Proceedings. JMLR.org, 2011, pages 627–635 (cited on page 469).
[663] Arun Venkatraman, Martial Hebert, and J. Andrew Bagnell. “Improving Multi-Step
Prediction of Learned Time Series Models”. In: AAAI Conference on Artificial
Intelligence, 2015, pages 3024–3030 (cited on page 469).
[664] Khanh Nguyen, Hal Daumé III, and Jordan Boyd-Graber. “Reinforcement Learn-
ing for Bandit Neural Machine Translation with Simulated Human Feedback”. In:
Empirical Methods in Natural Language Processing, 2017, pages 1464–1474 (cited
on pages 469, 614).
[665] Rico Sennrich, Barry Haddow, and Alexandra Birch. “Improving Neural Machine
Translation Models with Monolingual Data”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2016 (cited on pages 469, 546, 547).
[666] Lijun Wu, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. “A Study of Re-
inforcement Learning for Neural Machine Translation”. In: Annual Meeting of
the Association for Computational Linguistics, 2018, pages 3612–3621 (cited on
page 469).
[667] Ajay Surendranath and Dinesh Babu Jayagopi. “Curriculum Learning for Depth
Estimation with Deep Convolutional Neural Networks”. In: Mediterranean Confer-
ence on Pattern Recognition and Artificial Intelligence, 2018, pages 95–100 (cited
on page 469).
[668] Haw-Shiuan Chang, Erik G. Learned-Miller, and Andrew McCallum. “Active Bias:
Training More Accurate Neural Networks by Emphasizing High Variance Sam-
ples”. In: Annual Conference on Neural Information Processing Systems, 2017,
pages 1002–1012 (cited on page 469).
[669] Felix Stahlberg, Eva Hasler, Danielle Saunders, and Bill Byrne. “SGNMT - A Flex-
ible NMT Decoding Platform for Quick Prototyping of New Models and Search
Strategies”. In: Conference on Empirical Methods in Natural Language Processing,
2017, pages 25–30 (cited on pages 472, 477).
[670] Felix Stahlberg and Bill Byrne. “On NMT Search Errors and Model Errors: Cat
Got Your Tongue?” In: Conference on Empirical Methods in Natural Language
Processing, 2019, pages 3354–3360 (cited on pages 473, 479).
[671] Rico Sennrich, Barry Haddow, and Alexandra Birch. “Edinburgh Neural Machine
Translation Systems for WMT 16”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2016, pages 371–376 (cited on pages 474, 497).
[672] Lemao Liu, Masao Utiyama, Andrew M. Finch, and Eiichiro Sumita. “Agreement
on Target-bidirectional Neural Machine Translation”. In: Annual Conference of
the North American Chapter of the Association for Computational Linguistics,
2016, pages 411–416 (cited on page 474).
[673] Bei Li, Yinqiao Li, Chen Xu, Ye Lin, Jiqiang Liu, Hui Liu, Ziyang Wang, Yuhao
Zhang, Nuo Xu, Zeyang Wang, Kai Feng, Hexuan Chen, Tengbo Liu, Yanyang
Li, Qiang Wang, Tong Xiao, and Jingbo Zhu. “The NiuTrans Machine Translation
Systems for WMT19”. In: Annual Meeting of the Association for Computational
Linguistics, 2019, pages 257–266 (cited on pages 474, 497).
[674] Felix Stahlberg, Adrià de Gispert, and Bill Byrne. “The University of Cambridge’s
Machine Translation Systems for WMT18”. In: Annual Meeting of the Association
for Computational Linguistics, 2018, pages 504–512 (cited on page 474).
[675] Xiangwen Zhang, Jinsong Su, Yue Qin, Yang Liu, Rongrong Ji, and Hongji Wang.
“Asynchronous Bidirectional Decoding for Neural Machine Translation”. In: AAAI
Conference on Artificial Intelligence, 2018, pages 5698–5705 (cited on page 474).
[676] Long Zhou, Jiajun Zhang, and Chengqing Zong. “Synchronous Bidirectional Neu-
ral Machine Translation”. In: volume 7. Transactions of the Association for Com-
putational Linguistics, 2019, pages 91–105 (cited on page 474).
[677] Aodong Li, Shiyue Zhang, Dong Wang, and Thomas Fang Zheng. “Enhanced neu-
ral machine translation by learning from draft”. In: IEEE Asia-Pacific Services
Computing Conference, 2017, pages 1583–1587 (cited on page 475).
[678] Ayah ElMaghraby and Ahmed Rafea. “Enhancing Translation from English to Ara-
bic Using Two-Phase Decoder Translation”. In: Intelligent Systems and Applica-
tions, 2018, pages 539–549 (cited on page 475).
[679] Xinwei Geng, Xiaocheng Feng, Bing Qin, and Ting Liu. “Adaptive Multi-pass
Decoder for Neural Machine Translation”. In: Conference on Empirical Methods
in Natural Language Processing, 2018, pages 523–532 (cited on page 475).
[680] Jason Lee, Elman Mansimov, and Kyunghyun Cho. “Deterministic Non-Autoregressive
Neural Sequence Modeling by Iterative Refinement”. In: Conference on Empiri-
cal Methods in Natural Language Processing, 2018, pages 1173–1182 (cited on
pages 475, 489, 493).
[681] Jiatao Gu, Changhan Wang, and Jake Zhao. “Levenshtein Transformer”. In: Annual
Conference on Neural Information Processing Systems, 2019, pages 11179–11189
(cited on page 475).
[682] Junliang Guo, Linli Xu, and Enhong Chen. “Jointly Masked Sequence-to-Sequence
Model for Non-Autoregressive Neural Machine Translation”. In: Annual Meeting
of the Association for Computational Linguistics, 2020, pages 376–385 (cited on
page 475).
[683] Shikib Mehri and Leonid Sigal. “Middle-Out Decoding”. In: Conference on Neural
Information Processing Systems, 2018, pages 5523–5534 (cited on page 475).
[684] Felix Stahlberg, Danielle Saunders, and Bill Byrne. “An Operation Sequence Model
for Explainable Neural Machine Translation”. In: Conference on Empirical Meth-
ods in Natural Language Processing, 2018, pages 175–186 (cited on page 475).
[685] Mitchell Stern, William Chan, Jamie Kiros, and Jakob Uszkoreit. “Insertion Trans-
former: Flexible Sequence Generation via Insertion Operations”. In: International
Conference on Machine Learning, 2019, pages 5976–5985 (cited on page 475).
[686] Robert Östling and Jörg Tiedemann. “Neural machine translation for low-resource
languages”. In: volume abs/1708.05729. CoRR, 2017 (cited on page 475).
[687] Yuta Kikuchi, Graham Neubig, Ryohei Sasano, Hiroya Takamura, and Manabu
Okumura. “Controlling Output Length in Neural Encoder-Decoders”. In: Confer-
ence on Empirical Methods in Natural Language Processing, 2016, pages 1328–
1338 (cited on page 475).
[688] Sho Takase and Naoaki Okazaki. “Positional Encoding to Control Output Sequence
Length”. In: Annual Conference of the North American Chapter of the Association
for Computational Linguistics, 2019, pages 3999–4004 (cited on page 475).
[689] Kenton Murray and David Chiang. “Correcting Length Bias in Neural Machine
Translation”. In: Annual Meeting of the Association for Computational Linguistics,
2018, pages 212–223 (cited on pages 475, 479).
[690] Pavel Sountsov and Sunita Sarawagi. “Length bias in Encoder Decoder Models and
a Case for Global Conditioning”. In: Conference on Empirical Methods in Natural
Language Processing, 2016, pages 1516–1525 (cited on pages 475, 479).
[691] Sébastien Jean, Orhan Firat, Kyunghyun Cho, Roland Memisevic, and Yoshua Ben-
gio. “Montreal Neural Machine Translation Systems for WMT’15”. In: Confer-
ence on Empirical Methods in Natural Language Processing, 2015, pages 134–140
(cited on page 476).
[692] Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander Rush.
“OpenNMT: Open-Source Toolkit for Neural Machine Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2017, pages 67–72 (cited
on pages 476, 477, 515, 633).
[693] Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, and Kenny Q. Zhu. “Im-
proving Attention Modeling with Implicit Distortion and Fertility for Machine
Translation”. In: International Conference on Computational Linguistics, 2016,
pages 3082–3092 (cited on page 476).
[694] Jing Yang, Biao Zhang, Yue Qin, Xiangwen Zhang, Qian Lin, and Jinsong Su.
“Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT”. In: CCF
International Conference on Natural Language Processing and Chinese Comput-
ing, 2018, pages 291–302 (cited on page 476).
[695] Haitao Mi, Baskaran Sankaran, Zhiguo Wang, and Abe Ittycheriah. “Coverage Em-
bedding Models for Neural Machine Translation”. In: Conference on Empirical
Methods in Natural Language Processing, 2016, pages 955–960 (cited on page 477).
[696] M. Kazimi and Marta R. Costa-jussà. “Coverage for Character Based Neural Ma-
chine Translation”. In: volume 59. arXiv preprint arXiv:1810.02340, 2017, pages 99–
106 (cited on page 477).
[697] Sam Wiseman and Alexander M. Rush. “Sequence-to-Sequence Learning as Beam-
Search Optimization”. In: Conference on Empirical Methods in Natural Language
Processing, 2016, pages 1296–1306 (cited on page 477).
[698] Mingbo Ma, Renjie Zheng, and Liang Huang. “Learning to Stop in Structured
Prediction for Neural Machine Translation”. In: Annual Conference of the North
American Chapter of the Association for Computational Linguistics, 2019, pages 1884–
1889 (cited on page 477).
[699] Jason Eisner and Hal Daumé III. “Learning Speed-Accuracy Tradeoffs in Nondeterminis-
tic Inference Algorithms”. In: Annual Conference on Neural Information Process-
ing Systems, 2011 (cited on page 477).
[700] Jiarong Jiang, Adam R. Teichert, Hal Daumé, and Jason Eisner. “Learned Priori-
tization for Trading Off Accuracy and Speed”. In: Annual Conference on Neural
Information Processing Systems, 2012, pages 1340–1348 (cited on page 477).
[701] Renjie Zheng, Mingbo Ma, Baigong Zheng, Kaibo Liu, and Liang Huang. “Oppor-
tunistic Decoding with Timely Correction for Simultaneous Translation”. In: An-
nual Meeting of the Association for Computational Linguistics, 2020, pages 437–
442 (cited on page 477).
[702] Mingbo Ma, Liang Huang, Hao Xiong, Renjie Zheng, Kaibo Liu, Baigong Zheng,
Chuanqiang Zhang, Zhongjun He, Hairong Liu, Xing Li, Hua Wu, and Haifeng
Wang. “STACL: Simultaneous Translation with Implicit Anticipation and Control-
lable Latency using Prefix-to-Prefix Framework”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2019, pages 3025–3036 (cited on page 477).
[703] Álvaro Peris, Miguel Domingo, and Francisco Casacuberta. “Interactive neural machine
translation”. In: volume 45. Computer Speech and Language, 2017, pages 201–
220 (cited on pages 478, 614).
[704] Kevin Gimpel, Dhruv Batra, Chris Dyer, and Gregory Shakhnarovich. “A System-
atic Exploration of Diversity in Machine Translation”. In: Conference on Empir-
ical Methods in Natural Language Processing, 2013, pages 1100–1111 (cited on
page 478).
[705] Jiwei Li and Dan Jurafsky. “Mutual Information and Diverse Decoding Improve
Neural Machine Translation”. In: volume abs/1601.00372. CoRR, 2016 (cited on
page 478).
[706] Nan Duan, Mu Li, Tong Xiao, and Ming Zhou. “The Feature Subspace Method
for SMT System Combination”. In: Conference on Empirical Methods in Natural
Language Processing, 2009, pages 1096–1104 (cited on pages 478, 495).
[707] Tong Xiao, Jingbo Zhu, Muhua Zhu, and Huizhen Wang. “Boosting-Based System
Combination for Machine Translation”. In: Annual Meeting of the Association for
Computational Linguistics, 2010, pages 739–748 (cited on pages 478, 495).
[708] Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. “A Diversity-
Promoting Objective Function for Neural Conversation Models”. In: Annual Con-
ference of the North American Chapter of the Association for Computational Lin-
guistics, 2016, pages 110–119 (cited on page 478).
[709] Xuanli He, Gholamreza Haffari, and Mohammad Norouzi. “Sequence to Sequence
Mixture Model for Diverse Machine Translation”. In: International Conference on
Computational Linguistics, 2018, pages 583–592 (cited on page 478).
[710] Tianxiao Shen, Myle Ott, Michael Auli, and Marc’Aurelio Ranzato. “Mixture Mod-
els for Diverse Machine Translation: Tricks of the Trade”. In: International Con-
ference on Machine Learning, 2019, pages 5719–5728 (cited on page 478).
[711] Xuanfu Wu, Yang Feng, and Chenze Shao. “Generating Diverse Translation from
Model Distribution with Dropout”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2020, pages 1088–1097 (cited on page 478).
[712] Zewei Sun, Shujian Huang, Hao Ran Wei, Xin Yu Dai, and Jiajun Chen. “Gener-
ating Diverse Translation by Manipulating Multi-Head Attention”. In: AAAI Con-
ference on Artificial Intelligence, 2020, pages 8976–8983 (cited on page 478).
[713] Ashwin K. Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun,
Stefan Lee, David J. Crandall, and Dhruv Batra. “Diverse Beam Search: Decoding
Diverse Solutions from Neural Sequence Models”. In: volume abs/1610.02424.
CoRR, 2016 (cited on page 478).
[714] Tong Xiao, Derek F. Wong, and Jingbo Zhu. “A Loss-Augmented Approach to
Training Syntactic Machine Translation Systems”. In: volume 24. IEEE/ACM Trans-
actions on Audio, Speech, and Language Processing, 2016, pages 2069–2083 (cited
on page 478).
[715] Lemao Liu and Liang Huang. “Search-Aware Tuning for Machine Translation”. In:
Conference on Empirical Methods in Natural Language Processing, 2014, pages 1942–
1952 (cited on page 478).
[716] Heng Yu, Liang Huang, Haitao Mi, and Kai Zhao. “Max-Violation Perceptron and
Forced Decoding for Scalable MT Training”. In: Conference on Empirical Methods
in Natural Language Processing, 2013, pages 1112–1123 (cited on page 478).
[717] Jan Niehues, Eunah Cho, Thanh-Le Ha, and Alex Waibel. “Analyzing Neural MT
Search and Model Performance”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2017, pages 11–17 (cited on page 479).
[718] Philipp Koehn and Rebecca Knowles. “Six Challenges for Neural Machine Transla-
tion”. In: Annual Meeting of the Association for Computational Linguistics, 2017,
pages 28–39 (cited on page 479).
[719] Wen Zhang, Yang Feng, Fandong Meng, Di You, and Qun Liu. “Bridging the
Gap between Training and Inference for Neural Machine Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2019, pages 4334–4343
(cited on page 479).
[720] Jianhua Lin. “Divergence measures based on the Shannon entropy”. In: volume 37.
1. IEEE Transactions on Information Theory, 1991, pages 145–151 (cited on page 481).
[721] Raj Dabre and Atsushi Fujita. “Recurrent Stacking of Layers for Compact Neural
Machine Translation Models”. In: AAAI Conference on Artificial Intelligence,
2019, pages 6292–6299 (cited on page 482).
[722] Sharan Narang, Eric Undersander, and Gregory Diamos. “Block-Sparse Recurrent
Neural Networks”. In: volume abs/1711.02782. CoRR, 2017 (cited on page 482).
[723] Trevor Gale, Erich Elsen, and Sara Hooker. “The State of Sparsity in Deep Neural
Networks”. In: volume abs/1902.09574. CoRR, 2019 (cited on page 482).
[724] Paul Michel, Omer Levy, and Graham Neubig. “Are Sixteen Heads Really Better
than One?” In: Annual Conference on Neural Information Processing Systems,
2019, pages 14014–14024 (cited on pages 482, 499, 544).
[725] Raden Mu’az Mun’im, Nakamasa Inoue, and Koichi Shinoda. “Sequence-level
Knowledge Distillation for Model Compression of Attention-based Sequence-to-
sequence Speech Recognition”. In: IEEE International Conference on Acoustics,
Speech and Signal Processing, 2019, pages 6151–6155 (cited on pages 482, 499).
[726] Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret.
“Transformers are RNNs: Fast Autoregressive Transformers with Linear Atten-
tion”. In: volume abs/2006.16236. International Conference on Machine Learning,
2020 (cited on pages 483, 513).
[727] Sinong Wang, Belinda Li, Madian Khabsa, Han Fang, and Hao Ma. “Linformer:
Self-Attention with Linear Complexity”. In: volume abs/2006.04768. CoRR, 2020
(cited on pages 483, 499, 544).
[728] Weijie Liu, Peng Zhou, Zhiruo Wang, Zhe Zhao, Haotang Deng, and Qi Ju. “Fast-
BERT: a Self-distilling BERT with Adaptive Inference Time”. In: Annual Meeting
of the Association for Computational Linguistics, 2020, pages 6035–6044 (cited
on page 483).
[729] Maha Elbayad, Jiatao Gu, Edouard Grave, and Michael Auli. “Depth-Adaptive
Transformer”. In: International Conference on Learning Representations, 2020 (cited
on page 483).
[730] Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, An-
drew G. Howard, Hartwig Adam, and Dmitry Kalenichenko. “Quantization and
Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference”. In:
IEEE Conference on Computer Vision and Pattern Recognition, 2018, pages 2704–
2713 (cited on page 485).
[731] Gabriele Prato, Ella Charlaix, and Mehdi Rezagholizadeh. “Fully Quantized Trans-
former for Improved Translation”. In: volume abs/1910.10485. CoRR, 2019 (cited
on page 485).
[732] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Ben-
gio. “Binarized Neural Networks”. In: Annual Conference on Neural Information
Processing Systems, 2016, pages 4107–4115 (cited on page 485).
[733] Chunting Zhou, Graham Neubig, and Jiatao Gu. “Understanding Knowledge Dis-
tillation in Non-autoregressive Machine Translation”. In: volume abs/1911.02727.
ArXiv, 2020 (cited on page 489).
[734] Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, and Tie-Yan Liu. “Fine-
Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Trans-
lation”. In: AAAI Conference on Artificial Intelligence, 2020, pages 7839–7846
(cited on pages 489, 490).
[735] Bingzhen Wei, Mingxuan Wang, Hao Zhou, Junyang Lin, and Xu Sun. “Imita-
tion Learning for Non-Autoregressive Neural Machine Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2019, pages 1304–1312
(cited on page 490).
[736] Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, and Tie-Yan Liu. “Non-Autoregressive
Neural Machine Translation with Enhanced Decoder Input”. In: AAAI Conference
on Artificial Intelligence, 2019, pages 3723–3730 (cited on pages 490, 500).
[737] Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu. “Non-
Autoregressive Machine Translation with Auxiliary Regularization”. In: AAAI Con-
ference on Artificial Intelligence, 2019, pages 5377–5384 (cited on page 490).
[738] Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, and Eduard H. Hovy. “FlowSeq:
Non-Autoregressive Conditional Sequence Generation with Generative Flow”. In:
Conference on Empirical Methods in Natural Language Processing, 2019, pages 4281–
4291 (cited on pages 490, 500).
[739] Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob
Uszkoreit, and Noam Shazeer. “Fast Decoding in Sequence Models using Dis-
crete Latent Variables”. In: International Conference on Machine Learning, 2018,
pages 2395–2404 (cited on page 490).
[740] Qiu Ran, Yankai Lin, Peng Li, and Jie Zhou. “Learning to Recover from Multi-
Modality Errors for Non-Autoregressive Neural Machine Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2020, pages 3059–3069
(cited on page 490).
[741] Lifu Tu, Richard Yuanzhe Pang, Sam Wiseman, and Kevin Gimpel. “ENGINE:
Energy-Based Inference Networks for Non-Autoregressive Machine Translation”.
In: Annual Meeting of the Association for Computational Linguistics, 2020, pages 2819–
2826 (cited on page 490).
[742] Raphael Shu, Jason Lee, Hideki Nakayama, and Kyunghyun Cho. “Latent-Variable
Non-Autoregressive Neural Machine Translation with Deterministic Inference us-
ing a Delta Posterior”. In: AAAI Conference on Artificial Intelligence, 2020, pages 8846–
8853 (cited on page 490).
[743] Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Liwei Wang, and Tie-Yan Liu. “Hint-
Based Training for Non-Autoregressive Machine Translation”. In: Conference on
Empirical Methods in Natural Language Processing, 2019, pages 5707–5712 (cited
on page 490).
[744] Nader Akoury, Kalpesh Krishna, and Mohit Iyyer. “Syntactically Supervised Trans-
formers for Faster Neural Machine Translation”. In: Annual Meeting of the Associ-
ation for Computational Linguistics, 2019, pages 1269–1281 (cited on page 491).
[745] Chunqi Wang, Ji Zhang, and Haiqing Chen. “Semi-Autoregressive Neural Machine
Translation”. In: Conference on Empirical Methods in Natural Language Process-
ing, 2018, pages 479–488 (cited on pages 491, 492).
[746] Qiu Ran, Yankai Lin, Peng Li, and Jie Zhou. “Guiding Non-Autoregressive Neural
Machine Translation Decoding with Reordering Information”. In: volume abs/1911.02215.
CoRR, 2019 (cited on pages 491, 500).
[747] Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. “Mask-
Predict: Parallel Decoding of Conditional Masked Language Models”. In: Confer-
ence on Empirical Methods in Natural Language Processing, 2019, pages 6111–
6120 (cited on page 493).
[748] Jungo Kasai, James Cross, Marjan Ghazvininejad, and Jiatao Gu. “Non-Autoregressive
Machine Translation with Disentangled Context Transformer”. In: arXiv: Compu-
tation and Language, 2020 (cited on page 493).
[749] Yoav Freund and Robert E. Schapire. “A Decision-Theoretic Generalization of On-
Line Learning and an Application to Boosting”. In: volume 55. 1. Journal of Com-
puter and System Sciences, 1997, pages 119–139 (cited on page 495).
[750] Khe Chai Sim, William J. Byrne, Mark J. F. Gales, Hichem Sahbi, and Philip
C. Woodland. “Consensus Network Decoding for Statistical Machine Translation
System Combination”. In: Proceedings of the IEEE International Conference on
Acoustics, Speech, and Signal Processing, 2007, pages 105–108 (cited on page 495).
[751] Antti-Veikko I. Rosti, Spyridon Matsoukas, and Richard M. Schwartz. “Improved
Word-Level System Combination for Machine Translation”. In: Annual Meeting
of the Association for Computational Linguistics, 2007 (cited on page 495).
[752] Antti-Veikko I. Rosti, Bing Zhang, Spyros Matsoukas, and Richard M. Schwartz.
“Incremental Hypothesis Alignment for Building Confusion Networks with Ap-
plication to Machine Translation System Combination”. In: Proceedings of the
Third Workshop on Statistical Machine Translation, 2008, pages 183–186 (cited
on page 495).
[753] Jiwei Li, Will Monroe, and Dan Jurafsky. “A Simple, Fast Diverse Decoding Al-
gorithm for Neural Generation”. In: volume abs/1611.08562. CoRR, 2016 (cited
on page 495).
[754] Mingxuan Wang, Li Gong, Wenhuan Zhu, Jun Xie, and Chao Bian. “Tencent Neu-
ral Machine Translation Systems for WMT18”. In: Annual Meeting of the Associ-
ation for Computational Linguistics, 2018, pages 522–527 (cited on page 497).
[755] Yuhao Zhang, Ziyang Wang, Runzhe Cao, Binghao Wei, Weiqiao Shan, Shuhan
Zhou, Abudurexiti Reheman, Tao Zhou, Xin Zeng, Laohu Wang, Yongyu Mu, Jing-
nan Zhang, Xiaoqian Liu, Xuanjun Zhou, Yinqiao Li, Bei Li, Tong Xiao, and Jingbo
Zhu. “The NiuTrans Machine Translation Systems for WMT20”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2020, pages 336–343
(cited on page 497).
[756] Roy Tromble, Shankar Kumar, Franz Josef Och, and Wolfgang Macherey. “Lattice
Minimum Bayes-Risk Decoding for Statistical Machine Translation”. In: Confer-
ence on Empirical Methods in Natural Language Processing, 2008, pages 620–629
(cited on page 497).
[757] Jinsong Su, Zhixing Tan, Deyi Xiong, Rongrong Ji, Xiaodong Shi, and Yang Liu.
“Lattice-Based Recurrent Neural Network Encoders for Neural Machine Trans-
lation”. In: AAAI Conference on Artificial Intelligence, 2017, pages 3302–3308
(cited on page 497).
[758] Leonhard Held and Daniel Sabanés Bové. “Applied statistical inference”. In: Springer,
2014, page 16 (cited on page 499).
[759] S. D. Silvey. “Statistical Inference”. In: Encyclopedia of Social Network Analysis
and Mining, 2018 (cited on page 499).
[760] Matthew J. Beal. “Variational algorithms for approximate Bayesian inference”. In:
University College London, 2003 (cited on page 499).
[761] Zhifei Li, Jason Eisner, and Sanjeev Khudanpur. “Variational Decoding for Statis-
tical Machine Translation”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2009, pages 593–601 (cited on page 499).
[762] Jasmijn Bastings, Wilker Aziz, Ivan Titov, and Khalil Sima’an. “Modeling Latent
Sentence Structure in Neural Machine Translation”. In: volume abs/1901.06436.
CoRR, 2019 (cited on page 499).
[763] Harshil Shah and David Barber. “Generative Neural Machine Translation”. In: An-
nual Conference on Neural Information Processing Systems, 2018, pages 1353–
1362 (cited on page 499).
[764] Jinsong Su, Shan Wu, Deyi Xiong, Yaojie Lu, Xianpei Han, and Biao Zhang. “Vari-
ational Recurrent Neural Machine Translation”. In: AAAI Conference on Artificial
Intelligence, 2018, pages 5488–5495 (cited on pages 499, 544).
[765] Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, and Min Zhang. “Variational
Neural Machine Translation”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2016, pages 521–530 (cited on page 499).
[766] Angela Fan, Edouard Grave, and Armand Joulin. “Reducing Transformer Depth
on Demand with Structured Dropout”. In: International Conference on Learning
Representations, 2020 (cited on page 499).
[767] Qiang Wang, Tong Xiao, and Jingbo Zhu. “Training Flexible Depth Model by
Multi-Task Learning for Neural Machine Translation”. In: Conference on Empir-
ical Methods in Natural Language Processing, 2020, pages 4307–4312 (cited on
pages 499, 617).
[768] Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, and Ming Zhou. “BERT-of-
Theseus: Compressing BERT by Progressive Module Replacing”. In: Conference
on Empirical Methods in Natural Language Processing, 2020 (cited on page 499).
[769] Alexei Baevski and Michael Auli. “Adaptive Input Representations for Neural Lan-
guage Modeling”. In: volume abs/1809.10853. CoRR, 2019 (cited on page 499).
[770] Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, and Hannaneh Ha-
jishirzi. “DeFINE: DEep Factorized INput Word Embeddings for Neural Sequence
Modeling”. In: volume abs/1911.12385. CoRR, 2019 (cited on page 499).
[771] Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Dawei Song,
and Ming Zhou. “A Tensorized Transformer for Language Modeling”. In: vol-
ume abs/1906.09777. CoRR, 2019 (cited on page 499).
[772] Zhilin Yang, Thang Luong, Ruslan Salakhutdinov, and Quoc V. Le. “Mixtape:
Breaking the Softmax Bottleneck Efficiently”. In: Conference on Neural Informa-
tion Processing Systems, 2019, pages 15922–15930 (cited on page 499).
[773] Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, and Noah Smith. “Deep
Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine
Translation”. In: volume abs/2006.10369. CoRR, 2020 (cited on pages 499, 516).
[774] Chi Hu, Bei Li, Yinqiao Li, Ye Lin, Yanyang Li, Chenglong Wang, Tong Xiao, and
Jingbo Zhu. “The NiuTrans System for WNGT 2020 Efficiency Task”. In: Annual
Meeting of the Association for Computational Linguistics, 2020, pages 204–210
(cited on pages 499, 516).
[775] Yi-Te Hsu, Sarthak Garg, Yi-Hsiu Liao, and Ilya Chatsviorkin. “Efficient Inference
For Neural Machine Translation”. In: volume abs/2010.02416. CoRR, 2020 (cited
on page 499).
[776] Song Han, Jeff Pool, John Tran, and William J. Dally. “Learning both Weights
and Connections for Efficient Neural Network”. In: Annual Conference on Neural
Information Processing Systems, 2015, pages 1135–1143 (cited on page 499).
[777] Namhoon Lee, Thalaiyasingam Ajanthan, and Philip H. S. Torr. “Snip: single-Shot
Network Pruning based on Connection sensitivity”. In: International Conference on
Learning Representations, 2019 (cited on page 499).
[778] Jonathan Frankle and Michael Carbin. “The Lottery Ticket Hypothesis: Finding
Sparse, Trainable Neural Networks”. In: International Conference on Learning
Representations, 2019 (cited on page 499).
[779] Christopher Brix, Parnia Bahar, and Hermann Ney. “Successfully Applying the
Stabilized Lottery Ticket Hypothesis to the Transformer Architecture”. In: Annual
Meeting of the Association for Computational Linguistics, 2020, pages 3909–3915
(cited on page 499).
[780] Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Chang-
shui Zhang. “Learning Efficient Convolutional Networks through Network Slim-
ming”. In: IEEE International Conference on Computer Vision, 2017, pages 2755–
2763 (cited on page 499).
[781] Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. “Re-
thinking the Value of Network Pruning”. In: volume abs/1810.05270. CoRR, 2019
(cited on page 499).
[782] Robin Cheong and Robel Daniel. “transformers.zip: Compressing Transformers
with Pruning and Quantization”. In: Stanford University, 2019 (cited on page 499).
[783] Ron Banner, Itay Hubara, Elad Hoffer, and Daniel Soudry. “Scalable Methods for
8-bit Training of Neural Networks”. In: Conference on Neural Information Process-
ing Systems, 2018, pages 5151–5159 (cited on page 499).
[784] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Ben-
gio. “Quantized Neural Networks: Training Neural Networks with Low Precision
Weights and Activations”. In: volume 18. Journal of Machine Learning Research,
2017, 187:1–187:30 (cited on page 499).
[785] Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, and Jimmy Lin.
“Distilling Task-Specific Knowledge from BERT into Simple Neural Networks”.
In: volume abs/1903.12136. CoRR, 2019 (cited on page 499).
[786] Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, and Omer Levy.
“Aligned Cross Entropy for Non-Autoregressive Machine Translation”. In: vol-
ume abs/2004.01655. CoRR, 2020 (cited on page 500).
[787] Chenze Shao, Jinchao Zhang, Yang Feng, Fandong Meng, and Jie Zhou. “Min-
imizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine
Translation”. In: AAAI Conference on Artificial Intelligence, 2020, pages 198–
205 (cited on page 500).
[788] Peter Battaglia, Jessica Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinícius
Flores Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam
Santoro, Ryan Faulkner, Çaglar Gülçehre, H. Francis Song, Andrew Ballard, Justin
Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston,
Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matthew Botvinick,
Oriol Vinyals, Yujia Li, and Razvan Pascanu. “Relational inductive biases, deep
learning, and graph networks”. In: volume abs/1806.01261. CoRR, 2018 (cited on
page 502).
[789] Zhiheng Huang, Davis Liang, Peng Xu, and Bing Xiang. “Improve Transformer
Models with Better Relative Position Embeddings”. In: Conference on Empiri-
cal Methods in Natural Language Processing, 2020, pages 3327–3335 (cited on
page 504).
[790] Xing Wang, Zhaopeng Tu, Longyue Wang, and Shuming Shi. “Self-Attention with
Structural Position Representations”. In: Conference on Empirical Methods in Nat-
ural Language Processing, 2019, pages 1403–1409 (cited on page 505).
[791] Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. “Neural
Ordinary Differential Equations”. In: Annual Conference on Neural Information
Processing Systems, 2018, pages 6572–6583 (cited on page 506).
[792] Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, and Zheng Zhang. “Multi-
Scale Self-Attention for Text Classification”. In: AAAI Conference on Artificial
Intelligence, 2020, pages 7847–7854 (cited on page 507).
[793] Kawin Ethayarajh. “How Contextual are Contextualized Word Representations?
Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings”. In: Confer-
ence on Empirical Methods in Natural Language Processing, 2019, pages 55–65
(cited on page 508).
[794] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. “Aggre-
gated Residual Transformations for Deep Neural Networks”. In: IEEE Conference
on Computer Vision and Pattern Recognition, 2017, pages 5987–5995 (cited on
page 508).
[795] David So, Quoc Le, and Chen Liang. “The Evolved Transformer”. In: volume 97.
International Conference on Machine Learning, 2019, pages 5877–5886 (cited on
pages 508, 537, 539, 542).
[796] Jianhao Yan, Fandong Meng, and Jie Zhou. “Multi-Unit Transformers for Neural
Machine Translation”. In: Conference on Empirical Methods in Natural Language
Processing, 2020, pages 1047–1059 (cited on pages 508, 509).
[797] Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, and
Tie-Yan Liu. “Multi-branch Attentive Transformer”. In: volume abs/2006.10270.
CoRR, 2020 (cited on pages 508, 509).
[798] Karim Ahmed, Nitish Shirish Keskar, and Richard Socher. “Weighted Transformer
Network for Machine Translation”. In: volume abs/1711.02132. CoRR, 2017 (cited
on page 509).
[799] , , , , , , , and .
”. In: volume 33. 3. , 2019 (cited on
page 509).
[800] Andreas Veit, Michael Wilber, and Serge Belongie. “Residual Networks Behave
Like Ensembles of Relatively Shallow Networks”. In: Annual Conference on Neu-
ral Information Processing Systems, 2016, pages 550–558 (cited on page 510).
[801] Klaus Greff, Rupesh Kumar Srivastava, and Jürgen Schmidhuber. “Highway and
Residual Networks learn Unrolled Iterative Estimation”. In: International Confer-
ence on Learning Representations, 2017 (cited on pages 510, 527).
[802] Bo Chang, Lili Meng, Eldad Haber, Frederick Tung, and David Begert. “Multi-
level Residual Networks from Dynamical Systems View”. In: International Con-
ference on Learning Representations, 2018 (cited on page 510).
[803] Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz
Kaiser. “Universal Transformers”. In: International Conference on Learning Rep-
resentations, 2019 (cited on page 510).
[804] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma,
and Radu Soricut. “ALBERT: A Lite BERT for Self-supervised Learning of Lan-
guage Representations”. In: International Conference on Learning Representations,
2020 (cited on page 510).
[805] Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, and Zhaopeng
Tu. “Modeling Recurrence for Transformer”. In: Annual Conference of the North
American Chapter of the Association for Computational Linguistics, 2019, pages 1198–
1207 (cited on page 511).
[806] Jiezhong Qiu, Hao Ma, Omer Levy, Wen-tau Yih, Sinong Wang, and Jie Tang.
“Blockwise Self-Attention for Long Document Understanding”. In: Conference
on Empirical Methods in Natural Language Processing, 2020, pages 2555–2565
(cited on page 511).
[807] Peter Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz
Kaiser, and Noam Shazeer. “Generating Wikipedia by Summarizing Long Sequences”.
In: International Conference on Learning Representations, 2018 (cited on pages 511,
512).
[808] Iz Beltagy, Matthew Peters, and Arman Cohan. “Longformer: The Long-Document
Transformer”. In: volume abs/2004.05150. CoRR, 2020 (cited on pages 511, 544).
[809] Aurko Roy, Mohammad Saffar, Ashish Vaswani, and David Grangier. “Efficient
Content-Based Sparse Attention with Routing Transformers”. In: volume abs/2003.05997.
CoRR, 2020 (cited on page 512).
[810] Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, An-
dreea Gane, Tamás Sarlós, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz
Kaiser, David Belanger, Lucy Colwell, and Adrian Weller. “Rethinking Attention
with Performers”. In: volume abs/2009.14794. CoRR, 2020 (cited on pages 513,
544).
[811] Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing,
Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tie-Yan Liu. “On Layer Normal-
ization in the Transformer Architecture”. In: volume abs/2002.04745. International
Conference on Machine Learning, 2020 (cited on page 514).
[812] Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, and Jiawei Han. “Under-
standing the Difficulty of Training Transformers”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2020, pages 5747–5763 (cited on pages 514,
523).
[813] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Identity Mappings in
Deep Residual Networks”. In: volume 9908. European Conference on Computer
Vision, 2016, pages 630–645 (cited on page 515).
[814] Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng,
David Grangier, and Michael Auli. “fairseq: A Fast, Extensible Toolkit for Se-
quence Modeling”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2019, pages 48–53 (cited on pages 515, 633).
[815] Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, and Jingbo
Zhu. “Learning Light-Weight Translation Models from Deep Transformer”. In: vol-
ume abs/2012.13866. CoRR, 2020 (cited on page 516).
[816] Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, and Weihua Luo.
“Multiscale Collaborative Deep Models for Neural Machine Translation”. In: An-
nual Meeting of the Association for Computational Linguistics, 2020, pages 414–
426 (cited on page 520).
[817] Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. “Training Very
Deep Networks”. In: Conference on Neural Information Processing Systems, 2015,
pages 2377–2385 (cited on page 520).
[818] David Balduzzi, Marcus Frean, Lennox Leary, JP Lewis, Kurt Wan-Duo Ma, and
Brian McWilliams. “The Shattered Gradients Problem: If resnets are the answer,
then what is the question?” In: volume 70. International Conference on Machine
Learning, 2017, pages 342–350 (cited on page 520).
[819] Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. “A Convergence Theory for Deep
Learning via Over-Parameterization”. In: volume 97. International Conference on
Machine Learning, 2019, pages 242–252 (cited on page 520).
[820] Simon Du, Jason Lee, Haochuan Li, Liwei Wang, and Xiyu Zhai. “Gradient De-
scent Finds Global Minima of Deep Neural Networks”. In: volume 97. Interna-
tional Conference on Machine Learning, 2019, pages 1675–1685 (cited on page 520).
[821] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Delving Deep into
Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”. In:
IEEE International Conference on Computer Vision, 2015, pages 1026–1034 (cited
on page 521).
[822] Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong, and Jingyi Zhang. “Lip-
schitz Constrained Parameter Initialization for Deep Transformers”. In: Annual
Meeting of the Association for Computational Linguistics, 2020, pages 397–402
(cited on page 522).
[823] Xiao Shi Huang, Juan Perez, Jimmy Ba, and Maksims Volkovs. “Improving Trans-
former Optimization Through Better Initialization”. In: International Conference
on Machine Learning, 2020 (cited on page 522).
[824] Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jianhuang Lai,
and Tie-Yan Liu. “Depth Growing for Neural Machine Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2019, pages 5558–5563
(cited on page 524).
[825] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Weinberger. “Densely
Connected Convolutional Networks”. In: IEEE Conference on Computer Vision
and Pattern Recognition, 2017, pages 2261–2269 (cited on page 524).
[826] Junhui Li, Deyi Xiong, Zhaopeng Tu, Muhua Zhu, Min Zhang, and Guodong Zhou.
“Modeling Source Syntax for Neural Machine Translation”. In: Annual Meeting
of the Association for Computational Linguistics, 2017, pages 688–697 (cited on
pages 529, 532, 534).
[827] Akiko Eriguchi, Kazuma Hashimoto, and Yoshimasa Tsuruoka. “Tree-to-Sequence
Attentional Neural Machine Translation”. In: Annual Meeting of the Association
for Computational Linguistics, 2016 (cited on pages 529, 530).
[828] Huadong Chen, Shujian Huang, David Chiang, and Jiajun Chen. “Improved Neu-
ral Machine Translation with a Syntax-Aware Encoder and Decoder”. In: Annual
Meeting of the Association for Computational Linguistics, 2017, pages 1936–1945
(cited on page 530).
[829] Rico Sennrich and Barry Haddow. “Linguistic Input Features Improve Neural Ma-
chine Translation”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2016, pages 83–91 (cited on page 531).
[830] Xing Shi, Inkit Padhi, and Kevin Knight. “Does String-Based Neural MT Learn
Source Syntax?” In: Annual Meeting of the Association for Computational Lin-
guistics, 2016, pages 1526–1534 (cited on pages 534, 536).
[831] Emanuele Bugliarello and Naoaki Okazaki. “Enhancing Machine Translation with
Dependency-Aware Self-Attention”. In: Annual Meeting of the Association for
Computational Linguistics, 2020, pages 1618–1627 (cited on page 534).
[832] David Alvarez-Melis and Tommi Jaakkola. “Tree-structured decoding with doubly-
recurrent neural networks”. In: International Conference on Learning Representa-
tions, 2017 (cited on page 534).
[833] Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah Smith. “Recurrent
Neural Network Grammars”. In: Annual Meeting of the Association for Computa-
tional Linguistics, 2016, pages 199–209 (cited on page 535).
[834] Minh-Thang Luong, Quoc Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser.
“Multi-task Sequence to Sequence Learning”. In: International Conference on Learn-
ing Representations, 2016 (cited on pages 535, 556).
[835] Shuangzhi Wu, Dongdong Zhang, Nan Yang, Mu Li, and Ming Zhou. “Sequence-
to-Dependency Neural Machine Translation”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2017, pages 698–707 (cited on page 535).
[836] Barret Zoph and Quoc Le. “Neural Architecture Search with Reinforcement Learn-
ing”. In: International Conference on Learning Representations, 2017 (cited on
pages 537, 539, 540).
[837] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc Le. “Learning Transfer-
able Architectures for Scalable Image Recognition”. In: IEEE Conference on Com-
puter Vision and Pattern Recognition, 2018, pages 8697–8710 (cited on pages 537, 541).
[838] Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc Le. “Aging Evolution
for Image Classifier Architecture Search”. In: AAAI Conference on Artificial In-
telligence, 2019 (cited on pages 537, 541).
[839] Geoffrey Miller, Peter Todd, and Shailesh Hegde. “Designing Neural Networks
using Genetic Algorithms”. In: International Conference on Genetic Algorithms,
1989, pages 379–384 (cited on pages 538, 540).
[840] John Koza and James Rice. “Genetic generation of both the weights and architec-
ture for a neural network”. In: volume 2. International Joint Conference on Neural
Networks, 1991, pages 397–404 (cited on page 538).
[841] Steven Harp, Tariq Samad, and Aloke Guha. “Designing Application-Specific Neu-
ral Networks Using the Genetic Algorithm”. In: Advances in Neural Information
Processing Systems, 1989, pages 447–454 (cited on page 538).
[842] Hiroaki Kitano. “Designing Neural Networks Using Genetic Algorithms with Graph
Generation System”. In: volume 4. 4. Complex Systems, 1990 (cited on page 538).
[843] Hanxiao Liu, Karen Simonyan, and Yiming Yang. “DARTS: Differentiable Archi-
tecture Search”. In: International Conference on Learning Representations, 2019
(cited on pages 538–540).
[844] Yinqiao Li, Chi Hu, Yuhao Zhang, Nuo Xu, Yufan Jiang, Tong Xiao, Jingbo Zhu,
Tongran Liu, and Changliang Li. “Learning Architectures from an Extended Search
Space for Language Modeling”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2020, pages 6629–6639 (cited on pages 538, 540, 541, 543).
[845] Yufan Jiang, Chi Hu, Tong Xiao, Chunliang Zhang, and Jingbo Zhu. “Improved Dif-
ferentiable Architecture Search for Language Modeling and Named Entity Recog-
nition”. In: Annual Meeting of the Association for Computational Linguistics, 2019,
pages 3583–3588 (cited on pages 538, 541).
[846] Hieu Pham, Melody Guan, Barret Zoph, Quoc Le, and Jeff Dean. “Efficient Neural
Architecture Search via Parameter Sharing”. In: volume 80. International Confer-
ence on Machine Learning, 2018, pages 4092–4101 (cited on page 539).
[847] Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding,
Hongbo Deng, Jun Huang, Wei Lin, and Jingren Zhou. “AdaBERT: Task-Adaptive
BERT Compression with Differentiable Neural Architecture Search”. In: Interna-
tional Joint Conference on Artificial Intelligence, 2020, pages 2463–2469 (cited
on pages 539, 543).
[848] Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, and
Song Han. “HAT: Hardware-Aware Transformers for Efficient Natural Language
Processing”. In: Annual Meeting of the Association for Computational Linguistics,
2020, pages 7675–7688 (cited on pages 539, 543).
[849] Esteban Real, Chen Liang, David So, and Quoc Le. “AutoML-Zero: Evolving Ma-
chine Learning Algorithms From Scratch”. In: volume abs/2003.03384. CoRR,
2020 (cited on page 540).
[850] Yang Fan, Fei Tian, Yingce Xia, Tao Qin, Xiang-Yang Li, and Tie-Yan Liu. “Search-
ing Better Architectures for Neural Machine Translation”. In: volume 28. IEEE
Transactions on Audio, Speech, and Language Processing, 2020, pages 1574–1585
(cited on pages 540, 542).
[851] Peter Angeline, Gregory Saunders, and Jordan Pollack. “An evolutionary algorithm
that constructs recurrent neural networks”. In: volume 5. 1. IEEE Transactions on
Neural Networks, 1994, pages 54–65 (cited on page 540).
[852] Kenneth Stanley and Risto Miikkulainen. “Evolving neural networks through aug-
menting topologies”. In: volume 10. 2. Evolutionary Computation, 2002, pages 99–
127 (cited on page 540).
[853] Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Sue-
matsu, Jie Tan, Quoc Le, and Alexey Kurakin. “Large-Scale Evolution of Image
Classifiers”. In: volume 70. International Conference on Machine Learning, 2017,
pages 2902–2911 (cited on pages 540, 541).
[854] Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. “Efficient Multi-Objective
Neural Architecture Search via Lamarckian Evolution”. In: International Confer-
ence on Learning Representations, 2019 (cited on pages 540, 541).
[855] Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray
Kavukcuoglu. “Hierarchical Representations for Efficient Architecture Search”.
In: International Conference on Learning Representations, 2018 (cited on page 540).
[856] Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu,
Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. “FBNet: Hardware-
Aware Efficient ConvNet Design via Differentiable Neural Architecture Search”.
In: IEEE Conference on Computer Vision and Pattern Recognition, 2019, pages 10734–
10742 (cited on page 540).
[857] Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, and
Hongkai Xiong. “PC-DARTS: Partial Channel Connections for Memory-Efficient
Architecture Search”. In: International Conference on Learning Representations,
2020 (cited on page 540).
[858] Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter.
“Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets”.
In: volume 54. International Conference on Artificial Intelligence and Statistics,
2017, pages 528–536 (cited on page 541).
[859] Patryk Chrabaszcz, Ilya Loshchilov, and Frank Hutter. “A Downsampled Variant
of ImageNet as an Alternative to the CIFAR datasets”. In: volume abs/1707.08819.
CoRR, 2017 (cited on page 541).
[860] Arber Zela, Aaron Klein, Stefan Falkner, and Frank Hutter. “Towards Automated
Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search”.
In: International Conference on Machine Learning, 2018 (cited on page 541).
[861] Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. “Efficient Archi-
tecture Search by Network Transformation”. In: AAAI Conference on Artificial
Intelligence, 2018, pages 2787–2794 (cited on page 541).
[862] Tobias Domhan, Jost Tobias Springenberg, and Frank Hutter. “Speeding Up Au-
tomatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation
of Learning Curves”. In: International Joint Conference on Artificial Intelligence,
2015, pages 3460–3468 (cited on page 541).
[863] Aaron Klein, Stefan Falkner, Jost Tobias Springenberg, and Frank Hutter. “Learn-
ing Curve Prediction with Bayesian Neural Networks”. In: International Confer-
ence on Learning Representations, 2017 (cited on page 541).
[864] Bowen Baker, Otkrist Gupta, Ramesh Raskar, and Nikhil Naik. “Accelerating Neu-
ral Architecture Search using Performance Prediction”. In: International Confer-
ence on Learning Representations, 2018 (cited on page 541).
[865] Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, and Tie-Yan Liu. “Neural Architec-
ture Optimization”. In: Advances in Neural Information Processing Systems, 2018,
pages 7827–7838 (cited on page 542).
[866] Yingce Xia, Xu Tan, Fei Tian, Fei Gao, Di He, Weicong Chen, Yang Fan, Linyuan
Gong, Yichong Leng, Renqian Luo, Yiren Wang, Lijun Wu, Jinhua Zhu, Tao Qin,
and Tie-Yan Liu. “Microsoft Research Asia’s Systems for WMT19”. In: Annual
Meeting of the Association for Computational Linguistics, 2019, pages 424–433
(cited on page 542).
[867] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. “Searching for Activation
Functions”. In: International Conference on Learning Representations, 2018 (cited
on page 542).
[868] Wei Zhu, Xiaoling Wang, Xipeng Qiu, Yuan Ni, and Guotong Xie. “AutoTrans: Au-
tomating Transformer Design via Reinforced Architecture Search”. In: volume abs/2009.02070.
CoRR, 2020 (cited on page 542).
[869] Henry Tsai, Jayden Ooi, Chun-Sung Ferng, Hyung Won Chung, and Jason Riesa.
“Finding Fast Transformers: One-Shot Neural Architecture Search by Component
Composition”. In: volume abs/2008.06808. CoRR, 2020 (cited on page 543).
[870] Jian Li, Zhaopeng Tu, Baosong Yang, Michael Lyu, and Tong Zhang. “Multi-Head
Attention with Disagreement Regularization”. In: Conference on Empirical Meth-
ods in Natural Language Processing, 2018, pages 2897–2903 (cited on page 544).
[871] Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, and Zhaopeng Tu. “Multi-Granularity
Self-Attention for Neural Machine Translation”. In: Conference on Empirical Meth-
ods in Natural Language Processing, 2019, pages 887–897 (cited on page 544).
[872] Junyang Lin, Xu Sun, Xuancheng Ren, Muyu Li, and Qi Su. “Learning When to
Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural
Machine Translation”. In: Conference on Empirical Methods in Natural Language
Processing, 2018, pages 2985–2990 (cited on page 544).
[873] Hendra Setiawan, Matthias Sperber, Udhyakumar Nallasamy, and Matthias Paulik.
“Variational Neural Machine Translation with Normalizing Flows”. In: Annual
Meeting of the Association for Computational Linguistics, 2020 (cited on page 544).
[874] Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, and Zheng
Zhang. “Star-Transformer”. In: Annual Conference of the North American Chapter
of the Association for Computational Linguistics, 2019, pages 1315–1325 (cited on
page 544).
[875] Marzieh Fadaee, Arianna Bisazza, and Christof Monz. “Data Augmentation for
Low-Resource Neural Machine Translation”. In: Annual Meeting of the Associ-
ation for Computational Linguistics, 2017, pages 567–573 (cited on pages 545,
549).
[876] Xinyi Wang, Hieu Pham, Zihang Dai, and Graham Neubig. “SwitchOut: an Effi-
cient Data Augmentation Algorithm for Neural Machine Translation”. In: Confer-
ence on Empirical Methods in Natural Language Processing, 2018, pages 856–861
(cited on pages 545, 550, 580).
[877] Yuval Marton, Chris Callison-Burch, and Philip Resnik. “Improved Statistical Ma-
chine Translation Using Monolingually-Derived Paraphrases”. In: Annual Meeting
of the Association for Computational Linguistics, 2009, pages 381–390 (cited on
page 545).
[878] Jonathan Mallinson, Rico Sennrich, and Mirella Lapata. “Paraphrasing Revisited
with Neural Machine Translation”. In: Annual Conference of the European Asso-
ciation for Machine Translation, 2017, pages 881–893 (cited on pages 545, 550).
[879] Connor Shorten and Taghi M. Khoshgoftaar. “A survey on Image Data Augmenta-
tion for Deep Learning”. In: volume 6. Journal of Big Data, 2019, page 60 (cited
on page 546).
[880] Cong Duy Vu Hoang, Philipp Koehn, Gholamreza Haffari, and Trevor Cohn. “Iter-
ative Back-Translation for Neural Machine Translation”. In: Annual Meeting of the
Association for Computational Linguistics, 2018, pages 18–24 (cited on pages 546,
547).
[881] Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ran-
zato. “Unsupervised Machine Translation Using Monolingual Corpora Only”. In:
International Conference on Learning Representations, 2018 (cited on pages 546,
548, 573).
[882] Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio
Ranzato. “Phrase-Based & Neural Unsupervised Machine Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2018, pages 5039–5049
(cited on pages 546, 571, 573).
[883] Anna Currey, Antonio Valerio Miceli Barone, and Kenneth Heafield. “Copied Mono-
lingual Data Improves Low-Resource Neural Machine Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2017, pages 148–156
(cited on page 547).
[884] Kenji Imamura, Atsushi Fujita, and Eiichiro Sumita. “Enhancement of Encoder and
Attention Using Target Monolingual Corpora in Neural Machine Translation”. In:
Annual Meeting of the Association for Computational Linguistics, 2018, pages 55–
63 (cited on page 547).
[885] Lijun Wu, Yiren Wang, Yingce Xia, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. “Ex-
ploiting Monolingual Data at Scale for Neural Machine Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2019, pages 4205–4215
(cited on pages 547, 548, 580).
[886] Myle Ott, Michael Auli, David Grangier, and Marc’Aurelio Ranzato. “Analyzing
Uncertainty in Neural Machine Translation”. In: volume 80. International Confer-
ence on Machine Learning, 2018, pages 3953–3962 (cited on page 547).
[887] Jiajun Zhang and Chengqing Zong. “Exploiting Source-side Monolingual Data in
Neural Machine Translation”. In: Conference on Empirical Methods in Natural
Language Processing, 2016, pages 1535–1545 (cited on pages 548, 549, 556, 580).
[888] Wael Farhan, Bashar Talafha, Analle Abuammar, Ruba Jaikat, Mahmoud Al-Ayyoub,
Ahmad Bisher Tarakji, and Anas Toma. “Unsupervised dialectal neural machine
translation”. In: volume 57. 3. Information Processing & Management, 2020, page 102181
(cited on pages 548, 574).
[889] Rahul Bhagat and Eduard Hovy. “What Is a Paraphrase?” In: volume 39. 3. Com-
putational Linguistics, 2013, pages 463–472 (cited on page 550).
[890] Nitin Madnani and Bonnie Dorr. “Generating Phrasal and Sentential Paraphrases:
A Survey of Data-Driven Methods”. In: volume 36. 3. Computational Linguistics,
2010, pages 341–387 (cited on page 550).
[891] Yinuo Guo and Junfeng Hu. “Meteor++ 2.0: Adopt Syntactic Level Paraphrase
Knowledge into Machine Translation Evaluation”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2019, pages 501–506 (cited on page 550).
[892] Zhong Zhou, Matthias Sperber, and Alexander Waibel. “Paraphrases as Foreign
Languages in Multilingual Neural Machine Translation”. In: Annual Meeting of the
Association for Computational Linguistics, 2019, pages 113–122 (cited on page 550).
[893] Sisay Fissaha Adafre and Maarten de Rijke. “Finding Similar Sentences across
Multiple Languages in Wikipedia”. In: Annual Conference of the European Asso-
ciation for Machine Translation, 2006 (cited on pages 550, 551).
[894] Dragos Stefan Munteanu and Daniel Marcu. “Improving Machine Translation Per-
formance by Exploiting Non-Parallel Corpora”. In: volume 31. 4. Computational
Linguistics, 2005, pages 477–504 (cited on pages 550, 551).
[895] Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Tao Qin, Jianhuang Lai, and Tie-Yan Liu.
“Machine Translation With Weakly Paired Documents”. In: Annual Meeting of
the Association for Computational Linguistics, 2019, pages 4374–4383 (cited on
pages 550, 551).
[896] Keiji Yasuda and Eiichiro Sumita. “Method for building sentence-aligned corpus
from wikipedia”. In: AAAI Conference on Artificial Intelligence, 2008 (cited on
page 551).
[897] Jason Smith, Chris Quirk, and Kristina Toutanova. “Extracting Parallel Sentences
from Comparable Corpora using Document Level Alignment”. In: Annual Meeting
of the Association for Computational Linguistics, 2010, pages 403–411 (cited on
page 551).
[898] Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. “Exploiting Similarities among
Languages for Machine Translation”. In: volume abs/1309.4168. CoRR, 2013 (cited
on pages 551, 567, 569).
[899] Sebastian Ruder, Ivan Vulic, and Anders Søgaard. “A Survey of Cross-lingual
Word Embedding Models”. In: volume 65. Journal of Artificial Intelligence Re-
search, 2019, pages 569–631 (cited on page 551).
[900] Çaglar Gülçehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loïc Barrault, Huei-Chi
Lin, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. “On Using Monolin-
gual Corpora in Neural Machine Translation”. In: CoRR, 2015 (cited
on pages 551, 579).
[901] Çaglar Gülçehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, and Yoshua Bengio.
“On integrating a language model into neural machine translation”. In: volume 45.
Computer Speech & Language, 2017, pages 137–148 (cited on page 551).
[902] Felix Stahlberg, James Cross, and Veselin Stoyanov. “Simple Fusion: Return of
the Language Model”. In: Annual Meeting of the Association for Computational
Linguistics, 2018, pages 204–211 (cited on page 551).
[903] Zhaopeng Tu, Yang Liu, Zhengdong Lu, Xiaohua Liu, and Hang Li. “Context Gates
for Neural Machine Translation”. In: volume 5. Transactions of the Association
for Computational Linguistics, 2017, pages 87–99 (cited on page 552).
[904] Andrew Dai and Quoc Le. “Semi-supervised Sequence Learning”. In: Annual Con-
ference on Neural Information Processing Systems, 2015, pages 3079–3087 (cited
on page 552).
[905] Ronan Collobert and Jason Weston. “A unified architecture for natural language
processing: deep neural networks with multitask learning”. In: volume 307. Interna-
tional Conference on Machine Learning, 2008, pages 160–167 (cited on page 552).
[906] Felipe Almeida and Geraldo Xexéo. “Word Embeddings: A Survey”. In: CoRR,
2019 (cited on page 552).
[907] Masato Neishi, Jin Sakuma, Satoshi Tohda, Shonosuke Ishiwatari, Naoki Yoshi-
naga, and Masashi Toyoda. “A Bag of Useful Tricks for Practical Neural Machine
Translation: Embedding Layer Initialization and Large Batch Size”. In: Asian Fed-
eration of Natural Language Processing, 2017, pages 99–109 (cited on page 552).
[908] Ye Qi, Devendra Singh Sachan, Matthieu Felix, Sarguna Janani Padmanabhan, and
Graham Neubig. “When and Why are Pre-trained Word Embeddings Useful for
Neural Machine Translation?” In: Annual Conference of the North American Chap-
ter of the Association for Computational Linguistics, 2018 (cited on pages 552,
555).
[909] Matthew Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. “Semi-
supervised sequence tagging with bidirectional language models”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2017, pages 1756–1765 (cited
on page 553).
[910] Stéphane Clinchant, Kweon Woo Jung, and Vassilina Nikoulina. “On the use of
BERT for Neural Machine Translation”. In: Annual Meeting of the Association
for Computational Linguistics, 2019, pages 108–117 (cited on pages 554, 555).
[911] Kenji Imamura and Eiichiro Sumita. “Recycling a Pre-trained BERT Encoder for
Neural Machine Translation”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2019, pages 23–31 (cited on pages 554, 555).
[912] Sergey Edunov, Alexei Baevski, and Michael Auli. “Pre-trained language model
representations for language generation”. In: Annual Conference of the North Amer-
ican Chapter of the Association for Computational Linguistics, 2019, pages 4052–
4059 (cited on page 554).
[913] Tianyu He, Xu Tan, and Tao Qin. “Hard but Robust, Easy but Sensitive: How En-
coder and Decoder Perform in Neural Machine Translation”. In: volume abs/1908.06259.
CoRR, 2019 (cited on page 554).
[914] Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang
Li, and Tie-Yan Liu. “Incorporating BERT into Neural Machine Translation”. In:
International Conference on Learning Representations, 2020 (cited on page 554).
[915] Jiacheng Yang, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Weinan Zhang, Yong
Yu, and Lei Li. “Towards Making the Most of BERT in Neural Machine Trans-
lation”. In: AAAI Conference on Artificial Intelligence, 2020, pages 9378–9385
(cited on pages 554, 555).
[916] Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. “MASS: Masked
Sequence to Sequence Pre-training for Language Generation”. In: volume 97. In-
ternational Conference on Machine Learning, 2019, pages 5926–5936 (cited on
pages 554, 580).
[917] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mo-
hamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. “BART: Denoising
Sequence-to-Sequence Pre-training for Natural Language Generation, Translation,
and Comprehension”. In: Annual Meeting of the Association for Computational
Linguistics, 2020, pages 7871–7880 (cited on pages 554, 555, 580).
[918] Weizhen Qi, Yu Yan, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei
Zhang, and Ming Zhou. “ProphetNet: Predicting Future N-gram for Sequence-to-
Sequence Pre-training”. In: Annual Meeting of the Association for Computational
Linguistics, 2020, pages 2401–2410 (cited on page 554).
[919] Rongxiang Weng, Heng Yu, Shujian Huang, Shanbo Cheng, and Weihua Luo.
“Acquiring Knowledge from Pre-Trained Model to Neural Machine Translation”.
In: AAAI Conference on Artificial Intelligence, 2020, pages 9266–9273 (cited on
page 555).
[920] Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvinine-
jad, Mike Lewis, and Luke Zettlemoyer. “Multilingual Denoising Pre-training for
Neural Machine Translation”. In: volume 8. Transactions of the Association for
Computational Linguistics, 2020, pages 726–742 (cited on page 555).
[921] Baijun Ji, Zhirui Zhang, Xiangyu Duan, Min Zhang, Boxing Chen, and Weihua
Luo. “Cross-Lingual Pre-Training Based Transfer for Zero-Shot Neural Machine
Translation”. In: AAAI Conference on Artificial Intelligence, 2020, pages 115–122
(cited on page 555).
[922] Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang, and Qi Ju. “CSP: Code-Switching
Pre-training for Neural Machine Translation”. In: Conference on Empirical Meth-
ods in Natural Language Processing, 2020, pages 2624–2636 (cited on page 555).
[923] Dusan Varis and Ondrej Bojar. “Unsupervised Pretraining for Neural Machine
Translation Using Elastic Weight Consolidation”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2019, pages 130–135 (cited on page 555).
[924] Sebastian Ruder. “An Overview of Multi-Task Learning in Deep Neural Networks”.
In: volume abs/1706.05098. CoRR, 2017 (cited on page 555).
[925] Rich Caruana. “Multitask Learning”. In: Springer, 1998, pages 95–133 (cited on
page 555).
[926] Xiaodong Liu, Pengcheng He, Weizhu Chen, and Jianfeng Gao. “Multi-Task Deep
Neural Networks for Natural Language Understanding”. In: Annual Meeting of
the Association for Computational Linguistics, 2019, pages 4487–4496 (cited on
page 555).
[927] Tobias Domhan and Felix Hieber. “Using Target-side Monolingual Data for Neu-
ral Machine Translation through Multi-task Learning”. In: Conference on Empir-
ical Methods in Natural Language Processing, 2017, pages 1500–1505 (cited on
pages 556, 579).
[928] Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. “Multi-Task
Learning for Multiple Language Translation”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2015, pages 1723–1732 (cited on pages 556,
564, 580).
[929] Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng
Chen, Nikhil Thorat, Fernanda B. Viégas, Martin Wattenberg, Greg Corrado, Mac-
duff Hughes, and Jeffrey Dean. “Google’s Multilingual Neural Machine Trans-
lation System: Enabling Zero-Shot Translation”. In: volume 5. Transactions of
the Association for Computational Linguistics, 2017, pages 339–351 (cited on
pages 556, 561, 564, 565, 580).
[930] Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, and Enhong Chen. “Joint Training
for Neural Machine Translation Models with Monolingual Data”. In: AAAI Con-
ference on Artificial Intelligence, 2018, pages 555–562 (cited on page 557).
[931] Meng Sun, Bojian Jiang, Hao Xiong, Zhongjun He, Hua Wu, and Haifeng Wang.
“Baidu Neural Machine Translation Systems for WMT19”. In: Annual Meeting
of the Association for Computational Linguistics, 2019, pages 374–381 (cited on
page 557).
[932] Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, and Tie-Yan Liu. “Dual
Supervised Learning”. In: volume 70. International Conference on Machine Learn-
ing, 2017, pages 3789–3798 (cited on pages 558, 559).
[933] Yingce Xia, Xu Tan, Fei Tian, Tao Qin, Nenghai Yu, and Tie-Yan Liu. “Model-
Level Dual Learning”. In: volume 80. Proceedings of Machine Learning Research.
International Conference on Machine Learning, 2018, pages 5379–5388 (cited on
page 558).
[934] Tao Qin. “Dual Learning for Machine Translation and Beyond”. In: Springer, 2020,
pages 49–72 (cited on pages 558, 559).
[935] Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying
Ma. “Dual Learning for Machine Translation”. In: Annual Conference on Neural
Information Processing Systems, 2016, pages 820–828 (cited on
page 558).
[936] Zhibing Zhao, Yingce Xia, Tao Qin, Lirong Xia, and Tie-Yan Liu. “Dual Learning:
Theoretical Study and an Algorithmic Extension”. In: volume abs/2005.08238.
CoRR, 2020 (cited on page 558).
[937] Richard Sutton, David Allen McAllester, Satinder Singh, and Yishay Mansour.
“Policy Gradient Methods for Reinforcement Learning with Function Approxima-
tion”. In: Advances in Neural Information Processing Systems, 1999, pages 1057–1063 (cited on page 560).
[938] Raj Dabre, Chenhui Chu, and Anoop Kunchukuttan. “A survey of multilingual
neural machine translation”. In: volume 53. 5. ACM Computing Surveys, 2020,
pages 1–38 (cited on pages 561, 564).
[939] Hua Wu and Haifeng Wang. “Pivot language approach for phrase-based statistical
machine translation”. In: volume 21. 3. Machine Translation, 2007, pages 165–181
(cited on page 561).
[940] Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, and Hermann Ney.
“Pivot-based Transfer Learning for Neural Machine Translation between Non-English
Languages”. In: Annual Meeting of the Association for Computational Linguistics,
2019, pages 866–876 (cited on page 561).
[941] Masao Utiyama and Hitoshi Isahara. “A Comparison of Pivot Methods for Phrase-
Based Statistical Machine Translation”. In: Annual Meeting of the Association for
Computational Linguistics, 2007, pages 484–491 (cited on page 561).
[942] Samira Tofighi Zahabi, Somayeh Bakhshaei, and Shahram Khadivi. “Using Con-
text Vectors in Improving a Machine Translation System with Bridge Language”.
In: Annual Meeting of the Association for Computational Linguistics, 2013, pages 318–
322 (cited on page 561).
[943] Xiaoning Zhu, Zhongjun He, Hua Wu, Conghui Zhu, Haifeng Wang, and Tiejun
Zhao. “Improving Pivot-Based Statistical Machine Translation by Pivoting the Co-
occurrence Count of Phrase Pairs”. In: Conference on Empirical Methods in Natu-
ral Language Processing, 2014, pages 1665–1675 (cited on page 561).
[944] Akiva Miura, Graham Neubig, Sakriani Sakti, Tomoki Toda, and Satoshi Naka-
mura. “Improving Pivot Translation by Remembering the Pivot”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2015, pages 573–577 (cited
on page 561).
[945] Trevor Cohn and Mirella Lapata. “Machine Translation by Triangulation: Making
Effective Use of Multi-Parallel Corpora”. In: Annual Meeting of the Association
for Computational Linguistics, 2007 (cited on page 561).
[946] Hua Wu and Haifeng Wang. “Revisiting Pivot Language Approach for Machine
Translation”. In: Annual Meeting of the Association for Computational Linguistics,
2009, pages 154–162 (cited on page 561).
[947] Adrià De Gispert and Jose B. Marino. “Catalan-English statistical machine transla-
tion without parallel corpus: bridging through Spanish”. In: International Confer-
ence on Language Resources and Evaluation, 2006, pages 65–68 (cited on page 561).
[948] Yong Cheng, Yang Liu, Qian Yang, Maosong Sun, and Wei Xu. “Neural Machine
Translation with Pivot Languages”. In: volume abs/1611.04928. CoRR, 2016 (cited
on page 561).
[949] Michael Paul, Hirofumi Yamamoto, Eiichiro Sumita, and Satoshi Nakamura. “On
the Importance of Pivot Language Selection for Statistical Machine Translation”.
In: Annual Conference of the North American Chapter of the Association for Com-
putational Linguistics, 2009, pages 221–224 (cited on page 561).
[950] Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, and Tie-Yan Liu. “Multilingual Neural
Machine Translation with Knowledge Distillation”. In: International Conference
on Learning Representations, 2019 (cited on page 562).
[951] Jiatao Gu, Yong Wang, Yun Chen, Victor O. K. Li, and Kyunghyun Cho. “Meta-
Learning for Low-Resource Neural Machine Translation”. In: Conference on Em-
pirical Methods in Natural Language Processing, 2018, pages 3622–3631 (cited on
page 563).
[952] Chelsea Finn, Pieter Abbeel, and Sergey Levine. “Model-Agnostic Meta-Learning
for Fast Adaptation of Deep Networks”. In: volume 70. Proceedings of Machine
Learning Research. International Conference on Machine Learning, 2017, pages 1126–
1135 (cited on page 563).
[953] Jiatao Gu, Hany Hassan, Jacob Devlin, and Victor O. K. Li. “Universal Neural
Machine Translation for Extremely Low Resource Languages”. In: Annual Con-
ference of the North American Chapter of the Association for Computational Lin-
guistics, 2018, pages 344–354 (cited on page 563).
[954] Tom Kocmi and Ondrej Bojar. “Trivial Transfer Learning for Low-Resource Neu-
ral Machine Translation”. In: Annual Meeting of the Association for Computational
Linguistics, 2018, pages 244–252 (cited on page 564).
[955] Baijun Ji, Zhirui Zhang, Xiangyu Duan, Min Zhang, Boxing Chen, and Weihua
Luo. “Cross-Lingual Pre-Training Based Transfer for Zero-Shot Neural Machine
Translation”. In: volume 34. 01. Proceedings of the AAAI Conference on Artificial
Intelligence, 2020, pages 115–122 (cited on page 564).
[956] Zehui Lin, Xiao Pan, Mingxuan Wang, Xipeng Qiu, Jiangtao Feng, Hao Zhou,
and Lei Li. “Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information”. In: Conference on Empirical Methods in Natural Lan-
guage Processing, 2020, pages 2649–2663 (cited on page 564).
[957] Matiss Rikters, Marcis Pinnis, and Rihards Krislauks. “Training and Adapting Mul-
tilingual NMT for Less-resourced and Morphologically Rich Languages”. In: Eu-
ropean Language Resources Association, 2018 (cited on page 564).
[958] Orhan Firat, Kyunghyun Cho, and Yoshua Bengio. “Multi-Way, Multilingual Neu-
ral Machine Translation with a Shared Attention Mechanism”. In: Annual Confer-
ence of the North American Chapter of the Association for Computational Linguis-
tics, 2016, pages 866–875 (cited on pages 564, 580).
[959] Biao Zhang, Philip Williams, Ivan Titov, and Rico Sennrich. “Improving Massively
Multilingual Neural Machine Translation and Zero-Shot Translation”. In: Annual
Meeting of the Association for Computational Linguistics, 2020, pages 1628–1639
(cited on page 566).
[960] Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Sid-
dharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaud-
hary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard
Grave, Michael Auli, and Armand Joulin. “Beyond English-Centric Multilingual
Machine Translation”. In: volume abs/2010.11125. CoRR, 2020 (cited on page 566).
[961] 黄书剑. “统计机器翻译中的词对齐研究”. In: 南京大学, 2012 (cited on page 567).
[962] Ivan Vulic and Anna Korhonen. “On the Role of Seed Lexicons in Learning Bilin-
gual Word Embeddings”. In: Annual Meeting of the Association for Computational
Linguistics, 2016 (cited on page 567).
[963] Samuel L. Smith, David H. P. Turban, Steven Hamblin, and Nils Y. Hammerla.
“Offline bilingual word vectors, orthogonal transformations and the inverted soft-
max”. In: International Conference on Learning Representations, 2017 (cited on
pages 567, 569).
[964] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. “Learning bilingual word embed-
dings with (almost) no bilingual data”. In: Annual Meeting of the Association for
Computational Linguistics, 2017, pages 451–462 (cited on page 567).
[965] Ruochen Xu, Yiming Yang, Naoki Otani, and Yuexin Wu. “Unsupervised Cross-
lingual Transfer of Word Embedding Spaces”. In: Conference on Empirical Meth-
ods in Natural Language Processing, 2018, pages 2465–2474 (cited on pages 568,
569).
[966] Peter H. Schönemann. “A generalized solution of the orthogonal Procrustes prob-
lem”. In: volume 31. 1. Psychometrika, 1966, pages 1–10 (cited on page 568).
[967] Guillaume Lample, Alexis Conneau, Marc’Aurelio Ranzato, Ludovic Denoyer,
and Hervé Jégou. “Word translation without parallel data”. In: International Con-
ference on Learning Representations, 2018 (cited on pages 568, 569).
[968] Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun. “Adversarial Training
for Unsupervised Bilingual Lexicon Induction”. In: Annual Meeting of the Associ-
ation for Computational Linguistics, 2017, pages 1959–1970 (cited on pages 568,
569).
[969] Tasnim Mohiuddin and Shafiq Rayhan Joty. “Revisiting Adversarial Autoencoder
for Unsupervised Word Translation with Cycle Consistency and Improved Train-
ing”. In: Annual Meeting of the Association for Computational Linguistics, 2019,
pages 3857–3867 (cited on pages 568, 569).
[970] David Alvarez-Melis and Tommi S. Jaakkola. “Gromov-Wasserstein Alignment
of Word Embedding Spaces”. In: Conference on Empirical Methods in Natural
Language Processing, 2018, pages 1881–1890 (cited on pages 568, 569).
[971] Nicolas Garneau, Mathieu Godbout, David Beauchemin, Audrey Durand, and Luc
Lamontagne. “A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual
Mappings of Word Embeddings: Making the Method Robustly Reproducible as
Well”. In: Language Resources and Evaluation Conference, 2020, pages 5546–
5554 (cited on page 568).
[972] Jean Alaux, Edouard Grave, Marco Cuturi, and Armand Joulin. “Unsupervised Hy-
peralignment for Multilingual Word Embeddings”. In: International Conference on
Learning Representations, 2018 (cited on pages 568, 569).
[973] Chao Xing, Dong Wang, Chao Liu, and Yiye Lin. “Normalized Word Embedding
and Orthogonal Transform for Bilingual Word Translation”. In: Annual Conference
of the North American Chapter of the Association for Computational Linguistics,
2015, pages 1006–1011 (cited on page 569).
[974] Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun. “Earth Mover’s Distance
Minimization for Unsupervised Bilingual Lexicon Induction”. In: Conference on
Empirical Methods in Natural Language Processing, 2017, pages 1934–1945 (cited
on page 569).
[975] Mareike Hartmann, Yova Kementchedjhieva, and Anders Søgaard. “Empirical ob-
servations on the instability of aligning word vector spaces with GANs”. In:
openreview.net, 2018 (cited on page 569).
[976] Zi-Yi Dou, Zhi-Hao Zhou, and Shujian Huang. “Unsupervised Bilingual Lexicon
Induction via Latent Variable Models”. In: Conference on Empirical Methods in
Natural Language Processing, 2018, pages 621–626 (cited on page 569).
[977] Jiaji Huang, Qiang Qiu, and Kenneth Church. “Hubless Nearest Neighbor Search
for Bilingual Lexicon Induction”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2019, pages 4072–4080 (cited on page 569).
[978] Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, and Edouard
Grave. “Loss in Translation: Learning Bilingual Word Mapping with a Retrieval
Criterion”. In: Conference on Empirical Methods in Natural Language Processing,
2018, pages 2979–2984 (cited on page 569).
[979] Xilun Chen and Claire Cardie. “Unsupervised Multilingual Word Embeddings”. In:
Conference on Empirical Methods in Natural Language Processing, 2018, pages 261–
270 (cited on page 569).
[980] Hagai Taitelbaum, Gal Chechik, and Jacob Goldberger. “Multilingual word trans-
lation using auxiliary languages”. In: Conference on Empirical Methods in Natural
Language Processing, 2019, pages 1330–1335 (cited on page 569).
[981] Geert Heyman, Bregt Verreet, Ivan Vulic, and Marie-Francine Moens. “Learning
Unsupervised Multilingual Word Embeddings with Incremental Multilingual Hubs”.
In: Annual Conference of the North American Chapter of the Association for Com-
putational Linguistics, 2019, pages 1890–1902 (cited on page 569).
[982] Yedid Hoshen and Lior Wolf. “Non-Adversarial Unsupervised Word Translation”.
In: Annual Meeting of the Association for Computational Linguistics, 2018, pages 469–
478 (cited on page 569).
[983] Tanmoy Mukherjee, Makoto Yamada, and Timothy Hospedales. “Learning Unsu-
pervised Word Translations Without Adversaries”. In: Conference on Empirical
Methods in Natural Language Processing, 2018, pages 627–632 (cited on page 569).
[984] Ivan Vulic, Goran Glavas, Roi Reichart, and Anna Korhonen. “Do We Really
Need Fully Unsupervised Cross-Lingual Embeddings?” In: Conference on Empir-
ical Methods in Natural Language Processing, 2019, pages 4406–4417 (cited on
page 569).
[985] Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, Shujian Huang, Tong
Xiao, and Jingbo Zhu. “A Simple and Effective Approach to Robust Unsupervised
Bilingual Dictionary Induction”. In: International Conference on Computational
Linguistics, 2020 (cited on pages 569, 570).
[986] Anders Søgaard, Sebastian Ruder, and Ivan Vulic. “On the Limitations of Unsuper-
vised Bilingual Dictionary Induction”. In: Annual Meeting of the Association for
Computational Linguistics, 2018, pages 778–788 (cited on page 570).
[987] Benjamin Marie and Atsushi Fujita. “Iterative Training of Unsupervised Neural
and Statistical Machine Translation Systems”. In: volume 19. 5. ACM Transactions
on Asian and Low-Resource Language Information Processing, 2020, 68:1–68:21
(cited on page 570).
[988] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. “Unsupervised Statistical Ma-
chine Translation”. In: Conference on Empirical Methods in Natural Language
Processing, 2018, pages 3632–3642 (cited on pages 570, 571, 574).
[989] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. “An Effective Approach to Un-
supervised Machine Translation”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2019, pages 194–203 (cited on page 572).
[990] Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, and
Jonathan May. “Translating Translationese: A Two-Step Approach to Unsuper-
vised Machine Translation”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2019, pages 3057–3062 (cited on page 572).
[991] Alexis Conneau and Guillaume Lample. “Cross-lingual Language Model Pretrain-
ing”. In: Annual Conference on Neural Information Processing Systems, 2019,
pages 7057–7067 (cited on pages 574, 580).
[992] Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. “Revisiting
Unreasonable Effectiveness of Data in Deep Learning Era”. In: IEEE International
Conference on Computer Vision, 2017, pages 843–852 (cited on page 576).
[993] Kevin Duh, Graham Neubig, Katsuhito Sudoh, and Hajime Tsukada. “Adaptation
Data Selection using Neural Language Models: Experiments in Machine Transla-
tion”. In: Annual Meeting of the Association for Computational Linguistics, 2013,
pages 678–683 (cited on pages 576, 577).
[994] Spyros Matsoukas, Antti-Veikko I. Rosti, and Bing Zhang. “Discriminative Corpus
Weight Estimation for Machine Translation”. In: Conference on Empirical Methods
in Natural Language Processing, 2009, pages 708–717 (cited on page 576).
[995] George F. Foster, Cyril Goutte, and Roland Kuhn. “Discriminative Instance Weight-
ing for Domain Adaptation in Statistical Machine Translation”. In: Conference on
Empirical Methods in Natural Language Processing, 2010, pages 451–459 (cited
on page 576).
[996] Jingbo Zhu and Eduard H. Hovy. “Active Learning for Word Sense Disambiguation
with Methods for Addressing the Class Imbalance Problem”. In: Conference on
Empirical Methods in Natural Language Processing, 2007, pages 783–790 (cited
on page 577).
[997] Kashif Shah, Loïc Barrault, and Holger Schwenk. “Translation Model Adaptation
by Resampling”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2010, pages 392–399 (cited on page 577).
[998] Masao Utiyama and Hitoshi Isahara. “Reliable Measures for Aligning Japanese-
English News Articles and Sentences”. In: Annual Meeting of the Association for
Computational Linguistics, 2003, pages 72–79 (cited on page 577).
[999] Nicola Bertoldi and Marcello Federico. “Domain Adaptation for Statistical Ma-
chine Translation with Monolingual Resources”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2009, pages 182–189 (cited on page 577).
[1000] Chenhui Chu, Raj Dabre, and Sadao Kurohashi. “An Empirical Comparison of
Domain Adaptation Methods for Neural Machine Translation”. In: Annual Meeting
of the Association for Computational Linguistics, 2017, pages 385–391 (cited on
pages 578, 579).
[1001] Mohammad Amin Farajian, Marco Turchi, Matteo Negri, Nicola Bertoldi, and Mar-
cello Federico. “Neural vs. Phrase-Based Machine Translation in a Multi-Domain
Scenario”. In: Annual Conference of the European Association for Machine Trans-
lation, 2017, pages 280–284 (cited on page 578).
[1002] Jiali Zeng, Yang Liu, Jinsong Su, Yubin Ge, Yaojie Lu, Yongjing Yin, and Jiebo
Luo. “Iterative Dual Domain Adaptation for Neural Machine Translation”. In: Con-
ference on Empirical Methods in Natural Language Processing, 2019, pages 845–
855 (cited on page 579).
[1003] Antonio Valerio Miceli Barone, Barry Haddow, Ulrich Germann, and Rico Sen-
nrich. “Regularization techniques for fine-tuning in neural machine translation”. In:
Conference on Empirical Methods in Natural Language Processing, 2017, pages 1489–
1494 (cited on pages 579, 611).
[1004] Huda Khayrallah, Gaurav Kumar, Kevin Duh, Matt Post, and Philipp Koehn. “Neu-
ral lattice search for domain adaptation in machine translation”. In: International
Joint Conference on Natural Language Processing, 2017, pages 20–25 (cited on
page 579).
[1005] Rico Sennrich. “Perplexity Minimization for Translation Model Domain Adapta-
tion in Statistical Machine Translation”. In: Annual Meeting of the Association for
Computational Linguistics, 2012, pages 539–549 (cited on page 579).
[1006] Markus Freitag and Yaser Al-Onaizan. “Fast Domain Adaptation for Neural Ma-
chine Translation”. In: volume abs/1612.06897. CoRR, 2016 (cited on page 579).
[1007] Danielle Saunders, Felix Stahlberg, Adrià de Gispert, and Bill Byrne. “Domain
Adaptive Inference for Neural Machine Translation”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2019, pages 222–228 (cited on page 579).
[1008] Ankur Bapna and Orhan Firat. “Non-Parametric Adaptation for Neural Machine
Translation”. In: Annual Conference of the North American Chapter of the Associ-
ation for Computational Linguistics, 2019, pages 1921–1931 (cited on page 579).
[1009] Mengzhou Xia, Xiang Kong, Antonios Anastasopoulos, and Graham Neubig. “Gen-
eralized Data Augmentation for Low-Resource Translation”. In: Annual Meeting
of the Association for Computational Linguistics, 2019, pages 5786–5796 (cited
on page 580).
[1010] Marzieh Fadaee and Christof Monz. “Back-Translation Sampling by Targeting Dif-
ficult Words in Neural Machine Translation”. In: Annual Meeting of the Associa-
tion for Computational Linguistics, 2018, pages 436–446 (cited on page 580).
[1011] Nuo Xu, Yinqiao Li, Chen Xu, Yanyang Li, Bei Li, Tong Xiao, and Jingbo Zhu.
“Analysis of Back-Translation Methods for Low-Resource Neural Machine Trans-
lation”. In: volume 11839. Natural Language Processing and Chinese Computing,
2019, pages 466–475 (cited on page 580).
[1012] Isaac Caswell, Ciprian Chelba, and David Grangier. “Tagged Back-Translation”.
In: Annual Meeting of the Association for Computational Linguistics, 2019, pages 53–
63 (cited on page 580).
[1013] Zi-Yi Dou, Antonios Anastasopoulos, and Graham Neubig. “Dynamic Data Se-
lection and Weighting for Iterative Back-Translation”. In: Conference on Empir-
ical Methods in Natural Language Processing, 2020, pages 5894–5904 (cited on
page 580).
[1014] Shuo Wang, Yang Liu, Chao Wang, Huanbo Luan, and Maosong Sun. “Improv-
ing Back-Translation with Uncertainty-based Confidence Estimation”. In: Annual
Meeting of the Association for Computational Linguistics, 2019, pages 791–802
(cited on page 580).
[1015] Guanlin Li, Lemao Liu, Guoping Huang, Conghui Zhu, and Tiejun Zhao. “Un-
derstanding Data Augmentation in Neural Machine Translation: Two Perspectives
towards Generalization”. In: Annual Meeting of the Association for Computational
Linguistics, 2019, pages 5688–5694 (cited on page 580).
[1016] Benjamin Marie, Raphael Rubino, and Atsushi Fujita. “Tagged Back-translation
Revisited: Why Does It Really Work?” In: Annual Meeting of the Association for
Computational Linguistics, 2020, pages 5990–5997 (cited on page 580).
[1017] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdi-
nov, and Quoc V. Le. “XLNet: Generalized Autoregressive Pretraining for Lan-
guage Understanding”. In: Annual Conference on Neural Information Processing
Systems, 2019, pages 5754–5764 (cited on page 580).
[1018] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma,
and Radu Soricut. “ALBERT: A Lite BERT for Self-supervised Learning of Lan-
guage Representations”. In: International Conference on Learning Representations,
2020 (cited on page 580).
[1019] Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, and Qun Liu.
“ERNIE: Enhanced Language Representation with Informative Entities”. In: An-
nual Meeting of the Association for Computational Linguistics, 2019, pages 1441–
1451 (cited on page 580).
[1020] Haoyang Huang, Yaobo Liang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang,
and Ming Zhou. “Unicoder: A Universal Language Encoder by Pre-training with
Multiple Cross-lingual Tasks”. In: Conference on Empirical Methods in Natural
Language Processing, 2019, pages 2485–2494 (cited on page 580).
[1021] Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid.
“VideoBERT: A Joint Model for Video and Language Representation Learning”.
In: International Conference on Computer Vision, 2019, pages 7463–7472 (cited
on page 580).
[1022] Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. “ViLBERT: Pretraining Task-
Agnostic Visiolinguistic Representations for Vision-and-Language Tasks”. In: An-
nual Conference on Neural Information Processing Systems, 2019, pages 13–
23 (cited on page 580).
[1023] Yung-Sung Chuang, Chi-Liang Liu, Hung-yi Lee, and Lin-Shan Lee. “Speech-
BERT: An Audio-and-Text Jointly Learned Language Model for End-to-End Spo-
ken Question Answering”. In: Annual Conference of the International Speech Com-
munication Association, 2020, pages 4168–4172 (cited on page 580).
[1024] Matthew Peters, Sebastian Ruder, and Noah A. Smith. “To Tune or Not to Tune?
Adapting Pretrained Representations to Diverse Tasks”. In: Annual Meeting of the
Association for Computational Linguistics, 2019, pages 7–14 (cited on page 580).
[1025] Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. “How to Fine-Tune BERT
for Text Classification?” In: volume 11856. Chinese Computational Linguistics,
2019, pages 194–206 (cited on page 580).
[1026] Thanh-Le Ha, Jan Niehues, and Alexander H. Waibel. “Toward Multilingual Neu-
ral Machine Translation with Universal Encoder and Decoder”. In: volume abs/1611.04798.
CoRR, 2016 (cited on page 580).
[1027] Graeme W. Blackwood, Miguel Ballesteros, and Todd Ward. “Multilingual Neural
Machine Translation with Task-Specific Attention”. In: International Conference
on Computational Linguistics, 2018, pages 3112–3122 (cited on page 580).
[1028] Devendra Singh Sachan and Graham Neubig. “Parameter Sharing Methods for
Multilingual Self-Attentional Translation Models”. In: Annual Meeting of the As-
sociation for Computational Linguistics, 2018, pages 261–271 (cited on page 580).
[1029] Yichao Lu, Phillip Keung, Faisal Ladhak, Vikas Bhardwaj, Shaonan Zhang, and
Jason Sun. “A neural interlingua for multilingual machine translation”. In: An-
nual Meeting of the Association for Computational Linguistics, 2018, pages 84–92
(cited on page 580).
[1030] Yining Wang, Long Zhou, Jiajun Zhang, Feifei Zhai, Jingfang Xu, and Chengqing
Zong. “A Compact and Language-Sensitive Multilingual Translation Method”. In:
Annual Meeting of the Association for Computational Linguistics, 2019, pages 1213–
1223 (cited on page 580).
[1031] Xinyi Wang, Hieu Pham, Philip Arthur, and Graham Neubig. “Multilingual Neural
Machine Translation With Soft Decoupled Encoding”. In: International Conference
on Learning Representations, 2019 (cited on page 580).
[1032] Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, and Tie-Yan Liu. “Multilingual
Neural Machine Translation with Language Clustering”. In: Conference on Em-
pirical Methods in Natural Language Processing, 2019, pages 963–973 (cited on
page 580).
[1033] Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman-Vural, and
Kyunghyun Cho. “Zero-Resource Translation with Multi-Lingual Neural Machine
Translation”. In: Conference on Empirical Methods in Natural Language Process-
ing, 2016, pages 268–277 (cited on page 580).
[1034] Lierni Sestorain, Massimiliano Ciaramita, Christian Buck, and Thomas Hofmann.
“Zero-Shot Dual Machine Translation”. In: volume abs/1805.10338. CoRR, 2018
(cited on page 580).
[1035] Maruan Al-Shedivat and Ankur P. Parikh. “Consistency by Agreement in Zero-
Shot Neural Machine Translation”. In: Annual Conference of the North American
Chapter of the Association for Computational Linguistics, 2019, pages 1184–1197
(cited on page 580).
[1036] Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson,
and Wolfgang Macherey. “The Missing Ingredient in Zero-Shot Neural Machine
Translation”. In: volume abs/1903.07091. CoRR, 2019 (cited on page 580).
[1037] Jiatao Gu, Yong Wang, Kyunghyun Cho, and Victor O. K. Li. “Improved Zero-
shot Neural Machine Translation via Ignoring Spurious Correlations”. In: Annual
Meeting of the Association for Computational Linguistics, 2019, pages 1258–1268
(cited on page 580).
[1038] Anna Currey and Kenneth Heafield. “Zero-Resource Neural Machine Translation
with Monolingual Pivot Data”. In: Conference on Empirical Methods in Natural
Language Processing, 2019, pages 99–107 (cited on page 580).
[1039] 李琳 and 洪青阳. 语音识别:原理与应用. 电子工业出版社, 2020 (cited on pages 583,
584).
[1040] 陈果果, 都家宇, 那兴宇, and 张俊博. Kaldi 语音识别实战. 电子工业出版社,
2020 (cited on page 583).
[1041] Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin W. Wilson, and Oriol
Vinyals. “Learning the speech front-end with raw waveform CLDNNs”. In: An-
nual Conference of the International Speech Communication Association, 2015,
pages 1–5 (cited on page 584).
[1042] Abdel-rahman Mohamed, Geoffrey E. Hinton, and Gerald Penn. “Understanding
how Deep Belief Networks perform acoustic modelling”. In: International Confer-
ence on Acoustics, Speech and Signal Processing, 2012, pages 4273–4276 (cited
on page 584).
[1043] Mark J. F. Gales and Steve J. Young. “The Application of Hidden Markov Models
in Speech Recognition”. In: Foundations and Trends in Signal Processing, 2007, pages 195–304
(cited on page 585).
[1044] Abdel-rahman Mohamed, George E. Dahl, and Geoffrey E. Hinton. “Acoustic Mod-
eling Using Deep Belief Networks”. In: IEEE Transactions on Speech and Audio
Processing, 2012, pages 14–22 (cited on page 585).
[1045] G Hinton, L Deng, D Yu, GE Dahl, and B Kingsbury. “Deep Neural Networks
for Acoustic Modeling in Speech Recognition: The Shared Views of Four Re-
search Groups”. In: IEEE Signal Processing Magazine, 2012, pages 82–97 (cited
on page 585).
[1046] Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua
Bengio. “Attention-Based Models for Speech Recognition”. In: Annual Confer-
ence on Neural Information Processing Systems, 2015, pages 577–585 (cited on
page 585).
[1047] William Chan, Navdeep Jaitly, Quoc V. Le, and Oriol Vinyals. “Listen, attend and
spell: A neural network for large vocabulary conversational speech recognition”.
In: International Conference on Acoustics, Speech and Signal Processing, 2016,
pages 4960–4964 (cited on page 585).
[1048] Long Duong, Antonios Anastasopoulos, David Chiang, Steven Bird, and Trevor
Cohn. “An Attentional Model for Speech Translation Without Transcription”. In:
Annual Conference of the North American Chapter of the Association for Compu-
tational Linguistics, 2016, pages 949–959 (cited on page 586).
[1049] Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, and Zhifeng Chen.
“Sequence-to-Sequence Models Can Directly Translate Foreign Speech”. In: Annual
Conference of the International Speech Communication Association, 2017, pages 2625–2629 (cited on
page 586).
[1050] Alexandre Berard, Olivier Pietquin, Christophe Servan, and Laurent Besacier. “Lis-
ten and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation”.
In: Conference and Workshop on Neural Information Processing Systems, 2016
(cited on page 586).
[1051] Mattia Antonino Di Gangi, Matteo Negri, Roldano Cattoni, Roberto Dessì, and
Marco Turchi. “Enhancing Transformer for End-to-end Speech-to-Text Transla-
tion”. In: European Association for Machine Translation, 2019, pages 21–31 (cited
on page 588).
[1052] Alex Graves, Santiago Fernández, Faustino J. Gomez, and Jürgen Schmidhuber.
“Connectionist temporal classification: labelling unsegmented sequence data with
recurrent neural networks”. In: volume 148. International Conference on Machine
Learning, 2006, pages 369–376 (cited on page 588).
[1053] Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R. Hershey, and Tomoki Hayashi.
“Hybrid CTC/Attention Architecture for End-to-End Speech Recognition”. In: IEEE
Journal of Selected Topics in Signal Processing, 2017, pages 1240–1253 (cited on
page 588).
[1054] Suyoun Kim, Takaaki Hori, and Shinji Watanabe. “Joint CTC-attention based end-
to-end speech recognition using multi-task learning”. In: International Conference
on Acoustics, Speech and Signal Processing, 2017, pages 4835–4839 (cited on
page 588).
[1055] Baoguang Shi, Xiang Bai, and Cong Yao. “An End-to-End Trainable Neural Net-
work for Image-Based Sequence Recognition and Its Application to Scene Text
Recognition”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence,
2017, pages 2298–2304 (cited on page 588).
[1056] Antonios Anastasopoulos and David Chiang. “Tied Multitask Learning for Neural
Speech Translation”. In: Annual Conference of the North American Chapter of the
Association for Computational Linguistics, 2018, pages 82–91 (cited on page 589).
[1057] Parnia Bahar, Tobias Bieschke, and Hermann Ney. “A Comparative Study on End-
to-End Speech to Text Translation”. In: IEEE Automatic Speech Recognition and
Understanding Workshop, 2019, pages 792–799 (cited on page 589).
[1058] Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, and Sharon Gold-
water. “Pre-training on high-resource speech recognition improves low-resource
speech-to-text translation”. In: Annual Conference of the North American Chap-
ter of the Association for Computational Linguistics, 2019, pages 58–68 (cited on
page 590).
[1059] Alexandre Berard, Laurent Besacier, Ali Can Kocabiyikoglu, and Olivier Pietquin.
“End-to-End Automatic Speech Translation of Audiobooks”. In: International Con-
ference on Acoustics, Speech and Signal Processing, 2018, pages 6224–6228 (cited
on page 590).
[1060] Ye Jia, Melvin Johnson, Wolfgang Macherey, Ron J. Weiss, Yuan Cao, Chung-
Cheng Chiu, Naveen Ari, Stella Laurenzo, and Yonghui Wu. “Leveraging Weakly
Supervised Data to Improve End-to-end Speech-to-text Translation”. In: Interna-
tional Conference on Acoustics, Speech and Signal Processing, 2019, pages 7180–
7184 (cited on page 590).
[1061] Anne Wu, Changhan Wang, Juan Pino, and Jiatao Gu. “Self-Supervised Represen-
tations Improve End-to-End Speech Translation”. In: Annual Conference of the Inter-
national Speech Communication Association, 2020, pages 1491–1495 (cited on page 590).
[1062] Yuchen Liu, Hao Xiong, Jiajun Zhang, Zhongjun He, Hua Wu, Haifeng Wang, and
Chengqing Zong. “End-to-End Speech Translation with Knowledge Distillation”.
In: Annual Conference of the International Speech Communication Association,
2019, pages 1128–1132 (cited on page 590).
[1063] Ashkan Alinejad and Anoop Sarkar. “Effectively pretraining a speech translation
decoder with Machine Translation data”. In: Conference on Empirical Methods in
Natural Language Processing, 2020, pages 8014–8020 (cited on page 590).
[1064] Takatomo Kano, Sakriani Sakti, and Satoshi Nakamura. “Structured-Based Cur-
riculum Learning for End-to-End English-Japanese Speech Translation”. In: An-
nual Conference of the International Speech Communication Association, 2017,
pages 2630–2634 (cited on page 591).
[1065] Chengyi Wang, Yu Wu, Shujie Liu, Ming Zhou, and Zhenglu Yang. “Curriculum
Pre-training for End-to-End Speech Translation”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2020, pages 3728–3738 (cited on page 591).
[1066] Lucia Specia, Stella Frank, Khalil Sima’an, and Desmond Elliott. “A Shared Task
on Multimodal Machine Translation and Crosslingual Image Description”. In: An-
nual Meeting of the Association for Computational Linguistics, 2016, pages 543–
553 (cited on page 592).
[1067] Ozan Caglayan, Walid Aransa, Adrien Bardet, Mercedes García-Martínez, Fethi
Bougares, Loïc Barrault, Marc Masana, Luis Herranz, and Joost van de Weijer.
“LIUM-CVC Submissions for WMT17 Multimodal Translation Task”. In: Annual
Meeting of the Association for Computational Linguistics, 2017, pages 432–439
(cited on page 592).
[1068] Jindrich Libovický, Jindrich Helcl, Marek Tlustý, Ondrej Bojar, and Pavel Pecina.
“CUNI System for WMT16 Automatic Post-Editing and Multimodal Translation
Tasks”. In: Annual Meeting of the Association for Computational Linguistics, 2016,
pages 646–654 (cited on page 592).
[1069] Iacer Calixto and Qun Liu. “Incorporating Global Visual Features into Attention-
based Neural Machine Translation”. In: Conference on Empirical Methods in Nat-
ural Language Processing, 2017, pages 992–1003 (cited on page 592).
[1070] Jean-Benoit Delbrouck and Stéphane Dupont. “Modulating and attending the source
image during encoding improves Multimodal Translation”. In: Conference and
Workshop on Neural Information Processing Systems, 2017 (cited on pages 592, 594).
[1071] Jindrich Helcl, Jindrich Libovický, and Dusan Varis. “CUNI System for the WMT18
Multimodal Translation Task”. In: Annual Meeting of the Association for Compu-
tational Linguistics, 2018, pages 616–623 (cited on page 592).
[1072] Desmond Elliott and Ákos Kádár. “Imagination Improves Multimodal Transla-
tion”. In: International Joint Conference on Natural Language Processing, 2017,
pages 130–141 (cited on pages 592, 594).
[1073] Yongjing Yin, Fandong Meng, Jinsong Su, Chulun Zhou, Zhengyuan Yang, Jie
Zhou, and Jiebo Luo. “A Novel Graph-based Multi-modal Fusion Encoder for Neu-
ral Machine Translation”. In: Annual Meeting of the Association for Computational
Linguistics, 2020, pages 3025–3035 (cited on page 592).
[1074] Yuting Zhao, Mamoru Komachi, Tomoyuki Kajiwara, and Chenhui Chu. “Double
Attention-based Multimodal Neural Machine Translation with Semantic Image Re-
gions”. In: Annual Conference of the European Association for Machine Transla-
tion, 2020, pages 105–114 (cited on page 592).
[1075] Desmond Elliott, Stella Frank, and Eva Hasler. “Multi-Language Image Descrip-
tion with Neural Sequence Models”. In: volume abs/1510.04709. CoRR, 2015 (cited on
page 592).
[1076] Pranava Swaroop Madhyastha, Josiah Wang, and Lucia Specia. “Sheffield Mul-
tiMT: Using Object Posterior Predictions for Multimodal Machine Translation”. In:
Annual Meeting of the Association for Computational Linguistics, 2017, pages 470–
476 (cited on page 592).
[1077] Shaowei Yao and Xiaojun Wan. “Multimodal Transformer for Multimodal Ma-
chine Translation”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2020, pages 4346–4350 (cited on page 594).
[1078] Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. “Hierarchical Question-
Image Co-Attention for Visual Question Answering”. In: Conference on Neural
Information Processing Systems, 2016, pages 289–297 (cited on page 595).
[1079] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra,
C. Lawrence Zitnick, and Devi Parikh. “VQA: Visual Question Answering”. In:
International Conference on Computer Vision, 2015, pages 2425–2433 (cited on
page 595).
[1080] Raffaella Bernardi, Ruket Çakici, Desmond Elliott, Aykut Erdem, Erkut Erdem,
Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, and Barbara Plank. “Automatic
Description Generation from Images: A Survey of Models, Datasets, and Evalua-
tion Measures (Extended Abstract)”. In: International Joint Conference on Artifi-
cial Intelligence, 2017, pages 4970–4974 (cited on page 595).
[1081] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. “Show and
tell: A neural image caption generator”. In: IEEE Conference on Computer Vision
and Pattern Recognition, 2015, pages 3156–3164 (cited on page 596).
[1082] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan
Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. “Show, Attend and Tell:
Neural Image Caption Generation with Visual Attention”. In: International Confer-
ence on Machine Learning, 2015, pages 2048–2057 (cited on page 596).
[1083] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. “Image
Captioning with Semantic Attention”. In: IEEE Conference on Computer Vision
and Pattern Recognition, 2016, pages 4651–4659 (cited on page 596).
[1084] Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, and Tat-
Seng Chua. “SCA-CNN: Spatial and Channel-Wise Attention in Convolutional
Networks for Image Captioning”. In: IEEE Conference on Computer Vision and
Pattern Recognition, 2017, pages 6298–6306 (cited on page 596).
[1085] Kun Fu, Junqi Jin, Runpeng Cui, Fei Sha, and Changshui Zhang. “Aligning Where
to See and What to Tell: Image Captioning with Region-Based Attention and Scene-
Specific Contexts”. In: IEEE Transactions on Pattern Analysis and Machine Intel-
ligence, 2017, pages 2321–2334 (cited on page 596).
[1086] Chang Liu, Fuchun Sun, Changhu Wang, Feng Wang, and Alan L. Yuille. “MAT:
A Multimodal Attentive Translator for Image Captioning”. In: International Joint
Conference on Artificial Intelligence, 2017, pages 4033–4039 (cited on page 596).
[1087] Joseph Redmon and Ali Farhadi. “YOLOv3: An Incremental Improvement”. In:
CoRR, 2018 (cited on page 596).
[1088] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. “YOLOv4:
Optimal Speed and Accuracy of Object Detection”. In: CoRR, 2020 (cited on page 596).
[1089] Ting Yao, Yingwei Pan, Yehao Li, and Tao Mei. “Exploring Visual Relationship for
Image Captioning”. In: Lecture Notes in Computer Science. European Conference
on Computer Vision, 2018 (cited on page 596).
[1090] Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher. “Knowing When to
Look: Adaptive Attention via a Visual Sentinel for Image Captioning”. In: IEEE
Conference on Computer Vision and Pattern Recognition, 2017, pages 3242–3250
(cited on page 597).
[1091] Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen
Gould, and Lei Zhang. “Bottom-Up and Top-Down Attention for Image Caption-
ing and Visual Question Answering”. In: IEEE Conference on Computer Vision
and Pattern Recognition, 2018, pages 6077–6086 (cited on page 597).
[1092] Jyoti Aneja, Aditya Deshpande, and Alexander G. Schwing. “Convolutional Image
Captioning”. In: IEEE Conference on Computer Vision and Pattern Recognition,
2018, pages 5561–5570 (cited on page 597).
[1093] Fang Fang, Hanli Wang, Yihao Chen, and Pengjie Tang. “Looking deeper and trans-
ferring attention for image captioning”. In: Multimedia Tools Applications, 2018,
pages 31159–31175 (cited on page 597).
[1094] Scott E. Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele,
and Honglak Lee. “Generative Adversarial Text to Image Synthesis”. In: Interna-
tional Conference on Machine Learning, 2016, pages 1060–1069 (cited on page 597).
[1095] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-
Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. “Generative Ad-
versarial Nets”. In: Conference on Neural Information Processing Systems, 2014,
pages 2672–2680 (cited on page 598).
[1096] Hajar Emami, Majid Moradi Aliabadi, Ming Dong, and Ratna Babu Chinnam.
“SPA-GAN: Spatial Attention GAN for Image-to-Image Translation”. In: IEEE
Transactions on Multimedia, 2019 (cited on page 598).
[1097] Ayushman Dash, John Cristian Borges Gamboa, Sheraz Ahmed, Marcus Liwicki,
and Muhammad Zeshan Afzal. “TAC-GAN - Text Conditioned Auxiliary Classifier
Generative Adversarial Network”. In: CoRR, 2017 (cited on page 598).
[1098] Yehoshua Bar-Hillel. “The Present Status of Automatic Translation of Languages”.
In: volume 1. Advances in computers, 1960, pages 91–163 (cited on page 599).
[1099] Sameen Maruf, Fahimeh Saleh, and Gholamreza Haffari. “A Survey on Document-
level Machine Translation: Methods and Evaluation”. In: volume abs/1912.08494.
CoRR, 2019 (cited on page 599).
[1100] Andrei Popescu-Belis. “Context in Neural Machine Translation: A Review of Mod-
els and Evaluations”. In: volume abs/1901.09115. CoRR, 2019 (cited on page 599).
[1101] Daniel Marcu, Lynn Carlson, and Maki Watanabe. “The Automatic Translation
of Discourse Structures”. In: Applied Natural Language Processing Conference,
2000, pages 9–17 (cited on page 599).
[1102] George Foster, Pierre Isabelle, and Roland Kuhn. “Translating structured docu-
ments”. In: Proceedings of AMTA, 2010 (cited on page 599).
[1103] Annie Louis and Bonnie L. Webber. “Structured and Unstructured Cache Models
for SMT Domain Adaptation”. In: Annual Conference of the European Association
for Machine Translation, 2014, pages 155–163 (cited on page 599).
[1104] Christian Hardmeier and Marcello Federico. “Modelling pronominal anaphora in
statistical machine translation”. In: International Workshop on Spoken Language
Translation, 2010, pages 283–289 (cited on pages 599, 600).
[1105] Ronan Le Nagard and Philipp Koehn. “Aiding Pronoun Translation with Co-Reference
Resolution”. In: Annual Meeting of the Association for Computational Linguistics,
2010, pages 252–261 (cited on page 599).
[1106] Ngoc-Quang Luong and Andrei Popescu-Belis. “A Contextual Language Model
to Improve Machine Translation of Pronouns by Re-ranking Translation Hypothe-
ses”. In: European Association for Machine Translation, 2016, pages 292–304
(cited on page 599).
[1107] Jörg Tiedemann. “Context adaptation in statistical machine translation using mod-
els with exponentially decaying cache”. In: Annual Meeting of the Association for
Computational Linguistics, 2010, pages 8–15 (cited on page 599).
[1108] Zhengxian Gong, Min Zhang, and Guodong Zhou. “Cache-based Document-level
Statistical Machine Translation”. In: Conference on Empirical Methods in Natural
Language Processing, 2011, pages 909–919 (cited on page 599).
[1109] Deyi Xiong, Guosheng Ben, Min Zhang, Yajuan Lv, and Qun Liu. “Modeling Lex-
ical Cohesion for Document-Level Machine Translation”. In: International Joint
Conference on Artificial Intelligence, 2013, pages 2183–2189 (cited on page 599).
[1110] Tong Xiao, Jingbo Zhu, Shujie Yao, and Hao Zhang. “Document-level consistency
verification in machine translation”. In: Machine Translation Summit. Volume 13.
2011, pages 131–138 (cited on page 599).
[1111] Thomas Meyer, Andrei Popescu-Belis, Sandrine Zufferey, and Bruno Cartoni. “Mul-
tilingual Annotation and Disambiguation of Discourse Connectives for Machine
Translation”. In: Annual Meeting of the Special Interest Group on Discourse and
Dialogue, 2011, pages 194–203 (cited on page 599).
[1112] Thomas Meyer and Andrei Popescu-Belis. “Using Sense-labeled Discourse Con-
nectives for Statistical Machine Translation”. In: Hybrid Approaches to Machine
Translation, 2012, pages 129–138 (cited on page 599).
[1113] Jörg Tiedemann and Yves Scherrer. “Neural Machine Translation with Extended
Context”. In: Proceedings of the Third Workshop on Discourse in Machine Trans-
lation, 2017, pages 82–92 (cited on pages 600, 601).
[1114] Rachel Bawden, Rico Sennrich, Alexandra Birch, and Barry Haddow. “Evaluat-
ing Discourse Phenomena in Neural Machine Translation”. In: Annual Conference
of the North American Chapter of the Association for Computational Linguistics,
2018, pages 1304–1313 (cited on pages 600–602).
[1115] Annette Rios Gonzales, Laura Mascarell, and Rico Sennrich. “Improving Word
Sense Disambiguation in Neural Machine Translation with Sense Embeddings”. In:
Annual Meeting of the Association for Computational Linguistics, 2017, pages 11–
19 (cited on pages 600–602).
[1116] Valentin Macé and Christophe Servan. “Using Whole Document Context in Neu-
ral Machine Translation”. In: The International Workshop on Spoken Language
Translation, 2019 (cited on pages 600–602).
[1117] Sébastien Jean, Stanislas Lauly, Orhan Firat, and Kyunghyun Cho. “Does Neural
Machine Translation Benefit from Larger Context?” In: volume abs/1704.05135.
CoRR, 2017 (cited on pages 600–603).
[1118] Jiacheng Zhang, Huanbo Luan, Maosong Sun, Feifei Zhai, Jingfang Xu, Min Zhang,
and Yang Liu. “Improving the Transformer Translation Model with Document-
Level Context”. In: Conference on Empirical Methods in Natural Language Pro-
cessing, 2018, pages 533–542 (cited on pages 600–603).
[1119] Sameen Maruf, André F. T. Martins, and Gholamreza Haffari. “Selective Atten-
tion for Context-aware Neural Machine Translation”. In: Annual Conference of the
North American Chapter of the Association for Computational Linguistics, 2019,
pages 3092–3102 (cited on pages 600, 601, 604).
[1120] Sameen Maruf and Gholamreza Haffari. “Document Context Neural Machine Trans-
lation with Memory Networks”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2018, pages 1275–1284 (cited on pages 600, 604).
[1121] Zhengxin Yang, Jinchao Zhang, Fandong Meng, Shuhao Gu, Yang Feng, and Jie
Zhou. “Enhancing Context Modeling with a Query-Guided Capsule Network for
Document-level Translation”. In: Conference on Empirical Methods in Natural
Language Processing, 2019, pages 1527–1537 (cited on pages 600, 604).
[1122] Zaixiang Zheng, Xiang Yue, Shujian Huang, Jiajun Chen, and Alexandra Birch.
“Towards Making the Most of Context in Neural Machine Translation”. In: Inter-
national Joint Conference on Artificial Intelligence, 2020, pages 3983–3989 (cited
on pages 600, 604).
[1123] Shaohui Kuang, Deyi Xiong, Weihua Luo, and Guodong Zhou. “Modeling Coher-
ence for Neural Machine Translation with Dynamic and Topic Caches”. In: Inter-
national Conference on Computational Linguistics, 2018, pages 596–606 (cited on
pages 600, 601, 604).
[1124] Zhaopeng Tu, Yang Liu, Shuming Shi, and Tong Zhang. “Learning to Remember
Translation History with a Continuous Cache”. In: Transactions of the Association
for Computational Linguistics, 2018, pages 407–420 (cited on pages 600, 601, 604,
605).
[1125] Eva Martínez Garcia, Carles Creus, and Cristina España-Bonet. “Context-Aware
Neural Machine Translation Decoding”. In: Proceedings of the Fourth Workshop
on Discourse in Machine Translation, 2019, pages 13–23 (cited on pages 600, 605).
[1126] Lei Yu, Laurent Sartran, Wojciech Stokowiec, Wang Ling, Lingpeng Kong, Phil
Blunsom, and Chris Dyer. “Better Document-Level Machine Translation with Bayes’
Rule”. In: volume 8. Transactions of the Association for Computational Linguis-
tics, 2020, pages 346–360 (cited on pages 600, 605).
[1127] Amane Sugiyama and Naoki Yoshinaga. “Context-aware Decoder for Neural Ma-
chine Translation using a Target-side Document-Level Language Model”. In: vol-
ume abs/2010.12827. CoRR, 2020 (cited on pages 600, 605).
[1128] Hao Xiong, Zhongjun He, Hua Wu, and Haifeng Wang. “Modeling Coherence for
Discourse Neural Machine Translation”. In: AAAI Conference on Artificial Intel-
ligence, 2019, pages 7338–7345 (cited on pages 600, 605).
[1129] Elena Voita, Rico Sennrich, and Ivan Titov. “When a Good Translation is Wrong
in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and
Lexical Cohesion”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2019, pages 1198–1212 (cited on pages 600, 605).
[1130] Elena Voita, Rico Sennrich, and Ivan Titov. “Context-Aware Monolingual Repair
for Neural Machine Translation”. In: Conference on Empirical Methods in Natural
Language Processing, 2019, pages 877–886 (cited on pages 600, 606).
[1131] Lesly Miculicich Werlen and Andrei Popescu-Belis. “Validation of an Automatic
Metric for the Accuracy of Pronoun Translation (APT)”. In: Proceedings of the
Third Workshop on Discourse in Machine Translation, 2017, pages 17–25 (cited
on page 600).
[1132] Billy Tak-Ming Wong and Chunyu Kit. “Extending Machine Translation Evalua-
tion Metrics with Lexical Cohesion to Document Level”. In: Conference on Em-
pirical Methods in Natural Language Processing, 2012, pages 1060–1068 (cited on
page 600).
[1133] Zhengxian Gong, Min Zhang, and Guodong Zhou. “Document-Level Machine
Translation Evaluation with Gist Consistency and Text Cohesion”. In: Proceedings
of the Second Workshop on Discourse in Machine Translation, 2015, pages 33–40
(cited on page 600).
[1134] Najeh Hajlaoui and Andrei Popescu-Belis. “Assessing the Accuracy of Discourse
Connective Translations: Validation of an Automatic Metric”. In: volume 7817.
Springer, 2013, pages 236–247 (cited on page 600).
[1135] Annette Rios, Mathias Müller, and Rico Sennrich. “The Word Sense Disambigua-
tion Test Suite at WMT18”. In: Conference on Empirical Methods in Natural Lan-
guage Processing, 2018, pages 588–596 (cited on page 600).
[1136] Mathias Müller, Annette Rios, Elena Voita, and Rico Sennrich. “A Large-Scale Test
Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine
Translation”. In: Conference on Empirical Methods in Natural Language Process-
ing, 2018, pages 61–72 (cited on page 600).
[1137] Ruchit Rajeshkumar Agrawal, Marco Turchi, and Matteo Negri. “Contextual han-
dling in neural machine translation: Look behind, ahead and on both sides”. In:
Annual Conference of the European Association for Machine Translation, 2018,
pages 11–20 (cited on page 601).
[1138] Longyue Wang, Zhaopeng Tu, Andy Way, and Qun Liu. “Exploiting Cross-Sentence
Context for Neural Machine Translation”. In: Conference on Empirical Methods in
Natural Language Processing, 2017, pages 2826–2831 (cited on pages 601, 604).
[1139] Xin Tan, Longyin Zhang, Deyi Xiong, and Guodong Zhou. “Hierarchical Modeling
of Global Context for Document-Level Neural Machine Translation”. In: Confer-
ence on Empirical Methods in Natural Language Processing, 2019, pages 1576–
1585 (cited on page 601).
[1140] Yves Scherrer, Jörg Tiedemann, and Sharid Loáiciga. “Analysing concatenation
approaches to document-level NMT in two different domains”. In: Proceedings
of the Fourth Workshop on Discourse in Machine Translation, 2019, pages 51–61
(cited on page 601).
[1141] Amane Sugiyama and Naoki Yoshinaga. “Data augmentation using back-translation
for context-aware neural machine translation”. In: Proceedings of the Fourth Work-
shop on Discourse in Machine Translation, 2019, pages 35–44 (cited on pages 602,
607).
[1142] Shaohui Kuang and Deyi Xiong. “Fusing Recency into Neural Machine Transla-
tion with an Inter-Sentence Gate Model”. In: International Conference on Compu-
tational Linguistics, 2018, pages 607–617 (cited on pages 602, 603).
[1143] Hayahide Yamagishi and Mamoru Komachi. “Improving Context-Aware Neural
Machine Translation with Target-Side Context”. In: International Conference of
the Pacific Association for Computational Linguistics, 2019 (cited on pages 602,
603).
[1144] Alan V. Oppenheim and Ronald W. Schafer. Discrete-Time Signal Processing. Pear-
son, 2009 (cited on page 607).
[1145] Thomas F. Quatieri. Discrete-Time Speech Signal Processing: Principles and Prac-
tice. Prentice Hall PTR, 2001 (cited on page 607).
[1146] Lawrence R. Rabiner and Biing-Hwang Juang. Fundamentals of speech recogni-
tion. Prentice Hall, 1993 (cited on page 607).
[1147] Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon. Spoken Language Process-
ing: A Guide to Theory, Algorithm and System Development. Prentice Hall PTR,
2001 (cited on page 607).
[1148] Dong Yu and Li Deng. Automatic Speech Recognition: A Deep Learning Approach.
Springer, 2015 (cited on page 607).
[1149] Mingbo Ma, Liang Huang, Hao Xiong, Renjie Zheng, Kaibo Liu, Baigong Zheng,
Chuanqiang Zhang, Zhongjun He, Hairong Liu, Xing Li, Hua Wu, and Haifeng
Wang. “STACL: Simultaneous Translation with Implicit Anticipation and Control-
lable Latency using Prefix-to-Prefix Framework”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2019, pages 3025–3036 (cited on page 607).
[1150] Renjie Zheng, Mingbo Ma, Baigong Zheng, and Liang Huang. “Speculative Beam
Search for Simultaneous Translation”. In: Conference on Empirical Methods in
Natural Language Processing, 2019, pages 1395–1402 (cited on page 607).
[1151] Fahim Dalvi, Nadir Durrani, Hassan Sajjad, and Stephan Vogel. “Incremental De-
coding and Training Methods for Simultaneous Translation in Neural Machine
Translation”. In: Annual Conference of the North American Chapter of the Asso-
ciation for Computational Linguistics, 2018, pages 493–499 (cited on page 607).
[1152] Kyunghyun Cho and Masha Esipova. “Can neural machine translation do simulta-
neous translation?” In: CoRR, 2016 (cited on page 607).
[1153] Jiatao Gu, Graham Neubig, Kyunghyun Cho, and Victor O. K. Li. “Learning to
Translate in Real-time with Neural Machine Translation”. In: Annual Conference
of the European Association for Machine Translation, 2017, pages 1053–1062
(cited on page 607).
[1154] Alvin Grissom II, He He, Jordan L. Boyd-Graber, John Morgan, and Hal Daumé
III. “Don’t Until the Final Verb Wait: Reinforcement Learning for Simultaneous
Machine Translation”. In: Conference on Empirical Methods in Natural Language
Processing, 2014, pages 1342–1352 (cited on page 607).
[1155] Baigong Zheng, Kaibo Liu, Renjie Zheng, Mingbo Ma, Hairong Liu, and Liang
Huang. “Simultaneous Translation Policies: From Fixed to Adaptive”. In: Annual
Meeting of the Association for Computational Linguistics, 2020, pages 2847–2853
(cited on page 607).
[1156] Baigong Zheng, Renjie Zheng, Mingbo Ma, and Liang Huang. “Simpler and Faster
Learning of Adaptive Policies for Simultaneous Translation”. In: Conference on
Empirical Methods in Natural Language Processing, 2019, pages 1349–1354 (cited
on page 607).
[1157] Baigong Zheng, Renjie Zheng, Mingbo Ma, and Liang Huang. “Simultaneous Trans-
lation with Flexible Policy via Restricted Imitation Learning”. In: Annual Meeting
of the Association for Computational Linguistics, 2019, pages 5816–5822 (cited
on page 607).
[1158] Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, Chung-Cheng Chiu, Semih
Yavuz, Ruoming Pang, Wei Li, and Colin Raffel. “Monotonic Infinite Lookback
Attention for Simultaneous Machine Translation”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2019, pages 1313–1323 (cited on page 607).
[1159] Yunsu Kim, Duc Thanh Tran, and Hermann Ney. “When and Why is Document-
level Context Useful in Neural Machine Translation?” In: Proceedings of the Fourth
Workshop on Discourse in Machine Translation, 2019, pages 24–34 (cited on page 607).
[1160] Sébastien Jean and Kyunghyun Cho. “Context-Aware Learning for Neural Machine
Translation”. In: volume abs/1903.04715. CoRR, 2019 (cited on page 607).
[1161] Danielle Saunders, Felix Stahlberg, and Bill Byrne. “Using Context in Neural Ma-
chine Translation Training Objectives”. In: Annual Meeting of the Association for
Computational Linguistics, 2020, pages 7764–7770 (cited on page 607).
[1162] Dario Stojanovski and Alexander M. Fraser. “Improving Anaphora Resolution in
Neural Machine Translation Using Curriculum Learning”. In: Annual Conference
of the European Association for Machine Translation, 2019, pages 140–150 (cited
on page 607).
[1163] Tejas Gokhale, Pratyay Banerjee, Chitta Baral, and Yezhou Yang. “MUTANT: A
Training Paradigm for Out-of-Distribution Generalization in Visual Question An-
swering”. In: Conference on Empirical Methods in Natural Language Processing,
2020, pages 878–892 (cited on page 607).
[1164] Ruixue Tang, Chao Ma, Wei Emma Zhang, Qi Wu, and Xiaokang Yang. “Seman-
tic Equivalent Adversarial Data Augmentation for Visual Question Answering”.
In: European Conference on Computer Vision, 2020, pages 437–453 (cited on
page 607).
[1165] Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, and Jian-
feng Gao. “Unified Vision-Language Pre-Training for Image Captioning and VQA”.
In: AAAI Conference on Artificial Intelligence, 2020, pages 13041–13049 (cited
on page 607).
[1166] Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai.
“VL-BERT: Pre-training of Generic Visual-Linguistic Representations”. In: Inter-
national Conference on Learning Representations, 2020 (cited on page 607).
[1167] Liangyou Li, Xin Jiang, and Qun Liu. “Pretrained Language Models for Document-
Level Neural Machine Translation”. In: volume abs/1911.03110. CoRR, 2019 (cited
on page 607).
[1168] Shuhao Gu and Yang Feng. “Investigating Catastrophic Forgetting During Con-
tinual Training for Neural Machine Translation”. In: International Conference on
Computational Linguistics, 2020, pages 4315–4326 (cited on page 611).
[1169] Chenhui Chu, Raj Dabre, and Sadao Kurohashi. “An Empirical Comparison of
Simple Domain Adaptation Methods for Neural Machine Translation”. In: vol-
ume abs/1701.03214. CoRR, 2017 (cited on page 611).
[1170] Huda Khayrallah, Brian Thompson, Kevin Duh, and Philipp Koehn. “Regularized
Training Objective for Continued Training for Domain Adaptation in Neural Ma-
chine Translation”. In: Annual Meeting of the Association for Computational Lin-
guistics, 2018, pages 36–44 (cited on page 611).
[1171] Brian Thompson, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, and Philipp
Koehn. “Overcoming Catastrophic Forgetting During Domain Adaptation of Neu-
ral Machine Translation”. In: Annual Meeting of the Association for Computational
Linguistics, 2019, pages 2062–2068 (cited on page 611).
[1172] Joern Wuebker, Spence Green, John DeNero, Sasa Hasan, and Minh-Thang Lu-
ong. “Models and Inference for Prefix-Constrained Machine Translation”. In: An-
nual Meeting of the Association for Computational Linguistics, 2016 (cited on
page 614).
[1173] Franz Josef Och, Richard Zens, and Hermann Ney. “Efficient Search for Interactive
Statistical Machine Translation”. In: Conference of the European Chapter of the As-
sociation for Computational Linguistics, 2003, pages 387–393 (cited on page 614).
[1174] Sergio Barrachina, Oliver Bender, Francisco Casacuberta, Jorge Civera, Elsa Cubel,
Shahram Khadivi, Antonio L. Lagarda, Hermann Ney, Jesús Tomás, Enrique Vidal,
and Juan Miguel Vilar. “Statistical Approaches to Computer-Assisted Translation”.
In: Computational Linguistics, 2009, pages 3–28 (cited on page 614).
[1175] Miguel Domingo, Álvaro Peris, and Francisco Casacuberta. “Segment-based interactive-
predictive machine translation”. In: Machine Translation, 2017, pages 163–185
(cited on page 614).
[1176] Tsz Kin Lam, Julia Kreutzer, and Stefan Riezler. “A Reinforcement Learning Ap-
proach to Interactive-Predictive Neural Machine Translation”. In: CoRR, 2018
(cited on page 614).
[1177] Miguel Domingo, Mercedes García-Martínez, Amando Estela, Laurent Bié, Alexan-
dre Helle, Álvaro Peris, Francisco Casacuberta, and Manuel Herranz. “Demonstra-
tion of a Neural Machine Translation System with Online Learning for Transla-
tors”. In: Annual Meeting of the Association for Computational Linguistics, 2019,
pages 70–74 (cited on page 614).
[1178] Kun Wang, Chengqing Zong, and Keh-Yih Su. “Integrating Translation Memory
into Phrase-Based Machine Translation during Decoding”. In: Annual Meeting
of the Association for Computational Linguistics, 2013, pages 11–21 (cited on
page 615).
[1179] Mengzhou Xia, Guoping Huang, Lemao Liu, and Shuming Shi. “Graph Based
Translation Memory for Neural Machine Translation”. In: AAAI Conference on
Artificial Intelligence, 2019, pages 7297–7304 (cited on page 615).
[1180] Chris Hokamp and Qun Liu. “Lexically Constrained Decoding for Sequence Gen-
eration Using Grid Beam Search”. In: Annual Meeting of the Association for Com-
putational Linguistics, 2017, pages 1535–1546 (cited on page 616).
[1181] Matt Post and David Vilar. “Fast Lexically Constrained Decoding with Dynamic
Beam Allocation for Neural Machine Translation”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2018, pages 1314–1324 (cited on page 616).
[1182] Rajen Chatterjee, Matteo Negri, Marco Turchi, Marcello Federico, Lucia Specia,
and Frédéric Blain. “Guiding Neural Machine Translation Decoding with External
Knowledge”. In: Annual Meeting of the Association for Computational Linguis-
tics, 2017, pages 157–168 (cited on page 616).
[1183] Eva Hasler, Adrià de Gispert, Gonzalo Iglesias, and Bill Byrne. “Neural Machine
Translation Decoding with Terminology Constraints”. In: Annual Meeting of the
Association for Computational Linguistics, 2018, pages 506–512 (cited on page 616).
[1184] Kai Song, Yue Zhang, Heng Yu, Weihua Luo, Kun Wang, and Min Zhang. “Code-
Switching for Enhancing NMT with Pre-Specified Translation”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2019, pages 449–459 (cited
on page 616).
[1185] Georgiana Dinu, Prashant Mathur, Marcello Federico, and Yaser Al-Onaizan. “Train-
ing Neural Machine Translation to Apply Terminology Constraints”. In: Annual
Meeting of the Association for Computational Linguistics, 2019, pages 3063–3068
(cited on page 616).
[1186] Tao Wang, Shaohui Kuang, Deyi Xiong, and António Branco. “Merging Exter-
nal Bilingual Pairs into Neural Machine Translation”. In: volume abs/1912.00567.
CoRR, 2019 (cited on page 616).
[1187] Guanhua Chen, Yun Chen, Yong Wang, and Victor O. K. Li. “Lexical-Constraint-
Aware Neural Machine Translation via Data Augmentation”. In: International Joint
Conference on Artificial Intelligence, 2020, pages 3587–3593 (cited on page 616).
[1188] Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. “Adaptive
Neural Networks for Fast Test-Time Prediction”. In: volume abs/1702.07811. CoRR,
2017 (cited on page 617).
[1189] Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and
Kilian Q. Weinberger. “Multi-Scale Dense Networks for Resource Efficient Image
Classification”. In: International Conference on Learning Representations, 2018
(cited on page 617).
[1190] Jiarui Fang, Yang Yu, Chengduo Zhao, and Jie Zhou. “TurboTransformers: An Ef-
ficient GPU Serving System For Transformer Models”. In: CoRR, 2020 (cited on
page 619).
[1191] Tong Xiao, Jingbo Zhu, Hao Zhang, and Qiang Li. “NiuTrans: An Open Source
Toolkit for Phrase-based and Syntax-based Machine Translation”. In: Annual Meet-
ing of the Association for Computational Linguistics, 2012, pages 19–24 (cited on
page 631).
[1192] Zhifei Li, Chris Callison-Burch, Chris Dyer, Sanjeev Khudanpur, Lane Schwartz,
Wren N. G. Thornton, Jonathan Weese, and Omar Zaidan. “Joshua: An Open Source
Toolkit for Parsing-Based Machine Translation”. In: Annual Meeting of the Asso-
ciation for Computational Linguistics, 2009, pages 135–139 (cited on page 631).
[1193] Gonzalo Iglesias, Adrià de Gispert, Eduardo Rodríguez Banga, and William J.
Byrne. “Hierarchical Phrase-Based Translation with Weighted Finite State Trans-
ducers”. In: Annual Meeting of the Association for Computational Linguistics,
2009, pages 433–441 (cited on page 632).
[1194] Chris Dyer, Adam Lopez, Juri Ganitkevitch, Jonathan Weese, Ferhan Türe, Phil
Blunsom, Hendra Setiawan, Vladimir Eidelman, and Philip Resnik. “cdec: A De-
coder, Alignment, and Learning Framework for Finite-State and Context-Free Trans-
lation Models”. In: Annual Meeting of the Association for Computational Linguis-
tics, 2010, pages 7–12 (cited on page 632).
[1195] Daniel M. Cer, Michel Galley, Daniel Jurafsky, and Christopher D. Manning. “Phrasal:
A Statistical Machine Translation Toolkit for Exploring New Model Features”. In:
Annual Meeting of the Association for Computational Linguistics, 2010, pages 9–
12 (cited on page 632).
[1196] David Vilar, Daniel Stein, Matthias Huck, and Hermann Ney. “Jane: an advanced
freely available hierarchical machine translation toolkit”. In: volume 26. 3. Ma-
chine Translation, 2012, pages 197–216 (cited on page 632).
[1197] Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermüller, Dzmitry
Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexan-
der Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bis-
son, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski,
Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre Luc Carrier,
Kyunghyun Cho, Jan Chorowski, Paul F. Christiano, Tim Cooijmans, Marc-Alexandre
Côté, Myriam Côté, Aaron C. Courville, Yann N. Dauphin, Olivier Delalleau, et al.
“Theano: A Python framework for fast computation of mathematical expressions”.
In: volume abs/1605.02688. CoRR, 2016 (cited on page 633).
[1198] Barret Zoph, Ashish Vaswani, Jonathan May, and Kevin Knight. “Simple, Fast
Noise-Contrastive Estimation for Large RNN Vocabularies”. In: Annual Meeting
of the Association for Computational Linguistics, 2016, pages 1217–1222 (cited
on page 633).
[1199] Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huan-
Bo Luan, and Yang Liu. “THUMT: An Open Source Toolkit for Neural Machine
Translation”. In: volume abs/1706.06415. CoRR, 2017 (cited on page 633).
[1200] Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang,
Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri
Aji, Nikolay Bogoychev, André F. T. Martins, and Alexandra Birch. “Marian: Fast
Neural Machine Translation in C++”. In: Annual Meeting of the Association for
Computational Linguistics, 2018, pages 116–121 (cited on page 634).
[1201] Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov,
Ann Clifton, and Matt Post. “Sockeye: A Toolkit for Neural Machine Translation”.
In: volume abs/1712.05690. CoRR, 2017 (cited on page 634).
[1202] Xiaolin Wang, Masao Utiyama, and Eiichiro Sumita. “CytonMT: an Efficient Neu-
ral Machine Translation Open-source Toolkit Implemented in C++”. In: Annual
Meeting of the Association for Computational Linguistics, 2018, pages 133–138
(cited on page 634).
[1203] Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Carl Case, and
Paulius Micikevicius. “OpenSeq2Seq: extensible toolkit for distributed and mixed
precision training of sequence-to-sequence models”. In: volume abs/1805.10387.
CoRR, 2018 (cited on page 634).
[1204] Ozan Caglayan, Mercedes García-Martínez, Adrien Bardet, Walid Aransa, Fethi
Bougares, and Loïc Barrault. “NMTPY: A Flexible Toolkit for Advanced Neural
Machine Translation Systems”. In: volume 109. The Prague Bulletin of Mathemat-
ical Linguistics, 2017, pages 15–28 (cited on page 634).