Analysis Methods in Neural Language Processing: A Survey
Abstract

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.

1 Introduction

The rise of deep learning has transformed the field of natural language processing (NLP) in recent years. Models based on neural networks have obtained impressive improvements in various tasks, including language modeling (Mikolov et al., 2010; Jozefowicz et al., 2016), syntactic parsing (Kiperwasser and Goldberg, 2016), machine translation (MT) (Bahdanau et al., 2014; Sutskever et al., 2014), and many other tasks; see Goldberg (2017) for example success stories.

This progress has been accompanied by a myriad of new neural network architectures. In many cases, traditional feature-rich systems are being replaced by end-to-end neural networks that aim to map input text to some output prediction. As end-to-end systems are gaining prevalence, one may point to two trends. First, some push back against the abandonment of linguistic knowledge and call for incorporating it inside the networks in different ways.[1] Others strive to better understand how neural language processing models work. This theme of analyzing neural networks has connections to the broader work on interpretability in machine learning, along with specific characteristics of the NLP field.

[1] See, for instance, Noah Smith’s invited talk at ACL 2017: vimeo.com/234958746. See also a recent debate on this matter by Chris Manning and Yann LeCun: www.youtube.com/watch?v=fKk9KhGRBdI. (Videos accessed on December 11, 2018.)

Why should we analyze our neural NLP models? To some extent, this question falls into the larger question of interpretability in machine learning, which has been the subject of much debate in recent years.[2] Arguments in favor of interpretability in machine learning usually mention goals like accountability, trust, fairness, safety, and reliability (Doshi-Velez and Kim, 2017; Lipton, 2016). Arguments against typically stress performance as the most important desideratum. All these arguments naturally apply to machine learning applications in NLP.

[2] See, for example, the NIPS 2017 debate: www.youtube.com/watch?v=2hW05ZfsUUo. (Accessed on December 11, 2018.)

In the context of NLP, this question needs to be understood in light of earlier NLP work, often referred to as feature-rich or feature-engineered systems. In some of these systems, features are more easily understood by humans – they can be morphological properties, lexical classes, syntactic categories, semantic relations, etc. In theory, one could observe the importance assigned by statistical NLP models to such features in order to gain a better understanding of the model.[3] In contrast, it is more difficult to understand what happens in an end-to-end neural network model that takes input (say, word embeddings) and generates an output (say, a sentence classification). Much of the analysis work thus aims to understand how linguistic concepts that were common as features in NLP systems are captured in neural networks.

[3] Nevertheless, one could question how feasible such an analysis is; consider for example interpreting support vectors in high-dimensional support vector machines (SVMs).
As the analysis of neural networks for language is becoming more and more prevalent, neural networks in various NLP tasks are being analyzed; different network architectures and components are being compared; and a variety of new analysis methods are being developed. This survey aims to review and summarize this body of work, highlight current trends, and point to existing lacunae. It organizes the literature into several themes.

Section 2 reviews work that targets a fundamental question: what kind of linguistic information is captured in neural networks? We also point to limitations in current methods for answering this question. Section 3 discusses visualization methods, and emphasizes the difficulty in evaluating visualization work. In Section 4 we discuss the compilation of challenge sets, or test suites, for fine-grained evaluation, a methodology that has old roots in NLP. Section 5 deals with the generation and use of adversarial examples to probe weaknesses of neural networks. We point to unique characteristics of dealing with text as a discrete input and how different studies handle them. Section 6 summarizes work on explaining model predictions, an important goal of interpretability research. This is a relatively under-explored area, and we call for more work in this direction. Section 7 mentions a few other methods that do not fall neatly into one of the above themes. In the conclusion, we summarize the main gaps and potential research directions for the field.

The paper is accompanied by online supplementary materials that contain detailed references for studies corresponding to Sections 2, 4, and 5 (Tables SM1, SM2, and SM3, respectively), available at boknilev.github.io/nlp-analysis-methods.

Before proceeding, we briefly mention some earlier work of a similar spirit.

A historical note. Reviewing the vast literature on neural networks for language is beyond our scope.[4] However, we mention here a few representative studies that focused on analyzing such networks, in order to illustrate how recent trends have roots that go back to before the recent deep learning revival.

[4] For instance, a neural network that learns distributed representations of words was developed already in Miikkulainen and Dyer (1991). See Goodfellow et al. (2016, chapter 12.4) for references to other important milestones.

Rumelhart and McClelland (1986) built a feed-forward neural network for learning the English past tense and analyzed its performance on a variety of examples and conditions. They were especially concerned with the performance over the course of training, as their goal was to model the past form acquisition in children. They also analyzed a scaled-down version having 8 input units and 8 output units, which allowed them to describe it exhaustively and examine how certain rules manifest in network weights.

In his seminal work on recurrent neural networks (RNNs), Elman trained networks on synthetic sentences in a language prediction task (Elman, 1989, 1990, 1991). Through extensive analyses, he showed how networks discover the notion of a word when predicting characters; capture syntactic structures like number agreement; and acquire word representations that reflect lexical and syntactic categories. Similar analyses were later applied to other networks and tasks (Harris, 1990; Niklasson and Linåker, 2000; Pollack, 1990; Frank et al., 2013).

While Elman’s work was limited in some ways, such as evaluating generalization or various linguistic phenomena—as Elman himself recognized (Elman, 1989)—it introduced methods that are still relevant today: from visualizing network activations in time, through clustering words by hidden state activations, to projecting representations to dimensions that emerge as capturing properties like sentence number or verb valency. The sections on visualization (Section 3) and identifying linguistic information (Section 2) contain many examples for these kinds of analysis.

2 What linguistic information is captured in neural networks?

Neural network models in NLP are typically trained in an end-to-end manner on input-output pairs, without explicitly encoding linguistic features. Thus a primary question is the following: what linguistic information is captured in neural networks? When examining answers to this question, it is convenient to consider three dimensions: which methods are used for conducting the analysis, what kind of linguistic information is sought, and which objects in the neural network are being investigated. Table SM1 (in the supplementary materials) categorizes relevant analysis work according to these criteria. In the next sub-sections, we discuss trends in analysis work along these lines, followed by a discussion of limitations of current approaches.
2.1 Methods

The most common approach for associating neural network components with linguistic properties is to predict such properties from activations of the neural network. Typically, in this approach a neural network model is trained on some task (say, MT) and its weights are frozen. Then, the trained model is used for generating feature representations for another task by running it on a corpus with linguistic annotations and recording the representations (say, hidden state activations). Another classifier is then used for predicting the property of interest (say, part-of-speech (POS) tags). The performance of this classifier is used for evaluating the quality of the generated representations, and by proxy that of the original model. This kind of approach has been used in numerous papers in recent years; see Table SM1 for references.[5] It is referred to by various names, including “auxiliary prediction tasks” (Adi et al., 2017b), “diagnostic classifiers” (Veldhoen et al., 2016), and “probing tasks” (Conneau et al., 2018).

[5] A similar method has been used to analyze hierarchical structure in neural networks trained on arithmetic expressions (Veldhoen et al., 2016; Hupkes et al., 2018).

As an example of this approach, let us walk through an application to analyzing syntax in neural machine translation (NMT) by Shi et al. (2016b). In this work, two NMT models were trained on standard parallel data – English→French and English→German. The trained models (specifically, the encoders) were run on an annotated corpus and their hidden states were used for training a logistic regression classifier that predicts different syntactic properties. The authors concluded that the NMT encoders learn significant syntactic information at both word-level and sentence-level. They also compared representations at different encoding layers and found that “local features are somehow preserved in the lower layer whereas more global, abstract information tends to be stored in the upper layer.” These results demonstrate the kind of insights that the classification analysis may lead to, especially when comparing different models or model components.
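To make the classification methodology concrete, the following minimal sketch (ours, not taken from any of the cited studies) trains a diagnostic classifier on frozen representations. Random vectors and random POS tags stand in for activations extracted from a real, frozen encoder and for gold linguistic annotations.

```python
# A minimal sketch of the classification ("probing") approach described above.
# A real study would record activations from a frozen, pre-trained encoder run
# over an annotated corpus; here random vectors stand in for those activations,
# and random POS tags stand in for the linguistic annotation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
num_tokens, hidden_dim = 5000, 512
pos_tags = ["NOUN", "VERB", "ADJ", "DET"]

# Stand-ins: one hidden-state vector per token, plus its gold POS tag.
activations = rng.normal(size=(num_tokens, hidden_dim))
labels = rng.choice(pos_tags, size=num_tokens)

# Train a simple diagnostic classifier on the frozen representations.
X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# The probe's accuracy is taken as a proxy for how much POS information the
# representations encode (here it is at chance level, since the data are random).
print("probing accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```

With representations from an actual model, the resulting accuracy would typically be compared against simple baselines (e.g., a majority-class predictor or random embeddings) before drawing conclusions.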
Other methods for finding correspondences between parts of the neural network and certain properties include counting how often attention weights agree with a linguistic property like anaphora resolution (Voita et al., 2018) or directly computing correlations between neural network activations and some property, for example, correlating RNN state activations with depth in a syntactic tree (Qian et al., 2016a) or with Mel-frequency cepstral coefficient (MFCC) acoustic features (Wu and King, 2016). Such correspondence may also be computed indirectly. For instance, Alishahi et al. (2017) defined an ABX discrimination task to evaluate how a neural model of speech (grounded in vision) encoded phonology. Given phoneme representations from different layers in their model, and three phonemes, A, B, and X, they compared whether the model representation for X is closer to A or B. This discrimination task enabled them to draw conclusions about which layers encode phonology better, observing that lower layers generally encode more phonological information.
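As a rough illustration of such an ABX test, the sketch below (ours) checks whether a representation X lies closer to A or to B; cosine distance is assumed purely for illustration, and random vectors stand in for a speech model's layer representations.

```python
# An illustrative ABX discrimination check: is the representation of X closer
# to A or to B?  Cosine distance is an assumption made here for illustration;
# the vectors below are random stand-ins for phoneme representations taken
# from one layer of a speech model.
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_correct(rep_a, rep_b, rep_x):
    """Return True if X is closer to A than to B (A being the intended match)."""
    return cosine_distance(rep_x, rep_a) < cosine_distance(rep_x, rep_b)

rng = np.random.default_rng(1)
layer_reps = rng.normal(size=(3, 256))   # stand-ins for phonemes A, B, X
a, b, x = layer_reps
print("X matched to A:", abx_correct(a, b, x))

# Averaging abx_correct over many (A, B, X) triplets drawn from a given layer
# yields an ABX accuracy for that layer, which can then be compared across layers.
```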
2.2 Linguistic phenomena

Different kinds of linguistic information have been analyzed, ranging from basic properties like sentence length, word position, word presence, or simple word order, to morphological, syntactic, and semantic information. Phonetic/phonemic information, speaker information, and style and accent information have been studied in neural network models for speech, or in joint audio-visual models. See Table SM1 for references.

While it is difficult to synthesize a holistic picture from this diverse body of work, it appears that neural networks are able to learn a substantial amount of information on various linguistic phenomena. These models are especially successful at capturing frequent properties, while some rare properties are more difficult to learn. Linzen et al. (2016), for instance, found that long short-term memory (LSTM) language models are able to capture subject-verb agreement in many common cases, while direct supervision is required for solving harder cases.

Another theme that emerges in several studies is the hierarchical nature of the learned representations. We have already mentioned such findings regarding NMT (Shi et al., 2016b) and a visually grounded speech model (Alishahi et al., 2017). Hierarchical representations of syntax were also reported to emerge in other RNN models (Blevins et al., 2018).

Finally, a couple of papers discovered that models trained with latent trees perform better on natural language inference (NLI) (Williams et al., 2018; Maillard and Clark, 2018) than ones trained with linguistically-annotated trees. Moreover, the trees in these models do not resemble syntactic trees corresponding to known linguistic theories, which casts doubt on the importance of syntax-learning in the underlying neural network.[6]

[6] Others found that even simple binary trees may work well in MT (Wang et al., 2018b) and sentence classification (Chen et al., 2015).

2.3 Neural network components

In terms of the object of study, various neural network components were investigated, including word embeddings, RNN hidden states or gate activations, sentence embeddings, and attention weights in sequence-to-sequence (seq2seq) models. Generally less work has analyzed convolutional neural networks (CNNs) in NLP, but see Jacovi et al. (2018) for a recent exception. In speech processing, researchers have analyzed layers in deep neural networks for speech recognition and different speaker embeddings. Some analysis has also been devoted to joint language-vision or audio-vision models, or to similarities between word embeddings and convolutional image representations. Table SM1 provides detailed references.

2.4 Limitations

The classification approach may find that a certain amount of linguistic information is captured in the neural network. However, this does not necessarily mean that the information is used by the network. For example, Vanmassenhove et al. (2017) investigated aspect in NMT (and in phrase-based statistical MT). They trained a classifier on NMT sentence encoding vectors and found that they can accurately predict tense about 90% of the time. However, when evaluating the output translations, they found them to have the correct tense only 79% of the time. They interpreted this result to mean that “part of the aspectual information is lost during decoding”. Relatedly, Cífka and Bojar (2018) compared the performance of various NMT models in terms of translation quality (BLEU) and representation quality (classification tasks). They found a negative correlation between the two, suggesting that high-quality systems may not be learning certain sentence meanings. In contrast, Artetxe et al. (2018) showed that word embeddings contain divergent linguistic information, which can be uncovered by applying a linear transformation on the learned embeddings. Their results suggest an alternative explanation, showing that “embedding models are able to encode divergent linguistic information but have limits on how this information is surfaced.”

From a methodological point of view, most of the relevant analysis work is concerned with correlation: how correlated are neural network components with linguistic properties? What may be lacking is a measure of causation: how does the encoding of linguistic properties affect the system output? Giulianelli et al. (2018) make some headway on this question. They predicted number agreement from RNN hidden states and gates at different time steps. They then intervened in how the model processes the sentence by changing a hidden activation based on the difference between the prediction and the correct label. This improved agreement prediction accuracy, and the effect persisted over the course of the sentence, indicating that this information has an effect on the model. However, they did not report the effect on overall model quality, for example by measuring perplexity. Methods from causal inference may shed new light on some of these questions.
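The following rough sketch (ours, and not the exact procedure of the cited work) conveys the flavor of such an intervention: a logistic diagnostic classifier predicts number agreement from a hidden state, and the hidden state is then nudged by one gradient step in the direction that reduces the classifier's error.

```python
# A rough sketch of an intervention in the spirit of the approach described
# above.  A (pre-trained) logistic diagnostic classifier predicts number
# agreement from an RNN hidden state, and the hidden state is adjusted based
# on the difference between prediction and correct label.  All vectors are
# random stand-ins; this is an illustration, not the authors' exact method.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
hidden_dim = 64
w, b = rng.normal(size=hidden_dim), 0.0   # diagnostic classifier parameters
h = rng.normal(size=hidden_dim)           # hidden state at some time step
y = 1.0                                   # correct agreement label (e.g., plural)

pred = sigmoid(w @ h + b)
grad_h = (pred - y) * w                   # gradient of cross-entropy loss w.r.t. h
eta = 0.5
h_fixed = h - eta * grad_h                # intervened hidden state

print("prediction before:", round(float(pred), 3),
      "after:", round(float(sigmoid(w @ h_fixed + b)), 3))
```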
Finally, the predictor for the auxiliary task is usually a simple classifier, such as logistic regression. A few studies compared different classifiers and found that deeper classifiers lead to overall better results, but do not alter the respective trends when comparing different models or components (Qian et al., 2016b; Belinkov, 2018). Interestingly, Conneau et al. (2018) found that tasks requiring more nuanced linguistic knowledge (e.g., tree depth, coordination inversion) gain the most from using a deeper classifier. However, the approach is usually taken for granted; given its prevalence, better theoretical or empirical foundations appear to be in order.

3 Visualization

Visualization is a valuable tool for analyzing neural networks in the language domain and beyond. Early work visualized hidden unit activations in RNNs trained on an artificial language modeling task, and observed how they correspond to certain grammatical relations such as agreement (Elman, 1991). Much recent work has focused on visualizing activations on specific examples in modern neural networks for language (Karpathy et al.,
Figure 1: A heatmap visualizing neuron activations. In this case, the activations capture position in the sentence.
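A minimal sketch (ours) of how a heatmap like Figure 1 can be produced is shown below; synthetic activations stand in for values recorded from a trained network, with one neuron constructed to track position in the sentence.

```python
# A minimal sketch of the kind of activation heatmap shown in Figure 1.
# Rows are neurons, columns are token positions; the data here are synthetic,
# whereas a real plot would use activations recorded from a trained network.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
num_neurons = 10

# Synthetic activations; one "position-tracking" neuron increases along the sentence.
activations = rng.normal(scale=0.3, size=(num_neurons, len(tokens)))
activations[0] = np.linspace(-1.0, 1.0, len(tokens))

fig, ax = plt.subplots(figsize=(8, 3))
im = ax.imshow(activations, aspect="auto", cmap="coolwarm")
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens)
ax.set_yticks(range(num_neurons))
ax.set_yticklabels([f"n{i}" for i in range(num_neurons)])
ax.set_xlabel("token position")
ax.set_ylabel("neuron")
fig.colorbar(im, ax=ax)
plt.tight_layout()
plt.show()
```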
References

Ali Elkahky, Kellie Webster, Daniel Andor, and Emily Pitler. 2018. A Challenge Set and Methods for Noun-Verb Ambiguity. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2562–2572. Association for Computational Linguistics.
Zied Elloumi, Laurent Besacier, Olivier Galibert, and Benjamin Lecouteux. 2018. Analyzing Learned Representations of a Deep ASR Performance Prediction Model. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 9–15. Association for Computational Linguistics.
Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, and Jordan Boyd-Graber. 2018. Pathologies of Neural Models Make Interpretations Difficult. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3719–3728. Association for Computational Linguistics.
Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. Placing Search in Context: The Concept Revisited. ACM Transactions on Information Systems, 20(1):116–131.
Robert Frank, Donald Mathis, and William Badecker. 2013. The Acquisition of Anaphora by Simple Recurrent Networks. Language Acquisition, 20(3):181–227.
Cynthia Freeman, Jonathan Merriman, Abhinav Aggarwal, Ian Beaver, and Abdullah Mueen. 2018. Paying Attention to Attention: Highlighting Influential Samples in Sequential Analysis. arXiv preprint arXiv:1808.02113v1.
Alona Fyshe, Leila Wehbe, Partha P. Talukdar, Brian Murphy, and Tom M. Mitchell. 2015. A Compositional and Interpretable Semantic Space. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 32–41. Association for Computational Linguistics.
David Gaddy, Mitchell Stern, and Dan Klein. 2018. What’s Going On in Neural Constituency Parsers? An Analysis. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 999–1010. Association for Computational Linguistics.
J. Ganesh, Manish Gupta, and Vasudeva Varma. 2017. Interpretation of Semantic Tweet Representations. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ASONAM ’17, pages 95–102, New York, NY, USA. ACM.
Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. 2018. Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. arXiv preprint arXiv:1801.04354v5.
Lieke Gelderloos and Grzegorz Chrupała. 2016. From phonemes to images: Levels of representation in a recurrent neural model of visually-grounded language learning. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1309–1319, Osaka, Japan. The COLING 2016 Organizing Committee.
Felix A. Gers and Jürgen Schmidhuber. 2001. LSTM Recurrent Networks Learn Simple Context-Free and Context-Sensitive Languages. IEEE Transactions on Neural Networks, 12(6):1333–1340.
Daniela Gerz, Ivan Vulić, Felix Hill, Roi Reichart, and Anna Korhonen. 2016. SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2173–2182. Association for Computational Linguistics.
Hamidreza Ghader and Christof Monz. 2017. What does Attention in Neural Machine Translation Pay Attention to? In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 30–39. Asian Federation of Natural Language Processing.
Reza Ghaeini, Xiaoli Fern, and Prasad Tadepalli. 2018. Interpreting Recurrent and Attention-Based Neural Models: A Case Study on Natural Language Inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4952–4957. Association for Computational Linguistics.
Mario Giulianelli, Jack Harding, Florian Mohnert, Dieuwke Hupkes, and Willem Zuidema. 2018. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 240–248. Association for Computational Linguistics.
Max Glockner, Vered Shwartz, and Yoav Goldberg. 2018. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 650–655. Association for Computational Linguistics.
Fréderic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve, and Thomas Demeester. 2018. Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules? In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3275–3284. Association for Computational Linguistics.
Yoav Goldberg. 2017. Neural Network Methods for Natural Language Processing, volume 10 of Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems, pages 2672–2680.
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations (ICLR).
Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, and Marco Baroni. 2018. Colorless Green Recurrent Networks Dream Hierarchically. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1195–1205. Association for Computational Linguistics.
Abhijeet Gupta, Gemma Boleda, Marco Baroni, and Sebastian Padó. 2015. Distributional vectors encode referential attributes. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 12–21. Association for Computational Linguistics.
Pankaj Gupta and Hinrich Schütze. 2018. LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 154–164. Association for Computational Linguistics.
Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. 2018. Annotation Artifacts in Natural Language Inference Data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112. Association for Computational Linguistics.
Catherine L. Harris. 1990. Connectionism and Cognitive Linguistics. Connection Science, 2(1-2):7–33.
David Harwath and James Glass. 2017. Learning Word-Like Units from Joint Audio-Visual Analysis. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 506–517. Association for Computational Linguistics.
Georg Heigold, Günter Neumann, and Josef van Genabith. 2018. How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse? In Proceedings of the 13th Conference of The Association for Machine Translation in the Americas (Volume 1: Research Track), pages 68–79.
Felix Hill, Roi Reichart, and Anna Korhonen. 2015. SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation. Computational Linguistics, 41(4):665–695.
Dieuwke Hupkes, Sara Veldhoen, and Willem Zuidema. 2018. Visualisation and ’diagnostic classifiers’ reveal how recurrent and recursive neural networks process hierarchical structure. Journal of Artificial Intelligence Research, 61:907–926.
Pierre Isabelle, Colin Cherry, and George Foster. 2017. A Challenge Set Approach to Evaluating Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2486–2496. Association for Computational Linguistics.
Pierre Isabelle and Roland Kuhn. 2018. A Challenge Set for French–>English Machine Translation. arXiv preprint arXiv:1806.02725v2.
Hitoshi Isahara. 1995. JEIDA’s test-sets for quality evaluation of MT systems—technical evaluation from the developer’s point of view. In Proceedings of MT Summit V.
Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. 2018. Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1875–1885. Association for Computational Linguistics.
Alon Jacovi, Oren Sar Shalom, and Yoav Goldberg. 2018. Understanding Convolutional Neural Networks for Text Classification. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 56–65. Association for Computational Linguistics.
Inigo Jauregi Unanue, Ehsan Zare Borzeshi, and Massimo Piccardi. 2018. A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 11–17. Association for Computational Linguistics.
Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031. Association for Computational Linguistics.
Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. 2016. Exploring the Limits of Language Modeling. arXiv preprint arXiv:1602.02410v2.
Akos Kádár, Grzegorz Chrupała, and Afra Alishahi. 2017. Representation of Linguistic Form and Function in Recurrent Neural Networks. Computational Linguistics, 43(4):761–780.
Andrej Karpathy, Justin Johnson, and Fei-Fei Li. 2015. Visualizing and Understanding Recurrent Networks. arXiv preprint arXiv:1506.02078v2.
Urvashi Khandelwal, He He, Peng Qi, and Dan Jurafsky. 2018. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 284–294. Association for Computational Linguistics.
Margaret King and Kirsten Falkedal. 1990. Using Test Suites in Evaluation of Machine Translation Systems. In COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics.
Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. Transactions of the Association for Computational Linguistics, 4:313–327.
Sungryong Koh, Jinee Maeng, Ji-Young Lee, Young-Sook Chae, and Key-Sun Choi. 2001. A test suite for evaluation of English-to-Korean machine translation systems. In MT Summit Conference.
Arne Köhn. 2015. What’s in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2067–2073, Lisbon, Portugal. Association for Computational Linguistics.
Volodymyr Kuleshov, Shantanu Thakoor, Tingfung Lau, and Stefano Ermon. 2018. Adversarial Examples for Natural Language Classification Problems.
Brenden Lake and Marco Baroni. 2018. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2873–2882, Stockholmsmässan, Stockholm, Sweden. PMLR.
Sabine Lehmann, Stephan Oepen, Sylvie Regnier-Prost, Klaus Netter, Veronika Lux, Judith Klein, Kirsten Falkedal, Frederik Fouvry, Dominique Estival, Eva Dauphin, Herve Compagnion, Judith Baur, Lorna Balkan, and Doug Arnold. 1996. TSNLP - Test Suites for Natural Language Processing. In COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics.
Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing Neural Predictions. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 107–117. Association for Computational Linguistics.
Ira Leviant and Roi Reichart. 2015. Separated by an Un-common Language: Towards Judgment Language Informed Vector Space Modeling. arXiv preprint arXiv:1508.00106v5.
Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. 2016a. Visualizing and Understanding Neural Models in NLP. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 681–691. Association for Computational Linguistics.
Jiwei Li, Will Monroe, and Dan Jurafsky. 2016b. Understanding Neural Networks through Representation Erasure. arXiv preprint arXiv:1612.08220v3.
Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. 2018. Deep Text Classification Can be Fooled. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4208–4215. International Joint Conferences on Artificial Intelligence Organization.
Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies. Transactions of the Association for Computational Linguistics, 4:521–535.
Zachary C. Lipton. 2016. The Mythos of Model Interpretability. In ICML Workshop on Human Interpretability of Machine Learning.
Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan, and Noah A. Smith. 2018. LSTMs Exploit Linguistic Attributes of Data. In Proceedings of The Third Workshop on Representation Learning for NLP, pages 180–186. Association for Computational Linguistics.
Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. 2017. Delving into Transferable Adversarial Examples and Black-box Attacks. In International Conference on Learning Representations (ICLR).
Thang Luong, Richard Socher, and Christopher Manning. 2013. Better Word Representations with Recursive Neural Networks for Morphology. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 104–113. Association for Computational Linguistics.
Jean Maillard and Stephen Clark. 2018. Latent Tree Learning with Differentiable Parsers: Shift-Reduce Parsing and Chart Parsing. In Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP, pages 13–18. Association for Computational Linguistics.
Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. 2014. SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 1–8. Association for Computational Linguistics.
R. Thomas McCoy, Robert Frank, and Tal Linzen. 2018. Revisiting the poverty of the stimulus: Hierarchical generalization without a hierarchical bias in recurrent neural networks. In Proceedings of the 40th Annual Conference of the Cognitive Science Society.
Risto Miikkulainen and Michael G. Dyer. 1991. Natural Language Processing With Modular Pdp Networks and Distributed Lexicon. Cognitive Science, 15(3):343–399.
Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.
Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, and Huamin Qu. 2017. Understanding Hidden Memories of Recurrent Neural Networks. In IEEE Conference on Visual Analytics Science and Technology (IEEE VAST 2017).
Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2018. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15.
Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, and Kedar Dhamdhere. 2018. Did the Model Understand the Question? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1896–1906. Association for Computational Linguistics.
James Mullenbach, Sarah Wiegreffe, Jon Duke, Jimeng Sun, and Jacob Eisenstein. 2018. Explainable Prediction of Medical Codes from Clinical Text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1101–1111. Association for Computational Linguistics.
W. James Murdoch, Peter J. Liu, and Bin Yu. 2018. Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs. In International Conference on Learning Representations.
Brian Murphy, Partha Talukdar, and Tom Mitchell. 2012. Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding. In Proceedings of COLING 2012, pages 1933–1950. The COLING 2012 Organizing Committee.
Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani. 2015. Exploring How Deep Neural Networks Form Phonemic Categories. In Interspeech 2015.
Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani. 2016. On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models. In Interspeech 2016, pages 803–807.
Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, and Graham Neubig. 2018. Stress Test Evaluation for Natural Language Inference. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2340–2353. Association for Computational Linguistics.
Nina Narodytska and Shiva Kasiviswanathan. 2017. Simple Black-Box Adversarial Attacks on Deep Neural Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1310–1318.
Lars Niklasson and Fredrik Linåker. 2000. Distributed representations for extended syntactic transformation. Connection Science, 12(3-4):299–314.
Tong Niu and Mohit Bansal. 2018. Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 486–496. Association for Computational Linguistics.
Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016a. Transferability in Machine Learning: From Phenomena to Black-Box Attacks using Adversarial Samples. arXiv preprint arXiv:1605.07277v1.
Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. 2017. Practical Black-Box Attacks Against Machine Learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS ’17, pages 506–519, New York, NY, USA. ACM.
Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. 2016b. Crafting Adversarial Input Sequences for Recurrent Neural Networks. In Military Communications Conference, MILCOM 2016-2016 IEEE, pages 49–54. IEEE.
Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach. 2018. Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Sungjoon Park, JinYeong Bak, and Alice Oh. 2017. Rotated Word Vector Representations and their Interpretability. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 401–411. Association for Computational Linguistics.
Matthew Peters, Mark Neumann, Luke Zettlemoyer, and Wen-tau Yih. 2018. Dissecting Contextual Word Embeddings: Architecture and Representation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1499–1509. Association for Computational Linguistics.
Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, and Benjamin Van Durme. 2018a. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 67–81. Association for Computational Linguistics.
Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. 2018b. Hypothesis Only Baselines in Natural Language Inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 180–191. Association for Computational Linguistics.
Jordan B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46(1):77–105.
Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016a. Analyzing Linguistic Knowledge in Sequential Model of Sentence. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 826–835, Austin, Texas. Association for Computational Linguistics.
Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016b. Investigating Language Universal and Specific Properties in Word Embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1478–1488, Berlin, Germany. Association for Computational Linguistics.
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Semantically Equivalent Adversarial Rules for Debugging NLP models. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 856–865. Association for Computational Linguistics.
Matīss Rikters. 2018. Debugging Neural Machine Translations. arXiv preprint arXiv:1808.02733v1.
Annette Rios Gonzales, Laura Mascarell, and Rico Sennrich. 2017. Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings. In Proceedings of the Second Conference on Machine Translation, pages 11–19. Association for Computational Linguistics.
Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiskỳ, and Phil Blunsom. 2016. Reasoning about Entailment with Neural Attention. In International Conference on Learning Representations (ICLR).
Andras Rozsa, Ethan M. Rudd, and Terrance E. Boult. 2016. Adversarial Diversity and Hard Positive Generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 25–32.
Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender Bias in Coreference Resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 8–14. Association for Computational Linguistics.
D. E. Rumelhart and J. L. McClelland. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 2, chapter On Learning the Past Tenses of English Verbs, pages 216–271. MIT Press, Cambridge, MA, USA.
Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A Neural Attention Model for Abstractive Sentence Summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389. Association for Computational Linguistics.
Keisuke Sakaguchi, Kevin Duh, Matt Post, and Benjamin Van Durme. 2017. Robsut Wrod Reocginiton via Semi-Character Recurrent Neural Network. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 3281–3287. AAAI Press.
Suranjana Samanta and Sameep Mehta. 2017. Towards Crafting Text Adversarial Samples. arXiv preprint arXiv:1707.02812v1.
Ivan Sanchez, Jeff Mitchell, and Sebastian Riedel. 2018. Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1975–1985. Association for Computational Linguistics.
Motoki Sato, Jun Suzuki, Hiroyuki Shindo, and Yuji Matsumoto. 2018. Interpretable Adversarial Perturbation in Input Embedding Space for Text. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4323–4330. International Joint Conferences on Artificial Intelligence Organization.
Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, and Tolga Cukur. 2018. Semantic Structure and Interpretability of Word Embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing.
Rico Sennrich. 2017. How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 376–382. Association for Computational Linguistics.
Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, and Jian Sun. 2018. Learning Visually-Grounded Semantics from Contrastive Adversarial Samples. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3715–3727. Association for Computational Linguistics.
Xing Shi, Kevin Knight, and Deniz Yuret. 2016a. Why Neural Translations are the Right Length. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2278–2282. Association for Computational Linguistics.
Xing Shi, Inkit Padhi, and Kevin Knight. 2016b. Does String-Based Neural MT Learn Source Syntax? In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1526–1534, Austin, Texas. Association for Computational Linguistics.
Chandan Singh, W. James Murdoch, and Bin Yu. 2018. Hierarchical interpretations for neural network predictions. arXiv preprint arXiv:1806.05337v1.
Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, and Alexander M. Rush. 2018a. Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. arXiv preprint arXiv:1804.09299v1.
Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, and Alexander M. Rush. 2018b. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):667–676.
Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3319–3328, International Convention Centre, Sydney, Australia. PMLR.
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems, pages 3104–3112.
Mirac Suzgun, Yonatan Belinkov, and Stuart M. Shieber. 2019. On Evaluating the Generalization of LSTM Models in Formal Languages. In Proceedings of the Society for Computation in Linguistics (SCiL).
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR).
Gongbo Tang, Rico Sennrich, and Joakim Nivre. 2018. An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 26–35. Association for Computational Linguistics.
Yi Tay, Anh Tuan Luu, and Siu Cheung Hui. 2018. CoupleNet: Paying Attention to Couples with Coupled Attention for Relationship Recommendation. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM).
Ke Tran, Arianna Bisazza, and Christof Monz. 2018. The Importance of Being Recurrent for Modeling Hierarchical Structure. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4731–4736. Association for Computational Linguistics.
Eva Vanmassenhove, Jinhua Du, and Andy Way. 2017. Investigating ‘Aspect’ in NMT and SMT: Translating the English Simple Past and Present Perfect. Computational Linguistics in the Netherlands Journal, 7:109–128.
Sara Veldhoen, Dieuwke Hupkes, and Willem Zuidema. 2016. Diagnostic Classifiers: Revealing how Neural Networks Process Hierarchical Structure. In CEUR Workshop Proceedings.
Elena Voita, Pavel Serdyukov, Rico Sennrich, and Ivan Titov. 2018. Context-Aware Neural Machine Translation Learns Anaphora Resolution. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1264–1274. Association for Computational Linguistics.
Ekaterina Vylomova, Trevor Cohn, Xuanli He, and Gholamreza Haffari. 2016. Word Representation Models for Morphologically Rich Languages in Neural Machine Translation. arXiv preprint arXiv:1606.04217v1.
Alex Wang, Amapreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018a. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv preprint arXiv:1804.07461v1.
Shuai Wang, Yanmin Qian, and Kai Yu. 2017a. What Does the Speaker Embedding Encode? In Interspeech 2017, pages 1497–1501.
Xinyi Wang, Hieu Pham, Pengcheng Yin, and Graham Neubig. 2018b. A Tree-based Decoder for Neural Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium.
Yu-Hsuan Wang, Cheng-Tao Chung, and Hung-yi Lee. 2017b. Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries. In Interspeech 2017.
Gail Weiss, Yoav Goldberg, and Eran Yahav. 2018. On the Practical Computational Power of Finite Precision RNNs for Language Recognition. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 740–745. Association for Computational Linguistics.
Adina Williams, Andrew Drozdov, and Samuel R. Bowman. 2018. Do latent tree learning models identify meaningful structure in sentences? Transactions of the Association for Computational Linguistics, 6:253–267.
Zhizheng Wu and Simon King. 2016. Investigating gated recurrent networks for speech synthesis. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5140–5144. IEEE.
Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, and Michael I. Jordan. 2018. Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data. arXiv preprint arXiv:1805.12316v1.
Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. 2016. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs. Transactions of the Association for Computational Linguistics, 4:259–272.
Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. 2017. Adversarial Examples: Attacks and Defenses for Deep Learning. arXiv preprint arXiv:1712.07107v3.
Omar Zaidan, Jason Eisner, and Christine Piatko. 2007. Using “Annotator Rationales” to Improve Machine Learning for Text Categorization. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 260–267. Association for Computational Linguistics.