Learning To Compose Neural Networks For Question Answering: Atlanta
Jacob Andreas and Marcus Rohrbach and Trevor Darrell and Dan Klein
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
{jda,rohrbach,trevor,klein}@eecs.berkeley.edu
i.e. the output of an MLP with inputs h_q(x) and f(z_i), and parameters θ_ℓ = {a, B, C, d}. Finally, we normalize these scores to obtain a distribution:

    p(z_i | x; θ_ℓ) = e^{s(z_i|x)} / \sum_{j=1}^{n} e^{s(z_j|x)}    (9)

Having defined a layout selection module p(z|x; θ_ℓ) and a network execution model p_z(y|w; θ_e), we are ready to define a model for predicting answers given only (world, question) pairs. The key constraint is that we want to minimize evaluations of p_z(y|w; θ_e) (which involves

    E[(∇ log p(z|x; θ_ℓ)) · log p(y|z, w; θ_e)]    (11)

Thus the update to the layout-scoring model at each timestep is simply the gradient of the log-probability of the chosen layout, scaled by the accuracy of that layout's predictions. At training time, we approximate the expectation with a single rollout, so at each step we update θ_ℓ in the direction (∇ log p(z|x; θ_ℓ)) · log p(y|z, w; θ_e) for a single z ∼ p(z|x; θ_ℓ). θ_e and θ_ℓ are optimized using ADADELTA (Zeiler, 2012) with ρ = 0.95, ε = 1e−6, and gradient clipping at a norm of 10.
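The single-rollout update can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `layout_distribution` is the softmax of Eq. 9 over per-layout scores, and `reinforce_update` applies the Eq. 11 gradient for one sampled layout, with `log_p_answer` standing in for the expensive execution of the assembled network.

```python
import numpy as np

rng = np.random.default_rng(0)

def layout_distribution(scores):
    """Eq. 9: softmax over per-layout scores s(z_i | x)."""
    exp = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exp / exp.sum()

def reinforce_update(scores, log_p_answer, lr=0.1):
    """One single-rollout policy-gradient step on the layout scores.

    `log_p_answer(i)` returns log p(y | z_i, w; theta_e) for the sampled
    layout; here it is a stand-in for executing the assembled network.
    """
    p = layout_distribution(scores)
    i = rng.choice(len(scores), p=p)        # sample z ~ p(z | x; theta_l)
    reward = log_p_answer(i)                # log-likelihood of the true answer
    grad_log_p = -p                         # d log p(z_i|x) / d scores
    grad_log_p[i] += 1.0                    # = one_hot(i) - p for a softmax
    return scores + lr * reward * grad_log_p  # gradient-ascent step

# Toy usage: layout 1 yields a likelier answer, so its score should grow.
scores = np.zeros(3)
for _ in range(100):
    scores = reinforce_update(scores, lambda i: 0.0 if i == 1 else -2.0)
```

Because the reward (a log-probability) is non-positive, the update pushes probability mass away from layouts whose sampled executions explain the answer poorly, which has the same effect as favoring the accurate ones.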
                    test-dev                          test-std
                    Yes/No   Number   Other   All    All
Zhou (2015)         76.6     35.0     42.6    55.7   55.9
Noh (2015)          80.7     37.2     41.7    57.2   57.4
Yang (2015)         79.3     36.6     46.1    58.7   58.9
NMN                 81.2     38.0     44.0    58.6   58.7
D-NMN               81.1     38.6     45.5    59.4   59.4