what Structural Alignment Biases
why Attention is not alignment
what Attention Scores
how Conditioning Attention on Past Decisions
P9 how to read the picture?
attention is not alignment
P2 formulate
P13 paper to read: "Incorporating Structural Alignment Biases into an Attentional Neural Translation Model"
Structural Alignment Biases
The attentional model, as described above, provides a powerful and elegant model of translation in which alignments between source and target words are learned through the implicit conditioning context afforded by the attention mechanism. Despite its elegance, the attentional model omits several key components of traditional alignment models such as the IBM models (Brown et al., 1993) and Vogel’s hidden Markov model (Vogel et al., 1996) as implemented in the GIZA++ toolkit (Och and Ney, 2003). Combining the strengths of this highly successful body of research into a neural model of machine translation holds potential to further improve the modelling accuracy of neural techniques.
The blue one is the alignment produced by an IBM model and the green one is the alignment implied by attention. They do not fully agree, so attention is not alignment.
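One way to make "not totally the same" measurable is to turn the attention matrix into hard links and score them against a reference alignment (for example, one produced by an IBM model). The sketch below assumes a simple argmax rule (each target word links to the source word it attends to most) and uses the alignment error rate (AER) of Och and Ney (2003). The function names and the toy matrices are hypothetical, for illustration only.

```python
def attention_to_alignment(attn):
    """Convert a (target_len x source_len) attention matrix into hard links.

    Assumption: each target position j is linked to the single source
    position i with the largest attention weight (argmax over the row).
    Links are returned as (source_index, target_index) pairs.
    """
    return {(max(range(len(row)), key=row.__getitem__), j)
            for j, row in enumerate(attn)}


def alignment_error_rate(predicted, sure, possible=None):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    If no 'possible' links are given, they default to the 'sure' links.
    Sure links always count as possible links.
    """
    possible = (possible or set()) | sure
    a_s = len(predicted & sure)
    a_p = len(predicted & possible)
    return 1.0 - (a_s + a_p) / (len(predicted) + len(sure))


# Toy example: the third target word attends mostly to source word 1,
# while a (hypothetical) IBM-model alignment links it to source word 2.
attn = [[0.8, 0.1, 0.1],
        [0.2, 0.7, 0.1],
        [0.1, 0.6, 0.3]]
reference = {(0, 0), (1, 1), (2, 2)}
predicted = attention_to_alignment(attn)
print(predicted == {(0, 0), (1, 1), (1, 2)})                    # True
print(round(alignment_error_rate(predicted, reference), 3))     # 0.333
```

An AER of 0 means the attention-derived links match the reference exactly; any disagreement, like the third word here, pushes it above 0.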
how can an NMT model translate text correctly, even when its attention is off?
3 ways to obtain attention scores
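The note does not list the three ways. A commonly cited trio is dot-product, multiplicative (bilinear), and additive (Bahdanau-style) scoring; the sketch below assumes those are the three meant. Here `s` is the decoder hidden state (the query) and the rows of `H` are the encoder hidden states h_1..h_N (the values); the weight names `W`, `W1`, `W2`, `v` are hypothetical learned parameters.

```python
import numpy as np


def dot_product_scores(s, H):
    """e_i = s^T h_i. Requires query and values to have the same dimension."""
    return H @ s


def multiplicative_scores(s, H, W):
    """e_i = s^T W h_i, with W of shape (d_query, d_value)."""
    # s^T W h_i == (W^T s) . h_i, computed for all rows of H at once.
    return H @ (W.T @ s)


def additive_scores(s, H, W1, W2, v):
    """e_i = v^T tanh(W1 h_i + W2 s).

    Shapes: W1 (d_attn, d_value), W2 (d_attn, d_query), v (d_attn,).
    """
    # H @ W1.T has shape (N, d_attn); W2 @ s (d_attn,) broadcasts over rows.
    return np.tanh(H @ W1.T + W2 @ s) @ v


# Usage with random parameters (illustration only).
rng = np.random.default_rng(0)
N, d = 5, 4                       # 5 source positions, hidden size 4
H = rng.normal(size=(N, d))       # encoder hidden states (values)
s = rng.normal(size=d)            # decoder hidden state (query)
W = rng.normal(size=(d, d))
W1, W2, v = rng.normal(size=(3, d)), rng.normal(size=(3, d)), rng.normal(size=3)

for scores in (dot_product_scores(s, H),
               multiplicative_scores(s, H, W),
               additive_scores(s, H, W1, W2, v)):
    print(scores.shape)           # (5,) -> one score per source position
```

Multiplicative scoring with `W` set to the identity reduces to dot-product scoring; additive scoring lets the query and the values have different sizes at the cost of extra parameters.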
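This attention mechanism is sometimes described as the output for the query attending to (that is, taking into account) different parts of the source text. (The query attends to the values.)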
Each decoder hidden state attends to the encoder hidden states (the decoder hidden state at step t, s_t, is the query; the encoder hidden states are the values).
The weighted sum is a selective summary of the information contained in the values, where the query determines which values to focus on.
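A single attention step can be sketched as: score each value against the query, normalize the scores with a softmax, and take the weighted sum of the values. This sketch assumes dot-product scoring and, as in the note, uses the encoder hidden states directly as the values; the helper names `softmax` and `attend` are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))     # shift by the max to avoid overflow
    return z / z.sum()


def attend(s_t, H):
    """Decoder state s_t (the query) attends to encoder states H (N, d).

    Returns the attention distribution alpha (N,) and the context vector (d,).
    """
    scores = H @ s_t              # dot-product score for each encoder state
    alpha = softmax(scores)       # how much to focus on each source position
    context = alpha @ H           # weighted sum: selective summary of the values
    return alpha, context


# The query is most similar to encoder states 0 and 2, so they dominate.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
s_t = np.array([10.0, 0.0])
alpha, context = attend(s_t, H)
print(np.round(alpha, 3))         # [0.5 0.  0.5]
print(np.round(context, 3))       # [1.  0.5]
```

The query only decides the weights `alpha`; the content of the context vector still comes entirely from the values.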