# Structural Attention
## Motivation

Attention works as a soft-selection module: it can model structural dependencies implicitly.

## Definition

Let $x = [x_1, \dots, x_n]$ represent a sequence of inputs, let $q$ be a query, and let $z$ be a categorical latent variable with sample space $\{1, \dots, n\}$. The input is accessed through an attention distribution $z \sim p(z \mid x, q)$. The context over the sequence is defined as the expectation
$$c = \mathbb{E}_{z \sim p(z \mid x, q)}[f(x, z)],$$
where $f(x, z)$ is an *annotation function*. In this definition, the annotation function plays the role of the selection step in conventional attention: $f(x, z) = x_z$. The context vector can then be computed as a simple weighted sum:
$$\mathbf{c} = \mathbb{E}_{z \sim p(z \mid x, q)}[f(x, z)] = \sum_{i=1}^{n} p(z = i \mid x, q)\,\mathbf{x}_i$$
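A minimal NumPy sketch of this special case (the dot-product scoring and all names here are illustrative assumptions, not from the text):

```python
import numpy as np

def soft_attention(x, scores):
    """Context c = E_{z ~ p(z|x,q)}[f(x, z)] with f(x, z) = x_z.

    x:      (n, d) array of inputs x_1, ..., x_n
    scores: (n,) unnormalized attention scores for a query q
            (how scores are produced is an assumption; e.g. scores_i = q . x_i)
    """
    # p(z = i | x, q): a softmax over the scores defines the categorical latent z
    p = np.exp(scores - scores.max())
    p /= p.sum()
    # c = sum_i p(z = i | x, q) * x_i, i.e. the expectation as a weighted sum
    return p @ x

# Toy usage: n = 4 inputs of dimension d = 3, dot-product scores against q.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
q = rng.normal(size=3)
c = soft_attention(x, x @ q)  # context vector, shape (3,)
```

## Method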