- Attention
- Multi-Head Attention: analogous to using multiple convolution kernels, the Attention computation is repeated several times in parallel and the results are concatenated, so the model attends to the input from multiple perspectives (see the sketch after the links below).
- Self Attention: the queries, keys, and values are all projections of the same sequence, so each position attends directly to every other position of the same input; see the links and the sketch below.
  - https://blog.csdn.net/malefactor/article/details/50583474
  - https://blog.csdn.net/malefactor/article/details/78767781
  - https://blog.csdn.net/malefactor/article/details/50550211
  - https://spaces.ac.cn/archives/4823
  - https://zhuanlan.zhihu.com/p/53682800
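
As a concrete illustration of both bullets above, here is a minimal NumPy sketch of multi-head self-attention. The function name `multi_head_self_attention`, the projection matrices `w_q`/`w_k`/`w_v`/`w_o`, and all shapes are illustrative assumptions, not code taken from the linked posts.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (seq_len, d_model); w_*: (d_model, d_model) projection matrices (assumed shapes)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Self-attention: Q, K, V are all projections of the same input x.
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # Split the model dimension into heads -> (num_heads, seq_len, d_head),
    # analogous to applying several "kernels" in parallel.
    def split(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (heads, seq, d_head)

    # Concatenate the heads back into d_model, then mix with a final projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

# Usage with random weights (illustrative only).
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 64, 10, 8
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4))
out = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)  # (10, 64)
```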