This is an implementation of the Recurrent Discounted Attention (RDA) unit that extends TensorFlow's RNNCell. RDA builds on the Recurrent Weighted Average (RWA) unit by additionally allowing the past to be discounted.
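To illustrate what "discounting of the past" means, here is a minimal NumPy sketch of a running attention-weighted average with a per-step discount. It is not this repository's code: the function name `discounted_weighted_average` and its arguments are hypothetical, the scores and discounts are passed in directly instead of being computed by learned gates, and the numerical-stability handling a real cell would need is omitted.

```python
import numpy as np


def discounted_weighted_average(values, scores, discounts):
    """Running attention-weighted average with per-step discounting of the past.

    Hypothetical helper for illustration only (not part of this repository).

    values:    (T, D) array of candidate values, one row per time step
    scores:    (T,) array of unnormalised attention logits
    discounts: (T,) array in [0, 1]; factor applied to the accumulated past
    Returns a (T, D) array holding the running average after each step.
    """
    values = np.asarray(values, dtype=float)
    scores = np.asarray(scores, dtype=float)
    discounts = np.asarray(discounts, dtype=float)

    numerator = np.zeros(values.shape[1])  # discounted sum of weight * value
    denominator = 0.0                      # discounted sum of weights
    outputs = []
    for z_t, a_t, g_t in zip(values, scores, discounts):
        weight = np.exp(a_t)  # assumption: exponential attention weight
        # Scale down everything accumulated so far, then add the current step.
        numerator = g_t * numerator + weight * z_t
        denominator = g_t * denominator + weight
        outputs.append(numerator / denominator)
    return np.stack(outputs)
```

With every discount set to 1 the past is never discounted and the result is a plain running weighted average over all steps so far. With every discount set to 0 the past is forgotten entirely and each output equals the current value.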
[Figures: Accuracy, Cost]
Paper: Efficiently applying attention to sequential data with the Recurrent Discounted Attention unit (arXiv:1705.08480v1)
Recurrent Discounted Attention unit (RDA) for TensorFlow