Hi, Mr. Kim. I recently re-read the two papers
"Unsupervised Machine Translation Using Monolingual Corpora Only"
and
"Phrase-Based & Neural Unsupervised Machine Translation".
I have some ideas (which may not be right) and some questions.
The improvement from the first paper to the second seems to come from:
i) adding language model training before and during the MT training process; since the LM is built from the (shared) encoder and decoder, better encoder and decoder parameters help make the translations more fluent (a tiny sketch of what I mean is below).
ii) adding on-the-fly back-translation, so the MT models are improved iteratively (a rough sketch of the loop is also below).
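For i), this is a minimal sketch of what I mean by the LM reusing the translation model's parameters, so that LM training warm-starts the encoder/decoder. The module names and sizes are my own placeholders, not the actual classes in the repo:

```python
# Toy sketch for i): LM training that reuses the same layers the MT model
# will use as encoder/decoder (placeholder names/sizes, not the repo's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim = 100, 16
emb = nn.Embedding(vocab, dim)               # shared with the encoder/decoder
lstm = nn.LSTM(dim, dim, batch_first=True)   # shared LSTM layer
proj = nn.Linear(dim, vocab)                 # shared output projection

def lm_loss(tokens):
    # Next-token prediction using only the shared layers, so every LM update
    # also moves the parameters the translation model starts from.
    hidden, _ = lstm(emb(tokens[:, :-1]))
    logits = proj(hidden)
    return F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))

batch = torch.randint(0, vocab, (2, 7))      # fake token ids
lm_loss(batch).backward()                    # gradients land on the shared weights
```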
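And for ii), this is roughly the loop I have in mind for on-the-fly back-translation: each direction generates synthetic parallel data with the current model of the other direction, and both keep improving. The translate/train functions here are trivial stand-ins of my own, not the repo's API:

```python
# Minimal sketch of on-the-fly back-translation (toy placeholders only;
# real models would be seq2seq networks trained with a proper optimizer).
import random

def translate_src2tgt(sentence):
    # Placeholder for the current src->tgt model (here: a trivial copy).
    return sentence[:]

def translate_tgt2src(sentence):
    # Placeholder for the current tgt->src model.
    return sentence[:]

def train_step_tgt2src(synthetic_tgt, real_src):
    # Placeholder: supervised update on the (synthetic tgt, real src) pair.
    print(f"update tgt->src on {synthetic_tgt} -> {real_src}")

def train_step_src2tgt(synthetic_src, real_tgt):
    print(f"update src->tgt on {synthetic_src} -> {real_tgt}")

src_mono = [["a", "b", "c"], ["d", "e"]]      # monolingual source sentences
tgt_mono = [["x", "y"], ["z", "w", "v"]]      # monolingual target sentences

for step in range(2):
    # src -> synthetic tgt, then train tgt->src on (synthetic tgt, real src)
    src = random.choice(src_mono)
    synthetic_tgt = translate_src2tgt(src)    # produced with the *current* model
    train_step_tgt2src(synthetic_tgt, src)

    # tgt -> synthetic src, then train src->tgt on (synthetic src, real tgt)
    tgt = random.choice(tgt_mono)
    synthetic_src = translate_tgt2src(tgt)
    train_step_src2tgt(synthetic_src, tgt)
```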
Questions:
i) In lm.py, the language model uses shared layers (LSTM layers) of the encoder or decoder, which means there is no Transformer-LM implementation. Why? Is there any reference showing that an RNN-LM works better than a Transformer-LM here?
ii) In transformer.py, one_hot=True has not been implemented for the Transformer. Why? I think the Transformer should also be able to use one-hot targets for its loss and training (see the cross-entropy sketch after this list).
iii) In trainer.py, there are three ways to train the encoder/decoder LM. I don't see why we need to train lm_enc_rev, and also we do not add_noise for the LM training here, which is different from the autoencoder training.
iv) add_noise is only called in autoencoder training, not in LM training, as set up in main.py. They may end up behaving similarly, but I think the authors should have pointed this out in the paper (a sketch of the kind of noise I mean is below).
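To make ii) concrete, here is a small example of what I mean by a one-hot target in the loss. With a hard one-hot it is just ordinary cross-entropy; the explicit distribution form only really matters once the target is smoothed. This is my own example, not the repo's loss code:

```python
# One-hot vs. index-based cross-entropy (my own example, not the repo's loss).
import torch
import torch.nn.functional as F

vocab_size = 5
logits = torch.randn(3, vocab_size)          # (batch, vocab) decoder outputs
targets = torch.tensor([1, 3, 0])            # gold token indices

# Index-based cross-entropy (what the Transformer path could use directly).
loss_index = F.cross_entropy(logits, targets)

# Equivalent loss written with an explicit one-hot target distribution.
one_hot = F.one_hot(targets, vocab_size).float()
log_probs = F.log_softmax(logits, dim=-1)
loss_one_hot = -(one_hot * log_probs).sum(dim=-1).mean()

print(loss_index.item(), loss_one_hot.item())   # match up to float error

# With label smoothing the target is no longer a pure one-hot, which is
# where carrying around the explicit target distribution becomes useful.
eps = 0.1
smoothed = one_hot * (1 - eps) + eps / vocab_size
loss_smoothed = -(smoothed * log_probs).sum(dim=-1).mean()
```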
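And for iii)/iv), this is the kind of noise function I have in mind: word dropout plus a small local shuffle, roughly as the papers describe for the denoising autoencoder. It is my own simplified version, not the repo's add_noise:

```python
# Simplified noise model: drop words with some probability, then shuffle
# word order slightly within a window k (my own sketch, not the repo's code).
import random

def add_noise(tokens, p_drop=0.1, k=3):
    # Randomly drop words with probability p_drop (keep at least one token).
    kept = [t for t in tokens if random.random() > p_drop] or tokens[:1]

    # Slight local shuffle: each word can only move a few positions, because
    # its sort key is its index plus a small random offset in [0, k).
    keys = [i + random.uniform(0, k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]

print(add_noise("the cat sat on the mat".split()))
```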
What are your thoughts on my ideas and questions? If you have any comments, please let me know~