Hi, Mr. Kim. I recently re-read the two papers
"Unsupervised Machine Translation Using Monolingual Corpora Only"
and
"Phrase-Based & Neural Unsupervised Machine Translation".
I have some ideas (which may not be right) and some questions.
The improvement from the first paper to the second seems to come from:
i) adding language model training before and during the MT training process; since the LM is built from the (shared) encoder and decoder, better encoder and decoder parameters help make the translations more fluent (a tiny sketch of what I mean is below).
ii) adding on-the-fly back-translation, so the MT models are improved iteratively (a rough sketch of the loop is also below).
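For i), this is a minimal sketch of what I mean by the LM reusing the translation model's parameters, so that LM training warm-starts the encoder/decoder. The module names and sizes are my own placeholders, not the actual classes in the repo:

```python
# Toy sketch for i): LM training that reuses the same layers the MT model
# will use as encoder/decoder (placeholder names/sizes, not the repo's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim = 100, 16
emb = nn.Embedding(vocab, dim)               # shared with the encoder/decoder
lstm = nn.LSTM(dim, dim, batch_first=True)   # shared LSTM layer
proj = nn.Linear(dim, vocab)                 # shared output projection

def lm_loss(tokens):
    # Next-token prediction using only the shared layers, so every LM update
    # also moves the parameters the translation model starts from.
    hidden, _ = lstm(emb(tokens[:, :-1]))
    logits = proj(hidden)
    return F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))

batch = torch.randint(0, vocab, (2, 7))      # fake token ids
lm_loss(batch).backward()                    # gradients land on the shared weights
```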
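And for ii), this is roughly the loop I have in mind for on-the-fly back-translation: each direction generates synthetic parallel data with the current model of the other direction, and both keep improving. The translate/train functions here are trivial stand-ins of my own, not the repo's API:

```python
# Minimal sketch of on-the-fly back-translation (toy placeholders only;
# real models would be seq2seq networks trained with a proper optimizer).
import random

def translate_src2tgt(sentence):
    # Placeholder for the current src->tgt model (here: a trivial copy).
    return sentence[:]

def translate_tgt2src(sentence):
    # Placeholder for the current tgt->src model.
    return sentence[:]

def train_step_tgt2src(synthetic_tgt, real_src):
    # Placeholder: supervised update on the (synthetic tgt, real src) pair.
    print(f"update tgt->src on {synthetic_tgt} -> {real_src}")

def train_step_src2tgt(synthetic_src, real_tgt):
    print(f"update src->tgt on {synthetic_src} -> {real_tgt}")

src_mono = [["a", "b", "c"], ["d", "e"]]      # monolingual source sentences
tgt_mono = [["x", "y"], ["z", "w", "v"]]      # monolingual target sentences

for step in range(2):
    # src -> synthetic tgt, then train tgt->src on (synthetic tgt, real src)
    src = random.choice(src_mono)
    synthetic_tgt = translate_src2tgt(src)    # produced with the *current* model
    train_step_tgt2src(synthetic_tgt, src)

    # tgt -> synthetic src, then train src->tgt on (synthetic src, real tgt)
    tgt = random.choice(tgt_mono)
    synthetic_src = translate_tgt2src(tgt)
    train_step_src2tgt(synthetic_src, tgt)
```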
Questions:
i) In lm.py, the language model uses shared layers (LSTM layers) of the encoder or decoder, which means there is no Transformer-LM implementation. Why? Is there any reference showing that an RNN-LM works better than a Transformer-LM here?
ii) In transformer.py, one_hot=True has not been implemented for the Transformer. Why? I think the Transformer should also be able to use one-hot targets for its loss and training (see the cross-entropy sketch after this list).
iii) In trainer.py, there are three ways to train the encoder/decoder LM. I don't see why we need to train lm_enc_rev, and also we do not add_noise for the LM training here, which is different from the autoencoder training.
iv) add_noise is only called in autoencoder training, not in LM training, as set up in main.py. They may end up behaving similarly, but I think the authors should have pointed this out in the paper (a sketch of the kind of noise I mean is below).
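To make ii) concrete, here is a small example of what I mean by a one-hot target in the loss. With a hard one-hot it is just ordinary cross-entropy; the explicit distribution form only really matters once the target is smoothed. This is my own example, not the repo's loss code:

```python
# One-hot vs. index-based cross-entropy (my own example, not the repo's loss).
import torch
import torch.nn.functional as F

vocab_size = 5
logits = torch.randn(3, vocab_size)          # (batch, vocab) decoder outputs
targets = torch.tensor([1, 3, 0])            # gold token indices

# Index-based cross-entropy (what the Transformer path could use directly).
loss_index = F.cross_entropy(logits, targets)

# Equivalent loss written with an explicit one-hot target distribution.
one_hot = F.one_hot(targets, vocab_size).float()
log_probs = F.log_softmax(logits, dim=-1)
loss_one_hot = -(one_hot * log_probs).sum(dim=-1).mean()

print(loss_index.item(), loss_one_hot.item())   # match up to float error

# With label smoothing the target is no longer a pure one-hot, which is
# where carrying around the explicit target distribution becomes useful.
eps = 0.1
smoothed = one_hot * (1 - eps) + eps / vocab_size
loss_smoothed = -(smoothed * log_probs).sum(dim=-1).mean()
```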
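And for iii)/iv), this is the kind of noise function I have in mind: word dropout plus a small local shuffle, roughly as the papers describe for the denoising autoencoder. It is my own simplified version, not the repo's add_noise:

```python
# Simplified noise model: drop words with some probability, then shuffle
# word order slightly within a window k (my own sketch, not the repo's code).
import random

def add_noise(tokens, p_drop=0.1, k=3):
    # Randomly drop words with probability p_drop (keep at least one token).
    kept = [t for t in tokens if random.random() > p_drop] or tokens[:1]

    # Slight local shuffle: each word can only move a few positions, because
    # its sort key is its index plus a small random offset in [0, k).
    keys = [i + random.uniform(0, k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]

print(add_noise("the cat sat on the mat".split()))
```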
What are your thoughts on my ideas and questions? If you have any comments, please let me know~