Notes on Importance Weighted Autoencoders

Link: http://arxiv.org/abs/1509.00519


Summary

This paper proposes to train a neural network generative model by optimizing an importance sampling (IS) weighted estimate of the log-probability under the model. The authors show that the single-sample case of this estimate actually corresponds to the learning objective of the variational autoencoder (VAE). Importantly, they exploit this connection by showing that, as in the VAE, a gradient can be passed through the samples from the approximate posterior (which serves as the IS proposal), yielding the importance weighted autoencoder (IWAE). The authors also show that, as more samples are used, this objective, which is a lower bound on the actual log-likelihood, becomes an increasingly tight approximation to it. In other words, the IWAE is expected to optimize the true log-likelihood of the model better than the VAE does.
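
Concretely, with k samples h_1, ..., h_k drawn from the approximate posterior q(h|x), the objective is log( (1/k) * sum_i p(x, h_i) / q(h_i | x) ), which for k = 1 reduces to the usual VAE lower bound. Here is a minimal sketch of the estimator (my own PyTorch rendering, not the authors' code), assuming the caller has already computed the log-densities from reparameterized samples so that gradients flow through:

```python
import math
import torch

def iwae_bound(log_p_joint, log_q):
    """IWAE lower bound on log p(x) from k importance samples.

    log_p_joint: (k, batch) tensor of log p(x, h_i)
    log_q:       (k, batch) tensor of log q(h_i | x)
    Returns a (batch,)-shaped lower bound on log p(x).
    """
    k = log_p_joint.shape[0]
    log_w = log_p_joint - log_q  # log importance weights
    # log( (1/k) * sum_i w_i ), computed stably via logsumexp;
    # for k = 1 this is exactly the VAE ELBO, log p(x,h) - log q(h|x).
    return torch.logsumexp(log_w, dim=0) - math.log(k)
```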

The experiments presented show that the model achieves competitive performance on a version of the binarized MNIST benchmark and on the Omniglot dataset.

My two cents

This is a really neat contribution! While simple (both conceptually and algorithmically), it seems to be an important step forward for the VAE framework. I particularly like the theoretical result showing that IWAE provides a better approximation to the real log-likelihood: it's quite neat and provides an excellent motivation for the method.
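
For reference, that tightness result (Theorem 1 in the paper, paraphrased from memory) says the k-sample bound is monotonically non-decreasing in k and approaches the true log-likelihood:

```latex
\log p(x) \;\ge\; \mathcal{L}_{k+1} \;\ge\; \mathcal{L}_k,
\qquad
\mathcal{L}_k \longrightarrow \log p(x) \text{ as } k \to \infty
```

with the convergence requiring that the importance weights p(x, h) / q(h | x) be bounded.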

The results on binarized MNIST are certainly impressive. Unfortunately, it appears that the training setup isn't actually comparable to the majority of published results on this dataset. Indeed, it seems that they didn't use the stochastically sampled but then *fixed* binarization of the inputs that other publications on this benchmark have used (since my NADE paper with Iain Murray, we've made that fixed training set available for everyone to use, along with fixed validation and test sets). I believe they instead re-sampled the binarization for each minibatch, effectively creating a setup with a somewhat larger training set than usual (the two setups are sketched below). It's unfortunate that this is the case, since it makes this result effectively impossible to compare directly with previous work.
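
To make the distinction concrete, here is a hypothetical numpy sketch of the two setups (x_real is a small placeholder for the grayscale MNIST images with pixel values in [0, 1]):

```python
import numpy as np

rng = np.random.default_rng(0)
x_real = rng.random((1000, 784), dtype=np.float32)  # placeholder for grayscale MNIST

# Fixed binarization (the standard benchmark): each pixel is sampled
# once, up front, and the same binary dataset is reused every epoch.
x_fixed = (rng.random(x_real.shape) < x_real).astype(np.float32)

# Re-sampled binarization (what the IWAE setup appears to do): a fresh
# binary sample is drawn for every minibatch at every update, which
# effectively augments the training set.
def binarize_minibatch(x_batch, rng):
    return (rng.random(x_batch.shape) < x_batch).astype(np.float32)
```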

I'm being picky on this issue only because I'm super interested in this problem (that is, generative modeling with neural networks) and this little issue is pretty much the only thing that stops this paper from being a slam dunk. Hopefully the authors (or perhaps someone interested in reimplementing IWAE) can clarify this question eventually.

Otherwise, it seems quite clear to me that IWAE is an improvement over VAE. The experiments of Section 5.2 are further demonstration that IWAE is indeed a good idea: fine-tuning a VAE model with the IWAE objective improves performance, while fine-tuning an IWAE model with the VAE objective actually makes things worse.

The Fine Print: I write these notes sometimes hastily, and thus they might not always perfectly reflect what's in the paper. They are mostly meant to provide a first impression of the paper's topic, contribution and achievements. If your appetite is whetted, I'd recommend you dive into the paper and check for yourself. Oh, and do let me know if you think I got things wrong :-)
