7 Exercises
This tutorial includes three exercises to check understanding of the material. The solutions are given in section 8.
7.1 The optimal discriminator strategy
As described in equation 8, the goal of the discriminator is to minimize

J^(D)(θ^(D), θ^(G)) = −(1/2) E_{x∼p_data} log D(x) − (1/2) E_z log(1 − D(G(z)))

with respect to θ^(D). Imagine that the discriminator can be optimized in function space, so that the value of D(x) is specified independently for every value of x. What is the optimal strategy for D? What assumptions must be made to obtain this result?
7.2 Gradient descent for games
Consider now a minimax game with two players, each of which controls a single scalar value. The minimizing player controls scalar x and the maximizing player controls scalar y. The value function for this game is

V(x, y) = xy.

- Does this game have an equilibrium? If so, where is it?
- Consider the learning dynamics of simultaneous gradient descent. To simplify the problem, treat gradient descent as a continuous-time process. With an infinitesimal learning rate, gradient descent is described by the system of partial differential equations
  ∂x/∂t = −(∂/∂x) V(x(t), y(t)),
  ∂y/∂t = +(∂/∂y) V(x(t), y(t)).
  Solve for the trajectory followed by these dynamics.
7.3 Maximum likelihood in the GAN framework
In this exercise, we derive a cost that yields (approximate) maximum likelihood learning within the GAN framework. Our goal is to design J^(G) such that, if we assume the discriminator is optimal, the expected gradient of J^(G) will match the expected gradient of D_KL(p_data‖p_model).
The solution takes the form

J^(G) = E_{x∼p_g} f(x).

The exercise consists of determining the form of f.
8 Solutions to exercises
8.1 The optimal discriminator strategy
Our goal is to minimize

J^(D)(θ^(D), θ^(G)) = −(1/2) E_{x∼p_data} log D(x) − (1/2) E_{x∼p_model} log(1 − D(x))

in function space, specifying D(x) directly.
We begin by assuming that both p_data and p_model are nonzero everywhere. If we did not make this assumption, some points would never be visited during training and would have undefined behavior.
To minimize J^(D) with respect to D, we can write down the functional derivative with respect to a single entry D(x) and set it equal to zero:

(∂/∂D(x)) J^(D) = −(1/2) [ p_data(x)/D(x) − p_model(x)/(1 − D(x)) ] = 0.

Solving this equation, we obtain

D*(x) = p_data(x) / (p_data(x) + p_model(x)).

Estimating this ratio is the key approximation mechanism used by GANs. See figure 35.
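As a quick numerical check of this solution, here is a minimal Python sketch (the density values p_data(x) = 0.7 and p_model(x) = 0.2 at a single point x are assumptions chosen purely for illustration) that grid-searches the per-point discriminator cost and recovers the ratio above:

```python
import numpy as np

# Minimal sketch: at a single point x, the per-point discriminator cost is
# -(1/2) * (p_data(x) * log D(x) + p_model(x) * log(1 - D(x))).
# The density values below are assumed purely for illustration.
p_data, p_model = 0.7, 0.2

d = np.linspace(1e-6, 1 - 1e-6, 1_000_001)          # candidate values of D(x)
cost = -0.5 * (p_data * np.log(d) + p_model * np.log(1.0 - d))

print(d[np.argmin(cost)])           # ~0.777778, found by grid search
print(p_data / (p_data + p_model))  # 0.777777..., the closed-form optimum
```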
8.2 Gradient descent for games
The value function

V(x, y) = xy

is the simplest possible example of a continuous function with a saddle point. It is easiest to understand this game by visualizing the value function in three dimensions, as shown in figure 36.
The three-dimensional visualization shows clearly that there is a saddle point at x = y = 0. This is an equilibrium of the game. We could also have found this point by solving for where the derivatives are zero.
Not every saddle point is an equilibrium; we also require that an infinitesimal perturbation of one player's parameters cannot reduce that player's cost. The saddle point of this game satisfies that requirement. It is something of a pathological equilibrium, because the value function is a constant function of each player's parameter when the other player's parameter is held fixed.
To obtain the trajectory taken by gradient descent, we take the derivatives and find

∂x(t)/∂t = −y(t),
∂y(t)/∂t = x(t).

Differentiating the second of these equations gives

∂²y(t)/∂t² = ∂x(t)/∂t = −y(t).

Differential equations of this form have sinusoids as their basis functions of solutions. Solving for the coefficients that respect the boundary conditions, we obtain

x(t) = x(0) cos(t) − y(0) sin(t),
y(t) = x(0) sin(t) + y(0) cos(t).

These dynamics form a circular orbit, as shown in figure 37. In other words, simultaneous gradient descent with an infinitesimal learning rate orbits forever at the radius where it was initialized, never approaching the equilibrium. At a larger learning rate, simultaneous gradient descent can spiral outward from the equilibrium forever, so it never reaches the equilibrium.
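The following small Python simulation (a sketch whose learning rates and step counts are arbitrary choices for illustration) makes this behavior concrete: with a tiny step size the iterate stays near its initial radius, while a larger step size drives it away from the equilibrium.

```python
import numpy as np

# Sketch: simultaneous gradient steps on V(x, y) = x * y.
# The minimizer updates x with -dV/dx = -y; the maximizer updates y with +dV/dy = x.
def final_radius(lr, steps, x0=1.0, y0=0.0):
    x, y = x0, y0
    for _ in range(steps):
        x, y = x - lr * y, y + lr * x  # simultaneous (not alternating) updates
    return np.hypot(x, y)              # distance from the equilibrium (0, 0)

# Each step multiplies the radius by sqrt(1 + lr**2), so discrete steps always
# drift outward; the drift is negligible only in the limit lr -> 0.
print(final_radius(lr=1e-4, steps=10_000))  # ~1.00005: nearly a circular orbit
print(final_radius(lr=1e-1, steps=10_000))  # ~4e21: spirals far away from (0, 0)
```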
For some games, simultaneous gradient descent does converge, while for others, such as the one in this exercise, it does not. For GANs, there is no theoretical prediction as to whether simultaneous gradient descent should converge or not. Settling this theoretical question, and developing algorithms guaranteed to converge, remain important open research problems.
8.3 Maximum likelihood in the GAN framework
We wish to find a function f such that the expected gradient of

J^(G) = E_{x∼p_g} f(x)

is equal to the expected gradient of D_KL(p_data‖p_g).
First we take the derivative of the KL divergence with respect to a parameter θ:

(∂/∂θ) D_KL(p_data‖p_g) = −E_{x∼p_data} (∂/∂θ) log p_g(x).

We now wish to find the f that will make the derivative of J^(G) equal this expression. We begin by taking the derivative of J^(G):

(∂/∂θ) J^(G) = E_{x∼p_g} f(x) (∂/∂θ) log p_g(x).
To obtain this result, we made two assumptions:
- We assumed that p_g(x) > 0 everywhere, so that we could use the identity p_g(x) = exp(log p_g(x)).
- We assumed that we can use Leibniz's rule to exchange the order of differentiation and integration (specifically, that the function and its derivatives are continuous, and that the function vanishes at infinite values of x).
We see that the derivative of J^(G) comes very close to what we want; the only problem is that the expectation is computed by drawing samples from p_g, while we would like it to be computed by drawing samples from p_data. We can fix this with an importance sampling trick: by setting

f(x) = −p_data(x)/p_g(x),

we reweight the contribution of each generator sample to the gradient, compensating for the fact that it was drawn from the generator rather than from the data.
Note that when constructing J^(G) we must copy p_g into f(x), so that f(x) has zero derivative with respect to the parameters of p_g. Fortunately, this happens naturally if we obtain the value of p_data(x)/p_g(x) from the discriminator.
From section 8.1, we already know that the discriminator estimates the desired ratio. Using some algebra, we can obtain a numerically stable implementation of f(x). If the discriminator is defined to apply a logistic sigmoid at its output layer, with D(x) = σ(a(x)), then f(x) = −exp(a(x)).
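A small numerical check in Python (the density values below are assumptions for illustration) confirms that, at the optimal discriminator, −exp(a(x)) reproduces the target ratio −p_data(x)/p_g(x):

```python
import numpy as np

# Sketch: with the optimal discriminator, D(x) = sigmoid(a(x)) = p_data / (p_data + p_g),
# so the pre-sigmoid activation is a(x) = log(p_data / p_g), and therefore
# f(x) = -exp(a(x)) = -p_data / p_g. The density values are assumed for illustration.
p_data, p_g = 0.6, 0.15

d_star = p_data / (p_data + p_g)            # optimal discriminator output: 0.8
a = np.log(d_star) - np.log(1.0 - d_star)   # logit of D(x), i.e. the activation a(x)

print(-np.exp(a))     # -4.0, the numerically stable form based on a(x)
print(-p_data / p_g)  # -4.0, the target ratio
```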
This exercise is taken from a result shown by Goodfellow (2014). From this exercise, we see that the discriminator estimates a ratio of densities that can be used to calculate a variety of divergences.
9 Conclusion
GANs are generative models that use supervised learning to approximate an intractable cost function, just as Boltzmann machines use Markov chains to approximate their cost, and VAEs use the variational lower bound to approximate their cost. GANs can use this supervised ratio estimation technique to approximate many cost functions, including the KL divergence used for maximum likelihood estimation.
GANs are relatively new and still require some research to reach their full potential. In particular, training GANs requires finding Nash equilibria in high-dimensional, continuous, non-convex games. Researchers should continue to develop better theoretical understanding and better training algorithms for this scenario; success on this front would improve many applications besides GANs.
GANs are crucial to many different state-of-the-art systems for image generation and manipulation, and have great potential for enabling many other applications in the future.
Acknowledgments
The author would like to thank the NIPS organizers for inviting him to present this tutorial. Many thanks also to those who commented on his Twitter and Facebook posts asking which topics would be of interest to the tutorial audience. Thanks also to D. Kingma for helpful discussions regarding the description of VAEs. Thanks to Zhu Xiaohu, Alex Kurakin and Ilya Edrenkin for spotting typographical errors in the manuscript.
References
Abadi, M. and Andersen, D. G. (2016). Learning to protect communications with adversarial neural cryptography. arXiv preprint arXiv:1610.06918.
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.
Bengio, Y., Thibodeau-Laufer, E., Alain, G., and Yosinski, J. (2014). Deep generative stochastic networks trainable by backprop. In ICML’2014.
Brock, A., Lim, T., Ritchie, J. M., and Weston, N. (2016). Neural photo editing with introspective adversarial networks. CoRR, abs/1609.07093.
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., and Abbeel, P. (2016a). InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2172–2180.
Chen, X., Kingma, D. P., Salimans, T., Duan, Y., Dhariwal, P., Schulman, J., Sutskever, I., and Abbeel, P. (2016b). Variational lossy autoencoder. arXiv preprint arXiv:1611.02731.
Deco, G. and Brauer, W. (1995). Higher order statistical decorrelation without information loss. NIPS.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09.
Deng, J., Berg, A. C., Li, K., and Fei-Fei, L. (2010). What does classifying more than 10,000 image categories tell us? In Proceedings of the 11th European Conference on Computer Vision: Part V, ECCV'10, pages 71–84, Berlin, Heidelberg. Springer-Verlag.
Denton, E., Chintala, S., Szlam, A., and Fergus, R. (2015). Deep generative image models using a Laplacian pyramid of adversarial networks. NIPS.
Dinh, L., Krueger, D., and Bengio, Y. (2014). NICE: Non-linear independent components estimation. arXiv:1410.8516.
Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). Density estimation using real NVP. arXiv preprint arXiv:1605.08803.
Donahue, J., Krähenbühl, P., and Darrell, T. (2016). Adversarial feature learning. arXiv preprint arXiv:1605.09782.
Dumoulin, V., Belghazi, I., Poole, B., Lamb, A., Arjovsky, M., Mastropietro, O., and Courville, A. (2016). Adversarially learned inference. arXiv preprint arXiv:1606.00704.
Dziugaite, G. K., Roy, D. M., and Ghahramani, Z. (2015). Training generative neural networks via maximum mean discrepancy optimization. arXiv preprint arXiv:1505.03906.
Edwards, H. and Storkey, A. (2015). Censoring representations with an adversary. arXiv preprint arXiv:1511.05897.
Fahlman, S. E., Hinton, G. E., and Sejnowski, T. J. (1983). Massively parallel architectures for AI: NETL, thistle, and Boltzmann machines. In Proceedings of the National Conference on Artificial Intelligence AAAI-83.
Finn, C. and Levine, S. (2016). Deep visual foresight for planning robot motion. arXiv preprint arXiv:1610.00696.
Finn, C., Christiano, P., Abbeel, P., and Levine, S. (2016a). A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. arXiv preprint arXiv:1611.03852.
Finn, C., Goodfellow, I., and Levine, S. (2016b). Unsupervised learning for physical interaction through video prediction. NIPS.
Frey, B. J. (1998). Graphical models for machine learning and digital communication. MIT Press.
Frey, B. J., Hinton, G. E., and Dayan, P. (1996). Does the wake-sleep algorithm learn good density estimators? In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems 8 (NIPS'95), pages 661–670. MIT Press, Cambridge, MA.
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V. (2015). Domain-adversarial training of neural networks. arXiv preprint arXiv:1505.07818.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org.
Goodfellow, I. J. (2014). On distinguishability criteria for estimating generative models. In International Conference on Learning Representations, Workshops Track.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014a). Explaining and harnessing adversarial examples. CoRR, abs/1412.6572.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014b). Generative adversarial networks. In NIPS’2014.
Gutmann, M. and Hyvärinen, A. (2010). Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of The Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS'10).
Hinton, G. E. (2007). Learning multiple layers of representation. Trends in cognitive sciences, 11(10), 428–434.
Hinton, G. E. and Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, volume 1, chapter 7, pages 282–317. MIT Press, Cambridge.
Hinton, G. E., Sejnowski, T. J., and Ackley, D. H. (1984). Boltzmann machines: Constraint satisfaction networks that learn. Technical Report TR-CMU-CS-84-119, Carnegie-Mellon University, Dept. of Computer Science.
Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
Ho, J. and Ermon, S. (2016). Generative adversarial imitation learning. In Advances in Neural Information Processing Systems, pages 4565–4573.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift.
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2016). Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004.
Jang, E., Gu, S., and Poole, B. (2016). Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144.
Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kingma, D. P. (2013). Fast gradient-based inference with continuous latent variable models in auxiliary form. Technical report, arxiv:1306.0733.
Kingma, D. P., Salimans, T., and Welling, M. (2016). Improving variational inference with inverse autoregressive flow. NIPS.
Ledig, C., Theis, L., Huszar, F., Caballero, J., Aitken, A. P., Tejani, A., Totz, J., Wang, Z., and Shi, W. (2016). Photo-realistic single image super-resolution using a generative adversarial network. CoRR, abs/1609.04802.
Li, Y., Swersky, K., and Zemel, R. S. (2015). Generative moment matching networks. CoRR, abs/1502.02761.
Lotter, W., Kreiman, G., and Cox, D. (2015). Unsupervised learning of visual structure using predictive generative networks. arXiv preprint arXiv:1511.06380.
Maddison, C. J., Mnih, A., and Teh, Y. W. (2016). The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712.
Metz, L., Poole, B., Pfau, D., and Sohl-Dickstein, J. (2016). Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163.
Nguyen, A., Yosinski, J., Bengio, Y., Dosovitskiy, A., and Clune, J. (2016). Plug & play generative networks: Conditional iterative generation of images in latent space. arXiv preprint arXiv:1612.00005.
Nowozin, S., Cseke, B., and Tomioka, R. (2016). f-GAN: Training generative neural samplers using variational divergence minimization. arXiv preprint arXiv:1606.00709.
Odena, A. (2016). Semi-supervised learning with generative adversarial networks. arXiv preprint arXiv:1606.01583.
Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
Pfau, D. and Vinyals, O. (2016). Connecting generative adversarial networks and actor-critic methods. arXiv preprint arXiv:1610.01945.
Radford, A., Metz, L., and Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
Ratliff, L. J., Burden, S. A., and Sastry, S. S. (2013). Characterization and computation of local Nash equilibria in continuous games. In Communication, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on, pages 917–924. IEEE.
Reed, S., van den Oord, A., Kalchbrenner, N., Bapst, V., Botvinick, M., and de Freitas, N. (2016a). Generating interpretable images with controllable structure. Technical report.
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., and Lee, H. (2016b). Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396.
Rezende, D. J. and Mohamed, S. (2015). Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770.
Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In ICML'2014. Preprint: arXiv:1401.4082.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2014). ImageNet Large Scale Visual Recognition Challenge.
Salakhutdinov, R. and Hinton, G. (2009). Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 5, pages 448–455.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. (2016). Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2226–2234.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
Springenberg, J. T. (2015). Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv:1511.06390.
Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. (2015). Striving for simplicity: The all convolutional net. In ICLR.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. (2014). Intriguing properties of neural networks. ICLR, abs/1312.6199.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2015). Rethinking the Inception Architecture for Computer Vision. ArXiv e-prints.
Theis, L., van den Oord, A., and Bethge, M. (2015). A note on the evaluation of generative models. arXiv:1511.01844.
Warde-Farley, D. and Goodfellow, I. (2016). Adversarial perturbations of deep neural networks. In T. Hazan, G. Papandreou, and D. Tarlow, editors, Perturbations, Optimization, and Statistics, chapter 11. MIT Press.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.
Wu, Y., Burda, Y., Salakhutdinov, R., and Grosse, R. (2016). On the quantitative analysis of decoder-based generative models. arXiv preprint arXiv:1611.04273.
Zhang, H., Xu, T., Li, H., Zhang, S., Huang, X., Wang, X., and Metaxas, D. (2016). StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1612.03242.
Zhu, J.-Y., Krähenbühl, P., Shechtman, E., and Efros, A. A. (2016). Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision, pages 597–613. Springer.