Automatic Differentiation, a.k.a. please save me from backprop!

With the release of Tensorflow [1] and Twitter’s torch-autograd [2] (inspired by autograd [3], which was originally written in Python), I think it’s time to take a look at automatic differentiation [4] and why it’s super awesome.

First, I’d like to differentiate between symbolic differentiation (SD) and automatic differentiation (AD). SD is akin to the stuff you did in your calculus class: you take in a math expression and return a math expression. AD is where you take code that computes some function and return code that computes the derivative of that function. Theano [5], for example, uses SD. This is precisely why it’s very difficult to express loops in Theano and why models like RNNs take a long time to compile.
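
To make this concrete, here’s a minimal sketch using the Python autograd library [3]. The function f below is just a made-up example: AD differentiates ordinary code, including a plain Python loop, which is exactly the kind of thing that’s awkward to express symbolically.

```python
# A sketch with the Python autograd library [3]: grad() returns a new
# function that computes the derivative, even though f contains a loop.
import autograd.numpy as np
from autograd import grad

def f(x):
    result = x
    for _ in range(3):              # plain Python loop, no special scan op
        result = np.sin(result) + result ** 2
    return result

df = grad(f)                        # df is ordinary Python code for f'(x)
print(df(0.5))
```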

So why does AD even work?

The theory behind AD is that all numeric computations are ultimately compositions of a finite set of elementary operations (+, -, *, /, exp, log, sin, cos, etc.) [6].

So the idea is: if we can write code that differentiates these basic operations, then when we encounter a complicated numeric computation we can break it down into those elementary ops, differentiate each one, and chain the results together, instead of working out a derivative that encapsulates the entire computation. No more fiddling around with backpropagation!
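
Here’s that idea worked by hand on a made-up function, f(x) = exp(sin(x)) * x. The names v1, v2, v3 are just illustrative intermediate variables, not anything a real AD library prescribes; the point is that each step is an elementary op whose derivative we already know.

```python
# Hand-rolled forward/backward pass for f(x) = exp(sin(x)) * x, built only
# from elementary ops with known derivatives.
import math

def f_and_grad(x):
    # forward pass: evaluate one elementary op at a time
    v1 = math.sin(x)                 # dv1/dx  = cos(x)
    v2 = math.exp(v1)                # dv2/dv1 = exp(v1)
    v3 = v2 * x                      # product of v2 and the input x
    # backward pass: chain the local derivatives together
    # (product rule on v3 = v2 * x: dv3/dx = x * dv2/dx + v2)
    dv2_dx = math.exp(v1) * math.cos(x)
    df_dx = x * dv2_dx + v2
    return v3, df_dx

print(f_and_grad(1.0))               # value and derivative at x = 1
```
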
Ok, let’s tie this back in with Tensorflow and torch-autograd.
So far, there have been two approaches to doing AD: explicit vs. implicit graph construction.

Tensorflow

Construct a graph and have a compilation step that optimizes the graph. Now, to be fair, Tensorflow uses the graph for much more than just AD; for our purposes we’ll just focus on the AD part.
You also can’t write arbitrary code. For example, you can’t use numpy to do computations; you have to use the Python Tensorflow library. This might not be the case if you write in C++, since Tensorflow’s core is a C++ program. Either way, Tensorflow will most likely be used from a higher-level language, so it makes sense to have a language-agnostic API.
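
For example, here’s a minimal sketch against the graph-building Python API that shipped with Tensorflow’s initial release (exact function names may differ in later versions): every line only adds nodes to a graph, tf.gradients adds the backward-pass nodes, and nothing actually runs until Session.run.

```python
# A sketch with Tensorflow's original graph-building Python API: each line
# only adds nodes to a graph; tf.gradients adds the backward-pass nodes,
# and nothing runs until Session.run.
import tensorflow as tf

x = tf.placeholder(tf.float32)
y = tf.sin(x) * x + x * x             # forward graph for y(x)
dy_dx, = tf.gradients(y, [x])         # AD: graph nodes for dy/dx

with tf.Session() as sess:
    print(sess.run(dy_dx, feed_dict={x: 0.5}))
```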

torch-autograd

No compilation. It constructs a tape data structure on the fly that keeps track of computations and how to compute the backward pass (it builds the computation graph for you).
Here we can write arbitrary Torch/Lua code. Unlike Tensorflow, there’s no concern about having a language-agnostic API: if you buy into Torch you buy into Lua, so arbitrary code makes sense here.
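
To illustrate the tape idea, here’s a toy in Python (not torch-autograd’s actual implementation): every operation records a closure describing its backward step while ordinary code runs, and the backward pass just replays the tape in reverse.

```python
# A toy tape (not torch-autograd's real code): every op appends a closure
# that knows how to propagate gradients backward; all Vars share one tape.
class Var:
    def __init__(self, value, tape):
        self.value, self.grad, self.tape = value, 0.0, tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        def backward():                       # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        self.tape.append(backward)
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        def backward():                       # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        self.tape.append(backward)
        return out

def backprop(output):
    output.grad = 1.0
    for step in reversed(output.tape):        # replay the tape in reverse
        step()

tape = []
x, y = Var(3.0, tape), Var(4.0, tape)
z = x * y + x                                 # ordinary-looking expression
backprop(z)
print(x.grad, y.grad)                         # 5.0 3.0
```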

So which approach is better?

I think both are right for each project’s goals. Also, it doesn’t really matter! We should just be happy that AD is taking off and that we can avoid the dreaded friction of calculating the backward pass ourselves. Let the computer do the dirty work for you!
Sources:
[1] http://tensorflow.org/
[2] https://github.com/twitter/torch-autograd
[3] https://github.com/HIPS/autograd
[4] https://en.wikipedia.org/wiki/Automatic_differentiation
[5] http://deeplearning.net/software/theano/
[6] http://arxiv.org/abs/1502.05767

