I. Foreword

Please do not repost without permission, thanks~

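PyTorch has released version 0.4, and it differs from 0.3 in quite a few ways. So I went through the material on the official site and am organizing and recording it here, both to keep my own future code consistent and as a reference for everyone else. ヾ(◍°∇°◍)ﾉﾞ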

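I only picked out the things I consider important or useful right now, so not everything is covered here. For the complete version, scroll to the bottom of the article and follow the link under References~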

II. Core Changes

2.1 Main changes

Tensor and Variable have been merged
Some operations can now return 0-dimensional tensors (scalars)
The volatile flag has been deprecated

1 Merging the Tensor and Variable classes

In my view this is the biggest and most important change.

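torch.autograd.Variable and torch.Tensor are now the same class.
More precisely, torch.Tensor can now track history just like the old Variable did, so from now on we can simply call everything a Tensor.
Variable still works as before, but the object it returns is of type torch.Tensor.
This also means you no longer need to wrap tensors in Variable in your code.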
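To make the merge concrete, here is a small sketch of my own. It assumes a recent PyTorch install and compares the old Variable-wrapping style with a plain tensor. The names old_style and new_style are only illustrative.

```python
import torch
from torch.autograd import Variable  # legacy name, still importable for compatibility

# 0.3 style: wrap a tensor in a Variable to get autograd support
old_style = Variable(torch.ones(3), requires_grad=True)

# 0.4+ style: a plain tensor tracks gradients by itself
new_style = torch.ones(3, requires_grad=True)

# Both are ordinary torch.Tensor objects
print(type(old_style).__name__, type(new_style).__name__)

# ...and both take part in autograd in exactly the same way
(old_style * 2).sum().backward()
(new_style * 2).sum().backward()
print(old_style.grad, new_style.grad)
```

In new code, the second form is all you need. The Variable import only matters when you are keeping old code running.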

2 The way to get a Tensor's type has changed

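Before: type(x)
Now: x.type()
type(x) now returns the generic torch.Tensor class for every tensor, so it no longer tells you the specific tensor type. isinstance() still works, as shown below: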

```python
>>> import torch
>>> x = torch.DoubleTensor([1, 1, 1])
>>> print(type(x))  # was torch.DoubleTensor
<class 'torch.Tensor'>
>>> print(x.type())  # OK: 'torch.DoubleTensor'
torch.DoubleTensor
>>> print(isinstance(x, torch.DoubleTensor))  # OK: True
True
```
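If you need the type information in your own code, a small helper can put type(x), x.type() and isinstance() side by side. The describe() function below is hypothetical and is not a PyTorch API. It is only a sketch, assuming a recent PyTorch version:

```python
import torch

def describe(t):
    """Hypothetical helper: return a tensor's Python class name and its type string."""
    return type(t).__name__, t.type()

x = torch.DoubleTensor([1, 1, 1])
print(describe(x))                        # ('Tensor', 'torch.DoubleTensor')
print(isinstance(x, torch.DoubleTensor))  # True
print(isinstance(x, torch.FloatTensor))   # False
```

The class name is the same for every tensor. The specific type lives in the type string, which is why x.type() or isinstance() is what you should check.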

3 Changes to how autograd is used

requires_grad, the flag that controls automatic differentiation, is now an attribute of Tensor.
Here is an example of how it is used:

```python
>>> import torch
>>> x = torch.ones(1)  # create a tensor with requires_grad=False (default)
>>> x.requires_grad
False
>>> y = torch.ones(1)  # another tensor with requires_grad=False
>>> z = x + y
>>> # both inputs have requires_grad=False. so does the output
>>> z.requires_grad
False
>>> # then autograd won't track this computation. let's verify!
>>> z.backward()
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
>>>
>>> # now create a tensor with requires_grad=True
>>> w = torch.ones(1, requires_grad=True)
>>> w.requires_grad
True
>>> # add to the previous result that has requires_grad=False
>>> total = w + z
>>> # the total sum now requires grad!
>>> total.requires_grad
True
>>> # autograd can compute the gradients as well
>>> total.backward()
>>> w.grad
tensor([ 1.])
>>> # and no computation is wasted to compute gradients for x, y and z, which don't require grad
>>> z.grad == x.grad == y.grad == None
True
```

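From the example above we can see the following:

By default, a newly created tensor does not require gradients;
if none of the inputs requires gradients, the output does not require them either;
if you want a tensor's gradient to be computed automatically, pass requires_grad=True when you create it;
if at least one of the inputs requires gradients, the output requires them too;
tensors that were not marked with requires_grad get no gradient during backpropagation: their .grad stays None, so no computation is wasted on them.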
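The rules above can be checked with a short sketch. x, w and the two outputs are illustrative names, and the expected values follow directly from the rules just listed (assuming a recent PyTorch version):

```python
import torch

x = torch.ones(2)                      # requires_grad=False by default
w = torch.ones(2, requires_grad=True)  # explicitly tracked

out_plain = x * 3          # no input requires grad
out_mixed = (w * x).sum()  # at least one input requires grad

print(out_plain.requires_grad)  # False
print(out_mixed.requires_grad)  # True

out_mixed.backward()
print(w.grad)  # d(sum(w * x)) / dw = x
print(x.grad)  # None: x was never tracked
```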

Besides specifying it when a tensor is created, you can also set the requires_grad attribute of an existing tensor with the in-place method requires_grad_().

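```python
>>> import torch
>>> existing_tensor = torch.ones(3)  # created with requires_grad=False
>>> _ = existing_tensor.requires_grad_()  # in-place; also returns the tensor itself
>>> existing_tensor.requires_grad
True
>>> my_tensor = torch.zeros(3, 4, requires_grad=True)
>>> my_tensor.requires_grad
True
```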
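A common use of requires_grad_() is to switch gradient tracking on or off for a tensor that already exists. The sketch below assumes a recent PyTorch version, and weights is a hypothetical name. It turns tracking on, computes a gradient, and then turns tracking off again by passing False:

```python
import torch

torch.manual_seed(0)
weights = torch.randn(3)      # created without gradient tracking
print(weights.requires_grad)  # False

weights.requires_grad_()      # in-place; the argument defaults to True
loss = (weights ** 2).sum()
loss.backward()
print(torch.allclose(weights.grad, 2 * weights))  # True: d(sum(w^2))/dw = 2w

weights.requires_grad_(False)  # switch tracking off again
print(weights.requires_grad)   # False
```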

4 About `.data`

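Previously, .data was used to get the Tensor out of a Variable.
After the merge there is a similar usage: with y = x.data, y is a tensor that shares the same data as x and has requires_grad=False.
In some cases .data is unsafe, because any change made through x.data is not tracked by autograd. If x is needed during backpropagation, the computed gradients can then be silently wrong.
The safer option is x.detach(). It also returns a tensor that shares data with x and does not require gradients, but in-place changes made to it are still seen by autograd, which raises an error during backward instead of producing wrong gradients.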
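The difference is easiest to see with a function whose backward pass reuses its own output, such as sigmoid. Below is a sketch, assuming a recent PyTorch version. The helper grad_after_inplace_zero is hypothetical and only wraps the two variants so they can be compared:

```python
import torch

def grad_after_inplace_zero(use_detach):
    """Zero the sigmoid output via .detach() or .data, then try to backpropagate."""
    a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    out = a.sigmoid()          # sigmoid's backward reuses `out` itself
    c = out.detach() if use_detach else out.data
    c.zero_()                  # in-place change; c shares storage with `out`
    try:
        out.sum().backward()
        return a.grad          # only reached when autograd did not notice the change
    except RuntimeError as err:
        return err

print(grad_after_inplace_zero(use_detach=False))  # tensor([0., 0., 0.]), silently wrong
print(grad_after_inplace_zero(use_detach=True))   # a RuntimeError about an in-place modification
```

With .data the gradient comes out as zeros because out was overwritten behind autograd's back. With .detach() autograd notices the in-place change and refuses to compute a wrong result.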

5 Support for 0-dimensional scalars

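Previously, PyTorch expanded every scalar into a 1-dimensional tensor of size (1,).
Scalars (0-dimensional tensors) are now supported.
Scalar and vector usage is shown below: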

```python
>>> import torch
>>> torch.tensor(3.1416)         # create a scalar directly
tensor(3.1416)
>>> torch.tensor(3.1416).size()  # scalar is 0-dimensional
torch.Size([])
>>> torch.tensor([3]).size()     # compare to a vector of size 1
torch.Size([1])
>>>
>>> vector = torch.arange(2, 6)  # this is a vector
>>> vector
tensor([ 2.,  3.,  4.,  5.])
>>> vector.size()
torch.Size([4])
>>> vector[3]                    # indexing into a vector gives a scalar
tensor(5.)
>>> vector[3].item()             # .item() gives the value as a Python number
5.0
>>> total = torch.tensor([2, 3]).sum()  # reductions return scalars too
>>> total
tensor(5)
>>> total.size()
torch.Size([])
```

So when accumulating the loss in a neural network, change the old total_loss += loss.data[0] to total_loss += loss.item().
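Below is a minimal training-loop sketch of this change. The model, loss function and random batches are placeholders I made up for illustration, not part of the original post. It assumes a recent PyTorch version:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)       # hypothetical toy model
criterion = torch.nn.MSELoss()
# Hypothetical data: 5 mini-batches of random inputs and targets
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]

total_loss = 0.0
for inputs, targets in batches:
    loss = criterion(model(inputs), targets)  # a 0-dim tensor
    # 0.3 style was: total_loss += loss.data[0]  (indexing a 0-dim tensor; an error in newer releases)
    total_loss += loss.item()                 # plain Python float

print(type(total_loss).__name__, loss.dim())  # float 0
```

Using .item() also matters for memory. If you accumulated the tensor itself (total_loss += loss), total_loss would become a tensor that keeps a reference to every iteration's autograd graph.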

6 Restricted use of the `volatile` flag

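Previously, passing volatile=True to a Variable stopped autograd from computing gradients.
This flag is now deprecated, and setting it has no effect.
PyTorch replaces it with more flexible context managers, torch.no_grad() and torch.set_grad_enabled(), used as follows: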

```python
>>> import torch
>>> x = torch.zeros(1, requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>>
>>> is_train = False
>>> with torch.set_grad_enabled(is_train):
...     y = x * 2
>>> y.requires_grad
False
>>> torch.set_grad_enabled(True)  # this can also be used as a function
>>> y = x * 2
>>> y.requires_grad
True
>>> torch.set_grad_enabled(False)
>>> y = x * 2
>>> y.requires_grad
False
```
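In practice, volatile=True was mostly used for evaluation. Here is a minimal sketch of the replacement pattern. It assumes a toy nn.Linear model and a hypothetical run() helper that enables gradients only in training mode:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 2)   # hypothetical toy model
inputs = torch.randn(4, 3)

def run(model, inputs, is_train):
    # Replaces the 0.3 idiom Variable(inputs, volatile=not is_train)
    with torch.set_grad_enabled(is_train):
        return model(inputs)

train_out = run(model, inputs, is_train=True)
eval_out = run(model, inputs, is_train=False)
print(train_out.requires_grad, eval_out.requires_grad)  # True False

with torch.no_grad():           # typical inference block
    preds = model(inputs).argmax(dim=1)
print(preds.requires_grad)      # False
```

Both context managers restore the previous grad mode on exit, so gradients are enabled again after each with block.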

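Emmmm, that covers the main changes. A few others are not included here, so if you are interested, check out the original linked at the bottom. There is nothing else that especially needs explaining, and a pure translation felt like a bit of a waste of time~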

References

PyTorch v0.4.0 release notes: "Trade-off memory for compute, Windows support, 24 distributions with cdf, variance etc., dtypes, zero-dimensional Tensors, Tensor-Variable merge, faster distributed, perf and bug fixes, CuDNN 7.1"

PyTorch | An Incomplete Migration Guide from 0.3 to 0.4