Reference: https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html
torch.autograd.Function
- Given random x, y, W1, and W2, we predict y from the input x as y_pred = ReLU(x · W1) · W2 and fit W1 and W2 by gradient descent, minimizing the squared Euclidean distance between y_pred and y.
- We define our own ReLU by implementing both its forward pass and its backward pass.
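For reference, the model, the loss, and the ReLU backward rule used in the code can be written out as follows. The symbols u and v (the input and output of ReLU) are notation introduced here for illustration and do not appear in the original tutorial. The indicator uses u_i >= 0 because the code only zeroes the gradient where the input is strictly negative.

```latex
\begin{aligned}
y_{\text{pred}} &= \operatorname{ReLU}(x W_1)\, W_2, \\
L &= \lVert y_{\text{pred}} - y \rVert_2^2 = \sum_{n,k} \bigl(y_{\text{pred},nk} - y_{nk}\bigr)^2, \\
v_i &= \max(u_i, 0), \qquad
\frac{\partial L}{\partial u_i} = \frac{\partial L}{\partial v_i} \cdot \mathbf{1}[u_i \ge 0].
\end{aligned}
```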
import torch
class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

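One way to gain confidence in a hand-written backward pass is to compare it against finite differences with torch.autograd.gradcheck. The sketch below is not part of the original tutorial; it repeats the class so that it runs on its own, and the input shape, the seed, and the 0.1 margin around zero are arbitrary choices made for this illustration.

```python
import torch
from torch.autograd import gradcheck


class MyReLU(torch.autograd.Function):
    """Same custom ReLU as above, repeated so this snippet is self-contained."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


torch.manual_seed(0)
# gradcheck compares analytical gradients with finite differences, so it
# needs double precision. ReLU has a kink at 0, so every entry is pushed at
# least 0.1 away from it (an arbitrary margin chosen for this sketch).
x = torch.randn(8, 5, dtype=torch.double)
x = torch.where(x >= 0, x + 0.1, x - 0.1).requires_grad_(True)

# gradcheck returns True on success and raises an error on a mismatch.
gradcheck_passed = gradcheck(MyReLU.apply, (x,), eps=1e-6, atol=1e-4)
# The forward pass should also agree with the built-in ReLU.
matches_builtin = torch.equal(MyReLU.apply(x), torch.relu(x))
print(gradcheck_passed, matches_builtin)
```

Keeping the inputs away from zero matters because finite differences straddle the kink there and would disagree with any choice of subgradient.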
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# torch.backends.cuda.matmul.allow_tf32 = False # Uncomment this to run on GPU
# The above line disables TensorFloat32. This is a feature that allows
# networks to run at a much faster speed while sacrificing precision.
# Although TensorFloat32 works well on most real models, for our toy model
# in this tutorial, the sacrificed precision causes convergence issues.
# For more information, see:
# https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
lr = 1e-6
relu = MyReLU.apply
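for i in range(500):
    # Forward pass: apply the custom ReLU between the two matrix products.
    y_pred = relu(x.mm(w1)).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    if i % 100 == 99:
        print(i, loss.item())
    loss.backward()
    # Parameters are normally updated with `optim.step()`, which updates the
    # `model.parameters()` registered with the optimizer `optim`.
    # Since no optimizer is used here, we update the weights manually. No new
    # gradients need to be computed at this point; we only apply the gradients
    # that backward() has already produced.
    with torch.no_grad():
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad
        # Reset the gradients so they do not accumulate into the next iteration.
        w1.grad.zero_()
        w2.grad.zero_()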
Output:
99 952.6715087890625
199 6.376166820526123
299 0.06997707486152649
399 0.0012868450721725821
499 0.00012174161383882165
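The comments in the training loop note that parameters are usually updated through an optimizer rather than by hand. The following sketch, which is not from the original tutorial, runs a few steps both ways from the same starting weights to illustrate that torch.optim.SGD without momentum performs the same update. The names loss_fn, w1_init, w2_init, v1, and v2, as well as the seed and the step count, are hypothetical choices made for this example.

```python
import torch


class MyReLU(torch.autograd.Function):
    """Same custom ReLU as above, repeated so this snippet is self-contained."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


def loss_fn(x, y, w1, w2):
    # Squared Euclidean distance between the prediction and the target.
    return (MyReLU.apply(x.mm(w1)).mm(w2) - y).pow(2).sum()


torch.manual_seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10
x, y = torch.randn(N, D_in), torch.randn(N, D_out)
w1_init, w2_init = torch.randn(D_in, H), torch.randn(H, D_out)
lr, steps = 1e-6, 5

# Variant 1: manual update, as in the loop above.
w1 = w1_init.clone().requires_grad_(True)
w2 = w2_init.clone().requires_grad_(True)
for _ in range(steps):
    loss_fn(x, y, w1, w2).backward()
    with torch.no_grad():
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

# Variant 2: plain SGD (no momentum) performs the update and the zeroing.
v1 = w1_init.clone().requires_grad_(True)
v2 = w2_init.clone().requires_grad_(True)
optimizer = torch.optim.SGD([v1, v2], lr=lr)
for _ in range(steps):
    optimizer.zero_grad()
    loss_fn(x, y, v1, v2).backward()
    optimizer.step()

# The two variants should agree up to floating-point rounding.
same_weights = torch.allclose(w1, v1, atol=1e-5) and torch.allclose(w2, v2, atol=1e-5)
print(same_weights)
```

The optimizer version scales better once a model has many parameters, since optimizer.zero_grad() and optimizer.step() replace the per-tensor bookkeeping.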