Classification is one of the most basic tasks in CV, so it naturally comes with a classification loss. In code you generally see one of two styles, and today I want to understand them properly:
First style:
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
Second style:
output = F.log_softmax(x, dim=1)
loss = F.nll_loss(output, target)
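To make the comparison concrete, here is a minimal sketch of my own (the linear layer, feature size and class count are made up purely for illustration) showing where each style sits; both give the same value on the same logits:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 20)                 # a fake batch: 4 samples, 20 features
target = torch.randint(0, 10, (4,))    # fake class-index labels in [0, 10)
fc = nn.Linear(20, 10)                 # stand-in for the last fully connected layer

# First style: forward() returns raw logits, CrossEntropyLoss applies log_softmax + NLL internally
logits = fc(x)
criterion = nn.CrossEntropyLoss()
loss1 = criterion(logits, target)

# Second style: forward() ends with log_softmax, nll_loss is applied outside
log_probs = F.log_softmax(fc(x), dim=1)
loss2 = F.nll_loss(log_probs, target)

print(loss1, loss2)                    # the two values are identical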
1.1 torch.nn.functional.log_softmax()
The full signature of this function is:
torch.nn.functional.log_softmax(input, dim=None, _stacklevel=3, dtype=None)
I ran into this function recently while writing a classification task from scratch: in the MNIST training script of the official PyTorch examples, the model's forward() suddenly calls it, whereas I remembered nothing coming after the final fully connected layer (from here).
Under torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None) the docs have this note:
This function doesn’t work directly with NLLLoss, which expects the Log to be computed between the Softmax and itself. Use log_softmax instead (it’s faster and has better numerical properties).
In other words, NLLLoss expects its input to be the result of a softmax followed by a log, which is why log_softmax() is used here and F.nll_loss is applied afterwards. On top of that, log_softmax() is faster and more numerically stable than composing the two operations, and its dtype parameter lets you cast the input to a different data type to avoid overflow.
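To see the numerical-stability point for yourself, here is a tiny test of my own (the values are arbitrary): composing log() with softmax() underflows to -inf when a probability rounds to zero, while log_softmax() stays finite, and dtype can upcast the computation:

import torch
import torch.nn.functional as F

x = torch.tensor([[0., 1000.]])

# log(softmax(x)): the first probability underflows to 0, so log() returns -inf
print(torch.log(F.softmax(x, dim=1)))          # tensor([[-inf, 0.]])

# log_softmax computes the same quantity in one fused, numerically stable op
print(F.log_softmax(x, dim=1))                 # tensor([[-1000., 0.]])

# dtype casts the input before the op, e.g. to do the computation in double precision
print(F.log_softmax(x, dim=1, dtype=torch.float64))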
Here is a test of softmax and log_softmax:
>>> import torch
>>> import torch.nn.functional as F
>>> a = torch.tensor([[1, 1, 1], [2, 2, 2]], dtype=torch.float)
>>> a
tensor([[1., 1., 1.],
        [2., 2., 2.]])
>>> F.softmax(a)
tensor([[0.3333, 0.3333, 0.3333],
        [0.3333, 0.3333, 0.3333]])
>>> F.softmax(a, dim=0)
tensor([[0.2689, 0.2689, 0.2689],
        [0.7311, 0.7311, 0.7311]])
# It looks like softmax effectively defaults to dim=1 here (for this 2-D input), not None as the official docs state
>>> F.log_softmax(a, dim=1)
tensor([[-1.0986, -1.0986, -1.0986],
        [-1.0986, -1.0986, -1.0986]])
>>> F.log_softmax(a, dim=0)
tensor([[-1.3133, -1.3133, -1.3133],
        [-0.3133, -0.3133, -0.3133]])
# Calling log_softmax() without specifying dim raises a warning; this function is just softmax and log fused into one op.
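As a quick sanity check of the "two-in-one" claim, here is my own continuation of the session above, reusing the same tensor a:

>>> torch.allclose(torch.log(F.softmax(a, dim=1)), F.log_softmax(a, dim=1))
True
>>> torch.allclose(torch.log(F.softmax(a, dim=0)), F.log_softmax(a, dim=0))
True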
One more aside: torch.nn.Softmax(dim=None) and torch.nn.LogSoftmax(dim=None) do exactly the same thing as F.softmax() and F.log_softmax(); a look at the source shows the module simply calls the functional version (and likewise for LogSoftmax):
class Softmax(Module):
    r"""Applies the Softmax function to an n-dimensional input Tensor
    rescaling them so that the elements of the n-dimensional output Tensor
    lie in the range [0,1] and sum to 1.

    Softmax is defined as:

    .. math::
        \text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}

    Shape:
        - Input: :math:`(*)` where `*` means, any number of additional
          dimensions
        - Output: :math:`(*)`, same shape as the input

    Returns:
        a Tensor of the same dimension and shape as the input with
        values in the range [0, 1]

    Arguments:
        dim (int): A dimension along which Softmax will be computed (so every slice
            along dim will sum to 1).

    .. note::
        This module doesn't work directly with NLLLoss,
        which expects the Log to be computed between the Softmax and itself.
        Use `LogSoftmax` instead (it's faster and has better numerical properties).

    Examples::

        >>> m = nn.Softmax(dim=1)
        >>> input = torch.randn(2, 3)
        >>> output = m(input)
    """
    __constants__ = ['dim']

    def __init__(self, dim=None):
        super(Softmax, self).__init__()
        self.dim = dim

    def __setstate__(self, state):
        self.__dict__.update(state)
        if not hasattr(self, 'dim'):
            self.dim = None

    def forward(self, input):
        return F.softmax(input, self.dim, _stacklevel=5)

    def extra_repr(self):
        return 'dim={dim}'.format(dim=self.dim)
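And a quick check of my own that the module and functional forms really return the same thing (again reusing the tensor a from the earlier session):

>>> m = torch.nn.Softmax(dim=1)
>>> torch.equal(m(a), F.softmax(a, dim=1))
True
>>> lm = torch.nn.LogSoftmax(dim=1)
>>> torch.equal(lm(a), F.log_softmax(a, dim=1))
True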
1.2 torch.nn.functional.nll_loss()
The full signature of this function is:
torch.nn.functional.nll_loss(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
That is, the negative log likelihood loss; with $q$ the predicted distribution and $p$ the ground-truth distribution, the formula looks like this:

$$\text{loss}(p, q) = -\sum_{c} p_c \log q_c$$

Here $q$ is the per-class probability distribution predicted by the network, and $p$ is the ground-truth distribution, i.e. the so-called target label. F.log_softmax() above computes exactly $\log q$, which is one of the two inputs of F.nll_loss() (in the docs' words: "The input given through a forward call is expected to contain log-probabilities of each class"). The other input is the label: the label we pass in is a class index (printing the labels of the MNIST and CIFAR10 datasets that ship with PyTorch confirms this), and F.nll_loss() effectively treats it as a one-hot encoding during the computation (in the docs' words: "The target that this loss expects should be a class index in the range [0, C-1] where C = number of classes"). So this is really just a cross-entropy loss, the same as the other loss function discussed below (for more on cross-entropy see this blog post; some relations are also given at the end). Let's look at a few examples:
>>> a = torch.tensor([[10,10,10,10,10],[20,20,20,20,20]], dtype=torch.float)
>>> target = torch.tensor([0, 0])
>>> loss = F.nll_loss(F.log_softmax(a), target)
>>> loss
tensor(1.6094)
>>> loss = F.nll_loss(F.log_softmax(a), target,size_average=False)
Warning (from warnings module):
File "E:\Miniconda3\lib\site-packages\torch\nn\_reduction.py", line 49
warnings.warn(warning.format(ret))
UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
>>> loss
tensor(3.2189)
>>> loss = F.nll_loss(F.log_softmax(a), target,reduction='sum')
>>> loss
tensor(3.2189)
>>> loss = F.nll_loss(F.log_softmax(a), target,reduction='mean')
>>> loss
tensor(1.6094)
>>> loss = F.nll_loss(F.log_softmax(a), target,reduction='none')
>>> loss
tensor([1.6094, 1.6094])
Note: avoid the size_average and reduce arguments of F.nll_loss(); use the reduction argument instead. With reduction='sum' the per-example losses in the batch are summed, and with reduction='mean' they are averaged.
The computation above can be broken down step by step, as in the sketch below:
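Roughly, nll_loss picks out the log-probability at the target index of each row, negates it, and then applies the chosen reduction. A small sketch of my own, reusing a and target from the session above:

log_probs = F.log_softmax(a, dim=1)            # shape (2, 5); every entry is -1.6094 here
picked = log_probs[torch.arange(2), target]    # log-probability of the target class for each example
per_example = -picked                          # tensor([1.6094, 1.6094])  == reduction='none'
print(per_example.sum())                       # tensor(3.2189)            == reduction='sum'
print(per_example.mean())                      # tensor(1.6094)            == reduction='mean'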
1.3 torch.nn.CrossEntropyLoss()
The full signature of this function is:
torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
Of course there is also a corresponding functional form, F.cross_entropy(). Judging by the parameters, it does roughly the same job as F.nll_loss() above, so why does this function exist as well?
Because, in the docs' own words: This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.
That is, you can pass the raw logits and the target straight in to compute the cross-entropy loss (logits meaning the network's direct output, without any softmax or log applied, so there is nothing extra to write in forward(); this is exactly the first style shown at the beginning). Everything else works the same as above, so I won't repeat it here.
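As a quick check of my own, feeding the raw logits a and the target from the nll_loss examples above straight into CrossEntropyLoss / F.cross_entropy reproduces the same numbers:

>>> criterion = torch.nn.CrossEntropyLoss()
>>> criterion(a, target)
tensor(1.6094)
>>> F.cross_entropy(a, target)
tensor(1.6094)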
Bonus
To sum up, in your own code just pick one of the two styles. Below are a few extra notes on cross-entropy:
Entropy is a concept from information theory, and cross-entropy measures the "distance" between two distributions.
Entropy:

$$H(p) = -\sum_i p_i \log p_i$$

CrossEntropy:

$$H(p, q) = -\sum_i p_i \log q_i$$

Relation between Entropy and CrossEntropy:

$$H(p, q) = H(p) + D_{KL}(p \parallel q)$$

When the two distributions are equal, $p = q$: $D_{KL}(p \parallel q) = 0$, i.e. $H(p, q) = H(p)$.
For one-hot encoding, $H(p) = 0$, so here $H(p, q) = D_{KL}(p \parallel q)$; in this case the learning objective is simply to bring the two distributions as close together as possible.
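A tiny numeric check of these relations (the distributions are made up by me): with a one-hot p, H(p) = 0, so the cross-entropy collapses to -log of the predicted probability at the true class, which is exactly what the NLL loss computes:

import torch

p = torch.tensor([0., 1., 0.])            # one-hot ground truth, so H(p) = 0
q = torch.tensor([0.2, 0.7, 0.1])         # a made-up predicted distribution
H_pq = -(p * torch.log(q)).sum()          # H(p, q) = -sum_i p_i * log(q_i)
print(H_pq)                               # tensor(0.3567), i.e. -log(0.7)
# Since H(p) = 0 for one-hot p, H(p, q) here also equals the KL divergence D_KL(p || q).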
Note
Targets are generally of type LongTensor, so when you write your own Dataset subclass, remember to cast the returned target to int64 (i.e. a LongTensor); class-index targets of a floating-point type will make nll_loss / CrossEntropyLoss complain.
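For example, a minimal Dataset sketch (the class name and fields are hypothetical, just for illustration) that returns the target as int64:

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):                  # hypothetical dataset for illustration
    def __init__(self, samples, labels):
        self.samples = samples             # e.g. a list/array of images
        self.labels = labels               # e.g. a list of integer class indices

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x = torch.as_tensor(self.samples[idx], dtype=torch.float32)
        # class-index targets for nll_loss / CrossEntropyLoss must be int64 (LongTensor)
        y = torch.as_tensor(self.labels[idx], dtype=torch.int64)
        return x, y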