1 What is the logits parameter when computing the loss function?
Computing the loss function involves two commonly used TensorFlow functions:
tf.nn.sigmoid_cross_entropy_with_logits(labels, logits)
tf.nn.softmax_cross_entropy_with_logits_v2(labels, logits)
# At the time of writing, softmax_cross_entropy_with_logits is explicitly marked as deprecated and not recommended
When I first used TensorFlow, I naively assumed that logits meant the predicted values (which I usually call y_hat). Not so: the logits argument is the raw output-layer value (the activation before sigmoid/softmax is applied), which typically relates to the prediction as:
y_hat = tf.nn.sigmoid(activation)
y_hat = tf.nn.softmax(activation)
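A minimal TF1-style sketch of the distinction (the single-layer model and variable names here are my own illustration, not code from this post):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])      # e.g. flattened MNIST images
labels = tf.placeholder(tf.float32, [None, 10])  # one-hot labels
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

activation = tf.matmul(x, W) + b    # the raw output-layer value: this is "logits"
y_hat = tf.nn.softmax(activation)   # the prediction; do NOT feed this to the loss

# Pass the raw activation as logits; the function applies softmax internally.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=activation))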
2 With sigmoid as the output-layer activation, the loss function cannot drop to 0
This is most likely because your labels are not 0 or 1 but values strictly in between. By the definition of the cross-entropy loss, it equals 0 only when the prediction (the sigmoid output) and the label are both 0 or both 1. So as long as the labels are not exactly 0 or 1, the loss cannot reach 0.
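A quick numeric check of this claim (a NumPy sketch with a made-up soft label): for a label y strictly between 0 and 1, the cross-entropy is minimized when the prediction equals y, and that minimum is the label's own entropy, which is strictly positive.

import numpy as np

def binary_cross_entropy(y, y_hat):
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = 0.3                                  # a soft label, neither 0 nor 1
y_hat = np.linspace(0.01, 0.99, 99)      # candidate predictions
losses = binary_cross_entropy(y, y_hat)

print(y_hat[np.argmin(losses)])  # ~0.3: the best possible prediction is the label itself
print(losses.min())              # ~0.61 > 0: the loss is floored at the label's entropy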
3 How to use MNIST
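A minimal sketch, assuming the classic TF1 tutorial loader used by experiments like the one in the next section (the directory name is just a convention):

from tensorflow.examples.tutorials.mnist import input_data

# Downloads the dataset into ./MNIST_data/ on first use.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

batch_xs, batch_ys = mnist.train.next_batch(100)  # 100 flattened images + one-hot labels
print(batch_xs.shape, batch_ys.shape)             # (100, 784) (100, 10)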
4 Which dimension does tf.nn.softmax_cross_entropy_with_logits_v2 treat as the class dimension?
In an MNIST experiment, I put the class dimension first (dim=0) and used tf.nn.softmax_cross_entropy_with_logits_v2 to compute the cross-entropy. The result:
Epoch 1: cost=1611.621314344
Epoch 6: cost=17532.561093204
Epoch 11: cost=33408.859425080
Clearly, the model does not converge. A few hours later I found the problem: tf.nn.softmax_cross_entropy_with_logits_v2 assumes by default that the class dimension is the last one (dim=-1). Consulting the documentation:
tf.nn.softmax_cross_entropy_with_logits_v2(
_sentinel=None,
labels=None,
logits=None,
dim=-1,
name=None
)
Args:
- _sentinel: Used to prevent positional parameters. Internal, do not use.
- labels: Each vector along the class dimension should hold a valid probability distribution e.g. for the case in which labels are of shape [batch_size, num_classes], each row of labels[i] must be a valid probability distribution.
- logits: Unscaled log probabilities.
- dim: The class dimension. Defaulted to -1 which is the last dimension.
- name: A name for the operation (optional).
From the documentation we learn that the dim argument specifies the class dimension. So when using this function, remember to set dim. After adding dim=0, the experiment results:
Single-layer softmax
Epoch 1: cost=1.747955866
Epoch 6: cost=0.334705674
Epoch 11: cost=0.291128567
Epoch 16: cost=0.271443650
Epoch 21: cost=0.266406643
Epoch 26: cost=0.259507637
Epoch 31: cost=0.255616793
Epoch 36: cost=0.252413219
Epoch 41: cost=0.254052425
Epoch 46: cost=0.254678538
Optimization Finished!
Three-layer softmax
Epoch 1: cost=9.467425473
Epoch 6: cost=2.299335675
Epoch 11: cost=1.818574688
Epoch 16: cost=0.992218607
Epoch 21: cost=0.570668126
Epoch 26: cost=0.229845069
Epoch 31: cost=0.150805521
Epoch 36: cost=0.119380749
Epoch 41: cost=0.101064101
Epoch 46: cost=0.082242706
Optimization Finished!
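For reference, a minimal sketch of the corrected call (the shapes and placeholder names are my own illustration, not the original experiment's code):

import tensorflow as tf

num_classes, batch_size = 10, 128
# Hypothetical layout with the class dimension FIRST: shape [num_classes, batch_size].
logits = tf.placeholder(tf.float32, [num_classes, batch_size])
labels = tf.placeholder(tf.float32, [num_classes, batch_size])

# With the default dim=-1, the softmax would wrongly be taken over the batch axis;
# dim=0 tells the function that axis 0 holds the classes.
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits, dim=0))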
May your models always converge~
5 What does tf.argmax do?
From the official documentation:
tf.argmax(
input,
axis=None,
name=None,
dimension=None,
output_type=tf.int64
)
Returns the index with the largest value across axes of a tensor. (deprecated arguments)
Args:
- input: A Tensor. Must be one of the following types: float32, float64, int32, uint8, int16, int8, complex64, int64, qint8, quint8, qint32, bfloat16, uint16, complex128, half, uint32, uint64.
- axis: A Tensor. Must be one of the following types: int32, int64. Must be in the range [-rank(input), rank(input)). Describes which axis of the input Tensor to reduce across. For vectors, use axis = 0.
- output_type: An optional tf.DType from: tf.int32, tf.int64. Defaults to tf.int64.
- name: A name for the operation (optional).
From the documentation we learn that tf.argmax returns the indices of the maximum values along a given direction of a tensor, reducing the tensor's rank by one. The second argument, axis, specifies the dimension along which the operation runs, i.e., the dimension that is eliminated by the reduction.
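A minimal sketch with made-up values, showing how the axis dimension is reduced away:

import tensorflow as tf

t = tf.constant([[1, 9, 3],
                 [7, 2, 5]])     # shape (2, 3)

rows = tf.argmax(t, axis=0)      # reduce axis 0 -> shape (3,): per-column winners
cols = tf.argmax(t, axis=1)      # reduce axis 1 -> shape (2,): per-row winners

with tf.Session() as sess:
    print(sess.run(rows))        # [1 0 1]
    print(sess.run(cols))        # [1 0]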