Quoting an answer from Stack Overflow:
Having two different functions is a convenience, as they produce the same result.
The difference is simple:
For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in range [0, num_classes-1].
For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64.
Labels used in softmax_cross_entropy_with_logits are the one-hot version of labels used in sparse_softmax_cross_entropy_with_logits.
Another tiny difference is that with sparse_softmax_cross_entropy_with_logits, you can give -1 as a label to have loss 0 on this label.
Adding a figure with the test results:
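A minimal sketch of such a comparison test, assuming TensorFlow 2.x with eager execution; the logits and label values below are made-up examples, not from the original test:

```python
import tensorflow as tf

# Three samples, four classes (example values for illustration only).
logits = tf.constant([[2.0, 1.0, 0.1, 0.3],
                      [0.5, 2.5, 0.2, 0.1],
                      [1.2, 0.3, 3.1, 0.4]])

# Integer class indices in [0, num_classes-1] for the sparse version.
sparse_labels = tf.constant([0, 1, 2], dtype=tf.int32)

# One-hot encoding of the same labels for the dense version.
onehot_labels = tf.one_hot(sparse_labels, depth=4)

sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)
dense_loss = tf.nn.softmax_cross_entropy_with_logits(
    labels=onehot_labels, logits=logits)

print(sparse_loss.numpy())  # per-example cross-entropy loss
print(dense_loss.numpy())   # should match the sparse result element-wise
```

The two print statements should output numerically identical per-example losses, which is consistent with the quoted answer: the sparse version is simply a convenience for integer labels.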