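With four outputs, each output neuron has to learn an assertion along the lines of "what distinguishes a handwritten 1 from a handwritten 2";
with ten outputs, each output neuron only has to learn an assertion such as "is this image a 1 or not".
Describing whether an image is a particular digit is much easier than describing the difference between two digits.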
(The question comes from Neural networks and deep learning.)
You might wonder why we use 10 output neurons. After all, the goal of the network is to tell us which digit (0,1,2,…,9) corresponds to the input image. A seemingly natural way of doing that is to use just 4 output neurons, treating each neuron as taking on a binary value, depending on whether the neuron's output is closer to 0 or to 1. Four neurons are enough to encode the answer, since 2^4 = 16 is more than the 10 possible values for the input digit. Why should our network use 10 neurons instead? Isn't that inefficient? The ultimate justification is empirical: we can try out both network designs, and it turns out that, for this particular problem, the network with 10 output neurons learns to recognize digits better than the network with 4 output neurons. But that leaves us wondering why using 10 output neurons works better. Is there some heuristic that would tell us in advance that we should use the 10-output encoding instead of the 4-output encoding?
……
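To make the two encodings concrete, here is a minimal Python sketch of the target vectors each design would train against. The function names (`one_hot`, `binary_code`, `digits_that_activate`) and the most-significant-bit-first ordering are assumptions made for illustration; they do not come from the book. The sketch only lists which digits each output neuron is asked to fire for, and it says nothing about how well either network actually trains.

```python
def one_hot(digit: int) -> list[int]:
    """10-output target: only the neuron whose index equals `digit` is 1."""
    return [1 if i == digit else 0 for i in range(10)]


def binary_code(digit: int) -> list[int]:
    """4-output target: the digit's bits, most significant bit first (assumed order)."""
    return [(digit >> shift) & 1 for shift in (3, 2, 1, 0)]


def digits_that_activate(encode, neuron: int) -> list[int]:
    """List the digits for which output `neuron` must be 1 under `encode`."""
    return [d for d in range(10) if encode(d)[neuron] == 1]


if __name__ == "__main__":
    print("10-output encoding:")
    for n in range(10):
        print(f"  neuron {n} fires for digits {digits_that_activate(one_hot, n)}")

    print("4-output encoding:")
    for n in range(4):
        print(f"  neuron {n} fires for digits {digits_that_activate(binary_code, n)}")
```

Under the 10-output encoding each neuron answers for exactly one digit, whereas under the 4-output encoding a single neuron has to fire for a set of digits that share a bit but not necessarily a shape (the last neuron, for example, must fire for 1, 3, 5, 7 and 9).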