softmax and softmax with loss
sigmoid cross entropy loss
Gradient Descent
Newton's Method
soft target
batch norm
parameter initialization
vanishing gradient
convex function