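This is about the simplest neural network there is. Simple as it is, every deep network starts from here. Machine learning still needs a person to design the computation graph, and a neural network still needs a person to design it. Debugging a network in MXNet is convenient: a plain print is often enough.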
Generating the data
from mxnet import ndarray as nd
from mxnet import autograd as ag

num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2

# Features: standard-normal samples, one row per example
X = nd.random_normal(shape=(num_examples, num_inputs))
# Labels: the true linear function plus a little Gaussian noise
y = true_w[0] * X[:, 0] + true_w[1] * X[:, 1] + true_b
y += .01 * nd.random_normal(shape=y.shape)
print(X[0], y[0])
(
[1.1630787 0.4838046]
<NDArray 2 @cpu(0)>,
[4.879625]
<NDArray 1 @cpu(0)>)
Reading the data
import random

batch_size = 10

def data_iter():
    # Visit the examples in a fresh random order on every pass
    idx = list(range(num_examples))
    random.shuffle(idx)
    for i in range(0, num_examples, batch_size):
        j = nd.array(idx[i:min(i + batch_size, num_examples)])
        yield nd.take(X, j), nd.take(y, j)
for data, label in data_iter():
    print(data, label)
    break
[[ 0.6331259 0.2607249 ]
[-0.3467711 0.53791016]
[ 1.7955405 0.50660264]
[-0.6394041 -0.07654858]
[-0.11860342 1.4148241 ]
[ 2.0691068 -0.13823958]
[ 1.9469851 0.04463983]
[-1.2816252 0.57725173]
[-0.5098918 -0.98913276]
[-1.2689446 -1.0154109 ]]
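To see what the iterator is doing without any MXNet dependency, here is a minimal pure-Python sketch of the same shuffle-then-slice idea. The names `batch_indices` and `seed` are made up for this illustration and do not appear in the code above; the point is only that every example index is visited exactly once per pass, in random order.

```python
import random

def batch_indices(num_examples, batch_size, seed=None):
    """Yield lists of shuffled example indices, one list per mini-batch."""
    rng = random.Random(seed)          # private RNG so the sketch is reproducible
    idx = list(range(num_examples))
    rng.shuffle(idx)                   # new random order for this pass
    for i in range(0, num_examples, batch_size):
        # Slicing past the end is safe in Python, so the last batch may be shorter.
        yield idx[i:i + batch_size]

batches = list(batch_indices(num_examples=25, batch_size=10, seed=0))
print([len(b) for b in batches])       # prints [10, 10, 5]
```

When the number of examples is not a multiple of the batch size, the last batch is simply smaller, which is also what the `min(...)` in the MXNet version guards against.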
Initializing the model parameters
w = nd.random_normal(shape=(num_inputs,1))
b = nd.zeros((1,))
params = [w,b]
Training updates these parameters using their derivatives, so we need to allocate storage for their gradients.
for param in params:
    param.attach_grad()
Defining the model
def net(X):
    return nd.dot(X, w) + b

print(net(data))
[[ 1.273058 ]
[-1.0126091 ]
[-0.798329 ]
[ 1.3755615 ]
[ 1.6786777 ]
[ 1.6878706 ]
[ 1.3343879 ]
[-0.59864104]
[ 4.464821 ]
[ 1.3346941 ]]
Loss function
We use the common squared error to measure the gap between the predicted target and the true target.
def square_loss(yhat, y):
    # Reshape y to match yhat so the subtraction does not broadcast unexpectedly
    return (yhat - y.reshape(yhat.shape)) ** 2
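The gradients that automatic differentiation produces for this loss can also be written by hand for a single example. The sketch below is plain Python with no MXNet; the helper names `predict`, `loss`, and `grad` are hypothetical, and the numbers are arbitrary. It compares the hand-written derivative with a finite-difference estimate.

```python
def predict(w, b, x):
    """Linear model for one example: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def loss(w, b, x, y):
    """Squared error for one example."""
    return (predict(w, b, x) - y) ** 2

def grad(w, b, x, y):
    """Analytic gradient of the squared error with respect to w and b."""
    err = predict(w, b, x) - y
    return [2 * err * xi for xi in x], 2 * err

w, b = [0.5, -1.0], 0.0
x, y = [1.0, 2.0], 3.0
gw, gb = grad(w, b, x, y)

# Central finite difference on w[0] as an independent check.
eps = 1e-6
numeric = (loss([w[0] + eps, w[1]], b, x, y)
           - loss([w[0] - eps, w[1]], b, x, y)) / (2 * eps)
print(gw, gb, numeric)
```

The analytic value for `w[0]` and the numeric estimate should agree to several decimal places, which is a cheap way to check a derivative before trusting it.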
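Optimization
Linear regression has an explicit (closed-form) solution, but the vast majority of models do not, so we solve it with stochastic gradient descent instead. At each step, the model parameters are moved a certain distance in the direction opposite to the gradient; the factor that sets this distance is usually called the learning rate.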
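For reference, both ideas in this paragraph can be written down compactly. The notation is an assumption of this note, not of the original code: $X$ is the data matrix with a column of ones appended so that the bias is folded into $\theta = (w, b)$, $\eta$ is the learning rate, and $\ell$ is the loss on the current batch.

```latex
% Explicit least-squares solution (assumes X^T X is invertible)
\hat{\theta} = (X^\top X)^{-1} X^\top y

% One gradient-descent step with learning rate \eta
\theta \leftarrow \theta - \eta \, \nabla_{\theta} \ell(\theta)
```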
def SGD(params, lr):
    for param in params:
        # Write in place so that w and b themselves are updated
        param[:] = param - lr * param.grad
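As a quick sanity check of the update rule, the same `param - lr * grad` step can be tried on a toy problem in plain Python. Everything here is hypothetical scaffolding (the function `sgd_step`, the target value 3.0, the learning rate 0.1): it minimizes $(w - 3)^2$, whose gradient is $2(w - 3)$, so $w$ should approach 3.

```python
def sgd_step(params, grads, lr):
    """Return new parameter values after one step against the gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

w = [0.0]                                # start far from the minimum at 3.0
for _ in range(100):
    grads = [2 * (w[0] - 3.0)]           # gradient of (w - 3)^2
    w = sgd_step(w, grads, lr=0.1)
print(w[0])
```

With a learning rate above 1.0 the same loop diverges on this function, which is why the step size has to be chosen with some care.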
Training
Now we can start training. Training usually requires several passes over the data.
epochs = 5
learning_rate = 0.001
for e in range(epochs):
    total_loss = 0
    for data, label in data_iter():
        with ag.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        SGD(params, learning_rate)
        total_loss += nd.sum(loss).asscalar()
    print("Epoch %d, average loss: %f" % (e, total_loss/num_examples))
SGD moves us along the loss curve until we reach the lowest point of the loss; at that point our predictions are at their best.
total_loss += nd.sum(loss).asscalar()
This line sums the loss over each batch. Once the loss stops changing, training has started to converge.
Epoch 0, average loss: 5.547779
Epoch 1, average loss: 0.098675
Epoch 2, average loss: 0.001856
Epoch 3, average loss: 0.000130
Epoch 4, average loss: 0.000097
Comparing the true parameters with the learned ones shows that training recovered them closely:
([2, -3.4], 4.2)
(
[[ 1.9996272]
[-3.4002655]]
<NDArray 2x1 @cpu(0)>,
[4.2003827]