Linear Regression

Theoretical Basis
Taylor's Formula

$f(x)=\frac{f\left(x_{0}\right)}{0 !}+\frac{f^{\prime}\left(x_{0}\right)}{1 !}\left(x-x_{0}\right)+\frac{f^{\prime \prime}\left(x_{0}\right)}{2 !}\left(x-x_{0}\right)^{2}+\cdots+\frac{f^{(n)}\left(x_{0}\right)}{n !}\left(x-x_{0}\right)^{n}+o\left(\left(x-x_{0}\right)^{n}\right)$

$f(x)=\sum_{k=0}^{n} w_{k} \phi_{k}(x)+o\left(\left(x-x_{0}\right)^{n}\right)$

$f(x)=\vec{w}^{T} \vec{\phi}(x)+o\left(\left(x-x_{0}\right)^{n}\right)$

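For any point $x$ in an interval [a, b] around $x_{0}$ where the expansion holds, the function value can be approximated by the inner product of two vectors, where $\phi_{k}(x)=\left(x-x_{0}\right)^{k}$ is a basis function and $w_{k}=f^{(k)}\left(x_{0}\right) / k !$ is the corresponding coefficient. The higher-order term $o\left(\left(x-x_{0}\right)^{n}\right)$ is the error between the two.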
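As a rough illustration of this inner-product view, the sketch below approximates $e^{x}$ around $x_{0}=0$. The choice of $e^{x}$ (all of its derivatives at 0 equal 1) and the helper names `taylor_weights`, `taylor_basis`, and `inner` are assumptions made for this example only.

```python
import math

def taylor_weights(n):
    """Coefficients w_k = f^(k)(x0) / k! for f = exp at x0 = 0 (every derivative is 1)."""
    return [1.0 / math.factorial(k) for k in range(n + 1)]

def taylor_basis(x, n, x0=0.0):
    """Basis functions phi_k(x) = (x - x0)^k for k = 0..n."""
    return [(x - x0) ** k for k in range(n + 1)]

def inner(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

x = 0.5
for n in (1, 3, 6):
    approx = inner(taylor_weights(n), taylor_basis(x, n))
    # The absolute error shrinks as more basis functions are used.
    print(n, approx, abs(approx - math.exp(x)))
```

The weights depend only on the function and the expansion point, while the basis vector depends only on $x$, which is the same split that linear regression uses.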

Fourier Series

$f(x)=c_{0}+\sum_{n=1}^{N} c_{n} \cos \left(n \omega_{0} x\right)+\varepsilon, \quad \text{where } \varepsilon \rightarrow 0$

$f(x)=\sum_{n=0}^{N} c_{n} \phi_{n}(x)+\varepsilon$

$f(x)=\vec{c}^{T} \vec{\phi}(x)+\varepsilon$

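A periodic function f(x) can be approximated by a vector inner product, where $\phi_{n}(x)$ is a basis function, $c_{n}$ is the corresponding coefficient, and $\varepsilon$ is the error. (The cosine-only form shown here covers even functions; in general, sine terms are added as further basis functions.)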
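A small sketch of the same idea, assuming the even function $f(x)=|x|$ on $[-\pi, \pi]$ extended with period $2\pi$ (so $\omega_{0}=1$). Its cosine coefficients are $c_{0}=\pi/2$ and $c_{n}=2\left((-1)^{n}-1\right)/\left(\pi n^{2}\right)$; the function names below are illustrative only.

```python
import math

def cosine_coeffs(N):
    """c_0 = pi/2 and c_n = 2((-1)^n - 1) / (pi n^2) for f(x) = |x| on [-pi, pi]."""
    coeffs = [math.pi / 2]
    for n in range(1, N + 1):
        coeffs.append(2.0 * ((-1) ** n - 1) / (math.pi * n * n))
    return coeffs

def cosine_basis(x, N, omega0=1.0):
    """phi_0(x) = 1 and phi_n(x) = cos(n * omega0 * x)."""
    return [math.cos(n * omega0 * x) for n in range(N + 1)]

def fourier_approx(x, N):
    """Inner product of the coefficient vector and the basis vector."""
    return sum(c * p for c, p in zip(cosine_coeffs(N), cosine_basis(x, N)))

x = 1.0
for N in (1, 5, 50):
    value = fourier_approx(x, N)
    print(N, value, abs(value - abs(x)))
```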

Linear Regression

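Taylor's formula and the Fourier series both show that, given enough basis functions, a vector inner product can approximate the function value arbitrarily closely. Linear regression has the same inner-product form: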

$f(x)=w_{0}+w_{1} \phi_{1}(x)+w_{2} \phi_{2}(x)+\cdots+w_{n} \phi_{n}(x)+\varepsilon$

$f(x)=\sum_{j=0}^{n} w_{j} \phi_{j}(x)+\varepsilon, \quad \phi_{0}(x)=1$

$f(x)=\vec{w}^{T} \vec{\phi}(x)+\varepsilon$
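One way to make this concrete is to fit the weights by least squares. The sketch below assumes a polynomial basis $1, x, x^{2}$ and a noise-free toy target, and solves the normal equations $\left(\Phi^{T} \Phi\right) w=\Phi^{T} y$ with a small Gaussian elimination; the helpers `solve` and `fit` are hypothetical names, not part of any library.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit(xs, ys, basis):
    """Least-squares weights from the normal equations (Phi^T Phi) w = Phi^T y."""
    Phi = [[phi(x) for phi in basis] for x in xs]
    m = len(basis)
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(m)]
    return solve(A, b)

basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]   # phi_0, phi_1, phi_2
xs = [i / 10 for i in range(-10, 11)]
ys = [1.0 + 2.0 * x - 0.5 * x * x for x in xs]          # toy target, no noise
w = fit(xs, ys, basis)
print(w)  # should be close to the generating weights [1.0, 2.0, -0.5]
```

The model stays linear in the weights even though the basis functions are nonlinear in $x$, which is why the same solver works for any choice of basis.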

Causes of Overfitting

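Overfitting happens when the model is so complex that it also learns irrelevant noise. When the components of a linear regression's coefficient vector differ greatly in magnitude, the model is very likely overfitting. Mathematically, if some coefficient is very large, two x values that are very close together produce noticeably different outputs, which is a fairly clear sign of overfitting.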
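The effect can be sketched with a toy experiment. Assume 10 points sampled from $\sin (2 \pi x)$ with a deterministic $\pm 0.1$ perturbation standing in for noise. A degree-9 polynomial (a linear regression with basis $x^{k}$, $k=0..9$) passes through every point, and its $x^{9}$ coefficient equals the top Newton divided difference. The helper names are illustrative.

```python
import math

def divided_differences(xs, ys):
    """Newton coefficients of the polynomial passing through every (x, y) pair."""
    coef = list(ys)
    n = len(xs)
    for level in range(1, n):
        for i in range(n - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    return coef

def line_fit(xs, ys):
    """Closed-form least-squares line y = b + w * x; returns (b, w)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    w = num / den
    return y_mean - w * x_mean, w

# Toy data (an assumption): sin(2*pi*x) plus an alternating +/-0.1 perturbation.
xs = [i / 9 for i in range(10)]
ys = [math.sin(2 * math.pi * x) + 0.1 * (-1) ** i for i, x in enumerate(xs)]

top_coef = divided_differences(xs, ys)[-1]   # coefficient of x^9 in the degree-9 fit
b, w = line_fit(xs, ys)                      # simple two-parameter model

print("degree-9 leading coefficient:", top_coef)
print("straight-line slope:", w)
```

The model that reproduces the perturbation exactly needs a leading coefficient several orders of magnitude larger than the slope of the straight line, which is the pattern described above.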

Sigmoid Function

$\sigma(z)=\frac{1}{1+e^{-z}}$

$\sigma^{\prime}(z)=\left(\frac{1}{1+e^{-z}}\right)^{\prime}$

$=(-1)\left(1+e^{-z}\right)^{(-1)-1} \cdot\left(e^{-z}\right)^{\prime}$

$=-\frac{1}{\left(1+e^{-z}\right)^{2}} \cdot\left(-e^{-z}\right)$

$=\frac{1}{1+e^{-z}} \cdot \frac{e^{-z}}{1+e^{-z}}$

$=\frac{1}{1+e^{-z}} \cdot\left(1-\frac{1}{1+e^{-z}}\right)$

$=\sigma(z)(1-\sigma(z))$
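The closed form can be checked numerically. The following sketch compares $\sigma(z)(1-\sigma(z))$ with a central finite difference; the step size `h=1e-5` and the function names are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Closed form from the derivation: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_grad(f, z, h=1e-5):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

for z in (-2.0, 0.0, 3.0):
    # The two columns should agree to many decimal places.
    print(z, sigmoid_grad(z), numeric_grad(sigmoid, z))
```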

PyTorch Implementation Steps

Generating the Dataset

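import numpy as np
import torch

num_inputs = 2        # number of features per example
num_examples = 1000   # number of examples to generate

true_w = [2, -3.4]    # ground-truth weights
true_b = 4.2          # ground-truth bias

# features ~ N(0, 1); labels = X w + b plus Gaussian noise with std 0.01
features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)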

Reading the Dataset

import torch.utils.data as Data

batch_size = 10

# combine features and labels into a dataset
dataset = Data.TensorDataset(features, labels)

# put dataset into DataLoader
data_iter = Data.DataLoader(
    dataset=dataset,            # torch TensorDataset format
    batch_size=batch_size,      # mini batch size
    shuffle=True,               # whether shuffle the data or not
    num_workers=2,              # load batches in 2 worker subprocesses
)

for X, y in data_iter:
    print(X, '\n', y)
    break

Defining the Model

import torch.nn as nn

class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()      # call the parent class constructor
        self.linear = nn.Linear(n_feature, 1)  # signature: `torch.nn.Linear(in_features, out_features, bias=True)`

    def forward(self, x):
        y = self.linear(x)
        return y

net = LinearNet(num_inputs)
print(net)

# ways to build a (possibly multilayer) network with nn.Sequential
# method one
net = nn.Sequential(
    nn.Linear(num_inputs, 1)
    # other layers can be added here
    )

# method two
net = nn.Sequential()
net.add_module('linear', nn.Linear(num_inputs, 1))
# net.add_module ......

# method three
from collections import OrderedDict
net = nn.Sequential(OrderedDict([
          ('linear', nn.Linear(num_inputs, 1))
          # ......
        ]))

print(net)
print(net[0])

Initializing Model Parameters

from torch.nn import init

init.normal_(net[0].weight, mean=0.0, std=0.01)
init.constant_(net[0].bias, val=0.0)  # or use `net[0].bias.data.fill_(0)` to modify it directly

for param in net.parameters():
    print(param)

Defining the Loss Function

loss = nn.MSELoss()    # built-in mean squared error loss
                       # signature: torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

Defining the Optimizer

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.03)   # built-in stochastic gradient descent optimizer
print(optimizer)  # signature: `torch.optim.SGD(params, lr, momentum=0, dampening=0, weight_decay=0, nesterov=False)`

Training

num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        output = net(X)
        l = loss(output, y.view(-1, 1))
        optimizer.zero_grad() # reset gradients; equivalent to net.zero_grad() here
        l.backward()
        optimizer.step()
    print('epoch %d, loss: %f' % (epoch, l.item()))

# compare learned parameters with the ground truth
dense = net[0]
print(true_w, dense.weight.data)
print(true_b, dense.bias.data)
