Logistic Regression

  • Introduction to the sigmoid function
  • Logistic regression explained
  • Logistic regression in code

1. The sigmoid function

The sigmoid function: f(x) = \frac{1}{1+e^{-x}}
Plot:

[Figure: sigmoid.png — plot of the sigmoid function]

1.1 Evaluating f(0)
f(0) = \frac{1}{1+e^{0}} = 0.5

Since e^{-x}\in(0,+\infty), we have f(x)\in(0,1).

This property links naturally to probabilities, p\in[0,1].

1.2 Derivative of the sigmoid function
f'(x) = \left(\frac{1}{1+e^{-x}}\right)'
= -\left(\frac{1}{1+e^{-x}}\right)^{2}\cdot e^{-x}\cdot(-1)
= \frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}}
= \frac{1}{1+e^{-x}}\cdot\left(1-\frac{1}{1+e^{-x}}\right)
= f(x)\cdot(1-f(x))

That is, f'(x) = f(x)(1-f(x)): the derivative of the sigmoid at any point can be computed directly from its function value at that point. This is its second useful property.
Because of these two properties, the sigmoid function is the natural choice here.
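
As a quick numerical check of this property (a minimal sketch; the variable names are illustrative), we can compare f(x)(1-f(x)) against a finite-difference estimate of the derivative:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# compare the analytic derivative f(x)*(1 - f(x)) with a central-difference estimate
x = np.linspace(-5, 5, 11)
eps = 1e-6
analytic = sigmoid(x) * (1 - sigmoid(x))
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(np.max(np.abs(analytic - numeric)))   # on the order of 1e-10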

2. Building the model

X= \left\{ \begin{matrix} X_1^{(1)} & X_2^{(1)} & \cdots & X_n^{(1)} \\ X_1^{(2)} & X_2^{(2)} & \cdots & X_n^{(2)}\\ \vdots & \vdots & \ddots & \vdots \\ X_1^{(m)} & X_2^{(m)} & \cdots & X_n^{(m)} \end{matrix} \right\}
First look at the matrix X: each row is one sample, the superscript in parentheses indexes the sample, and the subscript indexes the feature, i.e., the column.
y= \left\{ \begin{matrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{matrix} \right\}
Here y is the label, either 0 or 1; logistic regression is a binary classifier, with class1 = 0 and class2 = 1.
logit
\downarrow
Z = w_1x_1+w_2x_2+\cdots+w_nx_n + b = \vec{w}\cdot\vec{x}+b \qquad(1)
\downarrow
\delta(Z) = \frac{1}{1+e^{-Z}} = \frac{1}{1+e^{-(\vec{w}\cdot\vec{x}+b)}} = \hat{y} \qquad(2)
Substituting equation (1) into the sigmoid function gives equation (2): this is the logistic regression model. \vec{w} and b are the model parameters; our goal is to find the optimal \vec{w} and b, which we denote w^* and b^*.

By the properties of the sigmoid function:
\hat{y}<0.5 \rightarrow class1 \rightarrow 0
\hat{y}>0.5 \rightarrow class2 \rightarrow 1
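
A minimal sketch of this decision rule (the function name and the example numbers below are illustrative, not from the original):

import numpy as np

def predict(w, b, x, threshold=0.5):
    # logistic regression prediction: sigmoid(w·x + b) thresholded at 0.5
    z = np.dot(w, x) + b              # equation (1)
    y_hat = 1 / (1 + np.exp(-z))      # equation (2)
    return 1 if y_hat > threshold else 0

# made-up parameters and a single two-feature sample
w = np.array([0.8, -0.3])
b = 0.1
x = np.array([1.2, 0.5])
print(predict(w, b, x))               # 1, since sigmoid(0.91) > 0.5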

3. Building the loss function

Z = w_1x_1+w_2x_2+\cdots+w_nx_n + b = \vec{w}\cdot\vec{x}+b
\downarrow
\delta(Z) = \frac{1}{1+e^{-Z}} = \frac{1}{1+e^{-(\vec{w}\cdot\vec{x}+b)}} = \hat{y}
\begin{cases} \text{when the true label } y = 1,\ \text{we want the prediction } \hat{y}\rightarrow 1 \text{, i.e., as large as possible} \\ \text{when the true label } y = 0,\ \text{we want } \hat{y}\rightarrow 0 \text{, i.e., } 1-\hat{y}\rightarrow 1 \end{cases}
In other words, we want both of the following to approach 1:
\begin{cases} \hat{y}\rightarrow 1 & \text{when } y=1 \\ 1-\hat{y}\rightarrow 1 & \text{when } y=0 \end{cases} \qquad(3)
So we define the per-sample quantity \hat{y}^{y}(1-\hat{y})^{1-y}\qquad(4)
In (4), when y = 1 it reduces to \hat{y}; when y = 0 it reduces to 1-\hat{y}. By (3), the larger (4) is, the better.
Taking all m samples into account, we multiply the per-sample terms together to get the likelihood:
\prod_{i=1}^m (\hat{y}^{(i)})^{y^{(i)}}(1-\hat{y}^{(i)})^{1-y^{(i)}}\qquad(5)
Maximizing (5) is equivalent to minimizing its negative logarithm, which is the logistic regression loss function. Writing it out in full:
Loss function:
L(w,b) = -\ln\prod_{i=1}^m (\hat{y}^{(i)})^{y^{(i)}}\cdot(1-\hat{y}^{(i)})^{1-y^{(i)}}
= -\sum_{i=1}^m \ln\left[(\hat{y}^{(i)})^{y^{(i)}}\cdot(1-\hat{y}^{(i)})^{1-y^{(i)}}\right]
= -\sum_{i=1}^m \left[\ln(\hat{y}^{(i)})^{y^{(i)}}+\ln(1-\hat{y}^{(i)})^{1-y^{(i)}}\right]
= -\sum_{i=1}^m \left[y^{(i)}\ln\hat{y}^{(i)}+(1-y^{(i)})\ln(1-\hat{y}^{(i)})\right]
Here y^{(i)} is a known constant; w and b enter only through \hat{y}^{(i)}.
L(w,b) = -\sum_{i=1}^m \left[y^{(i)}\ln\hat{y}^{(i)}+(1-y^{(i)})\ln(1-\hat{y}^{(i)})\right]\qquad(6)
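
A minimal sketch of evaluating loss (6) on a small batch (assuming \hat{y} is clipped away from 0 and 1 so the logarithms stay finite; the data below is made up):

import numpy as np

def cross_entropy_loss(y, y_hat, eps=1e-12):
    # equation (6): -sum[ y*ln(y_hat) + (1-y)*ln(1-y_hat) ]
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1, 0, 1])
print(cross_entropy_loss(y, np.array([0.9, 0.1, 0.8])))   # ~0.43, good predictions
print(cross_entropy_loss(y, np.array([0.1, 0.9, 0.2])))   # ~6.21, bad predictions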

4. Minimizing the loss function

L(w,b) = -\sum_{i=1}^m \left[y^{(i)}\ln\hat{y}^{(i)}+(1-y^{(i)})\ln(1-\hat{y}^{(i)})\right]
Now that we have the objective, the goal is to find the optimal parameters W and b,
i.e. W = (w_1,w_2,\ldots,w_n) and b. We differentiate with respect to w_1, dropping the summation and the sample index first to simplify the calculation (using the sigmoid property \hat{y}' = \hat{y}(1-\hat{y}) and \partial Z/\partial w_1 = x_1):
\frac{\partial}{\partial w_1}\left[y\ln\hat{y}+(1-y)\ln(1-\hat{y})\right]
= y\cdot\frac{1}{\hat{y}}\cdot\hat{y}(1-\hat{y})\cdot x_1+(1-y)\cdot\frac{1}{1-\hat{y}}\cdot(-1)\cdot\hat{y}(1-\hat{y})\cdot x_1
= (y-\hat{y})\cdot x_1

Restoring the summation, the indices, and the overall minus sign from the loss:
\frac{\partial L(w,b)}{\partial w_1} = -\sum_{i=1}^m(y^{(i)}-\hat{y}^{(i)})\cdot x_1^{(i)}
\frac{\partial L(w,b)}{\partial w_2} = -\sum_{i=1}^m(y^{(i)}-\hat{y}^{(i)})\cdot x_2^{(i)}
\qquad\vdots
\frac{\partial L(w,b)}{\partial w_n} = -\sum_{i=1}^m(y^{(i)}-\hat{y}^{(i)})\cdot x_n^{(i)}
Similarly for b:
\frac{\partial L(w,b)}{\partial b} = -\sum_{i=1}^m(y^{(i)}-\hat{y}^{(i)})

To find the parameters that minimize the loss, we use gradient descent:
w_1\leftarrow w_1 - lr\cdot\frac{\partial L(w,b)}{\partial w_1}
w_2\leftarrow w_2 - lr\cdot\frac{\partial L(w,b)}{\partial w_2}
\qquad\vdots
w_n\leftarrow w_n - lr\cdot\frac{\partial L(w,b)}{\partial w_n}
b \ \ \leftarrow \ \ b \ - \ lr\cdot\frac{\partial L(w,b)}{\partial b}

This is how we obtain W and b.
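
As a sanity check on these gradient formulas (a minimal sketch with made-up data; the helper names are illustrative), the analytic gradient -\sum(y-\hat{y})x can be compared against a finite-difference estimate of \partial L/\partial w_1:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(w, b, X, y, eps=1e-12):
    y_hat = np.clip(sigmoid(X.dot(w) + b), eps, 1 - eps)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                      # 5 samples, 3 features
y = np.array([1, 0, 1, 1, 0], dtype=float)
w, b = rng.normal(size=3), 0.2

# analytic gradient from the derivation: dL/dw = -sum (y - y_hat)·x
y_hat = sigmoid(X.dot(w) + b)
grad_w = -X.T.dot(y - y_hat)

# central-difference estimate of dL/dw_1
h = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0] += h
w_minus[0] -= h
numeric = (loss(w_plus, b, X, y) - loss(w_minus, b, X, y)) / (2 * h)
print(grad_w[0], numeric)                        # the two values should match closely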

5. Logistic regression in code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.read_csv("titanic_data/train.csv")
dataset.head()
# [Figure: titanic.png — first rows of the Titanic training data]
dataset = dataset[['Age','Pclass','Sex','Fare','SibSp','Parch','Survived']]
n_dataset = dataset.shape[0]

# feature engineering: encode categorical features and standardize numeric ones
dataset['Fare'] = (dataset['Fare'] - dataset['Fare'].mean()) / dataset['Fare'].std()
dataset['Sex'] = dataset['Sex'].astype('category').cat.codes    # 'female'/'male' -> 0/1
dataset['Pclass'] = (dataset['Pclass'] - dataset['Pclass'].mean()) / dataset['Pclass'].std()
dataset['Age'].fillna(dataset['Age'].mean(), inplace=True)      # fill missing ages with the mean
dataset['Age'] = (dataset['Age'] - dataset['Age'].mean()) / dataset['Age'].std()
dataset['Sex'] = (dataset['Sex'] - dataset['Sex'].mean()) / dataset['Sex'].std()

train_data = dataset.iloc[0:600]
test_data = dataset.iloc[600:]
X_train = train_data.drop(columns = 'Survived').astype('float32')
y_train = train_data['Survived'].astype('float32')
n_train = X_train.shape[0]
X_test = test_data.drop(columns = 'Survived').astype('float32')
y_test = test_data['Survived'].astype('float32')
n_test = test_data.shape[0]
#Set up logistic regression model
def sigmoid(x):
    y = 1/(1 + np.exp(-x))
    return y

x = np.arange(-10,10,0.1)
y = sigmoid(x)
fig, ax = plt.subplots()
ax.scatter(0,sigmoid(0))
ax.plot(x,y)
plt.show()
n_features = X_train.shape[1]
w = np.zeros(n_features)    # initialize weights and bias at zero
b = 0.0

epochs = 1000
lr = 0.0001

for epoch in range(epochs):
    # forward pass: z = w·x + b, y_hat = sigmoid(z)
    logits = w.dot(X_train.T) + b
    y_hat = sigmoid(logits)

    # gradients from the derivation: dL/dw = -sum(y - y_hat)·x, dL/db = -sum(y - y_hat)
    det_w = -np.dot((y_train - y_hat), X_train)
    det_b = -np.sum(y_train - y_hat)

    # gradient descent update
    w = w - lr * det_w
    b = b - lr * det_b
def get_accuracy(X, y, w, b):
    # predict each sample with the learned w, b and return the fraction classified correctly
    n_samples = X.shape[0]
    predicted_result = []
    for i in range(n_samples):
        x = X.iloc[i]
        p = sigmoid(x.dot(w) + b)        # predicted probability of class 1
        predicted_result.append(1 if p > 0.5 else 0)
    # (y - prediction)^2 is 1 on a misclassified sample and 0 otherwise
    n_errors = 0
    for i in range(n_samples):
        n_errors += (y.iloc[i] - predicted_result[i]) ** 2
    accuracy = (n_samples - n_errors) / n_samples
    return accuracy

# training set
get_accuracy(X_train,y_train,w,b)

0.8016666666666666

# test set
get_accuracy(X_test,y_test,w,b)

0.8006872852233677
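
As an optional cross-check (not part of the original walkthrough, and assuming scikit-learn is installed), the same split can be fed to sklearn.linear_model.LogisticRegression; an accuracy in the same ballpark suggests the hand-rolled gradient descent is behaving reasonably:

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=1000)   # reference implementation for comparison
clf.fit(X_train, y_train)
print(clf.score(X_train, y_train))        # training accuracy
print(clf.score(X_test, y_test))          # test accuracy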
