Implementing the Logistic Regression (LR) Algorithm from Scratch
Logistic regression (LR) is a common binary classification algorithm. It predicts only the probability of the positive class: given a sample x, if the predicted value is 0.4, we write p(y=1|x)=0.4, meaning that for this sample LR predicts the positive class with probability 0.4; correspondingly, the probability of the negative class is 0.6, i.e. p(y=0|x)=0.6.
The mathematical form of logistic regression is Y_hat = sigmoid(X*W + b). The expression looks very much like a linear model, and LR is in fact essentially a linear model; it can be derived from the generalized linear model together with the Bernoulli distribution, a derivation this article will not go through. One question that confronts many beginners is: why does the formula look like this and not something else? The goal of machine learning is to find parameters such that, given a sample as input, we obtain a result as output, and the simplest way to describe that process is with a formula. In theory there is always some formula that fits the data. Newton's second law F = ma, for example, can also be read as a model: a is the parameter, the mass m is the sample, and the force F is the target we want. The LR formula can be understood in the same way.
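As a concrete illustration of the formula, here is a minimal sketch of the forward pass and of the complementary probability p(y=0|x) = 1 - p(y=1|x); the values of X, W and b are toy numbers made up for the example and are not taken from the code below.

import numpy as np

def sigmoid(Z):
    # map any real-valued score into (0, 1) so it can be read as a probability
    return 1 / (1 + np.exp(-Z))

X = np.array([[1.0, 0.5]])      # one sample with two features (toy values)
W = np.array([[0.3], [-0.8]])   # weight column vector, shape (n_features, 1)
b = 0.1

p_pos = sigmoid(X.dot(W) + b)   # p(y=1|x), i.e. Y_hat
p_neg = 1 - p_pos               # p(y=0|x) is simply the complement
print(p_pos, p_neg)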
Implementing LR is not hard; the main things to understand are the cost function and how to compute its gradient. With a framework like TensorFlow you do not even need to derive the gradient yourself, only to supply the cost function. The implementation code is given below; it works as-is, and the main function tests it on the iris dataset.
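As an aside, here is a minimal sketch of the TensorFlow point above (assuming TensorFlow 2.x; the function name tf_lr_step is my own, not part of the implementation below): only the cross-entropy cost is written down, and the gradients come from automatic differentiation. The from-scratch NumPy implementation follows after this sketch.

import tensorflow as tf

def tf_lr_step(X, Y, W, b, learning_rate=0.01):
    # X: (m, n) float32 tensor, Y: (m, 1) float32 tensor of 0/1 labels
    # W: tf.Variable of shape (n, 1), b: scalar tf.Variable
    with tf.GradientTape() as tape:
        Y_hat = tf.sigmoid(tf.matmul(X, W) + b)
        cost = -tf.reduce_mean(Y * tf.math.log(Y_hat) +
                               (1 - Y) * tf.math.log(1 - Y_hat))
    dW, db = tape.gradient(cost, [W, b])  # no hand-derived gradient needed
    W.assign_sub(learning_rate * dW)
    b.assign_sub(learning_rate * db)
    return cost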
from sklearn import datasets
from sklearn import metrics
import matplotlib.pyplot as plt
import numpy as np


def softmax(X):
    # multi-class counterpart of sigmoid; defined here but unused below
    return np.exp(X) / np.exp(X).sum(axis=0)


def sigmoid(X):
    # squash real-valued scores into (0, 1) so they can be read as probabilities
    return 1 / (1 + np.exp(-X))


def score(W, b, X_test, Y_test):
    # accuracy of the trained parameters on a held-out set
    m = X_test.shape[0]
    Y_ = predict(W, b, X_test)
    Y2 = np.array([1 if i > 0.5 else 0 for i in Y_]).reshape(m, 1)
    accuracy = metrics.accuracy_score(Y_test, Y2)
    return accuracy


def cost_gradient_descent(X, Y, W, b, learning_rate, lamda):
    # one step of batch gradient descent on the L2-regularized cross-entropy cost
    Z = np.dot(X, W) + b
    Y_ = sigmoid(Z)
    m = X.shape[0]
    Y2 = np.array([1 if i > 0.5 else 0 for i in Y_]).reshape(m, 1)
    accuracy = metrics.accuracy_score(Y, Y2)
    # Unregularized versions, kept for reference:
    # J = -(Y.T.dot(np.log(Y_)) + (1 - Y).T.dot(np.log(1 - Y_))).sum() / m
    # W = W - learning_rate * (1 / m) * X.T.dot(Y_ - Y)
    J = -(Y.T.dot(np.log(Y_)) + (1 - Y).T.dot(np.log(1 - Y_))).sum() / \
        m + lamda * (np.square(W).sum(axis=0)) * (1 / (2 * m))
    # gradient of the regularized cost: (1/m) * X^T (Y_ - Y) + (lamda/m) * W
    W = W - learning_rate * ((1 / m) * X.T.dot(Y_ - Y) + (1 / m) * W * lamda)
    b = b - learning_rate * (1 / m) * ((Y_ - Y).sum(axis=0))
    # b = b - learning_rate * ((1 / m) * (Y_ - Y).sum(axis=0) + (1 / m) * b * lamda)
    # b is usually not regularized, so the line above stays commented out
    return J, W, b, accuracy


def predict(W, b, X):
    # forward pass followed by thresholding at 0.5
    Z = np.dot(X, W) + b
    Y_ = sigmoid(Z)
    m = X.shape[0]
    Y2 = np.array([1 if i > 0.5 else 0 for i in Y_]).reshape(m, 1)
    return Y2


def train(X, Y, iter_num=1000):
    # define parameters
    m = X.shape[0]
    n = X.shape[1]
    W = np.ones((n, 1))
    b = 0
    learning_rate = 0.01
    lamda = 0.01
    i = 0
    J = []
    Accuracy = []
    while i < iter_num:
        i = i + 1
        j, W, b, accuracy = cost_gradient_descent(
            X, Y, W, b, learning_rate, lamda)
        J.append(j)
        Accuracy.append(accuracy)
        print("step:", i, "cost:", j, "accuracy:", accuracy)
    print(W)
    print(b)
    # plot the cost and the training accuracy over the iterations
    plt.plot(J)
    plt.plot(Accuracy)
    plt.show()
    return W, b


def main():
    # construct data: keep only classes 0 and 1 of iris to get a binary problem
    iris = datasets.load_iris()
    X, Y = iris.data, iris.target.reshape(150, 1)
    X = X[Y[:, 0] < 2]
    Y = Y[Y[:, 0] < 2]
    train(X, Y, 100)


def test():
    # tiny hand-made dataset for a quick sanity check
    X = np.array([[1, 0.5], [1, 1.5], [2, 1], [3, 1]])
    m = X.shape[0]
    n = X.shape[1]
    Y = np.array([0, 0, 1, 0]).reshape(m, 1)
    print(Y.shape)
    print(train(X, Y, 1000))


if __name__ == '__main__':
    main()
    # test()
Running the code produces output like the following; it converges by about the 64th iteration. The code also implements L2 regularization of the parameters.
step: 62 cost: [ 0.33512973] accuracy: 0.97
step: 63 cost: [ 0.32701202] accuracy: 0.98
step: 64 cost: [ 0.31998367] accuracy: 1.0
step: 65 cost: [ 0.31388857] accuracy: 1.0
This code works as-is.
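The score function defined above is never called in main; as a usage example, here is a minimal sketch of how it could be combined with a held-out test set (assuming sklearn's train_test_split and that train and score from the code above are in scope; the split ratio and random seed are my own choices):

from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, Y = iris.data, iris.target.reshape(150, 1)
X = X[Y[:, 0] < 2]   # keep only classes 0 and 1, as in main()
Y = Y[Y[:, 0] < 2]

# hold out 30% of the samples for evaluation
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0)

W, b = train(X_train, Y_train, 100)
print("test accuracy:", score(W, b, X_test, Y_test))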