logistic regression

1. logistic regression model

1.1 classification

want 0\le h_\theta(x)\le1
h_\theta(x)=g(\theta^Tx)
g(z)=\frac{1}{1+e^{-z}}
h_\theta(x)=P(y=1|x;\theta), the estimated probability that y=1 on input x, parameterized by \theta
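The definitions above can be sketched in Octave as follows. This is a minimal illustration, not part of the original notes: the handle names `sigmoid` and `hypothesis`, the sample matrix `X`, and the parameter vector `theta` are all made up for the example, and `X` is assumed to carry a leading column of ones so that x_0 = 1.

```octave
% Sigmoid g(z) = 1 / (1 + e^(-z)), applied element-wise.
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothesis h_theta(x) = g(theta' * x), evaluated for every row of X.
% Assumption: X is m-by-(n+1) with a leading column of ones (x_0 = 1).
hypothesis = @(theta, X) sigmoid(X * theta);

X = [1 0; 1 2; 1 -2];      % three made-up examples with one feature each
theta = [0; 1];            % illustrative parameters, not fitted to anything
h = hypothesis(theta, X);  % estimated P(y = 1 | x; theta) for each row
```

Because g maps any real z into the open interval (0, 1), every entry of `h` can be read as a probability, and z = 0 gives exactly 0.5.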

1.2 cost function

Cost(h_\theta(x),y)=\left\{ \begin{aligned} &\ -log(h_\theta(x))\qquad\ \ \ if \ y=1 \\ &\ -log(1-h_\theta(x))\quad if \ y=0 \end{aligned} \right.
J(\theta)=\frac{1}{m} \sum_{i=1}^m Cost(h_\theta(x^{(i)}),y^{(i)})

[Figure: cost curve for y=1]

[Figure: cost curve for y=0]

simplified cost function:
Cost(h_\theta(x),y)=-ylog(h_\theta(x))-(1-y)log(1-h_\theta(x))
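As a quick check that the simplified form agrees with the piecewise definition, the sketch below evaluates both on a tiny made-up data set. The names `costJ`, `Jpiecewise`, and `Jsimplified` and the data are assumptions introduced for illustration only.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% J(theta): the simplified cost averaged over the m training examples.
costJ = @(theta, X, y) mean(-y .* log(sigmoid(X * theta)) ...
                            - (1 - y) .* log(1 - sigmoid(X * theta)));

X = [1 0; 1 2; 1 -2];   % made-up inputs with x_0 = 1 in the first column
y = [1; 1; 0];          % made-up labels

% With theta = 0 every prediction is 0.5, so J equals log(2).
J0 = costJ([0; 0], X, y);

% Piecewise definition: -log(h) where y = 1, -log(1 - h) where y = 0.
theta = [0; 1];
h = sigmoid(X * theta);
Jpiecewise = (sum(-log(h(y == 1))) + sum(-log(1 - h(y == 0)))) / numel(y);
Jsimplified = costJ(theta, X, y);   % agrees with Jpiecewise up to rounding
```

The two forms agree because y is always 0 or 1, so exactly one of the two terms in the simplified expression is non-zero for each example.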

gradient descent:
repeat {
\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
}
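The update rule above can be sketched as a vectorized loop. The data set, the learning rate `alpha`, and the iteration count `numIters` are illustrative assumptions, not values from the notes; the data are deliberately not linearly separable so that the minimum is finite.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
costJ = @(theta, X, y) mean(-y .* log(sigmoid(X * theta)) ...
                            - (1 - y) .* log(1 - sigmoid(X * theta)));

% Made-up training set: first column is x_0 = 1, second is the single feature.
X = [1 0.5; 1 1.5; 1 2.0; 1 3.0; 1 3.5; 1 4.5];
y = [0; 0; 1; 0; 1; 1];
m = size(X, 1);

alpha = 0.1;       % assumed learning rate
numIters = 2000;   % assumed number of iterations
theta = zeros(2, 1);
Jstart = costJ(theta, X, y);

for iter = 1:numIters
  h = sigmoid(X * theta);          % predictions under the current theta
  grad = (X' * (h - y)) / m;       % (1/m) * sum((h - y) .* x_j) for every j
  theta = theta - alpha * grad;    % all theta_j are updated simultaneously
end
Jend = costJ(theta, X, y);
```

Computing `grad` for all j before touching `theta` is what makes the update simultaneous; updating one component at a time inside the loop would mix old and new parameter values.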

1.3 Optimization algorithms

  • Gradient descent
  • Conjugate gradient
  • BFGS
  • L-BFGS

cost function:
function [jVal, gradient] = costFunction(theta)
  jVal = [...code to compute J(theta)...];
  gradient = [...code to compute derivative of J(theta)...];
end

Then we can use Octave's "fminunc()" optimization routine together with the "optimset()" function, which creates a structure holding the options we want to pass to "fminunc()":

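% 'GradObj' 'on' tells fminunc that costFunction also returns the gradient; 'MaxIter' caps the iterations.
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);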
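For logistic regression, the template above might be filled in roughly as follows. This is a sketch under stated assumptions: the extra arguments `X` and `y`, the anonymous wrapper passed to `fminunc`, and the data are additions for illustration and do not appear in the original template. The leading `1;` is only needed when the code is saved as an Octave script file, so that the file is not treated as a function file.

```octave
1;  % Octave: marks the file as a script so a function can be defined inline

function [jVal, gradient] = costFunction(theta, X, y)
  % Logistic regression cost J(theta) and its gradient.
  m = size(X, 1);
  h = 1 ./ (1 + exp(-X * theta));                      % sigmoid(X * theta)
  jVal = mean(-y .* log(h) - (1 - y) .* log(1 - h));   % scalar cost
  gradient = (X' * (h - y)) / m;                       % same size as theta
end

% Made-up, non-separable training data (x_0 = 1 in the first column).
X = [1 0.5; 1 1.5; 1 2.0; 1 3.0; 1 3.5; 1 4.5];
y = [0; 0; 1; 0; 1; 1];

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);

% The wrapper fixes X and y so that fminunc sees a function of theta only.
[optTheta, functionVal, exitFlag] = ...
    fminunc(@(t) costFunction(t, X, y), initialTheta, options);
```

Unlike the hand-written gradient descent loop, no learning rate has to be chosen here; `fminunc` picks its own step sizes.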

1.4 multiclass classification

Train a logistic regression classifier h_\theta^{(i)}(x) for each class i to predict the probability that y=i. To make a prediction on a new x, pick the class i that maximizes h_\theta^{(i)}(x).
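The prediction step of this one-vs-all scheme can be sketched as below. The parameter matrix `allTheta` stands in for classifiers that have already been trained; its values, like the input `xNew`, are hypothetical.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical trained parameters: column k holds theta for the classifier
% that separates class k from all other classes.
allTheta = [ 2 -1 -3;
            -1  0  1];

xNew = [1; 4];                       % new input, with x_0 = 1
probs = sigmoid(allTheta' * xNew);   % estimated P(y = k | xNew) for k = 1..3
[~, predictedClass] = max(probs);    % pick the class with the largest value
```

The three scores need not sum to 1, because each classifier is trained independently; only their ranking is used.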

1.5 how to address overfitting

  • reduce number of features
  • regularization
1.5.1 Regularized Linear Regression

small values for the parameters \theta_0,\theta_1,\dots,\theta_n give:
① a simpler hypothesis
② a model less prone to overfitting

J(\theta)=\frac{1}{2m}\big[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\big]
if \lambda is too large, the model underfits, because the penalized parameters \theta_1,\dots,\theta_n are all pushed close to 0 and h_\theta(x)\approx\theta_0.
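A small numeric sketch of this cost, with the penalty summed over j = 1..n so that \theta_0 is left out. The handle name `regCost` and the data are made up for illustration.

```octave
% Regularized linear regression cost; theta(1) plays the role of theta_0
% and is excluded from the penalty.
regCost = @(theta, X, y, lambda) ...
    (sum((X * theta - y) .^ 2) + lambda * sum(theta(2:end) .^ 2)) ...
    / (2 * size(X, 1));

X = [1 1; 1 2; 1 3];   % made-up inputs with x_0 = 1
y = [2; 4; 6];         % made-up targets, exactly y = 2 * x
theta = [0; 2];        % fits the data perfectly

JnoReg = regCost(theta, X, y, 0);   % squared-error part only
Jreg   = regCost(theta, X, y, 3);   % adds 3 * 2^2 / (2 * 3) from the penalty
```

Even a perfect fit pays a price once \lambda > 0, which is what pulls the minimizer toward smaller \theta_j.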

gradient descent:
repeat{
\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}
\theta_j:=\theta_j-\alpha\big[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\big]\quad (j=1,\dots,n)
}
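With the penalty term inside the step that is scaled by \alpha, the \theta_j terms of the update can be grouped. This rearrangement is a standard way to read the rule, not something stated in the notes:

```latex
\theta_j := \theta_j\Big(1-\alpha\frac{\lambda}{m}\Big)
          -\alpha\frac{1}{m}\sum_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)}
          \qquad (j=1,\dots,n)
```

For typical values of \alpha, \lambda and m the factor 1-\alpha\frac{\lambda}{m} is slightly below 1, so each iteration first shrinks \theta_j a little and then applies the usual unregularized step.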

normal equation:
[Figure: normal equation (image missing)]
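The formula itself is not shown in the notes. Assuming the intended result is the standard closed form for the cost in this section, \theta=(X^TX+\lambda L)^{-1}X^Ty, where L is the (n+1)\times(n+1) identity matrix with its top-left entry set to 0 so that \theta_0 is not penalized, it can be sketched and checked as follows. The data and the value of `lambda` are made up.

```octave
% Made-up training data with x_0 = 1 in the first column.
X = [1 1; 1 2; 1 3; 1 4];
y = [2; 3; 5; 6];
lambda = 1;
m = size(X, 1);

% L is the identity with a zero in the top-left corner (theta_0 unpenalized).
L = eye(size(X, 2));
L(1, 1) = 0;

% Solve (X'X + lambda * L) * theta = X'y without forming an explicit inverse.
theta = (X' * X + lambda * L) \ (X' * y);

% Gradient of the regularized cost at the solution; should be about zero.
grad = (X' * (X * theta - y) + lambda * L * theta) / m;
```

Using the backslash operator instead of `inv` is a numerical choice made for this sketch; both express the same formula.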

1.5.2 Regularized Logistic Regression

J(\theta)=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\big]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2
note: \theta_0 is not regularized; the penalty sum starts at j=1.

gradient descent:
repeat{
\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}
\theta_j:=\theta_j-\alpha\big[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\big]\quad (j=1,\dots,n)
}
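A sketch of this regularized update for logistic regression, run once with \lambda = 0 and once with \lambda = 1 to show the shrinking effect on \theta_1. The data, `alpha`, the iteration count, and the chosen \lambda values are illustrative assumptions.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Made-up, non-separable training data (x_0 = 1 in the first column).
X = [1 0.5; 1 1.5; 1 2.0; 1 3.0; 1 3.5; 1 4.5];
y = [0; 0; 1; 0; 1; 1];
m = size(X, 1);

alpha = 0.1;          % assumed learning rate
lambdas = [0 1];      % no regularization vs. some regularization
slopes = zeros(1, 2); % fitted theta_1 for each lambda

for k = 1:numel(lambdas)
  lambda = lambdas(k);
  theta = zeros(2, 1);
  for iter = 1:3000
    h = sigmoid(X * theta);
    grad = (X' * (h - y)) / m;
    % Add the penalty gradient for j >= 1 only; theta_0 is left alone.
    grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);
    theta = theta - alpha * grad;
  end
  slopes(k) = theta(2);
end
```

On this data the slope fitted with \lambda = 1 comes out smaller in magnitude than the unregularized one, which is the intended effect of the penalty.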


Appendices

derivation of the cost function
first:
h_\theta(x)=P(y=1|x;\theta)=1-P(y=0|x;\theta), the estimated probability that y=1 on input x, parameterized by \theta
h_\theta(x)=g(z)=\frac{1}{1+e^{-z}}\quad\quad z=\theta^Tx
so both cases can be written as one formula:
P(y|x;\theta)=g(z)^y(1-g(z))^{(1-y)}\qquad (y\in\{0,1\})----(1)
then apply maximum likelihood estimation (MLE). For m independent training examples the likelihood is:
L(\theta)=\prod_{i=1}^mP(y^{(i)}|x^{(i)};\theta)----(2)
substitute equation (1) into equation (2):
L(\theta)=\prod_{i=1}^mg(z^{(i)})^{y^{(i)}}(1-g(z^{(i)}))^{1-y^{(i)}}----(3)

take the natural logarithm of both sides of equation (3):
\ln(L(\theta))=\sum_{i=1}^m\big[y^{(i)}\ln(g(z^{(i)}))+(1-y^{(i)})\ln(1-g(z^{(i)}))\big]----(4)

MLE looks for the \theta that maximizes equation (4). Maximizing (4) is the same as minimizing its negative, so we define
J(\theta)=-\frac{1}{m}\ln(L(\theta))=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}\ln(g(z^{(i)}))+(1-y^{(i)})\ln(1-g(z^{(i)}))\big]----(5)

next, we compute the derivative \frac{\partial J}{\partial \theta_j}:
\begin{aligned}
\frac{\partial J}{\partial \theta_j}&=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}\frac{1}{g(z^{(i)})}+(1-y^{(i)})\frac{-1}{1-g(z^{(i)})}\big]\frac{\partial g(z^{(i)})}{\partial \theta_j} \\
&=-\frac{1}{m}\sum_{i=1}^m\big[\frac{y^{(i)}}{g(z^{(i)})}-\frac{1-y^{(i)}}{1-g(z^{(i)})}\big]g(z^{(i)})(1-g(z^{(i)}))x_j^{(i)} \\
&=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}(1-g(z^{(i)}))-(1-y^{(i)})g(z^{(i)})\big]x_j^{(i)} \\
&=\frac{1}{m}\sum_{i=1}^m\big(g(z^{(i)})-y^{(i)}\big)x_j^{(i)}
\end{aligned}
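The step that replaces \frac{\partial g(z^{(i)})}{\partial \theta_j} relies on the derivative of the sigmoid, which the derivation uses without spelling out. A short worked version, using only the definitions of g and z from this appendix:

```latex
\begin{aligned}
g'(z) &= \frac{d}{dz}\,\frac{1}{1+e^{-z}}
       = \frac{e^{-z}}{(1+e^{-z})^2}
       = \frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}}
       = g(z)\,(1-g(z)) \\
\frac{\partial g(z^{(i)})}{\partial \theta_j}
      &= g'(z^{(i)})\,\frac{\partial z^{(i)}}{\partial \theta_j}
       = g(z^{(i)})\,(1-g(z^{(i)}))\,x_j^{(i)}
       \qquad\text{since } z^{(i)}=\theta^Tx^{(i)}
\end{aligned}
```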
so the gradient descent update is:
\theta_j:=\theta_j-\alpha\frac{\partial J}{\partial \theta_j}
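The derivative can be sanity-checked numerically by comparing it with a central finite difference of J. The sketch below uses the gradient \frac{1}{m}\sum_i(g(z^{(i)})-y^{(i)})x_j^{(i)}, with the 1/m factor that matches J in equation (5) and the update rule in section 1.2. The data, the test point `theta`, and the step `epsilon` are made up for the check.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
costJ = @(theta, X, y) mean(-y .* log(sigmoid(X * theta)) ...
                            - (1 - y) .* log(1 - sigmoid(X * theta)));
analyticGrad = @(theta, X, y) (X' * (sigmoid(X * theta) - y)) / size(X, 1);

X = [1 0.5; 1 1.5; 1 2.0; 1 3.0];   % made-up inputs with x_0 = 1
y = [0; 0; 1; 1];                   % made-up labels
theta = [0.3; -0.2];                % arbitrary point at which to compare

epsilon = 1e-6;                     % finite-difference step
numericGrad = zeros(size(theta));
for j = 1:numel(theta)
  e = zeros(size(theta));
  e(j) = epsilon;                   % perturb only theta_j
  numericGrad(j) = (costJ(theta + e, X, y) - costJ(theta - e, X, y)) ...
                   / (2 * epsilon);
end

maxDiff = max(abs(numericGrad - analyticGrad(theta, X, y)));
```

A `maxDiff` many orders of magnitude below the size of the gradient entries suggests the analytic expression is consistent with J; a large value would point to a sign or scaling mistake.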
