ML notes(10/03/20)

Nonlinear Regression Models (video 1)

definition

Nonlinear Regression is any relationship between an independent variable x and a dependent variable y that is modelled by a nonlinear function of x.

difference between Linear Regression & Non-Linear Regression

Linear Regression: the model is linear in x, e.g. y=\beta^Tx+\epsilon.

Non-Linear Regression: the model is linear in the parameters but nonlinear in x, e.g. y=\beta^T\phi(x)+\epsilon.

Remark: \phi is a mapping \phi: R^d \rightarrow R^{p+1}, where d is the dimension of the independent variable. The components of \phi are called basis functions, with \phi_0(x)=1 serving as the constant (intercept) feature.

Converting Non-Linear Regression to Linear Regression

  • Map x\in R^d to \phi(x)\in R^{p+1} to obtain nonlinear features
  • Run linear regression on y and the nonlinear features \phi(x)

Hence, to solve Non-Linear Regression we only need to replace x with \phi(x), which means the parameter estimation for linear regression, \hat\beta=(X^TX)^{-1}X^Ty, becomes \hat\beta=(\Phi^T\Phi)^{-1}\Phi^Ty, where \Phi is the design matrix whose rows are \phi(x_i)^T.
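The conversion above can be sketched in a few lines of Python. This is a minimal example, not from the notes: it assumes a cubic polynomial basis and toy sine-shaped data, and fits the mapped features by ordinary least squares.

```python
import numpy as np

def poly_basis(x, p):
    """Map each scalar x to (1, x, ..., x^p) in R^{p+1}; phi_0(x) = 1."""
    return np.vander(x, p + 1, increasing=True)

# Toy data (assumed for illustration): a nonlinear trend plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(50)

Phi = poly_basis(x, 3)                          # design matrix of basis features
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # ordinary linear regression on phi(x)
y_hat = Phi @ beta                              # nonlinear fit in x, linear in beta
```

The fit is still linear regression: only the features changed from x to \phi(x).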

Common basis function design

  • polynomial: \phi_j(x)=x^j
  • radial basis function: \phi_j(x)=\exp(-\frac{(x-\mu_j)^2}{2\lambda_j^2})
  • sigmoid: \phi_j(x)=\sigma(\frac{x-\mu_j}{s}) where \sigma(t)=\frac{1}{1+e^{-t}}
  • splines
  • Fourier bases
  • wavelets
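Two of the basis functions in the list can be written out directly. A minimal sketch; the centre \mu_j and width values used below are illustrative choices, not values from the notes.

```python
import numpy as np

def rbf(x, mu, lam):
    """Radial basis: phi_j(x) = exp(-(x - mu_j)^2 / (2 * lam_j^2))."""
    return np.exp(-(x - mu) ** 2 / (2 * lam ** 2))

def sigmoid_basis(x, mu, s):
    """Sigmoid basis: phi_j(x) = sigma((x - mu_j) / s), sigma(t) = 1/(1+e^-t)."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

x = np.linspace(-2, 2, 5)
print(rbf(x, mu=0.0, lam=1.0))            # peaks at the centre mu, decays away from it
print(sigmoid_basis(x, mu=0.0, s=1.0))    # smooth step through 0.5 at mu
```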

Regularization (video 2)

definition

A regularization technique is, in simple terms, a penalty mechanism that shrinks the coefficients (drives them closer to zero) to build a more robust and parsimonious model.

A regularization technique helps in the following main ways:

  • Doesn’t assume any particular distribution of the independent variable
  • Addresses the variance-bias tradeoff; it generally lowers the variance of the model
  • More robust in handling multicollinearity (many highly correlated columns)
  • Better handling of sparse data (observations, i.e. number of rows, < features)
  • Natural feature selection
  • More accurate prediction on new data, as it minimizes overfitting on the training data
  • Easier interpretation of the output

Concept review:

norm: ||\beta||_p=(\sum_{i=1}^{d}|\beta_i|^p)^{1/p}

Linear Regression: \hat\beta=\arg\min_\beta||y-X\beta||_2^2

Linear Regression with Regularization: \hat\beta=\arg\min_\beta||y-X\beta||_2^2+R(\beta)

Handling of the R function

The difference between Linear Regression and Linear Regression with Regularization lies mainly in how the R function, R(\beta), is defined.
Examples:
(Note: \lambda is the penalty strength, a hyperparameter that is held fixed during training and must be chosen by some other mechanism.
\lambda\rightarrow 0 recovers ordinary linear regression;
\lambda\rightarrow\infty forces \beta=0.)

  • LASSO: R(\beta)=\lambda||\beta||_1=\lambda\sum_{i=1}^{d}|\beta_i|
  • Ridge: R(\beta)=\lambda||\beta||_2^2=\lambda\sum_{i=1}^{d}\beta_i^2
  • Elastic Net (a combination of LASSO and Ridge): R(\beta)=\lambda_1||\beta||_1+\lambda_2||\beta||_2^2=\lambda_1\sum_{i=1}^{d}|\beta_i|+\lambda_2\sum_{i=1}^{d}\beta_i^2
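For the Ridge penalty, the regularized least-squares problem has the closed form \hat\beta=(X^TX+\lambda I)^{-1}X^Ty, which makes the \lambda\rightarrow 0 and \lambda\rightarrow\infty limits easy to see. A minimal sketch on toy data (the coefficients and sizes below are assumptions for illustration):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam * I) beta = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy data: 100 observations, 5 features, known coefficients plus noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.standard_normal(100)

b_ols = ridge(X, y, lam=0.0)    # lambda -> 0: ordinary linear regression
b_big = ridge(X, y, lam=1e6)    # lambda -> infinity: coefficients driven toward 0
print(np.linalg.norm(b_big) < np.linalg.norm(b_ols))  # True: the penalty shrinks beta
```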

Properties of the three regressions

Ridge Regression Properties:

  • Ridge regression shrinks the coefficients, which helps to reduce model complexity and multicollinearity.
  • Coefficients can approach zero but never become exactly zero.

Lasso Regression Properties:

  • The lasso is a shrinkage method like ridge, but acts in a nonlinear manner on the outcome y.
  • L_1 regularization can lead to zero coefficients, i.e. some of the features are completely neglected in the evaluation of the output. So Lasso regression not only helps in reducing over-fitting but also helps with feature selection.
  • In the p > n case, Lasso selects at most n variables before it saturates.
  • If there is a group of variables among which the pairwise correlations are very high, Lasso tends to select only one variable from the group (which of the highly correlated variables gets selected is essentially arbitrary, not fixed).
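The "exact zeros" property can be illustrated with the soft-thresholding operator: in the orthonormal-design case, the lasso solution is \hat\beta_j=\mathrm{sign}(b_j)\max(|b_j|-\lambda, 0) applied to the least-squares coefficients b_j. A sketch with hypothetical numbers:

```python
import numpy as np

def soft_threshold(b, lam):
    """Lasso solution for orthonormal features: shrink by lam, clip to zero."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

b_hat = np.array([2.5, -0.3, 0.05, -1.2])   # hypothetical least-squares coefficients
print(soft_threshold(b_hat, lam=0.5))        # entries below lam become exactly 0
```

Coefficients smaller than \lambda in magnitude are set exactly to zero, while ridge would only rescale them, matching the feature-selection contrast above.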

Elastic Net Regression Properties:

  • Combines both L_1 and L_2 regularization
  • The L_1 part of the penalty generates a sparse model
  • The L_2 part of the penalty:
    • removes the limitation on the number of selected variables (up to p can be selected)
    • encourages a grouping effect
    • stabilizes the L_1 regularization path