ML notes(10/03/20)

Nonlinear Regression Models (video 1)

definition

Nonlinear Regression is any relationship between an independent variable x and a dependent variable y that is modelled by a nonlinear function of x.

difference between Linear Regression & Non-Linear Regression

Linear Regression: the model is linear in x, e.g. y=\beta^Tx+\epsilon.

Non-Linear Regression: the model is linear in the parameters but nonlinear in x, e.g. y=\beta^T\phi(x)+\epsilon.

Remark: \phi is a mapping \phi: R^d \rightarrow R^{p+1}, where d is the dimension of the independent variable. The components of \phi are called basis functions, with \phi_0(x)=1 serving as the constant (intercept) feature.

Converting Non-Linear Regression to Linear Regression

  • Map x\in R^d to \phi(x)\in R^{p+1} to obtain nonlinear features
  • Run linear regression on y and the nonlinear features \phi(x)

Hence, to solve Non-Linear Regression we only need to replace x with \phi(x), which means the parameter estimation for linear regression, \hat\beta=(X^TX)^{-1}X^Ty, becomes \hat\beta=(\Phi^T\Phi)^{-1}\Phi^Ty, where \Phi is the design matrix whose rows are \phi(x_i)^T.
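The conversion above can be sketched in a few lines of Python. This is a minimal example, not from the notes: it assumes a cubic polynomial basis and toy sine-shaped data, and fits the mapped features by ordinary least squares.

```python
import numpy as np

def poly_basis(x, p):
    """Map each scalar x to (1, x, ..., x^p) in R^{p+1}; phi_0(x) = 1."""
    return np.vander(x, p + 1, increasing=True)

# Toy data (assumed for illustration): a nonlinear trend plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(50)

Phi = poly_basis(x, 3)                          # design matrix of basis features
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # ordinary linear regression on phi(x)
y_hat = Phi @ beta                              # nonlinear fit in x, linear in beta
```

The fit is still linear regression: only the features changed from x to \phi(x).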

Common basis function design

  • polynomial: \phi_j(x)=x^j
  • radial basis function: \phi_j(x)=\exp(-\frac{(x-\mu_j)^2}{2\lambda_j^2})
  • sigmoid: \phi_j(x)=\sigma(\frac{x-\mu_j}{s}) where \sigma(t)=\frac{1}{1+e^{-t}}
  • splines
  • Fourier bases
  • wavelets
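Two of the basis functions in the list can be written out directly. A minimal sketch; the centre \mu_j and width values used below are illustrative choices, not values from the notes.

```python
import numpy as np

def rbf(x, mu, lam):
    """Radial basis: phi_j(x) = exp(-(x - mu_j)^2 / (2 * lam_j^2))."""
    return np.exp(-(x - mu) ** 2 / (2 * lam ** 2))

def sigmoid_basis(x, mu, s):
    """Sigmoid basis: phi_j(x) = sigma((x - mu_j) / s), sigma(t) = 1/(1+e^-t)."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

x = np.linspace(-2, 2, 5)
print(rbf(x, mu=0.0, lam=1.0))            # peaks at the centre mu, decays away from it
print(sigmoid_basis(x, mu=0.0, s=1.0))    # smooth step through 0.5 at mu
```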

Regularization (video 2)

definition

A regularization technique is, in simple terms, a penalty mechanism that shrinks the coefficients (drives them closer to zero) to build a more robust and parsimonious model.

A regularization technique helps in the following main ways:

  • Doesn’t assume any particular distribution of the independent variable
  • Addresses the variance-bias tradeoff; it generally lowers the variance of the model
  • More robust in handling multicollinearity (many highly correlated columns)
  • Better handling of sparse data (observations, i.e. number of rows, < features)
  • Natural feature selection
  • More accurate prediction on new data, as it minimizes overfitting on the training data
  • Easier interpretation of the output

Concept review:

norm: ||\beta||_p=(\sum_{i=1}^{d}|\beta_i|^p)^{1/p}

Linear Regression: \hat\beta=\arg\min_\beta||y-X\beta||_2^2

Linear Regression with Regularization: \hat\beta=\arg\min_\beta||y-X\beta||_2^2+R(\beta)

Handling of the R function

The difference between Linear Regression and Linear Regression with Regularization lies mainly in how the R function, R(\beta), is defined.
Examples:
(Note: \lambda is the penalty strength, a hyperparameter that is held fixed during training and must be chosen by some other mechanism.
\lambda\rightarrow 0 recovers ordinary linear regression;
\lambda\rightarrow\infty forces \beta=0.)

  • LASSO: R(\beta)=\lambda||\beta||_1=\lambda\sum_{i=1}^{d}|\beta_i|
  • Ridge: R(\beta)=\lambda||\beta||_2^2=\lambda\sum_{i=1}^{d}\beta_i^2
  • Elastic Net (a combination of LASSO and Ridge): R(\beta)=\lambda_1||\beta||_1+\lambda_2||\beta||_2^2=\lambda_1\sum_{i=1}^{d}|\beta_i|+\lambda_2\sum_{i=1}^{d}\beta_i^2
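For the Ridge penalty, the regularized least-squares problem has the closed form \hat\beta=(X^TX+\lambda I)^{-1}X^Ty, which makes the \lambda\rightarrow 0 and \lambda\rightarrow\infty limits easy to see. A minimal sketch on toy data (the coefficients and sizes below are assumptions for illustration):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam * I) beta = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy data: 100 observations, 5 features, known coefficients plus noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.standard_normal(100)

b_ols = ridge(X, y, lam=0.0)    # lambda -> 0: ordinary linear regression
b_big = ridge(X, y, lam=1e6)    # lambda -> infinity: coefficients driven toward 0
print(np.linalg.norm(b_big) < np.linalg.norm(b_ols))  # True: the penalty shrinks beta
```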

Properties of the three regressions

Ridge Regression Properties:

  • Ridge regression shrinks the coefficients, which helps to reduce model complexity and multicollinearity.
  • Coefficients can approach zero but never become exactly zero.

Lasso Regression Properties:

  • The lasso is a shrinkage method like ridge, but acts in a nonlinear manner on the outcome y.
  • L_1 regularization can lead to zero coefficients, i.e. some of the features are completely neglected in the evaluation of the output. So Lasso regression not only helps in reducing over-fitting but also helps with feature selection.
  • In the p > n case, Lasso selects at most n variables before it saturates.
  • If there is a group of variables among which the pairwise correlations are very high, Lasso tends to select only one variable from the group (which of the highly correlated variables gets selected is essentially arbitrary, not fixed).
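The "exact zeros" property can be illustrated with the soft-thresholding operator: in the orthonormal-design case, the lasso solution is \hat\beta_j=\mathrm{sign}(b_j)\max(|b_j|-\lambda, 0) applied to the least-squares coefficients b_j. A sketch with hypothetical numbers:

```python
import numpy as np

def soft_threshold(b, lam):
    """Lasso solution for orthonormal features: shrink by lam, clip to zero."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

b_hat = np.array([2.5, -0.3, 0.05, -1.2])   # hypothetical least-squares coefficients
print(soft_threshold(b_hat, lam=0.5))        # entries below lam become exactly 0
```

Coefficients smaller than \lambda in magnitude are set exactly to zero, while ridge would only rescale them, matching the feature-selection contrast above.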

Elastic Net Regression Properties:

  • Combines both L_1 and L_2 regularization
  • The L_1 part of the penalty generates a sparse model
  • The L_2 part of the penalty:
    • removes the limitation on the number of selected variables (up to p can be selected)
    • encourages a grouping effect
    • stabilizes the L_1 regularization path