Table of Contents
1 Dimension reduction
1.1 Principal Component Analysis (PCA)
1.2 Nearest Neighbors
1.3 Discriminant Analysis
2 Anomaly detection
1 Dimension reduction
Because of the curse of dimensionality, we need to reduce the number of dimensions so that models produce more accurate results, while also cutting computation time and cost.
There are two broad categories of dimension reduction methods:
- Feature selection methods: specific features are selected from the original list of features for each data sample, and the other features are discarded. No new features are generated in this process.
- Feature extraction methods: engineer or extract new features from the original list of features in the data. The reduced subset of features will contain newly generated features that were not part of the original feature set, e.g. PCA.
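To make the distinction concrete, here is a minimal sketch of feature selection using a simple variance-threshold rule (the threshold 0.1 and the variable names are my own illustrative choices, not a standard recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 3] *= 0.001  # make the last feature nearly constant (tiny variance)

# Feature selection: keep original columns, drop low-variance ones.
# The surviving features are unchanged columns of the original data.
keep = X.var(axis=0) > 0.1
X_selected = X[:, keep]
print(X_selected.shape)  # (100, 3)
```

Feature extraction, by contrast, would replace the columns with new ones built as combinations of the originals, as PCA does in the next section.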
1.1 Principal Component Analysis (PCA)
In short, Principal Component Analysis (PCA) is a linear dimension reduction technique: it finds the directions of greatest variance in high-dimensional data, thereby preserving most of the information, and projects the data onto a lower-dimensional subspace.
Principle
PCA applies an orthogonal transformation to the observed data, converting a set of highly correlated variables into a set of linearly uncorrelated variables (the principal components, which are linear combinations of the original variables). The transformation is chosen so that the first component has the largest possible variance.
- The first principal component is that linear combination of the original variables whose variance is greatest among all possible linear combinations.
- The second principal component is that linear combination of the original variables that accounts for a maximum proportion of the remaining variance, subject to being uncorrelated with the first principal component.
- Subsequent components are defined similarly.
In effect, PCA quantifies the relationships among the variables by finding a set of principal axes that best explain the variance in the data.
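The sequential definition above can be sketched via an eigendecomposition of the covariance matrix (a minimal illustration, assuming numpy; function and variable names are my own, not a library API):

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA: project X onto its top principal axes."""
    # Center the data so the covariance is taken about the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (symmetric).
    cov = np.cov(X_centered, rowvar=False)
    # eigh returns eigenvalues in ascending order; eigenvectors of a
    # symmetric matrix are orthogonal, matching the uncorrelatedness
    # requirement between successive components.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort descending so the first axis explains the most variance.
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components, components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z, components = pca(X, 2)
print(Z.shape)  # (100, 2)
```

The projected coordinates in `Z` are the scores along the first two principal axes, ordered by the variance they explain.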
Practical tips
- Scale the data
Standardize the data before running PCA so that each variable contributes according to its actual range rather than its unit of measurement; otherwise the principal components will be biased toward the variable with the largest scale.
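A quick demonstration of why scaling matters (a sketch with synthetic data; the helper function is my own, not a library call — the eigenvalues of the covariance matrix are the variances along the principal axes):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two independent features on very different scales.
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1000, 500)])

def explained_variance_ratio(X):
    # Eigenvalues of the covariance matrix = variances along the PCs.
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    return eigvals / eigvals.sum()

# Without scaling, the large-scale feature dominates the first PC.
print(explained_variance_ratio(X))     # roughly [1.0, 0.0]

# Standardize: zero mean, unit variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(explained_variance_ratio(X_std)) # roughly [0.5, 0.5]
```

After standardization, both features contribute comparably, so neither one biases the leading principal component.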