A Basic Introduction to the sklearn Library
Machine Learning in Python
- Simple and efficient tools for data mining and data analysis
- Accessible to everybody, and reusable in various contexts
- Built on NumPy, SciPy, and matplotlib
- Open source, commercially usable - BSD license
So what can sklearn do?
As the official website shows:
Classification: Identifying which category an object belongs to.
SVM, nearest neighbors, random forest, ...
Regression: Predicting a continuous-valued attribute associated with an object.
SVR, ridge regression(L2), Lasso(L1), ...
Clustering (unsupervised learning): Automatic grouping of similar objects into sets.
K-Means, spectral clustering, mean-shift, ...
Dimensionality reduction: Reducing the number of random variables to consider.
PCA (principal component analysis), feature selection, non-negative matrix factorization.
Model selection: Comparing, validating and choosing parameters and models.
grid search, cross validation, metrics.
Preprocessing: Feature extraction and normalization.
preprocessing, feature extraction.
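To make these categories concrete, here is a minimal sketch that touches several of them on the small iris dataset bundled with sklearn. The dataset, the parameter grid, and the variable names are illustrative assumptions, not something the official site prescribes: the first part chains preprocessing, classification, and model selection, and the second part combines dimensionality reduction with clustering.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load a small built-in dataset (150 samples, 4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data for a final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Preprocessing + classification: standardize features, then fit an SVM.
pipeline = make_pipeline(StandardScaler(), SVC())

# Model selection: grid search with 5-fold cross validation.
# The grid values below are arbitrary choices for illustration.
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")

# Dimensionality reduction: project the 4 features onto 2 principal components.
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# Clustering (unsupervised): group the projected points without using y.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_2d)
print("cluster sizes:", [int((cluster_labels == k).sum()) for k in range(3)])
```

Most sklearn estimators share the same fit / predict / transform interface, so swapping SVC for another classifier (for example a random forest) should mainly require changing the pipeline step and the parameter grid keys.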
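When the dataset is not very large, sklearn is an excellent choice. It does have limitations, though. sklearn generally runs on a single machine and is bound by that machine's performance, so when the data volume is very large or the problem is more complex, it may not be as handy.
More concrete sklearn usage examples and analyses will be added over time.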