Applied Machine Learning in Python

Module 1: Fundamentals of Machine Learning

Machine learning brings together statistics, computer science, and related fields.
Two main settings: supervised learning and unsupervised learning.
Typical pipeline: Representation -> Evaluation -> Optimization

  • scikit-learn
  • SciPy
  • NumPy
  • Pandas
  • Matplotlib

It is important to examine your dataset before starting to work with it for a machine learning task, because:

  1. You learn how much missing data the dataset contains.
  2. You get an idea of whether the features need further cleaning.
  3. It may turn out the problem doesn't require machine learning at all.
  • K-Nearest Neighbors Classification
  • A low value of “k” (close to 1) is more likely to overfit the training data and lead to worse accuracy on the test data, compared to higher values of “k”.
  • Setting “k” to the number of points in the training set will result in a classifier that always predicts the majority class.
  • The k-nearest neighbors classification algorithm has to memorize all of the training examples to make a prediction.
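The three k-NN points above can be sketched with scikit-learn. This is a minimal illustration on synthetic data (the dataset parameters and split are made up, not from the course); the slight class imbalance makes the majority class unambiguous.

```python
# Sketch: how the choice of k in k-NN changes behavior (illustrative data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Slightly imbalanced classes so "the majority class" is well defined.
X, y = make_classification(n_samples=200, n_features=4, weights=[0.7, 0.3],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in (1, 5, len(X_train)):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k={k}: test accuracy = {knn.score(X_test, y_test):.2f}")

# With k equal to the training-set size, every neighborhood is the whole
# training set, so every test point is assigned the majority class.
```

Note that the classifier stores `X_train` internally (the "memorize all training examples" point): fitting is cheap, and the cost is paid at prediction time.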

Module 2: Supervised Machine Learning

  • Linear regression
    \hat{y} = \hat{w}_0 x_0 + \hat{w}_1 x_1 + \cdots + \hat{w}_n x_n + \hat{b}
  • Logistic regression
  • Support Vector Machines
  • Multi-class classification
  • Kernelized Support Vector Machine
  • Cross validation
  • Decision tree (needs less preprocessing of data; easy to interpret and visualize)
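The linear regression formula above can be checked in a few lines. This is a sketch on noise-free synthetic data (the true weights and intercept below are made up), where ordinary least squares recovers the coefficients exactly:

```python
# Sketch: fitting y_hat = w_0*x_0 + ... + w_n*x_n + b with LinearRegression.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0  # true w = [2, -1, 0.5], true b = 3

model = LinearRegression().fit(X, y)
print(model.coef_)       # estimated w_hat -> close to [2, -1, 0.5]
print(model.intercept_)  # estimated b_hat -> close to 3
```

The fitted `coef_` and `intercept_` correspond to the \hat{w} and \hat{b} in the formula.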

Module 3: Model Evaluation and Selection

  • Represent/Train/Evaluate/Refine cycle
  • Preamble
  • Dummy Classifier (strategy: most_frequent, stratified, uniform, constant)
  • Dummy Regressor (strategy: mean, median, quantile, constant)
  • Confusion matrices & basic evaluation metrics
    1. Accuracy = \frac{TN+TP}{TN+TP+FN+FP}
    2. ClassificationError = \frac{FP+FN}{TN+TP+FN+FP}
    3. Recall = \frac{TP}{TP+FN}
      Recall is also known as:
    • True Positive Rate (TPR)
    • Sensitivity
    • Probability of detection
    4. Precision = \frac{TP}{TP+FP}
    5. False positive rate (FPR) = \frac{FP}{TN+FP}
      Specificity is the true negative rate, \frac{TN}{TN+FP} = 1 - FPR (not the FPR itself)
  • Classifier Decision Functions
  • Precision recall and ROC curves
  • Multi-class evaluation, Macro average vs Micro average
  • Regression evaluation (r2_score is usually enough)
  • Model selection: Optimizing classifiers for different evaluation metrics
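The confusion-matrix formulas above can be verified against scikit-learn directly. The toy label vectors below are made up for illustration:

```python
# Sketch: computing the basic metrics by hand from a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels {0, 1}.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy  = (tn + tp) / (tn + tp + fn + fp)
recall    = tp / (tp + fn)   # a.k.a. TPR / sensitivity
precision = tp / (tp + fp)
fpr       = fp / (tn + fp)   # specificity = 1 - fpr

print(accuracy, recall, precision, fpr)
```

These hand-computed values agree with `sklearn.metrics.accuracy_score`, `recall_score`, and `precision_score` on the same labels.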

Module 4: Supervised Machine Learning - Part 2

  • Naive Bayes Classifier (Naive means each feature of an instance is independent of all the others, given the class)
    1. Bernoulli, Multinomial (suitable for text classification)
    2. Gaussian (suitable for continuous, real-valued features)
  • Random Forests
  • Gradient Boosted Decision Trees
  • Neural Networks
  • Data leakage
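As a sketch of the Gaussian variant on continuous features, Gaussian Naive Bayes can be run on the iris dataset (the dataset choice and split are illustrative, not from the notes):

```python
# Sketch: GaussianNB models each continuous feature per class as a Gaussian.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB().fit(X_train, y_train)
print(f"test accuracy = {clf.score(X_test, y_test):.2f}")
```

For text classification with count features, `MultinomialNB` or `BernoulliNB` would replace `GaussianNB` with the same fit/score interface.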

Optional module: Unsupervised Machine Learning

  • Dimensionality Reduction and Manifold Learning
  • K-means clustering
  • Agglomerative clustering
  • DBSCAN Clustering
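A minimal K-means sketch on well-separated synthetic blobs (the blob parameters are illustrative). Note that clustering assigns arbitrary cluster ids, not class labels:

```python
# Sketch: K-means partitions unlabeled points into k clusters.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.shape)  # one centroid per cluster: (3, 2)
```

`n_clusters` must be chosen by the user; agglomerative clustering and DBSCAN relax that requirement in different ways (a full hierarchy vs. density-based discovery of the number of clusters).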