Module1 Classification

Paste_Image.png

Each observation is represented by a set of numbers(features).

Paste_Image.png
Paste_Image.png

\

Formally, given training set (xi,yi) for i=1…n, we want to create a classification model f that can predict label y for a new x.

Paste_Image.png

The machine learning algorithm will create the function f for you.
The predicted y for a new x is the sigh of f(x).

Loss Functions For Classificaiton

How do we measure classification error?


Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png

Statistical Learning Theory For Supervised Learning

Statistical Learning Theory

  • Ockham's Razor: The best models are simple models that fit the data well.
  • William of Ockham,English frier and philosopher (1287-1347) said that among hypotheses that predict equally well, we should choose the one with the fewest assumptions.

Paste_Image.png

We need a balance between accuracy and simplicity.
Most common machine learning methods choose f to minimize training error and complexity.
Aims to thwart the "curse" of dimensionality.

Paste_Image.png

Basic Outline for ML

  • step 1: Split data randomly into training and test sets.
  • step 2: Estimate coefficients/ Train Model:
Paste_Image.png
  • step 3: Score model: Compute score for each xi in the test set
  • step 4: Evaluate model.

Logistic Regression

simple, fast, often competes with the best ML algorithms.

Paste_Image.png
Paste_Image.png

Another perspective:

Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png

Evaluation Measures for Classifiers

Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png

ROC Curves

  • Started during WWII for analyzing radar signals.
  • For a particular False Positive Rate(FPR), what is the True Positive Rate(TPR)?
  • FPR = number of negatives that were classified by the ML algorithm as positives / total number of negatives
  • TPR = number of positives that were classified by the ML algorithm as positives / total number of positives.
Paste_Image.png

TPR=7/11
FPR=3/11

Paste_Image.png

TPR=3/11
FPR=2/11


Paste_Image.png
Paste_Image.png
Paste_Image.png
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容

  • 最近男朋友老忙的,然后我也老忙的,都没有时间在一起了😔 求老板给他减活! 求我能找到新机遇!
    任小菲阅读 140评论 0 0
  • 星期六,今天我正好也休息,这样在家就可以和儿子一起来做饭了。因为,上一次学校里就已经组织了一次让孩子动手给父母做一...
    张子洋妈妈阅读 840评论 0 2
  • 有时候,不带目的性地做一件自己喜欢做的事,愿意去做的事,我觉得那是种很美好的感觉,随心所欲,真诚的,善良的,开心就...
    秋陳阅读 279评论 0 1
  • 第一次来。 这个应用才下载不超过几个小时。 我甚至今天刚知道这个应用——简书。 我什么都不知道,只知道它能写东西,...
    Dreamfall阅读 139评论 0 0
  • 还是那句话,我只推荐自己读过的那些好书,其余的不做太多的评价。 计算机网络,谢希仁版<a id="orgheadl...
    Yihulee阅读 17,032评论 1 3