Model Performance Evaluation

Different machine learning tasks call for different evaluation metrics:

model performance evaluation metrics.png

Evaluating Classification Models

Confusion matrix:

confusion matrix.png

True Positive - we predicted "+" and the true class is "+"
True Negative - we predicted "-" and the true class is "-"
False Positive - we predicted "+" and the true class is "-" (Type I error)
False Negative - we predicted "-" and the true class is "+" (Type II error)
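
As a minimal sketch of how these four counts can be read off in code, the snippet below uses scikit-learn's `confusion_matrix` on a small, made-up set of labels (the `y_true` and `y_pred` values are assumptions for illustration, not data from this post).

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 is the "+" class, 0 is the "-" class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# With labels=[0, 1], scikit-learn lays the matrix out as
# [[TN, FP],
#  [FN, TP]]   (rows = true class, columns = predicted class).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

Note that scikit-learn puts the negative class in the first row and column, which may be the reverse of how the matrix is drawn in some textbooks.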

1. Accuracy:

The ratio of correctly predicted samples to the total number of samples.

accuracy.png

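2. Precision:

Precision is the fraction of samples predicted as positive that really are positive. A positive prediction can arise in two ways: a positive sample predicted as positive (TP), or a negative sample predicted as positive (FP).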

precision.png

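3. Recall:

Recall is the fraction of actual positive samples that are predicted correctly. Here too there are two possibilities: an actual positive predicted as positive (TP), or an actual positive predicted as negative (FN).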

recall.png
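
The three ratios above can be written directly in terms of the confusion-matrix counts. The sketch below assumes the standard definitions (accuracy = (TP+TN)/(TP+TN+FP+FN), precision = TP/(TP+FP), recall = TP/(TP+FN)). The function names and the example counts are made up for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Of everything predicted positive, the fraction that is truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of everything truly positive, the fraction that was predicted positive."""
    return tp / (tp + fn)


# Hypothetical counts for illustration.
tp, tn, fp, fn = 3, 4, 2, 1
print("accuracy :", accuracy(tp, tn, fp, fn))
print("precision:", precision(tp, fp))
print("recall   :", recall(tp, fn))
```

This sketch does not guard against a zero denominator (for example, a model that never predicts the positive class), which a real implementation would need to handle.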

4. P-R curve: trade-off between precision and recall

p-r curve.png

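Case 1: when precision matters more

Suppose we want to predict y = 1 (for example, that a person has cancer) only when we are very confident. We might then raise the threshold to 0.7:

Predict 1 if hθ(x) ≥ 0.7

Predict 0 if hθ(x) < 0.7

This gives higher precision but lower recall.

Case 2: when recall matters more

Suppose we want to avoid missing too many y = 1 cases. We then lower the threshold to 0.3:

Predict 1 if hθ(x) ≥ 0.3

Predict 0 if hθ(x) < 0.3

This gives higher recall but lower precision.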
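
To make the effect of the threshold concrete, here is a small sketch that applies thresholds of 0.3, 0.5 and 0.7 to the same scores and reports precision and recall each time. The labels and scores are invented for illustration, and `precision_recall_at` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical true labels and predicted probabilities h(x).
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.05, 0.15, 0.25, 0.35, 0.40, 0.55, 0.60, 0.75, 0.80, 0.95])


def precision_recall_at(threshold):
    """Predict 1 when the score is >= threshold, then measure precision and recall."""
    y_pred = (scores >= threshold).astype(int)
    return precision_score(y_true, y_pred), recall_score(y_true, y_pred)


for t in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(t)
    print(f"threshold={t:.1f}  precision={p:.3f}  recall={r:.3f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 moves precision up and recall down, which is the trade-off the P-R curve traces out.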

5. Comparing models by precision/recall: the F score

f score formula.png

Comparison example:

f score comparison.png
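
A short sketch of such a comparison, assuming the F score meant here is the usual F1 score, F1 = 2PR / (P + R). The three (precision, recall) pairs and the helper name `f1_from` are hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs for three models.
models = {"A": (0.5, 0.4), "B": (0.7, 0.1), "C": (0.02, 1.0)}

for name, (p, r) in models.items():
    print(f"model {name}: F1={f1_from(p, r):.4f}  plain average={(p + r) / 2:.2f}")

best = max(models, key=lambda name: f1_from(*models[name]))
print("best by F1:", best)
```

The plain average would rank model C first even though its precision is close to zero. F1 penalises that imbalance, which is why it is the more common single-number summary.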

6. Metric: ROC curve

Wikipedia's definition:

In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied.

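In logistic regression, for example, we set a threshold: samples that score above it are classified as positive and those below it as negative. If we lower the threshold, more samples are classified as positive. This raises the true positive rate, but it also causes more negative samples to be misclassified as positive. The ROC curve visualizes this change and can be used to judge how good a classifier is.

Intuitively, TPR is the probability of classifying a positive sample correctly, and FPR is the probability of misclassifying a negative sample as positive. In ROC space, each point has FPR as its horizontal coordinate and TPR as its vertical coordinate, so the curve shows the classifier's trade-off between TPR (true positive rate) and FPR (false positive rate).

How to read it: the closer the curve is to the top-left corner, the better the classifier.

roc curve.png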
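
As a rough illustration of how the points of an ROC curve come from sweeping the threshold, the sketch below calls scikit-learn's `roc_curve` on four made-up samples (the labels and scores are assumptions for illustration).

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical true labels and classifier scores.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Each returned threshold gives one (FPR, TPR) point on the curve.
fpr, tpr, thresholds = roc_curve(y_true, scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```

The first threshold returned is a placeholder above every score, so the curve starts at (0, 0). Lowering the threshold then moves the point toward (1, 1).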

7. Metric: AUC (Area Under the Curve)

The AUC value is equivalent to the probability that a randomly chosen positive example is ranked higher than a randomly chosen negative example.

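In other words, if we pick one positive sample and one negative sample at random, the AUC is the probability that the classifier gives the positive sample a higher score than the negative one.

How to read it: the larger the AUC, the better the classifier.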

AUC.png
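
The probabilistic reading of AUC can be checked by brute force on a tiny example: enumerate every (positive, negative) pair, count how often the positive sample scores higher, and compare with scikit-learn's `roc_auc_score`. The data is made up, and counting ties as half a win is the usual convention, stated here as an assumption.

```python
from itertools import product

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical true labels and classifier scores.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Fraction of (positive, negative) pairs where the positive is ranked higher;
# a tie counts as half a win.
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
pairwise_auc = wins / (len(pos) * len(neg))

print("pairwise AUC :", pairwise_auc)
print("roc_auc_score:", roc_auc_score(y_true, scores))
```

The pairwise loop is quadratic in the number of samples, so it is only meant to illustrate the definition, not to replace the library function.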

Understanding ROC and AUC better

understanding roc and auc better.png

Evaluating Regression Models

1. Mean Absolute Error (MAE)

mae.png

2. Mean Squared Error (MSE)

mse.png

3. Coefficient of determination (R2)

r2.png
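
As a sketch of the three regression metrics, assuming the figures show the standard definitions (MAE = mean of |y - ŷ|, MSE = mean of (y - ŷ)^2, R2 = 1 - SS_res / SS_tot), the snippet below computes them by hand with NumPy and compares the results with scikit-learn. The target and prediction values are made up.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical targets and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))                      # mean absolute error
mse = np.mean(errors ** 2)                         # mean squared error
ss_res = np.sum(errors ** 2)                       # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
r2 = 1 - ss_res / ss_tot                           # coefficient of determination

print("MAE:", mae, "| sklearn:", mean_absolute_error(y_true, y_pred))
print("MSE:", mse, "| sklearn:", mean_squared_error(y_true, y_pred))
print("R2 :", r2, "| sklearn:", r2_score(y_true, y_pred))
```

MAE and MSE are errors, so smaller is better. R2 is at most 1, and a value near 1 means the model explains most of the variance in the target.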

Python implementation of model performance evaluation:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

# Load the data
filename = "d:/my_project/input/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pd.read_csv(filename, names=names)
# Split the data into input features and the output label
array = data.values
X = array[:, 0:8]
y = array[:, 8]

# Classification: hold-out evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4)
# A larger max_iter makes it more likely that the solver converges on these unscaled features
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predicted = model.predict(X_test)
# Confusion matrix
matrix = confusion_matrix(y_test, predicted)
print(matrix)
# Classification report (precision, recall and F1 per class)
report = classification_report(y_test, predicted)
print(report)

# 10-fold cross-validation; the default score for a classifier is accuracy.
# Recent scikit-learn versions raise an error if random_state is set without shuffle=True.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
model = LogisticRegression(max_iter=1000)
result = cross_val_score(model, X, y, cv=kfold)
print("Accuracy: %.3f (%.3f)" % (result.mean(), result.std()))

# AUC
scoring = 'roc_auc'
result = cross_val_score(model, X, y, cv=kfold, scoring=scoring)
print("AUC: %.3f (%.3f)" % (result.mean(), result.std()))

# The metrics below are regression metrics. Here they are computed on the
# predicted 0/1 class labels, only to show the scoring names.
# MAE (scikit-learn reports it negated, so values closer to 0 are better)
scoring = 'neg_mean_absolute_error'
result = cross_val_score(model, X, y, cv=kfold, scoring=scoring)
print("MAE: %.3f (%.3f)" % (result.mean(), result.std()))
# MSE (also negated)
scoring = 'neg_mean_squared_error'
result = cross_val_score(model, X, y, cv=kfold, scoring=scoring)
print("MSE: %.3f (%.3f)" % (result.mean(), result.std()))
# R2
scoring = 'r2'
result = cross_val_score(model, X, y, cv=kfold, scoring=scoring)
print("R2: %.3f (%.3f)" % (result.mean(), result.std()))

Output:

model_evaluation.png
