For binary classification
When we talk about precision, recall, or F1 for binary classification, they are by default computed with respect to the positive class.
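As a quick illustration, here is a minimal sketch that computes precision, recall, and F1 for the positive class by counting TP, FP, and FN directly (the y_true / y_pred arrays are made up for this example):

import numpy as np

# toy binary labels, 1 = positive class, 0 = negative class (made-up values)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))   # predicted positive, actually positive
fp = np.sum((y_pred == 1) & (y_true == 0))   # predicted positive, actually negative
fn = np.sum((y_pred == 0) & (y_true == 1))   # predicted negative, actually positive

precision = tp / (tp + fp)                   # TP / (TP + FP)
recall = tp / (tp + fn)                      # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)                 # 0.75 0.75 0.75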
For multi-class classification
Suppose there are three classes [1, 2, 3]. Then we have P1, R1, F1_1 for class 1, P2, R2, F1_2 for class 2, and P3, R3, F1_3 for class 3.
We know:
P1 = TP1/(TP1+FP1)
R1 = TP1/(TP1+FN1)
F1_1 = 2(P1*R1)/(P1+R1)
————————————————
P2 = TP2/(TP2+FP2)
R2 = TP2/(TP2+FN2)
F1_2 = 2(P2*R2)/(P2+R2)
————————————————
P3 = TP3/(TP3+FP3)
R3 = TP3/(TP3+FN3)
F1_3 = 2(P3*R3)/(P3+R3)
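To make the per-class quantities concrete, here is a minimal sketch (with made-up 3-class labels) that treats each class k as the positive class in turn (one-vs-rest), counts TPk, FPk, FNk, and applies the formulas above:

import numpy as np

# toy 3-class labels, values made up for illustration
y_true = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])
y_pred = np.array([1, 2, 3, 1, 3, 3, 2, 2, 3])

for k in [1, 2, 3]:
    tp = np.sum((y_pred == k) & (y_true == k))   # TPk
    fp = np.sum((y_pred == k) & (y_true != k))   # FPk
    fn = np.sum((y_pred != k) & (y_true == k))   # FNk
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * (p * r) / (p + r)
    print(f"class {k}: P={p:.3f} R={r:.3f} F1={f:.3f}")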
So how do we compute the overall P, R, and F?
There are two ways to do this: micro averaging and macro averaging.
micro
P=(TP1+TP2+TP3)/(TP1+FP1+TP2+FP2+TP3+FP3)
R=(TP1+TP2+TP3)/(TP1+FN1+TP2+FN2+TP3+FN3)
F1 = 2(P*R)/(P+R)
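A minimal sketch of the micro average, continuing the made-up labels from the sketch above: sum TP, FP, and FN over all classes first, then compute P, R, and F1 once from the pooled counts:

import numpy as np

y_true = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])   # same made-up labels as above
y_pred = np.array([1, 2, 3, 1, 3, 3, 2, 2, 3])
classes = [1, 2, 3]

tp = sum(np.sum((y_pred == k) & (y_true == k)) for k in classes)   # TP1+TP2+TP3
fp = sum(np.sum((y_pred == k) & (y_true != k)) for k in classes)   # FP1+FP2+FP3
fn = sum(np.sum((y_pred != k) & (y_true == k)) for k in classes)   # FN1+FN2+FN3

micro_p = tp / (tp + fp)
micro_r = tp / (tp + fn)
micro_f1 = 2 * (micro_p * micro_r) / (micro_p + micro_r)
# for single-label multi-class data, total FP equals total FN,
# so micro P, R, and F1 all coincide
print(micro_p, micro_r, micro_f1)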
macro
P=(P1+P2+P3)/3
R=(R1+R2+R3)/3
F1 =(F1_1+F1_2+F1_3)/3
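And a sketch of the macro average under the same assumptions: compute P, R, and F1 per class first, then take their unweighted mean:

import numpy as np

y_true = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])   # same made-up labels as above
y_pred = np.array([1, 2, 3, 1, 3, 3, 2, 2, 3])

ps, rs, f1s = [], [], []
for k in [1, 2, 3]:
    tp = np.sum((y_pred == k) & (y_true == k))
    fp = np.sum((y_pred == k) & (y_true != k))
    fn = np.sum((y_pred != k) & (y_true == k))
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    ps.append(p)
    rs.append(r)
    f1s.append(2 * (p * r) / (p + r))

macro_p = np.mean(ps)     # (P1+P2+P3)/3
macro_r = np.mean(rs)     # (R1+R2+R3)/3
macro_f1 = np.mean(f1s)   # (F1_1+F1_2+F1_3)/3
print(macro_p, macro_r, macro_f1)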
scikit-learn source code (annotated excerpt)
# Arguments:
#   y_true: ground-truth labels
#   y_pred: predicted labels
#   beta=1.0: with the default beta, the F-beta score is the F1 score
# Note: _prf_divide is a private sklearn helper that divides element-wise
# and warns (instead of raising) when a denominator is zero.
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

def precision_recall_fscore_support(y_true, y_pred, beta=1.0, labels=None,
                                    pos_label=1, average=None,
                                    warn_for=('precision', 'recall',
                                              'f-score'),
                                    sample_weight=None):
    # beta of the F-beta score must be greater than 0
    if beta <= 0:
        raise ValueError("beta should be >0 in the F-beta score")
    # compute one 2x2 confusion matrix per class (one-vs-rest)
    samplewise = average == 'samples'
    MCM = multilabel_confusion_matrix(y_true, y_pred,
                                      sample_weight=sample_weight,
                                      labels=labels, samplewise=samplewise)
    tp_sum = MCM[:, 1, 1]
    pred_sum = tp_sum + MCM[:, 0, 1]
    true_sum = tp_sum + MCM[:, 1, 0]
    # for micro averaging, the per-class counts are summed before dividing:
    # tp_sum / pred_sum / true_sum each collapse from one value per class
    # to a single pooled value
    if average == 'micro':
        tp_sum = np.array([tp_sum.sum()])
        pred_sum = np.array([pred_sum.sum()])
        true_sum = np.array([true_sum.sum()])
    beta2 = beta ** 2  # with beta=1, beta2 is also 1, i.e. this computes F1
    # compute precision and recall
    precision = _prf_divide(tp_sum, pred_sum,
                            'precision', 'predicted', average, warn_for)
    recall = _prf_divide(tp_sum, true_sum,
                         'recall', 'true', average, warn_for)
    # compute the F-score
    denom = beta2 * precision + recall
    denom[denom == 0.] = 1  # avoid division by 0
    f_score = (1 + beta2) * precision * recall / denom
    # if a weighted average is requested, set the weights used below
    if average == 'weighted':
        weights = true_sum
        if weights.sum() == 0:
            return 0, 0, 0, None
    elif average == 'samples':
        weights = sample_weight
    else:
        weights = None
    # both macro and micro averaging reach this branch
    if average is not None:
        assert average != 'binary' or len(precision) == 1
        # average the per-class precisions into one multi-class precision
        precision = np.average(precision, weights=weights)
        # average the per-class recalls into one multi-class recall
        recall = np.average(recall, weights=weights)
        # average the per-class f_scores into one multi-class f_score
        f_score = np.average(f_score, weights=weights)
        true_sum = None  # return no support
    return precision, recall, f_score, true_sum
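For comparison, the same numbers can be obtained from the public scikit-learn API by passing average='micro' or average='macro'; a short usage sketch with the same made-up labels as in the earlier sketches:

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])
y_pred = np.array([1, 2, 3, 1, 3, 3, 2, 2, 3])

# average=None: one precision/recall/F1 value per class, plus the support
print(precision_recall_fscore_support(y_true, y_pred, average=None))
# micro: pool TP/FP/FN over all classes before dividing
print(precision_recall_fscore_support(y_true, y_pred, average='micro'))
# macro: unweighted mean of the per-class values
print(precision_recall_fscore_support(y_true, y_pred, average='macro'))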