For binary classification
When we talk about precision, recall, or F1 for binary classification, they are by default computed with respect to the positive class.
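As a quick illustration, here is a minimal sketch that computes precision, recall, and F1 for the positive class by counting TP, FP, and FN directly (the y_true / y_pred arrays are made up for this example):

import numpy as np

# toy binary labels, 1 = positive class, 0 = negative class (made-up values)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))   # predicted positive, actually positive
fp = np.sum((y_pred == 1) & (y_true == 0))   # predicted positive, actually negative
fn = np.sum((y_pred == 0) & (y_true == 1))   # predicted negative, actually positive

precision = tp / (tp + fp)                   # TP / (TP + FP)
recall = tp / (tp + fn)                      # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)                 # 0.75 0.75 0.75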
For multi-class classification
Suppose there are three classes [1, 2, 3]. Then we have P1, R1, F1_1 for class 1, P2, R2, F1_2 for class 2, and P3, R3, F1_3 for class 3.
We know:
P1 = TP1/(TP1+FP1)
R1 = TP1/(TP1+FN1)
F1_1 = 2(P1*R1)/(P1+R1)
————————————————
P2 = TP2/(TP2+FP2)
R2 = TP2/(TP2+FN2)
F1_2 = 2(P2*R2)/(P2+R2)
————————————————
P3 = TP3/(TP3+FP3)
R3 = TP3/(TP3+FN3)
F1_3 = 2(P3*R3)/(P3+R3)
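To make the per-class quantities concrete, here is a minimal sketch (with made-up 3-class labels) that treats each class k as the positive class in turn (one-vs-rest), counts TPk, FPk, FNk, and applies the formulas above:

import numpy as np

# toy 3-class labels, values made up for illustration
y_true = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])
y_pred = np.array([1, 2, 3, 1, 3, 3, 2, 2, 3])

for k in [1, 2, 3]:
    tp = np.sum((y_pred == k) & (y_true == k))   # TPk
    fp = np.sum((y_pred == k) & (y_true != k))   # FPk
    fn = np.sum((y_pred != k) & (y_true == k))   # FNk
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * (p * r) / (p + r)
    print(f"class {k}: P={p:.3f} R={r:.3f} F1={f:.3f}")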
So how do we compute the overall P, R, and F?
There are two ways to do this: micro averaging and macro averaging.
micro
P=(TP1+TP2+TP3)/(TP1+FP1+TP2+FP2+TP3+FP3)
R=(TP1+TP2+TP3)/(TP1+FN1+TP2+FN2+TP3+FN3)
F1 = 2(P*R)/(P+R)
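A minimal sketch of the micro average, continuing the made-up labels from the sketch above: sum TP, FP, and FN over all classes first, then compute P, R, and F1 once from the pooled counts:

import numpy as np

y_true = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])   # same made-up labels as above
y_pred = np.array([1, 2, 3, 1, 3, 3, 2, 2, 3])
classes = [1, 2, 3]

tp = sum(np.sum((y_pred == k) & (y_true == k)) for k in classes)   # TP1+TP2+TP3
fp = sum(np.sum((y_pred == k) & (y_true != k)) for k in classes)   # FP1+FP2+FP3
fn = sum(np.sum((y_pred != k) & (y_true == k)) for k in classes)   # FN1+FN2+FN3

micro_p = tp / (tp + fp)
micro_r = tp / (tp + fn)
micro_f1 = 2 * (micro_p * micro_r) / (micro_p + micro_r)
# for single-label multi-class data, total FP equals total FN,
# so micro P, R, and F1 all coincide
print(micro_p, micro_r, micro_f1)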
macro
P=(P1+P2+P3)/3
R=(R1+R2+R3)/3
F1 =(F1_1+F1_2+F1_3)/3
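And a sketch of the macro average under the same assumptions: compute P, R, and F1 per class first, then take their unweighted mean:

import numpy as np

y_true = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])   # same made-up labels as above
y_pred = np.array([1, 2, 3, 1, 3, 3, 2, 2, 3])

ps, rs, f1s = [], [], []
for k in [1, 2, 3]:
    tp = np.sum((y_pred == k) & (y_true == k))
    fp = np.sum((y_pred == k) & (y_true != k))
    fn = np.sum((y_pred != k) & (y_true == k))
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    ps.append(p)
    rs.append(r)
    f1s.append(2 * (p * r) / (p + r))

macro_p = np.mean(ps)     # (P1+P2+P3)/3
macro_r = np.mean(rs)     # (R1+R2+R3)/3
macro_f1 = np.mean(f1s)   # (F1_1+F1_2+F1_3)/3
print(macro_p, macro_r, macro_f1)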
scikit-learn source code (annotated excerpt)
# Arguments:
#   y_true: ground-truth labels
#   y_pred: predicted labels
#   beta=1.0: with the default beta, the F-beta score is the F1 score
# Note: _prf_divide is a private sklearn helper that divides element-wise
# and warns (instead of raising) when a denominator is zero.
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

def precision_recall_fscore_support(y_true, y_pred, beta=1.0, labels=None,
                                    pos_label=1, average=None,
                                    warn_for=('precision', 'recall',
                                              'f-score'),
                                    sample_weight=None):
    # beta of the F-beta score must be greater than 0
    if beta <= 0:
        raise ValueError("beta should be >0 in the F-beta score")
    # compute one 2x2 confusion matrix per class (one-vs-rest)
    samplewise = average == 'samples'
    MCM = multilabel_confusion_matrix(y_true, y_pred,
                                      sample_weight=sample_weight,
                                      labels=labels, samplewise=samplewise)
    tp_sum = MCM[:, 1, 1]
    pred_sum = tp_sum + MCM[:, 0, 1]
    true_sum = tp_sum + MCM[:, 1, 0]
    # for micro averaging, the per-class counts are summed before dividing:
    # tp_sum / pred_sum / true_sum each collapse from one value per class
    # to a single pooled value
    if average == 'micro':
        tp_sum = np.array([tp_sum.sum()])
        pred_sum = np.array([pred_sum.sum()])
        true_sum = np.array([true_sum.sum()])
    beta2 = beta ** 2  # with beta=1, beta2 is also 1, i.e. this computes F1
    # compute precision and recall
    precision = _prf_divide(tp_sum, pred_sum,
                            'precision', 'predicted', average, warn_for)
    recall = _prf_divide(tp_sum, true_sum,
                         'recall', 'true', average, warn_for)
    # compute the F-score
    denom = beta2 * precision + recall
    denom[denom == 0.] = 1  # avoid division by 0
    f_score = (1 + beta2) * precision * recall / denom
    # if a weighted average is requested, set the weights used below
    if average == 'weighted':
        weights = true_sum
        if weights.sum() == 0:
            return 0, 0, 0, None
    elif average == 'samples':
        weights = sample_weight
    else:
        weights = None
    # both macro and micro averaging reach this branch
    if average is not None:
        assert average != 'binary' or len(precision) == 1
        # average the per-class precisions into one multi-class precision
        precision = np.average(precision, weights=weights)
        # average the per-class recalls into one multi-class recall
        recall = np.average(recall, weights=weights)
        # average the per-class f_scores into one multi-class f_score
        f_score = np.average(f_score, weights=weights)
        true_sum = None  # return no support
    return precision, recall, f_score, true_sum
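For comparison, the same numbers can be obtained from the public scikit-learn API by passing average='micro' or average='macro'; a short usage sketch with the same made-up labels as in the earlier sketches:

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])
y_pred = np.array([1, 2, 3, 1, 3, 3, 2, 2, 3])

# average=None: one precision/recall/F1 value per class, plus the support
print(precision_recall_fscore_support(y_true, y_pred, average=None))
# micro: pool TP/FP/FN over all classes before dividing
print(precision_recall_fscore_support(y_true, y_pred, average='micro'))
# macro: unweighted mean of the per-class values
print(precision_recall_fscore_support(y_true, y_pred, average='macro'))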