DeepSort Study Notes

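DeepSort is a classic multi-object tracking (MOT) algorithm, and probably the first one most newcomers to MOT study. There are plenty of tutorials online, but I would rather chew through the material myself than be spoon-fed. The DeepSort code is a textbook example of object-oriented design: elegant, concise, and well commented, so it rewards close reading. I am writing down what I learned so that I do not forget it later.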

Prerequisites

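DeepSort pulls together many algorithms. The main ones are listed below. Some of the formulas are hard to typeset, so I plan to write up my own understanding of them later, alongside the code.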

1. Kalman filter:
- (1) https://www.ncnynl.com/archives/201904/2954.html
- (2) http://shequ.dimianzhan.com/articles/337/extended-kalman-filter-course-from-shallow-to-deep
- (3) Parameter-tuning experiments: https://blog.csdn.net/honyniu/article/details/88698442
- (4) Illustrated Kalman filter: https://zhuanlan.zhihu.com/p/39912633
2. Mahalanobis distance: https://zhuanlan.zhihu.com/p/46626607
3. PCA (principal component analysis): https://www.imooc.com/article/29197
4. Hungarian algorithm: https://zhuanlan.zhihu.com/p/62981901
5. Person re-identification (ReID)
6. MOT evaluation metrics: https://zhuanlan.zhihu.com/p/35391826

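Main classes

1. Tracker: the tracker class that ties all the logic together. It manages a set of tracks and carries out the operations on detections and tracks by calling the Detection, NearestNeighborDistanceMetric, and Track classes.
2. Detection: represents a single detection.
3. NearestNeighborDistanceMetric: the distance-computation class.
4. Track: the track class. It handles state management, initialization, update, deletion, prediction, and so on for a single track.
5. KalmanFilter: the Kalman filter.
6. Extractor: the ReID network.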

Pipeline

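DeepSort takes the detector's predicted boxes as input and outputs the motion state $(u, v, \gamma, h, \dot{x}, \dot{y}, \dot{\gamma}, \dot{h})$ of each track, with tracks distinguished by their track id. The whole flow is driven by the predict and update methods of the Tracker class. A picture is worth a thousand words.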

[Figure: flowchart of the DeepSort pipeline]

1. Input

Convert the predicted boxes produced by the detector into Detection objects.

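2. tracker.predict

From the flowchart, tracker.predict does three main things:

1. Calls the Kalman filter's predict method on every track, predicting the current state and its uncertainty from the state at the previous time step. The mean $(u, v, \gamma, h, \dot{x}, \dot{y}, \dot{\gamma}, \dot{h})$ is predicted with the linear constant-velocity model (x = vt, so each position component advances by its velocity times the time step); the uncertainty is the covariance.
2. Increments the track's age. This attribute is not used afterwards.
3. Increments the track's time_since_update. This attribute matters: every call to track.predict adds 1 to the counter, and a successful track.update (the track matched a detection) resets it to 0. It therefore counts how many consecutive frames the track has gone without a matched detection, and it is used to decide whether the track has left the scene.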

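3. tracker.update

From the flowchart, tracker.update does six main things:

(1) Matching tracks to detections

This is the most important step in the DeepSort flow. The tracker stores all tracks, and the detections from each frame have to be matched against those tracks, which is an assignment problem.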

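Matching cascade (linear_assignment.matching_cascade): tracks with a smaller self.time_since_update are matched first, so the longer a track has been lost, the lower its priority. The DeepSort paper explains why: a track's uncertainty (covariance) grows with the number of frames it has gone unmatched, i.e. the larger self.time_since_update, the larger the uncertainty. When two tracks compete for the same detection, the track with the larger uncertainty gets the smaller Mahalanobis distance to the detection, so the detection tends to go to it. That breaks track continuity and makes tracking less stable, so recently matched tracks are given higher priority. The cascade loops over the tracks grouped by self.time_since_update and returns matched, unmatched_track, unmatched_detections.
- Participants: only confirmed tracks take part.
- Distance metrics: mainly the cosine distance between appearance features and the Mahalanobis distance.
- Cosine distance: measures the distance between appearance features. The feature vector that the ReID network produces for each detection is compared with the feature vectors stored for the track (the number stored is set by the budget parameter, default 100). Each detection-track pair therefore yields up to budget cosine distances, and the smallest one is taken as the cosine distance between that track and detection.
- Mahalanobis distance: filters out pairs whose appearance matches but whose locations in the image are far apart. The Mahalanobis distance from every detection to every track is computed, and pairs beyond the gating threshold are rejected as outliers. In the code the threshold is applied to the squared distance and is the 0.95 quantile of the chi-square distribution (9.4877 for the 4-dimensional measurement), not a fixed 3 standard deviations.
- Hungarian algorithm: the detections and tracks that satisfy both the cosine-distance and Mahalanobis-distance conditions are matched with the Hungarian algorithm.
IOU matching: returns matched, unmatched_track, unmatched_detections.
- Participants: tracks with self.time_since_update == 1 and unconfirmed tracks.
- Distance metric: IOU distance.
- IOU distance: the IOU between track and detection is computed, where the track's x, y, w, h come from the Kalman-predicted state; distance = 1 - IOU.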
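
The appearance cost, the motion gate, and the Hungarian assignment described above can be sketched in a few lines. This is my own minimal illustration, not the DeepSort source: the function names, the `GATED_COST` sentinel, and the `max_cosine=0.2` threshold are assumptions made for the example, and `squared_maha` is taken as an already-computed matrix of squared Mahalanobis distances. The assignment itself uses `scipy.optimize.linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

GATED_COST = 1e5  # illustrative "forbidden pair" cost, an assumption of this sketch


def min_cosine_distance(gallery, queries):
    """Smallest cosine distance from each query feature to any gallery feature."""
    gallery = np.asarray(gallery, dtype=float)
    queries = np.asarray(queries, dtype=float)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    distances = 1.0 - gallery @ queries.T  # shape: (n_gallery, n_queries)
    return distances.min(axis=0)


def match(track_galleries, det_features, squared_maha, max_cosine=0.2, gate=9.4877):
    """Return (track_index, detection_index) pairs that pass both tests."""
    # One row per track: nearest-neighbour cosine distance to every detection.
    cost = np.stack([min_cosine_distance(g, det_features) for g in track_galleries])
    # Motion gate: pairs that are too far apart can never be matched.
    cost[np.asarray(squared_maha) > gate] = GATED_COST
    # Clamp every inadmissible pair just above the appearance threshold.
    cost[cost > max_cosine] = max_cosine + 1e-5
    rows, cols = linear_sum_assignment(cost)  # Hungarian-style assignment
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_cosine]
```

Pairs that the assignment is forced to make but that exceed the threshold are dropped at the end, so a track can come out unmatched even when there are as many detections as tracks.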

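Two extra points to note:

A track can take part in the matching cascade only after its state becomes confirmed, and a tentative track becomes confirmed only after it has been matched by IOU matching in _n_init consecutive frames.
If a confirmed track misses a match once (track.update is not called), then from the next frame on its self.time_since_update is greater than 1 and it is excluded from IOU matching. So once a track has been lost for a frame, only the matching cascade can reactivate it.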
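
The IOU distance used in the IOU matching stage can be sketched as follows. This is a minimal illustration under my own assumptions: boxes are given as (x, y, w, h) with (x, y) the top-left corner, and the helper names `iou` and `iou_cost` are chosen for the example rather than taken from the source.

```python
import numpy as np


def iou(box, candidates):
    """IOU between one (x, y, w, h) box and an array of (x, y, w, h) boxes."""
    box = np.asarray(box, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    # Corners of the intersection rectangle.
    top_left = np.maximum(box[:2], candidates[:, :2])
    bottom_right = np.minimum(box[:2] + box[2:], candidates[:, :2] + candidates[:, 2:])
    # Negative extents mean no overlap, so clip them to zero.
    wh = np.maximum(0.0, bottom_right - top_left)
    intersection = wh.prod(axis=1)
    union = box[2:].prod() + candidates[:, 2:].prod(axis=1) - intersection
    return intersection / union


def iou_cost(track_boxes, det_boxes):
    """Cost matrix with entry (i, j) = 1 - IOU(track i, detection j)."""
    return np.stack([1.0 - iou(t, det_boxes) for t in track_boxes])
```

The resulting matrix can be fed to the same assignment step as the appearance cost, with a maximum IOU distance as the threshold.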

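(2) track.update

A track's mean holds its kinematic state $(u, v, \gamma, h, \dot{x}, \dot{y}, \dot{\gamma}, \dot{h})$, and its state holds its lifecycle status (deleted, tentative, confirmed). Five parts of the track are updated.

Tracks that are updated: the matched tracks.
Update the mean and the covariance: the Kalman filter's update step combines the value predicted by the Kalman predict step with the value observed in the detection to produce the new mean and covariance.
Update the features attribute: the new appearance feature vector is appended to the track's features, self.features.append(detection.feature).
Update self.hits += 1: the match count goes up by 1.
Update self.time_since_update = 0: zero consecutive frames without a match.
Update the track's state: if the track is tentative and has been matched _n_init times in a row (track.state == tentative and self.hits >= self._n_init), its state becomes confirmed.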

(3) track.mark_missed

Marks the qualifying tracks as deleted. There are two cases.

Participants: unmatched_track.
If the track's state is tentative, the state becomes deleted.
If the track's self.time_since_update > max_age, the state is marked as deleted.
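
The counters and state transitions described for track.predict, track.update, and track.mark_missed fit together as a small state machine. The class below is my own stripped-down sketch of that bookkeeping only (no Kalman filter, no features); the class names and the default values of `n_init` and `max_age` are illustrative assumptions.

```python
from enum import Enum


class State(Enum):
    TENTATIVE = 1
    CONFIRMED = 2
    DELETED = 3


class TrackLifecycle:
    """Bookkeeping of a single track, without any motion or appearance model."""

    def __init__(self, n_init=3, max_age=30):
        self.state = State.TENTATIVE
        self.hits = 1  # the detection that created the track counts as a hit
        self.age = 1
        self.time_since_update = 0
        self.n_init = n_init
        self.max_age = max_age

    def predict(self):
        # Called once per frame, before matching.
        self.age += 1
        self.time_since_update += 1

    def update(self):
        # Called when the track was matched to a detection.
        self.hits += 1
        self.time_since_update = 0
        if self.state is State.TENTATIVE and self.hits >= self.n_init:
            self.state = State.CONFIRMED

    def mark_missed(self):
        # Called when the track was left unmatched in this frame.
        if self.state is State.TENTATIVE:
            self.state = State.DELETED
        elif self.time_since_update > self.max_age:
            self.state = State.DELETED
```

A tentative track is deleted on its first miss, which is why its hits are necessarily consecutive, whereas a confirmed track survives up to max_age missed frames.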

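(4) _initiate_track

Initializes new tracks. There are two main parts:

Participants: unmatched_detections. A detection that was not matched is treated as a newly appeared track.
Kalman filter initialization: the filter works with the motion (state-transition) matrix self._motion_mat and the observation matrix self._update_mat, and the new track's initial mean and covariance are created from the detection's box.
A new track object is created and added to tracker.tracks.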

(5) Deleting tracks

Tracks whose state is deleted are removed from tracker.tracks.

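(6) self.metric.partial_fit

Updates the track appearance features that the matching cascade compares, by cosine distance, with the appearance features of new detections (at most self.budget per track, by default the latest 100).

Participants: confirmed tracks.
Update self.metric.samples: a dict whose keys are track ids and whose values are appearance feature vectors; only the most recent self.budget vectors are kept per track.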
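
The sample store behind partial_fit can be sketched as a dict with a per-track budget. This is a minimal illustration of the behaviour described above, not the library's class: `FeatureGallery` and its argument names are assumptions made for the example.

```python
class FeatureGallery:
    """Keeps the most recent appearance features per track id."""

    def __init__(self, budget=None):
        self.budget = budget  # None means "keep everything"
        self.samples = {}     # track id -> list of feature vectors

    def partial_fit(self, features, targets, active_targets):
        # Append each new feature to the gallery of its track.
        for feature, target in zip(features, targets):
            gallery = self.samples.setdefault(target, [])
            gallery.append(feature)
            if self.budget is not None:
                # Keep only the newest `budget` features.
                self.samples[target] = gallery[-self.budget:]
        # Drop the galleries of tracks that are no longer active.
        self.samples = {k: self.samples.get(k, []) for k in active_targets}
```

Trimming to the newest features keeps the cost of the nearest-neighbour search bounded and lets the gallery follow gradual appearance changes.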

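4. Output

The location and track id of every confirmed track. In the source code the Track object carries no class label, so the output tracks have none either: as far as DeepSort is concerned, different tracks are simply different objects, regardless of class.

Note:
Only confirmed tracks are output, and a track becomes confirmed only after it has been matched to a detection by the IOU matching strategy in _n_init consecutive frames.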

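The Kalman filter: theory alongside the code

The essence of the Kalman filter is the Kalman gain, which is really the weight in a weighted average. The more you trust your prediction (the smaller its uncertainty), the smaller the weight given to the observation; the less you trust your prediction (the larger its uncertainty), the larger the weight given to the observation. For example, when driving somewhere unfamiliar, the less confident you are, the more you rely on what you observe (road signs, asking for directions, and so on), whereas on familiar roads you rely more on your own intuition. The process noise and the measurement noise quantify the uncertainty of the model and of the measurements respectively.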
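
The weighted-average reading of the Kalman gain is easiest to see in one dimension. The function below is a minimal sketch for a scalar state observed directly; the name `fuse` and its arguments are my own choices for the illustration.

```python
def fuse(prediction, prediction_var, observation, observation_var):
    """Scalar Kalman correction: blend a prediction with an observation."""
    # The gain is the share of the total uncertainty that sits in the prediction.
    gain = prediction_var / (prediction_var + observation_var)
    # Weighted average: move from the prediction toward the observation.
    mean = prediction + gain * (observation - prediction)
    # The fused estimate is never less certain than the prediction.
    var = (1.0 - gain) * prediction_var
    return mean, var, gain


# Confident prediction: the observation barely moves the estimate.
print(fuse(10.0, 0.01, 12.0, 1.0))
# Uncertain prediction: the estimate lands almost on the observation.
print(fuse(10.0, 100.0, 12.0, 1.0))
```

With equal variances the gain is 0.5 and the result is the plain average of prediction and observation.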

1. Flow

[Figure: flowchart of the Kalman filter Prediction and Correction steps]

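Predict: apply the constant-velocity motion equation (x = vt) to go from $X_t$ to the predicted mean and covariance of $X_{t+1}$.
Project: use the observation equation to map the predicted state into measurement space, giving the mean and covariance of the predicted measurement. In DeepSort the state already contains the location, so for the mean this is just the matrix $H = [I\ 0]$, an identity block that selects the position components. In other settings the readings of several complex sensors have to be converted into the measurement we need.
Compute the Kalman gain.
The difference between the actual measurement and the predicted measurement is called the innovation (or residual) of the measurement; combine it with the Kalman gain to get the new mean.
Use the Kalman gain to update the covariance.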
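
Written out as equations, the five steps take the usual linear Kalman filter form. The symbols are my own labelling, chosen to line up with the code in these notes: $F$ stands for self._motion_mat, $Q$ for motion_cov, $H$ for self._update_mat, $R$ for innovation_cov, $z$ for the measurement, and a hat marks a predicted quantity.

$\hat{x}_{k+1} = F x_k, \quad \hat{P}_{k+1} = F P_k F^T + Q$
$\hat{z}_{k+1} = H \hat{x}_{k+1}, \quad S = H \hat{P}_{k+1} H^T + R$
$K = \hat{P}_{k+1} H^T S^{-1}$
$x_{k+1} = \hat{x}_{k+1} + K (z - \hat{z}_{k+1})$
$P_{k+1} = \hat{P}_{k+1} - K S K^T$

The last line is the form the update code uses: the predicted covariance minus the gain applied on both sides of the projected covariance $S$.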

2. Predict step

The returned mean and covariance correspond to $X_{k+1}$ and $P_{k+1}$ in the Prediction part of the flowchart.

def predict(self, mean, covariance):
    """Run Kalman filter prediction step.

    Parameters
    ----------
    mean : ndarray
        The 8 dimensional mean vector of the object state at the previous
        time step.
    covariance : ndarray
        The 8x8 dimensional covariance matrix of the object state at the
        previous time step.

    Returns
    -------
    (ndarray, ndarray)
        Returns the mean vector and covariance matrix of the predicted
        state. Unobserved velocities are initialized to 0 mean.

    """
    std_pos = [
        self._std_weight_position * mean[3],
        self._std_weight_position * mean[3],
        1e-2,
        self._std_weight_position * mean[3]]
    std_vel = [
        self._std_weight_velocity * mean[3],
        self._std_weight_velocity * mean[3],
        1e-5,
        self._std_weight_velocity * mean[3]]
    motion_cov = np.diag(np.square(np.r_[std_pos, std_vel]))
    
    # X_{k+1} in the flowchart
    mean = np.dot(self._motion_mat, mean)
    # P_{k+1} in the flowchart
    covariance = np.linalg.multi_dot((
        self._motion_mat, covariance, self._motion_mat.T)) + motion_cov

    return mean, covariance
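
To see what the predict step does to a concrete state, here is a small standalone sketch. It assumes a constant-velocity motion matrix with identity blocks and a time step dt = 1, which is how I understand self._motion_mat to be built; the numbers in `mean`, `covariance`, and `motion_cov` are made up for the illustration.

```python
import numpy as np

ndim, dt = 4, 1.0

# Constant-velocity model: each position component gains dt times its velocity.
motion_mat = np.eye(2 * ndim)
for i in range(ndim):
    motion_mat[i, ndim + i] = dt

# State layout: (u, v, aspect ratio, height) followed by their velocities.
mean = np.array([100.0, 50.0, 0.5, 80.0, 2.0, -1.0, 0.0, 0.5])
covariance = np.eye(2 * ndim)
motion_cov = 0.01 * np.eye(2 * ndim)  # illustrative process noise

predicted_mean = motion_mat @ mean
predicted_cov = motion_mat @ covariance @ motion_mat.T + motion_cov

print(predicted_mean[:4])      # positions shifted by one step of velocity
print(np.diag(predicted_cov))  # position variances grow more than velocity variances
```

The position variances pick up the velocity variances as well as the process noise, which is why a track that keeps being predicted without an update becomes steadily less certain.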

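3. Update step

(1) The project function

The project function maps the state into measurement space; $H$ in the Correction part of the flowchart is the observation (measurement) matrix, self._update_mat. The returned mean and covariance + innovation_cov are $Hx_k$ and $HP_kH^T+R$ respectively.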

def project(self, mean, covariance):
    """Project state distribution to measurement space.

    Parameters
    ----------
    mean : ndarray
        The state's mean vector (8 dimensional array).
    covariance : ndarray
        The state's covariance matrix (8x8 dimensional).

    Returns
    -------
    (ndarray, ndarray)
        Returns the projected mean and covariance matrix of the given state
        estimate.

    """
    std = [
        self._std_weight_position * mean[3],
        self._std_weight_position * mean[3],
        1e-1,
        self._std_weight_position * mean[3]]
    innovation_cov = np.diag(np.square(std))

    mean = np.dot(self._update_mat, mean)
    covariance = np.linalg.multi_dot((
        self._update_mat, covariance, self._update_mat.T))
    return mean, covariance + innovation_cov

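(2) The update function

A note on Cholesky decomposition: it avoids forming a matrix inverse explicitly, and it is used both for the Kalman gain and for the Mahalanobis distance. The covariance matrices $A$ in these notes are all real, symmetric, and positive definite, so $A$ can be factored with a lower-triangular matrix $L$, which turns the inverse into the solution of a triangular linear system:
$A = LL^T$
For the Mahalanobis distance:
$D(x,y)=\sqrt{(x-y)^TA^{-1}(x-y)} = \sqrt{[L^{-1}(x-y)]^T[L^{-1}(x-y)]}$
$z = L^{-1}(x-y) \Rightarrow Lz = x-y$

    # Lower-triangular Cholesky factor L, with covariance = L L^T
    cholesky_factor = np.linalg.cholesky(covariance)
    # x - y
    d = measurements - mean
    # Solve the triangular system L z = d^T for z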
    z = scipy.linalg.solve_triangular(
        cholesky_factor, d.T, lower=True, check_finite=False,
        overwrite_b=True)
    squared_maha = np.sum(z * z, axis=0)
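
A quick way to convince yourself that the Cholesky route gives the same number as the textbook formula is to compute both. The sketch below is my own check on random data; the function names are chosen for the illustration, and the covariance is made symmetric positive definite by construction.

```python
import numpy as np
import scipy.linalg


def squared_maha_cholesky(mean, covariance, measurements):
    """Squared Mahalanobis distances via a triangular solve."""
    d = measurements - mean
    chol = np.linalg.cholesky(covariance)  # lower-triangular L
    z = scipy.linalg.solve_triangular(chol, d.T, lower=True)
    return np.sum(z * z, axis=0)


def squared_maha_direct(mean, covariance, measurements):
    """Squared Mahalanobis distances via an explicit inverse."""
    d = measurements - mean
    inv = np.linalg.inv(covariance)
    return np.einsum("ni,ij,nj->n", d, inv, d)


rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
covariance = a @ a.T + 4.0 * np.eye(4)  # symmetric positive definite
mean = rng.normal(size=4)
measurements = rng.normal(size=(5, 4))

print(np.allclose(
    squared_maha_cholesky(mean, covariance, measurements),
    squared_maha_direct(mean, covariance, measurements)))
```

The triangular solve is cheaper and numerically better behaved than inverting the covariance, which is the reason for the detour through $L$.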

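For the Kalman gain:

Cholesky-factor $S = HPH^T + R$ (projected_cov) into a lower-triangular matrix:
chol_factor, lower = scipy.linalg.cho_factor(projected_cov, lower=True, check_finite=False)
Solve the linear system $SK^T = (PH^T)^T$, where $K^T$ is the transpose of the Kalman gain (this follows from $K = PH^TS^{-1}$ and the symmetry of $S$):
kalman_gain = scipy.linalg.cho_solve( (chol_factor, lower), np.dot(covariance, self._update_mat.T).T, check_finite=False).T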

def update(self, mean, covariance, measurement):
    """Run Kalman filter correction step.

    Parameters
    ----------
    mean : ndarray
        The predicted state's mean vector (8 dimensional).
    covariance : ndarray
        The state's covariance matrix (8x8 dimensional).
    measurement : ndarray
        The 4 dimensional measurement vector (x, y, a, h), where (x, y)
        is the center position, a the aspect ratio, and h the height of the
        bounding box.

    Returns
    -------
    (ndarray, ndarray)
        Returns the measurement-corrected state distribution.

    """
    # Project the state distribution from state space into measurement space
    projected_mean, projected_cov = self.project(mean, covariance)

    chol_factor, lower = scipy.linalg.cho_factor(
        projected_cov, lower=True, check_finite=False)
    kalman_gain = scipy.linalg.cho_solve(
        (chol_factor, lower), np.dot(covariance, self._update_mat.T).T,
        check_finite=False).T
    # Difference between the measurement and the projected prediction: the innovation (residual)
    innovation = measurement - projected_mean

    new_mean = mean + np.dot(innovation, kalman_gain.T)
    new_covariance = covariance - np.linalg.multi_dot((
        kalman_gain, projected_cov, kalman_gain.T))
    return new_mean, new_covariance
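
The correction step can be checked end to end on a toy case. The sketch below is my own standalone version, with the observation matrix and measurement noise passed in as arguments instead of being read from the filter object; `update_cholesky`, `gain_textbook`, and the numbers used are assumptions made for the illustration.

```python
import numpy as np
import scipy.linalg


def update_cholesky(mean, covariance, measurement, update_mat, measurement_cov):
    """Correction step with the gain obtained from a Cholesky solve."""
    projected_mean = update_mat @ mean
    projected_cov = update_mat @ covariance @ update_mat.T + measurement_cov
    chol_factor, lower = scipy.linalg.cho_factor(projected_cov, lower=True)
    # Solve S K^T = (P H^T)^T for K^T, then transpose back to K.
    gain = scipy.linalg.cho_solve(
        (chol_factor, lower), (covariance @ update_mat.T).T).T
    new_mean = mean + gain @ (measurement - projected_mean)
    new_cov = covariance - gain @ projected_cov @ gain.T
    return new_mean, new_cov, gain


def gain_textbook(covariance, update_mat, measurement_cov):
    """Reference gain K = P H^T S^{-1} using an explicit inverse."""
    projected_cov = update_mat @ covariance @ update_mat.T + measurement_cov
    return covariance @ update_mat.T @ np.linalg.inv(projected_cov)


H = np.eye(4, 8)      # picks the four position components of the state
P = 4.0 * np.eye(8)   # uncertain prediction
R = 1.0 * np.eye(4)   # more reliable measurement
mean = np.zeros(8)
z = np.array([10.0, 20.0, 0.5, 100.0])

new_mean, new_cov, K = update_cholesky(mean, P, z, H, R)
print(new_mean[:4])      # pulled 80% of the way toward the measurement
print(np.diag(new_cov))  # position variances shrink, velocity variances do not
```

Because the prediction here is four times less certain than the measurement, the gain on the position components is 4 / (4 + 1) = 0.8, matching the weighted-average intuition from the start of this section.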