A list of recent papers on incremental learning

The problem incremental learning addresses: catastrophic forgetting

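Parameter control

When new data arrives, these methods try to keep changes small for the weights (parameters) that were important to the old model. They differ in how they estimate which parameters are important.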

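EWC[1], estimates weight importance with the Fisher information matrix
SI[2], uses a path integral along the optimization trajectory
MAS[3], uses the gradients of the network output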
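
The shared idea can be sketched as a quadratic penalty that anchors each parameter to its old value, weighted by its estimated importance. The sketch below assumes an EWC-style diagonal Fisher estimate (mean of squared per-sample gradients); the names `importance_penalty` and `diagonal_fisher`, the strength `lam`, and the toy numbers are illustrative assumptions, not code from the papers.

```python
def diagonal_fisher(per_sample_grads):
    """Diagonal importance estimate: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def importance_penalty(params, old_params, importance, lam=1.0):
    """(lam / 2) * sum_i importance_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * sum(
        w * (p - p_old) ** 2
        for p, p_old, w in zip(params, old_params, importance)
    )


old_params = [1.0, -2.0]
toy_grads = [[0.5, 0.0], [1.5, 0.0]]  # hypothetical gradients on old data
fisher = diagonal_fisher(toy_grads)   # only the first parameter matters

# Moving the important parameter is penalized, moving the other one is free.
moved_important = importance_penalty([2.0, -2.0], old_params, fisher)
moved_unimportant = importance_penalty([1.0, 0.0], old_params, fisher)
```

This penalty would be added to the loss on the new data. SI and MAS keep the same form and only change how `importance` is computed.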

Knowledge distillation[4]

Transfers key knowledge from a teacher model to a student model

LwF[5]
LwF.MC[6], multi-class classification
$M^2$KD[7], introduces a multi-model and multi-level knowledge distillation strategy that uses snapshots of all previous models rather than only the most recent one.
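
As a minimal sketch of the distillation term these methods build on, the code below computes the cross-entropy between temperature-softened teacher and student outputs. The function names, the temperature of 2.0, and the logits are assumptions chosen for illustration; in incremental learning the teacher would be the model from the previous step.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


teacher = [2.0, 1.0, 0.1]  # hypothetical old-model logits
loss_same = distillation_loss(teacher, teacher)          # student agrees
loss_diff = distillation_loss([0.1, 1.0, 2.0], teacher)  # student disagrees
```

The loss is smallest when the student reproduces the teacher's distribution, which is what discourages the new model from drifting on old classes.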

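Memory

Keep some old data and use it as training data

Store a small amount of old data and replay it in the next incremental step
Use GANs to generate old data instead of storing it directly. The drawback is that a generative model has to be trained alongside the main model.[11]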
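
The first option can be sketched as a fixed-budget exemplar memory. The class below is a hypothetical illustration: it splits the budget evenly across the classes seen so far and picks exemplars at random, which is an assumption (published methods often use more careful selection).

```python
import random


class ExemplarMemory:
    """Fixed-size store of old samples, shared evenly across seen classes."""

    def __init__(self, budget, seed=0):
        self.budget = budget
        self.store = {}  # class label -> list of stored samples
        self.rng = random.Random(seed)

    def add_classes(self, data_by_class):
        """Add new classes, then shrink every class to its share of the budget."""
        for label, samples in data_by_class.items():
            self.store[label] = list(samples)
        per_class = self.budget // len(self.store)
        for label, samples in self.store.items():
            if len(samples) > per_class:
                self.store[label] = self.rng.sample(samples, per_class)

    def replay_set(self, new_data):
        """New (sample, label) pairs plus all stored exemplars."""
        old = [(x, label) for label, xs in self.store.items() for x in xs]
        return list(new_data) + old


memory = ExemplarMemory(budget=4)
memory.add_classes({"cat": [1, 2, 3, 4, 5], "dog": [6, 7, 8, 9]})
memory.add_classes({"bird": [10, 11, 12]})  # budget is now split three ways
train = memory.replay_set([(13, "fish"), (14, "fish")])
```

Note how the share per old class shrinks as classes accumulate while the new class arrives with all of its data.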

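Class imbalance

Even when a memory module stores a small amount of old data, the imbalance between old and new classes remains severe, and it is an important cause of catastrophic forgetting.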

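BiC[8], corrects the model's output by adding a bias correction layer; this layer needs a validation set for training
NCM[9], uses three components, cosine normalization, a less-forget constraint, and inter-class separation, to improve performance
IL2M[10], uses the contents of a dual memory to rectify the scores of old classes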
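
To make the bias-correction idea concrete, the sketch below assumes the correction is a two-parameter linear map applied only to new-class logits. The function name `bias_correct` and the values of `alpha` and `beta` are made up for illustration; in BiC they would be fitted on the validation set rather than set by hand.

```python
def bias_correct(logits, new_class_ids, alpha, beta):
    """Rescale new-class logits as alpha * z + beta; keep old-class logits."""
    new_ids = set(new_class_ids)
    return [
        alpha * z + beta if k in new_ids else z
        for k, z in enumerate(logits)
    ]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Classes 0 and 1 are old, classes 2 and 3 are new (toy numbers).
logits = [2.0, 1.8, 3.0, 2.9]
corrected = bias_correct(logits, new_class_ids=[2, 3], alpha=0.5, beta=0.1)

before = argmax(logits)     # a new class wins on the raw, biased logits
after = argmax(corrected)   # an old class wins after the correction
```

Only two scalars are learned, so a small balanced validation set is enough to fit them.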

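Structural methods

network pruning, dynamic expansion, parameter masking
PackNet, prunes the network to free up parameters for new tasks
HAT, learns attention masks for old tasks and uses them to constrain the parameters while learning new tasks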
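
Parameter masking can be illustrated with a gradient step that skips parameters reserved for old tasks. This is a simplified sketch with a hard binary mask and plain SGD, both assumptions; it is not the PackNet or HAT implementation (HAT in particular learns its attention masks rather than fixing them).

```python
def masked_sgd_step(params, grads, frozen_mask, lr=0.1):
    """SGD update that leaves parameters marked as frozen untouched."""
    return [
        p if frozen else p - lr * g
        for p, g, frozen in zip(params, grads, frozen_mask)
    ]


params = [0.5, -1.0, 0.0, 0.0]
frozen = [True, True, False, False]  # first two belong to an old task
grads = [1.0, 1.0, 1.0, -2.0]        # hypothetical new-task gradients

updated = masked_sgd_step(params, grads, frozen)
```

The old task's parameters are bit-for-bit unchanged, so its behavior is preserved as long as inference for that task uses only its own parameters.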

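The strategies above can be combined. iCaRL uses knowledge distillation, a memory module, and a nearest-neighbor classifier. EEIL uses knowledge distillation, a memory module, and balanced fine-tuning.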
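
As one illustration of how a memory module feeds a classifier, the sketch below classifies a feature vector by the nearest class mean computed from stored exemplars. It assumes Euclidean distance and toy 2-D features; the function names are hypothetical, and a real system would use features from the trained network.

```python
import math


def class_means(exemplar_features):
    """Mean feature vector per class, computed from stored exemplars."""
    means = {}
    for label, feats in exemplar_features.items():
        dim = len(feats[0])
        means[label] = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return means


def nearest_mean_predict(feature, means):
    """Label whose class mean is closest to the feature vector."""
    return min(means, key=lambda label: math.dist(feature, means[label]))


exemplars = {
    "old": [[0.0, 0.0], [0.0, 2.0]],
    "new": [[4.0, 4.0], [6.0, 4.0]],
}
means = class_means(exemplars)
pred_a = nearest_mean_predict([1.0, 1.0], means)
pred_b = nearest_mean_predict([4.0, 3.0], means)
```

Because the decision depends only on class means, it does not rely on the biased output layer that the imbalance between old and new classes distorts.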

References

[1] Overcoming catastrophic forgetting in neural networks
[2] Continual learning through synaptic intelligence
[3] Memory aware synapses: Learning what (not) to forget
[4] Distilling the knowledge in a neural network
[5] Learning without forgetting
[6] iCaRL: Incremental classifier and representation learning
[7] M2KD: Multi-model and multi-level knowledge distillation for incremental learning
[8] Large scale incremental learning
[9] Learning a unified classifier incrementally via rebalancing
[10] IL2M: Class incremental learning with dual memory
[11] Continual learning with deep generative replay