Introduction
- Learning without Forgetting is closely related to transfer learning and multi-task learning.
- Currently, there are three common approaches (Figs. 1, 2(b-d)) to learning θn while benefiting from the previously learned θs; they differ mostly in which parameters are kept fixed (see the sketch after this list):
- Feature Extraction: θs and θo are unchanged; only the new task parameters θn are trained.
- Fine-tuning: θs and θn are optimized for the new task, while θo is fixed. A low learning rate is typically used to prevent large drift in θs.
- Fine-tuning FC: only the top fully connected layers of θs (together with θn) are fine-tuned, while the remaining layers are frozen.
- Joint Training: all parameters θs, θo, θn are jointly optimized, which requires the training data of all tasks. Its performance can be seen as an upper bound on what the proposed method can achieve.
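These baselines differ only in which parameter groups receive gradient updates. Below is a minimal PyTorch sketch of how the optimizer could be configured for each one; the AlexNet backbone, head sizes, and learning rates are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Hypothetical split of a network into theta_s (shared layers), theta_o
# (old-task head), and theta_n (new-task head); sizes are illustrative.
backbone = models.alexnet(weights=None).features      # theta_s
old_head = nn.Linear(256 * 6 * 6, 1000)               # theta_o (old task)
new_head = nn.Linear(256 * 6 * 6, 10)                 # theta_n (new task)

def make_optimizer(mode):
    """Build an optimizer matching one of the baseline strategies."""
    if mode == "feature_extraction":
        # theta_s and theta_o stay frozen; only theta_n is trained.
        for p in backbone.parameters():
            p.requires_grad = False
        params = [{"params": new_head.parameters(), "lr": 1e-3}]
    elif mode == "fine_tuning":
        # theta_s is trained with a low learning rate to limit drift;
        # theta_o is left out of the optimizer, i.e. kept fixed.
        params = [
            {"params": backbone.parameters(), "lr": 1e-4},
            {"params": new_head.parameters(), "lr": 1e-3},
        ]
    else:  # "joint_training"
        # Every parameter group is optimized, which requires keeping
        # (and re-using) the training data of all tasks.
        params = [
            {"params": backbone.parameters(), "lr": 1e-3},
            {"params": old_head.parameters(), "lr": 1e-3},
            {"params": new_head.parameters(), "lr": 1e-3},
        ]
    return torch.optim.SGD(params, lr=1e-3, momentum=0.9)
```

Fine-tuning FC would follow the same pattern, freezing the convolutional part of the backbone and optimizing only the fully connected layers together with θn.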
Advantages of Learning without Forgetting:
- Classification performance is better than feature extraction, fine-tuning, and other recent methods.
- Computationally efficient: training is much faster than joint training and only slightly slower than fine-tuning.
- Simple to deploy: once a task has been learned, its training data does not need to be retained or re-applied to keep the adapted network performing well.
Related Work
- Multi-task learning
- Domain Adaptation: the same tasks but in different domains
- Transfer learning: different tasks (knowledge from one task is used to help learn another)
- The Learning without Forgetting approach can be seen as a combination of Distillation Networks and fine-tuning: the shared parameters are fine-tuned for the new task while a distillation loss keeps the old-task outputs close to those recorded before training (see the loss sketch after this list).
- Feature Extraction: the original network is not modified, and the new task can benefit from the complex features learned for previous tasks. However, these features are not specialized for the new task and can usually be improved by fine-tuning.
- Fine-tuning: a small learning rate is typically used, and sometimes part of the network is frozen to prevent overfitting.
- Multi-task learning: typically, the lower layers of the network are shared while the top layers are task-specific. Multi-task learning requires the data of all tasks to be available.
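To make the "distillation + fine-tuning" view above concrete, here is a minimal sketch of the combined loss that could be optimized on each mini-batch of new-task data. The function name and arguments are hypothetical; `recorded_old_logits` stands for the old network's responses on the new-task images, recorded before training starts, the temperature T = 2 follows the knowledge-distillation setting used in the paper, and the old-task weight λ is shown as a tunable assumption.

```python
import torch.nn.functional as F

def lwf_loss(new_logits, new_labels, old_logits, recorded_old_logits,
             temperature=2.0, lambda_old=1.0):
    """Combined objective: cross-entropy on the new task plus a
    knowledge-distillation term that keeps the old-task outputs close to
    the responses recorded with the original network."""
    # New-task loss: ordinary cross-entropy against the ground-truth labels.
    loss_new = F.cross_entropy(new_logits, new_labels)

    # Old-task loss: soften both current and recorded outputs with the
    # temperature, then penalize their cross-entropy-style mismatch.
    soft_targets = F.softmax(recorded_old_logits / temperature, dim=1)
    log_probs = F.log_softmax(old_logits / temperature, dim=1)
    loss_old = -(soft_targets * log_probs).sum(dim=1).mean()

    return loss_new + lambda_old * loss_old
```

Because the recorded responses act as the "ground truth" for the old tasks, no old-task training data has to be stored, which is what makes the deployment advantage listed above possible.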