Introduction
- Learning without Forgetting is closely related to transfer learning and multi-task learning.
- Currently, there are three common approaches (Figs. 1, 2(b-d)) to learning θn while benefiting from the previously learned θs; they differ mostly in which parameters are kept fixed (see the sketch after this list):
- Feature Extraction: θs and θo are unchanged; only the new task parameters θn are trained.
- Fine-tuning: θs and θn are optimized for the new task, while θo is fixed. A low learning rate is typically used to prevent large drift in θs.
- Fine-tuning FC: only the top fully connected layers of θs (together with θn) are fine-tuned, while the remaining layers are frozen.
- Joint Training: all parameters θs, θo, θn are jointly optimized, which requires the training data of all tasks. Its performance can be seen as an upper bound on what the proposed method can achieve.
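These baselines differ only in which parameter groups receive gradient updates. Below is a minimal PyTorch sketch of how the optimizer could be configured for each one; the AlexNet backbone, head sizes, and learning rates are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Hypothetical split of a network into theta_s (shared layers), theta_o
# (old-task head), and theta_n (new-task head); sizes are illustrative.
backbone = models.alexnet(weights=None).features      # theta_s
old_head = nn.Linear(256 * 6 * 6, 1000)               # theta_o (old task)
new_head = nn.Linear(256 * 6 * 6, 10)                 # theta_n (new task)

def make_optimizer(mode):
    """Build an optimizer matching one of the baseline strategies."""
    if mode == "feature_extraction":
        # theta_s and theta_o stay frozen; only theta_n is trained.
        for p in backbone.parameters():
            p.requires_grad = False
        params = [{"params": new_head.parameters(), "lr": 1e-3}]
    elif mode == "fine_tuning":
        # theta_s is trained with a low learning rate to limit drift;
        # theta_o is left out of the optimizer, i.e. kept fixed.
        params = [
            {"params": backbone.parameters(), "lr": 1e-4},
            {"params": new_head.parameters(), "lr": 1e-3},
        ]
    else:  # "joint_training"
        # Every parameter group is optimized, which requires keeping
        # (and re-using) the training data of all tasks.
        params = [
            {"params": backbone.parameters(), "lr": 1e-3},
            {"params": old_head.parameters(), "lr": 1e-3},
            {"params": new_head.parameters(), "lr": 1e-3},
        ]
    return torch.optim.SGD(params, lr=1e-3, momentum=0.9)
```

Fine-tuning FC would follow the same pattern, freezing the convolutional part of the backbone and optimizing only the fully connected layers together with θn.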
Advantages of Learning without Forgetting:
- Classification performance is better than feature extraction, fine-tuning, and other recent methods.
- Computationally efficient: training is much faster than joint training and only slightly slower than fine-tuning.
- Simple to deploy: once a task has been learned, its training data does not need to be retained or re-applied to keep the adapted network performing well.
Related Work
- Multi-task learning
- Domain Adaptation: the same tasks but in different domains
- Transfer learning: different tasks (knowledge from one task is used to help learn another)
- The Learning without Forgetting approach can be seen as a combination of Distillation Networks and fine-tuning: the shared parameters are fine-tuned for the new task while a distillation loss keeps the old-task outputs close to those recorded before training (see the loss sketch after this list).
- Feature Extraction: the original network is not modified, and the new task can benefit from the complex features learned for previous tasks. However, these features are not specialized for the new task and can usually be improved by fine-tuning.
- Fine-tuning: a small learning rate is typically used, and sometimes part of the network is frozen to prevent overfitting.
- Multi-task learning: typically, the lower layers of the network are shared while the top layers are task-specific. Multi-task learning requires the data of all tasks to be available.
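To make the "distillation + fine-tuning" view above concrete, here is a minimal sketch of the combined loss that could be optimized on each mini-batch of new-task data. The function name and arguments are hypothetical; `recorded_old_logits` stands for the old network's responses on the new-task images, recorded before training starts, the temperature T = 2 follows the knowledge-distillation setting used in the paper, and the old-task weight λ is shown as a tunable assumption.

```python
import torch.nn.functional as F

def lwf_loss(new_logits, new_labels, old_logits, recorded_old_logits,
             temperature=2.0, lambda_old=1.0):
    """Combined objective: cross-entropy on the new task plus a
    knowledge-distillation term that keeps the old-task outputs close to
    the responses recorded with the original network."""
    # New-task loss: ordinary cross-entropy against the ground-truth labels.
    loss_new = F.cross_entropy(new_logits, new_labels)

    # Old-task loss: soften both current and recorded outputs with the
    # temperature, then penalize their cross-entropy-style mismatch.
    soft_targets = F.softmax(recorded_old_logits / temperature, dim=1)
    log_probs = F.log_softmax(old_logits / temperature, dim=1)
    loss_old = -(soft_targets * log_probs).sum(dim=1).mean()

    return loss_new + lambda_old * loss_old
```

Because the recorded responses act as the "ground truth" for the old tasks, no old-task training data has to be stored, which is what makes the deployment advantage listed above possible.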