Convolutional Neural Networks (CNNs) and
other deep networks have enabled unprecedented breakthroughs in a
variety of computer vision tasks, from image classification to
object detection and segmentation. Intuitively, all of these
tasks can be handled within a unified framework in which localization
information is expressed gradually as the layers go deeper. In
tackling this problem, we also explain why a model makes a prediction
and what it predicts.
In previous work, Zhou et al. proposed a technique called Class Activation
Mapping (CAM) for identifying the discriminative regions used by a
restricted class of image classification CNNs, namely those that replace
fully connected layers with global average pooling layers to preserve
localization information. Selvaraju et al. proposed a gradient-based method
called Gradient-weighted Class Activation Mapping (Grad-CAM) to improve on
CAM, which requires changing the base model and retraining it. Both methods
produce good visualizations of the most discriminative regions and can be
applied to localization tasks.
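
To make the relationship between the two methods concrete, the following is a minimal NumPy sketch, not the original authors' implementations. It assumes the last convolutional layer yields K feature maps of size H x W, that CAM weights come from the linear layer that follows global average pooling, and that Grad-CAM receives the gradient of the class score with respect to those feature maps (here supplied by the caller; in practice it would come from an automatic differentiation framework). The function names `cam` and `grad_cam` and the array layouts are illustrative assumptions.

```python
import numpy as np


def cam(feature_maps, class_weights):
    """Class activation map: weighted sum of feature maps.

    feature_maps:  array of shape (K, H, W), last conv layer activations.
    class_weights: array of shape (K,), weights of the linear layer that
                   follows global average pooling, for one class.
    Returns an (H, W) map.
    """
    return np.tensordot(class_weights, feature_maps, axes=1)


def grad_cam(feature_maps, gradients):
    """Gradient-weighted class activation map.

    gradients: array of shape (K, H, W), d(class score) / d(feature_maps).
    Each channel weight is the spatial mean of its gradient; the weighted
    sum of feature maps is then passed through a ReLU.
    Returns an (H, W) map with non-negative entries.
    """
    alphas = gradients.mean(axis=(1, 2))             # shape (K,)
    heatmap = np.tensordot(alphas, feature_maps, axes=1)
    return np.maximum(heatmap, 0.0)


# Toy check on a global-average-pooling + linear head, where the class
# score is sum_k w_k * mean(A_k). Its gradient w.r.t. A_k[i, j] is
# w_k / (H * W), so Grad-CAM reduces to a rescaled, rectified CAM.
rng = np.random.default_rng(0)
K, H, W = 4, 5, 5
A = rng.standard_normal((K, H, W))
w = rng.standard_normal(K)
grads = np.broadcast_to(w[:, None, None] / (H * W), (K, H, W))

cam_map = cam(A, w)
gcam_map = grad_cam(A, grads)
print(np.allclose(gcam_map, np.maximum(cam_map, 0.0) / (H * W)))
```

In this special case the two maps differ only by the ReLU and a constant factor, which illustrates why the gradient-based formulation can be applied to a network without replacing its fully connected layers or retraining it.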