This semester I am attending a postgraduate course on Deep Learning, and I want to write down my thoughts after each class.
Nowadays we have accumulated enormous amounts of data and entered the era of "big data". How to extract useful information from such large data sets is a crucial problem for many researchers, and we need new, efficient tools to process them. Many existing tools work well on purely numerical data, but not on images or videos. Deep learning is a very useful tool for extracting information from images: it ranks high in most open image-processing competitions, and it plays an important role in projects like Google Glass and Baidu Eye.
When we look at a picture, we can pick out meaningful parts at once, like a person or a tree. But when a computer reads a picture, all it obtains is a matrix of numbers and nothing else. How, then, can it recognize a person from such a matrix? Processing an image takes several steps: filtering out noise, preprocessing, feature extraction, and finally classification. Each step may correspond to one or two layers in a neural network, as we have seen in class. A deep learning system usually involves many such steps. For example, a person re-identification pipeline needs at least seven stages, starting from pedestrian detection, pose estimation, and so on. GoogLeNet even contains more than 20 layers. Deep learning makes such complex models feasible for these problems.
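To make this concrete for my own notes, here is a minimal Python sketch of the idea that an image is only a matrix of numbers, plus the first two pipeline steps. It assumes NumPy, Pillow, and SciPy are installed, and the file name "photo.jpg" is just a placeholder, not a real dataset.

```python
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

# What the computer actually gets from a picture: a matrix of pixel values.
pixels = np.asarray(Image.open("photo.jpg").convert("L"), dtype=np.float32)
print(pixels.shape)    # e.g. (480, 640): height x width
print(pixels[:2, :2])  # just numbers between 0 and 255, nothing "semantic"

# Step 1: filter noise (a simple Gaussian blur stands in for a real denoiser).
denoised = gaussian_filter(pixels, sigma=1.0)

# Step 2: preprocess, e.g. rescale to [0, 1] and flatten into a vector.
features = (denoised / 255.0).reshape(-1)
print(features.shape)

# Steps 3 and 4 (feature extraction and classification) are exactly the parts
# that a deep network replaces with learned layers.
```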
Even within a single step, feature extraction for example, we may need extra techniques to combine some features and discard useless ones. So deep learning is not easy to master, but it is powerful! Adding just one more layer can increase a model's capacity by a large amount. That is why deep learning can break through the accuracy bottleneck that other learning models run into, and as we can see, it performs well in real applications.
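To convince myself of the "one more layer" point, here is a tiny PyTorch sketch: two small fully connected networks that differ only by one extra hidden layer, compared by parameter count. The layer sizes (784, 256, 10) are arbitrary choices for the example, and counting parameters is only a rough proxy for capacity, but it gives a feel for how much one layer adds.

```python
import torch.nn as nn

# A small network: input -> hidden layer -> output.
shallow = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# The same network with one extra hidden layer inserted.
deeper = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),   # the single extra layer
    nn.Linear(256, 10),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(shallow), count(deeper))  # 203530 vs 269322 trainable parameters
```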