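For my research, I'm starting this post to summarize the datasets I've come across while reading papers, so that I can easily find them again later.
Classification + Detection Datasets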
ImageNet
ImageNet needs little introduction. Here is the official description:
What is ImageNet?
ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, majority of them are nouns (80,000+). In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. In its completion, we hope ImageNet will offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy.
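In short: ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, whether expressed by one word or by several, is called a "synonym set" (synset). WordNet has more than 100,000 synsets, over 80,000 of which are nouns. ImageNet aims to provide about 1,000 images on average for each synset, all quality-controlled and human-annotated. Once complete, it is expected to offer tens of millions of cleanly sorted images.
The associated competition is ILSVRC (the ImageNet Large Scale Visual Recognition Challenge).
Classification Datasets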
MNIST
MNIST is one of Yann LeCun's well-known projects, used for recognizing handwritten digits. The official description:
The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.
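In short: MNIST is a database of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples, and it is a subset of the larger NIST collection. The digits are size-normalized and centered in fixed-size images.
It lets researchers try out learning techniques and pattern recognition methods on real data with minimal preprocessing effort.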
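Since the point of MNIST is that it needs almost no preprocessing, here is a minimal sketch of reading it with nothing but the standard library. It assumes the files have already been downloaded and decompressed, and that they use the IDX binary layout the dataset is distributed in (big-endian header, magic number 2051 for image files and 2049 for label files). The function names `parse_idx_images` and `parse_idx_labels` are my own, not part of any library, and the demo uses tiny synthetic byte strings rather than the real files.

```python
import struct


def parse_idx_images(raw: bytes):
    """Parse the bytes of an (uncompressed) IDX image file.

    Returns a list of images; each image is a list of rows,
    and each row is a list of pixel values in 0..255.
    """
    # Header: magic number, image count, rows, cols (big-endian uint32).
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"unexpected magic number {magic}, expected 2051")
    pixels = raw[16:]
    if len(pixels) != count * rows * cols:
        raise ValueError("pixel data length does not match the header")
    images = []
    for i in range(count):
        start = i * rows * cols
        image = [
            list(pixels[start + r * cols : start + (r + 1) * cols])
            for r in range(rows)
        ]
        images.append(image)
    return images


def parse_idx_labels(raw: bytes):
    """Parse the bytes of an (uncompressed) IDX label file into a list of ints."""
    # Header: magic number, label count (big-endian uint32).
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError(f"unexpected magic number {magic}, expected 2049")
    labels = list(raw[8:])
    if len(labels) != count:
        raise ValueError("label data length does not match the header")
    return labels


# Synthetic demo data: two 2x3 "images" and their two labels.
fake_image_file = struct.pack(">IIII", 2051, 2, 2, 3) + bytes(range(12))
fake_label_file = struct.pack(">II", 2049, 2) + bytes([7, 3])

demo_images = parse_idx_images(fake_image_file)
demo_labels = parse_idx_labels(fake_label_file)
```

For the real files you would read the bytes with something like `gzip.open(path, "rb").read()` first; the path is up to you.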
CIFAR
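CIFAR is a family of datasets maintained by the Department of Computer Science at the University of Toronto. The name stands for the Canadian Institute for Advanced Research. The images are all labeled by class and are used to measure the classification error rate of algorithms. Since it comes from Toronto, it is no surprise that Geoffrey Hinton is involved in maintaining it. CIFAR comes in two versions, CIFAR-10 and CIFAR-100, which differ in the number of classes: 10 versus 100.
CIFAR-10 consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
The dataset is divided into 5 training batches and 1 test batch, each with 10,000 images. The test batch contains exactly 1,000 randomly selected images from each class. The training batches contain the remaining images in random order, so a given training batch may contain more images from one class than from another. Taken together, the five training batches contain exactly 5,000 images from each class.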
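The batch arithmetic is easy to check with a small simulation. This is only a sketch of the split as described here, not the procedure actually used to build CIFAR-10: it tracks class labels instead of real images, and the function name `simulate_cifar10_split` and the seed are my own choices.

```python
import random
from collections import Counter

NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TEST_PER_CLASS = 1000
NUM_TRAIN_BATCHES = 5
BATCH_SIZE = 10000


def simulate_cifar10_split(seed=0):
    """Return (train_batches, test_batch) as lists of class labels."""
    rng = random.Random(seed)
    test_batch = []
    remaining = []
    for label in range(NUM_CLASSES):
        # Exactly 1,000 images of every class go to the test batch...
        test_batch += [label] * TEST_PER_CLASS
        # ...and the other 5,000 are left over for training.
        remaining += [label] * (IMAGES_PER_CLASS - TEST_PER_CLASS)
    # Training images are shuffled before being cut into batches,
    # so per-class counts inside one batch are not forced to be equal.
    rng.shuffle(remaining)
    train_batches = [
        remaining[i * BATCH_SIZE : (i + 1) * BATCH_SIZE]
        for i in range(NUM_TRAIN_BATCHES)
    ]
    return train_batches, test_batch


train_batches, test_batch = simulate_cifar10_split()
per_batch_counts = [Counter(batch) for batch in train_batches]
total_counts = sum(per_batch_counts, Counter())

for i, counts in enumerate(per_batch_counts, start=1):
    print(f"batch {i}: class 0 appears {counts[0]} times")
print("class 0 across all training batches:", total_counts[0])
```

Each individual batch will usually show a slightly different count for a given class, while the total over the five batches always comes out to 5,000 per class.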
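The classes are mutually exclusive, with no overlap. For example, there are two classes named automobile and truck. "Automobile" includes sedans, SUVs, and the like, while "truck" includes only big trucks. What about pickup trucks? Neither class includes them.
CIFAR-100 is much the same, except that it has 10 times as many classes and therefore a different number of images per class: 600 each instead of 6,000. I'll look up the details when I need them.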
YFCC100M
YFCC100M (Yahoo Flickr Creative Commons 100 Million) is Yahoo's image/video dataset for classification.
Detection Datasets
PASCAL VOC 2007/2012
The Visual Object Classes Challenge 2012 (VOC 2012) dataset, hosted at the University of Oxford, is used for recognizing objects. The official description:
The main goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem in that a training set of labelled images is provided. The twenty object classes that have been selected are:
Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
There are three main object recognition competitions: classification, detection, and segmentation, a competition on action classification, and a competition on large scale recognition run by ImageNet. In addition there is a "taster" competition on person layout.
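In short: the main goal of VOC 2012 is to recognize objects in realistic scenes. It is fundamentally a supervised learning problem, for which a labeled training set is provided. The 20 object classes are the ones listed above, in four groups: person (1 class), animal (6), vehicle (7), and indoor (6).
The challenge consists of the following competitions:
- Classification, detection, and segmentation (the three main object recognition competitions)
- Action classification
- Large scale recognition (run by ImageNet)
- An additional "taster" competition: person layout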
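For my own use, the 20 classes and their four groups are convenient to keep as a small lookup table. The sketch below copies the class names from the official description; the alphabetical index order and the helper name `group_of` are my own assumptions, not the ordering or API of the official development kit.

```python
# The four groups and 20 classes from the VOC 2012 description.
VOC_GROUPS = {
    "person": ["person"],
    "animal": ["bird", "cat", "cow", "dog", "horse", "sheep"],
    "vehicle": ["aeroplane", "bicycle", "boat", "bus", "car", "motorbike", "train"],
    "indoor": ["bottle", "chair", "dining table", "potted plant", "sofa", "tv/monitor"],
}

# Flat, alphabetically sorted class list (ordering is an assumption for
# illustration, not the official devkit ordering).
VOC_CLASSES = sorted(name for names in VOC_GROUPS.values() for name in names)

# Map each class name to an integer label.
CLASS_TO_INDEX = {name: index for index, name in enumerate(VOC_CLASSES)}


def group_of(class_name: str) -> str:
    """Return the group (person/animal/vehicle/indoor) of a class name."""
    for group, names in VOC_GROUPS.items():
        if class_name in names:
            return group
    raise KeyError(f"unknown VOC class: {class_name}")


print(len(VOC_CLASSES), "classes")
print("boat belongs to:", group_of("boat"))
```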
COCO
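COCO is a newer dataset for image recognition, segmentation, and captioning. Its images come with segmentation annotations already provided, so the question is how low your algorithm's segmentation error is. The associated competition is the COCO 2016 Detection and Keypoint Challenges.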
KITTI
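The KITTI Vision Benchmark Suite is for evaluating autonomous driving. Its images are all street scenes captured from a moving car in the city of Karlsruhe, and they are all labeled. Its road benchmark is fairly small, with only 289 training images.
Some of the road labels include: highway, minor road
Segmentation Datasets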
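Cityscapes Dataset
The Cityscapes dataset targets semantic understanding of urban street scenes (which, as far as I can tell, amounts to recognizing objects in city street scenes). Features: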
Type of annotations
- Semantic
- Instance-wise
- Dense pixel annotations
Complexity
- 30 classes
- See Class Definitions for a list of all classes and have a look at the applied labeling policy.
Diversity
- 50 cities
- Several months (spring, summer, fall)
- Daytime
- Good/medium weather conditions
- Manually selected frames
- Large number of dynamic objects
- Varying scene layout
- Varying background
Volume
- 5,000 annotated images with fine annotations
- 20,000 annotated images with coarse annotations