Introduction to Person Re-ID Datasets

海贼王
| Dataset | #cameras | train #identities | train #images | test #identities | query #images | gallery #images |
|---|---|---|---|---|---|---|
| market | 6 | 751 | 12,936 | 750 | 3,368 | 19,732 |
| duke | 8 | 702 | 16,522 | 702 | 2,228 | 17,661 (702 + 408 distractor IDs) |
| cuhk03-np (detected) | 2 | 767 | 7,365 | 700 | 1,400 | 5,332 |
| msmt17 | 15 | 1,041 | 32,621 | 3,060 | 11,659 | 82,161 |

The Market-1501 dataset is annotated using the following rules. For each detected bounding box to be annotated, we manually draw a ground truth bounding box that contains the pedestrian. Then, for the detected and hand-drawn bounding boxes, we calculate the ratio of the overlapping area to the union area. If the ratio is larger than 50%, the DPM bounding box is marked as "good"; if the ratio is smaller than 20%, the bounding box is marked as "distractor"; otherwise, it is marked as "junk", meaning that this image has zero influence on the re-identification accuracy.
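The labeling rule above can be sketched in a few lines of Python. This is only an illustration, not the annotation tool used for the dataset: the `(x1, y1, x2, y2)` box format and the function names `iou` and `label_detection` are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Ratio of overlapping area to union area for two boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_detection(detected, hand_drawn):
    """Apply the 50% / 20% thresholds described in the text."""
    ratio = iou(detected, hand_drawn)
    if ratio > 0.5:
        return "good"
    if ratio < 0.2:
        return "distractor"
    return "junk"


print(label_detection((0, 0, 10, 10), (5, 0, 15, 10)))
```

In the usage line the two boxes overlap on half of their width, so the ratio is 50 / 150 and the box falls between the two thresholds.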

Naming Rule of the bboxes
In bbox "0001_c1s1_001051_00.jpg", "c1" is the first camera (there are 6 cameras in total).

"s1" is sequence 1 of camera 1. Here, a sequence was defined automatically by the camera. We suppose that the camera cannot store a whole video that is quite large, so it splits the video into equally large sequences. Two sequences, namely "c1s1" and "c2s1", do not happen at exactly the same time. This is mainly because the starting times of the 6 cameras are not exactly the same (it takes time to turn them on). But "c1s1" and "c2s1" cover roughly the same time period.

"001051" is the 1051st frame in the sequence "c1s1". The frame rate is 25 frames per second.

As for the last two digits, remember that we use the DPM detector. For identity "0001", there may be multiple detected bounding boxes in the frame "c1s1_001051". In other words, a pedestrian in the image may have several bboxes produced by DPM. So "00" means that this bounding box is the first one among the several.
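As a rough illustration of the naming rule, the fields can be pulled out of a file name with a regular expression. The names `MARKET_NAME`, `MarketBox` and `parse_market_name` are made up for this sketch, and the optional minus sign in the identity field is an assumption so that negative IDs would not break the parser. The elapsed time is simply the frame index divided by the 25 fps frame rate.

```python
import re
from typing import NamedTuple

# identity _ c<camera> s<sequence> _ <frame> _ <box index> .jpg
# The optional "-" before the identity is an assumption of this sketch.
MARKET_NAME = re.compile(r"^(-?\d+)_c(\d)s(\d+)_(\d+)_(\d+)\.jpg$")


class MarketBox(NamedTuple):
    identity: int
    camera: int
    sequence: int
    frame: int
    box_index: int
    seconds_into_sequence: float


def parse_market_name(filename, fps=25.0):
    """Split a Market-1501 style file name into its fields."""
    match = MARKET_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    identity, camera, sequence, frame, box_index = (int(g) for g in match.groups())
    return MarketBox(identity, camera, sequence, frame, box_index, frame / fps)


print(parse_market_name("0001_c1s1_001051_00.jpg"))
```

For the example name, frame 1051 at 25 frames per second corresponds to about 42 seconds into sequence "c1s1".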

The package contains five folders.

  1. "bounding_box_test". There are 19,732 images in this folder used for testing.
  2. "bounding_box_train". There are 12,936 images in this folder used for training.
  3. "query". There are 750 identities. We randomly select one query image for each camera. So the maximum number of query images is 6 for an identity. In total, there are 3,368 query images in this folder.
  4. "gt_query". This folder contains the ground truth annotations. For each query, the relevant images are marked as "good" or "junk". "junk" has zero impact on search accuracy. "junk" images also include those taken by the same camera as the query.
  5. "gt_bbox". We also provide the hand-drawn bounding boxes. They are used to judge whether a DPM bounding box is good.
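To make "zero impact on search accuracy" concrete, here is a minimal sketch of a per-query average precision that skips junk images when walking down the ranked gallery. It computes plain average precision and is not the official evaluation script; `average_precision` and its arguments are hypothetical names chosen for the example.

```python
def average_precision(ranked_ids, good, junk):
    """Average precision for one query.

    ranked_ids: gallery image ids sorted from most to least similar.
    good / junk: ids marked "good" / "junk" for this query.
    """
    good, junk = set(good), set(junk)
    if not good:
        raise ValueError("query has no 'good' matches")
    hits = 0
    rank = 0  # rank counted over non-junk images only
    precision_sum = 0.0
    for image_id in ranked_ids:
        if image_id in junk:
            continue  # junk neither helps nor hurts the score
        rank += 1
        if image_id in good:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(good)


ranking = ["j1", "g1", "d1", "g2"]
print(average_precision(ranking, good={"g1", "g2"}, junk={"j1"}))
```

Because "j1" is skipped, "g1" counts as rank 1 and "g2" as rank 3, so the junk image at the top of the list does not lower the score.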

We have released the 500k bboxes as distractors; see the Market-1501+500k Dataset.


The original DukeMTMC dataset contains 85-minute high-resolution videos from 8 different cameras. Hand-drawn pedestrian bounding boxes are available.
We crop pedestrian images from the videos every 120 frames, yielding in total 36,411 bounding boxes with IDs. There are 1,404 identities appearing in at least two cameras and 408 identities (distractor IDs) who appear in only one camera. We randomly select 702 IDs as the training set and the remaining 702 IDs as the testing set. In the testing set, we pick one query image for each ID in each camera and put the remaining images in the gallery.
As a result, we get 16,522 training images of 702 identities, 2,228 query images of the other 702 identities and 17,661 gallery images (702 ID + 408 distractor ID).
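The test-set split described above can be sketched as follows. This is an assumption-laden illustration, not the script used to build the dataset: the input format (a list of `(identity, camera, filename)` tuples), the random choice of the query image, and the names `split_query_gallery` and `distractor_files` are all hypothetical.

```python
import random
from collections import defaultdict


def split_query_gallery(test_images, distractor_files=(), seed=0):
    """Pick one query per (identity, camera); everything else goes to the gallery.

    test_images: iterable of (identity, camera, filename) for the test identities.
    distractor_files: files of identities seen by only one camera (gallery only).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for identity, camera, filename in test_images:
        groups[(identity, camera)].append(filename)

    query, gallery = [], []
    for key in sorted(groups):
        files = sorted(groups[key])
        picked = rng.choice(files)  # random pick is an assumption of this sketch
        query.append(picked)
        gallery.extend(f for f in files if f != picked)
    gallery.extend(distractor_files)
    return query, gallery


images = [(1, 1, "a.jpg"), (1, 1, "b.jpg"), (1, 2, "c.jpg"), (2, 1, "d.jpg")]
print(split_query_gallery(images, distractor_files=["x.jpg"]))
```

The counts quoted above are consistent with such a split: 16,522 + 2,228 + 17,661 = 36,411 bounding boxes in total.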
Related Datasets

Naming Rule of the images
In bbox "0005_c2_f0046985.jpg", "0005" is the identity, "c2" means the image is from Camera 2, and "f0046985" is the 46985th frame in the video of Camera 2.
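A small sketch of how this naming rule could be parsed in Python. The pattern only covers the form shown in the example file name, and `DUKE_NAME` and `parse_duke_name` are hypothetical names.

```python
import re

# identity _ c<camera> _ f<frame> .jpg
DUKE_NAME = re.compile(r"^(\d+)_c(\d)_f(\d+)\.jpg$")


def parse_duke_name(filename):
    """Return identity, camera and frame index as integers."""
    match = DUKE_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    identity, camera, frame = (int(g) for g in match.groups())
    return {"identity": identity, "camera": camera, "frame": frame}


print(parse_duke_name("0005_c2_f0046985.jpg"))
```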

  • CUHK01-CUHK03 (Year: 2013-2014)
    CUHK01: few identities and few images. Naming example: 0006004.png, i.e., the fourth image of the sixth person.
    CUHK02:

The five pairs of camera views are P1-P5. Cam1 and Cam2 just indicate the two different cameras within a pair; they are not unique camera IDs.
P1: 971 identities
P2: 306 identities
P3: 107 identities
P4: 193 identities
P5: 239 identities
The first three digits of the image names are used to match identities between the two cameras in each setting.

    CUHK03:

The data is stored in the MATLAB MAT file "cuhk-03.mat". 1467 identities are collected from 5 different pairs of camera views. "cuhk-03.mat" contains three cells:
"detected" means the bounding boxes are estimated by a pedestrian detector.
"labeled" means the bounding boxes are labeled by humans.
"testsets" contains the testing protocols.

CUHK03-NP

| Split | Labeled | Detected |
|---|---|---|
| #Training | 7,368 | 7,365 |
| #Query | 1,400 | 1,400 |
| #Gallery | 5,328 | 5,332 |

To collect a large-scale person re-identification dataset, MSMT17, we utilize a 15-camera network deployed on campus. This camera network contains 12 outdoor cameras and 3 indoor cameras. We select 4 days with different weather conditions in a month for video collection. For each day, 3 hours of videos taken in the morning, noon, and afternoon, respectively, are selected for pedestrian detection and annotation. Our final raw video set contains 180 hours of videos, 12 outdoor cameras, 3 indoor cameras, and 12 time slots. Faster RCNN is utilized for pedestrian bounding box detection. Three labelers go through the detected bounding boxes and annotate ID labels for 2 months. Finally, 126,441 bounding boxes of 4,101 identities are annotated. Some statistics on MSMT17 are shown in the table above. Compared with existing datasets, we summarize the new features in MSMT17 into the following aspects:
(1) Larger number of identities, bounding boxes, and cameras.
(2) Complex scenes and backgrounds.
(3) Multiple time slots result in severe lighting changes.
(4) More reliable bounding box detector.
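The figures quoted for MSMT17 can be cross-checked with a little arithmetic. The sketch below assumes that the 3 hours per day apply to each of the 15 cameras, and it takes the split sizes from the msmt17 row of the table at the top of this article; the variable names are only for illustration.

```python
days, hours_per_day, cameras = 4, 3, 15
slots_per_day = 3  # morning, noon, afternoon

raw_video_hours = days * hours_per_day * cameras  # assumes 3 h per camera per day
time_slots = days * slots_per_day

# Split sizes from the msmt17 row of the statistics table.
train_imgs, query_imgs, gallery_imgs = 32_621, 11_659, 82_161
train_ids, test_ids = 1_041, 3_060

total_boxes = train_imgs + query_imgs + gallery_imgs
total_ids = train_ids + test_ids

print(raw_video_hours, time_slots, total_boxes, total_ids)
```

Under these assumptions the numbers agree with the text: 180 hours, 12 time slots, 126,441 bounding boxes and 4,101 identities.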


The PRID 2011 dataset consists of images extracted from multiple person trajectories recorded from two different, static surveillance cameras. Images from these cameras contain a viewpoint change and a stark difference in illumination, background and camera characteristics. Since images are extracted from trajectories, several different poses per person are available in each camera view. We have recorded 475 person trajectories from one view and 856 from the other one, with 245 persons appearing in both views. We have filtered out some heavily occluded persons, persons with fewer than five reliable images in each camera view, as well as corrupted images induced by tracking and annotation errors. This results in the following setup.

Camera view A shows 385 persons, camera view B shows 749 persons. The first 200 persons appear in both camera views, i.e., person 0001 of view A corresponds to person 0001 of view B, person 0002 of view A corresponds to person 0002 of view B, and so on. The remaining persons in each camera view (i.e., person 0201 to 0385 in view A and person 0201 to 0749 in view B) complete the gallery set of the corresponding view. Hence, a typical evaluation consists of searching for the first 200 persons of one camera view among all persons of the other view. This means that there are two possible evaluation procedures: either the probe set is drawn from view A and the gallery set is drawn from view B (A to B, used in our paper), or vice versa (B to A).
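The two evaluation procedures can be written down as probe and gallery ID lists. The sketch below only builds those lists from the person counts given above; the function name `evaluation_split` and the `(view, person_id)` tuple representation are assumptions for the example.

```python
def evaluation_split(direction="A_to_B", shared=200, persons_a=385, persons_b=749):
    """Return (probe, gallery) as lists of (view, person_id) pairs.

    The probe set is the first `shared` persons of one view; the gallery is
    every person of the other view.
    """
    if direction == "A_to_B":
        probe_view, gallery_view, gallery_size = "A", "B", persons_b
    elif direction == "B_to_A":
        probe_view, gallery_view, gallery_size = "B", "A", persons_a
    else:
        raise ValueError(f"unknown direction: {direction}")
    probe = [(probe_view, pid) for pid in range(1, shared + 1)]
    gallery = [(gallery_view, pid) for pid in range(1, gallery_size + 1)]
    return probe, gallery


probe, gallery = evaluation_split("A_to_B")
print(len(probe), len(gallery))
```

A probe `("A", k)` has its correct match at `("B", k)`, since the first 200 person IDs are shared between the two views.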


For more about Person Re-id datasets, please refer to Person Re-identification Datasets
