Visual Genome Python Driver README

A Python wrapper for the Visual Genome API. Visit the website for a complete list of object models and details about all endpoints. Look at our demo to see how you can use the Python driver to access all the Visual Genome data.

Installation

To install this wrapper, run pip from the root of the repository as follows:

pip install .

Two ways of accessing the data

There are two ways of accessing the Visual Genome data.

  1. Use the API functions to access the data directly from our server. You will not need to keep any local data available.
  2. Download all the data and use our local methods to parse and work with the Visual Genome data.
     You can download the data either from the Visual Genome website or by using the download scripts in the data directory.

The API Functions are listed below.

Get all Visual Genome image ids

All the data in Visual Genome must be accessed per image. Each image is identified by a unique id. So, the first step is to get the list of all image ids in the Visual Genome dataset.

> from visual_genome import api
> ids = api.get_all_image_ids()
> print(ids[0])
1

ids is a Python list of integers, where each integer is an image id.

Get a range of Visual Genome image ids

There are currently 108,249 images in the Visual Genome dataset. Instead of getting all the image ids, you might want to get the ids of just a few images. To get the ids of the images at indices 2000 to 2010 of the id list (zero-based, both ends included), you can use the following code:

> ids = api.get_image_ids_in_range(startIndex=2000, endIndex=2010)
> print(ids)
[2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011]
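
The output above suggests that startIndex and endIndex are zero-based positions in the full id list and that both ends are included: 11 ids come back, starting with 2001. A minimal local sketch of that behaviour, assuming ids are numbered from 1 with no gaps (which matches this range of the output, but is not guaranteed elsewhere). ids_in_range is a hypothetical helper, not part of the driver:

```python
# Hypothetical stand-in for the full id list: ids 1..3000 with no gaps.
all_ids = list(range(1, 3001))


def ids_in_range(ids, start_index, end_index):
    """Return the ids at positions start_index..end_index, both inclusive."""
    return ids[start_index:end_index + 1]


subset = ids_in_range(all_ids, 2000, 2010)
print(subset)
```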

Get image data

Now, let's get basic information about an image. Specifically, for an image id, we will extract the url of the image and its width and height (dimensions). We will also collect its COCO and Flickr ids from their respective datasets.

> image = api.get_image_data(id=61512)
> print(image)
id: 61512, coco_id: 248774, flickr_id: 6273011878, width: 1024, url: https://cs.stanford.edu/people/rak248/VG_100K/61512.jpg

get_image_data returns an Image model that you can read about in visual_genome/models.py.

Get Region Descriptions for an image

Now, let's get some exciting data: dense captions of an image. In Visual Genome, these are called region descriptions. Each region description is a textual description of a particular region in the image. A region is defined by its top left coordinates (x, y) and a width and height.

> # Let's get the regions for the image with id=61512
> regions = api.get_region_descriptions_of_image(id=61512)
> print(regions[0])
id: 1, x: 511, y: 241, width: 206, height: 320, phrase: A brown, sleek horse with a bridle, image: 61512

get_region_descriptions_of_image returns a list of Region objects, which are defined in visual_genome/models.py.
Check out our demo to see these regions get visualized.
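
Because a region is stored as a top-left corner plus a size, drawing or cropping code often needs the opposite corner as well. A small sketch of that conversion follows. Box is a hypothetical stand-in that only mirrors the fields printed above; it is not the driver's Region class:

```python
from collections import namedtuple

# Hypothetical stand-in with the fields shown in the printed region.
Box = namedtuple("Box", ["x", "y", "width", "height"])


def corners(box):
    """Return (left, top, right, bottom) for a top-left + size box."""
    return (box.x, box.y, box.x + box.width, box.y + box.height)


horse = Box(x=511, y=241, width=206, height=320)
print(corners(horse))  # (511, 241, 717, 561)
```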

Get Region Graph from Region

Let's get the region graph of the Region we printed out above. A region graph is a tiny scene graph for a particular region of an image. It contains objects, attributes and relationships. Objects are localized in the image with bounding boxes. Attributes modify an object, while relationships are interactions between pairs of objects. We will get the region graph of that region and print out its objects, attributes and relationships.

> # Remember that the region description is 'A brown, sleek horse with a bridle'.
> graph = api.get_region_graph_of_region(image_id=61512, region_id=1)
> print(graph.objects)
[horse]
>
>
> print(graph.attributes)
[horse is brown]
>
>
> print(graph.relationships)
[]

The region graph has one object, horse, and one attribute, brown, which describes the horse. It has no relationships.

Get Scene Graph for an image

Now, let's get the entire scene graph of an image. Each scene graph has three components: objects, attributes and relationships. Objects are localized in the image with bounding boxes. Attributes modify the object while Relationships are interactions between pairs of objects. We will get the scene graph of an image and print out the objects, attributes and relationships.

> # First, let's get the scene graph
> graph = api.get_scene_graph_of_image(id=61512)
> # Now let's print out the objects. We will only print out the names and not the bounding boxes to make it look clean.
> print(graph.objects)
[horse, grass, horse, bridle, truck, sign, gate, truck, tire, trough, window, door, building, halter, mane, mane, leaves, fence]
>
>
> # Now, let's print out the attributes
> print(graph.attributes)
[3015675: horse is brown, 3015676: horse is spotted, 3015677: horse is red, 3015678: horse is dark brown, 3015679: truck is red, 3015680: horse is brown, 3015681: truck is red, 3015682: sign is blue, 3015683: gate is red, 3015684: truck is white, 3015685: tire is blue, 3015686: gate is wooden, 3015687: horse is standing, 3015688: truck is red, 3015689: horse is brown and white, 3015690: building is tan, 3015691: halter is red, 3015692: horse is brown, 3015693: gate is wooden, 3015694: grass is grassy, 3015695: truck is red, 3015696: gate is orange, 3015697: halter is red, 3015698: tire is blue, 3015699: truck is white, 3015700: trough is white, 3015701: horse is brown and cream, 3015702: leaves is green, 3015703: grass is lush, 3015704: horse is enclosed, 3015705: horse is brown and white, 3015706: horse is chestnut, 3015707: gate is red, 3015708: leaves is green, 3015709: building is brick, 3015710: truck is large, 3015711: gate is red, 3015712: horse is chestnut colored, 3015713: fence is wooden]
>
>
> # Finally, let's print out the relationships
> print(graph.relationships)
[3199950: horse stands on top of grass, 3199951: horse is in grass, 3199952: horse is wearing bridle, 3199953: trough is for horse, 3199954: window is next to door, 3199955: building has door, 3199956: horse is nudging horse, 3199957: horse has mane, 3199958: horse has mane, 3199959: trough is for horse]
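
Each relationship printed above reads as a subject, a predicate and an object. A rough sketch of how such triples can be grouped by subject, using plain tuples copied from a few entries of the output above. The tuple layout is an assumption for illustration and is not the driver's Relationship model:

```python
from collections import defaultdict

# (subject, predicate, object) triples copied from the printed output.
triples = [
    ("horse", "stands on top of", "grass"),
    ("horse", "is wearing", "bridle"),
    ("trough", "is for", "horse"),
    ("window", "is next to", "door"),
    ("building", "has", "door"),
    ("horse", "has", "mane"),
]

# Group (predicate, object) pairs under their subject.
by_subject = defaultdict(list)
for subject, predicate, obj in triples:
    by_subject[subject].append((predicate, obj))

for predicate, obj in by_subject["horse"]:
    print("horse", predicate, obj)
```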

Get Question Answers for an image

Let's now get all the Question Answers for one image. Each Question Answer object contains the id of the question-answer pair, the id of the image, the question and the answer string, as well as the lists of question objects and answer objects identified and canonicalized in the QA pair. We will extract the QAs for image 61512 and show all attributes of one such QA.

> # First extract the QAs for this image
> qas = api.get_QA_of_image(id=61512)
>
> # Now print out some core information about the first QA
> print(qas[0])
id: 991154, image: 61512, question: What color is the keyboard?, answer: Black.
>
> # Now let's print out the question objects of the QA
> print(qas[0].q_objects)
[]

get_QA_of_image returns a list of QA objects, which are defined in visual_genome/models.py. The attributes q_objects and a_objects are both lists of QAObject, which is also defined there.

Get all Question Answers in the dataset

We also have a function that allows you to get all the 1.7 million QAs in the Visual Genome dataset. If you do not want to get all the data, you can also specify how many QAs you want the function to return using the parameter qtotal. So if qtotal = 10, you will get back 10 QAs.

> # Let's get only 10 QAs and print out the first QA.
> qas = api.get_all_QAs(qtotal=10)
> print(qas[0])
id: 133103, image: 1159944, question: What is tall with many windows?, answer: Buildings.

To get all the QAs, set qtotal to None.

Get one type of Question Answers from the entire dataset

You might be interested in collecting only why questions. To query for a particular type of question, set qtype to one of what, who, why, where, when or how.

> # Let's get the first 10 why QAs and print the first one.
> qas = api.get_QA_of_type(qtype='why', qtotal=10)
> print(qas[0])
id: 133089, image: 1159910, question: Why is the man cosplaying?, answer: For an event.
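
One way to picture the qtype filter is as a check on the first word of each question. The sketch below applies that idea to the three questions quoted above. It is an assumption about what a question type means, not a description of how the server classifies questions, and qa_type is a hypothetical helper:

```python
QUESTION_TYPES = ("what", "who", "why", "where", "when", "how")

# (question, answer) pairs quoted in the example outputs above.
qa_pairs = [
    ("What color is the keyboard?", "Black."),
    ("What is tall with many windows?", "Buildings."),
    ("Why is the man cosplaying?", "For an event."),
]


def qa_type(question):
    """Return the leading question word in lower case, or None."""
    words = question.split()
    first = words[0].lower() if words else ""
    return first if first in QUESTION_TYPES else None


why_pairs = [pair for pair in qa_pairs if qa_type(pair[0]) == "why"]
print(why_pairs)
```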

The local functions are listed below.

Downloading the data

> # Download all the image data.
> ./visual_genome/data/getImageData.sh
>
> # Download all the region descriptions.
> ./visual_genome/data/getRegionDescriptions.sh
>
> # Download all the question answers.
> ./visual_genome/data/getQuestionAnswers.sh

Get Scene Graphs for 200 images from local .json files

> import visual_genome.local as vg
> 
> # Convert full .json files to image-specific .jsons, save these to 'data/by-id'.
> # These files will take up a total ~1.1G space on disk.
> vg.save_scene_graphs_by_id(data_dir='data/', image_data_dir='data/by-id/')
> 
> # Load scene graphs in 'data/by-id', from index 0 to 200.
> # We'll only keep scene graphs with at least 1 relationship.
> scene_graphs = vg.get_scene_graphs(start_index=0, end_index=200, min_rels=1,
>                                    data_dir='data/', image_data_dir='data/by-id/')
> 
> print(len(scene_graphs))
149
> 
> print(scene_graphs[0].objects)
[clock, street, shade, man, sneakers, headlight, car, bike, bike, sign, building, ... , street, sidewalk, trees, car, work truck]
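
The min_rels argument is described above as keeping only scene graphs with at least that many relationships, which would explain why fewer graphs come back than the number of images requested. A minimal sketch of such a filter over plain dictionaries follows. The dictionary layout, the sample data and the filter_by_min_rels helper are all hypothetical and only illustrate the idea, not the driver's implementation:

```python
# Hypothetical stand-ins: each dict holds one image's relationships.
graphs = [
    {"image_id": 1, "relationships": ["man wears sneakers"]},
    {"image_id": 2, "relationships": []},
    {"image_id": 3, "relationships": ["car on street", "sign on building"]},
]


def filter_by_min_rels(scene_graphs, min_rels):
    """Keep graphs that have at least min_rels relationships."""
    return [g for g in scene_graphs if len(g["relationships"]) >= min_rels]


kept = filter_by_min_rels(graphs, min_rels=1)
print([g["image_id"] for g in kept])  # [1, 3]
```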

License

MIT License copyright Ranjay Krishna

Questions? Comments?

My hope is that the API and the Python wrapper are so easy to use that you never have to ask questions. But if you have any questions, you can contact me directly at ranjaykrishna at gmail or contact the project at stanfordvisualgenome @ gmail.

Want to Help?

If you'd like to help, you can write example code, contribute patches, document methods or tweet about it. Your help is always appreciated!
