Stacked Capsule Autoencoders

By Adam R. Kosiorek, Sara Sabour, Yee Whye Teh, Geoffrey E. Hinton
Original paper: https://arxiv.org/abs/1906.06818

Abstract

An object can be seen as a geometrically organized set of interrelated parts. A system that makes explicit use of these geometric relationships to recognize objects should be naturally robust to changes in viewpoint, because the intrinsic geometric relationships are viewpoint-invariant.

We describe an unsupervised version of capsule networks, in which a neural encoder that looks at all of the parts is used to infer the presence and poses of object capsules. The encoder is trained by backpropagating through a decoder, which predicts the pose of each already discovered part using a mixture of pose predictions.
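To make "a mixture of pose predictions" concrete, here is a minimal sketch of how one part pose could be scored under predictions from several object capsules. The isotropic Gaussian components, the fixed standard deviation, and the use of presence-like weights are assumptions made for illustration; the abstract itself does not specify them, and all function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of an isotropic Gaussian evaluated at vector x."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    norm = (2.0 * math.pi * std ** 2) ** (-d / 2.0)
    return norm * math.exp(-sq_dist / (2.0 * std ** 2))

def part_pose_likelihood(part_pose, predictions, weights, std=0.1):
    """Likelihood of one part pose under a mixture of per-capsule predictions."""
    total = sum(weights)
    return sum(
        (w / total) * gaussian_pdf(part_pose, mu, std)
        for w, mu in zip(weights, predictions)
    )

# Two hypothetical object capsules each predict a 2-D pose (x, y) for one part.
observed = [0.50, 0.20]
predictions = [[0.52, 0.19], [-0.80, 0.90]]
weights = [0.7, 0.3]  # assumed stand-ins for capsule presence probabilities

likelihood = part_pose_likelihood(observed, predictions, weights)

# Posterior responsibility of each capsule for the observed part.
total_w = sum(weights)
joint = [(w / total_w) * gaussian_pdf(observed, mu, 0.1)
         for w, mu in zip(weights, predictions)]
responsibilities = [j / sum(joint) for j in joint]
```

In this toy setting the first capsule's prediction lies close to the observed pose, so it takes almost all of the responsibility; maximizing such a likelihood is one way a decoder of this kind could provide a training signal for the encoder.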

The parts are discovered directly from an image, in a similar manner, by using a neural encoder, which infers parts and their affine transformations. The corresponding decoder models each image pixel as a mixture of predictions made by affine-transformed parts.
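The pixel-level decoder described above can be sketched in a few lines. This is an illustration only: the nearest-neighbour warping, the Gaussian pixel noise, and the externally supplied mixing weights are assumptions, and the names `warp_template` and `pixel_likelihood` are hypothetical rather than taken from the paper.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def warp_template(template, inv_affine, out_size):
    """Render a template into an out_size x out_size image.

    Each output pixel (x, y) is mapped back into template coordinates with a
    2x3 inverse affine matrix and filled by nearest-neighbour lookup
    (0 where the template does not reach).
    """
    h, w = len(template), len(template[0])
    image = [[0.0] * out_size for _ in range(out_size)]
    for y in range(out_size):
        for x in range(out_size):
            u = inv_affine[0][0] * x + inv_affine[0][1] * y + inv_affine[0][2]
            v = inv_affine[1][0] * x + inv_affine[1][1] * y + inv_affine[1][2]
            ui, vi = round(u), round(v)
            if 0 <= ui < w and 0 <= vi < h:
                image[y][x] = template[vi][ui]
    return image

def pixel_likelihood(value, part_images, weights, x, y, std=0.5):
    """Mixture likelihood of one pixel value given the transformed parts."""
    total = sum(weights)
    return sum(
        (w / total) * normal_pdf(value, img[y][x], std)
        for w, img in zip(weights, part_images)
    )

square = [[1.0, 1.0], [1.0, 1.0]]                    # toy part template
shifted = [[1.0, 0.0, -2.0], [0.0, 1.0, -1.0]]       # inverse of "move by (2, 1)"
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

part_a = warp_template(square, shifted, out_size=4)
part_b = warp_template(square, identity, out_size=4)

# A bright pixel is more likely where some transformed part covers it.
p_covered = pixel_likelihood(1.0, [part_a, part_b], [0.5, 0.5], x=2, y=1)
p_empty = pixel_likelihood(1.0, [part_a, part_b], [0.5, 0.5], x=3, y=3)
```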

We learn object capsules and their part capsules on unlabeled data, and then cluster the vectors of presences of object capsules. When told the names of these clusters, we achieve state-of-the-art results for unsupervised classification on SVHN (55%) and near state-of-the-art on MNIST (98.5%).
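One simple way to read "told the names of these clusters" is sketched below: every cluster receives the most common label among its members, and accuracy is then measured against the true labels. The majority-vote rule and the toy data are assumptions for illustration; the abstract does not say how the clusters are formed or how names are assigned.

```python
from collections import Counter

def name_clusters(cluster_ids, labels):
    """Map each cluster id to the most common label among its members."""
    members = {}
    for c, y in zip(cluster_ids, labels):
        members.setdefault(c, []).append(y)
    return {c: Counter(ys).most_common(1)[0][0] for c, ys in members.items()}

def clustering_accuracy(cluster_ids, labels):
    """Fraction of examples whose cluster name matches their true label."""
    names = name_clusters(cluster_ids, labels)
    correct = sum(names[c] == y for c, y in zip(cluster_ids, labels))
    return correct / len(labels)

# Hypothetical cluster assignments of presence vectors and their true classes.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
labels = [7, 7, 1, 1, 1, 3, 3, 3]

names = name_clusters(cluster_ids, labels)
accuracy = clustering_accuracy(cluster_ids, labels)
```

Here cluster 0 is named 7, cluster 1 is named 1, and cluster 2 is named 3, so seven of the eight examples are classified correctly.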

1 Introduction

Convolutional neural networks (CNNs) work better than networks without weight-sharing because of their inductive bias: if a local feature is useful in one image location, the same feature is likely to be useful in other locations. It is tempting to exploit other effects of viewpoint changes by replicating features across scale, orientation and other affine degrees of freedom, but this quickly leads to cumbersome high-dimensional feature maps.

An alternative to replicating features across the non-translational degrees of freedom is to explicitly learn transformations between the natural coordinate frame of a whole object and the natural coordinate frames of each of its parts. Computer graphics relies on such object→part coordinate transformations to represent the geometry of an object in a viewpoint-invariant manner. Moreover, there is strong evidence that, unlike standard CNNs, human vision also relies on coordinate frames: imposing an unfamiliar coordinate frame on a familiar object makes it difficult to recognize the object or its geometry (Rock, 1973; Hinton, 1979).

A neural system can learn to reason about transformations between objects, their parts and the viewer, but each of these transformations is likely to require a different representation. An object-part-relationship (OP) is viewpoint-invariant and is naturally coded by learned weights. The relationship of an object or part to the viewer changes with the viewpoint (it is viewpoint-equivariant) and is naturally coded using neural activations. With this representation, the pose of a single object is represented by its relationship to the viewer. Consequently, representing a single object does not necessitate replicating neural activations across space, unlike in CNNs. It is only processing two (or more) different instances of the same type of object in parallel that requires spatial replicas of both model parameters and neural activations.

In this paper we propose the Stacked Capsule Autoencoder (SCAE), which has two stages (Fig. 1). The first stage, the Part Capsule Autoencoder (PCAE), segments an image into constituent parts, infers their poses, and reconstructs each image pixel as a mixture of the pixels of transformed part templates. The second stage, the Object Capsule Autoencoder (OCAE), tries to organize discovered parts and their poses into a smaller set of objects that can explain the part poses using a separate mixture of predictions for each part. Every object capsule contributes components to each of these mixtures by multiplying its pose, the object-viewer-relationship (OV), by the relevant object-part-relationship (OP).
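The multiplication of OV by OP can be illustrated with 3x3 homogeneous matrices. The sketch below assumes 2-D similarity transforms and made-up numbers; `affine` and `matmul` are hypothetical helpers, not code from the paper. It also checks the equivariance property: a change of viewpoint multiplies OV on the left, and the predicted part pose changes by exactly the same transformation while OP stays fixed.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def affine(angle, scale, tx, ty):
    """Homogeneous 2-D similarity transform: rotate and scale, then translate."""
    c, s = math.cos(angle) * scale, math.sin(angle) * scale
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

# OP: object-part relationship, viewpoint-invariant (would live in the weights).
op = affine(0.0, 0.5, 1.0, 0.0)
# OV: object-viewer relationship, viewpoint-dependent (would be an activation).
ov = affine(math.pi / 2, 1.0, 2.0, 3.0)

# Predicted part pose relative to the viewer.
part_pose = matmul(ov, op)

# Changing the viewpoint only changes OV; OP is reused unchanged.
view_change = affine(0.0, 2.0, -1.0, 0.0)
new_part_pose = matmul(matmul(view_change, ov), op)
expected = matmul(view_change, part_pose)
```

With these numbers the part ends up at translation (2, 4) in the viewer's frame, and `new_part_pose` agrees with `expected` up to floating-point error.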

Stacked Capsule Autoencoders (Section 2) capture spatial relationships between whole objects and their parts when trained on unlabeled data. The vectors of presence probabilities for the object capsules tend to form tight clusters, and when we assign a class to each cluster we achieve state-of-the-art results for unsupervised classification on SVHN (55%) and near state-of-the-art on MNIST (98.5%), which can be further improved to 67% and 99%, respectively, by learning fewer than 300 parameters. We also present promising proof-of-concept results on CIFAR10 (Section 3). We describe related work in Section 4 and then discuss implications of our work and future directions.

To be continued...
