Real 3D / Volumetric CNN for medical image classification

Author: Zongwei Zhou | 周纵苇
Weibo: @MrGiovanni
Email: zongweiz@asu.edu
Original post: http://zongwei.leanote.com/post/3D


Reviews

[1] Automatic Detection of Cerebral Microbleeds From MR Images via 3D Convolutional Neural Networks. paper

  • Application: Cerebral microbleeds (CMB) detection.
  • Dataset: SWI-CMB
  • Preprocessing: normalized the volume intensities to the range of [0,1].
  • Evaluation: sensitivity (S), precision (P) and the average number of false positives per subject ($FP_{avg}$).
  • System Implementation: Framework based on Theano library, using a GPU of NVIDIA GeForce GTX TITAN Z.
  • Method

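1. Screening strategy > conventional sliding window strategy. This stage is effectively a 3D fully convolutional network: it takes a 3D volume as input and outputs a 3D score map, which gives an initial set of candidate coordinates (regions of interest, ROIs). The set contains many false positives, but this is still far more efficient than sliding-window scanning. My question is that, judging from TABLE 1, the architecture does not look like a fully convolutional network; it looks more like an ordinary classification network. I do not know how the authors obtain the score map.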

THE ARCHITECTURE OF 3D FCN SCREENING MODEL

2. Discrimination stage removes a large number of false positive candidates. This is effectively a 3D CNN that classifies 3D patches. ReLU is used in the C and FC layers.
3D CNN architecture details: The 3D convolution kernels are randomly initialized from a Gaussian distribution (learning from scratch), the optimizer is SGD, and the loss function is cross-entropy loss. Dropout is also used. lr=0.03, momentum=0.9, dropout rate=0.3, batch size=100.

512 $\times$ 512 $\times$ 150 image $\longrightarrow$ 3D FCN $\longrightarrow$ 512 $\times$ 512 $\times$ 150 score map $\longrightarrow$ threshold ($\mathcal{T}$ = 0.64) $\longrightarrow$ 20 $\times$ 20 $\times$ 16 patch $\longrightarrow$ 3D CNN $\longrightarrow$ labeled.
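The middle of this pipeline (thresholding the score map and cropping candidate patches) can be sketched in NumPy. This is a minimal sketch, not the authors' code: the function name `extract_candidates`, the zero padding at the borders, and the choice to keep every voxel above the threshold (with no local-maximum filtering) are assumptions made for illustration. Only the threshold 0.64 and the 20 x 20 x 16 patch size come from the notes above.

```python
import numpy as np


def extract_candidates(score_map, volume, threshold=0.64, patch_shape=(20, 20, 16)):
    """Threshold a 3D score map and crop one patch around every surviving voxel.

    Hypothetical helper: keeps all voxels above the threshold as candidates.
    """
    half = [s // 2 for s in patch_shape]
    # Zero-pad so that patches centred near the border keep the full shape.
    pad = [(h, s - h) for h, s in zip(half, patch_shape)]
    padded = np.pad(volume, pad, mode="constant")

    centres = np.argwhere(score_map > threshold)
    patches = []
    for centre in centres:
        # A voxel at index c in the original sits at c + half in the padded
        # volume, so the patch centred on it starts at index c.
        window = tuple(slice(c, c + s) for c, s in zip(centre, patch_shape))
        patches.append(padded[window])

    if patches:
        return centres, np.stack(patches)
    return centres, np.empty((0, *patch_shape), dtype=volume.dtype)
```

Each returned patch would then be passed to the second-stage 3D CNN, which assigns the final label.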

  • Results and Conclusions:

1. 3D FCN better than these two methods - Barnes et al. and Chen et al.

COMPARISON OF DIFFERENT SCREENING METHODS

2. Good detection performance

EVALUATION OF DETECTION RESULTS

FROC COMPARISON

The methods compared against are Barnes et al., random forest, and 2D-CNN-SVM.

3. Better intermediate feature representation.

FEATURE REPRESENTATION

This comparison is quite novel; the tool used is the t-SNE toolbox.

[2] Multi-level Contextual 3D CNNs for False Positive Reduction in Pulmonary Nodule Detection. paper

  • Application: reduce false positive for pulmonary nodule detection in volumetric CT scans.
  • Dataset: LUNA16 challenge held in conjunction with ISBI 2016. In total, 0.65 million samples were extracted to train the 3D CNNs, to match their larger number of parameters.
  • Preprocessing: 1) Data augmentation - translated by 1 voxel along each axis and rotated by 90, 180 and 270 degrees within the transverse plane. In total, 0.65 million samples were generated for training. 2) Normalization - clipped the intensities into the interval (-1000,400) HU and normalized them to the range of (0,1).
  • 3D CNN architecture details: Learning from scratch, lr=0.3, decayed by 5% every 5000 iterations; batch size=200, momentum=0.9, and dropout (rate=0.2) is used in the C and FC layers.
  • Evaluation: FROC, Sensitivity
  • System Implementation: Framework based on Theano library, using a GPU of NVIDIA GeForce GTX TITAN Z.
  • Method

1. Multi-level contextual receptive field.

FUSION OF THREE 3D CNNs

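In essence, the method fuses the predictions of three different 3D CNNs, each trained on input patches of a different size; in other words, a "multi-scale" CNN. The theoretical advantage is that it uses both local detail and global context. We had considered this approach ourselves, and many researchers have done the same in 2D. A multi-scale method has to define the size of each "scale", so the authors ran a statistical analysis of the dataset, shown in the figure below.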
DISTRIBUTION ANALYSIS OF THE SIZES OF PULMONARY NODULES FOR DETERMINING RECEPTIVE FIELDS.

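This way of choosing the scales feels rather primitive and is hard to reuse in practice: it requires statistics over the dataset, and it is uncertain whether the chosen samples are statistically representative or whether the result still holds when new data arrive. The authors measure size in voxels; as a first step, I think this could be changed to an absolute scale (mm).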

2. Multi-model fusion
Next, consider how the three 3D networks are fused. Their architectures are given in the table:

THE ARCHITECTURE OF DIFFERENT RECEPTIVE FIELD 3D CNN

Fuse the softmax regression outputs (probabilities) from all networks. The fused posterior probability $P_{fusion}$ is estimated by weighted linear combination:
$$P_{fusion}=\sum_{i\in\{1,2,3\}}\gamma_i\cdot P_i$$
The constant weights $\gamma_i$ were determined using grid search on a small subset of the training data in the authors' experiments ($\gamma_1=0.3$, $\gamma_2=0.4$, $\gamma_3=0.3$).
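This fusion does not happen inside the networks; it is only a simple combination of the output probabilities, so it is "fusion" on the surface only. There are other ways to fuse, such as concatenating the fully connected layers of the three CNNs. The idea is to bring back propagation into the fusion process, which is the kind of fusion I find more convincing.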
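The weighted linear combination for $P_{fusion}$ is simple enough to write down directly. The sketch below is illustrative only: the function name `fuse_probabilities` and the array layout (one row per network) are assumptions, while the default weights are the $\gamma_i$ values quoted above.

```python
import numpy as np


def fuse_probabilities(probs, weights=(0.3, 0.4, 0.3)):
    """Weighted linear combination of per-network probabilities.

    probs: array-like of shape (n_networks, n_candidates), one row per 3D CNN.
    weights: the gamma_i coefficients, one per network.
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if probs.shape[0] != weights.shape[0]:
        raise ValueError("need exactly one weight per network")
    # Sum over the network axis: P_fusion = sum_i gamma_i * P_i
    return np.tensordot(weights, probs, axes=1)
```

Because the weights are applied after training, nothing is back-propagated through this step, which is the limitation discussed above.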

  • Evaluation Metrics

I find this part a useful reference:
The challenge evaluated detection results by measuring the detection sensitivity and average false positive rate per scan. A predicted candidate location was counted as a true positive if it was located within the radius of a true nodule center. (This definition of a true positive is critical when plotting the FROC curve.) Detections of irrelevant findings were ignored (i.e., considered as neither false positives nor true positives) in the evaluation. The challenge organizers performed the free-response receiver operating characteristic (FROC) analysis by setting different thresholds on the raw prediction probabilities submitted by the participating teams. The evaluation also computed the 95% confidence interval using bootstrapping [36]. A competition performance metric (CPM) score [37], calculated as the average sensitivity at seven predefined false positive rates (1/8, 1/4, 1/2, 1, 2, 4 and 8 false positives per scan), was produced for each algorithm. Ten-fold cross validation on the dataset was specified.
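To make the CPM definition concrete, here is a rough sketch of how it could be computed from a list of scored candidates. It is not the official LUNA16 evaluation script. The function names are hypothetical, and the sketch assumes that irrelevant findings have already been removed, that every true-positive candidate hits a different nodule, and that sensitivity at a given false positive rate is read off as a step function (the official script may interpolate differently).

```python
import numpy as np

# The seven operating points named in the challenge description.
CPM_FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def froc_points(scores, is_true_positive, n_nodules, n_scans):
    """Sweep the decision threshold from high to low and return FROC points."""
    scores = np.asarray(scores, dtype=float)
    is_tp = np.asarray(is_true_positive, dtype=bool)
    order = np.argsort(-scores)               # most confident candidates first
    true_pos = np.cumsum(is_tp[order])
    false_pos = np.cumsum(~is_tp[order])
    return false_pos / n_scans, true_pos / n_nodules


def cpm_score(fp_per_scan, sensitivity, fp_rates=CPM_FP_RATES):
    """Average sensitivity over the predefined false-positive rates."""
    values = []
    for rate in fp_rates:
        allowed = fp_per_scan <= rate
        # Best sensitivity reachable without exceeding this FP rate.
        values.append(sensitivity[allowed].max() if allowed.any() else 0.0)
    return float(np.mean(values))
```

For example, four candidates with scores 0.9, 0.8, 0.7, 0.6, of which the first and third are true positives, on one scan with two nodules, reach sensitivity 0.5 at the three rates below 1 and sensitivity 1.0 at the other four.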

  • Results and Conclusions:

1. 3D > 2D

3D vs 2D CNN detection

2. Fusion multi-level > single level

FROC ANALYSIS FOR DIFFERENT LEVEL

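At the end of the paper the authors show a visualization of the 3D convolution kernels. I am not sure what purpose it serves or what result it demonstrates.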
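As a side note on the preprocessing listed for paper [2], the intensity normalization and the rotation part of the augmentation can be sketched as follows. The function names and the (z, y, x) axis order are assumptions for illustration. The 1-voxel translations are left out because they would normally be done by re-cropping from the full scan rather than by shifting an existing patch.

```python
import numpy as np


def normalize_hu(volume, hu_min=-1000.0, hu_max=400.0):
    """Clip CT intensities to [hu_min, hu_max] HU and rescale to [0, 1]."""
    clipped = np.clip(np.asarray(volume, dtype=np.float32), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


def rotate_in_transverse_plane(patch, transverse_axes=(1, 2)):
    """Return the patch rotated by 0, 90, 180 and 270 degrees.

    Assumes a (z, y, x) layout, so the transverse plane is spanned by axes 1 and 2.
    """
    return [np.rot90(patch, k, axes=transverse_axes) for k in range(4)]
```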

[3] 3D Deeply Supervised Network for Automatic Liver Segmentation from CT Volumes. paper

To me this paper reads like a 3D HED (paper), or a 3D Fully Convolutional Network (paper). Compare their architectures:

3D DSN

HED

FCN

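All three combine the output maps of intermediate layers to make the final segmentation prediction. The questions this structure raised for me were how back propagation is designed, how the intermediate layers are combined, and how the combination weights are learned: are they also part of back propagation?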

  • Application: Liver segmentation.
  • Dataset: MICCAI-SLiver07 dataset. The dataset consists of 30 contrast-enhanced CT scans in total (20 training and 10 testing).
  • 3D DSN architecture details: The mainstream network consists of 11 layers: 6 convolutional layers, 2 max-pooling layers, 2 deconvolutional layers and 1 softmax layer. (One question here: I noticed that the kernel, stride, and pooling sizes differ from paper to paper by these authors. Are they chosen by intuition? A more dependable convolution size would normally be 3$\times$3$\times$3, as in VGG.) Learning from scratch, lr=0.1, divided by 10 every fifty epochs. The deep supervision balancing weights ($\eta_h$?) were initialized as 0.3 and 0.4, and decayed by 5% every ten epochs.
  • Evaluation: Volumetric overlap error (VOE[%]), relative volume difference (VD[%]), average symmetric surface distance (AvgD[mm]), root mean square symmetric surface distance (RMSD[mm]) and maximum symmetric surface distance (MaxD[mm]). Details of these metrics can be found in Comparison and Evaluation of Methods for Liver Segmentation From CT Datasets
  • System Implementation: Framework based on Theano library, using a GPU of NVIDIA GeForce GTX TITAN Z.
  • Method

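1. Vanishing gradients problem
The paper notes that vanishing gradients may be even more severe in 3D networks. The solution is to build the loss from the predictions of several intermediate layers:
$$\mathcal{L}=\mathcal{L}_{o}(\mathcal{X};W)+\sum_{h}\eta_h\cdot\mathcal{L}_{h}(\mathcal{X};W_h,w_h)+[regularization]$$
The weights $\eta_h$ control the importance of each hidden layer, which is meant to counter vanishing gradients in the earlier layers. I do not find this fully convincing: once gradients vanish they are tiny, effectively zero, so a very large weight would be needed to bring the values back up, and even then the underlying problem is not solved. In addition, ReLU seems to have been proposed to address exactly this problem, and I am not sure whether vanishing gradients still need to be considered in 3D when this activation is used.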
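The structure of this loss can be illustrated with a small forward-only sketch. It is not the paper's implementation: the function names, the (n_classes, D, H, W) array layout, and the L2 form of the regularization term are assumptions. The $\eta_h$ values 0.3 and 0.4 in the usage note are the initial weights quoted above.

```python
import numpy as np


def cross_entropy(probs, labels, eps=1e-7):
    """Mean voxel-wise cross entropy.

    probs: softmax output of shape (n_classes, D, H, W).
    labels: integer class map of shape (D, H, W).
    """
    # Pick, for every voxel, the predicted probability of its true class.
    p_true = np.take_along_axis(probs, labels[None], axis=0)[0]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))


def deeply_supervised_loss(main_probs, aux_probs, labels, etas,
                           params=(), weight_decay=0.0):
    """L = L_o + sum_h eta_h * L_h + regularization (assumed here to be L2)."""
    loss = cross_entropy(main_probs, labels)
    for eta, probs in zip(etas, aux_probs):
        loss += eta * cross_entropy(probs, labels)
    loss += weight_decay * sum(float(np.sum(p ** 2)) for p in params)
    return loss
```

A call such as `deeply_supervised_loss(main, [aux1, aux2], labels, etas=(0.3, 0.4))` adds the two auxiliary terms to the main one. Each auxiliary term is computed from a hidden layer's own upsampled prediction, so its gradient reaches that layer without passing through the deeper layers; the weight $\eta_h$ only scales that extra signal.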

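2. Conditional random field (CRF) model
This part takes real academic depth, and it is a major reason why I feel my undergraduate training falls short: under normal circumstances I would never have thought of using this model to refine the results. The paper gives it very little space, so further reading is needed. What I do know is that the authors introduce several parameters ($\mu_1$, $\mu_2$, $\theta_{\alpha}$, $\theta_{\beta}$, $\theta_{\gamma}$) to solve an energy function, again using grid search.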

  • Results and Conclusions:

1. 3D DSN > 3D CNN | CRF works well

EVALUATION

VISUALIZATION

2. Shorter runtime - 5s for 3D DSN and 87s for CRF.

COMPARISON WITH OTHER TEAM

As the table shows, the 3D network runs very quickly, whereas the conditional random field step is time-consuming.
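Two of the evaluation metrics listed for this paper, VOE and VD, depend only on voxel counts and are easy to sketch. The definitions used here are the usual ones, VOE = 100 (1 - |A∩B| / |A∪B|) and VD = 100 (|A| - |B|) / |B| with A the segmentation and B the reference; the function names and the handling of empty masks are assumptions. The surface-distance metrics (AvgD, RMSD, MaxD) need surface extraction and are not covered.

```python
import numpy as np


def voe_percent(seg, ref):
    """Volumetric overlap error in percent (0 means perfect overlap)."""
    seg = np.asarray(seg, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    union = np.logical_or(seg, ref).sum()
    if union == 0:
        return 0.0  # both masks empty: treated here as perfect overlap
    intersection = np.logical_and(seg, ref).sum()
    return 100.0 * (1.0 - intersection / union)


def vd_percent(seg, ref):
    """Signed relative volume difference in percent (negative = under-segmentation)."""
    seg_volume = np.count_nonzero(seg)
    ref_volume = np.count_nonzero(ref)
    if ref_volume == 0:
        raise ValueError("reference mask is empty")
    return 100.0 * (seg_volume - ref_volume) / ref_volume
```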

[4] 3D Fully Convolutional Networks for Intervertebral Disc Localization and Segmentation. paper

Algorithmically, this paper simply turns the 2D FCN into a 3D FCN with no other improvements, and applies it to an intervertebral disc segmentation dataset.

  • Application: Intervertebral disc (IVD) localization and segmentation in volumetric data.
  • Dataset: MICCAI 2015 Challenge on Automatic Intervertebral Disc Localization and Segmentation.
  • Preprocessing: subtracting the mean value before inputting into the network.
  • System Implementation: 3D FCN using the framework based on Theano library, using a GPU of NVIDIA GeForce GTX X. 2D FCN was implemented with Matlab and C++.
  • Comparison: 2D FCN - the input is the adjacent slices (3 slices input and the output is the binary mask of the middle slice).
  • Evaluation: For IVD localization - mean localization distance (MLD) with standard deviation (SD), successful detection rate $P$. For IVD segmentation - mean dice overlap coefficients (MD) with SD, mean average absolute distance (MAAD) with SD.
  • Results and Conclusions:

1. 3D FCN > 2D FCN

TEST1

TEST2

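Overall, the paper's argument is simple. The method is novel (2D$\longrightarrow$3D) but fairly routine, and the conclusion is simple too. Still, from my point of view it is well worth studying, because getting such a paper published is a real test of writing skill. For example, if I had to write the experimental results, it would be one sentence: 3D FCN performs better than 2D FCN both in IVD localization and segmentation. Done. :-)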

[5] VoxResNet: Deep Voxelwise Residual Networks for Volumetric Brain Segmentation. H Chen, Q Dou, L Yu, P Heng [CUHK] (2016). paper.

  • Propose a deep voxelwise residual network, referred to as VoxResNet (3D Residual Network).
  • An auto-context version of VoxResNet is also proposed.
The architecture of VoxResNet

auto-context

Comparison of VoxResNet, Auto-context VoxResNet and Ground truth

[6] Evaluation and comparison of 3D intervertebral disc localization and segmentation methods for 3D T2 MR data: A grand challenge. paper

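This journal article gives a fairly detailed account of intervertebral disc detection and segmentation [Review.4]. It also gave me a direct sense of the difference between conference and journal papers: a journal paper reads like a conference paper with every point expanded. Now that the CVPR, IPMI, and MICCAI submissions are done, we should start submitting to journals too, filling out the content of several conference papers into one substantial journal article. No time to read closely! The review ends here.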


Related works

[1] V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi [Johns Hopkins University]. paper.

  • Propose an approach to 3D image segmentation based on a volumetric, fully convolutional neural network (3D-FCN).
  • Introduce a novel objective function, optimised using the Dice coefficient. In this way we can deal with situations where there is a strong imbalance between the number of foreground and background voxels.
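A soft Dice objective of this kind can be sketched in a few lines. This is a forward-only NumPy illustration, not V-Net's code: the squared sums in the denominator follow my reading of the V-Net formulation, while the smoothing constant `eps` and the choice to express the loss as `1 - Dice` are assumptions.

```python
import numpy as np


def soft_dice(pred, target, eps=1e-7):
    """Soft Dice coefficient between predicted foreground probabilities and a binary mask."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(pred * target)
    denominator = np.sum(pred ** 2) + np.sum(target ** 2)
    return (2.0 * intersection + eps) / (denominator + eps)


def dice_loss(pred, target):
    """Loss to minimise: 0 for a perfect match, close to 1 for no overlap."""
    return 1.0 - soft_dice(pred, target)
```

The score depends only on the overlap with the foreground. If a volume has one foreground voxel among a thousand, an all-background prediction still gets a Dice score near 0, which is why no class re-weighting is needed.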
The architecture of V-Net



[2] 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox [University of Freiburg, Google Deepmind]. paper, code (Caffe).

  • NVIDIA TitanX GPU
2D U-Net Architecture

3D U-Net Architecture

[3] Deep MRI brain extraction: A 3D convolutional neural network for skull stripping. Jens Kleesiek, Gregor Urban, Alexander Hubert [Heidelberg University Hospital]. paper.

CNN architecture details

[4] Integrating Online and Offline 3D Deep Learning for Automated Polyp Detection in Colonoscopy Videos. Lequan Yu, Hao Chen, Qi Dou [CUHK] (2016). paper

Offline 3D FCN 1

Offline 3D FCN 2

Offline 3D FCN 3

Comparison
  • The authors compared three different CNN architectures. Frankly, this is of limited reference value, because the choice depends largely on experience and trial and error.

Discussions online

1. Are there any deep learning libraries that have 3D volumetric/spatial convolutions running on a CPU or a GPU?

A recent addition, but Keras now supports 3D convolution. It should work for voxels and video sequences.

2. 3D CNN in Keras - Action Recognition

3. Software: https://github.com/facebook/C3D


Separable 3D CNN

1. Reference papers

[1] Learning Separable Filters. Amos Sironi, Bugra Tekin, Roberto Rigamonti [EPFL] 2014. paper -- check Section 5.5.

2. Try on

Examine the separability of the kernels in the pre-trained CNNs, check http://www.mathworks.com/matlabcentral/fileexchange/28238-kernel-decomposition
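One way to run this check on a single 3D kernel is sketched below. It is a minimal sketch under stated assumptions, not the linked MATLAB tool: the function name `separability_scores` is hypothetical, and the idea is that a non-zero 3D kernel is fully separable (an outer product of three 1D filters) exactly when each of its mode unfoldings has rank 1. The score per axis is the fraction of the kernel's energy captured by the leading singular value of that unfolding.

```python
import numpy as np


def separability_scores(kernel):
    """Per-axis energy fraction of the leading singular value (1.0 = rank 1).

    Assumes a non-zero kernel; an all-zero kernel would divide by zero.
    """
    kernel = np.asarray(kernel, dtype=float)
    scores = []
    for axis in range(kernel.ndim):
        # Unfold the kernel into a matrix with the chosen axis as rows.
        unfolded = np.moveaxis(kernel, axis, 0).reshape(kernel.shape[axis], -1)
        singular_values = np.linalg.svd(unfolded, compute_uv=False)
        scores.append(float(singular_values[0] ** 2 / np.sum(singular_values ** 2)))
    return scores
```

Scores close to 1.0 on all three axes suggest that the kernel could be replaced by three 1D convolutions with little loss; clearly lower scores suggest it could not.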


Some Questions

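  • At the end of the paper the authors show a visualization of the 3D convolution kernels. I am not sure what purpose it serves or what result it demonstrates.
  • I noticed that the kernel, stride, and pooling sizes differ from paper to paper by these authors. Are they chosen by intuition?
  • For paper [3], how are the multi-layer fusion and the training of the per-layer weights $\eta_h$ implemented in code?
  • For paper [4], is the 3D FCN code open source? The paper concludes that 3D > 2D; were there other detailed improvements to the 3D FCN? In my own experiments the accuracy is about the same.
  • Which framework is the authors' team using now? Do they write their own code or use open-source code? How mature is open-source 3D support in Lasagne today?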

Best wishes!
