Author: Zongwei Zhou | 周纵苇
Weibo: @MrGiovanni
Email: zongweiz@asu.edu
Original post: http://zongwei.leanote.com/post/3D
Reviews
[1] Automatic Detection of Cerebral Microbleeds From MR Images via 3D Convolutional Neural Networks. paper
- Application: Cerebral microbleeds (CMB) detection.
- Dataset: SWI-CMB
- Preprocessing: normalized the volume intensities to the range of [0,1].
- Evaluation: sensitivity (S), precision (P) and the average number of false positives per subject ($FP_{avg}$).
- System Implementation: Framework based on the Theano library, using an NVIDIA GeForce GTX TITAN Z GPU.
- Method
1. Screening strategy > conventional sliding window strategy. This stage is essentially a 3D fully convolutional network: the 3D volume goes in and a 3D score map comes out, giving an initial set of candidate locations (regions of interest, ROIs). The candidates include many false positives, but the screening is still far more efficient than exhaustive scanning. One puzzle: judging from TABLE 1, the architecture does not look like a fully convolutional network at all; it looks more like an ordinary classification network, so it is unclear how the authors obtain the score map.
2. Discrimination stage removes a large number of false positive candidates. This is essentially a 3D CNN that classifies 3D patches. ReLU is used in the convolutional (C) and fully connected (FC) layers.
3D CNN architecture details: The 3D convolution kernels are randomly initialized from a Gaussian distribution (learning from scratch), the optimizer is SGD, and the loss function is cross-entropy. Dropout is also used. lr=0.03, momentum=0.9, dropout rate=0.3, batch size=100.
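As a rough illustration of this discrimination-stage setup (a minimal sketch, not the authors' Theano implementation: the layer sizes are invented for the 20 $\times$ 20 $\times$ 16 patches, and only the stated hyperparameters come from the paper):

```python
# Hypothetical 3D patch classifier; only lr=0.03, momentum=0.9,
# dropout rate=0.3, batch size=100 and cross-entropy come from the paper.
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(20, 20, 16, 1)),              # one candidate 3D patch
    layers.Conv3D(32, (3, 3, 3), activation='relu'),
    layers.MaxPooling3D((2, 2, 2)),
    layers.Conv3D(64, (3, 3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),                               # dropout rate from the paper
    layers.Dense(2, activation='softmax'),             # CMB vs. non-CMB
])
model.compile(optimizer=optimizers.SGD(learning_rate=0.03, momentum=0.9),
              loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(patches, labels, batch_size=100, ...)
```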
512 $\times$ 512 $\times$ 150 image $\longrightarrow$ 3D FCN $\longrightarrow$ 512 $\times$ 512 $\times$ 150 score map $\longrightarrow$ threshold ($\mathcal{T}$ = 0.64) $\longrightarrow$ 20 $\times$ 20 $\times$ 16 patch $\longrightarrow$ 3D CNN $\longrightarrow$ labeled.
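A sketch of the screening-then-discrimination flow above; the threshold $\mathcal{T}$ = 0.64 and the 20 $\times$ 20 $\times$ 16 patch size are from the paper, while the connected-component candidate extraction is my assumption about how centres could be read off the score map:

```python
import numpy as np
from scipy import ndimage

def extract_candidates(volume, score_map, threshold=0.64, patch_size=(20, 20, 16)):
    """Threshold the 3D FCN score map, take the centroid of each connected
    component as a candidate centre, and crop a fixed-size patch from the volume."""
    mask = score_map > threshold
    labeled, n = ndimage.label(mask)
    if n == 0:
        return np.empty((0,) + patch_size)
    centers = ndimage.center_of_mass(mask, labeled, range(1, n + 1))
    patches = []
    for c in centers:
        c = [int(round(x)) for x in c]
        lo = [max(ci - s // 2, 0) for ci, s in zip(c, patch_size)]
        hi = [min(l + s, d) for l, s, d in zip(lo, patch_size, volume.shape)]
        lo = [h - s for h, s in zip(hi, patch_size)]   # keep the full patch inside the volume
        patches.append(volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]])
    return np.asarray(patches)

# candidates = extract_candidates(volume, score_map)   # stage 1: screening
# probs = model.predict(candidates[..., None])         # stage 2: 3D CNN discrimination
```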
- Results and Conclusions:
1. The 3D FCN screening outperforms the two baseline candidate-detection methods of Barnes et al. and Chen et al.
2. Good detection performance.
The comparison baselines are Barnes et al., a random forest, and 2D-CNN-SVM.
3. The intermediate feature representations are more discriminative.
This comparison is quite novel; the tool used is the t-SNE toolbox.
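For reference, a similar look at intermediate feature quality can be produced with scikit-learn's t-SNE (the paper used the t-SNE toolbox; the feature matrix below is a placeholder for activations extracted from an intermediate layer):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.rand(500, 128)        # placeholder: intermediate-layer activations
labels = np.random.randint(0, 2, 500)      # placeholder: CMB vs. non-CMB

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap='coolwarm', s=8)
plt.title('t-SNE of intermediate CNN features')
plt.show()
```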
[2] Multi-level Contextual 3D CNNs for False Positive Reduction in Pulmonary Nodule Detection. paper
- Application: reduce false positive for pulmonary nodule detection in volumetric CT scans.
- Dataset: LUNA16 challenge held in conjunction with ISBI 2016. In total, 0.65 million samples were extracted to train the 3D CNNs, to match their larger parameter scale.
- Preprocessing: 1) Data augmentation - translated by 1 voxel along each axis and rotated by 90, 180 and 270 degrees within the transverse plane, yielding the 0.65 million training samples. 2) Normalization - clipped the intensities to the interval (-1000, 400) HU and normalized them to the range (0, 1).
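A minimal sketch of this preprocessing, assuming volumes are stored as (z, y, x) NumPy arrays; the HU window and the rotation/translation scheme come from the paper, while how the two are combined is my guess:

```python
import numpy as np

def normalize_hu(volume, hu_min=-1000.0, hu_max=400.0):
    """Clip to the (-1000, 400) HU window and rescale to (0, 1)."""
    v = np.clip(volume.astype(np.float32), hu_min, hu_max)
    return (v - hu_min) / (hu_max - hu_min)

def augment(patch):
    """Rotations of 90/180/270 degrees in the transverse plane plus
    1-voxel translations along each axis (applied here to every rotation)."""
    out = []
    for k in range(4):                               # 0, 90, 180, 270 degrees
        rot = np.rot90(patch, k=k, axes=(1, 2))      # transverse plane = (y, x)
        out.append(rot)
        for axis in range(3):
            out.append(np.roll(rot, 1, axis=axis))   # shift by 1 voxel
    return out
```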
- 3D CNN architecture details: learning from scratch, lr=0.3 decayed by 5% every 5000 iterations, batch size=200, momentum=0.9, and dropout (rate=0.2) applied in the C and FC layers.
- Evaluation: FROC, sensitivity.
- System Implementation: Framework based on the Theano library, using an NVIDIA GeForce GTX TITAN Z GPU.
- Method
1. Multi-level contextual receptive field.
In essence, this fuses the predictions of three different 3D CNNs trained on input patches of different sizes, i.e. a "multi-scale" CNN. The theoretical advantage is that it exploits both local detail and more global context. We had considered this idea before, and many researchers have done it in 2D. For a multi-scale approach, the "scales" must be defined, so the authors ran a statistical analysis of nodule sizes on the dataset, as shown in the figure below.
This way of choosing the scales feels rather primitive and has limited value as a general recipe: it requires a statistical analysis of the dataset, and it is unclear whether the chosen samples are statistically representative or whether the scales would still hold for new data. The authors specify the scales in voxels; for a start, I think absolute physical units (mm) would be better.
2. Multi-model fusion
Next, the fusion of the three 3D networks; the three architectures are listed in the table.
Fuse the softmax regression outputs (probabilities) from all networks. The fused posterior probability $P_{fusion}$ is estimated by a weighted linear combination:
$$P_{fusion}=\sum_{i\in\{1,2,3\}}\gamma_i\cdot P_i$$
The constant weights $\gamma_i$ were determined using grid search on a small subset of the training data ($\gamma_1=0.3$, $\gamma_2=0.4$, $\gamma_3=0.3$). This fusion does not actually happen inside the networks; it is just a simple combination of the output probabilities, a superficial kind of "fusion". There are richer options, such as concatenating the fully connected layers of the three CNNs, the idea being to put the back-propagation mechanism inside the fusion itself - that is the kind of fusion I find more convincing.
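A sketch of this probability-level fusion; the reported weights are from the paper, while the grid-search criterion (AUC on a small held-out subset) is my assumption since the paper does not state it:

```python
import numpy as np
from itertools import product
from sklearn.metrics import roc_auc_score

def fuse(prob_list, weights):
    """Weighted linear combination of the softmax nodule probabilities
    from the three receptive-field-specific 3D CNNs."""
    return sum(w * p for w, p in zip(weights, prob_list))

def grid_search_weights(prob_list, y_true, step=0.1):
    """Pick (gamma_1, gamma_2, gamma_3) summing to 1 on a small validation subset."""
    best_w, best_score = None, -np.inf
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    for w1, w2 in product(grid, grid):
        w3 = round(1.0 - w1 - w2, 10)
        if w3 < 0:
            continue
        score = roc_auc_score(y_true, fuse(prob_list, (w1, w2, w3)))
        if score > best_score:
            best_w, best_score = (w1, w2, w3), score
    return best_w

# P_fusion = fuse([P1, P2, P3], (0.3, 0.4, 0.3))   # weights reported in the paper
```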
- Evaluation Metrics
I find this part quite useful as a reference:
The challenge evaluated detection results by measuring the detection sensitivity and the average false positive rate per scan. A predicted candidate location was counted as a true positive if it was located within the radius of a true nodule center (this definition of a true positive is crucial for drawing the FROC curve). Detections of irrelevant findings were ignored (i.e., considered as neither false positives nor true positives) in the evaluation. The challenge organizers performed the free-response receiver operating characteristic (FROC) analysis by setting different thresholds on the raw prediction probabilities submitted by the participating teams. The evaluation also computed the 95% confidence interval using bootstrapping [36]. A competition performance metric (CPM) score [37], calculated as the average sensitivity at seven predefined false positive rates (1/8, 1/4, 1/2, 1, 2, 4 and 8 false positives per scan), was produced for each algorithm. Ten-fold cross validation on the dataset was specified.
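Since this protocol is the part worth reusing, here is a simplified sketch of the CPM computation (it assumes each true nodule is hit by at most one candidate; the official evaluation handles duplicates and the bootstrap confidence intervals more carefully):

```python
import numpy as np

def cpm(scores, is_hit, n_nodules, n_scans,
        fp_rates=(1/8, 1/4, 1/2, 1, 2, 4, 8)):
    """Competition Performance Metric: average sensitivity at the seven
    predefined false-positive-per-scan operating points.
    scores : confidence of each candidate
    is_hit : 1 if the candidate lies within the radius of a true nodule centre
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_hit)[order]
    sensitivity = np.cumsum(hits) / float(n_nodules)
    fps_per_scan = np.cumsum(1 - hits) / float(n_scans)
    sens_at = []
    for rate in fp_rates:
        below = fps_per_scan <= rate
        sens_at.append(sensitivity[below][-1] if below.any() else 0.0)
    return float(np.mean(sens_at))
```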
- Results and Conclusions:
1. 3D > 2D
2. Fusion multi-level > single level
At the end of the paper the authors show visualizations of the 3D convolution kernels; I am not sure what purpose this serves or what conclusion it supports.
[3] 3D Deeply Supervised Network for Automatic Liver Segmentation from CT Volumes. paper
My impression is that this paper is essentially a 3D version of HED (paper), or a 3D fully convolutional network (paper); compare their architectures:
Both combine the output maps of intermediate layers to make the final segmentation prediction. My questions about this structure at the time were: how is back propagation designed, how are the intermediate layers combined, how are the combination weights learned, and are they also updated by back propagation?
- Application: Liver segmentation.
- Dataset: MICCAI-SLiver07 dataset. The dataset totally consists of 30 contrast-enhanced CT scans (20 training and 10 testing).
- 3D DSN architecture details: The mainstream network consists of 11 layers: 6 convolutional layers, 2 max-pooling layers, 2 deconvolutional layers and 1 softmax layer. (One question here: I notice that the kernel, stride and pooling sizes differ from paper to paper by these authors. Are they chosen by feel? A more dependable choice of convolution size would be 3$\times$3$\times$3, as in VGG.) Learning from scratch, lr=0.1 divided by 10 every fifty epochs. The deep supervision balancing weights ($\eta_h$?) were initialized as 0.3 and 0.4, and decayed by 5% every ten epochs.
- Evaluation: Volumetric overlap error (VOE[%]), relative volume difference (VD[%]), average symmetric surface distance (AvgD[mm]), root mean square symmetric surface distance (RMSD[mm]) and maximum symmetric surface distance (MaxD[mm]). Details of these metrics can be found in Comparison and Evaluation of Methods for Liver Segmentation From CT Datasets.
- System Implementation: Framework based on the Theano library, using an NVIDIA GeForce GTX TITAN Z GPU.
- Method
1. Vanishing gradients problem
The paper notes that vanishing gradients can be even more severe in 3D networks. The proposed remedy is to build the loss from the prediction outputs of several intermediate layers:
$$\mathcal{L}=\mathcal{L}_{o}(\mathcal{X};W)+\sum_{h}\eta_h\cdot\mathcal{L}_{h}(\mathcal{X};W_h,w_h)+\text{[regularization]}$$
The weights $\eta_h$ control the importance of each hidden-layer branch, which is supposed to counter vanishing gradients in the early layers. Personally I find this argument not entirely convincing: once a gradient has vanished it is essentially zero, so it would take an enormous weight to pull its magnitude back up, and even then the vanishing problem is not fundamentally solved. Besides, ReLU was introduced precisely to address this issue, so I am not sure whether vanishing gradients still need special treatment when ReLU is used in 3D networks.
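To make the mechanism concrete, a minimal sketch of how such a combined loss could be assembled (my own illustration in TensorFlow/Keras terms, not the authors' Theano code; the $\eta_h$ defaults are the initial values 0.3 and 0.4 reported above):

```python
import tensorflow as tf

def deeply_supervised_loss(y_true, main_pred, aux_preds, etas=(0.3, 0.4),
                           weights=(), l2=1e-4):
    """L = L_o + sum_h eta_h * L_h + l2 * ||W||^2.
    main_pred is the mainstream voxelwise softmax output; aux_preds are the
    upsampled predictions of the deeply supervised hidden-layer branches."""
    ce = tf.keras.losses.CategoricalCrossentropy()
    loss = ce(y_true, main_pred)
    for eta, pred in zip(etas, aux_preds):
        loss += eta * ce(y_true, pred)       # auxiliary losses, weighted by eta_h
    if weights:
        loss += l2 * tf.add_n([tf.reduce_sum(tf.square(w)) for w in weights])
    return loss
```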
2. Conditional random field (CRF) model
This part really tests one's academic grounding, and it is a big reason why I feel my undergraduate training is not enough; normally it would never have occurred to me to use this model to refine the results. The paper devotes very little space to it, so it needs further study. What I do know is that the authors introduce several parameters ($\mu_1$, $\mu_2$, $\theta_{\alpha}$, $\theta_{\beta}$, $\theta_{\gamma}$) to solve an energy function, again tuned by grid search.
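For reference, these parameters belong to the pairwise potentials of a fully connected CRF in the standard Krähenbühl-Koltun form (my reconstruction; the paper's notation may differ slightly): $\mu_1$ and $\mu_2$ weight the appearance and smoothness kernels, $\theta_{\alpha}$, $\theta_{\beta}$, $\theta_{\gamma}$ are their bandwidths, and the unary term comes from the 3D DSN probabilities:
$$E(\mathbf{x})=\sum_i -\log P(x_i)+\sum_{i<j}[x_i\neq x_j]\left(\mu_1\exp\Big(-\frac{\|p_i-p_j\|^2}{2\theta_{\alpha}^2}-\frac{\|I_i-I_j\|^2}{2\theta_{\beta}^2}\Big)+\mu_2\exp\Big(-\frac{\|p_i-p_j\|^2}{2\theta_{\gamma}^2}\Big)\right)$$
where $p_i$ is the voxel position and $I_i$ its intensity.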
- Results and Conclusions:
1. 3D DSN > 3D CNN | CRF works good
2. Shorter runtime - 5s for 3D DSN and 87s for CRF.
As these numbers show, the 3D network itself runs very quickly, while the CRF post-processing takes most of the time.
[4] 3D Fully Convolutional Networks for Intervertebral Disc Localization and Segmentation. paper
Algorithmically, this paper simply turns the 2D FCN into a 3D FCN, with no other notable improvements, and applies it to an intervertebral disc segmentation dataset.
- Application: Intervertebral disc (IVD) localization and segmentation in volumetric data.
- Dataset: MICCAI 2015 Challenge on Automatic Intervertebral Disc Localization and Segmentation.
- Preprocessing: subtracting the mean value before inputting into the network.
- System Implementation: The 3D FCN uses the framework based on the Theano library, with an NVIDIA GeForce GTX X GPU. The 2D FCN was implemented with Matlab and C++.
- Comparison: 2D FCN - the input is three adjacent slices and the output is the binary mask of the middle slice.
- Evaluation: For IVD localization - mean localization distance (MLD) with standard deviation (SD), successful detection rate $P$. For IVD segmentation - mean dice overlap coefficients (MD) with SD, mean average absolute distance (MAAD) with SD.
- Results and Conclusions:
1. 3D FCN > 2D FCN
Overall, the thesis of this paper is very simple, the method is novel (2D $\longrightarrow$ 3D) but fairly routine, and the conclusions are also simple. Still, from my perspective it is well worth studying, because getting such work published really tests one's writing skills. For instance, if I were writing the experimental results section, it would be one sentence: 3D FCN performs better than 2D FCN both in IVD localization and segmentation. Done. :-)
[5] VoxResNet: Deep Voxelwise Residual Networks for Volumetric Brain Segmentation. H Chen, Q Dou, L Yu, P Heng [CUHK] (2016). paper.
- Propose a deep voxelwise residual network, referred as VoxResNet (3D Residual Network).
- An auto-context version of VoxResNet is also proposed.
[6] Evaluation and comparison of 3D intervertebral disc localization and segmentation methods for 3D T2 MR data: A grand challenge. paper
This journal paper is a much more detailed account of the intervertebral disc localization and segmentation work in [Review 4], and it gave me a direct feel for the difference between conference and journal papers: the journal version expands on every point of the conference paper. With the CVPR, IPMI and MICCAI submissions done, we should also start submitting to journals, fleshing the conference material out into a full journal paper. No time to read this one carefully - the reviews end here.
Related works
[1] V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi [Johns Hopkins University]. paper.
- Propose an approach to 3D image segmentation based on a volumetric, fully convolutional neural network (3D-FCN).
- Introduce a novel objective function based on the Dice coefficient, which is maximised during training. In this way they can deal with situations where there is a strong imbalance between the number of foreground and background voxels (a sketch of such a loss is given after this entry).
- Implementation available at https://github.com/faustomilletari/VNet
- Implementation available at https://github.com/faustomilletari/3D-Caffe
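A minimal NumPy sketch of such a Dice-based objective (the actual V-Net loss is defined on the network's softmax output and differentiated inside the framework; this only shows the quantity being optimised):

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss in the spirit of V-Net:
    D = 2 * sum(p * g) / (sum(p^2) + sum(g^2)), loss = 1 - D.
    pred   : predicted foreground probabilities (any shape)
    target : binary ground-truth mask of the same shape
    """
    p = pred.astype(np.float64).ravel()
    g = target.astype(np.float64).ravel()
    dice = (2.0 * np.dot(p, g) + eps) / (np.dot(p, p) + np.dot(g, g) + eps)
    return 1.0 - dice
```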
[2] 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox [University of Freiburg, Google Deepmind]. paper, code (Caffe).
- NVIDIA TitanX GPU
[3] Deep MRI brain extraction: A 3D convolutional neural network for skull stripping. Jens Kleesiek, Gregor Urban, Alexander Hubert [Heidelberg University Hospital]. paper.
[4] Integrating Online and Offline 3D Deep Learning for Automated Polyp Detection in Colonoscopy Videos. Lequan Yu, Hao Chen, Qi Dou [CUHK] (2016). paper
- The authors compared three different CNN architectures. Honestly this has limited reference value, since the choice largely comes down to experience and trial and error.
Discussions online
1. Are there any deep learning libraries that have 3D volumetric/spatial convolutions running on a CPU or a GPU?
A recent addition, but Keras now supports 3D convolution. It should work for voxels and video sequences.
2. 3D CNN in Keras - Action Recognition
3. Software: https://github.com/facebook/C3D
Separable 3D CNN
1. References papers
[1] Learning Separable Filters. Amos Sironi, Bugra Tekin, Roberto Rigamonti [EPFL] 2014. paper -- check Section 5.5.
2. To try
Examine the separability of the kernels in pre-trained CNNs; check http://www.mathworks.com/matlabcentral/fileexchange/28238-kernel-decomposition
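As a quick sanity check (my own sketch, not from the reference above; a full 3D kernel would need a CP/tensor decomposition, so this only scores 2D slices):

```python
import numpy as np

def separability_ratio(kernel_2d):
    """A perfectly separable kernel K = u v^T has a single non-zero singular
    value, so sigma_1 / sum(sigma) close to 1 means the kernel is nearly separable."""
    s = np.linalg.svd(np.asarray(kernel_2d, dtype=float), compute_uv=False)
    return s[0] / s.sum()

# Example: score every transverse slice of a pretrained 3x3x3 kernel.
# ratios = [separability_ratio(kernel_3d[z]) for z in range(kernel_3d.shape[0])]
```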
Some Questions
- At the end of the paper the authors show visualizations of the 3D convolution kernels; what purpose does this serve and what conclusion does it support?
- I notice that the kernel, stride and pooling sizes differ across the authors' papers; are these chosen by feel?
- For paper [3], how are the multi-layer fusion and the training of the per-layer weights $\eta_h$ implemented in code?
- For paper [4], is the 3D FCN code open-sourced? The paper concludes that 3D > 2D; are there other detailed improvements to the 3D FCN? In my own experiments the accuracies are about the same.
- Which framework is the authors' team using now - their own code or open-source code? How mature is the open-source 3D support in Lasagne these days?
Best wishes!