Deep Depth Completion of a Single RGB-D Image
Homepage: http://deepcompletion.cs.princeton.edu/
Github:https://github.com/yindaz/DeepCompletionRelease
Paper:http://deepcompletion.cs.princeton.edu/paper.pdf
Abstract
Goal: complete the depth channel of an RGB-D image.
Problem: commodity-grade depth cameras often fail to sense depth for shiny, bright, transparent, and distant surfaces.
Method: take an RGB image as input and predict dense surface normals and occlusion boundaries. These predictions are then combined with the raw depth observations provided by the RGB-D camera to solve for the depths of all pixels, including those missing in the original observation.
Introduction
Goal: to complete the depth channel of an RGB-D image captured with a commodity camera (i.e., fill all the holes in the depth map).
Previous depth inpainting methods rely on hand-tuned heuristics that fill holes by, e.g., extrapolating boundary surfaces or Markov image synthesis.
Deep networks have been applied to depth estimation, but not yet to depth completion, which poses several difficulties:
- Training data
Large-scale training sets of captured RGB-D images paired with "completed" depth images (e.g., with ground-truth depth provided for the holes) are not readily available.
As a result, depth estimation methods can only reproduce observed depth; they cannot estimate unobserved depth.
This paper introduces a new dataset: 105,432 RGB-D images, each paired with a completed depth image computed from large-scale surface reconstructions of 72 real-world environments.
- Depth representation
Directly regressing depth with an FCN does not work well, especially for regions as large as the missing area in Figure 1: even for a human, estimating accurate depth from a monocular color image is hard.
So this paper first uses a network to predict local differential properties of the depth: surface normals and occlusion boundaries.
- Deep network design
No one has previously trained an end-to-end network to complete depth from RGB-D images.
One idea is to extend prior color-to-depth networks, but they generally learn only to copy and interpolate the input depth.
It is also challenging for the network to learn how to adapt for misalignments of color and depth. (What exactly does "misalignment" mean here? Spatial misalignment? That a pixel with color information does not necessarily have depth information?)
This paper instead takes only the color image as input and first predicts local surface normals and occlusion boundaries with supervision, since predicting local features from color is something deep networks are good at. It then completes the depth via a global optimization problem that combines these predictions with the input depth.
The coarse-scale structure of the scene is reconstructed through global optimization with regularization from the input depth.
Main Insight
- prediction of surface normals and occlusion boundaries only from color
- optimization of global surface structure from those predictions with soft constraints provided by observed depths
Benefits: smaller relative error, and the network is independent of the observed depth, so it need not be retrained for new depth sensors.
Related work
- Depth estimation
depth estimation from a monocular color image
- Classic methods:
Shape-from-shading
R. Zhang, P.-S. Tsai, J. E. Cryer, and M. Shah. Shape-from-shading: a survey. IEEE transactions on pattern analysis and machine intelligence, 21(8):690–706, 1999.
Shape-from-defocus
S. Suwajanakorn, C. Hernandez, and S. M. Seitz. Depth from focus with your mobile phone. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3497–3506, 2015.
Others
based on hand-tuned models and/or assumptions about surface orientations
D. Hoiem, A. A. Efros, and M. Hebert. Automatic photo pop-up. ACM transactions on graphics (TOG), 24(3):577–584, 2005.
A. Saxena, S. H. Chung, and A. Y. Ng. Learning depth from single monocular images. In Advances in neural information processing systems, pages 1161–1168, 2006.
A. Saxena, M. Sun, and A. Y. Ng. Make3d: Learning 3d scene structure from a single still image. IEEE transactions on pattern analysis and machine intelligence, 31(5):824–840, 2009.
- Newer methods
based on DL
D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, pages 2650–2658, 2015.
D. Eigen, C. Puhrsch, and R. Fergus. Depth map prediction from a single image using a multi-scale deep network. In Advances in neural information processing systems, pages 2366–2374, 2014.
I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab. Deeper depth prediction with fully convolutional residual networks. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 239–248. IEEE, 2016.
F. Liu, C. Shen, G. Lin, and I. Reid. Learning depth from single monocular images using deep convolutional neural fields. IEEE transactions on pattern analysis and machine intelligence, 38(10):2024–2039, 2016.
A. Roy and S. Todorovic. Monocular depth estimation using neural regression forest. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5506–5514, 2016.
The above methods can only reproduce the raw depth acquired by commodity RGB-D cameras. We focus on depth completion, where the explicit goal is to make novel predictions for pixels where the depth sensor has no return.
- Depth inpainting
filling holes in depth channels of RGB-D images
- Old methods
D. Herrera, J. Kannala, J. Heikkilä, et al. Depth map inpainting under a second-order smoothness prior. In Scandinavian Conference on Image Analysis, pages 555–566. Springer, 2013.
X. Gong, J. Liu, W. Zhou, and J. Liu. Guided depth enhancement via a fast marching method. Image and Vision Computing, 31(10):695–703, 2013.
J. Liu, X. Gong, and J. Liu. Guided inpainting and filtering for kinect depth maps. In Pattern Recognition (ICPR), 2012 21st International Conference on, pages 2055–2058. IEEE, 2012.
M. Bertalmio, A. L. Bertozzi, and G. Sapiro. Navier-stokes, fluid dynamics, and image and video inpainting. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I–I. IEEE, 2001.
J. Liu and X. Gong. Guided depth enhancement via anisotropic diffusion. In Pacific-Rim Conference on Multimedia, pages 408–417. Springer, 2013.
K. Matsuo and Y. Aoki. Depth image enhancement using local tangent plane approximations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3574–3583, 2015.
S. M. Muddala, M. Sjostrom, and R. Olsson. Depth-based inpainting for disocclusion filling. In 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), 2014, pages 1–4. IEEE, 2014.
A. K. Thabet, J. Lahoud, D. Asmar, and B. Ghanem. 3d aware correction and completion of depth maps in piecewise planar scenes. In Asian Conference on Computer Vision, pages 226–241. Springer, 2014.
W. Chen, H. Yue, J. Wang, and X. Wu. An improved edge detection algorithm for depth map inpainting. Optics and Lasers in Engineering, 55:69–77, 2014.
H.-T. Zhang, J. Yu, and Z.-F. Wang. Probability contour guided depth map inpainting and superresolution using nonlocal total generalized variation. Multimedia Tools and Applications, pages 1–18, 2017.
Y. Zuo, Q. Wu, J. Zhang, and P. An. Explicit edge inconsistency evaluation model for color-guided depth map enhancement. IEEE Transactions on Circuits and Systems for Video Technology, 2016.
H. Xue, S. Zhang, and D. Cai. Depth image inpainting: Improving low rank matrix completion with low gradient regularization. IEEE Transactions on Image Processing, 26(9):4311–4320, 2017.
M. Kulkarni and A. N. Rajagopalan. Depth inpainting by tensor voting. JOSA A, 30(6):1155–1165, 2013.
M. Liu, X. He, and M. Salzmann. Building scene models by completing and hallucinating depth and semantics. In European Conference on Computer Vision, pages 258–274. Springer, 2016
J. T. Barron and J. Malik. Intrinsic scene properties from a single rgb-d image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 17–24, 2013.
M. Ciotta and D. Androutsos. Depth guided image completion for structure and texture synthesis. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 1199–1203.
D. Doria and R. J. Radke. Filling large holes in lidar data by inpainting depth gradients. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, pages 65–72.
J. Gautier, O. Le Meur, and C. Guillemot. Depth-based image completion for view synthesis. In 3DTV Conference: The True Vision-capture, Transmission and Display of 3D Video (3DTV-CON), 2011, pages 1–4.
- DL methods:
Auto-encoder
A. van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al. Conditional image generation with pixel-cnn decoders. In Advances in Neural Information Processing Systems, pages 4790–4798, 2016
GAN
D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2536–2544, 2016.
These prior methods did not study inpainting of depth images, which is a harder problem because depth images lack strong features and large-scale training data.
- Depth super-resolution
to improve the spatial resolution of depth images using high-resolution color
Markov random fields
[44] [12] [42] [51] [58]
O. Mac Aodha, N. D. Campbell, A. Nair, and G. J. Brostow. Patch based synthesis for single depth image super-resolution. In European Conference on Computer Vision, pages 71–84. Springer, 2012.
J. Diebel and S. Thrun. An application of markov random fields to range sensing. In Advances in neural information processing systems, pages 291–298, 2006
J. Lu, D. Min, R. S. Pahwa, and M. N. Do. A revisit to mrf-based depth map super-resolution and enhancement. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 985–988. IEEE, 2011
J. Park, H. Kim, Y.-W. Tai, M. S. Brown, and I. Kweon. High quality depth map upsampling for 3d-tof cameras. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 1623–1630. IEEE, 2011
E. Shabaninia, A. R. Naghsh-Nilchi, and S. Kasaei. High-order markov random field for single depth image super- resolution. IET Computer Vision, 2017
Shape-from-shading
[23] [71]
Y. Han, J.-Y. Lee, and I. So Kweon. High quality shape from a single rgb-d image under uncalibrated natural illumination. In Proceedings of the IEEE International Conference on Computer Vision, pages 1617–1624, 2013.
L.-F. Yu, S.-K. Yeung, Y.-W. Tai, and S. Lin. Shading-based shape refinement of rgb-d images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1415–1422, 2013
Segmentation
[41]
J. Lu and D. Forsyth. Sparse depth super resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2245–2253, 2015
Dictionary methods
[18] [30] [45] [63]
W. T. Freeman, T. R. Jones, and E. C. Pasztor. Example-based super-resolution. IEEE Computer graphics and Applications, 22(2):56–65, 2002
M. Kiechle, S. Hawe, and M. Kleinsteuber. A joint intensity and depth co-sparse analysis model for depth map super-resolution. In Proceedings of the IEEE International Conference on Computer Vision, pages 1545–1552, 2013.
M. Mahmoudi and G. Sapiro. Sparse representations for range data restoration. IEEE Transactions on Image Processing, 21(5):2909–2915, 2012
I. Tosic and S. Drewes. Learning joint intensity-depth sparse representations. IEEE Transactions on Image Processing, 23(5):2122–2132, 2014
Although some of these methods could be applied to depth completion, the two problems have different focuses: in super-resolution, the low-resolution measurements are assumed to be complete and regularly sampled. In contrast, our focus is on filling holes, which can be quite large and complex and thus require synthesis of large-scale content.
- Depth reconstruction from sparse samples
Other work has studied depth reconstruction from color images augmented with sparse sets of depth measurements.
S. Hawe, M. Kleinsteuber, and K. Diepold. Dense disparity maps from sparse disparity measurements. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2126–2133
L.-K. Liu, S. H. Chan, and T. Q. Nguyen. Depth reconstruction from sparse samples: Representation, algorithm, and sampling. IEEE Transactions on Image Processing, 24(6):1983–1996, 2015
F. Ma and S. Karaman. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. arXiv preprint arXiv:1709.07492, 2017
However, the motivation of that work is to reduce sensing cost in certain settings (e.g., saving cost on a robot), not depth completion.
Method
Corresponding to the three difficulties mentioned in the introduction, this paper focuses on three questions:
- how can we get training data for depth completion
- what depth representation should we use
- how should cues from color and depth be combined
Dataset
To create a dataset of RGB-D images paired with completed depth images.
- Straightforward approach:
Capture images with a low-cost RGB-D camera and align them to images captured simultaneously with a higher-cost depth sensor.
But this is expensive and time-consuming; the existing public datasets of this type cover only a small number of indoor scenes.
- This paper's approach:
utilize existing surface meshes reconstructed from multi-view RGB-D scans of large environments
for example: Matterport3D [6], ScanNet [10], SceneNN [28], and SUN3D[22,67]
a) For each scene, extract a triangle mesh M (with ~1-6 million triangles per room) from a global surface reconstruction using screened Poisson surface reconstruction.
b) For a sampling of RGB-D images in each scene, render the reconstructed mesh M from the camera pose of the image viewpoint to acquire a completed depth image D*.
This yields the dataset of paired RGB-D and D* images!
(Question: doesn't combining the multi-view RGB-D images require registration between images? Is the mesh something the original dataset already provides? Yes, the global surface reconstruction comes with the dataset.)
See: A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang. Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV), 2017.
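To make step (b) concrete, here is a minimal sketch of rendering D* from the reconstructed mesh, assuming trimesh and pyrender as the tooling (this is not the authors' pipeline; the function name and conventions are illustrative):

```python
# Hypothetical sketch: render a completed depth image D* from mesh M.
import numpy as np
import trimesh
import pyrender

def render_completed_depth(mesh_path, K, cam_pose, width=640, height=480):
    mesh = trimesh.load(mesh_path)                      # reconstructed mesh M
    scene = pyrender.Scene()
    scene.add(pyrender.Mesh.from_trimesh(mesh))
    camera = pyrender.IntrinsicsCamera(fx=K[0, 0], fy=K[1, 1],
                                       cx=K[0, 2], cy=K[1, 2])
    # cam_pose: 4x4 camera-to-world matrix of the RGB-D frame
    # (pyrender expects the OpenGL camera convention)
    scene.add(camera, pose=cam_pose)
    renderer = pyrender.OffscreenRenderer(width, height)
    _, depth = renderer.render(scene)   # depth in meters; 0 where no geometry
    renderer.delete()
    return depth                        # the completed depth image D*
```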
- Advantages of the completed depth images obtained this way (these properties help network training):
a) They have fewer holes. On average, 64.6% of the pixels missing from the raw depth images are filled in by the reconstruction process.
b) The completed depth images generally replicate the resolution of the originals for close-up surfaces, but provide far better resolution for distant surfaces.
Because the surface reconstruction is built at a 3D mesh scale comparable to the depth camera's resolution, there is usually no loss of resolution in the completed depth images. Yet when projected onto the view plane, the same 3D resolution provides effectively higher pixel resolution for surfaces far from the camera. The completed depth images can therefore achieve finer resolution than the originals by using sub-pixel anti-aliasing when rendering the high-resolution mesh (note the detail in the furniture in Figure 3).
c) The completed depth images generally have far less noise than the originals.
Because the surface reconstruction algorithm combines noisy depth samples from many camera views by filtering and averaging, it essentially denoises the surfaces. This is especially important for distant observations (e.g., >4 meters), where the raw depth measurements are quantized and noisy.
The resulting dataset contains 117,516 RGB-D images with rendered completions: 105,432 for training and 12,084 for testing.
Depth Representation
what geometric representation is best for deep depth completion
- Straightforward approach:
Design a network that regresses completed depth from raw depth and color.
However, it is difficult to predict absolute depth from monocular images, as it may require knowledge of object sizes, scene categories, etc.
Instead, this paper predicts local per-pixel properties: surface normals and occlusion boundaries.
- Indirect representations of depth
a) relative depth [7]
b) depth derivatives [5] (evaluated in experiments, but not the best)
c) depth derivatives & depth [35]
Why use surface normals and occlusion boundaries?
Normals are differential surface properties, so they depend only on local neighborhoods of pixels; they also relate strongly to local lighting variations directly observable in a color image.
Accordingly, prior work on dense prediction of surface normals from color images has performed well [1, 15, 34, 66, 75].
Occlusion boundaries produce local patterns in pixels (e.g., edges), so they usually can be detected robustly with a deep network [14, 75].
How, then, can depth be computed from the surface normals and occlusion boundaries?
- In theory it is impossible: the depth relationships between different parts of the image cannot be inferred from normals alone (Figure 4a).
- In real scenes, however, an image region is unlikely to be simultaneously surrounded by occlusion boundaries and contain no raw depth observations at all (Figure 4b).
So depth can be completed even for large holes, by solving for depths whose coherence with the predicted surface normals is weighted by the predicted occlusion boundaries and regularized by the constraints of the observed raw depth.
Network Architecture and Training
What is the best way to train a deep network to predict surface normals and occlusion boundaries for depth completion?
- Network architecture:
[75]
Y. Zhang, S. Song, E. Yumer, M. Savva, J.-Y. Lee, H. Jin, and T. Funkhouser. Physically-based rendering for indoor scene understanding using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
The model is a fully convolutional neural network built on the backbone of VGG-16, with a symmetric encoder and decoder.
The network is trained with surface normals and occlusion boundaries computed from the reconstructed meshes.
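As a rough sketch only (my simplification of the idea in [75]; it omits the skip connections and exact layer configuration of the released model), a VGG-16-style fully convolutional encoder with a symmetric decoder regressing per-pixel normals might look like this; the boundary branch would be analogous:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, n=2):
    """n conv-BN-ReLU layers; the first maps cin -> cout channels."""
    layers = []
    for i in range(n):
        layers += [nn.Conv2d(cin if i == 0 else cout, cout, 3, padding=1),
                   nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class NormalNet(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [64, 128, 256, 512, 512]            # VGG-16 channel plan
        self.enc = nn.ModuleList(
            [conv_block(3 if i == 0 else chans[i - 1], c)
             for i, c in enumerate(chans)])
        self.pool = nn.MaxPool2d(2, 2)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear',
                              align_corners=False)
        self.dec = nn.ModuleList(                   # mirror of the encoder
            [conv_block(chans[i], chans[i - 1] if i > 0 else 64)
             for i in reversed(range(len(chans)))])
        self.head = nn.Conv2d(64, 3, 1)             # 3-channel normal map

    def forward(self, x):
        for enc in self.enc:                        # down to 1/32 resolution
            x = self.pool(enc(x))
        for dec in self.dec:                        # back up to full size
            x = dec(self.up(x))
        return nn.functional.normalize(self.head(x), dim=1)

# e.g.: normals = NormalNet()(torch.randn(1, 3, 256, 320))  # (1, 3, 256, 320)
```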
- How to train it
Normal estimation is taken as the example below; occlusion boundary detection is analogous.
a) What loss should be used to train the network
Two choices: train only on holes vs. on all pixels:
- loss over all pixels (observed and hole pixels)
- loss over only unobserved pixels (holes), by masking out the gradients on the other pixels during back-propagation
Another choice: train with rendered normals vs. raw normals? See the paper's supplementary material for details.
Experimental comparison: the models trained with all pixels perform better than the ones using only observed or only unobserved pixels, and the ones trained with rendered normals perform better than those trained with raw normals. A sketch of the pixel-masking mechanics follows.
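A minimal PyTorch sketch of that masking (the cosine-style loss and tensor shapes are my assumptions, not the authors' exact formulation): gradients flow only through pixels selected by the mask, so "all pixels", "observed only", and "holes only" differ just in the mask.

```python
import torch
import torch.nn.functional as F

def masked_normal_loss(pred, target, mask):
    """pred, target: (B, 3, H, W) unit normal maps; mask: (B, 1, H, W) in {0, 1}.
    Pixels outside the mask contribute neither loss nor gradients."""
    cos = F.cosine_similarity(pred, target, dim=1).unsqueeze(1)  # (B, 1, H, W)
    return ((1.0 - cos) * mask).sum() / mask.sum().clamp(min=1)

# mask = observed | holes  -> train on all pixels
# mask = observed only     -> loss ignores holes
# mask = holes only        -> masks out gradients on observed pixels
```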
b) What image channels should be input to the network?
Experiments show that a network given RGB-D input predicts normals poorly for the hole pixels (even though it works for observed pixels). The authors surmise that such a network mostly predicts normals from the depth channel of the RGB-D input, so it cannot work where depth is missing.
This conclusion from Figure 5 motivated the authors to predict surface normals from the color image alone.
Separating “prediction without depth” from “optimization with depth” is compelling for two reasons:
- the network does not need to be retrained for different depth sensors (==unclear why changing the sensor would require retraining?==)
- the optimization can generalize to ==various kinds of depth observations as regularization==, including sparse depth samples [43]
Optimizations
The network above predicts a surface normal image N and an occlusion boundary image B (==what do these look like?==).
Depth completion is then posed as solving a system of equations.
The objective function is a weighted sum of squared errors with the following terms:
- $E_D$: the distance between the estimated depth and the raw observed depth
- $E_N$: the consistency between the estimated depth and the predicted surface normals, measured via the dot product of tangent vectors with normals
- $E_S$: encourages adjacent pixels to have similar depth values
- $B \in [0, 1]$: down-weights the normal terms based on the predicted probability that a pixel is on an occlusion boundary ($B(p)$)
==Question: at a boundary the tangent genuinely need not be perpendicular to the normal, so its weight is reduced? In the extreme case, only the $E_N$ terms at occlusion boundaries are discounted??==
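For reference, the objective reconstructed in LaTeX (my notation; in particular, writing the boundary down-weighting as a factor $w_B(p)$ that decreases with $B(p)$ is an assumption about its exact form):

$$
\begin{aligned}
E   &= \lambda_D E_D + \lambda_N E_N + \lambda_S E_S \\
E_D &= \sum_{p \in T_{obs}} \bigl(D(p) - D_0(p)\bigr)^2 && \text{stay close to the raw observed depth } D_0 \\
E_N &= \sum_{(p,q)} w_B(p)\, \bigl\langle v(p,q),\, N(p) \bigr\rangle^2 && \text{tangents consistent with predicted normals, down-weighted near boundaries} \\
E_S &= \sum_{(p,q)} \bigl(D(p) - D(q)\bigr)^2 && \text{adjacent pixels take similar depths}
\end{aligned}
$$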
This objective function is non-linear, due to the normalization of the tangent vector $v(p, q)$ required for the dot product with the surface normal in $E_N$. (==Question: isn't a squared error already non-linear?== Note that a sum of squares of residuals that are linear in the unknown depths is still a linear least-squares problem; it is the normalization of $v(p, q)$ that makes the residuals non-linear in the depths.)
The paper approximates this error term with ==a linear formulation by foregoing the vector normalization==, as suggested in [50].
The matrix form of the approximated objective is sparse and symmetric positive definite, so it can be solved efficiently with ==a sparse Cholesky factorization [11]==.
Surface normals and occlusion boundaries (and optionally depth derivatives) capture only ==local properties of the surface geometry==, which makes them relatively easy to estimate. Only through global optimization can they be combined to complete the depths of all pixels in a consistent solution.
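Putting the pieces together, a compact sketch of the linearized least-squares solve, under stated assumptions: pinhole back-projection $X(p) = D(p)\,K^{-1}[u, v, 1]^T$ so the tangent $v(p,q)$ is linear in the depths, a down-weighting of $1 - B(p)$, and scipy's sparse solver standing in for the sparse Cholesky factorization:

```python
# Minimal sketch of the global optimization (not the authors' code): solve
# min_D  lam_d*E_D + lam_n*E_N + lam_s*E_S  as sparse linear least squares.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def complete_depth(raw, normals, boundary, K,
                   lam_d=1e3, lam_n=1.0, lam_s=1e-3):
    """raw: HxW raw depth (0 = missing); normals: HxWx3 predicted unit
    normals (camera coords); boundary: HxW boundary probability B in [0,1];
    K: 3x3 intrinsics. Returns the completed HxW depth."""
    H, W = raw.shape
    n = H * W
    idx = np.arange(n).reshape(H, W)
    uu, vv = np.meshgrid(np.arange(W), np.arange(H))
    # Back-projection rays: X(p) = D(p) * r(p), so the (unnormalized) tangent
    # v(p,q) = D(q) r(q) - D(p) r(p) is linear in the unknown depths.
    rays = np.stack([(uu - K[0, 2]) / K[0, 0],
                     (vv - K[1, 2]) / K[1, 1],
                     np.ones((H, W))], axis=-1).reshape(-1, 3)
    N = normals.reshape(-1, 3)
    A_blocks, b_blocks = [], []

    # E_D: keep observed pixels near their raw depths.
    obs = np.flatnonzero(raw.ravel() > 0)
    A_blocks.append(sp.coo_matrix(
        (np.full(len(obs), np.sqrt(lam_d)), (np.arange(len(obs)), obs)),
        shape=(len(obs), n)))
    b_blocks.append(np.sqrt(lam_d) * raw.ravel()[obs])

    w = np.sqrt(lam_n) * (1.0 - boundary.ravel())  # assumed down-weighting by B
    for dy, dx in ((0, 1), (1, 0)):                # right and down neighbors
        p = idx[:H - dy, :W - dx].ravel()
        q = idx[dy:, dx:].ravel()
        m, r = len(p), np.arange(len(p))
        # E_N: w(p) * <v(p,q), N(p)> = w(p) * (D(q) N(p).r(q) - D(p) N(p).r(p))
        ndr_p = np.einsum('ij,ij->i', N[p], rays[p])
        ndr_q = np.einsum('ij,ij->i', N[p], rays[q])
        A_blocks.append(sp.coo_matrix(
            (np.concatenate([w[p] * ndr_q, -w[p] * ndr_p]),
             (np.concatenate([r, r]), np.concatenate([q, p]))), shape=(m, n)))
        b_blocks.append(np.zeros(m))
        # E_S: sqrt(lam_s) * (D(p) - D(q)) encourages smoothness.
        s = np.sqrt(lam_s)
        A_blocks.append(sp.coo_matrix(
            (np.concatenate([np.full(m, s), np.full(m, -s)]),
             (np.concatenate([r, r]), np.concatenate([p, q]))), shape=(m, n)))
        b_blocks.append(np.zeros(m))

    A = sp.vstack(A_blocks).tocsr()
    b = np.concatenate(b_blocks)
    # Normal equations A^T A D = A^T b are sparse and symmetric positive
    # definite; the paper uses sparse Cholesky, spsolve is a stand-in here.
    D = spla.spsolve((A.T @ A).tocsc(), A.T @ b)
    return D.reshape(H, W)
```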
Experimental Results
Unless otherwise specified, networks were pretrained on the SUNCG dataset [60, 75] and fine-tuned on the training split of the new dataset, using only color as input and a loss computed over all rendered pixels. Optimizations were performed with $λ_D = 10^3$, $λ_N = 1$, and $λ_S = 10^{-3}$. Evaluations were performed on the test split of the new dataset.
Time cost:

Task | Time | Hardware |
---|---|---|
normals & occlusion boundaries | ~0.3 s | NVIDIA TITAN X GPU |
solving the system of equations | ~1.5 s | Intel Xeon 2.4GHz CPU |
Ablation Studies
Evaluation metrics
- median error relative to the rendered depth (Rel)
- root mean squared error in meters (RMSE)
- percentage of pixels with predicted depths falling within an interval of the true depth, $\delta = |d_{pred} - d_{true}| / d_{true}$, for thresholds $\delta \in \{1.05, 1.10, 1.25, 1.25^2, 1.25^3\}$
(the metrics above measure depth error; those below measure surface normals)
- mean and median angular error in degrees
- percentage of pixels with predicted normals within thresholds of 11.25, 22.5, and 30 degrees
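A small sketch of computing these metrics (note: the δ formula above reads as a relative error, but threshold values like $1.25^2$ match the common max-ratio convention of Eigen et al., which is what this assumes; it also assumes predicted depths are positive):

```python
import numpy as np

def depth_metrics(pred, gt, thr=(1.05, 1.10, 1.25, 1.25**2, 1.25**3)):
    """Rel (median relative error), RMSE in meters, and the fraction of
    pixels whose ratio max(pred/gt, gt/pred) falls under each threshold,
    evaluated only where the rendered ground truth is valid (gt > 0)."""
    m = gt > 0
    p, g = pred[m], gt[m]
    rel = np.median(np.abs(p - g) / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    ratio = np.maximum(p / g, g / p)
    return rel, rmse, {t: float(np.mean(ratio < t)) for t in thr}

def normal_metrics(pred_n, gt_n, thr=(11.25, 22.5, 30.0)):
    """Mean/median angular error in degrees and the fraction of pixels whose
    predicted (unit) normal is within each angular threshold."""
    cos = np.clip(np.sum(pred_n * gt_n, axis=-1), -1.0, 1.0)
    ang = np.degrees(np.arccos(cos))
    return ang.mean(), np.median(ang), {t: float(np.mean(ang < t)) for t in thr}
```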
The ablations study how different test inputs, training data, loss functions, depth representations, and optimization methods affect the depth prediction results.
1. What data should be input to the network
Table 1 shows results for different inputs (in the table, higher is better for metrics marked with an up arrow; otherwise lower is better).
For example, with color-only input the normal median error is 17.28 < 23.59, and the depth Rel is 0.089 < 0.09.
The ==supplementary material== shows this advantage also holds under the other loss settings (observed-only vs. unobserved-only).
The authors argue that when observed depth is given as input, the network learns to interpolate rather than synthesize new depth in the holes.
++This experimental result is what motivated splitting the method into a two-stage system!!++
2. What depth representation is best
Networks are trained separately to predict depths (D), surface normals (N), and depth derivatives (DD), and different combinations of them are then used to complete the depth by optimizing Equation 1.
Table 2 (note that D here means predicting depth from depth).
Taking Rel as an example: N 0.089 < N+DD 0.092 < DD 0.100 < D 0.167.
The authors attribute this to surface normals representing only the orientation of surfaces, which is easier to predict (see [31]); moreover, ==normals do not change with depth and are more consistent across views==.
3. Does prediction of occlusion boundaries help
Does down-weighting the effect of surface normals near predicted occlusion boundaries help the optimizer solve for better depths?
In Table 2, "Yes" means the boundary weight B is used and "No" means no down-weighting: 0.089 < 0.110, an improvement of about 19%.
Surface normals ==in occlusion boundary regions are noisy and inaccurate?== (Figure 6)
In Figure 6, column 2 shows the normals and occlusion boundaries output by the network; row 2, columns 3-4 compare results with and without the boundary weight; row 1, columns 3-4 show surface normals computed from the output depth maps. The occlusion boundaries ==provide depth-discontinuity information, which helps preserve the sharpness of boundaries==; see the normal maps computed from the depths.
4. How much observed depth is necessary
To test how much the depth completion method depends on the quantity of input depth, the input depth images are degraded by ==randomly masking out different numbers of pixels== before being given to the optimizer, which then solves for completed depths from the predicted normals and boundaries.
Figure 7: the horizontal axis is the number of pixels with depth in the image (i.e., not masked); the left plot shows the predicted depth accuracy for observed pixels, the right plot for unobserved pixels.
As expected, accuracy on unobserved pixels is lower than on observed pixels, but only a small amount of input depth is needed (==2,000 depths, only about 2.5% of all pixels==). This suggests that even other depth sensor designs with sparse measurements could obtain reasonable predictions, ==without retraining the network (its input is color only)==. But the ground-truth normals used to train the network come from rendered depth images?? Still, at test time the method indeed does not depend heavily on the number of raw depth samples.
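The degradation used in this experiment amounts to the following trivial sketch (names are illustrative):

```python
import numpy as np

def keep_random_depths(raw, n_keep, seed=0):
    """Keep only n_keep randomly chosen observed depth pixels and zero out
    the rest, mimicking the sparse-input experiment of Figure 7."""
    rng = np.random.default_rng(seed)
    out = np.zeros_like(raw)
    obs = np.flatnonzero(raw.ravel() > 0)
    keep = rng.choice(obs, size=min(n_keep, obs.size), replace=False)
    out.ravel()[keep] = raw.ravel()[keep]
    return out
```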
Comparison to Baseline Methods
This section compares against baseline depth inpainting and depth estimation methods.
1. Comparison to Inpainting Methods
non-data-driven alternatives for depth inpainting
The focus of this study is to establish how well-known methods perform, providing a baseline for how hard the problem is on this new dataset.
Table 3: the baselines compared are joint bilateral filtering, the fast bilateral solver, and global edge-aware energy optimization.
The proposed method achieves the smallest Rel of all methods.
Figure 8 shows a comparison with joint bilateral filtering: the proposed method produces more accurate depth boundaries.
2. Comparison to Depth Estimation Methods
Comparison against color-to-depth estimation methods:
[33] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab. Deeper depth prediction with fully convolutional residual networks. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 239–248. IEEE, 2016
[5] A. Chakrabarti, J. Shao, and G. Shakhnarovich. Depth from a single image by harmonizing overcomplete local network predictions. In Advances in Neural Information Processing Systems, pages 2658–2666, 2016
Table 4: the proposed method is best on every metric, with improvements of 23-40% (Y means evaluated on observed depth, N on unobserved).
This also suggests that predicting normals is a good approach to the depth estimation problem itself.
Note that not only are the predicted depths more accurate; comparing the surface normals computed from them shows that the method learns better scene geometry.
Conclusion
Two main research contributions:
First, it proposes to complete depth with a two stage process where surface normals and occlusion boundaries are predicted from color, and then completed depths are solved from those predictions.
Second, it learns to complete depth images by supervised training on data rendered from large-scale surface reconstructions.
The method builds a bridge between the color image and the depth information, and the bridge is the normals!
Clearly, this is a game of trading time for image quality:
1. It is slow: for a 320x256 image, the network takes about 0.3 s on an NVIDIA TITAN X GPU and the optimization about 1.5 s on an Intel Xeon 2.4GHz CPU.
2. It depends on high-performance hardware, making cost hard to control.