Massive amounts of available video data have driven research on data-driven monocular depth estimation. To address the insufficiently layered depth assignment across different objects in existing methods, and under the assumption that similar scenes have similar depths, this paper proposes a 2D-to-3D method for single images based on semantic-level segmentation and depth transfer. First, a segmentation transfer model assigns semantic-level class labels to the pixels of the input image; next, the semantic classification result is used to constrain scene matching; then, SIFT flow establishes pixel-level correspondences between the input image and each matched image, through which the depth of the matched images is transferred onto the input image; finally, an optimization-based depth fusion model constrained by the semantic-level segmentation assigns depth values to the different object regions. Experimental results on the Make3D test data show that the depth estimated by this method is of higher quality than that of existing depth transfer methods; compared with the optimization-based fusion depth transfer algorithm, the average log error and average relative error are reduced by 0.03 and 0.02, respectively.
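To make the depth transfer step concrete, the following is a minimal sketch of how a candidate image's depth map can be warped onto the input image's pixel grid once a SIFT-flow field has been computed. The function name `warp_depth` and the flow representation (per-pixel offsets `flow_u`, `flow_v` mapping input pixels to their correspondences in the candidate) are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch: warp a candidate depth map onto the input image grid
# using a precomputed SIFT-flow field. For each input pixel p, (flow_u, flow_v)
# gives the offset of its corresponding pixel in the candidate image.
import numpy as np

def warp_depth(candidate_depth: np.ndarray,
               flow_u: np.ndarray,
               flow_v: np.ndarray) -> np.ndarray:
    """Pull depth values from candidate_depth along the flow field."""
    h, w = flow_u.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour lookup; source coordinates are clipped to the
    # candidate image bounds to stay in range.
    src_x = np.clip(np.round(xs + flow_u).astype(int),
                    0, candidate_depth.shape[1] - 1)
    src_y = np.clip(np.round(ys + flow_v).astype(int),
                    0, candidate_depth.shape[0] - 1)
    return candidate_depth[src_y, src_x]
```

Each warped candidate then serves as one rough hypothesis of the input's depth map, to be reconciled in the fusion step.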
Research on depth estimation from a single monocular image has been propelled by the availability of massive video data. Under the assumption that photometrically similar images likely have similar depth fields, in this paper we propose a novel 2D-to-3D method based on semantic segmentation and depth transfer to estimate depth from a single input image. First, semantic segmentation of the scene is performed, and the semantic labels are used to guide the depth transfer. Second, pixel-to-pixel correspondences between the input image and all candidate images are estimated via SIFT flow. Then, each candidate depth map is warped by SIFT flow to form a rough approximation of the input's depth map. Finally, depth is assigned to different objects through semantic-label-guided depth fusion. Experimental results on the Make3D dataset demonstrate that our algorithm outperforms existing depth transfer methods, reducing the average log error and average relative error by 0.03 and 0.02, respectively.
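As a rough illustration of the semantic-label-guided fusion idea, the sketch below combines several warped candidate depth maps per pixel, letting a candidate contribute only where its transferred semantic label agrees with the input's label. This is a deliberately simplified stand-in (a masked median) for the paper's optimization-based fusion model; the names `fuse_depths`, `warped_depths`, and `warped_labels` are hypothetical.

```python
# Minimal sketch of semantic-label-guided depth fusion (not the paper's
# optimization): a warped candidate contributes at a pixel only when its
# transferred semantic label matches the input's label there.
import numpy as np

def fuse_depths(warped_depths: list,
                warped_labels: list,
                input_labels: np.ndarray) -> np.ndarray:
    depths = np.stack(warped_depths)        # shape (K, H, W)
    labels = np.stack(warped_labels)        # shape (K, H, W)
    agree = labels == input_labels[None]    # semantic agreement mask
    masked = np.where(agree, depths, np.nan)
    # Median over the agreeing candidates at each pixel (NaN where none agree;
    # numpy emits an all-NaN-slice warning there, handled by the fallback).
    fused = np.nanmedian(masked, axis=0)
    # Fall back to the unconstrained median where no candidate's label agrees.
    fallback = np.median(depths, axis=0)
    return np.where(np.isnan(fused), fallback, fused)
```

A robust per-pixel statistic such as the median is used here only to convey the role of the semantic constraint; the actual method formulates fusion as an energy minimization over the different object regions.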