
Near-Real-Time Linear Stereo Cost Aggregation on the GPU (GPU近实时线性双目立体代价聚合)

ISSN: 1006-8961
Journal: Journal of Image and Graphics (《中国图象图形学报》)
Date: not listed
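Classification: TP301.6 [Automation and Computer Technology: Computer Systems Architecture; Automation and Computer Technology: Computer Science and Technology]
Affiliations: [1] School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081; [2] School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430074
Funding: National Natural Science Foundation of China (61105070)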

Keywords: stereo vision, cost aggregation, general purpose GPU (GPGPU), parallel computing

Chinese abstract (translated):

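Objective: In recent years, research on binocular stereo vision has increasingly focused on strategies for making it real-time. Stereo cost aggregation is the most complex and most time-consuming step in stereo vision, so this paper proposes a near-real-time stereo cost aggregation algorithm based on general-purpose computing on GPUs (GPGPU).

Method: Linear stereo matching, a local algorithm whose matching accuracy approaches that of global matching algorithms, is adopted as the cost aggregation strategy. Following the principle of linear cost aggregation, the computation flow of its main steps (cost computation, mean filtering, coefficient solving, etc.) is parallelized and optimized step by step.

Results: On the same experimental samples, the proposed method computes the cost matrix in less time on an NVIDIA GTX 780 platform; compared with the original CPU implementation, cost aggregation is on average several tens of times faster.

Conclusion: The proposed real-time stereo cost aggregation method provides an efficient and reliable way to obtain high-quality stereo depth information in real time on an ordinary personal computer.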
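The three steps named in the abstract (cost computation, mean filtering, coefficient solving) can be sketched on the CPU as below. This is a minimal illustration, not the authors' code: it assumes the guided-filter-style linear model commonly used to describe linear stereo matching, a grayscale left image as the guide, a truncated absolute-difference matching cost, and hypothetical parameters `r` (window radius), `eps` (regularizer) and `tau` (truncation). All function names are hypothetical. Each output value depends only on box means over a fixed neighbourhood, which is the property that lets a GPU version assign one independent thread per pixel.

```python
# Hypothetical CPU sketch of linear (guided-filter-style) cost aggregation.
# Parameters r, eps, tau and all names are illustrative assumptions.

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window clipped at the border, computed as a
    horizontal 1-D mean followed by a vertical one (the clipped window is a
    rectangle, so this equals the 2-D mean)."""
    h, w = len(img), len(img[0])
    tmp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo, hi = max(0, x - r), min(w - 1, x + r)
            tmp[y][x] = sum(img[y][lo:hi + 1]) / (hi - lo + 1)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        lo, hi = max(0, y - r), min(h - 1, y + r)
        for x in range(w):
            out[y][x] = sum(tmp[j][x] for j in range(lo, hi + 1)) / (hi - lo + 1)
    return out

def matching_cost(left, right, d, tau):
    """Step 1: C(x, y, d) = min(|I_L(x, y) - I_R(x - d, y)|, tau); pixels whose
    match falls outside the right image get the maximum cost tau."""
    h, w = len(left), len(left[0])
    return [[min(abs(left[y][x] - right[y][x - d]), tau) if x >= d else tau
             for x in range(w)] for y in range(h)]

def mul(a, b):
    """Element-wise product of two equally sized images."""
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def guided_aggregate(guide, cost, r, eps):
    """Steps 2 and 3: box means, then per-window linear coefficients a, b."""
    mean_i, mean_p = box_mean(guide, r), box_mean(cost, r)
    corr_ip = box_mean(mul(guide, cost), r)
    corr_ii = box_mean(mul(guide, guide), r)
    h, w = len(guide), len(guide[0])
    a = [[0.0] * w for _ in range(h)]
    b = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            var_i = corr_ii[y][x] - mean_i[y][x] ** 2
            cov_ip = corr_ip[y][x] - mean_i[y][x] * mean_p[y][x]
            a[y][x] = cov_ip / (var_i + eps)
            b[y][x] = mean_p[y][x] - a[y][x] * mean_i[y][x]
    # Average the coefficients of all windows covering a pixel, then apply them.
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [[mean_a[y][x] * guide[y][x] + mean_b[y][x] for x in range(w)]
            for y in range(h)]

def wta_disparity(left, right, max_d, r=1, eps=1e-4, tau=0.5):
    """Aggregate one cost slice per disparity, then pick the minimum per pixel."""
    slices = [guided_aggregate(left, matching_cost(left, right, d, tau), r, eps)
              for d in range(max_d + 1)]
    h, w = len(left), len(left[0])
    return [[min(range(max_d + 1), key=lambda d: slices[d][y][x])
             for x in range(w)] for y in range(h)]

H, W = 5, 8
left = [[0.1 * x + 0.05 * y for x in range(W)] for y in range(H)]
right = [[0.1 * (x + 1) + 0.05 * y for x in range(W)] for y in range(H)]
disp = wta_disparity(left, right, max_d=2)
```

In this toy pair the right image is the left image shifted by one pixel, so away from the left border the aggregated cost is smallest at disparity 1. The per-disparity slices are independent of each other, just as the per-pixel computations are.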

English abstract:

Objective: Practical stereo vision depends on approaches that are feasible for real-time or hardware implementation. Cost aggregation, the most complex part of a stereo matching algorithm, substantially affects the overall running time. This study therefore proposes a novel parallelization strategy that maps stereo cost aggregation onto graphics processing units (GPUs) using the Compute Unified Device Architecture (CUDA).

Method: The linear stereo matching algorithm is selected as the cost aggregation strategy. Linear stereo matching, whose per-pixel complexity is constant, produces disparity maps whose accuracy approaches that of global disparity optimization methods. Although its computational complexity is considerably lower than that of most global approaches, linear stereo matching, even when optimized with effective strategies, still falls short of the real-time or near-real-time performance that practical applications require. The parallelization strategy introduced here is based on a separable filter whose cost is linear in the filter window size and whose efficiency on GPU platforms is well established. Each step of the cost aggregation (cost computation, mean filtering, and coefficient computation) is reformulated, and the different types of GPU memory are used appropriately. Several parallelization optimizations increase the degree of parallelism and the data throughput. With these optimizations, the computation in each CUDA thread is independent of all other threads, which maximizes the degree of parallelism. The optimizations also reduce the per-thread complexity from quadratic to linear in the window radius, further improving efficiency. In the final implementation, memory access efficiency and data throughput are also greatly improved by caching data in texture or shared memory where appropriate. These experimental results show that the proposed strategy is
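The complexity reduction described in the English abstract can be illustrated with a plain window sum. A direct 2-D window reads up to (2r+1)^2 values per pixel, quadratic in the radius r, while a separable row-then-column version reads up to 2(2r+1), linear in r. The minimal sketch below counts reads rather than timing anything; the image, the radius and the function names are hypothetical, and it does not reproduce the paper's CUDA kernels or its texture/shared-memory caching.

```python
# Hypothetical comparison of a direct 2-D box sum and a separable box sum.
# Both return (result, number_of_reads) so the work per pixel can be compared.

def box_sum_2d(img, r):
    """Direct window sum: each pixel reads up to (2r+1)^2 values."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    reads = 0
    for y in range(h):
        for x in range(w):
            s = 0
            for j in range(max(0, y - r), min(h, y + r + 1)):
                for i in range(max(0, x - r), min(w, x + r + 1)):
                    s += img[j][i]
                    reads += 1
            out[y][x] = s
    return out, reads

def box_sum_separable(img, r):
    """Row pass, then column pass: each pixel reads up to 2(2r+1) values.
    Within each pass every output pixel is independent of the others, which
    is what allows one GPU thread per pixel."""
    h, w = len(img), len(img[0])
    reads = 0
    rows = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for i in range(max(0, x - r), min(w, x + r + 1)):
                rows[y][x] += img[y][i]
                reads += 1
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for j in range(max(0, y - r), min(h, y + r + 1)):
                out[y][x] += rows[j][x]
                reads += 1
    return out, reads

img = [[(3 * x + 7 * y) % 11 for x in range(20)] for y in range(20)]
direct, reads_2d = box_sum_2d(img, 3)
separable, reads_sep = box_sum_separable(img, 3)
```

Both functions give identical sums; for an interior pixel with r = 3 the direct version reads 49 values and the separable version 14, and the gap widens as the radius grows.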

同期刊论文项目