Jointly solving object detection and semantic segmentation within a unified energy minimization framework is a promising route to holistic scene understanding. Two key issues arise: how to design suitably expressive higher-order potentials, and how to construct efficient inference algorithms for them. In this work, we first introduce three design criteria for higher-order potentials that model the label consistency between object detection and semantic segmentation. Based on these criteria, we propose a robust higher-order potential and a corresponding efficient inference algorithm. The proposed potential separately models the label consistency of the pixels within the bounding boxes of true, false, and inaccurately localized detectors, and can be represented as the lower envelope of three linear functions. We prove that, by introducing only two auxiliary binary variables, the higher-order α-expansion move function can be transformed into a submodular pairwise energy, which can then be minimized efficiently via graph cuts. Comparative experiments against several representative state-of-the-art algorithms on the PASCAL VOC 2010 dataset show that the proposed higher-order potential effectively enforces label consistency between object detection and semantic segmentation for both accepted and rejected detectors, while remaining robust to inaccurately localized detectors.
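To make the "lower envelope of three linear functions" concrete, the following is a minimal sketch rather than the potential defined in the paper. It assumes the potential depends on a single integer count n (for example, the number of pixels in a bounding box that carry the detector's label). The three (slope, intercept) pairs in LINES are hypothetical values chosen for illustration. The second function uses a standard identity for concave piecewise-linear functions to express the same value as a minimum over two binary auxiliary variables. It is not necessarily the construction used in the paper's proof, and it only holds when the lines are sorted by decreasing slope and each one appears on the envelope.

```python
from itertools import product
from typing import Sequence, Tuple

Line = Tuple[int, int]  # (slope, intercept) of one linear function of n

# Hypothetical coefficients, sorted by decreasing slope; NOT taken from the paper.
# Breakpoints: lines 1 and 2 cross at n = 50, lines 2 and 3 cross at n = 80.
LINES: Sequence[Line] = ((2, 0), (1, 50), (-1, 210))


def envelope_direct(n: int, lines: Sequence[Line] = LINES) -> int:
    """Lower envelope: the pointwise minimum of the linear functions at n."""
    return min(a * n + b for a, b in lines)


def envelope_with_aux(n: int, lines: Sequence[Line] = LINES) -> int:
    """Same value, written as a minimum over two binary auxiliary variables.

    Assumes exactly three lines, sorted by decreasing slope, whose
    breakpoints are in increasing order (illustrative identity only).
    """
    (a1, b1), (a2, b2), (a3, b3) = lines
    l1, l2, l3 = a1 * n + b1, a2 * n + b2, a3 * n + b3
    # z1 = 1 switches from line 1 to line 2; z2 = 1 switches from line 2 to line 3.
    return min(
        l1 + z1 * (l2 - l1) + z2 * (l3 - l2)
        for z1, z2 in product((0, 1), repeat=2)
    )


if __name__ == "__main__":
    # Print both forms side by side for a few counts.
    for n in (0, 50, 80, 100):
        print(n, envelope_direct(n), envelope_with_aux(n))
```

In this sketch each auxiliary variable multiplies a term that is linear in n. This illustrates why a small number of auxiliary variables can be enough to reduce such a potential to pairwise terms. Whether the resulting terms are submodular under α-expansion depends on the actual coefficients, which the text above does not specify.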