To address the limitations of part-based models in describing object categories, we propose a discriminative visual grammar model. Exploiting the descriptive power and extensibility of grammars, the model can represent general object categories and handle generic recognition tasks. Based on the characteristics of object detection and pose estimation, we instantiate the grammar model as two single-task grammars and discuss their similarities and differences. Because detection and pose estimation are complementary in both application background and research methods, we further propose a joint recognition grammar, constructed by merging the two single-task grammars through a set of discriminative symbols. The joint grammar has two characteristics: it performs object detection and pose estimation in parallel, and it improves the performance of both detection and pose estimation. Since the grammar parameters must be learned under weak supervision, we optimize them with a structural SVM with latent variables. Experiments compare part-based models with the proposed joint grammar in both single-task and multi-task scenarios. The results show that the joint grammar outperforms state-of-the-art detection models and pose estimation models.