Object detection and pose estimation are treated as separate tasks in computer vision, yet they are strongly complementary in both research methods and practical applications. This paper presents a mixture of hierarchical tree models with three types of nodes, representing the whole object, discriminative parts, and components (i.e., semantic parts). The discriminative parts in the middle level link the object above to the components below: they characterize local features of the whole object and also implicitly encode co-occurrence information among multiple components. Compared with recent joint models, the proposed model performs detection and pose estimation of articulated objects in parallel, which avoids the error propagation caused by cascading the two tasks. The model is trained with a latent structured SVM in which the discriminative parts are treated as latent variables, and a novel part-learning method is introduced to initialize and optimize the discriminative parts automatically. The experiments cover two evaluation scenarios, multi-task recognition and single-task recognition, and compare the proposed model with state-of-the-art joint recognition methods on the PASCAL VOC datasets. The results show that the hierarchical model achieves better recognition performance and higher time efficiency than the compared joint models.
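For reference, a latent structured SVM is usually trained by minimizing an objective of the following generic form. This is the standard formulation, given here only as a sketch; it is not necessarily the exact objective used in this paper. All symbols are illustrative assumptions rather than the paper's notation: $x_i$ is a training image, $y_i$ its annotated output (for example, object and component locations), $h$ the latent placements of the discriminative parts, $\Phi$ a joint feature map, $\Delta$ a task loss, and $C$ a regularization weight.

```latex
\min_{w}\;\; \frac{1}{2}\lVert w\rVert^{2}
  + C \sum_{i=1}^{n} \Bigl[
      \max_{\hat{y},\,\hat{h}} \bigl( w^{\top}\Phi(x_i,\hat{y},\hat{h}) + \Delta(y_i,\hat{y},\hat{h}) \bigr)
      \;-\; \max_{h}\; w^{\top}\Phi(x_i,y_i,h)
    \Bigr]
```

In this form, the second maximization selects the latent values that best explain the annotated output under the current weights, which is how treating the discriminative parts as latent variables removes the need for part-level annotation.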