With the rapid development of video acquisition and network transmission technologies and the widespread use of personal mobile devices, a large amount of image data now exists in the form of sets. Because the inner structure of a set is complex, how to measure the distance between two sets is a key issue in image set classification. To address this problem, this paper proposes a framework called double sparse regularizations for image set distance learning (DSRID). In DSRID, the distance between two sets is modeled as the distance between their corresponding prominent internal sub-structures, which ensures that the measure is robust and discriminative. According to the set representation used, the framework is implemented in the traditional Euclidean space and on two common manifolds, namely the symmetric positive definite matrices manifold (SPD manifold) and the Grassmann manifold. Experiments on a series of set-based face recognition, action recognition, and object categorization tasks verify the effectiveness of the framework.