Representing a set of key points in a video with local feature descriptors is widely used to recognize human actions in complex scenes. However, the structural position relationships implicit among these key points have not yet been represented effectively. To address this, the paper first uses a scale-invariant key point detector and a 3D-Harris detector to find local key points in the video samples. Existing local feature descriptors are then combined with a shape descriptor to capture the structural information about the positions of the key points. Next, the bag-of-features model is used to compute the distribution of these features. Finally, a fuzzy integral is used to fuse the local features, and the corresponding algorithm is described. Experiments on the YouTube dataset, which contains complex scenes, show that the proposed local feature representation describes human actions in complex scenes more effectively, and that the fuzzy integral scheme is effective for fusing the descriptors at the decision level.
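The decision-level fusion step summarized above can be illustrated with a minimal sketch. The abstract does not say which fuzzy integral or fuzzy measure is used, so the code below assumes a Choquet integral taken with respect to a Sugeno λ-measure, which is one common choice. The function names (`sugeno_lambda`, `choquet_fuse`, `classify`), the per-descriptor densities, and the example scores are all hypothetical and are not taken from the paper.

```python
import numpy as np


def sugeno_lambda(densities, tol=1e-12, max_iter=200):
    """Solve prod(1 + lam * g_i) = 1 + lam for the non-trivial root lam > -1.

    Assumes at least two sources with densities strictly between 0 and 1.
    """
    g = np.asarray(densities, dtype=float)
    if g.size < 2 or np.any(g <= 0.0) or np.any(g >= 1.0):
        raise ValueError("need at least two densities, each in (0, 1)")
    total = g.sum()
    if abs(total - 1.0) < 1e-9:
        return 0.0  # the measure is already additive

    def f(lam):
        return np.prod(1.0 + lam * g) - (1.0 + lam)

    if total > 1.0:
        # Root lies in (-1, 0); f is positive left of it and negative right of it.
        lo, hi, positive_side_is_lo = -1.0, 0.0, True
    else:
        # Root lies in (0, inf); f is negative left of it and positive right of it.
        lo, hi, positive_side_is_lo = 0.0, 1.0, False
        while f(hi) <= 0.0:
            hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == positive_side_is_lo:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def choquet_fuse(scores, densities):
    """Fuse one class's scores from several descriptors with a Choquet integral."""
    h = np.asarray(scores, dtype=float)
    g = np.asarray(densities, dtype=float)
    lam = sugeno_lambda(g)
    order = np.argsort(-h)                  # sources sorted by descending score
    h_sorted, g_sorted = h[order], g[order]
    h_next = np.append(h_sorted[1:], 0.0)   # next lower score, 0 after the last
    fused, measure = 0.0, 0.0
    for h_i, h_n, g_i in zip(h_sorted, h_next, g_sorted):
        # Lambda-measure of the set of the top-i sources, built recursively.
        measure = g_i + measure + lam * g_i * measure
        fused += (h_i - h_n) * measure
    return float(fused)


def classify(score_matrix, densities):
    """score_matrix[k, c] is the score of class c from descriptor k."""
    scores = np.asarray(score_matrix, dtype=float)
    fused = [choquet_fuse(scores[:, c], densities) for c in range(scores.shape[1])]
    return int(np.argmax(fused)), fused


if __name__ == "__main__":
    # Hypothetical scores from three descriptors over four action classes.
    example_scores = np.array([[0.2, 0.7, 0.4, 0.1],
                               [0.3, 0.5, 0.6, 0.2],
                               [0.1, 0.8, 0.3, 0.4]])
    example_densities = [0.4, 0.3, 0.2]     # assumed reliability per descriptor
    print(classify(example_scores, example_densities))
```

In this sketch each density expresses how much a single descriptor is trusted on its own. When the densities sum to one, λ is zero and the fusion reduces to an ordinary weighted average; otherwise the λ-measure models interaction between descriptors.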