In the current information age, with the development of computer and multimedia technology, video data on the Internet, and the mobile Internet in particular, pose great challenges to storage, transmission and retrieval because of their complex structure and high feature dimensionality. Video hash learning is one of the important approaches to these challenges and has become a research hotspot in multimedia processing. Existing methods mainly construct video hashes from different video features, yet these features are correlated with one another. To make full use of the correlations among different video features and overcome the limitations of traditional video hash coding, this paper proposes a video hash learning method based on feature fusion and Manhattan quantization. The method first extracts global, local and temporal features from the video, then fuses them using tensor decomposition theory to obtain a fused feature representation of the video. Manhattan quantization is then applied to the fused features to learn the quantized codes that form the video hash sequence. Compared with traditional video hashing algorithms, the method not only fully exploits the complementary relationships among multiple features, but also encodes each dimension of the original features separately, which better preserves the structural similarity among the original features. Experimental results show that the method achieves good performance.
With the development of computer and multimedia technologies, video storage, transmission and retrieval face huge challenges on the Internet, especially the mobile Internet, due to the complex structure and high dimensionality of video data. Video hash learning is one of the important ways to address these challenges, and it has become a hot topic in the field of multimedia processing. Existing methods generate video hashes using different types of features; in fact, there are potential relationships among these feature types. Therefore, to make full use of the relationships among different video features and overcome the limitations of traditional video hashing methods, we propose in this paper a video hash learning method based on feature fusion and Manhattan quantization. In the proposed method, the global, local and temporal features are first extracted from the video content, and the video clip is considered as a third-order tensor. Tensor decomposition, which is widely applied in multi-dimensional data processing, is then used to fuse the global, local and temporal features: the three low-order tensors obtained from the decomposition are concatenated as the fused representation of the video content. Subsequently, the fused video feature is quantized by Manhattan quantization to obtain the codes that constitute the final video hash. Compared with traditional video hashing methods, the proposed method not only makes full use of the relationships among different video features, but also encodes the different dimensions of the original features separately, which well preserves the structural similarity among them. Two kinds of experiments are conducted to evaluate the proposed method, and the results show that it performs well compared with existing methods.
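The fusion step described above (clip as a third-order tensor, three low-order factors concatenated as the fused representation) can be sketched as follows. This is a minimal HOSVD-style illustration, not the paper's exact algorithm: the tensor layout (frames × dimensions × feature types), the feature dimensions, and the target ranks are all assumptions made for the example.

```python
import numpy as np

# Hypothetical inputs: global, local and temporal descriptors for one clip,
# each as a (frames x dims) matrix. Shapes are illustrative only.
rng = np.random.default_rng(0)
n_frames, n_dims = 8, 16
global_f = rng.standard_normal((n_frames, n_dims))
local_f = rng.standard_normal((n_frames, n_dims))
temporal_f = rng.standard_normal((n_frames, n_dims))

# The clip as a third-order tensor: frames x dims x 3 feature types.
T = np.stack([global_f, local_f, temporal_f], axis=2)

def mode_unfold(tensor, mode):
    """Unfold a third-order tensor along one mode into a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_factors(tensor, ranks):
    """Leading left singular vectors of each mode unfolding (HOSVD-style):
    one low-order factor matrix per mode of the tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(mode_unfold(tensor, mode), full_matrices=False)
        factors.append(U[:, :r])
    return factors

# Three low-order factors, one per mode, flattened and concatenated
# into a single fused feature vector for the clip.
factors = hosvd_factors(T, ranks=(2, 2, 2))
fused = np.concatenate([f.ravel() for f in factors])
print(fused.shape)  # (8*2 + 16*2 + 3*2,) = (54,)
```

The concatenation mirrors the abstract's description of joining the decomposed low-order tensors into one representation; a real implementation would tune the ranks to trade hash length against fidelity.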
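The quantization step can likewise be sketched. Manhattan quantization encodes each feature dimension with multiple bits using a natural binary code, and compares codes by the Manhattan distance on the decoded integer levels; the per-dimension quantile thresholds below are one common choice, not necessarily the paper's, and all sizes are illustrative.

```python
import numpy as np

def manhattan_quantize(X, bits_per_dim=2):
    """Quantize each feature dimension into 2**b levels using per-dimension
    quantile thresholds, and encode each level with its natural binary code.
    A sketch of Manhattan quantization under assumed threshold selection."""
    levels = 2 ** bits_per_dim
    # Thresholds at equal-frequency quantiles of each dimension.
    qs = np.linspace(0, 1, levels + 1)[1:-1]
    thresholds = np.quantile(X, qs, axis=0)              # (levels-1, d)
    # Level index of each entry = number of thresholds it exceeds.
    level_idx = (X[None, :, :] > thresholds[:, None, :]).sum(axis=0)
    # Natural binary code: write each level with bits_per_dim binary digits.
    bits = (level_idx[..., None] >> np.arange(bits_per_dim - 1, -1, -1)) & 1
    return level_idx, bits.reshape(X.shape[0], -1)

def manhattan_distance(levels_a, levels_b):
    """Sum of absolute level differences per dimension: the Manhattan
    distance on the decoded integer codes."""
    return int(np.abs(levels_a - levels_b).sum())

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 6))            # 100 fused features, d = 6
levels, hash_bits = manhattan_quantize(X, bits_per_dim=2)
print(hash_bits.shape)                       # (100, 12): 2 bits per dimension
print(manhattan_distance(levels[0], levels[1]))
```

Encoding each dimension with its own multi-bit code is what the abstract refers to as coding the different dimensions respectively, and it is why the integer-level Manhattan distance preserves neighborhood structure better than Hamming distance on single-bit codes.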