Shot boundary detection is a key technology in content-based video retrieval. This paper proposes an algorithm that identifies video shot boundaries using the TextTiling method. The video is first coarsely segmented with a sliding window. Each frame is then projected onto a low-dimensional feature subspace by principal component analysis (PCA), and the distance between adjacent frames is computed in the projected space. Shot boundaries are finally determined from the depth scores between adjacent windows. Experiments on the TREC-2001 video test data set show that the algorithm achieves an average recall of 89% and an average precision of 96.5%.
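The pipeline summarized above (PCA projection, adjacent-frame distance, depth scores) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the frame features, the window size, or the thresholding rule, so every function name and parameter below (`pca_project`, `adjacent_distances`, `depth_scores`, `detect_shot_boundaries`, `n_components`, `cutoff`) is hypothetical. The sketch assumes each frame is already described by a feature vector (for example a color histogram), uses the negated adjacent-frame distance as the similarity curve, skips the sliding-window pre-segmentation by evaluating a depth score at every frame gap, and flags gaps whose depth exceeds the mean plus `cutoff` standard deviations.

```python
import numpy as np


def pca_project(features, n_components):
    """Project row vectors onto the top principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def adjacent_distances(projected):
    """Euclidean distance between each pair of consecutive frames."""
    return np.linalg.norm(np.diff(projected, axis=0), axis=1)


def depth_scores(similarity):
    """TextTiling-style depth: how far each gap lies below the nearest
    peak on its left plus the nearest peak on its right."""
    n = len(similarity)
    depths = np.zeros(n)
    for i in range(n):
        left = i
        while left > 0 and similarity[left - 1] >= similarity[left]:
            left -= 1  # climb to the left while the curve does not drop
        right = i
        while right < n - 1 and similarity[right + 1] >= similarity[right]:
            right += 1  # climb to the right while the curve does not drop
        depths[i] = (similarity[left] - similarity[i]) + (
            similarity[right] - similarity[i]
        )
    return depths


def detect_shot_boundaries(features, n_components=8, cutoff=3.0):
    """Return indices of frames that start a new shot.

    Assumption: the threshold (mean + cutoff * std of the depth scores)
    is an illustrative choice, not the rule used in the paper.
    """
    projected = pca_project(np.asarray(features, dtype=float), n_components)
    # Similar neighbouring frames give a small distance, so negate it
    # to obtain a similarity curve whose valleys mark candidate cuts.
    similarity = -adjacent_distances(projected)
    depths = depth_scores(similarity)
    threshold = depths.mean() + cutoff * depths.std()
    # Gap i lies between frame i and frame i + 1.
    return [int(i) + 1 for i in np.flatnonzero(depths > threshold)]
```

Calling `detect_shot_boundaries(frames)` on an array with one feature row per frame returns the indices of the frames that open a new shot. A mean-plus-deviation cutoff only works when cuts are rare relative to the number of frames; a sequence with very frequent cuts would need a different threshold.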