Crowd counting in video is important for intelligent surveillance. Because of interference from camera perspective, image background, uneven crowd density, and occlusion between pedestrians, traditional counting methods based on low-level features achieve low accuracy. This paper proposes a crowd counting method based on a rank-based spatial pyramid pooling (RSPP) network. The method divides the original image into several sub-regions, each covering the same perspective range, and takes sub-image blocks of different scales from each sub-region. The RSPP network estimates the number of people in each sub-image block, and the estimates for all blocks are summed to give the count for the original image. The proposed image blocking scheme effectively eliminates the influence of camera perspective and uneven crowd density on counting. The proposed rank-based spatial pyramid pooling not only handles sub-image blocks of multiple scales, but also addresses the tendency of traditional pooling methods to lose a large amount of important information and to over-fit. Experimental results show that the proposed method is more accurate and more robust than traditional methods.
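To make the pooling step concrete, the following is a minimal sketch of how a rank-based pooling operator could be combined with a spatial pyramid so that sub-image blocks of different sizes map to a feature vector of fixed length. The abstract does not specify the rank weighting, the pyramid levels, or the network around the pooling layer, so the geometric weights `alpha * (1 - alpha) ** rank`, the levels `(1, 2, 4)`, and the function names `rank_weighted_pool` and `rspp` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def rank_weighted_pool(values, alpha=0.5):
    """Pool activations by rank (assumed weighting, not taken from the paper).

    Activations are sorted in descending order and the activation at rank r
    receives weight alpha * (1 - alpha) ** r; weights are normalised to sum
    to 1, so the result lies between the mean and the max of the inputs.
    """
    v = np.sort(np.asarray(values, dtype=float).ravel())[::-1]
    w = alpha * (1.0 - alpha) ** np.arange(v.size)
    w /= w.sum()
    return float(np.dot(w, v))


def rspp(feature_map, levels=(1, 2, 4), alpha=0.5):
    """Rank-based pooling over a spatial pyramid.

    feature_map: array of shape (C, H, W) with arbitrary H and W.
    Returns a vector of length C * sum(n * n for n in levels), which is
    independent of H and W.
    """
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceil for the end, so the n bins
            # cover the whole axis and are never empty.
            r0 = (i * h) // n
            r1 = -(-((i + 1) * h) // n)
            for j in range(n):
                c0 = (j * w) // n
                c1 = -(-((j + 1) * w) // n)
                for ch in range(c):
                    pooled.append(
                        rank_weighted_pool(feature_map[ch, r0:r1, c0:c1], alpha)
                    )
    return np.array(pooled)
```

Under these assumptions, feature maps of shape `(3, 7, 5)` and `(3, 13, 9)` both yield vectors of length 63, which is what allows blocks of different scales to feed the same fully connected layers. Unlike max pooling, every activation in a bin contributes to the output, with larger activations weighted more heavily.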