Current human behavior understanding techniques are sensitive to noise, computationally expensive, inattentive to scene sensitivity, and unable to describe an event as a whole; there is also a semantic gap between their results and human cognition. This paper proposes an eight-tuple video semantic model that accounts for scene-level semantic understanding and incorporates a 3D human body semantic model, using analysis and synthesis methods to produce a holistic description of video events. Experimental results show that the method outperforms behavior recognition based on context-free grammar (CFG) in both recognition rate and overall performance, and that it narrows the semantic gap to some extent.