eScholarship
Open Access Publications from the University of California


UCLA Electronic Theses and Dissertations

Stochastic Image Grammars for Human Pose Estimation

Abstract

Robust human pose estimation is of particular interest to the computer vision community and can be applied to a broad range of applications such as automated surveillance, human-computer interaction, and human activity recognition. In this dissertation, we present a framework for human pose estimation based on stochastic image grammars. Humans are particularly difficult to model, as their articulated geometry, camera viewpoint, and perspective can produce a very large number of distinct shapes in images. Furthermore, humans often exhibit highly variable and amorphous part appearances, are subject to self-occlusion, and commonly appear in cluttered environments. Our approach capitalizes on the reconfigurable and modular nature of grammatical models to cope with this variability in both geometry and appearance. We present a human body model as a stochastic context-sensitive AND-OR graph grammar, which represents the body as a hierarchical composition of primitive parts while maintaining the articulated kinematics between parts. Each body instance can be composed from a different set of parts and relations in order to explain the unique shape or appearance of that instance. We present grammar models based on coarse-to-fine phrase-structure grammars as well as dependency grammars, and describe efficient algorithms for learning and inference from both generative and discriminative perspectives. Furthermore, we propose extensions to our model that provide ambiguity reasoning in crowded scenes through composite cluster sampling, as well as reasoning about self-occlusion and external occlusion of parts. We also present a technique that incorporates image segmentation into the part appearance models to improve localization performance on difficult-to-detect parts. Finally, we demonstrate the effectiveness of our approach with state-of-the-art performance on several recent public benchmark datasets.
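To make the AND-OR graph idea concrete, the following is a minimal illustrative sketch (not code from the dissertation): AND nodes compose all of their children into a larger part, while OR nodes select exactly one alternative configuration, so a single grammar compactly encodes many distinct body shapes. The node names, scores, and the `best_parse` routine below are hypothetical simplifications for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical AND-OR graph sketch. An AND node composes all of its
# children (e.g. an arm = upper arm + lower arm); an OR node chooses
# exactly one child (e.g. the arm is either bent or straight).

@dataclass
class Node:
    name: str
    kind: str                                  # "AND", "OR", or "TERMINAL"
    children: List["Node"] = field(default_factory=list)
    score: float = 0.0                         # appearance score for terminals

def best_parse(node: Node) -> Tuple[float, List[str]]:
    """Return the (score, part list) of the best parse rooted at node."""
    if node.kind == "TERMINAL":
        return node.score, [node.name]
    if node.kind == "AND":
        total, parts = 0.0, []
        for child in node.children:            # compose every child
            s, p = best_parse(child)
            total += s
            parts += p
        return total, parts
    # OR node: keep only the single highest-scoring alternative
    return max((best_parse(c) for c in node.children), key=lambda t: t[0])

# Tiny fragment of a body grammar: an arm is bent (two parts) OR straight.
upper = Node("upper_arm", "TERMINAL", score=0.9)
lower = Node("lower_arm", "TERMINAL", score=0.8)
bent = Node("bent_arm", "AND", [upper, lower])
straight = Node("straight_arm", "TERMINAL", score=1.5)
arm = Node("arm", "OR", [bent, straight])

score, parts = best_parse(arm)                 # bent wins: 0.9 + 0.8 > 1.5
```

Because each OR choice is independent, a grammar with a handful of alternatives per part can represent combinatorially many full-body configurations, which is the reconfigurability the abstract refers to.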
