eScholarship
Open Access Publications from the University of California

UC Riverside Electronic Theses and Dissertations

Online Activity Understanding and Labeling in Natural Videos

Creative Commons 'BY-NC' version 4.0 license
Abstract

Understanding human activities in unconstrained natural videos is a widely studied problem, yet it remains one of the most challenging problems in computer vision. It has many practical applications, including, but not limited to, security, surveillance, and robotic vision. The primary requirement of these applications is to learn a recognition model that can differentiate among different types of activities. However, learning a recognition model requires large amounts of labeled data, while the videos generated by most sources arrive unlabeled. Labeling this video data is the biggest challenge that an activity understanding method encounters. State-of-the-art approaches generally learn a fixed, static recognition model under the assumption that all the training data are labeled and available beforehand. This assumption is unrealistic for many applications, such as streaming video or surveillance cameras, where new unlabeled video data arrive over time and can be leveraged to incrementally improve the current model. In this thesis, we aim to develop frameworks for online activity understanding that take advantage of unlabeled video data, reducing labeling cost without compromising performance.

We present four distinct frameworks for continuous learning of activity recognition models. We leverage tools and techniques from various branches of machine learning and deep learning and combine them with computer vision to develop novel and efficient solutions. Our proposed methods reduce labeling cost by a significant margin, yet their performance remains highly competitive with state-of-the-art methods. An integral part of activity understanding is temporally segmenting activities from a long video sequence; in this thesis, we also present an approach for activity segmentation that requires limited human supervision.

The first approach we propose is based on an ensemble of SVM classifiers. Given a set of unlabeled data, we select the most informative queries to be labeled by a human annotator, then train new SVMs and add them to the ensemble with updated weights.

In the second approach, we take advantage of the contextual relationships among the activities and the objects in a video sequence. We encode contextual information using a conditional random field and present a novel active learning algorithm that utilizes both the entropy and the mutual information of the activity nodes.

In the third approach, we further reduce human effort by making early predictions of activity labels and providing dynamic suggestions to the annotator. State-of-the-art approaches do not scale with the growing number of video categories, nor do they account for the cost of long viewing times during video labeling. Our proposed framework uses label propagation and an LSTM-based recurrent neural network to select informative queries and provide early suggestions to the annotator.

In the fourth approach, we propose a continuous activity learning framework that intricately ties together deep hybrid feature models and active learning. This allows us to automatically select the most suitable features and to exploit incoming unlabeled instances to improve the existing model incrementally.

Finally, we propose a method for temporally segmenting meaningful activities that requires very limited supervision. Perceiving such activities in a long video sequence is challenging due to the ambiguous definition of meaningfulness as well as clutter in the scene. We approach this problem by learning a generative model of regular motion patterns, and propose two methods built upon autoencoders for their ability to work with unlabeled data.
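The ensemble-based first approach follows a standard uncertainty-sampling loop: query the unlabeled clip the current ensemble is least sure about, obtain its label, retrain, and extend the ensemble. The sketch below is only an illustration of that loop, not the thesis's actual system: it uses scikit-learn SVMs on synthetic two-class features, simulates the human annotator from the known cluster structure, and weights each new SVM by its training accuracy (a placeholder for whatever weighting scheme the thesis uses).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy two-class "activity feature" data: two Gaussian blobs.
X_lab = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unl = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])

ensemble, weights = [], []

def ensemble_proba(X):
    """Weighted average of per-SVM class probabilities."""
    w = np.asarray(weights) / np.sum(weights)
    return np.sum([wi * clf.predict_proba(X) for wi, clf in zip(w, ensemble)],
                  axis=0)

# A few active-learning rounds: train an SVM on the current labeled pool,
# add it to the ensemble, query the unlabeled clip with the highest
# predictive entropy, and move it (with its "annotated" label) to the pool.
for _ in range(3):
    clf = SVC(probability=True, random_state=0).fit(X_lab, y_lab)
    ensemble.append(clf)
    weights.append(clf.score(X_lab, y_lab))     # placeholder weight

    P = ensemble_proba(X_unl)
    entropy = -np.sum(P * np.log(P + 1e-12), axis=1)
    q = int(np.argmax(entropy))                 # most informative query

    y_q = int(X_unl[q].mean() > 2.0)            # simulated human annotator
    X_lab = np.vstack([X_lab, X_unl[q]])
    y_lab = np.append(y_lab, y_q)
    X_unl = np.delete(X_unl, q, axis=0)

pred = np.argmax(ensemble_proba(X_lab), axis=1)
```

On this separable toy problem the ensemble quickly labels the pool correctly; the point of the sketch is the query-train-extend structure, not the numbers.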
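The query criterion of the second approach, combining node entropy with the mutual information shared with neighboring nodes, can be illustrated on a toy chain of activity nodes. The marginals and pairwise joints below are invented for illustration; a real system would obtain them from CRF inference, and the exact way the thesis combines the two quantities may differ from this simple sum.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + 1e-12)))

def mutual_information(joint):
    """MI of a pairwise joint distribution P(x_i, x_j)."""
    joint = np.asarray(joint, dtype=float)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) \
        - entropy(joint.ravel())

# Three binary activity nodes on a chain; marginals and adjacent-pair
# joints are made-up numbers standing in for CRF inference output.
marginals = [np.array([0.9, 0.1]),      # node 0: confident
             np.array([0.5, 0.5]),      # node 1: uncertain
             np.array([0.5, 0.5])]      # node 2: uncertain
joints = {(0, 1): np.array([[0.5, 0.4], [0.0, 0.1]]),
          (1, 2): np.array([[0.4, 0.1], [0.1, 0.4]])}

def node_score(i):
    """Own entropy plus mutual information with chain neighbors."""
    mi = sum(mutual_information(j)
             for (a, b), j in joints.items() if i in (a, b))
    return entropy(marginals[i]) + mi

scores = [node_score(i) for i in range(3)]
query = int(np.argmax(scores))   # labeling this node is most informative
```

Node 1 wins here: it is both uncertain and strongly coupled to its neighbors, so labeling it resolves the most ambiguity in the chain.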
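The label-propagation component of the third approach can be sketched with a simple graph-based scheme: soft labels spread from the few annotated clips along a similarity graph, and the propagated labels serve as early suggestions for the annotator. The RBF similarity graph, clamping schedule, and synthetic clip features below are generic choices, not the thesis's exact formulation (which also involves an LSTM not shown here).

```python
import numpy as np

rng = np.random.default_rng(1)

# Two clusters of "clip features"; only the first clip of each is labeled.
X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(5, 0.5, (10, 2))])
labels = np.full(20, -1)          # -1 marks unlabeled clips
labels[0], labels[10] = 0, 1

# RBF similarity graph over clips, row-normalized into transition weights.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 2.0)
np.fill_diagonal(W, 0.0)
P = W / W.sum(axis=1, keepdims=True)

# Iterative propagation: spread soft labels along the graph, clamping the
# annotated clips back to their known labels after every round.
F = np.zeros((20, 2))
F[labels >= 0] = np.eye(2)[labels[labels >= 0]]
for _ in range(50):
    F = P @ F
    F[labels >= 0] = np.eye(2)[labels[labels >= 0]]

suggested = F.argmax(axis=1)      # early label suggestions per clip
```

With two labeled clips, every clip in each cluster receives the correct suggestion, which is exactly the effort reduction the approach targets: the annotator confirms suggestions instead of watching every clip.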
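The autoencoder-based segmentation idea can be illustrated with a tiny linear autoencoder trained only on "regular" motion features: once the model reconstructs regular patterns well, segments whose features reconstruct poorly are flagged as irregular. Everything below is synthetic and minimal (a 3-to-1-to-3 linear autoencoder trained by gradient descent); the thesis's models are more elaborate, but the reconstruction-error criterion is the same in spirit.

```python
import numpy as np

rng = np.random.default_rng(2)

# "Regular" motion features lie near a 1-D line in 3-D feature space.
def regular(n):
    t = rng.uniform(-1, 1, (n, 1))
    return np.hstack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(n, 3))

X_train = regular(500)

# A tiny linear autoencoder (3 -> 1 -> 3) trained by gradient descent to
# minimize reconstruction error on regular motion only.
W_enc = np.full((3, 1), 0.1)
W_dec = np.full((1, 3), 0.1)
lr = 0.05
for _ in range(500):
    Z = X_train @ W_enc                  # encode
    err = Z @ W_dec - X_train            # reconstruction residual
    grad_dec = Z.T @ err / len(X_train)
    grad_enc = X_train.T @ (err @ W_dec.T) / len(X_train)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

def recon_error(X):
    Xh = (X @ W_enc) @ W_dec
    return np.sqrt(((Xh - X) ** 2).sum(axis=1))

# Threshold from the training distribution; segments above it are flagged.
threshold = np.percentile(recon_error(X_train), 99)
X_reg = regular(50)                                    # regular motion
X_irr = rng.normal(0, 1, (50, 3)) + np.array([2.0, -1.0, 0.0])  # irregular
```

Regular test segments fall below the threshold while the off-manifold segments reconstruct poorly and are flagged, which is how a generative model of regularity can segment meaningful activity without labels.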
