Hariharan, Bharath

Beyond Bounding Boxes: Precise Localization of Objects in Images

2015

Advisor(s): Malik, Jitendra

Abstract

Object recognition in computer vision comes in many flavors, two of the most popular being object detection and semantic segmentation. Object detection systems detect every instance of a category in an image and coarsely localize each with a bounding box. Semantic segmentation systems assign category labels to pixels, thus providing pixel-precise localization but failing to resolve individual instances of the category. We argue for a richer output: recognition systems should detect individual instances of a category and provide a pixel-precise segmentation for each, a task we call Simultaneous Detection and Segmentation (SDS). We describe approaches to this task that leverage convolutional neural networks for precise localization. We also show that the techniques we develop are effective for other tasks, such as segmenting the parts of a detected object or localizing its keypoints. These are our first steps towards a recognition system that goes beyond category labels and coarse bounding boxes to precise, detailed descriptions of objects in images.

UC Berkeley
