Liu, Sainan

Single-Image 2D to 3D Understanding

2021

Abstract

Visual perception plays an essential role in the human recognition system, and we rely heavily on visual cues to accomplish daily tasks. Inspired by human vision and recognition, computer vision has been widely studied in recent decades to better assist human activities. It has proven highly beneficial in everyday computing applications such as smartphone applications, robotics, and autonomous driving. The fundamental question of computer vision is how to understand 3D information from 2D images. Over the years, research on learning from a single image with machine learning techniques has progressed from 2D recognition, to predicting 2.5D images, to 3D objects, to complete room and street layout prediction. For computer vision to apply to daily tasks, we believe this is the right time to introduce the concept of panoptic 3D parsing, which places these long-studied sub-problems under unified metrics.

In this dissertation, we first decompose the problem into two subproblems: (1) how to learn more effective priors for recognizing objects in 3D, and (2) how to enable computer vision neural networks to recognize objects in 2D from unseen views by using 3D prior information, with techniques inspired by the cognitive science community. In the final chapter, we present a set of networks that unify the understanding of 3D information from a single image, made possible by rapid developments in modeling and computing and by the availability of large-scale datasets.

UC San Diego
