Deep Scene Understanding using RF and its Fusion with other Modalities

Abstract

Rich scene understanding is a critical first step in creating autonomous systems with situational awareness -- i.e., systems that can not only perceive and comprehend their environments but also project future states. Current vision-based methods of tackling this problem are inadequate because cameras are restricted to the visible spectrum. While they can detect objects, track movements, and make inferences about human expressions, they suffer from several limitations, such as a lack of depth information and degraded performance in poor weather. Moreover, information is present in many other modalities around us, and relying solely on one makes a system more prone to failure.

Through my thesis, I aim to bring the RF (radio-frequency) modality into scene understanding, since RF has properties that are both complementary and supplementary to vision. My hypothesis is that by fusing RF with vision, one can create a richer understanding of the scene, which I call ‘deep scene understanding’. There are four key enablers to deep scene understanding -- (1) detection of objects’ states and activities, (2) localization of objects in a scene and tracking them, (3) developing methods to train machine learning models over RF data, and (4) understanding the privacy and societal impacts of instrumenting spaces with sensors.

RF comes with its own set of challenges that make this integration hard. Additionally, instrumenting spaces with sensors, including RF sensors, can itself raise privacy concerns. In addressing these challenges, we present -- (1) a framework to detect human activities using a mmWave radar, which ingests sparse and noisy radar point clouds and outputs the activity being performed in the scene; (2) a framework to detect, identify, and localize hidden objects, such as cameras that may be monitoring a user but are not visible to the naked eye; (3) a radar-camera fusion framework that estimates dense depth in a scene from a sparse radar point cloud and an image; (4) a self-supervised learning approach that leverages mutual information between a camera and a radar to train models on radar data; and (5) a user study to understand users’ privacy perceptions when spaces are equipped with sensors.
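To make contribution (3) more concrete, below is a minimal, illustrative sketch of one plausible radar-camera fusion setup for dense depth estimation. It assumes the radar point cloud has already been projected into a sparse single-channel depth image aligned with the camera frame, and uses a small encoder-decoder network over the concatenated inputs. The class name, layer sizes, and input format are assumptions made for illustration; this is not the dissertation's actual architecture.

```python
# Illustrative sketch only: early fusion of an RGB image with a sparse radar depth
# map (radar points pre-projected into the image plane) to regress dense depth.
import torch
import torch.nn as nn

class RadarCameraDepth(nn.Module):
    def __init__(self):
        super().__init__()
        # Encode the 4-channel input: RGB (3 channels) + sparse radar depth (1 channel).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decode back to a single dense-depth channel at the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb, sparse_radar_depth):
        # rgb: (B, 3, H, W); sparse_radar_depth: (B, 1, H, W), zero where no radar return.
        x = torch.cat([rgb, sparse_radar_depth], dim=1)
        return self.decoder(self.encoder(x))

# Example usage with random tensors standing in for a camera frame and projected radar points.
model = RadarCameraDepth()
rgb = torch.rand(1, 3, 128, 128)
radar = torch.zeros(1, 1, 128, 128)   # sparse: most pixels carry no radar return
radar[0, 0, 60:64, 60:64] = 5.0       # a few projected radar depth values (metres)
dense_depth = model(rgb, radar)       # (1, 1, 128, 128)
```

Early fusion by channel concatenation is only one of several possible design choices; the dissertation's fusion strategy may differ.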
