eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Comparing human and machine visual perception

Abstract

In this dissertation, we examine differences in perception between humans and computer vision models and contribute novel research methods to increase their alignment. In recent studies comparing how humans and deep neural networks used in computer vision perceive visual stimuli, we find extensive evidence that these highly performant models' visual perception often aligns poorly with human perception. For example, these models have been shown to classify objects in a scene based solely on a small fraction of border pixels in an image (Carter et al., 2021), preferentially attend to information outside the human frequency sensitivity spectrum (Subramanian et al., 2023), and (in)famously classify images by local texture rather than by global form (Geirhos et al., 2019b). These deviations of machine vision are often due to an overreliance on short-range features, and our first set of contributions directly addresses this by adding lateral connections, which are critical for long-range spatial feature processing in biological vision, to deep neural networks. First, in Chapters 2 and 3, we introduce the bio-inspired DivNormEI and V1Net models, respectively, which implement feedforward and recurrent lateral connections in deep neural networks (DNNs). We show that these models develop bio-realistic orientation tuning and lead directly to robust object recognition/segmentation. We also show that recurrent lateral connections give rise to parameter-efficient contour integration (a task well known to test the capacity for long-range feature integration). In Chapter 4, we introduce LocRNN, a high-performing recurrent circuit evolved from V1Net, and propose combining it with Adaptive Computation Time (ACT) to learn a dynamic, instance-conditional number of RNN timesteps. ACT enables LocRNN to generalize in a zero-shot manner to novel test-time difficulty levels of challenging visual path integration tasks.
These chapters together highlight the effectiveness of our proposed bio-inspired design in creating human-like robustness to out-of-distribution settings. Complementary to bio-inspired design, we also propose a new way to compare human and machine perception; advancing this area helps us better identify factors of deviation between these systems and guides us in building future neural networks with stronger alignment. In an elaborate psychophysics study described in Chapter 5, we explore how humans and deep neural networks alike can be tricked by barely noticeable adversarial changes to images. We discuss the degree of alignment between the two visual systems and identify factors that influence this alignment. The actionable predictions we discuss in this chapter inspire the design of future neural network models with the goal of strengthening their alignment with human perception. We conclude this dissertation by outlining important future directions for expanding the research described here to build the next generation of computer vision models increasingly aligned with human vision.
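
As a rough illustration of the Adaptive Computation Time idea mentioned for Chapter 4, the sketch below shows how a per-input number of recurrent steps can fall out of accumulated halting probabilities. It follows the general ACT halting rule (stop once the cumulative halting probability reaches 1 - eps, and give the final step the remaining probability mass); it is not the dissertation's LocRNN implementation. The function names `act_halting_weights` and `act_output`, the `eps` default, and the scalar toy states are all assumptions made for this example.

```python
def act_halting_weights(halt_probs, eps=0.01):
    """Per-step weights under an ACT-style halting rule (illustrative only).

    halt_probs: halting probabilities in [0, 1], one per candidate timestep.
    Computation stops at the first step where the running sum reaches
    1 - eps, or at the last available step. The stopping step receives the
    remainder so that the weights sum to 1.
    """
    weights = []
    cumulative = 0.0
    last_index = len(halt_probs) - 1
    for step, h in enumerate(halt_probs):
        if cumulative + h >= 1.0 - eps or step == last_index:
            weights.append(1.0 - cumulative)  # remainder goes to the final step
            break
        weights.append(h)
        cumulative += h
    return weights


def act_output(states, weights):
    """Halting-weighted mean of the states visited before stopping."""
    return sum(w * s for w, s in zip(weights, states))


# Hypothetical halting probabilities for an "easy" and a "hard" input.
easy = act_halting_weights([0.995, 0.5, 0.5])
hard = act_halting_weights([0.1, 0.2, 0.8, 0.5])
print(len(easy), len(hard))  # number of recurrent steps used for each input
```

In this toy setting the input with confident early halting uses fewer steps than the input whose halting probabilities accumulate slowly, which is the instance-conditional behaviour the abstract refers to.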
