eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Design and Use of Computational Notebooks

Abstract

Individuals and organizations increasingly rely on data analysis to generate insights and make decisions. Yet, small changes in how data are collected, cleaned, or modeled can lead to vastly different results. If data-driven insights are to be reviewed, reused, or trusted, the process used to generate them must be tracked and communicated in detail. But data analysis is typically an iterative and exploratory process that is hard to articulate, especially when it involves programming. Computational notebooks aim to ease tracking and sharing of complex analyses by enabling analysts to write rich computational narratives combining executable code, interactive visualizations, and explanatory text in a single document. While millions of people use computational notebooks, we know little about how they use them, or how well they help people track and share complex analyses.

In this dissertation I present three studies of how people currently use computational notebooks, demonstrating that few notebooks, even those published alongside academic papers, have much in the way of narrative. Instead, most notebooks are loose collections of notes and scripts that even the original analyst struggles to understand. I then present two systems demonstrating how computational notebooks might be designed to support clearer communication of complex analyses. The first system, Janus, shows how current notebooks might be modified to aid both ongoing analysis and later communication by adding interactive hierarchy for selectively showing and hiding portions of the notebook. The second system, ActiveNotes, a prototype clinical note editor, demonstrates how computational notebooks might support data-driven work even when programming is not the primary means of interacting with data.

Together, these studies demonstrate that the tracking and sharing of complex analyses is hindered by a tension between exploration and explanation, but that computational notebooks and other media can reduce this tension by supporting not only the combination of analytical steps, explanatory text, and computed results, but also their flexible organization and navigation.
