Prototyping the Developer Experience for Data Science Practitioners and Instructors
eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Abstract

Data science encompasses the most prominent collection of methods for creating scientific knowledge in the 21st century. Currently, data scientists must navigate a wide-ranging and often incoherent ecosystem of tools, in addition to organizing sociotechnical interactions with colleagues across many fields of expertise. This predicament motivates my thesis: The elements of data science work that are based in human expertise and social relationships must be integrated into existing programming workflows to create the developer experience that data scientists require to be successful. This dissertation supports my thesis by presenting three empirical studies and two tools. First, I investigated how professional data scientists teach novices about data-science-focused programming workflows, including how to adapt software development tools to their work, how to navigate the full depth of the stack of technologies that data science relies on, and how to use their tools to help communicate their findings. Then I explored how a team of academic data scientists repurposed the tools from their everyday data science work to create a data science course designed to reach groups traditionally underrepresented in computing. Finally, I examined how consulting data scientists interact with their clients, how their working relationships take them beyond well-characterized programming-oriented cycles, and how they achieve success by integrating designerly work into their data analysis process. These studies inspired me to develop two tools:

1. Datamations animates each step in a data analysis pipeline via transitions that show how rows, columns, and cells move within a data frame.
2. Tidy Data Tutor creates step-by-step interactive illustrations for a data analysis pipeline, so that every individual cell can be tracked.

The main research findings of this dissertation are that data scientists adapt software engineering tools to fit into their own workflows, and that data scientists must communicate the uncertainty that they face in their work to novices. Additionally, this dissertation found that several nested cycles are required for data scientists to achieve success in collaboration with their colleagues. Finally, my prototype tools showed that animations and illustrations derived from data wrangling code can help convey a clearer understanding of data analysis pipelines.
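
The idea shared by both prototype tools, showing what each step of a data analysis pipeline does to the rows of a table, can be illustrated with a small sketch. This is not the implementation of Datamations or Tidy Data Tutor; it is a minimal standard-library Python example under stated assumptions. The function names (`trace_pipeline`, `describe`), the `_id` tracking column, and the sample data are all hypothetical and do not come from the dissertation.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Table = List[Row]
Step = Tuple[str, Callable[[Table], Table]]


def trace_pipeline(rows: Table, steps: List[Step]) -> List[Tuple[str, Table]]:
    """Run each step in order and keep a snapshot of the table after it."""
    # Tag every input row with a stable id (hypothetical "_id" column)
    # so the same row can be followed from one step to the next.
    current = [{"_id": i, **row} for i, row in enumerate(rows)]
    snapshots = [("input", current)]
    for name, step in steps:
        current = step(current)
        snapshots.append((name, current))
    return snapshots


def describe(snapshots: List[Tuple[str, Table]]) -> List[str]:
    """Summarize, for each step, which rows survived and which were dropped."""
    lines = []
    for (_, before), (name, after) in zip(snapshots, snapshots[1:]):
        before_ids = {r["_id"] for r in before}
        after_ids = {r["_id"] for r in after}
        dropped = sorted(before_ids - after_ids)
        lines.append(f"{name}: kept {len(after_ids)} rows, dropped ids {dropped}")
    return lines


# Made-up sample data, used only for illustration.
penguins = [
    {"species": "Adelie", "mass_g": 3750},
    {"species": "Gentoo", "mass_g": 5000},
    {"species": "Adelie", "mass_g": 3250},
]

# A three-step pipeline: filter rows, add a column, reorder rows.
steps = [
    ("filter mass_g > 3500", lambda t: [r for r in t if r["mass_g"] > 3500]),
    ("add mass_kg", lambda t: [{**r, "mass_kg": r["mass_g"] / 1000} for r in t]),
    ("sort by mass_g descending",
     lambda t: sorted(t, key=lambda r: r["mass_g"], reverse=True)),
]

snapshots = trace_pipeline(penguins, steps)
for line in describe(snapshots):
    print(line)
```

Because every snapshot carries the same row ids, a viewer could render each consecutive pair of snapshots as one transition, which is roughly the kind of step-by-step view the abstract describes. How the actual tools represent and render these transitions is not stated here.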
