Skip to main content
eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations bannerUC Berkeley

Reducing Faulty Executions of Distributed Systems

Abstract

When confronted with a buggy execution of a distributed system—which are commonplace

for distributed systems software—understanding what went wrong requires

significant expertise, time, and luck. As the first step towards fixing the underlying bug,

software developers typically start debugging by manually separating out events that are

responsible for triggering the bug (signal) from those that are extraneous (noise).

In this thesis, we investigate whether it is possible to automate this separation process.

Our aim is to reduce time and effort spent on troubleshooting, and we do so by

eliminating events from buggy executions that are not causally related to the bug, ideally

producing a “minimal causal sequence” (MCS) of triggering events.

We show that the general problem of execution minimization is intractable, but we

develop, formalize, and empirically validate a set of heuristics—for both partially instrumented

code, and completely instrumented code—that prove effective at reducing execution

size to within a factor of 4.6X of minimal within a bounded time budget of 12 hours.

To validate our heuristics, we relay our experiences applying our execution reduction

tools to 7 different open source distributed systems.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View