eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Software Configuration Learning and Recommendation

Abstract

As software systems grow more complex and configurable, failures caused by misconfiguration are becoming a prominent problem. Such failures can have severe consequences, and diagnosing them is often difficult and costly. Configuration also heavily affects system performance, so selecting configuration settings that achieve high performance is desirable but usually takes a long time. This paper presents efforts to address both system correctness and performance problems through configuration analysis. To tackle the correctness problem and automatically detect software misconfigurations, we take into account two important factors that have not been exploited before: the interaction between the execution environment and the configuration file, and the rich correlations between configuration entries. We leverage the fact that, with the emergence of cloud virtual machines, more system data than just the configuration files is accessible. With training data enriched with whole-system information, our tool, EnCore, learns multiple aspects of the configuration files from the whole system stack and is thus able to handle a much broader range of configuration errors. At the same time, the tool provides a highly customizable interface that helps make full use of users' domain knowledge, making the learning phase adaptive and effective. Results show that EnCore is effective in detecting both injected errors and known real-world problems. In addition, it finds 37 new misconfigurations in 25 existing Amazon EC2 public images, as well as 24 new configuration problems in 22 images in a commercial private cloud. These previously undiscovered errors can cause problems such as service unavailability and security issues.
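The idea of learning correlations between configuration entries and environment data can be illustrated with a minimal sketch. This is not the tool's actual implementation: the entry names (`user`, `datadir`, `datadir.owner`), the `learn_equality_rules` and `check` helpers, and the restriction to simple equality rules are all assumptions made for illustration. Each training sample is a configuration file augmented with one environment fact (the owner of a directory named in the file).

```python
from itertools import combinations

def learn_equality_rules(training, min_support=1.0):
    """Return pairs of keys whose values agree in at least `min_support`
    of the training samples that contain both keys (hypothetical rule form)."""
    keys = sorted({k for sample in training for k in sample})
    rules = []
    for a, b in combinations(keys, 2):
        both = [s for s in training if a in s and b in s]
        if not both:
            continue
        agree = sum(1 for s in both if s[a] == s[b])
        if agree / len(both) >= min_support:
            rules.append((a, b))
    return rules

def check(sample, rules):
    """Return the learned rules that the given sample violates."""
    return [(a, b) for a, b in rules
            if a in sample and b in sample and sample[a] != sample[b]]

# Hypothetical training data: configuration entries plus an environment
# attribute ("datadir.owner") collected from the same machine image.
training = [
    {"user": "mysql", "datadir": "/var/lib/mysql", "datadir.owner": "mysql"},
    {"user": "db", "datadir": "/data/db", "datadir.owner": "db"},
    {"user": "mysql", "datadir": "/srv/mysql", "datadir.owner": "mysql"},
]
rules = learn_equality_rules(training)

# A suspect image whose data directory is owned by a different account.
suspect = {"user": "mysql", "datadir": "/var/lib/mysql", "datadir.owner": "root"}
violations = check(suspect, rules)
```

In this sketch the only rule that survives training is that `datadir.owner` equals `user`, and the suspect sample is flagged because it breaks that rule. A check on the configuration file alone would not catch it, since the file's own values are unchanged.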
While correctness standards are usually consistent across platforms and can thus be learned from a large data set, different systems usually need different configuration settings to achieve high performance, depending on their specific characteristics. Learning the configuration values alone is therefore not enough. Previous work tries to select the optimal configuration settings automatically by exploring the whole configuration space, which usually takes a long time. We significantly reduce the computation time by analyzing the correlations and constraints of performance settings from source code.
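A rough sketch of why constraints shorten the search: a constraint between settings lets invalid combinations be discarded before any of them is benchmarked. The setting names, value ranges, and the single constraint below are hypothetical and written by hand. The abstract states only that such constraints are derived from source code, and that derivation is not shown here.

```python
from itertools import product
from math import prod

# Hypothetical performance settings and their candidate values.
space = {
    "workers": [1, 2, 4, 8],
    "buffer_mb": [64, 128, 256],
    "compression": ["on", "off"],
}

# Hypothetical constraint: total buffer memory must fit a 512 MB budget.
constraints = [
    lambda c: c["workers"] * c["buffer_mb"] <= 512,
]

def candidates(space, constraints):
    """Yield only the configurations that satisfy every constraint."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        if all(ok(config) for ok in constraints):
            yield config

full_size = prod(len(v) for v in space.values())  # exhaustive search size
pruned = list(candidates(space, constraints))      # configurations left to try
```

Here the exhaustive space has 24 configurations and the constraint leaves 18 to evaluate. With more settings and more constraints the reduction compounds, because each constraint removes whole slices of the product space.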
