Skip to main content
eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations bannerUC Santa Barbara

Automatic Detection and Repair of Input Validation and Sanitization Bugs

Abstract

A crucial problem in developing dependable web applications is the

correctness of the input validation and sanitization. Bugs in string

manipulation operations used for validation and sanitization are common,

resulting in erroneous application behavior and vulnerabilities that are

exploitable by malicious users. In this dissertation, we investigate the

problem of automatic detection and repair of validation and sanitization bugs

both at the client-side (JavaScript) and the server-side (PHP or Java) code.

We first present a formal model for input validation and sanitization

functions along with a new domain specific intermediate language

to represent them. Then, we show how to extract input validation and

sanitization functions in our intermediate language from both client and

server-side code in web applications. After the extraction phase, we use

automata-based static string-analysis techniques to automatically verify

and fix the extracted functions. One of our contributions is the development

of efficient automata-based string analysis techniques for frequently used,

complex string operations.

We developed two basic approaches to bug detection and repair: 1)

policy-based, and 2) differential. In the policy-based approach, input

validation and sanitization policies are expressed using two regular

expressions, one specifying the maximum policy (the upper bound for the

set of strings that should be allowed) and the other specifying the minimum

policy (the lower bound for the set of strings that should be allowed). Using

our string analysis techniques we can identify two types of errors in

an input validation and sanitization function: 1) it accepts a set of strings that

is not permitted by the maximum policy (i.e., it is under-constrained),

or 2) it rejects a set of strings that is permitted by the minimum policy

(i.e., it is over-constrained).

Our differential bug detection and repair approach does not require any

policy specifications. It exploits the fact that, in web applications,

developers typically perform redundant input validation and sanitization

in both the client and the server-side since client-side checks can

be by-passed. Using automata-based string analysis, we compare the

input validation and sanitization functions extracted from the client- and

server-side code, and identify and report the inconsistencies between them.

Finally, we present an automated differential repair technique that can

repair client and server-side code with respect to each other, or across

applications in order to strengthen the validation and sanitization

checks. Given a reference and a target function, our differential repair

technique strengthens the validation and sanitization operations in the

target function based on the reference function by automatically generating

a set of patches.

We experimented with a number of real world web applications and found many

bugs and vulnerabilities. Our analysis generates counter-example behaviors

demonstrating the detected bugs and vulnerabilities to help the developers

with the debugging process. Moreover, we automatically generate patches

that can be used to mitigate the detected bugs and vulnerabilities until

developers write their own patches.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View