eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Bicyclist Exposure Estimation Using Heterogeneous Demand Data Sources

Abstract

Quantifying risks and the effects of risk factors requires controlling for exposure,

or the number of opportunities for the adverse outcome in question to occur. In the

context of traffic crashes, traffic volumes are frequently used as an exposure measure.

Efforts to study bicyclist crash risk have historically been hindered by the lack of

widespread exposure data. This study presents methods to estimate bicycle traffic

volumes across an entire urban network.

The first major chapter of the dissertation presents a data schema for classifying

bicycle demand datasets. There is an ever-growing abundance of transportation data,

with some of the fastest growth seen in the realm of non-motorized demand. However,

all of the available datasets provide incomplete information about the system. For

example, some only represent a time series of observations at a single location in

space (automated counters), while others cover all space and time but only represent

a small subset of the population of people and trips (crowdsourced data). In order

to understand how these heterogeneous sources of information correspond to one

another, it was deemed necessary to first identify their differences. Six metadata

characteristics were defined, which are termed the population scope, trip aggregation,

temporal scope, temporal resolution, spatial scale, and demographics. Levels are

defined for each dimension, and examples of generic datasets are discussed in terms

of their metadata dimensions.
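
As an illustration only, the six metadata dimensions can be held in a small record type so that two datasets can be compared dimension by dimension. The abstract does not list the levels defined for each dimension, so every level string below is a hypothetical placeholder, and the class and method names are assumptions made for this sketch.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DemandDatasetMetadata:
    """The six metadata dimensions named in the abstract.

    The level values are free-form strings here; the dissertation defines
    specific levels that are not reproduced in this sketch.
    """
    population_scope: str
    trip_aggregation: str
    temporal_scope: str
    temporal_resolution: str
    spatial_scale: str
    demographics: str

    def differing_dimensions(self, other):
        """Return the names of the dimensions on which two datasets differ."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) != getattr(other, f.name)]


# Hypothetical level labels for two generic dataset types from the text.
automated_counter = DemandDatasetMetadata(
    population_scope="all riders",
    trip_aggregation="link counts",
    temporal_scope="continuous",
    temporal_resolution="hourly",
    spatial_scale="single location",
    demographics="none",
)
crowdsourced = DemandDatasetMetadata(
    population_scope="sample of riders",
    trip_aggregation="individual traces",
    temporal_scope="continuous",
    temporal_resolution="per trip",
    spatial_scale="network-wide",
    demographics="self-reported",
)

to_reconcile = automated_counter.differing_dimensions(crowdsourced)
```

Listing the differing dimensions makes explicit what would have to be converted before two sources can be placed in a common format.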

The second major chapter of the dissertation presents a method of fusing multiple

link-level demand estimates to infer peak-hour bicycle traffic volumes. While

the method is agnostic to the specific sources being used, it is presented with a

case study of San Francisco, CA using data from regional travel demand models,

a smartphone crowdsourcing application, and bikeshare system ridership. The defined

process entails first converting the datasets to a common format in terms of

their metadata dimensions, and then fitting these homogenized link-level estimates

to observed counts using a weighted regression technique modeled after Geographically

Weighted Regression. The fitting parameters associated with each dataset are

hypothesized to vary geospatially, and the means by which this variation occurs is

controlled by the specified weighting scheme. A distance decay weighting, where observations

farther from a given location contribute less to the parameter estimates, is

found to produce the best results. Cross-validation is employed for model comparison

and the selection of features and hyperparameter values. It is shown, on the

basis of cross-validated Root-Mean-Square Deviation, that fusing data sources provides

greater predictive accuracy than can be achieved using any individual source,

and that utilizing localized regression is more predictive than using a single global

parameter for each data set.
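
A minimal sketch of the localized fitting idea follows, assuming a Gaussian distance decay kernel and ordinary weighted least squares. The abstract specifies neither the kernel form nor the estimator details, so the function name, the `bandwidth` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def local_coefficients(target_xy, count_xy, X, y, bandwidth):
    """Fit one coefficient per data source at a single target location.

    target_xy : (2,) location where coefficients are wanted
    count_xy  : (n, 2) locations of the observed counts
    X         : (n, k) homogenized link-level estimates, one column per source
    y         : (n,) observed counts
    bandwidth : scale of the assumed Gaussian distance decay
    """
    dist = np.linalg.norm(count_xy - target_xy, axis=1)
    # Assumed kernel: count sites farther from the target get less weight.
    w = np.exp(-0.5 * (dist / bandwidth) ** 2)
    # Weighted least squares via row scaling by sqrt(w).
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# Synthetic stand-in data: 40 count sites and 3 sources
# (for example a demand model, a crowdsourcing app, and bikeshare).
rng = np.random.default_rng(0)
count_xy = rng.uniform(0, 10, size=(40, 2))
X = rng.uniform(0, 100, size=(40, 3))
true_beta = np.array([0.5, 2.0, 1.2])
y = X @ true_beta  # noise-free, so any local fit should recover true_beta

beta_here = local_coefficients(np.array([5.0, 5.0]), count_xy, X, y, bandwidth=3.0)
```

Repeating the fit at each link yields a coefficient surface that varies over space. In practice the bandwidth would be one of the hyperparameters selected by cross-validation, as the abstract describes.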

The final chapter is about inferring the temporal distribution of traffic based on

continuous automated count data. Latent Dirichlet Allocation is applied as a signal

decomposition model to identify latent spatio-temporal patterns in the observed

count data, which appear to correspond to coherent activity patterns such as AM

commuting, PM commuting, and midday cycling. Each link’s temporal distribution

can thus be expressed in terms of the extent to which each latent pattern is observed

on it. The mixture of these patterns on unobserved links is interpolated using a

purely autoregressive model, in contrast to the historically ad hoc methods used to

determine the temporal characteristics of bicycle traffic on unobserved links.
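
The decomposition step can be sketched with scikit-learn's `LatentDirichletAllocation`, treating each counter as a "document" and each hour of the day as a "word". This is an assumed setup on synthetic counts, not the dissertation's data or configuration: the two peak profiles, the number of components, and all variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
hours = np.arange(24)

# Synthetic hourly profiles: a morning peak and an evening peak.
am = np.exp(-0.5 * ((hours - 8) / 1.5) ** 2)
pm = np.exp(-0.5 * ((hours - 17) / 1.5) ** 2)

# 30 hypothetical counters, each mixing the two profiles differently.
share_am = rng.uniform(0.1, 0.9, size=30)
expected = 200 * (share_am[:, None] * am + (1 - share_am)[:, None] * pm)
counts = rng.poisson(expected)  # (30 counters, 24 hours) of integer counts

lda = LatentDirichletAllocation(n_components=2, random_state=0)
# Rows: per-counter mixture over latent patterns (each row sums to 1).
mixture = lda.fit_transform(counts)
# Normalize the component weights into hour-of-day distributions.
patterns = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
```

Each row of `mixture` expresses a counter's temporal distribution as proportions of the latent patterns, which is the quantity that would then be interpolated to unobserved links.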

The primary conclusion of this work is that the lack of exposure data should no

longer be considered an insurmountable problem for studying bicycle crashes. Using

advanced analytical methods, such as those presented here, in conjunction with

the abundance of new datasets provides a means of generating defensible retrospective

volume estimates for the entire network. This dissertation paves the way for

many future lines of inquiry, including both refinements upon the methods presented

here and application of the volume estimates developed here to problems requiring

exposure quantities, such as the evaluation of crash risk.
