Improving Public Transit Resilience by Leveraging Smartcard Data for Rapid Decision-Making under Highly Dynamic Conditions
eScholarship
Open Access Publications from the University of California

UC Berkeley Electronic Theses and Dissertations


Abstract

The COVID-19 pandemic presented an unprecedented challenge to public transit systems worldwide, testing the ability of transit agencies to make swift and difficult decisions. Having to provide essential workers with reliable transportation while coping with a sudden drop in revenue and rising operating costs, many agencies cut transit services, making daily travel difficult for countless commuters who rely on transit. The pandemic spotlighted the importance of decisive action in times of crisis, but disruptions to public transit systems are not rare events; they occur regularly. Disruptions vary significantly in timing, duration, location, scope, frequency, and impact. Terrorist attacks, the aftermath of violent protests, and natural disasters are a few examples of events that can drastically change both the transit system and our travel needs. Responding quickly and effectively to these disruptions with informed decisions is crucial not only to minimize their impact but also to maintain the reliability of public transit for those who depend on it, ensuring that it remains available and accessible when it is needed most.

In recent years, transit agencies have increasingly adopted automated fare collection (AFC) systems, also known as smartcard systems, to collect fares, making valuable transaction data readily available for analysis. In response, the research community has proposed numerous inference methods and predictive models to analyze transit patterns; more than 300 papers on short-term ridership prediction models alone have been published in the last decade. Despite this abundance of literature, we, the research community, have yet to generate practical knowledge for decision-making. Poor reproducibility, caused by inadequate documentation, inaccessible tools and resources, and code- and data-sharing issues, is prevalent and hinders our ability to build collective wisdom. The problem is compounded by the lack of rigorous statistical analysis in many studies, which makes it difficult to identify relevant research, particularly when metrics are not comparable across studies.

Compounding these problems, existing inference methods and predictive models largely fail to account for disruptions. Current inference methods, which are mostly used to enrich transaction data with other data sources, are designed for stable periods and do not capture how transit users change their behavior during disruptions such as the COVID-19 pandemic. Beyond the pandemic, station closures, a common cause of disruptions, are not accounted for in any existing predictive model. This further limits the ability of transit agencies to use such models for decision-making during disruptions.

This dissertation demonstrates how to use readily available smartcard data to quickly support decision-making in highly dynamic conditions. The objectives of the dissertation are to:

\begin{enumerate}
\item Enhance the value of smartcard data by integrating it with other data sources to provide more comprehensive, near-real-time insights during disruptions.
\item Create an open-source repository of short-term ridership prediction models to enable accurate and reliable comparisons and to accelerate advances in these models.
\item Compare and assess the performance of state-of-the-art methods for short-term ridership prediction under highly dynamic conditions.
\item Improve the accuracy and reliability of short-term ridership prediction models during disruptions by integrating station closure information.
\end{enumerate}

All analyses use transaction data from 147 stations of the Bus Rapid Transit (BRT) system in Bogotá, Colombia, from August 2015 to May 2021. This period includes two highly dynamic conditions: a month-long protest in November and December 2019 and the COVID-19 pandemic, which began in March 2020. Both drastically changed users' travel needs and the transit system itself. The dataset also covers several station closures that are not necessarily associated with either of these two primary events.
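A first practical step with such a dataset is tagging each service day with the regime it belongs to, so that models can be evaluated separately on stable and disrupted periods. The sketch below does this for the study window. Only months are stated here, so the exact boundary days (`PROTEST_START`, `PROTEST_END`, `PANDEMIC_START`) and the function name `label_period` are hypothetical assumptions; station closures would need a separate per-station calendar.

```python
from datetime import date

# Hypothetical boundaries: only the months are known (protest in
# November-December 2019, pandemic from March 2020); the days are assumed.
DATA_START = date(2015, 8, 1)
DATA_END = date(2021, 5, 31)
PROTEST_START = date(2019, 11, 1)
PROTEST_END = date(2019, 12, 31)
PANDEMIC_START = date(2020, 3, 1)


def label_period(day: date) -> str:
    """Return the analysis regime ("stable", "protest", "pandemic") of a service day."""
    if not DATA_START <= day <= DATA_END:
        raise ValueError(f"{day} is outside the study window")
    if PROTEST_START <= day <= PROTEST_END:
        return "protest"
    if day >= PANDEMIC_START:
        return "pandemic"
    # Includes January-February 2020, between the protest and the pandemic.
    return "stable"
```

Keeping the boundaries as named constants makes it easy to rerun an evaluation with different assumed start and end dates.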

In the study described in Chapter \ref{chap:paper1}, I examined how demand for the BRT system in Bogotá changed during the COVID-19 pandemic. Despite the disruptions caused by the pandemic, transit agencies had to continue providing service because transit was a crucial transportation option for many essential workers. However, there was no comprehensive understanding of who the remaining transit users were or how demand across population segments adjusted during the recovery period, even though this information was critical for prioritizing service changes and station cleanings and for implementing safety measures effectively. I enriched smartcard transaction data with block-level demographic data from the 2018 census to infer the socio-economic strata of BRT users, a variable that can serve as a proxy for income. I present a methodology that combines behavioral modeling techniques with well-established inference methods to capture the dynamic heterogeneity of transit use by stratum at different points in time, and I use this information to estimate a probability vector that assigns each frequent transit user to a stratum. At the beginning of the pandemic, transit use fell by a similar amount across all strata, around 85 to 90\%, but members of lower strata returned to transit five times faster than members of higher strata. The proposed method has the advantage of providing near-real-time insights into the evolving impact of the COVID-19 pandemic using existing, readily available data.
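As a rough illustration of how census data can be attached to smartcard users, the sketch below assumes that each census block is mapped to its nearest BRT station, aggregates block populations into per-station stratum shares, and then mixes those shares according to how often a frequent user starts the day at each station. This is a simplified, static stand-in under stated assumptions, not the dissertation's methodology: it omits the behavioral modeling and time-varying heterogeneity described above, and the function names are hypothetical.

```python
import numpy as np


def station_strata_shares(block_pop: np.ndarray, block_to_station: np.ndarray,
                          n_stations: int) -> np.ndarray:
    """Aggregate census block populations into per-station stratum shares.

    block_pop:        (n_blocks, n_strata) residents per stratum (Bogotá uses strata 1-6)
    block_to_station: (n_blocks,) index of the station assumed to serve each block
    Returns an (n_stations, n_strata) array whose non-empty rows sum to 1.
    """
    counts = np.zeros((n_stations, block_pop.shape[1]))
    np.add.at(counts, block_to_station, block_pop)  # unbuffered sum per station
    totals = counts.sum(axis=1, keepdims=True)
    # Stations with no assigned blocks keep an all-zero row instead of NaNs.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def user_strata_probabilities(first_boardings: np.ndarray, shares: np.ndarray) -> np.ndarray:
    """Probability vector over strata for one user.

    first_boardings: (n_stations,) hypothetical count of days on which the user's
    first tap of the day was at each station (a common proxy for home location).
    """
    total = first_boardings.sum()
    if total <= 0:
        raise ValueError("user has no recorded first boardings")
    weights = first_boardings / total
    return weights @ shares
```

A user who starts three days out of four at a station whose catchment is half stratum 1 and half stratum 2, and the remaining day at a stratum-6 station, receives the vector (0.375, 0.375, 0, 0, 0, 0.25).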

Chapter \ref{chap:paper2} describes a study in which I explored predictive models that anticipate ridership and enable proactive resource allocation. Reliably comparing and testing models was difficult even under stable conditions, let alone under highly dynamic ones. I therefore built an open-source infrastructure with running code for five major methodologies commonly used in the literature, including econometric and deep learning approaches. The open-source code also provides the Bogotá BRT smartcard dataset for the five-year period so that other researchers can reproduce and replicate results.
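One way such an infrastructure can make models comparable is to have every model implement the same `fit`/`predict` interface and to score all of them with the same rolling-origin evaluation. The sketch below is a hypothetical design, not the repository's actual API: `RidershipModel`, `SeasonalNaive`, and `rolling_origin_benchmark` are illustrative names, the input is a single station's 1-D ridership series, and mean absolute error is an assumed metric.

```python
from abc import ABC, abstractmethod

import numpy as np


class RidershipModel(ABC):
    """Hypothetical interface that every benchmarked model would implement."""

    name = "base"

    @abstractmethod
    def fit(self, history: np.ndarray) -> None:
        """Estimate parameters from a 1-D ridership history."""

    @abstractmethod
    def predict(self, history: np.ndarray, horizon: int) -> np.ndarray:
        """Forecast the next `horizon` intervals after `history`."""


class SeasonalNaive(RidershipModel):
    """Baseline that repeats the last observed season (e.g. the previous week)."""

    name = "seasonal_naive"

    def __init__(self, season: int):
        self.season = season

    def fit(self, history: np.ndarray) -> None:
        pass  # nothing to estimate

    def predict(self, history: np.ndarray, horizon: int) -> np.ndarray:
        last_season = history[-self.season:]
        reps = -(-horizon // self.season)  # ceiling division
        return np.tile(last_season, reps)[:horizon]


def rolling_origin_benchmark(models, series, train_size, horizon):
    """Refit each model at successive forecast origins and report its mean MAE."""
    scores = {}
    for model in models:
        errors = []
        for origin in range(train_size, len(series) - horizon + 1, horizon):
            history = series[:origin]
            model.fit(history)
            forecast = model.predict(history, horizon)
            actual = series[origin:origin + horizon]
            errors.append(np.mean(np.abs(forecast - actual)))
        scores[model.name] = float(np.mean(errors))
    return scores
```

A new model joins the benchmark by subclassing `RidershipModel`; because every model sees exactly the same forecast origins and data, differences in the scores reflect the models rather than the evaluation setup.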

Whereas the literature on short-term ridership prediction focuses on relatively stable conditions, I systematically compared the models' performance under two highly dynamic conditions: a month-long labor protest and the COVID-19 pandemic. Under stable conditions, most tested models performed similarly, with forecasting errors ranging from 8.5 to 12\%. However, all models performed significantly worse under both highly dynamic conditions than under stable conditions. During the protest, prediction errors increased by 14 to 24\%; during the COVID-19 pandemic, they increased by 12 to 82\%. Notably, in the COVID-19 pandemic scenario, a recurrent neural network (RNN) with long short-term memory (LSTM) cells stood out, outperforming the other models and adapting faster to the disruption. Its prediction error stabilized within about 1.5 months, whereas the other models still had elevated error rates a year after the start of the pandemic.
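These comparisons rest on a few simple quantities: an error per period, its increase relative to the stable baseline, and the time until a model's error settles. The error metric is not named here, nor whether the increases are relative or in percentage points, so the sketch below assumes MAPE and relative increases; `days_until_stable`, its threshold, and the 7-day window are hypothetical choices.

```python
import numpy as np


def mape(actual, forecast) -> float:
    """Mean absolute percentage error in percent, skipping zero-ridership intervals."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = actual != 0
    return float(100.0 * np.mean(np.abs((actual[mask] - forecast[mask]) / actual[mask])))


def relative_error_increase(stable_error: float, disrupted_error: float) -> float:
    """Percent increase of a disrupted-period error over the stable baseline error."""
    return 100.0 * (disrupted_error - stable_error) / stable_error


def days_until_stable(daily_errors, threshold: float, window: int = 7):
    """Index of the last day of the first rolling window after which the
    rolling-mean error never again exceeds `threshold`; None if it never settles."""
    errors = np.asarray(daily_errors, dtype=float)
    if len(errors) < window:
        raise ValueError("need at least one full window of errors")
    # rolling[i] is the mean of errors[i : i + window].
    rolling = np.convolve(errors, np.ones(window) / window, mode="valid")
    above = np.nonzero(rolling > threshold)[0]
    if len(above) == 0:
        return window - 1
    if above[-1] == len(rolling) - 1:
        return None
    return int(above[-1] + window)
```

For example, a model whose error is 10\% in stable conditions and 12\% during a disruption has a relative increase of 20\%, which would be only 2 points if measured in percentage points; stating which convention is used matters when comparing studies.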

In Chapter \ref{chap:paper3}, I noted that some disruptions resulted in temporary station closures, which increased demand at nearby transit stations. Yet station closure information had not been incorporated into any short-term prediction model. To improve predictions for stations affected by closures, I proposed a new model that combines graph theory with an attention mechanism. The model captures spatiotemporal correlations, so that the prediction at one station is sensitive to what happened at other stations in previous time steps. By incorporating station closure information into the proposed model, I reduced prediction errors for station closures by 3.3 to 23.5\% compared with the existing models in the open-source codebase infrastructure developed in Chapter \ref{chap:paper2}. These improvements held across multiple metrics and modeling strategies.
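To make the idea of attention over the station graph concrete, here is a generic single-head graph attention (GAT-style) layer written with NumPy. It is not the proposed model: it only assumes that each station's features from the previous time step are augmented with a binary closure indicator, and that each station attends to itself and to its directly connected neighbors. Parameter names and shapes are illustrative assumptions.

```python
import numpy as np


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)


def graph_attention_step(x_prev, closed_prev, adjacency, W, a):
    """One single-head GAT-style layer over the station graph.

    x_prev:      (N, F) station features at the previous time step
    closed_prev: (N,) 1.0 if the station was closed at the previous step, else 0.0
    adjacency:   (N, N) 1 where two stations are connected (self-loops added here)
    W:           (F + 1, D) projection weights; a: (2 * D,) attention vector
    Returns (N, D) station embeddings and the (N, N) attention weights.
    """
    n = x_prev.shape[0]
    # Append the closure indicator as an extra input feature.
    feats = np.concatenate([x_prev, closed_prev[:, None]], axis=1)
    h = feats @ W  # (N, D)
    d = h.shape[1]
    # e_ij = LeakyReLU(a^T [h_i || h_j]), split into a source and a target term.
    scores = leaky_relu((h @ a[:d])[:, None] + (h @ a[d:])[None, :])
    # Only neighbors (and the station itself) can receive attention.
    mask = (adjacency + np.eye(n)) > 0
    scores = np.where(mask, scores, -np.inf)
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    scores = scores - scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ h, alpha
```

Because the closure indicator enters the projected features, closing a station changes both its own embedding and the attention weights its neighbors assign to it; this is how information about one station can shift the prediction at another.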

In summary, the contributions of this dissertation are to:

\begin{enumerate}
\item Develop a method to enrich smartcard data with socio-economic information under highly dynamic conditions.
\item Expand the understanding of how transit use changed across socio-demographic groups during the COVID-19 pandemic in Bogotá.
\item Create an open-source codebase infrastructure that consolidates short-term ridership prediction research and supports systematic, reliable, and statistically rigorous benchmark tests to accelerate advances in these models.
\item Systematically compare the performance of state-of-the-art methods for short-term ridership prediction under both stable and highly dynamic conditions.
\item Demonstrate the importance of including station closures to improve prediction accuracy for nearby stations.
\item Propose a novel modeling framework that captures spatiotemporal correlations in the transit network, improving overall performance and prediction accuracy for stations affected by the closure of other stations.
\item Integrate the new modeling framework into the open-source codebase infrastructure so that other researchers can replicate, reproduce, compare, improve, and build upon our model.
\end{enumerate}

Smartcard data provide timely information about transit demand that can inform the operational and tactical decisions of transit agencies. Although these data are not as rich in demographic information as surveys, they are timely and relevant, which is critical during disruptions. This information can help transit agencies understand how users adjust their travel during disruptions and prioritize populations that may be disproportionately affected. Accurate forecasting can also give transit agencies additional time to plan resource reallocation and take appropriate action. Finally, an open-source codebase infrastructure lowers the barrier to reproducibility. It enables reliable and systematic comparisons of model performance, which helps identify relevant research and ultimately generates knowledge that improves decision-making.

Throughout this dissertation, I demonstrate these contributions empirically using publicly available smartcard data from the BRT system in Bogotá spanning more than five years, including multiple highly dynamic periods that affected transit ridership. I showcase the power of smartcard data for understanding how the impact of the COVID-19 pandemic on frequent transit users from different population segments evolved. In the spirit of collaboration, I built an open-source codebase to foster more innovative solutions, enabling researchers, practitioners, and transit agencies to generate collective wisdom that advances scientific knowledge. I proposed a new modeling framework that leverages station closure information to improve overall short-term ridership forecasts. In conclusion, my work demonstrates the potential of utilizing smartcard data to support
