Introducing Gradient-Based Methods and Diagnostic Benchmarks to Trace Errors Back to the Training Data
UC Irvine Electronic Theses and Dissertations

License: Creative Commons Attribution 4.0 (CC BY 4.0)
Abstract

The deep neural networks that dominate NLP rely on an immense number of parameters and require large text corpora for training. As these models are increasingly deployed in the real world, there is an accompanying need to characterize their potential failure modes and avoid harms. In particular, it is now widely appreciated that training such models over large corpora commonly introduces biases and other undesirable behaviors into model predictions. Moreover, many training datasets are collected automatically or via crowdsourcing and exhibit systematic biases or annotation artifacts. Given the current trend in NLP toward ever-larger models and datasets, identifying the origin of a problematic behavior once it is encountered is becoming extremely challenging. Traditionally, one might diagnose such issues by explaining model predictions with a variety of attribution methods. Although these methods have been very successful at explaining model predictions, the fragility of their explanations hinders their adoption in practice. Besides inspecting a model's inner mechanisms to identify the source of an issue, one can also try to follow the breadcrumb trail back through the training process, which is generally expensive and time-consuming. In this dissertation, we instead turn to the training data to systematically identify the root of errors. After providing the necessary background in the second chapter, we introduce in the third chapter a novel interpretability method for the knowledge graph completion (KGC) task and propose an automatic approach for extracting mislabeled instances. In the fourth chapter, after identifying several shortcomings in the current evaluation setting for KGC, we propose an alternative benchmark for the task and demonstrate the poor performance of state-of-the-art models on it. Shifting to textual data, in the fifth chapter we establish reliable and efficient instance attribution methods for explaining large language models. Finally, in the sixth chapter, we combine different attribution methods to discover artifacts in commonly used NLP benchmarks.
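
The gradient-based instance attribution discussed in the fifth chapter can be illustrated with a minimal sketch. The snippet below is not the dissertation's specific method; it is a generic example of scoring training instances by the dot product between their loss gradients and a test example's loss gradient. The model, data, and loss are hypothetical toy stand-ins for a real language model and corpus.

```python
# Illustrative sketch (not the dissertation's exact method): gradient-based
# instance attribution. The influence of a training example on a test example
# is approximated by the similarity of their loss gradients with respect to
# the model parameters.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: a linear classifier over random features.
model = nn.Linear(16, 2)
loss_fn = nn.CrossEntropyLoss()

train_x = torch.randn(100, 16)
train_y = torch.randint(0, 2, (100,))
test_x = torch.randn(1, 16)
test_y = torch.tensor([1])


def loss_grad(x, y):
    """Flattened gradient of the loss on (x, y) w.r.t. the model parameters."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])


# Attribution score for each training instance: dot product between its loss
# gradient and the test example's loss gradient (higher = more influential).
test_grad = loss_grad(test_x, test_y)
scores = torch.stack(
    [loss_grad(train_x[i : i + 1], train_y[i : i + 1]) @ test_grad
     for i in range(len(train_x))]
)

# Training instances ranked by estimated influence on the test prediction.
top_influencers = torch.topk(scores, k=5).indices.tolist()
print("Most influential training indices:", top_influencers)
```

Scores of this kind can be used to surface the training instances most responsible for a given prediction, which is the sort of error tracing the dissertation develops and evaluates at the scale of large language models.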
