
UCLA Electronic Theses and Dissertations

Learning Robust Representations for Low-resource Information Extraction

Abstract

Information extraction (IE) plays a significant role in automating knowledge acquisition from unstructured or semi-structured text. Named entity recognition and relation extraction are the major IE tasks discussed in this thesis. Traditional IE systems rely on large-scale, high-quality datasets to learn the semantic and structural relationships between observations and labels, but such datasets are rare, especially in low-resource domains such as figurative language processing and clinical narrative curation. This scarcity leads to inadequate supervision and model over-fitting. In this thesis, we develop algorithms and applications for low-resource IE. We argue that incorporating supervision from domain-specific auxiliary knowledge and learning transferable representations can mitigate this deficiency. Specifically, we explore pre-training domain-specific deep language models to acquire informative word and sentence embeddings for curating clinical narratives. We experiment with multi-modal learning techniques to recognize humor and to recommend keywords for advertisement designers. We also extract attributes of interest from semi-structured web data by building knowledge representations that transfer across different websites. As a further application of low-resource IE, we build a COVID-19 surveillance system that inspects users' daily social media data. Extensive experiments show that our algorithms and systems outperform state-of-the-art approaches while also offering strong interpretability.
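To make the first direction concrete, the sketch below shows one common way to adapt a generic pre-trained language model to clinical text via continued masked-language-model training and then obtain sentence embeddings by mean-pooling the encoder's hidden states. This is an illustrative assumption rather than the thesis's actual pipeline; the base model name, corpus path, and hyperparameters are placeholders.

# Illustrative sketch only: domain-adaptive masked-LM pre-training on clinical
# notes, then mean-pooled sentence embeddings. Model name, corpus path, and
# hyperparameters are placeholder assumptions, not the thesis configuration.
import torch
from datasets import load_dataset
from transformers import (AutoModel, AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "bert-base-uncased"                      # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
mlm_model = AutoModelForMaskedLM.from_pretrained(base)

# One clinical note per line in a plain-text file (hypothetical path).
corpus = load_dataset("text", data_files={"train": "clinical_notes.txt"})["train"]
corpus = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="clinical-mlm", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15))
trainer.train()
trainer.save_model("clinical-mlm")

# Reload the adapted encoder and mean-pool token states into a sentence embedding.
encoder = AutoModel.from_pretrained("clinical-mlm")
inputs = tokenizer("Patient denies chest pain or dyspnea.", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state        # (1, seq_len, hidden)
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (hidden * mask).sum(1) / mask.sum(1)         # (1, hidden)

The resulting embedding can then serve as the input representation for downstream low-resource tasks such as clinical named entity recognition or relation extraction.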
