eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations

Abstractive Text Summarization Using Hierarchical Reinforcement Learning

Abstract

Sequence-to-sequence models have recently achieved state-of-the-art performance in summarization. However, few large-scale, high-quality datasets are available, and almost all of them consist mainly of news articles written in a specific style. Moreover, abstractive, human-style systems that describe content at a deeper level require data with higher levels of abstraction.

On the modeling side, attention-based sequence-to-sequence neural networks that optimize word-level log-likelihoods or sequence-level discrete metrics such as ROUGE have achieved promising results on abstractive text summarization, but they are far from perfect. Models in the first group may fail to handle out-of-vocabulary words and often produce repetitive words and incorrect facts. Methods in the second group, which rely on reinforcement training, beat the state-of-the-art methods on discrete evaluation metrics but produce unreadable and sometimes irrelevant summaries.
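To make the notion of a sequence-level discrete metric concrete, the following is a minimal sketch of a ROUGE-1 style score (unigram-overlap F1) that could serve as a reward for a whole generated summary. It is a simplified illustration, not the official ROUGE toolkit and not the evaluation code used in the thesis; the function name `rouge1_f` and the whitespace tokenization are assumptions made for this example.

```python
from collections import Counter


def rouge1_f(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary.

    Simplified sketch: lowercases and splits on whitespace, with no
    stemming or stopword handling (assumptions for illustration only).
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count per word,
    # i.e. the clipped number of matching unigrams.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

Because such a score is computed on the finished summary and changes in discrete jumps as words are added or removed, it cannot be optimized by ordinary gradient descent on word-level likelihoods, which is why the second group of methods uses reinforcement training.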

We first present WikiHow, a dataset of more than 230,000 article and summary pairs extracted and constructed from an online knowledge base written by different human authors. The articles span a wide range of topics and therefore represent a high diversity of styles. We also evaluate the performance of existing methods on WikiHow to illustrate its challenges and to set baselines for further improvement.

Moreover, to overcome the problems of existing summarization systems, we propose a novel hierarchical reinforcement learning architecture that makes decisions in two steps: the high-level policy decides on the sub-goal for generating the next chunk of the summary, and the low-level policy performs primitive actions to fulfill the specified goal. By reinforcing summarization at different levels, our proposed model outperforms the existing approaches in terms of ROUGE and METEOR scores.
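The two-step decision process can be sketched as a nested loop: an outer loop in which a high-level policy picks a sub-goal, and an inner loop in which a low-level policy emits words until it signals that the chunk is complete. The sketch below shows only this control flow under stated assumptions. The policies are hand-written toy functions rather than learned networks, a sub-goal is assumed to be the index of an article sentence, and all names (`generate_summary`, `toy_high_level_policy`, `toy_low_level_policy`, `END_OF_CHUNK`) are hypothetical and do not come from the thesis.

```python
from typing import Callable, List, Optional

Sentence = List[str]
END_OF_CHUNK = "<eoc>"  # hypothetical marker: low-level policy ends the chunk


def generate_summary(
    article: List[Sentence],
    high_level_policy: Callable[[List[Sentence], List[Sentence]], Optional[int]],
    low_level_policy: Callable[[int, List[Sentence], Sentence], str],
    max_chunks: int = 3,
    max_words_per_chunk: int = 10,
) -> List[Sentence]:
    """Generate a summary chunk by chunk with two levels of decisions."""
    chunks: List[Sentence] = []
    for _ in range(max_chunks):
        # High level: choose the sub-goal for the next chunk.
        sub_goal = high_level_policy(article, chunks)
        if sub_goal is None:  # nothing left to summarize
            break
        chunk: Sentence = []
        for _ in range(max_words_per_chunk):
            # Low level: primitive action = emit one word toward the sub-goal.
            word = low_level_policy(sub_goal, article, chunk)
            if word == END_OF_CHUNK:
                break
            chunk.append(word)
        chunks.append(chunk)
    return chunks


def toy_high_level_policy(article: List[Sentence], chunks: List[Sentence]) -> Optional[int]:
    # Toy rule (assumption): cover the article sentences in order.
    next_index = len(chunks)
    return next_index if next_index < len(article) else None


def toy_low_level_policy(sub_goal: int, article: List[Sentence], chunk: Sentence) -> str:
    # Toy rule (assumption): copy the first two words of the target sentence.
    sentence = article[sub_goal]
    position = len(chunk)
    if position >= 2 or position >= len(sentence):
        return END_OF_CHUNK
    return sentence[position]


article = [["preheat", "the", "oven"], ["mix", "flour", "and", "sugar"]]
summary = generate_summary(article, toy_high_level_policy, toy_low_level_policy)
```

In a trained system both policies would be learned, and each level could receive its own reward signal, which is one way to read "reinforcing summarization at different levels"; the toy rules here only stand in for those learned components.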
