Reinforcement learning for query-based multi-document extractive summarisation

Jones, Christopher Rhys

doi:10.25949/19433114.v1

thesis

posted on 2022-03-28, 12:38 authored by Christopher Rhys Jones

Text summarisation helps to manage the growth of digitally stored textual information by allowing users to learn key information from short summaries. This research project focuses on query-based multi-document extractive summarisation, which constructs a summary from sentences extracted directly from multiple source documents in response to a user query. Much of the past research in extractive summarisation is based on supervised machine learning approaches, which require converting target human summaries into explicit annotations of the input sentences. In contrast, our research focuses on reinforcement learning, which can incorporate the target human summaries directly into the learning process. We explore the impact of key aspects of reinforcement learning. First, we compare several variants of the Proximal Policy Optimization (PPO) approach with baseline reinforcement learning approaches. Second, we investigate pretraining our policy using supervised approaches. We report our results on data provided by the BioASQ Challenge. We observe that PPO penalises changes to the policy, as described in the literature. However, neither PPO nor pretraining yields a significant improvement in summarisation quality.
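
As an illustration of how PPO can penalise changes to the policy, the following is a minimal sketch of the clipped surrogate term from the commonly described clipped form of PPO. It is not taken from the thesis, which compares several PPO variants that are not specified in this abstract. The function name `clipped_surrogate` and the default `epsilon=0.2` are assumptions made for this example only.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Clipped surrogate term for a single action (illustrative sketch).

    ratio:     probability of the action under the new policy divided by
               its probability under the old policy.
    advantage: estimated advantage of the action.
    epsilon:   clip range; 0.2 is an assumed value, not one from the thesis.
    """
    # Restrict the probability ratio to [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    # Taking the minimum gives a pessimistic bound: the objective gains
    # nothing from pushing the ratio beyond the clip range in the
    # direction that the advantage favours.
    return min(ratio * advantage, clipped_ratio * advantage)
```

Under these assumptions, an action with a positive advantage stops contributing extra objective once its probability ratio exceeds `1 + epsilon`, so large policy updates are not rewarded. In a summarisation setting an action could be the decision to include a sentence in the summary, but that mapping is also an assumption here.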

History

Notes

Theoretical thesis. Bibliography: pages 55-64

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award

2019

Principal Supervisor

Diego Mollá-Aliod

Rights

Copyright Christopher Rhys Jones 2019. Copyright disclaimer: http://mq.edu.au/library/copyright

Language

English

Extent

1 online resource (xi, 64 pages) illustrations

Former Identifiers

mq:72010 http://hdl.handle.net/1959.14/1280499

Keywords

PPO, NLP, learning, Computational linguistics, reinforcement, summarisation, Document

Licence

In Copyright
