Supervised machine learning for extractive query based summarisation of biomedical data

Kaur, Mandeep

doi:10.25949/19444475.v1

thesis

posted on 2022-03-29, 03:36 authored by Mandeep Kaur

Automating text summarisation is a pressing need given the volume of textual information available online. Motivated by the success of machine learning in this domain, this research explores several supervised machine learning approaches for extracting summaries in response to queries. The first objective is to compare the quality of classification and regression approaches for query-based multi-document extractive summarisation. To enable the comparison, we use a common extractive summarisation framework that identifies salient sentences by scoring them on a shared set of features. Our experiments are performed on biomedical data provided by the BioASQ challenges. The second objective is to address the important issue of converting the sample summaries available in the training data into annotations that can be used to train statistical classifiers for extractive summarisation. We conduct different trials of data annotation and assess their impact on the results. For the specific dataset used in this research, our investigations show that classification performed better than regression, and a comparison of annotation techniques shows that annotation with a threshold of 0.1 outperforms the others.
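
As a rough illustration of the annotation step described in the abstract, the Python sketch below turns a sample summary into per-sentence training targets: a real-valued score for regression and a thresholded 0/1 label for classification. The abstract does not say which measure the 0.1 threshold applies to, so the word-overlap score used here, and the names `overlap_score` and `annotate`, are assumptions made purely for illustration and should not be read as the thesis's actual method.

```python
from typing import List


def overlap_score(sentence: str, reference: str) -> float:
    """Fraction of the reference summary's distinct words found in the sentence.

    Hypothetical stand-in for the similarity measure; the abstract does not
    specify which one is used.
    """
    sentence_words = set(sentence.lower().split())
    reference_words = set(reference.lower().split())
    if not reference_words:
        return 0.0
    return len(sentence_words & reference_words) / len(reference_words)


def annotate(sentences: List[str], reference: str, threshold: float = 0.1) -> List[int]:
    """Label a sentence 1 when its score reaches the threshold, otherwise 0."""
    return [1 if overlap_score(s, reference) >= threshold else 0 for s in sentences]


# Toy data, invented for this example.
sentences = ["aspirin reduces fever", "the weather is nice"]
reference = "aspirin reduces fever and pain"

regression_targets = [overlap_score(s, reference) for s in sentences]
classification_labels = annotate(sentences, reference, threshold=0.1)
```

Under this assumption, a regressor would be trained on `regression_targets` and a classifier on `classification_labels`, with the threshold deciding how many candidate sentences count as positive examples.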

Notes

Bibliography: pages 48-53. Empirical thesis.

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award

2018

Principal Supervisor

Diego Mollá-Aliod

Rights

Copyright Mandeep Kaur 2018. Copyright disclaimer: http://mq.edu.au/library/copyright

Language

English

Extent

1 online resource (53 pages) graphs, tables

Former Identifiers

mq:70673 http://hdl.handle.net/1959.14/1266590

Keywords

Abstracts; Supervised learning (Machine learning)

Licence

In Copyright
