posted on 2022-03-29, 03:36 authored by Mandeep Kaur
Automating text summarisation is a pressing need given the plethora of textual information available online. Motivated by the success of machine learning in this domain, this research explores several supervised machine learning approaches for extracting summaries in response to queries. The first objective is to compare the quality of classification and regression approaches for query-based multi-document extractive summarisation. To enable this comparison, we use a common extractive summarisation framework that identifies salient sentences by scoring them on a common set of features. Our experiments are performed on biomedical data provided by the BioASQ challenges. The second objective is to address the important issue of converting the sample summaries available in the training data into annotations that can be used to train statistical classifiers for extractive summarisation. We conduct several data annotation trials and assess their impact on the results. For the specific dataset used in this research, our investigations show that the classification approach performed better than the regression approach, and a comparison of the annotation techniques reveals that annotation with a threshold of 0.1 outperforms the others.
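The abstract does not say which similarity measure the 0.1 threshold is applied to, so the following is only a minimal sketch of how sample summaries might be turned into training annotations. It assumes, purely for illustration, that each candidate sentence is scored against the reference summary with a simple unigram-overlap F1 (a rough stand-in for ROUGE-1), that the raw score serves as the regression target, and that a sentence is labelled positive for classification when its score strictly exceeds 0.1. The names `unigram_f1` and `annotate`, the tokeniser, and the example sentences are all hypothetical.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Lowercased alphanumeric word tokens (an assumed, simplified tokeniser).
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(sentence: str, summary: str) -> float:
    """Hypothetical unigram-overlap F1 between a sentence and a reference summary."""
    s, r = Counter(tokens(sentence)), Counter(tokens(summary))
    overlap = sum((s & r).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(s.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def annotate(sentences: list[str], summary: str, threshold: float = 0.1):
    """Return (regression targets, binary classification labels) per sentence.

    Assumption: a sentence is positive when its score is strictly above the threshold.
    """
    scores = [unigram_f1(sent, summary) for sent in sentences]
    labels = [1 if score > threshold else 0 for score in scores]
    return scores, labels


sentences = [
    "BRCA1 mutations increase breast cancer risk.",
    "The weather was pleasant.",
]
summary = "Mutations in BRCA1 are associated with increased breast cancer risk."
scores, labels = annotate(sentences, summary)
print(labels)  # [1, 0]
```

The same scores feed both settings: a regressor would be trained to predict `scores` directly, while a classifier would be trained on the thresholded `labels`, which mirrors the classification-versus-regression comparison described above only at the level of this assumed setup.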
Table of Contents
1. Introduction -- 2. Literature review -- 3. Summarisation corpus and evaluation -- 4. Research methods -- 5. Experimental results and discussions -- 6. Conclusion and future work -- References.
Notes
Bibliography: pages 48-53
Empirical thesis.
Awarding Institution
Macquarie University
Degree Type
Thesis MRes
Degree
MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing