Supervised machine learning for extractive query based summarisation of biomedical data
Thesis posted on 29.03.2022, 03:36 by Mandeep Kaur
Automation of text summarisation is a pressing need due to the plethora of textual information available online. Motivated by the success of machine learning in this domain, this research explores several supervised machine learning approaches for extracting summaries in response to queries. The first objective of this research is to compare the quality of classification and regression approaches for query-based multi-document extractive summarisation. To enable the comparison, we use a common extractive summarisation framework which attempts to identify salient sentences by scoring them based on a common set of features. Our experiments are performed on biomedical data provided by the BioASQ challenges. The second objective is to address the important issue of converting the sample summaries available in the training data into annotations that can be used to train statistical classifiers for extractive summarisation. We conduct different trials of data annotation and assess their impact on the results. On the basis of our investigations for the specific dataset used in this research, we show that the classification scheme performed better than the regression scheme, and a comparison of the annotation techniques reveals that annotation with a threshold of 0.1 outperforms the other techniques.
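As an illustration of how sample summaries might be turned into training annotations, the following is a minimal sketch only, not the method used in the thesis. The abstract does not state which similarity measure is used or what exactly the threshold is applied to. The sketch assumes a simple unigram-overlap F1 score between each candidate sentence and the sample summary, and the names `tokenize`, `unigram_f1` and `annotate` are hypothetical. Under these assumptions, the raw score could serve as a regression target and the thresholded score as a classification label.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate sentence and a reference summary.

    This is an assumed stand-in similarity measure, not the one from the thesis.
    """
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # multiset intersection of token counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def annotate(sentences, reference_summary, threshold=0.1):
    """Return (scores, labels) for the candidate sentences.

    scores: real-valued similarity, usable as a regression target.
    labels: 1 if the score reaches the threshold, else 0 (classification target).
    The default of 0.1 mirrors the value named in the abstract; applying it to
    this particular score is an assumption.
    """
    scores = [unigram_f1(s, reference_summary) for s in sentences]
    labels = [1 if score >= threshold else 0 for score in scores]
    return scores, labels


if __name__ == "__main__":
    sentences = [
        "Aspirin reduces the risk of heart attack.",
        "The weather was sunny.",
    ]
    reference = "Aspirin lowers heart attack risk."
    scores, labels = annotate(sentences, reference)
    for sentence, score, label in zip(sentences, scores, labels):
        print(f"{label}  {score:.3f}  {sentence}")
```

Varying `threshold` changes how many sentences are labelled as summary-worthy, which is the kind of annotation choice whose impact the abstract describes assessing.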