posted on 2022-08-01, 03:19 authored by Eliza Harrison
<p>As online sources of health information increasingly influence what people believe and the decisions they make, the proliferation of misinformation and dubious health claims online poses a risk to public health. As a step towards tools that help address the variable quality of health information shared online, this thesis develops and evaluates a two-stage process for the detection of health claims in threads posted to health-related forums on Reddit, an online discussion platform where health information, support and advice are common.</p>
<p>Following a review of the literature to identify machine learning methods that have been used to analyse user-generated text on Reddit, this study first compared two unsupervised machine learning approaches for identifying forums that discuss a diverse range of health and medical topics. In the second stage, crowdsourcing was used to label the presence of health claims in threads sampled from the health-related forums identified in the first stage. This labelled thread dataset was then used to train supervised machine learning classifiers that predict whether any thread posted to a health-related forum on Reddit contains a health claim.</p>
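<p>The second stage, training a classifier on labelled threads, can be illustrated with a minimal sketch. The abstract does not name the four supervised methods, so this uses multinomial naive Bayes over word unigrams and bigrams purely as an assumed stand-in. The class name <code>NaiveBayesClaimClassifier</code>, the helper <code>tokenize</code> and the six toy threads are hypothetical and are not taken from the thesis. The <code>top_features</code> method shows how such a model can expose the terms and phrases that most distinguish claim-bearing threads.</p>

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word unigrams plus adjacent bigrams (e.g. 'cures colds')."""
    words = re.findall(r"[a-z']+", text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


class NaiveBayesClaimClassifier:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing.

    An assumed illustration only, not the classifier used in the thesis.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def _prob(self, tok, c):
        # Smoothed P(token | class)
        return (self.counts[c][tok] + 1) / (self.totals[c] + len(self.vocab))

    def log_score(self, text, c):
        # log P(class) + sum of log P(token | class)
        score = math.log(self.priors[c] / self.n)
        return score + sum(math.log(self._prob(t, c)) for t in tokenize(text))

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))

    def top_features(self, c, k=5):
        """n-grams whose smoothed probability is most elevated in class c."""
        others = [o for o in self.classes if o != c]

        def log_ratio(tok):
            return math.log(self._prob(tok, c) / max(self._prob(tok, o) for o in others))

        return sorted(self.vocab, key=log_ratio, reverse=True)[:k]


# Hypothetical toy threads: 1 = contains a health claim, 0 = does not
texts = [
    "turmeric cures arthritis better than any drug",
    "vitamin c prevents colds for everyone",
    "this supplement cures anxiety in a week",
    "has anyone else had a long wait for a specialist",
    "feeling nervous about my appointment tomorrow",
    "looking for support after my diagnosis",
]
labels = [1, 1, 1, 0, 0, 0]

clf = NaiveBayesClaimClassifier().fit(texts, labels)
print(clf.predict("garlic cures colds"))  # 1
print(clf.top_features(1, k=1))           # ['cures']
```

<p>Naive bayes was chosen here only because its per-term weights are easy to inspect. A real pipeline would train on the crowdsourced labels and compare several methods on held-out threads.</p>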
<p>The results of the unsupervised machine learning experiments showed that while both tested methods were able to group health-related forums, the clustering method captured a similar number of known health-related Reddit forums without including many superfluous forums unrelated to health and medical topics. In the second stage, the four supervised machine learning methods tested varied in how they balanced precision and recall, and the best-performing method relied on terms and phrases that were plausible as distinguishing features of health claims.</p>
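<p>The trade-off described above can be made concrete with precision (the share of retrieved items that are relevant) and recall (the share of relevant items that are retrieved). The sketch below assumes one way of applying these measures to both stages. A cluster of forums can be scored against a set of known health-related forums, and thread-level predictions can be scored against crowdsourced labels. The function names and the example subreddit names and labels are hypothetical and were not drawn from the thesis.</p>

```python
def precision_recall_f1(predicted, relevant):
    """Set-based precision, recall and F1 (harmonic mean of the two)."""
    predicted, relevant = set(predicted), set(relevant)
    tp = len(predicted & relevant)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def scores_from_labels(y_true, y_pred, positive=1):
    """Thread-level scores: compare the indices labelled positive in each list."""
    relevant = {i for i, y in enumerate(y_true) if y == positive}
    predicted = {i for i, y in enumerate(y_pred) if y == positive}
    return precision_recall_f1(predicted, relevant)


# Stage 1 (hypothetical): one cluster vs. a list of known health forums
cluster = {"AskDocs", "diabetes", "Fitness", "gaming"}
known_health = {"AskDocs", "diabetes", "ChronicPain"}
print(precision_recall_f1(cluster, known_health))  # (0.5, 0.666..., 0.571...)

# Stage 2 (hypothetical): crowdsourced labels vs. classifier output
print(scores_from_labels([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))  # (0.666..., 0.666..., 0.666...)
```

<p>In the first example, the superfluous forums lower precision, and the missed known forum lowers recall. This mirrors the balance the abstract describes between capturing known health forums and excluding unrelated ones.</p>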
<p>This thesis demonstrates that unsupervised and supervised methods are a feasible way to robustly detect when users make health claims on Reddit. The development of efficient and scalable methods for the detection of health claims provides a strong basis for subsequent pipelines that could automatically link online health claims to relevant, high-quality scientific evidence. Such systems may form the basis for tools that improve access to credible health information and help people make health and medical decisions informed by evidence-based online information.</p>
Table of Contents
Chapter 1. Introduction -- Chapter 2. Review of machine learning methods for the analysis of health information on Reddit -- Chapter 3. Methods for the clustering of subreddits -- Chapter 4. Results for the clustering of health-related subreddits -- Chapter 5. Methods for classifying threads containing health claims -- Chapter 6. Results of the classification of health claims -- Chapter 7. Discussion -- References -- Appendix A
Notes
A thesis submitted on 4th June 2020 as partial fulfilment of the requirements of the degree of Master of Research in Medicine and Health Sciences
Includes bibliographical references (pages 79‐93)
Awarding Institution
Macquarie University
Degree Type
Thesis MRes
Degree
Thesis (MRes), Macquarie University, Faculty of Medicine, Health and Human Sciences, Australian Institute of Health Innovation, Centre for Health Informatics, 2020
Department, Centre or School
Australian Institute of Health Innovation
Year of Award
2020
Principal Supervisor
Adam Dunn
Additional Supervisor 1
Didi Surian
Rights
Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer
Copyright Eliza Harrison 2020.