Multimodal Machine Learning for Medical Imaging

Singh, Sonit

doi:10.25949/24882099.v1

thesis

posted on 2024-09-11, 22:18 authored by Sonit Singh

Humans comprehend the world through images, and use words to communicate. Similarly, radiologists interpret medical images and describe their findings and interpretation in the form of radiology reports. Research shows that up to 30% of imaging studies miss subtle findings, leading to multiple scans and, in the worst case, the death of patients. These diagnostic errors are mainly due to increasing patient volume, fatigue, inability to locate subtle findings, and the subjective nature of human perception. A recent estimate shows that one billion radiology examinations are performed worldwide annually. Assuming an error rate of 4%, this equates to 40 million diagnostic errors per year. To reduce these errors, there is a need to develop automated clinical decision support systems that can interpret medical images and generate reports to augment radiologists’ work. A trained radiologist can competently interpret medical images to find abnormalities and generate radiology reports, but building these capabilities into an intelligent system is a significant challenge due to the differences in structure and characteristics of different types of medical images and their radiology reports.

Despite considerable research at the intersection of language and vision technology for generic applications, its adaptation to the medical domain is not fully explored. In this thesis, we develop multimodal machine learning models that can reason jointly on medical images and radiology reports for automatic generation of radiology reports from medical images. Specifically, we propose a unified approach where we first identify the correct modality of medical images and predict relevant clinical concepts present in them. We also propose a self-attention guided convolutional neural network for identification of common thoracic diseases, including COVID-19. Building on these contributions, we propose two multimodal machine learning models for automatically generating radiology reports from chest X-rays. First, we propose an encoder-decoder framework, built on a convolutional neural network and a multi-stage recurrent neural network, with separate generation of normal and abnormal radiology reports. Second, inspired by radiology practice, we propose the “show, tell, and summarise” model for radiology report generation, which first generates findings text from medical images and then summarises the findings to output an impression section concluding the study. We perform extensive experiments by varying parameters for both the encoder (medical images side) and the decoder (radiology reports side), and find that these state-of-the-art vision and language models improve radiology report generation. To provide robust measures of model performance in generating coherent, factually complete, and clinically accurate radiology reports, we highlight the need to use both language generation metrics and classification metrics, given their complementary nature in evaluating radiology reports. Finally, we bring together these sub-systems and incorporate best practices to design an integrated and robust radiology report generation system.
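To illustrate why language generation metrics and classification metrics are complementary, the following is a minimal sketch and not the evaluation code used in the thesis. It assumes a clipped unigram precision as a stand-in for a language generation metric and a toy keyword-based labeller as a stand-in for a clinical label extractor. The names `FINDINGS`, `unigram_precision`, `extract_labels`, and `label_f1`, as well as the two example sentences, are hypothetical and chosen only for this illustration.

```python
from collections import Counter

# Hypothetical finding vocabulary, chosen only for this illustration.
FINDINGS = ("effusion", "pneumothorax", "cardiomegaly")


def unigram_precision(candidate, reference):
    """Clipped unigram precision (the core of BLEU-1, without a brevity penalty)."""
    cand = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    if not cand:
        return 0.0
    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    return clipped / len(cand)


def extract_labels(report):
    """Toy labeller (an assumption): a finding counts as positive unless the
    word 'no' appears before it in the same sentence."""
    labels = set()
    for sentence in report.lower().split("."):
        tokens = sentence.split()
        for finding in FINDINGS:
            if finding in tokens and "no" not in tokens[: tokens.index(finding)]:
                labels.add(finding)
    return labels


def label_f1(candidate, reference):
    """F1 between the positive findings of the generated and reference reports."""
    pred, gold = extract_labels(candidate), extract_labels(reference)
    if not pred and not gold:
        return 1.0  # both reports describe no positive findings
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented example: the candidate differs from the reference by one negation.
reference = "there is a small left pleural effusion. no pneumothorax."
candidate = "there is no left pleural effusion. no pneumothorax."

print(unigram_precision(candidate, reference))  # high word overlap
print(label_f1(candidate, reference))           # but the positive finding is missed
```

In this invented example the generated sentence shares most of its words with the reference, so the overlap score is high, while the label-based score is zero because the single positive finding is negated. A report can therefore read fluently and still be clinically wrong, which is the situation that classification metrics are intended to catch.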

The work in this thesis offers the potential to augment radiologists by automating the repetitive process of radiology report drafting, detecting possible medical conditions, accelerating clinical workflow by triaging patients according to their level of urgency, and reducing diagnostic errors, in turn saving human lives.

Table of Contents

1. Introduction -- 2. Background and related work -- 3. Modality classification and concept detection -- 4. Diagnosis of thoracic diseases -- 5. Diagnosing COVID-19 using chest X-rays -- 6. Radiology report generation -- 7. Clinical context-aware radiology report generation -- 8. Designing a robust radiology report generation system -- 9. Conclusions and future work -- References

Awarding Institution

Macquarie University

Degree Type

PhD Thesis

Degree

Doctor of Philosophy

Department, Centre or School

Department of Computing

Year of Award

2022

Principal Supervisor

Len Hamey

Rights

Copyright: The Author. Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

269 pages

Keywords

medical imaging, machine learning, deep learning, computer vision, natural language processing, language and vision, chest X-rays, radiology reports, radiology report generation, multimodal machine learning

Licence

In Copyright
