Advanced bioinformatics approaches for proteomics data analysis

Noor, Zainab

doi:10.25949/21619419.v1

thesis

posted on 2022-11-25, 03:24 authored by Zainab Noor

Mass spectrometry (MS) coupled with liquid chromatography (LC-MS/MS) has become a well-established technique for identifying large numbers of proteins in complex biological samples. More recently, Data Independent Acquisition (DIA) has been developed for MS-based proteomics and is emerging as a highly accurate and reproducible method for quantitative proteomics. Data extraction and analysis in DIA-based studies require reference spectral data as a prerequisite, and these data are generated by MS in data-dependent acquisition (DDA) mode. This requirement has recently been partially addressed by the availability of spectral data in public repositories. However, incorporating these datasets into a complete reference proteome for DIA-based proteomics studies remains a challenge. The overall objective of this thesis is to establish advanced bioinformatics approaches that facilitate and maximise the use of DDA data, improving the identification and quantitation of proteins in individual DIA experiments. Generating DIA reference libraries, and making them available and compatible across studies, is laborious and requires significant computational and experimental resources. In the first phase, I developed and deployed a platform, built with the R programming language and the R Shiny package, that integrates spectral data stored in proteomics databases. This open-source, web-based interactive user interface, 'iSwathX', provides fully automated processing of reference assay libraries by normalising and combining the spectra from different DDA-based datasets to generate extended libraries. The interface also provides novel functions for analysing multiple DDA libraries simultaneously, enabling quick and efficient data analysis. In the next phase, I extended the integrated-library approach to design cross-sample libraries for examining the complex human plasma proteome.
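The library-extension step described above, in which spectra from different DDA datasets are normalised and combined, can be illustrated with a minimal Python sketch. It assumes that normalisation means mapping the retention times (RT) of an add-on library onto the scale of a base library, using a linear fit over peptides common to both, and then adding the assays that are unique to the add-on library. The `Assay` class, `fit_line`, `extend_library`, the `min_shared` threshold, and all peptide names and values are hypothetical. This is not iSwathX's R code or its API, and it omits fragment-intensity normalisation.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    """One library entry: a peptide assay (hypothetical minimal schema)."""
    peptide: str                      # peptide sequence, used as the assay key
    rt: float                         # retention time on this library's own scale
    fragments: dict = field(default_factory=dict)  # fragment ion -> relative intensity


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("shared peptides have identical retention times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extend_library(base, addon, min_shared=2):
    """Merge an add-on library into a base library after aligning retention times.

    Shared peptides anchor a linear RT mapping from the add-on scale onto the
    base scale. Assays found only in the add-on library are added with their
    RTs remapped. Base assays are kept unchanged for peptides in both libraries.
    """
    shared = sorted(set(base) & set(addon))
    if len(shared) < min_shared:
        raise ValueError("too few shared peptides to align retention times")
    slope, intercept = fit_line([addon[p].rt for p in shared],
                                [base[p].rt for p in shared])
    merged = dict(base)
    for pep, assay in addon.items():
        if pep not in merged:
            merged[pep] = Assay(pep, slope * assay.rt + intercept, assay.fragments)
    return merged


# Toy example with made-up peptides and RTs (the add-on library elutes 5 min later).
base = {
    "PEPTIDEA": Assay("PEPTIDEA", 10.0, {"y4": 100.0}),
    "PEPTIDEB": Assay("PEPTIDEB", 20.0, {"y5": 100.0}),
}
addon = {
    "PEPTIDEA": Assay("PEPTIDEA", 15.0, {"y4": 90.0}),
    "PEPTIDEB": Assay("PEPTIDEB", 25.0, {"y5": 80.0}),
    "PEPTIDEC": Assay("PEPTIDEC", 35.0, {"y6": 100.0}),
}
merged = extend_library(base, addon)
print(sorted(merged), merged["PEPTIDEC"].rt)  # ['PEPTIDEA', 'PEPTIDEB', 'PEPTIDEC'] 30.0
```

Keeping the base-library assays for shared peptides leaves the reference measurements untouched. A practical tool would also check how well the shared-peptide RTs correlate before relying on the fit.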
Plasma-based DIA studies have long been limited by a lack of comprehensive, in-depth DDA libraries, and the libraries that do exist are dominated by high-abundance proteins. In this study, I designed an integrated cross-sample library that incorporates DDA data from cell samples in addition to plasma samples. This greatly enlarged the library and the search space for DIA data extraction and analysis. As a result, I was able to identify and quantify a larger number of proteins in human plasma, which could potentially lead to the discovery of new disease biomarkers. Separately, I developed a strategy for using proteomics data from homologous species to generate a cross-species reference proteome. To do this, I created libraries from the plasma proteomes of domestic animals and applied them to study these domestic bovids, and also leveraged them to study a wild bovid species. The cross-species libraries were also applied to the proteomes of distantly related species whose genome sequences are not known. This approach led to the identification and quantification of proteins from multiple species through comparative proteomics analysis. In conclusion, this thesis demonstrates the development and application of novel bioinformatics approaches that support extensive and dynamic analysis of protein data generated by different mass spectrometry techniques. In addition, the expanded use of DIA-MS methods in large-scale, diverse proteomics studies yielded novel biological findings. The methods presented in this thesis can potentially accelerate the discovery of previously inaccessible proteomic information, leading to new insights for biomedical and therapeutic studies as well as for conservation and biodiversity research.
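The cross-species strategy relies on homologous proteins sharing identical peptides, which allows an assay measured in one species to match the same peptide in another. The Python sketch below counts identical tryptic peptides between two sequences, using a simplified in-silico digest. The digest makes several assumptions: it cleaves after K or R unless the next residue is P, it allows no missed cleavages, and it keeps only peptides of 6 to 30 residues. The function names, parameters and the two toy sequences are hypothetical and made up for illustration. They are not real bovid proteins or the method used in the thesis.

```python
import re


def tryptic_digest(seq, min_len=6, max_len=30):
    """Simplified in-silico trypsin digest.

    Cleaves C-terminal to K or R unless the next residue is P, allows no missed
    cleavages, and keeps peptides within an assumed length window.
    """
    # Zero-width split points (supported by re.split since Python 3.7).
    pieces = re.split(r"(?<=[KR])(?!P)", seq)
    return {p for p in pieces if min_len <= len(p) <= max_len}


def shared_peptides(source_seq, target_seq, **digest_kwargs):
    """Return tryptic peptides of source_seq also produced by target_seq, and their fraction."""
    source = tryptic_digest(source_seq, **digest_kwargs)
    target = tryptic_digest(target_seq, **digest_kwargs)
    shared = source & target
    fraction = len(shared) / len(source) if source else 0.0
    return shared, fraction


# Made-up toy "orthologues" differing at two residues (not real protein sequences).
species_a = "AGLEFTVKDDSLQPERWTHKLVNEAGTR"
species_b = "AGLEFTVKDDSLQPEKWTHKLVNEVGTR"
shared, fraction = shared_peptides(species_a, species_b)
print(sorted(shared), round(fraction, 3))  # ['AGLEFTVK'] 0.333
```

Even a single substitution changes the affected tryptic peptide, so the shared fraction falls quickly as sequence divergence grows. This is one reason cross-species libraries work best for closely related species.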

Table of Contents

Chapter 1: Introduction
Chapter 2: Methods and applications
Chapter 3: iSwathX: an interactive web-based application for extension of DIA peptide reference libraries
Chapter 4: Using iSwathX 2.0 for Processing and Extending DDA Spectral Libraries for DIA Data Analysis
Chapter 5: A combinatorial SWATH library approach to increase proteome discovery space
Chapter 6: Leveraging of extensive inter-species homologies to study plasma proteomes of domestics and wildlife species using data-independent acquisition
Chapter 7: Conclusions and future directions
References

Notes

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

Awarding Institution

Macquarie University

Degree Type

Thesis PhD

Degree

Thesis (PhD), Macquarie University, Faculty of Science and Engineering, 2020

Department, Centre or School

Department of Molecular Sciences

Year of Award

2020

Principal Supervisor

Shoba Ranganathan

Additional Supervisor 1

Abidali Mohamedali

Rights

Copyright: Zainab Noor. Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

148 pages

Keywords

Bioinformatics Plasma Proteomics Mass Spectrometry Data Analysis Colorectal Cancer

Licence

In Copyright
