Application of informatics and quantitative proteomics to identify missing proteins and proteins involved in colorectal cancer metastasis
Thesis posted on 2022-03-28, 22:28, authored by Subash Adhikan
Accurate management of any human disease requires a thorough understanding of the molecular underpinnings (i.e., genome + epigenome + transcriptome + proteome + peptidome + protein post-translational modifications + metabolome + microbiome) driving the biology of that specific disease. Given the intricate and expanding roles played by proteins in human health and disease, proteins have been studied extensively to uncover disease mechanisms, define diagnostic, prognostic and theranostic markers, and identify novel therapeutic targets. Over the past decade, high-throughput mass spectrometry (MS)-based proteomics with subsequent bioinformatic analysis has emerged as a major technological driving force in attempts to expand human proteomics so that it has a noticeable impact on medicine, human health and the life sciences alike. The Human Proteome Project (HPP) provides a framework for communal proteomics research. It specifically adds value to the task of 'knowing thyself' in strictly molecular terms by mapping the ~20,000 proteins encoded by the human genome, and it aims to do so as a corollary to the human genome, using measurements at the highest possible accuracy and stringency. The HPP states that it has three initial primary aims, namely to: "i. complete the protein 'parts list' of Homo sapiens by identifying and characterizing at least one protein product and as many posttranslational modifications, single amino acid polymorphisms and splice variant isoforms as possible for each protein-coding gene; ii. transform proteomics so it becomes complementary to genomics across clinical, biomedical and life sciences through technological advances; iii. create knowledgebases for the identification, quantitation and characterization of the functionally networked human proteome." This thesis contributes to two major elements of the HPP, the Chromosome-Centric HPP (C-HPP) and the Biology/Disease HPP (B/D-HPP), through the use of informatics and proteomics approaches.
Building on previous research efforts by our HPP team at Macquarie University, Chapter 1 aims to advance community-centric resources to accelerate the identification of missing proteins (MPs). This chapter provides a plausible explanation for why certain missing protein families, namely the olfactory receptors (ORs), have failed to be identified by MS over the last decade. ORs were analysed for hydrophobicity, topological distribution, tryptic cleavage site position and frequency, and their in silico predicted ability to yield uniquely-mapping, non-nested tryptic peptides of the communally-required length (9 amino acids or longer). The analysis indicated that multiple ORs are unable to generate peptides that meet the requirements set by the HPP for classification as protein existence 1 (PE1) proteins. These ORs may not be MS-identifiable unless they qualify for relaxed stringency criteria or unless other proteases are used to generate suitable peptides from which protein identification can be inferred. A similar observation was made when this analysis was extended to all multi-transmembrane domain (TMD)-containing human membrane proteins encoded by the human genome.
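The in silico peptide prediction described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the thesis. It assumes the common simplified trypsin rule (cleavage after K or R unless the next residue is P), no missed cleavages, and a requirement of two qualifying peptides per protein, none of which is specified in this abstract. The protein names and sequences are invented toy data, and the function names are hypothetical.

```python
MIN_LEN = 9        # minimum peptide length stated in the text
MIN_PEPTIDES = 2   # ASSUMPTION: number of qualifying peptides required per protein

# Hypothetical toy "proteome"; these are not real protein sequences.
TOY_PROTEOME = {
    "OR_A": "MAILLVFLLGKSSEDTWNAVLQRLLK",
    "OR_B": "MAILLVFLLGKCCKPLR",
    "SOLUBLE_C": "MSTEELDNAVQKGGHWLPEDTARVVK",
}

def tryptic_digest(sequence):
    """Cleave after K or R unless followed by P (assumed rule), no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides

def qualifying_peptides(name, proteome):
    """Return peptides of `name` that are long enough, uniquely mapping and non-nested."""
    candidates = {p for p in tryptic_digest(proteome[name]) if len(p) >= MIN_LEN}
    # Uniquely mapping: the peptide sequence occurs in no other protein.
    unique = {
        p for p in candidates
        if all(p not in seq for other, seq in proteome.items() if other != name)
    }
    # Non-nested: the peptide is not contained within another qualifying peptide.
    return sorted(p for p in unique if not any(p != q and p in q for q in unique))

def meets_requirement(name, proteome):
    """True if the protein yields at least MIN_PEPTIDES qualifying peptides."""
    return len(qualifying_peptides(name, proteome)) >= MIN_PEPTIDES

if __name__ == "__main__":
    for protein in TOY_PROTEOME:
        print(protein, qualifying_peptides(protein, TOY_PROTEOME),
              meets_requirement(protein, TOY_PROTEOME))
```

In this toy example, only the hypothetical `SOLUBLE_C` yields two qualifying peptides. `OR_A` yields one, and `OR_B` yields none because its only long peptide is shared with `OR_A`. This mirrors, in a very simplified way, how a protein can fail the peptide criteria regardless of instrument sensitivity.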