Data integration and knowledge discovery using biological networks

Kumar, Gaurav

doi:10.25949/19431188.v1

thesis

posted on 2022-03-28, 10:05 authored by Gaurav Kumar

The overall objective of this thesis is to analyse and understand the intricate network of protein interactions inside the cell. Proteins are molecular machines that interact and communicate to perform different cellular functions. Research in molecular and cellular biology now enables the detection of molecular interactions on a large scale, and the results generated by high-throughput studies are archived in various public databases. In this study, statistical and computational approaches are used to integrate information from relatively heterogeneous data sources (public databases) derived from high-throughput experiments. The integrated data are then used to explore the relationships within interacting protein pairs. A graph-based network model is used to determine protein relationships based on gene ontology (GO) biological processes, molecular functions and cellular components.

Network approaches have enabled researchers to study the pervasive nature of protein interactions in systems biology, and a range of computational methods has been developed to analyse networks and their topological properties. Foremost among them are methods that analyse direct and indirect protein interaction networks by integrating them with other types of -omic data. This thesis demonstrates the statistical significance of protein interaction networks for the study of subcellular localisation, biological processes and molecular functions, and it highlights the value of network approaches in biological studies.

The protein-protein interaction (PPI) network was created by integrating binary protein interactions deposited in various public databases. Similarly, the metabolic network was created by linking proteins via metabolites, i.e. indirect protein interactions. Both the PPI and metabolic networks were analysed to show the differences in their network topologies.
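
The two network constructions described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the thesis: each source database is assumed to be already parsed into (protein_a, protein_b) pairs with harmonised identifiers, and the protein-to-metabolite mapping is assumed to be given. The function names and toy identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def merge_binary_interactions(*sources):
    """Union binary interactions from several databases into one undirected edge set.

    Each source is an iterable of (protein_a, protein_b) pairs. Pairs are stored
    in sorted order so A-B and B-A count once; self-interactions are dropped.
    """
    edges = set()
    for source in sources:
        for a, b in source:
            if a != b:
                edges.add(tuple(sorted((a, b))))
    return edges


def metabolite_linked_network(protein_metabolites):
    """Link two proteins (indirectly) whenever they share at least one metabolite.

    protein_metabolites maps each protein to the set of metabolites it acts on.
    """
    proteins_by_metabolite = defaultdict(set)
    for protein, metabolites in protein_metabolites.items():
        for metabolite in metabolites:
            proteins_by_metabolite[metabolite].add(protein)
    edges = set()
    for proteins in proteins_by_metabolite.values():
        # every pair of proteins sharing this metabolite becomes an edge
        edges.update(combinations(sorted(proteins), 2))
    return edges


def degrees(edges):
    """Node degree, a basic topological property for comparing the two networks."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


# Toy usage with made-up identifiers
ppi = merge_binary_interactions(
    [("P1", "P2"), ("P2", "P3")],                # hypothetical database 1
    [("P2", "P1"), ("P3", "P4"), ("P4", "P4")],  # hypothetical database 2
)
metabolic = metabolite_linked_network(
    {"E1": {"ATP", "glucose"}, "E2": {"glucose"}, "E3": {"ATP"}, "E4": {"urea"}}
)
print(sorted(ppi))        # [('P1', 'P2'), ('P2', 'P3'), ('P3', 'P4')]
print(sorted(metabolic))  # [('E1', 'E2'), ('E1', 'E3')]
print(degrees(ppi))
```

In practice, ubiquitous "currency" metabolites such as ATP or water are commonly filtered out before this projection, because otherwise they link almost every enzyme; that step is omitted here for brevity.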
Further, we compared and contrasted the subcellular localisation of human proteins using the PPI and metabolic networks. The statistical significance of human protein localisation is demonstrated through measures such as the chi-square (χ²) test, the protein colocalisation correlation profile and the Z-score. These statistical methods illustrate the cross-talk among various subcellular compartments and highlight the importance of metabolite-linked protein interactions, i.e. functional or indirect associations, in addition to direct physical interactions between proteins.

The statistical analyses were further extended to the human and yeast proteomes to show the influence of protein degree on protein relationships for biological process and molecular function. This analysis demonstrates that the tendency of proximal proteins in a network to share the same relationships depends strongly on their degree (connectivity). Comparison of real networks with randomized networks, i.e. permutation testing, suggests that such relationships are significant at network distances of less than three. Networks are randomized using an edge-swapping method, and the network distance between each protein pair is calculated as the shortest path length using the Floyd-Warshall algorithm. The significance of network distances of less than three holds up to six levels of depth from the root node (level zero) in the GO term hierarchy.

The application of the network approach is further demonstrated using ovarian tumour samples. Gene expression data from TCGA (The Cancer Genome Atlas) were collected, and their functional attributes were encoded in a Boolean logic framework to identify genes with potential roles in prognosis and therapy risk assessment in the diseased state. The differentially expressed genes were then validated in a co-expression network derived from ovarian samples deposited in GEO (Gene Expression Omnibus).
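
The permutation test described above (edge-swap randomization, Floyd-Warshall distances and a Z-score against the randomized networks) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the test statistic, the number of protein pairs at distance less than three that share at least one GO term, is an assumed stand-in for the relationship measures used in the thesis, and the function names, the number of swaps per randomization and the number of random networks are hypothetical choices.

```python
import random
import statistics

INF = float("inf")


def floyd_warshall(nodes, edges):
    """All-pairs shortest-path lengths (number of edges) in an undirected, unweighted graph."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for a, b in edges:
        i, j = index[a], index[b]
        dist[i][j] = dist[j][i] = 1
    for k in range(n):
        for i in range(n):
            if dist[i][k] == INF:
                continue  # no path i -> k, so k cannot shorten any i -> j path
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


def edge_swap(edges, n_swaps, rng):
    """Degree-preserving randomization: rewire (a, b), (c, d) into (a, d), (c, b)."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop or leave the graph unchanged
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def close_shared_pairs(nodes, edges, annotation, max_dist=2):
    """Count protein pairs at network distance <= max_dist that share a GO term."""
    dist = floyd_warshall(nodes, edges)
    count = 0
    for x in range(len(nodes)):
        for y in range(x + 1, len(nodes)):
            if dist[x][y] <= max_dist and annotation[nodes[x]] & annotation[nodes[y]]:
                count += 1
    return count


def permutation_z_score(nodes, edges, annotation, n_random=100, seed=0):
    """Compare the observed count with counts from edge-swapped random networks."""
    rng = random.Random(seed)
    observed = close_shared_pairs(nodes, edges, annotation)
    null = [
        close_shared_pairs(nodes, edge_swap(edges, len(edges), rng), annotation)
        for _ in range(n_random)
    ]
    mu, sd = statistics.mean(null), statistics.pstdev(null)
    z = (observed - mu) / sd if sd > 0 else float("nan")
    return observed, mu, z
```

Floyd-Warshall runs in O(n³) time, which is fine for small examples; for sparse interactomes with thousands of proteins, a breadth-first search from every node yields the same unweighted distances more cheaply.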
A set of 17 differentially expressed genes was identified with high probability scores, suggesting their importance in ovarian cancer. Three of these had not previously been reported as significant for ovarian cancer.
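
The Boolean encoding and co-expression validation steps can be illustrated with a small sketch. The specific rules are not given in this abstract, so everything below is an assumption: each gene carries hypothetical Boolean attributes (for example, whether it is differentially expressed in tumour samples), a candidate must satisfy a logical AND of those attributes, and a candidate counts as supported if it has at least one partner in a Pearson co-expression network thresholded at |r| ≥ 0.8. The attribute names, the rule, the threshold and the toy data are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def boolean_candidates(attributes, rule):
    """Keep genes whose Boolean attribute vector satisfies the given logic rule."""
    return {gene for gene, features in attributes.items() if rule(features)}


def coexpression_edges(expression, threshold=0.8):
    """Connect gene pairs whose |Pearson r| across samples reaches the threshold."""
    genes = sorted(expression)
    edges = set()
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expression[g], expression[h])) >= threshold:
                edges.add((g, h))
    return edges


def supported(candidates, edges):
    """Candidates with at least one co-expression partner (hypothetical validation criterion)."""
    return {g for g in candidates if any(g in edge for edge in edges)}


def rule(features):
    """Hypothetical Boolean rule: differentially expressed AND survival-associated."""
    return features["differentially_expressed"] and features["survival_associated"]


# Toy data: hypothetical Boolean attributes and expression values across four samples
attributes = {
    "G1": {"differentially_expressed": True, "survival_associated": True},
    "G2": {"differentially_expressed": True, "survival_associated": False},
    "G3": {"differentially_expressed": True, "survival_associated": True},
    "G4": {"differentially_expressed": False, "survival_associated": True},
}
expression = {
    "G1": [1, 2, 3, 4],
    "G2": [4, 3, 2, 1],
    "G3": [2, 4, 6, 8],
    "G4": [1, 3, 2, 1],
}
candidates = boolean_candidates(attributes, rule)
edges = coexpression_edges(expression)
print(sorted(supported(candidates, edges)))  # ['G1', 'G3']
```

Pearson correlation is used here for simplicity; rank-based measures such as Spearman correlation are also common choices for co-expression networks.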

Table of Contents

1. Introduction and literature survey
2. Methods and applications
3. Network analysis of human protein location
4. Dissecting the organization of human and yeast interactomes: network relationships from biological process and molecular function
5. Identification of ovarian cancer associated genes using an integrated approach in a Boolean framework
6. Conclusions and future directions

Notes

Bibliography: p. 167-174. Thesis by publication.

Awarding Institution

Macquarie University

Degree Type

Thesis PhD

Degree

Thesis (PhD), Macquarie University, Faculty of Science, Dept. of Chemistry and Biomolecular Sciences

Department, Centre or School

Department of Chemistry and Biomolecular Sciences

Year of Award

2012

Principal Supervisor

Shoba Ranganathan

Additional Supervisor 1

Helena Nevalainen

Rights

Copyright disclaimer: http://www.copyright.mq.edu.au
Copyright Gaurav Kumar 2012.

Language

English

Extent

1 online resource (xiv, 174 p.) ill. (some col.)

Former Identifiers

mq:26317
http://hdl.handle.net/1959.14/222020
1839291

Keywords

protein protein interaction; metabolite linked protein interaction; Genes; network biology; gene expression; Proteins -- Analysis -- Data processing; data integration; Gene regulatory networks; Computational biology; Protein-protein interaction; gene ontology; Proteins; Genes -- Analysis -- Data processing; Gene expression; Biology; Bioinformatics; Biology -- Information resources

Licence

In Copyright
