Bioinformatic analysis of transcriptome data: application to helminth parasites
Thesis posted on 2022-03-28, 19:14, authored by Ranjeeta Menon
"Parasitic nematodes of humans and other animals cause diseases of major global socioeconomic importance, collectively known as neglected tropical diseases. These organisms are able to overcome sophisticated host immune response mechanisms to colonize, mature and reproduce within the host. An in-depth understanding of parasite genomes, host-parasite relationships and the molecular biology of parasites, together with functional annotation, can help identify therapeutic molecular targets in helminths, leading to the discovery of novel genes for parasite control with minimal side effects for the host. Because only a few nematode genomes have been completely sequenced, analysis relies largely on transcriptomic data, which require a number of computational methods for pre-processing, clustering, assembly and annotation to yield biologically relevant information. This thesis presents improved bioinformatic approaches to analysing transcriptomic data from Expressed Sequence Tags (ESTs) and their application to parasitic nematodes. I first conducted a comprehensive review of the steps involved in transcriptome data analysis, with a view to developing new semi-automated bioinformatic pipelines and applying them to parasitic helminths. With the advent of next-generation sequencing technologies, my focus was to incorporate the assembly and annotation of short reads, and to concentrate on identifying molecules, especially excretory/secretory proteins (ESPs), involved in key biological processes or pathways that might serve as targets for new drugs or vaccines. I carried out a preliminary analysis of Fasciola hepatica, a parasitic flatworm that causes the disease fascioliasis and infects the liver of various mammals, including humans, leading to liver cancer. By integrating transcriptomic data with proteomic analysis, with an emphasis on proteases, I have been able to gain insight into the complexities involved in the ability of a developing parasite to sustain itself within the mammalian host.
The analysis revealed a number of non-classically secreted proteins that were identified by proteomics but not by bioinformatic prediction, a gap to be addressed in the design of a new analysis pipeline. To benchmark current bioinformatic tools for transcriptome analysis for the new pipeline, I then carried out a large-scale analysis of the adult stage of Teladorsagia circumcincta (407,357 raw ESTs), a parasitic nematode of sheep and goats. Based on the benchmarking results, a robust transcriptome analysis pipeline (TranSeqAnnotator) has been developed, featuring contig generation from ESTs and short reads, updated pathway analysis, identification of non-classically secreted proteins and extensive annotation. The pipeline accepts ESTs, quality values, protein sequences and short reads as input, and outputs assembled contigs and singletons together with their annotations, including Gene Ontology terms, secretory proteins, and mappings to protein domains, motifs, metabolic pathways and interaction databases. ESPs are predicted by a combination of computational approaches to effectively identify proteins secreted by both classical and non-classical pathways. The pipeline is available as a web service and can be downloaded for local installation. As part of evaluating the pipeline, I carried out an in-depth analysis of the transcriptome of the nematode parasite Ascaris lumbricoides. As a large-scale application of TranSeqAnnotator, results from the pipeline for short-read sequences of the fourth larval stage of Teladorsagia circumcincta (507,124 sequences) are compared with the adult-stage EST annotations." -- Abstract.
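The abstract mentions clustering ESTs before assembly and reporting the results as contigs and singletons. The sketch below illustrates that distinction only; it is not TranSeqAnnotator's method. It assumes a deliberately simplified rule: two ESTs belong to the same cluster if they share at least one exact k-mer (single linkage, via a union-find structure). Multi-member clusters would then be passed to an assembler to produce contigs, and one-member clusters are reported as singletons. The function names, the `k` default and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class UnionFind:
    """Minimal disjoint-set structure with path halving."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_ests(ests: dict[str, str], k: int = 20):
    """Group ESTs that share at least one k-mer (single linkage).

    Hypothetical simplification: forward strand only, exact k-mer matches,
    no minimum overlap length or identity threshold.
    Returns (clusters, singletons).
    """
    uf = UnionFind(ests)
    first_owner: dict[str, str] = {}  # k-mer -> first EST seen containing it
    for est_id, seq in ests.items():
        for km in kmers(seq, k):
            if km in first_owner:
                uf.union(first_owner[km], est_id)  # shared k-mer links the two ESTs
            else:
                first_owner[km] = est_id
    groups = defaultdict(list)
    for est_id in ests:
        groups[uf.find(est_id)].append(est_id)
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]  # to be assembled
    singletons = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, singletons


# Toy usage: est1 and est2 overlap on "TTGCAAGGCT"; est3 shares no 8-mer.
toy = {
    "est1": "ACGTACGTTTGCAAGGCT",
    "est2": "TTGCAAGGCTAGCTAG",
    "est3": "GGGGCCCCAAAATTTT",
}
clusters, singletons = cluster_ests(toy, k=8)
print(clusters, singletons)
```

Real EST clustering tools also consider reverse complements, overlap length and sequence identity, and handle chimeric reads. The union-find approach is shown only because it makes the contig/singleton split easy to follow.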
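The abstract states that ESPs are predicted by combining computational approaches so that both classically and non-classically secreted proteins are captured. The sketch below shows one plausible way to combine such evidence. It is an illustration under stated assumptions, not the pipeline's actual rules. The `ProteinPrediction` record, its field names and the 0.5 cutoff for the non-classical score are all hypothetical. In practice these values would come from upstream predictors, such as a signal-peptide predictor, a transmembrane-helix predictor and a non-classical secretion scorer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProteinPrediction:
    """Hypothetical per-protein summary of upstream predictor outputs."""
    protein_id: str
    has_signal_peptide: bool              # from a signal-peptide predictor (assumed)
    tm_helices_mature: int                # TM helices outside the signal peptide (assumed)
    nonclassical_score: Optional[float]   # non-classical secretion score (assumed)


def classify_secretion(p: ProteinPrediction, nc_threshold: float = 0.5) -> str:
    """Return 'classical', 'non-classical' or 'not secreted'.

    Assumed rules:
      * signal peptide and no TM helix in the mature chain -> classical
      * signal peptide plus a TM helix -> likely membrane-anchored, not secreted
      * no signal peptide and score >= nc_threshold -> non-classical candidate
    """
    if p.has_signal_peptide:
        return "classical" if p.tm_helices_mature == 0 else "not secreted"
    if p.nonclassical_score is not None and p.nonclassical_score >= nc_threshold:
        return "non-classical"
    return "not secreted"


# Toy usage with made-up predictor outputs.
predictions = [
    ProteinPrediction("p1", True, 0, None),
    ProteinPrediction("p2", False, 0, 0.72),
    ProteinPrediction("p3", True, 1, 0.90),
    ProteinPrediction("p4", False, 0, 0.31),
]
calls = {p.protein_id: classify_secretion(p) for p in predictions}
print(calls)
```

The non-classical score is consulted only when no signal peptide is predicted. This mirrors the distinction the abstract draws between proteins that proteomics detected in the secretome and those that signal-peptide prediction alone would miss.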