A bioinformatics approach to structure-based T cell epitope prediction
Thesis posted on 28.03.2022, 19:24 by Javed Mohammed Khan
The adaptive immune system in higher jawed vertebrates carries out antigen presentation and recognition in two steps. Major histocompatibility complexes (MHC) first bind immunogenic peptide epitopes (p) derived from antigens and present them as peptide-MHC (pMHC) complexes for subsequent recognition by T cell receptors (TR), leading to T cell activation. A decade after the first TR/pMHC structure was reported, the molecular basis of TR/pMHC interaction is still unknown. Peptide epitopes that bind strongly to MHC proteins are known to elicit a T cell response, albeit with ~50% efficiency, forming the basis of T cell-based peptide vaccines. Experimental identification of these epitopes is a tedious, time-consuming and expensive process. Computational methods are comparatively inexpensive and efficient in screening numerous peptides against their cognate MHC alleles. Sequence-based prediction methods are well established but are limited by the requirement for large datasets of known MHC-binding peptides. Structure-based prediction approaches, especially docking techniques, are universally applicable and particularly suited to alleles with limited data. For efficient vaccine design and to minimize experimental T cell binding assays, precise computational strategies are required for rapid prediction, for all alleles, of high-binding epitopes with a high propensity to activate T cells. Our group has previously developed an accurate structure-based docking protocol, from which prediction models for identifying high-binders have been developed. However, this method is not fast enough to scan an entire proteome for large-scale pathogen screening studies. We also need to understand the physicochemical basis of TR binding to pMHC, to screen high-binders for greater TR binding potential and eliminate those that do not lead to T cell activation.
These two specific aims are addressed in this thesis and applied to predict true T cell epitopes amongst high-binders for a disease-implicated MHC allele. pDOCK is a new, fast, accurate and robust method for high-throughput screening of pathogenic sequences, based on flexible docking of peptides to MHC-I and MHC-II proteins. Compared to our earlier docking methodology, pDOCK shows up to a 2.5-fold improvement in accuracy (7-fold compared to earlier published studies) and is ~60% faster. To dissect TR/pMHC interactions, I have collated and analysed 61 TR/pMHC crystallographic structures, available in the new database, MHC Peptide Interaction Database - version T2 (MPID-T2; http://www.biolinfo.org/mpid-t2). MPID-T2 is an updated and extended version of the earlier MPID-T database, augmented with advanced features and new parameters for analysis of pMHC and TR/pMHC structures. Based on this analysis, I have defined criteria for selecting peptides with a high probability of activating T cells. These criteria have been validated against published peptide mutation studies in which TR binding was changed or abolished. I have applied pDOCK and the TR binding criteria to predict "true" immunogenic epitopes from high MHC-binding peptides for the HLA-DQ8 allele, which is associated with celiac disease and insulin-dependent diabetes mellitus (IDDM). Our approach identified potential T cell epitopes, based on MHC and TR specificities and lacking conserved binding motifs, for experimental testing and validation. The high prediction accuracy for HLA-DQ8-binding peptides was validated by existing experimental, biochemical and functional data. The bioinformatic approaches developed in this thesis are novel, generic and applicable to the development of effective immunotherapeutics and highly specific peptide vaccines with wide population coverage, capable of eliciting a T cell response, thereby cutting down the lead time involved in experimental vaccine development protocols.