Functional annotation of olfactory receptors using in silico approaches
Olfactory receptors (ORs) are the largest family of seven-transmembrane G protein-coupled receptors (GPCRs). Some ORs are known to be involved in physiological and pathological functions in non-nasal tissues of the human body. Finding ligands for such “ectopic” ORs is therefore of therapeutic importance. The overall objective of this thesis is to functionally annotate ectopic human ORs by identifying their ligands through structural bioinformatics, cheminformatics and machine learning (ML) approaches.
Of the 405 human OR gene products, only thirty have protein-level evidence; the remaining 375 ORs, lacking substantial proteomic data, are labelled missing proteins (MPs). Because some of these missing ORs have “orthogonal” (non-mass-spectrometric) evidence for their existence, as a preliminary study I collated the experimental functional information available for them and organized it into five categories: deorphanization, site-directed mutagenesis, functional characterization, structural studies, and antibody and disease association. Orthogonal evidence is available for 107 missing ORs. Of these, 14 have sufficient deorphanization evidence for computational studies, and six of those are broadly tuned, with more than 25 known ligands each.
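The collation step above, and the split between broadly and narrowly tuned receptors that decides which workflow each OR enters later, can be sketched as follows. This is a minimal illustration, not the procedure used in the thesis: the record layout `(or_name, category, ligand)`, the helper names `collate` and `route`, and the receptor names `OR_A`, `OR_B`, `OR_C` are all hypothetical. The text does not state the cut-off for "sufficient" deorphanization evidence, so here any named ligand counts; only the "more than 25 known ligands" threshold comes from the text.

```python
from collections import defaultdict

# The five evidence categories named in the text.
CATEGORIES = (
    "deorphanization",
    "site-directed mutagenesis",
    "functional characterization",
    "structural studies",
    "antibody and disease association",
)

# From the text: "broadly tuned" means more than 25 known ligands.
BROAD_TUNING_THRESHOLD = 25


def collate(records):
    """Group hypothetical (or_name, category, ligand) records by OR and category.

    `ligand` is None for evidence that does not name a ligand.
    """
    evidence = defaultdict(lambda: defaultdict(set))
    for or_name, category, ligand in records:
        if category not in CATEGORIES:
            raise ValueError(f"unknown evidence category: {category!r}")
        evidence[or_name][category].add(ligand)
    return evidence


def route(evidence):
    """Split deorphanized ORs into an ML set (broadly tuned) and a structure-based set."""
    ml, structural = [], []
    for or_name in sorted(evidence):
        by_cat = evidence[or_name]
        ligands = {lig for lig in by_cat.get("deorphanization", set()) if lig is not None}
        if not ligands:
            continue  # no deorphanization evidence: not a screening candidate
        if len(ligands) > BROAD_TUNING_THRESHOLD:
            ml.append(or_name)
        else:
            structural.append(or_name)
    return ml, structural


# Toy records for three hypothetical receptors (not real data).
records = [("OR_A", "deorphanization", f"ligand_{i}") for i in range(30)]
records += [("OR_B", "deorphanization", f"ligand_{i}") for i in range(13)]
records += [("OR_C", "antibody and disease association", None)]
evidence = collate(records)
ml, structural = route(evidence)
print(len(evidence), ml, structural)  # 3 ['OR_A'] ['OR_B']
```

Here `len(evidence)` plays the role of the "ORs with orthogonal evidence" count, and `route` reproduces the rule that broadly tuned receptors go to ML while the rest go to structure-based screening.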
Applying ML to ORs requires the ligand data to be represented in terms of chemical features. Several ML methods are available, such as random forest (RF), support vector machine (SVM) and Naïve Bayes (NB). To identify the appropriate ML model (classifier) for the OR ligand prediction problem, I carried out a comprehensive assessment of ML classifiers for OR1G1. Based on its performance, the NB classifier was used to rank small molecules from four databases, and six putative agonists with known bioactivities were recommended for experimental validation.
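The classifier assessment described above amounts to training several candidate models on the same featurized ligand set and comparing their cross-validated performance. The sketch below shows that pattern on binary fingerprints, assuming each molecule is a set of "on" bit indices. It is self-contained, so RF and SVM (which would normally come from a library such as scikit-learn) are not implemented; instead a from-scratch Bernoulli Naïve Bayes (`FingerprintNB`) is compared against a 1-nearest-neighbour Tanimoto baseline. The class and function names, the 16-bit fingerprint length, the fold scheme, the use of plain accuracy as the metric and the toy data are all assumptions, not the thesis's settings.

```python
import math
from collections import Counter

N_BITS = 16  # hypothetical fingerprint length; real fingerprints are usually much longer


class FingerprintNB:
    """Bernoulli Naive Bayes over binary fingerprints, with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        counts = Counter(y)
        self.log_prior = {c: math.log(counts[c] / len(y)) for c in self.classes}
        self.p_on = {}
        for c in self.classes:
            members = [x for x, label in zip(X, y) if label == c]
            # P(bit b is on | class c), smoothed so no probability is 0 or 1
            self.p_on[c] = [(sum(b in x for x in members) + 1) / (len(members) + 2)
                            for b in range(N_BITS)]
        return self

    def log_joint(self, x, c):
        # log P(c) + sum over all bits of log P(observed bit state | c)
        return self.log_prior[c] + sum(
            math.log(p if b in x else 1.0 - p) for b, p in enumerate(self.p_on[c]))

    def predict(self, x):
        return max(self.classes, key=lambda c: self.log_joint(x, c))


def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class NearestNeighbour:
    """1-NN baseline: label of the most Tanimoto-similar training molecule."""

    def fit(self, X, y):
        self.train = list(zip(X, y))
        return self

    def predict(self, x):
        return max(self.train, key=lambda pair: tanimoto(x, pair[0]))[1]


def cross_val_accuracy(make_model, X, y, k=3):
    """Mean accuracy over k folds; fold i holds every k-th sample starting at i."""
    scores = []
    for i in range(k):
        test = [j for j in range(len(X)) if j % k == i]
        train = [j for j in range(len(X)) if j % k != i]
        model = make_model().fit([X[j] for j in train], [y[j] for j in train])
        scores.append(sum(model.predict(X[j]) == y[j] for j in test) / len(test))
    return sum(scores) / k


def select_classifier(candidates, X, y, k=3):
    """Return (name, mean accuracy) of the best-scoring candidate."""
    results = {name: cross_val_accuracy(f, X, y, k) for name, f in candidates.items()}
    best = max(results, key=results.get)
    return best, results[best]


# Toy data (not real OR ligands): actives use bits 0-3, inactives bits 8-11.
actives = [{0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3}, {0, 1, 2, 3}, {0, 1}]
inactives = [{8, 9, 10}, {8, 9, 11}, {8, 10, 11}, {9, 10, 11}, {8, 9, 10, 11}, {8, 9}]
X = [frozenset(s) for pair in zip(actives, inactives) for s in pair]
y = [label for _ in actives for label in (1, 0)]
print(select_classifier({"NB": FingerprintNB, "1-NN": NearestNeighbour}, X, y))
```

Interleaving actives and inactives keeps both classes in every fold; on real, imbalanced OR ligand sets a stratified split and a metric that is robust to imbalance would be more appropriate than raw accuracy.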
The ML-based ligand prediction method was further extended to two more ORs of pharmacological importance: OR1A1 and OR2W1. As the RF classifier outperformed the other ML approaches on the available ligand data for these receptors, it was used to rank a large chemical space of 22,938,816 compounds from five small-molecule databases. The top predictions of the classifier were tested in luciferase assays by collaborators, and on the basis of the experimental results three compounds for OR1A1 and one for OR2W1 were identified as potent agonists.
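Ranking a library of roughly 23 million compounds only requires keeping the best-scoring few in memory. The sketch below streams (compound ID, fingerprint) pairs from several databases and keeps the top `top_k` with `heapq.nlargest`, which holds a heap of size `top_k` rather than the whole library. In the workflow above the score would be the trained RF classifier's predicted probability of activity; because that model is not reproduced here, `make_similarity_scorer` (maximum Tanimoto similarity to known agonists) is a hypothetical stand-in. All function names and the toy databases are illustrative assumptions.

```python
import heapq
from itertools import chain


def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def make_similarity_scorer(known_agonists):
    """Stand-in scorer: maximum Tanimoto similarity to any known agonist.

    In the workflow described in the text this role is played by the
    trained RF classifier's predicted probability of activity.
    """
    refs = [frozenset(r) for r in known_agonists]
    return lambda fp: max(tanimoto(fp, r) for r in refs)


def rank_libraries(libraries, score_fn, top_k=10):
    """Score compounds from several databases and return the top_k.

    `libraries` is an iterable of iterables yielding (compound_id, fingerprint).
    The generator is consumed lazily, and heapq.nlargest keeps only top_k
    items in memory, so the full library is never materialised.
    Returns a list of (score, compound_id), best first.
    """
    scored = ((score_fn(fp), cid) for cid, fp in chain.from_iterable(libraries))
    return heapq.nlargest(top_k, scored)


# Two toy "databases" (not real compounds).
db_a = [("A1", frozenset({0, 1, 2})), ("A2", frozenset({7, 8}))]
db_b = [("B1", frozenset({0, 1})), ("B2", frozenset({0, 1, 2, 3}))]
score = make_similarity_scorer([{0, 1, 2, 3}])
for s, cid in rank_libraries([db_a, db_b], score, top_k=3):
    print(cid, round(s, 2))
```

This prints `B2 1.0`, `A1 0.75` and `B1 0.5`. Real databases overlap, so deduplicating compounds (for example by a canonical identifier) before or during streaming would avoid the same molecule appearing several times in the shortlist.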
For ORs with fewer than 25 known ligands, a structure-based approach to ligand prediction is more suitable than ML. However, no OR structure has yet been solved experimentally, so a reliable structural model must be generated. Because ORs share low sequence identity with GPCRs of known structure, selecting an appropriate template is challenging, as an initial model-building study showed. To address this, I developed a novel biophysical method for GPCR template selection. The method takes into account the hydrophobicity profile, the similarity of the orthosteric ligand-binding sites, and the resolution of the candidate structures to identify the appropriate template for a query GPCR.
The proposed biophysical approach was used to select the homology-modelling template for OR1A2, a narrowly tuned OR implicated in hepatocarcinoma. As OR1A2 has only 13 known ligands, I developed a two-stage virtual screening approach to identify putative ligands for it. In the first stage, small molecules were screened using a ligand-based pharmacophore approach based on atomic property fields; in the second stage, structure-based virtual screening (SBVS) was carried out using the OR1A2 homology model. The top five putative ligands were identified and further confirmed using molecular dynamics simulations. Four of the putative ligands have been recommended for in vitro testing.
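The two-stage screen described above is a filter followed by a ranking: a cheap ligand-based similarity test removes most of the library, and only the survivors are docked against the homology model. The sketch below shows that control flow under clear simplifications. Atomic property fields are not implemented; a cosine similarity over hypothetical pharmacophore-feature counts stands in for stage 1, and `dock` is a placeholder callable that would wrap a docking program in practice (more negative scores taken as better). The feature labels, threshold, function names and scores are all illustrative, and the molecular dynamics step is not shown.

```python
def cosine(a, b):
    """Cosine similarity between two pharmacophore-feature count dicts."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def stage1_pharmacophore(library, references, threshold=0.8):
    """Stage 1: keep compounds whose best similarity to a known ligand >= threshold."""
    return [cid for cid, feats in library.items()
            if max(cosine(feats, r) for r in references) >= threshold]


def stage2_docking(candidates, dock, top_n=5):
    """Stage 2: rank stage-1 survivors by docking score (more negative = better)."""
    return sorted(candidates, key=dock)[:top_n]


def two_stage_screen(library, references, dock, threshold=0.8, top_n=5):
    hits = stage1_pharmacophore(library, references, threshold)
    return stage2_docking(hits, dock, top_n)


# Hypothetical feature counts for one known ligand and four library compounds.
refs = [{"HBA": 1, "HYD": 3}]
library = {
    "c1": {"HBA": 1, "HYD": 3},
    "c2": {"HBA": 1, "HYD": 2},
    "c3": {"HBD": 2, "POS": 1},
    "c4": {"HYD": 3},
}
dock_scores = {"c1": -6.2, "c2": -7.9, "c3": -9.5, "c4": -5.0}  # made-up values
print(two_stage_screen(library, refs, dock_scores.get, top_n=2))  # ['c2', 'c1']
```

Compound `c3` has the best docking score but never reaches stage 2 because it shares no pharmacophore features with the known ligand; this is the intended effect of filtering before the more expensive structure-based step.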
This biophysical method for template selection was implemented as a tool, Bio-GATS. The tool extends the method to all GPCRs and enables the user either to select a template automatically for a query GPCR sequence or to browse the list of available GPCR structures.
In summary, I applied an ML-based workflow for agonist prediction to three broadly tuned ORs. As a result, three novel agonists for OR1A1 and one novel agonist for OR2W1 have been experimentally verified, and six putative agonists identified for OR1G1 await in vitro testing. For narrowly tuned ORs, a homology model is essential for SBVS. Four putative agonists were identified for OR1A2 by two-stage screening, using a homology model built on a template chosen with the novel biophysical approach for GPCR template selection. The workflows presented here are suitable for the functional annotation of ORs and for integration into drug discovery pipelines for pharmacologically relevant GPCRs.