Macquarie University

Molecular similarity and diversity analysis of bioactive small molecules using chemoinformatics approaches

posted on 2022-03-28, 13:59 authored by Varun Khanna
The search for pharmaceutically interesting compounds using computational methods is the core idea in chemoinformatics. With the advent of combinatorial synthesis and high-throughput screening (HTS), researchers and the drug industry are currently able to screen millions of compounds each day. However, improvements in screening capabilities have failed to yield a proportionate increase in novel chemotypes. Given the sheer number of compounds in one of the most popular chemistry databases, PubChem, it is irrational to experimentally screen all compounds for a potential target. This thesis aims to study the property space occupied by therapeutic compounds of economic importance obtained from public datasets, using chemoinformatics tools and computational technologies. With this objective in mind, a comprehensive review of current chemoinformatics research, with a particular emphasis on drug discovery, was carried out. In addition, the most commonly used, freely available small molecule databases and algorithms for small molecule analysis were reviewed. Further, recent developments in computational library design techniques were summarized in a separate review article. For web-based analysis and visualization of small molecules, I have developed the chemoinformatics analysis module for the Customary Medicinal Knowledgebase (CMKb), which has served as a prototype to integrate the use of medicinal plants among Australian Aboriginals with bioactives, for identifying potential lead compounds. In order to examine the similarity of current drug molecules with human metabolites and toxics, a preliminary comparative study based on several computed physicochemical properties and functional groups was carried out. We established that searching against complete datasets gave results comparable to those obtained from clustered data.
We then used a multi-criteria approach to analyse physicochemical properties, scaffold architecture and fragment occurrence among large public datasets of biological interest, viz. drugs, metabolites, toxics, natural products, lead compounds and the ChEMBL dataset. Fragments are often dependent on each other, and therefore fragment co-occurrences were further assessed by association analysis. Going beyond the general datasets, a nematode-specific anthelmintic dataset was also analysed. Machine learning methods were used to screen potential anthelmintic compounds from public collections, and novel anthelmintics have been identified. From our preliminary analysis, it was established that although the physicochemical property spaces occupied by drugs, human metabolites and toxics were distinct, present-day drugs are more akin to toxic compounds than to metabolites. This result is in accordance with the high attrition rates in drug discovery projects. Furthermore, we concluded that empirical rules such as Lipinski’s “rule of five” can be supplemented to include toxicity information. Following the preliminary study on physicochemical properties, our subsequent comprehensive analysis corroborated the earlier finding that metabolites are least similar to current-day drugs. However, in scaffold analysis we found that over 42.0% of the non-redundant metabolite scaffolds are represented among drugs, which suggests that drugs and metabolites differ mainly in side chains and linkers but share much of the scaffold space. Additionally, a robust statistical technique known as association analysis was explored for the first time in chemoinformatics to carry out efficient mining and fragment co-occurrence analysis.
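
As a rough illustration of what association analysis of fragment co-occurrence involves, the sketch below computes the standard support, confidence and lift measures for pairs of fragments. It is not taken from the thesis: the function name `fragment_pair_rules`, the `min_support` threshold and the toy fragment sets are all assumptions made for this example, and a real study would derive the fragment sets from a chemoinformatics toolkit rather than write them by hand.

```python
from collections import Counter
from itertools import combinations


def fragment_pair_rules(molecules, min_support=0.3):
    """Return (antecedent, consequent, support, confidence, lift) tuples.

    `molecules` is a list of sets, each holding the fragment labels found
    in one molecule (hypothetical input format for this sketch).
    """
    n = len(molecules)
    single = Counter()  # number of molecules containing each fragment
    pair = Counter()    # number of molecules containing both fragments
    for frags in molecules:
        single.update(frags)
        pair.update(combinations(sorted(frags), 2))

    rules = []
    for (a, b), count in pair.items():
        support = count / n  # fraction of molecules containing both
        if support < min_support:
            continue
        # Emit the rule in both directions: x -> y and y -> x.
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]         # estimate of P(y | x)
            lift = confidence / (single[y] / n)    # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    return rules


# Toy data: fragment labels are illustrative only.
molecules = [
    {"benzene", "amide"},
    {"benzene", "amide", "hydroxyl"},
    {"benzene", "hydroxyl"},
    {"amide"},
    {"benzene", "amide"},
]

rules = fragment_pair_rules(molecules)
by_pair = {(x, y): (s, c, l) for x, y, s, c, l in rules}
for (x, y), (s, c, l) in sorted(by_pair.items()):
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 indicates that two fragments occur together more often than would be expected if they were independent, and a lift below 1 indicates the opposite. Pairs whose support falls under the threshold are dropped before any rule is generated.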


Table of Contents

1. Introduction and literary survey -- 2. Methods and applications -- 3. Development of CMKb chemoinformatics module to digitize and store chemical information -- 4. Comparison of physiochemical property space among human metabolites, drugs and toxins -- 5. Scaffold and fragment co-occurrence studies on datasets of biological interest -- 6. Virtual screening of compounds active against parasitic nematodes of major socio-economic importance -- 7. General discussion and conclusion.


"A thesis submitted to Macquarie University in fulfilment of the degree of Doctor of Philosophy". Includes bibliographical references. Two articles were suppressed due to copyright restrictions. Thesis by publication. "March 2011".

Awarding Institution

Macquarie University

Degree Type

Thesis PhD


PhD, Macquarie University, Faculty of Science, Department of Chemistry and Biomolecular Sciences

Department, Centre or School

Department of Chemistry and Biomolecular Sciences

Year of Award


Principal Supervisor

Shoba Ranganathan


Copyright disclaimer: Copyright Varun Khanna 2011.




1 online resource (xvi, 232 pages): illustrations, charts, graphs

Former Identifiers

mq:33438 2176490
