Macquarie University

Protein structure fitting by a self-organising map spectroscopy (SOMSpec): circular dichroism and infrared spectroscopy

thesis
posted on 2024-08-16, 01:46 authored by Adewale Olamoyesan

Computational algorithms developed by Vincent Hall, Anthony Nash, and Alison Rodger give researchers working on proteins a valuable tool for evaluating secondary structure. Their original structure-fitting algorithm, the Secondary Structure Neural Network (SSNN), was based on a self-organizing map (SOM), a type of artificial neural network that learns without supervision how to classify training input by creating nodes as weighted sums of the input data. Unknowns are then located on the map by determining their best matching units. SSNN was initially designed specifically for determining secondary structure from circular dichroism (CD) spectra of proteins. Later, Dale Ang generalised the SSNN program for use with CD, infrared (IR) and Raman spectra; the revised program was named SOMSpec, which stands for self-organizing map spectroscopy. Although SOMSpec provides spectroscopists with a vital tool for predicting the structural elements of proteins and peptides, the work in this thesis also required a suite of applications that prepare input data for the algorithm and extract its output for data analysis. Four of the MATLAB units developed here automate our previous ad hoc approaches for deriving derandomized spectra, and the last unit in the tab container helps prepare train and test files for an alternate leave-one-out cross-validation analysis.
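
As a rough illustration of the self-organizing map idea described above, the following Python sketch locates the best matching unit for an unknown spectrum and applies one training-style update to that node. It is not the SSNN or SOMSpec code: the function names, the toy four-point spectra, the Euclidean distance measure and the learning rate are all assumptions made for the example.

```python
import math

def best_matching_unit(nodes, spectrum):
    """Return the index of the node closest (Euclidean distance) to the spectrum."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], spectrum))

def update_node(node, spectrum, learning_rate):
    """Move a node toward an input spectrum.

    Repeated updates of this kind leave each node as a weighted sum of
    the inputs it has been pulled toward.
    """
    return [w + learning_rate * (x - w) for w, x in zip(node, spectrum)]

# Toy map: three nodes, each a 4-point "spectrum" (hypothetical values).
nodes = [
    [1.0, 0.5, -0.5, -1.0],
    [0.0, 0.0, 0.0, 0.0],
    [-1.0, -0.5, 0.5, 1.0],
]
unknown = [0.9, 0.4, -0.6, -0.8]

bmu = best_matching_unit(nodes, unknown)  # locate the unknown on the map
nodes[bmu] = update_node(nodes[bmu], unknown, learning_rate=0.5)
```

In a full SOM the neighbours of the best matching unit would also be updated, with a learning rate and neighbourhood radius that shrink during training; those details are omitted here.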

The first objective of this work is to extract structural information about previously difficult proteins with suspected significant unfolded domains using the revised algorithm. Varying fractions of an unstructured (random coil, RC) spectrum are removed from the spectra of typical naturally occurring proteins, after the baseline spectrum has been subtracted and the spectra have been converted from their original units to molar extinction; the resultant spectra are referred to as derandomized. The structures of the derandomized proteins are evaluated with the algorithm, followed by reconstitution of the structural elements of the original experimental proteins. With the understanding gained from applying this approach to three important proteins as a function of temperature, it was then applied to peptides that have folded or unfolded structures, or that may be heterogeneous mixtures of the two states. The experimental data used for these peptides enabled the effects of buffer addition and time on the structure of the peptides to be identified.
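
The derandomization and reconstitution steps can be pictured with a minimal Python sketch. The abstract does not give the exact formulas, so the ones used here are assumptions: the derandomized spectrum is taken as (spectrum - f x RC) / (1 - f), and reconstitution scales the fitted structure fractions by (1 - f) and returns the removed fraction f as random coil. The function names, spectra and structure fractions are hypothetical.

```python
def derandomize(spectrum, rc_spectrum, fraction):
    """Subtract fraction * RC spectrum, then rescale by 1 / (1 - fraction).

    The rescaling is an assumed normalisation, not taken from the thesis.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    scale = 1.0 - fraction
    return [(s - fraction * r) / scale for s, r in zip(spectrum, rc_spectrum)]

def reconstitute(structure, fraction):
    """Scale fitted fractions by (1 - fraction) and add the removed part back as random coil."""
    result = {name: value * (1.0 - fraction) for name, value in structure.items()}
    result["random_coil"] = result.get("random_coil", 0.0) + fraction
    return result

# Hypothetical 3-point spectra in molar extinction units.
folded = [2.0, -1.0, -3.0]
rc = [-4.0, 0.5, 0.0]
observed = [0.6 * f + 0.4 * r for f, r in zip(folded, rc)]  # 40% random coil

recovered = derandomize(observed, rc, fraction=0.4)

# Hypothetical structure fitted to the derandomized spectrum.
fitted = {"helix": 0.7, "sheet": 0.2, "random_coil": 0.1}
original = reconstitute(fitted, fraction=0.4)
```

In practice the unfolded fraction is not known in advance, which is why varying fractions are removed and each derandomized spectrum is evaluated in turn.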

The second objective of this work is to implement spectral prediction and structure fitting of solid-state and aqueous-state protein IR spectra via leave-one-out cross-validation (LOOCV). The algorithm is trained with one protein fewer than in the original reference set, and the resulting map is used to predict the structure and spectrum of the excluded protein. The algorithm's ability to do this accurately is then assessed by comparing the predicted results with the experimental ones. In this work SOMSpec was complemented with a module (IR train/test) in the MATLAB suite dedicated to preparing train and test sets from an IR reference set previously used for SOMSpec analysis. Two options are available for evaluating the secondary structure of a set of proteins with SOMSpec; the option that takes a few minutes was chosen over the other, which takes considerable time (up to a day, and sometimes stalls and fails to complete). With the chosen option, the set of proteins is trained with the train module and the generated trained map is then tested with another module (named test). In addition, an alternative to LOOCV analysis was devised to explore how SOMSpec trained with the full transmission reference set would affect the quality of spectral and structure prediction for another reference set with approximately half the population of the transmission reference set. This approach was implemented by alternating which reference set was used as the training set and which as the testing set for SOMSpec.
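
The leave-one-out procedure can be sketched in Python as follows. This is only an outline of the cross-validation loop: the nearest-spectrum model is a simple stand-in and not SOMSpec, and the function names, protein names, spectra and structure fractions are hypothetical.

```python
import math

def loocv(reference_set, train, predict):
    """For each protein, train on all the others and predict the one left out."""
    results = {}
    for name in reference_set:
        training = {k: v for k, v in reference_set.items() if k != name}
        model = train(training)
        spectrum, true_structure = reference_set[name]
        results[name] = (predict(model, spectrum), true_structure)
    return results

# Stand-in model (NOT SOMSpec): return the structure of the nearest reference spectrum.
def train_nearest(training):
    return list(training.values())

def predict_nearest(model, spectrum):
    return min(model, key=lambda entry: math.dist(entry[0], spectrum))[1]

def rmsd(predicted, actual):
    """Root-mean-square deviation between predicted and experimental fractions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical reference set: name -> (spectrum, [helix, sheet, other]).
reference_set = {
    "protein_a": ([1.0, 0.2, -0.8], [0.70, 0.10, 0.20]),
    "protein_b": ([0.9, 0.1, -0.7], [0.60, 0.15, 0.25]),
    "protein_c": ([-0.5, 0.6, 0.3], [0.10, 0.50, 0.40]),
}

results = loocv(reference_set, train_nearest, predict_nearest)
errors = {name: rmsd(pred, true) for name, (pred, true) in results.items()}
```

The alternate approach described above would replace the loop with two runs, training on one whole reference set and testing on the other, then swapping their roles.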

The work in this thesis enables researchers and the biopharmaceutical industry to readily implement SOMSpec with knowledge of how reliable it will be for different types of proteins. For example, highly helical proteins will be identified, but in some instances it is unclear whether the algorithm accurately predicts the actual percentage of this structure type.

History

Table of Contents

Chapter 1. Introduction -- Chapter 2. Development of MATLAB applications for processing circular dichroism or infrared spectra and SOMSpec output files -- Chapter 3. Circular dichroism for secondary structure determination of protein with unfolded domain using a self-organizing map SOMSpec -- Chapter 4. Testing infrared reference sets collected with different sampling modes in leave-one-out cross-validation and alternate approach -- Chapter 5. Application of derandomisation of peptide circular dichroism spectra to determine secondary structure content of peptides -- Chapter 6. General conclusions -- Appendices

Awarding Institution

Macquarie University

Degree Type

Doctor of Philosophy

Degree

Thesis PhD

Department, Centre or School

Department of Molecular Sciences

Year of Award

2021

Principal Supervisor

Alison Rodger

Rights

Copyright: The Author Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

430 pages
