Macquarie University
01whole.pdf (7.27 MB)

The investigation of amyotrophic lateral sclerosis candidate gene variants

posted on 2024-05-08, 03:51 authored by Sandrine Kim Kiow Chan Moi Fat

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease caused by the death of upper and lower motor neurons. The only validated causes of ALS are gene mutations, and the understanding of underlying disease mechanisms remains limited. Approximately 10% of cases are classified as familial, with one-third carrying an unknown gene mutation. The remaining 90% are classified as sporadic ALS. While most sporadic ALS is thought to be caused by a complex interaction of unknown genetic and environmental risk factors, around 10% of sporadic cases harbour a known causative gene mutation. The main goal of this thesis was to develop and apply innovative bioinformatics approaches to identify novel genes and mutations that cause or confer risk of ALS, using next-generation sequencing (NGS) data from Australian familial and sporadic ALS patients. Specifically, the aims were to 1) develop bioinformatics scripts and pipelines to manipulate and analyse NGS data; 2) apply gene-burden analysis techniques to identify disease-relevant genes; 3) investigate candidate ALS genes by screening NGS data from large cohorts of sporadic and singleton familial cases; and 4) identify and prioritise novel familial ALS genes and variants (both single nucleotide variants and structural variants) within small families with a history of ALS. Given that the study of structural variants is in its infancy, innovative scripts and techniques had to be developed and applied, while taking a multitude of factors into consideration. To address the aims using the available datasets, a wide range of scripting strategies and pipelines were developed by the PhD candidate using R and Unix environments for NGS data manipulation and analysis. Data included whole genome sequencing data from 609 sporadic ALS patients and whole exome sequencing data from 81 unrelated familial ALS cases.
A candidate gene screening pipeline was developed to identify and prioritise novel and rare variants in candidate genes with potential to contribute to disease. Briefly, variants present in target genes were identified and filtered using various metrics to identify potential disease-relevant genes. All variants underwent a comprehensive in silico analysis and prioritisation pipeline to predict their pathogenicity and prioritise them for future in vitro studies. Control databases were consulted to exclude variants present in healthy control individuals. This approach led to the identification of 30 “high priority” candidate variants from 16 genes and resulted in three publications. The first study (Manuscript I) examined two recently reported ALS candidate genes (GLT8D1 and ARPP21) and found no disease-associated variation. It was concluded that these genes were not a common cause of ALS in Australian cohorts, highlighting the importance of replication studies for validating newly reported ALS genes. For Manuscripts II and III, the kynurenine pathway (KP) and tryptophan metabolism (Manuscript II) and the nicotinamide adenine dinucleotide (NAD) pathway (Manuscript III) were investigated as candidate pathogenic mechanisms. As such, this thesis screened a large number of genes involved in these pathways for a role in disease. For Manuscript II, thesis studies identified 84 rare and novel protein-altering variants from sporadic ALS patients, and five genes showed significant evidence from burden testing for a role in disease. For Manuscript III, 41 rare or novel protein-altering variants were identified from sporadic ALS patients, and burden testing was significant for nine genes. These two studies suggest that genetic variants in genes that alter NAD+ levels and the by-products of the KP are associated with sporadic ALS, may confer risk of developing the disease, and warrant further investigation.
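The abstract does not state which statistical test was used for burden testing. As a minimal sketch of the general idea only, the example below applies a one-sided Fisher's exact test to the number of cases and controls carrying at least one qualifying variant in a gene. The function name, the choice of test, and the toy counts are all assumptions made for illustration; they are not taken from the thesis, which was carried out in R and Unix rather than Python.

```python
from math import comb


def fisher_exact_greater(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test (hypothetical helper, not from the thesis).

    Returns the probability, under the null hypothesis of no association,
    of observing at least `case_carriers` carriers among the cases, given
    the fixed totals of carriers, cases, and controls.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    # Largest number of carriers that could possibly fall among the cases.
    upper = min(carriers, case_total)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / denominator


# Toy counts, invented for illustration: 3 of 3 cases and 0 of 3 controls
# carry a qualifying variant in some gene.
p_value = fisher_exact_greater(3, 3, 0, 3)
print(f"one-sided p = {p_value:.3f}")
```

In practice a per-gene test like this would be repeated across every screened gene, so the resulting p-values would need correction for multiple testing before any gene is called significant.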
Novel gene discovery pipelines, including bioinformatics filtering and in silico prioritisation, were also successfully developed and applied to identify candidate mutations in two small ALS families (MQ52 and FALS147) that were underpowered for traditional genetic studies. Briefly, gene discovery in families first involved the identification of shared variants present in multiple affected individuals (MQ52) and absent in a “married-in” family control (FALS147), which then underwent further filtering and analysis to identify a list of putative causative mutations in each family. Variants were filtered to include all protein-altering variants present in the affected individuals, and absent, or at a very low frequency (allele count ≤5), in controls. Family analysis first considered small nucleotide-level variants and was later extended to structural variants to provide a comprehensive analysis of potential genetic variation causing familial ALS. Seventeen candidate mutations were identified in Family MQ52. This included 14 small nucleotide-level variants and three structural variants. Of these, seven were scored as “high priority” by the in silico pipeline. In Family FALS147, 111 candidate mutations were identified. This included 24 single nucleotide variants and 87 structural variants. Of these, 11 were scored as “high priority” by the in silico pipeline. This work provided multiple novel candidate ALS gene mutations from the analysis of large ALS patient cohorts and two small ALS families that have been prioritised for downstream investigation in in vitro and in vivo studies. Importantly, the strategies developed during this thesis will continue to be used by our research team, and will be available for others, to investigate newly available ALS families, screen for novel candidate genes, and perform genome-wide burden analysis to identify novel ALS-associated genes or disease pathways that may offer new targets for therapeutic development.
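The family filtering criteria described above can be sketched roughly as follows. This is an illustrative outline only, not the thesis pipeline (which was written in R and Unix): the `Variant` record, the consequence labels, the sample names, and the function name are hypothetical. Only the allele count threshold of 5 in controls comes from the abstract.

```python
from dataclasses import dataclass

# Hypothetical consequence labels treated as protein-altering in this sketch.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site", "inframe_indel"}
# Threshold stated in the abstract: absent or allele count <= 5 in controls.
MAX_CONTROL_ALLELE_COUNT = 5


@dataclass(frozen=True)
class Variant:
    variant_id: str
    consequence: str
    control_allele_count: int  # allele count in a control database
    carriers: frozenset        # family members carrying the variant


def filter_family_variants(variants, affected, married_in_controls=()):
    """Keep variants that meet all four criteria from the abstract."""
    affected = set(affected)
    controls = set(married_in_controls)
    kept = []
    for v in variants:
        if not affected <= v.carriers:
            continue  # not shared by every affected individual
        if v.carriers & controls:
            continue  # present in a married-in family control
        if v.consequence not in PROTEIN_ALTERING:
            continue  # not predicted to alter the protein
        if v.control_allele_count > MAX_CONTROL_ALLELE_COUNT:
            continue  # too common in controls
        kept.append(v)
    return kept


# Invented example family: two affected members and one married-in control.
example_variants = [
    Variant("var_A", "missense", 0, frozenset({"aff1", "aff2"})),
    Variant("var_B", "missense", 0, frozenset({"aff1"})),
    Variant("var_C", "synonymous", 0, frozenset({"aff1", "aff2"})),
    Variant("var_D", "frameshift", 12, frozenset({"aff1", "aff2"})),
    Variant("var_E", "nonsense", 5, frozenset({"aff1", "aff2", "spouse"})),
    Variant("var_F", "nonsense", 5, frozenset({"aff1", "aff2"})),
]
candidates = filter_family_variants(
    example_variants, affected={"aff1", "aff2"}, married_in_controls={"spouse"}
)
print([v.variant_id for v in candidates])
```

The sketch requires a variant to be shared by all affected individuals passed in, which is stricter than "multiple affected individuals"; relaxing that check to a minimum carrier count would be one way to allow for phenocopies or missing genotypes.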
There is currently no cure or effective treatment for ALS, but progress in unravelling the ALS genetic architecture, facilitated by NGS and newly developed bioinformatics strategies such as those developed here, provides hope for the future.


Table of Contents

1 Introduction -- 2 Subjects and methods -- 3 Development of bioinformatics pipelines and strategies for genetic data analysis -- 4 Analysis of genetic variation in ALS candidate genes from patient cohorts -- 5 Novel gene discovery and prioritisation in small ALS families -- 6 Discussion -- A Appendix -- List of Symbols -- References


Additional Supervisor 3: Shu Yang

Awarding Institution

Macquarie University

Degree Type

Thesis PhD


Doctor of Philosophy

Department, Centre or School

Macquarie Medical School

Year of Award


Principal Supervisor

Ian Blair

Additional Supervisor 1

Jennifer Fifita

Additional Supervisor 2

Emily McCann


Copyright: The Author Copyright disclaimer:




536 pages

Former Identifiers

AMIS ID: 245138
