Estimation of the genome-wide number and patterns of putative somatic mutations within individuals of Eucalyptus pauciflora
Thesis posted on 2022-03-28, 10:49, authored by Eder Alejandro Morales-Suárez
Somatic mutations occur in the body cells of an individual organism. In plants, somatic mutations can accumulate as the plant grows, causing genetic variation between cells within an individual. Furthermore, in plants, unlike in animals, somatic mutations can be heritable, with potential effects including a reduction in fitness through the accumulation of deleterious mutations, an increase in genetic diversity in clonal populations, and increased survival under stressful environments. Despite the evidence for the occurrence and consequences of somatic mutations in plants, very little is known about the extent to which they occur across the genome. Even with the progress of next-generation sequencing technology, the lack of standard bioinformatics pipelines makes it difficult to investigate somatic mutations in plants. In this thesis, I present the application of a novel bioinformatics pipeline for somatic variant discovery across the entire genome of three individuals of the snow gum tree Eucalyptus pauciflora. We selected individuals of E. pauciflora (located at the Thredbo Kosciusko National Park in Australia) and sequenced the full genome of eight branches in triplicate within each tree. First, using the sequence data of one individual tree (Chapter 2), I generated a pseudo-reference genome for this non-model species and describe the corresponding method. Next, I present a novel bioinformatics pipeline for the genome-wide discovery of variable sites with different levels of filtering. I also show how a statistical/phylogenetic approach can serve as a positive-control analysis to assess the relatedness of genotypic information within trees. I then apply the analysis described above to two more trees and compare the numbers and patterns of putative somatic mutations among individuals (Chapter 3). I also show the impact of using different reference genomes on improving mapping quality and reducing the general error rate (Chapter 4).
This investigation allowed me to assess how important mapping quality is for the discovery of variable sites within trees. Finally, I present the results of a gene annotation prediction analysis that determined the genomic location of the putative somatic mutations identified with my protocols (Chapter 4). These results provide the first genome-wide comparison of the number and patterns of putative somatic mutations in more than one individual using the same bioinformatics pipeline, and open the possibility of exploring the potential fitness effects of somatic mutations within plants.