Posted on 2022-03-28, 19:40, authored by Madawa Priyadarshana Weerasinghe Jayawardana Rathambalage
Heterogeneity in observed data is a common feature that statisticians have to deal with when analyzing data. Estimating the points at which an observed process changes not only helps to model the underlying phenomenon better, but also supports more informed decision-making. In health informatics, when analyzing the genomes of patients with complex diseases, locating these changes is a pivotal step in finding disease-causing genes or active regions of the genome that have functional importance in characterizing these diseases. Change-point analysis methods are among the best approaches for locating such important genomic variations. Detecting these variations helps researchers and practitioners assess disease progression, prognosis and the efficacy of treatments. Thus, at the patient level, it supports better personalized medicine to alleviate a disease.
The overall research aim of this thesis is to introduce the Cross-Entropy (CE) method, a model-based stochastic optimization procedure that belongs to the family of evolutionary computing techniques, to estimate both the number of change-points and their locations in biological sequences. In particular, we focus on analyzing array comparative genomic hybridization (aCGH) data and DNA read count data obtained through next-generation sequencing (NGS) methods. Several variants of the CE method are proposed in this work to detect change-point locations in both continuous and discrete (count) data, and different model selection criteria are used within the CE method to estimate the optimal number of change-points. Evolutionary computing methods are known to consume considerable computational resources due to the nature of their implementation. In this thesis we propose two alternative solutions to improve the efficiency of the general CE algorithm. First, we develop a multi-core parallel implementation of the CE algorithm in the R statistical computing environment. Second, for the first time in the literature, we combine two powerful sequential detection techniques with the CE method to further increase its efficiency. We also explore the feasibility of incorporating auxiliary information into the CE-based change-point detection process using the generalized additive model for location, scale and shape (GAMLSS). A series of extensive simulations, reported across multiple publications, was performed to establish the procedures and to assess their efficacy. We apply the proposed variants of the CE method to both aCGH data and NGS DNA read count data to detect copy number variations. The methods discussed in this thesis are freely available as an R package named “breakpoint” at http://cran.r-project.org/web/packages/breakpoint/index.html.
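The core CE loop for locating change-points can be illustrated with a minimal sketch. It assumes a Gaussian mean-shift model scored by the residual sum of squares, an independent rounded normal sampling density for each change-point location, and a BIC-style penalty for choosing the number of change-points. The function names `segment_cost`, `ce_changepoints`, `bic` and `select_n_changepoints`, and all parameter defaults, are hypothetical; this is not the code of the breakpoint package.

```python
import math
import random


def segment_cost(data, cps):
    """Residual sum of squares of a piecewise-constant mean fit with breaks at cps."""
    bounds = [0] + sorted(cps) + [len(data)]
    rss = 0.0
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = data[a:b]
        mean = sum(seg) / len(seg)
        rss += sum((x - mean) ** 2 for x in seg)
    return rss


def ce_changepoints(data, n_cp, n_samples=200, rho=0.05, alpha=0.8,
                    max_iter=50, min_len=2, seed=0):
    """Cross-entropy search for n_cp change-point locations (hypothetical sketch).

    Each location j is sampled from a rounded normal N(mu[j], sd[j]); the best
    rho fraction of samples (the elite set) re-estimates mu and sd, smoothed by alpha.
    """
    rng = random.Random(seed)
    n = len(data)
    mu = [n * (j + 1) / (n_cp + 1) for j in range(n_cp)]  # evenly spaced start
    sd = [n / 4.0] * n_cp                                  # wide initial spread
    n_elite = max(1, int(rho * n_samples))
    best, best_cost = None, math.inf
    for _ in range(max_iter):
        scored = []
        for _ in range(n_samples):
            cps = sorted(min(n - 1, max(1, round(rng.gauss(m, s))))
                         for m, s in zip(mu, sd))
            bounds = [0] + cps + [n]
            # Discard candidates that create segments shorter than min_len.
            if min(b - a for a, b in zip(bounds[:-1], bounds[1:])) < min_len:
                continue
            scored.append((segment_cost(data, cps), cps))
        if not scored:
            continue
        scored.sort(key=lambda t: t[0])
        if scored[0][0] < best_cost:
            best_cost, best = scored[0]
        elite = [cps for _, cps in scored[:n_elite]]
        # Smoothed maximum-likelihood update of each location's normal density.
        for j in range(n_cp):
            vals = [cps[j] for cps in elite]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            mu[j] = alpha * m + (1 - alpha) * mu[j]
            sd[j] = alpha * s + (1 - alpha) * sd[j]
        if max(sd) < 0.5:  # sampling density has (almost) degenerated
            break
    return best, best_cost


def bic(data, cps, rss):
    """BIC-style score for a Gaussian mean-shift model (assumed penalty)."""
    n, k = len(data), len(cps)
    # Parameters: k locations, k + 1 segment means, one common variance.
    return n * math.log(max(rss, 1e-12) / n) + (2 * k + 2) * math.log(n)


def select_n_changepoints(data, max_cp=3, seed=0):
    """Run the CE search for 0..max_cp change-points and keep the lowest BIC."""
    best_cps, best_score = [], bic(data, [], segment_cost(data, []))
    for k in range(1, max_cp + 1):
        cps, rss = ce_changepoints(data, k, seed=seed)
        if cps is not None and bic(data, cps, rss) < best_score:
            best_cps, best_score = cps, bic(data, cps, rss)
    return best_cps
```

On a simulated sequence whose mean shifts after positions 40 and 110, `ce_changepoints(data, 2)` should typically return locations close to 40 and 110. The rounded normal density and the BIC-style penalty are simplifications chosen for brevity; the thesis itself proposes several CE variants, covers count data and uses different model selection criteria, and this sketch omits the parallel, hybrid and GAMLSS extensions.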
This thesis contains four peer-reviewed publications: a book chapter, a journal article and two conference papers. It also describes an R package, based on the methods developed in this thesis, for detecting multiple change-points in continuous and count data.
Alternative Title
CE method and multiple change-point detection.
Table of Contents
1. Introduction and thesis outline -- 2. Methods -- 3. Multiple break-points detection in array CGH data via the cross-entropy method -- 4. Hybrid algorithms for multiple change-point detection in biological sequences -- 5. The cross entropy method for detecting multiple change points in DNA read count data -- 6. GAMLSS and extended cross-entropy method to detect multiple change-points in DNA read count data -- 7. Breakpoint: an R package to detect multiple change-points via the CE method -- 8. Discussion and future directions.
Notes
Includes bibliographic references
Thesis by publication.
Spine title: The CE method and multiple change-point detection.
Awarding Institution
Macquarie University
Degree Type
Thesis PhD
Degree
PhD, Macquarie University, Faculty of Science and Engineering, Department of Statistics