Needles in a haystack: advanced statistical techniques and large stellar spectroscopic datasets
Thesis posted on 28.03.2022, 18:53 by Arvind Hughes
Advanced astronomical instruments now allow survey teams to produce datasets that are too large for traditional analysis. Recent improvements in computing and statistical methods make it possible to extract information from such datasets more efficiently. This thesis uses data from the GALactic Archaeology with HERMES (GALAH) spectroscopic survey to show how machine learning methods can identify rare but interesting stars. It tests a new methodology that combines the t-SNE dimensionality reduction technique with the HDBSCAN clustering method and a new tool developed by the researcher, the t-SNE Visualiser. The method was applied to ∼200,000 stars in the GALAH dataset with the aim of detecting extremely metal-poor stars and Solar twins. Applying this approach led to the discovery of 66 possible extremely metal-poor stars and 20 Solar twin candidates. A verification of the success of the new methodology is also presented.