Identification and classification of emission-line stars in the GALAH Survey using machine learning
The advent of massive stellar spectroscopic surveys, with hundreds of thousands or even millions of spectra, presents serious challenges for the identification and classification of atypical objects such as emission-line stars. To date, a variety of machine-learning methods have been applied, but in most cases the actual classification has been carried out manually, even for datasets comprising tens of thousands of spectra. As spectroscopic surveys grow by orders of magnitude, manual solutions become untenable. Additionally, where machine learning has been applied, researchers have relied on manually developed training data sets, which are not available for atypical spectra.
This thesis seeks to address the twin problems of identification and classification of emission-line stars in large spectroscopic data sets, such as the GALAH survey, through the application of unsupervised machine learning methods. GALAH is a million-star, high-resolution spectroscopic survey of the Milky Way, and its most recent public data release (Data Release 3, DR3) contains more than 600,000 high-resolution spectra.
Several limitations arise when developing machine learning methods to identify emission-line spectra in the GALAH survey: a lack of training data, a high proportion of typical spectra in the survey data (which results in poor performance of other machine learning methods), and high dimensionality, among other constraints. Together, these make the identification and classification of emission-line spectra extremely challenging.
These limitations necessitate the use of unsupervised machine learning methods. This thesis demonstrates such a method by identifying and classifying over 7000 emission-line spectra, including over 200 P Cygni and 200 inverse P Cygni spectra, thereby providing a more accurate estimate of the population of emission-line stars in GALAH DR3. The method can, in turn, be applied to other surveys, leading to the identification of more emission-line stars and improving the accuracy of the stellar parameter estimates for these atypical objects.