Unsupervised star, galaxy, QSO classification: Application of HDBSCAN
Authors
C.H.A. Logan
S. Fotopoulou
Abstract
Context. Classification will be an important first step for upcoming surveys aimed at detecting billions of new sources, such as LSST and Euclid, as well as DESI, 4MOST, and MOONS. The application of traditional methods of model fitting and colour-colour selections will face significant computational constraints, while machine-learning methods offer a viable approach to tackle datasets of that volume.
Aims. While supervised learning methods can prove very useful for classification tasks, the creation of representative and accurate training sets is a task that consumes a great deal of resources and time. We present a viable alternative using an unsupervised machine learning method to separate stars, galaxies and QSOs using photometric data.
Methods. The heart of our work uses Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to find the star, galaxy, and QSO clusters in a multidimensional colour space. We optimized the hyperparameters and input attributes of three separate HDBSCAN runs, each to select a particular object class and, thus, treat the output of each separate run as a binary classifier. We subsequently consolidated the output to give our final classifications, optimized on the basis of their F1 scores. We explored the use of Random Forest and PCA as part of the pre-processing stage for feature selection and dimensionality reduction.
Results. Using our dataset of ∼50 000 spectroscopically labelled objects we obtain F1 scores of 98.9, 98.9, and 93.13 respectively for star, galaxy, and QSO selection using our unsupervised learning method. We find that careful attribute selection is a vital part of accurate classification with HDBSCAN. We applied our classification to a subset of the SDSS spectroscopic catalogue and demonstrated the potential of our approach in correcting misclassified spectra useful for DESI and 4MOST. Finally, we created a multiwavelength catalogue of 2.7 million sources using the KiDS, VIKING, and ALLWISE surveys and published corresponding classifications and photometric redshifts.
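The idea of treating a clustering run as a binary classifier and scoring it with F1 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' code: the function names, the toy labels, the rule of picking the single cluster that maximizes F1, and the reuse of one set of cluster labels for all three classes (the abstract describes three separately tuned HDBSCAN runs). The cluster IDs are written by hand here, with -1 marking noise points as HDBSCAN implementations conventionally do.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for boolean label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    return 2 * tp / (2 * tp + fp + fn)


def best_cluster_for_class(cluster_ids, true_labels, target_class):
    """Hypothetical helper: find the cluster that best selects one class.

    Each candidate cluster is treated as a binary classifier
    ("in this cluster" vs "not in this cluster") and scored with F1
    against the spectroscopic labels. Noise points (-1) are never selected.
    """
    y_true = [label == target_class for label in true_labels]
    best_id, best_f1 = None, -1.0
    for cid in sorted(set(cluster_ids) - {-1}):
        y_pred = [c == cid for c in cluster_ids]
        score = f1_score(y_true, y_pred)
        if score > best_f1:
            best_id, best_f1 = cid, score
    return best_id, best_f1


# Toy data (invented): spectroscopic labels and hand-written cluster IDs.
true_labels = ["star", "star", "star", "galaxy", "galaxy", "galaxy", "qso", "qso"]
cluster_ids = [0, 0, 0, 1, 1, 0, 2, -1]

results = {
    cls: best_cluster_for_class(cluster_ids, true_labels, cls)
    for cls in ("star", "galaxy", "qso")
}
for cls, (cid, score) in results.items():
    print(f"{cls}: cluster {cid}, F1 = {score:.3f}")
```

In a real pipeline the cluster IDs would come from an HDBSCAN run on colour features, and the hyperparameters and input attributes would be tuned per class against the same F1 metric.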
Citation
Logan, C., & Fotopoulou, S. (2020). Unsupervised star, galaxy, QSO classification: Application of HDBSCAN. Astronomy & Astrophysics, 633, Article A154. https://doi.org/10.1051/0004-6361/201936648
Journal Article Type | Article |
---|---|
Acceptance Date | Nov 12, 2019 |
Online Publication Date | Jan 23, 2020 |
Publication Date | Jan 31, 2020 |
Deposit Date | Feb 12, 2020 |
Publicly Available Date | Feb 12, 2020 |
Journal | Astronomy & Astrophysics |
Print ISSN | 0004-6361 |
Electronic ISSN | 1432-0746 |
Publisher | EDP Sciences |
Peer Reviewed | Peer Reviewed |
Volume | 633 |
Article Number | A154 |
DOI | https://doi.org/10.1051/0004-6361/201936648 |
Public URL | https://durham-repository.worktribe.com/output/1271307 |
Copyright Statement
Logan, C. H. A. & Fotopoulou, S. (2020). Unsupervised star, galaxy, QSO classification: Application of HDBSCAN. Astronomy & Astrophysics 633: A154, reproduced with permission, © ESO.