Machine learning for transient discovery in Pan-STARRS1 difference imaging

Wright, D.E.; Smartt, S.J.; Smith, K.W.; Miller, P.; Kotak, R.; Rest, A.; Burgett, W.S.; Chambers, K.C.; Flewelling, H.; Hodapp, K.W.; Huber, M.; Jedicke, R.; Kaiser, N.; Metcalfe, N.; Price, P.A.; Tonry, J.L.; Wainscoat, R.J.; Waters, C.

doi:10.1093/mnras/stv292

Abstract

Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next generation all-sky surveys, and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real–bogus classification by constructing a training set from the image data of ∼32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 × 20 pixel stamp around the centre of the candidates. This differs from previous work in that it works directly on the pixels rather than relying on catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performance is tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
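The two quantitative ideas in the abstract, pixel-stamp features and the missed detection rate at a fixed false positive rate, can be illustrated with a short NumPy sketch. This is not the authors' code: the function names `stamp_features` and `mdr_at_fpr` are hypothetical, the per-stamp scaling by the maximum absolute pixel value is an assumption (the paper's exact normalisation may differ), and the classifier scores are taken as given (in the paper they would come from a trained model such as the random forest).

```python
import numpy as np


def stamp_features(image, x, y, size=20):
    """Flatten a size x size stamp centred on (x, y) into one feature vector.

    ASSUMPTION: each stamp is scaled by its own maximum absolute pixel value;
    the normalisation used in the paper may differ.
    """
    half = size // 2
    stamp = image[y - half:y + half, x - half:x + half].astype(float)
    if stamp.shape != (size, size):
        raise ValueError("stamp falls off the image edge")
    scale = np.abs(stamp).max()
    if scale > 0:
        stamp = stamp / scale
    return stamp.ravel()  # 20 x 20 stamp -> 400 pixel-intensity features


def mdr_at_fpr(scores, labels, target_fpr=0.01):
    """Missed detection rate at the loosest threshold with FPR <= target_fpr.

    scores: higher means "more likely real"; labels: 1 = real, 0 = bogus.
    A candidate is accepted as real when its score >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    real = scores[labels == 1]
    bogus = scores[labels == 0]
    # Try every distinct score in ascending order; +inf means "accept nothing".
    best = np.inf
    for t in np.append(np.unique(scores), np.inf):
        if np.mean(bogus >= t) <= target_fpr:  # FPR falls as t rises
            best = t
            break
    mdr = np.mean(real < best)  # real candidates rejected at this threshold
    return float(best), float(mdr)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=(100, 100))
    print(stamp_features(image, 50, 50).shape)  # (400,)
    # Toy scores: four real candidates, three bogus ones.
    print(mdr_at_fpr([0.95, 0.9, 0.6, 0.3, 0.5, 0.2, 0.1],
                     [1, 1, 1, 1, 0, 0, 0]))
```

In the toy example no bogus score may reach the threshold, so it settles at 0.6 and the real candidate scoring 0.3 is missed, giving a missed detection rate of 0.25. Choosing the loosest threshold that meets the false positive budget is what makes the reported missed detection rate the best achievable at that budget.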

Citation

Wright, D., Smartt, S., Smith, K., Miller, P., Kotak, R., Rest, A., …Waters, C. (2015). Machine learning for transient discovery in Pan-STARRS1 difference imaging. Monthly Notices of the Royal Astronomical Society, 449(1), 451-466. https://doi.org/10.1093/mnras/stv292

Journal Article Type	Article
Acceptance Date	Feb 9, 2015
Publication Date	May 1, 2015
Deposit Date	Aug 3, 2015
Publicly Available Date	Sep 4, 2015
Journal	Monthly Notices of the Royal Astronomical Society
Print ISSN	0035-8711
Electronic ISSN	1365-2966
Publisher	Royal Astronomical Society
Peer Reviewed	Peer Reviewed
Volume	449
Issue	1
Pages	451-466
DOI	https://doi.org/10.1093/mnras/stv292
Keywords	Methods: data analysis, Methods: statistical, Techniques: image processing, Surveys, Supernovae: general.
Public URL	https://durham-repository.worktribe.com/output/1404474

Files

Published Journal Article (PDF, 3.8 MB)

Copyright Statement
This article has been accepted for publication in Monthly Notices of the Royal Astronomical Society © 2015 The Authors. Published by Oxford University Press on behalf of the Royal Astronomical Society. All rights reserved.