SMS Spam Filtering using Probabilistic Topic Modelling and Stacked Denoising Autoencoder

Al Moubayed, N.; Breckon, T.P.; Matthews, P.C.; McGough, A.S.

doi:10.1007/978-3-319-44781-0_50

Authors

Dr Noura Al Moubayed noura.al-moubayed@durham.ac.uk
Associate Professor

Professor Toby Breckon toby.breckon@durham.ac.uk
Professor

Dr Peter Matthews p.c.matthews@durham.ac.uk
Associate Professor

A.S. McGough

Contributors

Alessandro E. P. Villa
Editor

Paolo Masulli
Editor

Antonio J. Pons Rivero
Editor

Abstract

In this paper we present a novel approach to spam filtering and demonstrate its applicability to SMS messages. Our approach requires minimal feature engineering and only a small set of labelled data samples. Features are extracted using topic modelling based on latent Dirichlet allocation (LDA), and a comprehensive data model is then created using a Stacked Denoising Autoencoder (SDA). Topic modelling summarises the data, providing ease of use and high interpretability, since the topics can be visualised as word clouds. Given that SMS messages can be regarded as either spam (unwanted) or ham (wanted), the SDA is able to model the messages and accurately discriminate between the two classes without the need for a pre-labelled training set. The results are compared against state-of-the-art spam detection algorithms, with our proposed approach achieving over 97% accuracy, which compares favourably with the best algorithms reported in the literature.
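The SDA stage described in the abstract can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. It assumes each SMS has already been reduced to a vector of LDA topic proportions. It also uses masking noise, tied weights, sigmoid units and a cross-entropy reconstruction loss, which are common SDA choices but are not confirmed by the abstract. The names `DenoisingAutoencoder`, `pretrain_stack` and `encode_stack`, the layer sizes and the toy input vectors are all hypothetical. Supervised fine-tuning on the small labelled spam/ham set is not shown.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class DenoisingAutoencoder:
    """One tied-weight denoising autoencoder layer (hypothetical sketch)."""

    def __init__(self, n_visible, n_hidden, corruption=0.3, seed=0):
        self.rng = random.Random(seed)
        bound = 1.0 / math.sqrt(n_visible)
        self.W = [[self.rng.uniform(-bound, bound) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_h = [0.0] * n_hidden   # hidden (encoder) bias
        self.b_v = [0.0] * n_visible  # reconstruction (decoder) bias
        self.corruption = corruption

    def corrupt(self, x):
        # Masking noise: each input is zeroed with probability `corruption`.
        return [0.0 if self.rng.random() < self.corruption else xi for xi in x]

    def encode(self, x):
        return [sigmoid(sum(x[i] * self.W[i][j] for i in range(len(x))) + self.b_h[j])
                for j in range(len(self.b_h))]

    def decode(self, h):
        # Tied weights: the decoder reuses W transposed.
        return [sigmoid(sum(h[j] * self.W[i][j] for j in range(len(h))) + self.b_v[i])
                for i in range(len(self.b_v))]

    def train_step(self, x, lr=0.1):
        x_tilde = self.corrupt(x)
        h = self.encode(x_tilde)
        z = self.decode(h)
        # Cross-entropy gradient w.r.t. the decoder pre-activation is (z - x),
        # measured against the CLEAN input: the layer learns to undo the noise.
        d_v = [z[i] - x[i] for i in range(len(x))]
        # Backpropagate to the hidden pre-activation (computed before W changes).
        d_h = [sum(d_v[i] * self.W[i][j] for i in range(len(x))) * h[j] * (1.0 - h[j])
               for j in range(len(h))]
        for i in range(len(x)):
            for j in range(len(h)):
                # Tied weights receive gradient from both decoder and encoder paths.
                self.W[i][j] -= lr * (d_v[i] * h[j] + d_h[j] * x_tilde[i])
        for j in range(len(h)):
            self.b_h[j] -= lr * d_h[j]
        for i in range(len(x)):
            self.b_v[i] -= lr * d_v[i]
        eps = 1e-12
        return -sum(x[i] * math.log(z[i] + eps) + (1.0 - x[i]) * math.log(1.0 - z[i] + eps)
                    for i in range(len(x)))


def pretrain_stack(data, layer_sizes, epochs=50, seed=0):
    # Greedy layer-wise pretraining: each layer models the codes of the one below.
    layers, inputs, n_in = [], data, len(data[0])
    for k, n_hidden in enumerate(layer_sizes):
        dae = DenoisingAutoencoder(n_in, n_hidden, seed=seed + k)
        for _ in range(epochs):
            for x in inputs:
                dae.train_step(x)
        layers.append(dae)
        inputs = [dae.encode(x) for x in inputs]  # clean codes feed the next layer
        n_in = n_hidden
    return layers


def encode_stack(layers, x):
    for dae in layers:
        x = dae.encode(x)
    return x


# Toy 4-topic proportion vectors standing in for LDA output (illustrative, not paper data).
messages = [[0.70, 0.10, 0.10, 0.10],
            [0.10, 0.70, 0.10, 0.10],
            [0.60, 0.20, 0.10, 0.10],
            [0.10, 0.10, 0.10, 0.70]]
stack = pretrain_stack(messages, layer_sizes=[3, 2], epochs=20)
codes = [encode_stack(stack, x) for x in messages]
```

Each layer is trained greedily on the clean codes produced by the layer below. In a full SDA classifier, an output layer would then be placed on top of the stack and the whole network fine-tuned on the labelled spam/ham messages.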

Citation

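Al Moubayed, N., Breckon, T., Matthews, P., & McGough, A. (2016, August). SMS Spam Filtering using Probabilistic Topic Modelling and Stacked Denoising Autoencoder. In Artificial Neural Networks and Machine Learning – ICANN 2016, Part II (Lecture Notes in Computer Science, Vol. 9886, pp. 423-430). Springer Verlag.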

Presentation Conference Type	Conference Paper (published)
Acceptance Date	Jun 16, 2016
Online Publication Date	Aug 13, 2016
Publication Date	Aug 13, 2016
Deposit Date	Jun 17, 2016
Publicly Available Date	Aug 13, 2017
Print ISSN	0302-9743
Publisher	Springer Verlag
Volume	2
Pages	423-430
Series Title	Lecture Notes in Computer Science
Series Number	9886
Series ISSN	0302-9743, 1611-3349
Book Title	Artificial neural networks and machine learning – ICANN 2016 : 25th International Conference on Artificial Neural Networks, Barcelona, Spain, September 6-9, 2016 ; proceedings. Part II.
ISBN	9783319447803
DOI	https://doi.org/10.1007/978-3-319-44781-0_50
Keywords	topic modelling, text processing, deep learning
Public URL	https://durham-repository.worktribe.com/output/1150253
Related Public URLs	https://arxiv.org/abs/1606.05554