Research Repository

SMS Spam Filtering using Probabilistic Topic Modelling and Stacked Denoising Autoencoder

Al Moubayed, N.; Breckon, T.P.; Matthews, P.C.; McGough, A.S.


Authors

A.S. McGough



Contributors

Alessandro E. P. Villa
Editor

Paolo Masulli
Editor

Antonio J. Pons Rivero
Editor

Abstract

In this paper we present a novel approach to spam filtering and demonstrate its applicability with respect to SMS messages. Our approach requires minimal feature engineering and only a small set of labelled data samples. Features are extracted using topic modelling based on latent Dirichlet allocation (LDA), and a comprehensive data model is then created using a Stacked Denoising Autoencoder (SDA). Topic modelling summarises the data, providing ease of use and high interpretability, since the topics can be visualised as word clouds. Given that SMS messages can be regarded as either spam (unwanted) or ham (wanted), the SDA is able to model the messages and accurately discriminate between the two classes without the need for a pre-labelled training set. The results are compared against state-of-the-art spam detection algorithms, with our proposed approach achieving over 97% accuracy, which compares favourably to the best reported algorithms in the literature.
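The feature-extraction step described above can be illustrated with a small, self-contained sketch. It assumes LDA is fitted with collapsed Gibbs sampling; the abstract does not say which inference method the authors used, and the function names (`tokenize`, `lda_topic_features`, `top_words`) and hyperparameters (`alpha`, `beta`, `iters`) are hypothetical choices for illustration only. Each message becomes a vector of topic proportions, which is the kind of compact representation that would then be fed to the SDA (the autoencoder itself is not shown here). `top_words` lists the highest-count words per topic, i.e. the raw material for a word cloud.

```python
import random
import re


def tokenize(text):
    """Lowercase an SMS and split it into simple word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lda_topic_features(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    Returns (theta, vocab, nkw): theta[d] is the topic-proportion vector of
    document d, vocab is the sorted vocabulary, nkw[k][v] counts word v in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][idx[w]] += 1
            nk[k] += 1
        z.append(zd)

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = idx[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    return theta, vocab, nkw


def top_words(nkw, vocab, topic, n=5):
    """Return the n most frequent words assigned to a topic (word-cloud input)."""
    order = sorted(range(len(vocab)), key=lambda v: nkw[topic][v], reverse=True)
    return [vocab[v] for v in order[:n]]


if __name__ == "__main__":
    sms = [
        "WINNER claim your free prize now call",
        "free entry win cash prize text now",
        "are we still meeting for lunch today",
        "see you at lunch later today",
    ]
    docs = [tokenize(s) for s in sms]
    theta, vocab, nkw = lda_topic_features(docs, n_topics=2, iters=100)
    for s, row in zip(sms, theta):
        print([round(p, 2) for p in row], s)
    for k in range(2):
        print("topic", k, top_words(nkw, vocab, k))
```

On a toy corpus like this the topics are not guaranteed to separate spam from ham cleanly; the point is only the shape of the representation, one low-dimensional probability vector per message, which keeps the downstream model small and lets each topic be inspected through its top words.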

Citation

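Al Moubayed, N., Breckon, T., Matthews, P., & McGough, A. (2016, August). SMS Spam Filtering using Probabilistic Topic Modelling and Stacked Denoising Autoencoder. In Artificial Neural Networks and Machine Learning – ICANN 2016 (Lecture Notes in Computer Science, Vol. 9886, pp. 423-430). Springer.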

Presentation Conference Type: Conference Paper (published)
Acceptance Date: Jun 16, 2016
Online Publication Date: Aug 13, 2016
Publication Date: Aug 13, 2016
Deposit Date: Jun 17, 2016
Publicly Available Date: Aug 13, 2017
Print ISSN: 0302-9743
Publisher: Springer Verlag
Volume: 2
Pages: 423-430
Series Title: Lecture Notes in Computer Science
Series Number: 9886
Series ISSN: 0302-9743, 1611-3349
Book Title: Artificial neural networks and machine learning – ICANN 2016 : 25th International Conference on Artificial Neural Networks, Barcelona, Spain, September 6-9, 2016 ; proceedings. Part II.
ISBN: 9783319447803
DOI: https://doi.org/10.1007/978-3-319-44781-0_50
Keywords: topic modelling, text processing, deep learning
Public URL: https://durham-repository.worktribe.com/output/1150253
Related Public URLs: https://arxiv.org/abs/1606.05554