Partially-Supervised Metric Learning via Dimensionality Reduction of Text Embeddings using Transformer Encoders and Attention Mechanisms
Hodgson, Ryan; Wang, Jingyun; Cristea, Alexandra I.; Graham, John
Authors
Ryan Hodgson ryan.t.hodgson@durham.ac.uk
Postdoctoral Research Associate
Dr Jingyun Wang jingyun.wang@durham.ac.uk
Assistant Professor
Professor Alexandra Cristea alexandra.i.cristea@durham.ac.uk
Professor
John Graham
Abstract
Real-world applications of word embeddings to downstream clustering tasks may be limited in performance by the high dimensionality of the embeddings. In particular, clustering algorithms do not scale well to high-dimensional data. One way to address this is to apply dimensionality reduction algorithms (DRA). Current state-of-the-art algorithms for dimensionality reduction (DR) have been shown to improve clustering accuracy and performance. However, the impact that the choice of neural network architecture can have on the current state-of-the-art Parametric Uniform Manifold Approximation and Projection (UMAP) algorithm remains unexplored. This work investigates, for the first time, the effects of using attention mechanisms in neural networks for Parametric UMAP, by applying network architectures that have had considerable impact on the wider machine learning and natural language processing (NLP) fields: namely, the transformer-encoder and the bidirectional recurrent neural network. We implement these architectures within a semi-supervised metric learning pipeline; the results show an improvement in clustering accuracy over conventional DRA techniques on three out of four datasets, and accuracy comparable to the state of the art (SoA) on the fourth. To further support our analysis, we also investigate the effects of the transformer-encoder metric-learning pipeline on the individual class accuracy of downstream clustering for highly imbalanced datasets. Our analyses indicate that the proposed pipeline, using a transformer-encoder for Parametric UMAP, confers a significant, measurable benefit to the accuracy of underrepresented classes.
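The abstract describes a pipeline in which high-dimensional embeddings are reduced to a few dimensions before clustering, with clustering accuracy then examined per class on imbalanced data. The sketch below illustrates only that generic pipeline shape, under loudly stated assumptions: PCA stands in for Parametric UMAP (it is not the transformer-encoder method proposed in the paper), k-means with farthest-first initialisation stands in for whichever clustering algorithm was actually used, the "embeddings" are synthetic Gaussian blobs in an arbitrarily chosen 384 dimensions rather than real text embeddings, and "clustering accuracy" is assumed to mean accuracy under the best one-to-one mapping of clusters to classes. All function names are hypothetical.

```python
import itertools

import numpy as np


def pca_reduce(x, n_components):
    # Stand-in for Parametric UMAP: centre the data and project onto the
    # top principal directions obtained from an SVD.
    x_centred = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x_centred, full_matrices=False)
    return x_centred @ vt[:n_components].T


def farthest_first_init(x, k, seed=0):
    # Pick one random point, then repeatedly add the point farthest from all
    # points chosen so far (avoids seeding every centroid in a large class).
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(x)))]
    for _ in range(k - 1):
        d = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return x[idx].copy()


def kmeans(x, k, n_iter=50, seed=0):
    centroids = farthest_first_init(x, k, seed)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def best_mapping(y_true, y_pred, k):
    # Brute-force the one-to-one cluster -> class mapping with the most matches.
    best, best_hits = None, -1
    for perm in itertools.permutations(range(k)):
        hits = sum(perm[p] == t for t, p in zip(y_true, y_pred))
        if hits > best_hits:
            best, best_hits = perm, hits
    return best


def per_class_accuracy(y_true, y_pred, k):
    # Assumes class labels are the integers 0..k-1.
    mapping = best_mapping(y_true, y_pred, k)
    mapped = np.array([mapping[p] for p in y_pred])
    overall = float((mapped == y_true).mean())
    per_class = {c: float((mapped[y_true == c] == c).mean()) for c in range(k)}
    return overall, per_class


# Synthetic, deliberately imbalanced "embeddings": 200, 50 and 10 points.
rng = np.random.default_rng(42)
sizes = [200, 50, 10]
centres = rng.normal(scale=10.0, size=(3, 384))
x = np.vstack([centres[c] + rng.normal(size=(n, 384)) for c, n in enumerate(sizes)])
y = np.repeat(np.arange(3), sizes)

z = pca_reduce(x, 2)   # 384-D -> 2-D
pred = kmeans(z, 3)    # cluster in the reduced space
overall, per_class = per_class_accuracy(y, pred, 3)
print(overall, per_class)
```

Reporting the per-class dictionary alongside the overall figure matters for imbalanced data, since a minority class can be misclustered with little effect on overall accuracy. Brute-forcing permutations is only practical for a handful of clusters; for larger k, the same mapping is usually found with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).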
Citation
Hodgson, R., Wang, J., Cristea, A. I., & Graham, J. (2024). Partially-Supervised Metric Learning via Dimensionality Reduction of Text Embeddings using Transformer Encoders and Attention Mechanisms. IEEE Access, 12, 77536-77554. https://doi.org/10.1109/access.2024.3403991
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | May 17, 2024 |
Online Publication Date | May 22, 2024 |
Publication Date | May 22, 2024 |
Deposit Date | May 28, 2024 |
Publicly Available Date | May 28, 2024 |
Journal | IEEE Access |
Electronic ISSN | 2169-3536 |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 12 |
Pages | 77536-77554 |
DOI | https://doi.org/10.1109/access.2024.3403991 |
Public URL | https://durham-repository.worktribe.com/output/2466512 |
Files
Published Journal Article (Advance Online Version) (PDF, 3.6 Mb)
Published Journal Article (PDF, 3.7 Mb)
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/