Partially-Supervised Metric Learning via Dimensionality Reduction of Text Embeddings using Transformer Encoders and Attention Mechanisms

Hodgson, Ryan; Wang, Jingyun; Cristea, Alexandra I.; Graham, John

Authors

Ryan Hodgson ryan.t.hodgson@durham.ac.uk
Postdoctoral Research Associate

John Graham



Abstract

Real-world applications of word embeddings to downstream clustering tasks can suffer from limited performance due to the high dimensionality of the embeddings; in particular, clustering algorithms do not scale well on high-dimensional data. One way to address this is to apply a dimensionality reduction (DR) algorithm. Current state-of-the-art DR algorithms have been shown to improve clustering accuracy and performance. However, the effect that the choice of neural network architecture has on the state-of-the-art Parametric Uniform Manifold Approximation and Projection (UMAP) algorithm is as yet unexplored. This work investigates, for the first time, the effects of using attention mechanisms in the neural networks underlying Parametric UMAP, applying architectures that have had considerable impact on the wider machine learning and natural language processing (NLP) fields, namely the transformer encoder and the bidirectional recurrent neural network. We implement these architectures within a semi-supervised metric learning pipeline; our results demonstrate improved clustering accuracy, compared with conventional DR techniques, on three out of four datasets, and accuracy comparable to the state of the art on the fourth. To further support our analysis, we also examine the effect of the transformer-encoder metric learning pipeline on the per-class accuracy of downstream clustering for highly imbalanced datasets. Our analyses indicate that the proposed pipeline with a transformer encoder for Parametric UMAP confers a significant, measurable benefit to the accuracy of underrepresented classes.
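
For readers wanting a concrete starting point, the following is a minimal sketch of the general idea described in the abstract: plugging an attention-based (transformer-style) encoder into umap-learn's ParametricUMAP to reduce precomputed text embeddings, with partial labels supplied for semi-supervised metric learning. The encoder layout, dimensions, and the placeholder data (X, partial_labels) are illustrative assumptions, not the architecture or pipeline reported in the paper, and the sketch assumes the standard umap-learn/Keras APIs (ParametricUMAP requires TensorFlow).

```python
import numpy as np
import tensorflow as tf
from umap.parametric_umap import ParametricUMAP

EMB_DIM = 768             # assumed size of precomputed text embeddings
TOKENS, TOKEN_DIM = 12, 64  # reshape each vector into a pseudo-sequence for attention
N_COMPONENTS = 2

# Placeholder data standing in for real text embeddings and partial labels.
X = np.random.rand(1000, EMB_DIM).astype("float32")
partial_labels = np.full(1000, -1)                   # -1 marks unlabelled points
partial_labels[:100] = np.random.randint(0, 4, 100)  # a small labelled subset

# A small attention-based encoder mapping embeddings to the low-dimensional space.
inputs = tf.keras.Input(shape=(EMB_DIM,))
x = tf.keras.layers.Reshape((TOKENS, TOKEN_DIM))(inputs)
attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=TOKEN_DIM)(x, x)
x = tf.keras.layers.LayerNormalization()(x + attn)           # residual + norm
ff = tf.keras.layers.Dense(TOKEN_DIM, activation="relu")(x)  # position-wise feed-forward
x = tf.keras.layers.LayerNormalization()(x + ff)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(128, activation="relu")(x)
outputs = tf.keras.layers.Dense(N_COMPONENTS)(x)
encoder = tf.keras.Model(inputs, outputs)

# Parametric UMAP with the custom encoder; labels (-1 = unlabelled) give the
# semi-supervised signal, as in standard supervised UMAP usage.
reducer = ParametricUMAP(encoder=encoder, dims=(EMB_DIM,), n_components=N_COMPONENTS)
low_dim = reducer.fit_transform(X, y=partial_labels)
print(low_dim.shape)  # (1000, 2)
```

The reduced output (low_dim) would then be passed to a downstream clustering algorithm; evaluating per-class accuracy on imbalanced data, as the paper does, would require the true labels held out from the semi-supervised step.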

Citation

Hodgson, R., Wang, J., Cristea, A. I., & Graham, J. (2024). Partially-Supervised Metric Learning via Dimensionality Reduction of Text Embeddings using Transformer Encoders and Attention Mechanisms. IEEE Access, 12, 77536-77554. https://doi.org/10.1109/access.2024.3403991

Journal Article Type Article
Acceptance Date May 17, 2024
Online Publication Date May 22, 2024
Publication Date May 22, 2024
Deposit Date May 28, 2024
Publicly Available Date May 28, 2024
Journal IEEE Access
Electronic ISSN 2169-3536
Publisher Institute of Electrical and Electronics Engineers
Peer Reviewed Peer Reviewed
Volume 12
Pages 77536-77554
DOI https://doi.org/10.1109/access.2024.3403991
Public URL https://durham-repository.worktribe.com/output/2466512
