Semi-supervised Speech Confidence Detection using Pseudo-labelling and Whisper Embeddings

Wynn, Adam; Wang, Jingyun; Tan, Xiangyu

Authors

Adam Wynn adam.t.wynn@durham.ac.uk
PGR Student Doctor of Philosophy

Xiangyu Tan



Abstract

Understanding speaker confidence is crucial in educational settings, as it can enhance personalised feedback and improve learning outcomes. This study introduces a novel framework for detecting speaker confidence by integrating human-engineered features with embeddings from the Whisper encoder. To address data limitations, a pseudo-labelling technique is employed to expand the labelled dataset, allowing the model to learn from both human-annotated and model-generated labels. The framework combines traditional speech features, including pitch, volume, rate of speech, and the presence of disfluencies and stress, with Whisper embeddings. A co-attention mechanism fuses these representations, and the framework achieves an overall accuracy of 75%. This study contributes to advancing speech analysis, enabling applications that support personalised learning and speaking skill development.
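
The pseudo-labelling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how model-generated labels are selected, so the confidence threshold used here is an assumption, and the names `pseudo_label` and `toy_predict_proba` are hypothetical. The toy classifier is only a stand-in for a trained model and is not the paper's framework.

```python
import math
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Example = Tuple[Features, int]


def pseudo_label(
    labelled: List[Example],
    unlabelled: List[Features],
    predict_proba: Callable[[Features], Sequence[float]],
    threshold: float = 0.9,  # assumed cut-off; not stated in the abstract
) -> Tuple[List[Example], List[Features]]:
    """Return (expanded labelled set, samples that remain unlabelled)."""
    expanded = list(labelled)
    remaining: List[Features] = []
    for x in unlabelled:
        probs = predict_proba(x)
        # Index of the most probable class for this sample.
        best = max(range(len(probs)), key=lambda k: probs[k])
        if probs[best] >= threshold:
            # Confident prediction: keep it as a model-generated label.
            expanded.append((x, best))
        else:
            # Uncertain prediction: leave the sample unlabelled.
            remaining.append(x)
    return expanded, remaining


def toy_predict_proba(x: Features) -> Sequence[float]:
    """Hypothetical stand-in for a trained classifier (logistic on feature 0)."""
    p = 1.0 / (1.0 + math.exp(-x[0]))
    return [1.0 - p, p]


# Two human-annotated examples and three unlabelled ones (made-up values).
labelled = [([-2.0], 0), ([2.0], 1)]
unlabelled = [[3.0], [0.1], [-4.0]]

expanded, remaining = pseudo_label(labelled, unlabelled, toy_predict_proba)
print(len(expanded), remaining)
```

In a typical semi-supervised loop, the model would be retrained on the expanded set and the selection repeated on the remaining samples. Whether the paper iterates in this way is not stated in the abstract.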

Citation

Wynn, A., Wang, J., & Tan, X. (2025, July). Semi-supervised Speech Confidence Detection using Pseudo-labelling and Whisper Embeddings. Presented at The 26th International Conference on Artificial Intelligence in Education, Palermo, Italy.

Presentation Conference Type: Conference Paper (published)
Conference Name: The 26th International Conference on Artificial Intelligence in Education
Start Date: Jul 22, 2025
End Date: Jul 26, 2025
Acceptance Date: Apr 4, 2025
Online Publication Date: Feb 20, 2025
Publication Date: Feb 20, 2025
Deposit Date: Jul 4, 2025
Peer Reviewed: Peer Reviewed
Book Title: Artificial Intelligence in Education
DOI: https://doi.org/10.1007/978-3-031-98465-5_34
Public URL: https://durham-repository.worktribe.com/output/4253149