Laila Alrajhi laila.m.alrajhi@durham.ac.uk
PGR Student Doctor of Philosophy
Solving the imbalanced data issue: automatic urgency detection for instructor assistance in MOOC discussion forums
Alrajhi, Laila; Alamri, Ahmed; Pereira, Filipe Dwan; Cristea, Alexandra I.; Oliveira, Elaine H. T.
Authors
Ahmed Alamri
Filipe Dwan Pereira
Professor Alexandra Cristea alexandra.i.cristea@durham.ac.uk
Professor
Elaine H. T. Oliveira
Abstract
In MOOCs, identifying urgent comments on discussion forums is an ongoing challenge. Urgent comments require immediate reactions from instructors, to improve interaction with their learners and potentially reduce drop-out rates; however, the task is difficult, as truly urgent comments are rare. From a data analytics perspective, this represents a highly unbalanced (sparse) dataset. Here, we aim to automate the urgent comments identification process, based on fine-grained learner modelling, to be used for automatic recommendations to instructors. To showcase and compare these models, we apply them to the first gold standard dataset for Urgent iNstructor InTErvention (UNITE), which we created by labelling FutureLearn MOOC data. We implement both benchmark shallow classifiers and deep learning models. Importantly, we not only compare, for the first time for the unbalanced problem, several data balancing techniques, comprising text augmentation, text augmentation with undersampling, and undersampling, but also propose several new pipelines for combining different augmenters for text augmentation. Results show that models with undersampling can predict most urgent cases, and that 3X augmentation + undersampling usually attains the best performance. We additionally validate the best models via a generic benchmark dataset (Stanford). As a case study, we showcase how the naïve Bayes classifier with count vectors can adaptively support instructors in answering learner questions/comments, potentially saving time or increasing efficiency in supporting learners. Finally, we show that the errors from the classifier mirror the disagreements between annotators. Thus, our proposed algorithms perform at least as well as a ‘super-diligent’ human instructor (with the time to consider all comments).
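The balancing idea named in the abstract (text augmentation of the rare urgent class followed by undersampling of the non-urgent class) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say which augmenters were used, so `swap_two_words` is a hypothetical toy augmenter, and reading "3X augmentation" as "grow the urgent class to three times its original size" is an assumption. The function names and the label convention (1 = urgent, 0 = non-urgent) are also illustrative.

```python
import random
from typing import Callable, List, Tuple

# (comment text, label); label 1 = urgent, 0 = non-urgent (assumed convention)
Sample = Tuple[str, int]


def swap_two_words(text: str, rng: random.Random) -> str:
    """Hypothetical toy augmenter: swap two randomly chosen words."""
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def augment_minority(
    data: List[Sample],
    augment: Callable[[str, random.Random], str],
    factor: int = 3,
    seed: int = 0,
) -> List[Sample]:
    """Keep every sample and add factor-1 augmented copies of each urgent one,
    so the urgent class ends up `factor` times its original size."""
    rng = random.Random(seed)
    out = list(data)
    for text, label in data:
        if label == 1:
            out.extend((augment(text, rng), 1) for _ in range(factor - 1))
    return out


def undersample_majority(data: List[Sample], seed: int = 0) -> List[Sample]:
    """Randomly drop non-urgent samples until both classes have equal size."""
    rng = random.Random(seed)
    urgent = [s for s in data if s[1] == 1]
    non_urgent = [s for s in data if s[1] == 0]
    kept = rng.sample(non_urgent, min(len(urgent), len(non_urgent)))
    balanced = urgent + kept
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Made-up comments, for illustration only.
    comments: List[Sample] = [
        ("please help my quiz will not submit", 1),
        ("cannot access week two and the deadline is today", 1),
    ] + [(f"nice video number {i}", 0) for i in range(10)]

    augmented = augment_minority(comments, swap_two_words, factor=3)
    balanced = undersample_majority(augmented)
    print(len(balanced), sum(label for _, label in balanced))
```

In practice, balancing of this kind would be applied to the training split only, so that evaluation still reflects the original class distribution.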
Citation
Alrajhi, L., Alamri, A., Pereira, F. D., Cristea, A. I., & Oliveira, E. H. T. (2024). Solving the imbalanced data issue: automatic urgency detection for instructor assistance in MOOC discussion forums. User Modeling and User-Adapted Interaction, 34(3), 797-852. https://doi.org/10.1007/s11257-023-09381-y
Journal Article Type | Article |
---|---|
Acceptance Date | Aug 9, 2023 |
Online Publication Date | Dec 1, 2023 |
Publication Date | Jul 1, 2024 |
Deposit Date | Jan 10, 2024 |
Publicly Available Date | Jan 10, 2024 |
Journal | User Modeling and User-Adapted Interaction |
Print ISSN | 0924-1868 |
Electronic ISSN | 1573-1391 |
Publisher | Springer |
Peer Reviewed | Peer Reviewed |
Volume | 34 |
Issue | 3 |
Pages | 797-852 |
DOI | https://doi.org/10.1007/s11257-023-09381-y |
Keywords | MOOCs, Machine learning, Undersampling, Error analysis, Natural language processing, Adaptive models, Imbalanced data, Text augmentation, Urgent comments |
Public URL | https://durham-repository.worktribe.com/output/2118421 |
Files
Published Journal Article (Advance Online Version)
(2.1 Mb)
PDF
Licence
http://creativecommons.org/licenses/by/4.0/
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/
Copyright Statement
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Published Journal Article
(2.1 Mb)
PDF
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/