A Deep Learning Approach for Paragraph-Level Paraphrase Generation for Plagiarism Detection
Saqaabi, Arwa Al; Stewart, Craig; Akrida, Eleni; Cristea, Alexandra I.
Authors
Arwa Al saqaabi arwa.alsaqaabi@durham.ac.uk
PGR Student Doctor of Philosophy
Dr Craig Stewart craig.d.stewart@durham.ac.uk
Associate Professor
Dr Eleni Akrida eleni.akrida@durham.ac.uk
Associate Professor
Professor Alexandra Cristea alexandra.i.cristea@durham.ac.uk
Professor
Abstract
Expressing information in different forms is an important skill that students should develop in school, and it benefits academic reading and writing. However, the same skill can be misused for plagiarism: students may paraphrase original texts and present them as their own work. Effective approaches to detecting plagiarism and identifying paraphrases have therefore become increasingly important in academia, journalism, publishing, and other fields where innovation, novelty, and originality are highly valued, especially as plagiarism in these areas rises with easy access to information on the internet and the capabilities of large language models. Most published detection methods analyse plagiarism at the sentence level. We have developed approaches for generating and detecting paraphrased paragraphs that consider inter-sentence and intra-sentence relations, enabling the identification of paraphrased text at the paragraph level. This includes joining, splitting, and/or shifting sentences within a paragraph, as students often plagiarise paragraphs. In the generation stage, we create the ALECS dataset by developing three algorithms and applying a masking approach that targets the paragraph’s syntactic and lexical layers while maintaining its semantics. ALECS can help develop students’ paraphrasing abilities, as it provides more than six different forms of each source paragraph. In addition, as in this study, ALECS can be used to train deep learning models to generate or detect plagiarised paragraphs. In the detection phase, our method shows robust results and outperforms existing work in detecting paragraph-level paraphrases, achieving an F1 score of 90.1 with Longformer and 96 with a fine-tuned GPT-3.5.
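As a rough illustration of the paragraph-level operations the abstract names (joining, splitting, and shifting sentences), the toy sketch below applies each one to a short paragraph. It is not the authors' ALECS algorithms, which are not described on this page; the function names, the regex-based sentence splitter, and the ", and " connector are all assumptions made purely for illustration.

```python
from __future__ import annotations

import re


def split_sentences(paragraph: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def shift_sentence(sentences: list[str], src: int, dst: int) -> list[str]:
    """Shifting: move the sentence at index src to index dst."""
    out = list(sentences)
    out.insert(dst, out.pop(src))
    return out


def join_sentences(sentences: list[str], i: int, connector: str = "and") -> list[str]:
    """Joining: merge sentences i and i+1 into one sentence.

    Lowercasing the second sentence's first letter is a simplification
    that would be wrong for proper nouns.
    """
    first = sentences[i].rstrip(".!?")
    second = sentences[i + 1]
    second = second[0].lower() + second[1:]
    merged = f"{first}, {connector} {second}"
    return sentences[:i] + [merged] + sentences[i + 2:]


def split_sentence(sentences: list[str], i: int, marker: str = ", and ") -> list[str]:
    """Splitting: break sentence i in two at the first occurrence of marker."""
    head, sep, tail = sentences[i].partition(marker)
    if not sep or not tail:
        return list(sentences)  # nothing to split on
    parts = [head + ".", tail[0].upper() + tail[1:]]
    return sentences[:i] + parts + sentences[i + 1:]


paragraph = "Students read the source. They rewrite it. The meaning stays the same."
sents = split_sentences(paragraph)
shifted = shift_sentence(sents, 2, 0)
joined = join_sentences(sents, 0)
resplit = split_sentence(joined, 0)
```

Each operation changes the paragraph's surface structure while leaving its content words in place, which is why a detector that compares sentences one-to-one can miss the match while a paragraph-level comparison need not.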
Citation
Saqaabi, A. A., Stewart, C., Akrida, E., & Cristea, A. I. (2025). A Deep Learning Approach for Paragraph-Level Paraphrase Generation for Plagiarism Detection. Neural Processing Letters, 57, 59. https://doi.org/10.1007/s11063-025-11771-9
Journal Article Type | Article |
---|---|
Acceptance Date | May 10, 2025 |
Online Publication Date | Jun 12, 2025 |
Publication Date | Jun 12, 2025 |
Deposit Date | Jun 19, 2025 |
Publicly Available Date | Jun 23, 2025 |
Journal | Neural Processing Letters |
Print ISSN | 1370-4621 |
Electronic ISSN | 1573-773X |
Publisher | Springer |
Peer Reviewed | Peer Reviewed |
Volume | 57 |
Pages | 59 |
DOI | https://doi.org/10.1007/s11063-025-11771-9 |
Keywords | Plagiarism detection, Paraphrase identification, Artificial intelligence, Paragraph-level, Academic writing, Natural language processing, Large language models |
Public URL | https://durham-repository.worktribe.com/output/4104818 |
Files
Published Journal Article (PDF, 2 MB)
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/