Dr Will Yeadon will.yeadon@durham.ac.uk
Career Development Fellow
Dr Alex Peach a.m.peach@durham.ac.uk
Assistant Professor
Dr Craig Testrow craig.p.testrow@durham.ac.uk
Assistant Professor
This study evaluates the performance of the ChatGPT variants GPT-3.5 and GPT-4, with and without prompt engineering, against student-only work and a mixed category containing both student and GPT-4 contributions, in university-level physics coding assignments written in Python. Comparing 50 student submissions with 50 AI-generated submissions across the categories, each marked blindly by three independent markers, yielded n = 300 data points. Students averaged 91.9% (SE: 0.4), surpassing the highest-performing AI submission category, GPT-4 with prompt engineering, which scored 81.1% (SE: 0.8), a statistically significant difference (p = 2.482×10⁻¹⁰). Prompt engineering significantly improved scores for both GPT-4 (p = 1.661×10⁻⁴) and GPT-3.5 (p = 4.967×10⁻⁹). Additionally, the blinded markers were asked to guess the authorship of each submission on a four-point Likert scale from ‘Definitely AI’ to ‘Definitely Human’. They identified authorship accurately: 92.1% of the work categorised as ‘Definitely Human’ was indeed human-authored. Simplifying this to a binary ‘AI’ or ‘Human’ classification gave an average accuracy of 85.3%. These findings suggest that while AI-generated work closely approaches the quality of university students’ work, it often remains detectable by human evaluators.
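For readers who want to sanity-check the statistics above, the sketch below shows how comparisons of this kind are typically computed in Python, the language of the course. It is an illustration under stated assumptions, not the authors' analysis code: the abstract does not name the exact significance test, so Welch's two-sample t-test is assumed, and all data in the snippet are synthetic stand-ins.

```python
# A minimal sketch, not the authors' published analysis code, of how the
# reported comparisons could be reproduced. Welch's t-test is an assumption;
# the marks are synthetic, drawn to match the reported means and standard
# errors (students: 91.9%, SE 0.4; GPT-4 with prompt engineering: 81.1%,
# SE 0.8; SD = SE * sqrt(n) for n = 50 submissions per group).
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n = 50
student_marks = rng.normal(loc=91.9, scale=0.4 * np.sqrt(n), size=n)
gpt4_pe_marks = rng.normal(loc=81.1, scale=0.8 * np.sqrt(n), size=n)

# Welch's t-test (unequal variances) between the two groups.
t_stat, p_value = stats.ttest_ind(student_marks, gpt4_pe_marks, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3e}")

# Collapsing the four-point Likert authorship guesses to a binary 'AI'/'Human'
# label and scoring accuracy against the true labels, as the abstract
# describes. The guesses and labels here are hypothetical examples.
guesses = ["Definitely Human", "Definitely AI", "Definitely Human"]
truth = ["Human", "AI", "Human"]
binary = ["AI" if "AI" in g else "Human" for g in guesses]
accuracy = np.mean([b == label for b, label in zip(binary, truth)])
print(f"binary accuracy = {accuracy:.1%}")
```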
Yeadon, W., Peach, A., & Testrow, C. (2024). A comparison of human, GPT-3.5, and GPT-4 performance in a university-level coding course. Scientific Reports, 14, Article 23285. https://doi.org/10.1038/s41598-024-73634-y
| Journal Article Type | Article |
|---|---|
| Acceptance Date | Sep 19, 2024 |
| Online Publication Date | Oct 7, 2024 |
| Publication Date | Oct 7, 2024 |
| Deposit Date | Oct 23, 2024 |
| Publicly Available Date | Oct 23, 2024 |
| Journal | Scientific Reports |
| Electronic ISSN | 2045-2322 |
| Publisher | Nature Research |
| Peer Reviewed | Peer Reviewed |
| Volume | 14 |
| Article Number | 23285 |
| DOI | https://doi.org/10.1038/s41598-024-73634-y |
| Keywords | ChatGPT, Benchmark, Coding, GPT-4 |
| Public URL | https://durham-repository.worktribe.com/output/2954838 |
Published Journal Article (PDF, 2.3 MB)
Publisher Licence URL: http://creativecommons.org/licenses/by/4.0/
Evaluating AI and human authorship quality in academic writing through physics essays (2024), Journal Article
The impact of AI in physics education: a comprehensive review from GCSE to university levels (2024), Journal Article
The death of the short-form physics essay in the coming AI revolution (2023), Journal Article
Emergent dark gravity from (non)holographic screens (2019), Journal Article