Dr Will Yeadon will.yeadon@durham.ac.uk
Assistant Professor
Dr Alex Peach a.m.peach@durham.ac.uk
Assistant Professor
Dr Craig Testrow craig.p.testrow@durham.ac.uk
Assistant Professor
This study evaluates the performance of two ChatGPT variants, GPT-3.5 and GPT-4, both with and without prompt engineering, against student-only work and a mixed category containing both student and GPT-4 contributions, in university-level physics coding assignments written in Python. Comparing 50 student submissions to 50 AI-generated submissions across the different categories, each marked blindly by three independent markers, we amassed n = 300 data points. Students averaged 91.9% (SE: 0.4), surpassing the highest-performing AI category, GPT-4 with prompt engineering, which scored 81.1% (SE: 0.8), a statistically significant difference (p = 2.482×10⁻¹⁰). Prompt engineering significantly improved scores for both GPT-4 (p = 1.661×10⁻⁴) and GPT-3.5 (p = 4.967×10⁻⁹). Additionally, the blinded markers were tasked with guessing the authorship of each submission on a four-point Likert scale from 'Definitely AI' to 'Definitely Human'. They identified authorship accurately: 92.1% of the work categorized as 'Definitely Human' was indeed human-authored. Simplifying this to a binary 'AI' or 'Human' categorization resulted in an average accuracy of 85.3%. These findings suggest that while AI-generated work closely approaches the quality of university students' work, it often remains detectable by human evaluators.
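As a minimal sketch of the two analyses the abstract describes, the Python snippet below compares two score distributions and collapses the four-point authorship scale to a binary call. The score arrays and authorship labels are synthetic stand-ins (the paper's data are not reproduced here), and the choice of a Welch two-sample t-test is an assumption, as the abstract does not name the specific significance test used.

```python
import numpy as np
from scipy import stats

# Synthetic score arrays standing in for the paper's data; the means and
# standard errors are chosen to mirror the reported 91.9% (SE 0.4) for
# students and 81.1% (SE 0.8) for GPT-4 with prompt engineering.
rng = np.random.default_rng(0)
n = 150  # 50 submissions per category, each marked by three markers
student_scores = rng.normal(loc=91.9, scale=0.4 * np.sqrt(n), size=n)
gpt4_pe_scores = rng.normal(loc=81.1, scale=0.8 * np.sqrt(n), size=n)

# Welch's t-test (unequal variances) as one standard way to compare means.
t_stat, p_value = stats.ttest_ind(student_scores, gpt4_pe_scores,
                                  equal_var=False)
print(f"student mean = {student_scores.mean():.1f} "
      f"(SE {stats.sem(student_scores):.2f})")
print(f"GPT-4+PE mean = {gpt4_pe_scores.mean():.1f} "
      f"(SE {stats.sem(gpt4_pe_scores):.2f})")
print(f"Welch t-test: t = {t_stat:.2f}, p = {p_value:.3e}")

# Collapsing the four-point Likert authorship scale to a binary 'AI'/'Human'
# call and scoring accuracy against ground truth (labels invented here).
likert_to_binary = {
    "Definitely AI": "AI", "Probably AI": "AI",
    "Probably Human": "Human", "Definitely Human": "Human",
}
guesses = ["Definitely Human", "Probably AI", "Definitely AI", "Probably Human"]
truth = ["Human", "AI", "AI", "AI"]
binary = [likert_to_binary[g] for g in guesses]
accuracy = np.mean([b == g for b, g in zip(binary, truth)])
print(f"binary authorship accuracy = {accuracy:.1%}")
```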
Yeadon, W., Peach, A., & Testrow, C. (2024). A comparison of human, GPT-3.5, and GPT-4 performance in a university-level coding course. Scientific Reports, 14, Article 23285. https://doi.org/10.1038/s41598-024-73634-y
Field | Value
---|---
Journal Article Type | Article
Acceptance Date | Sep 19, 2024
Online Publication Date | Oct 7, 2024
Publication Date | Oct 7, 2024
Deposit Date | Oct 23, 2024
Publicly Available Date | Oct 23, 2024
Journal | Scientific Reports
Electronic ISSN | 2045-2322
Publisher | Nature Research
Peer Reviewed | Peer Reviewed
Volume | 14
Article Number | 23285
DOI | https://doi.org/10.1038/s41598-024-73634-y
Keywords | ChatGPT, Benchmark, Coding, GPT-4
Public URL | https://durham-repository.worktribe.com/output/2954838
Published Journal Article: PDF (2.3 MB)
Publisher Licence: CC BY 4.0, http://creativecommons.org/licenses/by/4.0/
The death of the short-form physics essay in the coming AI revolution (2023), Journal Article
Emergent dark gravity from (non)holographic screens (2019), Journal Article
Tensor network models of multiboundary wormholes (2017), Journal Article
Hot multiboundary wormholes from bipartite entanglement (2015), Journal Article
Schrödinger holography with z = 2 (2015), Journal Article