A comparison of human, GPT-3.5, and GPT-4 performance in a university-level coding course

Yeadon, Will; Peach, Alex; Testrow, Craig

Abstract

This study evaluates the performance of the ChatGPT variants GPT-3.5 and GPT-4, both with and without prompt engineering, against work authored solely by students and against a mixed category containing both student and GPT-4 contributions, in university-level physics coding assignments written in Python. Comparing 50 student submissions to 50 AI-generated submissions across the different categories, each marked blindly by three independent markers, we amassed n = 300 data points. Students averaged 91.9% (SE: 0.4), surpassing the highest-performing AI submission category, GPT-4 with prompt engineering, which scored 81.1% (SE: 0.8), a statistically significant difference (p = 2.482×10⁻¹⁰). Prompt engineering significantly improved scores for both GPT-4 (p = 1.661×10⁻⁴) and GPT-3.5 (p = 4.967×10⁻⁹). Additionally, the blinded markers were tasked with guessing the authorship of the submissions on a four-point Likert scale from ‘Definitely AI’ to ‘Definitely Human’. They identified authorship accurately: 92.1% of the work categorized as ‘Definitely Human’ was human-authored. Simplifying this to a binary ‘AI’ or ‘Human’ categorization resulted in an average accuracy rate of 85.3%. These findings suggest that while AI-generated work closely approaches the quality of university students’ work, it often remains detectable by human evaluators.
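The headline comparison above is, in effect, a two-sample significance test on per-submission marks, and the authorship result collapses a four-point Likert judgement into a binary label. A minimal sketch of how such quantities could be computed in Python (the course language) follows; this is not the authors' analysis code, the marks are synthetic stand-ins, and the intermediate Likert labels are assumed, since the abstract names only the endpoints.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Synthetic stand-ins for per-mark scores: 50 submissions x 3 markers
    # per group, loosely matching the reported group means. Not study data.
    student_marks = rng.normal(91.9, 5.0, size=150)
    gpt4_pe_marks = rng.normal(81.1, 10.0, size=150)

    # Welch's two-sample t-test (unequal variances) between group means;
    # the paper reports p = 2.482e-10 for the student vs. GPT-4 contrast.
    t_stat, p_value = stats.ttest_ind(student_marks, gpt4_pe_marks,
                                      equal_var=False)

    # Standard error of the mean, as in the abstract's "SE" figures.
    se_students = student_marks.std(ddof=1) / np.sqrt(student_marks.size)
    print(f"t = {t_stat:.2f}, p = {p_value:.3e}, SE = {se_students:.2f}")

    # Collapsing a four-point Likert authorship judgement to binary
    # 'AI'/'Human' and scoring accuracy (labels and data here assumed).
    def to_binary(likert_label: str) -> str:
        return "Human" if "Human" in likert_label else "AI"

    guesses = ["Definitely Human", "Probably AI", "Probably Human", "Definitely AI"]
    truth = ["Human", "AI", "AI", "AI"]
    accuracy = np.mean([to_binary(g) == t for g, t in zip(guesses, truth)])
    print(f"binary accuracy = {accuracy:.1%}")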

Citation

Yeadon, W., Peach, A., & Testrow, C. (2024). A comparison of human, GPT-3.5, and GPT-4 performance in a university-level coding course. Scientific Reports, 14, Article 23285. https://doi.org/10.1038/s41598-024-73634-y

Journal Article Type: Article
Acceptance Date: Sep 19, 2024
Online Publication Date: Oct 7, 2024
Publication Date: Oct 7, 2024
Deposit Date: Oct 23, 2024
Publicly Available Date: Oct 23, 2024
Journal: Scientific Reports
Electronic ISSN: 2045-2322
Publisher: Nature Research
Peer Reviewed: Yes
Volume: 14
Article Number: 23285
DOI: https://doi.org/10.1038/s41598-024-73634-y
Keywords: ChatGPT, Benchmark, Coding, GPT-4
Public URL: https://durham-repository.worktribe.com/output/2954838
