ChatGPT and other AI writing software have become increasingly prominent. They can generate text in seconds. However, this has raised the risk of cheating in take-home exams and assignments.
Anti-cheating software is trying to tackle this looming problem. This includes Turnitin, which claims to be able to detect AI-generated text. Similarly, AICheatCheck offers a free tool to check for AI-generated text.
How would anti-cheating software detect AI-generated text?
How then does anti-cheating software detect AI-generated text? Eric Wang, the Vice President of AI at Turnitin, argues that it does so by looking for the text’s central tendency. He argues that AI-generated text is very ‘average’: AI tends to produce the most statistically likely words in the most statistically likely syntax. Thus, the software measures how closely the text resembles that average.
“These models are trained at the sum of human knowledge, so they write extremely average. They are mad-lib machines that pick the most probable word in the most probable place. Humans are idiosyncratic… no person is actually exactly average”
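The idea of scoring how ‘average’ a text is can be illustrated with a toy example. The sketch below is not Turnitin’s or AICheatCheck’s method, which is not public. It assumes a simple word-frequency model built from a tiny reference text, and the names `train_unigram` and `averageness` are hypothetical. It shows only the general principle: text made of statistically likely words scores higher than idiosyncratic text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_unigram(corpus):
    """Count word frequencies in a reference corpus.

    Hypothetical stand-in for a real language model.
    """
    return Counter(tokenize(corpus))


def averageness(text, counts):
    """Mean log-probability per word under an add-one-smoothed unigram model.

    Scores closer to zero mean the text uses more statistically likely words.
    """
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no words")
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words
    log_probs = [math.log((counts[t] + 1) / (total + vocab)) for t in tokens]
    return sum(log_probs) / len(log_probs)


# Toy reference text, made up for illustration only.
reference = (
    "free cash flow is the cash a company generates after paying for "
    "operations and capital expenditure. free cash flow shows how much cash "
    "is available to investors."
)
counts = train_unigram(reference)

generic = "Free cash flow is the cash available to investors."
quirky = "Twas brillig, and the slithy cash did gyre and gimble."

generic_score = averageness(generic, counts)
quirky_score = averageness(quirky, counts)

# The generic sentence is closer to the 'average' of the reference text.
print(generic_score > quirky_score)
```

A real detector would presumably rely on a large language model rather than word counts, and would need a threshold calibrated on known human and AI writing. The toy version only makes the ‘most probable word’ intuition concrete.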
The software can be successful, but it has issues. AICheatCheck can successfully detect AI-generated text for simple queries. For example, if we ask ChatGPT to “explain free cash flow”, the software detects that the response is AI generated.
The software can struggle with more complex examples. For example, if we ask ChatGPT to “explain free cash flow in the style of Lewis Carroll”, the software might not detect that the response is AI generated. This makes some sense: the resulting text is surreal, so it is far from ‘average’. Nevertheless, the software fails to detect that it is AI generated.
This also exposes a further problem. A potential cheater can ask the AI to generate text and can then slightly change the words, word order, sentences, and syntax. This could potentially defeat cheat detection software. Reddit users have already raised this possibility.
What then are academics to do?
There are some solutions to AI-driven cheating. OpenAI has indicated that it will include cryptographic markers to enable cheat-detection software to detect AI-generated text. However, this does not seem to solve the problem of students generating AI text and then slightly altering the wording.
Academics can mitigate this issue with appropriately designed exams and assignments. At present, AI tends to generate generic, superficial answers. The answers tend to involve generalizations. These generalizations can be useful. However, AI is often not sophisticated enough to appropriately apply evidence to practical scenarios.
Academics can use AI’s current weaknesses in their exam design. This can involve designing problem sets that require applied knowledge, especially where the answers require students to cite appropriate scenario-specific evidence. For example, law problem questions often require students to apply statutes and cases to specific fact scenarios. AI tends to do a poor job of correctly citing and applying evidence. It also might confuse jurisdictions, meaning that an AI-generated answer might simply be wrong. This means that, in an appropriately designed question, the AI-generated answer might earn low marks (or fail) regardless.
Regardless of the current solution, AI poses challenges for assessment design. As AI develops over time, so too must assessments and cheat-detection software. But, currently, academics can improve assessment design to reduce these risks.