How to detect ChatGPT plagiarism and why it's so difficult
Chatbots have now become a hot topic, with ChatGPT being the most popular. Academics, educators, editors and others are facing a rising tide of AI-generated plagiarism. Your existing plagiarism detection tools may not be able to distinguish the real from the fake.
- Plenty of detection options
- Testing them
- Closing
In this article, I talk a bit about this nightmare side of AI chatbots, check out some online plagiarism detection tools, and examine how dire the situation has become.
Plenty of detection options
The November 2022 release of ChatGPT by startup OpenAI thrust the chatbot's versatility into the spotlight. With it, anyone, whether a regular Joe or a professional, can generate intelligent, readable articles and essays, and even solve text-based math problems. To the inexperienced or uninformed reader, AI-generated content passes easily for human writing. That's why students love it and teachers hate it.
The big challenge posed by AI writing tools is their double-edged ability to use natural language and grammar to build unique, almost individualized content, even when that content is drawn from a database. This makes AI-assisted cheating genuinely hard to detect. Here are some detection options I found that are currently free.
The GPT-2 Output Detector was created by OpenAI, the maker of ChatGPT itself, to show that chatbot text can be detected. Anyone can use it: simply paste in a passage of text, and the tool immediately estimates how likely it is that the text was written by a human.
Two other tools with clean UIs are the Writer AI Content Detector and Content at Scale. You can paste a URL to scan (Writer only) or add text manually. The results give a percentage score indicating how likely it is that the content was generated by a human.
GPTZero, a beta tool that Princeton student Edward Tian built and posted on Streamlit, is a homegrown alternative. It differs from the rest in how its "algiarism" (AI-assisted plagiarism) model presents its results. GPTZero reports two metrics: perplexity and burstiness. Perplexity measures the randomness of an individual sentence, while burstiness measures how much that randomness varies across the text as a whole. The tool assigns a number to each metric; the lower the numbers, the more likely it is that the text was created automatically by a bot.
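To make those two metrics concrete, here is a toy sketch of my own (not GPTZero's actual model, which uses a large language model under the hood): it scores text with a simple smoothed character-bigram model, computes per-sentence perplexity, and treats "burstiness" as the spread of those perplexities across sentences.

```python
import math
import re

def bigram_model(corpus: str):
    """Build add-one-smoothed character-bigram probabilities from a corpus."""
    counts, totals = {}, {}
    for a, b in zip(corpus, corpus[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
        totals[a] = totals.get(a, 0) + 1
    vocab_size = len(set(corpus))

    def prob(a: str, b: str) -> float:
        # Add-one smoothing keeps every transition probability nonzero.
        return (counts.get((a, b), 0) + 1) / (totals.get(a, 0) + vocab_size)

    return prob

def perplexity(sentence: str, prob) -> float:
    """exp of the average negative log-probability of each character transition."""
    pairs = list(zip(sentence, sentence[1:]))
    nll = -sum(math.log(prob(a, b)) for a, b in pairs) / len(pairs)
    return math.exp(nll)

def burstiness(text: str, prob) -> float:
    """Spread (population std-dev) of sentence perplexities.

    Uniformly low perplexity sentence after sentence is the bot-like signature;
    human writing tends to swing between predictable and surprising sentences.
    """
    sentences = [s for s in re.split(r"[.!?]\s*", text) if len(s) > 1]
    pp = [perplexity(s, prob) for s in sentences]
    mean = sum(pp) / len(pp)
    return math.sqrt(sum((p - mean) ** 2 for p in pp) / len(pp))
```

In this sketch, a low perplexity means the model found each sentence predictable, and a low burstiness means every sentence was about equally predictable; both low together is the bot-like pattern GPTZero flags.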
Just for fun, I also included the Giant Language Model Test Room (GLTR), developed by researchers at the MIT-IBM Watson AI Lab and the Harvard Natural Language Processing Group. Unlike GPTZero, it doesn't deliver a clear "human" or "bot" verdict. Instead, GLTR uses a language model itself to identify text written or edited by bots, on the premise that bots are less likely than humans to pick unpredictable words. The results appear as a color-coded histogram that ranks each word by how predictable it was in context. The more unpredictable the text, the more likely it is to be human-written.
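A minimal sketch of the GLTR idea, under simplifying assumptions of my own: the real tool asks GPT-2 for each word's rank among its contextual predictions, while here a bare unigram frequency model stands in for the language model. Words are then bucketed into the green/yellow/red/purple bands GLTR displays.

```python
from collections import Counter

def rank_words(text: str, corpus: str):
    """Return (word, rank) pairs: rank 1 = the word the model expects most."""
    freq = Counter(corpus.lower().split())
    ordering = [w for w, _ in freq.most_common()]
    ranks = []
    for word in text.lower().split():
        # Unknown words get the worst possible rank (maximally surprising).
        rank = ordering.index(word) + 1 if word in freq else len(ordering) + 1
        ranks.append((word, rank))
    return ranks

def color(rank: int) -> str:
    """GLTR's buckets: top-10 green, top-100 yellow, top-1000 red, else purple."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"
```

A page of mostly green words suggests the text hugs the model's top predictions, the bot-like pattern; scattered reds and purples are the unpredictable choices humans tend to make.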
Testing them
All of these options might make you think we're in a good place when it comes to AI detection. To test each tool's effectiveness, I decided to try them myself: I wrote sample paragraphs answering questions that I also posed to ChatGPT.
My first question was simple: why is buying a pre-assembled computer considered a bad idea? Here is how my answer and ChatGPT's fared.
| Tool | My real writing | ChatGPT |
| --- | --- | --- |
| GPT-2 Output Detector | 1.18% fake | 36.57% fake |
| Writer AI | 100% human | 99% human |
| Content at Scale | 99% human | 73% human |
| GPTZero | 80 perplexity | |

ChatGPT fooled most of these detectors with its response. It scored 99% human in the Writer AI Content Detector, for starters, and was flagged only 36.57% fake by the GPT-2 Output Detector. GLTR was the worst offender, rating my words as no more likely to have been written by a human than ChatGPT's.
However, I decided to give the tools another chance, and this time the results were much better. I asked ChatGPT to summarize the Swiss Federal Institute of Technology's research on using gold particles to prevent fogging. This round, the detection apps did a better job of both validating my answer and flagging ChatGPT's.
The top three tools really showed their strength on this answer. GLTR still wouldn't confidently rate my writing as human, but it did catch ChatGPT.

Closing

Judging from the results of each query, online plagiarism detectors are not perfect. They handle more complex written work (such as the second prompt) reasonably well, but simpler answers are harder for them to pin down. Nor are they foolproof in the other direction: detection tools can sometimes misclassify human-written essays or articles as ChatGPT output, which is a real problem for editors or teachers who rely on them to catch cheaters. Developers are constantly improving accuracy and false-positive rates, but they are also bracing for GPT-4, which boasts a much larger dataset and more complex capabilities than the GPT-3.5 models on which ChatGPT was trained.

At this point, educators and editors will have to combine these tools with a little human intuition to identify AI-generated content. And anyone tempted to use chatbots such as ChatGPT or Notion to pass off their "work" as legitimate should not do so. Whichever way you look at it, repurposing content generated from a bot's database is plagiarism.