GPTZero: How to detect ChatGPT plagiarism

As far as world-changing technologies go, ChatGPT has truly had a massive impact on the way people think about writing and coding in the short time it has been available.

However, this power also comes with a significant downside, particularly in the education sector, where students are tempted to use ChatGPT for their own papers or exams. This type of plagiarism prevents students from learning as much as they could and has created a whole new problem for teachers: how to detect the use of AI?

Teachers and other users are now looking for ways to detect ChatGPT use in students’ work, and many are turning to tools like GPTZero, a ChatGPT detection tool developed by Princeton University student Edward Tian. The software is available to anyone, so if you want to try it out and determine the likelihood that a particular section of text was written using ChatGPT, here’s how to do it.

What is GPTZero?

A Midjourney depiction of a student and his robot friend in front of a blackboard.

GPTZero is a web app and service designed to detect whether a text was written by a human or an artificial intelligence. Currently, the system appears to be able to detect the output of a variety of large language models, including ChatGPT, GPT-4, and Claude, as well as whether the text was written by a human in collaboration with an AI.

It was developed and first released in January 2023 by Edward Tian, a 22-year-old computer science student at Princeton University and former software engineering intern at Microsoft. Announcing the platform on X (formerly Twitter), Tian pointed out that the analysis was based on research by Princeton PhD student Sreejan Kumar and work by Princeton’s Natural Language Processing Group.

The analysis is based on ongoing research with @sreejan_kumar and @princeton_nlp. Hopefully we’ll release something empirical soon. But in the meantime, it was fun making this app 🙂

– Edward Tian (@edward_the6) January 3, 2023

Is GPTZero free?

GPTZero is designed for educators, but anyone can use it for free. A free account lets you scan 40 documents per hour and access the GPTZero dashboard. The Essential plan, at $10/month, scans up to 150,000 words per month and grants access to “Premium” AI detection models, as well as “Plagiarism Scan” and “Advanced Grammar and Writing Skills” feedback. The Premium plan, at $16/month, raises the limit to 300,000 words per month and adds “Advanced AI Deep Scan” and multilingual AI detection on top of the Essential tier’s benefits. The top-tier Professional subscription, at $23/month, offers 500,000 words per month and an additional 10 million words of “overage.” That’s a lot of alleged plagiarism.

Is GPTZero accurate?

While GPTZero touts its service as extremely powerful, some users have noted that the service’s accuracy is “inconsistent, as it often flags human-written text as AI-generated and has issues with certain types of generated text.” At the suggestion of Reddit user Smellz_Of_Elderberry, I asked ChatGPT to write a short report on the book The Old Man and the Sea as if it were a high school student. GPTZero was not fooled.

ChatGPT writes as if it were a high school student.

I tried again, adding some wrong punctuation, wrong tense, and other small errors in the text, but GPTZero still stated: “Your text was probably written entirely by an AI.”

The scan correctly guessed the AI origin of a passage even when text generators other than Claude or GPT-4 were used. I had Gemini 1.5 Pro write a separate report on The Old Man and the Sea, and GPTZero caught that too.

The accuracy of GPTZero is still being assessed, but from these anecdotal tests it seems to work well.

When using GPTZero, you must remember that errors are possible. Whether you use GPTZero to detect AI or ChatGPT to write a document, you still need to check the results yourself.

How does GPTZero work?

GPTZero’s AI text scoring includes statistics on perplexity and burstiness.

GPTZero analyzes the randomness of a text, known as its perplexity, and how much that randomness varies within the text, which is called burstiness. AI-generated text tends to keep its perplexity steady from sentence to sentence, while human authors unconsciously vary it.
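
To make those two terms a little more concrete, here is a minimal Python sketch. It is not GPTZero's actual method, which is proprietary. It assumes a toy word-frequency model (the hypothetical `UnigramModel` below) in place of a real large language model, a naive sentence splitter, and burstiness defined as the standard deviation of per-sentence perplexity. All function and class names are made up for illustration.

```python
import math
import re
from collections import Counter
from statistics import pstdev


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption of this sketch)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


class UnigramModel:
    """Toy stand-in for a language model: word frequencies with add-one smoothing."""

    def __init__(self, reference_text):
        tokens = tokenize(reference_text)
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab_size = len(self.counts) + 1  # one extra slot for unseen words

    def prob(self, token):
        return (self.counts[token] + 1) / (self.total + self.vocab_size)


def perplexity(model, text):
    """exp of the average negative log-probability per token."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    avg_neg_log_prob = -sum(math.log(model.prob(t)) for t in tokens) / len(tokens)
    return math.exp(avg_neg_log_prob)


def burstiness(model, text):
    """Population standard deviation of per-sentence perplexity."""
    scores = [perplexity(model, s) for s in split_sentences(text) if tokenize(s)]
    return pstdev(scores) if len(scores) > 1 else 0.0


# Hypothetical usage: a tiny reference corpus and two sample texts.
model = UnigramModel("the cat sat on the mat. the dog sat on the rug.")
uniform = "the cat sat on the mat. the cat sat on the mat."
varied = "the cat sat on the mat. zebras juggle quantum pineapples."
print(burstiness(model, uniform), burstiness(model, varied))
```

In this toy setup, the text that repeats the same predictable sentence gets a burstiness of zero, while the text that mixes a predictable sentence with a surprising one scores higher. A real detector would presumably use a far stronger language model and many more signals.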

The work is ongoing, and Tian notes that more tests will be added to improve the accuracy of AI text recognition. In particular, implicit bias is one area being investigated as another way to determine if the text was generated by an AI.

We are currently still investigating implicit bias in LM-generated text and will hopefully add a few more tests and factors to improve the model

– Edward Tian (@edward_the6) January 3, 2023

How can I use GPTZero?

GPTZero is available on its website. Simply copy the text you want to check and paste it into the large box labeled Try it out.

GPTZero's website is quite simple and consists of a text field and a submit button.

You can also upload a PDF, Word, or text file and then select the Get results button. You must also check the box to confirm that you agree to the Terms of Service.

Alternatives to GPTZero

GPTZero is not the only AI-powered plagiarism detector on the market. OpenAI offers its GPT-2 Output Detector and has reportedly developed an updated version, although there is no information on if or when it will be released. Content at Scale AI Content Detection, ZeroGPT (I’m not sure how this got past the trademark office), Writefull GPT Detector, and Originality.ai all offer similar services with varying degrees of accuracy.

Why are my texts marked as AI?

With the rise of ChatGPT and the development of AI detection tools, both authors and readers now have new concerns about how to tell if content was created by AI and whether real texts are being flagged as coming from an AI. This is especially a concern for students, who face consequences from their schools or universities if they are found to be using AI. Some students now regularly run their own original work through detectors like GPTZero and find that sentences are being flagged as being written by AI even when they are not.

In 2024, Ian Bogost, a writer for The Atlantic, described running his own original work through plagiarism detection software and finding that an astonishing 74% of it was initially flagged as plagiarism. Through careful review and elimination, he managed to reduce this number to zero, but it took him several hours of reviewing and adjusting settings to achieve this goal.

AI detection is similar to plagiarism detection in that both can only give best guesses about whether or not a piece of text is original and created by humans. And these tools require very careful review as both can tend to produce false positives. If you find that your work is flagged as AI-generated on GPTZero when it is not, it could be for very different reasons, such as you not being a native English speaker, repeating your ideas too much, or using a tool like Grammarly.
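
To see why false positives deserve so much caution, here is a small back-of-the-envelope calculation in Python. The numbers are purely hypothetical and are not GPTZero's published rates: it assumes 10% of submitted essays are AI-written, that the detector catches 90% of those, and that it wrongly flags 5% of human-written essays. The function name is made up for illustration.

```python
def flagged_but_human_share(ai_share, true_positive_rate, false_positive_rate):
    """Fraction of flagged documents that were actually written by humans."""
    human_share = 1.0 - ai_share
    flagged_ai = ai_share * true_positive_rate        # AI essays correctly flagged
    flagged_human = human_share * false_positive_rate  # human essays wrongly flagged
    total_flagged = flagged_ai + flagged_human
    if total_flagged == 0:
        return 0.0
    return flagged_human / total_flagged


# Hypothetical rates, chosen only to illustrate the arithmetic.
share = flagged_but_human_share(ai_share=0.10,
                                true_positive_rate=0.90,
                                false_positive_rate=0.05)
print(round(share, 2))  # 0.33
```

Under those assumed rates, roughly one in three flagged essays would be the work of a human, which is why a flag on its own should be treated as a starting point for review and not as proof.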

If your work is flagged as AI, double-check that all citations and references are properly formatted, and avoid using automated tools like Grammarly to make edits if possible.

And remember, GPTZero is a black box: a proprietary, “trade secret” algorithm that claims your text is statistically similar to other examples found all over the public internet. The company will not explain in court how its product actually works, nor prove its accuracy, so if you find yourself in trouble over alleged generative plagiarism, it’s your word against the company’s. Get a lawyer and make them show their work.

Do we really need a plagiarism checker?

OpenAI went far beyond the research labs to which many text-generation AIs are tied and launched ChatGPT to the public in late November 2022. As of January 2023, ChatGPT had over 100 million users, making it the fastest-growing public application to date.

This means that concerns about plagiarism will only grow as this AI assistance becomes available in all areas of life. Microsoft is integrating OpenAI’s technology into Bing search and Google is testing its own version, known as Gemini (formerly Bard).

A color painting of a laughing robot, generated by Dall-E.

On a related note, AI image generators such as Dall-E and Stable Diffusion are being investigated for possible copyright infringement. All of these artificial intelligence services have been trained on text, photos and artwork found on the internet and created by billions of people.

In a sense, AI is using human intelligence without creating anything itself. If I take something from another human, I have to cite the source and potentially pay a royalty fee. With generative AI, it becomes more difficult to cite a source because each text or image is broken down into diffuse elements and then reassembled using thousands or millions of sources to create a new work.

We need to either rethink our attitudes toward copyright and plagiarism, or find tools to help identify AI-generated material and perhaps develop a method to acknowledge the large number of people who contribute to each AI-generated work.