Text from the ChatGPT page of the OpenAI website is shown in New York on Feb. 2. (AP file photo)

ChatGPT, a generative artificial intelligence (AI) chatbot, may seem fluent in Japanese at first glance, but its Japanese texts can be told apart from those written by humans with 100 percent accuracy, researchers said.

Their study, published in a journal on Aug. 9, drew on stylistic features used in criminal investigations.

Concerns have grown that ChatGPT could be used dishonestly to write university essays, academic papers and other reports.

ChatGPT-generated English-language research papers have reportedly been distinguished from human-written ones with high accuracy, but no similar study had been conducted on Japanese texts.

Wataru Zaitsu, an associate professor of criminal psychology with Mejiro University in Tokyo, and Jin Mingzhe, a specially appointed professor of data science with Kyoto University of Advanced Science, took on the task.

The pair compared 72 existing Japanese-language academic papers on psychology with 144 texts generated by two versions of ChatGPT, which had been instructed to write papers under the same titles as those articles.

The researchers mainly compared stylistic features, including where commas were placed and the sequences of parts of speech.

The study found that, compared with the human-written papers, ChatGPT-generated texts more often placed a comma after the postpositional particle “wa” and more frequently used the prefix “hon,” which means “the present,” as in “the present paper,” they said.
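To illustrate what such features look like in practice, here is a minimal Python sketch that approximates the two measures described above with plain character counts. It is not the researchers' code. It assumes no morphological analyzer, so every "は" (wa) character is treated as a possible particle, which overcounts because は also appears inside ordinary words. The list of "hon"-prefixed words (`HON_WORDS`) and the per-1,000-character normalization are hypothetical choices made for this example.

```python
"""Toy stylometric features loosely modeled on the study's description.

Assumptions (not from the study): no morphological analyzer is used, so
every "は" character counts as a possible topic particle, and HON_WORDS is a
hypothetical, hand-picked list of words starting with the prefix 本 ("hon").
"""

# Hypothetical examples of "hon"-prefixed words: "this study", "this paper", "this article".
HON_WORDS = ("本研究", "本論文", "本稿")


def comma_after_wa_rate(text: str) -> float:
    """Share of "は" characters immediately followed by the Japanese comma "、"."""
    wa_count = text.count("は")
    if wa_count == 0:
        return 0.0
    return text.count("は、") / wa_count


def hon_prefix_per_1000_chars(text: str) -> float:
    """Occurrences of the hypothetical HON_WORDS per 1,000 characters of text."""
    if not text:
        return 0.0
    hits = sum(text.count(word) for word in HON_WORDS)
    return hits * 1000 / len(text)


def features(text: str) -> dict[str, float]:
    """Bundle both toy features for one document."""
    return {
        "comma_after_wa": comma_after_wa_rate(text),
        "hon_per_1000": hon_prefix_per_1000_chars(text),
    }


if __name__ == "__main__":
    # "This study examined properties of memory. The results were clear."
    sample = "本研究は、記憶の特性を調べた。結果は明確であった。"
    print(features(sample))  # {'comma_after_wa': 0.5, 'hon_per_1000': 40.0}
```

In a real analysis, such per-document rates would be computed for every text and passed to a classifier, and a morphological analyzer would likely be needed to restrict the counts to the actual particle.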

On the basis of those features, an AI classifier could distinguish between ChatGPT-generated and human-written Japanese texts with an accuracy of 100 percent, the researchers said.

The results were the same for GPT-4, the latest version of the chatbot, which is considered more capable at generating text.

Zaitsu previously served as a senior researcher with the criminal investigation laboratory of the Toyama prefectural police.

He said the study's technique of analyzing commas and parts of speech draws on a method studied in criminal investigations to identify or verify the authors of notes, letters and other writings.

“ChatGPT-generated Japanese texts appear natural to my eyes, but if you rely on data, you can tell them apart from human-written texts without much difficulty,” Zaitsu said. “Doing so is easier than distinguishing between texts written by different humans.”

Zaitsu added that other generative AI chatbots produce Japanese text that is closer to human writing, so he hopes to continue studying ways to distinguish them.

The research paper was published in the scientific journal PLOS ONE (https://doi.org/10.1371/journal.pone.0288453).