Photo/Illustration: The preprint research paper “Can ChatGPT Pass the 2023 Japanese National Medical Licensing Examination?” by Yudai Kaneda and his fellow researchers (Takahiro Takenouchi)

Yudai Kaneda tested the artificial intelligence ChatGPT to see how it would fare on this year's national licensing exam for doctors held in February.

To the aspiring doctor’s surprise, the AI chatbot with advanced conversational capabilities, which anyone can use for free, correctly answered 55 percent of the questions.

That is short of a passing grade on Japan’s National Examination for Medical Practitioners but is higher than his own score.

“In the future, we might be able to casually ask an AI about the exam’s questions,” said Kaneda, a fifth-year student at Hokkaido University’s medical department. “Our way of studying medicine could change.”

PUTTING CHATGPT TO THE TEST

ChatGPT was developed by the U.S. startup OpenAI.

Even though the chatbot isn’t trained in specific fields, U.S. media reported that it passed a medical licensing exam there as well as an MBA exam.

It is hoped that a conversational agent such as ChatGPT will help doctors on the front lines diagnose patients in the future.

Kaneda and his fellow researchers tested ChatGPT with questions from the national exam to find out how the chatbot would fare in the Japanese language.

The exam is a national licensing test for doctors, which generally requires test-takers to be at least sixth-year medical students at a university. 

Taking the exam requires completion of a university medical program, and Japanese universities offer six-year programs. Exam takers can take the question sheets home after the test.

In addition, the health ministry posts the exam’s questions on its website after the test day every year.

Kaneda manually inputted all 400 questions and answer choices from the February exam into ChatGPT.

He could do so because a senior student who took the test brought home the question sheets.

He then scored ChatGPT’s answers against the sample answers published by a prep school specializing in the medical licensing examination.

He published a paper on the results as a preprint on March 10 (https://www.preprints.org/manuscript/202303.0191/v1).

One of the questions in the February exam read: “A 62-year-old woman came to a hospital with a rash and a fever as her main symptoms. What is the most probable diagnosis? Say the reason as well. Choose from the following: a) fixed drug eruption ...”

To this question, ChatGPT answered, “The most probable diagnosis is drug-induced hypersensitivity syndrome. In this case, these symptoms were caused by taking general-purpose cold medicine …”

ChatGPT gave reasons for its responses, in addition to choosing its answer from the multiple options offered.

However, the reasons ChatGPT gave sometimes included obviously incorrect information, even though they looked credible at first glance.

It was already known that ChatGPT can give incorrect answers like this.

Its response to the question about the 62-year-old patient was also wrong, according to the sample answer published by the prep school.

Excluding the 11 questions that required interpreting images, ChatGPT answered 55 percent of the remaining 389 questions correctly.

When the points assigned to each question are taken into account, ChatGPT scored 135 out of 197 points on the compulsory questions, a score rate of 69 percent; passing requires a score rate of at least 80 percent.

On the general and clinical questions, it earned 149 out of 292 points, a score rate of 51 percent; passing requires around 70 percent.

ChatGPT therefore failed both parts of the exam.
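For readers who want to check the arithmetic, here is a minimal sketch in Python of the pass/fail calculation. The point totals and thresholds come from the figures reported above; the section names and data layout are illustrative assumptions, not an official scoring tool for the exam.

def score_rate(earned, possible):
    # Score rate as a percentage of available points.
    return 100 * earned / possible

# (points earned, points possible, pass threshold in percent), per the article.
sections = {
    "compulsory": (135, 197, 80),
    "general and clinical": (149, 292, 70),  # "around 70 percent" to pass
}

for name, (earned, possible, threshold) in sections.items():
    rate = score_rate(earned, possible)
    verdict = "pass" if rate >= threshold else "fail"
    print(f"{name}: {earned}/{possible} points = {rate:.0f}% "
          f"(needs {threshold}%) -> {verdict}")

Run as written, this prints a 69 percent score rate for the compulsory section and 51 percent for the general and clinical section, both short of their thresholds.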

However, many of these questions ask test-takers to choose their answers from five options.

That means random guessing would yield a correct answer rate of around 20 percent, and ChatGPT clearly fared better than that.

“I was honestly surprised that the AI correctly answered more than half of the questions, even though it isn’t designed to answer the questions of the national exam for doctors and is available to everyone,” Kaneda said. “I believe that ChatGPT is as knowledgeable as medical students in the first months of their sixth year at university, the period when they start seriously studying for the exam.”

In addition, GPT-4, the latest model of the ChatGPT series, has an even higher language ability.

Kaneda said that GPT-4 correctly answered 16 of the 20 questions that ChatGPT failed.

THE FUTURE OF MEDICAL STUDIES?

Tetsuya Tanimoto, a physician at the Medical Governance Research Institute in Tokyo who compiled the paper with Kaneda, said the AI program’s result is significant.

“GPT-4 has an incredible level of language ability,” he said. “It can even write a tanka poem in Japanese, for example.

“If a conversational AI program is developed based on medically credible literature, not dubious blogs or something similar, it could be used for front-line medical services in the not-so-distant future.”

Kaneda also tried this year’s national exam himself, using the question sheets the senior student brought home. His score rate was 29 percent, meaning he still has a long way to go.

“When I take the exam in two years’ time, I might be able to casually ask an AI program like ChatGPT, ‘Why is this treatment the wrong answer for this question?’ or ‘How should I think about that question?’ I believe that (AI) will change the way of studying medicine.”

Kaneda and his fellow researchers have submitted their paper on the study to an academic journal, where it is undergoing peer review.