That ChatGPT passes academic exams doesn't mean much

These days, ChatGPT is everywhere. OpenAI’s artificial intelligence chatbot has taken the world by storm thanks to its impressive capabilities: it can answer almost any question it is asked and generate complete texts on whatever subject you can imagine, among many other things. But perhaps nothing has provoked as many reactions as its “ability” to pass exams from law, business and even medical schools at some of the most prestigious universities in the United States. Does it really matter, though?

Don’t get me wrong. That ChatGPT has been able to pass these exams is yet another example of the immense potential of the language model on which it is based. But we should not give it much more weight than that, since it is not a special skill that places it one step away from replacing doctors, lawyers or other professionals. Nothing could be further from the truth.

The problem lies in how the media frames these achievements of ChatGPT, a framing that is then carried over to social networks, where it is amplified. A quick search on Twitter turns up dozens of viral tweets highlighting the AI chatbot’s ability to pass college exams, without providing even a shred of context about why anyone thought of putting it through this type of test in the first place, or whether certain questions proved more challenging than others.

But if ChatGPT can pass a medical exam today, doctors will be obsolete in a few years, right? I’m sorry to tell you, friends, that the picture is far more complex than that, no matter how advanced AI is today or how much it evolves in the coming years. The fact that the bot is capable of passing an academic test does not make it a doctor, lawyer or economist, for a simple reason: we cannot evaluate it in human terms.

ChatGPT’s “ability” to pass exams does not prove much

The artificial intelligence that powers ChatGPT has been trained on millions of pages of publicly available content on the web. It is therefore no surprise that it can pass a university exam in seconds. After all, unlike a person, it does not need weeks or months of hard study to learn, understand and retain the information required to answer the questions on an exam.

What’s more, it does not even have to worry about retaining that information, because it can refer back to its training data whenever needed, without much effort. Besides, those who decided to put ChatGPT through law or medical exams did not do so with the intention of proving that artificial intelligence can do the work of human professionals with less effort.

Jon Choi, a law professor at the University of Minnesota, explained that his intention was to test ChatGPT’s potential to help students complete exams, or lawyers in their practice. “ChatGPT struggled with the most classic components of law school exams, such as spotting potential legal issues and deep analysis applying legal rules to the facts of a case. But it could be very useful for producing a first draft that a student could then refine,” he explained.

Across the four law school courses the OpenAI chatbot completed, its performance was not particularly outstanding. According to the professors who graded the exams, the AI platform’s marks were equivalent to those of a student who earns a C+: a low grade, but enough to pass.

In a business management course at the University of Pennsylvania, it did better, earning grades of B and B-. According to a professor, it excelled at questions about operations management and process analysis, but struggled with more complex exercises, to the point of making “astonishing mistakes” with basic math.

ChatGPT was also used to complete the United States Medical Licensing Exam. The chatbot surprised the researchers, who noted that it answered the questions at a level close to or above the passing threshold. They also suggested its potential could be helpful in the education of aspiring doctors, or even, in the future, to assist decision-making in clinical settings. But nothing about replacing doctors any time soon.

The old dilemma of anthropomorphizing AI


Saying that ChatGPT’s “ability” to pass academic exams has little value today may sound harsh, but it is not a hater’s stance. On the contrary, it aims to keep our feet on the ground when discussing the qualities of the artificial intelligence chatbot and its implications for everyday life.

Until we can hire an AI to defend us in court, we cannot say it has replaced lawyers. Until it is capable of curing us without any human intervention, we cannot say it has replaced doctors, nurses or specialists. The same applies to any other profession more or less affected by ChatGPT or tools of this type.

One of the great dilemmas of dealing with advanced language models, or those operating at ever-larger scales, is their anthropomorphization: the tendency of people who interact with them to attribute human qualities or abilities to them, even when they have none. This is nothing new; it goes back to the first experiments with natural language processing software in the 1960s, although it has certainly taken on another dimension in recent years thanks to the evolution of AI.

ChatGPT clearly does not escape this situation. In just a couple of months, it has drastically changed how the public interacts with advanced artificial intelligence tools which, until not long ago, were inaccessible to most people. But it still has serious drawbacks to resolve, such as biases and the incorporation of erroneous (or outright false) information that can pass for true if it is not given due attention. Something that can be aggravated if one believes there is some kind of consciousness behind the generated response.

This problem has long been flagged by some of the most prominent researchers in the field of AI. Margaret Mitchell and Timnit Gebru, who worked at Google until their departures in late 2020 and early 2021, respectively, described it this way:

The tendency of human interlocutors to impute meaning where there is none may mislead both natural language processing researchers and the general public into regarding synthetic text as meaningful. […]

The text generated by a language model is not based on the communicative intention, any model of the world, or any model of the reader’s mental state. […]

The problem is that if one side of the communication has no meaning, then understanding the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model). Contrary to how it may seem when we observe its output, a language model is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning. It is a stochastic parrot.

Excerpt from “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”.
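The mechanism the quote describes can be illustrated with a deliberately tiny sketch: a bigram model that stitches words together based purely on how often pairs co-occurred in its training text, with no notion of meaning. This is only a toy analogy (real models like the one behind ChatGPT are vastly larger neural networks, and the corpus below is invented for the example):

```python
import random
from collections import defaultdict

# A toy "stochastic parrot": it only knows which words have followed
# which other words in its (tiny, invented) training text.
corpus = (
    "the model passes the exam because the model predicts "
    "the next word and the exam rewards the likely word"
).split()

# Record observed continuations: duplicates in the list make frequent
# pairs proportionally more likely when we sample.
follows = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current].append(nxt)

def parrot(start, length, seed=0):
    """Generate text by repeatedly sampling a statistically plausible
    next word. No meaning is involved anywhere, only co-occurrence."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length - 1):
        candidates = follows.get(word)
        if not candidates:  # dead end: this word was never followed by anything
            break
        word = rng.choice(candidates)
        output.append(word)
    return " ".join(output)

print(parrot("the", 8))
```

The output often reads as superficially fluent precisely because every adjacent pair of words was seen in training, which is the illusion the researchers warn about, here reproduced at the smallest possible scale.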

ChatGPT still has plenty of room to improve and evolve, and nothing suggests it will not do so for the better. But to believe that passing academic exams makes it ready to take over more important areas of our daily lives is nothing more than a delusion. Let’s enjoy the technology and its achievements, but let’s not fall for assumptions fueled by a lack of context, or by the hype behind a new product.
