Why large language models aren’t headed toward humanlike understanding

Generative AI is not very skillful at applying what it learns to new situations


As impressive as they seem, the latest computer brains usually fail at tasks that require generalizing concepts — tasks that come easily to humans.

DrAfter123/GETTY IMAGES

Apart from the northward advance of killer bees in the 1980s, nothing has struck as much fear into the hearts of headline writers as the ascent of artificial intelligence.

Ever since the computer Deep Blue defeated world chess champion Garry Kasparov in 1997, humans have faced the prospect that their supremacy over machines is merely temporary. Back then, though, it was easy to show that AI failed miserably in many realms of human expertise, from diagnosing disease to transcribing speech.

But then, about a decade ago, computer brains — known as neural networks — received an IQ boost from a new approach called deep learning. Suddenly computers approached human ability at identifying images, reading signs and enhancing photographs — not to mention converting speech to text as well as most typists.

Those abilities had their limits. For one thing, even apparently successful deep learning neural networks were easy to trick. A few small stickers strategically placed on a stop sign made an AI computer think the sign said “Speed Limit 80,” for example. And those smart computers needed to be extensively trained on a task by viewing numerous examples of what they should be looking for. So deep learning produced excellent results for narrowly focused jobs but couldn’t adapt that expertise very well to other arenas. You would not (or shouldn’t) have hired it to write a magazine column for you, for instance.

But AI’s latest incarnations have begun to threaten job security not only for writers but also for many other professionals.

“Now we’re in a new era of AI,” says computer scientist Melanie Mitchell, an artificial intelligence expert at the Santa Fe Institute in New Mexico. “We’re beyond the deep learning revolution of the 2010s, and we’re now in the era of generative AI of the 2020s.”

Generative AI systems can produce things that had long seemed safely within the province of human creative ability. AI systems can now answer questions with seemingly human linguistic skill and knowledge, write poems and articles and legal briefs, produce publication-quality artwork, and even create videos on demand of all sorts of things you might want to describe.

Many of these abilities stem from the development of large language models, abbreviated as LLMs, such as ChatGPT and similar systems. They are large because they are trained on huge amounts of data — essentially, everything on the internet, including digitized copies of countless printed books. Large can also refer to the large number of different kinds of things they can “learn” in their reading — not just words but also word stems, phrases, symbols and mathematical equations.

By identifying patterns in how such linguistic molecules are combined, LLMs can predict in what order words should be assembled to compose sentences or respond to a query. Basically, an LLM calculates probabilities of what word should follow another, something critics have derided as “autocorrect on steroids.”
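The “autocorrect on steroids” jibe can be made concrete with a toy sketch — my own minimal illustration, not how a real LLM works internally — that estimates next-word probabilities from word-pair counts in a tiny corpus:

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction (not a real LLM):
# count which word follows which in a tiny corpus, then turn
# those counts into probabilities.
corpus = "the duck saw the bill the duck paid the bill".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_probs(word):
    """Probability distribution over the words seen to follow `word`."""
    c = counts[word]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

print(next_word_probs("the"))  # 'duck' and 'bill' each follow 'the' half the time
```

An actual LLM replaces these raw counts with a neural network trained on trillions of words, but the output is the same in kind: a probability for each possible next word.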

Even so, LLMs have displayed remarkable abilities — such as composing texts in the style of any given author, solving riddles and deciphering from context whether “bill” refers to an invoice, proposed legislation or a duck.

“These things seem really smart,” Mitchell said this month in Denver at the annual meeting of the American Association for the Advancement of Science.

LLMs’ arrival has induced a tech-world version of mass hysteria among some experts in the field, who are concerned that, run amok, LLMs could raise human unemployment, destroy civilization and put magazine columnists out of business. Yet other experts argue that such fears are overblown, at least for now.

At the heart of the debate is whether LLMs actually understand what they are saying and doing, rather than just seeming to. Some researchers have suggested that LLMs do understand, can reason like people (big deal) or even attain a form of consciousness. But Mitchell and others insist that LLMs do not (yet) really understand the world (at least not in any sort of sense that corresponds to human understanding).

“What’s really remarkable about people, I think, is that we can abstract our concepts to new situations via analogy and metaphor.”

Melanie Mitchell

In a new paper posted online at arXiv.org, Mitchell and coauthor Martha Lewis of the University of Bristol in England show that LLMs still do not match humans in the ability to adapt a skill to new circumstances. Consider this letter-string problem: You start with abcd and the next string is abce. If you start with ijkl, what string should come next?

Humans almost always say the second string should end with m. And so do LLMs. They have, after all, been well trained on the English alphabet.

But suppose you pose the problem with a different “counterfactual” alphabet, perhaps the same letters in a different order, such as a u c d e f g h i j k l m n o p q r s t b v w x y z. Or use symbols instead of letters. Humans are still very good at solving letter-string problems. But LLMs usually fail. They cannot generalize the concepts learned on a familiar alphabet to an unfamiliar one.
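The rule humans grasp here — “replace the last letter with its successor” — can be stated once and applied to any alphabet. This sketch (my own reconstruction of the task, not Mitchell and Lewis’ code) shows what generalizing the concept looks like: reason over positions in whatever alphabet is given, rather than over a memorized a-to-z order.

```python
def successor(letter, alphabet):
    """Next letter in the given (possibly counterfactual) alphabet."""
    return alphabet[(alphabet.index(letter) + 1) % len(alphabet)]

def solve(source, alphabet):
    """Apply the abcd -> abce rule: bump the final letter to its successor."""
    return source[:-1] + successor(source[-1], alphabet)

standard = "abcdefghijklmnopqrstuvwxyz"
counterfactual = "aucdefghijklmnopqrstbvwxyz"  # the article's example: b and u swapped

print(solve("ijkl", standard))        # ijkm
print(solve("ijkl", counterfactual))  # ijkm (l's successor is still m here)
print(solve("rstb", counterfactual))  # rstv (in this alphabet, v follows b)
```

The point of the counterfactual test is that a solver like this one succeeds trivially, and humans succeed nearly as easily, because both apply the abstract rule; the GPT models tested degrade because they lean on patterns memorized from the standard alphabet.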

“While humans exhibit high performance on both the original and counterfactual problems, the performance of all GPT models we tested degrades on the counterfactual versions,” Mitchell and Lewis report in their paper.

Other similar tasks likewise show that LLMs cannot perform accurately in situations not encountered in their training. Therefore, Mitchell insists, they do not exhibit what humans would regard as “understanding” of the world.

“Being reliable and doing the right thing in a new situation is, in my mind, the core of what understanding actually means,” Mitchell said at the AAAS meeting.

Human understanding, she says, is based on “concepts” — basically mental models of things like categories, situations and events. Concepts allow people to infer cause and effect and to predict the probable results of different actions — even in circumstances not previously encountered.

“What’s really remarkable about people, I think, is that we can abstract our concepts to new situations via analogy and metaphor,” Mitchell said.

She does not deny that AI might someday reach a similar level of intelligent understanding. But machine understanding may turn out to be different from human understanding. Nobody knows what sort of technology might achieve that understanding and what the nature of such understanding might be.

If it does turn out to be anything like human understanding, it will probably not be based on LLMs.

After all, LLMs learn in the opposite direction from humans. LLMs start out learning language and attempt to abstract concepts. Human babies learn concepts first, and only later acquire the language to describe them.

So LLMs are doing it backward. In other words, perhaps reading the internet might not be the correct strategy for acquiring intelligence, artificial or otherwise.

Tom Siegfried is a contributing correspondent. He was editor in chief of Science News from 2007 to 2012 and managing editor from 2014 to 2017.