Large language models such as OpenAI’s GPT-3 can display the ability to solve complex reasoning tasks which humans crack using analogies.
Researchers have presented GPT-3, first released in 2020, with tasks designed to probe its apparent capacity for reasoning by analogy. Analogy is a cornerstone of human reason that allows people to solve a novel problem by comparing it to one they already know.
Taylor Webb, a post-doctoral researcher at the University of California, Los Angeles, and his colleagues presented the popular LLM with text-based matrix reasoning problems, letter-string analogies, verbal analogies and story analogies, all of which can be solved by applying an established pattern to a new situation.
“We have presented an extensive evaluation of analogical reasoning in a state-of-the-art LLM. We found that GPT-3 appears to display an emergent ability to reason by analogy, matching or surpassing human performance across a wide range of text-based problem types,” their paper, published in Nature Human Behaviour today, said.
The question remains how the statistical model does it. Webb and his colleagues argue that one possibility is that the sheer size and diversity of GPT-3’s training data has forced it “to develop mechanisms similar to those thought to underlie human analogical reasoning — despite not being explicitly trained to do so.”
But while cognitive scientists tend to agree that humans reason by analogy using a “systematic comparison of knowledge based on explicit relational representations,” the researchers said they were unsure how GPT-3 would implement these processes.
“Does GPT-3 possess some form of emergent relational representations, and if so, how are they computed? Does it perform a mapping process similar to the type that plays a central role in cognitive theories of analogy?” the paper asked.
In the absence of a deeper understanding of how the model arrives at its answers, the researchers speculate that the ability may stem from its “transformer architecture,” which is common among LLMs and may share features with cognitive models of analogy.
“But although the mechanisms incorporated into LLMs such as GPT-3 may have some important links to building blocks of human reasoning, we must also entertain the possibility that this type of machine intelligence is fundamentally different from the human variety,” the paper said.
The authors also pointed out that GPT-3 had been trained on a huge corpus of human language, which itself is the output of human evolution and rich with analogy.
“Thus, to the extent that LLMs capture the analogical abilities of adult human reasoners, their capacity to do so is fundamentally parasitic on natural human intelligence,” the paper posited.
Since the launch of GPT-4 captured the public imagination with its ability to perform tasks such as writing poetry and computer code at a roughly human level, a debate has raged about whether LLMs can reason in the same way humans can. Meanwhile, it has been observed that the models can “hallucinate” information and make deductive errors, failings that are both very human and supremely unhelpful to those meatbags hoping to save time by using them.
Melanie Mitchell, a computer scientist at the Santa Fe Institute in New Mexico, and her team have found limits to the models’ ability to reason, using simple visual puzzles from a benchmark called ConceptARC. Humans score over 90 percent on the test, while GPT-4 scores just above 30 percent.
“We showed that the machines are still not able to get anywhere near the level of humans,” Mitchell told Nature magazine. “It was surprising that it could solve some of the problems, because it had never been trained on them,” she said.