Over the past few years, artificial intelligence models of language have become very good at certain tasks. Most notably, they are good at predicting the next word in a string of text; this technology helps search engines and messaging apps predict the next word you are going to type.

The latest generation of predictive language models also appears to learn something about the underlying meaning of language. These models can not only predict the next word, but also perform tasks that seem to require some degree of genuine understanding, such as answering questions, summarizing documents, and completing stories.

Such models are designed to optimize performance for the specific function of predicting text, without trying to mimic anything about how the human brain performs this task or understands language. But a new study from MIT neuroscientists suggests that the underlying function of these models resembles the function of language-processing centers in the human brain.

Computer models that perform well on other types of language tasks do not show this similarity to the human brain, providing evidence that the human brain may use next-word prediction to drive language processing.

“The better the model is at predicting the next word, the more closely it fits the human brain,” says Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience, a member of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds, and Machines (CBMM), and an author of the new study. “It’s amazing that the models fit so well, and it very indirectly suggests that maybe what the human language system is doing is predicting what’s going to happen next.”

Joshua Tenenbaum, a professor of computational cognitive science at MIT and a member of CBMM and MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and Evelina Fedorenko, the Frederick A. and Carole J. Middleton Career Development Associate Professor of Neuroscience and a member of the McGovern Institute, are the senior authors of the study, which was published this week in the Proceedings of the National Academy of Sciences. Martin Schrimpf, an MIT graduate student who works in CBMM, is the first author of the paper.

Making predictions

The new, high-performing next-word prediction models belong to a class of models called deep neural networks. These networks consist of computational “nodes” that form connections of varying strength, and layers that pass information between each other in prescribed ways.
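To make that structure concrete, here is a minimal sketch in Python, not one of the models from the study, of a network whose layers of nodes pass information forward through weighted connections. All sizes and weights here are arbitrary placeholders:

```python
# A minimal, illustrative deep-network sketch: layers of "nodes" joined by
# connections of varying strength, passing information forward in a
# prescribed way. This is not any of the models tested in the study.
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    """One layer: weighted sum over incoming connections, then a nonlinearity."""
    return np.maximum(0.0, inputs @ weights + biases)   # ReLU activation

# Arbitrary sizes: 8 inputs -> 16 hidden nodes -> 4 output nodes.
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)          # connection strengths
w2, b2 = rng.normal(size=(16, 4)), np.zeros(4)

x = rng.normal(size=(1, 8))        # one input vector
hidden = layer(x, w1, b1)          # activity of the hidden nodes
output = layer(hidden, w2, b2)     # activity of the output layer
print(hidden.shape, output.shape)  # (1, 16) (1, 4)
```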

Over the past decade, scientists have used deep neural networks to create models of vision that can recognize objects as well as the primate brain does. Research at MIT has also shown that the underlying function of visual object recognition models matches the organization of the primate visual cortex, even though those computer models were not specifically designed to mimic the brain.

In the new study, the MIT team used a similar approach to compare language-processing centers in the human brain with language-processing models. The researchers analyzed 43 different language models, including several that are optimized for next-word prediction. These include a model called GPT-3 (Generative Pre-trained Transformer 3), which, given a prompt, can generate text similar to what a human would produce. Other models were designed to perform different language tasks, such as filling in a blank in a sentence.
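As a rough illustration of the contrast between these two task types, the sketch below uses small, publicly available models (GPT-2 for next-word generation and BERT for fill-in-the-blank) through the Hugging Face transformers library. These are stand-ins chosen for exposition, not a claim about which models the study tested:

```python
# Illustrative contrast of next-word prediction vs. fill-in-the-blank,
# using small public stand-in models.
from transformers import pipeline

# Next-word-style prediction: generate a continuation of a prompt.
generator = pipeline("text-generation", model="gpt2")
print(generator("The brain processes language by", max_new_tokens=10))

# Fill-in-the-blank: predict a masked word anywhere in the sentence.
filler = pipeline("fill-mask", model="bert-base-uncased")
print(filler("The brain processes [MASK] in real time."))
```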

As each model was presented with a string of words, the researchers measured the activity of the nodes that make up the network. They then compared these patterns to activity in the human brain, measured in subjects performing three language tasks: listening to stories, reading sentences one at a time, and reading sentences in which one word is revealed at a time. These human datasets included functional magnetic resonance imaging (fMRI) data and intracranial electrocorticographic measurements taken in people undergoing brain surgery for epilepsy.
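A simplified sketch of this kind of comparison is shown below, with GPT-2 standing in for the tested models and random placeholder arrays standing in for the fMRI/electrocorticography recordings; the study's exact analysis pipeline is not reproduced here. The sketch extracts per-word activity from one layer and fits a regularized linear mapping to the placeholder "brain" data, in the style of standard encoding-model analyses:

```python
import numpy as np
import torch
from sklearn.linear_model import RidgeCV
from transformers import GPT2Model, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

# Present a string of words and record the network's internal activity.
inputs = tokenizer("The study compared models to brain recordings.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Per-token activity of the "nodes" in one layer (layer 6 of 12, arbitrarily).
layer_activity = out.hidden_states[6][0].numpy()      # shape: (n_tokens, 768)

# Placeholder "brain responses" for the same tokens; the real study used
# fMRI and electrocorticographic recordings, which are not reproduced here.
rng = np.random.default_rng(0)
brain_responses = rng.normal(size=(layer_activity.shape[0], 50))

# Fit a regularized linear mapping from model activity to brain activity,
# as in standard encoding-model analyses, and report the fit.
mapping = RidgeCV().fit(layer_activity, brain_responses)
print("R^2 on placeholder data:", mapping.score(layer_activity, brain_responses))
```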

They found that the best-performing next-word prediction models had activity patterns that closely resembled those seen in the human brain. Activity in these same models was also highly correlated with measures of human behavior, such as how quickly people were able to read the text.

“We found that the models that predict the neural responses well also tend to best predict human behavioral responses, in the form of reading times. Both of these are then explained by how well the model predicts the next word. This triangle really ties everything together,” Schrimpf says.
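One way to picture that “triangle” is to correlate three scores across models: next-word-prediction performance, neural predictivity, and behavioral (reading-time) predictivity. The sketch below uses entirely made-up numbers purely for illustration:

```python
# Illustrative rendering of the "triangle" with synthetic data: across
# hypothetical models, next-word-prediction skill, neural predictivity, and
# behavioral (reading-time) predictivity are all mutually correlated.
import numpy as np

rng = np.random.default_rng(1)
next_word_score = rng.uniform(0, 1, size=43)                      # one score per model
neural_fit      = 0.8 * next_word_score + rng.normal(0, 0.1, 43)  # synthetic
behavioral_fit  = 0.7 * next_word_score + rng.normal(0, 0.1, 43)  # synthetic

# 3x3 matrix of pairwise correlations among the three measures.
print(np.corrcoef([next_word_score, neural_fit, behavioral_fit]).round(2))
```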

“A key takeaway from this work is that language processing is a highly constrained problem: The best solutions to it that AI engineers have created end up being similar, as this paper shows, to the solutions found by the evolutionary process that created the human brain. Since the AI network didn’t seek to mimic the brain directly, but does end up looking brain-like, this suggests that, in a sense, a kind of convergent evolution has occurred between AI and nature,” says Daniel Yamins, an assistant professor of psychology and computer science at Stanford University, who was not involved in the study.

Game changer

One of the key computational features of predictive models such as GPT-3 is an element known as a forward one-way predictive transformer. This kind of transformer can predict what is going to come next based on the previous sequence. A significant feature of this transformer is that it can make predictions based on a very long prior context (hundreds of words), not just the last few words.
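The one-way aspect can be pictured with the causal attention mask commonly used in such transformers: each position may attend to itself and all earlier positions, never to later ones, over an arbitrarily long context. A minimal sketch (the mask size here is illustrative only):

```python
# Causal ("unidirectional") attention mask: True where attention is allowed.
# Each token can see itself and everything before it, nothing after it.
import numpy as np

def causal_mask(n_tokens):
    """Lower-triangular boolean mask over token positions."""
    return np.tril(np.ones((n_tokens, n_tokens), dtype=bool))

print(causal_mask(5).astype(int))
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
```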

Scientists haven’t found any brain circuits or learning mechanisms that correspond to this processing, Tenenbaum said. However, the new findings are consistent with the previously proposed hypothesis that prediction is one of the key functions of language processing, he said.

“One of the challenges of language processing is its real-time nature,” he says. “Language comes in, and you have to keep up with it and be able to understand it in real time.”

The researchers now plan to build variants of these language-processing models to see how small changes in their architecture affect their performance and their ability to fit human neural data.

“For me, this result is a game changer,” Fedorenko says. “It has completely changed my research program, because I did not expect that in my lifetime we would reach computationally explicit models that capture enough about the brain that we can actually use them to understand how the brain works.”

The researchers also plan to try to combine these high-performance language models with some previously developed computer models in Tenenbaum’s lab that can perform other types of tasks, such as building perceptual representations of the physical world.

“If we can understand what these language models do and how they can connect to models that do things more like perceiving and thinking, then that could give us more integrative models of how things work in the brain,” Tenenbaum says. “This could take us toward better artificial intelligence models, as well as better models of how more of the brain works and how general intelligence emerges, than we’ve had in the past.”

The study was funded by the Takeda Fellowship; the MIT Shoemaker Fellowship; the Semiconductor Research Corporation; the MIT Media Lab Consortia; the MIT Singleton Fellowship; the MIT Presidential Graduate Fellowship; the Friends of the McGovern Institute Fellowship; the MIT Center for Brains, Minds, and Machines (through the National Science Foundation); the National Institutes of Health; MIT’s Department of Brain and Cognitive Sciences; and the McGovern Institute.

The other authors of the paper are Idan Blank PhD ’16 and graduate students Greta Tuckute, Carina Kauf, and Eghbal Hosseini.