Researchers discover simple functions at the core of complex language models
Large language models, such as those that power widely used artificial intelligence chatbots like ChatGPT, are extremely sophisticated. Although these models are used as tools in many areas, including language translation, code generation, and customer support, scientists still do not fully understand how they work.
To gain further insight into the inner workings of these enormous machine-learning models, researchers from MIT and other institutions studied the mechanisms at work when the models retrieve stored knowledge.
They came to a surprising conclusion: large language models (LLMs) often use a very simple linear function to retrieve and decode stored facts, and the model applies the same decoding function to facts of a similar type. Linear functions, equations with only two variables and no exponents, capture the straightforward, straight-line relationship between two variables.
By identifying linear functions for different facts, the researchers showed how they could probe a model to learn what it knows about new subjects, and where within the model that knowledge is stored.
Using a technique they developed to estimate these simple functions, the researchers found that even when a model answers a prompt incorrectly, it often still stores the correct information. In the future, scientists could use this approach to find and fix errors inside a model, which could reduce its tendency to occasionally give incorrect or nonsensical answers.
“Even though these models are really complicated, nonlinear functions that are trained on lots of data and are very hard to understand, there are sometimes really simple mechanisms working inside them. This is one instance of that,” says Evan Hernandez, an electrical engineering and computer science (EECS) graduate student and co-lead author of a paper detailing these findings.
Hernandez wrote the paper with co-lead author Arnab Sharma, a computer science graduate student at Northeastern University; his advisor, Jacob Andreas, an associate professor in EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); senior author David Bau, an assistant professor of computer science at Northeastern; and others at MIT, Harvard University, and the Israel Institute of Technology. The research will be presented at the International Conference on Learning Representations.
Finding facts
Most large language models, also called transformer models, are neural networks. Loosely based on the human brain, neural networks contain billions of interconnected nodes, or neurons, that encode and process data and are grouped into many layers.
Much of the knowledge stored in a transformer can be represented as relations that connect subjects and objects. For example, “Miles Davis plays the trumpet” is a relation that connects the subject, Miles Davis, to the object, trumpet.
As a transformer gains more knowledge, it stores additional facts about a subject across multiple layers. If a user asks about that subject, the model must decode the most relevant fact to respond to the query.
If someone prompts a transformer with “Miles Davis plays the…”, it should respond with “trumpet,” not “Illinois,” the state where Miles Davis was born.
“Somewhere in the network’s computation, there has to be a mechanism that goes and looks for the fact that Miles Davis plays the trumpet, and then pulls that information out and helps generate the next word. We wanted to understand what that mechanism was,” Hernandez says.
Through a series of studies, the researchers investigated LLMs and discovered that, despite their immense complexity, the models use a straightforward linear function to decode relational information. Every function is unique to the kind of fact that is being retrieved.
To output the instrument a person plays, for instance, the transformer would use one decoding function, while to output the state of a person’s birth, it would use a different function.
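To make the idea concrete, here is a minimal sketch in Python of what such a per-relation linear (affine) decoding function could look like. The dimensions, random matrices, and names (W_plays_instrument, subject_hidden, and so on) are illustrative placeholders, not the paper’s actual parameters or data.

```python
import numpy as np

# Toy illustration of per-relation linear decoding functions.
# All values are random stand-ins, not real model weights or activations.
rng = np.random.default_rng(0)
d_model = 8  # hidden size of a (toy) transformer

# One weight matrix and bias per relation, e.g. "plays instrument" ...
W_plays_instrument = rng.normal(size=(d_model, d_model))
b_plays_instrument = rng.normal(size=d_model)

# ... and a different estimated function for "state of birth".
W_state_of_birth = rng.normal(size=(d_model, d_model))
b_state_of_birth = rng.normal(size=d_model)

# Hidden vector for the subject (e.g. "Miles Davis"), as it would be read
# from an intermediate layer of the model.
subject_hidden = rng.normal(size=d_model)

# Decoding a fact is a single matrix multiply plus a bias: no exponents,
# no nonlinearity.
predicted_instrument = W_plays_instrument @ subject_hidden + b_plays_instrument
predicted_birth_state = W_state_of_birth @ subject_hidden + b_state_of_birth
print(predicted_instrument.shape, predicted_birth_state.shape)  # (8,) (8,)
```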
After developing a method to estimate these simple functions, the researchers computed functions for 47 different relations, such as “capital city of a country” and “lead singer of a band.”
While there could be an infinite number of possible relations, the researchers chose to study this subset because they are representative of the kinds of facts that can be written in this way.
They tested each function by changing the subject to see if it could recover the correct object information. For instance, the function for “capital city of a country” should return Oslo if the subject is Norway, and London if the subject is England.
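A hedged sketch of this kind of test, assuming synthetic stand-ins for the model’s hidden states and for the candidate objects’ embeddings: the estimated function is applied to each subject’s representation, and the decoded vector is matched to the nearest candidate by a simple dot-product score.

```python
import numpy as np

# Synthetic sketch of testing one relation function across several subjects.
# A real experiment would use actual model activations and vocabulary embeddings.
rng = np.random.default_rng(1)
d_model = 8

capitals = ["Oslo", "London", "Paris"]
capital_embeddings = rng.normal(size=(len(capitals), d_model))  # stand-in rows

# Estimated "capital city of a country" function (random placeholder here).
W_capital = rng.normal(size=(d_model, d_model))
b_capital = rng.normal(size=d_model)

def decode_capital(subject_hidden: np.ndarray) -> str:
    """Apply the linear function, then pick the closest candidate object."""
    predicted = W_capital @ subject_hidden + b_capital
    scores = capital_embeddings @ predicted
    return capitals[int(np.argmax(scores))]

# Swap in different subjects ("Norway", "England", ...) and check whether the
# decoded object matches the known answer; accuracy over many such subjects is
# the evaluation described above.
subjects = {"Norway": rng.normal(size=d_model), "England": rng.normal(size=d_model)}
for name, hidden in subjects.items():
    print(name, "->", decode_capital(hidden))
```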
Over 60% of the time, functions were able to extract the proper information, indicating that some information in a transformer is encoded and retrieved in this manner.
“But not everything is linearly encoded. For some facts, even though the model knows them and will predict text that is consistent with these facts, we can’t find linear functions for them. This suggests that the model is doing something more intricate to store that information,” he says.
Visualizing a model’s knowledge
They also used the functions to determine what a model believes is true about different subjects.
In one experiment, they started with the prompt “Bill Bradley was a” and used the decoding functions for “plays sports” and “attended university” to see whether the model knows that Sen. Bradley was a basketball player who attended Princeton.
“We can show that, even though the model may choose to focus on different information when it produces text, it does encode all that information,” Hernandez says.
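A minimal sketch of that kind of probe, assuming estimated functions for two relations and a hidden vector for the subject of the prompt are already available; all of the tensors and candidate lists below are hypothetical placeholders.

```python
import numpy as np

# Probing one subject with several relation functions (synthetic stand-ins).
rng = np.random.default_rng(2)
d_model = 8

subject_hidden = rng.normal(size=d_model)  # e.g. representation of "Bill Bradley"

# Hypothetical estimated (W, b) pairs for two different relations.
relations = {
    "plays sports": (rng.normal(size=(d_model, d_model)), rng.normal(size=d_model)),
    "attended university": (rng.normal(size=(d_model, d_model)), rng.normal(size=d_model)),
}

# Hypothetical candidate answers and their embeddings for each relation.
candidates = {
    "plays sports": (["basketball", "football"], rng.normal(size=(2, d_model))),
    "attended university": (["Princeton", "Oxford"], rng.normal(size=(2, d_model))),
}

# Apply every relation function to the same subject representation and read
# out the nearest candidate. With real activations, this would reveal which
# facts the model encodes about the subject, regardless of what it generates.
for name, (W, b) in relations.items():
    labels, embeds = candidates[name]
    predicted = W @ subject_hidden + b
    print(name, "->", labels[int(np.argmax(embeds @ predicted))])
```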
Using this probing technique, they produced what they call an “attribute lens,” a grid that visualizes where specific information about a particular relation is stored within the transformer’s many layers.
Attribute lenses can be generated automatically, providing a streamlined way for researchers to understand more about a model. This visualization tool could eventually help scientists and engineers correct stored knowledge and prevent an AI chatbot from giving false information.
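A rough sketch of how such a grid could be computed, assuming access to the hidden state at every layer and token position; the arrays below are synthetic placeholders, and the score in each cell is just a dot product between the decoded vector and the expected attribute’s embedding.

```python
import numpy as np

# Sketch of an attribute-lens-style grid over layers and token positions.
# Everything here is a synthetic stand-in for real model internals.
rng = np.random.default_rng(3)
d_model, n_layers, n_tokens = 8, 4, 6

# hidden_states[layer, token] is the model's hidden vector at that position.
hidden_states = rng.normal(size=(n_layers, n_tokens, d_model))

# Estimated linear function for one relation, e.g. "plays sports".
W_r = rng.normal(size=(d_model, d_model))
b_r = rng.normal(size=d_model)

# Embedding of the attribute we expect, e.g. "basketball".
expected_attribute = rng.normal(size=d_model)

# Decode every (layer, token) cell and score it against the expected attribute.
# Cells with high scores suggest where that piece of knowledge is represented.
decoded = hidden_states @ W_r.T + b_r        # shape (n_layers, n_tokens, d_model)
lens_grid = decoded @ expected_attribute     # shape (n_layers, n_tokens)
print(lens_grid.shape)
```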
In the future, Hernandez and his collaborators want to better understand what happens in cases where facts are not stored linearly. They would also like to run experiments with larger models, as well as study the precision of linear decoding functions.
“This is an exciting work that reveals a missing piece in our understanding of how large language models recall factual knowledge during inference. Previous work showed that LLMs build information-rich representations of given subjects, from which specific attributes are extracted during inference. This work shows that the complex nonlinear computation of LLMs for attribute extraction can be well-approximated with a simple linear function,” says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University, who was not involved with this work.
This research was supported, in part, by Open Philanthropy, the Israel Science Foundation, and an Azrieli Foundation Early Career Faculty Fellowship.
While this research provides valuable insights into how large language models encode and retrieve certain types of factual knowledge, it also highlights that there is still much to uncover about the inner workings of these extremely complex systems. The discovery of simple linear functions being used for some fact retrieval is an intriguing finding, but it seems to be just one piece of a highly intricate puzzle.
As the researchers noted, not all knowledge appears to be encoded and accessed via these linear mechanisms. There are likely more complex, nonlinear processes at play for other types of information storage and retrieval within these models. Additionally, the reasons why certain facts get decoded incorrectly, even when the right information is present, remain unclear.
Moving forward, further research is needed to fully map out the pathways and algorithms these language AIs use to process, store, and produce information. The “attribute lens” visualization could prove to be a valuable tool in this endeavor, allowing scientists to inspect different layers and factual representations within the models.
Ultimately, gaining a more complete understanding of how these large language models operate under the hood is crucial. As their capabilities and applications continue to expand rapidly, ensuring their reliability, safety, and alignment with intended behaviors will become increasingly important. Peering into their mechanistic black boxes through methods like this linear decoding analysis will be an essential part of that process.