Two new neural network architectures could make AI models more flexible and effective, and they may change how AI systems develop and learn.
Two AI research teams have announced breakthroughs that could change how AI works. Google and the Japanese AI company Sakana have created new ways to build AI that might work better than current approaches. Right now, most AI (including ChatGPT) is built on something called transformers. Think of a transformer as a reader who compares every word with every other word to understand the meaning. While this works well, it has some limitations.
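To make the “every word looks at every other word” idea concrete, here is a tiny, self-contained sketch of the scaled dot-product attention at the heart of a transformer. It is written in plain NumPy with toy sizes and illustrates the general mechanism, not any particular model’s code:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every token's query is compared
    against every other token's key, producing a weighted mix of values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (tokens, tokens): all-pairs comparison
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the whole sequence
    return weights @ V                              # each token blends information from every token

# Toy example: 4 "words", each represented by an 8-dimensional vector
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out = attention(tokens, tokens, tokens)             # self-attention: Q, K, V from the same tokens
print(out.shape)                                    # (4, 8)
```

Because every word is compared with every other word, the cost of this step grows quadratically with the length of the text, which is one of the limitations mentioned above.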
Both Google and Sakana looked at how the human brain works to create better AI systems. Instead of having the entire AI system work on every task, they built systems that can activate different parts for different jobs, much as your brain uses different regions for different tasks. The result? Their AIs can be smarter and faster without needing to be bigger or more expensive.
This is important because, until now, making AI better has usually meant making it bigger. Here is what they created. Google’s “Titans”: imagine the different kinds of memory in your brain, one for things that just happened (like what you had for breakfast) and one for long-term knowledge (like how to ride a bike). Current AI is great at short-term memory but bad at long-term memory. Google’s Titans addresses this by combining three types of memory:
- Short-term memory (like current AI)
- Long-term memory (for remembering things over time)
- Persistent memory (for storing specific knowledge)
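The real design spells out precisely how these memories are wired together and updated; the sketch below is only a rough, hypothetical illustration in PyTorch (toy sizes, made-up class and method names, not Google’s code). It assumes, as the Titans paper describes, that the long-term memory keeps being updated while the model reads, while the attention weights stay fixed:

```python
import torch
import torch.nn as nn

class TitansStyleBlock(nn.Module):
    """Illustrative only: three memories working together in the spirit of Titans."""
    def __init__(self, dim=64, n_persistent=4, n_heads=4):
        super().__init__()
        # Persistent memory: learnable tokens holding fixed knowledge
        self.persistent = nn.Parameter(torch.randn(n_persistent, dim) * 0.02)
        # Long-term memory: a small network that keeps learning at test time
        self.long_term = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        # Short-term memory: ordinary self-attention over the current chunk
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, chunk):
        # chunk: (batch, chunk_len, dim)
        b = chunk.shape[0]
        recalled = self.long_term(chunk)                       # read from long-term memory
        persistent = self.persistent.unsqueeze(0).expand(b, -1, -1)
        context = torch.cat([persistent, recalled, chunk], dim=1)
        out, _ = self.attn(chunk, context, context)            # attend over all three memories
        return out

    def memorize(self, chunk, lr=1e-2):
        """Test-time update: nudge the long-term memory to reconstruct what it just saw
        (a simple stand-in for the update rule described in the paper)."""
        pred = self.long_term(chunk.detach())
        loss = ((pred - chunk.detach()) ** 2).mean()
        grads = torch.autograd.grad(loss, list(self.long_term.parameters()))
        with torch.no_grad():
            for p, g in zip(self.long_term.parameters(), grads):
                p -= lr * g  # only the memory module changes; attention weights stay fixed

# Usage: stream a long document chunk by chunk, memorizing as we go
block = TitansStyleBlock()
stream = torch.randn(1, 3 * 16, 64)        # pretend this is a long document
for chunk in stream.split(16, dim=1):
    out = block(chunk)                      # read using all three memories
    block.memorize(chunk)                   # write new information into long-term memory
print(out.shape)                            # torch.Size([1, 16, 64])
```

The point the sketch tries to capture is that new information is written into a separate long-term module as the model reads, instead of everything having to fit inside the attention window.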
Together, these memories mean Titans can handle much more information at once. While current AI might struggle to remember something from the beginning of a long conversation, Titans can handle text that’s about 40 times longer!

Sakana’s “Transformer Squared” works like switching between different skills without having to relearn everything. When you switch from cooking to playing basketball, your brain adjusts automatically. Sakana’s system works similarly:
1. It first figures out what kind of task it’s dealing with.
2. It then activates the specific “expert” parts needed for that task.
3. It adjusts itself in real time, without needing to be retrained.
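To make that two-pass flow concrete, here is a schematic sketch in plain Python. The task names, the keyword-based classify_task stand-in, and the expert vectors are all made up for illustration; in Sakana’s described system, the model’s own first pass identifies the task:

```python
# Schematic two-pass inference in the spirit of Sakana's Transformer Squared.
# Everything here (task names, classify_task, expert vectors) is illustrative.

EXPERTS = {
    # One small "expert vector" per skill, learned ahead of time (placeholder values).
    "math":   [1.10, 0.95, 1.30],
    "coding": [0.90, 1.20, 1.05],
    "other":  [1.00, 1.00, 1.00],
}

def classify_task(prompt: str) -> str:
    """Pass 1: look at the prompt and decide which kind of task it is.
    (A keyword check stands in for the model's own first pass.)"""
    text = prompt.lower()
    if any(w in text for w in ("integral", "solve", "equation")):
        return "math"
    if any(w in text for w in ("function", "bug", "python")):
        return "coding"
    return "other"

def answer(prompt: str) -> str:
    task = classify_task(prompt)       # pass 1: identify the task
    expert = EXPERTS[task]             # pick the matching expert vector
    # Pass 2: re-run the model with the expert vector applied to its weights.
    # (The singular-value illustration further below shows one way "applied" could work.)
    return f"[answered as a {task} expert using vector {expert}]"

print(answer("Solve the equation x^2 = 9"))
print(answer("Why does this Python function crash?"))
```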
The clever part is that it only updates the parts it needs to, rather than changing everything. While it might take a bit longer to think (like how you might pause before switching tasks), it’s much more efficient overall.

This is important because, currently, AI companies often compete by creating bigger and bigger models. It’s like saying, “My AI is better because it has more brain cells.” However, these new approaches suggest that smarter organization might be more important than size. Today’s AI systems often need extra tools, such as retrieval-augmented generation (RAG) or Low-Rank Adaptation (LoRA), to enhance their capabilities. But if these new approaches prove successful, we might see a fundamental shift in how AI is built. In the fast-moving world of AI, it often takes just one breakthrough to change everything, and either of these innovations could be that breakthrough. What’s particularly exciting is that these improvements don’t require massive increases in computing power or cost. Instead, they work smarter, not harder, just like our brains do.
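To give a feel for how “updating only the parts it needs” can work in practice, here is a generic NumPy illustration: a frozen weight matrix is decomposed once, and a small task-specific vector then rescales only its singular values, leaving everything else untouched. The matrix sizes and scaling values are made up, and this shows the general mechanism rather than Sakana’s published code:

```python
import numpy as np

rng = np.random.default_rng(1)

# A frozen weight matrix from a pretrained model (toy size).
W = rng.normal(size=(6, 4))

# Decompose it once, offline. U, s, Vt are kept fixed afterwards.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# A per-task "expert" is just one small vector of scaling factors,
# one per singular value, which is far fewer numbers than the full matrix.
expert_scale = np.array([1.2, 0.8, 1.0, 1.05])   # illustrative values

# Adapting to the task = rescaling the singular values; nothing else changes.
W_adapted = U @ np.diag(s * expert_scale) @ Vt

print("parameters in W:         ", W.size)              # 24
print("parameters in the expert:", expert_scale.size)   # 4
print("max change in W:         ", np.abs(W_adapted - W).max())
```

The comparison at the end is the point: an “expert” can be a handful of numbers rather than a full extra copy of the model’s weights.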
This shift could democratize AI development, making it more accessible to smaller companies and researchers who lack the massive computational resources currently required for state-of-the-art AI systems. It could also address one of AI’s biggest environmental concerns: the enormous energy consumption required for training and running large models.
Moreover, these architectures’ ability to handle longer contexts and adapt to new tasks more efficiently could open up entirely new applications for AI, for example, more personalized AI assistants that learn and adapt to individual users’ needs.