A Chinese startup with a language model that outperforms Llama 2 and Falcon

The 34-billion-parameter large language model (LLM) from 01.AI, a Chinese startup founded by veteran AI expert and investor Kai-Fu Lee, outperforms two larger open-source counterparts: the 70-billion-parameter Llama 2 developed by Meta Platforms, Inc. and the 180-billion-parameter Falcon from the Technology Innovation Institute in Abu Dhabi.

The new artificial intelligence model, known as Yi-34B, supports both Chinese and English and can be fine-tuned for a range of use cases. The startup also provides a smaller version with 6 billion parameters, which scores lower on popular AI/ML benchmarks but still delivers respectable performance.

In due course, the company, which achieved unicorn status [a startup company valued at over US$1 billion] less than eight months after its founding, aims to build on these models and introduce a product that can compete with OpenAI, the industry leader in generative AI as measured by user count.

The approach draws attention to a global trend in which multinational corporations are creating generative AI models primarily for their own markets.

Human and AI

Lee established 01.AI in March with the aim of ushering in an AI 2.0 era in which large language models boost human productivity and enable people to make profound changes in the economy and society.

“The team behind 01.AI firmly believes that the new AI 2.0, driven by foundation model breakthrough, is revolutionizing technology, platforms, and applications at all levels. We predict that AI 2.0 will create a platform opportunity ten times larger than the mobile internet, rewriting all software and user interfaces. This trend will give rise to the next wave of AI-first applications and AI-empowered business models, fostering AI 2.0 innovations over time”, the company writes on its website.

Lee reportedly moved quickly to acquire the chips needed to train 01.AI’s Yi series of models and to assemble an AI team of specialists from Google, Huawei, and Microsoft Research Asia.

Alibaba’s cloud division and Sinovation Ventures, which Lee chairs, provided the majority of the project’s initial funding. The precise sum raised, though, has not been disclosed.

In its first public release, the company published two bilingual (English/Chinese) base models with 6B and 34B parameters. Both models were trained with a 4K sequence length, which can be extended to 32K at inference time. The models were later released with a 200K context length.

The base 34B model stood out on Hugging Face, outperforming considerably larger pre-trained base LLMs such as Llama 2-70B and Falcon-180B.

For instance, the 01.AI model scored 80.1 and 76.4 on benchmarks focused on reading comprehension and common-sense reasoning, while Llama 2 trailed with scores of 71.9 and 69.4. The Chinese model also scored higher on the massive multitask language understanding (MMLU) benchmark, with 76.3 compared to 68.9 and 70.4 for the Llama and Falcon models, respectively.
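To make the size of those gaps concrete, the short sketch below tabulates the scores quoted above and computes Yi-34B's lead on each benchmark. The dictionary keys and the `margins` helper are illustrative names chosen here, not anything published by 01.AI or Hugging Face.

```python
# Benchmark scores exactly as quoted in the article.
# The benchmark labels are shorthand chosen for this sketch.
SCORES = {
    "reading_comprehension": {"Yi-34B": 80.1, "Llama 2-70B": 71.9},
    "common_reasoning": {"Yi-34B": 76.4, "Llama 2-70B": 69.4},
    "mmlu": {"Yi-34B": 76.3, "Llama 2-70B": 68.9, "Falcon-180B": 70.4},
}


def margins(benchmark, reference="Yi-34B"):
    """Return how many points the reference model leads each rival by."""
    results = SCORES[benchmark]
    return {
        model: round(results[reference] - score, 1)
        for model, score in results.items()
        if model != reference
    }


if __name__ == "__main__":
    for name in SCORES:
        print(name, margins(name))
```

On these figures, the lead over Llama 2 is between 7 and roughly 8 points on each test, and the MMLU lead over Falcon is just under 6 points.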

Because a smaller model with higher performance requires less compute, end users may be able to fine-tune it and build apps for various use cases at a lower cost. The company states that all models in the current Yi series are open for academic research. Teams that need free commercial use, however, must first secure the necessary permission.
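A rough back-of-the-envelope sketch shows why parameter count matters for cost. It assumes weights are stored at 16-bit precision (2 bytes per parameter) and counts only the weights, ignoring activations, caches, and optimizer state; the article gives no such figures, so the numbers are illustrative only.

```python
# Assumption for this sketch: 16-bit weights, i.e. 2 bytes per parameter.
BYTES_PER_PARAM_FP16 = 2


def weight_memory_gb(params_billions, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    # 1 billion parameters at 1 byte each is 1 GB, so the units cancel.
    return params_billions * bytes_per_param


if __name__ == "__main__":
    # Parameter counts (in billions) as stated in the article.
    for name, size in [("Yi-6B", 6), ("Yi-34B", 34),
                       ("Llama 2-70B", 70), ("Falcon-180B", 180)]:
        print(f"{name}: ~{weight_memory_gb(size)} GB of weights")
```

Under that assumption, the 34B model needs about half the weight memory of a 70B model and less than a fifth of that of a 180B model.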

The next steps

The models that Lee’s startup currently offers are attractive options for international businesses that serve Chinese customers. They can be used to build chatbots that respond in both Chinese and English. The company intends to continue these efforts by expanding the open-source models’ language support. It also plans to introduce a larger commercial LLM to take on OpenAI’s GPT series; however, little information about that project has been made public so far.

Interestingly, 01.AI is not the only AI company building LLMs focused on particular languages and markets. The Chinese behemoth Baidu recently announced the release of its ERNIE 4.0 LLM and gave a sneak peek at a plethora of new apps designed to run on top of it, such as Qingduo, a creative platform meant to compete with Canva and Adobe Creative Cloud.

Similarly, the Korean internet giant Naver is releasing HyperCLOVA X, its next-generation LLM, which can understand not only natural Korean-language expressions but also laws, institutions, and cultural contexts pertinent to Korean society. This LLM has been trained on 6,500 times more Korean data than ChatGPT. In India, Reliance Industries and Nvidia are collaborating to develop a large language model that is suited to many applications and trained on the various languages spoken in the country.

The development of optimized large language models like Yi-34B by startups like 01.AI represents both the democratization and fragmentation of AI. On one hand, access to generative AI is diversifying beyond a few Western Big Tech companies. This allows smaller players to tailor solutions to their markets and languages, potentially increasing inclusion. However, the proliferation of localized models also presents interoperability challenges. As companies tune AI to their geographies, seamless communication and equitable access across countries may suffer. Ultimately, responsible governance is needed to balance innovation with coordination. But the arrival of startups like 01.AI signals generative AI’s transition from concentrated domination to a more decentralized, double-edged phenomenon.