In the dizzying race to build generative AI systems, the tech industry's mantra has been that bigger is better, no matter the price tag.
Now, technology companies are starting to embrace smaller-scale AI technologies that are less powerful but much less expensive. And for many customers, this may be a good compromise.
On Tuesday, Microsoft launched three small AI models that are part of a suite of technologies the company has dubbed Phi-3. The company said that even the smallest of the three performed roughly as well as GPT-3.5, the much larger system that powered OpenAI's ChatGPT chatbot, which took the world by storm when it launched in late 2022.
The smallest Phi-3 model can fit on a smartphone and can be used even without an internet connection. It can also run on the kind of chips that power ordinary personal computers, rather than the more expensive processors made by Nvidia.
Because smaller models require less processing, large technology providers can charge customers less to use them. That means more customers can apply AI in situations where larger, more advanced models would be too costly. Microsoft said that using its new models would be “substantially cheaper” than using larger models like GPT-4, but did not provide specifics.
Smaller systems are less powerful, which means they can be less accurate or sound more awkward. But Microsoft and other tech companies are betting that customers will be willing to give up some performance if it means AI finally becomes affordable.
Customers imagine many different ways to use AI, but with the largest systems they often think, “Oh, but that can be expensive,” said Eric Boyd, a Microsoft executive. Smaller models, almost by definition, are cheaper to deploy, he said.
Mr. Boyd said that some customers, such as doctors or tax preparers, could justify the cost of larger, more accurate AI systems because their time was so valuable. But many tasks may not require the same level of accuracy. Online advertisers, for example, believe AI can help them target their ads more effectively, but they need lower costs before they can use the systems regularly.
“You want your doctor to get things right,” Mr. Boyd said. “In other situations, like summarizing online user reviews, if it’s a little bit off, it’s not the end of the world.”
Chatbots are powered by large language models, or LLMs, mathematical systems that spend weeks analyzing digital books, Wikipedia articles, news stories, chat logs and other text culled from the internet. By pinpointing patterns in all that text, they learn to generate text of their own.
But because LLMs store so much information, retrieving what is needed for each chat requires significant computing power. And that is expensive.
While tech giants and startups like OpenAI and Anthropic have focused on improving the largest AI systems, they are also competing to develop smaller models that offer lower prices. Meta and Google, for instance, have released smaller models over the past year.
Meta and Google have released these models as “open source,” meaning anyone can use and modify them free of charge. This is a common way for companies to enlist outside help in improving their software and to encourage the wider industry to use their technologies. Microsoft is open sourcing its new Phi-3 models, too.
(The New York Times sued OpenAI and Microsoft last December for copyright infringement on news content related to AI systems.)
After OpenAI launched ChatGPT, the company’s chief executive, Sam Altman, said the cost of each chat was “single-digit cents,” an enormous expense compared with popular web services like Wikipedia, which deliver pages for tiny fractions of a penny.
Now, researchers say their small models can at least approach the performance of leading chatbots like ChatGPT and Google Gemini. Essentially, the systems can still analyze large amounts of data, but they store the patterns they identify in a smaller package that can be served with less processing power.
Building these models involves a trade-off between power and size. Sébastien Bubeck, a researcher and vice president at Microsoft, said the company built its new, smaller models by refining the data fed into them, working to ensure that the models learned from higher-quality text.
Some of this text was generated by AI itself, what researchers call “synthetic data.” Human curators then worked to separate the sharpest text from the rest.
Microsoft has built three compact models: Phi-3-mini, Phi-3-small and Phi-3-medium. Phi-3-mini, which was released on Tuesday, is the smallest (and cheapest) but also the least powerful. Phi-3-medium, which has not yet been released, is the most powerful, but also the largest and most expensive.
Making the systems small enough to be used directly on a phone or personal computer “is going to make them a lot faster and a lot cheaper,” said Gil Luria, an analyst at the investment bank D.A. Davidson.