Why Small Language Models are the Future of AI (& Why You Should Care)

by - 30 April 2024

On April 23, Microsoft announced a new, freely available small language model, unassumingly named Phi-3-mini. It may be the most important development since ChatGPT opened the industry up to the public, starting the AI arms race of the last 18 months.

What are small language models and why is Phi-3 a big deal?

Small language models do not purport to hold the knowledge of the universe, but rather are trained to be much more specific tools. If GPT-4 is a million-volume encyclopaedia, Phi-3 is a library, curated around a specific subject. It is unlikely to pass the Turing test or sustain a casual conversation on a random subject, but it can perform as a well-trained specialist in its field. The announcement is a big deal because Phi-3 delivers the brainpower of the current free version of ChatGPT (3.5), whilst simultaneously being lightweight enough to run on a mobile phone.

Why do Sam Altman and I agree that the end of the large language model is nigh?

Sam wouldn’t go into detail as to why he thinks the era of enormous AI models is over, so let me offer a few reasons. 

It was all for show

The public spectacle of AI bots writing college papers and court filings, trying to seduce journalists and threatening to replace the working population was required to test the models, accelerate their learning curve and introduce a new product class to investors. In April 2024, a six-month-old startup that generates no revenue can be worth $2 billion. It may raise an eyebrow (or roll an eye), but when institutional investors and the general public read that the company aims to replace software engineers, it sounds plausible. Maybe. At some point. But plausible – a future that could happen, perhaps a few years from now. Or months – who knows?

The giant models are expensive and cannot stay free

Running a world encyclopaedia that answers a hundred million questions a day requires a lot of computational power as well as energy – and neither is free. For as long as ChatGPT performs a PR function, sustaining OpenAI’s valuation, it may be a sound investment, but finite resources cannot be burned indefinitely: giant models (if still around) will need to be monetised. Meta will plug its AI into its consumer products used by billions and produce (better) ads. Other providers may try subscriptions, but keeping even the current version of GPT online costs up to $20m/month. Playing with a bot is amusing, but how many would pay for it? Some businesses would – but for a productivity tool, not a trivia bot. I’m not even getting into the environmental and PR cost of running enormous server farms in a world running out of energy and water.

There is little (legal) training data

Language models are trained on data. Simply put, ChatGPT has read the Internet and can approximate responses based on what it has learned. Even with instructions on how to build a bomb or inject bleach to keep COVID away edited out, this poses three serious challenges.

1. Clever models require huge amounts of good data

For all the impressive demos, AI today is a very fancy calculator. It does not understand the meaning of the words (or images) it reads; it calculates the probability of certain letters (or pixels) coming in a certain order. Therefore, where a human needs to understand a thousand books to master a subject, AI needs to ingest a million. And they had better be good, because – again – AI does not reason; it accepts the data provided as truth.
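The “fancy calculator” point can be made concrete. At its core, a language model estimates the probability of the next token given the ones before it. Here is a minimal sketch of that statistical idea – a toy character-bigram model, which is far simpler than how GPT works internally, but illustrates why the quality of the training text entirely determines the output:

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each character, which characters tend to follow it."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def next_char_probs(counts, char):
    """Probability distribution over the next character, given the current one."""
    following = counts[char]
    total = sum(following.values())
    return {c: n / total for c, n in following.items()}

# Toy corpus: the model "learns" only letter-order statistics, not meaning.
model = train_bigram("the cat sat on the mat")
print(next_char_probs(model, "t"))  # 'h' and ' ' each follow 't' half the time here
```

Feed it a million good books and the statistics become eerily fluent; feed it errors and it will reproduce errors with the same confidence.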

2. There is not enough good data left

Much of what exists has already been fed into the models, but – as previously stated – superhuman capabilities require superhuman resources. There are still untapped sources, like books not publicly available online, but they come at a high cost, and it is not a given that writers and publishers would allow their works to be scanned, as it would clearly set both groups on a path to professional extinction. Giant AI models are simply running out of good books to read. They can synthesise data – write new books based on the books read so far, then read their own concoctions – but will a world-class mind get any smarter by reading its own posts?

3. Much of the good data used so far may have been used illegally

The jury is, quite literally, out on that one, but it’s a biggie. AI companies essentially claim that whatever is “publicly available” (i.e. available online) can be used for free to develop commercial products. I’m not a lawyer, but that sounds… novel. Artists, writers and some media outlets are suing for copyright violation. GPT’s education seemingly included YouTube videos, violating that platform’s terms of service. Any one of those disputes could end the giant AI model race, which is seemingly rooted in the idea that most knowledge is common and therefore belongs to humanity.

Why small language models are the future

Small models are cheaper to use

Running ChatGPT allegedly costs up to $700k/day; that money needs to come from somewhere. OpenAI (which runs GPT) charges businesses for custom model training and for answers generated by the system – essentially per token (roughly, per word) generated. This can quickly run up quite a tab.

For companies looking for models to use and build applications on, small language models will be much cheaper. Phi-3, for example, is free for both academic and commercial use. For all intents and purposes, it is just a piece of software that can be downloaded onto a phone. It takes under 2 gigabytes of space – about as much as a film.
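To see how per-token pricing runs up a tab, here is a back-of-the-envelope estimate. The prices below are illustrative assumptions for the sake of the arithmetic, not current list prices from any provider:

```python
# Illustrative, assumed prices (USD per 1,000 tokens) -- check a
# provider's current price list before relying on these numbers.
PRICE_PER_1K_INPUT = 0.01
PRICE_PER_1K_OUTPUT = 0.03

def monthly_api_cost(requests_per_day, input_tokens, output_tokens, days=30):
    """Estimate a month of API spend for a given traffic pattern."""
    per_request = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
                + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return requests_per_day * per_request * days

# A modest product: 50,000 requests/day, ~500 tokens in, ~500 tokens out.
print(f"${monthly_api_cost(50_000, 500, 500):,.2f}/month")  # → $30,000.00/month
```

A locally hosted small model trades that recurring per-token bill for a one-off hardware cost – which is the economic argument for small models in a nutshell.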

Small models are easier to run

Not every book needs to be an encyclopaedia. Cooking pancakes on a Saturday morning does not require the wealth of human knowledge, just a short list of ingredients and instructions. Universal intelligence is great for PR and companionship – and companionship is a very specific application some users may pay substantial recurring fees for. For those who want to use AI as a tool, or to create new businesses around AI capabilities, small models are the obvious choice. One does not need to buy a farm to have a supremely good breakfast.

Small models are easier to train

Smaller, more focused, models can be trained on industry-specific data that is: 

  • Relevant – the best, most up-to-date, from reliable sources, containing few errors
  • Observable – when the system does not inhale the world, it’s much easier to root out bad apples
  • Legal – the smaller volume and tighter specificity of the data make it possible to strike deals with a limited number of good sources, or to generate high-quality synthetic material

Smaller models can be more private

Since Phi-3 does not require a large server farm, companies can host it on-site and train it on their most sensitive data, without worrying about where those files would travel and who may have access to them. This is irrelevant to most of us – after all, we store photos in iCloud and run corporate email on Gmail – but for those requiring absolute privacy, small models present a very easy solution.

Smaller models can run on device

Since Phi-3 can run on a laptop, phone or even smart glasses, it opens up a whole new world of possible applications. I will get to those in a separate post, as this one is already too long, but – in very broad strokes – not having to connect to servers every time a user asks a question makes AI truly mobile, mindful of battery life and available offline. As anyone who has tried using Google Translate abroad in a place with no reception will know, that last one is a lifesaver.