Why Small Models Win


The AI industry is obsessed with scale. Bigger models, more parameters, more compute. Every benchmark race is about who can throw more GPUs at the problem.

I think the real opportunity is the opposite.


The Scale Trap

Here's what most people miss: when you take a large model and shrink it for deployment (by distillation or pruning), the small model inherits the large model's vocabulary table — an embedding lookup designed for a model 100x its size. In some cases, more than half the small model's parameters are sitting in this vocabulary table. That's like renting a 1,000 square foot apartment and dedicating 600 square feet to a guest room nobody uses.

The smaller the model, the worse this ratio gets. Your tiny model has very few parameters actually doing reasoning — attention, processing, the things that make it useful. The rest is overhead from the big model it was copied from.
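The arithmetic is easy to check. Here's a back-of-envelope sketch — all dimensions (vocabulary size, hidden width, layer count) are illustrative assumptions, not measurements of any released model:

```python
# Rough parameter accounting for a small model that inherits a large
# model's vocabulary table. Numbers are illustrative assumptions.

def embedding_params(vocab_size: int, hidden_dim: int, tied: bool = True) -> int:
    """Parameters in the token embedding table (doubled if the output head is untied)."""
    table = vocab_size * hidden_dim
    return table if tied else 2 * table

def transformer_block_params(hidden_dim: int, ffn_mult: int = 4) -> int:
    """Approximate parameters per transformer block:
    attention (4*d^2 for Q, K, V, output) + feed-forward (2*ffn_mult*d^2)."""
    return (4 + 2 * ffn_mult) * hidden_dim * hidden_dim

# Hypothetical ~100M-parameter model with a 128k-token inherited vocabulary.
vocab, d, layers = 128_000, 512, 12
embed = embedding_params(vocab, d)            # 128_000 * 512 ≈ 65.5M
body = layers * transformer_block_params(d)   # 12 * 12 * 512^2 ≈ 37.7M
total = embed + body
print(f"embedding share: {embed / total:.0%}")  # ≈ 63% of all parameters
```

Under these assumed dimensions, roughly 63% of the model is vocabulary table — the 600-square-foot guest room.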

Nobody talks about this.


Domain-Specific Beats General-Purpose

A teacher in a small-town Indian school doesn't need a model that speaks 100 languages and knows everything about quantum physics. She needs a model that explains grade 5 math clearly in Hindi. A ₹15,000 device, not a ₹1,50,000 one.

The vocabulary a grade 5 student needs is roughly 5,000-10,000 words. Current small models carry vocabularies of 128,000-256,000 tokens. Most of those tokens will never appear in a single deployment context.

What if you built the small model for the domain instead of shrinking a general model? You'd have far more parameters available for actual reasoning at the same total cost. Early research suggests this could mean 3x more reasoning power at the same parameter count.
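The reallocation can be sketched the same way: hold the total parameter budget fixed, shrink the vocabulary to domain size, and see how many transformer layers the freed parameters buy. All numbers here are assumptions for illustration:

```python
# Fixed-budget reallocation sketch: same total parameters, smaller
# vocabulary, more layers. Dimensions are illustrative assumptions.

def params_per_layer(d: int, ffn_mult: int = 4) -> int:
    # attention (4*d^2) + feed-forward (2*ffn_mult*d^2)
    return (4 + 2 * ffn_mult) * d * d

def layers_within_budget(budget: int, vocab: int, d: int) -> int:
    """Layers that fit once the (tied) embedding table is paid for."""
    remaining = budget - vocab * d
    return remaining // params_per_layer(d)

d = 512
budget = 103_000_000  # assumed fixed ~103M-parameter budget
for vocab in (128_000, 8_000):
    n = layers_within_budget(budget, vocab, d)
    print(f"vocab {vocab:>7}: {n} layers of reasoning")
```

With these assumed dimensions, dropping from a 128k-token vocabulary to a domain-sized 8k one triples the layer count (11 → 31) at the same total size — the rough shape of the 3x claim above.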


The Real Bottleneck

The frontier isn't only about scale. It's about allocation.

At On Ground Labs, we run 14 active research projects with 23 researchers exploring this from multiple angles: models that separate reasoning from language grounding so you can swap languages without retraining the reasoning engine; models that decouple teaching strategy from subject knowledge so the intelligence is portable; and evaluation frameworks that test whether models actually understand what they extract, not just whether they can copy text accurately.

All of this under hard cost discipline. Our default is open-source models and free compute tiers. The point isn't to spend less — it's to prove that useful AI doesn't require a million-dollar compute budget.


Why This Matters for India

India has 800 million internet users. Most of them are on affordable devices with limited connectivity. The AI labs building trillion-parameter models aren't building for these users. They're building for data centers in Virginia.

The models that will actually reach the next billion users won't be the biggest. They'll be the smallest ones that still work. Compact enough to run on a phone. Specialized enough to be useful in one domain. Cheap enough that a school can afford the infrastructure.

That's not a compromise. That's a design choice. And I think it's the right one.


The Bet

I'm not saying big models are useless. GPT-4 is remarkable. But the future of AI deployment — the part where it actually reaches people — isn't about making models bigger. It's about making small models smarter.

4 papers in conference review. 3 patent domains filed. 2 books in progress — including Inference Intuition, which covers how these models actually work under the hood. All focused on this thesis: intelligence shouldn't require a data center.

I introduced this argument at Cypher 2025, wrote about why GPT-5 doesn't change the equation, and explained the research philosophy behind it earlier this year.

Read more about On Ground Labs →