The Trillion-Parameter Trap
At Cypher 2025 in Bangalore, I walked on stage in front of a few thousand AI practitioners and told them the arms race is over.
For three years, the AI narrative has been: scale. More parameters, more GPU hours, more data. The implicit promise was that if we just made models big enough, intelligence would emerge. Trillion-parameter models became the status symbol. Every company wanted one. Few asked whether they needed one.
Here's what actually happens in the real world. A company wants to automate invoice processing. They don't need a model that knows the history of the 14th-century spice trade. They need a model that understands their invoice format, runs fast, and doesn't cost $0.03 per API call at scale. The trillion-parameter model isn't a solution for them. It's a bottleneck — slow, expensive, and impossible to run on-premise for data-sensitive industries.
The Small Model Shift
The interesting work in 2025 isn't happening in massive data centers. It's happening on devices.
Microsoft's Phi-4. Google's Gemma 3 — specifically the 270M parameter version. These models are doing things we thought required 100x their size a year ago. At Cypher, I showed Gemma 3 running on a smartphone with 0.5GB of RAM. Not a demo on a cloud server pretending to be a phone. On the actual phone.
This matters more than any benchmark improvement on GPT-5. We're moving from AI-as-a-service — where every inference is a toll paid to a cloud provider — to models that run locally, privately, with zero latency and zero ongoing cost. That's not an incremental improvement. That's a different architecture entirely.
Why This Is India's Problem to Solve
At On Ground Labs, this is what we work on. Small language models for real deployment. India is the right place to build this, for three specific reasons.
Cost. 1.4 billion people, most of whom will never pay $20/month for an AI subscription. SLMs with near-zero inference cost are the only way AI reaches the next billion users. Not cheaper API calls — no API calls.
Connectivity. Large parts of India still have unreliable internet. A model that requires a round-trip to a US data center is useless in rural Jharkhand. A model that runs on-device works everywhere.
Privacy. Healthcare records, financial data, government documents — these can't leave the device for regulatory and trust reasons. On-device inference isn't a nice-to-have. For many Indian use cases, it's the only option.
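To make the cost argument concrete, here's a back-of-the-envelope comparison. Every number below is an illustrative assumption (the $0.03 per call comes from the invoice example above; the user and usage counts are made up for the sketch), not a measured figure:

```python
# Back-of-the-envelope: cloud API billing vs on-device inference.
# All inputs are illustrative assumptions, not measured data.
# Work in integer cents to avoid floating-point rounding.

cost_per_call_cents = 3          # $0.03 per API call (assumed)
calls_per_user_per_day = 20      # assumed usage
users = 1_000_000                # assumed user base
days = 365

cloud_annual_cents = cost_per_call_cents * calls_per_user_per_day * users * days
cloud_annual_usd = cloud_annual_cents / 100

# After a one-time deployment, on-device inference has zero
# marginal cost per call.
on_device_annual_usd = 0.0

print(f"Cloud API, 1M users, 1 year: ${cloud_annual_usd:,.0f}")
# → Cloud API, 1M users, 1 year: $219,000,000
print(f"On-device, 1M users, 1 year: ${on_device_annual_usd:,.0f}")
# → On-device, 1M users, 1 year: $0
```

The point isn't the exact figure; it's that per-call pricing scales linearly with usage, while on-device cost is flat after deployment.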
Compound Systems, Not Monolithic Brains
If you're a CTO planning your AI strategy, stop asking "which foundation model should we use?" Start asking "how do I compose multiple specialized models into a system?"
The architecture that works: a tiny model on-device for UI interactions. A specialized model for database queries. A medium model for reasoning and planning. Each model good at one thing. The orchestration layer — built by your engineers — is what makes them a team.
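A minimal sketch of what such an orchestration layer can look like. The model names, routing keywords, and handler functions here are all hypothetical placeholders (not OGL's actual stack); in a real system each handler would wrap an on-device or self-hosted inference call, and the keyword router would likely be replaced by a small classifier model:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Route:
    name: str
    handler: Callable[[str], str]

# Placeholder "models" standing in for real inference calls.
def ui_model(prompt: str) -> str:
    return f"[tiny on-device model] {prompt}"

def query_model(prompt: str) -> str:
    return f"[query-specialist model] {prompt}"

def planner_model(prompt: str) -> str:
    return f"[medium reasoning model] {prompt}"

ROUTES: Dict[str, Route] = {
    "ui": Route("ui", ui_model),
    "query": Route("query", query_model),
    "plan": Route("plan", planner_model),
}

def classify(request: str) -> str:
    """Naive keyword router; a production system might use a
    small classifier model here instead."""
    text = request.lower()
    if any(k in text for k in ("select", "table", "rows")):
        return "query"
    if any(k in text for k in ("plan", "steps", "schedule")):
        return "plan"
    return "ui"

def orchestrate(request: str) -> str:
    route = ROUTES[classify(request)]
    return route.handler(request)

print(orchestrate("How many rows are in the invoices table?"))
# routed to the query-specialist model
```

The design choice worth noting: the routing logic is plain application code your engineers own and can test, while each model stays small and single-purpose behind a uniform interface.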
This is what we're building at On Ground Labs. Not one giant model. 14 projects across small language models, Indian language benchmarks, and agentic systems. 23 researchers. All focused on models that actually deploy — on phones, in hospitals, in banks.
The Real Moat
The trillion-parameter model is becoming a commodity. Every major lab has one. They'll keep getting cheaper. Competing on model size is like competing on CPU clock speed in 2010 — the game already moved.
The moat is in efficiency. In domain specialization. In building systems that work where your users actually are, not where your data center is.
The tools to build this are already small enough to fit in your pocket. Stop waiting for the next giant model. Build for the edge.
I've written elsewhere about why OGL exists and how we think about deployed research. For a deeper dive into the SLM thesis, including the vocabulary table problem nobody talks about, see Why Small Models Win. For why GPT-5 doesn't change this equation: benchmarks aren't deployment. And on the investment side, Big Tech is pouring $70 billion into India because the money is following the same logic.