Diffusion-based AI could rewrite how code and text are generated.

While most of Silicon Valley is racing to build larger and more expensive LLMs, one Stanford professor is taking a completely different path — and just raised $50 million to prove it works.
Meet Inception, a startup building diffusion-based AI models, a technology better known for powering image and video generators like Stable Diffusion, Midjourney, and Sora, but now being applied to text and code.
The company, led by Stanford’s Stefano Ermon, just secured a star-studded $50M seed round backed by Menlo Ventures, Microsoft’s M12, Snowflake Ventures, Databricks, NVentures (Nvidia), and AI pioneers Andrew Ng and Andrej Karpathy.
And it’s not just hype. Inception also unveiled an upgraded version of its flagship model, Mercury, designed specifically for software development. It’s already being integrated into popular developer tools like ProxyAI, Buildglare, and Kilo Code.
The key breakthrough? Mercury runs on diffusion, not auto-regression — and that makes all the difference.
While models like GPT-5 or Gemini generate output one token at a time (auto-regression), diffusion models refine their responses iteratively, adjusting the entire output with each step until it matches the surrounding context and the user's intent.
Think of it like painting: instead of sketching word by word, diffusion models fill in the whole canvas, polishing it over multiple passes.
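To make that contrast concrete, here is a toy sketch of the two decoding loops in Python. Everything in it is illustrative: the "model" calls are random stand-ins, not Inception's Mercury, and real diffusion LLMs use learned denoising steps rather than uniform resampling.

```python
# Toy contrast between the two decoding strategies. Both "models"
# here are random stand-ins, not Inception's Mercury: the point is
# the shape of each loop, not the quality of the output.
import random

VOCAB = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "+"]


def autoregressive_decode(length: int) -> list[str]:
    """Generate one token at a time, left to right.

    Each step depends on everything generated so far, so the steps
    cannot run in parallel: cost grows with output length.
    """
    tokens: list[str] = []
    for _ in range(length):
        next_token = random.choice(VOCAB)  # stand-in for model(prefix)
        tokens.append(next_token)
    return tokens


def diffusion_decode(length: int, steps: int = 4) -> list[str]:
    """Start fully masked and refine every position on every pass.

    Each pass updates all positions at once, so the work within a
    pass parallelizes across the sequence; total cost scales with
    the number of refinement passes, not the output length.
    """
    tokens = ["<mask>"] * length
    for _ in range(steps):
        # Stand-in for model(tokens): re-predict all positions jointly.
        tokens = [random.choice(VOCAB) for _ in tokens]
    return tokens


print("autoregressive:", " ".join(autoregressive_decode(10)))
print("diffusion:     ", " ".join(diffusion_decode(10)))
```

The shape of the loops is the whole story: the autoregressive cost grows with the length of the output, while the diffusion cost grows with the number of refinement passes.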
That structural change has massive implications for speed, cost, and scalability.
Ermon says Inception’s models process over 1,000 tokens per second — far beyond what’s achievable with traditional LLMs — because diffusion can run its operations in parallel. Where autoregressive models must emit tokens strictly one after another, a diffusion model updates every position in the sequence on each refinement pass, letting the hardware work on the whole output at once and drastically cutting latency.
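The arithmetic behind that claim is easy to sketch. The numbers below are hypothetical placeholders, not Inception's benchmarks; they only show why a fixed number of parallel refinement passes beats one sequential pass per token as outputs grow long.

```python
# Back-of-envelope latency math. Every number below is a made-up
# placeholder, not a Mercury benchmark; the point is how the two
# costs scale as outputs get long.
output_tokens = 1000       # length of the response we want
pass_time_ms = 20          # hypothetical wall-clock time per forward pass

# Autoregressive: one sequential forward pass per token.
ar_latency_s = output_tokens * pass_time_ms / 1000

# Diffusion: a fixed number of refinement passes, each one covering
# all token positions in parallel on the accelerator.
refinement_steps = 8
diff_latency_s = refinement_steps * pass_time_ms / 1000

print(f"autoregressive: {ar_latency_s:.1f}s for {output_tokens} tokens")
print(f"diffusion:      {diff_latency_s:.2f}s for the same output")
print(f"effective rate: {output_tokens / diff_latency_s:,.0f} tokens/sec")
```

Under this toy model, diffusion latency depends only on the number of refinement passes, not on how long the response is, provided the hardware can process the whole sequence in one pass. That is exactly the parallelism Ermon is pointing to.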
And in an era where AI companies are burning billions on compute, that efficiency could become a defining advantage.
As Inception explains, diffusion-based systems handle both text and structured data faster and at lower cost than their autoregressive counterparts, making them well suited to code generation, large-scale reasoning, and data-constrained environments.
Recent research featured on CMU’s ML Blog backs this up, showing that diffusion models can outperform autoregressive ones when data is limited or when parallelism matters most.
Inception’s founder, Ermon, puts it simply:
“These diffusion-based LLMs are much faster and much more efficient than what everybody else is building today. It’s just a completely different approach.”
The implications go beyond code. If diffusion models prove scalable for text, they could fundamentally reshape how we build AI systems — moving away from the “bigger is better” mindset and toward models that are smarter, faster, and lighter.
For now, Inception is focusing on developers — those building the future of software. But the technology could soon touch everything from content creation to scientific research.
And while everyone’s watching the trillion-dollar giants compete to train the largest LLMs, this small team of Stanford researchers might just have the one thing they don’t — a faster way forward.
Because sometimes, innovation isn’t about more.
It’s about different.