I have a feeling that the next breakthroughs in AI won’t necessarily come from more capable models, but rather from much faster inference. This week, we’ve seen two interesting developments in that direction.

First, OpenAI announced GPT-5.6 Sol:

“We’re also launching GPT‑5.6 Sol on Cerebras at up to 750 tokens per second in July”

Second, Google released Nano Banana 2 Lite, bringing image generation time down to just a few seconds.

I suspect that even if model capabilities stayed around the level of Claude Opus 4.6+, this kind of inference speed would enable a lot of new use cases, much as video streaming only became practical once internet connections became fast enough.