A 7M-parameter model has beaten models a thousand times its size on the ARC Prize benchmark. Recursive reasoning, where the model repeatedly calls itself during inference to deepen its thinking, is opening a path that doesn't rely on brute-force compute. This week, we reviewed YC partners Ankit Gupta and Francois Chaubard's breakdown of two related papers.
What this is
Standard LLMs have a fundamental ceiling on tasks requiring multi-step reasoning: each input gets a single fixed-depth forward pass, so thinking depth is capped by layer count. The core idea behind HRM (Hierarchical Reasoning Model) and TRM (Tiny Recursive Model) is to let the model repeatedly call itself during inference, trading time for depth. TRM trains the model to find a "fixed point": it iterates on its own answer until the output no longer changes. The result: a 7M-parameter model beats 7B-parameter-class rivals. A model doesn't have to be massive; it can just "think deeper."
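To make the fixed-point idea concrete, here is a minimal sketch. This is not the actual TRM architecture; the network shape, the `TinyRecursiveReasoner` name, and the convergence threshold are all illustrative assumptions. It only shows the core loop: a small shared block revises a draft answer until another pass barely changes it.

```python
import torch
import torch.nn as nn


class TinyRecursiveReasoner(nn.Module):
    """Illustrative sketch, not the paper's architecture: one small
    block applied repeatedly at inference until a fixed point."""

    def __init__(self, dim: int = 128):
        super().__init__()
        # A single small block whose weights are reused at every step.
        self.step = nn.Sequential(
            nn.Linear(dim * 2, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor, max_steps: int = 16,
                tol: float = 1e-4) -> torch.Tensor:
        z = torch.zeros_like(x)  # z is the current "draft answer"
        for _ in range(max_steps):
            # Each step sees the input AND the current draft, then revises it.
            z_next = self.step(torch.cat([x, z], dim=-1))
            # Fixed point reached: one more step barely changes the draft.
            if (z_next - z).abs().max().item() < tol:
                return z_next
            z = z_next
        return z


# More steps buy more "thinking depth" at the same parameter count.
model = TinyRecursiveReasoner()
answer = model(torch.randn(1, 128), max_steps=64)
```

The point of the sketch is the trade: parameter count stays fixed while `max_steps` controls how much computation, and therefore reasoning depth, each query receives.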
Industry view
This direction is becoming the new focal point beyond "scaling laws." Over the past two years, the industry consensus was that bigger is better, but compute costs and diminishing marginal returns are becoming impossible to ignore. Recursive reasoning offers an alternative narrative: instead of larger models, give models more computational depth during inference. That said, we see three main objections: inference latency increases, which hurts real-time applications; it is unverified whether benchmark results generalize to real business workloads; and it is unknown whether "large model + recursion" converges as reliably during training as "small model + recursion" does.
Impact on regular people
For enterprise IT: If small models + recursion prove out, the compute threshold for deploying AI could drop significantly; organizations won't necessarily need to buy the most expensive GPUs to run the largest models.
For individual careers: Reasoning capability is no longer exclusively tied to the most expensive models. Understanding "how to make the model think a few more steps" might be a better investment than deciding "which LLM to choose."
For the consumer market: More inference steps mean higher latency, so user experiences might slow down; products will need to strike a balance between "thinking deeper" and "replying faster" (a sketch of one such knob follows this list).
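One way a product could expose that balance is a simple per-request step budget. The sketch below builds on the hypothetical TinyRecursiveReasoner above; the budget values and function name are made-up assumptions, not anything from the papers.

```python
# Hypothetical latency/quality knob: cap recursion depth per request
# so "deep thinking" stays bounded. Budget values are illustrative.
FAST_BUDGET = 4    # snappy replies, shallower reasoning
DEEP_BUDGET = 64   # slower replies, more refinement steps


def answer_query(model, x, interactive: bool):
    # Interactive traffic gets the small budget; batch/offline jobs
    # can afford the deeper, slower setting.
    steps = FAST_BUDGET if interactive else DEEP_BUDGET
    return model(x, max_steps=steps)
```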