NetEase Youdao recently launched ThinkFlow, an enterprise-grade large model aggregation platform. According to 36kr, the platform lets enterprises connect once via a standard API and immediately access more than 20 mainstream large models — including DeepSeek, Kimi, Qwen, and MiniMax — while providing intelligent routing, load balancing, millisecond-level failover, circuit breaking and degradation, and a full-chain Token consumption dashboard that tracks costs down to the individual request.
The part worth paying attention to here is not the phrase "20-plus models."
What actually matters is that ThinkFlow explicitly frames the production, distribution, billing, and efficiency optimization of Tokens as a standardized enterprise problem.
This signals that the domestic model-calling layer is shifting from "connecting more models" to "managing more Token flows."
I haven't run ThinkFlow internally, and I don't know whether its routing policy is based on static rules, threshold triggers, or something more sophisticated like real-time evaluation. But from the public description alone, it has already moved beyond being just an API gateway — it's heading toward an AI control plane.
36kr noted: ThinkFlow allows enterprises to connect once and call more than 20 mainstream large models including DeepSeek, Kimi, Qwen, and MiniMax; business-side model switching requires no code rewrite; and call costs are tracked down to the individual request.
That sentence is short, but it packs in three distinct things: a standard interface, low switching costs, and measurable costs.
02 What This Actually Means

This is the real story ThinkFlow is telling: model capabilities are commoditizing, while Token scheduling capabilities are being productized.
Over the past year, most "model aggregation platform" narratives have stayed at the access layer — helping you connect more models, unifying authentication, adding a billing layer. The problem is that the moat at this layer is thin. Anyone can wrap an OpenAI-compatible API, and anyone can build a provider adapter. The real question is why an enterprise would commit its production traffic to your platform long-term.
The answer isn't "more models" — it's whether you can help the enterprise turn Tokens into an operable asset.
Three structural shifts are converging here.
First, supply fragmentation is now the norm, not a transitional state.
The models 36kr named — DeepSeek, Kimi, Qwen, MiniMax — already represent differentiated supply across capability, price, latency, context length, and stability. Enterprises won't commit to a single provider long-term. Not because of multi-cloud ideology, but because the unit economics and availability volatility of AI inference are too significant. A model that's cheap today may reprice tomorrow; a stable provider today may rate-limit tomorrow; a Sonnet-like model that wins on a task today may be displaced by cheaper open-ish supply tomorrow.
Second, what actually gets priced is not "how many models you've connected" but "the operational friction of switching between them."
The claim that business-side model switching requires no code rewrite sounds basic, but it's actually selling the inverse of switching cost: it reduces the user's lock-in to upstream models while increasing their dependency on the intermediate control plane. In other words, if ThinkFlow succeeds, it captures not a model moat but a workflow moat.
Third, per-request Token visibility is more critical than most people assume.
AI budgets have historically been treated as a line item under cloud costs. But once enterprises start routing different models to different pipelines — retrieval, summarization, classification, code generation, customer service, Agent orchestration — cost stops being a total-ledger problem and becomes a unit economics problem. You need to know which prompt template is burning money, which tenant is spiking abnormally, which feature's gross margin is being consumed by long context windows, and which model fallback is quietly trading SLA for higher cost.
The issue isn't whether enterprises lack a dashboard. It's that without one, they can't establish financial discipline around AI products.
I may be overestimating how mature domestic enterprises are when it comes to per-request cost attribution — many teams are probably still in the "ship first, reconcile later" phase. But once call volume scales up, this capability will shift from nice-to-have to a procurement requirement.
03 Historical Analogies and Structural Parallels

The closer historical reference isn't a large model launch — it's the rise of AWS usage governance tools around 2014.
Early cloud computing sold elasticity and ease of use. The real problem that emerged later was that resources were too easy to spin up, leading to runaway costs, permission sprawl, and architectural chaos. That's when CloudHealth, Datadog, and the broader ecosystem around Snowflake grew up — not because compute didn't matter, but because once the underlying resources were standardized, the enterprise bottleneck shifted to "how do you manage all of this."
Tokens are going through the same cycle today.
Early on, everyone cared about "can we access GPT-4-level capability." Then it became "can we connect to Claude, Gemini, DeepSeek, and Qwen simultaneously." The next competitive frontier will inevitably be "who can turn model calls into a system with budget boundaries, SLAs, routing policies, and audit trails."
There's also a partial analogy to the 2007 iPhone, though not at the device layer — it's about the migration of control. The iPhone redefined the application distribution entry point. What an AI gateway and control plane aim to redefine is the model supply entry point. Whoever controls that entry point has the opportunity to control default routing, fallback logic, caching strategy, and billing views — and ultimately, the developer's default choice set.
The aggregation theory here is straightforward: as upstream model supply grows and becomes increasingly substitutable, a middle layer that simultaneously aggregates demand, reduces switching costs, and provides cross-supply optimization can achieve stronger pricing leverage than any single upstream provider.
That position isn't inherently stable, of course. Hyperscalers, leading model vendors, and even open-source gateways can all push downward into this space. My read is that the real dividing line isn't "whether you aggregate" but "whether you go deep into business traffic and organizational processes." Pure traffic forwarding will eventually get commoditized; going deep into budgets, permissions, auditing, tenant isolation, caching, and policy orchestration is what creates staying power.
04 What This Means for AI Builders

If I were building an AI product, or leading a team with stable Token consumption, I'd be checking four things this week.
First, stop treating multi-model access as a feature — treat it as part of your financial system.
What you need isn't marketing copy that says "supports 20 models." You need a routing policy broken down by task type: which requests must go to a high-quality model, which can use a cheaper one, which allow fallback when latency exceeds a threshold, and which tenants have independent budget pools. These should all be configuration decisions, not hardcoded logic.
Second, start requiring per-request cost attribution.
At minimum, log these dimensions: model, input tokens, output tokens, latency, error rate, tenant, feature, and prompt version. Without this data, so-called prompt optimization is often an illusion. Many teams have lively discussions about prompt engineering but can't tell you which pipeline is the most expensive.
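The dimensions listed above fit in a single record type, and once they're logged, "which pipeline is the most expensive" becomes a one-line roll-up. A minimal sketch, assuming illustrative field names and made-up per-1K-token prices (not any vendor's real pricing):

```python
# Hypothetical sketch: the minimum per-request record described above,
# plus a roll-up answering "which pipeline is the most expensive?".
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class CallRecord:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    error: bool
    tenant: str
    feature: str          # the pipeline, e.g. "summarization"
    prompt_version: str

# Illustrative (input, output) USD prices per 1K tokens -- not real pricing.
PRICES = {"deepseek-chat": (0.0002, 0.0008), "qwen-plus": (0.0005, 0.0015)}

def cost(r: CallRecord) -> float:
    """Compute the USD cost of one request from its token counts."""
    p_in, p_out = PRICES[r.model]
    return r.input_tokens / 1000 * p_in + r.output_tokens / 1000 * p_out

def cost_by_feature(records) -> dict:
    """Aggregate spend per pipeline (feature) across logged requests."""
    totals = defaultdict(float)
    for r in records:
        totals[r.feature] += cost(r)
    return dict(totals)
```

The same `CallRecord` supports every other question in the section — spend per tenant, per prompt version, per model — just by changing the grouping key.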
Third, evaluate whether you should front-load model-layer replaceability into your architecture design.
One of the values 36kr attributed to ThinkFlow is that business-side model switching requires no code rewrite. This is a reminder to builders: your application architecture shouldn't be directly coupled to a specific provider's proprietary interface. You can leverage provider-specific features, but keep the abstraction layer in your own hands. Otherwise, the development time you save today will come back multiplied in future migrations, pricing negotiations, and risk management.
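"Keep the abstraction layer in your own hands" has a concrete shape: application code targets a small interface you own, and provider-specific details live in adapters behind it. A minimal sketch, with all class and function names hypothetical:

```python
# Hypothetical sketch: an application-owned interface with provider adapters.
# Real adapters would call vendor SDKs; these are stubs for illustration.
from typing import Protocol

class ChatModel(Protocol):
    """The interface the application owns; adapters conform to it."""
    def complete(self, prompt: str) -> str: ...

class ProviderAAdapter:
    def complete(self, prompt: str) -> str:
        # A real implementation would invoke provider A's SDK here.
        return f"[provider-a] {prompt}"

class ProviderBAdapter:
    def complete(self, prompt: str) -> str:
        return f"[provider-b] {prompt}"

def summarize(model: ChatModel, text: str) -> str:
    # Business logic depends only on the owned interface, so swapping
    # providers is a wiring change, not a code rewrite.
    return model.complete(f"Summarize: {text}")
```

Whether the adapter layer is built in-house or delegated to a platform like ThinkFlow, the design choice is the same: the seam belongs to you, not to any one vendor.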
Fourth, rethink how you evaluate AI gateway procurement.
Many teams have historically treated these platforms as channels, with "is it cheap" as the primary criterion. That standard is too shallow. The better questions are: does it support prompt caching, batch API orchestration, budget alerts, cross-model routing, tenant-level quotas, audit logs, failure replay, observability interfaces, and compatibility potential with MCP and Agent runtimes? Price differences are surface-level — control plane capability determines whether you can actually scale.
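To make one checklist item concrete: "budget alerts" and "tenant-level quotas" together reduce to a small policy function evaluated on every request. A minimal sketch under assumed semantics (warn at a ratio of quota, block at the cap); the threshold and return values are illustrative:

```python
# Hypothetical sketch of a tenant quota check with a budget-alert threshold.
# The 80% warning ratio and the ok/alert/block states are assumptions.

def check_budget(spent_usd: float, quota_usd: float,
                 alert_ratio: float = 0.8) -> str:
    """Return 'ok', 'alert' (past the warning threshold), or
    'block' (at or over the tenant's quota)."""
    if spent_usd >= quota_usd:
        return "block"
    if spent_usd >= quota_usd * alert_ratio:
        return "alert"
    return "ok"
```

A control plane worth paying for runs checks like this per tenant on the hot path, not in a monthly reconciliation spreadsheet.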
If I were an independent founder, I'd even flip this around and use platforms like this for arbitrage: delay deep commitment to any single model provider for as long as possible, and decouple your product iteration cycle from the upstream model iteration cycle. That way you're selling outcomes, not reselling a specific model vendor.
I haven't seen public details on ThinkFlow's deeper capabilities — prompt caching, semantic routing, quality evaluation, policy engine — so I can't yet call it a complete AI control plane. But the direction is right: treating Tokens as enterprise assets to be managed, not as API calls to be loosely tracked.
05 Counterarguments and Risks

The strongest counterargument is: this type of platform may be nothing more than a gateway shell that gets commoditized quickly.
There are four reasons for this.
First, standard API integration and model switching have a low barrier to entry. OpenAI-compatible has become the de facto standard, and building an adapter layer in-house isn't difficult. Without meaningful performance optimization, cache hit rates, failure rate control, or procurement discounts, a middle layer is easy to dismiss as "just another hop."
Second, leading cloud vendors and model providers can build this themselves. Alibaba Cloud, Volcengine, Tencent Cloud, and even upstream model labs all have the capability to bundle multi-model calling, monitoring, billing, and routing into their own platforms. If an enterprise is already on one of these clouds, the decision friction of adopting an external control plane on top won't be small.
Third, enterprises may not actually need "20-plus models." In practice, most workloads stabilize around two to four primary models: one high-quality, one low-cost, one for embeddings, one as a backup. Model selection looks complex, but the real complexity is in the workflow itself. If the middle layer can't go deep into workflow orchestration and stays at the provider routing level, its value will be overestimated.
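The two-to-four-model pattern above is often implemented as nothing more than an ordered fallback chain, which is part of why a thin routing layer is easy to replicate in-house. A minimal sketch, with stubbed provider callables standing in for real clients:

```python
# Hypothetical sketch: an ordered fallback chain over a small set of models
# (e.g. quality tier, cheap tier, backup). Provider calls are stubs.

def call_with_fallback(prompt: str, providers: list) -> str:
    """Try each provider callable in order; raise only if all fail."""
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except RuntimeError as exc:  # stand-in for provider/transport errors
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

If this loop is all the middle layer does, it is indeed "just another hop" — which is exactly the commoditization risk this section describes.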
Fourth, visibility is not the same as optimization. Tracking costs down to the individual request is important, but visibility is just the starting point. The hard part is whether you can automatically generate recommendations, enforce constraints, and close the loop based on that data. Otherwise, the dashboard becomes a place where teams acknowledge problems exist but take no action.
So I may be wrong to read ThinkFlow as something ahead of its time. It may simply be a standardized packaging of enterprise procurement needs from NetEase Youdao — not a pivot-point product capable of reshaping the domestic AI access layer.
But even so, this product path deserves attention. Because what it reveals isn't the scale of NetEase Youdao's ambitions — it's the migration of market demand itself: enterprises are no longer just asking "do we have models available," they're asking "who manages our Token flow."
And once the question becomes that, the battlefield shifts from model capability to control plane capability. That migration is usually worth watching more closely than any single model launch.