The 36Kr article is blunt: in 2026, Chinese AI video generation tools and Agent products are commercializing aggressively, but they share a common fear: being quietly absorbed by the upstream model providers.
Several concrete data points from the piece are worth remembering: ByteDance's Seedance and Kuaishou's Kling are iterating at high frequency; Alibaba soft-launched HappyHorse 1.0 in late April, with a list price of ¥0.9 per second for 720P video generation; industry insiders say top companies are burning over one million RMB per month on compute, with a single short drama costing roughly ¥30,000 in inference; Creati crossed ten million global users within a year of launch, with ARR briefly reaching $20 million; LiblibAI closed a $130 million Series B in October last year.
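For a sense of scale, two of those figures can be combined in a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the ¥30,000 short-drama budget is spent entirely at HappyHorse's ¥0.9-per-second list price. The article does not say which models or discounts were actually involved, and `billable_seconds` is a hypothetical helper, not anything from the piece.

```python
# Figures quoted in the article.
LIST_PRICE_CNY_PER_SECOND = 0.9      # HappyHorse 1.0 list price, 720P
DRAMA_INFERENCE_BUDGET_CNY = 30_000  # rough inference cost of one short drama


def billable_seconds(budget_cny: float, price_per_second: float) -> float:
    """Seconds of generated video a budget buys at a flat per-second price."""
    if price_per_second <= 0:
        raise ValueError("price_per_second must be positive")
    return budget_cny / price_per_second


# Assumption: the whole budget is spent at list price on a single model.
seconds = billable_seconds(DRAMA_INFERENCE_BUDGET_CNY, LIST_PRICE_CNY_PER_SECOND)
hours = seconds / 3600

print(f"{seconds:,.0f} seconds, about {hours:.1f} hours of generated footage")
```

Under that assumption the budget buys on the order of nine hours of raw generation. If the finished drama is much shorter than that, most of the spend is going to retakes and discarded footage; the real ratio depends on the model mix and discounts, which the article does not disclose.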
This is not another "hot new sector" piece.
The core fact of this article is that application-layer revenue has been proven to exist, but profit, pricing power, and the right to survive are still held by the upstream model providers.
I have not run a P&L inside any of these companies, nor have I seen real cohort data, so my read on individual revenue and cost structures is structural judgment only. Media summaries are not audit reports.
The big platforms will move into this space, just not tomorrow.
That sentence is arguably the most important in the entire piece.
02 What This Actually Means
On the surface, this looks like a question of whether AI video Agents can make money before the big platforms crush them.
The real question is not whether they can make money, but where that money is actually coming from.
If revenue is primarily driven by three things (better packaging, cheaper model access, and aggressive paid acquisition), that is not a moat. That is window-period profit.
In other words, what these companies are selling today is not video generation capability itself. It is four layers of intermediate value:
First, orchestrating complex workflows on behalf of users.
Second, absorbing the cost of model selection and routing on behalf of users.
Third, capturing the spread on tokens and per-second inference through API discounts and bulk purchasing power.
Fourth, using service delivery to turn an unstable new tool into a reliable production process.
That is what this cohort of companies is actually doing.
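The second layer, model selection and routing, can be made concrete with a small sketch. Everything here is hypothetical: the model names, prices, and quality scores are placeholders rather than data from the article, and "cheapest model that clears a quality bar" is only one possible routing policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOffer:
    name: str                # hypothetical model identifier
    price_per_second: float  # CNY per generated second (made-up values)
    quality_score: float     # 0..1, assumed to come from internal evaluation


def route(offers: list[ModelOffer], min_quality: float) -> ModelOffer:
    """Pick the cheapest offer whose quality clears the bar for this job."""
    eligible = [o for o in offers if o.quality_score >= min_quality]
    if not eligible:
        raise ValueError("no model meets the requested quality bar")
    return min(eligible, key=lambda o: o.price_per_second)


# Placeholder catalogue; a real one would be refreshed as upstream prices move.
OFFERS = [
    ModelOffer("model_a", 0.90, 0.85),
    ModelOffer("model_b", 0.60, 0.70),
    ModelOffer("model_c", 1.20, 0.95),
]

draft_choice = route(OFFERS, min_quality=0.60)  # rough storyboard pass
final_choice = route(OFFERS, min_quality=0.80)  # client-facing render

print(draft_choice.name, final_choice.name)
```

The point of the sketch is how little is in it: a catalogue and a comparison. That is why routing on its own reads more like procurement skill than a durable asset.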
The question is not whether model providers will build products. The question is when they will reclaim those four layers, one by one.
The article already answers this clearly: profit is largely determined by which models you can access and how large an API discount you can negotiate, and multiple internal teams at the big platforms are already targeting these exact directions.
This means many AI video Agents are structurally resellers and solution integrators operating during a period of upstream supply expansion. That position is not inherently unscalable: Adobe, the Shopify ecosystem, and a generation of SaaS companies built on early AWS all proved that the middle layer can thrive. But the prerequisite is that you must build real switching cost.
And that is precisely where most video Agents are weakest today: users are loyal to outcomes, not to tools; enterprise clients are loyal to delivery SLAs, not to a particular generation button. Once the foundation model integrates UI, templates, community, distribution entry points, and billing in one move, the pure packaging layer commoditizes fast.
One place I may be wrong: video workflows are genuinely much longer than text chat. They span scripting, storyboarding, character consistency, camera control, post-production editing, distribution formatting, and commercial ad feedback loops. That chain is far more complex than a chat wrapper, so the application layer is not entirely without time.
But time itself is not a moat.
03 Historical Analogies and Structural Parallels
I would rather compare this to the AWS ecosystem circa 2014 than to the ChatGPT wrapper wave of 2022.
Why not the latter?
Because the problem with pure chat products at that time was that workflows were too short, user needs too generic, and switching costs near zero, so a single upstream model upgrade could absorb most of the value.
Video is different.
Video looks more like the early cloud era's cohort of cloud-native software companies: foundational compute and base capabilities provided by giants, but the real value going to whoever assembled those atomic capabilities into systems that could face business needs directly, and captured distribution.
But this analogy only holds halfway.
The other half looks more like the app economy after the 2007 iPhone: platform capabilities rising every year, developers earning early returns from experience gaps and niche use cases, but once the OS natively absorbed critical capabilities, independent apps were compressed into two positions: either deeply specialized or deeply branded.
So AI video Agents today face a double squeeze:
On one side, like the AWS ecosystem, the opportunity comes from a rapidly declining infrastructure cost curve.
On the other side, like the iOS ecosystem, the risk comes from the platform systematizing high-frequency features.
This is a classic strategic inflection point: upstream model performance ships a major version every two months, while application-layer product iteration and organizational learning cannot keep pace. As long as foundation model capability grows faster than your user relationships, data feedback loops, and workflow depth, you are passively returning your profit pool upstream.
I cannot confirm what Seedance, Kling, or HappyHorse's video quality, stability, and pricing curves will look like over the next twelve months. But from the signals in the article, their product ambitions clearly extend well beyond the API layer. Especially when companies like ByteDance and Kuaishou already own content distribution entry points, the convergence of model, tool, community, and platform is almost a natural trajectory.
What will actually be priced is not who generates video first; it is who owns the data loop from generation to distribution to conversion.
04 What This Means for AI Builders
If I were building an AI video Agent right now, these are the four things I would prioritize checking this week and this month.
First, stop leading with "we support more models" as your primary narrative
Multi-model integration matters, and routing can create cost arbitrage, but this is more of a procurement capability than a long-term moat.
What survives long-term is not "we've integrated Seedance + Kling + HappyHorse + several image models." It is "our workflow makes it genuinely hard for clients to migrate away."
Character asset management, style template libraries, team collaboration, review workflows, version rollback, distribution formatting, ad creative A/B testing, order delivery networks: these are what generate switching cost.
If a product is still sitting at the pretty prompt-to-video shell layer, I treat it as a high-risk asset.
Second, move from tool-based pricing to outcome-based or hybrid pricing as fast as possible
The article mentions ZeroCut's direction as "technology plus service."
That may not be glamorous, but it is realistic.
Pure tool pricing is directly exposed to upstream price wars: every cut in model pricing passes straight through to what you can charge. Service, delivery, managed operations, industry solutions, and outcome-based billing can shift your margin from "token spread" to "business result share."
To put it more bluntly: if you are earning API resale margin, you are effectively working for the model provider.
If you are earning on final deliverables, ad performance, short drama production cycles, or SKU creative throughput, you are starting to own your own pricing power.
I have not seen any of these companies publicly disclose gross margin, payback period, or renewal rates, so I cannot assert which model has proven out. But the direction is clear: the closer you are to the money, the harder it is for a model upgrade to erase you overnight.
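A toy comparison shows why the two pricing models react so differently to an upstream price cut. This is a sketch under stated assumptions, not a model of any real company: the job size, the per-second costs, the 25% markup, and the ¥500 fee are all invented, the resale case assumes the markup percentage stays fixed, and the outcome case assumes the client fee does not move when model prices fall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    seconds: float                   # seconds of video generated for the job
    upstream_cost_per_second: float  # CNY paid to the model provider

    @property
    def inference_cost(self) -> float:
        return self.seconds * self.upstream_cost_per_second


def resale_margin(job: Job, markup: float) -> float:
    """Cost-plus resale: the client pays inference cost times (1 + markup)."""
    return job.inference_cost * markup


def outcome_margin(job: Job, fee_per_deliverable: float) -> float:
    """Outcome pricing: a fixed fee per deliverable, minus inference cost."""
    return fee_per_deliverable - job.inference_cost


# Hypothetical job before and after a 50% upstream price cut.
before = Job(seconds=300, upstream_cost_per_second=0.72)
after = Job(seconds=300, upstream_cost_per_second=0.36)

for label, job in (("before cut", before), ("after cut", after)):
    print(
        f"{label}: resale margin {resale_margin(job, 0.25):.0f} CNY, "
        f"outcome margin {outcome_margin(job, 500):.0f} CNY"
    )
```

With these made-up numbers, the resale margin halves when the upstream price halves, while the outcome margin widens. In practice clients will push outcome fees down over time as well, so the sketch overstates the gap, but the direction matches the argument above.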
Third, race for data loops, not just user counts
"Get traction, build retention, accumulate data": that framing is correct, but it needs to be specific.
What data is actually valuable?
Not generalized prompt logs.
Data tied to business outcomes: which script structures drive higher completion rates, which shot pacing converts better for e-commerce, which character designs maintain consistency across multi-episode short dramas, which failure patterns consume the most inference cost.
Once this data feeds back into workflows, templates, automated review, and model routing, the application layer starts moving from a traffic business to a systems business.
Fourth, treat distribution as a core product, not a marketing department function
The article mentions some players burning ¥20,000–30,000 per day on search ads.
That signals real demand. It also signals primitive competition.
As long as you are primarily dependent on paid acquisition, you do not truly control your own demand entry point. Once a big platform subsidizes its product with model capability, it can drive your CAC straight up.
So I would prioritize two types of distribution:
One is vertical industry entry points: short dramas, e-commerce, MCNs, local business merchants, brand content teams.
The other is creator collaboration networks, meaning "it's fine if your client can't use the tool; I'll match them with someone who can."
This is not a fallback position. It is actively occupying both sides of supply and demand. Once a platform controls the order flow and delivery network, the underlying model can be swapped out and the business still migrates with you.
05 Counterarguments and Risks
The preceding analysis holds on one condition: that video workflows are complex enough that upstream platforms cannot quickly unify the product layer.
But I may be wrong to underestimate how fast the big platforms can integrate.
Especially ByteDance and Kuaishou, companies that simultaneously own models, traffic distribution, creator ecosystems, and monetization infrastructure. They are not "API vendors trying to build apps." They are "content platforms completing their production toolchain." Those are entirely different threat levels.
If a platform strings together scripting, generation, editing, publishing, ad distribution, and monetization end to end, then the strongest value proposition most AI video Agents have today, simplifying the workflow, gets natively absorbed by the platform.
The second risk is that the application layer overestimates its control over taste and aesthetic judgment.
Founders often say taste is a moat. That is not necessarily wrong, but taste must be productized, quantified, and made collaborative before it can move from founder intuition to company asset. Otherwise it is just a fragile talent premium.
The third risk is that the ARR and user numbers common in media coverage may be masking retention problems. Video products naturally attract users through novelty, but if users are only paying short-term for a viral use case, long-term retention is unstable, and today's attractive revenue may be a one-time windfall. Without cohort data, net revenue retention, or gross margin trends, I have to maintain skepticism about claims that this is a durable long-term sector.
The final and more fundamental objection is this: maybe this sector will never produce an Adobe.
Not because demand is insufficient, but because the combination of upstream model providers and content platforms is too strong. Adobe's advantage was built on file formats, professional workflows, ecosystem standards, and enterprise procurement paths. The most critical production inputs in AI video today, model capability and distribution entry points, are largely not in the hands of startups.
So the colder conclusion may be: this industry can produce a cohort of companies with solid revenue, but it may not produce many companies with genuinely independent pricing power.
In the short term, this is one of the few AI application sectors still capable of generating real profit.
In the long term, what survives will not be "the best video generator"; it will be the layer that first captures workflow ownership, distribution, and the outcome data loop.