An AI-managed cafe in Stockholm ordered 120 eggs in its first week of operation, even though the shop had no stove to cook them on. When AI agents (AI systems capable of autonomous task execution) operate without human supervision, they turn the real world into their own trial-and-error sandbox. That is what we should care about most.

What this is

Andon Labs, which previously operated AI-managed retail stores in San Francisco, moved the experiment to Stockholm, opening a cafe fully managed by AI manager "Mona"—inventory ordering, supplier communication, and even administrative approvals were all handled by her.

The results: Mona ordered 120 eggs, and when told there was no stove, suggested using a high-speed oven—until staff pointed out the eggs would explode. To solve the problem of fresh tomatoes spoiling easily, she ordered 22.5 kg of canned tomatoes to make "fresh sandwiches." The baristas eventually set up a "wall of shame" displaying her absurd orders: 6,000 napkins, 3,000 pairs of nitrile gloves, 9 liters of coconut milk, and industrial-grade trash bags.

More notably: Mona independently submitted an outdoor seating permit application to the police, attaching a sketch of the street she had generated herself despite never having seen the street outside the cafe; the police rejected the application. When errors occurred, she fired off multiple emails with the subject "EMERGENCY" to suppliers, demanding order cancellations or modifications.

Industry view

Simon Willison bluntly called such experiments unethical: the cost of AI errors is borne by unconsenting third parties, from suppliers dealing with emergency emails to police reviewing a garbage sketch. He cited the lesson of last year's AI Village experiment, in which an AI trying to "do good" sent unsolicited thank-you emails to Rob Pike and angered him. Willison's position is that any external operation affecting others must have a human-in-the-loop from the deploying team, i.e., manual confirmation for critical steps.

Supporters will counter that without real-world deployments AI will never learn, and the San Francisco store did reduce its error rate through iteration. But the crucial distinction here is between internalizing trial-and-error costs and externalizing them. Running tests in your own warehouse is one thing; making suppliers and the police clean up your mess is another.

Our judgment: the industry consensus on agent deployment is shifting from "can it be done" to "should autonomy be granted." Mona's case demonstrates that the bottleneck in AI judgment isn't computational power but a lack of physical-world common sense: the relationship between a stove and eggs, the difference between canned and fresh tomatoes in a sandwich. Things self-evident to humans, AI must "learn" through real-world errors, and no one is currently answering the question of who bears the cost of those errors.

Impact on regular people

For enterprise IT: agent deployment compliance frameworks need an "external impact" dimension. Steps that involve third-party interactions should default to requiring human confirmation; not every operation should execute automatically.
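
What might that default look like in practice? Below is a minimal sketch of an approval gate in an agent's execution loop, assuming a hypothetical classification of actions by who they affect; the names (Impact, AgentAction, run_step) are illustrative, not taken from any real framework mentioned in the article:

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical classification: which parties does this step touch?
class Impact(Enum):
    INTERNAL = auto()   # affects only the deploying team (e.g., draft a memo)
    EXTERNAL = auto()   # affects third parties (e.g., email a supplier, file a permit)

@dataclass
class AgentAction:
    description: str
    impact: Impact

def execute(action: AgentAction) -> None:
    print(f"[executed] {action.description}")

def require_human_approval(action: AgentAction) -> bool:
    # Stand-in for a real review queue (a ticket, a chat approval, etc.).
    answer = input(f"Approve external action '{action.description}'? [y/N] ")
    return answer.strip().lower() == "y"

def run_step(action: AgentAction) -> None:
    # Default-deny for anything that crosses the organization's boundary:
    # external-impact steps wait for a human; internal steps may auto-run.
    if action.impact is Impact.EXTERNAL and not require_human_approval(action):
        print(f"[held] {action.description}")
        return
    execute(action)

if __name__ == "__main__":
    run_step(AgentAction("recalculate napkin reorder point", Impact.INTERNAL))
    run_step(AgentAction("send 'EMERGENCY' cancellation email to supplier", Impact.EXTERNAL))
```

The design choice that matters is the direction of the default: anything crossing the organization's boundary waits for a human unless explicitly approved, rather than running automatically unless explicitly blocked.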

For individual careers: AI managers are emerging, but for now they resemble interns who need human mentoring rather than independently accountable managers. They won't replace operations roles in the short term, but they will change how those jobs are structured: humans shift from executors to supervisors.

For the consumer market: consumers will increasingly encounter "AI-managed" storefronts. Stockholm's "wall of shame" is clever PR, turning AI errors into a selling point. But many businesses may lack that self-awareness, and consumer patience will steadily erode.