What Happened

Simon Willison, creator of the popular llm Python CLI and library, has begun laying the groundwork for a major overhaul of the tool's abstraction layer. The project, documented in a new public repository tagged research-llm-apis 2026-04-04, is a preparatory research phase aimed at handling vendor features that the current abstraction cannot support, most notably server-side tool execution.

The llm library currently provides a unified interface over hundreds of models from dozens of vendors through a plugin system. As providers like Anthropic, OpenAI, Google (Gemini), and Mistral have added new capabilities over the past year, the existing abstraction layer has begun to show its limits.

Technical Deep Dive

To understand the raw API surface across providers, Willison employed Claude Code to read through the official Python client libraries for all four vendors and generate curl commands that hit the underlying JSON APIs directly. The goal was to capture both streaming and non-streaming response shapes across a range of scenarios.
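The repo's captures were generated with curl, but the same exercise is easy to reproduce in Python. Below is a minimal sketch, assuming httpx is installed and an ANTHROPIC_API_KEY environment variable is set, that hits Anthropic's Messages API directly and saves the raw text/event-stream lines; the model id and output filename are placeholders:

```python
# Minimal sketch: capture a raw streaming response from Anthropic's
# Messages API, bypassing the official SDK. This mirrors what the
# repo's generated curl commands do.
import os

import httpx

headers = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}
body = {
    "model": "claude-sonnet-4-20250514",  # substitute any current model id
    "max_tokens": 256,
    "stream": True,
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
}

with httpx.stream(
    "POST", "https://api.anthropic.com/v1/messages",
    headers=headers, json=body, timeout=60,
) as r:
    r.raise_for_status()
    with open("anthropic-stream-capture.txt", "w") as f:
        for line in r.iter_lines():
            f.write(line + "\n")  # raw server-sent-event lines, unparsed
```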

The output of this research — including the generated scripts and captured JSON responses — now lives in a dedicated GitHub repository. This approach is methodologically notable: rather than reading documentation (which often lags implementation), Claude Code analyzed the actual client library source code to infer what the APIs do in practice.

Why Server-Side Tool Execution Breaks the Current Model

The current llm abstraction assumes a request-response loop where tool calls are handled client-side. Anthropic's and OpenAI's APIs now support server-side tool execution, where the provider's infrastructure can call tools and return results without the client managing the loop. This fundamentally changes the call signature, the streaming event types, and the state machine a client needs to implement.
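To make the contrast concrete, here is a schematic of the loop today's abstraction assumes. The function names and message shapes below are illustrative stand-ins, not llm's actual API:

```python
# Schematic of the client-side tool loop that today's abstraction assumes.
# call_model and execute_tool are stubs; the dict shapes are loosely
# modeled on Anthropic/OpenAI tool responses, not exact schemas.

def call_model(messages, tools):
    # Stub: a real implementation would make one HTTP round trip.
    return {"stop_reason": "end_turn", "content": "42", "tool_calls": []}

def execute_tool(name, arguments):
    # Stub: a real implementation would dispatch to a local function.
    return f"ran {name} with {arguments}"

def run_with_client_side_tools(messages, tools):
    while True:
        response = call_model(messages, tools)
        if response["stop_reason"] != "tool_use":
            return response  # model produced a final answer
        # The client owns this part of the state machine today:
        for call in response["tool_calls"]:
            result = execute_tool(call["name"], call["input"])
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
```

With server-side execution, the provider runs that loop itself, so the client's job shifts from driving the loop to interpreting a richer stream of intermediate events.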

For example, a streaming response with server-side tool use might emit event types like:

  • tool_use blocks mid-stream in Anthropic's API
  • tool_calls deltas in OpenAI's streaming chunks
  • functionCall parts in Gemini's GenerateContentResponse

Each vendor uses different field names, different chunking strategies, and different conventions for signaling tool completion. The current llm plugin interface doesn't expose enough surface area for plugin authors to handle these differences correctly.
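One plausible shape for the new abstraction is a normalization layer that maps each vendor's events onto a common type. The sketch below is speculative; the event dicts are simplified versions of each provider's streaming payloads, not their exact schemas:

```python
# Speculative sketch of a cross-vendor normalization layer for streaming
# tool-call events. Event shapes are simplified for illustration.
from dataclasses import dataclass


@dataclass
class ToolCallEvent:
    provider: str
    tool_name: str
    arguments_fragment: str  # providers stream tool arguments incrementally


def normalize(provider: str, event: dict) -> ToolCallEvent | None:
    if provider == "anthropic" and event.get("type") == "content_block_delta":
        delta = event["delta"]
        # Anthropic streams tool input as input_json_delta fragments.
        if delta.get("type") == "input_json_delta":
            return ToolCallEvent("anthropic", "", delta["partial_json"])
    if provider == "openai":
        # OpenAI chat chunks carry tool_calls deltas with index offsets.
        for call in event.get("choices", [{}])[0].get("delta", {}).get("tool_calls", []):
            fn = call.get("function", {})
            return ToolCallEvent("openai", fn.get("name", ""), fn.get("arguments", ""))
    if provider == "gemini":
        # Gemini puts functionCall parts inside candidate content parts.
        for part in event.get("candidates", [{}])[0].get("content", {}).get("parts", []):
            if "functionCall" in part:
                fc = part["functionCall"]
                return ToolCallEvent("gemini", fc.get("name", ""), str(fc.get("args", {})))
    return None  # not a tool-related event
```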

The Research Artifact

The repository contains curl commands and raw JSON captures for each provider in both streaming (text/event-stream) and non-streaming modes. This gives the project a concrete, versioned reference point for what each API actually returns today — something that will inform the new abstract base classes and plugin protocol Willison designs next.

A typical non-streaming capture for a tool-use scenario would include the full stop_reason, tool_use content block, and input JSON that the model decided to pass to the tool. The streaming equivalent shows how those same fields arrive as incremental delta events with index offsets.
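For illustration, an abridged Anthropic capture might look like the following; the values are invented, but the field names follow Anthropic's documented tool-use schema:

```python
# Abridged illustration of an Anthropic non-streaming tool-use response.
# Values are invented; field names follow the documented schema.
non_streaming_capture = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "I'll check the weather."},
        {
            "type": "tool_use",
            "id": "toolu_01A...",  # truncated for illustration
            "name": "get_weather",
            "input": {"city": "Half Moon Bay"},
        },
    ],
}

# In streaming mode the same information arrives incrementally: a
# content_block_start announcing the tool_use block at a given index,
# content_block_delta events whose input_json_delta carries partial_json
# fragments of the input, then content_block_stop and a message_delta
# with stop_reason "tool_use".
```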

Who Should Care

Authors of llm plugins should pay close attention. Any plugin currently wrapping a provider that has added tool execution, extended thinking, or other stateful features will likely need to be updated once the new abstraction ships. Willison's research phase signals that a breaking, or at least additive, change to the plugin protocol is coming.

Python developers building on top of the llm CLI for scripting or automation workflows should be aware that the underlying plugin API is in flux. The current plugin interface — centered on Model, Response, and Conversation classes — may gain new optional methods or abstract properties.
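For reference, the current model plugin pattern, per llm's plugin tutorial, looks roughly like this; EchoModel is an illustrative toy, not a real provider wrapper:

```python
# The shape of an llm model plugin today: register a Model subclass
# whose execute() yields text chunks. EchoModel is a toy example.
import llm


@llm.hookimpl
def register_models(register):
    register(EchoModel())


class EchoModel(llm.Model):
    model_id = "echo"  # illustrative plugin id

    def execute(self, prompt, stream, response, conversation):
        # Today execute() yields plain text chunks; server-side tool
        # execution would need richer event types than str.
        yield f"echo: {prompt.prompt}"
```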

Tooling teams at AI vendors may find the research repo useful as an independent, third-party snapshot of how their streaming and non-streaming APIs behave in practice, compared to competitors.

What To Do This Week

  • Star or watch the research-llm-apis repository on GitHub to track when Willison begins translating the research into actual interface proposals.
  • If you maintain an llm plugin, audit whether your provider now supports server-side tool execution and document any streaming event types your current implementation silently drops.
  • Run the existing llm CLI against your provider and compare llm --no-stream output with the default streaming mode to understand what your plugin currently surfaces versus what it discards; see the sketch after this list.
  • Review the Anthropic, OpenAI, and Gemini Python SDK source on GitHub directly — the same exercise Willison used Claude Code to automate — to identify any new response fields added in the last six months that your integration doesn't handle.
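For the capture exercise above, here is one minimal way to script the comparison, assuming llm is installed with a default model and API key configured:

```python
# Run the llm CLI in both streaming and non-streaming modes and save
# the outputs for comparison. Assumes `llm` is on PATH with a default
# model configured.
import subprocess

PROMPT = "List three colors as JSON."

for label, extra_args in [("streaming", []), ("no-stream", ["--no-stream"])]:
    result = subprocess.run(
        ["llm", PROMPT, *extra_args],
        capture_output=True, text=True, check=True,
    )
    with open(f"capture-{label}.txt", "w") as f:
        f.write(result.stdout)

# Diff the two files to check whether your plugin produces the same
# final text in both modes.
```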

The immediate deliverable from this research phase is a versioned JSON reference corpus. The abstraction redesign itself hasn't been proposed yet, but having clean empirical data on what each API returns is the right prerequisite for designing an interface that doesn't paper over important differences.