Comparing: Compiling a Calculator Into AI Weights: A New Path to Decode Transformers & Programmer Compiles a Calculator Into AI Weights: One More Experimental Path to Understanding Transformers

A (EN)
Tags: Transformer · Mechanistic Interpretability · RPN

Compiling a Calculator Into AI Weights: A New Path to Decode Transformers

A developer spent months compiling an RPN (Reverse Polish Notation) calculator directly into the weights of a Transformer (the current mainstream AI architecture). The resulting model is 1.1 GB and can only perform basic arithmetic. The value of the experiment does not lie in practical utility, though: it offers a way to bypass training and understand a model's internal mechanisms directly.

What this is

Usually, an AI model is obtained by training it on data. This developer took a different route: acting like a compiler writer, he directly "translated" program logic into Transformer weights.

What he implemented is an RPN interpreter (Reverse Polish Notation is a postfix way of writing expressions; for example, 2 3 + 2 * evaluates to (2 + 3) × 2 = 10). The approach has three parts: the Transformer's residual stream is treated as a set of "registers", the attention weights are computed entirely by the compiler, and the non-linear logic is written into the MLP (the feed-forward layer responsible for complex computation in Transformers) via distillation training. The result is a 1.1 GB model that correctly executes stack-based calculations, but nothing more.
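For readers unfamiliar with RPN, the stack discipline the compiled model has to reproduce can be sketched in a few lines of ordinary Python. This is only a reference evaluator for the notation, not the developer's compiler or model; the function name `eval_rpn` and the restriction to the four basic operators are assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical reference evaluator: shows what "stack-based calculation" means,
# independent of any Transformer.
OPS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}

def eval_rpn(expression: str) -> float:
    """Evaluate a space-separated RPN expression using an explicit stack."""
    stack: List[float] = []
    for token in expression.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} needs two operands")
            b = stack.pop()  # right operand was pushed last
            a = stack.pop()  # left operand was pushed before it
            stack.append(OPS[token](a, b))
        else:
            stack.append(float(token))  # operands are pushed as they arrive
    if len(stack) != 1:
        raise ValueError("malformed expression: stack should hold one result")
    return stack[0]

print(eval_rpn("2 3 + 2 *"))  # 10.0
```

Each token either pushes a value or pops two values and pushes one, so the whole computation is a sequence of small, fixed state updates. That regularity is what makes an interpreter like this a plausible first target for compiling into weights.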

Industry view

Supporters argue this is a powerful tool for understanding Transformer mechanisms. If weights can be read the way a program is read, the AI "black box" problem gains a new line of attack. The compiler perspective strips the mysticism from the attention mechanism, turning it into a designable, verifiable instruction system.
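To make "designable, verifiable" concrete, here is a minimal NumPy sketch of a single attention head whose weights are written by hand rather than trained. It is not the developer's compiler: the register layout (`VAL`, `OUT`), the one-hot position encoding, and the scale `beta` are all assumptions chosen for illustration. The head implements one "instruction": copy the value register of the previous position into the output register of the current position.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

n_pos = 4                     # sequence length (illustrative assumption)
d_model = n_pos + 2           # [position one-hot | value register | output register]
VAL, OUT = n_pos, n_pos + 1   # hypothetical register indices in the residual stream

# Keys: every position advertises its own index.
# Queries: position i asks for index i - 1 (position 0 falls back to itself).
W_Q = np.zeros((d_model, n_pos))
W_K = np.zeros((d_model, n_pos))
for i in range(n_pos):
    W_K[i, i] = 1.0
    W_Q[i, max(i - 1, 0)] = 1.0
beta = 50.0  # large scale so the softmax behaves like a hard lookup

# Values read the VAL register; the output projection writes into OUT.
W_V = np.zeros((d_model, 1)); W_V[VAL, 0] = 1.0
W_O = np.zeros((1, d_model)); W_O[0, OUT] = 1.0

def attention_head(x):
    q, k, v = x @ W_Q, x @ W_K, x @ W_V
    scores = beta * (q @ k.T)
    future = np.triu(np.ones((len(x), len(x)), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)   # causal mask
    return x + softmax(scores) @ v @ W_O         # residual connection

# Residual stream: one-hot positions plus a value in the VAL register.
x = np.zeros((n_pos, d_model))
x[np.arange(n_pos), np.arange(n_pos)] = 1.0
x[:, VAL] = [2.0, 3.0, 5.0, 7.0]

y = attention_head(x)
print(np.round(y[:, OUT], 3))  # [2. 2. 3. 5.]
```

Because every weight was placed deliberately, the head's behavior can be checked directly against its specification, which is the sense in which a compiled head is verifiable. A real compiler would need many such heads plus the MLP logic, which in this experiment still came from distillation training.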

But the skepticism is equally clear. First, a 1.1 GB RPN interpreter makes no engineering sense: any calculator app is lighter, faster, and more reliable. Second, the MLP weights currently still rely on training rather than pure compilation, so the "program → weights" mapping is not yet a closed loop. The more fundamental issue is that compiling a simple interpreter does not show that complex programs can be compiled too. The leap from a stack calculator to a general-purpose program might be harder than the leap from training to compiling.

Impact on regular people

For enterprise IT: Zero short-term impact. This is a foundational experiment in Mechanistic Interpretability (the study of how AI computes internally, step by step), and it remains a long way from production engineering.

For your career: If you work in AI application development, this experiment is a reminder that a Transformer isn't just a "trained statistical machine"; it can also be a "programmable compute architecture." This shift in perspective could influence how future toolchains are designed.

For the consumer market: No direct impact yet. In the long run, though, if the "compiling AI" path proves viable, the cost of customized AI could drop from "massive data training" to "writing programs to compile", which makes it a variable worth keeping an eye on.

B (ZH)
Tags: Transformer · Mechanistic Interpretability · RPN

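Programmer Compiles a Calculator Into AI Weights: One More Experimental Path to Understanding Transformers

A developer spent several months "compiling" a Reverse Polish calculator into the weights of a Transformer (the current mainstream AI architecture). The model is 1.1 GB and can only add, subtract, multiply, and divide. The value of the experiment is not practical, though: it offers a new perspective for bypassing training and understanding AI's internal mechanisms directly.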

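What this is

Normally we get an AI model by training it on data. This developer took another route: like writing a compiler, he "translated" a program's logic directly into Transformer weights.

What he implemented is an RPN interpreter (Reverse Polish Notation, a way of evaluating postfix expressions; for example, 2 3 + 2 * gives 10). Concretely, the Transformer's residual stream is defined as "registers", the attention weights are computed entirely by the compiler, and the non-linear logic is written into the MLP (the feed-forward layer, the part of a Transformer responsible for complex computation) through distillation training. The result is a 1.1 GB model that executes stack-based calculations correctly, and nothing more.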

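How the industry sees it

Supporters see this as a powerful tool for understanding how Transformers work. Once weights can be read the way a program is read, the AI "black box" problem has a new line of attack. From the compiler's perspective, attention is no longer mysticism but an instruction system that can be designed and verified.

The doubts are just as clear. First, a 1.1 GB RPN interpreter is meaningless as engineering: any calculator app is lighter, faster, and more reliable. Second, the MLP weights currently still depend on training rather than pure compilation, so the "program → weights" mapping is not truly closed. The more fundamental problem is that a simple interpreter being compilable does not mean complex programs are too. Getting from a stack calculator to general-purpose programs may be a harder step than getting from training to compilation.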

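Impact on ordinary people

For enterprise IT: Zero impact in the short term. This is a foundational experiment in Mechanistic Interpretability (the study of how AI completes a computation internally, step by step), and it is still a long way from engineering use.

For your career: If you build AI applications, this experiment is a reminder that a Transformer is not only a "trained statistical machine"; it can also be a "programmable compute architecture." That change in understanding may shape how future toolchains are designed.

For the consumer market: No direct impact for now. In the long run, though, if the "compiling AI" path works out, the cost of customized AI could fall from "training on massive data" to "writing a program and compiling it", which is a variable worth watching.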