The Signal
Apple got clowned on for being "late" to AI. No viral chatbot. No GPT moment. No flashy demo that broke the internet. While OpenAI was printing revenue and Google was panic-shipping Gemini, Apple shipped... private, on-device inference baked into the OS.
Turns out that might be the whole game. The argument circulating in builder circles now: Apple didn't lose the AI race. They quietly built a moat around the one thing no cloud provider can compete on — the user's device, the user's data, and zero-latency inference that doesn't touch a server. The "AI loser" narrative may be exactly backwards.
For solo builders targeting Apple's 1B+ active device base, this reframes the entire stack decision.
Builder's Take
Let's run the leverage math that nobody in the mainstream coverage is doing.
If you're a solopreneur shipping an iOS or macOS app, here's the cost curve you're looking at with cloud inference:
- Every user query hits your API.
- You pay per token, per image, per audio second.
- Your cost scales linearly with users. No leverage.
- You hold the user's data in transit. Privacy liability is yours.
Now flip it. On-device inference via Apple's stack (Core ML, the Neural Engine, Apple Intelligence APIs):
- Inference cost: $0. It runs on the user's chip.
- Latency: near-zero. No round trip.
- Privacy: Apple's problem, not yours. Data never leaves the device.
- Leverage: your marginal cost per user is effectively zero for inference.
This is Naval's infinite leverage in action. You write the integration once. It runs on every device forever. No GPU bill scaling with your DAU.
The moat Apple accidentally built: they've made privacy-preserving, zero-cost inference a platform feature. No startup can replicate the Neural Engine hardware + OS-level integration. OpenAI can't. Google can't. They're renting you GPUs. Apple is giving you the silicon.
The contrarian DHH take: the builders who over-indexed on cloud AI APIs just took on a permanent cost structure. The builders who learn to route the right tasks on-device will have structurally better unit economics. Not every inference call needs GPT-4. Most don't.
Tools & Stack
Apple's On-Device AI Stack (What You Can Build With Today)
- Core ML — Apple's primary framework for running ML models on-device. Supports converted models from PyTorch/TensorFlow. Free. Ships with Xcode.
- Create ML — Train lightweight models directly on Mac. Good for classification, NLP tasks, image tagging. Free with macOS.
- Apple Intelligence APIs — Higher-level writing tools, summarization, smart reply. Available on iOS 18+/macOS Sequoia. No API key. No cost.
- coremltools (Python) — Convert your PyTorch or scikit-learn model to .mlpackage format for deployment.
Convert a Model in 5 Lines
import coremltools as ct
import torch

model = torch.load('your_model.pt')  # assumes the full nn.Module was saved, not just a state_dict
model.eval()  # put the model in inference mode before tracing
example_input = torch.rand(1, 3, 224, 224)  # dummy input matching your model's expected shape
traced = torch.jit.trace(model, example_input)
ct.convert(traced, inputs=[ct.TensorType(shape=example_input.shape)]).save('YourModel.mlpackage')
Drop that .mlpackage into Xcode. Call it from Swift. Done. Zero inference cost from that point forward.
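On the Swift side, the call is only a few lines. A minimal sketch using the generic MLModel API, assuming the compiled model ships in your app bundle and its input feature is named "input" (check your model's actual feature names in Xcode):

import CoreML

// Load the compiled model from the app bundle. Xcode compiles the
// .mlpackage you dropped in to a .mlmodelc resource at build time.
func classify(features: MLMultiArray) throws -> MLFeatureProvider {
    let config = MLModelConfiguration()
    config.computeUnits = .all  // let Core ML schedule across CPU, GPU, and Neural Engine
    let url = Bundle.main.url(forResource: "YourModel", withExtension: "mlmodelc")!
    let model = try MLModel(contentsOf: url, configuration: config)
    // "input" is a placeholder feature name; yours comes from the model's metadata.
    let input = try MLDictionaryFeatureProvider(
        dictionary: ["input": MLFeatureValue(multiArray: features)])
    return try model.prediction(from: input)
}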
Hybrid Routing Pattern (What Smart Builders Do)
Don't go all-in on either. Route intelligently:
- On-device: classification, sentiment, summarization of short text, keyword extraction, real-time features.
- Cloud (OpenAI/Anthropic/Gemini): complex reasoning, long context, code generation, anything that needs a frontier model.
This is the real alpha. A solo builder who does hybrid routing has cloud costs 60-80% lower than one who routes everything to the API. Check current pricing on your cloud provider of choice and do the math against your expected query volume — the on-device savings compound fast.
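In code, the router is almost insultingly simple. A hedged sketch, where AITask, runLocalModel, and callCloudAPI are hypothetical stand-ins for your own task types, Core ML classifier, and API client:

import Foundation

// Hypothetical task types; adjust to your app's actual feature set.
enum AITask {
    case sentiment(String)
    case summarizeShort(String)
    case complexRewrite(String)
}

// Stubs standing in for your Core ML wrapper and cloud API client.
func runLocalModel(_ text: String) throws -> String { text }
func callCloudAPI(_ text: String) async throws -> String { text }

// Route cheap, latency-sensitive work on-device; only frontier-model
// work pays the cloud toll. Every local hit is a token bill you skip.
func handle(_ task: AITask) async throws -> String {
    switch task {
    case .sentiment(let text), .summarizeShort(let text):
        return try runLocalModel(text)       // Core ML, $0 marginal cost
    case .complexRewrite(let text):
        return try await callCloudAPI(text)  // paid API call
    }
}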
Alternatives Worth Knowing
- ONNX Runtime — Cross-platform alternative if you're not Apple-only. Runs on Android, Windows, Linux.
- llama.cpp / Ollama — For macOS desktop apps where you want to run open-weight LLMs locally (Llama 3, Mistral, Phi-3). No API cost. Requires the user to have capable hardware. See the sketch after this list.
- MLX — Apple's own ML framework optimized for Apple Silicon. Growing fast. Better inference performance than PyTorch on M-series chips.
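If you go the llama.cpp/Ollama route on macOS, integration is just HTTP to localhost. A sketch against Ollama's local REST API, assuming ollama serve is running on the default port and that the model name here is illustrative:

import Foundation

// Ollama exposes a local REST API. With "stream": false, /api/generate
// returns a single JSON object whose "response" field holds the completion.
func askLocalLLM(_ prompt: String) async throws -> String {
    var request = URLRequest(url: URL(string: "http://localhost:11434/api/generate")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = ["model": "llama3", "prompt": prompt, "stream": false]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    let (data, _) = try await URLSession.shared.data(for: request)
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    return json?["response"] as? String ?? ""
}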
Ship It This Week
Build a Hybrid-Routed iOS Writing Assistant
Here's a concrete project you can start today:
What it does: A writing assistant iOS app that uses Apple Intelligence for real-time sentence suggestions and tone detection (on-device, free), but routes complex rewrites and full draft generation to Claude or GPT-4 (cloud, paid by user via their own API key).
Why it's interesting: You never pay for inference. The on-device layer handles 80% of interactions. The cloud layer handles the hard stuff — and you pass the cost to the user via bring-your-own-key. Your marginal cost per user: near zero.
Stack:
- SwiftUI for the app
- Apple Intelligence Writing Tools API for inline suggestions
- Core ML + a small locally-run sentiment/tone classifier
- OpenAI or Anthropic API (user-supplied key) for heavy generation tasks (see the sketch below)
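The cloud layer is one authenticated POST. A sketch against Anthropic's Messages API; the model string is illustrative, and the user's key should come from your settings screen or the Keychain, never your server:

import Foundation

// BYOK: the user's key goes straight from their device to Anthropic.
// You never proxy the call, so your marginal inference cost stays $0.
func cloudRewrite(_ draft: String, apiKey: String) async throws -> Data {
    var request = URLRequest(url: URL(string: "https://api.anthropic.com/v1/messages")!)
    request.httpMethod = "POST"
    request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
    request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": "claude-sonnet-4-5",  // illustrative; use whatever current model you target
        "max_tokens": 1024,
        "messages": [["role": "user", "content": "Rewrite this draft: \(draft)"]]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    let (data, _) = try await URLSession.shared.data(for: request)
    return data  // decode the response's content blocks in your app layer
}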
Start here: Pull Apple's Core ML documentation and the Writing Tools API docs. Set up a basic SwiftUI text editor today. Add the on-device classifier tomorrow. You'll have something shippable by the end of the week.
The builders who figure out on-device routing in the next 12 months will have cost structures that cloud-only builders literally cannot match. That's your window.