What Happened
Developer Saladino93 published Hitoku, an open-source, voice-first AI assistant for macOS that runs entirely on-device. Announced via r/LocalLLaMA, the project supports two local text generation backends — Gemma 4 (via Google's LiteRT runtime) and Qwen 3.5 (via Apple's MLX framework) — with no data leaving the machine. The project is available on GitHub under the litert branch and a packaged download is live at hitoku.me/draft with a free access code for the first 50 downloads.
Speech-to-text is handled by a choice of three backends: Parakeet, Whisper, or Qwen3-ASR, giving users flexibility based on hardware and accuracy preferences. A Ctrl+S shortcut enables inline voice dictation with optional text polishing.
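For context on what a system-wide shortcut like this involves, a global Ctrl+S listener on macOS can be built with NSEvent's global monitor API, roughly as sketched below. The class name and the dictation callback are hypothetical (not taken from the Hitoku source), and global monitors observe keystrokes without consuming them and require the Accessibility permission.

```swift
import AppKit

// Minimal sketch, assuming an NSEvent global monitor; not Hitoku's actual code.
final class DictationHotkey {
    private var monitor: Any?

    // startDictation is a placeholder for whatever the app does when triggered.
    func install(onTrigger startDictation: @escaping () -> Void) {
        monitor = NSEvent.addGlobalMonitorForEvents(matching: .keyDown) { event in
            if event.modifierFlags.contains(.control),
               event.charactersIgnoringModifiers == "s" {
                startDictation()
            }
        }
    }

    deinit {
        if let monitor { NSEvent.removeMonitor(monitor) }
    }
}
```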
Why It Matters
Hitoku targets a gap in the local-AI ecosystem: most on-device assistants lack persistent environmental awareness. This tool reads the active application, open documents, and screen content to answer questions without requiring the user to manually copy-paste context. According to the developer, current personal workflows include integration with Claude Code, Obsidian, academic paper reading, and email drafting, all triggered by voice.
For engineering and security-conscious teams, the fully local execution model means sensitive documents, internal code, and emails are never transmitted to a third-party API. This positions Hitoku as a practical alternative to cloud-connected assistants like Microsoft Copilot for users operating under data residency or compliance constraints.
The project also signals growing developer momentum around Apple's MLX framework as a production-viable inference layer, with Qwen 3.5 running natively on-device without requiring external runtimes.
The Technical Detail
Model Backends
- Gemma 4 via LiteRT: Faster for multimodal/image tasks according to the developer, but carries known caveats. LiteRT dylibs add approximately 98 MB to the application bundle, growing total app size from ~50 MB to ~150 MB. No official Swift package exists from Google, so dylibs are bundled manually.
- Qwen 3.5 via MLX: Pure MLX inference, no LiteRT dependency. Described as slower than LiteRT for image tasks but more stable and controllable. Recommended as the safer default until upstream LiteRT issues are resolved (a sketch of swapping between the two backends follows this list).
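To make the trade-off concrete, here is a minimal sketch of how an app could abstract over the two text backends behind a single protocol. The protocol, type names, and selection logic are illustrative assumptions rather than Hitoku's actual API, and the inference calls themselves are left as placeholders.

```swift
// Illustrative backend abstraction; names are hypothetical, not Hitoku's API.
protocol TextGenerationBackend {
    func generate(prompt: String) async throws -> String
}

// Qwen 3.5 via MLX: pure on-device inference, no bundled dylibs.
struct MLXQwenBackend: TextGenerationBackend {
    func generate(prompt: String) async throws -> String {
        // Call into mlx-swift here (omitted in this sketch).
        fatalError("MLX inference not wired up in this sketch")
    }
}

// Gemma 4 via LiteRT: manually bundled dylibs (~98 MB) behind custom Swift glue.
struct LiteRTGemmaBackend: TextGenerationBackend {
    func generate(prompt: String) async throws -> String {
        // Call into the bundled LiteRT dylibs here (omitted in this sketch).
        fatalError("LiteRT inference not wired up in this sketch")
    }
}

enum BackendChoice: String { case mlxQwen, litertGemma }

func makeBackend(_ choice: BackendChoice) -> TextGenerationBackend {
    switch choice {
    case .mlxQwen:     return MLXQwenBackend()     // safer default per the developer
    case .litertGemma: return LiteRTGemmaBackend() // faster for image tasks, known caveats
    }
}
```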
Known Issues — Gemma 4 / LiteRT
- Memory spikes: LiteRT's WebGPU backend can allocate significantly more GPU memory than the model weights alone. Flagged as rare but confirmed upstream; tracked at google-ai-edge/LiteRT#5706.
- Swift support: Described by the developer as "a bit lacking" at current LiteRT versions, requiring custom Swift wrappers around the official Google library.
- Roadmap item: Developer is actively working on a native MLX path for Gemma 4, trading some speed for stability and eliminating the LiteRT dependency entirely.
Context Pipeline
The assistant reads screen content, active application state, and loaded documents at query time. This allows queries like "summarize this PDF" or "draft a reply to this email" without manual context injection. The implementation appears to rely on macOS accessibility and screen capture APIs, though the specifics are in the open-source repository.
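The announcement does not spell out the mechanism, but a minimal version of this kind of context read can be done with NSWorkspace plus the Accessibility API. The sketch below is an assumption about the approach, not code from the repository, and it requires the user to grant Accessibility permission in System Settings.

```swift
import AppKit
import ApplicationServices

// Sketch (assumed approach): read the frontmost app's name and the text value of
// its focused UI element via the Accessibility API. Requires Accessibility permission.
func frontmostContext() -> (app: String, focusedText: String?) {
    let app = NSWorkspace.shared.frontmostApplication
    let appName = app?.localizedName ?? "unknown"

    var focusedText: String? = nil
    if let pid = app?.processIdentifier {
        let axApp = AXUIElementCreateApplication(pid)

        var focusedRef: CFTypeRef?
        AXUIElementCopyAttributeValue(axApp, kAXFocusedUIElementAttribute as CFString, &focusedRef)

        if let focusedRef, CFGetTypeID(focusedRef) == AXUIElementGetTypeID() {
            let element = focusedRef as! AXUIElement
            var valueRef: CFTypeRef?
            AXUIElementCopyAttributeValue(element, kAXValueAttribute as CFString, &valueRef)
            focusedText = valueRef as? String
        }
    }
    return (appName, focusedText)
}
```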
STT Architecture
Three speech-to-text engines are supported interchangeably: Parakeet (NVIDIA's open ASR model), Whisper (OpenAI's open-weight model), and Qwen3-ASR. All run locally. No backend selection guidance is provided in the announcement, suggesting the choice is left to user preference and hardware.
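As a rough illustration of leaving the choice to the user, the three engines could be modeled as a stored preference; the enum, defaults key, and fallback below are assumptions for the sketch, not Hitoku's implementation.

```swift
import Foundation

// Hypothetical user-selectable STT engine; names mirror the announcement only.
enum STTEngine: String, CaseIterable {
    case parakeet   // NVIDIA's open ASR model
    case whisper    // OpenAI's open-weight model
    case qwen3ASR   // Qwen3-ASR
}

func selectedSTTEngine() -> STTEngine {
    let stored = UserDefaults.standard.string(forKey: "sttEngine") ?? ""
    return STTEngine(rawValue: stored) ?? .whisper  // arbitrary default for this sketch
}
```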
What To Watch
- MLX-native Gemma 4: The developer has confirmed active work on replacing the LiteRT runtime with a pure MLX inference path. No completion timeline is stated, but this is positioned as the next meaningful release.
- LiteRT upstream fix: Google's resolution of the WebGPU memory allocation issue (#5706) will directly unblock a cleaner Gemma 4 integration. Watch the upstream issue for patch activity.
- Google Swift package: An official LiteRT Swift package from Google would eliminate the manual dylib bundling and reduce app size. No timeline has been announced by Google.
- Download cap: The free access code is valid for 50 downloads only. Post-cap pricing or licensing terms have not been disclosed; watch hitoku.me for commercial model details.
- Community adoption: As an r/LocalLLaMA project, traction will be visible via GitHub stars and fork activity on the Saladino93/hitokudraft repository. Early integrations with developer tools like Claude Code suggest a target audience of technical macOS users.