What Happened
Developer Saladino93 published Hitoku, an open-source, voice-first AI assistant for macOS that runs entirely on-device. Announced via r/LocalLLaMA, the project supports two local text generation backends — Gemma 4 (via Google's LiteRT runtime) and Qwen 3.5 (via Apple's MLX framework) — with no data leaving the machine. The project is available on GitHub under the litert branch and a packaged download is live at hitoku.me/draft with a free access code for the first 50 downloads.
Speech-to-text is handled by a choice of three backends: Parakeet, Whisper, or Qwen3-ASR, giving users flexibility based on hardware and accuracy preferences. A Ctrl+S shortcut enables inline voice dictation with optional text polishing.
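For context on what a system-wide shortcut like this involves, a global Ctrl+S listener on macOS can be built with NSEvent's global monitor API, roughly as sketched below. The class name and the dictation callback are hypothetical (not taken from the Hitoku source), and global monitors observe keystrokes without consuming them and require the Accessibility permission.

```swift
import AppKit

// Minimal sketch, assuming an NSEvent global monitor; not Hitoku's actual code.
final class DictationHotkey {
    private var monitor: Any?

    // startDictation is a placeholder for whatever the app does when triggered.
    func install(onTrigger startDictation: @escaping () -> Void) {
        monitor = NSEvent.addGlobalMonitorForEvents(matching: .keyDown) { event in
            if event.modifierFlags.contains(.control),
               event.charactersIgnoringModifiers == "s" {
                startDictation()
            }
        }
    }

    deinit {
        if let monitor { NSEvent.removeMonitor(monitor) }
    }
}
```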
Why It Matters
Hitoku targets a gap in the local-AI ecosystem: most on-device assistants lack persistent environmental awareness. This tool reads the active application, open documents, and screen content to answer questions without requiring the user to manually copy-paste context. According to the developer, current personal workflows include integration with Claude Code, Obsidian, academic paper reading, and email drafting, all triggered by voice.
For engineering and security-conscious teams, the fully local execution model means sensitive documents, internal code, and emails are never transmitted to a third-party API. This positions Hitoku as a practical alternative to cloud-connected assistants like Microsoft Copilot for users operating under data residency or compliance constraints.
The project also signals growing developer momentum around Apple's MLX framework as a production-viable inference layer, with Qwen 3.5 running natively on-device without requiring external runtimes.
The Technical Detail
Model Backends
- Gemma 4 via LiteRT: Faster for multimodal/image tasks according to the developer, but carries known caveats. LiteRT dylibs add approximately 98 MB to the application bundle, growing total app size from ~50 MB to ~150 MB. No official Swift package exists from Google, so dylibs are bundled manually.
- Qwen 3.5 via MLX: Pure MLX inference, no LiteRT dependency. Described as slower than LiteRT for image tasks but more stable and controllable. Recommended as the safer default until upstream LiteRT issues are resolved (a sketch of swapping between the two backends follows this list).
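To make the trade-off concrete, here is a minimal sketch of how an app could abstract over the two text backends behind a single protocol. The protocol, type names, and selection logic are illustrative assumptions rather than Hitoku's actual API, and the inference calls themselves are left as placeholders.

```swift
// Illustrative backend abstraction; names are hypothetical, not Hitoku's API.
protocol TextGenerationBackend {
    func generate(prompt: String) async throws -> String
}

// Qwen 3.5 via MLX: pure on-device inference, no bundled dylibs.
struct MLXQwenBackend: TextGenerationBackend {
    func generate(prompt: String) async throws -> String {
        // Call into mlx-swift here (omitted in this sketch).
        fatalError("MLX inference not wired up in this sketch")
    }
}

// Gemma 4 via LiteRT: manually bundled dylibs (~98 MB) behind custom Swift glue.
struct LiteRTGemmaBackend: TextGenerationBackend {
    func generate(prompt: String) async throws -> String {
        // Call into the bundled LiteRT dylibs here (omitted in this sketch).
        fatalError("LiteRT inference not wired up in this sketch")
    }
}

enum BackendChoice: String { case mlxQwen, litertGemma }

func makeBackend(_ choice: BackendChoice) -> TextGenerationBackend {
    switch choice {
    case .mlxQwen:     return MLXQwenBackend()     // safer default per the developer
    case .litertGemma: return LiteRTGemmaBackend() // faster for image tasks, known caveats
    }
}
```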
Known Issues — Gemma 4 / LiteRT
- Memory spikes: LiteRT's WebGPU backend can allocate significantly more GPU memory than the model weights alone. Flagged as rare but confirmed upstream; tracked at google-ai-edge/LiteRT#5706.
- Swift support: Described by the developer as "a bit lacking" at current LiteRT versions, requiring custom Swift wrappers around the official Google library.
- Roadmap item: Developer is actively working on a native MLX path for Gemma 4, trading some speed for stability and eliminating the LiteRT dependency entirely.
Context Pipeline
The assistant reads screen content, active application state, and loaded documents at query time. This allows queries like "summarize this PDF" or "draft a reply to this email" without manual context injection. The implementation appears to rely on macOS accessibility and screen capture APIs, though the specifics are in the open-source repository.
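The announcement does not spell out the mechanism, but a minimal version of this kind of context read can be done with NSWorkspace plus the Accessibility API. The sketch below is an assumption about the approach, not code from the repository, and it requires the user to grant Accessibility permission in System Settings.

```swift
import AppKit
import ApplicationServices

// Sketch (assumed approach): read the frontmost app's name and the text value of
// its focused UI element via the Accessibility API. Requires Accessibility permission.
func frontmostContext() -> (app: String, focusedText: String?) {
    let app = NSWorkspace.shared.frontmostApplication
    let appName = app?.localizedName ?? "unknown"

    var focusedText: String? = nil
    if let pid = app?.processIdentifier {
        let axApp = AXUIElementCreateApplication(pid)

        var focusedRef: CFTypeRef?
        AXUIElementCopyAttributeValue(axApp, kAXFocusedUIElementAttribute as CFString, &focusedRef)

        if let focusedRef, CFGetTypeID(focusedRef) == AXUIElementGetTypeID() {
            let element = focusedRef as! AXUIElement
            var valueRef: CFTypeRef?
            AXUIElementCopyAttributeValue(element, kAXValueAttribute as CFString, &valueRef)
            focusedText = valueRef as? String
        }
    }
    return (appName, focusedText)
}
```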
STT Architecture
Three speech-to-text engines are supported interchangeably: Parakeet (NVIDIA's open ASR model), Whisper (OpenAI's open-weight model), and Qwen3-ASR. All run locally. No backend selection guidance is provided in the announcement, suggesting the choice is left to user preference and hardware.
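As a rough illustration of leaving the choice to the user, the three engines could be modeled as a stored preference; the enum, defaults key, and fallback below are assumptions for the sketch, not Hitoku's implementation.

```swift
import Foundation

// Hypothetical user-selectable STT engine; names mirror the announcement only.
enum STTEngine: String, CaseIterable {
    case parakeet   // NVIDIA's open ASR model
    case whisper    // OpenAI's open-weight model
    case qwen3ASR   // Qwen3-ASR
}

func selectedSTTEngine() -> STTEngine {
    let stored = UserDefaults.standard.string(forKey: "sttEngine") ?? ""
    return STTEngine(rawValue: stored) ?? .whisper  // arbitrary default for this sketch
}
```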
What To Watch
- MLX-native Gemma 4: The developer has confirmed active work on replacing the LiteRT runtime with a pure MLX inference path. No completion timeline is stated, but this is positioned as the next meaningful release.
- LiteRT upstream fix: Google's resolution of the WebGPU memory allocation issue (#5706) will directly unblock a cleaner Gemma 4 integration. Watch the upstream issue for patch activity.
- Google Swift package: An official LiteRT Swift package from Google would eliminate the manual dylib bundling and reduce app size. No timeline has been announced by Google.
- Download cap: The free access code is valid for 50 downloads only. Post-cap pricing or licensing terms have not been disclosed; watch hitoku.me for commercial model details.
- Community adoption: As an r/LocalLLaMA project, traction will be visible via GitHub stars and fork activity on the Saladino93/hitokudraft repository. Early integrations with developer tools like Claude Code suggest a target audience of technical macOS users.