Local LLM
8 articles tagged with this topic
AMD Strix Halo Rumored at 192GB: Local LLM Hardware Bottleneck is Loosening
AMD's next-gen Strix Halo rumored with 192GB unified memory can run 122B LLMs locally. Breaking this memory bottleneck reshapes enterprise private AI
AMD's 128GB Halo Box Prototype Challenges Apple Mac's Local LLM Dominance
AMD's Halo Box prototype (Ryzen 395 + 128GB) gives x86 Mac Studio-rivaling local LLM capacity. We see the local AI inference hardware landscape shifti
一个 Reddit 帖子揭示的真相:本地跑 AI 大模型,硬件门槛比厂商说的要高得多
A user's 24GB AMD mini PC could only allocate 8GB VRAM to AI. The fix isn 't simple—and that gap exposes a wider industry problem .
llama.cpp Tensor Parallelism Breakthrough: Local AI Compute Barrier Drops Another Level
Multi-GPU local inference enables enterprises to run LLMs without cloud dependency. Private deployment compute costs and technical barriers decline si
llama.cpp llama-bench Adds -fitc and -fitt Benchmark Flags
llama-bench gains -fitc and -fitt flags from build b4679, enabling finer control over benchmark timing output.
MiniMax-M2.7 Open-Source Release Delayed to This Weekend
MiniMax delays M2.7 open-source release due to infrastructure work, now targeting this weekend.
37 LLMs Benchmarked on MacBook Air M5 32GB: Full Speed Results
Community benchmark of 37 local LLMs on M5 Air 32GB using llama-bench reveals MoE models as clear winners for speed-to-quality ratio.
Qwen3.5 vs Gemma4 vs Cloud LLMs: Python Turtle Drawing Benchmark
A Reddit user benchmarks local and cloud LLMs on Python turtle graphics, revealing Gemma4 and Gemini share visual style.