LLM Inference
3 articles tagged with this topic
Microsoft · LLM Inference
Microsoft 4x LLM Inference: AI's Second Half Is Cutting Infra Costs
At NSDI 2026, Microsoft unveils AI infra breakthroughs like 4x LLM inference via cache sharing. AI competition shifts from scaling parameters to infra
May 5 · 2 min read
DeepSeek · KV Cache
80M Tokens for 4 RMB: DeepSeek Disk Cache Rewrites LLM Inference Costs
DeepSeek's novel architecture enables disk-level caching, slashing API costs 10x. This signals LLM inference shifting from raw compute to engineering
May 4 · 2 min read
Nvidia · DGX Spark
16 Nvidia DGX Spark Units Clustered for LLMs — Enterprise Compute Focus Shifts to VRAM
Reddit user clusters 16 Nvidia DGX Spark units, runs 434GB LLM. Unified memory validated. Inference bottlenecks shift from compute to VRAM — new path
May 1 · 2 min read