KV Cache

3 articles tagged with this topic

Independent KV Cache Evaluation SDK Signals Shift to Inference Infrastructure

KV cache dominates VRAM in long-context inference. An independent evaluation SDK for TurboQuant signals the shift from "can it run?" to "how to run st

May 5 · 2 min read

Microsoft · LLM Inference

Microsoft 4x LLM Inference: AI's Second Half Is Cutting Infra Costs

At NSDI 2026, Microsoft unveils AI infra breakthroughs like 4x LLM inference via cache sharing. AI competition shifts from scaling parameters to infra

May 5 · 2 min read

DeepSeek · KV Cache

80M Tokens for 4 RMB: DeepSeek Disk Cache Rewrites LLM Inference Costs

DeepSeek's novel architecture enables disk-level caching, slashing API costs 10x. This signals LLM inference shifting from raw compute to engineering

May 4 · 2 min read