MTP

2 articles tagged with this topic

Consumer GPU Hits 100K Context: Local LLM Hardware Thresholds Drop Fast

An RTX 3090 runs a 27B model at 100K context and 50 tokens/s using quantization, MTP, and KV compression. Consumer inference now rivals last year's enterprise setup.

5d ago · 2 min read

llama.cpp · MTP

llama.cpp MTP Hits Beta: Local LLM Inference Speed Gap Narrowing

llama.cpp MTP beta supports Qwen3.5. With tensor parallelism maturing, the local-cloud inference speed gap is narrowing, making local LLM deployment m

May 4 · 2 min read