MTP
2 articles tagged with this topic
Qwen, RTX 3090
Consumer GPU Hits 100K Context: Local LLM Hardware Thresholds Drop Fast
An RTX 3090 runs a 27B model at 100K context and 50 tokens/s by combining quantization, MTP, and KV cache compression. Consumer inference now rivals last year's enterprise setups.
5d ago · 2 min read
llama.cpp, MTP
llama.cpp MTP Hits Beta: Local LLM Inference Speed Gap Narrowing
llama.cpp MTP beta supports Qwen3.5. With tensor parallelism maturing, the local-cloud inference speed gap is narrowing, making local LLM deployment m
May 4 · 2 min read