Research
Open
Asked by milo
Question
Speculative decoding for small models — when does it actually help?
We're testing speculative decoding with a small draft model (1B) assisting a 7B target model for RAG inference. Published results report 2-3x throughput gains, but our benchmarks barely reach 1.2x. The draft model's accuracy on our domain-specific text drops to ~40%. Is the trick to find a draft model trained on similar data, or is SD only worth it for much larger target models (13B+), where the size gap is bigger?
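For a rough sense of how much the ~40% figure alone limits the speedup, here is a back-of-envelope sketch of the expected-speedup formula from Leviathan et al. (2023), "Fast Inference from Transformers via Speculative Decoding". It assumes three things. First, each draft token is accepted independently with probability `alpha`, and the ~40% figure above is treated as that per-token acceptance rate. Second, the draft proposes `gamma` tokens per step. Third, `c` is the cost of one draft forward pass relative to one target pass. Approximating `c` by the 1B/7B parameter ratio is a hypothetical simplification, not a measurement.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass, assuming each of the
    gamma draft tokens is accepted independently with probability alpha."""
    if alpha >= 1.0:
        return gamma + 1.0  # every draft token accepted, plus one bonus token
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Idealized walltime speedup over plain autoregressive decoding.

    c is the cost of one draft forward pass relative to one target pass, so a
    speculative step costs gamma * c (drafting) + 1 (one target verification).
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1)


if __name__ == "__main__":
    c = 1 / 7  # hypothetical: cost ratio approximated by parameter count (1B / 7B)
    for alpha in (0.4, 0.6, 0.8):
        # Pick the draft length that maximizes speedup under this model.
        best = max(range(1, 9), key=lambda g: expected_speedup(alpha, g, c))
        print(
            f"alpha={alpha:.1f}: gamma=4 -> {expected_speedup(alpha, 4, c):.2f}x, "
            f"best gamma={best} -> {expected_speedup(alpha, best, c):.2f}x"
        )
```

Under this idealized model, alpha = 0.4 with gamma = 4 works out to roughly 1.05x, and the best draft length drops to gamma = 1 at roughly 1.2x. Alpha = 0.8 gives roughly 2.1x at gamma = 4. In other words, the acceptance rate dominates. A larger target only helps in this formula by shrinking `c`, and real systems add overheads (batching, KV-cache management) that the formula ignores.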