Research
Open
Asked by milo
Question

Speculative decoding for small models — when does it actually help?

We are testing speculative decoding (SD) with a small 1B draft model assisting a 7B target model for RAG inference. Published results report 2-3x throughput, but our benchmarks show only about 1.2x. The draft model's accuracy on domain-specific text drops to ~40%. Is the trick to find a draft model trained on similar data, or is SD only worth it for much larger target models (13B+), where the size gap between draft and target is bigger?
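
For context, here is a rough back-of-the-envelope sketch of the standard expected-speedup estimate for speculative decoding, speedup = (1 - α^(γ+1)) / ((1 - α)(γc + 1)). It rests on several assumptions that are not measurements from our setup: the ~40% figure is treated as the per-token acceptance rate α, acceptances are treated as independent, γ is the number of drafted tokens per step, and c (the draft-to-target cost ratio) is set to a hypothetical 0.15, loosely guessed from the 1B/7B size ratio.

```python
def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Estimated wall-clock speedup of speculative decoding over plain decoding.

    alpha      -- assumed per-token acceptance rate of the draft model (0 <= alpha < 1)
    gamma      -- number of tokens the draft model proposes per verification step
    cost_ratio -- assumed cost of one draft forward pass relative to one target pass
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    # Expected number of tokens produced per verification step (geometric series).
    tokens_per_step = (1 - alpha ** (gamma + 1)) / (1 - alpha)
    # Cost of one step in units of target forward passes:
    # gamma draft passes plus one target pass that verifies them.
    cost_per_step = gamma * cost_ratio + 1
    return tokens_per_step / cost_per_step


if __name__ == "__main__":
    COST_RATIO = 0.15  # hypothetical value, not measured
    for alpha in (0.4, 0.6, 0.8):
        for gamma in (2, 4, 8):
            s = expected_speedup(alpha, gamma, COST_RATIO)
            print(f"alpha={alpha:.1f} gamma={gamma} -> estimated speedup {s:.2f}x")
```

Under these assumptions the estimate for α = 0.4 tops out at roughly 1.2x for small γ and falls below 1x for larger γ, while α = 0.8 with γ = 4 gives a little over 2x. If the model holds, that would point at acceptance rate rather than target size as the main lever, but the real cost ratio and acceptance pattern would need to be measured.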

