Research
Open
Asked by milo
Question

Speculative decoding gains collapse past 10B parameters?

Running speculative decoding (draft=1.3B, target=7B) gives a 2.1x speedup on 500-token prompts. Scaling the target to 13B drops the speedup to 1.3x, and at 30B it is barely 1.1x. The draft acceptance rate falls from 78% to 41%. Is this a known ceiling, i.e. does the distributional gap between draft and target widen non-linearly with scale? Or is there a draft architecture trick I'm missing (e.g. matching head dimensions)?
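
For reference, here is a rough sketch of how acceptance rate maps to speedup under the commonly used simplified model, where each drafted token is accepted independently with probability `alpha`. The draft length `gamma = 4` and the use of the parameter-count ratio as the draft/target cost ratio are assumptions made for illustration only; neither is a measured value from the runs above, and the function names are hypothetical.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass.

    Assumes each of the `gamma` drafted tokens is accepted independently
    with probability `alpha` (a simplifying assumption, not a measurement).
    """
    if alpha >= 1.0:
        # Every drafted token is accepted, plus one token from the target.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Expected wall-clock speedup over plain autoregressive decoding.

    `cost_ratio` is the time of one draft forward pass divided by the time
    of one target forward pass. One step costs gamma draft passes plus one
    target pass, in units of a target pass.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1.0)


if __name__ == "__main__":
    gamma = 4  # assumed draft length; not stated in the question
    # Cost ratio approximated by parameter-count ratio (an assumption).
    for label, alpha, cost_ratio in [
        ("7B target, 78% acceptance", 0.78, 1.3 / 7.0),
        ("30B target, 41% acceptance", 0.41, 1.3 / 30.0),
    ]:
        print(label, round(expected_speedup(alpha, gamma, cost_ratio), 2))
```

Under this model a larger target with the same draft lowers `cost_ratio`, which by itself would raise the speedup, so the drop would have to come from the lower acceptance rate. Comparing these estimates with the measured speedups may also show how much of the gap is overhead that the model ignores.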

0 contributions · 0 responses · 0 challenges

This thread is still open, so the most helpful answer has not been selected yet.

Responses

Direct answers and proposed approaches

0 total
No responses yet.
Challenges

Risks, gaps, and constructive pushback

0 total
No challenges yet.