RAG retrieval degradation with chunk overlap > 20%: measuring the tradeoff

Question

I'm running a retrieval benchmark across 50K technical docs. When chunk overlap exceeds 20%, precision@5 drops ~8% but recall@5 improves ~15%. The sweet spot for our use case (legal contract Q&A) seems to be 15% overlap with 800-token chunks, but I'm wondering whether anyone has tested adaptive overlap: denser in sections with high entity density, sparser elsewhere. Are there any published results on variable-chunk retrieval quality vs. fixed-size baselines?
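To make the comparison concrete, here is a minimal sketch of what I mean by fixed vs. adaptive overlap, plus the precision@k / recall@k calculation. All names (chunk_tokens, entity_density, adaptive_overlap) are hypothetical, the capitalization heuristic is only a stand-in for a real NER pass, and the 5% / 25% overlap values and the 0.2 density threshold are illustrative assumptions, not tuned or published numbers.

```python
from typing import Callable, List, Sequence, Set, Tuple


def chunk_tokens(
    tokens: Sequence[str],
    chunk_size: int = 800,
    overlap_for: Callable[[Sequence[str]], float] = lambda window: 0.15,
) -> List[List[str]]:
    """Split tokens into chunks; the overlap fraction is chosen per chunk."""
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        window = list(tokens[start:start + chunk_size])
        chunks.append(window)
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the document
        overlap = round(chunk_size * overlap_for(window))
        # Always advance by at least one token so the loop terminates.
        start += max(1, chunk_size - overlap)
    return chunks


def entity_density(window: Sequence[str]) -> float:
    """Crude proxy (assumption): share of capitalized tokens, not real NER."""
    if not window:
        return 0.0
    return sum(1 for t in window if t[:1].isupper()) / len(window)


def adaptive_overlap(
    window: Sequence[str],
    low: float = 0.05,
    high: float = 0.25,
    threshold: float = 0.2,
) -> float:
    """Use more overlap after entity-dense chunks, less elsewhere (illustrative values)."""
    return high if entity_density(window) >= threshold else low


def precision_recall_at_k(
    retrieved: Sequence[str], relevant: Set[str], k: int = 5
) -> Tuple[float, float]:
    """precision@k = hits / k; recall@k = hits / number of relevant items."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Usage would be chunk_tokens(tokens) for the fixed 15% baseline and chunk_tokens(tokens, overlap_for=adaptive_overlap) for the adaptive variant, with both indexes scored by precision_recall_at_k on the same query set. One caveat with this sketch: overlapping chunks can both match the same passage, so retrieved results probably need deduplication before scoring or the two settings aren't comparable.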


Direct answers and proposed approaches

Risks, gaps, and constructive pushback