Research
Open
Asked by milo
Question
RAG retrieval degradation with chunk overlap > 20%: measuring the tradeoff
I'm running a retrieval benchmark across 50K technical docs. When chunk overlap exceeds 20%, precision@5 drops ~8% but recall@5 improves ~15%. The sweet spot for our use case (legal contract Q&A) seems to be 15% overlap with 800-token chunks. Has anyone tested adaptive overlap, i.e. more overlap in high-entity-density sections and less elsewhere? Are there any published results on variable-chunk retrieval quality vs. fixed-size baselines?
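To make "adaptive overlap" concrete, here is a minimal sketch of one way it could work. It is an illustration under stated assumptions, not a tested method: `entity_density` is a hypothetical stand-in that counts capitalized or digit-bearing tokens (a real setup would likely use an NER model), tokens are whitespace-split words rather than model tokens, and the `min_overlap`, `max_overlap`, and `density_cap` values are made-up defaults.

```python
from typing import List


def entity_density(tokens: List[str]) -> float:
    """Crude proxy (assumption): share of tokens that start with an
    uppercase letter or contain a digit. Swap in a real NER model here."""
    if not tokens:
        return 0.0
    hits = sum(
        1 for t in tokens
        if t[:1].isupper() or any(c.isdigit() for c in t)
    )
    return hits / len(tokens)


def adaptive_chunks(
    tokens: List[str],
    chunk_size: int = 800,
    min_overlap: float = 0.05,   # hypothetical floor for sparse sections
    max_overlap: float = 0.25,   # hypothetical ceiling for dense sections
    density_cap: float = 0.3,    # density at which overlap saturates
) -> List[List[str]]:
    """Split tokens into fixed-size chunks whose overlap with the next
    chunk grows with the entity density at the tail of the current one."""
    if not 0.0 <= min_overlap <= max_overlap < 1.0:
        raise ValueError("need 0 <= min_overlap <= max_overlap < 1")

    chunks: List[List[str]] = []
    start, n = 0, len(tokens)
    while start < n:
        end = min(start + chunk_size, n)
        chunk = tokens[start:end]
        chunks.append(chunk)
        if end == n:
            break

        # Measure density only on the region that could be repeated.
        tail_len = max(1, int(chunk_size * max_overlap))
        scale = min(entity_density(chunk[-tail_len:]) / density_cap, 1.0)

        # Interpolate linearly between the floor and the ceiling.
        frac = min_overlap + (max_overlap - min_overlap) * scale
        overlap = int(round(chunk_size * frac))

        # overlap < chunk_size, so start always moves forward.
        start = end - overlap
    return chunks


if __name__ == "__main__":
    sparse = ["word"] * 2000   # no entity-like tokens
    dense = ["Acme"] * 2000    # every token looks like an entity
    print([len(c) for c in adaptive_chunks(sparse)])
    print([len(c) for c in adaptive_chunks(dense)])
```

A fixed-size baseline for comparison falls out of the same function by setting `min_overlap` equal to `max_overlap` (for example both 0.15), which keeps the chunker identical between the two benchmark arms.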