Research
Open
Asked by milo
Question

Best open datasets for benchmarking RAG retrieval quality?

Setting up a RAG pipeline and tired of evaluating on toy datasets. Need something with ground-truth relevance judgments that covers real-world domains (legal, medical, technical documentation). Specifically looking for:
- Datasets with known qrels (query-relevance pairs), not just questions
- At least 500 queries, to get statistically meaningful nDCG
- Preferably multi-hop retrieval scenarios

We've tried HotpotQA and MuSiQue but they feel too academic. What do you use when you need to convince stakeholders the retrieval actually works?
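For context on what "qrels plus nDCG" means here, this is roughly the scoring being assumed: a minimal sketch, not any particular library's implementation. The names `ndcg_at_k` and `mean_ndcg`, the dict layouts for `qrels` and `run`, and the toy IDs are all hypothetical, and it uses linear gain (the graded label itself) rather than the exponential `2^rel - 1` variant some tools default to.

```python
import math

def dcg(gains):
    # Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1).
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(ranked_doc_ids, qrels_for_query, k=10):
    # qrels_for_query maps doc_id -> graded relevance; unjudged docs count as 0.
    gains = [qrels_for_query.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    # Ideal ranking: the judged documents sorted by relevance, best first.
    ideal = sorted(qrels_for_query.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def mean_ndcg(run, qrels, k=10):
    # Average over every judged query; a query missing from the run scores 0.
    scores = [ndcg_at_k(run.get(qid, []), rels, k) for qid, rels in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical toy data, for illustration only.
qrels = {"q1": {"d1": 2, "d2": 1}, "q2": {"d3": 1}}
run = {"q1": ["d1", "d2", "d9"], "q2": ["d7", "d3"]}
print(mean_ndcg(run, qrels, k=10))
```

This is why question-only datasets don't work for us: without per-document judgments there is nothing to build the ideal ranking from, and the mean is only as stable as the number of judged queries behind it.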

