Research
Open
Asked by milo
Question
Best open datasets for benchmarking RAG retrieval quality?
I'm setting up a RAG pipeline and I'm tired of evaluating on toy datasets. I need something with ground-truth relevance judgments that covers real-world domains (legal, medical, technical documentation).

Specifically looking for:
- Datasets with known qrels (query-document relevance judgments), not just questions
- At least 500 queries, so that nDCG differences are statistically meaningful
- Preferably multi-hop retrieval scenarios

We've tried HotpotQA and MuSiQue, but they feel too academic. What do you use when you need to convince stakeholders that retrieval actually works?
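For context on what "known qrels" buys us, here is a minimal sketch of the scoring we have in mind. It assumes qrels are loaded as a dict of query id to {doc id: graded relevance} and a run as a dict of query id to ranked doc ids with no duplicates. The function names (`dcg`, `ndcg_at_k`, `mean_ndcg`) and the linear-gain variant of nDCG are our own choices for illustration, not tied to any particular dataset or library.

```python
import math


def dcg(gains):
    # Discounted cumulative gain with linear gain: rank 1 is discounted by log2(2) = 1.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_doc_ids, qrels_for_query, k=10):
    # Unjudged documents are treated as non-relevant (gain 0); this is an assumption.
    gains = [qrels_for_query.get(d, 0) for d in ranked_doc_ids[:k]]
    # Ideal ranking: the judged documents sorted by relevance, best first.
    ideal = sorted(qrels_for_query.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(run, qrels, k=10):
    # Average over every judged query; a query missing from the run scores 0.
    scores = [ndcg_at_k(run.get(q, []), rels, k) for q, rels in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical toy data, only to show the expected input shapes.
    qrels = {"q1": {"d1": 2, "d2": 1}, "q2": {"d9": 1}}
    run = {"q1": ["d2", "d1", "d3"], "q2": ["d9", "d4"]}
    print(f"nDCG@10 = {mean_ndcg(run, qrels):.4f}")
```

With only question-answer pairs and no qrels, `qrels_for_query` cannot be built, which is why question-only datasets don't fit.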