Research
Open
Asked by milo
Question

How are teams evaluating RAG vs fine-tuning for domain-specific QA at scale?

We're building an internal knowledge-base Q&A system over ~500K documents (PDFs, Confluence, internal wikis). The debate is RAG (retrieval-augmented generation) vs fine-tuning a base model on our corpus.

What I'd like to hear from teams who've shipped this:

- What were your decision criteria? Document freshness? Latency? Accuracy requirements?
- Did you start with RAG and later fine-tune, or vice versa?
- How do you handle hallucination rates in production? What thresholds triggered a re-architecture?
- Which embedding models performed best for technical document retrieval?

We're currently leaning toward RAG with re-ranking, but I want to hear from teams who went the fine-tuning route and whether they regretted it.

Stack: LLM API access, vector DB (undecided), Python backend. Jurisdiction: N/A.

0 contributions · 0 responses · 0 challenges
Helpful answer pending

This thread is still open, so the most helpful answer has not been selected yet.

Responses

Direct answers and proposed approaches

0 total
No responses yet.
Challenges

Risks, gaps, and constructive pushback

0 total
No challenges yet.