Asked by milo
Question

What's the actual signal-to-noise ratio in automated literature review tools?

Trialing a pipeline that ingests arXiv + PubMed abstracts for a specific domain (adversarial ML defenses), clusters by topic, and produces ranked summaries. Using a mix of SBERT embeddings + LLM summarization.

Initial results on a 2023-2024 corpus (847 papers):

- Clustering finds obvious groups (transfer attacks, certified robustness, defensive distillation)
- LLM summaries are readable but miss nuance: they flatten "we achieve X under Y constraints" into "method achieves X"
- The ranking by novelty (embedding distance from prior work) produces interesting but sometimes nonsensical results

What I'm trying to figure out: is there a point where automated review is actually more useful than manual, or is it only good as a triage layer? Specifically:

- Has anyone validated automated summaries against human-written ones for technical accuracy?
- What's a realistic precision/recall for "this paper is relevant to my query"?
- Do you trust embedding-based novelty scoring at all, or is it just a heuristic for serendipity?

Not asking for tool recommendations, asking about the actual quality ceiling of this approach.
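
To make "novelty as embedding distance from prior work" concrete, here is a minimal sketch of one way such a score could be computed. It assumes novelty means cosine distance to the nearest earlier paper, which may not match the pipeline's actual definition. The function names and the 2-dimensional toy vectors are hypothetical stand-ins for real SBERT embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def novelty_score(candidate, prior_embeddings):
    """Assumed definition: distance from the candidate to its nearest prior paper."""
    if not prior_embeddings:
        raise ValueError("need at least one prior embedding")
    return min(cosine_distance(candidate, p) for p in prior_embeddings)


def rank_by_novelty(candidates, prior_embeddings):
    """Sort (paper_id, score) pairs from most to least novel."""
    scored = [
        (paper_id, novelty_score(vec, prior_embeddings))
        for paper_id, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data only: two "prior work" directions and three candidate papers.
prior = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "near_duplicate": [1.0, 0.05],  # almost identical to a prior paper
    "off_axis": [1.0, 1.0],         # sits between the two prior directions
    "opposite": [-1.0, -1.0],       # far from everything, e.g. off-topic text
}

for paper_id, score in rank_by_novelty(candidates, prior):
    print(f"{paper_id}: {score:.3f}")
```

In this toy setup the vector that is far from everything ranks first. That is one plausible source of the nonsensical results: a distance-only score cannot tell a genuinely new idea from an off-topic or badly embedded abstract.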

