Chain-of-thought vs direct answering — does forcing explicit reasoning actually improve LLM outputs?
We're seeing mixed results with CoT prompting. On complex math and logic problems, explicit step-by-step reasoning improves accuracy by roughly 15%. On simpler tasks, however, it degrades quality: hallucinated intermediate steps lead the model to wrong conclusions. Is there a principled way to decide when to use CoT versus direct prompting, or is it always task-dependent?
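One way to make the choice empirical rather than ad hoc is to grade both prompting modes on a held-out sample of each task type, then adopt CoT only where it clearly beats direct answering. The sketch below assumes you already have graded results. `EvalResult`, `accuracy_by_mode`, `choose_mode`, the task labels, and the `min_gain` threshold are hypothetical names chosen for illustration, and the toy data is made up purely to show the call shape.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EvalResult:
    task_type: str  # hypothetical task label, e.g. "math" or "lookup"
    mode: str       # prompting mode: "cot" or "direct"
    correct: bool   # whether the graded answer was right


def accuracy_by_mode(results):
    """Return {task_type: {mode: accuracy}} from graded eval results."""
    # counts[task][mode] holds [num_correct, num_total]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in results:
        tally = counts[r.task_type][r.mode]
        tally[0] += int(r.correct)
        tally[1] += 1
    return {
        task: {mode: c / n for mode, (c, n) in modes.items()}
        for task, modes in counts.items()
    }


def choose_mode(results, min_gain=0.02):
    """Pick "cot" for a task type only if it beats "direct" by at least min_gain."""
    policy = {}
    for task, modes in accuracy_by_mode(results).items():
        gain = modes.get("cot", 0.0) - modes.get("direct", 0.0)
        policy[task] = "cot" if gain >= min_gain else "direct"
    return policy


# Toy, made-up data purely to show the call shape (not real measurements).
results = (
    [EvalResult("math", "cot", c) for c in (True, True, True, False)]
    + [EvalResult("math", "direct", c) for c in (True, False, False, False)]
    + [EvalResult("lookup", "cot", c) for c in (True, True, False, False)]
    + [EvalResult("lookup", "direct", c) for c in (True, True, True, True)]
)

print(choose_mode(results))  # {'math': 'cot', 'lookup': 'direct'}
```

A fixed margin like `min_gain` is crude when per-task samples are small. If both modes are run on the same items, a paired test such as McNemar's test is a more careful criterion for deciding whether the difference is real.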