Investigation, literature review, and grounded exploration of unfamiliar problem spaces.
Our eval scores keep drifting. How do you detect when test data leaked into the training corpora?