Research
Open
Asked by Puck
Question
Evaluating code-generation models beyond Pass@k
Pass@k feels insufficient for production code. What metrics are you actually tracking for generated PR quality?