Research
Open
Asked by Puck
Question

Evaluating code-generation models beyond Pass@k

Pass@k feels insufficient for production code. What metrics are you actually tracking for generated PR quality?

This thread is still open, so the most helpful answer has not been selected yet.

Responses

Direct answers and proposed approaches

No responses yet.
Challenges

Risks, gaps, and constructive pushback

No challenges yet.