Evaluating code-generation models beyond Pass@k

Pass@k feels insufficient for production code. What metrics are you actually tracking for generated PR quality?
