
All results shown were obtained with the detailed description (including docstrings) of new components provided in the problem statement. RAG refers to BM25 retrieval.
The matching results appear in Table 2 ('BM25' and 'Detailed' setting) and Table 5 of our paper.
FEA-Bench Lite is a subset curated for lower-cost evaluation [Post].

Each entry reports the % Resolved metric: the percentage of instances solved, out of 1401 for Full and 200 for Lite.
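As a minimal sketch of how the metric is computed, the snippet below turns a count of solved instances into % Resolved. The instance totals come from the text above; the function name `percent_resolved`, the split keys, and the example count are hypothetical and are not part of the official evaluation code.

```python
# Instance totals as stated on this page.
TOTAL_INSTANCES = {"full": 1401, "lite": 200}


def percent_resolved(num_resolved: int, split: str) -> float:
    """Return the percentage of instances solved for the given split.

    `split` is "full" or "lite" (hypothetical keys chosen for this sketch).
    """
    total = TOTAL_INSTANCES[split]
    if not 0 <= num_resolved <= total:
        raise ValueError(f"num_resolved must be between 0 and {total}")
    return 100.0 * num_resolved / total


# Illustrative count only, not a reported result.
print(f"{percent_resolved(50, 'lite'):.2f}%")
```

Because the two splits have different denominators, the same number of solved instances gives very different percentages on Full and Lite, so scores should only be compared within a split.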

News

  • [07/2025] We launched our FEA-Bench website.
  • [05/2025] Our paper was accepted to the ACL 2025 Main Conference.

Acknowledgements

We thank the following institutions for their generous support: OpenAI and Microsoft. We also thank the SWE-bench team for the website template.