princeton-nlp / SWE-agent
SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models
Website & Demo | Discord | Paper [coming April 10th]
SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.
On the full SWE-bench test set, SWE-agent resolves 12.29% of issues, achieving state-of-the-art performance.
We accomplish these results by designing simple LM-centric commands and feedback formats to make it easier for the LM to browse the repository, view, edit and execute code files. We call this an Agent-Computer Interface (ACI) and build the SWE-agent repository to make it easy to iterate on ACI design for repository-level coding agents.
Just as typical language models require good prompt engineering, good ACI design leads to much better results when using agents. As we show in our paper, a baseline agent without a well-tuned ACI does much worse than SWE-agent.
SWE-agent contains features that we discovered to be immensely helpful during the agent-computer interface design process. For example, instead of having the agent simply `cat` files, we give it a purpose-built file viewer. We found that this file viewer works best when displaying just 100 lines in each turn. The file editor that we built has commands for scrolling up and down and for performing a search within the file. Read our paper for more details.
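To make the file-viewer idea concrete, here is a minimal, illustrative sketch of a 100-line windowed viewer with scroll and search commands. It is not SWE-agent's actual implementation, and the class and method names below are hypothetical.

```python
# Illustrative sketch only: a windowed file viewer in the spirit of the ACI
# described above (100 lines per turn, scroll up/down, in-file search).
# FileViewer and its methods are hypothetical names, not SWE-agent's API.

WINDOW = 100  # lines shown to the agent per turn


class FileViewer:
    def __init__(self, path: str):
        with open(path) as f:
            self.lines = f.read().splitlines()
        self.start = 0  # index of the first visible line

    def render(self) -> str:
        end = min(self.start + WINDOW, len(self.lines))
        header = f"[File: {len(self.lines)} lines total, showing {self.start + 1}-{end}]"
        body = "\n".join(
            f"{i + 1}: {line}"
            for i, line in enumerate(self.lines[self.start:end], start=self.start)
        )
        return header + "\n" + body

    def scroll_down(self) -> str:
        self.start = min(self.start + WINDOW, max(len(self.lines) - WINDOW, 0))
        return self.render()

    def scroll_up(self) -> str:
        self.start = max(self.start - WINDOW, 0)
        return self.render()

    def search(self, term: str) -> str:
        hits = [i + 1 for i, line in enumerate(self.lines) if term in line]
        return f"Found {len(hits)} matching line(s) for {term!r}: {hits[:10]}"
```

Keeping the observation surface small and structured like this is the kind of ACI design choice the paper studies.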
@misc{yang2024sweagent,
    title={SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models},
    author={John Yang and Carlos E. Jimenez and Alexander Wettig and Shunyu Yao and Karthik Narasimhan and Ofir Press},
    year={2024},
}
Create the swe-agent environment with conda env create -f environment.yml and activate it with conda activate swe-agent. Then run ./setup.sh to create the swe-agent docker image. Finally, create a keys.cfg file at the root of this repository and fill in the following:
GITHUB_TOKEN: 'GitHub Token Here (required)'
OPENAI_API_KEY: 'OpenAI API Key Here if using OpenAI Model (optional)'
ANTHROPIC_API_KEY: 'Anthropic API Key Here if using Anthropic Model (optional)'
TOGETHER_API_KEY: 'Together API Key Here if using Together Model (optional)'
See the Anthropic, OpenAI, and GitHub documentation for tutorials on obtaining the corresponding API keys and tokens.
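Since keys.cfg uses plain key: value pairs, it parses as YAML. The snippet below is only a sketch for sanity-checking your keys before a run (it is not necessarily how SWE-agent itself loads the file):

```python
# Hedged example: sanity-check keys.cfg before running the agent.
# Assumes the key: value format shown above, which is valid YAML.
import yaml  # pip install pyyaml

with open("keys.cfg") as f:
    keys = yaml.safe_load(f) or {}

if not keys.get("GITHUB_TOKEN"):
    raise SystemExit("GITHUB_TOKEN is required in keys.cfg")

for name in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "TOGETHER_API_KEY"):
    print(f"{name}: {'set' if keys.get(name) else 'not set'}")
```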
There are two steps to the SWE-agent pipeline. First, SWE-agent takes an input GitHub issue and returns a pull request that attempts to fix it; we call that step inference. The second step (currently only available for issues in the SWE-bench benchmark) is to evaluate the pull request to verify that it has indeed fixed the issue.
NOTE: At this moment, there are known issues with a small number of repositories that don't install properly for arm64 / aarch64 architecture computers. We're working on a fix, but if you'd like to run and evaluate on the entirety of SWE-bench, the easiest way is by using an x86 machine.
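If you're not sure which architecture your machine reports, Python's standard library can tell you:

```python
# Print the machine architecture: arm64/aarch64 machines may hit the install
# issues mentioned above, while x86_64 should be fine.
import platform
print(platform.machine())
```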
Inference on any GitHub Issue: Using this script, you can run SWE-agent on any GitHub issue!
python run.py --model_name gpt4 \
--data_path https://github.com/pvlib/pvlib-python/issues/1603 \
--config_file config/default_from_url.yaml
Inference on SWE-bench: Run SWE-agent on SWE-bench Lite and generate patches.
python run.py --model_name gpt4 \
--per_instance_cost_limit 2.00 \
--config_file ./config/default.yaml
If you'd like to run on a single issue from SWE-bench, use the --instance_filter option as follows:
python run.py --model_name gpt4 \
--instance_filter marshmallow-code__marshmallow-1359
See the scripts/ folder for other useful scripts and details.
See the config/ folder for details about how you can define your own configuration!
See the sweagent/agent/ folder for details about the logic behind configuration-based workflows.
See the sweagent/environment/ folder for details about the SWEEnv environment (interface + implementation).
See the trajectories/ folder for details about the output of run.py.

The evaluation step is only available for issues from the SWE-bench set. To evaluate generated pull requests:
cd evaluation/
./run_eval.sh <predictions_path>
Replace <predictions_path> with the path to the model's predictions, which should be generated from the Inference step. The <predictions_path> argument should look like ../trajectories/<username>/<model>-<dataset>-<hyperparams>/all_preds.jsonl
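Before evaluating, it can help to confirm that all_preds.jsonl is non-empty and contains one JSON object per line. The field names below (instance_id, model_patch) follow the usual SWE-bench prediction format; double-check them against your file, since this is only a sketch:

```python
# Quick sanity check of a predictions file before evaluation.
# Field names are assumptions based on the typical SWE-bench format.
import json

path = "trajectories/<username>/<model>-<dataset>-<hyperparams>/all_preds.jsonl"  # adjust to your run

with open(path) as f:
    preds = [json.loads(line) for line in f if line.strip()]

print(f"{len(preds)} predictions loaded")
for p in preds[:3]:
    print(p.get("instance_id"), "- patch present:", bool(p.get("model_patch")))
```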
See the evaluation/ folder for details about how evaluation works.
Contact person: John Yang and Carlos E. Jimenez (Email: {jy1682, carlosej}@princeton.edu).
License: MIT. Check LICENSE.