bin123apple / AutoCoder
- Sunday, June 2, 2024, 00:00:06
We introduce a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o (90.9% vs. 90.2%).
Additionally, compared with previous open-source models, AutoCoder offers a new feature: whenever the user wishes to execute the code, it can automatically install the required packages and attempt to run the code until it deems there are no issues.
Below are video demos comparing the code interpreters of GPT-4 Turbo and AutoCoder:
GPT-4o cannot access external libraries.
AutoCoder can automatically install the required packages. This feature expands the scope of the code interpreter's applications.
AutoCoder's code interpreter, like GPT-4 Turbo's, is invoked only when the user needs to verify the code, whereas OpenCodeInterpreter runs all generated Python code.
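The install-and-retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not AutoCoder's actual interpreter; the function name `run_with_auto_install` and the retry limit are hypothetical:

```python
import subprocess
import sys

def run_with_auto_install(code: str, max_retries: int = 3) -> None:
    """Execute generated code; if an import is missing, pip-install the
    package and retry. Hypothetical sketch, not AutoCoder's real loop."""
    for _ in range(max_retries):
        try:
            exec(code, {})  # run the generated snippet in a fresh namespace
            return
        except ModuleNotFoundError as e:
            # Install the missing package, then try the code again.
            subprocess.check_call([sys.executable, "-m", "pip", "install", e.name])
    raise RuntimeError("code still failing after installing missing packages")
```

In the real model, this loop would also feed runtime errors back to the model for repair until it deems there are no issues.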
The models are available on Hugging Face: AutoCoder (33B) and AutoCoder-S (6.7B).
The base model is deepseek-coder.
conda create -n AutoCoder python=3.11
conda activate AutoCoder
pip install -r requirements.txt
cd Evaluation
python test_humaneval.py
After this step you will receive a file named AutoCoder_HumanEval+.jsonl, which follows the EvalPlus format.
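A .jsonl file in this style holds one JSON object per line; a minimal sketch of loading such a file (the `task_id`/`solution` field names below are assumed from the EvalPlus sample format, not verified against this repo's output):

```python
import json

def load_evalplus_samples(path):
    """Read an EvalPlus-style .jsonl file: one JSON object per line,
    skipping any blank lines."""
    samples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                samples.append(json.loads(line))
    return samples
```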
Then follow the testing framework from the EvalPlus GitHub repository to see the results.
NOTE:
Use evalplus.sanitize to post-process the code. If you change the generation settings (e.g., do_sample=True), you will probably see different results.

python test_humaneval.py
Post-process to delete the natural language before testing:
python postprocess_mbpp.py
After this step you will get an AutoCoder_Mbpp+-sanitized.jsonl file, which contains all the extracted code blocks.
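The code-extraction step can be approximated with a regex over markdown fences. This is an illustrative sketch of the idea, not the repository's postprocess_mbpp.py, and it assumes responses wrap code in triple-backtick fences:

```python
import re

# Matches ```python ... ``` fences as well as bare ``` ... ``` fences.
FENCE = re.compile(r"```(?:python)?\n(.*?)```", re.DOTALL)

def extract_code_blocks(response: str) -> str:
    """Concatenate all fenced code blocks in a model response,
    dropping the surrounding natural-language text."""
    return "\n".join(m.strip() for m in FENCE.findall(response))
```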
Then test it directly with the EvalPlus GitHub framework (you don't need to use evalplus.sanitize to post-process the code this time).
python test_ds1000.py
After this step you will get a .jsonl file containing all the extracted code blocks. Then test it directly with the DS-1000 GitHub framework.
Install the Gradio-related packages:
cd /Web_demo
pip install -r requirements.txt
Run it:
python chatbot.py
NOTE:
Currently the model only starts the code interpreter if you ask it to verify its code. I am still fine-tuning it on an instruction dataset, which will give it the ability to launch the code interpreter whenever a user asks to run code. I will update the model when this is finished.
We suggest setting do_sample=True (the default setting here) while using the code interpreter.
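Greedy decoding (do_sample=False) always takes the highest-probability token, while do_sample=True draws from the distribution, which is why interpreter runs can differ between attempts. A toy illustration with a made-up next-token distribution (the tokens and probabilities are invented for the example):

```python
import random

# Hypothetical next-token distribution; values are probabilities.
probs = {"print": 0.6, "return": 0.3, "pass": 0.1}

def greedy(dist):
    """do_sample=False: always pick the most likely token."""
    return max(dist, key=dist.get)

def sample(dist, rng):
    """do_sample=True: draw a token proportionally to its probability."""
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

rng = random.Random(0)
print(greedy(probs))                             # always "print"
print({sample(probs, rng) for _ in range(50)})   # varies across draws
```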
If you have any inquiries, please feel free to raise an issue or reach out to leib2765@gmail.com.
@misc{lei2024autocoder,
title={AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}},
author={Bin Lei and Yuchen Li and Qiuwu Chen},
year={2024},
eprint={2405.14906},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
Thanks to Tianyu Zheng, the first author of OpenCodeInterpreter, for guidance on some technical details.