microsoft / PromptWizard
Task-Aware Agent-driven Prompt Optimization Framework
PromptWizard: Task-Aware Prompt Optimization Framework
Eshaan Agarwal, Joykirat Singh, Vivek Dani, Raghav Magazine, Tanuja Ganu, Akshay Nambi
Overview of the PromptWizard framework
PromptWizard is a discrete prompt optimization framework that employs a self-evolving mechanism where the LLM generates, critiques, and refines its own prompts and examples, continuously improving through iterative feedback and synthesis. This self-adaptive approach ensures holistic optimization by evolving both the instructions and in-context learning examples for better task performance.
The key stages of PromptWizard are the following:
Stage 1: Iterative optimization of instructions
Stage 2: Sequential optimization of instruction and examples
Follow these steps to set up the development environment and install the package:
Clone the repository:
git clone https://github.com/microsoft/PromptWizard
cd PromptWizard
Create and activate a virtual environment:
On Windows:
python -m venv venv
venv\Scripts\activate
On macOS/Linux:
python -m venv venv
source venv/bin/activate
Install the package in development mode:
pip install -e .
There are three main ways (scenarios) to use PromptWizard, corresponding to the global hyperparameters run_without_train_examples, generate_synthetic_examples and use_examples described further below.
NOTE: Refer to this notebook for a detailed understanding of the usage for each of these scenarios. It serves as a starting point for understanding how to use PromptWizard.
- Use promptopt_config.yaml to set configurations. For example, for GSM8k this file can be used
- Use .env to set environment variables. For GSM8k this file can be used:

AZURE_OPENAI_ENDPOINT="XXXXX"               # Replace with your Azure OpenAI endpoint
OPENAI_API_VERSION="XXXX"                   # Replace with the version of your API
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME="XXXXX"   # Create a deployment for the model and place the deployment name here
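For reference, below is a minimal sketch of reading these variables from the .env file in Python; it assumes the python-dotenv package and is not how PromptWizard itself constructs its Azure OpenAI client.

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv (assumed helper, not required by PromptWizard itself)

# Load variables from a .env file in the current working directory into the environment.
load_dotenv()

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_version = os.environ["OPENAI_API_VERSION"]
deployment = os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"]
print(f"Will call deployment {deployment!r} at {endpoint} (API version {api_version})")
```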
- task_description, base_instruction and answer_format need to be changed for the different datasets in BBII; the rest of the configs remain the same
- The train and test data should be in .jsonl file format
- Each line of the .jsonl should have 2 fields (see the sketch after this list):
  - question: It should contain the complete question that is to be asked to the LLM
  - answer: It should contain the ground truth answer, which can be verbose or concise
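As a quick illustration of this format, the sketch below writes a tiny train.jsonl with the two required fields; the file path and the sample question/answer pairs are made up.

```python
import json

# Each line of the .jsonl is one JSON object with exactly the two required fields.
samples = [
    {"question": "A pen costs 3 dollars. How much do 4 pens cost?", "answer": "12"},
    {"question": "What is 15 divided by 3?", "answer": "5"},
]

with open("data/train.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```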
NOTE: Refer to the demos folder for examples of folders for four datasets. The .ipynb in each of these folders shows how to run PromptWizard on that particular dataset, and a similar procedure can be followed for a new dataset. Below is a detailed explanation of each of the components of the .ipynb and of the dataset-specific folder structure.
Every new dataset needs to have the following (a small sketch of this layout follows the list):
- a configs folder to store files for defining optimization hyperparameters and setup configs
- a data folder to store train.jsonl and test.jsonl as curated here (this is done in the notebooks)
- a .env file for environment variables to be used for API calling
- a .py/.ipynb script to run the code
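The layout can be created however you like; the sketch below is just one hypothetical way to set it up (all names are illustrative).

```python
from pathlib import Path

# Hypothetical skeleton for a new dataset folder.
root = Path("demos/my_dataset")
(root / "configs").mkdir(parents=True, exist_ok=True)  # promptopt_config.yaml and setup configs live here
(root / "data").mkdir(exist_ok=True)                   # train.jsonl and test.jsonl live here
(root / ".env").touch()                                # environment variables used for API calling
# plus a .py or .ipynb script (e.g. demo.ipynb) at the top level to run the code
```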
Set the hyperparameters like number of mutations, refine steps, in-context examples, etc.:
- task_description: Description of the task at hand, which will be fed into the prompt. For example:
  "You are a mathematics expert. You will be given a mathematics problem which you need to solve"
- base_instruction: Base instruction in line with the dataset. For example:
  "Lets think step by step."
- answer_format: Instruction for specifying the answer format. answer_format needs to be defined properly to ensure correct extraction by def extract_final_answer(). For example:
  "At the end, wrap only your final option between <ANS_START> and <ANS_END> tags"
  In def extract_final_answer() we can then simply write code to extract the string between the tags (a sketch of this extraction follows this list)
- seen_set_size: The number of train samples to be used for prompt optimization
- few_shot_count: The number of in-context examples needed in the prompt
- generate_reasoning: Whether or not to generate reasoning for the in-context examples
- generate_expert_identity and generate_intent_keywords: Enabling these helped improve the prompt, as they help make the prompt more relevant to the task
- Refer to the promptopt_config.yaml files in the folders present here for the descriptions used for AQUARAT, SVAMP and GSM8k. For BBII, refer to description.py, which has the meta instructions for each of the datasets
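As a concrete illustration of the extraction mentioned under answer_format, the snippet below pulls the text between the <ANS_START> and <ANS_END> tags with a regular expression; the function name mirrors the one described above, but the signature actually used inside PromptWizard may differ.

```python
import re

def extract_final_answer(llm_output: str) -> str:
    """Return the text wrapped in <ANS_START>...<ANS_END>, or the raw output if the tags are missing."""
    match = re.search(r"<ANS_START>(.*?)<ANS_END>", llm_output, re.DOTALL)
    return match.group(1).strip() if match else llm_output.strip()

print(extract_final_answer("The answer is 12. <ANS_START>12<ANS_END>"))  # -> 12
```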
- run_without_train_examples is a global hyperparameter which can be used when there are no training samples and in-context examples are not required in the final prompt
- generate_synthetic_examples is a global hyperparameter which can be used when there are no training samples and we want to generate synthetic data for training
- use_examples is a global hyperparameter which can be used to optimize prompts using training data

Create a dataset-specific class which inherits class DatasetSpecificProcessing, similar to GSM8k(DatasetSpecificProcessing) in demo.ipynb, and define the following functions in it (a sketch of such a class follows the list below):
- def extract_answer_from_output(): This is a dataset-specific function; given the answer from the dataset, it should extract and return a concise form of the answer. Note that, depending on the dataset, it can also simply return the answer as-is, as in the case of the SVAMP and AQUARAT datasets
- def extract_final_answer(): This is an LLM-output-specific function; given the verbose answer from the LLM, it should extract and return the concise final answer
- def access_answer(): This function takes the LLM output as input and extracts the concise final answer from it using def extract_final_answer(), as defined above. Refer to the example def access_answer() in this notebook
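A minimal sketch of such a class is shown below, modeled loosely on the GSM8k demo. The base class name and the three method names come from the description above; the import path, the method signatures, and the return values of access_answer() are assumptions, so check the demo notebooks for the actual interface.

```python
import re

# Assumed import path; verify against the demo notebooks for your version of PromptWizard.
from promptwizard.glue.promptopt.techniques.common_logic import DatasetSpecificProcessing


class MyMathDataset(DatasetSpecificProcessing):
    """Illustrative dataset-specific processing for a math-style dataset."""

    def extract_answer_from_output(self, answer: str) -> str:
        # Dataset-specific: reduce the ground-truth answer to a concise form.
        # For datasets like SVAMP or AQUARAT this could simply be `return answer`.
        return answer.strip().split()[-1]

    def extract_final_answer(self, llm_output: str) -> str:
        # LLM-output-specific: same tag-based extraction as in the earlier sketch.
        match = re.search(r"<ANS_START>(.*?)<ANS_END>", llm_output, re.DOTALL)
        return match.group(1).strip() if match else llm_output.strip()

    def access_answer(self, llm_output: str, gt_answer: str):
        # Extract the concise answer and compare it against the ground truth.
        # (Assumed signature and return values.)
        predicted = self.extract_final_answer(llm_output)
        is_correct = predicted == self.extract_answer_from_output(gt_answer)
        return is_correct, predicted
```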
Here we define the various hyperparameters used in the prompt optimization process, found in promptopt_config.yaml (an illustrative sketch of such a config follows the list below):
- mutate_refine_iterations: Number of iterations of mutating the task description followed by refinement of instructions
- mutation_rounds: Number of rounds of mutation to be performed when generating different styles
- refine_task_eg_iterations: Number of iterations for refining the task description and in-context examples
- style_variation: Number of thinking-style variations to be used in prompt mutation
- questions_batch_size: Number of questions to be asked to the LLM in a single batch during the training step
- min_correct_count: Minimum number of question batches that must be answered correctly for a prompt to be considered as performing well
- max_eval_batches: Maximum number of mini-batches on which we should evaluate the prompt
- top_n: Number of top-performing prompts to be carried from the scoring stage to the next stage
- seen_set_size: Number of samples from the train set to be used for training
- few_shot_count: Number of in-context examples required in the final prompt
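For illustration, the snippet below writes a promptopt_config.yaml-style file containing the hyperparameters listed above; all of the values are placeholders chosen to match the best practices that follow, and the real config files in the demos may contain additional fields.

```python
import yaml  # pip install pyyaml

# Placeholder values only; tune them per the best practices below.
config = {
    "task_description": "You are a mathematics expert. You will be given a mathematics problem which you need to solve",
    "base_instruction": "Lets think step by step.",
    "answer_format": "At the end, wrap only your final option between <ANS_START> and <ANS_END> tags",
    "mutate_refine_iterations": 3,
    "mutation_rounds": 3,
    "refine_task_eg_iterations": 3,
    "style_variation": 5,
    "questions_batch_size": 1,
    "min_correct_count": 3,
    "max_eval_batches": 6,
    "top_n": 1,
    "seen_set_size": 25,
    "few_shot_count": 5,
    "generate_reasoning": True,
    "generate_expert_identity": True,
    "generate_intent_keywords": False,
}

with open("configs/promptopt_config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```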
Following are some of the best practices we followed during our experiments:
- Set mutate_refine_iterations, mutation_rounds and refine_task_eg_iterations to 3 or 5
- seen_set_size can be increased to 50, and few_shot_count can be set based on the use case

PromptWizard consistently outperforms other methods across various thresholds, maintaining the highest p(τ) values, which indicates that it consistently performs near the best possible accuracy across all tasks.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com. When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA. This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
If you make use of our work, please cite our paper:
@misc{agarwal2024promptwizardtaskawarepromptoptimization,
title={PromptWizard: Task-Aware Prompt Optimization Framework},
author={Eshaan Agarwal and Joykirat Singh and Vivek Dani and Raghav Magazine and Tanuja Ganu and Akshay Nambi},
year={2024},
eprint={2405.18369},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2405.18369},
}
For guidelines and best practices related to Responsible AI, please refer to our Responsible AI Guidelines.