deepseek-ai / ESFT
- четверг, 30 января 2025 г. в 00:00:03
Expert Specialized Fine-Tuning
Official Repo for paper Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models by Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li and Y. Wu.
ESFT aims to efficiently customize Large Language Models (LLMs) with Mixture-of-Experts (MoE) architecture by adjusting only task-relevant parts, improving efficiency and performance while using fewer resources and storage.
📅 2024.9.20: Glad to announce that ESFT has been accepted to the EMNLP 2024 Main Conference!
📅 2024.8.11: We now release the ESFT training code! ✨ You can now try it with your own models and dataset!
git clone https://github.com/deepseek-ai/ESFT.git
cd esft
pip install transformers torch safetensors accelerate
bash scripts/download_adapters.sh
Usage:
python eval_multigpu.py \
--eval_dataset=translation \
--base_model_path=deepseek-ai/ESFT-vanilla-lite \
--adapter_dir=all_models/adapters/token/translation \
--output_path=results/completions/token/translation.jsonl \
--openai_api_key=YOUR_OPENAI_API_KEY
python scripts/expert/get_expert_scores.py \
--eval_dataset=translation \
--base_model_path=deepseek-ai/ESFT-vanilla-lite \
--output_dir=results/expert_scores/translation \
--n_sample_tokens=131072 \
--world_size=4 \
--gpus_per_rank=2
python scripts/expert/generate_expert_config.py \
--eval_datasets=intent,summary,law,translation \
--expert_scores_dir=results/expert_scores \
--output_dir=results/expert_configs \
--score_function=token \
--top_p=0.2 # the scoring function and top_p are hyperparameters
python train.py \
--base_model_path=deepseek-ai/ESFT-vanilla-lite \
--expert_config=results/expert_configs/intent.json \
--train_dataset=intent \
--train_config=configs/base.yaml \
--output_dir=results/checkpoints/intent
torchrun --nproc-per-node=8 train_ep.py \
--base_model_path=deepseek-ai/ESFT-vanilla-lite \
--expert_config=results/expert_configs/translation.json \
--train_dataset=translation \
--train_config=configs/base.yaml \
--output_dir=results/checkpoints/translation
For bug reports, feature requests, and general inquiries, please open an issue on our GitHub Issues page. Make sure to include as much detail as possible to help us address your issue quickly.
If you find our code or paper useful, please cite:
@article{wang2024letexpertsticklast,
title={Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models},
author={Zihan Wang and Deli Chen and Damai Dai and Runxin Xu and Zhuoshu Li and Y. Wu},
year={2024},
eprint={2407.01906},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.01906},
}