THUDM / slime
slime is an LLM post-training framework for RL scaling, providing two core capabilities: high-performance training (connecting Megatron with SGLang) and flexible data generation.

slime is the RL framework behind GLM-4.7, GLM-4.6, and GLM-4.5. Apart from models from Z.ai, we also support the following models:
Module Descriptions:
For a comprehensive quick start guide covering environment setup, data preparation, training startup, and key code analysis, please refer to:
We also provide examples for some use cases not covered in the quick start guide; please check examples.
slime has powered several novel research projects and production systems. Here are some notable examples:
P1 is a family of open-source physics reasoning models trained entirely through reinforcement learning. P1 leverages slime as the RL post-training framework and introduces a multi-stage RL training algorithm that progressively enhances reasoning ability through adaptive learnability adjustment and stabilization mechanisms. Empowered by this training paradigm, P1 delivers breakthrough performance in open-source physics reasoning.
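P1's adaptive learnability adjustment is only described at a high level here. A common way to realize such a mechanism is to train on problems the policy solves sometimes but not always; the sketch below follows that idea, and the pass-rate band and function names are illustrative assumptions rather than P1's actual code.

```python
from typing import Callable, Sequence

def select_learnable(
    problems: Sequence[str],
    pass_rate: Callable[[str], float],
    low: float = 0.2,
    high: float = 0.8,
) -> list[str]:
    """Keep problems whose recent pass rate falls in an intermediate band.

    pass_rate(problem) is assumed to return the fraction of recent rollouts
    that were verified correct; the [low, high] band is an illustrative
    choice, not P1's published setting.
    """
    return [p for p in problems if low <= pass_rate(p) <= high]
```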
RLVE introduces an approach using verifiable environments that procedurally generate problems and provide algorithmically verifiable rewards, to scale up RL for language models (LMs). With joint training across 400 verifiable environments, RLVE enables each environment to dynamically adapt its problem difficulty distribution to the policy model's capabilities as training progresses.
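As a concrete picture of a verifiable environment, the toy sketch below procedurally generates arithmetic problems, scores answers with an exact check, and shifts its difficulty toward the policy's current success rate. The class and method names are illustrative assumptions, not RLVE's API.

```python
import random
from dataclasses import dataclass

@dataclass
class Problem:
    prompt: str
    answer: int

class AdditionEnv:
    """Toy verifiable environment: addition problems whose operand size
    grows or shrinks with the policy's recent success rate."""

    def __init__(self, difficulty: int = 1):
        self.difficulty = difficulty
        self.recent: list[bool] = []

    def generate(self) -> Problem:
        hi = 10 ** self.difficulty
        a, b = random.randrange(hi), random.randrange(hi)
        return Problem(prompt=f"What is {a} + {b}?", answer=a + b)

    def reward(self, problem: Problem, completion: str) -> float:
        # Algorithmically verifiable: parse the model output and compare.
        try:
            correct = int(completion.strip()) == problem.answer
        except ValueError:
            correct = False
        self.recent = (self.recent + [correct])[-100:]
        return 1.0 if correct else 0.0

    def adapt(self) -> None:
        # Move the problem difficulty toward the policy's current frontier.
        rate = sum(self.recent) / max(len(self.recent), 1)
        if rate > 0.8:
            self.difficulty += 1
        elif rate < 0.2 and self.difficulty > 1:
            self.difficulty -= 1
```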
TritonForge leverages slime's SFT & RL capabilities to train LLMs that automatically generate optimized GPU kernels. By using a two-stage training approach—supervised fine-tuning followed by reinforcement learning with multi-turn compilation feedback—TritonForge achieves remarkable results in converting PyTorch operations into high-performance Triton kernels.
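One way to picture the multi-turn compilation feedback is a reward function that compiles the generated kernel, hands compiler errors back as the next-turn prompt, and only rewards kernels that compile and match the PyTorch reference. The helpers compile_kernel and run_and_compare below are hypothetical placeholders, not TritonForge's code.

```python
def kernel_reward(kernel_src, reference, compile_kernel, run_and_compare,
                  turns_left):
    """Return (reward, feedback_for_next_turn).

    compile_kernel(src) -> (ok, error_log) and
    run_and_compare(src, reference) -> (correct, speedup)
    are assumed helpers standing in for the real build/benchmark harness.
    """
    ok, error_log = compile_kernel(kernel_src)
    if not ok:
        # No reward for a failed build; if turns remain, the error log
        # becomes feedback for the model's next attempt.
        return 0.0, (error_log if turns_left > 0 else None)
    correct, speedup = run_and_compare(kernel_src, reference)
    return (speedup if correct else 0.0), None
```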
APRIL introduces a system-level optimization that seamlessly integrates with slime to accelerate the rollout generation phase in RL training. By intelligently over-provisioning requests and actively managing partial completions, APRIL addresses the long-tail generation bottleneck that typically consumes over 90% of RL training time.
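In rough terms, over-provisioning means launching more rollout requests than the batch needs, keeping the first completions that finish, and aborting (or, in APRIL's case, actively managing) the long-tail stragglers. The asyncio sketch below assumes a generate(prompt) coroutine standing in for the inference-engine call; it is a simplification, not APRIL's implementation.

```python
import asyncio

async def overprovisioned_rollout(prompts, generate, batch_size, over_factor=1.5):
    """Launch over_factor * batch_size requests, keep the first batch_size
    completions, and cancel the rest instead of waiting for the long tail."""
    n_launch = min(len(prompts), int(batch_size * over_factor))
    tasks = [asyncio.create_task(generate(p)) for p in prompts[:n_launch]]
    done = []
    for fut in asyncio.as_completed(tasks):
        done.append(await fut)
        if len(done) >= batch_size:
            break
    for t in tasks:
        t.cancel()  # stragglers; APRIL would persist partial completions instead
    await asyncio.gather(*tasks, return_exceptions=True)
    return done
```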
qqr (a.k.a. hilichurl) is a lightweight extension for slime designed to evolve open-ended agents. It implements the ArenaRL algorithm to tackle discriminative collapse through tournament-based relative ranking (e.g., Seeded Single-Elimination, Round-Robin) and seamlessly integrates the Model Context Protocol (MCP). qqr leverages slime's high-throughput training capabilities to enable scalable, distributed evolution of agents in standardized, decoupled tool environments.
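Tournament-based relative ranking can be sketched as a seeded single-elimination bracket: agents meet pairwise, a judge picks the winner, and the round in which an agent is eliminated determines its rank. The play_match judge below is an assumed callback, and the whole function is an illustration rather than qqr's ArenaRL implementation.

```python
import random
from typing import Callable, Sequence

def single_elimination(agents: Sequence[str],
                       play_match: Callable[[str, str], str],
                       seed: int = 0) -> list[str]:
    """Return agents ordered best-first: the champion, then agents ranked
    by how late in the bracket they were eliminated.

    play_match(a, b) is an assumed judge call returning the winner's name."""
    rng = random.Random(seed)
    bracket = list(agents)
    rng.shuffle(bracket)                    # seeding
    ranking: list[str] = []
    while len(bracket) > 1:
        winners, losers = [], []
        for i in range(0, len(bracket) - 1, 2):
            a, b = bracket[i], bracket[i + 1]
            w = play_match(a, b)
            winners.append(w)
            losers.append(a if w == b else b)
        if len(bracket) % 2 == 1:           # odd agent out gets a bye
            winners.append(bracket[-1])
        ranking = losers + ranking          # later eliminations rank higher
        bracket = winners
    return bracket + ranking
```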
These projects showcase slime's versatility—from training code-generation models to optimizing RL training systems—making it a powerful foundation for both research and production deployments.
Arguments in slime are divided into three categories:
- Megatron arguments: passed through directly, e.g. --tensor-model-parallel-size 2.
- SGLang arguments: prefixed with --sglang-. For example, --mem-fraction-static should be passed as --sglang-mem-fraction-static.
- slime's own arguments for the RL training loop.

For complete usage instructions, please refer to the Usage Documentation.
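To make the split concrete, a single launch invocation might mix the three categories as in the argv sketch below; the entry point and the last flag are hypothetical, and only the two flags quoted above come from this document.

```python
# Hypothetical argv mixing the three argument categories.
argv = [
    "python", "train.py",                    # entry point: illustrative only
    "--tensor-model-parallel-size", "2",     # Megatron argument, passed as-is
    "--sglang-mem-fraction-static", "0.8",   # SGLang argument, with the --sglang- prefix
    "--rollout-batch-size", "32",            # a slime-specific argument (name illustrative)
]
```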
Contributions are welcome! If you have suggestions for new features, performance tuning, or feedback on user experience, feel free to submit an Issue or PR 😊
Use pre-commit to ensure code style consistency for your commits:
apt install pre-commit -y
pre-commit install
# run pre-commit to ensure code style consistency
pre-commit run --all-files --show-diff-on-failure --color=always

If you find slime useful, please cite:

@misc{slime_github,
author = {Zilin Zhu and Chengxing Xie and Xin Lv and slime Contributors},
title = {slime: An LLM post-training framework for RL Scaling},
year = {2025},
howpublished = {\url{https://github.com/THUDM/slime}},
note = {GitHub repository. Corresponding author: Xin Lv},
urldate = {2025-06-19}
}