danijar / dreamerv3
Mastering Diverse Domains through World Models
A reimplementation of DreamerV3, a scalable and general reinforcement learning algorithm that masters a wide range of applications with fixed hyperparameters.
If you find this code useful, please reference it in your paper:
```
@article{hafner2023dreamerv3,
  title={Mastering Diverse Domains through World Models},
  author={Hafner, Danijar and Pasukonis, Jurgis and Ba, Jimmy and Lillicrap, Timothy},
  journal={arXiv preprint arXiv:2301.04104},
  year={2023}
}
```
To learn more:

- DreamerV3 learns a world model from experiences and uses it to train an actor-critic policy from imagined trajectories. The world model encodes sensory inputs into categorical representations and predicts future representations and rewards given actions.
- DreamerV3 masters a wide range of domains with a fixed set of hyperparameters, outperforming specialized methods. Removing the need for tuning reduces the amount of expert knowledge and computational resources needed to apply reinforcement learning.
- Due to its robustness, DreamerV3 shows favorable scaling properties. Notably, using larger models consistently increases not only its final performance but also its data efficiency. Increasing the number of gradient steps further increases data efficiency.
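To illustrate the categorical representations mentioned above, here is a minimal, hypothetical sketch in JAX of sampling a one-hot categorical latent with a straight-through gradient, the standard trick for backpropagating through discrete codes. This is not the repository's actual model code; the function name and shapes are assumptions for the example.

```python
# Hypothetical sketch (not the repo's model code): a categorical latent
# with a straight-through gradient, as used by DreamerV3-style world models.
import jax
import jax.numpy as jnp

def sample_categorical_st(key, logits):
    # Forward pass: sample a one-hot code from the categorical distribution.
    probs = jax.nn.softmax(logits, axis=-1)
    index = jax.random.categorical(key, logits, axis=-1)
    onehot = jax.nn.one_hot(index, logits.shape[-1])
    # Straight-through estimator: the sampled one-hot is used in the forward
    # pass, while gradients flow through the differentiable probabilities.
    return onehot + probs - jax.lax.stop_gradient(probs)

key = jax.random.PRNGKey(0)
logits = jnp.zeros((32, 32))  # e.g. 32 latent variables with 32 classes each
code = sample_categorical_st(key, logits)
```

Each row of `code` is numerically a one-hot vector, but its gradient with respect to `logits` is that of the softmax probabilities.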
If you just want to run DreamerV3 on a custom environment, you can `pip install dreamerv3` and copy `example.py` from this repository as a starting point.
If you want to make modifications to the code, you can either use the provided `Dockerfile`, which contains instructions, or follow the manual instructions below.
Install JAX and then the other dependencies:

```sh
pip install -r requirements.txt
```

Simple training script:

```sh
python example.py
```

Flexible training script:

```sh
python dreamerv3/train.py \
  --logdir ~/logdir/$(date "+%Y%m%d-%H%M%S") \
  --configs crafter --batch_size 16 --run.train_ratio 32
```
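The way named config blocks such as `crafter` and command-line flags partially override the defaults can be illustrated with a small, self-contained recursive dict merge. The block names come from the commands and options in this README, but the keys and values below are made-up placeholders, not the repository's real defaults or its actual config code.

```python
# Illustrative sketch only: how named config blocks could partially override
# defaults. All keys and values here are placeholders, not the real config.
defaults = {'batch_size': 16, 'run': {'train_ratio': 64}, 'units': 512}
blocks = {
    'crafter': {'run': {'train_ratio': 512}},  # placeholder override
    'large': {'units': 1024},                  # placeholder override
}

def merge(base, override):
    # Recursively merge `override` into a copy of `base`, so each block
    # only touches the keys it names and leaves the rest intact.
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

config = defaults
for name in ['crafter', 'large']:  # e.g. `--configs crafter large`
    config = merge(config, blocks[name])
```

After merging, `config` keeps the untouched defaults (`batch_size`) while each block has overridden only its own keys.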
- Configuration options are defined in `configs.yaml` and you can override them from the command line.
- The `debug` config block reduces the network size, batch size, duration between logs, and so on for fast debugging (but does not learn a good model).
- You can run on CPU via the `--jax.platform cpu` flag. Note that multi-GPU support is untested.
- You can pass multiple config blocks that partially override the defaults, for example `--configs crafter large`.
- If you get a `Too many leaves for PyTreeDef` error, it means you're reloading a checkpoint that is not compatible with the current config. This often happens when reusing an old logdir by accident.
- You can use the `small`, `medium`, and `large` config blocks to reduce memory requirements. The default is `xlarge`. See the scaling graph above to see how this affects performance.
- Some environments require installing additional dependencies; see the `scripts` directory and the `Dockerfile` for reference.
- When running on custom environments, specify which observation keys the model should use via `encoder.mlp_keys`, `encoder.cnn_keys`, `decoder.mlp_keys`, and `decoder.cnn_keys`.
- To log metrics without showing them to the agent or storing them in the replay buffer, return them from the environment with a `log_` prefix and enable logging via the `run.log_keys_...` options.
- To continue a stopped training run, run the same command again and make sure that `--logdir` points to the same directory.

This repository contains a reimplementation of DreamerV3 based on the open-source DreamerV2 code base. It is unrelated to Google or DeepMind. The implementation has been tested to reproduce the official results on a range of environments.