eloialonso / iris
- Tuesday, September 6, 2022, 00:36:20
Transformers are Sample Efficient World Models
Vincent Micheli*, Eloi Alonso*, François Fleuret
* Denotes equal contribution
tl;dr
If you find this code or paper useful, please use the following reference:
@article{iris2022,
  title={Transformers are Sample Efficient World Models},
  author={Micheli, Vincent and Alonso, Eloi and Fleuret, François},
  journal={arXiv preprint arXiv:2209.00588},
  year={2022}
}
pip install -r requirements.txt
python src/main.py env.train.id=BreakoutNoFrameskip-v4 common.device=cuda:0 wandb.mode=online
By default, logs are synced to Weights & Biases; set wandb.mode=disabled to turn this off.
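The launch command passes its settings as dotted key=value arguments such as env.train.id=... and wandb.mode=.... As a rough illustration of how arguments of this shape can address nested configuration keys, here is a simplified, standard-library-only sketch. It is not the project's actual configuration loader: `apply_overrides` is a hypothetical helper, and values are kept as plain strings.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Set nested keys in `config` from 'a.b.c=value' strings (hypothetical helper)."""
    for item in overrides:
        # Split on the first '=' only, so values such as 'cuda:0' stay intact.
        dotted_key, sep, value = item.partition("=")
        if not sep or not dotted_key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{part!r} in {dotted_key!r} is not a section")
        node[leaf] = value
    return config


if __name__ == "__main__":
    # Assumed starting value, for illustration only.
    config = {"wandb": {"mode": "online"}}
    apply_overrides(
        config,
        [
            "env.train.id=BreakoutNoFrameskip-v4",
            "common.device=cuda:0",
            "wandb.mode=disabled",
        ],
    )
    print(config)
```

Each dotted prefix selects a section and the last component selects the key, which is why wandb.mode=disabled only touches the mode entry of the wandb section.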
Configuration files are located in config/; the main configuration file is config/trainer.yaml.
Each new run is located at outputs/YYYY-MM-DD/hh-mm-ss/. This folder is structured as:
outputs/YYYY-MM-DD/hh-mm-ss/
│
└─── checkpoints
│   │   last.pt
│   │   optimizer.pt
│   │   ...
│   │
│   └─── dataset
│       │   0.pt
│       │   1.pt
│       │   ...
│
└─── config
│   │   trainer.yaml
│
└─── media
│   │
│   └─── episodes
│   │   │   ...
│   │
│   └─── reconstructions
│       │   ...
│
└─── scripts
│   │   eval.py
│   │   play.sh
│   │   resume.sh
│   │   ...
│
└─── src
│   │   ...
│
└─── wandb
    │   ...
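Because run folders are named by date and time, the most recent one can be found by parsing those names. The sketch below is an illustration only, not part of the repository: `latest_run` and `missing_subdirs` are hypothetical helpers, and the list of expected sub-folders is simply copied from the layout above.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional

# Sub-folders copied from the layout above (some may be absent in a given run).
EXPECTED_SUBDIRS = ("checkpoints", "config", "media", "scripts", "src", "wandb")


def latest_run(outputs_root: Path) -> Optional[Path]:
    """Return the most recent outputs/YYYY-MM-DD/hh-mm-ss/ folder, or None."""
    if not outputs_root.is_dir():
        return None
    runs = []
    for day_dir in outputs_root.iterdir():
        if not day_dir.is_dir():
            continue
        for run_dir in day_dir.iterdir():
            if not run_dir.is_dir():
                continue
            try:
                stamp = datetime.strptime(
                    f"{day_dir.name} {run_dir.name}", "%Y-%m-%d %H-%M-%S"
                )
            except ValueError:
                continue  # Skip folders that do not follow the naming scheme.
            runs.append((stamp, run_dir))
    return max(runs, key=lambda item: item[0])[1] if runs else None


def missing_subdirs(run_dir: Path) -> List[str]:
    """List expected sub-folders that are not present in `run_dir`."""
    return [name for name in EXPECTED_SUBDIRS if not (run_dir / name).is_dir()]


if __name__ == "__main__":
    run = latest_run(Path("outputs"))
    if run is None:
        print("no run folder found")
    else:
        print(run, "missing:", missing_subdirs(run))
```

A missing sub-folder is only reported here, not treated as an error, since a run may not have produced every folder yet.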
- checkpoints: contains the last checkpoint of the model, its optimizer and the dataset.
- media:
  - episodes: contains train / test / imagination episodes for visualization purposes.
  - reconstructions: contains original frames alongside their reconstructions with the autoencoder.
- scripts: from the run folder, you can use the following three scripts.
  - eval.py: Launch python ./scripts/eval.py to evaluate the run.
  - resume.sh: Launch ./scripts/resume.sh to resume a training run that crashed.
  - play.sh: Tool to visualize some interesting aspects of the run.
    - Launch ./scripts/play.sh -a to watch the agent play live in the environment. The left panel displays the original environment, and the right panel shows what the agent actually sees through its discrete autoencoder.
    - Launch ./scripts/play.sh -w to unroll live trajectories with your keyboard inputs (i.e. to play in the world model). Note that for faster interaction, the memory of the Transformer is flushed every 20 frames.
    - Launch ./scripts/play.sh to visualize the episodes contained in media/episodes.

The folder results/data/ contains raw scores (for each game, and for each training run) for IRIS and the baselines.
Use the notebook results/results_iris.ipynb to reproduce the figures from the paper.
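The three ways of calling play.sh described earlier (no flag, -a, -w) can be wrapped in a small launcher if you prefer driving them from Python. This is a hedged sketch rather than anything shipped with the repository: `play_command`, `play` and the mode names "episodes", "agent" and "world_model" are hypothetical, and only the flags themselves come from the description of the script.

```python
import subprocess
from pathlib import Path
from typing import List

# Mode names are invented for this sketch; the flags are those of play.sh.
PLAY_FLAGS = {
    "episodes": [],          # replay the episodes stored in media/episodes
    "agent": ["-a"],         # watch the agent play live in the environment
    "world_model": ["-w"],   # play inside the world model with the keyboard
}


def play_command(mode: str = "episodes") -> List[str]:
    """Build the argument list for play.sh in the given mode."""
    if mode not in PLAY_FLAGS:
        raise ValueError(f"unknown mode {mode!r}, expected one of {sorted(PLAY_FLAGS)}")
    return ["./scripts/play.sh", *PLAY_FLAGS[mode]]


def play(run_dir: Path, mode: str = "episodes") -> int:
    """Run play.sh from inside the run folder and return its exit code."""
    return subprocess.run(play_command(mode), cwd=run_dir).returncode


if __name__ == "__main__":
    # Only print the commands here; call play(run_dir, mode) to actually launch one.
    for mode in PLAY_FLAGS:
        print(mode, "->", " ".join(play_command(mode)))
```

The working directory is set to the run folder because the scripts are meant to be launched from there.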