eloialonso / iris
- Tuesday, 6 September 2022, 00:36:20
Transformers are Sample Efficient World Models
Vincent Micheli*, Eloi Alonso*, François Fleuret
* Denotes equal contribution
tl;dr
If you find this code or paper useful, please use the following reference:
@article{iris2022,
title={Transformers are Sample Efficient World Models},
author={Micheli, Vincent and Alonso, Eloi and Fleuret, François},
journal={arXiv preprint arXiv:2209.00588},
year={2022}
}
Install the dependencies:
pip install -r requirements.txt
Launch a training run:
python src/main.py env.train.id=BreakoutNoFrameskip-v4 common.device=cuda:0 wandb.mode=online
By default, the logs are synced to Weights & Biases. Set wandb.mode=disabled to turn it off.
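The dotted key=value arguments in the launch command (env.train.id=..., common.device=..., wandb.mode=...) each target one entry of a nested configuration. The following is a minimal sketch of that mapping, assuming overrides are merged into a nested dictionary. The function apply_overrides and the defaults dictionary are hypothetical illustrations, not part of this repository, and the sketch does not reproduce the actual configuration loader.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply dotted key=value overrides to a nested dict (illustration only)."""
    for item in overrides:
        # Split on the first "=" only, so values such as "cuda:0" stay intact.
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate levels that the defaults do not define yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# Hypothetical defaults, NOT the real contents of config/trainer.yaml.
defaults = {"wandb": {"mode": "online"}, "common": {"device": "cpu"}}
config = apply_overrides(
    defaults,
    [
        "env.train.id=BreakoutNoFrameskip-v4",
        "common.device=cuda:0",
        "wandb.mode=disabled",
    ],
)
print(config["wandb"]["mode"])  # disabled
```

In this sketch every value stays a string. A real loader would typically also convert types and validate keys.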
Configuration files are located in config/; the main configuration file is config/trainer.yaml.
Each new run is located at outputs/YYYY-MM-DD/hh-mm-ss/. This folder is structured as:
outputs/YYYY-MM-DD/hh-mm-ss/
│
└─── checkpoints
│   │   last.pt
│   │   optimizer.pt
│   │   ...
│   │
│   └─── dataset
│       │   0.pt
│       │   1.pt
│       │   ...
│
└─── config
│   │   trainer.yaml
│
└─── media
│   │
│   └─── episodes
│   │   │   ...
│   │
│   └─── reconstructions
│       │   ...
│
└─── scripts
│   │   eval.py
│   │   play.sh
│   │   resume.sh
│   │   ...
│
└─── src
│   │   ...
│
└─── wandb
    │   ...
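Since every run lives under outputs/YYYY-MM-DD/hh-mm-ss/, the most recent run can be found by sorting folder names. The sketch below shows one way to do that and to check whether checkpoints/last.pt is present. The helpers latest_run and has_checkpoint are hypothetical and not part of this repository, and the sketch assumes the zero-padded naming shown above.

```python
from pathlib import Path
from typing import Optional


def latest_run(outputs_root: Path) -> Optional[Path]:
    """Return the newest <root>/YYYY-MM-DD/hh-mm-ss/ folder, or None if there is none."""
    if not outputs_root.is_dir():
        return None
    runs = [
        run
        for day in outputs_root.iterdir() if day.is_dir()
        for run in day.iterdir() if run.is_dir()
    ]
    # Zero-padded date and time names sort chronologically as plain strings.
    return max(runs, key=lambda p: (p.parent.name, p.name), default=None)


def has_checkpoint(run_dir: Path) -> bool:
    """Check whether checkpoints/last.pt exists inside the given run folder."""
    return (run_dir / "checkpoints" / "last.pt").is_file()


run = latest_run(Path("outputs"))
if run is None:
    print("no runs found")
else:
    print(run, "checkpoint present:", has_checkpoint(run))
```

Sorting by name rather than by modification time keeps the result stable when an older run folder is touched later, for example by an evaluation.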
checkpoints: contains the last checkpoint of the model, its optimizer and the dataset.
media:
  episodes: contains train / test / imagination episodes for visualization purposes.
  reconstructions: contains original frames alongside their reconstructions with the autoencoder.
scripts: from the run folder, you can use the following three scripts.
  eval.py: Launch python ./scripts/eval.py to evaluate the run.
  resume.sh: Launch ./scripts/resume.sh to resume a training that crashed.
  play.sh: Tool to visualize some interesting aspects of the run.
    ./scripts/play.sh -a to watch the agent play live in the environment. The left panel displays the original environment, and the right panel shows what the agent actually sees through its discrete autoencoder.
    ./scripts/play.sh -w to unroll live trajectories with your keyboard inputs (i.e. to play in the world model). Note that for faster interaction, the memory of the Transformer is flushed every 20 frames.
    ./scripts/play.sh to visualize the episodes contained in media/episodes.
The folder results/data/ contains raw scores (for each game, and for each training run) for IRIS and the baselines.
Use the notebook results/results_iris.ipynb to reproduce the figures from the paper.