FoundationVision / VAR
- воскресенье, 7 апреля 2024 г. в 00:00:01
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction"
This is the official PyTorch implementation of Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction.
We provide a demo website for you to play with VAR models and generate images interactively. Enjoy the fun of visual autoregressive modeling!
We also provide demo_sample.ipynb for you to see more technical details about VAR.
Visual Autoregressive Modeling (VAR) redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".
We provide VAR models for you to play with, which are on or can be downloaded from the following links:
model | reso. | FID | rel. cost | #params | HF weights🤗 |
---|---|---|---|---|---|
VAR-d16 | 256 | 3.55 | 0.4 | 310M | var_d16.pth |
VAR-d20 | 256 | 2.95 | 0.5 | 600M | var_d20.pth |
VAR-d24 | 256 | 2.33 | 0.6 | 1.0B | var_d24.pth |
VAR-d30 | 256 | 1.97 | 1 | 2.0B | var_d30.pth |
VAR-d30-re | 256 | 1.80 | 1 | 2.0B | var_d30.pth |
You can load these models to generate images via the codes in demo_sample.ipynb. Note: you need to download vae_ch160v4096z32.pth first.
This project is licensed under the MIT License - see the LICENSE file for details.
If our work assists your research, feel free to give us a star ⭐ or cite us using:
@Article{VAR,
title={Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction},
author={Keyu Tian and Yi Jiang and Zehuan Yuan and Bingyue Peng and Liwei Wang},
year={2024},
eprint={2404.02905},
archivePrefix={arXiv},
primaryClass={cs.CV}
}