ashawkey / stable-dreamfusion
A pytorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model.
The original paper's project page: DreamFusion: Text-to-3D using 2D Diffusion.
Examples generated from the text prompt "a high quality photo of a pineapple", viewed with the GUI in real time:
This project is a work-in-progress and contains many differences from the paper. Many features are not implemented yet. The current generation quality cannot match the results from the original paper, and many prompts still fail badly!
git clone https://github.com/ashawkey/stable-dreamfusion.git
cd stable-dreamfusion
Important: To download the Stable Diffusion model checkpoint, you need to provide your Hugging Face access token. You can do this in either of the following ways:
- Run huggingface-cli login and enter your token.
- Create a file named TOKEN under this directory (i.e., stable-dreamfusion/TOKEN) and copy your token into it.
pip install -r requirements.txt
# (optional) install nvdiffrast for exporting textured mesh (--save_mesh)
pip install git+https://github.com/NVlabs/nvdiffrast/
# (optional) install the tcnn backbone if using --tcnn
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
# (optional) install CLIP guidance for the dreamfield setting
pip install git+https://github.com/openai/CLIP.git
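For reference, a minimal sketch of how the access token set up above is typically consumed when diffusers downloads the Stable Diffusion checkpoint; the model id and arguments here are assumptions for illustration, not necessarily what this repo uses:
# read the token from ./TOKEN if present, otherwise fall back to the cached huggingface-cli login
from pathlib import Path
from diffusers import StableDiffusionPipeline

token_file = Path('TOKEN')
token = token_file.read_text().strip() if token_file.exists() else True  # True = use the cached login

pipe = StableDiffusionPipeline.from_pretrained(
    'CompVis/stable-diffusion-v1-4',  # illustrative model id
    use_auth_token=token,
)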
By default, we use load to build the extension at runtime. We also provide setup.py to build each extension:
# install all extension modules
bash scripts/install_ext.sh
# if you want to install manually, here is an example:
pip install ./raymarching # install to python path (you still need the raymarching/ folder, since this only installs the built extension.)
The first run will take some time to compile the CUDA extensions.
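For context, this is the JIT pattern that torch.utils.cpp_extension.load provides; the module name and source paths below are hypothetical placeholders, not the repo's exact file list:
# JIT-compile a CUDA extension on first use; the result is cached, so only the first run is slow
from torch.utils.cpp_extension import load

_backend = load(
    name='_raymarching_example',  # hypothetical module name
    sources=['raymarching/src/raymarching.cu', 'raymarching/src/bindings.cpp'],  # hypothetical sources
    verbose=True,
)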
### stable-dreamfusion setting
## train with text prompt
# `-O` equals `--cuda_ray --fp16 --dir_text`
python main.py --text "a hamburger" --workspace trial -O
## after the training is finished:
# test (exporting 360 video, and an obj mesh with png texture)
python main.py --workspace trial -O --test
# test with a GUI (free view control!)
python main.py --workspace trial -O --test --gui
### dreamfields (CLIP) setting
python main.py --text "a hamburger" --workspace trial_clip -O --guidance clip
python main.py --text "a hamburger" --workspace trial_clip -O --test --gui --guidance clip
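As a rough illustration of the CLIP guidance in this setting, here is a minimal sketch assuming the openai/CLIP package installed above; the model variant, resolution, and loss details are assumptions and may differ from the repo's actual implementation:
# embed the rendered image and the prompt with CLIP, and minimize the negative cosine similarity
import torch
import torch.nn.functional as F
import clip

device = 'cuda' if torch.cuda.is_available() else 'cpu'
clip_model, _ = clip.load('ViT-B/16', device=device)
text_z = clip_model.encode_text(clip.tokenize(['a hamburger']).to(device))
text_z = text_z / text_z.norm(dim=-1, keepdim=True)

def clip_loss(pred_rgb):  # pred_rgb: (B, 3, H, W), values in [0, 1]
    img = F.interpolate(pred_rgb, (224, 224), mode='bilinear', align_corners=False)
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)
    image_z = clip_model.encode_image((img - mean) / std)
    image_z = image_z / image_z.norm(dim=-1, keepdim=True)
    return -(image_z * text_z).sum(dim=-1).mean()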
This is a simple description of the most important implementation details. If you are interested in improving this repo, this might be a starting point. Any contribution would be greatly appreciated!
./nerf/sd.py > StableDiffusion > train_step:
# 1. we need to interpolate the NeRF rendering to 512x512 to feed it to SD's VAE.
pred_rgb_512 = F.interpolate(pred_rgb, (512, 512), mode='bilinear', align_corners=False)
# 2. image (512x512) --- VAE --> latents (64x64), this is SD's difference from Imagen.
latents = self.encode_imgs(pred_rgb_512)
... # timestep sampling, noise adding and UNet noise predicting
# 3. the SDS loss: since the UNet part is skipped and cannot simply be autodiffed, we manually set the grad for latents.
w = (1 - self.scheduler.alphas_cumprod[t]).to(self.device)
grad = w * (noise_pred - noise)
latents.backward(gradient=grad, retain_graph=True)
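The key trick above is that the frozen UNet is treated as a critic rather than backpropagated through. A toy, self-contained sketch of the same gradient injection, with random tensors standing in for the UNet prediction and the added noise:
# skip autodiff through the UNet and write the weighted residual (noise_pred - noise)
# directly into latents.grad via backward(gradient=...)
import torch

latents = torch.randn(1, 4, 64, 64, requires_grad=True)  # stand-in for the VAE latents
noise = torch.randn_like(latents)                         # the noise that was added
noise_pred = torch.randn_like(latents)                    # stand-in for the UNet prediction
w = 0.5                                                   # stand-in for 1 - alphas_cumprod[t]

grad = w * (noise_pred - noise)
latents.backward(gradient=grad)          # latents.grad is now exactly `grad`
print(latents.grad.equal(grad))          # True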
./nerf/utils.py > Trainer > train_step.
./nerf/renderer.py > NeRFRenderer > run_cuda.
The occupancy-grid-based training acceleration (enabled by --cuda_ray) may harm the generation progress, since once a grid cell is marked as empty, rays won't pass it later... Not using --cuda_ray also works now:
# `-O2` equals `--fp16 --dir_text`
python main.py --text "a hamburger" --workspace trial -O2 # faster training, but slower rendering
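A toy sketch of the occupancy-grid pruning issue mentioned above (purely conceptual; the repo's CUDA ray marcher is more involved): cells whose running density estimate stays below a threshold are marked empty and skipped, so regions pruned too early are never revisited.
# keep a running density estimate per cell; only march rays through cells still marked occupied
# (grid size, decay, and threshold are made-up values for illustration)
import torch

density_grid = torch.zeros(64, 64, 64)

def update_occupancy(new_density, decay=0.95, thresh=0.01):
    global density_grid
    density_grid = torch.maximum(density_grid * decay, new_density)
    return density_grid > thresh  # False cells are skipped by the ray marcher from now on

occupied = update_occupancy(torch.rand(64, 64, 64) * 0.02)
print(occupied.float().mean())  # fraction of cells rays will still traverse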
./nerf/network*.py > NeRFNetwork > forward. The current implementation harms training and is disabled.
Use --albedo_iters 1000 to enable random shading mode after 1000 steps, sampled from albedo, lambertian, and textureless.
./nerf/provider.py > get_view_direction: use --angle_overhead, --angle_front to set the border. How to better divide front/back/side regions? (See the sketch below.)
The backbone networks (./nerf/network*.py) can be chosen by the --backbone option, but tcnn and vanilla are not well tested.
./nerf/network*.py > NeRFNetwork > gaussian.
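A hedged sketch of the view-dependent prompting mentioned above; the thresholds, angle conventions, and suffix wording are assumptions, not the exact logic of get_view_direction:
# hypothetical view-dependent prompt augmentation: map the sampled camera angles to a
# coarse region and append a direction phrase to the text prompt (made-up thresholds)
def view_suffix(azimuth_deg, elevation_deg, angle_overhead=30.0, angle_front=60.0):
    if elevation_deg > 90.0 - angle_overhead:
        return 'overhead view'
    azimuth_deg = azimuth_deg % 360.0
    if azimuth_deg < angle_front / 2 or azimuth_deg > 360.0 - angle_front / 2:
        return 'front view'
    if abs(azimuth_deg - 180.0) < angle_front / 2:
        return 'back view'
    return 'side view'

print('a hamburger, ' + view_suffix(azimuth_deg=100.0, elevation_deg=10.0))  # side view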
The amazing original work: DreamFusion: Text-to-3D using 2D Diffusion.
@article{poole2022dreamfusion,
author = {Poole, Ben and Jain, Ajay and Barron, Jonathan T. and Mildenhall, Ben},
title = {DreamFusion: Text-to-3D using 2D Diffusion},
journal = {arXiv},
year = {2022},
}
Huge thanks to Stable Diffusion and the diffusers library.
@misc{rombach2021highresolution,
title={High-Resolution Image Synthesis with Latent Diffusion Models},
author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
year={2021},
eprint={2112.10752},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{von-platen-etal-2022-diffusers,
author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},
title = {Diffusers: State-of-the-art diffusion models},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/huggingface/diffusers}}
}
The GUI is developed with DearPyGui.