Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Shuai Yang, Yifan Zhou, Ziwei Liu and Chen Change Loy
in SIGGRAPH Asia 2023 Conference Proceedings
Project Page | Paper | Supplementary Video | Input Data and Video Results
Abstract: Large text-to-image diffusion models have exhibited impressive proficiency in generating high-quality images. However, when applying these models to the video domain, ensuring temporal consistency across video frames remains a formidable challenge. This paper proposes a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos. The framework includes two parts: key frame translation and full video translation. The first part uses an adapted diffusion model to generate key frames, with hierarchical cross-frame constraints applied to enforce coherence in shapes, textures and colors. The second part propagates the key frames to other frames with temporal-aware patch matching and frame blending. Our framework achieves global style and local texture temporal consistency at a low cost (without re-training or optimization). The adaptation is compatible with existing image diffusion techniques, allowing our framework to take advantage of them, such as customizing a specific subject with LoRA and introducing extra spatial guidance with ControlNet. Extensive experimental results demonstrate the effectiveness of our proposed framework over existing methods in rendering high-quality and temporally-coherent videos.
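As a rough mental model of the two-part pipeline described in the abstract, the sketch below shows the control flow. All function names are hypothetical placeholders, not the repository's actual API:

# Hypothetical placeholders for the paper's two components.
def translate_key_frame(frame, prompt, previous_keys):
    ...  # adapted diffusion model with hierarchical cross-frame constraints

def propagate_to_full_video(frames, stylized_keys, interval):
    ...  # temporal-aware patch matching and frame blending

def rerender_video(frames, prompt, interval=10):
    # Part 1: key frame translation. Every `interval`-th frame is rendered
    # by the adapted diffusion model, with cross-frame constraints keeping
    # shapes, textures and colors coherent across key frames.
    stylized_keys = []
    for key in frames[::interval]:
        stylized_keys.append(translate_key_frame(key, prompt, stylized_keys))
    # Part 2: full video translation. The stylized key frames are propagated
    # to the remaining frames; no re-training or optimization is involved.
    return propagate_to_full_video(frames, stylized_keys, interval)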
Features:
- Temporal consistency: hierarchical cross-frame constraints enforce coherence in shapes, textures and colors.
- Zero-shot: no re-training or optimization is required.
- Flexibility: compatible with off-the-shelf image diffusion techniques such as LoRA and ControlNet.
Please make sure your installation path contains only English letters or _

1. Clone the repository (don't forget --recursive; otherwise, run git submodule update --init --recursive afterwards):

git clone git@github.com:williamyang1991/Rerender_A_Video.git --recursive
cd Rerender_A_Video

2. Install the dependencies:

pip install -r requirements.txt

You can also create a new conda environment from scratch:

conda env create -f environment.yml
conda activate rerender

3. Run the installation script. The required models will be downloaded to ./models:

python install.py

4. You can run the demo with rerender.py:

python rerender.py --cfg config/real2sculpture.json
Before running the above 1-4 steps, you need to prepare:

If you encounter KeyError: 'dataset', upgrade Gradio to the latest version (#14 (comment)).

To launch the WebUI, run:

python webUI.py
The Gradio app also allows you to flexibly change the inference options; just try it for more details. (For the WebUI, you need to download revAnimated_v11 and realisticVisionV20_v20 to ./models/ after installation.)
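To verify the checkpoints are in place before launching, a quick sanity check could look like the sketch below. The realisticVisionV20_v20.safetensors filename matches the config example later in this document; the revAnimated_v11 filename is an assumption:

from pathlib import Path

# Hypothetical filenames: revAnimated_v11.safetensors is assumed, and
# realisticVisionV20_v20.safetensors follows the config example below.
for name in ('revAnimated_v11.safetensors', 'realisticVisionV20_v20.safetensors'):
    path = Path('models') / name
    print(f'{path}: {"found" if path.exists() else "MISSING"}')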
Upload your video, input the prompt, select the seed, and hit the run button.

We provide abundant advanced options to play with:
- Using customized models: place the SD models in ./models and modify sd_model_cfg.py to add paths to the saved SD models.
- Using other control types: for example, to add depth control, add 'depth' to control_type = gr.Dropdown(['HED', 'canny', 'depth'] here (Line 690 in b6cafb5), add a branch elif control_type == 'depth': following Line 88 in b6cafb5, and add another elif control_type == 'depth': following Line 122 in b6cafb5. A hedged sketch of such a branch is given after this list.
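For illustration only, one of the new elif control_type == 'depth': branches might look like the fragment below. This is an assumption, not the repository's actual code: it presumes the ControlNet-style MiDaS annotator (annotator.midas) that this codebase builds on, and the variable names img and detected_map are hypothetical:

# Hypothetical fragment for a new 'depth' control branch; assumes the
# ControlNet-style annotator package and illustrative variable names.
elif control_type == 'depth':
    from annotator.midas import MidasDetector
    apply_midas = MidasDetector()
    # MiDaS returns a depth map and a normal map; keep the depth map
    # as the spatial guidance image fed to ControlNet.
    detected_map, _ = apply_midas(img)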
We also provide a flexible script rerender.py to run our method.

Set the options via the command line. For example,

python rerender.py --input videos/pexels-antoni-shkraba-8048492-540x960-25fps.mp4 --output result/man/man.mp4 --prompt "a handsome man in van gogh painting"

The script will run the full pipeline. A work directory will be created at result/man and the result video will be saved as result/man/man.mp4.
Set the options via a config file. For example,
python rerender.py --cfg config/van_gogh_man.json
The script will run the full pipeline. We provide some examples of the config in the config directory. Most options in the config are the same as those in the WebUI; please check the explanations in the WebUI section.
Specify customized models by setting sd_model in the config. For example:

{
  "sd_model": "models/realisticVisionV20_v20.safetensors"
}
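If you prefer to derive such a config programmatically, a minimal sketch is shown below. It assumes only that the configs are plain JSON files, as the config directory suggests; the output filename is just an example:

import json

# Load one of the provided example configs.
with open('config/van_gogh_man.json') as f:
    cfg = json.load(f)

# 'sd_model' is the documented key for a customized SD checkpoint.
cfg['sd_model'] = 'models/realisticVisionV20_v20.safetensors'

# Save the variant under a new (example) name.
with open('config/van_gogh_man_custom.json', 'w') as f:
    json.dump(cfg, f, indent=2)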
Similar to the WebUI, we provide a three-step workflow: rerender the first key frame, then rerender the full key frames, and finally rerender the full video with propagation. To run only a single step, specify the options -one, -nb and -nr:

1. Rerender the first key frame:
python rerender.py --cfg config/van_gogh_man.json -one -nb
2. Rerender the full key frames:
python rerender.py --cfg config/van_gogh_man.json -nb
3. Rerender the full video with propagation:
python rerender.py --cfg config/van_gogh_man.json -nr
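To run all three steps from a single script, a small driver sketch (an assumption about usage, not part of the repository) could be:

import subprocess

CFG = 'config/van_gogh_man.json'

# The three documented steps, in order: first key frame only (-one -nb),
# full key frames (-nb), then full-video propagation (-nr).
for extra in (['-one', '-nb'], ['-nb'], ['-nr']):
    subprocess.run(['python', 'rerender.py', '--cfg', CFG] + extra, check=True)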
We provide a separate Ebsynth Python script video_blend.py with the temporal blending algorithm introduced in Stylizing Video by Example for interpolating style between key frames. It can work on your own stylized key frames independently of our Rerender algorithm.
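For intuition, the sketch below illustrates the core idea of such blending: each intermediate frame is combined from its two warped neighboring key frames with distance-based weights. This is a heavily simplified illustration of the approach in Stylizing Video by Example, not the actual video_blend.py implementation (which, per its options, also supports Poisson gradient blending):

import numpy as np

def blend_interval(warped_prev, warped_next, n):
    """Illustrative linear blend between two stylized key frames.

    warped_prev[i] / warped_next[i]: the i-th intermediate frame as warped
    (e.g. by Ebsynth) from the previous / next stylized key frame.
    Returns n blended frames, weighted by distance to each key frame.
    """
    frames = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight shifts toward the next key frame
        f = ((1 - w) * warped_prev[i].astype(np.float32)
             + w * warped_next[i].astype(np.float32))
        frames.append(f.clip(0, 255).astype(np.uint8))
    return frames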
Usage:
video_blend.py [-h] [--output OUTPUT] [--fps FPS] [--beg BEG] [--end END] [--itv ITV] [--key KEY]
[--n_proc N_PROC] [-ps] [-ne] [-tmp]
name
positional arguments:
name Path to input video
optional arguments:
-h, --help show this help message and exit
--output OUTPUT Path to output video
--fps FPS The FPS of output video
--beg BEG The index of the first frame to be stylized
--end END The index of the last frame to be stylized
--itv ITV The interval of key frame
--key KEY The subfolder name of stylized key frames
--n_proc N_PROC The max process count
-ps Use poisson gradient blending
-ne Do not run ebsynth (use previous ebsynth output)
-tmp Keep temporary output
For example, to run Ebsynth on the video man.mp4, with the stylized key frames stored in videos/man/keys for every 10 frames (named as 0001.png, 0011.png, ...), the original video frames in videos/man/video (named as 0001.png, 0002.png, ...), and the output saved to videos/man/blend.mp4 under FPS 25, use the following command:

python video_blend.py videos/man \
--beg 1 \
--end 101 \
--itv 10 \
--key keys \
--output videos/man/blend.mp4 \
--fps 25.0 \
-ps
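video_blend.py expects the frame layout described above. In case you need to produce the original-frame folder from a raw video yourself, a minimal OpenCV sketch (a hypothetical helper, not part of the repository; directory names follow the example above) could be:

import os
import cv2

def dump_frames(video_path, out_dir):
    """Dump all frames of a video as 0001.png, 0002.png, ..."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        i += 1
        cv2.imwrite(os.path.join(out_dir, f'{i:04d}.png'), frame)
    cap.release()

# Original frames for the example above; the stylized key frames
# (0001.png, 0011.png, ...) are produced by your own stylization
# method and placed in videos/man/keys.
dump_frames('videos/man.mp4', 'videos/man/video')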
[Video results] Prompts: white ancient Greek sculpture, Venus de Milo, light pink and blue background | a handsome Greek man | a traditional mountain in chinese ink wash painting | a cartoon tiger
[Video results] Prompts: a swan in chinese ink wash painting, monochrome | a beautiful woman in CG style | a clean simple white jade sculpture | a fluorescent jellyfish in the deep dark blue sea
Text-guided virtual character generation.
Video stylization and video editing.
If you find this work useful for your research, please consider citing our paper:
@inproceedings{yang2023rerender,
title = {Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation},
author = {Yang, Shuai and Zhou, Yifan and Liu, Ziwei and Loy, Chen Change},
booktitle = {ACM SIGGRAPH Asia Conference Proceedings},
year = {2023},
}
The code is mainly developed based on ControlNet, Stable Diffusion, GMFlow and Ebsynth.