Winfredy / SadTalker
(CVPR 2023) SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
TL;DR: single portrait image + audio = talking head video.
The stable-diffusion-webui extension is online. Just install it via Extensions -> Install from URL -> https://github.com/Winfredy/SadTalker, and check out more details here.

Full image mode is online! Check out here for more details.

(Comparison images: still + enhancer in v0.0.1 | still + enhancer in v0.0.2 | input image @bagbag1815)

still mode, reference mode, and resize mode are online for better and custom applications.
[2023.04.08]: v0.0.2: full image animation, adding a Baidu Cloud link for downloading checkpoints, and optimizing the enhancer logic.
```bash
git clone https://github.com/Winfredy/SadTalker.git
cd SadTalker

conda create -n sadtalker python=3.8
conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

### TTS is optional for the gradio demo.
### pip install TTS
```
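After the installation finishes, a quick sanity check helps confirm that the pinned CUDA 11.3 wheels were actually picked up. This snippet is our own illustration, not part of the repo:

```python
# Sanity check for the freshly created conda env (illustrative, not from the repo).
import torch
import torchvision
import torchaudio

print("torch:", torch.__version__)              # expect 1.12.1+cu113
print("torchvision:", torchvision.__version__)  # expect 0.13.1+cu113
print("torchaudio:", torchaudio.__version__)    # expect 0.12.1+cu113
print("CUDA available:", torch.cuda.is_available())
```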
More tips about installation on Windows and the Docker file can be found here.
Install the latest version of stable-diffusion-webui, then install SadTalker as an extension.
Then restart stable-diffusion-webui (the models will be downloaded automatically to the right place if you have a fast network connection). Otherwise (important!), you need to pre-download the SadTalker checkpoints and point SADTALKER_CHECKPOINTS at them in webui_user.sh (Linux) or webui_user.bat (Windows):
```bash
# Windows (webui_user.bat)
set COMMANDLINE_ARGS=--no-gradio-queue --disable-safe-unpickle
set SADTALKER_CHECKPOINTS=D:\SadTalker\checkpoints

# Linux (webui_user.sh)
export COMMANDLINE_ARGS=--no-gradio-queue --disable-safe-unpickle
export SADTALKER_CHECKPOINTS=/path/to/SadTalker/checkpoints
```
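For illustration only, this is how such a variable is typically resolved at runtime, with a fallback to the default location; this sketch is an assumption about the mechanism, not code copied from the extension:

```python
# Illustration: resolve the checkpoint directory from the variable set above,
# falling back to ./checkpoints (assumed behavior, not from the SadTalker source).
import os

ckpt_dir = os.environ.get("SADTALKER_CHECKPOINTS", "./checkpoints")
print("SadTalker will load checkpoints from:", ckpt_dir)
```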
After installation, SadTalker can be used in stable-diffusion-webui directly. (There is some important discussion to read if you are unable to use full mode.)
You can run the following script to put all the models in the right place.
```bash
bash scripts/download_models.sh
```

Or download our pre-trained models from Google Drive or our GitHub release page, and then put them in ./checkpoints.

Or use the copy we provide on Baidu Cloud (百度云盘), extraction code: sadt.
| Model | Description |
|---|---|
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from an unofficial reproduction of face-vid2vid. |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in Deep3DFaceReconstruction. |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model in Wav2lip. |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in dlib. |
| checkpoints/BFM | 3DMM library files. |
| checkpoints/hub | Face detection models used in face alignment. |
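Before running inference, you may want to verify that the files from the table above actually landed in ./checkpoints. A minimal sketch (our own helper, not shipped with the repo):

```python
# Hypothetical helper: verify that the downloaded checkpoints listed above exist.
from pathlib import Path

REQUIRED = [
    "auido2exp_00300-model.pth",
    "auido2pose_00140-model.pth",
    "mapping_00229-model.pth.tar",
    "mapping_00109-model.pth.tar",
    "facevid2vid_00189-model.pth.tar",
    "epoch_20.pth",
    "wav2lip.pth",
    "shape_predictor_68_face_landmarks.dat",
]

missing = [name for name in REQUIRED if not (Path("checkpoints") / name).exists()]
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All expected checkpoint files are present.")
```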
```bash
python inference.py --driven_audio <audio.wav> --source_image <video.mp4 or picture.png> --enhancer gfpgan
```

The results will be saved in results/$SOME_TIMESTAMP/*.mp4.
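Since each run writes to a fresh timestamped folder, a small convenience snippet like the following (our own sketch, assuming the default results/ layout) picks up the newest output:

```python
# Find the most recently generated video under results/<timestamp>/ (assumed layout).
from pathlib import Path

videos = sorted(Path("results").rglob("*.mp4"), key=lambda p: p.stat().st_mtime)
print("Latest result:", videos[-1] if videos else "no results yet")
```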
More examples, configurations, and tips can be found in the >>> best practice documents <<<.
Use --still to generate a natural full-body video. You can add --enhancer to improve the quality of the generated video; a scripted variant is sketched after the command below.
```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a folder to store results> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan
```
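If you script batches of clips, the same command can be driven from Python. The input paths below are hypothetical placeholders:

```python
# Batch-friendly wrapper around the CLI above (input paths are placeholders).
import subprocess

subprocess.run(
    [
        "python", "inference.py",
        "--driven_audio", "examples/speech.wav",    # hypothetical input
        "--source_image", "examples/portrait.png",  # hypothetical input
        "--result_dir", "results",
        "--still",
        "--preprocess", "full",
        "--enhancer", "gfpgan",
    ],
    check=True,  # raise if inference fails
)
```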
A local gradio demo similar to our Hugging Face demo can be run by:

```bash
## you need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install TTS` in advance.
python app.py
```
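With the optional TTS dependency installed, you can also synthesize a driving audio clip yourself. A minimal sketch using Coqui TTS's Python API (the model name is just one example from their catalog):

```python
# Synthesize a short driving-audio clip with Coqui TTS (optional dependency).
# The model name is an example; any Coqui TTS voice model should work.
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(text="Hello, this is a SadTalker demo.", file_path="driven_audio.wav")
```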
If you find our work useful in your research, please consider citing:
```bibtex
@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
  journal={arXiv preprint arXiv:2211.12194},
  year={2022}
}
```
The facerender code borrows heavily from zhanglonghao's reproduction of face-vid2vid and from PIRender. We thank the authors for sharing their wonderful code. In the training process, we also use models from Deep3DFaceReconstruction and Wav2lip, and we thank them for their wonderful work.
This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.
Logo: color and font suggestions by ChatGPT; logo font: Montserrat Alternates.
The copyright of all demo images and audio belongs to community users, or the media were generated with Stable Diffusion. Feel free to contact us if you feel uncomfortable.