X-LANCE / AniTalker
```bash
conda create -n anitalker python==3.9.0
conda activate anitalker
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install -r requirements.txt
```
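Before downloading checkpoints, it can help to confirm that the environment resolved to the pinned builds and that PyTorch sees the GPU. A minimal sanity-check sketch (our own, not part of the repo):

```python
# Minimal sketch: confirm the pinned versions installed and that
# PyTorch can see a CUDA device before running any demos.
import torch
import torchvision
import torchaudio

print(torch.__version__)          # expect 1.8.0
print(torchvision.__version__)    # expect 0.9.0
print(torchaudio.__version__)     # expect 0.8.0
print(torch.cuda.is_available())  # expect True on a CUDA 11.1 machine
```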
Please download the checkpoints and place them in the ckpts folder.
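A quick way to confirm the files landed where the demo commands below expect them; a minimal sketch assuming only the two checkpoint names used in those commands:

```python
# Minimal sketch: assert the checkpoints referenced by the demo
# commands below exist under ckpts/ before running inference.
from pathlib import Path

for name in ["stage1.ckpt", "stage2_pose_only.ckpt"]:
    path = Path("ckpts") / name
    if not path.is_file():
        raise FileNotFoundError(f"missing checkpoint: {path}")
print("all checkpoints in place")
```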
Keep pose_yaw, pose_pitch, and pose_roll at zero (case 1).
Demo script:

```bash
python ./code/demo.py \
    --infer_type 'mfcc_pose_only' \
    --stage1_checkpoint_path 'ckpts/stage1.ckpt' \
    --stage2_checkpoint_path 'ckpts/stage2_pose_only.ckpt' \
    --test_image_path 'test_demos/portraits/monalisa.jpg' \
    --test_audio_path 'test_demos/audios/english_female.wav' \
    --result_path 'results/monalisa_case1/' \
    --control_flag True \
    --seed 0 \
    --pose_yaw 0 \
    --pose_pitch 0 \
    --pose_roll 0
```
Changing pose_yaw from 0 to 0.25 (case 2).
Demo script:

```bash
python ./code/demo.py \
    --infer_type 'mfcc_pose_only' \
    --stage1_checkpoint_path 'ckpts/stage1.ckpt' \
    --stage2_checkpoint_path 'ckpts/stage2_pose_only.ckpt' \
    --test_image_path 'test_demos/portraits/monalisa.jpg' \
    --test_audio_path 'test_demos/audios/english_female.wav' \
    --result_path 'results/monalisa_case2/' \
    --control_flag True \
    --seed 0 \
    --pose_yaw 0.25 \
    --pose_pitch 0 \
    --pose_roll 0
```
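To compare several head orientations in one run, the same command can be scripted. A minimal sketch (our own, not part of the repo) that sweeps pose_yaw over a few values; the usable value range is an assumption inferred from the 0 and 0.25 examples above, and the output folder names are hypothetical:

```python
# Minimal sketch: invoke the demo once per yaw value.
# Flags mirror the case-1/case-2 commands above.
import subprocess

for yaw in [0.0, 0.25, 0.5]:
    subprocess.run([
        "python", "./code/demo.py",
        "--infer_type", "mfcc_pose_only",
        "--stage1_checkpoint_path", "ckpts/stage1.ckpt",
        "--stage2_checkpoint_path", "ckpts/stage2_pose_only.ckpt",
        "--test_image_path", "test_demos/portraits/monalisa.jpg",
        "--test_audio_path", "test_demos/audios/english_female.wav",
        "--result_path", f"results/monalisa_yaw{yaw}/",
        "--control_flag", "True",
        "--seed", "0",
        "--pose_yaw", str(yaw),
        "--pose_pitch", "0",
        "--pose_roll", "0",
    ], check=True)
```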
Demo script (case 3; control_flag, seed, and pose arguments are omitted, leaving head pose uncontrolled):

```bash
python ./code/demo.py \
    --infer_type 'mfcc_pose_only' \
    --stage1_checkpoint_path 'ckpts/stage1.ckpt' \
    --stage2_checkpoint_path 'ckpts/stage2_pose_only.ckpt' \
    --test_image_path 'test_demos/portraits/monalisa.jpg' \
    --test_audio_path 'test_demos/audios/english_female.wav' \
    --result_path 'results/monalisa_case3/'
```
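The uncontrolled command also batches naturally over inputs. A minimal sketch (assuming test_demos/portraits/ contains additional .jpg portraits) that animates each one with the same audio clip:

```python
# Minimal sketch: run the free-style demo for every portrait
# in the folder, writing each result to its own directory.
import subprocess
from pathlib import Path

for img in sorted(Path("test_demos/portraits").glob("*.jpg")):
    subprocess.run([
        "python", "./code/demo.py",
        "--infer_type", "mfcc_pose_only",
        "--stage1_checkpoint_path", "ckpts/stage1.ckpt",
        "--stage2_checkpoint_path", "ckpts/stage2_pose_only.ckpt",
        "--test_image_path", str(img),
        "--test_audio_path", "test_demos/audios/english_female.wav",
        "--result_path", f"results/{img.stem}_freestyle/",
    ], check=True)
```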
See MORE_SCRIPTS for additional example scripts.
```bibtex
@misc{liu2024anitalker,
  title={AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding},
  author={Tao Liu and Feilong Chen and Shuai Fan and Chenpeng Du and Qi Chen and Xie Chen and Kai Yu},
  year={2024},
  eprint={2405.03121},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
We would like to express our sincere gratitude to the numerous prior works that have laid the foundation for the development of AniTalker.
Stage 1, which primarily focuses on training the motion encoder and the rendering module, relies heavily on resources from LIA. The second-stage diffusion training is built upon diffae and espnet. For the computation of the mutual information loss, we implement methods from CLUB, and we use AAM-softmax when training the face recognition model. We also leverage the pretrained HuBERT model provided by TencentGameMate.
Additionally, we employ 3DDFA_V2 to extract head pose and torchlm to obtain face landmarks, which are used to compute face location and scale. We have open-sourced the code for these preprocessing steps at talking_face_preprocessing. We acknowledge the importance of building upon existing knowledge and are committed to contributing back to the research community by sharing our findings and code.
The code in this library is not a formal product, and we have not tested all use cases; it therefore cannot be offered directly to end customers.
The main purpose of making our code public is to facilitate academic demonstrations and communication. Any use of this code to spread harmful information is strictly prohibited.
Please use this library in compliance with the terms specified in the license file and avoid improper use.
When using the code, please abide by local laws and regulations. You bear full responsibility for your use of this code; our company (AISpeech Ltd.) is not responsible for the generated results.