Yufeng Yin,
Di Chang,
Guoxian Song,
Shen Sang,
Tiancheng Zhi,
Jing Liu,
Linjie Luo,
Mohammad Soleymani
USC ICT, ByteDance
WACV 2024
Arxiv
This is the official implementation of our WACV 2024 Algorithm Track paper: FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features.
FG-Net extracts feature maps from a StyleGAN2 model pre-trained on a large and diverse face image dataset. Then, these features are used to detect AUs with a Pyramid CNN Interpreter, making the training efficient and capturing essential local features. The proposed FG-Net achieves a strong generalization ability for heatmap-based AU detection thanks to the generalizable and semantic-rich features extracted from the pre-trained generative model. Extensive experiments are conducted to evaluate within- and cross-corpus AU detection with the widely-used DISFA and BP4D datasets. Compared with the state-of-the-art, the proposed method achieves superior cross-domain performance while maintaining competitive within-domain performance. In addition, FG-Net is data-efficient and achieves competitive performance even when trained on 1000 samples.
Clone repo:
git clone https://github.com/ihp-lab/FG-Net.git
cd FG-Net
The code is tested with Python == 3.7, PyTorch == 1.10.1 and CUDA == 11.3 on NVIDIA Quadro RTX 8000. We recommend using Anaconda to manage dependencies.
conda create -n fg-net python=3.7
conda activate fg-net
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
conda install cudatoolkit-dev=11.3
pip install pandas
pip install tqdm
pip install -U scikit-learn
pip install opencv-python
pip install dlib
pip install imutils
Download the BP4D and DISFA datasets from their official websites.
We first pre-process each input image with dlib to obtain the facial landmarks. The detected landmarks are used to crop and align the face with FFHQ-alignment. Finally, we run dlib again on the aligned images to detect the facial landmarks used to generate heatmaps.
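The last step above turns landmark coordinates into heatmaps. As a rough illustration, the sketch below renders a 2D Gaussian centered on a single landmark with NumPy. The helper name `gaussian_heatmap`, the map size, and `sigma` are assumptions made for this example and are not taken from this repository.

```python
import numpy as np


def gaussian_heatmap(center_xy, size=128, sigma=3.0):
    """Return a (size, size) float32 map with a Gaussian peak at center_xy.

    Hypothetical helper: the map size and sigma are illustrative
    assumptions, not the settings used by FG-Net.
    """
    cx, cy = center_xy
    xs = np.arange(size, dtype=np.float32)           # column (x) coordinates
    ys = np.arange(size, dtype=np.float32)[:, None]  # row (y) coordinates
    sq_dist = (xs - cx) ** 2 + (ys - cy) ** 2        # broadcasts to (size, size)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)).astype(np.float32)


# Example: a landmark at x=40, y=60 in a 128x128 aligned image.
heatmap = gaussian_heatmap((40, 60))
```

The map is indexed as `heatmap[y, x]`, so the peak for the landmark above sits at row 60, column 40.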
You should get a dataset folder like below:
data
├── DISFA
│   ├── labels
│   │   ├── 0
│   │   │   ├── train.csv
│   │   │   └── test.csv
│   │   ├── 1
│   │   └── 2
│   ├── aligned_images
│   └── aligned_landmarks
└── BP4D
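Before training, it can help to confirm that the folder matches the layout above. The following is a minimal sketch using only the standard library. It assumes that folds 1 and 2 contain the same `train.csv` and `test.csv` pair as fold 0, and that BP4D mirrors the DISFA structure; neither is spelled out in the tree, and `missing_entries` is a hypothetical helper name.

```python
from pathlib import Path

# Layout taken from the tree above. Assumptions: folds 1 and 2 hold the same
# train.csv/test.csv pair as fold 0, and BP4D mirrors the DISFA structure.
FOLDS = ("0", "1", "2")
LABEL_FILES = ("train.csv", "test.csv")
SUBDIRS = ("aligned_images", "aligned_landmarks")


def missing_entries(data_root, datasets=("DISFA", "BP4D")):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    missing = []
    for dataset in datasets:
        base = root / dataset
        # Image and landmark folders must be directories.
        for sub in SUBDIRS:
            if not (base / sub).is_dir():
                missing.append(base / sub)
        # Each fold must provide both label files.
        for fold in FOLDS:
            for name in LABEL_FILES:
                label_file = base / "labels" / fold / name
                if not label_file.is_file():
                    missing.append(label_file)
    return missing


if __name__ == "__main__":
    for path in missing_entries("data"):
        print(f"missing: {path}")
```

An empty result means every expected file and folder was found.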
StyleGAN2: To get the PyTorch checkpoint for StyleGAN2 (stylegan2-ffhq-config-f.pt), see the section "Convert weight from official checkpoints".
StyleGAN2 Encoder: pSp encoder. Rename the pt file to encoder.pt.
Put all checkpoints under the folder /code/checkpoints.
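A quick way to confirm that both checkpoints are in place is sketched below. The two file names come from the instructions above; the helper name `missing_checkpoints` and the choice to resolve the folder relative to the repository root are assumptions of this example.

```python
from pathlib import Path

# File names come from the instructions above; the directory is given
# relative to the repository root, which is an assumption of this sketch.
CHECKPOINT_DIR = Path("code/checkpoints")
REQUIRED = ("stylegan2-ffhq-config-f.pt", "encoder.pt")


def missing_checkpoints(checkpoint_dir=CHECKPOINT_DIR):
    """Return the names of required checkpoint files that are absent."""
    folder = Path(checkpoint_dir)
    return [name for name in REQUIRED if not (folder / name).is_file()]


if __name__ == "__main__":
    absent = missing_checkpoints()
    print("all checkpoints found" if not absent else f"missing: {absent}")
```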
cd code
CUDA_VISIBLE_DEVICES=0 python train_interpreter.py --exp experiments/bp4d_0.json
CUDA_VISIBLE_DEVICES=1 python train_interpreter.py --exp experiments/disfa_0.json
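If you prefer to launch these runs from Python, the sketch below builds the same two commands and sets `CUDA_VISIBLE_DEVICES` through the child process environment. The helpers `build_job` and `run_job` are hypothetical names introduced here, and the sketch runs jobs one after another rather than in parallel.

```python
import os
import subprocess
import sys


def build_job(script, exp_config, gpu_id):
    """Return (argv, env) for running one script on one GPU."""
    # Copy the current environment and pin the job to a single GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    # sys.executable keeps the job inside the active conda environment.
    argv = [sys.executable, script, "--exp", exp_config]
    return argv, env


def run_job(script, exp_config, gpu_id, cwd="code"):
    """Run the job from cwd; raises CalledProcessError on a non-zero exit."""
    argv, env = build_job(script, exp_config, gpu_id)
    subprocess.run(argv, env=env, cwd=cwd, check=True)


# Same two runs as the shell commands above: (script, config, GPU id).
JOBS = [
    ("train_interpreter.py", "experiments/bp4d_0.json", 0),
    ("train_interpreter.py", "experiments/disfa_0.json", 1),
]
```

Calling `run_job(*JOBS[0])` from the repository root starts the BP4D run on GPU 0.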
cd code
CUDA_VISIBLE_DEVICES=0 python eval_interpreter.py --exp experiments/eval_b2d.json
CUDA_VISIBLE_DEVICES=1 python eval_interpreter.py --exp experiments/eval_d2b.json
cd code
CUDA_VISIBLE_DEVICES=0 python single_image_inference.py --exp experiments/single_image_inference_bp4d.json
CUDA_VISIBLE_DEVICES=1 python single_image_inference.py --exp experiments/single_image_inference_disfa.json
Our code is distributed under the MIT License. See the LICENSE file for more information.
@article{yin2023fg,
title={FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features},
author={Yin, Yufeng and Chang, Di and Song, Guoxian and Sang, Shen and Zhi, Tiancheng and Liu, Jing and Luo, Linjie and Soleymani, Mohammad},
journal={arXiv preprint arXiv:2308.12380},
year={2023}
}
If you have any questions, please raise an issue or email Yufeng Yin ([email protected] or [email protected]).
Our code builds on several excellent repositories. We thank their authors for making their code publicly available.