MiVOLO: Multi-input Transformer for Age and Gender Estimation, Maksim Kuprashevich, Irina Tolstykh, 2023 arXiv 2307.04616
[Paper] [Demo] [BibTex] [Data]
Gender & Age recognition performance.
Model | Type | Dataset | Age MAE | Age CS@5 | Gender Accuracy | download |
---|---|---|---|---|---|---|
volo_d1 | face_only, age | IMDB-cleaned | 4.29 | 67.71 | - | checkpoint |
volo_d1 | face_only, age, gender | IMDB-cleaned | 4.22 | 68.68 | 99.38 | checkpoint |
mivolo_d1 | face_body, age, gender | IMDB-cleaned | 4.24 [face+body], 6.87 [body] | 68.32 [face+body], 46.32 [body] | 99.46 [face+body], 96.48 [body] | checkpoint |
volo_d1 | face_only, age | UTKFace | 4.23 | 69.72 | - | checkpoint |
volo_d1 | face_only, age, gender | UTKFace | 4.23 | 69.78 | 97.69 | checkpoint |
mivolo_d1 | face_body, age, gender | Lagenda | 3.99 [face+body] | 71.27 [face+body] | 97.36 [face+body] | demo |
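
For readers unfamiliar with the column headers, here is a minimal sketch of how the three metrics are commonly computed. It assumes CS@5 is the usual cumulative score, i.e. the percentage of samples whose absolute age error is at most 5 years. The function names and toy values are illustrative and are not taken from this repository.

```python
from typing import Sequence


def age_mae(true_ages: Sequence[float], pred_ages: Sequence[float]) -> float:
    """Mean absolute error between true and predicted ages, in years."""
    errors = [abs(t - p) for t, p in zip(true_ages, pred_ages)]
    return sum(errors) / len(errors)


def age_cs(true_ages: Sequence[float], pred_ages: Sequence[float],
           threshold: float = 5.0) -> float:
    """Cumulative score: percentage of samples with |error| <= threshold."""
    hits = sum(1 for t, p in zip(true_ages, pred_ages) if abs(t - p) <= threshold)
    return 100.0 * hits / len(true_ages)


def gender_accuracy(true_genders: Sequence[str], pred_genders: Sequence[str]) -> float:
    """Percentage of samples whose predicted gender matches the label."""
    hits = sum(1 for t, p in zip(true_genders, pred_genders) if t == p)
    return 100.0 * hits / len(true_genders)


# Toy values for illustration only (not from any dataset).
true_ages = [20, 30, 40, 50]
pred_ages = [22, 37, 40, 44]
print(age_mae(true_ages, pred_ages))  # errors are 2, 7, 0, 6 -> 3.75
print(age_cs(true_ages, pred_ages))   # 2 of 4 within 5 years -> 50.0
print(gender_accuracy(["male", "female"], ["male", "male"]))  # 50.0
```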
Please cite our paper if you use any of this data!
- Lagenda dataset: images and annotation.
- IMDB-clean: follow these instructions to get images and download our annotations.
- UTK dataset: original full images and our annotation: split from the article, random full split.
- Adience dataset: follow these instructions to get images and download our annotations.
After downloading them, your data directory should look something like this:

data
└── Adience
    ├── annotations (folder with our annotations)
    ├── aligned (will not be used)
    ├── faces
    ├── fold_0_data.txt
    ├── fold_1_data.txt
    ├── fold_2_data.txt
    ├── fold_3_data.txt
    └── fold_4_data.txt

We use coarse aligned images from the faces/ dir. Using our detector, we found a face bbox for each image (see tools/prepare_adience.py).
This dataset has five folds. The performance metric is accuracy on five-fold cross validation.
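
As a rough illustration of the metric, the sketch below computes accuracy per fold and then takes the unweighted mean over folds. Whether the reported number averages per-fold accuracies or pools all samples is an assumption here, and the function names and toy labels are hypothetical.

```python
from statistics import mean


def fold_accuracy(true_labels, pred_labels):
    """Accuracy (in percent) for a single fold."""
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return 100.0 * correct / len(true_labels)


def cross_validation_accuracy(folds):
    """Unweighted mean of per-fold accuracies.

    `folds` is a list of (true_labels, pred_labels) pairs, one per fold.
    """
    return mean(fold_accuracy(t, p) for t, p in folds)


# Toy labels for illustration only; a real run would have five folds.
folds = [
    (["0-2", "4-6"], ["0-2", "4-6"]),        # fold accuracy 100.0
    (["8-12", "15-20"], ["8-12", "25-32"]),  # fold accuracy 50.0
]
print(cross_validation_accuracy(folds))  # 75.0
```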

images before removal | fold 0 | fold 1 | fold 2 | fold 3 | fold 4 |
---|---|---|---|---|---|
19,370 | 4,484 | 3,730 | 3,894 | 3,446 | 3,816 |

Incomplete data:

only age not found | only gender not found | SUM |
---|---|---|
40 | 1,170 | 1,210 (6.2 %) |

Removed data:

failed to process image | age and gender not found | SUM |
---|---|---|
0 | 708 | 708 (3.6 %) |

Genders:

female | male |
---|---|
9,372 | 8,120 |

Ages (8 classes) after mapping to non-overlapping age intervals:

0-2 | 4-6 | 8-12 | 15-20 | 25-32 | 38-43 | 48-53 | 60-100 |
---|---|---|---|---|---|---|---|
2,509 | 2,140 | 2,293 | 1,791 | 5,589 | 2,490 | 909 | 901 |

- FairFace dataset: follow these instructions to get images and download our annotations.
After downloading them, your data directory should look something like this:

data
└── FairFace
    ├── annotations (folder with our annotations)
    ├── fairface-img-margin025-trainval (will not be used)
    │   ├── train
    │   └── val
    ├── fairface-img-margin125-trainval
    │   ├── train
    │   └── val
    ├── fairface_label_train.csv
    └── fairface_label_val.csv

We use aligned images from the fairface-img-margin125-trainval/ dir. Using our detector, we found a face bbox for each image and added a person bbox where possible (see tools/prepare_fairface.py).
This dataset has 2 splits: train and val. The performance metric is accuracy on validation.
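
images train | images val |
---|---|
86,744 | 10,954 |

Genders for validation:

female | male |
---|---|
5,162 | 5,792 |

Ages for validation (9 classes):

0-2 | 3-9 | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70+ |
---|---|---|---|---|---|---|---|---|
199 | 1,356 | 1,181 | 3,300 | 2,330 | 1,353 | 796 | 321 | 118 |
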
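
Because the model predicts a continuous age while FairFace is scored on age classes, a prediction has to be mapped to one of the nine intervals listed above before accuracy can be computed. The following is a minimal sketch of such a mapping, not the repository's evaluation code; the function name and the handling of fractional ages are assumptions.

```python
from bisect import bisect_right

# Lower bounds and names of the nine FairFace age classes listed above.
LOWER_BOUNDS = [0, 3, 10, 20, 30, 40, 50, 60, 70]
CLASS_NAMES = ["0-2", "3-9", "10-19", "20-29", "30-39",
               "40-49", "50-59", "60-69", "70+"]


def age_to_fairface_class(age: float) -> str:
    """Map an age in years to its FairFace class name.

    Assumption: a fractional age falls into the class of its lower bound,
    e.g. 2.7 is assigned to "0-2".
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    index = bisect_right(LOWER_BOUNDS, age) - 1
    return CLASS_NAMES[index]


print(age_to_fairface_class(2.7))  # 0-2
print(age_to_fairface_class(19))   # 10-19
print(age_to_fairface_class(85))   # 70+
```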
Install PyTorch 1.13+ and the other requirements:
pip install -r requirements.txt
pip install .
- Download body + face detector model to models/yolov8x_person_face.pt
- Download mivolo checkpoint to models/mivolo_imbd.pth.tar
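
Before running the demo, it can help to confirm that both files ended up at the expected paths. The snippet below is an optional sanity check and is not part of the repository; `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# Paths taken from the download steps above.
REQUIRED_FILES = [
    "models/yolov8x_person_face.pt",
    "models/mivolo_imbd.pth.tar",
]


def missing_files(paths, root="."):
    """Return the entries of `paths` that do not exist as files under `root`."""
    return [p for p in paths if not (Path(root) / p).is_file()]


missing = missing_files(REQUIRED_FILES)
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All model files are in place.")
```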
wget https://variety.com/wp-content/uploads/2023/04/MCDNOHA_SP001.jpg -O jennifer_lawrence.jpg
python3 demo.py \
--input "jennifer_lawrence.jpg" \
--output "output" \
--detector-weights "models/yolov8x_person_face.pt" \
--checkpoint "models/mivolo_imbd.pth.tar" \
--device "cuda:0" \
--with-persons \
--draw
To reproduce validation metrics:
- Download prepared annotations for imdb-clean / utk / adience / lagenda / fairface.
- Download checkpoint
- Run validation:
python3 eval_pretrained.py \
--dataset_images /path/to/dataset/utk/images \
--dataset_annotations /path/to/dataset/utk/annotation \
--dataset_name utk \
--split valid \
--batch-size 512 \
--checkpoint models/mivolo_imbd.pth.tar \
--half \
--with-persons \
--device "cuda:0"
Supported dataset names: "utk", "imdb", "lagenda", "fairface", "adience".
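
To evaluate several datasets in a row, the command above can be assembled programmatically. The sketch below only reuses the flags shown in the UTK example; `build_eval_command` is a hypothetical helper, and the `--split valid` value is copied from that example and may need to be changed for other datasets.

```python
import shlex

# Dataset names listed as supported above.
SUPPORTED_DATASETS = ["utk", "imdb", "lagenda", "fairface", "adience"]


def build_eval_command(dataset_name, images_dir, annotations_dir,
                       checkpoint="models/mivolo_imbd.pth.tar",
                       device="cuda:0"):
    """Build the eval_pretrained.py argument list for one dataset."""
    if dataset_name not in SUPPORTED_DATASETS:
        raise ValueError(f"unsupported dataset name: {dataset_name!r}")
    return [
        "python3", "eval_pretrained.py",
        "--dataset_images", images_dir,
        "--dataset_annotations", annotations_dir,
        "--dataset_name", dataset_name,
        "--split", "valid",  # assumption: taken from the UTK example
        "--batch-size", "512",
        "--checkpoint", checkpoint,
        "--half",
        "--with-persons",
        "--device", device,
    ]


cmd = build_eval_command(
    "utk",
    "/path/to/dataset/utk/images",
    "/path/to/dataset/utk/annotation",
)
print(shlex.join(cmd))  # shell-quoted command line for inspection
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the evaluation.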
If you use our models, code, or dataset, we kindly request that you cite the following paper and give the repository a ⭐
@article{mivolo2023,
Author = {Maksim Kuprashevich and Irina Tolstykh},
Title = {MiVOLO: Multi-input Transformer for Age and Gender Estimation},
Year = {2023},
Eprint = {arXiv:2307.04616},
}