PyTorch training code for UniQer and the CLEVR Ask environment.
- Ubuntu 16.04 / 18.04 (recommended)
- Python 3.8 (recommended)
- Cuda compatible GPU(s) (recommended)
All required packages can be installed via pipenv.
- Make sure that your PyTorch version is compatible with your CUDA version.
- If you use a different version of PyTorch than the one we used (1.2.0), the ResNet download path may change (check `ivqg/src/modules/image_encoder/imagenet_model.py`).
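As a quick sanity check before training, you can confirm the installed PyTorch build and its CUDA support from inside the pipenv environment (a minimal sketch; the exact version strings depend on your setup):

```python
import torch

# Print the installed PyTorch version (we used 1.2.0) and the CUDA
# toolkit it was built against, then check that a GPU is visible.
print("PyTorch version:", torch.__version__)
print("Built with CUDA:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())
```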
$ pipenv install --skip-lock
$ pipenv shell
(ivqg) $ wandb login # follow the instruction
---
$ sudo apt install wkhtmltopdf # for pdf visualization
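For reference, wkhtmltopdf converts an HTML page into a PDF with a single command (the file names below are placeholders, not outputs produced by this repo):

$ wkhtmltopdf page.html page.pdf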
Downloadable versions of the datasets are available:
- CLEVR Ask3
  - CLEVR_Ask3 (ImageDataset) to `data/`
  - M85k3 (Questions) to `data/`
- CLEVR Ask4
  - CLEVR_Ask4 (ImageDataset) to `data/`
  - M85k4 (Questions) to `data/`
- Image vectors of CLEVR_Ask3 to `results/`
- Image vectors of CLEVR_Ask4 to `results/`
- Spatial vectors of CLEVR_Ask3 to `results/`
- Spatial vectors of CLEVR_Ask4 to `results/`
- Restricted object list of CLEVR_Ask3 to `data/M85k3/save_data/`
- Restricted object list of CLEVR_Ask4 to `data/M85k4/save_data/`
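Once everything is downloaded, a short script can verify that the expected locations exist (a minimal sketch; it only checks the directories named above, so adjust it if your extracted archives use different names):

```python
from pathlib import Path

# Directories the downloads above should populate.
expected = [
    "data",
    "results",
    "data/M85k3/save_data",
    "data/M85k4/save_data",
]

for d in expected:
    status = "ok" if Path(d).is_dir() else "MISSING"
    print(f"{d}: {status}")
```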
If you want to set up the dataset manually, please see the following document.
(Pre-)trained models and pre-extracted features are also available:
The extracted folder should be placed under the root of the workspace.
Execute training in supervised learning (choose either `ask3_uniqer_supervised.yaml` or `ask4_uniqer_supervised.yaml`):
$ pipenv run python src/main.py --train_single_tf --yaml_path params/ask3_uniqer_supervised.yaml
To evaluate the model, run the following:
$ pipenv run python src/main.py --check_single_tf --yaml_path params/ask3_uniqer_supervised.yaml
Execute training in reinforcement learning (choose either `ask3_uniqer_rl.yaml` or `ask4_uniqer_rl.yaml`):
$ pipenv run python src/main.py --train_rl --yaml_path params/ask3_uniqer_rl.yaml
To evaluate the model, run the following:
$ pipenv run python src/main.py --check_rl --yaml_path params/ask3_uniqer_rl.yaml
- Supervised Learning: 587 epochs (early stopping with 50-epoch patience, up to 1000 epochs)
- Reinforcement Learning: 150 epochs
- Supervised Learning
  - For both the Ask3 and Ask4 datasets, UniQer was able to detect objects that match a given dialogue with a near-perfect F1 score.
  - As for QDT performance, the correct address ratio is higher than the perfect address ratio for both datasets.

| Model | F1 score | Perfect Address | Correct Address |
| --- | --- | --- | --- |
| UniQer (Ask3) | 0.994 | 57.67 % | 86.91 % |
| UniQer (Ask4) | 0.994 | 43.20 % | 69.79 % |
- Reinforcement Learning
  - Ask3 Dataset

| Model Name | New Image Task Success (%) | New Object Task Success (%) |
| --- | --- | --- |
| Baseline | 60.00 ± 6.35 | 59.60 ± 6.87 |
| Ours (vanilla) | 72.98 ± 3.13 | 72.88 ± 3.47 |
| Ours (not unified MLP Guesser) | 69.43 ± 2.75 | 69.50 ± 2.99 |
| Ours (not unified) | 50.61 ± 6.51 | 50.37 ± 6.02 |
| Ours (full) | 84.10 ± 4.41 | 83.96 ± 4.70 |

  - Ask4 Dataset

| Model Name | New Image Task Success (%) | New Object Task Success (%) |
| --- | --- | --- |
| Baseline | 64.75 ± 0.82 | 64.21 ± 0.34 |
| Ours (vanilla) | 67.38 ± 4.18 | 67.01 ± 4.34 |
| Ours (not unified MLP Guesser) | 72.89 ± 5.95 | 72.35 ± 5.94 |
| Ours (not unified) | 65.15 ± 3.33 | 64.25 ± 3.01 |
| Ours (full) | 81.20 ± 4.37 | 80.50 ± 4.86 |
Average runtime for each result and the computing infrastructure used:
- Supervised Learning: 25h (with a Quadro RTX 8000)
- Reinforcement Learning: 20h (with a Quadro RTX 8000)
Issues and pull requests are always welcome!