I am trying to train my DeepLabCut network on some animal videos using my department's GPU resources. I have activated my environment and made sure there are no broken requirements, but no matter what script configuration I use, I still get a FileNotFoundError for pose_cfg.yaml. I have successfully trained this DLC project locally on my CPU, so I know the project structure is sound. I have also confirmed that pose_cfg.yaml exists at the following path within my project:
dlc-models/iteration-0/blinktrackJul31-trainset95shuffle1/train/pose_cfg.yaml
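Because the path above is relative, whether it "exists" depends on the directory it is resolved against. The following is a minimal sketch of a check that could be run inside the job to see what the path resolves to there. The helper name describe_path is made up for illustration, and the sketch assumes the job starts in the project directory, which I have not verified.

```python
from pathlib import Path

# Relative path copied from the error message.
POSE_CFG = Path(
    "dlc-models/iteration-0/blinktrackJul31-trainset95shuffle1/train/pose_cfg.yaml"
)


def describe_path(relative_path, base_dir=None):
    """Report how a relative path resolves against a base directory.

    base_dir defaults to the current working directory, which is what
    open() uses when it is given a relative path.
    (describe_path is a hypothetical helper, not part of DeepLabCut.)
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    absolute = (base / relative_path).resolve()
    return {
        "base_dir": str(base.resolve()),
        "absolute_path": str(absolute),
        "exists": absolute.is_file(),
    }


if __name__ == "__main__":
    # Print the working directory, the resolved path, and whether it is a file.
    for key, value in describe_path(POSE_CFG).items():
        print(f"{key}: {value}")
```

If "exists" comes back False inside the job but True on my local machine, the two runs are resolving the path against different base directories.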
This is the script I currently use to submit the job to the Slurm server, followed by the output from the .out and .err files:
#!/bin/bash
#SBATCH --job-name=deepcut_train
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=10:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --out=deep_cut.out.%J
#SBATCH --err=deep_cut.err.%J
# Load Anaconda
source /opt/anaconda3/etc/profile.d/conda.sh
# Activate environment
conda activate tensorflow-gpu-env
# versions of TensorFlow, Keras, and DeepLabCut
python -c "import tensorflow as tf; print('TensorFlow version:', tf.__version__)"
python -c "import keras; print('Keras version:', keras.__version__)"
python -c "import deeplabcut; print('DeepLabCut version:', deeplabcut.__version__)"
# Create the dataset for training
python -c "import deeplabcut; deeplabcut.create_training_dataset('./config.yaml')"
# Run DeepLabCut training
python -c "import deeplabcut; deeplabcut.train_network('./config.yaml', maxiters=150000, displayiters=10000, saveiters=20000)"
# check GPU availability
python -c "import tensorflow as tf; print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"
.out:
TensorFlow version: 2.11.0
Keras version: 2.11.0
Loading DLC 2.3.6...
DLC loaded in light mode; you cannot use any GUI (labeling, relabeling and standalone GUI)
DeepLabCut version: 2.3.6
Loading DLC 2.3.6...
DLC loaded in light mode; you cannot use any GUI (labeling, relabeling and standalone GUI)
Selecting single-animal trainer
Num GPUs Available: 1
.err:
2024-09-02 17:57:33.982224: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-02 17:57:35.121651: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-09-02 17:57:35.121741: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-09-02 17:57:35.121753: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2024-09-02 17:57:37.515305: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-02 17:57:38.471934: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-09-02 17:57:38.471996: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-09-02 17:57:38.472003: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2024-09-02 17:57:40.213478: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-02 17:57:41.161126: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-09-02 17:57:41.161188: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-09-02 17:57:41.161195: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2024-09-02 17:57:47.390557: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-02 17:57:48.351630: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-09-02 17:57:48.351734: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-09-02 17:57:48.351756: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/impacs/std31/.conda/envs/tensorflow-gpu-env/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/training.py", line 223, in train_network
raise e
File "/impacs/std31/.conda/envs/tensorflow-gpu-env/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/training.py", line 219, in train_network
allow_growth=allow_growth,
File "/impacs/std31/.conda/envs/tensorflow-gpu-env/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/core/train.py", line 161, in train
cfg = load_config(config_yaml)
File "/impacs/std31/.conda/envs/tensorflow-gpu-env/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/config.py", line 71, in load_config
return cfg_from_file(filename)
File "/impacs/std31/.conda/envs/tensorflow-gpu-env/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/config.py", line 49, in cfg_from_file
with open(filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'dlc-models/iteration-0/blinktrackJul31-trainset95shuffle1/train/pose_cfg.yaml'
2024-09-02 17:57:54.125175: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-02 17:57:55.056195: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-09-02 17:57:55.056258: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-09-02 17:57:55.056264: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
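One thing I have not ruled out: the path in the FileNotFoundError is relative, so it only resolves if the process is in the project root at the moment the file is opened. This is an assumption on my part, not a confirmed cause. The sketch below uses a shortened stand-in path and a made-up helper, can_open_from, to show the general behaviour: the same relative path opens from one working directory and fails from another, while an absolute path works from both.

```python
import os
import tempfile
from pathlib import Path


def can_open_from(cwd, path):
    """Return True if `path` can be opened while the working directory is `cwd`.

    The original working directory is always restored afterwards.
    (can_open_from is a hypothetical helper for illustration only.)
    """
    start = os.getcwd()
    try:
        os.chdir(cwd)
        try:
            with open(path, "r"):
                return True
        except FileNotFoundError:
            return False
    finally:
        os.chdir(start)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:
        # Shortened stand-in for the real dlc-models/... path.
        rel = Path("dlc-models") / "train" / "pose_cfg.yaml"
        train_dir = Path(project) / rel.parent
        train_dir.mkdir(parents=True)
        (Path(project) / rel).write_text("placeholder: true\n")

        # Relative path, opened from the project root.
        print("relative, from project root:", can_open_from(project, rel))
        # Same relative path, opened from inside the train folder.
        print("relative, from train folder:", can_open_from(train_dir, rel))
        # Absolute path, opened from inside the train folder.
        absolute = (Path(project) / rel).resolve()
        print("absolute, from train folder:", can_open_from(train_dir, absolute))
```

If this is what is happening here, using absolute paths (for the config.yaml argument and for any project path stored inside it) would be the thing to test next.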
I am using the following versions:
deeplabcut - 2.3.6
tensorflow - 2.11.0
numpy - 1.21.6
pandas - 1.3.5
Things I have tried:
Resetting and creating a clean environment.
Different versions of the four packages listed above.
Variations in the script, including syntax and the order of commands.
Any advice would be appreciated; I am happy to try any suggestions.