I am trying to train a DeepLabCut network on some animal videos using my department's GPU resources. I have activated my environment and made sure there are no broken requirements, but no matter what script configuration I use, I still get a FileNotFoundError for pose_cfg.yaml. I have successfully trained this DLC project locally on my CPU, so I know the project structure is sound. I have also confirmed that pose_cfg.yaml exists at the following path within my project:

dlc-models/iteration-0/blinktrackJul31-trainset95shuffle1/train/pose_cfg.yaml
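Because the path above is relative, whether it can be opened depends on the working directory of the process that opens it. A minimal diagnostic sketch that could be run inside the job to see how the path resolves; `describe_path` is a hypothetical helper written for this post, not part of DeepLabCut, and `SLURM_SUBMIT_DIR` is the variable Slurm sets to the directory the job was submitted from:

```python
import os
from pathlib import Path


def describe_path(relative_path, base_dir=None):
    """Report how a relative path resolves against a base directory.

    Hypothetical helper for debugging; defaults to the current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    resolved = (base / relative_path).resolve()
    return {
        "base_dir": str(base),
        "resolved": str(resolved),
        "exists": resolved.is_file(),
    }


if __name__ == "__main__":
    pose_cfg = "dlc-models/iteration-0/blinktrackJul31-trainset95shuffle1/train/pose_cfg.yaml"
    # SLURM_SUBMIT_DIR is only defined inside a Slurm job.
    print("Submit dir:", os.environ.get("SLURM_SUBMIT_DIR", "<not in a Slurm job>"))
    print(describe_path(pose_cfg))
```

If `exists` is False when this runs from the job, the working directory on the compute node is probably not the project folder.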

This is the script I currently use to submit the job to the Slurm cluster, followed by the contents of the resulting .out and .err files:

#!/bin/bash
#SBATCH --job-name=deepcut_train
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=10:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --out=deep_cut.out.%J
#SBATCH --err=deep_cut.err.%J

# Load Anaconda
source /opt/anaconda3/etc/profile.d/conda.sh

# Activate environment
conda activate tensorflow-gpu-env

# Print the versions of TensorFlow, Keras, and DeepLabCut
python -c "import tensorflow as tf; print('TensorFlow version:', tf.__version__)"
python -c "import keras; print('Keras version:', keras.__version__)"
python -c "import deeplabcut; print('DeepLabCut version:', deeplabcut.__version__)"

# Create the dataset for training
python -c "import deeplabcut; deeplabcut.create_training_dataset('./config.yaml')"

# Run DeepLabCut training
python -c "import deeplabcut; deeplabcut.train_network('./config.yaml', maxiters=150000, displayiters=10000, saveiters=20000)"

# Check GPU availability (note: this runs only after the training command has returned)
python -c "import tensorflow as tf; print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"

.out:

TensorFlow version: 2.11.0
Keras version: 2.11.0
Loading DLC 2.3.6...
DLC loaded in light mode; you cannot use any GUI (labeling, relabeling and standalone GUI)
DeepLabCut version: 2.3.6
Loading DLC 2.3.6...
DLC loaded in light mode; you cannot use any GUI (labeling, relabeling and standalone GUI)
Selecting single-animal trainer
Num GPUs Available:  1

.err:

  2024-09-02 17:57:33.982224: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2024-09-02 17:57:35.121651: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
    2024-09-02 17:57:35.121741: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
    2024-09-02 17:57:35.121753: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
    2024-09-02 17:57:37.515305: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2024-09-02 17:57:38.471934: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
    2024-09-02 17:57:38.471996: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
    2024-09-02 17:57:38.472003: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
    2024-09-02 17:57:40.213478: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2024-09-02 17:57:41.161126: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
    2024-09-02 17:57:41.161188: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
    2024-09-02 17:57:41.161195: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
    2024-09-02 17:57:47.390557: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2024-09-02 17:57:48.351630: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
    2024-09-02 17:57:48.351734: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
    2024-09-02 17:57:48.351756: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/impacs/std31/.conda/envs/tensorflow-gpu-env/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/training.py", line 223, in train_network
        raise e
      File "/impacs/std31/.conda/envs/tensorflow-gpu-env/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/training.py", line 219, in train_network
        allow_growth=allow_growth,
      File "/impacs/std31/.conda/envs/tensorflow-gpu-env/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/core/train.py", line 161, in train
        cfg = load_config(config_yaml)
      File "/impacs/std31/.conda/envs/tensorflow-gpu-env/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/config.py", line 71, in load_config
        return cfg_from_file(filename)
      File "/impacs/std31/.conda/envs/tensorflow-gpu-env/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/config.py", line 49, in cfg_from_file
        with open(filename, "r") as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'dlc-models/iteration-0/blinktrackJul31-trainset95shuffle1/train/pose_cfg.yaml'
    2024-09-02 17:57:54.125175: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2024-09-02 17:57:55.056195: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
    2024-09-02 17:57:55.056258: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
    2024-09-02 17:57:55.056264: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
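The path in the FileNotFoundError is relative ('dlc-models/...'), which suggests it was built from the relative './config.yaml' passed to train_network. If the working directory changes between building that path and opening the file (an assumption, not verified here against the DLC 2.3.6 source), the open call would fail even though the file exists. One thing to try is to resolve the config path to an absolute path first. A sketch under that assumption; `absolute_config` and `train_with_absolute_config` are hypothetical names, and the `train_network` arguments are the ones already used in the script:

```python
from pathlib import Path


def absolute_config(config_path="./config.yaml"):
    """Return an absolute path string for a project config file.

    Raises FileNotFoundError early if the file is missing, so a wrong
    working directory is reported before training starts.
    """
    path = Path(config_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"config not found: {path}")
    return str(path)


def train_with_absolute_config(config_path="./config.yaml"):
    # Imported lazily so absolute_config can be used without DeepLabCut installed.
    import deeplabcut

    deeplabcut.train_network(
        absolute_config(config_path),
        maxiters=150000,
        displayiters=10000,
        saveiters=20000,
    )
```

In the batch script this would replace the inline `python -c` training call, for example by saving the functions in a file in the project folder and calling `train_with_absolute_config()` from there.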

I am using the following versions:

deeplabcut 2.3.6, tensorflow 2.11.0, numpy 1.21.6, pandas 1.3.5

Things I have tried:

Recreating the conda environment from scratch; using different versions of the four packages listed above; and varying the script (syntax, order of commands, etc.).

Any advice would be appreciated.
