Temporal Context Aggregation for Video Retrieval with Contrastive Learning

By Jie Shao*, Xin Wen*, Bingchen Zhao and Xiangyang Xue (*: equal contribution)

This is the official PyTorch implementation of the paper "Temporal Context Aggregation for Video Retrieval with Contrastive Learning".

Introduction

In this paper, we propose TCA (Temporal Context Aggregation for Video Retrieval), a video representation learning framework that incorporates long-range temporal information between frame-level features using the self-attention mechanism.

To train it on video retrieval datasets, we propose a supervised contrastive learning method that performs automatic hard negative mining and utilizes the memory bank mechanism to increase the capacity of negative samples.

The proposed method shows a signiﬁcant performance advantage (∼17% mAP on FIVR-200K) over state-of-the-art methods with video-level features, and deliver competitive results with 22x faster inference time comparing with frame-level features.

Getting Started

Requirements

Currently, we only tested the code compacity with the following dependencies:

Python 3.7
PyTorch == 1.4.0
Torchvision == 0.5.0
CUDA == 10.1
Other dependencies

Installation

Clone this repo:

git clone https://github.com/xwen99/temporal_context_aggregation.git
cd temporal_context_aggregation

Install the dependencies:

pip install -r requirements.txt

Preparing the Data

Please follow the instructions in the pre-processing folder.

Training

Example training script on the VCDB dataset on an 8-gpu machine:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 horovodrun -np 8 -H localhost:8 \
python train.py \
--annotation_path datasets/vcdb.pickle \
--feature_path PATH/TO/YOUR/DATASET \
--model_path PATH/TO/YOUR/MODEL \
--num_clusters 256 \
--num_layers 1 \
--output_dim 1024 \
--normalize_input \
--neg_num 16 \
--epochs 40 \
--batch_sz 64 \
--learning_rate 1e-5 \
--momentum 0.9 \
--weight_decay 1e-4 \
--pca_components 1024 \
--padding_size 300 \
--num_readers 32 \
--num_workers 1 \
--moco_k 4096 \
--moco_m 0. \
--moco_t 1.0 \
--print-freq 1 \
--use-adasum \
--fp16-allreduce \

Evaluation

Example evaluation script on the FIVR-5K dataset:

python3 evaluation.py \
--dataset FIVR-5K \
--pca_components 1024 \
--num_clusters 256 \
--num_layers 1 \
--output_dim 1024 \
--padding_size 300 \
--metric cosine \
--model_path PATH/TO/YOUR/MODEL \
--feature_path PATH/TO/YOUR/DATASET \

Acknowledgement

Our codebase builds upon several existing publicly available codes. Specifically, we have modified and integrated the following repos into this project:

Citation

@InProceedings{Shao_2021_WACV,
    author    = {Shao, Jie and Wen, Xin and Zhao, Bingchen and Xue, Xiangyang},
    title     = {Temporal Context Aggregation for Video Retrieval With Contrastive Learning},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2021},
    pages     = {3268-3278}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contact

Xin Wen: [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
datasets		datasets
pre_processing		pre_processing
LICENSE		LICENSE
README.md		README.md
data.py		data.py
evaluation.py		evaluation.py
model.py		model.py
requirements.txt		requirements.txt
teaser.jpg		teaser.jpg
train.py		train.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Temporal Context Aggregation for Video Retrieval with Contrastive Learning

Introduction

Getting Started

Requirements

Installation

Preparing the Data

Training

Evaluation

Acknowledgement

Citation

License

Contact

About

Releases

Packages

Languages

License

xwen99/temporal_context_aggregation

Folders and files

Latest commit

History

Repository files navigation

Temporal Context Aggregation for Video Retrieval with Contrastive Learning

Introduction

Getting Started

Requirements

Installation

Preparing the Data

Training

Evaluation

Acknowledgement

Citation

License

Contact

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages