captioning

Here are 76 public repositories matching this topic...

facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

deep-learning dialog pytorch vqa pretrained-models captioning multimodal multi-tasking textvqa hateful-memes

Updated Nov 15, 2024
Python

roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL

transformers vqa objectdetection captioning fine-tuning multimodal vision-and-language phi-3-vision paligemma florence-2

Updated Dec 14, 2024
Python

ltguo19 / VSUA-Captioning

Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019

nlp deep-learning pytorch captioning language-generation

Updated Oct 18, 2019
Python

fpgaminer / joycaption

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

vlm captioning joycaption

Updated Nov 29, 2024
Python

DavidHuji / CapDec

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

clip zero-shot-learning captioning multimodal-deep-learning gpt-2 clipcap

Updated Jan 28, 2024
Python

Labbeti / aac-datasets

Audio Captioning datasets for PyTorch.

audio deep-learning pytorch dataset caption datasets captioning audio-captioning

Updated Nov 4, 2024
Python

mitvis / vistext

VisText is a benchmark dataset for semantically rich chart captioning.

charts dataset captioning-images captioning t5

Updated Oct 3, 2023
Jupyter Notebook

drethage / fully-convolutional-point-network

Fully-Convolutional Point Networks for Large-Scale Point Clouds

deep-neural-networks computer-vision deep-learning point-cloud point-clouds semantic-segmentation meshes 3d captioning

Updated Mar 22, 2019
Python

audio-captioning / clotho-dataset

Python code for handling the Clotho dataset.

audio natural-language-processing deep-learning audio-signal-processing captioning audio-captioning clotho-dataset

Updated Nov 24, 2020
Python

HaydenFaulkner / Tennis

A Tennis dataset and models for event detection & commentary generation

machine-learning video computer-vision mxnet dataset tennis gluon sportsanalytics fine-grained captioning eventdetection

Updated Aug 17, 2020
Python

wangleihitcs / MedicalReportGeneration

A base TensorFlow project for medical report generation

tensorflow-models captioning medical-report-generate

Updated Jun 16, 2019
Python

Mauville / MedCLIP

Medical image captioning using OpenAI's CLIP

machine-learning deep-learning medical-imaging clip captioning what-a-challenge-this-was

Updated Mar 7, 2023
Jupyter Notebook

ParitoshParmar / MTL-AQA

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

pytorch video-processing lstm representation-learning action-recognition video-understanding c3d video-captioning captioning fine-grained-classification multitask-learning dilated-convolution action-quality-assessment mtl-aqa fine-grained-action-recognition dilated-c3d

Updated Nov 10, 2024
Python

TheShadow29 / VidSitu

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

nlp video vision srl captioning captioning-videos vision-and-language grounding video-language event-relations semantic-roles

Updated Aug 17, 2021
Python

aimagelab / pacscore

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation. CVPR 2023

computer-vision cvpr captioning-images captioning captioning-videos vision-and-language cvpr2023

Updated Oct 29, 2024
Python

42lux / CaptainCaption

A Gradio-based image captioning tool that uses the GPT-4-Vision API to generate detailed descriptions of images.

tagging gradio captioning openai-api gpt-4-vision

Updated Nov 20, 2024
Python

Chen-Yang-Liu / Awesome-RS-Temporal-VLM

A Comprehensive Survey on Remote Sensing Temporal Vision-Language Models

captioning change-detection multimodal-deep-learning

Updated Dec 6, 2024

lucidrains / AoA-pytorch

A PyTorch implementation of the Attention on Attention module (both self and guided variants), for Visual Question Answering

vqa attention attention-mechanism captioning visual-question-answering

Updated Nov 8, 2020
Python

DavidMChan / caption-by-committee

Using LLMs and pre-trained caption models for super-human performance on image captioning.

python machine-learning image ai deep-learning captioning chatgpt

Updated Oct 13, 2023
Python

audio-captioning / dcase-2020-baseline

Audio captioning baseline system for DCASE 2020 challenge.

machine-learning deep-neural-networks deep-learning signal-processing audio-signal-processing captioning dcase machine-listening audio-captioning dcase2020

Updated Aug 22, 2023
Python
