Kaif10/Automatic-Image-captioning-with-AutoEncoders

Automatic Image captioning with Auto-Encoders

Building the model

The image captioning architecture consists of three models:

A CNN: extracts the image features.

A TransformerEncoder: takes the extracted image features and generates a new representation of them.

A TransformerDecoder: takes the encoder output and the text data (sequences) as inputs and learns to generate the caption.
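To make the data flow between the three models concrete, here is a minimal NumPy sketch. It is not the repository's code: the patch-projection "CNN", the single-layer attention blocks, the random weights, and every name and size below (`cnn_features`, `encoder`, `decoder`, `vocab_size`, `dim`) are illustrative assumptions. It only shows which tensor goes where and what shape it has.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; all weights are random placeholders


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, mask=None):
    # Scaled dot-product attention; mask is True where attention is allowed.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v


def cnn_features(image, patch=8, dim=16):
    # Hypothetical stand-in for a CNN backbone: cut the image into patches
    # and project each patch linearly to a feature vector.
    h, w, c = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)
    proj = rng.normal(scale=0.1, size=(patch * patch * c, dim))
    return patches @ proj  # (num_regions, dim)


def encoder(features):
    # Self-attention over image regions, with a residual connection.
    return features + attention(features, features, features)


def decoder(token_ids, encoded, embed, out_proj):
    x = embed[token_ids]  # (seq_len, dim)
    n = len(token_ids)
    causal = np.tril(np.ones((n, n), dtype=bool))  # no peeking at later tokens
    x = x + attention(x, x, x, mask=causal)        # masked self-attention
    x = x + attention(x, encoded, encoded)         # cross-attention to the image
    return softmax(x @ out_proj)                   # (seq_len, vocab_size)


vocab_size, dim = 50, 16
image = rng.random((32, 32, 3))
embed = rng.normal(scale=0.1, size=(vocab_size, dim))
out_proj = rng.normal(scale=0.1, size=(dim, vocab_size))

features = cnn_features(image, patch=8, dim=dim)  # CNN output
encoded = encoder(features)                       # new representation
caption_so_far = np.array([1, 7, 12])             # caption as integer ids
probs = decoder(caption_so_far, encoded, embed, out_proj)
next_token = int(probs[-1].argmax())              # most likely next word id
```

Each row of `probs` is a distribution over the vocabulary for the next token at that position. A trained model would replace the random weights with learned ones and would typically stack several such layers.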

Short summary of model

CNN (extracts image features) >> TransformerEncoder (produces a new representation of the CNN output) >> TransformerDecoder (takes the encoder output plus the text data in integer sequence format and learns to generate the caption for each image)
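The "integer sequence format" for the text data can be illustrated with a small pure-Python sketch. The special tokens, the whitespace tokenizer, and the helper names `build_vocab` and `encode` are assumptions made for this example, not taken from the repository. The last line shows the usual way a decoder is trained on such sequences: the input is the caption shifted one step behind the target.

```python
def build_vocab(captions):
    # Reserve ids for padding, sequence boundaries, and unknown words.
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for caption in captions:
        for word in caption.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(caption, vocab, max_len):
    # Map words to ids, then truncate or pad to a fixed length.
    words = ["<start>"] + caption.lower().split() + ["<end>"]
    ids = [vocab.get(w, vocab["<unk>"]) for w in words][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


captions = ["a dog runs on grass", "a cat sits on a mat"]
vocab = build_vocab(captions)
seq = encode(captions[0], vocab, max_len=8)
# seq == [1, 4, 5, 6, 7, 8, 2, 0]

# The decoder sees tokens 0..n-1 and is trained to predict tokens 1..n.
decoder_input, decoder_target = seq[:-1], seq[1:]
```

At each position the decoder receives the words so far and the target is the next word, which is why the two lists are offset by one.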
