Coming to the main model, the image captioning architecture consists of three components:
- A CNN: used to extract the image features.
- A TransformerEncoder: the extracted image features are passed to a Transformer-based encoder, which generates a new representation of the inputs.
- A TransformerDecoder: this model takes the encoder output and the text data (token sequences) as inputs and learns to generate the caption.
In short: the CNN extracts features >> the TransformerEncoder builds a new representation of the CNN output >> the TransformerDecoder takes the encoder outputs plus the text data (in integer sequence format) and learns to generate captions corresponding to the images.
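To make this wiring concrete, here is a minimal sketch in TensorFlow/Keras. Everything in it is illustrative: the EfficientNetB0 backbone, the use of a single encoder and decoder block, and all hyperparameters (`EMBED_DIM`, `SEQ_LENGTH`, `VOCAB_SIZE`, etc.) are assumptions for demonstration, not the exact configuration of the model described above.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative dimensions (assumed, not the model's actual configuration).
IMAGE_SIZE = (299, 299)
EMBED_DIM = 512
NUM_HEADS = 2
SEQ_LENGTH = 25
VOCAB_SIZE = 10000

def get_cnn_model():
    # Frozen pretrained CNN used purely as a feature extractor.
    base = keras.applications.EfficientNetB0(
        input_shape=(*IMAGE_SIZE, 3), include_top=False, weights="imagenet"
    )
    base.trainable = False
    # Flatten the spatial grid into a sequence of feature vectors
    # so the Transformer encoder can attend over image regions.
    out = layers.Reshape((-1, base.output.shape[-1]))(base.output)
    return keras.Model(base.input, out)

class TransformerEncoderBlock(layers.Layer):
    """Re-encodes the CNN features with self-attention."""
    def __init__(self, embed_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.proj = layers.Dense(embed_dim, activation="relu")
        self.attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.norm = layers.LayerNormalization()

    def call(self, inputs):
        x = self.proj(inputs)  # project CNN channels down to EMBED_DIM
        attn_out = self.attn(query=x, value=x, key=x)
        return self.norm(x + attn_out)

class TransformerDecoderBlock(layers.Layer):
    """Attends over caption tokens (causally) and the encoded image."""
    def __init__(self, embed_dim, num_heads, vocab_size, seq_length, **kwargs):
        super().__init__(**kwargs)
        self.token_emb = layers.Embedding(vocab_size, embed_dim)
        self.pos_emb = layers.Embedding(seq_length, embed_dim)
        self.self_attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.cross_attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.norm1 = layers.LayerNormalization()
        self.norm2 = layers.LayerNormalization()
        self.out = layers.Dense(vocab_size)  # per-position vocabulary logits

    def call(self, tokens, encoder_out):
        positions = tf.range(tf.shape(tokens)[1])
        x = self.token_emb(tokens) + self.pos_emb(positions)
        # Causal self-attention: each position sees only earlier tokens.
        x = self.norm1(x + self.self_attn(query=x, value=x, key=x, use_causal_mask=True))
        # Cross-attention: caption tokens attend to the encoded image features.
        x = self.norm2(x + self.cross_attn(query=x, value=encoder_out, key=encoder_out))
        return self.out(x)

# Wire the three pieces together on a dummy batch to check shapes.
cnn = get_cnn_model()
encoder = TransformerEncoderBlock(EMBED_DIM, NUM_HEADS)
decoder = TransformerDecoderBlock(EMBED_DIM, NUM_HEADS, VOCAB_SIZE, SEQ_LENGTH)

images = tf.random.uniform((1, *IMAGE_SIZE, 3))
tokens = tf.random.uniform((1, SEQ_LENGTH), 0, VOCAB_SIZE, dtype=tf.int32)
logits = decoder(tokens, encoder(cnn(images)))
print(logits.shape)  # (1, SEQ_LENGTH, VOCAB_SIZE)
```

Note how the three components mirror the pipeline above: the CNN turns the image into a sequence of region features, the encoder re-represents them, and the decoder combines that representation with the integer token sequences to predict the next caption token at each position.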