Review 2


DETECTING AND CAPTIONING IMAGES USING
CNN-LSTM DEEP NEURAL NETWORKS AND FLASK
UNDER GUIDANCE OF
Dr. P. VENKATESWARA RAO
PROFESSOR & HOD
DEPARTMENT OF CSE

PRESENTED BY
P.YASWANTH SAI (15F11A0563)
SK.ANEEF (16F15A0501)
T.SAI HARISH (15F11A0585)
ABSTRACT
Captioning images automatically is a core capability of the human
visual system. An application that automatically captions the scene
around the user and returns the caption as a plain text message
would offer many advantages. In this paper, we present a model based
on CNN-LSTM neural networks which automatically detects the objects
in images and generates descriptions for them. The model performs
two operations: the first is to detect objects in the image using
convolutional neural networks, and the second is to caption the
image using an RNN-based LSTM. The interface of the model is
developed using a Flask REST API; Flask is a web development
framework for Python. The main use case of this project is to help
the visually impaired understand the surrounding environment and
act accordingly.
EXISTING SYSTEM
We discuss three papers to define the existing system:

Deep Visual-Semantic Alignments for Generating Image Descriptions
Sentences act as weak labels: contiguous sequences of words correspond to
some particular (unknown) location in the image.

Show and Tell: A Neural Image Caption Generator
Performs well only on images similar to those in its training dataset.

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
A very complex and time-consuming strategy, which is difficult and costly to
deploy in the real world.
PROPOSED SYSTEM

DETECTING OBJECTS
CNN

CONVERTING TO NATURAL LANGUAGE


RNN & LSTM

GENERATING CAPTIONS

Using rigorous training and pre-trained Python libraries


DATASETS USED

Flickr8k
 8,000 images, each annotated with 5 sentences via AMT (Amazon Mechanical Turk)

 1,000 each for validation and testing

Flickr30k
 30,000 images

 1,000 for validation, 1,000 for testing


MSCOCO
 123,000 images

 5,000 each for validation and testing


SYSTEM REQUIREMENTS

Recommended Hardware Requirements:
 RAM: 4 GB (minimum)

 Hard disk: 500 GB

Recommended Software Requirements:


 Operating System: Windows 7 or later (64-bit), Linux, or macOS

 Web Interface: Flask REST API (Python web framework)

 Programming Language: Python

 Libraries: TensorFlow, Keras, NumPy, PIL, Flask, captionBot

 Browser: Chrome, Firefox


ARCHITECTURE DIAGRAM OF THE PROJECT
MODULAR DIVISION OF THE PROJECT

Creating pre-trained model (Transfer Learning)

Object detection

Sentence Generation

Ranking based caption retrieval

Deployment to Web Server


TRANSFER LEARNING
• Transfer learning is a popular method in
computer vision because it allows us to build
accurate models in a time-saving manner.
• With transfer learning, instead of starting the
learning process from scratch, you start from
patterns that were learned when solving a
different problem.
• In computer vision, transfer learning is usually
applied through the use of pre-trained
models.
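For example, with TensorFlow/Keras (assumed here), a pre-trained VGG16 can be turned into a frozen feature extractor. This is only a sketch: in the real project `weights="imagenet"` would be passed so the filters come pre-trained; `weights=None` below just avoids the weight download for illustration.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

# Build VGG16. In practice pass weights="imagenet" so the convolutional
# filters are pre-trained; weights=None here only avoids the download.
base = VGG16(weights=None)

# Drop the final 1000-way ImageNet classifier and keep the 4096-d "fc2"
# layer as the image feature extractor.
extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)
extractor.trainable = False  # freeze: we reuse these layers, not retrain them
```

Each image is then passed through `extractor` once, and the resulting 4096-d vector is what the captioning model consumes (the project's Features.pkl file would hold these vectors).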
OBJECT DETECTION/WORD DETECTION
SENTENCE GENERATION
RANKING BASED CAPTION RETRIEVAL
NLP PROBABILISTIC MODEL
WORKING OF CNN

• CNN stands for CONVOLUTIONAL NEURAL NETWORK.


WORKING OF RNN
• RNN stands for RECURRENT NEURAL NETWORK.
• St = f(U·Xt + W·St-1): the hidden state at step t combines the current input Xt with the previous state St-1.
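A minimal NumPy sketch of this recurrence, unrolled over a toy sequence (dimensions and random weights are for illustration only):

```python
import numpy as np

def rnn_step(x_t, s_prev, U, W, b):
    # One recurrence step: mix the current input (U @ x_t) with the
    # previous hidden state (W @ s_prev), squashed by tanh.
    return np.tanh(U @ x_t + W @ s_prev + b)

# Toy dimensions: 3-d inputs, 4-d hidden state.
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))
W = rng.normal(size=(4, 4))
b = np.zeros(4)

s = np.zeros(4)                      # initial state S0
for x in rng.normal(size=(5, 3)):    # unroll over a 5-step sequence
    s = rnn_step(x, s, U, W, b)
```

Because the same U and W are reused at every step, the state s carries information from earlier inputs forward, which is what lets the network model word order.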
WORKING OF LSTM

• LSTM stands for LONG SHORT TERM MEMORY.
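As a hedged sketch of how the CNN and LSTM sides can be merged into one caption model in Keras, the layer sizes and the `vocab_size`/`max_len` values below are illustrative assumptions, not the project's exact configuration:

```python
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, add)
from tensorflow.keras.models import Model

vocab_size, max_len = 5000, 34   # assumed values for illustration

# Image branch: 4096-d CNN features squeezed to 256 dims.
img_in = Input(shape=(4096,))
img = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Text branch: the partial caption so far, embedded and run through an LSTM.
txt_in = Input(shape=(max_len,))
emb = Embedding(vocab_size, 256, mask_zero=True)(txt_in)
txt = LSTM(256)(Dropout(0.5)(emb))

# Merge both branches and predict the next word of the caption.
merged = Dense(256, activation="relu")(add([img, txt]))
out = Dense(vocab_size, activation="softmax")(merged)
model = Model(inputs=[img_in, txt_in], outputs=out)
```

Training then pairs each image's feature vector with every prefix of its caption, with the following word as the target.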


PRE TRAINED FILES

• Descriptions.txt
• Features.pkl
• Tokenizer.pkl
• Model.h5
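To sketch how these files fit together at inference time, greedy decoding can be written as below; `predict_next` is a placeholder standing in for the trained model (Model.h5) plus the tokenizer (Tokenizer.pkl), and the tiny vocabulary is fabricated for illustration:

```python
def generate_caption(predict_next, id_to_word, start_id, max_len=34):
    # Greedy decoding: starting from 'startseq', repeatedly ask the
    # model for the most probable next word until it emits 'endseq'
    # or the caption reaches max_len words.
    seq, words = [start_id], []
    for _ in range(max_len):
        next_id = predict_next(seq)
        if id_to_word[next_id] == "endseq":
            break
        words.append(id_to_word[next_id])
        seq.append(next_id)
    return " ".join(words)

# Tiny stand-in vocabulary and a fake "model" that replays a fixed caption.
vocab = ["startseq", "a", "dog", "runs", "endseq"]
fake_model = lambda seq: min(len(seq), 4)  # next id = position in caption
print(generate_caption(fake_model, vocab, start_id=0))  # → a dog runs
```

In the real pipeline `predict_next` would pad the sequence to `max_len`, call `model.predict` with the image features, and take the argmax over the vocabulary.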
LIBRARIES USED

• TENSORFLOW
• KERAS
• PICKLE
• NLTK
DEPLOYING AS A WEB APPLICATION

• We have used Flask to deploy our
project as a REST API in the form of
a web application.
• Flask is a Python web application
framework, widely used to deploy
machine learning models.
