SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
(Established under Section 3 of UGC Act, 1956)
JEPPIAAR NAGAR, RAJIV GANDHI SALAI, CHENNAI– 600119
www.sathyabamauniversity.ac.in
_________________________________________________________________________________
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of Pitambara Awadhesh (38110406) and
Reema Rose Toppo (38110458), who carried out the project entitled "Indian Sign Language Recognition
System to Help Mute and Deaf People" under my supervision from _____________ to ____________.
ACKNOWLEDGEMENT
I convey my thanks to Dr. T. Sasikala M.E., Ph.D., Dean, School of Computer Science and
Technology, and Dr. L. Lakshmanan, Head of the Department, Dept. of Computer Science and
Technology, for providing me with the necessary support and details at the right time during the
progressive reviews.
I would like to express my sincere and deep sense of gratitude to my Project Guide, Dr. B. Bharathi,
whose valuable guidance, suggestions, and constant encouragement paved the way for the successful
completion of my project work.
I wish to express my thanks to all Teaching and Non-teaching staff members of the Department of
Computer Science and Engineering who were helpful in many ways for the completion of the
project.
INDIAN SIGN LANGUAGE RECOGNITION SYSTEM TO HELP DEAF AND MUTE
PEOPLE
Submitted in partial fulfillment of the requirements for the award of the Bachelor of Engineering
degree in Computer Science and Engineering
By
PITAMBARA AWADHESH (38110406)
REEMA ROSE TOPPO (38110458)
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
JEPPIAAR NAGAR, RAJIV GANDHI SALAI, CHENNAI - 600119
MAY - 2022
ABSTRACT
The COVID-19 pandemic changed everything: we moved from offline to online mode
almost overnight. Some people found it easy to adjust to this way of life, but many
others with disabilities never did and still find it difficult to express their ideas in
online meetings. To solve this problem, we have proposed a Convolutional Neural
Network (CNN) based model for Indian sign language recognition. In our proposed
method, a mute or deaf person can interact with the camera integrated into a computer
and use gestures that are recognized and converted to text for others to understand.
For this, we created a sample dataset and pre-processed it using a label binarizer.
Afterward, feature extraction was done using two models, one for the palm region and
the other for the fingers. This dataset was then fed into a custom-made CNN model
trained with a learning rate of 0.001, a batch size of 128, and 10 epochs. The model
performed well, reaching 93% accuracy in recognizing hand gestures. Hence, our
proposed model can be integrated into other online meeting platforms for the target
audience to use.
CONTENTS
CHAPTERS TITLE PAGE
Bonafide 2
Declaration 3
Acknowledgement 4
Certificate 5
Abstract 6
1 INTRODUCTION
1.1 Basics 8
1.2 Python3 8
1.3 Python Modules 9-11
1.4 Deep Learning and Types 12-13
1.5 OpenCV 14
1.6 TensorFlow 15
2 Literature Survey
2.1 Study of Vision-Based Hand Gesture Recognition Using Indian Sign Language 16
2.2 Real-Time Hand Gesture Recognition Using Finger Segmentation 17
CHAPTER 1 - INTRODUCTION
1.1 Basics
The goal of this project was to build a neural network able to classify which word of the Indian
Sign Language (ISL) is being signed, given an image of a signing hand. This project is a first step
towards building a possible sign language translator, which can take communications in sign
language and translate them into written and oral language. Such a translator would greatly lower
the barrier for many deaf and mute individuals to be able to better communicate with others in
day-to-day interactions.
This goal is further motivated by the isolation that is felt within the deaf community. Loneliness
and depression exist at higher rates among the deaf population, especially when they are
immersed in a hearing world. Large barriers that profoundly affect life quality stem from the
communication disconnect between the deaf and the hearing. Some examples are information
deprivation, limitation of social connections, and difficulty integrating in society.
Most research implementations for this task have used depth maps generated by the depth
camera and high-resolution images. The objective of this project was to see if neural networks are
able to classify signed ISL letters using an input video sequence of a person taken with a personal
device such as a laptop webcam. This is in alignment with the motivation as this would make a
future implementation of a real-time ISL-to-oral/written language translator practical in everyday
situations.
1.2 Python3
Python is an interpreted, high-level, general-purpose programming language. Its design
philosophy emphasizes code readability, and its language constructs and object-oriented
approach aim to help programmers write clear, logical code for small- and large-scale
projects. Python is dynamically typed and garbage-collected. It supports multiple
programming paradigms, including structured (particularly procedural), object-oriented,
and functional programming. Python is often described as a "batteries included"
language due to its comprehensive standard library.
1.3 Python Modules
(a) NumPy
NumPy is a python library used for working with arrays. It also has functions for
working in the domain of linear algebra, Fourier transform, and matrices. NumPy
was created in 2005 by Travis Oliphant. It is an open-source project and you can
use it freely. NumPy stands for Numerical Python.
Why NumPy?
In Python, we have lists that serve the purpose of arrays, but they are slow to process.
NumPy aims to provide an array object that is up to 50x faster than traditional Python
lists. The array object in NumPy is called ndarray; it provides a lot of supporting
functions that make working with ndarray very easy. Arrays are very frequently used in
data science, where speed and resources are very important.
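As a minimal illustration (not part of the original report), the following Python sketch contrasts a vectorized ndarray operation with a plain Python list; the array values are arbitrary:

import numpy as np

# Create an ndarray from a Python list and apply a vectorized operation.
# Vectorized operations run in compiled C code, which is why NumPy can be
# much faster than looping over a plain Python list.
a = np.array([1, 2, 3, 4, 5])
print(a * 2)       # [ 2  4  6  8 10]
print(a.mean())    # 3.0
print(type(a))     # <class 'numpy.ndarray'>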
(b) Pandas
Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built
on the NumPy package and its key data structure is called the DataFrame.
DataFrames allow you to store and manipulate tabular data in rows of observations
and columns of variables.
Figure 2: Pandas DataFrames
▪ Tools for reading and writing data between in-memory data structures and different file formats.
▪ Data alignment and integrated handling of missing data.
▪ Hierarchical axis indexing to work with high-dimensional data in a lower-dimensional data structure.
▪ Time-series functionality: date-range generation and frequency conversion, moving-window
statistics, moving-window linear regressions, date shifting, and lagging.
▪ Provides data filtration.
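A small, self-contained sketch of these features; the gesture names, sample counts, and file name are made up for illustration:

import pandas as pd

# Build a DataFrame: rows of observations, columns of variables.
df = pd.DataFrame({
    "gesture": ["hello", "thanks", "yes"],
    "samples": [400, 350, 450],
})

# Data filtration: keep only the rows matching a condition.
print(df[df["samples"] >= 400])

# Reading and writing between in-memory structures and file formats.
df.to_csv("gestures.csv", index=False)
df2 = pd.read_csv("gestures.csv")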
(c) Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python. Matplotlib produces publication-quality figures in a variety of
hardcopy formats and interactive environments across platforms. Matplotlib can be
used in Python scripts, the Python and IPython shell, web application servers, and
various graphical user interface toolkits.
• High-level abstractions for structuring multi-plot grids that let you easily
build complex visualizations
• Concise control over matplotlib figure styling with several built-in themes
• Tools for choosing color palettes that faithfully reveal patterns in your data
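A minimal plotting sketch; the accuracy values below are placeholders for illustration, not measurements from this project:

import matplotlib.pyplot as plt

# Plot a simple line chart and save it as a high-resolution figure.
epochs = [1, 2, 3, 4, 5]
accuracy = [0.62, 0.74, 0.85, 0.90, 0.93]

plt.plot(epochs, accuracy, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Training accuracy per epoch")
plt.savefig("accuracy.png", dpi=300)
plt.show()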
1.4 Deep Learning and Types
Deep learning drives many artificial intelligence (AI) applications and services that
improve automation, performing analytical and physical tasks without human
intervention.
Deep neural networks consist of multiple layers of interconnected nodes, each building
upon the previous layer to refine and optimize the prediction or categorization. This
progression of computations through the network is called forward propagation. The
input and output layers of a deep neural network are called visible layers. The input
layer is where the deep learning model ingests the data for processing, and the output
layer is where the final prediction or classification is made.
1. Input layer – receives the input features of the dataset, which are known to us.
2. Hidden layer – intermediate layers of neurons that, much like the brain's hidden
neurons, are trained to transform the input.
3. Output layer – produces the value or class that we want to predict.
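As a rough illustration of forward propagation through these layers, here is a minimal NumPy sketch; the layer sizes and random weights are arbitrary:

import numpy as np

# One forward-propagation step:
# hidden = relu(x @ W1 + b1), output = hidden @ W2 + b2.
rng = np.random.default_rng(0)

x = rng.random((1, 4))                     # input layer: 4 features
W1, b1 = rng.random((4, 3)), np.zeros(3)   # 4 inputs -> 3 hidden units
W2, b2 = rng.random((3, 2)), np.zeros(2)   # 3 hidden units -> 2 output classes

hidden = np.maximum(0, x @ W1 + b1)        # ReLU activation in the hidden layer
output = hidden @ W2 + b2                  # raw scores at the output layer
print(output)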
Fig 5: Types of Deep learning algorithms
Many types of deep learning algorithms have been developed over the years, but a few
are frequently used:
Artificial neural networks work like a human brain. The human brain has billions of
neurons, and each neuron has a cell body responsible for computing information,
carrying it forward towards hidden neurons, and producing the final output.
The aim is to minimize the error by adjusting the weights and biases of the
interconnections, a process known as backpropagation. Through backpropagation, the
difference between the desired output and the actual output is reduced to the smallest
possible error.
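A minimal sketch of a single backpropagation step using PyTorch autograd; the toy one-weight model, data point, and step size are illustrative only:

import torch

# Compute the error, propagate gradients backwards, and update weight/bias.
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

x, target = torch.tensor(2.0), torch.tensor(3.0)
pred = w * x + b
loss = (pred - target) ** 2    # squared difference between actual and desired output
loss.backward()                # backpropagation: fills w.grad and b.grad

with torch.no_grad():          # gradient-descent adjustment of weight and bias
    w -= 0.1 * w.grad
    b -= 0.1 * b.grad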
CNN is a supervised type of deep learning, most commonly used in image recognition
and computer vision. A CNN has multiple layers that process and extract important
features from the image. Convolutional neural networks are composed of multiple layers
of artificial neurons. Artificial neurons, a rough imitation of their biological counterparts,
are mathematical functions that calculate the weighted sum of multiple inputs and
output an activation value. When you input an image into a ConvNet, each layer
generates several activation maps that are passed on to the next layer.
The first layer usually extracts basic features such as horizontal or diagonal edges. This
output is passed on to the next layer, which detects more complex features such as
corners or combined edges. As we move deeper into the network, it can identify even
more complex features such as objects, faces, etc.
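A small sketch of stacked convolutional layers turning an image into activation maps; the layer sizes are illustrative and are not this project's model:

import torch
import torch.nn as nn

# Early layers extract simple features (edges); deeper layers combine them.
features = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # low-level edges
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # corners, combined edges
    nn.ReLU(),
    nn.MaxPool2d(2),
)

img = torch.randn(1, 1, 64, 64)   # one 64x64 grayscale image
maps = features(img)              # stack of activation maps
print(maps.shape)                 # torch.Size([1, 32, 16, 16])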
RNN is a type of supervised deep learning where the output from the previous step is
fed as input to the current step. The RNN algorithm is best suited for sequential data,
and it is most often used in image captioning, time-series analysis, natural-language
processing, handwriting recognition, and machine translation.
Fig 8: Recurrent Neural Networks (RNNs)
The most vital feature of an RNN is the hidden state, which memorizes some information
about a sequence. There are mainly four steps to how an RNN works:
1. The output of the hidden state at time t-1 is fed into the input at time t.
2. In the same way, the output at time t is fed into the input at time t+1.
3. An RNN can process inputs of any length.
4. The computation depends on historical sequence data, and the model size does not
increase with input size.
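A minimal PyTorch sketch of these properties; the feature and hidden sizes are arbitrary:

import torch
import torch.nn as nn

# The hidden state from step t-1 is fed back in at step t, so the same RNN
# processes sequences of any length without growing in size.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

seq = torch.randn(1, 5, 8)   # batch of 1, sequence length 5, 8 features
out, h_n = rnn(seq)          # h_n is the final hidden state
print(out.shape, h_n.shape)  # torch.Size([1, 5, 16]) torch.Size([1, 1, 16])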
1.5 OpenCV
OpenCV is a great tool for image processing and performing computer vision tasks. It is
an open-source library that can be used to perform tasks like face detection, object
tracking, landmark detection, and much more. It supports multiple languages, including
Python, Java, and C++. For this project, we limit ourselves to Python.
The library is equipped with hundreds of useful functions and algorithms, all freely
available to us. Some of these functions are really common and are used in almost
every computer vision task, whereas many others remain unexplored and haven't
received much attention yet.
With the help of OpenCV in Python, it is possible to process images and videos easily
and extract useful information from them, as there are lots of functions available. Some
of the common applications are listed below (a short illustrative sketch follows the list):
1. Image Processing:
Images can be read, written, shown, and processed with OpenCV, which can
generate a new image from a given one by changing its shape or color, or by
extracting something useful from it and writing the result into a new image.
2. Face Detection:
Faces can be detected either from a live stream using a web camera or from
locally stored videos/images, utilizing Haar-cascade classifiers.
3. Face Recognition:
4. Object Detection:
OpenCV, together with the YOLO object detection algorithm, can be used to
detect moving or stationary objects in images and videos.
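A hedged sketch of the first two applications, basic image processing plus Haar-cascade face detection from a web camera; the window name and keypress handling are illustrative choices:

import cv2

# Load OpenCV's bundled Haar-cascade classifier for frontal faces.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)        # open the default web camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # image processing step
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):            # press q to quit
        break
cap.release()
cv2.destroyAllWindows()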
1.6 TensorFlow
Fig 9: TensorFlow
TensorFlow works on the basis of data flow graphs that have nodes and edges. As the
execution mechanism is in the form of graphs, it is much easier to execute TensorFlow
code in a distributed manner across a cluster of computers while using GPUs.
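A minimal sketch of graph execution: wrapping a Python function with tf.function compiles it into a data-flow graph of nodes (operations) and edges (tensors); the computation itself is illustrative:

import tensorflow as tf

@tf.function
def affine(x, w, b):
    # Traced once into a graph, then executed as a graph on later calls.
    return tf.matmul(x, w) + b

x = tf.constant([[1.0, 2.0]])
w = tf.constant([[3.0], [4.0]])
b = tf.constant([0.5])
print(affine(x, w, b))   # tf.Tensor([[11.5]], shape=(1, 1), dtype=float32)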
1.6.1 TensorFlow Object Detection API
The TensorFlow Object Detection API is an open-source framework built on top of
TensorFlow that makes it easy to construct, train, and deploy object detection models.
CHAPTER 2 – LITERATURE SURVEY
2.1 Study of Vision-Based Hand Gesture Recognition Using Indian Sign Language
At first, we started by studying the most basic human-computer interaction and how it
actually works. Ghotkar and Kharate explained three techniques and experimented with
static hand-gesture interaction with a desktop/laptop. All these techniques used a
real-time approach with different feature descriptors, such as Fourier Descriptors (FD),
the 7 Hu moments, Convex Hull, and finger detection. Real-time recognition efficiency
was calculated with respect to recognition time for FD and the 7 Hu moments.
There are two major approaches for hand gesture recognition: data-glove-based and
vision-based. Each approach has its limitations and advantages, but vision-based
approaches are more feasible compared to data gloves, as users do not need to wear
cumbersome devices.
2.2 Real-Time Hand Gesture Recognition Using Finger Segmentation (Hindawi)
In this paper, Zhi-Hua Chen and Jung-Tae Kim present an efficient and effective method
for hand gesture recognition. The hand region is detected through the background
subtraction method. Then, the palm and fingers are split so as to recognize the fingers.
After the fingers are recognized, the hand gesture can be classified through a simple
rule classifier.
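A rough OpenCV sketch of the background-subtraction stage only, assuming the MOG2 subtractor as a stand-in for the paper's method; the palm/finger splitting and the rule classifier are not reproduced here:

import cv2

# Separate the hand (foreground) from the scene, then clean up the mask.
subtractor = cv2.createBackgroundSubtractorMOG2()

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)   # foreground mask for the moving hand
    mask = cv2.medianBlur(mask, 5)   # remove salt-and-pepper noise
    cv2.imshow("hand mask", mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()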
CHAPTER 4 – EXPERIMENTAL OR MATERIALS AND METHODS; ALGORITHMS USED
3.2 Dataset and Model Training
The suggested system uses CNN (Convolutional Neural Networks): a four-layer modified
CNN model that starts with three nodes. Modules like torch were used to load, divide,
and test the data during the training phase. The number of epochs was set to ten, and
the device was examined to determine whether CUDA was available; if it wasn't, the
CPU was used. The photo paths and labels were afterwards obtained from the data.csv
file. Adam and cross-entropy loss are the optimizer and criterion used, respectively.
Each predicted class probability is compared with the real intended class result, which
is either 0 or 1, to determine a score/loss that penalises the probability based on how
far it deviates from the actual expected value.

A function called "fit" is defined for training; it is given the data loader, optimizer,
criterion, model, and training data. Another function called "validate" is supplied the
data loader, optimizer, criterion, model, and validation data; it calculates the validation
loss and accuracy. The learning parameters are a learning rate of 0.001, a batch size of
128, and 10 epochs. Within the epoch range, "fit" is called to produce the training
accuracy and loss. After training, the model is saved for future use. A script then opens
the web camera; the gesture shown to it is predicted, and the corresponding word is
displayed on the screen.
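A hedged reconstruction of this training pipeline in PyTorch. The CustomCNN layer sizes, the 64x64 grayscale input, the number of classes, and the construction of train_loader/valid_loader are assumptions; only the optimizer, the criterion, and the stated learning parameters (0.001, 128, 10) come from the text:

import torch
import torch.nn as nn

class CustomCNN(nn.Module):
    """A stand-in for the four-layer modified CNN (layer sizes assumed)."""
    def __init__(self, num_classes=10):          # num_classes is an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # for 64x64 input

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Learning parameters stated in the text.
LR, BATCH_SIZE, EPOCHS = 0.001, 128, 10
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = CustomCNN().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
criterion = nn.CrossEntropyLoss()   # penalises deviation from the true class

def fit(model, dataloader, optimizer, criterion):
    """One training pass; returns average loss and accuracy."""
    model.train()
    total_loss, correct = 0.0, 0
    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        correct += (outputs.argmax(1) == labels).sum().item()
    return total_loss / len(dataloader), correct / len(dataloader.dataset)

def validate(model, dataloader, criterion):
    """Validation pass without weight updates; returns loss and accuracy."""
    model.eval()
    total_loss, correct = 0.0, 0
    with torch.no_grad():
        for images, labels in dataloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            total_loss += criterion(outputs, labels).item()
            correct += (outputs.argmax(1) == labels).sum().item()
    return total_loss / len(dataloader), correct / len(dataloader.dataset)

# train_loader and valid_loader are assumed DataLoaders (batch_size=BATCH_SIZE)
# built from the image paths and labels listed in data.csv.
for epoch in range(EPOCHS):
    train_loss, train_acc = fit(model, train_loader, optimizer, criterion)
    val_loss, val_acc = validate(model, valid_loader, criterion)

torch.save(model.state_dict(), "custom_cnn.pth")  # retain the model for later use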
3.3 Flowchart
Figure 7: Flowchart
CHAPTER 5 – RESULTS AND DISCUSSION; PERFORMANCE ANALYSIS
The data is trained using the CPU; as a result, CUDA is reported as zero (no GPU). The
dataset of 42,000 samples is split into two parts: one for training and the other for
validation. Training set: a collection of samples used to learn and fit the classifier's
parameters (i.e., the weights). Validation set: a set of examples used to tune the
classifier's hyperparameters (the architecture, not the weights), such as the number of
hidden units in a neural network.
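A minimal sketch of this split, assuming `dataset` is the torch Dataset built from data.csv and assuming an 80/20 ratio, which the report does not state:

import torch
from torch.utils.data import random_split

# Split the 42,000 samples into a training part (fits the weights) and a
# validation part (tunes the architecture, not the weights).
n_total = 42000                  # assumes len(dataset) == 42000
n_train = int(0.8 * n_total)     # assumed ratio
train_set, valid_set = random_split(dataset, [n_train, n_total - n_train])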
The model is then fitted with two functions: one for training and the other for validating. This will
produce the results, which will be compared to the target.
5.2 Results
The graph of train loss shows how the error loss varies during training. In mathematical
optimization and decision theory, a loss function (also known as a cost function) maps
an event, or the values of one or more variables, onto a real number that intuitively
represents some "cost" associated with the event. The initial plot shows epochs vs.
error loss. An epoch, in machine learning, refers to one pass of the learning algorithm
over the full training dataset. At epoch 0, the calculated loss is approximately 0.005.
The graph below depicts the initial training accuracy, which indicates that the model was
saved after a checkpoint. At epoch 0, the accuracy attained is approximately 83%; as
training moves forward, it rises to 98%.
CHAPTER 6 – CONCLUSION AND FUTURE WORK
6.1 Conclusion
Using a CustomCNN classifier, the output is a fully trained model with 93 percent
accuracy. The visuals used were in Indian Sign Language, with a total of 1,200 samples.
These photographs are organised into folders based on their meaning. The dataset was
obtained in JPEG format, then label-binarized and pre-processed. We achieved a loss of
roughly 0.10 after training the model. After that, the trained model was saved and
reloaded, and a script was used to enable the web camera. Finally, a hand gesture was
demonstrated, which was effectively detected.
In the future, this might be developed into an application where a person using sign
language can transmit live video to the other end, where it can be understood by the
person watching. When the video is sequenced, the main gesture is captured, classified
as sign language, and the recognized sign is then displayed. The database can also be
extended by adding more images, which will make the system more accurate.
REFERENCES
Python 3: www.python.org/download/releases/3.0/
[1] Study of Vision-Based Hand Gesture Recognition Using Indian Sign Language,
International Journal on Smart Sensing and Intelligent Systems, Vol. 7, No. 1, March 2014.
[2] Real-Time Hand Gesture Recognition Using Finger Segmentation, The Scientific
World Journal, January 2014.
[3] Hand Sign Recognition from Depth Images with Multi-Scale Density Features for
Deaf-Mute Persons, International Conference on Computational Intelligence and Data
Science (ICCIDS 2019).
[4] Indian Sign Language Interpreter Using Image Processing and Machine Learning,
IOP Conference Series: Materials Science and Engineering, 2020.
[6] Hand Gesture Recognition: A Literature Review, IJSRD - International Journal for
Scientific Research & Development, Vol. 8, Issue 2, 2020.
[8] Visual Tracking Utilizing Robust Complementary Learner and Adaptive Refiner,
ScienceDirect, 10 May 2017.
[9] Tracking of Dynamic Gesture Fingertips Position in a Video Sequence, Archives of
Control Sciences, Volume 30 (LXVI), 2020.