ML-BASED REAL-TIME SIGN LANGUAGE INTERPRETER
A Major Project Report
Submitted in partial fulfillment of the requirements for the degree of
Bachelor of Technology
in
Internet of Things
Submitted by
Aditya Nema 0108IO201005
Ajinkya Balwant Soley 0108IO201006
Deepanshu Dixit 0108IO201018
Kushagra Shrivastava 0108IO201028
Shalini Sharma 0108IO201056
Project Guide:
Dr. Shailendra Kumar Shrivastava
Certificate
It is certified that the work contained in the project report entitled “ML-BASED
REAL-TIME SIGN LANGUAGE INTERPRETER” by the following students has been
carried out under my supervision and that this work has not been submitted elsewhere for
a degree.
Project Coordinator Examiner(s)
Declaration
SATI Vidisha
27 November 2023
We declare that this written submission represents our ideas in our own words and where
others’ ideas or words have been included, we have adequately cited and referenced the
original sources. We declare that we have properly and accurately acknowledged all
sources used in the production of this report. We also declare that we have adhered to all
principles of academic honesty and integrity and have not misrepresented or fabricated or
falsified any idea/data/fact/source in our submission. We understand that any violation of
the above will be a cause for disciplinary action by the Institute and can also evoke penal
action from the sources which have thus not been properly cited or from whom proper
permission has not been taken when needed.
Acknowledgements
We would like to extend our sincere gratitude to everyone who was involved in this engi-
neering project. We appreciate the dedication and hard work of our team members, our
project coordinator Prof. Ramratan Ahirwal and project guide, Dr. Shailendra Kumar
Shrivastava, who have been instrumental in helping us reach our goals.
We are thankful for the valuable guidance of Dr. Vipin Patait and Prof. Rashi Kumar, which was key to the completion of this project, and for the guidance and assistance provided by our other supervisors and mentors.
We are also grateful for the support and encouragement of our family and friends, which
helped us throughout this project. Lastly, we thank all those who have helped us through
their advice and constructive feedback.
We feel fortunate to have had such a strong support system throughout this journey and look forward to what lies ahead.
Abstract
The "ML-based Real-time Indian Sign Language Interpreter" project aims to develop an
innovative system that facilitates seamless communication between individuals with hear-
ing impairments and the broader community. Leveraging machine learning (ML) tech-
niques, this real-time interpreter is specifically designed for the Indian Sign Language
(ISL).
The system employs a combination of computer vision and deep learning algorithms to
recognize and interpret gestures made in ISL. A robust dataset of diverse sign gestures is
utilized to train the model, allowing it to adapt and accurately interpret signs performed
by users in real-time. The incorporation of neural networks enhances the system’s ability
to generalize and comprehend variations in signing styles and contexts.
This project proposes a machine learning (ML) based real-time Indian sign language in-
terpreter. The interpreter would use a camera to capture the signer’s hand gestures, and
then use ML to translate those gestures into spoken or written text. The interpreter would
be designed to be accurate, efficient, and user-friendly.
The interpreter would be implemented using a deep learning model. The model would
be trained on a dataset of Indian sign language. The model would be able to recognize a
variety of hand gestures, including single-hand gestures and two-hand gestures.
The interpreter would be evaluated using a variety of metrics, including accuracy, speed,
and user satisfaction. The interpreter would be compared to other existing sign language
interpreters, both human and machine.
The results of this project would have a significant impact on the lives of deaf and hard-
of-hearing people in India. The interpreter would provide them with a new way to com-
municate with the hearing world.
Table of Contents
Acknowledgements vii
Abstract ix
List of Figures xiii
1 Introduction 1
1.1 Project Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Literature Review 7
3 Problem Formulation and Proposed Solution 11
4 Results and Discussion 23
5 Conclusion and Future Work 25
A Appendix 29
A.1 Appendix 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
A.2 Appendix 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
References 33
List of Figures
3.1 MobileNet V1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Load and Preprocess the Dataset . . . . . . . . . . . . . . . . . . . . . . 17
3.4 Training Data configuration . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.5 Model Compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.6 Setting up Epochs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.7 Plotting the points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.8 Plotted points on the graph . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.9 Display the Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Chapter 1
Introduction
The significance of early exposure to sign language becomes evident when consid-
ering the challenges faced by children with hearing loss. The National Library of
Medicine[1] highlights the critical "golden period of learning," wherein the age of detec-
tion of hearing loss and the subsequent use of hearing aids significantly impact language
development. Learning sign language in the early stages of life not only enhances
linguistic growth but also nurtures cognitive and social development, empowering these
children to interact effectively and excel academically. It extends its positive influence to
families, fostering improved communication and comprehension.
At its core, the project leverages a fusion of computer vision and ML algorithms
to create a real-time sign language interpreter. The choice of these technologies is
driven by the need for the system to not only recognize static signs but also dynamically
interpret the fluid and nuanced gestures inherent in ISL. The project’s success hinges on
the robustness of the dataset used for training, encompassing a wide spectrum of sign
gestures, thereby enabling the model to adapt and generalize effectively.
The deep learning model, a central component of the project, undergoes compre-
hensive training on the diverse ISL dataset. This training equips the model to recognize
an extensive repertoire of hand gestures, ranging from single-hand expressions to more
complex two-hand gestures. The project’s evaluation framework is multifaceted, encom-
passing metrics such as accuracy, speed, and user satisfaction. Comparative analyses with
existing sign language interpreters, both human and machine-based, provide insights into
the system’s performance and potential areas of improvement.
Beyond the technological intricacies, the project’s significance lies in its potential
societal impact. By providing a reliable and efficient means of communication, the
interpreter seeks to enhance the quality of life for individuals within the deaf and
hard-of-hearing community in India. It endeavors to break down communication barriers,
fostering greater inclusivity and enabling meaningful participation in a world that is
increasingly reliant on spoken and written language.
This introduction lays the foundation for a comprehensive exploration of the method-
ologies, results, and implications of the "ML-based Real-time Indian Sign Language
Interpreter" project, underscoring its potential to bring about positive and transformative
change in the lives of its users.
1.1 Project Scope
• Gather a diverse dataset of ISL gestures, representing various signing styles and
contexts.
• Preprocess the data to eliminate noise, normalize hand positions, and enhance feature extraction (a brief preprocessing sketch is given after this list).
Feature Extraction:
• Identify hand positions, finger movements, and hand orientations to capture relevant
information.
• Extract features that encompass the spatial arrangement of fingers, palm orientation,
and relative motion between hands.
• Design a deep learning model architecture suitable for sign language recognition.
• Train the deep learning model on the preprocessed dataset, optimizing hyperparam-
eters for high accuracy and generalization.
• Integrate the trained deep learning model into the real-time processing pipeline for
efficient gesture recognition.
• Develop a user-friendly interface for capturing hand gestures and displaying trans-
lated text or speech.
• Integrate the real-time gesture recognition and translation modules into the user
interface.
• Evaluate the system’s performance using metrics such as accuracy, speed, and user
satisfaction.
• Conduct comparative analyses with existing sign language interpreters, both human
and machine.
• Implement measures to protect user privacy, ensuring compliance with ethical stan-
dards and data protection regulations.
• Ensure cultural sensitivity in the design and deployment of the interpreter, respect-
ing the nuances of ISL and the diverse communities it serves.
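As noted in the preprocessing item above, the following is a minimal sketch of how a captured frame could be prepared before feature extraction. It assumes OpenCV and NumPy, the 48x48 grayscale format described later in this report, and a hypothetical preprocess_frame helper; the blur and normalization steps are illustrative choices rather than the exact pipeline used.

import cv2
import numpy as np

IMG_SIZE = 48  # target dimension used elsewhere in this report

def preprocess_frame(frame):
    """Convert a BGR camera frame into a normalized 48x48 grayscale array."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # drop color information
    gray = cv2.GaussianBlur(gray, (3, 3), 0)         # suppress sensor noise
    gray = cv2.resize(gray, (IMG_SIZE, IMG_SIZE))    # normalize spatial size
    return gray.astype("float32") / 255.0            # scale pixel values to [0, 1]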
Chapter 2
Literature Review
According to Michele Friedner’s research, some parents of deaf children in India might
see Sign Language as causing problems in their families. They worry that their deaf chil-
dren, who use Indian Sign Language (ISL), spend more time with other deaf people and
less with their hearing family members. This can strain family relationships, especially
when parents don’t know sign language well. Learning sign language themselves could
help bridge this gap.[2]
In India, the use of sign language in education is not common. Historically, some
people believed that deafness was a punishment for past sins. Deaf individuals even faced
legal restrictions, like being denied the right to inherit property. Due to these beliefs, deaf
education hasn’t been a priority in Indian society. Even though students in Deaf schools
naturally use ISL to communicate, it’s not formally recognized or encouraged by school
authorities.[3]
To change this, there’s a plan to introduce ISL classes and interpreter training pro-
grams. This started with the creation of an ISL Cell at the National Institute for the
Hearing Handicapped (NIHH) in 2001 [4]. So far, this initiative has been successful, but awareness of ISL remains limited. To help deaf individuals communicate better with the broader community, researchers are now working on developing sign language interpreters.
A significant communication gap exists between the deaf and hearing populations in India, primarily due to the lack of sign language knowledge and the limited availability of interpreters [5]. Although efforts are underway to develop sign language recognition systems, real-time recognition remains a substantial challenge. That work introduces an innovative approach that utilizes convolutional neural networks, data augmentation,
batch normalization, dropout, stochastic pooling, and the diffGrad optimizer to recognize
static signs in the Indian sign language alphabet. With remarkable training and validation
accuracy exceeding 99%, this method surpasses the performance of previous systems,
offering promise for more effective communication solutions.
Another study underscores the importance of Indian Sign Language (ISL) for communication with the hearing impaired, as recognized by the RPwD Act 2016 in India [7]. It emphasizes the need for sign language interpreters in government organizations and public sector undertakings. The paper presents a deep learning-based methodology for ISL static alphabet recognition using Convolutional Neural Networks (CNNs), achieving an impressive accuracy of 98.64%, outperforming many existing methods.
A study presented at the 2019 6th International Conference on Image and Signal Processing and their Applications (ISPA) [9] observes that, in the realm of human-computer interaction, hand gestures provide a natural and versatile means of input for various applications. However,
challenges such as the intricate nature of gesture patterns, variations in hand size, diverse
hand postures, and fluctuating environmental lighting can impact the effectiveness of
hand gesture recognition algorithms. The recent integration of deep learning has signif-
icantly elevated the capabilities of image recognition systems, with deep convolutional
neural networks (CNNs) showcasing superior performance in image representation and
classification when compared to traditional machine learning methods. The study presents a comparative analysis of two techniques for American Sign Language hand gesture recognition: the first employs a proposed deep convolutional neural network, while the second incorporates transfer learning using the pre-trained MobileNetV2 model. Both models undergo training and testing with 1,815 segmented
images characterized by color and a black background, encompassing static hand
gestures from five volunteers with variations in scale, lighting, and noise. The outcomes
reveal that the proposed CNN model attains an impressive classification accuracy of
98.9%, demonstrating a 2% enhancement over the CNN model enriched through transfer
learning techniques, which achieved 97.06%.
Chapter 3
Problem Formulation and Proposed Solution
3.1 Objectives
1. Develop a Real-Time Indian Sign Language Interpreter:
Our foremost objective is to create a real-time Indian Sign Language (ISL) interpreter,
thereby enabling instantaneous communication between ISL users and individuals
unfamiliar with sign language. Real-time communication is essential for meaningful
conversations and interactions, allowing for seamless exchanges without undue delays.
To make this communication accessible and effective, our system will feature a user-
friendly interface that can adapt to diverse ISL expressions and signing styles. It will
be designed to be intuitive and easy to use, ensuring that both ISL speakers and those
who are not proficient in sign language can interact smoothly. Moreover, the system’s
robustness and adaptability will be central to its development, enabling it to function
reliably under various environmental conditions and for a wide range of users, ultimately
ensuring inclusivity in communication.
2. Support Multiple Modes of Communication:
The interpreter will present its output as both text and speech so that it can serve a wide range of users, whether
sign language users, those who prefer text-based interactions, or individuals who rely on
spoken language. This comprehensive approach ensures that the project is not limited to
a single mode of communication and can be widely used.
By achieving these objectives, our project aims to create a valuable tool that em-
powers individuals who use the Indian Sign Language and promotes more effective
communication and understanding across a broader spectrum of society.
3.2 Develop a Real Time Sign Language Interpreter
1. MobileNets:
Advantages:
a. Lightweight and Efficient Design: MobileNet is specifically designed to be lightweight
and efficient, making it suitable for applications where computational resources are
limited. Its architecture allows for faster processing without compromising performance.
b. Reduced Model Size and Parameters: MobileNet has a smaller number of parameters
compared to deeper architectures, resulting in a reduced model size. This is advantageous
for scenarios with limited storage capacity and facilitates quicker model deployment.
c. Well-Suited for Real-Time Applications on Resource-Constrained Devices: The
efficient design of MobileNet, both in terms of model size and computational require-
ments, makes it well-suited for real-time applications, especially on devices with limited
resources such as mobile phones and edge devices.
Disadvantages:
a. May Sacrifice Some Accuracy Compared to Deeper Architectures: Due to its
lightweight design, MobileNet may sacrifice a small amount of accuracy compared to
deeper and more complex architectures like traditional CNNs. In scenarios where achiev-
ing the highest possible accuracy is paramount, a trade-off between model efficiency and
precision might need consideration.
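For comparison, the sketch below shows how a MobileNet backbone could be reused through transfer learning with the Keras applications API. Our final system uses a custom CNN instead, and the input shape, dropout rate, and 36-class head shown here are assumptions made for this illustration.

import tensorflow as tf

NUM_CLASSES = 36

base = tf.keras.applications.MobileNet(
    input_shape=(96, 96, 3),   # MobileNet expects three-channel input
    include_top=False,         # drop the ImageNet classification head
    weights="imagenet",
)
base.trainable = False         # freeze the pretrained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])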
2. Convolutional Neural Networks (CNNs):
CNNs are structured to mimic the visual processing performed by the human brain.
Through the use of convolutional layers, the network learns to identify patterns and
features within the input, allowing it to recognize complex structures in images. These
networks have shown remarkable success in tasks like object recognition, image classifi-
cation, and, in the context of the Real-Time Sign Language Interpreter, capturing spatial
details of hand gestures.
The hierarchical architecture of CNNs enables them to automatically learn and ex-
tract increasingly abstract and complex features as the data passes through successive
layers. Non-linear activation functions, pooling layers, and fully connected layers
contribute to the network’s ability to understand and classify visual information.
CNNs are known for their parameter sharing and spatial hierarchies, making them
well-suited for tasks where the spatial arrangement of features is crucial. This character-
istic is particularly valuable in recognizing the unique hand configurations and positions
associated with sign language gestures.
In the Real-Time Sign Language Interpreter project, the CNN serves as a key component
for extracting essential spatial information from live video input, contributing to the
accurate and real-time interpretation of Indian Sign Language gestures. By harnessing the
power of convolutional layers, the CNN precisely identifies intricate patterns, shapes, and
orientations of hands, enabling it to discern the rich vocabulary of Indian Sign Language.
This spatial understanding is crucial for the interpreter to recognize not only static
hand shapes but also dynamic movements, ensuring a comprehensive interpretation of
gestures. The CNN’s role extends beyond mere recognition; it actively contributes to the
system’s adaptability, allowing it to handle diverse signing styles, lighting conditions, and
backgrounds, ultimately enhancing the interpreter’s robustness in real-world scenarios.
The integration of CNN within the hybrid model showcases its versatility, making it an
indispensable tool for fostering inclusive communication for individuals with hearing
impairments.
Advantages:
a. Can Capture Complex Hierarchical Features: CNNs are designed to automatically
learn hierarchical features from raw pixel values. This ability is crucial for tasks like
image classification where the model needs to understand patterns at various levels of
abstraction. In sign language interpretation, capturing the hierarchical features of hand
gestures, finger movements, and spatial relationships is essential for accurate recognition.
b. Flexible Architecture for Customization: CNNs offer a flexible architecture that allows
customization to suit the specific characteristics of your dataset. You can design the
network with multiple convolutional layers, pooling layers, and fully connected layers
to capture and process the unique features of sign language gestures. This flexibility is
advantageous when tailoring the model to the intricacies of the task.
c. Well-Suited for Tasks Requiring High Accuracy: CNNs are known for their ability
to achieve high accuracy in image classification tasks when trained on large and diverse
datasets. In sign language interpretation, where precision is crucial for meaningful
communication, a CNN can be trained to recognize subtle variations in gestures, leading
to higher overall accuracy.
Disadvantages:
a. Larger Model Size: CNNs can have a larger number of parameters, leading to a larger
model size compared to more lightweight architectures. This can be a disadvantage in
scenarios with limited storage or when deploying the model on resource-constrained
devices, as it might require more memory and storage space.
Our decision is also influenced by the size of our dataset, which contains roughly 3,000 labeled images for each of its 36 classes. CNNs excel at handling such large and varied datasets, and their flexibility allows us to customize the model to the unique features of sign language gestures, improving the precision of its predictions.
After comparing CNNs and MobileNets, it’s clear that CNNs are the better choice
for our project. While MobileNets are more efficient, we’ve found that the detailed
features CNNs can capture significantly contribute to the accuracy we’re aiming for in
sign language interpretation. So, our decision to go with CNNs is well-thought-out and
aligns with our commitment to achieving excellence in accuracy and precision for our
real-time sign language interpreter.
3.3 Work Done
import numpy as np
Purpose: NumPy is a library for numerical operations in Python. It supports large, multi-
dimensional arrays and matrices, along with mathematical functions.
Purpose: Keras is a high-level neural networks API. The functions load_img and img_to_array are used for loading images and converting them to arrays, respectively.
Purpose: Matplotlib is a plotting library for Python. It provides a variety of static, ani-
mated, and interactive plots.
import os
Purpose: The os module provides interaction with the operating system. It’s used for
navigating the file system and specifying file paths.
Purpose: Google Colab is a cloud-based platform. The drive module is used to mount
Google Drive, enabling access to files stored in Google Drive.
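A consolidated sketch of the imports described above is given below; the exact import paths used in our notebook may differ slightly.

import os                                    # file-system navigation and paths
import numpy as np                           # numerical arrays and matrices
import matplotlib.pyplot as plt              # plotting training curves and images
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from google.colab import drive               # Google Drive access inside Colab

drive.mount('/content/drive')                # make Drive files available to the notebook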
Configures the data generator for the training set. It specifies the directory containing the training images, the target size, the color mode (grayscale in this case), the batch size, the class mode (categorical, indicating a classification task), and whether to shuffle the data after each epoch.
The data generator for the validation set is configured similarly. The key difference is that shuffle is set to False for the validation set, so the order of the images is kept fixed during evaluation.
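A minimal sketch of these two generators is shown below; the Google Drive directory paths are placeholders, while the 48x48 grayscale images, batch size of 32, and categorical class mode follow the configuration described above.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)

train_generator = datagen.flow_from_directory(
    '/content/drive/MyDrive/ISL/train',   # placeholder path to the training images
    target_size=(48, 48),
    color_mode='grayscale',
    batch_size=32,
    class_mode='categorical',
    shuffle=True,                         # reshuffle the training data each epoch
)

validation_generator = datagen.flow_from_directory(
    '/content/drive/MyDrive/ISL/val',     # placeholder path to the validation images
    target_size=(48, 48),
    color_mode='grayscale',
    batch_size=32,
    class_mode='categorical',
    shuffle=False,                        # keep the validation order fixed
)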
Build the Model: - Stack convolutional and max-pooling layers to extract features, followed by fully connected layers with dropout for regularization.
- The last layer has units equal to the number of classes with softmax activation.
Compile the Model: - Choose the Adam optimizer with a learning rate of 0.0001.
- Set the loss function to categorical crossentropy (suitable for multi-class classification).
- Choose accuracy as the evaluation metric.
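A minimal sketch of such a model and its compilation step follows. The exact number of layers and filters in our network may differ, but the optimizer, learning rate, loss, and metric match the configuration described above.

from tensorflow.keras import layers, models, optimizers

NUM_CLASSES = 36

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),                              # regularization via dropout
    layers.Dense(NUM_CLASSES, activation='softmax'),  # one unit per class
])

model.compile(
    optimizer=optimizers.Adam(learning_rate=0.0001),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)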
3.3.3 Training
Set Number of Epochs: - Specify the number of training epochs (10 in this case).
Import Callback: - Import the ModelCheckpoint callback from Keras. This callback is
used to save the model weights during training. Configure ModelCheckpoint Callback:
- Create a ModelCheckpoint instance to save the best model weights based on validation
accuracy. - Monitor validation accuracy, set verbose mode, and save only the best weights.
Create Callbacks List: - Create a list containing the ModelCheckpoint callback for later
use during model training.
Start Training: - Use the ‘fit‘ method on the model to train it.
- Provide the training data generator (‘train_generator‘) and the number of steps per
epoch.
- Specify the number of epochs, validation data generator (‘validation_generator‘), and
the number of validation steps.
- Include the callbacks list for additional functionality during training, such as saving the
best model weights.
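A sketch of the checkpoint and training call described above, assuming the generators defined earlier and a placeholder filename for the saved weights:

from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(
    'best_model.h5',            # placeholder path for the saved weights
    monitor='val_accuracy',     # track validation accuracy
    save_best_only=True,        # keep only the best-performing weights
    verbose=1,
)

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=10,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size,
    callbacks=[checkpoint],
)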
Plot Training History:
- Plot the training and validation accuracy and loss across the epochs using Matplotlib (as shown in Figures 3.7 and 3.8).
Show Plot:
- Use the ‘show‘ method to display the generated plot.
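A sketch of how the accuracy and loss curves can be plotted from the history object returned by fit (the layout of Figures 3.7 and 3.8 may differ):

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.legend()

plt.show()   # display the generated plot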
3.3.4 Prediction
Define Categories:
-Create a list named CATEGORIES containing the labels for the different classes in your
sign language dataset.
Prepare Function:
-Define a function named prepare that takes a file path as input.
-Set the desired image size (IMG_SIZE) to 48 pixels.
-Read the image using OpenCV (cv2) and convert it to grayscale.
-Resize the image to the specified size.
-Reshape the image array to the format expected by the model ((-1, IMG_SIZE,
IMG_SIZE, 1)).
Load the Model:
-Load the pre-trained sign language interpreter model (full.model) using TensorFlow’s
Keras API.
Make Predictions:
-Use the prepare function to preprocess an input image for prediction.
-Pass the preprocessed image to the loaded model to obtain predictions for each class.
-The model outputs a probability distribution over the classes, and the class with the high-
est probability is considered the predicted class.
Display Results:
-Print or visualize the prediction results, showing the predicted sign language class for the
input image.
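A condensed sketch of this prediction workflow is given below. The category list shown here assumes the 36 digit and letter classes described in this report, the input image path is a placeholder, and the division by 255 mirrors the rescaling assumed during training.

import cv2
import numpy as np
import tensorflow as tf

CATEGORIES = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
              'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
              'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
              'U', 'V', 'W', 'X', 'Y', 'Z']   # assumed 36 classes: digits and letters

IMG_SIZE = 48

def prepare(filepath):
    """Read an image, convert it to grayscale, resize, and reshape for the model."""
    img = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
    return img.reshape(-1, IMG_SIZE, IMG_SIZE, 1) / 255.0

model = tf.keras.models.load_model('full.model')        # pre-trained interpreter model

prediction = model.predict(prepare('test_sign.jpg'))    # placeholder input image
predicted_class = CATEGORIES[int(np.argmax(prediction))]
print('Predicted sign:', predicted_class)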
Chapter 4
Results and Discussion
In the comprehensive evaluation of our sign language interpreter model across 50 epochs,
we have witnessed a commendable trajectory of improvement. The training process,
consisting of 696 batches per epoch, demonstrates the model’s ability to learn and
generalize effectively.
Starting with the initial epoch, the model exhibited a training accuracy of 27.86%
and a validation accuracy of 34.33%. Over subsequent epochs, these values witnessed
significant enhancements, reflecting the model’s progressive refinement. By the final
epoch, the training accuracy reached an impressive 99.84%, while the validation accuracy
attained a noteworthy 99.99%.
As seen in the Figure 4.1 the corresponding loss metrics also tell a compelling
story. The training loss started at 2.4948 and steadily decreased over epochs, reaching
a minimal value of 0.0052 in the final epoch. Similarly, the validation loss showcased a
consistent downward trend, culminating in a minimal value of 0.00016967.
The notable aspect of this training journey is the early indication of model profi-
ciency. As early as the second epoch, the validation accuracy surpassed the 69% mark,
and subsequent epochs witnessed substantial leaps in performance. The model’s ability
to learn intricate patterns from the dataset is evident in the steady climb of accuracy and
the concurrent decline in loss values.
In conclusion, the results of this training process instill confidence in the efficacy
of the Convolutional Neural Network (CNN) architecture for our sign language inter-
preter model. The impressive accuracy values, coupled with the diminishing loss metrics,
underscore the model’s capability to precisely interpret a diverse range of sign language
gestures. As we move forward, the emphasis will be on continued evaluation, potential
fine-tuning, and the application of this trained model in real-world scenarios.
Chapter 5
Conclusion and Future Work
5.1 Conclusion
In conclusion, the journey to develop a robust and accurate sign language interpreter has
been guided by a deliberate selection of Convolutional Neural Networks (CNNs) over
MobileNets. Our primary driver has been the unwavering commitment to achieving the
highest possible accuracy in sign language recognition. Understanding the intricate and
nuanced nature of sign language gestures, we recognized the need for a model capable
of grasping complex hierarchical features for precise interpretation. CNNs, renowned
for their prowess in learning from diverse datasets, seamlessly align with our quest to
attain superior accuracy by discerning the subtle variations inherent in sign language
expressions.
The significance of our decision is further underscored by the scale of our dataset,
comprising a substantial number of labeled instances across 36 distinct classes. This
sizable dataset provides an opportune landscape for CNNs to leverage their inherent flex-
ibility and adaptability. Through comprehensive training, we harness the customization
potential of CNNs, constructing a model finely tuned to the unique characteristics of
sign language gestures. This not only enhances the model’s ability to generalize but also
empowers it to make precise predictions, crucial for effective sign language interpretation.
As we strive for excellence in accuracy and precision in our real-time sign lan-
guage interpreter, the decision to employ CNNs is rooted in a thoughtful alignment with
the unique demands of our project. The adaptability, customization, and feature-capturing
capabilities of CNNs position them as the optimal choice to meet the specific intricacies
of sign language recognition. Looking ahead, our commitment to refining and advancing
our sign language interpreter underscores our dedication to making a meaningful impact
in accessibility and communication for the hearing-impaired community. The journey
may have concluded, but the impact of our decision to embrace CNNs resonates in the
potential of a more inclusive and accessible future.
5.2 Future Work
Looking ahead, the future work for our project involves a holistic approach. We
strive to break down communication barriers by embracing multiple modes of expression
and by upholding the highest ethical standards. The journey ahead involves refining our
system, expanding its capabilities, and fostering a community-driven ecosystem that
encourages innovation and inclusivity.
Appendix A
Appendix
A.1 Appendix 1
In this appendix, we provide a detailed overview of the architecture used for training
our sign language interpreter model and the key parameters associated with the training
process.
Model Architecture:
The sign language interpreter is built on a Convolutional Neural Network (CNN) archi-
tecture, leveraging its capability to capture intricate patterns and hierarchical features
crucial for sign language recognition. The model comprises multiple convolutional
layers followed by max-pooling layers to extract essential features from input images.
Subsequently, fully connected layers and a softmax layer are employed for classification
across the diverse set of sign language gestures.
Training Parameters:
1. Epochs: The model underwent training for 50 epochs, indicating the number of
times the entire dataset was processed.
2. Optimizer: The Adam optimizer was employed, known for its effectiveness in op-
timizing the model’s weights during training.
3. Loss Function: Categorical cross-entropy was used as the loss function, suited to the multi-class classification task.
4. Learning Rate: A default learning rate of 0.001 was utilized to control the step size
during optimization.
5. Batch Size: The training data was divided into batches, and each batch contained
32 images. This facilitated more efficient updates to the model’s weights during
training.
6. Validation Split: A validation split of 20% was applied, ensuring that a portion of
the training data was reserved for validation, allowing us to monitor the model’s
performance on unseen data.
A.2 Appendix 2
In this appendix, we present comprehensive details about the dataset used for training
and testing our sign language interpreter model.
Dataset Overview:
Our dataset consists of a diverse collection of images representing 26 letters of the
alphabet, numbers 0-9, and additional classes for symbols such as ’del,’ ’nothing,’ and
’space.’ Each class encompasses approximately 3,000 labeled images, resulting in a
substantial dataset with 36 classes in total.
Data Augmentation:
To enhance the model’s robustness and generalization, data augmentation techniques
were applied during training. These techniques include random rotations, horizontal
flips, and zooming, providing the model with a more varied set of training examples.
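A brief sketch of such an augmentation configuration follows; the specific rotation, flip, and zoom ranges are illustrative values rather than the exact ones used.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=15,       # random rotations
    horizontal_flip=True,    # random horizontal flips
    zoom_range=0.1,          # random zooming
)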
Image Preprocessing:
Images were preprocessed to ensure uniformity in dimensions, converting them to
grayscale and resizing to a fixed dimension of 48x48 pixels. This standardization allows
for consistent input to the model during training and inference.
Class Distribution:
Maintaining a balanced class distribution is crucial for preventing biases in the model.
The approximately 3,000 images per class contribute to a well-distributed dataset,
ensuring that the model is exposed to an adequate number of examples for each sign
language gesture.
References
[1] R. Bhadauria, S. Nair, and D. Pal, “A survey of deaf mutes,” Medical Journal Armed Forces India, vol. 63, no. 1, pp. 29–32, 2007.
[2] M. Friedner, “Sign language as virus: Stigma and relationality in urban India,” Medical Anthropology, vol. 37, no. 5, pp. 359–372, 2018.
[7] C. Sruthi and A. Lijiya, “SigNet: A deep learning based Indian sign language recognition system,” in 2019 International Conference on Communication and Signal Processing (ICCSP). IEEE, 2019, pp. 0596–0600.
[8] X. Liu, Z. Jia, X. Hou, M. Fu, L. Ma, and Q. Sun, “Real-time marine animal images classification by embedded system based on MobileNet and transfer learning,” in OCEANS 2019 – Marseille, 2019, pp. 1–5.
[9] K. Bousbai and M. Merah, “A comparative study of hand gestures recognition based on MobileNetV2 and ConvNet models,” in 2019 6th International Conference on Image and Signal Processing and their Applications (ISPA), 2019, pp. 1–6.
[10] S. Vishwanath and S. S. Yawer, “Sign language interpreter using computer vision and LeNet-5 convolutional neural network architecture,” International Journal of Innovative Science and Research Technology, ISSN 2456-2165, 2021.
[11] F. Wang, R. Hu, and Y. Jin, “Research on gesture image recognition method based on transfer learning,” Procedia Computer Science, vol. 187, pp. 140–145, 2021.
[12] I. Stančin and A. Jović, “An overview and comparison of free Python libraries for data mining and big data analysis,” in 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE, 2019, pp. 977–982.