By
Aanchal S. Bafna
(1072589)
SONOPANT DANDEKAR SHIKSHAN MANDALI’S
SONOPANT DANDEKAR ARTS, V.S. APTE COMMERCE & M.H.
MEHTA SCIENCE COLLEGE
(Affiliated to University of Mumbai)
CERTIFICATE
This is to certify that the project entitled “Real Time Sign Language Recognition Using Machine
Learning” is the bonafide work of Aanchal Bafna, bearing (1072589), submitted in partial fulfilment of
the requirements for the award of the degree of BACHELOR OF SCIENCE in INFORMATION
TECHNOLOGY from the University of Mumbai.
External Examiner
DECLARATION
I hereby declare that the project entitled “Real Time Sign Language Recognition Using
Machine Learning”, done at SDSM College, has not been duplicated in any way for submission to
any other university for the award of any degree. To the best of my knowledge, no one other than
me has submitted it to any other university.
The project is done in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF SCIENCE in INFORMATION TECHNOLOGY.
Aanchal Bafna
ABSTRACT
The aim of the sign language detection project is to detect sign language hand gestures, which
helps hearing people understand what a deaf or mute person is trying to convey. The system
translates sign language, in which the user forms hand shapes that constitute structured signs
or gestures. In sign language, the configuration of the fingers, the orientation of the hand, and
the relative position of the fingers and hands to the body are the expressions of a deaf or mute
person. In this application, the user captures images of hand signs or gestures using a web
camera, and the system predicts the meaning of the sign and displays the name of the sign on
screen. The images are labelled with the LabelImg Python application, which is very helpful
for object detection; LabelImg generates an XML document for each image for the training
process. In the training process, we used the TensorFlow Object Detection API to train our
model. After training the model, we detect sign language hand gestures in real time: with the
help of OpenCV-Python, we access the webcam and load the configuration files and the trained
model so that the sign language can be detected in real time.
ACKNOWLEDGEMENT
The successful completion of any task would be incomplete without mentioning all those
people who made it possible; their constant guidance and encouragement crown the effort with
success.
I wish to thank our Head of Department, Dr. Juita T. Raut, for providing guidance
throughout the course, and all those who have indirectly guided and helped us in the preparation
of this project.
I express my thanks to my project guides, Ms. Cynthia N. Shinde and Mrs. Sayli M.
Bhosale, for their constant motivation and valuable help throughout the project work.
Aanchal Bafna
TABLE OF CONTENTS
Chapter 1:
Introduction 09
1.1 Background 09
1.2 Objectives 10
1.3 Purpose and Scope 11
1.3.1 Purpose 11
1.3.2 Scope 12
1.3.3 Applicability 13
1.4 Achievements 14
1.5 Organization of Report 15
Chapter 4: System Design 27
4.1 Basic Modules 27
4.2 Data Design 29
4.2.1 Schema Design 30
4.2.2 Data Integrity and Constraints 31
Chapter 7: Conclusions 62
7.1 Conclusion 62
7.2 Significance of Project 62
7.3 Limitation and Future Scope 63
7.4 References 65
TABLE OF FIGURES
CHAPTER 1
Introduction
1.1 BACKGROUND
Sign language, a remarkable form of nonverbal communication, serves as a bridge
connecting diverse individuals, transcending linguistic barriers. This project embarks on
a captivating journey that fuses technology and human interaction to create a
sophisticated system capable of Sign Language Recognition, Emotion Detection, and
Text Conversion. By harnessing the power of Python, OpenCV, and advanced machine
learning techniques, we delve into a realm where visual expression meets computational
intelligence, ultimately fostering inclusivity, empathy, and seamless communication.
The foundation of this project rests upon the intricate interplay between the American
Sign Language (ASL) dataset and the Indian Sign Language (ISL) dataset. These
repositories of gestures, representing distinct cultures and linguistic nuances, lay the
groundwork for our system's versatility. This rich tapestry enables our model to decipher
signs from multiple sign languages, ensuring our technology serves a global audience.
Beyond gesture recognition, the system also analyses facial expressions, detecting
emotions with precision. Through the marvel of machine learning, our model interprets
facial cues, assigning emotions to recognized gestures.
However, communication is not complete without the power of words. Python's linguistic
capabilities are harnessed to transform recognized gestures into coherent sentences. The
convergence of visual interpretation and textual articulation results in a harmonious
fusion, bridging the gap between sign language and spoken or written language.
Real-Time Interaction and Conclusion
The climax of our project unfurls in real-time interaction, where our technology
transforms from concept to reality. Armed with a camera, our system captures live sign
language gestures. The intricately trained CNN deciphers these gestures, translating them
into textual narratives. Simultaneously, the emotion detection module adds depth to the
text, infusing it with the nuanced emotions conveyed through facial expressions.
In conclusion, this project stands as an embodiment of technological innovation with a
profound human touch. The fusion of Python, OpenCV, and machine learning has birthed
a dynamic system capable of recognizing signs, discerning emotions, and converting
them into meaningful text. This innovation celebrates the diversity of human expression,
transcending linguistic barriers and nurturing empathy and understanding. As we traverse
the crossroads of technology and communication, we pave the way for a more inclusive
world where sign language speaks eloquently to all.
The main purpose of sign language recognition using machine learning is to create a
bridge of effective communication between individuals who use sign language as their
primary mode of expression and those who do not understand sign language. By
leveraging the capabilities of machine learning algorithms and computer vision
techniques, the goal is to enable real-time interpretation and translation of sign language
gestures into text or speech. This technology aims to enhance inclusivity, accessibility,
and understanding in various aspects of life, including education, workplaces, public
spaces, and social interactions.
1.2 OBJECTIVES :
❖ To make communication between disabled people and non-disabled people easier
The primary goal of this project is to facilitate seamless communication between
individuals with disabilities, particularly those who use sign language, and individuals
without disabilities. By leveraging advanced technologies such as Python and OpenCV,
we aim to develop a system that recognizes sign language gestures, interprets emotions
conveyed through facial expressions, and converts these interactions into easily
understandable text. This innovative approach seeks to bridge the communication gap,
fostering a more inclusive and empathetic society where disabled and non-disabled
individuals can engage in meaningful conversations with greater ease and understanding.
❖ To detect various hand gestures and symbols, for which sign language is preferable
as it is the most often utilized method of communication.
The preferred and frequently used method of communication for detecting various hand
gestures and symbolizations is sign language. Sign language serves as a powerful means
of expression, enabling individuals to convey a wide range of messages, emotions, and
concepts through intricate hand movements and gestures. By focusing on sign language,
we tap into a highly effective and widely recognized mode of communication that allows
for nuanced and versatile interactions. This emphasis on sign language as a means of
detecting and interpreting hand gestures enhances the accuracy and authenticity of our
communication system.
❖ To educate disabled people
A significant aspect of this project is its potential to provide valuable education and
empowerment to individuals with disabilities. By harnessing technology to recognize and
interpret sign language gestures, as well as capturing emotions, we offer a comprehensive
tool for disabled individuals to learn and engage. This educational dimension extends
beyond basic communication, empowering them to express themselves, learn new
concepts, and connect with others in a meaningful way. Through this project, we aspire
to equip disabled individuals with the means to enhance their knowledge, confidence,
and overall quality of life.
1.3.1 PURPOSE:
❖ Educational Institutions:
Colleges and Universities: Higher education institutions can use this technology to
provide accessible lectures and discussions, making college education more inclusive.
❖ Public Libraries:
Local libraries can offer real-time sign language recognition services to help deaf
individuals access and understand books, documents, and digital resources, promoting
literacy and learning.
❖ Healthcare Facilities:
Hospitals and Clinics: Medical institutions can use sign language recognition for
effective communication between healthcare providers and deaf patients during
medical consultations and procedures.
Pharmacies: Pharmacies can employ this technology to assist deaf customers in
understanding medication instructions and dosage.
❖ Local Government Offices:
Government agencies at the local level can use sign language recognition to
provide services to the deaf community, including applying for permits, accessing
social services, and participating in public meetings.
❖ Public Transportation:
Local transportation services, such as buses and trains, can implement sign
language recognition to assist deaf passengers with ticketing, directions, and safety
announcements, enhancing their travel experience.
❖ Community Centers:
Local community centers can offer sign language recognition for communication
during events, workshops, and recreational activities, ensuring that deaf participants
can fully engage in community life.
❖ Emergency Services:
Local police, fire departments, and emergency medical services can employ sign
language recognition during emergencies to ensure that deaf individuals receive
crucial information and assistance.
1.3.2 SCOPE:
a. Accessibility and Inclusivity:
The primary purpose is to make communication more accessible and inclusive for
individuals with hearing impairments or those who rely on sign language. By providing
a means for real-time translation, it enables these individuals to engage in conversations,
share ideas, and express themselves without barriers.
b. Empowerment:
Sign language recognition empowers individuals who use sign language by offering
them a tool to interact confidently and effectively with people who may not understand
sign language. It promotes independence and self-expression, fostering a sense of
empowerment and autonomy.
c. Education:
In educational settings, sign language recognition can facilitate better communication
between teachers, students, and parents. It can be integrated into e-learning platforms,
classrooms, and tutorials, allowing for seamless interaction and access to educational
content.
d. Workplaces and Professional Settings:
In professional environments, sign language recognition can aid in communication
between colleagues, supervisors, and clients. This technology can enhance productivity,
teamwork, and collaboration among diverse teams.
e. Public Services:
Sign language recognition has potential applications in public services, such as healthcare
facilities, government offices, and customer service centers. It ensures that individuals
with hearing impairments can access essential services without communication barriers.
f. Social Interaction and Inclusion:
By enabling real-time communication, sign language recognition promotes social
interaction and inclusion in social gatherings, events, and everyday interactions. It fosters
a more welcoming and understanding society.
g. Advancements in Assistive Technology:
Sign language recognition contributes to the advancement of assistive technologies. It
can be integrated into wearable devices, smartphones, and other tools that individuals
with hearing impairments use on a daily basis.
h. Research and Innovation:
The field of sign language recognition opens avenues for research and innovation in
computer vision, machine learning, and human-computer interaction. It challenges
researchers to develop robust and accurate algorithms, spurring advancements in these
domains.
In essence, the main purpose of sign language recognition using machine learning is to
break down communication barriers, promote inclusivity, and empower individuals with
hearing impairments to communicate effectively and seamlessly with the world around
them.
1.3.3 APPLICABILITY:
1.Accessibility and Inclusivity:
Communication for Deaf and Hard-of-Hearing Individuals: The primary
application is to enable individuals with hearing impairments to communicate more
effectively with those who do not understand sign language.
2.Education:
Classroom Learning: In educational settings, sign language recognition can assist
teachers and students by providing real-time translation of sign language, making it easier
for deaf or hard-of-hearing students to follow lessons.
Online Learning: It can be integrated into e-learning platforms, allowing deaf or hard-of-
hearing students to access online educational resources.
3.Workplaces and Professional Settings:
Meetings and Collaboration: Sign language recognition can facilitate
communication between colleagues in office meetings, ensuring that deaf or hard-of-
hearing employees are fully engaged in workplace discussions.
Customer Service: In customer service roles, it enables businesses to provide
accessible support to customers with hearing impairments.
4.Healthcare:
Doctor-Patient Communication: Sign language recognition can be used in
healthcare settings to assist doctors and nurses in communicating with patients who use
sign language.
Medical Consultations: It can facilitate remote medical consultations for patients
with hearing impairments.
5.Public Services:
Government Offices: It can be applied in government offices, allowing
government employees to communicate effectively with citizens who use sign language.
Emergency Services: In emergency situations, it can help first responders
communicate with individuals in distress who rely on sign language.
6.Social Interaction and Inclusion:
Social Gatherings: Sign language recognition promotes inclusivity in social
gatherings, events, and parties by enabling effective communication between individuals
with and without hearing impairments.
Dating Apps and Social Media: It can be integrated into dating apps and social
media platforms to facilitate communication between users.
7.Assistive Devices and Technology:
Wearable Devices: Sign language recognition can be incorporated into wearable
devices, such as smart glasses or hearing aids, to provide real-time translation.
Smartphones and Tablets: Mobile applications can use this technology to offer
accessible communication tools for individuals with hearing impairments.
8.Research and Development:
Advancements in Technology: The application of sign language recognition drives
research and development in computer vision, machine learning, and artificial
intelligence.
Gesture Recognition: It can be extended to recognize gestures in various contexts
beyond sign language, such as in gaming or human-computer interaction.
9.Cross-Lingual Communication:
Sign language recognition can be adapted to recognize multiple sign languages,
enhancing its applicability in different regions and linguistic communities.
10.Customization and Personalization:
Users can customize and personalize the system to recognize their own gestures or
adapt it to specific contexts or industries.
1.4 ACHIEVEMENTS :
1.Enhanced Communication Accessibility:
One of the primary achievements is the significant improvement in communication
accessibility for individuals with hearing impairments. Real-time sign language
recognition provides them with a means to communicate more effectively with the
broader community.
2.Inclusive Education:
In educational settings, this technology has empowered deaf and hard-of-hearing
students by allowing them to actively participate in classroom discussions and access
educational resources online. It promotes inclusive education and equal learning
opportunities.
3.Improved Workplace Inclusion:
In professional settings, real-time sign language recognition has facilitated better
communication between deaf or hard-of-hearing employees and their colleagues,
supervisors, and clients. This fosters a more inclusive work environment.
4.Accessible Healthcare Services:
In healthcare, it has improved doctor-patient communication, ensuring
that individuals with hearing impairments can effectively communicate their medical
needs and understand medical advice.
5.Empowerment and Independence:
Real-time sign language recognition has empowered individuals with hearing
impairments, enhancing their independence and self-confidence in various aspects of life.
6.Accessible Customer Service:
Businesses and organizations have been able to provide more accessible customer
service to individuals with hearing impairments, ensuring that they receive the same level
of support and assistance as other customers.
7.Cross-Lingual Support:
The adaptability of this technology to recognize multiple sign languages has made
it applicable in various linguistic communities, promoting inclusivity on a global scale.
8.Facilitation of Social Interactions:
In social gatherings, events, and online platforms, it has facilitated social
interactions and connections, allowing individuals with hearing impairments to engage
more actively in social life.
9.Research Advancements:
The development of real-time sign language recognition systems has spurred
advancements in computer vision, machine learning, and artificial intelligence.
Researchers have gained valuable insights into gesture recognition and related fields.
10.Personalization and Customization:
Users can personalize and customize the system to recognize their own sign
language gestures or adapt it to specific contexts, industries, or regional variations.
11.Assistive Technologies:
Integration into wearable devices, smartphones, and tablets has paved the way for
practical and portable assistive technologies that individuals with hearing impairments
can carry with them for communication support.
12.Promoting Inclusivity and Awareness:
Real-time sign language recognition has raised awareness about the needs of
individuals with hearing impairments and the importance of inclusive design and
technology.
CHAPTER 2
Survey of Technology
Moryossef, A., Tsochantaridis, I., Aharoni, R., Ebling, S., & Narayanan, S. (2020) [1], Springer
International Publishing, propose a lightweight real-time sign language detection model,
identifying the need for such a system in videoconferencing. They extract optical flow
features based on human pose estimation and, using a linear classifier, show that these
features are meaningful with an accuracy of 80%, evaluated on the Public DGS Corpus.
Using a recurrent model directly on the input, they see improvements of up to 91%
accuracy, while still working under 4 ms. They describe a demo application for sign
language detection in the browser in order to demonstrate its possible usage in
videoconferencing applications.
Sahoo, A. K. (2021, June) [2] presents Indian sign language recognition using machine learning
techniques in Macromolecular Symposia (Vol. 397, No. 1, p. 2000241). Automatic
conversion of sign language to text or speech is indeed helpful for interaction between
deaf or mute people and people who have no knowledge of sign language, and there is a
current demand for an automatic system to convert ISL signs to normal text and vice versa.
A new feature extraction and selection technique using structural features, together with
some of the best available classifiers, is proposed to recognize ISL signs for better
communication at the computer-human interface. The paper, which we referred to, describes
a system for automatic recognition of static ISL numeric signs in which only a standard
digital camera was used to acquire the signs; no wearable devices are required to capture
electrical signals.
Buckley, N., Sherrett, L., & Secco, E. L. (2021, July) [3] present a CNN sign language
recognition system with single- and double-handed gestures in the 2021 IEEE 45th Annual
Computers, Software, and Applications Conference (COMPSAC) (pp. 1250-1253), IEEE.
A literature review focused on (1) the current state of sign language recognition systems
and (2) the techniques used is conducted. This review process is used as a foundation on
which a Convolutional Neural Network (CNN) based system is designed and then
implemented. With the help of this algorithm, the system recognizes and processes images
from the ASL (American Sign Language) and ISL (Indian Sign Language) datasets. During
testing, the system achieved an average recognition accuracy of 95%.
Sawant, S. N., & Kumbhar, M. S. (2014, May) [4] present real-time sign language recognition
using PCA in the 2014 IEEE International Conference on Advanced Communications,
Control and Computing Technologies (pp. 1412-1415), IEEE. Sign language is a method
of communication for deaf and mute people. The paper presents a sign language
recognition system capable of recognizing 26 gestures from Indian Sign Language,
implemented in PyCharm. The proposed system has four modules: pre-processing and
hand segmentation, feature extraction, sign recognition, and sign-to-text-and-voice
conversion. Segmentation is done using image processing. We referred to this algorithm
in our project for gesture recognition; the recognized gesture is converted into text and
voice format. The proposed system helps to minimize the communication barrier between
deaf or mute people and hearing people.
Taskiran, M., Killioglu, M., & Kahraman, N. (2018, July) [5] present a real-time system for
recognition of American Sign Language using deep learning in the 2018 41st International
Conference on Telecommunications and Signal Processing (TSP) (pp. 1-5), IEEE. Deaf
people use sign languages to communicate with other people in the community. Although
sign language is well known to hearing-impaired people due to its widespread use among
them, it is not known much by others. In this article, the authors develop a real-time
sign language recognition system so that people who do not know sign language can
communicate easily with hearing-impaired people; the sign language used in the paper
is American Sign Language. After network training is completed, the network model and
network weights are saved for the real-time system. In the real-time system, the skin
color is determined for a given frame for hand detection, the hand gesture is determined
using the convex hull algorithm, and the gesture is classified in real time using the
saved neural network model and network weights. The accuracy of the real-time
system is 98.05%.
CHAPTER 3
Requirements and Analysis
TensorFlow: A deep learning framework for building and training CNN models.
OpenCV: For capturing and preprocessing video frames.
NumPy: For numerical operations and data manipulation.
Matplotlib : For visualizing data and model performance.
Visual Studio (VS Code) or an integrated development environment (IDE) of your
choice.
Scikit-learn (optional): For model evaluation and metrics.
Flask: For hosting the application on a localhost server directly with Python.
3. Dataset:
A comprehensive and diverse dataset of sign language gestures, including different signs
and gestures relevant to the target sign language(s).
Data preprocessing tools to clean, normalize, and augment the dataset.
4. CNN Model Architecture:
Design and implement a CNN architecture optimized for sign language recognition.
Specify the number of layers, filter sizes, activation functions, and other model
hyperparameters.
Utilize TensorFlow for model construction.
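As an illustration of what such an architecture could look like, a minimal TensorFlow/Keras sketch is given below; the layer sizes, input resolution, and number of classes are assumptions made for illustration, not the project's final hyperparameters.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 26          # assumption: one class per ASL letter
IMG_SIZE = 300            # assumption: matches the image size used for data collection

model = models.Sequential([
    layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
    layers.Conv2D(32, 3, activation="relu"),   # low-level edges and strokes
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),   # hand-shape patterns
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),                       # regularization against overfitting
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()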
5. Real-time Video Input Module:
Develop a module to capture video frames from a webcam or camera using OpenCV.
Preprocess video frames by resizing, normalizing, and removing noise.
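A small OpenCV sketch of this capture-and-preprocess module is shown below; the target resolution, blur kernel, and normalization range are assumed values.

import cv2
import numpy as np

IMG_SIZE = 300                       # assumed input size expected by the classifier

cap = cv2.VideoCapture(0)            # default webcam
while True:
    success, frame = cap.read()
    if not success:
        break
    frame = cv2.GaussianBlur(frame, (3, 3), 0)          # light noise removal
    resized = cv2.resize(frame, (IMG_SIZE, IMG_SIZE))   # resize to the model input size
    normalized = resized.astype(np.float32) / 255.0     # scale pixel values to [0, 1]
    cv2.imshow("Preprocessed frame", resized)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()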
3.3 Planning and Scheduling:
Software Requirements:
1. Python 3.x: Python will be the primary programming language for development.
You can download Python from the official website: https://www.python.org/downloads/
2. TensorFlow: TensorFlow is a deep learning framework that provides tools for building
and training machine learning models.
3. OpenCV: OpenCV is a library for computer vision tasks, including image and video
processing.
Install OpenCV using pip:
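pip install opencv-python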
5. Matplotlib : Matplotlib is a library for data visualization, which can be useful for
plotting results.
6. Visual Studio Code: VS Code is an integrated development environment well suited for
Python projects.
To download Visual Studio Code, refer to:
https://code.visualstudio.com/download/windows
Hardware Requirements:
2. Webcam or Camera:
A webcam or camera is required for capturing real-time sign language gestures.
3. Microphone (Optional):
If your project includes voice recognition in addition to sign language recognition, you'll
need a microphone.
Select appropriate machine learning models (e.g., CNNs, RNNs) for sign language
recognition. Train the models on labeled data to learn the mapping between features and
sign language gestures. Optimize the models by adjusting hyper parameters, using
techniques like transfer learning, or applying regularization methods.
Validation and Testing:
Evaluate the model's performance using validation and testing datasets to ensure it
generalizes well to new, unseen data.
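For illustration, the training and validation steps above might be wired together in Keras as sketched below; it assumes the CNN model defined earlier and train_ds, val_ds, and test_ds as tf.data datasets built from the labelled gesture images.

import tensorflow as tf

# Assumes `model` is the CNN sketched earlier and `train_ds`, `val_ds`, `test_ds`
# are tf.data.Dataset objects prepared from the labelled sign images.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=5,
                                              restore_best_weights=True)
history = model.fit(train_ds,
                    validation_data=val_ds,   # monitor generalization during training
                    epochs=30,
                    callbacks=[early_stop])

test_loss, test_acc = model.evaluate(test_ds)  # unbiased estimate on unseen data
print(f"Test accuracy: {test_acc:.2%}")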
Real-time Inference:
Implement algorithms that allow for real-time processing of video input, enabling
immediate recognition and translation of sign language.
Gesture Detection:
Identify and detect individual sign language gestures within a video stream.
Continuous Sign Sequencing:
Recognize and interpret sequences of signs to form coherent sentences or phrases.
User Interface:
Provide a user-friendly interface for individuals using sign language to interact with the
system, potentially incorporating visual feedback.
Text Output:
Convert recognized sign language into corresponding text to facilitate communication
with non-signers.
3.6.1 Use Case Diagram
3.6.2 Activity Diagram
3.6.4 Sequence Diagram
CHAPTER 4
System Design
1.Data Acquisition:
The process begins with capturing the hand gestures, which are typically performed by
individuals who are fluent in sign language.
The data may be acquired through various means, such as video cameras, depth sensors,
or specialized gloves equipped with sensors.
Image or Video Frames:
If using cameras, each frame can be used as input.
Depth Maps:
In the case of depth sensors, depth maps are often used to capture 3D information about
the hand.
Skeleton Data:
If using gloves with sensors, you can obtain joint position data.
Preprocessing Module:
This module is responsible for cleaning and preparing the data for feature extraction
and classification. Preprocessing steps may include:
Noise Reduction: Removing background noise or unrelated movements.
Normalization: Scaling the data to a standard size or range.
Segmentation: Isolating the hand from the rest of the frame.
Feature Extraction: Extracting relevant information from the data, such as hand
position, orientation, or key points. Common techniques include:
Edge Detection: Identifying hand edges for shape analysis.
Feature Point Detection: Identifying key points like fingertips or palm center.
Motion Features: Calculating motion vectors or trajectories of key points.
1.Descriptor Module:
The descriptor module transforms the feature vectors extracted in the preprocessing step
into a format suitable for machine learning. Common techniques include:
Histogram of Oriented Gradients (HOG): Capturing local shape and motion
information.
Local Binary Patterns (LBP): Describing local texture patterns.
Principal Component Analysis (PCA): Reducing dimensionality.
Convolutional Neural Networks (CNNs): For deep learning-based approaches,
especially when working with image or depth data.
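As a brief illustration of the classical descriptor route, a HOG-plus-PCA pipeline could be sketched with scikit-image and scikit-learn as below; the image size, HOG parameters, sample count, and component count are illustrative assumptions.

import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA

def hog_descriptor(gray_image):
    # gray_image: 2-D array of a segmented hand region, e.g. 128 x 128 pixels
    return hog(gray_image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

# Placeholder data standing in for preprocessed grayscale hand crops (assumption).
hand_images = [np.random.rand(128, 128) for _ in range(20)]
features = np.array([hog_descriptor(img) for img in hand_images])

pca = PCA(n_components=10)          # reduce the dimensionality of the HOG vectors
reduced = pca.fit_transform(features)
print(features.shape, "->", reduced.shape)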
2.Classifier Module:
This module is responsible for recognizing and classifying the sign language gestures
based on the descriptors generated in the previous step. Common classifiers include:
Deep Learning Models (e.g., Recurrent Neural Networks, Convolutional Neural
Networks, or Transformers)
3.Sign Database:
A sign database or dataset is crucial for training and testing the system. It contains a
collection of sign language gestures with corresponding labels.
Extract relevant features from the video frames or sequences. This can involve
techniques like:
Spatial Features: Extract information from the appearance of hands, face, and body.
Temporal Features: Consider how the signs evolve over time, capturing motion
patterns.
6.Model Selection:
Choose an appropriate machine learning model. For sign language recognition, deep
learning models like Convolutional Neural Networks (CNNs) and Recurrent Neural
Networks (RNNs) are commonly used due to their ability to capture spatial and
temporal information.
7.Model Training:
Train the selected model on the training data. The model learns to recognize patterns in
the features extracted from the sign language videos.
8.Model Evaluation:
Use the validation set to monitor the model's performance during training. Adjust hyper
parameters or try different architectures as needed.
9.Model Testing:
Assess the model's performance on the testing set. This provides an unbiased estimate
of how well the model will perform on new, unseen data.
10.Model Optimization:
Fine-tune the model if necessary. This may involve adjusting hyper parameters or using
techniques like transfer learning.
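As one example of the transfer-learning option mentioned above, a sketch using a pretrained MobileNetV2 backbone is given below; the backbone choice, input size, and class count are assumptions.

import tensorflow as tf

NUM_CLASSES = 26
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False                       # freeze the pretrained features first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# After initial training, the top of the backbone can be unfrozen and fine-tuned
# with a lower learning rate.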
11.Deployment:
Once the model is trained and performs well, it can be deployed in real-world
applications. This could be as part of a mobile app, a website, or integrated into a larger
system.
12.Monitoring and Maintenance:
Regularly monitor the model's performance in real-world scenarios. Re-train or fine-
tune it if performance degrades over time.
4.2.1 SCHEMA DESIGN
4.3 LOGIC DIAGRAM
Data Annotation: Annotate the dataset with the corresponding sign language labels,
specifying the signs' meanings or letters.
Data Splitting: Divide the dataset into training, validation, and test sets. Typically, you
might use an 80-10-10 or 70-15-15 split.
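A minimal sketch of such an 80-10-10 split using scikit-learn is shown below; X and y are placeholder arrays standing in for the real images and labels.

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 64, 64, 3)        # placeholder images (assumption)
y = np.random.randint(0, 5, size=100)     # placeholder labels (assumption)

X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.20, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=42)
# Result: 80% training, 10% validation, 10% test.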
3.Feature Extraction:
Extract relevant features from the sign language data. This can be done using
techniques like:
Image-based Sign Recognition: Extract features from images or video frames, such as
histograms of oriented gradients (HOG), scale-invariant feature transform (SIFT), or
deep learning-based features using convolutional neural networks (CNNs).
Sensor-based Sign Recognition: If you are using sensor data (e.g., accelerometers or
gyroscopes), process and extract relevant information.
Deep Learning Features: Train deep neural networks to learn features directly from the
raw data using techniques like Convolutional Neural Networks (CNNs) or Recurrent
Neural Networks (RNNs).
4.Data Augmentation:
To enhance the diversity of your dataset and improve model generalization, apply data
augmentation techniques like rotation, scaling, and translation to your training data.
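A short sketch of this augmentation using Keras preprocessing layers is given below; the rotation, zoom, and translation ranges are illustrative.

import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.05),          # small random rotations
    tf.keras.layers.RandomZoom(0.1),               # random scaling
    tf.keras.layers.RandomTranslation(0.1, 0.1),   # horizontal and vertical shifts
])
# Applied on the fly to training batches only, e.g.:
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))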
4.3.2 ALGORITHM DESIGN
1. Import necessary libraries:
➢ cv2: OpenCV for image processing.
➢ cvzone.HandTrackingModule: A custom module from the CVZone library for
hand tracking.
➢ cvzone.ClassificationModule: A custom module from the CVZone library for
image classification.
➢ numpy: NumPy for numerical operations.
➢ math: Python's math module for mathematical functions.
2. Initialize variables and objects:
➢ cap: Open a video capture object to capture video from the default camera
(camera index 0).
➢ detector: Create a hand detection object with a maximum of 1 hand.
➢ classifier: Create a hand gesture classifier object that loads a pre-trained Keras
model and associated labels.
➢ offset: Set an offset for cropping the hand region.
➢ imgSize: Define the size for the output image.
➢ folder: Set a folder variable (unused in the provided code).
➢ counter: Initialize a counter variable (unused in the provided code).
➢ labels: Create a list of labels corresponding to hand gestures.
3. Start an infinite loop using while True to continuously process video frames.
4. Read a frame from the video capture and copy it to imgOutput.
5. Use the detector to find hands in the current frame, and store the hand information in
the hands list.
6. If hands are detected (the hands list is not empty), proceed with hand gesture
recognition.
7. Retrieve information about the first detected hand (index 0) in the hands list. Extract
the bounding box coordinates (x, y, w, h) of the hand region.
8. Create a white image (imgWhite) of size imgSize x imgSize filled with white color
(255, 255, 255).
9. Crop the region around the detected hand using the offset value to obtain imgCrop.
10. Determine the aspect ratio of the hand region (h / w). This is used to decide how to
resize and fit the cropped image into the imgWhite canvas.
11. If the aspect ratio is greater than 1, it means the hand is taller than it is wide. In this
case:
➢ Calculate a scaling factor (k) based on the height of imgWhite.
➢ Compute the new width (wCal) by scaling the original width (w) using k.
➢ Resize the imgCrop to have a width of wCal while maintaining the aspect ratio.
➢ Calculate the gap needed on both sides of the resized image to center it in
imgWhite.
➢ Copy the resized image into the center of imgWhite.
➢ Use the classifier to predict the hand gesture based on imgWhite and get the
prediction and corresponding index.
12. If the aspect ratio is less than or equal to 1, it means the hand is wider than it is
tall. In this case:
➢ Calculate a scaling factor (k) based on the width of imgWhite.
➢ Compute the new height (hCal) by scaling the original height (h) using k.
➢ Resize the imgCrop to have a height of hCal while maintaining the aspect ratio.
➢ Calculate the gap needed above and below the resized image to center it in
imgWhite.
➢ Copy the resized image into the center of imgWhite.
➢ Use the classifier to predict the hand gesture based on imgWhite and get the
prediction and corresponding index.
13. Draw rectangles and text on imgOutput to display the recognized hand gesture:
➢ A filled rectangle with a label for the gesture (using labels[index]).
➢ The label text itself.
➢ A rectangle around the detected hand.
14. Display intermediate images (the cropped hand region and the resized image)
using cv2.imshow().
15. Display the final processed frame in a window labeled "Image".
16. Use cv2.waitKey(1) to update the display.
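The steps above roughly correspond to the following sketch; it assumes cvzone's HandDetector and Classifier modules as named in step 1, a hypothetical model and label file path, and a shortened label list for illustration.

import math
import cv2
import numpy as np
from cvzone.HandTrackingModule import HandDetector
from cvzone.ClassificationModule import Classifier

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)
classifier = Classifier("Model/keras_model.h5", "Model/labels.txt")  # assumed paths
offset, imgSize = 20, 300
labels = ["A", "B", "C"]   # shortened example; must match the trained model's classes

while True:
    success, img = cap.read()
    imgOutput = img.copy()
    hands, img = detector.findHands(img)
    if hands:
        x, y, w, h = hands[0]["bbox"]
        imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
        x1, y1 = max(x - offset, 0), max(y - offset, 0)      # clamp crop to the frame
        imgCrop = img[y1:y + h + offset, x1:x + w + offset]
        if imgCrop.size:
            aspectRatio = h / w
            if aspectRatio > 1:                              # hand taller than wide
                k = imgSize / h
                wCal = math.ceil(k * w)
                imgResize = cv2.resize(imgCrop, (wCal, imgSize))
                wGap = math.ceil((imgSize - wCal) / 2)
                imgWhite[:, wGap:wGap + wCal] = imgResize
            else:                                            # hand wider than tall
                k = imgSize / w
                hCal = math.ceil(k * h)
                imgResize = cv2.resize(imgCrop, (imgSize, hCal))
                hGap = math.ceil((imgSize - hCal) / 2)
                imgWhite[hGap:hGap + hCal, :] = imgResize
            prediction, index = classifier.getPrediction(imgWhite, draw=False)
            cv2.rectangle(imgOutput, (x - offset, y - offset - 50),
                          (x - offset + 90, y - offset), (255, 0, 255), cv2.FILLED)
            cv2.putText(imgOutput, labels[index], (x, y - 26),
                        cv2.FONT_HERSHEY_COMPLEX, 1.7, (255, 255, 255), 2)
            cv2.rectangle(imgOutput, (x - offset, y - offset),
                          (x + w + offset, y + h + offset), (255, 0, 255), 4)
            cv2.imshow("ImageCrop", imgCrop)
            cv2.imshow("ImageWhite", imgWhite)
    cv2.imshow("Image", imgOutput)
    cv2.waitKey(1)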
4.4 PROCEDURAL DESIGN
4.5 TEST CASE DESIGN
CHAPTER 5
IMPLEMENTATION AND TESTING
Data Collection :
1. Import Libraries :
Import necessary libraries including OpenCV (cv2), HandDetector from
cvzone.HandTrackingModule, and numpy for numerical operations.
3. Set Constants:
Define constants such as offset (for cropping), imgSize (size of the image for
collection), folder (where images will be saved), and initialize a counter to keep track
of the number of collected images.
• Wait for a key press. If the key "s" is pressed:
• Increment the counter.
• Save the preprocessed image in the specified folder with a unique name based
on the current timestamp using cv2.imwrite().
• Print the current value of the counter.
5. Release Resources:
Once the loop is terminated (manually), release the video capture device and
close all OpenCV windows.
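A condensed sketch of this data-collection script is shown below; the folder path, key bindings, and image size are assumed values following the description above.

import time
import cv2
from cvzone.HandTrackingModule import HandDetector

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)
offset, imgSize = 20, 300
folder = "Data/A"            # assumed destination folder for one gesture class
counter = 0
imgWhite = None

while True:
    success, img = cap.read()
    hands, img = detector.findHands(img)
    if hands:
        x, y, w, h = hands[0]["bbox"]
        imgCrop = img[max(y - offset, 0):y + h + offset, max(x - offset, 0):x + w + offset]
        if imgCrop.size:
            # Simplified preprocessing: the full script pastes an aspect-ratio-preserving
            # resize onto a white canvas, as in the recognition sketch in Chapter 4.
            imgWhite = cv2.resize(imgCrop, (imgSize, imgSize))
            cv2.imshow("ImageWhite", imgWhite)
    cv2.imshow("Image", img)
    key = cv2.waitKey(1)
    if key == ord("s") and imgWhite is not None:
        counter += 1                                             # count collected samples
        cv2.imwrite(f"{folder}/Image_{time.time()}.jpg", imgWhite)
        print(counter)
    elif key == ord("q"):                                        # stop collecting
        break

cap.release()
cv2.destroyAllWindows()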
Main Code :
1. Import Libraries:
Import the necessary libraries, including OpenCV (cv2), HandDetector and Classifier
from cvzone, and numpy for numerical operations.
3. Set Constants:
Define constants such as offset (for cropping), imgSize (size of the image for
classification), folder (where images will be saved), labels (list of labels for
classification).
4. Initialize Variables:
Initialize current_word as an empty string to store the recognized word.
Initialize cooldown_frames to prevent rapid detections.
6. Main Loop:
Enter an infinite loop to continuously capture frames from the camera.
Read a frame from the video capture device using cap.read().
Create a copy of the frame for output visualization (imgOutput).
7. Hand Detection:
Use findHands() method of the hand detector to detect hands in the frame.
If hands are detected:
Extract the bounding box (x, y, w, h) of the detected hand.
Crop the region of interest (ROI) containing the hand with an offset to include some
extra space around the hand.
9. Visualization:
Draw rectangles and put text displaying the recognized label on the output frame.
Display the output frame with the recognized label and the current word.
Display the current word in a separate window.
10. Check for Key Press:Wait for a key press using cv2.waitKey(1).
If the key "q" is pressed, break out of the loop and exit the program.
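One way to sketch the current_word and cooldown_frames logic described above is a small helper class; the cooldown length and the usage shown in the comments are assumptions.

class WordBuilder:
    """Accumulates recognized letters into a word, with a frame cooldown
    to prevent rapid duplicate detections."""

    def __init__(self, labels, cooldown=30):   # cooldown length is an assumption
        self.labels = labels
        self.cooldown = cooldown
        self.frames_left = 0
        self.current_word = ""

    def update(self, index):
        # Called once per frame with the classifier's predicted index.
        if self.frames_left == 0:
            self.current_word += self.labels[index]
            self.frames_left = self.cooldown
        else:
            self.frames_left -= 1
        return self.current_word

# Hypothetical usage inside the main loop:
# builder = WordBuilder(labels)
# word = builder.update(index)
# cv2.putText(imgOutput, word, (30, 50), cv2.FONT_HERSHEY_COMPLEX, 1.5, (0, 0, 255), 2)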
Code efficiency is a broad term used to depict the reliability, speed and
programming methodology used in developing codes for an application. Code
efficiency is directly linked with algorithmic efficiency and the speed of runtime
execution for software. It is the key element in ensuring high performance. The goal of
code efficiency is to reduce resource consumption and completion time as much as
possible with minimum risk to the business or operating environment. The software
product quality can be assessed and evaluated with the help of the efficiency of the
code used. Code efficiency plays a significant role in applications in a high-execution-
speed environment where performance and scalability are paramount.
CNN (Convolutional Neural Network): CNNs are effective for image recognition
tasks like sign language recognition. They efficiently learn spatial hierarchies of
features through convolutional layers, enabling accurate classification of hand
gestures. Utilizing CNNs ensures that the model can extract relevant features from sign
language images effectively.
Python: Python's simplicity, readability, and extensive libraries make it an ideal choice
for implementing machine learning projects. Its rich ecosystem, including libraries like
NumPy, Pandas, and OpenCV, facilitates data manipulation, preprocessing, and model
evaluation. Python's flexibility allows seamless integration with TensorFlow and other
machine learning frameworks, enhancing code efficiency by enabling rapid
prototyping and experimentation.
A test approach is the test-strategy implementation of a project: it defines how
testing will be carried out and the strategy that needs to be implemented and
executed to carry out a particular task. There are two types of testing approaches:
Proactive Testing and Reactive Testing. We have selected the Proactive approach for the
testing of our project. The Proactive Testing approach is one in which the test
design process is initiated as early as possible in order to find and fix defects
before the build is created. The proactive approach includes planning for the future,
taking into consideration the potential problems that, on occurrence, may disturb the
order of processes in the system. It is about recognizing future threats and
preventing them with the requisite actions and planning, so that you do not end up
in bigger trouble.
For this project, we have adopted the Proactive Testing approach because it not only
reduces defects in delivered software systems but also helps prevent problems. Unlike
conventional testing, which merely reacts to whatever has been designed and developed
and is frequently perceived as interfering with development, Proactive Testing makes
development faster, easier, and less aggravating. Organizations that learn to let
Proactive Testing drive development not only can avoid such showstoppers' impacts;
they also can significantly reduce the development time and effort needed to correct
the errors that cause the showstoppers. Hence, we found that the Proactive Testing
approach would prove to be effective and beneficial for our project model.
“Unit Testing” is a type of testing which is done by software developers in which the
smallest testable module of an application - like functions, procedures or interfaces -
are tested to ascertain if they are fit to use. This testing is done to ensure that the
source code written by the developer meets the requirement and behaves in an
expected manner. The goal of unit testing is to isolate each part of the source code and
check that each part works properly. This means that if any set of inputs is fed
to a function or procedure, it should return the expected output.
Advantages of unit testing:
✓ Defects are found at an early stage. Since it is done by the dev team by testing
individual pieces of code before integration, it helps in fixing the issues early on in
source code without affecting other source codes.
✓ It helps maintain the code. Since it is done by individual developers, stress is
being put on making the code less interdependent, which in turn reduces the
chances of impacting other sets of source code.
✓ It helps in reducing the cost of defect fixes since bugs are found early on in the
development cycle.
✓ It helps in simplifying the debugging process. Only latest changes made in the
code need to be debugged if a test case fails while doing unit testing.
5.4 Integrated Testing
Various unit-tested modules were integrated to form a complete system; they
are illustrated along with their outputs below:
5.5 Test Case
1 | Letter “A” Sign | Display letter “A” sign to the camera | Recognized as letter “A” | Pass | 95%
The table presents a series of test actions conducted to evaluate the accuracy of
a sign recognition system. Each action involves displaying a specific sign to the
camera as input, and the obtained output is the recognition result produced by the
system. The test result is then determined by comparing the obtained output with the
expected output, with a specified accuracy percentage. For instance, in the first
action, displaying the letter “A” sign to the camera should ideally result in the system
recognizing it as the letter “A”, achieving an accuracy of 95%. Similarly, the
accuracy for recognizing the “S” sign is reported as 92%, while recognition of
the “HELLO” sign is reported as Failed. The test results are categorized as Pass or
Fail based on whether the obtained output aligns with the expected output within the
specified accuracy threshold.
5.5.2 INTEGRATED TEST CASE :
2 | Interact with User Interface | Tap on “Start Camera” button | Gesture window frame will open | Window opens successfully | Pass
3 | Quit Application | Tap on “Stop Camera” button | Application will quit | Application did NOT quit successfully | Fail
The test scenario outlined in the table evaluates the functionality of a camera
application across various actions and inputs. It begins with testing the activation of
the camera view upon tapping the “Start Camera” button, followed by the unsuccessful
closure of the application upon tapping the “Stop Camera” button or
closing the app. Each action's input is compared against the expected output, ensuring
that the application behaves as intended. This structured approach to testing aims to
confirm that the camera application operates smoothly and effectively for users,
ultimately ensuring a positive user experience.
5.6 Modification and Improvements
EPILOGUE :
2. ERROR CODE TEST ID : E2
In “index.html”, the “style.css” file is not referenced; because of this, the web page
opens like the first figure, which is plain white and has no layout or visual appeal.
EPILOGUE :
By adding the HTML line
<link rel="stylesheet" href="./style.css">
and specifying the location of the file, the CSS properties can be accessed, which gives
an attractive and user-friendly look to our website.
There is no compile error found, yet the window and text field do not appear and the
program does not run properly.
EPILOGUE :
Adding the call hand_gesture_recognition.start() starts the recognition module, so the
window and text field appear and the program runs as expected.
CHAPTER 6
RESULTS AND DISCUSSION
Final Test Case 1
1 | Execute the HTML Code | Browser will open GUI or web page | Obtained GUI of model successfully | Pass
2 | Interact with User Interface | Tap on “Start Camera” button | Gesture window frame will open | Window opens successfully | Pass
3 | Quit Application | Tap on “End Camera” button | Application will quit | Application quits successfully | Pass
The test scenario outlined in the table evaluates the functionality of a camera
application across various actions and inputs. It begins with testing the activation of
the camera view upon tapping the "Start Camera" button, followed by the successful
closure of the application upon tapping the "Stop Camera" button or closing the app.
Each action's input is compared against the expected output, ensuring that the
application behaves as intended. This structured approach to testing aims to confirm
that the camera application operates smoothly and effectively for users, ultimately
ensuring a positive user experience.
Conclusion: Successfully Tested User Interface Design.
Final Test Case 2
The table presents a series of test actions conducted to evaluate the accuracy of a sign
recognition system. Each action involves displaying a specific sign to the camera as
input, and the obtained output is the recognition result produced by the system. The
test result is then determined by comparing the obtained output with the expected
output, with a specified accuracy percentage. For instance, in the first action, displaying
the letter “A” sign to the camera should ideally result in the system recognizing it
as the letter “A”, achieving an accuracy of 95%. Similarly, the accuracy percentages
for recognizing the “S” sign and the “H,E,L,L,O” sign are reported as 92%
and 82% respectively. The test results are categorized as Pass or Fail based on
whether the obtained output aligns with the expected output within the specified
accuracy threshold.
Conclusion: Successfully ASL Dataset Testing is done.
Final Test Case 3
The table presents a series of test actions conducted to evaluate the accuracy of
sign recognition under gesture augmentation. Each action involves displaying a specific
sign to the camera as input, and the obtained output is the recognition result produced
by the system. The test result is then determined by comparing the obtained output with
the expected output, with a specified accuracy percentage. For instance, in the first
action, displaying the letter “A” sign to the camera results in the system
recognizing it as the letter “D”, with an accuracy of 88%. Similarly, the accuracy
percentages for recognizing the number “1” sign and the “BYE” sign are reported as
85% and 82% respectively. The test results are categorized as Pass or Fail based on
whether the obtained output aligns with the expected output within the specified
accuracy threshold.
Conclusion: Successfully ASL Dataset Augmentation Testing is done.
1 | Execute the HTML Code | Browser will open GUI or web page | Obtained GUI of model successfully | Pass
2 | Interact with User Interface | Tap on “Start Camera” button | Gesture window frame will open | Window opens successfully | Pass
3 | Quit Application | Tap on “End Camera” button | Application will quit | Application quits successfully | Pass
The test scenario outlined in the table evaluates the functionality of a camera
application across various actions and inputs. It begins with testing the activation of
the camera view upon tapping the "Start Camera" button, followed by the successful
closure of the application upon tapping the “Quit Application” button or closing the
app. Each action's input is compared against the expected output, ensuring that the
application behaves as intended. This structured approach to testing aims to confirm
that the camera application operates smoothly and effectively for users, ultimately
ensuring a positive user experience.
Conclusion: Successfully Tested User Interface Design.
Step I :
First, to open the application, the user needs to go to any browser, such as Chrome or
Microsoft Edge, and enter Localhost:\\5000 in the search bar. After executing the
code in Visual Studio, the website interface will be opened.
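For orientation, a minimal Flask sketch of such a localhost:5000 setup is given below; the route names, template file, and the way the recognition loop is launched are assumptions rather than the project's actual code.

import threading
from flask import Flask, render_template

app = Flask(__name__)

def run_recognition_loop():
    # Placeholder for the OpenCV hand-gesture recognition loop described in Chapter 5.
    pass

@app.route("/")
def index():
    # Serves the landing page with the "Start Camera" and "Quit Application" buttons.
    return render_template("index.html")

@app.route("/start")
def start_camera():
    # Hypothetical endpoint: run the recognition loop in a background thread
    # so the web page stays responsive while the gesture window is open.
    threading.Thread(target=run_recognition_loop, daemon=True).start()
    return "Camera started"

if __name__ == "__main__":
    app.run(port=5000)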
Step II :
After opening the website, it shows a “Machine Learning World” themed page which
contains two buttons: the first is “Start Camera” and the second is “Quit Application”,
along with a navigation bar that contains Home, CNN, Datasets, Blog, Features, AI, and
Contact.
After that, click on the “Start Camera” button; the webcam turns on and a window
appears as shown in the figure above.
Step III :
After the window opens with the help of OpenCV, the user needs to place their hand in
the specified area of the camera frame, and the algorithm detects the hand. The user
then needs to shape their hand according to ASL (American Sign Language). As shown in
the figure above, I shaped my hand like “C”; it is detected and printed below the frame
as “C”.
Step IV :
Now the user needs to shape their hand like the letter “A” from the ASL dataset, and the
letter “A” is printed beside the previous letter. So word building has now started.
Step V :
Now I shape my hand like “B” according to the ASL dataset; the algorithm detects it as
“B” and prints it after the “A”. The word “CAB” is thus formed; with this, you can build
any words and sentences.
Step VI :
After printing the sentence, the user needs to click on the cross mark situated at the
top left of the frame; the window closes and the user is redirected to the web page.
Step VII :
Click on the “Quit Application” button and the application will be closed.
CHAPTER 7
CONCLUSIONS
7.1 Conclusion
Sign language recognition using machine learning has significant benefits, including:
Accessibility:
It enhances accessibility for the deaf and hard-of-hearing community by enabling
communication through sign language interpretation.
Communication: It bridges the communication gap between the deaf community and
non-signers, facilitating more inclusive interactions.
Education: It improves educational opportunities for deaf individuals by providing
access to online courses, lectures, and educational materials through sign language
translation.
Employment: By enabling better communication, it enhances employment
opportunities for deaf individuals in various sectors, reducing barriers to participation
in the workforce.
Independence: It promotes independence for deaf individuals by allowing them to
navigate public spaces, interact with technology, and access services without relying
on interpreters.
Research: Sign language recognition facilitates linguistic research and the
development of educational tools for sign language learners and linguists.
Innovation: Advances in sign language recognition can lead to the development of
innovative applications and devices, such as sign language translation apps and
wearable devices for real-time interpretation.
Overall, sign language recognition using machine learning holds promise for
improving accessibility, communication, and inclusion for the deaf and hard-of-
hearing community.
➢ Limitations
Privacy and Security: Privacy concerns arise when deploying sign language
recognition systems in sensitive contexts, such as healthcare or surveillance, where
ensuring the confidentiality and integrity of communication data is paramount.
Robust encryption, anonymization, and access control mechanisms are needed to
protect user privacy and prevent unauthorized access.
➢ Future Work
The following advanced points could shape the future of sign language recognition
using machine learning:
Advanced Model Architectures: Exploring newer architectures, such as
transformer-based models or graph neural networks, could lead to more accurate and
efficient sign language recognition systems by better capturing spatial and temporal
dependencies in sign language sequences.
Transfer Learning and Few-shot Learning: Leveraging transfer learning and few-
shot learning techniques could enable sign language recognition systems to generalize
better to new sign languages or users with limited training data, thereby increasing their
adaptability and scalability.
These advanced points represent cutting-edge research directions that have the
potential to significantly advance the field of sign language recognition using machine
learning in the coming years.
7.4 References
[1] Moryossef, A., Tsochantaridis, I., Aharoni, R., Ebling, S., & Narayanan, S. (2020). Real-time
sign language detection using human pose estimation. In Computer Vision – ECCV 2020
Workshops: Glasgow, UK, August 23-28, 2020, Proceedings, Part II (pp. 237-248). Springer
International Publishing.
https://link.springer.com/chapter/10.1007/978-3-030-66096-3_17
[2] Sahoo, A. K. (2021, June). Indian sign language recognition using machine learning
techniques. In Macromolecular Symposia (Vol. 397, No. 1, p. 2000241).
[3] Buckley, N., Sherrett, L., & Secco, E. L. (2021, July). A CNN sign language recognition
system with single & double-handed gestures. In 2021 IEEE 45th Annual Computers, Software,
and Applications Conference (COMPSAC) (pp. 1250-1253). IEEE.
https://ieeexplore.ieee.org/abstract/document/9529449/
[4] Sawant, S. N., & Kumbhar, M. S. (2014, May). Real time sign language recognition using
PCA. In 2014 IEEE International Conference on Advanced Communications, Control and
Computing Technologies (pp. 1412-1415). IEEE.
https://ijaer.com/admin/upload/03%20Thigale%20S%20S%2001213.pdf
[5] Taskiran, M., Killioglu, M., & Kahraman, N. (2018, July). A real-time system for recognition
of American sign language by using deep learning. In 2018 41st International Conference on
Telecommunications and Signal Processing (TSP) (pp. 1-5). IEEE.
https://dergipark.org.tr/en/pub/tjst/issue/72762/1073116