
REAL TIME SIGN LANGUAGE RECOGNITION

USING MACHINE LEARNING


A Project Report
Submitted in partial fulfilment of the
Requirements for the award of the Degree
of
BACHELOR OF SCIENCE (INFORMATION TECHNOLOGY)

By

Aanchal S. Bafna
(1072589)

Under the esteemed guidance of
Ms. Cynthia N. Shinde
Assistant Professor

DEPARTMENT OF INFORMATION TECHNOLOGY

SONOPANT DANDEKAR ARTS, V.S. APTE COMMERCE & M.H. MEHTA SCIENCE COLLEGE
(Affiliated to University of Mumbai)
PALGHAR 401404
MAHARASHTRA
2023

SONOPANT DANDEKAR SHIKSHAN MANDALI’S
SONOPANT DANDEKAR ARTS, V.S. APTE COMMERCE & M.H.
MEHTA SCIENCE COLLEGE
(Affiliated to University of Mumbai)

PALGHAR, MAHARASHTRA 401404

DEPARTMENT OF INFORMATION TECHNOLOGY

CERTIFICATE

This is to certify that the project entitled “Real Time Sign Language Recognition Using Machine
Learning” is the bona fide work of Aanchal Bafna (1072589), submitted in partial fulfilment of the
requirements for the award of the degree of BACHELOR OF SCIENCE in INFORMATION
TECHNOLOGY from the University of Mumbai.

Internal Examiner Head of Department

External Examiner

Date: College Seal

DECLARATION

I hereby declare that the project entitled “Real Time Sign Language Recognition Using
Machine Learning”, done at SDSM College, has not in any case been duplicated for submission
to any other university for the award of any degree. To the best of my knowledge, no one other
than me has submitted it to any other university.
The project is done in partial fulfilment of the requirements for the award of the degree of

BACHELOR OF SCIENCE (INFORMATION TECHNOLOGY), to be submitted as the

final semester project as part of the curriculum.

Aanchal Bafna

ABSTRACT

The aim of this sign language detection project is to detect sign language hand gestures, helping
hearing people understand what a deaf or mute person is trying to convey. The system translates
sign language in which the user forms a hand shape, that is, a structured sign or gesture. In sign
language, the configuration of the fingers, the orientation of the hand, and the relative position
of the fingers and hands to the body are the expressions of a deaf or mute person. In this
application, the user captures images of the hand signs or gestures using a web camera; the
system predicts the meaning of each sign and displays the name of the sign on screen. The
images are then labelled with the LabelImg Python application, which is very helpful for object
detection: LabelImg generates an XML annotation file for each image used in the training
process. In the training process, we used the TensorFlow Object Detection API to train our
model. After training the model, we detect the sign language hand gestures in real time: with the
help of OpenCV-Python, we access the webcam, load the configuration files and the trained
model, and detect the signs in real time.

ACKNOWLEDGEMENT

The successful completion of any task would be incomplete without mentioning all those
people who made it possible; their constant guidance and encouragement crown the effort with
success.

I wish to thank our Head of Department, Dr. Juita T. Raut, for providing guidance throughout
the course, and all those who have indirectly guided and helped us in the preparation of this
project.

I express my thanks to my project guides, Ms. Cynthia N. Shinde and Mrs. Sayli M. Bhosale,
for their constant motivation and valuable help throughout the project work.

I am indebted to my well-wishers and friends who encouraged me in the successful completion
of the project.

Aanchal Bafna

TABLE OF CONTENTS

Chapter 1:
Introduction 09
1.1 Background 09
1.2 Objectives 10
1.3 Purpose and Scope 11
1.3.1 Purpose 11
1.3.2 Scope 12
1.3.3 Applicability 13
1.4 Achievements 14
1.5 Organization of Report 15

Chapter 2: Survey of Technologies 17

Chapter 3: Requirements and Analysis 19


3.1 Problem Definition 19
3.2 Requirements Specification 20
3.3 Planning and Scheduling 20
3.4 Software and Hardware Requirements 21
3.4.1 Software Requirements 21
3.4.2 Hardware Requirements 22
3.5 Preliminary Product Description 22
3.6 Conceptual Models 24
3.6.1 Use Case Diagram 24
3.6.2 Data Flow Diagram 24
3.6.3 Activity Diagram 25
3.6.4 Sequence Diagram

Chapter 4: System Design 27
4.1 Basic Modules 27
4.2 Data Design 29
4.2.1 Schema Design 30
4.2.2 Data Integrity and Constraints 31

4.3 Logic Diagrams 32


4.3.1 Data Structures 32
4.3.2 Algorithm Design 33

4.4 Procedural Design 36


4.5 Test Cases Design 37

Chapter 5: Implementation and Testing 41


5.1 Coding Details and Efficiency 41
5.1.1 Coding Details 41
5.1.2 Coding Efficiency 43

5.2 Testing Approach 44


5.3 Unit Testing 45
5.4 Integrated Testing 46
5.5 Test Case 47
5.5.1 Unit Test Case 47
5.5.2 Integrated Test Case 48
5.6 Modification and Improvement 49

Chapter 6: Result and Discussion 53


6.1 Final Test Case 53
6.2 System User Document 57

Chapter 7: Conclusions 62
7.1 Conclusion 62
7.2 Significance of Project 62
7.3 Limitation and Future Scope 63
7.4 References 65

TABLE OF FIGURES

Fig.3.1 Gantt Chart 22


Fig.3.2 Use Case Diagram 26
Fig.3.3 Data Flow Diagram 26
Fig.3.4 Activity Diagram 27
Fig.3.5 Sequence Diagram 28
Fig.4.1 Schema Design 33
Fig.4.3 Logic Diagram 35
Fig.4.4 Procedural Design 39
Fig.4.5 Test Case Design

CHAPTER 1
Introduction
1.1 BACKGROUND
Sign language, a remarkable form of nonverbal communication, serves as a bridge
connecting diverse individuals, transcending linguistic barriers. This project embarks on
a captivating journey that fuses technology and human interaction to create a
sophisticated system capable of Sign Language Recognition, Emotion Detection, and
Text Conversion. By harnessing the power of Python, OpenCV, and advanced machine
learning techniques, we delve into a realm where visual expression meets computational
intelligence, ultimately fostering inclusivity, empathy, and seamless communication.
The foundation of this project rests upon the intricate interplay between the American
Sign Language (ASL) dataset and the Indian Sign Language (ISL) dataset. These
repositories of gestures, representing distinct cultures and linguistic nuances, lay the
groundwork for our system's versatility. This rich tapestry enables our model to decipher
signs from multiple sign languages, ensuring our technology serves a global audience.

Sign Language Recognition


Our journey unfolds with the meticulous curation of an extensive dataset capturing a
myriad of ASL and ISL gestures. Embracing Python's scripting prowess and leveraging
OpenCV's image processing capabilities, we initiate data preprocessing. This entails
enhancing image quality, standardizing formats, and reducing noise, setting the stage for
subsequent analysis.
Central to our endeavor is the Convolutional Neural Network (CNN), an embodiment of
modern machine learning prowess. Guided by the artistry of Python libraries such as
TensorFlow, the CNN undergoes training on the enriched dataset. Through countless
iterations, the network learns to identify intricate patterns within the visual
representations, paving the way for robust sign language recognition.
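To make the training step concrete, the following is a minimal TensorFlow/Keras sketch of such a CNN, not the exact network used in this project; the 64x64 input size, the dataset folder layout (one sub-folder per sign under "dataset"), the layer sizes, and the class count are illustrative assumptions.

    import tensorflow as tf

    # Illustrative values only; the report does not fix the image size or class count.
    IMG_SIZE = 64
    NUM_CLASSES = 26  # e.g. the static A-Z signs

    # Assumes images arranged as dataset/<sign_label>/<image>.jpg
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "dataset", image_size=(IMG_SIZE, IMG_SIZE), batch_size=32)

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 255, input_shape=(IMG_SIZE, IMG_SIZE, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, epochs=10)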

Emotion Detection and Text Conversion


The symphony of our project deepens as we pivot towards the realm of human emotion.
OpenCV's ability to analyze facial expressions comes to the fore, enabling us to detect

9
emotions with precision. Through the marvel of machine learning, our model interprets
facial cues, assigning emotions to recognized gestures.
However, communication is not complete without the power of words. Python's linguistic
capabilities are harnessed to transform recognized gestures into coherent sentences. The
convergence of visual interpretation and textual articulation results in a harmonious
fusion, bridging the gap between sign language and spoken or written language.

Real-Time Interaction and Conclusion
The climax of our project unfurls in real-time interaction, where our technology
transforms from concept to reality. Armed with a camera, our system captures live sign
language gestures. The intricately trained CNN deciphers these gestures, translating them
into textual narratives. Simultaneously, the emotion detection module adds depth to the
text, infusing it with the nuanced emotions conveyed through facial expressions.
In conclusion, this project stands as an embodiment of technological innovation with a
profound human touch. The fusion of Python, OpenCV, and machine learning has birthed
a dynamic system capable of recognizing signs, discerning emotions, and converting
them into meaningful text. This innovation celebrates the diversity of human expression,
transcending linguistic barriers and nurturing empathy and understanding. As we traverse
the crossroads of technology and communication, we pave the way for a more inclusive
world where sign language speaks eloquently to all.
The main purpose of sign language recognition using machine learning is to create a
bridge of effective communication between individuals who use sign language as their
primary mode of expression and those who do not understand sign language. By
leveraging the capabilities of machine learning algorithms and computer vision
techniques, the goal is to enable real-time interpretation and translation of sign language
gestures into text or speech. This technology aims to enhance inclusivity, accessibility,
and understanding in various aspects of life, including education, workplaces, public
spaces, and social interactions.

1.2 OBJECTIVES :
❖ The goal of the project is to make communication between disabled people and
non-disabled people easier
The primary goal of this project is to facilitate seamless communication between
individuals with disabilities, particularly those who use sign language, and individuals
without disabilities. By leveraging advanced technologies such as Python and OpenCV,
we aim to develop a system that recognizes sign language gestures, interprets emotions
conveyed through facial expressions, and converts these interactions into easily
understandable text. This innovative approach seeks to bridge the communication gap,
fostering a more inclusive and empathetic society where disabled and non-disabled
individuals can engage in meaningful conversations with greater ease and understanding.
❖ To detect various hand gestures and symbolizations, sign language is preferable as
it is the most often utilized method of communication.
The preferred and frequently used method of communication for detecting various hand
gestures and symbolizations is sign language. Sign language serves as a powerful means
of expression, enabling individuals to convey a wide range of messages, emotions, and
concepts through intricate hand movements and gestures. By focusing on sign language,
we tap into a highly effective and widely recognized mode of communication that allows
for nuanced and versatile interactions. This emphasis on sign language as a means of
detecting and interpreting hand gestures enhances the accuracy and authenticity of our
communication system.
❖ To educate disabled people
A significant aspect of this project is its potential to provide valuable education and
empowerment to individuals with disabilities. By harnessing technology to recognize and
interpret sign language gestures, as well as capturing emotions, we offer a comprehensive
tool for disabled individuals to learn and engage. This educational dimension extends
beyond basic communication, empowering them to express themselves, learn new
concepts, and connect with others in a meaningful way. Through this project, we aspire
to equip disabled individuals with the means to enhance their knowledge, confidence,
and overall quality of life.

1.3 PURPOSE, SCOPE AND APPLICABILITY


1.3.1 PURPOSE:
❖ Educational Institutions:
Schools: In local schools, real-time sign language recognition can be used in
classrooms to assist deaf students in understanding and participating in lessons,
improving their educational experience.

Colleges and Universities: Higher education institutions can use this technology to
provide accessible lectures and discussions, making college education more inclusive.
❖ Public Libraries:
Local libraries can offer real-time sign language recognition services to help deaf
individuals access and understand books, documents, and digital resources, promoting
literacy and learning.
❖ Healthcare Facilities:
Hospitals and Clinics: Medical institutions can use sign language recognition for
effective communication between healthcare providers and deaf patients during
medical consultations and procedures.
Pharmacies: Pharmacies can employ this technology to assist deaf customers in
understanding medication instructions and dosage.
❖ Local Government Offices:
Government agencies at the local level can use sign language recognition to
provide services to the deaf community, including applying for permits, accessing
social services, and participating in public meetings.
❖ Public Transportation:
Local transportation services, such as buses and trains, can implement sign
language recognition to assist deaf passengers with ticketing, directions, and safety
announcements, enhancing their travel experience.
❖ Community Centers:
Local community centers can offer sign language recognition for communication
during events, workshops, and recreational activities, ensuring that deaf participants
can fully engage in community life.
❖ Emergency Services:
Local police, fire departments, and emergency medical services can employ sign
language recognition during emergencies to ensure that deaf individuals receive
crucial information and assistance.

1.3.2 SCOPE:
a. Accessibility and Inclusivity:
The primary purpose is to make communication more accessible and inclusive for
individuals with hearing impairments or those who rely on sign language. By providing
a means for real-time translation, it enables these individuals to engage in conversations,
share ideas, and express themselves without barriers.

b. Empowerment:
Sign language recognition empowers individuals who use sign language by offering
them a tool to interact confidently and effectively with people who may not understand
sign language. It promotes independence and self-expression, fostering a sense of
empowerment and autonomy.
c. Education:
In educational settings, sign language recognition can facilitate better communication
between teachers, students, and parents. It can be integrated into e-learning platforms,
classrooms, and tutorials, allowing for seamless interaction and access to educational
content.
d. Workplaces and Professional Settings:
In professional environments, sign language recognition can aid in communication
between colleagues, supervisors, and clients. This technology can enhance productivity,
teamwork, and collaboration among diverse teams.
e. Public Services:
Sign language recognition has potential applications in public services, such as healthcare
facilities, government offices, and customer service centers. It ensures that individuals
with hearing impairments can access essential services without communication barriers.
f. Social Interaction and Inclusion:
By enabling real-time communication, sign language recognition promotes social
interaction and inclusion in social gatherings, events, and everyday interactions. It fosters
a more welcoming and understanding society.
g. Advancements in Assistive Technology:
Sign language recognition contributes to the advancement of assistive technologies. It
can be integrated into wearable devices, smartphones, and other tools that individuals
with hearing impairments use on a daily basis.
h. Research and Innovation:
The field of sign language recognition opens avenues for research and innovation in
computer vision, machine learning, and human-computer interaction. It challenges
researchers to develop robust and accurate algorithms, spurring advancements in these
domains.

In essence, the main purpose of sign language recognition using machine learning is to
break down communication barriers, promote inclusivity, and empower individuals with
hearing impairments to communicate effectively and seamlessly with the world around
them.

1.3.3 APPLICABILITY:
1.Accessibility and Inclusivity:
Communication for Deaf and Hard-of-Hearing Individuals: The primary
application is to enable individuals with hearing impairments to communicate more
effectively with those who do not understand sign language.
2.Education:
Classroom Learning: In educational settings, sign language recognition can assist
teachers and students by providing real-time translation of sign language, making it easier
for deaf or hard-of-hearing students to follow lessons.
Online Learning: It can be integrated into e-learning platforms, allowing deaf or hard-of-
hearing students to access online educational resources.
3.Workplaces and Professional Settings:
Meetings and Collaboration: Sign language recognition can facilitate
communication between colleagues in office meetings, ensuring that deaf or hard-of-
hearing employees are fully engaged in workplace discussions.
Customer Service: In customer service roles, it enables businesses to provide
accessible support to customers with hearing impairments.
4.Healthcare:
Doctor-Patient Communication: Sign language recognition can be used in
healthcare settings to assist doctors and nurses in communicating with patients who use
sign language.
Medical Consultations: It can facilitate remote medical consultations for patients
with hearing impairments.

5.Public Services:
Government Offices: It can be applied in government offices, allowing
government employees to communicate effectively with citizens who use sign language.
Emergency Services: In emergency situations, it can help first responders
communicate with individuals in distress who rely on sign language.
6.Social Interaction and Inclusion:
Social Gatherings: Sign language recognition promotes inclusivity in social
gatherings, events, and parties by enabling effective communication between individuals
with and without hearing impairments.
Dating Apps and Social Media: It can be integrated into dating apps and social
media platforms to facilitate communication between users.
7.Assistive Devices and Technology:
Wearable Devices: Sign language recognition can be incorporated into wearable
devices, such as smart glasses or hearing aids, to provide real-time translation.
Smartphones and Tablets: Mobile applications can use this technology to offer
accessible communication tools for individuals with hearing impairments.
8.Research and Development:
Advancements in Technology: The application of sign language recognition drives
research and development in computer vision, machine learning, and artificial
intelligence.
Gesture Recognition: It can be extended to recognize gestures in various contexts
beyond sign language, such as in gaming or human-computer interaction.
9.Cross-Lingual Communication:
Sign language recognition can be adapted to recognize multiple sign languages,
enhancing its applicability in different regions and linguistic communities.
10.Customization and Personalization:
Users can customize and personalize the system to recognize their own gestures or
adapt it to specific contexts or industries.

1.4 ACHIEVEMENTS :
1.Enhanced Communication Accessibility:
One of the primary achievements is the significant improvement in communication
accessibility for individuals with hearing impairments. Real-time sign language
recognition provides them with a means to communicate more effectively with the
broader community.
2.Inclusive Education:
In educational settings, this technology has empowered deaf and hard-of-hearing
students by allowing them to actively participate in classroom discussions and access
educational resources online. It promotes inclusive education and equal learning
opportunities.
3.Improved Workplace Inclusion:
In professional settings, real-time sign language recognition has facilitated better
communication between deaf or hard-of-hearing employees and their colleagues,
supervisors, and clients. This fosters a more inclusive work environment.
4.Accessible Healthcare Services:
In healthcare, it has improved doctor-patient communication, ensuring
that individuals with hearing impairments can effectively communicate their medical
needs and understand medical advice.
5.Empowerment and Independence:
Real-time sign language recognition has empowered individuals with hearing
impairments, enhancing their independence and self-confidence in various aspects of life.
6.Accessible Customer Service:
Businesses and organizations have been able to provide more accessible customer
service to individuals with hearing impairments, ensuring that they receive the same level
of support and assistance as other customers.
7.Cross-Lingual Support:
The adaptability of this technology to recognize multiple sign languages has made
it applicable in various linguistic communities, promoting inclusivity on a global scale.
8.Facilitation of Social Interactions:

In social gatherings, events, and online platforms, it has facilitated social
interactions and connections, allowing individuals with hearing impairments to engage
more actively in social life.
9.Research Advancements:
The development of real-time sign language recognition systems has spurred
advancements in computer vision, machine learning, and artificial intelligence.
Researchers have gained valuable insights into gesture recognition and related fields.
10.Personalization and Customization:
Users can personalize and customize the system to recognize their own sign
language gestures or adapt it to specific contexts, industries, or regional variations.
11.Assistive Technologies:
Integration into wearable devices, smartphones, and tablets has paved the way for
practical and portable assistive technologies that individuals with hearing impairments
can carry with them for communication support.
12.Promoting Inclusivity and Awareness:
Real-time sign language recognition has raised awareness about the needs of
individuals with hearing impairments and the importance of inclusive design and
technology.

1.5 ORGANIZATION OF REPORT


This project aims to bridge the communication gap between sign language users and
non-signers through the development of a robust Sign Language Recognition (SLR)
system. The initiative encompasses diverse data collection, preprocessing, and feature
extraction, focusing on spatial and temporal cues. The selected Convolutional Neural
Network (CNN) architecture is trained on labeled data, achieving optimal performance
through hyperparameter tuning and regularization. Real-time processing is enabled,
ensuring immediate recognition of sign language gestures. Integration with depth
sensing and user-friendly interfaces enhance accessibility. Rigorous testing and
continuous feedback mechanisms ensure accuracy and facilitate iterative improvements.
The system demonstrates high adaptability to various signing styles and holds promise
for future multilingual applications.

CHAPTER 2
Survey of Technology

Moryossef, A., Tsochantaridis, I., Aharoni, R., Ebling, S., & Narayanan, S. (2020) [1]. Springer
International Publishing. The authors propose a lightweight real-time sign language detection
model, identifying the need for such a system in videoconferencing. They extract optical flow
features based on human pose estimation and, using a linear classifier, show that these features
are meaningful, reaching an accuracy of 80% evaluated on the Public DGS Corpus. Using a
recurrent model directly on the input, they see improvements of up to 91% accuracy while still
running in under 4 ms. They describe a demo application for sign language detection in the
browser to demonstrate its possible use in videoconferencing applications.

Sahoo, A. K. (2021, June) [2]. Indian sign language recognition using machine learning
techniques. In Macromolecular Symposia (Vol. 397, No. 1, p. 2000241). Automatic conversion
of sign language to text or speech is helpful for interaction between deaf or mute people and
people who have no knowledge of sign language. It is a demand of the current times to develop
an automatic system to convert ISL signs to normal text and vice versa. A new feature extraction
and selection technique using structural features, together with some of the best available
classifiers, is proposed to recognize ISL signs for better communication at the computer-human
interface. The paper describes a system for the automatic recognition of ISL static (immobile)
numeric signs, in which only a standard digital camera is used to acquire the signs; no wearable
devices are required to capture electrical signals.

Buckley, N., Sherrett, L., & Secco, E. L. (2021, July) [3]. A CNN sign language recognition
system with single & double-handed gestures. In 2021 IEEE 45th Annual Computers, Software,
and Applications Conference (COMPSAC) (pp. 1250-1253). IEEE. A literature review focused
on (1) the current state of sign language recognition systems and (2) the techniques used is
conducted. This review is used as a foundation on which a Convolutional Neural Network
(CNN) based system is designed and then implemented. With the help of that algorithm, the
system recognizes and processes images from the ASL (American Sign Language) and ISL
(Indian Sign Language) datasets. During testing, the system achieved an average recognition
accuracy of 95%.

Sawant, S. N., & Kumbhar, M. S. (2014, May) [4]. Real time sign language recognition using
PCA. In 2014 IEEE International Conference on Advanced Communications, Control and
Computing Technologies (pp. 1412-1415). IEEE. Sign language is a method of communication
for deaf and mute people. This paper presents a sign language recognition system capable of
recognizing 26 gestures from Indian Sign Language using PyCharm. The proposed system has
four modules: pre-processing and hand segmentation, feature extraction, sign recognition, and
sign-to-text and voice conversion. Segmentation is done using image processing. We referred to
this algorithm in our project for gesture recognition, where the recognized gesture is converted
into text and voice format. The proposed system helps to minimize the communication barrier
between deaf or mute people and hearing people.

Taskiran, M., Killioglu, M., & Kahraman, N. (2018, July) [5]. A real-time system for
recognition of American sign language by using deep learning. In 2018 41st International
Conference on Telecommunications and Signal Processing (TSP) (pp. 1-5). IEEE. Deaf people
use sign languages to communicate with other people in the community. Although sign
language is known to hearing-impaired people due to its widespread use among them, it is not
known much by other people. In this article, the authors developed a real-time sign language
recognition system so that people who do not know sign language can communicate easily
with hearing-impaired people. The sign language used in the paper is American Sign Language.
After network training is completed, the network model and network weights are recorded for
the real-time system. In the real-time system, the skin color is determined for a certain frame for
the hand, the hand gesture is determined using the convex hull algorithm, and the gesture is
recognized in real time using the stored neural network model and network weights. The
accuracy of the real-time system is 98.05%.

CHAPTER 3
Requirements and Analysis

3.1 Problem Definition:


Communication Barrier:
Problem: Deaf and hard of hearing individuals face significant communication barriers
when interacting with the hearing population.
Language Barrier:
Problem: Sign language varies by region and may have multiple dialects, making it
challenging for deaf individuals to communicate across linguistic boundaries.
Lack of Sign Language Software:
Problem: India often lacks accessible sign language interpretation software and
resources.
Accessibility in Education:
Problem: Deaf students may face challenges in accessing education due to the lack of
sign language interpreters in schools and universities.

3.2 Requirement Specification:


1. Hardware Requirements:
A computer with sufficient processing power, memory, and storage for training and
running machine learning models.
Webcam or camera for capturing video input (for real-time applications).
GPU (optional but recommended) for faster CNN model training and inference.
2. Software Requirements:
Python 3.x: The project will be developed using Python.

TensorFlow: A deep learning framework for building and training CNN models.
OpenCV: For capturing and preprocessing video frames.
NumPy: For numerical operations and data manipulation.
Matplotlib : For visualizing data and model performance.
Visual Studio Code (VS Code) or another integrated development environment (IDE) of your
choice.
Scikit-learn (optional): For model evaluation and metrics.
Flask: for hosting on a localhost server directly with Python.
3. Dataset:
A comprehensive and diverse dataset of sign language gestures, including different signs
and gestures relevant to the target sign language(s).
Data preprocessing tools to clean, normalize, and augment the dataset.
4. CNN Model Architecture:
Design and implement a CNN architecture optimized for sign language recognition.
Specify the number of layers, filter sizes, activation functions, and other model
hyperparameters.
Utilize TensorFlow for model construction.
5. Real-time Video Input Module:
Develop a module to capture video frames from a webcam or camera using OpenCV.
Preprocess video frames by resizing, normalizing, and removing noise.
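A minimal sketch of such a video input module is given below; the 300x300 target size and the Gaussian blur used for noise reduction are illustrative choices rather than fixed requirements.

    import cv2

    cap = cv2.VideoCapture(0)                        # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (300, 300))        # standardize the frame size
        frame = cv2.GaussianBlur(frame, (3, 3), 0)   # simple noise reduction
        normalized = frame.astype("float32") / 255.0 # scale pixels to [0, 1] for the model
        cv2.imshow("Frame", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):        # press 'q' to stop capturing
            break
    cap.release()
    cv2.destroyAllWindows()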

3.3 Planning and Scheduling:

3.4 Software and Hardware Requirements:

Software Requirements:

1. Python 3.x: Python will be the primary programming language for development.
You can download Python from the official website: https://www.python.org/downloads/

2. TensorFlow: TensorFlow is a deep learning framework that provides tools for building
and training machine learning models.

3. OpenCV: OpenCV is a library for computer vision tasks, including image and video
processing.

Install OpenCV using pip:
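    pip install opencv-python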

4. NumPy: NumPy is a library for numerical computations, essential for data


manipulation.
Install NumPy using pip:
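    pip install numpy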

5. Matplotlib : Matplotlib is a library for data visualization, which can be useful for
plotting results.

Install Matplotlib using pip:
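    pip install matplotlib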

6. Visual Studio Code: VS Code is an integrated development environment (IDE) especially
well suited to Python projects.
To download Visual Studio Code, refer to:
https://code.visualstudio.com/download/windows

Hardware Requirements:

1. Computer with Adequate Processing Power:


A modern computer with a reasonably powerful CPU is necessary for training and
running machine learning models.
A GPU (Graphics Processing Unit) can significantly accelerate model training, but it's
optional.

2. Webcam or Camera:
A webcam or camera is required for capturing real-time sign language gestures.
3. Microphone (Optional):
If your project includes voice recognition in addition to sign language recognition, you'll
need a microphone.

3.5 PRELIMINARY PRODUCT DESCRIPTION


Accurate Recognition: The primary objective is to accurately recognize and interpret sign
language gestures and expressions and to convert them into corresponding text.
Real-time Processing: Enable real-time processing to facilitate seamless communication
between individuals using sign language and those who rely on spoken or written language.
Multimodal Integration: Integrate multiple modalities such as video, image, and potentially
depth data to capture a comprehensive understanding of sign language communication.
Adaptability and Generalization: Ensure the system can recognize a wide range of signs,
gestures, and expressions from different individuals, allowing for adaptability to various
signing styles and body postures.
Accessibility: Make sign language recognition technology accessible to a broad user base,
including individuals with hearing impairments, to promote inclusivity and equal opportunities.
Reduced Language Barriers: Mitigate language barriers between individuals who communicate
through sign language and those who do not have proficiency in sign language.

Data Collection and Preprocessing:


Collect diverse datasets of sign language videos. Preprocess data by converting video
frames into a suitable format for analysis and extracting relevant features.
Feature Extraction:
Extract spatial and temporal features from video frames to capture essential information
about hand and body movements, facial expressions, and other relevant cues.

Model Training and Optimization:

Select appropriate machine learning models (e.g., CNNs, RNNs) for sign language
recognition. Train the models on labeled data to learn the mapping between features and
sign language gestures. Optimize the models by adjusting hyperparameters, using
techniques like transfer learning, or applying regularization methods.
Validation and Testing:
Evaluate the model's performance using validation and testing datasets to ensure it
generalizes well to new, unseen data.
Real-time Inference:
Implement algorithms that allow for real-time processing of video input, enabling
immediate recognition and translation of sign language.
Gesture Detection:
Identify and detect individual sign language gestures within a video stream.
Continuous Sign Sequencing:
Recognize and interpret sequences of signs to form coherent sentences or phrases.
User Interface:
Provide a user-friendly interface for individuals using sign language to interact with the
system, potentially incorporating visual feedback.
Text Output:
Convert recognized sign language into corresponding text to facilitate communication
with non-signers.

3.6 CONCEPTUAL MODEL

3.6.1 Use Case Diagram

3.6.2 Data Flow Diagram

3.6.3 Activity Diagram

3.6.4 Sequence Diagram

CHAPTER 4
System Design

4.1 BASIC MODULE

1.Data Acquisition:
The process begins with capturing the hand gestures, which are typically performed by
individuals who are fluent in sign language.
The data may be acquired through various means, such as video cameras, depth sensors,
or specialized gloves equipped with sensors.

2.Hand Gesture Representation:


The raw data acquired in the first step needs to be processed to represent hand gestures
in a format that machine learning algorithms can understand. Common approaches
include:

Image or Video Frames:
If using cameras, each frame can be used as input.
Depth Maps:
In the case of depth sensors, depth maps are often used to capture 3D information about
the hand.
Skeleton Data:
If using gloves with sensors, you can obtain joint position data.
Preprocessing Module:
This module is responsible for cleaning and preparing the data for feature extraction
and classification. Preprocessing steps may include:
Noise Reduction: Removing background noise or unrelated movements.
Normalization: Scaling the data to a standard size or range.
Segmentation: Isolating the hand from the rest of the frame.
Feature Extraction: Extracting relevant information from the data, such as hand
position, orientation, or key points. Common techniques include:
Edge Detection: Identifying hand edges for shape analysis.
Feature Point Detection: Identifying key points like fingertips or palm center.
Motion Features: Calculating motion vectors or trajectories of key points.

1.Descriptor Module:
The descriptor module transforms the feature vectors extracted in the preprocessing step
into a format suitable for machine learning. Common techniques include:
Histogram of Oriented Gradients (HOG): Capturing local shape and motion
information.
Local Binary Patterns (LBP): Describing local texture patterns.
Principal Component Analysis (PCA): Reducing dimensionality.

Convolutional Neural Networks (CNNs): For deep learning-based approaches,
especially when working with image or depth data.
2.Classifier Module:
This module is responsible for recognizing and classifying the sign language gestures
based on the descriptors generated in the previous step. Common classifiers include:
Deep Learning Models (e.g., Recurrent Neural Networks, Convolutional Neural
Networks, or Transformers)
3.Sign Database:
A sign database or dataset is crucial for training and testing the system. It contains a
collection of sign language gestures with corresponding labels.

4.2 DATA DESIGN


1.Data Collection:
Gather a diverse dataset of sign language videos. These videos should cover a wide
range of signs, gestures, and expressions from different individuals to ensure the
model's generalization.
2.Data Preprocessing:
Convert the video data into a format suitable for processing. This may involve
extracting frames or using sequences of frames (video clips).
Normalize or standardize the data. This can involve resizing frames, adjusting color
channels, or any other necessary transformations.
3.Labeling:
Assign labels to each video or frame indicating the corresponding sign being
performed. Each label should be associated with a unique identifier.
4.Data Splitting:
Divide the dataset into training, validation, and testing sets. The training set is used to
train the model, the validation set helps tune hyperparameters and prevent overfitting,
and the testing set is used to evaluate the final model's performance.
5.Feature Extraction:

Extract relevant features from the video frames or sequences. This can involve
techniques like:
Spatial Features: Extract information from the appearance of hands, face, and body.
Temporal Features: Consider how the signs evolve over time, capturing motion
patterns.
6.Model Selection:
Choose an appropriate machine learning model. For sign language recognition, deep
learning models like Convolutional Neural Networks (CNNs) and Recurrent Neural
Networks (RNNs) are commonly used due to their ability to capture spatial and
temporal information.
7.Model Training:
Train the selected model on the training data. The model learns to recognize patterns in
the features extracted from the sign language videos.
8.Model Evaluation:
Use the validation set to monitor the model’s performance during training. Adjust
hyperparameters or try different architectures as needed.
9.Model Testing:
Assess the model's performance on the testing set. This provides an unbiased estimate
of how well the model will perform on new, unseen data.
10.Model Optimization:
Fine-tune the model if necessary. This may involve adjusting hyperparameters or using
techniques like transfer learning.
11.Deployment:
Once the model is trained and performs well, it can be deployed in real-world
applications. This could be as part of a mobile app, a website, or integrated into a larger
system.
12.Monitoring and Maintenance:
Regularly monitor the model's performance in real-world scenarios. Re-train or fine-
tune it if performance degrades over time.
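The splitting, training, and evaluation steps above (steps 4 and 7-9) can be sketched roughly as follows, assuming the feature array X, the label array y, and a compiled Keras model already exist; the 80-10-10 split and the epoch count are illustrative.

    from sklearn.model_selection import train_test_split

    # Hold out 20% of the data for testing, then carve a validation set out of the
    # remaining 80% (0.125 * 0.8 = 10%), giving roughly an 80-10-10 split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.125, stratify=y_train, random_state=42)

    # Train on the training split while monitoring the validation split,
    # then report a single unbiased score on the held-out test split.
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10)
    test_loss, test_acc = model.evaluate(X_test, y_test)
    print("Test accuracy:", test_acc)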

4.2.1 SCHEMA DESIGN

4.2.2 DATA INTEGRITY AND CONSTRAINTS

4.3 LOGIC DIAGRAM

4.3.1 DATA STRUCTURE


1.Collect a Sign Language Dataset:
Obtain a comprehensive dataset of sign language gestures. You may need to create your
dataset, collaborate with sign language experts, or use publicly available datasets.
2.Data Preprocessing:
Data Cleaning: Remove any noisy or irrelevant data, ensuring that the dataset is clean
and consistent.

Data Annotation: Annotate the dataset with the corresponding sign language labels,
specifying the signs' meanings or letters.
Data Splitting: Divide the dataset into training, validation, and test sets. Typically, you
might use an 80-10-10 or 70-15-15 split.
3.Feature Extraction:
Extract relevant features from the sign language data. This can be done using
techniques like:
Image-based Sign Recognition: Extract features from images or video frames, such as
histograms of oriented gradients (HOG), scale-invariant feature transform (SIFT), or
deep learning-based features using convolutional neural networks (CNNs).
Sensor-based Sign Recognition: If you are using sensor data (e.g., accelerometers or
gyroscopes), process and extract relevant information.
Deep Learning Features: Train deep neural networks to learn features directly from the
raw data using techniques like Convolutional Neural Networks (CNNs) or Recurrent
Neural Networks (RNNs).

4.Data Augmentation:
To enhance the diversity of your dataset and improve model generalization, apply data
augmentation techniques like rotation, scaling, and translation to your training data.

5.Model Selection and Training:


Choose an appropriate machine learning or deep learning model for sign language
recognition. Popular choices include:
o Convolutional Neural Networks (CNNs) for image-based recognition.
o Recurrent Neural Networks (RNNs) or LSTM networks for sequential data (such
as video frames).
o Transformer models, which have also shown promise in sequence-to-sequence
tasks.
o Train your model on the training data and use the validation set to fine-tune
hyperparameters and prevent overfitting.
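A rough sketch of the augmentation and training steps described above (steps 4 and 5) is given below, assuming that tf.data datasets train_ds and val_ds and a compiled Keras model already exist; the rotation, zoom, and translation ranges are placeholders.

    import tensorflow as tf

    # Random rotation, scaling (zoom), and translation, as mentioned in step 4.
    data_augmentation = tf.keras.Sequential([
        tf.keras.layers.RandomRotation(0.1),
        tf.keras.layers.RandomZoom(0.1),
        tf.keras.layers.RandomTranslation(0.1, 0.1),
    ])

    # Apply the augmentation only to training batches, never to validation data.
    augmented_train_ds = train_ds.map(
        lambda images, labels: (data_augmentation(images, training=True), labels))

    model.fit(augmented_train_ds, validation_data=val_ds, epochs=10)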

4.3.2 ALGORITHM DESIGN
1. Import necessary libraries:
➢ cv2: OpenCV for image processing.
➢ cvzone.HandTrackingModule: A custom module from the CVZone library for
hand tracking.
➢ cvzone.ClassificationModule: A custom module from the CVZone library for
image classification.
➢ numpy: NumPy for numerical operations.
➢ math: Python's math module for mathematical functions.
2. Initialize variables and objects:
➢ cap: Open a video capture object to capture video from the default camera
(camera index 0).
➢ detector: Create a hand detection object with a maximum of 1 hand.
➢ classifier: Create a hand gesture classifier object that loads a pre-trained Keras
model and associated labels.
➢ offset: Set an offset for cropping the hand region.
➢ imgSize: Define the size for the output image.
➢ folder: Set a folder variable (unused in the provided code).
➢ counter: Initialize a counter variable (unused in the provided code).
➢ labels: Create a list of labels corresponding to hand gestures.
3. Start an infinite loop using while True to continuously process video frames.
4. Read a frame from the video capture and copy it to imgOutput.
5. Use the detector to find hands in the current frame, and store the hand information in
the hands list.
6. If hands are detected (the hands list is not empty), proceed with hand gesture
recognition.
7. Retrieve information about the first detected hand (index 0) in the hands list. Extract
the bounding box coordinates (x, y, w, h) of the hand region.
8. Create a white image (imgWhite) of size imgSize x imgSize filled with white color
(255, 255, 255).
9. Crop the region around the detected hand using the offset value to obtain imgCrop.

10. Determine the aspect ratio of the hand region (h / w). This is used to decide how to
resize and fit the cropped image into the imgWhite canvas.
11. If the aspect ratio is greater than 1, it means the hand is taller than it is wide. In this
case:
➢ Calculate a scaling factor (k) based on the height of imgWhite.
➢ Compute the new width (wCal) by scaling the original width (w) using k.
➢ Resize the imgCrop to have a width of wCal while maintaining the aspect ratio.
➢ Calculate the gap needed on both sides of the resized image to center it in
imgWhite.
➢ Copy the resized image into the center of imgWhite.
➢ Use the classifier to predict the hand gesture based on imgWhite and get the
prediction and corresponding index.

12. If the aspect ratio is less than or equal to 1, it means the hand is wider than it is
tall. In this case:

➢ Calculate a scaling factor (k) based on the width of imgWhite.


➢ Compute the new height (hCal) by scaling the original height (h) using k.
➢ Resize the imgCrop to have a height of hCal while maintaining the aspect ratio.
➢ Calculate the gap needed at the top and bottom of the resized image to center it in
imgWhite.
➢ Copy the resized image into the center of imgWhite.
➢ Use the classifier to predict the hand gesture based on imgWhite and get the
prediction and corresponding index.

13. Draw rectangles and text on imgOutput to display the recognized hand gesture:
➢ A filled rectangle with a label for the gesture (using labels[index]).
➢ The label text itself.
➢ A rectangle around the detected hand.

14. Display intermediate images (the cropped hand region and the resized image)
using cv2.imshow().
15. Display the final processed frame in a window labeled "Image".
16. Use cv2.waitKey(1) to update the display.
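The steps above follow the usual cvzone hand-gesture pattern; a condensed sketch is shown below. The model and label paths (Model/keras_model.h5, Model/labels.txt) and the label list are placeholders for the files produced during training, and details may differ slightly from the project's actual script.

    import math
    import cv2
    import numpy as np
    from cvzone.HandTrackingModule import HandDetector
    from cvzone.ClassificationModule import Classifier

    cap = cv2.VideoCapture(0)
    detector = HandDetector(maxHands=1)
    # Placeholder paths for the trained model and label file exported after training.
    classifier = Classifier("Model/keras_model.h5", "Model/labels.txt")

    offset = 20
    imgSize = 300
    labels = ["A", "B", "C"]  # illustrative label list

    while True:
        success, img = cap.read()
        if not success:
            break
        imgOutput = img.copy()
        hands, img = detector.findHands(img)
        if hands:
            x, y, w, h = hands[0]["bbox"]
            imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
            x1, y1 = max(0, x - offset), max(0, y - offset)
            imgCrop = img[y1:y + h + offset, x1:x + w + offset]
            if imgCrop.size != 0:
                aspectRatio = h / w
                if aspectRatio > 1:                      # taller than wide: fit the height
                    k = imgSize / h
                    wCal = math.ceil(k * w)
                    imgResize = cv2.resize(imgCrop, (wCal, imgSize))
                    wGap = (imgSize - wCal) // 2
                    imgWhite[:, wGap:wGap + wCal] = imgResize
                else:                                    # wider than tall: fit the width
                    k = imgSize / w
                    hCal = math.ceil(k * h)
                    imgResize = cv2.resize(imgCrop, (imgSize, hCal))
                    hGap = (imgSize - hCal) // 2
                    imgWhite[hGap:hGap + hCal, :] = imgResize
                prediction, index = classifier.getPrediction(imgWhite, draw=False)
                cv2.rectangle(imgOutput, (x - offset, y - offset),
                              (x + w + offset, y + h + offset), (255, 0, 255), 4)
                cv2.putText(imgOutput, labels[index], (x, y - 30),
                            cv2.FONT_HERSHEY_COMPLEX, 1.5, (255, 0, 255), 2)
                cv2.imshow("ImageCrop", imgCrop)
                cv2.imshow("ImageWhite", imgWhite)
        cv2.imshow("Image", imgOutput)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()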

4.4 PROCEDURAL DESIGN

4.5 TEST CASE DESIGN

CHAPTER 5
IMPLEMENTATION AND TESTING

5.1 Coding Details and Efficiency

5.1.1 Coding Details

Data Collection :

1. Import Libraries :
Import necessary libraries including OpenCV (cv2), HandDetector from
cvzone.HandTrackingModule, and numpy for numerical operations.

2. Initialize Video Capture and Hand Detector :


Open a connection to the default camera (cv2.VideoCapture(0)) and create an
instance of HandDetector with a maximum number of hands to detect set to 1.

3. Set Constants:
Define constants such as offset (for cropping), imgSize (size of the image for
collection), folder (where images will be saved), and initialize a counter to keep track
of the number of collected images.

4. Data Collection Loop:


Enter an infinite loop to continuously capture frames from the camera. Inside the
loop:

• Read a frame from the video capture device.


• Use the findHands() method of the hand detector to detect hands in the frame.
• If hands are detected:
• Extract the bounding box (x, y, w, h) of the detected hand.
• Crop the region of interest (ROI) containing the hand with an offset to include
some extra space around the hand.
• Resize and preprocess the cropped hand image to a fixed size (imgSize).
• Display the cropped hand image (imgCrop) and the preprocessed image
(imgWhite) for visual inspection.
• Display the original frame.

• Wait for a key press. If the key "s" is pressed:
• Increment the counter.
• Save the preprocessed image in the specified folder with a unique name based
on the current timestamp using cv2.imwrite().
• Print the current value of the counter.

5. Release Resources:
Once the loop is terminated (manually), release the video capture device and
close all OpenCV windows.
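A minimal sketch of this data collection script, assuming the cvzone HandDetector, is given below; the offset, canvas size, and output folder are placeholder values.

    import math
    import time
    import cv2
    import numpy as np
    from cvzone.HandTrackingModule import HandDetector

    cap = cv2.VideoCapture(0)
    detector = HandDetector(maxHands=1)

    offset = 20
    imgSize = 300
    folder = "Data/A"          # placeholder folder for one sign class
    counter = 0
    imgWhite = None

    while True:
        success, img = cap.read()
        if not success:
            break
        hands, img = detector.findHands(img)
        if hands:
            x, y, w, h = hands[0]["bbox"]
            imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
            x1, y1 = max(0, x - offset), max(0, y - offset)
            imgCrop = img[y1:y + h + offset, x1:x + w + offset]
            if imgCrop.size != 0:
                # Fit the crop onto the square white canvas, keeping its aspect ratio.
                k = imgSize / max(h, w)
                imgResize = cv2.resize(imgCrop, (math.ceil(k * w), math.ceil(k * h)))
                hR, wR = imgResize.shape[:2]
                top, left = (imgSize - hR) // 2, (imgSize - wR) // 2
                imgWhite[top:top + hR, left:left + wR] = imgResize
                cv2.imshow("ImageCrop", imgCrop)
                cv2.imshow("ImageWhite", imgWhite)
        cv2.imshow("Image", img)
        key = cv2.waitKey(1)
        if key == ord("s") and imgWhite is not None:   # save the current sample
            counter += 1
            cv2.imwrite(f"{folder}/Image_{time.time()}.jpg", imgWhite)
            print(counter)
        elif key == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()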

Main Code :

1. Import Libraries:
Import the necessary libraries, including OpenCV (cv2), HandDetector and Classifier
from cvzone, and numpy for numerical operations.

2. Initialize Video Capture, Hand Detector, and Classifier:


Open a connection to the default camera (cv2.VideoCapture(0)).
Create instances of HandDetector and Classifier with specific parameters.

3. Set Constants:
Define constants such as offset (for cropping), imgSize (size of the image for
classification), folder (where images will be saved), labels (list of labels for
classification).

4. Initialize Variables:
Initialize current_word as an empty string to store the recognized word.
Initialize cooldown_frames to prevent rapid detections.

5. Create Output Window for Word:


Create a window using cv2.namedWindow() to display the recognized word.
Resize and move the window for better visualization.

6. Main Loop:
Enter an infinite loop to continuously capture frames from the camera.
Read a frame from the video capture device using cap.read().
Create a copy of the frame for output visualization (imgOutput).

7. Hand Detection:
Use findHands() method of the hand detector to detect hands in the frame.
If hands are detected:
Extract the bounding box (x, y, w, h) of the detected hand.
Crop the region of interest (ROI) containing the hand with an offset to include some
extra space around the hand.

8. Image Preprocessing and Classification:


Preprocess the cropped hand image.
Use the classifier to predict the hand gesture.
Update current_word by appending the recognized alphabet to it.
Set a cooldown period to prevent rapid detections.

9. Visualization:
Draw rectangles and put text displaying the recognized label on the output frame.
Display the output frame with the recognized label and the current word.
Display the current word in a separate window.

10. Check for Key Press: Wait for a key press using cv2.waitKey(1).
If the key "q" is pressed, break out of the loop and exit the program.
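The word-building additions described above can be sketched as follows; this assumes the detection and classification loop from the earlier sketch, with index being the predicted label index for the current frame, and the cooldown length and window geometry are illustrative.

    import cv2
    import numpy as np

    labels = ["A", "B", "C"]       # illustrative label list
    current_word = ""
    cooldown_frames = 0
    COOLDOWN = 30                  # assumed number of frames between accepted letters

    cv2.namedWindow("Word", cv2.WINDOW_NORMAL)
    cv2.resizeWindow("Word", 600, 150)
    cv2.moveWindow("Word", 50, 50)

    def update_word(index):
        """Called once per frame in which a gesture has been classified."""
        global current_word, cooldown_frames
        if cooldown_frames == 0:
            current_word += labels[index]   # append the recognized letter
            cooldown_frames = COOLDOWN      # start the cooldown to avoid rapid repeats
        else:
            cooldown_frames -= 1
        # Redraw the separate window that shows the word built so far.
        board = np.zeros((150, 600, 3), np.uint8)
        cv2.putText(board, current_word, (10, 100),
                    cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255), 3)
        cv2.imshow("Word", board)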

5.1.2 Coding Efficiency

Code efficiency is a broad term used to depict the reliability, speed and
programming methodology used in developing codes for an application. Code
efficiency is directly linked with algorithmic efficiency and the speed of runtime
execution for software. It is the key element in ensuring high performance. The goal of
code efficiency is to reduce resource consumption and completion time as much as
possible with minimum risk to the business or operating environment. The software
product quality can be assessed and evaluated with the help of the efficiency of the
code used. Code efficiency plays a significant role in applications in a high-execution-
speed environment where performance and scalability are paramount.

In a sign language recognition project, each component plays a crucial role in


enhancing code efficiency:

CNN (Convolutional Neural Network): CNNs are effective for image recognition
tasks like sign language recognition. They efficiently learn spatial hierarchies of

features through convolutional layers, enabling accurate classification of hand
gestures. Utilizing CNNs ensures that the model can extract relevant features from sign
language images effectively.

TensorFlow: TensorFlow provides a robust framework for building and training


machine learning models, including CNNs. Its computational efficiency, especially
when leveraging GPU acceleration, enables faster training and inference. TensorFlow's
high-level APIs simplify the implementation of complex neural network architectures,
reducing development time and enhancing code efficiency.

Python: Python's simplicity, readability, and extensive libraries make it an ideal choice
for implementing machine learning projects. Its rich ecosystem, including libraries like
NumPy, Pandas, and OpenCV, facilitates data manipulation, preprocessing, and model
evaluation. Python's flexibility allows seamless integration with TensorFlow and other
machine learning frameworks, enhancing code efficiency by enabling rapid
prototyping and experimentation.

PyCharm: PyCharm is an Integrated Development Environment (IDE) that offers


advanced code editing, debugging, and project management features specifically
tailored for Python development. Its intelligent code completion, refactoring tools, and
version control integration streamline the coding process, improving developer
productivity and code quality. PyCharm's debugging capabilities help identify and fix
errors efficiently, leading to more robust and maintainable code in the sign language
recognition project.

5.2 Testing Approach

A test approach is the test strategy implementation of a project which defines how
testing would be carried out, defines the strategy which needs to be implemented and
executed, to carry out a particular task. There are two types of testing approaches:
Proactive Testing and Reactive Testing. We have selected the proactive approach for the
testing of our project. Proactive Testing approach is an approach in which the test
design process is initiated as early as possible in order to find and fix the defects
before the build is created. The proactive approach includes planning for the future,
taking into consideration the potential problems that on occurrence may disturb the
orders of processes in the system. It is about recognizing the future threats and
preventing them with requisite actions and planning so that you don’t end up getting
into bigger trouble.
For this project, we have adopted the proactive testing approach because it does more than
reduce defects in delivered software systems. Unlike conventional testing, which merely
reacts to whatever has been designed or developed and is frequently perceived as interfering
with development, proactive testing also helps prevent problems and makes development
faster, easier, and less aggravating. Organizations that learn to let proactive testing drive
development can not only avoid such showstoppers’ impacts; they can also significantly
reduce the development time and effort needed to correct the errors that cause the
showstoppers. Hence, we found the proactive testing approach to be effective and
beneficial for our project model.

5.3 Unit Testing :

“Unit Testing” is a type of testing which is done by software developers in which the
smallest testable module of an application - like functions, procedures or interfaces -
are tested to ascertain if they are fit to use. This testing is done to ensure that the
source code written by the developer meets the requirement and behaves in an
expected manner. The goal of unit testing is to break each part of source code and
check that each part works properly. This means that if any set of inputs is fed
to a function or procedure, it should return the expected output.
Advantages of unit testing:

✓ Defects are found at an early stage. Since it is done by the dev team by testing
individual pieces of code before integration, it helps in fixing the issues early on in
source code without affecting other source codes.
✓ It helps maintain the code. Since it is done by individual developers, stress is
being put on making the code less interdependent, which in turn reduces the
chances of impacting other sets of source code.
✓ It helps in reducing the cost of defect fixes since bugs are found early on in the
development cycle.
✓ It helps in simplifying the debugging process. Only latest changes made in the
code need to be debugged if a test case fails while doing unit testing.

5.4 Integrated Testing

Various unit-tested modules were integrated to form a complete system; they
are illustrated along with their outputs below:

We created a GUI for our project that contains two buttons:

1. Start Camera: When the user clicks the Start Camera button, the on-click function is
triggered and the camera window frame opens.
2. Quit Application: When the user clicks the Quit Application button, the user is
redirected to the main window and the application is closed.

5.5 Test Case

5.5.1 UNIT TEST CASE :

Test Scenario: Sign Language Data          Code Test ID.: E1
Pre-Requisite: Verification of ASL Datasets          Priority: High

Sr. No | Action                    | Input                                       | Obtained Output           | Test Result | Accuracy (%)
1      | Letter "A" Sign           | Display letter "A" sign to the camera       | Recognized as letter "A"  | Pass        | 95%
2      | Number "3" Sign           | Display number "3" sign to the camera       | Recognized as number "3"  | Pass        | 92%
3      | Word "Hello" Sign Gesture | Display "HELLO" sign gesture to the camera  | Not recognized as "HELLO" | Fail        | -

The table presents a series of test actions conducted to evaluate the accuracy of
the sign recognition system. Each action involves displaying a specific sign to the
camera as input, and the obtained output is the recognition result produced by the
system. The test result is then determined by comparing the obtained output with the
expected output, with a specified accuracy percentage. For instance, in the first
action, displaying the letter "A" sign to the camera should ideally result in the system
recognizing it as the letter "A", which it did with an accuracy of 95%. Similarly, the
accuracy for recognizing the number "3" sign is reported as 92%, while recognition of
the "HELLO" sign failed. The test results are categorized as Pass or Fail based on
whether the obtained output aligns with the expected output within the specified
accuracy threshold.

Conclusion: Testing of word signs from the ASL sign language dataset failed.

5.5.2 INTEGRATED TEST CASE :

Test Scenario: Connectivity of Model          Code Test ID.: E2
Pre-Requisite: Integrate Front-end and Back-end          Priority: High

Sr. No | Action                       | Input                         | Expected Output                 | Obtained Output                        | Test Result
1      | Execute the HTML Code        | Browser will open             | GUI or Web Page                 | Obtained GUI of Model successfully     | Pass
2      | Interact with User Interface | Tap on "Start Camera" button  | Gesture window frame will open  | Window opens successfully              | Pass
3      | Quit Application             | Tap on "Stop Camera" button   | Application will quit           | Application did NOT quit successfully  | Fail

The test scenario outlined in the table evaluates the functionality of a camera
application across various actions and inputs. It begins with testing the activation of
the camera view upon tapping the “Start Camera” button, followed by the unsuccessful
closure of the application upon tapping the “Stop Camera” button or
closing the app. Each action's input is compared against the expected output, ensuring
that the application behaves as intended. This structured approach to testing aims to
confirm that the camera application operates smoothly and effectively for users,
ultimately ensuring a positive user experience.

Conclusion: Application QUIT Testing is Failed.

48
5.6 Modification and Improvements

1. ERROR CODE TEST ID : E1

The error shows that Model/keras.h5 is not imported, i.e. the trained model file cannot be found.

EPILOGUE :

The "Model" folder, which contains the trained alphabet dataset, must sit in the given directory at the expected location. This model was trained with Google Teachable Machine.
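As a minimal sketch of the fix, the snippet below loads a Teachable Machine export, assuming its default file names (keras_model.h5 and labels.txt) inside the Model folder; the exact file names in the project may differ.

# Sketch only: file names follow Teachable Machine's default export layout (an assumption).
import os
from tensorflow.keras.models import load_model

MODEL_DIR = "Model"                                    # folder that must sit next to the script
MODEL_PATH = os.path.join(MODEL_DIR, "keras_model.h5")
LABELS_PATH = os.path.join(MODEL_DIR, "labels.txt")

if not os.path.exists(MODEL_PATH):
    # This is exactly the situation behind error E1: the folder (or file)
    # is missing or lies in the wrong location relative to the working directory.
    raise FileNotFoundError(f"Expected trained model at {MODEL_PATH}")

model = load_model(MODEL_PATH, compile=False)          # Teachable Machine exports a Keras .h5
with open(LABELS_PATH) as f:
    labels = [line.strip() for line in f]              # one class label per line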

49
2. ERROR CODE TEST ID : E2

In "index.html" the "style.css" file is not linked. Because of this, the web page opens as shown in the first figure: plain white, with no layout or visual appeal.

50
EPILOGUE :
Adding the HTML line

<link rel="stylesheet" href="./style.css">

and pointing it at the correct location of the file makes the CSS properties available, which gives the website an attractive and user-friendly look.

3. ERROR CODE TEST ID : E3

There is no compile error, yet the window and text field do not appear because the program never starts the recognition loop.

51
EPILOGUE :

One line is missing at the end of the script:

hand_gesture_recognition.start()

Because the ".start()" method is never called at the end, the recognition window is never launched.
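For illustration only, the sketch below shows the kind of structure implied by this error; the class and variable names mirror the error message (hand_gesture_recognition, .start()) and are assumptions rather than the project's actual source.

# Illustrative sketch: without the final .start() call, the window never appears.
import cv2

class HandGestureRecognition:
    def __init__(self, camera_index=0):
        self.camera_index = camera_index

    def start(self):
        # Opens the webcam and shows the gesture window until 'q' is pressed.
        cap = cv2.VideoCapture(self.camera_index)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imshow("Gesture window", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
        cap.release()
        cv2.destroyAllWindows()

hand_gesture_recognition = HandGestureRecognition()
hand_gesture_recognition.start()   # the missing last line from error E3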

52
CHAPTER 6
RESULTS AND DISCUSSION

6.1 Final Test Cases

Final Test Case 1

Test Scenario : Sign Language Recognition Model      Test ID : E1
Pre-Requisite : Verification Process of Model        Priority : High

Sr. No | Action                        | Input                        | Expected Output                      | Obtained Output                         | Test Result
1      | Execute the HTML Code         | -                            | Browser will open the GUI / web page | GUI of the model obtained successfully  | Pass
2      | Interact with User Interface  | Tap on "Start Camera" button | Gesture window frame will open       | Window opens successfully               | Pass
3      | Quit Application              | Tap on "End Camera" button   | Application will quit                | Application quits successfully          | Pass

The test scenario outlined in the table evaluates the functionality of the camera application across various actions and inputs. It begins with testing the activation of the camera view upon tapping the "Start Camera" button, followed by the successful closure of the application upon tapping the "End Camera" button. Each action's input is compared against the expected output, ensuring that the application behaves as intended. This structured approach to testing confirms that the camera application operates smoothly and effectively for users, ultimately ensuring a positive user experience.
Conclusion: User interface design tested successfully.

53
Final Test Case 2

Test Scenario : Sign Language Data Code          Test ID : E2
Pre-Requisite : Verification of ASL Datasets     Priority : High

Sr. No | Action            | Input                                        | Obtained Output           | Test Result | Accuracy (%)
1      | Letter "A" Sign   | Display letter "A" sign to the camera        | Recognized as letter "A"  | Pass        | 95%
2      | Number "1" Sign   | Display number "1" sign to the camera        | Recognized as number "1"  | Pass        | 92%
3      | Word "Hello" Sign | Display "HELLO" sign gestures to the camera  | Recognized as "HELLO"     | Pass        | 65%

The table presents a series of test actions conducted to evaluate the accuracy of the sign recognition system. Each action involves displaying a specific sign to the camera as input, and the obtained output is the recognition result produced by the system. The test result is then determined by comparing the obtained output with the expected output, together with the measured accuracy percentage. For instance, in the first action, displaying the letter "A" sign to the camera should ideally result in the system recognizing it as the letter "A", which it does with an accuracy of 95%. Similarly, the number "1" sign and the "HELLO" sign are recognized with accuracies of 92% and 65% respectively. The test results are categorized as Pass or Fail based on whether the obtained output aligns with the expected output within the specified accuracy threshold.
Conclusion: ASL dataset testing completed successfully.

54
Final Test Case 3

Test Scenario : Augmentation of ASL alphabets     Test ID : E3
Pre-Requisite : Data Augmentation in ASL data     Priority : High

Sr. No | Test Case Description                                       | Input                                                          | Action                                                   | Obtained Output            | Accuracy (%) | Result
1      | Recognition of letter "A" gesture                           | Image of sign language gesture for "A"                         | Feed the image into the trained machine learning model  | Model predicts letter "A"  | 95%          | Pass
2      | Recognition of letter "S" gesture with augmentation         | Image of sign language gesture for "S" with added noise        | Feed the image into the trained machine learning model  | Model predicts letter "S"  | 92%          | Pass
3      | Recognition of letter "D" gesture under different lighting  | Image of sign language gesture for "D" under varying lighting  | Feed the image into the trained machine learning model  | Model predicts letter "D"  | 88%          | Pass
4      | Recognition of letter "G" gesture                           | Image of sign language gesture for "G"                         | Feed the image into the trained machine learning model  | Model predicts letter "G"  | 87%          | Pass
5      | Recognition of number "1" gesture                           | Image of sign language gesture for number "1"                  | Feed the image into the trained machine learning model  | Model predicts number "1"  | 85%          | Pass
6      | Recognition of word "BYE" gesture                           | Sequence of images of the word "BYE" gesture                   | Feed the images into the trained machine learning model | Model predicts word "BYE"  | 82%          | Pass

The table presents a series of test actions conducted to evaluate the accuracy of the sign recognition system under gesture augmentation. Each action involves feeding a specific sign image to the model as input, and the obtained output is the recognition result produced by the system. The test result is then determined by comparing the obtained output with the expected output, together with the measured accuracy percentage. For instance, in the first action, the image of the letter "A" gesture should ideally be recognized as the letter "A", which it is with an accuracy of 95%, while the letter "D" gesture under varying lighting conditions is recognized with an accuracy of 88%. Similarly, the number "1" sign and the "BYE" gesture sequence are recognized with accuracies of 85% and 82% respectively. The test results are categorized as Pass or Fail based on whether the obtained output aligns with the expected output within the specified accuracy threshold.

55
Conclusion: ASL dataset augmentation testing completed successfully.
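For reference, the sketch below illustrates the kind of augmentation exercised in the table above (added noise and altered lighting). The function names, parameters, and the example file name gesture_A.jpg are assumptions for illustration, not the project's exact augmentation code.

# Sketch of simple gesture-image augmentation with NumPy/OpenCV (illustrative assumptions).
import numpy as np
import cv2

def add_noise(image, sigma=10.0):
    # Gaussian pixel noise, as in test case 2 ("gesture with added noise").
    noise = np.random.normal(0, sigma, image.shape).astype(np.float32)
    noisy = image.astype(np.float32) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)

def change_lighting(image, alpha=1.3, beta=20):
    # Simple brightness/contrast shift, as in test case 3
    # ("gesture under varying lighting conditions").
    return cv2.convertScaleAbs(image, alpha=alpha, beta=beta)

# Usage: augmented copies of one gesture image are fed to the model
# in the same way as the originals.
img = cv2.imread("gesture_A.jpg")          # hypothetical file name
if img is not None:
    variants = [img, add_noise(img), change_lighting(img)]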

Final Test Case 4

Test Scenario : Connectivity of Model Code           Test ID : E4
Pre-Requisite : Integrate Front-end and Back-end     Priority : High

Sr. No | Action                        | Input                        | Expected Output                      | Obtained Output                         | Test Result
1      | Execute the HTML Code         | -                            | Browser will open the GUI / web page | GUI of the model obtained successfully  | Pass
2      | Interact with User Interface  | Tap on "Start Camera" button | Gesture window frame will open       | Window opens successfully               | Pass
3      | Quit Application              | Tap on "End Camera" button   | Application will quit                | Application quits successfully          | Pass

The test scenario outlined in the table evaluates the functionality of the camera application across various actions and inputs. It begins with testing the activation of the camera view upon tapping the "Start Camera" button, followed by the successful closure of the application upon tapping the "End Camera" button and then the "Quit Application" button on the web page. Each action's input is compared against the expected output, ensuring that the application behaves as intended. This structured approach to testing confirms that the camera application operates smoothly and effectively for users, ultimately

56
ensuring a positive user experience.
Conclusion: Front-end and back-end connectivity tested successfully.

6.2 System User Documentation

Step I :

To open the application, the user first runs the code in Visual Studio and then opens any browser, such as Chrome or Microsoft Edge, and navigates to localhost:5000. The website interface will then open.

57
Step II :

After opening the website, a "Machine Learning World" themed page is shown, which contains two buttons: "Start Camera" and "Quit Application", together with a navigation bar containing Home, CNN, Datasets, Blog, Features, AI, and Contact.
Clicking the Start Camera button turns the webcam on, and a window appears as shown in the figure above.

Step III :

58
After the window opens (created with the help of OpenCV), the user needs to place their hand inside the specific area of the camera frame so that the algorithm can detect it.

The user then forms a hand shape according to ASL (American Sign Language). As shown in the figure above, making the hand shape for "C" results in it being detected and printed below the frame as "C".
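A minimal, illustrative sketch of this detection step is given below. It reuses the assumed Teachable Machine file names from earlier; the region-of-interest coordinates, the 224x224 input size, and the normalisation follow Teachable Machine's defaults and are assumptions, not the project's exact values.

# Sketch of the real-time detection loop (illustrative assumptions noted in comments).
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("Model/keras_model.h5", compile=False)      # assumed file name
with open("Model/labels.txt") as f:
    labels = [line.strip() for line in f]                      # entries may include an index prefix

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    x, y, w, h = 100, 100, 224, 224                 # the "specific area" of the frame (assumed)
    roi = frame[y:y + h, x:x + w]                   # crop the hand region
    inp = cv2.resize(roi, (224, 224)).astype(np.float32)
    inp = (inp / 127.5) - 1.0                       # Teachable Machine style normalisation
    pred = model.predict(inp[np.newaxis, ...], verbose=0)
    letter = labels[int(np.argmax(pred))]
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame, letter, (x, y + h + 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)   # print the letter below the box
    cv2.imshow("Sign Language Recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()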

Step IV :

Next, the user forms the hand shape for the letter "A" according to the ASL dataset, and the letter "A" is printed next to the previous letter. Word building has now started.

59
Step V :

Now the user makes the hand shape for "B" according to the ASL dataset; the algorithm detects it as "B" and prints it after the "A", so the word "CAB" is formed. In this way, any words and sentences can be built (a small sketch of this word-building logic follows).
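The sketch below shows one way letters can be accumulated into a word. The stability check (the same prediction held for a short window of frames before it is appended) is an assumption added to avoid appending a letter on every frame; it is not confirmed as the project's exact logic.

# Sketch of word building from per-frame letter predictions (stability window is an assumption).
from collections import deque

history = deque(maxlen=15)   # last 15 frame-level predictions
word = ""

def update_word(letter):
    """Append a letter to the word once it has been held stably for the whole window."""
    global word
    history.append(letter)
    if len(history) == history.maxlen and len(set(history)) == 1:
        word += letter        # e.g. C, then A, then B -> "CAB"
        history.clear()
    return word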

Step VI :

After printing the sentence, the user clicks the cross mark situated at the top left of the frame; the window then closes and the user is redirected to the web page.

60
Step VII :

Click on the Quit Application button and the application will be closed.

61
CHAPTER 7

CONCLUSIONS

7.1 Conclusion

In conclusion, sign language recognition powered by machine learning offers


transformative benefits for both the deaf and hard-of-hearing community and society
as a whole. By enabling better communication, accessibility, education, and
employment opportunities, it fosters inclusivity and empowers individuals to
participate fully in various aspects of life. Additionally, it facilitates linguistic research,
innovation in technology, and the development of practical applications to enhance the
quality of life for deaf individuals. As advancements continue in this field, the potential
for sign language recognition to break down barriers and promote understanding
between different linguistic communities is immense.

7.2 Significance of the Project

Sign language recognition using machine learning has significant benefits, including:
Accessibility:
It enhances accessibility for the deaf and hard-of-hearing community by enabling
communication through sign language interpretation.
Communication: It bridges the communication gap between the deaf community and
non-signers, facilitating more inclusive interactions.
Education: It improves educational opportunities for deaf individuals by providing
access to online courses, lectures, and educational materials through sign language
translation.
Employment: By enabling better communication, it enhances employment
opportunities for deaf individuals in various sectors, reducing barriers to participation
in the workforce.
Independence: It promotes independence for deaf individuals by allowing them to
navigate public spaces, interact with technology, and access services without relying
on interpreters.
Research: Sign language recognition facilitates linguistic research and the
development of educational tools for sign language learners and linguists.
Innovation: Advances in sign language recognition can lead to the development of
innovative applications and devices, such as sign language translation apps and
wearable devices for real-time interpretation.
62
Overall, sign language recognition using machine learning holds promise for
improving accessibility, communication, and inclusion for the deaf and hard-of-
hearing community.

7.3 Limitations and Future Work

➢ Limitations

Multimodal Fusion: Integrating information from multiple modalities such as hand


gestures, facial expressions, and body movements presents challenges in designing
effective fusion strategies that leverage complementary cues while avoiding
redundancy or conflicting signals.

Model Interpretability: The black-box nature of many machine learning models


hinders interpretability, making it difficult to understand how decisions are made or
diagnose errors in sign language recognition systems, which is crucial for trust,
accountability, and debugging.

Continual Learning: Sign language recognition systems deployed in real-world


settings must adapt to evolving signing conventions, user preferences, and
environmental conditions over time. Continual learning techniques are needed to
facilitate incremental updates and model refinement without catastrophic forgetting
or performance degradation.

Privacy and Security: Privacy concerns arise when deploying sign language
recognition systems in sensitive contexts, such as healthcare or surveillance, where
ensuring the confidentiality and integrity of communication data is paramount.
Robust encryption, anonymization, and access control mechanisms are needed to
protect user privacy and prevent unauthorized access.

➢ Future Work

Several advanced research directions could shape the future of sign language
recognition using machine learning:

Deep Learning Architectures: Advancements in deep learning architectures, such as

63
transformer-based models or graph neural networks, could lead to more accurate and
efficient sign language recognition systems by better capturing spatial and temporal
dependencies in sign language sequences.

Transfer Learning and Few-shot Learning: Leveraging transfer learning and few-
shot learning techniques could enable sign language recognition systems to generalize
better to new sign languages or users with limited training data, thereby increasing their
adaptability and scalability.

Attention Mechanisms: Further exploration of attention mechanisms could enhance


the interpretability and robustness of sign language recognition models by allowing
them to focus on relevant parts of the input data and ignore irrelevant distractions or
noise.

Self-supervised Learning: By incorporating self-supervised learning techniques, sign


language recognition systems could learn more meaningful representations of sign
language data in an unsupervised or weakly supervised manner, potentially reducing
the need for large labeled datasets.

Multimodal Fusion: Advanced techniques for fusing information from multiple


modalities, such as vision and motion sensors, could enable sign language recognition
systems to achieve higher accuracy and robustness by leveraging complementary
information sources.

Continual Learning: Developing continual learning algorithms that can incrementally


learn new sign language signs or adapt to changes in signing styles over time could
make sign language recognition systems more flexible and capable of long-term use.

Human-AI Collaboration: Exploring ways to facilitate seamless collaboration


between humans and AI systems in sign language interpretation tasks, such as
interactive feedback mechanisms or mixed-initiative interfaces, could enhance the
overall effectiveness and user experience of sign language recognition technology.

These advanced points represent cutting-edge research directions that have the
potential to significantly advance the field of sign language recognition using machine
learning in the coming years.

64
7.4 References

[1] Moryossef, A., Tsochantaridis, I., Aharoni, R., Ebling, S., & Narayanan, S. (2020). Real-time sign language detection using human pose estimation. In Computer Vision – ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part II (pp. 237-248).
https://link.springer.com/chapter/10.1007/978-3-030-66096-3_17

[2] Sahoo, A. K. (2021, June). Indian sign language recognition using machine learning techniques. In Macromolecular Symposia (Vol. 397, No. 1, p. 2000241).
https://onlinelibrary.wiley.com/doi/abs/10.1002/masy.202000241

[3] Buckley, N., Sherrett, L., & Secco, E. L. (2021, July). A CNN sign language recognition system with single & double-handed gestures. In 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC).
https://ieeexplore.ieee.org/abstract/document/9529449/

[4] Sawant, S. N., & Kumbhar, M. S. (2014, May). Real time sign language recognition using PCA. In 2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies (pp. 1412-1415).
https://ijaer.com/admin/upload/03%20Thigale%20S%20S%2001213.pdf

[5] Taskiran, M., Killioglu, M., & Kahraman, N. (2018, July). A real-time system for recognition of American sign language by using deep learning. In 2018 41st International Conference on Telecommunications and Signal Processing (TSP) (pp. 1-5).
https://dergipark.org.tr/en/pub/tjst/issue/72762/1073116

65
