
GestureNet: Bridging Communication Gaps with

Hand Sign Recognition


Kanika Rana, Khushseerat Kaur, Manpreet Singh, Harpuneet Singh, Shashank, Jassandeep Singh
Chandigarh University, Punjab, India
[email protected]

Abstract— Sign language recognition from hand gestures, especially fingerspelled signs, is a crucial way for people who are deaf or hard of hearing to interact with the hearing community. This paper therefore develops vision-based ASL fingerspelling gesture recognition that converts signs into the corresponding text using CNNs. The model identifies 27 signs, consisting of the ASL alphabet and a blank symbol, with an accuracy of 98%. The system's features include dataset creation to capture customized gestures, conversion to grayscale images, blurring with a Gaussian filter, and a two-tiered classification algorithm to aid in separating similar signs. The real-time model uses Python, TensorFlow, Keras, and OpenCV as machine learning frameworks. An autocorrect facility integrated into the software also corrects the recognized text and makes the whole communication process smoother. Although the system shows promising results for hands against light-coloured backgrounds in moderate lighting, shortcomings remain under varying environmental conditions. Future enhancements include background subtraction and better pre-processing for complex environments. This model can be deployed affordably and made easily accessible to deaf and hard-of-hearing people, significantly assisting them in communicating with hearing people in real time.

Keywords— Hand sign language recognition, Convolutional Neural Networks (CNNs), Image pre-processing, Real-time gesture recognition, TensorFlow, OpenCV.

I. INTRODUCTION

Sign language is a way of communicating with deaf and hard-of-hearing people in which signs and gestures convey ideas or emotions. However, because sign language is not widely known by hearing people, there are many barriers to effective communication. This has sparked the creation of technologies capable of translating gestures into text, thus enhancing communication between deaf and hearing people [1]. Vision-based sign language recognition technologies have attracted much interest because they are relatively cheap and operable with standard devices, including webcams and smartphones.

This work concerns identifying hand gestures in American Sign Language (ASL), emphasizing the alphabet and fingerspelling. The project aims to create a real-time system that identifies hand gestures and converts them into text characters using machine learning and computer vision approaches [2]. The system was developed using convolutional neural networks (CNNs), which are well suited to image recognition since they can learn spatial hierarchies and features from hand gesture images [3].

To overcome the problem of data scarcity, a unique dataset was developed, containing raw pictures of ASL symbols taken under different conditions. Pre-processing steps that enhance gesture recognition include greyscale conversion, Gaussian filtering, and image normalization [4]. The model also uses a multi-layered classification algorithm to deal with visually similar gestures and further increase the accuracy and reliability of the system [5].



However, the system could be more effective in challenging scenarios, including low light or when the background is crowded with objects of a similar appearance. Future work will include integrating background subtraction algorithms and improving the pre-processing methods to get better results in real environments [6]. Lastly, the aim of this research is to design a cheap and easy-to-use application that may be installed on most devices in use today to enhance the communication of the deaf and hard of hearing.

An essential strength of this research is that it uses standard hardware, such as webcams, to capture hand gestures and natural movements instead of depth sensors or motion-capture gloves, which are costly and not readily available. This makes the system very affordable and usable in everyday life by the general public [7]. The availability of standard computer vision libraries such as OpenCV also helps develop and deploy the system on various platforms, from personal computers to smartphones [8].

This work's machine learning application is based on Convolutional Neural Networks (CNNs) for gesture recognition. The architecture of CNNs is pyramidal, allowing them to learn the essential features of the images directly from the raw input data. In this model, the CNNs extract features such as edges and shapes from the hand gesture images, which help differentiate between the different ASL signs. Because the network has several convolutional layers, it can learn complex patterns and thus classify the gestures correctly [9].
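As an illustration of this idea, the following minimal Keras sketch shows a small CNN of the kind described above, with stacked convolution and pooling layers feeding a softmax over 27 classes (26 letters plus a blank). The layer sizes and the 128x128 grayscale input are assumptions made for the sketch; the paper does not report the exact architecture it used.

# Minimal CNN sketch for 27-class ASL fingerspelling recognition (illustrative only;
# the exact architecture and input size used in the paper are not specified).
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 27             # 26 letters + blank symbol
INPUT_SHAPE = (128, 128, 1)  # assumed grayscale input size

def build_model():
    model = keras.Sequential([
        keras.Input(shape=INPUT_SHAPE),
        layers.Conv2D(32, 3, activation="relu"),   # low-level edges
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),   # shapes / finger contours
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),  # higher-level hand patterns
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    build_model().summary()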
The proposed system also has an autocorrection feature to help minimise errors and increase the quality of the translated text. This feature is beneficial when gestures resemble one another or when there are partial inaccuracies, since the user can choose a correction from the available options based on the context [10]. By integrating real-time gesture recognition with intelligent error correction, the system can offer better and more reliable communication.
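A simple way to realize such an autocorrect step is to compare the recognized letter sequence against a word list and offer the closest matches; the sketch below uses Python's standard difflib for this purpose. The word list and similarity cutoff are illustrative assumptions; the paper does not state which autocorrect method or dictionary it uses.

# Illustrative autocorrect step: suggest dictionary words close to the recognized string.
# The word list and cutoff are placeholders; the paper does not name its autocorrect method.
import difflib

WORD_LIST = ["HELLO", "HELP", "THANKS", "PLEASE", "YES", "NO"]  # hypothetical vocabulary

def suggest_corrections(recognized: str, max_suggestions: int = 3):
    """Return the closest dictionary words for a fingerspelled string."""
    return difflib.get_close_matches(recognized.upper(), WORD_LIST,
                                     n=max_suggestions, cutoff=0.6)

if __name__ == "__main__":
    # e.g. the classifier confused 'P' with 'F' while spelling HELP
    print(suggest_corrections("HELF"))   # -> ['HELP', 'HELLO'] (ordered by similarity)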
However, the system still has some drawbacks when used in real-world applications. The outcome of gesture recognition can be influenced by the environment, including lighting changes, background noise, and variance in skin colour. To overcome these issues, further improvements in image pre-processing are needed, such as better background subtraction methods and dynamic lighting compensation [11]. Furthermore, the system is still limited to recognising static gestures of single letters; future work will expand the model to include dynamic gestures in order to translate whole words and phrases [12].

A. RELEVANT CONTEMPORARY ISSUES:
Hand sign language recognition models, including the one presented in this paper, address several current challenges, especially those related to accessibility and inclusion of deaf and hard-of-hearing populations. The major problem is the lack of understanding between sign language users and those who do not know sign language, so it is crucial to have technology that can help break down the barriers this creates. Another present issue is the shortage of cheap and accessible assistive technologies. Current solutions rely on sensor-based gloves or depth-sensing cameras for sign language recognition; these are costly and not portable for everyday use. Vision-based approaches such as the one proposed here are more efficient and economical than traditional systems, making assistive devices affordable and available to the general populace.

Furthermore, there is a need for real-time gesture recognition, especially in fast-paced environments where a message must be passed quickly, such as hospitals or emergency services. However, most existing approaches perform poorly in practical scenarios due to changes in lighting conditions, complex backgrounds, and differences in the shapes of users' hands. To address these problems, better pre-processing and robust machine-learning techniques capable of operating in different scenarios must be developed. Some ethical concerns also arise regarding user privacy in computer vision applications; solutions must guarantee that data, especially video feeds, are processed securely and with the users' permission.

B. IDENTIFICATION OF PROBLEM:
The first issue addressed in this research is the lack of accessible, effective communication between deaf or hard-of-hearing people and the hearing majority. Sign language is very influential among deaf people but has a major disadvantage: it is specialized and not easily understood by the general populace in most cases. This leads to exclusion from society and difficulty accessing social services, since individuals cannot communicate with each other.
Although interpreters can bridge this communication gap, they are not always present, and even when they are, they cannot always assist in real-time or casual situations.

Furthermore, current state-of-the-art approaches for sign language recognition require complex technological tools, including motion-capture gloves or depth-sensing cameras, which can hardly be popularized. Although machine learning and computer vision technologies have improved, real-time systems still have problems: low accuracy in natural environments with changing lighting conditions, complex backgrounds behind the hand, and variations in hand sizes. In addition, most models rely on recognizing gestures in still images and thus lack the capacity to recognize the dynamic signs commonly used in sign language conversations. As such, it is critical to create an affordable, low-cost, real-time identification method that works effectively across various platforms.

C. IDENTIFICATION OF TASK
The main goal of this study is to design a real-time hand sign language recognition system that can effectively convert American Sign Language (ASL) into text using machine learning algorithms. This involves several key components: gathering a relevant set of ASL hand gesture data, normalizing the images for gesture recognition, and creating a CNN that can classify 27 symbols, including the whole alphabet and a blank. It also involves real-time video processing, in which hand gestures are received from a video stream, processed by the system, and translated into the appropriate text on the screen.

The task also encompasses enhanced pre-processing methods, such as background subtraction and Gaussian smoothing, to improve the model's reliability in variable environments. Another important consideration is the problem of overlapping gestures, in which two gestures may look almost identical; this is where a layered classification algorithm is necessary to minimize misclassification. In addition, an auto-correction feature is included in the system to provide correction options for mistakenly typed or identified words. The objective is to design a system that is inexpensive, user-friendly, and efficient in real-life conditions with high accuracy.
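A minimal sketch of such a real-time loop is given below, assuming a trained Keras model saved as "asl_cnn.h5", a fixed region of interest for the hand, and 128x128 grayscale inputs; all of these are illustrative assumptions rather than the paper's exact pipeline.

# Illustrative real-time loop: webcam frame -> preprocessing -> CNN prediction -> on-screen text.
# The model path, ROI coordinates and input size are assumptions made for this sketch.
import cv2
import numpy as np
from tensorflow import keras

# Label order must match the order used when training the model.
LABELS = ["blank"] + [chr(c) for c in range(ord("A"), ord("Z") + 1)]  # 27 classes
model = keras.models.load_model("asl_cnn.h5")   # hypothetical trained model file

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    roi = frame[100:400, 100:400]                         # assumed hand region
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)          # grayscale conversion
    blur = cv2.GaussianBlur(gray, (5, 5), 0)              # Gaussian smoothing
    resized = cv2.resize(blur, (128, 128)) / 255.0        # normalize to [0, 1]
    probs = model.predict(resized.reshape(1, 128, 128, 1), verbose=0)[0]
    letter = LABELS[int(np.argmax(probs))]
    cv2.rectangle(frame, (100, 100), (400, 400), (0, 255, 0), 2)
    cv2.putText(frame, f"Prediction: {letter}", (10, 40),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("GestureNet", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()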
D. RELATED WORK
This paper focuses on sign language recognition, which has been an area of study for researchers over the last few decades, with the main aim of developing technologies that bridge the gap between people who are deaf or hard of hearing and the hearing community. In the early stages of development, the systems used were mainly hardware-based; this included glove-based sensors and depth cameras for capturing hand movements and gestures. However, these devices were expensive and hence not suitable for use by the general populace, so their availability was restricted [1].

Developments in computer vision and deep learning have led to video-based sign language recognition systems that employ regular cameras and machine learning techniques to identify hand signs. Convolutional neural networks (CNNs) have been identified as efficient in image classification and have become popular in this field. Some recent works, for instance Lin et al. (2017), have shown that CNNs can detect hand gestures with high accuracy, but these systems were designed to recognize only a few static gestures [2].

In the same way, several studies have sought to enhance gesture recognition by applying image pre-processing methods. For example, Zaki and Shahen (2011) used background subtraction and skin colour detection to increase recognition performance in complex scenes [3]. Nevertheless, these approaches have several drawbacks and may fail under different lighting conditions and background scenes. Even though some systems have shown remarkable results in controlled conditions, maintaining performance in natural conditions remains an acute problem.

Moreover, multi-layered algorithms help discriminate between visually similar gestures, a significant issue for gesture recognition systems. For instance, Kang et al. (2015) proposed a multi-layer classification model in which extra classifiers were added to differentiate between similar hand signs, thus enhancing the recognition of ambiguous signs [4].

Nevertheless, the majority of such systems remain confined by one or another requirement for additional hardware or controlled operating conditions. This research therefore seeks to contribute to these studies by designing an affordable vision-based model that can operate in real time using low-cost components such as webcams, while addressing the issues of precision and environmental adaptability.

E. SUMMARY:
This research aims to design a real-time vision-based hand sign language recognition system that translates American Sign Language into text. To this end, the system employs CNNs to identify 27 ASL signs, including the alphabet letters and a blank (no-input) symbol.
To overcome the problem of limited dataset availability, a new dataset was created, containing raw images of hand gestures taken with a standard webcam. These images are then pre-processed for better recognition; the pre-processing steps include converting images to grayscale, applying a Gaussian filter, and resizing the images.
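The webcam-based dataset collection described here can be approximated with a short OpenCV capture script like the sketch below, which saves cropped hand-region images into one folder per symbol. The folder layout, region of interest, and key bindings are assumptions made for illustration.

# Illustrative dataset-capture script: press a letter key (or space for 'blank') to save
# the current hand region into data/<label>/. Paths, ROI and keys are assumptions.
import os
import cv2

DATA_DIR = "data"
cap = cv2.VideoCapture(0)
counts = {}
while True:
    ok, frame = cap.read()
    if not ok:
        break
    roi = frame[100:400, 100:400]                 # assumed hand region
    cv2.rectangle(frame, (100, 100), (400, 400), (255, 0, 0), 2)
    cv2.imshow("capture", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == 27:                                 # Esc quits
        break
    if key == ord(" ") or (ord("a") <= key <= ord("z")):
        label = "blank" if key == ord(" ") else chr(key).upper()
        os.makedirs(os.path.join(DATA_DIR, label), exist_ok=True)
        counts[label] = counts.get(label, 0) + 1
        path = os.path.join(DATA_DIR, label, f"{counts[label]:04d}.png")
        cv2.imwrite(path, roi)                    # save the raw ROI image
cap.release()
cv2.destroyAllWindows()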
In the proposed CNN-based model, convolution and pooling layers capture critical spatial features from hand gesture images for classification. To enhance performance, a second classification stage was added to separate visually similar gestures, and an auto-correct feature was added for the text output. In a controlled environment, the system has a recognition rate of 98 per cent.
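A hedged sketch of how such a model could be trained on the captured dataset is shown below; the directory layout, image size, and epoch count are assumptions, and build_model refers to the CNN sketched earlier, since the paper does not report its training setup.

# Illustrative training run on the captured dataset (data/<label>/*.png). Directory layout,
# image size and epoch count are assumptions; the paper does not report its training setup.
from tensorflow import keras

IMG_SIZE = (128, 128)

train_ds = keras.utils.image_dataset_from_directory(
    "data", validation_split=0.2, subset="training", seed=42,
    color_mode="grayscale", image_size=IMG_SIZE, batch_size=32)
val_ds = keras.utils.image_dataset_from_directory(
    "data", validation_split=0.2, subset="validation", seed=42,
    color_mode="grayscale", image_size=IMG_SIZE, batch_size=32)

# Scale pixel values to [0, 1] to match the preprocessing used at inference time.
normalize = lambda x, y: (x / 255.0, y)
train_ds = train_ds.map(normalize)
val_ds = val_ds.map(normalize)

model = build_model()   # CNN sketched earlier (27-class softmax)
# image_dataset_from_directory yields integer labels, so use the sparse loss here.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=15)
model.save("asl_cnn.h5")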

Despite this, the model faces potential challenges such as background noise, changes in lighting conditions, and moving gestures. Future work includes using background subtraction algorithms and better preprocessing methods to improve performance in real-world conditions. Overall, this project seeks to develop a cost-effective solution that ensures proper and instant communication between deaf and hearing people through sign language.

F. OBJECTIVES
The primary objectives of this research on hand sign language recognition are as follows:

• To develop a real-time system able to identify gestures used in American Sign Language fingerspelling and translate them into text, creating better communication between deaf or hard-of-hearing people and hearing people in society.

• To exploit CNNs' potential in identifying the 27 ASL symbols, including the alphabet and a blank symbol, from hand gesture images by learning their important features.

• Since no datasets that fulfil the above requirements are available, to collect a new dataset of ASL gestures recorded using a standard webcam. The project also aims to fine-tune basic image processing methods, such as converting colour images to grayscale, blurring with Gaussian filters, and resizing images, to improve the recognition rate.

• To propose a multistage classification approach that improves the classification of gestures that look almost the same and increases the general performance of the model.

G. CONCEPT GENERATION:
The concept generation for this hand sign language recognition model revolves around leveraging modern computer vision and machine learning techniques to address the communication gap between deaf individuals and the hearing population. The foundational idea is to create a system that translates American Sign Language (ASL) gestures into text in real time, using a combination of convolutional neural networks (CNNs) and image processing algorithms.

Several key concepts are generated from this idea:

• Real-Time Video Processing
• CNN-Based Gesture Recognition
• Custom Dataset Generation
• Image Preprocessing and Enhancement
• Multi-Layered Classification Algorithm

H. DESIGN CONSTRAINTS:
The system is intended to run on cheap and easily accessible hardware, including ordinary webcams. This limits the accuracy of gesture tracking compared to more expensive technologies such as depth cameras or motion-tracking gloves, and may lead to imprecision for complicated gestures or small hand movements.

• Feature Selection:
Choosing the right features is essential when designing a hand sign language recognition system, especially for gesture recognition. The main characteristics used in this project are based on the hand gestures detected in each image or video frame. Such features include the hand's shape, edges, and orientation, which are very useful in differentiating the various ASL symbols. The necessary features are extracted from the raw image data during training of the Convolutional Neural Networks (CNNs). Feature selection is further improved by preprocessing techniques. For instance, converting the images to grayscale simplifies the input, since colour information is not necessary for gesture recognition. Besides, applying a Gaussian blur suppresses noise and emphasises the leading edges of the hand. Another critical step is edge detection, which helps define the boundaries of the hand, making it easier for the CNN to emphasise the right parts of the image.
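The preprocessing chain described above (grayscale conversion, Gaussian blurring, and optional edge emphasis) can be expressed in a few OpenCV calls, as in the sketch below; the kernel size, threshold values, and target resolution are illustrative assumptions.

# Illustrative preprocessing for a single gesture image: grayscale -> Gaussian blur ->
# (optional) edge/threshold emphasis -> resize and normalize. Parameter values are assumptions.
import cv2
import numpy as np

def preprocess(image_bgr, size=(128, 128), emphasize_edges=False):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)       # drop colour information
    blur = cv2.GaussianBlur(gray, (5, 5), 0)                  # suppress sensor noise
    if emphasize_edges:
        # Adaptive thresholding highlights hand contours against the background.
        blur = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY_INV, 11, 2)
    resized = cv2.resize(blur, size)
    return resized.astype(np.float32) / 255.0                 # scale to [0, 1] for the CNN

if __name__ == "__main__":
    img = cv2.imread("data/A/0001.png")                       # hypothetical sample path
    x = preprocess(img)
    print(x.shape, x.min(), x.max())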
• Feature Importance:
In the hand sign language recognition model, different features are critical to identifying the various signs and, thus, to the performance of the system. The most significant features are shape and contour, since they carry the main idea of the hand gesture that represents a particular word or phrase. The configuration of the fingers when forming a concept in American Sign Language (ASL) is crucial for differentiating the signs. For instance, the finger positions in the signs for 'D' and 'R' are similar but differ in specific ways that the system must be able to distinguish to avoid confusion. Another essential feature is edge detection, because it allows the model to determine the limits of the hand; the model can then pay more attention to the critical areas that differ from one gesture to another. These include finger positioning and movement orientation, which help provide a detailed meaning of the gesture. In static gesture recognition, the position of the fingers plays a vital role, as a slight change in position changes the recognized letter.

II. RESULT ANALYSIS AND VALIDATION

A. Result Analysis
The hand sign language recognition model was tested extensively on a custom dataset of American Sign Language (ASL) hand gestures, encompassing 27 symbols (A-Z and a blank symbol). The system achieved an overall accuracy of 98%, demonstrating its strong ability to correctly classify static gestures. The high accuracy rate highlights the effectiveness of the Convolutional Neural Network (CNN) in extracting and identifying key features, such as hand shape, finger position, and edge contours, from the images.
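A hedged sketch of how such an evaluation can be reproduced is shown below: it runs the trained model on a held-out test set and reports overall accuracy and a per-class confusion matrix. The file paths and the use of scikit-learn are assumptions; the paper does not describe its exact evaluation script.

# Illustrative evaluation: overall accuracy and confusion matrix on a held-out test set.
# Test-set arrays and the model file are assumed to exist; this is not the paper's own script.
import numpy as np
from tensorflow import keras
from sklearn.metrics import accuracy_score, confusion_matrix

model = keras.models.load_model("asl_cnn.h5")        # hypothetical trained model
x_test = np.load("x_test.npy")                        # shape (N, 128, 128, 1), assumed
y_test = np.load("y_test.npy")                        # integer labels 0..26, assumed

y_pred = np.argmax(model.predict(x_test, verbose=0), axis=1)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))               # rows: true class, cols: predicted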

Model Accuracy:
For static gesture classification, the model accurately recognized the 27 American Sign Language (ASL) gestures, including the alphabet (A-Z) and a blank symbol, with 98% accuracy.

Gesture Classification:
The CNN-based model effectively captured and identified relevant features like hand shape and finger positioning, which are vital in distinguishing one ASL sign from another.

Preprocessing Impact:
Preprocessing techniques, such as grayscaling and Gaussian blur filtering, enhanced the recognition rate by eliminating background noise and making the gesture images clearer.

Handling Similar Gestures:
The proposed multi-layered classification algorithm distinguished between similar gestures (for example, the 'D' and 'R' hand signs) and ensured minimal misclassification of gestures, which is a common problem in sign language recognition (a sketch of this two-tier idea is given below).

Autocorrect Feature:
The autocorrect feature enhanced the system's text output by suggesting accurate words depending on the context, decreasing the frequency of incorrect outputs.
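The two-tier idea behind the "Handling Similar Gestures" result can be sketched as follows: a primary CNN predicts over all 27 classes, and when its prediction falls into a known confusable group (for example, 'D' and 'R'), a smaller secondary classifier trained only on that group makes the final decision. The group definition and model files here are assumptions used to illustrate the approach, not the paper's published algorithm.

# Illustrative two-tier classification: a secondary classifier re-checks gestures that the
# primary CNN assigns to a confusable group (e.g. 'D' vs 'R'). Group contents and model
# files are assumptions; the paper does not publish its exact layered algorithm.
import numpy as np
from tensorflow import keras

# Label order must match the order used when training the primary model.
LABELS = ["blank"] + [chr(c) for c in range(ord("A"), ord("Z") + 1)]
CONFUSABLE = {"D", "R"}                                # assumed group of similar signs
SECONDARY_LABELS = ["D", "R"]

primary = keras.models.load_model("asl_cnn.h5")        # 27-way classifier (hypothetical file)
secondary = keras.models.load_model("dr_cnn.h5")       # binary D-vs-R classifier (hypothetical)

def classify(image):
    """image: preprocessed array of shape (128, 128, 1), values in [0, 1]."""
    x = image.reshape(1, *image.shape)
    first = LABELS[int(np.argmax(primary.predict(x, verbose=0)))]
    if first in CONFUSABLE:
        # Second tier resolves the ambiguous pair with a specialised model.
        first = SECONDARY_LABELS[int(np.argmax(secondary.predict(x, verbose=0)))]
    return first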
ANALYSIS:
Graph: Model Accuracy under Different Test Conditions

Graph: Hand Sign Language Recognition Process

B. Validation
The hand sign language recognition system's validation included extensive testing in different environments to check the system's reliability and applicability. First, the system was tested in controlled settings, producing high accuracy in identifying hand gestures in front of plain backgrounds with steady lighting. To expand the scope of the evaluation, the model was then tested under more complex real-world scenarios, including different lighting conditions, complex backgrounds, and different hand shapes and sizes of the user. The system performed well and maintained high accuracy in controlled environments, but a slight degradation in accuracy was seen in the complex environments, which shows that lighting and background complexity affect the efficiency of the proposed model.

Moreover, the system was tested with different subjects to establish its effectiveness with more than one user. The general performance of the model was good, but some discrepancies were observed when gestures were made at different speeds or when there was a slight change in the positioning of the fingers. Thus, these results indicate that the model can be used for real-time applications when conditions are favourable, though it needs improvement to perform better in dynamic and complex real-life situations. Even so, the system has the potential to provide an accurate and easily accessible means of communication between people who are deaf or hard of hearing and hearing people.

III. CONCLUSION AND FUTURE WORK

A. CONCLUSION:
The hand sign language recognition model developed in this research is a step towards reducing the communication barrier between deaf and hearing individuals. The proposed system implemented a Convolutional Neural Network (CNN) for ASL gesture classification and achieved an accuracy of 98% for the ASL alphabet and a blank gesture. The model manages real-time recognition well through image preprocessing methods, including grayscale conversion and Gaussian blur filtering, which improve the distinction of gestures and eliminate noise.

Furthermore, the incorporation of a multi-layered classification algorithm helped the model better classify visually similar gestures and therefore increased accuracy. The autocorrect feature also improved the output by suggesting corrections for misclassified words, enhancing the user experience.

Despite the system's success in simulation, there are issues concerning generalization to real-world scenarios, including illumination changes and cluttered backgrounds. In addition, the current model can only identify static gestures; future enhancements will add dynamic gestures and increase its practicality in various conditions. The proposed system is generally inexpensive and readily available and can improve communication between people who are deaf or hard of hearing and hearing people.

B. FUTURE WORK:
In future work on the hand sign language recognition model, the current shortcomings will be further investigated and the model developed to increase its practical applicability. The first goal of the proposed work is to enhance the model's ability to operate under different conditions by integrating more sophisticated background subtraction and light adaptation techniques, so that the system works efficiently under different lighting conditions and with complex backgrounds. Moreover, dynamic gesture recognition will be a significant focus, so that the model recognizes not only static fingerspelling gestures but also complete sequences of dynamic gestures that form whole words and phrases in conversational sign language.
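As one concrete direction for this, OpenCV's built-in background subtractors can isolate the moving hand from a static scene before the frame is passed to the recognizer; the sketch below uses MOG2 with illustrative parameter values, since the paper does not commit to a specific algorithm.

# Illustrative background subtraction for future work: isolate the hand region with MOG2
# before recognition. Parameter values are assumptions, not settings from the paper.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25,
                                                detectShadows=False)
cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                        # foreground (moving hand) mask
    mask = cv2.medianBlur(mask, 5)                        # remove speckle noise
    hand_only = cv2.bitwise_and(frame, frame, mask=mask)  # keep only foreground pixels
    cv2.imshow("foreground", hand_only)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()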
Another potential area of improvement is to expand the training dataset to include more variation in hand shapes, sizes, skin colours, and gesture velocities. Higher-level NLP methods could also enhance sentence formation, enabling the system to handle more complicated syntactic forms. Last but not least, optimizing the model for mobile and edge devices will allow real-time sign language translation on smartphones and other portable devices, further improving the daily communication of deaf people.

REFERENCES

[1] Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of the Conference on Fairness, Accountability, and Transparency, 77-91. https://doi.org/10.1145/3287560.3287593

[2] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1412.6572

[3] Jiao, W., Lyu, M. R., & King, I. (2019). Real-time emotion recognition via attention-gated hierarchical memory network. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1), 8002-8009. https://doi.org/10.1609/aaai.v33i01.33018002

[4] Subramanian, B., Kim, J., Maray, M., & Paul, A. (2022). Digital twin model: A real-time emotion recognition system for personalized healthcare. IEEE Access, 10, 81155-81165. https://doi.org/10.1109/ACCESS.2022.3187717

[5] Tadesse, M. M., Hong, M., & Eom, J. H. (2020). Emotion detection using affective computing techniques: A comprehensive review. IEEE Transactions on Affective Computing, 11(4), 432-450. https://doi.org/10.1109/TAFFC.2020.2972557

[6] Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2016). Learning social relation traits from face images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1295-1307. https://doi.org/10.1109/TPAMI.2016.2572671

[7] Ekman, P., & Friesen, W. V. (1978). Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press.

[8] Kotsia, I., & Pitas, I. (2008). Facial expression recognition in image sequences using geometric deformation features and support vector machines. IEEE Transactions on Image Processing, 16(1), 172-187. https://doi.org/10.1109/TIP.2006.888195

[9] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Zheng, X. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, 265-283. https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf

[10] Mollahosseini, A., Hasani, B., & Mahoor, M. H. (2017). AffectNet: A database for facial expression, valence, and arousal computing in the wild. IEEE Transactions on Affective Computing, 10(1), 18-31. https://doi.org/10.1109/TAFFC.2017.2740923
