
CHAPTER 1

INTRODUCTION

DEEP LEARNING

Deep learning is a hot buzzword nowadays and has firmly put down roots in a vast
multitude of industries investing in fields such as Artificial Intelligence, Big Data, and
Analytics. For example, Google uses deep learning in its voice and image recognition
algorithms, while Netflix and Amazon use it to understand the behavior of their customers.
Researchers at MIT are even trying to predict the future using deep learning. Deep learning
can be considered a subset of machine learning: a field based on algorithms that learn and
improve on their own by examining data. While machine learning uses simpler concepts,
deep learning works with artificial neural networks, which are designed to imitate how
humans think and learn. Artificial neural networks are inspired by biological neurons,
the cells of the brain. Until recently, neural networks were limited by computing power
and thus were limited in complexity.

Deep learning has aided image classification, language translation, and speech recognition.
It can be used to solve almost any pattern recognition problem without human intervention.
Deep learning models are capable of focusing on the most informative features themselves,
requiring little guidance from the programmer, and are very helpful in mitigating the
problem of dimensionality. Deep learning algorithms are used especially when we have a
huge number of inputs and outputs.

HOW DEEP LEARNING WORKS

Neural networks are layers of nodes, much as the human brain is made up of
neurons. Nodes within individual layers are connected to nodes in adjacent layers. The
network is said to be deeper the more layers it has. A single neuron in the human brain
receives thousands of signals from other neurons. In an artificial neural network, signals
travel between nodes along weighted connections. A more heavily weighted node exerts
more effect on the next layer of nodes. The final layer compiles the weighted inputs to
produce an output. Deep learning systems require powerful hardware because they process
large amounts of data and perform many complex mathematical calculations. Even with
such advanced hardware, however, training a neural network can take weeks.

Deep learning systems require large amounts of data to return accurate results;
accordingly, information is fed in as huge data sets. When processing the data, artificial
neural networks classify it using the answers to a series of binary true/false questions
involving highly complex mathematical calculations. Deep learning takes this one step
further: deep neural networks automatically discover the features that are important for
classification, whereas in machine learning these features had to be defined manually.

WHY DEEP LEARNING IS POPULAR

The first advantage of deep learning over machine learning is that it removes the
need for so-called feature extraction. Long before deep learning was used, traditional
machine learning methods such as Decision Trees, SVMs, the Naïve Bayes classifier, and
Logistic Regression were the mainstay. These algorithms are also called flat algorithms:
they cannot normally be applied directly to raw data (such as .csv files, images, or text).
A pre-processing step called feature extraction is needed. The result of feature extraction
is a representation of the raw data that these classic machine learning algorithms can use
to perform a task. Feature extraction is usually quite complex and requires detailed
knowledge of the problem domain; this pre-processing layer must be adapted, tested, and
refined over several iterations for optimal results. In an artificial neural network, the
feature extraction step is already part of the process: during training it is optimized by
the network itself to obtain the best possible abstract representation of the input data.
This means that deep learning models require little to no manual effort to perform and
optimize the feature extraction process.

TYPES OF DEEP LEARNING

✓ Feed-forward neural network
✓ Radial basis function neural network
✓ Multi-layer perceptron
✓ Convolutional neural network (CNN)
✓ Recurrent neural network (RNN)
✓ Modular neural network

FEED FORWARD NEURAL NETWORK

This is the most basic type of neural network, in which data flows from the input
layer towards the output layer. Because the data moves in only one direction, there are no
feedback loops or backward connections in the network. There can be a single hidden
layer or multiple hidden layers, depending on the kind of data you are dealing with; the
number of hidden layers is known as the depth of the neural network, and a deeper
network can learn more complex functions. The input layer first provides the neural
network with data, and the output layer then makes predictions on that data based on a
series of functions. The ReLU function is the most commonly used activation function in
deep neural networks.
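
As an illustration, here is a minimal feed-forward network sketch: data flows in one direction, from the input layer through ReLU hidden layers to the output layer. The use of TensorFlow/Keras and the layer sizes are assumptions chosen for illustration, not part of the original text.

import tensorflow as tf

# A minimal feed-forward (sequential) network: data flows from input to
# output through dense layers with ReLU activations. Sizes are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),              # e.g. a flattened 28x28 image
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 2 (depth = 2)
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer: class scores
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()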

RADIAL BASIS FUNCTION NEURAL NETWORK

This kind of network generally has more than one layer, preferably two. Radial basis
networks are often used in power restoration systems to restore power in the shortest span
of time and avoid blackouts. The radial basis function (RBF) network is a popular type of
feed-forward network. It has two layers, not counting the input layer, and differs from a
multilayer perceptron in how the hidden units perform their computations. Each hidden
unit essentially defines a particular point in input space, and its output, or activation, for a
given instance depends on the distance between its point and the instance, which is simply
another point. The closer these two points, the stronger the activation.

The parameters that such a network learns are the centres and widths of the RBFs
and the weights used to form the linear combination of the outputs obtained from the
hidden layer. An essential benefit over multilayer perceptrons is that the first group of
parameters can be determined independently of the second group while still producing
accurate classifiers. One way to determine the first group of parameters is to use
clustering: the simple k-means algorithm can be applied, clustering each class
independently to obtain k basis functions per class. The second group of parameters is
then learned while keeping the first group fixed, which amounts to training a simple linear
classifier using an approach such as linear or logistic regression. If there are far fewer
hidden units than training instances, this can be done very quickly.
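
The two-stage training just described can be sketched in a few lines of Python. This is only a toy illustration under assumed sizes and widths: k-means picks the centres (first group of parameters), then a logistic regression is fit on the Gaussian activations (second group).

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def rbf_features(X, centres, width):
    # Gaussian activation: instances closer to a centre get larger outputs.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

X = np.random.rand(200, 2)                    # toy data
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)     # toy labels

centres = KMeans(n_clusters=5, n_init=10).fit(X).cluster_centers_  # stage 1
Phi = rbf_features(X, centres, width=0.3)     # hidden-layer activations
clf = LogisticRegression().fit(Phi, y)        # stage 2: linear classifier
print("train accuracy:", clf.score(Phi, y))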

The limitation of RBF networks is that they give every attribute the same weight,
because all attributes are treated equally in the distance computation, unless attribute-
weight parameters are included in the overall optimization process. Therefore, unlike
multilayer perceptrons, they cannot deal effectively with irrelevant attributes. Support
vector machines share the same issue. In fact, support vector machines with Gaussian
kernels (i.e., "RBF kernels") are a particular kind of RBF network, in which one basis
function is centered on each training instance, all basis functions share the same width,
and the outputs are combined linearly by computing the maximum-margin hyperplane.
The effect is that only some of the RBFs, the ones corresponding to the support vectors,
have a nonzero weight.

MULTI-LAYER PERCEPTRON

This type of network has more than three layers and is used to classify data that is
not linearly separable. These networks are extensively used in speech recognition and
other machine learning applications. The multilayer perceptron, also known as the MLP,
consists of fully connected dense layers that transform any input dimension to the desired
dimension. A multi-layer perceptron is a neural network with multiple layers: to create it,
we combine neurons so that the outputs of some neurons are the inputs of other neurons.

Consider a network with three inputs, and thus three input nodes, a hidden layer
with three nodes, and an output layer with two nodes producing two outputs. The nodes in
the input layer take the input and forward it for further processing: each input node
forwards its output to each of the three nodes in the hidden layer, and in the same way the
hidden layer processes the information and passes it to the output layer. Every node in the
multi-layer perceptron uses a sigmoid activation function, which takes real values as input
and converts them to numbers between 0 and 1 using the sigmoid formula.
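
The forward pass of the small network just described (three inputs, three hidden nodes, two outputs) can be written directly with the sigmoid formula sigma(x) = 1 / (1 + exp(-x)); the random weights below are placeholders for illustration.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # squashes any real value into (0, 1)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)   # input -> hidden weights
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)   # hidden -> output weights

x = np.array([0.5, -1.2, 3.0])   # three input features
h = sigmoid(x @ W1 + b1)         # three hidden activations
y = sigmoid(h @ W2 + b2)         # two outputs, each in (0, 1)
print(h, y)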

CONVOLUTIONAL NEURAL NETWORK (CNN)

CNN is a variation of the multilayer perceptron. A CNN can contain more than one
convolution layer, and because it contains convolution layers the network can be very
deep with fewer parameters. CNNs are very effective for image recognition and for
identifying different image patterns. When it comes to machine learning, artificial neural
networks perform really well; they are used in various classification tasks involving
images, audio, and words. Different types of neural networks are used for different
purposes: for example, for predicting a sequence of words we use recurrent neural
networks, more precisely an LSTM, while for image classification we use convolutional
neural networks. Before diving into the convolutional neural network, let us first revisit
some concepts of a regular neural network, which has three types of layers:

1. Input layer: It's the layer through which we give input to our model. The number of
neurons in this layer is equal to the total number of features in our data (the number
of pixels in the case of an image).
2. Hidden layer: The input from the input layer is then fed into the hidden layer. There
can be many hidden layers, depending on our model and data size. Each hidden layer
can have a different number of neurons, generally greater than the number of
features. The output of each layer is computed by matrix multiplication of the
previous layer's output with that layer's learnable weights, followed by the addition
of learnable biases and an activation function, which makes the network nonlinear.
3. Output layer: The output from the hidden layers is fed into a logistic function such
as sigmoid or softmax, which converts the output for each class into its probability
score.
The data is fed into the model and the output from each layer is obtained; this step is
called feed-forward. We then calculate the error using an error function; common error
functions include cross-entropy and squared loss. After that, we back-propagate through
the model by calculating the derivatives. This step, called back-propagation, is what
minimizes the loss (a worked sketch follows).
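
A single feed-forward and back-propagation step can be made concrete for a softmax output layer with cross-entropy error. This is a toy sketch under assumed sizes (one linear layer only); it uses the standard gradient p - t of cross-entropy combined with softmax.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # stabilized exponentials
    return e / e.sum()

x = np.array([1.0, 2.0])             # input features
t = np.array([0.0, 1.0, 0.0])        # one-hot target class
W = np.zeros((2, 3))                 # output-layer weights

p = softmax(x @ W)                   # feed forward: class probabilities
loss = -np.sum(t * np.log(p))        # cross-entropy error
grad_W = np.outer(x, p - t)          # back propagation: dLoss/dW
W -= 0.1 * grad_W                    # gradient-descent weight update
print("loss:", loss)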

RECURRENT NEURAL NETWORK

An RNN is a type of neural network in which the output of a particular neuron is fed
back as an input to the same node; this feedback helps the network predict the output.
Such networks maintain a small amount of memory state, which is very useful for
developing chatbots and text-to-speech technologies. The output from the previous step is
fed as input to the current step. In traditional neural networks, all inputs and outputs are
independent of each other; but in cases such as predicting the next word of a sentence, the
previous words are required, and hence there is a need to remember them. Thus the RNN
came into existence, solving this issue with the help of a hidden layer. The main and most
important feature of an RNN is its hidden state, which remembers information about the
sequence.

An RNN has a "memory" that retains information about everything that has been
calculated so far. It uses the same parameters for each input, performing the same task on
all inputs and hidden layers to produce the output; this reduces the number of parameters,
unlike other neural networks. An RNN does the following (see the sketch after this list):
 It converts independent activations into dependent ones by giving the same weights
and biases to all the layers, thereby reducing the growth in parameters and
memorizing each previous output by feeding it as input to the next hidden step.
 Hence these layers can be joined together into a single recurrent layer, such that the
weights and biases of all the hidden layers are the same.
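
The recurrence is easy to see in code: the same weight matrices are applied at every step, and the hidden state carries memory from one step to the next. The sizes below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
Wx = rng.normal(scale=0.1, size=(4, 8))   # input -> hidden (shared across steps)
Wh = rng.normal(scale=0.1, size=(8, 8))   # hidden -> hidden (shared across steps)
b = np.zeros(8)

h = np.zeros(8)                            # initial hidden state ("memory")
for x_t in rng.normal(size=(5, 4)):        # a sequence of 5 input vectors
    h = np.tanh(x_t @ Wx + h @ Wh + b)     # output of step t feeds step t+1
print(h)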

MODULAR NEURAL NETWORK

A modular neural network is made up of several neural network models linked
together via an intermediary. Modular neural networks allow for more complex
management and handling of simpler neural network systems. The multiple neural
networks act as modules, each solving a portion of the problem. An integrator is
responsible for dividing the problem into multiple modules and for combining the
answers of the modules into the system's final output. Modular neural networks have
been studied in various ways since the 1980s. According to the idea of ensemble
learning, a collection of "simple" or "weak" learners can outperform a single deep
learning model. In general, modular neural networks allow engineers to expand the
possibilities of these technologies and push the limits of what neural networks can do:
each network becomes a module that may be freely combined with modules of other sorts.
Factors leading to the development of modular neural networks:
 Reducing model complexity: Controlling the degrees of freedom of the system is one
method to minimize training time.
 Data fusion and prediction averaging: Network committees may be thought of as
composite systems consisting of comparable parts.
 Combination of techniques: As a building block, more than one method or network
class can be utilized.
 Learning several tasks at the same time: Trained modules can be transferred
between systems that are built for various tasks.
 Robustness and incrementality: The integrated network may be fault-tolerant and
develop progressively.

APPLICATIONS OF DEEP LEARNING

NATURAL LANGUAGE PROCESSING (NLP) - Understanding the complexities
associated with language, whether syntax, semantics, tonal nuances, expressions, or even
sarcasm, is one of the hardest tasks for humans to learn. Constant training since birth and
exposure to different social settings help humans develop appropriate responses and a
personalized form of expression for every scenario. Natural language processing through
deep learning is trying to achieve the same thing by training machines to catch linguistic
nuances and frame appropriate responses.

HEALTHCARE - Medical professionals use a CNN, or Convolutional Neural Network, a
deep learning method, to grade different types of cancer cells. The deep CNN models
demarcate various cellular features within a sample and detect carcinogenic elements.
Early, accurate, and speedy diagnosis of life-threatening diseases, augmenting clinicians
to address the shortage of quality physicians and healthcare providers, standardization of
pathology results and treatment courses, and understanding genetics to predict future risk
of diseases and negative health episodes are some of the deep learning projects picking
up speed in the healthcare domain.

IMAGE - LANGUAGE TRANSLATION - A fascinating application of deep learning is
image-to-language translation. With the Google Translate app, it is now possible to
automatically translate photographic images containing text into a language of your
choice in real time. All you need to do is hold the camera over the text: your phone runs a
deep learning network to read the image, OCR it (i.e. convert it to text), and then
translate it into text in the preferred language. This is an extremely useful application,
considering that language will gradually stop being a barrier, enabling universal human
communication.

PIXEL RESTORATION - The concept of zooming into videos beyond their actual
resolution was unrealistic until deep learning came into play. In 2017, Google Brain
researchers trained a deep learning network to take very low-resolution images of faces
and predict the person's face from them. This method, known as Pixel Recursive Super
Resolution, enhances the resolution of photos significantly, pinpointing prominent
features just well enough for identification.

NEWS AGGREGATION AND FAKE NEWS DETECTION - Deep Learning allows you
to customize news depending on the readers’ persona. Neural Networks help develop
classifiers that can detect fake and biased news and remove it from your feed.

ROBOTICS - Deep Learning is heavily used for building robots to perform human-like
tasks. Robots powered by Deep Learning use real-time updates to sense obstacles in their
path and pre-plan their journey instantly. Boston Dynamics robots react to people when
someone pushes them around, they can unload a dishwasher, get up when they fall, and do
other tasks as well.

SELF DRIVING CARS - Deep learning is the force that is bringing autonomous driving
to life. Millions of data samples are fed to a system to build a model, the machines are
trained to learn, and then the results are tested in a safe environment. The Uber Artificial
Intelligence Labs in Pittsburgh is not only working on making driverless cars
commonplace but also integrating several smart features, such as food delivery, into
driverless cars. The major concern for autonomous car developers is handling
unprecedented scenarios; a regular cycle of testing and implementation, typical of deep
learning algorithms, ensures safe driving with more and more exposure to millions of
scenarios. Data from cameras, sensors, and geo-mapping helps create succinct and
sophisticated models that navigate through traffic, identify paths, signage, and
pedestrian-only routes, and track real-time elements like traffic volume and road
blockages.
ADVANTAGES OF DEEP LEARNING

✓ No need to label the data
✓ Effective at producing high-quality results
✓ No need for feature engineering
✓ Scalability
✓ Cost-effectiveness
✓ Advanced analytics
✓ Supports parallel and distributed algorithms
DISADVANTAGES OF DEEP LEARNING

✓ Massive data requirement
✓ High processing power
✓ Struggles with real-life data and lacks a strong theoretical groundwork

COMPANY PROFILE
Wavtech Solution, as a leading IT solution and service provider, provides innovative
information-technology-enabled solutions and services to meet the demands arising from
social transformation, shaping new lifestyles for individuals and creating value for
society.

Focusing on software technology, Wavtech Solution provides industry solutions,
product engineering solutions, related software products and platforms, and services,
through the seamless integration of software and services, software and manufacturing,
and technology and industrial management capacity.

Wavtech Solution helps industry customers establish best practices in business
development and management. The services Wavtech Solution offers include real-time
projects, web designing, web hosting, software development, and training, in many of
which it has a leading market share. Notably, Wavtech Solution has participated in the
formulation of many national IT standards and specifications.

Wavtech Solution has world-leading product engineering capabilities, ranging from
consultation, design, R&D, and integration to the testing of embedded software, in the
fields of automotive electronics, smart devices, digital home products, and IT products.
The software provided by Wavtech Solution runs in a number of globally renowned
brands. It particularly offers services that include application development and
maintenance, ERP implementation and consulting, testing, performance engineering,
software localization and globalization, IT infrastructure, BPO, and IT education and
training. Sticking to its business philosophy and brand commitment of "Beyond
Technology", Wavtech Solution is dedicated to providing innovative information
technologies to drive the sustainable development of society, as well as to becoming a
company that is well recognized and respected by employees, shareholders, customers,
and society.

OBJECTIVE
The process of turning a user's signs and motions into text is referred to as sign
language recognition. It helps persons who are unable to communicate verbally with the
broader public. Using image processing techniques and neural networks, each motion is
mapped to the relevant text in the training data, so raw images and videos are turned into
text that can be read and comprehended. Mute persons are frequently denied access to
normal communication with other members of society. It has been found that they at
times find it difficult to connect with others through gestures, as only a few of these
gestures are recognised by the majority of people. Because people with hearing loss or
who are deaf are unable to communicate verbally, they must rely on some form of visual
communication most of the time. In the deaf and mute community, sign language is the
major mode of communication. It has syntax and vocabulary much like any other
language, but it communicates through visual means. The issue arises when people who
are deaf or mute try to communicate with others using sign language grammar, because
most people are unaware of these grammar rules. As a result, it has been observed that an
impaired person's communication is often limited to his or her family or the deaf
community. The increased public acceptance of, and funding for, international projects
emphasises the necessity of sign language research. In this age of technology, a
computer-based solution is in high demand in the deaf and mute community. Some steps
toward this goal include teaching a computer to recognise speech, facial emotions, and
human gestures. Gestures are nonverbally communicated information; at any given time,
a human can make an almost infinite number of gestures. Computer vision researchers are
particularly interested in human gestures since they are perceived through vision. The
goal of the project is to create an HCI system that can detect human motions; the
conversion of these motions into machine language necessitates a complex processing
procedure.
CHAPTER 2

LITERATURE SURVEY
INTRODUCTION

A literature review surveys books, scholarly articles, and any other sources relevant
to a particular issue, area of research, or theory, and in doing so provides a description,
summary, and critical evaluation of these works in relation to the research problem being
investigated. A literature survey, or literature review, is a study and review of relevant
literature materials. Literature reviews are designed to provide an overview of the sources
you have explored while researching a particular topic and to demonstrate to your readers
how your research fits within a larger field of study.

Various literature surveys have been conducted by analysing papers in the areas of
data mining, machine learning, neural networks, and deep learning, to gain insight into
the research progress under way in this field.

RELATED WORKS

Shagun Katoch (2022) [1] A basic human need is the capacity for communication
and self-expression. However, our viewpoints and the ways in which we interact with
people can differ significantly from those of the individuals around us, depending on
factors such as upbringing, education, and culture. It is also crucial to make sure that we
are understood in the manner in which we intend. Most people have little trouble
engaging with one another and expressing themselves through voice, gestures, body
language, reading, writing, and talking. However, individuals with speech impairments
use only sign language, which makes it more challenging for them to interact with the
majority of people. This suggests the need for software that can identify sign language
and translate it into spoken or written language and vice versa, but the availability, price,
and usability of such systems are constrained. The development of automatic sign
language recognition systems is largely the result of scholars from several nations
working on such recognizers.
Hamzah Luqman (2022) [2] The main form of communication for those who have
hearing loss is sign language, which relies heavily on non-manual motions and hand
articulations. Recognition of sign language has gained popularity recently. This study
presents a trainable deep learning network that can efficiently capture spatiotemporal
information from a limited number of sign frames for isolated sign language recognition.
Three networks, the dynamic motion network (DMN), the accumulative motion network
(AMN), and the sign recognition network (SRN), combine to form the proposed
hierarchical sign learning module. In addition, the authors provide a method for
addressing the variance in the sign samples produced by different signers by extracting
essential postures. These key postures help the DMN stream acquire the spatiotemporal
details relevant to the signs. The work also provides a method for encapsulating both
static and dynamic information about sign motions in a single frame: the key postures of
the sign are fused in the forward and backward directions to produce an accumulative
video motion frame, preserving the sign's spatial and temporal information.

Shikhar Sharma (2021) [3] Communication between a person from the impaired
community and a person who does not understand sign language can be a tedious task.
Sign language is the art of conveying messages using hand gestures, and recognition of
dynamic hand gestures in American Sign Language (ASL) is a very important challenge
that is still unresolved. To address the challenges of dynamic ASL recognition, a more
advanced successor of the Convolutional Neural Network (CNN), the 3-D CNN, is
employed, which can recognize patterns in volumetric data such as videos. The CNN is
trained to classify 100 words on the Boston ASL Lexicon Video Dataset (LVD), which
contains more than 3300 English words signed by 6 different signers. 70% of the dataset
is used for training, while the remaining 30% is used for testing the model. The proposed
work outperforms the existing state-of-the-art models in terms of precision (3.7%), recall
(4.3%), and f-measure (3.9%), and its computing time (0.19 seconds per frame) shows
that the proposal may be used in real-time applications.
Hamzah Luqman (2022) [4] Sign language depends on the visual movements of
human body parts, both for communicating and engaging with persons who have hearing
impairments and for applications involving human-machine interaction. In recent years
this discipline has drawn increasing interest, and a number of study findings covering a
range of topics, including sign acquisition, segmentation, recognition, translation, and
language structures, have been reported. This work presents a thorough, current review of
the state-of-the-art literature on automated sign language processing. With an emphasis
on acquisition tools, readily accessible databases, and recognition approaches for
fingerspelling signs, isolated sign words, and continuous phrase recognition systems, the
study offers a taxonomy and overview of the body of knowledge and research activities.
It explores several relevant difficulties and highlights current advancements such as deep
machine learning and multimodal techniques. The goal of the survey is to serve junior
researchers and business developers working on sign language gesture recognition and
related systems by identifying distinctive features, the current state of the field, and
potential future directions that could lead to further advancements.

Noha Sarhan (2022) [5] In this study, the authors suggest multi-phase fine-tuning of
deep networks for sign language recognition (SLR) rather than standard object
identification. The approach expands on the fruitful concept of transfer learning by fine-
tuning the network's weights over numerous stages: layers are trained in steps by
gradually unfreezing them for training, starting at the top of the network. This training
strategy suits SLR because training data is scarce and differs significantly from the
datasets typically utilised for pre-training. The experiments demonstrate that multi-phase
fine-tuning can achieve much higher accuracy in a smaller number of training epochs
than earlier fine-tuning methods. A key question in transfer learning is how many layers
to fine-tune so as to exploit the generality of the lower layers' features while still
allowing the network to fit the target task. The authors suggest a sequential fine-tuning
approach, starting by adjusting only the weights in the final fully connected layer and
gradually adding more layers. Utilizing GoogLeNet, one of the most widely used network
architectures, transfer learning was applied from the field of object recognition to SLR.
CHAPTER 3

SYSTEM ANALYSIS

EXISTING SYSTEM

Sign language is widely used by people who are deaf or mute as a medium for
communication. A sign language is composed of various gestures formed by different
hand shapes, movements, and orientations, as well as facial expressions. There are around
466 million people worldwide with hearing loss, and 34 million of these are children.
'Deaf' people have very little or no hearing ability and use sign language for
communication. People use different sign languages in different parts of the world;
compared to spoken languages, they are few in number. In the existing system, the lack of
datasets, along with the variation of sign language by locality, has resulted in restrained
efforts at finger gesture detection. The existing project aims to take a basic step in
bridging the communication gap between hearing people and deaf and mute people using
Indian Sign Language. Effective extension of this project to words and common
expressions would not only let deaf and mute people communicate faster and more easily
with the outer world, but also provide a boost in developing autonomous systems for
understanding and aiding them. Indian Sign Language lags behind its American
counterpart because research in this field is hampered by the lack of standard datasets.

DISADVANTAGES

• Hardware control is needed to detect the hands

• Hand segmentation becomes complex against varied backgrounds

• Segmentation accuracy is low in hand tracking


PROPOSED SYSTEM

Instead of acoustic sound patterns, sign language is a gesture-based language that
uses hand movements, hand orientation, and facial expression. This form of language is
not universal and has different patterns depending on the community. However, because
most individuals aren't familiar with sign language, deaf and mute persons find it difficult
to communicate without the aid of a translator of some sort, and they can feel shunned.
Sign language recognition has therefore become a commonly accepted communication
approach between deaf and mute people and hearing people. Recognition models are of
two types: computer-vision-based and sensor-based systems. In computer-vision-based
gesture recognition, a camera is utilized for input, and image processing of the input
motions is done before recognition. Following that, algorithms such as region-of-interest
extraction and neural network approaches are used to recognize the processed gestures.
The fundamental disadvantage of a vision-based sign language recognition system is that
the image collection process is subject to numerous environmental concerns, such as
camera placement, background conditions, and lighting sensitivity. However, it is more
convenient and cost-effective than employing a glove and tracker to collect data. For
greater accuracy, statistical modelling methods such as the Hidden Markov Model can be
combined with the camera data.

ADVANTAGES

• Segmentation accuracy is high

• Easy to detect finger postures

• Tracks fingers and recognizes signs with fewer computational steps

• No need for an additional hardware system


CHAPTER 4

SYSTEM SPECIFICATION

HARDWARE SPECIFICATION

 Processor : Intel Core processor, 2.6 GHz
 RAM : 4 GB
 Hard disk : 320 GB
 Keyboard : Standard keyboard
 Monitor : 15-inch color monitor

SOFTWARE SPECIFICATION

 Operating System : Windows OS
 Front End : PYTHON
 IDE : PYCHARM
 Back End : MYSQL
 Application : WINDOWS APPLICATION
CHAPTER 5

SOFTWARE DESCRIPTION

ABOUT FRONT END

Python is an interpreted, high-level programming language for general-purpose
programming. Created by Guido van Rossum and first released in 1991, Python has a
design philosophy that emphasizes code readability, notably using significant whitespace.
It provides constructs that enable clear programming on both small and large scales. In
July 2018, Van Rossum stepped down as the leader of the language community. Python
features a dynamic type system and automatic memory management. It supports multiple
programming paradigms, including object-oriented, imperative, functional, and
procedural, and has a large and comprehensive standard library. Python interpreters are
available for many operating systems. CPython, the reference implementation of Python,
is open source software and has a community-based development model, as do nearly all
of Python's other implementations. Python and CPython are managed by the non-profit
Python Software Foundation. Rather than having all of its functionality built into its core,
Python was designed to be highly extensible. This compact modularity has made it
particularly popular as a means of adding programmable interfaces to existing
applications. Van Rossum's vision of a small core language with a large standard library
and an easily extensible interpreter stemmed from his frustrations with ABC, which
espoused the opposite approach. While offering choice in coding methodology, the
Python philosophy rejects exuberant syntax (such as that of Perl) in favor of a simpler,
less-cluttered grammar. As Alex Martelli put it: "To describe something as 'clever' is not
considered a compliment in the Python culture." Python's philosophy rejects the Perl
"there is more than one way to do it" approach to language design in favour of "there
should be one, and preferably only one, obvious way to do it".

Python's developers strive to avoid premature optimization and reject patches to
non-critical parts of CPython that would offer marginal increases in speed at the cost of
clarity. When speed is important, a Python programmer can move time-critical functions
to extension modules written in languages such as C, or use PyPy, a just-in-time compiler.
Cython is also available, which translates a Python script into C and makes direct C-level
API calls into the Python interpreter. An important goal of Python's developers is keeping
it fun to use. This is reflected in the language's name, a tribute to the British comedy
group Monty Python, and in occasionally playful approaches to tutorials and reference
materials, such as examples that refer to spam and eggs (from a famous Monty Python
sketch) instead of the standard foo and bar.

A common neologism in the Python community is pythonic, which can have a wide
range of meanings related to program style. To say that code is pythonic is to say that it
uses Python idioms well, that it is natural or shows fluency in the language, and that it
conforms with Python's minimalist philosophy and emphasis on readability. In contrast,
code that is difficult to understand or reads like a rough transcription from another
programming language is called unpythonic. Users and admirers of Python, especially
those considered knowledgeable or experienced, are often referred to as Pythonists,
Pythonistas, and Pythoneers.

Python is an interpreted, object-oriented, high-level programming language with
dynamic semantics. Its high-level built-in data structures, combined with dynamic typing
and dynamic binding, make it very attractive for rapid application development, as well as
for use as a scripting or glue language to connect existing components. Python's simple,
easy-to-learn syntax emphasizes readability and therefore reduces the cost of program
maintenance. Python supports modules and packages, which encourages program
modularity and code reuse. The Python interpreter and the extensive standard library are
available in source or binary form without charge for all major platforms, and can be
freely distributed. Often, programmers fall in love with Python because of the increased
productivity it provides. Since there is no compilation step, the edit-test-debug cycle is
incredibly fast. Debugging Python programs is easy: a bug or bad input will never cause a
segmentation fault. Instead, when the interpreter discovers an error, it raises an exception,
and when the program doesn't catch the exception, the interpreter prints a stack trace. A
source-level debugger allows inspection of local and global variables, evaluation of
arbitrary expressions, setting breakpoints, stepping through the code a line at a time, and
so on. The debugger is written in Python itself, testifying to Python's introspective power.
On the other hand, often the quickest way to debug a program is to add a few print
statements to the source: the fast edit-test-debug cycle makes this simple approach very
effective.

Python’s initial development was spearheaded by Guido van Rossum in the late
1980s. Today, it is developed by the Python Software Foundation. Because Python is a
multiparadigm language, Python programmers can accomplish their tasks using different
styles of programming: object oriented, imperative, functional or reflective. Python can be
used in Web development, numeric programming, game development, serial port access and
more.
There are two attributes that make development time in Python faster than in other
programming languages:

1. Python is an interpreted language, which precludes the need to compile code before
executing a program because Python does the compilation in the background. Because
Python is a high-level programming language, it abstracts many sophisticated details
from the programming code. Python focuses so much on this abstraction that its code
can be understood by most novice programmers.
2. Python code tends to be shorter than comparable code in other languages. Although
Python offers fast development times, it lags slightly in execution time. Compared to
fully compiled languages like C and C++, Python programs execute more slowly. Of
course, with the processing speeds of computers these days, the speed differences are
usually only observed in benchmarking tests, not in real-world operations. In most
cases, Python is already included in Linux distributions and on Mac OS X machines.

PYCHARM

PyCharm is the most popular IDE for the Python scripting language. This chapter
gives an introduction to PyCharm and explains its features.

PyCharm offers some of the best features to its users and developers in the following
aspects:

 Code completion and inspection
 Advanced debugging
 Support for web programming and frameworks such as Django and Flask
Features:
 Coding assistance and analysis, with code
completion, syntax and error highlighting, linter integration, and quick fixes
 Project and code navigation: specialized project views, file structure views and
quick jumping between files, classes, methods and usages
 Python code refactoring: including rename, extract method, introduce variable,
introduce constant, pull up, push down and others
 Support for web frameworks: Django, web2py and Flask
 Integrated Python debugger
 Integrated unit testing, with line-by-line coverage
 Google App Engine Python development
 Version control integration: unified user interface
for Mercurial, Git, Subversion, Perforce and CVS with changelists and merge
 Scientific tools integration: integrates with IPython Notebook, has an interactive
Python console, and supports Anaconda as well as multiple scientific packages
including Matplotlib and NumPy

ALGORITHM USED
CNN ALGORITHM
CNN (Convolutional Neural Network) is a deep learning algorithm that is
primarily used for image processing and computer vision tasks. The algorithm is based
on a type of neural network architecture that has several convolutional layers, which
are responsible for extracting meaningful features from the input image.
The CNN algorithm typically involves the following steps (a minimal sketch in code
follows the list):

1. Input image: The input image is fed into the CNN.
2. Convolution: The input image is convolved with a set of learnable filters; each filter
extracts a specific feature from the image, such as edges or textures.
3. Activation function: The result of the convolution is passed through a non-linear
activation function, such as the Rectified Linear Unit (ReLU), to introduce
non-linearity into the model.
4. Pooling: The output of the activation function is downsampled using a pooling
operation, such as max pooling or average pooling, which reduces the spatial
dimensionality of the input and makes the model more efficient.
5. Fully connected layers: After several convolutional and pooling layers, the output is
flattened and fed into a series of fully connected layers, which perform the
classification or regression task by mapping the extracted features to the target output.
6. Output layer: The output layer computes the final prediction, which could be a
classification probability distribution or a continuous value.
7. Optimization: The optimizer updates the model's weights using backpropagation to
minimize the loss function.
8. Repeat: The above steps are repeated until the model converges to weights that
minimize the loss and improve prediction accuracy.
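
The steps above map onto a minimal Keras model, sketched below. The input shape (64x64 grayscale) and the 26-letter output are assumptions chosen to match this project, not fixed by the algorithm itself.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),           # step 1: input image
    tf.keras.layers.Conv2D(32, 3, activation="relu"),   # steps 2-3: convolution + ReLU
    tf.keras.layers.MaxPooling2D(),                     # step 4: pooling
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),      # step 5: fully connected
    tf.keras.layers.Dense(26, activation="softmax"),    # step 6: output layer
])
model.compile(optimizer="adam",                         # step 7: optimization
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(...) repeats forward pass + backpropagation (step 8) until convergence.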
CHAPTER 6

PROJECT DESCRIPTION

PROBLEM DEFINITION

The problem of sign language recognition involves developing a system that can
accurately interpret and understand sign language gestures and translate them into written or
spoken language. This can be a challenging task because sign language is a complex and
expressive visual language that involves hand gestures, body language, and facial expressions.
To address this problem, a deep learning model can be developed that can accurately
recognize and interpret sign language gestures. The model should be trained on a large
dataset of sign language gestures, using techniques such as convolutional neural networks
(CNNs) and recurrent neural networks (RNNs). The deep learning model should be able to
recognize the various nuances and subtleties of sign language gestures, such as the speed
and direction of hand movements, facial expressions, and body language. The model
should also be able to handle variations in sign language dialects and regional differences.
Additionally, the model should be able to handle real-time sign language recognition,
which requires high-speed processing and low latency. This can be achieved by
optimizing the model architecture and using hardware accelerators such as graphics
processing units (GPUs). Overall, the development of a deep learning model for sign
language recognition has the potential to provide a valuable tool for individuals who
use sign language as their primary mode of communication. The model can help bridge
the communication gap between hearing and deaf individuals and promote inclusivity and
accessibility for all.
PROJECT OVERVIEW

Sign language is a natural way of communication for deaf people. However,
communication between deaf and hearing people is often challenging due to the language
barrier. To bridge this gap, sign language recognition systems can be developed that
recognize and interpret sign language gestures. In this project, we will develop a sign
language recognition system using Convolutional Neural Networks (CNNs). CNNs are a
type of deep learning model that can learn to recognize patterns and features in images,
making them an excellent choice for image-based tasks like sign language recognition.

To train our CNN model, we will use a sign language dataset that contains images of
the different hand gestures used in sign language. The dataset is split into training and
testing sets: the training set is used to train the CNN model, and the testing set is used to
evaluate its performance. Before training the CNN model, we will preprocess the images
in the dataset by resizing them to a uniform size, converting them to grayscale, and
normalizing their pixel values. This preprocessing step helps the CNN model learn the
features of the hand gestures better.

We will use a CNN architecture that consists of multiple convolutional layers, max-
pooling layers, and fully connected layers. The convolutional layers are responsible for
learning the features of the hand gestures, and the fully connected layers are responsible
for classifying the gestures. To train the CNN model, we will use backpropagation to
adjust the weights of the model to minimize the training loss, with a categorical cross-
entropy loss function and the Adam optimization algorithm. To evaluate the performance
of our CNN model, we will use the testing dataset and calculate metrics such as accuracy,
precision, recall, and F1 score. These metrics help us understand how well the model
performs and identify areas for improvement. By leveraging the power of deep learning
and image processing, we aim to create a model that can accurately recognize and
interpret hand gestures used in sign language, which can help bridge the communication
gap between deaf and hearing people.
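
A sketch of the preprocessing described above, resizing, grayscale conversion, and pixel normalization, is given below. The file-path handling and the sklearn-based evaluation are assumptions for illustration.

import cv2
import numpy as np

def preprocess(path, size=64):
    img = cv2.imread(path)                          # load a gesture image
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # convert to grayscale
    img = cv2.resize(img, (size, size))             # resize to a uniform size
    return img.astype("float32") / 255.0            # normalize pixel values to [0, 1]

# Hypothetical usage: build the training array from a list of image paths.
# X = np.stack([preprocess(p) for p in image_paths])[..., np.newaxis]
# After training, accuracy/precision/recall/F1 can be computed with:
# from sklearn.metrics import classification_report
# print(classification_report(y_true, y_pred))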
MODULES DESCRIPTION

HAND IMAGE ACQUISITION

The hand gesture is, in daily life, a natural communication method mostly used
among people who have difficulty speaking or hearing; however, a human-computer
interaction system based on gestures has various application scenarios. In this module, we
capture hand images from a real-time camera: the built-in camera is connected to the
system. Gesture recognition has been a hot topic for decades, and two methods are
primarily used to perform it. One is based on professional wearable electromagnetic
devices, such as special gloves; the other utilizes computer vision. The former is mainly
used in the film industry; it performs well but is costly and unusable in some
environments. The latter involves image processing; however, the performance of gesture
recognition based directly on features extracted by image processing is relatively limited.
The hand image is captured from a web camera, whose purpose is to capture the human-
generated hand gesture and store its image in memory; a Python framework package is
used for storing the image in memory (a capture sketch follows).
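
A minimal capture loop with OpenCV, assuming the default built-in camera at index 0; frames are held in memory as NumPy arrays.

import cv2

cap = cv2.VideoCapture(0)          # 0 = default system camera
frames = []
while True:
    ok, frame = cap.read()         # grab one frame from the camera
    if not ok:
        break
    frames.append(frame)           # store the captured image in memory
    cv2.imshow("hand", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # press 'q' to stop capturing
        break
cap.release()
cv2.destroyAllWindows()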

BINARIZATION
Background subtraction is one of the major tasks in the fields of computer vision
and image processing; its aim is to detect changes in image sequences. Background
subtraction is any technique that allows an image's foreground to be extracted for further
processing (object recognition, etc.). Many applications do not need to know everything
about the evolution of movement in a video sequence, but only require information about
changes in the scene, because an image's regions of interest are the objects (humans, cars,
text, etc.) in its foreground. After the stage of image preprocessing (which may include
image denoising and post-processing such as morphology), object localization is
required, and it may make use of this technique. Detecting the foreground separates the
changes taking place in the foreground from the background. This is a set of techniques
that typically analyze video sequences in real time, recorded with a stationary camera. All
detection techniques are based on modelling the background of the image, i.e. setting the
background and detecting which changes occur. Defining the background can be very
difficult when it contains shapes, shadows, and moving objects; in defining the
background it is assumed that stationary objects may vary in color and intensity over
time. The scenarios where these techniques apply tend to be very diverse: there can be
highly variable sequences, such as images with very different lighting, interiors,
exteriors, quality, and noise. In addition to processing in real time, systems need to be
able to adapt to these changes. We implement techniques that extract the foreground from
the background image using a binarization approach, assigning values to background and
foreground so that foreground pixels are identified in real-time environments.
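
One way to realize this, sketched here as an assumption rather than the project's exact pipeline, is to combine OpenCV's MOG2 background model with Otsu thresholding: the background model flags changed pixels, and thresholding binarizes the frame.

import cv2

subtractor = cv2.createBackgroundSubtractorMOG2()   # models the static background

def binarize(frame):
    mask = subtractor.apply(frame)                  # foreground mask from scene changes
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
    return cv2.bitwise_and(binary, mask)            # keep foreground pixels only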

REGION OF FINGER DETECTION

Segmentation refers to the process of partitioning a digital image into multiple
segments; in other words, the grouping of pixels into different groups. More precisely,
image segmentation is the process of assigning a label to every pixel in an image such
that pixels with the same label share certain visual characteristics. The division of an
image into meaningful structures is often an essential step in image analysis, object
representation, visualization, and many other image processing tasks. But segmenting an
image into differently textured regions (groups) is a difficult problem: one does not know
a priori what types of textures exist in an image, how many there are, and which regions
have which textures. The task can be performed by unsupervised or supervised
segmentation techniques. A region of interest (ROI) is a subset of an image or a dataset
identified for a particular purpose; in other words, an ROI is a portion of an image that is
to be filtered or on which some other operation is to be performed (a simple sketch
follows).
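
A simple ROI extraction sketch: find the largest contour in the binarized image and crop its bounding box. Treating the largest contour as the hand is an illustrative assumption.

import cv2

def hand_roi(binary, frame):
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None                                  # no foreground found
    hand = max(contours, key=cv2.contourArea)        # assume largest contour = hand
    x, y, w, h = cv2.boundingRect(hand)
    return frame[y:y + h, x:x + w]                   # cropped region of interest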

CLASSIFICATION OF FINGER GESTURES

Artificial neural networks (ANNs) can learn, and therefore can be trained to
recognize patterns, find solutions, forecast future events, and classify data; CNNs are also
well documented for use in traffic-related tasks. A neural network's learning and behavior
depend on the way its individual computing elements are connected and on the strengths
of those connections, or weights. These weights can be adjusted automatically by training
the network according to a specified learning rule until it performs the desired task
correctly. The CNN is trained by supervised learning, i.e. a machine learning approach
that uses a known dataset, also called a training dataset, to make predictions. Input data
along with their response values are the fundamental components of a training dataset; in
order to have higher predictive power and the ability to generalize to new datasets, the
best approach is to use larger training datasets. The fingers are classified using the
convolutional neural network algorithm, trained with back-propagation, a common
method of training artificial neural networks so as to minimize the objective function. It
is a supervised learning method and a generalization of the delta rule; it requires a dataset
of the desired outputs for many inputs, making up the training set, and is most useful for
feed-forward networks (networks that have no feedback, or simply, no connections that
loop).

SIGN RECOGNITION

Sign language is a well-structured code of gestures; every gesture has a meaning
assigned to it. Sign language is the primary means of communication for deaf people.
With the advancement of science and technology, many techniques have been developed
not only to minimize the problems of deaf people but also to apply them in different
fields. From the classification of sign features, the signs are labelled with an improved
accuracy rate, and the corresponding alphabet letters are displayed (a small inference
sketch follows).
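
A small inference sketch mapping the CNN's predicted class index to a letter; the trained model object, the 26-class output, and the input preprocessing are assumptions for illustration.

import string
import numpy as np

LETTERS = list(string.ascii_uppercase)       # class 0 -> 'A', 1 -> 'B', ...

def recognise(model, roi_image):
    x = roi_image[np.newaxis, ..., np.newaxis]   # add batch and channel axes
    probs = model.predict(x)[0]                  # class probability scores
    return LETTERS[int(np.argmax(probs))]        # the alphabet letter to display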

FLOW DIAGRAM

LEVEL 0

Using the system camera, the user's hand is detected and the gesture is recognised.
This image is compared with the trained data, and the gesture is recognised based on that
trained data.

Figure 6.1 Level 0 DFD

LEVEL-1

The symbol shown by the user is converted to binary values. These binary values are
compared with the alphabets, and the corresponding alphabet is displayed.

Figure 6.2 Level 1 DFD


LEVEL-2

Gestures are detected using finger regions with the help of deep learning, and each
sign is labelled accordingly.

Figure 6.3 Level 2 DFD


SYSTEM ARCHITECTURE

A system architecture or systems architecture is the conceptual model that defines the
structure, behavior, and other views of a system. An architecture description is a formal
description and representation of a system, organized in a way that supports reasoning about
the structures and behaviors of the system. System architecture can comprise system
components, the externally visible properties of those components, the relationships (e.g. the
behavior) between them. It can provide a plan from which products can be procured, and
systems developed, that will work together to implement the overall system. There have been
efforts to formalize languages to describe system architecture; collectively these are called
architecture description languages (ADLs).

Figure 6.4 System Architecture


CHAPTER 7

SYSTEM IMPLEMENTATION

7.1 IMPLEMENTATION

The sign language recognition system detects a variety of gestures by recording
video and converting it into independent sign language labels. Hand pixels are classified
and matched to an acquired image, which is compared with a trained model, so the
system is very strong at finding specific character labels. The proposed system supports
collaborative communication, allowing users to communicate properly despite language
or speech barriers, and it also includes an embedded voice module with a user-friendly
interface. The system can be used by both verbal speakers and sign language users for
communication, which is its biggest advantage. The proposed system works in Python
with the CNN algorithm, which operates on images. CNN (Convolutional Neural
Network) is a deep learning algorithm primarily used for image processing and computer
vision tasks; it is based on a neural network architecture with several convolutional
layers, which are responsible for extracting meaningful features from the input image.
CHAPTER 8

CONCLUSION AND FUTURE ENHANCEMENT

8.1 CONCLUSION

The ability to look, listen, talk, and respond appropriately to events is one of the
most valuable gifts a human being can have; however, some unfortunate people are
denied this opportunity. People get to know one another by sharing their ideas, thoughts,
and experiences with those around them. There are several ways to accomplish this, the
best of which is the gift of "speech": through speech everyone can very persuasively
convey their thoughts and comprehend one another. Our initiative intends to close the gap
by including a low-cost computer in the communication chain, allowing sign language to
be captured, recognised, and translated into speech for the benefit of hearing- and speech-
impaired individuals. An image processing technique is employed in this project to
recognise the hand-made movements. This application presents a modern integrated
system planned for hearing-impaired people. The camera-based zone of interest aids the
user's data collection, and each action is significant in its own right.

FUTURE ENHANCEMENT

Despite having only average accuracy, our system is still well matched with existing
systems, given that it can perform recognition at the given accuracy with larger
vocabularies and without aids such as gloves or hand markings. In future, we can extend
the framework to implement various deep learning algorithms to recognize the signs and
deploy them in real-time applications. The streaming speed can also be increased to
capture sign input and display it as sentences.
CHAPTER 9

APPENDICES

SOURCE CODE

import numpy as np
import cv2 as cv

def calc_landmark_list(image, landmarks):
    # Convert MediaPipe's normalized landmarks into pixel coordinates,
    # clamped to the frame boundaries.
    image_width, image_height = image.shape[1], image.shape[0]
    landmark_point = []
    for _, landmark in enumerate(landmarks.landmark):
        landmark_x = min(int(landmark.x * image_width), image_width - 1)
        landmark_y = min(int(landmark.y * image_height), image_height - 1)
        landmark_point.append([landmark_x, landmark_y])
    return landmark_point

import cv2 as cv
from PIL import Image, ImageDraw, ImageFont

# Hand-skeleton edges between MediaPipe landmark indices:
# thumb, the four fingers, and the palm outline.
HAND_CONNECTIONS = [
    (2, 3), (3, 4),                              # thumb
    (5, 6), (6, 7), (7, 8),                      # index finger
    (9, 10), (10, 11), (11, 12),                 # middle finger
    (13, 14), (14, 15), (15, 16),                # ring finger
    (17, 18), (18, 19), (19, 20),                # little finger
    (0, 1), (1, 2), (2, 5), (5, 9),              # palm
    (9, 13), (13, 17), (17, 0),
]

# Fingertip landmarks are drawn with a larger radius.
FINGERTIPS = {4, 8, 12, 16, 20}

def draw_landmarks(image, landmark_point):
    if len(landmark_point) > 0:
        # Each bone is drawn twice: a thick black line with a thin
        # white line on top, giving an outlined skeleton.
        for start, end in HAND_CONNECTIONS:
            cv.line(image, tuple(landmark_point[start]),
                    tuple(landmark_point[end]), (0, 0, 0), 6)
            cv.line(image, tuple(landmark_point[start]),
                    tuple(landmark_point[end]), (255, 255, 255), 2)
    # Key points: filled white circles with a black border.
    for index, landmark in enumerate(landmark_point):
        radius = 8 if index in FINGERTIPS else 5
        cv.circle(image, (landmark[0], landmark[1]), radius,
                  (255, 255, 255), -1)
        cv.circle(image, (landmark[0], landmark[1]), radius, (0, 0, 0), 1)
    return image

def draw_info_text(image, handedness, hand_sign_text):
    # Start from the handedness label ("Left"/"Right") and replace it
    # with the predicted sign when one is available.
    info_text = handedness.classification[0].label[0:]
    if hand_sign_text != "":
        info_text = "Predicted Text" + ':' + hand_sign_text
    cv.putText(image, info_text, (10, 60), cv.FONT_HERSHEY_SIMPLEX,
               1.0, (196, 255, 255), 2, cv.LINE_AA)
    return image

import copy
import itertools

def pre_process_landmark(landmark_list):
    temp_landmark_list = copy.deepcopy(landmark_list)
    # Make every coordinate relative to the wrist (landmark 0).
    base_x, base_y = 0, 0
    for index, landmark_point in enumerate(temp_landmark_list):
        if index == 0:
            base_x, base_y = landmark_point[0], landmark_point[1]
        temp_landmark_list[index][0] = temp_landmark_list[index][0] - base_x
        temp_landmark_list[index][1] = temp_landmark_list[index][1] - base_y
    # Flatten [[x0, y0], [x1, y1], ...] into [x0, y0, x1, y1, ...].
    temp_landmark_list = list(
        itertools.chain.from_iterable(temp_landmark_list))
    # Scale into [-1, 1] using the largest absolute coordinate.
    max_value = max(list(map(abs, temp_landmark_list)))

    def normalize_(n):
        return n / max_value

    temp_landmark_list = list(map(normalize_, temp_landmark_list))
    return temp_landmark_list

import csv

def logging_csv(number, mode, landmark_list):
    # In data-collection mode, append the class label followed by the
    # normalized landmark vector to the training CSV.
    if mode == 1 and (0 <= number <= 9):
        csv_path = 'model/keypoint_classifier/keypoint.csv'
        with open(csv_path, 'a', newline="") as f:
            writer = csv.writer(f)
            writer.writerow([number, *landmark_list])
    return

import numpy as np
import tensorflow as tf

class KeyPointClassifier(object):
    def __init__(
        self,
        model_path='model/keypoint_classifier/keypoint_classifier.tflite',
        num_threads=1,
    ):
        # Load the TFLite model once and keep the interpreter ready.
        self.interpreter = tf.lite.Interpreter(model_path=model_path,
                                               num_threads=num_threads)
        self.interpreter.allocate_tensors()
        self.input_details = self.interpreter.get_input_details()
        self.output_details = self.interpreter.get_output_details()

    def __call__(
        self,
        landmark_list,
    ):
        # Run one inference and return the index of the most likely class.
        input_details_tensor_index = self.input_details[0]['index']
        self.interpreter.set_tensor(
            input_details_tensor_index,
            np.array([landmark_list], dtype=np.float32))
        self.interpreter.invoke()
        output_details_tensor_index = self.output_details[0]['index']
        result = self.interpreter.get_tensor(output_details_tensor_index)
        result_index = np.argmax(np.squeeze(result))
        return result_index

import mediapipe as mp
import cv2
import numpy as np

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)
with mp_hands.Hands(min_detection_confidence=0.8,
                    min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # BGR to RGB, then flip horizontally for a mirror view
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image = cv2.flip(image, 1)
        # Mark the frame read-only while MediaPipe processes it
        image.flags.writeable = False
        results = hands.process(image)
        image.flags.writeable = True
        # RGB back to BGR for OpenCV rendering
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        # print(results)
        # Rendering results
        if results.multi_hand_landmarks:
            for num, hand in enumerate(results.multi_hand_landmarks):
                mp_drawing.draw_landmarks(
                    image, hand, mp_hands.HAND_CONNECTIONS,
                    mp_drawing.DrawingSpec(color=(14, 22, 76), thickness=2,
                                           circle_radius=4),
                    mp_drawing.DrawingSpec(color=(24, 44, 250), thickness=2,
                                           circle_radius=2))
        cv2.imshow('Hand Tracking', image)
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()

import csv
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

RANDOM_SEED = 42

dataset = 'model/keypoint_classifier/keypoint.csv'
model_save_path = 'keypoint_classifier_new.h5'
NUM_CLASSES = 26

# Column 0 is the label; columns 1..42 hold the 21 (x, y) landmark pairs.
X_dataset = np.loadtxt(dataset, delimiter=',', dtype='float32',
                       usecols=list(range(1, (21 * 2) + 1)))
y_dataset = np.loadtxt(dataset, delimiter=',', dtype='int32', usecols=(0))

print(len(X_dataset))
print(len(y_dataset))
print(y_dataset)
print(X_dataset.shape)

# 80/20 train/test split
train_ratio = 0.80
test_ratio = 0.20
X_train, X_test, y_train, y_test = train_test_split(
    X_dataset, y_dataset, test_size=1 - train_ratio,
    random_state=RANDOM_SEED)

# Small fully connected classifier over the 42-value landmark vector
model = tf.keras.models.Sequential([
    tf.keras.layers.Input((21 * 2, )),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(20, activation='relu'),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Checkpoint every epoch; stop early when validation stops improving
cp_callback = tf.keras.callbacks.ModelCheckpoint(model_save_path, verbose=1,
                                                 save_weights_only=False)
es_callback = tf.keras.callbacks.EarlyStopping(patience=20, verbose=1)

model.summary()

hist = model.fit(X_train, y_train, epochs=500, batch_size=128,
                 validation_data=(X_test, y_test),
                 callbacks=[cp_callback, es_callback])

# val_loss, val_acc = model.evaluate(X_test, y_test, batch_size=128)
scores = model.evaluate(X_test, y_test, verbose=0)
# print("CNN Error: %.2f%%" % (100 - scores[1] * 100))

model.save(model_save_path, include_optimizer=False)

# Summarize history for accuracy
plt.plot(hist.history['accuracy'])
plt.plot(hist.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# Summarize history for loss
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

from keras.models import load_model
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np
import time
import matplotlib.pyplot as plt

def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    import itertools
    # Overall accuracy is the trace (correct predictions) over the total.
    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy
    if cmap is None:
        cmap = plt.get_cmap('Blues')
    plt.figure(figsize=(20, 20))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
        else:
            plt.text(j, i, "{:,}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(
        accuracy, misclass))
    plt.savefig('model/keypoint_classifier/confusion_matrix.png')

model = load_model('model/keypoint_classifier/keypoint_classifier_new.h5')

# Time the forward pass over the whole test set
pred_labels = []
start_time = time.time()
pred_probabs = model.predict(X_test)
end_time = time.time()
pred_time = end_time - start_time
avg_pred_time = pred_time / X_test.shape[0]
print('Average prediction time: %fs' % (avg_pred_time))

# Take the arg-max class for each probability vector
for pred_probab in pred_probabs:
    pred_labels.append(list(pred_probab).index(max(pred_probab)))

cm = confusion_matrix(y_test, np.array(pred_labels))
report = classification_report(y_test, np.array(pred_labels))
print('\n\nClassification Report')
print('---------------------------')
print(report)

plot_confusion_matrix(cm, range(26), normalize=False)

import csv
import copy
import cv2 as cv
import mediapipe as mp

from model import KeyPointClassifier
from app_files import (calc_landmark_list, draw_info_text, draw_landmarks,
                       get_args, pre_process_landmark)
from PIL import Image, ImageDraw, ImageFont
import numpy as np

def main():
    # Read the camera and detector settings from the command line
    args = get_args()
    cap_device = args.device
    cap_width = args.width
    cap_height = args.height
    use_static_image_mode = args.use_static_image_mode
    min_detection_confidence = args.min_detection_confidence
    min_tracking_confidence = args.min_tracking_confidence

    cap = cv.VideoCapture(cap_device)
    cap.set(cv.CAP_PROP_FRAME_WIDTH, cap_width)
    cap.set(cv.CAP_PROP_FRAME_HEIGHT, cap_height)

    mp_hands = mp.solutions.hands
    hands = mp_hands.Hands(
        static_image_mode=use_static_image_mode,
        max_num_hands=1,
        min_detection_confidence=min_detection_confidence,
        min_tracking_confidence=min_tracking_confidence,
    )

    keypoint_classifier = KeyPointClassifier()

    # One label per class index, e.g. the letters A-Z
    with open('model/keypoint_classifier/keypoint_classifier_label.csv',
              encoding='utf-8-sig') as f:
        keypoint_classifier_labels = csv.reader(f)
        keypoint_classifier_labels = [
            row[0] for row in keypoint_classifier_labels
        ]

    while True:
        key = cv.waitKey(10)
        if key == 27:  # ESC
            break
        ret, image = cap.read()
        if not ret:
            break
        image = cv.flip(image, 1)
        debug_image = copy.deepcopy(image)
        # print(debug_image.shape)
        # cv.imshow("debug_image", debug_image)
        image = cv.cvtColor(image, cv.COLOR_BGR2RGB)
        image.flags.writeable = False
        results = hands.process(image)
        image.flags.writeable = True
        if results.multi_hand_landmarks is not None:
            for hand_landmarks, handedness in zip(results.multi_hand_landmarks,
                                                  results.multi_handedness):
                # Pixel landmarks -> normalized vector -> predicted sign
                landmark_list = calc_landmark_list(debug_image, hand_landmarks)
                # print(hand_landmarks)
                pre_processed_landmark_list = pre_process_landmark(landmark_list)
                hand_sign_id = keypoint_classifier(pre_processed_landmark_list)
                debug_image = draw_landmarks(debug_image, landmark_list)
                debug_image = draw_info_text(
                    debug_image,
                    handedness,
                    keypoint_classifier_labels[hand_sign_id])
        cv.imshow('Hand Gesture Recognition', debug_image)

    cap.release()
    cv.destroyAllWindows()

if __name__ == '__main__':
    main()


SCREENSHOTS

Figure 9.1 Coding

Figure 9.2 Hand Image Acquisition

Figure 9.3 Binarization

Figure 9.4 Sign Recognition

Figure 9.5 Word Recognition
CHAPTER 10

REFERENCES

[1] Arpita Halder, Real-time Vernacular Sign Language Recognition using MediaPipe and
Machine Learning, 2021

[2] Hamzah Luqman, An Efficient Two-Stream Network for Isolated Sign Language
Recognition Using Accumulative Video Motion, 2022

[3] Hamzah Luqman, A comprehensive survey and taxonomy of sign language research, 2022

[4] Ilias Papastratis, Continuous Sign Language Recognition through a Context-Aware
Generative Adversarial Network, 2021

[5] Kil-Houm Park, An integrated mediapipe-optimized GRU model for Indian sign language
recognition, 2022

[6] Noha Sarhan, Multi-phase Fine-Tuning: A New Fine-Tuning Approach for Sign
Language Recognition, 2022

[7] Prabu P, ML Based Sign Language Recognition System, 2021

[8] Rahaf Abdulaziz Alawwad, Arabic Sign Language Recognition using Faster R-CNN, 2021

[9] Shagun Katoch, Indian Sign Language recognition system using SURF with SVM and
CNN, 2022

[10] Shikhar Sharma, ASL-3DCNN: American Sign Language recognition technique using
3-D convolutional neural networks, 2021
