ANN Assignment


ASSIGNMENT NO.

1
SUBJECT
FUZZY AND NEURAL NETWORK
BRANCH
COMMUNICATION SYSTEMS(ECE)

SUBMITTED TO:
Er. SHILPA JASWAL
ASSISTANT PROFESSOR

SUBMITTED BY:
ARUN KUMAR
M.TECH., 3rd SEM.

Roll.No: H10449
SCHOOL OF ENGINEERING AND TECHNOLOGY
CAREER POINT UNIVERSITY HAMIRPUR

Q.1:-How Neural Networks can be used in Pattern Recognition?


Ans:- In machine learning and cognitive science, artificial neural networks (ANNs)
are a family of statistical learning models inspired by biological neural
networks (the central nervous systems of animals, in particular the brain) and are
used to estimate or approximate functions that can depend on a large number
of inputs and are generally unknown. Artificial neural networks are generally
presented as systems of interconnected "neurons" which exchange messages
between each other. The connections have numeric weights that can be tuned
based on experience, making neural nets adaptive to inputs and capable of
learning. For example, a neural network for handwriting recognition is defined by
a set of input neurons which may be activated by the pixels of an input image. After
being weighted and transformed by a function (determined by the network's
designer), the activations of these neurons are then passed on to other neurons.
This process is repeated until finally, an output neuron is activated. This determines
which character was read. Like other machine learning methods - systems that learn
from data - neural networks have been used to solve a wide variety of tasks that are
hard to solve using ordinary rule-based programming, including computer
vision and speech recognition.

A neural network is a nonlinear mapping system whose structure is loosely based on principles of
the real brain. The whole network is built up from simple processing units, each of which is a
simplified model of a real neuron. A unit receives an input vector x, whose components are
multiplied by the corresponding components of the weight vector w. The node with a weight and a
constant input of 1 is the so-called bias of the neuron, which gives the neuron the freedom to
shift the function f(). The function f() is usually nonlinear and is called the activation
function. The input to f() is the weighted sum over all the nodes and is denoted as u. There
are many different types of neural networks, such as the perceptron, the back-propagation network,
the counter-propagation network, the Hopfield network, etc. Pattern recognition is a branch of machine
learning that focuses on the recognition of patterns and regularities in data, although it is in some
cases considered to be nearly synonymous with machine learning. Pattern recognition systems
are in many cases trained from labeled "training" data (supervised learning), but when no labeled
data are available other algorithms can be used to discover previously unknown patterns
(unsupervised learning).
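The single processing unit described above (inputs x, weights w, bias, summed input u, activation f) can be sketched in a few lines of Python. This is only an illustration: the function names, the step activation and the AND-gate weights are assumptions chosen for the example, not taken from the assignment.

```python
import math

def neuron(x, w, bias, f=math.tanh):
    """Return f(u), where u = w . x + bias is the weighted input sum."""
    u = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return f(u)

def step(u):
    """Hard-threshold activation: 1 if u >= 0, otherwise 0."""
    return 1 if u >= 0 else 0

# Hypothetical weights and bias that make the unit act like a logical AND gate.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_outputs = [neuron(x, w=[1.0, 1.0], bias=-1.5, f=step) for x in inputs]
```

Changing the bias shifts the point at which the unit switches on, which is the "freedom to shift f()" mentioned in the text.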
Recurrent neural network (RNN) is a class of artificial neural network where connections
between units form a directed cycle. This creates an internal state of the network which allows it
to exhibit dynamic temporal behavior. Unlike feed-forward neural networks, RNNs can use their
internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks
such as un-segmented connected handwriting recognition, where they have achieved the best
known results.
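The internal state of a recurrent network can be illustrated with a one-unit sketch in which the new state depends on both the current input and the previous state. The weights w_in and w_rec and the tanh activation are assumptions made for this example only.

```python
import math

def rnn_run(inputs, w_in=0.5, w_rec=0.8, h0=0.0):
    """Run a single recurrent unit over a sequence and return every state."""
    h = h0
    states = []
    for x in inputs:
        # The previous state h feeds back into the unit: this is the cycle.
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

pulse = rnn_run([1.0, 0.0, 0.0])   # one input pulse followed by silence
silent = rnn_run([0.0, 0.0, 0.0])  # no input at all
```

The state after the pulse decays gradually instead of dropping to zero at once, so the unit "remembers" an input it saw earlier in the sequence.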
Support vector machines (SVMs, also support vector networks) are supervised learning models
with associated learning algorithms that analyze data and recognize patterns, used
for classification and regression analysis. Given a set of training examples, each marked as
belonging to one of two categories, an SVM training algorithm builds a model that assigns new
examples into one category or the other, making it a non-probabilistic binary linear classifier. An
SVM model is a representation of the examples as points in space, mapped so that the examples
of the separate categories are divided by a clear gap that is as wide as possible. New examples
are then mapped into that same space and predicted to belong to a category based on which side
of the gap they fall on.
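As a rough sketch of the prediction side of a linear SVM, the snippet below classifies a point by the side of the separating hyperplane it falls on and computes the width of the gap as 2 / ||w||. The weight vector w and offset b are hypothetical values assumed to come from an already trained model; no training is shown.

```python
import math

def svm_predict(x, w, b):
    """Return +1 or -1 depending on the side of the hyperplane w . x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def margin_width(w):
    """Width of the gap between the two classes for a linear SVM: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical trained parameters (assumed, not learned here).
w, b = [1.0, 1.0], -3.0
labels = [svm_predict(p, w, b) for p in [(1.0, 1.0), (2.0, 2.0)]]
```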
Back-propagation Network (BPN)
What makes neural networks interesting is that their weights w are determined by learning.
Back-propagation is one of many learning algorithms that can be applied
to neural network training and has been used in this thesis. It belongs to the category of so-called
learning with a teacher. For every input vector x that is presented to the neural network there is
a predefined desired response of the network in a vector t (the teacher). The desired output of the
neural network is then compared with the real output by computing an error e between the vector t and
the neural network output vector y. The weights in the neural network are corrected by
propagating the error e backward from the output layer towards the input layer, hence the
name of the algorithm. The weight change in each layer is done with the steepest descent algorithm.
The back-propagation algorithm is carried out in the following steps:
1. Select a training pair from the training set; apply the input vector to the network input.
2. Calculate the output of the network.
3. Calculate the error between the network output and the desired output (the target vector from
the training pair)
4. Adjust the weights of the network in a way that minimizes the error.
5. Repeat the steps 1 through 4 for each vector in the training set until the error for the entire set
is acceptably low.
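The five steps above can be sketched as a small pure-Python training loop. This is a minimal illustration under stated assumptions, not the implementation used by the author: the network size (2 inputs, 3 hidden units, 1 output), the sigmoid activation, the learning rate, the fixed number of epochs (used instead of an error threshold) and the XOR training set are all choices made for the example.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(x, W1, b1, W2, b2):
    """Step 2: compute hidden activations h and the network output y."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def total_error(data, W1, b1, W2, b2):
    """Step 3 summed over the set: 0.5 * (t - y)^2 for every training pair."""
    return sum(0.5 * (t - forward(x, W1, b1, W2, b2)[1]) ** 2 for x, t in data)

def train(data, n_hidden=3, lr=0.5, epochs=3000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)
    before = total_error(data, W1, b1, W2, b2)
    for _ in range(epochs):                      # step 5: repeat over the set
        for x, t in data:                        # step 1: pick a training pair
            h, y = forward(x, W1, b1, W2, b2)
            # Error signal at the output, then propagated back to the hidden layer.
            delta_out = (y - t) * y * (1 - y)
            delta_hid = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # Step 4: steepest-descent weight update, layer by layer.
            for j in range(n_hidden):
                W2[j] -= lr * delta_out * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * delta_hid[j] * x[i]
                b1[j] -= lr * delta_hid[j]
            b2 -= lr * delta_out
    return (W1, b1, W2, b2), before, total_error(data, W1, b1, W2, b2)

XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
weights, error_before, error_after = train(XOR_DATA)
```

The hidden-layer error signals are computed before the output weights are changed, so each update uses the weights that actually produced the output.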
Neural network model
Neural networks can be viewed as parallel computing systems consisting of an extremely large
number of simple processors with many interconnections between them. Typically, a neural
network or, to be more specific, an artificial neural network (ANN) is a self-adaptive trainable
process that is able to learn and resolve complex problems based on available knowledge. An
ANN-based system is loosely modeled on how the biological brain works; it is
composed of interconnected processing elements that simulate neurons. Using these
interconnections, each neuron can pass information to another. Artificial neural network models
attempt to use organizational principles such as learning, generalization, adaptivity, fault
tolerance, distributed representation and computation in a network of weighted directed
graphs, in which the artificial neurons form the nodes of the model and the directed edges (with
weights) are connections between neuron outputs and neuron inputs. The weights applied to the
connections result from the learning process and indicate the importance of the contribution of
the preceding neuron to the information being passed to the following neuron. The main
characteristics of all neural networks are that they possess the ability to learn complex
nonlinear input-output relationships, use sequential training procedures, and adapt themselves to
the data. The following diagram is a two-layer neural network with an input layer consisting
of three neurons and an output layer with two neurons; corresponding weights are assigned
to the connections between them.
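The network just described (three input neurons, two output neurons, one weight per connection) can be written as a 2 x 3 weight matrix. The weight values and the sigmoid activation below are assumptions chosen for illustration only.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def layer_forward(x, W, b):
    """Each output neuron sums its weighted inputs plus bias, then applies sigmoid."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

# Hypothetical weights: one row per output neuron, one column per input neuron.
W = [[0.2, -0.4, 0.1],
     [0.5, 0.3, -0.6]]
b = [0.0, 0.0]

y = layer_forward([1.0, 0.5, -1.0], W, b)
```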

Learning paradigms: The basics of Pattern recognition with ANN approach


There are three major learning paradigms, each corresponding to a particular abstract learning
task. These are supervised learning, unsupervised learning and reinforcement learning.
Supervised learning
In supervised learning, we are given a set of example pairs (x, y), x belongs to X, y belongs to Y
and the aim is to find a function f: X-->Y in the allowed class of functions that matches the
examples. In other words, we wish to infer the mapping implied by the data; the cost function is
related to the mismatch between our mapping and the data and it implicitly contains prior
knowledge about the problem domain. A commonly used cost is the mean-squared error, which
tries to minimize the average squared error between the network's output, f(x), and the target
value y over all the example pairs. When one tries to minimize this cost using gradient
descent for the class of neural networks called multilayer perceptrons, one obtains the common
and well-known back propagation algorithm for training neural networks. Tasks that fall within
the paradigm of supervised learning are pattern recognition (also known as classification)
and regression (also known as function approximation). The supervised learning paradigm is also
applicable to sequential data (e.g., for speech and gesture recognition). This can be thought of as
learning with a "teacher", in the form of a function that provides continuous feedback on the
quality of solutions obtained thus far. Supervised learning is the process of inferring a function
from labeled training data. Here the training data consist of a
set of training examples, each of which is a pair consisting of an input object and a
desired output value. A supervised learning algorithm learns from these training pairs
and produces an inferred function. In simple terms, in supervised learning there is a teacher who
provides a category label or cost for each pattern in the training set, and these labels are used to train a classifier.
So basically a supervised learning method is used for classification purposes. In the given figure,
the input image consists of a mixture of two letters, A and B. The classification
algorithm then assigns each input to one of two categories. Here a set of combined inputs is
classified using the supervised learning approach.
Unsupervised learning
In unsupervised learning, some data x is given together with a cost function to be minimized, which can be
any function of the data x and the network's output, f. The cost function is dependent on the
task (what we are trying to model) and our a priori assumptions (the implicit properties of our
model, its parameters and the observed variables). As a trivial example, consider the model
f(x) = a, where a is a constant, and the cost C = E[(x - f(x))^2]. Minimizing this cost will
give us a value of a that is equal to the mean of the data. The cost function can be much more
complicated. Its form depends on the application: for example, in compression it could be related
to the mutual information between x and f(x), whereas in statistical modeling, it could be related
to the posterior probability of the model given the data (note that in both of those examples those
quantities would be maximized rather than minimized). Tasks that fall within the paradigm of
unsupervised learning are in general estimation problems; the applications include clustering, the
estimation of statistical distributions, compression and filtering. Unsupervised learning can be
defined as the problem of trying to find hidden structure in an unlabeled data set. Since
the examples given to the learner are unlabeled, the algorithm itself has to group them. In
simple terms, no labeled training sets are provided, and the system applies a specified
clustering or grouping to the unlabeled data based on some similarity criterion. So an
unsupervised learning method is used for clustering. Here the input consists of unlabeled
values whose distinguishing features are initially not known. The input in such a case is a
combination of values that all look alike, yet clusters are still formed using some metric,
which is different for each algorithm. Neural networks have a wide range of applications, from
biomedical engineering to artificial intelligence, pattern recognition and image processing.
Traditional techniques from statistical pattern recognition like the Bayesian discriminant and the
Parzen windows were popular until the beginning of the 1990s. Since then, neural networks
(ANNs) have increasingly been used as an alternative to classic pattern classifiers and clustering
techniques. Non-parametric feed-forward ANNs quickly turned out to be attractive trainable
machines for feature-based segmentation and object recognition. When no gold standard is
available, the self-organizing feature map (SOM) is an interesting alternative to supervised
techniques. It may learn to discriminate, e.g., different textures when provided with powerful
features. The current use of ANNs in image processing exceeds the aforementioned traditional
applications. The role of feed-forward ANNs and SOMs has been extended to encompass also
low-level image processing tasks such as noise suppression and image enhancement. Hopfield
ANNs were introduced as a tool for finding satisfactory solutions to complex (NP-complete)
optimization problems. This makes them an interesting alternative to traditional optimization
algorithms for image processing tasks that can be formulated as optimization problems. The
different problems addressed in the field of digital image processing can be organized into what
we have chosen to call the image processing chain. We make the following distinction between
steps in the image processing chain.
Reinforcement learning
In reinforcement learning, data x are usually not given, but generated by an agent's interactions
with the environment. At each point in time t, the agent performs an action y_t and the
environment generates an observation x_t and an instantaneous cost c_t, according to some (usually
unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes some
measure of a long-term cost, e.g., the expected cumulative cost. The environment's dynamics and
the long-term cost for each policy are usually unknown, but can be estimated. More formally, the
environment is modeled as a Markov decision process (MDP) with states s_1, ..., s_n belonging to S
and actions a_1, ..., a_m belonging to A, with the following probability distributions: the
instantaneous cost distribution P(c_t | s_t), the observation distribution P(x_t | s_t) and the
transition distribution P(s_{t+1} | s_t, a_t), while a policy is defined as the conditional distribution over actions
given the observations. Taken together, the two then define a Markov chain (MC). The aim is to
discover the policy (i.e., the MC) that minimizes the cost. ANNs are frequently used in
reinforcement learning as part of the overall algorithm. Dynamic programming has been coupled
with ANNs (Neuro dynamic programming) by Bertsekas and Tsitsiklis and applied to multidimensional nonlinear problems such as those involved in vehicle routing, natural resources
management or medicine because of the ability of ANNs to mitigate losses of accuracy even
when reducing the discretization grid density for numerically approximating the solution of the
original control problems. Tasks that fall within the paradigm of reinforcement learning are
control problems, games and other sequential decision making tasks.
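To make the idea of a cost-minimizing policy concrete, the sketch below runs value iteration on a tiny MDP. Everything in it is an assumption made for illustration: the two states, the two actions, the transition probabilities, the costs, the discount factor gamma, and the simplification that the agent observes the state directly (so the observation distribution plays no role). No neural network is involved here; in practice an ANN could stand in for the value table when the state space is too large to enumerate.

```python
# Hypothetical MDP: P[state][action] = [(probability, next_state), ...]
P = {
    0: {"stay": [(1.0, 0)], "move": [(0.9, 1), (0.1, 0)]},
    1: {"stay": [(1.0, 1)], "move": [(1.0, 0)]},
}
# Hypothetical instantaneous costs C[state][action].
C = {0: {"stay": 1.0, "move": 2.0}, 1: {"stay": 0.0, "move": 0.0}}

def q_value(s, a, V, gamma):
    """Immediate cost plus discounted expected long-term cost of the next state."""
    return C[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

def value_iteration(gamma=0.9, sweeps=200):
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        # Each sweep replaces V by the best (lowest) one-step lookahead cost.
        V = {s: min(q_value(s, a, V, gamma) for a in P[s]) for s in P}
    # The policy picks, in every state, the action with the lowest expected cost.
    policy = {s: min(P[s], key=lambda a: q_value(s, a, V, gamma)) for s in P}
    return V, policy

V, policy = value_iteration()
```

In this toy problem the agent accepts a higher immediate cost in state 0 in order to reach the cost-free state 1, which is the trade-off between instantaneous and long-term cost described above.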
How Artificial neural networks are useful for pattern matching applications.
Pattern matching consists of the ability to identify the class of input signals or patterns. Pattern
matching ANN are typically trained using supervised learning techniques. One application where
artificial neural nets have been applied extensively is optical character recognition (OCR). OCR
has been a very successful area of research involving artificial neural networks. An example of a
pattern matching neural network is that used by VISA for identifying suspicious transactions and
fraudulent purchases. When input symbols do not match an accepted pattern, the system raises a
warning flag that indicates a potential problem. Pattern recognition is a branch of machine
learning that focuses on the recognition of patterns and regularities in data, although it is in some
cases considered to be nearly synonymous with machine learning. Pattern recognition systems
are in many cases trained from labeled "training" data (supervised learning), but when no labeled
data are available other algorithms can be used to discover previously unknown patterns
(unsupervised learning). The terms pattern recognition, machine learning, data
mining and knowledge discovery in databases (KDD) are hard to separate, as they largely
overlap in their scope. Machine learning is the common term for supervised learning methods
and originates from artificial intelligence, whereas KDD and data mining have a larger focus on
unsupervised methods and stronger connection to business use. Pattern recognition has its origins
in engineering, and the term is popular in the context of computer vision: a leading computer
vision conference is named Conference on Computer Vision and Pattern Recognition. In pattern
recognition, there may be a higher interest to formalize, explain and visualize the pattern;
whereas machine learning traditionally focuses on maximizing the recognition rates. Yet, all of
these domains have evolved substantially from their roots in artificial intelligence, engineering
and statistics; and have become increasingly similar by integrating developments and ideas from
each other. In machine learning, pattern recognition is the assignment of a label to a given input
value. In statistics, discriminant analysis was introduced for this same purpose in 1936. An
example of pattern recognition is classification, which attempts to assign each input value to one
of a given set of classes (for example, determine whether a given email is "spam" or
"non-spam"). However, pattern recognition is a more general problem that encompasses other types of
output as well. Other examples are regression, which assigns a real-valued output to each
input; sequence labeling, which assigns a class to each member of a sequence of values (for
example, part of speech tagging, which assigns a part of speech to each word in an input
sentence); and parsing, which assigns a parse tree to an input sentence, describing the syntactic
structure of the sentence. Pattern recognition algorithms generally aim to provide a reasonable
answer for all possible inputs and to perform "most likely" matching of the inputs, taking into
account their statistical variation. This is opposed to pattern matching algorithms, which look for
exact matches in the input with pre-existing patterns. A common example of a pattern-matching
algorithm is regular expression matching, which looks for patterns of a given sort in textual data
and is included in the search capabilities of many text editors and word processors. In contrast to
pattern recognition, pattern matching is generally not considered a type of machine learning,
although pattern-matching algorithms (especially with fairly general, carefully tailored patterns)
can sometimes succeed in providing similar-quality output of the sort provided by
pattern-recognition algorithms.
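The exact-match behaviour of pattern matching can be seen with Python's standard `re` module. The date format and the sample sentence are hypothetical and serve only to show that a near miss is rejected outright rather than matched "most likely".

```python
import re

# Matches dates written exactly as four digits, two digits, two digits.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

text = "Submitted 2015-11-02, revised 2015-11-9."
matches = DATE.findall(text)
```

Only the first date is found; the second one has a single-digit day and is ignored, whereas a pattern recognition system would be expected to give a reasonable answer for such a variation.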
Difference between supervised and unsupervised learning in case of Pattern Recognition
Pattern recognition is generally categorized according to the type of learning procedure used to
generate the output value. Supervised learning assumes that a set of training data (the training
set) has been provided, consisting of a set of instances that have been properly labeled by hand
with the correct output. A learning procedure then generates a model that attempts to meet two
sometimes conflicting objectives: Perform as well as possible on the training data, and generalize
as well as possible to new data (usually, this means being as simple as possible, for some
technical definition of "simple", in accordance with Occam's Razor).
Unsupervised learning, on the other hand, assumes training data that has not been
hand-labeled, and attempts to find inherent patterns in the data that can then be used to determine the
correct output value for new data instances. A combination of the two that has recently been
explored is semi-supervised learning, which uses a combination of labeled and unlabeled data
(typically a small set of labeled data combined with a large amount of unlabeled data). Note that
in cases of unsupervised learning, there may be no training data at all to speak of; in other words,
the data to be labeled is the training data.
Note that sometimes different terms are used to describe the corresponding supervised and
unsupervised learning procedures for the same type of output. For example, the unsupervised
equivalent of classification is normally known as clustering, based on the common perception of
the task as involving no training data to speak of, and of grouping the input data
into clusters based on some inherent similarity measure (e.g. the distance between instances,
considered as vectors in a multi-dimensional vector space), rather than assigning each input
instance into one of a set of pre-defined classes. Note also that in some fields, the terminology is
different: For example, in community ecology, the term "classification" is used to refer to what is
commonly known as "clustering".
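Grouping instances by the distance between their feature vectors can be sketched with a basic k-means loop. This is one possible clustering method chosen for illustration; the sample points, the number of clusters and the simplistic initialization (the first k points become the starting centroids) are assumptions. `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, k, iters=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: two visibly separate groups of 2-D points.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (9, 10), (10, 9)]
centroids, clusters = kmeans(points, k=2)
```

No labels are supplied at any point; the two groups emerge purely from the distance measure.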

The piece of input data for which an output value is generated is formally termed an instance.
The instance is formally described by a vector of features, which together constitute a description
of all known characteristics of the instance. (These feature vectors can be seen as defining points
in an appropriate multidimensional space, and methods for manipulating vectors in vector
spaces can be correspondingly applied to them, such as computing the dot product or the angle
between two vectors.) Typically, features are either categorical (also known as nominal, i.e.,
consisting of one of a set of unordered items, such as a gender of "male" or "female", or a blood
type of "A", "B", "AB" or "O"), ordinal (consisting of one of a set of ordered items, e.g., "large",
"medium" or "small"), integer-valued (e.g., a count of the number of occurrences of a particular
word in an email) or real-valued (e.g., a measurement of blood pressure). Often, categorical and
ordinal data are grouped together; likewise for integer-valued and real-valued data. Furthermore,
many algorithms work only in terms of categorical data and require that real-valued or
integer-valued data be discretized into groups (e.g., less than 5, between 5 and 10, or greater than 10).
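The discretization mentioned above, together with one common way of turning a categorical feature into numbers, might look as follows. The bin boundaries follow the example in the text; the one-hot encoding is an additional assumption that the text does not mention, shown because neural networks need numeric inputs.

```python
def discretize(value):
    """Map a real or integer value to one of the three groups named in the text."""
    if value < 5:
        return "less than 5"
    if value <= 10:
        return "between 5 and 10"
    return "greater than 10"

def one_hot(category, categories):
    """Encode a categorical value as a 0/1 vector with a single 1."""
    return [1 if category == c else 0 for c in categories]

BLOOD_TYPES = ["A", "B", "AB", "O"]
encoded = one_hot("AB", BLOOD_TYPES)
```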
PATTERN RECOGNITION ALGORITHMS
The field of pattern recognition has been explored widely by a number of
researchers, who as a result have developed various algorithms. The design
of all these algorithms consists of three basic elements: data
perception, feature extraction and classification. There are various
techniques to implement these three basic elements, so the technique
chosen for each element in the design cycle defines the characteristics
of the pattern recognition algorithm. This is the design cycle of a basic
pattern recognition model. Algorithms for pattern recognition depend on the
type of label output and on whether learning is supervised or unsupervised.
Probabilistic classifiers : An ANN-Pattern Recognizing Algo Approach
Many common pattern recognition algorithms are probabilistic in nature, in that they
use statistical inference to find the best label for a given instance. Unlike other algorithms, which
simply output a "best" label, often probabilistic algorithms also output a probability of the
instance being described by the given label. In addition, many probabilistic algorithms output a
list of the N-best labels with associated probabilities, for some value of N, instead of simply a
single best label. When the number of possible labels is fairly small (e.g., in the case
of classification), N may be set so that the probability of all possible labels is output.
Probabilistic algorithms have many advantages over non-probabilistic algorithms:

They output a confidence value associated with their choice. (Note that some other
algorithms may also output confidence values, but in general, only for probabilistic
algorithms is this value mathematically grounded in probability theory. Non-probabilistic
confidence values can in general not be given any specific meaning, and only used to
compare against other confidence values output by the same algorithm.)

Correspondingly, they can abstain when the confidence of choosing any particular output
is too low.

Because of the probabilities output, probabilistic pattern-recognition algorithms can be
more effectively incorporated into larger machine-learning tasks, in a way that partially or
completely avoids the problem of error propagation.
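The three properties listed above (a confidence value, an N-best list, and the option to abstain) can be sketched as follows. The softmax function is used here as one common way to turn a network's raw output scores into probabilities; it, the example scores and the 0.6 threshold are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Convert raw scores per label into probabilities that sum to 1."""
    m = max(scores.values())                      # subtract max for stability
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}

def n_best(probs, n):
    """Return the n most probable labels with their probabilities."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:n]

def classify(scores, threshold=0.6):
    """Return the best label, or None (abstain) if its probability is too low."""
    label, p = n_best(softmax(scores), 1)[0]
    return label if p >= threshold else None

confident = classify({"A": 2.0, "B": 0.0, "C": 0.0})
uncertain = classify({"A": 0.1, "B": 0.0, "C": 0.0})
```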

Feature selection algorithms: a non-monotonic approach


Feature selection algorithms attempt to directly prune out redundant or irrelevant features. A
general introduction to feature selection which summarizes approaches and challenges has been
given. Because of its non-monotonic character, feature selection is
an optimization problem in which, given a total of n features, the power set consisting of all
2^n - 1 non-empty subsets of features needs to be explored. The branch-and-bound algorithm does reduce this
complexity but is intractable for medium to large values of the number of available features n.
Techniques to transform the
raw feature vectors (feature extraction) are sometimes used prior to application of the
pattern-matching algorithm. For example, feature extraction algorithms attempt to reduce a
large-dimensionality feature vector into a smaller-dimensionality vector that is easier to work with and
encodes less redundancy, using mathematical techniques such as principal components
analysis (PCA). The distinction between feature selection and feature extraction is that the
resulting features after feature extraction has taken place are of a different sort than the original
features and may not easily be interpretable, while the features left after feature selection are
simply a subset of the original features.
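The size of the search space for exhaustive feature selection can be checked directly by enumerating every non-empty subset. The feature names and the scoring function below are hypothetical stand-ins; in a real setting the score would come from evaluating a classifier on each subset.

```python
from itertools import combinations

def all_feature_subsets(features):
    """Yield every non-empty subset of the features (2^n - 1 in total)."""
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            yield subset

def toy_score(subset):
    """Hypothetical score: features 'a' and 'c' are useful, every feature has a cost."""
    return ("a" in subset) + ("c" in subset) - 0.1 * len(subset)

def best_subset(features, score):
    """Exhaustive search: evaluate every subset and keep the best one."""
    return max(all_feature_subsets(features), key=score)

FEATURES = ["a", "b", "c", "d"]
n_subsets = len(list(all_feature_subsets(FEATURES)))
selected = best_subset(FEATURES, toy_score)
```

With only four features there are 15 subsets to score; the count doubles with every added feature, which is why exhaustive search stops being practical quickly.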

The image processing chain mentioned earlier distinguishes the following steps:
1. Preprocessing: operations that give as a result a modified image with the same dimensions
as the original image (e.g., contrast enhancement and noise reduction).
2. Data reduction (feature extraction): any operation that extracts significant components from
an image (window). The number of extracted features is generally smaller than the number of
pixels in the input window.
3. Segmentation: any operation that partitions an image into regions that are coherent with
respect to some criterion. One example is the segregation of different textures.
4. Object detection and recognition: determining the position and, possibly, also the
orientation and scale of specific objects in an image, and classifying these objects.
5. Image understanding: obtaining high-level (semantic) knowledge of what an image shows.
6. Optimization: minimization of a criterion function which may be used for, e.g., graph
matching or object delineation. Optimization techniques are not seen as a separate step in the
image processing chain but as a set of auxiliary techniques, which support the other steps.
Besides the actual task performed by an algorithm, its processing capabilities are partly
determined by the abstraction level of the input data.
