Chapter 4 Notes (NEP)


CHAPTER - 4

Learning
An agent is learning if it improves its performance on future tasks after making observations
about the world. In this chapter we will concentrate on one class of learning problem, which
seems restricted but actually has vast applicability: from a collection of input–output pairs, learn
a function that predicts the output for new inputs.
Why would we want an agent to learn?
If the design of the agent can be improved, why wouldn’t the designers just program in that
improvement to begin with?
There are three main reasons.
First, the designers cannot anticipate all possible situations that the agent might find itself in. For
example, a robot designed to navigate mazes must learn the layout of each new maze it
encounters.
Second, the designers cannot anticipate all changes over time; a program designed to predict
tomorrow’s stock market prices must learn to adapt when conditions change from boom to bust.
Third, sometimes human programmers have no idea how to program a solution themselves. For
example, most people are good at recognizing the faces of family members, but even the best
programmers are unable to program a computer to accomplish that task, except by using learning
algorithms.
Any component of an agent can be improved by learning from data. The improvements, and the
techniques used to make them, depend on four major factors:
Which component is to be improved.
What prior knowledge the agent already has.
What representation is used for the data and the component.
What feedback is available to learn from.
Components to be learned
The components of these agents include:
1. A direct mapping from conditions on the current state to actions.
2. A means to infer relevant properties of the world from the percept sequence.
3. Information about the way the world evolves and about the results of possible actions the
agent can take.
4. Utility information indicating the desirability of world states.

5. Action-value information indicating the desirability of actions.


6. Goals that describe classes of states whose achievement maximizes the agent’s utility.
Each of these components can be learned. Consider, for example, an agent training to become
a taxi driver. Every time the instructor shouts "Brake!" the agent might learn a condition–action
rule for when to brake (component 1); the agent also learns every time the instructor does not
shout. By seeing many camera images that it is told contain buses, it can learn to recognize them
(2). By trying actions and observing the results—for example, braking hard on a wet road—it
can learn the effects of its actions (3). Then, when it receives no tip from passengers who have
been thoroughly shaken up during the trip, it can learn a useful component of its overall utility
function (4).
We have seen several examples of representations for agent components: propositional and first-
order logical sentences for the components in a logical agent.
There is another way to look at the various types of learning. We say that learning a (possibly
incorrect) general function or rule from specific input–output pairs is called inductive learning.
We can also do analytical or deductive learning: going from a known general rule to a new
rule that is logically entailed, but is useful because it allows more efficient processing.
Feedback to learn from
There are three types of feedback that determine the three main types of learning:
In unsupervised learning the agent learns patterns in the input even though no explicit feedback
is supplied. The most common unsupervised learning task is clustering: detecting
potentially useful clusters of input examples. For example, a taxi agent might gradually develop
a concept of "good traffic days" and "bad traffic days" without ever being given labelled
examples of each by a teacher.
In Reinforcement learning the agent learns from a series of reinforcements—rewards or
punishments. For example, the lack of a tip at the end of the journey gives the taxi agent an
indication that it did something wrong. The two points for a win at the end of a chess game tell
the agent it did something right. It is up to the agent to decide which of the actions prior to the
reinforcement were most responsible for it.
In supervised learning the agent observes some example input–output pairs and learns a
function that maps from input to output. In component 1 above, the inputs are percepts and the
outputs are provided by a teacher who says "Brake!" or "Turn left." In component 2, the inputs
are camera images and the outputs again come from a teacher who says "that's a bus."
In component 3, the theory of braking is a function from states and braking actions to stopping distance in
feet. In this case the output value is available directly from the agent’s percepts (after the fact);
the environment is the teacher.

Supervised Learning:
Supervised learning is a type of machine learning in which machines are trained using well-labelled
training data, and on the basis of that data, machines predict the output. Labelled data
means that some input data is already tagged with the correct output.

Supervised learning is where you have input variables (x) and an output variable (Y) and you use
an algorithm to learn the mapping function from the input to the output.

Y = f(X)
The goal is to approximate the mapping function so well that when you have new input data (x),
you can predict the output variable (Y) for that data.
Learning takes place in the presence of a supervisor or a teacher. A supervised learning
algorithm learns from labelled training data and helps you predict outcomes for unforeseen data.
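As an illustration, the following is a minimal sketch (not from the notes) of this fit-then-predict workflow using scikit-learn; the toy data is hypothetical:

    # Learn the mapping Y = f(X) from labelled pairs, then predict for new inputs.
    from sklearn.linear_model import LogisticRegression

    X = [[1], [2], [3], [4]]   # input variables (x)
    y = [0, 0, 1, 1]           # output labels (Y) supplied by the "teacher"

    model = LogisticRegression()
    model.fit(X, y)                        # approximate the mapping function f
    print(model.predict([[1.5], [3.5]]))   # expected: [0 1] for new, unseen inputs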

Right now, almost all practical learning is supervised: your data has known labels as output. It
involves a supervisor that is more knowledgeable than the model itself.
Supervised learning problems can be further grouped into regression and classification
problems.
Classification: A classification problem is when the output variable is a category, such as "red"
or "blue", or "disease" and "no disease".
When the output y is one of a finite set of values (such as sunny, cloudy or rainy), the learning
problem is called classification, and is called Boolean or binary classification if there are only
two values.
Regression: A regression problem is when the output variable is a real value, such as "dollars"
or "weight". Regression is an ML algorithm that can be trained to predict real-numbered outputs,
like temperature, stock price, etc. Regression models are used to predict a continuous value.
Predicting the price of a house given its features, like size, location, etc., is one of the common
examples of regression. It is a supervised technique.
When y is a number (such as tomorrow’s temperature), the learning problem is called regression.
Why Supervised Learning?
• Supervised learning allows you to collect data or produce a data output from previous experience.
• It helps you optimize performance criteria using experience.
• Supervised machine learning helps you solve various types of real-world computation problems.
How does Supervised Learning work?
For example, you want to train a machine to help you predict how long it will take you to drive
home from your workplace. Here, you start by creating a set of labeled data. This data includes
• Weather conditions
• Time of the day
• Holidays
The output is the amount of time it took to drive back home on that specific day.
If it's raining outside, then it will take you longer to drive home. But the machine needs data and
statistics. This training set will contain the total travel time and corresponding factors like
weather, time, etc. Based on this training set, your machine might see that there is a direct
relationship between the amount of rain and the time you will take to get home.
So, it ascertains that the more it rains, the longer you will be driving to get back to your home. It
might also see the connection between the time you leave work and the time you'll be on the
road.
The closer you are to 6 p.m., the longer it takes you to get home. Your machine may find
some of the relationships within your labelled data.
LEARNING DECISION TREES
Decision tree induction is one of the simplest and yet most successful forms of machine learning.
We first describe the representation (the hypothesis space) and then show how to learn a good
hypothesis.
A decision tree is a supervised learning algorithm that is used for classification and
regression modeling.
A decision tree is a flowchart-like structure used to make decisions or predictions.
It consists of nodes representing decisions or tests on attributes, branches representing the
outcome of these decisions, and leaf nodes representing final outcomes or predictions.
Each internal node corresponds to a test on an attribute, each branch corresponds to the
result of the test, and each leaf node corresponds to a class label or a continuous value.

Decision Tree Terminologies


Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.

Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split
further after reaching a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
Branch/Sub-Tree: A subtree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: A node that splits into sub-nodes is called the parent node, and the
sub-nodes are called the child nodes.
The decision tree representation
Decision trees classify instances by sorting them down the tree from the root to some leaf node,
which provides the classification of the instance. Each node in the tree specifies a test of some
attribute of the instance, and each branch descending from that node corresponds to one of the
possible values for this attribute. An instance is classified by starting at the root node of the tree,
testing the attribute specified by this node, then moving down the tree branch corresponding to
the value of the attribute in the given example. This process is then repeated for the subtree
rooted at the new node.
Here the target attribute PlayTennis, which can have values yes or no for different Saturday
mornings, is to be predicted based on other attributes of the morning in question.

The figure above shows a decision tree for the concept PlayTennis. An example is classified by
sorting it through the tree to the appropriate leaf node, then returning the classification
associated with this leaf (in this case, Yes or No). This tree classifies Saturday mornings
according to whether or not they are suitable for playing tennis.
The DECISION-TREE-LEARNING algorithm adopts a greedy divide-and-conquer strategy:
always test the most important attribute first. This test divides the problem up into smaller
subproblems that can then be solved recursively. By "most important attribute" we mean the one
that makes the most difference to the classification of an example. That way, we hope to get to the
correct classification with a small number of tests, meaning that all paths in the tree will be short
and the tree as a whole will be shallow.
How does the Decision Tree algorithm Work?
The process of creating a decision tree involves:
1. Selecting the Best Attribute: Using a metric like Gini impurity, entropy, or information gain,
the best attribute to split the data is selected.
2. Splitting the Dataset: The dataset is split into subsets based on the selected attribute.
3. Repeating the Process: The process is repeated recursively for each subset, creating a new
internal node or leaf node until a stopping criterion is met (e.g., all instances in a node belong to
the same class or a predefined depth is reached).
DECISION TREE ALGORITHM
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step 3.
Continue this process until a stage is reached where the nodes cannot be classified further; such a
final node is called a leaf node.
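A minimal sketch of this procedure using scikit-learn's DecisionTreeClassifier (the toy weather data and feature names are hypothetical; the criterion parameter selects the Attribute Selection Measure discussed below):

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical data: [outlook (0=sunny, 1=rainy), windy (0=no, 1=yes)] -> play (0/1)
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [1, 1, 1, 0]

    # criterion="entropy" uses information gain; criterion="gini" uses the Gini index
    tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
    print(export_text(tree, feature_names=["outlook", "windy"]))  # the learned splits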
Attribute Selection Measures
While implementing a decision tree, the main issue is how to select the best attribute for
the root node and for the sub-nodes. To solve this problem there is a technique called the
Attribute Selection Measure (ASM). With this measure, we can easily select the best
attribute for the nodes of the tree. The common measures are:
1. Entropy
2. Information gain
3. Gini index
1. Entropy
It is the measure of impurity (or uncertainty) in the data. For two classes it lies between 0 and 1,
and it is calculated using the formula below:

Entropy(S) = − Σ (i = 1 to N) p_i · log2(p_i)

where N is the number of classes and p_i is the proportion of examples belonging to class i. The
lower the entropy, the better the model, because the classes are split better when there is less
uncertainty.
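A small Python helper illustrating the formula; the class counts are hypothetical:

    import math

    def entropy(class_counts):
        # Entropy = -sum(p_i * log2(p_i)) over the N classes
        total = sum(class_counts)
        return -sum((c / total) * math.log2(c / total)
                    for c in class_counts if c > 0)

    print(entropy([5, 5]))   # 1.0 -> maximum impurity for two classes
    print(entropy([10, 0]))  # 0.0 -> a pure node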

2. Information gain: It is simply the measure of the change in entropy after a split. The higher the
information gain, the lower the resulting entropy. Thus, for a model to be good, its splits should
have high information gain. A decision tree algorithm in general tries to maximize the value of
information gain, and the attribute/feature having the highest information gain is split first.

Information Gain = Entropy(before) − Σ (i = 1 to N) [ |(i, after)| / |before| ] · Entropy(i, after)

where "before" is the dataset before the split, N is the number of subsets that got generated after
we split the node, and (i, after) is subset i after the split.
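A sketch reusing the entropy helper above, with a hypothetical 10-example node split into two pure subsets:

    def information_gain(parent_counts, child_count_lists):
        # Entropy(before) minus the size-weighted entropy of the subsets after the split
        total = sum(parent_counts)
        after = sum(sum(c) / total * entropy(c) for c in child_count_lists)
        return entropy(parent_counts) - after

    # Splitting a [6+, 4-] node into [6+, 0-] and [0+, 4-] is a perfect split:
    print(information_gain([6, 4], [[6, 0], [0, 4]]))  # ~0.971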
3. Gini Index
It is a measure of purity or impurity used while creating a decision tree. It is calculated by
subtracting the sum of the squared probabilities of each class from one:

Gini = 1 − Σ (i = 1 to N) (p_i)²

It serves the same purpose as entropy but is quicker to compute. CART (Classification And
Regression Tree) uses the Gini index as an attribute selection measure to select the best
attribute/feature to split. The attribute with the lower Gini index is chosen as the best attribute to split.
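A matching sketch for the Gini index, again with hypothetical class counts:

    def gini(class_counts):
        # Gini = 1 - sum(p_i^2); 0 means a pure node
        total = sum(class_counts)
        return 1 - sum((c / total) ** 2 for c in class_counts)

    print(gini([5, 5]))   # 0.5 -> maximum impurity for two classes
    print(gini([10, 0]))  # 0.0 -> pure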

Example: Suppose there is a candidate who has a job offer and wants to decide whether he
should accept the offer or Not. So, to solve this problem, the decision tree starts with the root
node (Salary attribute by ASM). The root node splits further into the next decision node (distance
from the office) and one leaf node based on the corresponding labels. The next decision node
further gets split into one decision node (Cab facility) and one leaf node. Finally, the decision
node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below
diagram:

Advantages of the Decision Tree


It is simple to understand, as it follows the same process a human follows while
making any decision in real life.
It can be very useful for solving decision-related problems.
It helps to think about all the possible outcomes for a problem.
There is less requirement of data cleaning compared to other algorithms.
Disadvantages of the Decision Tree
The decision tree contains lots of layers, which makes it complex.
It may have an overfitting issue, which can be resolved using the Random Forest
algorithm.
For more class labels, the computational complexity of the decision tree may increase.
Broadening and Applications of Decision Tree
Missing data: In many domains, not all the attribute values will be known for every example.
The values might have gone unrecorded, or they might be too expensive to obtain. This gives rise
to two problems: First, given a complete decision tree, how should one classify an example that
is missing one of the test attributes? Second, how should one modify the information-gain
formula when some examples have unknown values for the attribute?
Multivalued attributes: When an attribute has many possible values, the information gain
measure gives an inappropriate indication of the attribute’s usefulness.
Continuous and integer-valued input attributes: Continuous or integer-valued attributes such
as Height and Weight, have an infinite set of possible values. Rather than generate infinitely
many branches, decision-tree learning algorithms typically find the split point that
gives the highest information gain.
Continuous-valued output attributes: If we are trying to predict a numerical output value, such
as the price of an apartment, then we need a regression tree rather than a classification tree. A
regression tree has at each leaf a linear function of some subset of numerical attributes, rather
than a single value.
REGRESSION AND CLASSIFICATION WITH LINEAR MODELS
Regression is a statistical method that helps us understand and predict the relationship
between variables.
It describes how one variable (dependent) changes with another variable (independent).

The dependent variable (Y) is the one being predicted or explained.
The independent variable (X) is used to predict or explain changes in the dependent variable.
Ex: Predicting the resale value of a vehicle based on its age; here age is the independent
variable (X) and resale value is the dependent variable (Y).
In positive regression, as X increases Y also increases.
In negative regression, as X increases Y decreases, or vice versa.
LINEAR REGRESSION
Linear regression analysis is used to predict the value of a variable based on the value of another
variable.
The variable you want to predict is called the dependent variable. The variable you are using to
predict the other variable's value is called the independent variable.
Y = mX + b
m: slope of the line (i.e., how much Y changes for a unit change in X)
b: intercept (the value of Y when X is 0)
Y: dependent variable
X: independent variable
Steps involved in linear regression
Step 1: Data Collection
Step 2: Calculations
Step 3: Prediction
Step 4: Visualization
Ex: Predicting the pizza price
Dataset: (the data table is not reproduced here; the worked figures below use mean(X) = 10 and mean(Y) = 13)

m = sum of products of deviations / sum of squared deviations of X
  = Σ(X − mean(X))(Y − mean(Y)) / Σ(X − mean(X))²
  = 12/8 = 1.5
i.e., if there is 1 unit of change in X then Y changes by 1.5.
b = mean(Y) − m · mean(X)
  = 13 − (1.5 × 10)
  = 13 − 15 = −2

Using the above values we can predict the price of a pizza with diameter 20″ as follows:
Y = mX + b
Y = 1.5 × 20 + (−2)
Y = 28
i.e., a pizza with a 20-inch diameter can be sold for 28 dollars.
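The original data table is not reproduced above, so the sketch below assumes a small dataset consistent with the worked figures (mean(X) = 10, mean(Y) = 13, m = 1.5, b = −2):

    import numpy as np

    # Hypothetical dataset consistent with the worked numbers above
    X = np.array([8, 10, 12])    # pizza diameter in inches
    Y = np.array([10, 13, 16])   # pizza price in dollars

    m = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)  # 12/8 = 1.5
    b = Y.mean() - m * X.mean()                                                # 13 - 15 = -2

    print(m, b)        # 1.5 -2.0
    print(m * 20 + b)  # 28.0 -> predicted price of a 20-inch pizza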
Univariate linear regression
Linear regression is commonly employed to describe the relationship between a single
independent variable (x) and a dependent variable (y). For instance, estimating a person’s weight
(y) given his/her height (x).
Y = b0 + b1 * X
where b0 and b1 are the coefficients of regression. Here we try to find the best b0 and b1 by
training a model so that our predicted variable y has the minimum difference from the actual y.

Multivariate linear regression


Statistically analysed data often involves multiple variables, rather than just one response
variable and one explanatory variable. The number of variables can vary depending on the study
being conducted. To assess the relationships between these multidimensional variables,
multivariate regression is used.
Multivariate regression is a sophisticated technique used to determine the extent to which various
independent variables are linearly related to multiple dependent variables. This linear
relationship is established through the correlation between the variables.
Characteristics of Multivariate Regression
Multivariate regression allows one to have a different view of the relationship between
various variables from all the possible angles.
It helps you predict the behaviour of the response variables depending on how the
predictor variables move.
Multivariate regression can be applied to various machine learning fields, including
economics, science, and medical research studies.
For example, a doctor has meticulously gathered data on individuals' blood pressure, weight,
and red meat consumption in order to investigate the correlation between health and dietary
habits. This extensive dataset offers valuable insights into how choices such as red meat intake
may impact physiological factors like blood pressure and weight.
In such cases, experts gather a wide range of data points, from dietary habits to environmental
conditions, to better understand complex phenomena. These multidimensional datasets are then
analysed using multivariate regression techniques, allowing for a deeper understanding of the
interactions among various variables.
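A minimal sketch of multivariate (multi-output) regression with scikit-learn; the numbers echoing the doctor example are entirely hypothetical:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Each row is one person: [red-meat meals/week, exercise minutes/day]
    X = np.array([[2, 30], [5, 25], [1, 40], [7, 20]])
    # Two dependent variables per person: [blood pressure, weight in kg]
    Y = np.array([[118, 62], [130, 75], [112, 58], [140, 82]])

    model = LinearRegression().fit(X, Y)   # fits one linear model per dependent variable
    print(model.predict([[4, 30]]))        # predicted [blood pressure, weight]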
Advantages and disadvantages of multivariate regression
Advantages:
The multivariate regression method helps you find a relationship between multiple
variables or features.
It also defines the correlation between independent variables and dependent variables.
Disadvantages:
Multivariate regression technique requires high-level mathematical calculations.
It is complex.

The output of the multivariate regression model is difficult to analyse.
The loss function can be affected by errors in the output data.
CLASSIFICATION
Classification is a supervised machine learning method where the model tries to predict the
correct label of a given input data. In classification, the model is fully trained using the training
data, and then it is evaluated on test data before being used to perform prediction on new unseen
data.
For instance, an algorithm can learn to predict whether a given email is spam or ham (no spam),
as illustrated below.

The task of the classification algorithm is to find the mapping function that maps the input (x) to
the discrete output (y).
There are two types of learners in machine learning classification: lazy and eager learners.
Lazy learners, or instance-based learners, do not create any model immediately from the training
data; this is where the "lazy" aspect comes from. They just memorize the training data, and each
time there is a need to make a prediction, they search for the nearest neighbours in the whole
training data, which makes them very slow during prediction. Some examples of this kind are:
K-Nearest Neighbor.

Case-based reasoning.
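A minimal k-Nearest Neighbor sketch (hypothetical data) showing the lazy pattern: fit() essentially just stores the data, and the neighbour search happens at prediction time:

    from sklearn.neighbors import KNeighborsClassifier

    X = [[1], [2], [8], [9]]
    y = [0, 0, 1, 1]

    knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)  # memorizes the training data
    print(knn.predict([[1.5], [8.5]]))  # nearest neighbours vote -> [0 1]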
Eager learners are machine learning algorithms that first build a model from the training dataset
before making any prediction on future datasets. They spend more time during the training
process because of their eagerness to have a better generalization during the training from
learning the weights, but they require less time to make predictions.
Most machine learning algorithms are eager learners, and below are some examples:
Logistic Regression.
Support Vector Machine.
Decision Trees.
Artificial Neural Networks.
Linear classification
The linear classification model is basically the same as linear regression, but the target value is
generally a discrete 1/0, rather than a continuous output as in regression. A linear classifier is an
algorithm that separates two classes of objects by a line or a hyperplane.

Some of the linear classification models are as follows:


Logistic Regression
Support Vector Machines having kernel = ‘linear’
Single-layer Perceptron
Stochastic Gradient Descent (SGD) Classifier
Logistic Regression
Logistic regression is another supervised learning algorithm which is used to solve the
classification problems. In classification problems, we have dependent variables in a binary or
discrete format such as 0 or 1.

Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or
No, True or False, Spam or not spam, etc.
It is a predictive analysis algorithm which works on the concept of probability.
Logistic regression is a type of regression, but it is different from the linear regression
algorithm in terms of how it is used.
Logistic regression is used for binary classification where we use sigmoid function, that takes
input as independent variables and produces a probability value between 0 and 1.
The sigmoid function is also known as the squashing function, as it takes any real-valued input
and squeezes it between 0 and 1; no matter how big or small the value fed in, the output always
lies between 0 and 1:

sigmoid(z) = 1 / (1 + e^(−z))
For example, if we have two classes, Class 0 and Class 1, and the value of the logistic function for
an input is greater than 0.5 (the threshold value), then it belongs to Class 1; otherwise it belongs
to Class 0.
It is referred to as regression because it is an extension of linear regression, but it is mainly used for
classification problems.
In Logistic regression, instead of fitting a regression line, we fit an "S" shaped logistic
function, which predicts two maximum values (0 or 1).
The curve from the logistic function indicates the likelihood of something such as
whether the cells are cancerous or not, a mouse is obese or not based on its weight, etc.
Logistic Regression is a significant machine learning algorithm because it has the ability
to provide probabilities and classify new data using continuous and discrete datasets.
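A small sketch of the sigmoid and the 0.5 threshold rule described above (the inputs are hypothetical):

    import math

    def sigmoid(z):
        # squashes any real input into a probability between 0 and 1
        return 1 / (1 + math.exp(-z))

    for z in (-3.0, 0.0, 2.0):
        p = sigmoid(z)
        print(z, round(p, 3), "Class 1" if p > 0.5 else "Class 0")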

Artificial Neural Networks


What is Artificial Neural Network?
The term "Artificial Neural Network" is derived from Biological neural networks that develop
the structure of a human brain. Similar to the human brain that has neurons interconnected to one
another; artificial neural networks also have neurons that are interconnected to one another in
various layers of the networks. These neurons are known as nodes. An Artificial Neural Network
in the field of Artificial Intelligence attempts to mimic the network of neurons that makes up
a human brain, so that computers can understand things and make decisions in
a human-like manner. The artificial neural network is designed by programming computers to
behave simply like interconnected brain cells. There are roughly 100 billion neurons in the
human brain. Each neuron has an association point somewhere in the range of 1,000 and
100,000. In the human brain, data is stored in such a manner as to be distributed, and we can
extract more than one piece of this data when necessary from our memory parallelly. We can say
that the human brain is made up of incredibly amazing parallel processors.

• Dendrites from the Biological Neural Network represent inputs in Artificial Neural Networks.
• Cell nucleus represents Nodes.
• Synapse represents Weights.
• Axon represents Output.
Artificial Neural Network has a huge number of interconnected processing elements, also
known as Nodes. These nodes are connected with other nodes using a connection link.
The connection link contains weights, these weights contain the information about the
input signal.
Each iteration and input in turn leads to the updating of these weights.
After inputting all the data instances from the training data set, the final weights of the
Neural Network along with its architecture is known as the Trained Neural Network. This
process is called Training of Neural Networks.
Types of tasks that can be solved using an artificial neural network include Classification
problems, Pattern Matching, Data Clustering, etc.

The architecture of an artificial neural network:

Input Layer: As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer: The hidden layer lies between the input and output layers. It performs all the
calculations to find hidden features and patterns.
Output Layer: The input goes through a series of transformations using the hidden layer, which
finally results in output that is conveyed using this layer. The artificial neural network takes the
inputs, computes the weighted sum of the inputs, and includes a bias. This computation is
represented in the form of a transfer function:

y = f( Σ (i = 1 to n) w_i · x_i + b )

The weighted total is passed as input to an activation function, which chooses whether a node
should fire or not. Only those nodes that fire make it to the output layer. There are distinctive
activation functions available that can be applied depending upon the sort of task we are performing.
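A minimal sketch of this computation for a single node, with hypothetical inputs, weights, and a sigmoid activation:

    import math

    def neuron(inputs, weights, bias):
        # weighted sum of the inputs plus a bias, passed through an activation function
        total = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1 / (1 + math.exp(-total))  # sigmoid activation

    print(neuron([0.5, 0.2], [0.8, -0.4], bias=0.1))  # the node's output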
Advantages of Artificial Neural Network (ANN)
Parallel processing capability: Artificial neural networks can perform more than one
task simultaneously.
Storing data on the entire network: Unlike in traditional programming, data is stored on
the whole network rather than in a database. The disappearance of a couple of pieces
of data in one place doesn't prevent the network from working.
Capability to work with incomplete knowledge: After ANN training, the information
may produce output even with inadequate data. The loss of performance here relies upon
the significance of missing data.
Having a memory distribution: For an ANN to be able to adapt, it is important to
determine the examples and to train the network according to the desired output by
demonstrating these examples to it. The success of the network is directly
proportional to the chosen instances; if the event can't be shown to the network in all its
aspects, the network can produce false output.
Having fault tolerance: Corruption of one or more cells of an ANN does not prevent it from
generating output; this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Network:
Assurance of proper network structure: There is no particular guideline for
determining the structure of artificial neural networks. The appropriate network structure
is accomplished through experience, trial, and error.
Unrecognized behavior of the network: It is the most significant issue of ANN. When
ANN produces a testing solution, it does not provide insight concerning why and how. It
decreases trust in the network.
Hardware dependence: Artificial neural networks need processors with parallel
processing power, in accordance with their structure. Therefore, realization of the network
depends on suitable hardware.
Difficulty of showing the issue to the network: ANNs can work with numerical data.
Problems must be converted into numerical values before being introduced to ANN. The
presentation mechanism to be resolved here will directly impact the performance of the
network. It relies on the user's abilities.
The duration of the network is unknown: The network is trained down to a specific value of
the error, and this value does not guarantee optimum results.
Types of Artificial Neural Network:
Feed-Forward ANN: In this ANN, the information flow is unidirectional. A unit sends information
to other units from which it does not receive any information. There are no feedback loops. A
feed-forward network is a basic neural network comprising an input layer, an output layer, and at
least one layer of neurons. Through assessment of its output by reviewing its input, the strength
of the network can be noticed based on the group behaviour of the associated neurons, and the
output is decided. The primary advantage of this network is that it learns to evaluate and recognize
input patterns. Such networks have fixed inputs and outputs and are used in pattern
generation/recognition/classification.

Feedback ANN: In this ANN, feedback loops are allowed, so a unit's output can be fed back
into the network. Such networks are used in content-addressable memories.

Applications of Neural Networks


Aerospace − Autopilot aircraft, aircraft fault detection.
Automotive − Automobile guidance systems.
Military − Weapon orientation and steering, target tracking, object discrimination, facial
recognition, signal/image identification.
Electronics − Code sequence prediction, IC chip layout, chip failure analysis, machine
vision, voice synthesis.
Financial − Real estate appraisal, loan advisor, mortgage screening, corporate bond
rating, portfolio trading program, corporate financial analysis, currency value prediction,
document readers, credit application evaluators.
Industrial − Manufacturing process control, product design and analysis, quality
inspection systems, welding quality analysis, paper quality prediction, chemical product
design analysis, dynamic modelling of chemical process systems, machine maintenance
analysis, project bidding, planning, and management.
Medical − Cancer cell analysis, EEG and ECG analysis, prosthetic design, transplant
time optimizer.
Speech − Speech recognition, speech classification, text to speech conversion.

Telecommunications − Image and data compression, automated information services,
real-time spoken language translation.
Support Vector Machine
SVM is one of the most popular Supervised Learning algorithms, which is used for
Classification as well as Regression problems. However, primarily, it is used for Classification
problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes so that we can easily put the new data point in the correct
category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
Consider the below diagram in which there are two different categories that are classified using a
decision boundary or hyperplane:

Example: Suppose we see a strange cat that also has some features of dogs, so if we want a
model that can accurately identify whether it is a cat or dog, so such a model can be created by
using the SVM algorithm. We will first train our model with lots of images of cats and dogs so
that it can learn about different features of cats and dogs, and then we test it with this strange
creature. As the SVM creates a decision boundary between these two classes (cat and dog)
and chooses extreme cases (support vectors), it will see the extreme cases of cat and dog. On the
basis of the support vectors, it will classify it as a cat. Consider the below diagram:

SVM algorithm can be used for Face detection, image classification, text categorization, etc.
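A minimal sketch with scikit-learn's SVC on hypothetical 2-D data; support_vectors_ exposes the extreme points mentioned above:

    from sklearn.svm import SVC

    X = [[1, 1], [2, 1], [8, 8], [9, 9]]   # two well-separated categories
    y = [0, 0, 1, 1]

    clf = SVC(kernel="linear").fit(X, y)
    print(clf.support_vectors_)            # the extreme points defining the hyperplane
    print(clf.predict([[2, 2], [7, 8]]))   # classify new points -> [0 1]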
Applications of Artificial Intelligence:
NLP:
NLP stands for Natural Language Processing, which is a part of Computer Science, Human
language, and Artificial Intelligence. It is the technology that is used by machines to understand,
analyse, manipulate, and interpret human languages. It helps developers to organize knowledge
for performing tasks such as translation, automatic summarization, Named Entity Recognition
(NER), speech recognition, relationship extraction, and topic segmentation.

Components of NLP
There are the following two components of NLP –
1. Natural Language Understanding (NLU):
Natural Language Understanding (NLU) helps the machine to understand and analyse human
language by extracting the metadata from content such as concepts, entities, keywords, emotion,
relations, and semantic roles.
NLU is mainly used in business applications to understand the customer's problem in both spoken
and written language.
NLU involves the following tasks –
It is used to map the given input into useful representation.
It is used to analyze different aspects of the language.
2. Natural Language Generation (NLG):
Natural Language Generation (NLG) acts as a translator that converts the computerized data into
natural language representation. It mainly involves Text planning, Sentence planning, and Text
Realization.

There are the following five phases of NLP:


1. Lexical and Morphological Analysis: The first phase of NLP is lexical analysis. This
phase scans the source text as a stream of characters and converts it into meaningful lexemes. It
divides the whole text into paragraphs, sentences, and words.
2. Syntactic Analysis (Parsing): Syntactic analysis is used to check grammar and word
arrangements, and to show the relationships among the words.
Example: "Agra goes to the Poonam."
In the real world, "Agra goes to the Poonam" does not make any sense, so this sentence is rejected
by the syntactic analyzer.
3. Semantic Analysis: Semantic analysis is concerned with the meaning representation. It
mainly focuses on the literal meaning of words, phrases, and sentences.
4. Discourse Integration: Discourse integration depends upon the sentences that precede it and
also invokes the meaning of the sentences that follow it.
5. Pragmatic Analysis: Pragmatic is the fifth and last phase of NLP. It helps you to discover the
intended effect by applying a set of rules that characterize cooperative dialogues.
For Example: "Open the door" is interpreted as a request instead of an order.

Applications of NLP
Question Answering: Question Answering focuses on building systems that
automatically answer the questions asked by humans in a natural language.
Spam Detection: Spam detection is used to detect unwanted e-mails getting to a user's
inbox.
Sentiment Analysis: Sentiment Analysis is also known as opinion mining. It is used on
the web to analyse the attitude, behaviour, and emotional state of the sender. This
application is implemented through a combination of NLP (Natural Language
Processing) and statistics by assigning values to the text (positive, negative, or
neutral) to identify the mood of the context (happy, sad, angry, etc.).
Machine Translation: Machine translation is used to translate text or speech from one
natural language to another natural language. Example: Google Translator
Spelling correction: Microsoft Corporation provides word-processor software like MS
Word and PowerPoint for spelling correction.

Speech Recognition: Speech recognition is used for converting spoken words into text.
It is used in applications, such as mobile, home automation, video recovery, dictating to
Microsoft Word, voice biometrics, voice user interface, and so on.
Chatbot: Implementing chatbots is one of the important applications of NLP. They are
used by many companies to provide customer chat services.
Information extraction: Information extraction is one of the most important applications
of NLP. It is used for extracting structured information from unstructured or semi-
structured machine-readable documents.
Natural Language Understanding (NLU): It converts a large set of text into more
formal representations such as first-order logic structures that are easier for the computer
programs to manipulate notations of the natural language processing.
Advantages of NLP
NLP helps users to ask questions about any subject and get a direct response within
seconds. NLP offers exact answers to the question, meaning it does not offer unnecessary
and unwanted information.
NLP helps computers to communicate with humans in their languages.
It is very time efficient.
Most of the companies use NLP to improve the efficiency of documentation processes,
accuracy of documentation, and identify the information from large databases.
Disadvantages of NLP
NLP may not show context.
NLP is unpredictable.
NLP may require more keystrokes.
NLP is unable to adapt to new domains, and it has limited functionality; that's why NLP is
built for a single, specific task only.
Text Classification

Step 1: Data collection


Collect a set of text documents with their corresponding categories for the text labeling process.
Gathering data is the most important step in solving any supervised machine learning problem.
Your text classifier can only be as good as the dataset it is built from.
Step 2: Data pre-processing
Clean and prepare the text data by removing unnecessary symbols, converting to lowercase, and
handling special characters such as punctuation. Understanding the characteristics of your data
beforehand will enable you to build a better model. This could simply mean obtaining a higher
accuracy. It’s good practice to run some checks on it: pick a few samples and manually check if
they are consistent with your expectations.
Step 3: Tokenization
Break the text apart into tokens, which are small units like words. Tokens help find matches and
connections by creating individually searchable parts. This step is especially useful for vector
search and semantic search, which give results based on user intent.
Step 4: Feature extraction
Convert the text into numerical representations that machine learning models can understand.
Some common methods include counting the occurrences of words (also known as Bag-of-
Words) or using word embeddings to capture word meanings. When we convert all of the texts
in a dataset into tokens, we may end up with tens of thousands of tokens. Not all of these
tokens/features contribute to label prediction. So we can drop certain tokens, for instance those
that occur extremely rarely across the dataset. We can also measure feature importance (how
much each token contributes to label predictions), and only include the most informative tokens.
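A minimal Bag-of-Words sketch with scikit-learn's CountVectorizer; the three documents are hypothetical:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["free prize now", "meeting at noon", "win a free prize"]
    vec = CountVectorizer()
    X = vec.fit_transform(docs)            # each document becomes a vector of token counts

    print(vec.get_feature_names_out())     # the learned vocabulary
    print(X.toarray())                     # the Bag-of-Words matrix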
Step 5: Build, train, and evaluate the model
Now that the data is clean and pre-processed, you can use it to train a machine learning model.
The model will learn patterns and associations between the text’s features and their categories.
This helps it understand the text labelling conventions using the pre-labelled examples.
Training involves making a prediction based on the current state of the model, calculating how
incorrect the prediction is, and updating the weights or parameters of the network to minimize
this error and make the model predict better. We repeat this process until our model has
converged and can no longer learn.
There are three key parameters to be chosen for this process

• Metric: How to measure the performance of the model. We used accuracy as the
metric in our experiments.
• Loss function: A function that is used to calculate a loss value that the training process then
attempts to minimize by tuning the network weights. For classification problems, cross-entropy
loss works well.
• Optimizer: A function that decides how the network weights will be updated based on the
output of the loss function. We used the popular Adam optimizer in our experiments.
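A minimal Keras-style sketch (an assumed framework, not specified in the notes) showing how these three choices are wired together; the layer sizes are hypothetical:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(1000,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",             # optimizer: how the weights are updated
        loss="binary_crossentropy",   # loss function: cross-entropy for classification
        metrics=["accuracy"],         # metric: how performance is measured
    )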
Step 6: Text labelling
Create a new, separate dataset to start text labelling and classifying new text. In the text labelling
process, the model separates the text into the predetermined categories from the data collection
step.
Step 8: Hyperparameter tuning
Depending on how the model evaluation goes, you may want to adjust the model's settings to
optimize its performance. We had to choose a number of hyperparameters for defining and
training the model. We relied on intuition, examples, and best-practice recommendations. Our
first choice of hyperparameter values, however, may not yield the best results; it only gives us a
good starting point for training.
Step 9: Model deployment
Use the trained and tuned model to classify new text data into their appropriate categories.
Information retrieval
Information retrieval is defined as a completely automated procedure that answers a
user query by reviewing a group of documents and producing a sorted document list that
ought to be relevant to the user's query criteria.
A retrieval model (IR) chooses and ranks relevant pages based on a user's query.
Document selection and ranking can be formalized using matching functions that return
retrieval status values (RSVs) for each document in a collection since documents and
queries are written in the same way. The majority of IR systems portray document
contents using a collection of descriptors known as words from a vocabulary V.
The query-document matching function in an IR model is defined in the following ways:
The estimation of the likelihood of user relevance for each page and query in relation to a
collection of q training documents.

In a vector space, the similarity function between queries and documents is computed.
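A minimal sketch of such a vector-space computation: documents and a query are embedded as term-count vectors and ranked by cosine similarity (the documents are hypothetical):

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["machine learning basics", "deep learning for vision", "cooking recipes"]
    vec = CountVectorizer()
    D = vec.fit_transform(docs)              # each document as a term-count vector
    q = vec.transform(["learning machine"])  # the query in the same vector space

    scores = cosine_similarity(q, D)[0]      # one retrieval status value per document
    for i in np.argsort(scores)[::-1]:       # rank documents by similarity
        print(docs[i], round(scores[i], 2))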
Types of Information Retrieval Models
1. Classic IR Model
It is the most basic and straightforward IR model. This paradigm is founded on mathematical
knowledge that is easily recognized and comprehended. The three classic IR models are
Boolean, Vector, and Probabilistic.
2. Non-Classic IR Model
It is diametrically opposed to the classic IR model. Such IR models are based on ideas other
than probability, similarity, and Boolean operations. Non-classical IR models include
situation theory models, information logic models, and interaction models.
3. Alternative IR Model
It is an improvement on the classic IR model that makes use of some unique approaches from
other domains. Alternative IR models include fuzzy models, cluster models, and latent semantic
indexing (LSI) models.
Speech Recognition
Speech recognition is a software technology driven by cutting-edge solutions like Natural
Language Processing (NLP) and Machine Learning (ML).
NLP, an AI system that analyses natural human speech, is sometimes referred to as
human language processing.
The vocal data is first transformed into a digital format that can be processed by
computer software. Then, the digitized data is subjected to additional processing using
NLP, ML, and deep learning techniques. Consumer products like smartphones, smart
homes, and other voice activated solutions can employ this digitized speech.
A piece of software performs four procedures to convert the audio that a microphone
records into text that both computers and people can understand:
1. Analyze the audio;
2. Separate it into sections;
3. Create a computer-readable version of it using digitization; and
4. Find the most appropriate text representation using an algorithm.
Due to how context-specific and extremely varied human speech is, voice recognition algorithms
must adjust. Different speech patterns, speaking styles, languages, dialects, accents, and

phrasings are used to train the software algorithms that process and organize audio into text. The
software also distinguishes speech sounds from the frequently present background noise.
Speech recognition systems utilize one of two types of models to satisfy these requirements:
Acoustic models. These represent the relationship between the linguistic units of speech
and the audio signals.
Language models. Here, sounds are matched with word sequences to distinguish between
words that sound similar.
Different speech recognition algorithms
Hidden Markov model (HMM). HMM is an algorithm that handles speech diversity,
such as pronunciation, speed, and accent. It provides a simple and effective framework
for modelling the temporal structure of audio and voice signals and the sequence of
phonemes that make up a word. For this reason, most of today’s speech recognition
systems are based on an HMM.
Dynamic time warping (DTW). DTW is used to compare two separate sequences of
speech that are different in speed. For example, you have two audio recordings of
someone saying "good morning" – one slow, one fast. In this case, the DTW algorithm
can sync the two recordings, even though they differ in speed and length.
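A minimal dynamic-programming DTW sketch; the two number sequences stand in for the slow and fast recordings:

    import math

    def dtw(a, b):
        # DTW distance between two sequences of possibly different lengths
        n, m = len(a), len(b)
        D = [[math.inf] * (m + 1) for _ in range(n + 1)]
        D[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
        return D[n][m]

    slow = [1, 1, 2, 2, 3, 3]   # the same "shape" spoken slowly
    fast = [1, 2, 3]            # ...and quickly
    print(dtw(slow, fast))      # 0.0 -> a perfect alignment despite different speeds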
Artificial neural networks (ANN). ANN is a computational model used in speech
recognition applications that helps computers understand spoken human language. It uses
deep learning techniques and basically imitates the patterns of how neural networks work
in the human brain, which allows the computer to make decisions in a human-like
manner.
Image processing
AI image processing works through a combination of advanced algorithms, neural networks, and
data processing to analyze, interpret, and manipulate digital images. Here's a simplified overview
of how AI image processing works:

Acquisition of image: The initial stage begins with image pre-processing, which uses a sensor to
capture the image and transform it into a usable format.
Enhancement of image: Image enhancement is the technique of bringing out and emphasising
specific interesting characteristics which are hidden in an image.
Restoration of image: Image restoration is the process of improving an image's appearance.
Unlike image enhancement, image restoration is carried out using specific mathematical or
probabilistic models.
Colour image processing: A variety of digital colour modelling approaches such as HSI (Hue-
Saturation-Intensity), CMY (Cyan-Magenta-Yellow) and RGB (Red-Green-Blue) etc. are used in
colour picture processing.
Compression and decompression of image: This enables adjustments to image resolution and
size, whether for image reduction or restoration depending on the situation, without lowering
image quality below a desirable level. Lossy and lossless compression techniques are the two
main types of image file compression which are being employed in this stage.
Morphological processing: Digital images are processed depending on their shapes using an
image processing technique known as morphological operations. The operations depend on the
relative ordering of pixel values rather than on their numerical values, and are well suited to the
processing of binary images. Morphological processing aids in removing imperfections in the
structure of the image.

Segmentation, representation and description: The segmentation process divides a picture
into segments, and each segment is represented and described in such a way that it can be
processed further by a computer. The image's quality and regional characteristics are covered by
representation. The description's job is to extract quantitative data that helps distinguish one class
of items from another.
Recognition of image: A label is assigned to an object through recognition based on its description.
Some of the often-employed algorithms in the process of recognising images include the
Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Principal
Component Analysis (PCA).
Computer vision
Computer vision is a field of artificial intelligence that enables computers to derive meaningful
information from digital images, videos, and other visual inputs. It uses machine learning
algorithms to analyse and interpret images, and it has a wide range of applications in various
fields.
Computer vision works by using deep learning algorithms to analyse and interpret images. These
algorithms are trained on large datasets of images and use convolutional neural networks (CNNs)
to identify patterns and features in the images. Once the patterns and features are identified, the
computer can use them to recognize objects, classify images, and perform other image-based
tasks.
Some of the applications of computer vision include:
1. Autonomous Vehicles: Computer vision is used in autonomous vehicles to detect and
recognize objects such as pedestrians, other vehicles, and traffic signals. This helps the vehicle
navigate safely and avoid accidents.
2. Retail: Computer vision is used in retail to track inventory, monitor customer behaviour, and
analyse shopping patterns.
It can also be used to personalize the shopping experience for customers
3. Healthcare: Computer vision is used in healthcare to diagnose diseases, monitor patients, and
assist with surgical procedures. It can also be used to analyse medical images such as X-rays and
MRIs
4. Security: Computer vision is used in security to detect and recognize faces, track individuals,
and monitor public spaces. It can also be used to detect suspicious behaviour and prevent crime.

5. Entertainment: Computer vision is used in entertainment to create special effects, enhance
video games, and develop virtual reality experiences. It can also be used to analyze audience
reactions and preferences.
Robotics
Robotics is a field of engineering and science that involves the design, construction, operation,
and use of robots. Robots are autonomous or semiautonomous machines that can perform tasks
in the physical world.
AI allows robots to interpret and respond to their environment. Advanced sensors combined with
AI algorithms enable robots to perceive and understand the world around them using vision,
touch, sound, and other sensory inputs. Robots can learn from experience through machine
learning algorithms. This capability enables them to adapt to changing conditions, improve their
performance over time, and even learn from human demonstrations. They can analyse data,
assess situations, and make decisions based on predefined rules or learned patterns.
Automation and robotics:-
Automation and robotics are two closely related technologies. In an industrial context, we can
define automation as a technology that is concerned with the use of mechanical, electronic, and
computer-based systems in the operation and control of production. Examples of this technology
include transfer lines, mechanized assembly machines, feedback control systems (applied to
industrial processes), numerically controlled machine tools, and robots. Accordingly, robotics is
a form of industrial automation. Ex: Robotics, CAD/CAM, FMS, CIMS.
Different types of Robots
1. Androids
Androids are robots that resemble humans. They are often mobile, moving around on wheels or a
track drive. According to the American Society of Mechanical Engineers, these humanoid robots
are used in areas such as caregiving and personal assistance, search and rescue, space exploration
and research, entertainment and education, public relations and healthcare, and manufacturing.
2. Telechir
A telechir is a complex robot that is remotely controlled by a human operator for a telepresence
system. It gives that individual the sense of being on location in a remote, dangerous or alien
environment, and enables them to interact with it since the telechir continuously provides
sensory feedback.
3. Telepresence robot
A telepresence robot simulates the experience -- and some capabilities -- of being physically
present at a location. It combines remote monitoring and control via telemetry sent over radio,
wires or optical fibers, and enables remote business consultations, healthcare, home monitoring,
childcare and more.
4. Industrial robot
The IFR (International Federation of Robotics) defines an industrial robot as an "automatically
controlled, reprogrammable multipurpose manipulator programmable in three or more axes."
Users can adapt these robots to different applications as well. Combining these robots with AI
has helped businesses move them beyond simple automation to higher-level and more complex
tasks.
5. Swarm robot
Swarm robots (aka insect robots) work in fleets ranging from a few to thousands, all under the
supervision of a single controller. These robots are analogous to insect colonies: individually they
exhibit simple behaviours, but collectively they demonstrate more sophisticated behaviours and
can carry out complex tasks.
6. Smart robot

This is the most advanced kind of robot. The smart robot has a built-in AI system that learns
from its environment and experiences to build knowledge and enhance capabilities to
continuously improve. A smart robot can collaborate with humans and help solve problems in
areas like the following:
Agricultural labor shortages;
Food waste;
Study of marine ecosystems;

Prepared by: Chandrashekhar K, Asst. Prof, Dept. of BCA
