Unit 1
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
For any learning system, we must know the three elements — T (Task), P (Performance Measure), and E (Training Experience). At a high level, the process of designing a learning system proceeds as described below.
Example (a checkers learning problem):
T: play checkers
P: percent of games won against opponents
E: playing practice games against itself
Assume we want to classify an incoming mail as spam or not spam. This problem can be described in terms of the three elements as:
1. Task T: Classifying incoming mails as ‘spam’ or ‘not spam’.
2. Performance measure P: Total percent of mails correctly classified as ‘spam’ (or ‘not spam’).
3. Training experience E: A set of mails with given labels (‘spam’ / ‘not spam’).
Designing a learning system involves the following design choices:
1. The type of training experience from which the system will learn (Choosing the Training Experience)
2. The exact type of knowledge to be learned (Choosing the Target Function)
3. A representation for this target knowledge (Choosing a Representation for the Target Function)
4. A learning mechanism for fitting the target function to the training examples (Choosing a Function Approximation Algorithm)
Let us look into the checkers learning problem and apply the above design choices. For the checkers learning problem, these choices are made as follows.
1. Training Experience
During the design of the checkers learning system, the type of training experience available to the learning system will have a significant effect on the success or failure of the learning.
1. Direct or Indirect training experience — In the case of direct training experience, individual board states and the correct move for each board state are given.
In the case of indirect training experience, the move sequences for a game and the final result (win, loss or draw) are given for a number of games. How to assign credit or blame to the individual moves within such a sequence is known as the credit assignment problem.
2. Teacher or Not —
Supervised — The training experience is labeled, which means all the board states are labeled with the correct move, so the learning takes place in the presence of a supervisor or a teacher.
Unsupervised — The training experience is unlabeled, which means the board states do not come with correct moves, so the learner generates random games and plays against itself with no supervision.
Semi-supervised — The learner generates game states and asks the teacher for help in finding the correct move whenever it is unsure.
3. Distribution of examples — Performance is best when the training examples and the test examples are drawn from the same or a similar distribution.
The checkers player learns by playing against itself. Its experience is indirect, and it may not encounter moves that are common in human expert play. Once the proper training experience is available, the next design step is to choose the target function.
2. Target Function
In this design step, we need to determine exactly what type of knowledge has to be learned and how the performance program will use it.
When you are playing checkers, at any moment you decide on the best move from among several possibilities. You think and apply the learning that you have gained from experience. Here the learning is: for a specific board state, you move a checker such that your board state tends towards the winning situation. Now the same learning has to be defined in terms of the target
function.
With direct training experience, the checkers learning system needs only to learn how to choose the best move from a large search space of legal moves. We need to find a target function that will help us choose
the best move among alternatives. Let us call this function ChooseMove and use the
notation ChooseMove : B →M to indicate that this function accepts as input any board from the set of
legal board states B and produces as output some move from the set of legal moves M.
When there is only indirect experience, it becomes difficult to learn such a function. Instead, we can assign a real-valued score to each board state. So let the function be V : B → R, indicating that it accepts as input any board from the set of legal board states B and produces as output a real score. This function assigns higher scores to better board states; if the system can learn such a target function V, it can select the best move from any board position by generating all successor states and picking the one with the highest score.
Let us therefore define the target value V(b) for an arbitrary board state b in B, as follows:
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b’), where b’ is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
Case (4) is a recursive definition: to determine the value of V(b) for a particular board state, it requires searching ahead along the optimal line of play, all the way to the end of the game. Because this definition is not efficiently computable by our checkers-playing program, we say that it is a nonoperational definition.
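To make the recursion concrete, here is a minimal Python sketch of this nonoperational definition. The helpers is_final, outcome_value, legal_moves and apply_move are hypothetical and are assumed to encode the rules of checkers; for simplicity the sketch takes the maximum over successors rather than performing a full minimax over alternating turns.

# A direct (nonoperational) encoding of the recursive definition of V(b).
# is_final, outcome_value, legal_moves and apply_move are hypothetical helpers
# assumed to encode the rules of checkers; they are not defined here.
def V(b):
    """Target value of board state b, per the four cases above."""
    if is_final(b):
        # Cases 1-3: +100 for a win, -100 for a loss, 0 for a draw.
        return outcome_value(b)
    # Case 4: look ahead over every legal move, all the way to the end of the
    # game -- this exhaustive search is exactly why the definition is not
    # efficiently computable (nonoperational). A full treatment would use
    # minimax to account for the opponent's optimal replies.
    return max(V(apply_move(b, m)) for m in legal_moves(b))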
The goal of learning, in this case, is to discover an operational description of V; that is, a description that can be used by the checkers-playing program to evaluate states and select moves within realistic time bounds. It may be very difficult in general to learn such an operational form of V perfectly. We expect learning algorithms to acquire only some approximation to the target function; for this reason the process of learning the target function is often called function approximation.
3. Representation for the Target Function
Now it’s time to choose a representation that the learning program will use to describe the function ^V that it will learn. Some options are:
1. a large lookup table with a distinct entry for every board state?
2. a collection of rules?
3. a neural network?
4. a polynomial function of predefined board features?
5. …
To keep the discussion simple, let us choose a simple representation: for any given board state b, the function ^V will be calculated as a linear combination of the following board features:
x1(b): number of black pieces on board b
x2(b): number of red pieces on board b
x3(b): number of black kings on board b
x4(b): number of red kings on board b
x5(b): number of red pieces threatened by black (i.e., which can be taken on black’s next turn)
x6(b): number of black pieces threatened by red
Thus the learning program will represent ^V(b) as a linear function of the form
^V(b) = w0 + w1·x1(b) + w2·x2(b) + w3·x3(b) + w4·x4(b) + w5·x5(b) + w6·x6(b)
where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm.
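The linear form above is easy to express in code. Below is a minimal Python sketch; extract_features is a hypothetical helper that returns the six features x1(b)…x6(b) for a board state b, and the weight layout [w0, w1, …, w6] is our own convention.

import numpy as np

# Minimal sketch of the linear evaluation function
# ^V(b) = w0 + w1*x1(b) + ... + w6*x6(b).
# extract_features(b) is a hypothetical helper returning the six board
# features described above as a length-6 array.
def v_hat(b, weights):
    """Approximate value of board state b under the current weights [w0..w6]."""
    x = np.concatenate(([1.0], extract_features(b)))  # prepend 1 so w0 acts as the bias
    return float(np.dot(weights, x))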
Specification of the Machine Learning Problem at this time — Till now we have worked on choosing the type of training experience, choosing the target function and its representation. The checkers learning task can be summarized as:
Task T: playing checkers
Performance measure P: percent of games won in the world tournament
Training experience E: games played against itself
Target function: V : Board → R
Target function representation: ^V(b) = w0 + w1·x1(b) + w2·x2(b) + w3·x3(b) + w4·x4(b) + w5·x5(b) + w6·x6(b)
The first three items above correspond to the specification of the learning task, whereas the final two items constitute design choices for the implementation of the learning program.
4. Function Approximation Algorithm
To train our learning program, we need a set of training data, each example describing a specific board state b and the training value V_train(b) for b. Each training example is an ordered pair <b, V_train(b)>. For instance, <(x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100>
is an example where black has won the game, since x2 = 0 means red has no remaining pieces. However, such clean values of V_train(b) can be obtained only for board states b that are a clear win, loss or draw.
Estimating training values: In the above case, assigning a training value V_train(b) for the specific boards b that are a clear win, loss or draw is straightforward, as they constitute direct training experience. But in the case of indirect training experience, assigning a training value V_train(b) for the intermediate boards is difficult. In such cases, the training values are updated using temporal difference learning. Let Successor(b) denote the next board state following b for which it is again the program’s turn to move, and let ^V be the learner’s current approximation to V. Using this information, the training value V_train(b) for any intermediate board state b is assigned as below:
V_train(b) ← ^V(Successor(b))
For example, V_train(b1) ← ^V(b3), where b3 is the successor of b1, i.e., the next state at which it is again the program’s turn to move. Once a game has been played, the training data is generated: for each training example, V_train(b) is computed using this rule.
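As a rough illustration, here is a Python sketch that assigns training values to the states of one self-played game using this rule. It reuses the hypothetical v_hat sketch from above; trace (the program's board states in order) and final_value (+100, -100 or 0) are assumed inputs.

# Sketch: build training examples <b, V_train(b)> from one self-played game
# using V_train(b) <- ^V(Successor(b)) for intermediate states.
def training_examples_from_game(trace, final_value, weights):
    examples = []
    for i, b in enumerate(trace):
        if i + 1 < len(trace):
            # Intermediate state: use the current estimate of its successor.
            v_train = v_hat(trace[i + 1], weights)
        else:
            # Final state of the game: its true value is known (win/loss/draw).
            v_train = final_value
        examples.append((b, v_train))
    return examples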
Adjusting the weights: All that remains is to choose a learning algorithm for adjusting the weights w0 through w6 to best fit the set of training examples. One common approach is to define the best hypothesis as the one that minimizes the squared error E between the training values and the values predicted by the hypothesis ^V:
E = Σ (V_train(b) − ^V(b))², summed over all training examples <b, V_train(b)>.
The learning algorithm should incrementally refine the weights as more training examples become available, and it should be robust to errors in these estimated training values. The Least Mean Squares (LMS) training rule is one such algorithm: it adjusts the weights a small amount in the direction that reduces the error on each observed training example. For each training example <b, V_train(b)>:
use the current weights to calculate ^V(b)
for each weight wi, update it as wi ← wi + η (V_train(b) − ^V(b)) xi(b)
where η is a small constant (e.g., 0.1) that moderates the size of the weight update.
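A minimal Python sketch of this LMS update follows, again reusing the hypothetical v_hat/extract_features helpers from above; the learning rate η = 0.1 is only an illustrative choice.

import numpy as np

ETA = 0.1  # learning rate; 0.1 is an illustrative choice

def lms_update(weights, b, v_train):
    """One LMS step: nudge each weight to reduce the error on example (b, v_train)."""
    x = np.concatenate(([1.0], extract_features(b)))  # [1, x1(b), ..., x6(b)]
    error = v_train - np.dot(weights, x)               # V_train(b) - ^V(b)
    return weights + ETA * error * x                   # wi <- wi + eta * error * xi(b)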
The Final Design
The final design of our checkers learning system can be described by four distinct program modules:
1. The Performance System — Takes a new board as input and outputs a trace of the game it played against itself.
2. The Critic — Takes the trace of a game as an input and outputs a set of training examples of the
target function.
3. The Generalizer — Takes training examples as input and outputs a hypothesis that estimates the target function.
4. The Experiment Generator — Takes the current hypothesis (currently learned function) as input
and outputs a new problem (an initial board state) for the performance system to explore.
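To show how these four modules could fit together, here is a schematic Python sketch. play_self_game (Performance System) and generate_new_problem (Experiment Generator) are hypothetical helpers; the Critic and Generalizer reuse the training_examples_from_game and lms_update sketches above, and the number of games is arbitrary.

import numpy as np

weights = np.zeros(7)  # initial hypothesis ^V: all weights zero

for _ in range(1000):                                    # arbitrary number of self-played games
    board = generate_new_problem(weights)                # Experiment Generator: pick a new initial board
    trace, final_value = play_self_game(board, weights)  # Performance System: play a game, return its trace
    examples = training_examples_from_game(trace, final_value, weights)  # Critic: produce <b, V_train(b)>
    for b, v_train in examples:                          # Generalizer: LMS updates yield the new hypothesis
        weights = lms_update(weights, b, v_train)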
Topic 3: Perspective & Issues in Machine Learning
1.3.1 Perspective:
One useful perspective on machine learning is that it involves searching a very large space of possible hypotheses to determine the one that best fits the observed data and any prior knowledge held by the learner.
1.3.2 Issues:
The field of machine learning is concerned with answering questions such as the following:
What algorithms exist for learning generic target functions from specific training
examples? In what settings will particular algorithms converge to the desired function,
given sufficient training data? Which algorithms perform best for which types of
problems and representations?
How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner’s hypothesis space?
When and how can prior knowledge held by the learner guide the process of
generalizing from examples? Can prior knowledge be helpful even when it is only
approximately correct?
What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
What specific function should the system attempt to learn? Can this process itself be
automated?
How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?
Topic 4: Types of Machine Learning Algorithms
Machine learning algorithms are divided into categories according to their purpose, and the main categories are the following:
Supervised learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
Supervised Learning
In supervised learning, the algorithm learns from a training dataset in which every observation is labeled with the correct output. The model learns a mapping from inputs to outputs, is corrected whenever its predictions are wrong, and training continues until it reaches an acceptable level of performance. A small worked example is sketched after the list of algorithms below.
List of Common Algorithms
Nearest Neighbor
Naive Bayes
Decision Trees
Linear Regression
Support Vector Machines (SVM)
Neural Networks
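As a minimal supervised-learning illustration tied to the earlier spam example, here is a sketch using scikit-learn's decision tree classifier; the feature choice, the toy data and the library are our own assumptions, not prescribed by the text.

# Minimal supervised-learning sketch: a decision tree trained on a tiny
# labeled toy dataset. The data is made up purely for illustration.
from sklearn.tree import DecisionTreeClassifier

# Each row: [number of suspicious words, number of links]; label: 1 = spam, 0 = not spam.
X = [[5, 3], [4, 4], [0, 1], [1, 0], [6, 2], [0, 0]]
y = [1, 1, 0, 0, 1, 0]

model = DecisionTreeClassifier().fit(X, y)   # learn the input-to-label mapping
print(model.predict([[3, 2]]))               # predicted label for a new, unseen mail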
Unsupervised Learning
In unsupervised learning, the training data carries no labels; the algorithm is left to discover structure in the data on its own, for example by grouping similar observations into clusters or by reducing the dimensionality of the data. Common algorithms include k-means clustering, hierarchical clustering, and principal component analysis (PCA).
Semi-supervised Learning
In the previous two types, either labels are present for all the observations in the dataset or they are absent for all of them. Semi-supervised learning falls in between these two. In many practical situations, the cost of labeling is quite high, since it requires skilled human experts. So, when labels are absent for the majority of observations but present for a few, semi-supervised algorithms are the best candidates for
the model building. These methods exploit the idea that even though the group
memberships of the unlabeled data are unknown, this data carries important
information about the group parameters.
Here’s how it works (a small code sketch follows these steps):
Train the model with the small amount of labeled training data just like
you would in supervised learning, until it gives you good results.
Then use it with the unlabeled training dataset to predict the outputs
Link the data inputs in the labeled training data with the inputs in the
unlabeled data.
Then, train the model the same way as you did with the labeled set in the
beginning in order to decrease the error and improve the model’s
accuracy.
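As a rough illustration of the four steps above, here is a self-training sketch using scikit-learn; the logistic-regression classifier, the 0.9 confidence threshold and the helper name self_train are our own assumptions.

# Self-training sketch: fit on the few labeled points, pseudo-label the
# confident unlabeled points, then retrain on the enlarged labeled set.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9):
    model = LogisticRegression().fit(X_labeled, y_labeled)   # step 1: train on labeled data

    proba = model.predict_proba(X_unlabeled)                 # step 2: predict the unlabeled data
    confident = proba.max(axis=1) >= threshold               # keep only confident predictions
    pseudo = model.classes_[proba.argmax(axis=1)[confident]]

    # step 3: link the confidently pseudo-labeled inputs with the labeled ones
    X_new = np.vstack([X_labeled, np.asarray(X_unlabeled)[confident]])
    y_new = np.concatenate([y_labeled, pseudo])

    # step 4: retrain on the combined set to reduce error and improve accuracy
    return LogisticRegression().fit(X_new, y_new)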
Reinforcement Learning
This method aims at using observations gathered from the interaction with the
environment to take actions that would maximize the reward or minimize the risk.
The reinforcement learning algorithm (called the agent) continuously learns from the
environment in an iterative fashion. In the process, the agent learns from its
experiences of the environment until it explores the full range of possible states.
Reinforcement Learning is a type of Machine Learning, and thereby also a branch
of Artificial Intelligence. It allows machines and software agents to automatically
determine the ideal behavior within a specific context, in order to maximize their
performance. Simple reward feedback is required for the agent to learn its behavior;
this is known as the reinforcement signal.
Reinforcement learning goes through the following steps:
1. Input state is observed by the agent.
2. Decision making function is used to make the agent perform an action.
3. After the action is performed, the agent receives reward or reinforcement from the
environment.
4. The state-action pair information about the reward is stored.
List of Common Algorithms
Q-Learning
Temporal Difference (TD)
Deep Adversarial Networks
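Since Q-Learning appears in the list above, here is a minimal tabular Q-learning sketch that walks through the four steps described earlier on a toy five-state corridor; the environment, the +1 goal reward and the hyperparameters (alpha, gamma, epsilon) are illustrative assumptions.

# Minimal tabular Q-learning sketch on a toy 1-D corridor of 5 states,
# where the right-most state is the goal. Actions: 0 = move left, 1 = move right.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))      # stored state-action values (step 4)
alpha, gamma, epsilon = 0.1, 0.9, 0.2    # learning rate, discount, exploration rate

for episode in range(500):
    s = 0                                             # start at the left end
    while s != n_states - 1:                          # until the goal state is reached
        # Steps 1-2: observe the input state and choose an action (epsilon-greedy).
        a = np.random.randint(n_actions) if np.random.rand() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        # Step 3: receive the reward (reinforcement) from the environment.
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Step 4: store/update the state-action value using the observed reward.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)  # learned state-action values; the greedy policy moves right towards the goal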
Use cases:
Some applications of reinforcement learning algorithms are computer-played board games (Chess, Go), robotic hands, and self-driving cars.