Unit 1
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
For any learning system, we must know the three elements — T (Task), P (Performance Measure), and E (Training Experience). At a high level, the process of designing a learning system proceeds as described below.
Example (a checkers learning problem):
T: play checkers
P: percent of games won against opponents
E: playing practice games against itself
Assume we want to classify an incoming mail as spam or not spam. This problem can be described in terms of the three elements as:
1. Task T: Classifying incoming mails as ‘spam’ or ‘not spam’.
2. Performance measure P: Total percent of mails correctly classified as ‘spam’ (or ‘not spam’).
3. Training experience E: A set of mails with given labels (‘spam’ / ‘not spam’).
Designing a learning system involves the following design choices:
1. The type of training experience from which the system will learn (Choosing the Training Experience)
2. The exact type of knowledge to be learned (Choosing the Target Function)
3. A representation for this target knowledge (Choosing a Representation for the Target Function)
4. A learning mechanism for fitting the target function to the training examples (Choosing a Function Approximation Algorithm)
Let us look into the checkers learning problem and apply the above design choices. For the checkers learning problem, these choices are made as follows.
1. Training Experience
During the design of the checkers learning system, the type of training experience available to the learning system will have a significant effect on the success or failure of the learning.
1. Direct or Indirect training experience — In the case of direct training experience, individual board states and the correct move for each board state are given.
In the case of indirect training experience, the move sequences for a game and the final result (win, loss or draw) are given for a number of games. How to assign credit or blame to the individual moves within such a sequence is known as the credit assignment problem.
2. Teacher or Not —
Supervised — The training experience is labeled, which means all the board states are labeled with the correct move, so the learning takes place in the presence of a supervisor or a teacher.
Unsupervised — The training experience is unlabeled, which means the board states do not come with correct moves, so the learner generates random games and plays against itself with no supervision.
Semi-supervised — The learner generates game states and asks the teacher for help in finding the correct move whenever it is unsure.
3. Distribution of examples — Performance is best when the training examples and the test examples are drawn from the same or a similar distribution.
The checkers player learns by playing against itself. Its experience is indirect, and it may not encounter moves that are common in human expert play. Once the proper training experience is available, the next design step is to choose the target function.
2. Target Function
In this design step, we need to determine exactly what type of knowledge has to be learned and how the performance program will use it.
When you are playing checkers, at any moment you decide on the best move from among several possibilities. You think and apply the learning that you have gained from experience. Here the learning is: for a specific board state, you move a checker such that your board state tends towards the winning situation. Now the same learning has to be defined in terms of the target
function.
With direct training experience, the checkers learning system needs only to learn how to choose the best move from a large search space of legal moves. We need to find a target function that will help us choose
the best move among alternatives. Let us call this function ChooseMove and use the
notation ChooseMove : B →M to indicate that this function accepts as input any board from the set of
legal board states B and produces as output some move from the set of legal moves M.
When there is only indirect experience, it becomes difficult to learn such a function. Instead, we can assign a real-valued score to each board state. So let the function be V : B → R, indicating that it accepts as input any board from the set of legal board states B and produces as output a real score. This function assigns higher scores to better board states; if the system can learn such a target function V, it can select the best move from any board position by generating all successor states and picking the one with the highest score.
Let us therefore define the target value V(b) for an arbitrary board state b in B, as follows:
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b’), where b’ is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
Case (4) is a recursive definition: to determine the value of V(b) for a particular board state, it requires searching ahead along the optimal line of play, all the way to the end of the game. Because this definition is not efficiently computable by our checkers-playing program, we say that it is a nonoperational definition.
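To make the recursion concrete, here is a minimal Python sketch of this nonoperational definition. The helpers is_final, outcome_value, legal_moves and apply_move are hypothetical and are assumed to encode the rules of checkers; for simplicity the sketch takes the maximum over successors rather than performing a full minimax over alternating turns.

# A direct (nonoperational) encoding of the recursive definition of V(b).
# is_final, outcome_value, legal_moves and apply_move are hypothetical helpers
# assumed to encode the rules of checkers; they are not defined here.
def V(b):
    """Target value of board state b, per the four cases above."""
    if is_final(b):
        # Cases 1-3: +100 for a win, -100 for a loss, 0 for a draw.
        return outcome_value(b)
    # Case 4: look ahead over every legal move, all the way to the end of the
    # game -- this exhaustive search is exactly why the definition is not
    # efficiently computable (nonoperational). A full treatment would use
    # minimax to account for the opponent's optimal replies.
    return max(V(apply_move(b, m)) for m in legal_moves(b))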
The goal of learning, in this case, is to discover an operational description of V; that is, a description that can be used by the checkers-playing program to evaluate states and select moves within realistic time bounds. It may be very difficult in general to learn such an operational form of V perfectly. We expect learning algorithms to acquire only some approximation to the target function; for this reason the process of learning the target function is often called function approximation.
3. Representation for the Target Function
Now it’s time to choose a representation that the learning program will use to describe the function ^V that it will learn. Some options are:
1. a large lookup table with a distinct entry for every board state?
2. a collection of rules?
3. a neural network?
4. a polynomial function of predefined board features?
5. …
To keep the discussion simple, let us choose a simple representation: for any given board state b, the function ^V will be calculated as a linear combination of the following board features:
x1(b): number of black pieces on board b
x2(b): number of red pieces on board b
x3(b): number of black kings on board b
x4(b): number of red kings on board b
x5(b): number of red pieces threatened by black (i.e., which can be taken on black’s next turn)
x6(b): number of black pieces threatened by red
Thus the learning program will represent ^V(b) as a linear function of the form
^V(b) = w0 + w1·x1(b) + w2·x2(b) + w3·x3(b) + w4·x4(b) + w5·x5(b) + w6·x6(b)
where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm.
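The linear form above is easy to express in code. Below is a minimal Python sketch; extract_features is a hypothetical helper that returns the six features x1(b)…x6(b) for a board state b, and the weight layout [w0, w1, …, w6] is our own convention.

import numpy as np

# Minimal sketch of the linear evaluation function
# ^V(b) = w0 + w1*x1(b) + ... + w6*x6(b).
# extract_features(b) is a hypothetical helper returning the six board
# features described above as a length-6 array.
def v_hat(b, weights):
    """Approximate value of board state b under the current weights [w0..w6]."""
    x = np.concatenate(([1.0], extract_features(b)))  # prepend 1 so w0 acts as the bias
    return float(np.dot(weights, x))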
Specification of the Machine Learning Problem at this time — Till now we have worked on choosing the type of training experience, choosing the target function and its representation. The checkers learning task can be summarized as:
Task T: playing checkers
Performance measure P: percent of games won in the world tournament
Training experience E: games played against itself
Target function: V : Board → R
Target function representation: ^V(b) = w0 + w1·x1(b) + w2·x2(b) + w3·x3(b) + w4·x4(b) + w5·x5(b) + w6·x6(b)
The first three items above correspond to the specification of the learning task, whereas the final two items constitute design choices for the implementation of the learning program.
4. Function Approximation Algorithm
To train our learning program, we need a set of training data, each example describing a specific board state b and the training value V_train(b) for b. Each training example is an ordered pair <b, V_train(b)>. For instance, <(x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100>
is an example where black has won the game, since x2 = 0 means red has no remaining pieces. However, such clean values of V_train(b) can be obtained only for board states b that are a clear win, loss or draw.
Estimating training values: In the above case, assigning a training value V_train(b) for the specific boards b that are a clear win, loss or draw is straightforward, as they constitute direct training experience. But in the case of indirect training experience, assigning a training value V_train(b) for the intermediate boards is difficult. In such cases, the training values are updated using temporal difference learning. Let Successor(b) denote the next board state following b for which it is again the program’s turn to move, and let ^V be the learner’s current approximation to V. Using this information, the training value V_train(b) for any intermediate board state b is assigned as below:
V_train(b) ← ^V(Successor(b))
For example, V_train(b1) ← ^V(b3), where b3 is the successor of b1, i.e., the next state at which it is again the program’s turn to move. Once a game has been played, the training data is generated: for each training example, V_train(b) is computed using this rule.
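As a rough illustration, here is a Python sketch that assigns training values to the states of one self-played game using this rule. It reuses the hypothetical v_hat sketch from above; trace (the program's board states in order) and final_value (+100, -100 or 0) are assumed inputs.

# Sketch: build training examples <b, V_train(b)> from one self-played game
# using V_train(b) <- ^V(Successor(b)) for intermediate states.
def training_examples_from_game(trace, final_value, weights):
    examples = []
    for i, b in enumerate(trace):
        if i + 1 < len(trace):
            # Intermediate state: use the current estimate of its successor.
            v_train = v_hat(trace[i + 1], weights)
        else:
            # Final state of the game: its true value is known (win/loss/draw).
            v_train = final_value
        examples.append((b, v_train))
    return examples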
Adjusting the weights: All that remains is to choose a learning algorithm for adjusting the weights w0 through w6 to best fit the set of training examples. One common approach is to define the best hypothesis as the one that minimizes the squared error E between the training values and the values predicted by the hypothesis ^V:
E = Σ (V_train(b) − ^V(b))², summed over all training examples <b, V_train(b)>.
The learning algorithm should incrementally refine the weights as more training examples become available, and it should be robust to errors in these estimated training values. The Least Mean Squares (LMS) training rule is one such algorithm: it adjusts the weights a small amount in the direction that reduces the error on each observed training example. For each training example <b, V_train(b)>:
use the current weights to calculate ^V(b)
for each weight wi, update it as wi ← wi + η (V_train(b) − ^V(b)) xi(b)
where η is a small constant (e.g., 0.1) that moderates the size of the weight update.
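A minimal Python sketch of this LMS update follows, again reusing the hypothetical v_hat/extract_features helpers from above; the learning rate η = 0.1 is only an illustrative choice.

import numpy as np

ETA = 0.1  # learning rate; 0.1 is an illustrative choice

def lms_update(weights, b, v_train):
    """One LMS step: nudge each weight to reduce the error on example (b, v_train)."""
    x = np.concatenate(([1.0], extract_features(b)))  # [1, x1(b), ..., x6(b)]
    error = v_train - np.dot(weights, x)               # V_train(b) - ^V(b)
    return weights + ETA * error * x                   # wi <- wi + eta * error * xi(b)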
The Final Design
The final design of our checkers learning system can be described by four distinct program modules:
1. The Performance System — Takes a new board as input and outputs a trace of the game it played against itself.
2. The Critic — Takes the trace of a game as an input and outputs a set of training examples of the
target function.
3. The Generalizer — Takes training examples as input and outputs a hypothesis that estimates the target function.
4. The Experiment Generator — Takes the current hypothesis (currently learned function) as input
and outputs a new problem (an initial board state) for the performance system to explore.
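To show how these four modules could fit together, here is a schematic Python sketch. play_self_game (Performance System) and generate_new_problem (Experiment Generator) are hypothetical helpers; the Critic and Generalizer reuse the training_examples_from_game and lms_update sketches above, and the number of games is arbitrary.

import numpy as np

weights = np.zeros(7)  # initial hypothesis ^V: all weights zero

for _ in range(1000):                                    # arbitrary number of self-played games
    board = generate_new_problem(weights)                # Experiment Generator: pick a new initial board
    trace, final_value = play_self_game(board, weights)  # Performance System: play a game, return its trace
    examples = training_examples_from_game(trace, final_value, weights)  # Critic: produce <b, V_train(b)>
    for b, v_train in examples:                          # Generalizer: LMS updates yield the new hypothesis
        weights = lms_update(weights, b, v_train)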
Topic 3: Perspective & Issues in Machine Learning
1.3.1 Perspective:
One useful perspective on machine learning is that it involves searching a very large space of possible hypotheses to determine the one that best fits the observed data and any prior knowledge held by the learner.
1.3.2 Issues:
The field of machine learning is concerned with answering questions such as the following:
What algorithms exist for learning generic target functions from specific training
examples? In what settings will particular algorithms converge to the desired function,
given sufficient training data? Which algorithms perform best for which types of
problems and representations?
How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner’s hypothesis space?
When and how can prior knowledge held by the learner guide the process of
generalizing from examples? Can prior knowledge be helpful even when it is only
approximately correct?
What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
What specific function should the system attempt to learn? Can this process itself be
automated?
How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?
Topic 4: Types of Machine Learning Algorithms
Machine learning algorithms are divided into categories according to their purpose, and the main categories are the following:
Supervised learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
Supervised Learning
In supervised learning, the algorithm learns from a training dataset in which every observation is labeled with the correct output. The model learns a mapping from inputs to outputs, is corrected whenever its predictions are wrong, and training continues until it reaches an acceptable level of performance. A small worked example is sketched after the list of algorithms below.
List of Common Algorithms
Nearest Neighbor
Naive Bayes
Decision Trees
Linear Regression
Support Vector Machines (SVM)
Neural Networks
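As a minimal supervised-learning illustration tied to the earlier spam example, here is a sketch using scikit-learn's decision tree classifier; the feature choice, the toy data and the library are our own assumptions, not prescribed by the text.

# Minimal supervised-learning sketch: a decision tree trained on a tiny
# labeled toy dataset. The data is made up purely for illustration.
from sklearn.tree import DecisionTreeClassifier

# Each row: [number of suspicious words, number of links]; label: 1 = spam, 0 = not spam.
X = [[5, 3], [4, 4], [0, 1], [1, 0], [6, 2], [0, 0]]
y = [1, 1, 0, 0, 1, 0]

model = DecisionTreeClassifier().fit(X, y)   # learn the input-to-label mapping
print(model.predict([[3, 2]]))               # predicted label for a new, unseen mail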
Unsupervised Learning
In unsupervised learning, the training data carries no labels; the algorithm is left to discover structure in the data on its own, for example by grouping similar observations into clusters or by reducing the dimensionality of the data. Common algorithms include k-means clustering, hierarchical clustering, and principal component analysis (PCA).
Semi-supervised Learning
In the previous two types, either labels are present for all the observations in the dataset or they are absent for all of them. Semi-supervised learning falls in between these two. In many practical situations, the cost of labeling is quite high, since it requires skilled human experts. So, when labels are absent for the majority of observations but present for a few, semi-supervised algorithms are the best candidates for
the model building. These methods exploit the idea that even though the group
memberships of the unlabeled data are unknown, this data carries important
information about the group parameters.
Here’s how it works (a small code sketch follows these steps):
Train the model with the small amount of labeled training data just like
you would in supervised learning, until it gives you good results.
Then use it with the unlabeled training dataset to predict the outputs
Link the data inputs in the labeled training data with the inputs in the
unlabeled data.
Then, train the model the same way as you did with the labeled set in the
beginning in order to decrease the error and improve the model’s
accuracy.
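As a rough illustration of the four steps above, here is a self-training sketch using scikit-learn; the logistic-regression classifier, the 0.9 confidence threshold and the helper name self_train are our own assumptions.

# Self-training sketch: fit on the few labeled points, pseudo-label the
# confident unlabeled points, then retrain on the enlarged labeled set.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9):
    model = LogisticRegression().fit(X_labeled, y_labeled)   # step 1: train on labeled data

    proba = model.predict_proba(X_unlabeled)                 # step 2: predict the unlabeled data
    confident = proba.max(axis=1) >= threshold               # keep only confident predictions
    pseudo = model.classes_[proba.argmax(axis=1)[confident]]

    # step 3: link the confidently pseudo-labeled inputs with the labeled ones
    X_new = np.vstack([X_labeled, np.asarray(X_unlabeled)[confident]])
    y_new = np.concatenate([y_labeled, pseudo])

    # step 4: retrain on the combined set to reduce error and improve accuracy
    return LogisticRegression().fit(X_new, y_new)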
Reinforcement Learning
This method aims at using observations gathered from the interaction with the
environment to take actions that would maximize the reward or minimize the risk.
The reinforcement learning algorithm (called the agent) continuously learns from the
environment in an iterative fashion. In the process, the agent learns from its
experiences of the environment until it explores the full range of possible states.
Reinforcement Learning is a type of Machine Learning, and thereby also a branch
of Artificial Intelligence. It allows machines and software agents to automatically
determine the ideal behavior within a specific context, in order to maximize their
performance. Simple reward feedback is required for the agent to learn its behavior;
this is known as the reinforcement signal.
Reinforcement learning goes through the following steps:
1. Input state is observed by the agent.
2. Decision making function is used to make the agent perform an action.
3. After the action is performed, the agent receives reward or reinforcement from the
environment.
4. The state-action pair information about the reward is stored.
List of Common Algorithms
Q-Learning
Temporal Difference (TD)
Deep Adversarial Networks
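Since Q-Learning appears in the list above, here is a minimal tabular Q-learning sketch that walks through the four steps described earlier on a toy five-state corridor; the environment, the +1 goal reward and the hyperparameters (alpha, gamma, epsilon) are illustrative assumptions.

# Minimal tabular Q-learning sketch on a toy 1-D corridor of 5 states,
# where the right-most state is the goal. Actions: 0 = move left, 1 = move right.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))      # stored state-action values (step 4)
alpha, gamma, epsilon = 0.1, 0.9, 0.2    # learning rate, discount, exploration rate

for episode in range(500):
    s = 0                                             # start at the left end
    while s != n_states - 1:                          # until the goal state is reached
        # Steps 1-2: observe the input state and choose an action (epsilon-greedy).
        a = np.random.randint(n_actions) if np.random.rand() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        # Step 3: receive the reward (reinforcement) from the environment.
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Step 4: store/update the state-action value using the observed reward.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)  # learned state-action values; the greedy policy moves right towards the goal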
Use cases:
Some applications of reinforcement learning algorithms are computer-played board games (Chess, Go), robotic hands, and self-driving cars.