Machine Learning: Cognate/Elective 2


MACHINE LEARNING

COGNATE/ELECTIVE 2
MACHINE LEARNING

• Machine learning is programming computers to optimize a performance criterion using example data or past experience. We have a model defined up to some parameters, and learning is the execution of a computer program to optimize the parameters of the model using the training data or past experience.
• The model may be predictive to make predictions in the future, or descriptive
to gain knowledge from data, or both.
• Arthur Samuel, an early American leader in the field of computer gaming and
artificial intelligence, coined the term “Machine Learning” in 1959 while at
IBM. He defined machine learning as “the field of study that gives computers
the ability to learn without being explicitly programmed.”
MACHINE LEARNING

• Machine learning is a set of tools that, broadly speaking, allow us to “teach” computers how to perform tasks by providing examples of how they should be done.
• For example, suppose we wish to write a program to distinguish between valid email
messages and unwanted spam. We could try to write a set of simple rules, for example,
flagging messages that contain certain features (such as the word “viagra” or
obviously-fake headers). However, writing rules to accurately distinguish which text is
valid can actually be quite difficult to do well, resulting either in many missed spam
messages, or, worse, many lost emails.
• Worse, the spammers will actively adjust the way they send spam in order to trick
these strategies (e.g., writing “vi@gr@”). Writing effective rules — and keeping them
up-to-date — quickly becomes an insurmountable task.
• Fortunately, machine learning has provided a solution.
• Modern spam filters are “learned” from examples: we provide the learning
algorithm with example emails which we have manually labeled as “ham”
(valid email) or “spam” (unwanted email), and the algorithms learn to
distinguish between them automatically.
• Definition A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its performance at tasks
T, as measured by P, improves with experience E.
• Examples:
• A. Handwriting recognition learning problem
• • Task T: Recognising and classifying handwritten words within images
• • Performance P: Percent of words correctly classified
• • Training experience E: A dataset of handwritten words with given classifications
• B. A robot driving learning problem
• • Task T: Driving on highways using vision sensors
• • Performance measure P: Average distance traveled before an error
• • Training experience E: A sequence of images and steering commands recorded while observing a human driver
• C. A chess learning problem
• • Task T: Playing chess
• • Performance measure P: Percent of games won against opponents
• • Training experience E: Playing practice games against itself
• DEFINITION
• A computer program which learns from experience is called a machine
learning program or simply a learning program. Such a program is sometimes
also referred to as a learner.
COMPONENTS OF LEARNING

• Basic components of learning process


• The learning process, whether by a human or a machine, can be divided into
four components, namely, data storage, abstraction, generalization and
evaluation.
• Figure 1.1 illustrates the various components and the steps involved in the
learning process.
COMPONENTS OF LEARNING

• 1. DATA STORAGE
• Facilities for storing and retrieving huge amounts of data are an important component of the learning
process. Humans and computers alike utilize data storage as a foundation for advanced reasoning.
• In a human being, the data is stored in the brain and data is retrieved using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar devices to store data and
use cables and other technology to retrieve data.
• 2. ABSTRACTION
• The second component of the learning process is known as abstraction.
• Abstraction is the process of extracting knowledge about stored data. This involves creating general
concepts about the data as a whole. The creation of knowledge involves application of known models and
creation of new models. The process of fitting a model to a dataset is known as training. When the model
has been trained, the data is transformed into an abstract form that summarizes the original information.
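• As a toy sketch of abstraction through training (the numbers and labels below are invented), the raw observations are reduced to one learned parameter per class, an abstract form that summarizes the original information:

# Training as abstraction: fit a (very simple) model to a dataset.
# Hypothetical feature values for two classes of email.
measurements = {"spam": [8.0, 9.5, 7.5], "ham": [2.0, 1.5, 3.0]}

# Training: the model here is just one parameter (a mean) per class.
model = {label: sum(xs) / len(xs) for label, xs in measurements.items()}

print(model)  # {'spam': 8.33..., 'ham': 2.16...} -> the abstract summary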
COMPONENTS OF LEARNING

• 3. GENERALIZATION
• The third component of the learning process is known as generalization. The term
generalization describes the process of turning the knowledge about stored data into a
form that can be utilized for future action. These actions are to be carried out on tasks that
are similar, but not identical, to those that have been seen before. In generalization, the
goal is to discover those properties of the data that will be most relevant to future tasks.
• 4. EVALUATION
• Evaluation is the last component of the learning process. It is the process of giving
feedback to the user to measure the utility of the learned knowledge. This feedback is then
utilised to effect improvements in the whole learning process.
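• A minimal sketch of evaluation (the held-out labels and model predictions below are invented): the utility of the learned knowledge is measured, here as accuracy on examples not used for training, and the score is the feedback used to improve the learning process.

# Evaluation: compare the model's predictions against held-out labels.
def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

held_out_labels = ["spam", "ham", "ham", "spam"]
model_predictions = ["spam", "ham", "spam", "spam"]
print(accuracy(held_out_labels, model_predictions))  # 0.75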
APPLICATIONS OF MACHINE
LEARNING

• Application of machine learning methods to large databases is called data mining.
• In data mining, a large volume of data is processed to construct a simple model with valuable use, for example, one with high predictive accuracy.
• The following is a list of some of the typical applications of machine learning.
• 1. In retail business, machine learning is used to study consumer behavior.
• 2. In finance, banks analyze their past data to build models to use in credit applications, fraud detection, and
the stock market.
• 3. In manufacturing, learning models are used for optimization, control, and troubleshooting.
• 4. In medicine, learning programs are used for medical diagnosis.
• 5. In telecommunications, call patterns are analyzed for network optimization and maximizing the quality
of service.
• 6. In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast enough by computers. The World Wide Web is huge and constantly growing, and searching it for relevant information cannot be done manually.
• 7. In artificial intelligence, it is used to teach a system to learn and adapt to changes so that
the system designer need not foresee and provide solutions for all possible situations.
• 8. It is used to find solutions to many problems in vision, speech recognition, and robotics.
• 9. Machine learning methods are applied in the design of computer-controlled vehicles to
steer correctly when driving on a variety of roads.
• 10. Machine learning methods have been used to develop programs for playing games
such as chess, backgammon and Go.
MULTIPLE WAYS TO DEFINE MACHINE
LEARNING

• Machine learning is a diverse and exciting field, and there are multiple ways of
defining it:
• 1. THE ARTIFICIAL INTELLIGENCE VIEW.
• Learning is central to human knowledge and intelligence, and, likewise, it is also essential for building intelligent machines. Years of effort in AI have shown that trying to build intelligent computers by programming all the rules cannot be done; automatic learning is crucial.
• For example, we humans are not born with the ability to understand language — we learn it — and it makes sense to try to have computers learn language instead of trying to program it all in.
• 2. THE SOFTWARE ENGINEERING VIEW.
• Machine learning allows us to program computers by example, which can be
easier than writing code the traditional way.
• 3. THE STATS VIEW.
• Machine learning is the marriage of computer science and statistics:
computational techniques are applied to statistical problems. Machine learning
has been applied to a vast number of problems in many contexts, beyond the
typical statistics problems. Machine learning is often designed with different
considerations than statistics (e.g., speed is often more important than accuracy).
TWO PHASES

• Often, machine learning methods are broken into two phases:


• 1. TRAINING:
• A model is learned from a collection of training data.
• 2. APPLICATION:
• The model is used to make decisions about some new test data.
• For example, in the spam filtering case, the training data constitutes email messages labeled as ham or spam, and each new email message that we receive (and which we wish to classify) is test data. However, there are other ways in which machine learning is used as well.
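• A sketch of the two phases for the spam example (scikit-learn is assumed to be installed; the tiny corpus and its labels are invented for illustration):

# Phase 1 (TRAINING) and Phase 2 (APPLICATION) for a toy spam filter.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["cheap viagra now", "meeting at noon",
               "win money fast", "lunch tomorrow?"]
train_labels = ["spam", "ham", "spam", "ham"]

# TRAINING: a model is learned from the labeled training data.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
model = MultinomialNB().fit(X_train, train_labels)

# APPLICATION: the model classifies new, unseen test data.
X_test = vectorizer.transform(["win a cheap prize now"])
print(model.predict(X_test))  # expected: ['spam']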
TYPES OF MACHINE LEARNING

• Some of the main types of machine learning are:


• 1. Supervised Learning, in which the training data is labeled with the correct
answers, e.g., “spam” or “ham.” The two most common types of supervised
learning are classification (where the outputs are discrete labels, as in spam
filtering) and regression (where the outputs are real-valued).
• 2. Unsupervised learning, in which we are given a collection of unlabeled data,
which we wish to analyze and discover patterns within. The two most important
examples are dimension reduction and clustering.
• 3. Reinforcement learning, in which an agent (e.g., a robot or controller) seeks to learn the optimal actions to take based on the outcomes of past actions.
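• To make the contrast concrete, the sketch below (scikit-learn assumed; the points and labels are invented) uses the same data twice: with labels for supervised classification, and without labels for unsupervised clustering.

# Supervised vs. unsupervised learning on the same toy points.
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

points = [[1, 1], [1, 2], [8, 8], [9, 8]]

# Supervised: correct answers (labels) accompany the training data.
labels = ["a", "a", "b", "b"]
clf = DecisionTreeClassifier().fit(points, labels)
print(clf.predict([[2, 1]]))  # ['a']

# Unsupervised: no labels; the algorithm discovers the grouping itself.
km = KMeans(n_clusters=2, n_init=10).fit(points)
print(km.labels_)  # two discovered groups, e.g. [0 0 1 1]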
OTHER TYPES OF MACHINE LEARNING

• There are many other types of machine learning as well, for example:
• 1. Semi-supervised learning, in which only a subset of the training data is
labeled
• 2. Time-series forecasting, such as in financial markets
• 3. Anomaly detection such as used for fault-detection in factories and in
surveillance
• 4. Active learning, in which obtaining data is expensive, and so an algorithm
must determine which training data to acquire and many others.
LEARNING MODELS

• Machine learning is concerned with using the right features to build the right
models that achieve the right tasks.
• Learning models can be divided into three categories, based on how they treat the instance space.
• For a given problem, the collection of all possible outcomes represents the sample space or instance space.
• Using a logical expression (Logical models)
• Using the geometry of the instance space (Geometric models)
• Using probability to classify the instance space (Probabilistic models)
• A further distinction is between grouping and grading models.
LEARNING MODELS

• LOGICAL MODELS use a logical expression to divide the instance space into segments and hence construct grouping models.
• A logical expression is an expression that returns a Boolean value, i.e., a True
or False outcome. Once the data is grouped using a logical expression, the data
is divided into homogeneous groupings for the problem we are trying to solve.
For example, for a classification problem, all the instances in the group belong
to one class.
LEARNING MODELS

• There are mainly two kinds of logical models: Tree models and Rule models.
• Rule models consist of a collection of implications or IF-THEN rules. For
tree-based models, the ‘if-part’ defines a segment and the ‘then-part’ defines
the behaviour of the model for this segment. Rule models follow the same
reasoning.
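• A toy sketch of a rule model as IF-THEN logic (the attributes and rules are invented): each if-part is a Boolean test that selects a segment of the instance space, and the then-part gives the model's behaviour for that segment.

# A toy rule model: IF-THEN rules segment the instance space.
def classify_email(email):
    if "viagra" in email["body"]:   # if-part: defines a segment
        return "spam"               # then-part: behaviour for the segment
    if email["sender_known"]:
        return "ham"
    return "spam"                   # default segment

print(classify_email({"body": "meeting at noon", "sender_known": True}))  # ham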
LOGICAL MODELS AND CONCEPT
LEARNING

• To understand logical models further, we need to understand the idea of Concept Learning.
Concept Learning involves learning logical expressions or concepts from examples.
• The idea of Concept Learning fits in well with the idea of Machine learning, i.e., inferring a
general function from specific training examples.
• Concept learning forms the basis of both tree-based and rule-based models. More formally,
Concept Learning involves acquiring the definition of a general category from a given set of
positive and negative training examples of the category.
• A Formal Definition for Concept Learning is “The inferring of a Boolean-valued function
from training examples of its input and output.” In concept learning, we only learn a
description for the positive class and label everything that doesn’t satisfy that description as
negative.
LOGICAL LEARNING MODELS

• The following example explains this idea in more detail. The training data below is the classic “Enjoy Sport” dataset (adapted from Mitchell, 1997):

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  Enjoy Sport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cold   Change    Yes

LOGICAL LEARNING MODELS

• A Concept Learning Task called “Enjoy Sport” as shown above is defined by a set of data
from some example days.
• Each example is described by six attributes. The task is to learn to predict the value of Enjoy Sport for an arbitrary day based on the values of its attributes. The problem can be represented by a series of hypotheses.
• Each hypothesis is described by a conjunction of constraints on the attributes. The training
data represents a set of positive and negative examples of the target function.
• In the example, each hypothesis is a vector of six constraints, specifying the values of the six
attributes – Sky, AirTemp, Humidity, Wind, Water, and Forecast.
• The training phase involves learning the set of days (as a conjunction of attributes) for which
Enjoy Sport = yes.
• Thus, the problem can be formulated as:
• Given instances X which represent a set of all possible days, each described by the attributes:
• Sky – (values: Sunny, Cloudy, Rainy),
• AirTemp – (values: Warm, Cold),
• Humidity – (values: Normal, High),
• Wind – (values: Strong, Weak),
• Water – (values: Warm, Cold),
• Forecast – (values: Same, Change).
• Try to identify a function that can predict the target variable Enjoy Sport as yes/no, i.e., 1 or 0.
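• A minimal sketch of this hypothesis representation (following the usual convention that '?' matches any attribute value; the example hypothesis below is invented):

# A hypothesis is a conjunction of constraints on the six attributes;
# '?' means any value is acceptable for that attribute.
def covers(hypothesis, instance):
    return all(h == "?" or h == x for h, x in zip(hypothesis, instance))

#    Sky      AirTemp  Humidity  Wind      Water  Forecast
h = ["Sunny", "Warm",  "?",      "Strong", "?",   "?"]
day = ["Sunny", "Warm", "High", "Strong", "Warm", "Same"]
print(covers(h, day))  # True -> predict Enjoy Sport = yes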
• GEOMETRIC MODELS
• In the previous section, we have seen that with logical models, such as decision
trees, a logical expression is used to partition the instance space. Two instances
are similar when they end up in the same logical segment. In this section, we
consider models that define similarity by considering the geometry of the
instance space.
• In Geometric models, features could be described as points in two dimensions
(x- and y-axis) or a three-dimensional space (x, y, and z). Even when features
are not intrinsically geometric, they could be modelled in a geometric manner
(for example, temperature as a function of time can be modelled in two axes).
GEOMETRIC LEARNING MODELS

• In geometric models, there are two ways we could impose similarity.


• We could use geometric concepts like lines or planes to segment (classify) the
instance space. These are called Linear models.
• Alternatively, we can use the geometric notion of distance to represent
similarity. In this case, if two points are close together, they have similar values
for features and thus can be classed as similar. We call such models Distance-based models.
• Linear models
• Linear models are relatively simple. In this case, the function is represented as a linear combination of its inputs. Thus, if x1 and x2 are two scalars or vectors of the same dimension and a and b are arbitrary scalars, then ax1 + bx2 represents a linear combination of x1 and x2. In the simplest case where f(x) represents a straight line, we have an equation of the form f(x) = mx + c, where c represents the intercept and m represents the slope.
• Linear models are parametric, which means that they have a fixed form with a small
number of numeric parameters that need to be learned from data. For example, in f(x) =
mx + c, m and c are the parameters that we are trying to learn from the data. This
technique is different from tree or rule models, where the structure of the model (e.g.,
which features to use in the tree, and where) is not fixed in advance.
• Linear models are stable, i.e., small variations in the training data have only a limited
impact on the learned model. In contrast, tree models tend to vary more with the training
data, as the choice of a different split at the root of the tree typically means that the rest of
the tree is different as well. As a result of having relatively few parameters, Linear models
have low variance and high bias. This implies that Linear models are less likely to overfit
the training data than some other models. However, they are more likely to underfit.
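• A minimal sketch of learning the two parameters m and c from data (NumPy assumed; the points are invented):

# Least-squares fit of f(x) = m*x + c to noisy points.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

m, c = np.polyfit(x, y, deg=1)  # degree-1 polynomial = straight line
print(m, c)         # slope close to 2, intercept close to 1
print(m * 4.0 + c)  # prediction for a new input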
• Distance-based models
• Distance-based models are the second class of Geometric models. Like Linear
models, distance-based models are based on the geometry of data. As the name
implies, distance-based models work on the concept of distance. In the context
of Machine learning, the concept of distance is not based merely on the physical distance between two points. Instead, we could think of the distance between two points in terms of the mode of travel between them (for example, travel time rather than straight-line distance).
• Thus, depending on the entity and the mode of travel, the concept of distance can be experienced differently. The distance metrics commonly used are Euclidean, Minkowski, Manhattan, and Mahalanobis.
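• The first three metrics are closely related, as the sketch below shows (Minkowski with p = 1 gives Manhattan and p = 2 gives Euclidean distance; Mahalanobis also needs the data's covariance, so it is omitted here):

# Minkowski distance; p = 1 is Manhattan, p = 2 is Euclidean.
def minkowski(a, b, p):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))  # 7.0 (Manhattan)
print(minkowski(a, b, 2))  # 5.0 (Euclidean)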
• Distance is applied through the concept of neighbors and exemplars. Neighbors are points in
proximity with respect to the distance measure expressed through exemplars. Exemplars are
either centroids that find a center of mass according to a chosen distance metric or medoids
that find the most centrally located data point. The most commonly used centroid is the
arithmetic mean, which minimizes squared Euclidean distance to all other points.
• Notes:
• The centroid represents the geometric center of a plane figure, i.e., the arithmetic mean position of all the points in the figure. This definition extends to any object in n-dimensional space: its centroid is the mean position of all the points.
• Medoids are similar in concept to means or centroids. Medoids are most
commonly used on data when a mean or centroid cannot be defined. They are
used in contexts where the centroid is not representative of the dataset, such as
in image data.
• Examples of distance-based models include the nearest-neighbor models,
which use the training data as exemplars – for example, in classification. The
K-means clustering algorithm also uses exemplars to create clusters of similar
data points.
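• A minimal from-scratch sketch of a nearest-neighbor model (the exemplar points and labels are invented): the training data itself serves as the exemplars, and a new point takes the label of its closest neighbor.

# 1-nearest-neighbor: classify a point by its closest training exemplar.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

train = [((1, 1), "a"), ((1, 2), "a"), ((8, 8), "b")]

def predict(point):
    nearest = min(train, key=lambda ex: euclidean(ex[0], point))
    return nearest[1]

print(predict((2, 2)))  # 'a'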
PROBABILISTIC LEARNING MODELS

• PROBABILISTIC MODELS
• The third family of machine learning algorithms is the probabilistic models.
We have seen before that the k-nearest neighbor algorithm uses the idea of
distance (e.g., Euclidean distance) to classify entities, and logical models use a
logical expression to partition the instance space. In this section, we see how
the probabilistic models use the idea of probability to classify new entities.
• Probabilistic models see features and target variables as random variables. The
process of modelling represents and manipulates the level of uncertainty with
respect to these variables. There are two types of probabilistic models:
Predictive and Generative.
• Predictive probability models use the idea of a conditional probability distribution P(Y | X) from which Y can be predicted from X.
• Generative models estimate the joint distribution P(Y, X).
• Naïve Bayes is an example of a probabilistic classifier.
• The Naïve Bayes algorithm is based on the idea of Conditional Probability.
Conditional probability is based on finding the probability that something will
happen, given that something else has already happened. The task of the
algorithm then is to look at the evidence and to determine the likelihood of a
specific class and assign a label accordingly to each entity.
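• A worked sketch of the conditional-probability idea (all numbers invented): suppose 40% of email is spam, and the word “offer” appears in 30% of spam but only 5% of ham; Bayes' theorem then gives the probability that a message containing “offer” is spam.

# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam = 0.40
p_word_given_spam = 0.30
p_word_given_ham = 0.05

p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)  # 0.8 -> label the message "spam"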
DESIGN OF A LEARNING SYSTEM

• So far we have looked into the learning process and the goal of learning. When we want to design a learning system that follows the learning process, we need to consider a few design choices.
• The design choices will be to decide the following key components:
• 1. Type of training experience
• 2. Choosing the Target Function
• 3. Choosing a representation for the Target Function
• 4. Choosing an approximation algorithm for the Target Function
• 5. The final Design
TYPE OF TRAINING EXPERIENCE

• During the design of the checkers learning system, the type of training
experience available for a learning system will have a significant effect on the
success or failure of the learning.
• Direct or Indirect training experience — In the case of direct training experience, individual board states and the correct move for each board state are given. In the case of indirect training experience, the move sequences for a game
and the final result (win, loss or draw) are given for a number of games. How
to assign credit or blame to individual moves is the credit assignment problem.
TYPE OF TRAINING EXPERIENCE

• Teacher or Not —
• Supervised — The training experience will be labeled, which means all the board states will be labeled with the correct move. So the learning takes place in the presence of a supervisor or a teacher.
• Unsupervised — The training experience will be unlabeled, which means the board states will not be labeled with moves. So the learner generates random games and plays against itself with no supervision or teacher involvement.
• Semi-supervised — Learner generates game states and asks the teacher for
help in finding the correct move if the board state is confusing.
• Is the training experience good — Do the training examples represent the
distribution of examples over which the final system performance will be
measured? Performance is best when training examples and test examples are
from the same/a similar distribution.
PERSPECTIVES AND ISSUES IN
MACHINE LEARNING

• Perspectives in Machine Learning


• One useful perspective on machine learning is that it involves searching a very
large space of possible hypotheses to determine one that best fits the observed
data and any prior knowledge held by the learner.
ISSUES IN MACHINE LEARNING

• Our checkers example raises a number of generic questions about machine learning. The field of machine learning is concerned with answering questions such as the following:
• What algorithms exist for learning general target functions from specific training examples? In what settings will particular algorithms converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?
• How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?
• When and how can prior knowledge held by the learner guide the process of generalizing from examples? Can prior knowledge be helpful even when it is only approximately correct?
ISSUES IN MACHINE LEARNING

• What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
• What is the best way to reduce the learning task to one or more function approximation problems? Put another way, what specific functions should the system attempt to learn? Can this process itself be automated?
• How can the learner automatically alter its representation to improve its ability to represent and learn the target function?
VERSION SPACES

• Definition (Version space).


• A concept is complete if it covers all positive examples.
• A concept is consistent if it covers none of the negative examples. The version
space is the set of all complete and consistent concepts. This set is convex and
is fully defined by its least and most general elements.

• The key idea in the CANDIDATE-ELIMINATION algorithm is to output a description of the set of all hypotheses consistent with the training examples.
• Representation
• The CANDIDATE-ELIMINATION algorithm finds all describable hypotheses that
are consistent with the observed training examples. In order to define this
algorithm precisely, we begin with a few basic definitions. First, let us say that
a hypothesis is consistent with the training examples if it correctly classifies
these examples.
THE LIST-THEN-ELIMINATION
ALGORITHM

• The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all hypotheses in H and then eliminates any hypothesis found inconsistent with any training example.
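• A minimal sketch over a deliberately tiny hypothesis space (two attributes, conjunctive hypotheses where '?' matches anything; the values are invented). Enumerating H like this is impractical for realistic spaces, which is what motivates the CANDIDATE-ELIMINATION algorithm below.

# LIST-THEN-ELIMINATE on a tiny hypothesis space.
from itertools import product

VALUES = [["Sunny", "Rainy", "?"], ["Warm", "Cold", "?"]]
version_space = list(product(*VALUES))  # all 9 hypotheses in H

def covers(h, x):
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

# (instance, label) pairs; True marks a positive example.
examples = [(("Sunny", "Warm"), True), (("Rainy", "Cold"), False)]

# Eliminate every hypothesis inconsistent with some training example.
for x, positive in examples:
    version_space = [h for h in version_space if covers(h, x) == positive]

print(version_space)  # the hypotheses consistent with all examples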
CANDIDATE-ELIMINATION LEARNING
ALGORITHM

• The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.
