Basics of Machine Learning
Machine Intelligence
Prof. Srikanta Patnaik
Department of Computer Science and Engineering
SOA University, Bhubaneswar, Odisha, India
Email: [email protected]
Four Paradigms of Machine Intelligence

Systems that think like humans  |  Systems that act like humans
Systems that think rationally   |  Systems that act rationally
In this table, the first column is concerned with thought processes, whereas the second column
is concerned with behaviour. The first row concerns human performance, whereas the
second row deals with an ideal concept of intelligence, which is called rationality.
Systems that act like humans (Turing Test Approach)
“The art of creating machines that perform functions which require intelligence when performed by people”
“The study of how to make computers do things at which, at the moment, people are better”
The Turing Test, proposed by Alan Turing, was designed to provide an operational definition of intelligence.
Turing defined intelligent behaviour as the ability to achieve human-level performance in all cognitive tasks,
sufficient to fool an interrogator. Informally speaking, the test he proposed is that the computer should be
interrogated by a human via a teletype, and passes the test if the interrogator cannot tell if there is a computer or
a human at the other end.
However, the total Turing Test includes a video signal so that the interrogator can test the subject's perceptual
abilities, as well as the opportunity for the interrogator to pass physical objects "through the hatch." To pass the
total Turing Test, the computer will need
computer vision to perceive objects, and
robotics to move them about.
Systems that think like humans (Cognitive Modeling Approach)
“The exciting new effort to make computers think ... machines with minds, in the full and literal
sense”
“The automation of activities that we associate with human thinking, activities such as decision-making, problem solving, learning ...”
If we are going to say that a given program thinks like a human, we must have some way of
determining how humans think. We need to know the actual workings of human minds.
There are two ways to do this: through introspection--trying to catch our own thoughts as they
go by--or through psychological experiments. Once we have a sufficiently precise theory of the
mind, it becomes possible to express the theory as a computer program. If the program's
input/output and timing behaviour matches human behaviour, it is evidence that some of the
program's mechanisms may also be operating in humans.
The interdisciplinary field of cognitive science brings together computer models from machine
intelligence and experimental techniques from psychology to try to construct precise and testable
theories of the workings of the human mind.
Systems that think rationally (Laws of Thought Approach)
“The study of the computations that make it possible to perceive, reason, and act”
The Greek philosopher Aristotle was one of the first to attempt to codify "right thinking," that
is, indisputable reasoning processes. His famous syllogisms provided patterns for argument
structures that always give correct conclusions given correct premises.
Example: "Socrates is a man; all men are mortal; therefore Socrates is mortal." These laws of
thought were supposed to govern the operation of the mind, and they initiated the field of logic.
There are two main obstacles to this approach. First, it is not easy to take informal knowledge
and state it in the formal terms required by logical notation, particularly when the knowledge
is not certain. Second, there is a big difference between being able to solve a problem in
principle and solving it in practice. Even problems with just a few dozen facts can exhaust the
computational resources of any computer unless it has some guidance as to which reasoning
steps to apply first.
Systems that act rationally (Rational Agent Approach)
“A field of study that seeks to explain and emulate intelligent behaviour in terms of computational
processes”
“The branch of computer science that is concerned with the automation of intelligent behaviour”
Acting rationally means acting so as to achieve one's goals, given one's beliefs. An agent is just
something that perceives and acts. In this approach, machine intelligence is viewed as the study
and construction of rational agents.
In the “laws of thought” approach, the whole emphasis was on correct inferences. Making correct
inferences is sometimes part of being a “rational agent”, because one way to act rationally is to
reason logically to the conclusion that a given action will achieve one's goals, and then to act on
that conclusion. There are also ways of acting rationally that cannot be reasonably said to involve
inference. For example, pulling one's hand away from a hot stove is a reflex action that is more
successful than a slower action taken after careful deliberation.
All the "cognitive skills" needed for the Turing Test are there to allow rational actions. Thus, we
need the ability to represent knowledge and reason with it because this enables us to reach good
decisions in a wide variety of situations.
[Figure] Three cycles, namely Acquisition, Perception, and Learning & Coordination,
with their states in the model of cognition (spanning Computer Vision, Soft Computing, and AI tools).
Various States of Cognition
Sensing and Acquisition: Sensing in engineering science refers to the reception and
transformation of signals into a measurable form, which takes a wider perspective in cognitive
science. It includes pre-processing and extraction of features from the sensed data, along with
stored knowledge from LTM. For example, visual information on reception is filtered of
undesirable noise, and elementary features like size, shape, and color are extracted and stored in
STM.
Reasoning: Generally, this state constructs high-level knowledge from acquired information of a
relatively lower level and organizes it in structural form for efficient access. The process of
reasoning analyses the semantic or meaningful behavior of the low-level knowledge and its
associations. It can be modeled by a number of techniques, such as commonsense reasoning,
causal reasoning, non-monotonic reasoning, default reasoning, fuzzy reasoning, spatial and
temporal reasoning, and meta-level reasoning.
Learning: Generally speaking, learning is a process that takes sensory stimuli from the outside
world in the form of examples and classifies them without being given any explicit rules. For
instance, a small child cannot distinguish between a cat and a dog, but as he grows he learns to do
so, based on numerous examples of each animal. Learning often involves a teacher, who helps to
classify things by correcting the learner's mistakes. In machine learning, a program
takes the place of the teacher and discovers the learner's mistakes. Numerous methods and
techniques of learning have been developed, classified as supervised, unsupervised, and
reinforcement learning.
Planning: The state of planning determines the steps of action involved in deriving
the required goal state from the known initial states of the problem. The main task is to identify the
appropriate piece of knowledge derived from LTM at a given instance of time. Planning then
executes this task by matching the problem states with its perceptual model.
Various States of Cognition (Contd.)
Action and Coordination: This state determines the control commands for the various actuators
to execute the schedule of the action-plan of a given problem, which is carried out through a
process of supervised learning. The state also coordinates between the various desired actions and
the input stimuli.
Cognitive Memory:
Sensory information is stored in the human brain in closely linked neuron cells. Information in
some cells can be preserved only for a short duration, which is referred to as Short Term
Memory (STM). Further, there are cells in the human brain that can hold information for quite a
long time, called Long Term Memory (LTM). STM and LTM also come in two
basic varieties, namely iconic memory and echoic memory. Iconic memory stores
visual information, whereas echoic memory deals with audio information. These two types
of memory together are generally called sensory memory. Tulving alternatively classified
human memory into three classes, namely episodic, semantic, and procedural memory. Episodic
memory saves facts as they happen; semantic memory constructs knowledge in
structural form, whereas procedural memory helps in taking decisions for actions.
Cycles of Cognition
Acquisition Cycle: The task of the Acquisition Cycle is to store the information temporarily
in STM after sensing the information through various sensory organs. Then it compares the
response of the STM with already acquired and permanently stored information in LTM. The
process of representation of the information for storage and retrieval from LTM is a critical
job, which is known as knowledge representation. It is not yet known how human beings
store, retrieve and use the information from LTM.
Perception Cycle: This is a cycle, or process, that uses the knowledge previously stored in
LTM to gather and interpret the stimuli registered by the sensory organs through the Acquisition
Cycle. The three relevant states of Perception are reasoning, attention, and recognition,
generally carried out by a process of unsupervised learning. We can say the learning here is
unsupervised, since such refinement of knowledge is an autonomous process and
requires no trainer for its adaptation; therefore, this cycle does not have 'Learning' as an
exclusive state. It is used mainly for feature extraction, image matching, and robot world
modeling.
Learning & Coordination Cycle: Once the environment is perceived and stored in LTM in a
suitable format (data structure), the autonomous system utilizes the states of
Learning, Planning, and Action & Coordination. These three states taken together are called
the Learning & Coordination Cycle, which an agent uses to plan its actuations in the environment.
Machine Learning
Alpydin & Ch. Eick: ML Topic1
Why “Learn”?
Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
There is no need to “learn” to calculate payroll
Learning is used when:
Human expertise does not exist (navigating on Mars),
Humans are unable to explain their expertise (speech recognition)
Solution changes in time (routing on a computer network)
Solution needs to be adapted to particular cases (user biometrics)
What We Talk About When We Talk About "Learning"
Learning general models from data of particular examples
Data is cheap and abundant (data warehouses, data marts);
knowledge is expensive and scarce.
Example in retail: Customer transactions to consumer
behavior:
People who bought “Da Vinci Code” also bought “The Five People
You Meet in Heaven” (www.amazon.com)
Build a model that is a good and useful approximation to the
data.
Data Mining/KDD
Definition := “KDD is the non-trivial process of
identifying valid, novel, potentially useful, and
ultimately understandable patterns in data” (Fayyad)
Applications:
Retail: Market basket analysis, Customer relationship management (CRM)
Finance: Credit scoring, fraud detection
Manufacturing: Optimization, troubleshooting
Medicine: Medical diagnosis
Telecommunications: Quality of service optimization
Bioinformatics: Motifs, alignment
Web mining: Search engines
...
What is Machine Learning?
Machine Learning
Study of algorithms that
improve their performance
at some task
with experience
Optimize a performance criterion using example data or past
experience.
Role of Statistics: Inference from a sample
Role of Computer Science: Efficient algorithms to
solve the optimization problem
represent and evaluate the model for inference
Machine Learning
The brain basically learns from experience. Neural networks are sometimes called machine learning algorithms,
because changing their connection weights (training) causes the network to learn the solution to a problem. The
strength of the connection between two neurons is stored as a weight-value for that specific connection. The system
learns new knowledge by adjusting these connection weights. The learning ability of a neural network is determined
by its architecture and by the algorithmic method chosen for training.
Unsupervised learning: The hidden neurons must find a way to organize themselves without help from the outside.
In this approach, no sample outputs are provided to the network against which it can measure its predictive
performance for a given vector of inputs. This is learning by doing.
Reinforcement learning: This method works on reinforcement from the outside. The connections among the neurons
in the hidden layer are randomly arranged, then reshuffled as the network is told how close it is to solving the
problem. Reinforcement learning is sometimes grouped with supervised learning, because it requires a teacher. The teacher may
be a training set of data or an observer who grades the performance of the network's results.
Both unsupervised and reinforcement learning suffer from relative slowness and inefficiency, relying on random
shuffling to find the proper connection weights.
Back-propagation: This method has proven highly successful in training multilayered neural nets. The network is
not just given reinforcement for how it is doing on a task; information about errors is also filtered back through the
system and used to adjust the connections between the layers, thus improving performance. It is a form of supervised
learning.
Off-line or On-line Learning
One can categorize the learning methods into yet another group, off-line or on-line.
When the system uses input data to change its weights to learn the domain
knowledge, it is in training mode or learning mode. When the
system is being used as a decision aid to make recommendations, it is in
operation mode; this is also sometimes called recall.
Off-line: In off-line learning methods, once the system enters
operation mode, its weights are fixed and do not change any more. Most
networks are of the off-line learning type.
On-line: In on-line or real time learning, when the system is in operating mode
(recall), it continues to learn while being used as a decision tool. This type of
learning has a more complex design structure.
Learning laws
There are a variety of learning laws which are in common use. These laws are mathematical
algorithms used to update the connection weights. Most of these laws are some sort of variation of the
best known and oldest learning law, Hebb’s Rule.
Our understanding of how neural processing actually works is very limited. Learning is certainly
more complex than the simplification represented by the learning laws currently developed. Research
into different learning functions continues as new ideas routinely show up in trade publications etc. A
few of the major laws are given as an example below.
Hebb’s Rule: The first and the best known learning rule was introduced by Donald Hebb. The
description appeared in his book The Organization of Behavior in 1949. This basic rule is: if a neuron
receives an input from another neuron, and if both are highly active (mathematically have the same
sign), the weight between the neurons should be strengthened.
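Hebb's rule can be written in a couple of lines; the sketch below uses illustrative names (not from any library) and a hypothetical learning rate.

```python
# Minimal sketch of Hebb's rule: when two connected neurons are active
# together (their activities share a sign), the weight between them is
# strengthened; opposite signs weaken it.

def hebb_update(weight, pre_activity, post_activity, learning_rate=0.1):
    """Return the updated weight for one pre/post neuron pair."""
    # The activity product is positive when both signs agree,
    # so co-active neurons strengthen their connection.
    return weight + learning_rate * pre_activity * post_activity

w = hebb_update(0.5, pre_activity=1.0, post_activity=1.0)   # strengthened to 0.6
w = hebb_update(0.5, pre_activity=1.0, post_activity=-1.0)  # weakened to 0.4
```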
Hopfield Law: This law is similar to Hebb’s Rule with the exception that it specifies the magnitude
of the strengthening or weakening. It states, "if the desired output and the input are both active or both
inactive, increment the connection weight by the learning rate, otherwise decrement the weight by the
learning rate."
Note: Most learning functions have some provision for a learning rate, or a learning constant. Usually this
term is positive and between zero and one.
Learning laws (Contd.)
The Delta Rule: The Delta Rule is a further variation of Hebb’s Rule, and it is one of the most commonly
used. This rule is based on the idea of continuously modifying the strengths of the input connections to
reduce the difference (the delta) between the desired output value and the actual output of a neuron. This
rule changes the connection weights in the way that minimizes the mean squared error of the network. The
error is back propagated into previous layers one layer at a time. The process of back-propagating the
network errors continues until the first layer is reached. The network type called feedforward
back-propagation derives its name from this method of computing the error term.
Note: This rule is also referred to as the Widrow-Hoff Learning Rule and the Least Mean Square Learning Rule.
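A hedged sketch of one Widrow-Hoff (LMS) update for a single linear neuron; `delta_rule_step` and its arguments are illustrative names, not library functions.

```python
# Sketch of the delta (Widrow-Hoff / LMS) rule: nudge each weight to
# reduce the difference between the desired and actual output.

def delta_rule_step(weights, inputs, desired, learning_rate=0.1):
    """One LMS update: w_i += learning_rate * (desired - actual) * x_i."""
    actual = sum(w * x for w, x in zip(weights, inputs))
    error = desired - actual  # the "delta" between desired and actual output
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]

# Repeated updates on one example drive its squared error toward zero.
w = [0.0, 0.0]
for _ in range(50):
    w = delta_rule_step(w, inputs=[1.0, 0.5], desired=1.0)
output = w[0] * 1.0 + w[1] * 0.5  # close to the desired value 1.0
```

In a multilayer network the same error-driven idea is applied layer by layer, which is the back-propagation described above.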
Kohonen’s Learning Law: This procedure, developed by Teuvo Kohonen, was inspired by learning in
biological systems. In this procedure, the neurons compete for the opportunity to learn, or to update their
weights. The processing neuron with the largest output is declared the winner and has the capability of
inhibiting its competitors as well as exciting its neighbours. Only the winner is permitted output, and only
the winner plus its neighbours are allowed to update their connection weights.
The Kohonen rule does not require a desired output and is therefore used in unsupervised learning
methods. Kohonen combined this rule with on-center/off-surround intra-layer connections to
create the self-organizing neural network, which has an unsupervised learning method.
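A minimal winner-take-all sketch of this competition (neighbour updates are omitted for brevity; all names are illustrative):

```python
# Sketch of Kohonen-style competitive learning: only the unit whose weight
# vector is closest to the input (the winner) updates, moving toward the
# input. No desired output is needed, so the learning is unsupervised.

def kohonen_step(units, x, learning_rate=0.5):
    """Move the winning unit's weight vector toward input x; return winner index."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    # The winner is the unit nearest to the input.
    winner = min(range(len(units)), key=lambda i: sq_dist(units[i]))
    units[winner] = [wi + learning_rate * (xi - wi)
                     for wi, xi in zip(units[winner], x)]
    return winner

units = [[0.0, 0.0], [1.0, 1.0]]
win = kohonen_step(units, x=[0.9, 1.1])  # the second unit wins and moves toward x
```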
Applications
Detection of medical phenomena. A variety of health-related indices (e.g., a combination of heart rate,
levels of various substances in the blood, respiration rate) can be monitored. The onset of a particular
medical condition could be associated with a very complex (e.g., nonlinear and interactive) combination
of changes on a subset of the variables being monitored. Neural networks have been used to recognize
this predictive pattern so that the appropriate treatment can be prescribed.
Stock market prediction. Fluctuations of stock prices and stock indices are another example of a
complex, multidimensional, but in some circumstances at least partially-deterministic phenomenon.
Neural networks are being used by many technical analysts to make predictions about stock prices based
upon a large number of factors such as past performance of other stocks and various economic indicators.
Credit assignment. A variety of pieces of information are usually known about an applicant for a loan.
For instance, the applicant's age, education, occupation, and many other facts may be available. After
training a neural network on historical data, neural network analysis can identify the most relevant
characteristics and use those to classify applicants as good or bad credit risks.
Monitoring the condition of machinery. Neural networks can be instrumental in cutting costs by
bringing additional expertise to scheduling the preventive maintenance of machines. A neural network
can be trained to distinguish between the sounds a machine makes when it is running normally ("false
alarms") versus when it is on the verge of a problem. After this training period, the expertise of the
network can be used to warn a technician of an upcoming breakdown, before it occurs and causes costly
unforeseen "downtime."
Engine management. Neural networks have been used to analyze the input of sensors from an engine.
The neural network controls the various parameters within which the engine functions, in order to achieve
a particular goal, such as minimizing fuel consumption.
Related Fields
Machine learning draws on, and overlaps with, many neighbouring fields: data mining, control theory,
statistics, decision theory, information theory, cognitive science, databases, psychological models,
evolutionary models, and neuroscience.
Machine learning is primarily concerned with the accuracy and effectiveness of the computer
system.
Machine Learning Paradigms
rote learning
learning by being told (advice-taking)
learning from examples (induction)
learning by analogy
speed-up learning
concept learning
clustering
discovery
…
Architecture of a Learning System
[Figure] A general learning agent. The performance element selects actions based on percepts from the
ENVIRONMENT; the critic compares them against a performance standard and sends feedback to the
learning element, which uses its knowledge of the performance element to make changes and to set
learning goals; a problem generator suggests exploratory actions.
Learning Element
Design affected by:
performance element used
e.g., utility-based agent, reactive agent, logical agent
functional component to be learned
e.g., classifier, evaluation function, perception-action function,
representation of functional component
e.g., weighted linear function, logical theory, HMM
feedback available
e.g., correct action, reward, relative preferences
Dimensions of Learning Systems
type of feedback
supervised (labeled examples)
unsupervised (unlabeled examples)
reinforcement (reward)
representation
attribute-based (feature vector)
relational (first-order logic)
use of knowledge
empirical (knowledge-free)
analytical (knowledge-guided)
Outline
Supervised learning
empirical learning (knowledge-free)
attribute-value representation
logical representation
target function f: X → Y
example (x,f(x))
hypothesis g: X → Y such that g(x) ≈ f(x)
Applications
Association Analysis
Supervised Learning
Classification
Regression/Prediction
Unsupervised Learning
Reinforcement Learning
Learning Associations
Basket analysis:
P(Y | X): probability that somebody who buys X also buys Y,
where X and Y are products/services.
Market-Basket transactions
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
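Using the transactions above, P(Y | X) can be estimated as a simple frequency ratio; `conditional_prob` is an illustrative helper, not a library function.

```python
# Estimate the conditional probability P(Y | X) from the
# market-basket transactions in the table above.

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def conditional_prob(x, y, baskets):
    """P(y | x): fraction of baskets containing x that also contain y."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)

p = conditional_prob("Diaper", "Beer", transactions)
print(p)  # 3 of the 4 Diaper baskets also contain Beer -> 0.75
```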
Classification
Example: Credit scoring
Differentiating between low-risk and high-risk customers from their income and savings.
[Figure] A model separating low-risk from high-risk customers in the (income, savings) plane.
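The credit-scoring example can be sketched as a hand-set decision rule; the thresholds below are hypothetical, whereas a real classifier would learn the discriminant from labeled data.

```python
# Hypothetical sketch of the credit-scoring classifier: split customers
# into low-risk and high-risk from income and savings. The cut-offs are
# illustrative, not learned from real data.

def classify_risk(income, savings, income_cut=40_000, savings_cut=10_000):
    """Label a customer low-risk only if income and savings both clear a cut-off."""
    if income > income_cut and savings > savings_cut:
        return "low-risk"
    return "high-risk"

print(classify_risk(income=60_000, savings=25_000))  # low-risk
print(classify_risk(income=25_000, savings=2_000))   # high-risk
```

A learned classifier would fit such cut-offs (or a more general discriminant) from labeled examples instead of fixing them by hand.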
Classification: Applications
Also known as pattern recognition
Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style
Character recognition: Different handwriting styles
Speech recognition: Temporal dependency; use of a dictionary or the syntax of the language
Sensor fusion: Combine multiple modalities, e.g., visual (lip image) and acoustic, for speech
Medical diagnosis: From symptoms to illnesses
Web advertising: Predict whether a user clicks on an ad on the Internet
Face Recognition
[Figure] Training examples of a person, and test images.
Prediction: Regression
Example: Price of a used car
x: car attributes
y: price
Linear model: y = w x + w0
In general: y = g(x | θ), where g(·) is the model and θ its parameters
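The linear model can be fitted by ordinary least squares; `fit_line` and the toy car data below are illustrative, not from the slides.

```python
# Fit the linear model y = w*x + w0 to (x, y) pairs by ordinary least
# squares, using the closed-form solution for a single input attribute.

def fit_line(xs, ys):
    """Return (w, w0) minimizing the squared error of y = w*x + w0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept from the means.
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    w0 = mean_y - w * mean_x
    return w, w0

# Toy used-car data: price falls as mileage (the attribute x) rises.
mileage = [10.0, 20.0, 30.0, 40.0]
price = [9.0, 7.0, 5.0, 3.0]
w, w0 = fit_line(mileage, price)
print(w, w0)  # -0.2 and 11.0 on this exactly linear toy data
```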
Regression Applications
Navigating a car: Angle of the steering wheel (CMU NavLab)
Kinematics of a robot arm
Supervised Learning: Uses
Example: decision trees, tools that create rules
Prediction of future cases: Use the rule to predict the output
for future inputs
Knowledge extraction: The rule is easy to understand
Compression: The rule is simpler than the data it explains
Outlier detection: Exceptions that are not covered by the rule,
e.g., fraud
Unsupervised Learning
Learning “what normally happens”
No output
Clustering: Grouping similar instances
Other applications: Summarization, Association Analysis
Example applications
Customer segmentation in CRM
Image compression: Color quantization
Bioinformatics: Learning motifs
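Grouping similar instances can be sketched with a tiny one-dimensional k-means; the function name and toy data are illustrative.

```python
# Minimal 1-D k-means sketch: alternate between assigning points to the
# nearest centroid and recomputing each centroid as its cluster's mean.

def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(sorted(kmeans_1d(data, centroids=[0.0, 5.0])))  # [1.0, 9.0]
```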
Reinforcement Learning
Topics:
Policies: what actions should an agent take in a particular situation
Utility estimation: how good is a state (used by policy)
No supervised output but delayed reward
Credit assignment problem (what was responsible for the
outcome)
Applications:
Game playing
Robot in a maze
Multiple agents, partial observability, ...
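The "robot in a maze" application can be sketched with tabular Q-learning on a hypothetical 4-state corridor; all names and parameters below are illustrative assumptions, not from the slides.

```python
# Sketch: reinforcement learning on a tiny 1-D "maze" with states 0..3.
# Actions move left/right, and only reaching state 3 yields a reward.
# Q-learning estimates state-action utilities from this delayed reward,
# which addresses the credit assignment problem.
import random

def train_q(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(4)]  # q[state][action]; 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != 3:
            # Epsilon-greedy policy: mostly exploit, sometimes explore.
            if random.random() < epsilon:
                a = random.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[s][i])
            s2 = max(0, s - 1) if a == 0 else min(3, s + 1)
            r = 1.0 if s2 == 3 else 0.0  # delayed reward, only at the goal
            # Q-learning update backs the reward up through earlier states.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train_q()
policy = [max(range(2), key=lambda i: q[s][i]) for s in range(3)]
print(policy)  # the learned policy moves right in every non-goal state
```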
Resources: Datasets
UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
UCI KDD Archive:
http://kdd.ics.uci.edu/summary.data.application.html
Statlib: http://lib.stat.cmu.edu/
Delve: http://www.cs.utoronto.ca/~delve/
Resources: Journals
Journal of Machine Learning Research www.jmlr.org
Machine Learning
IEEE Transactions on Neural Networks
IEEE Transactions on Pattern Analysis and Machine
Intelligence
Annals of Statistics
Journal of the American Statistical Association
...
Resources: Conferences
International Conference on Machine Learning (ICML)
European Conference on Machine Learning (ECML)
Neural Information Processing Systems (NIPS)
Computational Learning Theory (COLT)
International Joint Conference on Artificial Intelligence (IJCAI)
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
IEEE Int. Conf. on Data Mining (ICDM)
Summary COSC 6342
Introductory course that covers a wide range of machine learning
techniques, from basic to state-of-the-art.
More theoretical and statistics-oriented than the other courses I teach; it
might need continuous work not to "get lost".
You will learn about the methods you have heard about: Naïve Bayes, belief
networks, regression, nearest-neighbor (kNN), decision trees, support vector
machines, learning ensembles, over-fitting, regularization, dimensionality reduction
& PCA, error bounds, parameter estimation, mixture models, comparing models,
density estimation, clustering centering on K-means, EM, and DBSCAN, active
and reinforcement learning.
Covers algorithms, theory and applications
It’s going to be fun and hard work
Which Topics Deserve More Coverage (if we had more time)?
Graphical Models/Belief Networks (just ran out of time)
More on Adaptive Systems
Learning Theory
More on Clustering and Association Analysis (covered by the Data Mining course)
More on Feature Selection, Feature Creation
More on Prediction
Possibly: more in-depth coverage of optimization techniques, neural
networks, hidden Markov models, how to conduct a machine
learning experiment, comparing machine learning algorithms, …