Basics of Machine Learning


Introduction to

Machine Intelligence
Prof. Srikanta Patnaik, Professor
Department of Computer Science and Engineering
SOA University, Bhubaneswar, Odisha, India
Email: [email protected]
Four Paradigms of Machine Intelligence

                      Thought processes                 Behaviour
Human performance     Systems that think like humans    Systems that act like humans
Rationality           Systems that think rationally     Systems that act rationally

In the table, the first column is concerned with thought processes, whereas the second column
is concerned with behaviour. The first row concerns human performance, whereas the
second row deals with the ideal concept of intelligence, which is called rationality.
Contd.

Note: A system is rational if it does the right thing. Historically,
all four approaches have been followed. As one might expect, a
tension exists between approaches centered around humans and
approaches centered around rationality. A human-centered
approach must be an empirical science, involving hypothesis and
experimental confirmation. A rationalist approach involves a
joint venture by mathematicians and engineers. People in each
group sometimes cast aspersions on work done in the other
groups, but the truth is that each direction has yielded valuable
insights.
Systems that act like humans (Turing Test Approach)

 “The art of creating machines that perform functions which require intelligence when performed by people”

 “The study of how to make computers do things at which, at present, people are better”

 The Turing Test, proposed by Alan Turing, was designed to provide an operational definition of intelligence.
Turing defined intelligent behaviour as the ability to achieve human-level performance in all cognitive tasks,
sufficient to fool an interrogator. Informally speaking, the test he proposed is that the computer should be
interrogated by a human via a teletype, and passes the test if the interrogator cannot tell if there is a computer or
a human at the other end.

 The computer would need to possess the following capabilities:


 natural language processing
 knowledge representation
 automated reasoning
 machine learning

 However, the total Turing Test includes a video signal so that the interrogator can test the subject's perceptual
abilities, as well as the opportunity for the interrogator to pass physical objects ``through the hatch.'' To pass the
total Turing Test, the computer will need
 computer vision to perceive objects, and
 robotics to move them about.
Systems that think like humans (Cognitive Modeling Approach)

 “The exciting new effort to make computers think ... machines with minds, in the full and literal
sense”

 “The automation of activities that we associate with human thinking, activities such as decision-
making, problem solving, learning ..”

 If we are going to say that a given program thinks like a human, we must have some way of
determining how humans think. We need to know the actual workings of human minds.

 There are two ways to do this: through introspection--trying to catch our own thoughts as they
go by--or through psychological experiments. Once we have a sufficiently precise theory of the
mind, it becomes possible to express the theory as a computer program. If the program's
input/output and timing behaviour matches human behaviour, it is evidence that some of the
program's mechanisms may also be operating in humans.

 The interdisciplinary field of cognitive science brings together computer models from machine
intelligence and experimental techniques from psychology to try to construct precise and testable
theories of the workings of the human mind.
Systems that think rationally (Laws of Thought Approach)

 “The study of mental faculties through the use of computational models''

 “The study of the computations that make it possible to perceive, reason, and act”

 The Greek philosopher Aristotle was one of the first to attempt to codify “right thinking,” that
is, undisputable reasoning processes. His famous syllogisms provided patterns for argument
structures that always gave correct conclusions given correct premises.

 Example: “Socrates is a man; all men are mortal; therefore Socrates is mortal.” These laws of
thought were supposed to govern the operation of the mind and initiated the field of logic.
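The syllogism above can be checked mechanically. Below is a toy forward-chaining sketch in Python; the fact/rule encoding is an illustrative assumption, not part of the slides:

```python
# Derive "Socrates is mortal" from "Socrates is a man" and
# "all men are mortal" by applying the rule to known facts until no
# new facts appear (a fixed point).

facts = {("man", "Socrates")}
rules = [("man", "mortal")]          # "everything that is a man is mortal"

changed = True
while changed:
    changed = False
    for premise, conclusion in rules:
        for pred, subject in list(facts):
            if pred == premise and (conclusion, subject) not in facts:
                facts.add((conclusion, subject))
                changed = True

print(("mortal", "Socrates") in facts)   # True
```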

 There are two main obstacles to this approach. First, it is not easy to take informal knowledge
and state it in the formal terms required by logical notation, particularly when the knowledge
is not certain. Second, there is a big difference between being able to solve a problem “in
principle” and doing so in practice. Even problems with just a few dozen facts can exhaust the
computational resources of any computer unless it has some guidance as to which reasoning
steps to apply first.
Systems that act rationally (Rational Agent Approach)

 “A field of study that seeks to explain and emulate intelligent behaviour in terms of computational
processes''

 “The branch of computer science that is concerned with the automation of intelligent behaviour”

 Acting rationally means acting so as to achieve one's goals, given one's beliefs. An agent is just
something that perceives and acts. In this approach, machine intelligence is viewed as the study
and construction of rational agents.

 In the “laws of thought” approach, the whole emphasis was on correct inferences. Making correct
inferences is sometimes part of being a “rational agent”, because one way to act rationally is to
reason logically to the conclusion that a given action will achieve one's goals, and then to act on
that conclusion. There are also ways of acting rationally that cannot be reasonably said to involve
inference. For example, pulling one's hand off of a hot stove is a reflex action that is more
successful than a slower action taken after careful deliberation.

 All the ``cognitive skills'' needed for the Turing Test are there to allow rational actions. Thus, we
need the ability to represent knowledge and reason with it because this enables us to reach good
decisions in a wide variety of situations.
Fig. Enabling fields of machine intelligence: Computer Vision, Soft Computing, AI Tools, VLSI, Intelligent Machines, Cognitive Technology, Distributed Actuators, Wireless Networks, and Sensor Technologies
Model of Cognition

LTM = Long Term Memory; STM = Short Term Memory

Fig. Three cycles, namely Acquisition, Perception, and Learning & Coordination,
with their states in the model of cognition
Various States of Cognition
 Sensing and Acquisition: Sensing in engineering science refers to reception and
transformation of signals into a measurable form, which has a wider perspective in cognitive
science. It includes pre-processing and extraction of features from the sensed data along with
stored knowledge of LTM. For example, visual information on reception is filtered to remove
undesirable noise, and elementary features like size, shape, and color are extracted and stored in
STM.

 Reasoning: Generally this state constructs high-level knowledge from acquired information of
a relatively lower level and organizes it in structural form for efficient access. The process of
reasoning analyses the semantic or meaningful behaviour of the low-level knowledge and its
associations. It can be modeled by a number of techniques such as commonsense reasoning,
causal reasoning, non-monotonic reasoning, default reasoning, fuzzy reasoning, spatial and
temporal reasoning, and meta-level reasoning.

 Attention: It is responsible for processing a certain part of the information more extensively,
while the remaining part is neglected or suppressed. Generally, it is a task-specific visual
processing strategy, as adopted by animal visual systems. For instance, finding the area
of interest in a scene autonomously is an act of attention.
Various States of Cognition (Cont..)
 Recognition: It involves identifying a complex arrangement of sensory stimuli, such as a letter of an
alphabet or a human face in a complex scene. For example, when a person recognizes a pattern or
an object in a large scene, the sensory organs process, transform and organize the raw data
received by the sensory receptors. The acquired data in STM is then compared with the information
stored earlier in LTM, through appropriate reasoning, to recognize the sensed pattern.

 Learning: Generally speaking, Learning is a process that takes the sensory stimuli from the outside
world in the form of examples and classifies these things without providing any explicit rules. For
instance, a child cannot distinguish between a cat and a dog. But as he grows, he can do so, based
on numerous examples of each animal given to him. Learning involves a teacher, who helps to
classify things by correcting the mistakes of the learner each time. In machine learning, a program
takes the place of the teacher, discovering the mistakes of the learner. Numerous methods and
techniques of learning have been developed and classified as Supervised, Unsupervised and
Reinforcement learning.

 Planning: The state of planning engages itself to determine the steps of action involved in deriving
the required goal state from known initial states of the problem. The main task is to identify the
appropriate piece of knowledge derived from LTM at a given instance of time. Then planning
executes this task through matching the problem states with its perceptual model.
Various States of Cognition (Cont..)
 Action and Coordination: This state determines the control commands for various actuators
to execute the schedule of the action-plan of a given problem, which is carried out through a
process of supervised learning. The state also coordinates between various desired actions and
the input stimuli.

Cognitive Memory:
Sensory information is stored in the human brain in closely linked neuron cells. Information in
some cells can be preserved only for a short duration, which is referred to as Short Term
Memory (STM). Further, there are cells in the human brain that can hold information for quite a
long time, called Long Term Memory (LTM). STM and LTM can also be of two
basic varieties, namely iconic memory and echoic memory. The iconic memory stores
visual information, whereas the echoic memory deals with audio information. These two types
of memory together are generally called sensory memory. Tulving alternatively classified
human memory into three classes, namely episodic, semantic and procedural memory. Episodic
memory saves facts as they happen; semantic memory constructs knowledge in
structural form; and procedural memory helps in taking decisions for actions.
Cycles of Cognition
 Acquisition Cycle: The task of the Acquisition Cycle is to store the information temporarily
in STM after sensing the information through various sensory organs. Then it compares the
response of the STM with already acquired and permanently stored information in LTM. The
process of representation of the information for storage and retrieval from LTM is a critical
job, which is known as knowledge representation. It is not yet known how human beings
store, retrieve and use the information from LTM.

 Perception Cycle: It is a cycle or a process that uses the previously stored knowledge in
LTM to gather and interpret the stimuli registered by the sensory organs through Acquisition
Cycle. The three relevant states of Perception, namely reasoning, attention and recognition, are
generally carried out by a process of unsupervised learning. Here, we can say that the
learning is unsupervised, since such refinement of knowledge is an autonomous process and
requires no trainer for its adaptation. Therefore, this cycle does not have ‘Learning’ as an
exclusive state. It is used mainly for feature extraction, image matching and robot world
modeling.

 Learning & Coordination Cycle: Once the environment is perceived and stored in LTM in a
suitable format (data structure), the autonomous system utilizes various states namely
Learning, Planning and Action & Coordination. These three states taken together are called
the Learning & Coordination Cycle, which the agent utilizes to plan its
actuations in the environment.
Machine Learning

Alpydin & Ch. Eick: ML Topic1
Why “Learn”?
 Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
 There is no need to “learn” to calculate payroll
 Learning is used when:
 Human expertise does not exist (navigating on Mars),
 Humans are unable to explain their expertise (speech recognition)
 Solution changes in time (routing on a computer network)
 Solution needs to be adapted to particular cases (user biometrics)

What We Talk About When We Talk
About “Learning”
 Learning general models from data of particular examples
 Data is cheap and abundant (data warehouses, data marts);
knowledge is expensive and scarce.
 Example in retail: Customer transactions to consumer
behavior:
People who bought “Da Vinci Code” also bought “The Five People
You Meet in Heaven” (www.amazon.com)
 Build a model that is a good and useful approximation to the
data.

Data Mining/KDD
Definition := “KDD is the non-trivial process of
identifying valid, novel, potentially useful, and
ultimately understandable patterns in data” (Fayyad)
Applications:
 Retail: Market basket analysis, Customer relationship management (CRM)
 Finance: Credit scoring, fraud detection
 Manufacturing: Optimization, troubleshooting
 Medicine: Medical diagnosis
 Telecommunications: Quality of service optimization
 Bioinformatics: Motifs, alignment
 Web mining: Search engines
 ...

What is Machine Learning?
 Machine Learning
 Study of algorithms that
 improve their performance
 at some task
 with experience
 Optimize a performance criterion using example data or past
experience.
 Role of Statistics: Inference from a sample
 Role of Computer science: Efficient algorithms to
 Solve the optimization problem
 Representing and evaluating the model for inference

Machine Learning
The brain basically learns from experience. Neural networks are sometimes called machine learning algorithms,
because changing their connection weights (training) causes the network to learn the solution to a problem. The
strength of connection between the neurons is stored as a weight-value for the specific connection. The system
learns new knowledge by adjusting these connection weights. The learning ability of a neural network is determined
by its architecture and by the algorithmic method chosen for training.

 Unsupervised learning: The hidden neurons must find a way to organize themselves without help from the outside.
In this approach, no sample outputs are provided to the network against which it can measure its predictive
performance for a given vector of inputs. This is learning by doing.

 Reinforcement learning: This method works on reinforcement from the outside. The connections among the neurons
in the hidden layer are randomly arranged, then reshuffled as the network is told how close it is to solving the
problem. Reinforcement learning is sometimes grouped with supervised learning, because it requires a teacher. The teacher may
be a training set of data or an observer who grades the performance of the network results.

Both unsupervised and reinforcement learning suffer from relative slowness and inefficiency, relying on random shuffling
to find the proper connection weights.

 Back propagation: This method has proven highly successful in training multilayered neural nets. The network is
not just given reinforcement for how it is doing on a task. Information about errors is also filtered back through the
system and is used to adjust the connections between the layers, thus improving performance. A form of supervised
learning.
Off-line or On-line Learning
One can categorize the learning methods into yet another group, off-line or on-line.
When the system uses input data to change its weights to learn the domain
knowledge, the system could be in training mode or learning mode. When the
system is being used as a decision aid to make recommendations, it is in the
operation mode; this is also sometimes called recall.

 Off-line: In the off-line learning methods, once the system enters the
operation mode, its weights are fixed and do not change any more. Most of the
networks are of the off-line learning type.

 On-line: In on-line or real time learning, when the system is in operating mode
(recall), it continues to learn while being used as a decision tool. This type of
learning has a more complex design structure.
Learning laws
There are a variety of learning laws which are in common use. These laws are mathematical
algorithms used to update the connection weights. Most of these laws are some sort of variation of the
best known and oldest learning law, Hebb’s Rule.
Man’s understanding of how neural processing actually works is very limited. Learning is certainly
more complex than the simplification represented by the learning laws currently developed. Research
into different learning functions continues as new ideas routinely show up in trade publications etc. A
few of the major laws are given as an example below.

 Hebb’s Rule: The first and best known learning rule was introduced by Donald Hebb. The
description appeared in his book The Organization of Behavior in 1949. This basic rule is: If a neuron
receives an input from another neuron, and if both are highly active (mathematically have the same
sign), the weight between the neurons should be strengthened.

 Hopfield Law: This law is similar to Hebb’s Rule with the exception that it specifies the magnitude
of the strengthening or weakening. It states, "if the desired output and the input are both active or both
inactive, increment the connection weight by the learning rate, otherwise decrement the weight by the
learning rate."

Note: Most learning functions have some provision for a learning rate, or a learning constant. Usually this
term is positive and between zero and one.
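The two rules above can be sketched for a single connection; the function names and values below are illustrative assumptions, not from the slides:

```python
# Hebb's rule and the Hopfield variant, for one connection weight.

def hebb_update(weight, pre, post, rate=0.1):
    """Strengthen the connection when pre- and post-synaptic
    activations agree in sign (both highly active)."""
    return weight + rate * pre * post

def hopfield_update(weight, inp, desired, rate=0.1):
    """Hopfield variant: increment by the learning rate when input and
    desired output are both active or both inactive, else decrement."""
    if (inp > 0) == (desired > 0):
        return weight + rate
    return weight - rate

w = 0.0
for _ in range(5):               # repeated co-activation strengthens w
    w = hebb_update(w, pre=1.0, post=1.0)
print(round(w, 2))               # 0.5
```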
Continue…
 The Delta Rule: The Delta Rule is a further variation of Hebb’s Rule, and it is one of the most commonly
used. This rule is based on the idea of continuously modifying the strengths of the input connections to
reduce the difference (the delta) between the desired output value and the actual output of a neuron. This
rule changes the connection weights in the way that minimizes the mean squared error of the network. The
error is back propagated into previous layers one layer at a time. The process of back-propagating the
network errors continues until the first layer is reached. The network type called Feed forward, Back-
propagation derives its name from this method of computing the error term.

Note: This rule is also referred to as the Widrow-Hoff Learning Rule and the Least Mean Square Learning Rule.
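For a single linear neuron, the Delta rule reduces to a simple gradient step on the squared error; the training data below is an illustrative assumption:

```python
# Delta (Widrow-Hoff / LMS) rule for one linear neuron: nudge each
# weight against the gradient of the squared error.

def delta_step(w, b, x, target, rate=0.1):
    y = w * x + b                 # linear neuron output
    error = target - y            # the "delta"
    w += rate * error * x         # reduce squared error w.r.t. w
    b += rate * error             # ... and w.r.t. the bias
    return w, b

# Learn y = 2x + 1 from a few consistent examples.
w, b = 0.0, 0.0
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (-1.0, -1.0)]
for _ in range(200):
    for x, t in data:
        w, b = delta_step(w, b, x, t)
print(round(w, 2), round(b, 2))   # close to w = 2, w0 = 1
```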

 Kohonen’s Learning Law: This procedure, developed by Teuvo Kohonen, was inspired by learning in
biological systems. In this procedure, the neurons compete for the opportunity to learn, or to update their
weights. The processing neuron with the largest output is declared the winner and has the capability of
inhibiting its competitors as well as exciting its neighbours. Only the winner is permitted output, and only
the winner plus its neighbours are allowed to update their connection weights.

The Kohonen rule does not require desired output. Therefore it is implemented in the unsupervised methods
of learning. Kohonen has used this rule combined with the on-center/off-surround intra- layer connection to
create the self-organizing neural network, which has an unsupervised learning method.
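A minimal winner-take-all sketch of this competitive rule, with one-dimensional inputs, two units, and the neighbourhood update omitted for brevity; all data is an illustrative assumption:

```python
# Kohonen-style competitive learning: the unit whose weight is closest
# to the input wins the competition, and only the winner updates.

def closest(units, x):
    return min(range(len(units)), key=lambda i: abs(units[i] - x))

def kohonen_step(units, x, rate=0.2):
    win = closest(units, x)                 # competition
    units[win] += rate * (x - units[win])   # only the winner moves
    return units

units = [0.0, 1.0]                          # initial weight "vectors" (1-D)
data = [0.1, 0.2, 0.15, 0.9, 1.1, 0.95] * 50
for x in data:
    kohonen_step(units, x)
print([round(u, 2) for u in units])         # units settle near the two clusters
```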
Applications
 Detection of medical phenomena. A variety of health-related indices (e.g., a combination of heart rate,
levels of various substances in the blood, respiration rate) can be monitored. The onset of a particular
medical condition could be associated with a very complex (e.g., nonlinear and interactive) combination
of changes on a subset of the variables being monitored. Neural networks have been used to recognize
this predictive pattern so that the appropriate treatment can be prescribed.

 Stock market prediction. Fluctuations of stock prices and stock indices are another example of a
complex, multidimensional, but in some circumstances at least partially-deterministic phenomenon.
Neural networks are being used by many technical analysts to make predictions about stock prices based
upon a large number of factors such as past performance of other stocks and various economic indicators.

 Credit assignment. A variety of pieces of information are usually known about an applicant for a loan.
For instance, the applicant's age, education, occupation, and many other facts may be available. After
training a neural network on historical data, neural network analysis can identify the most relevant
characteristics and use those to classify applicants as good or bad credit risks.

 Monitoring the condition of machinery. Neural networks can be instrumental in cutting costs by
bringing additional expertise to scheduling the preventive maintenance of machines. A neural network
can be trained to distinguish between the sounds a machine makes when it is running normally ("false
alarms") versus when it is on the verge of a problem. After this training period, the expertise of the
network can be used to warn a technician of an upcoming breakdown, before it occurs and causes costly
unforeseen "downtime."

 Engine management. Neural networks have been used to analyze the input of sensors from an engine.
The neural network controls the various parameters within which the engine functions, in order to achieve
a particular goal, such as minimizing fuel consumption.
Related Fields
Fig. Related fields of machine learning: data mining, control theory, statistics, decision theory, information theory, cognitive science, databases, psychological models, evolutionary models, and neuroscience.

Machine learning is primarily concerned with the accuracy and effectiveness of the computer
system.
Machine Learning Paradigms
 rote learning
 learning by being told (advice-taking)
 learning from examples (induction)
 learning by analogy
 speed-up learning
 concept learning
 clustering
 discovery
 …
Architecture of a Learning System
feedback performance standard
critic percepts

changes ENVIRONMEN
learning performance
element element actions
knowledge
learning goals

problem
generator
Learning Element
Design affected by:
 performance element used
 e.g., utility-based agent, reactive agent, logical agent
 functional component to be learned
 e.g., classifier, evaluation function, perception-action function,
 representation of functional component
 e.g., weighted linear function, logical theory, HMM
 feedback available
 e.g., correct action, reward, relative preferences
Dimensions of Learning Systems
 type of feedback
 supervised (labeled examples)
 unsupervised (unlabeled examples)
 reinforcement (reward)
 representation
 attribute-based (feature vector)
 relational (first-order logic)
 use of knowledge
 empirical (knowledge-free)
 analytical (knowledge-guided)
Outline
 Supervised learning
 empirical learning (knowledge-free)
 attribute-value representation
 logical representation

 analytical learning (knowledge-guided)


 Reinforcement learning
 Unsupervised learning
 Performance evaluation
 Computational learning theory
Inductive (Supervised) Learning
Basic Problem: Induce a representation of a function (a systematic
relationship between inputs and outputs) from examples.

 target function f: X → Y
 example (x, f(x))
 hypothesis g: X → Y such that g(x) ≈ f(x)

x = set of attribute values (attribute-value representation)
x = set of logical sentences (first-order representation)

Y = set of discrete labels (classification)
Y = ℝ (regression)
Decision Trees
Should I wait at this restaurant?
Growth of Machine Learning
 Machine learning is the preferred approach to
 Speech recognition, Natural language processing
 Computer vision
 Medical outcomes analysis
 Robot control
 Computational biology
 This trend is accelerating
 Improved machine learning algorithms
 Improved data capture, networking, faster computers
 Software too complex to write by hand
 New sensors / IO devices
 Demand for self-customization to user, environment
 It turns out to be difficult to extract knowledge from human experts, as shown by the failure of expert
systems in the 1980s.

Applications
 Association Analysis
 Supervised Learning
 Classification
 Regression/Prediction
 Unsupervised Learning
 Reinforcement Learning

Learning Associations
 Basket analysis:
P(Y | X): probability that somebody who buys X also buys Y,
where X and Y are products/services.

Example: P(chips | beer) = 0.7

Market-Basket transactions
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
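The conditional probability P(Y | X) can be estimated directly from the transaction table above as the fraction of baskets containing X that also contain Y:

```python
# Market-basket transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def conditional(y, x):
    """Estimate P(y | x): of the baskets containing x, the fraction
    that also contain y."""
    with_x = [t for t in transactions if x in t]
    return sum(y in t for t in with_x) / len(with_x)

print(conditional("Beer", "Diaper"))   # 3 of 4 Diaper baskets contain Beer -> 0.75
```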
Classification
 Example: Credit scoring
 Differentiating between low-risk and high-risk customers from their income and savings

Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
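The discriminant above, written as code; the threshold values θ1 and θ2 are illustrative assumptions (in practice a learner fits them from labelled data):

```python
# Rectangular discriminant for credit scoring.
THETA1, THETA2 = 30_000, 10_000   # assumed thresholds (income, savings)

def credit_risk(income, savings):
    """IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk."""
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(credit_risk(45_000, 15_000))   # low-risk
print(credit_risk(45_000, 5_000))    # high-risk
```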
Classification: Applications
 Aka Pattern recognition
 Face recognition: Pose, lighting, occlusion (glasses, beard),
make-up, hair style
 Character recognition: Different handwriting styles.
 Speech recognition: Temporal dependency.
 Use of a dictionary or the syntax of the language.
 Sensor fusion: Combine multiple modalities; e.g., visual (lip image) and
acoustic signals for speech
 Medical diagnosis: From symptoms to illnesses
 Web advertising: Predict whether a user clicks on an ad on the Internet.

Face Recognition
Training examples of a person

Test images

AT&T Laboratories, Cambridge UK


http://www.uk.research.att.com/facedatabase.html

Prediction: Regression
 Example: Price of a used car
 x: car attributes
 y: price

Linear model: y = wx + w0
General model: y = g(x | θ), where g(·) is the model and θ the parameters

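The linear model y = wx + w0 above can be fit by ordinary least squares; the car-age/price numbers below are made-up illustration data:

```python
# Ordinary least-squares fit of y = w*x + w0.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance(x, y) / variance(x)
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return w, my - w * mx            # slope w and intercept w0

ages = [1, 2, 3, 5, 8]               # x: car age in years (assumed attribute)
prices = [18, 16, 14, 10, 4]         # y: price in $1000s
w, w0 = fit_line(ages, prices)
print(round(w, 2), round(w0, 2))     # -2.0 20.0
```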
Regression Applications
 Navigating a car: Angle of the steering wheel (CMU NavLab)
 Kinematics of a robot arm: given the hand position (x, y), learn the joint angles α1 = g1(x, y) and α2 = g2(x, y)

Supervised Learning: Uses
Example: decision-tree tools that create rules
 Prediction of future cases: Use the rule to predict the output
for future inputs
 Knowledge extraction: The rule is easy to understand
 Compression: The rule is simpler than the data it explains
 Outlier detection: Exceptions that are not covered by the rule,
e.g., fraud

Unsupervised Learning
 Learning “what normally happens”
 No output
 Clustering: Grouping similar instances
 Other applications: Summarization, Association Analysis
 Example applications
 Customer segmentation in CRM
 Image compression: Color quantization
 Bioinformatics: Learning motifs
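Clustering is the workhorse behind examples like color quantization: k-means groups pixel values around k centers, and each pixel is replaced by its nearest center. A tiny one-dimensional sketch (data and k are illustrative assumptions):

```python
# 1-D k-means: alternate assignment and center-update steps.

def kmeans(points, centers, iters=20):
    for _ in range(iters):
        # assignment step: attach each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        # update step: move each center to its cluster mean
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

pixels = [10, 12, 11, 200, 210, 205, 100, 98]      # grayscale values
centers = kmeans(pixels, centers=[0.0, 128.0, 255.0])
# quantization: replace each pixel by its nearest center
quantized = [min(centers, key=lambda c: abs(p - c)) for p in pixels]
print(sorted(round(c) for c in centers))           # [11, 99, 205]
```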

Reinforcement Learning
 Topics:
 Policies: what actions should an agent take in a particular situation
 Utility estimation: how good is a state (used by policy)
 No supervised output but delayed reward
 Credit assignment problem (what was responsible for the
outcome)
 Applications:
 Game playing
 Robot in a maze
 Multiple agents, partial observability, ...
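A minimal Q-learning sketch of the "robot in a maze" idea: a one-dimensional corridor with reward only at the right end, so credit must propagate back through the delayed reward. The environment and constants are illustrative assumptions:

```python
# Tabular Q-learning on a 5-cell corridor; reward 1 only at the goal.
import random

N, GOAL = 5, 4
Q = {(s, a): 0.0 for s in range(N) for a in (-1, +1)}
alpha, gamma, eps = 0.5, 0.9, 0.2
random.seed(0)

for _ in range(500):                        # training episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        a = random.choice((-1, +1)) if random.random() < eps \
            else max((-1, +1), key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), N - 1)      # walls clamp movement
        r = 1.0 if s2 == GOAL else 0.0      # delayed reward at the goal only
        # Q-learning update toward reward plus discounted future value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, -1)], Q[(s2, +1)]) - Q[(s, a)])
        s = s2

policy = [max((-1, +1), key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(policy)   # the learned policy moves right toward the goal
```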

Resources: Datasets
 UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
 UCI KDD Archive:
http://kdd.ics.uci.edu/summary.data.application.html
 Statlib: http://lib.stat.cmu.edu/
 Delve: http://www.cs.utoronto.ca/~delve/

Resources: Journals
 Journal of Machine Learning Research www.jmlr.org
 Machine Learning
 IEEE Transactions on Neural Networks
 IEEE Transactions on Pattern Analysis and Machine
Intelligence
 Annals of Statistics
 Journal of the American Statistical Association
 ...

Resources: Conferences
 International Conference on Machine Learning (ICML)
 European Conference on Machine Learning (ECML)
 Neural Information Processing Systems (NIPS)
 Computational Learning Theory (COLT)
 International Joint Conference on Artificial Intelligence (IJCAI)
 ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
 IEEE Int. Conf. on Data Mining (ICDM)

Summary COSC 6342
 Introductory course that covers a wide range of machine learning
techniques—from basic to state-of-the-art.
 More theoretical/statistics oriented compared to other courses I teach;
might need continuous work not “to get lost”.
 You will learn about the methods you heard about: Naïve Bayes’, belief
networks, regression, nearest-neighbor (kNN), decision trees, support vector
machines, learning ensembles, over-fitting, regularization, dimensionality reduction
& PCA, error bounds, parameter estimation, mixture models, comparing models,
density estimation, clustering centering on K-means, EM, and DBSCAN, active
and reinforcement learning.
 Covers algorithms, theory and applications
 It’s going to be fun and hard work

Which Topics Deserve More Coverage
—if we had more time?
 Graphical Models/Belief Networks (just ran out of time)
 More on Adaptive Systems
 Learning Theory
 More on Clustering and Association Analysis (covered by the Data
Mining course)
 More on Feature Selection, Feature Creation
 More on Prediction
 Possibly: More depth coverage of optimization techniques, neural
networks, hidden Markov models, how to conduct a machine
learning experiment, comparing machine learning algorithms,…
