Lecture 4: Machine Learning Applications


LECTURE 4

MACHINE-LEARNING TECHNIQUES
LEARNING OBJECTIVES

5.1 Understand the basic concepts and definitions of artificial neural networks (ANN)
5.2 Learn the different types of ANN architectures
5.3 Understand the concept and structure of support vector machines (SVM)
5.4 Learn the advantages and disadvantages of SVM compared to ANN
5.5 Understand the concept and formulation of the k-nearest neighbor (k-NN) algorithm
LEARNING OBJECTIVES

5.6 Learn the advantages and disadvantages of k-NN compared to ANN and SVM


5.7 Understand the basic principles of Bayesian learning and Naïve Bayes algorithm
5.8 Learn the basics of Bayesian Belief Networks and how they are used in predictive
analytics
5.9 Understand different types of ensemble models and their pros and cons in predictive
analytics
OPENING VIGNETTE (1 OF 4)
Predictive Modeling Helps Better Understand and
Manage Complex Medical Procedures
• Situation
• Problem
• Solution
• Results
• Answer & discuss the case questions.
OPENING VIGNETTE (2 OF 4)
Discussion Questions for the Opening Vignette:
1. Why is it important to study medical procedures? What is the value in predicting outcomes?
2. What factors do you think are the most important in better understanding and managing
healthcare?
3. What would be the impact of predictive modeling on healthcare and medicine? Can predictive
modeling replace medical or managerial personnel?
4. What were the outcomes of the study? Who can use these results? How can they be
implemented?
5. Search the Internet to locate two additional cases in managing complex medical procedures.
OPENING VIGNETTE (3 OF 4)
A Process Map for Training and Testing Four Predictive Models
OPENING VIGNETTE (4 OF 4)
The Comparison of the Four Models

1. Acronyms for model types: artificial neural networks (ANN), support vector machines (SVM), popular decision tree algorithm (C5), classification and regression trees (CART).
2. Prediction results for the test data samples are shown in a confusion matrix, where the rows represent the actuals and the columns represent the predicted cases.
3. Accuracy, sensitivity, and specificity are the three performance measures that were used in comparing the four prediction models.
NEURAL NETWORK CONCEPTS
• Neural networks (NN): a human brain metaphor for information processing
• Neural computing
• Artificial neural network (ANN)
• Many uses of ANN for
• pattern recognition, forecasting, prediction, and classification
• Many application areas
• finance, marketing, manufacturing, operations, information systems, and so on
BIOLOGICAL NEURAL NETWORKS

• Two interconnected brain cells (neurons)


PROCESSING INFORMATION IN ANN

• A single neuron (processing element, PE) with inputs and outputs


BIOLOGY ANALOGY
Biological → Artificial
Soma → Node
Dendrites → Input
Axon → Output
Synapse → Weight
Slow → Fast
Many neurons (10^9) → Few neurons (a dozen to hundreds of thousands)
ELEMENTS OF ANN
• Processing element (PE)
• Network architecture
• Hidden layers
• Parallel processing
• Network information processing
• Inputs
• Outputs
• Connection weights
• Summation function
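The summation function and connection weights above can be sketched as a single processing element; a minimal illustration (not a trainable network), with made-up inputs, weights, and a sigmoid transfer function:

```python
import math

def processing_element(inputs, weights, bias=0.0):
    """A single ANN processing element (PE): summation + sigmoid activation."""
    # Summation function: weighted sum of the inputs plus a bias term
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid transfer function squashes the net input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-net))

# Hypothetical inputs and connection weights (illustrative values only)
output = processing_element([0.5, 1.0, -0.3], [0.4, -0.2, 0.7])
print(round(output, 3))
```

In a real network, many such PEs are arranged into input, hidden, and output layers, and the connection weights are adjusted by a learning algorithm such as backpropagation.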
APPLICATION CASE 5.1
Neural Networks Are Helping to Save Lives in the
Mining Industry
Questions for Discussion:
1. How did neural networks help save lives in the mining industry?
2. What were the challenges, the proposed solution, and the
obtained results?
NEURAL NETWORK
ARCHITECTURES
• Architecture of a neural network is driven by the task it is intended to address
• Classification, regression, clustering, general optimization, association
• Most popular architecture: feedforward, multi-layered perceptron with backpropagation learning algorithm
• This ANN architecture will be covered later
• Other ANN architectures: recurrent networks, self-organizing feature maps, Hopfield networks, …
NEURAL NETWORK
ARCHITECTURES
RECURRENT NEURAL NETWORKS
OTHER POPULAR ANN PARADIGMS
SELF-ORGANIZING MAPS (SOM)

• First introduced by the Finnish professor Teuvo Kohonen
• Applies to clustering-type problems
OTHER POPULAR ANN PARADIGMS
HOPFIELD NETWORKS

• First introduced by John Hopfield
• Highly interconnected neurons
• Applies to solving complex computational problems (e.g., optimization problems)
APPLICATION CASE 5.2
Predictive Modeling Is Powering the Power
Generators
Questions for Discussion:
1. What are the key environmental concerns in the electric
power industry?
2. What are the main application areas for predictive modeling
in the electric power industry?
3. How was predictive modeling used to address a variety of
problems in the electric power industry?
SUPPORT VECTOR MACHINES (SVM)
• SVM are among the most popular machine-learning techniques.
• SVM belong to the family of generalized linear models (capable of representing non-linear relationships in a linear fashion).
• SVM achieve a classification or regression decision based on the value of the linear combination of input features.
• Because of their architectural similarities, SVM are also closely associated with ANN.
SUPPORT VECTOR MACHINES (SVM)
• Goal of SVM: to generate mathematical functions that map input variables to desired outputs for classification or regression type prediction problems.
• First, SVM uses nonlinear kernel functions to transform non-linear relationships among the variables into linearly separable feature spaces.
• Then, maximum-margin hyperplanes are constructed to optimally separate the different classes from each other based on the training dataset.
• SVM have a solid mathematical foundation!
SUPPORT VECTOR MACHINES (SVM)
• A hyperplane is a geometric concept used to describe the separation surface between different classes of things.
• In SVM, two parallel hyperplanes are constructed on each side of the separation space with the aim of maximizing the distance between them.
• A kernel function in SVM uses the kernel trick (a method for using a linear classifier algorithm to solve a nonlinear problem).
• The most commonly used kernel function is the radial basis function (RBF).
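The kernel trick and the RBF kernel mentioned above can be written out directly; a minimal sketch (the gamma value is an arbitrary illustrative choice, not a recommended setting):

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: K(x, y) = exp(-gamma * ||x - y||^2).

    Measures similarity as if the points were mapped into a high-dimensional
    feature space, without ever computing that mapping (the "kernel trick").
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Identical points get the maximum similarity of 1.0
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0
# Distant points get a similarity close to 0
print(rbf_kernel([1.0, 2.0], [4.0, 6.0]))
```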
SUPPORT VECTOR MACHINES (SVM)
• Many linear classifiers (hyperplanes) may separate the data
HOW DOES AN SVM WORK?
• Following a machine-learning process, an SVM learns from the historic cases.
• The Process of Building an SVM
1. Preprocess the data.
• Scrub and transform the data.
2. Develop the model.
• Select the kernel type (RBF is often a natural choice).
• Determine the kernel parameters for the selected kernel type.
• If the results are satisfactory, finalize the model; otherwise, change the kernel type and/or kernel parameters to achieve the desired accuracy level.
3. Extract and deploy the model.
THE PROCESS OF BUILDING AN SVM
SVM APPLICATIONS
• SVM are the most widely used kernel-learning algorithms for a wide range of classification and regression problems
• SVM represent the state of the art by virtue of their excellent generalization performance, superior prediction power, ease of use, and rigorous theoretical foundation
• Most comparative studies show their superiority in both regression and classification type prediction problems.
• SVM versus ANN?
K-NEAREST NEIGHBOR METHOD (k-NN)
• ANNs and SVMs involve time-demanding, computationally intensive iterative derivations
• k-NN is a simple and logical prediction method that produces very competitive results
• k-NN is a prediction method for classification as well as regression types (similar to ANN & SVM)
• k-NN is a type of instance-based learning (or lazy learning): most of the work takes place at the time of prediction (not at modeling)
• k: the number of neighbors used in the model
K-NEAREST NEIGHBOR METHOD (k-NN) (2 OF 2)
• The answer to “which class a data point belongs to?” depends on the value of k
THE PROCESS OF THE k-NN METHOD
k-NN MODEL PARAMETER (1 OF 2)

1. Similarity Measure: The Distance Metric
• Numeric versus nominal values?


k-NN MODEL PARAMETER (2 OF 2)

2. Number of Neighbors (the value of k)


• The best value depends on the data
• Larger values reduce the effect of noise but also make the boundaries between classes less distinct
• An “optimal” value can be found heuristically
• Cross-validation is often used to determine the best value for k and the distance measure
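The two parameters above can be seen at work in a minimal from-scratch k-NN classifier; a sketch with a fabricated toy dataset, using Euclidean distance as the similarity measure and a majority vote over the k neighbors:

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, class_label) pairs.
    """
    # Distance metric: Euclidean distance for numeric features
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # "Lazy learning": all of the work happens at prediction time
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Fabricated two-class dataset
train = [([1, 1], "A"), ([1, 2], "A"), ([2, 1], "A"),
         ([5, 5], "B"), ([6, 5], "B"), ([5, 6], "B")]
print(knn_predict(train, [2, 2], k=3))  # "A"
print(knn_predict(train, [4, 4], k=3))  # "B"
```

In practice, cross-validation over candidate values of k (and candidate distance measures) would replace the hard-coded k=3.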
APPLICATION CASE 5.4
Efficient Image Recognition and Categorization with k-NN
Questions for Discussion:
1. Why is image recognition/classification a worthy but
difficult problem?
2. How can k-NN be effectively used for image
recognition/classification applications?
NAÏVE BAYES METHOD FOR
CLASSIFICATION
• Naïve Bayes is a simple probability-based classification method
• Naïve: assumption of independence among the input variables
• Can use both numeric and nominal input variables
• Numeric variables need to be discretized
• Can be used for both regression and classification
• Naïve Bayes models can be developed very efficiently and effectively
• Using the maximum likelihood method
BAYES THEOREM
• Developed by Thomas Bayes (1701–1761)
• Determines conditional probabilities
• Given that X and Y are two events:
P(Y|X) = P(X|Y) × P(Y) / P(X)

• Go through the simple example in the book
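As a hedged numeric illustration of the theorem (the probabilities are invented for the example, not taken from the book):

```python
# Bayes theorem: P(Y|X) = P(X|Y) * P(Y) / P(X)
# Invented scenario: Y = a condition, X = a positive test result
p_y = 0.01              # prior probability P(Y)
p_x_given_y = 0.95      # likelihood P(X | Y)
p_x_given_not_y = 0.05  # likelihood P(X | not Y)

# P(X) via the law of total probability
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)

# Posterior probability P(Y | X)
p_y_given_x = p_x_given_y * p_y / p_x
print(round(p_y_given_x, 3))  # 0.161
```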


NAÏVE BAYES METHOD FOR
CLASSIFICATION (2 OF 2)
• Process of Developing a Naïve Bayes Classifier
• Training Phase
1. Obtain and pre-process the data
2. Discretize the numeric variables
3. Calculate the prior probabilities of all class labels
4. Calculate the likelihood for all predictor
variables/values
• Testing Phase
• Using the outputs of Steps 3 and 4 above, classify the new samples
• See the numerical example in the book…
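The training and testing phases above can be sketched for nominal inputs; a minimal illustration with a fabricated dataset (no discretization step, no smoothing of zero counts):

```python
from collections import Counter, defaultdict

def train_naive_bayes(records):
    """Training phase, Steps 3-4: priors and likelihoods from simple counts.

    `records` is a list of (feature_dict, class_label) pairs, nominal values.
    """
    class_counts = Counter(label for _, label in records)
    priors = {c: n / len(records) for c, n in class_counts.items()}
    likelihoods = defaultdict(Counter)  # (class, feature) -> value counts
    for features, label in records:
        for feat, value in features.items():
            likelihoods[(label, feat)][value] += 1
    return priors, likelihoods, class_counts

def classify(priors, likelihoods, class_counts, features):
    """Testing phase: pick the class maximizing prior * product of likelihoods."""
    best, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for feat, value in features.items():
            score *= likelihoods[(c, feat)][value] / class_counts[c]
        if score > best_score:
            best, best_score = c, score
    return best

# Fabricated training records
records = [({"outlook": "sunny"}, "no"), ({"outlook": "sunny"}, "no"),
           ({"outlook": "rainy"}, "yes"), ({"outlook": "rainy"}, "yes")]
model = train_naive_bayes(records)
print(classify(*model, {"outlook": "sunny"}))  # "no"
```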
APPLICATION CASE 5.5 (1 OF 2)
Predicting Disease Progress in Crohn’s Disease
Patients: A Comparison of Analytics Methods
Questions for Discussion:
1. What is Crohn’s disease and why is it important?
2. Based on the findings of this Application Case, what can you
tell about the use of analytics in chronic disease
management?
3. What other methods and data sets might be used to better
predict the outcomes of this chronic disease?
APPLICATION CASE 5.5 (2 OF 2)
Predicting Disease Progress in Crohn’s Disease
Patients: A Comparison of Analytics Methods

(Figure: Prediction Accuracy by Methodology, and Variable Importance)
BAYESIAN NETWORKS
• A tool for representing dependency structure in a graphical, explicit, and
intuitive way
• A directed acyclic graph whose nodes correspond to the variables and arcs
that signify conditional dependencies between variables and their possible
values
• Direction of the arc matters
• A partial causality link in student retention
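The idea of conditional dependencies along directed arcs can be illustrated with a tiny two-node network; the variables and probabilities below are invented, loosely echoing the student-retention theme:

```python
# Minimal two-node Bayesian network: Prepared -> Retained
# (invented conditional probability table; direction of the arc matters)
p_prepared = 0.7                 # P(Prepared)
p_retained_given = {True: 0.9,   # P(Retained | Prepared)
                    False: 0.4}  # P(Retained | not Prepared)

# Marginal P(Retained) via the law of total probability
p_retained = (p_retained_given[True] * p_prepared
              + p_retained_given[False] * (1 - p_prepared))
print(round(p_retained, 2))  # 0.75
```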
BAYESIAN NETWORKS
How can BN be constructed?
1. Manually
• By an engineer with the help of a domain expert
• Time demanding, expensive (for large networks)
• Experts may not even be available
2. Automatically
• Analytically …
• By learning/inducing the structure of the network from the historical data
• Availability of high-quality historical data is imperative
BAYESIAN NETWORKS
How can BN be constructed?
• Analytically
BAYESIAN NETWORKS
How can BN be constructed?
Tree Augmented Naïve (TAN) Bayes Network Structure
1. Compute the information function
2. Build the undirected graph
3. Build a spanning tree
4. Convert the undirected graph into a directed one
5. Construct a TAN model
BAYESIAN NETWORKS
• EXAMPLE: Bayesian Belief Network for Predicting Freshmen Student Attrition
ENSEMBLE MODELING (1 OF 3)
• Ensemble – combination of models (or model outcomes) for better results
• Why do we need to use ensembles:
• Better accuracy
• More stable/robust/consistent/reliable outcomes
• Reality: ensembles win competitions!
• Netflix $1M Prize competition
• Many recent competitions at Kaggle.com
• The Wisdom of Crowds
ENSEMBLE MODELING (2 OF 3)
Figure 5.19 Graphical Depiction of Model Ensembles for Prediction Modeling.
TYPES OF ENSEMBLE MODELING

Figure 5.20 Simple Taxonomy for Model Ensembles.


TYPES OF ENSEMBLE MODELING

Figure 5.20 Bagging-Type Decision Tree Ensembles.


TYPES OF ENSEMBLE MODELING

Figure 5.20 Boosting-Type Decision Tree Ensembles.


ENSEMBLE MODELING
• Variants of Bagging & Boosting (Decision Trees)
• Decision Trees Ensembles
• Random Forest
• Stochastic Gradient Boosting
• Stacking
• Stacked generalization or super learners
• Information Fusion
• Any number of any models
• Simple/weighted combining
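The simple/weighted combining listed above can be sketched as vote fusion over model outputs; the predictions below are fabricated placeholders standing in for the outputs of separately trained models:

```python
from collections import Counter

def majority_vote(predictions):
    """Simple combining: each model's class prediction counts equally."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Weighted combining: votes weighted, e.g., by each model's accuracy."""
    scores = Counter()
    for pred, weight in zip(predictions, weights):
        scores[pred] += weight
    return scores.most_common(1)[0][0]

# Hypothetical outputs of three individual classifiers for one case
preds = ["graduate", "graduate", "dropout"]
print(majority_vote(preds))                   # "graduate"
# A much more accurate third model can flip the fused decision
print(weighted_vote(preds, [0.2, 0.2, 0.9]))  # "dropout"
```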
TYPES OF ENSEMBLE MODELING
• STACKING
• INFORMATION FUSION
ENSEMBLES – PROS AND CONS
Table 5.9 Brief List of Pros and Cons of Model Ensembles Compared to Individual Models.

PROS (Advantages):
• Accuracy: Model ensembles usually result in more accurate models than individual models.
• Robustness: Model ensembles tend to be more robust against outliers and noise in the data set than individual models.
• Reliability (stability): Because of the variance reduction, model ensembles tend to produce more stable, reliable, and believable results than individual models.
• Coverage: Model ensembles tend to have a better coverage of the hidden complex patterns in the data set than individual models.

CONS (Shortcomings):
• Complexity: Model ensembles are much more complex than individual models.
• Computationally expensive: Compared to individual models, ensembles require more time and computational power to build.
• Lack of transparency (explainability): Because of their complexity, it is more difficult to understand the inner structure of model ensembles (how they do what they do) than individual models.
• Harder to deploy: Model ensembles are much more difficult to deploy in an analytics-based managerial decision-support system than single models.
APPLICATION CASE 5.6 (1 OF 3)
To Imprison or Not to Imprison: A Predictive
Analytics-Based DSS for Drug Courts
Questions for Discussion:
1. What are drug courts and what do they do for society?
2. What are the commonalities and differences between
traditional (theoretical) and modern (machine-learning) based
methods in studying drug courts?
3. Can you think of other social situations and systems for
which predictive analytics can be used?
APPLICATION CASE 5.6 (2 OF 3)
To Imprison or Not to Imprison: A Predictive
Analytics-Based DSS for Drug Courts
Methodology
APPLICATION CASE 5.6 (3 OF 3)
To Imprison or Not to Imprison: A Predictive
Analytics-Based DSS for Drug Courts
Prediction Accuracy

ANN: artificial neural networks; DT: decision trees; LR: logistic regression; RF: random forest; HE: heterogeneous ensemble; AUC: area under the curve; G: graduated; T: terminated
