Techniques for Predictive Modeling: Learning Objectives for Chapter 6


Chapter 6:

Techniques for Predictive Modeling

Learning Objectives for Chapter 6

1. Understand the concept and definitions of artificial neural networks (ANN)


2. Learn the different types of ANN architectures
3. Know how learning happens in ANN
4. Understand the concept and structure of support vector machines (SVM)
5. Learn the advantages and disadvantages of SVM compared to ANN
6. Understand the concept and formulation of k-nearest neighbor algorithm (kNN)
7. Learn the advantages and disadvantages of kNN compared to ANN and SVM

CHAPTER OVERVIEW

Predictive modeling is perhaps the most commonly practiced branch in data mining. It
allows decision makers to estimate what the future holds by means of learning from the
past. In this chapter, we study the internal structures, capabilities/limitations, and
applications of the most popular predictive modeling techniques, such as artificial neural
networks, support vector machines, and k-nearest neighbor. These techniques are capable
of addressing both classification- and regression-type prediction problems. Often, they
are applied to complex prediction problems where other techniques are not capable of
producing satisfactory results. In addition to these three (that are covered in this chapter),
other notable prediction modeling techniques include regression (linear or nonlinear),
logistic regression (for classification-type prediction problems), naive Bayes
(probabilistically oriented classification modeling), and different types of decision trees
(covered in Chapter 5).

CHAPTER OUTLINE

6.1 OPENING VIGNETTE: PREDICTIVE MODELING HELPS BETTER


UNDERSTAND AND MANAGE COMPLEX MEDICAL PROCEDURES
 Questions for the Opening Vignette
A. WHAT WE CAN LEARN FROM THIS VIGNETTE

6.2 BASIC CONCEPTS OF NEURAL NETWORKS
A. BIOLOGICAL AND ARTIFICIAL NEURAL NETWORKS
 Technology Insights 6.1: The Relationship Between
Biological and Artificial Neural Networks
 Application Case 6.1: Neural Networks Are Helping to
Save Lives in the Mining Industry
B. ELEMENTS OF ANN
1. Processing Elements
2. Network Structure
C. NETWORK INFORMATION PROCESSING
1. Input
2. Outputs
3. Connection Weights
4. Summation Function
5. Transformation (Transfer) Function
6. Hidden Layers
D. NEURAL NETWORK ARCHITECTURES
1. Kohonen’s Self-Organizing Feature Maps
2. Hopfield Networks
 Application Case 6.2: Predictive Modeling Is Powering the
Power Generators
 Section 6.2 Review Questions

6.3 DEVELOPING NEURAL NETWORK-BASED SYSTEMS


A. THE GENERAL ANN LEARNING PROCESS
B. BACKPROPAGATION
 Section 6.3 Review Questions

6.4 ILLUMINATING THE BLACK BOX OF ANN WITH SENSITIVITY


ANALYSIS
 Application Case 6.3: Sensitivity Analysis Reveals Injury
Severity Factors in Traffic Accidents
 Section 6.4 Review Questions

6.5 SUPPORT VECTOR MACHINES


 Application Case 6.4: Managing Student Retention with
Predictive Modeling
A. MATHEMATICAL FORMULATION OF SVMS
B. PRIMAL FORM
C. DUAL FORM
D. SOFT MARGIN
E. NONLINEAR CLASSIFICATION
F. KERNEL TRICK
 Section 6.5 Review Questions

6.6 A PROCESS-BASED APPROACH TO THE USE OF SVM

1. Numericizing the Data
2. Normalizing the Data
3. Selecting the Kernel Type and Kernel Parameters
4. Deploying the Model
A. SUPPORT VECTOR MACHINES VERSUS ARTIFICIAL NEURAL
NETWORKS
 Section 6.6 Review Questions

6.7 NEAREST NEIGHBOR METHOD FOR PREDICTION


A. SIMILARITY MEASURE: THE DISTANCE METRIC
B. PARAMETER SELECTION
1. Cross-Validation
 Application Case 6.5: Efficient Image Recognition and
Categorization with kNN
 Section 6.7 Review Questions

Chapter Highlights
Key Terms
Questions for Discussion
Exercises
Teradata University Network (TUN) and Other Hands-On Exercises
Team Assignments and Role-Playing Projects
Internet Exercises
 End of Chapter Application Case: Coors Improves Beer
Flavors with Neural Networks
 Questions for the Case
References

TEACHING TIPS/ADDITIONAL INFORMATION        


Predictive models grew out of research in pattern recognition and classification.
They form an important element of the new analytics movement. You can discuss these in
the “descriptive, predictive, prescriptive” framework. This chapter and Chapter 5 focus on
the predictive.
This chapter is pretty technical and gives an in-depth overview of artificial neural
networks (ANN), support vector machines (SVM), and the k-nearest neighbor (kNN)
algorithm. There is some rather dense mathematics and detailed design/development
material, which can be daunting for students. If you would rather use your finite
time on other topics, the chapter can be skipped without interfering with students’
ability to understand the chapters that follow.
That said, glossing over this material probably won’t do the students much good, so
try to go in depth if possible. If you find your time limited, you might want to
cover one of the three approaches in more depth. The easiest to explain is probably kNN.
But the most interesting for students is probably neural networks, since this is the
approach that emulates the human brain and that is the most frequently depicted in
popular media (for example, consider Data’s “brain” in the Star Trek series).
Where the book refers to artificial neurons “more or less resembling the structure
of” their biological counterparts in Section 6.2 (the first “content” section of the chapter),
students must understand that this does not mean a physical resemblance. An artificial
neuron is a computational construct, usually embodied in a software model. The ANN
equivalent of sending a signal to another biological neuron is placing data in a shared
location or sending a software message. The resemblance is functional, not physical. This
may seem obvious to an instructor, but is often not equally obvious to students seeing the
concept for the first time.
Be sure to discuss some common and comparative features between the three
approaches. Bring up the distinction between “classification” and “regression” problems,
as well as what it means to “train” a system in a supervised learning environment. Stress
the importance in all of these models of picking the right parameters, and discuss
approaches (such as cross-validation) for this. Students should be able to associate
“hyperplanes” with SVM, “similarity measures” with nearest-neighbor, and
“backpropagation” with neural networks. Also, a distinction should be clearly made
between feedforward and recurrent neural networks. Even if students can’t implement the
algorithms, knowing the terminology will help them conceptualize the different
approaches. Also, point out that in many of the application cases, multiple approaches can
be used. In fact, the opening vignette explicitly espouses the value of using and
comparing several algorithms for accuracy and efficiency.

ANSWERS TO END OF SECTION REVIEW QUESTIONS      


Section 6.1 Review Questions

1. Why is it important to study medical procedures? What is the value in predicting outcomes?

Demand for healthcare services is increasing because of the aging population, but
the supply side is having problems keeping up with the level and quality of
service. Healthcare systems ought to significantly improve their operational
effectiveness (doing the right thing, such as diagnosing and treating accurately)
and efficiency (doing it the right way, such as using the least amount of resources
and time). Clinical decision support systems that use the outcome of data mining
studies can support healthcare managers and/or medical professionals in making
accurate and timely decisions to optimally allocate resources in order to increase
the quantity and quality of medical services.

2. What factors do you think are the most important in better understanding and
managing healthcare? Consider both managerial and clinical aspects of
healthcare.

Healthcare systems ought to significantly improve their operational effectiveness
(doing the right thing, such as diagnosing and treating accurately) and efficiency
(doing it the right way, such as using the least amount of resources and time).
Effectiveness is probably more of a clinical concern, while efficiency is more of a
managerial concern.

3. What would be the impact of predictive modeling on healthcare and medicine? Can predictive modeling replace managerial or medical personnel?

Clinical decision support systems that use the outcome of data mining studies
(such as the ones presented in this case study) are shown to be useful and
reasonably accurate predictors, especially if used in combination. These are not
meant to replace healthcare managers and/or medical professionals. Rather, they
intend to support them in making accurate and timely decisions to optimally
allocate resources in order to increase the quantity and quality of medical
services. There still is a long way to go before we can see these decision aids
being used extensively in healthcare practices. Among others, there are
behavioral, ethical, and political reasons for this resistance to adoption. Maybe the
need and the government incentives for better healthcare systems will expedite
the adoption.

4. What were the outcomes of the study? Who can use these results? How can the
results be implemented?

The main outcome of this study was to show the efficacy of data mining in
predicting the outcome and in analyzing the prognostic factors of complex
medical procedures such as CABG surgery. The study showed that using a
number of prediction methods (as opposed to only one) in a competitive
experimental setting has the potential to produce better predictive as well as
explanatory results. SVM, ANN, and both C5 and CART decision trees were
used.

5. Search the Internet to locate two additional cases where predictive modeling is
used to understand and manage complex medical procedures.

Open-ended question; the answer will be determined by the search results each student selects.

Section 6.2 Review Questions

1. What is an ANN?

An ANN (artificial neural network) is a computer program that models a biological
neural network to learn to recognize patterns, and then recognize them in new data.

2. Explain the following terms: neuron, axon, and synapse.

A neuron is defined in the book as one of the cells in the human brain. More
generally, and as students may have learned in biology courses, it is an
electrically excitable cell in the nervous system of an animal.

An axon is the component of a neuron that sends signals to other cells.

A synapse is a junction between two neurons through which signals pass from one
neuron to the next, perhaps being altered en route.

3. How do weights function in an ANN?

Weights define the impact that a given input has on a neuron in the next layer. As
such, they embody what the network has learned so far. As a network learns, its
weights are adjusted.

4. What is the role of the summation and transformation function?

The summation function determines the total input to a neuron by calculating the
weighted sum of its individual input values. Its output is input to the
transformation (transfer) function. The transformation (or transfer) function
determines the output of a neuron from its input (i.e., output of the summation
function).

Many possible transformation functions exist. Most are sigmoid functions. (A
sigmoid function is a non-decreasing function whose value is in the range 0–1.) A
simple transformation function produces an output of 0 for inputs up to a
threshold value and an output of 1 for inputs above it.
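
To make this concrete, here is a minimal Python sketch of a single artificial neuron (the input values, weights, and bias are illustrative, not from the text): the summation function computes a weighted sum, and a logistic sigmoid serves as the transformation function.

```python
import math

def neuron_output(inputs, weights, bias):
    # Summation function: weighted sum of the inputs plus a bias term
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Transformation (transfer) function: logistic sigmoid, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-total))

# Illustrative values only
print(neuron_output([0.5, 0.9], [0.4, -0.7], bias=0.1))  # ~0.42
```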

5. What are the most common ANN architectures? How do they differ from each
other?

The most common ANN architectures include feedforward (multilayer perceptron
with backpropagation), associative memory, recurrent networks, Kohonen’s
self-organizing feature maps, and Hopfield networks. The simplest architectures
are feedforward networks, in which information flows unidirectionally from
input layer to hidden layers to output layer. More complex architectures are
recurrent; these have many connections in every direction between the layers
and neurons, creating a complex connection structure. Kohonen networks
(self-organizing maps, or SOMs) provide a way to represent multidimensional
data in much lower-dimensional spaces, usually one or two dimensions. One of
the most interesting aspects of SOMs is that they learn to classify data
without supervision (i.e., no output vectors). Hopfield networks are
interconnected networks of nonlinear neurons that are effective in solving
complex constraint optimization problems; each neuron is connected to every
other neuron within the network.

Section 6.3 Review Questions

1. List the nine steps in conducting a neural network project.

These are shown in the flowchart of Figure 6.9. After testing (step 8), it is
possible to return to a previous step:
Step 1: Collect data
Step 2: Separate (data) into training and test sets
Step 3: Define a network structure
Step 4: Select a learning algorithm
Step 5: Set parameters and values
Step 6: Initialize weights and start training
Step 7: Stop training, freeze the network weights
Step 8: Test the trained network
Step 9: Implementation: use the network with new cases
2. What are some of the design parameters for developing a neural network?

Some design parameters to consider include the type of network to employ, the
number of nodes (input, hidden, and output) and layers, the types of
transformation functions within each neuron, the original weight settings, and the
acceptable delta (error) level.

3. How does backpropagation learning work?

Backpropagation (short for back-error propagation) is the most widely used
supervised learning algorithm in neural computing. It is applied to a
feedforward network with one or more hidden layers. The learning is supervised
because a predetermined correct answer for each training pattern is given and
compared to the output the network produces. The difference between the correct answer and the output (known as
the delta value) is calculated, and weights of the output nodes are adjusted
accordingly. Weight adjustment also proceeds backward to the hidden layers;
hence the term “backpropagation.” This process repeats, continuously testing
against training patterns, until the delta value is sufficiently low.
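
The loop can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the book's implementation: the XOR data, network size (2 inputs, 4 hidden units, 1 output), learning rate, and epoch count are all assumed for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # initialize weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5                                          # learning rate (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    h = sigmoid(X @ W1 + b1)            # forward pass through hidden layer
    out = sigmoid(h @ W2 + b2)          # network output
    error = y - out                     # delta: desired minus actual
    d_out = error * out * (1 - out)     # adjust output layer first...
    d_h = (d_out @ W2.T) * h * (1 - h)  # ...then propagate error backward
    W2 += lr * h.T @ d_out; b2 += lr * d_out.sum(axis=0)
    W1 += lr * X.T @ d_h;   b1 += lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```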

4. Describe different types of neural network software available today.

Many commercial ANN software products function like software shells. They
provide a set of standard architectures, learning algorithms, and parameters, along
with the ability to manipulate the data. Some development tools can support up to
several dozen network paradigms and learning algorithms. Most of the leading
data mining tools (e.g., SAS Enterprise Miner, IBM SPSS Modeler, Statistica
Data Miner) include neural network learning algorithms. Some specialized neural
network products include California Scientific (BrainMaker), NeuralWare,
NeuroDimension Inc., Ward Systems Group (Neuroshell), and Megaputer. Others
are implemented as spreadsheet add-ins. In addition, there are class libraries and
APIs for languages such as Java and C++. Mathematical applications such as
MATLAB also include neural network algorithms.

5. How are neural networks implemented in practice when the training/testing is complete?

After testing and training, the network is deployed for use on unknown new cases.
It might be used as a stand-alone system or as part of another software system
where new input data will be presented to it and its output will be a recommended
decision. At this point the recommendations provided by the neural network are
considered to be valid, because it has been extensively trained on training data
and tested with test data.

Section 6.4 Review Questions

1. What is the so-called “black-box” syndrome?

ANNs are typically thought of as black boxes: capable of solving complex
problems, but unable to explain how they arrive at their solutions. This
phenomenon is commonly referred to as the “black-box” syndrome.

2. Why is it important to be able to explain an ANN’s model structure?

It is important to be able to explain a model’s “inner being”; such an
explanation offers assurance that the network has been properly trained and
will behave as desired once deployed in a business intelligence environment.
Such a need to “look under the hood” might be attributable to a relatively
small training set (as a result of the high cost of data acquisition) or a very
high liability in case of a system error.

3. How does sensitivity analysis work?

Sensitivity analysis is a method for extracting the cause-and-effect
relationships among the inputs and the outputs of a trained neural network
model. In the process of performing sensitivity analysis, the trained neural
network’s learning capability is disabled so that the network weights are not
affected.
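
One common variant is perturbation-based sensitivity analysis. Here is a hedged sketch (the perturbation size and the linear stand-in model are assumptions for illustration; in practice `predict` would be the frozen, trained network's scoring function):

```python
import numpy as np

def sensitivity(predict, X, delta=0.1):
    """Mean absolute change in output when each input is perturbed by delta."""
    base = predict(X)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += delta                       # perturb input j only
        scores.append(np.mean(np.abs(predict(Xp) - base)))
    return scores                               # larger = more influential

# Stand-in for a trained network (illustrative linear model)
predict = lambda X: X @ np.array([0.7, 0.1, -0.4])
X = np.random.default_rng(1).normal(size=(100, 3))
print(sensitivity(predict, X))                  # input 0 scores highest
```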

4. Search the Internet to find other ANN explanation methods.

Open-ended answer, depending on what a student finds.

Section 6.5 Review Questions

1. How do SVM work?

Support vector machines (SVM) are supervised learning methods that produce
input-output functions from a set of labeled training data. Both classification
functions and regression functions are possible in SVMs, and these can be either
linear or nonlinear functions. For example, given a classification-type prediction
problem, linear classifiers (hyperplanes) can separate the data into multiple
subsections, each representing one of the classes. (In an n-dimensional input
space, a separating hyperplane is itself an (n−1)-dimensional subspace.)
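
As a concrete illustration (scikit-learn is an assumption here; the text names no specific tool, and the toy data are made up), a linear SVM classifier can be trained and queried in a few lines:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])           # two labeled classes

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.predict([[2, 2], [7, 7]]))       # -> [0 1]
print(clf.support_vectors_)                # the points that define the margin
```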

2. What are the advantages and disadvantages of SVM?

SVMs are popular because of their superior predictive power and their theoretical
foundation. SVMs have demonstrated highly competitive performance in
numerous real-world prediction problems. A significant advantage of SVMs is
that while ANNs may suffer from multiple local minima, the solutions to SVMs
are global and unique. Two more advantages of SVMs are that they have a simple
geometric interpretation and give a sparse solution. The reason that SVMs often
outperform ANNs in practice is that they successfully deal with the overfitting
problem, which is a big issue with ANNs. One disadvantage of SVMs is the
selection of the kernel type and kernel function parameters. A second and
perhaps more important limitation is speed and size, in both the training and
testing cycles. Model building in SVMs involves complex and time-
demanding calculations. From the practical point of view, perhaps the most
serious problem with SVMs is the high algorithmic complexity and extensive
memory requirements of the required quadratic programming in large-scale tasks.

3. What is the meaning of “maximum margin hyperplanes”? Why are they important
in SVM?

Although many linear classifiers (hyperplanes) can separate the data into multiple
subsections, only one hyperplane achieves the maximum separation between the
classes. This is the hyperplane whose distance from the nearest data points is
maximized. The trick is to find the parallel hyperplanes that separate the classes
and whose margin of distance is at a maximum. The assumption is that the larger
the margin or distance between these parallel hyperplanes, the better the
generalization power of the classifier.

4. What is “kernel trick”? How is it used in SVM?

The kernel trick is a method for converting a linear classifier algorithm into a
nonlinear one by using a nonlinear function to map the original observations into

a higher-dimensional space; this makes a linear classification in the new space
equivalent to nonlinear classification in the original space. This is what enables
the general hyperplane approach to SVMs (which are inherently linear) to solve
nonlinear classification problems. Common kernel types are polynomial, radial
basis function (RBF), Gaussian RBF, and sigmoid.
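
A short sketch of the Gaussian RBF kernel shows the idea: the similarity that would hold in the implicit higher-dimensional space is computed directly from the original vectors, so the mapping is never constructed explicitly (`gamma` is an assumed tuning parameter):

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2); no explicit mapping is built
    return np.exp(-gamma * np.sum((x - z) ** 2))

print(rbf_kernel(np.array([1.0, 2.0]), np.array([1.5, 1.0])))  # ~0.535
```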

Section 6.6 Review Questions

1. What are the main steps and decision points in developing a SVM model?

First, numericize the data. Each data instance must be represented as a vector of
numeric values, including the categorical variable (the classification). Second,
normalize (scale) the data. This prevents larger-magnitude attributes from
dominating the others during the learning process. Next, select the kernel type and
the kernel parameters. Finally, deploy the model.
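
A hedged sketch of these four steps with scikit-learn (an assumed toolchain; the data and parameter values are illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 1: numericize -- e.g., a categorical outcome {"no", "yes"} becomes {0, 1}
y = np.array([0, 0, 1, 1, 0, 1])
X = np.array([[25, 40000], [30, 52000], [45, 90000],
              [50, 110000], [28, 43000], [41, 88000]], dtype=float)

# Steps 2-3: normalize the attributes, then select kernel type and parameters
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X, y)

# Step 4: deploy -- score a new, unseen case
print(model.predict([[35, 60000]]))
```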

2. How do you determine the optimal kernel type and kernel parameters?

You can do this experimentally, trying different ones out and comparing results.
Often the RBF is a good start. Deciding on the best parameters for a kernel
involves a parameter search method, such as cross-validation or grid search.
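
For example (scikit-learn and the Iris dataset are assumptions for illustration), a cross-validated grid search over the RBF kernel's C and gamma might look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},  # illustrative grid
    cv=5,                                  # 5-fold cross-validation
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```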

3. Compared to ANN, what are the advantages of SVM?

A significant advantage of SVMs is that while ANNs may suffer from multiple
local minima, the solutions to SVMs are global and unique. Two more advantages
of SVMs are that they have a simple geometric interpretation and give a sparse
solution. The reason that SVMs often outperform ANNs in practice is that they
successfully deal with the overfitting problem, which is a big issue with ANNs.

4. What are the common application areas for SVM? Conduct a search on the
Internet to identify popular application areas and specific SVM software tools
used in those applications.

Common applications of SVM include text and hypertext categorization, image
classification, protein classification for medical science, and hand-written
character recognition. Many of the same software products that include neural
network algorithms also include SVM algorithms (e.g., MATLAB, SPSS).
(Student answers will vary.)

Section 6.7 Review Questions

1. What is special about the kNN algorithm?

The k-nearest neighbor algorithm is among the simplest of all machine-learning
algorithms. It is easy to understand (and explain to others) what it does and how it
does it.

2. What are the advantages and disadvantages of kNN as compared to ANN and
SVM?

Compared to both ANN and SVM, the k-nearest neighbor algorithm is very
simple to learn and implement. But it is a lazy learner: it approximates the
target function only locally and defers all computation until classification
time, which can make prediction slow. In addition, the accuracy of the kNN algorithm can
be significantly different with different values of k. Furthermore, the predictive
power of the kNN algorithm degrades with the presence of noisy, inaccurate, or
irrelevant features.

3. What are the critical success factors for a kNN implementation?

One critical factor is selection of the best similarity metric for determining what is
a “nearest” neighbor. A second is the selection of the correct parameter (i.e., the k
value). This can be done using cross-validation.
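
A short sketch of that parameter search (scikit-learn and the Iris dataset are assumed for illustration): each candidate k is scored by 5-fold cross-validation, and the best-scoring k is kept.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 9):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k}: mean accuracy {acc:.3f}")
```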

4. What is a similarity (or distance) measure? How can it be applied to both numerical and nominal valued variables?

A similarity measure is a mathematically calculable distance metric. Given a new
case, kNN makes predictions based on the outcome of the k neighbors closest in
distance to that point. Therefore, to make predictions with kNN, we need to define
a metric for measuring the distance between the new case and the cases from the
examples. Two popular approaches are Euclidean and rectilinear. Both of these
assume numerical data points. There are ways to measure distance for non-
numerical (nominal) data as well. In the simplest case, for a multi-value nominal
variable, if the value of that variable for the new case and that for the example
case are the same, the distance would be zero, otherwise one. In cases such as text
classification, more sophisticated metrics exist, such as the overlap metric (or
Hamming distance).
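
A small sketch of these measures (the example vectors are made up):

```python
import numpy as np

a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 0.0, 3.0])
euclidean   = np.sqrt(np.sum((a - b) ** 2))   # straight-line distance: ~2.236
rectilinear = np.sum(np.abs(a - b))           # Manhattan/city-block: 3.0

# Nominal variables: distance 0 if the values match, 1 otherwise; summing
# these per-attribute distances gives the overlap (Hamming) metric
s, t = ["red", "large", "round"], ["red", "small", "round"]
hamming = sum(u != v for u, v in zip(s, t))   # 1
print(euclidean, rectilinear, hamming)
```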

5. What are the common applications of kNN?

Image recognition, DNA sequencing, pattern recognition, statistical
classification, Internet marketing, and plagiarism detection are some of the
possible applications. (Student answers will vary.)

ANSWERS TO APPLICATION CASE QUESTIONS FOR DISCUSSION  


Application Case 6.1: Neural Networks Are Helping to Save Lives in the Mining
Industry

1. How did neural networks help save lives in the mining industry?

The Council for Scientific and Industrial Research (CSIR) in South Africa
developed a device with an embedded neural network that assists any miner in
making an objective decision when determining the integrity of the hanging wall.
This helps prevent death and injury from rock falls, a common danger to miners.

2. What were the challenges, the proposed solution, and the obtained results?

In the mining industry, most of the underground injuries and fatalities are due to
rock falls (i.e., fall of hanging wall/roof). The method that has been used for many
years in the mines when determining the integrity of the hanging wall is to tap the
hanging wall with a sounding bar and listen to the sound emitted. An experienced
miner can differentiate an intact/solid hanging wall from a detached/loose hanging
wall by the sound that is emitted. But this method is subjective. The proposed
solution is to provide miners with a device that uses a trained neural network to
record and classify sounds to identify a hanging wall as either intact or detached.
The multilayer perceptron-type ANN architecture that was built achieved better
than 70 percent prediction accuracy on sample data. At this point, the system is in
its prototype-testing phase.

3. What was their implementation strategy? Why is it important to produce results as early as possible in data mining studies?

The implementation was done using NeuroSolutions, a popular artificial neural
network modeling software package developed by NeuroDimensions, Inc., to
develop the classification-type prediction models. Producing results early is
important because it validates the approach and helps sustain stakeholder
support for the remainder of the study.

Application Case 6.2: Predictive Modeling Is Powering the Power Generators

1. What are the key environmental concerns in the electric power industry?

Even though some energy-generation methods are favored over others, all forms
of electricity generation have positive and negative aspects. Some are
environmentally favored but are economically unjustifiable; others are
economically superior but environmentally prohibitive. In a market economy, the
options with fewer overall costs are generally chosen above all other sources. It is
not clear yet which form can best meet the necessary demand for electricity
without permanently damaging the environment. Current trends indicate that
increasing the shares of renewable energy and distributed generation from mixed
sources has the promise of reducing/balancing environmental and economic risks.

2. What are the main application areas for predictive modeling in the electric power
industry?

Predictive modeling can be used to optimize operational parameters to produce
cleaner combustion and more stable flame temperatures. Another application is to
predict problems, such as failures or maintenance issues, before they happen.
Modeling can also be used to reduce NOx emissions.

3. How was predictive modeling used to address a variety of problems in the electric
power industry?

See answer to question #2.

Application Case 6.3: Sensitivity Analysis Reveals Injury Severity Factors in Traffic
Accidents

1. How does sensitivity analysis shed light on the black box (i.e., neural networks)?

Sensitivity analysis techniques provide a clear interpretation of how a neural
network does what it does; that is, specifically how (and to what extent) the
individual inputs factor into the generation of specific network output. This
process extracts the cause-and-effect relationships among the inputs and the
outputs of a trained neural network model.

2. Why would someone choose to use a black-box tool like neural networks over
theoretically sound, mostly transparent statistical tools like logistic regression?

ANNs are known to be superior in capturing highly nonlinear, complex
relationships between predictor and target variables; assuming a linear
relationship, as more transparent statistical tools do, is often an
oversimplification of the problem.

3. In this case, how did neural networks and sensitivity analysis help identify injury-
severity factors in traffic accidents?

ANN and sensitivity analysis helped estimate the significance of the crash factors
on the level of injury severity sustained by the driver. This study was a two-step
process. In the first step, the testers developed a series of prediction models (one
for each injury severity level) to capture the in-depth relationships between the
crash-related factors and a specific level of injury severity. In the second step,
they conducted sensitivity analysis on the trained neural network models to
identify the prioritized importance of crash-related factors as they relate to
different injury severity levels.

The study revealed that the variable seatbelt use was the most important
determinant for predicting higher levels of injury severity, but it was one of
the least significant predictors for lower levels of injury severity. Other
interesting findings involved gender (a good predictor for low injury severity,
but not for high) and age (vice versa).

Application Case 6.4: Managing Student Retention with Predictive Modeling

1. Why is attrition one of the most important issues in higher education?

Student attrition has become one of the most challenging problems for decision
makers in academic institutions. In spite of all of the programs and services to
help retain students, according to the U.S. Department of Education, Center for
Educational Statistics (nces.ed.gov), only about half of those who enter higher
education actually graduate with a bachelor’s degree. High rates of student
attrition usually result in loss of financial resources, lower graduation rates, and
inferior perception of the school in the eyes of all stakeholders.

2. How can predictive analytics (ANN, SVM, and so forth) be used to better manage
student retention?

An alternative (or complementary) approach to traditional survey-based
retention research is an analytic approach that uses the data commonly found
in institutional databases. Educational institutions routinely collect a broad
range of information about their students, including demographics, educational
background, social involvement, socioeconomic status, and academic progress.
Using institutional data, prediction models can be developed to accurately identify
the students at risk of dropout, so that limited resources (people, money, time,
etc., at an institution’s student success center) can be optimally used to retain
most of them.

3. What are the main challenges and potential solutions to the use of analytics in
retention management?

In order to meet the challenges cited in the answer to #2, the main goals of
analytic studies in this area are to (1) develop models to correctly identify the
freshman students who are most likely to drop out after their freshman year, and
(2) identify the most important variables by applying sensitivity analyses on
developed models. The model building approach will involve the same steps you
usually perform in predictive analytics tasks, including data
collection/consolidation/preprocessing, cross-validation for parameter selection,
use of various algorithms (ANN, SVM, etc.), and sensitivity analysis.

Application Case 6.5: Efficient Image Recognition and Categorization with kNN

1. Why is image recognition/classification a worthy but difficult problem?

Application areas of image recognition and categorization range from agriculture
to homeland security, personalized marketing to environmental protection. Image
recognition is an integral part of an artificial intelligence field called computer
vision. While the field of visual recognition and category recognition has been
progressing rapidly, much remains to be done to reach human-level performance.
Current approaches are capable of dealing with only a limited number of
categories (100 or so categories) and are computationally expensive.

2. How can kNN be effectively used for image recognition/classification applications?

kNN classifiers are natural in this setting and have computational advantages
over SVMs, but they suffer from high variance (in the bias-variance
decomposition) when sampling is limited. By combining kNN with SVM, you can
improve performance while maintaining the computational advantage. Another
possible hybrid combines kNN with the naive Bayes algorithm.

ANSWERS TO END OF CHAPTER QUESTIONS FOR DISCUSSION   

1. Discuss the evolution of ANN. How have biological networks contributed to the
development of artificial networks? How are the two networks similar?

Research on artificial neural networks (ANN) started more than half a century ago.
McCulloch and Pitts (1943) introduced a simple model of a binary artificial
neuron that captured some functions of biological neurons. Using information-
processing machines to model the brain, McCulloch and Pitts built their neural
network model using a large number of interconnected artificial binary neurons.
With this foundation, neural network research became quite popular in the late
1950s and early 1960s. The introduction of new network topologies, new activation
functions, and new algorithms, as well as progress in neuroscience and cognitive
science have influenced recent ANN research to a great extent. Advances in
theory and methodology have overcome many obstacles that hindered neural
network research a few decades ago. Evidenced by the appealing results of
numerous studies, neural networks are gaining in acceptance and popularity.

The functional aspects of biological networks have contributed to the elementary
development of ANN, although the intricacies of biological neural networks
cannot be replicated artificially. ANNs have far fewer neurons than biological
networks. (See 6.2 Basic Concepts of Neural Networks for more details.)

2. What are the major concepts related to network information processing in ANN?
Explain the summation and transformation functions and their combined effects
on ANN performance.

The major concepts that are related to network information processing in ANN
are inputs, outputs, connection weights, summation function, transformation
(transfer) function, and hidden layers.

The summation and transformation functions use a neuron’s inputs to create its
output. First, the summation function aggregates the inputs into a single value
using a weighted sum. Second, the transformation function converts that summed
value to an output value, generally between 0 and 1. (See 6.2 Basic Concepts of
Neural Networks – Network Information Processing for more details.)

3. Discuss the common ANN architectures. What are the main differences between
Kohonen’s self-organizing feature maps and Hopfield networks?

There are several neural network architectures—feedforward, associative memory,
recurrent networks, Kohonen’s self-organizing feature maps, and Hopfield
networks. In a generic feedforward network, information flows unidirectionally
from input layer to hidden layers to output layer. In contrast, a recurrent
neural network architecture has many connections in every direction between the
layers and neurons, creating a complex connection structure.

Kohonen’s Self-Organizing Feature Maps (SOM) provide a way to represent
multidimensional data in much lower-dimensional spaces, usually one or two
dimensions. In such a case, an output vector is usually absent, and the SOM
learns to classify data without any supervision. Therefore, the SOM is commonly
used for clustering tasks, where cases may be assigned to an arbitrary number
of natural groups. Figure 6.8a in the text illustrates a very small Kohonen
network, which consists of 4 × 4 nodes connected to the input layer (with three
inputs), representing a two-dimensional vector.

Hopfield networks are highly interconnected networks of nonlinear neurons that
can solve complex computational problems. One of their major advantages is that
their structure can be realized on an electronic circuit board, possibly on a
VLSI (very-large-scale integration) circuit, to be used as an online solver
with a parallel-distributed process. Architecturally, a general Hopfield
network is a single large layer of neurons with total interconnectivity; that
is, each neuron is connected to every other neuron within the network (see
Figure 6.8b). Ultimately, the architecture of a neural network model is driven
by the task it is intended to carry out. For instance, neural network models
have been used as classifiers, as forecasting tools, as customer segmentation
mechanisms, and as general optimizers.

4. Explain the steps in neural network-based systems development. What procedures are involved in the backpropagation learning algorithm, and how do they work?

The steps to develop a neural network-based system are as follows:
1. Collect, organize, and format the data

2. Separate data into training, validation, and testing sets
3. Decide on a network architecture and structure
4. Select a learning algorithm
5. Set network parameters and initialize their values
6. Initialize weights and start training (and validation)
7. Stop training, freeze the network weights
8. Test the trained network
9. Deploy the network for use on unknown new cases.

The learning algorithm includes the following procedures:
1. Initialize weights with random values and set other parameters.
2. Read in the input vector and the desired output.
3. Compute the actual output via the calculations, working forward through the
layers.
4. Compute the error.
5. Change the weights by working backward from the output layer through the
hidden layers.

This procedure is repeated for the entire set of input vectors until the desired
output and the actual output agree within some predetermined tolerance. Given
the calculation requirements of each iteration, a large network can take a very
long time to train; therefore, in one variation, a set of cases is run forward
and an aggregated error is fed backward to speed up learning.

5. A building society uses a neural network to predict the creditworthiness of
mortgage applicants. There are two output nodes: one for yes (1=yes, 0=no) and
one for no (1=no, 0=yes). If an applicant scores 0.80 for the “yes” output node
and 0.39 for the “no” output node, what will be the outcome of the application?
Would the applicant be a good credit risk?

The score for “yes” is stronger than the score for “no.” The factors that led to the
scores are unknown, but the two scores can be expected to be independent of each
other and may be based on different combinations of inputs. With a simple
threshold value of 0.5 used to interpret each output, this network would affirm
creditworthiness (0.80 ≥ 0.5) and deny non-creditworthiness (0.39 < 0.5).

The ANN output suggests that the applicant is probably a good (though perhaps
not outstanding) credit risk. The relatively high score for non-creditworthiness
suggests a possible problem in the applicant’s background that should be looked
into further before credit is granted.
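
A tiny sketch of that threshold reading (0.5 is the illustrative cutoff discussed above):

```python
yes_score, no_score = 0.80, 0.39

yes_fires = yes_score >= 0.5   # True: creditworthiness affirmed
no_fires = no_score >= 0.5     # False: non-creditworthiness not affirmed
print("likely good risk" if yes_fires and not no_fires else "review further")
```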

6. Stock markets can be unpredictable. Factors that may cause changes in stock
prices are only imperfectly known, and attempts to understand them have had
limited success. Would ANN be a viable solution? Compare and contrast ANN with
other decision support technologies.

The factors that lead to changes in stock prices are imperfectly known. An ANN
might be able to find them in a mass of data because it has no preconceived
notions about what they should be. However, an ANN could fail in this
application if it cannot identify the relevant factors, or if it identifies a set of
factors that would have predicted stock movements during one time period (when
the market was driven by one set of factors) but are not useful in predicting them
in another period (when it is driven by different factors). Attempting to use factors
derived from analysis of an earlier period to guide investments in a later one could
be a recipe for bankruptcy.

Other decision support technologies are driven by human guidance in some way
and rely ultimately on human decision makers to identify the relevant factors.

ANSWERS TO END OF CHAPTER APPLICATION CASE QUESTIONS  


1. Why is beer flavor important to Coors’ profitability?

Beer flavor is a key factor in customer preferences and buying behavior.

2. What is the objective of the neural network used at Coors?

To determine the link between the chemical composition of a beer, which can be
measured, and its flavor, which cannot be measured but which consumers can
detect and care about.

3. Why were the results of the Coors neural network initially poor, and what was
done to improve the results?

The results were initially poor for two reasons. First, the network was only trained
with one type of beer, so the variation in its inputs was low. Second, it was only
trained for one flavor factor, so its performance was impacted by the inputs that
had no impact on that factor but whose variations within the training sample
created distracting “noise.”

To improve the results, Coors trained the neural network using a wider variety of
products and more combinations of inputs.

4. What benefits might Coors derive if this project is successful?

If this project is successful, Coors might ultimately be able to control the flavors
of its beers via their chemical composition, which could be monitored
automatically and presumably controlled during the brewing process. Being able
to do this depends, of course, on more than knowing the relationships between the
chemicals in beer and its flavor.

5. What modifications would you provide to improve the results of beer flavor
prediction?

Based on the next-to-last paragraph of the case, I would refine the sensitivity of
the instrumentation, measure a larger number of flavor-active compounds, and
measure the factors that contribute to mouth-feel and the beer’s physical
characteristics. Whether this is practical, whether (if practical) it is
cost-effective, and whether it would result in a more effective beer production
process than Coors can achieve with experienced, professional brew masters are
(as far as we can tell from this case) still open questions.
