Lecture 1

INT234: Predictive Analytics
By: Dr. Avinash Kaur (Associate Professor)

Overview of Predictive Analytics
• A small direct response company had developed dozens of programs in
cooperation with major brands to sell books and DVDs. These affinity
programs were very successful, but required considerable up-front work
to develop the creative content and determine which customers, already
engaged with the brand, were worth the significant marketing spend to
purchase the books or DVDs on subscription. Typically, they first
developed test mailings on a moderately sized sample to determine if the
expected response rates were high enough to justify a larger program.
What Is Analytics?
• Analytics is the process of using computational methods to
discover and report influential patterns in data.
• The goal of analytics is to gain insight and often to affect
decisions. Data is necessarily a measure of historic
information so, by definition, analytics examines historic
data.
• The term itself rose to prominence in 2005, in large part
due to the introduction of Google Analytics.
• Nevertheless, the ideas behind analytics are not new at all
but have been represented by different terms throughout
the decades, including cybernetics, data analysis, neural
networks, pattern recognition, statistics, knowledge
discovery, data mining, and now even data science.
What Is Predictive Analytics?
• Predictive analytics is the process of
discovering interesting and meaningful
patterns in data.
• It draws from several related disciplines, some
of which have been used to discover patterns
in data for more than 100 years, including
pattern recognition, statistics, machine
learning, artificial intelligence, and data
mining.
What differentiates predictive analytics from
other types of analytics?
• First, predictive analytics is data-driven, meaning
that algorithms derive key characteristics of the
models from the data itself rather than from
assumptions made by the analyst.
• Second, predictive analytics algorithms automate
the process of finding the patterns from the data.
Powerful induction algorithms not only discover
coefficients or weights for the models, but also
the very form of the models.
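As a minimal sketch of what "data-driven" means, the Python snippet below fits a straight line by ordinary least squares. The mailing sizes and response counts are hypothetical values invented for illustration, and the function name fit_line is our own. Note that this shows only the first half of the point above: the coefficients come from the data, but the linear form is still chosen by the analyst.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: test mailing size (thousands) and responses received.
mail_sizes = [1, 2, 3, 4, 5]
responses = [12, 22, 32, 42, 52]

# Both coefficients are derived from the data, not set by hand.
intercept, slope = fit_line(mail_sizes, responses)
print(f"responses = {intercept:.1f} + {slope:.1f} * mail_size")
```

An induction algorithm such as a decision tree goes one step further and also chooses the structure of the model from the data.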
Supervised vs. Unsupervised Learning
• Algorithms for predictive modeling are often
divided into two groups: supervised learning
methods and unsupervised learning methods.
Supervised learning
• In supervised learning models, the supervisor is the
target variable, a column in the data representing
values to predict from other columns in the data.
• The target variable is chosen to represent the answer
to a question the organization would like to answer or
a value unknown at the time the model is used that
would help in decisions.
• Sometimes supervised learning is also called predictive
modeling.
• The primary predictive modeling algorithms are
classification for categorical target variables or
regression for continuous target variables.
• Examples of target variables include whether
a customer purchased a product, the amount
of a purchase, if a transaction was fraudulent,
if a customer stated they enjoyed a movie,
how many days will transpire before the next
gift a donor will make, if a loan defaulted, and
if a product failed.
• Records without a value for the target variable
cannot be used in building predictive models.
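The difference between classification and regression can be sketched with one simple method, k-nearest neighbors, written here in plain Python. The customer records, column meanings, and function names are hypothetical and chosen only for illustration. The same inputs are paired first with a categorical target (purchased or not) and then with a continuous target (purchase amount).

```python
from collections import Counter
from statistics import mean

def nearest_targets(rows, query, k):
    """Return the targets of the k rows whose inputs are closest to query."""
    def distance(features):
        return sum((a - b) ** 2 for a, b in zip(features, query))
    ranked = sorted(rows, key=lambda row: distance(row[0]))
    return [target for _, target in ranked[:k]]

def classify(rows, query, k=3):
    """Categorical target: majority vote among the nearest neighbors."""
    return Counter(nearest_targets(rows, query, k)).most_common(1)[0][0]

def regress(rows, query, k=3):
    """Continuous target: average of the nearest neighbors' values."""
    return mean(nearest_targets(rows, query, k))

# Hypothetical inputs: (age, prior_purchases). Targets: purchased, amount.
purchase_rows = [
    ((25, 0), "no"), ((30, 1), "no"), ((35, 1), "no"),
    ((45, 6), "yes"), ((50, 8), "yes"), ((55, 7), "yes"),
    ((40, 3), None),  # no target value recorded
]
amount_rows = [
    ((25, 0), 10.0), ((30, 1), 20.0), ((35, 1), 30.0),
    ((45, 6), 100.0), ((50, 8), 120.0), ((55, 7), 110.0),
]

# Records without a target value cannot be used for training.
labeled = [row for row in purchase_rows if row[1] is not None]

new_customer = (48, 7)
predicted_label = classify(labeled, new_customer)
predicted_amount = regress(amount_rows, new_customer)
print(predicted_label, predicted_amount)
```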
Unsupervised learning
• Unsupervised learning, sometimes called
descriptive modeling, has no target variable.
• The inputs are analyzed and grouped or
clustered based on the proximity of input values
to one another.
• Each group or cluster is given a label to indicate
which group a record belongs to.
• In some applications, such as in customer
analytics, unsupervised learning is just called
segmentation because of the function of the
models (segmenting customers into groups).
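A minimal sketch of this kind of grouping is basic k-means clustering, shown below in plain Python. The customer values, the starting centers, and the function names are assumptions made for illustration. No target column is involved: each record is labeled only by which cluster center its inputs lie closest to.

```python
from statistics import mean

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, centers, iterations=10):
    """Basic k-means: assign each point to its nearest center, then move
    each center to the mean of the points assigned to it."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centers)),
                key=lambda i: squared_distance(p, centers[i]))
            for p in points
        ]
        for i in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = tuple(mean(dim) for dim in zip(*members))
    return labels, centers

# Hypothetical customers: (annual_spend, orders_per_year).
customers = [
    (100, 2), (120, 3), (110, 2),
    (900, 20), (950, 22), (1000, 25),
]

# Start from two of the records as initial centers (an arbitrary choice).
labels, centers = k_means(customers, [customers[0], customers[3]])
print(labels)
```

In a customer analytics setting, the two resulting labels would be read as segments, for example occasional and frequent buyers.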
Predictive Analytics vs. Statistics
• Predictive analytics and statistics have
considerable overlap, with some statisticians
arguing that predictive analytics is, at its core, an
extension of statistics.
• Predictive modelers, for their part, often use
algorithms and tests common in statistics as a
part of their regular suite of techniques,
sometimes without applying the diagnostics most
statisticians would apply to ensure the models
are built properly.
Machine learning
• Machine learning is an application of
artificial intelligence (AI) that provides
systems the ability to automatically learn and
improve from experience without being
explicitly programmed.
• Machine learning focuses on the
development of computer programs that can
access data and use it to learn for themselves.
Steps to apply machine learning to
your data
1. Collecting data: Whether the data is written
on paper, recorded in text files and
spreadsheets, or stored in an SQL database,
you will need to gather it in an electronic
format suitable for analysis. This data will
serve as the learning material an algorithm
uses to generate actionable knowledge.
2. Exploring and preparing the data:
The quality of any machine learning project is based largely
on the quality of the data it uses.
This step in the machine learning process tends to require a
great deal of human intervention.
An often-cited statistic suggests that 80 percent of the
effort in machine learning is devoted to data.
Much of this time is spent learning more about the data
and its nuances during a practice called data exploration.
3. Training a model on the data:
By the time the data has been prepared for analysis, you are
likely to have a sense of what you are hoping to learn from
the data.
The specific machine learning task will inform the selection of an
appropriate algorithm, and the algorithm will represent the
data in the form of a model.
4. Evaluating model performance:
Because each machine learning model results in a
biased solution to the learning problem, it is
important to evaluate how well the algorithm
learned from its experience.
Depending on the type of model used, you might
be able to evaluate the accuracy of the model
using a test dataset, or you may need to develop
measures of performance specific to the
intended application.
5. Improving model performance:
If better performance is needed, it becomes necessary to
utilize more advanced strategies to augment the
performance of the model. Sometimes, it may be
necessary to switch to a different type of model
altogether.
You may need to supplement your data with additional
data, or perform additional preparatory work as in step
two of this process.
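The five steps above can be sketched end to end in a few lines of plain Python. Everything here is an assumption made for illustration: the records are hypothetical, the hold-out split is fixed rather than random, and the two models (a majority-class baseline and a single-threshold rule) are deliberately simple stand-ins for real algorithms.

```python
from collections import Counter
from statistics import mean

# Step 1. Collect: hypothetical (hours_in_service, failed) records, already
# gathered into an in-memory list instead of a file or SQL database.
records = [
    (1, 0), (2, 1), (3, 0), (4, 0), (5, 0), (6, 1), (7, 1), (8, 1), (9, 1),
    (1.5, 0), (3.5, 0), (6.5, 1), (8.5, 1),
]

# Step 2. Explore and prepare: summarize the input by class, then hold out
# the last four records for testing (a real project would split at random).
hours_by_class = {
    label: mean(x for x, y in records if y == label) for label in (0, 1)
}
train, test = records[:9], records[9:]

# Step 4 helper. Evaluate: share of rows predicted correctly.
def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

# Step 3. Train: two candidate models learned from the training rows.
def train_majority(rows):
    """Baseline: always predict the most common training label."""
    majority = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda x: majority

def train_stump(rows):
    """Threshold rule: predict 1 when x exceeds the best-fitting cut."""
    xs = sorted(x for x, _ in rows)
    cuts = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = max(cuts, key=lambda c: accuracy(lambda x: int(x > c), rows))
    return lambda x: int(x > best)

# Step 4. Evaluate the baseline on data it did not see during training.
baseline_score = accuracy(train_majority(train), test)

# Step 5. Improve: switch to a different type of model and compare.
stump_score = accuracy(train_stump(train), test)

print(hours_by_class)
print(baseline_score, stump_score)
```

Scoring both models on the held-out rows, rather than on the training rows, is what makes the comparison in step 5 meaningful.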
Using R for machine learning
• https://cran.r-project.org/index.html
• Download R
Installing a package in R
• Live demonstration
