This document provides an overview of predictive analytics and machine learning. It defines predictive analytics as using computational methods to discover patterns in data in order to gain insights and affect decisions. Predictive models are either supervised, using a target variable, or unsupervised, grouping data without a target. Machine learning allows systems to learn from data without being explicitly programmed. The key steps of machine learning involve collecting and preparing data, training a model on the data, evaluating model performance, and improving model performance. R is a commonly used programming language for machine learning tasks.
INT234: Predictive Analytics
By: Dr. Avinash Kaur (Associate Professor)
Overview of Predictive Analytics
• A small direct response company had developed dozens of programs in cooperation with major brands to sell books and DVDs. These affinity programs were very successful, but required considerable up-front work to develop the creative content and determine which customers, already engaged with the brand, were worth the significant marketing spend to purchase the books or DVDs on subscription. Typically, they first developed test mailings on a moderately sized sample to determine if the expected response rates were high enough to justify a larger program.

What is analytics?
• Analytics is the process of using computational methods to discover and report influential patterns in data.
• The goal of analytics is to gain insight and often to affect decisions. Data is necessarily a measure of historic information so, by definition, analytics examines historic data.
• The term itself rose to prominence in 2005, in large part due to the introduction of Google Analytics.
• Nevertheless, the ideas behind analytics are not new at all but have been represented by different terms throughout the decades, including cybernetics, data analysis, neural networks, pattern recognition, statistics, knowledge discovery, data mining, and now even data science.

What Is Predictive Analytics?
• Predictive analytics is the process of discovering interesting and meaningful patterns in data.
• It draws from several related disciplines, some of which have been used to discover patterns in data for more than 100 years, including pattern recognition, statistics, machine learning, artificial intelligence, and data mining.

What differentiates predictive analytics from other types of analytics?
• First, predictive analytics is data-driven, meaning that algorithms derive key characteristics of the models from the data itself rather than from assumptions made by the analyst.
• Second, predictive analytics algorithms automate the process of finding the patterns from the data. Powerful induction algorithms not only discover coefficients or weights for the models, but also the very form of the models.

Supervised vs. Unsupervised Learning
• Algorithms for predictive modeling are often divided into two groups: supervised learning methods and unsupervised learning methods.

Supervised learning
• In supervised learning models, the supervisor is the target variable, a column in the data representing values to predict from other columns in the data.
• The target variable is chosen to represent the answer to a question the organization would like to answer, or a value unknown at the time the model is used that would help in decisions.
• Sometimes supervised learning is also called predictive modeling.
• The primary predictive modeling algorithms are classification for categorical target variables and regression for continuous target variables.
• Examples of target variables include whether a customer purchased a product, the amount of a purchase, whether a transaction was fraudulent, whether a customer stated they enjoyed a movie, how many days will transpire before the next gift a donor will make, whether a loan defaulted, and whether a product failed.
• Records without a value for the target variable cannot be used in building predictive models.

Unsupervised learning
• Unsupervised learning, sometimes called descriptive modeling, has no target variable.
• The inputs are analyzed and grouped or clustered based on the proximity of input values to one another.
• Each group or cluster is given a label to indicate which group a record belongs to.
• In some applications, such as customer analytics, unsupervised learning is simply called segmentation because of the function of the models (segmenting customers into groups).

Predictive Analytics vs. Statistics
• Predictive analytics and statistics have considerable overlap, with some statisticians arguing that predictive analytics is, at its core, an extension of statistics.
• Predictive modelers, for their part, often use algorithms and tests common in statistics as a part of their regular suite of techniques, sometimes without applying the diagnostics most statisticians would apply to ensure the models are built properly.

Machine learning
• Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
• Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

Steps to apply machine learning to your data
1. Collecting data: Whether the data is written on paper, recorded in text files and spreadsheets, or stored in an SQL database, you will need to gather it in an electronic format suitable for analysis. This data will serve as the learning material an algorithm uses to generate actionable knowledge.
2. Exploring and preparing the data: The quality of any machine learning project is based largely on the quality of the data it uses. This step in the machine learning process tends to require a great deal of human intervention. An often-cited statistic suggests that 80 percent of the effort in machine learning is devoted to data. Much of this time is spent learning more about the data and its nuances during a practice called data exploration.
3. Training a model on the data: By the time the data has been prepared for analysis, you are likely to have a sense of what you are hoping to learn from the data. The specific machine learning task will inform the selection of an appropriate algorithm, and the algorithm will represent the data in the form of a model.
4. Evaluating model performance: Because each machine learning model results in a biased solution to the learning problem, it is important to evaluate how well the algorithm learned from its experience. Depending on the type of model used, you might be able to evaluate the accuracy of the model using a test dataset, or you may need to develop measures of performance specific to the intended application.
5. Improving model performance: If better performance is needed, it becomes necessary to utilize more advanced strategies to augment the performance of the model. Sometimes, it may be necessary to switch to a different type of model altogether. You may need to supplement your data with additional data, or perform additional preparatory work as in step two of this process.

Using R for machine learning
• https://cran.r-project.org/index.html
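
Example: supervised learning with a continuous target
The supervised learning slide says that regression is used for continuous target variables and that records without a target value cannot be used to build the model. The sketch below illustrates both points with a one-input least-squares fit. It is written in Python rather than R purely for illustration, and the records, the column meanings, and the helper names `fit_line` and `predict` are all hypothetical.

```python
# Hypothetical toy records: (ad_spend, purchase_amount).
# None marks a record whose target value is missing.
records = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, None), (5.0, 11.0)]


def fit_line(rows):
    """Ordinary least squares for a single input; returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Records without a target value are excluded from model building.
labeled = [(x, y) for x, y in records if y is not None]
slope, intercept = fit_line(labeled)


def predict(x):
    """Predict the continuous target for a new input value."""
    return slope * x + intercept


# The fitted model can now estimate the target for the unlabeled record.
predicted_missing = predict(4.0)
print(slope, intercept, predicted_missing)
```

A categorical target (for example, whether a loan defaulted) would call for a classification algorithm instead; only the regression case is shown here.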
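
Example: unsupervised learning as segmentation
The unsupervised learning slide describes grouping records by the proximity of their input values and giving each group a label. One common way to do this is k-means clustering, which the slides do not name; it is used here only as an assumed example. The sketch is in Python rather than R for illustration, and the spend values, the starting centroids, and the function name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Minimal 1-D k-means; returns (final centroids, cluster label per value)."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = sum(members) / len(members)
    return centroids, labels


# Hypothetical annual spend per customer; note there is no target column.
spend = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]
centroids, segments = kmeans_1d(spend, centroids=[10.0, 50.0])
print(centroids, segments)
```

Each entry of `segments` is the cluster label for the corresponding customer, which is the "segmentation" reading of unsupervised learning described in the slides.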
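
Example: the machine learning steps on a toy dataset
The first four steps listed above (collect, prepare, train, evaluate) can be sketched end to end on a very small scale. This is a minimal illustration under stated assumptions, not a recommended workflow: it uses Python rather than R, the data is invented, the 75/25 split is arbitrary, and the model is a simple 1-nearest-neighbour classifier chosen only because it fits in a few lines.

```python
import random

# Step 1 (collect): hypothetical labeled records (feature_1, feature_2, label).
raw = [
    (1.0, 1.1, "a"), (0.9, 1.0, "a"), (1.2, 0.8, "a"), (1.1, 1.2, "a"),
    (5.0, 5.1, "b"), (4.9, 5.2, "b"), (5.2, 4.8, "b"), (5.1, 5.0, "b"),
]

# Step 2 (prepare): shuffle with a fixed seed, then hold out a test set.
rng = random.Random(0)
data = raw[:]
rng.shuffle(data)
split = int(0.75 * len(data))
train, test = data[:split], data[split:]

# Step 3 (train): a 1-nearest-neighbour model simply stores the training rows.
model = train


def predict(model, x1, x2):
    """Return the label of the stored row closest to (x1, x2)."""
    nearest = min(model, key=lambda r: (r[0] - x1) ** 2 + (r[1] - x2) ** 2)
    return nearest[2]


# Step 4 (evaluate): accuracy on rows the model did not see during training.
correct = sum(predict(model, x1, x2) == label for x1, x2, label in test)
accuracy = correct / len(test)
print(len(train), len(test), accuracy)
```

Step 5 (improving performance) would start from this accuracy figure, for example by gathering more data, revisiting the preparation step, or trying a different type of model.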