SAS E Miner Cloud-Based Software - Tutorial 1


Chapter 1

Introduction to SAS Enterprise Miner 15.2

What Is SAS Enterprise Miner?
How Does SAS Enterprise Miner Work?
Benefits of Using SAS Enterprise Miner
Accessibility Features of SAS Enterprise Miner 15.2
    Overview of Accessibility Features
    Exceptions to Standard Keyboard Controls
    Other Exceptions to Accessibility Standards
Getting to Know the Graphical User Interface

What Is SAS Enterprise Miner?


SAS Enterprise Miner streamlines the data mining process to create highly accurate
predictive and descriptive models based on analysis of vast amounts of data from across
an enterprise. Data mining is applicable in a variety of industries and provides
methodologies for such diverse business problems as fraud detection, householding,
customer retention and attrition, database marketing, market segmentation, risk analysis,
affinity analysis, customer satisfaction, bankruptcy prediction, and portfolio analysis.
In SAS Enterprise Miner, the data mining process has the following (SEMMA) steps:
• Sample the data by creating one or more data sets. The sample should be large
enough to contain significant information, yet small enough to process. This step
includes the use of data preparation tools for data import, merge, append, and filter,
as well as statistical sampling techniques.
• Explore the data by searching for relationships, trends, and anomalies in order to
gain understanding and ideas. This step includes the use of tools for statistical
reporting and graphical exploration, variable selection methods, and variable
clustering.
• Modify the data by creating, selecting, and transforming the variables to focus the
model selection process. This step includes the use of tools for defining
transformations, missing value handling, value recoding, and interactive binning.
• Model the data by using the analytical tools to train a statistical or machine learning
model to reliably predict a desired outcome. This step includes the use of techniques
such as linear and logistic regression, decision trees, neural networks, partial least
squares, LARS and LASSO, nearest neighbor, and importing models defined by
other users or even outside SAS Enterprise Miner.
• Assess the data by evaluating the usefulness and reliability of the findings from the
data mining process. This step includes the use of tools for comparing models and
computing new fit statistics, cutoff analysis, decision support, report generation, and
score code management.
You might or might not include all of the SEMMA steps in an analysis, and it might be
necessary to repeat one or more of the steps several times before you are satisfied with
the results.
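As a rough illustration of how the SEMMA steps fit together, the following Python sketch walks a small synthetic data set through all five steps. Everything in it is a hypothetical stand-in chosen for brevity: the generated data, the `gift_count`-style input, the mean imputation, and the one-variable cutoff model are assumptions for this example and do not describe how SAS Enterprise Miner implements any step.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical raw data: (gift_count, responded). Some gift counts are missing.
population = []
for _ in range(1000):
    gifts = random.randint(0, 10)
    responded = 1 if random.random() < 0.05 + 0.08 * gifts else 0
    if random.random() < 0.1:
        gifts = None  # simulate a missing value
    population.append((gifts, responded))

# Sample: draw a manageable subset, then split it into training and validation.
sample = random.sample(population, 400)
train, valid = sample[:300], sample[300:]

# Explore: summarize the non-missing input values.
observed = [g for g, _ in train if g is not None]
mean_gifts = statistics.mean(observed)

# Modify: replace missing values with the training mean.
def impute(rows, fill):
    return [(fill if g is None else g, y) for g, y in rows]

train, valid = impute(train, mean_gifts), impute(valid, mean_gifts)

# Model: choose the gift-count cutoff that maximizes training accuracy.
def accuracy(rows, cutoff):
    return sum((g >= cutoff) == bool(y) for g, y in rows) / len(rows)

best_cutoff = max(range(0, 12), key=lambda c: accuracy(train, c))

# Assess: evaluate the chosen cutoff on data the model has not seen.
valid_accuracy = accuracy(valid, best_cutoff)
print(f"cutoff={best_cutoff}, validation accuracy={valid_accuracy:.2f}")
```

In practice the steps are rarely this linear. A poor result in the Assess step would typically send you back to Modify or Model with different choices.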
After you have completed the SEMMA steps, you can apply a scoring formula from one
or more champion models to new data that might or might not contain the target
variable. Scoring new data that is not available at the time of model training is the goal
of most data mining problems.
Furthermore, advanced visualization tools enable you to quickly and easily examine
large amounts of data in multidimensional histograms and to graphically compare
modeling results.
SAS Enterprise Miner includes tools for generating and testing
complete score code for the entire process flow diagram as SAS code, C code, and Java
code, as well as tools for interactively scoring new data and examining the results. You
can register your model to a SAS Metadata Server to share your results with users of
applications such as SAS Enterprise Guide and SAS Data Integration Studio that can
integrate the score code into reporting and production processes. SAS Model Manager
complements the data mining process by providing a structure for managing projects
through development, testing, and production environments and is fully integrated with
SAS Enterprise Miner.

How Does SAS Enterprise Miner Work?


In SAS Enterprise Miner, the data mining process is driven by a process flow diagram
that you create by dragging nodes from a toolbar that is organized by SEMMA
categories and dropping them onto a diagram workspace.

Figure 1.1 Example Process Flow Diagram
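One way to think about a process flow diagram is as a directed graph in which each node runs only after the nodes that feed it. The sketch below models that idea in Python with the standard `graphlib` module. The particular flow shown is a hypothetical example, not the one in Figure 1.1, and the code says nothing about how SAS Enterprise Miner schedules nodes internally.

```python
from graphlib import TopologicalSorter

# Hypothetical flow: each node maps to the set of nodes that must run before it.
flow = {
    "Input Data": set(),
    "Data Partition": {"Input Data"},
    "Impute": {"Data Partition"},
    "Regression": {"Impute"},
    "Decision Tree": {"Data Partition"},
    "Model Comparison": {"Regression", "Decision Tree"},
}

# A valid run order executes every node after all of its predecessors.
run_order = list(TopologicalSorter(flow).static_order())
print(run_order)
```

Several run orders can be valid for the same diagram (here, Decision Tree may run before or after Impute), which is why the graph, rather than a fixed list, is the natural representation.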

The graphical user interface (GUI) is designed in such a way that the business analyst
who has little statistical expertise can navigate through the data mining methodology,
and the quantitative expert can explore each node in depth to fine-tune the analytical
process.
SAS Enterprise Miner automates the scoring process and supplies complete scoring code
for all stages of model development in SAS, C, Java, and PMML. The scoring code can
be deployed in a variety of real-time or batch environments within SAS, on the web, or
directly in relational databases.

Benefits of Using SAS Enterprise Miner


The benefits of using SAS Enterprise Miner include the following:
• Support the entire data mining process with a broad set of tools. Regardless of
your data mining preference or skill level, SAS Enterprise Miner is flexible and
addresses complex problems. Going from raw data to accurate, business-driven data
mining models becomes a seamless process, enabling the statistical modeling group,
business managers, and the IT department to collaborate more efficiently.
• Build more models faster with an easy-to-use GUI. The process flow diagram
environment dramatically shortens model development time for both business
analysts and statisticians. SAS Enterprise Miner includes an intuitive user interface
that incorporates common design principles established for SAS software and
additional navigation tools for moving easily around the workspace. The GUI can be
customized for all analysts' needs via flexible, interactive dialog boxes, code editors,
and display settings.
• Enhance accuracy of predictions. Innovative algorithms enhance the stability and
accuracy of predictions, which can be verified easily by visual model assessment and
validation. Both analytical and business users enjoy a common, easy-to-interpret
visual view of the data mining process. The process flow diagrams serve as self-
documenting templates that can be updated easily or applied to new problems
without starting over from scratch.
• Surface business information and easily share results through the unique model
repository. Numerous integrated assessment features enable you to compare results
of different modeling techniques in both statistical and business terms within a
single, easy-to-interpret framework. SAS Enterprise Miner projects support the
collaborative sharing of modeling results among quantitative analysts. Models can
also be imported into the SAS Model Manager repository for sharing with scoring
officers and independent model validation testers.

Accessibility Features of SAS Enterprise Miner 15.2

Overview of Accessibility Features


SAS Enterprise Miner 15.2 includes accessibility and compatibility features that improve
the usability of the product for users with disabilities, with the exceptions noted below.
These features are related to accessibility standards for electronic information
technology that were adopted by the U.S. Government under Section 508 of the U.S.
Rehabilitation Act of 1973, as amended.
SAS Enterprise Miner 15.2 conforms to accessibility standards for the Windows
platform. For specific information about Windows accessibility features, refer to your
operating system's help.
If you have questions or concerns about the accessibility of SAS products, send email to
[email protected].

Exceptions to Standard Keyboard Controls


SAS Enterprise Miner 15.2 uses the same keyboard shortcuts as other Windows
applications, with these exceptions:
• There is no keyboard equivalent for accessing the Explore window for a data source
via the right-click pop-up menu. However, an alternate control is accessible from the
View menu.
• There are no keyboard equivalents for these actions:
• selecting a SAS Server Directory that is a subdirectory lower in the tree than the
default folders in the Create New Project Wizard
• selecting and editing the value of column attributes in the Data Source Wizard
• maximizing or minimizing the Results window
• accessing the Expression Builder in the Transform Variable node

Other Exceptions to Accessibility Standards


Other exceptions to the accessibility standards described in Section 508 of the U.S.
Rehabilitation Act of 1973 include the following:
• On-screen indication of the current focus is not well-defined in some dialog boxes, in
some menus, and in tables.
• High contrast color schemes are not universally inherited.
• SAS Enterprise Miner 15.2 is not fully accessible to assistive technologies:
• Many controls are not read by JAWS, and the accessible properties of many
controls are not surfaced to the Java Accessibility API.
• Some content in the Data Source Wizard and Library Wizard is not accessible.

Getting to Know the Graphical User Interface


You use the SAS Enterprise Miner GUI to build a process flow diagram that controls
your data mining project.

Figure 1.2 The SAS Enterprise Miner GUI

1. Toolbar Shortcut Buttons — Use the toolbar shortcut buttons to perform common
computer functions and frequently used SAS Enterprise Miner operations. Move the
mouse pointer over any shortcut button to see the text name. Click a shortcut button
to use it.
2. Project Panel — Use the Project Panel to manage and view data sources, diagrams,
results, and project users.
3. Properties Panel — Use the Properties Panel to view and edit the settings of data
sources, diagrams, nodes, and users.
4. Property Help Panel — The Property Help Panel displays a short description of any
property that you select in the Properties Panel. Extended help can be found from the
Help main menu.
5. Toolbar — The Toolbar is a graphic set of node icons that you use to build process
flow diagrams in the Diagram Workspace. Drag a node icon into the Diagram
Workspace to use it. The icon remains in place in the Toolbar, and the node in the
Diagram Workspace is ready to be connected and configured for use in the process
flow diagram.
6. Diagram Workspace — Use the Diagram Workspace to build, edit, run, and save
process flow diagrams. In this workspace, you graphically build, order, sequence,
and connect the nodes that you use to mine your data and generate reports.
7. Diagram Navigation Toolbar — Use the Diagram Navigation Toolbar to organize
and navigate the process flow diagram.
TIP The book “Predictive Modeling with SAS Enterprise Miner: Practical Solutions
for Business Applications” provides examples of saving and exporting SAS code and
offers additional discussion about the SAS Enterprise Miner graphical user interface.

Chapter 2

Learning by Example: Building and Running a Process Flow

About the Scenario in This Book
Prerequisites for This Example

About the Scenario in This Book


This book presents a basic data mining example that is intended to familiarize you with
many features of SAS Enterprise Miner. In this example, you learn how to perform tasks
that are required to build and run a process flow in order to solve a particular business
problem. You should follow the chapters and the steps within the chapters in the order in
which they are presented.
For the purpose of the scenario in this book, you are a data analyst at a national
charitable organization. Your organization seeks to use the results of a previous postcard
mail solicitation for donations to better target its next one. In particular, you want to
determine which of the individuals in your mailing database have characteristics similar
to those of your most profitable donors. By soliciting only these people, your
organization can spend less money on the solicitation effort and more money on
charitable concerns.
When you have finished building the process flow diagram as outlined in this example,
the diagram will resemble the one shown below:

TIP “Predictive Modeling with SAS Enterprise Miner: Practical Solutions for
Business Applications” provides several additional example process flow diagrams
for you to create and run.

Prerequisites for This Example


In order to re-create this example, you must have access to SAS Enterprise Miner 15.2,
either as a client/server application or as a complete installation on your local machine.
You must also have saved, on the SAS Enterprise Miner server machine, a copy of the
sample data that is used in the example. You can download a ZIP file that contains the
data from http://support.sas.com/documentation/onlinedoc/miner/. Look for the item
named Example data for Getting Started with SAS Enterprise Miner 15.2. For more
information about the structure of the sample data, see Sample Data Reference on page 63.

Chapter 3

Set Up the Project

About the Tasks That You Will Perform
Create a New Project
Create a Library
Create a Data Source
Create a Diagram and Add the Input Data Node

About the Tasks That You Will Perform


To perform these tasks, you need to have downloaded and unzipped the example data for
SAS Enterprise Miner 15.2. If you have not, then see “Prerequisites for This Example”
on page 8.
To set up the example project, you will perform the following tasks:
1. You will create a new SAS Enterprise Miner project.
2. You will define a new library that enables SAS Enterprise Miner to access the
sample data.
3. You will define a new data source in the project, which is later used to import the
sample data into a process flow.
4. You will create a new diagram within the project, and you will create the first node
(for the input data source) in the process flow.
The steps in this example are written as if you were completing them in their entirety
during one SAS Enterprise Miner session. However, you can easily complete the steps
over multiple sessions. To return to the example project after you have closed and
reopened SAS Enterprise Miner, click Open Project in the Welcome to Enterprise
Miner window, and navigate to the saved project.

Create a New Project


In SAS Enterprise Miner, you store your work in projects. A project can contain multiple
process flow diagrams and information that pertains to them.
TIP For organizational purposes, it is a good idea to create a separate project for each
major data mining problem that you want to investigate.
To create the project that you will use in this example:
1. Open SAS Enterprise Miner.
2. In the Welcome to Enterprise Miner window, click New Project. The Create New
Project Wizard opens.
3. Proceed through the steps below to complete the wizard. Contact your system
administrator if you need to be granted directory access or if you are unsure about
the details of your site's configuration.
a. Select the logical workspace server to use. Click Next.
b. Enter Getting Started Charitable Giving Example as the Project
Name.
The SAS Server Directory is the directory on the server machine in which SAS
data sets and other files that are generated by the project will be stored. It is
likely that your site is configured in such a way that the default path is
appropriate for this example. Click Next.
c. The SAS Folder Location is the directory on the server machine in which the
project itself will be stored. It is likely that your site is configured in such a way
that the default path is appropriate for the example project that you are about to
create. Click Next.
Note: If you complete this example over multiple sessions, then this is the
location to which you should navigate after you select Open Project in the
Welcome to Enterprise Miner window.
d. Click Finish.

Create a Library
In order to access the sample data sets using SAS Enterprise Miner, you must create a
SAS library to indicate to SAS the location in which they are stored. When you create a
library, you give SAS a shortcut name and pointer to a storage location in your operating
environment where you store SAS files.
To create a new SAS library for the sample data:
1. On the File menu, select New > Library. The Library Wizard opens.
2. Proceed through the steps below to complete the wizard. Contact your system
administrator if you need to be granted directory access or if you are unsure about
the details of your site's configuration.
a. The Create New Library option button is automatically selected. Click Next.
b. Enter Donor as the Name.
Then enter the Path to the directory on the server machine that contains the
sample data that you downloaded from the web. For example, if the sample data
is located on the desktop of the server machine (denoted by the C drive), then
you could enter C:\Users\<username>\Desktop, where <username> is
your user name on the server machine. Click Next.
c. Click Finish.
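To make the idea of a library as "a shortcut name and pointer to a storage location" concrete, here is a minimal Python sketch of how a two-level name such as DONOR.DONOR_RAW_DATA could be resolved to a file. The `libraries` registry and the `resolve` helper are hypothetical names invented for this illustration, and the lowercase `.sas7bdat` file name reflects the usual on-disk convention for SAS data sets rather than anything stated in this chapter.

```python
from pathlib import PureWindowsPath

# Hypothetical registry of library names and the directories they point to.
# <username> is the same placeholder used in the wizard step above.
libraries = {"DONOR": PureWindowsPath(r"C:\Users\<username>\Desktop")}

def resolve(two_level_name: str) -> PureWindowsPath:
    """Map a LIBRARY.TABLE name to the file that would hold that table."""
    libref, table = two_level_name.upper().split(".")
    # Assumption: the data set is stored as a lowercase .sas7bdat file.
    return libraries[libref] / f"{table.lower()}.sas7bdat"

print(resolve("Donor.DONOR_RAW_DATA"))
```

The point of the indirection is that the rest of the project refers only to the short library name, so the data can move to another directory without any change to the process flow.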

Create a Data Source


To use sample data that is stored in a SAS data set in a SAS Enterprise Miner project,
you need to define a data source. In SAS Enterprise Miner, a data source stores the
metadata of an input data set.
TIP You can also use input data from files that are not SAS data sets (with extensions
such as .jmp and .csv) in a process flow. To import an external file into a process
flow diagram, use the File Import node, which is located on the Sample tab on the
Toolbar.
To create a new data source for the sample data:
1. On the File menu, select New > Data Source. The Data Source Wizard opens.
2. Proceed through the steps that are outlined in the wizard.
a. SAS Table is automatically selected as the Source. Click Next.
b. Enter DONOR.DONOR_RAW_DATA as the two-level filename of the Table. Click
Next.
c. Click Next.
d. Select the Advanced option button. Click Next.
e. Change the value of Role for the variables to match the description below. Then,
click Next.
• CONTROL_NUMBER should have the Role ID.
• TARGET_B should have the Role Target.
• TARGET_D should have the Role Rejected.
• CLUSTER_CODE should have the Role Rejected.
• All other variables should have the Role Input.
To change an attribute, click the value of that attribute and select from the drop-
down menu that appears.
Note: SAS Enterprise Miner automatically assigns the role Target to any
variable whose name begins with the prefix TARGET_. For more
information about the rules that SAS Enterprise Miner uses to automatically
assign roles, see the SAS Enterprise Miner Help.
f. Select the Yes option button to indicate that you want to build models based on
the values of decisions. Click Next.
• On the Prior Probabilities tab, select the Yes option button to indicate that
you want to enter new prior probabilities. In the Adjusted Prior column of the
table, enter 0.05 for Level 1 and 0.95 for Level 0.

The values in the Prior column reflect the proportions of observations in the
data set for which TARGET_B is equal to 1 and 0 (0.25 and 0.75,
respectively). However, as the business analyst, you know that these
proportions resulted from over-sampling of donors from the 97NK
solicitation. In fact, you know that the true proportion of donors for the
solicitation was closer to 0.05 than 0.25. For this reason, you adjust the prior
probabilities.
• On the Decision Weights tab, the Maximize option button is automatically
selected, which indicates that you want to maximize profit in this analysis.
Enter 14.5 as the Decision 1 weight for Level 1, -0.5 as the Decision 1
weight for Level 0, and 0.0 as the Decision 2 weight for both levels. Click
Next.

In this example, Decision 1 is the decision to mail a solicitation to an
individual. Decision 2 is the decision to not mail a solicitation. If you mail a
solicitation, and the individual does not respond, then your cost is $0.50 (the
price of postage). However, if the individual does respond, then based on the
previous solicitation, you expect to receive a $15.00 donation on average.
Less the $0.50 postage cost, your organization expects $14.50 profit. If you
do not mail a solicitation, you neither incur a cost nor expect a profit. These
numbers are reflected in the decision weights that you entered in the table.
Click Next.
g. In the Data Source Wizard — Create Sample window, you decide whether to
create a sample data set from the entire data source. This example uses the entire
data set, so you need to select No. Click Next.
h. The Role of the data source is automatically selected as Raw. Click Next.
i. Click Finish.
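The prior probabilities and decision weights entered in step f can be read as a small piece of arithmetic. The Python sketch below shows one standard way to rescale a probability estimated on over-sampled data back to the true priors, and then to pick the decision with the higher expected profit. The function names are hypothetical, and the rescaling formula is the textbook prior correction; this chapter does not state that SAS Enterprise Miner uses exactly this computation.

```python
# Values taken from the wizard steps above.
SAMPLE_PRIOR = {1: 0.25, 0: 0.75}   # proportions in the over-sampled data
TRUE_PRIOR = {1: 0.05, 0: 0.95}     # adjusted priors entered in the wizard
MAIL_WEIGHT = {1: 14.5, 0: -0.5}    # Decision 1 (mail): profit by target level
NO_MAIL_WEIGHT = {1: 0.0, 0: 0.0}   # Decision 2 (do not mail)

def adjust_posterior(p: float) -> float:
    """Rescale a probability estimated on over-sampled data to the true priors."""
    donor = p * TRUE_PRIOR[1] / SAMPLE_PRIOR[1]
    non_donor = (1.0 - p) * TRUE_PRIOR[0] / SAMPLE_PRIOR[0]
    return donor / (donor + non_donor)

def expected_profit(p: float, weights: dict) -> float:
    """Expected profit of a decision, given donor probability p."""
    return p * weights[1] + (1.0 - p) * weights[0]

def should_mail(raw_p: float) -> bool:
    """Mail only if mailing has a higher expected profit than not mailing."""
    p = adjust_posterior(raw_p)
    return expected_profit(p, MAIL_WEIGHT) > expected_profit(p, NO_MAIL_WEIGHT)

# Mailing breaks even when 14.5*p - 0.5*(1 - p) = 0, that is, p = 0.5 / 15.
break_even = 0.5 / 15.0

for raw_p in (0.10, 0.25, 0.50):
    p = adjust_posterior(raw_p)
    print(f"raw={raw_p:.2f} adjusted={p:.4f} mail={should_mail(raw_p)}")
```

Under these assumptions, a raw score equal to the sample proportion (0.25) maps back to the true donor rate (0.05), and mailing is worthwhile only when the adjusted probability exceeds 0.5 / 15, or roughly 3.3 percent.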

Create a Diagram and Add the Input Data Node


Now that you have created a project and defined the data source, you are ready to begin
building a process flow diagram.
To create a process flow diagram and add the first node:
1. On the File menu, select New > Diagram.
2. Enter Donations as the Diagram Name, and click OK. An empty diagram opens
in the Diagram Workspace.
3. Select the DONOR_RAW_DATA data source in the Project Panel. Drag it into the
Diagram Workspace; this action creates the input data node.

TIP Refer to “Predictive Modeling with SAS Enterprise Miner: Practical Solutions
for Business Applications” for more examples of creating new projects, creating
data sources, creating diagrams, and adding nodes to your diagram workspace. The
book also discusses your metadata options.
