Gasue34 003 PDF
Gasue34 003 PDF
Gasue34 003 PDF
Course Notes
Data Manipulation and Analytics Using SAS® University Edition Course Notes was developed by Jaymie
Shanahan and Mike Speed. Editing and production support was provided by the Curriculum Development
and Support Department.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product
names are trademarks of their respective companies.
Data Manipulation and Analytics Using SAS ® University Edition Course Notes
Copyright © 2016 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States of
America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written
permission of the publisher, SAS Institute Inc.
Book code E70815, course code GASUE34, prepared date 03Aug2016. GASUE34_003
ISBN 978-1-62960-685-9
For Your Infor mation iii
Table of Contents
Course Description ......................................................................................................vi
Course Description
This introduction to SAS University Edition hands-on workshop shows how one can use the menu driven
tasks and SAS code in SAS University Edition 4.3 to perform common reporting and research tasks:
querying, reporting, and analyzing data. Several statistical procedures will be used to analyze data and
produce reports. SAS University Edition provides a SAS graphical point-and-click interface as well as
code that helps you exploit the power of SAS and publish dynamic results in a Microsoft Windows client
application. Demonstrations in the presentation will use research type data and tasks in illustrating the
functionality of SAS University Edition.
To learn more…
For information about other courses in the curriculum, contact the SAS
Education Division at 1-800-333-7660, or send e-mail to [email protected].
You can also find this information on the web at http://support.sas.com/training/
as well as in the Training Course Catalog.
For a list of other SAS books that relate to the topics covered in this
course notes, USA customers can contact the SAS Publishing Department
at 1-800-727-3228 or send e-mail to [email protected]. Customers outside
the USA, please contact your local SAS office.
Also, see the SAS Bookstore on the web at http://support.sas.com/publishing/
for a complete list of books and a convenient order form.
For Your Infor mation vii
Prerequisites
This course is designed for users who are familiar with Windows and other software, such as Microsoft
Office or spreadsheet programs. No programming experience or SAS knowledge is required. Having
completed an undergraduate course in statistics covering p-values, hypothesis testing, analysis of variance
and regression is helpful, but not required.
viii For Your Information
Chapter 1 Getting Started
Exercises............................................................................................................. 1-38
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 Using/Installing SAS University Edition 1-3
Objectives
Use/Install SAS University Edition.
3
3
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-4 Chapter 1 Getting Started
7. Change the Name to MYSAS and make sure that the Re-create box is selected. Then click OK.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 Using/Installing SAS University Edition 1-5
8. Create the data library. Click Server Files and Folders. Navigate to
D:\Workshop\WINSAS\GASUE34\data.
9. Right-click and create a library. Select Re-create this library at start-up.
Install Oracle VirtualBox virtualization software on your machine. We have also installed it.
Download the SAS University Edition vApp. We have already done this step on the race
image.
6. After you click the Download button, you are prompted to do the following (Skip this step if you are
using a RACE image.):
a. Create a SAS profile if you do not already have one. If you already have a SAS profile, sign in.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-6 Chapter 1 Getting Started
b) In VirtualBox, select the SAS University Edition vApp, and then select Machine
Settings.
c) In the navigation pane of the Settings dialog box, select Shared Folders and then click
.
d) In the Add Share dialog box, select Other as the folder path.
e) In the Browse for Folder window, open the SASUniversityEdition folder and select the
myfolders subfolder. Click OK (or Choose, depending on your operating system).
f) In the Add Share dialog box, confirm that Read-only is not selected, and then select the
Auto-mount and Make Permanent (if available) options. Click OK.
(1) Click Create Library and name the library MYSAS. Select Re-create this
library at start-up and then click OK.
(2) Create the data library. Click Server Files and Folders. Navigate to
D:\Workshop\WINSAS\GASUE34\data.
(3) Right-click and create the library. Select Re-create this library at start-up.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 Using/Installing SAS University Editio n 1-7
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-8 Chapter 1 Getting Started
Objectives
Explain a typical workflow for data analysis.
State the features of SAS University Edition.
State the capabilities and major features of
SAS software.
State the purpose of the different areas of the
SAS University Edition workspace.
Name the steps in a typical SAS University Edition
session.
5
5
Predictive Modeling
What is the best that
Descriptive could happen?
$ROI Modeling
Ad Hoc Reports What will happen?
and OLAP
Standard
Reports Why did it happen?
Raw Data
What happened?
6
6
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-9
7
7 ...
8
8
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-10 Chapter 1 Getting Started
9
9
10
10
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-11
11
11
SAS University Edition software is an easy-to-use browser-based application that provides the following:
an intuitive, visual interface
access to the power of SAS
transparent access to data
ready-to-use tasks for analysis and reporting
easy exporting of data and results to other applications
scripting and automation
Users of all experience levels, from novice to expert, can use SAS University Edition to produce
meaningful results quickly.
This course is written for SAS University Edition 3.5. Previous versions have similar
functionality, but the interface has significant differences. To determine which version of SAS
University Edition you are using, select Help About SAS University Edition.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-12 Chapter 1 Getting Started
12
12
SAS University Edition provides a state-of-the-art browser-based interface with these features:
drag-and-drop functionality
dialog boxes
wizards
a color-coded syntax editor
a full online Help facility, embedded context-sensitive Help, and a Getting Started tutorial
13
13
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-13
14
14
Using SAS University Edition, you can access the power of SAS without learning the SAS programming
language.
If you are a SAS programmer, you can use the SAS University Edition Code Editor to create new code or
to modify existing SAS programs.
15
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-14 Chapter 1 Getting Started
Tabs
16
16
17
17
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-15
18
18
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-16 Chapter 1 Getting Started
This demonstration illustrates the use of SAS University Edition to analyze the profits for sporting goods
by state among U.S. customer orders from the southwest sales region.
Many of the tasks shown in this demonstration are discussed later in the course. This
demonstration is intended as a broad overview of a typical data analysis case study.
1. Open SAS University Edition.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-17
6. Highlight MYSAS and change the name from IMPORT to USCustomers. Click Save.
7. Click the Running Person button to run the code.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-18 Chapter 1 Getting Started
9. In the MYSAS Files folder, open USCUSTOMERS.sas7bdat by double-clicking. View the data and
then close it by clicking on the x in the tab..
10. Join the USCUSTOMERS data table to the orders.sas7bdat data table in the Data folder. Directly
below the S on the Server Files and Folders tab, there is a button. Click the down arrow to
the right of the button.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-19
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-20 Chapter 1 Getting Started
13. Click the + button again and select JOIN. Using the drop-down arrows, change the names of the left
and right tables as shown below and then click Save.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-21
14. Select the variables that you want to keep by clicking the COLUMNS tab. From the ORDERS table,
drag Profit from the pane on the left to the Column Name field. Select the Summary field for Profit
and change the value from NONE to SUM. From the USCUSTOMERS table, drag the variable
State to the Column Name field.
16. Run this program by clicking the running person. Look in your MYSAS Files folder to see the new
file.
17. Create a pie chart of profits by State.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-22 Chapter 1 Getting Started
19. Double-click Pie Chart. Use the Table button to navigate to the file. Use the + buttons to assign
variables.
20. Assign State as the category variable. Assign Profit as the response variable.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-23
The pie chart shows that the profits are highest in OK (Oklahoma) and lowest in AZ (Arizona).
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-24 Chapter 1 Getting Started
20
20
The next objective is to evaluate whether the profits vary significantly by order type (retail, phone, and
Internet sales).
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-25
Perform analysis of variance with pairwise comparisons to determine whether profits differ significantly
across order type.
2. Select the Orders data set and use the + button to add Profit as the dependent variable and
Order_Type as the categorical variable (factor).
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-26 Chapter 1 Getting Started
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-27
5. The tabular results from the analysis of variance appear first, followed by the diagnostic graphics.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-28 Chapter 1 Getting Started
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-29
There is evidence of poor model fit, suggested by the large number of outliers and the skewed distribution
of the residuals. At this point, we would usually stop and fix the problems. However, for illustration,
further analyses are shown.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-30 Chapter 1 Getting Started
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-31
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-32 Chapter 1 Getting Started
Objectives
Define the business need for the scenario used in the
class demonstrations.
Define the business need for the scenario used in the
exercises.
24
24
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.3 Course Scenarios 1-33
27
27
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-34 Chapter 1 Getting Started
228
8
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.3 Course Scenarios 1-35
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-36 Chapter 1 Getting Started
The output includes frequency counts for character columns, summary statistics for numeric columns,
and basic graphs to characterize each.
Partial Output
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.3 Course Scenarios 1-37
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-38 Chapter 1 Getting Started
Exercises
Level 1
Level 2
The List Table Attributes task enables you to create a report with information about a
particular data set. The report includes creation date, location, and number of observations, as
well as the variable names, labels, types, and formats. Select Tasks Data List Table
Attributes.
a. Use the List Table Attributes task on the employee_organization data set. Create the default
report only.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.4 Solutions 1-39
1.4 Solutions
Solutions to Exercises
1. Creating a Project and Running the Characterize Data Task
Use the Characterize Data task to investigate the distribution of each of the columns in the
employee_organization data set.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-40 Chapter 1 Getting Started
Partial Output
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.4 Solutions 1-41
The List Table Attributes task enables you to create a report with information about a
particular data set. The report includes creation date, location, and number of observations, as
well as the variable names, labels, types, and formats. Select Tasks Data List Table
Attributes.
a. Use the List Table Attributes task on the employee_organization data set. Create the default
report only.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-42 Chapter 1 Getting Started
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 2 Working with Data in a
Project
Exercises............................................................................................................. 2-18
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Introduction to Tabular Data 2-3
Objectives
State the definition of a SAS data set.
State how data is stored in a SAS data set.
3
3
4
4
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-4 Chapter 2 Working w ith Data in a Project
5
5
SAS/ACCESS Software
SAS/ACCESS enables you to read data from software
vendors’ files and applications and use those files as
input data for SAS programs and SAS tasks.
If you have appropriate authority, SAS/ACCESS also
enables you to write to other software vendors’ files.
Your system administrator will likely enable access to the
systems that you need.
6
6
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Introduction to Tabular Data 2-5
Interface Engines
Teradata Teradata
Physical Storage Engine
Oracle Oracle
Physical Storage Engine
7
7
Other engines include relational database systems such as SQL Server, Sybase, Informix, and OLE DB.
SAS/ACCESS software for non-relational databases includes ADABAS, IMS, and CA-IDMS.
SAS/ACCESS software to Data Warehouse Appliances includes Netezza and HP Neoview.
Rows
Columns
8
8
SAS University Edition provides data sources to tasks and queries in table format (a set of rows and
columns). A row is one occurrence of an entity. An entity can be a product, a customer, a sale, or some
other thing. Each column describes a characteristic of the entity, such as the product's ID, the customer's
name, or the quantity sold.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-6 Chapter 2 Working w ith Data in a Project
9
9
SAS University Edition allows spaces and special symbols in variable names because it invokes the
VALIDVARNAME=ANY option when SAS is initialized. Because not all SAS environments invoke this
option, it is recommended that you follow SAS standard naming conventions for column names if you
anticipate running code generated by SAS University Edition in other SAS applications. Standard naming
rules are that columns must start with a letter or underscore, and that the remaining characters are letters,
underscores, or numbers.
10
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Introduction to Tabular Data 2-7
Currency, date, and time data types are stored as numeric data. Declaring a column as a currency, date, or
time data type enables SAS University Edition to validate data values when you edit your table and to
display values in a default format.
11
11
A format (display format) is an instruction that you apply to a column. The format directs SAS University
Edition as to how to display data values. Use formats to control the appearance of data values or to group
data values together for analysis.
An informat (read-in format) is an instruction that SAS University Edition uses to read data values into a
variable. For example, the following value contains a dollar sign and commas:
$1,000,000
To remove the dollar sign ($) and commas (,) before storing the numeric value 1000000 in a variable, read
this value with the DOLLAR10. informat.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-8 Chapter 2 Working w ith Data in a Project
12
12
Attributes of a SAS data set are stored in the properties and include the following:
the data set name
the storage location
the date last modified
all column attributes (such as name, type, and length)
the number of rows and columns
Missing Values
If a data value is not present for a column in a particular
row, it is considered missing.
A missing character value is displayed as a blank.
A missing numeric value is displayed as a period or dot.
13
13
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Introduction to Tabular Data 2-9
store
-365 0 366
display
16
16
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-10 Chapter 2 Working w ith Data in a Project
Objectives
Add a local Excel spreadsheet to the SAS files.
View the properties of the data.
18
18
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Accessing Microsoft Excel Data 2-11
In this demonstration, we are not going to repeat all of the steps in Chapter 1 starting on page 1-16. We
will use selected steps from 1 to 8.
1. Orion maintains a list of products in a Microsoft Excel file named products.xlsx. To use this data in
SAS University Edition, it must be imported into a SAS data set. Highlight the filename and drag it
over.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-12 Chapter 2 Working w ith Data in a Project
2. The next step is to confirm the source data and designate the output SAS data set name and storage
location.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Accessing Microsoft Excel Data 2-13
To import a Microsoft Excel 2007 file, you must have any of the follow ing installed on your
computer:
Microsoft Excel 2007.
the 2007 Office System Driver.
the Microsoft Office 2007 Compatibility Pack for Word, Excel, and PowerPoint 2007 file
formats if you are using Microsoft Office 2000, Office 2002, or Office 2003. You can
download the compatibility pack or the 2007 Office system driver from the Microsoft website.
3. Change the library name and filename.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-14 Chapter 2 Working w ith Data in a Project
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Accessing Microsoft Excel Data 2-15
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-16 Chapter 2 Working w ith Data in a Project
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Accessing Microsoft Excel Data 2-17
Updating Results
You can use the products.ctl file to rerun the task.
20
20
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-18 Chapter 2 Working w ith Data in a Project
Exercises
Level 1
Read the data from the Phones worksheet and indicate that the first row contains field
names.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Accessing Microsoft Excel Data 2-19
Level 2
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-20 Chapter 2 Working w ith Data in a Project
Objectives
Use drag and drop to import JMP files.
Use the Import Data task to import tab-delimited,
SPSS, and STATA files into SAS University Edition.
25
25
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.3 Importing JMP, Tab- Delimited, STATA, SPSS Files 2-21
In the data folder, there is a JMP file named PROFIT_SUMMARY. Drag it to the workspace. Change
the data set name to Profit_Summary. Run it.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-22 Chapter 2 Working w ith Data in a Project
Business Scenario
Orion Star has a tab-delimited text file named orders_tab.
To use the data in the text file, it must be imported into
a SAS data set.
29
29
Business Scenario
The SAS data set must have the following attributes:
Name the data set orders_tab.
Exclude the Employee_ID column.
Assign column names based on the first row of the
text file.
Format Profit as a currency value.
Format Discount with a percent sign.
30
30
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.3 Importing JMP, Tab- Delimited, STATA, SPSS Files 2-23
To your computer, a text file is only strings of characters. SAS University Edition requires that the data be
structured with rows and columns (observations and variables).
We cannot use the drag-and-drop functionality directly. We will use the Import Data function.
1. The Import Data Function can be used to convert a text file into a customized SAS data set.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-24 Chapter 2 Working w ith Data in a Project
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.3 Importing JMP, Tab- Delimited, STATA, SPSS Files 2-25
Next Steps
Exclude the Employee_ID column.
Format Profit as a currency value.
Format Discount with a percent sign.
Click the Code button.
31
7. Use the Orders_tab.sas code. Press the F4 key and drag the file to the workspace. Run it.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-26 Chapter 2 Working w ith Data in a Project
Results
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.3 Importing JMP, Tab- Delimited, STATA, SPSS Files 2-27
Exercises
Level 1
Level 2
4. Creating a Data Set with Only the Active Employees by Modifying the Generated SAS Program
Subset the employee_payroll_b data set to include only current employees who do not have a
termination date.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-28 Chapter 2 Working w ith Data in a Project
Partial Results
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.4 Solutions 2-29
2.4 Solutions
Solutions to Exercises
1. Creating a New SAS Data Set from an Excel Spreadsheet
a. Add the Phones spreadsheet from the employee_info.xlsx workbook.
b. Open the employee_info.xlsx workbook using the drag-and-drop technique and create a new SAS
data set named employee_info_phones. Save it in the MYSAS library.
Read the data from the Phones worksheet and indicate that the first row contains field
names.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-30 Chapter 2 Working w ith Data in a Project
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.4 Solutions 2-31
2) Run it.
b. Do not include the columns for Marital_Status or Dependents.
c. Change the type for Employee_ID to Number.
d. Change the format for Salary to Currency (Dollar14.2).
Use the employee_payroll.sas program to make the changes.
data mysas.employee_payroll_b;
set mysas.employee_payroll_a;
employee_id1=input(employee_id,7.);
salary1=input(salary,14.0);
format salary1 dollar14.0;
drop employee_id salary marital_status dependents;
run;
4. Creating a Data Set with Only the Active Employees by Modifying the Generated SAS Program
Subset the employee_payroll_b data set to include only current employees who do not have a
termination date.
a. Select Tasks Data Filter Data.
b. Select employee_payroll_b
c. Modify the SAS program so that the name of the data set is Employee_Payroll_Active.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-32 Chapter 2 Working w ith Data in a Project
d. Include only those employees with a missing value for Employee_Term_Date. Run the code.
Partial Results
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 3 Combining Data with
Queries
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.1 Introduction to Filtering, Sorting, and Querying Data 3-3
Objectives
State the function of the Filter and Sort tasks.
State the function of the New Query task.
Compare the functionality available in each task.
3
3
4
4
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-4 Chapter 3 Combining Data w ith Queries
5
5
Filter Task
The Filter enables you to create a new SAS table by
selecting rows and columns.
6
6
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Joining Tables 3-5
Sort Task
The Sort Task enables you to create a new SAS table by
creating a sort sequence.
7
7
Sort Task
8
8
A query is a collection of specifications that enables you to focus on a particular set of data. The Filter
task, the Sort task, and the Query Builder can be used to build these query specifications.
Behind the scenes, the Filter task, the Sort task, and the Query Builder generate Structured Query
Language (SQL) code.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-6 Chapter 3 Combining Data w ith Queries
Objectives
Join multiple tables by common columns.
Include only matching rows.
10
10
Business Scenario
In a query of the TOPPRODUCTS table, you select only
products where total profits exceed $500. After producing
this list, you want to also see the product category as well
as the product, supplier, and country names. The columns
to include come from three different tables.
TOPPRODUCTS
PRODUCTS
COUNTRY_LOOKUP
11
11
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Joining Tables 3-7
Business Scenario
To include the necessary columns, the TOPPRODUCTS
SAS table must be joined with the PRODUCTS SAS table
and the COUNTRY_LOOKUP Excel spreadsheet (or SAS
table).
12
12
Joining Tables
Joining tables enables you to extract and simultaneously
process data from more than one table.
13
13
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-8 Chapter 3 Combining Data w ith Queries
Tables that participate in the join should have a common column. This column should contain values that
can be matched exactly or can be easily compared between at least two of the tables. A join created on
columns whose values match exactly is called an inner join or equijoin and is the default join type in
SAS University Edition. In addition, greater-than joins, less-than joins, non-equijoins (theta joins), and
outer joins are also supported.
The Query Builder attempts to join tables by columns that have the same name and type. If no column
name and type matches are found, then a warning message tells you to join the columns manually.
Join Properties
The Join Properties window enables you to modify the
join type or condition. Selecting a different join type can
be used to identify or eliminate nonmatching rows.
14
14
To modify the join type or apply a condition, click the down arrow to get a list of the other joins.
The following join types are available:
All rows from the left table given a condition Left join
All rows from the right table given a condition Right join
All rows from both tables given a condition Full outer join
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Joining Tables 3-9
Joining Tables
Someone in the company built a query where products with total profits exceeding $500 were identified.
Analysts asked for more details about these top products, including the product category and the product,
supplier, and country names. To include the necessary columns, the TOPPRODUCTS SAS table must be
joined with the PRODUCTS SAS table and the COUNTRY_LOOKUP Excel spreadsheet or SAS table
using the Query Builder.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-10 Chapter 3 Combining Data w ith Queries
4. Select the data library and then select the TOPPRODUCTS table. Click OK.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Joining Tables 3-11
6. Click the plus sign and then click Join. Put TOPPRODUCTS in the left table and PRODUCTS in
the right table.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-12 Chapter 3 Combining Data w ith Queries
7. Click Save. University Edition automatically selects the join columns if there are appropriate columns
to join. You can override this.
8. Click the plus sign and click Join. Put PRODUCTS in the left table and COUNTRY_LOOKUP in
the right table.
9. Click Save.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Joining Tables 3-13
11. Click the Columns tab and expand the three tables. Drag and drop the following columns to add each
to the Select tab: Product_ID, SUM_of_Profit, Product_Category, Product_Name,
Supplier_Name, and Country_Name.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-14 Chapter 3 Combining Data w ith Queries
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 4 Transforming Data
Exercises............................................................................................................. 4-15
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Creating New Columns w ith an Expression 4-3
Objectives
Explain the use of the DATA, SET, and WHERE
statements.
Define a new column of data by building an
expression.
3
3
4
4
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-4 Chapter 4 Transforming Data
Business Scenario
Orion Star wants to analyze shipment methods by
determining how many days elapse between each order
date and delivery date. The company also wants to
calculate the total amount invoiced to the customer, which
is the sum of the total retail price and the shipping charges.
Delivery_Date – Order_Date
5
5
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Creating New Columns w ith an Expression 4-5
SET Statement
The SET statement reads observations from an existing
SAS data set for further processing in the DATA step.
7
7
WHERE Statement
The WHERE statement selects observations from
a SAS data set that meet a particular condition.
8
8
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-6 Chapter 4 Transforming Data
9
9
10
10
Columns can be computed as summarized data (such as the sum of a variable), a recoded column (such as
grouping values from 0 to 100 as “low”), or an advanced expression (calculated with a mathematical
formula, a function, or some combination of both).
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Creating New Columns w ith an Expression 4-7
SAS Functions
A SAS function is a routine that returns a value
that is determined from specified arguments.
function-name(argument1,argument2, ...)
Example:
sum(Salary,Bonus)
11
11
12
12
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-8 Chapter 4 Transforming Data
Orion Star would like to calculate the total amount invoiced to the customer, which is the sum of the total
retail price and the shipping charges. Use the transform.sas program to create a table named shipping
with the new computed column.
7. To emphasize the orders with the longest delivery time, sort the table in descending sequence by
Time_to_Delivery. Click Time_to_Delivery.
Partial Results (8 of 617 rows)
8. Save the program to the SAS Code folder with the name shipping.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Creating New Columns w ith an Expression 4-9
Exercises
Level 1
h. Create New_Salary, which is the current salary plus an additional 2% raise. Format the values
with a dollar sign, comma, and two decimal places.
i. Include only active employees in the output table. In other words, include only employees who
have a missing value for Employee_Term_Date.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-10 Chapter 4 Transforming Data
Level 2
Use a function to calculate the total, thereby ignore any missing values that might be in
the input columns.
Level 3
Hint: The TODAY function can be used to provide the current date. The function
YRDIF(start-date,end-date,actual) computes the years. The format is 4.1.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Creating New Columns w ith an Expression 4-11
c. Run the program and verify the results. Save the program to the SAS Code folder as
Howlong.sas.
Partial Results (8 of 308 rows).
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-12 Chapter 4 Transforming Data
Objectives
Recode individual values or a range of values
in a column.
18
18
Business Scenario
To further analyze profit per order, management would
like to categorize each order in the following ranges:
$0 to $100
$100 to $500
$500 and Above
19
19
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.2 Creating New Columns by Recoding Values 4-13
TRUE Then
When Order_Type=2 Order_Type_Detail
= 'Catalog Sale'
FALSE
TRUE Then
When Order_Type=3 Order_Type_Detail
= 'Internet Sale'
20
20
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-14 Chapter 4 Transforming Data
To further analyze profit per order, Management wants to categorize the profit value for each order
as follows:
$0 to $100
$100 to $500
$500 and above
SAS Recode.sas.
data work.recode;
set data.orders;
length Profit_Category $ 20;
if Profit <=100 then Profit_Category = "Less or Equal $100";
else if 100.001 <Profit and Profit <= 499.999 then
Profit_Category ="$100 to $500";
else Profit_Category = "Greater Than $500";
keep Customer_ID Order_ID profit Profit_Category;
run;
Use Recode.sas to create a new column named Profit_Category. Run the program
Partial Results
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.2 Creating New Columns by Recoding Values 4-15
Exercises
Level 1
Level 2
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-16 Chapter 4 Transforming Data
d. Run the program. Save the program in the SAS Code folder as SalaryRange.sas.
Partial Results (8 of 424 rows)
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.3 Solutions 4-17
4.3 Solutions
Solutions to Exercises
1. Joining Data from Two Tables
Use the New Query Builder to create a new table whose data results from a join of two tables. Create
a new column.
a. Use the New Query Builder to join employee_payroll and employee_addresses to create a table
named payroll_location.
b. Make sure that the join between the two tables is on the Employee_ID column.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-18 Chapter 4 Transforming Data
c. Include the following columns on the Select Data tab: Employee_ID, Employee_Name,
Employee_Gender, Birth_Date, Salary, Street_Number, Street_Name, City, State,
Employee_Term_Date, and Country.
e. Write a program for parts f through j below. Name the output file payroll_location_1.
f. Format the Salary column with a dollar sign, comma, and two decimal places.
g. Create a new column named Bonus that represents 1.5% of the Salary column. Format the values
with a dollar sign, comma, and two decimal places.
h. Create New_Salary, which is the current salary plus an additional 2% raise. Format the values
with a dollar sign, comma, and two decimal places.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.3 Solutions 4-19
i. Include only active employees in the output table. In other words, include only employees who
have a missing value for Employee_Term_Date.
j. Insert drop employee_term_date; directly before the RUN statement.
k. Run the code and view the results. Save the program in the SAS Code folder as Location.sas.
data Mysas.Payroll_Location1;
set Mysas.Payroll_Location;
format Salary Dollar12.2;
Bonus = Salary*.015;
format Bonus Dollar10.2;
New_Salary = Salary+.02*Salary;
format New_Salary Dollar12.2;
if Employee_Term_Date= .;
drop Employee_Term_Date;
run;
Partial Results (8 of 308 rows)
Create a table from the employee_donations table with a new column that indicates the total
contributions from each employee. Open the employee_donations table.
a. Write a program to create a table named mysas.donations.
b. Include the Employee_ID, Recipients, and Paid_By columns in the new table.
c. Create a new column named Total_Donations, which totals the values of the four quarterly
donations. Format the new column to display dollar signs, commas, and two decimal places.
Use a function to calculate the total, and thereby ignore any missing values that might be
in the input columns.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-20 Chapter 4 Transforming Data
a. Use the employee_payroll table as the input table. Include Employee_ID and
Employee_Hire_Date. Include only employees without a termination date.
b. Create a new column named Years Employed that calculates the number of years that each
employee worked at Orion Star, based on the hire date and today's date. Apply a format that
displays the values rounded to one decimal place.
Hint: The TODAY function can be used to provide the current date. The function
YRDIF(start-date,end-date,actual) computes the years. The format is 4.1
c. Run the program and verify the results. Save the program to the SAS Code folder as
Howlong.sas.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.3 Solutions 4-21
data Mysas.Years_At_Work;
set Mysas.Employees_Payroll;
Years_Employed = Yrdif(Employee_Hire_Date,Today(),Actual);
format Years_Employed 4.1;
keep Employee_ID Employee_Hire_Date Years_Employed;
if Employee_Term_Date = .;
run;
4. Creating a New Column by Recoding Distinct Values
The Country column in the employee_addresses table includes country codes, such as US
or AU. Create a query that includes a new column also named Country_Name that recodes the
values of US and AU to the full country name.
Write a program to create a table named mysas.country_name. Name the program
country_name.sas and store it in the SAS Code folder.
data Mysas.Country_Name;
set Data.Employee_Addresses;
length Country_Name $15;
if Upcase(Country )= "Us" Then Country_Name = "United States";
else If Upcase(Country )= "Au" Then Country_Name = "Australia";
else Country_Name = "Missing";
run;
5. Creating a New Column by Recoding Ranges of Values
Write a program to create a new column that groups salary values into four categories. Name the new
column Salary_Range. Recall that Salary is in mysas.employees_payroll.
a. Create a table named mysas. Salary_Range.
b. Include only the Employee_ID, Salary, and Employee_Hire_Date columns. Format
Employee_Hire_Date with the DDMMYYDw. format so that dates appear as 20/10/2008.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-22 Chapter 4 Transforming Data
c. Create a new column named Salary_Range based on the values in the Salary column.
d. Run the program. Save the program in the SAS Code folder as SalaryRange.sas.
Partial Results (8 of 424 rows)
data Mysas.Salary_Range;
set Mysas.Employees_Payroll;
if Salary > 0 and Salary <24999.999 then
Salary_Range = "Below $25K";
else if Salary >=25000 And Salary < 49999.999 then
Salary_Range = "$25K To $50K";
else if Salary >50000 And Salary <99999.999 then
Salary_Range = "$50 To $100K";
else if Salary > 100000 Then Salary_Range = "Over $100K";
else Salary_Range = "Missing";
format Employee_Hire_Date DDMMYY10.;
keep Employee_ID Employee_Hire_Date Salary_Range Salary;
run;
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 5 Exploring Data with
Tasks
Exercises............................................................................................................. 5-25
Exercises............................................................................................................. 5-36
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating a Frequency Report 5-3
Objectives
Access the One-Way Frequencies task.
Generate a one-way frequency report and bar chart.
3
3
4
4
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-4 Chapter 5 Exploring Data w ith Tasks
Business Scenario
Orion Management wants a report that counts the number
of products offered for each category.
5
5
What Is a Task?
A task is a specific type of analysis,
report, or data manipulation that you
can perform against data in a project.
A task is typically referenced by its
description.
6
6
SAS University Edition tasks generate SAS code and formatted results. The tasks include SAS procedures
that range from simple data listings to more complex analytical procedures. Expand the Task area in the
left navigation pane. Here you see five categories of tasks: Data, Graph, Combinatorics and Probability,
Statistics, and Forecasting. Within these categories are individual procedures. A list of all the procedures
is shown below.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating a Frequency Report 5-5
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-6 Chapter 5 Exploring Data w ith Tasks
7
7
8
8
The One-Way Frequencies task produces a one-way frequency report for each variable included in the
analysis variable role. Binomial and chi-square tests can be requested on the Options tab.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating a Frequency Report 5-7
10
10
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-8 Chapter 5 Exploring Data w ith Tasks
11
11
For binomial proportions, specify a test proportion (null hypothesis proportion value) and confidence
level. Select the tests to perform. To compute the Monte Carlo estimates of the exact p-values instead of
directly computing the exact p-values, select the Use Monte Carlo estimation check box. Monte Carlo
estimation can be useful for large problems that require a great amount of time and memory for exact
computations but for which asymptotic approximations might be insufficient. Exact Computations
Methods Limit computation time specifies the time limit (in seconds) for the computation of each p-value
for each crosstabulation table. The default is 300 seconds (or 5 minutes).
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating a Frequency Report 5-9
Orion management wants a report that summarizes the number of products offered for each category.
Use the One-Way Frequencies task to generate the report.
1. In the Tasks section of the navigation pane, expand the Statistics folder and double-click One-Way
Frequencies.
2. On the DATA tab, select DATA.PRODUCTS as the desired data set for analysis.
3. Under ROLES, click the + icon to assign the variable Product_Category to the analysis variable
role. Click OK.
4. On the OPTIONS tab, expand FREQUENCIES AND PERCENTAGES. To include only the
frequency and percent statistics, select Frequency table and then select Include percentages
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-10 Chapter 5 Exploring Data w ith Tasks
5. Select Run to generate the report and examine the results. View the report in the RESULTS tab.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating a Frequency Report 5-11
Exercises
Level 1
1. Using the One-Way Frequencies Task to Generate Frequency Counts and Percentages
a. Use the data set DATA.employee_organization and the One-Way Frequencies task to generate a
report that analyzes the Department column.
b. Modify the One-Way Frequencies task to display only frequencies and percentages.
c. Generate the report and examine the results.
d. Save the project as Dept_Freq.sas.
Level 2
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-12 Chapter 5 Exploring Data w ith Tasks
Objectives
Create a summary table.
Create a summary report.
Export summary data to Microsoft Excel.
14
14
Business Scenario
Management would like a report that includes a summary
of profits for each product line and category. In this
scenario, you use the data set Orion_Profit.
15
15
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-13
16
16
17
17
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-14 Chapter 5 Exploring Data w ith Tasks
Compute statistics
for each numeric
variable in the list.
18
18
Group Separate descriptive statistics are generated for each group. The groups
are determined by the values of the variable that you assign to this role.
Frequency count This is a variable whose value represents the number of times that the
row should be counted.
Weight This is a variable whose value for each row is used to calculate weighted
means, variances, and sums.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-15
19
19
On the OPTIONS tab, you can choose to include extra information in your output, such as specific
statistics, percentiles, plots, and details about the divisor for standard deviation and variance. The extra
options include the following:
Mean Standard Deviation
Standard Error Variance
Kurtosis Skewness
Minimum or Maximum Percentiles: 1st , 5th , 10th, 90th , 95th, or 99th
Lower or Upper Quartile Median
Coefficient of Variation Corrected or Uncorrected Sum of Squares
Histogram Box Plot
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-16 Chapter 5 Exploring Data w ith Tasks
20
20
Under the Output tab, the option to create an output data set is available. After the option is executed, the
data set of the summary statistics can be found in its respective library.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-17
Summarizing Data
Management would like to be able to access an HTML report or a Microsoft Excel file from a shared
network drive. This report should include a summary of profits for each product line and category.
1. To create the summary report and data set, begin by selecting Tasks Statistics Summary
Statistics from the navigation pane.
2. On the DATA tab, select DATA.ORION_PROFIT as the data set for analysis.
3. On the DATA tab, click the + symbol and add the variable Profit to the analysis variables role.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-18 Chapter 5 Exploring Data w ith Tasks
4. On the DATA tab, click the + symbol next to Classification variables roles, and add the variables
Product_Line and Product_Category.
So far, your screen should match what is shown below. If you are viewing in split screen mode, you
will also see the SAS code automated to the right of the Data pane.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-19
5. Designate statistics to include in the report. On the OPTIONS tab, do the following:
a. Expand Basic Statistics. Clear the check boxes for Standard deviation, Minimum, and
Maximum.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-20 Chapter 5 Exploring Data w ith Tasks
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-21
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-22 Chapter 5 Exploring Data w ith Tasks
9. Create a more descriptive name for the task. Click the Save As button and save the task as Profit by
Product Summary under MY SAS FILES.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-23
b. Double-click Generate CSV File to open snippet code to generate an Excel file.
c. Edit the SAS code to represent the correct output data set name. For this demonstration, replace
data = sashelp.cars with the output data set that you created in step 8, data =work.profit_sum.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-24 Chapter 5 Exploring Data w ith Tasks
e. Click Run to export the Excel file. If the code runs properly, a window appears that asks whether
you want to open the excel file. Select Open.
Partial Results
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-25
Exercises
Level 1
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-26 Chapter 5 Exploring Data w ith Tasks
Level 2
Level 3
b. Include the mean, minimum, maximum, and median salary for each group, and create a
histogram.
c. Within the Summary Statistics task, on the OPTIONS tab, expand Additional Statistics. Select
the Skewness box. Click Run.
Skewness is a measure that indicates to what degree the distribution of data values has a
longer tail to the left or to the right. For the Salary distribution for both US and AU, there
are a few extreme values at the high end and a concentration of values at the low end of
the distribution, so the right tail of the distribution curve is longer (right-skewed). For
such data, the skewness statistic is greater than zero. If the data are left-skewed, the
skewness statistic is less than zero.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-27
d. Examine the output and verify that the additional skewness statistic was added to the results.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-28 Chapter 5 Exploring Data w ith Tasks
Objectives
Filter data directly in a task.
Using various operators, build multiple filter criteria.
23
23
Business Scenario
The Marketing Department at Orion Star wants to run
a special promotion featuring products purchased by
customers between the ages of 19 and 38 who have
placed orders since May 2003.
24
24
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Filtering Data in a Task 5-29
25
25
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-30 Chapter 5 Exploring Data w ith Tasks
27
27
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Filtering Data in a Task 5-31
The Marketing Department at Orion Star wants to run a special promotion featuring products purchased
by customers between the ages of 19 and 38 who placed orders since May 2003. Modify the Summary
Statistics task to add an appropriate filter.
1. In the navigation pane, expand Tasks. Then expand Data and select Filter Data.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-32 Chapter 5 Exploring Data w ith Tasks
4. After Customer_Age is assigned to Variable 1, you want Customer_Age to be greater than or equal
to 19.
a. In the Comparison field, click the drop-down arrow and select Greater than or equal to.
b. In the Value type field, click the drop-down arrow and select Enter a Value.
c. In the Value field, enter 19.
5. You also want to include customers who are 38 or younger. Therefore, in the Logical field, click the
drop-down arrow and select AND. This action adds a second filter to include this condition.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Filtering Data in a Task 5-33
8. Now you want to include customers who have placed orders since May 2003. To add a third filter,
select the drop-down arrow in the Logical field and select AND.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-34 Chapter 5 Exploring Data w ith Tasks
c. In the Value field, enter '01MAY2003'd. (Be sure to include the single quotation marks.)
11. Expand OUTPUT DATA SET and create a data set named FILTER_PROFIT. Select the show
output data check box.
12. Click Run. Check the output data set to make sure that it is filtered the way that you want.
Partial Results
13. Open the previous demonstration that was saved as Profit by Product Summary.
14. On the DATA tab, select WORK.FILTER_PROFIT as the data set to analyze
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Filtering Data in a Task 5-35
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-36 Chapter 5 Exploring Data w ith Tasks
Exercises
Level 1
a. Select the DATA.EMPLOYEE_ADDRESSES data set and use the Filter Data task so that only
San Diego employees are included in the report.
e. Click Run
f. Save the project as output_SanDiego.
Level 2
a. Select the DATA.EMPLOYEE_ADDRESSES data set and use the Filter Data task so that all
employees from Australia are included in the report.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Filtering Data in a Task 5-37
e. Click Run.
f. Save the project as output_AU.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-38 Chapter 5 Exploring Data w ith Tasks
Objectives
Compare means using a t test.
Run a distribution analysis.
Run a nonparametric one-way ANOVA.
30
30
Business Scenario
The Human Resources Department is involved in a
situation that requires them to determine whether the
mean salary is different between men and women.
31
31
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-39
t Test
A two-sample t test is used to determine whether the
means of two populations are equal. The test assumes
that the data are normally distributed for the t test to be
valid.
This analysis tests whether
the mean salary for women
is different from the mean
salary of men.
32
32
33
33
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-40 Chapter 5 Exploring Data w ith Tasks
34
34
If there are three or more groups, the Kruskal-Wallis test is used when you specify the Wilcoxon
option. The Kruskal-Wallis and the Wilcoxon rank-sum tests are equivalent when two groups are
compared.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-41
Comparing Means
1. Expand Tasks in the navigation pane. Then expand Statistics and select T Tests.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-42 Chapter 5 Exploring Data w ith Tasks
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-43
6. On the OPTIONS tab, expand the PLOTS section and select Default plots. Default plots include
histogram, box plot, and normality plot.
7. Click Run.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-44 Chapter 5 Exploring Data w ith Tasks
The Satterthwaite p-value of 0.1249 is greater than .05, our significance level. Therefore, we fail to
reject the null hypothesis that the means are equal, and conclude that there is insufficient evidence
that the means between the two groups differ. This indicates that the mean salary of employees does
not differ significantly between genders. This assumes normally distributed data.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-45
Distribution Analysis
Run a distribution analysis to confirm that the data are not normally distributed.
1. In the navigation pane, expand Tasks. Then expand Statistics and select Distribution Analysis.
2. On the DATA tab, under DATA, select MYSAS.EMPLOYEE_PAYROLL as the data set to analyze.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-46 Chapter 5 Exploring Data w ith Tasks
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-47
5. On the OPTIONS tab, expand CHECKING FOR NORMALITY and select the box for Histogram
and goodness-of-fit tests.
6. Click Run.
Distribution of Salary – Female
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-48 Chapter 5 Exploring Data w ith Tasks
Based on the p-value of <0.01 (for both male and female employees), we reject the null hypothesis
that the data are normally distributed. The t test would not be the most appropriate test to compare the
groups in this example. A nonparametric test would be preferable.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-49
1. In the navigation pane, expand Tasks. Then expand Statistics, and select Nonparametric One-Way
ANOVA.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-50 Chapter 5 Exploring Data w ith Tasks
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-51
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-52 Chapter 5 Exploring Data w ith Tasks
5. On the OPTIONS tab, expand TESTS and select the Wilcoxon Scores box.
6. Click Run.
The Wilcoxon two-sided Pr > Z of 0.5889 is greater than the significance level of 0.05. You fail to
reject the null hypothesis that the salaries are equal. The salaries of males and females at Orion Star
are not significantly different.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-53
5.5 Solutions
Solutions to Exercises
1. Using the One-Way Frequencies Task to Generate Frequency Counts and Percentages
a. Use the data set DATA.employee_organization and the One-Way Frequencies task to generate a
report that analyzes the Department column.
1) To open the One-Way Frequencies task, select Tasks Statistics One-Way Frequencies
from the navigation pane.
2) On the DATA tab, select DATA.EMPLOYEE_ORGANI ZATION as the data set for
analysis.
3) On the DATA tab, under ROLES, click the + icon and select the variable Department to
assign it to the analysis variable role.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-54 Chapter 5 Exploring Data w ith Tasks
b. Modify the One-Way Frequencies task to display only frequencies and percentages.
Click the OPTIONS tab and expand Frequencies and percentages. Select Frequency table and
Include percentages to display frequencies and percentages only.
c. Click Run to generate the report. Examine the results.
d. Save the project as Dept_Freq.
a. Use the data set DATA.employee_organization and the One-Way Frequencies task to analyze
Job_Title, grouped by Department. Include only frequencies and percentages.
1) To open the One-Way Frequencies task, select Tasks Statistics One-Way Frequencies
from the navigation pane.
2) On the DATA tab, under ROLES, click the + icon to add the variable Job_Title to the analysis
variables role. Click OK.
3) On the DATA tab, expand ADDITIONAL ROLES. Click the + icon next to group analysis
by to add Department as the group analysis by variable.
4) On the OPTIONS tab, expand Frequencies and Percentages in the Frequency Table options
pane and select Include cumulative frequencies and percentages
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-55
a. Use the Summary Statistics task to generate a report on the Salary column by Department.
1) In the navigation pane, expand Tasks. Select Statistics Summary Statistics.
2) On the DATA tab, select DATA.EMPLOYEE_MASTER as the data set to analyze.
3) Under ROLES, assign Salary to the analysis variable role.
4) Under ROLES, assign Department to the classification variable role.
b. Include the mean, minimum, maximum, and median salary for each group.
1) On the OPTIONS tab, Expand Basic Statistics and select Mean, Minimum, Maximum, and
Median.
4) Click Save.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-56 Chapter 5 Exploring Data w ith Tasks
3) Click Run and then open the CSV file that is created.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-57
Partial Results
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-58 Chapter 5 Exploring Data w ith Tasks
b. Include the mean, minimum, maximum, and median salary for each group.
1) On the OPTIONS tab, select Statistics Basic Statistics. Then select Mean, Minimum,
Maximum, and Median.
2) Expand Percentiles and select Median.
c. Submit the task and view the report. Save the task as Salary by Dept/State Summary.
1) Click Run.
2) Click the Save as button.
3) In the Name field, enter Salary by Dept/State Summary.
4) Click Save.
Partial Results
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-59
c. Within the Summary Statistics task, on the OPTIONS tab, expand Additional Statistics. Then
select Skewness.
Skewness is a measure that indicates to what degree the distribution of data values has a
longer tail to the left or to the right. For the Salary distribution, for both US and AU,
there are a few extreme values at the high end and a concentration of values at the low
end of the distribution, so the right tail of the distribution curve is longer (right-skewed).
For such data, the skewness statistic is greater than zero. If the data is left-skewed, the
skewness statistic will be less than zero.
Click Run.
d. Examine the output and verify that the additional skewness statistic was added to the results.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-60 Chapter 5 Exploring Data w ith Tasks
c. Click Run.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-61
Partial Results
d. Select Tasks Data List Data. Use the filtered data set and include the following variables
Employee_ID, Employee_Name, Street_Number, Street_Name, and Postal_Code.
1) In navigation pane, expand Tasks and select Data Filter Data.
2) On the DATA tab, select WORK.FILTER_ADDRESSES as the data set to analyze.
3) Under ROLES, click the + symbol in the List variables area and select the following
variables: Employee_ID, Employee_Name, Street_Number, Street_Name, and
Postal_Code.
e. Click Run.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-62 Chapter 5 Exploring Data w ith Tasks
Partial Results
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-63
11) In the Value field, click the drop-down arrow and select au.
c. Click Run. Check the output data set to make sure that the filter was applied properly.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-64 Chapter 5 Exploring Data w ith Tasks
Partial Results
d. Select Tasks Data List Data. Use the filtered output data set, filter_address_au, and
include the following variables: Employee_ID, Employee_Name, Street_Number,
Street_Name, and Postal_Code.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-65
Partial Results
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-66 Chapter 5 Exploring Data w ith Tasks
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 6 Visualizing Data
6.1 Creating a Graph ....................................................................................................... 6-3
Demonstration: Creating a Bar Chart ......................................................................... 6-8
Demonstration: Creating a Stacked Bar Chart ........................................................... 6-14
Exercises............................................................................................................. 6-17
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-3
Objectives
Name the types of graphs supported in
SAS University Edition.
State the differences among the output formats
that are supported in SAS University Edition.
Create a bar chart.
Create a grouped/stacked bar chart using the
advanced bar chart interface.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-4 Chapter 6 Visualizing Data
Business Scenario
To better understand the demographics of the Orion Star
customers, Management wants to see a comparison of
average customer age by country.
SAS University Edition has several Graph tasks that you can choose from. Above are quick examples of a
line graph, pie chart, bar chart, and line bar chart
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-5
Types of Graphs
The graphs are displayed
under Tasks Graph.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-6 Chapter 6 Visualizing Data
On the Options tab, you can enter titles and footnotes for the graphical tasks and even choose the font size
for each one. Options vary depending on the graph.
10
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-7
11
12
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-8 Chapter 6 Visualizing Data
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-9
3. On the DATA tab, under ROLES, assign Customer_Country as the category variable by clicking the
+ symbol
4. On the DATA tab, under ROLES, assign Customer_Age as the Response variable by clicking the +
symbol
On the DATA Tab, expand DIRECTION and select the Horizontal radio button.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-10 Chapter 6 Visualizing Data
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-11
5. On the DATA tab, expand STATISTICS, and select Mean as the statistic for the bar chart.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-12 Chapter 6 Visualizing Data
b. Expand BAR DETAILS and select the Apply bar color check box. Choose any color by using
the Bar color menu. Change the Color bars by value to Bar category.
d. Expand BAR LABELS and select the Show bar labels or statistics box. This displays the
average age on each bar.
e. Expand CATEGORY AXIS and select Show values in data order, and slow label.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-13
7. Select Run.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-14 Chapter 6 Visualizing Data
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-15
5. On the DATA tab, expand Direction and select Vertical. Then expand Group Layout and select
Stack.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-16 Chapter 6 Visualizing Data
b. Clear the box for Show values in data order. This puts the category variable in ascending order.
To change to descending order, select the Reverse check box.
8. Select Run.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-17
Exercises
Level 1
1. Creating a Vertical Bar Chart That Graphs Total Salary for Each City
a. Use the Bar Chart task and select DATA.EMPLOYEE_DETAIL as the data set to analyze.
b. Create a vertical bar chart that shows the sum of Salary for each value of City.
c. Change the following options:
1) Title the graph Total Salary for Each City and change Data skin to Matte.
2) Show data labels at the top of each bar with the sum of Salary.
3) Label the bar height (vertical) axis as Total Salary.
d. Run the task and review the results.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-18 Chapter 6 Visualizing Data
Objectives
Create a scatter plot.
Enhance a scatter plot by adding a regression line.
Create a pie chart.
16
Scatter Plot
The default Scatter Plot task creates a two-dimensional
plot that displays the values of two variables as data
points on one horizontal axis (x) and one vertical axis (y).
17
Each pair of x and y values forms a data point. The scatter plot is especially useful for non-summarized,
randomly distributed data. It is designed to accommodate large data sets. The user can set the color, size,
and shape of the markers shown. Interpolation including linear regression is also available in this plot.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.2 Creating a Scatter Plot and Pie Chart 6-19
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-20 Chapter 6 Visualizing Data
4. Also under ROLES, assign Shipping as the Y variable by selecting the + symbol.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.2 Creating a Scatter Plot and Pie Chart 6-21
6. To produce a regression line that shows the best fit through these data points, expand FIT PLOTS
and select the box for Regression.
The resulting regression line through the scatter plot indicates the relationship between
shipping costs and price per unit shipped.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-22 Chapter 6 Visualizing Data
Pie Chart
Pie charts represent a popular way to display frequencies.
As with bar charts, the size of the slices can also
represent a sum or mean of a response variable.
19
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.2 Creating a Scatter Plot and Pie Chart 6-23
To better understand the demographics of the Orion Star profit, Management wants to see a comparison
of total profit based on product line. Use the Pie Chart task to create the result.
1. Expand Tasks in the navigation pane and then expand Graph. Double-click the Pie Chart task.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-24 Chapter 6 Visualizing Data
3. Under ROLES, assign the variable Product_Line by selecting the + symbol next to the Category
variable role.
4. Under ROLES, assign the variable Profit by selecting the + symbol next to the Response variable
role.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.2 Creating a Scatter Plot and Pie Chart 6-25
5. Select Run.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-26 Chapter 6 Visualizing Data
Exercises
Level 1
2. Creating a Two-Dimensional Scatter Plot
Management wants to see how an employee's birthdate relates to his or her hire date. Produce a report
that addresses this request.
a. Use the Scatter Plot task and select MYSAS.EMPLOYEE_PAYROLL as the data set to analyze.
b. Create a scatter plot that shows Employee_Hire_Date and Birth_Date.
c. Add a linear regression interpolation line to the graph.
d. Name the scatter plot Hiring Trends.
e. Select Run and analyze the results.
f. Are there any trends or changes in the hiring practices of the company?
Level 2
3. Using the Pie Chart Task to Analyze Salary by Department
a. Use the Pie Chart task and select DATA.EMPLOYEE_DETAIL as the data set to analyze. The
slices should represent the sum of Salary for each department.
b. Show the department name and salary total outside each slice.
c. Change the title to Total Salary by Department and run the task.
d. Select Run.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.3 Solutions 6-27
6.3 Solutions
Solutions to Exercises
1. Creating a Vertical Bar Chart That Graphs Total Salary for Each City
a. Use the Bar Chart task and select DATA.EMPLOYEE_DETAIL as the data set to analyze.
Expand Tasks on the navigation pane. Expand Graph and select Bar Chart.
b. Create a vertical bar chart that shows the sum of Salary for each value of City.
1) On the DATA tab, do the following:
a) Select DATA.EMPLOYEE_DETAIL as the data set to analyze.
b) Under ROLES, assign City to the category variable role.
c) Under ROLES, assign Salary to the response variable role.
d) Under DIRECTION, select Vertical.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-28 Chapter 6 Visualizing Data
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.3 Solutions 6-29
Expand RESPONSE AXIS. Select the Show label box, and enter Total Salary in the
Custom label field.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-30 Chapter 6 Visualizing Data
b. Create a scatter plot that shows Employee_Hire_Date on the horizontal axis and Birth_Date on
the vertical axis
1) On the DATA tab, select MYSAS.EMPLOYEE_PAYROLL as the data set to analyze.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.3 Solutions 6-31
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-32 Chapter 6 Visualizing Data
f. Are there any trends or changes in the hiring practices of the company?
Since 2003, the company seems to be hiring employees of a much more diverse age range.
3. Using the Pie Chart Task to Analyze Salary by Department
a. Use the Pie Chart task and select DATA.EMPLOYEE_DETAIL as the data set to analyze where
the slices represent the sum of Salary for each department.
1) Expand Tasks on the navigation pane. Expand Graph and select Bar Chart.
2) On the DATA tab, select DATA.EMPLOYEE_DETAIL as the data set to analyze.
3) Under ROLES, specify Department as the category variable by clicking the + symbol next to
the Category variable role.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.3 Solutions 6-33
4) Under ROLES, specify Salary as the response variable by clicking the + symbol next to the
Response variable role.
b. Show the department name and salary total outside each slice.
On the OPTIONS tab, expand PIE LABELS and select Outside for the Location field.
c. Change the title to Total Salary by Department and run the task.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-34 Chapter 6 Visualizing Data
d. Select Run.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 7 Performing Statistical
Analyses with Tasks
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-3
Objectives
Describe each analysis in the ANOVA, Regression,
and Multivariate task menus.
Name an example of a research or business problem
that can be addressed by each type of analysis.
SAS University Edition has a wide selection of tasks for performing statistical analysis with your data.
With SAS programs, you can do even more. This section provides an overview of the tasks for statistical
analysis in SAS University Edition and examples of applications of each type of analysis.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-4 Chapter 7 Performing Statistical Analyses w ith Tasks
Cars Data
The demonstrations in this chapter use the CARS data
set, which is distributed with SAS software and included
in the Sashelp library.
Data is gathered for the price and physical characteristics
of 428 cars.
Sashelp.CARS
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-5
Sashelp Library
In addition to the MAPS library, SAS comes with data sets
in a library named Sashelp. These data sets are used
throughout the SAS documentation to show examples of
the usage of SAS software.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-6 Chapter 7 Performing Statistical Analyses w ith Tasks
t-Test Task
The t-Test task performs the following actions:
calls the TTEST procedure
performs one-sample, paired-samples, and
two-sample t tests
generates two types of confidence intervals for the
standard deviation
enables specification of the null hypothesis value for
the test
Examples of t Tests
Do students show a change in test scores after a
semester in a training program?
Does revenue differ between two advertising
campaigns?
Is the average temperature of a refrigerated product
significantly different from the target temperature of
34 degrees Fahrenheit?
10
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-7
11
12
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-8 Chapter 7 Performing Statistical Analyses w ith Tasks
13
14
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-9
15
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-10 Chapter 7 Performing Statistical Analyses w ith Tasks
One-Way ANOVA
Use the One-Way ANOVA task to compare the MSRP for cars made in the US, Europe, and Asia.
Compare groups using Tukey’s HSD test and create a plot of the means with error bars.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-11
3. On the DATA tab, select SASHELP.CARS as the data set for analysis. Click the Browse button
under DATA, expand the SASHELP library, and select CARS data set.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-12 Chapter 7 Performing Statistical Analyses w ith Tasks
6. On the OPTIONS tab, expand COMPARISONS. From the Comparisons method list, select Tukey.
(This is the default comparison method.)
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-13
The ANOVA results show F(2,425)=76.98 and p-value < 0.0001. This suggests that the MSRP is
significantly different for at least two of the Origin groups.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-14 Chapter 7 Performing Statistical Analyses w ith Tasks
Tukey’s Studentized Range (HSD) test compares the groups using an experimentwise error rate of 0.05.
To read the results, look at the final table. Any pair of groups that are significantly different is indicated
by a p-value less than 0.05.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-15
Here is the correct way to interpret these results: European cars have significantly different mean MSRP
values than US or Asian cars. The mean MSRPs for US and Asian cars are not significantly different.
The plot shows that the highest prices are for European cars. This plot also enables you to see outliers.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-16 Chapter 7 Performing Statistical Analyses w ith Tasks
17
18
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-17
19
20
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-18 Chapter 7 Performing Statistical Analyses w ith Tasks
21
22
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-19
23
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-20 Chapter 7 Performing Statistical Analyses w ith Tasks
Use the CARS data set to predict MPG_Highway from Weight using simple linear regression.
1. Expand Tasks on the navigation pane. Under Tasks, expand Statistics and double-click Linear
Regression.
2. On the DATA tab, select SASHELP.CARS as the data set for analysis. Click the Browse button
under DATA, expand the SASHELP library, and select the CARS data set.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-21
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-22 Chapter 7 Performing Statistical Analyses w ith Tasks
6. Highlight the variable Weight and click Add. The variable Weight is then added to the Model effects
list.
Here is the code that SAS Studio created from the Linear Regression task:
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-23
Interpretation
With a significant p-value of <.0001 for Weight, the regression results demonstrate a significant
negative linear relationship between Weight and MPG_Highway. The regression equation can
be determined from the Parameter Estimates table. We claim that there is sufficient evidence that
for every one pound increase in Weight, the MPG_Highway decreases by 0.006.
Equation
MPG_Highway = 48.25 – 0.006*Weight.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-24 Chapter 7 Performing Statistical Analyses w ith Tasks
The fit plot shows 95% confidence limits for the regression line (shaded area) and 95% prediction
limits (dashed lines). The plot shows evidence of some unusual observations, which could merit
further investigation.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-25
Correlations Task
The Correlations task performs the following actions:
calls the CORR procedure
performs correlation analysis and partial correlation
analysis
produces four different tests of association
displays Cronbach’s coefficient alpha and related
statistics
creates a scatter-plot matrix of variables and other
useful plots
25
26
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-26 Chapter 7 Performing Statistical Analyses w ith Tasks
27
28
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-27
Objectives
Use the Correlations task to explore data.
Use the Query Builder to transform data.
Use the Linear Regression task to fit a linear
regression model.
Locate a statistical option for a SAS program in the
SAS documentation.
Modify a SAS program generated by a task to perform
a customized statistical analysis.
Perform Gamma regression with the Generalized
Linear Models task.
30
Correlations Task
Investigate plots of MSRP versus predictor variables of
interest using the Correlations task.
31
In this section, the Correlations task produces a scatter plot of the response variable, MSRP, versus the
predictor variables of interest: Horsepower, MPG_City, MPG_Highway, and Weight. From these, you
can examine associations among these variables to help you determine which effects should be entered
into a linear regression model.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-28 Chapter 7 Performing Statistical Analyses w ith Tasks
32
The Linear Regression task provides model selection options, including stepwise and all-subsets
regression, with a variety of criteria. In the next demonstration, you fit an all-subsets regression model
using Mallows’ Cp as the criterion for assessing the best models.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-29
To explore the relationships between the predictors and the response, perform correlation analysis.
1. Expand Tasks on the navigation pane. Under Tasks, expand Statistics and double-click Correlation
Analysis.
2. On the DATA tab, select SASHELP.CARS as the data set for analysis. Click the Browse button
under DATA, expand the SASHELP library, and select the CARS data set.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-30 Chapter 7 Performing Statistical Analyses w ith Tasks
4. Assign Horsepower, MPG_City, MPG_Highway, and Weight to the Correlate with role. Select
Results.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-31
5. On the OPTIONS tab, expand PLOTS. Select Matrix of scatter plots for the Type of plot field.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-32 Chapter 7 Performing Statistical Analyses w ith Tasks
A scatter plot is created for each predictor and the response. There is also a scatter-plot matrix at the end
of the output, which might be easier to use in this example. The scatter-plot matrix shows evidence of
outliers and a curvilinear association between the predictors and MSRP. Also, evidence suggests that the
variance in MSRP is not constant across values of the predictors.
To fit linear regression with polynomial terms, higher-order variables can be created with the Transform
Data task.
The Linear Regression task does not build higher-order terms automatically. However, it does
perform stepwise and all-subsets regression. The Linear Regression task creates estimate
parameters and test hypotheses using the same statistical methods (ordinary or weighted least
squares).
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-33
7. To create higher-order terms, expand Tasks on the navigation pane. Under Tasks, expand DATA and
double-click Transform Data.
8. On the DATA tab, select SASHELP.CARS as the data set for analysis.
9. Assign Horsepower to the Variable 1 role
10. From the Transform menu, select the square. This creates a new variable named sqr_Horsepower.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-34 Chapter 7 Performing Statistical Analyses w ith Tasks
11. Expand OUTPUT DATA SET and create a data set named Transform. Select Run. This creates a
temporary data set of SASHELP.CARS, named WORK.TRANSFORM, and creates the variable
sqr_Horsepower
12. Now, in the same Transform Data task, select WORK.TRANSFORM as the data set for analysis.
13. Assign the variables MPG_highway MPG_city, and Weight to TRANSFORM 1, 2, and 3,
respectively. Square each variable by selecting Square in each of the corresponding Transform
fields.
14. Expand OUTPUT DATA SET and re-create the data set transform_cars to add these additional
parameters. Select Run.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-35
Now we want to create a new linear regression using the new higher-ordered parameters.
15. Expand Tasks on the navigation pane. Under Tasks, expand Statistics and double-click Linear
Regression.
16. On the DATA tab, select WORK.TRANSFORM_CARS as the data set for analysis. Click the
Browse under DATA, expand the WORK library, and select the TRANSFORM_CARS data set.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-36 Chapter 7 Performing Statistical Analyses w ith Tasks
18. Assign Horsepower, MPG_City, MPG_Highway, Weight, and the four squared variables that you
created as explanatory variables by clicking the + symbol next to Continuous variables.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edit ion 7-37
19. On the MODEL tab, highlight all of the variables in the Variable list and click Add. This adds all the
variables to the Model effects list.
20. On the SELECTION tab, expand MODEL SELECTION and then do the following:
a. Select Stepwise selection in the Selection method field.
b. Select Mallows’ C(p) Statistic in the Add/remove effects with field.
c. Select Mallows’ C(p) Statistic in the Stop Adding/removing effects with field.
d. Select Mallows’ C(p) Statistic in the Select best model by field.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-38 Chapter 7 Performing Statistical Analyses w ith Tasks
21. On the SELECTION tab, expand SELECTION STATISTICS. Select Selected fit statistics in the
Model fit statistics field. A number of selection statistics appear. Select Adjusted R-square.
Therefore, with one mile per hour increase MPG_Highway and holding all other variables constant,
MSRP will increase $2134.66.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-39
Example, with a p-value of 0.0043, MPG_City is a significant negative linear predictor of MSRP
when MPG_Highway, Sqr_horsepower, Sqr_MGP_highway, and Sqr_MPG_City are in the
model.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-40 Chapter 7 Performing Statistical Analyses w ith Tasks
SAS Syntax
SAS programs consist of two types of steps:
DATA steps create, modify, format, and combine data.
– The DATA step begins with a DATA statement.
PROC steps perform complex analyses or operations
with a few simple statements.
– A PROC step begins with a PROC statement.
A SAS statement begins with a keyword and ends with
a semicolon.
34
DATA Step
35
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-41
PROC Step
proc glm data=exp;
class group;
model y = a b a*b;
run;
36
37
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-42 Chapter 7 Performing Statistical Analyses w ith Tasks
In this example, use the documentation to find an option to display only the five best models from the all-
subsets regression, and use this option in the SAS program generated by SAS University Edition.
1. Look at the SAS program generated by SAS University Edition by clicking the Code tab. Scroll
down to the section that begins with PROC REG.
The REG procedure is the primary procedure called by the Linear Regression task.
2. To open the SAS documentation for the REG procedure, Click the ? in the top right corner of SAS
Studio and select SAS Product Documentation.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-43
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-44 Chapter 7 Performing Statistical Analyses w ith Tasks
4. On the Documentation tab, click HTML next to SAS 9.4 Procedures by Name and Product.
5. In the left navigation pane, expand SAS Procedures and click SAS Procedures by Name.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-45
6. Under R, click REG to access the documentation for the REG procedure (in this case, the SAS/STAT®
User’s Guide).
7. From here you can review the syntax for all of the options in PROC REG. If you want to add code to
the pre-existing code that the Linear Regression task has written, on the CODE tab in the right
navigation pane, click Edit. This opens a new programming tab and copies the same code from the
Linear Regression task. You can now edit the code.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-46 Chapter 7 Performing Statistical Analyses w ith Tasks
39
The Generalized Linear Models task fits models in which some function of the response variable (called
the link function) is modeled as a linear combination of predictors and model parameters. In generalized
linear models, the responses can come from any distribution in the exponential family of distributions.
These distributions can represent continuous or discrete data.
Examples of continuous distributions in the exponential family include normal and gamma. The gamma
distribution is useful for right-skewed data in which the variance increases as a function of the square of
the mean. In the previous example, the variance of MSRP was not constant, so a gamma regression
model might be more appropriate.
In addition to plots that can provide evidence of nonconstant variance, the Linear Regression task
in SAS University Edition can request a statistical test for heteroscedasticity. In the Linear
Regression task, select Statistics Heteroscedasticity test.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-47
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-48 Chapter 7 Performing Statistical Analyses w ith Tasks
Although the inverse link is the canonical link for the gamma distribution, the log link is
often used to maintain positive predicted values. That makes sense in this example, because
MSRP should not be negative.
You do not need to add the squared terms that you created, although you can do so if you
prefer. The Generalized Linear Models task can create the polynomial terms for you
automatically.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-49
6. On the MODEL tab, select the four variables in the Variables pane and click Polynomial Order = N.
An N=2 degrees of freedom specification generates all one- and two-way polynomial terms.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-50 Chapter 7 Performing Statistical Analyses w ith Tasks
The quadratic term for MPG_Highway is not statistically significant, and it has the largest p-value,
0.2033. It might be useful to drop this term and refit the model.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-51
Partial Output
The Diagnostic Statistics panel shows evidence of several outlier or leverage points (or both) in the data
set. These observations should be investigated further.
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-52 Chapter 7 Performing Statistical Analyses w ith Tasks
Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.