Gasue34 003 PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 270

Data Manipulation and

Analytics Using SAS®


University Edition

Course Notes
Data Manipulation and Analytics Using SAS® University Edition Course Notes was developed by Jaymie
Shanahan and Mike Speed. Editing and production support was provided by the Curriculum Development
and Support Department.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product
names are trademarks of their respective companies.
Data Manipulation and Analytics Using SAS ® University Edition Course Notes
Copyright © 2016 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States of
America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written
permission of the publisher, SAS Institute Inc.
Book code E70815, course code GASUE34, prepared date 03Aug2016. GASUE34_003

ISBN 978-1-62960-685-9
For Your Infor mation iii

Table of Contents
Course Description ......................................................................................................vi

Prerequisites .............................................................................................................. vii

Chapter 1 Getting Started .................................................................................. 1-1

1.1 Using/Installing SAS University Edition .............................................................. 1-3

1.2 Introducing SAS University Edition .................................................................... 1-8


Demonstration: ACTEVA Data Analysis: Part 1 .............................................1-16
Demonstration: ACTEVA Data Analysis: Part 2 .............................................1-25

1.3 Course Scenarios .............................................................................................1-32


Demonstration: Navigating the SAS University Edition Workspace...................1-35
Exercises ..................................................................................................1-38

1.4 Solutions ........................................................................................................1-39


Solutions to Exercises.................................................................................1-39

Chapter 2 Working with Data in a Project ......................................................... 2-1

2.1 Introduction to Tabular Data............................................................................... 2-3

2.2 Accessing Microsoft Excel Data ........................................................................2-10


Demonstration: Reading in an Excel File....................................................... 2-11
Exercises ..................................................................................................2-18

2.3 Importing JMP, Tab-Delimited, STATA, SPSS Files ..............................................2-20


Demonstration: Read a JMP File into University Edition .................................2-21
Demonstration: Read a Tab-Delimited File into University Edition ....................2-23
Exercises ..................................................................................................2-27

2.4 Solutions ........................................................................................................2-29


Solutions to Exercises.................................................................................2-29

Chapter 3 Combining Data with Queries........................................................... 3-1

3.1 Introduction to Filtering, Sorting, and Querying Data............................................. 3-3

3.2 Joining Tables .................................................................................................. 3-5


Demonstration: Joining Tables ...................................................................... 3-9
iv For Your Information

Chapter 4 Transforming Data ............................................................................ 4-1

4.1 Creating New Columns with an Expression .......................................................... 4-3


Demonstration: Creating a Column with an Expression .................................... 4-8
Exercises ................................................................................................... 4-9

4.2 Creating New Columns by Recoding Values ........................................................4-12


Demonstration: Creating a New Column by Recoding Values ..........................4-14
Exercises ..................................................................................................4-15

4.3 Solutions ........................................................................................................4-17


Solutions to Exercises.................................................................................4-17

Chapter 5 Exploring Data with Tasks ................................................................ 5-1

5.1 Creating a Frequency Report .............................................................................. 5-3


Demonstration: Generating a One-Way Frequency Report ................................ 5-9
Exercises .................................................................................................. 5-11

5.2 Generating Summary Statistics ..........................................................................5-12


Demonstration: Summarizing Data...............................................................5-17
Exercises ..................................................................................................5-25

5.3 Filtering Data in a Task.....................................................................................5-28


Demonstration: Filtering Data in a Task ........................................................5-31
Exercises ..................................................................................................5-36

5.4 Comparing Means ...........................................................................................5-38


Demonstration: Comparing Means ...............................................................5-41
Demonstration: Distribution Analysis ...........................................................5-45
Demonstration: Nonparametric One-Way ANOVA .........................................5-49

5.5 Solutions ........................................................................................................5-53


Solutions to Exercises.................................................................................5-53

Chapter 6 Visualizing Data ................................................................................ 6-1

6.1 Creating a Graph .............................................................................................. 6-3


Demonstration: Creating a Bar Chart ............................................................. 6-8
Demonstration: Creating a Stacked Bar Chart ................................................6-14
Exercises ..................................................................................................6-17
For Your Infor mation v

6.2 Creating a Scatter Plot and Pie Chart ..................................................................6-18


Demonstration: Creating a Two-Dimensional Scatter Plot with a Regression
Line ..................................................................................6-19
Demonstration: Creating a Pie Chart.............................................................6-23
Exercises ..................................................................................................6-26

6.3 Solutions ........................................................................................................6-27


Solutions to Exercises.................................................................................6-27

Chapter 7 Performing Statistical Analyses with Tasks ..................................... 7-1

7.1 Analytical Tasks Overview................................................................................. 7-3


Demonstration: One-Way ANOVA ...............................................................7-10
Demonstration: Simple Linear Regression Analysis ........................................7-20

7.2 Performing Statistical Analyses in SAS University Edition ....................................7-27


Demonstration: Polynomial Regression Analysis ............................................7-29
Demonstration: Customizing a SAS Program .................................................7-42
Demonstration: Gamma Regression in the Generalized Linear Models Task .......7-47
vi For Your Information

Course Description
This introduction to SAS University Edition hands-on workshop shows how one can use the menu driven
tasks and SAS code in SAS University Edition 4.3 to perform common reporting and research tasks:
querying, reporting, and analyzing data. Several statistical procedures will be used to analyze data and
produce reports. SAS University Edition provides a SAS graphical point-and-click interface as well as
code that helps you exploit the power of SAS and publish dynamic results in a Microsoft Windows client
application. Demonstrations in the presentation will use research type data and tasks in illustrating the
functionality of SAS University Edition.

To learn more…

For information about other courses in the curriculum, contact the SAS
Education Division at 1-800-333-7660, or send e-mail to [email protected].
You can also find this information on the web at http://support.sas.com/training/
as well as in the Training Course Catalog.

For a list of other SAS books that relate to the topics covered in this
course notes, USA customers can contact the SAS Publishing Department
at 1-800-727-3228 or send e-mail to [email protected]. Customers outside
the USA, please contact your local SAS office.
Also, see the SAS Bookstore on the web at http://support.sas.com/publishing/
for a complete list of books and a convenient order form.
For Your Infor mation vii

Prerequisites
This course is designed for users who are familiar with Windows and other software, such as Microsoft
Office or spreadsheet programs. No programming experience or SAS knowledge is required. Having
completed an undergraduate course in statistics covering p-values, hypothesis testing, analysis of variance
and regression is helpful, but not required.
viii For Your Information
Chapter 1 Getting Started

1.1 Using/Installing SAS University Edition ................................................................... 1-3

1.2 Introducing SAS University Edition .......................................................................... 1-8


Demonstration: ACTEVA Data Analysis: Part 1.......................................................... 1-16
Demonstration: ACTEVA Data Analysis: Part 2.......................................................... 1-25

1.3 Course Scenarios .................................................................................................... 1-32


Demonstration: Navigating the SAS University Edition Workspace ............................... 1-35

Exercises............................................................................................................. 1-38

1.4 Solutions ................................................................................................................. 1-39


Solutions to Exercises ........................................................................................... 1-39
1-2 Chapter 1 Getting Started

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 Using/Installing SAS University Edition 1-3

1.1 Using/Installing SAS University Edition

Objectives
 Use/Install SAS University Edition.

3
3

Using SAS University Edition on a RACE image.


1. On the Desktop, click the SAS Studio icon.
2. The login name is SAS and the password is Orion123.
3. Expand Folder Shortcuts and D:.

4. Naviage to D:  Workshop  WINSAS  GASUE34. Expand GASUE34.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-4 Chapter 1 Getting Started

5. Right-click My SAS Files.

6. Click Create  Library.

7. Change the Name to MYSAS and make sure that the Re-create box is selected. Then click OK.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 Using/Installing SAS University Edition 1-5

8. Create the data library. Click Server Files and Folders. Navigate to
D:\Workshop\WINSAS\GASUE34\data.
9. Right-click and create a library. Select Re-create this library at start-up.

 The folder structure is as follows:


 Data folder contains the SAS, xls, xlsx, tab-delimited, JMP, Stata, and so on, files used in
this course.
 MY SAS Files are the files created in this course. (Some might already exist.)
 SAS CODE contains SAS programs needed for this class.

10. Go to Section 1.2.


Installing SAS University Edition on a Local Machine.
1. Go to http://www.sas.com/en_us/software/university-edition.html.
2. Click Get free software.
3. Under Direct Download, click Download Now.
4. Choose your operating system. We use Windows for this course.
5. Get the Quick Start Guide (PDF or Video).

 Install Oracle VirtualBox virtualization software on your machine. We have also installed it.

 Download the SAS University Edition vApp. We have already done this step on the race
image.

6. After you click the Download button, you are prompted to do the following (Skip this step if you are
using a RACE image.):
a. Create a SAS profile if you do not already have one. If you already have a SAS profile, sign in.

b. Accept the user licensing agreement.


c. Begin the download. If your browser asks whether you want to save or open the file, click Save to
save the file in your Downloads directory.
1) Add the SAS University Edition vApp to VirtualBox.

a) Launch VirtualBox and then select File  Import Appliance.


b) From the Downloads (or D:\Workshop\WINSAS\GASUE34 directory on the race image)
select the file for the SAS University Edition vApp (an OVA file), and then click Open.
c) Click Next and then click Import.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-6 Chapter 1 Getting Started

2) Create a folder for your data and results.


a) On your local computer (in a location that you will remember, or the D: drive for race
image), create a folder named SASUniversityEdition and a subfolder named myfolders.
You will save all of your SAS University Edition files to this location. Copy all of the
folders and files from D:\Workshop\WINSAS\GASUE34 into myfolders.

b) In VirtualBox, select the SAS University Edition vApp, and then select Machine 
Settings.

c) In the navigation pane of the Settings dialog box, select Shared Folders and then click
.

d) In the Add Share dialog box, select Other as the folder path.
e) In the Browse for Folder window, open the SASUniversityEdition folder and select the
myfolders subfolder. Click OK (or Choose, depending on your operating system).

f) In the Add Share dialog box, confirm that Read-only is not selected, and then select the
Auto-mount and Make Permanent (if available) options. Click OK.

3) Click OK to close the Settings dialog box.


4) Start the University Edition vApp.
In VirtualBox, select the SAS University Edition vApp, and then select Machine  Start. It
might take a few minutes for the virtual machine to start.

5) Open SAS University Edition.


a) In a web browser on your local computer, enter http://localhost:10080.
b) From the SAS University Edition: Information Center, click Start SAS Studio.
6) Create SAS libraries
a) Expand My Folders.
b) Right-click My SAS Files.

(1) Click Create  Library and name the library MYSAS. Select Re-create this
library at start-up and then click OK.

(2) Create the data library. Click Server Files and Folders. Navigate to
D:\Workshop\WINSAS\GASUE34\data.

(3) Right-click and create the library. Select Re-create this library at start-up.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 Using/Installing SAS University Editio n 1-7

 The folder structure is as follows:


 Data folder contains the SAS, xls, xlsx, tab-delimited, JMP, Stata, and so on, files used
in this course.
 MY SAS Files are the files created in this course. (Some might already exist.)
 SAS CODE contains SAS programs needed for this class.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-8 Chapter 1 Getting Started

1.2 Introducing SAS University Edition

Objectives
 Explain a typical workflow for data analysis.
 State the features of SAS University Edition.
 State the capabilities and major features of
SAS software.
 State the purpose of the different areas of the
SAS University Edition workspace.
 Name the steps in a typical SAS University Edition
session.

5
5

Achieving Success with Analytics


Optimization Modeling

Predictive Modeling
What is the best that
Descriptive could happen?
$ROI Modeling
Ad Hoc Reports What will happen?
and OLAP
Standard
Reports Why did it happen?
Raw Data
What happened?

Data Information Intelligence

Decision Support Decision Guidance

6
6

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-9

The Process of Data Analysis


 Accessing data sources
 Combining tables
 Transforming variables
 Exploring and describing data
 V isualizing patterns in the data
 Analyzing and modeling data

7
7 ...

The Process of Data Analysis


 Accessing data sources
 Combining tables
 Transforming variables
 Exploring and describing data
 V isualizing patterns in the data
 Analyzing and modeling data

8
8

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-10 Chapter 1 Getting Started

Introduction to SAS University Edition


SAS University Edition is a bundle of the four most
commonly used SAS products, ideally suited to
accomplish ACTEVA data analysis.
 Base SAS
 SAS/STAT
 SAS/IML
 SAS/ACCESS Interface to PC Files
For all desktop and many server installations,
SAS University Edition provides easy point-and-click
access to data processing, graphing, and analysis.

9
9

Introduction to SAS University Edition


SAS University Edition is driven by SAS programs.
 SAS programs consist of commands in the form of
DATA steps and PROC (or procedure) steps.
 DATA steps generate, transform, combine, and modify
data.
 PROC steps perform complex actions on data using a
few simple lines of code.
 SAS University Edition is a point-and-click interface to
SAS that writes programs automatically.

10
10

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-11

Introduction to SAS University Edition

Example of a Bar Chart

11
11

SAS University Edition software is an easy-to-use browser-based application that provides the following:
 an intuitive, visual interface
 access to the power of SAS
 transparent access to data
 ready-to-use tasks for analysis and reporting
 easy exporting of data and results to other applications
 scripting and automation
Users of all experience levels, from novice to expert, can use SAS University Edition to produce
meaningful results quickly.

 This course is written for SAS University Edition 3.5. Previous versions have similar
functionality, but the interface has significant differences. To determine which version of SAS
University Edition you are using, select Help  About SAS University Edition.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-12 Chapter 1 Getting Started

Introduction to SAS University Edition


SAS University Edition provides a point-and-click
interface for managing data and generating reports.

12
12

SAS University Edition provides a state-of-the-art browser-based interface with these features:
 drag-and-drop functionality
 dialog boxes
 wizards
 a color-coded syntax editor
 a full online Help facility, embedded context-sensitive Help, and a Getting Started tutorial

Introduction to SAS University Edition


Behind the scenes, SAS University Edition generates
SAS programs that are submitted to SAS, and results
are returned to SAS University Edition.

13
13

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-13

SAS University Edition Interface


SAS University Edition also includes a full programming
interface that can be used to write, edit, and submit
SAS programs.

14
14

Using SAS University Edition, you can access the power of SAS without learning the SAS programming
language.

If you are a SAS programmer, you can use the SAS University Edition Code Editor to create new code or
to modify existing SAS programs.

SAS University Edition Program Editor


SAS University Edition includes a programming editor
similar to the Enhanced Program Editor.
Additional functionality in the SAS University Edition
Program Editor includes the following:
 autocomplete
 dynamic syntax
tooltips
 formatting programs
to provide consistent
spacing

15

SAS University Edition provides transparent access to the following:


 local or remote SAS data
– all SAS data sets (SAS 6, SAS 8, SAS®9)

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-14 Chapter 1 Getting Started

 local and remote data other than SAS


– Microsoft Excel, csv, tab-delimited, SPSS, JMP, and Stata
– ODBC, OLE DB, Microsoft Exchange folders
– tables from databases such as Oracle and DB2 (using licensed SAS database engines)
You can modify the programming code to customize the results and to access SAS features that are not
available in the point-and-click interface.

SAS University Edition Interface

Tabs

16
16

Behind the Scenes


As you build tasks, SAS University Edition generates
SAS programs.

17
17

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-15

Overview Demonstration: Part 1

18
18

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-16 Chapter 1 Getting Started

ACTEVA Data Analysis: Part 1

This demonstration illustrates the use of SAS University Edition to analyze the profits for sporting goods
by state among U.S. customer orders from the southwest sales region.

 Many of the tasks shown in this demonstration are discussed later in the course. This
demonstration is intended as a broad overview of a typical data analysis case study.
1. Open SAS University Edition.

2. Expand the My Folders and Data folders.

3. Select USCustomers.xlsx from the list of files in the Data folder.


4. Drag the file to the Program area to start the Import Data task.
You should see this:

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-17

5. Click the Change button.

6. Highlight MYSAS and change the name from IMPORT to USCustomers. Click Save.
7. Click the Running Person button to run the code.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-18 Chapter 1 Getting Started

8. Click the Code/Results tab.

9. In the MYSAS Files folder, open USCUSTOMERS.sas7bdat by double-clicking. View the data and
then close it by clicking on the x in the tab..

10. Join the USCUSTOMERS data table to the orders.sas7bdat data table in the Data folder. Directly
below the S on the Server Files and Folders tab, there is a button. Click the down arrow to
the right of the button.

11. Click Query.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-19

12. Click the + button to add the two tables.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-20 Chapter 1 Getting Started

13. Click the + button again and select JOIN. Using the drop-down arrows, change the names of the left
and right tables as shown below and then click Save.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-21

14. Select the variables that you want to keep by clicking the COLUMNS tab. From the ORDERS table,
drag Profit from the pane on the left to the Column Name field. Select the Summary field for Profit
and change the value from NONE to SUM. From the USCUSTOMERS table, drag the variable
State to the Column Name field.

15. Click the PROPERTIES tab and make the changes:

16. Run this program by clicking the running person. Look in your MYSAS Files folder to see the new
file.
17. Create a pie chart of profits by State.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-22 Chapter 1 Getting Started

18. Select Tasks  Graph  Pie Chart.

19. Double-click Pie Chart. Use the Table button to navigate to the file. Use the + buttons to assign
variables.

20. Assign State as the category variable. Assign Profit as the response variable.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-23

21. Click the Running Person button.


22. Save as a PDF file.

The pie chart shows that the profits are highest in OK (Oklahoma) and lowest in AZ (Arizona).

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-24 Chapter 1 Getting Started

Overview Demonstration: Part 2

20
20

The next objective is to evaluate whether the profits vary significantly by order type (retail, phone, and
Internet sales).

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-25

ACTEVA Data Analysis: Part 2

Perform analysis of variance with pairwise comparisons to determine whether profits differ significantly
across order type.

1. Select Tasks  Statistics  One-Way ANOVA

2. Select the Orders data set and use the + button to add Profit as the dependent variable and
Order_Type as the categorical variable (factor).

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-26 Chapter 1 Getting Started

3. Click the OPTIONS tab and make the changes below:

4. Run the code.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-27

5. The tabular results from the analysis of variance appear first, followed by the diagnostic graphics.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-28 Chapter 1 Getting Started

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-29

There is evidence of poor model fit, suggested by the large number of outliers and the skewed distribution
of the residuals. At this point, we would usually stop and fix the problems. However, for illustration,
further analyses are shown.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-30 Chapter 1 Getting Started

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 Introducing SAS University Edition 1-31

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-32 Chapter 1 Getting Started

1.3 Course Scenarios

Objectives
 Define the business need for the scenario used in the
class demonstrations.
 Define the business need for the scenario used in the
exercises.

24
24

Three Levels of Exercises


Level 1 The exercise mimics an example presented
in the section.

Level 2 Less information and guidance are provided


in the exercise instructions, possibly including
exploring options not presented in the section.

Level 3 Assumes prior knowledge of the SAS


programming language or advanced statistical
concepts. Exercises illustrate the integration of
point-and-click functionality together with the
ability to write and edit code.
 You are not expected to complete all of the exercises
in the time allotted. Choose the exercise or exercises
that are at the level that you are most comfortable with.
25
25

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.3 Course Scenarios 1-33

Orion Star Sports & Outdoors

For most demonstrations and exercises, you use data


from Orion Star Sports & Outdoors. This is a fictitious
global sports and outdoors retailer with traditional stores,
an online store, and a large catalog business.
The corporate headquarters is located in the United
States with offices and stores in many countries
throughout the world.
Orion Star has approximately 1,000 employees and
90,000 customers, processes approximately 150,000
orders annually, and purchases products from 64
suppliers.
26
26

Orion Star Project


Throughout the course, use SAS University Edition to
access, combine, transform, explore, and analyze data.

27
27

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-34 Chapter 1 Getting Started

Navigating the SAS University Edition


Workspace
Examine Customer
Data
 Characterize the
data task.

Investigate the Output


 Examine a SAS
Open Data
program.
 customers
 Examine the log.
 Export data.

228
8

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.3 Course Scenarios 1-35

Navigating the SAS University Edition Workspace

Start SAS University Edition.


1. Open SAS University Edition.
2. Select Tasks  Data  Characterize Data. Double-click and add the Customers data set under
DATA. Select all variables except the ID variable.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-36 Chapter 1 Getting Started

3. Click the OPTIONS tab and make the changes below.

The output includes frequency counts for character columns, summary statistics for numeric columns,
and basic graphs to characterize each.

Partial Output

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.3 Course Scenarios 1-37

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-38 Chapter 1 Getting Started

Exercises

Level 1

1. Creating a Project and Running the Characterize Data Task


Use the Characterize Data task to investigate the distribution of each of the columns in the
employee_organization data set.
a. Select Tasks  Data  Characterize Data.
b. Limit the number of unique categorical values to be reported per variable to 15.
c. Run the task and examine the results.

Level 2

2. Running the List Table Attributes Task

 The List Table Attributes task enables you to create a report with information about a
particular data set. The report includes creation date, location, and number of observations, as
well as the variable names, labels, types, and formats. Select Tasks  Data  List Table
Attributes.

a. Use the List Table Attributes task on the employee_organization data set. Create the default
report only.

b. Run the task and examine the results.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.4 Solutions 1-39

1.4 Solutions
Solutions to Exercises
1. Creating a Project and Running the Characterize Data Task
Use the Characterize Data task to investigate the distribution of each of the columns in the
employee_organization data set.

a. Select Tasks  Data  Characterize Data.


From the data library, select the employee_organization data set.
b. Limit the number of unique categorical values to be reported per variable to 15.
c. Run the task and examine the results.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-40 Chapter 1 Getting Started

Partial Output

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.4 Solutions 1-41

2. Running the List Table Attributes Task

 The List Table Attributes task enables you to create a report with information about a
particular data set. The report includes creation date, location, and number of observations, as
well as the variable names, labels, types, and formats. Select Tasks  Data  List Table
Attributes.

a. Use the List Table Attributes task on the employee_organization data set. Create the default
report only.

b. Run the task and examine the results.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-42 Chapter 1 Getting Started

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 2 Working with Data in a
Project

2.1 Introduction to Tabular Data ..................................................................................... 2-3

2.2 Accessing Microsoft Excel Data ............................................................................. 2-10


Demonstration: Reading in an Excel File.................................................................. 2-11

Exercises............................................................................................................. 2-18

2.3 Importing JMP, Tab-Delimited, STATA, SPSS Files................................................. 2-20


Demonstration: Read a JMP File into University Edition.............................................. 2-21

Demonstration: Read a Tab-Delimited File into University Edition................................. 2-23


Exercises............................................................................................................. 2-27

2.4 Solutions ................................................................................................................. 2-29


Solutions to Exercises ........................................................................................... 2-29
2-2 Chapter 2 Working w ith Data in a Project

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Introduction to Tabular Data 2-3

2.1 Introduction to Tabular Data

Objectives
 State the definition of a SAS data set.
 State how data is stored in a SAS data set.

3
3

The Process of Data Analysis


 Accessing data sources
 Combining tables
 Transforming variables
 Exploring and describing data
 V isualizing patterns in the data
 Analyzing and modeling data

4
4

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-4 Chapter 2 Working w ith Data in a Project

Common Data Formats


SAS Enterprise Guide can read and use data from a
variety of different formats.
Fixed-width
JMP, SPSS,
and delimited
and Stata files
text files
Microsoft Excel
spreadsheets
ODBC- Microsoft
compliant data Access tables
OLE DB
provider’s files
SAS data sets SAS/ACCESS

5
5

SAS/ACCESS Software
SAS/ACCESS enables you to read data from software
vendors’ files and applications and use those files as
input data for SAS programs and SAS tasks.
If you have appropriate authority, SAS/ACCESS also
enables you to write to other software vendors’ files.
Your system administrator will likely enable access to the
systems that you need.

6
6

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Introduction to Tabular Data 2-5

Interface Engines
Teradata Teradata
Physical Storage Engine

DB2 DB2 SAS


Physical Storage Engine Software

Oracle Oracle
Physical Storage Engine

7
7

Other engines include relational database systems such as SQL Server, Sybase, Informix, and OLE DB.
SAS/ACCESS software for non-relational databases includes ADABAS, IMS, and CA-IDMS.
SAS/ACCESS software to Data Warehouse Appliances includes Netezza and HP Neoview.

SAS Data Set


A SAS data set (or table) is a rectangular table of rows
and columns.

Rows

Columns

8
8

SAS University Edition provides data sources to tasks and queries in table format (a set of rows and
columns). A row is one occurrence of an entity. An entity can be a product, a customer, a sale, or some
other thing. Each column describes a characteristic of the entity, such as the product's ID, the customer's
name, or the quantity sold.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-6 Chapter 2 Working w ith Data in a Project

SAS Data Set


All columns must have a name, type, and length.
Names can
be 1 to 32
characters
long.

9
9

SAS University Edition allows spaces and special symbols in variable names because it invokes the
VALIDVARNAME=ANY option when SAS is initialized. Because not all SAS environments invoke this
option, it is recommended that you follow SAS standard naming conventions for column names if you
anticipate running code generated by SAS University Edition in other SAS applications. Standard naming
rules are that columns must start with a letter or underscore, and that the remaining characters are letters,
underscores, or numbers.

 Labels for the columns can be up to 256 characters in length.

SAS Data Set


A column’s type is either character (string) or numeric.
The type plays a role in determining the length.

Numeric values are 8 bytes


of floating point storage:
Character
values are Numeric
1 to 32,767 Date (days from 01JAN1960)
characters
Time (seconds from midnight)
(bytes) long.
10

10

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Introduction to Tabular Data 2-7

Currency, date, and time data types are stored as numeric data. Declaring a column as a currency, date, or
time data type enables SAS University Edition to validate data values when you edit your table and to
display values in a default format.

SAS Data Set


A format is used to control how values are displayed.
Formats do not affect how values are stored.

Format: DDMMYY Format: DOLLAR


Width: 10 Width: 8
Stored value: 15766 Decimal Places: 2
Stored value: 234.60

11
11

A format (display format) is an instruction that you apply to a column. The format directs SAS University
Edition as to how to display data values. Use formats to control the appearance of data values or to group
data values together for analysis.

An informat (read-in format) is an instruction that SAS University Edition uses to read data values into a
variable. For example, the following value contains a dollar sign and commas:
$1,000,000

To remove the dollar sign ($) and commas (,) before storing the numeric value 1000000 in a variable, read
this value with the DOLLAR10. informat.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-8 Chapter 2 Working w ith Data in a Project

SAS Data Set Properties


Viewing a data set’s properties enables you to examine the
data set and column attributes. Double-click the filename.

12
12

Attributes of a SAS data set are stored in the properties and include the following:
 the data set name
 the storage location
 the date last modified
 all column attributes (such as name, type, and length)
 the number of rows and columns

Missing Values
If a data value is not present for a column in a particular
row, it is considered missing.
 A missing character value is displayed as a blank.
 A missing numeric value is displayed as a period or dot.

13
13

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Introduction to Tabular Data 2-9

SAS Date Values


SAS stores date values as numeric values.

01JAN1959 01JAN1960 01JAN1961

store
-365 0 366

display

01/01/1959 01/01/1960 01/01/1961

A SAS date value is stored as the number of days


between January 1, 1960, and a specific date.

16
16

SAS can perform calculations on dates starting from 1582 A.D.


SAS can read either two- or four-digit year values. If SAS encounters a two-digit year, the
YEARCUTOFF= system option is used to specify to which 100-year span the two-digit year
should be attributed. For example, by setting the option YEARCUTOFF= option to 1950, the
100-year span from 1950 to 2049 is used for two-digit year values.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-10 Chapter 2 Working w ith Data in a Project

2.2 Accessing Microsoft Excel Data

Objectives
 Add a local Excel spreadsheet to the SAS files.
 View the properties of the data.

18
18

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Accessing Microsoft Excel Data 2-11

Reading in an Excel File

In this demonstration, we are not going to repeat all of the steps in Chapter 1 starting on page 1-16. We
will use selected steps from 1 to 8.

1. Orion maintains a list of products in a Microsoft Excel file named products.xlsx. To use this data in
SAS University Edition, it must be imported into a SAS data set. Highlight the filename and drag it
over.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-12 Chapter 2 Working w ith Data in a Project

2. The next step is to confirm the source data and designate the output SAS data set name and storage
location.

Verify the file


to be imported.

Change the name and storage


location for the output SAS
data set.

The Import Data Wizard can read the following:


 dBASE III, dBASE IV, and dBASE 5.0
 Microsoft Excel 4-7, Excel 97, Excel 2000, Excel 2002 (XP), Excel 2003, and Excel 2007
 Microsoft Exchange Server 2000, 2003, and 2007
 HTML
 IBM Lotus 1-2-3 (*WKS, *.WK1, *.WK3, *.WK4)
 Paradox 3 - 12
 text (delimited or fixed column)
 all versions of Stata (*.DTA) under Microsoft Windows
 all versions of SPSS (*.SAV) under Microsoft Windows
 JMP

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Accessing Microsoft Excel Data 2-13

 To import a Microsoft Excel 2007 file, you must have any of the follow ing installed on your
computer:
 Microsoft Excel 2007.
 the 2007 Office System Driver.
 the Microsoft Office 2007 Compatibility Pack for Word, Excel, and PowerPoint 2007 file
formats if you are using Microsoft Office 2000, Office 2002, or Office 2003. You can
download the compatibility pack or the 2007 Office system driver from the Microsoft website.
3. Change the library name and filename.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-14 Chapter 2 Working w ith Data in a Project

4. Run the code. The new SAS data set is created

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Accessing Microsoft Excel Data 2-15

5. Save the code

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-16 Chapter 2 Working w ith Data in a Project

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Accessing Microsoft Excel Data 2-17

Updating Results
You can use the products.ctl file to rerun the task.

20
20

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-18 Chapter 2 Working w ith Data in a Project

Exercises

Level 1

1. Creating a New SAS Data Set from an Excel Spreadsheet


a. Add the Phones spreadsheet from the employee_info.xlsx workbook.
b. Open the employee_info.xlsx workbook using the drag-and-drop technique and create a new SAS
data set named employee_info_phones. Save it in the MYSAS library.

 Read the data from the Phones worksheet and indicate that the first row contains field
names.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Accessing Microsoft Excel Data 2-19

Level 2

2. Creating a New SAS Data Set from a Worksheet


a. Add the address table from employee_info.xlsx.
b. Create a SAS data set named employee_info_phone from the Address worksheet contained in
the employee_info workbook.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-20 Chapter 2 Working w ith Data in a Project

2.3 Importing JMP, Tab-Delimited, STATA,


SPSS Files

Objectives
 Use drag and drop to import JMP files.
 Use the Import Data task to import tab-delimited,
SPSS, and STATA files into SAS University Edition.

25
25

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.3 Importing JMP, Tab- Delimited, STATA, SPSS Files 2-21

Read a JMP File into University Edition

In the data folder, there is a JMP file named PROFIT_SUMMARY. Drag it to the workspace. Change
the data set name to Profit_Summary. Run it.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-22 Chapter 2 Working w ith Data in a Project

Business Scenario
Orion Star has a tab-delimited text file named orders_tab.
To use the data in the text file, it must be imported into
a SAS data set.

29
29

Business Scenario
The SAS data set must have the following attributes:
 Name the data set orders_tab.
 Exclude the Employee_ID column.
 Assign column names based on the first row of the
text file.
 Format Profit as a currency value.
 Format Discount with a percent sign.

30
30

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.3 Importing JMP, Tab- Delimited, STATA, SPSS Files 2-23

Read a Tab-Delimited File into University Edition

To your computer, a text file is only strings of characters. SAS University Edition requires that the data be
structured with rows and columns (observations and variables).

We cannot use the drag-and-drop functionality directly. We will use the Import Data function.
1. The Import Data Function can be used to convert a text file into a customized SAS data set.

2. Select Import Data.

3. Drag over the Orders_tab file to the workspace.


4. Highlight MYSAS folder

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-24 Chapter 2 Working w ith Data in a Project

5. Change the filename and click Save.

6. Make the changes below and then run.

 ‘09’x is the hexadecimal representation for tab.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.3 Importing JMP, Tab- Delimited, STATA, SPSS Files 2-25

Next Steps
 Exclude the Employee_ID column.
 Format Profit as a currency value.
 Format Discount with a percent sign.
 Click the Code button.

31

7. Use the Orders_tab.sas code. Press the F4 key and drag the file to the workspace. Run it.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-26 Chapter 2 Working w ith Data in a Project

Results

8. Click Discount so that its values are sorted highest to lowest.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.3 Importing JMP, Tab- Delimited, STATA, SPSS Files 2-27

Exercises

Level 1

3. Importing a Comma-Delimited Text File into a New SAS Data Set


The employee_payroll.csv file is a comma-delimited text file. The first row in the text file contains
appropriate names for the columns in the new SAS data set. The first eight rows of the text file are
shown below.

a. Create MYSAS.employee_payroll from the data in the employee_payroll.csv file.


b. Do not include the columns for Marital_Status or Dependents.
c. Change the type for Employee_ID to Number.
d. Change the format for Salary to Currency.

Level 2

4. Creating a Data Set with Only the Active Employees by Modifying the Generated SAS Program
Subset the employee_payroll_b data set to include only current employees who do not have a
termination date.

a. Select Tasks  Data  Filter Data.


b. Select employee_payroll_b.
c. Modify the SAS program so that the name of the data set is Employee_Payroll_Active.
d. Include only those employees with a missing value for Employee_Term_Date.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-28 Chapter 2 Working w ith Data in a Project

Partial Results

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.4 Solutions 2-29

2.4 Solutions
Solutions to Exercises
1. Creating a New SAS Data Set from an Excel Spreadsheet
a. Add the Phones spreadsheet from the employee_info.xlsx workbook.
b. Open the employee_info.xlsx workbook using the drag-and-drop technique and create a new SAS
data set named employee_info_phones. Save it in the MYSAS library.

 Read the data from the Phones worksheet and indicate that the first row contains field
names.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-30 Chapter 2 Working w ith Data in a Project

2. Creating a New SAS Data Set from a Worksheet


a. Add the address table from employee_info.xlsx.
b. Create a SAS data set named employee_info_address from the address worksheet contained in
the employee_info workbook.
3. Importing a Comma-Delimited Text File into a New SAS Data Set
The employee_payroll.csv file is a comma-delimited text file. The first row in the text file contains
appropriate names for the columns in the new SAS data set. The first eight rows of the text file are
shown below.

a. Create MYSAS.employee_payroll from the data in the employee_payroll.csv file.


1) Drag and drop employee_payroll.csv into the new workspace. Change the library name and
the filename. Select None or Default for the end of line delimiter.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.4 Solutions 2-31

2) Run it.
b. Do not include the columns for Marital_Status or Dependents.
c. Change the type for Employee_ID to Number.
d. Change the format for Salary to Currency (Dollar14.2).
Use the employee_payroll.sas program to make the changes.
data mysas.employee_payroll_b;
set mysas.employee_payroll_a;
employee_id1=input(employee_id,7.);
salary1=input(salary,14.0);
format salary1 dollar14.0;
drop employee_id salary marital_status dependents;
run;
4. Creating a Data Set with Only the Active Employees by Modifying the Generated SAS Program
Subset the employee_payroll_b data set to include only current employees who do not have a
termination date.
a. Select Tasks  Data  Filter Data.
b. Select employee_payroll_b
c. Modify the SAS program so that the name of the data set is Employee_Payroll_Active.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-32 Chapter 2 Working w ith Data in a Project

d. Include only those employees with a missing value for Employee_Term_Date. Run the code.

Partial Results

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 3 Combining Data with
Queries

3.1 Introduction to Filtering, Sorting, and Querying Data .............................................. 3-3

3.2 Joining Tables ........................................................................................................... 3-5


Demonstration: Joining Tables .................................................................................. 3-9
3-2 Chapter 3 Combining Data w ith Queries

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.1 Introduction to Filtering, Sorting, and Querying Data 3-3

3.1 Introduction to Filtering, Sorting, and


Querying Data

Objectives
 State the function of the Filter and Sort tasks.
 State the function of the New Query task.
 Compare the functionality available in each task.

3
3

The Process of Data Analysis


 Accessing data sources
 Combining tables
 Transforming variables
 Exploring and describing data
 V isualizing patterns in the data
 Analyzing and modeling data

4
4

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-4 Chapter 3 Combining Data w ith Queries

Filter, Sort, and New Query Tasks


The Filter task, Sort task, and New Query task can be
used to create a new data source from one or more tables
according to the criteria specified by the user.

5
5

Filter Task
The Filter enables you to create a new SAS table by
selecting rows and columns.

6
6

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Joining Tables 3-5

Sort Task
The Sort Task enables you to create a new SAS table by
creating a sort sequence.

7
7

Sort Task

8
8

A query is a collection of specifications that enables you to focus on a particular set of data. The Filter
task, the Sort task, and the Query Builder can be used to build these query specifications.
Behind the scenes, the Filter task, the Sort task, and the Query Builder generate Structured Query
Language (SQL) code.

3.2 Joining Tables

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-6 Chapter 3 Combining Data w ith Queries

Objectives
 Join multiple tables by common columns.
 Include only matching rows.

10
10

Business Scenario
In a query of the TOPPRODUCTS table, you select only
products where total profits exceed $500. After producing
this list, you want to also see the product category as well
as the product, supplier, and country names. The columns
to include come from three different tables.

TOPPRODUCTS

PRODUCTS

COUNTRY_LOOKUP

11
11

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Joining Tables 3-7

Business Scenario
To include the necessary columns, the TOPPRODUCTS
SAS table must be joined with the PRODUCTS SAS table
and the COUNTRY_LOOKUP Excel spreadsheet (or SAS
table).

12
12

Joining Tables
Joining tables enables you to extract and simultaneously
process data from more than one table.

13
13

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-8 Chapter 3 Combining Data w ith Queries

You might need to join tables to accomplish the following:


 create a report with columns from more than one table
 compute a new column when the input columns are located in separate tables
 add information from a lookup table into a base table
 identify values of a column that do or do not occur in other tables

Tables that participate in the join should have a common column. This column should contain values that
can be matched exactly or can be easily compared between at least two of the tables. A join created on
columns whose values match exactly is called an inner join or equijoin and is the default join type in
SAS University Edition. In addition, greater-than joins, less-than joins, non-equijoins (theta joins), and
outer joins are also supported.

The Query Builder attempts to join tables by columns that have the same name and type. If no column
name and type matches are found, then a warning message tells you to join the columns manually.

Join Properties
The Join Properties window enables you to modify the
join type or condition. Selecting a different join type can
be used to identify or eliminate nonmatching rows.

14
14

To modify the join type or apply a condition, click the down arrow to get a list of the other joins.
The following join types are available:

SAS University Edition Join Option Join Type

Matching rows only given a condition (default) Inner join

All rows from the left table given a condition Left join

All rows from the right table given a condition Right join

All rows from both tables given a condition Full outer join

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Joining Tables 3-9

Joining Tables

Someone in the company built a query where products with total profits exceeding $500 were identified.
Analysts asked for more details about these top products, including the product category and the product,
supplier, and country names. To include the necessary columns, the TOPPRODUCTS SAS table must be
joined with the PRODUCTS SAS table and the COUNTRY_LOOKUP Excel spreadsheet or SAS table
using the Query Builder.

1. Click the New Options tab.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-10 Chapter 3 Combining Data w ith Queries

2. Select New Query.

3. Click the plus sign and then click the table.

4. Select the data library and then select the TOPPRODUCTS table. Click OK.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Joining Tables 3-11

5. Repeat for the PRODUCTS and COUNTRY_LOOKUP tables.

6. Click the plus sign and then click Join. Put TOPPRODUCTS in the left table and PRODUCTS in
the right table.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-12 Chapter 3 Combining Data w ith Queries

7. Click Save. University Edition automatically selects the join columns if there are appropriate columns
to join. You can override this.

8. Click the plus sign and click Join. Put PRODUCTS in the left table and COUNTRY_LOOKUP in
the right table.
9. Click Save.

10. Now add Supplier_Country and country_key.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Joining Tables 3-13

11. Click the Columns tab and expand the three tables. Drag and drop the following columns to add each
to the Select tab: Product_ID, SUM_of_Profit, Product_Category, Product_Name,
Supplier_Name, and Country_Name.

12. Run the query. Verify the results.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-14 Chapter 3 Combining Data w ith Queries

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 4 Transforming Data

4.1 Creating New Columns with an Expression ............................................................. 4-3


Demonstration: Creating a Column with an Expression ................................................ 4-8
Exercises............................................................................................................... 4-9

4.2 Creating New Columns by Recoding Values .......................................................... 4-12


Demonstration: Creating a New Column by Recoding Values ..................................... 4-14

Exercises............................................................................................................. 4-15

4.3 Solutions ................................................................................................................. 4-17


Solutions to Exercises ........................................................................................... 4-17
4-2 Chapter 4 Transforming Data

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Creating New Columns w ith an Expression 4-3

4.1 Creating New Columns with an


Expression

Objectives
 Explain the use of the DATA, SET, and WHERE
statements.
 Define a new column of data by building an
expression.

3
3

The Process of Data Analysis


 Accessing data sources
 Combining tables
 Transforming variables
 Exploring and describing data
 V isualizing patterns in the data
 Analyzing and modeling data

4
4

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-4 Chapter 4 Transforming Data

Business Scenario
Orion Star wants to analyze shipment methods by
determining how many days elapse between each order
date and delivery date. The company also wants to
calculate the total amount invoiced to the customer, which
is the sum of the total retail price and the shipping charges.

Delivery_Date – Order_Date

5
5

DATA Statement – subset.sas


The DATA statement begins a DATA step and provides
the name of the SAS data set to create.

data work.subset1; DATA output-SAS-data-set;


set data.orders;
where Profit < 20 and
Order_Type = 2;
run;

A DATA step can create temporary or permanent data


sets.

 The rules for SAS variable names also apply to


data set names.
6
6

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Creating New Columns w ith an Expression 4-5

SET Statement
The SET statement reads observations from an existing
SAS data set for further processing in the DATA step.

data work.subset1; SET input-SAS-data-set;


set data.orders;
where Profit < 20 and
Order_Type = 2;
run;

 The SET statement reads all observations and all


variables from the input data set.
 Observations are read sequentially, one at a time.
 The SET statement can read temporary or permanent
data sets.

7
7

WHERE Statement
The WHERE statement selects observations from
a SAS data set that meet a particular condition.

data work.subset1; WHERE WHERE-expression;


set data.orders;
where Profit < 20 and
Order_Type = 2
run;

The variables named in the WHERE expression must


exist in the input SAS data set.

8
8

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-6 Chapter 4 Transforming Data

Viewing the Log


Partial SAS Log

SAS read 33 of the observations.

9
9

Computed Columns – transform.sas


data work.subset1;
set data.orders;
Time_to_Delivery=Delivery_Date
- Order_Date;
keep Time_to_Delivery Delivery_Date
Order_Date;
run;

10
10

Columns can be computed as summarized data (such as the sum of a variable), a recoded column (such as
grouping values from 0 to 100 as “low”), or an advanced expression (calculated with a mathematical
formula, a function, or some combination of both).

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Creating New Columns w ith an Expression 4-7

SAS Functions
A SAS function is a routine that returns a value
that is determined from specified arguments.

General form of a SAS function:

function-name(argument1,argument2, ...)

Example:

sum(Salary,Bonus)

11
11

Using SAS Functions


SAS functions can do the following:
 perform arithmetic operations
 compute sample statistics (for example, sum, mean,
and standard deviation)
 manipulate SAS dates
 process character values
 perform many other tasks

 Sample statistics functions ignore missing values.

12
12

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-8 Chapter 4 Transforming Data

Creating a Column with an Expression

Orion Star would like to calculate the total amount invoiced to the customer, which is the sum of the total
retail price and the shipping charges. Use the transform.sas program to create a table named shipping
with the new computed column.

1. Open the transform.sas program by dragging it over to the workspace.


2. In the DATA statement, change work.transform to work.shipping.
3. Add Invoice_Amount=sum(Total_Retail_price, Shipping); after the Time_to_Delivery assignment
statement.

4. Add format Invoice_Amount Dollar10.2; after the Invoice_Amount assignment statement.


5. Replace the KEEP statement with the following:
keep Order_ID Order_Date Delivery_Date Product_ID Total_Retail_Price Shipping
Profit Time_to_Delivery Invoice_Amount;
data work.shipping;
set Data.Orders;
Time_to_Delivery=Delivery_Date - Order_Date;
Invoice_Amount=Sum(Total_Retail_Price, Shipping);
format Invoice_Amount Dollar10.2;
keep Order_ID Order_Date Delivery_Date Product_ID
Total_Retail_Price Shipping Profit Time_to_Delivery
Invoice_Amount;
run;
6. Run the code.

7. To emphasize the orders with the longest delivery time, sort the table in descending sequence by
Time_to_Delivery. Click Time_to_Delivery.
Partial Results (8 of 617 rows)

8. Save the program to the SAS Code folder with the name shipping.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Creating New Columns w ith an Expression 4-9

Exercises

Level 1

1. Joining Data from Two Tables


Use the New Query Builder to create a new table whose data results from a join of two tables. Create
a new column.
a. Use the New Query Builder to join employee_payroll and employee_addresses to create a table
named payroll_location.
b. Make sure that the join between the two tables is on the Employee_ID column.
c. Include the following columns on the Select Data tab: Employee_ID, Employee_Name,
Employee_Gender, Birth_Date, Salary, Street_Number, Street_Name, City, State,
Employee_Term_Date, and Country.

d. Run the new query.


e. Write a program for parts f through j below. Name the output file payroll_location_1.
f. Format the Salary column with a dollar sign, comma, and two decimal places.
g. Create a new column named Bonus that represents 1.5% of the Salary column. Format the values
with a dollar sign, comma, and two decimal places.

h. Create New_Salary, which is the current salary plus an additional 2% raise. Format the values
with a dollar sign, comma, and two decimal places.

i. Include only active employees in the output table. In other words, include only employees who
have a missing value for Employee_Term_Date.

j. Insert drop employee_term_date; directly before the RUN statement.


k. Run the code and view the results. Save the program in the SAS Code folder as Location.sas.
Partial Results (8 of 308 rows)

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-10 Chapter 4 Transforming Data

Level 2

2. Creating a Calculated Column Using a Function


Create a table from the employee_donations table with a new column that indicates the total
contributions from each employee. Open the employee_donations table.
a. Write a program to create a table named mysas.donations.
b. Include the Employee_ID, Recipients, and Paid_By columns in the new table.
c. Create a new column named Total_Donations, which totals the values of the four quarterly
donations. Format the new column to display dollar signs, commas, and two decimal places.

 Use a function to calculate the total, thereby ignore any missing values that might be in
the input columns.

d. Run the program.


e. Save the program in the SAS Code folder as Donations.sas.
Partial Results (8 of 124 rows)

Level 3

3. Creating a Program to Calculate the Number of Years Employed


Create a program to create a table named years_at_work in the MYSAS folder that includes active
employees (those with no termination date) and the number of years that each has been employed.
a. Use the Employee_Payroll table as the input table. Include Employee_ID and
Employee_Hire_Date. Include only employees without a termination date.
b. Create a new column named Years Employed that calculates the number of years that each
employee worked at Orion Star, based on the hire date and today's date. Apply a format that
displays the values rounded to one decimal place.

Hint: The TODAY function can be used to provide the current date. The function
YRDIF(start-date,end-date,actual) computes the years. The format is 4.1.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Creating New Columns w ith an Expression 4-11

c. Run the program and verify the results. Save the program to the SAS Code folder as
Howlong.sas.
Partial Results (8 of 308 rows).

 Your results will differ from those shown above.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-12 Chapter 4 Transforming Data

4.2 Creating New Columns by Recoding


Values

Objectives
 Recode individual values or a range of values
in a column.

18
18

Business Scenario
To further analyze profit per order, management would
like to categorize each order in the following ranges:
 $0 to $100
 $100 to $500
 $500 and Above

19
19

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.2 Creating New Columns by Recoding Values 4-13

Recoded Values: The Logic


Recoding a column enables you to assign a value to a
new column based on the value of an existing column.
TRUE Then
When Order_Type=1 Order_Type_Detail
= 'Retail Sale'
FALSE

TRUE Then
When Order_Type=2 Order_Type_Detail
= 'Catalog Sale'
FALSE

TRUE Then
When Order_Type=3 Order_Type_Detail
= 'Internet Sale'
20
20

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-14 Chapter 4 Transforming Data

Creating a New Column by Recoding Values

To further analyze profit per order, Management wants to categorize the profit value for each order
as follows:
 $0 to $100
 $100 to $500
 $500 and above

SAS Recode.sas.
data work.recode;
set data.orders;
length Profit_Category $ 20;
if Profit <=100 then Profit_Category = "Less or Equal $100";
else if 100.001 <Profit and Profit <= 499.999 then
Profit_Category ="$100 to $500";
else Profit_Category = "Greater Than $500";
keep Customer_ID Order_ID profit Profit_Category;
run;
Use Recode.sas to create a new column named Profit_Category. Run the program
Partial Results

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.2 Creating New Columns by Recoding Values 4-15

Exercises

Level 1

4. Creating a New Column by Recoding Distinct Values


The Country column in the employee_addresses table includes country codes, such as US
and AU. Create a query that includes a new column, also named Country_Name, that recodes the
values US and AU to the full country name.
Write a program to create a table named mysas.country_name. Name the program
country_name.sas and store in the SAS Code folder.

Partial Results (8 of 424 rows)

Level 2

5. Creating a New Column by Recoding Ranges of Values


Write a program to create a new column that groups salary values into four categories. Name the new
column Salary_Range. Recall that Salary is in mysas.employees_payroll.
a. Create a table named mysas. Salary_Range.
b. Include only the Employee_ID, Salary, and Employee_Hire_Date columns. Format
Employee_Hire_Date with the DDMMYYDw. format so that dates appear as 20/10/2008.
c. Create a new column named Salary_Range based on the values in the Salary column.

0 – 24999.99 Below $25K

25000 – 49999.99 $25K to $50K

50000 – 99999.99 $50K to $100K

100000 and above Over $100K

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-16 Chapter 4 Transforming Data

d. Run the program. Save the program in the SAS Code folder as SalaryRange.sas.
Partial Results (8 of 424 rows)

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.3 Solutions 4-17

4.3 Solutions
Solutions to Exercises
1. Joining Data from Two Tables
Use the New Query Builder to create a new table whose data results from a join of two tables. Create
a new column.

a. Use the New Query Builder to join employee_payroll and employee_addresses to create a table
named payroll_location.

b. Make sure that the join between the two tables is on the Employee_ID column.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-18 Chapter 4 Transforming Data

c. Include the following columns on the Select Data tab: Employee_ID, Employee_Name,
Employee_Gender, Birth_Date, Salary, Street_Number, Street_Name, City, State,
Employee_Term_Date, and Country.

d. Run the new query.

e. Write a program for parts f through j below. Name the output file payroll_location_1.
f. Format the Salary column with a dollar sign, comma, and two decimal places.
g. Create a new column named Bonus that represents 1.5% of the Salary column. Format the values
with a dollar sign, comma, and two decimal places.

h. Create New_Salary, which is the current salary plus an additional 2% raise. Format the values
with a dollar sign, comma, and two decimal places.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.3 Solutions 4-19

i. Include only active employees in the output table. In other words, include only employees who
have a missing value for Employee_Term_Date.
j. Insert drop employee_term_date; directly before the RUN statement.
k. Run the code and view the results. Save the program in the SAS Code folder as Location.sas.
data Mysas.Payroll_Location1;
set Mysas.Payroll_Location;
format Salary Dollar12.2;
Bonus = Salary*.015;
format Bonus Dollar10.2;
New_Salary = Salary+.02*Salary;
format New_Salary Dollar12.2;
if Employee_Term_Date= .;
drop Employee_Term_Date;
run;
Partial Results (8 of 308 rows)

2. Creating a Calculated Column Using a Function

Create a table from the employee_donations table with a new column that indicates the total
contributions from each employee. Open the employee_donations table.
a. Write a program to create a table named mysas.donations.
b. Include the Employee_ID, Recipients, and Paid_By columns in the new table.
c. Create a new column named Total_Donations, which totals the values of the four quarterly
donations. Format the new column to display dollar signs, commas, and two decimal places.

 Use a function to calculate the total, and thereby ignore any missing values that might be
in the input columns.

d. Run the program.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-20 Chapter 4 Transforming Data

e. Save the program in the SAS Code folder as Donations.sas.


data mysas.donations;
set data.employee_donations;
Total_Donations=sum(Qtr1, Qtr2, Qtr3, Qtr4);
format Total_Donations Dollar10.2;
run;
Partial Results (8 of 124 rows)

3. Creating a Program to Calculate the Number of Years Employed


Create a program to create a table named years_at_work in the MYSAS folder that includes active
employees (those with no termination date) and the number of years that each has been employed.

a. Use the employee_payroll table as the input table. Include Employee_ID and
Employee_Hire_Date. Include only employees without a termination date.
b. Create a new column named Years Employed that calculates the number of years that each
employee worked at Orion Star, based on the hire date and today's date. Apply a format that
displays the values rounded to one decimal place.
Hint: The TODAY function can be used to provide the current date. The function
YRDIF(start-date,end-date,actual) computes the years. The format is 4.1

c. Run the program and verify the results. Save the program to the SAS Code folder as
Howlong.sas.

Partial Results (8 of 308 rows).

 Your results will differ from those shown above.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.3 Solutions 4-21

data Mysas.Years_At_Work;
set Mysas.Employees_Payroll;
Years_Employed = Yrdif(Employee_Hire_Date,Today(),Actual);
format Years_Employed 4.1;
keep Employee_ID Employee_Hire_Date Years_Employed;
if Employee_Term_Date = .;
run;
4. Creating a New Column by Recoding Distinct Values
The Country column in the employee_addresses table includes country codes, such as US
or AU. Create a query that includes a new column also named Country_Name that recodes the
values of US and AU to the full country name.
Write a program to create a table named mysas.country_name. Name the program
country_name.sas and store it in the SAS Code folder.

Partial Results (8 of 424 rows)

data Mysas.Country_Name;
set Data.Employee_Addresses;
length Country_Name $15;
if Upcase(Country )= "Us" Then Country_Name = "United States";
else If Upcase(Country )= "Au" Then Country_Name = "Australia";
else Country_Name = "Missing";
run;
5. Creating a New Column by Recoding Ranges of Values
Write a program to create a new column that groups salary values into four categories. Name the new
column Salary_Range. Recall that Salary is in mysas.employees_payroll.
a. Create a table named mysas. Salary_Range.
b. Include only the Employee_ID, Salary, and Employee_Hire_Date columns. Format
Employee_Hire_Date with the DDMMYYDw. format so that dates appear as 20/10/2008.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-22 Chapter 4 Transforming Data

c. Create a new column named Salary_Range based on the values in the Salary column.

0 – 24999.99 Below $25K

25000 – 49999.99 $25K to $50K

50000 – 99999.99 $50K to $100K

100000 and above Over $100K

d. Run the program. Save the program in the SAS Code folder as SalaryRange.sas.
Partial Results (8 of 424 rows)

data Mysas.Salary_Range;
set Mysas.Employees_Payroll;
if Salary > 0 and Salary <24999.999 then
Salary_Range = "Below $25K";
else if Salary >=25000 And Salary < 49999.999 then
Salary_Range = "$25K To $50K";
else if Salary >50000 And Salary <99999.999 then
Salary_Range = "$50 To $100K";
else if Salary > 100000 Then Salary_Range = "Over $100K";
else Salary_Range = "Missing";
format Employee_Hire_Date DDMMYY10.;
keep Employee_ID Employee_Hire_Date Salary_Range Salary;
run;

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 5 Exploring Data with
Tasks

5.1 Creating a Frequency Report .................................................................................... 5-3


Demonstration: Generating a One-Way Frequency Report............................................ 5-9
Exercises............................................................................................................. 5-11

5.2 Generating Summary Statistics .............................................................................. 5-12


Demonstration: Summarizing Data.......................................................................... 5-17

Exercises............................................................................................................. 5-25

5.3 Filtering Data in a Task ............................................................................................ 5-28


Demonstration: Filtering Data in a Task .................................................................... 5-31

Exercises............................................................................................................. 5-36

5.4 Comparing Means ................................................................................................... 5-38


Demonstration: Comparing Means .......................................................................... 5-41
Demonstration: Distribution Analysis........................................................................ 5-45

Demonstration: Nonparametric One-Way ANOVA ...................................................... 5-49

5.5 Solutions ................................................................................................................. 5-53


Solutions to Exercises ........................................................................................... 5-53
5-2 Chapter 5 Exploring Data w ith Tasks

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating a Frequency Report 5-3

5.1 Creating a Frequency Report

Objectives
 Access the One-Way Frequencies task.
 Generate a one-way frequency report and bar chart.

3
3

The Process of Data Analysis


 Accessing data sources
 Combining tables
 Transforming variables
 Exploring and describing data
 V isualizing patterns in the data
 Analyzing and modeling data

4
4

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-4 Chapter 5 Exploring Data w ith Tasks

Business Scenario
Orion Management wants a report that counts the number
of products offered for each category.

5
5

What Is a Task?
A task is a specific type of analysis,
report, or data manipulation that you
can perform against data in a project.
A task is typically referenced by its
description.

6
6

SAS University Edition tasks generate SAS code and formatted results. The tasks include SAS procedures
that range from simple data listings to more complex analytical procedures. Expand the Task area in the
left navigation pane. Here you see five categories of tasks: Data, Graph, Combinatorics and Probability,
Statistics, and Forecasting. Within these categories are individual procedures. A list of all the procedures
is shown below.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating a Frequency Report 5-5

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-6 Chapter 5 Exploring Data w ith Tasks

One-Way Frequencies Task


The One-Way Frequencies task can be used to generate
frequency counts, percentages, graphs, and statistical
tests to analyze the distribution of your data.

7
7

A one-way frequency report shows the distribution of a variable’s values.


To access the One-Way Frequencies task, select Tasks  Statistics  One-Way Frequencies from
the navigation pane.

One-Way Frequencies: Task Roles


A frequency report is generated for each column assigned
to the Analysis variables role.

8
8

The One-Way Frequencies task produces a one-way frequency report for each variable included in the
analysis variable role. Binomial and chi-square tests can be requested on the Options tab.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating a Frequency Report 5-7

One-Way Frequencies: Task Roles


The Group analysis by role can be used to produce
separate frequency reports for each unique value of a
variable.

*Bar charts are not shown on the slide but are


display ed in SAS Studio.
9
9

One-Way Frequencies: Task Roles


The Frequency count role specifies the variable to use as
the frequency count. When you assign a variable to this
role, each observation in the table is assumed to
represent n observations, where n is the value of the
frequency count for that row. Only one variable can be
assigned to this role.

10
10

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-8 Chapter 5 Exploring Data w ith Tasks

One-Way Frequencies: Additional Options


Use additional options to customize the report.

11
11

For binomial proportions, specify a test proportion (null hypothesis proportion value) and confidence
level. Select the tests to perform. To compute the Monte Carlo estimates of the exact p-values instead of
directly computing the exact p-values, select the Use Monte Carlo estimation check box. Monte Carlo
estimation can be useful for large problems that require a great amount of time and memory for exact
computations but for which asymptotic approximations might be insufficient. Exact Computations
Methods Limit computation time specifies the time limit (in seconds) for the computation of each p-value
for each crosstabulation table. The default is 300 seconds (or 5 minutes).

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating a Frequency Report 5-9

Generating a One-Way Frequency Report

Orion management wants a report that summarizes the number of products offered for each category.
Use the One-Way Frequencies task to generate the report.

1. In the Tasks section of the navigation pane, expand the Statistics folder and double-click One-Way
Frequencies.

2. On the DATA tab, select DATA.PRODUCTS as the desired data set for analysis.
3. Under ROLES, click the + icon to assign the variable Product_Category to the analysis variable
role. Click OK.

4. On the OPTIONS tab, expand FREQUENCIES AND PERCENTAGES. To include only the
frequency and percent statistics, select Frequency table and then select Include percentages

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-10 Chapter 5 Exploring Data w ith Tasks

5. Select Run to generate the report and examine the results. View the report in the RESULTS tab.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating a Frequency Report 5-11

Exercises

Level 1

1. Using the One-Way Frequencies Task to Generate Frequency Counts and Percentages
a. Use the data set DATA.employee_organization and the One-Way Frequencies task to generate a
report that analyzes the Department column.
b. Modify the One-Way Frequencies task to display only frequencies and percentages.
c. Generate the report and examine the results.
d. Save the project as Dept_Freq.sas.

Level 2

2. Using the One-Way Frequencies Task to Produce a Grouped Report


a. Use the data set DATA.employee_organization and the One-Way Frequencies task to analyze
Job_Title, grouped by Department. Include only frequencies and percentages.
b. Exclude the distribution plots from the results.
c. Click Run and analyze the results.
d. Save the project as Job_Freq.sas.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-12 Chapter 5 Exploring Data w ith Tasks

5.2 Generating Summary Statistics

Objectives
 Create a summary table.
 Create a summary report.
 Export summary data to Microsoft Excel.

14
14

Business Scenario
Management would like a report that includes a summary
of profits for each product line and category. In this
scenario, you use the data set Orion_Profit.

15
15

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-13

Summary Statistics Task


The Summary Statistics task can be used to calculate
summary statistics based on groups within the data. You
can produce reports, graphs, and data sets as output.

16
16

Summary Statistics Task


The Summary Statistics task is located under Tasks 
Statistics  Summary Statistics, which is used to set
up the results.

17
17

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-14 Chapter 5 Exploring Data w ith Tasks

Summary Statistics: Task Roles


After you drag the Summary Statistics task to the Work
pane, you can assign variable roles.

Compute statistics
for each numeric
variable in the list.

Specify variables whose


values define subgroups.

18
18

Additional roles include the following:

Group Separate descriptive statistics are generated for each group. The groups
are determined by the values of the variable that you assign to this role.

Frequency count This is a variable whose value represents the number of times that the
row should be counted.

Weight This is a variable whose value for each row is used to calculate weighted
means, variances, and sums.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-15

Summary Statistics: Statistics and Results


On the Options tab, you can choose statistics and results
to include, including a report and graphics

19
19

On the OPTIONS tab, you can choose to include extra information in your output, such as specific
statistics, percentiles, plots, and details about the divisor for standard deviation and variance. The extra
options include the following:
 Mean  Standard Deviation
 Standard Error  Variance
 Kurtosis  Skewness
 Minimum or Maximum  Percentiles: 1st , 5th , 10th, 90th , 95th, or 99th
 Lower or Upper Quartile  Median
 Coefficient of Variation  Corrected or Uncorrected Sum of Squares
 Histogram  Box Plot

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-16 Chapter 5 Exploring Data w ith Tasks

Summary Statistics: Output Data Set


On the Output tab, you can choose to create an output
data set of the results.

20
20

Under the Output tab, the option to create an output data set is available. After the option is executed, the
data set of the summary statistics can be found in its respective library.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-17

Summarizing Data

Management would like to be able to access an HTML report or a Microsoft Excel file from a shared
network drive. This report should include a summary of profits for each product line and category.

1. To create the summary report and data set, begin by selecting Tasks  Statistics Summary
Statistics from the navigation pane.

2. On the DATA tab, select DATA.ORION_PROFIT as the data set for analysis.
3. On the DATA tab, click the + symbol and add the variable Profit to the analysis variables role.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-18 Chapter 5 Exploring Data w ith Tasks

4. On the DATA tab, click the + symbol next to Classification variables roles, and add the variables
Product_Line and Product_Category.
So far, your screen should match what is shown below. If you are viewing in split screen mode, you
will also see the SAS code automated to the right of the Data pane.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-19

5. Designate statistics to include in the report. On the OPTIONS tab, do the following:
a. Expand Basic Statistics. Clear the check boxes for Standard deviation, Minimum, and
Maximum.

b. Expand Additional Statistics and select Sum.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-20 Chapter 5 Exploring Data w ith Tasks

c. Expand Percentiles and select Median.

6. Create a box plot. On the OPTIONS tab, do the following:


a. Expand PLOTS, and select Comparative box plot to add a graph to the report.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-21

7. Create an output data set of the results


a. Click the OUTPUT tab. Select Create output data set.
b. Name the output data set Profit_sum.

8. Click Run to generate the report. Examine the results.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-22 Chapter 5 Exploring Data w ith Tasks

9. Create a more descriptive name for the task. Click the Save As button and save the task as Profit by
Product Summary under MY SAS FILES.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-23

10. Export the output data set to Excel.


a. Expand Snippets on the navigation pane and then expand Data.

b. Double-click Generate CSV File to open snippet code to generate an Excel file.

c. Edit the SAS code to represent the correct output data set name. For this demonstration, replace
data = sashelp.cars with the output data set that you created in step 8, data =work.profit_sum.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-24 Chapter 5 Exploring Data w ith Tasks

d. Replace %let _DATAOUT_NAME=cars.csv with %let _DATAOUT_NAME=profit_sum.csv.


Your corrected code should look like this:

e. Click Run to export the Excel file. If the code runs properly, a window appears that asks whether
you want to open the excel file. Select Open.
Partial Results

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-25

Exercises

Level 1

3. Creating a Summary Report of Salary by Department


Use the data set data.employee_master and the Summary Statistics task to generate a report
analyzing the salaries of all employees by department.
a. Use the Summary Statistics task to generate a report on the Salary column by Department.
b. Include the mean, minimum, maximum, and median salary for each group.
c. Create an output SAS data set named salary_summary that includes the calculated statistics.
d. Submit the task and view the report. Save the task as Salary by Dept Summary.

e. Export the output data to an Excel file named Salary_Summary.csv.


Here are the results that you should obtain:

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-26 Chapter 5 Exploring Data w ith Tasks

Level 2

4. Creating a Summary Report of Salary by Department and State


a. Use the data set data.employee_master and the Summary Statistics task to summarize Salary by
Department first, and then by State.
b. Include the mean, minimum, maximum, and median salary for each group.
c. Submit the task and view the report. Save the task as Salary by Dept/State Summary.
Partial Results

Level 3

5. Modifying the Summary Statistics Task Code to Include an Additional Statistic


a. Use the data set data.employee_master and the Summary Statistics to summarize Salary by
Country.

b. Include the mean, minimum, maximum, and median salary for each group, and create a
histogram.

c. Within the Summary Statistics task, on the OPTIONS tab, expand Additional Statistics. Select
the Skewness box. Click Run.

 Skewness is a measure that indicates to what degree the distribution of data values has a
longer tail to the left or to the right. For the Salary distribution for both US and AU, there
are a few extreme values at the high end and a concentration of values at the low end of
the distribution, so the right tail of the distribution curve is longer (right-skewed). For
such data, the skewness statistic is greater than zero. If the data are left-skewed, the
skewness statistic is less than zero.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Generating Summary Statistics 5-27

d. Examine the output and verify that the additional skewness statistic was added to the results.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-28 Chapter 5 Exploring Data w ith Tasks

5.3 Filtering Data in a Task

Objectives
 Filter data directly in a task.
 Using various operators, build multiple filter criteria.

23
23

Business Scenario
The Marketing Department at Orion Star wants to run
a special promotion featuring products purchased by
customers between the ages of 19 and 38 who have
placed orders since May 2003.

24
24

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Filtering Data in a Task 5-29

Filtering Data Task


The Filtering Data task enables you to apply filters to the
input data to select which rows should be included in the
task.
This can be found under Tasks  Data  Filter Data.

25
25

Filtering Data: Task Roles


After you drag the Data Filter task to the Work pane, you
can assign variable roles.
Choose input data set
to filter

Select variable that you


would like to filter on

Logical operator to add


additional filter
variables (and, or)

Create output data set


of filtered data
26
26

Available operators include those listed below:


 Equal to/Not equal to
 In a list/Not in a list
 Less than/Less than or equal to
 Greater than/Greater than or equal to
 Between/Not between
 Contains/Does not contain

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-30 Chapter 5 Exploring Data w ith Tasks

 Is missing/Is not missing

Filter Data: Output Data


After you run your Filter Data task, the output data is
created, and you are able to view the data set and select
variables to view or add more filters.

27
27

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Filtering Data in a Task 5-31

Filtering Data in a Task

The Marketing Department at Orion Star wants to run a special promotion featuring products purchased
by customers between the ages of 19 and 38 who placed orders since May 2003. Modify the Summary
Statistics task to add an appropriate filter.

1. In the navigation pane, expand Tasks. Then expand Data and select Filter Data.

2. On the DATA tab, select DATA.ORION_PROFIT as the data set.


3. To include customers in the target age range, click the + symbol beside Variable 1 and assign
Customer_Age to the Variable 1 role.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-32 Chapter 5 Exploring Data w ith Tasks

4. After Customer_Age is assigned to Variable 1, you want Customer_Age to be greater than or equal
to 19.
a. In the Comparison field, click the drop-down arrow and select Greater than or equal to.
b. In the Value type field, click the drop-down arrow and select Enter a Value.
c. In the Value field, enter 19.

5. You also want to include customers who are 38 or younger. Therefore, in the Logical field, click the
drop-down arrow and select AND. This action adds a second filter to include this condition.

6. Assign the variable Customer Age to the Variable 2 role.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Filtering Data in a Task 5-33

7. Select the correct parameters to include customers 38 and younger.


a. In the Comparison field, click the drop-down arrow and select Less than or equal to.
b. In the Value type field, click the drop-down arrow and select Enter a Value.
c. In the Value field, enter 38.

8. Now you want to include customers who have placed orders since May 2003. To add a third filter,
select the drop-down arrow in the Logical field and select AND.

9. Select Order Date as Variable 3.


10. Select the correct parameters to include customers 38 and younger.
a. In the Comparison field, click the drop-down arrow and select Greater than or equal to.
b. In the Value type field, click the drop-down arrow and select Enter a Value.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-34 Chapter 5 Exploring Data w ith Tasks

c. In the Value field, enter '01MAY2003'd. (Be sure to include the single quotation marks.)

11. Expand OUTPUT DATA SET and create a data set named FILTER_PROFIT. Select the show
output data check box.

12. Click Run. Check the output data set to make sure that it is filtered the way that you want.
Partial Results

13. Open the previous demonstration that was saved as Profit by Product Summary.
14. On the DATA tab, select WORK.FILTER_PROFIT as the data set to analyze

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Filtering Data in a Task 5-35

15. Assign Profit to the analysis variables role.


16. Assign Product line and Product Category to the classification variable role.
17. Click Run.

18. Save the project as filter_profit.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-36 Chapter 5 Exploring Data w ith Tasks

Exercises

Level 1

6. Using the Filter Data Task to Display San Diego Employees


Partial Results

a. Select the DATA.EMPLOYEE_ADDRESSES data set and use the Filter Data task so that only
San Diego employees are included in the report.

b. Save the output data set as FILTER_ADDRESS_SD.


c. Click Run.
d. Select Tasks  Data  List Data. Use the filtered data set and include the following variables:
Employee_ID, Employee_Name, Street_Number, Street_Name, and Postal_Code.

e. Click Run
f. Save the project as output_SanDiego.

Level 2

7. Creating a Grouped List Report for All Employees from Australia

a. Select the DATA.EMPLOYEE_ADDRESSES data set and use the Filter Data task so that all
employees from Australia are included in the report.

 Include all variations of AU.

b. Save the output data set as FILTER_ADDRESS_AU.


c. Click Run.
d. Select Tasks  Data  List Data. Use the filtered output data set, filter_address_au, and
include the following variables: Employee_ID, Employee_Name, Street_Number,
Street_Name, and Postal_Code.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Filtering Data in a Task 5-37

e. Click Run.
f. Save the project as output_AU.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-38 Chapter 5 Exploring Data w ith Tasks

5.4 Comparing Means

Objectives
 Compare means using a t test.
 Run a distribution analysis.
 Run a nonparametric one-way ANOVA.

30
30

Business Scenario
The Human Resources Department is involved in a
situation that requires them to determine whether the
mean salary is different between men and women.

31
31

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-39

t Test
A two-sample t test is used to determine whether the
means of two populations are equal. The test assumes
that the data are normally distributed for the t test to be
valid.
This analysis tests whether
the mean salary for women
is different from the mean
salary of men.

32
32

Test the Distribution


The resulting distribution plot from the t test (at left) shows
that the data are not normally distributed. Run the
distribution analysis task (at right) to test for normality of
the data.

33
33

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-40 Chapter 5 Exploring Data w ith Tasks

Nonparametric One-Way ANOVA


When the data are not normally distributed, the
Nonparametric One-Way ANOVA (Analysis of Variance)
task is appropriate to test for group differences. The
Wilcoxon rank-sum test can be used for comparing two
groups.
This can be found at Tasks  Statistics 
Nonparametric One-Way ANOVA.

34
34

 If there are three or more groups, the Kruskal-Wallis test is used when you specify the Wilcoxon
option. The Kruskal-Wallis and the Wilcoxon rank-sum tests are equivalent when two groups are
compared.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-41

Comparing Means

1. Expand Tasks in the navigation pane. Then expand Statistics and select T Tests.

2. On the DATA tab, select MYSAS.EMPLOYEE_PAYROLL as the data set to analyze.


3. On the ROLES tab, click the drop-down arrow for T test and select Two-sample test.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-42 Chapter 5 Exploring Data w ith Tasks

4. Assign Salary to the analysis variable role.

5. Assign Employee_Gender to the group variable role.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-43

6. On the OPTIONS tab, expand the PLOTS section and select Default plots. Default plots include
histogram, box plot, and normality plot.

7. Click Run.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-44 Chapter 5 Exploring Data w ith Tasks

The Satterthwaite p-value of 0.1249 is greater than .05, our significance level. Therefore, we fail to
reject the null hypothesis that the means are equal, and conclude that there is insufficient evidence
that the means between the two groups differ. This indicates that the mean salary of employees does
not differ significantly between genders. This assumes normally distributed data.

However, the distribution plots


show data skewed to the right.
Therefore, a distribution analysis
needs to be completed and the use
of a nonparametric one-way
ANOVA analysis needs to be
considered to accurately depict
this hypothesis. See the
demonstrations that follow.
.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-45

Distribution Analysis

Run a distribution analysis to confirm that the data are not normally distributed.
1. In the navigation pane, expand Tasks. Then expand Statistics and select Distribution Analysis.

2. On the DATA tab, under DATA, select MYSAS.EMPLOYEE_PAYROLL as the data set to analyze.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-46 Chapter 5 Exploring Data w ith Tasks

3. Under ROLES, assign Salary as the analysis variable.

4. Under ADDITIONAL ROLES, assign Employee_Gender as the group analysis by variable.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-47

5. On the OPTIONS tab, expand CHECKING FOR NORMALITY and select the box for Histogram
and goodness-of-fit tests.

6. Click Run.
Distribution of Salary – Female

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-48 Chapter 5 Exploring Data w ith Tasks

Distribution of Salary – Male

Based on the p-value of <0.01 (for both male and female employees), we reject the null hypothesis
that the data are normally distributed. The t test would not be the most appropriate test to compare the
groups in this example. A nonparametric test would be preferable.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-49

Nonparametric One-Way ANOVA

1. In the navigation pane, expand Tasks. Then expand Statistics, and select Nonparametric One-Way
ANOVA.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-50 Chapter 5 Exploring Data w ith Tasks

2. Under DATA, select MYSAS.EMPLOYEE_PAYROLL as the data set to analyze.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.4 Comparing Means 5-51

3. Select Salary as the dependent variable.

4. Select Employee_Gender as the classification variable.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-52 Chapter 5 Exploring Data w ith Tasks

5. On the OPTIONS tab, expand TESTS and select the Wilcoxon Scores box.

6. Click Run.
The Wilcoxon two-sided Pr > Z of 0.5889 is greater than the significance level of 0.05. You fail to
reject the null hypothesis that the salaries are equal. The salaries of males and females at Orion Star
are not significantly different.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-53

5.5 Solutions
Solutions to Exercises
1. Using the One-Way Frequencies Task to Generate Frequency Counts and Percentages

a. Use the data set DATA.employee_organization and the One-Way Frequencies task to generate a
report that analyzes the Department column.

1) To open the One-Way Frequencies task, select Tasks  Statistics  One-Way Frequencies
from the navigation pane.

2) On the DATA tab, select DATA.EMPLOYEE_ORGANI ZATION as the data set for
analysis.

3) On the DATA tab, under ROLES, click the + icon and select the variable Department to
assign it to the analysis variable role.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-54 Chapter 5 Exploring Data w ith Tasks

b. Modify the One-Way Frequencies task to display only frequencies and percentages.
Click the OPTIONS tab and expand Frequencies and percentages. Select Frequency table and
Include percentages to display frequencies and percentages only.
c. Click Run to generate the report. Examine the results.
d. Save the project as Dept_Freq.

1) Select File  Save Project As.


2) Enter Dept_Freq in the File name field and select Save.
2. Using the One-Way Frequencies Task to Produce a Grouped Report
Partial Results

a. Use the data set DATA.employee_organization and the One-Way Frequencies task to analyze
Job_Title, grouped by Department. Include only frequencies and percentages.

1) To open the One-Way Frequencies task, select Tasks  Statistics  One-Way Frequencies
from the navigation pane.

2) On the DATA tab, under ROLES, click the + icon to add the variable Job_Title to the analysis
variables role. Click OK.
3) On the DATA tab, expand ADDITIONAL ROLES. Click the + icon next to group analysis
by to add Department as the group analysis by variable.
4) On the OPTIONS tab, expand Frequencies and Percentages in the Frequency Table options
pane and select Include cumulative frequencies and percentages

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-55

b. Exclude the plots from the results.


On the OPTIONS tab, expand PLOTS and select Suppress plots.
c. Click Run and analyze the results.
d. Save the project as Job_Freq.
1) Select File  Save Project As.
2) Enter Job_Freq in the File name field and select Save.
3. Creating a Summary Report of Salary by Department
Use the data set DATA.employee_master and the Summary Statistics task to generate a report
analyzing the salaries of all employees by department.

a. Use the Summary Statistics task to generate a report on the Salary column by Department.
1) In the navigation pane, expand Tasks. Select Statistics  Summary Statistics.
2) On the DATA tab, select DATA.EMPLOYEE_MASTER as the data set to analyze.
3) Under ROLES, assign Salary to the analysis variable role.
4) Under ROLES, assign Department to the classification variable role.
b. Include the mean, minimum, maximum, and median salary for each group.
1) On the OPTIONS tab, Expand Basic Statistics and select Mean, Minimum, Maximum, and
Median.

2) Expand Percentiles and select Median.


c. Create an output SAS data set named salary_summary that includes the calculated statistics.
1) On the OUTPUT tab, select the Create output data set box.
2) Enter salary_summary as the data set name.
d. Submit the task and view the report. Save the task as Salary by Dept Summary.
1) Click Run.
2) To save the task, click the Save as button.
3) In the Name field, enter Salary by Dept Summary.

4) Click Save.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-56 Chapter 5 Exploring Data w ith Tasks

5) Save the task as Salary by Dept Summary.

e. Export the output data to an Excel file named Salary_Summary.csv.


1) Expand Snippets in the navigation pane. Expand Data and select Generate CSV File.
2) Change the code to match the code shown below to export the output data set
salary_Summary.

3) Click Run and then open the CSV file that is created.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-57

Partial Results

4. Creating a Summary Report of Salary by Department and State


a. Use the data set DATA.employee_master and the Summary Statistics task to summarize Salary
by Department first, and then by State.

1) Expand Tasks. Select Statistics  Summary Statistics.


2) On the DATA tab, select DATA.employee_master as the data set to analyze.
3) Under ROLES, assign Salary to the analysis variables role.
4) Under ROLES, assign Department and State, in that order, to the classification variables role.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-58 Chapter 5 Exploring Data w ith Tasks

b. Include the mean, minimum, maximum, and median salary for each group.
1) On the OPTIONS tab, select Statistics  Basic Statistics. Then select Mean, Minimum,
Maximum, and Median.
2) Expand Percentiles and select Median.
c. Submit the task and view the report. Save the task as Salary by Dept/State Summary.

1) Click Run.
2) Click the Save as button.
3) In the Name field, enter Salary by Dept/State Summary.
4) Click Save.
Partial Results

5. Modifying the Summary Statistics Task Code to Include an Additional Statistic


a. Use the data set DATA.employee_master and the Summary Statistics to summarize Salary by
Country.

1) Expand Tasks in the navigation pane. Select Statistics  Summary Statistics.


2) On the DATA tab, select DATA.employee_master as the data set to analyze.
3) Under ROLES, assign Salary to the analysis variables role.
4) Under ROLES, assign Country to the classification variables role.
b. Include the mean, minimum, maximum, and median salary for each group, and create a histogram
1) On the OPTIONS tab, select Statistics  Basic Statistics. Then select Mean, Minimum,
Maximum, and Median.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-59

2) Expand Percentiles and select Median.


3) Under PLOTS, select Histogram.
4) Click Run.

c. Within the Summary Statistics task, on the OPTIONS tab, expand Additional Statistics. Then
select Skewness.

 Skewness is a measure that indicates to what degree the distribution of data values has a
longer tail to the left or to the right. For the Salary distribution, for both US and AU,
there are a few extreme values at the high end and a concentration of values at the low
end of the distribution, so the right tail of the distribution curve is longer (right-skewed).
For such data, the skewness statistic is greater than zero. If the data is left-skewed, the
skewness statistic will be less than zero.

Click Run.
d. Examine the output and verify that the additional skewness statistic was added to the results.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-60 Chapter 5 Exploring Data w ith Tasks

6. Using the Filter Data Task to Display San Diego Employees


a. Select the DATA.EMPLOYEE_ADDRESSES data set and use the Filter Data task so that only
San Diego employees are included in the report.
1) In the navigation pane, expand Tasks. Then select Data  Filter Data.
2) On the DATA tab, select DATA.EMPLOYEE_ADDRESSES as the data set to analyze.

3) Use the drop-down arrow in the Variable 1 field to select City.


4) In the Comparison field, click the drop-down arrow and select Equal.
5) In the Value type field, click the drop-down arrow and select Select distinct value.
6) In the Value field, click the drop-down arrow and select San Diego.

b. Save the output data set as FILTER_ADDRESS.

c. Click Run.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-61

Partial Results

d. Select Tasks  Data  List Data. Use the filtered data set and include the following variables
Employee_ID, Employee_Name, Street_Number, Street_Name, and Postal_Code.
1) In navigation pane, expand Tasks and select Data  Filter Data.
2) On the DATA tab, select WORK.FILTER_ADDRESSES as the data set to analyze.

3) Under ROLES, click the + symbol in the List variables area and select the following
variables: Employee_ID, Employee_Name, Street_Number, Street_Name, and
Postal_Code.

e. Click Run.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-62 Chapter 5 Exploring Data w ith Tasks

Partial Results

f. Save the project as output_SanDiego.


7. Creating a Grouped List Report for All Employees from Australia
a. Use the Filter Data task so that all employees from Australia are included in the report.

 Include all variations of AU.

1) In the navigation pane, expand Tasks. Select Data  Filter Data.


2) On the DATA tab, select DATA.EMPLOYEE_ADDRESSES as the data set to analyze.

3) Use the drop-down arrow in the Variable 1 field to select Country.


4) In the Comparison field, click the drop-down arrow and select Equal.
5) In the Value Type field, click the drop-down arrow and select Select Distinct Value.
6) In the Value field, click the drop-down arrow and select AU.
7) In the Logical field, click the drop-down arrow and select OR. The FILTER 2 area appears.
8) Use the drop-down arrow in the Variable 1 field to select Country.
9) In the Comparison field, click the drop-down arrow and select Equal.
10) In the Value Type field, click the drop-down arrow and select Select Distinct Value.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-63

11) In the Value field, click the drop-down arrow and select au.

b. Save the output data set as FILTER_ADDRESS_AU.

c. Click Run. Check the output data set to make sure that the filter was applied properly.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-64 Chapter 5 Exploring Data w ith Tasks

Partial Results

d. Select Tasks  Data  List Data. Use the filtered output data set, filter_address_au, and
include the following variables: Employee_ID, Employee_Name, Street_Number,
Street_Name, and Postal_Code.

e. Click Run and observe the output.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.5 Solutions 5-65

Partial Results

f. Save the project as output_AU.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-66 Chapter 5 Exploring Data w ith Tasks

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 6 Visualizing Data
6.1 Creating a Graph ....................................................................................................... 6-3
Demonstration: Creating a Bar Chart ......................................................................... 6-8
Demonstration: Creating a Stacked Bar Chart ........................................................... 6-14
Exercises............................................................................................................. 6-17

6.2 Creating a Scatter Plot and Pie Chart ..................................................................... 6-18


Demonstration: Creating a Two-Dimensional Scatter Plot with a Regression Line........... 6-19
Demonstration: Creating a Pie Chart ....................................................................... 6-23
Exercises............................................................................................................. 6-26

6.3 Solutions ................................................................................................................. 6-27


Solutions to Exercises ........................................................................................... 6-27
6-2 Chapter 6 Visualizing Data

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-3

6.1 Creating a Graph

Objectives
 Name the types of graphs supported in
SAS University Edition.
 State the differences among the output formats
that are supported in SAS University Edition.
 Create a bar chart.
 Create a grouped/stacked bar chart using the
advanced bar chart interface.

The Process of Data Analysis


 Accessing data sources
 Combining tables
 Transforming variables
 Exploring and describing data
 V isualizing patterns in the data
 Analyzing and modeling data

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-4 Chapter 6 Visualizing Data

Business Scenario
To better understand the demographics of the Orion Star
customers, Management wants to see a comparison of
average customer age by country.

Graphs Using SAS University Edition

SAS University Edition has several Graph tasks that you can choose from. Above are quick examples of a
line graph, pie chart, bar chart, and line bar chart

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-5

Types of Graphs
The graphs are displayed
under Tasks  Graph.

Graphs: Task Roles


Each graph has roles to fill in specific to that task.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-6 Chapter 6 Visualizing Data

Graphs: Task Options


On the OPTIONS tab, each graph task has options that
enable you to customize the graph.

On the Options tab, you can enter titles and footnotes for the graphical tasks and even choose the font size
for each one. Options vary depending on the graph.

Bar Chart Task


The Bar Chart task enables you to do the following:
 select and filter the data using a WHERE clause
 assign variables to roles
 specify a role statistic – mean versus sum

 assign bar orientation


 choose to stack or cluster bars (only for frequency charts
with a group variable)

10

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-7

Bar Chart Task: Options


The OPTIONS tab of the Bar Chart task includes additional
options. For example, you can change bar shape and color,
add titles and footnotes, and modify labels and legend
details.

11

Bar Chart Task: Final Results

12

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-8 Chapter 6 Visualizing Data

Creating a Bar Chart


To better understand the demographics of the Orion Star customers, management wants to see a
comparison of average customer age by country. Use the Bar Chart task to create the result.
1. Expand Tasks in the navigation pane and then expand Graph. Double-click the Bar Chart task

2. On the DATA tab, select DATA.CUSTOMERS as the data set to analyze

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-9

3. On the DATA tab, under ROLES, assign Customer_Country as the category variable by clicking the
+ symbol

4. On the DATA tab, under ROLES, assign Customer_Age as the Response variable by clicking the +
symbol
On the DATA Tab, expand DIRECTION and select the Horizontal radio button.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-10 Chapter 6 Visualizing Data

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-11

5. On the DATA tab, expand STATISTICS, and select Mean as the statistic for the bar chart.

6. On the OPTIONS tab, perform the following actions:


a. Specify Average Customer Age by Country as the title.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-12 Chapter 6 Visualizing Data

b. Expand BAR DETAILS and select the Apply bar color check box. Choose any color by using
the Bar color menu. Change the Color bars by value to Bar category.

c. Still under BAR DETAILS, select Crisp for Data skin.

d. Expand BAR LABELS and select the Show bar labels or statistics box. This displays the
average age on each bar.

e. Expand CATEGORY AXIS and select Show values in data order, and slow label.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-13

7. Select Run.

8. Save the task as Customer Country By Age Bar Chart

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-14 Chapter 6 Visualizing Data

Creating a Stacked Bar Chart


To plan future promotions, Management wants to compare the purchasing habits of men and women
according to their ages and types.
1. Open the task from the previous demonstration, Customer Country By Age Bar Chart.
2. On the DATA tab, select DATA.CUSTOMERS as the data set to analyze
3. Assign Customer_Age_Group as the category variable.
4. Assign Customer_Group as the group variable.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-15

5. On the DATA tab, expand Direction and select Vertical. Then expand Group Layout and select
Stack.

6. On the OPTIONS tab, change the title to Customer Classification.

7. On the OPTIONS tab, expand Category Axis, and do the following:


a. Remove the custom label of Average Customer Age.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-16 Chapter 6 Visualizing Data

b. Clear the box for Show values in data order. This puts the category variable in ascending order.
To change to descending order, select the Reverse check box.

8. Select Run.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.1 Creating a Graph 6-17

Exercises

Level 1
1. Creating a Vertical Bar Chart That Graphs Total Salary for Each City

a. Use the Bar Chart task and select DATA.EMPLOYEE_DETAIL as the data set to analyze.
b. Create a vertical bar chart that shows the sum of Salary for each value of City.
c. Change the following options:
1) Title the graph Total Salary for Each City and change Data skin to Matte.
2) Show data labels at the top of each bar with the sum of Salary.
3) Label the bar height (vertical) axis as Total Salary.
d. Run the task and review the results.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-18 Chapter 6 Visualizing Data

6.2 Creating a Scatter Plot and Pie Chart

Objectives
 Create a scatter plot.
 Enhance a scatter plot by adding a regression line.
 Create a pie chart.

16

Scatter Plot
The default Scatter Plot task creates a two-dimensional
plot that displays the values of two variables as data
points on one horizontal axis (x) and one vertical axis (y).

17

Each pair of x and y values forms a data point. The scatter plot is especially useful for non-summarized,
randomly distributed data. It is designed to accommodate large data sets. The user can set the color, size,
and shape of the markers shown. Interpolation including linear regression is also available in this plot.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.2 Creating a Scatter Plot and Pie Chart 6-19

Creating a Two-Dimensional Scatter Plot with a Regression


Line
Management is considering offering special free shipping promotions, and wants to understand
the relationship between shipping cost and the cost of products that are currently being sold.
1. Expand Tasks in the navigation pane, and then expand Graph. Double-click the Scatter Plot task.

2. On the DATA tab, select DATA.ORDERS as the data set to analyze.


3. Under ROLES, assign CostPrice_Per_Unit as the X variable by selecting the + symbol.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-20 Chapter 6 Visualizing Data

4. Also under ROLES, assign Shipping as the Y variable by selecting the + symbol.

5. Select Run and view the result.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.2 Creating a Scatter Plot and Pie Chart 6-21

6. To produce a regression line that shows the best fit through these data points, expand FIT PLOTS
and select the box for Regression.

 The resulting regression line through the scatter plot indicates the relationship between
shipping costs and price per unit shipped.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-22 Chapter 6 Visualizing Data

Pie Chart
Pie charts represent a popular way to display frequencies.
As with bar charts, the size of the slices can also
represent a sum or mean of a response variable.

19

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.2 Creating a Scatter Plot and Pie Chart 6-23

Creating a Pie Chart

To better understand the demographics of the Orion Star profit, Management wants to see a comparison
of total profit based on product line. Use the Pie Chart task to create the result.
1. Expand Tasks in the navigation pane and then expand Graph. Double-click the Pie Chart task.

2. On the DATA tab, select DATA.ORION_PROFIT as the data set to analyze.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-24 Chapter 6 Visualizing Data

3. Under ROLES, assign the variable Product_Line by selecting the + symbol next to the Category
variable role.

4. Under ROLES, assign the variable Profit by selecting the + symbol next to the Response variable
role.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.2 Creating a Scatter Plot and Pie Chart 6-25

5. Select Run.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-26 Chapter 6 Visualizing Data

Exercises

Level 1
2. Creating a Two-Dimensional Scatter Plot
Management wants to see how an employee's birthdate relates to his or her hire date. Produce a report
that addresses this request.
a. Use the Scatter Plot task and select MYSAS.EMPLOYEE_PAYROLL as the data set to analyze.
b. Create a scatter plot that shows Employee_Hire_Date and Birth_Date.
c. Add a linear regression interpolation line to the graph.
d. Name the scatter plot Hiring Trends.
e. Select Run and analyze the results.
f. Are there any trends or changes in the hiring practices of the company?

Level 2
3. Using the Pie Chart Task to Analyze Salary by Department
a. Use the Pie Chart task and select DATA.EMPLOYEE_DETAIL as the data set to analyze. The
slices should represent the sum of Salary for each department.
b. Show the department name and salary total outside each slice.
c. Change the title to Total Salary by Department and run the task.
d. Select Run.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.3 Solutions 6-27

6.3 Solutions
Solutions to Exercises
1. Creating a Vertical Bar Chart That Graphs Total Salary for Each City
a. Use the Bar Chart task and select DATA.EMPLOYEE_DETAIL as the data set to analyze.
Expand Tasks on the navigation pane. Expand Graph and select Bar Chart.

b. Create a vertical bar chart that shows the sum of Salary for each value of City.
1) On the DATA tab, do the following:
a) Select DATA.EMPLOYEE_DETAIL as the data set to analyze.
b) Under ROLES, assign City to the category variable role.
c) Under ROLES, assign Salary to the response variable role.
d) Under DIRECTION, select Vertical.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-28 Chapter 6 Visualizing Data

e) Under STATISTICS, select Sum.

c. Change the following options:


1) Title the graph Total Salary for Each City and change Data skin to Matte.
On the options tab, do the following:
a) Enter Total Salary for Each City in the Title field.
b) Select Matte in the Data skin field.
2) Show data labels at the top of each bar with the sum of Salary.
On the OPTIONS tab, expand BAR LABELS and select the Show bar labels or statistics
box.
3) Label the bar height (vertical) axis as Total Salary.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.3 Solutions 6-29

Expand RESPONSE AXIS. Select the Show label box, and enter Total Salary in the
Custom label field.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-30 Chapter 6 Visualizing Data

d. Run the task and review the results.

2. Creating a Two-Dimensional Scatter Plot


Management wants to see how an employee's birthdate relates to his or her hire date. Produce a report
that addresses this request.
a. Use the Scatter Plot task and select MYSAS.EMPLOYEE_PAYROLL as the data set to analyze.

b. Create a scatter plot that shows Employee_Hire_Date on the horizontal axis and Birth_Date on
the vertical axis
1) On the DATA tab, select MYSAS.EMPLOYEE_PAYROLL as the data set to analyze.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.3 Solutions 6-31

2) Under ROLES, specify Employee_Hire_Date as the X variable by selecting the + symbol


next to X variable role.
3) Under ROLES, specify Birth_Date as the Y variable by selecting the + symbol next to the Y
variable role.

c. Add a linear regression interpolation line to the graph.


To produce a regression line that shows the best fit through these data points, expand FIT
PLOTS and select the box for Regression.

d. Name the scatter plot Hiring Trends.


1) Under the Options Tab, Type Hiring Trends in the Title Text box.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-32 Chapter 6 Visualizing Data

e. Select Run and analyze the results.

f. Are there any trends or changes in the hiring practices of the company?
Since 2003, the company seems to be hiring employees of a much more diverse age range.
3. Using the Pie Chart Task to Analyze Salary by Department
a. Use the Pie Chart task and select DATA.EMPLOYEE_DETAIL as the data set to analyze where
the slices represent the sum of Salary for each department.
1) Expand Tasks on the navigation pane. Expand Graph and select Bar Chart.
2) On the DATA tab, select DATA.EMPLOYEE_DETAIL as the data set to analyze.
3) Under ROLES, specify Department as the category variable by clicking the + symbol next to
the Category variable role.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6.3 Solutions 6-33

4) Under ROLES, specify Salary as the response variable by clicking the + symbol next to the
Response variable role.

b. Show the department name and salary total outside each slice.
On the OPTIONS tab, expand PIE LABELS and select Outside for the Location field.
c. Change the title to Total Salary by Department and run the task.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-34 Chapter 6 Visualizing Data

d. Select Run.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Chapter 7 Performing Statistical
Analyses with Tasks

7.1 Analytical Tasks Overview ........................................................................................ 7-3


Demonstration: One-Way ANOVA ........................................................................... 7-10
Demonstration: Simple Linear Regression Analysis.................................................... 7-20

7.2 Performing Statistical Analyses in SAS University Edition.................................... 7-27


Demonstration: Polynomial Regression Analysis ....................................................... 7-29

Demonstration: Customizing a SAS Program ............................................................ 7-42


Demonstration: Gamma Regression in the Generalized Linear Models Task .................. 7-47
7-2 Chapter 7 Performing Statistical Analyses w ith Tasks

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-3

7.1 Analytical Tasks Overview

Objectives
 Describe each analysis in the ANOVA, Regression,
and Multivariate task menus.
 Name an example of a research or business problem
that can be addressed by each type of analysis.

The Process of Data Analysis


 Accessing data sources
 Combining tables
 Transforming variables
 Exploring and describing data
 V isualizing patterns in the data
 Analyzing and modeling data

SAS University Edition has a wide selection of tasks for performing statistical analysis with your data.
With SAS programs, you can do even more. This section provides an overview of the tasks for statistical
analysis in SAS University Edition and examples of applications of each type of analysis.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-4 Chapter 7 Performing Statistical Analyses w ith Tasks

Cars Data
The demonstrations in this chapter use the CARS data
set, which is distributed with SAS software and included
in the Sashelp library.
Data is gathered for the price and physical characteristics
of 428 cars.

Sashelp.CARS

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-5

Sashelp Library
In addition to the MAPS library, SAS comes with data sets
in a library named Sashelp. These data sets are used
throughout the SAS documentation to show examples of
the usage of SAS software.

Expand Libraries in the


navigation pane. Under
My Libraries, expand
SASHELP and locate
the CARS data set.

Analytical Task Overview


Select Tasks on the
navigation pane. Expand
Statistics to see the list
of statistical procedures
that the SAS University
Edition provides.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-6 Chapter 7 Performing Statistical Analyses w ith Tasks

t-Test Task
The t-Test task performs the following actions:
 calls the TTEST procedure
 performs one-sample, paired-samples, and
two-sample t tests
 generates two types of confidence intervals for the
standard deviation
 enables specification of the null hypothesis value for
the test

Examples of t Tests
 Do students show a change in test scores after a
semester in a training program?
 Does revenue differ between two advertising
campaigns?
 Is the average temperature of a refrigerated product
significantly different from the target temperature of
34 degrees Fahrenheit?

10

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-7

One-Way ANOVA Task


The One-Way ANOVA task performs the following
actions:
 calls the ANOVA procedure
 performs normal and variance-weighted analysis of
variance to compare group means
 performs tests of equal variance
 performs mean comparison with a variety of methods
for controlling experimentwise or comparisonwise
type I error
 produces relevant plots for comparing means with
ANOVA

11

Examples of One-Way ANOVA


 Do three treatments differ in the change in blood
pressure produced?
 Do four types of paint have significantly different
visibility on a roadway?
 Do male and female students differ in scores on a
reading comprehension test?

12

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-8 Chapter 7 Performing Statistical Analyses w ith Tasks

Nonparametric One-Way ANOVA Task


The Nonparametric One-Way ANOVA task performs the
following actions:
 calls the NPAR1WAY procedure
 performs one-way analysis of variance without
assuming a normal distribution of the data
 offers a variety of tests for comparing groups
 offers exact and approximate tests

13

Examples of Nonparametric One-Way ANOVA


 Do three dose groups differ in reaction time after
taking a drug?
 Are there significant differences among the weights
of newly hatched birds for two habitats?

14

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-9

One-Way ANOVA Task


Compare the manufacturer’s suggested retail price
(MSRP) for cars made in the U.S., Europe, and Asia
using the Sashelp.CARS data set. Compare individual
groups using pairwise comparisons.

15

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-10 Chapter 7 Performing Statistical Analyses w ith Tasks

One-Way ANOVA

Use the One-Way ANOVA task to compare the MSRP for cars made in the US, Europe, and Asia.
Compare groups using Tukey’s HSD test and create a plot of the means with error bars.

1. Open the Sashelp.CARS data set.


2. Expand Tasks on the navigation pane. Under Tasks, expand Statistics and double-click One-Way
ANOVA

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-11

3. On the DATA tab, select SASHELP.CARS as the data set for analysis. Click the Browse button
under DATA, expand the SASHELP library, and select CARS data set.

4. Assign MSRP to the Dependent variable role.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-12 Chapter 7 Performing Statistical Analyses w ith Tasks

5. Assign Origin to the Categorical variable role.

6. On the OPTIONS tab, expand COMPARISONS. From the Comparisons method list, select Tukey.
(This is the default comparison method.)

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-13

7. Select Run and view the results.


Three groups for Origin are displayed in the Class Level Information table.

The ANOVA results show F(2,425)=76.98 and p-value < 0.0001. This suggests that the MSRP is
significantly different for at least two of the Origin groups.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-14 Chapter 7 Performing Statistical Analyses w ith Tasks

Tukey’s Studentized Range (HSD) test compares the groups using an experimentwise error rate of 0.05.
To read the results, look at the final table. Any pair of groups that are significantly different is indicated
by a p-value less than 0.05.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-15

Here is the correct way to interpret these results: European cars have significantly different mean MSRP
values than US or Asian cars. The mean MSRPs for US and Asian cars are not significantly different.

The plot shows that the highest prices are for European cars. This plot also enables you to see outliers.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-16 Chapter 7 Performing Statistical Analyses w ith Tasks

Linear Regression Task


The Linear Regression task performs the following actions:
 calls the REG procedure
 performs simple and multiple linear regression analysis
 uses a variety of model selection methods
 provides diagnostic statistics and plots

17

Examples of Linear Regression


 Does the fuel efficiency of cars differ as a function
of horsepower, weight, and number of cylinders?
 Does revenue vary as a function of marketing and
sales training expenditures?

18

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-17

Logistic Regression Task


The Logistic Regression task performs the following actions:
 calls the LOGISTIC procedure
 performs binary, ordinal, and multinomial logistic
regression analysis with continuous and categorical
predictors
 fits models with n-way, nested, and polynomial effects
 offers several model selection methods
 produces plots appropriate for the specified analysis

19

Examples of Logistic Regression


 Are loan applicants from one group more likely
to default than others?
 Are doctors from one specialty more likely to have
a malpractice suit filed against them than doctors
in other specialties?

20

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-18 Chapter 7 Performing Statistical Analyses w ith Tasks

Generalized Linear Models Task


The Generalized Linear Models task performs the
following actions:
 calls the GENMOD procedure
 performs generalized linear model analysis, fitting
responses from the exponential family of distributions
 offers a variety of link functions to relate the response
to a linear function of the predictors
 fits models with continuous and categorical predictors
 fits models with n-way, nested, and polynomial effects

21

Examples of Generalized Linear Models


 How does the expected number of melanoma cases
vary across five European cities?
 Does a crime rate change significantly after
implementation of a Community Watch program?

22

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-19

Linear Regression Task


Perform a simple linear regression analysis to predict
MPG_Highway from Weight.

23

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-20 Chapter 7 Performing Statistical Analyses w ith Tasks

Simple Linear Regression Analysis

Use the CARS data set to predict MPG_Highway from Weight using simple linear regression.
1. Expand Tasks on the navigation pane. Under Tasks, expand Statistics and double-click Linear
Regression.

2. On the DATA tab, select SASHELP.CARS as the data set for analysis. Click the Browse button
under DATA, expand the SASHELP library, and select the CARS data set.

3. Assign MPG_Highway to the Dependent variable role by clicking the + symbol.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-21

4. Assign Weight to the Continuous variable role.

5. On the MODEL tab, click the Edit button.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-22 Chapter 7 Performing Statistical Analyses w ith Tasks

6. Highlight the variable Weight and click Add. The variable Weight is then added to the Model effects
list.

Here is the code that SAS Studio created from the Linear Regression task:

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-23

7. Select Run and view the results.

Interpretation
With a significant p-value of <.0001 for Weight, the regression results demonstrate a significant
negative linear relationship between Weight and MPG_Highway. The regression equation can
be determined from the Parameter Estimates table. We claim that there is sufficient evidence that
for every one pound increase in Weight, the MPG_Highway decreases by 0.006.
Equation
MPG_Highway = 48.25 – 0.006*Weight.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-24 Chapter 7 Performing Statistical Analyses w ith Tasks

The fit plot shows 95% confidence limits for the regression line (shaded area) and 95% prediction
limits (dashed lines). The plot shows evidence of some unusual observations, which could merit
further investigation.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.1 Analytical Tasks Overview 7-25

Correlations Task
The Correlations task performs the following actions:
 calls the CORR procedure
 performs correlation analysis and partial correlation
analysis
 produces four different tests of association
 displays Cronbach’s coefficient alpha and related
statistics
 creates a scatter-plot matrix of variables and other
useful plots

25

Examples of Correlation Analysis


 What is the association between income and years of
schooling among adults?
 What is the relationship between the concentration of
a pollutant and trunk circumference for a species of
tree?

26

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-26 Chapter 7 Performing Statistical Analyses w ith Tasks

Predictive Regression Models


 Selects the most influential effects based on observed
data.
 Data can be partitioned into training, validation, and
testing subsets.
 Task can also score new data.

27

Examples of Predictive Regression Model


 Which model best predicts which patients are at risk of
developing diabetes?

28

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-27

7.2 Performing Statistical Analyses in SAS


University Edition

Objectives
 Use the Correlations task to explore data.
 Use the Query Builder to transform data.
 Use the Linear Regression task to fit a linear
regression model.
 Locate a statistical option for a SAS program in the
SAS documentation.
 Modify a SAS program generated by a task to perform
a customized statistical analysis.
 Perform Gamma regression with the Generalized
Linear Models task.

30

Correlations Task
Investigate plots of MSRP versus predictor variables of
interest using the Correlations task.

31

In this section, the Correlations task produces a scatter plot of the response variable, MSRP, versus the
predictor variables of interest: Horsepower, MPG_City, MPG_Highway, and Weight. From these, you
can examine associations among these variables to help you determine which effects should be entered
into a linear regression model.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-28 Chapter 7 Performing Statistical Analyses w ith Tasks

Linear Regression Task


 Create quadratic terms in the Query Builder.
 Fit polynomial regression to predict MSRP from
Horsepower, MPG_City, MPG_Highway, and
Weight.
 Select a model with all-subsets regression.

32

The Linear Regression task provides model selection options, including stepwise and all-subsets
regression, with a variety of criteria. In the next demonstration, you fit an all-subsets regression model
using Mallows’ Cp as the criterion for assessing the best models.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-29

Polynomial Regression Analysis

In this demonstration, a consumer is interested in developing a model to predict the manufacturer’s


suggested retail price (MSRP) based on the characteristics of automobiles. A selection of predictors
(horsepower, city miles per gallon, highway miles per gallon, and weight) are chosen based on subject-
matter knowledge.

To explore the relationships between the predictors and the response, perform correlation analysis.
1. Expand Tasks on the navigation pane. Under Tasks, expand Statistics and double-click Correlation
Analysis.

2. On the DATA tab, select SASHELP.CARS as the data set for analysis. Click the Browse button
under DATA, expand the SASHELP library, and select the CARS data set.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-30 Chapter 7 Performing Statistical Analyses w ith Tasks

3. Assign MSRP to the Analysis variables role.

4. Assign Horsepower, MPG_City, MPG_Highway, and Weight to the Correlate with role. Select
Results.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-31

5. On the OPTIONS tab, expand PLOTS. Select Matrix of scatter plots for the Type of plot field.

6. Select Run and view the results.


From the Pearson Correlation Coefficients table, we can conclude that Horsepower is strongly
correlated to MSRP, whereas MPG_City, MPG_Highway, and Weight are moderately correlated to
MSRP.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-32 Chapter 7 Performing Statistical Analyses w ith Tasks

A scatter plot is created for each predictor and the response. There is also a scatter-plot matrix at the end
of the output, which might be easier to use in this example. The scatter-plot matrix shows evidence of
outliers and a curvilinear association between the predictors and MSRP. Also, evidence suggests that the
variance in MSRP is not constant across values of the predictors.

To fit linear regression with polynomial terms, higher-order variables can be created with the Transform
Data task.

 The Linear Regression task does not build higher-order terms automatically. However, it does
perform stepwise and all-subsets regression. The Linear Regression task creates estimate
parameters and test hypotheses using the same statistical methods (ordinary or weighted least
squares).

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-33

7. To create higher-order terms, expand Tasks on the navigation pane. Under Tasks, expand DATA and
double-click Transform Data.
8. On the DATA tab, select SASHELP.CARS as the data set for analysis.
9. Assign Horsepower to the Variable 1 role

10. From the Transform menu, select the square. This creates a new variable named sqr_Horsepower.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-34 Chapter 7 Performing Statistical Analyses w ith Tasks

11. Expand OUTPUT DATA SET and create a data set named Transform. Select Run. This creates a
temporary data set of SASHELP.CARS, named WORK.TRANSFORM, and creates the variable
sqr_Horsepower
12. Now, in the same Transform Data task, select WORK.TRANSFORM as the data set for analysis.
13. Assign the variables MPG_highway MPG_city, and Weight to TRANSFORM 1, 2, and 3,
respectively. Square each variable by selecting Square in each of the corresponding Transform
fields.

14. Expand OUTPUT DATA SET and re-create the data set transform_cars to add these additional
parameters. Select Run.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-35

Partial data set with new parameters: WORK.TRANSFORM_CARS

Now we want to create a new linear regression using the new higher-ordered parameters.
15. Expand Tasks on the navigation pane. Under Tasks, expand Statistics and double-click Linear
Regression.

16. On the DATA tab, select WORK.TRANSFORM_CARS as the data set for analysis. Click the
Browse under DATA, expand the WORK library, and select the TRANSFORM_CARS data set.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-36 Chapter 7 Performing Statistical Analyses w ith Tasks

17. Assign MSRP as the dependent variable

18. Assign Horsepower, MPG_City, MPG_Highway, Weight, and the four squared variables that you
created as explanatory variables by clicking the + symbol next to Continuous variables.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edit ion 7-37

19. On the MODEL tab, highlight all of the variables in the Variable list and click Add. This adds all the
variables to the Model effects list.

20. On the SELECTION tab, expand MODEL SELECTION and then do the following:
a. Select Stepwise selection in the Selection method field.
b. Select Mallows’ C(p) Statistic in the Add/remove effects with field.
c. Select Mallows’ C(p) Statistic in the Stop Adding/removing effects with field.
d. Select Mallows’ C(p) Statistic in the Select best model by field.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-38 Chapter 7 Performing Statistical Analyses w ith Tasks

21. On the SELECTION tab, expand SELECTION STATISTICS. Select Selected fit statistics in the
Model fit statistics field. A number of selection statistics appear. Select Adjusted R-square.

22. Select Run and view the results.

The output can be interpreted as follows:


Coefficient Interpretation
The coefficient values are interpreted as to how much of a unit change in Y will occur for a unit
increase in a particular X predictor variable, given that the other variables are held constant.

Therefore, with one mile per hour increase MPG_Highway and holding all other variables constant,
MSRP will increase $2134.66.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-39

Do this for each significant predictor.


F Test of Overall Significance
From the ANOVA table, the p-value of <.0001 suggests that we reject the null hypothesis that all the
slopes equal zero and conclude that at least one of the slopes differs significantly from zero.
t Test for Individual Coefficients
A t test on an individual coefficient is a test of its significance in the presence of all other explanatory
variables.

Example, with a p-value of 0.0043, MPG_City is a significant negative linear predictor of MSRP
when MPG_Highway, Sqr_horsepower, Sqr_MGP_highway, and Sqr_MPG_City are in the
model.

Do this for each significant predictor.


Coefficient of Determination R2
The R2 of 73.2% is interpreted as “73.2% of the variation in MSRP is explained by MPG_city,
MPG_Highway, Sqr_horsepower, Sqr_MGP_highway, and Sqr_MPG_City.”
SAS University Edition has limited space in which to surface program options while maintaining ease
of use, but there is a great deal more that can be done with the SAS programming language than what
is surfaced in SAS University Edition tasks.
SAS University Edition automatically produces SAS program code that is submitted to SAS. You can
access this code and modify it to fit your needs.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-40 Chapter 7 Performing Statistical Analyses w ith Tasks

SAS Syntax
SAS programs consist of two types of steps:
 DATA steps create, modify, format, and combine data.
– The DATA step begins with a DATA statement.
 PROC steps perform complex analyses or operations
with a few simple statements.
– A PROC step begins with a PROC statement.
A SAS statement begins with a keyword and ends with
a semicolon.

34

DATA Step

35

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-41

PROC Step
proc glm data=exp;
class group;
model y = a b a*b;
run;

36

Customizing a SAS Program


Use the SAS documentation to locate an option to reduce
the output from all-subsets regression, and add that
option to the program generated by SAS University
Edition in the previous demonstration.

37

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-42 Chapter 7 Performing Statistical Analyses w ith Tasks

Customizing a SAS Program

In this example, use the documentation to find an option to display only the five best models from the all-
subsets regression, and use this option in the SAS program generated by SAS University Edition.

1. Look at the SAS program generated by SAS University Edition by clicking the Code tab. Scroll
down to the section that begins with PROC REG.

 The REG procedure is the primary procedure called by the Linear Regression task.

2. To open the SAS documentation for the REG procedure, Click the ? in the top right corner of SAS
Studio and select SAS Product Documentation.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-43

3. Under B, click Base SAS

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-44 Chapter 7 Performing Statistical Analyses w ith Tasks

4. On the Documentation tab, click HTML next to SAS 9.4 Procedures by Name and Product.

5. In the left navigation pane, expand SAS Procedures and click SAS Procedures by Name.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-45

6. Under R, click REG to access the documentation for the REG procedure (in this case, the SAS/STAT®
User’s Guide).

7. From here you can review the syntax for all of the options in PROC REG. If you want to add code to
the pre-existing code that the Linear Regression task has written, on the CODE tab in the right
navigation pane, click Edit. This opens a new programming tab and copies the same code from the
Linear Regression task. You can now edit the code.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-46 Chapter 7 Performing Statistical Analyses w ith Tasks

Generalized Linear Models Task

39

The Generalized Linear Models task fits models in which some function of the response variable (called
the link function) is modeled as a linear combination of predictors and model parameters. In generalized
linear models, the responses can come from any distribution in the exponential family of distributions.
These distributions can represent continuous or discrete data.
Examples of continuous distributions in the exponential family include normal and gamma. The gamma
distribution is useful for right-skewed data in which the variance increases as a function of the square of
the mean. In the previous example, the variance of MSRP was not constant, so a gamma regression
model might be more appropriate.

 In addition to plots that can provide evidence of nonconstant variance, the Linear Regression task
in SAS University Edition can request a statistical test for heteroscedasticity. In the Linear
Regression task, select Statistics  Heteroscedasticity test.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-47

Gamma Regression in the Generalized Linear Models Task

Fit a Gamma regression model for the SASHELP.CARS data set.


1. Under the left navigation pane, expand Tasks. Under Tasks, expand Statistics and double-click
Generalized Linear Models

2. Under ROLES, select Gamma in the Distribution field.

3. Assign MSRP as the response variable

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-48 Chapter 7 Performing Statistical Analyses w ith Tasks

4. Under ROLES, select Logarithm in the Link function field.

 Although the inverse link is the canonical link for the gamma distribution, the log link is
often used to maintain positive predicted values. That makes sense in this example, because
MSRP should not be negative.

5. Assign Horsepower, MPG_City, MPG_Highway, and Weight as the quantitative, or continuous


variables. Select Model.

 You do not need to add the squared terms that you created, although you can do so if you
prefer. The Generalized Linear Models task can create the polynomial terms for you
automatically.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-49

6. On the MODEL tab, select the four variables in the Variables pane and click Polynomial Order = N.
An N=2 degrees of freedom specification generates all one- and two-way polynomial terms.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-50 Chapter 7 Performing Statistical Analyses w ith Tasks

7. Select Run and view the results.


Partial Output

The quadratic term for MPG_Highway is not statistically significant, and it has the largest p-value,
0.2033. It might be useful to drop this term and refit the model.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7.2 Performing Statistical Analyses in SAS University Edition 7-51

Partial Output

The Diagnostic Statistics panel shows evidence of several outlier or leverage points (or both) in the data
set. These observations should be investigated further.

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
7-52 Chapter 7 Performing Statistical Analyses w ith Tasks

Copyright © 2016, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.

You might also like