SBE12 CH 01



John Loucks
St. Edward's University
© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 1
Data and Statistics

Applications in Business and Economics
Data Sources
Computers and Statistical Analysis
Data Mining
Ethical Guidelines for Statistical Practice


The term statistics can refer to numerical facts such as averages, medians, percents, and index numbers that help us understand a variety of business and economic situations.

Statistics can also refer to the art and science of collecting, analyzing, presenting, and interpreting data.

Applications in Business and Economics

Accounting: Public accounting firms use statistical sampling procedures when conducting audits for their clients.
Economics: Economists use statistical information in making forecasts about the future of the economy or some aspect of it.
Finance: Financial advisors use price-earnings ratios and dividend yields to guide their investment advice.
Marketing: Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.
Production: A variety of statistical quality control charts are used to monitor the output of a production process.
Information Systems: A variety of statistical information helps administrators assess the performance of computer networks.

Data and Data Sets

Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.

All the data collected in a particular study are referred to as the data set for the study.

Elements, Variables, and Observations

Elements are the entities on which data are collected.

A variable is a characteristic of interest for the elements.
The set of measurements obtained for a particular
element is called an observation.
A data set with n elements contains n observations.
The total number of data values in a complete data
set is the number of elements multiplied by the
number of variables.

Data, Data Sets, Elements, Variables, and Observations

Company        Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram        NQ                73.10              0.86
EnergySouth    N                 74.00              1.67
Keystone       N                365.70              0.86
LandCare       NQ               111.40              0.33
Psychemedics   N                 17.60              0.13

The Company column gives the element names. Stock Exchange, Annual Sales, and Earn/Share are the variables. Each row is an observation, and the full table is the data set.
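The following is a minimal Python sketch of the data set above. It assumes a plain list of dictionaries, one per element. The names `data_set` and `variables` are illustrative and do not come from the slides. The sketch checks the counting rule from the previous slide: 5 elements times 3 variables gives 15 data values.

```python
# Illustrative representation of the slide's data set (assumption: a list of
# dicts, one per element; the slides do not prescribe a structure).
data_set = [
    {"Company": "Dataram", "Stock Exchange": "NQ", "Annual Sales ($M)": 73.10, "Earn/Share ($)": 0.86},
    {"Company": "EnergySouth", "Stock Exchange": "N", "Annual Sales ($M)": 74.00, "Earn/Share ($)": 1.67},
    {"Company": "Keystone", "Stock Exchange": "N", "Annual Sales ($M)": 365.70, "Earn/Share ($)": 0.86},
    {"Company": "LandCare", "Stock Exchange": "NQ", "Annual Sales ($M)": 111.40, "Earn/Share ($)": 0.33},
    {"Company": "Psychemedics", "Stock Exchange": "N", "Annual Sales ($M)": 17.60, "Earn/Share ($)": 0.13},
]

# The company name identifies each element; the remaining columns are variables.
variables = ["Stock Exchange", "Annual Sales ($M)", "Earn/Share ($)"]

n_elements = len(data_set)        # a data set with n elements has n observations
n_variables = len(variables)
total_values = n_elements * n_variables

print(f"{n_elements} elements x {n_variables} variables = {total_values} data values")

# The observation for one element is its set of measurements on the variables.
keystone = next(row for row in data_set if row["Company"] == "Keystone")
observation = [keystone[v] for v in variables]
print(observation)
```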
Scales of Measurement

Scales of measurement include nominal, ordinal, interval, and ratio.

The scale determines the amount of information contained in the data.

The scale indicates the data summarization and statistical analyses that are most appropriate.

Scales of Measurement

Nominal

Data are labels or names used to identify an attribute of the element.

A nonnumeric label or numeric code may be used.

Scales of Measurement

Nominal example: Students of a university are classified by the school in which they are enrolled, using a nonnumeric label such as Business, Humanities, Education, and so on.

Alternatively, a numeric code could be used for the school variable (e.g., 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).
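This short sketch shows why numeric codes for a nominal variable are still only labels. It uses the slide's codes 1 = Business, 2 = Humanities, and 3 = Education. The list of student responses is hypothetical and is included only for illustration. Counting categories is meaningful for these data, but averaging the codes is not.

```python
from collections import Counter

# Codes from the slide's example.
SCHOOL_CODES = {1: "Business", 2: "Humanities", 3: "Education"}

# Hypothetical coded responses for six students (illustration only).
school = [1, 3, 1, 2, 1, 3]

# Counting how often each label occurs is meaningful for nominal data.
counts = Counter(SCHOOL_CODES[code] for code in school)
print(counts.most_common())

# This line runs, but the result has no meaning: the codes are labels,
# so an "average school" of about 1.83 does not exist.
meaningless_average = sum(school) / len(school)
```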

Scales of Measurement

Ordinal

The data have the properties of nominal data, and the order or rank of the data is meaningful.

A nonnumeric label or numeric code may be used.

Scales of Measurement

Ordinal example: Students of a university are classified by their class standing, using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior.

Alternatively, a numeric code could be used for the class standing variable (e.g., 1 denotes Freshman, 2 denotes Sophomore, and so on).
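This sketch shows that order is meaningful for ordinal codes. Sorting the codes and taking a median both make sense. The slide says "and so on" after Sophomore, so Junior = 3 and Senior = 4 are assumptions here. The student data are hypothetical.

```python
import statistics

# Slide codes 1 and 2; Junior = 3 and Senior = 4 are assumed continuations.
STANDING = {1: "Freshman", 2: "Sophomore", 3: "Junior", 4: "Senior"}

# Hypothetical coded class standings for seven students.
standing = [4, 1, 2, 2, 3, 1, 4]

# Rank order is meaningful, so sorting and the median are appropriate.
ordered = [STANDING[code] for code in sorted(standing)]
median_code = statistics.median_low(standing)  # median_low returns an actual code
print(ordered[0], ordered[-1], STANDING[median_code])

# Differences between codes are not meaningful: there is no fixed unit saying
# Senior - Junior equals Sophomore - Freshman.
```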

Scales of Measurement

Interval

The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.

Interval data are always numeric.

Scales of Measurement

Interval example: Melissa has an SAT score of 1985, while Kevin has an SAT score of 1880. Melissa scored 105 points more than Kevin.
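This is the arithmetic behind the interval example. The difference counts fixed one-point units, so it is meaningful. The slides classify SAT scores as interval rather than ratio data, so the sketch deliberately does not interpret a ratio of the two scores.

```python
# SAT scores from the slide's interval-scale example.
melissa_sat = 1985
kevin_sat = 1880

# On an interval scale the difference is meaningful: it counts fixed
# one-point units between the two observations.
difference = melissa_sat - kevin_sat
print(f"Melissa scored {difference} points more than Kevin.")

# Python could compute melissa_sat / kevin_sat, but because the slides treat
# SAT scores as interval (not ratio) data, that ratio is not interpreted.
```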

Scales of Measurement

Ratio

The data have all the properties of interval data, and the ratio of two values is meaningful.

Variables such as distance, height, weight, and time use the ratio scale.

This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.

Scales of Measurement

Ratio example: Melissa's college record shows 36 credit hours earned, while Kevin's record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa.
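This sketch applies the same arithmetic to the ratio example. Zero credit hours means that no hours have been earned, so dividing one value by the other gives a meaningful answer. Differences also remain valid because ratio data keep every interval-scale property.

```python
# Credit hours from the slide's ratio-scale example.
melissa_hours = 36
kevin_hours = 72

# Zero credit hours means none earned (a true zero), so the ratio is meaningful.
ratio = kevin_hours / melissa_hours
print(f"Kevin has {ratio:g} times as many credit hours as Melissa.")

# Ratio data keep every interval-scale property, so differences work too.
difference = kevin_hours - melissa_hours
print(f"Difference: {difference} credit hours")
```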

Categorical and Quantitative Data

Data can be further classified as being categorical or quantitative.

The statistical analysis that is appropriate depends on whether the data for the variable are categorical or quantitative.

In general, there are more alternatives for statistical analysis when the data are quantitative.

Categorical Data

Labels or names used to identify an attribute of each element
Often referred to as qualitative data
Use either the nominal or ordinal scale of measurement
Can be either numeric or nonnumeric
Appropriate statistical analyses are rather limited

Quantitative Data

Quantitative data indicate how many or how much:
discrete, if measuring how many
continuous, if measuring how much

Quantitative data are always numeric.

Ordinary arithmetic operations are meaningful for quantitative data.

Scales of Measurement

Categorical data can be nonnumeric or numeric and use the nominal or ordinal scale.
Quantitative data are numeric and use the interval or ratio scale.

Cross-Sectional Data

Cross-sectional data are collected at the same or approximately the same point in time.

Example: data detailing the number of building permits issued in November 2012 in each of the counties of Ohio

Time Series Data

Time series data are collected over several time periods.

Example: data detailing the number of building permits issued in Lucas County, Ohio, in each of the last 36 months

Graphs of time series help analysts understand what happened in the past, identify any trends over time, and project future levels for the time series.

Time Series Data

[Figure: Graph of time series data, U.S. average price per gallon for conventional regular gasoline. Source: Energy Information Administration, U.S. Department of Energy, May 2009.]

Data Sources

Existing Sources

Internal company records: almost any department
Business database services: Dow Jones & Co.
Government agencies: U.S. Department of Labor
Industry associations: Travel Industry Association of America
Special-interest organizations: Graduate Management Admission Council
Internet: more and more firms

Data Sources

Data Available From Internal Company Records

Record: Some of the Data Available
Employee records: name, address, social security number
Production records: part number, quantity produced, direct labor cost, material cost
Inventory records: part number, quantity in stock, reorder level, economic order quantity
Sales records: product number, sales volume, sales volume by region
Credit records: customer name, credit limit, accounts receivable balance
Customer profile: age, gender, income, household size
Data Sources

Data Available From Selected Government Agencies

Government Agency: Some of the Data Available
Census Bureau: population data, number of households, household income
Federal Reserve Board (www.federalreserve.gov): data on money supply, exchange rates, discount rates
Office of Mgmt. & Budget: data on revenue, expenditures, debt of federal government
Department of Commerce: data on business activity, value of shipments, profit by industry
Bureau of Labor Statistics: customer spending, unemployment rate, hourly earnings, safety record

Data Sources

Statistical Studies - Experimental

In experimental studies the variable of interest is first identified. Then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest.

The largest experimental study ever conducted is believed to be the 1954 Public Health Service experiment for the Salk polio vaccine. Nearly two million U.S. children (grades 1-3) were selected.

Data Sources

Statistical Studies - Observational

In observational (nonexperimental) studies no attempt is made to control or influence the variables of interest. A survey is a good example.

Studies of smokers and nonsmokers are observational studies because researchers do not determine or control who will smoke and who will not smoke.

Data Acquisition Considerations

Time Requirement
Searching for information can be time consuming.
Information may no longer be useful by the time it
is available.
Cost of Acquisition
Organizations often charge for information even
when it is not their primary business activity.
Data Errors
Using any data that happen to be available or were acquired with little care can lead to misleading information.
Descriptive Statistics

Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand. Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.

Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in engine tune-ups performed in her shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed below.

Example: Hudson Auto Repair

Sample of Parts Cost ($) for 50 Tune-ups

91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73

Tabular Summary: Frequency and Percent Frequency
Example: Hudson Auto

Parts Cost ($)   Frequency   Percent Frequency
50-59                2              4
60-69               13             26
70-79               16             32
80-89                7             14
90-99                7             14
100-109              5             10
Total               50            100

Percent frequency = (frequency/50)100; for example, (2/50)100 = 4 for the 50-59 class.
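The frequency table can be reproduced from the 50 parts costs with a short Python sketch. The data values and class limits are taken from the slides. `frequency_table` is a hypothetical helper written for this illustration. It counts the values that fall in each class and converts each count to a percent with (frequency/n)100.

```python
# Parts costs ($) for the 50 tune-ups, copied from the Hudson Auto slide.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Class limits used in the frequency table (inclusive on both ends).
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]


def frequency_table(values, classes):
    """Return (class label, frequency, percent frequency) for each class."""
    n = len(values)
    table = []
    for low, high in classes:
        freq = sum(low <= v <= high for v in values)
        table.append((f"{low}-{high}", freq, 100 * freq / n))
    return table


table = frequency_table(costs, classes)
for label, freq, pct in table:
    print(f"{label:>8} {freq:>4} {pct:>6.0f}")
print(f"{'Total':>8} {sum(f for _, f, _ in table):>4}")
```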

Graphical Summary: Histogram
Example: Hudson Auto

[Figure: Histogram of tune-up parts cost. Horizontal axis: Parts Cost ($), classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109. Vertical axis: Frequency.]
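A rough text rendering of the histogram can be built from the class frequencies in the Hudson Auto frequency table. This is only an approximation of the slide's chart. The bars run horizontally and are drawn with asterisks, with one asterisk per tune-up. `text_histogram` is a hypothetical helper, not a library function.

```python
# Class frequencies from the Hudson Auto frequency table.
frequencies = {
    "50-59": 2, "60-69": 13, "70-79": 16,
    "80-89": 7, "90-99": 7, "100-109": 5,
}


def text_histogram(freqs, symbol="*"):
    """Return one line per class, with a bar whose length equals the frequency."""
    width = max(len(label) for label in freqs)
    return [f"{label:>{width}} | {symbol * count}" for label, count in freqs.items()]


lines = text_histogram(frequencies)
for line in lines:
    print(line)
```

The tallest bar belongs to the 70-79 class, which matches the table's largest frequency of 16.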
Numerical Descriptive Statistics

The most common numerical descriptive statistic is the average (or mean).

The average provides a measure of the central tendency, or central location, of the data for a variable.

Hudson's average cost of parts, based on the 50 tune-ups studied, is $79 (found by summing the 50 cost values and then dividing by 50).
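The slide's computation (sum the 50 values, then divide by 50) is shown below in Python. The exact average is $78.98, which rounds to the $79 reported on the slide.

```python
# Parts costs ($) for the 50 tune-ups, copied from the Hudson Auto slide.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

total = sum(costs)            # sum of the 50 cost values
mean = total / len(costs)     # divide by the number of values (50)

print(f"Sum: ${total}, average: ${mean:.2f}, rounded: ${round(mean)}")
```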

Statistical Inference

Population: the set of all elements of interest in a particular study
Sample: a subset of the population
Statistical inference: the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
Census: collecting data for the entire population
Sample survey: collecting data for a sample

Process of Statistical Inference

1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average parts cost of $79 per tune-up.
4. The sample average is used to estimate the population average.
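The four steps of the inference process can be simulated in Python. The true population of Hudson tune-ups is unknown, so this sketch makes up a synthetic population of 10,000 costs. They are drawn from a normal distribution with mean 80 and standard deviation 12, values invented only for illustration. These are not Hudson Auto data. The sketch then draws a simple random sample of 50 with `random.sample` and uses the sample average as the estimate.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Step 1 (simulated): a hypothetical population of 10,000 parts costs.
# In practice the population average is unknown; here we can compute it
# only because the population is synthetic.
population = [random.gauss(80, 12) for _ in range(10_000)]

# Step 2: examine a simple random sample of 50 tune-ups.
sample = random.sample(population, 50)

# Step 3: the sample data provide a sample average.
sample_mean = statistics.mean(sample)

# Step 4: use the sample average to estimate the population average.
population_mean = statistics.mean(population)
print(f"Estimate from sample: {sample_mean:.2f}")
print(f"Simulated population average: {population_mean:.2f}")
```

The two numbers will usually differ a little. That gap is sampling error, the reason inference works with estimates rather than exact values.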

Computers and Statistical Analysis

Statisticians often use computer software to perform the statistical computations required with large amounts of data.

Many of the data sets in this book are available on the website that accompanies the book.

The data sets can be downloaded in either Minitab or Excel format.

Also, the Excel add-in StatTools can be downloaded from the website.

Data Warehousing

Organizations obtain large amounts of data on a daily basis by means of magnetic card readers, bar code scanners, point-of-sale terminals, and touch screen monitors.
Wal-Mart captures data on 20-30 million transactions
per day.
Visa processes 6,800 payment transactions per second.
Capturing, storing, and maintaining the data, referred
to as data warehousing, is a significant undertaking.

Data Mining

Analysis of the data in the warehouse might aid in decisions that will lead to new strategies and higher profits for the organization.

Using a combination of procedures from statistics, mathematics, and computer science, analysts mine the data to convert them into useful information.

The most effective data mining systems use automated procedures to discover relationships in the data and predict future outcomes, prompted by only general, even vague, queries by the user.

Data Mining Applications

The major applications of data mining have been

made by companies with a strong consumer focus
such as retail, financial, and communication firms.
Data mining is used to identify related products that
customers who have already purchased a specific
product are also likely to purchase (and then pop-ups
are used to draw attention to those related products).
As another example, data mining is used to identify
customers who should receive special discount offers
based on their past purchasing volumes.
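The "related products" idea can be illustrated with a simple co-occurrence count. The baskets and product names below are invented for illustration. `also_bought` is a hypothetical helper. For a given product, it reports the share of baskets containing that product that also contain each other product. This is a much simplified stand-in for the association analysis that production data mining systems perform at scale.

```python
from collections import Counter

# Hypothetical purchase baskets (invented data, not from the slides).
baskets = [
    {"printer", "ink", "paper"},
    {"printer", "ink"},
    {"laptop", "mouse"},
    {"printer", "paper"},
    {"laptop", "mouse", "bag"},
    {"ink", "paper"},
]


def also_bought(baskets, product):
    """Share of baskets containing `product` that also contain each other item."""
    with_product = [b for b in baskets if product in b]
    counts = Counter()
    for basket in with_product:
        counts.update(basket - {product})
    return {other: c / len(with_product) for other, c in counts.items()}


suggestions = also_bought(baskets, "printer")
print(sorted(suggestions.items(), key=lambda kv: -kv[1]))
```

Items with a high share, such as ink and paper for printer buyers, are candidates for the kind of related-product prompts the slide describes.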

Data Mining Requirements

Statistical methods such as multiple regression, logistic regression, and correlation are heavily used.
Also needed are computer science technologies
involving artificial intelligence and machine learning.
A significant investment in time and money is
required as well.

Data Mining Model Reliability

Finding a statistical model that works well for a particular sample of data does not necessarily mean that it can be reliably applied to other data.

With the enormous amount of data available, the data set can be partitioned into a training set (for model development) and a test set (for validating the model).

There is, however, a danger of overfitting the model to the point that misleading associations and conclusions appear to exist.

Careful interpretation of results and extensive testing are important.
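The training/test partition can be sketched in a few lines of Python. `partition` is a hypothetical helper written for this example. It is not scikit-learn's `train_test_split` or any other library function. It shuffles a copy of the records with a seeded random generator and holds out a fraction (assumed here to be 30%) as the test set. The model would be developed on the training set and validated only on the test set.

```python
import random


def partition(records, test_fraction=0.3, seed=42):
    """Randomly split records into (training set, test set).

    test_fraction and seed are illustrative defaults, not recommendations.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(1000))  # stand-in for 1,000 hypothetical records
train, test = partition(records)
print(len(train), len(test))
```

Keeping the test set untouched during model development is what allows it to reveal overfitting. A model that fits the training set closely but performs poorly on the test set has likely picked up misleading associations.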

Ethical Guidelines for Statistical Practice

In a statistical study, unethical behavior can take a variety of forms, including:
Improper sampling
Inappropriate analysis of the data
Development of misleading graphs
Use of inappropriate summary statistics
Biased interpretation of the statistical results
You should strive to be fair, thorough, objective, and
neutral as you collect, analyze, and present data.
As a consumer of statistics, you should also be aware
of the possibility of unethical behavior by others.

Ethical Guidelines for Statistical Practice

The American Statistical Association developed the report "Ethical Guidelines for Statistical Practice."

The report contains 67 guidelines organized into eight topic areas:
Professionalism
Responsibilities to Funders, Clients, Employers
Responsibilities in Publications and Testimony
Responsibilities to Research Subjects
Responsibilities to Research Team Colleagues
Responsibilities to Other Statisticians/Practitioners
Responsibilities Regarding Allegations of Misconduct
Responsibilities of Employers Including Organizations, Individuals, Attorneys, or Other Clients
End of Chapter 1
