Population vs. Sample
[Figure: a population of items labeled a through z, with a sample (b, c, g, i, n, o, r, u, y) drawn from it]
Measures computed from sample data are called statistics
Descriptive statistics: Collecting, summarizing, and presenting data
Inferential statistics: Drawing conclusions about a population based only on sample data
Descriptive Statistics
Collect data
e.g., Survey
Present data
Characterize data
e.g., Sample mean $\bar{X} = \frac{\sum X_i}{n}$
Inferential Statistics
Drawing conclusions about a population based on sample results.
Estimation
e.g., Estimate the population mean weight using the sample mean weight
Hypothesis testing
e.g., Test the claim that the population mean weight is 120 pounds
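The estimation idea can be sketched in a few lines of Python. The population of weights below is randomly generated purely for illustration (it is not data from these slides), and the names `population`, `sample`, and the sample size of 30 are assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: 10,000 weights in pounds (made up, not from the slides)
population = [random.gauss(120, 15) for _ in range(10_000)]

# Draw a simple random sample without replacement
sample = random.sample(population, 30)

# The population mean is a parameter; the sample mean is a statistic that estimates it
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean (usually unknown): {population_mean:.1f}")
print(f"Sample mean (the estimate):        {sample_mean:.1f}")
```

In practice only the sample is available, so the sample mean stands in for the unknown population mean.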
Collecting Data
Primary (Data Collection): Observation, Survey, Experimentation
Secondary (Data Compilation): Print or Electronic
Types of Data
Data: Categorical or Numerical
Numerical: Discrete or Continuous
Ratio Data: Height, Age, Weekly Food Spending
Interval Data: Temperature in Fahrenheit, Standardized exam score
Ordinal Data: Service quality rating, Standard & Poor's bond rating, Student letter grades
Nominal Data, the lowest level (weakest form of measurement): Marital status, Type of car owned
Data in raw form are usually not easy to use for decision making
Organize the data in a table or a graph. Techniques include:
Bar charts and pie charts
Pareto diagram
Ordered array
Stem-and-leaf display
Frequency distributions, histograms and polygons
Cumulative distributions and ogives
Contingency tables
Scatter diagrams
Graphing Data
Pie Charts
Pareto Diagram
Bar charts and pie charts are often used for qualitative data (categories or nominal scale)
Height of bar or size of pie slice shows the frequency or percentage for each category
Investor's Portfolio
[Figure: bar chart (axis: Amount in $1000's) and pie chart (Percentage) of an investor's portfolio; legible pie slices: Savings 15%, Stocks 42%, Bonds 29%]
Pareto Diagram
Used to portray categorical data (nominal scale)
A bar chart, where categories are shown in descending order of frequency
A cumulative polygon is often shown in the same graph
Used to separate the vital few from the trivial many
[Figure: Pareto diagram with percentage axes for the category bars and for the cumulative polygon]
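The ordering and cumulative percentages behind a Pareto diagram can be sketched as follows. The category names and counts are hypothetical, chosen only to illustrate the calculation.

```python
# Hypothetical category counts (made up for illustration, not from the slides)
counts = {"Scratches": 12, "Dents": 30, "Misalignment": 5, "Paint": 3}

total = sum(counts.values())

# Pareto ordering: categories in descending order of frequency
by_frequency = sorted(counts.items(), key=lambda item: item[1], reverse=True)

cumulative = 0.0
pareto_rows = []  # (category, percent, cumulative percent)
for category, count in by_frequency:
    percent = 100 * count / total
    cumulative += percent
    pareto_rows.append((category, percent, cumulative))
    print(f"{category:<14}{percent:6.1f}%{cumulative:8.1f}%")
```

The first few rows of such a table are the "vital few": here the top two categories already account for most of the total.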
Ordered Array
Stem-and-Leaf Display
Data in raw form (as collected): 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Data in ordered array from smallest to largest: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
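As a small sketch, Python's built-in `sorted` turns the raw data into the ordered array; the variable names are illustrative.

```python
# Raw data as collected (from the slide above)
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# Ordered array: the same values ranked from smallest to largest
ordered_array = sorted(raw)
print(ordered_array)  # [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

# With the data ordered, the range can be read off the two ends
data_range = ordered_array[-1] - ordered_array[0]
print(data_range)  # 20
```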
Stem-and-Leaf Diagram
A simple way to see distribution details in a data set
METHOD: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves)
Example
Data in ordered array:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
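Using the tens digit as the stem unit: 21 is shown as 2 | 1, 38 as 3 | 8, and 41 as 4 | 1
Completed stem-and-leaf diagram:
Stem | Leaf
2 | 1 4 4 6 7 7
3 | 0 2 8
4 | 1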
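The method can be sketched in Python for this example. The helper name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative integers with the tens digit as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by tens digit (stem), collecting ones digits (leaves) in order."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 38 -> stem 3, leaf 8
        stems[stem].append(leaf)
    return dict(stems)

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)

for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Each printed row is one stem followed by its leaves, so the row lengths give a quick picture of where the data cluster.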
Stems may have more than one digit:
Stem | Leaf
6 | 1
7 | 8
12 | 2
A frequency distribution is a list or a table containing class groupings (ranges within which the data fall) and the corresponding frequencies with which data fall within each grouping or category
It is a way to summarize numerical data
It condenses the raw data into a more useful form
It allows for a quick visual interpretation of the data
Each class grouping has the same width
Determine the width of each interval by
$$\text{Width of interval} \approx \frac{\text{range}}{\text{number of desired class groupings}}$$
Usually at least 5 but no more than 15 groupings
Class boundaries never overlap
Round up the interval width to get desirable endpoints
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 10 (46/5, then round up)
Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
Compute class midpoints: 15, 25, 35, 45, 55
Count observations and assign to classes
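The arithmetic in these steps can be sketched in Python. Rounding the width up to a multiple of 10, and starting the first class at a multiple of the width, are assumptions that happen to reproduce the endpoints used here; other data may call for a different rounding choice.

```python
import math

smallest, largest = 12, 58   # extremes used in the example above
num_classes = 5

data_range = largest - smallest           # 46
raw_width = data_range / num_classes      # 9.2

# Assumed rounding rule: round the width up to the next multiple of 10
width = math.ceil(raw_width / 10) * 10    # 10

# Assumed starting point: the multiple of `width` at or below the smallest value
start = (smallest // width) * width       # 10

boundaries = [start + i * width for i in range(num_classes + 1)]
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

print(boundaries)  # [10, 20, 30, 40, 50, 60]
print(midpoints)   # [15.0, 25.0, 35.0, 45.0, 55.0]
```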
Class                  Frequency   Percentage
10 but less than 20        3           15
20 but less than 30        6           30
30 but less than 40        5           25
40 but less than 50        4           20
50 but less than 60        2           10
Total                     20          100
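The percentage column follows directly from the frequencies, as this short sketch shows. The cumulative percentages (the values an ogive would plot) are included as an extra, and the variable names are illustrative.

```python
from itertools import accumulate

classes = [
    "10 but less than 20",
    "20 but less than 30",
    "30 but less than 40",
    "40 but less than 50",
    "50 but less than 60",
]
frequencies = [3, 6, 5, 4, 2]

total = sum(frequencies)
# Percentage of all observations that fall in each class
percentages = [100 * f / total for f in frequencies]
# Running total of the percentages
cumulative = list(accumulate(percentages))

for label, f, p, c in zip(classes, frequencies, percentages, cumulative):
    print(f"{label:<22}{f:>4}{p:>8.0f}{c:>8.0f}")
print(f"{'Total':<22}{total:>4}{sum(percentages):>8.0f}")
```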
A graph of the data in a frequency distribution is called a histogram
The class boundaries (or class midpoints) are shown on the horizontal axis; the vertical axis is either frequency, relative frequency, or percentage
Bars of the appropriate heights are used to represent the number of observations within each class
Histogram Example
Class                  Class Midpoint   Frequency
10 but less than 20         15              3
20 but less than 30         25              6
30 but less than 40         35              5
40 but less than 50         45              4
50 but less than 60         55              2
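As a rough sketch, the histogram for this table can be drawn as text, with one row per class and the bar length equal to the frequency. The bars run horizontally here rather than vertically, and the function name `text_histogram` is hypothetical.

```python
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

def text_histogram(midpoints, frequencies, symbol="#"):
    """Return one text line per class: midpoint, a bar of symbols, and the count."""
    lines = []
    for midpoint, frequency in zip(midpoints, frequencies):
        lines.append(f"{midpoint:>3} | {symbol * frequency} ({frequency})")
    return lines

histogram_lines = text_histogram(midpoints, frequencies)
print("\n".join(histogram_lines))
```

A plotting library would normally be used for a presentation-quality chart; the text version only shows how bar length maps to frequency.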
(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)
[Figure: frequency polygon; horizontal axis: Class Midpoints, with class boundaries from 10 to 60]
[Table: contingency table by Investment category; legible rows: 55, 44, 20, 28 (row total 147) and totals 129, 95, 49, 51 (overall total 324)]
(Individual values could also be expressed as percentages of the overall total, percentages of the row totals, or percentages of the column totals)
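The three kinds of percentages can be sketched on a small table. The 2x2 counts and the labels below are hypothetical, made up only to illustrate the calculation.

```python
# Hypothetical 2x2 contingency table of counts (made up for illustration)
table = {
    "Group A": {"Yes": 30, "No": 20},
    "Group B": {"Yes": 10, "No": 40},
}
columns = ["Yes", "No"]

row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(table[row][col] for row in table) for col in columns}
grand_total = sum(row_totals.values())

def percentages(denominator):
    """Express every cell as a percentage of the total chosen by `denominator`."""
    return {
        row: {col: 100 * table[row][col] / denominator(row, col) for col in columns}
        for row in table
    }

pct_of_total = percentages(lambda row, col: grand_total)
pct_of_row = percentages(lambda row, col: row_totals[row])
pct_of_col = percentages(lambda row, col: col_totals[col])

print(pct_of_total["Group A"]["Yes"])  # 30.0
print(pct_of_row["Group A"]["Yes"])    # 60.0
print(pct_of_col["Group A"]["Yes"])    # 75.0
```

The same cell gives three different percentages depending on which total it is compared against, so a table should always state which one is shown.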
Scatter Diagrams
Scatter Diagrams are used to examine possible relationships between two numerical variables
The Scatter Diagram: one variable is measured on the vertical axis and the other variable is measured on the horizontal axis
A Time Series Plot is used to study patterns in the values of a variable over time
The Time Series Plot: one variable is measured on the vertical axis and the time period is measured on the horizontal axis
Year 1996 1997 1998 1999 2000 2001 2002 2003 2004
54 60 73 82 95 107 99 95