Population vs. Sample


Population: a b c d e f g h i j k l m n o p q r s t u v w x y z
Measures used to describe a population are called parameters

Sample: b c g i n o r u y
Measures computed from sample data are called statistics

Two Branches of Statistics

Descriptive statistics: collecting, summarizing, and presenting data

Inferential statistics: drawing conclusions about a population based only on sample data

Descriptive Statistics

Collect data

e.g., Survey

Present data

e.g., Tables and graphs

Characterize data

e.g., Sample mean = $\bar{X} = \frac{\sum X_i}{n}$
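As a minimal sketch of the sample mean formula, the Python snippet below sums the observations and divides by the sample size n. The `weights` values and the `sample_mean` helper are hypothetical and are used only for illustration; they do not come from the slides.

```python
# Hypothetical sample of weights in pounds (illustrative values only).
weights = [118, 122, 125, 115, 120]


def sample_mean(values):
    """Return the sum of the observations divided by the sample size n."""
    n = len(values)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    return sum(values) / n


mean_weight = sample_mean(weights)
print(mean_weight)
```

Because the value is computed from sample data, it is a statistic; the corresponding population mean would be a parameter.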

Inferential Statistics

Estimation

e.g., Estimate the population mean weight using the sample mean weight

Hypothesis testing

e.g., Test the claim that the population mean weight is 120 pounds

Both involve drawing conclusions about a population based on sample results.

Collecting Data
Primary (Data Collection): Observation, Survey, Experimentation

Secondary (Data Compilation): Print or Electronic

Types of Data
Data

Categorical (defined categories)
Examples: Marital Status, Political Party, Eye Color

Numerical

  Discrete (counted items)
  Examples: Number of Children, Defects per hour

  Continuous (measured characteristics)
  Examples: Weight, Voltage

Levels of Measurement and Measurement Scales


Highest Level (strongest forms of measurement)
Ratio Data: differences between measurements, true zero exists
Interval Data: differences between measurements but no true zero

Higher Level
Ordinal Data: ordered categories (rankings, order, or scaling)

Lowest Level (weakest form of measurement)
Nominal Data: categories (no ordering or direction)

Levels of Measurement and Measurement Scales: Examples

Ratio Data: Height, Age, Weekly Food Spending
Interval Data: Temperature in Fahrenheit, Standardized exam score
Ordinal Data: Service quality rating, Standard & Poor's bond rating, Student letter grades
Nominal Data: Marital status, Type of car owned

Presenting Data in Tables and Charts

Organizing and Presenting Data Graphically

Data in raw form are usually not easy to use for decision making

Some type of organization is needed: a table or a graph

Techniques reviewed here:

Bar charts and pie charts
Pareto diagram
Ordered array
Stem-and-leaf display
Frequency distributions, histograms and polygons
Cumulative distributions and ogives
Contingency tables
Scatter diagrams

Tables and Charts for Categorical Data


Categorical Data

Tabulating Data: Summary Table

Graphing Data: Bar Charts, Pie Charts, Pareto Diagram

The Summary Table


Summarize data by category

Example: Current Investment Portfolio

Investment Type    Amount (in thousands $)    Percentage (%)
Stocks                     46.5                   42.27
Bonds                      32.0                   29.09
CD                         15.5                   14.09
Savings                    16.0                   14.55
Total                     110.0                  100.0

(Variables are Categorical)
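The percentage column of the summary table can be reproduced with a short Python sketch. The amounts are the ones listed for the Current Investment Portfolio; the variable names are assumptions chosen for illustration.

```python
# Amounts (in thousands of dollars) from the Current Investment Portfolio table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())

# Each category's share of the total, as a percentage rounded to two decimals.
percentages = {name: round(100 * value / total, 2) for name, value in amounts.items()}

for name, value in amounts.items():
    print(f"{name:<8} {value:>6.1f} {percentages[name]:>7.2f}")
print(f"{'Total':<8} {total:>6.1f} {100.0:>7.2f}")
```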

Bar and Pie Charts

Bar charts and pie charts are often used for qualitative data (categories or nominal scale)
The height of the bar or the size of the pie slice shows the frequency or percentage for each category

Bar Chart Example


Current Investment Portfolio (amounts and percentages as listed in The Summary Table)

[Bar chart: Investor's Portfolio. Horizontal bars for Stocks, Bonds, CD, and Savings; horizontal axis: Amount in $1000's, scaled 0 to 50]

Pie Chart Example


Current Investment Portfolio (amounts and percentages as listed in The Summary Table)

[Pie chart: Stocks 42%, Bonds 29%, Savings 15%, CD 14%]

Percentages are rounded to the nearest percent

Pareto Diagram

Used to portray categorical data (nominal scale)
A bar chart, where categories are shown in descending order of frequency
A cumulative polygon is often shown in the same graph
Used to separate the "vital few" from the "trivial many"

Pareto Diagram Example


[Pareto diagram: Current Investment Portfolio. Bars show the % invested in each category (axis scaled 0% to 45%), ordered Stocks, Bonds, Savings, CD; the line shows the cumulative % invested (axis scaled 0% to 100%)]
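The numbers behind a Pareto diagram can be sketched in Python as follows: sort the categories in descending order, then accumulate the percentages. The amounts are the portfolio values from the summary table; the variable names are illustrative assumptions, and the snippet computes the values only, without drawing the chart.

```python
# Amounts (in thousands of dollars) from the Current Investment Portfolio table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(amounts.values())

# Pareto ordering: categories sorted from largest to smallest amount.
ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)

pareto = []  # (category, percentage, cumulative percentage)
running = 0.0
for category, amount in ordered:
    running += amount
    pareto.append((category, 100 * amount / total, 100 * running / total))

for category, pct, cum_pct in pareto:
    print(f"{category:<8} {pct:6.2f}% {cum_pct:7.2f}%")
```

The bar heights correspond to the second column and the cumulative line to the third.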

Tables and Charts for Numerical Data


Numerical Data

Ordered Array
Stem-and-Leaf Display
Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive

The Ordered Array


A sequence of data in rank order:
Shows range (min to max)
Provides some signals about variability within the range
May help identify outliers (unusual observations)
If the data set is large, the ordered array is less useful


Data in raw form (as collected): 24, 26, 24, 21, 27, 27, 30, 41, 32, 38

Data in ordered array from smallest to largest: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
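A minimal Python sketch of the ordered array, using the raw data listed above; the variable names are illustrative assumptions.

```python
# Raw data as collected (from the slide).
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# Ordered array: the same observations sorted from smallest to largest.
ordered = sorted(raw)

# The range is the distance from the minimum to the maximum.
data_range = ordered[-1] - ordered[0]

print(ordered)
print(data_range)
```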

Stem-and-Leaf Diagram

A simple way to see distribution details in a data set

METHOD: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves)

Example

Data in ordered array:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Here, use the 10s digit for the stem unit:

                   Stem   Leaf
21 is shown as       2      1
38 is shown as       3      8
41 is shown as       4      1

Completed stem-and-leaf diagram:

Stem   Leaves
2      1 4 4 6 7 7
3      0 2 8
4      1
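The stem-and-leaf method can be sketched in Python as below. The `stem_and_leaf` helper and its `stem_unit` parameter are hypothetical names introduced for illustration; the data are the ten values from the example.

```python
from collections import defaultdict


def stem_and_leaf(values, stem_unit=10):
    """Split each value into a stem (leading digits) and a leaf (trailing digit)."""
    display = defaultdict(list)
    for value in sorted(values):
        display[value // stem_unit].append(value % stem_unit)
    return dict(display)


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)

for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```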

Using other stem units

Using the 100s digit as the stem:

Round off the 10s digit to form the leaves

                      Stem   Leaf
613 would become        6      1
776 would become        7      8
...
1224 becomes           12      2

The completed stem-and-leaf display:

Data:
613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891, 894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224

Stem   Leaves
6      1 3 6
7      2 2 5 8
8      3 4 6 6 9 9
9      1 3 3 6 8
10     3 5 6
11     4 7
12     2
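A possible Python sketch for the 100s-digit stem follows. It assumes that "round off" means rounding each value to the nearest ten with halves rounded up, which the slides do not state explicitly; the function name is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf_hundreds(values):
    """Use the 100s digit(s) as the stem and the rounded 10s digit as the leaf."""
    display = defaultdict(list)
    for value in sorted(values):
        # Round to the nearest ten (assumption: halves round up), counted in tens.
        tens = (value + 5) // 10
        display[tens // 10].append(tens % 10)
    return dict(display)


data = [613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891,
        894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224]
display = stem_and_leaf_hundreds(data)

for stem in sorted(display):
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in display[stem])}")
```

Rounding the whole value before splitting it means a value such as 896 would carry into the next stem instead of producing a two-digit leaf.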

Tabulating Numerical Data: Frequency Distributions


What is a Frequency Distribution?

A frequency distribution is a list or a table containing class groupings (ranges within which the data fall) and the corresponding frequencies with which data fall within each grouping or category

Why Use a Frequency Distribution?

It is a way to summarize numerical data
It condenses the raw data into a more useful form
It allows for a quick visual interpretation of the data

Class Intervals and Class Boundaries


Each class grouping has the same width
Determine the width of each interval by

$$\text{Width of interval} \approx \frac{\text{range}}{\text{number of desired class groupings}}$$

Usually at least 5 but no more than 15 groupings
Class boundaries never overlap
Round up the interval width to get desirable endpoints

Frequency Distribution Example


Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 10 (46/5 = 9.2, then round up)
Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
Compute class midpoints: 15, 25, 35, 45, 55
Count observations and assign to classes
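The steps above can be sketched in Python. Two choices here are assumptions that are not stated in the slides: the width is rounded up to the next whole number (which gives 10 for this data), and the first boundary is the largest multiple of the width that does not exceed the minimum.

```python
import math

# Daily high temperatures for the 20 winter days (from the example).
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5
data_range = max(temps) - min(temps)

# Width of interval: range / number of classes, rounded up (assumed rule).
width = math.ceil(data_range / num_classes)

# Assumed starting point: largest multiple of the width at or below the minimum.
start = (min(temps) // width) * width

boundaries = [start + i * width for i in range(num_classes + 1)]
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

print(data_range, width)
print(boundaries)
print(midpoints)
```

In practice the choice of "desirable endpoints" is a judgment call, so the rounding rule may need adjusting for other data sets.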

Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Frequency   Relative Frequency   Percentage
10 but less than 20        3              .15               15
20 but less than 30        6              .30               30
30 but less than 40        5              .25               25
40 but less than 50        4              .20               20
50 but less than 60        2              .10               10
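A short Python sketch of the counting step, using the class boundaries 10 through 60 from the example; each class includes its lower boundary and excludes its upper boundary ("10 but less than 20"). Variable names are illustrative assumptions.

```python
temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
boundaries = [10, 20, 30, 40, 50, 60]

rows = []  # (lower, upper, frequency, relative frequency, percentage)
for lo, hi in zip(boundaries, boundaries[1:]):
    # Count observations with lo <= value < hi.
    frequency = sum(1 for t in temps if lo <= t < hi)
    relative = frequency / len(temps)
    rows.append((lo, hi, frequency, relative, 100 * relative))

for lo, hi, frequency, relative, percentage in rows:
    print(f"{lo} but less than {hi}: {frequency:>2} {relative:.2f} {percentage:.0f}")
```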

Tabulating Numerical Data: Cumulative Frequency


Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20        3           15                3                      15
20 but less than 30        6           30                9                      45
30 but less than 40        5           25               14                      70
40 but less than 50        4           20               18                      90
50 but less than 60        2           10               20                     100
Total                     20          100
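The cumulative columns are running totals of the class frequencies. A minimal Python sketch, with illustrative variable names:

```python
from itertools import accumulate

# Class frequencies for the five temperature classes.
frequencies = [3, 6, 5, 4, 2]

# Running total of the frequencies.
cumulative_frequency = list(accumulate(frequencies))

total = cumulative_frequency[-1]

# Cumulative frequency expressed as a percentage of all observations.
cumulative_percentage = [100 * c / total for c in cumulative_frequency]

print(cumulative_frequency)
print(cumulative_percentage)
```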

Graphing Numerical Data: The Histogram

A graph of the data in a frequency distribution is called a histogram
The class boundaries (or class midpoints) are shown on the horizontal axis
The vertical axis is either frequency, relative frequency, or percentage
Bars of the appropriate heights are used to represent the number of observations within each class

Histogram Example
Class                  Class Midpoint   Frequency
10 but less than 20         15              3
20 but less than 30         25              6
30 but less than 40         35              5
40 but less than 50         45              4
50 but less than 60         55              2

[Histogram: Daily High Temperature. Horizontal axis: Class Midpoints (5 to 65); vertical axis: Frequency (0 to 7)]

(No gaps between bars)

Graphing Numerical Data: The Frequency Polygon


Class                  Class Midpoint   Frequency
10 but less than 20         15              3
20 but less than 30         25              6
30 but less than 40         35              5
40 but less than 50         45              4
50 but less than 60         55              2

[Frequency Polygon: Daily High Temperature. Horizontal axis: Class Midpoints (5 to 65); vertical axis: Frequency (0 to 7)]

(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)

Graphing Cumulative Frequencies: The Ogive (Cumulative % Polygon)

Class                  Lower class boundary   Cumulative Percentage
Less than 10                    0                      0
10 but less than 20            10                     15
20 but less than 30            20                     45
30 but less than 40            30                     70
40 but less than 50            40                     90
50 but less than 60            50                    100

[Ogive: Daily High Temperature. Horizontal axis: Class Boundaries (Not Midpoints), 10 to 60; vertical axis: Cumulative Percentage (0 to 100)]

Tabulating and Graphing Multivariate Categorical Data

Contingency Table for Investment Choices ($1000s)

Investment   Investor A   Investor B   Investor C   Total
Stocks          46.5          55          27.5       129
Bonds           32.0          44          19.0        95
CD              15.5          20          13.5        49
Savings         16.0          28           7.0        51
Total          110.0         147          67.0       324

(Individual values could also be expressed as percentages of the overall total, percentages of the row totals, or percentages of the column totals)
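The totals in the contingency table, and one of the percentage views mentioned in the note, can be sketched in Python. The dictionary layout and variable names are assumptions chosen for illustration; the values are those in the table.

```python
investors = ["Investor A", "Investor B", "Investor C"]

# Rows of the contingency table (amounts in $1000s), one list entry per investor.
table = {
    "Stocks": [46.5, 55.0, 27.5],
    "Bonds": [32.0, 44.0, 19.0],
    "CD": [15.5, 20.0, 13.5],
    "Savings": [16.0, 28.0, 7.0],
}

row_totals = {name: sum(values) for name, values in table.items()}
column_totals = [sum(column) for column in zip(*table.values())]
grand_total = sum(row_totals.values())

# Each cell as a percentage of the overall total, rounded to two decimals.
pct_of_total = {
    name: [round(100 * value / grand_total, 2) for value in values]
    for name, values in table.items()
}

print(row_totals)
print(dict(zip(investors, column_totals)))
print(grand_total)
```

Dividing by a row total or a column total instead of `grand_total` gives the other two percentage views.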

Side-by-side bar charts

[Side-by-side bar chart: Comparing Investors. Horizontal bars for Savings, CD, Bonds, and Stocks, with one bar each for Investor A, Investor B, and Investor C; horizontal axis scaled 0 to 60]

Side-by-Side Chart Example

Sales by quarter for three sales territories:

         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East       20.4      27.4      59        20.4
West       30.6      38.6      34.6      31.6
North      45.9      46.9      45        43.9

[Side-by-side bar chart: quarters (1st Qtr to 4th Qtr) on the horizontal axis, with one bar each for East, West, and North; vertical axis scaled 0 to 60]

Scatter Diagrams

Scatter Diagrams are used to examine possible relationships between two numerical variables
The Scatter Diagram: one variable is measured on the vertical axis and the other variable is measured on the horizontal axis

Scatter Diagram Example


Volume per day   Cost per day
23               131
24               120
26               140
29               151
33               160
38               167
41               185
42               170
50               188
55               195
60               200

[Scatter diagram: Cost per Day vs. Production Volume. Horizontal axis: Volume per Day (0 to 70); vertical axis: Cost per Day (0 to 250)]

Time Series Plot

A Time Series Plot is used to study patterns in the values of a variable over time
The Time Series Plot: one variable is measured on the vertical axis and the time period is measured on the horizontal axis

Time Series Plot Example

Year   Number of Franchises
1996   43
1997   54
1998   60
1999   73
2000   82
2001   95
2002   107
2003   99
2004   95

[Time series plot: Number of Franchises, 1996-2004. Horizontal axis: Year (1994 to 2006); vertical axis: Number of Franchises (0 to 120)]

Misusing Graphs and Ethical Issues


Guidelines for good graphs:
Do not distort the data
Avoid unnecessary adornments (no "chart junk")
Use a scale for each axis on a two-dimensional graph
The vertical axis scale should begin at zero
Properly label all axes
The graph should contain a title
Use the simplest graph for a given set of data
