Chapter 1
Introduction to Statistics and Data Analysis
Introduction
Statistics is a science that helps us make decisions and
draw conclusions in the presence of variability.
Sources of Variability
What is Statistics?
Statistics is the estimation of parameters (e.g.,
mean, median) and selection of distribution type
needed to quantify uncertainty.
Statistical Path
Data Collection
The data collected could be Qualitative (shape, color,
etc.) or Quantitative (length, weight, etc.)
Population: Collection of all individuals or individual items of a
particular type.
There are three basic methods of collecting data:
Data Presentation
Data presentation is defined as the process of
using various graphical formats to visually
represent the relationship between two or more
data sets.
Data Analysis
Descriptive statistics are brief descriptive coefficients that
summarize a given dataset (sample).
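As a minimal sketch of such descriptive coefficients, the snippet below summarizes a small sample using Python's standard `statistics` module. The sample values are made up for illustration and are not taken from the slides.

```python
import statistics

# Hypothetical sample of five measurements (not from the slides).
sample = [12, 15, 11, 14, 18]

mean = statistics.mean(sample)          # arithmetic average
median = statistics.median(sample)      # middle value of the sorted sample
variance = statistics.variance(sample)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(sample)      # sample standard deviation
data_range = max(sample) - min(sample)  # largest minus smallest value

print(f"mean={mean}, median={median}, variance={variance}, "
      f"std_dev={std_dev:.3f}, range={data_range}")
```

These numbers describe the sample only; saying anything about the population requires the probability concepts covered under decision making.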
Decision Making
Descriptive Statistics → information about the sample only
Descriptive Statistics + employment of probability concepts → conclusions about the population (Inferential Statistics)
Example 1
Textbook Exercises
Data Presentation
The dot diagram is a useful data display for small samples up to about
20 observations. However, when the number of observations is
moderately large, other graphical displays may be more useful. In this
chapter we will discuss three different types of graphs, which are:
➢ Stem-and-Leaf Diagram
➢ Frequency Distribution and Histogram
➢ Box Plot
Stem-and-Leaf Diagram
➢ Divide each number (x_i) into two parts: a stem, consisting of the
leading digits, and a leaf, consisting of the remaining digit. Choose
relatively few stems in comparison with the number of observations
(between 5 and 20 stems).
➢ List the stem values in a vertical column.
➢ Record the leaf for each observation beside its stem.
➢ Write the units for the stems and leaves on the display.
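The construction steps above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes non-negative integer data with the tens (and higher) digits as the stem and the ones digit as the leaf, and both the `stem_and_leaf` helper and the data values are hypothetical.

```python
def stem_and_leaf(values):
    """Group non-negative integers into {stem: sorted leaves}.

    Assumption: stem = all leading digits, leaf = the last (ones) digit.
    """
    stems = [v // 10 for v in values]
    # List every stem from the smallest to the largest, even if it has no leaves.
    display = {s: [] for s in range(min(stems), max(stems) + 1)}
    for v in values:
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return {s: sorted(leaves) for s, leaves in display.items()}

# Hypothetical data (not from the slides).
data = [76, 81, 83, 85, 88, 90, 92, 97, 101, 105]
plot = stem_and_leaf(data)

for stem, leaves in plot.items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
print("Stem: tens digits, Leaf: ones digit")
```

Sorting the leaves within each stem gives an ordered display, which makes the median and quartiles easy to read off.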
The number of bins depends on the number of observations and the amount of scatter or
dispersion in the data. A frequency distribution that uses either too few or too many bins
will not be informative. We usually find that between 5 and 20 bins is satisfactory in most
cases.
Choose the number of bins approximately equal to the square root of the number of
observations:
Number of bins ≅ √n
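A small sketch of the square-root rule and the resulting frequency distribution is given below. It assumes equal-width bins spanning the minimum to the maximum observation, with the maximum counted in the last bin. The helper names and the data are hypothetical.

```python
import math

def number_of_bins(n):
    """Square-root rule: number of bins is approximately sqrt(n)."""
    return max(1, round(math.sqrt(n)))

def frequency_distribution(values, bins):
    """Count observations in equal-width bins (assumes max(values) > min(values))."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The largest observation would land in bin index `bins`; clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return lo, width, counts

# Hypothetical data with n = 16 observations (not from the slides).
data = [2, 3, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 20, 22]
k = number_of_bins(len(data))
lo, width, counts = frequency_distribution(data, k)
relative = [c / len(data) for c in counts]  # relative frequencies sum to 1

for i, (c, r) in enumerate(zip(counts, relative)):
    print(f"[{lo + i * width:5.1f}, {lo + (i + 1) * width:5.1f})  count={c}  rel={r:.3f}")
```

The rule is only a starting point; the 5 to 20 range mentioned above still applies when n is very small or very large.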
mode < median < mean if the distribution is skewed to the right
mode > median > mean if the distribution is skewed to the left
mode = median = mean if the distribution is symmetric
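The ordering of mode, median, and mean can be checked numerically. The sketch below uses a made-up right-skewed sample and its mirror image; it illustrates the typical pattern for unimodal data and is not a proof.

```python
import statistics

def summarize(values):
    """Return (mode, median, mean) for a sample."""
    return (statistics.mode(values),
            statistics.median(values),
            statistics.mean(values))

# Hypothetical sample with a long right tail (not from the slides).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror the sample around 8 to obtain a long left tail.
left_skewed = [16 - x for x in right_skewed]

r_mode, r_median, r_mean = summarize(right_skewed)
l_mode, l_median, l_mean = summarize(left_skewed)

print("right-skewed:", r_mode, "<", r_median, "<", r_mean)
print("left-skewed: ", l_mode, ">", l_median, ">", l_mean)
```

The mean is pulled toward the long tail, which is why it ends up farthest from the mode in both cases.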
Relative Frequency Distribution & Histogram
(a) is negative or left skewed (based on the location of the long tail), in
which the median is greater than the mean (x̃ > x̄).
(c) is positive or right skewed (based on the location of the long tail), in
which the median is smaller than the mean (x̃ < x̄).
Box-and-Whisker Plot or Box Plot
The box plot (vertical or horizontal) is a graphical display that simultaneously
describes several important features of a data set, such as center, spread,
departure from symmetry, and identification of unusual observations or outliers.
Median (Q2)=?
Q1=? Q3=? IQR=?
Minimum=? Maximum=?
Range=?
Outliers=? Extreme Outliers=?
Symmetry?
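One way to answer the questions above for a given data set is sketched below. Several assumptions are made that the slides do not state: quartiles are computed with the "inclusive" interpolation method of Python's `statistics.quantiles` (other conventions give slightly different values), outliers are points more than 1.5 IQR beyond the box, and extreme outliers are points more than 3 IQR beyond it. The data values are hypothetical.

```python
import statistics

# Hypothetical data (not from the slides).
data = sorted([2, 4, 5, 6, 7, 8, 9, 10, 11, 20, 30])

# Quartiles; the interpolation method is an assumption, conventions differ.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
data_range = max(data) - min(data)

# Assumed fences: 1.5 * IQR for outliers, 3 * IQR for extreme outliers.
inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

extreme = [x for x in data if x < outer_low or x > outer_high]
mild = [x for x in data
        if (x < inner_low or x > inner_high) and x not in extreme]

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
print(f"min={min(data)}, max={max(data)}, range={data_range}")
print(f"outliers={mild}, extreme outliers={extreme}")
```

Symmetry can then be judged by comparing the distances Q2 - Q1 and Q3 - Q2, and the lengths of the two whiskers.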