
Chapter 7: Statistical Analysis of Data Treatment and Evaluation
Samples and Populations
 In scientific study, we infer information about a population from observations made on a sample.
 The population is the collection of all measurements of interest and must be carefully defined by the experimenter.
 Some populations are finite and real, while others are hypothetical or conceptual in nature.
Examples
A. A production run of multivitamin tablets that produces hundreds of thousands of tablets. (Tablets are tested for quality control.)
B. The determination of calcium in a community water supply to determine water hardness. (The population is conceptual.)
C. The determination of glucose in the blood of a patient. (The population is conceptual.)
Population mean µ and sample mean x̄
 The sample mean x̄ is the arithmetic average of a limited sample drawn from a population of data. It is defined as the sum of the measurement values divided by the number of measurements, N.
 The population mean µ, in contrast, is the true mean for the population.
 The different symbols emphasize the distinction.
 In most cases we do not know µ and must infer its value from x̄.
 More often than not, particularly when N is small, x̄ differs from µ because a small sample may not exactly represent its population.
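As a minimal sketch in plain Python (using the blood-lead values from Example 1a later in this chapter), the sample mean is simply the sum of the values divided by N:

```python
# Sample mean: sum of the measurement values divided by N.
measurements = [0.752, 0.756, 0.752, 0.751, 0.760]  # ppm Pb (Example 1a data)
x_bar = sum(measurements) / len(measurements)
print(round(x_bar, 4))  # 0.7542
```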
Most results in scientific studies are expressed as mean ± standard deviation.

Measures of precision
 Standard deviation (s) – has the same units as the data
 Variance (s²) – has the units of the data squared
 Relative standard deviation (RSD) – the standard deviation divided by the mean, often expressed in parts per thousand
 Coefficient of variation (CV) – the relative standard deviation expressed as a percentage
 Spread/range – the difference between the largest and smallest values; describes the precision of a set of replicate results
Example 1a:
1. The following results were obtained in the replicate determination of the lead content of a blood sample: 0.752, 0.756, 0.752, 0.751, and 0.760 ppm Pb. Find the mean and the standard deviation of this set of data.
2. From the set of data in Example 1a, calculate (a) the variance, (b) the relative standard deviation in parts per thousand, (c) the coefficient of variation, and (d) the spread.
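One way to check the answers is a short sketch using only Python's standard-library statistics module:

```python
import statistics

data = [0.752, 0.756, 0.752, 0.751, 0.760]  # ppm Pb

mean = statistics.mean(data)          # sample mean
s = statistics.stdev(data)            # sample standard deviation (N - 1 in denominator)
variance = statistics.variance(data)  # s squared
rsd_ppt = s / mean * 1000             # relative standard deviation, parts per thousand
cv_percent = s / mean * 100           # coefficient of variation, %
spread = max(data) - min(data)        # range

print(round(mean, 4), round(s, 4), round(spread, 3))  # 0.7542 0.0038 0.009
```

Note that `statistics.stdev` uses the N − 1 denominator appropriate for a sample, not the population form.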
Let’s move on to the errors we would like to prevent in the results.
 In statistical tests we often want to determine whether two quantities are the same; two types of errors are possible:
A. A type I error occurs when we reject the hypothesis that two quantities are the same when they are in fact statistically identical.
B. A type II error occurs when we accept that they are the same when they are not statistically identical.
 The characteristics of these errors in statistical testing, and the ways we can minimize them, are among the subjects of this chapter: Statistical Treatment of Data.
Confidence Intervals (when s is a good estimate of σ, N ≥ 30)
 In most quantitative chemical analyses, the true value of the mean, µ, cannot be determined because a huge number of measurements (approaching infinity) would be required.
 However, we can determine an interval surrounding the experimentally determined mean, x̄, within which the population mean µ is expected to lie with a certain degree of probability. This interval is known as the confidence interval, and its limits are called the confidence limits.
 The bigger the confidence level, the wider the interval.
 For example, we might say that it is 99% probable that the true population mean for a set of potassium measurements lies in the interval 7.25 ± 0.15% K. Thus, the probability that the mean lies in the interval from 7.10 to 7.40% K is 99%.
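When s is a good estimate of σ, the interval is x̄ ± z·s/√N. A minimal sketch in plain Python, with hypothetical summary statistics (the s and N here are illustrative, not taken from the potassium example) and z read from the standard normal table:

```python
import math

# Hypothetical large-sample summary statistics.
x_bar, s, n = 7.25, 0.12, 50
z_99 = 2.576  # z for 99% confidence, from the standard normal table

half_width = z_99 * s / math.sqrt(n)  # z * s / sqrt(N)
print(f"99% CI: {x_bar:.2f} ± {half_width:.2f} % K")
```

A higher confidence level means a larger z and therefore a wider interval, as stated above.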
Confidence Intervals (when s is not known to be a good estimate of σ, N < 30)
 When time or the amount of sample is limited, a single set of replicate measurements must provide not only a mean but also an estimate of precision.
 An s calculated from a small set of data may be quite uncertain, so do not use the z statistic.
 Instead, use the t statistic, often called Student’s t. (“Student” was the pen name of W. S. Gosset.)
 Like z, t depends on the desired confidence level as well as on the number of degrees of freedom in the calculation of s.
 Use the t-distribution table when the population standard deviation (σ) is not known and the sample size is small (N < 30).
 General rule: if σ is not known, using the t-distribution is correct. If σ is known, using the normal distribution is correct.
Example
 The following results were obtained in the replicate
determination of the lead content of a blood sample: 0.752,
0.756, 0.752, 0.751, and 0.760 ppm Pb. Calculate the
Confidence Interval at 95%.
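A sketch of this calculation in plain Python; the critical value t = 2.776 (for N − 1 = 4 degrees of freedom at 95% confidence) is read from a t table rather than computed:

```python
import math
import statistics

data = [0.752, 0.756, 0.752, 0.751, 0.760]  # ppm Pb
n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)

t_95 = 2.776  # Student's t for N - 1 = 4 degrees of freedom, 95% confidence
half_width = t_95 * s / math.sqrt(n)  # t * s / sqrt(N)
print(f"95% CI: {x_bar:.4f} ± {half_width:.4f} ppm Pb")
```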
 Statistical analysis
All assays were performed at least in triplicate, and the results were expressed as mean ± standard deviation (SD). Statistical analyses were done using SPSS version 18.0. Differences were evaluated by a one-way analysis of variance (ANOVA) test followed by Tukey’s multiple-comparison test. Differences were considered significant at p < 0.05. Pearson correlations were used to evaluate the correlation between the various parameters. IC50 values were determined by plotting a percent inhibition versus concentration curve for all assays.
 For comparing more than two group means, one-way analysis of variance (ANOVA) is the appropriate method instead of the t test. Because ANOVA is based on the same assumptions as the t test, its interest is likewise in the locations of the distributions represented by the means. Why, then, is the method for comparing several means called the ‘analysis of variance’ rather than the ‘analysis of means’? It is because the relative locations of several group means can be identified more conveniently through the variance among the group means than by comparing many group means directly when the number of means is large.
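To make the “variance among group means” idea concrete, here is a minimal one-way ANOVA F statistic computed by hand for three hypothetical groups (no statistics package assumed; the data are illustrative only):

```python
import statistics

# Three hypothetical groups of replicate measurements.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]

k = len(groups)                  # number of groups
n = sum(len(g) for g in groups)  # total observations
grand_mean = sum(sum(g) for g in groups) / n

# Between-group sum of squares: how far the group means sit from the grand mean.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: scatter of observations around their own group mean.
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

# F compares variance among group means to variance within groups.
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 2))  # 19.0
```

A large F means the group means are spread out relative to the scatter within each group, which is exactly the “analysis of variance” view of comparing means.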
 Pearson’s correlation coefficient is a test statistic that measures the statistical relationship, or association, between two continuous variables. It is widely regarded as the best method of measuring the association between variables of interest because it is based on the method of covariance. It gives information about the magnitude of the association, or correlation, as well as the direction of the relationship.
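A minimal sketch of the computation, written out from the covariance definition with hypothetical paired data:

```python
import math

# Hypothetical paired observations of two continuous variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# r = covariance(x, y) / (spread of x * spread of y), written out from the sums.
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
print(round(r, 3))  # 0.999
```

The sign of r gives the direction of the relationship and its magnitude (between 0 and 1) gives the strength, as described above.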
