Summary Data


Factors

Factor = every aspect of the experimental conditions which may influence the response, as determined in the experiment

- Two factors are additive when there is no interaction between them.
- Factors can be labelled with +1 or -1 when there are 2 levels
- Level = 20 °C and 80 °C, for example

effect = ȳ₊ − ȳ₋

Higher-order interactions, for example TPF (temperature × pressure × flow rate), are usually negligible, since they are rarely found in nature.

HOW TO CALCULATE MAIN EFFECT

Take the difference between the average of all values with a + and the average of all values with a –
(tutorial 1 question 2a)

HOW TO CALCULATE INTERACTION EFFECT

1. Write down the sum of the factors for each experiment:


+ and + -> +
+ and - -> -
- and - -> +
2. Take the difference between the average of all values with a + and the average of all values
with a –

(tutorial 1 question 2a)
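The two recipes above can be sketched in a few lines of Python for a 2² design; the response values here are made up for illustration.

```python
# Illustrative 2^2 design: columns of signs for factors A and B, responses y.
A = [-1, +1, -1, +1]
B = [-1, -1, +1, +1]
y = [20.0, 30.0, 25.0, 45.0]

def effect(column, y):
    """Average of y where the column is +, minus the average where it is -."""
    plus = [yi for c, yi in zip(column, y) if c > 0]
    minus = [yi for c, yi in zip(column, y) if c < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

# Main effects come straight from the factor columns.
effect_A = effect(A, y)
effect_B = effect(B, y)

# Interaction column: elementwise product of the signs (step 1 above),
# then the same plus-minus averaging (step 2).
AB = [a * b for a, b in zip(A, B)]
effect_AB = effect(AB, y)
```

With these numbers the main effects are 15 (A) and 10 (B), and the AB interaction effect is 5.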

Standard deviation
An effect is usually significant if it is 2-3 times larger than the standard error.

Degrees of freedom = number of measurements - 1


HOW TO CALCULATE THE STANDARD ERROR IF THERE ARE 2 REPLICATES

1. di = yi,1 − yi,2 (per experiment)

2. si² = di²/2 (per experiment)
3. Pool the variances: s² = Σ si² / k, with k the number of experiments

4. Variance of an effect: V(effect) = 4s²/N, with N the total number of measurements

5. SE(effect) = √(4s²/N) (2.1) (tutorial 1 question 2b)
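A minimal sketch of the five steps above, assuming each experiment was run in duplicate; the measurements are invented for illustration.

```python
import math

# Duplicate measurements (y1, y2) per experiment, invented for illustration.
replicates = [(20.1, 19.9), (30.4, 29.6), (25.2, 24.8), (45.3, 44.7)]

diffs = [y1 - y2 for y1, y2 in replicates]      # step 1: d_i per experiment
s2_i = [d ** 2 / 2 for d in diffs]              # step 2: s_i^2 per experiment
s2 = sum(s2_i) / len(s2_i)                      # step 3: pooled variance
N = 2 * len(replicates)                         # total number of measurements
var_effect = 4 * s2 / N                         # step 4: variance of an effect
se_effect = math.sqrt(var_effect)               # step 5: standard error (2.1)
```

An effect is then judged significant if it is roughly 2-3 times larger than `se_effect`.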

Design matrix:

Number of experiments = levels^factors

HOW TO MAKE A REGRESSION MODEL

y = a + b·x1 + c·x2 + d·x1x2 + …, where the x are the factors

1. Fill in a, the intercept; it is represented by a column of +, so a is the average of all
values
2. Fill in the main and interaction effects for b, c, d…

The regression model is the line with the values for a, b, c… filled in (tutorial 1 question 3a)

In fractional design only the most important factors are selected with screening

Screening = selecting most important factors

In a full factorial design all factors are used

HOW TO MAKE AN EXPERIMENTAL DESIGN

1. Look at the number of experiments, then decide if you can use a full factorial design or a
fractional design
2. Make the design matrix, vary the combinations of + and -
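Step 2 above, varying the combinations of + and -, can be sketched with the standard library; `full_factorial` is a helper name chosen here, not from the course.

```python
from itertools import product

def full_factorial(n_factors):
    """All levels^factors combinations of -1/+1: the rows of the design matrix."""
    return list(product((-1, +1), repeat=n_factors))

design = full_factorial(3)   # 2 levels, 3 factors -> 2^3 = 8 experiments
```

Each tuple in `design` is one experiment; the interaction columns are obtained by multiplying factor columns elementwise, as in the interaction-effect recipe.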

Variable scales

1. Nominal: value is described with words, no ranking
2. Ordinal: also described with words, values are ranked
3. Interval: numbers, but without a true zero point
4. Ratio: numbers with a true zero point
Mean and errors
Mean: x̄ = (Σ xi) / n (4.1)

Mean = average

Mode = most occurring value

Median = middle value

Variance: s² = Σ(xi − x̄)² / (n − 1) (4.3)

Standard deviation: s = √s² (4.4)

Standard deviation of the mean: s_x̄ = s / √n (4.6)

Range (R) = difference between highest and lowest value

When the number of measurements is limited, s can be estimated from the range: s ≈ R/d2.

d2 is a tabulated value, approximated by d2 ≈ √n
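The descriptive statistics above map directly onto Python's `statistics` module; the data set here is invented, and the range-based estimate uses the d2 ≈ √n approximation mentioned above.

```python
import math
import statistics

data = [9.8, 10.1, 10.0, 10.3, 9.9, 10.1]

mean = statistics.mean(data)            # (4.1)
mode = statistics.mode(data)            # most occurring value
median = statistics.median(data)        # middle value
variance = statistics.variance(data)    # (4.3), n-1 in the denominator
stdev = statistics.stdev(data)          # (4.4)
R = max(data) - min(data)               # range

# Range-based estimate for small n, using the d2 ~ sqrt(n) approximation.
s_from_range = R / math.sqrt(len(data))
```

Note that `statistics.variance` uses the n − 1 denominator of eq. (4.3); `statistics.pvariance` would divide by n instead.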

Repeatability: under same circumstances

Reproducibility: under different circumstances

Blunders = personal errors that cause large deviations

Random errors = inevitable errors, related to precision, increase the spread around the central value.

Systematic errors = errors that influence the result in a specific direction; caused by e.g. instrumental errors, solutions, reading off values

Distributions
Of discrete variables:

- Uniform: every possible outcome of an experiment is equally likely


- Binomial:
P(k) = (n over k) · p^k · (1 − p)^(n−k) (5.2, 5.3)

P(k) = probability to find exactly k blue balls in n attempts, with p the probability of
success in a single attempt
- Poisson distribution, for low probability of success:
P(k) = λ^k · e^(−λ) / k!

The probability for k events to occur (observe a given number of events within a fixed interval
of time), with λ the mean number of events in that interval
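Both discrete distributions above can be written out with the standard library; the example values are arbitrary.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials (binomial distribution)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of k events when the mean number of events is lam (Poisson)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

p_binom = binomial_pmf(2, 5, 0.5)   # 2 blue balls in 5 attempts, p = 0.5
p_pois = poisson_pmf(0, 2.0)        # 0 events when 2 are expected on average
```

For k = 2, n = 5, p = 0.5 the binomial probability is 10/32 = 0.3125, and the Poisson probability of zero events is e^(−2).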

Of continuous variables (real-valued numbers):

- Continuous uniform: the probability for each possible value of the variable is equal

Pdf: f(x) = 1/(b − a) for a ≤ x ≤ b (5.5)
- Exponential (wait times between events in a Poisson process)

Pdf: f(x) = λe^(−λx) (5.6)
Cdf: F(x) = 1 − e^(−λx) (5.7)
- Normal

Pdf: f(x) = 1/(σ√(2π)) · exp(−(x − μ)²/(2σ²)) (5.9)
- Lognormal: measurements are not symmetrically spread around the mean

Pdf: f(x) = 1/(xσ√(2π)) · exp(−(ln x − μ)²/(2σ²)) (5.10)
- Student's t: for ν → ∞ degrees of freedom, the Student's t-distribution is exactly equal to
the standardized normal distribution. The smaller the number of degrees of freedom, the larger
the difference between the normal distribution and the Student's t-distribution

- χ²-distribution: describes how the square of a standard normal variable is distributed

- F-distribution: ratio of two χ²-distributed variables
A Q-Q plot is used to check if data is normally distributed: the data is on the y-axis and a theoretical
distribution on the x-axis. If the data is normally distributed, the points fall on a straight line.

The binomial distribution is a generalisation of the Bernoulli distribution for the probability of observing a specific number of 'successes' from multiple 'trials'

Confidence interval
HOW TO CALCULATE A CONFIDENCE INTERVAL

When the number of measurements is less than 30:

95% confidence interval: μ = x̄ ± t·s/√n (6.6)

Look for t (or u, for large samples) in table B2, α = 0.05 unless stated otherwise

(rule of thumb:

68% -> t = 1

95% -> t = 2

99% -> t = 3)
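A sketch of the interval above, assuming the t value has already been looked up in a table; the data and the rough t = 2 rule from the notes are illustrative.

```python
import math
import statistics

def confidence_interval(data, t):
    """x_bar +/- t * s / sqrt(n); t comes from a table for the chosen alpha."""
    n = len(data)
    x_bar = statistics.mean(data)
    half_width = t * statistics.stdev(data) / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

measurements = [10.1, 9.9, 10.3, 10.0, 10.2]
low, high = confidence_interval(measurements, t=2.0)   # rough 95% rule
```

For an exam-grade answer, t would be read from the table at n − 1 degrees of freedom rather than taken from the 1/2/3 rule of thumb.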

Hypothesis tests
Ho = no significant difference

H1 = significant difference

Two sided = can be both smaller and higher

One sided = can be either smaller or higher -> look at question, not data

One-sample = compare one sample with reference

Two-sample = compare two samples

The significance level α is usually 0.05; it gives the probability of a type 1 error

Type 1 error: H0 is incorrectly rejected; probability given by α

Type 2 error: H0 is incorrectly accepted; probability given by β (usually 0.2)

HOW TO DO A ONE-SAMPLE T-TEST

u_cal = |x̄ − μ| · √n / σ (7.1)

Look for u_tab in table B1 or B2 (two-sided test) or B3 or B4 (one-sided test)

If u_cal > u_tab, the null hypothesis is rejected

When the sample is small (n is less than 30), u is replaced by t (and σ by s)
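For a small sample the test statistic above (with t and s) can be sketched as follows; the data and reference value are invented.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """t_cal = |x_bar - mu0| * sqrt(n) / s, the small-sample form of eq. 7.1."""
    n = len(data)
    return abs(statistics.mean(data) - mu0) * math.sqrt(n) / statistics.stdev(data)

data = [10.1, 9.9, 10.3, 10.0, 10.2]
t_cal = one_sample_t(data, mu0=10.0)
# Compare t_cal with t_tab at n-1 degrees of freedom; reject H0 if t_cal > t_tab.
```

Here t_cal ≈ 1.41, below the two-sided t_tab at 4 degrees of freedom and α = 0.05, so H0 would not be rejected.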

HOW TO DO AN UNPAIRED TWO-SAMPLE T-TEST

Pooled variance: s² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) (7.2)

t_cal = |x̄1 − x̄2| / (s · √(1/n1 + 1/n2)) (7.3)

Welch's test (for unequal variances): t_cal = |x̄1 − x̄2| / √(s1²/n1 + s2²/n2) (7.4)

The null hypothesis is rejected if t_cal > t'
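Both forms of the unpaired test can be sketched in one function; the two invented samples below happen to have equal variances, so the pooled and Welch statistics coincide.

```python
import math
import statistics

def two_sample_t(x, y, pooled=True):
    """Pooled two-sample t, or Welch's t for unequal variances."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    diff = abs(statistics.mean(x) - statistics.mean(y))
    if pooled:
        s2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
        return diff / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return diff / math.sqrt(v1 / n1 + v2 / n2)                # Welch

x = [10.1, 9.9, 10.3, 10.0, 10.2]
y = [10.4, 10.6, 10.5, 10.7, 10.3]
t_pooled = two_sample_t(x, y)
t_welch = two_sample_t(x, y, pooled=False)
```

Whether to pool is decided by the F-test described below.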

HOW TO TAKE A PAIRED TWO-SAMPLE T-TEST

Paired measurements are taken from the same object or person.

d̄ = (Σ di) / n, the mean of the differences between the paired measurements (7.6)

t_cal = |d̄| · √n / s_d (7.7)

HOW TO DO AN F-TEST

While a t-test compares averages between two groups, an F-test is used to compare the standard
deviations between two groups

1. F_cal = s1² / s2² (7.8)

The largest variance is the numerator, the smallest variance is the denominator

2. Look at table B7 or B8 for Ftab, if Fcal < Ftab, H0 is accepted (the variances are not different) and
you can pool the variances. If Fcal > Ftab, use Welch’s test

Always first do an F-test, then a t-test

When to use an F-test:

- When samples are not dependent on each other


- When n < 30

When not to use an F-test:

- Simple comparison of means (you can assume the variances are the same)
- Non-normal data
- One-sample t-test
Because you put the largest variance in the numerator and the smallest in the denominator, the F-test is one-sided
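The F-test recipe above, including the largest-variance-on-top convention, can be sketched as follows; the samples are invented.

```python
import statistics

def f_test(x, y):
    """F_cal = larger variance / smaller variance; one-sided by construction."""
    v1, v2 = statistics.variance(x), statistics.variance(y)
    return max(v1, v2) / min(v1, v2)

x = [10.1, 9.9, 10.3, 10.0, 10.2]
y = [10.4, 10.8, 10.5, 11.0, 10.3]
F_cal = f_test(x, y)
# Compare with F_tab (table B7/B8): if F_cal < F_tab, pool the variances
# for the two-sample t-test; otherwise use Welch's test.
```

Here F_cal = 0.085/0.025 = 3.4, which would be compared with F_tab at (4, 4) degrees of freedom.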

Non-parametric tests:

Wilcoxon signed-rank test (reader page 81)

1. Calculate the differences between the observations and the reference value
2. Order the differences by absolute value, ignoring the minus sign
3. Rank the ordered differences (1, 2, 3, 4…); when 2 values are equal, give both the average
rank (_.5)
4. Restore the original minus signs to the ranks
5. Calculate the sum of the absolute values of the positive ranks and of the negative ranks
6. The smallest of the 2 sums is T_cal
7. Table B9 lists T_tab; if T_cal < T_tab the null hypothesis is rejected, which is opposite
from the normal comparison
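The ranking steps above can be implemented directly; this is a sketch, with zero differences dropped (a common convention the notes do not mention) and tied absolute values sharing the average rank.

```python
def wilcoxon_t(observations, reference):
    """T_cal for the Wilcoxon signed-rank test, following the steps above."""
    diffs = [x - reference for x in observations if x != reference]
    ordered = sorted(diffs, key=abs)        # order by absolute value
    ranks = {}                              # abs(difference) -> rank
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1                          # group of tied absolute values
        avg_rank = (i + 1 + j) / 2          # average of ranks i+1 .. j
        for k in range(i, j):
            ranks.setdefault(abs(ordered[k]), avg_rank)
        i = j
    pos = sum(ranks[abs(d)] for d in diffs if d > 0)
    neg = sum(ranks[abs(d)] for d in diffs if d < 0)
    return min(pos, neg)                    # compare with T_tab in table B9
```

For observations [12, 15, 9, 14, 16, 11] against reference 10, the only negative difference (−1) ties with +1 for rank 1.5, so T_cal = 1.5.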

Correlation and regression


The correlation coefficient determines how well the points fall on a straight line; the larger |r|, the
better the correlation between x and y:

r = Σ(xi − x̄)(yi − ȳ) / √[Σ(xi − x̄)² · Σ(yi − ȳ)²] (8.1)

with a the intercept and b the slope of the line y = a + bx

Is the linear relation significant? -> 2 ways:

1. t_cal = |r| · √(n − 2) / √(1 − r²) (8.2)

t_tab in table B5; the number of degrees of freedom is n − 2 because you estimate both a and b

2. Calculate the confidence interval of the slope and intercept; if 1 lies inside the confidence
interval of the slope and 0 in the one for the intercept, the methods give the same results

HOW TO MAKE A REGRESSION LINE

b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² (8.7) (8.5) (same equation)

a = ȳ − b · x̄
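The least-squares recipe above, together with r and its significance statistic, fits in one small function; the data points are invented.

```python
import math

def linear_fit(x, y):
    """Least-squares slope b and intercept a, plus correlation coefficient r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
    return a, b, r

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 2.0, 4.1, 5.9, 8.0]
a, b, r = linear_fit(x, y)
# Significance of the linear relation (compare with t_tab at n-2 df):
t_cal = abs(r) * math.sqrt(len(x) - 2) / math.sqrt(1 - r ** 2)
```

For these points the fit gives b = 1.97 and a = 0.08, with r very close to 1.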

HOW TO CALCULATE THE CONFIDENCE INTERVALS OF A REGRESSION LINE

Residuals: ei = yi − ŷi (8.8)

Standard deviation of the residuals: s_y/x = √[Σ ei² / (n − 2)] (8.9)

Standard deviation of the slope: s_b = s_y/x / √[Σ(xi − x̄)²] (8.10)

ANOVA
Compares more than 2 series of measurements

H0: all means are equal

H1: at least one mean is significantly different

Between-group variance: MS_between = Σ nj(x̄j − x̄)² / (k − 1) (9.2)

Within-group variance: MS_within = Σ Σ (xij − x̄j)² / (N − k) (9.3)

F_cal = MS_between / MS_within (9.4)

- ANOVA doesn't test whether all group means are different

- ANOVA doesn't require that the group sizes are equal

- ANOVA doesn't test whether the variances of different groups are equal
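One-way ANOVA as described above can be sketched without any libraries; the three measurement series are invented.

```python
def one_way_anova(groups):
    """F_cal = MS_between / MS_within for a list of measurement series."""
    N = sum(len(g) for g in groups)                 # total measurements
    k = len(groups)                                 # number of groups
    grand_mean = sum(sum(g) for g in groups) / N
    # Between-group variance: spread of the group means around the grand mean.
    ms_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups) / (k - 1)
    # Within-group variance: spread of the values around their own group mean.
    ms_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups) / (N - k)
    return ms_between / ms_within

F_cal = one_way_anova([[10.0, 10.2, 9.8],
                       [10.9, 11.1, 11.0],
                       [10.4, 10.6, 10.5]])
# Compare F_cal with F_tab at (k-1, N-k) degrees of freedom.
```

A large F_cal (here 37.5) says at least one group mean differs from the others; it does not say which one, nor that all of them differ.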
