Biostatistics and Orthodontics
First part
• History of biostatistics
• Definition of biostatistics
• Basics of research methodology
• Measures of central tendency
• Measures of dispersion
• Methods of Data presentation
Second part
• Sampling variability
• Significance
• Tests of significance
Hypothesis testing
What is a Hypothesis?
• A hypothesis is an assumption about the population parameter.
– A parameter is a characteristic of the population, like its mean or variance.
– The parameter must be identified before analysis.
• Example: “I assume the mean GPA of this class is 3.5!”
The Null Hypothesis, H0
• States the assumption (numerical) to be tested
• e.g. The grade point average of juniors is at least 3.0 (H0: μ ≥ 3.0)
• Begin with the assumption that the null hypothesis is TRUE.
(Similar to the notion of innocent until proven guilty)
• Steps:
– State the Null Hypothesis (H0: μ ≥ 3.0)
[Figure: the hypothesis testing process]
• Population: assume the population mean age is 50 (Null Hypothesis: μ = 50).
• Sample: the sample mean is 20.
• Is X̄ = 20 likely if μ = 50? No, not likely!
• REJECT the Null Hypothesis.
Our hypothesis testing procedure
• Level of significance (α): set by researcher
• Type II Error
– Do not reject a false null hypothesis (“False Negative”)
– Probability of Type II Error is β (beta)
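Level of Significance, α, and the Rejection Region
• H0: μ ≥ 3, H1: μ < 3: rejection region in the left tail (area α), below the critical value
• H0: μ ≤ 3, H1: μ > 3: rejection region in the right tail (area α), above the critical value
• H0: μ = 3, H1: μ ≠ 3: rejection regions in both tails (area α/2 each), beyond the two critical values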
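The three rejection regions can be illustrated with a small sketch. It assumes a z test (standard normal null distribution) and α = 0.05; the function name `critical_values` and the tail labels are illustrative choices, not part of the slides.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Return the z critical value(s) that bound the rejection region.

    Assumes the test statistic is standard normal under H0 (a z test).
    """
    z = NormalDist()  # standard normal distribution
    if tail == "left":    # H1: mu < mu0, reject below the cutoff
        return (z.inv_cdf(alpha),)
    if tail == "right":   # H1: mu > mu0, reject above the cutoff
        return (z.inv_cdf(1 - alpha),)
    if tail == "two":     # H1: mu != mu0, alpha/2 in each tail
        upper = z.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    print(tail, [round(c, 3) for c in critical_values(0.05, tail)])
```

For a two-tailed test the same α is split across both tails, which is why its cutoffs lie further from zero than the one-tailed cutoff.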
Type I error
• We fixed the rejection region so that, when
the null hypothesis is true, we have a 5%
chance of incorrectly rejecting the hypothesis.
• This is called the type I error, or “size” of the
test.
• This of course also means that, when the
null hypothesis is true, we have a 95%
chance of making the correct decision.
P < 0.05
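A small simulation can make the 5% figure concrete. This is only a sketch under assumed values: a two-tailed z test, a true null with μ = 3.0, a known σ = 0.5, samples of n = 25, and a fixed random seed. None of these numbers come from the slides.

```python
import random
from statistics import NormalDist


def type_i_error_rate(mu0=3.0, sigma=0.5, n=25, alpha=0.05,
                      trials=10000, seed=1):
    """Estimate how often a two-tailed z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    se = sigma / n ** 0.5                          # standard error of the mean
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated type I error rate: {rate:.3f}")
```

The estimated rate should sit close to α, since every rejection in this simulation is by construction an incorrect one.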
Type II error
Factors Affecting Type II Error, β
• True Value of Population Parameter
– β increases when the difference between the hypothesized parameter and the true value decreases
• Significance Level α
– β increases when α decreases
• Population Standard Deviation σ
– β increases when σ increases
• Sample Size n
– β increases when n decreases
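The four factors can be checked numerically. The sketch below assumes a right-tailed z test of H0: μ ≤ μ0 against H1: μ > μ0 with known σ; the function name `type_ii_error` and all the numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for a right-tailed z test of H0: mu <= mu0 vs H1: mu > mu0.

    Beta is the probability that the sample mean falls below the rejection
    cutoff even though the true mean is mu_true (with mu_true > mu0).
    """
    se = sqrt(sigma ** 2 / n)                              # standard error
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se    # reject above this
    return NormalDist(mu_true, se).cdf(cutoff)


base = type_ii_error(mu0=3.0, mu_true=3.2, sigma=0.5, n=25)
print("baseline beta:      ", round(base, 3))
print("smaller difference: ", round(type_ii_error(3.0, 3.1, 0.5, 25), 3))
print("smaller alpha:      ", round(type_ii_error(3.0, 3.2, 0.5, 25, alpha=0.01), 3))
print("larger sigma:       ", round(type_ii_error(3.0, 3.2, 1.0, 25), 3))
print("smaller n:          ", round(type_ii_error(3.0, 3.2, 0.5, 10), 3))
```

Each of the four variations should print a β larger than the baseline, matching the direction stated in the list.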
Hypothesis Testing: Steps
Hypothesis Testing Procedures
• Parametric
– Z Test
– t Test
– One-Way ANOVA
• Nonparametric
– Wilcoxon Rank Sum Test
– Kruskal-Wallis H-Test
• Many More Tests Exist!
o Means and standard deviations are called
Parameters; all theoretical distributions have
parameters.
o Statistical tests that assume a distribution and
use parameters are called parametric tests
o Statistical tests that don't assume a
distribution or use parameters are called
nonparametric tests
When to use nonparametric tests?
• While many things in nature, and science, are
normally distributed, some are not. In this case
using a t-test, for example, could be inappropriate
and misleading.
• Examples:
– Nominal data: race, sex
– Ordered categorical data: mild, moderate, severe
– Likert scales: strongly disagree, disagree, no
opinion, agree, strongly agree
How do nonparametric tests work?
• Scores from both groups are pooled and ranked from lowest to highest; the test then works on the ranks instead of the raw scores.
• What about ties? Use the average rank of the tied scores.

Test scores and ranks by group:
Boys score   Boys rank   Girls score   Girls rank
70           5           60            3
90           9           50            1
85           7.5         95            10
55           2           85            7.5
65           4           75            6

Pooled ranking:
Rank   Score   Sex
1      50      Girl
2      55      Boy
3      60      Girl
4      65      Boy
5      70      Boy
6      75      Girl
7.5    85      Girl
7.5    85      Boy
9      90      Boy
10     95      Girl
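The ranking step, including average ranks for ties, can be sketched in a few lines. The helper name `average_ranks` is an illustrative choice; the scores are the boys' and girls' test scores from the example.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) to highest; tied values share the
    average of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of values tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = ((pos + 1) + (end + 1)) / 2  # mean of the 1-based ranks in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


boys = [70, 90, 85, 55, 65]
girls = [60, 50, 95, 85, 75]
ranks = average_ranks(boys + girls)          # rank the pooled scores
boy_ranks, girl_ranks = ranks[:len(boys)], ranks[len(boys):]
print("Boys: ", boy_ranks, "sum =", sum(boy_ranks))
print("Girls:", girl_ranks, "sum =", sum(girl_ranks))
```

The two scores of 85 would occupy ranks 7 and 8, so each receives 7.5.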
Commonly used nonparametric tests
Wilcoxon Rank Sum Test
\[ U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 \]
\[ U_2 = n_1 n_2 - U_1 \]
n_1 = sample size of group 1
n_2 = sample size of group 2
R_1 = sum of ranks of group 1
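As a worked instance of the U formulas, the sketch below takes the boys from the test-score example as group 1 (ranks 5, 9, 7.5, 2, 4) and the girls as group 2. The function name `mann_whitney_u` is an illustrative choice.

```python
def mann_whitney_u(rank_sum_1, n1, n2):
    """Compute U1 and U2 from the rank sum of group 1 and the two sample sizes."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - rank_sum_1
    u2 = n1 * n2 - u1
    return u1, u2


boy_ranks = [5, 9, 7.5, 2, 4]   # ranks of group 1 in the pooled ranking
u1, u2 = mann_whitney_u(sum(boy_ranks), n1=5, n2=5)
print("R1 =", sum(boy_ranks), "U1 =", u1, "U2 =", u2)
```

Here R1 = 27.5, so U1 = 25 + 15 - 27.5 = 12.5 and U2 = 25 - 12.5 = 12.5. The two values always add up to n1 × n2.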
Mann-Whitney U test
• Null hypothesis: the two groups have the same median
• Test statistic: U1 or U2 (use the largest), computed from the sample
• Null distribution: U with n1, n2
• Compare the test statistic with the null distribution
Chi-square test
• Test of proportions
• Non parametric test
• Dichotomous variables are used
• Tests the association between two
factors
e.g. treatment and disease
gender and mortality
• It is often described as the only test that can be used both as a
parametric and as a nonparametric test.
• The test we use to measure the differences
between what is observed and what is
expected according to an assumed
hypothesis is called the chi-square test.
Important
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
O = observed frequency, E = expected frequency
(Note that although there are 3 cells in the table that are not greater than 5,
these are observed frequencies. It is only the expected frequencies that have to
be greater than 5.)
Expected frequency = (row total × column total) / grand total
E.g. expected frequency for old industry in LE1 = (50 × 13) / 92 = 7.07
Add up all of the above numbers to obtain the value for chi-square: χ² = 15.14.
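The expected-frequency and summation steps can be sketched as follows. The 2 × 2 table used here is hypothetical and chosen only for easy arithmetic; it is not the industry table from the worked example, and the function name `chi_square` is an illustrative choice.

```python
def chi_square(observed):
    """Return (chi-square statistic, expected frequencies) for a table of counts.

    Expected frequency of a cell = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected


# Hypothetical counts, e.g. treatment (rows) against disease (columns).
table = [[10, 20],
         [30, 40]]
statistic, expected = chi_square(table)
print("Expected frequencies:", expected)
print("Chi-square:", round(statistic, 4))
```

The statistic would then be compared with the tabulated critical value for (rows - 1) × (columns - 1) degrees of freedom.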
• Look up the critical value in the significance tables for the
appropriate degrees of freedom. This tells you whether to reject the
null hypothesis or fail to reject it.
• Wilcoxon Rank-sum test ~ t test
– (More commonly called the Mann-Whitney test)
• Z test
• t test
• F test
• ANOVA test
t test-origin
• Founder WS Gosset
• Wrote under the pseudonym “Student”
• Mostly worked in tea (t) time
• ? Hence known as Student's t test.
• Preferable when n < 60
• Certainly if n < 30
Is there a difference?
between …means,
who is meaner?
Statistical Analysis
[Figure: distributions of the control group and the treatment group, each marked with its group mean]
Is there a difference?
What does difference mean?
• The mean difference is the same for all three cases:
– medium variability
– high variability
– low variability
So we estimate
(difference between group means) / (variability of groups) = t-value
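The idea that the same mean difference yields a different t-value depending on variability can be sketched as below. The slides do not say which standard error is used, so this sketch assumes the unpooled (Welch-style) standard error; the data and the name `t_value` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_value(treatment, control):
    """Difference between group means divided by its standard error.

    Assumption: unpooled standard error, sqrt(s1^2/n1 + s2^2/n2).
    """
    diff = mean(treatment) - mean(control)
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    return diff / se


# Both pairs of groups differ by 3 in their means; only the spread changes.
low_var = t_value([10, 11, 12], [7, 8, 9])
high_var = t_value([5, 11, 17], [2, 8, 14])
print("low variability: ", round(low_var, 3))
print("high variability:", round(high_var, 3))
```

With low variability the same difference of 3 produces a much larger t-value, which is what makes the difference easier to detect.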
Probability - p
[Figure: rejection regions under the t distribution: 0.05 in one tail, or 0.025 in each of the two tails]
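\[ F = \frac{MS_{group}}{MS_{error}} \]
\[ MS_{group} = \frac{SS_{group}}{df_{group}}, \quad MS_{error} = \frac{SS_{error}}{df_{error}} \]
\[ df_{group} = k - 1, \quad df_{error} = N - k \]
\[ SS_{group} = \sum n_i (\bar{Y}_i - \bar{Y})^2, \quad SS_{error} = \sum s_i^2 (n_i - 1) \]
\bar{Y}_i = mean of group i; n_i = size of sample i; s_i^2 = variance of sample i
\bar{Y} = overall mean; N = total sample size; k = number of groups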
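The F ratio can be computed directly from these sums of squares. The sketch below follows the formulas term by term; the three groups are hypothetical data and the function name `one_way_anova_f` is an illustrative choice.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, df_group, df_error) for a list of samples, one per group."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # N, total sample size
    overall_mean = mean(x for g in groups for x in g)

    # SS_group: group sizes times squared deviations of the group means.
    ss_group = sum(len(g) * (mean(g) - overall_mean) ** 2 for g in groups)
    # SS_error: sample variances weighted by (n_i - 1).
    ss_error = sum(variance(g) * (len(g) - 1) for g in groups)

    df_group, df_error = k - 1, n_total - k
    ms_group = ss_group / df_group
    ms_error = ss_error / df_error
    return ms_group / ms_error, df_group, df_error


groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # hypothetical measurements
f_stat, df_group, df_error = one_way_anova_f(groups)
print("F =", f_stat, "with", df_group, "and", df_error, "df")
```

For these groups SS_group = 6 on 2 df and SS_error = 6 on 6 df, so F = 3 / 1 = 3. The value would be compared with the F distribution on (k - 1, N - k) degrees of freedom.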
ANOVA
• Null hypothesis: all groups have the same mean
• Test statistic: F = MSgroup / MSerror, computed from the k samples
• Null distribution: F with k-1, N-k df
• Compare the test statistic with the null distribution