Handy Reference Sheet 2 - HRP 259: Calculation Formulas for Sample Data


Handy Reference II

HANDY REFERENCE SHEET 2 – HRP 259

Calculation Formulas for Sample Data:


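1. Univariate

Sample proportion: $\hat{p} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, where $x_i = 1$ if success and $x_i = 0$ if failure

Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

Sum of squares of x: $SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$ [to ease computation: $SS_x = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$]

Sample variance: $s_x^2 = \dfrac{SS_x}{n-1} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

Sample standard deviation: $s_x = \sqrt{\dfrac{SS_x}{n-1}} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$

Standard error of the sample mean: $\dfrac{s_x}{\sqrt{n}} = \dfrac{1}{\sqrt{n}}\sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$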
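The univariate formulas can be checked numerically with a short sketch. The data values and the function name `univariate_summary` are made up for illustration; only the formulas come from the sheet.

```python
import math

def univariate_summary(xs):
    """Return the sample mean, SS_x, variance, SD and standard error of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    # Definitional form: sum of squared deviations from the mean
    ss = sum((x - mean) ** 2 for x in xs)
    # Computational shortcut: sum(x_i^2) - n * mean^2
    ss_shortcut = sum(x * x for x in xs) - n * mean ** 2
    variance = ss / (n - 1)
    sd = math.sqrt(variance)
    se = sd / math.sqrt(n)
    return {"mean": mean, "ss": ss, "ss_shortcut": ss_shortcut,
            "variance": variance, "sd": sd, "se": se}

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
summary = univariate_summary(data)
print(summary)
```

Both forms of the sum of squares agree for this sample, which is the point of the shortcut.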

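2. Bivariate

Sum of squares of xy: $SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ [to ease computation: $SS_{xy} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$]

Sample covariance: $s_{xy} = \dfrac{SS_{xy}}{n-1} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$

Sample correlation: $\hat{r} = \dfrac{s_{xy}}{\sqrt{s_x^2 s_y^2}} = \dfrac{SS_{xy}}{\sqrt{SS_x SS_y}} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$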
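A minimal sketch of the bivariate quantities, using invented paired data. The function name `bivariate_summary` is hypothetical.

```python
import math

def bivariate_summary(xs, ys):
    """Return SS_xy, the sample covariance and the sample correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    covariance = ss_xy / (n - 1)
    correlation = ss_xy / math.sqrt(ss_x * ss_y)
    return ss_xy, covariance, correlation

xs = [1, 2, 3, 4, 5]  # illustrative values only
ys = [2, 4, 5, 4, 5]
ss_xy, covariance, correlation = bivariate_summary(xs, ys)
print(ss_xy, covariance, round(correlation, 4))
```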

Variance rules for correlated random variables:


Var (x+y)=Var(x)+Var(y)+2Cov(x,y); Var (x-y)=Var(x)+Var(y)-2Cov(x,y)
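The variance rules hold exactly for sample variances and covariances as well, so they can be checked on a small invented data set. The helper names below are hypothetical.

```python
def sample_var(values):
    """Sample variance with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sample_cov(us, vs):
    """Sample covariance with the n - 1 denominator."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / (n - 1)

x = [1, 2, 3, 4, 5]  # illustrative values only
y = [2, 4, 5, 4, 5]

var_sum = sample_var([a + b for a, b in zip(x, y)])
var_diff = sample_var([a - b for a, b in zip(x, y)])
rule_sum = sample_var(x) + sample_var(y) + 2 * sample_cov(x, y)
rule_diff = sample_var(x) + sample_var(y) - 2 * sample_cov(x, y)
print(var_sum, rule_sum, var_diff, rule_diff)
```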

Hypothesis Testing

The Steps:
1. Define your hypotheses (null, alternative)
2. Specify your null distribution
3. Do an experiment
4. Calculate the p-value of what you observed
5. Reject or fail to reject (~accept) the null hypothesis
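The five steps can be sketched with an exact binomial test. The scenario (10 coin flips, 9 heads, a significance level of 0.05) is invented for illustration and is not from the sheet.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Step 1: H0: p = 0.5 (fair coin); alternative: p != 0.5
n_flips, p_null = 10, 0.5
# Step 2: under H0 the number of heads is Binomial(10, 0.5)
# Step 3: the (hypothetical) experiment gave 9 heads
observed = 9
# Step 4: two-sided p-value = probability of a result at least as far
# from the expected count as the one observed
expected = n_flips * p_null
p_value = sum(
    binomial_pmf(k, n_flips, p_null)
    for k in range(n_flips + 1)
    if abs(k - expected) >= abs(observed - expected)
)
# Step 5: compare with the chosen significance level
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(p_value, decision)
```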

The Errors

| Your statistical decision | True state of null hypothesis: H0 true | True state of null hypothesis: H0 false |
|---|---|---|
| Reject H0 | Type I error ($\alpha$) | Correct |
| Do not reject H0 | Correct | Type II error ($\beta$) |

Power = $1 - \beta$


Confidence intervals (estimation)


For a mean ($\sigma^2$ unknown):

$$\bar{x} \pm t_{n-1,\alpha/2} \times \frac{s_x}{\sqrt{n}}$$

[if variance known or large sample size, replace $t_{df,\alpha/2}$ with $Z_{\alpha/2}$]
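A minimal sketch of the interval for a mean. The sample is invented, and the critical value 2.23 is the sheet's tabulated $t_{10,\alpha/2}$ for 95% confidence, which fits because the sample has n = 11 (10 degrees of freedom). The function name `mean_ci` is hypothetical.

```python
from math import sqrt

def mean_ci(xs, t_crit):
    """Confidence interval for a mean: x_bar +/- t * s / sqrt(n)."""
    n = len(xs)
    mean = sum(xs) / n
    s = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    half_width = t_crit * s / sqrt(n)
    return mean - half_width, mean + half_width

sample = [4, 5, 6, 5, 4, 6, 5, 5, 7, 3, 5]  # illustrative values, n = 11
low, high = mean_ci(sample, t_crit=2.23)     # t for 10 d.f., 95% level
print(round(low, 3), round(high, 3))
```

The same function applies to a paired difference if `xs` holds the within-pair differences.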

For a paired difference ($\sigma^2$ unknown):

$$\bar{d} \pm t_{n-1,\alpha/2} \times \frac{s_d}{\sqrt{n}}$$

[where $d_i$ = the within-pair difference]

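For a difference in means, 2 independent samples ($\sigma^2$'s unknown but roughly equal):

$$(\bar{x} - \bar{y}) \pm t_{n-2,\alpha/2} \times \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$

where $s_p^2 = \dfrac{SS_x + SS_y}{n-2}$ or $\dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n-2}$, with $n = n_x + n_y$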
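A sketch of the pooled-variance interval for two independent samples. The two groups are invented, and 2.23 is again the sheet's $t_{10,\alpha/2}$ for 95% confidence, since $n_x + n_y - 2 = 10$ here. The function name `pooled_difference` is hypothetical.

```python
from math import sqrt

def pooled_difference(xs, ys, t_crit):
    """Difference in means, pooled variance, confidence interval and t statistic."""
    nx, ny = len(xs), len(ys)
    mean_x, mean_y = sum(xs) / nx, sum(ys) / ny
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    pooled_var = (ss_x + ss_y) / (nx + ny - 2)
    se = sqrt(pooled_var / nx + pooled_var / ny)
    diff = mean_x - mean_y
    ci = (diff - t_crit * se, diff + t_crit * se)
    t_stat = diff / se  # statistic for H0: mu_x - mu_y = 0
    return diff, pooled_var, ci, t_stat

group_x = [5, 6, 7, 8, 9, 7]  # illustrative values only
group_y = [3, 4, 5, 6, 4, 2]
diff, pooled_var, ci, t_stat = pooled_difference(group_x, group_y, t_crit=2.23)
print(diff, pooled_var, ci, t_stat)
```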

For a proportion:

$$\hat{p} \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

For a difference in proportions, 2 independent samples:

$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
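The two proportion intervals can be sketched as follows. The counts are invented, and $Z_{\alpha/2}$ is taken from Python's `statistics.NormalDist` rather than from a table. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, conf=0.95):
    """Interval for one proportion: p_hat +/- Z * sqrt(p_hat(1 - p_hat)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def difference_ci(successes1, n1, successes2, n2, conf=0.95):
    """Interval for p1 - p2 using the unpooled standard error."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p1, p2 = successes1 / n1, successes2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

single_low, single_high = proportion_ci(40, 100)   # illustrative counts
diff_low, diff_high = difference_ci(40, 100, 25, 100)
print(single_low, single_high, diff_low, diff_high)
```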

For a correlation coefficient:

$$\hat{r} \pm t_{n-2,\alpha/2} \times \sqrt{\frac{1 - \hat{r}^2}{n-2}}$$

For a regression coefficient:

$$\hat{\beta} \pm t_{n-2,\alpha/2} \times \sqrt{\frac{s^2}{SS_x}} \qquad \left[\hat{\beta} = \frac{SS_{xy}}{SS_x};\quad s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}\right]$$
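A sketch of the slope interval for simple linear regression. Two things here are assumptions not found on the sheet: the intercept is computed with the usual least-squares formula $\bar{y} - \hat{\beta}\bar{x}$, and the critical value 3.182 ($t_{3,\alpha/2}$ at 95%) comes from a standard t table because the sheet's table starts at 10 degrees of freedom. The data and the name `slope_ci` are invented.

```python
from math import sqrt

def slope_ci(xs, ys, t_crit):
    """Slope estimate, residual variance, confidence interval and t statistic."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = ss_xy / ss_x
    intercept = mean_y - beta * mean_x  # standard least squares; not on the sheet
    residual_ss = sum((y - (intercept + beta * x)) ** 2 for x, y in zip(xs, ys))
    s2 = residual_ss / (n - 2)
    se = sqrt(s2 / ss_x)
    ci = (beta - t_crit * se, beta + t_crit * se)
    t_stat = beta / se  # statistic for H0: beta = 0
    return beta, s2, ci, t_stat

xs = [1, 2, 3, 4, 5]  # illustrative values only
ys = [2, 4, 5, 4, 5]
beta, s2, ci, t_stat = slope_ci(xs, ys, t_crit=3.182)  # 3 d.f., 95% level
print(beta, s2, ci, t_stat)
```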

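Common values of t and Z

| Confidence level | $t_{10,\alpha/2}$ | $t_{20,\alpha/2}$ | $t_{30,\alpha/2}$ | $t_{50,\alpha/2}$ | $t_{100,\alpha/2}$ | $Z_{\alpha/2}$ |
|---|---|---|---|---|---|---|
| 90% | 1.81 | 1.73 | 1.70 | 1.68 | 1.66 | 1.64 |
| 95% | 2.23 | 2.09 | 2.04 | 2.01 | 1.98 | 1.96 |
| 99% | 3.17 | 2.85 | 2.75 | 2.68 | 2.63 | 2.58 |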


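For an odds ratio:

95% confidence limits:

$$OR \times \exp\left(-1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right), \quad OR \times \exp\left(+1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)$$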

For a risk ratio:

95% confidence limits:

$$RR \times \exp\left(-1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right), \quad RR \times \exp\left(+1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right)$$
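A sketch of both sets of limits for a 2x2 table. The sheet does not define the cell labels, so this assumes the usual layout: a = exposed with outcome, b = exposed without outcome, c = unexposed with outcome, d = unexposed without outcome, giving OR = ad/bc and RR = [a/(a+b)]/[c/(c+d)]. The counts and function names are invented.

```python
from math import exp, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with lower and upper confidence limits."""
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, odds_ratio * exp(-z * se_log), odds_ratio * exp(z * se_log)

def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk ratio with lower and upper confidence limits."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_ratio = risk_exposed / risk_unexposed
    se_log = sqrt((1 - risk_exposed) / a + (1 - risk_unexposed) / c)
    return risk_ratio, risk_ratio * exp(-z * se_log), risk_ratio * exp(z * se_log)

# Illustrative 2x2 table (assumed layout described above)
or_est, or_low, or_high = odds_ratio_ci(20, 80, 10, 90)
rr_est, rr_low, rr_high = risk_ratio_ci(20, 80, 10, 90)
print(or_est, or_low, or_high)
print(rr_est, rr_low, rr_high)
```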


Corresponding hypothesis tests

Test for $H_0$: $\mu = \mu_0$ ($\sigma^2$ unknown):

$$t_{n-1} = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$$

Test for $H_0$: $\mu_d = 0$ ($\sigma^2$ unknown):

$$t_{n-1} = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$$

Test for $H_0$: $\mu_x - \mu_y = 0$ ($\sigma^2$ unknown, but roughly equal):

$$t_{n-2} = \frac{(\bar{x} - \bar{y}) - 0}{\sqrt{\dfrac{s_p^2}{n_x} + \dfrac{s_p^2}{n_y}}}$$

Test for $H_0$: $p = p_0$:

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

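Test for $H_0$: $p_1 - p_2 = 0$:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}}; \qquad \bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$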
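A sketch of the two-proportion Z test, which uses the pooled proportion $\bar{p}$ in the standard error. The counts are invented, the two-sided p-value comes from `statistics.NormalDist`, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes1, n1, successes2, n2):
    """Return the pooled proportion, Z statistic and two-sided p-value."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, z, p_value

pooled, z, p_value = two_proportion_z_test(40, 100, 25, 100)  # illustrative counts
print(pooled, z, p_value)
```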

Test for $H_0$: $r = 0$:

$$t_{n-2} = \frac{\hat{r} - 0}{\sqrt{\dfrac{1 - \hat{r}^2}{n-2}}}$$

Test for $H_0$: $\beta = 0$:

$$t_{n-2} = \frac{\hat{\beta} - 0}{\sqrt{\dfrac{s^2}{SS_x}}}$$


Corresponding sample size/power


Sample size required to test $H_0$: $\mu_d = 0$ (paired difference t-test):

$$n = \frac{\sigma_d^2 (Z_{power} + Z_{\alpha/2})^2}{\bar{d}^2}$$

Corresponding power for a given n:

$$Z_{power} = \frac{\bar{d}}{\sigma_d}\sqrt{n} - Z_{\alpha/2}$$
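A sketch of the paired sample size and power formulas. The planning values (standard deviation of differences 10, mean difference 5, 80% power, two-sided alpha 0.05) are invented, and rounding n up to the next whole pair is a common convention that the sheet does not state. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def paired_sample_size(sd_diff, mean_diff, power=0.80, alpha=0.05):
    """Number of pairs needed, rounded up."""
    z_power = STD_NORMAL.inv_cdf(power)
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return ceil(sd_diff ** 2 * (z_power + z_alpha) ** 2 / mean_diff ** 2)

def paired_power(n, sd_diff, mean_diff, alpha=0.05):
    """Power achieved with n pairs."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = mean_diff / sd_diff * sqrt(n) - z_alpha
    return STD_NORMAL.cdf(z_power)

n_pairs = paired_sample_size(sd_diff=10, mean_diff=5)
achieved = paired_power(n_pairs, sd_diff=10, mean_diff=5)
print(n_pairs, achieved)
```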

Smaller group sample size required to test $H_0$: $\mu_x - \mu_y = 0$ (two sample t-test):
(where r = ratio of larger group to smaller group)

$$n_{smaller} = \frac{(r + 1)\,\sigma^2 (Z_{power} + Z_{\alpha/2})^2}{r\,(\mu_x - \mu_y)^2}$$

Corresponding power for a given n (n = size of the smaller group):

$$Z_{power} = \frac{\mu_x - \mu_y}{\sigma}\sqrt{\frac{nr}{r + 1}} - Z_{\alpha/2}$$

Smaller group sample size required to test $H_0$: $p_1 - p_2 = 0$ (difference in two proportions):
(where r = ratio of larger group to smaller group)

$$n_{smaller} = \frac{(r + 1)\,\bar{p}(1 - \bar{p})(Z_{power} + Z_{\alpha/2})^2}{r\,(p_1 - p_2)^2}$$

Corresponding power for a given n (n = size of the smaller group):

$$Z_{power} = \frac{p_1 - p_2}{\sqrt{\bar{p}(1 - \bar{p})}}\sqrt{\frac{nr}{r + 1}} - Z_{\alpha/2}$$
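The two-group sample size formulas for means and for proportions share the same $(r+1)/r$ structure, which a short sketch makes visible. All planning values are invented, the average proportion is passed in directly as `p_bar`, and rounding up is a convention not stated on the sheet. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def z_sum_squared(power, alpha):
    """(Z_power + Z_alpha/2)^2, the factor common to both formulas."""
    std_normal = NormalDist()
    return (std_normal.inv_cdf(power) + std_normal.inv_cdf(1 - alpha / 2)) ** 2

def smaller_group_n_means(sigma, diff, r=1.0, power=0.80, alpha=0.05):
    """Smaller-group size for comparing two means."""
    return ceil((r + 1) * sigma ** 2 * z_sum_squared(power, alpha) / (r * diff ** 2))

def smaller_group_n_props(p1, p2, p_bar, r=1.0, power=0.80, alpha=0.05):
    """Smaller-group size for comparing two proportions."""
    numerator = (r + 1) * p_bar * (1 - p_bar) * z_sum_squared(power, alpha)
    return ceil(numerator / (r * (p1 - p2) ** 2))

n_equal = smaller_group_n_means(sigma=10, diff=5)          # r = 1
n_unequal = smaller_group_n_means(sigma=10, diff=5, r=2)   # larger group twice the size
n_props = smaller_group_n_props(p1=0.40, p2=0.25, p_bar=0.325)
print(n_equal, n_unequal, n_props)
```

With r = 2 the smaller group shrinks, but the total enrolled (smaller group plus a larger group twice its size) is higher than with equal groups.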

Sample size required to test $H_0$: $r = 0$ (correlation/equivalent to simple linear regression):
(where r = the expected correlation coefficient)

$$n = \frac{(1 - r^2)(Z_{power} + Z_{\alpha/2})^2}{r^2} + 2$$

Corresponding power for a given n:

$$Z_{power} = \frac{r}{\sqrt{1 - r^2}}\sqrt{n - 2} - Z_{\alpha/2}$$
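A sketch of the correlation sample size and power formulas, assuming an expected correlation of 0.3, 80% power and two-sided alpha 0.05 (all invented planning values). Rounding up is a convention not stated on the sheet, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def correlation_sample_size(r, power=0.80, alpha=0.05):
    """Sample size to detect correlation r, rounded up."""
    z_sum = STD_NORMAL.inv_cdf(power) + STD_NORMAL.inv_cdf(1 - alpha / 2)
    return ceil((1 - r ** 2) * z_sum ** 2 / r ** 2 + 2)

def correlation_power(n, r, alpha=0.05):
    """Power to detect correlation r with n observations."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = r / sqrt(1 - r ** 2) * sqrt(n - 2) - z_alpha
    return STD_NORMAL.cdf(z_power)

n_needed = correlation_sample_size(0.3)
power_at_n = correlation_power(n_needed, 0.3)
print(n_needed, power_at_n)
```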


Common values of $Z_{power}$

| $Z_{power}$ | 0.25 | 0.52 | 0.84 | 1.28 | 1.64 | 2.33 |
|---|---|---|---|---|---|---|
| Power | 60% | 70% | 80% | 90% | 95% | 99% |

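The tabulated $Z_{power}$ values are standard normal quantiles, so they can be reproduced with Python's `statistics.NormalDist`. This is only a cross-check of the table, not part of the original sheet.

```python
from statistics import NormalDist

powers = [0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
# Z_power is the standard normal quantile at the desired power
z_power = [round(NormalDist().inv_cdf(p), 2) for p in powers]
for power, z in zip(powers, z_power):
    print(f"power {power:.0%}: Z_power = {z:.2f}")
```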
Linear regression

Assumptions of Linear Regression

Linear regression assumes that…


1. The relationship between X and Y is linear
2. Y is distributed normally at each value of X
3. The variance of Y at every value of X is the same (homogeneity of variances)


ANOVA TABLE

| Source of variation | d.f. | Sum of squares | Mean sum of squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Between (k groups) | $k-1$ | $SSB = n\sum_{i=1}^{k} (\bar{y}_i - \bar{y})^2$ | $\dfrac{SSB}{k-1}$ | $F_{k-1,\,nk-k} = \dfrac{SSB/(k-1)}{SSW/(nk-k)}$ | Go to F chart |
| Within | $nk-k$ | $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$ | $s^2 = \dfrac{SSW}{nk-k}$ | | |
| Total variation | $nk-1$ | $TSS = SS_y = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y})^2$ | | | |

Coefficient of Determination:

$$r^2 = R^2 = \frac{\text{variation explained by the predictor}}{\text{total variation in the outcome}} = \frac{SSB}{TSS} = 1 - \frac{SSW}{TSS}$$

ANOVA TABLE FOR linear regression (more general) case

| Source of variation | d.f. | Sum of squares | Mean sum of squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Model (k levels of X) | $k-1$ | $SSM = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$ | $\dfrac{SSM}{k-1}$ | $F_{k-1,\,N-k} = \dfrac{SSM/(k-1)}{SSE/(N-k)}$ | Go to F chart |
| Error | $N-k$ | $SSE = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ | $s^2 = \dfrac{SSE}{N-k}$ | | |
| Total variation | $N-1$ | $TSS = SS_y = \sum_{i=1}^{N} (y_i - \bar{y})^2$ | | | |

Coefficient of Determination:

$$r^2 = R^2 = \frac{\text{variation explained by the predictor}}{\text{total variation in the outcome}} = \frac{SSM}{TSS} = 1 - \frac{SSE}{TSS}$$
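A sketch of the balanced one-way ANOVA computations (k groups of n observations each, as the first table assumes). The three groups are invented and the function name `one_way_anova` is hypothetical. The p-value step is left out because it requires an F table or a statistics library.

```python
def one_way_anova(groups):
    """Return SSB, SSW, TSS, the F statistic and R^2 for equal-sized groups."""
    k = len(groups)
    n = len(groups[0])  # observations per group (balanced design assumed)
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / n for g in groups]
    ssb = n * sum((m - grand_mean) ** 2 for m in group_means)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    tss = sum((y - grand_mean) ** 2 for g in groups for y in g)
    f_stat = (ssb / (k - 1)) / (ssw / (n * k - k))
    r_squared = ssb / tss
    return ssb, ssw, tss, f_stat, r_squared

groups = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]  # illustrative values only
ssb, ssw, tss, f_stat, r_squared = one_way_anova(groups)
print(ssb, ssw, tss, f_stat, r_squared)
```

For these data SSB + SSW equals TSS, which is the decomposition behind the two equivalent expressions for the coefficient of determination.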


Probability distributions often used in statistics:

T-distribution

Given n independent observations $x_i$, $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

The Chi-Square Distribution

$\chi^2_n = \sum_{i=1}^{n} Z_i^2$; where Z ~ Normal(0,1)

$E(\chi^2_n) = n$
$Var(\chi^2_n) = 2n$

The F-Distribution

$F_{n,m} = \dfrac{\chi^2_n / n}{\chi^2_m / m}$


Summary of common statistical tests for epidemiology/clinical research:


Choice of appropriate statistical test or measure of association for various types of data by study
design.

Types of variables to be analyzed

Cross-sectional/case-control studies

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association |
|---|---|---|
| Binary | Continuous | T-test* |
| Categorical | Continuous | ANOVA* |
| Continuous | Continuous | Simple linear regression |
| Multivariate (categorical and continuous) | Continuous | Multiple linear regression |
| Categorical | Categorical | Chi-square test§ |
| Binary | Binary | Odds ratio, Mantel-Haenszel OR |
| Multivariate (categorical and continuous) | Binary | Logistic regression |

Cohort Studies/Clinical Trials

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association |
|---|---|---|
| Binary | Binary | Relative risk |
| Categorical | Time-to-event | Kaplan-Meier curve/log-rank test |
| Multivariate (categorical and continuous) | Time-to-event | Cox proportional hazards model |
| Categorical | Continuous (repeated) | Repeated-measures ANOVA |
| Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures |

*Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small.
§Fisher's exact test is used when the expected cell counts are below 5.


Course coverage in the HRP statistics sequence:


Choice of appropriate statistical test or measure of association for various types of data by study
design.

Types of variables to be analyzed

Cross-sectional/case-control studies

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association | Course |
|---|---|---|---|
| Binary | Continuous | T-test* | |
| Categorical | Continuous | ANOVA* | |
| Continuous | Continuous | Simple linear regression | |
| Multivariate (categorical and continuous) | Continuous | Multiple linear regression | HRP259 |
| Categorical | Categorical | Chi-square test§ | |
| Binary | Binary | Odds ratio, Mantel-Haenszel OR | |
| Multivariate (categorical and continuous) | Binary | Logistic regression | HRP261 |

Cohort Studies/Clinical Trials

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association | Course |
|---|---|---|---|
| Binary | Binary | Risk ratio | |
| Categorical | Time-to-event | Kaplan-Meier curve/log-rank test | |
| Multivariate (categorical and continuous) | Time-to-event | Cox proportional hazards model (hazard ratios) | HRP262 |
| Categorical | Continuous (repeated) | Repeated-measures ANOVA | |
| Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures | |

*Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small.
§Fisher's exact test is used when the expected cell counts are below 5.


Corresponding SAS PROCs:


Choice of appropriate statistical test or measure of association for various types of data by study
design.

Types of variables to be analyzed

Cross-sectional/case-control studies

| Predictor | Outcome | Statistical procedure or measure of association | SAS PROC |
|---|---|---|---|
| Binary | Continuous | T-test* | PROC TTEST |
| Categorical | Continuous | ANOVA* | PROC ANOVA |
| Continuous | Continuous | Simple linear regression | PROC REG |
| Multivariate (categorical/continuous) | Continuous | Multiple linear regression | PROC GLM |
| Categorical | Categorical | Chi-square test§ | PROC FREQ |
| Binary | Binary | Odds ratio, Mantel-Haenszel OR | PROC FREQ |
| Multivariate (categorical/continuous) | Binary | Logistic regression | PROC LOGISTIC |

Cohort Studies/Clinical Trials

| Predictor | Outcome | Statistical procedure or measure of association | SAS PROC |
|---|---|---|---|
| Binary | Binary | Risk ratio | PROC FREQ |
| Categorical | Time-to-event | Kaplan-Meier curve/log-rank test | PROC LIFETEST |
| Multivariate (categorical and continuous) | Time-to-event | Cox proportional hazards model (hazard ratios) | PROC PHREG |
| Categorical | Continuous (repeated) | Repeated-measures ANOVA | PROC GLM |
| Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures | PROC MIXED |

*Non-parametric equivalents: PROC NPAR1WAY
§Fisher's exact test: PROC FREQ, option: exact
