Handy Reference Sheet 2 - HRP 259 Calculation Formulas For Sample Data

Handy Reference II
HANDY REFERENCE SHEET 2 – HRP 259
Calculation Formulas for Sample Data:

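1. Univariate
Sample proportion: $\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_i = 1$ if success and $x_i = 0$ if failure
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Sum of squares of x: $SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$ [to ease computation: $SS_x = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$]
Sample variance: $s_x^2 = \frac{SS_x}{n-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
Sample standard deviation: $s_x = \sqrt{\frac{SS_x}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
Standard error of the sample mean: $\frac{s_x}{\sqrt{n}} = \frac{1}{\sqrt{n}} \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$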
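As a rough illustration of the univariate formulas, the sketch below computes each quantity with plain Python. The function name `univariate_summary` and the data are invented for this example; the sum of squares uses the computational shortcut rather than the definition.

```python
import math

def univariate_summary(xs):
    """Return the sample statistics from the univariate formulas."""
    n = len(xs)
    mean = sum(xs) / n
    # Computational shortcut: SS_x = sum(x_i^2) - n * mean^2
    ss_x = sum(x * x for x in xs) - n * mean ** 2
    variance = ss_x / (n - 1)
    sd = math.sqrt(variance)
    se = sd / math.sqrt(n)  # standard error of the sample mean
    return {"n": n, "mean": mean, "ss": ss_x,
            "variance": variance, "sd": sd, "se": se}

# Hypothetical data, invented for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
stats = univariate_summary(sample)

# A sample proportion is the mean of 0/1 indicators (1 = success).
outcomes = [1, 0, 1, 1, 0]
p_hat = sum(outcomes) / len(outcomes)
```

For this sample the mean is 5 and the sum of squares is 32, so the variance is 32/7.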
2. Bivariate
Sum of squares of xy: $SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ [to ease computation: $SS_{xy} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$]
Sample covariance: $s_{xy} = \frac{SS_{xy}}{n-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
Sample correlation: $\hat{r} = \frac{s_{xy}}{\sqrt{s_x^2 s_y^2}} = \frac{SS_{xy}}{\sqrt{SS_x SS_y}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
Variance rules for correlated random variables:
Var(x+y) = Var(x) + Var(y) + 2Cov(x,y); Var(x-y) = Var(x) + Var(y) - 2Cov(x,y)
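A minimal sketch of the bivariate formulas, with a numerical check of the first variance rule on sample data. The helper names and the data are assumptions made for this example.

```python
import math

def ss_xy(xs, ys):
    """Sum of cross-products about the means; ss_xy(xs, xs) gives SS_x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

def covariance(xs, ys):
    return ss_xy(xs, ys) / (len(xs) - 1)

def correlation(xs, ys):
    return ss_xy(xs, ys) / math.sqrt(ss_xy(xs, xs) * ss_xy(ys, ys))

# Hypothetical paired data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
cov = covariance(xs, ys)
r = correlation(xs, ys)

# Variance rule: Var(x + y) = Var(x) + Var(y) + 2 Cov(x, y)
sums = [x + y for x, y in zip(xs, ys)]
lhs = covariance(sums, sums)
rhs = covariance(xs, xs) + covariance(ys, ys) + 2 * cov
```

Here `lhs` and `rhs` agree (both equal 7), because the rule holds exactly for sample variances and covariances as well as for the population quantities.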
Hypothesis Testing
The Steps:
1. Define your hypotheses (null, alternative)
2. Specify your null distribution
3. Do an experiment
4. Calculate the p-value of what you observed
5. Reject or fail to reject (~accept) the null hypothesis
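The five steps can be walked through with a small exact binomial test. This is only a sketch: the coin-toss scenario, the observed count, and the significance level of 0.05 are invented for illustration and are not part of the sheet.

```python
from math import comb

# Step 1: H0: p = 0.5 (fair coin); alternative: p != 0.5.
# Step 2: under H0 the number of heads in n tosses is Binomial(n, 0.5).
n, p0 = 20, 0.5

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Step 3: the "experiment" (hypothetical result).
observed_heads = 15

# Step 4: two-sided p-value = probability, under H0, of every outcome
# that is no more likely than the one observed.
p_obs = binom_pmf(observed_heads, n, p0)
p_value = sum(binom_pmf(k, n, p0) for k in range(n + 1)
              if binom_pmf(k, n, p0) <= p_obs + 1e-12)

# Step 5: compare with the chosen significance level.
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
```

With 15 heads in 20 tosses the two-sided p-value is about 0.041, so H0 is rejected at the 0.05 level.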
The Errors
Your statistical decision (rows) against the true state of the null hypothesis (columns):
Your statistical decision | H0 True | H0 False
Reject H0 | Type I error (α) | Correct
Do not reject H0 | Correct | Type II error (β)
Power = 1 - β
Confidence intervals (estimation)

For a mean (σ² unknown):
$\bar{x} \pm t_{n-1,\alpha/2} \times \frac{s_x}{\sqrt{n}}$ [if variance known or large sample size, $t_{df,\alpha/2} \approx Z_{\alpha/2}$]
For a paired difference (σ² unknown):
$\bar{d} \pm t_{n-1,\alpha/2} \times \frac{s_d}{\sqrt{n}}$ [where $d_i$ = the within-pair difference]
For a difference in means, 2 independent samples (σ²'s unknown but roughly equal):
$(\bar{x} - \bar{y}) \pm t_{n-2,\alpha/2} \times \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$, where $s_p^2 = \frac{SS_x + SS_y}{n-2}$ or $\frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n-2}$
For a proportion:
$\hat{p} \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
For a difference in proportions, 2 independent samples:
$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
For a correlation coefficient:
$\hat{r} \pm t_{n-2,\alpha/2} \times \sqrt{\frac{1-\hat{r}^2}{n-2}}$
For a regression coefficient:
$\hat{\beta} \pm t_{n-2,\alpha/2} \times \sqrt{\frac{s^2}{SS_x}}$ [where $\hat{\beta} = \frac{SS_{xy}}{SS_x}$; $s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}$]
Common values of t and Z

Confidence level | $t_{10,\alpha/2}$ | $t_{20,\alpha/2}$ | $t_{30,\alpha/2}$ | $t_{50,\alpha/2}$ | $t_{100,\alpha/2}$ | $Z_{\alpha/2}$
90% | 1.81 | 1.73 | 1.70 | 1.68 | 1.66 | 1.64
95% | 2.23 | 2.09 | 2.04 | 2.01 | 1.98 | 1.96
99% | 3.17 | 2.85 | 2.75 | 2.68 | 2.63 | 2.58
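As a sketch of how the interval formulas combine with the tabled critical values, the code below builds 95% intervals for a mean and for a proportion. The data, the counts, and the function names are hypothetical; the critical values 2.23 and 1.96 are taken from the table of common values.

```python
import math

# Critical values copied from the table of common values of t and Z.
T_10_95 = 2.23  # t with 10 d.f., 95% confidence
Z_95 = 1.96

def mean_ci(xs, t_crit):
    """x_bar +/- t * s_x / sqrt(n); t_crit must match d.f. = n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

def proportion_ci(successes, n, z_crit):
    """p_hat +/- Z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical data: 11 observations, so d.f. = n - 1 = 10.
data = [4.1, 5.0, 5.5, 4.8, 6.2, 5.1, 4.9, 5.7, 5.3, 4.6, 5.9]
mean_low, mean_high = mean_ci(data, T_10_95)

# Hypothetical counts: 40 successes out of 100.
prop_low, prop_high = proportion_ci(40, 100, Z_95)
```

For the proportion example the interval is roughly 0.304 to 0.496.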

For an odds ratio:
95% confidence limits: $OR \times \exp\left(-1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)$, $OR \times \exp\left(+1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)$
For a risk ratio:
95% confidence limits: $RR \times \exp\left(-1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right)$, $RR \times \exp\left(+1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right)$
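A minimal sketch of the odds ratio and risk ratio limits. The sheet does not define a, b, c, d. The code assumes the usual 2x2 layout (a = exposed with outcome, b = exposed without, c = unexposed with outcome, d = unexposed without), which is consistent with the a/(a+b) and c/(c+d) terms in the risk ratio formula. The counts are invented.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) using the log-scale standard error."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_hat, or_hat * math.exp(-z * se_log), or_hat * math.exp(z * se_log)

def risk_ratio_ci(a, b, c, d, z=1.96):
    """Return (RR, lower, upper) using the log-scale standard error."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr_hat = risk_exposed / risk_unexposed
    se_log = math.sqrt((1 - risk_exposed) / a + (1 - risk_unexposed) / c)
    return rr_hat, rr_hat * math.exp(-z * se_log), rr_hat * math.exp(z * se_log)

# Hypothetical 2x2 counts (assumed layout described in the text).
or_hat, or_low, or_high = odds_ratio_ci(20, 80, 10, 90)
rr_hat, rr_low, rr_high = risk_ratio_ci(20, 80, 10, 90)
```

Because the limits are built on the log scale, they are asymmetric around the point estimate: the estimate is the geometric mean of the two limits.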
Corresponding hypothesis tests
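Test for H0: μ = μ0 (σ² unknown):
$t_{n-1} = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$
Test for H0: μd = 0 (σ² unknown):
$t_{n-1} = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$
Test for H0: μx - μy = 0 (σ² unknown, but roughly equal):
$t_{n-2} = \frac{(\bar{x} - \bar{y}) - 0}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}}$
Test for H0: p = p0:
$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
Test for H0: p1 - p2 = 0:
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\bar{p}(1-\bar{p})}{n_1} + \frac{\bar{p}(1-\bar{p})}{n_2}}}$; $\bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
Test for H0: r = 0:
$t_{n-2} = \frac{\hat{r} - 0}{\sqrt{\frac{1-\hat{r}^2}{n-2}}}$
Test for H0: β = 0:
$t_{n-2} = \frac{\hat{\beta} - 0}{\sqrt{\frac{s^2}{SS_x}}}$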
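Two of these test statistics are sketched below: the pooled two-sample t and the two-proportion Z. The function names and data are hypothetical. The Z p-value uses the standard normal CDF written with `math.erf`, whereas the t statistic would still need a t table with n - 2 degrees of freedom, since the standard library has no t distribution.

```python
import math

def pooled_t(xs, ys):
    """t statistic with n_x + n_y - 2 d.f. for H0: mu_x - mu_y = 0."""
    nx, ny = len(xs), len(ys)
    x_bar, y_bar = sum(xs) / nx, sum(ys) / ny
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    s2_p = (ss_x + ss_y) / (nx + ny - 2)  # pooled variance
    return (x_bar - y_bar) / math.sqrt(s2_p / nx + s2_p / ny)

def two_proportion_z(successes1, n1, successes2, n2):
    """Z statistic and two-sided p-value for H0: p1 - p2 = 0."""
    p1, p2 = successes1 / n1, successes2 / n2
    # Pooled proportion; equals (n1*p1 + n2*p2) / (n1 + n2).
    p_bar = (successes1 + successes2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    z = (p1 - p2) / se
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - normal_cdf)

# Hypothetical data, invented for illustration only.
t_stat = pooled_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
z_stat, z_p = two_proportion_z(30, 100, 20, 100)
```

In this example the t statistic is 4 on 8 degrees of freedom, and the Z statistic is about 1.63 with a two-sided p-value of about 0.10.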
Corresponding sample size/power

Sample size required to test H0: μd = 0 (paired difference t-test):
$n = \frac{\sigma_d^2 (Z_{power} + Z_{\alpha/2})^2}{d^2}$
Corresponding power for a given n:
$Z_{power} = \frac{d}{\sigma_d}\sqrt{n} - Z_{\alpha/2}$
Smaller group sample size required to test H0: μx - μy = 0 (two-sample t-test):
(where r = ratio of larger group to smaller group)
$n_{smaller} = \frac{(r+1)\,\sigma^2 (Z_{power} + Z_{\alpha/2})^2}{r(\mu_x - \mu_y)^2}$
Corresponding power for a given n:
$Z_{power} = \frac{\mu_x - \mu_y}{\sigma}\sqrt{\frac{nr}{r+1}} - Z_{\alpha/2}$
Smaller group sample size required to test H0: p1 - p2 = 0 (difference in two proportions):
$n_{smaller} = \frac{(r+1)\,\bar{p}(1-\bar{p})(Z_{power} + Z_{\alpha/2})^2}{r(p_1 - p_2)^2}$
Corresponding power for a given n:
$Z_{power} = \frac{p_1 - p_2}{\sqrt{\bar{p}(1-\bar{p})}}\sqrt{\frac{nr}{r+1}} - Z_{\alpha/2}$
Sample size required to test H0: r = 0 (correlation/equivalent to simple linear regression):
$n = \frac{(1-r^2)(Z_{power} + Z_{\alpha/2})^2}{r^2} + 2$
Corresponding power for a given n:
$Z_{power} = \frac{r}{\sqrt{1-r^2}}\sqrt{n-2} - Z_{\alpha/2}$
Common values of Zpower

Zpower: .25 .52 .84 1.28 1.64 2.33
Power: 60% 70% 80% 90% 95% 99%
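The sample-size and power formulas are inverses of each other, which the sketch below checks for the two-sample case. The inputs (σ = 10, a difference of 5, equal group sizes) are invented; 0.84 is the tabled Zpower for 80% power and 1.96 is Z for a two-sided α of 0.05. Function names are hypothetical.

```python
import math

Z_ALPHA_2 = 1.96   # two-sided alpha = 0.05
Z_POWER_80 = 0.84  # 80% power, from the table of common Zpower values

def n_paired(sigma_d, d, z_power=Z_POWER_80, z_alpha=Z_ALPHA_2):
    """Number of pairs needed for the paired-difference t-test."""
    return sigma_d ** 2 * (z_power + z_alpha) ** 2 / d ** 2

def n_smaller_two_means(sigma, diff, r=1.0,
                        z_power=Z_POWER_80, z_alpha=Z_ALPHA_2):
    """Smaller-group size; r = ratio of larger group to smaller group."""
    return (r + 1) * sigma ** 2 * (z_power + z_alpha) ** 2 / (r * diff ** 2)

def z_power_two_means(sigma, diff, n, r=1.0, z_alpha=Z_ALPHA_2):
    """Zpower achieved with n subjects in the smaller group."""
    return (diff / sigma) * math.sqrt(n * r / (r + 1)) - z_alpha

# Hypothetical planning values.
n_needed = n_smaller_two_means(sigma=10, diff=5)
n_rounded = math.ceil(n_needed)  # round up to whole subjects
z_back = z_power_two_means(sigma=10, diff=5, n=n_needed)
pairs_needed = n_paired(sigma_d=10, d=5)
```

With these inputs the formula gives 62.72, so 63 subjects per group, and feeding 62.72 back into the power formula recovers Zpower = 0.84.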
Linear regression
Assumptions of Linear Regression
Linear regression assumes that…

1. The relationship between X and Y is linear
2. Y is distributed normally at each value of X
3. The variance of Y at every value of X is the same (homogeneity of variances)
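To make the regression quantities on this sheet concrete, the sketch below fits a simple linear regression by hand using $\hat{\beta} = SS_{xy}/SS_x$ and $s^2 = \sum (y_i - \hat{y}_i)^2/(n-2)$. The intercept formula $\bar{y} - \hat{\beta}\bar{x}$ is the standard least-squares result and is not stated on the sheet; the data and function name are invented. The returned residuals are what one would plot against X to look informally at linearity and constant variance.

```python
import math

def simple_linear_regression(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = ss_xy / ss_x
    intercept = y_bar - beta * x_bar  # standard result, not on the sheet
    residuals = [y - (intercept + beta * x) for x, y in zip(xs, ys)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    t_stat = beta / math.sqrt(s2 / ss_x)  # test of H0: beta = 0, n - 2 d.f.
    return beta, intercept, s2, t_stat, residuals

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
beta, intercept, s2, t_stat, residuals = simple_linear_regression(xs, ys)
```

For these data the slope is 0.6, the intercept is 2.2, and the residual variance is 0.8.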
ANOVA TABLE
Source of variation | d.f. | Sum of squares | Mean sum of squares | F-statistic | p-value
Between (k groups) | k-1 | $SSB = n\sum_{i=1}^{k} (\bar{y}_i - \bar{y})^2$ | $\frac{SSB}{k-1}$ | $F_{k-1,\,nk-k} = \frac{SSB/(k-1)}{SSW/(nk-k)}$ | Go to $F_{k-1,\,nk-k}$ chart
Within | nk-k | $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$ | $s^2 = \frac{SSW}{nk-k}$ | |
Total variation | nk-1 | $TSS = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y})^2$ | | |
Coefficient of Determination: $r^2 = R^2 = \frac{\text{variation explained by the predictor}}{\text{total variation in the outcome}} = \frac{SSB}{TSS} = 1 - \frac{SSW}{TSS}$
ANOVA TABLE FOR linear regression (more general) case
Source of variation | d.f. | Sum of squares | Mean sum of squares | F-statistic | p-value
Model (k levels of X) | k-1 | $SSM = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$ | $\frac{SSM}{k-1}$ | $F_{k-1,\,N-k} = \frac{SSM/(k-1)}{SSE/(N-k)}$ | Go to $F_{k-1,\,N-k}$ chart
Error | N-k | $SSE = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ | $s^2 = \frac{SSE}{N-k}$ | |
Total variation | N-1 | $TSS = SS_y = \sum_{i=1}^{N} (y_i - \bar{y})^2$ | | |
Coefficient of Determination: $r^2 = R^2 = \frac{\text{variation explained by the predictor}}{\text{total variation in the outcome}} = \frac{SSM}{TSS} = 1 - \frac{SSE}{TSS}$
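A short sketch of the one-way ANOVA decomposition for k groups of equal size n, following the between/within/total layout of the ANOVA table. The data and the function name are invented. The p-value would still be read from an F chart with (k-1, nk-k) degrees of freedom.

```python
def one_way_anova(groups):
    """groups: list of k equally sized lists (n observations each)."""
    k = len(groups)
    n = len(groups[0])
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / n for g in groups]
    # Between-group, within-group, and total sums of squares.
    ssb = n * sum((m - grand_mean) ** 2 for m in group_means)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    tss = sum((y - grand_mean) ** 2 for g in groups for y in g)
    f_stat = (ssb / (k - 1)) / (ssw / (n * k - k))
    r_squared = ssb / tss
    return ssb, ssw, tss, f_stat, r_squared

# Hypothetical data: k = 3 groups, n = 3 observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ssb, ssw, tss, f_stat, r_squared = one_way_anova(groups)
```

Here SSB = 26 and SSW = 6 add up to TSS = 32, giving F = 13 on (2, 6) degrees of freedom and R² = 0.8125.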
Probability distributions often used in statistics:
T-distribution
Given n independent observations $x_i$, $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
The Chi-Square Distribution
$\chi^2_n = \sum_{i=1}^{n} Z_i^2$; where Z ~ Normal(0,1)
$E(\chi^2_n) = n$
$Var(\chi^2_n) = 2n$
The F-Distribution
$F_{n,m} = \frac{\chi^2_n / n}{\chi^2_m / m}$
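The chi-square mean and variance can be checked by simulation, building each draw as a sum of squared standard normals exactly as in the definition. This is a rough sketch: the degrees of freedom, the number of draws, and the seed are arbitrary choices made for the example.

```python
import random

random.seed(0)  # arbitrary seed so the run is repeatable

def chi_square_draw(df):
    """One chi-square draw: the sum of df squared Normal(0,1) values."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

def f_draw(n, m):
    """One F(n, m) draw: ratio of two chi-squares, each over its d.f."""
    return (chi_square_draw(n) / n) / (chi_square_draw(m) / m)

df = 5
draws = [chi_square_draw(df) for _ in range(20000)]
sim_mean = sum(draws) / len(draws)
sim_var = sum((x - sim_mean) ** 2 for x in draws) / (len(draws) - 1)

one_f = f_draw(3, 12)
```

With 5 degrees of freedom the simulated mean should land near 5 and the simulated variance near 10, up to sampling noise.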
Summary of common statistical tests for epidemiology/clinical research:

Choice of appropriate statistical test or measure of association for various types of data by study design.
Types of variables to be analyzed
Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association
Cross-sectional/case-control studies
Binary | Continuous | T-test*
Categorical | Continuous | ANOVA*
Continuous | Continuous | Simple linear regression
Multivariate (categorical and continuous) | Continuous | Multiple linear regression
Categorical | Categorical | Chi-square test§
Binary | Binary | Odds ratio, Mantel-Haenszel OR
Multivariate (categorical and continuous) | Binary | Logistic regression
Cohort Studies/Clinical Trials
Binary | Binary | Relative risk
Categorical | Time-to-event | Kaplan-Meier curve/log-rank test
Multivariate (categorical and continuous) | Time-to-event | Cox-proportional hazards model
Categorical | Continuous (repeated) | Repeated-measures ANOVA
Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures
*Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small.
§Fisher's exact test is used when the expected cell counts are less than 5.
Course coverage in the HRP statistics sequence:

Types of variables to be analyzed
Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure | Course (shown on the row where the label sits in the original layout)
Cross-sectional/case-control studies
Binary | Continuous | T-test* |
Categorical | Continuous | ANOVA* |
Continuous | Continuous | Simple linear regression |
Multivariate (categorical and continuous) | Continuous | Multiple linear regression | HRP259
Categorical | Categorical | Chi-square test§ |
Binary | Binary | Odds ratio, Mantel-Haenszel OR |
Multivariate (categorical and continuous) | Binary | Logistic regression | HRP261
Cohort Studies/Clinical Trials
Binary | Binary | Risk ratio |
Categorical | Time-to-event | Kaplan-Meier curve/log-rank test |
Multivariate (categorical and continuous) | Time-to-event | Cox-proportional hazards model (hazard ratios) | HRP262
Categorical | Continuous (repeated) | Repeated-measures ANOVA |
Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures |
*Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small.
§Fisher's exact test is used when the expected cell counts are less than 5.
Corresponding SAS PROCs:

Predictor | Outcome | Statistical procedure | SAS PROC
Binary | Continuous | T-test* | PROC TTEST
Categorical | Continuous | ANOVA* | PROC ANOVA
Continuous | Continuous | Simple linear regression | PROC REG
Multivariate (categorical/continuous) | Continuous | Multiple linear regression | PROC GLM
Categorical | Categorical | Chi-square test§ | PROC FREQ
Binary | Binary | Odds ratio, Mantel-Haenszel OR | PROC FREQ
Multivariate (categorical/continuous) | Binary | Logistic regression | PROC LOGISTIC
Binary | Binary | Risk ratio | PROC FREQ
Categorical | Time-to-event | Kaplan-Meier curve/log-rank test | PROC LIFETEST
Multivariate (categorical and continuous) | Time-to-event | Cox-proportional hazards model (hazard ratios) | PROC PHREG
Categorical | Continuous (repeated) | Repeated-measures ANOVA | PROC GLM
Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures | PROC MIXED
*Non-parametric equivalents: PROC NPAR1WAY; §Fisher's exact test: PROC FREQ, option: exact
