Creative Commons Attribution-Noncommercial-Sharealike License
Copyright 2009, The Johns Hopkins University and Saifuddin Ahmed. All rights reserved. Use of these materials
permitted only in accordance with license rights granted. Materials provided AS IS; no representations or
warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently
review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for
obtaining permissions for use from third parties as needed.
Sample Weighting
The purpose of weighting sample data is to
improve the representativeness of the sample in
terms of:
size,
distribution, and
characteristics of the study population.
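The estimated population total is

$$T(y) = N\bar{y},$$

where $N$ is the population size and $\bar{y}$ is the estimated (sample) mean.
We may rewrite this as:

$$T(y) = \frac{N}{n} \cdot n \cdot \bar{y}$$

That is, each of the $n$ sampled units is given the weight $N/n$, the inverse of the sampling fraction $f = n/N$.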
An example
N = 300,000
n = 300
Sampling fraction (f) = 300/300,000 = 0.001
$\bar{y}$ (mean) = 0.5 (50% of children are immunized)
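A minimal Python sketch of the expansion estimator applied to this example, assuming simple random sampling so that every sampled child carries the same weight $N/n$. The function name `expansion_total` is hypothetical, introduced only for illustration.

```python
def expansion_total(N: int, n: int, y_bar: float) -> float:
    """Estimate a population total as (N/n) * n * y_bar, which equals N * y_bar."""
    w = N / n                # weight carried by each sampled unit (= 1/f)
    return w * n * y_bar     # weighted count of immunized children in the sample

N, n, y_bar = 300_000, 300, 0.5
print(n / N)                         # sampling fraction f: 0.001
print(N / n)                         # weight per sampled child: 1000.0
print(expansion_total(N, n, y_bar))  # estimated number immunized: 150000.0
```

Each sampled child stands for 1,000 children in the population, so the 150 immunized children in the sample expand to an estimated 150,000.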
ta v005

     weight |      Freq.     Percent        Cum.
------------+-----------------------------------
     199788 |        326        5.15        5.15
     203687 |        107        1.69        6.84
     342747 |         91        1.44        8.27
     352024 |        312        4.93       13.20
     473247 |        262        4.14       17.33
     571152 |        248        3.91       21.25
     726240 |        267        4.21       25.46
     728423 |        128        2.02       27.48
     792283 |         81        1.28       28.76
     836419 |        187        2.95       31.71
     845095 |        294        4.64       36.35
     851062 |        138        2.18       38.53
     907765 |        384        6.06       44.59
     930111 |        242        3.82       48.41
     967833 |        134        2.12       50.53
     979842 |        391        6.17       56.70
    1005824 |        348        5.49       62.19
    1026552 |         67        1.06       63.25
    1068896 |        145        2.29       65.54
    1083215 |        496        7.83       73.37
    1095179 |         79        1.25       74.62
    1106982 |        120        1.89       76.51
    1224149 |        417        6.58       83.09
    1312089 |         76        1.20       84.29
    1465329 |        884       13.95       98.25
    1608275 |        111        1.75      100.00
------------+-----------------------------------
      Total |      6,335      100.00
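A Python sketch that rebuilds the Percent and Cum. columns of the tabulation from the frequencies, then looks at the spread of the weights. It assumes, following the DHS convention for `v005`, that the stored integers carry six implied decimals; the slide itself does not state this.

```python
# (v005, frequency) pairs copied from the tabulation above
rows = [
    (199788, 326), (203687, 107), (342747, 91), (352024, 312),
    (473247, 262), (571152, 248), (726240, 267), (728423, 128),
    (792283, 81), (836419, 187), (845095, 294), (851062, 138),
    (907765, 384), (930111, 242), (967833, 134), (979842, 391),
    (1005824, 348), (1026552, 67), (1068896, 145), (1083215, 496),
    (1095179, 79), (1106982, 120), (1224149, 417), (1312089, 76),
    (1465329, 884), (1608275, 111),
]

total = sum(freq for _, freq in rows)
cum = 0
for v005, freq in rows:
    cum += freq
    # Percent and Cum. are shares of the unweighted number of respondents
    print(f"{v005:>11} | {freq:>10} {100 * freq / total:>11.2f} {100 * cum / total:>11.2f}")
print(f"{'Total':>11} | {total:>10,} {100:>11.2f}")

# Assumption (DHS convention): v005 is stored with six implied decimals
weights = [v005 / 1_000_000 for v005, _ in rows]
print(f"smallest {min(weights)}, largest {max(weights)}, ratio {max(weights) / min(weights):.2f}")
```

Under that assumption the largest weight is about eight times the smallest, so respondents in some strata count far more heavily than others in any weighted estimate.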
Why do we weight?
To improve the representativeness of the
sample in terms of size, distribution and
characteristics of the study population.
To ensure that the estimates are unbiased.
When to weight?
For adequate representation of smaller
domains (e.g., residence, geographical
territories, race, sex)
When a fixed sample size is allocated to
geographical areas of different population size
To correct for defects in the sampling frame,
errors in selection, or high non-response
Disadvantages
Increased complexity
Inconvenience
Increased variance with haphazard weighting
Analysis/statistical programming
Cost
Greater possibility of error
Increased bias with incorrect weights
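One common way to quantify the variance cost of unequal weights is Kish's approximate design effect, $n\sum w_i^2 / (\sum w_i)^2$. This is a standard rule of thumb swapped in for illustration, not a formula from these slides, and it assumes the weights are unrelated to the outcome. The sketch applies it to equal weights and to the weights 100, 30, 60 and 10 from the four-region example, each attached to 500 sampled children.

```python
def kish_deff(weights):
    """Kish's approximate design effect due to unequal weighting: n * sum(w^2) / sum(w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

# Equal weights: no variance inflation
print(kish_deff([1] * 2000))  # 1.0

# Four-region example: weights 100, 30, 60, 10, each applied to 500 sampled children
region_weights = [100] * 500 + [30] * 500 + [60] * 500 + [10] * 500
print(kish_deff(region_weights))  # 1.46
```

Read loosely, a design effect of 1.46 means the 2,000 weighted observations carry roughly the precision of about 1,370 equally weighted ones.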
Region A: N = 50,000, n = 500, p = 0.5 (# immunized = 500*0.5 = 250)
Region B: N = 15,000, n = 500, p = 0.6 (# immunized = 500*0.6 = 300)
Region C: N = 30,000, n = 500, p = 0.7 (# immunized = 500*0.7 = 350)
Region D: N = 5,000, n = 500, p = 0.8 (# immunized = 500*0.8 = 400)
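A short Python sketch that rebuilds the sample counts above from $n$ and $p$ for each region and then pools them without weights. The dictionary layout is an illustrative choice, not part of the original example. The unweighted pooled proportion treats every sampled child alike, even though Region D (N = 5,000) contributes as many children as Region A (N = 50,000).

```python
# Region: (population size N_h, sample size n_h, sample proportion immunized p_h)
regions = {
    "A": (50_000, 500, 0.5),
    "B": (15_000, 500, 0.6),
    "C": (30_000, 500, 0.7),
    "D": (5_000, 500, 0.8),
}

# Number immunized in each regional sample: n_h * p_h
immunized = {h: round(n * p) for h, (N, n, p) in regions.items()}
print(immunized)  # {'A': 250, 'B': 300, 'C': 350, 'D': 400}

# Unweighted pooled proportion: total immunized / total sampled
n_total = sum(n for _, n, _ in regions.values())
p_unweighted = sum(immunized.values()) / n_total
print(p_unweighted)  # 0.65
```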
$$w_h = \frac{N_h}{n_h}$$

$$w_A = \frac{50{,}000}{500} = 100$$

$$w_B = \frac{15{,}000}{500} = 30$$

$$w_C = \frac{30{,}000}{500} = 60$$

$$w_D = \frac{5{,}000}{500} = 10$$
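The weights can be checked in a few lines of Python. The sketch computes $w_h = N_h / n_h$ for each region and confirms that weighting the sample back up recovers the total population, $\sum_h n_h w_h = N = 100{,}000$.

```python
# Region: (population size N_h, sample size n_h)
regions = {"A": (50_000, 500), "B": (15_000, 500), "C": (30_000, 500), "D": (5_000, 500)}

# Design weight w_h = N_h / n_h: population members represented by each sampled child
weights = {h: N / n for h, (N, n) in regions.items()}
print(weights)  # {'A': 100.0, 'B': 30.0, 'C': 60.0, 'D': 10.0}

# n_h * w_h recovers N_h, so summing over regions gives the total population N
expanded = sum(weights[h] * n for h, (_, n) in regions.items())
print(expanded)  # 100000.0
```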
$$t = \sum_{h=1}^{H} \sum_{i=1}^{n_h} w_h\, y_{ih}$$

$$\bar{y} = \frac{t}{\sum_{h=1}^{H} \sum_{i=1}^{n_h} w_h}$$
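Because every region has the same sample size (n = 500), the weighted mean can be computed from the regional proportions:
. di (100*.5+30*.6+60*.7+10*.8)/ (100+30+60+10)
.59
or, equivalently, from the weighted number immunized divided by the total population N = 100,000:
. di (100*250+30*300+60*350+10*400)/ (50000+15000+30000+5000)
.59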
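The unit-level formula can also be applied directly in Python. This sketch expands each region into individual 0/1 records (an illustrative construction, since only the regional counts are given) and computes $t = \sum_h\sum_i w_h y_{ih}$ and $\bar{y} = t / \sum_h\sum_i w_h$.

```python
# Region: (weight w_h, sample size n_h, number immunized in the sample)
strata = {"A": (100, 500, 250), "B": (30, 500, 300), "C": (60, 500, 350), "D": (10, 500, 400)}

# One record per sampled child: (w_h, y_ih), with y_ih = 1 if the child is immunized
records = [
    (w, 1 if i < k else 0)
    for w, n, k in strata.values()
    for i in range(n)
]

t = sum(w * y for w, y in records)   # weighted total: sum over h and i of w_h * y_ih
sum_w = sum(w for w, _ in records)   # sum of the weights (equals N here)
y_bar = t / sum_w                    # weighted mean

print(len(records), t, sum_w, y_bar)  # 2000 59000 100000 0.59
```

The weighted mean of 0.59 sits well below the unweighted pooled proportion of 0.65 because the large, low-coverage Region A is given its full population share.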