Methods For Proportions


Relations between Categorical Variables


Chapter 10

Goals for Chapter 10

1. Standard deviations for proportion differences
2. Confidence intervals and hypothesis tests for proportion differences
3. Contingency tables for several proportions
4. Statistical significance: the chi-square statistic for a contingency table
5. Relative risk, increased risk, odds ratio

Difference in Sample Proportions-standard deviation, confidence interval

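(notation: $p_i = x_i/N_i$ is the observed proportion for sample i, i = 1 or 2 for two samples; $\pi_i$ is the corresponding population proportion; note: the text writes $p_i$ with a hat (caret) over it)
Estimated s.d. for the proportion difference:
$$ s_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{N_1} + \frac{p_2(1-p_2)}{N_2}} $$
Confidence interval for the proportion difference:
$$ (p_1-p_2) - z_{1-\alpha/2}\, s_{p_1-p_2} \le \pi_1-\pi_2 \le (p_1-p_2) + z_{1-\alpha/2}\, s_{p_1-p_2} $$
Hypothesis test for the proportion difference: use the z-statistic
$$ z = \frac{p_1-p_2}{s_{p_1-p_2}} $$
This z formula tests the null hypothesis H0: the population proportion difference is 0 ($\pi_1-\pi_2 = 0$).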
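A minimal Python sketch of the three formulas above, assuming $z_{1-\alpha/2} = 1.96$ for a 95% interval. The function name `prop_diff_summary` is hypothetical. It is applied to the smoking data from the next example (29 of 100 smokers vs. 198 of 486 nonsmokers pregnant in the first cycle):

```python
import math

def prop_diff_summary(x1, n1, x2, n2, z_crit=1.96):
    """Return (difference, estimated s.d., CI low, CI high, z) for p1 - p2.

    Uses the unpooled standard deviation from the formula above;
    z_crit = 1.96 (an assumed default) gives a 95% confidence interval.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    low, high = diff - z_crit * sd, diff + z_crit * sd
    z = diff / sd  # tests H0: population difference = 0
    return diff, sd, low, high, z

diff, sd, low, high, z = prop_diff_summary(29, 100, 198, 486)
print(f"diff={diff:.3f} sd={sd:.4f} CI=({low:.3f}, {high:.3f}) z={z:.2f}")
```

Here z is about -2.32 and the whole 95% interval lies below 0, so smokers have a lower first-cycle pregnancy proportion.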

Example for Proportion Differences

A study classified pregnant women according to whether they smoked and whether they were able to get pregnant during the first cycle they tried.
RESULTS:

               1st cycle   2nd+ cycle   Total   Percent 1st cycle
Smoker             29          71        100         29.0%
Nonsmoker         198         288        486         40.7%
Total             227         359        586         38.7%

Calculating Conditional Percentages

What proportion of women who smoke become pregnant during the first cycle? 29/100 = 29%
What proportion of women who don't smoke become pregnant during the first cycle? 198/486 = 40.7%
[Figure: stacked bar chart (0% to 100%) of 1st cycle vs. 2nd+ cycle pregnancies for smokers (29 vs. 71), nonsmokers (198 vs. 288), and all women (227 vs. 359)]
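The conditional percentages are row proportions: each first-cycle count divided by its row total. A short Python sketch (the dictionary layout is only an illustrative choice) that reproduces the 29%, 40.7%, and 38.7% figures and the ratio of the two groups:

```python
# Observed counts from the smoking table: (1st cycle, 2nd+ cycle)
table = {
    "smoker": (29, 71),
    "nonsmoker": (198, 288),
}
# Add an "all women" row by summing the two rows column by column
table["all"] = tuple(sum(col) for col in zip(*table.values()))

# Conditional percentage pregnant in the 1st cycle, given the group
pct_first = {g: 100 * first / (first + later) for g, (first, later) in table.items()}

for group, pct in pct_first.items():
    print(f"{group:>9}: {pct:.1f}% pregnant in 1st cycle")

# How many times as likely a nonsmoker is to conceive in the 1st cycle
ratio = pct_first["nonsmoker"] / pct_first["smoker"]
print(f"ratio = {ratio:.1f}")  # ratio = 1.4
```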

Statistical Significance of 2x2 Tables--1

Strength of Relation:
Compare the percentages or rates of those who do with those who don't.
Example: smokers pregnant in the 1st cycle, 29%; nonsmokers pregnant in the 1st cycle, 41%.
Therefore nonsmokers are 41/29 = 1.4 times as likely to become pregnant during the 1st cycle.
Size of Study Sample:
How does the number of subjects affect the significance of the result? For the same difference in proportions, the result becomes more significant (less likely to be the result of chance) as the sample size increases.
If there had been only 59 women in the study (about one tenth as many), the same difference in proportions would be much less significant.
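One way to see the sample-size effect: keep the proportions fixed and shrink every count by a factor of 10 (about 59 women instead of 586). The chi-square statistic, computed as on the following slides, scales in proportion to the sample size, so it falls from about 4.8 to about 0.48, well below the 3.84 cutoff. A Python sketch; fractional counts are used only to keep the proportions exactly the same, and the helper name `chi_square_2x2` is hypothetical:

```python
def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table given as [[a, b], [c, d]]."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

full = [[29, 71], [198, 288]]                     # 586 women
tenth = [[x / 10 for x in row] for row in full]   # ~59 women, same proportions

print(round(chi_square_2x2(full), 2))   # 4.82
print(round(chi_square_2x2(tenth), 2))  # 0.48
```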

Assessing Statistical Significance of Tables--2

We use the chi-square statistic to determine whether differences between proportions are real or due to chance.
The chi-square statistic measures how far the observed counts depart from the counts expected on the basis of pure chance; for example, if we tossed snake-eyes in craps on every throw, we might think the dice were loaded.
For the previous example, if there were no difference between smokers and nonsmokers, we would expect the proportions for both to be the same as in the total:
First cycle: 227/586 = 0.387 or 38.7%; on the basis of this expected proportion, we calculate the expected numbers:

Calculating the Chisquare Statistic-Expected Values

Since the total number of smokers is 100, there would be 100 × 0.387 = 38.7 smokers pregnant in the first cycle if there were no difference between smokers and nonsmokers.
Since the total number pregnant during the first cycle is 227, there would be 227 - 38.7 = 188.3 nonsmokers pregnant during the first cycle. The remaining cells follow from the row totals: 100 - 38.7 = 61.3 and 486 - 188.3 = 297.7.

Expected       1st cycle   2nd+ cycle   Total
Smokers           38.7        61.3        100
Nonsmokers       188.3       297.7        486
Total            227         359          586
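The same expected counts follow from the general rule expected = (row total × column total) / grand total, which is what the step-by-step reasoning above amounts to. A Python sketch:

```python
observed = [[29, 71],    # smokers: 1st cycle, 2nd+ cycle
            [198, 288]]  # nonsmokers

row_tot = [sum(row) for row in observed]        # [100, 486]
col_tot = [sum(col) for col in zip(*observed)]  # [227, 359]
n = sum(row_tot)                                # 586

# Expected count for each cell if smokers and nonsmokers had the same proportions
expected = [[r * c / n for c in col_tot] for r in row_tot]
for row in expected:
    print([round(e, 1) for e in row])
# [38.7, 61.3]
# [188.3, 297.7]
```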

Calculating the Chisquare Statistic--Differences

Once the expected values for each cell are calculated, we take the difference between the observed and expected values for each cell i, observed_i - expected_i.
Note that we only have to calculate one difference; the differences in each row and each column have to sum to zero.

Difference     1st cycle   2nd+ cycle   Total
Smokers          -9.7        +9.7          0
Nonsmokers       +9.7        -9.7          0
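A sketch that computes all four differences and confirms that each row and each column sums to zero, which is why one difference determines the rest:

```python
observed = [[29, 71], [198, 288]]
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
n = sum(row_tot)
expected = [[r * c / n for c in col_tot] for r in row_tot]

# D = observed - expected for every cell
diffs = [[o - e for o, e in zip(orow, erow)]
         for orow, erow in zip(observed, expected)]
for row in diffs:
    print([round(d, 1) for d in row])
# [-9.7, 9.7]
# [9.7, -9.7]

# Each row and each column of differences sums to zero (up to rounding error)
assert all(abs(sum(row)) < 1e-9 for row in diffs)
assert all(abs(sum(col)) < 1e-9 for col in zip(*diffs))
```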

Calculating the Chisquare Statistic-2x2 Tables

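Once the differences, D_i, and expected values, E_i, for each cell are calculated, the chi-square statistic is evaluated from the formula
$$ \chi^2 = \sum_i \frac{D_i^2}{E_i} $$
$$ \chi^2 = (9.7)^2\left[\frac{1}{38.7} + \frac{1}{61.3} + \frac{1}{188.3} + \frac{1}{297.7}\right] = 4.78 $$
This value is greater than 3.84, the critical value for chi-square with 1 degree of freedom at the 0.05 significance level (the 95% level), so the difference between smokers and nonsmokers is statistically significant.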
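Without rounding the expected values to one decimal first, the statistic comes out at about 4.82 rather than 4.78; the conclusion is the same. A Python sketch that also converts the statistic to a p-value, using the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable:

```python
import math

observed = [[29, 71], [198, 288]]
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
n = sum(row_tot)

chi2 = 0.0
for i, r in enumerate(row_tot):
    for j, c in enumerate(col_tot):
        e = r * c / n                        # expected count
        chi2 += (observed[i][j] - e) ** 2 / e

# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
print("significant at 0.05" if chi2 > 3.84 else "not significant at 0.05")
```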

Two X Two Tables and Chi-Square Statistics

Example: Are males more likely to be underachievers?
Students were classified as underachievers if their grades in high school were below the prediction given by a reading test at age 12.

             Boys   Girls   Total
Under         26       8      34
Over          13      22      35
Total         39      30      69
% Under       67      27      49

[Figure: stacked bar chart (0% to 100%) of Under vs. Over for boys, girls, and total]

2x2 Table and Chisquare Statistic Example,cont.

Calculation of the chi-square statistic for the previous example:
1. Compute expected values: boys under: (39/69) × 34 = 19.2; girls under: 34 - 19.2 = 14.8; boys over: 39 - 19.2 = 19.8; girls over: 30 - 14.8 = 15.2.
2. Take the difference between observed and expected, square it, and divide by the expected value for each cell: boys under: (+6.8)²/19.2 = 2.41; girls under: (-6.8)²/14.8 = 3.12; boys over: (-6.8)²/19.8 = 2.34; girls over: (+6.8)²/15.2 = 3.04.
3. Sum the terms calculated in step 2 to get the chi-square statistic: χ² = 2.41 + 3.12 + 2.34 + 3.04 = 10.91
4. Compare the calculated chi-square statistic with 3.84 to determine significance (at the 95% level). In this example, 10.91 is much greater than 3.84, so the difference in proportions is statistically significant.
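For a 2x2 table with cells a, b (first row) and c, d (second row), the four-term sum can be cross-checked with the equivalent shortcut χ² = N(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)]. A Python sketch using the underachiever counts (the helper name is hypothetical); without the intermediate rounding of the expected values it gives about 10.85 rather than 10.91:

```python
def chi_square_shortcut(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: boys, girls; columns: under, over
stat = chi_square_shortcut(26, 13, 8, 22)
print(round(stat, 2))  # 10.85
print(stat > 3.84)     # True: significant at the 0.05 level
```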

Risk and Odds

Both the risk and the odds give information about the likelihood of a positive response to a categorical variable, but their numerical values differ. Example: the 2x2 table gives results for stopping smoking after eight weeks' use of either a nicotine patch or a placebo. Note that the risk of continuing to smoke after using the nicotine patch is 64/120 = 0.53 or 53%, compared to the greater risk with the placebo, 96/120 = 0.80 or 80%. Thus the RISK is the conditional probability of an outcome (here, continuing to smoke), given the category of the explanatory variable (patch or placebo). The ODDS is the ratio of the conditional probabilities of the two outcomes (continuing to smoke vs. stopping) within a category, and can be less than or greater than one.

            Smokes   Stops   Total   Risk              Odds
Nicotine      64       56     120    64/120 = 0.53     64/56 = 1.1
Placebo       96       24     120    96/120 = 0.80     96/24 = 4.0
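A Python sketch computing both quantities for each group; the risk is the proportion still smoking, and the odds compare still smoking with stopped:

```python
# Counts from the nicotine-patch table: (still smokes, stopped)
groups = {"nicotine": (64, 56), "placebo": (96, 24)}

# Risk: conditional probability of still smoking, given the group
risk = {g: smokes / (smokes + stops) for g, (smokes, stops) in groups.items()}
# Odds: still smoking relative to stopped, within the group
odds = {g: smokes / stops for g, (smokes, stops) in groups.items()}

for g in groups:
    print(f"{g:>8}: risk = {risk[g]:.2f}, odds = {odds[g]:.2f}")
# nicotine: risk = 0.53, odds = 1.14
#  placebo: risk = 0.80, odds = 4.00
```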

Relative Risk and 2x2 Tables

Every 2x2 table has an explanatory variable with two categories (e.g., in the previous slide, whether a nicotine patch or a placebo was used).
The ratio of the risks for these two categories is called the RELATIVE RISK.
Example
RR, the relative risk of continuing to smoke if a placebo rather than a nicotine patch is used:
RR = 0.80 / 0.53 = 80% / 53% = 1.5
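A minimal Python sketch of the relative risk from the raw counts. It also prints the percent increase in risk, (RR - 1) × 100%, which is one common way of expressing an increased risk; treat that second line as an illustrative assumption rather than the text's definition:

```python
risk_placebo = 96 / 120   # proportion still smoking with placebo
risk_nicotine = 64 / 120  # proportion still smoking with nicotine patch

rr = risk_placebo / risk_nicotine
print(f"RR = {rr:.2f}")                      # RR = 1.50
print(f"increase in risk = {rr - 1:.0%}")    # increase in risk = 50%
```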

Odds Ratio and 2x2 Tables

The ratio of the odds for the two categories of the explanatory variable is called, as might be expected, the Odds Ratio (OR). If the odds are very small (a rare outcome), the odds ratio and relative risk are approximately equal.
Examples:
continuing to smoke, placebo vs. nicotine patch:
Odds(placebo) = 96/24 = 4.0;
Odds(nicotine) = 64/56 = 1.1;
Thus OR = (96/24) / (64/56) = 3.5
(compared to RR = 1.5)
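A Python sketch that computes both ratios from a 2x2 table, first for the nicotine-patch data and then for a hypothetical rare-outcome table (made-up counts, not from the text) to illustrate that OR and RR nearly agree when the outcome is rare. The helper name `risk_and_odds_ratios` is hypothetical:

```python
def risk_and_odds_ratios(a, b, c, d):
    """Rows: (a yes, b no) for the comparison group, (c yes, d no) for the baseline.

    Returns (relative risk, odds ratio) of the comparison group vs. the baseline.
    """
    rr = (a / (a + b)) / (c / (c + d))
    or_ = (a / b) / (c / d)
    return rr, or_

# Nicotine-patch data: placebo row (96 smoke, 24 stop) vs. nicotine row (64, 56)
rr, or_ = risk_and_odds_ratios(96, 24, 64, 56)
print(f"RR = {rr:.2f}, OR = {or_:.2f}")            # RR = 1.50, OR = 3.50

# Hypothetical rare-outcome table (not from the text): risks of 2% and 1%
rr_rare, or_rare = risk_and_odds_ratios(20, 980, 10, 990)
print(f"RR = {rr_rare:.2f}, OR = {or_rare:.2f}")   # RR = 2.00, OR = 2.02
```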

Simpson's Paradox and Hidden Variables

Example: 1972 admission rates for graduate programs at UC Berkeley. Overall, the percentage of women applicants admitted was lower than the percentage of men applicants admitted, even though the percentages for women were higher within individual departments (see Exercise 13). The paradox can be explained by different selectivity in each program and different proportions of men and women applying to each program.

Program | Sex    Admit   Deny   Percent Admit
A | Men           400     250   (400/650)    61.5
A | Women          50      25   (50/75)      66.7
B | Men            50     300   (50/350)     14.3
B | Women         125     300   (125/425)    29.4
A+B | Men         450     550   (450/1000)   45.0
A+B | Women       175     325   (175/500)    35.0
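A Python sketch that recomputes the table's percentages and shows the reversal: women have the higher admission rate within each program, yet the lower rate once the programs are pooled, because most women (425 of 500) applied to the more selective program B:

```python
# (admitted, denied) by program and sex, from the table above
data = {
    ("A", "Men"): (400, 250),
    ("A", "Women"): (50, 25),
    ("B", "Men"): (50, 300),
    ("B", "Women"): (125, 300),
}

def pct_admitted(admit, deny):
    """Percentage of applicants admitted."""
    return 100 * admit / (admit + deny)

# Within each program, women have the higher admission rate
for (program, sex), (admit, deny) in data.items():
    print(f"{program} | {sex:<5}: {pct_admitted(admit, deny):.1f}%")

# Pooling programs A and B reverses the comparison
combined = {}
for sex in ("Men", "Women"):
    rows = [counts for (_, s), counts in data.items() if s == sex]
    admit = sum(a for a, _ in rows)
    deny = sum(d for _, d in rows)
    combined[sex] = pct_admitted(admit, deny)
    print(f"A+B | {sex:<5}: {combined[sex]:.1f}%")
```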

Goodness of Fit-Comparing Observed with Theoretical Proportions

Procedure:
tabulate the observed frequencies, N_i, for each category;
tabulate the expected (theoretical) frequencies, E_i;
take the difference between corresponding observed and theoretical frequencies, D_i = N_i - E_i;
calculate the chi-square statistic by the formula
$$ \chi^2 = \sum_i \frac{D_i^2}{E_i} $$
with degrees of freedom df = k - 1, where k is the number of categories (one categorical variable).
Example: Exercises 10.14 and 10.15 (p. 402).
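A Python sketch of the goodness-of-fit procedure on hypothetical data (not from the text): a die rolled 60 times, tested against the fair-die expectation of 10 per face. The 11.07 cutoff is the standard chi-square critical value for 5 degrees of freedom at the 0.05 level, hard-coded rather than computed:

```python
# Hypothetical data (not from the text): a die rolled 60 times
observed = [8, 9, 12, 11, 6, 14]
expected = [sum(observed) / 6] * 6    # fair die: 10 per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # k - 1 = 5
critical_05 = 11.07                   # chi-square critical value, df = 5, alpha = 0.05

print(f"chi-square = {chi2:.2f} on {df} df")   # chi-square = 4.20 on 5 df
print("reject fairness" if chi2 > critical_05 else "no evidence the die is loaded")
```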

Comparing Several Proportions, Categories

Procedure:
Set up a k (number of explanatory categories) by r (number of response categories) contingency table;
tabulate the number in each cell of the table, the marginal totals for each explanatory category, N_k, and each response category, N_r, and the grand total, N.
For the cell corresponding to the kth explanatory category and the rth response category, the expected number (if the response proportions were the same for all explanatory categories) is
$$ E_{kr} = \frac{N_k N_r}{N} $$
Calculate the difference D_kr = N_kr - E_kr and calculate a chi-square statistic from the formula
$$ \chi^2 = \sum_{k,r} \frac{D_{kr}^2}{E_{kr}} $$
Degrees of freedom, df = (k - 1)(r - 1)

Comparing Several Proportions, Categories


Example, Exercise 10.18, p. 408:
A study of potential age discrimination considers promotions among middle managers in a large company. The data are:
promoted: 9 under 30; 29 in the 30 to 39 category; 32 in 40 to 49; and 10 in the 50 and over category (total 80);
not promoted: 41 under 30; 41 in 30 to 39; 48 in 40 to 49; and 49 in 50 and over (total 170).
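A Python sketch applying the procedure to the promotion counts, with the chi-square critical value for 3 degrees of freedom at the 0.05 level (7.81) hard-coded. The marginal totals are recomputed from the cell counts; the not-promoted counts as listed add to 179 rather than the stated 170, so one of those figures may be a transcription error, and the sketch simply uses the cell counts as given:

```python
promoted = [9, 29, 32, 10]         # under 30, 30-39, 40-49, 50 and over
not_promoted = [41, 41, 48, 49]
counts = [promoted, not_promoted]

row_tot = [sum(row) for row in counts]
col_tot = [sum(col) for col in zip(*counts)]
n = sum(row_tot)

# Sum of (observed - expected)^2 / expected, with expected = row total * column total / n
stat = sum(
    (counts[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
    for i in range(len(counts))
    for j in range(len(col_tot))
)
df = (len(counts) - 1) * (len(col_tot) - 1)   # (2-1)(4-1) = 3
critical_05 = 7.81                             # chi-square critical value, df = 3, alpha = 0.05

print(row_tot, f"chi-square = {stat:.2f}, df = {df}")
print("significant at 0.05" if stat > critical_05 else "not significant at 0.05")
```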
