HLTH2024 Research Methods in Health
Final Examination Practice Set 2
Choose the one best answer. Items have numerical and statistical content. Data for these analyses
come mostly from StatSci.org, especially from Rex Boggs of Glennmore State School, Rockhampton,
QLD, who contributes to StatSci; other data come from Statsoft.com and IBM SPSS.
Correlations
In most published literature, the word “correlation” by itself refers to the Pearson product moment
correlation unless otherwise stated. All correlations in this unit are Pearson correlations. Correlations
were discussed in the final tutorial. See also the pulse-rate study and tutorial study notes for the
Epidemiology tutorial.
1. The Pearson product moment correlation, r, is a statistic which examines the strength and
direction of association between…
A Two nominal scale variables.
B Two ordinal scale variables.
C One nominal scaled variable, and an interval or ratio-scaled variable.
D Two ratio or interval scale variables.
2. Which of these correlation values for r indicates the strongest association?
A .095
B .50
C Anything less than .05 but greater than zero.
D -.95
For Western Sydney University Research Methods classes taught by John Bidewell
Monkey therapists
According to a report in the Journal of Rehabilitation Research and Development, 28 (2), Spring 1991,
pp. 91-96 (see footnote 1), monkeys were trained to help disabled people with household tasks. For
each of 9 monkeys, Table 1 shows how many years the monkey had been working and how many tasks
it could do.
Table 1
Data contributed by Rex Boggs.

Monkey name   Years working   Number of tasks
Hellion       10              28
Freeway        8              24
SuSu           7              28
Henri          6              28
Jo             5              27
Peepers        2              23
Cleo           1              15
Jeep           1               6
Maggie         0              23
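As a check on the +.66 quoted in question 3, the sketch below computes the Pearson correlation for Table 1 from its definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. It is a minimal Python illustration only; the function name `pearson_r` is hypothetical, and in practice a package such as SPSS would do this calculation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Table 1, in the order Hellion, Freeway, SuSu, Henri, Jo, Peepers, Cleo, Jeep, Maggie
years_working = [10, 8, 7, 6, 5, 2, 1, 1, 0]
number_of_tasks = [28, 24, 28, 28, 27, 23, 15, 6, 23]

r = pearson_r(years_working, number_of_tasks)
print(f"r = {r:+.2f}")  # prints r = +0.66
```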
3. The correlation (r) between the number of years the monkeys had worked and the number of
tasks the monkeys could do was +.66. What does this correlation imply?
A More years of working is strongly related to that monkey doing more tasks.
B The number of years a monkey had been working is significantly related to the number of
tasks the monkey can do.
C On average, the monkeys could do 66% of the tasks given to them.
D After each extra year of working experience, a monkey could perform 66% more tasks
compared with the same monkey’s performance during the previous year.
1 No original report author is identified for this dataset. Perhaps the author lives on bananas.
Handy fossils
4. From a study of human physiology with N = 9 subjects, Musgrave and Harneja (1978) found a
correlation of +.86 between the length in centimetres of metacarpal I (a bone in the hand) and
the person’s overall height, p = .0032. A palaeontologist working with fossilised human remains
wants to know how reliably the length of a metacarpal I bone can predict its owner’s height in
the absence of a full skeleton. As an evidence-based physiologist, what do you recommend?
A Metacarpal I bone length is no reliable guide to a person’s height. Although the results
are statistically significant, the correlation is near 1, the null value for a correlation.
B Although there is some relationship between metacarpal I bone length and height,
Musgrave and Harneja’s results are unlikely to generalise beyond their small sample.
Therefore, metacarpal I bone length is not a reliable guide to a person’s overall height.
C In general, a longer metacarpal I bone is strongly and significantly associated with the
person being taller, and therefore is a reliable guide to overall height but only if both
measurements are done in centimetres.
D In general, a longer metacarpal I bone will strongly and reliably predict increased height
regardless of the measurement units.
The Boggs ablution study
5. Rex Boggs from Rockhampton, QLD wanted to know whether the amount of soap used during
a shower depended on the size of the cake of soap. For 15 successive showers, Rex weighed
his soap bar in grams to see how much soap he used during each shower. The study ended
when the remaining sliver of soap broke into two and one piece disappeared down the
plughole. From the original data JB also calculated the weight loss of the soap bar with each
shower, N = 14 scores, a higher value indicating more weight lost during the shower. Because
Rex and his soap bar are each a sample of N = 1, it’s hard to specify what the population is, so
we’ll skip the significance tests.
Three correlations are possible:
• Shower number (1 to 15) and soap weight: r = -.99.
• Shower number and change in soap weight: r = +.17.
• Weight of the soap and the change in weight of the soap: r = -.23.
What would an evidence-based hygienist say about Rex’s soap consumption during a shower?
A With each successive shower, the soap tends to lose more weight but we can’t very
accurately predict the weight change in the soap from the shower number.
B As the soap bar shrank with successive showers, Rex used more soap per shower.
C The number of showers the soap has endured is almost perfectly correlated with the size
of the remaining soap. More showers means less soap remaining.
D All of the above.
Lucky countries
Table 2 shows correlations between selected variables from a 1995 survey of countries. The data are
old but the results are interesting.
Table 2
Correlations – world data from 1995.
N = 75 to 109 countries.

Variable     City living   Literacy rate   Baby mortality   Calories daily   Birth rate   Death rate
Birth rate   -0.63         -0.87            0.87            -0.76             1.00         0.37
Death rate   -0.48         -0.49            0.63            -0.35             0.37         1.00
6. Which of these conclusions is true about the countries analysed in Table 2?

This is a mind-bending question, too hard for the exam. The correlations refer to entire countries, not individual
people. Knowing that should help with interpretation. The correct answer involves a logical twist.
A Birth and death rates are higher in countries that have more of their population living in
rural areas rather than living in cities.
B Daily calorie consumption has a stronger effect on the death rate than on the birth rate.
C The average birth rate equals the average death rate across all of the countries analysed.
D Literacy rate and baby mortality have the same effects on the birth rate.
In the diagnostic studies lecture we looked at correlations between selected male body parts. From
the same data set Figure 1 shows more correlations and the scatter-plots. Assume that the sample is
representative of the male population referred to in the question statements.
Figure 1
Scatter-plots, N = 251.
• Weight by age: weight (kg) against age in years, r = -.01, p = .844.
• Height by weight: height (cm) against weight (kg), r = +.49, p < .001.
• Height by age: height (cm) against age in years, r = -.25, p < .001.
• BMI by body fat: BMI against estimated body fat percentage, r = +.73, p < .001.
7. From Figure 1 which of these statements is correct?
A For weight and age, the value for r is less than .05, whilst the p value suggests a strong
association. The correlation between weight and age shows that weight increases with age in
the male population.
B Tall men tend on average to be moderately overweight in the sample and in the
population.
C Young males in the sample tend to be taller than older males, but this correlation is too
weak to generalise to the population.
D BMI strongly predicts body fat and body fat strongly predicts BMI in the sample and in the
population.
Cross-tabulations
Contingency tables for epidemiological studies could look like Table 3. Numbers of people, and row
and column percentages, would go in the blank cells.
Table 3
               Case   Control   Row total
Exposed
Not exposed
Column total
8. If the study were a cohort study, to find the absolute risks the researcher would be looking
particularly at the…
A hard question. The answer says a lot about the difference between the major epidemiological study designs and
how 2 x 2 tables work, so it’s worth a go. The epidemiology lecture will help.
A Row percentages.
B Column percentages.
C Total percentages.
D None of the above.
Going grey
Family Circle magazine, 6 June 1990, reported rates of grey hair among males and females aged 25.
We’ll assume that this information is about the incidence of grey hair. Table 4 uses this information to
show grey hair incidence among an imaginary random sample of 1,000 persons aged 25 years.
Table 4
Hair greyness by gender at age 25.

Grey hair        Male     Female   Row totals
Grey hair        145      170      315
  Column %       29.0%    34.0%
  Row %          46.0%    54.0%
  Total %        14.5%    17.0%    31.5%
No grey hair     355      330      685
  Column %       71.0%    66.0%
  Row %          51.8%    48.2%
  Total %        35.5%    33.0%    68.5%
Column totals    500      500      1000
  Total %        50.0%    50.0%    100.0%
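Each percentage in Table 4 is a cell count divided by a different denominator: the row total for row %, the column total for column %, and the grand total for total %. A minimal Python sketch, assuming the counts are stored in a nested dictionary (the layout and the function name `cross_tab_percentages` are illustrative, not from any particular package), reproduces the table's percentages:

```python
# Table 4 counts: outer keys are hair status (rows), inner keys are gender (columns)
counts = {
    "Grey hair": {"Male": 145, "Female": 170},
    "No grey hair": {"Male": 355, "Female": 330},
}

def cross_tab_percentages(table):
    """Return row %, column % and total % for every cell of a cross-tabulation."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    grand_total = sum(row_totals.values())
    result = {}
    for r in rows:
        for c in cols:
            n = table[r][c]
            result[(r, c)] = {
                "row %": 100 * n / row_totals[r],     # share of this row
                "column %": 100 * n / col_totals[c],  # share of this column
                "total %": 100 * n / grand_total,     # share of everyone
            }
    return result

pcts = cross_tab_percentages(counts)
for cell, p in pcts.items():
    print(cell, {name: round(value, 1) for name, value in p.items()})
```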
9. What would an epidemiologist say is the absolute risk of grey hair among males at age 25?
A 31.5%
B 46.0%
C 29.0%
D 14.5%
10. What is the absolute risk of a female having no grey hair at age 25?
A 34.0%
B 66.0%
C 48.2%
D 50.0%
11. What is the relative risk of a 25-year-old female having grey hair compared with a 25-year-old
male having grey hair?
The relative risk is the ratio of two absolute risks. Assume that all calculations in the answers
are correct.
Another question that’s too hard for the exam. It’s as much about absolute risks as relative risks. Know how to find
an absolute risk in the table and you can avoid a dangerous distracter.
A 29% minus 34% = -5%
B 34% ÷ 29% = 1.17
C 34% minus 29% = 5%. Then 5% ÷ 29% = 17%, so 25-year-old women are 17% more
likely than 25-year-old men to have grey hair.
D 17.0% ÷ 14.5% = 1.17
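Because a relative risk is the ratio of two absolute risks, the arithmetic takes two steps: find each group's absolute risk, then divide one by the other. A minimal Python sketch with hypothetical helper names and made-up counts (not from Table 4):

```python
def absolute_risk(cases, group_size):
    """Proportion of a group who have the outcome."""
    return cases / group_size

def relative_risk(risk_group, risk_reference):
    """Ratio of two absolute risks: the group's risk over the reference group's risk."""
    return risk_group / risk_reference

# Made-up counts for illustration only
risk_a = absolute_risk(30, 200)  # 30 of 200 people have the outcome: 15%
risk_b = absolute_risk(15, 200)  # 15 of 200 people have the outcome: 7.5%
print(relative_risk(risk_a, risk_b))
```

Here group A's risk is twice group B's, a relative risk of 2.0. Swapping which group is the reference gives 0.5 instead, so it matters which absolute risk goes on top.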
Who wants what wheels
Table 5 and Table 6 are from a survey of 300 people conducted for the Australian Federal Office for
Road Safety by the Research Institute of Gender and Health at the University of Newcastle. Subjects
were surveyed at car parks in and around the University of Newcastle and shopping centres from
December 1997 to January 1998. Table 5 shows male and female participants’ preferred size of car.
Table 5
Preferred car by gender.

Preferred car    Female   Male     Row totals
Small            43       25       68
  Column %       29.9%    18.8%
  Row %          63.2%    36.8%
  Total %        15.5%    9.0%     24.5%
Medium           75       61       136
  Column %       52.1%    45.9%
  Row %          55.1%    44.9%
  Total %        27.1%    22.0%    49.1%
Large            26       47       73
  Column %       18.1%    35.3%
  Row %          35.6%    64.4%
  Total %        9.4%     17.0%    26.4%
Column totals    144      133      277
  Total %        52.0%    48.0%    100.0%
12. What does Table 5 tell us about the sample?
A 22.0% of males prefer a medium-sized car.
B Males would rather have a large car than a female.
C Females typically prefer smaller cars than the size of car that males prefer.
D Large cars are preferred by 64.4% of males.
Table 6 shows the size of car most often driven, for each category of age.
Table 6
Car driven most often, by age in years.

Car driven most often   < 25 years   25-39   40-59   60+ years   Row totals
Small                   53           29      13      5           100
  Column %              42%          41%     18%     17%
  Row %                 53%          29%     13%     5%
  Total %               18%          10%     4%      2%          33%
Medium                  52           18      21      9           100
  Column %              41%          26%     28%     30%
  Row %                 52%          18%     21%     9%
  Total %               17%          6%      7%      3%          33%
Large                   21           23      40      16          100
  Column %              17%          33%     54%     53%
  Row %                 21%          23%     40%     16%
  Total %               7%           8%      13%     5%          33%
Column totals           126          70      74      30          300
  Total %               42%          23%     25%     10%         100%
13. What is the design for Table 6 according to the usual way of describing tables?
A 3x4
B 4x3
C 4x5
D 14 x 5
14. What does Table 6 say about the sample?
A Age and the size of the car most often driven are unrelated.
B The “absolute risk” of a person aged 60+ driving a small car is the same as the absolute
risk of an under 25-year old driving a large car.
C Of those driving large cars, 54% were 40-59 years old
D Younger people mostly drive small cars.
What’s important to children
Chase and Dummer (1992) surveyed primary schoolchildren from three localities in Michigan, USA.
The survey asked the children which of these three personal goals was the most important to them:
• School grades (i.e., school results).
• Sports.
• Popularity.
The children’s locality (urban, suburban or rural), school year (4, 5 or 6) and gender were also
recorded. Table 7 is a cross-tabulation showing the most important personal goal by the child’s
residential locality, the type of district where the child lived.
Table 7
Goal by type of residential locality.

Most important personal goal   Urban   Suburban   Rural   Row totals
Grades                         103     87         57      247
  Column %                     58%     58%        38%
Sports                         26      22         42      90
  Column %                     15%     15%        28%
Popularity                     49      42         50      141
  Column %                     28%     28%        34%
Column totals                  178     151        149     478
15. Which of these conclusions follows from Table 7?
A Grades were the most important personal goal for all three localities.
B 58% of children whose most important personal goal was grades lived in urban or
suburban areas.
C Each child from every locality would rather be popular than good at sport.
D Rural children’s percentages are low because there were fewer rural children in the data
compared with numbers of urban or suburban children.
Another cross-tabulation from Chase and Dummer (1992), this time showing goal by school year:
Table 8
Personal goal by school year.

Most important personal goal   Year 4   Year 5   Year 6   Row totals
Grades                         63       88       96       247
  Column %                     53%      50%      52%
Sports                         25       33       32       90
  Column %                     21%      19%      17%
Popularity                     31       55       55       141
  Column %                     26%      31%      30%
Column totals                  119      176      183      478

16. Which of these conclusions follows from Table 8?

The interpretation of this table is not definite. A bit of judgement is involved. Don’t get wound up about it. The table is
just an exercise; there are no marks in it. Sometimes the answers don’t leap out of the data. You have to look hard.
A School year appears not all that strongly related to personal goals. The children’s most
important personal goals are similar across school years.
B There’s a tendency for older children (i.e., higher school year) to be less enthusiastic
about sports.
C Popularity is an important goal for more older children (Years 5 and 6) compared with
Year 4 children.
D All of the above are reasonable conclusions from this table.
Finally, a table showing personal goal by gender:
Table 9
Goal by gender.

Most important personal goal   Boy    Girl   Row totals
Grades                         117    130    247
  Column %                     52%    52%
Sports                         60     30     90
  Column %                     26%    12%
Popularity                     50     91     141
  Column %                     22%    36%
Column totals                  227    251    478
17. Which of these conclusions follows from Table 9?
A Girls would rather be popular than good at sport.
B Grades are more important to girls than to boys, because more girls than boys chose
grades as their most important goal.
C Girls are more popular than boys.
D A lot of children can’t have answered the survey properly because for the sports
preference category, 26% + 12% = 38%, which is a long way short of 100%.
Epidemiology
You will have realised by now that epidemiology and cross-tabulations go together.
18. Fife et al. (2011, as cited in a newspaper article by Taor, June 4-5, 2011) report that:
• 13% of women of normal weight have a caesarean section when they give birth.
• 19% of women who are overweight have a caesarean section when they give birth.
• 31% of women who are obese have a caesarean section when they give birth.
These individual statistics refer to the…
A Absolute risk of a caesarean section when giving birth.
B Relative risk of a caesarean section when giving birth.
C Absolute risk reduction of a caesarean section when giving birth.
D Number needed to treat.
19. From the same study as the previous question:
Fife et al. (2011, as cited in Taor, June 4-5, 2011) reported that:
• 13% of women of normal weight have a caesarean section when they give birth.
• 19% of women who are overweight have a caesarean section when they give birth.
• 31% of women who are obese have a caesarean section when they give birth.
When using these percentage data to compare caesarean rates among women of different
body mass, which of these measures could equal one (1)?
A Hazard ratio.
B Relative risk.
C Odds ratio.
D None of the above.
20. In a newspaper article, Dunlevy (June 1, 2011) reported a 64% increase in the incidence of
bowel cancer among people aged under 35 years (“young people”). This statistic refers to…
A How many young people have bowel cancer at any one time.
B The mortality rate of bowel cancer among young people.
C The number of new diagnoses of bowel cancer in young people.
D The bowel cancer rate among young people.
21. If we accept Dunlevy’s (June 1, 2011) report of a 64% increase in the incidence of bowel
cancer among people aged under 35 years (“young people”) as true, which of these
conclusions is also true?
A The incidence of bowel cancer among those testing positive to the disease is now 64%.
B Bowel cancer is now common among young people.
C For every 100 young people newly diagnosed with bowel cancer during the previous
observation period, 164 were newly diagnosed with bowel cancer in the more recent
observation period.
D Among the people who have bowel cancer, there is a 64% increase in the number of
young people.
23. What does an R0 of zero mean?
A The relative risk of catching the disease is zero.
B The prevalence of the disease is zero.
C The disease will eventually disappear from the population; future incidence will be zero.
D The disease is not at all contagious.
24. Freedman et al. (2012, as cited in Taor, May 26-27, 2012) report that from a study of 400,000
people between 1995 and 2008, of whom 52,000 died:
• Males who drank one cup of coffee per day had a risk of death of 0.99 compared with males
who drank no coffee at all.
• Males who drank four or five cups of coffee per day had a risk of death of 0.88 compared
with males who drank no coffee at all.
• Females who drank four or five cups of coffee per day had a risk of death of 0.84 compared
with females who drank no coffee at all.
Numbers of people who did or did not drink coffee, and who lived or died during the study
period, could be shown on this contingency table, with cells labelled from W to Z.
                   Lived   Died
Drank coffee       W       X
Drank no coffee    Y       Z
Which of these statements is the most likely to be true?

Rather difficult, and not for the exam. Think about what row and column percentages refer to, and relate that to the
answers. Try finding one answer that makes sense against the scenario. If desperate, hit the discussion board.
A The row percentage for cell W will be lower than the row percentage of cell X.
B The column percentage of cell W will be lower than the column percentage of cell Y.
C The row percentage in cell W will be lower than the column percentage in cell Z.
D The row percentage in cell X will be lower than the row percentage in cell Z.
25. Which of these values for number needed to treat indicates the best treatment?
A Negative 1 (minus 1)
B Zero
C One (1)
D 100
Odds ratios
Table 10 shows selected odds ratios, confidence intervals and p values from p. 7 of Newstead and
D’Elia’s (2007) study into car colour and crash risk. Suppose the contingency table for the study is set
out as below, where a non-white car is defined as exposure to a hazard because the researchers
believe that white is easier to see:
                            Outcome
Exposure: Car colour        Crashes = “case”                    Does not crash = “control”
Exposed: Non-white car      Number of non-white cars crashing   Number of non-white cars not crashing
Not exposed: White car      Number of white cars crashing       Number of white cars not crashing
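With the layout above, the odds ratio divides the odds of a crash (case) among exposed cars by the odds of a crash among unexposed cars. A minimal Python sketch, labelling the four cells a to d (these labels and the function name are illustrative assumptions, not from the study), with made-up counts:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table laid out as

                       case   control
        exposed          a       b
        not exposed      c       d

    The odds of being a case among the exposed (a / b) are divided by the
    odds among the not exposed (c / d), which simplifies to (a * d) / (b * c).
    """
    return (a * d) / (b * c)

# Made-up counts for illustration only (not from Newstead and D'Elia, 2007)
print(odds_ratio(a=20, b=80, c=10, d=90))
```

An odds ratio of 1 means equal odds of being a case in both rows; values above 1 mean higher odds in the exposed row.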
Selected results for the study are in Table 10.
Table 10
Colour relative to white   Odds ratio   Lower 95% CI   Upper 95% CI   p value
Green                      1.04         1.005          1.08           .0018
Red                        1.08         1.05           1.11           < .0001
Silver                     1.10         1.06           1.14           < .0001
Blue                       1.05         1.02           1.08           .0018
26. Which of these conclusions follows best from Table 10?
A During the study period, the probability of a crash in a green, red, silver or blue car was
no higher than 0.18%, and could even be less than .01%.
B None of the four colours in Table 10 is safer than white.
C The risk of a crash in a silver car is 10% higher than the risk of a crash in a white car.
D The confidence intervals show that the odds ratio in the population could not possibly be
1 for any of the four colours in Table 10 when their crash risk is compared to white.
27. From Table 10, which of these four car colours is the safest compared to white?
A Green
B Red
C Silver
D Blue
Suppose the row for white cars and the row for non-white cars were swapped, so white appears on
the upper row and non-white on the lower row. The contingency table will look like this:
                            Outcome
Exposure: Car colour        Crashes = “case”                    Does not crash = “control”
Exposed: White car          Number of white cars crashing       Number of white cars not crashing
Not exposed: Non-white car  Number of non-white cars crashing   Number of non-white cars not crashing
Now we are looking at the risk of driving a white car. The above table can be used to answer the next
question.
28. With white cars moved to the “exposed” row and non-white cars to the “not exposed” row, what
would happen to the odds ratios in Table 10?
A All four odds ratios will remain unchanged.
B The table will make no sense, thus valid odds ratios could not be calculated.
C All four odds ratios will be less than 1.
D All four odds ratios will increase because if white cars are a hazard, then non-white cars
will be even more of a hazard despite any re-labelling of the contingency table.
Hypothesis testing
29. What should we infer when p < .05 for a study testing the effect of a treatment on a continuous
measure of health?
A The statistically significant result shows that the treatment is effective and should be
implemented in clinical practice.
B A Type 1 error has occurred.
C The alternative hypothesis of a non-zero effect of the treatment in the population is
proved.
D The null hypothesis of a zero effect of the treatment in the population should be rejected.
30. Why are confidence intervals considered to be more informative evidence than hypothesis
tests?
A Confidence intervals can be calculated for continuous and non-continuous data.
B Confidence intervals give a range of likely effect sizes that would happen if the study
were repeated, which is more informative than a choice between a zero or non-zero
effect size.
C Confidence intervals show the range of possible treatment effects for individual patients
in the population, which is exactly what a clinician wants to know.
D Confidence intervals provide more precise evidence for decisions than hypothesis tests.
31. Which of the following are possible if a Type 1 error occurs?
A The sample size is too small.
B A zero effect-size in the population.
C A non-zero effect-size in the population.
D A Type 2 error.
Systematic review
32. Figure 2 below shows a forest plot from a meta-analysis of controlled trials. Assume that a
higher odds ratio (OR) of at least 2 indicates that the treatment is beneficial compared with the
control condition. What conclusion does Figure 2 best support?
A There is statistical evidence for the treatment’s effectiveness.
B There is evidence that the treatment will provide a worthwhile benefit to patients.
C Both A and B.
D Results from the five studies are too heterogeneous for any firm conclusion to be made.
Diagnostic accuracy
Bohannan (2005) tested the diagnostic accuracy of manual muscle testing (MMT, the index test)
against hand-held dynamometer testing (HHD, the reference test) for between-side differences in
knee extension force among 107 rehabilitation patients. The between-side difference measured by the
dynamometer was tested at cut-offs of 15%, 20%, 25% and 30%, where a difference exceeding the
cut-off was said to indicate a case with impairment. The aim was to see at which dynamometer
cut-off the manual muscle testing gave the best diagnostic accuracy.
Here are the formulae for sensitivity, specificity, positive and negative predictive values and overall
accuracy:
• Sensitivity = True positives ÷ (True positives + False negatives).
    - Whether cases test positive.
• Specificity = True negatives ÷ (True negatives + False positives).
    - Whether controls test negative.
• Positive predictive value = True positives ÷ (True positives + False positives).
    - Whether positives are cases.
• Negative predictive value = True negatives ÷ (True negatives + False negatives).
    - Whether negatives are controls.
• Diagnostic accuracy = (True positives + True negatives) ÷ All decisions.
    - The percentage of all decisions that were correct.
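These five formulas translate directly into code. A minimal Python sketch, assuming the four cell counts of a 2 x 2 diagnostic table are passed in by name (the function name `diagnostic_summary` is hypothetical); it returns proportions, so multiply by 100 for percentages:

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Diagnostic accuracy statistics from true/false positive and negative counts."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),  # whether cases test positive
        "specificity": tn / (tn + fp),  # whether controls test negative
        "ppv": tp / (tp + fp),          # whether positives are cases
        "npv": tn / (tn + fn),          # whether negatives are controls
        "accuracy": (tp + tn) / total,  # proportion of all decisions that were correct
    }
```

Applying it to Tables 11 to 14 requires first matching each table cell to a true positive, false negative, false positive or true negative, which is the skill the practice questions are testing.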
Table 11 shows the contingency table for HHD = 15% difference.
Table 11
15% HHD difference.
Based on Bohannan (2005) Table 3.

HHD            MMT negative   MMT positive   Row totals
Case           26             44             70
  Column %     44%            92%
  Row %        37%            63%
Control        33             4              37
  Column %     56%            8%
  Row %        89%            11%
Column totals  59             48             107
33. From Table 11, what is the negative predictive value when HHD = 15%?
A 26 ÷ 59 = 44%
B 33 ÷ 59 = 56%
C 26 ÷ 59 = 37%
D (26 + 33) ÷ 107 = 55%
For practice, find the sensitivity, specificity and positive predictive values for Table 11. Answers for
these questions and others like them will be given when all answers for this practice set are released.
You won’t get this type of open-ended question in the exam.
Next find the sensitivity, specificity and positive predictive values for Table 12, which shows results for
a 20% HHD difference as the diagnostic cut-off.
Table 12
20% HHD difference.

HHD            MMT negative   MMT positive   Row totals
Case           20             43             63
  Column %     34%            90%
  Row %        32%            68%
Control        39             5              44
  Column %     66%            10%
  Row %        89%            11%
Column totals  59             48             107
…and now Table 13, which shows results for 25% HHD.
Table 13
25% HHD difference.

HHD            MMT negative   MMT positive   Row totals
Case           15             39             54
  Column %     25%            81%
  Row %        28%            72%
Control        44             9              53
  Column %     75%            19%
  Row %        83%            17%
Column totals  59             48             107
…and Table 14, with the results for 30% HHD.
Table 14
30% HHD difference.

HHD            MMT negative   MMT positive   Row totals
Case           13             34             47
  Column %     22%            71%
  Row %        28%            72%
Control        46             14             60
  Column %     78%            29%
  Row %        77%            23%
Column totals  59             48             107
34. Supposing that MMT was intended as a screening test for one-sided knee weakness, with the
aim of detecting as many potential cases as possible, leaving few if any cases missed, at what
reference test cut-off does MMT give the worst result?
A 15%
B 20%
C 25%
D 30%
35. Now suppose that MMT was the reference test and HHD was the index test. Which of the four
cut-off levels would an evidence-based practitioner recommend for HHD in order to achieve the
highest diagnostic accuracy?
A 15%
B 20%
C 25%
D 30%
36. A pregnancy-testing kit available in supermarkets and intended for home use claims “99% accurate”
on the packet. Suppose that a woman who really is pregnant is a case, and that a positive result
indicates pregnancy according to the test.
In terms of diagnostic accuracy, what should “99% accurate” mean?
A 1% of women who use the test will have either a false positive or false negative result.
B 99% of women who are not pregnant will test negative.
C 99% of women who are pregnant will test positive.
D 99% of women who test positive are pregnant.
Differences between means

Relevant readings include the final week lecture on effect sizes and the graphs in the third tutorial.
Analyses in the final exam are simpler than these more complicated practice scenarios.
37. Which graph in Figure 3 shows the best evidence that the treatment has a harmful effect?
Figure 3
[Four line graphs, A to D, each plotting the outcome mean Before and After for a treatment group and a
comparison group: placebo in Graphs A and B, control in Graphs C and D. The outcome axis runs from
Poor to Good in Graphs A, C and D, and from Low to High in Graph B.]
Swaying
Teasdale, Bard, La Rue, and Fleury (1993) wanted to know how difficult it is to stay balanced while
concentrating, and to compare performance across age. Each of 17 subjects stood barefoot on a
"force platform" and tried to maintain a steady and upright position while reacting as fast as possible
to an unpredictable noise. When the random noise sounded, the subject pressed a button on a
handheld device as soon as they could. The platform measured swaying distance in millimetres in
the forward-backward and side-to-side directions. Table 17 shows the complete data for elderly and
young subjects. Less swaying means the person is better able to maintain a standing balance.
The research question asks whether the two age groups have different swaying performance. Table
15 shows descriptive and inferential statistics comparing the average swaying distance for elderly and
young subjects. Because two related variables are analysed simultaneously, the Bonferroni
adjustment should be used, so the threshold p value now becomes .025 and not the usual .05.
CONSORT Item 20 refers to “multiplicity of analyses” but you don’t have to know about Bonferroni
adjustments or multiple analyses for this unit. Just accept that a p value must be 0.025 or less for
statistical significance in Table 15.
Table 15
Swaying distance (mm) descriptive statistics and independent t test results.

Swaying              Mean      Mean    t      df      p        Valid N   Valid N   SD        SD
                     elderly   young                           elderly   young     elderly   young
Forwards-backwards   26.33     18.13   2.30   10.97   0.0418   9         8         9.77      4.09
Side to side         22.22     15.13   1.92   10.50   0.0820   9         8         10.27     3.91
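The fractional degrees of freedom in Table 15 (10.97 and 10.50) suggest that the t tests did not assume equal variances. Assuming the Welch version was used, this minimal Python sketch rebuilds t and df from the table's means, SDs and Ns, and shows the Bonferroni threshold as .05 divided by the number of tests. The function names are hypothetical.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unequal-variance t statistic and degrees of freedom from summary statistics."""
    se1_sq = sd1 ** 2 / n1  # squared standard error of the first mean
    se2_sq = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, df

def bonferroni_threshold(alpha, n_tests):
    """Per-test p value threshold after a Bonferroni adjustment."""
    return alpha / n_tests

# Forwards-backwards swaying, elderly (first) versus young (second), from Table 15
t, df = welch_t(26.33, 9.77, 9, 18.13, 4.09, 8)
print(f"t = {t:.2f}, df = {df:.2f}")

# Two swaying measures are analysed together, so .05 is split between two tests
print("Bonferroni threshold:", bonferroni_threshold(0.05, 2))
```

The forwards-backwards row gives t ≈ 2.30 and df ≈ 10.98, close to Table 15's 2.30 and 10.97; the small df gap comes from rounding in the inputs. Turning t and df into a p value needs a t distribution function from a statistics library, which this sketch leaves out.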
38. What should we conclude from Table 15?
A The amount of swaying is low for both groups, and especially for young people.
B There is significantly more forwards-backwards swaying for older people compared with
younger people, but not significantly more side-to-side swaying for more elderly people.
C In the population, young people are better able to maintain their standing balance when
distracted.
D In the population, elderly and young people are equally able to maintain standing balance
when distracted.
New Zealand Air-force helmets
The New Zealand Air-force bought a batch of flight helmets that didn’t fit many pilots. After that
experience, the Air-force decided to use callipers (like wide tongs) to measure the heads of all recruits
before they received a helmet. There was a choice between cheap cardboard callipers, and more
expensive and uncomfortable but accurate metal callipers. However, there is no point using cheap
and comfortable cardboard callipers if they’re inaccurate. The Air-force had to decide whether to use
cardboard or metal callipers. The Air-force was particularly interested in whether there is a systematic
difference between Air-force recruits’ head diameter measurements taken using metal and cardboard
callipers. Ideally there should be no systematic difference, because then cardboard would be
sufficiently accurate, as well as cheaper and more comfortable.
Anyone who wears hats will tell you that correct size is crucial. It’s particularly important to avoid
headwear that’s too small. For fighter-pilots, a too-small helmet may cause headaches that could
impair a pilot’s concentration and safety. Too-large headwear may not cause headaches but will slip
around and be a nuisance. We’ll consider a measurement error of at least 3 mm to be unacceptable.
That’s because a 3 mm error in diameter measurement could lead to a 10 mm error in head
circumference, which is a lot.
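The step from a 3 mm diameter error to a 10 mm circumference error follows if we treat the outline of the head as roughly a circle (a simplifying assumption), since circumference is π times diameter:

```latex
C = \pi d \quad\Longrightarrow\quad \Delta C = \pi\,\Delta d = \pi \times 3\ \text{mm} \approx 9.4\ \text{mm} \approx 10\ \text{mm}
```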
Head diameter was measured in millimetres using both types of calliper for 18 recruits. Descriptive
and inferential statistics are shown in Table 16.
Table 16
Head diameter descriptive statistics and paired t test.
Paired t test result: t = 3.19, df = 17, p = .0054.
Data courtesy of Dr Stephen Legg, StatSci.org, and from Seber and Lee (1998).

Variable     Valid N   Mean (mm)   95% CI lower   95% CI upper   SD
Metal        18        152.94      150.19         155.70         5.54
Cardboard    18        154.56      151.66         157.45         5.82
Difference   18        -1.61       -2.68          -0.54          2.15
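The paired t test works on the differences alone: t is the mean difference divided by its standard error, SD ÷ √n, and the 95% CI is the mean difference plus or minus the critical t value times that standard error. A minimal Python sketch using the Difference row of Table 16; the critical value 2.110 (two-sided 95%, df = 17) is taken from standard t tables rather than computed, and the function name is hypothetical.

```python
from math import sqrt

def paired_t_summary(mean_diff, sd_diff, n, t_crit):
    """Paired t statistic and confidence interval from summary statistics of the differences."""
    se = sd_diff / sqrt(n)   # standard error of the mean difference
    t = mean_diff / se       # paired t statistic, df = n - 1
    ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
    return t, ci

# Difference row of Table 16 (metal minus cardboard), n = 18, so df = 17
t, (lower, upper) = paired_t_summary(-1.61, 2.15, 18, t_crit=2.110)
print(f"t = {t:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The result, about -3.18 with a CI of -2.68 to -0.54, matches Table 16 to within rounding. The reported t of 3.19 was presumably computed from unrounded data, and its sign only reflects whether metal is subtracted from cardboard or the reverse.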
39. What should an evidence-based occupational health consultant conclude from Table 16?
A The mean measurement error is less than 3 mm. Either calliper type will give sufficient
measurement accuracy. Cardboard callipers are recommended because they are more
comfortable and cheaper than metal.
B The mean difference between cardboard and metal, and the confidence interval for the
difference, are both within the acceptable limit. Cardboard callipers are recommended over
metal because they are sufficiently accurate, while more comfortable and cheaper.
C Cardboard callipers are recommended because their mean measurements are
significantly higher than the metal calliper measurements. Cardboard's over-estimation of
head diameter will reduce the risk of a helmet being too tight.
D Metal callipers are recommended because cardboard calliper measurements are
significantly different. Metal callipers are known to be accurate, so it is unwise to use
cardboard callipers, whose measurements differ systematically from metal.
Do blondes feel more pain?
This question was investigated at the University of Melbourne and is reported in McClave and
Dietrich II (1991). It’s hard to think of a scientific reason why hair colour should affect pain perception.
Perhaps dark-haired people feel pain more than blondes do. Until we see data, we’ll never know. The
analyses concentrate on statistical significance and effect sizes.
For the study we’re looking at here, 19 people of different hair colours were tested for their “pain
threshold”, which is how much they have to be hurt before they feel severe pain. There are ways to
test pain threshold that an ethics committee may approve, but we don’t want to spread ideas.
Four different hair colours were tested:

1. Light blond.
2. Dark blond.
3. Light brunette (light brown).
4. Dark brunette (dark brown).
We have no information about how many males or females were in the sample, or whether hair colour
was natural or out of a packet. Let’s say that hair colour was natural, so that subjects were not
allocated to hair colours. (The researchers didn't find a sample of dark-haired people and then bleach
selected subjects' hair or tint it a lighter colour.) Next, the researchers applied pain to the subjects and
measured their tolerance for it.
40. Which of these NHMRC levels of evidence study designs best fits this hair colour and pain
threshold study, as described? See Hoffmann et al. (2013) Table 2.5 on p. 28 for a list.
A non-statistical question.
A Randomised controlled trial (Level II).
B Pseudo-randomised trial, a type of quasi experiment (Level III-1).
C Comparative study with concurrent controls, a type of quasi-experiment (Level III-2).
D Case series, a non-experimental design (Level IV).
Before we look at the results, here’s a general question.
41. What does a 95% confidence interval (CI) for a mean show?
A A range of likely values for the sample means if the study were replicated many times.
B An estimate of the true score for each case in a sample; we can be 95% sure that each
case’s true score lies within the 95% CI.
C An estimate of the range of scores in a sample; we can be sure that 95% of the scores lie
within the 95% CI.
D An estimate of the minimum important clinical difference (MID) for a sample; we can be
95% sure that the MID lies within the 95% CI.
Table 17 shows the results from the pain study. From left to right:
• Mean (average) pain threshold for each hair colour.
• 95% confidence intervals for pain threshold per colour.
• Sample sizes for each hair colour.
• Standard deviations.
Table 17
Pain threshold by hair colour descriptive statistics and 95% confidence intervals.
Higher pain threshold = more resistant to pain. Results are rounded to whole numbers.
Hair colour        Mean pain threshold   95% CI lower   95% CI upper   Group N   SD
Light blond        59                    49             70             5         9
Dark blond         51                    40             63             5         9
Light brunette     43                    34             51             4         5
Dark brunette      37                    27             48             5         8
All hair colours   48                    42             53             19        11
Looking at Table 17, it’s clear that pain thresholds increase with lighter hair colour. This trend
suggests that people with lighter coloured hair are “tougher” (more pain resistant) than people with
darker hair. Hair colour appears to affect pain tolerance. We’ll assume that pain tolerance doesn’t
cause a change in hair colour. However, we may ask:
• Are these differences in average pain threshold clinically important? The background
information gives no clue about the minimum difference in pain sensitivity that has practical
implications. Let’s say that a difference of at least 5 in pain threshold is needed for clinical
importance. A change in pain threshold of less than five has no implications for evidence-
based practice. We’ll call that difference of 5 the minimum clinically important effect size. (This
value of 5 has no basis in the literature. We’re choosing it here for convenience. From the
supplied background information we know nothing about the pain threshold scale.)
• Are observed differences statistically significant? If the observed differences are statistically
significant, we can expect these sample effects to generalise to the wider population. For
statistical significance, we want a p < .05.
We’ll do a sequence of three statistical tests.
1. Contrasting light blondes with all others (dark blond, light brunette and dark brunette combined
into one large group).
2. Contrasting dark blondes with brunettes (light brunette and dark brunette as one large group).
Light blondes are omitted.
3. Contrasting light brunette with dark brunette. All blondes are omitted.
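Each combined group in these contrasts has a mean equal to the sample-size-weighted average of its subgroup means from Table 17. The short sketch below (the helper name `combined_mean` is hypothetical) uses the rounded values in Table 17, so its results are approximate. They round to the 44, 40 and 48 reported for the darker, brunette and all-colours groups in Tables 18, 19 and 17.

```python
# Rounded group means and sample sizes from Table 17: name -> (mean, N).
GROUPS = {
    "light blond": (59, 5),
    "dark blond": (51, 5),
    "light brunette": (43, 4),
    "dark brunette": (37, 5),
}


def combined_mean(names):
    """Return (weighted mean, total N) for the named hair-colour groups combined."""
    total_n = sum(GROUPS[name][1] for name in names)
    weighted_sum = sum(GROUPS[name][0] * GROUPS[name][1] for name in names)
    return weighted_sum / total_n, total_n


darker = combined_mean(["dark blond", "light brunette", "dark brunette"])  # contrast 1
brunette = combined_mean(["light brunette", "dark brunette"])             # contrast 2
everyone = combined_mean(list(GROUPS))                                    # all 19 people

for label, (mean, n) in [("Darker", darker), ("Brunette", brunette), ("All", everyone)]:
    print(f"{label}: mean = {mean:.1f}, N = {n}")
```

The combined standard deviations cannot be obtained this simply, because they also depend on how far each subgroup mean lies from the combined mean.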
These three tests are an efficient way of exploring the differences. For each analysis there are four
possible conclusions:
1. The effect is not clinically important (effect size is less than 5) and the statistical test is not
significant (p > .05). The difference is too small to be concerned about clinically and it won’t
generalise to the wider population. Any observed difference is sample-specific. The population
effect can be considered as zero, the value of a null effect for the difference between means.
2. The effect is clinically important (effect size is at least 5) but the statistical test is not
significant (p > .05). The effect is large enough to matter clinically. However, it won't generalise
to the wider population. The observed difference is sample-specific. The population likely has
an effect size of zero.
3. A clinically unimportant effect size (less than 5) that is statistically significant (p < .05). The
effect is too small to be important clinically but can be expected to generalise, although we
would assume that the population effect size will also be small.
4. A clinically important effect size (at least 5) that is statistically significant (p < .05). The
observed difference matters clinically and we can expect it to carry over to the population.
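The four conclusions listed above amount to a simple two-way decision rule. This is a minimal sketch under this exercise's own assumptions: a minimum clinically important effect of 5 (treating exactly 5 as important, following the earlier "at least 5"), and p < .05 for significance, so p = .05 counts as non-significant. The function name `classify_result` is hypothetical, and the example values are invented rather than taken from the tables.

```python
# Thresholds used in this exercise (the value of 5 is the text's illustrative choice).
MIN_CLINICAL_EFFECT = 5
ALPHA = 0.05


def classify_result(mean_difference, p_value, mcid=MIN_CLINICAL_EFFECT, alpha=ALPHA):
    """Return the number (1-4) of the matching conclusion in the list of four."""
    important = abs(mean_difference) >= mcid   # direction does not matter for importance
    significant = p_value < alpha
    if not important and not significant:
        return 1   # too small to matter, and sample-specific
    if important and not significant:
        return 2   # large enough to matter, but sample-specific
    if not important and significant:
        return 3   # generalises, but too small to matter
    return 4       # matters clinically and generalises


# Hypothetical (mean difference, p) pairs, one for each conclusion.
for diff, p in [(3, 0.40), (8, 0.20), (2, 0.01), (12, 0.001)]:
    print(f"difference = {diff}, p = {p} -> conclusion {classify_result(diff, p)}")
```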
Table 18 shows the results for the light blond versus darker hair colours (dark blond, light brunette
and dark brunette).
Table 18
Pain thresholds: Light blondes versus all darker colours.
Results rounded to nearest whole number.
Statistical significance: p = .0024.
Hair colour    Mean pain threshold   95% CI lower   95% CI upper   N    SD
Light blonde   59                    49             70             5    9
Darker         44                    38             49             14   10
A The difference in average pain thresholds between light blondes and darker coloured hair
types is neither clinically important nor statistically significant. The effect size is small and
should be considered as zero for the population.
B Light blondes have substantially higher pain thresholds than darker coloured hair types.
The difference is clinically important but the non-significant result means this difference is
unlikely to generalise to the population.
C Although light blondes have somewhat higher average pain thresholds than darker
coloured hair types, the effect is not important clinically but is likely to generalise to the
population.
D Light blondes have significantly higher pain thresholds than darker coloured hair types.
The difference is clinically important and will likely generalise to the population.
Table 19 shows results for the contrast between dark blondes and combined results for light and dark
brunettes.
Table 19
Pain thresholds: Dark blondes versus brunettes.
Hair colour    Mean pain threshold   95% CI lower   95% CI upper   N   SD
Dark blonde    51                    40             63             5   9
Brunette       40                    34             45             9   7
A The difference in average pain thresholds between dark blondes and brunette hair types
is neither clinically important nor statistically significant. The effect size is small and
should be considered as zero for the population.
B Dark blondes have substantially higher pain thresholds than brunette hair types. The
difference is clinically important but the non-significant result means this difference is
unlikely to generalise to the population.
C Although dark blondes have higher average pain thresholds than brunette hair types, the
effect is not important clinically but is likely to generalise to the population.
D Dark blondes have significantly higher average pain thresholds than brunette hair types.
Table 20 contrasts light and dark brunettes. Blondes are omitted because they were examined in
Table 18 and Table 19.
Table 20
Pain thresholds: Light brunettes versus dark brunettes.
Hair colour      Mean pain threshold   95% CI lower   95% CI upper   N   SD
Light brunette   43                    34             51             4   5
Dark brunette    37                    27             48             5   8
A The difference in average pain thresholds between light brunettes and dark brunettes is
neither clinically important nor statistically significant. The effect size is small and should
be considered as zero for the population.
B Light brunettes have substantially higher pain thresholds than dark brunettes. The
difference is clinically important but the non-significant result means this difference is
unlikely to generalise to the population.
C Although light brunettes have higher average pain thresholds than dark brunettes, the
effect is not important clinically but is likely to generalise to the population.
D Light brunettes have significantly higher average pain thresholds than dark brunettes.
45. Finally, what are the overall conclusions at a population level according to this study,
expressed in simple terms?
A Light blondes are the least sensitive to pain among the four tested hair types. The status
of redheads in this hierarchy remains unknown.
B Blondes are less sensitive to pain than are brunettes.
C Light and dark brunettes have similar pain thresholds.
D All of the above conclusions follow from the evidence.
Take the 2 x 2 cross-tab challenge!

There’s nothing like this exercise in the final exam. Try it for fun if you like. If you
can figure it out, you’re doing well. If not, that’s OK. Arithmetic combined with
reasoning is needed. The answers show the difficulty of drawing firm
conclusions from incomplete information.
On February 5, 2011, Wessel and Cummins in The Wall Street Journal wrote
that Egypt’s “high-school graduates account for 42% of the work force—but 80%
of the unemployed” (para. 9). These two percentage values, as given in the
Wessel and Cummins article, contain precise information. Below we’ll see how
those two percentages can lead to different conclusions depending on other data in the table, which
we’re not given but can invent and test for ourselves. The two percentages are consistent with a wide
range of scenarios. They say nothing definite about the unemployment rate of Egyptian school
leavers.
Below is a cross-tabulation with the two Wessel and Cummins percentages included. By “workforce,”
we mean people in Egypt who are eligible for employment, which includes school leavers and other
adults. The only other information is 1000 for the grand total, referring to an imaginary random sample
of the Egyptian working-age population.
Table 21
N = 1000 for the entire table.
                                      Workforce
Education                             Employed    Unemployed
High school graduate   N
                       Column %       42%         80%
Other education        N
                       Column %
Column totals
All of the answers are open-ended; there are no multiple-choice questions. The first task is easy.
Complete the other two column percentages in Table 21.
Next, we’ll complete the rest of Table 21 to show that unemployment is moderately high among
Egyptian high school graduates.
Fill in the missing cell frequencies, labelled N and shaded yellow in Table 21, to show
moderately high unemployment among Egyptian high school leavers.
There are numerous reasonable possibilities. Putting the number 10 in one of the yellow cells will
make the point strongly and avoid fractional values for the other three numbers. You may experiment
with other numbers if you like.
To answer the next question you’ll need to think a bit. Don’t get trapped by the column totals!
What is the percentage unemployment among high school graduates in your completed
Table 21?
Now we’ll complete the table again, another way, showing that unemployment is disastrously high
among Egyptian high school graduates.
Complete Table 22 to show that the unemployment rate is extremely high among Egyptian
high school graduates.
The column percentages cannot change. Try the number 21 in one of the yellow cells to avoid
fractions among the other three numbers. Try other numbers if you like.
Table 22
N = 1000 for the entire table.
                                      Workforce
Education                             Employed    Unemployed
High school graduate   N
                       Column %       42%         80%
Other education        N
                       Column %
Column totals
If you’ve used the recommended numbers of 10 and 21, then when you compare the column totals
you’ve calculated for Table 21 and Table 22 you’ll notice a coincidental resemblance between a
section of the two tables.
What is the risk of unemployment among high school graduates in your completed Table 22,
expressed as a percentage?
An epidemiologist would refer to these unemployment rates as the absolute risks of unemployment for
Egyptian high school graduates.
Clearly, the same column percentages of 42% and 80% for high school graduates can
happen with very different rates of unemployment. As a further exercise (you needn’t do it), you’ll find
that placing the number 1 in one of the cells can give a high school graduate unemployment rate of
practically nothing.
How can we get such different unemployment rates from the same two percentages?
It’s a difficult question. Think of what the 42% and 80% tell you and what they don’t tell you. The
column totals give you another clue.
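If you want to check your arithmetic for Tables 21 and 22, here is a minimal sketch. It assumes, as the tables do, that 42% and 80% are high school graduates' shares of the employed and unemployed columns, that N = 1000, and that you choose one cell count yourself. The function `complete_table` and its arguments are hypothetical, written for this exercise. Exact fractions are used so that any fractional cell count is easy to spot. Note that the graduates' unemployment rate is calculated across the high school graduate row, not down a column.

```python
from fractions import Fraction

# High school graduates' share of each workforce column, as laid out in Tables 21 and 22.
HS_SHARE = {"employed": Fraction(42, 100), "unemployed": Fraction(80, 100)}
N_TOTAL = 1000


def complete_table(known_row, known_col, known_value, n_total=N_TOTAL):
    """Fill the 2 x 2 table from the fixed column percentages plus one chosen cell.

    known_row is "hs" (high school graduate) or "other"; known_col is
    "employed" or "unemployed". Returns (cells, column totals, HS unemployment rate).
    """
    # Share of each column made up by each education row.
    share = {(row, col): (HS_SHARE[col] if row == "hs" else 1 - HS_SHARE[col])
             for row in ("hs", "other") for col in ("employed", "unemployed")}

    # The chosen cell fixes its column total; the grand total then fixes the other column.
    known_total = Fraction(known_value) / share[(known_row, known_col)]
    other_col = "unemployed" if known_col == "employed" else "employed"
    col_totals = {known_col: known_total, other_col: n_total - known_total}

    cells = {key: share[key] * col_totals[key[1]] for key in share}

    # Absolute risk of unemployment for graduates: a ROW percentage.
    hs_row_total = cells[("hs", "employed")] + cells[("hs", "unemployed")]
    hs_unemployment_rate = cells[("hs", "unemployed")] / hs_row_total
    return cells, col_totals, hs_unemployment_rate


# A hypothetical choice (30 unemployed people with other education), not a suggested value.
cells, col_totals, rate = complete_table("other", "unemployed", 30)
print(f"Employed: {col_totals['employed']}, Unemployed: {col_totals['unemployed']}")
print(f"High school graduate unemployment rate: {float(rate):.1%}")
```

With this hypothetical choice there are 850 employed and 150 unemployed people, and the graduate unemployment rate is 120/477, about 25%. Substitute the suggested numbers 10 and 21, in whichever cells keep the other counts whole, to check your completed tables.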
Courtroom counting
This puzzle is based on Gigerenzer (2000/2003). There are no questions as
complicated as this in the examination.
As we all know, DNA is used as evidence in criminal cases. A reported
match between the DNA at the crime scene and the DNA of the accused may be
accepted as proof that the accused is guilty. Let's work with an example. We'll
assume that results of DNA testing are the only evidence for this prosecution.
• In a very large city there are 10 million people who could have committed the crime.
• The chances of a randomly selected person having the same DNA as the DNA at the crime
scene are 0.0001%, that's one in 1 million, which is low. The test appears very accurate. We
would therefore expect DNA matching to give a good indication of guilt.
• If a person really does have the same DNA as the DNA at the crime scene, then a laboratory test
will almost certainly show a match. Again, the test appears accurate.
• If the person doesn't have the crime scene DNA, then the chances of a match are 0.001%,
that's one in 100 thousand. False positive matches are therefore possible but unlikely. The
test appears accurate.
• A match has been found between the accused and the DNA at the crime scene.
What are the chances of the accused person really having the same DNA as at the crime scene?
Completing Table 23 will help. It already has the information given above. You can fill out the other
cells, and the row and column totals. That will make the answer easy to find.
Table 23
                                        Doesn't match crime      Matches crime              Row totals
                                        scene DNA                scene DNA
                                        Negative test result     Positive test result
Has same DNA profile as DNA at          0                        10 (at a rate of 1 in
crime scene. Define as a case                                    1 million in the city)
Has different DNA profile to crime                               100 (at a rate of 1 in
scene. Define as a control                                       100,000 in the city)
Column totals                                                                               10,000,000 = number of
                                                                                            people in the city
Is this accuracy rate the sensitivity, specificity, or the positive or negative predictive value?
Next you can calculate the other measures of diagnostic accuracy.
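Once Table 23 is complete, the usual diagnostic accuracy measures follow directly from its four cells. This is a minimal sketch assuming the standard definitions: sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), PPV = TP/(TP + FP), NPV = TN/(TN + FN), LR+ = sensitivity/(1 - specificity), and LR- = (1 - sensitivity)/specificity. The function name `diagnostic_measures` is hypothetical. The example counts are invented; replace them with the cells of your completed Table 23, rounding the false positive count to a whole number if necessary.

```python
def diagnostic_measures(true_pos, false_neg, false_pos, true_neg):
    """Standard accuracy measures for a 2 x 2 diagnostic table.

    Cases have the crime-scene DNA profile; controls do not.
    A positive result is a reported match; a negative result is no match.
    """
    sensitivity = true_pos / (true_pos + false_neg)   # P(positive | case)
    specificity = true_neg / (true_neg + false_pos)   # P(negative | control)
    ppv = true_pos / (true_pos + false_pos)           # P(case | positive)
    npv = true_neg / (true_neg + false_neg)           # P(control | negative)
    # LR+ is undefined (infinite) when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity
    return {"sensitivity": sensitivity, "specificity": specificity,
            "PPV": ppv, "NPV": npv, "LR+": lr_pos, "LR-": lr_neg}


# Hypothetical counts for illustration only (not Table 23).
example = diagnostic_measures(true_pos=90, false_neg=10, false_pos=50, true_neg=850)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```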
46. Suppose that Table 23 shows the results of DNA testing and matching to the crime scene of all
10 million people in the city. What sort of bias, if any, does the DNA testing show?
All exam questions have only four options for the answers.
A Negative bias, because there are so many more negative than positive results.
B Negative bias, because the negative likelihood ratio is so low.
C Negative bias, because the positive predictive value is so low.
D Positive bias, because all of the cases tested positive and none tested negative.
E Positive bias, because the false positive rate is higher than the false negative rate.
F No bias.
Is a person whose DNA matches DNA from the crime scene guilty of the crime?
From Table 23, what are the chances of someone whose DNA is tested and found to match the DNA
at the crime scene actually being guilty of the crime? Assume that there is only 1 guilty person.
Of course, DNA testing will continue to improve in accuracy, and additional evidence beyond DNA
may be tendered to the court to assist the prosecution or the defence. We can’t assume that the
results here are indicative of DNA testing today, but they do show how a seemingly accurate test can
fail in important respects.
If you understand all this, you can see how the principles of diagnostic testing can be applied to
forensic science – interesting if ever you’re considering a change of career.
Wet cupping for low back pain

AlBedah et al. (2015) used a randomised controlled trial to test the effect of wet cupping treatment on
symptoms of low back pain. Treatment and control groups had 40 patients each. The treatment group
received wet cupping, a type of alternative therapy, whilst the control group did not. The minimum
clinically important difference (MCID) was defined as -15, meaning a reduction (i.e., improvement) of
15 on a pain measurement scale. If the reduction in pain is at least 15, the improvement is considered
worthwhile. If the reduction in pain is less than 15, the change over time is not clinically important.
In their results, AlBedah et al. reported that "31 of 40 (77.5%) patients in the wet cupping group
showed an MCID (-15) after 2 weeks compared with only 1 of 40 patients in the control group”
(p. 506).
Using the above information, complete Table 24 to show the number of patients in each of the
four table cells, and the row, column and table totals.
Table 24
                              Achieved MCID or        Achieved less than      Row totals (numbers given
                              better after 2 weeks    MCID after 2 weeks      in scenario description above)
Treatment group
(received wet cupping)
Control group
(usual care; no treatment)
Column totals
What do the numbers in Table 24 suggest about the effect of wet cupping on low back pain
compared with no treatment or usual care?
Calculate the odds ratio for this table using the formula given in the Epidemiology lecture and
the matching tutorial.
You are not expected to calculate odds ratios or know the formula for the exam.
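For readers who want to check their calculation, here is a sketch of the standard cross-product odds ratio, (a × d) / (b × c), where a and b are the treatment group's counts with and without the MCID, and c and d are the control group's. The notation in the Epidemiology lecture may differ, so compare it with the formula given there. The function name `odds_ratio` is hypothetical. The cell counts follow directly from the quoted result: 31 of 40 treated patients and 1 of 40 controls reached the MCID, so 9 and 39 did not.

```python
def odds_ratio(treat_event, treat_no_event, control_event, control_no_event):
    """Odds ratio for a 2 x 2 table: (a / b) / (c / d), computed as (a * d) / (b * c)."""
    return (treat_event * control_no_event) / (treat_no_event * control_event)


# Table 24 cells derived from AlBedah et al. (2015): event = achieved MCID or better.
or_cupping = odds_ratio(treat_event=31, treat_no_event=9, control_event=1, control_no_event=39)
print(f"Odds ratio = {or_cupping:.1f}")   # (31 * 39) / (9 * 1) = 1209 / 9
```

This prints an odds ratio of 134.3. Interpreting that value is left to the next question.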
Interpret this odds ratio in terms of what it says about the effectiveness of wet cupping for
treating low back pain compared with usual care or no treatment.
Reference
AlBedah, A., Khalil, M., Elolemy, A., Hussein, A. A., AlQaed, M., Al Mudaiheem, A.,… Bakrain, M. Y.
(2015). The use of wet cupping for persistent nonspecific low back pain: Randomized
controlled clinical trial. Journal of Alternative & Complementary Medicine, 21(8), 504-508.
doi:10.1089/acm.2015.0065