Statistics (Clinical Trials)


RANDOMISED CLINICAL TRIALS

By Thomas Perneger and Christophe Combescure


University of Geneva

Translated by Dr Meera MANRAJ, University of Mauritius


REVISION: STATISTICAL TESTS

• Procedure for choosing between the null hypothesis and
the alternate hypothesis
• We first formulate the hypotheses
• H0: no association
– Treatment does not improve survival
– No difference in height between women and men
• HA: association is present
– Treatment improves survival by 20%
– Women are on average 10 cm shorter than men

• Then we collect the data
– Observations of survival in treated and
untreated patients
– Measurement of heights of men and women
REVISION: RESULTS and ERRORS
• We perform the test, which is significant or non-significant
• If the null hypothesis is true:
– Non-significant result (correct answer) in 95% of cases
– Significant result (type 1 error) in 5% of cases

• If the alternative hypothesis is true:
– Significant result (correct answer) in 80% of cases (or 90%) = power
– Non-significant result (type 2 error) in 20% of cases (or 10%)

• If a significant result is obtained:
– Either the alternative hypothesis is true (the option we retain)
– Or the null hypothesis is true and we have made a type 1 error

• If a non-significant result is obtained:
– Either the null hypothesis is true (the option we retain)
– Or the alternative hypothesis is true and we have made a type 2 error
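
The error rates above can be checked by simulation. The sketch below is only an illustration: the test used (an uncorrected two-proportion z-test), the 60 patients per arm and the cure probabilities (25% and 50%) are assumptions chosen for the example, not values from the lecture.

```python
import math
import random


def two_proportion_p_value(x1, n1, x0, n0):
    """Two-sided p-value of an uncorrected z-test comparing two proportions."""
    pooled = (x1 + x0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return 1.0  # everybody cured or nobody cured: no evidence of a difference
    z = (x1 / n1 - x0 / n0) / se
    return math.erfc(abs(z) / math.sqrt(2))


def share_significant(p_treated, p_control, n_per_arm, n_trials, rng, alpha=0.05):
    """Fraction of simulated trials whose test is significant at level alpha."""
    hits = 0
    for _ in range(n_trials):
        x1 = sum(rng.random() < p_treated for _ in range(n_per_arm))
        x0 = sum(rng.random() < p_control for _ in range(n_per_arm))
        if two_proportion_p_value(x1, n_per_arm, x0, n_per_arm) <= alpha:
            hits += 1
    return hits / n_trials


rng = random.Random(2024)
# H0 true: same cure probability in both arms, so every significant result is a type 1 error.
type_1_rate = share_significant(0.25, 0.25, 60, 2000, rng)
# HA true (hypothetical 50% versus 25%): the share of significant results estimates the power.
power = share_significant(0.50, 0.25, 60, 2000, rng)
print(f"type 1 error rate ~ {type_1_rate:.3f}, power ~ {power:.3f}")
```

With these assumed values the first number should land near 5% and the second near 80%; both fluctuate from one simulation to the next.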
REVISION: CHOOSING the TEST

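Test to use, by type of variable compared and number of groups compared
(alternative test in parentheses)

Groups compared | Qualitative variable | Quantitative (continuous) variable
2               | Chi-2 (Fisher)       | Student (Mann-Whitney)
3 or more       | Chi-2 (Fisher)       | ANOVA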
REVISION: p value

• Measures how much the observed results contradict
the null hypothesis
• The smaller the p, the more the results contradict H0
• p-value: probability of occurrence of the observed
difference or of a larger difference if the null
hypothesis was true
• p ≤ 0.05 is equivalent to a significant test result
• p > 0.05 is equivalent to a non-significant test result
REVISION: Confidence interval
• Hypotheses are made on parameters (true values that
describe the universe), but we make our observations on
estimators (from samples of limited size)
• Confidence interval (CI): set of values of the parameter which
are compatible with the observed estimator
• 95% CI: if the study was repeated a large number of times,
95% of calculated CIs would contain the true value of the
parameter (but in particular cases we do not know if this is the case…)
• The narrower the CI, the more accurate the estimate.
• If CI contains the value of the parameter corresponding to H0:
non-significant test
• If CI excludes the value of the parameter corresponding to H0:
significant test
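
The "repeated a large number of times" reading of the 95% CI can be illustrated by simulation. This is a sketch under assumptions: the true proportion (30%), the sample size (200) and the Wald formula for the interval are choices made for the example.

```python
import math
import random


def wald_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (Wald method)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


rng = random.Random(7)
true_p = 0.30      # the parameter (hypothetical value, unknown in real life)
n = 200            # sample size of each simulated study
n_studies = 2000   # number of repetitions of the study

covered = 0
for _ in range(n_studies):
    successes = sum(rng.random() < true_p for _ in range(n))
    low, high = wald_ci(successes, n)
    if low <= true_p <= high:
        covered += 1

coverage = covered / n_studies
print(f"share of intervals containing the true value: {coverage:.3f}")
```

The share should be close to 95%, while any single interval either contains the true value or does not.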
LEARNING OBJECTIVES IN THIS LECTURE

• Understand what a randomized clinical trial is useful for
• Discover the notion of bias
• Be able to explain the usefulness of
– Random allocation (randomization)
– Placebo
– Concealment of the allocation of the next patient
(concealment of allocation)
– Blinding
– Pre-specification of the main evaluation criterion (outcome)

Petrie/Sabin
Chapter 14
Example: Pharyngitis
How should it be treated?

• Group A Streptococcus
– Antibiotics
– Otherwise risk of complications (rheumatic fever)

• Other causes (mostly viruses)


– Symptomatic treatment

• New option considered:


– Steroidal anti-inflammatory drug (dexamethasone)
Student scenario

• You wake up one day with a fever and a sore throat


• A friend who passed his final professional examinations
recommends that you take 10 mg dexamethasone (an anti-
inflammatory drug)
• 2 days later you feel much better!

• Did the medicine help you?

• Impossible to say; all individual disease pathways are unique,
and it is not possible to know whether the medicine
influenced the evolution of the disease
Dr Dexa scenario

• You are a general practitioner and are used to prescribing 10 mg
of dexamethasone to your patients who consult for acute
pharyngitis
• You document their symptoms at 48 hours
• On average, half of your patients are cured within 48 hours
• Is the medication effective?
• Is the drug effective in half of the patients?
• Impossible to say (same criticism)
• We should not fall into the “post hoc ergo propter hoc” trap
Informal fallacy that states: "Since event Y followed event X, event Y must have
been caused by event X."
Evaluation → comparison

• Observing the patient alone does not make it possible to
know if the treatment given was helpful…
• …because we don't know what would have happened without
the treatment or with a different treatment

• On the other hand, we can know if a treatment is useful on
average, by comparing treated patients with untreated patients
(or patients treated differently)
Dr Kompar scenario

• You decide to prescribe 10 mg of dexamethasone to some of
your patients who consult for acute pharyngitis, those with the
most severe inflammation
• The other patients do not receive dexamethasone
• After 48 hours, half of the patients are cured in each of the 2
groups

• Is the medication effective? ineffective?

• We are tempted to conclude that the treatment is not useful
• However… we need to know if the 2 groups of patients had the
same a priori chances of recovery
What does health/disease depend on?
• Genome/human biology
• Environment (physical, social)
• Behaviours
• Health Care (prevention, treatment, rehabilitation)
• …and everything is connected and interdependent

• To evaluate the health care, we must take into account all the
rest!
Dr Historix scenario

• You decide to prescribe, from now on, 10 mg of dexamethasone to
your patients who consult for acute pharyngitis
• For comparison purposes, you retrieve the
records of patients treated for pharyngitis in the past year
who did not receive dexamethasone
• Among current patients, 50% are cured within 48 hours; from
information of patients seen in the past year for whom you
were able to establish the outcome (follow-up), only 20%
were cured

• Is the medication effective?


Possible explanations

• Dexamethasone is truly effective


• The patients of the past year were different
– Different pathogens (Selection bias)
– New ENT specialist has settled in the neighborhood
–…
• Documentation of clinical results is different
– Only some patients returned for follow-up
– Different definition of “cured” (Measurement bias)
–…
What is a bias?
• Systematic error
– will happen again if we do things the same way
• consequence:
– the biased estimators do not fluctuate around the true value
of the parameter, but are deflected up or down
– (for geeks: the expectation of the biased estimator does not
equal the value of the parameter)
• Most common causes
– Non-representative sample (selection bias)
– Incorrect, biased measurement (measurement bias)

• (To be distinguished from random error


– which is due to chance
– is added to the possible bias)
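
The difference between bias and random error can be sketched with a simulation of incomplete follow-up. All numbers are hypothetical: a true cure rate of 50%, and return probabilities of 20% for cured patients and 80% for patients who are not cured.

```python
import random
import statistics

rng = random.Random(11)
TRUE_CURE_RATE = 0.50
# Hypothetical follow-up behaviour: cured patients rarely come back.
P_RETURN_IF_CURED = 0.20
P_RETURN_IF_NOT_CURED = 0.80


def one_study(n_patients):
    """Return (estimate with complete follow-up, estimate among returners only)."""
    cured = [rng.random() < TRUE_CURE_RATE for _ in range(n_patients)]
    returned = [
        c for c in cured
        if rng.random() < (P_RETURN_IF_CURED if c else P_RETURN_IF_NOT_CURED)
    ]
    complete = sum(cured) / n_patients
    selective = sum(returned) / len(returned) if returned else float("nan")
    return complete, selective


results = [one_study(200) for _ in range(1000)]
mean_complete = statistics.mean(r[0] for r in results)
mean_selective = statistics.mean(r[1] for r in results)
print(f"complete follow-up: {mean_complete:.3f}, returners only: {mean_selective:.3f}")
```

Each study's estimate varies by chance (random error), but only the estimates based on returners are centred on a wrong value, here about 20% instead of 50%. Repeating the study does not remove that deviation.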
To prevent bias we should ensure the following:

• Identical patient populations


• Exposed to identical pathogens
• Identical disease severity levels
• Identical associated diseases and severity factors
• Identical basic health care
• Identical methods of measurements of results (outcomes)
• Only one difference:
– Dexamethasone or
– No dexamethasone
How do we make groups “identical” or comparable?

• Deliberately, by pairing
– Identify all the characteristics that influence the outcome
– For each patient treated by A, measure these variables, and
find another patient identical in all respects who will be
treated by B
• Limitations
– Unmanageable logistics
– Incomplete: we never know all the important variables
• Alternative: leave it to chance!

Randomization = patients randomly assigned to groups


Why do we randomise?

• To make the treated and untreated groups comparable in
terms of potential severity factors
• This allows us to:
– Neutralize the differences due to severity factors, in order to…
– Isolate the effect of the treatment
• Randomization neutralizes
– Known severity factors
– But also unknown ones!
• Be careful:
– Chance tends to balance the groups “on average”, when number of
repetitions is large
– In some particular cases we can have unbalanced groups
– It is therefore always necessary to check the comparability of the
groups
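
The "Be careful" point can be sketched numerically. The simulation below assumes a hypothetical severity factor present in 40% of patients and randomizes patients 1:1; the trial sizes are arbitrary.

```python
import random
import statistics

rng = random.Random(5)


def severity_imbalance(n_patients):
    """Randomize patients 1:1 and return the difference in the share of severe cases."""
    severe = [rng.random() < 0.40 for _ in range(n_patients)]  # hypothetical 40% severe cases
    order = list(range(n_patients))
    rng.shuffle(order)  # random allocation: first half treated, second half control
    half = n_patients // 2
    treated = [severe[i] for i in order[:half]]
    control = [severe[i] for i in order[half:]]
    return sum(treated) / len(treated) - sum(control) / len(control)


summary = {}
for n_patients in (20, 200, 2000):
    gaps = [severity_imbalance(n_patients) for _ in range(1000)]
    summary[n_patients] = (statistics.mean(gaps), max(abs(g) for g in gaps))
    print(
        f"n = {n_patients:4d}: mean gap {summary[n_patients][0]:+.3f}, "
        f"largest gap {summary[n_patients][1]:.3f}"
    )
```

The mean gap stays near zero at every size, but single small trials can be clearly unbalanced, which is why the comparability of the groups is always checked.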
How do we randomise?
• Computer generated random sequence
– Simple list
– List in blocks of 4-6-8 patients
– Several lists (e.g. one per center) = stratification

• If the trial is open, as soon as a patient is included:


– We open a numbered opaque envelope that contains the
information on the treatment to be given
– We contact the randomization center by phone or internet

• If the trial is blinded, we give the treatment kit
bearing the next number, without knowing what it
contains
– Kits are prepared in pharmacies
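
A computer-generated list in blocks, with one list per center, might be sketched as follows. The function name, the arm labels "A" and "B", the center names and the seeds are all hypothetical; a real trial would use validated software and keep the list concealed from recruiters.

```python
import random


def block_randomization(n_blocks, block_size=4, arms=("A", "B"), seed=None):
    """Build an allocation list made of randomly permuted, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    for _ in range(n_blocks):
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order inside the block
        allocation.extend(block)
    return allocation


# One list per center gives a stratified randomization (center names are made up).
lists = {
    center: block_randomization(3, block_size=4, seed=i)
    for i, center in enumerate(["center_1", "center_2"])
}
for center, allocation in lists.items():
    print(center, "".join(allocation))
```

Within every block of 4, exactly 2 patients go to each arm, so the arms never differ in size by more than 2 patients during recruitment.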
Concealment of allocation

• Procedure that prevents us from knowing what treatment we
are going to give to the next patient, before his or her inclusion
• Otherwise, we could try to avoid recruiting a patient with good
or poor prognosis in a particular arm of the trial (according to our
own preferences)
• This would run counter to randomisation, and would
reintroduce a possible bias

• To be distinguished from blinding: concealment of the treatment
received, after randomization
The following are not good methods…

• Simple alternation
• Date of birth (even-odd)
• First letter of name
• Random list but visible to all

In these cases we know in advance which patient will go
to which group
Is it ethical to randomize?
• YES, if:
– We have a real uncertainty about the effectiveness of the treatment
– The trial is approved by a research ethics committee that is independent of
researchers
– The study procedures are explained to the patient (including randomisation,
the constraints, etc.), the risks and benefits incurred, the freedom to
participate.
– The patient freely agrees to participate (informed consent)

• NO, if:
– One of the treatments is known to be ineffective or toxic
– The risks for the participants are too high compared to the benefit for
society
– The patient is misinformed, under pressure, or does not provide his/her
consent.
– No ethics committee has approved the study
Dr Alea Scenario

• You decide to perform a randomized clinical trial and get the
green light
• You recruit patients who volunteer to participate
• You distribute the patients randomly:
– One group receives 10 mg of dexamethasone
– The other group does not receive dexamethasone
• You assess, for every patient, whether they are cured at 48 hours
• Among the patients under treatment, 50% recovered, and among
the untreated patients, 25% were cured.

• Is the medication effective?


The following should be considered…
• Treated patients are convinced that this super drug should help
them and suddenly feel much better
= Placebo effect
• Patients on dexamethasone want to please the doctor and say they
feel better even if it is not true, and the disappointed untreated
patients say that they do not feel okay
= Measurement bias
• Dr. Alea, who believes in the treatment, asks questions
differently to the patients in the 2 groups
= Measurement bias
• Dr. Alea suspects that treated patients who are not well have not
taken their treatment, and therefore excludes them from analysis
= Selection bias
• Untreated patients will buy their own drug from the pharmacy,
and this reduces the contrast between groups
= Contamination/dilution
Placebo effect

• Benefit caused by expectations of the patient who thinks s/he
receives an effective treatment
• Real effect that has a neurophysiological basis
Effects of treatment

[Figure: stacked bars showing the components of the observed effect in each arm]
• Active treatment: spontaneous evolution + placebo effect + pharmacologic effect
• Placebo: spontaneous evolution + placebo effect
• No treatment: spontaneous evolution
Why give the placebo?

1) To neutralize the “placebo effect”, and highlight only the
pharmacological effect of active treatment
2) To allow blinding
When not to give a placebo

• It is not ethical when an effective treatment exists


– In this case, the new treatment is compared to the old one
• When it is impossible or difficult:
– Recognizable treatment effects (e.g. heroin substitution)
– Overt intervention: surgery, psychotherapy, physiotherapy,
education, exercise, nutrition
– Informative intervention: diagnostic strategy

• When we want to evaluate a treatment under
conditions that are close to those of real life
(“pragmatic” trials)
– The placebo effect is then part of the evaluated intervention
Blinding
• Procedure that prevents those involved from knowing which
treatment the patient receives in a clinical trial
• Avoids a difference in management of patients in the 2
arms of the trial, beyond the randomized treatments (which
could attenuate or accentuate the differences)
• Enables unbiased measurement and analysis of results
(outcomes)
• May apply to:
– Patients
– Medical/health care team
– Researchers/research associates
– Data analyst
Dr Total Scenario

• You decide to perform a randomized clinical trial, and get
approval from the Ethics Committee
• You randomize the patients:
– One group receives 10 mg of dexamethasone
– The other group receives a placebo that looks identical in all respects
• Patients are not aware of which treatment they took
(blinding)
• Evaluation of cured status at 48 hours is performed by an
assistant who does not know which treatment was received
(blind)
• Among the patients under treatment, 50% recovered, and
among the untreated patients, 25%

• Is the medication effective?


Do not forget the contribution of chance

• Among the patients under treatment, 50% recovered, and
among the untreated patients, 25%

• If it is 10/20 that is compared to 5/20, this difference or a larger
difference is plausible under the assumption that
dexamethasone has no effect (H0): p=0.19
• We do not reject H0, and treatment is not recommended.

• If it is 30/60 compared to 15/60, this difference or a larger
difference is less likely under the assumption that
dexamethasone has no effect: p=0.008
• We reject H0, and we recommend the treatment

• (Bear in mind that in both cases we can be wrong! Type 1 and 2 errors)
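
The two p-values above can be recomputed with a short sketch. The lecture does not say which test produced them; the assumption here is a chi-square test with Yates' continuity correction, which gives the quoted values after rounding. The function name is hypothetical.

```python
import math


def chi2_yates_p_value(cured_1, n_1, cured_0, n_0):
    """Two-sided p-value of the chi-square test with Yates' correction (2x2 table)."""
    a, b = cured_1, n_1 - cured_1  # arm 1: cured, not cured
    c, d = cured_0, n_0 - cured_0  # arm 0: cured, not cured
    n = n_1 + n_0
    denominator = n_1 * n_0 * (a + c) * (b + d)
    if denominator == 0:
        return 1.0  # an empty row or column carries no information
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / denominator
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


p_small = chi2_yates_p_value(10, 20, 5, 20)
p_large = chi2_yates_p_value(30, 60, 15, 60)
print(f"10/20 vs 5/20:  p = {p_small:.2f}")
print(f"30/60 vs 15/60: p = {p_large:.3f}")
```

The proportions are the same in both cases (50% versus 25%); only the sample size changes, and with it the weight given to chance as an explanation.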
Basic ingredients of a randomized clinical trial

• Population/patients:
– Which type of patients, with which disease, are we
interested in?
• Intervention:
– Which intervention is evaluated (new
treatment)?
• Comparator:
– Which intervention is the one compared to?
• Outcome:
– How will success or failure be judged?
– Upon which judgment/evaluation criteria?
Outcome (evaluation criterion) must be…

• Important/relevant for patients


– Survival/mortality
– Cure
– Ability to lead the life you want
• Sensitive to treatment effect
• Measurable in an accurate and valid manner

• Be wary of studies that only measure criteria based
on “paraclinical” evaluations:
– Laboratory tests
– Imaging
Possible evaluation criteria of treatment for
pharyngitis

• Mortality: Too rare, not specific to the disease

• Perceived improvement measured informally (“are you doing
better?”): Too subjective, social desirability bias
• Disappearance of fever: Partial element, influenced by other treatments
• Normalization of the blood count and the C-reactive protein
assay: Not necessarily linked to the patient's condition, shifted in time
• Structured evaluation by the patient of 5 clinical elements
(headache, fever, sore throat, fatigue, difficulty swallowing) on a
numerical scale from 0 to 10, that was previously validated
•…
Example of pharyngitis

• Research question :
– In patients with pharyngitis but who do not need antibiotics
immediately, does a single dose of dexamethasone (10 mg p.o.)
increase the probability of healing at 24 hours, compared to
placebo?
• The research protocol provides the details
– Patients: inclusion and exclusion criteria, recruitment method
– Number of patients to include, according to type 1 and 2 errors, and the
difference to be detected between the arms of the test.
– Randomization and blinding method
– Treatment and placebo details
– Definition of primary outcome (healing at 24h) and secondary outcomes
– Statistical analysis methods
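
The link between the number of patients, the type 1 and 2 errors and the difference to be detected can be sketched with the usual normal-approximation formula for two proportions. This is an assumption, not necessarily the method of the actual protocol, and the planning values (25% cure on placebo, 50% on treatment) are hypothetical.

```python
import math
from statistics import NormalDist


def patients_per_arm(p_control, p_treated, alpha=0.05, power=0.90):
    """Approximate sample size per arm for comparing two proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # set by the type 1 error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type 2 error
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    delta = p_treated - p_control                  # difference to be detected
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Hypothetical planning values: 25% cure on placebo, 50% on treatment.
n_power_80 = patients_per_arm(0.25, 0.50, power=0.80)
n_power_90 = patients_per_arm(0.25, 0.50, power=0.90)
print(n_power_80, n_power_90)  # 55 and 74 patients per arm
```

Asking for more power, a smaller type 1 error or a smaller difference to detect all increase the required number of patients.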
« Association »
• Definition: A and B are associated if the distribution of A
depends on the value of B (and vice versa)

• Examples:
– The probability of recovery depends on the treatment received
– Smoking increases the risk of lung cancer
– Taller people are heavier on average

• Lack of association:
– ABO blood groups have the same distribution in men and women

• How do we detect an association:


– Difference in means
– Ratio or difference of proportions
– Correlation between continuous variables
Comparison of proportions

• Observed proportion of cured patients = estimate of
"risk" of cure = R
– Dexamethasone arm: R1
– Placebo arm: R0
• Measures of association:
– Risk difference: RD = R1 – R0
– Relative risk: RR = R1/R0
• Null hypothesis (dexamethasone has no effect):
– RD=0
– RR=1
• Test: χ2 (chi2)
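
The two measures of association can be computed directly from the counts. A minimal sketch, with a hypothetical function name, applied to the 30/60 versus 15/60 example used earlier in the lecture:

```python
def risk_measures(cured_1, n_1, cured_0, n_0):
    """Return (R1, R0, risk difference, relative risk) for two arms."""
    r1 = cured_1 / n_1
    r0 = cured_0 / n_0
    rd = r1 - r0
    rr = r1 / r0 if r0 > 0 else float("nan")  # RR is undefined when R0 = 0
    return r1, r0, rd, rr


# 30/60 cured on dexamethasone, 15/60 cured in the comparison arm.
r1, r0, rd, rr = risk_measures(30, 60, 15, 60)
print(f"R1 = {r1:.2f}, R0 = {r0:.2f}, RD = {rd:.2f}, RR = {rr:.2f}")
```

Here RD = 0.25 and RR = 2: the same data read as "25 percentage points more cures" or as "twice as many cures".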
Initial comparison (extract)
Initial comparison
• Are the groups strictly identical?
– No
• Are the groups similar enough for their comparison
of cure not to be "confounded"?
– Probably yes (clinical opinion)
• Why are we not doing a statistical test on these
initial comparisons?
– We know that the null hypothesis is true: the patients
represent the same population, and were randomly
assigned
– We would expect 5% of statistically significant results,
which would all be type 1 errors
Cure at 24 hours

• Primary outcome: resolution of symptoms at 24h


– Dexamethasone: 65/288 = 22.6%
– Placebo: 49/277 = 17.7%
– Risk difference: 22.6% - 17.7% = 4.9%

• Clinical interpretation:
– Is 4.9% a small or a large difference?
– If I were a patient, would I take this treatment?
– According to the protocol, the investigators wanted to
detect an 18% improvement (the study had 90% power for
detecting such a difference); they therefore considered
a difference <18% unimportant
Statistical uncertainty
• Resolution of symptoms within 24 hours
– Dexamethasone: 22.6%, Placebo: 17.7%
– Risk difference: 22.6% - 17.7% = 4.9% (95% CI: -1.8% to 11.2%)

• Why the confidence interval? The calculation performed is
precise (despite the typographical error in the article, 4.7%)

• The confidence interval does not concern the observed result,
which is an estimator; it concerns the parameter, the unobserved
true effect of dexamethasone

• The value of this parameter lies between -1.8% and 11.2%


– The effect may be a little deleterious (-1.8% cure)
– The benefit may go up to +11.2% cure
– The effect may be zero (0% is in the range)
– All these values are below the postulated effect at 18%
– This estimation method is wrong 1 time out of 20…
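
A confidence interval for the risk difference can be approximated from the counts. The sketch below uses the simple Wald formula, which is an assumption: the method used in the article is not stated here, and this formula gives limits close to, but not identical with, the published -1.8% to 11.2%.

```python
import math


def risk_difference_ci(cured_1, n_1, cured_0, n_0, z=1.96):
    """Risk difference with an approximate 95% confidence interval (Wald method)."""
    r1, r0 = cured_1 / n_1, cured_0 / n_0
    rd = r1 - r0
    se = math.sqrt(r1 * (1 - r1) / n_1 + r0 * (1 - r0) / n_0)
    return rd, rd - z * se, rd + z * se


# Resolution of symptoms at 24 hours: 65/288 on dexamethasone, 49/277 on placebo.
rd, low, high = risk_difference_ci(65, 288, 49, 277)
significant = not (low <= 0 <= high)  # H0 corresponds to a risk difference of 0
print(f"RD = {rd:.1%} (95% CI: {low:.1%} to {high:.1%}), significant: {significant}")
```

The interval contains 0, so the corresponding test is non-significant, and its upper limit stays well below the 18% improvement the investigators hoped to detect.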
Statistical uncertainty

• Risk difference: 4.9% (95% CI: -1.8% to 11.2%), p=0.14


• Null hypothesis: Difference = 0%
• If dexamethasone had no effect, and we repeat the study, a
difference between the study arms of 4.9% or more will occur in
14% of cases

• 2 possible interpretations:
– Dexamethasone is ineffective (H0 is true) and we observed a result that
is compatible with this hypothesis
– Dexamethasone is effective (Ha is true: 18% improvement) but a type 2
error has occurred; indeed a non-significant result can occur with a
probability of 0.10 under Ha

• We do not know which of the explanations is right, but by
convention we keep the 1st one
Cure at 48 hours
• Secondary outcome: resolution of symptoms at 48 hours
– Dexamethasone: 102/288 = 35.4%
– Placebo: 75/277 = 27.1%
– Risk difference: 35.4% - 27.1% = 8.3% (95% CI: 1.2% to 16.2%), p=0.03
• The gain in cure at 48 hours attributable to dexamethasone is
between 1.2% and 16.2%, all positive values
• If dexamethasone was ineffective, we would see a difference of
8.3% or greater only 3% of the time
• So, either:
– Dexamethasone is ineffective and a rare event has been seen (type 1
error)
– Dexamethasone is effective
• We do not know which of the explanations is right, but by
convention we opt for the 2nd one
Why pre-define the primary outcome?

• The investigators obtained


– A non-significant difference at 24h (p=0.14)
– A significant difference at 48 hours (p=0.03)
• Without a pre-defined protocol, they could have put forward
the result at 48 hours, and affirm that dexamethasone reduces
the pharyngitis symptoms
• Furthermore, they could have tested the differences at 1-2-3-4-
5-6-7…days, and published the most favorable result, "forgetting"
the others
• The search for a “significant” result occurs quite frequently, and
introduces a positive bias in the results published
• Pre-specification of the primary outcome and principal analysis
avoids this bias
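
The size of this bias can be sketched with a simple formula. It assumes the tests are independent, which repeated measurements on the same patients are not, so the figures are an upper bound rather than the exact risk in a real trial.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests is significant when H0 is true."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 2, 7, 15):
    print(f"{n_tests:2d} tests: {chance_of_false_positive(n_tests):.0%}")
```

Under this assumption, testing at 7 time points gives about a 30% chance of at least one "significant" result even if the treatment has no effect, and 15 analyses give more than 50%.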
Which patients to include in the analysis?
• Sometimes patients do not receive the assigned treatment,
or receive it incompletely:
– Side effects
– Perceived lack of effectiveness
– Change of opinion
– Logistic problem

• Two options:
– Intention-to-treat (ITT): patients are maintained in the assigned
group
– Per-protocol (PP): patients are grouped according to
treatment actually received

• Only ITT preserves the benefit of randomization


• PP may introduce a selection bias
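
The selection bias of a per-protocol analysis can be sketched with a simulation in which the treatment has no effect at all. All values are hypothetical: half of the patients have a good prognosis (70% cure) and half a poor one (30% cure), and half of the poor-prognosis patients assigned to treatment stop taking it.

```python
import random

rng = random.Random(3)
N_PER_ARM = 20000
P_CURE = {"good": 0.70, "poor": 0.30}  # hypothetical cure probabilities by prognosis
P_STOP_IF_POOR = 0.50                  # hypothetical share of poor-prognosis patients who stop

# Each record: (assigned arm, treatment actually received, cured?)
patients = []
for assigned in ("treatment", "control"):
    for _ in range(N_PER_ARM):
        prognosis = "good" if rng.random() < 0.5 else "poor"
        received = assigned
        if assigned == "treatment" and prognosis == "poor" and rng.random() < P_STOP_IF_POOR:
            received = "control"  # stopped the assigned treatment
        cured = rng.random() < P_CURE[prognosis]  # the treatment itself has no effect
        patients.append((assigned, received, cured))


def cure_rate(group_index, arm):
    """Cure rate among patients whose assigned (0) or received (1) arm equals `arm`."""
    outcomes = [p[2] for p in patients if p[group_index] == arm]
    return sum(outcomes) / len(outcomes)


itt_difference = cure_rate(0, "treatment") - cure_rate(0, "control")
pp_difference = cure_rate(1, "treatment") - cure_rate(1, "control")
print(f"ITT difference: {itt_difference:+.3f}, per-protocol difference: {pp_difference:+.3f}")
```

The ITT difference stays near zero, as it should for an ineffective treatment. Grouping patients by treatment actually received makes the treatment look beneficial, because the patients who stopped were those with the worst prognosis.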
We should pre-specify…

• Outcomes:
– Primary (unique in general)
– Secondary (can be several)
• Population analyzed
– ITT or PP
– Exclusions
• Statistical analysis method
– Model or statistical test chosen
– With or without adjustment for factors related to disease severity
– Which subgroup analyses

To avoid performing 15 analyses with the aim of
choosing the most favorable or impressive one!
In Summary (1)
• A randomized clinical trial makes it possible to rigorously
assess a medical intervention
• PICO question: does the intervention improve the
evaluation criterion (outcome) in a given population, in
comparison with current practice (or placebo)
• Randomization makes it possible to isolate the effect of the
treatment by making the groups otherwise comparable
• Blinding allows the following:
– Ensure equivalent care in the groups being compared
– Avoid measurement or analysis biases
• Blinding can involve patients, researchers, caregivers,
analyst
In Summary (2)

• The placebo effect is a treatment effect due to
patient expectations; the placebo is an inert
substance that makes it possible to obtain this effect
• 2 roles for the placebo:
– Neutralize the placebo effect
– Allow blinding
• Concealment of the allocation (we do not know
which group the next patient will go to) prevents
manipulation of the recruitment
In Summary (3)

• Pre-specification of the main evaluation criterion
(outcome) and of the main analysis avoids a biased
interpretation of results
• It is recommended to compare randomized groups
(intention-to-treat)
• Any study on humans requires approval of the
research protocol by a Research Ethics Committee or
Institutional Review Board
Objective of the next lecture

• Illustrate the difficulties of evaluating medical
treatments, for example hydroxychloroquine in the
treatment of COVID-19, and some others…
