Describing A Sample: August 2001 Number 2
Describing a sample
Contents
• Samples and populations
• Intersubject variation
• Standard deviation
• Use of graphs to illustrate data
• Centiles and ranges

Samples and populations

What is a “sample”?
A sample is a group of individuals and/or observations selected from a larger group for purposes of analysis. Opinion polls are an example. The views of a selected group are analysed in order to predict those of the population.

What is a “population”?
In statistical terms, the population is an entire group of individuals and/or observations about which information is sought. It does not necessarily refer to people; a population may be a collection of blood pressure or height measurements.

In clinical trials, what do these terms mean and how are they used?
Selected groups of patients (samples) are exposed to various treatments to assess their response. From the results, conclusions are drawn regarding the efficacy of such treatments in general use (i.e. within the population from which the samples are drawn).

Why is choice of sample important?
Not surprisingly, a sample representative of its population is more likely to yield results that predict the response of that population than an unrepresentative sample will.

For example, if a new antihypertensive drug were to be used in an elderly population with moderate hypertension, a trial with a sufficiently large sample of elderly patients with this condition would constitute a representative sample. A trial involving a sample of young patients with malignant hypertension would be unrepresentative as neither the age group nor indication are representative of the population in which the drug will be used.

This is why the inclusion and exclusion criteria of a trial should be specified to allow the reader to judge whether a suitable sample has been selected.

Intersubject variation

What does intersubject variation mean?
It is unlikely that all patients in a sample will respond similarly to an intervention. Some will respond better than others, and some may fail to respond at all. This is known as intersubject variation.

How relevant is intersubject variation?
The degree of such variation will influence the reliability of measurements and subsequent analysis of results. For example, for antihypertensive drugs, if the range of blood pressure responses is wide and the sample size small, the mean blood pressure reading may be an unreliable index of response.

Although differences in individual responses to a drug may relate to recognised patient characteristics, most differences cannot be explained and are therefore attributed to random variation. This causes problems, as it is a factor that may significantly influence results. To overcome this, we use statistical analyses to give the reader an indication of the potential influence of random variation on results. This will be discussed further in later issues.

How is the spread of values, or variability, within a sample described?
An obvious measure is the range, the difference between the highest and lowest values. This was discussed in the first issue of this series of bulletins. However, the range is influenced by extreme values and will vary from sample to sample. One way to overcome this problem is to use the standard deviation.
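The point that the range depends on extreme values, and so varies from sample to sample, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population (systolic blood pressures with a mean of 150 mmHg and a standard deviation of 15 mmHg) is hypothetical, and the names `values`, `value_range` and `summary` are illustrative. Nested samples of 10, 100 and 1000 values are taken from the same simulated population.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the run is repeatable

# Hypothetical population (assumed values, not taken from any trial):
# systolic blood pressure in mmHg, mean 150, standard deviation 15.
values = [random.gauss(150, 15) for _ in range(1000)]


def value_range(sample):
    """Range: the difference between the highest and lowest values."""
    return max(sample) - min(sample)


summary = {}
for n in (10, 100, 1000):
    sample = values[:n]  # nested samples: each contains the previous one
    summary[n] = (value_range(sample), statistics.stdev(sample))
    print(f"n = {n:4d}   range = {summary[n][0]:6.1f}   SD = {summary[n][1]:5.1f}")
```

Because each sample contains the previous one, the range can only stay the same or grow as more values are added, whereas the standard deviation is calculated from every value and should settle near that of the simulated population.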
Standard deviation
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

SD = standard deviation
x = the individual values within the sample
x̄ = the mean value
n = the number of values
∑ = sum
For example, imagine that we have measured the heights of 20 men to the nearest centimetre. The respective heights are:
165, 170, 170, 172, 175, 167, 170, 167, 150, 155, 170, 172, 187, 180, 177, 167, 167, 172, 162, 180.

The mean value (x̄ = 170) is derived from adding all the values together (to give 3395) and dividing by the number of values (n = 20), which gives 169.75, rounded to 170.

If we subtract the mean value from each value in turn, we would get both negative and positive numbers, i.e. -5, 0, +2 etc., and if we added these together the negative and positive numbers would cancel out (giving exactly 0 when the unrounded mean is used). But if each of the numbers is squared, we get a positive number.

Hence, Σ [(165-170)², (170-170)², (170-170)², (172-170)² … etc.] gives Σ [25, 0, 0, 4 … etc.], which equals 1325.

What is the variance of a sample?
The variance is a statistical term which describes the variability within a sample. Using the above example, to calculate the variance, we divide 1325 by [n-1] i.e. 19, to give 69.7 (the variance).

Why is n-1 used rather than n in the formula for standard deviation?
The use of n-1 involves a rather elusive statistical concept known as degrees of freedom. We shall return to this in a future bulletin. By using n-1 we obtain a closer estimate of the variability around the mean within the population from which the sample is taken. As n (the sample size) gets larger, the difference between dividing by n and dividing by n-1 becomes smaller.

Use of graphs to illustrate data

The histogram (figure 1) plots the heights (in 5cm intervals) against the number of men who had a particular height i.e. the height frequency.

What does the histogram describe?
A histogram describes frequency distribution. This shows the relationship between individual values in a sample and the frequency with which those values occur. Thus, figure 1 illustrates that 7 men had a height between 170 and 174cm while only one had a height in the range 155 to 159cm.

What if the sample size is larger?
As the sample size increases, the distribution of the sample will approximate more closely to that of the parent population.

Does the shape of the frequency distribution matter?
In figure 1, the values are roughly symmetrical around the mean value of 170cm. If the sample size is larger the histogram will become even more symmetrical. This type of frequency distribution within a population is known as the Normal (or Gaussian) distribution. It follows a curve which is symmetrical around the mean with a characteristic bell-shape. This is illustrated in figure 2.
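As a cross-check on the hand arithmetic, the calculation for the 20 heights can be repeated in a few lines of code. The following is a minimal sketch in Python; the variable names are illustrative assumptions and not part of the bulletin. It uses the mean rounded to 170, as the worked example does, and also counts the heights in 5cm intervals, which is the information a histogram displays.

```python
import math
from collections import Counter

# The 20 heights (cm) listed in the worked example.
heights = [165, 170, 170, 172, 175, 167, 170, 167, 150, 155,
           170, 172, 187, 180, 177, 167, 167, 172, 162, 180]

n = len(heights)
total = sum(heights)
mean = round(total / n)  # rounded to the nearest centimetre, as in the text

# Square each deviation from the mean so that negatives and positives do not cancel.
squared_deviations = [(x - mean) ** 2 for x in heights]
sum_of_squares = sum(squared_deviations)

variance = sum_of_squares / (n - 1)  # divide by n-1, not n
sd = math.sqrt(variance)             # standard deviation

# Frequency of heights in 5cm intervals, keyed by the lower limit (150, 155, ...).
frequency = Counter((x // 5) * 5 for x in heights)

print(f"n = {n}, total = {total}, mean = {mean}")
print(f"sum of squares = {sum_of_squares}")
print(f"variance = {variance:.1f}, SD = {sd:.1f}")
for lower in sorted(frequency):
    print(f"{lower}-{lower + 4}cm: {frequency[lower]}")
```

Dividing by `n` instead of `n - 1` in the variance line would show how small the difference between the two becomes as the sample size grows.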
The information contained in this bulletin is the best available from the resources at our disposal at the time of publication. Any
opinions reflected are the author’s own and may not necessarily reflect those of the Health Authority
The Normal distribution is described by two parameters: the mean and the standard deviation. Its curves are always symmetrical and bell-shaped; the extent to which the bell is flattened or compressed depends on the standard deviation of the population.
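The effect of the standard deviation on the shape of the bell can be sketched numerically. The example below is illustrative only: it assumes a mean of 170cm, borrowed from the height example, and three hypothetical standard deviations (5, 10 and 20cm), and it uses `NormalDist` from Python's standard `statistics` module to evaluate the curve.

```python
import math
from statistics import NormalDist

mean = 170  # cm, borrowed from the height example

peaks = {}
symmetric = {}
for sd in (5, 10, 20):  # hypothetical standard deviations, in cm
    curve = NormalDist(mu=mean, sigma=sd)
    # Height of the curve at its centre: lower when the SD is larger.
    peaks[sd] = curve.pdf(mean)
    # The curve has the same height at equal distances either side of the mean.
    symmetric[sd] = math.isclose(curve.pdf(mean - 7), curve.pdf(mean + 7))
    print(f"SD = {sd:2d}cm   peak height = {peaks[sd]:.4f}   symmetric: {symmetric[sd]}")
```

A larger standard deviation gives a lower, wider (flattened) bell, and a smaller one gives a taller, narrower (compressed) bell, while the curve stays symmetrical about the mean in every case.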