Editorial ajog.


Standard vs population reference curves in obstetrics: which one should we use?
Cande V. Ananth, PhD, MPH; Justin S. Brandt, MD; Anthony M. Vintzileos, MD

Abnormal fetal growth shapes disease risks later in the perinatal period, at infancy, and in childhood stages through chronic diseases and even death later in life.1 Identification of fetal growth abnormalities, at both extremes of growth and particularly at the lower threshold of fetal size, continues to be debated vigorously.2 There are at least 2 reasons that have fueled this ongoing debate. First, fetuses grow at different growth velocities, and divergence in growth is a phenomenon that is relatively well-observed at later gestations. It is therefore important that the heterogeneity in any given biometric parameter be taken into consideration when the percentile distributions are estimated. Fortunately, cutting-edge advances in the biostatistical modeling literature have paved ways to address this effectively.3,4 Second, and arguably a critical issue, is the selection of the right growth curve on which fetal size is assessed. After all, the cohort composition of all growth charts is not the same, and the prevalence of abnormal fetal growth varies across different growth charts.2,5,6

In this issue of the American Journal of Obstetrics and Gynecology, Hoftiezer et al7 develop a prescriptive (standard) birthweight chart based on approximately 1.6 million well-dated, singleton infants who were born in the Netherlands (2000-2014). The birthweight charts were constructed after the exclusion of preexisting maternal high-risk conditions and high-risk conditions that developed later in pregnancy. The charts were derived from infants born at 23-42 weeks gestation to “healthy” mothers after uncomplicated pregnancies and spontaneous onset of labor (ie, low-risk subjects). The authors concluded that their “standards” resemble the fetal weight “reference” charts and have greater ability to discriminate between normal and abnormal [...] conclusion in a large Canadian study8 that compared the INTERGROWTH 21st standards9-11 to Canadian birthweight [...] cautioned regarding the premature adoption of the standards.

What is a “standard,” and how does it differ from a “reference” curve? In our view, authors and readers poorly appreciate their distinctions. The goal of this Editorial is focused exclusively on clarification of the similarities and differences between standard and population reference curves and highlighting the strengths and weaknesses of each approach.

A note on terminology: standard vs (population) reference
A growth “standard” is constructed by the selection of only a part of the population, often those with no complications or with normal outcomes (ie, “low-risk” subjects). In contrast, a “reference” (population) curve is based on an unselected group of subjects and combines both low- and high-risk subjects and cases with normal and abnormal outcomes, thus being equated with a “population” curve. We also must define the commonly used terms “nomogram” and “normogram.” The term nomogram is a mathematic term that was derived from the Greek words “νόμος,” which means “law,” and “γραμμή,” which means “line,” and describes how the data points can be used [...] E), which has nothing to do with the nature of the studied population. The term normogram, which is derived from the word “normal,” is a graph that depicts the distribution or range of normal values, regardless of the type of population studied. Therefore, both terms can be applied to both standard and reference population curves. The clinical implications of the use of standard vs reference curves depend on the specific clinical scenarios of interest. Thus, an understanding of the pros and cons for each approach is of paramount importance. A glossary of various terms related to the assessment of size and growth is described in the Table.
Glossary of terms to define growth and size

Term: Definition

Reference: Statistical summary of the frequency distribution of fetal size of a reference population (descriptive); this descriptive depiction of fetal size across gestational age is based on an unselected population.

Standard: Nomogram of the frequency distribution of fetal size of an ideal population (prescriptive); this prescriptive depiction of fetal size across gestational age is based on a selected, low-risk population and reflects aspirational fetal size.

Nomogram: A graphic representation that consists of several lines marked off to scale and arranged in such a way that, by the use of a straightedge to connect known values on 2 lines, an unknown value can be read at the point of intersection with another line (Merriam Webster’s definition); a nomogram can be applied to both standards and references.

Growth chart: Reference or standards of fetal size, not velocity; the data may be based on cross-sectional or longitudinal ascertainment of ultrasound or birthweight data.

Descriptive chart: Reference of fetal size that is based on a specific unselected population.

Prescriptive chart: Standard of fetal size that is based on a selected, low-risk population.

Birthweight chart: Reference of newborn infant size based on birthweights that are assumed to correlate with gestational age at delivery; because preterm infants are more likely to be pathologically small, birthweight-for-gestational-age charts generally underdiagnose small for gestational age at preterm gestations.

Ultrasound-based chart: Reference or standards of fetal size based on sonographic biometric parameters of fetal size that correlate with birthweight.

Individualized chart: Individualized assessment of fetal size based on early ultrasound assessments and projected trajectory of fetal growth.

Customized chart: Statistical summary of the frequency distribution of fetal size that incorporates maternal and fetal physiologic parameters (such as maternal race, parity, body mass index, and fetal sex).

Size: Quantitative assessment of fetal size or estimated weight at a specific gestational age (usually from cross-sectional assessments).

Velocity: Quantitative assessment of the change in fetal size over time, which reflects fetal growth (usually from a longitudinal study).

Growth: Quantitative assessment of velocity (rate of change) that is ascertained longitudinally.

Fetal growth restriction: Estimate of fetal size at a specific gestational age that is below a predefined threshold (usually the bottom percentile) based on a specific reference or nomogram; although this distinction is intended to identify fetuses who are at risk for adverse perinatal outcomes, some of these fetuses are not at risk for these complications; the prevalence of fetal growth restriction is dependent on the reference or nomogram.

Small for gestational age: Birthweight at a specific gestational age that is below a predefined threshold (usually the bottom percentile) based on a specific reference or nomogram; although this distinction is intended to identify neonates who are at risk for adverse neonatal outcomes, some small-for-gestational-age neonates are not at risk for these complications; the prevalence of small for gestational age is dependent on the reference or standard.

Pathologic small size: Distinction of small fetal size associated with adverse perinatal outcomes; optimally, only these fetuses would be identified as fetal growth restricted; however, in clinical practice, some may be characterized as normally grown and not exposed to antenatal surveillance.

Constitutional small size: Distinction of small fetal size that denotes normal fetal size; optimally, none of these fetuses should be identified as fetal growth restriction because they are not at risk for adverse perinatal outcomes; however, in clinical practice, some of these fetuses may be characterized as fetal growth restricted and exposed to potential, iatrogenic risks that are associated with false-positive antenatal surveillance and associated interventions.
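
The glossary notes that the prevalence of small for gestational age depends on the chart that is used. A minimal sketch of that dependence follows. It assumes, purely for illustration, that birthweight at one gestational age is normally distributed and that the selected low-risk subset is somewhat heavier and less variable than the unselected population; the means and standard deviations are hypothetical and are not taken from any chart cited in this Editorial.

```python
from statistics import NormalDist

# Hypothetical birthweight distributions in grams at one gestational age.
# These parameters are illustrative assumptions, not published chart values.
reference = NormalDist(mu=3400, sigma=480)  # unselected population (descriptive)
standard = NormalDist(mu=3480, sigma=430)   # selected low-risk subset (prescriptive)


def percentile(chart: NormalDist, weight_g: float) -> float:
    """Percentile (0-100) of a birthweight on the given chart."""
    return 100 * chart.cdf(weight_g)


def cutoff(chart: NormalDist, pct: float = 10) -> float:
    """Birthweight in grams at the chosen percentile of the chart."""
    return chart.inv_cdf(pct / 100)


ref_cut = cutoff(reference)
std_cut = cutoff(standard)

# Share of the unselected population that falls below each chart's cutoff.
sga_by_reference = percentile(reference, ref_cut)  # 10 by construction
sga_by_standard = percentile(reference, std_cut)

print(f"reference 10th percentile: {ref_cut:.0f} g, flags {sga_by_reference:.1f}%")
print(f"standard 10th percentile:  {std_cut:.0f} g, flags {sga_by_standard:.1f}%")
```

Under these assumptions the same percentile cutoff corresponds to a heavier birthweight on the standard, so a larger share of the unselected population is labeled small for gestational age; with different assumed distributions the direction and size of the difference could change.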
Second, should the curves that are generated be based on the exclusion of a particular set of high-risk conditions and not others? For instance, the standard curves by Hoftiezer et al7 were based on the exclusion of maternal high-risk conditions only and not fetal or neonatal factors. The authors stated that the “final low-risk study population consisted of live-born singleton infants, born to ‘healthy’ mothers after uncomplicated pregnancies and spontaneous onset of labor.” However, their study included preterm births that started at 23 weeks gestation. Are such newborn infants really low-risk or are births at preterm gestations an abnormal phenomenon? It would seem that any growth chart based on preterm birthweights would have to be “descriptive” in nature; yet, Hoftiezer et al7 have created a standard that appears to be more effective at the identification of growth abnormalities than the Dutch birthweight charts. Another approach, which is mainly relevant to fetal growth curves, could have excluded not only all mothers with complications or high-risk conditions but also all neonates who experienced complications at or after birth. But again, is the prospective clinical application of such curves appropriate when the final outcome is not yet known? In short, the inclusion criteria for a standard curve may reflect selection bias that limits the clinical applicability of the standards across specific populations.

The selection bias that remains inherent in the development of each standard also prevents appropriate comparisons between the standards. This point is emphasized by the recent confusion and debate regarding the adoption into clinical practice of the newly developed growth standards (the INTERGROWTH 21st project,9-11 WHO Multicentre Growth Reference Study,13 or the NICHD’s Fetal Growth Studies14). Because the scientific community has grappled with the uncertain clinical utility of these standards, researchers have started to compare them with established reference curves. As we have demonstrated earlier, the dramatically different assumptions and the inherent selection bias complicate these comparisons.

For example, INTERGROWTH 21st was applied to all singleton live births in Canada (excluding Quebec) from 2002-2012. With the use of the birthweight-for-gestational age, the frequency of small for gestational age (SGA) and associated neonatal morbidity/mortality rates was determined and compared with the Canadian birthweight reference. The study found important differences in the frequencies of SGA and neonatal morbidity and mortality rates that were associated with specific percentile categories. Although it is possible that the difference reflects real “biologic” differences, it is more likely that these differences are the consequence of varying cohort composition.

Third, the prospective clinical application of standard curves that are created based on exclusion of preexisting high-risk conditions may be problematic because they do not correct for complications that may occur later in gestation. In real life, a proportion of women who are recruited early in pregnancy inevitably will experience (pathologic) complications that are likely to affect fetal size and growth. These pathologies, depending on the gestational age at which they occur or are diagnosed, will have important implications in the identification of newborn infants that are SGA or large for gestational age for pathologic vs constitutional reasons.15 Should these patients be later excluded because of the development of high-risk conditions such as preeclampsia or gestational diabetes mellitus? Thus, the practicability of standard curves should be examined cautiously. It is important to emphasize that, no matter which curves are used, we may never be able to capture all at-risk fetuses with a single assessment of fetal size and that longitudinal assessments of fetal growth that include individualized and customized growth charts may be reasonable alternatives.16-18

The fourth, and perhaps the most underappreciated, issue pertains to the purpose for which a nomogram (standard or reference) is intended. For a given fixed percentile cutoff, the use of standard curves will increase the sensitivity in the classification of fetuses as growth-restricted or newborn infants as SGA; inevitably, the specificity will be decreased. Thus, standard curves may be viewed as “screening” tests (“Clinical implications”).

The pros and cons of using population reference curves
Population reference curves are developed for a defined population and without any subject exclusions. The advantage of the use of population reference curves is that these can be used prospectively to evaluate all patients, low- and high-risk, without the need to know the outcome. The disadvantage of the use of population curves is that these may not be comparable from population to population, given the differences in racial and ethnic composition and other biologic factors that affect growth. Thus, investigators are “forced” to construct and use fetal and birthweight reference curves from their own population.

Population curves based on birthweights have been used widely to identify fetuses with pathologic growth. As a consequence of including preterm deliveries, these references underdiagnose impaired fetal growth at preterm gestations when there are high rates of growth restriction. On the other hand, the ultrasound curve of Hadlock et al,19 which remains the most widely used fetal growth reference in the United States, was based on ongoing pregnancies, thus including normal fetuses. This is the most likely reason that it resembles the standard curves of Hoftiezer et al.7

In contrast to a standard curve, for a given fixed percentile cutoff, the use of population reference curves will increase specificity but may result in lower sensitivity in the detection of fetuses or newborn infants with abnormal growth. Thus, population reference curves may be viewed as “diagnostic” tests20 (“Clinical implications”).

Clinical implications
What are the clinical implications for the adoption of a standard vs a population reference curve? Although the answer may be seemingly trivial, an important, yet underappreciated, implication pertains to an understanding of the aims and scope of the generated curves. Charts from unselected (reference curves) vs selected (standard) populations each have a distinct role with respect to clinical implications. Application of a chart from a standard curve to assess “newborn infant size” invariably will render the newborn infant relatively “large” (ie, higher percentile distribution) in comparison with a reference curve. Although such distinctions may be obvious for well-grown newborn infants, the problem surfaces at the extremes of growth (percentiles at <5 or >95). It is in these percentile ranges where the application of an appropriate chart matters.

To put this in perspective, consider the evaluation of the effectiveness of a new test to identify subjects with a disease.21

