Tooth-Size Discrepancy and Bolton's Ratios: The Reproducibility and Speed of Two Methods of Measurement
Objective: To determine and compare the reproducibility and speed of two methods of performing Bolton’s tooth-size analysis.
Design: Analysis of a randomly selected clinical sample.
Setting: Bristol Dental Hospital, University of Bristol, United Kingdom.
Materials and methods: Pre-treatment study casts of 150 patients were selected randomly from 1100 consecutively treated
Caucasian orthodontic patients. Bolton tooth-size discrepancies and ratios were measured using two methods; one method
employed entirely manual measurement and the Odontorule slide rule, while the other employed digital callipers and the
HATS analysis software. Twenty study casts were measured twice, a week apart with both methods. Another three
investigators also measured 20 study casts twice with the HATS analysis.
Results: There were small or no systematic errors within or between these two methods. A very significant difference was
evident for mean time measurements between the two methods (mean time for HATS was 3.5 minutes and for Odontorule was
8.9 minutes). There was relatively high error variance of both methods of measurement as a percentage of the total variance.
Conclusions: On-line electronic measurement was found to be more rapid than the manual method used. Both methods
demonstrate relatively high random error and this has important consequences for the clinical use of Bolton’s ratios.
Key words: Bolton’s ratios, tooth-size discrepancy, reproducibility, methods of measurement
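For orientation, the two quantities analysed throughout the paper, the Bolton ratio and the tooth-size correction in millimetres, can be sketched as below. This is a minimal illustration and not the Odontorule or HATS implementation: the mean ratios of 91.3% and 77.2% are Bolton's published values rather than figures stated in this paper, the function names are hypothetical, and the tooth widths are invented for the example.

```python
# Bolton's published mean ratios (Bolton, 1958). These constants are an
# assumption here: the paper cites Bolton's formulas but does not restate them.
BOLTON_OVERALL_MEAN = 91.3   # %, 12 teeth per arch, first molar to first molar (6-6)
BOLTON_ANTERIOR_MEAN = 77.2  # %, 6 teeth per arch, canine to canine (3-3)


def bolton_ratio(mandibular_widths, maxillary_widths):
    """Sum of mandibular widths as a percentage of the sum of maxillary widths."""
    return 100.0 * sum(mandibular_widths) / sum(maxillary_widths)


def tooth_size_excess(mandibular_widths, maxillary_widths, ideal_ratio):
    """Return (arch_with_excess, excess_mm) relative to the ideal ratio."""
    mand = sum(mandibular_widths)
    maxi = sum(maxillary_widths)
    if 100.0 * mand / maxi > ideal_ratio:
        # Mandibular teeth are relatively too large.
        return "mandibular", mand - maxi * ideal_ratio / 100.0
    # Otherwise the maxillary teeth are relatively too large (or the ratio is ideal).
    return "maxillary", maxi - mand * 100.0 / ideal_ratio


# Hypothetical anterior (3-3) mesiodistal widths in mm, for illustration only.
upper_anterior = [7.5, 6.5, 8.5, 8.5, 6.5, 7.5]  # sum = 45.0
lower_anterior = [6.5, 5.5, 5.0, 5.0, 5.5, 6.5]  # sum = 34.0

anterior_ratio = bolton_ratio(lower_anterior, upper_anterior)
arch, excess = tooth_size_excess(lower_anterior, upper_anterior, BOLTON_ANTERIOR_MEAN)
print(f"anterior ratio = {anterior_ratio:.2f}%, {arch} excess = {excess:.2f} mm")
```

Because the correction is the difference between two sums of similar size, small measurement errors in individual teeth can translate into a proportionally large error in the correction value.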
values but not the correction values in millimetres. If a measurement is to be used to determine a therapeutic intervention, then it is important to know the measurement errors in relation to the planned dimensions of that intervention. Houston15 wrote that if any study using measurements is to be of value, it is imperative that such error analysis be undertaken and reported, and the very discipline of undertaking an error analysis should also improve the quality of results.
The aims and objectives of the present study were to determine and compare the reproducibility and speed of two methods of performing Bolton’s tooth-size analysis. The two methods were a manual measurement with the Odontorule (Dental Corporation of America, West Chester, PA, USA) and a computerized method – the Hamilton Arch Tooth System (HATS) (GAC International, Central Islip, NY, USA). The null hypothesis was that there would be no difference between the methods with regard to rapidity or reproducibility. The central hypothesis was that at least one method would be sufficiently rapid and sufficiently reproducible to be a robust and practicable method of measurement for an individual clinical case.

Materials and methods
Pre-treatment study casts of 150 patients were used in this study. The study model numbers of 1100 patients treated consecutively in a teaching hospital from 1999 to 2002 were obtained from the laboratory database and a computer-generated list of random numbers was used to select the sample from this consecutive series. If a case was discarded because it did not meet the selection criteria, the next consecutive eligible case was included. This sample therefore included a random selection of different malocclusions representative of an orthodontic treatment population. The Chairman of the Local Research Ethics Committee confirmed that ethical approval for measuring study casts was not required.
The following selection criteria were used:
• good quality pre-treatment models;
• a fully erupted permanent dentition from first molar to first molar;
• Caucasian ethnicity.
Rejection criteria included:
• gross restorations, build-ups, crowns, onlays, Class II amalgam or composite restorations that affect the tooth’s mesiodistal diameter;
• congenitally missing teeth and impacted teeth.
The mesiodistal diameter tooth sizes were measured from first molar to first molar at the level of the contact points. Contact points were defined at the points on the proximal surfaces, as observed or estimated as those which should be touching when the teeth were perfectly aligned.
Method 1: this employed the Odontorule for analysis of the maxillary–mandibular tooth size relationship. This employs a rotating wheel, which is in effect a circular slide rule, and was developed by Dr David C. Hamilton and Dr Charles W. Patton based upon studies by Dr Wayne A. Bolton as an aid to measurement which would be faster and more convenient than looking up tables of figures. The mesiodistal tooth sizes were measured in millimetres manually to the nearest 0.5 mm with Helios sliding callipers. The sum of the total maxillary and mandibular teeth (6–6) and sum of the anterior maxillary and mandibular teeth (3–3) were calculated using a calculator. The total and anterior ratios were determined by Bolton’s formulas.5 The amounts of correction in the maxillary and mandibular arches for the total ratio and anterior ratio are obtained on the rotating wheel. Each analysis was timed by a stopwatch from the first measurement to the final computation.
Method 2: this employed the HATS software which is available from GAC. All the study casts were measured to the nearest 0.01 mm with digital callipers (PRO-MAX Digital Callipers, Fred V. Fowler Co., Inc., Newton, MA, USA) connected to a computer. The HATS software calculates the Bolton’s ratios and also recommends the tooth size correction in either arch to achieve Bolton’s average ratio for an ideal occlusion. The entire procedure was timed from initial measurement to availability of the calculated results and the results were then printed. Neither the method of measurement nor the timing of the measurements was amenable to blinding of the assessor at recording.

Assessment of reproducibility
The principal intra-examiner reproducibility procedure consisted of the primary investigator (SAO) measuring 20 sets of study models randomly selected from the larger group of 150 patients. These patients were selected from the 150 by means of a computer-generated random number list. A sample of 20 sets of study models was deemed to be adequate and sufficiently representative in relation to the variance of the larger group and to the size of error regarded to be of clinical significance, which was judged to be half the size of TSD (1.5 mm) which Proffit6 considered of potential clinical significance. All the teeth were measured twice for the two methods, with a week between the measurements. An inter-examiner calibration involved three additional
examiners – an experienced orthodontist (NWTH) and two senior trainees (CD and SD) with four years of previous orthodontic training. These three examiners also measured the 20 study models twice using the HATS method only, to determine intra- and inter-examiner systematic error and to compare random errors for that method.

Statistical analysis
The distribution of data was evaluated for normality. For assessment of the systematic error, repeated measures analysis of variance (ANOVA) and the Greenhouse–Geisser approximation were used to test statistical significance. Greenhouse–Geisser is a standard method of dealing with departures from sphericity, that is, the assumption that each of the examiners was related to each other in the same way. The paired-sample t-test was used to evaluate the systematic error and the differences in timings for the two methods of measurement. Random error was calculated in terms of the standard deviation of the differences in replicate measurements as advocated by Houston.15 The variance of the difference between two replicate measurements is double that of a single measurement, so the variance of the differences must be halved to give a correct estimate of the error for a single measurement. This measure was preferred to the root mean square error (as advocated by Dahlberg16 and still frequently employed), because it avoids the possibility of any systematic bias affecting the assessment of random error. Dahlberg’s formula is only accurate if there is no systematic bias. The analysis in the present study included the percentage of the total sample variance that consists of error variance (the variance of replicate measurements), because Houston makes the crucial point that the potential effect of random error on interpretation of results can only be properly estimated in relation to the variance from all sources in a representative sample.

significant difference found between the two sets of measurements; however, the P value for time was close to statistical significance (P = 0.058) and again showed a small reduction in time with the second set of measurements. Table 1c contains the analysis of systematic differences between four examiners using the HATS method. Statistically significant differences were found between the four examiners for the three total arch measurements, but not for the anterior arch measurements or the time. Results also reflected the results for SAO in Table 1b, in that there was no within-operator systematic bias for the HATS method for any of the operators. The mean figures in Table 1c are the averages of the duplicated measurements.

Random error
Random errors are given in Table 2 for Odontorule and Table 3 for HATS for observer SAO. The error variance is a high percentage of the total variance for all measures and for both methods of measurement. The total variance for the sample of 150 was, by chance, much higher than the total variance for the sample of 20 used for the duplicate measurements. A sample size of 20 was judged to be completely sufficient for all analytical purposes except for the comparison of error variance to total variance. The total variance of the randomly chosen 20 subjects was unpredictably smaller by chance than for the sample of 150, so the variance of the full 150 sample was the better choice for the comparison of error variance to total variance. Extending the duplication of measurements to a number greater than 20 would only stand a small chance of increasing the reliability of all other values in the reproducibility analysis. Table 4(a–d) therefore contains the random error analyses for all four observers for the HATS method, using this more representative complete sample variance for comparison with the error variance. The percentages of error variance were correspondingly lower than in Table 3, but still high.
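The random error measures described under Statistical analysis can be sketched as follows. This is an illustration under stated assumptions rather than the authors' actual computation: the function names are hypothetical, the replicate values are invented, and the sample standard deviation (n − 1 denominator) is assumed for the differences.

```python
import math
from statistics import stdev


def houston_error_sd(first, second):
    """Random error of a single measurement: SD of the replicate differences
    divided by sqrt(2), i.e. the variance of the differences halved."""
    diffs = [a - b for a, b in zip(first, second)]
    return stdev(diffs) / math.sqrt(2)


def dahlberg_error(first, second):
    """Dahlberg's root mean square error: sqrt(sum(d^2) / 2n).
    Any systematic bias between the two sessions inflates this value."""
    diffs = [a - b for a, b in zip(first, second)]
    return math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))


def error_variance_percent(first, second, total_variance):
    """Error variance as a percentage of the total sample variance."""
    return 100.0 * houston_error_sd(first, second) ** 2 / total_variance


# Invented replicate ratios (%): the second session reads exactly 1.0 higher,
# so there is a pure systematic bias and no random error at all.
session_1 = [91.0, 92.5, 90.0, 93.0]
session_2 = [92.0, 93.5, 91.0, 94.0]
print(houston_error_sd(session_1, session_2))  # unaffected by the bias
print(dahlberg_error(session_1, session_2))    # inflated by the bias

# Invented replicates with random scatter and no bias; total variance assumed 2.0.
a = [1.0, 2.0, 3.0, 4.0]
b = [1.5, 1.5, 3.5, 3.5]
print(error_variance_percent(a, b, total_variance=2.0))
```

The first pair of sessions shows why the standard deviation of the differences was preferred: a constant offset between sessions leaves it at zero, whereas Dahlberg's formula reports an apparent random error.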
casts twice by the same investigator. The HATS results showed no significant differences between the means although the mean reduction in measurement time of 0.21 minutes or 12 seconds approached significance. For the Odontorule, the mean reduction in measurement time of 42 seconds for the second measurement was statistically significant. Both these results suggest that a process of familiarization was still occurring during this part of the study, in spite of fairly extensive use of both methods by the principal investigator prior to the study. This familiarization factor is of potential significance for the occasional user in a clinical setting.
The Odontorule results also show statistically significant differences in the mean total correction values. The mean differences are approximately 0.6 mm and are therefore small in terms of clinical significance – Proffit6 felt that a discrepancy of <1.5 mm is rarely of significance. The same values when measured with HATS show a similar trend but to a smaller and non-significant extent. Table 1c shows statistically significant
Table 1 (a) Intra-examiner reproducibility (systematic error): Odontorule. Observer SAO (n = 20).
Descriptive                      Mean (time 1)  Mean (time 2)  Mean difference  Lower   Upper   P value
Total Bolton ratio (%)           91.23          91.90          –0.68            –1.13   –0.22   0.054
Upper total correction (mm)      –0.29          0.45           –0.72            –1.26   –0.18   0.007*
Lower total correction (mm)      0.07           –0.49          0.56             0.04    1.07    0.012*
Anterior Bolton ratio (%)        78.07          78.80          –0.73            –1.40   –0.17   0.088
Upper anterior correction (mm)   0.80           0.94           –0.14            –0.53   0.25    0.216
Lower anterior correction (mm)   –0.60          –0.77          0.17             –0.23   0.54    0.270
Time (minutes)                   9.53           8.82           0.70             0.34    1.06    0.001*
Table 1 (b) Intra-examiner reproducibility (systematic error): HATS. Observer SAO (n = 20).
Descriptive                      Mean (time 1)  Mean (time 2)  Mean difference  Lower   Upper   P value
Total Bolton ratio (%)           91.18          91.58          –0.40            –1.15   0.03    0.135
Upper total correction (mm)      –0.40          0.29           –0.43            –1.00   0.14    0.134
Lower total correction (mm)      0.13           –0.26          0.40             –0.13   0.91    0.135
Anterior Bolton ratio (%)        78.15          78.30          –0.15            –0.79   0.50    0.639
Upper anterior correction (mm)   0.54           0.66           –0.12            –0.50   0.30    0.389
Lower anterior correction (mm)   –0.41          –0.51          0.10             –0.21   0.41    0.393
Time (minutes)                   3.97           3.46           0.21             –0.00   0.43    0.058
systematic differences in the mean measurements obtained by the four operators. These mean differences were very small (less than 0.5% and less than 0.5 mm) and were again confined to the total arch measurements. Nevertheless, the existence of any systematic error suggests that considerable familiarity with these techniques is required before there is stability of point identification and that occasional use of this analysis is not appropriate. Inter-operator errors were not analysed for the manual Odontorule method because it had already become apparent that the substantial additional time required for this method, with no evidence of greater reproducibility, made this a method which could not be recommended for clinical use.

Random error
The results for the main examiner (SAO) for the two methods are shown in Tables 2 and 3, and they are similar for both methods. The standard deviation of replicate measurements is of the order of 1 mm for correction and 1% for ratios. These standard deviations are substantial, being more than half the size of TSD (1.5 mm) which Proffit6 considered of significance. It is also important to place the error variance in the context of the total variance of the sample. Midtgård et al.17 suggested that the error variance should not exceed 3% of the total variance, and if it exceeded 10% the applied method of measuring is probably inappropriate. The results from the full sample of 150 revealed a substantially larger total variance and the expert statistical advice was that the larger sample is a more valid indicator of total variance in the orthodontic population. The results using this total variance are in Table 4 and show that for the main examiner (SAO, Table 4a), this estimate of the percentage random error was much closer to the 10% recommended by Midtgård et al.17 but remains higher than is desirable for a robust measurement method.
The inter-examiner reproducibility was then assessed to see whether this percentage of error variance was particular to the main examiner. It can be seen in Table 4(b–d) that for all examiners the error variance was a higher percentage of the total variance than advocated by Midtgård et al.17 and by Houston.15 There were some differences between the examiners, but none of these differences in random error was statistically significant as assessed by Greenhouse–Geisser. Two investigators (CD and SD) were much less familiar with the HATS method, but their random error was similar to that of the main investigator (SAO), who was considerably more familiar with this process. NWTH had been familiar with the method over a longer period of time and had lower random error values, which were within the recommended 10%, but higher than the ideal 3%. The results suggest that experience may reduce random error
Table 4 (a) Random error: HATS. Observer SAO. Using complete sample variance (n = 150) for error variance %.
Table 4 (b) Random error: HATS. Observer NWTH. Using complete sample variance (n = 150) for error variance %.
Table 4 (c) Random error: HATS. Observer CD. Using complete sample variance (n = 150) for error variance %.
Table 4 (d) Random error: HATS. Observer SD. Using complete sample variance (n = 150) for error variance %.
to a worthwhile extent, but not to a level where confidence can be placed in a single measurement. Statistical analysis of this suggestion is complicated by the difficulty in quantifying with validity the relevant experience in a group of four operators.
These results are clearly important in relation to the assessment of a single patient. Great caution should be exercised before instituting an intervention on the basis of one measurement of the Bolton discrepancy. Confidence in the calculation of discrepancy is particularly important if the resulting intervention is reduction of tooth width by interdental stripping or extraction. To reduce the random error for both methods of measurement explored in this study, a clinician is strongly advised to measure the same study models three or four times and then average the values obtained before committing to any active intervention.

Error in relation to tooth irregularity
Locating contact points on a crowded dentition is difficult. The sample in this present study consisted of a variety of malocclusions with a range of crowding. Shellhart et al.8 found that every investigator made at least one error in measurement that was greater than a clinically significant value for the tooth-size excess when measuring Bolton discrepancies on crowded dentitions (at least 3 mm of crowding) with a Boley gauge and needle-point dividers. It would be possible to take study casts in the middle of treatment for analysis once alignment had been achieved and, in very crowded dentitions, this is advisable.

Comparison of errors between methods of measurement
The present study indicated that there were no significant differences between the measurements obtained with the two methods. This result is in agreement with that reported by Tomassetti et al.7 who compared the HATS system and Vernier callipers. Their correlation coefficient between the HATS and the Vernier callipers was r = 0.825. However, there was no separate test for random error with either method in their study.
Shellhart et al.8 studied the reliability of Bolton’s tooth-size analysis when applied to crowded dentitions using needle-pointed dividers and the Boley gauge. For 14 of the 16 measures, there was no statistically significant difference. Random error was estimated by correlation coefficients. These varied very greatly from a reasonable correlation of r = 0.79 to a very low figure of r = –0.15 for intra-investigator errors. Intraclass correlation coefficients for measurements made by four investigators ranged from 0.80 to 0.29. Many of these values are therefore very much lower than would be considered desirable for a good method of measurement. The authors agreed with this view and stated that, ‘If a clinician’s repeatability of the Bolton analysis is average, calculations of tooth-size discrepancy should be viewed as ±2.2 mm.’ This recommended confidence level is very large in relation to a clinically significant TSD and their conclusion raises the question as to what method should, in their view, actually be used to decide on therapeutic intervention.
Zilberman et al.9 also reported that measurement with digital callipers on plaster models showed better reproducibility than measurements on virtual computerized models (OrthoCAD). Importantly, the repeated measures of the total tooth-size widths were evaluated, but not Bolton’s ratios or the discrepancies. They found both random and systematic errors were very small and clinically insignificant. The error of the sum of the tooth widths is, however, likely to be much smaller than the error in the calculated Bolton’s ratios or correction in millimetres, because the sum of tooth widths is a much larger absolute figure. Direct comparison of the reproducibility of Zilberman et al.9 with the present study is therefore not reliable. As has been mentioned, some well-known studies either did not report the measurement error at all11 or reported it very inadequately.12–14
Table 5 Comparison of mean results for the Odontorule and HATS methods (n = 20). Observer SAO. Paired t-test.
Descriptive                      Mean Odontorule  Mean HATS  Mean difference  Lower   Upper   P value
Total Bolton ratio (%)           91.23            91.18      –0.05            –0.56   0.45    0.825
Upper total correction (mm)      –0.29            –0.40      0.14             –0.41   0.68    0.603
Lower total correction (mm)      0.07             0.13       0.05             –0.46   0.56    0.825
Anterior Bolton ratio (%)        78.07            78.15      0.07             –0.59   0.84    0.774
Upper anterior correction (mm)   0.80             0.54       –0.26            –0.64   0.12    0.168
Lower anterior correction (mm)   –0.60            –0.41      0.18             –0.17   0.61    0.341
Time (minutes)                   8.93             3.49       –5.45            –5.88   –5.01   <0.001
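As a rough check on the paired t-test results in Table 5, the standard error, the standard deviation of the paired differences and the t statistic can be back-calculated from a reported mean difference and its limits. The sketch below does this for the Time row. It assumes that the Lower and Upper columns are two-sided 95% confidence limits based on Student's t with n − 1 = 19 degrees of freedom, which the table does not state explicitly. The function name is hypothetical.

```python
import math

# Two-sided 95% critical value of Student's t for 19 degrees of freedom.
T_CRIT_DF19 = 2.093


def paired_summary_from_ci(mean_diff, lower, upper, n, t_crit):
    """Recover (standard error, SD of differences, t statistic) from a
    mean difference and its confidence limits."""
    half_width = (upper - lower) / 2.0
    std_error = half_width / t_crit        # CI half-width = t_crit * SE
    sd_diff = std_error * math.sqrt(n)     # SE = SD / sqrt(n)
    t_stat = mean_diff / std_error
    return std_error, sd_diff, t_stat


# Time row of Table 5 (minutes): mean difference and limits as reported.
std_error, sd_diff, t_stat = paired_summary_from_ci(-5.45, -5.88, -5.01, 20, T_CRIT_DF19)
print(f"SE = {std_error:.3f} min, SD of differences = {sd_diff:.2f} min, t = {t_stat:.1f}")
```

Under these assumptions the t statistic is very large in magnitude, which is consistent with the reported P value of <0.001 for the difference in time.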
