Cao et al.
Vol. 31, No. 8 / August 2014 / J. Opt. Soc. Am. A
1773
Scaling perceived saturation
R. Cao,1 M. Castle,2 W. Sawatwarakul,1 M. Fairchild,2 R. Kuehni,1 and R. Shamey1,*
1
Color Science and Imaging Laboratory, North Carolina State University, Raleigh, North Carolina 27695-8301, USA
2
Munsell Color Science Laboratory, Rochester Institute of Technology, Rochester, New York 14623-5603, USA
*Corresponding author:
[email protected]
Received April 9, 2014; accepted June 21, 2014;
posted June 26, 2014 (Doc. ID 209771); published July 23, 2014
Two psychophysical experiments were conducted at North Carolina State University (NCSU) and Rochester
Institute of Technology (RIT) to obtain replicated perceived saturation data from color normal observers on
the order of one unit of saturation. The same 37 Munsell sample sheets, including up to four references that
had similar perceived saturation but different hue, were used in both experiments. Different assessment methods
included presenting either four references simultaneously or only one reference at a time to observers and
obtaining judged saturation magnitudes for the given Munsell samples. Four saturation models comprising
S ab , S uv , CIECAM02, as well as Richter/Lübbe, were tested. CIECAM02 gave the best prediction of saturation
for data obtained at NCSU while S ab outperformed other models for the RIT data. For the combined dataset,
S ab , the Richter/Lübbe, and CIECAM02-based saturation models exhibited comparable performances. The Standardized Residual Sum of Squares index was used to measure the inter- and intra-observer variability and goodness
of fit. Inter- and intra-observer variability of assessments was smaller than or comparable to those reported for the
typical color difference evaluation experiments. © 2014 Optical Society of America
OCIS codes: (330.1710) Color, measurement; (330.5020) Perception psychology; (330.1730) Colorimetry.
http://dx.doi.org/10.1364/JOSAA.31.001773
1. INTRODUCTION
The term “hue” is generally defined as denoting an “attribute
of visual perception according to which an area (in the visual
field) appears to be similar to one of the colors named red,
yellow, green, or blue, or a combination of adjacent pairs
of these colors, considered in a closed circle.” So-called achromatic colors lack a hue and are named white, gray, or black.
They differ in terms of perceived lightness. These are two of
the three dimensions generally taken to define the multitude
of perceived colors of objects. Hues can be considered to have
a qualitative aspect, e.g., the orangeness of orange, but also a
quantitative aspect, from a very weak orange to orange at its
highest chromatic intensity. Viewing an orange-colored object
in daylight, in natural surroundings, results in a somewhat
weaker experience of orangeness. The former is considered
to provide the perception on an absolute basis with the three
dimensions being hue, brightness of the light viewed directly
or reflected from a white surface, and what today is called
colorfulness. This term was proposed by Hunt [1] in 1977
for the purpose of designating “an absolute subjective chromatic response.” On a relative basis the related terms are
hue, lightness, and chroma, with lightness defined as relative
brightness: the brightness of an area of an object in a field of
view relative to the brightness of another object that is perceived as white. Chroma is defined as the colorfulness of
the object compared to that of the object appearing to be
white. The term “chroma” was introduced at the beginning
of the 20th century by A. H. Munsell to designate chromatic
intensity of object colors at levels of equal lightness.
In 1705, in his book Opticks, Newton [2] used the term “intensiveness” to describe the perceptual property designated
by radial lines from the achromatic center to the periphery
of his hue circle. Helmholtz [3] used the term Sättigung
1084-7529/14/081773-09$15.00/0
(saturation) and, in the first edition of his Handbuch der
physiologischen Optik in 1867, appears to have been the
first to introduce the idea of a color cone with black at the
endpoint. In his 1874 book for artists, W. von Bezold included
a more detailed depiction of the color cone with colored
illustrations of the view toward both ends. A psychophysical
version of a color cone was sketched by Richter et al. [4]
in 1940, who drew lines of constant saturation into a
cross section through the Luther–Nyberg object color
solid (Fig. 1).
This figure indicates that a color cone is an idealization
and that the true picture is more complicated. Asked in the
1940s to develop a new kind of perceptually uniform color
atlas Richter experimentally determined a constant-saturation contour in the 1931 CIE chromaticity diagram and built
the system around it. The contour bears reasonable resemblance to the Munsell value 6/chroma 8 contour of the
Munsell system. But the Munsell system is a cylindrical,
not a conical, representation of chromatic intensity. The result of the effort is known as the DIN6164 system. The atlas
was developed based on limited experimental data, intraand extrapolation of those data, and an attempt to generate
a perceptually uniform system by transforming the chromaticity data in the CIE x, y diagram into MacAdam’s 1937 u, v
version of Judd’s perceptually more uniform UCS system of
1935 [5,6].
As is evident, over the years the definition of the term “saturation” has undergone a number of changes. With the introduction of the term “colorfulness” in an absolute sense and
with “chroma” defined as “colorfulness as a proportion of
the brightness of a similarly illuminated area that appears
white” the present definition for “saturation” is “colorfulness
of an area judged in proportion to its brightness,” in a sense
© 2014 Optical Society of America
1774
J. Opt. Soc. Am. A / Vol. 31, No. 8 / August 2014
Fig. 1. Lines of constant saturation drawn into a cross section
through the Luther–Nyberg object color solid [4].
comparable to that of the DIN6164 system. Expressed in terms
of the Munsell or colorimetric systems, some numeric definitions of saturation are as follows:
Munsell system:
S
C
:
V
CIELUV/CIELAB systems:
S uv 100
C uv
;
L
S ab 100
C ab
.
L
CIECAM02 system, where M is an expression for colorfulness
and Q represents brightness [7]:
r
M
:
Q
S 100
Richter/Lübbe formula: Lübbe interpreted a general concept
by Richter as expressible in the following formula [8]:
C ab
:
S 100 q
L2 C 2
ab
It is of some interest to compare the relationship between
Munsell chroma steps for two essentially complementary hues
and the CIELAB, CIELUV, and Lübbe formula results (see
Fig. 2). Based on the Munsell renotation of the optimal object
color limits at value 8 for hue 5Y and at value 4 for 2.5PB, essentially linear relationships are obtained between Munsell
chroma and both CIELAB and CIELUV S , while for the blue
color only S ab is linear. Here, the relationship between
Munsell chroma and S uv is strongly nonlinear because of
the compression of yellow and greenish yellow hue stimuli
in the u, v chromaticity diagram. The relationship is highly
nonlinear between the Lübbe formula data and Munsell
Cao et al.
chroma for both hues. The results indicate that these three
formulas predict quite different saturation values for identical
stimuli.
The concept of saturation is of theoretical and practical importance, but it is perceptually not as intuitively accessible as
that of chroma. When viewing, e.g., 2.5PB 2/4 and 5/12 blue
samples under the same viewing conditions it is not easy to
comprehend the essential identity of saturation as predicted
by S ab , even though blueness of the darker sample is clearly
visible in good illumination. It is even more difficult to conceptually accept that samples 5Y 3/4 and 8/14 have nearly
identical saturation. It is equally difficult to comprehend,
when viewing the sample series 2.5PB 9/2 to 2/2, that the
saturation between the lightest and the darkest sample increases by a factor of nearly 7. These seem to be abstract facts
not perceptually evident at first glance.
There have been relatively few perceptual experiments
that attempt to assess which formula is in best agreement with
extensive experimental perceptual determinations of saturation. A relatively recent evaluation is that by Juan and Luo
in 2000 [9]. For the purpose of saturation judgments, Juan’s
samples consisted of 132 NCS samples of different hues, lightness, and chromaticness, presented as cubes against white,
light gray, and black surrounds. Seven observers were
instructed in the meaning of the attributes, hue, lightness,
colorfulness, and saturation and, in case of saturation, observers made comparison judgments against three different-hued
reference samples. They were given a saturation value for the
reference sample and judged the saturation of the test samples
against it on an open scale. The observers were given
extensive training in judging saturation. Mean results for all
observers were compared against several appearance models
available at the time, with the best fit obtained with the
LLAB96 model (with the mean coefficient of variation for
the three surrounds of 21%). Of all four attributes estimated
in the general experiment, saturation was found to have
the highest inter-observer variability. A conclusion was that
observers can be trained well to make comparable saturation
judgments.
Using Munsell atlas samples for an experiment assessing
perceptual saturation is useful because, at a given value level,
all samples have identical colorimetric lightness, differing
only in chroma, unlike in the case of the NCS atlas samples.
While chroma steps are not closely related to saturation, sample series offer the opportunity to determine the mean
perceived saturation at a given lightness level as falling on
one or between two neighboring chroma steps. A related
benefit is the ability to see to what degree, if any, there is confusion by the observers about the sequence of constant value
samples in regard to saturation (and implicitly chroma).
Fig. 2. Munsell chroma versus saturation for 5Y (left) and 5PB (right).
Cao et al.
The purpose of the present experiment was twofold: (a) to
establish new data of saturation judgments, based on evaluating Munsell atlas samples against samples designated, based
on results from a preliminary experiment, as having an
arbitrary saturation value of 1; (b) determine the replicability
of results by performing the same experiment using the same
methodology and samples in a different location with different
experimenters and observers.
Samples of ten different hues were used in the experiment.
For each hue, at a given colorimetric lightness, there were
three or four samples differing in chroma around the sample
with a Munsell value and chroma-related saturation value near
1. Thus, this experiment is limited to the Munsell atlas samples
with the highest chroma. The methodology does not provide a
complete assessment of the perceptual concept of saturation
but provides data for a limited set of conditions. Assessing in a
similar manner the saturation of high and low colorimetric
lightness samples is, for most hues, not possible because of
the limited number of samples available in the Munsell atlas
at high and low values. An additional possible experiment
would be to assess, for as many as 40 hues, intermediate
saturation on the basis of the mean 1-unit data established
in this experiment, for example at 0.5 units of perceived
saturation.
2. METHOD
A. Samples
A total of 37 Munsell atlas samples were obtained from X-Rite
Ltd. Several 200 × 200 samples were cut from the same Munsell
sheets and used in both locations for the original and
replicated experiment. All samples were measured with a
DataColor SF600 spectrophotometer with a large area view
aperture (30 mm), and UV and specular light excluded. Samples were rotated 90° and repositioned after each reading to
reduce measurement error. Each sample was measured a total
of 4 times and the results were averaged. Illuminant D65 and
the CIE 1964 Supplementary Standard Observer were used for
all colorimetric calculations. The colorimetric attributes of the
samples, L a b , are given in Table 7 in Appendix A, with the
four reference samples identified in bold letters. The location
of samples in the CIELAB a b diagram is shown in Fig. 3.
Colorimetric data of the samples were also obtained before,
during, and after the experiment, with a mean change by
sample of 0.28 ΔEab units.
B. Sample Presentation
At NCSU samples were presented to observers in a SpectraLight QC calibrated viewing booth with a single test sample
presented above the four reference samples on a light gray
easel with its surface at a 45° angle relative to the booth surface and illuminated from above (as shown in Fig. 4) with a
calibrated filtered tungsten approximation of D65 illumination
at an intensity of 2200 lux and color temperature of 6560 K.
The measured L a b values of the easel are 72.51, −1.03, and
0.04, respectively, and those of the surrounding booth surface
are 75.71, −0.41, and 1.31. A PTFE white standard was placed
in the booth where samples were presented and the white
point of the light source was measured using a Photo
Research PR670 spectroradiometer. The absolute and relative
tristimulus values of the background, easel, and PTFE are
shown in Table 1. The presentation of the samples in the
Vol. 31, No. 8 / August 2014 / J. Opt. Soc. Am. A
1775
Fig. 3. Distribution of samples in the CIELAB a b plane (red stars
denote the location of the four references).
original experiment and its replication was essentially identical. However, the viewing booth used at RIT was a GretagMacbeth SpectraLight III set on “Daylight 75” with a CCT of 7500 K
and an illuminance of 950 lux, as measured by a Konica
Minolta CS-100A. The easel used was a copy of that used
at NCSU with similar L a b coordinates. While the actual
light sources employed in experiments were different from
the standard illuminants used for colorimetric calculations, results based on using the spectral power distribution of actual
light sources are only slightly different from those based on
standard illuminant data and do not alter the main findings
and conclusions drawn here. As such, only the standard illuminant data are used for all calculations reported in the
present work.
C. Observers
The number of observers in the experiment at NCSU was 28
[14 male (M), 14 female (F), average age 27], while 20 observers (10 M, 10 F, average age 30) were employed in the replication at RIT. All observers were tested and found to have
normal color vision. In the NCSU experiment, 72% of observers had little or no previous experience in making color related judgments, 52% were of Asian, 17% of Middle Eastern,
and 31% of Western ethnicity. In the RIT replication, 45% of
observers had little or no experience in color judgments
and 90% of the observers were Western and 10% were Asian.
Fig. 4. Visual assessment involving 45/0 illumination viewing geometry, and a custom made sample stand painted in neutral gray that
housed the standard and test samples.
1776
J. Opt. Soc. Am. A / Vol. 31, No. 8 / August 2014
Cao et al.
Table 1. Normalized Tristimulus Values of the
Easel, Booth Surface and the
PTFE Plate at NCSU
PTFE
EASEL
BOOTH
X10
Y10
Z10
95.82
47.30
42.22
100.00
49.41
44.42
111.18
53.58
49.35
statistical tool is its symmetry whereby S and V can be interchanged. In addition, STRESS is confined to the range of
0–100, where larger values mean worse agreement between
visual and computed saturation and vice versa. For a given
visual dataset, the ratio of the square STRESS values
from two saturation models, shown in Eq. (2), follows an
F-distribution, and can be used to compare the statistical
significance of two formulas at any confidence level:
The observers were requested to read an information sheet
about the experiment before the first test, including: (1) illustrations of the concept of saturation via a DIN color order
system on a color calibrated display [10]; (2) the procedure
of the experiment and the task of the observers as described
in Appendix B. Before each test observers were exposed to
the experimental setup for at least 5 minutes to adapt them
to the viewing conditions.
D. Test Procedure
At NCSU, after the adaptation period, two types of assessments were carried out. First, single samples were placed
in random order on the easel and the observer was asked
to assess the saturation of the test sample in relation to the
four references displayed below the test sample, each considered to have a saturation value of 1.0 based on results of a
preliminary experiment described in this section. The results
could be expressed in fractions of 1 or multiples and fractions.
Following the completion of this task, references 1 and then 3
were placed below the test sample, one at a time, and the
observer was asked to repeat the assessment of all samples
based on each reference. Thus, each observer assessed each
sample’s saturation value in one trial three times. In the replication experiment at RIT, however, all four references were
presented at the same time and observers gave four ratings of
each test sample based on the references, assuming that the
saturation value of each reference sample was 1. Each
observer performed the test sequence of all samples three
times, with at least 24 h between tests. If, during assessments,
observers wanted to reacquaint themselves with the concept
of saturation, due to having difficulty in assigning values, they
were allowed to view the illustrative examples on the computer display and review the information sheet. Several
observers expressed that assigning numerical values to saturation was difficult. About one third requested to view the instructions again in the second trial but felt more comfortable
completing the task in the third trial.
E. Measure of Fit
The standardized residual sum of squares (STRESS), shown in
Eq. (1), was proposed by Garcia et al. [11] as a tool to determine the goodness of fit for the visual data and predicted data
and the statistical significance of the differences between
models, and to evaluate the inter- and intra-observer variability in perceptual studies [12].
P
S i − F 1 V i 2 1∕2
STRESS 100
;
F 21 V 2i
P 2
S
where F 1 P i .
Si V i
(1)
Here, V i and S i are the visual and computed saturation for the
ith sample, and F 1 is a scaling factor. A key property of this
F
STRESS2A
:
STRESS2B
(2)
A critical F value, F c , which can be obtained from a lookup
table or calculated, is the lower value of a two-tailed F
distribution with 95% confidence level, where F C
fdfA; dfB; 0.025, and dfA and dfB are the degrees of freedom.
In this study, F C 0.51 and 1∕F C 1.96 [13]. Despite its clear
advantages over other metrics, it should be noted that
STRESS is based on a linear model that crosses the origin,
and should, thus, be used with caution [14].
To determine inter-observer variability, STRESS was calculated between the mean visual ratings from repeated trials of a
given observer and the mean visual ratings obtained from all
observers’ evaluations. For intra-observer variability, STRESS
was computed for the visual ratings of each observer in each
trial and the mean visual ratings from three trials for the same
observer. For the performance evaluation of various saturation formulas, the overall arithmetic mean visual ratings from
all observers and the predicted saturation values are used.
Calculation of the results based on geometric means was
not found to change the results significantly.
F. Selection of Reference Samples
The reference samples were selected on the basis of a preliminary test at NCSU involving five color normal observers using
an identical experimental procedure, except that in addition
to showing all four references simultaneously in part one of
the experiment (Mtd1) they repeated the test using each of the
four references individually (Mtd2). In the preliminary test,
10R4/8, 10G4/8, 10GY5/8, and 10PB3/6 were employed as
references. The criteria for the selection of references were:
(1) selected samples should exhibit close visual ratings based
on Mtd2 and (2) they should differ significantly in hue. The
experimental results for the methods described showed similar intra- and inter-observer variability. In terms of STRESS
inter-observer variability was 19.31 (Mtd1) and 16.61 (Mtd2)
while intra-observer variability values were 17.68 (Mtd1)
and 21.24 (Mtd2). The overall mean perceived saturation
based on the above two methods is shown in Fig. 5. Based
on the above criteria and using the evaluation results of
the preliminary experiment samples 10R4/8, 10YR7/10,
10Y7/10, and 10B3/6 were selected as references showing
visual saturation values of 1.365, 1.360, 1.412, and 1.343,
respectively, and in the following visual experiment were
assigned a saturation value of 1 unit.
3. RESULTS
A. Inter- and Intra-Observer Variability
The degree of inter- and intra-observer variability reflect the
“accuracy” of assessments by a given observer and “precision”
among a group of observers, respectively. An “accurate
observer” is one that agrees closely with mean visual results
Cao et al.
Vol. 31, No. 8 / August 2014 / J. Opt. Soc. Am. A
1777
Fig. 5. Scatter plot of visual results for the two methods examined in
the preliminary experiment.
from all observers, assuming that the mean values were the
“true” values for each sample.
The inter-observer variability results at NCSU are shown in
Fig. 6(a). They are comparable for the two different methods
examined. The mean, maximum, and minimum STRESS
values are 15.0, 33.0, and 6.5, respectively. Intra-observer variability results at NCSU and RIT are given in Table 2. Variability
based on Mtd1, when four references were presented simultaneously, was found to be larger than that for Mtd2. Little
variability in visual assessments for each observer was noted
when using only one reference at a time. The inter-observer
variability results of the replicated experiment at RIT are also
shown in Fig. 6(b). The mean inter-observer variability was
22.0, which is 7 units larger than that at NCSU. This may
be due to the fact that observers gave four ratings per sample
at a given time. Also, fewer observers were employed in the
RIT experiment resulting in a less accurate estimate of the
mean. Intra-observer variability at RIT followed the same
trend as that at NCSU. The mean STRESS value representing
the intra-observer variability for four references was nearly
the same (18). Intra-observer variability for the first trial
was found to be larger than that for the second and the third
trials. This has been reported for several psychophysical experiments previously [11,12], and has been linked to potential
observer “training.”
B. Statistical Analysis of Visual Results
To analyze the precision of the mean responses and variability
of the visual results, standard error (SE) and standard
deviation (SD) were computed and compared. The average,
maximum, minimum, and SE and SD, are listed in Table 3.
The average SD ranges from about 0.26 to 0.45 and the average SE is from 0.03 to 0.05, which is smaller than or equal to
that of typical psychophysical experiments pertaining to color
difference evaluation [15,16]. The results in Fig. 7 indicate that
the variability increases with an increase in saturation or
chroma for samples of the same hue. The variability is also
hue dependent. The 10YR and 10B samples exhibit the largest
mean SD (∼0.45) for both methods, while 10P, 10PB, and 10Y
exhibit the smallest mean SD value (∼0.26) depending on the
method used.
Visual saturation results of the four references were compared for the various assessment methods examined. The four
Fig. 6. (a) Inter-observer variability results at NCSU. (b) Interobserver variability results at RIT.
samples are Ref1:10R4/8, Ref2:10YR7/10, Ref3:10Y7/10, and
Ref4:10B3/6. The results shown in Table 4 show that no significant differences in perceived saturation based on methods
were obtained.
The STRESS between visual results obtained from two
methods was also calculated and was 4.16 for Mtd1 against
Mtd2Ref1, 5.30 for Mtd1 against Mtd2Ref3, and 5.30 for
Mtd2Ref1 against Mtd2Ref3, indicating again that no significant difference in responses based on methods is evident.
Comparable analyses for the results in the RIT experiment
are shown in Table 5. In this case the average SD values for the
four references used are: 0.71, 1.07, 1.07, and 0.78, and the
average SE values are: 0.09, 0.14, 0.14, and 0.10, respectively.
These values are almost twice those obtained in the NCSU
experiment. This may be due to a slight difference in the
experimental procedure used in two locations and/or related
to the two different sets of observers or number of observers.
The SD and SE of results obtained from Ref2 (10YR7/10) and
Ref3 (10Y7/10) are found to be larger than those based on Ref1
(10R4/8) and Ref4 (10B3/6). Ideally, the values in the diagonal
direction in Table 4 should be 1. Of the results shown, however, only sample 2 (10R4/8) has a value close to 1. This
1778
J. Opt. Soc. Am. A / Vol. 31, No. 8 / August 2014
Cao et al.
Table 2. Mean Intra-observer Variability Results of Three Trials at NCSU and RIT, Based on STRESS
NCSU
Mean
Max
Min
RIT
Mtd1
Mtd2(Ref1)
Mtd2(Ref3)
Ref1
Ref2
Ref3
Ref4
12.05
23.91
3.26
10.53
20.84
2.94
10.47
20.00
2.74
17.93
40.72
6.89
18.73
46.06
8.15
18.63
39.35
6.84
18.60
52.35
7.36
Table 3. Mean, SE, and SD of Saturation Determined Visually in Experiments Conducted at NCSU and RIT
NCSU
SD
SE
RIT
SD
SE
Mean
Max
Min
Mean
Max
Min
Mean
Max
Min
Mean
Max
Min
10R
10YR
10Y
10GY
10G
10BG
10B
10PB
10P
10RP
MEAN
0.32
0.63
0.14
0.03
0.07
0.02
0.75
1.25
0.34
0.10
0.16
0.04
0.45
0.74
0.20
0.05
0.08
0.02
0.86
1.44
0.36
0.11
0.19
0.05
0.27
0.48
0.14
0.03
0.05
0.01
0.70
0.98
0.40
0.09
0.13
0.05
0.32
0.50
0.19
0.03
0.05
0.02
0.83
1.12
0.60
0.11
0.14
0.08
0.31
0.47
0.18
0.03
0.05
0.02
1.01
1.24
0.70
0.13
0.16
0.09
0.32
0.41
0.21
0.03
0.04
0.02
0.99
1.23
0.77
0.13
0.16
0.10
0.43
0.77
0.18
0.05
0.08
0.02
0.94
1.38
0.47
0.12
0.18
0.06
0.26
0.31
0.19
0.03
0.03
0.02
1.09
1.29
0.86
0.14
0.17
0.11
0.27
0.41
0.18
0.03
0.04
0.02
1.04
1.31
0.80
0.13
0.17
0.10
0.30
0.59
0.17
0.03
0.06
0.02
0.87
1.34
0.57
0.11
0.17
0.07
0.32
0.53
0.18
0.03
0.06
0.02
0.91
1.26
0.58
0.12
0.16
0.08
indicates a possible visual interactive effect among the references employed in the perceptual assessment of saturation.
A similar trend to that seen in the NCSU experiment was
also observed for the RIT experiment, i.e., the variability increases with an increase in saturation or chroma of samples
with the same hue. However, the hue dependent variability is
not as obvious as that noted in the NCSU experiment when
using reference 1 or 4. Nonetheless, the variability for samples
10G, 10BG, 10B, 10PB, and 10P was found to be larger than
that for other hues. The mean visual saturation values from
RIT are shown in Table 5.
A comparison of the calculated STRESS based on different
methods indicates variability in the mean visual saturation
based on different references. The results based on reference
4 are largely different from those based on other references.
The STRESS for reference 1 against 4 is 6.55, that between
references 2 and 4 is 8.35 and that for references 3 and 4 is
7.74. These results are an indicator of the complexity of
assessing saturation visually.
C. Comparison of Results from Two Experiments
The NCSU experimental results using method 1 and method 2
are denoted as NMtd1 and NMtd2. The grand mean visual saturation data for four references from the RIT experiment are
denoted RMtd1 and those for references 1 and 3, which is the
same as that used in the NCSU experiment, are denoted
RMtd2. Scatter plots of NMtd1 against RMtd1, and NMtd2
against RMtd2 are shown in Figs. 7 and 8.
Table 4. Perceived Saturation Results Based on
Various Methods
Mtd1
Mtd2Ref1
Mtd2Ref3
10R4/8
10YR7/10
10Y7/10
10B3/6
0.96
1.04
1.02
1.03
1.04
1.08
1.01
1.09
1.04
1.00
1.06
1.09
Table 5. Mean Visual Saturation of Reference
Samples Based on Different References
Determined at RIT
Visual Saturation Based on Each Reference
Fig. 7. Agreement between NMtd1 and RMtd1 results.
Sample ID
10R4/8
10YR7/10
10Y7/10
10B3/6
10R4/8
10YR7/10
10Y7/10
10B3/6
1.08
1.15
1.35
1.29
1.15
1.14
1.42
1.46
1.09
1.16
1.28
1.26
1.11
1.19
1.47
1.21
Cao et al.
Vol. 31, No. 8 / August 2014 / J. Opt. Soc. Am. A
Fig. 8. Agreement between NMtd2 and RMtd2 results.
The two sets of results show general agreement with a
STRESS of 10.92 for Mtd1 in two locations and 11.79 for
NMtd2 against RMtd2. The results are strongly related with
a correlation coefficient of 0.95. However, there are also
certain differences between results:
(1) The visual saturation range of the two datasets is different. The ranges for NMtd1 and NMtd2 are 0.61 to 1.59 and
0.65 to 1.57, respectively, and those for RMtd1 and RMtd2 are
0.70 to 2.27, and 0.66 to 2.16, respectively.
(2) Visual steps between two successive chroma values,
especially for samples with high chroma, are larger in the
RIT experiment. This is in part due to responses from an
observer with large visual ratings.
D. Performance of Various Saturation Models
The performance of four saturation models, i.e., S ab , S uv ,
S Lübbe , and S CAM02 , against visual data were tested using the
STRESS index. Models’ performances against visual saturation were compared for Mtd1, Mtd2Ref1, Mtd2Ref3, as well
as the NCSU experimental grand mean (denoted S N ), as
shown in Table 6 and Fig. 9. Due to differences in the scales,
results were normalized to a range of 0–10 for comparison.
The normalization does not affect the STRESS values.
Results in Table 6 indicate that S CAM02 outperformed all
other models for the NCSU data with a mean STRESS value
Table 6. STRESS between Perceived Saturation
Against Saturation Based on Various Models (Bold
Letter Indicates Models with the Best Agreement)
STRESS
Location/Method
NCSU
RIT
COMBINED DATA
Mtd1
Mtd2Ref1
Mtd2Ref3
SN
Ref1
Ref2
Ref3
Ref4
SR
S ab
S uv
S Lübbe
S CAM02
14.38
13.84
15.61
14.41
14.31
16.46
15.78
15.43
14.99
13.67
21.60
21.15
22.22
21.56
23.93
25.44
25.58
24.97
24.68
22.73
11.87
10.67
11.14
11.15
18.10
19.63
18.10
18.45
18.20
14.51
11.19
9.65
10.00
10.22
18.39
19.87
18.43
18.33
18.41
14.36
1779
Fig. 9. Comparison of STRESS for different saturation models
against NCSU, RIT, and combined experimental results.
of 10.22. The Lübbe formula resulted in a slightly higher
STRESS value than S CAM02 . Both significantly outperformed
the S ab , and S uv models.
For the RIT experiment, S ab gives the best performance,
with a STRESS of 15.0. S Lübbe and S CAM02 models gave comparable performances, with STRESS values of approximately
18.3. The worst results for both mean sets of experimental
data were obtained from S uv .
For the combined NCSU and RIT data the mathematical
mean was calculated. STRESS between combined data, denoted S COM , and that computed by four models are also shown
in Table 6. For the NCSU data SCAM02 statistically outperformed the other three models, and S Lübbe was found to be
significantly better than S ab and S uv . For the RIT data, S ab performed statistically better than S uv . For the combined dataset,
modeling by S ab , S Lübbe , and SCAM02 gave comparable results,
with no significant difference between models; however, S uv
performed significantly worse than all other models.
4. CONCLUSIONS
Two psychophysical experiments were conducted at two laboratories to collect new saturation judgments using similar
methodology and the same Munsell atlas samples. STRESS
values indicate that the average inter- and intra-observer
variability of the experimental results at NCSU is smaller than
or comparable to that of typical color difference evaluation experiments. Results for the RIT experiment are slightly larger.
For samples with higher chroma values, responses were found
to be less consistent, reflected by higher associated SE. The
variability was also found to be hue dependent.
Visual results within each experiment for different methods
or different reference samples show good agreement, with an
average STRESS of 4.92 for the NCSU experiment and 5.52 for
the RIT experiment. The STRESS when comparing results
from two locations is 11.36 indicating reasonable agreement
between the two sets of data.
Among the four saturation models, i.e., S ab , S uv , S Lübbe , and
S CAM02 tested against the visual responses obtained. SCAM02
performed best for the NCSU results, slightly better than
the Lübbe model. Both significantly outperformed S ab and
S uv . For the RIT data, S ab was found to be the best, followed
by the S Lübbe and S CAM02 models, with S uv being the worst.
1780
J. Opt. Soc. Am. A / Vol. 31, No. 8 / August 2014
Cao et al.
According to STRESS, for the combined dataset, S ab , S Lübbe ,
and S CAM02 resulted in similar performance, better than S uv .
Given the differences in means and ranges of STRESS
values of essentially identical experimental conditions
(Ref1 and Ref3) it appears that a large number aof observers
(about 50) is required to establish statistically solid mean
perceptual saturation data.
APPENDIX A
a calibrated monitor. First, a set of samples with given saturation values (0, 1, 3, 5, 7), but different hue and lightness, are
displayed. Second, samples with the same hue (Red, Yellow,
Green, and Blue), but different saturation and lightness values
are shown. Finally, samples with the same lightness (1, 3, 5),
but different saturation and hue are shown. The arrangement
of samples will also be explained. This process can be repeated during the experiment if the observer is not certain
about their understanding of saturation.
Table 7. Colorimetric Values and the Visual Saturation of Samples and References (bold italic)a
Sample ID
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Munsell
X10
Y10
Z10
SN
SD
10R 4/6
10R 4/8
10R 4/10
10R 4/12
10YR 7/8
10YR 7/10
10YR 7/12
10YR 7/14
10Y 7/6
10Y 7/8
10Y 7/10
10Y 7/12
10GY 5/6
10GY 5/8
10GY 5/10
10G 4/6
10G 4/8
10G 4/10
10BG 3/4
10BG 3/6
10BG 3/8
10B 3/4
10B 3/6
10B 3/8
10B 3/10
10PB 3/4
10PB 3/6
10PB 3/8
10PB 3/10
10P 3/4
10P 3/6
10P 3/8
10P 3/10
10RP 3/4
10RP 3/6
10RP 3/8
10RP 3/10
14.45
15.85
17.68
18.11
43.39
45.99
45.63
46.29
37.71
38.02
37.48
36.92
14.75
12.86
11.85
7.83
7.09
6.09
4.91
4.32
4.00
5.94
5.83
5.47
5.53
7.06
7.63
8.68
9.00
7.66
8.68
9.38
10.16
8.40
8.78
9.49
11.03
11.31
11.28
11.78
11.32
40.15
41.75
40.43
40.18
40.95
41.16
40.55
39.99
20.06
19.45
19.63
12.36
12.88
13.12
6.88
7.02
7.71
7.17
7.43
7.68
8.40
6.77
6.94
7.49
7.39
6.52
6.74
6.67
6.56
6.73
6.35
6.75
6.29
6.24
4.12
3.15
1.85
13.38
9.27
5.35
2.51
16.31
10.97
6.17
3.07
10.94
7.48
5.39
11.63
11.45
11.25
9.99
11.26
13.40
13.28
16.87
19.89
24.65
12.97
16.07
20.49
23.92
10.00
12.13
13.62
15.01
6.78
6.16
6.44
6.00
0.82
0.99
1.28
1.48
0.78
1.04
1.44
1.58
0.63
0.78
1.04
1.32
0.82
0.98
1.29
0.98
1.1
1.26
1.01
1.2
1.25
0.91
1.04
1.31
1.55
0.9
0.99
1.12
1.16
0.94
1.06
1.15
1.26
0.92
1.05
1.02
1.45
0.15
0.14
0.36
0.63
0.21
0.23
0.61
0.74
0.22
0.20
0.16
0.48
0.19
0.26
0.50
0.18
0.28
0.47
0.21
0.34
0.41
0.21
0.20
0.55
0.77
0.23
0.20
0.29
0.31
0.22
0.18
0.29
0.41
0.21
0.20
0.20
0.59
SE
0.02
0.02
0.04
0.07
0.02
0.03
0.06
0.08
0.02
0.02
0.02
0.05
0.02
0.03
0.05
0.02
0.03
0.05
0.02
0.04
0.04
0.02
0.02
0.06
0.08
0.02
0.02
0.03
0.03
0.02
0.02
0.03
0.04
0.02
0.02
0.02
0.06
SR
SD
SE
0.71
1.11
1.48
1.99
0.71
1.16
1.76
2.27
0.70
0.86
1.38
1.74
1.04
1.19
1.62
1.13
1.43
1.86
1.08
1.42
1.82
0.92
1.3
1.73
2.17
1.19
1.29
1.38
1.6
1.15
1.47
1.63
1.94
0.96
1.20
1.11
2.03
0.34
0.56
0.87
1.27
0.37
0.55
1.11
1.45
0.84
0.41
0.75
0.92
0.67
0.72
1.15
0.75
1.15
1.24
0.84
1.03
1.26
0.49
0.93
1.06
1.39
0.92
1.22
1.08
1.30
0.89
1.09
1.09
1.35
0.63
0.85
0.73
1.39
0.044
0.072
0.113
0.164
0.047
0.071
0.143
0.188
0.108
0.052
0.096
0.119
0.086
0.093
0.149
0.097
0.149
0.160
0.108
0.133
0.162
0.063
0.121
0.137
0.180
0.119
0.158
0.139
0.167
0.115
0.140
0.141
0.174
0.081
0.110
0.094
0.179
a
Illuminant D65, CIE 1964 standard observer, S N and S R are the grand mean visual saturation from the NCSU and RIT experiments, SD, and SE represent the
mean SD and SE.
APPENDIX B
1. Procedure for Visual Assessment of Saturation
This experiment aims to elucidate our understanding for the
perception of the term “saturation.” There are three sections
in this experiment.
A. Section I
In this section the aim is to provide an understanding of the
meaning of the term saturation by showing a set of samples on
B. Section II
In this section, the observer is asked to determine a numerical
value for the saturation of 37 samples. For each sample, four
different assessments involving four different reference samples are conducted. Observers will wear a gray lab coat and
gray gloves and sit in front of the empty viewing booth for at
least two minutes to adapt to the source. During the experiment, a test sample and a reference sample will be placed on a
custom stand at a 45° viewing angle, with a gap between them.
Cao et al.
An arbitrary value of 1 is given to the reference sample on the
left, and the observer gives a numerical rating of the saturation of the test sample based on the reference. The value can
be multiples or fractions of 1, e.g., 0.5, 1.2 or 2 or more.
C. Section III
In this section, the observer will assess the saturation of 37
samples in the presence of four reference samples shown simultaneously. The assigned numerical saturation value of all
reference samples is 1, and the observer will give a rating for
the test sample based on the reference samples. The rating
can be multiples or fractions of 1, e.g., 0.5, or 2 or more.
Notes:
• Observers are notified that there are no right or wrong
answers.
• If they find it difficult to provide a rating for the saturation of samples during the experiment, they may ask for additional training.
• Observers are asked to refrain from handling the
samples and ask the experimenter if they would like to move
them.
2. Saturation
Thank you for participating in our experiment to measure our
perceptions of saturation.
Saturation is one attribute of our perception of color.
Other attributes include lightness (black is of low lightness,
white is of high lightness) and hue (often described by color
names such as red, yellow, green, blue). For this experiment,
we are interested in the perception of saturation independent of perceived lightness or hue.
The formal, technical, definition of saturation is:
the colorfulness of an area judged in proportion to its
brightness,
where colorfulness is: the attribute of a visual perception
according to which the perceived color of an area appears to
be more or less chromatic.
And brightness is: the attribute of a visual perception according to which an area appears to emit, or reflect, more or
less light.
More practically, saturation can be thought of as how much
a color stimulus differs from a neutral (white, gray, or black)
stimulus in terms of the intensity of perceived hue present. A
neutral, or gray, color has no hue present and therefore a saturation value of zero. A vivid red color is clearly different from
gray in that it has a hue with an intensity and therefore has a
saturation significantly greater than zero [the exact amount
will be defined by the reference color(s) in the experiment].
You are being shown examples of sets of colors of constant
saturation at various lightness levels. The different sets are for
various hues and saturation levels. Each set is of constant saturation and labeled for the saturation to provide an idea of
Vol. 31, No. 8 / August 2014 / J. Opt. Soc. Am. A
1781
what changes in saturation look like. It is most important
to note that each set of color samples illustrates constant
saturation across a range of lightness rather than any changes
in saturation.
There are other ways to describe the intensity of hue in a
color stimulus. These are known as colorfulness and chroma.
In this experiment, we are not interested in those attributes.
We are only interested in your perception of saturation.
If you need any clarification on the definition of saturation,
please ask the experimenter to review the examples with you.
Specific instructions for defining the reference color(s) and
completing the experiment follow.
Thank you.
ACKNOWLEDGMENTS
The authors thank Mr. Art Schmehling, Munsell Color Services
Business Manager (X-Rite) for donation of Munsell sheets.
Thanks are also due to all observers who took part in the
study.
REFERENCES
1. R. W. G. Hunt, “The specification of colour appearance. I.
Concepts and terms,” Color Res. Appl. 2, 55–68 (1977).
2. I. Newton, Opticks (Smith and Walford, 1704), p. 117.
3. H. v. Helmholtz, Handbuch der Physiologischen Optik (Leopold
Voss, 1867), p. 283.
4. M. Richter, I. Schmidt, and A. Dresler, Grundriss der
Farbenlehre der Gegenwart (Steinkopff, 1940).
5. D. L. MacAdam, “Projective transformations of I. C. I. color specifications,” J. Opt. Soc. Am. 27, 294–297 (1937).
6. D. B. Judd, “A Maxwell triangle yielding uniform chromaticity
scales,” J. Opt. Soc. Am. 25, 24–35 (1935).
7. CIE, “A colour appearance model for colour management systems: CIECAM02,” CIE Publication 159 (CIE Central Bureau,
2004).
8. E. Lübbe, “Sättigung im CIELAB-Farbsystem und LShFarbsystem,” Ph.D. dissertation (Technische Universität
Ilmenau, 2011).
9. L. Y. G. Juan and M. R. Luo, “Magnitude estimation for scaling
saturation,” Proc. SPIE 4421, 575–578 (2002).
10. http://www.vcsconsulting.co.uk/home.html, retrieved 4/8/2014.
11. P. A. Garcia, R. Huertas, M. Melgosa, and G. Cui, “Measurement
of the relationship between perceived and computed color
differences,” J. Opt. Soc. Am. A 24, 1823–1829 (2007).
12. M. Melgosa, P. A. García, L. Gómez-Robledo, R. Shamey, D.
Hinks, G. Cui, and M. R. Luo, “Notes on the application of
the standardized residual sum of squares index for the assessment of intra- and inter-observer variability in color-difference
experiments,” J. Opt. Soc. Am. A 28, 949–953 (2011).
13. http://danielsoper.com/statcalc3/calc.aspx?id=, retrieved 4/8/
2014.
14. E. Kirchner and N. Dekker, “Performance measures of colordifference equations: correlation coefficient versus standardized
residual sum of squares,” J. Opt. Soc. Am. A 28, 1841–1848 (2011).
15. M. R. Luo and B. Rigg, “Chromaticity-discrimination ellipses for
surface colours,” Color Res. Appl. 11, 25–42 (1986).
16. R. Berns, D. H. Alman, L. Reniff, G. D. Snyder, and M. R.
Balonon-Rosen, “Visual determination of suprathreshold
color-difference tolerances using probit analysis,” Color Res.
Appl. 16, 297–316 (1991).