Perturbation Study of Shading in Pictures: Jan J Koenderink, Andrea J van Doorn, Chris Christou

Perception, 1996, volume 25, pages 1009-1026

Perturbation study of shading in pictures

Jan J Koenderink, Andrea J van Doorn, Chris Christou


Helmholtz Instituut, Universiteit Utrecht, Princetonplein 5, NL 3584 CC Utrecht, The Netherlands;
e-mail: [email protected]
Joseph S Lappin
Vanderbilt University, Nashville, TN 37240, USA
Received 17 January 1996, in revised form 26 July 1996

Abstract. Pictorial relief was measured for a series of pictures of a smooth solid object. The
scene was geometrically identical for all pictures, but the rendering was different. Whereas all
pictures were monochrome full-scale photographs, they were taken under different illuminations
of the scene, the source being frontal and displaced towards either the upper left, the upper right,
the lower right, or the lower left. It was found that different illuminations led to significantly
different, systematic alterations of pictorial relief. It is concluded that though shape constancy
under changes in illumination might be said to rule in the first rough approximation, the
deviations from true constancy are indeed both significant and systematic. Different from either
stimulus-reduction or cue-conflict paradigms, this 'perturbation analysis' shows that shading is used
as an important source of information even if the particular illumination appears to be ignored
at first blush. For all subjects, brighter parts in the stimulus were consistently interpreted as
being nearer in pictorial space, both for the global layout and for the subsidiary relief.
1 Introduction
When you look at a photograph of a familiar scene you are aware of it at—at least—
two different levels of experience simultaneously (Gombrich 1982). Perhaps most
obviously, you are aware of the photograph as an object. This is the level you need to
interact with the photograph in many important ways, for example, if you want to pick it
up or hand it to someone. This tangible object is covered with pigments distributed in a
certain order. On another level the photograph is a window into 'pictorial space'.
Pictorial space is in a sense a true three-dimensional space, much like the space
we move in, but it is not a tangible entity. Two dimensions are closely tied to the
dimensions in the plane of the picture in the sense that 'A is on the left of B' can be
said both of pigment blobs in the picture plane and of objects in the pictorial scene.
Moreover, we are aware of an immediate relation between the pigment blobs and the
objects in the scene. The 'depth dimension' of pictorial space, however, has no such
immediate relation to the distribution of pigments. Extension in depth is not necessarily
defined at all points in a picture, and may be only ordinal where it is defined. 'Depth'
is a poorly understood function of both the picture and the observer—of both the
distribution of pigments in the picture plane and the observer's experience and skills in
reading pictures (Clifton 1973; Hale 1980). Indeed, pictorial space is an aspect of the
observer's perceptual awareness in viewing a picture. The picture and the observer are
both relevant factors in investigating the structure of pictorial space.
Moreover, pictorial space may be expected to be similar in many aspects to the
visual space of natural scenes. Thus it seems more apt to say that observers possess
competences based on a tacit knowledge of the laws of ecological optics relating the
generic structure of the optical environment to the structure of its images. Because we
have some scientific understanding of the optical relations between environmental
structure and image structure, we may investigate whether and to what extent observers
exploit such knowledge in perceiving spatial structure of pictures. The idea that observers'
perceptions of pictorial space reflect lawful relations between environmental structure
and image structure need not be incompatible with the notion that visual sensitivity to
'depth cues' is acquired by a probabilistic, trial-and-error process similar to biological
development and evolution.
The 'method of reduction' has generally proved very powerful in the sciences.
It has also been applied to the study of perception with many notable successes. It is
only natural that the study of (pictorial) depth cues has been approached from this
perspective. One hopes to isolate given cues and to study them in a pure state, without
contamination from other cues, or one tries to set up cue conflicts between a pair of
cues in the absence of others and studies the interactions. Regrettably but
understandably, such efforts easily lead to misleading or artificial results in this case. For
instance, we have ourselves shown that shading, in the absence of other optical structure
such as contour, often evokes nonveridical perceptual judgments (Erens et al 1991,
1993a, 1993b). Such impoverished stimuli do not look like pictures of anything and
observers are hard put to see any pictorial space to begin with. Forced-choice
methods, of course, allow one to collect data but subjects tend to hate the task because
it seems unreasonable to them. However, when other potential sources of information
such as disparity, texture, or—especially—contour are introduced it appears that these
largely govern the result (Bulthoff and Mallot 1988; Todd and Ackerstrom 1987) and
the shading appears almost immaterial. In other cases one can show that the percept
of shading is affected by contour and other cues (Buckley et al 1994; Knill and Kersten
1991). Are we to conclude then that the shading is essentially irrelevant? In a previous
paper (Koenderink et al 1996) we have argued that the method of reduction is ill suited
to address this issue and indeed strongly biased towards a negative result. Others
have argued this same point (Blake et al 1993). The task set to the subject in the cue-
reduction paradigm is like asking for the sound of one hand clapping; forcing any
answer will of course produce results, but these are unlikely to bear upon the issue.
Articulate pictorial space is an entity that only appears if the picture is rich in
optical structure, eg a photograph of a natural scene. For instance, in the Erens et al
(1991) study the overall slant of the background surface was ambiguous (like that of
the blue sky) and led to appreciable intersubject differences. In renderings of natural
scenes intersubject differences tend to be small (Koenderink et al 1992). 'Pictorial
space' indeed occurs in many degrees of articulation, for example, it may be like
looking into a fog or at the blue sky. This is quite different from looking at a clear
picture of a solid object such as a piece of sculpture. In the latter case, subjects
experience no problems in assigning surface attitudes to many locations in the visual field
(the picture) in a reproducible manner. Such mutual agreement and reproducibility are not
found in the former ('less-articulated') cases. Stimulus reduction has to lead to less
articulation (because one removes information) and when carried so far as to lead to
poorly articulated pictorial space may be said to represent an artificial situation if the
aim is to study pictorial space in realistic settings. In the aforementioned Erens et al
(1991) study we clearly maneuvered ourselves into this situation.
If one sets out to study a 'single cue' (shading, say) one method that might
conceivably yield reasonable results is to vary this component of the total optical
structure parametrically in the context of a rich bouquet of diverse optical structures.
The variations in the response, insofar as they correlate with the parametric
variations, must be due to the fiducial aspect that is being varied. There is no need to
isolate the 'cue' (and end up with artefactual results) because the correlation picks it
out automatically. In this paper we demonstrate how this method reveals the true
potency of shading in articulating pictorial relief.
The method is to vary the shading parametrically in a scene that appears by and
large unambiguous to the casual observer, for instance, a single well-defined object
against a simple background. We used an object of uniform surface quality placed on
a featureless tabletop against a uniform back panel. The object was a generic, moderately
articulated, smoothly curved shape with a fairly intricate silhouette. In this setting the
pictorial space is dominated by the articulations of the object. The parametric variation
of the shading was implemented as a set of four photographs of the identical scene
under displacements of the (single) light source (figure 1).
The pictures were parametrized by the location of the source relative to the object.
The source was always frontal to the object, that is, between the object and
the camera, and displaced sideways along one of the diagonal directions. The angle
subtended by the directions of illumination and the direction of viewing was about 45°.

Figure 1. The four shaded images arranged in the order of the position of the source: upper left,
stimulus UL; upper right, stimulus UR; lower right, stimulus LR; and lower left, stimulus LL.
The geometry of the pictures is identical to that used in a previous experiment in which a silhouette
and a cartoon rendering were investigated.

We refer to the source locations as upper left (UL), upper right (UR), lower right
(LR), and lower left (LL). Notice that a displacement of the source led to a multitude
of changes in the scene. This is the case because the photometric situation was indeed
complicated: the major effect was the shading of the object due to variations of
the local surface attitude (Lambert's law) and the screening of the extended source by
the object itself (so-called 'vignetting'). The coating of the object was white, semigloss;
thus specularities were apparent and depended on source position. The object, table,
and walls of the room scattered light of the source, mainly evident in an illumination
of the shadow side of the object. Even though the real photometry is very complicated, it
remains the case that the source position is a simple and effective parametrization. It
is easy enough to replicate the illumination setup, for instance.
We have also prepared silhouette and cartoon renderings of the same object in the
same setup. We have reported results on these stimuli in an earlier paper (Koenderink
et al 1996). These data are an important baseline for the present results and we draw
on them heavily in this paper. Here is a short summary of the relevant findings. The
silhouette is only a poor source of information: naive observers produce results that are
sometimes only weakly articulated within the contour as compared with the pictorial
relief obtained with a shaded picture, sometimes surprisingly articulated but
idiosyncratic. In the latter case the articulation is largely due to the observer's share, though
constrained by the actual contour. In contradistinction, the cartoon rendering by itself is
sufficient to evoke a fully developed pictorial relief that is in most respects remarkably
similar to what is obtained with a regular photograph. If the subject had prior exposure
to the object depicted or to pictures of it, then familiarity enables the subject to get
much from even the silhouette alone. Thus the addition of shading is certainly not
required to obtain a good impression of solid shape. The question is rather whether
the addition of the shading changes the pictorial relief to a measurable extent. This
proves to be the case, and the changes are such that the subjects agree better with each
other in the case when shading is present than when it is not. This may be interpreted
as a sign that pictorial relief is to a lesser extent due to the observer's share and
more strongly caused by the optical structure if shading is present. Monitoring the
nature of the change then lets us circumvent (at least partially) the problem of
contamination with other potential sources of information or familiarity, either of a generic
kind or specific (prior exposure to the object depicted).
Of course, we expect the variations to depend on the blend of other optical
structures present in the image of the scene. We perform a differential measurement, of
which the base point may be varied. The base point summarizes the experimental
conditions other than the fiducial parameter. Thus one might repeat the experiment at
different base points, such as different backgrounds, in order to study the effect of
contour contrast on the shading, etc. The present experiment is thus only one specific
instance. A 'zero-level' base point (the perfect elimination of all pictorial cues except
for the fiducial one) is of course a fictional ideal. There exists an essential difference
between settings in which the subject has the actual impression of looking at the
depiction of something real or only the distribution of gray levels or colors in a certain
simultaneous order. (Consider the difference between naturalistic and abstract art.) In
the latter case (our examples in Erens et al 1991, 1993a, 1993b) we argue that stimulus
reduction was driven too far. In the former case we may speak of a certain degree of
'naturalism' and only then the concept of 'cue' assumes a meaning. In such cases the
fiducial cue will typically be very far from being 'isolated', ie very far removed from
the zero base point.

2 Methods
2.1 Stimuli
As objects we used mannequins, commercially available for the display of fashion
items. These objects can be obtained in a variety of postures of both genders. Many
have rather complicated shapes yielding interesting silhouettes from many vantage
points and striking shading patterns for typical 'Hollywood-type' illuminations
(Nurnberg 1948). The mannequin used in this study had a twisted pose [the "figura
serpentinata-type" (Lomazzo 1958)] and a semigloss white finish. Thus there was no
information available due to texture, but there was important optical structure in the
form of contour and shading.
The mannequin was photographed in a frontal, anterior pose. The stimuli were in
continuous gray tone shown on an 8-bit monitor. Quality of the full scale pictures
was comparable to good postcard-size photographs (see figure 1). Notice how different
these pictures are: light-dark relations may actually reverse for many pairs of locations
if one goes from one picture to another.
Our observers did notice the different illuminations, but accepted the stimuli as
pictures of one and the same three-dimensional shape, even for the illuminations from
below. For the shaded pictures we used rather simple illumination schemes (Hunter
and Fuqua 1990): a single broad source (a halogen bulb in a metalized umbrella
reflector) provided the main illumination, a large matt white reflection screen filled in
the shadows. The background was carefully controlled and promoted a 'wraparound
illumination' effect (Hattersley 1979): at the light side of the object it was slightly
darker, at the dark side slightly brighter than the object.
2.2 Subjects
The experiment was performed by three subjects (also authors). CC is emmetropic,
AD and JL are slightly myopic and wore their usual correction.
The same subjects also participated in the earlier experiment aimed at the
investigation of the importance of contour (Koenderink et al 1996).
2.3 Viewing conditions
Viewing was monocular; a chin rest was used to fix the vantage point. Viewing
distance was 57 cm and the vertical dimension of the target was 17 cm.
Stimuli were presented on a computer screen. The room was dimly lit, thus the
subjects clearly perceived the monitor. They also perceived the frame of the picture on
the computer desktop screen. At all times they were fully aware of looking at pictures.
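As a quick check on the viewing geometry, the visual angle subtended by the target can be worked out from the stated distance and size. The following is only an illustrative sketch: the function name is ours, and a flat, frontoparallel target centered on the line of sight is assumed.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) of a flat, centered, frontoparallel target."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

# Values stated in the text: 17 cm vertical extent viewed from 57 cm.
angle = visual_angle_deg(17.0, 57.0)
print(f"{angle:.1f} deg")
```

At a distance of 57 cm one centimeter subtends very nearly one degree, so the picture spanned roughly 17 degrees vertically.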
2.4 Paradigm
We used a paradigm that is fast and intuitive for the subjects and that allows us to
draw detailed inferences concerning local attitude (slant and tilt) of the pictorial relief.
This method has been described in detail elsewhere (Koenderink and van Doorn
1995; Koenderink et al 1992). Briefly, we superimposed a 'gauge figure' on the picture
and let the subject change the geometry of this probe so as to perceptually 'fit' the
relief. The gauge figure was an ellipse and the subject tried to fit the ellipse (adjusting
both its orientation and its eccentricity) so that it appeared as a circle 'painted' on the
pictorial surface. The ellipse is interpreted as a circle in projection, thus we can
convert the parameters of the ellipse to depth gradient or (equivalently) slant and tilt.
About 300 points in the picture within the area of the silhouette were probed (one at
a time) in random order. The settings were repeated several times (on different days)
to allow us to judge reproducibility.
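The conversion from a gauge-figure setting to slant, tilt, and depth gradient can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes orthographic projection (a circle slanted by an angle s projects to an ellipse whose axis ratio is cos s), it takes the direction of the minor axis as the tilt direction, and it assumes that the 180 degree ambiguity of a bare ellipse has been resolved elsewhere. All names are hypothetical.

```python
import math

def ellipse_to_attitude(major, minor, minor_axis_dir_deg):
    """Interpret an ellipse as a circle seen in orthographic projection.

    major, minor       : axis lengths of the fitted ellipse (same units)
    minor_axis_dir_deg : picture-plane direction of the minor axis, used
                         here as the tilt (assumption: sign already resolved)
    Returns (slant_deg, tilt_deg, (gx, gy)); (gx, gy) is the depth gradient.
    """
    if not 0.0 < minor <= major:
        raise ValueError("need 0 < minor <= major")
    slant = math.acos(minor / major)      # foreshortening equals cos(slant)
    tilt = math.radians(minor_axis_dir_deg)
    magnitude = math.tan(slant)           # gradient magnitude is tan(slant)
    gradient = (magnitude * math.cos(tilt), magnitude * math.sin(tilt))
    return math.degrees(slant), minor_axis_dir_deg, gradient

# An ellipse half as wide as it is long corresponds to a slant of 60 degrees.
slant, tilt, grad = ellipse_to_attitude(major=2.0, minor=1.0, minor_axis_dir_deg=90.0)
print(round(slant, 1), tilt, grad)
```

Working with the gradient (gx, gy) rather than with slant and tilt keeps the data in the depth-gradient space in which the paper evaluates them.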
We used this method to operationalize 'depth'. Of course, there are numerous
conceptual problems involved in any such method, and we are indeed committed to a
long-term endeavor to clarify some of the issues in detail. However, for our present
purpose the idiosyncrasies of the method appear of little concern. We are only interested
in changes of the results that correlate with changes in the illumination. The specific
operationalization is simply one other parameter of the base point of the perturbation
analysis.
2.5 Data evaluation
Data were evaluated in depth-gradient space. This is indeed the natural setting since
it represents the raw observations immediately and isotropy of pictorial space is not
assumed. (Thus, for example, surface normals could not be used because their definition
implicitly assumes that the frontoparallel and depth dimensions are commensurable.)
We applied the usual statistical methods to judge the scatter in the data, then
checked whether the data conformed to any pictorial surface at all. In order to be able
to do so, the curl of the empirically determined depth gradient should vanish within
the scatter. This was the case for all conditions. Then surfaces were fitted to the data.
These surfaces are a convenient and intuitive summary of the data. They summarize
the deterministic structure of the data completely, ie what remains is explained by the
session-to-session variability. We show these results as depth maps with isodepth
contours. Each such picture succinctly presents about 600 data items (empirical slant
and tilt at 300 points), thus the comparisons were based on 2400 data items per
subject, each a mean of nine settings. Comparisons between conditions were made
on the basis of these surfaces (referred to as the 'pictorial reliefs').
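The consistency requirement (vanishing curl of the depth gradient) can be illustrated on a single triangle of probe points. The sketch below is an assumption-laden illustration rather than the authors' procedure: it estimates the depth step along each edge from the mean of the two endpoint gradients and sums the steps around the closed loop. For gradients that derive from a surface the sum vanishes (up to measurement scatter); the function name and test data are ours.

```python
def loop_violation(vertices, gradients):
    """Sum of estimated depth steps around a closed polygon of probe points.

    vertices  : list of (x, y) picture-plane positions, in loop order
    gradients : list of (gx, gy) depth gradients measured at those positions
    The depth step along an edge is approximated by the mean endpoint
    gradient dotted with the edge vector (trapezoid rule).
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        j = (i + 1) % n
        ex = vertices[j][0] - vertices[i][0]
        ey = vertices[j][1] - vertices[i][1]
        gx = 0.5 * (gradients[i][0] + gradients[j][0])
        gy = 0.5 * (gradients[i][1] + gradients[j][1])
        total += gx * ex + gy * ey
    return total

tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
consistent = [(2 * x + y, x) for x, y in tri]  # gradient of z = x*x + x*y
twisted = [(-y, x) for x, y in tri]            # not the gradient of any surface
print(loop_violation(tri, consistent), loop_violation(tri, twisted))
```

The result is expressed in units of depth, which is compatible with reporting violations in pixels; whether the paper's rms measure was computed in exactly this way is not stated.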

3 Experiment
3.1 Procedures
Subjects AD and CC performed the task of fitting gauge figures nine times on each
of the stimuli (denoted UL, UR, LR, and LL), subject JL only three times. All subjects
finished one stimulus before moving on to the next, though in different orders. In one
session each point at which the pictorial surface was probed was visited three times.
Different sessions were done on different days. The total experiment lasted for several
weeks.
A typical result (for the stimulus with illumination from the upper left, subject JL)
is shown in figure 2.
3.2 Results
3.2.1 Scatter in the settings. For each point at which the pictorial surface attitude was
probed we have nine independent settings, yielding a point cluster in depth-gradient
space. Our standard procedure involved finding a two-dimensional median and
discarding the two points farthest from this median location as possible outliers. (Thus
we get rid of possible mistakes by the subjects who sometimes hit the 'ready' button
accidentally, as the settings were done at a very high pace. This actually occurred quite
rarely, and whether or not a fixed percentage of outliers is left out hardly affects
our conclusions.) The remaining cluster of seven points was
used to find a mean location and a covariance ellipse. This yielded a very robust and
fully objective initial data processing.
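The initial data processing described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the text does not say which two-dimensional median was used, so a component-wise median is assumed here, and all names are hypothetical.

```python
import math
import statistics

def robust_summary(settings, n_discard=2):
    """Summarize repeated (gx, gy) settings at one probe location.

    Discards the n_discard settings farthest from the 2D median (assumed
    component-wise), then returns the mean of the rest, the covariance
    ellipse as (major_sd, minor_sd, orientation_deg), and the kept points.
    """
    xs = [p[0] for p in settings]
    ys = [p[1] for p in settings]
    median = (statistics.median(xs), statistics.median(ys))
    ranked = sorted(settings, key=lambda p: math.dist(p, median))
    kept = ranked[:len(settings) - n_discard]

    mx = statistics.fmean(p[0] for p in kept)
    my = statistics.fmean(p[1] for p in kept)
    n = len(kept)
    sxx = sum((p[0] - mx) ** 2 for p in kept) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in kept) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in kept) / (n - 1)

    # Closed-form eigen-decomposition of the symmetric 2x2 covariance matrix.
    half_trace = 0.5 * (sxx + syy)
    root = math.hypot(0.5 * (sxx - syy), sxy)
    major_sd = math.sqrt(half_trace + root)
    minor_sd = math.sqrt(max(half_trace - root, 0.0))
    orientation = math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))
    return (mx, my), (major_sd, minor_sd, orientation), kept

# Nine made-up settings: seven clustered near (1, 2) plus two slips.
demo = [(1.0, 2.0), (1.1, 2.0), (0.9, 2.0), (1.0, 2.2), (1.0, 1.8),
        (1.2, 2.4), (0.8, 1.6), (10.0, 10.0), (-8.0, 5.0)]
mean, ellipse, kept = robust_summary(demo)
print(mean, ellipse)
```

The ratio of the two standard deviations corresponds to the axis ratio of the covariance ellipse discussed in the next paragraph.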
The mean location was used as the result of the settings. The covariance ellipse is
represented by the principal axes and the orientation. If we compare results for all
vertices, we find that the ellipses were very eccentric, with the major axis aligned with
the gradient direction (direction of steepest slope; axis ratios varied from 1.9 to 2.6).
This means that most of the variance was in the slant whereas the tilt (at least for
nonvanishing slants) was rather well defined (see table 1). The standard deviation in the
magnitude of the depth gradient (that is, the tangent of the slant angle) amounted to
about 6% to 17% plus a constant (slant) rms deviation of about 2°.

Figure 2. Raw data (slant/tilt settings at all vertices of the triangulation) for subject JL and
illumination from the upper left. Notice that in the actual experiment the subjects were not
aware of the triangulation at all and dealt with the gauge figure one location at a time only
(in random order). This figure is merely a convenient compilation of the results.
3.2.2 Consistency. In all cases the data turned out to be perfectly consistent within
the tolerance given by the scatter in repeated sessions. The violations of surface
consistency (failure of the curl of the depth gradient to vanish identically) were indeed
fully explained by the scatter in repeated sessions, which was not significantly different
for the various stimuli (see table 1). This was also true for the silhouette and cartoon

Table 1. Standard deviations in the magnitude of the depth gradient (tangent of the slant angle):
constant error; the trend as a Weber fraction (a percentage of the magnitude); a measure of the
anisotropy (ratio of the standard error within ±45° of the gradient direction to that within
±45° orthogonal to the slant direction); and the rms violation of the surface consistency constraint.
For illumination conditions, see figure 1.

Subject  Condition  Constant error/°  Weber fraction/%  Anisotropy  Violation/pixels

AD       UL         1.8               16.5              1:2.5       0.55
         UR         2.3               15.0              1:2.3       0.71
         LR         1.5               16.1              1:2.2       0.58
         LL         2.3               15.4              1:2.6       0.83
CC       UL         4.4               14.7              1:2.2       0.35
         UR         3.4               14.3              1:2.2       0.36
         LR         2.3               17.0              1:2.2       0.49
         LL         3.2               13.8              1:2.1       0.35
JL       UL         0.7                8.6              1:1.9       0.63
         UR         1.4                6.1              1:2.1       0.64
         LR         1.0                7.9              1:2.1       0.48
         LL         1.0                8.5              1:2.1       0.59

renderings on which we reported earlier (Koenderink et al 1996). Typical violations
amounted to about half a pixel in depth, which is very small in view of the fact that
typical depth variations over a face were many pixels (up to a hundred pixels). A value
of half a pixel corresponds roughly to about a degree in the direction of the surface
normal, much less than the scatter of the normal over repeated settings.
This result is perhaps surprising because the pictorial surface in the case of the
silhouette is completely 'filled in' whereas in the case of the shaded images there are
strong indications of the nature of the physical relief. Given the fact that we find
consistent surfaces for these degenerate images it is perhaps hardly surprising that we
also find them for the shaded images.

4 Variation of pictorial relief with direction of illumination


Even by eye one can detect obvious differences between the depth maps obtained
for the various illuminations, although the main impression is certainly that of
great similarity between the data. Depth maps for subject JL are shown in figure 3.

Figure 3. Equidepth maps for subject JL. For illumination conditions, see figure 1.

These similarities are perhaps surprising in view of the fact that human observers
typically have great difficulty with recognition tasks when the illumination is
unfamiliar, for example, from below in a portrait (Johnston et al 1992). This may have to
do with an inborn inclination to assume illumination from above (Ramachandran
1988). Whereas some famous people are indeed easy to recognize in illumination
from below (for example, famous portraits of Hitchcock are often illuminated from
below in order to graphically illustrate his status as 'master of suspense'), this is the
exception, and most people have a hard time recognizing their spouses illuminated
in this manner. The practical importance of this observation is illustrated by the facts
that photographs expressly made for identification purposes are always made under
standard illumination (passport portraits, police photographs), and that the names
of many Hollywood stars have become associated with pictures made with standard
lighting setups (eg Greta Garbo, frontal high direct light). Thus one may well expect
dramatic differences. This proves not to be the case though: the depth maps appear

Figure 4. Depth residuals maps for subject JL. The shades of gray (including black) denote
the 0%-25%, 25%-50%, 50%-75%, and 75%-100% interquartile ranges. For illumination
conditions, see figure 1.

very similar, an indication that no dramatic distortions of three-dimensional shape
occur. "Shape constancy pertains under changes of illumination" appears a reasonable
first summary of the data. The subject apparently manages to discount the
illumination geometry, either by ignoring the shading altogether (and relying on the
contours alone) or by some sophisticated process of discounting.
Despite the obvious similarity, the differences are nevertheless significant. Maps of
the residuals (see figure 4) reveal a systematic pattern.
Some of the differences are apparently due to an overall slant of the
pictorial relief that seems to be induced by the illumination. A straightforward
correlation between the conditions is therefore not necessarily a good indicator of
the essential similarity between the pictorial reliefs. We can show this if we first
compute the average depth map and then try to find an overall slant and tilt that
minimizes the sum of squares of the residuals. We explain this in somewhat greater detail.
Let {x, y} denote the position of a location in the picture plane, and let z denote the
depth. For another rendering, the depth z' at the same position {x, y} will differ
from z. A regression of z' on z over all locations is referred to as a straightforward
correlation of the data. A multiple regression of z' on z, x, and y will be referred to as
an affine correlation. In the affine correlation the frontoparallel plane regresses to a
slanted plane. Thus in the straightforward method we find only a depth scaling, and in
the affine method both a depth scaling and an overall slant and tilt. Because the overall
slant has to be referred to some standard we referred it to the mean, thus we
affinely correlated all results with the average result. We find that the correlation could
be increased significantly through the introduction of overall attitude variations with
direction of illumination (affine correlation). Typical slants needed for a best fit are of
the order of four degrees.
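The difference between the two kinds of correlation can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not the authors' analysis code: `z_ref` stands for the reference depth (for example the average depth map), `z_other` for the depth at the same locations in another rendering, a constant offset is included in both fits (depth is only defined up to a constant), and x, y, and depth are taken to be in the same units so that the fitted plane has a meaningful slant. The data are synthetic.

```python
import numpy as np

def _fit(design, target):
    """Ordinary least squares; returns coefficients and R^2."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return coef, 1.0 - resid.var() / target.var()

def straightforward_fit(z_ref, z_other):
    """Regress z_other on z_ref only: a depth scaling plus an offset."""
    z_ref = np.asarray(z_ref, dtype=float)
    design = np.column_stack([z_ref, np.ones_like(z_ref)])
    return _fit(design, np.asarray(z_other, dtype=float))

def affine_fit(x, y, z_ref, z_other):
    """Regress z_other on z_ref, x and y: scaling plus an overall slanted plane."""
    z_ref = np.asarray(z_ref, dtype=float)
    design = np.column_stack([z_ref, x, y, np.ones_like(z_ref)])
    coef, r2 = _fit(design, np.asarray(z_other, dtype=float))
    p, q = coef[1], coef[2]                      # gradient of the fitted plane
    slant_deg = np.degrees(np.arctan(np.hypot(p, q)))
    tilt_deg = np.degrees(np.arctan2(q, p))
    return coef, r2, slant_deg, tilt_deg

# Synthetic example: a relief, rescaled and tipped by a plane of about 4 degrees.
rng = np.random.default_rng(0)
x = rng.uniform(-100, 100, 300)
y = rng.uniform(-100, 100, 300)
z = 50 * np.sin(x / 40) * np.cos(y / 50)
z_prime = 1.2 * z + 0.07 * x - 0.03 * y + 5.0

_, r2_plain = straightforward_fit(z, z_prime)
coef, r2_affine, slant, tilt = affine_fit(x, y, z, z_prime)
print(r2_plain, r2_affine, slant, tilt)
```

In this synthetic case the affine fit recovers the imposed plane exactly, whereas the straightforward fit leaves the plane in the residuals and therefore yields a lower correlation.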
Notice that the illuminance directions are in a cyclical order, namely UL (upper left),
UR (upper right), LR (lower right), and LL (lower left), where UL again succeeds LL.
In our analysis the results inherit this cyclical order. If we plot the deviations in
attitude space (polar coordinates with the slant as radius and the tilt as azimuth)
we find four points, and connecting them according to the inherited cyclical order we
obtain a quadrangular polygon without intersecting edges. Indeed, the polygons prove
to be convex, to include the origin, and to encircle the origin all in the same sense
for all subjects (see figure 5). In all these quadrangles not only is the sense of
orientation the same, but the UL point is in the same quadrant. Such a concordance between
subjects is indeed highly improbable by chance alone, thus there need be little doubt

Figure 5. The global attitude changes needed to bring the pictorial reliefs as close to the
frontoparallel plane as possible (in the least-squares sense). The slant angle is plotted in the radial
direction, the tilt angle determines the orientation in this polar diagram. Results are shown for
subjects AD, CC, and JL.

that the overall attitudes are causally related to the direction of illumination. The overall
attitude change is such that the frontal plane is apparently tipped so that the part in the
direction of the source comes forward; for example, for illumination from the upper left
the upper-left part of the frontal plane moves towards the observer. Since on the object
this part will generally be the brightest (because the object is turning away in depth at the
contour), we can also say that the global attitude change is such that the brightest parts
tip forward.
4.1 Principal-components analysis
One way to study the effect of the direction of illumination is to perform a principal-
components analysis (Mardia et al 1977). We take the affinely corrected depth maps
(otherwise we would just find the global effect once more) and stack them to form a
matrix of four by about three hundred. Since we did a normalization (the affine
matching to the mean) we expect this matrix to have rank three (there exists one linear
relation between the four items). Indeed, if we do a singular-values decomposition we
find that three of the singular values are several orders of magnitude larger than the
fourth one. The components corresponding to these singular values are quite similar
for all subjects (see figure 6 and table 2). The major component is simply the average
depth. The two minor components look roughly like a vertically and a horizontally
slanted plane.
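A minimal sketch of this decomposition is given below. It is not the authors' code: the function names are hypothetical, the matrix is assumed to hold one depth map per row (four conditions by about three hundred points), and the variance explained is taken to be the cumulative share of the squared singular values, which appears consistent with the percentages listed in table 2.

```python
import numpy as np

def principal_components(depth_maps):
    """SVD of a (n_conditions, n_points) matrix of depth maps.

    Returns the singular values (descending) and the component maps
    (one per row), each defined over the probed locations.
    """
    m = np.asarray(depth_maps, dtype=float)
    _, s, vt = np.linalg.svd(m, full_matrices=False)
    return s, vt

def cumulative_variance_percent(singular_values):
    """Percentage of variance explained by the first 1, 2, ... components."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    return 100.0 * np.cumsum(s2) / s2.sum()

# The first three singular values listed in table 2.
print(np.round(cumulative_variance_percent([1054, 199, 77]), 1))
```

Applied to the first three singular values of table 2, the cumulative percentages come out as 96.1, 99.5, and 100.0, matching the listed values.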

Figure 6. The first three principal components (PCs) for subject JL. The shades of gray (including
black) denote the 0%-25%, 25%-50%, 50%-75%, and 75%-100% interquartile ranges.

Table 2. Table of the results of a singular-values decomposition (principal-components analysis) of
the depth residuals for all subjects. Listed are the three major singular values and the percentage
of the variance explained by the first one, two, or three principal components.

Singular value   Variance explained/%

1054              96.1
 199              99.5
  77             100.0

1053              93.8
 265              99.8
  50             100.0

 872              93.4
 207              98.6
 106             100.0

Figure 7. Projections of the residuals on the second and third principal components for
subjects AD, CC, and JL.
If we project the residuals from the affine correlations on these latter two components
we find again a convex quadrangle of a single orientation for each subject (see figure 7).
Notice again the common phase (position of the UL point). It is indeed highly unlikely
that this concordance between subjects would arise from chance. We see a causal effect of
the displacement of the source on all subjects' responses. This indicates that the relief
depends on the direction of illumination. If we carefully consider signs and directions we
find that near points of the relief (parts sticking out towards the observer) have a
tendency to follow the source, ie a roughly spherical bulge appears like an egg with the
sharp end pointing somewhere between the directions to the vantage point and to the
source.
There may or may not exist a relation to the cue-reduction findings of Erens et al
(1993b).
Thus the tendency to judge 'brighter' as 'closer' holds both for the overall attitude
and for the modulation of it, that is, the subsidiary relief on a local scale riding on
the global surface attitude (as revealed by the residuals of the affine correlation).
4.2 Factor analysis
The principal-components analysis as described above is intuitively less satisfactory
because the components (though clearly close to slants about vertical and horizontal
axes) have no immediate relation to the physical causes of the stimulus. Indeed, we
understand the structure of the shading quite well and we should be in a position to
predict the components from the physics instead of having to deduce them from the
data (though that certainly lends the analysis an enhanced degree of objectivity). If we
do this we essentially do a factor analysis (Mardia et al 1977) with factors that have a
clear interpretation and are intuitively more satisfactory because they yield a handle
on the causal interpretation of the results.
Here it should be understood that, whereas factor analysis itself is a well-defined
statistical technique, the actual factors are the result of 'art': one posits factors (by
educated guesswork for instance) and then the factor analysis reveals to what extent the
data are 'explained' by these factors. If the factors explain little of the variance in
the data, then the conclusion should be that the guess was wrong; if they explain an
appreciable part, then the factors are likely to reveal an underlying cause.
The factors that are implicated by the physics are the components of the range
gradient. Indeed, in simple cases (collimated source, all surface patches illuminated)
the surface illuminance is a function of only the direction of illumination and the
range gradient (that is, the local surface attitude). If the range gradient of a patch
points towards the source the illumination will be higher than that of a frontoparallel
patch. If the range gradient points away from the source the illumination will be lower
than that of a frontoparallel patch. In realistic situations the range gradient cannot be
expected to explain the luminance recorded by the camera fully. [Vignetting—that is,
screening of the (extended) source by the object itself, cast shadows, multiple inter-
reflections, and material properties (non-Lambertian scattering) are the major factors
that destroy the 'ideal'.] However, in the overwhelming majority of instances parts of
the object turned towards the source will scatter a higher luminance towards the
vantage point than parts turned away from the source: one must expect surface attitude
to play an important role.
Thus, if 'brighter' is indeed interpreted as 'closer' we expect that the depth residuals
will be highly correlated with the component of the range gradient in the direction of
the source.
Since we do not have access to the actual range map we are in no position to use
the range-gradient components as factors. However, the depth maps obtained from
the gauge-figure settings should bear a close relation to the range map. Consequently,
we may posit the depth-gradient components as promising factors. Given the approx-
imations in the photometry (neglect of vignetting, cast shadows, interreflection, and
material properties) we expect that the approximation of the range map by the depth
map will be of relatively minor importance.
Thus if 'brighter' is indeed interpreted as 'closer' we expect that the depth residuals
will be highly correlated with the component of the depth gradient in the direction of the
source. Since we have neglected important photometric effects and have approximated
the range by the depth, we cannot expect to be able to explain all of the variance in
the data with these two factors. Nevertheless, we expect to be able to account for a
good part of the variance, and a factor analysis should settle the issue.
So far, we have explained the particular choice of factors (notice again that we
are actually free in the choice of factors; the factor analysis provides a check on their
relevance). In order to perform the factor analysis we first calculate the average depth
and the horizontal and vertical components of the average depth gradient. The latter
can be found immediately from the raw data (the gauge-figure settings). These maps are
illustrated in figure 8. Notice that the horizontal and vertical components of the depth
gradient look like shaded pictures of the object illuminated from the right and from the
top respectively. This is a manifestation of the physics described above and suggests that
we have indeed selected important factors. By taking linear combinations we can synthe-
size pictures that simulate actual luminance patterns for arbitrary illumination directions.
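
The synthesis by linear combination can be sketched as follows. This is only an illustration under stated assumptions: the depth grid is an invented smooth bump, the gradient is taken by central differences, and the sign conventions (which way depth and the vertical axis increase) are arbitrary here and would have to be matched to the actual data.

```python
import math


def depth_gradient(depth, spacing=1.0):
    """Central-difference gradient of a depth grid (a list of rows).

    Returns (gx, gy) for the interior points only; gx is the change along
    a row (horizontal), gy the change along a column (vertical).
    """
    rows, cols = len(depth), len(depth[0])
    gx = [[(depth[r][c + 1] - depth[r][c - 1]) / (2 * spacing)
           for c in range(1, cols - 1)] for r in range(1, rows - 1)]
    gy = [[(depth[r + 1][c] - depth[r - 1][c]) / (2 * spacing)
           for c in range(1, cols - 1)] for r in range(1, rows - 1)]
    return gx, gy


def synthesize(gx, gy, direction_deg):
    """Linear combination of the two gradient maps for one source direction."""
    a = math.cos(math.radians(direction_deg))
    b = math.sin(math.radians(direction_deg))
    return [[a * x + b * y for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


# Hypothetical depth grid: a smooth bump on a 7 x 7 lattice.
n = 7
depth = [[math.exp(-((r - 3) ** 2 + (c - 3) ** 2) / 4.0) for c in range(n)]
         for r in range(n)]
gx, gy = depth_gradient(depth)
pattern_45 = synthesize(gx, gy, 45.0)
```

A direction of 0° returns the horizontal component and 90° the vertical component; intermediate directions mix the two.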
These maps are treated as 309-dimensional vectors (because the triangulation has
309 vertices). A depth map is thus a list of 309 depth values and can be considered a
vector in a 309-dimensional abstract space. Each vector in this space denotes a possible
depth map. We want to check whether our results can be explained as linear combinations
of a few basis vectors (or 'factors'), that is to say, whether the results are constrained to
the low-dimensional subspace spanned by these factors.

Figure 8. The factors considered in the factor analysis for subject JL; panels, from left to right: mean depth, horizontal component of depth gradient, vertical component of depth gradient. The shades of gray (including
black) denote the 0%-25%, 25%-50%, 50%-75%, and 75%-100% interquartile ranges.
As factors we pick the average depth map and the depth-gradient-component
maps. These are also lists of 309 values and thus can also be considered vectors in the
same abstract space. We normalize the factors to vectors of unit length and denote
them {e1, e2, e3}. These vectors turn out to be close to mutually orthogonal (not quite,
of course) and span a three-dimensional subspace in the space of depth maps. A useful
index of orthonormality is the ratio of the largest to the smallest eigenvalue of the matrix
of scalar products ei · ej. A value of unity would signify a perfectly orthonormal basis, a
very large value a degenerate (not three-dimensional) basis. For our subjects we find
values of about 2 (see table 3). The factors are depicted in figure 8.

Table 3. Table of the numerical conditions of the basis spanned by the factors. For an orthonormal
basis the condition is equal to unity. The condition is the ratio of the largest to the smallest
eigenvalue.
Subject Condition

AD 1.9
CC 1.9
JL 2.5
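
The condition index of table 3 can be computed along the following lines. The sketch assumes three factors, normalizes them to unit length, forms the matrix of scalar products, and uses the standard closed-form expression for the eigenvalues of a symmetric 3 x 3 matrix; the example vectors are invented and all function names are ours.

```python
import math


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gram_matrix(vectors):
    """Matrix of scalar products between the unit-normalized vectors."""
    units = [unit(v) for v in vectors]
    return [[sum(a * b for a, b in zip(u, w)) for w in units] for u in units]


def symmetric_3x3_eigenvalues(m):
    """Eigenvalues of a real symmetric 3 x 3 matrix, largest first."""
    p1 = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((m[0][0], m[1][1], m[2][2]), reverse=True)
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    p2 = sum((m[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(m[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # guard against rounding
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    middle = 3.0 * q - largest - smallest
    return [largest, middle, smallest]


def condition_index(vectors):
    """Ratio of the largest to the smallest eigenvalue of the Gram matrix."""
    eig = symmetric_3x3_eigenvalues(gram_matrix(vectors))
    return eig[0] / eig[-1]


# Invented, clearly non-orthogonal factors for illustration.
print(condition_index([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))
```

An orthonormal set gives exactly 1; the more the factors lean towards one another, the larger the ratio becomes.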

Next we compute depth-residual maps by subtracting the average depth from the
four individual depth maps (for the UL, UR, LR, and LL illuminations). The residuals
are indeed appreciable. For instance, the ratio of the interquartile ranges of the residuals
to the interquartile ranges of the depths amounts to approximately 30%, which is
highly significant. When we project these residual maps on the three-dimensional subspace
spanned by {e1, e2, e3} we find that about half of this variance is accounted for,
roughly equally spread over the three dimensions (see tables 4 and 5). This is remarkable,

Table 4. Table of the factor loadings for all subjects in all conditions. For conditions, see figure 1.

Subject  Condition  First loading  Second loading  Third loading

AD       UL         -105            41              84
         UR          -48           -38             -15
         LR           75           -26              42
         LL           78            24              57
CC       UL          -25            41             -53
         UR         -115            34             -88
         LR           55           -63              49
         LL           85           -12              92
JL       UL         -112            63             -98
         UR          -17           -16             -23
         LR           67           -68              53
         LL           62            21              68

Table 5. Table of the percentages of the variance explained by the factors. For conditions, see
figure 1.

Subject Condition
UL UR LR LL
AD 59% 61% 65% 63%
CC 50% 66% 61% 60%
JL 67% 34% 62% 61%
since it means that the depth residuals are indeed highly significantly correlated with
the mean depth and the gradient components. In other words, we have indeed selected
(by intuitive reasoning based on ecological optics) relevant factors. We are of course
especially interested in the latter two components, since they reflect the causal influence
of the potential information specified by the shading.
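
The share of the residual accounted for by the factors can be obtained by projecting a residual map on the subspace they span. The sketch below is one way to do this, not necessarily the procedure used for table 5: it orthonormalizes the (not quite orthogonal) factors by Gram-Schmidt and takes 'variance' to mean squared vector length. The function names and the example vectors are ours.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(factors):
    """Gram-Schmidt: an orthonormal basis for the span of the factors."""
    basis = []
    for f in factors:
        w = list(f)
        for b in basis:
            coeff = dot(w, b)
            w = [x - coeff * y for x, y in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip a factor that adds no new direction
            basis.append([x / norm for x in w])
    return basis


def explained_fraction(residual, factors):
    """Share of the residual's squared length lying in the factor subspace."""
    basis = orthonormalize(factors)
    projected = sum(dot(residual, b) ** 2 for b in basis)
    return projected / dot(residual, residual)


# Invented example: two non-orthogonal factors in a four-dimensional space.
factors = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
print(explained_fraction([1.0, 0.0, 1.0, 0.0], factors))
```

A residual lying entirely in the factor subspace yields 1, one orthogonal to it yields 0.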
Since we wanted to see whether the depth residuals 'follow the light source' we
prepared a scatter plot of the directions of the residual maps in {e2, e3 }-factor space
against the direction towards the light source measured in the frontoparallel plane.
That is essentially a correlation of the changes of the tilts of the elements of pictorial
relief against the light direction. We obtain no more than four points, of course, and
in order to make the pattern more visually apparent we plotted two full cycles of
each direction in figure 9. (Thus each point appears four times in this plot.) The result
is clearcut: the points cluster on lines of unit slope, indicating a fixed phase difference
between these two sets of directions. The phase difference is close to 180° (see table 6;
in the figures we drew lines for the 180° phase difference, rather than the best-fitting
unit-slope lines) which means (if sign conventions are carefully taken into account)
that the directions of fastest increase in depth coincide with the directions towards the
source. This again means that the subjects indeed confuse 'brighter' with 'nearer'. This
result is a specialization of the earlier result from the principal-components analysis
in that we have decided on the components on a priori grounds.
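
For directions that cluster on lines of unit slope, the phase difference can be estimated as the circular mean of the differences between the two sets of directions. The sketch below takes that route as an assumption; it is not claimed to be the regression used for table 6. The illumination directions assigned to UL, UR, LR, and LL and the response directions are invented values.

```python
import math


def phase_difference_deg(illumination_deg, response_deg):
    """Circular mean of (response - illumination), in degrees within [0, 360)."""
    s = 0.0
    c = 0.0
    for ill, resp in zip(illumination_deg, response_deg):
        d = math.radians(resp - ill)
        s += math.sin(d)
        c += math.cos(d)
    return math.degrees(math.atan2(s, c)) % 360.0


# Hypothetical directions (degrees) for the UL, UR, LR, and LL conditions.
illumination = [135.0, 45.0, 315.0, 225.0]
# Invented directions of the residual maps in {e2, e3}-factor space.
response = [320.0, 220.0, 140.0, 40.0]

print(phase_difference_deg(illumination, response))
```

Working with sines and cosines avoids the wrap-around at 360° that an ordinary average of angles would suffer from.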
The factor analysis thus confirms that the empirically determined principal compo-
nents are similar to depth-gradient components.
Figure 9. Phases from the factor analyses for all three subjects (panels AD, CC, and JL). Horizontal axis: direction of illumination/°.
Table 6. Table of the results of a linear regression on the illumination directions and phases
from the factor analysis.

Subject  Phase angle/°

AD       173 ± 15
CC       150 ± 24
JL       188 ± 12

4.3 Discussion
We conclude that shading has a significant influence on the pictorial relief, even though
at first blush the relief appears already almost fully specified by other optical structures,
in the present case the contour and perhaps familiarity. The residuals amount to about
one third of the interquartile depth range.
Since we do not have a numerical representation of the physical shape we cannot
draw strong conclusions concerning the issue of whether the human observer exploits
'shape from shading' in order to arrive at greater veridicality than the contour alone
might provide. However, since the relief is different for the four shading conditions
we may confidently conclude that the results would turn out to deviate significantly
from such a ground truth in all cases.
By comparing the reliefs for the UL, UR, LR, and LL illuminations we found a
small but very significant and systematic effect of shading. The subjects (unknowingly)
interpreted brighter as nearer. This is a robust effect that could be shown by a variety
of techniques such as an analysis of overall attitude, principal-components analysis, or
a factor analysis based on factors that are the physical causes of the surface illuminance
distribution.
The variations of pictorial relief under variation of the source location turn out to
correlate quite well with that parameter: from the factor analysis we find that we
account for about half of the total residual variance and thus can causally link this to
the variation of the illumination. The remaining variance is still somewhat too large
to be completely accounted for by random cause (day-to-day scatter in repeated trials).
There could be a variety of reasons for this, one being the fact that the fitting of linear
models to the data is likely to be a first approximation at best. On the basis of the
given data it seems rash to want to draw further conclusions, though.

5 Conclusion
In a previous study (Koenderink et al 1996) we have shown that the naive observer
does not obtain sufficient information from a silhouette rendering in order to produce
a fully articulated pictorial relief that approximates that due to a fully shaded rendering.
In this impoverished case, prior knowledge (familiarity with the object or pictures of
it) largely determines the articulation and the degree of veridicality of the result. For
a cartoon rendering all subjects produced fully articulated pictorial reliefs with a high
degree of similarity to that obtained with fully shaded renderings. Apparently the
objective optical structure in the picture overrides the effects of familiarity and naive
and nonnaive subjects score about equally. We have also shown that, although the
addition of full shading did not alter the pictorial relief dramatically, significant
changes did nevertheless occur. Clearly the 'observer's share' in the pictorial relief
(Gombrich 1982) decreases with any additional optical structure due to the rendering.
From these results it might be concluded that shading adds little to what can be
specified by a clear outline drawing. However, that would be a premature conclusion
for it is certainly easy to conceive of cases where the shading would be vital. For
instance, think of objects which present the same (circular, say) contour from a given
vantage point, but are differently structured in the interior of the silhouette (indenta-
tions and protrusions that do not contribute to the contour, say). Then shading would
without a doubt reveal the relief and show up the differences between these objects.
From this example it is evident that cases exist where shading has to yield a necessary
additional source of potential information on top of the outline. There is good reason
to expect that shading might possibly add to or alter the relief specified by outline
alone. Such is, of course, an obvious fact to the artist (Schöne 1954).
In this study we have used the geometrically identical scene (same object in
identical attitude from the same vantage point) with parametrically varied shading.
The parametrization was with respect to the position of the (single, broad) source.
The source was placed at one of four positions, towards the front and sideways in
either the upper-left, upper-right, lower-right, or lower-left diagonal direction. Displace-
ment of the source brought about many photometrical changes because the object had
a semigloss white finish (hence shifting specularities), reflecting surfaces were present
in the scene (hence shifting reflexes on the dark side of the object), and vignetting of
the (broad) source by the object itself (yielding various dark indentations) was
important, etc.
As it turns out, the significant changes in the pictorial relief with displacement of
the source are highly correlated with that parameter. This could be shown in various
ways by standard data analysis. A guess concerning the causal nature of these correla-
tions could also be corroborated via a factor analysis where the factors were picked
as the entities that primarily influence surface irradiance, ie physical causes that would
figure prominently in any 'shape-from-shading' algorithm. For all three subjects it
turned out to be the case that they (unknowingly) took 'brighter' for 'nearer'. This is
true both in a global (overall attitude) and in a local sense (local protrusions follow
the source). This fact might be described as a 'regression to the proximal stimulus'
(ie mistaking the retinal irradiance for nearness to the observer), but it might easily
arise as an artifact in many shape-from-shading methods. We do not believe that the
present data suffice to differentiate between alternatives here.
A simple possible explanation is provided by the following argument. Think of
the fact that an anisotropic scaling in the direction of the source would yield another
object with the same pattern of isophotes. Thus a mere relabelling of isophotes (maybe
the visual system uses mainly ordinal information) would easily yield an effect as
found. The fact that such a scaling would affect the outline is not necessarily a coun-
terargument since there is no compelling evidence that the system does not simply
ignore minor inconsistencies. The inconsistencies are indeed minor because the angle
between the direction of illumination and the viewing direction is not very large. This
argument correctly predicts the type of residuals, though it cannot predict the magnitude
of the effect (there might even be a sign flip). An argument like this does have the
advantage that it applies irrespective of the particular shape-from-shading algorithm,
though. As long as arguments like this apply, it is too early to draw on detailed shape-
from-shading theories.
That 'brighter' can be used to signal 'nearer' is a fact that is well known to visual
artists and has been often exploited. A primitive example is edge darkening to make
objects look round and solid (it lets the contour recede), a method that may be as
ancient as the Lascaux cave art and that is widespread in the art of the European
Middle Ages [to be sharply distinguished from shading proper, or chiaroscuro, a much
more recent invention (Rawson 1969)]. The method is widely exploited even today and
the present psychophysical results show how it also occurs in visual judgments of
visual relief in natural scenes. One should be careful to distinguish the effect from a
different one, namely the tendency to judge high-contrast areas as nearer than low-
contrast ones. If one exploits the latter principle it is easy to apparently invert the
'brighter-looks-nearer' tendency. Good examples of this are to be found in Seurat's
chalk drawings (Broude 1978).
On a more general conceptual level, we have presented an example of how a
certain aspect of the optical structure can be studied without isolating it by reduction.
Any aspect of optical structure can be studied in the context of a rich natural image
if we consider the correlation of changes brought about by parametric variation of
that aspect of the optical structure with that variation. In the present case, that
appears as the only reasonable way to study the potentials of shading since the method
of reduction leads to very ambiguous stimuli that appear to possess little pictorial relief
in the first place. That the pictorial relief might be dominated by the other aspects of the
available optical structure (here mainly the contour) or prior information is irrelevant as
long as variation of the fiducial aspect leads to significant changes of the relief. This
appears to be an important methodological principle that is in fact a standard tool of
engineering (Truxal 1955) but appears to have been greatly neglected in psychophysical
research.
Acknowledgements. Jan Koenderink is supported by the Human Frontiers Science Program.
Andrea van Doorn is supported by the Netherlands Organization for Scientific Research (NWO).
Chris Christou and Joe Lappin were supported by the ESPRIT Basic Research Action INSIGHT
of the European Commission.
References
Blake A, Bülthoff H H, Sheinberg D, 1993 "Shape from texture: ideal observers and human
psychophysics" Vision Research 33 1723-1737
Broude N (Ed.), 1978 Seurat in Perspective (Englewood Cliffs, NJ: Prentice-Hall)
Buckley D, Frisby J P, Freeman J, 1994 "Lightness perception can be affected by surface curvature
from stereopsis" Perception 23 869-881
Bülthoff H H, Mallot H A, 1988 "Integration of depth modules: stereo and shading" Journal of
the Optical Society of America A 5 1749-1758
Clifton J, 1973 The Eye of the Artist (Westport, CT: North Light Publishers)
Erens R G F, Kappers A M L, Koenderink J J, 1991 "Limits on the perception of local shape
from shading", in Studies in Perception and Action Eds P J Beek, R J Bootsma, P C W van
Wieringen (Amsterdam: Rodopi) pp 65-71
Erens R G F, Kappers A M L, Koenderink J J, 1993a "Perception of local shape from shading"
Perception & Psychophysics 54 145-156
Erens R G F, Kappers A M L, Koenderink J J, 1993b "Estimating local shape from shading in
the presence of global shading" Perception & Psychophysics 54 334-342
Gombrich E H, 1982 The Image and the Eye, Further Studies in the Psychology of Pictorial
Representation (Oxford: Phaidon)
Hale N C, 1980 Abstraction in Art and Nature (New York: Watson-Guptill Publications)
Hattersley R, 1979 Photographic Lighting, Learning to See (Englewood Cliffs, NJ: Prentice-Hall)
Hunter F, Fuqua P, 1990 Light, Science and Magic, An Introduction to Photographic Lighting
(Boston, MA: Focal Press)
Johnston A, Hill H, Carmann N, 1992 "Recognising faces: effects of lighting direction, inversion
and brightness reversal" Perception 21 365-375
Knill D C, Kersten D, 1991 "Apparent surface curvature affects lightness perception" Nature
(London) 351 228-230
Koenderink J J, Doorn A J van, 1995 "Relief: pictorial and otherwise" Image and Vision Computing
13 321-334
Koenderink J J, Doorn A J van, Kappers A M L, 1992 "Surface perception in pictures" Perception
& Psychophysics 52 487-496
Koenderink J J, Doorn A J van, Christou C, Lappin J S, 1996 "Shape constancy in pictorial
relief" Perception 25 155-164
Lomazzo P, 1958 "Treatise on the art of painting; The First Booke: Of Proportion", in A Documentary
History of Art volume 2 Ed. E G Holt (New York: Doubleday) p. 78
Mardia K V, Kent J T, Bibby J M, 1977 Multivariate Analysis (London: Academic Press)
Nurnberg W, 1948 Lighting for Portraiture (London: Focal Press)
Ramachandran V, 1988 "Shape from shading" Nature (London) 331 163-166
Rawson P, 1969 Drawing (Oxford: Oxford University Press)
Schöne W, 1954 Über das Licht in der Malerei (Berlin: Mann)
Todd J J, Ackerstrom R A, 1987 "Perception of three-dimensional form from patterns of optical
texture" Journal of Experimental Psychology: Human Perception and Performance 13 242-255
Truxal J G, 1955 Automatic Feedback Control System Synthesis (New York: McGraw-Hill)

© 1996 a Pion publication printed in Great Britain
