
Parameter learning but not structure learning:
A Bayesian network model of constraints on early perceptual learning

Melchi M. Michel
Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY, USA

Robert A. Jacobs
Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY, USA

Visual scientists have shown that people are capable of perceptual learning in a large variety of circumstances. Are there constraints on such learning? We propose a new constraint on early perceptual learning, namely, that people are capable of parameter learning (they can modify their knowledge of the prior probabilities of scene variables or of the statistical relationships among scene and perceptual variables that are already considered to be potentially dependent) but they are not capable of structure learning (they cannot learn new relationships among variables that are not considered to be potentially dependent, even when placed in novel environments in which these variables are strongly related). These ideas are formalized using the notation of Bayesian networks. We report the results of five experiments that evaluate whether subjects can demonstrate cue acquisition, which means that they can learn that a sensory signal is a cue to a perceptual judgment. In Experiment 1, subjects were placed in a novel environment that resembled natural environments in the sense that it contained systematic relationships among scene and perceptual variables that are normally dependent. In this case, cue acquisition requires parameter learning and, as predicted, subjects succeeded in learning a new cue. In Experiments 2–5, subjects were placed in novel environments that did not resemble natural environments: they contained systematic relationships among scene and perceptual variables that are not normally dependent. Cue acquisition requires structure learning in these cases. Consistent with our hypothesis, subjects failed to learn new cues in Experiments 2–5. Overall, the results suggest that the mechanisms of early perceptual learning are biased such that people can only learn new contingencies between scene and sensory variables that are considered to be potentially dependent.
Keywords: perceptual learning, motion discrimination, Bayesian networks, cue integration, perceptual cue acquisition
Citation: Michel, M. M., & Jacobs, R. A. (2007). Parameter learning but not structure learning: A Bayesian network model of
constraints on early perceptual learning. Journal of Vision, 7(1):4, 1–18, http://journalofvision.org/7/1/4/, doi:10.1167/7.1.4.

Introduction

Acquiring new information about the world requires contributions from both nature and nurture. These factors determine both what biological organisms can and cannot learn. Numerous studies have shown that organisms' learning processes are often biased or constrained. Perhaps the most famous demonstration that learning processes are biased comes from the work of Garcia and Koelling (1966), who showed that rats are predisposed to learn certain types of stimulus associations and not others. They interpreted their results in terms of a learning bias referred to as "belongingness": organisms more easily learn associations among types of stimuli that are correlated or participate in cause-and-effect relationships in natural environments. A more recent demonstration that learning processes are biased comes from the work of Saffran (2002). She found that people more easily learned an artificial language when the statistical relationships among sound components were consistent with the dependencies that characterize the phrase structures of natural languages.

This article proposes a new constraint or bias on early, or low-level, perceptual learning.1 We hypothesize that people's early perceptual processes can modify their knowledge of the prior probabilities of scene properties or their knowledge of the statistical relationships among scene and sensory variables that are already considered to be potentially dependent. However, they cannot learn new relationships among scene and sensory variables that are not considered to be potentially dependent, even when placed in novel environments in which these variables are strongly related. To illustrate this idea, consider the problem of perceptual cue acquisition. Wallach (1985) proposed a theory of cue acquisition that is representative of other theories in the literature (a closely related theory was originally proposed by Brunswik, 1956; readers interested in this topic should also see Haijiang, Saunders, Stone, & Backus, 2006). He hypothesized that in every perceptual domain (e.g., perception of motion direction), there is at least one primary source of information,
doi:10.1167/7.1.4. Received April 18, 2006; published January 16, 2007. ISSN 1534-7362 © ARVO


usable innately and not modifiable by experience. Other perceptual cues are acquired later through correlation with the innate process. Using Wallach's theory, we consider constraints on the learning processes underlying cue acquisition. One possibility is that these processes are general purpose, which means that they are equally sensitive to correlations between known cues and any signal. For example, let us suppose that retinal image slip is an innate cue to motion direction and let us consider an observer placed in a novel environment in which retinal image slip is perfectly correlated with a novel signal, such as the temperature of the observer's toes (e.g., leftward retinal slip is correlated with cold toes, and rightward retinal slip is correlated with hot toes). According to Wallach's theory, it ought to be the case that the observer learns that the temperature of his or her toes is a perceptual cue to motion direction. For example, the observer may learn that cold toes indicate leftward motion, whereas hot toes indicate rightward motion. Alternatively, it may be that the learning processes underlying cue acquisition are biased such that they are more sensitive to some correlations than to others. In particular, we conjecture that these processes cannot learn new relationships among scene and sensory variables that are not considered to be potentially dependent. It seems likely that an observer placed in the novel environment described above would not believe that motion direction and the temperature of his or her toes are potentially dependent variables, and thus, the observer's early perceptual system would fail to learn that the temperature of his or her toes is a cue to motion direction.

In the remainder of this article, we report the results of five experiments. These experiments evaluate our hypothesis regarding biases in early perceptual learning. They do so in the context of Wallach's theory of cue acquisition described above, namely, that new perceptual cues can be acquired by correlating an existing cue with a novel sensory signal. We then present a simple model, described in terms of Bayesian networks, that formalizes our hypothesis, accounts for our results, and is consistent with the existing literature on perceptual learning.

In Experiment 1, subjects were placed in a novel environment that resembled natural environments in the sense that it contained systematic relationships among scene and perceptual variables that normally share systematic relationships. Subjects were trained to perceive the motion direction of a field of moving dots when the visual cue to motion direction was correlated with a novel auditory signal. When an object moves in a natural environment, this event often gives rise to correlated visual and auditory signals. In other words, perceived auditory and visual motion signals are both dependent on the motion of objects in a scene and, thus, people regard visual or auditory signals as potentially dependent on the motion direction in a scene. We reasoned that subjects in our experiment should be able to estimate the motion direction of the moving dots based on the auditory and visual signals and then modify their knowledge of the relationship between motion direction and the initially unfamiliar auditory stimuli (i.e., to anticipate our discussion of Bayesian networks below, this would be an instance of parameter learning in which subjects modify their conditional distribution of the perceived auditory signal given the estimated motion direction). We predicted, therefore, that subjects would learn to use the hitherto unfamiliar and, thus, uninformative auditory stimulus as a cue to motion direction. As reported below, the experimental results are consistent with our prediction. Experiment 1 can be regarded as a control experiment in the sense that it verified that our experimental procedures are adequate for inducing observers to learn a new perceptual cue in the manner suggested by Wallach (i.e., by correlating a signal that is not currently a cue with an existing cue).

In Experiments 2, 3, 4, and 5, subjects were placed in novel environments that did not resemble natural environments: they contained systematic relationships among scene and perceptual variables that do not normally share systematic relationships. In Experiments 2 and 3, the visual cue to motion direction was correlated with binocular disparity or brightness signals, respectively; the experimental procedures were otherwise identical to those of Experiment 1. In the natural world, neither brightness nor binocular disparity varies systematically with transverse object motion (i.e., motion in the frontoparallel plane). For example, it is not the case that brighter objects tend to move right whereas darker objects move left, nor is it the case that nearer objects tend to move right whereas distant objects move left. Consequently, observers should not consider motion direction and either brightness or binocular disparity as potentially dependent variables. In contrast to Experiment 1, the predictions of Wallach's hypothesis for Experiments 2 and 3 differ from those of our theory. Wallach's hypothesis suggests that correlating novel signals with existing cues should be sufficient to induce cue learning. In contrast, our hypothesis claims that observers can only learn relationships between variables that are considered to be potentially dependent. Because transverse motion direction and either brightness or binocular disparity are not considered to be potentially dependent, we predicted that subjects in Experiments 2 and 3 would fail to learn to use brightness or binocular disparity signals as cues to transverse motion direction (i.e., to anticipate the discussion of Bayesian networks below, we predicted that subjects would fail to show structure learning). The experimental results are consistent with this prediction.

Experiments 1, 2, and 3 attempted to teach subjects a new cue to transverse motion direction. To check that there is nothing idiosyncratic about this perceptual judgment, we used a different task in Experiments 4 and 5. Subjects were trained to perceive the light source direction when the shading cue to this direction was correlated with a visual disparity or auditory signal. Because neither binocular disparity nor auditory signals share systematic relationships with light source direction in the natural world, we predicted that subjects would fail to learn that


these signals were also cues to light source direction in our novel experimental environments. Again, the experimental results are consistent with this prediction.

Taken as a whole, the experimental results are consistent with the hypothesis that the learning processes underlying cue acquisition are biased by prior beliefs about potentially dependent variables such that cue acquisition is possible when a signal is correlated with a cue to a scene property and the signal is potentially dependent on that property. If the signal is not believed to be potentially dependent on the property, cue acquisition fails. In the General discussion section, we introduce a Bayesian network model formalizing this hypothesis.

Experiment 1: Auditory cue to motion direction

Subjects in Experiment 1 were trained to perceive the motion direction of a field of dots when the visual cue to motion direction was correlated with an auditory signal. The experiment examined whether subjects would learn that the auditory signal is also a cue to motion direction. Because moving objects often give rise to both visual and auditory signals in natural environments (i.e., because sounds are created by physical motion), we expected that subjects would consider motion direction and an auditory signal to be potentially dependent and, thus, would learn that the auditory signal is also a cue.

Methods

Subjects

Subjects were eight students at the University of Rochester with normal or corrected-to-normal vision and normal hearing. All subjects were naive to the purposes of the study.

Stimuli

Visual stimuli were random-dot kinematograms (RDKs) presented for a duration of 1 s. The kinematograms consisted of 309 small antialiased white dots (each subtending approximately 0.65 min of visual angle) moving (at a rate of 1.4°/s) behind a simulated circular aperture (with a diameter of 5.72° of visual angle) against a black background. Half the dots in a display moved in the same direction, referred to as the stimulus direction, whereas each of the remaining dots moved in a direction sampled from a uniform distribution. Each dot had a lifetime of approximately 150 ms, after which a new replacement dot appeared in a random position within the aperture. These stimuli were presented on a standard 19-in. CRT with a resolution of 1,024 × 768 pixels and a refresh rate of 100 Hz and were viewed from a distance of 1.5 m. All experiments were conducted in a darkened room, with black paper obscuring the edges of the CRT.

Auditory stimuli consisted of 1 s of "notched" white noise played through a pair of headphones. We used auditory noise because we wanted to create ambiguous motion stimuli.2 Two stimuli defining the endpoints of a continuum, denoted A and B, were each constructed by combining two narrow bands of noise (sampled at 22 kHz). Stimulus A had approximately equal amplitude in the ranges 4000–5000 and 8000–10000 Hz, whereas stimulus B had approximately equal amplitude in the ranges 1–2000 and 6000–7000 Hz. Intermediate stimuli were created by linearly combining stimuli A and B, where the linear coefficients formed a unit-length vector whose endpoint lay on a circle passing through the points (1, 0) and (0, 1) [e.g., the coefficients (1, 0) produced stimulus A, the coefficients (0, 1) produced stimulus B, and the coefficients (1/√2, 1/√2) produced a stimulus midway between A and B].3 Auditory stimuli were normalized to have equal maximum amplitudes.

Procedure

The experiment used four tasks, referred to as the vision-only, audition-only, and vision–audition training tasks, and the vision–audition test task. The vision-only and audition-only tasks allowed us to characterize each subject's performances on visual and auditory discrimination tasks, respectively. The goal of the vision–audition training task was to expose subjects to an environment in which an auditory signal is correlated with a visual cue to motion direction. The goal of the vision–audition test task was to evaluate whether subjects learned that the auditory signal is also a cue to motion direction.

In each trial of the vision-only training task, four visual displays were presented: A fixation square was presented for 500 ms, followed by the first RDK for 1,000 ms, followed by a second fixation square for 400 ms, followed by the second RDK for 1,000 ms. The stimulus direction of the first RDK, referred to as the "standard" stimulus, was always 0° (vertical). The second RDK, referred to as the "comparison" stimulus, had a stimulus direction different from the standard. Subjects judged whether the dots in the comparison stimulus moved to the left (anticlockwise) or to the right (clockwise) of those in the standard (vertical) stimulus. They responded by pressing the appropriate key on the keyboard. At the end of every 10 trials, subjects were informed of the number of those trials on which they responded correctly. The ease or difficulty of the task was varied over trials by varying the stimulus direction of the comparison so that difficult trials contained smaller direction differences between the standard and comparison stimuli than did easy trials. This direction was determined using interleaved 2-up, 1-down and 4-up, 1-down staircases. Trials were run until there were at least 12 reversals of each staircase. A subject's approximate 71% correct and 84% correct thresholds were set to the average values over the last 10 reversals of the 2-up, 1-down and 4-up, 1-down staircases, respectively.
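For readers unfamiliar with adaptive staircases, the sketch below illustrates the logic of the n-up, 1-down procedures just described: n consecutive correct responses make the next trial harder, a single error makes it easier, and the run converges on a fixed percent-correct level (about 71% for 2-up, 1-down and about 84% for 4-up, 1-down). This is a minimal illustration of the general technique, not the authors' code; the starting value, step size, and simulated observer are hypothetical choices of ours.

```python
import random

def run_staircase(n_up, prob_correct, start=8.0, step=1.0, max_reversals=12):
    """Simulate an n-up, 1-down staircase over direction difference (deg).

    n_up consecutive correct responses decrease the difference (harder);
    one error increases it (easier). The run converges on the difference
    yielding a fixed percent correct (~71% for 2-up, ~84% for 4-up).
    """
    level, correct_run, last_move = start, 0, None
    reversals = []
    while len(reversals) < max_reversals:
        # Hypothetical simulated observer: more accurate at larger differences.
        correct = random.random() < prob_correct(level)
        if correct:
            correct_run += 1
            if correct_run < n_up:
                continue                      # no move yet
            correct_run, move = 0, "down"     # n correct in a row -> harder
            level = max(level - step, 0.1)
        else:
            correct_run, move = 0, "up"       # one error -> easier
            level += step
        if last_move is not None and move != last_move:
            reversals.append(level)           # direction change = reversal
        last_move = move
    # Threshold estimate: mean of the last 10 reversal levels.
    return sum(reversals[-10:]) / len(reversals[-10:])

# Example with a hypothetical observer whose accuracy grows with the difference.
p = lambda d: 0.5 + 0.5 * min(d, 10.0) / 10.0
print("~71% threshold:", run_staircase(2, p))
print("~84% threshold:", run_staircase(4, p))
```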


The audition-only training task was identical to the vision-only training task with the following exception. Instead of viewing RDKs, subjects heard auditory stimuli. The standard was an auditory stimulus midway between stimuli A and B defined above, whereas the comparison was either nearer to A or nearer to B. Subjects judged whether the comparison was closer to A or B relative to the standard. Subjects were familiarized with A and B prior to performing the task.

Subjects also performed a vision–audition training task in which an auditory signal is correlated with a visual cue to motion direction. Before performing this task, we formed a relationship between visual and auditory stimuli by mapping subjects' visual thresholds onto their auditory thresholds. This was done using a log-linear function

$$\log(d_v) = m \log(d_a) + b, \qquad (1)$$

where d_v and d_a are visual and auditory "directions," respectively, m is a slope parameter, and b is an intercept parameter. The log-linear function ensured that corresponding visual and auditory stimuli were (approximately) equally salient. The vision–audition training task was identical to the vision-only training task with the following exception. Instead of only viewing RDKs, subjects both viewed RDKs and heard the corresponding auditory stimuli. They were instructed to focus on the visual motion-direction discrimination task but were also told that the auditory stimulus might be helpful. Half the subjects were run in the "no-switch" condition, which means that the relationship between an auditory cue and a response key was the same on this task as it was on the audition-only task. The remaining subjects were run in the "switch" condition. (In other words, for half the subjects, the stimulus direction of auditory stimulus A was anticlockwise of vertical and the direction of B was clockwise of vertical, whereas this relationship was reversed for the remaining subjects.) This was done so that results on the vision–audition training and test tasks could not be attributed to an association between auditory stimuli and response keys learned when subjects performed the audition-only trials.

Vision–audition test trials were conducted to evaluate whether subjects learned that the auditory signal is correlated with the visual cue to motion direction and, thus, whether it, too, is a cue to motion direction. These test trials were similar to vision–audition training trials with the following differences. First, the presentation order of the standard and comparison was randomized. Subjects were instructed to judge whether the direction of the second stimulus was anticlockwise or clockwise relative to that of the first stimulus. Second, subjects never received feedback. Third, stimuli were selected according to the method of constant stimuli rather than according to a staircase. Importantly, standard stimuli were "cue-conflict" stimuli: the direction of the RDK was vertical, but the direction of the auditory stimulus was offset from vertical by either a value δ or −δ, where δ was set to a subject's 84% correct threshold on the audition-only training trials. In contrast, the comparison stimulus was a "cue-consistent" stimulus. By comparing performances when the auditory signal in the standard had an offset of δ versus −δ, we can evaluate whether this signal influenced subjects' judgments of motion direction.

Subjects performed the four tasks during two experimental sessions. In Session 1, they performed the vision-only and audition-only training tasks. Before performing these tasks, subjects performed a small number of practice trials in which they were given feedback on every trial. They also performed the vision–audition training task twice. In Session 2, they performed the vision–audition training task and then performed the vision–audition test task twice.

Results

Two subjects' results on the vision–audition test task are shown in Figure 1. The horizontal axis of each graph gives the direction of the comparison, whereas the vertical axis gives the probability that the subject judged the direction of the comparison as clockwise relative to that of the standard. The data points indicated by circles or crosses are for the trials in which the auditory signal in the standard was offset from vertical by the amount δ or −δ, respectively. The dotted and solid lines are cumulative Normal distributions fit to these data points using a maximum-likelihood procedure from Wichmann and Hill (2001a).

Figure 1. The data for two subjects from the vision–audition test trials. The horizontal axis of each graph gives the direction of the comparison, whereas the vertical axis gives the probability that the subject judged the direction of the comparison as clockwise relative to that of the standard. The data points indicated by circles or crosses are for the trials in which the auditory signal in the standard was offset from vertical by the amount δ or −δ, respectively. The dotted and solid lines are cumulative Normal distributions fit to these data points using a maximum-likelihood procedure.

To compare a subject's performances when the offset of the auditory signal in the standard was δ versus −δ, we compared a subject's point of subjective equality (PSE) in each case. The PSE is defined as the direction of the comparison at which a subject is equally likely to judge this direction as being anticlockwise or clockwise relative to that of the standard. For example, consider subject BCY whose data are shown in the left graph of Figure 1. This subject's PSE is about −3° in the −δ case and about 2° in the δ case, indicating a PSE shift of about 5°. For each of the subjects whose data are illustrated in Figure 1, their PSE when the offset was −δ is significantly less than their PSE when the offset was δ (both subjects had significant PSE shifts using a significance level of p < .05, where the test of significance is based on a Monte Carlo procedure described by Wichmann & Hill, 2001b). Seven of the eight subjects run in the experiment had significant PSE shifts.

The graph in Figure 2 shows the combined data for all eight subjects, along with the maximum-likelihood psychometric fits for the pooled data. The average value of the offset δ across all subjects was equivalent to a 4.30° rotation in motion direction. Importantly for our purposes,

subjects showed a large shift in their PSE: the PSE shift for the combined data is 3.72° (p < .001). These data suggest that subjects based their judgments on information from both the visual and auditory signals. Had subjects used only the visual signal, we would have expected no shift in their PSEs. Conversely, if subjects had used only the auditory signal, then a PSE shift of 2δ (8.6° on average) would have been expected. The actual PSE shift (3.72° on average) was smaller, consistent with the idea that subjects combined information from the visual and auditory signals.

In summary, the results suggest that subjects acquired a new perceptual cue: they learned that the initially unfamiliar auditory signal was correlated with the visual cue to motion direction and, thus, it, too, is a cue to motion direction. Furthermore, the subjects used the new cue for the purposes of sensory integration: they combined information from the new auditory cue with information from the previously existing visual cue when judging motion direction.

Figure 2. The data from the vision–audition test trials for all eight subjects combined.
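To make the PSE analysis concrete, the sketch below fits a cumulative Normal to each offset condition and compares the resulting PSEs. It uses an ordinary least-squares fit from SciPy rather than the maximum-likelihood fits and Monte Carlo significance test of Wichmann and Hill (2001a, 2001b) used here, and the pooled response proportions are invented for illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(x, pse, sigma):
    # P(respond "clockwise") as a cumulative Normal of comparison direction.
    return norm.cdf(x, loc=pse, scale=sigma)

def fit_pse(directions, p_clockwise):
    # Least-squares fit; the paper used maximum likelihood instead.
    (pse, sigma), _ = curve_fit(psychometric, directions, p_clockwise,
                                p0=[0.0, 3.0])
    return pse

# Hypothetical pooled data: proportion "clockwise" at each comparison
# direction (deg), one array per auditory-offset condition.
dirs = np.array([-9.0, -6.0, -3.0, 0.0, 3.0, 6.0, 9.0])
p_plus = np.array([0.02, 0.08, 0.25, 0.40, 0.70, 0.90, 0.98])   # offset +δ
p_minus = np.array([0.05, 0.15, 0.45, 0.65, 0.85, 0.95, 0.99])  # offset −δ

shift = fit_pse(dirs, p_plus) - fit_pse(dirs, p_minus)
print(f"PSE shift: {shift:.2f} deg")  # a nonzero shift means the signal was used
```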

Experiment 2: Disparity cue to motion direction

Subjects in this experiment were trained to perceive the direction of moving dots when the visual cue to motion direction was correlated with a binocular disparity signal. The experiment examined whether subjects would learn that the disparity signal is also a cue to motion direction. Because the transverse motion of objects in the natural world does not affect the binocular disparities received by observers, we reasoned that subjects in our experiment would not believe that there is a potential dependency between transverse motion and disparity and would, therefore, be unable to learn that the disparity signal is also a cue to motion direction in our novel experimental environment.
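Before turning to the methods of Experiment 2, a brief aside: the auditory continuum of Experiment 1 (also reused in Experiment 5 below) can be sketched in a few lines. The mixing rule with unit-length coefficients follows the Stimuli description above; the band-limited noise synthesis (zeroing an FFT spectrum outside the stated bands) is our assumption about one reasonable implementation, not the authors' actual code.

```python
import numpy as np

FS = 22_000  # sampling rate (Hz), as in the Experiment 1 stimuli

def notched_noise(bands, dur=1.0, seed=0):
    """White noise with energy confined to the given frequency bands,
    built by zeroing the spectrum outside the bands (our construction)."""
    rng = np.random.default_rng(seed)
    n = int(FS * dur)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    keep = np.zeros_like(freqs, dtype=bool)
    for lo, hi in bands:
        keep |= (freqs >= lo) & (freqs <= hi)
    spectrum[~keep] = 0.0
    return np.fft.irfft(spectrum, n)

stim_a = notched_noise([(4000, 5000), (8000, 10000)])
stim_b = notched_noise([(1, 2000), (6000, 7000)])

def intermediate(theta):
    """Mix A and B with unit-length coefficients (cos θ, sin θ): θ = 0 gives A,
    θ = π/2 gives B, and θ = π/4 gives the (1/√2, 1/√2) midpoint."""
    s = np.cos(theta) * stim_a + np.sin(theta) * stim_b
    return s / np.max(np.abs(s))  # normalize to equal maximum amplitude

midpoint = intermediate(np.pi / 4)
```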


Methods

Subjects

Subjects were eight students at the University of Rochester with normal or corrected-to-normal vision. All subjects were naive to the purposes of the study.

Stimuli

Motion stimuli were RDKs identical to those used in Experiment 1, except that, to limit "bleeding" across image frames in the stereo condition, only the red gun of the CRT was used. Stimuli containing binocular disparities were created as follows. Stationary dots were placed at simulated depths (all dots in a given display were at the same depth) ranging from −23 to 23 cm relative to fixation (or from 127 to 173 cm in absolute depth from the observer) and rendered from left-eye and right-eye viewpoints. Left-eye and right-eye images were presented to subjects using LCD shutter glasses (CrystalEyes 3 from Stereographics). Stimuli with both visual motion and disparity signals were created by placing moving dots at simulated depths and rendering the dots from left-eye and right-eye viewpoints.

Procedure

The procedure for Experiment 2 was identical to that for Experiment 1, except that the auditory signal was replaced by the binocular disparity signal. That is, subjects performed motion-only, disparity-only, and motion–disparity training trials, and motion–disparity test trials. For the motion–disparity training trials, stimuli with both motion and disparity signals were constructed as in Experiment 1, by mapping motion direction values onto disparity values based on the motion and disparity discrimination thresholds obtained in the motion-only and disparity-only training trials. The motion–disparity test trials were functionally identical to those of Experiment 1, with δ now representing offsets from the vertical direction in the disparity signal of the standard stimulus.

Results

If a subject had different PSEs when the disparity offset was −δ versus δ, then we can conclude that the subject learned to use the disparity signal as a cue to motion direction. Only one of the eight subjects had significantly different PSEs in the two conditions (at the p < .05 level), suggesting that subjects did not learn to use the disparity signal when judging motion direction.

The data for all subjects are shown in Figure 3. We fit psychometric functions (cumulative Normal distributions) to the combined data from all eight subjects when the offset in the standard was δ (solid line) and when it was −δ (dotted line). The average value across subjects for the offset δ was equivalent to a 4.49° rotation in motion direction. The experimental outcome is that subjects did not learn to use the disparity signal as a cue to motion direction: the 0.04° shift in PSEs when the offset was δ versus −δ was not significantly different from zero at the p < .05 level.

Figure 3. Data from the motion–disparity test trials for all eight subjects combined.

Experiment 3: Brightness cue to motion direction

Subjects in Experiment 3 were trained to perceive the direction of moving dots when the visual cue to motion direction was correlated with a visual brightness signal. The experiment examined whether subjects would learn that the brightness signal too is a cue to motion direction. Because the transverse motion of objects in the natural world does not affect their brightness, we reasoned that subjects do not represent a potential dependency between transverse motion and brightness and would, therefore, be unable to learn that the brightness signal is also a cue to motion direction in our novel experimental environment.

Methods

Subjects

Subjects were eight students at the University of Rochester with normal or corrected-to-normal vision. All subjects were naive to the purposes of the study.

Stimuli

Motion stimuli were RDKs identical to those used in Experiments 1 and 2, except that the individual dots were


assigned a neutral or pedestal brightness value. The brightness stimuli consisted of stationary random-dot images whose dots all shared a common pixel brightness value that ranged from 78 to 250 on a scale of 0–255. The pedestal pixel brightness of 164 had a luminance of 45.0 cd/m². Near this pedestal, luminance values scaled approximately linearly with pixel brightness, with 1 unit of RGB pixel brightness equivalent to 0.786 cd/m². Stimuli with both visual motion and brightness signals were created by assigning brightness pixel values to moving dots.

Procedure

The procedure for Experiment 3 was identical to those for Experiments 1 and 2, except that the auditory or disparity signals were replaced by a brightness signal.

Results

The motion–brightness test trials contained two conditions: the direction of the brightness signal in the standard stimulus was offset from vertical by an amount −δ or δ. If a subject had different PSEs in the two conditions, then we can conclude that the subject learned to use the brightness signal as a cue to motion direction. None of the eight subjects had significantly different PSEs in the two conditions (at the p < .05 level), suggesting that subjects did not learn to use the brightness signal when judging motion direction.

The data for all subjects are illustrated in Figure 4. We fit psychometric functions (cumulative Normal distributions) to the combined data from all eight subjects when the offset in the standard was δ (solid line) and when it was −δ (dotted line). The average value across subjects for the offset δ was equivalent to a 5.55° rotation in motion direction. The 0.70° shift in PSEs in the δ versus −δ cases was not statistically significant at the p < .05 level, indicating that subjects did not learn to use the brightness signal as a cue to motion direction.

Figure 4. Data from the motion–brightness test trials for all eight subjects combined.

Discussion

We hypothesize that our early perceptual systems are capable of learning novel statistical relationships among scene and sensory variables that are already considered to be potentially dependent but that they cannot learn new relationships among scene and sensory variables that are not considered to be potentially dependent, even when placed in novel environments in which these variables are strongly related. Our experiments were designed to evaluate this hypothesis in the context of cue acquisition. Experiments 1, 2, and 3 evaluated whether people could learn new cues to transverse motion (motion in the frontoparallel plane).

In Experiment 1, subjects were exposed to an environment in which visual motion direction was correlated with an auditory signal. Because motion in natural environments often gives rise to both visual and auditory signals, it seems reasonable to assume that people believe that there is a potential dependency between motion direction and an auditory stimulus, and thus, we predicted that subjects would succeed in acquiring a new cue. The experimental results are consistent with this prediction. We can regard Experiment 1 as a control experiment: it establishes that our experimental procedures are adequate for inducing cue acquisition and that our statistical analyses are adequate for detecting this acquisition.

Experiments 2 and 3 exposed subjects to an environment in which visual motion direction was correlated with a binocular disparity signal or a brightness signal, respectively. In contrast to Experiment 1, cue acquisition in these cases requires representing statistical relationships among variables that do not share dependencies in the natural world. Transverse motion in natural environments does not lead to changes in disparity or brightness, and thus, people should not believe that there is a potential dependency between motion direction and disparity or brightness. We predicted that subjects would not acquire new cues to motion direction in these experiments, and the experimental results are consistent with these predictions.

There are at least two alternative explanations of our experimental results, however, that should be considered. First, perhaps there is something idiosyncratic about judgments of transverse motion. If so, one would not expect the experimental results to generalize to other perceptual judgments. Second, Experiment 1, where cue acquisition


was successful, used signals from different sensory modalities, whereas Experiments 2 and 3, where cue acquisition was not successful, used signals from a single modality. Perhaps this difference accounts for the differences in experimental outcomes. Experiments 4 and 5 were designed to evaluate these alternative explanations.

Experiment 4: Disparity cue to light source direction

Subjects in this experiment were trained to perceive the direction of a light source when the visual cue to light source direction (the pattern of shading across the visual objects) was correlated with a visual disparity signal. The experiment examined whether subjects would learn that the disparity signal is also a cue to light source direction. Because the direction of a light source has no effect on the depth of a lit object in the natural world, we reasoned that subjects should not represent a potential dependency between light source direction and disparity. Thus, we predicted that subjects would be unable to learn that the disparity signal is also a cue to light source direction in our novel experimental environment.

Methods

Subjects

Subjects were eight students at the University of Rochester with normal or corrected-to-normal vision. All subjects were naive to the purposes of the study.

Stimuli

Figure 5 depicts the stimuli used in Experiment 4. The shading stimuli consisted of 23 bumps (hemispheres) lying on a common frontoparallel plane whose pattern of shading provided information about the light source direction. Each bump subtended approximately 26 min of visual angle, and the bumps were scattered uniformly within a circular aperture (with a diameter of 6.28°). The light source was rendered as an infinite point source located 45° away from the frontoparallel plane along the z-axis (in the direction of the observer). The angular location of the light source varied from −90° (light coming from the left) to 90° (light coming from the right), with the light source direction in the standard stimulus always set to vertical (0°). In the shading-only training task, subjects viewed the stimuli monocularly with their dominant eyes. In all conditions, the bumps were rendered using only the red gun of the CRT.

The stimuli with binocular disparities were identical to those in the shading-only training task, except that the bumps were rendered from left-eye and right-eye viewpoints with flat lighting so that they appeared as discs of uniform luminance and, as with the static dots in Experiment 2, the discs were placed at simulated depths ranging from −23 to 23 cm relative to the observer (with all discs in a given display lying at a common depth). Stimuli with both shading and disparity signals were created by rendering the shaded bumps at simulated depths. In all tasks, each stimulus was presented for 1 s.

Figure 5. A sample stimulus from Experiment 4. In this example, the bumps are illuminated from the left.

Procedure

The procedure for Experiment 4 was analogous to those for Experiments 1, 2, and 3. We used shading-only and disparity-only training tasks to characterize each subject's performance on lighting direction and depth discrimination tasks, respectively, and then trained subjects during shading–disparity training trials by exposing them to an environment in which disparity was correlated with shading. Finally, we tested subjects during shading–disparity test trials to evaluate whether they had learned that the disparity signal is also a cue to light source direction.

Results

The shading–disparity test trials contained two conditions: the direction of the disparity signal in the standard stimulus was offset from vertical by an amount −δ or δ. If a subject had different PSEs in the two conditions, then we can conclude that the subject learned to use the disparity signal as a cue to light source direction. None of the eight subjects had significantly different PSEs in the two conditions (at the p < .05 level), suggesting that subjects did not learn to use the disparity signal when judging light source direction.

The data for all subjects are illustrated in Figure 6. We fit psychometric functions (cumulative Normal distributions) to the combined data from all eight subjects when the


offset in the standard was δ (solid line) and when it was −δ (dotted line). The average value across subjects for the offset δ was equivalent to a 21.32° rotation in light source direction. The 0.59° shift in PSEs in the δ versus −δ cases was not statistically significant at the p < .05 level, indicating that subjects did not learn to use the disparity signal as a cue to light source direction.

Figure 6. Data from the shading–disparity test trials for all eight subjects combined.

Experiment 5: Auditory cue to light source direction

Subjects in Experiment 5 were trained to perceive the direction of a light source when the visual cue to light source direction (the pattern of shading across the visual objects) was correlated with a dynamic auditory signal. The experiment examined whether subjects would learn that the auditory signal is also a cue to light source direction. Because the direction of a light source has no effect on the motion of an object in the natural world (and thus, no effect on auditory signals), we reasoned that subjects should not represent a dependency between light source direction and the auditory signal. Thus, we predicted that subjects would be unable to learn that the auditory signal is also a cue to light source direction in our novel experimental environment.

Methods

Subjects

Subjects were eight students at the University of Rochester with normal or corrected-to-normal vision. All subjects were naive to the purposes of the study.

Stimuli

Shading stimuli consisted of 23 bumps (hemispheres) lying on a common frontoparallel plane whose pattern of shading provided information about the light source direction. Each bump subtended approximately 26 min of visual angle, and the bumps were scattered uniformly within a circular aperture (with a diameter of 6.28°). The light source was rendered as a diffuse panel source (i.e., as an array of local point sources) located 45° away from the frontoparallel plane along the z-axis (in the direction of the observer) and with its surface normal pointing toward the center of the bump array. The angular location of the light source varied from −90° (light coming from the left) to 90° (light coming from the right), with the light source direction in the standard stimulus always set to vertical (0°). Because we were concerned that subjects might be unable to bind the dynamic auditory signal with static visual stimuli in the combined trials, we changed the visual stimuli by jittering the light source so that the temporal microstructure of the visual stimulus seemed consistent with the dynamic (white noise) auditory signal. One of the point light sources in the panel array was selected at random and turned off in each frame to jitter the stimulus. This resulted in both flicker and positional jitter.

Auditory stimuli used in the auditory-only and shading–auditory trials were identical to those used in Experiment 1.

Procedure

The procedure for Experiment 5 was analogous to those for Experiments 1, 2, 3, and 4. We used shading-only and auditory-only training tasks to characterize each subject's performance on lighting direction and auditory discrimination tasks, respectively, and then trained the subjects during shading–auditory training trials by exposing them to an environment in which an auditory signal was correlated with shading. Finally, we tested subjects during shading–auditory test trials to evaluate whether they had learned that the auditory signal is also a cue to light source direction.

Results

The shading–auditory test trials contained two conditions: the direction of the auditory signal in the standard stimulus was offset from vertical by an amount −δ or δ. If a subject had different PSEs in the two conditions, then we can conclude that the subject learned to use the auditory signal as a cue to light source direction. None of the eight subjects had significantly different PSEs in the two conditions (at the p < .05 level), suggesting that subjects did not learn to use the auditory signal when judging light source direction.

The data for all subjects are illustrated in Figure 7. We fit psychometric functions (cumulative Normal distributions)


to the combined data from all eight subjects when the offset in the standard was δ (solid line) and when it was −δ (dotted line). The average value across subjects for the offset δ was equivalent to a 7.97° rotation in light source direction. The 0.073° shift in PSEs in the δ versus −δ cases was not statistically significant at the p < .05 level, indicating that subjects did not learn to use the auditory signal as a cue to light source direction.

Figure 7. Data from the shading–auditory test trials for all eight subjects combined.

General discussion

Numerous studies have shown that organisms' learning processes are often biased or constrained. This article has demonstrated that, similar to other learning processes, perceptual learning is biased, and has proposed a new constraint on early perceptual learning to account for this bias, namely, that people can modify their knowledge of the prior probabilities of scene variables or of the statistical relationships among scene and perceptual variables that are already considered to be potentially dependent but they cannot learn new relationships among variables that are not considered to be potentially dependent. An important goal of this article is to formalize these ideas using the notation of Bayesian networks and to illustrate how previous studies of early perceptual learning can be viewed as instances of parameter learning.

Bayesian networks are a tool for representing probabilistic knowledge that has been developed in the artificial-intelligence community (Neapolitan, 2004; Pearl, 1988). They have proven useful for modeling many aspects of machine and biological visual perception (e.g., Freeman, Pasztor, & Carmichael, 2000; Kersten, Mamassian, & Yuille, 2004; Kersten & Yuille, 2003; Schrater & Kersten, 2000). The basic idea underlying Bayesian networks is that a joint distribution over a set of random variables can be represented by a directed acyclic graph in which nodes correspond to variables and edges between nodes correspond to direct statistical dependencies among variables. For example, an edge from node x_1 to node x_2 means that the distribution of variable x_2 depends on the value of variable x_1 (as a matter of terminology, node x_1 is referred to as the parent of x_2). A Bayesian network is a representation of the following factorization of a joint distribution:

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{pa}(x_i)), \qquad (2)$$

where P(x_1, …, x_n) is the joint distribution of variables x_1, …, x_n and P(x_i | pa(x_i)) is the conditional distribution of x_i given the values of its parents [if x_i has no parents, then P(x_i | pa(x_i)) = P(x_i)].

As an introduction to Bayesian networks, consider the following scenario: an observer pulls into a parking lot, and as she begins to exit her car, she hears a Rottweiler's bark. Looking in the direction of the bark, she sees a distant Rottweiler. The observer's car is parked close to a building's entrance, and the observer must decide whether to wait for the dog to leave the vicinity or to try to make it to the entrance before encountering the dog. In making this decision, the observer would like to know the following: (1) Is the dog dangerous? and (2) How far away is the dog? To simplify things, assume that the observer has access to only three pieces of information: the loudness of the dog's bark and the size of the dog's image, which are both cues to the distance of the dog, and whether the dog is foaming at the mouth, which lets the observer know whether the dog is rabid and, therefore, dangerous (for simplicity, assume that only rabid dogs are dangerous).

Figure 8 shows a Bayesian network that represents this situation. The variables corresponding to scene properties are located toward the top of this figure, whereas the variables corresponding to percepts are located toward the bottom. Scene variables do not have parents, although they serve as parents to sensory variables as indicated by the arrows. A Bayesian network is a useful representation of the joint distribution of scene and sensory variables because of the way it represents potential dependencies. Although statistical dependency and causality are not equivalent relationships, Bayesian networks are often interpreted as instances of "generative models" whose edges point in the direction of causality. Consider the edges in Figure 8. A change in the physical distance from the dog to the observer causes a change in the perceived size of the dog's image on the observer's retina and in the perceived loudness of the dog's bark at the observer's ear. Likewise, rabies may lead to the observer perceiving the dog to foam at the mouth. These relationships, however, are not deterministic; the perceived size of the dog's


image and the perceived loudness of the dog's bark can also vary due to additional factors that are difficult to measure, such as physical or neural noise. The conditional distributions associated with sensory variables represent these uncertainties.

Figure 8. A simple Bayesian network representing a distant barking dog that may or may not have rabies. The variables corresponding to scene properties are located toward the top of this figure, whereas the variables corresponding to percepts are located toward the bottom. Scene variables do not have parents, although they serve as parents to sensory variables as indicated by the arrows.

Bayesian networks are most useful when they represent relationships among variables in ways that are both sparse and decomposable.4 The structure of the graph in Figure 8 has been greatly simplified using our knowledge about causality, and thus, the graph represents potential relationships among variables in a sparse way. We understand, for example, that knowing that the dog has rabies or is foaming at the mouth tells us nothing about the dog's distance, its retinal image size, or the loudness of its bark. Consequently, there are no edges that link the former and latter variables. It is precisely these sorts of simplifications, or assumed independencies, that make reasoning computationally feasible. For example, an observer reasoning about whether a dog has rabies only needs to consider whether the dog is foaming at the mouth and can ignore the values of all other variables. Bayesian networks also represent relationships in ways that are decomposable. An observer wishing to estimate the distance to a dog based on the dog's retinal image size and the loudness of the dog's bark can do so using Bayes' rule:

$$P(\text{distance to dog} \mid \text{image size}, \text{loudness of bark}) \propto P(\text{image size}, \text{loudness of bark} \mid \text{distance to dog}) \times P(\text{distance to dog}). \qquad (3)$$

This calculation can be simplified by noting that, according to the Bayesian network in Figure 8, the size of the dog's image and the loudness of its bark are conditionally independent given the distance to the dog. Consequently, the joint distribution on the right-hand side of Equation 3 can be factored as follows:

$$P(\text{image size}, \text{loudness of bark} \mid \text{distance to dog}) = P(\text{image size} \mid \text{distance to dog}) \times P(\text{loudness of bark} \mid \text{distance to dog}). \qquad (4)$$

The computational advantages of statistical relationships that are sparse and decomposable are difficult to appreciate in a simple scenario with seven random variables but have enormous importance in real-world situations with hundreds of variables. Indeed, whether reasoning requires the consideration of only a small subset of variables versus the need to take into account all variables or whether reasoning requires the calculation of high-dimensional joint distributions versus low-dimensional distributions are typically the most important factors that make a problem solvable versus unsolvable in practice (Bishop, 1995; Neapolitan, 2004).

Using the notation of Bayesian networks, we can restate our hypothesis about constraints on early perceptual learning. Recall our hypothesis that people's early perceptual processes can modify their knowledge of the prior probabilities of scene properties or of the statistical relationships among scene and sensory variables that are already considered to be potentially dependent. However, they cannot learn new relationships among scene and sensory variables that are not considered to be potentially dependent. In terms of Bayesian networks, our hypothesis states that early perceptual processes can modify their prior probability distributions for scene variables or their conditional probability distributions specifying how sensory variables depend on scene variables. However, they cannot add new nodes or new edges between scene and sensory variables in their graphical representation. In the machine


learning literature, researchers make a distinction between "parameter learning," which means learning the prior and conditional probability distributions, and "structure learning," which means learning the nodes and edges of the graphical structure. Using the terminology of this literature, our hypothesis states that early perceptual processes are capable of parameter learning, but they are not capable of structure learning.5 Interestingly, parameter learning is often thought to be computationally feasible: the machine learning literature contains several maximum-likelihood and Bayesian algorithms that often work well in practice. In contrast, structure learning is widely regarded as intractable: there are currently no general-purpose algorithms for structure learning that work well on moderate- or large-sized problems (Rish, 2000).6

Figure 9. A Bayesian network depicting the dependence assumptions underlying perceptual judgments in tasks that require observers to make a fine discrimination along a simple perceptual dimension.
Our hypothesis can be divided into two parts: (i) early these distributions because these variances suggest how
perceptual processes are capable of parameter learning, and reliable or informative each feature is in indicating the
(ii) early perceptual processes are not capable of structure learning. To our knowledge, there are no demonstrations in the scientific literature of structure learning by early perceptual processes. That is, we believe that all examples of early perceptual learning are demonstrations of parameter learning. To illustrate this point, we review several classes of early perceptual learning phenomena.

Many studies on perceptual learning report the results of experiments in which observers show improved performance on tasks requiring fine discriminations along simple perceptual dimensions. For example, observers might improve at tasks that require the discrimination of motion directions of single dots (Matthews & Welch, 1997) or fields of dots (Ball & Sekuler, 1987), of orientations of line segments (Matthews & Welch, 1997), of spatial frequencies within plaid patterns (Fine & Jacobs, 2000), or of offsets between nearly collinear line segments (Fahle, Edelman, & Poggio, 1995). Often, the learning demonstrated using such tasks is stimulus specific in the sense that the learning fails to generalize to novel versions of the task using different stimulus positions or configurations.

Figure 9 shows a Bayesian network depicting the dependence assumptions that might underlie performance on a task that requires observers to make fine discriminations along a simple perceptual dimension. In this figure, the physical scene gives rise to a set of conditionally independent perceptual features that the observer uses to make decisions regarding the value of some scene property. An account of how an observer improves at estimating the scene property based on the perceptual features is as follows. The process of learning consists of improving the estimates of the relationships between the scene property and the values of each of the perceptual features. In terms of Bayesian networks, learning consists of improving the estimates of the conditional distributions associated with the perceptual variables, that is, the distributions of the values of the features given a value (or an estimated value) of the scene property: P(feature_i | scene = x) for i = 1, …, N. Of particular interest are the variances of these distributions for a given value of the scene property. Features whose values have a large variance for a fixed value of the scene property are relatively unreliable or uninformative. In contrast, features whose values have a small variance for a fixed scene value are more reliable. More accurate estimates of the conditional distributions and, thus, their variances allow an observer to more accurately weight features according to their relative reliabilities, effectively placing greater weight on more reliable features and lesser weight on less reliable features. Most important, for our purposes, observers' improved performances can be accounted for solely based on parameter learning; structure learning is not required.

Ball and Sekuler (1987) reported an experiment in which observers improved at discriminating motion directions in RDKs centered at a training direction but did not show improved performance when tested with motion directions centered at a novel orthogonal direction. An account of this finding based on Bayesian networks is as follows. Consider a network in which the parent node corresponds to a scene variable representing the direction of motion in the scene and in which the N child nodes correspond to sensory feature detectors sensitive to image motion in N different directions. When observers view a display of a kinematogram containing a motion direction near the training direction, they first estimate this direction based on the values of their feature detectors. They then adapt their estimates of the conditional distributions of the values of the feature detectors given the estimated value of the motion direction. Over the course of many trials, observers learn which feature detectors have a small variance for a given value of the motion direction, which means that they learn which feature detectors are most reliable for judging motion direction for directions near the training direction.7 Because observers do not modify their distributions that are conditioned on other motion directions, they do not show improved performance when tested with kinematograms whose motion directions are centered at a novel orthogonal direction.
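The following simulation sketches this account (our construction; the tuning function, learning rate, and noise levels are illustrative assumptions, not values from Ball and Sekuler's experiment). On each trial the observer first estimates the direction from its current variance estimates and then updates the variance of P(feature_i | scene = estimated direction) for every detector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N detectors with known tuning curves but unknown
# noise variances; detectors near 45 deg are (unknown to the observer)
# the most reliable ones.
N = 12
pref = np.linspace(0.0, 180.0, N, endpoint=False)      # preferred directions (deg)
true_var = (0.2 + 0.02 * np.abs(pref - 45.0)) ** 2

def tuning(d):
    # Assumed mean response of each detector to direction d.
    return np.cos(np.deg2rad(2.0 * (pref - d)))

def estimate_direction(r, var_hat):
    # Weighted template match: detectors with small estimated variance count more.
    grid = np.linspace(30.0, 60.0, 121)
    errs = [np.sum((r - tuning(d)) ** 2 / var_hat) for d in grid]
    return grid[int(np.argmin(errs))]

var_hat = np.full(N, 1.0)                              # initial variance estimates
for _ in range(2000):
    d_true = rng.normal(45.0, 3.0)                     # training directions near 45 deg
    r = tuning(d_true) + np.sqrt(true_var) * rng.standard_normal(N)
    d_hat = estimate_direction(r, var_hat)             # estimate scene value first...
    resid = r - tuning(d_hat)
    var_hat += 0.01 * (resid ** 2 - var_hat)           # ...then update conditional variances

# Inverse variances now favor the detectors that are truly reliable near the
# trained direction; distributions conditioned on other directions were never
# updated, so no transfer to an orthogonal direction is predicted.
print(np.round(1.0 / var_hat, 1))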

A demonstration that observers learn to weight features according to their relative reliabilities in a manner consistent with the account of learning described above was provided by Gold, Sekuler, and Bennett (2004). These researchers examined observers' perceptual representations through the construction of "classification images." Briefly, classification images are created by correlating image feature values (e.g., pixel luminances) with the values of a scene property. Image features that vary reliably with the scene property take on extreme values in the classification image, whereas unreliable features take on values near zero. Classification images are often used in the context of perceptual classification. An ideal classification image for a set of stimuli is constructed by correlating the feature values of each stimulus with its correct classification; classification images for individual observers can be created by correlating the feature values of each stimulus with the classification indicated by the observer. For difficult tasks, the ideal classification images tend to be relatively sparse, with few reliable features for discriminating between stimulus classes. Our account of learning described above predicts that naive observers should initially use a large set of features when discriminating stimuli and then gradually reduce the influence of many features as they discover which features are most reliable for the task. Gold et al.'s results suggest that, during the course of learning, observers' classification images move toward the ideal classification image in exactly this manner, with observers incrementally basing their decisions on a smaller, more reliable subset of the available features.
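For readers unfamiliar with the technique, the following toy computation (our construction, not Gold et al.'s stimuli or analysis) shows how a classification image is obtained by correlating pixel values with an observer's responses:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stimuli: two classes defined by a small set of informative pixels,
# embedded in pixel noise.
n_trials, n_pix = 5000, 64
template = np.zeros(n_pix)
template[20:28] = 1.0                                   # pixels that carry the signal
labels = rng.integers(0, 2, n_trials)                   # stimulus class per trial
stimuli = np.outer(2 * labels - 1, template) + rng.standard_normal((n_trials, n_pix))

# Simulated observer: bases decisions on only a few of the signal pixels.
weights = np.zeros(n_pix)
weights[22:26] = 1.0
responses = (stimuli @ weights + rng.standard_normal(n_trials) > 0).astype(float)

# Classification image: per-pixel correlation between stimulus and response.
# Pixels the observer actually uses take extreme values; unused pixels sit near zero.
ci = np.array([np.corrcoef(stimuli[:, i], responses)[0, 1] for i in range(n_pix)])
print(np.round(ci[18:30], 2))
```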
A second class of phenomena studied in the perceptual learning literature is the acquisition of new cue combination rules. Several studies have found that observers modify how they combine information from two visual cues when one of the cues is made less reliable (Atkins, Fiser, & Jacobs, 2001; Ernst, Banks, & Bülthoff, 2000; Jacobs & Fine, 1999). Ernst et al. (2000), for example, placed observers in an environment in which the slant of a surface indicated by a haptic cue was correlated with the slant indicated by one visual cue but uncorrelated with the slant indicated by another visual cue (this slant value varied randomly over trials). Observers adapted their visual cue combination rules to place more weight on the information derived from the visual cue that was consistent with haptics and less weight on the information derived from the other visual cue. This class of learning phenomena can be regarded as conceptually equivalent to the first class of phenomena described above in which observers modify how they combine information from multiple feature detectors. Consequently, our account of learning for this second class is very similar to our account for the first class.

Figure 10. A Bayesian network representing the type of modification that might underlie the acquisition of new cue combination rules. Cue 1 represents a cue whose reliability is fixed, whereas Cue 2 represents a cue that has become less reliable. The solid black curves represent the final conditional cue distributions given a value (or estimated value) of the scene property. The dashed gray curve represents the conditional distribution for Cue 2 before learning.

The top node of the Bayesian network in Figure 10 represents a scene variable, such as the slant of a surface, whereas the two child nodes represent corresponding sensory variables based on two perceptual cues, such as slant from visual stereo and slant from visual texture. Imagine that both cues are normally good indicators of the scene property, but the observer is placed in a novel environment in which Cue 1 is reliable but Cue 2 is not [e.g., the variance of P(Cue 1 | scene = x) is small, whereas the variance of P(Cue 2 | scene = x) is large]. An account of how an observer improves at estimating the scene property based on the two perceptual cues is as follows. On each trial of an experiment, observers first estimate the value of the scene property based on the values of all sensory variables. They then improve the estimates of the relationships between the scene property and the values of each of the cues; that is, observers modify their estimates of P(Cue 1 | scene = x) and P(Cue 2 | scene = x), where x is the estimated value of the scene property. More accurate estimates of these distributions, particularly their variances, allow an observer to more accurately weight cues according to their relative reliabilities, effectively placing greater weight on more reliable cues and lesser weight on less reliable cues. As above, observers' improved performances can be accounted for solely based on parameter learning; structure learning is not required.
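This account amounts to inverse-variance weighting with continually re-estimated variances. The following sketch (our construction; the learning rate and noise levels are arbitrary) illustrates how the weight on a newly unreliable cue falls as its estimated conditional variance grows:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two cues to surface slant. In the novel environment, Cue 2 has become
# noisy/decorrelated, but the observer starts out treating both cues as
# equally reliable.
var_hat = np.array([4.0, 4.0])        # initial variance estimates for Cues 1 and 2
true_sd = np.array([1.0, 8.0])        # Cue 2 is in fact unreliable here

for trial in range(3000):
    slant = rng.uniform(-20.0, 20.0)
    cues = slant + true_sd * rng.standard_normal(2)
    w = (1.0 / var_hat) / np.sum(1.0 / var_hat)              # reliability weights
    slant_hat = np.sum(w * cues)                             # combined estimate
    var_hat += 0.005 * ((cues - slant_hat) ** 2 - var_hat)   # update P(cue_i | scene)

print(np.round(w, 2))   # weight has shifted toward the cue that stayed reliable
```

Note that only the parameters of the existing edges change; the observer never needs to add or remove an edge to reproduce this pattern of adaptation.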

A third class of perceptual learning phenomena is often referred to as "cue recalibration" (e.g., Atkins, Jacobs, & Knill, 2003; Bedford, 1993; Epstein, 1975; Harris, 1965; Mather & Lackner, 1981; Welch, 1986). For example, an observer may wear prisms that shift the visual world 10° to the right. As a result, objects visually appear at locations 10° to the right of the locations indicated by other sensory signals. Over time, observers notice this discrepancy and recalibrate their interpretations of the visual cue so that the visual location is more consistent with the locations indicated by other sensory cues.

Using our Bayesian network framework, we hypothesize that observers first estimate the value of the scene property (top node of Figure 11) based on the values of all sensory cues (bottom nodes of Figure 11). They then modify their estimates of the conditional distributions associated with the sensory variables: P(Cue 1 | scene = x) and P(Cue 2 | scene = x), where x is the estimated value of the scene property. Unlike the case of learning new cue combination rules, the modification is not primarily to the estimate of the variance of the distribution associated with a newly unreliable cue. Rather, it is to the estimate of the mean of the distribution associated with a newly uncalibrated cue (due, perhaps, to the shift in the visual world caused by prisms). As before, observers' improved performances can be accounted for solely based on parameter learning.

Figure 11. A Bayesian network representing the type of modification that might underlie perceptual recalibration. Cue 1 represents an accurate and low-variance cue, whereas Cue 2 represents a low-variance cue whose estimates are no longer accurate. The solid curves represent the final conditional cue distributions for a particular value of the scene variable. The dashed gray curve represents the conditional distribution for Cue 2 before learning.
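To make the recalibration account concrete, here is a minimal simulation (our construction; parameter values are arbitrary): the visual cue acquires a constant offset, and the observer updates the estimated mean, rather than the variance, of that cue's conditional distribution.

```python
import numpy as np

rng = np.random.default_rng(3)

# Prisms add a constant 10 deg offset to the visual cue. The observer
# tracks an estimate of that offset, i.e., the mean of P(visual | scene).
offset_true = 10.0          # shift introduced by the prisms (deg)
bias_hat = 0.0              # observer's current estimate of the visual offset

for trial in range(500):
    location = rng.uniform(-30.0, 30.0)
    visual = location + offset_true + rng.standard_normal()    # shifted visual cue
    other = location + rng.standard_normal()                   # e.g., proprioception
    # The discrepancy between the corrected visual estimate and the other
    # cue drives a small update of the estimated mean of the visual distribution.
    discrepancy = (visual - bias_hat) - other
    bias_hat += 0.05 * discrepancy

print(round(bias_hat, 1))   # approaches 10: the visual cue has been recalibrated
```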
The last class of perceptual learning phenomena that we consider here is the acquisition of visual priors. For example, consider observers viewing displays of circular patches that are lighter toward their top and darker toward their bottom. These displays are consistent with a bump that is lit from above or a dimple that is lit from below. Observers tend to assume that the light source is above a scene and, thus, prefer to interpret the object as a bump. Observers in an experiment by Adams, Graf, and Ernst (2004) viewed objects whose shapes were visually ambiguous and also touched these objects, thereby obtaining haptic information disambiguating the objects' shapes. The shape information obtained from haptics was consistent with an interpretation of the visual display in which the estimated light source location was offset from its expected location based on observers' prior probability distributions of the light source's location. Adams et al. found that observers modified their prior distributions to reduce the discrepancy between estimated and expected light source locations.

Figure 12. A Bayesian network characterizing subjects' modifications of their prior distribution of the light source location in the experiment reported by Adams et al. (2004).

The Bayesian network in Figure 12 has two scene variables, corresponding to the object's shape and the light source's location, and a sensory variable, corresponding to the perceived visual shape of the object. Our account of learning in this setting is as follows. Based on an unambiguous haptic percept (not shown in Figure 12) and the ambiguous visual percept, observers estimate the object's shape. Based on this shape and the perceived visual shape, observers then estimate the light source's location. Learning occurs due to the discrepancy between the estimated location of the light source and the expected location based on observers' prior probability distribution of this location. To reduce this discrepancy, observers modify their prior distribution appropriately. Thus, as in the other classes of learning phenomena reviewed above, the acquisition of prior probability distributions can be accounted for through parameter learning and does not require structure learning.

We have reviewed four classes of early perceptual learning phenomena and outlined how they can be accounted for solely through parameter learning. We hypothesize that all early perceptual learning is parameter learning; that is, all early perceptual learning involves the modification of knowledge of the prior probabilities of scene properties or of the statistical relationships among scene and sensory variables that are already considered to be potentially dependent. Conversely, we hypothesize that early perceptual learning processes are biased or constrained such that they are incapable of structure learning
(the addition of new nodes or new edges between scene and sensory variables), which means that these processes cannot learn new relationships among scene and sensory variables that are not considered to be potentially dependent.
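The contrast between the two kinds of learning can be summarized in a few lines of Python (a schematic of our hypothesis, not a model fitted to data): experience re-estimates the conditional distributions attached to existing edges, while the edge set itself never changes.

```python
import numpy as np

# Fixed structure: which sensory variables receive an edge from the scene
# variable is given in advance and is never modified by experience.
edges = {"scene": ["cue_A", "cue_B"]}
counts = {cue: np.ones((2, 2)) for cue in edges["scene"]}   # CPT counts [scene, cue]

def parameter_learning(scene_value, observation):
    """Update the CPT counts of variables already linked to the scene."""
    for cue in edges["scene"]:
        counts[cue][scene_value, observation[cue]] += 1

def cpt(cue):
    """Current estimate of P(cue | scene), row-normalized."""
    c = counts[cue]
    return c / c.sum(axis=1, keepdims=True)

# A sensory variable with no edge from the scene (e.g., an auditory signal
# in a novel environment) is simply never updated, however strong the
# contingency in the data: adding the missing edge would be structure
# learning, which the hypothesis rules out for early perception.
parameter_learning(scene_value=1, observation={"cue_A": 1, "cue_B": 0})
print(cpt("cue_A"))
```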
In the experimental part of this article, we reported the results of five experiments that evaluate whether subjects can demonstrate cue acquisition. Figures 13 and 14 illustrate the relationships between scene and sensory variables in Experiments 1, 2, and 3 and Experiments 4 and 5, respectively, in terms of Bayesian networks. Here, the solid black edges represent dependencies that exist in the natural world, whereas the dashed gray edges represent dependencies that do not exist in the natural world but that we introduced in our novel experimental environments. For the reasons outlined in the experimental sections, we expected that observers started our experiments with the belief that variables that are connected by a black edge are potentially dependent, whereas variables that are connected by a gray dashed edge are not.

Figure 13. A Bayesian network representing the statistical relationships studied in Experiments 1, 2, and 3. The solid black edges represent dependencies that exist in the natural world, whereas dashed gray edges represent dependencies that do not exist in the natural world but that we introduced in our novel experimental environments. We expect that observers started our experiments with the belief that variables connected by a black edge are potentially dependent, whereas variables connected by a gray edge are not.

Figure 14. A Bayesian network representing the statistical relationships studied in Experiments 4 and 5. The solid black lines represent preexisting edges (conditional dependencies that exist in the natural world), whereas the dashed gray lines represent conditional dependencies that do not exist in the natural world but that we introduced in our novel experimental environments.

In Experiment 1, subjects were placed in a novel environment that resembled natural environments in the sense that it contained systematic relationships among scene and perceptual variables that are normally dependent. In this case, cue acquisition requires parameter learning and, as predicted, subjects succeeded in learning a new cue. In Experiments 2, 3, 4, and 5, subjects were placed in novel environments that did not resemble natural environments: they contained systematic relationships among scene and perceptual variables that are not normally dependent. Cue acquisition requires structure learning in these cases. Consistent with our hypothesis, subjects failed to learn new cues in Experiments 2, 3, 4, and 5. Taken as a whole, our hypothesis provides a good account of the pattern of experimental results reported here. That is, it explains why people learn in some situations and fail to learn in other situations.

In addition to providing an account of experimental data, our hypothesis also has the property of being motivated by computational considerations. As discussed above, machine learning researchers have found that parameter learning in networks with sparse connectivity is a comparatively easy problem, whereas structure learning is typically intractable. Thus, there are good computational reasons why early perceptual learning processes might be constrained in the ways hypothesized here.
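To make "typically intractable" concrete, consider merely enumerating candidate structures. The number a(n) of directed acyclic graphs on n labeled nodes satisfies a classic recursion (Robinson's formula; our addition, not cited in the original):

```latex
a(n) \;=\; \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \, 2^{\,k(n-k)} \, a(n-k), \qquad a(0) = 1,
```

which grows superexponentially: a(1) = 1, a(2) = 3, a(3) = 25, a(4) = 543, and a(10) ≈ 4.2 × 10^18. Exhaustive search over structures is therefore hopeless for all but the smallest networks, whereas re-estimating the parameters of a fixed, sparse structure remains cheap.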
Our theory is limited to early perceptual learning and is not intended to be applied to late perceptual or cognitive learning. This point can be demonstrated in at least two ways. First, it seems reasonable to believe that learning to visually recognize an object involves structure learning. Gauthier and Tarr (1997), for example, trained subjects to visually recognize objects referred to as "greebles." A plausible account of what happens when a person learns to visually recognize a novel object as the greeble named "pimo" is that the person adds a new node (along with new edges) to his or her Bayesian network representation that corresponds to this newly familiar object. If this speculation is correct, then it raises the question of why structure learning is computationally feasible for late perceptual learning but intractable for early perceptual learning. It may be that structure learning of higher level knowledge becomes feasible when a preexisting structure representing lower level knowledge is already in place.


Second, there have been several demonstrations of "contextually dependent" perceptual learning that, we conjecture, may be accounted for via late perceptual learning processes performing structure learning. Atkins et al. (2001), for example, trained subjects to combine depth information from visual motion and texture cues in one way when the texture elements of an object were red and to combine information from these cues in a different way when the elements were blue. In other words, the discrete color of the elements signaled which of two contexts subjects were currently in, and these two contexts required subjects to use different cue combination rules to improve their performance on an experimental task. Because there is no systematic relationship between color and cue combination rule in natural environments, people should not believe that color and cue combination rule are potentially dependent variables, which means that the type of learning demonstrated here would require structure learning. Related results in the domain of cue acquisition have recently been reported by Haijiang et al. (2006). We speculate that this type of contextually dependent perceptual learning is due to learning processes that operate at a higher level than the processes that we have focused on in this article.8

We have described a hypothesis about constraints on early perceptual learning. Admittedly, the hypothesis is speculative. Although the data favoring the hypothesis are currently sparse, its advantages include the following: it accounts for an important subset of data about perceptual learning that would otherwise be confusing; it uses a Bayesian network formulation that is well specified (and, thus, falsifiable) and mathematically rigorous; and it leads to several theoretically interesting and important research questions. The hypothesis is meant to deal with perceptual learning in general, although the experiments in this article have focused on the predictions of our hypothesis for perceptual cue acquisition. For us, this seems to be a natural place to begin because the hypothesis's predictions with respect to cue acquisition are straightforward. Future work will focus on delineating and testing the hypothesis's predictions on other perceptual learning tasks. A primary challenge of such work will lie in developing efficient experimental methods for detecting changes (or the lack thereof) in observers' representations of variable dependencies.

Acknowledgments

We thank the reviewers for helpful comments on an earlier version of this manuscript. This work was supported by NIH research grant R01-EY13149.

Commercial relationships: none.
Corresponding author: Robert A. Jacobs.
Email: [email protected].
Address: 416 Meliora Hall, Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY 14627-0268.

Footnotes

1. We use the terms "low-level perception," "early perception," or "early perceptual learning" in the same ways as many other researchers in the perceptual sciences literature (see Fahle & Poggio, 2002; Gilbert, 1994, for reviews). Although the exact meanings of these terms can be fuzzy (for example, the boundary between early versus late perception is not completely understood), investigators have found these terms to be highly useful.

2. The visual and auditory stimuli in this experiment are analogous to small particles (e.g., sand grains) moving across a surface with an anisotropic texture. The sound produced by such a stimulus depends on the properties of the surface texture, and, as in the current experiment, the anisotropic surface texture would cause changes in the mean direction of the moving particles to lead to systematic changes in spectral properties of the resulting sound.

3. A reader may wonder why we did not use more traditional auditory motion stimuli with interaural phase and intensity differences. There are two reasons for this. First, we wanted to make the auditory signal ambiguous; setting up systematic interaural differences would bias observers to perceive the auditory stimulus as indicating a particular motion direction. Second, the visual stimuli used in the current experiment represent field motion and not object motion (i.e., the mean velocity of dot fields in our RDKs varies between stimuli, whereas the mean position of these dots remains constant). Interaural phase and intensity differences result from changes in the position of an object, or in the mean position of a group of objects. Because the mean positions of our visual stimuli remain constant, interaural differences are inappropriate for representing their motion.

4. A reader might wonder why a fully connected Bayesian network (i.e., one in which all scene variables connect to all sensory variables) is not always used. An advantage of such a network is that it could represent any relationship between scene and sensory variables. Unfortunately, as mentioned in this article and detailed in the literature on machine learning, there is a price to pay for such representational richness: inference and learning in fully connected networks are prohibitively expensive in terms of computation. In fact, inference and learning are computationally feasible only in networks with sparse connectivity. Similarly, a reader might wonder why a network that initially contains no connections but in which connections are added over time as needed is not always used. As before, this type of "structure learning" is prohibitively expensive from a computational viewpoint.
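As a back-of-the-envelope illustration of the point in Footnote 4 (our arithmetic, not the authors'): in a network of binary variables, a sensory node with k scene parents carries a conditional probability table with one row per parent configuration, so the number of parameters per node grows exponentially with its in-degree:

```latex
\underbrace{2^{k}}_{\text{parent configurations}} \text{ rows per node:} \qquad
k = 2 \;\Rightarrow\; 4 \text{ rows}, \qquad
k = 20 \;\Rightarrow\; 2^{20} \approx 10^{6} \text{ rows}.
```

A sparse network that bounds each node's in-degree therefore keeps both the number of parameters and the clique sizes used by inference small, independent of the network's overall size.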

5. If people's early perceptual knowledge can be characterized by Bayesian networks but the structures of these networks are not learned, then this raises the question of where these structures come from. We speculate that people's network structures are innate, resulting from our evolutionary history in an environment with stationary physical laws. If this "strong" view is not strictly correct, we would not be uncomfortable with a "weaker" view in which the structures are fixed in adults but in which structural learning takes place during infancy or early childhood.

6. As a technical detail, it is worth noting that both parameter and structure learning in Bayesian networks are often regarded as NP-hard problems (Cooper, 1990; Jordan & Weiss, 2002). To illustrate why parameter learning is regarded as NP-hard, one must keep in mind that inference (determining the conditional distribution of some variables given the values of other variables) is often a subproblem of parameter learning (e.g., the E-step of the Expectation–Maximization [EM] algorithm requires inference). In the machine learning community, inference is typically performed using the junction-tree algorithm. The computational complexity of this algorithm is a function of the size of the cliques upon which message-passing operations are performed. Unfortunately, summing a clique potential is exponential in the number of nodes in the clique. This fact motivates the need to use Bayesian networks with sparse connectivity. Because such networks tend to minimize clique sizes, inference (and, thus, parameter learning) in these networks is often feasible. To illustrate why structure learning is regarded as NP-hard, one must keep in mind that structure learning is typically posed as a model selection problem within a hierarchical Bayesian framework. The top level of the hierarchy includes binary variables {M_i, i = 1, …, K}, where M_i indicates whether model i is the "correct" model and the number of such models K is superexponential in the number of nodes in the network. The middle level includes real-valued variables {θ_i, i = 1, …, N}, where θ_i is the set of parameters for model i. The bottom level is the data, denoted D. The likelihood for model i, P(D | M_i), is computed as follows: P(D | M_i) = ∫ P(D | θ_i) P(θ_i | M_i) dθ_i. Note that P(D | θ_i) is the likelihood for the parameters θ_i (used during parameter learning). Also note that this integral is typically not analytically solvable.

7. Readers familiar with the machine learning literature will recognize that we are conjecturing that the observer's learning rule resembles an EM algorithm, an algorithm for maximizing likelihood functions. At first blush, it may seem that the observer is faced with a "chicken-and-egg" problem: Observers first use their feature detectors to estimate the motion direction and then use the estimated motion direction to determine the most reliable features. The EM algorithm is often used to solve such chicken-and-egg problems (see Dempster, Laird, & Rubin, 1977).

8. Interestingly, contextually dependent learning is often regarded as different from most other forms of learning. For example, researchers in the animal learning theory community distinguish standard forms of learning, which they refer to as associative learning, from contextually dependent learning, which they refer to as occasion setting (Schmajuk & Holland, 1998).

References

Adams, W. J., Graf, E. W., & Ernst, M. O. (2004). Experience can change the "light-from-above" prior. Nature Neuroscience, 7, 1057–1058. [PubMed] [Article]

Atkins, J. E., Fiser, J., & Jacobs, R. A. (2001). Experience-dependent visual cue integration based on consistencies between visual and haptic percepts. Vision Research, 41, 449–461. [PubMed]

Atkins, J. E., Jacobs, R. A., & Knill, D. C. (2003). Experience-dependent visual cue recalibration based on discrepancies between visual and haptic percepts. Vision Research, 43, 2603–2613. [PubMed]

Ball, K., & Sekuler, R. (1987). Direction-specific improvement in motion discrimination. Vision Research, 27, 953–965. [PubMed]

Bedford, F. (1993). Perceptual learning. Psychology of Learning and Motivation, 30, 1–60.

Bishop, C. M. (1995). Probability density estimation. In Neural networks for pattern recognition (chap. 2, pp. 33–76). New York, NY: Oxford University Press.

Brunswik, E. (1956). Perception and the representative design of psychological experiments. Berkeley, CA: University of California Press.

Cooper, G. F. (1990). The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42, 393–405.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.

Epstein, W. (1975). Recalibration by pairing: A process of perceptual learning. Perception, 4, 59–72. [PubMed]

Ernst, M. O., Banks, M. S., & Bülthoff, H. H. (2000). Touch can change visual slant perception. Nature Neuroscience, 3, 69–73. [PubMed] [Article]

Fahle, M., Edelman, S., & Poggio, T. (1995). Fast perceptual learning in hyperacuity. Vision Research, 35, 3003–3013. [PubMed]

Fahle, M., & Poggio, T. (2002). Perceptual learning. Cambridge, MA: MIT Press.

Fine, I., & Jacobs, R. A. (2000). Perceptual learning for a pattern discrimination task. Vision Research, 40, 3209–3230. [PubMed]

Freeman, W. T., Pasztor, E. C., & Carmichael, O. T. (2000). Learning low-level vision. International Journal of Computer Vision, 40, 25–47.

Garcia, J., & Koelling, R. A. (1966). The relation of cue to consequence in avoidance learning. Psychonomic Science, 4, 123–124.

Gauthier, I., & Tarr, M. J. (1997). Becoming a "Greeble" expert: Exploring mechanisms for face recognition. Vision Research, 37, 1673–1682. [PubMed]

Gilbert, C. D. (1994). Early perceptual learning. Proceedings of the National Academy of Sciences of the United States of America, 91, 1195–1197. [PubMed] [Article]

Gold, J. M., Sekuler, A. B., & Bennett, P. J. (2004). Characterizing perceptual learning with external noise. Cognitive Science, 28, 167–207.

Haijiang, Q., Saunders, J. A., Stone, R. W., & Backus, B. T. (2006). Demonstration of cue recruitment: Change in visual appearance by means of Pavlovian conditioning. Proceedings of the National Academy of Sciences of the United States of America, 103, 483–488. [PubMed] [Article]

Harris, C. S. (1965). Perceptual adaptation to inverted, reversed, and displaced vision. Psychological Review, 72, 419–444. [PubMed]

Jacobs, R. A., & Fine, I. (1999). Experience-dependent integration of texture and motion cues to depth. Vision Research, 39, 4062–4075. [PubMed]

Jordan, M. I., & Weiss, Y. (2002). Graphical models: Probabilistic inference. In M. Arbib (Ed.), The handbook of brain theory and neural networks (2nd ed.). Cambridge, MA: MIT Press.

Kersten, D., Mamassian, P., & Yuille, A. (2004). Object perception as Bayesian inference. Annual Review of Psychology, 55, 271–304. [PubMed]

Kersten, D., & Yuille, A. (2003). Bayesian models of object perception. Current Opinion in Neurobiology, 13, 150–158. [PubMed]

Mather, J., & Lackner, J. R. (1981). Adaptation to visual displacement: Contribution of proprioceptive, visual, and attentional factors. Perception, 10, 367–374. [PubMed]

Matthews, N., & Welch, L. (1997). Velocity-dependent improvements in single-dot direction discrimination. Perception & Psychophysics, 59, 60–72. [PubMed]

Neapolitan, R. E. (2004). Learning Bayesian networks. Upper Saddle River, NJ: Pearson Education.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Mateo, CA: Morgan Kaufmann.

Rish, I. (2000). Advances in Bayesian learning. Proceedings of the 2000 International Conference on Artificial Intelligence. CSREA Press.

Saffran, J. R. (2002). Constraints on statistical language learning. Journal of Memory and Language, 47, 172–196.

Schmajuk, N. A., & Holland, P. C. (1998). Occasion setting. Washington, DC: American Psychological Association.

Schrater, P. R., & Kersten, D. (2000). How optimal depth cue integration depends on the task. International Journal of Computer Vision, 40, 71–89.

Wallach, H. (1985). Learned stimulation in space and motion perception. American Psychologist, 40, 399–404. [PubMed]

Welch, R. B. (1986). Adaptation of space perception. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance (Vol. 1, pp. 1–45). New York: Wiley-Interscience.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63, 1293–1313. [PubMed] [Article]

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314–1329. [PubMed] [Article]
