Proceedings of the 2002 International Conference on Auditory Display, Kyoto, Japan, July 2-5, 2002



Lauri Savioja and Tapio Lokki
Telecommunications Software and Multimedia Laboratory
Helsinki University of Technology
P.O.Box 5400, FIN-02015 HUT, Finland
[email protected]
[email protected]

Jyri Huopaniemi
Nokia Research Center
Speech and Audio Systems Lab.
P.O.Box 407
FIN-00045 Nokia Group, Finland
[email protected]

ABSTRACT

The primary goal of this paper is to give a general view of room acoustic modeling and
auralization, and especially to describe the current status of the DIVA auralization system.
We have been building the system for several years, and it has evolved considerably during
that time. It is a room acoustic modeling and auralization system suitable for both real-time
and non-realtime acoustic rendering, and it is designed for research purposes. It applies the
parametric room impulse response rendering technique described in the article. In this paper
we review the architecture and design principles of the system. A description of recent
advances is given and results of perceptual evaluations are presented.

The research problems related to the design of an auralization¹ system are illustrated in
Fig. 1. The field is multidisciplinary, and thus the implementation of the system requires
understanding and knowledge of room acoustics, digital signal processing, and psychoacoustics.
On a more general level the research problem in auralization is to model and simulate the
sound propagation from sound sources to the ear drums of a listener through the modeled space.
The research problems related to the design of an auralization system lie in the areas of room
acoustic modeling, digital filter design, and 3D sound reproduction.

[Figure 1 diagram:
Research areas             | Research problems                    | Goal
Room acoustics             | Room acoustic modeling               | Authentic auralization
Digital signal processing  | Perceptually optimized filter design |
Psychoacoustics            | 3D sound reproduction                |]

Figure 1: The research areas and problems involved in the design of an auralization system.

There are two main approaches in auralization. In the perceptual approach the main emphasis is
on the perceived sound, whereas in the physics-based approach the physical behavior of sound
is simulated. The perceptual approach is computationally less expensive and is therefore
widely applied in the entertainment industry; for example, modern PC sound cards with 3-D
sound capabilities utilize this approach. In physics-based modeling the goal is accurate
simulation that can be utilized in applications with higher requirements, such as the
acoustical design of concert halls.

This paper is organized as follows. In Section 2 we briefly review the most important virtual
acoustic systems available, and describe the background of the DIVA auralization system in
more detail. In the next section we discuss the basics of parametric auralization. Section 4
is a description of perceptual evaluation of the DIVA auralization system. Guidelines for
further development of the system are presented next. Finally, Section 6 concludes the paper.

1 The term auralization has been defined by Kleiner et al. [1] as follows. Auralization is the
process of rendering audible, by physical or mathematical modeling, the sound field of a
source in a space, in such a way as to simulate the binaural listening experience at a given
position in the modeled space.

2. BACKGROUND

In this section, we present some currently available auralization systems and research
institutes working in this field, and after that we concentrate on the DIVA auralization
system and its history and applications.

2.1. Auralization systems

Auralization and room acoustics have been researched in several places during the past decade.
Each research institute has had its own specific area, and in the following we have made a
rough division into three categories, applying the same division as in Fig. 1. Please note
that only a couple of early references for each institute are presented. The list is not
exhaustive, but only gives the most important ones from our viewpoint.

The idea of auralization and digital simulation and reproduction of room acoustics is quite
old. The predecessors of auralization systems have been artificial reverberation algorithms.
Since the pioneering work of Schroeder [2, 3] digital reverberators have been developed for
the professional audio and music industry. The
design goal in reverberators is similar to the one in auralization systems: to model a
decaying sound field by producing a dense pattern of reflections.

Nowadays computers are widely applied to help in room acoustic design. Most of the available
applications for this purpose have their own module for auralization. Computational modeling
of room acoustics and auralization has been the main area of research for the following
organizations:

• One of the first attempts to model room acoustics and realize binaural auralizations of
  concert halls was made at the Centre Scientifique et Technique du Bâtiment, France [4, 5].
  In addition, they were among the first to propose separate modeling of early reflections and
  statistical late reverberation.
• At the Technical University of Denmark, research on modeling of concert hall acoustics and
  auralization [6] has led to the room acoustic modeling program called Odeon².
• A lot of research on modeling of reflections as well as basic research on auralization
  methods has been conducted at the Chalmers University of Technology, Sweden [1, 7]. As a
  product of these studies the room acoustic modeling program CATT Acoustic³, including
  auralization, has been developed. Recently, they have studied edge diffraction modeling
  successfully [8].
• The room acoustic simulation program EASE/EARS, including auralization, has been developed
  by the Acoustic Design Ahnert⁴, Germany [9].
• At the University of Parma, Italy, a lot of research on concert hall and automotive
  acoustics, and sound systems inside cars [10], has been carried out. In addition, they have
  produced the room acoustic modeling software Ramsete⁵.

The following research units have had a significant impact on the field of 3-D sound
reproduction:

• At the Ruhr-Universität Bochum, Germany, a lot of basic research in room acoustic modeling
  and especially in human spatial hearing has been done [11, 12]. In addition, they have their
  own auralization system [13, 14].
• At the NASA-Ames Research Center, USA, the research has been driven by the interest in
  directional hearing and real-time systems [15, 16]. Recently, they have been building a
  real-time auralization tool, SLAB [17], for interactive spatial sound research.
• Binaural technology and HRTF measurement methods have been developed extensively at the
  Aalborg University, Denmark [18].
• Research on artificial reverberators and auralization has been carried out since 1992 at the
  Massachusetts Institute of Technology, USA [19]. Further development of this work has led to
  pro audio software by Wave Arts⁶.

The leading research institute concerning perception-based real-time auralization has been
IRCAM (Institut de Recherche et Coordination Acoustique/Musique) in France. For example, they
have created a spatializer tool, Spatialisateur, for musicians with a perceptual modeling
approach [20, 21, 22]. In particular, they have developed signal processing algorithms for
efficient rendering and also for real-time systems.

2 ~odeon/
3
4
5
6

2.2. History and design goals of the DIVA auralization system

At the Helsinki University of Technology (HUT) research on room acoustic modeling and 3-D
sound reproduction has been conducted since the early 1990s. In 1994 the first attempts to
combine these efforts were made. Since then we have developed the Digital Interactive Virtual
Acoustics (DIVA) auralization system for room acoustic modeling, aiming at both real-time and
non-realtime auralization.

In the first DIVA auralization system we applied a dedicated DSP processor for auralization,
but already in 1994 we also had an implementation running on a UNIX workstation, and since
then we have been working entirely without any special hardware. In the original version,
headphone reproduction was applied, but currently both multichannel reproduction with
loudspeakers and binaural reproduction with headphones are supported.

The DIVA auralization system has been a part of a more complex system aiming at virtual
concert performance. As a result we had our own virtual orchestra. The first public
performance of the DIVA virtual orchestra was in 1997 at the SIGGRAPH 97 conference in Los
Angeles. Another demonstration of the DIVA auralization system was the Marienkirche video
presenting the acoustic design of a concert hall. The video was presented in the Electronic
Theater of the SIGGRAPH 98 conference [23], and after that it has been shown on several TV
channels worldwide.

3. ROOM ACOUSTIC MODELING AND AURALIZATION

In room acoustic modeling the propagation of sound waves in a space is under study. This can
be divided into two subparts: modeling of the actual propagation and modeling of reflections
from the boundaries of a space. The modeling of wave propagation is quite straightforward. In
a free space each sound source emits a spherical wavefront, i.e., an elementary wave, that
propagates homogeneously in all directions. The amplitude of sound is inversely proportional
to the distance from the sound source. Modeling of reflections is a bit more challenging. In
each reflection a new wavefront is created, and the reflections can be modeled as new sound
sources. Therefore it is possible to reduce the model such that recursively in each reflection
a new sound source is created. Finally we have only sound sources, but no reflections. In
actual modeling the effects of all sources are combined to produce the final sound field in
the listening positions. However, some of these secondary sources are not visible to a
listening point due to occlusion by surfaces. For this reason the validity of all sources is
verified with a visibility check. One of the most commonly applied room acoustic modeling
techniques is called the image source method [24], and it is based on this approach.

Figure 2 illustrates this concept of sound field decomposition [25]. Each reflection from a
wall is replaced with an image source and each corner (except convex rectangular corners) is
replaced with an edge source. All of these secondary sources emit wavefronts that are shown
inside the geometry. With the concept of image sources each elementary wave can be easily
filtered with frequency dependent acoustic phenomena such as sound source directivity,
distance delay and attenuation, air, material, and wall absorption, which are all included in
the simulation shown in Fig. 2. When auralizing the elementary waves for headphone
reproduction the binaural features of human hearing can be considered as well, as proposed by
Borish almost twenty years ago [26].

[Figure 2 labels: first order image sources, edge sources, second order image source]

Figure 2: An example of the sound field decomposition into the elementary waves with the
image-source method. The illustration is done by computing an impulse response in each pixel
and by plotting the time moment of the 680th sample, which corresponds to 14.2 ms in time.

The most straightforward method to realize auralization is to measure or model binaural room
impulse responses (BRIR) and convolve them with anechoic signals. This method is called the
direct room impulse response rendering method. The technique is computationally heavy, and
implementation of dynamic systems is difficult. This is caused by the fact that in dynamic
rendering the BRIRs have to be measured from all the possible locations of a listener. In
practice this can be implemented by making the measurements on a certain grid, and during
auralization the exact response is obtained with interpolation.

3.1. Parametric room impulse response rendering

The DIVA auralization system applies the parametric room impulse response rendering method, in
which the BRIRs are not calculated before the actual auralization process. Instead, a set of
either perception- or physics-based parameters for the auralization process is defined. Such
parameterization enables a more robust way for dynamic and real-time rendering than the direct
room impulse response rendering [27].

The DIVA auralization system is based on room acoustic modeling and aims at perceptually
authentic rendering of the modeled space. In our system the modeling is divided into two
parts. The first part is time and place variant, containing modeling of the direct sound and
early reflections. The image source method is applied for this purpose. The second part is for
rendering the late reverberation, which is assumed to be diffuse such that its parameters do
not change as a function of time or place.

The image source method implemented in the DIVA auralization system gives the following
parameters for each image source:

• order of reflection,
• orientation (azimuth and elevation angles) of sound source,
• distance from the listener,
• incoming direction of sound (azimuth and elevation angle in relation to the listener),
• set of filter coefficients describing the material properties in reflections,
• required parameters for calculation of response from a diffracting edge in the case of an
  edge source.

The parameters of late reverberation are pre-calculated based either on measurements or on
results of room acoustic modeling. By this technique we can tune the reverberation time and
some other essential features of reverberance according to the properties of the space.

3.2. Audio signal processing

In the DIVA auralization system the image-source calculation provides the auralization
parameters, which are finally converted to signal processing parameters. The reason for this
two-level process is the fact that in dynamic rendering the auralization parameters do not
need to be updated for every audio sample. However, the signal processing parameters have to
be defined on a sample-by-sample basis. In the DIVA auralization system this is achieved by
interpolating the signal processing parameters between the updates of auralization parameters.

The signal processing structure utilized in the DIVA auralization system is depicted in
Fig. 3. It contains a long delay line DL which is fed with the anechoic sound to be processed.
The distance of the image source from the listener defines the pick-up point to the filter
block $T_k(z)$, where $k = 0, 1, 2, \ldots, N$ is the identifier of the image source ($k = 0$
corresponds to the direct sound). Blocks $T_{0 \ldots N}(z)$ modify the sound signal with the
sound source directivity filters, distance dependent gains, air absorption filters and
material filters (not for the direct sound). The incoming direction of the sound is defined
with blocks $F_{0 \ldots N}(z)$ containing directional filtering or panning depending on the
reproduction method. The superimposed outputs of the filters $F_{0 \ldots N}(z)$ are finally
summed with the outputs of the late reverberation unit R, which is a complex recursive
algorithm.

[Figure 3 labels: anechoic sound; $T_0(z)$, $T_1(z)$, ..., $T_N(z)$; $F_0(z)$, $F_1(z)$, ...,
$F_N(z)$; late reverberation unit, R; out(left)]

Figure 3: The signal processing structure of the DIVA auralization system [28, 27]. In this
example the output is binaural.

3.3. Modeling of diffraction

The most recent advancement in the DIVA auralization system is the diffraction modeling, and
it is described in more detail in this section. Svensson et al. [8] have derived a
mathematical solution for calculating the impulse response for an edge of a finite length. The
impulse response is calculated from the source to the listening position through the edge.
With this analytical solution the
edge diffraction is modeled in the DIVA auralization system. The auralization parameters for
an edge source are:

• wedge angle $\theta_w$,
• position of source $S(x, y, z)$,
• position of receiver $R(x, y, z)$,
• start and end point of the edge $Z_0(x, y, z)$, $Z_1(x, y, z)$,
• normal vector $\vec{n}$ of a surface.

With this data for each edge, the impulse response is calculated with the following equations
[8, 29]:

$$h(t) = -\frac{\nu}{4\pi} \int_{Z_0}^{Z_1} \delta\!\left(t - \frac{m + l}{c}\right) \frac{\beta_{++} + \beta_{+-} + \beta_{-+} + \beta_{--}}{ml} \, dz, \qquad (1)$$

$$\beta_{\pm\pm} = \frac{\sin[\nu(\pi \pm \theta_S \pm \theta_R)]}{\cosh\!\left(\nu \cosh^{-1} \dfrac{1 + \sin\alpha \sin\gamma}{\cos\alpha \cos\gamma}\right) - \cos[\nu(\pi \pm \theta_S \pm \theta_R)]}. \qquad (2)$$

An example of a finite wedge is depicted in Fig. 4 to illustrate the variables. In addition,
$c$ is the speed of sound, $\nu = \pi/\theta_w$ is the wedge index, $m$ is the
source-to-edge-point distance, and $l$ is the edge-point-to-receiver distance. The integration
range is between the two end points of a finite edge.

Figure 4: Geometry of a finite wedge. The positions of source S and receiver R are indicated
in cylindrical coordinates. On the right, sound paths via edge points $z_0$ and $z_1$ are
indicated by the solid lines, the least-time sound path via the apex point $z_{apex}$ is
depicted with a dashed line, and some other sound paths are illustrated with dotted lines.

Based on the impulse responses we have designed diffraction filters that implement the
diffraction phenomenon as an impulse response in one point [29], but in real life diffraction
sources are not point-like. Sound passes the edge through all points along the edge; however,
most of the energy is concentrated on the least-time point of the edge. Based on this, the
simplification to a point-like secondary source is not too severe. In addition, the
diffraction image source, being a point source, can be panned to the direction the least-time
point indicates, as proposed by Torres et al. [30]. The same principle holds for the sound
source directivity, since from the viewpoint of the edge most of the sound energy from the
actual source radiates towards the least-time point of the edge. The situations where this
simplification could be most audible would be long edges that are close to the listener.

In the current implementation the edge diffraction filters are designed between the image
source calculation and auralization processes. As such, our implementation is not practical
for real-time use, but dynamic off-line rendering is straightforward.

4. PERCEPTUAL EVALUATION

In the design and implementation of the DIVA auralization system we have pursued the ultimate
goal of an authentic auralization, in which a listener is unable to distinguish a simulated
sound from a recorded sound. For this reason our system has been evaluated by both objective
and subjective means. The main emphasis in this section is on the subjective case, but first
the objective approach is briefly reviewed. In both cases a careful analysis has been
performed with a model of one lecture hall at HUT, as illustrated in Fig. 5.

[Figure 5 label: Top view]

Figure 5: The geometry of the studied lecture room and the positions of the sound source and
the two receivers utilized in the evaluation. In both receiver positions the listener was
looking straight ahead.

The objective evaluation has been based on calculation of room acoustic attributes such as
reverberation time (T20), early decay time (EDT) and clarity (C50). These attributes have been
obtained both from the simulation results and from the corresponding measured impulse
responses. In general, the results show that above 400 Hz the attributes coincide quite well.
However, below that there are some minor defects in modeling; for example, the auralizations
are less reverberant than the recordings in that frequency range.

4.1. Evaluation framework

The perceptual evaluation of auralization quality was based on the framework illustrated in
Fig. 6 [31]. The evaluation was performed by comparing recorded and auralized sound-tracks.
The recordings made in the studied lecture room were considered as reference signals.

To find out subjective perceptual differences between the recorded and the auralized
sound-tracks, several listening tests have been carried out.
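
As an illustration of the objective attributes named in Section 4, the following minimal sketch estimates EDT, T20 and C50 from a single impulse response, whether measured or simulated. It assumes the usual definitions: a backward-integrated (Schroeder) energy decay curve, a line fit over 0 to -10 dB for EDT and over -5 to -25 dB for T20, both extrapolated to a 60 dB decay, and a 50 ms boundary between early and late energy for C50. All function names are hypothetical, and this is not the analysis code used in the DIVA evaluation.

```python
import numpy as np


def schroeder_decay_db(h):
    """Backward-integrated energy decay curve in dB (0 dB at the first sample)."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])


def decay_time(edc_db, fs, start_db, end_db):
    """Fit a line to the decay curve between two levels and extrapolate to -60 dB."""
    idx = np.where((edc_db <= start_db) & (edc_db >= end_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # slope in dB per second
    return -60.0 / slope


def clarity_c50(h, fs):
    """Ratio of the energy in the first 50 ms to the remaining energy, in dB."""
    n50 = int(round(0.050 * fs))
    return 10.0 * np.log10(np.sum(h[:n50] ** 2) / np.sum(h[n50:] ** 2))


def room_attributes(h, fs):
    """Hypothetical helper collecting the three attributes for one response."""
    edc_db = schroeder_decay_db(h)
    return {
        "EDT": decay_time(edc_db, fs, 0.0, -10.0),
        "T20": decay_time(edc_db, fs, -5.0, -25.0),
        "C50": clarity_c50(h, fs),
    }
```

Applying `room_attributes` to a measured response and to the corresponding simulated response, typically after band-pass filtering both into octave bands, gives pairs of values that can be compared band by band in the manner described above.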


[Figure 6 diagram labels: ANECHOIC STIMULUS; LOUDSPEAKER IN REAL ROOM; REALHEAD OR MONOPHONIC
RECORDING; 3D model of a room; image source calculation; source directivity filters; material
filters; air absorption filters; HRTF filters; late reverberation parameters; AURALIZATION;
REALHEAD; background noise; binaural or monophonic recording; binaural or monophonic
auralization]

Figure 6: A framework for evaluation of virtual acoustic system.

Different listening test methods have been tried out because no recommended listening test
methodology for testing the auralization quality exists. Finally, we chose to apply the ABX
paradigm. The method utilizes double-blind triple stimulus with hidden reference, including
interval scales [32].

The quality of auralization has many different aspects and it is multidimensional by nature.
Subjects could, of course, be asked only to judge whether the sound-tracks differ, but then no
information about the nature of the differences would be obtained. To obtain more information
about possible differences, two attributes, namely spatial and timbral differences, have been
studied.

The assessment has been an iterative process containing several evaluation rounds. In total,
20 subjects (three females and 17 males) participated in the final listening test. All of them
reported normal hearing, although this was not verified with audiometric tests. The test was
done in a standard listening room and the headphone reproduction method was applied with
Sennheiser HD-580

[...] nificant differences between the grades given to spatial and timbral properties. Signals
having sustained tonal characteristics, such as the sound of a clarinet, were judged with the
best grades. With signals having transients, such as a hit of a snare drum, the differences
were clearly audible, but on average they were evaluated to be plausible and natural sounding.


The current modeling techniques typically applied in room acoustic design are based on
geometrical acoustics. In the future, the phenomena caused by the wave nature of sound should
be modeled more carefully. This can be done either by employing some wave-based model such as
the digital waveguide mesh [33], or by adding their modeling into existing ray-based systems.

The next major improvement to the DIVA auralization system will be modeling of diffusion. In
general, diffusion plays an important role in room acoustics. For example, in concert halls
the sidewalls are typically made to be diffusive rather than specularly reflecting. Our
approach to this problem will be to incorporate diffusion into the image-source method by
using surface sources based on the ideas suggested by Dalenbäck [34]. In addition, we are
going to further enhance the auralization quality by continuing both subjective and objective
evaluations.


The research on auralization has made significant progress during the past decade. Current
computers are computationally efficient and thus enable the elaborate audio signal processing
required in high-quality auralization. In the DIVA auralization system developed at HUT the
latest improvements have incorporated modeling of diffraction into the system. The quality of
the DIVA auralization system has been assessed with several listening tests. At best the
quality has been shown to be so good that the listeners have been unable to distinguish
between recorded and simulated sound tracks. In the future the quality of auralization systems
will still improve due to more efficient computers and advances in the modeling techniques.

7. ACKNOWLEDGMENTS

This work has been supported by the Academy of Finland through the Helsinki Graduate School in
Computer Science.

8. REFERENCES
