SOFA Specs 0.6 PDF
SOFA
Spatially Oriented Format for Acoustics
Piotr Majdak
Acoustics Research Institute, Austrian Academy of Sciences, Vienna
< [email protected] >
Markus Noisternig
IRCAM-CNRS-UPMC, Paris, France
< [email protected] >
Further contributors:
• Hagen Wierstorf (Telekom Innovation Laboratories, Technical University of Berlin, Berlin,
Germany)
1. WHAT IS SOFA?
Head-related transfer functions (HRTFs) describe the spatial filtering of the incoming sound. The
HRTFs available so far are stored in various formats, which makes their exchange difficult
because of incompatibilities between the formats. We propose a format for storing HRTFs with a
focus on interchangeability and extendability. The spatially oriented format for acoustics
(SOFA) aims at representing HRTFs in a general way, thus allowing the storage of data such as
directional room impulse responses (DRIRs) measured with a microphone array excited by a
loudspeaker array. SOFA specifications consider data compression, network transfer, and a link to
complex room geometries, and aim at simplifying the development of programming interfaces
for Matlab, Octave, and C++. SOFA conventions for a consistent description of measurement
setups are provided for future HRTF and DRIR databases.
1.16. SOFA version 0.6 (submitted to AES for the meeting in Berlin 2014)
• The global attribute Source renamed to Origin
• TimeCreated and TimeModified renamed to DateCreated and DateModified, respectively
• If ListenerUp is provided, ListenerView must be provided as well. If ListenerView is
provided, ListenerView:Type and ListenerView:Units must be provided as well. This also
applies to Source, Emitter, and Receiver objects.
• Geometry: only Cartesian or spherical coordinate systems allowed.
• Local coordinate system better defined.
• In SimpleFreeFieldHRIR: SubjectID renamed to ListenerShortName
SOFA version 0.6 Piotr Majdak & Markus Noisternig Page: 4
2. INTRODUCTION
Head-related transfer functions (HRTFs) describe the spatial filtering of the incoming sound
due to the listener's anatomy. HRTFs are crucially important for the binaural reproduction of
virtual acoustics. HRTFs have been measured by a number of laboratories and are typically
stored in each lab's native file format. While the different formats are of advantage for each lab,
an exchange of such data is difficult due to incompatibilities between formats.
In this work, we propose specifications for an HRTF data exchange format with a special
focus on interchangeability and extendability. The spatially oriented format for acoustics (SOFA)
aims at representing spatial data in a general way, allowing the storage of not only HRTFs but also
more complex data, e.g., directional room impulse responses (DRIRs) measured with a
multichannel microphone array excited by a loudspeaker array. In order to simplify the adoption of
SOFA for various applications, example implementations of the format specifications are
provided together with a collection of exemplary data sets converted to SOFA.
The AES-X212 HRTF file format standardization project is based on the SOFA format and was
recently approved by the AES subcommittee SC-02 and assigned to the working group
SC-02-08 on audio file interchange.
[Fig. 1: Measurement setup. Labels in the figure: Source (excitation signals) with Emitter #1 and
Emitter #2; Listener with Receiver #1 and Receiver #2; y axis.]
One of the first publicly available HRTFs measured in human listeners was the CIPIC database
[2]. The measurements were performed at a constant distance of 1 m for 1250 spatial directions
around the listener. The HRTFs are available for 43 listeners as IRs of 200 samples at a sampling
rate of 44.1 kHz. Since then, many other HRTF/DRIR databases have been made publicly
available [3–8].
All those measurement setups have the following properties in common. In an anechoic chamber
or in a room, excitation signals are generated and microphones are used to record the
incoming signals (see Fig. 1). The measurement is repeated while varying the spatial position of
the excitation source relative to the listener, which is done by varying the position of the
listener, the sound source, or both in different dimensions.
Binaural HRTF measurement setups use only two microphones to record the left and right ear
signals. However, HRTF/DRIR measurements may also consider multiple microphones, e.g.,
three microphones per head side in hearing-assist devices [7], tens of microphones arranged in
an array structure at different directions and distances from the center [9], a multichannel
microphone array arranged around the listeners in a reciprocal HRTF measurement system [10],
[11], multichannel microphone arrays for measuring DRIRs [12], or various microphone
positions in a room, e.g., for concert-hall acoustics measurements [13]. As a generalization,
microphones and an object comprising those microphones can be identified. Thus, in this article, a
microphone as the single receiver of the sound field is called the receiver, and the object
comprising all the receivers is called the listener, see Fig. 1.
The sound source used for the excitation signal is not necessarily a single point source.
Loudspeaker arrays have been used either to control the sound field surrounding the listener, e.g.,
in wave-field synthesis [11], [14], [15] or higher-order Ambisonics [16], [17], or to control the
radiation characteristics of the sound sources [18]. Similarly to the concept of listener and
receivers, in this article, the particular sources creating the excitation signal are called emitters and
the object comprising the emitters is called the source. Note that a measurement setup with a source
with multiple emitters and a listener with multiple receivers has already been considered [19].
In typical HRTF measurements, only the direction of the incoming signal is varied. In more
recent setups, different sound-ear distances have also been considered [4], [11], [20]. However,
sometimes the variation of other parameters is of interest. For example, HRTFs were measured
as a function of the head orientation relative to the torso [21], or the room IRs were measured
as a function of the room temperature [22]. An HRTF file format should thus consider even
such parameters.
1 see http://www.opendaff.org/
• Self-describing data with a consistent definition, i.e., all the required information about the
measurement setup must be provided as metadata in the file;
• Flexibility to describe data of multiple conditions (listeners, distances, etc.) in a single file;
• Partial file and network support;
• Available as binary file with data compression for efficient storage and transfer;
• Predefined description conventions for the most common measurement setups.
SOFA aims at fulfilling all those requirements. SOFA specifications are described in the
following sections. An HRTF/DRIR measurement setup is described by various objects (Sec. 3.1) and
their relations (Sec. 3.2). The information is stored in a numeric container (Sec. 3.3) and
structured by the measurement. A measurement is a discrete sampled observation done at a specific
time and under a specific condition. A measurement consists of data, e.g., an IR (Sec. 3.4), and is
described by its corresponding dimensions (Sec. 3.5) and metadata (Sec. 3.6). All measurements
are stored in a single data structure, e.g., a matrix of IRs.
3. GENERAL SPECIFICATIONS
3.1. Objects
Receiver is any acoustic sensor like the ear or a microphone. The number of receivers is not
limited in SOFA and defines the size of the data matrix.
Listener is the object incorporating all the receivers. For HRTFs, a listener can be a head or
dummy-head microphone. For DRIRs, a listener represents the microphone-array structure
such as a sphere or a frame. Incorporating the receivers in the listener as a single logical object is
important because in measurements, usually the orientation and/or position of the listener vary
without substantial changes in the head-microphone relation. For example, in measurements
done for multiple positions in a room, the position of the head varies and the relation between
the head and the microphones does not change. Note that only one listener is considered.
Emitter is any acoustic excitation used for the measurement. The number of emitters is not
limited in SOFA. The contribution of the particular emitter is described by the metadata (see later).
Source is the object incorporating all emitters. In SOFA, a source might be a multi-driver
loudspeaker (with the particular drivers as emitters), a speaker array (with the particular speakers
as emitters), a choir (with the particular singers as emitters), etc. Note that only one source is
considered but the source may incorporate an unlimited number of emitters.
Room is the volume enclosing the measurement setup. In the case of a free-field measurement,
the room is not considered. An optional room description is considered for measurements
performed in reverberant spaces, with a direct description of a simple shoebox, or with a link to a
digital asset exchange file for a more complex description.
Optional Objects can be described by including user-defined metadata of a measurement. For
example, this might be the information about a torso, as in the measurements in which the
angle between the torso and the head is varied as an independent variable.
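The object hierarchy described in this section can be sketched as a small data model. This is
only an illustration under assumptions: the class and field names (Receiver, Emitter, Listener,
Source, Setup, label, room) are hypothetical and are not defined by the SOFA specifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Receiver:
    label: str  # hypothetical field, e.g. "left ear"


@dataclass
class Emitter:
    label: str  # hypothetical field, e.g. "loudspeaker"


@dataclass
class Listener:
    # The listener incorporates all receivers.
    receivers: List[Receiver] = field(default_factory=list)


@dataclass
class Source:
    # The source incorporates all emitters.
    emitters: List[Emitter] = field(default_factory=list)


@dataclass
class Setup:
    listener: Listener          # only one listener is considered
    source: Source              # only one source is considered
    room: Optional[str] = None  # None for a free-field measurement


hrtf_setup = Setup(
    listener=Listener([Receiver("left ear"), Receiver("right ear")]),
    source=Source([Emitter("loudspeaker")]),
)
num_receivers = len(hrtf_setup.listener.receivers)
num_emitters = len(hrtf_setup.source.emitters)
print(num_receivers, num_emitters)
```

Keeping the receivers inside a single listener object mirrors the argument above: moving or
rotating the listener does not change the relation between the head and the microphones.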
3.4. Data
Data represent the numeric description of the acoustic systems and consist of a multidimensional
matrix of an arbitrary size. Data stored in this format have the flexibility to be in the domain
that best accommodates the measurement and measurement system. Data can be time-domain
finite IRs (data type FIR) or infinite IR filter coefficients (IIRBiquad), with or without
separately stored broadband delays. The broadband delay (i.e., time-of-arrival, TOA) can be stored
as discrete delays in a matrix or as parameters of a continuous-directional TOA model [26]. Data
contain fields (e.g., Data.FIR, Data.G) which are functions of the dimension N. The
interpretation of N depends on the data type, e.g., for IRs, N represents the sampling interval (i.e.,
the inverse of the sampling rate) or the number of FIR filter taps. The interpretation is denoted in
the attributes of the dimension variable N. The different data types and corresponding fields are
shown in Tab. 1.

FIR
Name                     Default  Dimensions    Type       Comment
Data.IR                  0        [m R n]       double     impulse responses (along a time axis)
Data.Delay               0        [I R], [M R]  double     broadband delay in the units of N (i.e., the time axis of FIR)
Data.SamplingRate        48000    [I], [M]      double     sampling rate of the IRs and the delay
Data.SamplingRate:Units  hertz    irrelevant    attribute  unit used for the sampling rate

TF
Name        Default    Dimensions  Type       Comment
Data.Real   0          [m R n]     double     real part of the complex spectrum
Data.Imag   0          [M R N]     double     imaginary part of the complex spectrum
N           0          [N]         double     frequency values
N:LongName  frequency  irrelevant  attribute
N:Units     hertz      irrelevant  attribute  unit used for N

Table 1: Data types considered. Dimensions noted in lower case define the corresponding
dimension size within the SOFA file.

2 see http://www.unidata.ucar.edu/software/netcdf/
3 see http://www.hdfgroup.org/HDF5
4 see http://www.unidata.ucar.edu/software/netcdf-java/formats/UnidataObsConvention.html and
http://cf-pcmdi.llnl.gov/
5 http://www.unidata.ucar.edu/software/netcdf/docs/user_guide.html
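As an illustration of the FIR fields in Tab. 1, the following sketch converts a broadband delay
given in units of N (samples) to seconds. It assumes plain Python lists in place of netCDF
variables; the function name delay_in_seconds and the example delay values are hypothetical.

```python
def delay_in_seconds(delay_samples, sampling_rate_hz):
    """Convert Data.Delay (in samples, i.e., units of N) to seconds.

    `delay_samples` is a nested list with dimensions [I R] or [M R].
    """
    if sampling_rate_hz <= 0:
        raise ValueError("sampling rate must be positive")
    return [[d / sampling_rate_hz for d in row] for row in delay_samples]


# Data.Delay with dimensions [I R]: one row valid for all measurements, R = 2.
data_delay = [[48.0, 96.0]]     # hypothetical delays in samples
data_sampling_rate = 48000.0    # default of Data.SamplingRate in Tab. 1, in hertz

delays_s = delay_in_seconds(data_delay, data_sampling_rate)
print(delays_s)
```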
3.5. Dimensions
Each netCDF variable has fixed dimensions, and its dimensions must be defined before creating
the variable. Thus, in SOFA, netCDF dimensions are pre-defined, see Tab. 2.
Data and metadata are described by using these dimensions. User-defined dimensions are
currently not provided. Throughout this document, the variable sizes are denoted by [A_1 A_2 … A_I],
where A_i represents the length of the dimension i of the I-dimensional matrix. We use the
Matlab/Octave notation, where the first, second, and third values represent the number of the
rows, columns, and third dimension, respectively.
For example, assume a database consisting of one thousand measurements, i.e., M = 1000,
obtained for 1000 different positions of the source, i.e., SourcePosition is [M C], using two
microphones, i.e., two IRs per measurement, and a sampling rate of 48 kHz. Further, assume only a
single measurement position, i.e., a single ListenerPosition. This means that Data.IR,
SourcePosition, and ListenerPosition will be of dimension [1000 2 N], [1000 3], and [1 3], respectively,
with N being the length of each IR. Then, in the netCDF file, M = 1000, R = 2, and C = 3. Further,
the netCDF variables Data.IR, SourcePosition, and ListenerPosition will have dimensions
[M R N], [M C], and [I C], respectively.
In a SOFA file, each dimension size must be uniquely defined and all variables with the
corresponding dimension must have that size. To this end, for each dimension, we define a dimension
size: we choose a variable whose size defines the size of the corresponding dimension. In this
document, dimension sizes are noted as lower-case letters, see, e.g., [m R n] in Tab. 1. Note that
when designing SOFA conventions, dimension sizes must be defined exactly once: a missing
dimension size will result in an unknown size of the dimension; multiple definitions of a dimension
size will most probably result in a contradictory size of the dimension.
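The requirement that all variables agree on each dimension size can be sketched as a simple
consistency check. This is a minimal sketch under assumptions: the function name
resolve_dimension_sizes and the dictionary layout are hypothetical, the size of I is taken to be 1
as in the example above, and the IR length N = 256 is an arbitrary illustrative value.

```python
def resolve_dimension_sizes(variables):
    """Derive the size of each dimension and check that all variables agree.

    `variables` maps a variable name to a pair (dimension names, shape).
    """
    sizes = {"I": 1}  # assumption: I has size 1, as in [1 3] for [I C]
    for name, (dims, shape) in variables.items():
        if len(dims) != len(shape):
            raise ValueError(f"{name}: {len(dims)} dimensions but shape {shape}")
        for dim, size in zip(dims, shape):
            # The first variable using a dimension defines its size;
            # every later variable must match it.
            if sizes.setdefault(dim, size) != size:
                raise ValueError(
                    f"{name}: dimension {dim} has size {size}, expected {sizes[dim]}"
                )
    return sizes


variables = {
    "Data.IR": (("M", "R", "N"), (1000, 2, 256)),  # N = 256 is hypothetical
    "SourcePosition": (("M", "C"), (1000, 3)),
    "ListenerPosition": (("I", "C"), (1, 3)),
}
sizes = resolve_dimension_sizes(variables)
print(sizes)
```

A contradictory definition, e.g., a SourcePosition with 999 rows next to a Data.IR with 1000
rows, makes the sketch raise a ValueError instead of silently accepting two sizes for M.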
Variables can have different dimensions. For example, it is possible to provide the
ListenerPosition as a single entry, meaning that the single ListenerPosition is valid for all
measurements. But it is also possible to provide a different ListenerPosition for each measurement.
Note that there are restrictions on the variant dimensions:
• The dimensions must be the pre-defined dimensions, see Tab. 2.
• The size of the dimensions may change, but the number of dimensions, i.e., the
dimensionality, must not change. In the above example, valid dimensions of the
ListenerPosition are [I C] and [M C]. Invalid dimensions would be [C].
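The two restrictions above can be sketched as a small validation helper. The names
KNOWN_DIMENSIONS and is_valid_variant are hypothetical, and the set of dimension names contains
only those mentioned in this document; Tab. 2 remains the authoritative list.

```python
# Assumption: only the dimension names appearing in this document.
KNOWN_DIMENSIONS = {"I", "M", "R", "N", "C", "S"}


def is_valid_variant(dims, allowed_variants):
    """Return True if `dims` is one of the allowed dimension variants."""
    dims = tuple(dims)
    allowed = {tuple(v) for v in allowed_variants}
    # The dimensionality must be the same for all variants of a variable.
    if len({len(v) for v in allowed}) > 1:
        raise ValueError("allowed variants differ in dimensionality")
    return set(dims) <= KNOWN_DIMENSIONS and dims in allowed


LISTENER_POSITION_VARIANTS = [("I", "C"), ("M", "C")]

print(is_valid_variant(("I", "C"), LISTENER_POSITION_VARIANTS))
print(is_valid_variant(("M", "C"), LISTENER_POSITION_VARIANTS))
print(is_valid_variant(("C",), LISTENER_POSITION_VARIANTS))
```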
Strings are represented as character arrays along the dimension S. When more than one string
array is considered in a SOFA file, S represents the size of the array with the longest string
dimension. This can be useful when, for example, a SOFA file containing HRTFs of many
listeners is required and each subject is represented by an ID string. In such a case, a variable
SubjectID can be defined as a string array, with a string for each ID.
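Storing strings of different lengths as a character array along S can be sketched as follows.
The sketch handles a single string array only; the function name to_char_array, the use of a
space as padding character, and the example IDs are assumptions made for illustration.

```python
def to_char_array(strings, pad=" "):
    """Pad all strings to a common length S and return (rows, S).

    Each row is a list of single characters of length S.
    """
    s = max((len(x) for x in strings), default=0)
    rows = [list(x.ljust(s, pad)) for x in strings]
    return rows, s


subject_ids = ["NH2", "NH15", "KEMAR"]  # hypothetical ID strings
rows, S = to_char_array(subject_ids)
print(S)
print(["".join(row) for row in rows])
```

Reading the strings back then amounts to joining each row and stripping the trailing padding.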
3.6. Metadata
Metadata consist of variables and their attributes. Numerical variables are multidimensional
matrices of the type “double” (i.e., 64-bit floating-point data). String variables are saved as
character arrays. Other types of variables are not allowed; they can be derived from “double” or
“string”. Each variable can have its attributes, which are netCDF attributes. Further, the most
important properties of the measurement that are valid for the global measurement setup are
described by global attributes (see Tab. 4). All metadata names must begin with a letter followed
by letters or digits. Note that underscores (“_”) and the metadata names “API”, “GLOBAL”,
and “PRIVATE” are not allowed because they are reserved for internal usage in the API. When
saved as a variable, date and time use the number of seconds from 1970-01-01 00:00:00 (Unix
time). When saved as attributes, date and time use a string in the ISO-8601 format
“yyyy-mm-dd HH:MM:SS”. Units are lower case.
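The naming and date rules above can be sketched with the Python standard library. The helper
names are hypothetical, and three points are assumptions not stated in the text: letters are
taken to be ASCII letters, the reserved names are compared case-sensitively, and the attribute
string is formatted in UTC.

```python
import re
from datetime import datetime, timezone

RESERVED_NAMES = {"API", "GLOBAL", "PRIVATE"}
# A letter followed by letters or digits; underscores are excluded.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_metadata_name(name):
    """Check a metadata name against the rules of this section."""
    return name not in RESERVED_NAMES and NAME_PATTERN.fullmatch(name) is not None


def unix_to_attribute_string(seconds):
    """Format Unix time as "yyyy-mm-dd HH:MM:SS" (UTC assumed)."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(is_valid_metadata_name("ListenerPosition"))
print(is_valid_metadata_name("Listener_Position"))
print(unix_to_attribute_string(0))
```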
For the sake of simplicity, nested structures within the metadata are not allowed, but grouping
by prefixes using the Pascal convention, e.g., ListenerPosition and ListenerView, is used.
Table 4: General metadata in SOFA, stored as global attributes in the netCDF file.
3.7.1. Cartesian
x, y, z as a basis
3.7.2. Spherical
Parameter        Range       Front, eye-level  Left, eye-level  Back, eye-level  Above  Below
Azimuth angle    0°...360°   0°                90°              180°             0°     0°
Elevation angle  -90°...90°  0°                0°               0°               90°    -90°
Radius           >0          N/A               N/A              N/A
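The relation between the two coordinate systems can be sketched as a conversion from spherical
to Cartesian coordinates. The orientation of the axes is an assumption here (x pointing to the
front, y to the left, z upwards); it is chosen because it reproduces the directions listed in the
table above. The function name spherical_to_cartesian is hypothetical.

```python
import math


def spherical_to_cartesian(azimuth_deg, elevation_deg, radius):
    """Convert (azimuth, elevation, radius) to (x, y, z).

    Assumed axes: x to the front, y to the left, z upwards.
    """
    if radius <= 0:
        raise ValueError("radius must be greater than zero")
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = radius * math.cos(el) * math.cos(az)
    y = radius * math.cos(el) * math.sin(az)
    z = radius * math.sin(el)
    return x, y, z


front = spherical_to_cartesian(0.0, 0.0, 1.0)
left = spherical_to_cartesian(90.0, 0.0, 1.0)
above = spherical_to_cartesian(0.0, 90.0, 1.0)
print(front, left, above)
```

Up to floating-point rounding, the three calls give points on the positive x, y, and z axes,
respectively.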
4. SOFA CONVENTIONS
In order to meet the different requirements coming from different application fields, SOFA
conventions are specified, i.e., definitions of data and metadata consistently describing particular
HRTF/DRIR measurement setups. Instead of aiming at foreseeing the future, conventions
should be developed only for known measurement setups. The known features should be
consistently described while not limiting the development of future conventions.
The following SOFA conventions are being discussed. Measured data exist but their description
must be fixed in order to create publicly available SOFA files and corresponding software
interfaces.
• SimpleFreeFieldHRIR: aimed at storing HRTFs recorded in free field with omnidirectional
emitter and source and stored as IRs for a single listener.
• SimpleFreeFieldTF: similar to SimpleFreeFieldHRIR, but uses TF as DataType, covering
special needs coming from HRTF simulations.
• SingleRoomDRIR: room impulse responses measured with an arbitrary number of
receivers (such as a microphone array) and an omnidirectional source in a single room.
5. TECHNICAL ASPECTS
5.2. Networking
A repository is available at http://www.sofacoustics.org/data. Currently, HTTP requests for
downloading full SOFA files are supported.
In principle, netCDF files can also be transferred via networks by using OPeNDAP, a data access
protocol for providing local data to remote locations regardless of the local storage format (see
footnote 8). SOFA, being technically speaking a netCDF convention, should be able to use
OPeNDAP. An OPeNDAP server will allow partial access to SOFA files via the network.
6 see http://sf.net/projects/sofacoustics
7 see http://www.hdfgroup.org
8 see http://opendap.org
[2] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, “The CIPIC HRTF database,” in
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2001, pp. 99–102.
[4] H. Wierstorf, M. Geier, A. Raake, and S. Spors, “A Free Database of Head-Related Impulse
Response Measurements in the Horizontal Plane with Multiple Distances,” in 130th
Convention of the Audio Engineering Society (AES), 2011, p. eBrief 6.
[5] T. Nishino, S. Kajita, K. Takeda, and F. Itakura, “Interpolation of head related transfer
functions of azimuth and elevation,” J Acoust Soc Jpn, vol. 57, pp. 685–692, 2001.
[6] P. Majdak, M. J. Goupell, and B. Laback, “3-D localization of virtual sound sources: effects of
visual environment, pointing method, and training,” Attent Percept Psychophys, vol. 72,
no. 2, pp. 454–69, Feb. 2010.
[8] M. Jeub, M. Schäfer, and P. Vary, “A binaural room impulse response database for the
evaluation of dereverberation algorithms,” in 2009 16th International Conference on Digital
Signal Processing, 2009, pp. 1–5.
[9] I. Balmages and B. Rafaely, “Open-Sphere Designs for Spherical Microphone Arrays,” IEEE
Trans Audio Speech Lang Proc, vol. 15, no. 2, pp. 727–732, Feb. 2007.
[10] D. N. Zotkin, R. Duraiswami, E. Grassi, and N. A. Gumerov, “Fast head-related transfer
function measurement via reciprocity,” J Acoust Soc Am, vol. 120, no. 4, pp. 2202–2215, 2006.
[11] M. Pollow, K.-V. Nguyen, O. Warusfel, T. Carpentier, M. Müller-Trapet, M. Vorländer,
and M. Noisternig, “Calculation of Head-Related Transfer Functions for Arbitrary Field Points
Using Spherical Harmonics Decomposition,” Acta Acust United Ac, vol. 98, pp. 72–82,
2012.
[12] D. Khaykin and B. Rafaely, “Acoustic analysis by spherical microphone array processing of
room impulse responses,” J Acoust Soc Am, vol. 132, pp. 261–270, 2012.
[13] J. Pätynen, S. Tervo, and T. Lokki, “Analysis of concert hall acoustics via visualizations of
time-frequency and spatiotemporal responses,” J Acoust Soc Am, vol. 133, pp. 842–857,
2013.
[14] A. J. Berkhout, “Holographic Approach to Acoustic Sound Control,” J Audio Eng Soc, vol.
36, pp. 977–995, 1988.
[15] J. Ahrens and S. Spors, “Wave field synthesis of a sound field described by spherical
harmonics expansion coefficients,” J Acoust Soc Am, vol. 131, pp. 2190–2199, 2012.
[16] M. A. Gerzon, “Ambisonics. Part two: Studio Techniques,” Studio Sound, vol. 17, pp. 24–26,
1975.
[17] F. Zotter, H. Pomberger, and M. Noisternig, “Energy-preserving ambisonic decoding,” Acta
Acust United Ac, vol. 98, pp. 37–47, 2012.
[18] B. Rafaely, “Spherical loudspeaker array for local active control of sound,” J Acoust Soc Am,
vol. 125, pp. 3006–3017, 2009.
[19] S. Clapp, A. Guthrie, J. Braasch, and N. Xiang, “The use of multi-channel microphone and
loudspeaker arrays to evaluate room acoustics,” in Proceedings of the Acoustics 2012,
2012, vol. 131, no. 4, p. 3208.
[21] M. Guldenschuh, A. Sontacchi, and F. Zotter, “HRTF modelling in due consideration variable
torso reflections,” in Proceedings of the Acoustics’08, 2008, pp. 99–104.
[22] G. W. Elko, E. Diethorn, and T. Gänsler, “Room impulse response variation due to thermal
fluctuation and its impact on acoustic echo cancellation,” in International Workshop on
Acoustic Echo and Noise Control (IWAENC2003), 2003.
[23] A. Andreopoulou and A. Roginska, “Towards the Creation of a Standardized HRTF
Repository,” in 131st Convention of the Audio Engineering Society (AES), 2011, p. Convention
Paper 8571.
[24] D. Schwarz and M. Wright, “Extensions and Applications of the SDIF Sound Description
Interchange Format,” in Proceedings of the International Computer Music Conference,
2000.
[25] J. Merimaa, T. Peltonen, and T. Lokki, “Concert Hall Impulse Responses - Pori, Finland,”
2005. [Online]. Available: http://www.acoustics.hut.fi/projects/poririrs/. [Accessed:
01-Feb-2013].
[27] M. Noisternig, F. Zotter, and B. F. Katz, “Reconstructing sound source directivity in virtual
acoustic environments,” in Principles and Applications of Spatial Hearing, Y. Suzuki, D.
S. Brungart, and H. Kato, Eds. Singapore: World Scientific Publishing, 2011, pp. 357–373.