Spectral Modeling Synthesis: Past and Present
[Figure: SMS analysis/synthesis block diagram. Panels: Sines/Noise Synthesis; Transient Modeling; Multiresolution; Residual Analysis/Modeling. Sinusoidal analysis path: sound, window, FFT, magnitude and phase spectrum, peak detection, pitch detection, peak continuation, giving sine frequencies, magnitudes and phases. Residual path: additive synthesis of the sinusoidal component, subtraction from the windowed sound, spectrum smoothing of the residual component, residual data for generation. Output: sound.]
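The analysis chain named in the diagram (window, FFT, peak detection, additive synthesis, subtraction to obtain the residual) can be illustrated for a single stationary frame. The following is a minimal sketch, not the SMS implementation: the function names `detect_peak_freqs` and `split_sines_and_residual`, the Blackman window, the -30 dB threshold and the least-squares amplitude/phase fit are all assumptions made for illustration, and peak continuation across frames is left out.

```python
import numpy as np

def detect_peak_freqs(frame, fs, n_fft=8192, threshold_db=-30.0):
    """Find spectral peaks in one frame and refine them by parabolic interpolation."""
    window = np.blackman(len(frame))
    spectrum = np.fft.rfft(frame * window, n_fft)  # zero-padded FFT
    # Normalize so that a unit-amplitude sinusoid peaks near 0 dB.
    mag_db = 20.0 * np.log10(np.abs(spectrum) / (window.sum() / 2.0) + 1e-12)
    mid = mag_db[1:-1]
    is_peak = (mid > mag_db[:-2]) & (mid >= mag_db[2:]) & (mid > threshold_db)
    k = np.flatnonzero(is_peak) + 1
    left, center, right = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    # Vertex of the parabola through the three dB values around each peak bin.
    offset = 0.5 * (left - right) / (left - 2.0 * center + right)
    return (k + offset) * fs / n_fft

def split_sines_and_residual(frame, fs, freqs):
    """Fit amplitude and phase at the given frequencies, then subtract the fit."""
    t = np.arange(len(frame)) / fs
    phase = 2.0 * np.pi * np.outer(t, freqs)
    basis = np.hstack([np.cos(phase), np.sin(phase)])
    coeffs, *_ = np.linalg.lstsq(basis, frame, rcond=None)
    sinusoidal = basis @ coeffs  # additive synthesis of the fitted sinusoids
    return sinusoidal, frame - sinusoidal

# Hypothetical test signal: two stationary sinusoids plus low-level noise.
fs = 8000
t = np.arange(2048) / fs
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(t.size)
sound = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t + 0.3) + noise

freqs = detect_peak_freqs(sound, fs)
sinusoidal, residual = split_sines_and_residual(sound, fs, freqs)
```

In this sketch the residual is obtained by time-domain subtraction, so `sinusoidal + residual` reproduces the input frame; a real system would work frame by frame with overlapping windows and track peaks over time.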
Relevant Research Topics (II)
Morphing
Time Scaling
Compression/Transmission
Source Separation/Transcription
Music Content Analysis
Instrument/Voice Models
Expanded Models
Software Environments

Detection/Estimation of Sinusoids (I)
George, E. B. (thesis, 1991): Analysis by synthesis where each sinusoid is subtracted one at a time.
Depalle, P.; Hélie, T. (WASPAA, 1997): Parametric modeling of the STFT.
Goodwin, M. (thesis, 1997): Matching Pursuit.

Xavier Serra - London 2003
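The idea of subtracting one sinusoid at a time can be sketched as a greedy loop: pick the strongest spectral peak of the current residual, fit one sinusoid at that frequency, subtract it, and repeat. This is only an illustration under simplifying assumptions, not the method of any of the cited works: the function name `subtract_sinusoids` is hypothetical, frequencies are restricted to the FFT grid (spacing `fs / n_fft`), and the test frequencies are chosen to lie on that grid.

```python
import numpy as np

def subtract_sinusoids(x, fs, n_sines, n_fft=8192):
    """Greedy extraction: remove the strongest remaining sinusoid at each step."""
    residual = np.asarray(x, dtype=float).copy()
    window = np.hanning(len(residual))
    t = np.arange(len(residual)) / fs
    found = []
    for _ in range(n_sines):
        mag = np.abs(np.fft.rfft(residual * window, n_fft))
        k = int(np.argmax(mag[1:-1])) + 1  # strongest bin, skipping DC and Nyquist
        freq = k * fs / n_fft              # frequency quantized to the FFT grid
        basis = np.column_stack([np.cos(2 * np.pi * freq * t),
                                 np.sin(2 * np.pi * freq * t)])
        # Least-squares amplitude/phase of this single sinusoid.
        (a, b), *_ = np.linalg.lstsq(basis, residual, rcond=None)
        residual -= basis @ np.array([a, b])
        found.append((freq, float(np.hypot(a, b))))
    return found, residual

# Hypothetical test signal with on-grid frequencies (fs / n_fft = 1 Hz).
fs = 8192
t = np.arange(2048) / fs
rng = np.random.default_rng(1)
x = (np.cos(2 * np.pi * 440 * t)
     + 0.4 * np.cos(2 * np.pi * 1000 * t + 1.0)
     + 0.01 * rng.standard_normal(t.size))

found, residual = subtract_sinusoids(x, fs, n_sines=2)
```

Because the strongest component is removed first, `found` is ordered by decreasing amplitude. For frequencies that fall between grid points, a refinement step (for example peak interpolation) would be needed before the fit.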
Transient Modeling
Masri, P. (thesis, 1996): Analysis of transients to position analysis window.
Ali, M. (thesis, 1996): Wavelet analysis for transients.
Verma, T. et al. (ICMC, 1997): Sinusoids+Transients+Noise model.

Residual Analysis/Modeling
Hamdy, K. N. et al. (ICASSP, 1996): Wavelet coding of residual.
Goodwin, M. (thesis, 1996): Filter-bank auditory model.
Ding, Y.; Qian, X. (ICMC, 1997): LPC modeling.
Desainte-Catherine, M.; Hanna, P. (DAFX, 2000): Parameterization of noise-like sounds.
Synthesis of Sinusoids/Noise
Rodet, X.; Depalle, Ph. (AES, 1992): IFFT synthesis for sinusoids.
Goodwin, M.; Rodet, X. (ICMC, 1994): IFFT synthesis for nonstationary sines. FFT with Blackman-Harris 92 dB.
Fitz, K.; Haken, L. (ICMC, 1995): [...] Modeling. [...] magnitude [...]

Morphing
Serra, X. (ICMC, 1994): Sine-wave interpolation.
Tellman, E. et al. (ICMC, 1994)
[Figure: morphing block diagram. Blocks: User Input; Feature-based Morph & Synthesis; SMS-Analysis; SMS-Morph; Synthesis; Voice [...] based on phoneme HMMs; Target Information.]
Source Separation / Transcription
Maher, R. (thesis, 1989): Partial collision and Two-Way Mismatch algorithm for F0 detection.
Virtanen, T. et al. (ICASSP, 2000): Multipitch analysis and iterative parameter estimation.

Music Content Analysis
Herrera, P. et al. (CBMI, 1999): Descriptors for MPEG-7.
Heittola, T.; Klapuri, A. (ISMIR, 2002): Identification of drums.
Gómez, E. et al. (JNMR, 2003): Melodic description.
Wang, A. (Shazam, 2003): Audio identification.
Software Environments
Serra, X. (LMJ, 1991): SANSY, a Lisp environment based on SPIRE.
Fitz, K. et al. (ICMC, 1995): Lemur.
Loscos, A. et al. (DAFX, 1998): SMSPerformer.
Amatriain, X. et al. (ACM, 2002): CLAM.

Conclusions
From speech to audio to music.
From analysis/synthesis to content processing.
Beyond signal processing techniques.
Techniques are ready for many practical applications.
Need to combine bottom-up with top-down approaches.
Sinusoidal plus Residual Modeling of Musical Sounds: Relevant References
compiled by Xavier Serra, September 2003
16. Ellis, Daniel P., Barry L. Vercoe. 1990. “A wavelet based sinusoid model of
sound for auditory signal separation.” ICMC90
17. Maher, Robert and James Beauchamp. 1990. “An Investigation of Vocal
Vibrato for Synthesis.” Applied Acoustics 30 pp. 219-245
18. McAulay, R. J.; T. F. Quatieri. 1990. “Pitch Estimation and Voicing
Detection Based on a Sinusoidal Speech Model.” Proceedings IEEE ICASSP
1990.
19. Schumacher, R. T., and C. Chafe. 1990. “Detection of Aperiodicity in Nearly
Periodic Signals.” Proceedings of the IEEE Int. Conf. on Acoustics, Speech,
and Signal Processing, Albuquerque, NM, 1990.
36. Adams, G.J.; Evans, R.J. 1994. “Neural networks for frequency line tracking.”
IEEE Transactions on Signal Processing, vol. 42, no. 4, April 1994,
pp. 936-941.
37. Doval, B. 1994. Estimation de la Fréquence Fondamentale des signaux
sonores. PhD. Thesis, Université Paris-6, Paris, 1994.
38. Goodwin, M. and X. Rodet. 1994. “Efficient Fourier Synthesis of
Nonstationary Sinusoids.” Proceedings of the 1994 International Computer
Music Conference. San Francisco: Computer Music Association.
39. Serra, Xavier. 1994. “Residual Minimization in a Musical Signal Model
based on a Deterministic plus Stochastic Decomposition.” Journal of the
Acoustical Society of America 95(5-2):2958--2959.
40. Serra, Xavier. 1994. “Sound Hybridization Techniques based on a
Deterministic plus Stochastic Decomposition Model.” Proceedings of the
1994 International Computer Music Conference. San Francisco: Computer
Music Association.
41. Tellman, E.; L. Haken; B. Holloway. 1994. “Timbre Morphing Using the
Lemur Representation.” Proceedings of the International Computer Music
Conference, Aarhus, Denmark, October 1994.
42. Wang, A. 1994. Instantaneous and Frequency-Warped Signal Processing
Techniques for Audio Source Separation. Ph.D. Thesis, Stanford University.
122. Althoff, Rasmus; Florian Keiler; Udo Zölzer. 1999. “Extracting Sinusoids
from Harmonic Signals.” DAFX99.
123. Fitz, Kelly. 1999. The Reassigned Bandwidth-Enhanced Method of Additive
Synthesis. Ph. D. dissertation, Dept. of Electrical and Computer Engineering,
University of Illinois at Urbana-Champaign.
124. Freed, Adrian. 1999. “Spectral Line Broadening with Transform Domain
Additive Synthesis.” ICMC99.
125. Herrera, P., X. Serra, G. Peeters. 1999. “A proposal for the description of
audio in the context of MPEG-7.” Proceedings of the CBMI'99 European
Workshop on Content-Based Multimedia Indexing.
126. Irizarry, Rafael. 1999. “Weighted Estimation of Harmonic Components in a
Musical Sound Signal.” JTSA
127. Koenen, R. 1999. Overview of the MPEG-4 Standard. ISO/IEC
JTC1/SC29/WG11 N3156, Dec. 1999.
128. Laroche, J. and M. Dolson. 1999. “New phase-vocoder techniques for real-
time pitch shifting, chorusing, harmonizing, and other exotic audio
modifications.” Journal of the Audio Engineering Society, vol. 47, no. 11, pp.
928–936, November 1999.
129. Laroche, J. and M. Dolson. 1999. “New phase-vocoder techniques for pitch-
shifting, harmonizing, and other exotic effects.” In Proceedings of the IEEE
Workshop on Applications of Signal Processing to Audio and Acoustics, New
Paltz, New York, Oct. 17–20, 1999, pp. 91–94, IEEE Press.
130. Laroche, Jean and Mark Dolson. 1999. “Improved Phase Vocoder Time-
Scale Modification of Audio.” IEEE Transactions on Speech and Audio
Processing, vol. 7, no. 3, May 1999.
131. Levine, S. N. 1999. Audio Representations for Data Compression and
Compressed Domain Processing. Ph.D. Thesis, Stanford University
132. Levine, S. N. and Julius O. Smith III. 1999. “A Switched Parametric &
Transform Audio Coder.” ICASSP-99
133. Levine, S. N. and Julius O. Smith III. 1999. “Improvement to the Switched
Parametric & Transform Audio Coder.” Proc. IEEE Workshop on
Applications of Signal Processing to Audio and Acoustics.
134. Peeters, G.; X. Rodet. 1999. “SINOLA: A New Analysis/Synthesis using
Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum.” ICMC99,
Beijing (China).
135. Rossignol, S.; P. Depalle, J. Soumagne, X. Rodet, J.-L. Collette. 1999.
“Vibrato: detection, estimation, extraction, modification.” DAFX99
136. Schwarz, D.; X. Rodet. 1999. “Spectral Envelope Estimation and
Representation for Sound Analysis-Synthesis.” Proceedings of the
International Computer Music Conference (ICMC'99), Beijing, October 1999.
137. Tolonen, Tero. 1999. “Methods for Separation of Harmonic Sound Sources
using Sinusoidal Modeling.” Preprint Number: 4958 AES Convention 106.
138. Troughton, Paul T. 1999. “Bayesian Restoration of Quantised Audio Signals
using a Sinusoidal Model with Autoregressive Residuals”. Proceedings of the
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
Mohonk, 1999.
139. Verma, T.S. and T.H.Y. Meng. 1999. “Sinusoidal modeling using frame-
based perceptually weighted matching pursuits,” in Proceedings ICASSP’99,
Phoenix, Arizona, USA, May 1999, vol. 2, pp. 981–984.
140. Verma, Tony S. 1999. “A Perceptually Based Audio Signal Model with
Application to Scalable Audio Compression.” Ph.D. Thesis, Stanford
University, October 1999.
141. Vos, K.; R. Vafin, R. Heusdens, and W.B. Kleijn. 1999. “High-quality
consistent analysis-synthesis in sinusoidal coding,” in Proceedings of the AES
17th International Conference, Florence, Italy, September 1999, pp. 244–250.