Creating Visual Music in Jitter: Approaches and Techniques

Randy Jones* and Ben Nevile†
* 2045 13th Avenue W, Seattle, Washington 98119 USA, [email protected]
† University of Victoria/Cycling '74, 304-3122 Quebec Street, Vancouver, British Columbia V5T 3B4 Canada, [email protected]
"Visual music" is a term used to refer to a broad range of artistic practices, far-flungtemporally and geographically yet united by a common idea: that visual art can aspire to the dynamic and nonobjective qualities of music (Mattis 2005). From paintings to films-and now to computer programs-the manifestations of visual music have evolved along with the technology available to artists. Today'sinteractive, computer-basedtools offer a variety of possibilities for relating the worlds of sound and image; as such, they demand new conceptual approaches as well as a new level of technical competence on the part of the artist. Jitter,a software package first made available in 2002 by Cycling '74, enables the manipulation of multidimensional data in the context of the Max programmingenvironment. An image can be conveniently representedby a multidimensional data matrix, and indeed Jitterhas seen widespread adoption as a format for manipulating video, both in nonreal-time production and improvisational contexts. However, the general nature of the Jitter architecture is well-suited to specifying interrelationships among different types of media data including audio, particle systems, and the geometrical representations of three-dimensional scenes. This article is intended to serve as a starting point and tutorial for the computer musician interested in exploring the world of visual music with Jitter.To understand what follows, no prior experience with Jitter is necessary, but we do assume a familiarity with the Max/MSP environment. We begin by briefly discussing strategies for the mapping of sound to image; influences here include culturally learned and physiologically inherent cross-modal associations, different domains of association, and musical style. We then introduce Jitter, the format
Computer Music Journal, 29:4, pp. 55-70, Winter 2005. © 2005 Massachusetts Institute of Technology.
of image matrices, and the software's capabilities for drawing hardware-accelerated graphics using the OpenGL standard. This is followed by a survey of techniques for acquiring event and signal data from musical processes. Finally, a thorough treatment of Jitter's variable frame-rate architecture and the Max/MSP/Jitter threading implementation is presented, because a good understanding of these mechanisms is critical when designing a visualization and/or sonification network.
Figure 1. The "Kikiand Bouba" experiment, proposed by K6hler (1929) and refined by Werner(1934).
Synaesthetic Mappings

Synaesthesia is a psychological term that refers to a mixing of the senses that occurs in certain individuals. It is said to occur when a person perceives something via one sense modality and experiences it through an additional sense. Though synaesthesia does occur between sounds and colors, its most common form combines graphemes (written characters) with colors. Grapheme/color synaesthetes perceive each letter with an accompanying color, linked by a consistent internal logic. The letter "T," for example, might always appear green to such a person, while "O" is always blue. Based on psychophysical experiments, Ramachandran and Hubbard (2001) have demonstrated that this form of synaesthesia is a true sensory phenomenon rather than a conceptual one. At first blush, this consistent pairing of different sense modes might thereby seem to be a good foundation for audio/visual composition. However, the logic of synaesthesia is entirely subjective. The mappings derived by interviewing a given person with synaesthesia, for example, are not likely to be meaningful to another synaesthete, let alone any given viewer.

Some audio-to-visual mappings do exist that are likely to reveal their logic to a variety of listener/viewers, owing to a basis in human perception or physics. For example, multiple concurrent voices in a work of music can be perceived in two complementary ways: reductionistically and holistically. Trained listeners can choose to devote more of their attention to one or more voices or to the whole. Likewise, viewers of an animation with simultaneous discrete visual elements can focus their attention on one or more elements in the same way. In both the audio and visual domains, as more elements are added, the listener/viewer becomes less able to devote attention to them all simultaneously. This isomorphism between sensory domains indicates that mapping single musical voices in a composition to single graphical elements is a good approach for making visual experiences that can accompany music in a meaningful way.

To make a graphical voice map to a musical voice, we can consider the musical voice's fundamental frequency, amplitude, and timbre. For frequency and amplitude, certain mappings to graphical parameters can be shown to have a basis in the physical world. The smaller a physical object is, the higher the frequencies it tends to produce when resonating. Therefore, we can say that when mapping the frequency of a voice to the size of a corresponding visual object, a scale factor that maps from high notes to small shapes and from low notes to large shapes is more natural than the reverse, because it is consistent with our experience of the physical world. Likewise, amplitude of sound tends to map naturally to brightness of image, because amplitude and brightness are measurements of the same physical concept (intensity of the stimulus) in the audio and visual domains, respectively.

Timbre is too complex a phenomenon to be represented by a small number of changing values. Characterizing musical timbre in terms of perceptually salient quantities is a complex task to which there are many approaches (Donnadieu, McAdams, and Winsberg 1994). Some musically useful approaches for mapping timbre to imagery might be suggested by results from the study of human perception. In an experiment designed by the Gestalt psychologist Wolfgang Köhler (1929) and further refined by Werner (1934), two drawings like the ones in Figure 1 were shown to a variety of people. When asked, "Which of these is a 'bouba' and which is a 'kiki'?," over 90 percent of people decided that the round shape is the "bouba" and the pointy one is the "kiki." The overwhelming verdict is striking. In their work on synaesthesia, Ramachandran and Hubbard (2001) present strong circumstantial evidence that the bouba/kiki effect is created by cross-modal connectivity in the brain's angular gyrus, and
Figure 2. Excerpt from Synchromy (1971) by Norman McLaren. Courtesy of the National Film Board of Canada.
that an excess of such connectivity is present in individuals with synaesthesia. In other words, we are all synaesthetes to a degree. This supports the potential of Köhler's earlier findings as a model for mappings of timbre to form in visual music. As with the mappings of frequency and amplitude proposed above, we see an isomorphism between audio frequencies and frequencies in another domain, this time of curvature of form. Thus, analysis in the frequency domain provides a way to generate a meaningful mapping from timbres to shapes.

Mapping Domains

We can imagine using the observations in the preceding paragraphs to construct a single audiovisual voice in a work of visual music. Its shape would change from smooth to spiky as its timbre gained additional high harmonics, its size would shrink as its fundamental frequency rose, and its brightness would rise and fall with the amplitude of its sounds, creating discrete notes that we would perceive simultaneously through our eyes and ears. From a musical point of view, such strict mappings on their own can be banal, merely reinforcing correspondences we already understand. The isomorphisms they manifest, however, are vital as a ground from which meaning can be derived. The degree to which the imagery and sound correlate through natural mappings in visual music is analogous to consonance between voices in tonal music. Each is an expression of the agreement between the parts of a composition. Each can be perceived to increase or decrease, and can be measured in an approximate way, but not quantified exactly except through arbitrary metrics. In visual music as with harmony, vitality comes not from total agreement but from movement over time between concord and discord.

The temporal scale at which a mapping operates is another degree of freedom available to the composer. Sound and image may be interrelated with a granularity ranging from milliseconds to minutes. Two examples of visual music by the Canadian artist Norman McLaren illustrate two points along this spectrum. Figure 2 shows an excerpt about 1.5 sec in length from McLaren's 1971 work Synchromy, with images from the film above the transcribed score. To realize Synchromy, McLaren composed music that he drew directly on the optical soundtrack of 35-mm film as blocks of different vertical and horizontal sizes, which are audible as square waves of different frequencies and amplitudes, respectively.
Figure 3. Excerpts from Lines Horizontal (1961) by Norman McLaren and Evelyn Lambart, with score by Pete Seeger. Courtesy of the National Film Board of Canada.
The visual component of the film was created by manipulating the soundtrack on an optical printer to create multiple copies in different foreground and background colors. In this way, McLaren used the technology of film to associate sound to image millisecond by millisecond. Each note, chord, and rest is clearly visible; the close correlation produces a strong synaesthetic effect.

Figure 3 shows two excerpts from the 1961 film Lines Horizontal by Norman McLaren and Evelyn Lambart. The flowing, emotive score by Pete Seeger correlates clearly with this minimalist study in motion if time intervals on the order of 10-20 sec are considered. The flute melody on the top staff begins the piece, accompanying the measured introduction of the basic transformation by which the film's visual complexity is generated: one line gives birth to another as it changes direction. The score for flute, banjo, and guitar modulates through several sections as the film gains visual complexity, each modulation coinciding with a change in the background color. At the film's climax, as the accumulation of lines dazzles the eye by creating an ambiguity between figure and ground, the music builds to its most harmonically adventurous point as shown by the guitar melody in the bottom staff, which is followed by a restatement of the audio and visual themes above.

A computer-mediated mapping that operates on a time scale like that of Lines Horizontal would coordinate groups of audio and visual events statistically, rather than single events. Even when distinct voices do exist, looser mappings may be more salient, either alone or in concert with more fine-grained mappings. It may also make sense to shift the conceptual framework of the visualization to match that in which the audio is composed. For example, musique concrète, which bases its meaning partly on source material from within a specific culture, may best be served by visual materials that reference that same culture.

Mapping and Musical Style

Culture often dictates more basic isomorphisms as well. Figure 4 shows still frames from a visualization made with Jitter that Randy Jones created to
Figure 4. (a-c) Three visualizations for the Suite from David Jaffe's Seven Wonders of the Ancient World (1996).
accompany David Jaffe's Suite from the Seven Wonders of the Ancient World (1996), as performed by Andrew Schloss on the Radio Drum (see www.jaffe.com/7w.html). In the original Seven Wonders, Mr. Schloss controlled a Yamaha Disklavier with the Radio Drum; the moving keys of the Disklavier provided a visual accompaniment to the performance. A projected computer-visual accompaniment was created for performances of the Suite for which a Disklavier was not available. First realized using Onadime Composer (www.onadime.com/products/composer.html) and later using Jitter, the accompaniment maps musical notes to visual ones using variations of form for each section of the Suite. Each variation is presented on a grid with one octave of "keys" per row. The rows are arranged bottom to top, from low frequencies to high frequencies. This arrangement has a loose physical basis in that smaller things (higher frequencies) are generally found higher in the air than bigger things (lower frequencies). However, the left-to-right order of increasing pitches within an octave makes sense mainly because it mirrors the piano keyboard, a culturally determined arrangement.

Color is a topic to which we may already have called attention by avoiding it so studiously. Like timbre, color is a complex phenomenon that can be quantified in different ways requiring multiple variables. Thinkers since Aristotle have proposed particular connections between musical notes and the colors of the spectrum (iotaCenter 2000). However, as shown by Fred Collopy's collection of "color scales" from 1700 to the present (rhythmiclight.com/archives/ideas/colorscales.html), there is no basis for the universality of any one such mapping. Color can be mapped to sonic parameters besides pitch: to aspects of timbre, for example. Nevertheless, whatever mapping is chosen, a consistent internal logic (as with mappings to shape and size) is vital to create the expectations that Meyer (1956) has shown are necessary for a work of music to convey meaning. In general, the choices of cross-modal mappings made by the composer, and their relationships to physically motivated or culturally
defined mappings, help define the style of a work of visual music.
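As a concrete illustration of the grid layout described above for the Seven Wonders accompaniment, the following minimal Python sketch (the function name and lowest-note constant are our own invention, not part of the original patch) maps a MIDI note number to a column within an octave and to a row running from low pitches at the bottom upward:

    def note_to_grid(midi_note, lowest_note=24):
        # Row 0 is the bottom row; each row holds one octave of "keys."
        octave_row = (midi_note - lowest_note) // 12
        # Pitches increase left to right within the octave, as on a piano.
        key_column = (midi_note - lowest_note) % 12
        return key_column, octave_row

    print(note_to_grid(60))  # middle C lands at column 0 of row 3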
An Introduction to Jitter
The original version of Max was created to manipulate MIDI data (Puckette 1988). "Patcher," as it was called, provided composers working with MIDI tone modules a graphical environment in which to create graphs of data flow between input and output, radically simplifying the process of implementing new control systems. In its modern form, Max objects are designed to respond to messages that can be composed of multiple atoms. An atom can be a 32-bit integer, 32-bit floating-point number, or alphanumeric symbol. When MSP was introduced in 1998 (Zicarelli 1998), it added a new data type: the signal. Signals are streams of 32-bit floating-point samples. Unlike messages, which can occur at varying intervals and lack a precise relationship to time, signals are sampled at a fixed frequency, usually controlled by an audio output device's master clock. In the interest of computational efficiency, objects that manipulate signals perform their calculations on a group of samples each time MSP runs their code, rather than on one sample at a time. This group is known as a signal vector. With a signal vector size of 64 samples and a sampling frequency of 44.1 kHz, an MSP object's perform routine will be called roughly once every 1.45 msec.

In 2002, the introduction of the Jitter object set, primarily created by Joshua Kit Clayton, brought another new data type to the Max world. A matrix is a multidimensional container without any explicit time-dependency. The spatial dimensions of a matrix define the number of cells within the matrix. A matrix may also have more than one plane, which affords data-parallelism in a spatial sense: each cell stores one scalar value for each plane of the matrix. Matrices can store one of four scalar data types: char (8-bit unsigned bytes), int (32-bit integers), float32 (32-bit floating-point numbers), and float64 (64-bit floating-point numbers). For example, a 320 x 240 matrix with four planes of char data would have 76,800 cells and contain 307,200 unsigned bytes. Each matrix is associated with a unique name, and matrices are passed between objects by name in messages composed of the symbol jit_matrix followed by the matrix's name.

It is important to note that what we have called the spatial dimensions of a matrix need not be interpreted spatially. For instance, as we will see later, it is possible to transcode audio signals into one-dimensional matrices for Jitter-based processing, or to represent the vertices of an OpenGL geometric model as a multi-plane, one-dimensional matrix. For an image matrix, the spatial interpretation of the coordinates is correct, because the values of the cells represent the colors of a two-dimensional grid of pixels. The ARGB representation separates a color into three different components (red, green, and blue) along with an additional opacity component known as the alpha channel. Accordingly, the most common Jitter image matrix represents a two-dimensional grid with four planes. Because 256³ is equal to the 16 million colors that a basic consumer video card can display, a typical Jitter network operates on image matrices of 8-bit chars. Image data may also be represented by any of the other primitive types supported by the matrix format, or as grayscale images if only one plane is present. The high-resolution image format OpenEXR (www.openexr.com) is also supported.

There are several different ways to import image data into the Max/MSP/Jitter environment. A common starting point is to play back a QuickTime movie. This can be accomplished with the jit.qt.movie object, which loads the frames of a movie from a file on disk and provides them as data in the image matrix format discussed above. Alternatively, live video can be routed into the Jitter environment using the jit.qt.grab or jit.dx.grab objects. One can also synthesize two-dimensional imagery with the drawing tools of an object like jit.lcd, with a noise generator like jit.noise, or by manipulating the cells of a matrix directly by sending setcell messages to the jit.matrix object.

Of course, there are many objects that do not generate matrices but rather accept input matrices and operate on their data. Examples of such filters are jit.brcosa and jit.traffic, which operate across planes in the color space of the image, and jit.tiffany and jit.repos, which operate on the data in the spatial dimensions.
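The cell and byte counts above are easy to verify with a quick sketch. The following Python/NumPy fragment (our illustration, not Jitter code) lays out a matrix with the same dimensions, planes, and char type as the typical Jitter image matrix:

    import numpy as np

    width, height, planes = 320, 240, 4
    image = np.zeros((height, width, planes), dtype=np.uint8)  # char data
    image[..., 0] = 255   # plane 0: alpha (fully opaque)
    image[..., 1] = 255   # plane 1: red

    print(height * width)  # 76,800 cells
    print(image.nbytes)    # 307,200 unsigned bytes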
Figure 5. Jitter objects operating on image data, including processing in the spatial dimension.
Figure 5 shows some of these Jitter objects operating on the image data produced by a jit.qt.movie object.

The World of OpenGL

OpenGL is a cross-platform standard for drawing two- and three-dimensional computer graphics, designed to provide a common interface for different types of graphics hardware. It is used in a variety of applications, from video games to the Mac OS X Window Manager. It consists of two interdependent parts: a state machine with a complex, fixed internal structure for processing graphical data, and an API (application programming interface) in the C programming language for interacting with the state machine. The state machine defines a sequence of steps by which image textures can be applied to the faces of geometric primitives and rendered as seen by a virtual camera to create a final image. Many parameters that control the final image are defined, including a complex lighting model and extra processing steps that can be applied to the final rendering.

Some or all of the operations defined by the OpenGL state machine may be implemented by hardware GPUs (graphics processing units). Owing to the high degree of parallelism that can be applied to the task of drawing graphics, the past decade has seen affordable GPUs increase in speed dramatically faster than CPU speeds. This has prompted software developers to move more and more drawing tasks to the GPU, and even some non-drawing tasks such as audio processing. (See gpgpu.org and graphics.stanford.edu/projects/brookgpu for examples of this.)

Generating and Manipulating OpenGL Data in Jitter

OpenGL accepts images in a variety of formats, including the four-plane ARGB format used by Jitter. Input from live video, recorded movies, or synthesizing objects such as jit.noise can be used directly in OpenGL. Geometries are defined in OpenGL by applying low-level geometric primitives to lists of vertices in three-dimensional space. The primitives define how a given sequence of vertices should be drawn. Examples of primitives are GL_LINE_STRIP, which connects all the vertices in the list with a line from first to
Figure 6. Vertex lists drawn with different OpenGL primitives.
last; GL_TRIANGLES, which connects each triad of vertices with a triangle; and GL_TRIANGLE_STRIP, which draws a series of triangles between the vertices to form a strip (see Figure 6). OpenGL geometries, like images, are stored in Jitter using matrices. Each vertex of the geometry is stored as one cell of a float32 matrix with one or two dimensions and a number of planes that can range from three to thirteen. If the matrix has three planes, the planes specify the x, y, and z components of the location of each vertex. Additional data are specified at each vertex if groups of additional planes are present, as described in Table 1.
Table 1. Data specified by the planes of a geometry matrix

Planes   Data
0-2      x, y, z vertex position
3-4      s, t texture coordinates
5-7      nx, ny, nz normal vector
8-11     r, g, b, a vertex color
12       e edge flag

When a geometry matrix is sent to a jit.gl.render object, the symbol specifying which OpenGL primitive to use can be appended to the jit_matrix message as an additional atom, or it can be communicated separately to the jit.gl.render object.
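To make the plane layout concrete, the following NumPy sketch (ours, not taken from any patch in this article) builds the minimal three-plane case of Table 1: a circle of 32 vertices stored as a one-dimensional float32 matrix. In a patch, an equivalent matrix could be sent to jit.gl.render with a primitive such as GL_LINE_STRIP named in the jit_matrix message.

    import numpy as np

    n = 32
    theta = np.linspace(0.0, 2.0 * np.pi, n)
    geometry = np.zeros((n, 3), dtype=np.float32)  # one cell per vertex
    geometry[:, 0] = np.cos(theta)  # plane 0: x
    geometry[:, 1] = np.sin(theta)  # plane 1: y
    geometry[:, 2] = 0.0            # plane 2: z

    print(geometry.shape, geometry.dtype)  # (32, 3) float32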
In addition to the primitives defined by OpenGL, Jitter has two of its own primitives, TRI_GRID and QUAD_GRID, which were added to accommodate two-dimensional geometry matrices. Rendering using TRI_GRID and QUAD_GRID creates connections from each vertex to other vertices in its local neighborhood in the matrix grid. If not using TRI_GRID or QUAD_GRID with a two-dimensional geometry matrix, the matrix is divided into rows or columns depending on the value of the geom_rows attribute of the jit.gl.render object, and the resulting one-dimensional matrices are drawn directly in OpenGL (see Figure 7).

Geometries can come from a variety of sources. Models are definitions of geometry used to draw the characters and objects in video games and computer-animated movies. A variety of data may be stored with a model, including coordinates and parameters for mapping multiple surface textures onto the geometry. Models are stored in files in a variety of different formats. One common, open format is the .obj file, created by Wavefront Technologies (Rule 1996). Jitter's jit.gl.model object can read and draw .obj files into an OpenGL scene.

Other objects provide different ways of creating geometries. The jit.gl.gridshape object defines a variety of geometric shapes that can be used to define collections of vertices at different spatial resolutions. The jit.gl.text3d object renders text as geometry in three dimensions.
Figure 7. Vertices in a two-dimensional matrix drawn with GL_LINE_STRIP, geom_rows = 1.
The jit.gl.plato object can render the five Platonic solids (tetrahedron, hexahedron, octahedron, dodecahedron, and icosahedron). Finally, geometries can also be created directly by patching in the Max environment. Any matrix can be interpreted as an OpenGL geometry, provided it consists of floating-point data and has the necessary three or more planes. By using Jitter's matrix operators and built-in type conversions, a variety of shapes can be defined mathematically. Data such as moving images, the results of sound analysis, or gestures from controllers can be converted into geometry data. The jit.pack and jit.unpack objects can split the geometry matrix for processing of the desired planes, recombining it to send to the jit.gl.render object after processing. Most of the Jitter objects that are commonly used as video filters, such as jit.glop, operate on data of arbitrary type and plane count. This enables their use as geometry filters as well, often with surprising and interesting results. A jit.matrixset object can be used to store animated geometries in the same way it is used to store animated images.

Acquiring Event and Signal Data from Musical Processes

Of course, since we are in the digital world, we must discretely sample a continuous parameter to make it available for analysis, so an argument can be made that distinguishing between a discrete event and a continuous parameter is only valid in theory. After all, one can treat each sample in a signal as a separate event, and conversely, one can construct a signal out of an aperiodic series of events through any number of heuristics, such as simply incorporating values from new events into some sort of integration filter. Regardless, it is helpful to segment the discussion into the two types of input when thinking in terms of what one may want the visualization algorithm to do.

Events

In a live performance setting, one or more of the musical performers may be using a device capable of sending discrete messages directly to the computer. The most common format for this type of input is of course the venerable MIDI protocol; despite the protocol's not having changed much in more than 20 years, modern off-the-shelf USB and FireWire MIDI interfaces still provide a convenient port of entry into the computer. In addition, some performers have designed standalone or extended instruments with circuitry to communicate with a computer via some other means, for example, using direct serial communication. Alternatively, one can analyze and extract events
from a signal that represents some aspect of the performance. This signal could be in the form of an audio feed from a live microphone or a mixing board, or it could be a stream of gesture data, sampled at high rates with a device such as the Teabox (Allison and Place 2004). Multiple channels of signal input that represent different dimensions of a performance can be analyzed simultaneously, potentially with cross-correlations taken into account for nonorthogonal dimensions.

Extraction of events from a signal is a complicated topic that has been thoroughly researched by electrical engineers. The simplest detector is a threshold test, which compares the current value of a signal to a threshold value. When the status of the comparison changes (a transition from below the threshold to above, or vice versa), an event can be triggered by the comparator. But even a simple comparator like this becomes complicated when we consider that noise corrupts every signal and obscures the true value of the parameter. It may be possible to construct a filter to minimize the effects of the noise, but doing so necessarily introduces latency into the detection. It is therefore impossible to optimize a system for both latency and accuracy simultaneously, so one must settle for a compromise.

Indeed, the threshold test is often the building block for more complicated tests that rely on specific types of filters. For instance, if one wishes to detect a known variation in the signal (a particular sound in an audio signal, perhaps, or a defined gesture), it has been shown that the optimum detection process involves running the signal through a matched filter with an impulse response that mirrors the desired amplitude variation, and then performing a comparison with a fixed threshold whose value can be determined by the desired statistics of the detection process (Schwarz and Shaw 1975). Similar techniques exist for matching in the frequency domain after a transformation from the time domain using FFTs, wavelet-based methods, or other filtering operations.
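The following Python sketch illustrates both detectors just described; the signal, template, and threshold values are invented for the example, and the code merely stands in for what would be signal objects in MSP:

    import numpy as np

    def threshold_events(signal, threshold):
        # An event fires wherever the comparison with the threshold
        # changes state, in either direction.
        above = signal >= threshold
        return np.flatnonzero(np.diff(above.astype(int)) != 0) + 1

    def matched_filter_events(signal, template, threshold):
        # The matched filter's impulse response mirrors the expected
        # amplitude variation; correlation, then a fixed threshold test.
        detection = np.convolve(signal, template[::-1], mode="same")
        return threshold_events(detection, threshold)

    rng = np.random.default_rng(0)
    signal = 0.1 * rng.standard_normal(1000)
    template = np.hanning(32)        # the known variation we look for
    signal[500:532] += template      # bury one instance in the noise
    print(matched_filter_events(signal, template, threshold=4.0))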
Miller Puckette's bonk~ object (Puckette 1997a), written for his signal-processing environment Pd (Puckette 1997b) and ported to MSP by Ted Apel (crca.ucsd.edu/~tapel/software.html), was designed to be a percussion detector. It tracks the envelopes of all 21 bands of a constant-Q filterbank to detect a rapid rise in energy above a certain threshold, and then it matches the detected energy spectrum to one of a series of known spectra.

Parameters

Unlike events, which arrive sporadically and may trigger new processes in the algorithm, a value for each parameter of a visualization algorithm is needed for every frame rendered. When a request is made to render a frame, the algorithm often need only know the current value of the parameter; in other words, values that the parameter passed through en route to the final destination may not be important. However, if a parameter is rapidly changing, it is important to consider the aliasing effects that can take place in the large "downsampling" between the audio and video domains. If the effective frequency of a parameter change is higher than half the visual sampling rate, or frame rate, of the visualization algorithm, the aliased parameter values in the resulting video may not represent the parameter accurately.

Movie film in most common formats has a sample rate of 24 frames per second; this is fast enough to represent fluid moving imagery, but only if proper filtering is done. Fast motion must be low-pass filtered with respect to time to produce the appearance of motion continuity. In analog film, this filtering is provided by the camera's shutter, which integrates the moving image during each frame, approximating a box filter. This integration appears to us as motion blur. To ensure smooth motion in digital work, then, it is sufficient to low-pass filter our parameter signals with a cutoff frequency of approximately 12 Hz if the visual effects controlled by those parameters are also filtered properly. However, owing to the computational expense of rendering motion blur digitally, proper filtering is not normally present in real-time work; instead, faster frame rates and parameter sampling rates are typically used.
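A small numerical experiment can make the aliasing hazard and the shutter analogy concrete. In this Python sketch (rates and frequencies invented for the example), a parameter wiggling at 50 Hz is sampled at 60 frames per second, once directly and once through a one-frame box filter standing in for the camera's shutter:

    import numpy as np

    audio_rate, frame_rate = 44100, 60
    t = np.arange(audio_rate) / audio_rate
    param = np.sin(2 * np.pi * 50.0 * t)   # varies too fast for 60 fps

    step = audio_rate // frame_rate        # 735 samples per video frame
    raw = param[::step]                    # naive sampling: aliases to 10 Hz

    shutter = np.ones(step) / step         # box filter one frame long
    filtered = np.convolve(param, shutter, mode="same")[::step]

    # The raw frames swing at nearly full amplitude as a spurious slow
    # oscillation; the box-filtered frames are strongly attenuated.
    print(np.abs(raw).max(), np.abs(filtered).max())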
Figure 8. Two ways of transferring amplitude data from the signal to event domains. The network on the left uses the peakamp~ object to compute the largest value received since the last parameter update. The network on the right uses an accumulator to sum the values received since the last update.
MIDI continuous control (CC) messages are common communicators of parameters in the world of music. However, the inherent resolution of 128 values that a generic control allows is coarse; the protocol cannot simultaneously transmit multiple dimensions of data (because it is serial); and it is common to overwhelm a MIDI port by sending too much control data. Despite these shortcomings, MIDI CCs are adequate for many purposes. The most recently received value can be cached and used to generate a parameter value when needed. Often, in the audio domain, a separate slew parameter controls how quickly the signal slides toward the target parameter value as a strategy to prevent the "zipper noise" caused by large jumps in a signal value, which manifest themselves as impulses in the audio path. This can also be an effective strategy in the visual domain, especially in algorithms that employ recursive feedback, where an impulse can have a long-term effect.
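A one-pole slew is simple to express in code. The sketch below (the class and coefficient are our invention, standing in for a slide~-style smoothing stage) caches the latest CC value as a target and glides toward it once per rendered frame:

    class SlewedParameter:
        def __init__(self, value=0.0, slew=0.1):
            self.value = value    # current smoothed value
            self.target = value   # most recently received control value
            self.slew = slew      # fraction of the gap closed per frame

        def receive(self, cc_value):
            self.target = cc_value / 127.0  # scale 7-bit MIDI CC to 0..1

        def tick(self):
            # One-pole glide toward the target, called once per frame;
            # a sudden CC jump becomes a smooth ramp, not an impulse.
            self.value += self.slew * (self.target - self.value)
            return self.value

    p = SlewedParameter()
    p.receive(127)
    print([round(p.tick(), 3) for _ in range(5)])  # glides toward 1.0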
In many situations, signals are the natural method for communicating parameters. Sending a bang message to the snapshot~ object provides a convenient way to transform the value of a signal into a message for use by a Jitter object. If more precision is required, the event~ object allows one to synchronously sample multiple signals with a signal-based trigger mechanism.

Raw audio data can be analyzed in a variety of ways to produce meaningful musical parameters. An amplitude metric can be calculated by squaring the sample values of the input signal, which for an audio channel in most cases is limited to the range between -1 and 1. It is also common to scale the value logarithmically to approximate our perception of loudness. Then, because the signal oscillates between large and small values, it is not enough simply to take a single value from the signal and use that as the estimate; to estimate the overall amplitude of the signal accurately, we must employ some kind of filter that examines the signal over a certain amount of time. Figure 8 illustrates two different methods of handling this problem: the first uses the peakamp~ object, which keeps track of the largest value received since the last parameter request. The second uses the +=~ accumulator object to add all the values received since the last parameter request. Note that the peakamp~ and accumulating methods produce numbers with very different orders of magnitude, so the scaling factors that follow the networks must be different for the same expected range of output values. Alternatively, Tristan Jehan's loudness~ external (available online at web.media.mit.edu/~tristan) employs spectral methods to estimate the time-domain energy of a signal.
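In Python terms, the two networks of Figure 8 reduce to the following sketch (block size and scaling invented for the example); note how far apart the magnitudes of the two estimates lie:

    import numpy as np

    rng = np.random.default_rng(1)
    block = 0.5 * rng.uniform(-1, 1, 735)  # audio between two frame requests

    peak = np.max(block ** 2)              # peakamp~-style estimate
    acc = np.sum(block ** 2)               # accumulator-style estimate

    # Logarithmic scaling of the mean square roughly tracks loudness.
    db = 10 * np.log10(acc / block.size + 1e-12)
    print(peak, acc, db)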
Parameters can also be calculated from a frequency analysis of the performed audio. FFT-based schemes are common, although the latency they introduce can be problematic, and the more resolved a frequency division desired, the longer the latency. Such an analysis can be accomplished with an algorithm implemented using MSP's built-in FFT tools, or the artist can use the variety of custom objects made for the job. Ted Apel's centroid~ external takes the output of an fft~ object and estimates the spectral centroid, the spectral "average" weighted by amplitude. Miller Puckette's fiddle~ object is a polyphonic pitch tracker that produces the detected pitches both as messages for stabilized pitches and as signals for a "continuously" updated estimate.
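The spectral centroid itself is a short computation once an FFT frame is available. This Python sketch (window length and test signal invented) computes the amplitude-weighted average frequency that an object like centroid~ estimates:

    import numpy as np

    def spectral_centroid(frame, sample_rate):
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
        return np.sum(freqs * spectrum) / np.sum(spectrum)

    sr = 44100.0
    t = np.arange(1024) / sr
    frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
    print(spectral_centroid(frame, sr))  # lies between 440 and 880 Hz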
Tristan Jehan's brightness~ and noisiness~ externals are estimators of spectral centroid and spectral flatness, respectively, and his bark~ object provides a spectral analysis with bands chosen according to an auditory model rather than the linear spacing between frequencies that an FFT provides.

The jit.catch~ object provides the ability to transcode data from a signal into a Jitter matrix in a variety of ways. For instance, it is possible to request only the most recent N samples of data, or every sample since the last request, or the most recent frame of data centered around some sort of threshold (like the trigger feature on an oscilloscope). The jit.catch~ object allows one to introduce signal data directly into the matrix world, where further analysis can take place, or where direct synthesis can be the result. The jit.graph object (see Figure 9) renders one-dimensional audio matrices as two-dimensional waveform displays. In addition to displaying these directly, one can use them as further fodder for synthesis; for instance, as keyframes for compositing, or as geometric manipulators.

In terms of analysis, moving audio into the matrix world affords some tricks that can be taken advantage of if the results for intermediary samples are not needed. For example, if an FIR filter is used, perhaps only the final sample must be calculated. Because rendered video needs less than a tenth of a percent as many frames as rendered audio, calculating the results of the FIR filtering in the matrix domain can be a considerable gain in efficiency and may afford expensive analysis that could not otherwise be accommodated in MSP's signal domain. Similar efficiencies can be exploited with FFT analysis, but not with recursive IIR filtering. However, the results of IIR filters used for analysis are not produced in the audio domain, and so the unpleasant sound of a highly nonlinear phase response is no longer an issue. This allows the use of more efficient filter types, such as Chebyshev and elliptical filters (Antoniou 2000).
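The FIR trick can be sketched numerically as follows (tap count and rates invented): when only one filtered value per video frame is wanted, the convolution is evaluated at frame boundaries alone, roughly 60 dot products per second of audio instead of 44,100.

    import numpy as np

    taps = np.hanning(63)
    taps /= taps.sum()                 # a low-pass FIR, normalized
    audio = np.random.default_rng(2).standard_normal(44100)

    step = 735                         # 44,100 Hz / 60 frames per second
    needed = np.arange(len(taps), len(audio), step)

    # One windowed dot product per video frame; intermediary output
    # samples are never computed.
    values = np.array([audio[n - len(taps):n] @ taps[::-1] for n in needed])
    print(len(values))                 # ~60 values per second of audio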
Jitter's Variable Frame-Rate Architecture

Some humans perceive frequencies as high as 20,000 Hz, whereas our eyes refresh the image that is sent to our brain at a rate as much as three orders of magnitude lower. Correspondingly, MSP operates with very strict timing, and Jitter does not. In fact, Jitter was designed so that its frame rate would adapt to the processing power available after audio and high-priority event processing is finished. The operation of this variable frame-rate architecture involves a system of event queues and priorities that the remainder of this article attempts to explain.

The internal Max/MSP engine changed a great deal during the transition from the cooperative multitasking world of Macintosh OS 9 to the pre-emptive multitasking environments of Mac OS X and Windows XP. In version 4.5 of Max/MSP, there are three primary threads of execution. The first is called the main thread. This low-priority thread is responsible for lengthy, expensive operations, such as the servicing of the user-interface objects. The main thread calculates user-interface actions and redraws the screen when one clicks on a button object or changes the value in a number box. The high-priority scheduler thread operates on time-sensitive data, such as incoming data from MIDI interfaces, or bangs emanating from a clocked object like metro. Finally, the very high-priority thread that operates on the MSP signal vectors is called the perform thread.

Although the perform thread is given a higher priority in the operating system's thread-scheduling mechanism than the scheduler thread, and the scheduler is given a higher priority than the main thread, any thread can interrupt any other at any time in a pre-emptive multitasking operating system. Indeed, on a computer with more than one CPU, more than one thread can even execute simultaneously. This can lead to confusing situations for Max programmers when a patch has elements that operate in different threads, occasionally interrupting one another in the midst of a calculation.

The organization of the scheduler thread is further complicated by two options that can be set in Max's DSP Status dialog box. If "overdrive" is disabled, all scheduler data is processed in the main thread.
Figure 9. A visualization network using jit.catch~ and jit.graph. The jit.catch~ object separates the three planes of the output, and the jit.graph objects each render the data in a different semi-transparent color to the same output matrix, which is finally displayed in the window.
thread. If both "overdrive"and "scheduler in audio interrupt"are enabled, all scheduler data is processed in the performthreadimmediately priorto the calculating of the signal vector. If "overdrive"is enabled but "scheduler in audio interrupt" is not enabled, the scheduler thread exists normally. These three
configurations allow Max programmersto tailor the execution of high-priority events to their needs. Disabling overdrive removes the special status of scheduler messages but increases efficiency through the elimination of the additional thread, whereas executing the scheduler in the perform thread enJones and Nevile 67
Figure 10. (a) A patch with two counters banged in the scheduler thread by the output of metro objects. The output of counter A on the left is deferred to the main thread using a usurp mechanism, whereas the output of counter B is deferred as usual (see the text). (b) A schedule of some executing messages from the patch illustrated in (a).
With the latter option, one must be careful to keep the scheduler operations very short. A lengthy scheduler process increases the risk of exceeding the amount of time available to process the signal vector, which may result in audible glitches.

Regardless of the configuration of the scheduler, if the processing resulting from a clocked action is expensive, it is usually wise to transfer the execution of the processing to the main thread, where a lengthy operation will only delay the execution of other time-insensitive actions. This can be accomplished using the defer object or jit.qball. Conversely, execution can be transferred from the main thread to the scheduler thread using the delay object with an argument of 0. Some objects operate in both threads; qmetro, for instance, functions as a metronome internally by using the scheduler thread to clock when bangs should be produced, but instead of sending the bangs in the scheduler thread, it defers their output to the main thread. This deferral is done using a usurp mechanism: if a message is waiting in the main thread's queue and has not yet been produced, any new message coming from the same object will replace the old message in the queue.

Figure 10 provides an illustration of the usurp mechanism in action. The two counter objects in Figure 10a are deferred to the main thread; the counter on the left ("counter A") uses a usurp mechanism, and the counter on the right ("counter B") does not. Figure 10b illustrates a schedule of executing messages as they are passed from the scheduler thread to the main thread. On the first line, counter A has sent out a 1, and this message is placed in the main thread's queue. The second line sees counter B send out a 1, which is also placed on the queue. On the third line, counter A produces a 2, which, owing to the usurp mechanism, replaces the 1 from counter A that was waiting to be produced. The fourth line illustrates the main thread's processing of the first event in the queue, as well as the output of 2 from counter B. The fifth line shows the output of 3 from counter A, which is added to the front of the queue because no other messages from counter A are waiting to be produced.
line, the "3" message from counter B is placed at the front of the queue. The seventh line shows the processing and removal of counter B's 3, as well as the replacement of the 3 from counter A with the new output of 4 owing to the usurping mechanism. In the case of qmetro, the usurp mechanism ensures that only a single bang from the qmetro is ever waiting to be sent out in the main thread's queue. In a situation where a clocked object is connected to a network of Max objects that perform some expensive computations, and bangs are output more quickly than the network can perform the computations, the usurping mechanism prevents stack overflow. Networks of Jitter objects that operate on video matrices iterate over many thousands of pixels. Accordingly, these demanding calculations typically take place in the main thread. In fact, on multiprocessormachines, Jittermaintains a pool of threads for its own use in iterating over large matrices. Figure 11 shows the execution flow in the matrix_calc method of a multiprocessor-capableJitter object. The Max programmerneed not think about these
Figure 11. In a multiprocessor environment, the calculation of some expensive operations is divided between the main thread and one or more worker threads. This division of labor does not affect the relationship between the main thread and the scheduler and perform threads.
It is common to drive a Jitter network with a qmetro object set to a very short period between bangs. Owing to the usurping mechanism discussed above, the result is that the Jitter network calculates new frames as quickly as possible given the computational resources available. Because the scheduler thread and perform thread execute concurrently, processing of audio and time-dependent events is not affected. The frame rate of the video output is dependent on the available computational resources. Because a modern operating system typically operates several dozen processes in the background, each of which requires a varying amount of time to be serviced, the available computational resources are constantly fluctuating. Therefore, the frame rate of the video output also fluctuates. Fortunately, the eye does not have the same stringent periodic requirements as the ear.

This variable frame-rate architecture requires a different mindset from that required by MSP when evaluating the computational load of a patch. Because MSP has a fixed rate at which it must process samples, its computational load can be defined as the ratio of the time taken to calculate a single sample to the period of the audio signal. On the other hand, driving a Jitter network with a quickly resetting qmetro as described above effectively means that the Jitter processing will consume all available processing power after the perform and scheduler threads have done their work. The best way to estimate the different computational loads of different Jitter networks is therefore to compare their frame rates, something that is easily done with the fpsgui object, a graphical object that can be connected anywhere in the network to provide valuable feedback about the frame rate, data type of the matrix, and other information. It is worth noting that the frame rate of the computer's monitor or video projector is a practical upper limit on the rate at which output imagery can be updated. It is a waste of computational power to synthesize images more quickly than the output medium can display them.
Conclusions
Max/MSP is a widely used system for creating audio works. With the addition of Jitter, new visual dimensions are available for electronic artists to explore. We have discussed mappings between sound and image in light of various considerations, from the theoretical treatment of human psychology to the practical demands of the programming environment. Certain mappings make intuitive sense because they have a basis in physics or human perception, or because they are learned in a cultural context. The mappings chosen for a particular work of visual music help define the work's style: an internal logic that creates a ground for meaning. It is our hope that Jitter will prove to be an effective tool for the implementation of novel mappings, and that the resulting instruments will help to communicate new styles of visual music.
References

Allison, J. T., and T. A. Place. 2004. "Teabox: A Sensor Data Interface System." Proceedings of the 2004 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 699-701.

Antoniou, A. 2000. Digital Filters: Analysis, Design and Applications, 2nd ed. New York: McGraw-Hill.

Bevilacqua, F., R. Müller, and N. Schnell. 2005. "MnM: A Max/MSP Mapping Toolbox." Proceedings of the International Conference on New Interfaces for Musical Expression (NIME). Vancouver, BC, Canada.

Donnadieu, S., S. McAdams, and S. Winsberg. 1994. "Context Effects in 'Timbre Space.'" Proceedings of the 3rd International Conference on Music Perception and Cognition. Liège, Belgium: ESCOM, pp. 311-312.

iotaCenter. 2000. Kinetica 2 Exhibition Catalog. Los Angeles: iotaCenter.

Köhler, W. 1929. Gestalt Psychology. New York: Liveright.

Mattis, O. 2005. Visual Music. London: Thames and Hudson.

Meyer, L. B. 1956. Emotion and Meaning in Music. Chicago: University of Chicago Press.

Puckette, M. 1988. "The Patcher." Proceedings of the 1988 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 420-429.

Puckette, M. S. 1997a. "Pure Data: Recent Progress." Proceedings of the Third Intercollege Computer Music Festival. Tokyo: Keio University, pp. 1-4.

Puckette, M. S. 1997b. "Pure Data: Another Integrated Computer Music Environment." Proceedings of the Second Intercollege Computer Music Festival. Tachikawa: Kunitachi College of Music, pp. 37-41.

Ramachandran, V. S., and E. M. Hubbard. 2001. "Synaesthesia: A Window Into Perception, Thought and Language." Journal of Consciousness Studies 8(12):3-34.

Rule, K. 1996. 3D Graphics File Formats: A Programmer's Reference. Boston: Addison-Wesley.

Schwarz, M., and L. Shaw. 1975. Signal Processing: Discrete Spectral Analysis, Detection, and Estimation. New York: McGraw-Hill.

Werner, H. 1934. "L'unité des sens." Journal de Psychologie Normale et Pathologique 31:190-205.

Zicarelli, D. 1998. "An Extensible Real-Time Signal Processing Environment for Max." Proceedings of the 1998 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 463-466.