
HEXAPHONIC GUITAR TRANSCRIPTION AND VISUALISATION

Iñigo Angulo, Music Technology Group, Universitat Pompeu Fabra, [email protected]
Sergio Giraldo, Music Technology Group, Universitat Pompeu Fabra, [email protected]
Rafael Ramirez, Music Technology Group, Universitat Pompeu Fabra, [email protected]

ABSTRACT

Music representation has been a widely researched topic for centuries. Transcription of music through the conventional notation system has dominated the field for the better part of that time. However, this notational system often falls short of communicating the essence of music to the masses, especially to people with no musical training. Advances in signal processing and computer science over the last few decades have bridged this gap to an extent, but conveying the meaning of music remains a challenging research field. Music visualisation is one such bridge, and it is the one we explore in this paper. This paper presents an approach to visualising guitar performances by transcribing musical events into visual forms. To achieve this, hexaphonic guitar processing is carried out (i.e. each of the six strings is processed as an independent monophonic sound source) to obtain music descriptors, which reflect the most relevant features that characterise a sound. Once this information is obtained, our goal is to analyse how different mappings to the visual domain can represent music meaningfully and intuitively. As a final result, a system is proposed to enrich the music listening experience by extending the perceived auditory sensations to include visual stimuli.

Copyright: © 2016 I. Angulo, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

1. INTRODUCTION

Music is one of the most powerful forms of artistic expression. Throughout history, humans have shared the musical realm as part of their culture, with different instruments and compositional approaches. Often, music can express what words and images cannot, and thus it remains a vital part of our daily life. Advances in technology over the last decades have brought us the opportunity to deepen our understanding of music in the context of other senses such as sight, which dominates over the other senses for representing information. In this work we propose a system to extend music by developing a visualisation/notation approach that maps the most important features characterising musical events into the visual domain.

The idea of providing mechanisms for understanding music with our eyes is not new, as traditional music notation (i.e. scores) can give us an idea of the acoustic content of a piece without the need to listen to it first. However, our approach is not intended as a performance instructor, but as a visual extension of the musical events that make up a performance or piece. Our goal is to develop a system that visually represents the musical features that best characterise the music produced by a guitar, that is, a real-time visual representation system for guitar performances. One challenge for the development of such a system is the polyphonic nature of the guitar.
The complexity of polyphonic sound transcription is well known, so to address this issue we opted for a hexaphonic guitar, in which each string is processed as an independent monophonic sound source, simplifying the transcription of the sounds. We propose a way of transforming a conventional classical guitar into a hexaphonic one. Once the desired musical features are obtained, different ways to represent them are studied, analysing the mappings between sound and visual dimensions. The aim of this work is to offer a tool in which information about the musical events (e.g. pitch, loudness, harmony) of a guitar performance is visualised through a graphical user interface. Additionally, these visualisations could enrich the experience of listening to music by accompanying musical events with visual stimuli.

2. STATE OF THE ART

A music notation system can be any symbolic notation that transmits sufficient information for interpreting a piece [1]. In our case, we wanted to explore the interconnection of the musical and visual fields, following the concept of bimodal art expression, the result of joining music and visuals, in which both dimensions are equally relevant since they are not perceived separately but as a whole [2]. Throughout history, this kind of work has often been discussed in relation to "synaesthesia", a neurological condition in which a stimulus in one sense causes an involuntary response in another. The disadvantage of synaesthesia, from a scientific point of view, is its idiosyncrasy: two synaesthetes with the same type of synaesthesia will most likely not share the same synaesthetic experiences [3]. This means we cannot build an objective basis for connecting the visual and musical domains on synaesthetic experiences alone.

2.1 Music Visualisation

Music visualisation is a field that has attracted the attention of researchers, scientists and artists for centuries. Many attempts have been made to create machines that join music and visuals, for example the "Color Organ" (or "Light Organ") created by Louis Bertrand Castel in the 1730s. Further examples of early inventions, and the history of this field, can be found in [4][2]. As mentioned previously, synaesthesia seemed to play a very important role in the early works in the field. Despite the subjectivity of synaesthetic cases, many people have proposed theories that try to relate the musical and visual domains accurately. An example of this is the mapping of note pitch to colour, a topic that has been addressed by many researchers in the past; see Figure 1 [4].

Figure 1. Pitch to colour mappings through history.

Nowadays, the music visualisation field is very active. The idea of joining sound and visuals to enrich the experience of music continues to attract the attention of many researchers, and different theories support this idea. An example is Michel Chion's widely established concept of synchresis: "(...) the spontaneous and irresistible weld produced between a particular auditory phenomenon and visual phenomenon when they occur at the same time". This theory, which asserts that synchronised auditory and visual events provide "added value", is vital knowledge for sound in cinema. These theories, and especially synaesthesia, contributed to the argument that multimodal perceptions are not processed as separate streams of information, but are fused in the brain into a single percept [2].

Music visualisation comprises many different approaches with different purposes. These may include the simple representation of a waveform or a spectrum to visualise a signal in the time or frequency domain; the transcription of sound as accurately as possible, using scores or another notational system (the "objective approach"); and the artistic visualisation of sound, which aims to create beautiful imagery to accompany music and provoke a sensory response in the listener/viewer (the "subjective approach"). In the next sections some examples of the different approaches are presented.

2.2 Music Visualisation Systems

At present, many systems exist whose aim is to combine the sound and visual dimensions of music, generally with an artistic ("subjective") approach, as the output often consists of abstract imagery that accompanies the music. As technology has progressed, so have the tools that permit the exploration of the relation between these art modalities. Artists from many disparate fields experiment within this field, and this activity has led to the development of different practices over time. One example is Video Jockeys (VJs), who perform by mixing videos and applying effects to them, normally in the context of some background music. Furthermore, concepts such as Visual Music or Audiovisual Composition have appeared together with these practices [2].

One example of a music visualisation system is Soma [2]. It was created in 2011 by Ilias Bergstrom for the live performance of procedural visual music/audiovisual art. The author wanted to improve the established practice of this art form and break through its main limitations, which included constrained static mappings between visuals and music, the lack of a user interface for controlling the performance, limited possibilities for collaborative performances between artists, and the complexity of preparing and improvising in performances. With those ideas in mind, he proposed a system, both in hardware and software, to address these limitations. Figure 2 reflects the architecture of this system.

Figure 2. Illustration of the signal flow in the Soma system.

Another interesting example of a music visuals system is Magic [5]. Magic is an application that allows one to create dynamic visuals that evolve from audio inputs. It is conceived as a tool for VJing, music visualisation, live video mixing, music video creation, and so on. It allows one to work with simultaneous audio and MIDI inputs, both pre-recorded tracks and live input signals, computing different features in each case, for example the overall amplitude of the signal, the amplitude per band, pitch and brightness, or MIDI features such as velocity, pitch bend or channel pressure. Figure 3 shows the graphical user interface of the software. The boxes are modules which have different functionalities and are interconnected to create scenes.

Figure 3. Magic Music Visuals graphical user interface.

The previously presented systems (Soma and Magic) are examples of systems that permit the visualisation of music by creating mappings between sound and visual features. The created visualisations are directly controlled by the time-varying musical features extracted from the audio signals. These features, such as pitch, loudness or rhythm, are normally computed using signal processing techniques.
However, many systems of this kind are more generic, as they are designed to control the visuals from a group of DMI (Digital Musical Instrument) controllers, or directly from a stereo audio signal representing the mix of all the instruments in the piece. We propose a system specifically designed for the guitar.

2.3 Music Visualisation Systems Based on Guitar

The systems analysed in this section are closer to the "objective approach", as their visualisations aim at transcribing music so that it can be reinterpreted. In other words, they visualise music from a transcription/notation perspective and provide performance instruction rather than a "creative visualisation" of the music being played. One of the reasons that contributed to the popularity of these systems was the release of Guitar Hero [6], a video game that appeared in 2005, which aimed to recreate the experience of playing music and make it available to everyone as a game. It consisted of a guitar-shaped DMI controller through which music was "interpreted" by the player, guided by instructions that appeared on the screen. These instructions consisted of the notes that compose a particular song, presented over time. With the original song's backing track playing, the aim of the player was to press buttons on the guitar controller in time with the musical notes scrolling across the game screen.

Another game, Rocksmith [7], was released in 2011. It followed the same idea of music performance instruction as Guitar Hero, but with an essential difference: a real guitar was used instead of a DMI controller. The idea behind the game was to be able to use any electric guitar, so it was approached as a method for learning to play the guitar. The game offered a set of songs, and for each of them a performance instruction was presented based on the notes to be played over time. The player then received feedback about the quality of the performance.

Other systems follow the same approach of visualising guitar music as notation, giving the user the instructions needed to reinterpret a particular piece. Examples include GuitarBots and Yousician [8]. These systems provide an easier way of learning to play guitar by guiding the user with instructions about what to play.

The system we propose in this paper lies in this last category of "objective" music visualisation systems, which aim to transcribe music accurately. Our goal is to provide a guitar performance representation tool that transcribes sounds into visual forms instead of traditional notation (i.e. scores), which could be more meaningful for people without musical training. The system reproduces musical events in real time, thus creating visual stimuli for both the listener/viewer and the musician. However, although our first intention is to reproduce guitar performances accurately, we also want to explore the artistic representation domain (once we have enough sound and visual features to be mapped) by creating more abstract and striking visualisations, in order to analyse whether this leads to a stronger sensory perception of music by the user of the system.

3. MATERIALS

3.1 Essentia

Essentia [9] is an open-source C++ library for audio analysis and audio-based music information retrieval. The library focuses on the robustness of the music descriptors it offers, as well as on optimising the computational cost of its algorithms.
Essentia offers a broad collection of algorithms which compute a variety of low-level, mid-level and high-level descriptors useful in the music information retrieval (MIR) field. The library is cross-platform and provides a powerful tool that collects many state-of-the-art techniques for the extraction of music descriptors, optimised for fast computation on large collections.

3.2 Processing

Processing [10] is a programming language and environment targeted at artists and designers. It has many different uses, from scientific data visualisation to artistic explorations, and is based on the Java programming language. In the context of music visualisation, Processing has several libraries for dealing with audio, such as Minim, which lets one work with different audio formats and apply many signal-processing techniques to obtain musical features. The visualisations can be controlled with the results of computing these features, and range from raw frame-level representations that visualise the data as faithfully as possible to the perceived music (for example, drawing the waveform of a frame) to generative visualisations that focus on producing beautiful images and striking effects.

3.3 Hardware

This research project also covers the hardware implementation of the hexaphonic guitar. Our approach is based on transduction using piezoelectric sensors. These sensors work on the piezoelectric principle, whereby electric charge accumulates in certain solid materials in response to applied mechanical stress. Each sensor is therefore able to transform the mechanical force exerted on it by the vibration of a string into an electric current representing that vibration. In addition to these sensors, other materials are needed, such as wires to build the circuits and 1/4" TS (jack) connectors to be plugged into the computer's audio input device.

4. METHODS: DEVELOPMENT

4.1 Hexaphonic Guitar Construction

As explained earlier, some hardware is needed to transform a traditional guitar into a hexaphonic one: six piezoelectric sensors, six 1/4" TS jack connectors and twelve wires to interconnect them. With this material a circuit was built to capture the signal from each string independently and send it through a cable to the computer's audio input device, via a common six-channel audio interface.

Figure 4. Hexaphonic guitar construction scheme using piezoelectric sensors.

The construction is shown in Figure 4. Each piezoelectric sensor is soldered to a jack connector as shown in the scheme. The tip of the jack (the short segment at the end) is connected to the white inner circle of the piezoelectric disc and transmits the signal; the sleeve of the jack is connected to the golden outer surface of the disc. Each sensor is cut to size and placed between the string and the wood of the bridge of the guitar, which is where we found the vibration of the string was best captured. Once this is done for each of the strings, the output jack connectors are plugged into different channels of the audio interface, so that the signals can be processed independently. These sensors act like small microphones capturing the sound produced by each string.
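Before the per-string analysis described in the next subsection, the six channels have to be read independently on the computer side. The following is a minimal, illustrative sketch of such a capture loop; it assumes a six-channel audio interface and the third-party Python package `sounddevice`, neither of which is specified in the paper, so this is not the authors' implementation.

```python
# Illustrative sketch: read a 6-channel input and split it into per-string buffers.
# Assumes the `sounddevice` package and an interface exposing 6 input channels.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 44100   # Hz (assumed)
FRAME_SIZE = 2048     # samples per analysis frame (assumed)

def on_audio(indata, frames, time, status):
    """Audio callback: `indata` is a (frames, 6) float32 block."""
    if status:
        print(status)
    # One mono buffer per string; here channel 0 = 6th string ... channel 5 = 1st string.
    per_string = [np.ascontiguousarray(indata[:, ch]) for ch in range(6)]
    # Each buffer would then be handed to the per-string analysis stage (Section 4.2).

stream = sd.InputStream(channels=6,
                        samplerate=SAMPLE_RATE,
                        blocksize=FRAME_SIZE,
                        callback=on_audio)
with stream:
    sd.sleep(10_000)  # capture for ten seconds
```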
4.2 Audio Signal Processing

Once we had our six separate audio signals, corresponding to each of the strings, we processed them using the Essentia library to obtain meaningful musical features. For our purpose, several descriptors were used to extract the desired features from the sound, such as PitchYinFFT, Loudness and HPCP, which provide information about the pitch, energy and chroma of the notes. These were computed in real time and sent to Processing, where they were used to control the visualisations. In addition, a map of frequencies was created, corresponding to the frequencies of the notes along the fretboard of the guitar in standard tuning. Hence, obtaining the fundamental frequency of the note played on each string (which is straightforward, as the signal for each string arrives on a separate input channel) tells us exactly which frets, and on which strings, were being played at a given moment.

4.3 Visualisation

To visualise the musical features previously extracted, we used Processing. With this software, a graphical interface was created to display the musical features using different visual forms (Figure 5). At present, we have developed a simple visualisation to test the system. The number of musical features involved, as well as the quality of the mappings and the visualisation approach, will be revised and enhanced in the future.

We presented the information on a 2D plane, in which the X-axis represented the six different strings and the Y-axis the pitch height. The notes were represented using circles. From left to right on the X-axis, the strings were visualised as vertical lines from the 6th to the 1st. For example, any note played on the 4th string will always be represented on the same "invisible" vertical line, which crosses the X-axis at the point corresponding to that string. The loudness of each note was mapped to the size of the circle, which becomes smaller as the note decays. Also, having built the frequency map, it was easy to know which particular note was played, so the name of the corresponding note is plotted in the centre of the circle. To easily distinguish one note from another, notes were mapped to the range of visible colours: the lowest frequency on the guitar (E in standard tuning) is mapped to the lowest frequency in the visible range (red). The reason for this was more aesthetic than scientific. Figure 6 shows an approximation of this mapping.

Figure 5. Example of visualisation interface.

Figure 6. Note to colour mapping.
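To make the analysis and mapping described above concrete, here is a hedged sketch of how one analysis frame of a single string could be processed: pitch and loudness are extracted, the detected fundamental is matched to a fret via the frequency map, and the pitch is mapped to a hue. It assumes Essentia's Python bindings (the paper uses the C++ library), standard-tuning open-string frequencies, and illustrative constants (frame size, confidence threshold, colour range); it is a sketch, not the authors' implementation.

```python
# Sketch of per-frame analysis and sound-to-visual mapping for one string.
# Assumes Essentia's Python bindings; constants below are illustrative.
import math
import essentia.standard as es

SAMPLE_RATE = 44100
FRAME_SIZE = 2048

OPEN_STRINGS_HZ = [82.41, 110.00, 146.83, 196.00, 246.94, 329.63]  # E2..E4, 6th to 1st string
N_FRETS = 19  # assumed fretboard length for the frequency map

window = es.Windowing(type='hann')
spectrum = es.Spectrum()
pitch_detector = es.PitchYinFFT(frameSize=FRAME_SIZE, sampleRate=SAMPLE_RATE)
loudness = es.Loudness()

def hz_to_fret(string_index, f0):
    """Nearest fret on the given string for a detected fundamental, or None if out of range."""
    open_hz = OPEN_STRINGS_HZ[string_index]
    fret = round(12.0 * math.log2(f0 / open_hz))
    return fret if 0 <= fret <= N_FRETS else None

def hz_to_hue(f0, low_hz=82.41, high_hz=1318.5):
    """Map pitch to a hue in [0, 0.95]: low E -> red (0), higher notes towards violet."""
    semitones = 12.0 * math.log2(f0 / low_hz)
    span = 12.0 * math.log2(high_hz / low_hz)
    return max(0.0, min(0.95, 0.95 * semitones / span))

def analyse_frame(string_index, frame):
    """Return (fret, f0, energy, hue) for a float32 frame of FRAME_SIZE samples from one string."""
    f0, confidence = pitch_detector(spectrum(window(frame)))
    energy = loudness(frame)  # could drive the circle size in the visualisation
    if confidence < 0.8:      # illustrative threshold: treat low-confidence frames as silence
        return None, f0, energy, None
    return hz_to_fret(string_index, f0), f0, energy, hz_to_hue(f0)

if __name__ == "__main__":
    import numpy as np
    t = np.arange(FRAME_SIZE) / SAMPLE_RATE
    test_frame = np.sin(2 * np.pi * 196.0 * t).astype(np.float32)  # open G (3rd string)
    print(analyse_frame(3, test_frame))
```

In a real-time setting, the values returned by a routine like `analyse_frame` would be sent to the Processing sketch (the paper does not state the transport used) to position, colour and scale the circles described in Section 4.3.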
5. EVALUATION

5.1 Experiments

As this research project is still a work in progress, we prepared a simple evaluation based on visualisations of some basic guitar riffs/phrases. We focused on four types of guitar phrases: two different chord progressions, a melody, an arpeggio, and a solo. The phrases were played in the same key, in order to produce similar visualisations (same colours, localisation of notes, etc.). We proposed three different experiments to the users.

In the first experiment, one of the two chord progression recordings (Figure 7 and Figure 8) was presented to the user, and then the visualisations of the two chord progressions were shown in silence. The user had to choose the visualisation that matched the audio recording. The second experiment was the opposite: given one visualisation (presented in silence), the user had to select from two recordings the one that matched that visualisation, in this case using fragments of the solo and melody (Figure 9) phrases. In addition, the user was asked to indicate the difficulty he/she found in doing the first two experiments.

Figure 7. Chord progression 1 score.
Figure 8. Chord progression 2 score.
Figure 9. Melody score.
Figure 10. Arpeggio score.

The last test consisted of listening to all the phrases (sequentially, presented as a song) together with their corresponding visualisations, and afterwards answering some questions to rate the system. The questions evaluated the system in terms of:

• mapping quality and meaningfulness;
• expressiveness, subjectively evaluated by the user, considering whether the visualisations led to a stronger experience of music (multimodal perception);
• interest, whether the system was considered interesting/promising by the user;
• utility, in which context a system like this would be used by the user.

The answers consisted of a score from 1 to 5 to express agreement, disagreement or neutrality, in addition to a text box in which the users could write their opinions, suggestions, or ideas for improvements.

5.2 Results

The experiment was conducted with 20 participants whose ages ranged from 21 to 55. They had different backgrounds and levels of musical training. Their musical taste also varied, as did the frequency with which they attended concerts and listened to music. Table 1 summarises the results of the experiments.

Table 1. Experiment results.
                     Test 1   Test 2
Correct answers      80%      75%
Difficulty (1-5)     2.9      3.2

80% of the users identified the correct answer in the first experiment, with a mean difficulty rating of 2.9 (on the 1-to-5 scale, where 1 was easy and 5 difficult); 75% of the users answered the second experiment correctly. In particular, participants with musical education and/or guitar players found the task easy, were able to distinguish between the three visualisations, and could even imagine how the music would sound before listening to it.

Table 2. System rating.
Mapping quality/meaningfulness   Expressiveness   Interest
4.3                              4.2              4.8

Table 2 shows the users' rating of the system in terms of mapping quality and meaningfulness, expressiveness and interest. The score ranges from 1 to 5, in which 1 means disagreement and 5 means strong agreement.

Several comments were made about the mappings. Most users found the proposed connections between the sound and visual domains intuitive, but many of them questioned the use of colour to identify notes. Also, most of the users liked the experience of simultaneous music and visuals, but some said the visualisations were very basic and suggested that developing more "artistic" visualisations would work better and transmit more sensations. All the users found the system very interesting and suggested different contexts in which it could be used. Most of them proposed using the system in live music performances and concerts, to reinforce the emotions a particular piece of music tries to evoke in the listeners; some participants suggested that the system could be used as a didactic tool to help people learn to play the guitar, and musical concepts in general. Moreover, some participants suggested using it "as a tool to emphasize sensorial experiences for infants in primary education and give support in art classes", or even "for helping disabled people (i.e. people with hearing problems) to perceive and experience music".

6. CONCLUSIONS

Music is one of the most powerful forms of artistic expression, and advances in technology have opened new paths towards its exploration.
Nowadays, many researchers focus their work on accurately representing music, without forgetting its most emotive dimension: its power to evoke sensations in us. Our interest lies in representing guitar music in order to offer, through the system described in this paper, a way to experience it visually. Throughout the experiments that were carried out, we noticed that people found the system interesting and promising in many different contexts, and that they liked the experience of simultaneous musical and visual stimuli. This research project is a work in progress; however, the process of evaluating it and obtaining feedback from users was very useful for gathering ideas to improve the system, and also for demonstrating its appeal to the public. The next step will be to study how additional musical features could contribute to a system that is more useful for the musician (by including more detailed and complete information about musical events), as well as more attractive and captivating for the audience, through the design of new approaches to visualising the artistic dimensions of music.

Acknowledgements. This work has been partly sponsored by the Spanish TIMUL project (TIN2013-48152-C2-2-R) and the H2020-ICT-688268 TELMI project.

7. REFERENCES

[1] A. P. Klapuri, "Automatic Music Transcription as We Know it Today," J. New Music Res., vol. 33, no. 3, pp. 269-282, 2004.
[2] I. Bergstrom, "Soma: live performance where congruent musical, visual, and proprioceptive stimuli fuse to form a combined aesthetic narrative," 2011.
[3] R. Ivry, "A Review of Synesthesia," Univ. California, Berkeley, 2000.
[4] M. N. Bain, "Real Time Music Visualization: A Study In The Visual Extension Of Music," The Ohio State University, 2008.
[5] "Magic Music Visuals: VJ Software, Music Visualizer & Beyond." [Online]. Available: https://magicmusicvisuals.com/. [Accessed: 14-Jun-2015].
[6] "Guitar Hero Live Home | Official Site of Guitar Hero." [Online]. Available: https://www.guitarhero.com/. [Accessed: 01-Nov-2015].
[7] "Rocksmith® 2014 | Official Site (Spain) | Ubisoft®." [Online]. Available: http://rocksmith.ubi.com/rocksmith/es-es/home/. [Accessed: 01-Nov-2015].
[8] "Yousician." [Online]. Available: https://get.yousician.com/. [Accessed: 01-Nov-2015].
[9] D. Bogdanov, N. Wack, E. Gómez, S. Gulati, P. Herrera, O. Mayor, G. Roma, J. Salamon, J. Zapata, and X. Serra, "ESSENTIA: an open-source library for sound and music analysis," Proc. ACM SIGMM Int. Conf. Multimedia, 2013.
[10] C. Pramerdorfer, "An Introduction to Processing and Music Visualization."