Composing Music On Paper and Computers: Musical Gesture Recognition
Bobby Owolabi
Department of Computer Science
Human Computer Interaction Lab
University of Maryland
College Park, MD 20742
[email protected]
ABSTRACT
Paper is preferred and utilized far more than computers in the composer's music creation cycle because it is the natural medium in which music notation convention is learned. Current music notation software utilizes only WIMP interfaces (Windows, Icons, Menus, Pointer). Our system enables users to create musical compositions utilizing digital pen technology and have their work captured and recognized as music notation in the digital world. This recognized pen gesture data has the potential of being imported into a popular music notation composition program for editing purposes.

Figure 1: Music created on Anoto paper with a digital pen.

Figure 2: Written music as it would appear entered in Finale software [2].
INTRODUCTION
Despite technological breakthroughs in computing technology in the last few decades, paper is being used increasingly [8]. This is especially true in artistic and creative disciplines, such as music, where paper tends to be utilized a great deal. This state of affairs exists because of the complementary sets of affordances that paper and digital documents provide [4]. Paper is portable, flexible and inexpensive [8], while digital documents are easy to transmit, store and modify. For instance, when a composer is in the early stages of creating a musical piece, it is far easier to take out a sheet of paper at the spark of inspiration than a technological device, because one can express oneself on paper more quickly than by waiting for a device to load. At the same time, there are reasons a composer would want to use a computer, such as instant playback or the ability to quickly transpose a piece to a new key. Using the traditional written approach in conjunction with computer software often presents challenges.

Software applications such as Finale and Sibelius [2][9] enable users to create in the digital realm the same musical notation they would on paper; however, WIMP (Windows, Icons, Menus, Pointer) interfaces are not serving composers well in their composition activities because they bear little resemblance to the handwritten techniques learned by many musicians [3]. This shortcoming can hinder the music creation cycle.

Software designed for the tablet PC, such as The Music Notepad [3], is a move towards a more natural interface for composers because of the familiarity of its pen-based input; however, affordances offered by paper, such as flexibility and portability, are often lost.

Our system attempts to take the affordances offered by both paper and digital documents and unify them into one system. Users will be able to create traditional handwritten music and have their creations captured and recognized as music notation in the digital world. Our system will better support the musician's music creation cycle by reducing the time composers spend synchronizing their works between the physical and digital realms.

RELATED WORK
Investigations have been performed to address many of the difficulties that composers experience when utilizing today's musical notation software, and also to understand the interactions composers engage in with paper and computers. Three categories of work are relevant to our system: the role of paper and computers, tablet PC interfaces and digital pen technology.
The Role of Paper and Computers
In trying to develop an interface that will support
composers in their activities, it is important to
understand how composers create music and their
interactions with computers and paper.
In the Paperroles Project [6], the music creation
cycle of composers was investigated. The description
provided by the Paperroles Project leans more towards
classical composers and culture; however, it provides
an overview of the interactions with paper and
computers by people in the musical domain and can be
summarized into three basic parts.
In the beginning, composers prefer to work on paper because of the freedom of expression that it offers. They are not bound by the limitations of a software program, such as slow input, hardware loading time and portability issues such as the weight and size of hardware.
In the middle of the cycle, there is a mixture of paper and computer use. Computers provide composers with ease of modification. Often, composers experience difficulties during the middle stage because they are slowed down by slow user input speed.
In the end, most composers in the classical genre prefer paper for archival purposes.
The Paperroles Project offers a design scenario that proposes taking the handwritten document of a user, utilizing digital pen technology to recognize the written gestures, and providing the ability for the musical notation to be opened in an application in digital form for further editing. Our system aims to demonstrate the recognition framework of this interface.

Tablet PC Based Interfaces
A major trend currently being investigated is pen-based interfaces over traditional mouse interfaces. Examples include Presto [5] and The Music Notepad [3].
There are many complexities in detecting and recognizing musical gestures because of the vast array of possible handwriting styles of users and interpretations of music notation convention. Proposed systems have offered simplified gestures that correspond to music notation objects. With these mappings, designers aim to offer a quickly learnable interface to users while more accurately detecting the true intention of a user's gestures. The Presto and Music Notepad [5][3] systems have taken this route.

Figure 3: Main gestures in the Presto system, reproduced from the original text [5].

The researchers of the Presto system performed an investigation of users' handwritten music composition habits and styles and developed a proposal to replace standard music notation with simplified versions to make gesture recognition easier and more accurate. The collection of gestures is presented in a way that encourages building a note rather than providing gestures that are directly mapped to specific music notation objects (Figure 3).
The Music Notepad is an application that supports music notation entry and also replaces standard music notation with simplified versions that are different from the Presto system's. The Music Notepad offers a larger set of gestures that map to each music notation object (Figure 4). The Music Notepad also utilizes a special stylus with buttons that correspond to different classes of gestures.
The Presto system offers a smaller table of gestures for the user to learn while allowing multiple ways to draw gestures. This characteristic would most likely enable the user to adapt to the system more quickly than with the Music Notepad. However, some of Presto's gestures do not resemble the notation they are mapped to; for instance, the half note (Figure 3). Unfamiliar gestures increase the learning curve of the system. The Music Notepad's gestures more closely resemble the notation they are mapped to and can, for the most part, be drawn in one stroke, whereas Presto's gestures may take multiple strokes to complete, resulting in slower gesture creation.
Written music consists of notes, which convey pitch and rhythm, and of symbols that describe how the notes should be played, called articulations. During this stage of system development, our main focus is on the detection of notes.
Staff
Music is written on a staff. A staff (Figure 5) consists of a series of horizontal lines, most commonly five, stacked and equally spaced, that run across the page.
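The staff model lends itself to a simple query: the nearest line or space to a gesture's y-coordinate determines the pitch. A minimal sketch in Python, assuming screen coordinates (y grows downward) and a treble-clef staff; the names `Staff`, `step_for_y` and `pitch_for_y` are illustrative, not from the paper:

```python
class Staff:
    """Five equally spaced horizontal lines; y grows downward (screen coords)."""

    def __init__(self, top_y: float, line_spacing: float):
        self.top_y = top_y
        self.spacing = line_spacing
        # y-coordinates of the five lines, top to bottom
        self.line_ys = [top_y + i * line_spacing for i in range(5)]

    def step_for_y(self, y: float) -> int:
        """Nearest staff step: 0 is the top line, each +1 is a half-spacing
        down, so even steps land on lines and odd steps in the spaces."""
        return round(2 * (y - self.top_y) / self.spacing)

# Treble clef: the top line is F5, and each step down is one scale degree.
TREBLE_STEPS = ["F5", "E5", "D5", "C5", "B4", "A4", "G4", "F4", "E4"]

def pitch_for_y(staff: Staff, y: float) -> str:
    return TREBLE_STEPS[staff.step_for_y(y)]
```

For example, a note head drawn at y = 125 on a staff whose top line is at y = 100 with 10-unit spacing lands on step 5, the space just below the middle line, which is A4 in treble clef.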
Notes
Notes consist of four fundamental parts (Figure 6; see Appendix A for examples): (1) the head, an empty or filled circle whose position on the staff specifies pitch; (2) the vertical beam, which is connected to the head; (3) the stem, which intersects vertical beams and adds meaning to the duration of a note; (4) horizontal beams, extended stems that connect several notes.

Figure 4: Gestures from the Music Notepad, reproduced from the original text [3]. Gestures on the left are drawn by the user; gestures on the right are the resulting gestures drawn by the computer.
[Figure 7 diagram: User Gesture → Recognizer → MusicSheet → MusicNotation]
Figure 7: Overview of the detection process. The user makes a gesture with the digital pen; once the pen is lifted off the paper, the stroke is sent to the Recognizer to get its fundamental shape (e.g., line or circle). It is then sent to the MusicSheet object to get its location on the sheet as well as its spatial relationship with the other gestures on the page. Based on that information, it is recognized as a MusicNotation object and stored.
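The two-stage flow in Figure 7 can be sketched end to end. Everything below is an illustrative stand-in (the paper gives no code): a crude closure test plays the role of the Recognizer, and a toy MusicSheet performs the spatial classification.

```python
import math

def recognize_shape(points):
    """Stage 1 stand-in: label the raw stroke as a fundamental shape.
    A stroke that ends near where it began is treated as a circle."""
    closed = math.dist(points[0], points[-1]) < 5
    return "circle" if closed else "line"

class MusicSheet:
    """Toy stand-in for the paper's MusicSheet object."""

    def __init__(self):
        self.objects = []  # recognized labels, in drawing order

    def classify(self, shape, points):
        """Stage 2 stand-in: spatial analysis turns a shape into notation.
        Here, circles become note heads and lines become stems or beams."""
        label = "note-head" if shape == "circle" else "stem-or-beam"
        self.objects.append(label)
        return label

def process_stroke(points, sheet):
    """Figure 7 flow: user gesture -> Recognizer -> MusicSheet -> MusicNotation."""
    return sheet.classify(recognize_shape(points), points)
```

A closed loop of points is classified as a note head, while a long open stroke becomes a stem or beam; the real system replaces both stand-ins with the $1 Recognizer and staff-aware spatial queries described below.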
Articulations
Articulations are symbols drawn around notes that provide information about how the note should be played. For instance, an articulation may express that a series of notes should be played smoothly and connected, with minimal space of rest between them; whereas another articulation may express that a series of notes should be played very short and fast, with more rest space between them (for example, see the accent in Appendix A).

SYSTEM
Our system enables users to write musical compositions utilizing digital pen and paper technology and have their work captured and recognized in the digital world, with the potential of importing the data into a popular music notation composition program.
There are three main components of our system: (1) preprinted paper, (2) a digital pen and (3) a recognizer.

Preprinted Paper
In our system, we preprinted staves on paper and stored the location of the staves in our digital model of the page. The staves were printed on Anoto [1] paper. This special paper, as described below, enables the digital pen to record the user's strokes.

Digital Pen
The digital pen has the properties of a regular pen; however, it has a small camera at the bottom that records the coordinates of the user's gesture. The coordinates can either be stored in the pen's memory and sent to the computer at a later time, or streamed in real time over Bluetooth. The digital pen is able to know its location on the sheet of paper and record the coordinate points of user-drawn gestures because of special properties of the paper. Every sheet of Anoto paper has a unique id number and a pattern of small dots that enables the pen to know the exact paper and its location on it, thus enabling it to record user gestures.

System Components
Once a user draws a gesture, it goes through two stages: the initial recognition and spatial analysis on the MusicSheet object (Figure 7).
The main elements of the system are:
• MusicSheet – A MusicSheet object is a digital representation of a physical sheet of music. It is created based on key parameters of the physical sheet music, such as the y-coordinate values of the staff lines and the distance between staff lines. It consists of a series of Staff objects. Staff objects consist of a group of five horizontal lines. Each Staff object contains the gestures that the user has drawn on it. A MusicSheet object allows for ease of common queries, such as finding which staff a user's gesture falls on. To query the pitch of a given note, the MusicSheet object can query its Staff objects and discover the pitch of the note based on where it falls on the lines.
• MusicNotation – Represents one of the standard musical symbols (Appendix A). Recognition is determined by the spatial relationships of the fundamental shapes that the user draws to build a musical gesture.
• Recognizer – Detects fundamental shapes that the user draws in one pen stroke. Its main component is an implementation of the $1 Recognizer [10]. The $1 Recognizer consists of a four-step algorithm in which a candidate user gesture is compared with pre-defined templates of various expected gestures.
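The four $1 Recognizer steps (resample, rotate, scale and translate, compare against templates) can be sketched compactly. This is a simplified reading of the algorithm in [10], for illustration only: it rotates once to the indicative angle rather than performing the full golden-section search over candidate rotations, and all helper names are ours.

```python
import math

N = 64  # points per resampled gesture

def _path_length(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def _resample(pts, n=N):
    """Step 1: resample the stroke into n equally spaced points."""
    pts = [tuple(p) for p in pts]
    interval = _path_length(pts) / (n - 1)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # guard against floating-point shortfall
        out.append(pts[-1])
    return out[:n]

def _centroid(pts):
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def _rotate_to_zero(pts):
    """Step 2: rotate so the centroid-to-first-point angle is zero."""
    cx, cy = _centroid(pts)
    theta = -math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c)
            for x, y in pts]

def _scale_translate(pts, size=100.0):
    """Step 3: scale to a reference square, move the centroid to the origin."""
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    pts = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = _centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def _normalize(pts):
    return _scale_translate(_rotate_to_zero(_resample(pts)))

def recognize(candidate, templates):
    """Step 4: return the template name with the smallest mean point distance."""
    cand = _normalize(candidate)
    scored = {name: sum(math.dist(a, b) for a, b in zip(cand, _normalize(t))) / N
              for name, t in templates.items()}
    return min(scored, key=scored.get)
```

Because the normalization removes position, scale and starting angle, a small circle drawn anywhere on the page matches a circle template regardless of where the pen started the loop.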
[Figures 8 and 9 diagrams: user-drawn gestures with their bounding boxes]
Figure 8: Example of how gestures are linked together to form one object. If a user draws a gesture that intersects or is reasonably close to a bounding box, the gestures are merged together to form a new object.
Figure 9: Example of a situation where a user draws a gesture that intersects more than one previously drawn gesture, resulting in the formation of a beamed note.
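The grouping rule in the captions above can be sketched with axis-aligned bounding boxes. The `slack` margin (for "reasonably close") and the dict-based object representation are illustrative choices, not the paper's:

```python
def bbox(points):
    xs, ys = [x for x, _ in points], [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_touch(a, b, slack=5.0):
    """True if the two boxes overlap or come within `slack` units."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return (ax0 - slack <= bx1 and bx0 - slack <= ax1 and
            ay0 - slack <= by1 and by0 - slack <= ay1)

def add_stroke(objects, stroke):
    """Merge `stroke` into every object whose box it touches (Figure 8);
    start a fresh object otherwise. A stroke touching two or more objects
    is the beamed-note hint from Figure 9."""
    box = bbox(stroke)
    hits = [o for o in objects if boxes_touch(o["box"], box)]
    rest = [o for o in objects if o not in hits]
    strokes = [s for o in hits for s in o["strokes"]] + [stroke]
    merged_box = bbox([p for s in strokes for p in s])
    return rest + [{"strokes": strokes, "box": merged_box}]
```

A long horizontal stroke spanning two separate stems merges both objects into one, which is exactly the situation the beamed-note detection looks for.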
Recognizing fundamental shapes
Written music notation has gestures, called notes, that convey both pitch and rhythm. It was found that notes could be represented with three fundamental gestures: a filled circle, an empty circle and line segments of various orientations.
Our system requires that the user draw the fundamental gestures of music notation, circles and lines, one at a time. This means that after each fundamental gesture, the user must pick the pen up from the page. Given this constraint, we are able to detect single notes by parsing the sequence of fundamental gestures that intersect other previously drawn gestures.
There are situations where the result of the Recognizer is not sufficient and semantic information about the gesture, such as its relation to other gestures, must be taken into account. For instance, for detecting filled circles, it is not practical to create templates and expect that the points of a template will match up with a user's candidate filled circle. Therefore, it is necessary to do secondary detection checks to see if the candidate gesture is a filled circle. Features such as the number of points on the border of a gesture versus the number of points contained within the gesture can be investigated. The ratio of the length and height of the gesture can also be taken into account to determine if it is a filled circle.

Detecting Single Notes
For our purposes, single notes are defined as notes that have only one head; or, more specifically, notes that are not beamed together with other notes. They include whole, half, quarter, eighth, sixteenth and thirty-second notes (see Appendix A).
Each object that is created by the user has a bounding box. The recognition state of the bounding box is governed by an automaton, shown in Figure 10. As a user draws gestures that intersect previously drawn gestures, which is determined by querying the MusicSheet object, the group of gestures becomes one object (Figure 8). A new object and automaton are started when the user draws a gesture that does not intersect another gesture. The MusicSheet can also be queried to find information such as the pitch of the note.

Detecting Beamed Notes
Our approach for detecting beamed notes is to identify key scenarios where a beamed note is likely to occur. For example, if a user draws a gesture that intersects more than one previously drawn gesture, there is a high chance that the user drew, or is going to draw, a beamed note (Figure 9). Further analysis, such as looking at the recognition status of the intersected gestures, can verify the recognition.

Detecting Articulations
Detecting articulations currently relies primarily on creating articulation templates for the $1 Recognizer [10] utilized in the system. The system detects two articulations: ties and accents (Appendix A). Currently, detected articulations are not linked to notes; however, as the system is developed, spatial information about the articulations will be used to connect articulations with their corresponding notes.

EDITING
If a user wishes to delete a note, they can draw a "scribble" on top of the note and it will be deleted from memory.
Figure 10: Parser for single notes. When a user draws a gesture that does not intersect another gesture, the parser for that object starts in the "START" state. As the user draws lines and circles that intersect the given gesture, the state of the object is updated according to this diagram.
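A Figure-10-style per-object automaton can be approximated as a transition table. The states and transitions below are an illustrative reconstruction from the note anatomy described earlier, not a copy of the paper's actual diagram:

```python
# (current state, incoming fundamental shape) -> next state
TRANSITIONS = {
    ("START", "empty-circle"):  "WHOLE_NOTE",         # bare empty head
    ("START", "filled-circle"): "FILLED_HEAD",
    ("WHOLE_NOTE", "line"):     "HALF_NOTE",          # empty head + stem
    ("FILLED_HEAD", "line"):    "QUARTER_NOTE",       # filled head + stem
    ("QUARTER_NOTE", "line"):   "EIGHTH_NOTE",        # each extra beam/flag
    ("EIGHTH_NOTE", "line"):    "SIXTEENTH_NOTE",     # halves the duration
    ("SIXTEENTH_NOTE", "line"): "THIRTY_SECOND_NOTE",
}

def parse_note(shapes):
    """Run one object's shape sequence through the automaton, starting at
    START; a shape with no matching transition leaves the state unchanged."""
    state = "START"
    for shape in shapes:
        state = TRANSITIONS.get((state, shape), state)
    return state
```

Each recognized fundamental gesture that intersects the object advances its state, so a filled circle followed by two intersecting lines yields an eighth note.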
[Diagram labels (Figure 10 / Appendix A): Quarter Note, Beamed Note, Thirty-Second Note, Quarter Rest, Sixteenth Note, Accent]