Multimodal Methods For Researching Digital Technologies Carey Jewitt
Chapter 17
Carey Jewitt
This chapter provides an introduction to the field of multimodality and discusses its potential for researching digital technologies. It begins by outlining what multimodality is, its theoretical origins in social semiotics, and its key concepts: mode, semiotic resource, materiality, modal affordance, multimodal ensemble and meaning functions. The scope and potential of multimodality for researching digital technologies are then discussed. The chapter sets out an illustrative case study of students learning with a digital programming environment.
What is multimodality?
Multimodal approaches provide concepts, methods and a framework for the collection and analysis of visual, aural, embodied, and spatial aspects of interaction and environments (Jewitt, 2009; Kress, 2010). While other modes of communication, such as gesture, have been recognized and studied extensively (e.g. McNeill, 1992), multimodality investigates how modes interact and combine within a multimodal ensemble. Multimodality emphasizes situated action, that is, the importance of the
social context and the resources available for meaning making, with attention to
people’s situated choice of resources, rather than emphasizing the system of available
resources. Thus it opens up possibilities for recognizing, analyzing and theorizing the
different ways in which people make meaning, and how those meanings are interrelated.
Multimodality attends to the connection between the meaning potential of a material semiotic artefact, the meaning
potential of the social and cultural environment it is encountered in, and the resources,
intentions, and knowledge that people bring to that encounter. That is, it strives to
connect the material semiotic resources available to people with what they mean to
signify in social contexts. Changes to these resources and how they are configured are of particular interest to multimodality because they make a wide range of modes available, often in new inter-semiotic relationships with one another, and unsettle and re-make genres, in ways that reshape practices and interaction. Digital technologies are thus a key site for multimodal research.
Underlying this approach is the idea that language, and other systems or modes of
communication (e.g. gesture, gaze), is shaped through the things that it has been used to
accomplish socially in everyday instantiations, not because of a fixed set of rules and structures.
This is a pre-print version of a chapter to be published in the SAGE Handbook of
Digital Technology research 2013
Each mode thus offers a historically shaped set of options (or ‘semiotic resources’) for communicating. With this emphasis, a key question
for multimodality is how people make meaning in context to achieve specific aims.
The first assumption underlying multimodality is that while language is widely taken to be the most significant mode of communication, representation and communication always draw on a multiplicity of modes that form a multimodal ensemble. Multimodality ‘steps away from the notion that language always
plays the central role in interaction, without denying that it often does’ (Norris, 2004:3)
and proceeds on the assumption that all modes have the potential to contribute equally
to meaning. From a multimodal perspective, language is therefore only ever one mode
nestled among a multimodal ensemble of modes. While others have analyzed ‘non-verbal’ modes, multimodality differs in that language is not its starting point, nor is language taken to provide a prototypical model of all modes of communication. The starting point is instead that all modes contribute to meaning, and that analysis should attend to the meaning potentials of resources and the purposes for which they are chosen.
The second assumption central to multimodal research is that all modes have, like language, been shaped through their cultural, historical and social uses to realize social functions. This draws attention to the ways in which communication is constrained and produced in relation to social context, and points to how modes come to take on regular patterns of meaning through use.
This connects with the third assumption underpinning multimodality - that people
orchestrate meaning through their selection and configuration of modes. Thus the interaction between modes is itself part of how meaning is produced. The multimodal character of communication is not in and of itself new; however, new digital media have foregrounded the multiplicity of modes and their semiotic function in contemporary discourse worlds (Ventola, Charles and Kaltenbacher, 2004). The meanings in any mode are always interwoven with those made in the other modes co-operating in the communicative ensemble.
A brief background
Multimodality was developed in the early 2000s (see Kress and van Leeuwen, 2001; Kress et al, 2001, 2005; van Leeuwen, 2005; Jewitt, 2009). It originated from linguistics, in particular Michael Halliday’s theorization of language as a social semiotic system. Halliday’s work shifted attention from language as a static
linguistic system to language as a social system - how language is shaped by the ways
that people use it and the social functions that the resources of language are put to in
particular settings. In Language as Social Semiotic (1978) Halliday sets out a theory of language as a resource for making meaning, shaped by the social functions it serves.
Work by Hodge and Kress in Social Semiotics (1988) and later by Kress and van Leeuwen in
Reading Images (1996, 2006) expanded attention from language to other semiotic
systems (or modes), laying the groundwork for extending and adapting social semiotics
across a range of modes and opening the door for multimodality. Kress and van Leeuwen showed how visual resources are organized to communicate ideologies and discourses. Multimodality has taken ideas from linguistics
that are theoretically transportable to other modes, such as turn taking, coherence,
composition, and it has explored the currency of these in relation to the particularities of
other modes. In doing so it has extended and adapted Halliday’s conception of meaning
across a range of modes by taking the specific resources and organizing principles of
spoken and written language as a starting point, and extending their essence to other
modes in ways that recognize that the resources of gesture, gaze, image differ in
significant ways. As multimodality has developed it has also looked beyond linguistics
for resources to assist with analysis and to further explore the situated character of
meaning making, including sociolinguistics, film theory, art history, iconography and musicology.
The social and cultural context of meaning making is central within social semiotic multimodal analysis. The context shapes the resources available
for meaning making and how these are selected and designed. Signs, modes, and
meaning making are treated as relatively fluid, dynamic and open systems intimately
connected to the social context of use. From this perspective analytical interest in the
modal system (its resources and principles) is strongly located in (and regulated
through) the social and cultural. When making signs, people bring together form and meaning, selecting the available form that is most apt to express the meaning they want to express at a given moment.
Kress introduced a strong emphasis on the social character of meaning and developed
the concept of the motivated sign (Kress, 1997). This served to foreground the agency
of the sign maker and the process of sign making. In Before Writing (Kress, 1997) he examined young children’s engagement with texts: how they interpret, transform and redesign the semiotic resources and signs available to them – what has been described as chains of semiosis.
From this perspective, signs (e.g. talk, gestures, and textual artifacts) are analyzed as traces of their makers’ interests, attending to their interpretative and design patterns and the broader discourses, histories and social factors that shape them. In a sense then, the text is seen as a window onto its maker.
Viewing signs as motivated and constantly being re-made draws attention to the
interests and intentions that motivate a person’s choice of one semiotic resource over
another (Kress, 1993). This ‘interest’ connects a person’s choice of one resource over another with the social context of sign production.
The resources that are available to the person are an integral part of that context – hence the relevance of multimodality to understanding how the use of digital technologies extends the range of resources for writing, and has the potential to significantly reconfigure notions of spatiality and embodiment as well as genre conventions, all of which can lead to adapted and, in some cases, new practices.
Key concepts
This section outlines in more detail six concepts introduced above that are key for multimodal research: mode, semiotic resource, materiality, modal affordance, multimodal ensembles, and meaning functions.
Mode
This term refers to a set of socially and culturally shaped resources for making meaning. For a set of resources to count as a mode, it must display shared regularities of use – a ‘grammar’; the capacity for the ‘grammar’ of the modal system to be broken is seen as a ‘test’ that it exists. Another ‘test’ for whether a set of resources can count as a mode is whether it is possible for it to articulate all three of Halliday’s meaning functions: that is, can a set of resources be used to represent the world, to orient to others, and to form coherent texts?
Accepted examples of modes include writing, image, moving image, sound, speech,
gesture, gaze and posture in embodied interaction. What constitutes a mode is a subject
of debate. For instance, van Leeuwen (1999) has explored when sound and music can
be thought of as modes, while Bezemer and Kress (2008) have discussed whether
colour and layout can be considered as modes. As these examples suggest, modes are created through social processes, fluid and subject to change – not autonomous and fixed. For example, the meanings of words and gestures change over time. Modes are thus continually remade by the communities that use them.
Semiotic resource
This term is used to refer to a means for meaning making that is simultaneously a
material, social, and cultural resource. In other words, a semiotic resource is a connection between representational resources and what people do with them:

Semiotic resources are the actions, materials and artifacts we use for communicative purposes, whether produced physiologically – for example, with our vocal apparatus, the muscles we use to make facial expressions and gestures – or technologically – for example, with pen and ink, or computer hardware and software – together with the ways in which these resources can be organized. Semiotic resources have a meaning potential, based on their past uses, and a set of affordances based on their possible uses, and these will be actualized in concrete social contexts where their use is subject to some form of semiotic regime. (van Leeuwen, 2005)
This definition highlights the historical development of connections between form and
meaning, aligned with Bakhtin’s notion of intertextuality. Kress (2010) emphasizes that
these resources are constantly transformed. This theoretical stance presents people as
agentive sign-makers who shape and combine semiotic resources to reflect their
interests.
Materiality
Materiality refers to how modes are taken to be the product of the work of social agents shaping material, physical ‘stuff’ into cultural semiotic resources. This materiality matters: sound, for example, offers different material potentials to graphic inscription, while gesture offers different material potentials to colour, and so on. All
modes, on the basis both of their materiality and of the work that societies have done
with that material (e.g. working sound to become speech or music) offer specific
potentials and constraints for making meaning. The materiality of modes also connects the body and the senses to the making of meaning.
Modal affordance
The term modal affordance is contested and continuously debated within multimodal research. It originated in the psychologist James Gibson’s (1979) work on perception, where affordances are the action possibilities latent in an environment, in which the potential uses of any object arise from its perceivable properties in relation to an actor’s capabilities and interests. Donald Norman later took up this term in relation to the design of artifacts.
Adapted by Kress (e.g. 2010), the term ‘modal affordance’ refers to the potentialities and constraints of a mode – what it is possible to express and communicate easily with the resources of a mode, and what is less straightforward or even impossible – and this is subject to constant social work. From this perspective, the affordance of a mode is connected to both the material and the cultural, social and historical use of a mode.
Modal affordance is shaped by how a mode has been used, what it has been repeatedly
used to mean and do, and the social conventions that inform its use in context. As Kress (2010) argues, its history of cultural work, its provenance, shapes the meaning potential of a semiotic resource and establishes conventions for its use (though these are open to change and disruption). The affordances of the sounds of speech, for example, derive from what can physically and culturally be done
with (speech) sounds. The logic of sequence in time is difficult to avoid for speech: one
sound is uttered after another, one word after another, one syntactic and textual element after another. Meaning is thus shaped by the possibilities for putting things first or last, or somewhere else in a sequence. The
mode of speech is therefore strongly governed by the logic of time. Like all governing principles, these do not hold in all contexts; they are realized through the complex interaction of the social and the material – in this sense the material constitutes the social and vice versa. Modal affordance suggests all modes are partial in making meaning, so that the designed selection of modes, into multimodal ensembles, allows meanings to be made that no single mode could express alone.
Multimodal ensembles
Representations or interactions that consist of more than one mode can be referred to as
a multimodal ensemble. The term draws attention to the agency of the sign maker, who pulls together the ensemble within the social and material constraints of a specific context. An ensemble is thus an outcome or trace of the social context, available modes and modal affordances, the
technology available and the agency of an individual. When several modes are involved
in a communicative event (e.g. a text, a website, a spoken interchange) all of the modes
combine to represent a message’s meaning (e.g. Kress et al., 2001; Kress et al., 2005).
The meaning of any message is however distributed across all of these modes and not
necessarily evenly. The different aspects of meaning are carried in different ways by
each of the modes in the ensemble. Any one mode in that ensemble carries only a part of the message; all modes are partial in relation to the whole, and speech and writing are no exception (Jewitt and Kress, 2003). Multimodal research
attends to the interplay between modes to look at the specific work of each mode and
how each mode interacts with and contributes to the others in the multimodal ensemble.
This raises analytical questions, such as which modes have been included or excluded,
the function of each mode, how meanings have been distributed across modes, and what
the communicative effect of a different choice would be. At times the meaning realized
by two modes can be ‘aligned’, at other times they may be complementary and at other
times each mode may be used to refer to distinct aspects of meaning and be
contradictory, or in tension. Lemke (2002: 303) noted: ‘No [written] text is an image. No text or visual representation means in all and only the same ways that text can mean.’ This incommensurability of modes in multimodal ensembles raises the question of what image is ‘best’ for and what words, and other
modes and their arrangements are ‘best’ for in a particular context. The relationships
between modes as they are orchestrated in interactions (and texts) may itself realize meaning, and has been examined through frameworks such as image-text relations (Martinec and Salway, 2005) or modal density in an ensemble (Norris, 2009). Such frameworks describe the kinds of linking between elements that may contribute to the expansion of meaning relations between elements.
The question of what to attend to, what to ‘make meaningful’, is a significant aspect of multimodal communication: meaning makers decide on modal ‘best fit’ and how to combine modes for a particular purpose and audience.
Attending to multimodal ensembles can enable the analyst to unpack how meanings are brought together.
Meaning functions
Multimodality approaches meaning as social action realized through people’s situated modal choices and the ways they combine and organize these resources into multimodal ensembles. Following Halliday, it distinguishes between three different but interconnected categories of meaning choices (also called meta-functions):

1. Choices related to how people articulate Ideational meanings (sometimes called representational meaning), that is, the resources people choose to represent the world and their experience of it, for example, what is depicted about processes, relations, events and participants.
2. Choices related to how people articulate Interpersonal meanings, that is, the relations people construct between themselves and those they are communicating with - either directly via interaction or via a text or artefact. For example, the visual or spatial depiction of elements as near and far, direct or oblique, are resources used to orient viewers.
3. Choices related to how people articulate Textual meanings, that is, the choice of resources such as space, layout, pace and rhythm for realizing the organization and coherence of a text or interaction.

These categories enable the analyst to map meaning potential: ‘what can be meant’ or ‘what can be done’ with a particular set of semiotic resources, and to explore how these three interconnected kinds of meaning potentials are actualized through the grammar and elements of their different modal systems.
A key point to draw attention to here is that the concepts outlined in this section can be applied to a wide range of sites, for example a screen-based text (Jewitt, 2002), a classroom with or without technology (Jewitt, Bezemer and Kress), or a surgical operating theatre (Bezemer et al., 2011). Thus, a researcher can employ multimodality to examine the resources of a setting (e.g. a digital or tangible environment) as well as how people make use of these resources in interaction.
This section gives a sense of the scope and potential of multimodality for researching
digital technologies: how it has been used to date, the kinds of questions it can be used
to address, and what research insights it can provide to inform the evaluation of
technology design and use. The following four potentials of multimodal research are discussed in turn:

1. Mapping the modes and semiotic resources made available by digital technologies;
2. Attention to the body and spatiality in digital texts and environments;
3. Tracking the development of new semiotic resources and new uses of existing resources;
4. Contribution to research methods for the collection and analysis of digital data.
The first of these is the capacity of multimodality to map the modes and semiotic resources available to people when using a technology in a particular context. This may be done through a systematic description of the modes and their semiotic resources, materiality, and affordances.
Building on the notion of meaning as choice and the concept of the meta-functions, some multimodal researchers use a style of diagramming called system networks to map the semiotic resources of a mode. These map the potential of modal resources to articulate content, interpersonal and textual meanings as sets of choices, which should preferably be of the either/or type. As described by Kress and van Leeuwen (2006), for instance, a visual image may either be a ‘demand for information’ (a kind of visual question) or an ‘offer of information’ (a kind of visual statement); a ‘demand for information’, in turn, may be either ‘polar’ (a yes/no question) or open, and so on.
When analyzing modes other than language, some semiotic relations are better
described as scaled along a continuum – for example the semiotic dimensions of colour
have been mapped as a set of continuum scales concerning hue, brightness, luminosity,
and so on (Kress and van Leeuwen, 2002). System networks provide an analytical tool
for mapping the range of semiotic resources and options made available by a mode in a given context or artifact.
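For readers who find it helpful to see the logic of a network made concrete, the either/or structure described above can be sketched as a small nested data structure. The following Python fragment is purely illustrative: the category names follow the image-act distinctions discussed above, but the dict encoding and the `enumerate_paths` helper are hypothetical constructs for illustration, not part of any established multimodal-analysis toolkit.

```python
# Illustrative sketch only: a fragment of a system network as nested
# either/or choices. A nested dict opens further choices; None marks a
# terminal option (a complete selection).
image_act_network = {
    "demand for information": {   # a kind of visual question
        "polar": None,            # yes/no question
        "open": None,
    },
    "offer of information": None, # a kind of visual statement
}

def enumerate_paths(network, prefix=()):
    """List every complete selection path through the network."""
    paths = []
    for option, subsystem in network.items():
        current = prefix + (option,)
        if isinstance(subsystem, dict):  # further choices open up
            paths.extend(enumerate_paths(subsystem, current))
        else:                            # terminal option
            paths.append(current)
    return paths

for path in enumerate_paths(image_act_network):
    print(" -> ".join(path))
```

Running the sketch lists the three complete selection paths: a polar demand, an open demand, and an offer, mirroring how an analyst would read choices off a drawn network.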
To date, system networks have been used to describe the semiotic options available in a range of modes, including image and colour (Kress and van Leeuwen, 2002, 2006), action (Martinec, 2000), sound, voice
and music (van Leeuwen, 1999), as well as three-dimensional objects (e.g. tables,
Bjorkvall, 2009). Networks have been used to explore multimodal genres and
multimodal ensembles including online newspapers (Knox, 2007; Caple and Knox, 2012), film and media texts (Bateman, 2008), and interactive media texts (White, 2012).
Such inventories can be of use both in understanding the meaning-making potentials of digital technologies and environments for interaction, and in examining how users of those technologies notice and take up those resources in
different ways. This can inform both the re-design of technological artifacts and
environments, as well as their introduction into a set of practices, e.g. for learning or work.
Multimodal researchers have also used system networks to focus on how modal
resources are taken up and used in a specific context. They map and compare people’s
choice of modes and semiotic resources in specific contexts, and some examine how these choices develop and change over time.
Researchers who are focused on meaning making as a process, and who are thus perhaps less concerned with mapping the resources of the mode itself, use system networks as a much looser heuristic tool to explore meanings. Multimodal studies investigate how these resources
are used in specific contexts and how people talk about them, justify them and critique
them in order to understand how semiotic resources are used to articulate discourses
across a variety of contexts and media, for instance schools, workplaces, and online environments.
The importance of the body and spatiality in the contemporary digital landscape is evident in emergent body-based interaction technologies (Price et al., 2009). Much work has been
done on the classroom as a multimodal environment of learning and the role of position,
posture, gesture and gaze has been shown to be key to learning and teaching in the
production of school English and Science (e.g. Kress et al, 2001, 2005). Multimodal
attention to how bodily modes and space feature in interaction – their semiotic resources and affordances – has great potential for researching digital technologies. For instance, Wii games reconfigure the relationships between players’ physical (and therefore social) bodies – now mediated by digital sensory feedback via wrist bands and body straps, virtual avatars, and the screen – in ways that require physical-digital mapping and that are interesting for what it means to collaborate and ‘play together’. Multimodality
provides a set of resources to describe and interrogate these re-mappings, for example to
get at the interaction across the ‘physical’ and the ‘virtual’ body. This type of digital
body is also relevant for understanding online multimodal interaction. Jones, in his analysis of how people construct and consume multimodal displays of their selves online, argues that the range of resources ‘available for producing and consuming displays affects the kinds of relationships that are possible between users of these sites and the kinds of social actions that these displays allow them to take’ (Jones, 2009: 82). A focus on modes, semiotic resources, and their affordances can help to unpack interaction in these complex sites. For instance, multimodal research in the surgical
operating theatre shows the interactional impact of digital technologies being inserted
into older established social environments (Bezemer et al, 2011). Surgeons undertaking
keyhole surgery work in screen-based digital environments that, like the Wii, re-orientate their gaze, body posture, and team configurations, and require them to engage in physical-digital mapping. A multimodal approach also asks whether the use of blended physical-digital tools and applications like those discussed here generates new forms of interaction and meaning making. Multimodality has been used to explore questions of digital identities and literacy, notably in the field of education (Marsh, 2006; Alvermann, 2002; Jewitt and Kress, 2003). It has also been used to analyse the orchestration of music, filmic shots and editing features in video productions, digital animation, and games (e.g. Burn, 2009; Walton, 2004), as well as online environments (Jones, 2009) and, more recently, interactions with mobile and Geographic Information System (GIS) technologies.
The relationships across and between modes in multimodal texts and interaction are a
central area of multimodal research, and multimodal research often investigates the
relationship between a given context and the configuration of modes in a text or situated
interactions – both to better understand the modal resources in use and to address broader questions of communication and design. One strand of this work concerns understanding how multimodal cohesion (van Leeuwen, 2005) is realized (or not) through the integration of different semiotic resources in multimodal texts and communicative events via rhythm, composition, information linking, and dialogue.
The ways in which contemporary digital texts are organized via textual features such as digital layering and hyper-linking, and the impact of this on how people navigate multimodal digital texts, have also been examined (Lemke, 2002; Zammit, 2007). This
work is potentially useful when thinking about the take up of designed resources (e.g.
Jewitt, 2008). There is a large body of multimodal research that explores the dynamics
of the interaction between image and language. This includes the early work of Kress
and van Leeuwen (1996) on the visual articulation of meaning, Lemke’s (1998) work on
the role of image and writing in science textbooks, work by Martinec and Salway (2005) re-thinking Barthes’ classification of image-text relations, and work by Kress and colleagues on typography, colour and layout in school textbooks. Focusing on multimodal texts, for
instance, Kress and Bezemer investigated the learning gains and losses of different online learning resources. They provide a multimodal account of the changes to the design of these resources and their epistemological significance. They conclude that image, layout and colour are increasingly meshed in the construction of content, so that layout and typography can increasingly be understood as meaning-making resources in their own right. Jewitt’s book Technology, Literacy and Learning (2008) explores the fundamental connection
between a range of modal resources (including colour, image, sound, movement and
gesture, and gaze), digital technologies, knowledge, literacy and learning. In this and
other work she shows how teacher and student engagement with the modal resources
made available by technologies reshapes practices such as reading and writing, and the
ways in which students and teachers interact in school science and English, and explores the impact of this on learning. These studies show how digital technologies stretch, foreground and in some cases remake modes and semiotic resources, putting them to new uses.
In addition to creating inventories of modes and semiotic resources and analyzing how these are taken up, multimodal research attends to the discovery and development of new semiotic resources and new ways of using existing
semiotic resources.
Studying the semiotic potential of a given semiotic resource is studying how that resource has been, is, and can be used for purposes of communication; it is drawing up an inventory of past and present and maybe also future resources and their uses. By nature such inventories are never complete, because they tend to be made for specific purposes. (van Leeuwen, 2005)
The discovery and development of new modal resources is linked to social change and
society’s need for new semiotic resources and new ways of using existing semiotic
resources as the communicational landscape changes. Two factors central to this are the emergence of new technologies and wider changes in society. Digital synthesizers and other digital technologies, for example, have reshaped
the possibilities of the ‘human’ voice to create new semiotic resources and contexts for
the use of ‘human’ voices – in digital artefacts, public announcements, music and so on
(van Leeuwen, 2005). This digital re-shaping of voice has in turn impacted on non-digital uses of the voice – for example, suggesting tonal and rhythmic possibilities not previously imagined. Modal semiotic resources common to print-based texts, such as textual linking, layering, layout, and the organization of time, are also being remade in digital environments. Knox, for example, has explored how online newspapers have reshaped newspaper layout, genres, and the relationship of image, writing, and video, and has mapped the ‘wash-back’ of these changes on print newspapers (Knox, 2007; Caple and Knox, 2012). Adami (2009, 2010) has examined the multimodal patterns of coherence and turn-taking on the video-sharing site YouTube.
Multimodal concepts can also help researchers to describe the changed configurations of space, time and embodiment which digital technologies (e.g. mobile and GIS) make available, and to address questions about how these technologies influence people’s interaction and meaning making.
Multimodality moves beyond intuitive ideas about what a technology can do, to provide
detailed analysis of how the semiotic resources of digital technologies work and what they can contribute to communication, and thus makes it possible for these to be discussed, taught and evaluated. Multimodality can also help to design and implement new uses for semiotic resources, and thereby contribute to technological innovations.
Researchers increasingly need to look beyond language to better understand how people communicate and interact with and through digital technologies. This presents challenges for research methods with respect to digital texts and environments where conventional
concepts and analytical tools may need rethinking. Multimodality makes a significant
contribution to existing research methods for the collection and analysis of data and
environments within social research. It provides methods for the collection and analysis
of visual digital data, including screen-capture and eye-tracking data (e.g.
see Holsanova, 2012), researcher generated and naturally occurring digital video data
(e.g. Bezemer and Jewitt, 2010, 2012; Kress et al, 2001, 2005; Norris, 2004). The use of
digital video technology and a multimodal focus pose what has become a key challenge: the transcription and representation of multimodal data. This has prompted
innovation and experimentation in multimodal approaches. This might range from the inclusion of line drawings and stills from video footage to the use of software such as Comic Life and Transana (e.g. Plowman and Stephen, 2008; Flewitt et al., 2009; Baldry and Thibault, 2005; Bezemer and Mavers, 2011). As already discussed, multimodality
provides tools for mapping and analyzing the visual, embodied, and spatial features of
interaction with digital technologies as well as the analysis of music, film, digital
animation, games, adverts and other new media (e.g. Burn, 2009; Jones, 2009; Adami, 2009, 2010).
Having outlined the scope and potential of a multimodal approach for researching
digital technologies in general terms, the following section illustrates its application.
This short case study concerns the learning of mathematical concepts in a digital environment, focusing on two students’ work with the resources of Playground, an object-orientated programming tool (Jewitt
and Adamson, 2003). The excerpt discussed here focuses on how the students’
emergent conception of ‘bounce’ was shaped through their selection and use of the
modal resources available to them: the full case study is reported elsewhere (Jewitt,
2008).
The students first designed their game on paper, with a creature being chased by an alien that fired bombs to catch it. The movement of their characters
(a creature and an alien) and bounce of the bullets were realised using modes and
semiotic resources drawn from static image, writing, and cartoon-visual genres (e.g. a
time-lapse drawing, and wiggly lines to signify vibration and the sound of an
explosion).
Programming the game in Playground offered the students additional modes and semiotic resources for their design, notably ready-made visual elements and backgrounds, colour, movement, and sound, and the removal of the written mode.
Detailed analysis of the students’ game as a product, as well as video data of the process of its making, showed that the digital environment demanded new kinds of representational commitment, design decisions and thinking on the part of the students.
In particular, they needed to specify the spatial and dynamic relationship between the
elements in the game. The move from the page to screen also underpinned changes in
ideational, interpersonal and textual meaning resulting, for instance, in increasing the
stakes for the little creature: now it will be killed instead of being caught. This suggests a shift in the students’ understanding of the affordances (social rules and expectations) of genre, from board game to adventure/action game on the screen. The students’ digital re-
design of the multimodal frame of the game re-defined the game narrative and the entities within it. One student described the bomb as being ‘moved sideways by arrows and then if [the bomb] touches the bars it goes different ways’. That description did not require the students to make explicit the ‘cause’ of this change in movement - the concept of bounce.
The digital environment of Playground represents the idea of bounce in three modes and
each provides different semiotic resources for the students’ construction of the entity
‘bounce’. It uses the mode of writing - the word ‘Bouncing’ - to name and classify the movement in everyday terms. It uses the mode of still image - two images of a spring - to specify bounce visually as a mechanical, regular, ordered entity rather than an organic, unpredictable bouncing (e.g. a rabbit). And it uses the mode of animated movement in three short sequences: one of a spring moving up and down between two bars, another of a spring moving sideways between two bars, and a third sequence of a ball moving at angles within a square. The animated sequences work to give meaning to the entity ‘bounce’ in the mode of movement itself.
This introduction of movement as a design resource raised key questions for the
students in their design: ‘what is it that produces bounce?’ and ‘what is it that bounces?’
This is a pre-print version of a chapter to be published in the SAGE Handbook of
Digital Technology research 2013
Playing the game enabled the students to see their mistake and how to rectify it.
Initially, the students programmed the sticks to bounce (that is, they added the
behaviour of bouncing to the sticks), placed them on the game, and then played the game:
the sticks bounced off. Through their engagement with the Playground environment the
students worked out their ambiguities about agency, ambiguities that the affordances of
writing and static image in the paper design had masked.
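The agency ambiguity the students worked through, whether the behaviour ‘I bounce’ belongs to the sticks (bars) or to the bullet, can be sketched in code. The following is an illustrative sketch only, not the Playground system itself: the class, function and attribute names are all hypothetical.

```python
# Illustrative sketch of the students' agency question: does the
# behaviour 'I bounce' belong to the sticks (bars) or to the bullet?
# All names here are hypothetical, not the real Playground interface.

class Sprite:
    def __init__(self, name, x, vx):
        self.name, self.x, self.vx = name, x, vx
        self.behaviours = []  # behaviours 'attached' to this sprite

def bounce_between(sprite, left, right):
    """Behaviour: reverse the sprite's direction when it reaches a bar."""
    if sprite.x <= left or sprite.x >= right:
        sprite.vx = -sprite.vx

def step(sprite, left=0, right=10):
    sprite.x += sprite.vx
    for behaviour in sprite.behaviours:
        behaviour(sprite, left, right)

# First attempt (as in the students' initial program): bouncing is
# attached to the sticks, so nothing constrains the bullet, which
# travels past the bar at x=10 and off the 'screen'.
bullet = Sprite('bullet', x=8, vx=1)
for _ in range(3):
    step(bullet)
print(bullet.x)  # 11: the bullet has left the play area

# Revised design: the bounce behaviour is attached to the bullet
# itself, so the bullet reverses on reaching the bar at x=10.
bullet = Sprite('bullet', x=8, vx=1)
bullet.behaviours.append(bounce_between)
for _ in range(3):
    step(bullet)
print(bullet.x)  # 9: the bullet reversed at the bar
```

The sketch makes concrete the point the analysis raises: the behaviour has to be attached to the entity that bounces, yet in the students’ visual mode of working only the visible sticks presented themselves as candidates.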
The students used gaze and gesture as resources to address these questions and the
process of programming bounce in their game. The students created different kinds of
spaces on the screen through their gesture and gaze with/at the screen itself, and their
interaction with and organisation of the elements displayed on the screen. These spaces
marked distinctions between the different kinds of practices that the students were
engaged with. In their creation and use of these spaces the students set up a rhythm and
distinction between game planning, game design and construction, and game playing.
The students gestured ‘on’ the screen to produce a plan of the game: an ‘imagined-
space’ overlaying the screen in which they gesturally placed elements and imagined
their movement, and used gesture and gaze to connect their imagined (idealised) game
with the resources of the application as it ran the program. The temporary and
ephemeral character of gesture and gaze as modes enabled their plans of the game to
remain provisional and open to revision.
The role of gesture was central in the students’ unfolding programming of the
concept ‘bounce’. Initially the two students’ talk and gesture is strongly co-
ordinated and suggestive of a shared vision of how they imagine the bullet
moving (from the alien to the left stick, and then to the top-right stick). When
the students stop acting in unison, however, two alternative versions of the
movement of the bullet emerge (figure 3). Student 1 traces the bullet moving in
a vertical line down to the bottom-right stick. She then traces it in a horizontal
line to the dog, and wiggles the pen to indicate somewhere in that area. She is
working with ‘bounce’ as a general kind of movement, going from one place to
another. Student 2 works with the entity ‘bounce’ as a
more specialised kind of movement. She indicates that a bullet would not move
in a perpendicular line from the top-right to the bottom-right stick (as gestured
by student 1). Holding her finger on the top-right stick she then gesturally traces
an ‘imagined’ stick to the right of the alien before slowly trailing her finger off
the edge of the screen. This ‘gestural overlay’ adds another stick to the visual
design of the game, which in turn enables her to imagine the movement of the
bullet bouncing from the top-right stick to the bottom-right stick, and then off
the edge of the screen.
2. Examining the students’ use of gesture in this way helped to identify areas of
difficulty. The two students’ accounts both end with a faltering tone of voice,
suggesting uncertainty about whether the movement of a bullet would come to an
end if it did not hit the dog. Would the ball keep bouncing or would it go off
screen? This is itself an uncertainty about what is producing the bounce: is it
the ball, or the thing that is hit by the ball?
3. The students’ use of gesture can be analysed to explore their hypotheses: S2
used a gestural overlay to ‘estimate’ where the ball would bounce, which in
turn led to the amendment of the game, that is, S2’s suggestion that they
needed to place an additional stick.
The invisibility, the visual absence, of the bullets at this stage of the design is what
proved problematic for the students. They prioritised the meaning of the visual within
the multimodal ensemble of the game: modally, at that point in the game-design, the
students were working visually and not multimodally. The students were looking at the
game to decide where to ‘attach’ the bounce: the ‘sticks’ (bars) were visible on screen,
but the bullets were ‘within the alien’ and only visible when the game was being played.
In this visual mode of working the system did not make the bullets available as
something that the students could specify as the object that ‘I bounce’ refers to. In
short, when working visually the notion of agency depended on visual presence: what was
made visible on the screen proved particularly important in the students’ design
process. The students appeared to associate visual presence with agency: ‘if it couldn’t
be seen it couldn’t be acting’ seemed to stand as their working rule.
This example shows how the availability of multimodal resources changes the
representations that students are working with as well as the work of interpreting them,
particularly what it is that the students need to attend to and what they need to specify.
Finally, it highlights the potential of examining multimodal interaction and the range of
modes in play for understanding learning.
Although multimodal research has much to offer, it also has several limitations. A first
limitation concerns the interpretation and reliability of its analysis. How do you know
that this gesture means this, or this image means that? In part this is an issue of the
linguistic heritage of multimodality, that is, how do you get from linguistics to all
modes. In part it is the view of semiotic resources as contextual, fluid and flexible,
which makes the task of building ‘stable analytical inventories’ of multimodal semiotic
resources complex. It is perhaps useful to note that this problem also exists for speech
and writing. The principles for establishing the ‘security’ of a meaning or a category
are the same for multimodality as for linguistics and other disciplines. It is resolved
by linking the meanings people make (whatever the mode) to context, by triangulating
methods (combining textual/video analysis with interviews, for example), and by moving
towards participant validation of interpretations.
Linked with the above problem of interpretation is the criticism that multimodality is a
kind of ‘linguistic imperialism’ that imports and imposes linguistic terms on everything.
However, multimodality’s origins in a social semiotic theory of communication, and the
social component of this perspective, set it apart from the narrower concerns with
syntactic structures, language and mind, and language universals that have long dominated
the discipline. This view of language as one mode among a number of modes of
communication underpins the approach.
Multimodal analysis is an intensive research process in relation to both time and labour.
Multimodal research can be applied to take a detailed look at ‘big’ issues and questions
through specific instances. Nonetheless, the scale of multimodal research can restrict
the size of the data sets it can feasibly engage with, unless it is combined with other
methods in innovative ways.
Conclusion
This chapter has provided an introduction to the field of multimodality. It has discussed
what multimodality is, sketched its theoretical origins and presented its underlying
assumptions. Throughout the chapter the key concepts central to this approach have
been introduced, discussed and illustrated through their application within the literature
and in the case study example presented above. In this way the chapter has set out the
scope and potential of multimodality for researching digital technologies, with
particular attention to meaning making in complex digitally mediated environments, the
evaluation and design of digital environments, and the contribution of multimodality to
research methods. Finally, the chapter has pointed to some of the limitations and
challenges of multimodal research.
References
Equinox.
Bateman, J. (2008) Multimodality and Genre: A Foundation for the Systematic Analysis of Multimodal Documents. Basingstoke: Palgrave Macmillan.
Bezemer, J. and G. Kress (2008) ‘Writing in multimodal texts: a social semiotic account
3: 191 - 207.
Bezemer, J., Murtagh, G., Cope, A., Kress, G. and Kneebone, R. (2011) ‘Scissors,
Bjorkvall, A. (2009) ‘Practical function and meaning: a case study of IKEA tables’, in
Burn, A. (2009) Making New Media: Creative Production and Digital Literacies. New York: Peter Lang.
Caple, H. and Knox, J. (2012) ‘Online news galleries, photojournalism and the photo
Flewitt, R, Hampel, R., Hauck, M. and Lancaster, L. (2009) ‘What are multimodal data
Gibson, J. (1979) The Ecological Approach to Visual Perception. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
communication, 11(3).
Routledge.
Routledge.
Jewitt, C. (2002) ‘The move from page to screen: the multimodal reshaping of school
Semiotic Historical Account', National Society for the Study of Education Yearbook
110(1):129-152.
Kress, G., Jewitt, C., Bourne, J., Franks, A., Hardcastle, J., Jones, K. and Reid, E. (2005) English in Urban Classrooms: A Multimodal Perspective on Teaching and Learning. London: RoutledgeFalmer.
Kress, G., Jewitt, C., Ogborn, J. and Tsatsarelis, C. (2001) Multimodal Teaching and Learning: The Rhetorics of the Science Classroom. London: Continuum.
Kress, G. and van Leeuwen, T. (2002) ‘Colour as a semiotic mode: notes for a grammar of colour’, Visual Communication, 1(3): 343-368.
Kress, G. and van Leeuwen, T. (2001) Multimodal Discourse: The Modes and Media of Contemporary Communication. London: Arnold.
283-302.
Digital Literacy Practices in the Home’, in K. Pahl and J.Rowsell (Eds.) Travel Notes
from the New Literacy Studies. Clevedon, UK: Multilingual Matters. pp. 19-39.
Martinec, R. and Salway, A. (2005) ‘A system for image-text relations in new (and old) media’, Visual Communication, 4(3): 337-371.
C.Jewitt (ed.) Routledge Handbook of Multimodal Analysis. London: Routledge. pp. 78-
90.
Plowman, L. and Stephen, C. (2008) 'The big picture? Video and the representation of
Price, S., Roussos, G., Pontual Falcão, T. and Sheridan, J.G. (2009)
Amsterdam: John Benjamins.
Walton, M. (2004) ‘Behind the screen: The language of web design’, in I.Snyder and
C.Beavis (eds.) Rewriting Literacy in the Network Society. Hampton, New Dimensions.
White, P. (2011) ‘Reception as Social Action: The Case of Marketing’, in S.Norris (ed.)