Towards iTV Accessibility: The MPEG-21 Case
Evangelos Vlachogiannis
Department of Product and Systems Design Engineering
University of the Aegean
Hermoupolis, Syros, Greece
[email protected]

Damianos Gavalas
Department of Cultural Technology and Communication
University of the Aegean
Mytilene, Lesvos, Greece
[email protected]

Christos Anagnostopoulos
Department of Cultural Technology and Communication
University of the Aegean
Mytilene, Lesvos, Greece
[email protected]

George E. Tsekouras
Department of Cultural Technology and Communication
University of the Aegean
Mytilene, Lesvos, Greece
[email protected]
ABSTRACT
This paper discusses the accessibility of interactive television
(iTV), which we regard as a primary factor for its satisfactory
adoption and commercial success. It presents work undertaken in
the context of a research project that aims to deliver iTV services
to disabled children. The project approaches this objective
through the upcoming MPEG-21 ISO standard, addressing iTV
accessibility by means of metadata and adaptation. In contrast to
previous studies, which focus mostly on users with low vision,
this work treats iTV accessibility more broadly. A case study is
presented, accompanied by a discussion of the relevant
architectures and technologies and of the design issues
encountered.
Categories and Subject Descriptors
H.5.1 [Multimedia Information Systems], J.4 [Social and
Behavioral Sciences]
General Terms
Documentation, Design, Experimentation, Human Factors,
Standardization, Legal Aspects.
Keywords
Accessibility, usability, human computer interaction, pervasive
environments, metadata, iTV, MPEG-21, SMIL, XSLT.
1. INTRODUCTION
Interactive television (iTV) has recently become a challenging
application area for research on adaptive web information
systems. The field increasingly adopts techniques and
technologies initially developed for the World Wide Web
([6],[7]). This is most apparent in the case of IP-TV but generally
applies to all kinds of iTV. Considering also that TV sets
considerably outnumber PCs worldwide [20], it becomes evident
that interaction requirements, and specifically the need for
accessibility, are crucial. For instance, an iTV user now faces a
large number of services (the term used for TV channels) offering
a wide range of possibilities. A similar “explosion” occurred in
the past on the World Wide Web and its search engines; later on,
it was the portals (equipped with search engine facilities) and the
adaptation mechanisms that made the huge volume of
information manageable.
Carmichael et al. [3] identify similarities between the direction
iTV is taking and that of the web, and note that the experience
gained from the latter has to be transferred to the iTV domain in
order to avoid similar mistakes, mistakes which have not been
avoided so far. On the other hand, studies evaluating the iTV
interface have pointed out how the two differ, mainly with regard
to the lack of computer literacy [4]. As early as 1997, RNIB
provided recommendations for the accessibility of iTV [5].
Carmichael et al. conclude that the accessibility features that
have not yet been given the necessary emphasis are subtitles,
captions and audio description [3], features that are emphasized
in web specifications (WCAG 2.0, SMIL¹, SVG²).
The MPEG-21 standard [8], recently released by ISO with the aim
of defining an open framework for multimedia applications,
seems to be a natural fit for the world of iTV. The related
literature reports several attempts to incorporate accessibility
issues into MPEG-21. The majority of them focus on visual
disabilities (e.g. [11], [13], [14], [19]). Rice [11] presents the
difficulties that visually disabled users face while consuming iTV
services, with emphasis on parameters such as screen size, font
size and color, icon identification and screen layout. That work
concludes that, because user requirements diverge, the best way
to address the problem is personalization. Thang et al. [14]
propose a systematic contrast-enhancement method that improves
content visibility for low-vision users through MPEG-21 content
adaptation. Yang et al. [19] propose a technique for making iTV
accessible to people with visual deficiencies, especially color
blindness; it involves both the use of MPEG-21 with related
descriptive metadata and the design of an adaptive system.
Berglund & Johansson [2] study the benefits of using speech and
dialogue in the iTV domain and arrive at several design
considerations.
¹ Accessibility Features of SMIL: http://www.w3.org/TR/SMIL-access/
² Accessibility Features of SVG: http://www.w3.org/TR/SVG-access
This paper presents work undertaken in the context of a Greek
national project that aims to use the MPEG-21 framework for
adapting iTV content to the requirements of disabled children.
The authors propose a number of viewpoints from which iTV
accessibility needs to be examined. From this perspective, the
approach focuses on the interaction of the stakeholders through
adaptation. Contrary to the majority of the approaches found in
the literature, it investigates iTV accessibility broadly, without
focusing on a specific user group such as users with low vision.
This paper is organized as follows: Section 2 introduces MPEG-21
and highlights its accessibility role. Section 3 presents the
proposed iTV architecture, and finally, Section 4 concludes our
work and outlines directions for future work.
2. MPEG-21 AND ITS ACCESSIBILITY ROLE
MPEG-21 is, among other things, an attempt to provide the iTV
designer with a framework that offers a big, integrative picture of
an iTV system. On that basis, an indicative scenario has been
devised that covers the production, delivery and consumption of
digital content and aims to identify the primary entities and the
way they are involved in the overall design outcome (see Figure
1). According to this scenario:
• The content designer (CD) identifies the target groups.
• The CD, supported by MPEG-21 metadata, describes the
target groups by their characteristics (e.g. blindness) and
associates interaction modes with them (e.g. auditory
description) using an appropriate authoring tool.
• The CD develops the required content components (digital
items) based on the interaction modes decided above. These
are integrated into the metadata using the authoring tool.
• End user A, say a blind user, wants to consume the developed
content. She has already stored her profile. The context of
use is completed with attributes such as access device
capabilities, audio configuration, and the time and location
of the end user.
• The context of use is delivered to the serving system
together with the user request.
• The system reasons over the user’s context of use and maps
it to an appropriate composition of the content components.
If the context of use changes during consumption, the
system needs to be made aware of it so that it can adapt to
the new requirements.
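The last steps of the scenario, in which the serving system maps a context of use to a composition of content components, can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's implementation: the class `ContextOfUse`, the table `MODES_BY_CHARACTERISTIC`, the function `compose` and all field names are assumptions introduced for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical table: interaction modes a content designer might
# associate with a target-group characteristic.
MODES_BY_CHARACTERISTIC = {
    "blindness": {"audio_description"},
    "deafness": {"captions"},
}


@dataclass
class ContextOfUse:
    """Stored user profile plus run-time attributes (illustrative fields)."""
    characteristics: set = field(default_factory=set)
    device_has_audio: bool = True


def compose(components: dict, context: ContextOfUse) -> list:
    """Return the content components that suit the given context of use."""
    selected = ["video"]  # the base programme is always delivered
    for characteristic in sorted(context.characteristics):
        for mode in sorted(MODES_BY_CHARACTERISTIC.get(characteristic, ())):
            if mode == "audio_description" and not context.device_has_audio:
                continue  # an audio mode cannot be rendered without audio output
            if mode in components:
                selected.append(mode)
    return [components[name] for name in selected]


components = {
    "video": "video.mov",
    "audio_description": "audiodescription.mov",
    "captions": "captions.txt",
}
user_a = ContextOfUse(characteristics={"blindness"})
print(compose(components, user_a))  # ['video.mov', 'audiodescription.mov']
```

Calling `compose` again whenever the context of use changes (for example, when `device_has_audio` becomes false) corresponds to the re-adaptation described in the last step of the scenario.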
Figure 1. MPEG-21 involvement in iTV: a possible scenario.
Although MPEG-21 addresses adaptation, and accessibility
specifically, by including several related XML elements in its
schema, the standard on its own cannot ensure the accessibility of
the delivered content. Rather, it is a fundamental precondition for
the involved systems to produce accessible output. In other
words, it should provide the infrastructure that gives digital
content the requisite variety, both for the content designer, to be
able to design accessible content, and for the involved systems, to
have the information required to deliver an accessible result.
From this point of view, the content provider, the author (also
referred to as content designer), the authoring tools, the systems
of the content provider and, of course, the consumer with her
accompanying interaction profile (preferences, device
capabilities, etc.) are identified, and all play a major and
cascading role in iTV accessibility.
Briefly, the role of MPEG-21 in iTV accessibility is revealed
through the following dimensions:
Alternative content: MPEG-21 offers metadata (defined in Part 7
of the standard [9]) that allows content providers to provide the
content in one or more alternative ways. These ways often
correspond to different modalities and can thus include captions,
audio descriptions, etc.
Digital Content Navigation: In iTV environments, navigation
facilities within the available content are provided by an
Electronic Program Guide (EPG). This is the interactive portion
of the system, which offers the required functionality to the user,
including service (channel) selection / retrieval, program
information and scheduling, profiling / personalizing, rating
and/or even acting upon the content.
Description of context of use (IN PARAMS): The usage context
refers to all the information that needs to be taken into account in
order to adapt digital content to the user’s requirements.
Description of presentation parameters of digital content (OUT
PARAMS): This determines which technical characteristics need
to be adapted. An important implementation consideration was
the transformation of MPEG-21 to SMIL as an intermediate
solution to ensure compatibility with media players. This
involved a mapping between the two infrastructures, realized
using XSLT.
Device accessibility: This refers to the accessibility of the
hardware involved, including remote controls and set-top boxes³.
Content provider accessibility policy: Probably an important
contribution of MPEG-21 to the field of accessibility is the
capability of applying, and claiming conformance to, an
accessibility policy. In other words, content providers need to be
able to apply an accessibility policy based on the target consumer
group and on their own requirements for quality assurance. For
instance, such a policy could require digital content to be
accompanied by subtitles in two languages (e.g. English, Greek)
and every image to have an alternative text of between two and
ten words. Applying such policies requires a mechanism for
validating digital content against a policy description; this could
be implemented, for instance, with Schematron [12], an XML
structure validation language for making assertions about the
presence or absence of patterns in trees.
³ http://www.tiresias.org/equipment/settop_boxes.htm
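To make the example policy concrete, the following sketch checks the two rules mentioned above (subtitles in two languages, alternative texts of two to ten words). It uses plain Python with the standard `xml.etree.ElementTree` module rather than Schematron, so it only illustrates the kind of assertions a policy would express. The content description and its element and attribute names (`content`, `subtitle`, `image`, `lang`, `alt`) are hypothetical and are not taken from the MPEG-21 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical content description; element and attribute names are
# illustrative only and are not part of MPEG-21.
CONTENT = """
<content>
  <subtitle lang="en" src="subs_en.txt"/>
  <subtitle lang="el" src="subs_el.txt"/>
  <image src="logo.png" alt="Channel logo"/>
  <image src="map.png" alt="Map"/>
</content>
"""


def check_policy(xml_text, required_langs=("en", "el"), min_words=2, max_words=10):
    """Return a list of human-readable policy violations (empty if compliant)."""
    root = ET.fromstring(xml_text)
    violations = []
    # Rule 1: subtitles must be present in every required language.
    present = {s.get("lang") for s in root.iter("subtitle")}
    for lang in required_langs:
        if lang not in present:
            violations.append(f"missing subtitles for language '{lang}'")
    # Rule 2: every image needs an alternative text of min..max words.
    for image in root.iter("image"):
        words = (image.get("alt") or "").split()
        if not min_words <= len(words) <= max_words:
            violations.append(
                f"image '{image.get('src')}' has {len(words)}-word alt text"
            )
    return violations


for problem in check_policy(CONTENT):
    print(problem)  # image 'map.png' has 1-word alt text
```

A Schematron-based validator would state the same two assertions declaratively as rules over the XML tree, which keeps the policy description separate from the validating code.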
3. itvSimu ARCHITECTURE
Within our research project, the need became evident for
designing and developing a simulation platform that acts as the
interaction interface between our iTV architecture and the
prospective viewer. In other words, a user interface prototype has
been implemented to enable users to effectively browse, search,
download and consume the provided audio-visual content. In the
case of disabled people, ‘effectively’ means that both the content
and the value-added services need to be accessible to the user, as
already discussed. This interface is a sub-system of the overall
system architecture, outlined in Figure 2, which consists of an
authoring tool, an expert system, storage (a native XML database)
and the user interface, referred to as itvSimu. The authoring tool
allows content providers to easily author a diversity of
multimedia resources, supporting an MPEG-21 compliant
metadata model [1]. The expert system uses an algorithm
originally devised for clustering web documents [15] to classify
digital items and user profiles based on their attributes and so
enable intelligent TV program recommendations. These systems
communicate through web services under a flexible, distributed
architecture. This paper presents the design and implementation
of the itvSimu system.
Figure 2. iTV adaptation architecture
3.1 Design Approach
In effect, the developed user interface comprises an EPG
simulator. The choice of implementation technologies was not
straightforward, given the plethora of available standards and
technologies such as MHP⁴, GEM-IPTV, TV-Anytime, DVB-IP,
Java-TV and more. Given the requirement for incorporating
networking functionality into the EPG subsystem, a web-based
approach has been adopted instead of a standalone application.
This approach ensures that the EPG can be executed through a
standard browser interface. The design approach follows.
⁴ http://www.mhp.org/
During the early phases of the prototype system’s design, the
stakeholders were identified:
• The end user: he/she interacts with the iTV interface,
browsing and consuming digital content. The end user is
associated with an XML-based user profile, which includes
personal data, preferences regarding audiovisual content
(e.g. sports, news, movies) and potential disabilities (hearing
problems, visual impairments, etc.).
• The Service Provider: the analogue of a traditional TV
channel.
• The TV Guide Provider: a service that informs end users
about the offered services and their time schedule.
Occasionally, the Service Provider and the TV Guide Provider
coincide; for simplicity, we have made this assumption while
designing our prototype. Our focus has been on the interaction of
the end user with the iTV interface, since that affects the overall
functionality of a personalized system, with particular emphasis
on disabled users.
Figure 3 illustrates the three elementary sub-systems of the iTV
user interface: the player, the EPG and the logger. These
subsystems are supported by auxiliary services that enhance the
functionality of the iTV simulator. Below we analyze the
functional and interactivity requirements of these subsystems and
discuss the solutions adopted in our prototype.
Figure 3. A screenshot of the iTV user interface: selection of
TV programme through the EPG selection service
3.2 itvSimu Subsystems
Logger subsystem: the simplest, yet a crucial, software module, as
it provides feedback to the user about “hidden” operations. It
records and displays all (implicit or explicit) user actions (e.g.
profile modification, starting/pausing/resuming a TV program,
etc.). It has been implemented with the Java Observer pattern: the
observed user actions activate the logger.
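The Observer arrangement behind the logger can be sketched as follows. The prototype is implemented in Java; this sketch uses Python purely for brevity, and the class names `ActionSource` and `Logger`, as well as the action strings, are hypothetical and do not come from the itvSimu code.

```python
class ActionSource:
    """Subject: any UI component whose user actions should be observed."""

    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def notify(self, action, implicit=False):
        # Forward each user action to every attached observer.
        for observer in self._observers:
            observer.update(action, implicit)


class Logger:
    """Observer: records every action and echoes it back to the user."""

    def __init__(self):
        self.entries = []

    def update(self, action, implicit):
        kind = "implicit" if implicit else "explicit"
        entry = f"[{kind}] {action}"
        self.entries.append(entry)
        print(entry)


player = ActionSource()
logger = Logger()
player.attach(logger)
player.notify("start TV program")
player.notify("playing time updated", implicit=True)
```

Because the subject only knows the `update` method of its observers, further listeners (for instance, a component that stores the interaction history) could be attached without changing the player or the EPG.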
Player subsystem: it reproduces iTV programs (digital items) and
also records the user’s interaction history. Its elementary module
is the digital content player. Such a player should support more
than the basic functionality (play, pause, rewind, etc.), for
example subtitles, audio descriptions, etc. Given that no MPEG-21
player is currently available, we have chosen to use SMIL as an
intermediate technology, mainly because of the numerous
available SMIL players (e.g. X-Smiles [18], QuickTime player).
In particular, the MPEG-21 digital item declarations are
transformed to SMIL format through an appropriate XSLT
transformation, and the SMIL markup is subsequently parsed by
the SMIL player. This approach ensures the interoperability of the
iTV interface, since SMIL is now considered a mature web
technology. In our prototype, the SMIL player has been
implemented using the QuickTime for Java API⁵ [10]. The XSLT
transformation of MPEG-21 digital items to SMIL documents
depends on the user profile, taking into account potential user
disabilities. An example of such a digital item declaration and its
SMIL representation is given in Figure 4.
The second function of the Player subsystem is the provision of
user interaction information to the expert (recommendation)
system. An XML-based description of the user interaction is first
stored in a native XML database located on the iTV server and
then retrieved by the recommendation system to enable more
effective and reliable reasoning. In effect, the user interaction
history comprises a function f(x, y, ..., z), where x, y, ..., z are the
values of ‘interaction parameters’. Such parameters are either
explicitly provided by the user or implicitly inferred by the
player. An example of an implicit parameter is the ratio of the
playing time of a video to its overall duration, while the rating of
a TV program (on a 0-10 scale) could be explicitly provided by
the viewer. For two parameters x and y, the interaction history
function could be expressed as f(x, y) = a x + b y, where a and b
are weights that reflect the designer’s priorities and can be
specified either statically or dynamically (through training). As
shown in Figure 5, the user’s interaction history and the TV
program ratings posted by users that belong to the same ‘users
cluster’ (e.g. the same disability group) comprise the input of the
recommendation system. The latter recommends, among the
available digital content, those programs that suit the user’s
profile.

<?xml version="1.0" encoding="UTF-8"?>
<DIDL xmlns:xsi=http://www.w3.org/…="....\MPEG-21-DI\DIDL.xsd">
  <ITEM>
    <DESCRIPTOR>
      <STATEMENT TYPE="text/plain">
        Movie for normal, blind or deaf individuals
      </STATEMENT>
    </DESCRIPTOR>
    <ITEM>
      <DESCRIPTOR>
        <STATEMENT TYPE="text/plain">
          Movie for normal individuals
        </STATEMENT>
      </DESCRIPTOR>
      <COMPONENT>
        <RESOURCE REF="video.mov" TYPE="video/mov"/>
      </COMPONENT>
    </ITEM>
    <ITEM>
      <DESCRIPTOR>
        <STATEMENT TYPE="text/plain">
          Movie for blind individuals
        </STATEMENT>
      </DESCRIPTOR>
      <COMPONENT>
        <RESOURCE REF="video.mov" TYPE="video/mov"/>
        <RESOURCE REF="audiodescription.mov" TYPE="audio/mp3"/>
      </COMPONENT>
    </ITEM>
    <ITEM>
      <DESCRIPTOR>
        <STATEMENT TYPE="text/plain">
          Movie for deaf individuals
        </STATEMENT>
      </DESCRIPTOR>
      <COMPONENT>
        <RESOURCE REF="video.mov" TYPE="video/mov"/>
        <RESOURCE REF="captions.txt" TYPE="text/plain"/>
      </COMPONENT>
    </ITEM>
  </ITEM>
</DIDL>

<?xml version="1.0" encoding="UTF-8" ?>
<smil xmlns:qt="http://www.apple.com/...." time-slider="true">
  <head>
    <layout>
      <root-layout width="320" height="350" background-color="black" />
      <region id="captions" backgroundColor="yellow"
              top="250" height="100" left="1" width="310" />
      <region id="movie" left="0" top="0" width="620" height="740" />
    </layout>
  </head>
  <body>
    <par>
      <textstream src="captions.txt" region="captions"
                  systemCaptions="on" />
      <video src="video.mov" alt="Movie title" region="movie"
             begin="00:00.0" dur="00:14:02.000" />
    </par>
  </body>
</smil>

Figure 4. A Digital Item Declaration document transformed to
SMIL format, which synchronizes a video with captions
(appropriate for hearing impaired individuals).

[Figure 5 diagram labels: Digital content rating; MPEG-21
documents (digital items); User Profile; User Interaction History;
Recommendation System; iTV Schedule recommendation]
Figure 5. TV schedule recommendation.
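The profile-dependent transformation of a digital item declaration into SMIL, which the Player subsystem performs with XSLT, can be illustrated with the sketch below. It is not the project's stylesheet: it reproduces the selection logic in Python with the standard `xml.etree.ElementTree` module, works on a simplified declaration without namespaces or descriptors, and omits the SMIL layout. The mapping `EXTRA_TYPE` and the functions `select_variant` and `to_smil` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified digital item declaration (no namespaces, no descriptors).
DID = """
<DIDL>
  <ITEM>
    <ITEM>
      <COMPONENT>
        <RESOURCE REF="video.mov" TYPE="video/mov"/>
      </COMPONENT>
    </ITEM>
    <ITEM>
      <COMPONENT>
        <RESOURCE REF="video.mov" TYPE="video/mov"/>
        <RESOURCE REF="captions.txt" TYPE="text/plain"/>
      </COMPONENT>
    </ITEM>
  </ITEM>
</DIDL>
"""

# Hypothetical mapping from a profile's disability to the extra media
# type that the selected variant must carry.
EXTRA_TYPE = {"deaf": "text", "blind": "audio", None: None}


def select_variant(did_root, disability):
    """Pick the nested ITEM whose resources match what the profile needs."""
    wanted = EXTRA_TYPE[disability]
    for item in did_root.findall("./ITEM/ITEM"):
        kinds = {r.get("TYPE").split("/")[0] for r in item.iter("RESOURCE")}
        extras = kinds - {"video"}
        if extras == ({wanted} if wanted else set()):
            return item
    raise LookupError(f"no variant for profile: {disability}")


def to_smil(item):
    """Build a SMIL document that plays all resources of the variant in parallel."""
    smil = ET.Element("smil")
    par = ET.SubElement(ET.SubElement(smil, "body"), "par")
    tag_for = {"video": "video", "audio": "audio", "text": "textstream"}
    for resource in item.iter("RESOURCE"):
        kind = resource.get("TYPE").split("/")[0]
        ET.SubElement(par, tag_for[kind], src=resource.get("REF"))
    return ET.tostring(smil, encoding="unicode")


root = ET.fromstring(DID)
print(to_smil(select_variant(root, "deaf")))
```

For a profile that declares a hearing disability, the sketch selects the variant that carries captions and emits a `par` block with a `video` and a `textstream` element, mirroring the structure of the SMIL document in Figure 4.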
EPG subsystem: This is the most ‘interactive’ subsystem, since the
user relies on it to browse, navigate and download audiovisual
content. In the context of our research project we have identified
several use cases, according to which the iTV end-user may use
the EPG in order to:
• navigate within the available iTV services (zapping);
• personalize the audiovisual content based on her potential
disabilities and content preferences;
• schedule a reminder for a TV program.
An important consideration during the EPG’s development has
been the representation and retrieval of the TV schedule. To
satisfy this design requirement we have used TV-Anytime
Programme metadata [16] along with the TV-Anytime Java API
of the BBC⁶. The overall functionality of the EPG has been based
upon the specifications of the Java TV API (JSR-000927),
followed in a non-strict manner. The result of the BBC TV
schedule retrieval on the iTV interface is shown in Figure 3.
The most important part of content personalization has been the
modelling of user characteristics (e.g. disabilities) and
preferences. To address this issue, we have adopted the
Interaction Profile of the DAWIS framework for the design of
adaptive web information systems [17]. The most abstract layer of
the DAWIS Interaction Profile consists of the Service Interaction
Profile, the Delivery Context Interaction Profile, the User
Interaction Profile and the Platform Interaction Profile. Based on
that, an itvProfile schema has been developed and serialized in
XML syntax, including elements such as LanguageNative,
Languages, ContentPreferences, Disabilities, Subtitles, Captions,
AudioDescription and SignLanguage. The itvProfile instances are
stored, through XQuery⁷, in a separate collection of the XML
database.
⁵ QuickTime for Java (QTJ) is a software library that allows software
written in Java to provide multimedia functionality by making calls into
the native QuickTime library. QTJ offers SMIL support and can also
handle a larger variety of multimedia formats than the ‘traditional’ Java
Media Framework (JMF) API.
⁶ http://www.bbc.co.uk/opensource/projects/tv_anytime_api/
⁷ XQuery 1.0: An XML Query Language: http://www.w3.org/TR/xquery/
4. CONCLUSIONS
This section summarizes what has been achieved so far and shares
our design and implementation experience with the
standardization committees of the relevant recommendation
documents.
So far, the developed system is at a prototype level, and the
systems involved (i.e. expert system, authoring tool, iTV
simulator) have not yet been evaluated as a whole. However, at
this stage, itvSimu seems to offer an interesting and simplified
architecture that can realize a primitive IP-TV platform and
further serve as benchmarking software for research in the field
of content adaptation and accessibility. Currently, the prototype
supports only a portion of the user groups. The reason is that
evaluating the adaptation behavior is difficult: it requires a
considerable number of users with diverse profiles and a
comparable number of digital items. Such an evaluation is
considered future work. In addition, it would be interesting to
consider more runtime parameters (implicit profile) and more
effective models for combining them, perhaps through AI
techniques and simulation. Finally, a separate version of itvSimu
optimized for users with hearing problems (e.g. incorporating
auditory menus functionality) will be implemented.
From the point of view of standardization efforts, it turned out
that selecting standards was a difficult task, as there are many of
them, often overlapping and/or contradicting each other.
Consequently, even if a designer uses open standards, her final
overall design comes out as a proprietary solution composed of
several open standards.
5. ACKNOWLEDGMENTS
This work is supported by the General Secretariat of Research and
Technology (Project “Software Applications for Interactive Kids
TV-MPEG-21”, project framework “Image, Sound, and Language
Processing”, project number: EHΓ-16). The participants are the
University of the Aegean, the Hellenic Public Radio and
Television (ERT) and Time Lapse Picture Hellas.
6. REFERENCES
[1] Anagnostopoulos, C.; Tsekouras, G.; C.; Gavalas, D.;
Economou, D.; Psoroulas, I. (2007). Increasing Interactivity
in IPTV using MPEG-21 Descriptors. Proceedings of the 4th
IFIP Conference on Artificial Intelligence Applications &
Innovations (AIAI’2007), pp. 65-72, September 2007.
[2] Berglund, A.; Johansson, P. (2004). Using speech and
dialogue for interactive TV navigation. Universal Access in
the Information Society, 3(3-4):224-238.
[3] Carmichael, A.; Rice, M.; Sloan, D. (2006). Inclusive Design
and Interactive Digital Television: Has an Opportunity been
Missed? In 3rd Cambridge Workshop on Universal Access
and Assistive Technology. Fitzwilliam College, Cambridge,
10-12 April 2006.
[4] Chorianopoulos, K.; Spinellis, D. (2006). User Interface
Evaluation of Interactive TV: A Media Studies Perspective.
Universal Access in the Information Society, 5(2):209-218,
Springer.
[5] Darby, S. (1997). Introduction to Enhancing the
Accessibility of Digital Television. In RNIB.
[6] Ferguson, D.; Perse, E. (2000). The World Wide Web as a
Functional Alternative to Television. Journal of
Broadcasting & Electronic Media, 44(2):155-174.
[7] Gil, A.; Pazos, J.; Lopez, C.; Lopez, J.; Rubio, R.; Ramos,
M.; Diaz, R. (2002). Surfing the Web on TV: the MHP
approach. Proceedings of the 2002 IEEE International
Conference on Multimedia and Expo (ICME ’02), vol. 2,
pp. 285-288.
[8] ISO MPEG-21, Part I: Information technology - Multimedia
framework (MPEG-21) - Vision, Technologies and
Strategy, ISO/IEC TR 21000-1:2004.
[9] ISO/IEC 21000-7, Information technology - Multimedia
framework (MPEG-21) - Part 7: Digital Item Adaptation,
First Edition, 2004.
[10] QuickTime for Java,
http://developer.apple.com/quicktime/qtjava/.
[11] Rice, M. (2004). Personalisation of interactive television for
visually impaired viewers. In 2nd Cambridge Workshop on
Universal Access and Assistive Technology. Fitzwilliam
College, Cambridge, 22-24 March 2004.
[12] Schematron 1.5,
http://xml.ascc.net/schematron/schematron1-5.sch.
[13] Springett, M.; Griffiths, R. (2007). Accessibility of
Interactive Television for Users with Low Vision: Learning
from the Web. In Proceedings of the 5th European
Conference on Interactive TV (EuroITV’2007), LNCS 4471,
pp. 76-85, May 2007.
[14] Thang, T.C.; Yang, S.; Ro, Y.M.; Wong, E.K. (2007). Media
Accessibility for Low-Vision Users in the MPEG-21
Multimedia Framework. IEICE Transactions on Information
and Systems, E90-D(8), pp. 1271-1278.
[15] Tsekouras, G.; Anagnostopoulos, C.; Gavalas, D.;
Economou, D. (2007). Classification of Web Documents
using Fuzzy Logic Categorical Data Clustering. Proceedings
of the 4th IFIP Conference on Artificial Intelligence
Applications & Innovations (AIAI’2007), pp. 93-100,
September 2007.
[16] TV-Anytime, ETSI TS 102 822: Broadcast and On-line
Services: Search, select, and rightful use of content on
personal storage systems.
[17] Vlachogiannis, E. et al. (2008). “A reference framework for
the Design of Adaptive Web Information Systems (DAWIS)
inspired from a general systems’ research”. Working paper.
[18] X-Smiles SMIL player,
http://www.xsmiles.org/xsmiles_smil.html.
[19] Yang, A.; Ro, Y.; Nam, J.; Hong, J.; Choi, S.; Lee, J. (2004).
Improving Visual Accessibility for Color Vision Deficiency
Based on MPEG-21. ETRI Journal, vol. 26, no. 3, June
2004, pp. 195-202.
[20] Zillmann, D. (2000). The coming of media entertainment. In:
Zillmann, D.; Vorderer, P. (eds) Media entertainment: the
psychology of its appeal. Lawrence Erlbaum Associates,
Mahwah, pp. 1-20.