Whitepaper Mpegif PDF
- Licensing
Copies of this document may be freely downloaded from the MPEGIF web site at
http://www.mpegif.org
This document is based on the original 2002 white paper produced by Martin
Jacklin from contributions by Tim Schaaff of Apple, Sebastian Moeritz of dicas
digital image coding, Yuval Fisher of Envivio, U. Georg Ohler of Fraunhofer IIS,
David Price and Fadi Malak of Harmonic, Jennifer Toton of iVAST, Rob Koenen
of InterTrust, Jan van der Meer of Philips, Olivier Avaro of France Telecom,
Professor Klaus Diepold of the Technical University of Munich, and other
members of the MPEGIF Board of Directors and Marketing Workgroup.
Trademarks used herein are the property of their mark holders and are used for
illustrative purposes only. MPEGIF thanks Apple Computer, Envivio, Inc.,
Harmonic, NTT DoCoMo, Panasonic, Pelco, Samsung, Streamcrest Associates,
Sony, Tandberg TV, and others for the use of the photographs and illustrations
they have provided.
Table of Contents
What Is MPEG-4?
Internet Streaming
Packaged Media
MPEG-1
MPEG-2
MPEG-4
MPEG-7
MPEG-21
Related Standards
With the release of Part 10 AVC, is Part 2 video coding obsolete?
What is the relationship between MPEG-4 Visual and the DivX codec?
I read a benchmark of MPEG-4 where it did poorly, how can you claim it is higher performance?
MPEG-4's latest video codec is AVC, the Advanced Video Codec. Also
identically standardized as ITU H.264, the AVC codec represents the latest
developments in video coding, offering a typical compression rate half that of
MPEG-2 for similar perceived quality. This dramatic improvement has led to
AVC becoming the new standard for video transmission, being employed in
most new video products and services where quality and compression efficiency
are paramount. New HDTV satellite broadcasting and DSL video services will
use AVC, as will the Sony PlayStation Portable and Apple QuickTime 7 player.
AVC will also be used in video broadcasting to mobile handsets using the
DVB-H, DMB, and MediaFlo systems, and specified in the HD-DVD and
BluRay high-definition optical disc standards.
AAC is being used not only in television broadcasting, where it is the codec for
Japan's ISDB digital TV system, but also in music players and distribution
services, such as Apple's iPod and iTunes.
The low-bitrate quality improvements of the HE-AAC codec have enabled new
digital music broadcasting services, such as XM satellite radio and the music
download services of mobile carriers KDDI and Orange. The latest version,
HE-AAC v2, incorporating MPEG-4's Parametric Stereo tool, has just been
standardized by 3GPP for future music streaming services.
AAC (Advanced Audio Codec): high-performance audio codec for excellent quality at moderate bitrates.
Applications: Portable Music (Apple iPod, iTunes); DTV Broadcasting (ISDB, Japan).
While video and audio are coded with MPEG-4's excellent conventional codecs,
graphics, text and synthetic objects have their own coding, rather than forcing
them into pixels or waveforms. This makes representation more efficient and
handling them much more flexible. Additionally, MPEG-4 offers revolutionary
synthetic content tools, like structured audio (a language for describing sound
generation), animated faces and bodies, 2D and 3D meshes, and vector-based
graphics. This allows separate interaction with each object and sharp rendering at
all bandwidths, e.g. subtitles that can be turned on and off and remain sharp
even at high compression rates.
Truly a framework of the future, MPEG-4's rich media tools enable the
creation of content that is difficult or impossible to achieve with the proprietary
tools available today. These tools are already in use for distributing complex
corporate and educational multimedia presentations that require tight
synchronization of video and graphics content, with entertainment applications
expected as the creative community moves towards interactive content.
Excellent Performance
A codec's compression efficiency, or the bitrate required for a given perceived
quality level, is one of its most important features. MPEG-4's latest audio and
video codecs, AAC and AVC, provide outstanding compression efficiency,
equaling or exceeding the performance of any codecs now available.
[Figure: bitrate in kb/s (axis 0 to 350) for MP3, AAC, HE-AAC, and MPEG Surround; series: Stereo and 5.1 Surround.]
Once developed and proposed, new standards go through careful review and
approval to ensure that they are completely and unambiguously specified, so they
can be implemented solely by following the standard. Additionally, test results,
frameworks, and the source code to reference software implementations are made
available so anyone can see how the codec was implemented, study why design
choices were made, and understand how the codec operates.
These same principles apply to MPEG-2, where history has shown them to be
very worthwhile. Early MPEG-2 encoders required a 6 Mb/s bitrate for
acceptable picture quality, while today's encoders can operate at 2 to 2.5 Mb/s,
saving 60-65% of the original bandwidth costs, without any changes to the
installed base of MPEG-2 decoders. A similar experience curve is expected for
MPEG-4.
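The saving quoted above can be checked with a few lines of arithmetic. The sketch below uses only the bitrates given in the text (6 Mb/s for early encoders, 2 to 2.5 Mb/s today); the function name `bandwidth_saving` is a hypothetical helper chosen for illustration. The exact figures come out at roughly 58% and 67%, which brackets the rounded 60-65% range.

```python
def bandwidth_saving(old_mbps: float, new_mbps: float) -> float:
    """Fraction of the original bitrate saved when moving from old to new."""
    return (old_mbps - new_mbps) / old_mbps


early = 6.0           # Mb/s, early MPEG-2 encoders (figure from the text)
current = (2.0, 2.5)  # Mb/s, range for today's encoders (figures from the text)

for rate in current:
    saving = bandwidth_saving(early, rate)
    print(f"{rate} Mb/s saves {saving:.0%} of the original {early} Mb/s")
```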
Figure 3. Improvements in coding efficiency are facilitated by MPEG-based competition. Courtesy Harmonic.
[Figure: bit rate (Mbit/s) required for constant broadcast quality, 1994 to 2002, from the 1st commercial encoder through System 3000 to the new-generation Evo5K, with accompanying size reduction.]
Figure 4. Improvements in coding efficiency are facilitated by MPEG-based competition. Courtesy Tandberg TV.
The business failure of an MPEG-4 supplier does not put the standard in
jeopardy, and users may shift new purchases of MPEG-4 equipment to other
suppliers without losing their investment in equipment or software already
purchased. Conversely, the success of an MPEG-4 supplier does not mean research
and development efforts on the core functionality of the codec will be shifted to
product extensions into other areas, as is common when a proprietary standard
matures.
[Figure: content chain diagram with stages Creation, Contribution, Distribution, and Consumption; labels include video, audio, text, interaction, Production, Encode, Playout, Storage, terrestrial, satellite, Packaged, Retail, TV, PC, DVD, and Radio.]
Further, MPEG-4 has been designed from the beginning to operate over lossy
networks such as mobile 3G or WiFi, and incorporates scalability, error
concealment, and error recovery techniques in its codecs so that program quality
is preserved when changes in bandwidth or signal dropouts occur.
Though there is always a chance that an unknown inventor may someday assert a
patent against an MPEG-4 manufacturer or user, it is more likely the inventor
will join an established pool where the costs of licensing and royalty collection
can be shared with many other licensors. His royalties would be distributed as a
share of the pool's, and a user's royalties would be unchanged.
Television Broadcasting
MPEG-2 is the current standard for digital television production and
distribution, but doesn't offer enough compression for transmitting the hundreds
of channels cable and satellite TV consumers will expect in high-definition. As
more programming moves to HDTV, cable and particularly satellite distribution
systems will begin using more efficient compression codecs such as AVC. Already,
major satellite TV operators such as DirecTV, BSkyB, and Premiere have
announced new high-definition services using AVC.
Thousands of AVC encoders have already been deployed in the field by both
established and new encoder suppliers and a number of system operators are
undertaking extensive trials with a view to launching services soon, most notably
UK-based Video Networks Ltd., which has recently launched the first full
revenue-bearing service based on AVC.
Internet Streaming
MPEG-4 was designed from the start for streaming, and it is now featured in
systems from several manufacturers. Apple supports MPEG-4 Simple Profile
video and AAC audio in its QuickTime platform and now includes support for
AVC in QuickTime 7. RealNetworks supports decoding of MPEG-4 content,
and the popular DivX codec is also MPEG-4 compliant. A number of MPEG-4
vendors offer plug-ins for Microsoft's Windows Media Player that enable users to
also watch MPEG-4 content in this player.
Video Conferencing
Video conference terminals are a natural application for MPEG-4's AVC codec,
and recently market leaders Polycom, Tandberg, and Sony have all introduced
AVC support in their products. AVC will enable increased video quality over the
same connection compared to the earlier H.261 and H.263 standards, or a
half-bandwidth connection may be used for similar quality.
Satellite Radio
MPEG-4's HE-AAC, AAC, and BSAC audio codecs have been employed in
several systems for satellite radio and multimedia
broadcasting. XM Radio employs HE-AAC coding, while Digital Radio
Mondiale employs HE-AAC as well as the MPEG-4 speech codecs CELP and
HVXC. Korea's DMB system will employ AAC and BSAC audio coding and
AVC video.
Security
Traditional video surveillance systems have employed time-lapse video recorders
and more recently small Digital Video Recorders using some variant of motion
JPEG or wavelet compression. These devices often must limit the video
resolution and frame rate to provide a reasonable recording time, and they
require proprietary video players or browser plug-ins for users.
Figure 10. Combination CDMA2000 mobile handset and receiver for Satellite DMB service using MPEG-4
AVC and AAC. Courtesy Samsung.
Thus, several security video firms have begun offering improved video systems
with recording of MPEG-4 video at full D1 resolution and frame rates of 25 or
30 frames per second. MPEG-4 coding dramatically reduces the storage costs: A
typical installation with 3000 cameras might use 800 TB of storage to support
15 days of recording. MPEG-4's interoperability also allows users to combine
equipment from different manufacturers in their systems and still be able to
export the video in a universally readable format.
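To put the storage figure in perspective, the short calculation below derives the average per-camera bitrate implied by 800 TB shared among 3000 cameras over 15 days. It is a rough sketch that assumes decimal terabytes (1 TB = 10^12 bytes) and continuous recording on every camera, neither of which the text states; the helper name `average_bitrate_mbps` is hypothetical. Under those assumptions the result is about 1.65 Mb/s per camera.

```python
def average_bitrate_mbps(storage_tb: float, cameras: int, days: float) -> float:
    """Average per-camera bitrate (Mb/s) that would fill the given storage.

    Assumes decimal terabytes (1 TB = 10**12 bytes) and continuous recording.
    """
    total_bits = storage_tb * 1e12 * 8
    seconds = days * 24 * 60 * 60
    return total_bits / (cameras * seconds) / 1e6


# Figures taken from the text: 3000 cameras, 800 TB, 15 days.
rate = average_bitrate_mbps(storage_tb=800, cameras=3000, days=15)
print(f"about {rate:.2f} Mb/s per camera")
```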
Figure 11. MPEG-4 coding enables this casino to store weeks of video with better quality for forensic
analysis. Courtesy Pelco.
MPEG-1
MPEG-1 was originally intended to enable video to be compressed for storage
on optical disks of the era, and led to the Video CD format, hugely popular in
Asia for video content distribution. MPEG-1 was also used in early
video-on-demand systems and interactive media and has become almost
universally supported by PC media players. MPEG-1 audio offers Layer 2 coding,
used in the DAB digital radio standard, and Layer 3 coding, the MP3 format.
MPEG-2
MPEG-2 is the audiovisual standard most widely used for entertainment video
applications. MPEG-2 enables digital television and DVDs, with hundreds of
millions of MPEG-2 decoders deployed in digital satellite and cable set-top
boxes, DVD players and PCs. It is a more powerful format than MPEG-1,
capable of achieving higher compression ratios and supporting interlaced video.
MPEG-2 video decoding and encoding are more CPU-intensive than for
MPEG-1.
Virtually every image you see on television today, even on an analog receiver, has
at some point been coded and decoded in MPEG-2.
MPEG-4
The subject of this paper, MPEG-4 is the successor standard to MPEG-2,
extending its application to IP-based and lossy networks, to rich multimedia
presentations, and to objects and interactivity. MPEG-4 also has improvements
in the core audio and video codecs over MPEG-2, notably the AVC, AAC, and
HE-AAC codecs which have enabled so many new products and services.
Figure 12. The Metadata and Rights Standards MPEG-7 and MPEG-21 are designed to
complement coding standards MPEG-1, -2, and -4. Courtesy Streamcrest Associates.
MPEG-7
MPEG-7 is a recently finalized standard for description of multimedia content.
It will be used for indexing, cataloging, advanced search tools, program selection,
smart reasoning about content and more. The standard comprises syntax and
semantics of multimedia descriptors and descriptor schemes. MPEG-7 is an
important standard because it allows the management, search and retrieval of
ever-growing amounts of content stored locally, on-line, and in broadcasts.
MPEG-21
MPEG-21 is an emerging standard whose goal is to describe a "big picture" of
how the different elements needed to build an infrastructure for the delivery and
consumption of multimedia content, whether existing or under development, work
together. The MPEG-21 world consists of Users that interact with Digital Items.
A Digital Item can be anything from an elemental piece of content (a single
picture, a sound track) to a complete collection of audiovisual works. A User can
be anyone who deals with a Digital Item, from producers to vendors to
end-users.
Interestingly, all Users are equal in MPEG-21, in the sense that they all have
their rights and interests in Digital Items, and they all need to be able to express
those. For example: usage information is valuable content in itself; an end-user
will want control over its utilization. A driving force behind MPEG-21 is the
notion that the digital revolution gives every consumer the chance to play new
roles in the multimedia food chain. While MPEG-21 has lofty goals, it has very
practical implementations.
MPEG-21 is about managing content and access to content. Even with fully
interoperable coding there are still additional steps to ensure that all the features
of different networks work together. MPEG-21 is a framework which allows
interoperability and portability of content.
Related Standards
In their Wireless terminal specification, 3GPP and 3GPP2 use MPEG-4 Simple
Visual profile for video, MPEG-4 file format for multimedia messaging, and
RTP/RTSP for streaming protocols and control. 3GPP has also recently adopted
HE-AAC v2 for music delivery.
There are two ISMA specifications: ISMA 1.0 is mature, conformance testing is
in place, and ISMA-compliant implementations are available. It uses MPEG-4
Simple Visual and Advanced Simple profiles for video, and MPEG-4 High
Quality Audio Profile for audio.
ISMA 2.0 updates the codec suite to use Advanced Video Coding (MPEG-4 Part
10 AVC or H.264) Baseline, Main, and High video profiles, and High-Efficiency
AAC (HE-AAC) audio. Implementations are emerging and conformance testing
is available.
The Interoperability Program has been active since 2000 and currently conducts
three test rounds per year. It is intended as an informal, confidential program
where members can test their initial designs and implementations, and resolve
interoperability problems.
The main work of the group is done through conference calls and a private
reflector. Typically, calls are held every two weeks during an active test round.
Calls may be held less frequently or not at all during the periods between active
test rounds.
The Logo Qualification Program was established in 2004 and offers members a
procedure by which companies may qualify encoder & decoder products and
earn the right to label their products with a logo provided by MPEGIF.
MPEGIF also operates a website with a searchable directory of Qualified
Products. The program is designed to meet the following goals:
Decoder companies must verify that the product correctly decodes all
streams in the relevant Qualification Stream Set. This set is a
combination of published conformance streams and streams posted to
and tested within the Interoperability WG.
The fundamentals of MPEG-4 are described by the parts on Systems (part 1),
Visual (part 2) and Audio (part 3), along with the new part 10 on Advanced
Video Coding. DMIF (Delivery Multimedia Integration Framework, part 6)
defines an interface between an application and the network or storage.
Conformance (part 4) defines how to test an MPEG-4 implementation, and part
5 offers a significant body of example Reference Software that can be used to
start implementing the standard.
Figure 13. This is merely one way to classify MPEG-4's toolset. Courtesy Streamcrest Associates.
[Figure: block diagram of the MPEG-4 parts, including 2. Visual, 10. AVC, 3. Audio (decoding), 4. Conformance, and 5. Reference SW.]
Figure 14. The parts of MPEG-4. The arrows represent the flow of bits through the MPEG-4 system.
[Figure: diagram of MPEG-4 profile groups, including Audio Profiles, Visual Profiles, Graphics Profiles, and Object Descriptor Profile; individual profile labels include Core, Simple 2D, Complete 2D, Main, and hybrid.]
MPEG-4 consists of a large number of tools, not all of which are useful in any
given application. In order to allow different market segments to select subsets of
tools, MPEG-4 contains profiles, which are simply groups of tools. For example,
the MPEG-4 Advanced Simple visual profile contains quarter-pel motion
compensation, B-frames, and global motion vectors, but it does not contain
shape coded video.
Profiles allow users to choose from a variety of toolsets supporting just the
functionality they need. Profiles exist at a number of levels, which provide a way
to limit computational complexity, e.g. by specifying the bitrate, the maximum
number of objects in the scene, audio decoding complexity units, etc.
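The relationship between profiles (sets of tools) and levels (complexity limits) can be sketched as a simple data model. This is an illustration only, not the conformance procedure defined by the standard: the class names, the tool labels, and especially the numeric limits (768 kb/s, 4 objects) are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    tools: frozenset  # the group of tools this profile allows


@dataclass(frozen=True)
class Level:
    max_bitrate_kbps: int  # hypothetical limit
    max_objects: int       # hypothetical limit


@dataclass(frozen=True)
class Stream:
    tools_used: frozenset
    bitrate_kbps: int
    objects: int


def can_decode(profile: Profile, level: Level, stream: Stream) -> bool:
    """A stream fits if it uses only the profile's tools and stays within the level."""
    return (stream.tools_used <= profile.tools
            and stream.bitrate_kbps <= level.max_bitrate_kbps
            and stream.objects <= level.max_objects)


# Tool labels are illustrative names for the tools mentioned in the text.
advanced_simple = Profile(
    "Advanced Simple",
    frozenset({"motion_compensation", "b_frames", "global_motion"}),
)
level = Level(max_bitrate_kbps=768, max_objects=4)

plain = Stream(frozenset({"b_frames"}), bitrate_kbps=512, objects=1)
shaped = Stream(frozenset({"b_frames", "shape_coding"}), bitrate_kbps=512, objects=1)

print(can_decode(advanced_simple, level, plain))   # uses only allowed tools
print(can_decode(advanced_simple, level, shaped))  # shape coding is not in the profile
```

Modelling the profile as a set makes the "subset of tools" idea explicit, while the level is a separate set of numeric ceilings, so the same profile can be paired with several levels.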
The concept of MPEG-2 Video Profiles has been extended to include the Visual,
Audio and Systems parts of the standard, so that all the tools can be
appropriately subsetted for a given application domain.
In the MPEG-2 world, content is created from various resources such as video,
graphics, and text. After it is composited into a plane of pixels, these are
encoded as if they all were video pixels. At the playback side, decoding is a
straightforward operation.
The BIFS language not only describes where and when the objects appear in the
scene, it can also describe behavior (make an object spin or make two videos do a
cross-fade) and even conditional behavior, that is, objects doing things in response to
an event, usually user input. This makes the interactivity of MPEG-4 rich
multimedia possible. All the objects can be encoded with their own optimal
coding scheme (video is coded as video, text as text, graphics as graphics)
instead of treating all the pixels as moving video, which they often really aren't.
For applications that need more complex logic in response to an event,
ECMAScript can be used or the Java language via MPEG-J.
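The idea of a scene made of separately coded objects with conditional behavior can be illustrated with a small model. The sketch below is not BIFS syntax and does not use any MPEG-4 API; the classes `SceneObject` and `Scene`, the event name, and the coordinates are all hypothetical. It shows subtitles kept as a text object that user input can switch on and off without touching the video object.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SceneObject:
    name: str
    media_type: str            # "video", "text", "graphics": each coded in its own way
    position: Tuple[int, int]  # where the object appears
    start_time: float          # when the object appears, in seconds
    visible: bool = True


@dataclass
class Scene:
    objects: Dict[str, SceneObject] = field(default_factory=dict)
    handlers: Dict[str, List[Callable[..., None]]] = field(default_factory=dict)

    def add(self, obj: SceneObject) -> None:
        self.objects[obj.name] = obj

    def on(self, event: str, action: Callable[..., None]) -> None:
        """Register conditional behavior: run `action` when `event` occurs."""
        self.handlers.setdefault(event, []).append(action)

    def fire(self, event: str) -> None:
        for action in self.handlers.get(event, []):
            action(self)


def toggle_subtitles(scene: Scene) -> None:
    subtitles = scene.objects["subtitles"]
    subtitles.visible = not subtitles.visible


scene = Scene()
scene.add(SceneObject("main_video", "video", (0, 0), 0.0))
scene.add(SceneObject("subtitles", "text", (40, 400), 0.0))
scene.on("subtitle_button_pressed", toggle_subtitles)

scene.fire("subtitle_button_pressed")  # user input hides the subtitles
print(scene.objects["subtitles"].visible)
```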
Recently, MPEG has redefined the scene description based on W3C's SVG
instead of VRML (Virtual Reality Modeling Language) that BIFS was based on.
This lightweight scene representation, called Laser, is geared toward 2D applications
on resource-limited devices such as mobile handsets.
BIFS and Laser are descriptions of a scene to compose audio-visual objects. Like all
scene descriptions, they only contain a limited number of features and
sometimes features specific to some applications. To allow content creators more
freedom in the composition and logic of their applications, a programming
language is necessary. The Java-based Graphical Framework eXtension (GFX)
provides a programmatic way to compose and to render audio-visual objects.
GFX was designed around mobile entertainment applications such as 3D games
enhanced with video.
As all the coders in MPEG-4 are optimized for the appropriate data types,
MPEG-4 includes efficient coders for audio, speech, video and even synthetic
content such as animated faces and bodies.
[Figure: bitrate chart with labels "Video compression science" and "MPEG-4 progress".]
Examples of the first kind are support for fonts, synthesized textures and the
Animation Framework eXtension (AFX). Examples of the latter are MPEG-4
Advanced Video Coding (AVC), Lightweight Scene Representation (Laser), and
MPEG-J Graphical Framework eXtension (GFX).
For digital audio broadcasting, MPEG-4 AAC is becoming the codec of choice.
Satellite-based XM Radio uses HE-AAC, as does terrestrial Digital Radio
Mondiale.
MPEG-4 Audio is also inherently scalable. If, for example, a transmission uses an
error-prone channel with limited bandwidth, an audio stream consisting of a
small base layer and a larger extension layer provides a robust solution. Strong
error protection on the base layer (adding only a little overhead to the overall
bitrate) makes sure there is always a signal, even with difficult reception. The
extension layer (with little error protection) and base layer together give excellent
quality in normal conditions. Any errors lead only to a subtle degradation of
quality but never to a total interruption of the audio stream.
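A small numerical sketch shows why protecting only the base layer strongly costs little. All figures here are hypothetical and chosen for illustration (an 8 kb/s base layer, a 40 kb/s extension layer, and error-correction code rates of 1/2 and 9/10); none come from the standard or from a deployed system. The `decoded_quality` function mirrors the behavior described above: losing the extension layer degrades quality, and only loss of the strongly protected base layer would interrupt the audio.

```python
def transmitted_kbps(payload_kbps: float, code_rate: float) -> float:
    """Channel bitrate after error protection with the given code rate (0 < rate <= 1)."""
    return payload_kbps / code_rate


def decoded_quality(base_ok: bool, extension_ok: bool) -> str:
    """Quality the listener gets, depending on which layers arrive intact."""
    if not base_ok:
        return "interrupted"
    return "full" if extension_ok else "base only"


# Hypothetical figures for illustration.
base_kbps, ext_kbps = 8.0, 40.0
base_rate, ext_rate = 0.5, 0.9  # strong protection on base, light on extension

base_tx = transmitted_kbps(base_kbps, base_rate)
ext_tx = transmitted_kbps(ext_kbps, ext_rate)
total_tx = base_tx + ext_tx
base_overhead_share = (base_tx - base_kbps) / total_tx

print(f"total channel rate: {total_tx:.1f} kb/s")
print(f"share spent on base-layer protection: {base_overhead_share:.0%}")
print(decoded_quality(base_ok=True, extension_ok=False))
```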
Patent pool organizations provide a single convenient point for licensing patents
that are essential to implementing one of the MPEG standards. They operate
independently of MPEG and MPEGIF, and each sets its own terms and royalties
for patent licensing. Currently, MPEGIF is aware of two firms that operate pools
for MPEG-4 patents: MPEG LA and Via Licensing.
These joint licensing schemes are not carried out on behalf of ISO, MPEG or
MPEGIF, nor are they, or do they need to be, officially blessed by any such
organization. There is no authority involved in licensing; it is a matter of
private companies working together to offer convenience to the market.
MPEG-4 AVC makes use of the latest research in video coding. The coding
methods rely on the fact that computational power and memory have become
cheaper than 5 years ago, meaning that coding methods can be more complex
than could previously be accommodated in hardware and software
environments.
Additionally, one must remember that Part 2 video is a part of several deployed
standards, such as mobile videophony. The installed base of 3G mobile video
handsets means that Part 2 video encoders will continue to be improved and
Benchmark tests are also complex to perform correctly because of the biases,
fatigue, and other psychological effects of the observers. A viewing jury in a test
conducted according to ITU-R BT.500 sees a controlled set of clips that they rank
without knowing which codec is being used, and the test clips are carefully
chosen so there is a mix of motion, detail, and other features so the test
represents all types of scenes that would be encoded. In less formal benchmarks,
particularly those carried out by journalists or enthusiasts, these effects are often
not considered correctly.
Object model support | Video/audio and rich 2D/3D mixed media, synthetic graphics. DRM on separate streams. | Audio/Video only | Video/audio and mixed media through SMIL-based protocol. No streaming of mixed media. | Video/audio and mixed media through proprietary protocol.
One very promising technology is the DVB standard for interactive TV APIs,
Multimedia Home Platform (MHP), described in more detail below. In the
United States, MHP has its equivalents in the Java-based Digital Applications
Software environment (DASE), an Advanced Television Systems Committee
(ATSC) activity, and in OCAP, the Open Cable Application Platform specified
by the OpenCable consortium, which is based on MHP.
The mainstream of the Broadcast Industry likes Java, because unlike the host of
other proprietary and flavored web-standards based approaches (e.g.
MediaHighway, Liberate, OpenTV), it offers content creators and providers and
service operators a chance to "write once, run many times" using the same
content, which is itself indispensable to creating a horizontal market.
The first thing to understand is that there are two relevant groups of DVB
specifications. The first, DVB 1.0, is the transport foundation of the DVB family
of standards. This specification spells out how to implement DVB-compliant
MPEG streams. DVB is traditionally MPEG-2 based, but MPEG-4 is seen as a
logical evolution, and one which will be more efficient when DVB services are to
be delivered over IP. To this end, DVB has included MPEG-4 Main Profile AVC
and HE-AAC in its latest revision.
The second group, also sometimes referred to as DVB 2.0, addresses the
Multimedia Home Platform and a variety of next generation delivery
applications, including Copy Protection and Copy Management and delivering
DVB services over IP. The Multimedia Home Platform (MHP) defines a generic
interface between interactive digital applications and the terminals on which
those applications execute. The MHP specification defines how to download
applications and media content, typically delivered over a DVB compliant
transport stream, and optionally in the presence of a return channel.
The combination of MHP and MPEG-4 provides the ability to develop very
flexible and rich interactive applications for the interactive broadcast domain.
The MPEG-4 features can be introduced smoothly and gradually, in a
backwards-compatible manner.
In the course of defining the textual format, MPEG-4 was also extended with the
flexible timing models that SMIL uses. This so-called "flextime" support was
added to the rigid, time-stamp-based, broadcast-style synchronization of the
MPEG-2 type.
[Figure: diagram with labels MPEG-7, MPEG-4, MPEG-4 Representation (e.g. mp4 file), and X3D Player.]
The textual format and the binary format are largely dual representations of the
same information. In most situations, one would want to deliver scene
description information in binary form, as that is much more efficient. For
exchanging scenes between authors or storing scenes inside a single organization
in a way that is understood by multiple tools, a textual format is a useful tool. It
is easy to go from text to binary representation, and while the other way is just as
easy in theory, it is harder to do so in a way that is meaningful for an author,
much like a decompiled program can be hard to read.
The MPEG Industry Forum represents more than 80 companies from diverse
industries evenly distributed across North America, Europe and Asia, addressing
MPEG-4 adoption issues that go beyond the charter of ISO/IEC MPEG.
MPEGIF is vital to the success of the MPEG-4 standard, since the work done by
MPEG is necessary but not sufficient. In its endeavors to promote wide adoption
of MPEG-4, MPEGIF picks up where MPEG stops.
To join MPEGIF, or to find out what activities will benefit your company, visit:
http://www.mpegif.org.