Video is the electronic signal or data that, when rendered through a display, appears as moving images. Video is captured by a video camera, whose sophistication ranges from studio equipment to the camera in a mobile phone. The principle of operation is the same: an image is captured as light from the scene passes through a lens and is brought into focus on the imaging device. Typically the imaging device is made up of many thousands of individual sensors, each capturing one pixel of the image. To capture colour images, the light hitting the sensor is split into Red, Green and Blue (RGB) using suitable filters, with each component having its own sensor, i.e. each pixel effectively has three sensors. Almost any colour we can see can be represented as a combination of red, green and blue.

When the sensors are exposed to light from the scene, by opening a shutter, the intensity is recorded as an analogue electronic voltage. The amount of light can be adjusted by changing the aperture size (how big the hole is that lets light in from the outside world) and/or the shutter speed (how long the hole is open for). In low light conditions a large aperture and/or a slow shutter speed will be necessary. The image, captured as analogue voltages with RGB values for each pixel, is converted to a digital one using an analogue-to-digital converter, typically with eight bits for each value.

Up to this point the basic process is the same for a (digital) still camera and a video camera. The big difference is that the video camera has to take tens of pictures per second to capture anything moving in the scene, so that it looks like natural movement when played back on the display device. Television has traditionally used 25 pictures per second (30 in the US system), which is similar to the 24 used in cinema. Some low-end devices use 12 pictures per second.
When filming fast-moving scenes, such as sports, rates of up to 60 pictures per second may be used.

Standard definition television using the PAL system, adopted in the UK and much of Europe, has a resolution of 720 x 576 pixels, matching the resolution of the original analogue system. Given this, the video bit-rate being generated by the camera is: 720 x 576 pixels x 3 colours x 8 bits x 25 pictures per second = 248.8 Mbit/s. That's without any audio or the extra bits that we'll need to indicate the beginning and end of a picture.

On a camera we need to store the video using whatever media is available: tape, DVD, memory card or hard drive. As an example, at the bit-rate above a 16 GByte memory card would be full in about eight and a half minutes, so no feature-length movies here. What if we want to communicate the video, as television for example? Our TV transmission systems (terrestrial, satellite and cable) are limited to 50 Mbit/s at most, even assuming we only need to carry one program, so we have a problem. What about transmitting it using broadband? Even with an exceptionally fast connection we still won't have enough bits per second to carry our video.

Of course we know that video is transmitted via satellite, terrestrial and broadband very successfully, so what happens to enable this? At first glance we can see that reducing the resolution (that is, reducing the number of pixels) will lower the bit-rate: halve the number of pixels and the bit-rate halves. We could also reduce the picture rate (e.g. use 10 pictures per second instead of 25), or we could reduce the number of bits used in our analogue-to-digital converter. However, all of these would affect what we see on the screen and how we perceive the video quality. We need to reduce the bit-rate somehow without noticing the difference when we come to display the video. Fortunately there are techniques whose effects we notice far less in the viewed video.
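The bit-rate arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in the text; the function names are made up for illustration, and the memory card is assumed to hold 16 x 10^9 bytes (a decimal GByte).

```python
def raw_bit_rate(width, height, components=3, bits_per_value=8,
                 pictures_per_second=25):
    """Uncompressed video bit-rate in bits per second (no audio, no framing bits)."""
    return width * height * components * bits_per_value * pictures_per_second


def seconds_until_full(capacity_bytes, bit_rate):
    """How long a store of the given size lasts at the given bit-rate."""
    return capacity_bytes * 8 / bit_rate


# Standard definition PAL figures from the text.
SD_RATE = raw_bit_rate(720, 576)

# Assumption: a "16 GByte" card holds 16 x 10^9 bytes.
CARD_BYTES = 16 * 10**9

if __name__ == "__main__":
    print(f"Raw SD bit-rate: {SD_RATE / 1e6:.1f} Mbit/s")
    minutes = seconds_until_full(CARD_BYTES, SD_RATE) / 60
    print(f"16 GByte card is full after {minutes:.1f} minutes")
    # Naive reductions: half the pixels, or 10 pictures per second.
    print(f"Half the pixels: {raw_bit_rate(720, 288) / 1e6:.1f} Mbit/s")
    print(f"10 pictures/s:   {raw_bit_rate(720, 576, pictures_per_second=10) / 1e6:.1f} Mbit/s")
```

Even the naive reductions leave the bit-rate well above the 50 Mbit/s limit quoted for a broadcast channel, which is why something cleverer is needed.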
These techniques rely on how we perceive the video image.

Communication Technology Notes, R Germon, 2010: Digital Video

The first trick is to notice that we are less able to see detail in colour than in brightness. If we have a series of lines where just the colour changes between them (no brightness change), the lines need to be further apart than if they are black and white (just brightness change) for us to see distinct lines rather than a continuous colour (or grey). In more technical terms, we can see greater resolution in luminance (brightness) than in colour. To make use of this in our treatment of video, the RGB values can be transformed into YCrCb, or Luma, Chroma Red and Chroma Blue *. The Luma value carries the black-and-whiteness or brightness (luminance) and the Chroma values carry the colour information. This is just a transformation of the values: we can always transform them back, and we do when it comes to displaying the video. The advantage of this transformation is that we can now treat the Luma and Chroma values separately. We find that, provided we keep the same number of pixels showing black-and-whiteness (the Luma values), we can have fewer pixels showing the coloured (Chroma) part. Typically we can get away with half as many chroma pixels as luma both horizontally and vertically, and not notice the difference when we convert back to the full set of RGB values that a display requires. Each chroma component then needs only a quarter as many values as luma, so a picture needs 1 + 1/4 + 1/4 = 1.5 values per pixel instead of 3. This way we can halve the number of bits representing each picture, and therefore the bit-rate, without really noticing.

Another step we can take is to lose spatial detail in each picture that we don't really notice. It turns out that to resolve fine detail we need high contrast, so fine detail that doesn't have large luma or chroma changes can be lost without us noticing. This is the principle of JPEG still image compression, which is also used in video. Teasing out the fine detail involves transforming our pixel values into a form that identifies the fine detail.
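Going back to the colour trick described above, the sketch below shows one way the RGB-to-luma/chroma transformation and the saving from chroma subsampling might look. The notes do not say which transform coefficients are used; the ones here are the full-range values used in JPEG (JFIF) files and are an assumption, as are the function names.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to luma and two chroma values.

    Assumption: full-range JFIF coefficients; broadcast systems use
    scaled variants of the same idea.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def values_per_picture(width, height, chroma_h_div=1, chroma_v_div=1):
    """Count the values in one picture: full luma plus two chroma planes,
    each reduced by the given horizontal and vertical divisors."""
    luma = width * height
    chroma_plane = (width // chroma_h_div) * (height // chroma_v_div)
    return luma + 2 * chroma_plane


if __name__ == "__main__":
    # A grey pixel has no colour: both chroma values sit at the neutral 128.
    print(rgb_to_ycbcr(128, 128, 128))

    full = values_per_picture(720, 576)               # chroma at full resolution
    subsampled = values_per_picture(720, 576, 2, 2)   # half as many each way
    print(full, subsampled, subsampled / full)
```

With chroma halved in both directions the picture needs exactly half as many values, which matches the halving of the bit-rate described in the text.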
Fortunately there is a mathematical tool that picks out fine detail in just this way: the Discrete Cosine Transform. Once transformed, approximations are made to the values representing fine detail; these often turn out to be zero, which can then be coded very efficiently. How close the approximations are is decided when the degree of compression, or video quality, is set. We can typically halve the number of bits per picture, and hence the bit-rate, using these techniques without noticeable degradation in picture quality.

Video picture sequences are usually made up of consecutive pictures that have a lot of similarity. Only a small amount of the image changes from one picture to the next. Parts of the image that have moved will typically have the same pixel values as the corresponding pixels in the previous and following pictures, just in a different position in the image. Where blocks of pixels are the same it is not necessary to store or transmit the values again; instead, a position vector (commonly called a motion vector) pointing to the reference pixels can be coded. Typically blocks of 16 x 16 pixels are treated in this way, giving the potential for enormous bit savings. This yields three types of picture in the compressed video: Intra pictures (Keyframes in editing), which have no reference to adjacent pictures and use just the still image techniques described above; Predicted pictures, which use past pictures as reference; and Bidirectionally predicted pictures, which use past and future pictures as reference.

Once all this has been done there is still scope for further reduction of bit-rate by noting that some bit sequences are more likely than others and coding accordingly, giving the more likely sequences shorter codes. These techniques combined enable up to a 100-fold reduction in bit-rate whilst maintaining a broadcast quality image.

This discussion started with a standard definition picture. The trend to have bigger TVs and watch video up close (e.g.
on a monitor) has meant there is a drive towards higher resolution images (commonly known as High Definition), and even consumer camcorders and phones now have HD recording capability. So-called full-HD in widescreen format has 1920 x 1080 pixels, five times the pixel count of standard definition. This drives the need for video compression even harder. Actual implementations typically employ refinements of the principles discussed above, for example in MPEG-4.

* This trick was actually used with analogue TV transmissions to allow colour transmissions simultaneously with the original monochrome. The Y component was used in monochrome receivers. Colour receivers could also decode the low bandwidth chroma signals.

Until recently, mobile phone screen resolution was a maximum of about 480 x 320 pixels (e.g. the iPhone models up to the 3GS), with most being somewhat below this, giving a pixel count nearly three times lower than standard definition. In terms of video delivery this works in our favour.

So through compression we can reduce the bit-rate required to store or transmit video to a level that is commensurate with that available from existing communication channels and the capacity of storage media. On the other hand, it does make things substantially more complicated. Another consequence is that we now require a very reliable (virtually error-free) transmission or storage channel to prevent the bare-bones video data from being corrupted and hence the displayed image distorted.

Whilst the compression techniques used for broadcast TV are standardized (so that all TV and set-top-box manufacturers can build a working decoder; they actually use MPEG-2), that is not the case when it comes to computer-based video and IP-based distribution.
In the world of the PC there are a number of different compression techniques (codecs) and an even greater number of ways of wrapping up the compressed data and mixing it (multiplexing) with other streams such as audio and textual information: the so-called container format. Fortunately most media players are able to handle the most common of these.

The Internet, and IP networks generally, are a relatively new medium for distributing video and are significantly different to the broadcast transmission systems traditionally used for TV. In the latter, dedicated, fixed bandwidth is available for a particular channel and the image quality is usually very high. The programs themselves are usually of a high production quality, and censorship of the content is implicit. Internet video can have greatly varying quality in terms of both image and production. Video can be delivered over IP networks in three main ways:

Download and Play: Here a video file must be downloaded in its entirety before it can be played. A copy of the video file is stored locally to the media player. The video quality available is dependent only on how the video was captured and encoded to create the file.

Progressive Download: Here the video file is opened by the media player whilst it is still being downloaded. However, the play-rate (fixed by the compressed video bit-rate) and the download rate (what is available from the communication path) are independent, so playback can stall if the download falls behind. The file has to exist on the server before this can happen, so this method is not suitable for streaming a live source.

Live Streaming: Here the video is transmitted at the same rate as it is rendered; there is no local storage (apart from a small amount of buffering). The communication bandwidth must be at least as big as the video bit-rate at all times. This technique usually requires special protocols and is necessary if the video stream is from a live feed *.

Whereas in broadcast video all receivers receive the same signal (i.e.
a single video stream is transmitted to all receivers), an IP network is usually used in a unicast fashion: one stream per receiver, even if the stream is the same video. The network bandwidth required therefore increases at the same rate as the number of receivers. One way round this is to use multicasting. This is where a packet is only duplicated when necessary in a network that supports multicast (the routers have to support multicasting). Alternatively, a content delivery network may be used, where the video content is duplicated at the edges of the network close to the end-user.

* Apple has recently introduced an alternative protocol called HTTP Live Streaming that is required to stream video to iPhones. In this, a file or stream (in MPEG-TS format) is broken up into small files by a segmenter. An index file lists the segment files, so that the player, which keeps track of what it has received and rendered, can request the next one. Streams of different quality may be available, which can be chosen depending on available bandwidth. Apple reference: http://developer.apple.com/library/ios/#documentation/NetworkingInternet/Conceptual/StreamingMediaGuide/Introduction/Introduction.html#//apple_ref/doc/uid/TP40008332-CH1-DontLinkElementID_29
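The way unicast bandwidth grows with the number of receivers, as described above, can be illustrated with some simple arithmetic. This is a rough sketch only: the 2.5 Mbit/s stream is a hypothetical figure (roughly the raw standard definition rate after a 100-fold reduction), the function names are invented for illustration, and the multicast case is idealised as a single copy of the stream on any one link.

```python
def unicast_bandwidth(stream_bit_rate, receivers):
    """Bandwidth leaving the server when every receiver gets its own copy."""
    return stream_bit_rate * receivers


def multicast_link_bandwidth(stream_bit_rate, receivers):
    """Idealised multicast: any one link carries at most a single copy,
    however many receivers lie beyond it."""
    return stream_bit_rate if receivers > 0 else 0


# Hypothetical compressed stream of 2.5 Mbit/s.
STREAM_BIT_RATE = 2_500_000

if __name__ == "__main__":
    for receivers in (1, 10, 1000):
        unicast = unicast_bandwidth(STREAM_BIT_RATE, receivers)
        multicast = multicast_link_bandwidth(STREAM_BIT_RATE, receivers)
        print(f"{receivers:>5} receivers: "
              f"unicast {unicast / 1e6:.1f} Mbit/s, "
              f"multicast {multicast / 1e6:.1f} Mbit/s per link")
```

A content delivery network does not change the per-receiver arithmetic, but it spreads the unicast load across servers near the edges of the network instead of concentrating it at one source.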