L5-6 - Data Representation (Part 2-3)


Data Representation

(Part 2-3)
• A graphic, what we see with multimedia, is really just a bunch of pixels in both the
horizontal and vertical directions.
• In the simplest form, each of the dots or pixels is a bunch of 0s and 1s.
(Figure label: A pixel)

Data Representation 2
1 1 0 0 0 0 1 1
1 0 1 1 1 1 0 1
0 1 0 1 1 0 1 0
0 1 1 1 1 1 1 0
0 1 0 1 1 0 1 0
0 1 1 0 0 1 1 0
1 0 1 1 1 1 0 1
1 1 0 0 0 0 1 1

Binary image: A digital image that has only two possible values (0 and 1) for each pixel.
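
As a rough illustration, the 8 x 8 grid of bits shown above can be stored as rows of 0s and 1s and drawn as text. The names ROWS and render are made up for this sketch, and which bit value counts as "dark" is an assumption; the slide does not say.

```python
# The 8 x 8 binary image from the slide, one string per row.
ROWS = [
    "11000011",
    "10111101",
    "01011010",
    "01111110",
    "01011010",
    "01100110",
    "10111101",
    "11000011",
]


def render(rows, on="#", off="."):
    """Draw each 1 bit as `on` and each 0 bit as `off` (assumed mapping)."""
    return "\n".join(
        "".join(on if bit == "1" else off for bit in row) for row in rows
    )


print(render(ROWS))
print("total bits:", sum(len(row) for row in ROWS))
```

Each pixel costs exactly one bit here, so the whole image fits in 64 bits (8 bytes).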

What if we use 2 bits to represent a pixel?

We would be able to represent 2^2, or 4, different shades of gray.

More bits → more shades of gray
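
The relationship between bits and shades is just a power of two: n bits give 2^n distinct values. A minimal sketch follows; the function names are hypothetical, and spreading the levels evenly over a 0 to 255 display range is an assumption made only for illustration.

```python
def shades(bits_per_pixel):
    """Number of distinct values representable with the given number of bits."""
    return 2 ** bits_per_pixel


def gray_levels(bits_per_pixel):
    """Evenly spaced gray values on an assumed 0..255 display scale."""
    count = shades(bits_per_pixel)
    if count == 1:
        return [0]
    return [round(i * 255 / (count - 1)) for i in range(count)]


for bits in (1, 2, 4, 8):
    print(bits, "bit(s) ->", shades(bits), "shades")

print(gray_levels(2))
```

With 2 bits per pixel the four levels are black, dark gray, light gray, and white.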

https://goo.gl/tGJ3h3

Grayscale image: A grayscale (or graylevel) image is one in which the only colors are shades of gray.

• HOW TO REPRESENT COLORS? → RGB Images
• RGB stands for Red Green Blue
o We generally use 8 bits for each color (red/green/blue)
o i.e. total 8 x 3 = 24 bits to represent the color of a pixel.

######## ######## ########
  RED     GREEN     BLUE

• RGB stands for Red Green Blue
o With information giving an amount of red, an amount of green, and an
amount of blue, you can tell a computer how to colorize pixels
o None of the colors (all three values at 0) yields a black pixel
o All of the colors at full strength (all three values at 255) yields a white pixel
o In between these two options is where we get all sorts of colors

11111111 00000000 00000000 → red
00000000 00000000 11111111 → blue
00000000 11111111 00000000 → green

00000000 00000000 00000000 → black
11111111 11111111 11111111 → white

11111111 11111111 00000000 → yellow (red + green)
11111111 00000000 11111111 → magenta (red + blue)
00000000 11111111 11111111 → cyan (green + blue)

(We can get many other color variations by mixing the primary colors in different quantities.)
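
To make the 24-bit patterns concrete, the sketch below splits a string of 24 binary digits into its red, green, and blue bytes. The helper names parse_rgb and to_hex are hypothetical, and the hexadecimal form is shown only as a common shorthand, not something the slides use.

```python
def parse_rgb(bits):
    """Split 24 binary digits (spaces allowed) into an (r, g, b) tuple."""
    digits = bits.replace(" ", "")
    if len(digits) != 24 or set(digits) - {"0", "1"}:
        raise ValueError("expected exactly 24 binary digits")
    # Bytes are read in the order red, green, blue, 8 bits each.
    return tuple(int(digits[i:i + 8], 2) for i in range(0, 24, 8))


def to_hex(rgb):
    """Format an (r, g, b) tuple as a #RRGGBB string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


for pattern in (
    "11111111 00000000 00000000",
    "11111111 11111111 00000000",
    "11111111 11111111 11111111",
):
    rgb = parse_rgb(pattern)
    print(pattern, "->", rgb, to_hex(rgb))
```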

▪ Two factors that affect the quality of an image:
1. Bit depth: The number of bits available for each pixel in an image.
2. Resolution:
➢ Resolution refers to the number of pixels in an image. Resolution is
sometimes identified by the width and height of the image as well as the total
number of pixels in the image.

➢ For example, an image that is 2048 pixels wide and 1536 pixels high (2048 x
1536) contains 2048 x 1536 = 3,145,728 pixels (about 3.1 megapixels). You could call it
a 2048 x 1536 image or a 3.1-megapixel image.
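
Bit depth and resolution together determine how much raw storage an image needs. The following sketch works through the 2048 x 1536 example; the function names are hypothetical, and the 24-bit depth is borrowed from the RGB slides rather than stated for this particular image.

```python
def pixel_count(width, height):
    """Total number of pixels in an image."""
    return width * height


def uncompressed_bytes(width, height, bit_depth):
    """Raw size in bytes, rounded up to a whole byte."""
    total_bits = width * height * bit_depth
    return (total_bits + 7) // 8


pixels = pixel_count(2048, 1536)
print(pixels, "pixels =", round(pixels / 1_000_000, 1), "megapixels")
print(uncompressed_bytes(2048, 1536, 24), "bytes at 24 bits per pixel")
```

At 24 bits per pixel the uncompressed image takes 9,437,184 bytes, which is exactly 9 MiB.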

If we keep the image size the same and increase the
resolution, the image gets sharper and more detailed.
(The opposite happens if we decrease the resolution.)

[Figure: original image (400x262)]
[Figure: half-size image (200x131)]
[Figure: selecting resolution]

Q1) Using 6 bits, we can represent _____ different things at most.
a) 6
b) 64
c) 12
d) 32
e) 128

Q2) A _____ is a standard way of storing binary data in a computer.
a) metadata
b) sample rate
c) file format
d) pixel

Q3) A _____ image is one in which the only colors are shades of gray.
a) grayscale
b) binary
c) RGB
d) selfie

Q4) Which factors affect the quality of an image? (Select all that apply)
a) ASCII code
b) Bit depth
c) Sampling rate
d) Resolution

Source: https://www.youtube.com/watch?v=fGASncJR_kg

▪ A video is just a series of images shown quickly in succession to
create the illusion of motion.
▪ Common video file formats: MP4, FLV, AVI, etc.

▪ Three terms related to video:


1. Frame: Each still image in a video is called a frame.
2. Frame rate: The number of frames that are projected or displayed per
second is called frame rate.
3. Aspect Ratio: The Aspect Ratio is the ratio of a frame’s width to its height.
It is notated as width:height
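
These three terms are enough to estimate how large raw video is. The sketch below reduces a frame size to its simplest width:height ratio and multiplies out the uncompressed size. The function names, the 1920 x 1080 frame size, and the 30 frames per second are all assumptions chosen for illustration.

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce width:height to lowest terms, e.g. 1920 x 1080 -> '16:9'."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def raw_video_bytes(width, height, bit_depth, frame_rate, seconds):
    """Uncompressed size: bytes per frame x frames per second x duration."""
    bytes_per_frame = (width * height * bit_depth + 7) // 8
    return bytes_per_frame * frame_rate * seconds


print(aspect_ratio(1920, 1080))
print(raw_video_bytes(1920, 1080, 24, 30, 1), "bytes for one second")
```

One second of such video is roughly 187 million bytes before compression, which is why video is almost always stored compressed.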

▪ Data compression: Reducing the amount of space needed to store a
piece of data
▪ Why compress files?
• Saving storage space
• Fast data transfer

▪ Compression ratio: The size of the compressed data divided by the size
of the uncompressed data
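
The definition translates directly into a division. A small sketch, with hypothetical function names; the sizes 317 and 352 are the character counts used in the keyword-encoding example later in these slides.

```python
def compression_ratio(compressed_size, uncompressed_size):
    """Compressed size divided by uncompressed size (smaller is better)."""
    if uncompressed_size <= 0:
        raise ValueError("uncompressed size must be positive")
    return compressed_size / uncompressed_size


def space_saved(compressed_size, uncompressed_size):
    """Fraction of the original size that compression removed."""
    return 1 - compression_ratio(compressed_size, uncompressed_size)


ratio = compression_ratio(317, 352)
print(f"ratio = {ratio:.2f}, saved = {space_saved(317, 352):.0%}")
```

A ratio close to 1 means little was gained; a ratio close to 0 means the data shrank a lot.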

▪ Bandwidth: The number of bits or bytes that can be transmitted from
one place to another in a fixed amount of time

▪ Lossless compression: A data compression technique in which there is
no loss of information

▪ Lossy compression: A data compression technique in which there is
loss of information

▪ Three types of text compression:
1. Keyword encoding
2. Run-length encoding
3. Huffman encoding

Keyword encoding
▪ In this text compression technique, we replace a frequently used word
with a single character
▪ For example, suppose we used the following chart to encode a few
words:

Keyword encoding
Original paragraph:
The human body is composed of many independent systems, such
as the circulatory system, the respiratory system, and the
reproductive system. Not only must all systems work independently,
but they must interact and cooperate as well. Overall health is a
function of the well-being of separate systems, as well as how these
separate systems work in concert.

Encoded paragraph:
The human body is composed of many independent systems, such ^
~ circulatory system, ~ respiratory system, + ~ reproductive system.
Not only & all systems work independently, but they & interact +
cooperate ^ %. Overall health is a function of ~ %-being of separate
systems, ^ % ^ how # separate systems work in concert.

Original paragraph → 352 characters
Encoded paragraph → 317 characters
Compression ratio → 317/352, or about 0.9
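
Keyword encoding can be sketched as a whole-word search and replace. The word-to-symbol chart below is inferred from the encoded paragraph above, not copied from the slide's chart, and the function names are hypothetical. Matching is case-sensitive, so "The" is left alone.

```python
import re

# Chart inferred from the encoded paragraph (an assumption, see lead-in).
CHART = {"as": "^", "the": "~", "and": "+", "must": "&", "well": "%", "these": "#"}


def keyword_encode(text, chart=CHART):
    """Replace each whole-word occurrence of a chart word with its symbol."""
    for symbol in chart.values():
        if symbol in text:
            raise ValueError(f"symbol {symbol!r} already appears in the text")
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, chart)) + r")\b")
    return pattern.sub(lambda match: chart[match.group(1)], text)


def keyword_decode(text, chart=CHART):
    """Reverse the encoding by expanding every symbol back into its word."""
    for word, symbol in chart.items():
        text = text.replace(symbol, word)
    return text


sentence = "Not only must all systems work independently, but they must interact and cooperate as well."
encoded = keyword_encode(sentence)
print(encoded)
print(len(sentence), "->", len(encoded), "characters")
```

The encoder refuses any text that already contains one of the symbols, because decoding would then be ambiguous.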

Keyword encoding
▪ Limitations:
1. The character used to replace a word cannot appear in the original passage (e.g., $)
2. The words “The” and “the” cannot be encoded by the same character because they
contain different letters
3. We would not gain anything by encoding one-letter words such as “I” or “a”
4. The savings per word are small

▪ Advantage:
The encoded patterns are generally complete words rather than suffixes. For example, if the
word dig = ~, then digging = ~ing. This lets the encoded pattern appear more often than
the whole word alone would.
Run-length encoding
▪ In this text compression technique, we replace a long series of a repeated
character with a count of the repetition.
• It is also sometimes called recurrence coding.

▪ A sequence of repeated characters is replaced by a flag character, followed by
the repeated character, followed by a single digit that indicates how many times
the character is repeated.
• We will use the ‘*’ character as our flag
• A run of 3 or fewer characters is not encoded, because the encoded form is itself 3 characters long

▪ This type of repetition doesn’t generally take place in English text, but often
occurs in large data streams, such as DNA sequences.

Run-length encoding
▪ Examples
1: AAAAAAA = *A7
2: nnnnnxxxxxxxxxccchhhhhh = *n5*x9ccc*h6
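
The scheme described above (flag, character, single-digit count) can be sketched as follows. The function names are hypothetical. Two details are assumptions not covered by the slides: a run longer than 9 is split into several pieces so the count stays one digit, and text that already contains the flag character is rejected.

```python
FLAG = "*"


def rle_encode(text, flag=FLAG):
    """Encode runs of 4 to 9 identical characters as flag + char + digit."""
    if flag in text:
        raise ValueError("text must not contain the flag character")
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        run = 1
        # Extend the run, but cap it at 9 so the count fits in one digit.
        while i + run < len(text) and text[i + run] == ch and run < 9:
            run += 1
        out.append(f"{flag}{ch}{run}" if run > 3 else ch * run)
        i += run
    return "".join(out)


def rle_decode(encoded, flag=FLAG):
    """Expand every flag + char + digit triple back into a run."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == flag:
            out.append(encoded[i + 1] * int(encoded[i + 2]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


sample = "n" * 5 + "x" * 9 + "ccc" + "h" * 6
print(rle_encode("A" * 7))
print(rle_encode(sample), len(rle_encode(sample)), "/", len(sample))
```

For the second example the encoded form is 12 characters against 23 in the original, a compression ratio of about 0.52.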

Huffman encoding
▪ In this text compression technique, we use a variable-length binary string
to represent a character so that frequently used characters have short
codes
▪ For example, suppose we use the following Huffman encoding to
represent a few characters:
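
The slide's chart is not reproduced in this text. As a stand-in, the sketch below builds a Huffman code from character counts using the standard greedy construction (repeatedly merge the two least frequent groups). The function names and the sample string are assumptions for illustration, and the exact codes produced need not match the slide's chart.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a {character: bit string} table built from character counts."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreaker, {char: code built so far}).
    heap = [(weight, i, {ch: ""}) for i, (ch, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def huffman_encode(text, codes):
    return "".join(codes[ch] for ch in text)


def huffman_decode(bits, codes):
    """Read bits left to right; no code is a prefix of another, so this is unambiguous."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a complete code")
    return "".join(out)


text = "aaaaaaaabbbc"
codes = huffman_codes(text)
bits = huffman_encode(text, codes)
print(codes, len(bits), "bits instead of", 8 * len(text))
```

The most frequent character gets the shortest code, which is where the savings come from.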

▪ Any kind of data can be compressed. There are two main categories of
compression:
1. lossy
2. lossless

▪ Lossless compression
• Lossless compression doesn’t reduce the quality of the file at all.
• Since no data is lost, lossless compression allows a file to be retrieved exactly
as it was when originally created.

▪ What do the bytes of this image look like? Well, since the top of the flag
is just a solid color, we’re going to have a whole lot of bytes that are
exactly the same. Wouldn’t it be nice if we could just say “here come 100
red pixels” rather than listing out each pixel individually?
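
The "here come 100 red pixels" idea is run-length encoding applied to pixels instead of characters. A minimal sketch, assuming each pixel is an (r, g, b) tuple and using hypothetical function names; real image formats use more elaborate schemes.

```python
from itertools import groupby

RED = (255, 0, 0)
WHITE = (255, 255, 255)


def encode_runs(pixels):
    """Collapse consecutive identical pixels into (count, color) pairs."""
    return [(len(list(group)), color) for color, group in groupby(pixels)]


def decode_runs(runs):
    """Expand (count, color) pairs back into the full pixel list."""
    return [color for count, color in runs for _ in range(count)]


row = [RED] * 100 + [WHITE] * 3
runs = encode_runs(row)
print(runs)
print(len(row), "pixels stored as", len(runs), "runs")
```

Decoding the runs gives back exactly the original row, which is what makes the technique lossless.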

Lossless compression
▪ There is a lot of repeated blue in the first image
• Using the same 24 bits to represent each pixel!
▪ The second image is compressed and not what a user would see

▪ Lossy compression
• Lossy compression removes some of a file’s original data in order to reduce
the file size.
• This might mean reducing the number of colors in an image or reducing the
number of samples in a sound file.
• This can result in a small loss of quality of an image or sound file.
• The space savings of lossy compression are higher than they are with lossless
compression.
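
One simple way to reduce the number of colors is to keep only the top few bits of each 8-bit color value. The sketch below is an illustration of that idea, not how any particular file format works; the function names are hypothetical.

```python
def quantize(channel, bits):
    """Keep only the top `bits` bits of an 8-bit channel value (0..255)."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    if not 0 <= channel <= 255:
        raise ValueError("channel must be between 0 and 255")
    shift = 8 - bits
    return (channel >> shift) << shift


def quantize_pixel(pixel, bits):
    """Apply the same reduction to each of the red, green, and blue values."""
    return tuple(quantize(value, bits) for value in pixel)


print(quantize_pixel((200, 17, 255), 4))
print(quantize(200, 4), quantize(207, 4))
```

Because 200 and 207 both become 192, the original value cannot be recovered afterward. That lost detail is the "lossy" part.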

Image File Formats:
▪ JPEG (Joint Photographic Experts Group)
• Supports 24-bit color
• Uses lossy compression
▪ PNG (Portable Network Graphics)
• High quality graphics
• Supports 24-bit color
• Uses lossless compression
▪ BMP (Bitmap)
• Originally used by Windows
• Not super common these days

The night before exam

Image File Formats:
• GIF (Graphics Interchange Format)
o Low quality images
o Only supports up to 8-bit color
o Often used for memes
o Can be animated
o Like a video file with only a few images

• All these formats ultimately hold a limited amount of information
o They just store the pixels and colors captured when the image was taken

Information layer → completed ^_^

1. Computer Science Illuminated – Nell Dale, John Lewis
• Chapter 3
2. https://www.bbc.co.uk/bitesize/subjects/zvc9q6f
3. https://www.bbc.co.uk/bitesize/guides/zjfgjxs/revision/1
4. https://cs50.harvard.edu/technology/2017/
