
Color for VFX

Theory & Practice in a Nutshell

Chris Healer
CEO / CTO / VFX Supervisor
[email protected]

Version 0.2 05/01/14


Main Concepts:

● Linearization (normalization): Converting recorded image data into gamma-corrected material for display or processing.
● Legalization (scaling): Scaling incoming or outgoing images to increase (stored) dynamic range, fit broadcast standards, or to work with certain LUTs.
● LUTs (applying and inverting them): Color transformations stored as a file or equation. These serve both technical and creative purposes.
● Storage (bits, ranges, disk space): Where we put material between stages of post production. This creates many of the obstacles we have to overcome.

These 4 concepts are interrelated; they create the ecosystem of color transforms that we use, and they combine to build the common visual effects workflow.
Foundation
Between Cup and Lip
What is color?

Most simply put, what we call color is combinations of different frequencies of light (energy) that our eyes are sensitive to.

The fundamental particle that transmits light is a Photon. Photons are both waves and particles, and different combinations of frequencies (waves) and intensities (quantities of particles) of photons hitting our eyes produce the effect we think of as color.
How do we see it?

Light enters the eye, and the rods and cones on our retinas encode incoming spectral signatures into something like luminance (rods) and chrominance (cones).

We are more apt to think of color in terms of hue, saturation, and luminance.
Cameras (generally) sample in RGB space
The shutter opens for a moment
Light enters the lens
Depending on the angle of a particular beam of light, it hits a pixel
The intensity of the frequency of the light we care about is recorded
And we view what’s recorded on an output device

The device receives RGB information and converts it back to real light.

The device is probably anticipating that the image is gamma-shifted (sRGB, Rec709).

The device itself has limitations and idiosyncrasies that it may try to compensate for.
Here’s the basic path:
It’s a long path to get there!
The color that enters your eye from a monitor was once a group of photons, but
to get to your eye, at the minimum it had to:

1. Pass through one or more camera lenses and/or filter elements.
2. Be captured by a CMOS sensor (the photosensitive camera element).
3. Be stored on a disk by the camera.
4. Be read from disk and processed or transcoded to a playback format.
5. Be played from a playback system (media player or editorial system).
6. Pass through any OS color calibration settings.
7. Pass through any monitor colorimetry LUTs or manual settings.
8. Pass through any ambient light pollution in the room.
9. And finally pass through your eye.
Or rephrased...

The physical world works one way,

Our eyes work another way,

And digital imaging works yet another way.


Linearization
aka: Log to Lin, Normalization, Gamma Correction
Our perception of light is logarithmic

Here’s a thought experiment (quoting from Art Adams):

Imagine one lit candle in a dark room. If we light fifteen additional candles, one at a time, the amount of total light coming from the candles will obviously increase.

By the time the sixteenth candle is lit you might assume that the room would appear to our eyes to be sixteen times as bright as it did with one candle lit. But because we see in logarithmic values, the room actually only seems four times as bright as it did when the first candle was lit.
We only care about apparent changes
To affect the same apparent change, we need twice as much light:
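As a rough illustration of the candle example, here is a minimal Python sketch, assuming perception is approximately base-2 logarithmic (photographic stops); the exact base is a simplification.

```python
import math

# Perceived brightness in "steps" (photographic stops), assuming a roughly
# base-2 logarithmic response: doubling the light adds one apparent step.
def apparent_steps(candles):
    return math.log2(candles)

for candles in (1, 2, 4, 8, 16):
    print(candles, "candle(s) ->", apparent_steps(candles), "stops above one candle")

# 16 candles come out only ~4 steps brighter than 1 candle, not 16 times.
```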
And this is where Log and Lin come in...
You’ve probably heard these terms before, so let’s look at them for moment.

We’re used to linear values… we use them to count: 1, 2, 3, 4, 5…

Logarithmic values represent counting exponentially: b^1, b^2, b^3, b^4, b^5…

For this to be meaningful, we need a base (b) to count powers of.

The two graphs at the right show a log and a linear plot of the same data. The plot on the left uses log base 10.
Camera sensors count photons

To extend our analogy, cameras count the number of candles.

But we don’t care about the number of candles; we care more about the apparent differences between the amount of light coming from the candles.

We need to linearize the image data to get those values, which is done by applying a gamma curve or LUT of the right shape to the data.
What’s the “Right Shape”?

The shape is determined by one of:

● the camera manufacturer (Arri, Sony, Red)
● the encoding/scanning standard (Cineon)
● the display standard (sRGB, Rec709)

Depending on the software, this is sometimes called the “Color Space”.
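As one concrete example of such a curve, here is a minimal sketch of the standard sRGB decoding formula (assuming numpy is available); camera curves like Alexa LogC or SLog work the same way but use their manufacturers' published equations, which are not reproduced here.

```python
import numpy as np

def srgb_to_linear(v):
    # Standard sRGB decoding: a small linear toe, then a 2.4 power segment.
    v = np.asarray(v, dtype=np.float64)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

# A display value of 0.5 corresponds to roughly 0.21 in linear light.
print(srgb_to_linear([0.0, 0.5, 1.0]))
```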
Legal / Full Range
Legal Values vs. Full Range

Depending on your display device or recording device, it may be working with what are called “legal values”, as opposed to “full range” or “extended” values, which means that the following remapping takes place:

          Full Range     Legal Range
8-bit     0-255          16-235
10-bit    0-1023         64-940
Float     0.0-1.0        0.062-0.921


Why?
“Legal Values” are commonly associated with broadcast video, which may use
the footroom and headroom for synchronization or other digital codes.

But cameras, depending on the settings, may use the additional headroom to
gain additional stops of dynamic range. The camera will scale but not crop the
data before recording, and some highlight information can be gained that way.

Frustratingly, we don’t know if footage is scaled until we start to work with it.
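A minimal sketch of the legal/full remapping in normalized 0-1 terms, assuming numpy; the 16/255 and 235/255 constants are the same ones the Nuke setup later in this deck uses for scaling.

```python
import numpy as np

LEGAL_BLACK = 16.0 / 255.0   # ~0.062
LEGAL_WHITE = 235.0 / 255.0  # ~0.921

def legal_to_full(v):
    # Stretch legal-range values (0.062-0.921) out to full range (0.0-1.0).
    return (np.asarray(v, dtype=np.float64) - LEGAL_BLACK) / (LEGAL_WHITE - LEGAL_BLACK)

def full_to_legal(v):
    # The inverse: compress full-range values back into the legal range.
    return np.asarray(v, dtype=np.float64) * (LEGAL_WHITE - LEGAL_BLACK) + LEGAL_BLACK

print(legal_to_full([LEGAL_BLACK, LEGAL_WHITE]))  # -> [0.0, 1.0]
```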
LUTs
What is a LUT?

A LUT is a LookUp Table, which is a mapping of data from one value to another. Incoming values are mapped to outgoing values.

The image on the left is the identity LUT, which has no effect. The image on the right shows input (bottom) mapped to output (right).
How we use LUTs

A lookup table is a general purpose math concept. For our purposes, the data
being mapped is each separate channel of each pixel of each frame.

LUTs are represented as sampled curves (like a string of keyframes). The keyframe values represent the “table”, and values between keyframes are interpolated.

Often an equation that represents a curve (like a CDL) can be used to the same effect, but without actually using a “table”. This can still be called a LUT.
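A minimal sketch of that idea in Python (assuming numpy): the two arrays play the role of the keyframes, and values between them are linearly interpolated.

```python
import numpy as np

# The "table": input keyframes and the output values they map to.
table_in  = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
table_out = np.array([0.0, 0.10, 0.30, 0.65, 1.0])  # an arbitrary example curve

def apply_lut_1d(channel, table_in, table_out):
    # Look up each value; anything between keyframes is interpolated.
    return np.interp(channel, table_in, table_out)

pixels = np.array([0.1, 0.4, 0.9])
print(apply_lut_1d(pixels, table_in, table_out))
```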
Types of LUTs

Generally speaking, there are only two kinds of LUTs: 1D LUTs and 3D LUTs.

A Luminance Curve is the special case of a 1D LUT with all matching curves.

Any color operation in any software can be represented with one of these three
transformations.

The terminology can be a little confusing, so let’s clear that up...


Luminance Curves
A luminance curve maps all channels of an image against one curve.
1D LUTs

A 1D LUT maps each channel individually to its corresponding curve. A 1D LUT for an RGB image has three curves in it (because it has 3 channels), and the channels are kept separate.
3D LUTs

A 3D LUT is more complicated: it maps each channel against the other channels, and can’t really be thought of as a curve so much as a mapping of one 3D space to another. 3D LUTs allow for crosstalk between the channels and much more control, including saturation and channel swapping.
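A minimal sketch of applying a 3D LUT, assuming numpy and scipy are available: the LUT is an N x N x N grid of output RGB triples, and values between grid points are found by trilinear interpolation. The cube here is just an identity mapping; a real .cube file would supply the measured values.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

N = 17                                 # a common cube size
grid = np.linspace(0.0, 1.0, N)
r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
cube = np.stack([r, g, b], axis=-1)    # identity LUT: output equals input

# Trilinear interpolation between the cube's grid points.
lut3d = RegularGridInterpolator((grid, grid, grid), cube)

rgb_pixels = np.array([[0.2, 0.5, 0.8],
                       [1.0, 0.0, 0.3]])
print(lut3d(rgb_pixels))               # the identity LUT returns the inputs unchanged
```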
Linearization is done with a luminance curve (LUT)
Storage
In an ideal world...

Disk space would be infinite

Bit depth would not have limits

Or better:

We would never need to save intermediate stages to disk.
But in this world...

Disk space is (usually) limited

We have to choose a bit depth to work in and to store material in

and

There are many, many intermediate stages that require writing to disk.
What gives?

We have to choose a bit depth to store material in.

Commonly, material is stored in 8-bit or 10-bit, even though it’s processed in floating point.

10-bit log data on well-exposed footage is generally sufficient for all purposes.
Disks are big and fast now, why not store everything in floating point Linear Space?

Knowing that we only care about the apparent differences between color values, we don’t gain anything except file size by storing everything as linear floating point.

By applying a logarithmic curve to the image, we are dedicating color information to only the parts we care about, as long as we know the shape of that curve.
Building the Workflow
The common workflow:
To Linearize:

1. We need to read the footage as RAW.
2. We may need to apply scaling to it, possibly from Legal to Extended.
3. Then we may apply a CDL, which is a 1D LUT.
4. Then we apply the Camera LUT, which is a luminance curve.

The order is very important!


The display branch

You’ll need a Display LUT for your display device, probably sRGB (computer
monitor) or Rec709 (broadcast monitor).

The filmmakers or DI facility may have a ‘look’ for a particular shot or sequence,
and you will want to view output with the look applied.

This is most likely a 3D LUT, and will appear as a .cube or .3dl or .cms file.

The Look LUT should be applied before the Display LUT, assuming the
Display LUT is not concatenated into it.
What is LUT concatenation?
Color transforms compound on top of each other, which is pretty apparent as
we create chains of them to view footage. What may not be apparent is that
chains of transforms can be concatenated, or combined, into one transform.

It’s virtually impossible to deconstruct concatenated LUTs into component transforms. Usually the components are identified in the LUT’s filename.
The storage branch

If possible, we want to invert (or unwrap) the transformations from Linear Space
back to the original color space if we intend to deliver the footage to a colorist
or client.

To do this we need to apply inverse transforms in the reverse order that they
were applied.

Starting from Linear Space working backwards, we will need an Inverse Camera LUT, an Inverse CDL, and to invert any Scaling.
What does it mean for a LUT to be inverted?

For 1D LUTs and Luminance Curves, the inverted curve will be mirrored over the diagonal line y=x.

Conceptually, you’re mapping input to output back to input.

This can also be achieved by swapping the x and y values of keyframes, or by solving a curve’s equation for input instead of output.

We may also sometimes see this called a backward (vs. forward) transform.
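A minimal sketch of inverting a sampled curve by swapping its keyframes (assuming numpy); this only works because the example curve is monotonically increasing.

```python
import numpy as np

table_in  = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
table_out = np.array([0.0, 0.10, 0.30, 0.65, 1.0])

def forward(x):
    return np.interp(x, table_in, table_out)

def inverse(y):
    # Swap the roles of the input and output keyframes.
    return np.interp(y, table_out, table_in)

x = np.array([0.1, 0.4, 0.9])
print(inverse(forward(x)))   # recovers the original values
```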
Are all LUTs reversible?

In short, no. 1D LUTs and Luminance Curves are generally invertible. But
very few 3D LUTs are mathematically invertible, so applications don’t include
an option to invert a 3D LUT.

This means that there are certain transformations which are destructive to the color information in an image; once that transformation has been applied, the color information can’t be retrieved.

For example, imagine using a 3D LUT to fully desaturate an image. No transform can be created to bring that color information back!

A non-reversible 1D LUT (solarisation)
Review:
Cameras and Recording
(more about linearization)
Every camera is different
Camera vendors (Arri, Red, Sony) want to present the best color they can, so
they apply some minimal processing to incoming data before saving it to disk in
order to conform it to a luminance curve that they know and like.

The luminance curve they choose is based on many factors, so suffice it to say that their curve is the product of lots of testing in different lighting and camera conditions. It is the camera manufacturer’s “special sauce”.

This luminance curve is called the Camera LUT, and is used to linearize the camera data.
The sensitivities of some cameras (and film)
Making sure the image looks right
While on set, the Director of Photography needs to make sure that the image
looks right photographically, digitally, and on the script/story level.

Producers and Directors feel more comfortable if they see a good looking
image while on set, so a color correction is often applied to the preview image.

This color correction is recorded in a CDL (Color Decision List). The correction may be purely creative (make skin tones nicer), or may be something technical to compensate for color temperature or poor exposure.

The CDL is applied to the RAW data before the Camera LUT is applied.
Leaving the shoot

We want to leave the shoot with everything we need to linearize the footage.

Namely, CDL data, knowledge of which camera was used, and hopefully
whether the camera was recording legal or extended range.
A frequent ‘Gotcha’...

The camera may be sending a Legal (not Extended) signal to the DIT.

The DIT has the option to scale in the CDL, or to scale the signal (in his
software) before creating the CDL.

This is particularly confusing for us in VFX, because the CDL is expecting an extended signal, but getting a scaled one.

The opposite case is also true, where the CDL is doing video scaling (by
modifying slope and offset), and the video is already in Extended range.
More on Storage
Video data is really big!
Like, really big.

To calculate uncompressed video size:

Width x Height x Bytes-per-Channel x Channels = Size in bytes

1920 x 1080 x 4 x 3 = 24,883,200 bytes = 23.7 MB / frame
(for a single frame of HD in RGB in uncompressed 32-bit float, i.e. 4 bytes per channel)

The disk space required to work in HD uncompressed floating point space is about 600 MB / second, or 33 GB / minute.
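The same arithmetic as a quick Python check; the frame rate is an assumption (24 fps) for the per-second and per-minute figures.

```python
width, height = 1920, 1080
bytes_per_channel = 4   # 32-bit float
channels = 3            # RGB
fps = 24                # assumed frame rate

frame_bytes = width * height * bytes_per_channel * channels
print(frame_bytes)                                      # 24,883,200 bytes
print(frame_bytes / 2**20, "MB per frame")              # ~23.7 MB
print(frame_bytes * fps / 2**20, "MB per second")       # ~570 MB (roughly the 600 MB figure above)
print(frame_bytes * fps * 60 / 2**30, "GB per minute")  # ~33 GB
```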
So, how do we make it smaller?
Basically there are two methods: Compression and adjusted Bit-Depth.

Compression is outside the scope of this presentation…

But Bit-Depth is such an important aspect of compositing and color (not to mention digital imaging in general) that we have to visit it for a moment.

One can argue that smaller bit depths are a form of compression, which is true, but we want to focus on processing space for the moment.
Bit depth and number formats
Bit depth implies the resolution, or number of unique values, that can be
represented by a variable.

By using fewer bits, we have less to read and write from disk, but at the
expense of having less color information, or fewer tones available.

Depth                Bytes   Type      Range / Resolution / Precision / Tones Available
8-bit                1       Integer   All integer values between 0-255
10-bit               2       Integer   All integer values between 0-1023 (note that some formats pack this as 1.25 bytes per pixel)
16-bit               2       Integer   All integer values between 0-65535
16-bit (half float)  2       Float     Certain floats between +/- 65504, with the most fractional resolution dedicated to 0-1
32-bit (float)       4       Float     Certain floats between +/- 3.4 × 10^38, with the most fractional resolution dedicated to 0-1
What those ranges mean
When we say that an 8-bit integer value goes from 0-255, we really mean that there are 256 shades of grey available between black and white. We map these to a range of 0.0-1.0 to make things uniform and to work in floating point.

Remember, this is happening for Red, Green, and Blue at the same time.
Floating point values
Why do we call it floating point?

Because the decimal place can “float” within the significant digits specified! The bits stored in a floating point value represent (in a mathematical way) both the digits to return and the position of the decimal place.

Interestingly, the effect of this is that there is as much resolution dedicated to the values between -1 and +1 as there is to values less than -1 or greater than +1.

1.23456789         Ok
-12.3456789        Ok
123456.789         Ok
.000123456789      Ok
.0123456789012     Too many digits!
123456789.0123     Too many digits!
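A quick way to see this limited precision, sketched with numpy's 16-bit half float (the same half-float type from the table above); larger values keep fewer digits after the decimal point.

```python
import numpy as np

# Half float keeps only ~3-4 significant decimal digits, so small values
# round gently while large values lose their fractional part entirely.
for value in (0.123456789, 1.23456789, 123.456789, 12345.6789):
    print(value, "->", float(np.float16(value)))
```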


Linear Space
A few advantages
Linear Space is a common ground for mixing media from different sources
Media may be coming from:

● Source Footage
● Stock Footage
● CG-rendered Images
● A Still Camera
● HDRI
● Scanned Film
● etc.

Everything needs to be speaking the same color language, especially because they are coming from such dramatically different sources!
Linear Space is a neutral ground for display devices

You’ll want to work in linear space and apply a display LUT that corresponds to
the display device you’re working on.

Regular computer monitors tend to have an sRGB response curve.

Broadcast monitors tend to be Rec709.

But a projector may want a custom display LUT to compensate for an off-white
screen or the color temperature of a bulb.

We want to work agnostic of the display device.


Linear Space more closely matches the way we think of color.

We may think of something as 50% brighter than something else. If you’re not working in linear space, something appearing 50% brighter will have some other value.

Anti-aliasing and motion blur falloffs do what you expect.

Usually motion blur and anti-aliasing are done as a linear space operation. If you’re comping in non-linear space, they won’t comp the way you expect them to.
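A minimal sketch of why this matters, using a simple gamma-2.2 encoding as a stand-in for a non-linear working space: averaging two pixels (which is what anti-aliasing and motion blur effectively do) gives a different amount of light depending on the space it's done in.

```python
black, white = 0.0, 1.0
gamma = 2.2

# A 50/50 mix of black and white in linear light transmits half the light.
mix_linear = (black + white) / 2.0

# The same mix done on gamma-encoded values, then decoded back to linear.
mix_encoded = ((black ** (1 / gamma) + white ** (1 / gamma)) / 2.0) ** gamma

print(mix_linear, mix_encoded)   # 0.5 vs ~0.22: the non-linear mix looks too dark
```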
The Nuke Workflow
Here’s where the buttons are!
The most common case, which you’ve seen before

This is probably how most Nuke users expect to read and write their files.

This is basically an abbreviated version of the full form, which does not allow for CDLs to be used.

This setup expects any LUTs you use to anticipate linear space as input.
A better setup for Reading and Writing footage
Wrap, then Unwrap the transforms.
For unwrapping:
● Colorspace nodes need the In and Out inverted
● Grade and OCIOCDLTransform nodes have an Inverse/Reverse setting
● For scaling, use 16/255 and 235/255 (and use reverse if needed)
● The order of the transforms has to be reversed when unwrapping (see the sketch below)!
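A toy Python sketch of that wrap/unwrap idea (all of the transforms here are hypothetical stand-ins for Scale, CDL, and Camera LUT, not Nuke calls): applying the inverses in reverse order returns the original value; applying them in the same order would not.

```python
# Stand-in transforms: scale (legal to full), a CDL, and a camera LUT.
def scale(x):           return (x - 16/255) / ((235 - 16)/255)
def scale_inv(x):       return x * ((235 - 16)/255) + 16/255
def cdl(x):             return x * 1.1 + 0.02            # example slope and offset
def cdl_inv(x):         return (x - 0.02) / 1.1
def camera_lut(x):      return x ** 2.2                   # stand-in linearization curve
def camera_lut_inv(x):  return x ** (1 / 2.2)

forward = [scale, cdl, camera_lut]                # wrap: footage -> linear
inverse = [camera_lut_inv, cdl_inv, scale_inv]    # unwrap: note the reversed order

value = 0.5
linear = value
for f in forward:
    linear = f(linear)

restored = linear
for f in inverse:
    restored = f(restored)

print(value, restored)   # the round trip returns the original value
```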
Same name, different direction

It may not be obvious that in Nuke, the colorspace chosen for a Read node and the one chosen for a Write node are opposites, which follows the convention of unwrapping transforms before writing to disk, even though in the Nuke interface they have the same name.

A Read node’s colorspace is a forward transform, which produces an image in linear space.

A Write node’s colorspace is a backward (inverse) transform, which expects linear images in and produces a non-linear result on disk.
I need my quicktime output to match my deliverable!
Generally, you want to read and write your footage with the same colorspace, so that you’re following a convention that does not interrupt the rest of a conform process.

Creating quicktimes is the one place where you probably don’t want to write to the same colorspace as your read.

Most editorial packages are viewed on a computer screen and/or broadcast monitor, so you will almost always want to choose sRGB or Rec709.

Be aware that the term Rec709 refers to the gamma curve; however, because it’s also related to broadcast monitors, it’s frequently associated with Legal vs Full Range compression, which may also be a setting in the codec.
How to know when something’s off with your color.

If your footage appears:

● Milky
● Contrasty
● Overexposed
● Blacks just never seem rich and dark

You’re probably not linearizing or scaling (legalizing) your footage properly. You may be using the wrong camera LUT to linearize, or you may be forgetting to scale legal to full range before linearizing.
Additional Details
(in no particular order)
Our eyes have evolved to be sensitive to only a very small portion of the electromagnetic spectrum, thanks to our ozone layer and other aspects of our atmosphere. Had we evolved on another planet, we would see other “colors”.

Notice that these diagrams are reversed from our usual way of thinking, because purple has a shorter wavelength (higher frequency) than red’s longer wavelength.
Why do we apply the CDL before the Camera LUT?

This is a good question, with a simple answer.

The CDL should go before the Camera LUT because we don’t know what’s
coming afterward. For instance, we may be loading a file for VFX purposes,
where we use the chain of transforms shown before.

But we may be doing color correction, or playing back through a hardware decoder, or doing a live stream.

Putting the CDL directly after the RAW data is the only way to ensure that it will
flow downstream regardless of what the application is, without being baked into
the RAW stream itself.
How does a CDL work?

Given the following:

Input = input (rgb)
s = slope (rgb)
o = offset (rgb)
p = power (rgb) (gamma)
sat = saturation (we’re ignoring saturation to simplify the equation)

A CDL’s standard equation is:

Out = (Input * s + o)^p

which is effectively a 1D LUT.
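A minimal sketch of that equation in Python (assuming numpy; the slope, offset, and power values are just examples, and saturation is left out as above):

```python
import numpy as np

def apply_cdl(rgb, slope, offset, power):
    rgb = np.asarray(rgb, dtype=np.float64)
    graded = rgb * slope + offset
    # Clamp negatives before the power function, since a fractional power
    # of a negative number is undefined.
    return np.maximum(graded, 0.0) ** power

# Grading an 18% grey pixel with example values.
print(apply_cdl([0.18, 0.18, 0.18], slope=1.2, offset=0.01, power=0.9))
```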
What is color space?
The term “Color Space” refers to the shape or space occupied by the gamut of the color model being used. However, the term is widely misused to refer to any type of color transformation, ranging from gamma/luminance curves (sRGB, Alexa LogC, SLog, etc.) to color models (RGB, HSV, CMYK, XYZ).

Nuke, for instance, misuses the term “Color Space” in the Colorspace node, because a conversion from Cineon to HSV (for instance) traverses both Color Model and Gamma Bias.
Round-off Errors
Round off error happens when the destination variable of a calculation can’t
hold a value accurately enough.

It’s similar to the difference between ⅓ (one third) and 33%, or 33.3%, or 33.3333%. One third can’t be represented exactly as a decimal value.

It’s easier to visualize in 8-bit space. For example, if we have a red value of 127 (~0.5 in float) and we darken it by 50%, we get a value of 63.5, which as an integer rounds to 64. If we then brighten it by 200% (to counteract the darken), we get a new value of 128. The round-off error (from 127 to 128) is a nuisance in this case, but if it’s compounded many times over, things can get banded, clipped, etc. It can get messy!
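The slide's 8-bit example, reproduced as a couple of lines of Python:

```python
value = 127                         # ~0.5 in float
darkened = round(value * 0.5)       # 63.5 rounds to 64
brightened = round(darkened * 2.0)  # 128, not the original 127
print(value, darkened, brightened)
```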
Comparing Camera LUTs
AlexaV2LogC vs AlexaV3LogC: V3 came out after V2, and reflects updates to the Alexa’s firmware.

Although the two curves seem very similar (basically on top of each other), if we look at the distance between them at higher values (.6-.9) we see that the two are drastically different, having a distance of nearly .25.

You may have to try both of these to see which one works, based on the firmware of the camera the footage was shot on.
Comparing display LUTs
A display LUT is used to present linear material to the screen. Unlike Camera LUTs, which are designed to contain high-dynamic-range values, display LUTs will usually map 0 to 0 and 1 to 1, with a gamma response curve in between.

Rec709 and sRGB are shown at the right. The main difference is the shape of the gamma, where sRGB is close to ɣ=2.2, while Rec709 is close to ɣ=1.95.

In 1996, sRGB was intended to be a standard for computer screens and for the internet, while Rec709 was established as a broadcast standard in 1990.
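For reference, a minimal sketch comparing the two standard transfer functions in the encoding direction (linear light in, code value out), assuming numpy; the ɣ=2.2 and ɣ=1.95 figures above are approximate power-curve fits to these.

```python
import numpy as np

def srgb_encode(L):
    # sRGB: linear toe below 0.0031308, then a 1/2.4 power segment.
    L = np.asarray(L, dtype=np.float64)
    return np.where(L <= 0.0031308, 12.92 * L, 1.055 * L ** (1 / 2.4) - 0.055)

def rec709_encode(L):
    # Rec709: linear toe below 0.018, then a 0.45 power segment.
    L = np.asarray(L, dtype=np.float64)
    return np.where(L < 0.018, 4.5 * L, 1.099 * L ** 0.45 - 0.099)

for L in (0.05, 0.18, 0.5, 0.9):
    print(L, float(srgb_encode(L)), float(rec709_encode(L)))
```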
“Color Space” Cheat Sheet
Name           Uses                                                                             Is a Color Model   Is a Luminance Curve
HSV/HSL        Saturation adjustments, hue rotations, color pickers                             Yes                No
RGB            Virtually all video processing in a computer or other hardware                   Yes                No
CMYK           Printing on paper or fabric                                                      Yes                No
YCrCb/YUV      Encoding/compression (Y = luma; Cr/Cb are chrominance components based on luma)  Yes                No
Alexa/Cineon   As a camera LUT                                                                  No (RGB)           Yes
sRGB, Rec709   As a display LUT                                                                 No (RGB)           Yes


Things that don’t work in RGB

Just as an interesting side note, RGB is an additive color model, which basically means that the final color you see is a sum of its components. It’s built from black (no light) up to a final color by addition.

This is different from a subtractive color model like CMYK, where the assumption is that we’re starting with white (a sheet of white paper, for example), and colors mask white down to a final color.

Printing on paper, dyeing fabrics, projecting film, etc. are examples of subtractive color.
Alexa LUT Generator (here)
Where Things Are Going
Technology that is on its way
ACES

The Academy Color Encoding System has been under development for many years, and is starting to appear in new versions of software like Nuke and Flame.

Its goal is to standardize all steps of the color transformation process, including cameras, storage, playback, monitors, etc.

This is accomplished by various new types of calibration and equipment certification, as well as by including more metadata in files and video streams and by moving to 16-bit linear half-float EXR as a standard (instead of DPX).
ACES
http://docs.themolecule.com/color
