
6.801/6.866: Machine Vision, Lecture 11

Professor Berthold Horn, Ryan Sander, Tadayuki Yoshitake


MIT Department of Electrical Engineering and Computer Science
Fall 2020

These lecture summaries are designed to be a review of the lecture. Though I do my best to include all main topics from the
lecture, the lectures themselves contain more elaborate explanations than these notes.

1 Lecture 11: Edge Detection, Subpixel Position, CORDIC, Line Detection (US 6,408,109)
In this lecture, we will introduce some discussion of how patents work and their stipulations, and we will make the discussion
concrete by looking at a patent case study on sub-pixel-accuracy edge detection. You can find the patent document on Stellar
under “Materials” → “US patent 6,408,109”.

1.1 Background on Patents


We will begin with some fun facts about industrial machine vision:
• Integrated circuit boards cannot be manufactured without machine vision

• Pharmaceutical chemicals also cannot be manufactured without machine vision


How do entrepreneurs and industrial companies ensure their inventions are protected, while still having the chance to disseminate
their findings to society? This is done through patents. Patents:
• Can be thought of as a “contract with society” - you get a limited monopoly on your idea, and in turn, you publish the
technical details of your approach.
• Can help to reduce litigation and legal fees.
• Can be used by large companies as “ammunition” for “patent wars”.

Some “rules” of patents:


• No equations are included in the patent (this convention is no longer strictly followed)

• No greyscale images - only black and white


• Arcane grammar is used for legal purposes - “comprises”, “apparatus”, “method”, etc.

• References of other patents are often included - sometimes these are added by the patent examiner, rather than the patent
authors

• Most patents end with something along the lines of “this is why our invention was necessary” or “this is the technical gap
our invention fills”
• Software is not patentable - companies and engineers get around this by putting code on hardware and patenting the
“apparatus” housing the code.

• It is also common to include background information (similar to related literature in research).

Now that we have had a high-level introduction to patents, let us turn to focus on one that describes an apparatus for sub-pixel
accuracy edge finding.

1.2 Patent Case Study: Detecting Sub-Pixel Location of Edges in a Digital Image
To put this problem into context, consider the following:

• Recall that images typically have large regions of uniform/homogeneous intensity

• Image arrays are very memory-intensive. A sparser way to convey information about an image containing edges
is to use the locations of the edges as region boundaries. This is one application of edge finding.

• This patent achieves 1/40th pixel accuracy.

This methodology and apparatus seeks to find edges of objects in digital images at a high sub-pixel accuracy level. To do so,
the authors leverage different detectors, or kernels. These kernels are similar to some of the computational molecules we have
already looked at in this course:

• Roberts Cross: This approximates derivatives in a coordinate system rotated 45 degrees, $(x', y')$. The derivatives can
be approximated using the $K_{x'}$ and $K_{y'}$ kernels:

$$\frac{\partial E}{\partial x'} \rightarrow K_{x'} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad \frac{\partial E}{\partial y'} \rightarrow K_{y'} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

• Sobel Operator: This computational molecule requires more computation and is not as high-resolution, but it is more
robust to noise than the computational molecules above:

$$\frac{\partial E}{\partial x} \rightarrow K_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad \frac{\partial E}{\partial y} \rightarrow K_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$

• Silver Operators: This computational molecule is designed for a hexagonal grid. Though these filters have some advantages,
they are unfortunately not compatible with most cameras, as very few cameras have a hexagonal pixel structure.

Figure 1: Silver Operators with a hexagonal grid.

For this specific application, we can compute approximate brightness gradients using the filters/operators above, and then we can
convert these brightness gradients from Cartesian to polar coordinates to extract brightness gradient magnitude and direction
(which are all we really need for this system). In the system, this is done using the CORDIC algorithm [1].
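As a concrete sketch of this front end, here is a minimal Python illustration (my own, not the patent's apparatus), assuming NumPy and SciPy are available; it applies the Sobel kernels above and uses floating-point `np.hypot`/`np.arctan2` where the patent's hardware would use CORDIC:

```python
import numpy as np
from scipy.ndimage import correlate

def brightness_gradient_polar(E):
    """Estimate the brightness gradient of image E with Sobel kernels,
    then convert (Ex, Ey) into magnitude and direction."""
    Kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # d/dx molecule
    Ky = np.array([[-1, -2, -1],
                   [ 0,  0,  0],
                   [ 1,  2,  1]], dtype=float)  # d/dy molecule
    Ex = correlate(E.astype(float), Kx, mode="nearest")
    Ey = correlate(E.astype(float), Ky, mode="nearest")
    G0 = np.hypot(Ex, Ey)        # gradient magnitude
    Gtheta = np.arctan2(Ey, Ex)  # gradient direction
    return G0, Gtheta
```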

1.2.1 High-Level Overview of Edge Detection System
At a high level, we can divide the system into the following chronological set of processes/components:

1. Estimate Brightness Gradient: Given an image, we can estimate the brightness gradient using some of the filters
defined above.
2. Compute Brightness Gradient Magnitude and Direction: Using the CORDIC algorithm, we can estimate the
brightness gradient magnitude and direction. The CORDIC algorithm does this iteratively through a corrective feedback
mechanism (see reference), but computationally, only uses SHIFT, ADD, SUBTRACT, and ABS operations.

3. Choose Neighbors and Detect Peaks: This is achieved using the brightness gradient magnitude and direction and a
procedure called non-maximum suppression [2].

First, using the gradient magnitude and direction, we can find edges by searching across the edge in one dimension (we
search along the gradient direction Gθ); this procedure is Non-Maximum Suppression (NMS). We need to quantize the
direction into 8 regions (for a square grid) or 6 (for a hexagonal grid) - this is known as coarse direction quantization.
Finally, we can find a peak by fitting a parabola through three points (note that a parabola has three degrees of freedom).
This approach gives accuracy up to roughly 1/10th of a pixel. To go further, we must examine the assumptions about how
the gradient varies with position, as well as:
• Camera optics
• Fill factor of the chip sensor
• How in-focus the image is
• How smooth the edge transition is
The authors of this patent claim that edge detection performance is improved using an optimal value of “s” (achieved through
interpolation and bias correction), which we will see later. For clarity, the full system diagram is shown below:

Figure 2: Aggregate edge detection system. The steps listed in the boxes correspond to the steps outlined in the procedure
above.
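As an illustration of step 3 above, here is a hedged sketch of non-maximum suppression with coarse direction quantization (my own illustrative code, assuming a square grid quantized into octants, not the patent's implementation):

```python
import numpy as np

def non_maximum_suppression(G0, Gtheta):
    """Keep a pixel only if its gradient magnitude is a local maximum
    along the (coarsely quantized) gradient direction."""
    H, W = G0.shape
    out = np.zeros_like(G0)
    # Quantize direction to 4 neighbor pairs (octants modulo 180 degrees).
    octant = np.round(Gtheta / (np.pi / 4)).astype(int) % 4
    offsets = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}  # (dy, dx)
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            dy, dx = offsets[int(octant[y, x])]
            g = G0[y, x]
            # Strict on one side, non-strict on the other, to break ties.
            if g > G0[y + dy, x + dx] and g >= G0[y - dy, x - dx]:
                out[y, x] = g
    return out
```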

Some formulas for this system:

1. $G_0 = \sqrt{G_x^2 + G_y^2}$ (gradient estimation)

2. $G_\theta = \tan^{-1}\left(\frac{G_y}{G_x}\right)$ (gradient estimation)

3. $R_0 = \max(|G_x|, |G_y|)$ (octant quantization)

4. $S_0 = \min(|G_x|, |G_y|)$ (octant quantization)
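The CORDIC step itself can be sketched as follows: a minimal floating-point illustration of vectoring-mode CORDIC (not the patent's fixed-point hardware, where each multiplication by $2^{-i}$ would be a bit shift):

```python
import math

N_ITERS = 16
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITERS)]
# CORDIC scale factor: product of cos(atan(2^-i)).
SCALE = math.prod(math.cos(a) for a in ANGLES)

def cordic_polar(gx, gy):
    """Vectoring-mode CORDIC: rotate (gx, gy) onto the x-axis while
    accumulating the rotation angle. Each iteration uses only add,
    subtract, and multiplication by 2**-i (a shift in fixed point)."""
    # Pre-rotate into the right half-plane so the iterations converge.
    if gx < 0.0:
        if gy >= 0.0:
            gx, gy, angle = gy, -gx, math.pi / 2.0
        else:
            gx, gy, angle = -gy, gx, -math.pi / 2.0
    else:
        angle = 0.0
    for i in range(N_ITERS):
        shift = 2.0 ** -i
        if gy > 0.0:   # rotate clockwise
            gx, gy = gx + gy * shift, gy - gx * shift
            angle += ANGLES[i]
        else:          # rotate counterclockwise
            gx, gy = gx - gy * shift, gy + gx * shift
            angle -= ANGLES[i]
    return gx * SCALE, angle  # (G0, Gtheta)

# Example: gradient (3, 4) has magnitude 5 and angle atan2(4, 3).
print(cordic_polar(3.0, 4.0))
```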


At a high level, the apparatus discussed in this patent is composed of:

• Gradient estimator
• Peak detector
• Sub-pixel interpolator
Next, let us dive deeper into the general edge detection problem.

1.3 Edges & Edge Detection


Let us begin by precisely defining what we mean by edges and edge detection:
• Edge: A point in an image where the image brightness gradient reaches a local maximum in the direction of the brightness
gradient. Equivalently, an edge is where the second derivative of image brightness (which can be thought of as the gradient
of the brightness gradient) crosses zero; we can locate edges by finding the zeros of this second derivative.
• Edge Detection: A process through which we can determine the location of boundaries between image regions that are
roughly uniform in brightness.
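To make these definitions concrete, here is a small 1D sketch (my own illustration, assuming NumPy): we locate an edge in a smooth synthetic brightness profile by finding where the second derivative crosses zero.

```python
import numpy as np

# Synthetic 1D brightness profile: a smooth ("soft") step edge at x = 0.
x = np.arange(-10.0, 11.0)
E = 1.0 / (1.0 + np.exp(-x))

dE = np.gradient(E)    # brightness gradient: peaks at the edge
d2E = np.gradient(dE)  # second derivative: crosses zero at the edge

# An edge lies where the second derivative changes sign while the
# gradient magnitude is significant.
sign_changes = np.where(np.diff(np.sign(d2E)) != 0)[0]
edges = [x[i] for i in sign_changes if dE[i] > 0.1]
print(edges)  # samples bracketing the true edge at x = 0
```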

1.3.1 Finding a Suitable Brightness Function for Edge Detection


Let us first approximate an edge brightness function using a step function, given by u(x), such that:
$$u(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$$

(Plot: the unit step function u(x), jumping from 0 to 1 at x = 0.)

Using this cross-section across the edge to model the edge actually causes problems arising from aliasing: since we seek to find
the location of an edge in a discrete (and therefore sampled) image, and since the edge in the case of u(x) is infinitely thin, we
will not be able to find it due to sampling. In Fourier terms, a perfect step function introduces arbitrarily high frequencies
that prevent us from sampling without aliasing effects. Let us instead try a “soft” step function, i.e. a sigmoid:
$\sigma(x) = \frac{1}{1 + e^{-x}}$. Then our u(x) takes the form:

(Plot: a “soft” unit step function u(x) = σ(x), rising smoothly from 0 toward 1 around x = 0.)

The gradient of this brightness across the edge, given by $\nabla u(x)$ (or $\frac{du}{dx}$ in one dimension), is then given by the following. Notice
that the location of the maximum matches the inflection point in the graph above:

(Plot: the gradient ∇u(x) of the “soft” unit step, a single peak with its maximum at x = 0.)

As we mentioned above, we can find the location of this edge by looking at where the second derivative of brightness crosses
zero, a.k.a. where $\nabla(\nabla u(x)) = \nabla^2 u(x) = 0$. Notice that the location of this zero is given by the same location as the inflection
point of u(x) and the maximum of ∇u(x):

(Plot: the second derivative ∇²u(x) of the “soft” unit step, which crosses zero at x = 0.)

For those curious, here is the math behind this specific function, assuming a sigmoid for u(x):

1. $u(x) = \frac{1}{1+\exp(-x)}$

2. $\nabla u(x) = \frac{du}{dx} = \frac{d}{dx}\left(\frac{1}{1+\exp(-x)}\right) = \frac{\exp(-x)}{(1+\exp(-x))^2}$

3. $\nabla^2 u(x) = \frac{d^2u}{dx^2} = \frac{d}{dx}\left(\frac{\exp(-x)}{(1+\exp(-x))^2}\right) = \frac{-\exp(-x)(1+\exp(-x))^2 + 2\exp(-x)(1+\exp(-x))\exp(-x)}{(1+\exp(-x))^4} = \frac{\exp(-x)(\exp(-x)-1)}{(1+\exp(-x))^3}$

Note that the numerator of the second derivative vanishes when $\exp(-x) = 1$, i.e. at $x = 0$, exactly where the edge is located.
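As a quick symbolic sanity check of these derivatives (my own addition, assuming SymPy is available):

```python
import sympy as sp

x = sp.symbols('x')
u = 1 / (1 + sp.exp(-x))   # sigmoid edge model
du = sp.diff(u, x)         # brightness gradient across the edge
d2u = sp.diff(u, x, 2)     # second derivative

# The gradient matches the closed form above, and the second
# derivative crosses zero exactly at the edge location x = 0.
print(sp.simplify(du - sp.exp(-x) / (1 + sp.exp(-x))**2))  # 0
print(d2u.subs(x, 0))                                      # 0
```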

Building on this framework, let us now move on to brightness gradient estimation.

1.3.2 Brightness Gradient Estimation


This component of the edge detection framework under study estimates the magnitude and direction of the brightness gradient.
Let us look at how this is derived for different filters:

Roberts Cross Gradient: Since this estimates derivatives at 45-degree angles, the pixels are effectively further apart, and
this means there will be a constant of proportionality between the magnitude of the gradient estimated here and the one
estimated in a normal (x, y) coordinate system:

$$\sqrt{E_{x'}^2 + E_{y'}^2} \propto \sqrt{E_x^2 + E_y^2}$$

Next, we will look at the Sobel operator. For this analysis, it will be helpful to recall the following result from Taylor series:

$$f(x + \delta x) = f(x) + \delta x\, f'(x) + \frac{(\delta x)^2}{2!} f''(x) + \frac{(\delta x)^3}{3!} f'''(x) + \frac{(\delta x)^4}{4!} f^{(4)}(x) + \ldots = \sum_{i=0}^{\infty} \frac{(\delta x)^i f^{(i)}(x)}{i!}, \quad \text{where } 0! \stackrel{\Delta}{=} 1$$

Let us first consider the simple two-pixel difference operator (in the x-direction, i.e. the one-dimensional case):
$\frac{dE}{dx} \rightarrow K_x = \frac{1}{\delta x}\begin{pmatrix} -1 & 1 \end{pmatrix}$. Let us look at the forward difference and backward difference when this operator is applied:

1. Forward Difference: $\frac{\partial E}{\partial x} \approx \frac{f(x+\delta x) - f(x)}{\delta x} = f'(x) + \frac{\delta x}{2} f''(x) + \frac{(\delta x)^2}{6} f'''(x) + \ldots$

2. Backward Difference: $\frac{\partial E}{\partial x} \approx \frac{f(x) - f(x-\delta x)}{\delta x} = f'(x) - \frac{\delta x}{2} f''(x) + \frac{(\delta x)^2}{6} f'''(x) - \ldots$

Notice that for both of these, if $f''(x)$ is large, i.e. if $f(x)$ is nonlinear, then error terms proportional to the second derivative
appear in our estimates. In general, we want to remove these low-order error terms. If we average the forward and backward
differences, however, the second-derivative error terms cancel:

$$\frac{1}{2}\left(\frac{f(x+\delta x) - f(x)}{\delta x} + \frac{f(x) - f(x-\delta x)}{\delta x}\right) = f'(x) + \frac{(\delta x)^2}{6} f'''(x) + \ldots$$
Now the leading error term involves the third derivative, rather than the second! As a computational molecule, this higher-order
filter looks like $\frac{dE}{dx} \rightarrow K_x = \frac{1}{2\delta x}\begin{pmatrix} -1 & 0 & 1 \end{pmatrix}$. But we can do even better! So long as we do not need to have a pixel at our
proposed edge location, we can use a difference filter spanning $(x - \frac{\delta x}{2},\; x,\; x + \frac{\delta x}{2})$: there is no pixel at x, but we can still
compute the derivative there. This yields an error that is 0.25 times the error above, because the samples are now $\frac{\delta x}{2}$ from the
evaluation point, as opposed to $\delta x$:

$$\text{error} = \frac{(\delta x / 2)^2}{6} f'''(x) = \frac{(\delta x)^2}{24} f'''(x)$$
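A quick numerical illustration of these error orders (my own example, using f(x) = sin x with evaluation point x = 1, where f'(x) = cos x):

```python
import numpy as np

f, x, d = np.sin, 1.0, 0.1  # test function, evaluation point, pixel spacing

forward = (f(x + d) - f(x)) / d                # error ~ (d/2) f''(x)
central = (f(x + d) - f(x - d)) / (2 * d)      # error ~ (d^2/6) f'''(x)
halfstep = (f(x + d / 2) - f(x - d / 2)) / d   # ~ 0.25x the central error

for name, est in [("forward", forward), ("central", central),
                  ("half-step", halfstep)]:
    print(f"{name:9s} estimate {est:+.6f}  error {est - np.cos(x):+.2e}")
```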
This error reduction makes sense intuitively: the closer a set of samples is to the point where the gradient is being estimated,
the more accurate the estimate will be. We can incorporate y into the picture, making this amenable to two-dimensional images,
by simply estimating the gradient at the center of a 2-by-2 block of pixels, given for each dimension as:
 
$$\frac{\partial E}{\partial x} \approx K_x = \frac{1}{2\delta x}\begin{pmatrix} -1 & 1 \\ -1 & 1 \end{pmatrix}, \qquad \frac{\partial E}{\partial y} \approx K_y = \frac{1}{2\delta y}\begin{pmatrix} -1 & -1 \\ 1 & 1 \end{pmatrix}$$

The proposed edge is in the middle of both of these kernels, as shown below:

Figure 3: We can estimate the brightness gradient with minimal error by estimating it at the point at the center of these 2D
filters.

Estimating these individually in each dimension requires 3 operations each, for a total of 6 operations; but if we combine the
operations common to both kernels, only 4 operations are needed. This is especially helpful for images with many pixels, as
shown in the sketch below.
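A sketch of this shared-operation trick (my own illustration): with pixel values a, b in the top row and c, d in the bottom row, the two diagonal differences yield both derivative estimates in 4 operations.

```python
def gradient_2x2(a, b, c, d, dx=1.0, dy=1.0):
    """Estimate (Ex, Ey) at the center of a 2x2 block of pixels
        a b   (top row)
        c d   (bottom row)
    using 4 additions/subtractions instead of 6."""
    p = d - a  # difference along one diagonal
    q = b - c  # difference along the other diagonal
    Ex = (p + q) / (2.0 * dx)  # equals ((b + d) - (a + c)) / (2 dx)
    Ey = (p - q) / (2.0 * dy)  # equals ((c + d) - (a + b)) / (2 dy)
    return Ex, Ey
```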

Next, we will discuss the 3-by-3 Sobel operator. We can think of this Sobel operator (in each dimension) as the discrete
convolution of a 2-by-2 horizontal or vertical highpass/edge filter with a 2-by-2 smoothing or averaging filter:

1. x-direction: $\frac{1}{2\delta x}\begin{pmatrix} -1 & 1 \\ -1 & 1 \end{pmatrix} * \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$

2. y-direction: $\frac{1}{2\delta y}\begin{pmatrix} -1 & -1 \\ 1 & 1 \end{pmatrix} * \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$
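This factorization is easy to verify numerically (a quick check, assuming SciPy; scaling constants omitted):

```python
import numpy as np
from scipy.signal import convolve2d

edge_x = np.array([[-1, 1], [-1, 1]])  # 2x2 horizontal edge filter
smooth = np.array([[1, 1], [1, 1]])    # 2x2 averaging filter

print(convolve2d(edge_x, smooth, mode="full"))
# [[-1  0  1]
#  [-2  0  2]
#  [-1  0  1]]   <- the 3x3 Sobel x-kernel
```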

A few notes about the derivation above:

• The convolution used is a “padded convolution” [3]: when elements of the filter/kernel (in this case, the averaging kernel)
are not aligned with the image, they are simply multiplied by zero. Zero padding is the most common padding technique,
but there are other techniques as well, such as wraparound padding.

• This approach avoids the half-pixel offset (estimating the gradient at a point that is not on a pixel) noted above.

• Smoothing/averaging is a double-edged sword: while it can reduce or remove high-frequency noise by filtering, it can
also introduce undesirable blurring.

Next, we will look at how the brightness gradient is converted from Cartesian to polar coordinates:

$$(E_x, E_y) \rightarrow (E_0, E_\theta)$$

$$E_0 = \sqrt{E_x^2 + E_y^2}, \qquad E_\theta = \tan^{-1}\left(\frac{E_y}{E_x}\right)$$

Finally, we conclude this lecture by looking at appropriate values of s for quadratic and triangular functions. This assumes we
have three gradient measurements centered on G0: (1) G−, (2) G0, and (3) G+. Let us look at the results for these two types
of functions:

1. Quadratic: $s = \frac{G_+ - G_-}{4\left(G_0 - \frac{1}{2}(G_+ + G_-)\right)}$, which results in $s \in [-\frac{1}{2}, \frac{1}{2}]$.

2. Triangular: $s = \frac{G_+ - G_-}{2\left(G_0 - \min(G_+, G_-)\right)}$

A few notes about these approaches:

• In each case, we only want to keep the point if the magnitude G0 is a local maximum, i.e. G0 > G+ and G0 ≥ G−.

• In the quadratic case, we can fit the curve through the three data points using three degrees of freedom, i.e.
$f(x) = ax^2 + bx + c$. With this approach, b is related to the first derivative and a to the second derivative.
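Here is a hedged sketch of this sub-pixel peak interpolation (illustrative code following the formulas above):

```python
def subpixel_peak_quadratic(g_minus, g0, g_plus):
    """Offset s of the peak of a parabola through (-1, g_minus),
    (0, g0), (+1, g_plus); s is in [-1/2, 1/2] when g0 is a local max."""
    assert g0 > g_plus and g0 >= g_minus, "g0 must be a local maximum"
    return (g_plus - g_minus) / (4.0 * (g0 - 0.5 * (g_plus + g_minus)))

def subpixel_peak_triangular(g_minus, g0, g_plus):
    """Offset s assuming a triangular (piecewise-linear) peak profile."""
    assert g0 > g_plus and g0 >= g_minus, "g0 must be a local maximum"
    return (g_plus - g_minus) / (2.0 * (g0 - min(g_plus, g_minus)))

# Example: samples 2, 5, 4 place the peak slightly right of center.
print(subpixel_peak_quadratic(2.0, 5.0, 4.0))   # 0.25
print(subpixel_peak_triangular(2.0, 5.0, 4.0))  # ~0.33
```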

1.4 References
1. CORDIC Algorithm, https://www.allaboutcircuits.com/technical-articles/an-introduction-to-the-cordic-algorithm/
2. Non-Maximum Suppression, http://justin-liang.com/tutorials/canny/#suppression
3. Padded Convolution, https://medium.com/@ayeshmanthaperera/what-is-padding-in-cnns-71b21fb0dd7

MIT OpenCourseWare
https://ocw.mit.edu

6.801 / 6.866 Machine Vision


Fall 2020

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
