Lec 11
These lecture summaries are designed to be a review of the lecture. Though I do my best to include all main topics from the
lecture, the lectures themselves contain more elaborate explanations than these notes.
• References of other patents are often included - sometimes these are added by the patent examiner, rather than the patent
authors
• Most patents end with something along the lines of “this is why our invention was necessary” or “this is the technical gap
our invention fills”
• Software is not patentable - companies and engineers get around this by putting code on hardware and patenting the
“apparatus” housing the code.
Now that we have had a high-level introduction to patents, let us turn to focus on one that describes an apparatus for sub-pixel
accuracy edge finding.
1.2 Patent Case Study: Detecting Sub-Pixel Location of Edges in a Digital Image
To put this problem into context, consider the following:
• Image arrays are very memory-dense. A sparser way to transfer/convey information about an image containing edges
is to use the locations of edges as region boundaries of the image. This is one application of edge finding.
This methodology and apparatus seeks to find edges of objects in digital images at a high sub-pixel accuracy level. To do so,
the authors leverage different detectors, or kernels. These kernels are similar to some of the computational molecules we have
already looked at in this course:
• Roberts Cross: This approximates derivatives in a coordinate system rotated 45 degrees, (x′, y′). The derivatives can
be approximated using the Kx′ and Ky′ kernels:

∂E/∂x′ → Kx′ = [ 0 −1 ]
               [ 1  0 ]

∂E/∂y′ → Ky′ = [ 1  0 ]
               [ 0 −1 ]
• Sobel Operator: This computational molecule requires more computation and is not as high-resolution, but it is more
robust to noise than the kernels above:

∂E/∂x → Kx = [ −1 0 1 ]
             [ −2 0 2 ]
             [ −1 0 1 ]

∂E/∂y → Ky = [ −1 −2 −1 ]
             [  0  0  0 ]
             [  1  2  1 ]
• Silver Operators: These computational molecules are designed for a hexagonal grid. Though they have some advantages,
they are unfortunately not compatible with most cameras, as very few cameras have a hexagonal pixel structure.
For this specific application, we can compute approximate brightness gradients using the filters/operators above, and then we can
convert these brightness gradients from Cartesian to polar coordinates to extract brightness gradient magnitude and direction
(which are all we really need for this system). In the system, this is done using the CORDIC algorithm [1].
1.2.1 High-Level Overview of Edge Detection System
At a high level, we can divide the system into the following chronological set of processes/components:
1. Estimate Brightness Gradient: Given an image, we can estimate the brightness gradient using some of the filters
defined above.
2. Compute Brightness Gradient Magnitude and Direction: Using the CORDIC algorithm, we can estimate the
brightness gradient magnitude and direction. The CORDIC algorithm does this iteratively through a corrective feedback
mechanism (see reference [1]), but computationally it uses only SHIFT, ADD, SUBTRACT, and ABS operations.
3. Choose Neighbors and Detect Peaks: This is achieved using brightness gradient magnitude and direction and a
procedure called non-maximum suppression [2].
First, using gradient magnitude and direction, we can find edges by searching across the 1D edge profile (we search for
the edge along the gradient direction Gθ), which invokes Non-Maximum Suppression (NMS). We need to quantize the
direction into 8 (Cartesian) or 6 (Polar) regions - this is known as coarse direction quantization.
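To make the NMS step concrete, here is a minimal sketch in Python. This is an illustration rather than the patent's implementation; the 8-sector offset table and the helper names are assumptions:

```python
import math

def coarse_direction(theta):
    """Coarse direction quantization: map a gradient direction (radians)
    to the nearest of 8 directions and return the pixel offset (dx, dy)
    toward the 'positive' neighbor along that direction."""
    sector = int(round(theta / (math.pi / 4))) % 8
    offsets = [(1, 0), (1, 1), (0, 1), (-1, 1),
               (-1, 0), (-1, -1), (0, -1), (1, -1)]
    return offsets[sector]

def non_maximum_suppression(mag, theta):
    """Keep a pixel only if its gradient magnitude G0 beats both
    neighbors G+ and G- along the (quantized) gradient direction."""
    h, w = len(mag), len(mag[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx, dy = coarse_direction(theta[y][x])
            g0 = mag[y][x]
            g_plus = mag[y + dy][x + dx]
            g_minus = mag[y - dy][x - dx]
            if g0 > g_plus and g0 >= g_minus:  # local maximum across the edge
                out[y][x] = g0
    return out
```

For a vertical edge (gradient pointing in +x, theta = 0), only the column where the magnitude peaks survives suppression.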
Finally, we can find a peak by fitting three points with a parabola (note this has three DOF). This approach will end up
giving us accuracy up to 1/10th of a pixel. To go further, we must look at the assumptions of gradient variation with
position, as well as:
• Camera optics
• Fill factor of the chip sensor
• How in-focus the image is
• How smooth the edge transition is
The authors of this patent claim that edge detection performance is improved using an optimal value of “s” (achieved through
interpolation and bias correction), which we will see later. For clarity, the full system diagram is here:
Figure 2: Aggregate edge detection system. The steps listed in the boxes correspond to the steps outlined in the procedure
above.
The main components of the system are:
• Gradient estimator
• Peak detector
• Sub-pixel interpolator
Next, let us dive in more to the general edge detection problem.
[Plot: the ideal unit step edge model u(x), equal to 0 for x < 0 and 1 for x > 0.]
Using this cross-section across the edge to model the edge actually causes problems arising from aliasing: since we seek to find
the location of an edge in a discrete, and therefore sampled, image, and since the edge in the case of u(x) is infinitely thin, we
will not be able to find it due to sampling. In Fourier terms, if we use a perfect step function, we introduce artificially high
(infinite) frequencies that prevent us from sampling without aliasing effects. Let us instead try a “soft” step function, i.e. a
“sigmoid” function: σ(x) = 1/(1 + e^(−x)). Then our u(x) takes the form:
[Plot: the sigmoid “soft” step u(x) = 1/(1 + e^(−x)), rising smoothly from 0 to 1 around x = 0.]
The gradient of this brightness across the edge, given by ∇u(x) (or du/dx in one dimension), is then given by the following.
Notice that the location of the maximum matches the inflection point in the graph above:
[Plot: Gradient of “Soft” Unit Step Function, ∇u(x), peaking at 0.25 at x = 0.]
As we mentioned above, we can find the location of this edge by looking at where the second derivative of brightness crosses
zero, a.k.a. where ∇(∇u(x)) = ∇²u(x) = 0. Notice that the location of this zero is given by the same location as the inflection
point of u(x) and the maximum of ∇u(x):
[Plot: second derivative ∇²u(x), which crosses zero at x = 0.]
For those curious, here is the math behind this specific function, assuming a sigmoid for u(x):

1. u(x) = 1/(1 + exp(−x))

2. ∇u(x) = du/dx = d/dx [1/(1 + exp(−x))] = exp(−x)/(1 + exp(−x))²
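A quick numerical check of the two steps above (a sketch; the sampling grid is an arbitrary choice):

```python
import math

def sigmoid(x):
    # Step 1: u(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    # Step 2: du/dx = exp(-x) / (1 + exp(-x))^2
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

# Sample the gradient on a grid; it should peak (value 1/4) at x = 0,
# which is the inflection point of u(x) -- the edge location.
xs = [i / 10.0 for i in range(-60, 61)]
peak_x = max(xs, key=d_sigmoid)
```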
Building on top of this framework above, let us now move on to brightness gradient estimation.
Roberts Cross Gradient: Since this estimates derivatives at 45 degree angles, the pixels are effectively further apart, and
this means there will be a constant of proportionality difference between the magnitude of the gradient estimated here and with
a normal (x, y) coordinate system:

√(Ex′² + Ey′²) ∝ √(Ex² + Ey²)
Next, we will look at the Sobel operator. For this analysis, it will be helpful to recall the following result from Taylor series:

f(x + δx) = f(x) + δx f′(x) + ((δx)²/2!) f′′(x) + ((δx)³/3!) f′′′(x) + ((δx)⁴/4!) f^(4)(x) + ... = Σ_{i=0}^∞ ((δx)^i / i!) f^(i)(x), where 0! ≜ 1
Let us first consider the simple two-pixel difference operator (in the x-direction/in the one-dimensional case),
i.e. dE/dx → Kx = (1/δ) [−1 1]. Let us look at the forward difference and backward difference when this operator is applied:

forward:  (f(x + δx) − f(x))/δx = f′(x) + (δx/2) f′′(x) + ((δx)²/6) f′′′(x) + ...

backward: (f(x) − f(x − δx))/δx = f′(x) − (δx/2) f′′(x) + ((δx)²/6) f′′′(x) − ...

Notice that for both of these, if f′′(x) is large, i.e. if f(x) is nonlinear, then we will have second-order error terms that appear in
our estimates. In general, we want to aim for removing these lower-order error terms. If we average the forward and backward
differences, however, we can see that these second-order error terms disappear:

(1/2) [ (f(x + δx) − f(x))/δx + (f(x) − f(x − δx))/δx ] = f′(x) + ((δx)²/6) f′′′(x) + ...
Now we have increased the error term to 3rd order, rather than 2nd order! As a computational molecule, this higher-order
filter (the 1-D Sobel operator) looks like dE/dx → Kx = (1/(2δ)) [−1 0 1]. But we can do even better! So long as we do not
need to have a pixel at our proposed edge, we can use a filter spanning (x − δ/2, x, x + δ/2). There is no pixel at x, but we can
still compute the derivative there. This yields an error that is 0.25 times the error above, because the two samples are now δ
apart (at x − δ/2 and x + δ/2), as opposed to 2δ apart:

error = ((δ/2)²/6) f′′′(x)
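These error orders are easy to verify numerically. A minimal sketch, with f(x) = x³ chosen (arbitrarily) so that f′′′ is constant:

```python
def f(x):
    return x ** 3  # f'(x) = 3x^2, f'''(x) = 6

def forward(x, d):
    return (f(x + d) - f(x)) / d        # error ~ (d/2) f''(x)

def backward(x, d):
    return (f(x) - f(x - d)) / d        # error ~ -(d/2) f''(x)

def central(x, d):
    # Average of forward and backward: second-order terms cancel,
    # leaving error ~ (d^2/6) f'''(x)
    return 0.5 * (forward(x, d) + backward(x, d))

x, d = 1.0, 0.1
exact = 3.0 * x ** 2
err_forward = abs(forward(x, d) - exact)
err_central = abs(central(x, d) - exact)
# Halving the sample spacing quarters the central-difference error:
err_central_half = abs(central(x, d / 2.0) - exact)
```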
This makes sense intuitively, because the closer together a set of gradient estimates are, the more accurate they will be. We
can incorporate y into the picture, making this amenable to two-dimensional methods as desired, by simply taking the center
of four pixels, given for each dimension as:

∂E/∂x ≈ Kx = (1/(2δx)) [ −1 1 ]
                        [ −1 1 ]

∂E/∂y ≈ Ky = (1/(2δy)) [ −1 −1 ]
                        [  1  1 ]
The proposed edge is in the middle of both of these kernels, as shown below:
Figure 3: We can estimate the brightness gradient with minimal error by estimating it at the point at the center of these 2D
filters.
Estimating these individually in each dimension requires 3 operations each, for a total of 6 operations; but if we take the
operations common to both and combine them by addition or subtraction, we need only 4 operations. This is especially
helpful for images with many pixels.
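The 4-operation trick can be sketched as follows; the pixel names (a, b on the top row, c, d on the bottom) and the omission of the 1/(2δ) scale factor are illustrative assumptions:

```python
def gradient_naive(a, b, c, d):
    """Six adds/subtracts: three per component (scale factors omitted)."""
    ex = (b + d) - (a + c)   # right column minus left column
    ey = (c + d) - (a + b)   # bottom row minus top row
    return ex, ey

def gradient_shared(a, b, c, d):
    """Four adds/subtracts: the two diagonal differences are shared
    between the x and y components."""
    s1 = d - a               # main-diagonal difference
    s2 = c - b               # anti-diagonal difference
    return s1 - s2, s1 + s2  # = (ex, ey)
```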
Next, we will discuss the 3-by-3 Sobel operator. We can think of this Sobel operator (in each dimension) as being the
discrete convolution of a 2-by-2 horizontal or vertical highpass/edge filter with a smoothing or averaging filter:

1. x-direction: (1/(2δx)) [ −1 1 ] ∗ [ 1 1 ] = (1/(2δx)) [ −1 0 1 ]
                          [ −1 1 ]   [ 1 1 ]             [ −2 0 2 ]
                                                         [ −1 0 1 ]

2. y-direction: (1/(2δy)) [ −1 −1 ] ∗ [ 1 1 ] = (1/(2δy)) [ −1 −2 −1 ]
                          [  1  1 ]   [ 1 1 ]             [  0  0  0 ]
                                                          [  1  2  1 ]
A few notes about the derivation above:
• The convolution used is a “padded convolution” [3], in which, when the elements of the filter/kernel (in this case, the
averaging kernel) are not aligned with the image, they are simply multiplied by zero. Zero padding is the most common
padding technique, but there are other techniques as well, such as wraparound padding.
• This approach avoids the half-pixel shift (in which we estimate the gradient at a point that is not on a pixel) that was cited above.
• Smoothing/averaging is a double-edged sword: while it can reduce/remove high-frequency noise by filtering, it can
also introduce undesirable blurring.
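The convolution identity above is easy to verify with a small hand-rolled “full” (zero-padded) 2-D convolution; this sketch checks the algebra and is not meant as an efficient implementation:

```python
def conv2d_full(f, g):
    """'Full' (zero-padded) discrete 2-D convolution of two small kernels:
    positions where the kernels do not overlap contribute zero."""
    fh, fw = len(f), len(f[0])
    gh, gw = len(g), len(g[0])
    out = [[0] * (fw + gw - 1) for _ in range(fh + gh - 1)]
    for i in range(fh):
        for j in range(fw):
            for k in range(gh):
                for m in range(gw):
                    out[i + k][j + m] += f[i][j] * g[k][m]
    return out

diff_x = [[-1, 1],
          [-1, 1]]      # 2x2 horizontal edge filter (scale 1/(2dx) omitted)
diff_y = [[-1, -1],
          [1, 1]]       # 2x2 vertical edge filter (scale 1/(2dy) omitted)
smooth = [[1, 1],
          [1, 1]]       # 2x2 averaging filter

sobel_x = conv2d_full(diff_x, smooth)  # 3x3 Sobel x kernel
sobel_y = conv2d_full(diff_y, smooth)  # 3x3 Sobel y kernel
```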
Next, we will look at how the brightness gradient is converted from Cartesian to polar coordinates:

(Ex, Ey) → (E0, Eθ)

E0 = √(Ex² + Ey²)

Eθ = tan⁻¹(Ey/Ex)
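The CORDIC conversion [1] can be sketched in “vectoring” mode, which rotates (Ex, Ey) onto the x-axis while accumulating the rotation angle. This floating-point sketch only mirrors the structure: in fixed-point hardware the multiplications by 2^(−i) become bit shifts (hence only SHIFT/ADD/SUBTRACT/ABS operations), the arctangent table is precomputed, and the iteration count chosen here is an assumption:

```python
import math

def cordic_polar(ex, ey, n=20):
    """Return (magnitude, direction) of the gradient vector (ex, ey)."""
    # Pre-rotate into the right half-plane so the iteration converges.
    if ex < 0 and ey >= 0:
        ex, ey, z = ey, -ex, math.pi / 2    # rotate by -90 deg, remember +90
    elif ex < 0:
        ex, ey, z = -ey, ex, -math.pi / 2   # rotate by +90 deg, remember -90
    else:
        z = 0.0
    gain = 1.0
    for i in range(n):
        sigma = -1.0 if ey > 0 else 1.0     # rotate toward ey = 0
        ex, ey, z = (ex - sigma * ey * 2.0 ** -i,
                     ey + sigma * ex * 2.0 ** -i,
                     z - sigma * math.atan(2.0 ** -i))
        gain *= math.sqrt(1.0 + 4.0 ** -i)  # each micro-rotation scales |v|
    return ex / gain, z
```

After the loop the vector lies (almost) on the x-axis, so the scaled x-component is the magnitude E0 and the accumulated angle z is the direction Eθ.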
Finally, we conclude this lecture by looking at appropriate values of s for quadratic and triangular functions. This assumes we
have three gradient measurements centered on G0: (1) G−, (2) G0, and (3) G+. Let us look at the results for these two types
of functions:

1. Quadratic: s = (G+ − G−) / (4(G0 − (1/2)(G+ + G−))); this results in s ∈ [−1/2, 1/2].

2. Triangular: s = (G+ − G−) / (2(G0 − min(G+, G−)))

• Note that in each case, we only want to keep the peak if the magnitude G0 is a local maximum, i.e. G0 > G+ and G0 ≥ G−.

• In the quadratic case, we can parameterize the curve with three data points using three degrees of freedom, i.e. ax² + bx + c.
With this approach, b ≈ first derivative and a ≈ second derivative.
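Both peak-offset formulas can be sanity-checked directly; the test functions used below are arbitrary choices with known peak locations, not data from the patent:

```python
def subpixel_peak_quadratic(g_minus, g0, g_plus):
    """Vertex offset s of the parabola through (-1, g_minus), (0, g0),
    (+1, g_plus); s lies in [-1/2, 1/2] when g0 is a local maximum."""
    return (g_plus - g_minus) / (4.0 * (g0 - 0.5 * (g_plus + g_minus)))

def subpixel_peak_triangular(g_minus, g0, g_plus):
    """Peak offset assuming the gradient profile is triangular
    (two straight flanks meeting at the peak)."""
    return (g_plus - g_minus) / (2.0 * (g0 - min(g_plus, g_minus)))
```

Sampling a parabola with vertex at s = 0.3 at x = −1, 0, 1 and feeding the three values back into the quadratic formula recovers 0.3 exactly, since the fit has the same three degrees of freedom as the data.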
1.4 References
1. CORDIC Algorithm, https://www.allaboutcircuits.com/technical-articles/an-introduction-to-the-cordic-algorithm/
2. Non-Maximum Suppression, http://justin-liang.com/tutorials/canny/#suppression
3. Padded Convolution, https://medium.com/@ayeshmanthaperera/what-is-padding-in-cnns-71b21fb0dd7
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms