Report


Outline

This report summarises the progress made by the project group members since the first review, at which the initial idea for the project was proposed and approved. The proposal contained a detailed description of the objective of the project, the framework that needs to be developed, and an approximate estimate of the expected results. The data contained in this report is a collective summary of the literature study that has been performed, together with a design paradigm drawn up on the basis of that research. It defines a sequential process that we have designed; we are currently designing individual modules for each of the steps in the procedure.

Current Status

We are currently carrying out extensive research into all the main concepts that are relevant to the project. In this phase, each aspect of the project is being studied in detail. The study of each aspect brings up new concepts, and the scope of the research is then extended to include them.

Note: This report serves as a supplement to the presentation given to the panel by the group members. It contains detailed explanations of the aspects and ideas discussed in the presentation.

Original Concept

This project aims to design a software package that applies appropriate image processing algorithms to a 2-dimensional image (depicting a scene that forms a single frame of an animation storyboard) to render it into a co-ordinate form that can be used for 3-dimensional modelling.

Added note: Based on the progress made so far, we have drawn the following conclusion. From a computing perspective, implementing the proposed idea requires comprehensive application of Artificial Intelligence techniques to automate the 3D modelling process that is currently performed manually in regular 3D design software packages.
Techniques previously discussed:
Anisotropic diffusion
Feature extraction
Self-organising mapping
Pattern recognition
Principal component analysis
Independent component analysis

The concepts described in this report concern the first four of the above techniques.

Outline of the procedure designed to implement the project:

Preparing the source image for data acquisition
1. Remove noise and other details from the source image that will not be required during the initial process of feature extraction. Anisotropic diffusion is used for this purpose.
2. Apply edge-detection algorithms to reduce the image to a form that contains only outlined sketches of the elements in the original image.
3. Convert the image from bitmap (raster) graphics form to vector graphics form, so that points, lines, curves, shapes and polygons can be represented as mathematical expressions.

At this stage, the image is in a form suitable for applying data acquisition techniques to obtain 3D data pertaining to each element in the image, which is required for depth analysis later.

Obtaining 3D data
1. Use a self-organising map on each element of the image to obtain a point cloud representation of the image.
2. Using the values from the point cloud, represent the image as a NURBS model.
3. Use stereo photogrammetry on the NURBS model to acquire accurate 3D co-ordinate values. The third dimension must be estimated using a proposed depth determination algorithm.
4. Form orthographic and isometric maps of the elements in the image using the 3D co-ordinate values obtained.

At this stage, the 3-dimensional data has been retrieved from the source image. In principle, this means that the co-ordinate data for all three axes (X, Y and Z) has been obtained, and it will be used to reconstruct the image in 3D form.

Reconstruction
1. Overlap the corresponding sections of the orthographic and isometric projections to obtain the original surfaces, and extrapolate the non-corresponding values to compose the elements that belong to the third dimension.

The new concepts that have been encountered while designing this algorithm are discussed subsequently. NURBS Non-uniform rational basis spline (NURBS) is a mathematical model commonly used in computer graphics for generating and representing curves and surfaces. They can be efficiently handled by the computer programs and yet allow for easy human interaction. NURBS surfaces are functions of two parameters mapping to a surface in three-dimensional space. The shape of the surface is determined by control points. NURBS surfaces can represent simple geometrical shapes in a compact form.

Has 3 levels of geometric continuity: Positional continuity (G0) holds whenever the end positions of two curves or surfaces are coincidental. The curves or surfaces may still meet at an angle, giving rise to a sharp corner or edge and causing broken highlights. Tangential continuity (G1) requires the end vectors of the curves or surfaces to be parallel, ruling out sharp edges. Because highlights falling on a tangentially continuous edge are always continuous and thus look natural, this level of continuity can often be sufficient. Curvature continuity (G2) further requires the end vectors to be of the same length and rate of length change. Highlights falling on a curvature-continuous edge do not display any change, causing the two surfaces to appear as one. This can be visually recognized as perfectly smooth. A NURBS curve is defined by its order, a set of weighted control points, and a knot vector. Control Points The control points determine the shape of the curve. Typically, each point of the curve is computed by taking a weighted sum of a number of control points. The weight of each point varies according to the governing parameter. For a curve of degree d, the weight of any control point is only nonzero in d+1 intervals of the parameter space. Within those intervals, the weight changes according to a polynomial function (basis functions) of degree d. At the boundaries of the intervals, the basis functions go smoothly to zero, the smoothness being determined by the degree of the polynomial. Adding more control points allows better approximation to a given curve, although only a certain class of curves can be represented exactly with a finite number of control points. NURBS curves also feature a scalar weight for each control point. This allows for more control over the shape of the curve without unduly raising the number of control points. The control points can have any dimensionality. One-dimensional points just define a scalar function of the parameter. 
Multi-dimensional points might be used to control sets of time-driven values, e.g. the different positional and rotational settings of a robot arm. NURBS surfaces are just an application of this. Each control 'point' is actually a full vector of control points, defining a curve. These curves share their degree and the number of control points, and span one dimension of the parameter space. By interpolating these control vectors over the other dimension of the parameter space, a continuous set of curves is obtained, defining the surface.

The knot vector

The knot vector is a sequence of parameter values that determines where and how the control points affect the NURBS curve. The number of knots is always equal to the number of control points plus the curve degree plus one (equivalently, the number of control points plus the curve order). The knot vector divides the parametric space into the intervals mentioned before, usually referred to as knot spans. Each time the parameter value enters a new knot span, a new control point becomes active, while an old control point is discarded.
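The way control points, weights and the knot vector combine can be sketched in a few lines. The following is a minimal illustration only, assuming the standard Cox-de Boor recursion for the basis functions; the function names `basis` and `nurbs_point` and the quarter-circle data are hypothetical choices made for this example and are not part of the project design.

```python
import math

def basis(i, d, u, knots):
    """Cox-de Boor recursion: i-th basis function of degree d evaluated at u."""
    if d == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    value = 0.0
    span = knots[i + d] - knots[i]
    if span > 0:  # zero-length knot spans contribute nothing
        value += (u - knots[i]) / span * basis(i, d - 1, u, knots)
    span = knots[i + d + 1] - knots[i + 1]
    if span > 0:
        value += (knots[i + d + 1] - u) / span * basis(i + 1, d - 1, u, knots)
    return value

def nurbs_point(u, degree, points, weights, knots):
    """Point on the curve: weighted (rational) sum of the control points."""
    if len(knots) != len(points) + degree + 1:
        raise ValueError("expected len(points) + degree + 1 knots")
    # Degree-0 basis functions use half-open spans, so keep u just inside
    # the last span when the caller asks for the end of the curve.
    u = min(u, knots[-1] - 1e-12)
    total = 0.0
    acc = [0.0] * len(points[0])
    for i, (p, w) in enumerate(zip(points, weights)):
        b = w * basis(i, degree, u, knots)
        total += b
        acc = [a + b * c for a, c in zip(acc, p)]
    return tuple(a / total for a in acc)

# Illustrative data: a quarter of the unit circle as a quadratic NURBS curve.
quarter_points = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
quarter_weights = [1.0, math.sqrt(2) / 2, 1.0]
quarter_knots = [0, 0, 0, 1, 1, 1]  # 3 control points + degree 2 + 1 = 6 knots
x, y = nurbs_point(0.5, 2, quarter_points, quarter_weights, quarter_knots)
```

With unequal weights the curve reproduces a circular arc exactly, which a non-rational polynomial curve with the same three control points could not do.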

Because the knots delimit successive spans, the values in the knot vector must be in nondecreasing order, so (0, 0, 1, 2, 3, 3) is valid while (0, 0, 2, 1, 3, 3) is not. Consecutive knots can have the same value. This defines a knot span of zero length, which implies that two control points are activated at the same time (and two control points are deactivated). This affects the continuity of the resulting curve or its higher derivatives; for instance, it allows the creation of corners in an otherwise smooth NURBS curve.

Order

The order of a NURBS curve defines the number of nearby control points that influence any given point on the curve. The curve is represented mathematically by a polynomial of degree one less than the order of the curve. Hence, second-order curves (which are represented by linear polynomials) are called linear curves, third-order curves are called quadratic curves, and fourth-order curves are called cubic curves. The number of control points must be greater than or equal to the order of the curve.

Photogrammetry

Photogrammetry is the practice of determining the geometric properties of objects from photographic images. In the simplest example, the distance between two points that lie on a plane parallel to the photographic image plane can be determined by measuring their distance on the image, if the scale (s) of the image is known. This is done by multiplying the measured distance by 1/s.

The data model of photogrammetry, depicted in the accompanying diagram, has four main variables:
The 3D co-ordinates define the locations of object points in 3D space.
The image co-ordinates define the locations of the object points' images on the film or an electronic imaging device.
The exterior orientation of a camera defines its location in space and its view direction.
The inner orientation defines the geometric parameters of the imaging process. This is primarily the focal length of the lens, but can also include the description of lens distortions.
Additional observations also play an important role: scale bars (a known distance between two points in space) or known fixed points create the connection to the basic measuring units. Each of the four main variables can be an input or an output of a photogrammetric method.
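The simplest photogrammetric case mentioned earlier, recovering a real distance from a distance measured on the image when the scale s is known, amounts to one multiplication. The sketch below is only a worked example; the function name `real_distance` and the numbers are hypothetical.

```python
def real_distance(measured_on_image, scale):
    """Convert a distance measured on the image into a real-world distance.

    `scale` is image size divided by object size (a 1:500 image has
    scale 1/500). The relation only holds for points lying on a plane
    parallel to the image plane.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return measured_on_image * (1.0 / scale)

# Hypothetical numbers: 0.03 m measured on an image taken at scale 1:500.
distance_m = real_distance(0.03, 1 / 500)
```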

Point Cloud

A point cloud is a set of vertices in a three-dimensional coordinate system. These vertices are usually defined by X, Y and Z coordinates, and are typically intended to represent the external surface of an object. Point clouds can be directly rendered and inspected, but they are usually not directly usable in most 3D applications, and are therefore converted to polygon or triangle mesh models, NURBS surface models, or CAD models through a process commonly referred to as surface reconstruction.

EDGE DETECTION

Edge detection is a fundamental tool in image processing, machine vision and computer vision, particularly in the areas of feature detection and feature extraction. It aims at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The purpose of detecting sharp changes in image brightness is to capture important events and changes in properties of the world. Applying an edge detector to an image may yield a set of connected curves that indicate the boundaries of objects, the boundaries of surface markings, and curves that correspond to discontinuities in surface orientation. Thus, applying an edge detection algorithm to an image may significantly reduce the amount of data to be processed and may filter out information that is regarded as less relevant, while preserving the important structural properties of the image.

The edges extracted from a two-dimensional image of a three-dimensional scene can be classified as either viewpoint dependent or viewpoint independent. A viewpoint-independent edge typically reflects inherent properties of the three-dimensional objects, such as surface markings and surface shape. A viewpoint-dependent edge may change as the viewpoint changes, and typically reflects the geometry of the scene, such as objects occluding one another.

Gaussian Filter

A Gaussian blur (also known as Gaussian smoothing) is the result of blurring an image by a Gaussian function. It is used in many image processing software packages to reduce the noise in an image. The visual effect resembles viewing the image through a translucent sheet. Mathematically, applying a Gaussian blur is equivalent to convolving the image with a Gaussian function. It acts as a low-pass filter, since applying a Gaussian filter removes the high-frequency components from the image.
Applying multiple, successive Gaussian blurs to an image has the same effect as applying a single, larger Gaussian blur, whose radius is the square root of the sum of the squares of the blur radii that were actually applied. For example, applying successive Gaussian blurs with radii of 6 and 8 gives the same result as applying a single Gaussian blur of radius 10, since $\sqrt{6^2 + 8^2} = 10$.

Gaussian smoothing is commonly used with edge detection. Most edge-detection algorithms are sensitive to noise; the 2-D Laplacian filter, built from a discretization of the Laplace operator, is particularly sensitive to noisy input. Applying a Gaussian blur filter before edge detection reduces the level of noise in the image, which improves the result of the edge-detection algorithm that follows. This approach is commonly referred to as Laplacian of Gaussian, or LoG, filtering.

OPTIMAL EDGE DETECTION ALGORITHM

An "optimal" edge detector means:
Good detection: the algorithm should mark as many real edges in the image as possible.
Good localization: edges marked should be as close as possible to the edge in the real image.
Minimal response: a given edge in the image should only be marked once, and where possible, image noise should not create false edges.
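The composition rule for successive Gaussian blurs quoted earlier (radii 6 and 8 combining to 10) can be checked numerically. The sketch below assumes that "radius" means the standard deviation of the Gaussian and works on 1-D sampled kernels truncated at six standard deviations; the helper names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Sampled, normalised 1-D Gaussian truncated at 6 sigma (assumption)."""
    half = int(math.ceil(6 * sigma))
    values = [math.exp(-(x * x) / (2.0 * sigma * sigma))
              for x in range(-half, half + 1)]
    total = sum(values)
    return [v / total for v in values]

def convolve(a, b):
    """Full discrete convolution of two 1-D kernels."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def spread(kernel):
    """Standard deviation of a symmetric kernel about its centre sample."""
    centre = (len(kernel) - 1) / 2.0
    return math.sqrt(sum(v * (i - centre) ** 2 for i, v in enumerate(kernel)))

def combined_radius(*radii):
    """Radius of the single blur equivalent to the given successive blurs."""
    return math.sqrt(sum(r * r for r in radii))

# Blurring with radius 6 and then radius 8 is one convolution with this kernel.
two_passes = convolve(gaussian_kernel(6.0), gaussian_kernel(8.0))
measured = spread(two_passes)
predicted = combined_radius(6.0, 8.0)
```

Here `measured` should agree with `predicted` up to the small error introduced by sampling and truncating the kernels.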

Since different edge detectors work better under different conditions, it would be ideal to have an algorithm that makes use of multiple edge detectors, applying each one when the scene conditions best suit its method of detection. We have therefore studied different types of edge detectors to determine which one performs well under which conditions, so that the appropriate algorithm can be chosen for a given situation.

Edge detection algorithms are classified into two categories:
1. Gradient-based edge detection: the gradient method detects edges by looking for the maximum and minimum in the first derivative of the image.
2. Laplacian-based edge detection: the Laplacian method searches for zero crossings in the second derivative of the image to find edges.

Consider the following image, in which a gradual change in intensity depicts the edge.

The following image is the first derivative of the image given above. The derivative shows a maximum located at the center of the edge in the original signal. This method of locating an edge is characteristic of the gradient filter family of edge detection filters. A pixel location is declared an edge location if the value of the gradient exceeds some threshold. In the gradient image, edge pixels have higher values than the pixels surrounding them.

Furthermore, when the first derivative is at a maximum, the second derivative is zero. As a result, another alternative to finding the location of an edge is to locate the zeros in the second derivative. This method is known as the Laplacian and the second derivative of the signal is shown below:
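The relationship between the two methods can also be checked numerically. The following minimal sketch assumes a 1-D signal with a smooth (sigmoid-shaped) edge centred on sample 10 and uses simple finite differences; the signal, the tolerance `EPS` and the function names are hypothetical choices for illustration.

```python
import math

# Assumed test signal: a smooth step from 0 to 1 centred on sample 10.
signal = [1.0 / (1.0 + math.exp(-(i - 10) / 2.0)) for i in range(21)]

def first_derivative(s):
    """Central difference; the two end samples are padded with 0."""
    return [0.0] + [(s[i + 1] - s[i - 1]) / 2.0
                    for i in range(1, len(s) - 1)] + [0.0]

def second_derivative(s):
    """Discrete second difference; the two end samples are padded with 0."""
    return [0.0] + [s[i + 1] - 2.0 * s[i] + s[i - 1]
                    for i in range(1, len(s) - 1)] + [0.0]

d1 = first_derivative(signal)
d2 = second_derivative(signal)

# Gradient method: the edge is where the first derivative is largest.
gradient_edge = max(range(len(d1)), key=lambda i: d1[i])

# Laplacian method: the edge is where the second derivative changes sign.
EPS = 1e-9  # values smaller than this are treated as zero
zero_crossings = [i for i in range(1, len(d2) - 1)
                  if d2[i - 1] > EPS and d2[i + 1] < -EPS]
```

Both methods should point at the same sample, the centre of the edge.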

Edge detection techniques

Sobel operator

The operator calculates the gradient of the image intensity at each point, giving the direction of the largest possible increase from dark to light and the rate of change in that direction. The result therefore shows how "abruptly" or "smoothly" the image changes at that point, and therefore how likely it is that that part of the image represents an edge, as well as how that edge is likely to be oriented. Mathematically, the operator uses two 3×3 kernels which are convolved with the original image to calculate approximations of the derivatives, one for horizontal changes and one for vertical changes. If we define A as the source image, and Gx and Gy as two images which at each point contain the horizontal and vertical derivative approximations, the computations are as follows, where $*$ denotes the two-dimensional convolution operation:

$$G_x = \begin{bmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{bmatrix} * A \qquad G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A$$

The x-coordinate is defined here as increasing in the "right" direction, and the y-coordinate is defined as increasing in the "down" direction. At each point in the image, the resulting gradient approximations can be combined to give the gradient magnitude, using:

$$G = \sqrt{G_x^2 + G_y^2}$$

Using this information, we can also calculate the gradient's direction:

$$\Theta = \operatorname{atan2}(G_y, G_x)$$
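The Sobel computation can be sketched in plain Python as follows. This is an illustrative sketch, not the project's implementation: the kernels are written in correlation form (equivalent to convolving with the same kernels rotated by 180 degrees), the sign convention is assumed to be x increasing to the right and y increasing downwards, border pixels are simply left at zero, and all names are hypothetical.

```python
import math

# Correlation-form Sobel kernels (assumed sign convention, see lead-in).
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def correlate3(image, kernel, y, x):
    """Apply a 3x3 kernel centred on pixel (y, x)."""
    return sum(kernel[j][i] * image[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def sobel(image):
    """Return (magnitude, direction in degrees) grids; borders stay 0."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = correlate3(image, KX, y, x)
            gy = correlate3(image, KY, y, x)
            mag[y][x] = math.hypot(gx, gy)                # sqrt(gx^2 + gy^2)
            ang[y][x] = math.degrees(math.atan2(gy, gx))  # gradient direction
    return mag, ang

# Hypothetical test image: dark on the left, bright on the right.
step = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
magnitude, direction = sobel(step)
```

For this vertical step edge the response is concentrated in the two columns next to the intensity change, and the gradient points along the positive x-axis.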

Canny edge detection

The steps in the Canny edge-detection algorithm are:
1. Smooth the image with a two-dimensional Gaussian. In most cases the direct computation of a two-dimensional Gaussian is costly, so it is carried out as two one-dimensional Gaussians, one in the x direction and the other in the y direction.
2. Take the gradient of the image. This shows changes in intensity, which indicate the presence of edges. This actually gives two results: the gradient in the x direction and the gradient in the y direction.
3. Non-maximal suppression. Edges occur at points where the gradient is at a maximum. Therefore, all points not at a maximum should be suppressed. To do this, the magnitude and direction of the gradient are computed at each pixel. Then, for each pixel, its gradient magnitude is compared with the magnitudes one pixel away in the positive and negative directions along the gradient (that is, perpendicular to the edge). If the pixel's magnitude is not greater than both, it is suppressed.
4. Once the edge direction is known, the next step is to relate the edge direction to a direction that can be traced in an image. Suppose the pixels of a 5x5 image are aligned as follows:

x x x x x
x x x x x
x x a x x
x x x x x
x x x x x

Looking at pixel "a", there are only four possible directions when describing the surrounding pixels: 0 degrees (in the horizontal direction), 45 degrees (along the positive diagonal), 90 degrees (in the vertical direction), or 135 degrees (along the negative diagonal). The edge orientation therefore has to be resolved into one of these four directions, depending on which direction it is closest to (e.g. if the orientation angle is found to be 3 degrees, make it zero degrees). Think of this as taking a semicircle and dividing it into 5 regions, where the two end regions both map to 0 degrees.

Therefore, any edge direction falling within the yellow range (0 to 22.5 and 157.5 to 180 degrees) is set to 0 degrees. Any edge direction falling in the green range (22.5 to 67.5 degrees) is set to 45 degrees. Any edge direction falling in the blue range (67.5 to 112.5 degrees) is set to 90 degrees. Finally, any edge direction falling within the red range (112.5 to 157.5 degrees) is set to 135 degrees.
5. Edge thresholding. The method of thresholding used by the Canny edge detector is referred to as "hysteresis". It makes use of both a high threshold and a low threshold. If a pixel has a value above the high threshold, it is set as an edge pixel. If a pixel has a value above the low threshold and is the neighbor of an edge pixel, it is set as an edge pixel as well. If a pixel has a value above the low threshold but is not the neighbor of an edge pixel, it is not set as an edge pixel. If a pixel has a value below the low threshold, it is never set as an edge pixel.
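The direction quantisation and the hysteresis thresholding described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions: angles exactly on a range boundary are assigned to the higher range, "neighbor" is taken to mean the 8 surrounding pixels, and the function names and the sample grid are hypothetical.

```python
from collections import deque

def quantize_direction(angle_deg):
    """Snap an edge direction to 0, 45, 90 or 135 degrees."""
    a = angle_deg % 180.0  # directions 180 degrees apart are equivalent
    if a < 22.5 or a >= 157.5:
        return 0
    if a < 67.5:
        return 45
    if a < 112.5:
        return 90
    return 135

def hysteresis(magnitude, low, high):
    """Mark strong pixels, then grow edges through connected weak pixels."""
    h, w = len(magnitude), len(magnitude[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if magnitude[y][x] > high:  # above the high threshold: edge
                edges[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edges[ny][nx]
                        and magnitude[ny][nx] > low):
                    # above the low threshold and next to an edge pixel
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges

# Hypothetical gradient magnitudes: one strong pixel, a connected weak run,
# and one isolated weak pixel at the right-hand end of the middle row.
grid = [[0, 0, 0, 0, 0],
        [9, 5, 5, 0, 5],
        [0, 0, 0, 0, 0]]
result = hysteresis(grid, low=4, high=8)
```

In `result`, the weak pixels connected to the strong pixel are kept, while the isolated weak pixel is rejected.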

The Marr-Hildreth Edge Detector

The Marr-Hildreth edge detector was a very popular edge operator before Canny released his paper. It is a Laplacian-based operator: it uses the Laplacian to take the second derivative of an image. The idea is that a step difference in the intensity of the image is represented in the second derivative by a zero crossing. The steps are:
1. Smooth the image using a Gaussian. This smoothing reduces the amount of error due to noise.
2. Apply a two-dimensional Laplacian to the image:

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$

3. Loop through every pixel in the Laplacian of the smoothed image and look for sign changes. If there is a sign change and the slope across this sign change is greater than some threshold, mark this pixel as an edge. Alternatively, you can run these changes in slope through a hysteresis (described in the Canny edge detector) rather than using a simple threshold.
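The three steps above can be sketched in plain Python. This is an illustration only: a separable [1, 2, 1] / 4 binomial blur is substituted for a true Gaussian, the Laplacian is the common 4-neighbour discretisation, sign changes are checked only against the right and lower neighbours, and all names and the test image are hypothetical.

```python
def _clamp(v, hi):
    return max(0, min(hi, v))

def smooth(image):
    """Step 1: separable [1, 2, 1] / 4 blur (stand-in for a Gaussian)."""
    h, w = len(image), len(image[0])
    rows = [[(image[y][_clamp(x - 1, w - 1)] + 2 * image[y][x]
              + image[y][_clamp(x + 1, w - 1)]) / 4.0
             for x in range(w)] for y in range(h)]
    return [[(rows[_clamp(y - 1, h - 1)][x] + 2 * rows[y][x]
              + rows[_clamp(y + 1, h - 1)][x]) / 4.0
             for x in range(w)] for y in range(h)]

def laplacian(image):
    """Step 2: 4-neighbour discrete Laplacian; border pixels stay 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (image[y][x - 1] + image[y][x + 1]
                         + image[y - 1][x] + image[y + 1][x]
                         - 4 * image[y][x])
    return out

def zero_crossings(lap, threshold):
    """Step 3: sign changes whose slope exceeds the threshold."""
    h, w = len(lap), len(lap[0])
    edges = set()
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w:
                    a, b = lap[y][x], lap[ny][nx]
                    if a * b < 0 and abs(a - b) > threshold:
                        edges.add((y, x))
    return edges

def marr_hildreth(image, threshold):
    return zero_crossings(laplacian(smooth(image)), threshold)

# Hypothetical test image: 7 rows, dark left half and bright right half.
step = [[0.0] * 6 + [1.0] * 6 for _ in range(7)]
found = marr_hildreth(step, 0.1)
```

For this image the detector marks a single column of pixels, immediately to the left of the intensity step, in every interior row.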

Comparison

As edge detection is a fundamental step in computer vision, it is necessary to identify the true edges to get the best results from the matching process. That is why it is important to choose the edge detector that best fits the application.

Name: Sobel
Advantages: Simplicity; detection of edges and their orientations.
Disadvantages: Sensitivity to noise; inaccurate.

Name: Canny
Advantages: Using probability for finding error rate; localization and response; improving signal-to-noise ratio; better detection, especially in noisy conditions.
Disadvantages: Complex computations; false zero crossing; time consuming.

Name: Marr-Hildreth
Advantages: Finding the correct places of edges; testing a wider area around the pixel.
Disadvantages: Malfunctioning at corners, curves and where the gray level intensity function varies; not finding the orientation of the edge because of using the Laplacian filter.

VECTOR GRAPHICS

Vector graphics is the use of geometrical primitives such as points, lines, curves, and shapes or polygons, all of which are based on mathematical expressions, to represent images in computer graphics. "Vector", in this context, implies more than a straight line.

Vector graphics is based on images made up of vectors (also called paths, or strokes) which lead through locations called control points. Each of these points has a definite position on the x and y axes of the work plane. Each point also holds a set of data, including the location of the point in the work space and the direction of the vector (which defines the direction of the path). Each path can be assigned a color, a shape, a thickness and a fill. This does not affect the size of the files substantially, because all the information resides in the structure, which describes how to draw the vector.

A vector image is a digital image composed of independent geometric objects (segments, polygons, arcs, etc.), each defined by mathematical attributes of shape, position, color, and so on. For example, a red circle is defined by the position of its center, its radius, its line thickness and its color. This image format is completely different from the raster format, also called matrix images, which is formed of pixels. The main advantage of vector graphics is that an image can be enlarged without suffering the scaling artifacts that affect raster images. Vector images can also be moved, stretched and twisted relatively easily. Their use is also widespread in the generation of three-dimensional images, both dynamic and static.

Each geometric primitive has a number of attributes (position, color, fill). When drawing, the software works with lines (or curves) and surfaces. Each line is defined by characteristic points that determine its equation; these characteristic points form a vector. The equation is calculated by the computer and kept in memory.
For example, a traced ellipse exists in two forms: as a mathematical formula in memory and as a plot on the screen. The path is controlled by the vector, the line characteristics and the color of the surface, so all these parameters remain separately editable. The vector representation introduced the notion of drawing layers. It is possible to superimpose several layers of curves, which is impossible in a bitmap representation, since there is only one layer and each new point overwrites the previous point. This concept reappears in the field of cartography. "Vector" here does not refer to mathematical vectors, but mainly to computer data or instructions that describe graphical attributes. However, such data is often represented as tuples that may indicate vectors, as opposed to a raster image, which is based on pixels.

Bitmap Images

A bitmap is an image made up of tiny squares of colour. The arrangement of these tiny coloured squares produces the effect of an image. This is a good method of reproducing 'continuous tone' images, such as photographs. The amount of detail that can be seen in a picture depends on the resolution of the image, that is, how many times per inch these squares or pixels occur. 300 pixels per inch are needed for good quality reproduction on a commercial printing press, and 72 pixels per inch for monitor display. Bitmaps have two disadvantages. In terms of digital storage, bitmaps are memory intensive: the higher the resolution, the larger the file size. The other disadvantage is that when a bitmap is enlarged, the individual coloured squares become visible and the illusion of a smooth image is lost to the viewer. This 'pixelation' makes the image look coarse.

Vector Images

Scalable vector graphics are very different from bitmaps. Vectors describe the shape of an object as a series of points connected by curved or straight lines, represented as a mathematical formula. These lines may have a thickness or stroke assigned to them, and the object they create can be filled with colour.
The advantages of using vector graphics are a small file size and the ability to scale the image to any size without loss of quality; see the image above. They are ideal for logo designs, as they can be printed very small on business cards or very large on a billboard poster.

Vector Representation

ADVANTAGES:
Data can be represented at its original resolution and form without generalization.
Graphic output is usually more aesthetically pleasing.
Accurate geographic location of data is maintained.
Since most data is in vector form, no data conversion is required.
Allows for efficient encoding of topology and, as a result, more efficient operations that require topological information.

Vector-oriented images are more flexible than bitmaps because they can be resized and stretched. In addition, images stored as vectors look better on devices (monitors and printers) with higher resolution, whereas bit-mapped images always appear the same regardless of a device's resolution. Vector representations of images often require less memory than bit-mapped images do.
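The contrast between the two representations can be made concrete with the red circle example and the resolutions quoted earlier. The sketch below is illustrative only: the `Circle` class, the `bitmap_bytes` helper, the 4 x 6 inch image size and the 3 bytes per pixel (uncompressed 24-bit colour) are assumptions made for this example.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Circle:
    """A vector circle: a handful of numbers, whatever its drawn size."""
    cx: float
    cy: float
    radius: float
    stroke_width: float = 1.0
    color: str = "red"

    def scaled(self, factor):
        """Scaling only multiplies the stored attributes; no detail is lost."""
        return replace(self, cx=self.cx * factor, cy=self.cy * factor,
                       radius=self.radius * factor,
                       stroke_width=self.stroke_width * factor)

    def to_svg(self):
        """Serialise as an SVG circle element."""
        return (f'<circle cx="{self.cx}" cy="{self.cy}" r="{self.radius}" '
                f'stroke="{self.color}" stroke-width="{self.stroke_width}" '
                f'fill="none"/>')

def bitmap_bytes(width_in, height_in, ppi, bytes_per_pixel=3):
    """Uncompressed storage for a bitmap of the given physical size."""
    return round(width_in * ppi) * round(height_in * ppi) * bytes_per_pixel

logo = Circle(10.0, 10.0, 5.0)
billboard = logo.scaled(100)            # same description, 100 times larger

screen_bytes = bitmap_bytes(4, 6, 72)   # 4 x 6 inch image for monitor display
print_bytes = bitmap_bytes(4, 6, 300)   # same image for a printing press
```

The vector description stays a few dozen characters long at any scale, whereas the bitmap's storage grows with the square of the resolution.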
