3D Scanner Using Kinect
In this report, we describe, justify and assess the quality of our design
for a 3D scanner using Microsoft Kinect.
A 3D model of an object can be obtained by taking both color and
depth images from different viewpoints. We developed a processing
pipeline and a novel combination algorithm for this task. In the process,
we had to solve problems involving noise, misalignment between color
and depth, combination of different viewpoints, and meshing. The
final mesh can be converted to a format that can be fed into the
RepRap 3D printer.
We provide justifications for some of our major design choices, such
as the meshing method, the combination algorithm and the alignment
procedure. We also assess the quality of our final design against the
revised requirements and find that it satisfies all of our original requirements.
Appendix A provides details of our combination algorithm, appendix B
gives an overview of software that converts point clouds to meshes,
and appendix C gives an overview of the alignment procedure dealing
with the color/depth mismatch.
We thank Orbis Mutual Funds for sponsoring this course and Prof.
Jason Foster for his enthusiasm and helpful suggestions throughout
this course.
Our code is hosted at http://code.google.com/p/kinnectproj/.
Contents

1 The Design
  1.1 Overview of the design
  1.2 Preprocessing steps
  1.3 Alignment
    1.3.1 The problem
    1.3.2 The solution
  1.4 The point cloud
  1.5 Combination step
  1.6 Mesh
  1.7 RepRap
  1.8 Accessing our code, replicating the results

2 Justifications
  2.1 Using Python as the programming language
    2.1.1 Expressiveness
    2.1.2 Interactivity
    2.1.3 Numerical Package
    2.1.4 Extensibility to GPU
    2.1.5 Familiarity, OpenSource and availability of support
  2.2 Using OpenKinect as driver
  2.3 Meshing
  2.4 Using the PLY file format
  2.5 Combination Method

3 Assessments
  3.1 Assessment against requirements
    3.1.1 The software must be compatible with any Kinect device
    3.1.2 Produce a 360 degrees 3D model
    3.1.3 Display only the scanning object
    3.1.4 Output to standard 3D format
  3.2 Assessment against Criteria
    3.2.1 Technical Criteria
      3.2.1.1 Filter input noise
      3.2.1.2 Adhere to software design pattern
      3.2.1.3 The output format to be supported by other existing applications
      3.2.1.4 Relative accuracy
      3.2.1.5 Handle complex objects
      3.2.1.6 Scan range
    3.2.2 Application Criteria
      3.2.2.1 3D printing
      3.2.2.2 3D Animation
      3.2.2.3 3D live stream
    3.2.3 Economic Criteria

A Combination Algorithm
  A.1 Introduction
    A.1.1 The problem
    A.1.2 The Methods
  A.2 A probabilistic formulation
    A.2.1 Basic formulation
    A.2.2 Our additions to EM
  A.3 The EM approach
  A.4 Some experiments
    A.4.1 A 2D experiment first
    A.4.2 3D reconstruction
  A.5 The Bayesian approach

C Alignment Methods
  C.1 Depth coordinates to world coordinates
  C.2 Color and Depth Mapping

References
Chapter 1
The Design
Figure 1.1: A simple overview of the design; a more detailed flowchart is shown in figure 1.2.
First, color and depth images of the object are taken from different viewpoints. The sensor
images first go through a few preprocessing steps so that they contain less noise than the raw
sensor inputs. The denoised images then go through an alignment procedure
which aligns the depth images with the color images. Each aligned image is converted
to a point cloud in world coordinates. Point clouds from adjacent viewpoints are
combined pairwise, and eventually all point clouds are combined to form a single
point cloud model of the whole object. The point cloud is then converted to a mesh
by sampling and adding edges and faces. Lastly, the RepRap host software can take
the mesh and print the scanned object.
Figure 1.2: Detailed flowchart of the design. Nine frames of color and depth captures from a specific view are robustly averaged and denoised, aligned, converted to point clouds P = {x, y, z, r, g, b} in world coordinates, clipped to a specified cube, and then combined.
Figure 1.3: The sensor input. Left: sensor input before robust averaging. Right: after robust averaging.
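The report leaves the exact robust-averaging scheme to the flowchart of figure 1.2 and to the code; the sketch below shows one plausible version, a per-pixel median over the nine captures that ignores missing readings, and should not be read as the project's exact method.

import numpy as np

def robust_average(depth_frames):
    # depth_frames: nine H x W depth captures of the same viewpoint,
    # where a value of 0 marks an unknown reading (sensor shadow).
    stack = np.stack(depth_frames).astype(float)
    stack[stack == 0] = np.nan                # exclude unknown readings
    averaged = np.nanmedian(stack, axis=0)    # robust per-pixel estimate
    return np.nan_to_num(averaged, nan=0.0)   # keep 0 where no frame had data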
Figure 1.4: Misaligned image. The red arrows mark points that are misaligned by a large amount between the color and depth images [1].
1.3 Alignment
1.3.1 The problem
This problem is due to the way the Kinect sensor works. The Kinect uses different
cameras at different locations for color and depth, and the infrared laser projector
is at yet another location, separate from both the depth and color cameras, so
misalignment is to be expected. See figure 1.4 for a demonstration of this problem [1].
Two sub-problems have to be solved:

• Convert the coordinates of a pixel in the depth image to world coordinates.

• Find the mapping between a pixel in the depth image and the corresponding pixel
in the color image.
1.3.2 The solution

The solutions to these two problems are well studied within the Kinect community,
and the solution used in this project is outlined in greater detail in appendix C.
1.4 The point cloud
The input from the Kinect is a 2D array of depth values and color measurements,
but the output PLY file format, as well as the combination algorithms in the next
section, works on point clouds. It is therefore necessary to convert the 2D arrays
into point cloud form. The conversion is done in the constructor of the point cloud
class. In essence, it goes over the 2D array point by point and applies the
alignment algorithm mentioned in the previous section. The code is optimized for
performance by using matrix operations instead of loops, so the conversion adds
little overhead to the program. After conversion, the data are stored in two lists:
one for vertex coordinates and another for their colors. The two lists share the
same index. The coordinates and colors are kept separate because the combination
algorithms only need the coordinate values; this implementation also makes the
lists easier to manipulate.
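As an illustration, the vectorized conversion can look like the sketch below. The intrinsic constants are placeholders rather than the project's calibrated values (see appendix C), and the function name is ours, not the actual constructor of the point cloud class.

import numpy as np

# Placeholder depth-camera intrinsics; appendix C describes the real calibration.
FX_D, FY_D, CX_D, CY_D = 590.0, 590.0, 320.0, 240.0

def depth_to_pointcloud(depth_m, rgb):
    # depth_m: H x W depth image in meters (0 = unknown); rgb: H x W x 3 colors.
    h, w = depth_m.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    x = (xs - CX_D) * depth_m / FX_D
    y = (ys - CY_D) * depth_m / FY_D
    valid = depth_m > 0                       # drop pixels with unknown depth
    coords = np.stack([x[valid], y[valid], depth_m[valid]], axis=1)  # N x 3 vertices
    colors = rgb[valid]                                              # N x 3 colors
    return coords, colors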
Other than the constructor, three more methods are defined for this class:
addition, clipping, and PLY output. The addition operator is overloaded to
concatenate two objects, which is done by concatenating all internal matrices.
A clipping method is defined to cut off extraneous points around the object:
the input variables define the six planes that form the clipping cube, and the
method deletes all points in the point cloud that fall outside the cube. The code
is written so that clipping one side at a time is also possible. The PLY output
method follows the PLY file format standard by first printing the header and then
a list of all points in the point cloud. The coordinates are scaled by a factor of
1000 on output, because the internal unit used in this project is the meter, while
the RepRap software used for 3D printing assumes all coordinates are in
millimeters [2]. The PLY format specification is unit-less (any unit can be used
for the coordinates), so this scaling does not affect the portability of the output
PLY file.
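A minimal sketch of the clipping and PLY-output behavior described above; the function names and signatures are illustrative, not the actual methods of the point cloud class.

import numpy as np

def clip(coords, colors, xmin, xmax, ymin, ymax, zmin, zmax):
    # Keep only the points inside the box defined by the six planes.
    keep = ((coords[:, 0] >= xmin) & (coords[:, 0] <= xmax) &
            (coords[:, 1] >= ymin) & (coords[:, 1] <= ymax) &
            (coords[:, 2] >= zmin) & (coords[:, 2] <= zmax))
    return coords[keep], colors[keep]

def write_ply(path, coords, colors, scale=1000.0):
    # ASCII PLY: header first, then one vertex per line; coordinates are
    # scaled from meters to millimeters as described above.
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write("element vertex %d\n" % len(coords))
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for (x, y, z), (r, g, b) in zip(coords * scale, colors.astype(np.uint8)):
            f.write("%f %f %f %d %d %d\n" % (x, y, z, r, g, b))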
1.5 Combination step

In this step we extract the rotation matrix R ∈ R^{3×3} and the translation vector
t ∈ R^3 relating each pair of adjacent point clouds. The detailed algorithm, based
on [3] and [4], is described in appendix A. After all these transformations are
extracted, each point cloud is modified by its respective transformation, and all
point clouds are combined into a single point cloud in world coordinates.
The combination algorithm performs simulated annealing on the variance σ² as
EM steps are taken: with a larger variance the algorithm looks for rough matches,
and with a smaller variance, finer matches. A uniform component is also added
to the Gaussian mixture of [3] to deal with outliers, and the process stops when
20% of the data are explained by the uniform component. For more details on the
algorithm we developed, see appendix A. We also tried a Bayesian model for this
task, also described in appendix A, which unfortunately does not meet our
performance requirements when Markov chain Monte Carlo is used for inference.
There is one last hack used for combination. Because we combine pairwise, errors
accumulate, so the last frame does not necessarily wrap back to the first frame.
We therefore introduce an ad hoc iterative procedure that forces all rotation
matrices to multiply to the identity. Let

$$P = R_1 R_2 \cdots R_n.$$

1. Set $P_i = R_{i-1}'^{-1} R_{i-2}'^{-1} \cdots R_1'^{-1}$ and $Q_i = R_i R_{i+1} \cdots R_n$.
1.6 Mesh
Existing 3D applications often support 3D models in mesh format, so we use the
third-party program MeshLab to convert the point cloud model into a triangular
mesh model.
We first subsample the point cloud using the Poisson Sampling algorithm.
This process forces all vertices to be uniformly distributed, while also eliminating
the noisy data points. Next, we apply the Poisson Surface Reconstruction operation
to construct the 3D mesh. It is worth noting that the resulting mesh does not
include color information at this point. In the final step, we run the Vertex
Attribute Transfer operation to transfer the color information from the original
point cloud onto the new mesh model; the transfer algorithm uses a simple
closest-point heuristic to match points between the two models. Moreover,
MeshLab allows users to export all the operations described above into a
single script file (.mlx), which can then be invoked from a shell script adhering
to the specifications of MeshLabServer [5].
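For example, the exported script can be driven from Python roughly as follows. The file names are illustrative, and the flags are the standard meshlabserver options (-i input, -o output, -s filter script, -om output mask), which may differ slightly between MeshLab versions.

import subprocess

# Apply the exported .mlx filter script to the point cloud and write the
# resulting mesh, asking for vertex colors (vc) in the output.
subprocess.call([
    "meshlabserver",
    "-i", "model_pointcloud.ply",
    "-o", "model_mesh.ply",
    "-s", "pointcloud_to_mesh.mlx",
    "-om", "vc",
])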
See appendix B for a detailed comparison of alternative meshing software.
1.7 RepRap
Although RepRap indicates PLY as a recommended file format on its official
site [6], the actual software can only read STL files and the RepRap-specific
(non-portable) RFO format. The STL file must be in binary form (the ASCII format
will not load properly) and must use millimeters as its unit [2]. Our PLY output
is already in millimeters, so the only remaining step is converting from PLY to
STL; MeshLab is used for this simply because the previous steps already rely on
it. Note that STL does not support color, so the color information is lost (the
RepRap printer cannot print in color anyway).

After the file has been converted to STL format, it can be loaded into the
RepRap host program and used to build the 3D object on the machine; see [7] for
how to use the RepRap host program to print parts. Printing is time consuming:
printing a toy bear, for example, takes about 11 hours.
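The PLY-to-STL conversion can be scripted in the same way; a sketch with illustrative file names is shown below. Whether meshlabserver writes binary STL by default can depend on the MeshLab version, so the result is worth checking against the RepRap host.

import subprocess

# Convert the final mesh from PLY to STL for the RepRap host software.
subprocess.call(["meshlabserver", "-i", "model_mesh.ply", "-o", "model_mesh.stl"])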
1.8 Accessing our code, replicating the results
First, you would need the Mercurial source control system to access the up-to-
date code. The code is hosted at http://code.google.com/p/kinnectproj/.
A few original datasets and results can be found in the Downloads section.
If you would like to become a contributor, email one of us; our email addresses
can be found on the project page.
Chapter 2
Justifications
2.1.2 Interactivity
Since Python is an interpreted language, we are able to run our programs inter-
actively, with the freedom of making modifications while running. This was a
very helpful and commonly used technique during the design process. A common
scenario: we keep the captured data and intermediate results loaded in an
interactive session, modify the processing code, and re-run only the later steps.
This would not be possible with a compiled language unless we saved all our
intermediate results to files, which introduces considerable overhead.
• Python wrapper
• Control the tilting motor
• Run on Linux
Both drivers meet the requirements listed above; however, at the time of our
initial research, a Python wrapper was not available for OpenNI. In terms of
functionality, OpenNI provides a rich set of middleware features, such as full-body
analysis, gesture detection and hand-point analysis, that are outside the scope
of this project. Although a rich set of external libraries can be helpful, it
also affects the portability of the final software. Overall, libfreenect was
chosen over OpenNI for its simple design and its support for Python [10; 11].
2.3 Meshing
The general problem of converting a point cloud to a mesh model is thoroughly
studied in the field of computer graphics [12; 13], and a number of third-party
programs are available that we could use to perform this operation. To choose
the application most suitable for this project, we used the following requirements
and criteria in our selection process:

• Performance: It is critical that the final mesh closely resemble the input
point cloud in terms of both physical dimensions and color. Moreover, the point
cloud contains some noisy data, such as overlapping layers, that the software
should tolerate or ignore.

• Time: The whole process should be fast and comparable with the timing of the
other components; we don't want this step to be the bottleneck of the overall flow.

• Cost: The whole aim of this project is to make a cheap 3D scanner. Hence, if
free open-source software can handle the job reasonably well, we don't want to
waste money on commercial software.
• Automation: The whole process should be automated. Ideally, we want to write a
shell script to execute the whole operation: the script specifies the input file
in point cloud form, and the output should be a 3D model file in the corresponding
mesh form.

• Platform: So far, all of the implementation has been done on Ubuntu Linux
machines. Hence, Linux-based programs are preferred.
See appendix B for a detailed comparison of the different meshing programs.
MeshLab was chosen as the best fit for this project.
information it supports is beyond the scope of this project. Since PLY already
provides all the required information, there is no need to use the COLLADA format,
which would make the programming work much harder to implement. As a result, the
PLY file format was chosen as the official output format [16].
Chapter 3
Assessments
In this section, we will use the set of requirements and criteria outlined in the
Revised Requirements document as the metrics to evaluate the quality of our
project from different aspects.
3.1.3 Display only the scanning object
To clip unwanted scenery away from the object, we construct a clipping cube based
on a set of parameters; all surroundings outside of the cube are ignored. This
means that the user has to estimate both the size of the object and its distance
from the Kinect before performing the clipping operation. The parameters can be
reused if all scanned objects are of similar size and the distance between the
Kinect and the object is kept relatively constant.
To reduce the amount of noise, a number of measures have been taken. First, we
use the robust weighted averaging method to obtain a more stable depth image.
In addition, the clipping process removes the surroundings from the real object.
The merging heuristics also assume a certain proportion of outliers (20%).
Finally, the Poisson Sampling algorithm enforces an even vertex distribution
throughout the whole model. Overall, as illustrated in the figures shown
previously, the final output model does an excellent job of filtering out noise.
Color mapping: In general, the color mapping error is relatively small, within a
few pixels. The main bottleneck for this criterion is the Kinect itself, because
its color images have a resolution of only 640x480 pixels. At the moment, the
color coating can be blurry on 3D models; however, we expect the color quality to
improve if more data points are provided by the Kinect.

Computation time: The major bottleneck in terms of computation time is the merge
component. The algorithm has complexity O(N²); with a sample size of 1000 points,
this translates to a running time of roughly 10 minutes. Depending on the
complexity of the object, 1000 sample points may be enough for simple objects
such as a box, but not sufficient for more sophisticated objects such as a real
person. Hence, the computation time is acceptable for the purpose of 3D printing;
however, the speed is insufficient for fast-response applications such as 3D
video streaming.
The entire program is composed of Python code, shell scripts, and MeshLab scripts.
During the implementation stage, we tried our best to comment all changes and to
avoid hardcoded values, in order to deliver a generic implementation. Moreover,
there is an online code repository that all team members share [18]. This allows
all members to synchronize to the latest work and also provides a safety mechanism
by backing up all code to a cloud server.
The relative accuracy between the real object and its 3D model is within a few
centimeters. This variation is mainly due to rounding errors when performing
the point cloud to mesh conversion.
The objects tested so far are a drawer, a soy sauce bottle, a box, a bear, and a
real person. The outcome is relatively accurate for the first four objects, and
the algorithm performs well on both convex and concave objects. The soy sauce
bottle has a hole around the handle; this void space was also reflected in the output
model. In the case of a real person, although the general 3D shape resembled the
actual person, the color mapping was poorly constructed. However, since the person
was rotating himself in order to capture image frames from all 360 degrees, the
input data are less accurate than for the static objects. Overall, we believe the
scanner is fairly accurate for household items.
The software doesn't impose any further physical restrictions on the scanning
range; the range depends entirely on the camera limitations of the Kinect.
3.2.2.1 3D printing
3.2.2.2 3D Animation
This application requires only low relative accuracy and places no specific
constraint on the computation time, so it is a good match for the current state
of the project.
3.2.2.3 3D live stream

While accuracy is not very important in live streaming, the output has to be
delivered nearly instantaneously. The current running time is over 10 minutes;
however, it should be possible to improve performance dramatically with the help
of GPUs with 100+ cores.
3.2.3 Economic Criteria
The major objective of this project is to build a 3D scanner using the Kinect as
cheaply as possible. A brand-new Kinect costs $149.99 at any local Best Buy
store [19]. Initially, we planned to build a rotation apparatus on which the
scanned object could be placed and rotated. However, the merging algorithm doesn't
require all captured frames to share the same rotation axis, so there is no need
for such an apparatus. Moreover, both OpenKinect and MeshLab are open-source
projects that are freely available to the general public. As a result, the total
cost of building our Kinect 3D scanner is the cost of the Kinect itself.
Appendix A
Combination Algorithm
A.1 Introduction
In this appendix, readers are assumed to be familiar with basic mixture models
and the EM algorithm; see the relevant chapters of Bishop's text [20].
Figure A.1: The sensor input. Top: the depth map; whiter means farther away, and
pure black means the value is unknown due to sensor limitations (the shadow of the
projected light used for depth detection; the Kinect uses a structured-light method
for depth capture). Bottom: the corresponding RGB image.
Figure A.2: Point clouds with color, seen from two different viewpoints of the
above capture. These are obtained from the data shown in figure A.1 after the
alignment and other processing steps; the process is straightforward but takes
some work.
A.1.2 The Methods
We briefly outline our approaches here, with many more details in the sections
that follow. The problem can be formulated probabilistically, although we realize
the formulation is not the most natural one. Also note that colors and other
attributes are completely ignored, so we only deal with shapes here. One approach
is EM; the other is MCMC on a Bayesian model.
Basically, we could treat each point in some point cloud as a mixture component
in a Gaussian mixture model (GMM) with mean equal to the location of that point
and some variance σ². Then we can try to maximize the probability of this GMM
generating an adjacent point cloud as data. Notice that each point cloud is rigid,
so the means and variances cannot be changed freely as in the regular EM method
for GMMs. Instead, we rotate and translate the means of all mixture components by
a single rotation matrix R ∈ R^{3×3} and translation vector t ∈ R^3. To a first
approximation, if we maximize the likelihood of generating an adjacent point cloud
with respect to R and t, then the maximizing R and t can be treated as the
solution. Similarly, a vague prior can be added to R and t, since the object might
be rotated in a somewhat regular way, and we may incorporate this prior belief
into the framework. Slightly more complexity arises from having to deal with
systematic outliers, since adjacent point clouds are not completely the same but
only have considerable overlap (say 80%). Note that outliers here include both
measurement errors and the systematic outliers due to non-overlapping parts.
We treat the points in point cloud P as data, and the points in point cloud Q as
mixture components.
Let Q' = {q'_1, q'_2, ..., q'_N}, where q'_i = R q_i + t. The task is now to find
the rotation matrix R ∈ R^{3×3} and translation vector t ∈ R^3 that maximize the
likelihood of generating the data points P under Gaussian mixture components with
means Q' and variance σ². For this task it is sensible to use the same variance in
all directions and for all components, or at least the same prior on the variance
for all components, and we also use the same mixing proportions. Think of it as a
"cloud" of uniform thickness whose shape should match another cloud; the variance
should not vary from component to component as in usual GMMs.
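For concreteness, with equal mixing proportions and the uniform outlier component described in section 1.5 (weight w over a support of volume V), the objective of this basic formulation can be written as below. This is our reconstruction in the style of [3; 4], not a formula quoted from those papers.

$$ \max_{R,\,t} \; \sum_{i=1}^{M} \log\!\left( \frac{1-w}{N} \sum_{j=1}^{N} \mathcal{N}\!\left(p_i \mid R q_j + t,\; \sigma^2 I\right) + \frac{w}{V} \right). $$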
The basic framework is presented in the coherent point drift paper [4], and an EM
algorithm specific to this application is presented in [3]. The EM algorithm we
use also performs annealing and accounts for outliers, which are not in the second
paper. We refer to these two papers rather than providing excessive detail on the
EM approach, except for a few additions we made that are not mentioned there.
• E-step: Evaluate the M × N responsibility matrix, whose entry (i, j) is the
responsibility of mixture component j for data point i.

• Annealing step: σ ← σ × α, with α < 1, applied after each EM sweep.
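The sketch below shows what one such EM iteration can look like in NumPy, using the standard closed-form rigid update (weighted centroids, cross-covariance, SVD). It follows the spirit of [3; 4] but is our own simplified rendering, with the uniform-outlier handling reduced to a single weight w.

import numpy as np

def em_iteration(P, Q, R, t, sigma2, w=0.2, volume=1.0):
    # P: M x 3 data points; Q: N x 3 mixture-component locations;
    # (R, t): current rigid transform; sigma2: current (annealed) variance;
    # w: weight of the uniform outlier component over a region of volume `volume`.
    M, N = len(P), len(Q)
    Qp = Q @ R.T + t                                      # transformed means Q'
    d2 = ((P[:, None, :] - Qp[None, :, :]) ** 2).sum(-1)  # M x N squared distances
    gauss = np.exp(-d2 / (2.0 * sigma2)) / ((2 * np.pi * sigma2) ** 1.5 * N)
    W = (1.0 - w) * gauss
    W /= W.sum(axis=1, keepdims=True) + w / volume        # M x N responsibilities

    # M-step: closed-form rigid update from the weighted cross-covariance.
    Np = W.sum()
    mu_p = W.sum(axis=1) @ P / Np                         # weighted data centroid
    mu_q = W.sum(axis=0) @ Q / Np                         # weighted component centroid
    A = (P - mu_p).T @ W @ (Q - mu_q)                     # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(A)
    C = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])        # enforce a proper rotation
    R_new = U @ C @ Vt
    t_new = mu_p - mu_q @ R_new.T                         # so that R q + t matches p
    return R_new, t_new, W

After each iteration the variance is annealed (sigma2 is multiplied by α² with α < 1), and the loop stops once roughly 20% of the data points are best explained by the uniform component, as described in section 1.5.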
Figure A.3: Matching cubic curves. The green scatter shows the original
mixture-component locations, the red scatter shows the data locations, and the
blue scatter shows the new mixture-component locations.
Figure A.4: Matching cubic curves. The green scatter shows the original
mixture-component locations, the red scatter shows the data locations, and the
blue scatter shows the new mixture-component locations.
A.4.2 3D reconstruction
We then sample points and apply this method to the 3D soy sauce dataset. The
combination result is shown in figure A.5.

Figure A.5: Matching real point clouds. Left: the combined point cloud, with many
visible outliers. Middle: the top view. Right: the result after meshing the point
clouds.
A.5.1 The model
Recall that we have two point clouds, P = {p_1, p_2, ..., p_M} and
Q = {q_1, q_2, ..., q_N}. Set Q' = {q'_1, q'_2, ..., q'_N}, where q'_i = R q_i + t.
We treat P as the data point cloud and Q as the mixture-component point cloud.
The mixing proportions are constants: they sum to Π = (m + a)/(M + a + b) and are
Π/M each, where m = Σ_i o_i and a, b are the constants of the Beta prior on the
outlier indicators. Once a point is labeled an outlier, it comes from a uniform
component with mixing proportion 1 − Π. Alternatively, we may also use a GMM with
larger variance to allow for softer assignments.
So, the simpler model (model A) is as follows:

$$p_i \sim \begin{cases} \mathrm{GMM}(Q', \sigma_0^2) & o_i = 0 \\ \mathrm{Uniform}(-l, l) & o_i = 1 \end{cases}$$

$$o_i \mid \theta \sim \mathrm{Bernoulli}(\theta), \qquad \theta \sim \mathrm{Beta}(a = 4,\, b = 16), \qquad \log(\sigma_0) \sim \mathcal{N}(-4,\, 0.1),$$

$$R \sim \mathrm{Uniform}(\text{rotation matrices}), \qquad t \sim \mathrm{Uniform}(-l, l).$$
The alternative, softer model replaces the uniform outlier component by a Gaussian
mixture with a larger variance and puts informative priors on R and t:

$$p_i \sim \begin{cases} \mathrm{GMM}(Q', \sigma_0^2) & o_i = 0 \\ \mathrm{GMM}(Q', \sigma_1^2) & o_i = 1 \end{cases}$$

$$o_i \mid \theta \sim \mathrm{Bernoulli}(\theta), \qquad \theta \sim \mathrm{Beta}(a = 4,\, b = 16), \qquad \log(\sigma_0) \sim \mathcal{N}(-4,\, 0.1), \qquad \log(\sigma_1) \sim \mathcal{N}(-2,\, 1),$$

$$R \sim \mathcal{N}(R_0, \sigma_R) \ \text{(restricted to rotation matrices)}, \qquad t \sim \mathcal{N}(t_0, \sigma_t).$$
Problem-domain knowledge is built into the prior. In this case, the objects we are
interested in have a radius of roughly a decimeter, so E[log(σ_1)] = −2. The
feature size that should be matched is roughly a centimeter or even sub-centimeter,
so E[log(σ_0)] = −4 with a small variance. These values reflect our prior beliefs,
but specifying the prior means of σ_0 and σ_1 in the log domain is much less
arbitrary than specifying σ_0 and σ_1 themselves. We use Uniform(−l, l) to mean a
uniform distribution that ranges from −l to l in every dimension.
In the simple case, R and t can be assumed to have uniform distributions over
their domains. Since R is a rotation matrix, it really has only 3 degrees of
freedom rather than the 9 entries of the matrix. We first take a random-walk step
R' = R + ε, then compute the SVD R' = UCVᵀ, and finally take the new R to be UVᵀ.
For the actual application, we may have a rather good idea of what R and t should
be even though we do not know them exactly, and this can be incorporated into the
priors for R and t. This is another attraction of the Bayesian approach, as such
information is important and cannot be incorporated as naturally into the EM
algorithm.
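A small sketch of this proposal step is given below; the determinant check is our addition to guard against reflections and is not mentioned in the text.

import numpy as np

def propose_rotation(R, step=0.05, rng=np.random):
    # Random-walk proposal on rotation matrices: perturb R entrywise, then
    # project back to the nearest rotation via the SVD, as described above.
    U, _, Vt = np.linalg.svd(R + step * rng.standard_normal((3, 3)))
    R_new = U @ Vt
    if np.linalg.det(R_new) < 0:      # flip one axis to keep det(R) = +1
        U[:, -1] *= -1
        R_new = U @ Vt
    return R_new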
In this project we only use the simple model above, with a proposal that flips
the values of the outlier indicators.
Appendix B
B.1 CloudMesh-0.1x
CloudMesh is an open-source program that is readily available on SourceForge [21],
and we were able to download the full source code. Unfortunately, the program was
developed using Visual Studio, which means it can only run natively on Windows
machines. Moreover, the program only supports the .off file format, which contains
3D vertices with no colors. Lastly, its performance is weak on noisy point clouds:
as shown in figure B.1, the exterior surface of the soy sauce bottle is very rough,
and although there should be an empty gap around the handle, the gap is filled
with undesired mesh.
Figure B.1: The mesh model of the soy sauce bottle using CloudMesh.
It is worth noting that the resulting mesh does not include color information at
this point. In the final step, we run the Vertex Attribute Transfer operation to
transfer the color information from the original point cloud onto the new mesh
model. The transfer algorithm uses a simple closest point heuristic to match the
points between the two models.
Moreover, MeshLab also allows users to export all the operations described above
into a single script file (.mlx). The script can be invoked from a shell script
adhering to the specifications of MeshLabServer [10].
Overall, we believe MeshLab is the perfect software to use for this project, as
it fulfills all the requirements that we discussed.
Appendix C
Alignment Methods
The alignment procedure solves two sub-problems:

• Convert the coordinates of a pixel in the depth image to world coordinates.

• Find the mapping between a pixel in the depth image and the corresponding pixel
in the color image.
Figure C.1: Greyscale image of a depth array
by us; however, we did run a series of sample tests, and our findings closely
matched the claims. Once the z value of a point in world coordinates is known,
finding its x and y values is just a matter of image back-projection using the
formulas listed below:
P3D.x = (x_d - cx_d) * P3D.z / fx_d
P3D.y = (y_d - cy_d) * P3D.z / fy_d
P3D.z = depth(x_d, y_d)
A point cloud is a set of vertices in a three-dimensional coordinate system. If
we take each pixel of the depth image and convert it to its corresponding world
coordinate, the point cloud is thus constructed.
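For reference, the widely circulated inverse-linear fit for converting a raw 11-bit Kinect depth reading into meters has the shape shown below. The coefficients are the commonly quoted community values, given only for illustration; they are not necessarily the constants of figure C.3.

import numpy as np

def raw_depth_to_meters(raw):
    # raw: array of 11-bit Kinect depth readings; 2047 means "no reading".
    raw = np.asarray(raw, dtype=float)
    meters = 1.0 / (raw * -0.0030711016 + 3.3309495161)
    meters[raw >= 2047] = 0.0         # mark unknown readings with zero depth
    return meters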
Figure C.2: Relationship between depth measurements and inverse distances
As illustrated in figure C.4, the color image and the depth image are taken
simultaneously. We can choose the four corners of the checkerboard as feature
points (marked with red arrows) to analyze the mapping relationship. Contrary to
what one might expect, the pixel-to-pixel mapping is non-linear: the displacement
between the color camera and the depth camera implies a rotation and translation
in 3D, followed by a perspective projection into the color image. Here are the
formulas used in our implementation [1]:
P3D' = R . P3D + T
P2D_rgb.x = (P3D'.x * fx_rgb / P3D'.z) + cx_rgb
P2D_rgb.y = (P3D'.y * fy_rgb / P3D'.z) + cy_rgb
R and T are the rotation and translation between the depth and color cameras,
while fx_rgb, fy_rgb, cx_rgb and cy_rgb are intrinsic values of the Kinect's color
camera. Nicolas Burrus, a PhD student in computer vision, did significant work to
derive these constants. We took the values he proposed and ran a number of sample
tests with different objects; the formulas work well, with only small deviations,
and we adjusted some of the values slightly to obtain a better fit for our own
Kinect.

Figure C.3: Constants used for the conversion to world coordinates

Figure C.4: The same checkerboard in both the depth image and the color image
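As a concrete illustration, the mapping can be vectorized as in the sketch below; the extrinsic and intrinsic values are placeholders, not the calibrated constants of figure C.3.

import numpy as np

# Placeholder calibration: rotation/translation between the depth and color
# cameras, and color-camera intrinsics (the project uses adjusted values
# derived from Nicolas Burrus' calibration).
R_EXTR = np.eye(3)
T_EXTR = np.array([0.025, 0.0, 0.0])      # meters
FX_RGB, FY_RGB, CX_RGB, CY_RGB = 520.0, 520.0, 320.0, 240.0

def world_to_rgb_pixel(points_3d):
    # points_3d: N x 3 points in the depth camera's world coordinates (meters).
    p = points_3d @ R_EXTR.T + T_EXTR     # P3D' = R . P3D + T
    u = p[:, 0] * FX_RGB / p[:, 2] + CX_RGB
    v = p[:, 1] * FY_RGB / p[:, 2] + CY_RGB
    return np.stack([u, v], axis=1)       # pixel coordinates in the color image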
References
[3] Y. Cui, S. Schuon, D. Chan, S. Thrun, and C. Theobalt, "3D shape scanning with a time-of-flight camera," in CVPR, pp. 1173–1180, IEEE, 2010.

[8] T. Tieleman, "Gnumpy: an easy way to use GPU boards in Python," Tech. Rep. UTML TR 2010-002, University of Toronto, Department of Computer Science, 2010.