CEAI, Vol.15, No.1 pp. 45-52, 2013
Printed in Romania
GPGPU Based Non-photorealistic Rendering of Volume Data
Anca Morar, Florica Moldoveanu, Victor Asavei, Lucian Petrescu, Alin Moldoveanu, Alexandru Egner
The Faculty of Automatic Control and Computers, University “POLITEHNICA” of Bucharest, Romania,
(e-mail:{anca.morar, florica.moldoveanu, victor.asavei,
lucian.petrescu, alin.moldoveanu, alexandru.egner}@cs.pub.ro)
Abstract: Nowadays, non-photorealistic volume rendering has become a useful technique in medicine
and scientific visualization. One of these rendering techniques is silhouette extraction of iso-surfaces.
This paper proposes three methods of extracting silhouettes from relatively large datasets very fast (in
some cases, even in real time), using GPGPU technology. These methods are suitable for different
types of datasets, applications and hardware characteristics. The first method extracts the iso-surface and
then computes its silhouette. The second one extracts only the silhouette and computes the visibility of
each contour vertex using an algorithm inspired by ray casting. The third method uses a CUDA rasterizer
in order to render iso-surfaces and silhouettes from large datasets.
Keywords: Computer vision, Parallel algorithms, Non-photorealistic volume rendering, Marching cubes,
Geometry shader
1. INTRODUCTION
Volumetric visualization has become a widely used technique
in many domains, especially in medicine, for diagnosis. The
datasets acquired from medical devices (CT or MRI devices),
in the form of stacks of slices representing sections through
the scanned body, are processed in order to obtain different
interpretations of the scanned objects.
In addition to the usual volume rendering techniques (e.g.
direct volume rendering, MIP, indirect volume rendering), a
series of new illustrative visualization methods have been
designed in order to highlight different important
characteristics in a volume dataset.
The non-photorealistic rendering methods presented in this
paper are based on the marching cubes algorithm by
(Lorensen and Cline 1987), and the ideas described in
(Burns et al. 2005). The applications render the silhouette of
an iso-surface extracted from a volume dataset, depending on
the viewer’s position, as shown in Figure 1. The visualization
of a body part’s silhouette is very important in medicine,
because it concentrates the attention on the salient elements
of the scanned object, and not on the texture, colour, or lighting of the scene, which can in some cases overly complicate the understanding of the object’s medical characteristics. Although medicine is the most significant domain where volume rendering (and, in this case, non-photorealistic volume rendering) is useful, there are other fields, like education (biology or medical training) and scientific visualization, where this technology leads to very interesting results.
The next section presents the state of the art in this domain, in order to understand the challenges of volume rendering. The third section describes the marching cubes algorithm with CUDA and the silhouette extraction technique by (Burns et al. 2005) that inspired the research presented in this article. The fourth section describes the algorithms behind the silhouette rendering methods. The fifth section is dedicated to presenting some results regarding computing times and visual effects. The final section points out several conclusions.
Fig. 1. Non-photorealistic rendering of femoral and pelvic
bones from a CT dataset of size 512³, using marching cubes
and geometry shader. The dataset was obtained from
Floreasca Emergency Hospital, Bucharest.
All the tests presented in the following chapters were made
on a GT 230M GPU with 1GB GPU RAM.
2. STATE OF THE ART
One of the most popular techniques for volume rendering,
marching cubes, by (Lorensen and Cline 1987), approximates
an iso-surface through a triangular mesh, by computing its
intersection with the volume cells or voxels. This method
obtains the geometry of the iso-surface, unlike volume ray casting, described by (Levoy 1990), which casts rays from the eye through each image pixel into the volume, sampling it at certain intervals and compositing the colours and opacities for every sample.
Some non-photorealistic volume rendering techniques use
iso-surface reconstruction. (Pelt et al. 2008) propose an
interactive GPU illustrative framework based on a particle
system that is created using a marching cubes-like
approach. (Dong et al. 2003) introduce volumetric hatching,
a technique that produces pen-and-ink style images from
medical volume data by generating iso-surfaces from the data
with marching cubes and applying standard surface-hatching
techniques. (Yuan and Chen 2004) illustrate surfaces by
drawing feature lines, such as silhouettes, valleys, ridges and
surface hatching strokes. This approach first extracts a point-based surface model, described by (Co et al. 2003), and then
renders the volume, embedding surfaces from the first step.
Other methods, based on the ray casting algorithm, modify
the transfer function in order to obtain non-photorealistic
renderings. (Luo et al. 2009) present an illustrative technique
for focusing on a user-driven region of interest with a
distance function which controls the opacity of the voxels.
(Kindlmann et al. 2003) advance the use of curvature
information in multi-dimensional transfer functions, bringing
contributions to three different application areas: non-photorealistic volume rendering, surface smoothing via
anisotropic diffusion and visualization of iso-surface
uncertainty. (Bruckner and Gröller 2007) propose the concept
of style transfer functions that enables flexible data-driven
illumination which goes beyond using the transfer function
only for assigning colours and opacities. (Hadwiger et al.
2005) describe a real-time rendering pipeline for implicit
surfaces defined by a volumetric grid using a ray-casting
approach and local shape descriptors.
(Burns et al. 2005) render silhouettes from volume data with a
voxel-based marching lines technique that traces contours.
This method is described in more detail in the next section.
3. RELATED WORK
This section focuses on the previous techniques that represent the basis for the new GPGPU silhouette rendering methods.
3.1. Silhouette extraction from volume data
In their paper, (Burns et al. 2005) propose a CPU method for
extracting silhouettes and suggestive contours. They extract
the silhouette as the set of points on the iso-surface where the normal is perpendicular to the view direction (N · V = 0).
Figure 2 presents an example of the silhouette extraction
algorithm for a single volume voxel.
Based on the scalar values of the voxel’s vertices (A0, A2 and A3 are outside the iso-surface because their scalar values a0, a2, a3 < iso-value, and A1 is inside the iso-surface because a1 > iso-value), a triangle is generated, with vertices B0, B1 and B2 and their normal vectors N0, N1 and N2.
Let V = eye − B be the view vector for vertex B. The dot product d = N · V is then computed for every triangle vertex. If the dot products have different signs (for example, d0 and d1 are positive and d2 is negative), there is a silhouette segment ([C0C1]) inside the triangle. The endpoints of the silhouette segment are determined using linear interpolation from the triangle vertices, based on d0, d1 and d2.
Fig. 2. Extraction of a silhouette segment from an iso-surface,
based on its intersection with a volume voxel.
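To make this computation concrete, the following CUDA-style device function sketches the extraction for a single triangle. It is a minimal illustration under the notation above, not the code of (Burns et al. 2005); the Segment type and the float3 helpers are assumptions written out for self-containedness.

// Sketch of the silhouette segment extraction from Figure 2 (illustrative).
// B[i] are the triangle vertices, N[i] their interpolated normal (gradient)
// vectors, eye the viewer position.
struct Segment { float3 c0, c1; bool valid; };

__device__ float dot3(float3 a, float3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

__device__ float3 mix3(float3 a, float3 b, float t) {
    return make_float3(a.x + t * (b.x - a.x),
                       a.y + t * (b.y - a.y),
                       a.z + t * (b.z - a.z));
}

__device__ Segment extractSegment(const float3 B[3], const float3 N[3],
                                  float3 eye)
{
    float d[3];
    for (int i = 0; i < 3; ++i) {
        float3 V = make_float3(eye.x - B[i].x, eye.y - B[i].y, eye.z - B[i].z);
        d[i] = dot3(N[i], V);                  // d_i = N_i · V_i
    }
    Segment s; s.valid = false;
    float3 c[2]; int k = 0;
    for (int i = 0; i < 3 && k < 2; ++i) {     // walk the three edges
        int j = (i + 1) % 3;
        if (d[i] * d[j] < 0.0f) {              // sign change along edge (i, j)
            float t = d[i] / (d[i] - d[j]);    // linear zero crossing of d
            c[k++] = mix3(B[i], B[j], t);
        }
    }
    if (k == 2) { s.c0 = c[0]; s.c1 = c[1]; s.valid = true; }
    return s;
}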
After obtaining the lines from the volume, a visibility test is performed by casting rays from each silhouette segment
endpoint to the eye. For each volume voxel intersected by a
ray, they determine whether it intersects the iso-surface in
that cell by examining the intersection with the face by which
it leaves the cell. At that intersection, they use linear
interpolation from the four corners to determine the function
value and check whether it has changed sign (indicating that the ray has passed from outside to inside the iso-surface). If it has, the originating vertex is marked as occluded.
3.2. GPGPU parallelism
This sub-section describes several characteristics of the
GPGPU paradigm, as presented in the CUDA Programming
Guide from Nvidia (2011a). GPGPU refers to using the GPU
for other computations besides graphics rendering.
CUDA is a parallel computing architecture that leverages the
parallel compute engine in Nvidia GPUs to solve complex
computational problems in a more efficient way than on the
CPU. At the core of CUDA there are three main abstractions
that are exposed to the programmers as a set of language
extensions: a hierarchy of threads, shared memory and barrier
synchronization.
CUDA C extends the well-known C programming language
by allowing the definition of kernels that are executed in
parallel by different threads, unlike regular C functions, which are executed only once per call. The threads are organized into
blocks and the blocks into grids, allowing programmers to
organize the problems depending on the size of the data to be
handled.
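As a minimal illustration of this hierarchy (a generic sketch, not code from the presented methods), the kernel below processes one data element per thread and is launched with enough blocks to cover the whole dataset:

// Each thread handles one element; blockIdx, blockDim and threadIdx
// together give the global index into the data.
__global__ void scaleData(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                     // guard: the grid may overshoot n
        data[i] *= factor;
}

// Host-side launch: 256 threads per block, (n + 255) / 256 blocks.
// scaleData<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);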
CUDA threads can access data from different memory spaces
during their execution. The GPU memory can be local,
shared, global, texture or constant.
Threads within a block can cooperate by sharing data through
shared memory and by synchronizing the execution in order
to coordinate the access to memory. Through barrier
synchronization, threads in a block wait for all the other ones
to finish before proceeding with the next computation.
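The classic block-level reduction illustrates both mechanisms. This is a textbook pattern included only as an example; it assumes a power-of-two block size of 256 threads.

__global__ void blockSum(const float* in, float* out, int n)
{
    __shared__ float partial[256];                  // visible to one block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                                // barrier: all loads done

    // Tree reduction in shared memory; a barrier after every halving step
    // guarantees no thread reads a partial sum before it has been written.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = partial[0];               // one result per block
}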
One research focus of our group is to develop parallel
algorithms for analysis and visualization in medicine based
on the GPGPU paradigm. As a result, several improvements
to CUDA implementations of marching cubes, Canny edge
detector and Hough transform were proposed by (Petrescu et
al. 2011; Morar et al. 2012). Other algorithms that use the
power of the GPUs for parallel computation are described in
the current paper.
3.3. Marching cubes with CUDA
The silhouette rendering algorithms presented in this paper
follow a parallel implementation, using CUDA kernels. A
CUDA kernel is defined for one data element and then
executed in parallel for all the data elements, for example, for
all the voxels in a volume dataset. The first silhouette
rendering technique is very fast, but uses a lot of GPU memory. This is why two other algorithms that require less video memory have been designed. All three methods are
based to some extent on the CUDA implementation of
marching cubes.
The parallel implementation of marching cubes from (Nvidia 2011a, b) leads to very satisfactory results compared to the CPU implementation. For a dataset of size 256³, the application runs at 10 fps if the iso-surface is reconstructed at every frame, unlike the CPU application, which runs at 0.03 fps. If the reconstruction of the surface is performed only when the iso-value has been changed, the application runs at 60 fps (while the CPU application runs at 0.2 fps).
The marching cubes algorithm (in pseudo code) from the
CUDA SDK example is presented below. There is an
alteration to the initial algorithm. In the classic implementation, the normals of the triangles that approximate the iso-surface are computed from the triangle vertices using the cross product. Thus, only one normal is generated for each triangle. In the proposed method, the triangle
normals are replaced by gradient vectors. The gradient
vectors of the voxels’ vertices are approximated with finite
differences based on the scalar values within the volume. The
gradient vectors of the iso-surface triangles are computed
using linear interpolation from the gradient vectors of the
voxel’s vertices.
This alteration offers a better visual effect of the rendered
iso-surface by allowing smooth illumination and is also
important for the subsequent extraction of silhouettes. Since the silhouette extraction presented in section 3.1 is based on the difference between the normal vectors of a triangle’s vertices, the original algorithm, with a single normal for the entire triangle, cannot be used.
Pseudo code 1
Marching cubes with CUDA
Step 1. copy data to the GPU
  copy the search tables and the volume vol into texture memory on the GPU
Step 2. classify and remove non-border voxels
  for every voxel v in vol do (in parallel, using a CUDA kernel)
    classify v (interior/exterior/border)
  end for
  read total number of active (border) voxels
  for every voxel v in vol do (in parallel, using a CUDA kernel)
    if v is a border voxel then
      save v in a vector of active voxels
    end if
  end for
Step 3. compute triangles of the iso-surface
  for every active voxel av do (in parallel, using a CUDA kernel)
    for every vertex vx of av do
      read value of vx (from texture)
      read values of neighbouring vertices (for the approximation of the gradient vector)
      compute gradient at vx (with finite differences)
    end for
    for every intersection point ip of the iso-surface with the edges of av do
      determine position of ip and normal at ip (using linear interpolation)
    end for
    for every triangle t of the iso-surface do
      save vertices and normals into vertex buffer objects (VBOs, kept on the GPU so they can be rendered directly)
    end for
  end for
End of pseudo code
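The gradient approximation in Step 3 can be sketched as follows. This is an illustration of the finite-difference and interpolation scheme described above; sampleVolume() stands for the texture fetch of a scalar value at a grid vertex and is an assumed helper, not the SDK's exact interface.

__device__ float sampleVolume(int x, int y, int z);  // assumed texture fetch

// Central differences approximate the gradient at a voxel vertex.
__device__ float3 gradientAt(int x, int y, int z)
{
    return make_float3(
        0.5f * (sampleVolume(x + 1, y, z) - sampleVolume(x - 1, y, z)),
        0.5f * (sampleVolume(x, y + 1, z) - sampleVolume(x, y - 1, z)),
        0.5f * (sampleVolume(x, y, z + 1) - sampleVolume(x, y, z - 1)));
}

// The normal of an iso-surface vertex lying on a voxel edge is obtained by
// the same linear interpolation used for its position; t is the parameter
// of the iso-value crossing along the edge, g0 and g1 the vertex gradients.
__device__ float3 interpolateGradient(float3 g0, float3 g1, float t)
{
    return make_float3(g0.x + t * (g1.x - g0.x),
                       g0.y + t * (g1.y - g0.y),
                       g0.z + t * (g1.z - g0.z));
}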
Changing the viewer’s position does not affect the
performance of this rendering method. This is why the first
approach presented in this paper is based mainly on the
implementation of marching cubes with CUDA.
Figure 3 presents the surface of a femoral bone and a part of
the pelvis extracted with marching cubes from a volume
dataset of size 256³.
Fig. 3. Surface rendered using the CUDA marching cubes
algorithm. The dataset was obtained from Floreasca
Emergency Hospital, Bucharest.
In order to handle large datasets, the original volume can be
divided into smaller sub-volumes (chunks) that are
reconstructed serially on the GPU using the same marching
cubes CUDA kernel, as described by (Petrescu et al. 2011).
Instead of having two VBOs, one for the triangle vertices and
one for their normal vectors, there will be two VBOs for each
sub-volume. This implementation is slightly slower than the
original marching cubes with CUDA, but can handle
considerably larger datasets.
4. SILHOUETTE RENDERING USING CUDA
4.1. Silhouette rendering with marching cubes and geometry
shader
This new approach takes advantage of the programmable
rendering pipeline’s flexibility. After extracting the triangular
mesh approximating the iso-surface, the silhouettes are
rendered using a vertex shader and a geometry shader.
The input of the vertex shader contains the position and
normal for each vertex, the eye position in world coordinates,
the model matrix and the model-view-projection matrix. The
output of the vertex shader consists of the vertex positions in
clip space, the position and normal vector in world coordinates, and the eye position in world coordinates.
While the vertex shader can process only vertices, the
geometry shader can handle whole primitives. The input of
the geometry shader is represented by the mesh triangles in
world and clip coordinates and the eye position in world
coordinates. The output contains either nothing, or a segment
of the silhouette iso-surface in clip coordinates, computed
based on the silhouette extraction algorithm depicted in
Figure 2. For every triangle of the iso-surface which
intersects the surface where the normal is perpendicular to the
view, the geometry shader emits a silhouette segment
representing that intersection.
In order to compute the visibility of the silhouette points, the
geometry representing the iso-surface is sent to the graphics
pipeline, and rasterized only for the creation of the depth
buffer. After this step, the contour lines are also rasterized by
updating both the colour and the depth buffer. In
consequence, only the silhouette pixels that are not occluded
are rendered.
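In OpenGL terms, the two passes amount to the following host-side sketch (illustrative state handling only; meshVAO, meshVertexCount, surfaceProgram and silhouetteProgram are placeholder names, and all buffer and shader setup is omitted):

// Pass 1: depth pre-pass. The iso-surface fills only the depth buffer.
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);  // no colour writes
glDepthMask(GL_TRUE);
glUseProgram(surfaceProgram);                         // plain vertex shader
glBindVertexArray(meshVAO);
glDrawArrays(GL_TRIANGLES, 0, meshVertexCount);

// Pass 2: the same triangles pass through the vertex and geometry shader,
// which emits silhouette line segments; both buffers are updated, so the
// depth test discards contour pixels hidden by the surface.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glUseProgram(silhouetteProgram);
glBindVertexArray(meshVAO);
glDrawArrays(GL_TRIANGLES, 0, meshVertexCount);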
Figure 4 shows the output of the geometry shader, i.e. the silhouette rendered without any visibility test, and the output of the presented algorithm, i.e. the silhouette rendered with the visibility test. The silhouette rendering flow described in this sub-section is illustrated in Figure 5.
Fig. 5. The flow of the algorithm that renders the silhouette
with marching cubes and geometry shader.
This method is the fastest, but requires a lot of GPU memory. Storing the volume and the geometry on the GPU limits this implementation to small datasets (up to 256³). Still, the method can be used for larger datasets, with a modification.
In order to handle large datasets, an approach similar to the
one proposed by (Petrescu et al. 2011) can be considered.
The geometry represented by two VBOs for each sub-volume
is further processed by the vertex and geometry shader in the
same way as the method for smaller datasets. This approach
solves to some extent the memory limitation, because, at any moment in time, only a small part of the volume is processed.
But the need to store the whole geometry on the graphics card
still limits the size of the dataset.
4.2. Silhouette rendering with ray based visibility test
The second approach, implemented on the GPU, proposes a
visibility test based on the CPU silhouette rendering method
of (Burns et al. 2005). First, it extracts the silhouette without
any visibility test, and then, for every vertex belonging to the
silhouette, it casts a ray through the volume to the eye
position, intersecting it with the iso-surface and deciding
whether the silhouette point is occluded or not. The steps of
the algorithm are described in Figure 6.
Fig. 4. Silhouette rendered without visibility test (left) and with visibility test (right).
The algorithm for silhouette rendering with ray based visibility test is described in the following pseudo code:
Pseudo code 2
Silhouette rendering with ray based visibility test
Step 1. similar to Step 1 in pseudo code 1
Step 2. similar to Step 2 in pseudo code 1
Step 3. extract silhouette lines that are visible
  for every active voxel av from volume vol do (in parallel, using a CUDA kernel)
    determine the triangles approximating the iso-surface (similar to Step 3 in pseudo code 1)
    for every triangle t of the iso-surface do
      for every vertex vx do
        compute the view vector vv
        compute the dot product dp between the normal at vx and vv
      end for
      if dp for one vertex has a different sign from dp for the other two vertices then
        compute the intersection of t with the surface where the normal is perpendicular to the view (based on the algorithm from section 3.1); the result is a silhouette segment s
        for each endpoint p of s do
          p is considered visible
          cast a ray r from the eye to p
          determine where r enters vol
          for every ray step (the step size is equal to the size of a voxel edge) do
            determine the next intersected voxel iv (the voxel that contains the next point on the ray) by incremental computation
            compute the intersection of iv with r and determine the face f of iv by which r leaves iv
            if the value of f (computed using linear interpolation from the corners of f) has changed sign then
              p is occluded
            end if
          end for
        end for
        if both endpoints of s are visible then
          store s into a VBO
        end if
      end if
    end for
  end for
End of pseudo code
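A simplified device-side sketch of this visibility test is given below. It is an assumption-laden illustration: it samples the scalar field at voxel-edge steps along the ray instead of interpolating on the exact exit face of each voxel, and sampleField() is a placeholder for the volume fetch.

__device__ float sampleField(float3 q);   // assumed: scalar value at q

// Returns true if the segment endpoint p is hidden by the iso-surface.
__device__ bool isOccluded(float3 p, float3 eye, float isoValue)
{
    float3 d = make_float3(eye.x - p.x, eye.y - p.y, eye.z - p.z);
    float len = sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
    float3 u = make_float3(d.x / len, d.y / len, d.z / len);

    float prev = sampleField(p) - isoValue;        // signed value at the start
    for (float t = 1.0f; t < len; t += 1.0f) {     // step size: one voxel edge
        float3 q = make_float3(p.x + t * u.x, p.y + t * u.y, p.z + t * u.z);
        float cur = sampleField(q) - isoValue;
        if (prev * cur < 0.0f)                     // sign change: the ray
            return true;                           // crossed the iso-surface
        prev = cur;
    }
    return false;                                  // eye reached: p is visible
}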
The algorithm should work for larger datasets because only
the silhouette is stored on the GPU (unlike the first approach
that stores the iso-surface as well). However, the performance
is affected by changes of the viewer’s position (each time the eye position changes, the silhouette has to be computed all over again) and by the visibility test.
Fig. 6. The flow of the silhouette rendering algorithm with
ray based visibility test.
Also, for larger datasets, the implementation using chunks is
very useful. But this approach does not allow the generation
of visible silhouette lines using only one CUDA kernel. Thus,
Step 3 from the previous pseudo code is divided into two
steps. The first step, the generation of silhouette segments,
both visible and occluded, is defined in a CUDA kernel. The
kernel is executed for all the sub-volumes. The generated
silhouettes are referred to as sub-silhouettes. The process is serialized, meaning that at any moment in time only one sub-volume is processed in parallel on the GPU. The second step, i.e., the visibility test, is defined in another CUDA kernel. This kernel is executed for all the sub-silhouettes, for every silhouette segment. The difference from the previous approach is that this time the visibility test of a sub-silhouette is run not only against the corresponding sub-volume, but against all the sub-volumes, serially.
The processing flow for handling large datasets is explained
in Figure 7, in order to provide some insight into the
silhouette rendering with ray based visibility test for large
datasets.
4.3. Silhouette rendering using a CUDA rasterizer
A new approach that solves the problem of the memory
limitation is proposed. It does not save the geometry, but
renders it directly using a CUDA rasterizer. If there is only enough GPU memory to store the volume, a custom rasterizer can be implemented, based on the z-buffer algorithm, using the
CUDA architecture. Three buffers are used, one colour
buffer, cbuffer, and two depth buffers (all three buffers
have the size of the display window). The first depth buffer,
zbuffer1, stores the maximum depth values for the triangles
approximating the iso-surface. The second depth buffer,
zbuffer2, stores the maximum depth values for the
silhouette lines.
Fig. 7. The flow of the algorithm that renders silhouettes
using a ray based visibility test – the approach that handles
large datasets.
The pseudo code of the silhouette renderer with a CUDA
rasterizer is presented below.
Pseudo code 3
Silhouette rendering with CUDA rasterizer
Step 1. copy volume to GPU
  save volume vol in GPU texture memory
Step 2. update depth buffers
  for every voxel v in vol do (in parallel, using a CUDA kernel)
    determine the triangles approximating the intersection of v with the iso-surface
    for every triangle t (computed in normalized device coordinates) do
      for every point (x,y,z) inside t or belonging to its edges do
        if zbuffer1[x,y] < z then
          zbuffer1[x,y] = z
        end if
      end for
      if the triangle contains a silhouette line s then
        for every point (x,y,z) of s do
          if zbuffer2[x,y] < z then
            zbuffer2[x,y] = z
          end if
        end for
      end if
    end for
  end for
Step 3. update colour buffer
  for every display window pixel (x,y) do (in parallel, using a CUDA kernel)
    cbuffer[x,y] = 0
    if zbuffer2[x,y] >= zbuffer1[x,y] (silhouette pixel visible) then
      cbuffer[x,y] = 1
    end if
  end for
End of pseudo code
The flow of the algorithm is described in Figure 8.
The operation of updating the depth buffers is implemented
with the CUDA atomicMax() operation in order to avoid
read-modify-write hazards. The colour buffer is stored into a
pixel buffer object (PBO).
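The depth update can be sketched as follows. This is a hedged illustration of the pattern, under one assumption about the encoding: CUDA's atomicMax() has no float overload, so the depth is stored as an unsigned int, which preserves ordering for non-negative IEEE floats via __float_as_uint().

// One thread updates the depth buffer for one rasterized point (x, y, z);
// atomicMax() makes concurrent updates from overlapping triangles safe.
__device__ void writeDepth(unsigned int* zbuffer, int width,
                           int x, int y, float z)
{
    // For z >= 0, the unsigned bit pattern of a float increases
    // monotonically with its value, so integer atomicMax keeps the
    // maximum depth without read-modify-write hazards.
    atomicMax(&zbuffer[y * width + x], __float_as_uint(z));
}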
This implementation is much slower than the first approach,
because the CUDA rasterizer is not specialized for rendering,
like the hardware rendering mechanisms of the GPU. Also,
the generation of the visible silhouette segments has to be
repeated every time the eye position is changed.
This CUDA rasterizer can be used also for indirect volume
rendering. If there is not enough video memory to store the
iso-surface extracted using marching cubes, the geometry can
be rasterized directly with this new approach.
In order to handle larger datasets, the same depth buffer
updating kernel can be run (Step 2), but for each sub-volume, serially. The colour buffer updating kernel (Step 3) will be run only once, just like in the previous pseudo code.
5. RESULTS
Table 1 presents the computing times for rendering the
silhouettes of iso-surfaces reconstructed from different
volume datasets. The column with the header “M” (from “mode”) refers to the reconstruction mode: “a” stands for the case when the reconstruction of the iso-surface/silhouette is performed only when the iso-value is changed (it does not depend on changes of the viewer’s position and is only applicable to the first rendering method); “b” stands for the case when the reconstruction has to be done every time the iso-value or the eye position is changed. For each method, there are two columns: one for the algorithm that handles small datasets and one for the algorithm that can handle larger datasets.
Table 1. Computing times (ms) for the three methods of silhouette rendering (the variant designed for small (“S”) datasets and the one designed for larger (“L”) datasets, with the division of the volume into sub-volumes).

Data vol.    M   1st alg S   1st alg L   2nd alg S   2nd alg L   3rd alg S   3rd alg L
32³          a   4.2         5.75        x           x           x           x
32³          b   7.4         8.79        10.3        12.76       14.5        15.5
256³         a   16          29.8        x           x           x           x
256³         b   115         117         677         839.7       501         515
512³         a   x           66.1        x           x           x           x
512³         b   x           865         x           4030        x           2609
512²×1024    a   x           x           x           x           x           x
512²×1024    b   x           x           x           x           x           9872
As can be observed, the first method is the fastest (especially
when the reconstruction is made only when the iso-value is
changed), but can handle only small datasets, because the
whole geometry (iso-surface and silhouette) is stored on the
GPU.
Fig. 8. The flow of the algorithm that renders silhouettes
from volumes using a CUDA rasterizer.
The second approach should work for larger datasets, but is limited by the memory allocations for the removal of non-border voxels and by the fact that the silhouette is stored on the GPU.
The third approach works faster than the second (for datasets of size 256³ and 512³) and can handle large datasets (theoretically, volume datasets of unlimited size), but at non-interactive frame rates.
Each implementation uses a different visibility test, with varying influence on the computing time. In order to
observe this influence, tests have been made for each method,
with or without the visibility test.
Table 2 presents the computing times for the silhouette
rendering technique with marching cubes and geometry
shader. The second column has the same meaning as the one
in Table 1. The third and fourth columns correspond to the
method (without and with visibility test) that does not divide
the volume, being able to handle only small datasets. The last
two columns correspond to the method that can handle larger
datasets.
Table 2. Computing times (ms) for the first silhouette rendering method, using CUDA marching cubes and geometry shader, with and without visibility test.

Data vol.   M   Small, no visibility   Small   Large, no visibility   Large
32³         a   4.2                    4.2     5.7                    5.75
32³         b   7.4                    7.4     8.6                    8.79
256³        a   15                     16      29.2                   29.8
256³        b   114.5                  115     175                    177
512³        a   x                      x       63                     66.1
512³        b   x                      x       815                    865
The difference between applying the visibility test or not in
the first silhouette rendering method is almost insignificant,
because the iso-surface is extracted in any case. The small
increase in speed for the approach that does not test the
visibility of the silhouette vertices is caused by the fact that
the iso-surface is sent only once to the graphics pipeline (to
be further processed by the vertex and geometry shader in
order to extract the silhouette), unlike the other approach, which sends the iso-surface twice: once for updating the depth buffer and once for silhouette extraction.
Table 4. Computing times (ms) for the silhouette rendering method with CUDA rasterizer.

Data vol.    Small, no visibility   Small   Large, no visibility   Large
32³          11.2                   14.5    15                     15.5
256³         490                    501     334                    515
512³         x                      x       2213                   2609
512²×1024    x                      x       5143                   9872
Also, the visibility test for the three approaches leads to
different visual results: the visibility test for the first and third
approach is made for every point belonging to a silhouette
segment, while in the second approach, it is made only for the
endpoints of a silhouette segment. This is why the silhouette
rendered with the second method has more discontinuities
(see Figure 9).
The three silhouette rendering methods have been compared
not only regarding the computing times and the size of the
datasets that can be handled, but also regarding the visual
quality. Silhouettes from several datasets have been rendered
with the three approaches and compared with manually
drawn silhouettes. The tested datasets are shown in Figure 10.
Table 5 presents the fraction of contour pixels correctly rendered for the proposed methods.
Table 3 presents the computing times for the silhouette
rendering algorithm with ray based visibility test.
Table 3. Computing times (ms) for the second silhouette rendering method, with ray based visibility test.

Data vol.   Small, no visibility   Small   Large, no visibility   Large
32³         6.7                    10.3    10                     12.76
256³        121.5                  677     143.4                  839.7
512³        x                      x       782                    4030
Here the differences between the two variants, one with and one without visibility test, are substantial, especially for the method that handles large datasets by dividing the volume into sub-volumes. Thus it can be concluded that the ray based
visibility test is very time consuming, especially for large
datasets.
Table 4 presents the computing times for the silhouette rendering algorithm that uses a CUDA rasterizer. The differences between applying the visibility test or not are also noticeable, but small compared to the differences in the second silhouette rendering approach. These differences are bigger than the ones in the first approach because, here, the iso-surface updates the depth buffer (or not, in the case of no visibility test) using a custom rasterizer, and not the specialized hardware rendering mechanisms of the GPU.
Fig. 9. Difference between rendering methods (from left to
right): iso-surface rendering, silhouette rendering with ray
based visibility test, silhouette rendering with geometry
shader/CUDA rasterizer. The dataset is part of the Volume
Library (2013).
Table 5. Fraction of correctly rendered silhouette pixels for the three methods.

Data vol.      First method   Second method   Third method
a) 256³        0.99           0.93            0.99
b) 256³        0.97           0.91            0.97
c) 256³        0.97           0.96            0.97
d) 128×256²    0.97           0.93            0.97
e) 32³         0.97           0.90            0.97
f) 64³         0.95           0.90            0.95
Fig. 10. Datasets used for testing the percentage of contour
pixels correctly rendered. For each dataset, an iso-surface and
a silhouette are shown. Datasets a) and b) represent a femoral
bone and a pair of hip bones (femur+pelvis), respectively,
and were obtained from Floreasca Emergency Hospital in
Bucharest. Datasets c) Foot, d) Head (Visible Male), e)
Bucky Ball and f) Fuel Injection are part of the Volume
Library (2013).
6. CONCLUSIONS
This paper presents several contributions in the field of non-photorealistic volume rendering. Three silhouette rendering
methods were implemented, each being suited for different
types of datasets and different characteristics of the GPU
cards.
The first implementation requires enough GPU memory to
store the dataset, the reconstructed iso-surface and the
extracted silhouette. The advantages of this method are the
fast rendering times and the possibility to reconstruct the iso-surface only when the iso-value is changed. The silhouette rendering with marching cubes and geometry shader leads to interactive frame rates for datasets up to 512³ if the reconstruction is done only when the iso-value is changed.
The second implementation stores on the GPU only the
dataset and the silhouette of the iso-surface. The time
consuming operation in the second method is the ray based
visibility test, which casts rays from each silhouette vertex to
the viewer's position, intersecting them with the volume. The
implementation that splits the initial volume into chunks
handles larger datasets, but is not very fast. The reason is that
for one extracted sub-silhouette, the visibility test is run for
each sub-volume, serially. Another drawback of this
implementation is the requirement to extract the iso-surface
and the silhouette each time the viewer's position changes.
The third silhouette rendering implementation is perfectly suited for the case when the size of the dataset makes it impossible to store the iso-surface or the silhouette in GPU memory. Our CUDA rasterizer solves the problem of the GPU memory limitation, but it is not as efficient as the hardware rendering mechanisms of the GPU. Similar to the second implementation, the iso-surface and the silhouette have to be extracted each time the position of the viewer is changed.

ACKNOWLEDGEMENTS

Part of the work has been funded by the Sectorial Operational Programme Human Resources Development 2007-2013 of the Romanian Ministry of Labour, Family and Social Protection through the Financial Agreement POSDRU/88/1.5/S/61178.

REFERENCES
Bruckner S., Gröller M.E. (2007). Style Transfer Functions for
Illustrative Volume Rendering. Computer Graphics Forum,
26(3):p. 715-724.
Burns M., Klawe J., Rusinkiewicz S., Finkelstein A., DeCarlo D.
(2005). Line Drawings from Volume Data. ACM Transactions on
Graphics (Proc. SIGGRAPH), 24(3):p. 512-518.
Co C.S., Hamann B., Joy K.I. (2003). Isosplatting: A point-based
alternative to isosurface visualization. Proceedings of 11th Pacific
Conference on Computer Graphics, 325-334.
Dong F., Clapworthy G.J., Lin H., Krokos M.A. (2003). Non-photorealistic Rendering of Medical Volume Data. IEEE
Computer Graphics and Applications, 23(4):p. 44-52.
Hadwiger M., Sigg C., Scharsach H., Buehler K., Gross M. (2005).
Real-Time Ray-Casting and Advanced Shading of Discrete
Isosurfaces. Computer Graphics Forum, 24(3):p. 303–312.
Kindlmann G., Whitaker R., Tasdizen T., Moeller T. (2003).
Curvature-Based Transfer Functions for Direct Volume
Rendering: Methods and Applications. VIS ‘03 Proceedings of
the 14th IEEE Visualization, p.513-520.
Levoy M. (1990). Efficient Ray Tracing of Volume Data. ACM
Transactions on Graphics, 9(3):p.245-261.
Lorensen W.E., Cline H.E. (1987). Marching Cubes: A High
Resolution 3D Surface Construction Algorithm. SIGGRAPH ’87
Proceedings of the 14th Annual Conference on Computer
Graphics and Interactive Techniques, p.163-169.
Luo Y., Guitian J.A.I., Gobbetti E., Marton F. (2009). Context
Preserving Focal Probes for Exploration of Volumetric Medical
Datasets. Proceedings of 3DPH, 187-198.
Nvidia Corporation (2011a). NVIDIA CUDA C Programming
Guide, Version 4.0.
Nvidia Corporation (2011b). NVIDIA CUDA SDK Example,
available at www.nvidia.com, last visited on 02.01.2012.
Pelt, R.V., Vilanova, A., Wetering, H.M.M.V.D. (2008). GPU-based
Particle Systems for Illustrative Volume Rendering. Volume
Graphics, p. 89-96.
Petrescu L., Morar A., Moldoveanu F., Asavei V. (2011). Real Time
Reconstruction of Volumes from Very Large Datasets using
CUDA. Proceedings of 15th International Conference on System
Theory, Control and Computing, p. 1-5.
Yuan X., Chen B. (2004). Illustrating Surfaces in Volume.
Proceedings of VisSym’04 Joint IEEE/EG Symposium on
Visualization.
The Volume Library, available at http://www9.informatik.uni-erlangen.de/External/vollib/, last visited on 11.03.2013.
Morar A., Moldoveanu F., Moldoveanu A., Asavei V., Egner A. (2012). Multi-GPGPU Based Medical Image Processing in Hip Replacement. Journal of Control Engineering and Applied Informatics, 14(3):p. 25-34.