Graphical Models 76 (2014) 706–723
From 2D to 2.5D i.e. from painting to tactile model
Rocco Furferi, Lapo Governi, Yary Volpe, Luca Puggelli, Niccolò Vanni, Monica Carfagni
Department of Industrial Engineering of Florence, Via di Santa Marta 3, 50139 Firenze, Italy
Article info
Article history:
Received 9 August 2014
Received in revised form 10 October 2014
Accepted 13 October 2014
Available online 22 October 2014
Keywords:
2.5D model
Shape From Shading
Tactile model
Minimization techniques
Abstract
Commonly used to produce the visual effect of a full 3D scene on reduced-depth supports, bas-relief can be successfully employed to help blind people access inherently two-dimensional works of art. Although a number of methods have been proposed for recovering 3D or 2.5D surfaces from single images, only a few of them explicitly address the recovery problem from paintings and, more specifically, the needs of visually impaired and blind people.
The main aim of the present paper is to provide a systematic method for the semiautomatic generation of 2.5D models from paintings. Accordingly, a number of ad hoc procedures are used to solve most of the typical problems arising when dealing with artistic representations of a scene. Feedback provided by a panel of end-users demonstrated the effectiveness of the method in providing models that reproduce, using a tactile language, works of art otherwise completely inaccessible.
© 2014 Elsevier Inc. All rights reserved.
1. Background
Haptic exploration is the primary action that visually impaired people execute in order to encode properties of surfaces and objects [1]. Such a process relies on a cognitive path combining somatosensory perception of patterns on the touched surface (e.g., edges, curvature, and texture) with proprioception of hand position and conformation [2]. As a consequence, visually impaired people's cognitive path is significantly restrained when dealing with the experience of art. Not surprisingly, in order to confront this issue, access to three-dimensional copies of artworks has been the first degree of interaction for enhancing visually impaired people's experience of art in museums; therefore, numerous initiatives based on the interaction with sculptures, tactile three-dimensional reproductions or scaled architectural aids
have been developed all around the world. Unfortunately, cultural heritage in the form of two-dimensional art (e.g. paintings or photographs) is mostly inaccessible to visually impaired people, since it cannot be directly reproduced into a 3D model. As of today, despite numerous efforts in translating such two-dimensional forms of art into 3D models, still only a few works can be documented. With the aim of enhancing the experience of 2D images, painted subjects have to be translated into an appropriate "object" to be touched. This implies a series of simplifications in the artwork "translation", to be performed according both to explicit user suggestions and to the scientific findings of the last decades.
In almost all the scientific work dealing with this subject, a common objective is shared: to find a method for translating paintings into a simplified (but representative) model meant to help visually impaired people in understanding both the painted scene (position in space of painted subjects) and the "shape" of the subjects themselves. On the basis of recent literature, a wide range of
simplified models can be built to provide visually impaired
people with a faithful description of paintings [3]. Among
them, the following representations proved to be quite
effective, as experimentally demonstrated in [4]: tactile
diagrams (e.g. tactile outline-based and texturized pattern-based reconstruction) and bas-relief (e.g. flat layered
bas-relief, shaped bas-relief).
Tactile diagrams are not a relief reproduction of visual images: rather, they are translations of visual images into a tactile language consisting of the main outlines of the subjects to be explored, mixed together with patterns added to discriminate the different surfaces characterizing the scene and/or the position of painted subjects. The most common way of creating such a representation is to separate the background and foreground, or the ground and figures, illustrating them in two separate diagrams or using two different patterns. More in general, outline-based representations may be enriched using different textures, each one characterizing a different position in space and/or different surface properties [4]. Moreover, these representations are generally used in conjunction with verbal narratives that guide the user through the diagram in a logical and ordered manner [5,6].
Unlike tactile diagrams, bas-relief representations of paintings deliver 3D information in a more "realistic" way by improving depth perception. For this reason, as demonstrated in the cited recent study [4], this kind of model (also referred to as a "2.5D model") is one of the most "readable" and meaningful for blind and visually impaired people; it proves to provide a better perception of the painted subjects' shape and, at the same time, a clear picture of their position in the scene. Moreover, according to [7], the bas-relief is perceived as being more faithful to the original artwork, and the different relief (height) assigned to objects ideally standing on different planes is considered very useful in discriminating foreground objects from middle ground and background ones.
The topic of bas-relief reconstruction starting from a single image is a long-standing issue in the computer vision literature and stems from studies aiming at recovering 3D information from 2D pictures or photographs. Two of the best known works dealing with this topic are [8,9]. In [8] a method for extracting a non-strictly three-dimensional model of a painted scene with single point perspective is proposed, making use of vanishing point identification, foreground/background segmentation and polygonal reconstruction of the scene. In [9] a coarse, scaled 3D model is automatically built from a single image, in the form of a pop-up model, by classifying each image pixel as ground, vertical or sky and estimating the horizon position. These impressive works, as well as similar approaches, however, are not meant to create a bas-relief; rather, they aim to create a 3D virtual model of the scene where objects are virtually separated from each other.
Explicitly dealing with relief reconstruction from single images, some significant studies can be found in the literature, especially dealing with coinage and commemorative medals (see for instance [10,11]). In most of the proposed approaches the input image, often representing human faces, coats of arms, logos and figures standing out from the image background, is translated into a flat bas-relief by adding volume to the represented subjects. The simplest,
and probably best known, method to perform a bas-relief reconstruction from images is image embossing [12,13], a widely recognized computer graphics technique in which each pixel of an image is replaced either by a highlight or a shadow, depending on boundaries in the original image. The result obtained using this technique consists of a relief visually resembling the original image but affected by shallow and incorrect depth reconstruction (since the algorithm is based on image gradient computation).
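To illustrate the gradient-based nature of embossing, a minimal sketch is given below (our own illustrative implementation, not the one of [12,13]; function and parameter names are assumptions):

    import numpy as np

    def emboss(img, direction=(1.0, 1.0)):
        """Replace each pixel by a highlight or a shadow depending on
        the local image gradient projected onto a light direction."""
        d = np.asarray(direction, dtype=float)
        d /= np.linalg.norm(d)
        gy, gx = np.gradient(img.astype(float))   # finite-difference gradient
        relief = gx * d[0] + gy * d[1]            # signed edge response
        # Re-center around mid-gray: positive -> highlight, negative -> shadow.
        return np.clip(0.5 + relief / (2.0 * np.abs(relief).max() + 1e-12),
                       0.0, 1.0)

Because the output depends only on local gradients, flat interior regions collapse to mid-gray: this is precisely the shallow-depth artifact mentioned above.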
Some improvements to this method can be found in the scientific literature, like the one proposed in [14], just to cite a few; there, the embossing method is enhanced by pre-processing techniques based on image enhancement, histogram equalization and dynamic range adjustment. Unsharp masks and smoothing filters have also been extensively adopted to emphasize salient features and deemphasize others in the original image, so that the final result better resembles the original image. In a recent paper [15] an approach for estimating the height map from single images representing brick and stone reliefs (BSR) has also been proposed. The method proved to be adequate for restoring
BSR surfaces by using a height map estimation scheme
consisting of two levels: the bas-relief, referring to the
low frequency component of the BSR surfaces, and the high
frequency detail. Commercial software packages, like ArtCAM and JDPaint [16], have also been developed, making available functions for bas-relief reconstruction from images. In these software packages users are required to use vector representations of the objects to be reconstructed and to "inflate" the surfaces delimited by the object outlines. The above cited methods prove to be effective in creating models where the subjects are volumetrically detached from the background but with compressed depth [3], like, for instance, models resembling figures obtained by embossing a metallic plate. In order to obtain a faithful surface reconstruction a strong interaction is required; in particular for complex shapes, such as faces, it is not sufficient to vectorize the subject's outlines: each part to be inflated needs to be outlined and vectorized. In the case of faces, for example, lips, cheeks, nose, eyes, eyebrows etc. have to be manually drafted. This is a time-consuming task when dealing with paintings, which are often characterized by a number of subjects blended into the background (or by a background drawing attention away from the main subjects).
The inverse problem of creating 2.5D models starting from 3D models has also been extensively studied [17,18]. These techniques use normal maps obtained from the (available) 3D model and perform the compression in the 2.5D space. However, these works are not suitable for handling the reconstruction from a 2D scene, where the normal map is the desired result and not a starting point.
On the basis of the above mentioned works, it is evident that the approach investigated in this paper is not fully explored and only a few works are available in the literature. One of the most important methods aimed at building models visually resembling sculptor-made bas-reliefs from paintings is the one proposed in [19]. In this work the high resolution image of the painting to be translated into bas-relief is manually processed in order to (1) extract the painted subjects' contours, (2) identify semantically meaningful
areas and (3) assign appropriate height values to each area. One of the results consists of "a layered depth diagram" made of a number of individual shapes cut out of flexible sheets of constant thickness, which are glued on top of each other to resemble a painting. Once the layered depth diagram is built, in the same work, a method for extracting textures from the painted scene and for assigning them to the 2.5D model is drawn. Some complex shapes, such as subjects' faces, are modeled using dedicated software, and the resulting normal maps are finally imported into the model and blended together. The result consists of a high quality texturized relief. While this approach may be extremely useful in case perspective-related information is lacking in the considered painting, it is not the best option in the case of perspective paintings. In fact, the reconstructed scene could be geometrically inconsistent, thus causing misperception by blind people. This is mainly due to the fact that the method requires manually assigning height relations for each area regardless of the information coming from the scene perspective. Moreover, using frequency analysis of image brightness to convey volumetric information can provide inconsistent shape reconstruction due to the concave–convex ambiguity. Nonetheless, the chief findings of this work are used as inspiration for the present work, together with other techniques aimed at retrieving perspective information and subject/object shapes, as described in the next sections. With the aim of reducing the error in reconstructing shapes from image shading, the most studied method is the so-called Shape From Shading (SFS).
Extensively studied in the last decades [20–23], SFS is a computational approach that bundles a number of techniques aimed at reconstructing the three-dimensional shape of a surface shown in a single gray-level image. However, automatic shape retrieval using SFS techniques proves to be unsuitable for producing high-quality bas-reliefs [17]; as a consequence, more recent work by a number of researchers has shown that moderate user interaction is highly effective in improving 2.5D models obtained from a single view [20–23]. Moreover, [24] proposed a two-step procedure: the first step recovers high frequency details using SFS; the second step corrects low frequency errors using a user-driven editing tool. However, this approach entails quite a considerable amount of user interaction, especially in the case of complex geometries. Nevertheless, in case the amount of required user interaction can be maintained at a reasonable level, interactive SFS methods may be considered among the best candidate techniques for generating good quality bas-reliefs starting from single images.
Unfortunately, since paintings are hand-made artworks, many aspects of the represented scene (such as silhouettes and tones) are unavoidably not accurately reproduced in the image, thus making the reconstruction an even more complex task with respect to the analysis of synthetic or real-world images. In fact, image brightness and illumination in a painting are only an artistic reproduction of a (real or imagined) scene. To make things even worse, the light direction is unknown in most cases, and a diffused light effect is often added by the artist to the scene. Furthermore, real (and represented) object surfaces may have complex optical properties, far from being approximated by Lambertian surfaces [22]. These drawbacks have a great impact on the reconstruction: any method used for retrieving volume information from paintings shall be able to retrieve 3D geometry on the basis of defective information. As a consequence, any approach for solving such a complex SFS-based reconstruction requires a number of additional simplifying assumptions, and a worse outcome is always to be expected with respect to the results obtained for synthetic and real-world images.
With these strong limitations in mind, the present work provides a valuable attempt at producing sufficiently plausible reconstructions of artistically reproduced shaded subjects/objects. In particular, the main aim is to provide a systematic user-driven methodology for the semiautomatic generation of tactile 2.5D models to be explored by visually impaired people. The proposed methodology lays its foundations on the most promising existing techniques; nonetheless, due to the considerations made about painted images, a series of novel concepts are introduced in this paper: the combination of spatial scene reconstruction and volume definition using different volume-based contributions, the possibility of modeling scenes with unknown illumination, and the possibility of modeling subjects whose shading is only approximately represented. The method does not claim to perform a perfect reconstruction of painted scenes. It is, rather, a process intended to provide a plausible reconstruction by making a series of reasoned assumptions aimed at solving a number of problems arising in 2.5D reconstruction starting from painted images. The devised methodology is supported by a user-driven graphical interface, designed with the intent of helping non-expert users (after a short training phase) in retrieving the final surface of painted subjects (see Fig. 1).
For the sake of clarity, the tasks carried out to perform the reconstruction will be described with reference to an exemplificative case study, i.e. the reconstruction of "The Healing of the Cripple and the Raising of Tabitha" fresco by Masolino da Panicale (see Fig. 2). This masterpiece is a typical example of Italian Renaissance painting characterized by single-point perspective.
2. Method
With the aim of providing a robust reconstruction of the scene and of the subjects/objects imagined by the artist in a painting, the systematic methodology relies on an interactive computer-based modeling procedure that integrates:
(1) Preliminary image processing-based operation on
the digital image of a painting; this step is mainly
devoted to image distortion correction and segmentation of subjects in the scene.
(2) Perspective geometry-based scene reconstruction; mainly based on references [7,8,25], but also modeling oblique planes, this step allows the reconstruction of the painted scene, i.e. the geometric arrangement of painted subjects and objects in a 2.5D virtual space, using perspective-related information when available. The final result of this step is a virtual "flat-layered bas-relief".
Fig. 1. A screenshot of the GUI, designed with the intent of helping non-expert users in retrieving final surface starting from painted images.
Fig. 2. Acquired image of ‘‘The Healing of the Cripple and the Raising of Tabitha’’ fresco by Masolino da Panicale in the Brancacci Chapel (Church of Santa
Maria del Carmine in Florence, Italy).
(3) Volume reconstruction; using an appositely devised approach based on SFS and making use of some implementations provided in [26], this step allows the retrieval of volumetric information of painted subjects. As already stated, this phase is applied to painted subjects characterized by incorrect shading (guessed by the artist); as a consequence, the proposed method introduces, in the authors' opinion, some innovative contributions.
(4) Virtual bas-relief reconstruction; by integrating the
results obtained in steps 2–3, the final (virtual)
bas-relief resembling the original painted scene is
retrieved.
(5) Rapid prototyping of the virtual bas-relief.
2.1. Preliminary image processing-based operation on the
digital image
A digital copy of the original image to be reconstructed in the form of a bas-relief is acquired using a proper image acquisition device and illumination. Generally speaking, image acquisition should be performed with the intent of obtaining a high resolution image which preserves shading, since this information is to be used for the virtual model reconstruction. This can be carried out using calibrated or uncalibrated cameras. Referring to the case study used for explaining the overall methodology, the acquisition device consists of a Canon EOS 6D camera (provided with a 36 × 24 mm² CMOS sensor with a resolution of 5472 × 3648 pixels). A CIE standard illuminant D65 lamp placed frontally to the painting was chosen in order to perform a controlled acquisition.
The acquired image I_a (see for instance Fig. 2) is properly undistorted (by evaluating lens distortion) and rectified using widely recognized registration algorithms [27]. Let, accordingly, I_r (size n × m) be the rectified and undistorted digital image representing the painted scene. Since both scene reconstruction and volume definition are based on gray-scale information, the color of such an image has to be discarded. In this work, color is discarded by performing a color conversion of the original image from sRGB to the CIELAB color space and by extracting the L* channel. In detail, a first color conversion from the sRGB color space to the tristimulus values CIE XYZ is carried out using the equations available for the D65 illuminant [28]. Then, the color transformation from CIE XYZ to the CIELAB space is performed simply using the XYZ to CIELAB relations [28]. The final result consists of a new image I_L*a*b*. Finally, from such an image the L* channel is extracted, thus defining a grayscale image Λ_L that can be considered the starting point for the entire devised methodology (see Fig. 3).
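This color-to-grayscale step can be summarized by a short sketch (a minimal implementation of the standard sRGB to XYZ (D65) to L* relations [28], written by us for illustration; it is not the authors' code):

    import numpy as np

    def l_star(rgb):
        """CIELAB L* channel of an sRGB image with values in [0, 1]
        (D65 white point, standard sRGB linearization)."""
        rgb = np.asarray(rgb, dtype=float)
        lin = np.where(rgb <= 0.04045, rgb / 12.92,
                       ((rgb + 0.055) / 1.055) ** 2.4)
        # Only the luminance tristimulus value Y is needed for L*.
        y = lin @ np.array([0.2126, 0.7152, 0.0722])
        f = np.where(y > (6 / 29) ** 3, np.cbrt(y),
                     y / (3 * (6 / 29) ** 2) + 4 / 29)
        return 116.0 * f - 16.0   # L* in [0, 100]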
Once the image Λ_L is obtained, the different objects represented in the scene, such as human figures, garments and architectural elements, are properly identified. This task is widely recognized under the term "segmentation" and can be accomplished using any of the methods available in the literature (e.g. [29]). In the present work segmentation is performed by means of the developed GUI, where the object outlines are detected using the interactive livewire boundary extraction algorithm [30]. The result of segmentation consists of a new image C (size n × m) where the different regions (clusters C_i) are described by different labels L_i (see for instance Fig. 4, where the clusters are represented in false colors), with i = 1…k, where k is the number of segmented objects (clusters).
Besides image segmentation, since the volume definition techniques detailed in the next sections are mainly based on the analysis of the shading information provided by each pixel of the acquired image, another important operation to be performed on the image Λ_L consists of albedo normalization. In detail, it is necessary to normalize the albedo ρ_i of every segment, which is, pixel by pixel, the amount of diffusively reflected light, to a constant value. For the sake of simplicity, it has been chosen to normalize the albedo to 1; this is obtained by dividing the gray channel of each segment by its actual albedo value:

\Lambda_i = \frac{1}{\rho_i}\,\Lambda_{L,i} \qquad (1)
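A minimal sketch of the normalization of Eq. (1); the per-cluster albedo estimates ρ_i are assumed to be supplied (the dictionary layout is our own, purely illustrative):

    import numpy as np

    def normalize_albedo(lam_L, labels, albedo):
        """Divide the gray level of each segment by its albedo (Eq. (1)),
        so that every cluster behaves as if its albedo were 1."""
        lam = lam_L.astype(float).copy()
        for label, rho in albedo.items():      # albedo: {cluster label: rho_i}
            mask = labels == label
            lam[mask] = lam[mask] / rho
        return np.clip(lam, 0.0, 1.0)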
The final results of this preliminary image-processing phase are: (1) a new image Λ, representing the grayscale version of the acquired digital image with corrected albedo, and (2) the clusters C_i, each one representing a segment of the original scene.
2.2. Perspective geometry-based scene reconstruction
Once the starting image has been segmented, it is necessary to define the properties of its regions in order to arrange them in a consistent 2.5D scene. This is due to the fact that the devised method for retrieving volumetric information (described in the next sections) requires the subjects describing the scene to be geometrically and consistently placed in space, but described in terms of flat regions.
In other words, a deliberate choice is made by the authors here: to model, in the first instance, each subject (i.e. each cluster C_i) as a planar region, thus obtaining a virtual flat-layered bas-relief (visually resembling the original scene but with flat objects) where the relevant information is the perspective-based position of subjects in the virtual scene. Two reasons support this choice: firstly, psychophysical studies performed with blind people, documented in [4], demonstrated that flat-layered bas-relief representations of paintings are really useful for a first rough understanding of the painted scene. Secondly, for each subject (object) of the scene, the curve identified on the object by the projection along the viewing direction (normal to the painting plane) approximately lies on a plane if the object is limited in size along the projection direction, i.e. when the shape of the reconstructed object has limited size along the viewer direction (see Fig. 5). Since the final aim of the present work is to obtain a bas-relief where objects slightly detach from the background, this assumption is considered valid for the purpose of the proposed method.

Fig. 4. Example of a segmented image (different colors represent different segments/regions). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
A number of methods for obtaining 3D models from perspective scenes are available in the literature, proving to be very effective for solving this issue (see for instance [19]). Most of them, and in particular the method provided in [8], can be successfully used to perform this task. Or, at most, this kind of spatial reconstruction can be accomplished by combining the results obtained using [8] with the method described in [19] for creating layered depth diagrams. As stated in the introduction, however, one of the aims of the present work is to describe the devised user-guided methodology for 2.5D model retrieval ab initio, in order to help the reader in getting the complete sense of the method. For this reason, a method for obtaining a depth map of the analyzed painting, where subjects are placed in the virtual scene coherently with the perspective, is provided below. A further point to underline is that, differently from similar methods in the literature, oblique planes (i.e. planes represented by trapezoids whose vanishing lines do not converge in the vanishing point) are also modeled using the proposed method.

Fig. 3. Grayscale image Λ_L obtained using the L* channel of I_L*a*b*.

Fig. 5. Curve identified on the object by the projection along the viewing direction.
The procedure starts by constructing a Reference Coordinate System (RCS); first, the vanishing point coordinates on the image plane V = (x_V, y_V) are computed [31], thus allowing the definition of the horizon l_h and the vertical line through V, called l_v. Once the vanishing point is evaluated, the RCS is built as follows: the x axis lies on the image plane, parallel to the horizon (pointing rightwards); the y axis lies on the image plane, perpendicular to the horizon (pointing upwards); and the z axis is perpendicular to the image plane (according to the right-hand rule). The RCS origin is taken in the bottom left corner of the image plane.
Looking at a generic painted scene with perspective, the following four types of planes are identifiable: frontal planes, parallel to the image plane, whose geometry is not modified by the perspective view; horizontal planes, perpendicular to the image plane and whose normal is parallel to the y axis (among them it is possible to define the "main plane", corresponding to the ground or floor of the virtual 2.5D scene); vertical planes, perpendicular to the image plane and whose normal is parallel to the x axis; and oblique planes, i.e. all the remaining planes, not belonging to the previous three categories. In Fig. 6 some examples of planes detectable from the exemplificative case study are highlighted.
In detail, the identification and classification of such planes is performed using the devised GUI with a semiautomatic procedure. First, frontal and oblique planes are selected in the image by the user, simply by clicking on the appropriate segments of the clustered image. Then, since the V coordinates have been computed, a procedure for automatically classifying a subset of the remaining vertical and horizontal planes starts. In detail, for each cluster C_i, the intersections between the region contained in C_i and the horizon line are sought. If at least one intersection is found, the plane type is necessarily vertical and, in particular, vertical left if placed to the left of V and vertical right if placed to the right of V. This statement can be justified by observing that, once the frontal and/or oblique planes have been selected, only horizontal and vertical planes remain, and that no horizontal plane can cross the horizon line (they are entirely above or below it). Actually, the only horizontal plane (i.e. parallel to the ground) which may "intersect" the horizon line is the one passing through the view point (i.e. at eye level), whose representation degenerates into the horizon line itself. If no intersections with the horizon line are found, intersections between the plane and the line passing through the vanishing point and perpendicular to the horizon are sought. If an intersection of this kind is found, the plane type is necessarily horizontal (upper horizontal if above V, lower horizontal if below V). Among the detected horizontal planes, it is possible to manually select the so-called "main plane", i.e. a plane taken as a reference for a series of planes intersecting it in the scene. In most Renaissance paintings such a plane corresponds to the ground (floor). If no intersections with either the horizon line or its perpendicular are found, the automatic classification of the plane is not possible; as a consequence, the user is requested to manually specify the type of the plane via direct input. A sketch of this classification rule is given below.
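The following is our own illustrative implementation of the rule just described (pixel coordinates follow the usual image convention, with the row index growing downwards and the horizon passing through V):

    import numpy as np

    def classify_plane(mask, V):
        """Classify a (non-frontal, non-oblique) cluster as vertical or
        horizontal from its intersections with the horizon line l_h and
        with the vertical line l_v through the vanishing point V."""
        ys, xs = np.nonzero(mask)              # pixel coordinates of the cluster
        xV, yV = V
        if (ys == int(round(yV))).any():       # crosses the horizon -> vertical
            return 'vertical left' if xs.mean() < xV else 'vertical right'
        if (xs == int(round(xV))).any():       # crosses l_v -> horizontal
            return 'upper horizontal' if ys.mean() < yV else 'lower horizontal'
        return 'unknown'                       # must be classified by the user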
Once the planes have been identified and classified under one of the above mentioned categories, it is possible to build the virtual flat-layered model by assigning each plane a proper height map. In fact, as widely known, the z coordinate can be objectively represented by a gray value: a black value represents the background (z = 0), whereas a white value represents the foreground, i.e. the virtual scene element nearest to the observer (z = 1).

Since the main plane ideally extends from the foreground to the horizon line, its grayscale representation has been obtained using a gradient represented by a linear graduated ramp extending between two gray levels: the level G_0 corresponding to the nearest point p_0 of the plane in the scene (with reference to an observer) and the level G_1 corresponding to the farthermost point p_1.
Fig. 6. Some examples of planes detectable from the painted scene.
As a consequence, the generic point p ∈ [p_0, p_1] of the main plane is assigned the gray value G given by the following relationship:

G = G_0 + |p - p_0|\cdot S_{grad} \qquad (2)

where S_{grad} = -G_0 / |p_0 - V| is the slope of the linear ramp (negative, so that the gray level decreases linearly and reaches zero at the vanishing point).
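A minimal sketch of Eq. (2) for the main (ground) plane, under our simplifying assumptions that the nearest point p_0 is the lowest row of the region and that only the row coordinate matters for a horizontal plane:

    import numpy as np

    def main_plane_gradient(mask, V, G0):
        """Gray ramp of Eq. (2): G0 at the nearest point p0, decreasing
        linearly to zero at the vanishing point V."""
        ys, xs = np.nonzero(mask)
        yV = V[1]
        y0 = ys.max()                    # nearest point: lowest row of the plane
        s_grad = -G0 / abs(y0 - yV)      # (negative) slope of the linear ramp
        G = np.zeros(mask.shape)
        G[ys, xs] = G0 + np.abs(ys - y0) * s_grad
        return np.clip(G, 0.0, 1.0)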
In the 2.5D virtual scene, some planes (which, from now on, are called "touching planes") are recognizable as physically in contact with other ones. These planes share one or more visible contact points: some examples may include a human figure standing on the floor, a picture on a wall, etc. With reference to a pair of touching planes, the first one is called the master plane while the second one is called the slave plane. The difference between master and slave planes is related to the hierarchical sorting procedure described below. Since the contact points between two touching planes should share the same gray value (i.e. the same height), considering the main plane and its touching ones (slaves), it is possible to obtain the gray value of a contact point belonging to a slave plane directly from the main one (master). This can be done once the main plane has already been assigned a gradient according to Eq. (2). From the inherited gray value G_contact, the starting gray value G_0 for the slave plane is obtained as follows, in the case of frontal or vertical/horizontal planes respectively:
(a) frontal planes:

G_0 = G_{contact} \qquad (3)

(b) vertical and horizontal planes:

\Delta_{grad} = \frac{G_{contact}}{|p_{contact} - V|} \qquad (4)

G_0 = G_{contact} + |p_{contact} - p_0|\cdot\Delta_{grad} \qquad (5)

where p_contact is the contact point coordinate (p = (x, 0) for vertical planes, p = (0, y) for horizontal planes). In light of this, the grayscale gradient has to be applied first to the main plane, so that it is possible to determine the G_0 value for its slave planes. In other words, once the planes slave to the main one have been gradiented, it is possible to find the G_0 values for their own slave planes, and so forth. More in detail, assuming the master–slave relationships are available, a hierarchical sorting of the touching planes can be performed. This is possible by observing that the relations between the touching planes can be unambiguously represented by a non-ordered rooted tree, in which the main plane is the tree root node while the other touching planes are the tree leaf nodes. By computing the depth of each leaf (i.e. its distance from the root) it is possible to sort the planes according to their depth. The coordinates of the contact point p_contact for a pair of touching planes are obtained in different ways depending on their type (a code sketch of the whole gray-value propagation is given after the following list):
(a) Frontal planes touching the main plane: for the generic ith frontal plane touching the main plane, the contact point can be approximated, most of the time, by the lowest pixel h_b of the generic region C_i. Accordingly, the main plane pixel in contact with the considered region is the one below h_b, so that the entire region inherits its gray value. In some special cases, for instance when a scene object extends below the ground (e.g. a well), the contact point between the two planes may not be the lowest pixel of C_i. In this case it is necessary to specify one contact point via user input.
(b) Vertical planes touching the main plane: for the generic ith vertical plane (preliminarily classified) touching the main plane, it is necessary to determine the region's bounding trapezoid. This geometrical construction allows assigning the correct starting gray value (G_0) for the vertical plane gradient even if the plane has a non-rectangular shape (trapezoidal in perspective view). This is a common situation, also shown in Fig. 7. By observing Fig. 7b, it is clear that the leftmost pixel of the vertical planar region to be gradiented is not enough to identify the point at the same height on the main plane from which to inherit the G_0 value. In order to identify such a point, it is actually necessary to compute the vertexes of the bounding box. This step has been carried out using the approach provided in [25].
(c) Other touching planes: for every other type of touching plane whose master plane is not the main plane, it is necessary to specify both the master plane and one of the contact points shared between the two. This task is performed via user input, since there is no automatic method to univocally determine the contact point.

Fig. 7. (a) Detail of the segmented image of "The Healing of the Cripple and the Raising of Tabitha". (b) Example of a vertical left plane touching the main plane and its bounding box.
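The whole gray-value propagation of Eqs. (3)-(5) can be sketched as follows (the data layout, with dictionaries and a gray_at lookup returning the master's gray value at the contact point, is our own and purely illustrative):

    def propagate_gray(planes, tree):
        """Propagate starting gray values through the master/slave tree
        (Eqs. (3)-(5)). `planes` maps a plane name to its type, contact
        coordinate p_contact, nearest coordinate p0 and the relevant
        coordinate of V; `tree` maps each slave to its master. The main
        plane ('main', the tree root) is assumed already gradiented."""
        def depth(name):
            return 0 if name == 'main' else 1 + depth(tree[name])
        for name in sorted(tree, key=depth):   # hierarchical sorting by depth
            p = planes[name]
            g_contact = planes[tree[name]]['gray_at'](p['p_contact'])
            if p['type'] == 'frontal':                            # Eq. (3)
                p['G0'] = g_contact
            else:                                                 # Eqs. (4)-(5)
                d_grad = g_contact / abs(p['p_contact'] - p['v_coord'])
                p['G0'] = g_contact + abs(p['p_contact'] - p['p0']) * d_grad
        return planes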
Using the procedure described above, the only planes still to be gradiented are the non-touching planes and the oblique ones. For planes which are not visibly in contact with other planes, i.e. non-touching planes (e.g. birds, angels and cherubs suspended above the ground, or planes whose contact points are not visible in the scene because they are hidden by a foreground element), the G_0 value has to be manually assigned by the user by choosing, on the main plane, the gray level corresponding to the supposed spatial position of the subject.
Regarding oblique planes, the assignment of the gradient is performed by observing that a generic oblique plane can be viewed as a horizontal or vertical plane rotated around one arbitrary axis: when considering the three main axes (x, y, z), three main types of this kind can be identified. Referring to Fig. 8, the first type of oblique plane, called "A", can be expressed as a vertical plane rotated around the x axis, while the second (B) is nothing but a horizontal plane rotated around the y axis. The third type of oblique plane (C) can be expressed as either a vertical or a horizontal plane rotated around the z axis. Of course these three basic rotations can be applied consecutively, generating other types of oblique planes, whose characterization is beyond the scope of this work. Observing the same figure it becomes clear that, except for the "C" type, the vanishing point related to the oblique planes is different from the main vanishing point of the image. For this reason the way to determine the grayscale gradient is almost the same as for the other types of planes, except that the V coordinates have to be adjusted to an updated value V′ and the gradient direction has to be manually specified.
The updated formulas for computing the gray level of a generic point p belonging to any type of oblique plane are then:

S'_{grad} = -\frac{G_0}{|p_0 - V'|} \qquad (6)

G = G_0 + |p - p_0|\cdot S'_{grad} \qquad (7)

Fig. 8. Characterization of oblique planes.
It can be noticed that both the x and y coordinates of the points p, p_0 and V′ have to be taken into account when calculating the distance, since the x or y direction alone is not representative of the gradient direction when dealing with oblique planes; moreover, both for touching and non-touching oblique planes, the G_0 value has to be manually assigned by the user.
In conclusion, once all the planes have been assigned a proper gradient, the final result consists of a grayscale image corresponding to a height map (see Fig. 9). Accordingly, this representation constitutes a virtual flat-layered bas-relief.
Of course, this kind of representation performs well for flat or approximately planar regions (see Fig. 10) like, for instance, the wooden panels behind the three figures surrounding the seated woman (Tabitha). Conversely, referring to the Tabitha figure depicted in Fig. 10, the flat-layered representation is only sufficient to represent her position in the virtual scene, while her shape (e.g. face, vest etc.) needs to be reconstructed in terms of volume.
2.3. Volume reconstruction

Once the height map of the scene and the spatial distribution of the depicted figures have been drafted, it becomes necessary to define the volume of every painted subject, so that the viewer can figure out its actual quasi-three-dimensional shape. As previously stated, in order to accomplish this purpose, it is necessary to translate into shape details all the information elicited from the painting.
First, all objects resembling primitive geometry in the scene are reconstructed using a simple user-guided image processing-based procedure. The user is asked to select the clusters C_i whose final expected geometry is ascribable to a primitive, such as a cylinder or a sphere. Then, since any selected cluster represents a single blob (i.e. a region with constant pixel values), it is easy to detect its geometrical properties like, for instance, centroid, major and minor axis lengths, perimeter and area [32]. On the basis of such values, it is straightforward to discriminate between a shape that is approximately circular (i.e. a shape that has to be reconstructed in the form of a sphere) and one that is approximately rectangular (i.e. a shape that has to be reconstructed in the form of a cylinder), using widely known geometric relationships used in blob analysis. After the cluster is classified as a particularly shaped object, a gradient is consistently applied. If the object to be reconstructed is only partially visible in the scene, it is up to the user to manually classify the object. Then, for cylinders, the user shall select at least two points defining the major axis while, for spheres, he is required to select a point approximately located in the circle center and two points roughly defining the diameter. Once these inputs are provided, the grayscale gradient is automatically computed.
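An illustrative sketch of this discrimination (the moment-based measures are standard blob analysis; the thresholds are our own placeholders, not values from the paper):

    import numpy as np

    def classify_primitive(mask):
        """Discriminate a roughly circular blob (sphere) from an
        elongated one (cylinder) using simple blob measures."""
        ys, xs = np.nonzero(mask)
        area = float(mask.sum())
        # Second-order central moments give the equivalent-ellipse axes.
        cov = np.cov(np.vstack([xs, ys]).astype(float))
        evals = np.sort(np.linalg.eigvalsh(cov))[::-1]
        elongation = np.sqrt(evals[0] / max(evals[1], 1e-12))
        circularity = area / (4.0 * np.pi * evals[0])   # ~1 for a disk
        if elongation < 1.3 and circularity > 0.8:
            return 'sphere'
        if elongation > 2.0:
            return 'cylinder'
        return 'unknown'                                # left to the user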
Referring to subjects that are not reproducible using primitive geometries (e.g. Tabitha), the reconstruction is mainly performed using SFS-based techniques. This precise choice is due to the fact that the three-dimensional effect of a subject is usually realized by the artist using the chiaroscuro technique, in order to reproduce on the flat surface of the canvas the different brightness of the real shape under the scene illumination. This leads to the consideration that, in order to reconstruct the volume of a painted figure, the only useful information that can be worked out from the painting is the brightness of each pixel. Generally speaking, SFS methods prove to be effective for retrieving 3D information (e.g. a height map) from synthetic images (i.e. images obtained starting from a given normal map), while the performance of most methods applied to real-world images is still unsatisfactory [33].
Furthermore, as stated in the introductory section, paintings are hand-made artworks, and so many aspects of the painted scene (such as silhouettes and tones) are unavoidably not accurately reproduced in the image; the light direction is unknown in most cases, a diffused light is commonly painted by the artist, and the imagined surfaces are not perfectly diffusive. These drawbacks make the solution of the SFS problem even more complex with respect to real-world images. For these reasons, the present paper proposes a simplified approach where the final solution, i.e. the height map Z_final of all the subjects in the image, is obtained as a combination of three different height maps: (1) the "rough shape" Z_rough; (2) the "main shape" Z_main; and (3) the "fine details shape" Z_detail. As a consequence:

Z_{final} = k_{rough}\,Z_{rough} + k_{main}\,Z_{main} + k_{detail}\,Z_{detail} \qquad (8)
It has to be considered that, since the final solution is obtained by summing up different contributions, different simplifying assumptions, each valid for retrieving one height map, can be stated, as explained in the next pages. The main purpose of building the final surface using three different contributions is to reduce the drawbacks due to the incorrect illumination and brightness of the original scene. As explained in detail in the next sections, Z_rough is built to overcome possible problems deriving from diffused light in the scene by providing a non-flat surface; Z_main is an SFS-based reconstruction obtained using minimization techniques, known to perform well for real-world images (but flattened and over-smoothed with respect to the expected surfaces; by combining this height map with Z_rough the flattening effect is strongly reduced); Z_detail is built to reduce the smoothness of both the height maps Z_rough and Z_main.

Fig. 9. Final grayscale height map of "The Healing of the Cripple and the Raising of Tabitha".

Fig. 10. Wooden panels behind the three standing figures are sufficiently modeled using a flat representation. Conversely, it is necessary to convey the volumetric information of the Tabitha figure.
2.3.1. Basic principles of SFS
As widely known, under the hypotheses of Lambertian surfaces [22], a unique light source set far enough from the scene for the light beams to be considered parallel to each other, and negligible perspective distortion, the surface shown in a single shaded image can be retrieved by solving the "Fundamental Equation of SFS", i.e. a non-linear Partial Differential Equation (PDE) that expresses the relation between the height gradient and the image brightness:

\frac{1}{\rho}\,\Lambda(x, y)\,\sqrt{1 + |\nabla z|^2} + (l_x, l_y)\cdot\nabla z - l_z = 0 \qquad (9)

where \vec{L} = [l_x, l_y, l_z] is the unit vector opposite to the light direction, ρ is the albedo (i.e. the fraction of light reflected diffusively), Λ(x, y) is the brightness of the pixel located at the coordinates (x, y) and z(x, y) is the height of the retrieved surface.
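For reference, the forward Lambertian model inverted by Eq. (9) can be checked with a small helper (our own sketch; for a surface that exactly satisfies the model the residual vanishes everywhere, which is never the case for a painting):

    import numpy as np

    def sfs_residual(z, lam, L, rho=1.0):
        """Pointwise residual of the fundamental SFS equation (Eq. (9))
        for a height map z, an albedo-normalized brightness image lam
        and the unit vector L = [lx, ly, lz] opposite to the light."""
        zy, zx = np.gradient(z.astype(float))   # components of grad z
        lx, ly, lz = L
        return ((lam / rho) * np.sqrt(1.0 + zx**2 + zy**2)
                + (lx * zx + ly * zy) - lz)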
Among the wide range of methods for solving Eq. (9) [20,26,34,35], minimization methods are acknowledged to provide the best compromise between efficiency and flexibility, since they are able to deliver reasonable results even in the case of inaccurate input data, whether caused by an imprecise setting (e.g. ambiguous light direction) or by inaccuracies in the image brightness. As mentioned above, these issues are unavoidable when dealing with SFS starting from hand-painted images. The minimization SFS approach is based on the hypothesis that the expected surface, which should match the actual one, is (or, at least, is very close to) the minimum of an appropriate functional. Usually the functional, which represents the error to be iteratively minimized between the reconstructed surface and the expected one, is a linear combination of several contributions called "constraints".
Three main kinds of constraints can be used for solving the SFS problem, according to the scientific literature [22]: the brightness constraint, which forces the final solution to reproduce, pixel by pixel, an image as bright as the initial one; the smoothness constraint, which drives the solution towards a smooth surface; and the integrability constraint, which prevents the problem-solving process from providing surfaces that cannot be integrated (i.e. surfaces for which there is no univocal relation between normal map and height).
The brightness constraint is necessary for a correct formulation of the problem, since it is the only one based on given data; however, it is not sufficient to detect a good solution, due to the "ill-posedness" of the problem: there are indeed infinite scenes that give exactly the input image under the same light condition. For this reason it is necessary to also consider, at least, smoothness and/or integrability constraints, so that the minimization process is guided towards a more plausible solution. The downside of adding constraints is that, in the case of complex surfaces characterized by abrupt changes of slope and high frequency details, the error in the solution caused by the smoothness or integrability constraints becomes relevant. This particular effect, called the over-smoothing error, leads the resolution algorithm to produce smoother surfaces with respect to the expected one, leading to a loss in fine surface details.
Another important aspect to be taken into account, especially when dealing with paintings representing open-air scenarios, is that the possible main light source is sometimes combined with a fraction of diffusely reflected light, usually called ambient light. Consequently, the image results more artistically effective, but unfortunately is also affected by a lower contrast with respect to a single-spot illuminated scene. The main problem arising from the presence of diffused light is that, if the volumetric shape is retrieved using SFS without considering it, the result will appear too "flat" due to the smaller brightness range available in the image (depth is lost in the case of low frequency changes in slope). The higher the contribution of the diffused light with respect to spot lights, the lower the brightness range available for reconstruction and the flatter the retrieved volume.
In order to reduce this effect, a method could be to subtract a constant value L_d from each pixel of the original image Λ, so that the tone histogram of the newly obtained image Λ′ is left-shifted. As a consequence, the fundamental equation of SFS has to be rewritten using Λ′ instead of Λ, i.e.:

\rho\,(\vec{N}\cdot\vec{L}) + L_d = \Lambda' \qquad (10)
Eq. (10) could also be solved using minimization techniques. Unfortunately, the equation is mathematically verified only when dealing with real or synthetic scenes. For paintings, considering the fraction of diffused light as a constant is a mere approximation, and performing a correction of the original image Λ with such a constant value causes an unavoidable loss in image details. For this reason, a method for taking diffused ambient light into account, thus making it possible to avoid the over-flattening effect in shape reconstruction, is required. This is the reason why, in the present work, a rough shape surface is retrieved using the approach provided in the following section.
2.3.2. Rough shape Z_rough

As mentioned above, the scene painted by an artist can rarely be treated as a single-source illuminated scene. As a consequence, in order to cope with the over-flattening effect due to the possible presence of ambient light L_d, in the present work the gross volume of the final shape is obtained by using inflating and smoothing techniques applied to the silhouette of every figure represented in the image.
The proposed procedure is composed of two consecutive operations: rough inflating and successive fine smoothing. Rough inflating is a one-step operation that provides a first-tentative shape, whose height map Z_inf is, pixel by pixel, proportional to the Euclidean distance from the outline. This procedure is quite similar to the one used in commercial software packages like ArtCAM. However, as demonstrated in [26], the combination of rough inflating and successive smoothing produces better results.
Moreover, the operation is extremely fast but, unfortunately, the obtained height map Z_inf is not acceptable as it stands, appearing very irregular and indented: this inconvenience is caused by the fact that the digital image is discretized in pixels, producing an outline that appears as a broken line instead of a continuous curve. For this reason the height map Z_inf is smoothed by iteratively applying a short radius (e.g. 3 × 3) average filter to the surface height map, keeping the height value on the outline unchanged.
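A minimal sketch of the inflating-plus-smoothing procedure, assuming SciPy's Euclidean distance transform and a 3 × 3 average filter (function names and the iteration count are our own placeholders):

    import numpy as np
    from scipy.ndimage import (binary_erosion, distance_transform_edt,
                               uniform_filter)

    def rough_shape(mask, n_iter=500):
        """Rough inflating (Z_inf proportional to the distance from the
        outline) followed by iterative 3x3 average smoothing with the
        outline height kept unchanged."""
        mask = mask.astype(bool)
        z = distance_transform_edt(mask).astype(float)   # Z_inf
        outline = mask & ~binary_erosion(mask)           # silhouette pixels
        for _ in range(n_iter):
            z_s = uniform_filter(z, size=3)              # 3x3 average filter
            z_s[~mask] = 0.0                             # background stays flat
            z_s[outline] = z[outline]                    # pin the outline height
            z = z_s
        return z / (z.max() + 1e-12)                     # normalized Z_rough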
The result of this iterative procedure is a height map Z_rough defining, in its turn, a surface resembling the rough shape of the subject; the importance of this surface is twofold: first, it is one of the three contributions that, linearly combined, will provide the final surface; secondly, it is used as the initialization surface for obtaining the main shape Z_main, as explained in the next section.
In Fig. 11 the surface obtained for the Tabitha figure is depicted, showing a visual effect similar to the one obtained by inflating a balloon roughly shaped as the final object to be reconstructed.
It is important to highlight that Z_rough is built independently from the scene illumination, while still allowing a reduction of the over-flattening effect. The use of this particular method makes it possible to avoid the use of Eq. (10), thus simplifying the SFS problem, i.e. neglecting the diffuse illumination L_d. Obviously, the obtained surface results excessively smoothed with respect to the expected one. This is the reason why, as explained later, the fine detail surface is retrieved.
2.3.3. Main shape Z_main

In this phase a surface called the "main surface" is retrieved, basically using the SFS-based method described in [7], briefly recalled here to help the reader in understanding the overall procedure. Differently from the cited method, however, in this work a method for determining the unknown light source \vec{L} = [l_x, l_y, l_z] is implemented. This is a very important task when reconstructing images from paintings, since the principal illumination of the scene can only be guessed by the observer (and cannot, in any case, be empirically measured, since the scene is neither real nor synthetic).
First, a functional to be minimized is built as a linear combination of the brightness (B) and smoothing (S) constraints over the surface reconstruction domain D:
E = B + \lambda S = \sum_{i \in D}\left(\frac{1}{\rho}G_i - \vec{N}_i^{\,T}\vec{L}\right)^2 + \lambda\sum_{\{i,j\} \in D}\left\|\vec{N}_i - \vec{N}_j\right\|^2 \qquad (11)

where i is the pixel index; j is the index of a generic pixel belonging to the 4-neighborhood of pixel i; G_i is the brightness of pixel i (range [0, 1]); \vec{N}_i = [n_{i,x}, n_{i,y}, n_{i,z}] and \vec{N}_j = [n_{j,x}, n_{j,y}, n_{j,z}] are the unit length vectors normal to the (unknown) surface at positions i and j, respectively; λ is a regularization factor (weight) for the smoothness constraint.
Fig. 11. Height map Z_rough obtained using inflating and successive iterative smoothing on the Tabitha figure.
Thanks to the proper choice of the minimization unknowns, the functional results in a quadratic form. As a consequence its gradient is linear, and the minimization process can be carried out indirectly by optimizing the functional gradient itself (Eq. (12)), using the Gauss–Seidel with Successive Over-Relaxation (SOR) iterative method, which is proved to allow a very fast convergence to the optimized solution [20,24]:

\min_U E = \min_U\left(\frac{1}{2}U^T A U + U^T b + c\right) \;\Rightarrow\; \nabla(E) = AU + b = 0,
\quad U = [n_{1,x},\ldots,n_{k,x},\,n_{1,y},\ldots,n_{k,y},\,n_{1,z},\ldots,n_{k,z}]^T \qquad (12)

where k is the overall number of pixels defining the shape to be reconstructed.
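The linear-algebra core of this step can be sketched as a plain Gauss-Seidel/SOR sweep on \nabla(E) = AU + b = 0 (a dense, didactic version written by us; the actual system is sparse, and the normal components collected in U are re-normalized to unit length afterwards):

    import numpy as np

    def sor_solve(A, b, omega=1.8, n_iter=2000, tol=1e-8):
        """Gauss-Seidel with Successive Over-Relaxation for the linear
        system A U + b = 0 arising from the quadratic functional."""
        n = len(b)
        U = np.zeros(n)
        for _ in range(n_iter):
            max_delta = 0.0
            for i in range(n):
                # Gauss-Seidel value using the freshest entries of U ...
                gs = (-b[i] - A[i, :i] @ U[:i]
                      - A[i, i + 1:] @ U[i + 1:]) / A[i, i]
                delta = omega * (gs - U[i])    # ... over-relaxed by omega
                U[i] += delta
                max_delta = max(max_delta, abs(delta))
            if max_delta < tol:                # converged
                break
        return U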
However, Eq. (12) can be built only once the vector \vec{L} = [l_x, l_y, l_z] has been evaluated. For this reason the devised GUI implements a simple user-driven tool whose aim is to quickly and easily detect the light direction in the image. Although several approaches exist to cope with light direction determination in real-world or synthetic scenes [36], these are not adequate for discriminating the scene illumination of painted subjects. Accordingly, a user-guided procedure has been set up.
In particular, the user can regulate the light unit-vector components along the x, y and z axes by moving the corresponding sliders on a dedicated GUI (see Fig. 12), while the shading generated by such an illumination is displayed on a spherical surface. If a shape in the painting is locally approximated by a sphere, then the shading obtained on the GUI needs to be as close as possible to the one in the picture. This procedure makes it easier for the user to properly set the light direction. Obviously this task requires the user to guess the scene illumination on the basis of an analogy with the illumination distribution on a known geometry.
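The reference shading displayed by the tool can be reproduced in a few lines (our own sketch of Lambertian shading on a unit sphere; the GUI simply redraws it whenever a slider moves):

    import numpy as np

    def sphere_shading(L, size=201):
        """Lambertian shading of a unit sphere under the light unit
        vector L = [lx, ly, lz], with the y axis pointing upwards."""
        L = np.asarray(L, dtype=float)
        L /= np.linalg.norm(L)
        t = np.linspace(-1.0, 1.0, size)
        x, y = np.meshgrid(t, -t)                  # image axes, y pointing up
        r2 = x**2 + y**2
        z = np.sqrt(np.clip(1.0 - r2, 0.0, None))  # visible hemisphere
        shade = np.clip(x * L[0] + y * L[1] + z * L[2], 0.0, 1.0)  # N . L
        shade[r2 > 1.0] = 0.0                      # outside the sphere
        return shade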
Once the matrix form of the functional has been correctly built, it is possible to set different kinds of boundary conditions. This step is the crucial point of the whole SFS procedure, since it is the only operation that allows the user to properly guide the automatic evaluation of the geometry itself. Two main kinds of boundary conditions have to be set: the first drives the reconstruction by fixing the unknowns on the silhouette outline (Silhouette Boundary Condition, SBC), while the second fixes these unknowns on the outlines of selected white areas (Morphological Boundary Condition, MBC); both are performed by interactively setting the local maxima or minima height points of the expected surface [37].
More in depth, when the subject to be reconstructed is clearly detached from the background, the SBC allows discriminating between a concave and a convex global shape. This is obtained by imposing the unit-normal pointing inward or outward with respect to the silhouette itself. In addition, since the background unknowns are not included in the surface reconstruction domain D, they are automatically forced to lie on the z-axis, as expressed in Eq. (13):

\vec{N}_{b \notin D} = [0, 0, 1]^T \qquad (13)
The MBC, instead, plays a fundamental role in locally overcoming the concave–convex ambiguity. In particular, users are required to specify, for a number of white regions in the original image (possibly for all of them), which ones correspond to local maxima or minima (in terms of surface height), figuring out the final shape as seen from an observer located at the light source. Once such points are selected, the algorithm provided in [99] is used to evaluate the normals to be imposed.
In addition, the unknowns \vec{N}_w coinciding with white pixels w included in D are automatically set equal to \vec{L}:

\forall\, w \in D \mid \Lambda_w = 1 \;\Rightarrow\; \vec{N}_w = \vec{L} \qquad (14)
At the end of this procedure, the matrix formulation of the problem results modified and reduced, properly guiding the successive minimization phase:

\nabla(E_r) = A_r\hat{U} + b_r \qquad (15)

where E_r, A_r and b_r are, respectively, reduced versions of E, A and b.
Fig. 12. GUI implemented to set the light unit-vector components along axes x, y and z.
The minimization procedure can provide a more reliable solution if the iterative process is guided by an initial guess of the final surface. In such terms, the surface obtained from Z_rough is effectively used as the initialization surface. Once the optimal normal map has been evaluated, the height map Z_main is obtained by minimizing the difference between the relative height z_i − z_j of adjacent pixels and a specific value q_{ij}, which expresses the same relative height calculated by fitting an osculating arc between the two unit-normals:

E_2 = \sum_{\{i,j\}}\left((z_i - z_j) - q_{ij}\right)^2 \qquad (16)
The final result of this procedure (see Fig. 13) consists of a surface Z_main roughly corresponding to the original image without taking diffuse light into account. As already stated above, such a surface results excessively flat with respect to the expected one. This is, however, an expected outcome since, as reaffirmed above, the over-flattening is corrected by mixing together Z_main and Z_rough.
In other words, combining Z_main with Z_rough is coarsely equivalent to solving the SFS problem using both diffuse illumination and principal illumination, thus allowing the reconstruction of a surface effectively resembling the original (imagined) shape of the reconstructed subject and avoiding over-flattening. The advantage here is that no evaluation of the diffused light is required, since Z_rough is built independently from the light in the image.
Obviously, the values of k_rough and k_main have to be properly set in order to balance the effect of inflating (and smoothing) with the effect of the SFS-based reconstruction. Moreover, the finest details in the scene are not reproduced, since both techniques tend to smooth down the surface (the smoothing procedure in retrieving Z_rough, plus the over-smoothing due to the presence of the smoothness constraint in the Z_main SFS-based reconstruction). This is the reason why another height map (and derived surface) has to be computed, as explained in the next section.
Fig. 13. Height map Zmain obtained for the Tabitha figure by solving the SFS problem with the minimization algorithm and using Zrough as the initialization surface.

2.3.4. Fine details shape Zdetail

Inspired by techniques commonly available both in commercial software packages and in literature works [38], a simple but efficient way to
reproduce the fine details is to consider the brightness of
the input image as a height map (see Fig. 14). This means
that the Zdetail height map is provided by the following
equation:
$Z_{detail}(i,j) = \frac{1}{\rho_k}\, Y(i,j)$    (17)

where:
– $Z_{detail}(i,j)$ is the height map value at the pixel of coordinates (i, j);
– $Y(i,j)$ is the brightness value of the pixel (i, j) in the image Y;
– $\rho_k$ is the albedo value of the segment k of the image to which the pixel belongs.
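A minimal sketch of Eq. (17), assuming a grayscale image already normalized to [0, 1] and a hypothetical per-pixel segment label map with known albedos:

```python
import numpy as np

def detail_from_brightness(Y, labels, albedo):
    """Eq. (17): Z_detail(i,j) = Y(i,j) / rho_k, with k the segment of pixel (i,j).

    Y      : (H, W) float array, brightness in [0, 1]
    labels : (H, W) int array, segment index k of each pixel
    albedo : 1D array-like mapping segment index k -> albedo rho_k
    """
    rho = np.asarray(albedo)[labels]   # per-pixel albedo rho_k
    return Y / rho

# Toy usage: two segments with different albedos.
Y = np.random.rand(48, 48)
labels = np.zeros((48, 48), int)
labels[:, 24:] = 1
Zdetail = detail_from_brightness(Y, labels, albedo=[0.8, 0.5])
```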
The obtained height map is not the actual one, since object details perpendicular to the scene's principal (painted) light will appear, in this reconstruction, closer to the observer even when they are not; moreover, the convexity and concavity of the subjects are not discerned. However, the obtained shape visually resembles the desired one and thus can be used to improve the overall surface retrieval.
As an alternative, the height map Zdetail can be obtained from the magnitude of the image gradient according to Eq. (18), which, for a given pixel with coordinates (i, j), can be approximately computed, for instance, with reference to its 3 × 3 neighborhood using the discrete image gradient:

$Z_{detail}(i,j) = \left| \nabla \left( \frac{1}{\rho_k}\, Y \right) \right|_{(i,j)}$    (18)

where:

$\left| \nabla(\cdot) \right| = \sqrt{\nabla_x^2 + \nabla_y^2}$    (19)

and the two partial derivatives are approximated, for instance, by convolving the image with 3 × 3 Prewitt-like kernels:

$\nabla_x \cong \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} * Y \quad \text{and} \quad \nabla_y \cong \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix} * Y$    (20)
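The gradient-based variant of Eqs. (18)–(20) can be sketched as follows, using convolution kernels analogous to Eq. (20); the sign convention of the kernels is irrelevant for the magnitude in Eq. (19), and the inputs are again stand-ins:

```python
import numpy as np
from scipy.ndimage import convolve

def detail_from_gradient(Y, labels, albedo):
    """Eqs. (18)-(20): Z_detail = |grad(Y / rho_k)| via 3x3 convolution kernels."""
    rho = np.asarray(albedo)[labels]
    f = Y / rho
    kx = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], float)    # horizontal derivative, as in Eq. (20)
    ky = kx.T                             # vertical derivative (sign does not
                                          # affect the magnitude below)
    gx = convolve(f, kx, mode='nearest')
    gy = convolve(f, ky, mode='nearest')
    return np.hypot(gx, gy)               # Eq. (19): sqrt(gx^2 + gy^2)
```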
Fig. 14. Height map Zdetail obtained for the Tabitha figure using the original grayscale image as a height map.

Fig. 15. Height map Zdetail obtained for the Tabitha figure using the gradient of the original image as a height map.
It should be noticed that not even the height map obtained by Eq. (18) is correct: the brighter (hence more in relief) regions are the ones where a sharp variation occurs in the original image, and these do not necessarily correspond to the regions actually closer to the observer (Fig. 15). However, also in this case the retrieved surface visually resembles the desired one; for this reason it can be used to improve the surface reconstruction.

Accordingly, the height map obtained using either of the two formulations provided in Eqs. (17) and (18) is worthwhile only when its weight in Eq. (8) is set to a value lower than 0.05. In fact, the overall result obtained by combining Zdetail with Zmain and Zrough using such a weight resembles the desired surface much more realistically.
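Assuming, purely for illustration, that Eq. (8) blends the three height maps as a weighted sum (the exact formulation is given earlier in the paper), the role of the weights, and of the below-0.05 detail weight in particular, can be sketched as follows:

```python
import numpy as np

def combine_surfaces(Z_rough, Z_main, Z_detail,
                     k_rough=0.5, k_main=0.5, k_detail=0.04):
    """Hypothetical weighted blend in the spirit of Eq. (8).

    Z_detail is kept at a low weight (< 0.05) so that fine details modulate
    the surface without overriding the coarse volumes.
    """
    assert k_detail < 0.05, "detail weight should stay below 0.05"

    def norm(Z):
        # Normalize each map to [0, 1] so the weights are comparable.
        return (Z - Z.min()) / (Z.max() - Z.min() + 1e-12)

    return (k_rough * norm(Z_rough)
            + k_main * norm(Z_main)
            + k_detail * norm(Z_detail))
```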
2.4. Virtual bas-relief reconstruction
Using the devised GUI it is possible to interactively combine the contributions of the perspective-based reconstruction, the SFS-based surface, and the rough and fine detail surfaces. In particular, it is possible to choose the optimal weights for the different components and to assess how much relief is given to each object with respect to the others. Users are allowed to select the weights by means of sliders, showing in real time the aspect of the final surface (see Fig. 16). In this phase, the system prevents the user from generating, for a given object, a surface whose relief exceeds the one of other objects closer to the observer.

In Fig. 17 the virtual 2.5D model (bas-relief) obtained using the devised procedure is depicted. In order to appreciate the differences between the perspective geometry-based reconstruction and the volume-based one, the model in Fig. 17 is split into two parts. On the left, the flattened bas-relief obtained from the original image is shown, illustrating the position of the subjects in the scene but without volumes. On the right, the complete reconstruction is provided (both the position in the virtual scene and the volumetric information).

Fig. 16. GUI devised for interactively combining the contributions from the perspective-based reconstruction, SFS, inflated and fine detail surfaces.

Fig. 17. 2.5D virtual model obtained by applying the proposed methodology to the image of Fig. 2. On the left part of the model, the flattened bas-relief obtained from the original image is shown; on the right part the complete reconstruction is provided.
2.5. Rapid prototyping of the virtual bas-relief
Once the final surface has a satisfying quality, the procedure allows the user to produce an STL file, ready to be manufactured with rapid prototyping techniques or CNC milling machines.

In Fig. 18, the CNC-prototyped model of "The Healing of the Cripple and the Raising of Tabitha" is shown. The physical prototype measures about 900 mm × 400 mm × 80 mm. In the same figure, a detail of the final prototype is also shown.

Fig. 18. Prototype of "The Healing of the Cripple and the Raising of Tabitha" obtained by using a CNC milling machine.
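For completeness, a height map can be exported as a triangulated mesh in a few lines. The sketch below writes an ASCII STL directly, with a placeholder facet normal that most downstream tools recompute; the file name and scales are hypothetical, and the authors' actual export pipeline may differ.

```python
import numpy as np

def heightmap_to_stl(Z, path, xy_scale=1.0):
    """Write a height map as an ASCII STL, two triangles per pixel quad."""
    H, W = Z.shape
    with open(path, 'w') as f:
        f.write('solid basrelief\n')
        for i in range(H - 1):
            for j in range(W - 1):
                # Four corners of the quad (x, y, z), then two triangles.
                p = [(j * xy_scale, i * xy_scale, Z[i, j]),
                     ((j + 1) * xy_scale, i * xy_scale, Z[i, j + 1]),
                     (j * xy_scale, (i + 1) * xy_scale, Z[i + 1, j]),
                     ((j + 1) * xy_scale, (i + 1) * xy_scale, Z[i + 1, j + 1])]
                for tri in ((p[0], p[1], p[2]), (p[1], p[3], p[2])):
                    # Placeholder normal; most slicers recompute it anyway.
                    f.write(' facet normal 0 0 1\n  outer loop\n')
                    for v in tri:
                        f.write('   vertex %.6f %.6f %.6f\n' % v)
                    f.write('  endloop\n endfacet\n')
        f.write('endsolid basrelief\n')

# Toy usage: a small bump exported for prototyping (heights in mm).
Z = np.outer(np.hanning(50), np.hanning(50)) * 8.0
heightmap_to_stl(Z, 'basrelief.stl', xy_scale=1.0)
```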
3. Case studies
The devised method was widely shared with both the
Italian Union of Blind and Visually Impaired People in Florence (Italy) and experts working in the Cultural Heritage
field, with particular mention to experts from the Musei
Civici Fiorentini (Florence Civic Museums, Italy) and from
Villa la Quiete (Florence, Italy). According to their suggestions, authors realized a wide range of bas-reliefs of wellknown artworks of the Italian Renaissance including ‘‘The
Annunciation’’ of Beato Angelico (see Fig. 19) permanently
displayed at the Museo di San Marco (Firenze, Italy), some
figures from the ‘‘Mystical marriage of Saint Catherine’’ by
Ridolfo del Ghirlandaio (see Fig. 20) and the ‘‘Madonna
with Child and angels’’ by Niccolò Gerini (see Fig. 21), both
displayed at Villa La Quiete (Firenze, Italy).
4. Discussion
In cooperation with the Italian Union of Blind and Visually Impaired People in Firenze (Italy), a panel of 14 users with a total visual deficit (8 congenital, from now on CB, and 6 acquired, from now on AB), split into two age groups (25–47 years, 7 users, and 54–73 years, 7 users), was selected for testing the realized models, using the same approach described in [4]. The testing phase was performed by a professional specifically trained to guide people with visual impairments in tactile exploration.
For this purpose, after a brief description of the general cultural context, the pictorial language of each artist and the basic concept of perspective view (i.e. a description of how human vision perceives objects in space), the interviewees were asked to imagine the position of the subjects in the 3D space and their shape on the basis of a two-level tactile exploration: completely autonomous and guided. In the first case, people from the panel were asked to provide a description of the perceived subjects and their position in space (both absolute and mutual) and to guess their
shape after a tactile exploration without any interference from the interviewer and without any limitation in terms of exploration time and modalities. In the second phase, the expert provided a description of the painted scene, including the subjects and their mutual position in the (hypothetical) 3D space. After this explanation, the panel was asked to identify again the described subjects and their position in the virtual 2.5D space.

Fig. 19. Prototype of "The Annunciation" of Beato Angelico permanently placed on the upper floor of the San Marco Museum (Firenze), next to the original fresco.

Fig. 20. Prototype resembling the Maddalena and the Child figures retrieved from the "Mystical marriage of Saint Catherine" by Ridolfo del Ghirlandaio.
Considering the four tactile bas-relief models, about 50% of AB and 37.5% of CB users were able to perceive the main subjects of the scene in the correct mutual position during the completely autonomous exploration. As a consequence, about 43% of the panel was able to properly understand the reconstructed scene. Moreover, after the verbal description provided in the second exploration phase, 86% of the panel proved capable of providing a sufficiently clear description of the touched scene.
Furthermore, after the second exploration phase, a closed-ended question ("why is the model considered readable?") was administered to the panel; the available answers were: (1) "better perception of depth", (2) "better perception of shapes", (3) "better perception of the subjects' mutual position", (4) "better perception of details".
The results, depicted in Fig. 22, depend on the typology of visual disability. AB people believed that the reconstructed model allows, primarily, a better perception of position (50%) and, secondarily, a better perception of shapes (33%). Quite the reverse, as many as 50% of the CB people declared that they better perceived the shapes.
Despite the complexity of identifying painted subjects, and their position, from a tactile bas-relief, and knowing that a single panel of 14 people is not sufficient to provide reliable statistics on this matter, it can be qualitatively stated that perspective-based 2.5D reconstructions of painted images are, in any case, an interesting attempt at enhancing the artworks experience of blind and visually impaired people. Accordingly, deeper work in this field is highly recommended.
Fig. 21. Prototype realized for ‘‘Madonna with Child and angels’’ by Niccolò Gerini.
Fig. 22. Percentages of AB (blue) and CB people (red) related to the reasons of their preference. (For interpretation of the references to color in this figure
legend, the reader is referred to the web version of this article.)
Table 1
Typical issues of SFS applied to hand drawings and proposed solutions.

Issue 1
Cause: Presence in the scene of a painted (i.e. guessed by the artist) diffuse illumination.
Effect using SFS methods: Flattened final surface.
Solution: Retrieval of a rough surface Zrough obtained as the result of an inflating and smoothing iterative procedure. This allows solving the SFS problem using only the principal illumination.
Possible drawbacks: Loss of details in the reconstructed surfaces; too much emphasis on coarse volumes.

Issue 2
Cause: Incorrect shading of objects due to the artistic reproduction of subjects in the scene.
Effect using SFS methods: Errors in shape reconstruction.
Solution: Retrieval of a surface Zmain based on a modified SFS method using minimization and a set of boundary conditions robust to the possible errors in shading.
Possible drawbacks: If the shading of painted objects is grossly represented by the artist, the reconstruction may appear unfaithful.

Issue 3
Cause: Incorrect scene principal illumination.
Effect using SFS methods: Combined with the incorrect shading, this leads to errors in shape reconstruction.
Solution: Use of an empirical method for determining, approximately, the illumination vector. Combining Zmain with Zrough is equivalent to solving the SFS problem using both the diffuse illumination and the principal illumination.
Possible drawbacks: Solving the method with more than one principal illumination leads to incorrect reconstructions.

Issue 4
Cause: Loss of details due to excessive inflating and to the over-smoothing effect in the SFS-based reconstruction (e.g. when using high values of the smoothness constraint).
Effect using SFS methods: None.
Solution: Use of a refinement procedure allowing finer details of the reconstructed objects to be taken into account.
Possible drawbacks: None.
5. Conclusions
The present work presented a user-guided, orderly procedure meant to provide 2.5D tactile models starting from single images. The provided method deals with a number of complex issues typically arising from the analysis of painted scenes: imperfect brightness rendered by the artist in the scene, incorrect shading of subjects, incorrect diffuse light, inconsistent perspective geometry. In full awareness of these shortcomings, the proposed method was intended to provide a robust reconstruction by using a series of reasoned assumptions while at the same time being tolerant of imperfect reconstruction results. As depicted in Table 1, most of the problems arising in 2.5D reconstruction starting from painted images are confronted and, at least partially, solved using the proposed method. In these terms, the method is intended to add more knowledge, and tools, dedicated to the simplification of the 2D to 2.5D translation of artworks, making more masterpieces available to visually impaired people and allowing a decrease of the final costs for the creation of tactile paintings.
With the aim of encouraging further work in this field, a number of open issues and limitations can be drafted. Firstly, since the starting point of the method consists of image clustering, the development of an automatic and accurate segmentation of painting subjects could dramatically improve the proposed method; extensive testing to assess the performance of state-of-the-art algorithms applied to paintings could also be helpful for speeding up this phase. Secondly, the reconstruction of morphable models of parts commonly represented in paintings (such as hands, limbs or even entire human bodies) could be used to facilitate the relief generation. Using such models could avoid solving complex SFS-based algorithms for at least a subset of subjects. Some improvements could also be obtained (1) by developing algorithms aimed at automatically suppressing/compressing useless parts (for instance the part of the scene comprised between the background and the farthest relevant figure which needs to be modeled in detail) and (2) by performing automatic transitions between adjacent areas (segments) with appropriate constraints (e.g. tangency). The generation of slightly undercut models, to facilitate the comprehension/recognition of the represented objects and to make the exploration more enjoyable, is another open issue to be well thought out in the near future. Finally, future work could be addressed to embrace possible inputs and suggestions from neuroscientists in order to make the 2.5D reproductions more "effective", so that it is possible to improve the "aesthetic experience" of the end-users. Moreover, the possibility of enriching the tactile exploration experience by developing systems capable of tracking the user's fingers and providing real-time audio feedback could be investigated in the near future.
Acknowledgments
The authors wish to acknowledge the valuable contribution of Prof. Antonio Quatraro, President of the Italian Union of Blind People (UIC), Florence (Italy), in helping the authors with the selection of artworks and with the assessment of the final outcome. The authors also wish to thank the Tuscany Region (Italy) for co-funding the T-VedO project (PAR-FAS 2007-2013), which originated and made possible this research, the Fondo Edifici di Culto – FEC of the Italian Ministry of Interior (Florence, Italy) and the Carmelite Community.
Appendix A. Supplementary material
Supplementary data associated with this article can be
found, in the online version, at http://dx.doi.org/10.1016/
j.gmod.2014.10.001.
References
[1] R.L. Klatzky, S.J. Lederman, C.L. Reed, There’s more to touch than
meets the eye: the salience of object attributes for haptics with and
without vision, J. Exp. Psychol. Gen. 116 (1987) 356–369.
[2] A. Streri, E.S. Spelke, Haptic perception of objects in infancy, Cogn.
Psychol. 20 (1) (1988) 1–23.
[3] A. Reichinger, M. Neumüller, F. Rist, S. Maierhofer, W. Purgathofer, Computer-aided design of tactile models – taxonomy and case studies, in: K. Miesenberger, A. Karshmer, P. Penaz, W. Zagler (Eds.), Computers Helping People with Special Needs, Lecture Notes in Computer Science, vol. 7383, Springer, Berlin/Heidelberg, 2013, pp. 497–504.
[4] M. Carfagni, R. Furferi, L. Governi, Y. Volpe, G. Tennirelli, Tactile
representation of paintings: an early assessment of possible
computer based strategies, Lecture Notes in Computer Science
(Including Subseries Lecture Notes in Artificial Intelligence and
Lecture Notes in Bioinformatics), 7616 LNCS, 2012. pp. 261–270.
[5] L. Thompson, E. Chronicle, Beyond visual conventions: rethinking the
design of tactile diagrams, Br. J. Visual Impairment 24 (2) (2006) 76–
82.
[6] S. Oouchi, K. Yamazawa, L. Secchi, Reproduction of tactile paintings
for visual impairments utilized three-dimensional modeling system
and the effect of difference in the painting size on tactile perception,
Computers Helping People with Special Needs, Springer, 2010. pp.
527–533.
[7] Y. Volpe, R. Furferi, L. Governi, G. Tennirelli, Computer-based
methodologies for semi-automatic 3D model generation from
paintings, Int. J. Comput. Aided Eng. Technol. 6 (1) (2014) 88–112.
[8] Y. Horry, K. Anjyo, K. Arai, Tour into the picture: using a spidery
mesh interface to make animation from a single image, in:
Proceedings of the 24th Annual Conference on Computer Graphics
and Interactive Techniques, ACM Press/Addison-Wesley, 1997.
[9] D. Hoiem, A.A. Efros, M. Hebert, Automatic photo pop-up, ACM Trans. Graphics (TOG) 24 (3), ACM, 2005.
[10] J. Wu, R.R. Martin, P.L. Rosin, X.-F. Sun, Y.-K. Lai, Y.-H. Liu, C. Wallraven, Use of non-photorealistic rendering and photometric stereo in making bas-reliefs from photographs, Graphical Models 76 (4) (2014) 202–213. http://dx.doi.org/10.1016/j.gmod.2014.02.002.
[11] J. Wu, R. Martin, P. Rosin, X.-F. Sun, F. Langbein, Y.-K. Lai, A. Marshall, Y.-H. Liu, Making bas-reliefs from photographs of human faces, Comput. Aided Des. 45 (3) (2013) 671–682.
[12] Z.K. Huang, X.W. Zhang, W.Z. Zhang, L.Y. Hou, A new embossing method for gray images using Kalman filter, Appl. Mech. Mater. 39 (2011) 488–491.
[13] W. Song, A. Belyaev, H.-P. Seidel, Automatic generation of bas-reliefs from 3D shapes, in: IEEE International Conference on Shape Modeling and Applications (SMI '07), IEEE, 2007.
[14] A. Sourin, Functionally based virtual computer art, in: Proceedings of the 2001 Symposium on Interactive 3D Graphics, ACM, 2001.
[15] Z. Li, S. Wang, J. Yu, K.-L. Ma, Restoration of brick and stone relief from single rubbing images, IEEE Trans. Visualization Comput. Graphics 18 (2012) 177–187.
[16] M. Wang, J. Chang, J.J. Zhang, A review of digital relief generation techniques, in: 2010 2nd International Conference on Computer Engineering and Technology (ICCET), IEEE, 2010, pp. 198–202.
[17] T. Weyrich et al., Digital bas-relief from 3D scenes, ACM Trans. Graphics (TOG) 26 (3), ACM, 2007.
[18] J. Kerber, Digital Art of Bas-relief Sculpting, Master's Thesis, Univ. of Saarland, Saarbrücken, Germany, 2007.
[19] A. Reichinger, S. Maierhofer, W. Purgathofer, High-quality tactile paintings, J. Comput. Cult. Heritage 4 (2) (2011), Art. No. 5.
[20] P. Daniel, J.-D. Durou, From deterministic to stochastic methods for shape from shading, in: Proc. 4th Asian Conf. on Comp. Vis., Citeseer, 2000, pp. 1–23.
[21] L. Di Angelo, P. Di Stefano, Bilateral symmetry estimation of human face, Int. J. Interactive Design Manuf. (IJIDeM) (2012) 1–9. http://dx.doi.org/10.1007/s12008-012-0174-8.
[22] J.-D. Durou, M. Falcone, M. Sagona, Numerical methods for shape-from-shading: a new survey with benchmarks, Comput. Vis. Image Underst. 109 (2008) 22–43.
[23] R.T. Frankot, R. Chellappa, A method for enforcing integrability in shape from shading algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 10 (4) (1988) 439–451.
[24] T.-P. Wu, J. Sun, C.-K. Tang, H.-Y. Shum, Interactive normal reconstruction from a single image, ACM Trans. Graphics (TOG) 27 (2008) 119.
[25] R. Furferi, L. Governi, N. Vanni, Y. Volpe, Tactile 3D bas-relief from single-point perspective paintings: a computer based method, J. Inform. Comput. Sci. 11 (16) (2014) 1–14.
[26] L. Governi, M. Carfagni, R. Furferi, L. Puggelli, Y. Volpe, Digital bas-relief design: a novel shape from shading-based method, Comput. Aided Des. Appl. 11 (2) (2014) 153–164.
[27] L.G. Brown, A survey of image registration techniques, ACM Comput. Surveys (CSUR) 24 (4) (1992) 325–376.
[28] M.W. Schwarz, W.B. Cowan, J.C. Beatty, An experimental comparison of RGB, YIQ, LAB, HSV, and opponent color models, ACM Trans. Graphics (TOG) 6 (2) (1987) 123–158.
[29] E. Nadernejad, S. Sharifzadeh, H. Hassanpour, Edge detection techniques: evaluations and comparisons, Appl. Math. Sci. 2 (31) (2008) 1507–1520.
[30] W.A. Barrett, E.N. Mortensen, Interactive live-wire boundary extraction, Med. Image Anal. 1 (4) (1997) 331–341.
[31] C. Brauer-Burchardt, K. Voss, Robust vanishing point determination in noisy images, in: Proceedings 15th International Conference on Pattern Recognition, vol. 1, IEEE, 2000.
[32] J.R. Parker, Algorithms for Image Processing and Computer Vision, John Wiley & Sons, 2010.
[33] O. Vogel, L. Valgaerts, M. Breuß, J. Weickert, Making shape from shading work for real-world images, in: Pattern Recognition, Springer, Berlin Heidelberg, 2009, pp. 191–200.
[34] R. Huang, W.A.P. Smith, Structure-preserving regularisation constraints for shape-from-shading, Comput. Anal. Images Patterns, Lecture Notes Comput. Sci. 5702 (2009) 865.
[35] P.L. Worthington, E.R. Hancock, Needle map recovery using robust regularizers, Image Vis. Comput. 17 (8) (1999) 545–557.
[36] C. Wu et al., High-quality shape from multi-view stereo and shading under general illumination, in: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2011.
[37] L. Governi, R. Furferi, L. Puggelli, Y. Volpe, Improving surface reconstruction in shape from shading using easy-to-set boundary conditions, Int. J. Comput. Vision Robot. 3 (3) (2013) 225–247.
[38] K. Salisbury et al., Haptic rendering: programming touch interaction with virtual objects, in: Proceedings of the 1995 Symposium on Interactive 3D Graphics, ACM, 1995.