Smart camera

From Wikipedia, the free encyclopedia

Although many definitions of smart cameras have been offered by the media, camera manufacturers and developers, no binding definition exists. In a field where terms are often defined by their predominant usage, most material in this article is based on the term's most predominant usage. In the book Smart Cameras,[1] a smart camera is defined as a vision system which, in addition to image capture circuitry, is capable of extracting application-specific information from the captured images, along with generating event descriptions or making decisions that are used in an intelligent and automated system.

A smart camera or "intelligent camera" is a self-contained, standalone vision system with built-in image sensor
in the housing of an industrial video camera. It contains all necessary communication interfaces, e.g. Ethernet,
as well as industry-proof 24V I/O lines for connection to a PLC, actuators, relays or pneumatic valves. It is not
necessarily larger than an industrial or surveillance camera. "Having" a capability in Machine Vision generally
means a degree of development such that these capabilities are ready for use on individual applications.

This architecture has the advantage of a more compact volume compared to PC-based vision systems and
often achieves lower cost, at the expense of a somewhat simpler (or missing altogether) user interface.

Although often used for simpler applications, modern smart cameras can rival PCs in terms of processing power and functionality. Smart cameras have been marketed since the mid-1980s, but only in recent years have they reached widespread use, once technology allowed their size to be reduced while their processing power reached several thousand MIPS (devices with 1 GHz processors and up to 8000 MIPS were available as of the end of 2006).

Having a dedicated processor in each unit, smart cameras are especially suited for applications where several
cameras must operate independently and often asynchronously, or when distributed vision is required (multiple
inspection or surveillance points along a production line or within an assembly machine).

Early smart camera (ca. 1985, in red) with an 8 MHz Z80, compared to a modern device featuring Texas Instruments' C64 at 1 GHz.
A smart camera usually consists of several (but not necessarily all) of the following components:

• Image sensor (matrix or linear, CCD or CMOS)
• Image digitization circuitry
• Image memory
• Processor (often a DSP or other suitably powerful processor)
• Program and data memory (RAM, non-volatile flash)
• Communication interface (RS-232, Ethernet)
• I/O lines (often optoisolated)
• Lens holder or built-in lens (usually C-, CS- or M-mount)
• Built-in illumination device (usually LED)
• Purpose-developed real-time operating system (for example, VCRT)

A video output (e.g. VGA or SVGA) may be an option for a Smart Camera.

Fields of application

Smart cameras can in general be used for the same kinds of applications where more complex vision systems are used, and can additionally be applied in some applications where volume, pricing or reliability constraints forbid the use of bulkier devices and PCs.

Typical fields of application are:

• automated inspection for quality assurance (detection of defects, flaws, missing parts, etc.)
• non-contact measurements
• part sorting and identification
• code reading and verification (barcode, Data Matrix, alphanumeric, etc.)
• web inspection (inspection of continuously flowing materials such as coils, tubes, wires, extruded plastic) for defect detection and dimensional gauging
• detection of position and rotation of parts for robot guidance and automated picking
• unattended surveillance (detection of intruders, fire or smoke detection)
• biometric recognition and access control (face, fingerprint, iris recognition)
• visual sensor networks
• robot guidance
• nearly any machine vision application

Developers can purchase smart cameras and develop their own programs for special, custom-made applications, or they can purchase ready-made application software from the camera manufacturer or from third-party sources. Custom programs can be developed in various programming languages (typically C or C++) or with more intuitive, albeit somewhat less flexible, development tools in which existing functionalities (often called tools or blocks) are connected in a list (a sequence or a two-dimensional flowchart) that describes the desired flow of operations without any need to write program code. The main advantage of the visual approach over programming is a much shorter and somewhat easier development process that is also accessible to non-programmers. Other development tools are available with relatively few but comparatively high-level functionalities, which can be configured and deployed with very limited effort.
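As a rough illustration of the block-chaining idea, the Python sketch below wires a few hypothetical "tool blocks" into a per-frame inspection sequence. The block names, the acquire() stub and the pass/fail thresholds are invented for this example and do not correspond to any vendor's toolkit.

    import numpy as np

    def acquire():
        """Stand-in for the camera's capture circuitry: returns a grayscale frame."""
        return np.random.randint(0, 256, (480, 640), dtype=np.uint8)

    def threshold_block(image, level=128):
        """First 'tool' in the sequence: binarize the image."""
        return image > level

    def blob_count_block(mask):
        """Very crude feature count: 0-to-1 transitions per row, summed.
        A real tool would use true connected-component labeling."""
        transitions = np.diff(mask.astype(np.int8), axis=1) == 1
        return int(transitions.sum())

    def decision_block(count, expected=50, tolerance=10):
        """Pass/fail decision that would normally drive a 24 V output line."""
        return abs(count - expected) <= tolerance

    # The "flowchart": an ordered list of operations applied to each frame.
    pipeline = [threshold_block, blob_count_block, decision_block]

    result = acquire()
    for block in pipeline:
        result = block(result)
    print("inspection passed" if result else "inspection failed")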

Smart cameras running software tailored for a single specific application are often called "vision sensors."[2]

A Second Definition

"Having" a capability in Machine Vision generally means a degree of development such that these capabilities
are ready for use on individual applications. In the factory automation field where terms are often defined by
their predominant usage, "Smart Camera" is also commonly used to refer to a unit which includes all of the
hardware (and sometimes an operating system) as a platform for the above, but which requires the addition of
either machine vision software or more extensive programming before it is ready to provide the described
capabilities.

References

1. ^ Ahmed Nabil Belbachir (Ed.) (2009). Smart Cameras. Springer. ISBN 978-1-4419-0952-7.

2. ^ Alexander Hornberg (2006). Handbook of Machine Vision. Wiley-VCH. ISBN 3527405844.

Categories: Applications of computer vision | Image sensor technology in computer vision | Embedded systems


Self Localizing Smart Camera Networks and their Applications to
3D Modeling
Camillo J. Taylor
Department of Computer and Information Science
University of Pennsylvania
[email protected]
Babak Shirmohammadi
Department of Computer and Information Science
University of Pennsylvania
[email protected]
Abstract
This paper describes a technology for localizing networks
of embedded cameras and sensors. In this scheme the
cameras and the nodes are equipped with controllable light
sources (either visible or infrared) which are used for signaling.
Each camera node can then automatically determine
the bearing to all the nodes that are visible from its vantage
point. From these angular measurements, the camera nodes
are able to determine the relative positions and orientations
of other nodes in the network.
The method is dual to other network localization techniques
in that it uses angular measurements derived from
images rather than range measurements derived from time
of flight or signal attenuation. The scheme can be implemented
with commonly available components and scales
well since the localization calculations only require limited
local communication. Further, the method provides estimates
of camera orientation which cannot be determined
solely from range measurements.
The localization technology can serve as a basic capability
on which higher level applications can be built. The
method could be used to automatically survey the locations
of sensors of interest, to implement distributed surveillance
systems or to analyze the structure of a scene based on the
images obtained from multiple registered vantage points.
1 Introduction
As the prices of cameras and computing elements continue
to fall, it has become increasingly attractive to consider
the deployment of smart camera networks. Such camera networks
could be used to support a wide variety of applications
including environmental modeling, 3D model construction
and surveillance.[3, 2, 7, 4]
One critical problem that must be addressed before such
systems can be realized is the issue of localization. That is,
in order to take full advantage of the images gathered from
multiple vantage points it is helpful to know how the cameras
in the scene are positioned and oriented with respect to each
other.
In this paper we describe a deployment scheme where
each of the smart cameras is equipped with a co-located controllable
light source which it can use to signal other smart
cameras in the vicinity. By analyzing the images that it acquires
over time, each smart camera is able to locate and
identify other nodes in the scene. This arrangement makes
it possible to directly determine the epipolar geometry of the
camera system from image measurements and, hence, provides
a means for recovering the relative positions and orientations
of the smart camera nodes.
A number of approaches to recovering the relative positions
of a set of cameras based on tracked objects have
been proposed in the literature [1, 6]. These approaches can
be very effective in situations where one can gather sufficient
correspondences over time. In contrast, the approach
proposed here directly instruments the sensors and provides
rapid estimates of the sensor field configuration using relatively
modest computational and communication resources.
2 Implementation
Figure 1 diagrams the basic elements of our vision based
localization system. Here we show a small network of 3
nodes, two of which are equipped with cameras. We will
begin by discussing how localization proceeds in this simple
case and then describe how the scheme can be extended to
handle multiple nodes.
In the first stage of the localization process, the nodes signal
their presence by blinking their lights in a preset pattern.
That is, each of the nodes would be assigned a unique string
representing a blink pattern such as 10110101; the node
would then turn its light on or off in the manner prescribed
by its string. Similar temporal coding schemes are employed
in laser target designators and freespace optical communication
schemes. These blink patterns provide a means for each
of the camera equipped nodes to locate other nodes in their
images. They do this by collecting a sequence of images
over time and analyzing the image intensity arrays to locate
pixels whose intensity varies in an appropriate manner. This
approach offers a number of important advantages. Firstly, it allows the node to both localize and uniquely identify neighboring nodes, since the blink patterns are individualized. Secondly,
it allows the system to reliably detect nodes that subtend
only a few pixels in an image which provides an avenue
for further miniaturization of the smart camera nodes.
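A minimal sketch of this detection step, assuming a stack of frames captured in sync with the blink clock and a table of known blink codes; the on/off margin and the synthetic test data are illustrative, not the authors' actual parameters.

    import numpy as np

    def detect_blinkers(frames, codes, on_margin=40):
        """Locate pixels whose on/off intensity pattern over time matches a known
        blink code. `frames` is a (T, H, W) array of grayscale images captured in
        sync with the blink clock; `codes` maps node id -> tuple of T bits."""
        frames = frames.astype(np.float32)
        baseline = frames.min(axis=0)                 # darkest value seen per pixel
        bits = (frames - baseline) > on_margin        # (T, H, W) boolean "light on" map
        detections = {}
        for node_id, code in codes.items():
            code = np.asarray(code, dtype=bool).reshape(-1, 1, 1)
            match = np.all(bits == code, axis=0)      # pixels reproducing the full pattern
            ys, xs = np.nonzero(match)
            if len(xs) > 0:                           # report the centroid of matching pixels
                detections[node_id] = (xs.mean(), ys.mean())
        return detections

    # Tiny synthetic example: one blinker at row 12, column 20 with code 10110101.
    T, H, W = 8, 32, 48
    code = (1, 0, 1, 1, 0, 1, 0, 1)
    frames = np.full((T, H, W), 30, dtype=np.uint8)
    for t, bit in enumerate(code):
        if bit:
            frames[t, 12, 20] = 200
    print(detect_blinkers(frames, {7: code}))   # -> {7: (20.0, 12.0)}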
Figure 2 shows the results of the blinker detection phase
on a typical image. Here the detected locations in the image
are labeled with the unique codes that the system found.
Once the nodes have been detected and localized in the images, we can derive the unit vectors vab, vac, vba and vbc that relate the nodes as shown in Figure 3. Here we assume that the intrinsic parameters of each of the cameras (focal length, principal point, distortion coefficients) have been determined in an offline calibration step. These parameters allow us to relate locations in the image to direction vectors in space.

Figure 1. This figure shows the basic elements of the proposed localization scheme. It depicts two smart camera nodes equipped with controllable light sources and a third blinker node.

Figure 2. This figure shows the results of automatically localizing a constellation of 4 smart cameras and 3 blinker nodes. The image obtained from one of the smart cameras is shown in (a) while the localization results are shown in (b).

Figure 3. This figure depicts the relative vectors and lengths in our 3-node localization problem.
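The pixel-to-direction conversion can be sketched as follows for an ideal pinhole model; the intrinsic values are placeholders, and lens distortion, which the offline calibration would normally account for, is ignored here.

    import numpy as np

    def pixel_to_bearing(u, v, fx, fy, cx, cy):
        """Convert an image location (u, v) into a unit direction vector in the
        camera frame using pinhole intrinsics (distortion neglected in this sketch)."""
        ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
        return ray / np.linalg.norm(ray)

    # Placeholder intrinsics for a VGA imager; real values come from offline calibration.
    fx = fy = 500.0
    cx, cy = 320.0, 240.0
    v_ab = pixel_to_bearing(412.0, 255.0, fx, fy, cx, cy)   # bearing of node B seen by camera A
    print(v_ab, np.linalg.norm(v_ab))                       # unit length by construction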
From the vectors vab, vac, vba and vbc, we can derive two
additional vectors na and nb which represent normalized versions
of (vab×vac) and (vba×vbc). These vectors correspond
to the normal to the plane containing the three nodes, A, B
and C expressed with respect to frames A and B respectively.
Here we note that the unit vectors vab, vba, na and nb are
related by a rotation matrix Rab which captures the relative
orientation of camera frames A and B.
vab = −Rabvba (1)
na = −Rabnb (2)
From the two perpendicular unit vectors, vab and na, we can construct the orthonormal matrix Ra ∈ SO(3) as follows:

Ra = [ vab   na   (vab × na) ]

Similarly, from the orthogonal unit vectors −vba and −nb we construct the matrix

Rb = [ −vba   −nb   (vba × nb) ]
From equations 1 and 2 we deduce that:
Ra = RabRb (3)
which in turn yields the following expression for Rab:
Rab = Ra(Rb)T (4)
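As a numerical sanity check of equations 1-4, the sketch below fabricates a three-node configuration, computes the bearings each camera would measure in its own frame, and recovers Rab; the positions and rotations are arbitrary test values, not data from the paper.

    import numpy as np

    def unit(v):
        return v / np.linalg.norm(v)

    def rotation(axis, angle):
        """Rodrigues' formula: rotation matrix for a given axis and angle."""
        axis = unit(np.asarray(axis, dtype=float))
        K = np.array([[0, -axis[2], axis[1]],
                      [axis[2], 0, -axis[0]],
                      [-axis[1], axis[0], 0]])
        return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * K @ K

    # Synthetic ground truth: world positions of nodes A, B, C and camera orientations.
    A, B, C = np.array([0.0, 0.0, 0.0]), np.array([3.0, 1.0, 0.5]), np.array([1.0, 4.0, 1.0])
    R_WA = rotation([0, 0, 1], 0.4)          # camera-to-world rotation of camera A
    R_WB = rotation([0, 1, 0], -0.7)         # camera-to-world rotation of camera B

    # Bearings each camera would measure, expressed in its own frame.
    v_ab, v_ac = unit(R_WA.T @ (B - A)), unit(R_WA.T @ (C - A))
    v_ba, v_bc = unit(R_WB.T @ (A - B)), unit(R_WB.T @ (C - B))

    # Equations 1-4: build the plane normals and the two orthonormal frames.
    n_a, n_b = unit(np.cross(v_ab, v_ac)), unit(np.cross(v_ba, v_bc))
    R_a = np.column_stack([v_ab, n_a, np.cross(v_ab, n_a)])
    R_b = np.column_stack([-v_ba, -n_b, np.cross(v_ba, n_b)])
    R_ab = R_a @ R_b.T                       # recovered relative orientation (eq. 4)

    print(np.allclose(R_ab, R_WA.T @ R_WB))  # True: matches the ground-truth relative rotation
    print(np.allclose(v_ab, -R_ab @ v_ba))   # True: consistent with eq. 1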
Once we have ascertained the relative orientation of the two cameras, we can recover the relative position of the three nodes by considering the following homogeneous linear system:
lab vab + lbc (Rab vbc) − lca vac = 0    (5)
Here the unknown variables lab, lbc and lca denote the lengths
of the segments AB, BC and CA. Since this system is homogeneous
we can only resolve the configuration of the nodes
up to a positive scale factor.
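Equation 5 can be solved up to scale with a null-space computation, as in this small sketch; the node positions are arbitrary, and the camera frames are taken to coincide with the world frame so that Rab is the identity.

    import numpy as np

    def solve_lengths(v_ab, v_ac, v_bc, R_ab):
        """Solve eq. (5): l_ab*v_ab + l_bc*(R_ab @ v_bc) - l_ca*v_ac = 0 for the
        segment lengths (l_ab, l_bc, l_ca), up to a positive scale factor."""
        M = np.column_stack([v_ab, R_ab @ v_bc, -v_ac])   # 3x3 coefficient matrix
        _, _, Vt = np.linalg.svd(M)
        lengths = Vt[-1]                                  # null-space direction
        return lengths / np.sign(lengths[0])              # flip sign so the lengths are positive

    # Synthetic check with both camera frames aligned to the world (R_ab = identity):
    A, B, C = np.zeros(3), np.array([4.0, 0.0, 0.0]), np.array([1.0, 3.0, 0.0])
    v_ab = (B - A) / np.linalg.norm(B - A)
    v_ac = (C - A) / np.linalg.norm(C - A)
    v_bc = (C - B) / np.linalg.norm(C - B)
    l = solve_lengths(v_ab, v_ac, v_bc, np.eye(3))
    print(l / l[0])   # ratios l_bc/l_ab and l_ca/l_ab match |BC|/|AB| and |CA|/|AB|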
The scheme that we have described essentially corresponds
to the calibration of a stereo system where both of
the epipoles can be located and measured directly in the imagery.
In this configuration, we require only a single additional
point to resolve the relationship between the two camera
frames. Since the epipoles are directly measured, the
localization scheme is quite stable numerically and can be
expected to yield accurate results as long as we avoid the
singular configuration where all three nodes are collinear.
Larger networks of smart cameras and sensors can be localized
by considering the relationship between triangular
subgraphs of the visibility graph as shown in Figure 4. In
this graph, the directed edges indicate that a particular smart
camera can view another node in the graph. The triangles indicate
triples of nodes that can be localized using the scheme
described previously. The localization results from triangles
that share an edge can be fused together into a common
frame of reference. Therefore, if the set of localization
triangles is fully connected, the entire network can be fully
localized. Alternatively, by analyzing the connected components
in the induced graph of localization triangles, one can
automatically determine which sets of cameras can be localized
to a common frame. The entire localization procedure is
capable of determining the relative location and orientation
of the nodes up to a scale factor; this scale can be resolved
by measuring the distance between any pair of nodes. Figure
6 shows the final result of localizing a constellation of four
smart cameras and three blinker nodes. Note that we do not
require all of the nodes to have cameras - once we have localized
two or more of the cameras we can localize other nodes
equipped with lights through simple triangulation. This is
an important advantage since it means that we can deploy a
few smart camera nodes in an environment and use them to
localize other smaller, cheaper sensor motes that are simply
outfitted with blinkers.
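The triangle-merging bookkeeping can be sketched as a connected-components computation over the localization triangles, where two triangles are joined when they share an edge (a pair of nodes); the triangles listed below are hypothetical.

    from itertools import combinations

    def localizable_components(triangles):
        """Group localization triangles into connected components: triangles that
        share an edge can be fused into a common frame of reference."""
        def shares_edge(t1, t2):
            return len(set(t1) & set(t2)) >= 2

        # Union-find over the triangles.
        parent = list(range(len(triangles)))
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i
        for i, j in combinations(range(len(triangles)), 2):
            if shares_edge(triangles[i], triangles[j]):
                parent[find(i)] = find(j)

        groups = {}
        for i, tri in enumerate(triangles):
            groups.setdefault(find(i), set()).update(tri)
        return list(groups.values())

    # Hypothetical localization triangles: the first three chain together through shared
    # edges; the last shares only the single node E, so it stays in its own frame.
    triangles = [("A", "B", "C"), ("B", "C", "D"), ("C", "D", "E"), ("E", "F", "G")]
    print(localizable_components(triangles))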
Figure 4. Larger networks of smart cameras and sensors can be localized by considering the relationship between triangular subgraphs of the visibility graph (smart camera nodes and blinker nodes are shown).
It is important to note that in this framework angular
measurements derived from images and range measurements
derived from other sources are treated as complementary
sources of information. Measurements derived from the vision
system can be used to determine the relative orientations
of the camera systems which is important information that
cannot be derived solely from range measurements. On the
other hand, range measurements can be used to resolve the
scale ambiguity inherent in angle only localization schemes.
Similarly angular measurements can be used to disambiguate
the mirror reflection ambiguities that are inherent in range
only localization schemes. Ultimately it is envisioned that
smart camera networks would incorporate range measurements
derived from sources like the MIT Cricket system or
Ultra Wide Band radio transceivers. These measurements
could be used to improve the results of the localization procedure
and to localize nodes that may not be visible to the
smart camera nodes.
2.1 Refining Pose Estimates
The previous section described how neighboring cameras
can compute their relative position and orientation based on
corresponding image measurements. This process can be
done in a completely decentralized manner using only local
communication and will produce accurate relative location
estimates, which is typically what is required to fuse
measurements from neighboring sensors.
If necessary, the estimates for node position and orientation
produced by this process can be further refined by a
scheme which takes account of all available measurements
simultaneously. In this refinement step the localization process
is recast as an optimization problem where the objective
is to minimize the discrepancy between the observed image
measurements and the measurements that would be predicted
based on the estimate for the relative positions and orientations
of the sensors and cameras. This process is referred
to as Bundle Adjustment in the computer vision and photogrammetry
literature.
In the sequel we will let u_ij ∈ R^3 denote the unit vector corresponding to the measurement for the bearing of sensor j with respect to camera i. This measurement is assumed to be corrupted with noise. The vector v_ij ∈ R^3 corresponds to the predicted value for this direction vector based on the current estimates for the positions and orientations of the sensors. This vector can be calculated as follows:

v_ij = R_i (T_j − T_i)    (6)

In this expression R_i ∈ SO(3) denotes the rotation matrix associated with camera i while T_i, T_j ∈ R^3 denote the positions of camera i and sensor j respectively (note that sensor j could be another camera).
The goal then is to select the camera rotations and sensor
positions so as to minimize the discrepancy between the vectors
u_ij and v_ij for every available measurement. In equation
7 this discrepancy is captured by the objective function O(x)
where x denotes a vector consisting of all of the rotation and
translation parameters that are being estimated.
O(x) = Σ_{i,j} ‖ u_ij − v_ij/‖v_ij‖ ‖^2    (7)
Problems of this sort can be solved very effectively using
variants of Newton’s method. In these schemes the objective
function is locally approximated by a quadratic form
constructed from the Jacobian and Hessian of the objective
function:

O(x + Δx) ≈ O(x) + (∇O(x))^T Δx + (1/2) Δx^T (∇²O(x)) Δx    (8)
At each step of the Newton algorithm we attempt to find a step Δx in parameter space that will minimize the overall objective function by solving a linear equation of the form

Δx = −(∇²O(x))⁻¹ ∇O(x)    (9)

Here we can take advantage of the fact that the linear system described in equation 9 is typically quite sparse. More
specifically, the Hessian matrix ∇²O will reflect the structure
of the visibility graph of the sensor ensemble. This can
be seen by noting that the variables corresponding to the positions
of nodes i and j only interact in the objective function
if node i observes node j or vice versa. For most practical
deployments, the visibility graph is very sparse since
any given camera typically sees a relatively small number of
nodes. This means that the computational effort required to
carry out the pose refinement step remains manageable even
when we consider systems containing several hundred cameras
and sensor nodes.
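A hedged sketch of this refinement step, substituting SciPy's generic least-squares solver for the sparse Newton iteration described above; the three-camera geometry, the noise level and the rotation-vector parameterization are choices made only for this example, and the global gauge (rotation, translation and scale) is left unresolved, as in the paper.

    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial.transform import Rotation

    rng = np.random.default_rng(0)

    # Synthetic ground truth: 3 cameras and 4 blinker nodes (positions in meters).
    cam_pos = np.array([[0.0, 0, 0], [4.0, 0, 0], [0.0, 4, 0]])
    cam_rot = Rotation.from_rotvec(rng.normal(scale=0.2, size=(3, 3)))
    pts = rng.uniform(1, 3, size=(4, 3))

    def predicted_bearings(rotvecs, positions, points):
        """v_ij = R_i (T_j - T_i), normalized (eq. 6), stacked for all pairs (i, j)."""
        out = []
        for rv, Ti in zip(rotvecs, positions):
            Ri = Rotation.from_rotvec(rv).as_matrix()
            for Tj in points:
                v = Ri @ (Tj - Ti)
                out.append(v / np.linalg.norm(v))
        return np.array(out)

    # Noisy unit-bearing measurements u_ij for every camera/point pair.
    u = predicted_bearings(cam_rot.as_rotvec(), cam_pos, pts)
    u = np.array([d / np.linalg.norm(d) for d in u + rng.normal(scale=0.01, size=u.shape)])

    def residuals(x):
        """Objective (7): difference between measured and predicted unit bearings."""
        rotvecs = x[:9].reshape(3, 3)
        positions = x[9:18].reshape(3, 3)
        points = x[18:].reshape(4, 3)
        return (u - predicted_bearings(rotvecs, positions, points)).ravel()

    # Start from a perturbed guess (as the pairwise localization step would provide)
    # and refine; the sparsity of the problem is ignored by this generic solver.
    x0 = np.concatenate([cam_rot.as_rotvec().ravel(), cam_pos.ravel(), pts.ravel()])
    x0 = x0 + rng.normal(scale=0.05, size=x0.shape)
    result = least_squares(residuals, x0)
    print("initial cost:", 0.5 * np.sum(residuals(x0) ** 2))
    print("refined cost:", result.cost)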
3 Applications of Smart Camera Networks
Self localizing smart camera networks can serve as an
enabling technology for a wide range of higher level applications.
Here we focus on two applications where the images
from the camera systems are used to derive information
about the geometric structure of the environment.
3.1 Visual Hull Reconstruction
Multi camera systems are commonly used to derive information
about the three dimensional structure of a scene. One
approach to the reconstruction problem which is particularly
well suited to the proposed self localizing smart camera network
is the method of volume intersection which has been
employed in various forms by a number of researchers [5].
This method can be used to detect and localize dynamic objects
moving through the field of view of the smart camera
network. Here a set of stationary cameras is used to observe
one or more objects moving through the scene. Simple background
subtraction is employed to delineate the portions of
the images that correspond to the transient objects. Once this
has been accomplished one can interrogate the occupancy of
any point in the scene, P, by projecting it into each of the images
in turn and determining whether or not it lies within the
intersection of the swept regions. This process can be used
to produce an approximation for the 3D structure of the transient
objects by sampling points in the volume. The results
of such an analysis are shown in Figure 5.
Figure 5. (a) Background image of a scene; (b) image with an object inserted; (c) results of the background subtraction operation; (d) results of applying the volumetric reconstruction procedure to the difference images derived from the three smart camera nodes.
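A toy version of this occupancy test, with a pinhole projection and circular silhouette masks standing in for real background-subtraction results; the camera poses, intrinsics and masks are synthetic placeholders.

    import numpy as np

    def project(P, R, T, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
        """Project world point P into a camera with rotation R and center T (pinhole model)."""
        p = R @ (P - T)
        if p[2] <= 0:
            return None                                   # point is behind the camera
        return fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy

    def occupied(P, cameras, masks):
        """Volume-intersection test: keep P only if it projects inside the
        foreground silhouette (background-subtraction mask) of every camera."""
        for (R, T), mask in zip(cameras, masks):
            uv = project(P, R, T)
            if uv is None:
                return False
            u, v = int(round(uv[0])), int(round(uv[1]))
            h, w = mask.shape
            if not (0 <= u < w and 0 <= v < h and mask[v, u]):
                return False
        return True

    # Two toy cameras looking at the origin, with circular silhouettes standing in
    # for real background-subtraction results.
    cameras = [(np.eye(3), np.array([0.0, 0.0, -5.0])),
               (np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]),
                np.array([5.0, 0.0, 0.0]))]
    vv, uu = np.mgrid[0:480, 0:640]
    disk = (uu - 320) ** 2 + (vv - 240) ** 2 < 100 ** 2
    masks = [disk, disk]

    # Sample a coarse grid around the origin and keep the occupied points.
    grid = np.stack(np.meshgrid(*[np.linspace(-1, 1, 11)] * 3), axis=-1).reshape(-1, 3)
    hull = [P for P in grid if occupied(P, cameras, masks)]
    print(len(hull), "of", len(grid), "sampled points lie inside the visual hull")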
In this application the ability to rapidly localize a set of
widely separated cameras is a distinct advantage. Other implementations
of this reconstruction scheme involve complex,
time-consuming calibration operations. This implementation, in contrast, could be quickly deployed in an
ad-hoc manner and would allow a user to localize and track
moving objects such as people, cars or animals as they move
through the scene.
3.2 Ad Hoc Range Finder
Another approach to reconstructing the 3D geometry of
the scene using the imagery from the smart camera network
involves establishing stereoscopic correspondences between
points viewed in two or more images. If we are able to find
such corresponding points we can readily reconstruct their
3D locations through triangulation. In order to employ this
scheme we need a mechanism for establishing correspondences
between pixels in one image and their mates in another.
One approach to establishing these inter-frame correspondences
is to employ structured illumination to help disambiguate
the matching problem. This idea has been employed
successfully in a number of stereo reconstruction systems.
One such structured illumination scheme is depicted in Figure
7 where a projection system sweeps a beam of light
across the surface of the scene. Correspondences can then
be established by simply observing when various pixels in
the two images are lit by the passing beam.
Figure 6 shows a pair of images acquired using such a
structured light correspondence scheme. Here a plane of
laser light is swept across the scene and the curves corresponding
to the illuminated pixels in the two images are recovered.
In each image, every point on the curve corresponds
to a ray in space emanating from that camera position. To
find the correspondence for that point in the other image we
first project that ray into the other image to construct the corresponding
epipolar line and then search along that line to
find the corresponding pixel that is also illuminated by the
laser plane as shown in Figure 7.
Figure 7. At every point in time the projector illuminates a set of scene points along a planar curve in the scene. For every point on the projected curve in one image we can locate its correspondent in the other image by searching along the epipolar line in the other image.
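Once a correspondence has been found along the epipolar line, the scene point can be recovered by intersecting the two viewing rays; the midpoint-triangulation sketch below uses illustrative camera centers and bearings expressed in the common frame produced by self-localization.

    import numpy as np

    def triangulate(c1, d1, c2, d2):
        """Midpoint triangulation: return the point closest to both viewing rays,
        where each ray starts at camera center c and points along unit direction d
        (both expressed in a common world frame recovered by self-localization)."""
        # Solve for the ray parameters (s, t) minimizing |c1 + s*d1 - (c2 + t*d2)|.
        A = np.column_stack([d1, -d2])
        s, t = np.linalg.lstsq(A, c2 - c1, rcond=None)[0]
        return 0.5 * ((c1 + s * d1) + (c2 + t * d2))

    # Illustrative setup: two cameras 1 m apart both seeing an illuminated scene point.
    P_true = np.array([0.4, 0.2, 3.0])                  # scene point hit by the laser plane
    c1, c2 = np.zeros(3), np.array([1.0, 0.0, 0.0])     # camera centers from self-localization
    d1 = (P_true - c1) / np.linalg.norm(P_true - c1)    # bearing of the lit pixel in camera 1
    d2 = (P_true - c2) / np.linalg.norm(P_true - c2)    # its epipolar correspondent in camera 2
    print(triangulate(c1, d1, c2, d2))                  # ~= P_true; repeating this per lit pixel
                                                        # and per laser position builds the range map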
After sweeping the plane over the entire scene we are able
to determine the range to most of the points in the scene that
are visible from both camera positions even though those two
camera positions are widely separated. Such a range map
is shown in Figure 6c. This range scan was constructed by sweeping the laser plane through 180 degrees in 1-degree increments.

Figure 6. (a) and (b) show two images of a scene illuminated with a plane of laser light, which is used to establish correspondences between the two views; (c) shows the range map constructed from the correspondences derived from a sequence of such images.
It is important to note here that this range map is constructed
in an ad-hoc manner since the relative positions and
orientations of the cameras are reconstructed automatically
using the self localization algorithm and the position and orientation
of the projector are not needed to recover the scene
depths. The proposed reconstruction scheme is interesting
because it provides a mechanism for recovering the structure
of an extended scene using an ensemble of small, cheap image
sensors and beam projectors which can be deployed in
an ad-hoc manner. This is in contrast to the traditional approach
of recovering scene structure using expensive range
sensors which must be carefully calibrated and aligned.
Figure 8. In this experiment range maps of the scene
were constructed from 4 different vantage points using
different configurations of cameras and projectors. Two
of these scans are shown here along with the corresponding
images
The scheme can be extended for use with multiple cameras
and multiple beam projectors as shown in Figure 8. Here
we are able to obtain multiple range maps of the scene taken
from different vantage points using a collection of camera
systems and projector positions. Importantly, since we are
able to recover the relative positions of all of the cameras
used here via the self localization scheme, all of the recovered
range maps can be related to a single frame of reference.
This provides an avenue to recovering the structure
of extended environments by merging the range maps obtained
from the different camera systems into a single coherent
model of the scene.
4 Conclusions
This paper describes a scheme for determining the relative
location and orientation of a set of smart camera nodes and
sensor modules. The scheme is well suited for implementation
on wireless sensor networks since the communication
and computational requirements are quite minimal.
Self localization is a basic capability on which higher
level applications can be built. For example, the scheme
could be used to survey the location of other sensor motes
enabling a range of location based sensor analyses such as
sniper detection, chemical plume detection and target tracking.
Further, the ability to automatically localize a set of
smart cameras deployed in an ad-hoc manner allows us to
apply a number of multi-camera 3D analysis techniques to
recover aspects of the 3D geometry of a scene from the available
imagery. Ultimately we envision being able to construct
accurate 3D models of extended environments based on images
acquired by a network of inexpensive smart camera systems.
5 References
[1] A. Rahimi, B. Dunagan, and T. Darrell. Simultaneous calibration
and tracking with a network of non-overlapping sensors. In Proc. IEEE
Conf. on Comp. Vision and Patt. Recog., 2004.
[2] Jason Campbell, Phillip B. Gibbons, Suman Nath, Padmanabhan Pillai,
Srinivasan Seshan, and Rahul Sukthankar. Irisnet: an internet-scale architecture
for multimedia sensors. In MULTIMEDIA ’05: Proceedings
of the 13th annual ACM international conference on Multimedia, pages
81–88, New York, NY, USA, 2005. ACM Press.
[3] Wu chi Feng, Brian Code, Ed Kaiser, Mike Shea, Wu chang Feng, and
Louis Bavoil. Panoptes: scalable low-power video sensor networking
technologies. In Proceedings of the eleventh ACM international conference
on Multimedia, pages 562–571. ACM Press, 2003.
[4] C. H. Lin, T. Lv, I. B. Ozer, and W. Wolf. A peer-to-peer architecture for
distributed real-time gesture recognition. In International Conference
on Multimedia and Exposition, 2004.
[5] Wojciech Matusik, Christopher Buehler, Ramesh Raskar, Leonard
McMillan, and Steven J. Gortler. Image-based visual hulls. In SIGGRAPH,
2000.
[6] S. Funiak, C. Guestrin, M. Paskin, and R. Sukthankar. Distributed localization of networked cameras. In IPSN, 2006.
[7] Z. Yue, L. Zhao, and R. Chellappa. View synthesis of articulating humans
using visual hull. In Proc. Intl. Conf. on Multimedia and Expo,
volume 1, pages 489–492, July 2003.

Application-Driven Design of Smart Camera Networks


Stephan Hengstler, Hamid Aghajan (Principal Adviser) and Andrea
Goldsmith (Co-Adviser)
Wireless Sensor Networks Lab
Department of Electrical Engineering
Stanford University, Stanford, CA 94305, United States
Email: {hengstler@,aghajan@,andrea@ee.}stanford.edu
Abstract — System design aspects must be considered to
effectively map an application onto the constraints of a smart
camera network. Therefore, we propose an application-driven
design methodology that enables the determination of an output
set of operation parameters given an input set of application
requirements.
We illustrate this approach utilizing distributed, sequential
Bayesian estimation for several applications including
target tracking, occupancy sensing and multi-object tracking.
Observation models for single camera and stereo vision systems
are introduced with a particular focus on low-resolution
image sensors. Early simulation results indicate that (i) stereo
vision can increase tracking accuracy by about a factor of five
over single camera vision and (ii) doubling camera resolution
can result in more than twice the accuracy.
I. INTRODUCTION
Information-intensive smart camera networks have recently
received much attention [1], partly due to their ability to simplify and enhance existing applications or even to enable previously infeasible applications.
Application areas include intelligent surveillance, smart
homes, ambient intelligence, and elderly care.
Adaptation of image processing and collaborative estimation
techniques for smart camera networks needs to
consider constraints of individual nodes as well as those
of the entire network. Therefore, it is advantageous to
adopt an application-driven system design perspective to
effectively put a target application into operation across
a smart camera network. Valera and Velastin confirm
this point of view in their review of intelligent distributed
surveillance systems [2]: “Nevertheless there tends to be
a lack of contribution from the field of system engineering
to the research.” We believe that a key challenge to be
addressed lies in mapping the application requirements
into a suitable set of network operation parameters. System
engineering can provide answers to essential questions
about the impact of network topology, camera resolution,
node processing, storage, and communication resources
on performance metrics like accuracy, delay and
lifetime.
The contribution of this doctoral research is twofold.
Firstly, we propose a methodology for application-driven
design of smart camera networks. Secondly, we devise
distributed Bayesian filtering for this purpose, which enables
several applications in these networks.
II. APPLICATION-DRIVEN DESIGN
The first step of our design methodology is the derivation
of a set of system specifications from the application
at hand. Such specifications can specify a network operation
parameter directly or may target a performance
metric. Operation parameters generally relate to network
topology, sensing modalities and characteristics,
processing and communication capabilities, and energy
resources. Performance metrics are typically concerned
with accuracy, delay, false alarm rate, and network lifetime.
These system specifications feed into the generation
of a simulation model of the smart camera network.
Iterative optimization techniques can then determine a set
of suitable operation parameters. The solution is not necessarily
a single point in the operation parameter space
but may result in a range or bound for the parameters.
III. SMART CAMERA MODEL
Our smart camera model consists of two main components:
vision models describe the observations of individual
smart camera nodes and sequential Bayesian estimation
combines the observations into a joint belief state.
A. SEQUENTIAL BAYESIAN ESTIMATION
In [3], Chu et al. introduced the information-driven sensor
querying (IDSQ) tracking algorithm based on distributed
Bayesian MMSE estimation. The target’s true
location x(t) is estimated by the a posteriori distribution
p(x(t)|z(t)), referred to as tracker belief. With every new
set of measurements z(t+1), the tracker updates its belief
p(x(t+1)|z(t+1)) using the sequential Bayesian filtering update

p(x(t+1)|z(t+1)) ∝ p(z(t+1)|x(t+1)) · ∫ p(x(t+1)|x(t)) p(x(t)|z(t)) dx(t).    (1)

The quantity p(z(t+1)|x(t+1)) denotes the joint probability distribution of the measurement vector z(t+1) conditioned on the target location x(t+1). For mutually independent measurements, it can be computed as the product of the individual likelihood functions p(zn(t+1)|x(t+1)). The integral term captures the uncertainty introduced to the previous belief state p(x(t)|z(t)) by the object dynamics p(x(t+1)|x(t)) over one sampling time increment.
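As a hedged illustration of the update in equation (1), the sketch below runs a tiny grid-based filter over a one-dimensional state; the Gaussian motion and measurement models and the measurement sequence are placeholders, not the IDSQ models of [3].

    import numpy as np

    # Discretized 1-D state space and an initially uniform belief p(x|z).
    x = np.linspace(-1.0, 1.0, 201)
    belief = np.ones_like(x) / len(x)

    def gaussian(z, mean, sigma):
        return np.exp(-0.5 * ((z - mean) / sigma) ** 2)

    def bayes_update(belief, z, motion_sigma=0.05, meas_sigma=0.1):
        """One step of eq. (1): convolve the prior with the object dynamics
        p(x(t+1)|x(t)), multiply by the likelihood p(z(t+1)|x(t+1)), renormalize.
        Additional independent camera measurements would simply contribute
        further likelihood factors in the product."""
        dynamics = gaussian(x[:, None], x[None, :], motion_sigma)   # transition kernel
        predicted = dynamics @ belief                               # integral term
        posterior = gaussian(z, x, meas_sigma) * predicted          # likelihood product
        return posterior / posterior.sum()

    # Feed in a few placeholder measurements of a target drifting to the right.
    for z in [0.0, 0.05, 0.12, 0.2]:
        belief = bayes_update(belief, z)
    print("MMSE estimate of x:", float(np.sum(x * belief)))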
Figure 1: Illustration of object sensing in a smart camera network with superimposed single camera observations (leader node, sensor nodes, inactive nodes and object shown; x and y coordinates normalized).
B. CAMERA VISION MODELS
We consider two types of vision systems: a single camera
and a stereo camera system. Their likelihood function
provides a sensor model relating ground truth and sensing
parameters to observation likelihood.
Utilizing a planar pinhole camera observation model, we derived the likelihood function of a single camera observation as a function of the operation parameters: smart camera location and orientation, image sensor resolution R, and angular field-of-view. Experiments with our innovative MeshEye smart camera mote [4] indicate that the readout noise of the object bearing follows a zero-mean Gaussian distribution with variance 1/(2R)^2. As shown in
Figure 1, the observation likelihood function can be visualized
as a normally distributed cone centered about the
object’s angular orientation. Similarly, we determined
the likelihood function for stereo vision observations.
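A sketch of this single-camera observation model: the likelihood of a measured bearing given a hypothesized object position, under zero-mean Gaussian readout noise with variance 1/(2R)^2; the camera placement and the candidate positions are illustrative values, not experimental data.

    import numpy as np

    def bearing_likelihood(z_bearing, obj_xy, cam_xy, cam_heading, R):
        """Likelihood of a measured bearing z_bearing (radians, relative to the
        camera's optical axis) given a hypothesized object position, assuming
        zero-mean Gaussian readout noise with standard deviation 1/(2R)."""
        dx, dy = obj_xy[0] - cam_xy[0], obj_xy[1] - cam_xy[1]
        true_bearing = np.arctan2(dy, dx) - cam_heading
        true_bearing = np.arctan2(np.sin(true_bearing), np.cos(true_bearing))  # wrap to [-pi, pi]
        sigma = 1.0 / (2.0 * R)
        err = z_bearing - true_bearing
        return np.exp(-0.5 * (err / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    # Illustrative numbers: a resolution R = 32 camera at the origin facing along +x,
    # evaluating one measured bearing against two candidate object positions.
    z = 0.05
    print(bearing_likelihood(z, (2.0, 0.1), (0.0, 0.0), 0.0, R=32))   # near-consistent hypothesis
    print(bearing_likelihood(z, (2.0, 1.0), (0.0, 0.0), 0.0, R=32))   # inconsistent hypothesis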
IV. PERFORMANCE EVALUATION
For performance evaluation and validation of our
application-driven design methodology, we are currently
working on an analytical formulation, a simulation
model, and an experimental network setup.
The analytical performance evaluation targets finding
a lower bound for the variance of the Bayesian estimation.
The posterior Cramér-Rao bound (PCRB) appears
to be a promising candidate since it readily allows calculation
of the covariance bounds from the joint likelihood
p(z(t)|x(t)). But we need to resort to the Cramér-Rao
theorem as our estimator is generally biased.
To explore some of the fundamental tradeoffs between
operation parameters, we built a simulation model
in MathWorks Matlab. It is set up to simulate target tracking, the canonical application example in sensor networks, for different operation parameters (see Figure 2 for an example), but we are in the process of extending it towards occupancy sensing and multi-object tracking.

Figure 2: Tracking performance (average tracking error, normalized) vs. number of sensor nodes S for single and stereo camera systems at resolutions R = 8, 16 and 32.
Of particular interest to our research are kilopixel imagers.
Despite their crude image arrays of only a few
thousand pixels, their collaboration in a smart camera
network may in fact reduce the overall estimation error
sufficiently. For this reason, we are deploying a network
of about 5 of our MeshEye smart camera motes
[4] for experimental data collection. These motes are
equipped with two kilopixel imagers forming a low-resolution
stereo vision system and one high-resolution
(VGA 640×480 pixel, 24-bit color) camera module.
V. ACKNOWLEDGMENTS
I gratefully acknowledge Agilent/Avago Technologies
for supporting and funding my Ph.D. degree program.
REFERENCES
[1] Y. Liu and S.K. Das. Information-intensive wireless
sensor networks: potential and challenges. IEEE
Communications Mag., 44(11):142–147, Nov. 2006.
[2] M. Valera and S.A. Velastin. Intelligent distributed
surveillance systems: a review. IEE Proc. - Vision,
Image, and Signal Processing, 152(2):192–204, Apr.
2005.
[3] M. Chu, H. Haussecker, and F. Zhao. Scalable
information-driven sensor querying and routing for
ad hoc heterogeneous sensor networks. Int. J. High
Performance Computing Applications, 16(3):293–
313, Aug. 2002.
[4] S. Hengstler, D. Prashanth, S. Fong, and H. Aghajan.
Mesheye: a hybrid-resolution smart camera mote for
applications in distributed intelligent surveillance. In
Proc. 6th Int. Conf. on Information Processing in
Sensor Networks (IPSN ’07), pages 360–369, Apr.
2007.
