Tracking With The Kinematics of Extremal Contours: 1 Introduction and Background
contours
these components. First, we will develop an analytical solution for computing the
contour generator as a function of the motion parameters. The extremal contour
is simply the projection of the contour generator. Second, we will develop an
expression for the image Jacobian that maps joint-velocities onto image point-
velocities.
The kinematics of the contour generator. Let X be a 3-D point that lies
on the smooth surface of a body part.
We now derive the constraint under which this surface point lies on the
contour generator associated with a camera. This constraint simply states that
the line of sight associated with this point is tangent to the surface. Both the
line of sight and the surface normal should be expressed in a common reference
frame, and we choose to express these entities in the world reference frame:
(Rn)^T (RX + t − C) = 0, where vector n = ∂X/∂z × ∂X/∂θ = X_z × X_θ is normal
to the surface at X, and C is the camera optical center in world coordinates.
The equation above becomes:
X^T n + (t − C)^T Rn = 0    (3)
For any rotation, translation, and camera position, equation (3) allows us to
estimate X as a function of the surface parameters.
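As a numeric sanity check, the tangency constraint of eq. (3) can be evaluated directly. The function below is a minimal sketch (the function name and the unit-sphere test case are ours, not the paper's):

```python
import numpy as np

def tangency_residual(X, n, R, t, C):
    """Residual of the contour-generator constraint, eq. (3):
    X^T n + (t - C)^T R n, which vanishes when the line of sight
    through the surface point X (with surface normal n) is tangent
    to the surface. R, t map body to world coordinates; C is the
    camera optical center in world coordinates."""
    return float(X @ n + (t - C) @ (R @ n))

# For a unit sphere centered at the origin (n = X, R = I, t = 0),
# the constraint reduces to X.X = C.X: the classical tangency circle.
X = np.array([0.5, np.sqrt(0.75), 0.0])   # point on the unit sphere
C = np.array([2.0, 0.0, 0.0])             # camera outside the sphere
print(tangency_residual(X, X, np.eye(3), np.zeros(3), C))
```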
The surface of a truncated elliptical cone is parametrized by an angle θ and
a height z:
X(θ, z) = ( a(1 + kz) cos θ,  b(1 + kz) sin θ,  z )^T    (4)
where a and b are the minor and major half-axes of the elliptical cross-section, k is
the tapering parameter of the cone, and z ∈ [z1 , z2 ]. With this parameterization,
eq. (3) can be developed to obtain a trigonometric equation of the form F cos θ +
G sin θ + H = 0 where F , G and H depend on Φ and C but do not depend on z.
With the standard substitution t = tan(θ/2) we obtain a second-degree polynomial:
(H − F)t² + 2Gt + (F + H) = 0    (5)
This equation has two real solutions, t1 and t2 (or, equivalently, θ1 and θ2),
whenever the camera lies outside the cone that defines the body part. Note that
in the case of elliptical cones, θ1 and θ2 do not depend on z and the contour
generator is composed of two straight lines, X(θ1 , z) and X(θ2 , z). From now on
and without ambiguity, X denotes a point lying on the contour generator.
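The half-angle substitution can be carried out in a few lines; in the sketch below the coefficients F, G, H are assumed given (their closed forms in terms of Φ and C are not reproduced here), and the solver assumes the non-degenerate case H ≠ F:

```python
import numpy as np

def contour_angles(F, G, H):
    """Solve F cos(theta) + G sin(theta) + H = 0 with the standard
    substitution t = tan(theta/2), which turns it into the quadratic
    (H - F) t^2 + 2 G t + (F + H) = 0. Returns the two angles
    theta1, theta2; two real roots exist whenever the camera lies
    outside the cone. Assumes H != F (non-degenerate quadratic)."""
    a, b, c = H - F, 2.0 * G, F + H
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("no real solution: camera inside the cone")
    r = np.sqrt(disc)
    t1, t2 = (-b + r) / (2.0 * a), (-b - r) / (2.0 * a)
    return 2.0 * np.arctan(t1), 2.0 * np.arctan(t2)
```

Since θ1 and θ2 do not depend on z for an elliptical cone, evaluating this once per camera yields the two rulings X(θ1, z) and X(θ2, z) of the contour generator.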
X^w = RX + t    (6)
v = JI (A + B) (Ω, V)^T    (7)
where A and B are defined below and JI is the classical 2×3 Jacobian of the
perspective projection.
Eq. (7) reveals that the motion of extremal contours has two components:
a component due to the rigid motion of the smooth surface, and a component
due to the sliding of the contour generator on the smooth surface. The first
component is:
ṘX + ṫ = ṘR^T (X^w − t) + ṫ = A (Ω, V)^T    (8)
where A = [−[X^w]× I] and (Ω, V) is the kinematic screw. The notation [m]×
stands for the skew-symmetric matrix associated with a vector m.
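A minimal sketch of the rigid-motion component of eq. (8), with our own helper names; it simply verifies that A (Ω, V)^T reproduces the familiar screw velocity V + Ω × X^w:

```python
import numpy as np

def skew(m):
    """[m]x: skew-symmetric matrix such that skew(m) @ v = cross(m, v)."""
    return np.array([[0.0, -m[2], m[1]],
                     [m[2], 0.0, -m[0]],
                     [-m[1], m[0], 0.0]])

def rigid_velocity(Xw, omega, V):
    """Rigid-motion component of eq. (8): A (Omega, V)^T with
    A = [-[X^w]x  I], which equals V + omega x X^w."""
    A = np.hstack([-skew(Xw), np.eye(3)])
    return A @ np.concatenate([omega, V])
```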
The second component can be made explicit by taking the time derivative
of the contour generator constraint, i.e., eq. (3). After some algebraic manipula-
tions, we obtain:
RẊ = B (Ω, V)^T    (9)
where B = b^{-1} R X_θ (Rn)^T [[C − t]× −I] is a 3 × 6 matrix and b = (X +
R^T (t − C))^T n_θ is a scalar.
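The matrix B can be assembled directly from this definition. The following is a sketch with hypothetical argument names (X_theta and n_theta denote ∂X/∂θ and ∂n/∂θ), not the paper's implementation:

```python
import numpy as np

def sliding_matrix(R, t, C, X, X_theta, n, n_theta):
    """Sliding component of eq. (9): the 3x6 matrix
    B = b^{-1} R X_theta (R n)^T [[C - t]x  -I],
    with the scalar b = (X + R^T (t - C))^T n_theta."""
    def skew(m):
        return np.array([[0.0, -m[2], m[1]],
                         [m[2], 0.0, -m[0]],
                         [-m[1], m[0], 0.0]])
    b = (X + R.T @ (t - C)) @ n_theta        # scalar, nonzero away from degeneracies
    bracket = np.hstack([skew(C - t), -np.eye(3)])   # 3x6
    # outer(R X_theta, R n) is rank one: B moves points only along the ruling
    return (1.0 / b) * np.outer(R @ X_theta, R @ n) @ bracket
```

Note that B has rank one by construction: the sliding motion can only displace the contour generator along the surface direction X_θ.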
The sliding of the contour generator induces an image velocity that is tangent to
the extremal contour. Approaches based on the estimation of the optical flow for
tracking [14] cannot take into account this tangential component of the velocity
field. Within our approach this term is important and it will be argued in the
experimental section below that it speeds up the convergence of the tracker by
a factor of 2.
Finally we notice that the kinematic screw of a body-part can be related
to the joint velocities associated with a kinematic chain [15], where JK is the
chain's Jacobian matrix: (Ω, V)^T = JK Φ̇. By combining this formula with
eq. (7) we obtain eq. (1):
v = JI (A + B)JK Φ̇ (10)
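Eq. (10) is thus a plain product of Jacobians. The sketch below assumes a common form of the 2×3 perspective Jacobian JI (an assumption on our part; the paper does not spell it out in this excerpt):

```python
import numpy as np

def perspective_jacobian(Xc, f=1.0):
    """An assumed, common form of the classical 2x3 Jacobian JI of
    the perspective projection (u, v) = f (x/z, y/z) with respect
    to the 3-D point Xc = (x, y, z) in camera coordinates."""
    x, y, z = Xc
    return (f / z) * np.array([[1.0, 0.0, -x / z],
                               [0.0, 1.0, -y / z]])

def joint_to_image_velocity(J_I, A, B, J_K, phi_dot):
    """Eq. (10): v = JI (A + B) JK Phi_dot, mapping joint velocities
    to the image velocity of one extremal-contour point."""
    return J_I @ (A + B) @ J_K @ phi_dot
```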
the error function is also ill-suited. Finally, Fig. 3-(d) plots the error
function when using the sum of the two previously proposed distances. The error
function is never constant and there exists only one local minimum, where the
model contour coincides exactly with the observed contour.
Thus, the simultaneous use of the chamfer distances of both the edges and
the silhouette avoids such local minima. As explained above, minimizing the
silhouette distance pushes model contours inside the image silhouettes while
minimizing the edge distance attracts the model contours to high image gradients
within that silhouette, without explicitly representing the contour orientations.
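The combined error above can be sketched with two Euclidean distance transforms (here scipy's `distance_transform_edt` stands in for the chamfer distance; the function and variable names are ours):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def combined_chamfer_error(model_pts, silhouette, edges):
    """Sum of the two chamfer terms: the silhouette distance pulls
    model contour points inside the image silhouette, the edge
    distance attracts them to image edges within it.
    `silhouette` and `edges` are boolean images; `model_pts` holds
    (row, col) integer coordinates of sampled model contour points."""
    d_sil = distance_transform_edt(~silhouette)   # 0 inside the silhouette
    d_edge = distance_transform_edt(~edges)       # 0 on edge pixels
    r, c = model_pts[:, 0], model_pts[:, 1]
    return float(np.sum(d_sil[r, c] + d_edge[r, c]))
```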
Now that we have chosen the error function to be minimized, we can track
our model by iteratively minimizing the error in all views, using a non-linear
least-squares optimization technique such as Levenberg-Marquardt. Using the
results from section 2 together with a bilinear interpolation of the chamfer dis-
tance images, we compute the Jacobian analytically, which results in an efficient
implementation, as described in the next section.
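The bilinear interpolation of the chamfer-distance image is the piece that keeps the error, and hence the Jacobian, available in closed form at sub-pixel positions. A minimal sketch (our own helper, assuming image indexing D[row, col]):

```python
import numpy as np

def bilinear(D, x, y):
    """Bilinear interpolation of the chamfer-distance image D at a
    sub-pixel point (x, y), with x along columns and y along rows.
    Returns the interpolated value together with its analytic
    gradient (dD/dx, dD/dy), as needed by Levenberg-Marquardt."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    a, b = x - x0, y - y0
    d00, d10 = D[y0, x0], D[y0, x0 + 1]
    d01, d11 = D[y0 + 1, x0], D[y0 + 1, x0 + 1]
    val = (1-a)*(1-b)*d00 + a*(1-b)*d10 + (1-a)*b*d01 + a*b*d11
    dx = (1-b)*(d10 - d00) + b*(d11 - d01)   # partial derivative in x
    dy = (1-a)*(d01 - d00) + a*(d11 - d10)   # partial derivative in y
    return val, dx, dy
```

On a locally linear distance image the interpolation and its gradient are exact, which is what makes the analytic Jacobian a faithful substitute for numerical differencing.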
4 Experimental results and discussion
We performed experiments with realistic and complex human motions using a
setup composed of 6 cameras that operate at 30 frames/second. The cameras are
both finely synchronized (within 10^-6 s) and operate at the same shutter speed
(10^-3 s), thus allowing us to cope with fast motions. The 3-D human model is
Fig. 4. From left to right: A raw image, the silhouette, the edges inside the silhouette,
and the chamfer-distance image associated with the silhouette.
Fig. 5. A set of six calibrated cameras provides six image sequences whose frames are
synchronized.
With our current algorithms we did not restrict the joint angles to bio-
mechanically feasible limits. As a result, most of our tracker failures occurred
because of incorrect assignments during matching, which resulted in collisions
between body parts. We believe we can solve this problem by implementing col-
lision detection and collision prevention more carefully. Another important issue
that should be addressed in future work, is the automatic calibration of the
parameters of our human-body model. Obtaining optimal values for all the con-
stant geometric and kinematic parameters in the anthropomorphic model will
be important for evaluating and improving further the quality, robustness, and
precision of our tracker.
References
1. Gavrila, D.M.: The visual analysis of human movement: A survey. Computer
Vision and Image Understanding 73 (1999) 82–98
2. Bregler, C., Malik, J., Pullen, K.: Twist based acquisition and tracking of animal
and human kinematics. International Journal of Computer Vision 56 (2004) 179–
194
3. Deutscher, J., Blake, A., Reid, I.: Articulated body motion capture by annealed
particle filtering. In: Computer Vision and Pattern Recognition. (2000) 2126–2133
4. Hilton, A.: Towards model-based capture of a person's shape, appearance and
motion. In: Proceedings of the IEEE International Workshop on Modelling People.
(1999)
5. Yan, J., Pollefeys, M.: A factorization approach to articulated motion recovery.
In: Conference on Computer Vision and Pattern Recognition. Volume 2. (2005)
815–821
6. Drummond, T., Cipolla, R.: Real-time tracking of highly articulated structures in
the presence of noisy measurements. In: ICCV. (2001) 315–320
7. Sminchisescu, C., Telea, A.: Human pose estimation from silhouettes: a consis-
tent approach using distance level sets. In: WSCG International Conference on
Computer Graphics, Visualization and Computer Vision. (2002)
8. Delamarre, Q., Faugeras, O.: 3d articulated models and multi-view tracking with
physical forces. Computer Vision and Image Understanding 81 (2001) 328–357
9. Niskanen, M., Boyer, E., Horaud, R.: Articulated motion capture from 3-d points
and normals. In: British Machine Vision Conference. Volume 1., Oxford, UK,
BMVA (2005) 439–448
10. Blake, A., Isard, M.: Active Contours. Springer-Verlag (1998)
11. Agarwal, A., Triggs, B.: Learning to track 3d human motion from silhouettes. In:
International Conference on Machine Learning, Banff (2004) 9–16
12. Drummond, T., Cipolla, R.: Real-time visual tracking of complex structures. IEEE
Trans. Pattern Analysis and Machine Intelligence 24 (2002) 932–946
13. Martin, F., Horaud, R.: Multiple camera tracking of rigid objects. International
Journal of Robotics Research 21 (2002) 97–113
14. Rosten, E., Drummond, T.: Rapid rendering of apparent contours of implicit
surfaces for real-time tracking. In: British Machine Vision Conference. Volume 2.
(2003) 719–728
15. McCarthy, J.M.: Introduction to Theoretical Kinematics. MIT Press, Cambridge
(1990)