Human gait recognition at sagittal plane

Rong Zhang a,*, Christian Vogler b, Dimitris Metaxas a

a Department of Computer Science, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854, USA
b Gallaudet Research Institute, Gallaudet University, HMB S-433, 800 Florida Avenue NE, Washington, DC 20002, USA
* Corresponding author. E-mail address: [email protected] (R. Zhang).

Image and Vision Computing 25 (2007) 321–330, www.elsevier.com/locate/imavis. Received 16 October 2004; received in revised form 24 August 2005; accepted 11 October 2005. doi:10.1016/j.imavis.2005.10.007. © 2006 Published by Elsevier B.V.

Abstract

The reliable extraction of characteristic gait features from image sequences and their recognition are two important issues in gait recognition. In this paper, we propose a novel two-step, model-based approach to gait recognition that employs a five-link biped locomotion model of the human body. We first extract the gait features from image sequences using the Metropolis–Hastings method. Hidden Markov Models are then trained on the frequency components of these feature trajectories, from which recognition is performed. Because it is based entirely on human gait, our approach is robust to the different types of clothes the subjects wear. The model-based gait feature extraction step is insensitive to noise, cluttered backgrounds, and even moving backgrounds. Furthermore, this approach also minimizes the amount of data required for recognition compared to model-free algorithms. We applied our method to both the USF Gait Challenge data set and the CMU MoBo data set, and achieved recognition rates of 61% and 96%, respectively. We further studied the relationship between the number of subjects in the data set and the recognition rate. The results suggest that the recognition rate is significantly limited by the distance of the subject to the camera.

Keywords: Gait recognition; Biometrics; Human motion analysis; Human identification; Hidden Markov model

1. Introduction

Humans can be identified through many biometrics. Face, iris, and fingerprints have been successfully employed in automatic human identification systems [4,7,18]. However, because they require the subject to be very close to the camera for accurate identification, these characteristic features are not suited to surveillance at low resolutions. Gait, or the manner in which people walk, is the only feature available for recognition when the subject is far from the camera, and it has attracted increased attention recently [5,21,24,25]. Moreover, behavioral features such as gait are more difficult to disguise than facial features: any attempt to adopt a different gait only makes the subject more suspicious.

People can identify walking acquaintances even when they are too far away to be recognized by their faces. It is commonly agreed that the human visual system is extremely sensitive to motion stimuli, although the exact mechanism is still unclear [33]. In 1973, Johansson [15] introduced a new visual stimulus called the point-light display, in which human action is reduced to a few moving points of light. It was observed that biological motion could be accurately perceived from such a simple visual input. More recent studies performed by Stevenage et al. [28] demonstrated that human movements in video can be used as a reliable cue to identify individuals.
These findings inspired researchers in computer vision to extract potential characteristic gait signatures from image sequences for human identification. Specifying idiosyncratic gait features in image sequences containing moving subjects is challenging because of the similarities in the spatiotemporal walking patterns of different people. The differences in the kinematics and dynamics of human body motion need to be detected for identification purposes. In addition, these features should be invariant to other factors such as clothes or hair style. Early research by Murray [23] showed that gait is a unique behavioral characteristic if all gait movements are considered. However, in typical gait recognition settings the input signals are image sequences taken by cameras. The loss of depth information in these two-dimensional image sequences makes it impossible to fully recover the three-dimensional gait patterns. Therefore, gait recognition in computer vision is commonly performed on two-dimensional features extracted from two-dimensional image sequences.

Observing that most walking dynamics take place in the sagittal plane, we aim to identify walking subjects from video sequences containing their side view. In this paper, we propose a two-step, model-based approach to gait recognition that employs exclusively biological motion information. The proposed algorithm is outlined in Fig. 1. First, we fit each image with a five-link biped human locomotion model to extract the joint position trajectories. These features are invariant to the subjects' clothes, hair style, and body shape, except for the height of each body part. The recognition step is then performed using Hidden Markov Models (HMMs) based on the frequency components of these joint trajectories. Applying our approach to both the CMU MoBo and the USF Gait Challenge data sets, we demonstrate that promising recognition rates can be obtained using gait features alone.

Fig. 1. Outline of the proposed algorithm.

This paper is organized as follows. Section 2 summarizes existing approaches to the gait recognition problem. The five-link biped human model is described in Section 3. Section 4 provides details of the extraction of gait features, whereas recognition using HMMs is described in Section 5. Experimental results are presented in Section 6, followed by conclusions in Section 7.

2. Previous approaches to gait recognition

Current approaches to the gait recognition problem all contain two major components: extraction of motion features from image sequences, and subsequent identification by comparing the similarity between the probe feature sequence and those in the gallery. Existing methods for feature extraction can be divided into two categories: model-based and model-free approaches. In model-based approaches, an explicit structure is employed to interpret the human body movements in image sequences, whereas in model-free methods, features are extracted without considering the human structure.

Model-free methods focus on the spatiotemporal information contained in silhouette images, which are binary images indicating whether or not each pixel belongs to the subject. This representation has the advantage of eliminating the texture and color information of the subject. Two baseline approaches were proposed for gait recognition based on silhouette images: Phillips et al.
[25] compared the entire silhouette image sequences to the gallery, while Collins et al. [5] selected key frames for comparison. Murase and Sakai [11] extracted an eigenspace representation from silhouette image sequences, and Huang et al. [14] extended the method using a canonical space transformation. Other low-level image features have been extracted to capture the spatial and temporal variations of human gait, including the width of the outer contour of the silhouette [16], gait mask responses [9], moments of the optical flow [22], linear decomposition of style and content [19], the generalized symmetry operator [12], and eigengait [2]. Lee and Grimson [21] fit seven ellipses to the human body area and used their locations, orientations, and aspect ratios as features to represent the gait. All features used in these model-free approaches are susceptible to noise and background clutter, since they are calculated either at the pixel level (background subtraction) or within small regions (edge map and optical flow calculation). Recently, silhouette refinement [20] has been proposed to improve the recognition rate. However, the features extracted with the above methods include shape information, which should be avoided for gait recognition.

With gait features closely related to the walking mechanics, model-based approaches have the potential for robust feature extraction. Several human-like structures have been proposed for gait feature extraction. A two-dimensional stick model was obtained through line fitting to the skeleton of the silhouette images by Niyogi and Adelson [24]. Cunado et al. [6] modelled the thighs as interlinked pendula to extract their angular movements. Yam et al. [34] used the pendulum model to explore the difference between walking and running motions. These recognition features are extracted over large regions and are therefore less sensitive to image noise. More significantly, these features do not contain shape information. Though compact in representation, the above features achieved satisfactory recognition rates on small gait data sets, demonstrating the potential of identifying people using movement features alone.

Following gait feature extraction, an identification step is performed based on a similarity measurement between the probe and training sequences. Many classification methods have been applied for this purpose, such as canonical analysis [9], covariance measurement [25], and support vector machines [19]. Nearest neighbor and K-nearest neighbor methods have been applied intensively [2,6,12,16,21,24,31]: gallery examples are sorted according to their similarity to the probe sequence, and the probe sequence is assigned the label to which most of the top k examples belong. Another commonly used method is the Hidden Markov Model (HMM) [17,30], a useful tool for representing temporally dependent processes. Human gait is a periodic process over time; therefore, HMMs are appropriate for coping with the stochastic properties of its dynamics. Exemplars of width vectors [17] or postures [30] were used as the states of the HMMs to characterize the gait. Sequences of the same subject in the data set are used to train an HMM, and identification is performed by choosing the label of the HMM that generates the given probe sequence with the highest probability.

Here, we propose a five-link biped model to extract motion trajectories of the joint positions in image sequences.
Unlike the method used in [35], where the joint angles were extracted via a line fitting process, our fitting algorithm is applied to the entire body parts, leading to more robust and reliable results. An HMM is then used in the recognition step.

3. Five-link biped model

A good human model for gait recognition should be simple; however, it should also be general enough to capture the walking dynamics of most people, and be customizable for different persons. Complicated human models, such as the 3D deformable model [27], are not practical for efficient human tracking. Studies carried out by physiologists [3] show that most walking dynamics take place in the sagittal plane, the plane bisecting the human body (Fig. 2(a)), and that the trajectories of the legs in the sagittal plane reveal most of the walking dynamics. Therefore, we are interested in extracting the motion dynamics contained in image sequences of subjects walking parallel to the camera (side view). Fig. 2(b) shows a typical side view of a walking person.

We choose a two-dimensional five-link biped locomotion model, shown in Fig. 2(c), to represent the physical structure and movement constraints of the human body. The lower limbs are represented as trapezoids, whereas the upper body is simplified as the upper half of the human silhouette without arms. Each body part is considered rigid, with movement allowed only at the joint positions. The influence of the arm dynamics is neglected in our dynamic model. This treatment is justified because little information about the arms is available in images of people walking at a distance, which makes it difficult to recover the exact arm positions. These simplifications are necessary for a compact model that reduces the computational complexity, while still capturing most of the dynamics of the walking subject.

Fig. 2. (a) Illustration of the sagittal plane. (b) Side view of a walking subject. (c) Five-link biped human model, where θ_i (i = 1, ..., 5) is the sagittal elevation angle of the corresponding body part. (d) Schematic representation of an individual body part.

If the length and width of each part (the shape model) are fixed, the biped model M has seven degrees of freedom, M = (C = {x, y}, Θ = {θ_1, θ_2, θ_3, θ_4, θ_5}), where C is the position of the body center within the image, and Θ is the orientation vector consisting of the sagittal plane elevation angles (SEAs) of the five body parts. The SEA of a body part is defined as the angle between the main axis of the body part and the y-axis [29], as shown in Fig. 2(b). However, the size of each body part in the images may differ for different persons, or for the same person at different distances from the camera. One way to obtain the shape model for each person is to manually locate the joint positions and body parts in the first image of each sequence, which is tedious when the data set is large. In our work, we develop a model fitting method for the initialization process, in which the size of each body part in the image is specified by fitting the human shape model to the silhouette image.

3.1. Scale-invariant body model

For our purpose, we need a general human shape model independent of scaling. As shown in Fig. 2, the human body model, without considering the neck and head, consists of five trapezoids connected at the joints. Each trapezoid is defined by its height (l) and the lengths of its top and bottom bases (t and b, respectively). Hence, each body part p_i, i = 1, ..., 5, can be represented by p_i = {a_i, b_i, l_i}, where a = t/l and b = b/l are the base-to-height ratios. By normalizing the body part heights with respect to the height of the trunk (l_5), we obtain a shape model invariant to scaling, which is parameterized by two vectors: the base-to-height ratio vector K = {a_1, b_1, a_2, b_2, ..., a_5, b_5} and the relative height vector R = {r_1, r_2, ..., r_5}, where r_i = l_i / l_5. Together with the biped model M, we can describe the human body posture as H = {K, R, M}.

We assume that the model parameters are independent of each other and subject to Gaussian distributions. The orientation vector is subject to a uniform distribution over an interval L_Θ, given by the physical limits of the joints. Thus, the probabilistic distribution of the human model can be expressed as

H = \{K, R, M\} \sim G(K, \Sigma_K)\, G(R, \Sigma_R)\, U(\Theta, L_\Theta)   (1)

The means and variances are estimated from the measurements provided in [32].
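To make the parameterization concrete, the following Python sketch (illustrative only, not the authors' implementation; the numeric shape values are assumptions) encodes the scale-invariant shape vectors K and R together with the pose M = (C, Θ):

```python
# Minimal sketch of the five-link biped parameterization (illustrative only).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BipedModel:
    # Scale-invariant shape: base-to-height ratios K = {(a_i, b_i)} and
    # relative heights R = {r_i = l_i / l_5}, with part 5 taken as the trunk.
    K: List[Tuple[float, float]]
    R: List[float]
    # Pose M = (C, Theta): body-center position in the image and the five
    # sagittal elevation angles (in radians).
    C: Tuple[float, float] = (0.0, 0.0)
    Theta: List[float] = field(default_factory=lambda: [0.0] * 5)

    def part_height_px(self, i: int, trunk_height_px: float) -> float:
        """Absolute height of part i in pixels once the trunk height is known."""
        return self.R[i] * trunk_height_px

# Assumed average shape values, for illustration only.
model = BipedModel(K=[(0.30, 0.40)] * 5, R=[0.55, 0.55, 0.55, 0.55, 1.00])
print(model.part_height_px(0, trunk_height_px=130.0))  # -> 71.5
```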
3.2. Initialization of body shape model

The orientation and the actual size of each body part in the image are specified in the initialization step. To separate the subject from the background, we perform a background subtraction procedure to obtain the silhouette image, as described in [8]. We assume that only one subject is present in each image, so the largest blob within the silhouette image can be regarded as the location of the subject. We choose to fit the five-link biped model to the silhouette image when the two legs are furthest apart from each other, i.e. the double stance phase. In this phase, the SEAs of the shank and thigh of the same leg are approximately identical, and the overlap between the legs is small, making it less ambiguous to parameterize the body parts. Note that we do not distinguish between the left and right legs here.

We first obtain a rough estimate of the human model H from the silhouette image S. The body center position in the image is set to the center of the silhouette pixels, computed coordinate-wise as the median:

C = (x, y) = (\operatorname{median}_{x_i \in S}(x_i),\ \operatorname{median}_{y_i \in S}(y_i))   (2)

To calculate the orientation vector Θ, we select three subregions within the silhouette image, one for the upper body and one for each leg, as shown in Fig. 3(a). The SEA of each body part is set to the angle of the main axis of the pixels within the corresponding region, that is, the axis with the least second moment [13]. Given Θ, the height of each body part can be obtained from the height of the silhouette image.

The above estimation provides a good starting point, as shown in Fig. 3(b); however, it may not be very accurate in some cases. For example, in Fig. 3(b), the locations of the head and the right (in the image) shank of the model deviate noticeably from the actual ones. For further refinement, we seek the human model H* that best fits the silhouette image S. Using Bayesian inference, we formulate this procedure as

H^* = \arg\max_H p(H \mid S) = \arg\max_H p(S \mid H)\, p(H)   (3)

where the prior distribution p(H) is given in Eq. (1), and the likelihood p(S|H) specifies the silhouette generating process from the human model H to S. Let S' be the shape generated by the human model H, C_S the boundary point set of shape S, and A(S) the corresponding area. The similarity between two shapes can be measured by their overlap area and by the proximity of their boundaries. The likelihood function, therefore, is defined as

p(S \mid H) = p(S \mid S') \propto \Big( \prod_{v \in C_{S'}} G(D(v, C_S), \sigma_d^2) \Big)^{w_1} \Big( G(A(S) - A(S'), \sigma_s^2) \Big)^{w_2}   (4)

Here, σ_d² and σ_s² are the variances of the distance and area, respectively, w_1 and w_2 are the weights of the two components, and

D(v, C_S) = \min_{v' \in C_S} d(v, v')   (5)

is the minimum distance from point v to the contour of S, calculated by the distance transform.

Finding the global optimum H* is rather difficult due to the high dimensionality of H. Here, we use the Metropolis–Hastings method, which guarantees convergence to sampling from the posterior. Starting from the rough estimate H obtained above, the Metropolis–Hastings steps for adjusting H are:

1. Generate a new sample H' according to q(H → H'), where

q(H \to H') \propto p(H')\, G(H' - H, \Sigma_H)   (6)

and Σ_H is the covariance matrix of the model parameters.

2. Accept H' with probability

\alpha = \min\left(1, \frac{p(H' \mid S)\, q(H' \to H)}{p(H \mid S)\, q(H \to H')}\right)

3. Repeat steps 1 and 2 until p(H|S) is high enough or a maximum number of iterations is reached.
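The sampling loop above can be sketched in code as follows. This is a minimal, self-contained illustration rather than the authors' implementation: a toy quadratic log-posterior stands in for log p(H|S) of Eq. (3), and a simplified symmetric random-walk proposal replaces the prior-weighted proposal of Eq. (6), so the Hastings correction cancels.

```python
# Minimal Metropolis-Hastings sketch of the model-fitting step (illustrative).
import numpy as np

rng = np.random.default_rng(0)
target = np.array([0.1, -0.3, 0.2, 0.0, 0.4])   # assumed "true" parameters

def log_posterior(h: np.ndarray) -> float:
    # Stand-in for log p(H|S); in the real fit this combines the shape prior
    # of Eq. (1) with the silhouette likelihood of Eq. (4).
    return -0.5 * np.sum((h - target) ** 2) / 0.05

def metropolis_hastings(h0: np.ndarray, n_iter: int = 5000, step: float = 0.05):
    h, lp = h0.copy(), log_posterior(h0)
    best, best_lp = h.copy(), lp
    for _ in range(n_iter):
        proposal = h + rng.normal(scale=step, size=h.shape)   # q(H -> H')
        lp_new = log_posterior(proposal)
        # Symmetric proposal, so acceptance reduces to min(1, p(H'|S)/p(H|S)).
        if np.log(rng.uniform()) < lp_new - lp:
            h, lp = proposal, lp_new
            if lp > best_lp:
                best, best_lp = h.copy(), lp
    return best

h_star = metropolis_hastings(np.zeros(5))
print(np.round(h_star, 3))   # approaches `target`
```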
The sizes of the neck and head are then calculated from their relative sizes with respect to the trunk length [32]. Fig. 3(c) shows the initialization result for the silhouette image in Fig. 3(a), which is an improvement over the rough estimate in Fig. 3(b). We have noticed that the initialization step relies on the quality of the silhouette image. Hence, further refinement of the parameters may be needed if the silhouette image is severely corrupted. After the initialization step, the location and size of each body part within the image are specified; therefore, we can obtain the appearance model (W) of each body part based on the color information within the corresponding image region.

Fig. 3. (a) A silhouette image, where the three blocks correspond to the regions in which the shape model calculation is performed. (b) Rough estimation result. (c) Initialization result.

4. Tracking

Since we have extracted the shape model and the initial configuration, the next step is to extract gait signatures over time based on this shape model. This is also a two-dimensional tracking problem, i.e. locating the image position and orientation of each body part over the image sequence. Current 2D-based tracking methods use either image edges or dense optical flow for detection and tracking. However, image cues such as optical flow and edges are not entirely reliable, especially when calculated from noisy images. To achieve robustness, we need to carry out our computations within large regions, e.g. at the body part level. The image information we utilize is the color and the inner silhouette region.

For an input frame I_t at time instance t, we use the background model to obtain the silhouette image S_t. Given the appearance model (W) and the human model parameters M_t = (C_t, Θ_t), we can compose an image I(M_t; W). The best human model configuration should make this image as close to I_t as possible. In addition, the area of the human model should be equal to the area of the silhouette image, and the difference between the biped model configurations at time instances t−1 and t should be small. Therefore, we estimate the best biped model M_t by minimizing the total energy

E = w_c \sum \rho(I_t - I(M_t; W), \sigma) + w_A (A(S_t) - A(M_t))^2 + w_m |M_t - M_{t-1}|   (7)

Here, w_c, w_A, and w_m are three weight factors, and ρ is the Geman–McClure function, shown in Fig. 4 and defined as [1]

\rho(x, \sigma) = \frac{x^2}{\sigma + x^2}   (8)

which limits the effect of large residual values (x), as it saturates at one for large values of x. The robust scale parameter σ is defined as

\sigma = 1.4826 \times \operatorname{median} |I_t - I(M_t; W)|   (9)

Fig. 4. The Geman–McClure function.

The minimization of the energy term in Eq. (7) is equivalent to maximizing the probability

p(M_t \mid I_t) \propto \exp(-E)   (10)

which we do by employing the same Metropolis–Hastings method used in the initialization step. The initial C_t is calculated as the mass center of the silhouette image, as given in Eq. (2). The predicted orientations are given by

\Theta_t = 2\,\Theta_{t-1} - \Theta_{t-2}   (11)
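A self-contained sketch of the tracking energy of Eqs. (7)–(9) is given below. It is an illustration under assumptions rather than the original code: the weight factors and the toy inputs are placeholders, and the rendered image I(M_t; W) is simply supplied as an array.

```python
# Sketch of the tracking energy E of Eq. (7) with the Geman-McClure robust
# function of Eq. (8) and the scale estimate of Eq. (9). Weights and inputs
# are illustrative placeholders.
import numpy as np

def geman_mcclure(x: np.ndarray, sigma: float) -> np.ndarray:
    """rho(x, sigma) = x^2 / (sigma + x^2); saturates at 1 for large |x|."""
    return x ** 2 / (sigma + x ** 2)

def robust_scale(residual: np.ndarray) -> float:
    """sigma = 1.4826 * median(|I_t - I(M_t; W)|), cf. Eq. (9)."""
    return 1.4826 * float(np.median(np.abs(residual)))

def tracking_energy(frame, rendered, silhouette_area, model_area,
                    pose, prev_pose, w_c=1.0, w_A=1e-3, w_m=1e-2):
    residual = frame - rendered                      # I_t - I(M_t; W)
    sigma = robust_scale(residual) + 1e-8            # avoid division by zero
    color_term = np.sum(geman_mcclure(residual, sigma))
    area_term = (silhouette_area - model_area) ** 2
    smooth_term = np.sum(np.abs(pose - prev_pose))   # |M_t - M_{t-1}|
    return w_c * color_term + w_A * area_term + w_m * smooth_term

# Toy example with random images and 7-DOF poses (illustration only).
rng = np.random.default_rng(1)
frame, rendered = rng.random((48, 32)), rng.random((48, 32))
pose, prev_pose = rng.random(7), rng.random(7)
print(tracking_energy(frame, rendered, 500.0, 480.0, pose, prev_pose))
```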
5. Recognition

Based on the tracking results obtained with the biped model, the differences among people are largely temporal. It is, therefore, necessary to choose a feature representation that makes the temporal characteristics of the data explicit. The sagittal elevation angles extracted by the above tracking procedure capture the temporal dynamics of the subject's gait, whereas the trajectories of the corresponding joint positions reveal the spatio-temporal history. In addition, studies have shown that the SEAs exhibit less inter-subject variation across humans [3,29]. Therefore, our recognition method focuses on the joint position trajectories, as described in detail in this section.

5.1. Recognition features

To this end, we first compute the following space domain features: ankle elevation (s_1), knee elevation (s_2), ankle stride width (s_3), and knee stride width (s_4), as illustrated in Fig. 5.

Fig. 5. The space domain features: knee stride width, knee elevation, ankle elevation, and ankle stride width.

The trajectories of these four features for two different subjects are shown in Fig. 6. In this plot, all four trajectories are truncated into several pieces, each containing the motion dynamics within one gait cycle, and all gait cycle lengths are normalized to the interval from 0 to 1. From Fig. 6, we see that the differences between the two subjects are subtle.

Fig. 6. The trajectories of the space domain features for two different subjects.

To distinguish two gait sequences, a frequency domain representation seems particularly suitable due to the cyclic nature of gait. For each of the four features s_i, we compute the Discrete Fourier Transform (DFT), denoted S_i, over a fixed window size of 32 frames, which we slide over the feature signal sequences:

S_i(n) = \frac{1}{32} \sum_{k=0}^{31} s_i(k)\, e^{-2\pi i n k / 32}, \quad n = 0, \ldots, 31   (12)

The window size of 32 frames is chosen to be close to a typical human gait cycle. Future work should also investigate an adaptive window size based on the actual gait cycle period of each person.

The DFTs reveal periodicities in the feature data as well as the relative strengths of any periodic components. Since the zero-th order frequency component does not provide any information on the periodicity of the signal, while the high frequency components mainly capture noise, we sample the magnitude and phase of the second to fifth lowest frequency components. This leads to a feature vector containing 4 magnitude and 4 phase measures for each of the four space domain base features (S_1, ..., S_4), for an overall dimension of 32.
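The feature computation can be sketched as follows (a minimal illustration assuming NumPy, not the original code): a 32-frame window of each of the four space-domain signals is transformed with the DFT, and the magnitude and phase of the second through fifth lowest frequency components are concatenated into the 32-dimensional feature vector.

```python
# Sketch of the 32-dimensional frequency-domain gait feature of Section 5.1.
# Input: a (T, 4) array holding ankle elevation, knee elevation, ankle stride
# width and knee stride width per frame. Illustrative only.
import numpy as np

WINDOW = 32            # close to one typical gait cycle
BANDS = slice(1, 5)    # second to fifth lowest frequency components

def gait_feature(signals: np.ndarray, start: int = 0) -> np.ndarray:
    """Magnitude and phase of DFT components 1..4 of each base signal."""
    window = signals[start:start + WINDOW]           # shape (32, 4)
    spectrum = np.fft.fft(window, axis=0) / WINDOW   # S_i(n), cf. Eq. (12)
    selected = spectrum[BANDS]                       # shape (4, 4)
    return np.concatenate([np.abs(selected).ravel(),
                           np.angle(selected).ravel()])  # 4 x 4 x 2 = 32

# Toy example: noisy periodic signals standing in for real trajectories.
t = np.arange(64)
toy = np.stack([np.sin(2 * np.pi * t / 32 + p) for p in (0.0, 0.4, 0.8, 1.2)],
               axis=1)
print(gait_feature(toy).shape)   # (32,)
```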
5.2. Recognition method

After computing the features for each of the gait data samples, we segment the resulting data stream according to its gait cycles, so that any single example contains only the data from a single gait cycle. In this way, recognition is analogous to isolated speech recognition. Therefore, we apply Hidden Markov Models (HMMs) for identification, which have been used successfully in speech recognition. We consider first-order HMMs, in which the current state depends only on the previous state. The observation for the HMM is the 32-dimensional feature vector described above. The HMM is represented as (π, A, B). In the training step, the initial state distribution π, the transition probabilities A, and the observation probabilities B are estimated using the standard Baum–Welch re-estimation method [26]. Given a test example, we compute the likelihood of each HMM on the example and choose the HMM with the highest likelihood as the correct one, i.e. we label it as k* such that k* = arg max_k p_k(O | π, A, B).

With multiple gait cycles for the same person, we can recognize each gait cycle individually using the above method. To combine the recognition results, we aggregate the N-best recognition: for the recognition result of each gait cycle, we assign a score of 20 to the first rank, 19 to the second rank, and so on; we then sum the rank scores over all gait cycles for each hypothesis and pick the result with the highest cumulative score. Performing aggregation in this way yields an improved identification rate.
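The N-best aggregation over gait cycles can be sketched as follows (a minimal illustration, not the original code): each cycle's ranked hypothesis list contributes a score of 20 for its first rank, 19 for its second, and so on, and the label with the highest cumulative score is reported.

```python
# Sketch of the N-best rank aggregation used to combine per-cycle results.
from collections import defaultdict
from typing import Dict, List

def aggregate_n_best(ranked_lists: List[List[str]], top_score: int = 20) -> str:
    """Each per-cycle ranking contributes top_score, top_score-1, ... points."""
    scores: Dict[str, int] = defaultdict(int)
    for ranking in ranked_lists:
        for rank, label in enumerate(ranking[:top_score]):
            scores[label] += top_score - rank
    return max(scores, key=scores.get)

# Toy example: three gait cycles of the same probe, ranked by HMM likelihood.
cycles = [
    ["subject_03", "subject_12", "subject_07"],
    ["subject_12", "subject_03", "subject_07"],
    ["subject_03", "subject_07", "subject_12"],
]
print(aggregate_n_best(cycles))   # -> subject_03
```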
6. Experiments and results

The above algorithm is applied to both the CMU MoBo data set [10] and the USF Gait Challenge data set [25].

6.1. CMU MoBo data set

The CMU data set contains video sequences of 25 individuals, with 824 gait cycles, walking on a treadmill under four different conditions: slow, fast, incline, or with a ball in hand. Figs. 7 and 8 show tracking results for sample images.

Fig. 7. (a) Sample image from the CMU MoBo data set. (b) Result of fitting the biped model.

Fig. 8. Tracking results for one subject.

In our first experiment, we split the gait cycles randomly into a training and a test set at a ratio of 3:1, so that both sets contain a mix of examples from all four walking activities. The cumulative match score is plotted as a function of rank in Fig. 9. We achieve a 96% identification accuracy (rank = 1), and the correct identification always occurs within the top 3 ranks.

Fig. 9. CMS plot of the CMU gait data.

We also carried out the following experiments on this data set:

1. Train with slow walk and test with slow walk.
2. Train with fast walk and test with fast walk.
3. Train with incline walk and test with incline walk.
4. Train with walking while holding a ball and test with walking while holding a ball.
5. Train with slow walk and test with walking while holding a ball.

In the first four experiments, the sequences are divided into training and testing sets at a ratio of 4:1. In case (5), the entire slow walk sequences are used for training, and only one gait cycle of walking with a ball is used for evaluation.

Table 1
Identification rate PI (%) at ranks 1, 2, 5 and 10 on the CMU MoBo data set for the five experiments

Experiment  Train set     Test set      Rank 1  Rank 2  Rank 5  Rank 10
1           Slow          Slow          100     100     100     100
2           Fast          Fast          96.0    100     100     100
3           Incline       Incline       95.8    100     100     100
4           Holding ball  Holding ball  100     100     100     100
5           Slow          Holding ball  52.2    60.9    69.6    91.3

The results of the five experiments are summarized in Table 1. As we can see, the recognition rate reaches 100% at the top match for experiments (1) and (4), and is nearly perfect (around 96%) for (2) and (3). This shows that our method performs better than shape-based approaches such as [16], suggesting that the motion dynamics of different subjects are accurately captured by our five-link biped human model. However, our recognition result for experiment (5) is significantly worse than those for (1)–(4). A common observation from physiology is that human locomotion must satisfy postural stability and dynamic equilibrium. Hence, a subject slightly changes his or her gait when holding a ball, owing to the adjustment needed to balance the additional weight and the restriction of arm movement. Consequently, the poor recognition rate in this case is a natural outcome for methods that use only dynamic information. The higher recognition rates achieved by other recognition methods indicate that human shape information is being employed in addition to gait.

6.2. USF Gait Challenge data set

The USF Gait Challenge data set contains people walking in a natural outdoor setting. Since the sequences are taken outdoors, we need to handle background changes due to shadows, moving background, lighting changes, etc. Therefore, we applied the non-parametric background modelling of [8] for silhouette extraction. Fig. 11 shows typical images from this data set and the corresponding silhouette images, whereas Fig. 12 shows regions in the silhouette images containing the subject. Due to the color similarity between the subject and the outdoor scene, the silhouette images are not smooth and may be discontinuous. Therefore, only region-based approaches can provide reliable feature extraction results.

Fig. 11. (a) Original images. (b) Silhouette images.

Fig. 12. Sample silhouettes from the USF data.

Although the subjects walk along an elliptical track on the ground, we only use the sequences in which they are walking parallel to the camera. Typically, each gait data sample in this data set contains 4–7 individual gait cycles. Overall, there are 75 subjects in the data set, with a total of 2045 gait cycles. 75% of the cycles are randomly selected to form the training set, with the rest forming the test set. Both sets contain a mix of examples from subjects with different camera views, types of shoes, and surfaces. The identification rate (rank = 1) for the entire USF data set is 61%, as shown in Fig. 10.

Fig. 10. CMS plot of the USF gait data.

The recognition rate is lower than that for the CMU data set, which may be attributed either to the number of subjects in the data set or to the distance between the subjects and the camera. We randomly choose a certain number of subjects from the USF data set and use their data for the recognition process. The process is repeated 20 times for data set sizes varying from 15 to 40 subjects. The average recognition rate and its standard deviation are shown in Fig. 13 as a function of the number of subjects. Clearly, increasing the data set size leads to a decrease in the recognition rate. Using 25 subjects, as in the CMU MoBo data set, a recognition rate of (77 ± 5)% is obtained for the USF data set. Although this represents a significant improvement over the 61% obtained when all 75 subjects are used, it does not completely account for the lower recognition rate on the USF data set. We note that the average image length of the thighs for the subjects here is 26.7 pixels, compared to approximately 130 pixels in the CMU data set. Therefore, the accuracy of the extracted features is limited in the USF data set. The subtle inter-subject movement differences cannot be fully extracted from these images, which results in a lower recognition rate. Both the number of subjects and the image resolution are hence important factors affecting the recognition rate.

Fig. 13. Recognition rate at rank 1 as a function of the number of subjects in the data set.
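The subject-subsampling protocol just described can be sketched as follows (illustrative only; the recognition call is a toy stand-in for running the full train/test pipeline on the selected subjects):

```python
# Sketch of the subject-subsampling protocol: repeatedly draw a subset of
# subjects, run recognition on that subset, and report the mean and standard
# deviation of the rank-1 rate.
import numpy as np

rng = np.random.default_rng(42)

def toy_recognition_rate(subjects: np.ndarray) -> float:
    # Placeholder for the full pipeline; rate degrades mildly with gallery size.
    return max(0.0, 0.95 - 0.005 * len(subjects) + rng.normal(0, 0.02))

def subsampling_curve(all_subjects, sizes=(15, 20, 25, 30, 35, 40), repeats=20):
    results = {}
    for n in sizes:
        rates = [toy_recognition_rate(
                     rng.choice(all_subjects, size=n, replace=False))
                 for _ in range(repeats)]
        results[n] = (float(np.mean(rates)), float(np.std(rates)))
    return results

subjects = np.arange(75)   # the USF set has 75 subjects
for n, (mean, std) in subsampling_curve(subjects).items():
    print(f"{n} subjects: {mean:.2f} +/- {std:.2f}")
```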
7. Conclusion

In this paper, we have presented a novel two-step, model-based approach to gait recognition that uses exclusively the human body movement information within the sagittal plane. As concealable shape and appearance information is avoided, robust recognition is achieved with our method. Applying this approach to the CMU MoBo data set and the USF Gait Challenge data set, we achieve recognition rates of 96% and 61%, respectively. The lower recognition rate for the USF data set is attributed both to the larger number of subjects and to the longer distance from the camera to the subjects. This suggests that proper zoom lenses are needed to ensure that the gait motion is seen in sufficient detail.

The experimental results demonstrate that the sagittal plane contains identification information. Other viewpoints may also contain important gait features. For example, images captured by a camera facing the subject reveal additional swaying and toe-in/toe-out information, which may also be useful for recognition. In future work, we will combine the frontal view of the subject to obtain other information, such as the toe-out angle and the bending of the legs [29], to further improve the recognition rate.

Acknowledgements

The authors would like to thank Shan Lu for fruitful discussions, and Stratos Loukidis for setting up the database. This work is supported by the National Science Foundation, contract numbers NSF-ITR-0205671, NSF-ITR-0313184, and NSF-0200983.

References

[1] S. Ayer, H.S. Sawhney, Layered representation of motion video using robust maximum-likelihood estimation of mixture models and MDL encoding, in: IEEE International Conference on Computer Vision, 1995, pp. 777–784.
[2] C. BenAbdelkader, R. Cutler, H. Nanda, L.S. Davis, Eigengait: motion-based recognition of people using image self-similarity, in: Proceedings of the International Conference on Audio- and Video-based Person Authentication, 2001.
[3] A. Borghese, L. Bianchi, F. Lacquaniti, Kinematic determinants of human locomotion, Journal of Physiology 494 (3) (1996) 863–879.
[4] R. Chellappa, C. Wilson, S. Sirohev, Human and machine recognition of faces: a survey, Proceedings of the IEEE 83 (5) (1995) 705–740.
[5] R.T. Collins, R. Gross, J. Shi, Silhouette-based human identification from body shape and gait, in: International Conference on Automatic Face and Gesture Recognition, 2002.
[6] D. Cunado, M.S. Nixon, J.N. Carter, Using gait as a biometric, via phase-weighted magnitude spectra, in: First International Conference on Audio- and Video-based Biometric Person Authentication, 1997.
[7] J.G. Daugman, High confidence visual recognition of persons by a test of statistical independence, IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (11) (1993) 1148–1161.
[8] A. Elgammal, D. Harwood, L. Davis, Non-parametric model for background subtraction, in: Sixth European Conference on Computer Vision, 2000.
[9] J.P. Foster, M.S. Nixon, A. Prudel-Bennett, Automatic gait recognition using area-based metrics, Pattern Recognition Letters 24 (14) (2001) 2489–2497.
[10] R. Gross, J. Shi, The CMU Motion of Body (MoBo) Database, Technical Report CMU-RI-TR-01-18, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, June 2001.
[11] H. Murase, R. Sakai, Moving object recognition in eigenspace representation: gait analysis and lip reading, Pattern Recognition Letters 17 (2) (1996) 155–162.
[12] J.B. Hayfron-Acquah, M.S. Nixon, J.N. Carter, Automatic gait recognition by symmetry analysis, Pattern Recognition Letters 24 (13) (2003) 2175–2183.
[13] B.K.P. Horn, Robot Vision, MIT Press, Cambridge, MA, 1986.
[14] P.S. Huang, C.J. Harris, M.S. Nixon, Human gait recognition in canonical space using temporal templates, IEE Proceedings - Vision, Image and Signal Processing 146 (2) (1999) 93–100.
[15] G. Johansson, Visual perception of biological motion and a model for its analysis, Perception and Psychophysics 14 (2) (1973) 201–211.
[16] A. Kale, N. Cuntoor, B. Yegnanarayana, A.N. Rajagopalan, R. Chellappa, Gait analysis for human identification, in: Proceedings of the Third International Conference on Audio- and Video-based Person Authentication, 2003.
[17] A. Kale, A.N. Rajagopalan, N. Cuntoor, V. Kruger, Gait based recognition of humans using continuous HMMs, in: Face and Gesture Recognition, 2002.
[18] K. Karu, A.K. Jain, Fingerprint classification, Pattern Recognition 29 (3) (1996) 389–404.
[19] C.S. Lee, A. Elgammal, Gait style and gait content: bilinear model for gait recognition using gait re-sampling, in: Sixth International Conference on Automatic Face and Gesture Recognition, 2004.
[20] L. Lee, G. Dailey, K. Tieu, Learning pedestrian models for silhouette refinement, in: International Conference on Computer Vision and Pattern Recognition, 2003.
[21] L. Lee, W.E.L. Grimson, Gait analysis for recognition and classification, in: IEEE Conference on Face and Gesture Recognition, 2002, pp. 155–161.
[22] L. Little, J. Boyd, Recognizing people by their gait: the shape of motion, Videre 1 (2) (1996) 1–32.
[23] M.P. Murray, A.B. Drought, R.C. Kory, Walking patterns of normal men, Journal of Bone and Joint Surgery 46-A (2) (1964) 335–360.
[24] S.A. Niyogi, E.H. Adelson, Analyzing and recognizing walking figures in XYT, in: IEEE Conference on Computer Vision and Pattern Recognition, 1994.
[25] P.J. Phillips, S. Sarkar, I. Robledo, P. Grother, K. Bowyer, The gait identification challenge problem: data sets and baseline algorithm, in: International Conference on Pattern Recognition, 2002.
[26] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989) 257–286.
[27] C. Sminchisescu, B. Triggs, Covariance scaled sampling for monocular 3D body tracking, in: IEEE International Conference on Computer Vision and Pattern Recognition, 2001.
[28] S.V. Stevenage, M.S. Nixon, K. Vince, Visual analysis of gait as a cue to identity, Applied Cognitive Psychology 13 (1999) 513–526.
[29] H. Sun, Curved Path Human Locomotion on Uneven Terrain, PhD thesis, University of Pennsylvania, 2000.
[30] A. Sundaresan, A. RoyChowdhury, R. Chellappa, A hidden Markov model based framework for recognition of humans from gait sequences, in: International Conference on Image Processing, 2003.
[31] R. Tanawongsuwan, A. Bobick, Gait recognition from time-normalized joint-angle trajectories in the walking plane, in: IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2001, pp. 726–731.
[32] A.R. Tilley (Ed.), The Measure of Man and Woman: Human Factors in Design, H.D. Associates, New York, 1993.
[33] N.F. Troje, Decomposing biological motion: a framework for analysis and synthesis of human gait patterns, Journal of Vision 2 (2002) 371–387.
[34] C.Y. Yam, M.S. Nixon, J.N. Carter, On the relationship of human walking and running: automatic person identification by gait, in: International Conference on Pattern Recognition, 2002.
[35] J.H. Yoo, M.S. Nixon, C.J. Harris, Extracting human gait signatures by body segment properties, in: Southwest Symposium on Image Analysis and Interpretation, 2002, pp. 35–39.