Academia.eduAcademia.edu

Contour Shapes and Gesture Recognition by Neural Network

2012, International Journal of Computer Theory and Engineering

This paper describes on a real time tracking by using images captured from a closed circuit television (CCTV) before being transmitted to a recognition system for identification of the object's contour shape and gesture. The purposes of this research are to develop a contour shapes and gesture recognition model that can be implemented in an intelligent CCTV target recognition system to discover the possible crime events immediately at the critical areas, while reducing the human power. The crime events that had been focused on were robberies and stealing that commonly happen in shopping malls and ATM machines. Therefore, the contour shape of dangerous weapon and suspected person's gesture had been included in this study. The recognition system was designed using the Image Processing and Neural Network tools of Matrix Laboratory (MATLAB) programming language. The analysis of Sum Square Error and correlation coefficient of the designed network in this study had showed that the recognition system was performing well in recognizing the contour shapes and gesture.

International Journal of Computer Theory and Engineering, Vol. 4, No. 4, August 2012 Contour Shapes and Gesture Recognition by Neural Network Lee Chin Kho, Sze Song Ngu, Annie Joseph, and Liang Yew Ng  network. Before that, the basic surveillance system was briefly discussed because it was the medium used in this study to capture the images before transmitting to the recognition system to identify the contour shapes of dangerous weapons and suspected person’s motions. The basic surveillance system consisted of four main components, which were cameras, transmission medium, the peripheral and monitor as shown in the Fig. 1. Abstract—This paper describes on a real time tracking by using images captured from a closed circuit television (CCTV) before being transmitted to a recognition system for identification of the object’s contour shape and gesture. The purposes of this research are to develop a contour shapes and gesture recognition model that can be implemented in an intelligent CCTV target recognition system to discover the possible crime events immediately at the critical areas, while reducing the human power. The crime events that had been focused on were robberies and stealing that commonly happen in shopping malls and ATM machines. Therefore, the contour shape of dangerous weapon and suspected person’s gesture had been included in this study. The recognition system was designed using the Image Processing and Neural Network tools of Matrix Laboratory (MATLAB) programming language. The analysis of Sum Square Error and correlation coefficient of the designed network in this study had showed that the recognition system was performing well in recognizing the contour shapes and gesture. Fig. 1. Basic surveillance system. Index Terms—Contour shape, neural network, multilayer perceptron, sum square error (SSE). The real time tracking pattern recognition program in this paper refered to the automatic surveillance that consisted of specific object detection and motion detection which were used to recognize the dangerous weapons and suspected person’s motions. These functions were important to improve the ability of the surveillance software. I. INTRODUCTION Nowadays, closed circuit television (CCTV) system becomes commonly used for monitoring and surveillance, especially in commercial areas. To observe wider area, larger amount of camera is required. However, the data of CCTV will not even be processed or looked because it requires intensive labors for monitoring purpose. Therefore, the development of real time tracking systems on the contour shape like dangerous weapons or suspected motions for crime prevention is necessary in order to reduce the crime events that keep increasing nowadays. Some studies on automated surveillance [1], motion detection [2]-[5], and human shape recognition [6]-[10] had been proposed and constructed by other researchers. This study is critical in applying the contour shape recognition system to the human security field. In this study, the real time tracking system was developed by the pattern recognition program, moving multiple frames into workspace, motion detection and lastly the neural II. DESIGN MODULE The design module for this study consisted of eight main stages: Motion Detection, Frame Crop to Edge, Frame Resize, Frame Representation in Single Vector, Assemble the Training Data, Define the Network, Train the Network, and Simulate the Network Response to Testing. The Motion Detection was used to produce a set of frames that consisted of moving objects. These frames were then used to initialize the frame crop to edge procedure. After that, the cropped frames were led to the frame resize process before being converted into single vector. Once the frame became single vector, it would be the training data to initialize the Neural Network, and if it failed to do so, the frame would go back to the initial stage to repeat the image processing stages. After the image processing stage, the process would proceed to assemble the training data which would then load to the defined network before it could be trained, and simulated the network response to the testing set. If the network was able to recognize the contour shape, the recognition system was successfully established. If not, the neural network stages were repeated with more varieties of training set. Manuscript received May 27, 2012; revised June 27, 2012. Lee Chin Kho is with Department of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, 923-12 Japan (e-mail: [email protected]). Sze Song Ngu and Liang Yew Ng are with the Electronic Engineering Department, Faculty of Engineering, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Malaysia (e-mail: [email protected], [email protected]). Annie Joseph is with Kobe University, 657-8501 Kobe Shi, Nada-Ku, Rokko dai cho, 1-1, Japan (e-mail:[email protected]) 662 International Journal of Computer Theory and Engineering, Vol. 4, No. 4, August 2012 Multilayer Perceptron (MLP) backpropagation neural network was used in this study. This was because MLP backpropagation neural network worked well for pattern matching and this feature was very important in order to create the recognition system. Backpropagation neural network was a feed forward network that used supervised learning to adjust the connection weights [11]. Training the neural network involved processing a set of training data and computing the axis crossover representation for each object. Each frame vector was then given a label of dangerous weapon, not dangerous weapon, suspected person’s motion or not suspected person’s motion based on what class of object it represented. The general structure of the neural network used to classify the frame vectors was illustrated in Fig. 2. Frame 1 Frame 2 Frame 3 Input Layer Fig. 3. Comparison between testing result and actual result for the recognition system. Hidden Layer Output Layer Dangerous Weapon Not Dangerous Weapon Frame 4 Frame 5 Suspected Person Frame 6 Not Suspected Person Frame 7 Frame n Fig. 2. Feed-forward neural network used to classify the frame crossover vectors consists of a single hidden layer. Once the network weights and biases had been initialized, the network was ready for training. The network could be trained for function approximation (nonlinear regression), pattern association, or pattern classification. The training process required a set of examples of proper network behavior - network inputs P and target outputs T. The performance function for feed forward networks was Sum Square Error (SSE) - the total squared error between the network outputs and the target outputs T. III. RESULTS AND DISCUSSION The frames for dangerous weapon recognition system and suspected person motion recognition system were led to the testing set for its network. Thus, this system consisted of 80 frames of testing set, which were 20 frames of dangerous weapon, 20 frames of NOT dangerous weapon, 20 frames of NOT suspected person’s motion and 20 frames of suspected person’s motion. The network testing result would be the dangerous weapon, NOT dangerous weapon, NOT suspected person’s motion or suspected person’s motion. This was because of the four linear output neurons that had been set for the network training of the system. A graph which consisted of the actual result and testing result for the recognition system was plotted and shown in Fig. 3. 663 Axis-y in Fig. 3 is the linear output neurons where 1 represents dangerous weapon, 2 represents NOT dangerous weapon, 3 represents NOT suspected person’s motion and 4 represents suspected person’s motion, whereas the axis-x is the frames that lead to the network of the testing set for the system. The blue line with the round nodes represents the actual result for every frame that leads to the network. There were total 80 frames that had been tested. The first 20 frames had actual results of 1 (dangerous weapon), 21 to 40 frame had actual result of 2 (NOT dangerous weapon), 41 to 60 frame had actual result of 3 (NOT suspected person’s motion) while the remaining frames had actual results of 4 (suspected person’s motion). The red dashed line with triangle nodes represents the testing result of the network for every frame. There were some error recognition occurred in the network as shown in Fig. 3. The network recognized fifth, sixteenth, seventeenth, eighteenth and nineteenth frame as 2 (NOT dangerous weapon); second and seventh frame as 3 (NOT suspected person’s motion); and twentieth frame as 4 (suspected person’s motion), while all frames from 1 to 20 were supposed to be recognized as 1 (dangerous weapon). This was why there were 8 red triangle nodes mismatched with the blue round nodes on line 1 (dangerous weapon) for the first 20 frames. For the 21 to 40 frame, the network was wrongly recognized for thirty-first and thirty-seventh frame as dangerous weapon and NOT suspected person’s motion. For the 41 to 60 frame, the actual result should be NOT suspected person’s motion, but the network was wrongly recognized for fifty-sixth frame as dangerous weapon. For the remaining frames with actual result of suspected person’s motion, the network was wrongly recognized at sixtieth frame as NOT suspected person’s motion. Therefore, total wrong recognition for the network was 12 out of 80 frames. In order to determine the accuracy of the network, Sum Square Error (SSE) and correlation coefficient (R-value) were used as referred. The SSE was used to measure the network performance function, whereas R-value was the computation between the network response and the target shown in linear regression between the network response and the target. International Journal of Computer Theory and Engineering, Vol. 4, No. 4, August 2012 Fig. 4 illustrates the linear regression for recognition system that corresponds to the testing result. There were eight errors recognition of frame at the first 20 frame or at T = 1, which resulted in the best linear fit for T = 1 around 1.45. On the other hand, there were two errors recognition of frame at the 21 to 40 frame, causing the best linear fit for T = 2 around 2.2. There was one error recognition at the 41 to 60 frame and the best linear fit value for T = 3 was equal to 3. Lastly, there was also one error recognition at the 61 to 80 frame, causing the best linear fit value for T = 4 which was around 3.8. As shown in Fig. 4, the correlation coefficient for the best linear fit line R-value was 0.852 and from the Figure 3, the sum square error was SSE  (2) 2  (1) 2  (1) 2  (1) 2  (2) 2  (1) 2  (1) 2  (3) 2  (1) 2  (1) 2  (2) 2  (1) 2  27 Fig. 4. Linear regression for the recognition system. For each simulation, different values of SSE and R-value were obtained due to the random initial weights for network training [12]. Therefore, in order to get the more accurate value of SSE and R-value for each recognition system, at least ten simulations should be recorded and calculated for the average values. Table I shows the simulation values of the SSE and R-value for the recognition system. The smallest R-value and the largest SSE value for the recognition system was 0.673 and 63 at 1st simulation. By comparing every couple values of SSE and R-value for each recognition system, it was found that the R-value was inversely proportional to the SSE value. Average value of SSE with Different Hidden Neurons SSE 120 100 80 60 40 20 0 SSE 1 5 10 20 40 60 80 100 120 140 Hidden Neurons TABLE I: SIMULATIONS VALUES OF SUM SQUARE ERROR AND CORRELATION COEFFICIENT (R-VALUE) FOR RECOGNITION SYSTEM Simulation 1 2 3 4 5 6 7 8 9 10 Average of SSE and R-value Fig. 5. Average values of Sum Square Error with different hidden neurons for Combine recognition system Recognition Systems SSE 63 26 47 46 33 58 42 32 27 58 R 0.673 0.868 0.755 0.762 0.832 0.695 0.793 0.837 0.865 0.698 43.2 0.778 Fig. 6 shows that the number of hidden neurons with 1, 120 and 140 had smaller R-value compared to the remaining hidden neurons. It meant that the system with hidden neuron of 1, 120 and 140 had lower accuracy compared to others. When hidden neuron was 1, the network was probably already brain-dead, and would never learn. For the networks with 120 and 140 hidden neurons, the network's predictive powers could only be improved by reducing the number of hidden neurons to the acceptable range. Hidden neurons in the range of 5 to 100 are suitable to be applied in this system. However, the best number of hidden neuron that could be set was 80 because it had the highest average R-value and the lowest average value of Sum Square Error compared to others. Besides, the performance of the algorithm in this study was very sensitive to the proper setting of the learning rate. If the learning rate was set too high, the algorithm might oscillate and became unstable. If the learning rate was too small, the algorithm would take too long to converge [11]. Therefore, the comparison between different learning rates was done on the system. The average values of 10 simulations for both SSE and R-value with different hidden neurons had been calculated and recorded. The different hidden neurons that had been set for the comparison were 0.1, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02 and 0.01. When training a network, the number of hidden neurons is critical. If there is too few of hidden neurons, it means that there is not enough available "brain" to learn the problem. Whereas too many, the network "memorizes" instead of "learns" [13]. Therefore, it is important to find out the most suitable number of hidden neuron that can be used in this study. The different numbers of hidden neuron that had been set for the comparison were 1, 5, 10, 20, 40, 60, 80, 100, 120 and 140. From Fig. 5, the number of hidden neurons with 1, 120 and 140 had larger value of SSE than the remaining of hidden neurons. This indicated that the system with hidden neuron of 1, 120 and 140 had lower accuracy and they were not suitable to be applied in this system. 664 International Journal of Computer Theory and Engineering, Vol. 4, No. 4, August 2012 0.05 were not suggested to be used in this system. Therefore, the system was accepting the range of learning rate between 0.04 to 0.01. However, the best learning rate for this system was 0.04 because it had the highest average R-value and the lowest average value of Sum Square Error compared to others. Average value of Correlation Coefficient (R) with Different Hidden Neurons 0.9 Correlation Coefficient 0.8 0.7 0.6 0.5 R 0.4 IV. CONCLUSION 0.3 This study was implemented utilizing basic MATLAB programming which was capable of combining image processing and neural network techniques to create a contour shape recognition system. From the results, the system had been proved that it was performing well in recognizing the dangerous weapon and suspected person’s motion. By analyzing the values of Sum Square Error and Correlation Coefficient (R-value), the accuracy of the recognition system could be verified. Most of the major features of the system had been successfully accomplished and all the requirements had been fulfilled, but there were some limitations due to certain constraint occurred. The limitations were that the system would take longer time to operate if the number of training set was too large and there was higher resolution of the frame in the training set. 0.2 0.1 0 1 5 10 20 40 60 80 100 120 140 Neurons Fig. 6. Average values of correlation coefficient (R) with different hidden neurons for combine recognition system. Fig. 7 shows the average values of SSE with different learning rate for the recognition system. The learning rate of 0.04, 0.03, 0.02 and 0.01 had smaller value and it meant that the system had higher accuracy compared to others. In other words, the learning rate of 0.1, 0.09, 0.08, 0.07 0.06 and 0.05 were not suitable to be applied in this system. SSE Average value of SSE with Different Learning Rate 200 ACKNOWLEDGMENT 150 The author would like to thank Universiti Malaysia Sarawak (UNIMAS) for providing the funding to publish and present this paper. 100 SSE 50 REFERENCES 0 R. T. Collins, A. J. Lipton, and T. Kanade, “A system for video surveillance and monitoring,” in Proc. 8th International Topical Meeting on Robotics and Remote Systems, USA, 1999, pp. 1–15. [2] R. Cutler and L. S. Davis, “Robust real-time periodic motion detection, analysis, and applications,” in IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, pp. 781– 796, August 2000. [3] Y. Guo, G. Xu, and S. Tsuji, “Understanding human motion patterns,” in Proc. 12th IAPR International Conference on Pattern Recognition, Jerusalem, vol. 2 , 1994, pp. 325-330 [4] J. Russell, “Detecting Humans in Video Footage using Multiple Classifiers,” Honours dissertation, School of Comp. Sci. and Software Eng., Western Australia Uni., 2004. [5] L. Wang, W. Hu, and T. Tan. (May 2002). Recent developments in human motion analysis. The Journal of the Pattern Recognition Society. [Online]. 36. pp. 585–601. Available: http://vc.cs.nthu.edu.tw/home/paper/codfiles/pcchu/200404211710/re cent_developments_in_human_motion_analysis.pdf [6] K. Tabb, S. George, R. Adams, and N. Davey, “Human shape recognition from snakes using neural networks,” in Proc. 3rd International Conference on Computational Intelligence and Multimedia Applications, USA, 1999, pp. 292–296. [7] R. Duda, P. Hart, and D. Stork, Pattern Classification, New York, NY: J. Wiley and Sons, 2001. [8] C. A. Nicolaou, A. L. Egbert, R. C. Lacher, and S. I. Bassett, “Human shape recognition using the method of moments and artificial neural networks,” in Proc.IJCNN’99 International Joint Conference on Neural Networks, Washington, vol. 5, 1999, pp. 3147–3151. [9] D. Comaniciu, V. Ramesh, and P. Meer, “Real-time tracking of nonrigid objects using mean shift,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, USA, vol. 2, 2000, pp. 142–149. [10] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no 24, pp. 509-522, April 2002 [1] 0.1 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 Learning Rate Fig. 7. Average values of sum square error with different learning rate for the recognition system. Average value of Correlation Coefficient (R) with Different Learning Rate 0.9 Correlation Coefficient 0.8 0.7 0.6 0.5 R 0.4 0.3 0.2 0.1 0 0.1 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 Learning Rate Fig. 8. Average values of correlation coefficient (R) with different learning rate for the recognition system. From Fig. 8, the learning rate of 0.1, 0.09, 0.08, 0.07 0.06 and 0.05 had smaller R-value compared to the remaining learning rate. These high learning rates would cause the algorithm to be oscillated and become unstable. Thus, the system with learning rate of 0.1, 0.09, 0.08, 0.07, 0.06 and 665 International Journal of Computer Theory and Engineering, Vol. 4, No. 4, August 2012 [11] The MathWorks. Neural Network Toolbox 6.0. (January 2008). [Online]. Available: http://www.mathworks.com/products/neuralnet/. [12] A. Pavelka and A. Proch´azka, “Algorithms for Initialization of Neural Network Weights,” Sbornik prispevku 11. Konference MATLAB 2004, vol. 2, 2004, pp. 453-459, [13] VerDuin, “Solving Manufacturing Problems with Neural Networks,” in Article Automation (Cleveland, Ohio: 1987), July 1990, pp. 54-58. Sze Song Ngu received the B.Eng. (Hons) degree in Electronics Engineering from Multimedia University, Cyberjaya (2003) and M.Eng degree in Electrical Engineering from the University of Adelaide (2004). He is working as a lecturer in the Department of Electronic Engineering at University Malaysia Sarawak (UNIMAS), Malaysia. He is currently a PhD. Student with the School of Engineering at the University Glasgoww. His research interests include electrical machines and drive, power electronics, control system and renewable energy. Lee Chin Kho received the B.Eng (Hons) Electronics Engineering from Multimedia University in 2003 and Master of Electrical Engineering from Adelaide University in 2004. Now, she is further her PhD. study in Japan Advance Institute of Science and Technology (JAIST). In 2003, she becomes Process Integration Engineer in 1st Silicon Sdn Bhd for six months. Since 2005, she worked as lecturer in University Malaysia Sarawak. In 2005, she obtain a grant on FRGS from UNIMAS for two years on the research of Signal Penetration Into Building Materials and another two grant from UNIMAS in 2010 and 2011 on the Microstrip Antenna Design and Motion Detection by Neural Network research respectively. She is the member of Board of Engineer in Malaysia (BEM) and graduate member of Institute of Engineering Malaysia (IEM). Annie Joseph received the BEng and MSc degrees in Electrical and Electronic Engineering and Mathematics from Colleague University Tun Hussein Onn in 2005, and University Science Malaysia, in 2006 respectively. She is currently working towards the PhD degree in Electrical and Electronic Engineering at the Kobe University, Japan. Her research interest is online learning, neural network, concept drift, feature extraction and machine learning. She is a member of board of Engineer of Malaysia (BEM). She is also a student member of the IEEE. 666