
DYNAMIC HAND GESTURE RECOGNITION USING CBIR


International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 3, May-June (2013), pp. 340-352, © IAEME: www.iaeme.com/ijcet.asp, Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

Mr. Shivamurthy R.C., Research Scholar, Department of Computer Science & Engineering, Akshaya Institute of Technology, Tumkur-572106, Karnataka, India

Dr. B.P. Mallikarjunaswamy, Professor, Department of Computer Science & Engineering, Sri Siddhartha Institute of Technology, Maralur, Tumkur-572105, Karnataka, India

Mr. Pradeep Kumar B.P., Department of Electronics & Communication Engineering, Akshaya Institute of Technology, Tumkur-572106, Karnataka, India

ABSTRACT

Image databases and archives open up many research areas. Significant among them is content-based image retrieval (CBIR), which addresses the management of large image databases and archives. CBIR depends chiefly on how features are extracted from the image. The main focus of the proposed system is on color and shape feature extraction for hand tracking, intended as a step towards palm and face tracking for a perceptual user interface. By reviewing the k-means clustering algorithm, this paper extends a default implementation to allow tracking on an arbitrary number and type of feature spaces. We weight the multidimensional histogram with a simple monotonically decreasing kernel profile prior to histogram back projection, in order to compute the probability that a pixel value belongs to the target model.
We examine the effectiveness of the k-means clustering algorithm as a general-purpose hand and face tracking approach in the case where no assumptions are made about the palm to be tracked.

Keywords: Content-based image retrieval (CBIR), histogram back projection, k-means clustering algorithm

1. INTRODUCTION

There has been significant growth in the IT field for medical imaging, where various techniques and processes are used to create images of the human body for clinical procedures. This rapid growth has kept medical science at a very high level. Large databases of medical images are the result of collaborative approaches to handling medical procedures. An intelligent, fast and accurate medical image retrieval system is the ultimate goal of medical imaging. With data rapidly growing in size and needing to be semantically distinguished, while the features of various medical images are fuzzy in nature for different organs, the retrieval system must be very adaptive. Medical images are distinguished in their characteristics from general-purpose images (GPI); hence, the search process adopted for GPI cannot be adopted in medical image processing systems. Content-based image retrieval (CBIR) retrieves the images most visually similar to a given query image from a database of images. CBIR assists the physician in diagnosis; it does not aim at replacing the physician by predicting the disease of a particular case. Diagnostic information can be derived from the visual characteristics of a disease, and sometimes from similar images corresponding to the same disease category.
The physician can use the output of a CBIR system to gain more confidence in his/her decision or to consider other possibilities as well. Advances in CBIR systems have led researchers to new approaches in information retrieval for image databases, and some degree of success has already been met in constrained problems in medical applications. Notwithstanding the progress already achieved in the few frameworks available, there is still a lot of work to be done to develop a commercial system able to fulfil image retrieval/diagnosis across a broader image domain. The generic framework is shown in Figure 1.

Figure 1: Diagram for a content-based image retrieval system

2. METHODS USED FOR IMPLEMENTING CBIR

2.1 Shape-based method

The image feature extracted for shape-based image retrieval is usually an N-dimensional feature vector, which can be regarded as a point in N-dimensional space. After the images are stored in the database using their extracted feature vectors, retrieval amounts to determining the similarity between the query image and the target images in the database: essentially, the distance between the feature vectors that represent the images. The desirable distance measure should reflect human perception, and various similarity measures have been exploited in image retrieval.
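As a concrete illustration of ranking by feature-vector distance, the following minimal Python sketch (the function names are ours, not from the paper's Matlab implementation) ranks database images by their Euclidean distance to a query vector:

```python
from math import sqrt

def euclidean_distance(q, t):
    # Distance between two N-dimensional feature vectors,
    # regarded as points in N-dimensional space.
    return sqrt(sum((qi - ti) ** 2 for qi, ti in zip(q, t)))

def rank_by_similarity(query, database):
    # Indices of database feature vectors, most similar
    # (smallest distance) first.
    return sorted(range(len(database)),
                  key=lambda i: euclidean_distance(query, database[i]))
```

The index of the nearest stored vector comes first, so the top-ranked images are the retrieval results.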
We have used the Euclidean distance for similarity measurement in our implementation.

2.2 Texture-based method

There is a larger variety of texture measures than of color measures. Wavelets and Gabor filters are among the common measures for capturing the texture of images, with Gabor filters performing better. A texture measure captures the characteristics of images, or image parts, with respect to changes in certain directions and the scale of those changes. This is most useful for regions or images with homogeneous texture.

2.3 Using low-level visual features

The image retrieval process has two main phases, a preprocessing phase and a retrieval phase, described as follows. The preprocessing phase has two main components, a feature extraction model and a classification model. Its input is the original image database (images from the ImageCLEF medical collection, with more than 66,000 medical images); its output is a feature database and an index relating each image to its modality.

3. LOW-LEVEL IMAGE FEATURES

Content-based image retrieval is based principally on low-level image features. Since users are usually more interested in a specific region than in the entire image, most current content-based image retrieval systems are region-based: the image is divided into regions on which the other operations are performed. The first step is therefore a segmentation of the original image. To specify queries we consider several classes of features: color, texture and shape.

3.1 Image segmentation

Automatic image segmentation is a difficult task. Many techniques have been proposed in the past, such as curve evolution [11] and graph partitioning [12], and many of the known techniques work well for images that have homogeneous color regions. An example of segmentation of one picture is shown in Figure 2.
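The abstract of this paper reviews the k-means clustering algorithm; as a sketch of how k-means can group homogeneous color regions, the following Python fragment (our own simplified version with a deterministic initialization, not the paper's implementation) clusters pixel colors into k groups:

```python
def _sqdist(p, c):
    # Squared distance between two colors.
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))

def kmeans_segment(pixels, k, iters=10):
    # Cluster pixel colors (tuples) into k groups.
    # Initialization: the first k distinct colors, a simple
    # deterministic choice for illustration.
    centers = list(dict.fromkeys(pixels))[:k]
    for _ in range(iters):
        # Assign every pixel to its nearest center.
        labels = [min(range(len(centers)),
                      key=lambda j: _sqdist(p, centers[j]))
                  for p in pixels]
        # Recompute each center as the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(ch) / len(members)
                                   for ch in zip(*members))
    return labels, centers
```

Pixels sharing a label form one color region; region boundaries such as the red lines of Figure 2 can then be drawn where the label changes.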
However, pictures from the real world are richer in color and shade. Many segmentation algorithms are known in the literature, such as JSEG [10], Blobworld [9] and KMCC [13].

Figure 2: Example of segmentation of a picture (the red lines divide the regions)

3.2 Color features

Color features are easy to obtain, often directly from the pixel intensities: a color histogram over the whole image, over a fixed sub-image, or over a segmented region. Color is one of the most used features in image retrieval. Colors are described by their color space: RGB, LAB, LUV, HSV. RGB is the best-known color space and is commonly used for visualization; the acronym stands for Red, Green, Blue. This color space can be seen as a cube, with the horizontal x-axis as red values increasing to the left, the y-axis as blue increasing to the lower right, and the vertical z-axis as green increasing towards the top, as in Figure 3.

Figure 3: The RGB color model mapped to a cube. The origin, black, is hidden behind the cube.

RGB is a convenient color model for computer graphics because the human visual system works in a way that is similar, though not quite identical, to an RGB color space. Another well-known color space is HSV; the acronym stands for Hue, Saturation, Value. Referring to Figure 4, we can see the color space as a cylinder, where the angle around the central vertical axis corresponds to hue, the distance from the axis corresponds to saturation, and the distance along the axis corresponds to value (also called brightness).

Figure 4: The HSV color space.

3.3 Shape features

There is no universal definition of what a shape is.
Impressions of shape can be conveyed by color or intensity patterns, or texture, from which a geometrical representation can be derived. In Plato's Meno, a Socratic dialogue is built around the word "figure": "Figure is the only existing thing that is found always following color" (Socrates). Shape features (aspect ratio, circularity, Fourier descriptors, consecutive boundary segments, ...) are very important image features, even if they are not so commonly used in region-based image retrieval systems. Owing to the inaccuracy of the segmentation step, shape features are more difficult to apply than color or texture features. However, some content-based image retrieval systems in the literature do use this feature, such as [15], [13] and [14].

4. INTRODUCTION TO GESTURE

Human hand gestures are a widely used mode of nonverbal interaction, ranging from the simple act of pointing with a finger, or using the hands to move objects around, to more complex expressions of feeling and communication with others. Human-computer interaction research pursues the central goal of removing complex and cumbersome interaction devices and replacing them with more obvious and expressive means of interaction.

4.1 Background registration and foreground segmentation

Skin color detection in real situations is a very important process in gesture recognition, so the images are divided into foreground and background regions according to the dynamic characteristics of the film. The background regions, such as the face, neck and adjacent skin-color areas, do not change during the recognition process; the hand gesture area is called the foreground region. We subtract the background from the images to get the foreground region. The foreground image is then combined with the skin-color image by an "and" operation to get a more accurate image of the hand region.
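The foreground/background separation just described can be sketched in a few lines. This is a minimal Python illustration (list-of-lists images and our own function names, not the paper's Matlab code) of frame differencing followed by the "and" operation with a skin mask:

```python
def difference_foreground(frame, background, threshold):
    # Binarize |current - background| per pixel: 1 where the
    # brightness change exceeds the threshold, else 0.
    return [[1 if abs(f - b) > threshold else 0
             for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def refine_hand_mask(foreground, skin_mask):
    # Logical AND of the foreground and skin-color masks gives a
    # more accurate hand-region mask: static skin areas such as
    # the face are suppressed because they are not foreground.
    return [[f & s for f, s in zip(frow, srow)]
            for frow, srow in zip(foreground, skin_mask)]
```

The threshold value is application-dependent; the next paragraphs give the underlying difference equations.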
Differential techniques are used in video image processing: the current image subtracts the background image, and the resulting image is called the difference foreground image. An appropriate threshold is selected, and the difference foreground image is then binarized to produce the foreground binary image. The basic principle of differential image processing is to carry out a difference calculation, in the spatial domain, between the gray-scale-transformed image of the detection area and the background image. That can be expressed as

d(x, y) = | f(x, y, ti) - f(x, y, tj) |

where f(x, y, ti) and f(x, y, tj) represent the brightness values of the pixel at position (x, y) at moments ti and tj respectively; the range of the values is from 0 to 255. The significance of the difference in pixel brightness at position (x, y) is described by

| (f(x, y, ti) - mi) / si - (f(x, y, tj) - mj) / sj | > t

where mk and sk (k = i, j) are the mean and variance of f(x, y, tk) within a small neighborhood q(x, y) of position (x, y), and t is a threshold.

Segmentation is a process of partitioning an image into multiple segments based on certain attributes. The ultimate goal of segmentation is to convert the image into a simplified form that is more useful than the original image. The results of the various segmentation techniques depend upon the requirements of the segmentation. Background subtraction provides an effective means of segmenting objects moving in front of a static background; researchers have traditionally used combinations of morphological operations to remove the noise inherent in the background-subtracted result.

4.2 Tracking

Tracking starts with the choice of color space used for skin modeling.
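One widely used skin model, elaborated below, is a single Gaussian over the chrominance components. This minimal Python sketch is our own simplification, assuming independent channels and an illustrative threshold; it fits the model from sample pixels and classifies a new pixel:

```python
from math import exp

def fit_skin_model(samples):
    # Mean and variance of (Cb, Cr) skin samples; a simplified
    # single-Gaussian model treating the two channels as independent.
    n = len(samples)
    mean = [sum(s[c] for s in samples) / n for c in (0, 1)]
    var = [sum((s[c] - mean[c]) ** 2 for s in samples) / n for c in (0, 1)]
    return mean, var

def skin_likelihood(pixel, mean, var):
    # Unnormalized Gaussian likelihood of the pixel under the model.
    return exp(-sum((pixel[c] - mean[c]) ** 2 / (2 * var[c])
                    for c in (0, 1)))

def is_skin(pixel, mean, var, threshold=0.1):
    # Match the pixel against the model: skin if the likelihood
    # clears the (illustrative) threshold, background otherwise.
    return skin_likelihood(pixel, mean, var) >= threshold
```

Dropping the luminance channel Y, as discussed below, makes this model less sensitive to illumination changes.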
RGB is a convenient color model for computer graphics because the human visual system works in a way that is similar to an RGB color space, and it is convenient to express color as a combination of three base colored rays (red, green and blue). A normalized RGB skin color model is considered more appropriate for hand skin, and can easily be obtained from the RGB values by a simple normalization. The YCbCr format is considered better than the RGB color space at describing skin properties, and its clustering characteristic is better than that of RGB. YCbCr separates out a luminance signal (Y) and two chrominance components (Cb and Cr). YCbCr can cope with various illumination conditions by discarding the Y signal, which not only improves performance but also reduces the data dimension compared to RGB.

The properties of skin color can be characterized by a Gaussian distribution. The single Gaussian model is one of the simplest models of the distribution of certain objects, and is widely used in computer vision and pattern recognition. The Gaussian distribution is given by

p(x) = (1 / (sqrt(2*pi) * sigma)) * exp(-(x - mu)^2 / (2 * sigma^2))

where mu is the mean value of the samples and sigma^2 the variance. Using the Gaussian model to model skin color is a process of matching each pixel of the image to the model: if it matches, the pixel is considered a skin pixel; otherwise it is considered background. Highlighting the hand posture in the film involves several steps, including skin detection, the CamShift algorithm, calibration and motion detection.

5. BLOCK DIAGRAM

Figure 5: Proposed hand gesture recognition system

Gesture recognition is important for developing an attractive alternative to prevalent human-computer interaction modalities.
The system design shown above depicts the techniques used to build a dynamic user interface, which is initialized by acquiring an image or video of the user's gestures. The hand gestures considered are sequences of distinct hand gestures; a given hand gesture can undergo motion and discrete change, and gestures are distinguished by the nature of their motion. A real-time recognition engine is being developed that can reliably recognize these gestures despite individual variations, and that can detect the start and end of gesture sequences automatically.

6. ALGORITHMS

6.1 Algorithm for calibration

In the calibration part, take a snapshot of the hand region from the webcam, then prompt the user to select the hand region from the snapshot. Convert the input image from RGB color format to LAB and HSV formats for further operation on the selected skin region. Then calculate and store the mean values of A, B, H and S [7].

6.2 Skin detection

Skin detection is a very important step in detecting the hand posture. To perform skin detection we first calibrate, obtaining hand-color pixels as samples, and then compare those pixels with the current image to detect the hand. The following processing steps can be observed in skin detection:

a. Load the color samples, and update the background by taking the median. Calculate the mean 'a*' and 'b*' value for each area extracted with roipoly; these values serve as the color markers in 'a*b*' space.

b. Trigger the image and take the difference between the current image and the background; whatever output is obtained is taken in three dimensions.

c. The next step is thresholding: simply converting the image to binary.
Then morphological operations such as dilation (a kind of expansion) are applied, and the circular region is filled.

d. Convert the output to LAB and HSV formats, where LAB is defined as L for luminosity, A for chromaticity layer a, and B for chromaticity layer b.

e. Calculate the Euclidean distance between the input pixels and the sample values, i.e. the means am, bm, hm, sm stored during calibration:

distance1 = ((a - am).^2 + (b - bm).^2).^0.5;
distance2 = ((H - hm).^2 + (S - sm).^2).^0.5;

If the distance is less than the threshold value 15 for the LAB format (distance1 < 15), image mask 1 is generated.

f. If the distance is less than the threshold value 0.5 for the HSV format (distance2 < 0.5), image mask 2 is generated. The distance between pixels is small if the pixels are similar and large if they are different; in HSV, a distance below 0.1 indicates a match. Get BW and BW2, the masks for skin segmentation, and perform an OR operation on both color-space segmentation outputs to get the final skin mask [8].

7. RESULTS AND DISCUSSION

Figure 6: Acquired image with subplot
Figure 7: A sample output after background registration and foreground segmentation
Figure 8: Selecting the ROI to get skin color samples during calibration
Figure 9: Dynamic single-palm picture with segmentation
Figure 10: Two palms after skin segmentation
Figure 11: Detection of palm and face after skin segmentation

8. INTERPRETATION OF RESULTS

Figure 6 shows the hand image captured through the camera using the Matlab program. Figure 8 shows the hand image with the area selected; the values from the selected area are stored for dynamic hand segmentation, which depends on the color values calculated earlier. Figure 7 shows a sample output after background registration and foreground segmentation. Figures 9 and 10 show the dynamic segmentation obtained by matching the color values; pictures are saved while the hand is in motion. Figure 11 shows that the face is also detected, owing to the skin-color segmentation.

9. CONCLUSION

This paper describes the design and implementation of a dynamic bare-hand gesture recognition system using just one color camera. The developed system can obtain a high recognition rate for bare-hand gestures. Our current research mainly focuses on single-hand static gestures for simplicity; we are still far from building a general-purpose gesture recognition system. Extensive experiments and evaluation in outdoor environments, where uncertainties such as changing backgrounds, sunshine and shadows complicate hand gestures, remain to be done. Dynamic gestures and two-handed gestures should also be explored in the future, since they are more expressive and allow more natural interaction. The system implementation shown above detects the hand, highlighting the hand posture in the input image using calibration and motion detection, and produces a segmented hand posture as its result. As with dynamic recognition techniques in general, video processing needs speed and high intensity to provide a real-time, virtual experience.
REFERENCES

[1] Yoo-Joo Choi, Je-Sung Lee and We-Duke Cho, "A Robust Hand Recognition In Varying Illumination," in Advances in Human Computer Interaction, Shane Pinder (Ed.), 2006.
[2] N. Conci, P. Cerseato and F. G. B. De Natale, "Natural Human-Machine Interface using an Interactive Virtual Blackboard," in Proceedings of ICIP 2007, pp. 181-184, 2007.
[3] S. Mitra and T. Acharya, "Gesture Recognition: A Survey," IEEE Transactions on Systems, Man and Cybernetics (SMC), Part C: Applications and Reviews, vol. 37(3), pp. 211-324, 2007.
[4] Xiujuan Chai, Yikai Fang and Kongqiao Wang, "Robust hand gesture analysis and application in gallery browsing," in Proceedings of ICME, New York, pp. 938-94, 2009.
[5] Ayman Atia and Jiro Tanaka, "Interaction with Tilting Gestures in Ubiquitous Environments," International Journal of UbiComp (IJU), Vol. 1, No. 3, 2010.
[6] José Miguel Salles Dias, Pedro Nande, Pedro Santos, Nuno Barata and André Correia, "Image Manipulation through Gestures," in Proceedings of AICG'04, pp. 1-8, 2004.
[7] Pradeep Kumar B.P., Shailesh M.L., Shankar B.B. and Kumudeesh K.C., "Dynamic Hand Gesture Recognition," International Journal of Graphics and Image Processing (IJGIP), volume 2, issue 1, February 2012.
[8] Pradeep Kumar B.P., "Design and development of human computer interface (HCI) system based on gesture recognition using SVM," International Journal of Graphics and Image Processing (IJGIP), volume 2, issue 2, May 2012.
[9] C. Carson, S. Belongie, H. Greenspan and J. Malik, "Blobworld: image segmentation using expectation-maximization and its application to image querying," IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8):1026-1038, 2002.
[10] Y. Deng and B. S. Manjunath, "Unsupervised segmentation of color-texture regions in images and video," IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(8):800-810, 2001.
[11] Haihua Feng, D. A. Castanon and W. C. Karl, "A curve evolution approach for image segmentation using adaptive flows," in Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), volume 2, 2001.
[12] W. Y. Ma and B. S. Manjunath, "Edge flow: a framework of boundary detection and image segmentation," 1997.
[13] V. Mezaris, I. Kompatsiaris and M. Strintzis, "An ontology approach to object-based image retrieval," 2003.
[14] Christopher Town and David Sinclair, "Content based image retrieval using semantic visual categories," Technical report, 2001.
[15] J. Wang, J. Li, D. Chan and G. Wiederhold, "Semantics-sensitive retrieval for digital picture libraries," D-Lib Magazine, 5(11), November 1999.
[16] Shaikh Shabnam Shafi Ahmed, Dr. Shah Aqueel Ahmed and Sayyad Farook Bashir, "Fast Algorithm for Video Quality Enhancing using Vision-Based Hand Gesture Recognition," International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 3, 2012, pp. 501-509, ISSN Print: 0976-6367, ISSN Online: 0976-6375.

ABOUT THE AUTHORS

Mr. Shivamurthy R.C. received the BE degree from PDA College of Engineering, Gulbarga University, and the M.Tech degree in Computer Science & Engineering from Malnad College of Engineering, Visvesvaraya Technological University, Belgaum. He is currently working as a professor in the Department of Computer Science at A.I.T, Tumkur, Karnataka, and is also a Ph.D scholar at CMJ University, India.

Dr. B.P. Mallikarjunaswamy is working as a professor in the Department of Computer Science & Engineering, Sri Siddhartha Institute of Technology, affiliated to Sri Siddhartha University. He has more than 20 years of experience in teaching and 5 years in R&D. He is guiding many Ph.D scholars and has published more than 30 technical papers in national and international journals and conferences. His current research interests are in pattern recognition and image processing.

Mr. Pradeep Kumar B.P. is presently working as an assistant professor at Akshaya Institute of Technology, Tumkur, and pursuing his Ph.D at Jain University in the Department of Electronics & Communication Engineering. He has published more than 20 technical papers in national and international journals and conferences. His areas of interest are signal processing, medical imaging, pattern recognition, video processing and multimedia communication systems.