
Journal of Advances in Information Technology
ISSN 1798-2340, Volume 1, Number 3, August 2010
Special Issue: Ubiquitous Computing
Guest Editors: Neeraj Kumar Nehra and Pranay Chaudhuri

Contents

Guest Editorial
Neeraj Kumar Nehra and Pranay Chaudhuri — 103

SPECIAL ISSUE PAPERS

A Location Dependent Connectivity Guarantee Key Management Scheme for Heterogeneous Wireless Sensor Networks
Kamal Kumar, A. K. Verma, and R. B. Patel — 105

Peripheral Display for Multi-User Location Awareness
Rahat Iqbal, Anne James, John Black, and Witold Poreda — 116

Discrete Characterization of Domain Using Semantic Clustering
Sanjay Madan and Shalini Batra — 127

GRAAA: Grid Resource Allocation Based on Ant Algorithm
Manpreet Singh — 133

A Channel Allocation Algorithm for Hot-Spot Cells in Wireless Networks
Rana Ejaz Ahmed — 136

JPEG Compression Steganography & Cryptography Using Image-Adaptation Technique
Meenu Kumari, A. Khare, and Pallavi Khare — 141

Review of Machine Learning Approaches to Semantic Web Service Discovery
Shalini Batra and Seema Bawa — 146

Special Issue on Ubiquitous Computing

Guest Editorial

Ubiquitous computing is an important and fast-growing research area, but its development is still in its infancy, even though a few ubiquitous services have been developed and deployed in our daily lives, such as mobile audio/video streaming, mobile e-learning, and remote video surveillance. With the development of numerous interesting ubiquitous applications, this fast-growing field will increasingly interact with the outside world and emerge as a research field in its own right. The main aim of this special issue was to collect original research papers that present recent advances and future directions in the ubiquitous environment from both theoretical and practical points of view. The issue gathers research papers from all aspects of this new emerging field, e.g. design, implementation and future prospects, as well as challenges and constraints. It contains a diverse collection of high-quality papers authored by eminent academicians and researchers in the field.

In the first paper, Kamal et al. propose a location dependent connectivity guarantee key management scheme for heterogeneous wireless sensor networks (LOCK) that does not use deployment knowledge. Pairwise, group-wise and cluster keys are generated efficiently for participating nodes. LOCK provides dynamicity in two ways: by not depending completely upon pre-deployed information, and by not depending completely upon location. The scheme is shown to support the largest possible network with the smallest storage overhead compared to existing key management schemes.

In the second paper, Rahat et al. develop a multi-user location-awareness system following a user-centred design and evaluation approach. The authors discuss the development of a system that allows users to share informative feedback about their current geographical location. The proposed system can be used by various users, for example family members, relatives or a group of friends, in order to share information related to their locations and to interact with each other.

In the next paper, Sanjay et al. propose an approach to recovering knowledge about software systems during software reengineering.
In the proposed approach, the domain is mapped to the code using information retrieval techniques and linguistic information such as identifier names and comments in source code. Moreover, the concept of semantic clustering is introduced, and an algorithm is provided to group source artifacts based on synonymy and polysemy. After the clusters are detected, the program code is automatically labeled based on semantic similarity and explored visually in three dimensions for discrete characterization.

In the next paper, Manpreet et al. propose resource allocation on the grid using an ant colony algorithm. The major objective of resource allocation in a grid is the effective scheduling of tasks and, in turn, the reduction of execution time. An ant colony algorithm is proposed for efficient resource allocation, a heuristic that suits allocation and scheduling in a grid environment well.

In the next paper, the author proposes a channel allocation algorithm for hot-spot cells in wireless networks. The proposed scheme presents a new hybrid channel allocation algorithm in which the base station sends a multi-level hot-spot notification to the central pool located at the Mobile Switching Centre (MSC) on each channel request that cannot be satisfied locally at the base station. This notification requests that more than one channel be assigned to the requesting cell, in proportion to the current hot-spot level of the cell. When a call using such a borrowed channel terminates, the cell may retain the channel depending upon its current hot-spot level.

In the next paper, Meenu Kumari et al. propose JPEG compression steganography and cryptography using an image-adaptation technique. The authors have designed a system that allows an average user to transfer text messages by hiding them in a digital image file using the local characteristics within the image. The approach combines steganography and encryption algorithms, which provides a strong backbone for its security. The proposed system not only hides a large volume of data within an image, but also limits the perceivable distortion that might occur while processing it.

In the next paper, Shalini et al. provide an exhaustive review of machine learning approaches used for Web Services discovery and of frameworks developed based on these approaches. A thorough analysis of existing frameworks for semantic discovery of Web Services is also provided.

Special Thanks

The Guest Editors would like to extend sincere thanks to all the people who contributed their time and efforts to making this special issue a success. We are thankful to all the authors who contributed their papers, and to all the reviewers for providing their valuable suggestions and comments on the submitted manuscripts. We are also thankful to the Editor-in-Chief, Prof. ACM Fong, for his encouragement and strong support during the preparation of this special issue.

Guest Editors

Dr. Neeraj Kumar Nehra
Assistant Professor, School of CSE, SMVD University, Katra (J&K), India
E-mail: [email protected]; [email protected]

Dr. Pranay Chaudhuri
Professor, Department of CSE, JUIT, Waknaghat (H.P.), India
Email: [email protected]
Dr. Neeraj Kumar Nehra is working as Assistant Professor in the School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra (India). He received his Ph.D. in CSE from Shri Mata Vaishno Devi University, Katra (India), and held a post-doctoral fellowship in the UK. He has more than 30 publications in reputed journals and conferences, including IEEE, Springer and ACM. His research is focused on mobile computing, parallel/distributed computing, multiagent systems, service-oriented computing, and routing and security issues in wireless ad hoc, sensor and mesh networks. He leads the Mobile Computing and Distributed System Research Group. Prior to joining SMVDU, Katra, he worked with HEC Jagadhri and MMEC Mullana, Ambala, Haryana, India. He has delivered invited talks and lectures at various IEEE international conferences in India and abroad, has organized various special sessions at international conferences in his area of expertise, and serves on the technical program committees of various IEEE-sponsored conferences. He is a reviewer or editorial board member for various journals, e.g. Journal of Supercomputing (Springer), International Journal of Network Security (IJNS), Journal of Emerging Trends in Web Intelligence, Journal of Advances in Information Technology, IJCA and many more. He is a senior member of ACEEE and IACSIT.

Prof. Pranay Chaudhuri has been the Head of the Department of Computer Science, Mathematics and Physics at the University of the West Indies, which he joined in June 2000 as Professor of Computer Science. Prior to that, Professor Chaudhuri held faculty positions at the Indian Institute of Technology, James Cook University of North Queensland, the University of New South Wales and Kuwait University, and he is currently a professor at Jaypee University of Information Technology, India. His research interests include parallel and distributed computing, grid computing, self-stabilization and graph theory, areas in which he has published extensively in leading international journals and conference proceedings. He is also the author of the book Parallel Algorithms: Design and Analysis (Prentice-Hall, 1992), and is the recipient of several international awards for his research contributions.

A Location Dependent Connectivity Guarantee Key Management Scheme for Heterogeneous Wireless Sensor Networks

Kamal Kumar, M.M. Engineering College, Mullana, Ambala, Haryana, India. Email: [email protected]
A. K. Verma, Thapar Institute of Engineering and Technology, Patiala, Punjab, India. Email: [email protected]
R. B. Patel, M.M. Engineering College, Mullana, Ambala, Haryana, India. Email: [email protected]

Abstract – Wireless sensor networks pose new security and privacy challenges. One of the important challenges is how to bootstrap secure communications among nodes. Several key management schemes have been proposed; however, they either cannot offer strong resilience against node capture attacks, or require too much memory to achieve the desired connectivity. In this paper, we propose a LOcation dependent Connectivity guarantee Key management scheme for heterogeneous wireless sensor networks (LOCK) that does not use deployment knowledge. In our scheme, a target field is divided into hexagonal clusters using a new clustering scheme crafted out of the nodes' heterogeneity.
Even without using deployment knowledge, we drastically reduce the number of keys to be stored at each node. Pairwise, group-wise and cluster keys can be generated efficiently among nodes. LOCK provides dynamicity in two ways: by not depending completely upon pre-deployed information, and by not depending completely upon location. Compared with existing schemes, our scheme achieves higher connectivity with a much lower memory requirement. It also outperforms other schemes in terms of resilience against node capture and node replication attacks. The scheme is shown to support the largest possible network with the smallest storage overhead compared to existing key management schemes.

Index Terms – Deployment, Heterogeneous, Connectivity, Geographical Group

I. INTRODUCTION

Wireless sensor networks (WSNs) are commonly used in ubiquitous and pervasive applications such as military, homeland security, health-care, and industry automation. WSNs consist of numerous small, low-cost, independent sensor nodes, which have limited computing and energy resources. Secure and scalable WSN applications require efficient key distribution and key management mechanisms. These systems have traditionally been composed of a large number of homogeneous nodes with extreme resource constraints. This combination of austere capabilities and physical exposure makes security in sensor networks an extremely difficult problem. Because traditional asymmetric encryption is not practical in this environment, a number of clever symmetric-key management schemes have been introduced. One well-received solution that has been extended by several researchers is to pre-distribute a certain number of randomly selected keys in each of the nodes throughout the network [9], [4], [7], [16]. Using this approach, one can achieve a known probability of connectivity within a network. These previous efforts have assumed a deployment of homogeneous nodes and have therefore suggested a balanced distribution of random keys to each of the nodes to achieve security. Likewise, the analysis of those solutions relies on assumptions specific to a homogeneous environment.

A deviation from the homogeneous system model has been increasingly discussed in the research community. Instead of assuming that sensor networks are comprised entirely of low-ability nodes, a number of authors have started exploring the idea of deploying a heterogeneous mix of platforms and harnessing the available "microservers" for a variety of needs. For example, Mhatre et al. [1] automatically designate nodes with greater inherent capabilities and energy as cluster heads in order to maximize network lifetime. Traynor et al. [32] extend this idea to mobile groups by having a more powerful node perform group handoffs for neighboring sensors.

In this paper, we propose LOCK, which operates without deployment knowledge. In our scheme, a target field is divided into hexagonal clusters using a new clustering scheme crafted out of the nodes' heterogeneity. Even without using deployment knowledge, we drastically reduce the number of keys to be stored at each node. Pairwise, group-wise and cluster keys can be generated efficiently among nodes. LOCK provides dynamicity in two ways: by not depending completely upon pre-deployed information, and by not depending completely upon location.

The rest of the paper is organized as follows. Sections II and III discuss the network elements and the clustering approach. Section IV presents the proposed network model, the network deployment and LOCK itself, and Section V discusses security and performance issues. Section VI concludes the paper.
II. NETWORK ELEMENTS

Basically, two architectures are available for wireless networks: the distributed flat architecture and the hierarchical architecture. The former has better survivability, since it does not have a single point of failure; the latter provides simpler network management and can help further reduce transmissions. WSNs are distributed event-driven systems that differ from traditional wireless networks in several ways, such as extremely large network size, severe energy constraints, redundant low-rate data, and many-to-one flows. In many sensing applications, connectivity between all sensor nodes (SNs) is not required, though some applications do require explicit connectivity between every pair of nodes. Mostly, wireless SNs merely observe and transmit data to nodes with better routing and processing capabilities, and do not share data among themselves. Data-centric mechanisms should be used to aggregate redundant data in order to reduce energy consumption and traffic load in WSNs (this is outside the scope of our proposal). Therefore, the hierarchical heterogeneous network model has more operational advantages than the flat homogeneous model for WSNs, given their inherent limitations on power and processing capabilities [11][12][13][8]. Moreover, a recent trend is towards secure connectivity between geographically neighboring nodes. This requires a group key, a symmetric key shared among a group of neighboring nodes.

In this paper, we focus on large-scale WSNs with the same three-tier hierarchical architecture as in [2][3]. SNs are divided into two categories, H-Sensors and L-Sensors. H-Sensors are a small number of SNs possessing more memory, longer (and multiple) transmission ranges, and greater processing power and battery life. Our network model has four kinds of wireless devices on the basis of functionality: the sink node/base station (BS), cluster head nodes (CH), anchor nodes (AN) and sensor nodes (SN).

- Sensor node (SN): Sensor nodes are L-Sensors, which are inexpensive, limited-capability, generic wireless devices. Each SN has limited battery power, memory size and data processing capability, and a short radio transmission range. An SN communicates with its CH, with other SNs and with the SINK.

- Cluster head node (CH): Cluster head nodes are a kind of H-Sensor and have considerably more resources than SNs. Equipped with high-power batteries, large memory storage, powerful antennas and data processing capacity, a CH can execute relatively complicated numerical operations and has a much longer radio transmission range than an SN. CHs can communicate with each other directly and relay data between their cluster members and the sink node (base station). An SN which needs to communicate with neighbors in a neighboring cluster relays its data through CHs.

- Anchor node (AN): Anchor nodes are a kind of H-Sensor with multiple power levels for transmission. ANs thus have the capability to transmit at multiple ranges, which can be changed as required. ANs are placed at triangular/hexagonal points to realize a new clustering approach.
We introduce a new clustering approach which divides the nodes into clusters of hexagonal shape. This approach makes our scheme location dependent, but without using deployment knowledge.

- Sink node/base station (SINK/BS): The sink node is the most powerful node in a WSN; it has virtually unlimited computational and communication power, unlimited memory storage capacity, and a very large radio transmission range which can reach all the SNs in the WSN. The sink node can be located either in the center or at a corner of the network, depending on the application.

In our network model, a large number of SNs are randomly distributed in an area. A sink node/base station (BS) is located in a well-protected place and takes charge of the whole network's operation. After the deployment, CHs partition the WSN into several distinct clusters by using the clustering algorithm discussed below. Each cluster is composed of a CH and a set of SNs (distinct from other sets). SNs monitor the surrounding environment and transmit the sensed readings to their respective CH for relay. SNs may use multi-hop or single-hop communication with their CHs.

III. NETWORK DEPLOYMENT AND CLUSTERING APPROACH

SNs are large in number and have limited capabilities. They are deployed randomly in the field, for example dropped from an aircraft. ANs are placed uniformly and in a controlled manner using a manned or unmanned deployment vehicle equipped with a GPS system, which connects to satellites to retrieve the exact location for each AN. Using a hexagonal/triangular deployment of ANs, the deployment field is roughly divided into hexagonal/triangular regions by the multiple transmission power levels of the ANs. As shown in Fig. 1, the dark lines are the transmission radii of ANs placed at triangular points; the higher the transmission level, the larger the transmission radius. For convenience we approximate the arc-shaped boundaries by straight lines, so that each region is subdivided into approximately triangular cells. Depending upon the number of ANs whose transmission ranges cover a triangle completely, the SNs in that triangular cell receive the equivalent number of nonces, considering that each transmission level of an AN transmits an entirely different nonce. For example, nodes in the blue cluster receive selected nonces from AN5 and the nonces N65 and N66 from AN6; SNs in other cells of the same cluster receive nonces depending upon their location in the field. Adjoining neighboring triangular cells then form a cluster, and each cluster is administered by a CH. This step is followed by another controlled deployment, using the same GPS-equipped vehicle, of the H-Sensors which will work as CHs.

Figure 1: Hexagonal deployment of ANs and resultant hexagonal clusters. For convenience the circular arcs are approximated as straight lines. Transmission ranges from closely placed anchor nodes at the six corners intersect with each other, resulting in triangular cells. Adjoining cells may be joined to give hexagonal clusters, each managed by a cluster head.

Considering the placement of nodes shown in Fig. 1, AN1, AN2, AN3, AN4, AN5 and AN6 are able to transmit at different power levels and thus at multiple ranges. We assume here that the anchor nodes in Fig. 1 can transmit at six power levels.
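To make the cell-identification idea concrete, the following is a minimal Python sketch, not taken from the paper, of how an SN could determine which nonce types it hears from each AN. All positions and radii are illustrative assumptions; the only property taken from the text is that a type-i message is sent at the ith power level, so a node that hears type i also hears every type j ≥ i.

```python
import math

def received_nonce_types(node_pos, anchor_pos, radii):
    """Return the message types (power levels) of the AN at `anchor_pos`
    heard at `node_pos`. `radii` holds R_1 < R_2 < ... < R_Np, the
    transmission radius of each power level; hearing type i implies
    hearing every type j >= i, since higher levels reach farther."""
    d = math.dist(node_pos, anchor_pos)
    return [i + 1 for i, r in enumerate(radii) if d <= r]

# Illustrative deployment: two anchor nodes, three power levels each.
radii = [50.0, 100.0, 150.0]                 # R_1, R_2, R_3 (assumed)
anchors = {"AN1": (0.0, 0.0), "AN2": (120.0, 0.0)}
node = (40.0, 30.0)

for name, pos in anchors.items():
    print(name, "->", received_nonce_types(node, pos, radii))
```

The combination of (anchor, lowest heard type) pairs across all ANs pins the node to one triangular cell, which is exactly what lets the scheme be location dependent without pre-loading any deployment knowledge into the SN.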
IV. LOCK

A. Underlying Approach

In existing key pre-distribution schemes, two communicating sensors either use one or more of their shared pre-loaded keys directly as their communication key [15][9], or compose a pairwise key from their pre-loaded secret shares. Although this kind of mechanism has low computational overhead, it can lead to a serious security threat in practice. If some SNs are captured after deployment, an adversary may crack some or even all of the communication keys in the network from the compromised keys or secret shares. This node capture attack is the main threat to a key pre-distribution scheme. To address the limitations of existing key pre-distribution schemes, we propose to incorporate location dependence into pre-distribution. Our proposal allows each pair of neighboring SNs to have a unique pairwise key between them, which cannot be derived from the pre-loaded setup keys by other nodes. An adversary cannot crack the pairwise keys among non-captured SNs, even if some SNs are captured and their stored key information is compromised. Therefore, compromise of any SN cannot affect the communication between non-compromised SNs.

B. Procedures in LOCK

Our proposed LOCK scheme has two phases: (a) a setup key assignment phase, and (b) a location dependent key generation phase. The key generation phase includes the generation of group, cluster and pairwise keys between nodes. An off-line authority center called the SINK is in charge of the initialization of the SNs in LOCK. Before deployment, each sensor node is assigned a unique ID generated by the SINK. Each sensor node is also assigned the IDs of two CHs, which carry part of the information required to generate a pairwise key with its post-deployment CH. The SINK also generates a large key pool P composed of more than 2^20 distinct symmetric keys. For each sensor node SN_i, the SINK randomly selects a secret key from P and stores it in SN_i's memory; this pre-loaded key is denoted k_{SN_i-SINK}. It is the shared pairwise key between node SN_i and the sink node, and is used to encrypt the data exchanged between SN_i and the SINK.

Setup Key Assignment Phase: Before SNs are deployed, setup keys need to be pre-loaded into them in a way that ensures any two nodes can find some common keys after deployment. For each SN_i, the SINK randomly selects some keys from P and pre-loads them into the intended SN's memory. In our scheme, this pre-loaded information is called the network setup keys. Besides these, a common key K is pre-loaded as a common setup key into the memory of each SN. To ensure that any two SNs share some keys after deployment, depending upon location and besides the common key K, we use a simple but efficient setup key assignment method for heterogeneous wireless sensor networks (HWSNs).

Suppose there are n SNs in the network. First, the SINK randomly selects n distinct keys from the key pool P and constructs a two-dimensional (m×m) matrix M, where m = √n.
Fig. 2 illustrates an example of the constructed key matrix M, in which each entry is a symmetric key with a unique two-dimensional id denoted k_{i,j} (i, j = 1, 2, ..., m). For convenience, we use R_i and C_j (i, j = 1, 2, ..., m) to represent the ith row and the jth column of M, respectively. An equivalent representation of the matrix M is given in Fig. 4, where the nodes in black represent the diagonal entries of the matrix; each is also the root of a Dual Skewed Hash Binary Tree (DHBT), a modification of the Hash Binary Tree of Fig. 3. A root can be used to derive the node's left-skewed branch and right-skewed branch: for example, k_{3,3} can be used to derive row 3 completely, and similarly for all the diagonal elements of M. Besides these, each SN is informed of N = 2t indices, with t (1 ≤ t ≤ m) values representing row numbers and the remaining t values representing column numbers, assigned to SNs by their post-deployment CH (the deployment of CHs is discussed in the previous section).

Before we can generate the complete rows of the matrix, we need to customize the key matrix with respect to the SN's location. To customize the scheme and make it location dependent, the diagonal elements of an SN are brought into conformance with its administering cluster and corresponding geographical location: the CH computes the common content of the broadcast received by all of its constituent cells and sends this common broadcast vector to each node in its cluster, either as a plain broadcast message or encrypted with the cluster key. Equation (1) is used to customize the diagonal elements of M:

K^j_{i,i} = H_{COMM_j}(K_{i,i})    (1)

where K^j_{i,i} is the customized diagonal element of the ith row and ith column with respect to the location of the jth cluster, and COMM_j is the common content received by every SN in the jth cluster, defined as COMM_j = K_1 ⊕ K_2 ⊕ K_3 ⊕ ..., where K_1, K_2, etc. are the nonces/keys shared by all the nodes in the jth cluster (CH_j).

Now each node is provided with localized keys which represent the diagonal elements of the key matrix. Next we use the Dual Skewed Hash Binary Tree (DHBT), in which the left or right branch is generated by hashing the left-shifted or right-shifted value respectively; the diagonal elements are taken as the roots of these DHBTs. Applying the procedure repeatedly generates the complete key matrix M, where the hash function is considered to be hardwired into the SN.

Now consider the network setup keys pre-loaded into an SN when t = 2. In our scheme, each SN stores only m keys in its memory instead of m². The alternative is to store t rows and t columns, i.e. 2 × t × m values [10]. This is where our scheme performs better in terms of memory requirements, as it needs only m keys in memory; for higher values of t this saving is even greater, since in [10] the memory requirement shoots up rapidly with t. Our scheme therefore offers a memory-efficient approach for establishing pairwise keys in HWSNs. Any two SNs share at least 2t² common keys in their memories; therefore our setup key assignment, which is deployment-knowledge independent but location dependent compared to the procedure in [10], still guarantees connectivity between any two nodes in the network. Compared to the scheme proposed in [6], our scheme ensures 100 percent connectivity among the nodes of the WSN. So, compared with existing key pre-distribution schemes, our approach is the first to support full network connectivity without any prior deployment information, no matter how the SNs are deployed, and offers higher memory and computational efficiency; these are the main contributions of our proposed scheme.
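As an illustration of the setup-key machinery, here is a minimal Python sketch of customizing a diagonal key per Equation (1) and regrowing a row from it via the skewed hash chains. The concrete choices — SHA-256 as the hardwired hash, one-bit rotations as the "shift", 128-bit keys, and realizing the keyed hash H_{COMM_j}(·) as H(COMM_j || ·) — are assumptions, since the paper does not fix these parameters.

```python
import hashlib

KEY_BITS = 128  # assumed key size; the paper does not specify one

def h(data: bytes) -> bytes:
    """Hardwired hash: SHA-256 truncated to the key size (assumption)."""
    return hashlib.sha256(data).digest()[: KEY_BITS // 8]

def rot(key: bytes, left: bool) -> bytes:
    """One-bit circular shift of the key, left or right (assumption)."""
    n = int.from_bytes(key, "big")
    if left:
        n = ((n << 1) | (n >> (KEY_BITS - 1))) & ((1 << KEY_BITS) - 1)
    else:
        n = (n >> 1) | ((n & 1) << (KEY_BITS - 1))
    return n.to_bytes(KEY_BITS // 8, "big")

def customize_diagonal(k_ii: bytes, comm_j: bytes) -> bytes:
    """Equation (1): K^j_{i,i} = H_{COMM_j}(K_{i,i})."""
    return h(comm_j + k_ii)

def derive_row(k_ii: bytes, i: int, m: int) -> list:
    """Grow row i (0-based) of M from its diagonal root: the right-skewed
    branch fills the columns after i, the left-skewed branch those before."""
    row = [None] * m
    row[i] = cur = k_ii
    for j in range(i + 1, m):            # right-skewed branch
        cur = h(rot(cur, left=False))
        row[j] = cur
    cur = k_ii
    for j in range(i - 1, -1, -1):       # left-skewed branch
        cur = h(rot(cur, left=True))
        row[j] = cur
    return row

# Example: COMM_j as the XOR of two broadcast nonces, then row 3 of a 6x6 M.
comm_j = bytes(a ^ b for a, b in zip(h(b"nonce-1"), h(b"nonce-2")))
k33 = customize_diagonal(h(b"pre-distributed k_{3,3}"), comm_j)
row3 = derive_row(k33, i=2, m=6)
```

The point of the construction is visible in the sketch: storing the m diagonal roots suffices to regenerate any row on demand, which is why the per-node storage is m keys rather than 2tm or m².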
Key Generation Phase: This phase includes the procedures for generating inter-cluster keys, administrative keys, cluster keys and pairwise symmetric keys.

Inter-Cluster Key Establishment (K_{CH_a-CH_b}): Each node is assigned a node ID by the SINK. Provided CH_a and CH_b are the participating cluster heads, the CHs can generate the pairwise key between them using (2), where sh_1 and sh_2 are shares of the symmetric keys exchanged between the participating CHs:

K_{CH_a-CH_b} = H_K(sh_1 ⊕ sh_2)    (2)

Administrative Key (K_{CH_i-SN_i}) Generation: The administrative key K_{SN_i-CH_i} is a vector announced by the cluster head to each node in its broadcast, and can then be used directly. The CH has to construct this pairwise symmetric key from the information stored in the SN. Each SN is provided with the IDs of two CHs. These IDs are sent to the CH of the parent cluster, which receives the shares k_1 and k_2 from the two CHs whose IDs were sent by the SN. Equation (3) is then used to set up K_{CH_i-SN_i}, where K is the preloaded common setup key:

K_{CH_i-SN_i} = H_K(k_1 ⊕ k_2)    (3)

Figure 2: Setup key matrix and key assignment.

Figure 3: Hash Binary Tree. S(1,0) is obtained as Hash(leftShift(S(0,0))); similarly, S(1,1) is obtained as Hash(rightShift(S(0,0))). The complete HBT can be obtained in this manner up to the required height.

Figure 4: Dual Skewed Hash Binary Tree representation of the key matrix K.
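Equations (2) and (3) share the same shape: a keyed hash applied to the XOR of two secret shares. A minimal self-contained sketch follows, assuming HMAC-SHA256 as the keyed hash H_K (an assumption; the paper only writes H with a key subscript):

```python
import hmac
import hashlib

def keyed_hash(key: bytes, data: bytes) -> bytes:
    """H_K(.) realized as HMAC-SHA256 (assumed construction)."""
    return hmac.new(key, data, hashlib.sha256).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def inter_cluster_key(K: bytes, sh1: bytes, sh2: bytes) -> bytes:
    """Equation (2): key between CH_a and CH_b from exchanged shares."""
    return keyed_hash(K, xor(sh1, sh2))

def administrative_key(K: bytes, k1: bytes, k2: bytes) -> bytes:
    """Equation (3): key between CH_i and SN_i from the shares k1, k2
    collected from the two CHs named in the SN's pre-loaded ID list."""
    return keyed_hash(K, xor(k1, k2))
```

Because both inputs are shares held by different parties, neither share alone determines the key, which is what makes the construction robust to the compromise of a single contributor.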
Pairwise Key Generation Phase: To secure the communication between two neighboring nodes, an SN needs to generate a pairwise key with each of its one-hop neighbors after deployment. In our proposed scheme, the pairwise key generation phase has three steps. First, node SN_i randomly selects l (1 < l < t) rows and l columns from its stored setup keys, and generates a random nonce rn_i. Then, node SN_i broadcasts a handshaking message including its node ID_i, the random nonce rn_i, and the indices of its selected rows and columns to its one-hop neighbors. After two neighboring nodes have exchanged the handshaking messages, they can generate a pairwise key using their shared setup keys and the random nonces.

To explain the procedure clearly, we use an example to illustrate how two communicating nodes generate a pairwise key between them. Suppose nodes SN_a and SN_b are two neighboring SNs after deployment. As shown in Fig. 2, SN_a has been pre-loaded with the indices of the 3rd and 6th columns and the 1st and 4th rows of key matrix K, while SN_b has the indices of the 1st and 4th columns and the 3rd and 6th rows. To establish a pairwise key, SN_a first generates a random nonce rn_a. Then, SN_a broadcasts the handshaking message {SN_a, R_1, R_4, C_3, C_6, rn_a} to node SN_b. Similarly, SN_b generates a random nonce rn_b and broadcasts {SN_b, R_3, R_6, C_1, C_4, rn_b} to node SN_a. After exchanging their handshaking messages, node SN_a obtains rn_b as well as the setup keys it shares with SN_b, namely {k_{1,1}, k_{1,4}, k_{3,3}, k_{6,3}, k_{4,1}, k_{4,4}, k_{3,6}, k_{6,6}}, which are the intersections of the corresponding key rows and columns. Node SN_b also gets rn_a and the setup keys shared with SN_a at the same time. Now, nodes SN_a and SN_b can calculate a pairwise key between them by Equation (4):

pk_{SN_a-SN_b} = rn_a ⊕ k_{1,1} ⊕ k_{1,4} ⊕ k_{3,3} ⊕ k_{6,3} ⊕ k_{4,1} ⊕ k_{4,4} ⊕ k_{3,6} ⊕ k_{6,6} ⊕ rn_b    (4)

In (4), ⊕ is the exclusive-or operator, pk_{SN_a-SN_b} denotes the pairwise key between nodes SN_a and SN_b, and rn_a and rn_b are the two random nonces generated by SN_a and SN_b respectively. In LOCK, each SN stores m diagonal keys from the constructed matrix M, and t row and t column indices. Since each pair of a row and a column has an intersection entry, any two SNs can find 2t² common keys after exchanging handshaking messages. This means any two SNs that are members of the same cluster and within radio transmission range of each other can directly set up a secure link without a third node's participation. In other words, the path-key establishment phase of existing key pre-distribution schemes is eliminated in our approach, which not only reduces the communication overhead but also increases the security level of the generated pairwise keys. Furthermore, since each generated pairwise key is distinct from the others, LOCK improves the network resilience against node capture attacks. Customizing the diagonal elements to a cluster strengthens this resilience further, as the same key is never used outside the cluster head's transmission range.

Geographical Group Key Generation (K_GoG): Sensor nodes in the same geographical group, i.e. the same triangular cell, can construct a group key K_GoG using the broadcasts received from the ANs and the membership information obtained from the CHs as follows:

K_GoG = H_{K_{CH_i}}(k_1^1, k_1^2, ..., k_i^j, ..., list_of_IDs)    (5)

where k_i^j is the key broadcast from AN_i transmitting at the jth power or transmission level, and list_of_IDs is a unique value obtained by XORing the IDs of the nodes residing in a cell, as defined in (6):

list_of_IDs = ID_{i,1} ⊕ ID_{i,2} ⊕ ... ⊕ ID_{i,m}    (6)

where ID_{i,j} is the ID of the jth node in the ith cell, assigned at the pre-deployment stage by the SINK. list_of_IDs is securely sent to the SNs using the pairwise symmetric key K_{CH_i-SN_i}.

Cluster Key Generation (K_{CH_i}): Equation (7) can be used to generate K_{CH_i}:

K_{CH_i} = H_K(COMM_i)    (7)

where COMM_i = K_1 ⊕ K_2 ⊕ K_3 ⊕ ..., K_1, K_2, etc. are the keys shared by all the nodes in the ith cluster (CH_i), and K is pre-deployed in the SNs as described earlier. The common key K is deleted once bootstrapping is over, its role taken by K_{CH_i} for subsequent use.
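The following is a worked check of Equation (4) for the SN_a/SN_b example above. It is a sketch: 16-byte random toy keys and nonces stand in for real key material, and the matrix entries are assumed to have already been derived via the DHBT.

```python
import os

def xor_all(parts):
    """XOR an arbitrary list of equal-length byte strings."""
    out = bytes(len(parts[0]))
    for p in parts:
        out = bytes(x ^ y for x, y in zip(out, p))
    return out

m = 6
k = [[os.urandom(16) for _ in range(m)] for _ in range(m)]  # toy matrix M

rows_a, cols_a = [1, 4], [3, 6]      # SN_a's indices (1-based, as in text)
rows_b, cols_b = [3, 6], [1, 4]      # SN_b's indices
rn_a, rn_b = os.urandom(16), os.urandom(16)

# Shared keys = intersections of one node's rows with the other's columns:
# {k11, k14, k41, k44} and {k33, k36, k63, k66}, as listed in the example.
shared = [k[r - 1][c - 1] for r in rows_a for c in cols_b] + \
         [k[r - 1][c - 1] for r in rows_b for c in cols_a]
assert len(shared) == 2 * len(rows_a) ** 2   # 2t^2 = 8 keys for t = 2

pk_ab = xor_all([rn_a, *shared, rn_b])       # Equation (4)
# SN_b computes the identical value from its own view of the handshake,
# so no third node ever needs to participate in the key setup.
```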
V. SECURITY ANALYSIS AND PERFORMANCE EVALUATION

In this section we analyze the security properties and evaluate the performance of our proposed LOCK scheme.

A. Security Analysis

Node Replication Attack: Because of the unattended mode of operation, some SNs could be physically captured by an adversary during the operating period. The node replication attack is thus a severe threat for WSNs, owing to their infrastructure-less architecture. In [9], the pairwise keys are used directly from the pre-loaded keys. After the network bootstrapping phase, if an SN is captured and all its stored keys are compromised, the adversary can duplicate malicious nodes and deploy them into the network to execute attacks such as eavesdropping and Denial-of-Service (DoS). In LOCK, the keys do not stay the same throughout the operational life of the SN. The cluster key is updated as and when needed using the most recent broadcast from the ANs. The geographical group key is updated using the new list of remaining nodes from the cluster head and a fresh broadcast from the ANs. The diagonal entries of the matrix are customized with respect to the corresponding cluster using the common part of the broadcast received by the nodes in the cluster head's coverage range. Moreover, every pair of SNs has a unique pairwise key after the network initialization phase, which can be used to mutually authenticate the communicating parties. Without proper authentication, any stranger's packets are simply ignored. Consequently, the node replication attack is totally prevented by our proposed scheme.

Resilience against Node Capture Attack: An adversary can physically capture some SNs to compromise their secret information. The node capture attack is the most serious threat in WSNs: communication between non-captured nodes can be cracked even though those nodes are never physically captured. In [15], if each SN stores 200 keys in its memory and the probability that any two nodes share at least one common key is 0.33, then the capture of 50 nodes can compromise 10% of the communication among the non-captured nodes. Although [9] claims that network resilience can be improved by requiring two nodes to share at least q (q > 1) common keys to establish a secure link, this only works while the number of captured nodes is below a critical value. When the number of captured nodes exceeds that value, the fraction of compromised communication among non-captured nodes increases at an even faster rate than in [15]. In LOCK, after the pairwise key generation phase, each pair of neighboring nodes has a unique pairwise key, so the capture of any node cannot affect the secure communication between non-captured nodes. In other words, our approach guarantees the communication security among non-captured nodes no matter how many SNs are captured by the adversary, which is one of the main contributions of our work. Fig. 5 shows that above 30% of the communication between non-captured nodes is compromised in [9] when 200 nodes are captured; if the number of captured nodes increases to 500, more than 60% of the communication of the rest of the network is compromised. By contrast, no communication between non-captured nodes can be compromised in LOCK, no matter how many SNs are captured by the adversary.
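For comparison, the compromised fraction under basic random pre-distribution can be estimated with the standard analysis from the random-predistribution literature (this is not LOCK's analysis): each link key lies in a captured node's ring with probability m/|P|, so after x captures a link survives with probability (1 − m/|P|)^x. A quick sketch, where the pool size of 100,000 is an assumed value (with ring size 200 it roughly reproduces the percentages quoted above):

```python
# Estimated fraction of links between non-captured nodes compromised
# after x node captures under basic random key pre-distribution.
# LOCK's corresponding curve is identically zero, since every pairwise
# key is unique and never stored outside its two endpoints.
def compromised_fraction(ring_size: int, pool_size: int, captured: int) -> float:
    return 1.0 - (1.0 - ring_size / pool_size) ** captured

for x in (50, 200, 500):
    f = compromised_fraction(ring_size=200, pool_size=100_000, captured=x)
    print(f"{x} captured nodes -> {f:.1%} of links compromised")
# -> roughly 9.5%, 33%, and 63%, matching the ~10%, >30%, >60% cited.
```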
Fig. 6 shows the cluster sizes supported by LOCK for assorted key-ring sizes. As a result of location dependence, the network size supported is much larger than with any existing key management scheme. If we assume the number of clusters is 7, so that the size of the network is almost 7 times the size of a cluster, the supported network size is as drawn in Fig. 6. In LOCK, the maximum supported cluster size increases exponentially as the key ring size increases linearly, which means our proposed scheme has better scalability than any of the existing key pre-distribution schemes to date. Consider the network size that can be supported in our deployment: the random key pre-distribution scheme can support a network of 200 nodes using 50 keys per node, and q-composite key distribution [4] is not much better. In LOCK, the same matrix is localized, so the equation that earlier bounded the size of the network now bounds only the size of a cluster. Moreover, because we store only the diagonal entries, the memory requirements are lower still.

Network Connectivity: Random key pre-distribution schemes cannot guarantee that any two SNs establish a pairwise key directly. To increase network connectivity, intermediate nodes need to be involved in a path-key establishment procedure. Even so, by probability theory, some SNs or some portions of a network may still be isolated if no path keys can be established. LOCK guarantees complete network connectivity, since any two SNs can find common setup keys between them; this is the second contribution of our work. Fig. 7 shows that LOCK can generate a connected network with only a one-hop exchange of neighbors' information. For random key pre-distribution schemes, neighbors two or three more hops away need to be involved to set up an almost connected network, which not only reduces the security of the established pairwise keys but also produces more communication overhead in the network.

Communication Overhead: In random key pre-distribution schemes, each SN exchanges all of its stored key information with its neighbors. For a large-scale network, substantial communication and memory storage overheads are produced in this procedure. LOCK guarantees that any two nodes establish a pairwise key directly; therefore its communication overhead is much lower than in previous schemes.

C. Performance Reviewed in the Light of Multiple Transmission Levels of Anchor Nodes

The scheme proposed in the previous sections supports all three types of keys, namely cluster keys, pairwise keys and group keys. The key refresh mechanism for cluster and pairwise keys is driven by periodic or event-based broadcasts from the ANs; the broadcast information is used by the nodes in a cluster to generate the cluster and pairwise keys.

Figure 5: Fraction of compromised communication vs. number of compromised nodes, for random pre-distribution key management and LOCK.

Figure 6: Cluster size supported vs. number of keys per node.

Figure 7: Connectivity ratio with same-cluster neighbors vs. number of hops needed for pairwise keys, for LOCK and random key pre-distribution (k = 50).
To measure the effect of the number of power levels and the broadcast radius, we reinvestigate the performance under various configurations and analyze the effect on memory and connectivity. We start by investigating the expected number of keys stored on each sensor node when using LOCK. This gives a measure of the memory capacity that every sensor must devote to LOCK during refreshing.

Location Dependence Measurements: The number of keys stored on a sensor node is only momentarily contributed by the messages the node receives from the various ANs, and almost completely by the generation keys stored on each sensor node. Starting with the memory required by the former factor: each message contains a nonce, which is used to customize the pre-distributed keys with respect to the node's location in the deployment field, to derive a key for the geographical group, and to derive a cluster key. After these uses, the nonces obtained by the node are deleted. Hence these keys are stored only momentarily, so receiving broadcasts from ANs causes no major consumption of the memory of individual nodes. If we assume the memory consumption is contributed equally by the former factor, we need to determine the expected number of messages received by a sensor node. To do this, we divide the messages transmitted by each AN into N_P categories, where N_P is the number of power levels on each AN. The messages transmitted at the ith power level are called type-i messages; type-1 messages correspond to the lowest power level and type-N_P messages to the highest. Therefore, when a sensor node receives type-i messages, it also receives all messages of type j ≥ i. The expected number of nonces received is

E[N] = Σ_{j=1}^{N_P} Σ_{i=1}^{N_a} i (N_P − j + 1) [ρ_L π (R_j² − R_{j−1}²)]^i e^{−ρ_L π (R_j² − R_{j−1}²)} / i!    (8)

Equation (8), as derived in [5], gives the number of nonces a node receives corresponding to each power level. Here R_j is the outer radius and R_{j−1} the inner radius of an annulus centered on the SN under consideration, ρ_L is the density of the AN deployment, N_P is the number of power levels, and N_a is the total number of anchor nodes in the network; the area of the annulus is A_a = π(R_j² − R_{j−1}²). All nonces broadcast by ANs at higher transmission levels are also received.

We now consider how the average number of keys on each node depends on these parameters, assuming a maximum transmission radius R_{N_P} for an AN. We want to determine how many sub-keys are needed to ensure a high degree of location dependence. Given Equation (8), we analyze the effect of the number of power levels, and thus of the degree of location dependence, and can thereby reduce the memory requirement at the pre-distribution stage. For lower values of R_{N_P} the degree of location dependence is very high; it falls to lower levels for higher values of R_{N_P}. This is because with increased R_{N_P} the same AN covers more nodes, so the probability that neighbors share the same diagonal contents increases, which lowers location dependence. We can still achieve the desired connectivity, but the compromise ratio gets a boost as a result. To achieve a connectivity ratio of 1 and a low compromise ratio, we must increase the size of the matrix to achieve the desired uniqueness in row and column assignment.

The memory requirement depends on the number of power levels. The reason is that even with a single power level, a node in the coverage of an AN receives one nonce/sub-key; the pre-distributed and subsequently customized contents are then the same throughout, requiring a large matrix and hence a higher memory requirement at each SN to achieve the desired connectivity at a low compromise ratio.
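Equation (8) can be evaluated numerically. The sketch below treats the number of ANs in each annulus as Poisson-distributed with mean ρ_L·π(R_j² − R_{j−1}²), in line with the derivation cited from [5]; all parameter values are invented for illustration and this is not the paper's simulation.

```python
import math

def expected_nonces(rho_L: float, radii: list, N_a: int) -> float:
    """Equation (8): expected number of nonces heard by an SN, where
    radii = [R_1, ..., R_Np]. An AN in annulus j contributes
    (N_P - j + 1) message types (types j..N_P), and the count of ANs
    in annulus j is Poisson with mean rho_L * area of the annulus."""
    N_P = len(radii)
    total, R_prev = 0.0, 0.0
    for j, R_j in enumerate(radii, start=1):
        lam = rho_L * math.pi * (R_j ** 2 - R_prev ** 2)
        for i in range(1, N_a + 1):
            poisson = lam ** i * math.exp(-lam) / math.factorial(i)
            total += i * (N_P - j + 1) * poisson
        R_prev = R_j
    return total

# Illustrative values: AN density 1e-4 per square meter, three power
# levels with radii 50/100/150 m, and 100 anchor nodes in total.
print(expected_nonces(rho_L=1e-4, radii=[50.0, 100.0, 150.0], N_a=100))
```

Varying `radii` and `rho_L` in such a sketch is one way to see the trade-off the text describes: tighter maximum radii raise location dependence at the cost of more ANs needed for coverage.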
With only one power level, the impact of compromised nodes is very severe. To limit the effect on the compromise ratio we must increase the size of the matrix, and thus the memory requirement at each SN; achieving uniqueness in the pre-distributed information then requires an extremely large key matrix. The memory requirement is therefore heavily dependent upon the number of power levels. This is because, when a single power level is used, any node in the transmission range of an AN knows all the secrets transmitted by that AN. When the number of power levels increases for the same value of R_{N_P}, the number of an AN's secrets known by a sensor node depends upon the distance of the sensor from the ANs of interest. Consider the case where all the intermediate power levels are eliminated: a sensor in any region then receives all the secrets from all the transmitting anchor nodes, so the compromise of any node in the region jeopardizes the communication of every other sensor node in the network unless we increase the matrix size. On the other hand, with three power levels per AN, the nodes in any region do not receive all the secrets from an AN; in that case the compromise of a node jeopardizes fewer secure links between non-compromised nodes. This effect arises because increasing the number of power levels increases the degree of location dependence, reducing the number of SNs in each cell, and thus in each cluster. This reduces the size of the matrix by a factor comparable to the number of clusters obtained by sectoring the network deployment field. This factor continues to lower the memory requirement until each node sits in a separate cell; we can therefore increase the number of levels up to the point where there are still at least two nodes in each cell. Even at a high degree of location dependence, the connectivity ratio is 1 for every node in the same cluster. Beyond the threshold where each SN is in a separate cell, this factor no longer affects the compromise ratio but reduces the connectivity ratio to 0. Thus the compromise ratio and the connectivity ratio are sensitive only to very low and very high values of N_P.

We now study the effects of density, the number of power levels and the transmission levels of the ANs on the compromise ratio. For a sensor node we consider only the density and the maximum transmission range. As the density of sensors is increased, connectivity remains the same and the compromise ratio increases, because with higher sensor density more nodes share the same customized diagonal entries. As more nodes are close by and able to connect with their neighbors, a node can set up secure links with more of its neighbors. Compromise of a node still has no wider effect as long as we maintain uniqueness in row and column assignment; however, with increasing density the size of the matrix may have to be increased to preserve that uniqueness, which increases the memory requirement at each node. Further, as the transmission radius of the sensor nodes is increased, the nodes have more neighbors, and a node can communicate with another node only if they share a commonly customized matrix. Moreover, if a node belongs to a neighboring cluster, the node will not be able to communicate with it.
We have not considered such a scenario, but it would of course reduce the number of linked nodes compared to the potential neighboring nodes. The connectivity ratio, on the other hand, could be reduced: increasing the radius increases the number of neighbors, some of which might not share the same secrets, as new neighbors might not be covered by the same ANs as the node concerned. This reduces the node's capacity to connect to all of its neighbors, and hence the connectivity ratio. The compromise ratio, however, should not be affected. Changing the transmission range does not affect the number of non-compromised nodes impacted by the compromise of any node, because a non-compromised node is impacted only when it shares keys with the compromised nodes, and the sharing of keys is not governed by the transmission range of a sensor node. Increasing the transmission range might allow more non-compromised nodes to set up secure links, and the fraction of these new links that are impacted cannot be predicted.

Increasing the number of power levels N_P on an AN while keeping the density of ANs and the maximum transmission range R_{N_P} the same affects neither the connectivity ratio nor the compromise ratio. Increasing the density of ANs without changing either N_P or R_{N_P} has a positive impact on both the connectivity ratio and the compromise ratio: with more ANs, more sensor nodes can receive the beacons/nonces that allow them to derive their own customized diagonals, and the compromise ratio is reduced since location dependence increases with the density of ANs. Increasing the maximum transmission radius of the ANs has a negative impact on location dependence, because with a larger R_{N_P} more sensor nodes receive beacons from the same AN, making it easier for neighboring nodes to share a common diagonal; this also increases the compromise ratio. We conclude that the AN density must be increased, while ensuring that neither N_P nor R_{N_P} is large, in order to reduce the impact of compromised nodes; this, however, may increase the deployment cost. If some compromise of nodes can be tolerated, the system can deploy a low density of ANs with a large transmission range and fewer power levels.

VI. CONCLUSION AND FUTURE WORK

With the proposal above we have highlighted the effect of heterogeneity on the performance of a key management scheme for wireless sensor networks. We considered a particular kind of heterogeneity, namely the number of power levels, and drew out its effect on performance in terms of memory requirements and the size of the network supported by LOCK. The average number of nonces/keys received depends not only on the maximum transmission radius but also on the number of power levels. Some issues, such as simulation results, still need to be addressed; we hope to present stronger results on these in future work. Future work also includes making the scheme scalable with respect to new node addition and routing-aware, thereby achieving secure communication. We have not considered the inter-cluster communication model among the sensor nodes in much depth; this remains an open challenge.

VII. REFERENCES

[1] V. P. Mhatre, C. Rosenberg, D. Kofman, R. Mazumdar, and N. Shroff, "A Minimum Cost Heterogeneous Sensor Network with a Lifetime Constraint," IEEE Transactions on Mobile Computing, vol. 4, no. 1, pp. 4-15, Jan. 2005.
[2] M. Younis, M. Youssef, and K. Arisha, "Energy-Aware Routing in Cluster-Based Sensor Networks," in Proceedings of the 10th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2002), 2002.

[3] K. Arisha, M. Youssef, and M. Younis, "Energy-Aware TDMA-Based MAC for Sensor Networks," in Proceedings of the IEEE Workshop on Integrated Management of Power Aware Communications, Computing and Networking (IMPACCT 2002), May 2002.

[4] H. Chan, A. Perrig, and D. Song, "Random Key Predistribution Schemes for Sensor Networks," in Proceedings of the 2003 IEEE Symposium on Security and Privacy, p. 197, May 11-14, 2003.

[5] F. Anjum, "Location dependent key management using random key-predistribution in sensor networks," in Proceedings of the 5th ACM Workshop on Wireless Security (WiSe '06), Los Angeles, California, September 2006, pp. 21-30.

[6] F. Kausar, S. Hussain, L. T. Yang, and M. Ashraf, "Scalable and efficient key management for heterogeneous sensor networks," Journal of Supercomputing, vol. 45, no. 1, pp. 44-65, 2008.

[7] W. Du, J. Deng, Y. S. Han, and P. K. Varshney, "A pairwise key pre-distribution scheme for wireless sensor networks," in Proceedings of the 10th ACM Conference on Computer and Communications Security, Washington D.C., USA, October 27-30, 2003, pp. 42-51.

[8] B. Liu, Z. Liu, and D. Towsley, "On the capacity of hybrid wireless networks," in Proceedings of IEEE INFOCOM, San Francisco, CA, April 2003, vol. 2, pp. 1543-1552.

[9] L. Eschenauer and V. Gligor, "A Key Management Scheme for Distributed Sensor Networks," in Proceedings of the 9th ACM Conference on Computer and Communications Security, November 2002, pp. 41-47.

[10] Y. Cheng and D. P. Agrawal, "An improved key distribution mechanism for large-scale hierarchical wireless sensor networks," Security Issues in Sensor and Ad Hoc Networks, Elsevier, vol. 5, no. 1, pp. 35-48.

[11] P. Gupta and P. Kumar, "Internets in the sky: The capacity of three dimensional wireless networks," Communications in Information and Systems, vol. 1, no. 1, pp. 33-50, 2001.

[12] S. Zhao, K. Tepe, I. Seskar, and D. Raychaudhuri, "Routing protocols for self-organizing hierarchical ad-hoc wireless networks," in Proceedings of the IEEE Sarnoff 2003 Symposium, 2003.

[13] P. Gupta and P. R. Kumar, "The capacity of wireless networks," IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 388-404, Mar. 2000.

[14] G. Zhou, C. Huang, T. Yan, T. He, J. A. Stankovic, and T. F. Abdelzaher, "MMSN: Multi-Frequency Media Access Control for Wireless Sensor Networks," in Proceedings of IEEE INFOCOM 2006, Barcelona, Spain, April 2006.

[15] A. D. Wood and J. A. Stankovic, "Denial of service in sensor networks," Computer, vol. 35, no. 10, pp. 54-62, 2002.

[16] D. Liu and P. Ning, "Location-based pairwise key establishments for static sensor networks," in Proceedings of the 1st ACM Workshop on Security of Ad Hoc and Sensor Networks (in association with the 10th ACM Conference on Computer and Communications Security), Fairfax, VA, USA, October 2003, pp. 72-82.

Kamal Kumar received his M.Tech. and B.Tech. degrees from Kurukshetra University, Kurukshetra, India. He is presently working as Associate Professor in the Computer Engineering Department of M.M. Engineering College, Ambala, India. He is pursuing a Ph.D. at Thapar University, Patiala, India.
A. K. Verma is currently working as Assistant Professor in the Department of Computer Science and Engineering at Thapar University, Patiala, Punjab (India). He received his B.S. and M.S. in 1991 and 2001 respectively, majoring in Computer Science and Engineering. He worked as a Lecturer at M.M.M. Engineering College, Gorakhpur, from 1991 to 1996, and has been associated with Thapar University since 1996. He has been a visiting faculty member at many institutions and has published over 80 papers in refereed journals and conferences (in India and abroad). He is a member of various program committees for international and national conferences and is on the review boards of various journals. He is a senior member of the ACM, LMCSI (Mumbai) and GMAIMA (New Delhi), and a certified software quality auditor of MoCIT, Govt. of India. His main areas of interest are programming languages, soft computing, bioinformatics and computer networks; his research interests include wireless networks, routing algorithms and securing ad hoc networks.

R. B. Patel received a PDF from the Highest Institute of Education, Science & Technology (HIEST), Athens, Greece, in 2005, and a PhD in Computer Science and Technology from the Indian Institute of Technology (IIT), Roorkee, India. He is a member of the IEEE and ISTE. His current research interests are in mobile and distributed computing, security, fault-tolerant systems, peer-to-peer computing, cluster computing and sensor networks. He has published more than 100 papers in international journals and conferences and 17 papers in national journals and conferences. Dr. Patel also holds two patents in the fields of mobile agent technology and sensor networks.

Peripheral Display for Multi-User Location Awareness

Rahat Iqbal, Anne James, John Black
Faculty of Engineering and Computing, Department of Computing and the Digital Environment, Coventry University, Coventry, UK
Email: {r.iqbal, a.james, john.black}@coventry.ac.uk

Witold Poreda
UYT Limited, Coventry Business Park, Coventry, UK
Email: [email protected]

Abstract — An important aspect of Ubiquitous Computing (UbiComp) is augmenting people and environments with computational resources which provide information and services unobtrusively whenever and wherever required. In line with the vision of UbiComp, we have developed a multi-user location-awareness system by following a user-centred design and evaluation approach. In this paper, we discuss the development of the system, which allows users to share informative feedback about their current geographical location. Most importantly, the proposed system is to be integrated into the smart-home environment by portraying location-awareness information on a peripheral display. The proposed system can be used by various users, for example family members, relatives or a group of friends, in order to share information related to their locations and to interact with each other.

Index Terms — Location awareness, Ubiquitous Computing, Calm Technology, Global Positioning System (GPS), Tracking, Focus group, Interviews

I. INTRODUCTION
I. INTRODUCTION

According to Mark Weiser's vision of Ubiquitous Computing (UbiComp), the trend is to move away from the traditional desktop computing paradigm and to integrate computing seamlessly with the environment, augmenting people and their surroundings with computational resources that provide information and services unobtrusively, whenever and wherever required [1]. The UbiComp paradigm, in which computers are everywhere, calls for new technology that prevents humans from feeling overwhelmed by information. 'Calm technology' implements this idea by keeping computers in the periphery of our attention until they are needed [2]. Calm technologies move easily and seamlessly between the periphery and the centre of someone's attention as appropriate. In this way, peripheral displays can convey non-critical information without distracting or burdening their users.

In line with this vision of ubiquitous computing and calm technology, several projects such as AMI [3], Interactive Workspaces [4] and CHIL [5] envisage systems that, rather than being used as tools, support human-human communication in an implicit and unobtrusive way, by constantly monitoring humans, their activities and their intentions.

Traffic congestion is increasingly a problem around the world, so there is growing scope for being delayed, whether by the sheer number of vehicles or by an accident. When we are on a car journey, family members, friends and colleagues may want to know where we are. They may also wish to know whether we are all right and making adequate progress, and to be able to interact with us by exchanging messages. A few years ago we had to depend on our sense of direction and paper maps while travelling; today many of us cannot imagine driving without a satellite navigation system based on the Global Positioning System (GPS). GPS can be used not only to locate unknown places but also to provide location-awareness information about its carrier. This capability raises some security concerns, but we contend that there is a good trade-off between the benefits we can get from this technology and the disadvantages of being tracked by someone else. Within the family, tracking can be perceived as a positive step [6]. For instance, when a member of the family is travelling at a busy time or in bad weather, it is good to know that they have reached their destination safely.

There are automatic algorithms and systems which use data gathered from phone calls, from sensors embedded in, near or above the road surface, or from GPS receivers in some cars themselves, called probe vehicles [7]. In addition, there are a variety of Intelligent Transport Systems which attempt to help manage traffic and provide information to travellers [8]. However, these systems are limited by the geographic spread of the sensors or by the number of probe vehicles, so an entire country cannot be covered. Nor do these systems track a single person. This motivates our research and the development of the location-awareness system presented in this paper. We have developed a system which supports the sharing of positional data amongst users carrying GPS-enabled devices. Non-GPS devices can also be supported if the user is prepared to input position data manually on an interactive map. Most importantly, the proposed system is to be integrated into the smart-home environment by portraying location-awareness information on a peripheral display.
We have not been able to identify a product which provides a similar level of functionality to that of our system. There are many GPS tracking solutions on the market at present, but existing systems are mainly tied to designated GPS receiver hardware which reports positions to a server, from where they may be viewed using a web browser, or they are limited to particular hardware. The scope of functionality is in many cases very limited and based on one-way communication. None of the systems we investigated was able to share positions with a smart display in a smart-home environment.

The rest of the paper is organised as follows: Section II reviews the technological background; Section III discusses the development of the proposed system, including the requirements captured using a user-centred design and evaluation method; Section IV presents the results of an evaluation of the system, in terms of both a user evaluation and a comparison of the system to other similar systems. Finally, Section V concludes the paper and outlines our future work.

II. REVIEW OF EXISTING SYSTEMS

A Global Positioning Systems and Mobile Data Modems

Part of the technological infrastructure chosen to find location and time consists of Global Positioning System (GPS) receivers and mobile data modems. GPS receivers are increasingly being used in consumer applications such as navigation aids for walkers, navigation devices for boats and aeroplanes, and in-car satellite navigation systems. The technology is getting increasingly cheap. In-car satellite navigation systems are fairly standard on luxury cars and are prevalent on top-of-the-range models in most categories of car; alternatively, they are available as an optional extra or can be bought as a separate item. Mobile data modems, which use packetised mobile-phone data transmission technology, either the General Packet Radio Service (GPRS), a second-generation (2G) mobile phone technology, or third-generation (3G) networks, are also cheap. In addition, mobile networks are widely deployed in the UK, covering most of the UK population and landmass; see, for example, the Vodafone UK coverage map [9], the Orange UK coverage map [10] and the 3 UK coverage map [11].

B Incident Detection Systems and Intelligent Transport Systems

As outlined above, there has been research and development of systems which detect incidents. There are a variety of approaches [7]: driver-based algorithms, for example correlating mobile phone calls [12]; roadway-based algorithms [13], [14]; probe-based algorithms which use GPS-equipped vehicles [15]; and sensor-fusion-based algorithms using, for example, data from fixed detectors and probe vehicles [16]. As discussed above, these systems are limited by the extent of the deployment of fixed sensors or by the number and deployment of the probe vehicles. They cannot possibly cover a whole country, and in any case they do not address the movements of a single vehicle. GPS and the mobile phone networks, by contrast, cover the vast proportion of the UK.

C Smart Home

There has been much research into the smart home. As will be seen below, the smart home is the embodiment of ubiquitous computing, because the computing integrates seamlessly with the environment. The smart home augments its inhabitants, and the computational resources in the home provide information and services unobtrusively whenever and wherever required.
Smart-home projects fall into two types [17]:
• Non-healthcare
• Healthcare
Some examples of non-healthcare smart-home projects, the most relevant here, are briefly described. Mozer [18] describes the development of a system that learns the behaviour of the inhabitants of a building and controls the lighting, heating, water heating and ventilation. Microsoft's EasyLiving project was concerned with the production of an architecture and technologies for intelligent environments [19], [20]. For example, if someone wants music to be played, the system switches on the speakers based on the location of the person [20]. The Ubiquitous Home project in Japan uses cameras and microphones in each room to record the residents, while pressure sensors are used to track the inhabitants and also to determine the position of furniture [21]. Other sensors include infrared (IR) sensors placed in each room and at foot level in the kitchen and corridor, and two radio-frequency ID systems (one active and one passive). Appliances and visible robots in the home are controlled by the Ubiquitous Home system.

D Tracking Systems

Three systems have been identified that allow tracking of vehicles. The first is the En Route HQ app on the iPhone [22]. The second is Glympse, which runs on the iPhone and on mobile phones running the Android and Windows Mobile operating systems [23]. The third is the TomTom Plus Buddies service [24]. Note that this paper was finalised in mid June 2010; the functionality, availability, hardware and operating systems relating to the systems and software described in this subsection may have changed since then.

At the time of writing, the En Route HQ app required an Apple iPhone 3G or 3GS [22]; the iPhone 3G and iPhone 3GS possess a GPS receiver. The trip can be viewed on any model of Apple iPhone existing at the time of writing or on the En Route HQ web site [22]. Fig. 1 shows some screenshots of the iPhone app. En Route HQ has the ability to exchange messages between users of Apple iPhones or the En Route HQ website. One problem with the En Route HQ app, at the time of writing, is that the tracking function is limited to an iPhone 3G or 3GS [22]. Another problem, at the time of writing, is that only one trip can be seen at a time in a single web browser window or tab [22].

The Glympse app requires an iPhone 3G or 3GS, or a mobile phone running the Android or Windows Mobile operating systems and equipped with a GPS receiver, for tracking [23]. In addition, an iPod touch connected to WiFi can be used to send a position. A person's location can be viewed, at the time of writing, using the app on an Apple iPhone, an Apple iPod Touch, an Apple iPad or a mobile phone running the Android or Windows Mobile operating systems, or on the Glympse web site [23]. Multiple people can be tracked using the Glympse app, although they appear on different screens [23], [25]. Fig. 2 shows some screenshots of the Glympse app on an Apple iPhone. A problem with Glympse, at the time of writing, is that there is limited scope for interaction by sending messages. A message can be sent by a user of the app when initially notifying his or her location; however, in order to send another message, the location notification must be resent. There is no scope for viewers of a location, either via the app or the Glympse website, to send messages back.
The TomTom Buddies service requires a TomTom satellite navigation device linked to a mobile phone via Bluetooth. A screenshot of the TomTom is shown in Fig. 3. Two problems with the TomTom Buddies system are that the position update is not automatic and requires a user to request an update, and that the system is tied to having the TomTom hardware.

In addition to the tracking applications mentioned above, there are a number of other multi-user location-awareness systems, based around GPS-enabled mobile phones. Examples of these are as follows:
• Google Latitude [26]
• Centrl [27]
• Pocket Life and Pocket Life Lite [28]
• Look 8 Me tracking [29]
• Locus [30]
• Locc.us [31]

Figure 1. Screenshots of the En Route HQ app from the En Route HQ website [22]
Figure 2. Screenshots of the Glympse app from the iTunes Store on the iTunes application [25]
Figure 3. Screenshot of the location of a TomTom buddy [24]

III. DEVELOPMENT OF THE SYSTEM

We developed the location-awareness system by following a user-centred design and evaluation methodology [32]. We conducted a user study consisting of focus groups and interviews, the aim of which was to inform the design and development of the system.

A Focus Groups

We established two focus groups of seven people each, with different age ranges, to discuss the core requirements and, most importantly, the usefulness of the system and whether the participants would use such a system. The merits and demerits of such a system were discussed and the participants' opinions were recorded for later analysis. All participants thought the idea was interesting and useful. However, they all expressed concerns about the ethics and privacy of the system. The consensus was that tracking should be on a case-by-case basis; participants would not want to be tracked all the time. The results also showed that there would be some situations where users would not want to share position information and others where sharing position information would be very useful. In addition, the second focus group raised two interesting issues regarding the use of the system to track drivers. The first was concern about the safety of using such a system while driving. The second concerned a person at home wanting to know when the driver (e.g., a family member or friend) was arriving back home, or at a house they were going to visit, so that the person in the house could schedule their activities prior to the driver's arrival.

B Interviews

The interviews were used to capture further user requirements. They were held with three doctoral, two post-doctoral and two undergraduate students, each for one hour. A usability expert also participated in the interview sessions. The interviews were used to inform the system design and, importantly, to record missing functionality for the second iteration of the user-centred design process.

C User Requirements and Design Considerations

The system provides three different user interfaces: a smart-phone solution, a web site solution and a smart-home display. The smart-phone and web solutions provide a similar level of functionality.
The system allows the web user to: register; confirm an email address; invite friends; accept or refuse invitations; see friends' positions; send and receive messages; attach information to locations; set alerts on positional data; and manually set their own position. The smart-home display allows the viewing of positions and messages, the selection of the person to view, and the sending of messages to that person.

As a source of GPS data, the mobile user uses the internal or external GPS receiver connected to his or her mobile device. The system protects the user against potential connection problems and will re-establish connectivity if necessary. During the registration process users need to provide the system with a valid email address, which is used to deliver a dynamically generated, unique link. This is required to fully activate the account and confirm ownership of the specified email address. A logged-in user can use the system interface to invite friends with whom the user would like to share positions.

The system provides a high level of privacy: only an account owner should decide who will see his or her position. To achieve this level of privacy, the system uses invitations, which must be manually approved by the account owner. A user may see the positions of other users on his or her friends list only if those friends have agreed. If a mobile device malfunctions, it is possible to set the position manually using the web interface. Another functional requirement is that a user can create a number of locations with associated names, which can be used to create position-related alerts. Additional requirements for the specific driver-tracking application include supplying the monitoring parties with the estimated arrival time at a location and the reasons for any delay.

The non-functional requirements relate to cost, energy effectiveness, security and usability aspects of the system design.

Cost effectiveness - the smart-phone application requires a network connection in order to work correctly. Mobile Internet is becoming more and more popular and in many cases is included in contracts for free, but the system also targets users who need to pay for their network use. Thus the amount of data sent between the client application and the server should be minimised. To further improve cost effectiveness, periodic updates should be used. This gives the user the opportunity to decide how often position data should be transferred to the server.

Energy effectiveness - because the system operates on mobile devices, energy use is a very important matter. The system should not interfere with normal phone functions. An application that uses an internal GPS receiver constantly can easily discharge the phone battery. The periodic updates used to save cost can therefore also increase battery life: the application should not use the GPS hardware continuously; instead, the normal state should be off, with activation only for short periods to obtain a position. In this way the system gives the user the opportunity to decide how often to update the position.

Security - the system will handle confidential data, namely user positions. Thus it should provide a high level of security and ensure that only selected users can access the data. To transfer data safely between the client applications and the server, SSL is used.
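To make the cost- and energy-driven periodic-update requirement concrete, the following minimal Java sketch shows one way such a reporter might be structured. It is an illustration only: the GpsReceiver and PositionUploader interfaces are hypothetical stand-ins for the device GPS driver and the SSL-protected web-service client, and the actual system was implemented on Windows Mobile with the .NET platform (see Section III D below), not in Java.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    /**
     * Minimal sketch of the periodic-update requirement described above.
     * GpsReceiver and PositionUploader are hypothetical adapters, not part
     * of the actual system described in this paper.
     */
    public class PeriodicPositionReporter {

        /** Hypothetical adapter around the device's GPS hardware. */
        interface GpsReceiver {
            void powerOn();
            void powerOff();
            double[] readLatLon();   // blocks briefly until a fix is available
        }

        /** Hypothetical client that sends a position to the server over SSL. */
        interface PositionUploader {
            void upload(double lat, double lon, long timestampMillis);
        }

        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        /**
         * Start reporting. The user-chosen interval controls both network
         * cost and battery use: the GPS is powered on only long enough to
         * obtain a fix, then switched off again.
         */
        public void start(GpsReceiver gps, PositionUploader uploader,
                          long intervalSeconds) {
            scheduler.scheduleAtFixedRate(() -> {
                gps.powerOn();                  // GPS off by default, on briefly
                try {
                    double[] fix = gps.readLatLon();
                    uploader.upload(fix[0], fix[1], System.currentTimeMillis());
                } finally {
                    gps.powerOff();             // always release the hardware
                }
            }, 0, intervalSeconds, TimeUnit.SECONDS);
        }

        public void stop() {
            scheduler.shutdownNow();
        }
    }

A longer interval here directly trades positional freshness for lower data cost and longer battery life, which is exactly the choice the requirements leave to the user.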
As an additional security measure, beyond the SSL transport described above, a ticket authentication system is introduced. This reduces password traffic.

Platform compatibility - part of the system is targeted at mobile devices, and the system should be easily expandable to other mobile platforms available on the market. This requirement is met by using XML and Web Services, which are platform independent. The system stores all logic on the server side, so all major changes are made in a single place.

Smart Home - The smart home will have Internet-connected screens in more than one room, for example in the kitchen, living room and dining room. The screens would be built into the wall. The tracking application could either be a web page or a web-service application. The application could display the current position, the intended route, the estimated arrival time and/or the estimated time to arrival, and any messages sent by the driver. One of the major design considerations is that the display should be calm, unobtrusive and peripheral. The display could switch itself on, or display a message when one arrives. The home inhabitants could switch the display on, or switch to the tracking display, when they wanted to check the location of the driver. Another interaction of the system with the smart home is that, if there is no one home, the heating could be activated automatically as the tracked individual nears home. Similarly, the system on the smart-phone device could be integrated with a smart home's control system, and thus, as well as revealing personal location, could also control the smart-home environment.

To be specific, the requirements for the smart-home display, in addition to those for the web page, are that it should be calm and unobtrusive, remaining in the periphery of someone's attention when there is no information to convey. When there is information to convey, it should move to the centre of attention. So when there is nothing to report, the display should be switched off. The display could be used for other functions, for example watching TV or a DVD, playing video games or browsing the internet; if the display is being used for anything else, the tracking display should be minimised. When a message arrives, the display should switch on if it was off. Alternatively, if there is a message while the display is on and being used for another purpose, the tracking display should become more prominent. The information to be displayed is the position, speed and direction of travel of the driver. If known, the destination and the intended route should be displayed. In these ways the display would be peripheral and unobtrusive.

The interactions with the display are as follows. The tracking display can be selected. There should be the option to show either the entire trip or a display centred on the current location. The display should be able to handle more than one driver, as more than one driver may be tracked at a time; there should be the option to switch between viewing individual drivers or a map covering all drivers. There should be the ability to send messages to the drivers and to receive messages from them. The inhabitants of the home should be able to interact with the display using voice, keyboard or smart phone.

D Technologies Used

This section presents the main technologies used in the development.
The system consists of three sub-systems (as shown in Fig. 4), representing different programming areas: web applications, web services and mobile applications. All interact with each other, so determining the requirements and techniques of each was crucial to avoid interoperability problems. The sub-systems and the technologies used are described below.

The System - .NET Framework: There is a wide range of technologies and programming environments available, from open-source platforms to solutions provided by companies like Microsoft. Only two platforms met the requirements of the three aspects of this development: the open-source Java platform provided by Sun Microsystems, and the .NET Framework supplied by Microsoft. Each has specific pros and cons. The system was built using the .NET Framework, mainly to provide some level of similarity between code and functionality in different parts of the system.

The Web Site - ASP.NET Framework 3.5: The web site was created using ASP.NET Framework 3.5 SP1, the latest version released at the time.

Figure 4. System Overview Design

The Mobile Application - Windows Mobile 6.1 SDK: The mobile application was developed using Visual Studio and C# with the Windows Mobile 6.1 SDK, which brought full mobile functionality to Visual Studio. The SDK includes full documentation, code samples, libraries and tools, everything necessary to develop a rich mobile application. The SDK libraries, in conjunction with the GPS Intermediate Driver (GPSID), made it possible to access a GPS receiver and obtain GPS data [33]. GPSID acts as an agent between the application and the GPS hardware; the main benefit of this solution is that GPSID enables a GPS receiver to serve multiple applications at the same time. To run the application on a smart phone, the .NET Compact Framework 3.5 was necessary.

The Server Functionality - XML Web Services: The initial intention was to use Web Services provided by WCF (Windows Communication Foundation). WCF is an API introduced in .NET Framework 3.0 which is used to build distributed systems. Unlike past solutions, WCF provides a single, unified and extendable programming object model.

Authentication - Microsoft Membership API: Security is an important aspect of the proposed system, and providing the user with a high level of privacy was one of the main concerns. Login facilities are very specific and provide almost the same scope of functionality in most applications. To save time and provide the developer with a high-quality solution, Microsoft has included a Membership API with ASP.NET 3.5 [34]. The Membership API allows the developer to avoid repetitive implementation of authentication features and to rely on a well-tested solution. It provides functionality for the most common activities, such as registration, password change, password recovery and login.

The Maps - Google Maps: This project uses Google Maps. Google Maps does not have a dedicated API for ASP.NET, so interfacing was achieved through JavaScript.

The User Interfaces - ASP.NET AJAX: ASP.NET AJAX was used in this project to create the user interfaces [35]. AJAX allows an application to retrieve data from the server while the user is interacting with the web site. In consequence, only the relevant part of the interface is changed during user interaction, and the user does not have to wait for the full web page to reload, which creates a better experience.
Security - SSL (TLS, Transport Layer Security) / SOAP Headers: Security is a very important task in this project. The application operates with confidential information, namely the actual user location. That is why TLS is used to create a secure link between the client application and the server.

The Database - SQL Server: Visual Studio and the .NET Framework are well integrated with MS SQL Server. Use of MS SQL Server guaranteed simplified access to diagnostics, database management and the framework-provided security mechanisms.

Data Querying - LINQ: LINQ is a .NET Framework component developed by Microsoft that expands the functionality of .NET languages by supporting the construction and compilation of data queries [36].

E System Interfaces

Fig. 5 shows a screenshot of the system website. As can be seen in the figure, the main content of a page is divided into two parts: on the left-hand side there is a map showing users' positions; on the right-hand side is a vertical menu including a contact list, some additional features and messages. Graphics were kept to a minimum to reduce web page loading time. The whole layout is based on CSS (Cascading Style Sheets). Fig. 6 shows the main screen of the smart-phone interface.

In comparison with the web page, we see in Fig. 6 that the smart-phone layout is very simple, mainly due to the limited amount of space provided by a mobile device. At the central point is a map, and just below it a friend list with a scroll bar. At the top and bottom of the screen are two status bars, 'message' and 'application'. It was not possible to provide all the functionality on one screen; most features use separate screens or partially cover the main screen.

Fig. 7 is the design for the smart-home display. The interface is designed to be interacted with by voice. It is a multifunction display; in the lower left is the mechanism for selecting what is to be displayed, which is highlighted. The display has an area for showing the position of one of the friends, and an area for the messages received from that friend. The text below the tracking area shows the friend list, with the name of the friend currently being displayed highlighted. Below that is a means of selecting what is displayed on the map, and below that a box for entering and sending a message to the friend; the message is entered using speech input.

In order to be peripheral when there is no message and the display is not being used for other purposes, the display could be blank. When a message arrives from the driver, an audio alert would be played (similar to that played when a text message arrives on a mobile phone); in addition, the display could illuminate and the screen seen in Fig. 7 be displayed. If the display is being used for another function, the new message would appear on the screen in some way; for example, the message could be superimposed in the middle of the display, or at the top or bottom of the screen. In addition, the display could be integrated with a smart-home environment so that the system would sense in which room or rooms the occupants of the house were, and only display the tracking screen on a display in those rooms. If the inhabitants of the house were not present in a room with a smart display, some kind of audio alert could be played in those rooms, for example like the audio alert for a text message on a mobile phone.
Alternatively, something more meaningful, such as a speech-synthesised alert, could be played, for example "Message received from Adam". The inhabitants could then go to a room with a smart-home display. In these ways the display would be calm and would move from the periphery to the centre of attention when required, as discussed above.

IV. EVALUATION

A Introduction

Two types of evaluation of the system were made. The first was a user evaluation of the system other than the smart-home display, together with a separate evaluation of the smart-home display on its own. The second was a comparison of the system to the other existing systems identified in Section II D.

Figure 5. GPS Tracking System Web Page Main Screen
Figure 6. Smart-phone Main Screen
Figure 7. Smart-home Display

B User Evaluation

Both qualitative and quantitative evaluations were carried out in order to test the functionality and behaviour of the system, as well as to test its usefulness with potential users. The system, apart from the smart-home display, was tested with fifteen potential users. The users were from different age ranges; the age factor was very important to simulate potential user behaviour. All participants were from different academic and cultural backgrounds and represented various levels of IT skill. For evaluation purposes, the system was deployed on a laptop computer and the users were connected to the same private network. In this way it was possible to test the system using different computers and different browser versions. The system operated well and the user feedback was positive.

The second evaluation was conducted with ten users. All the participants were from the same academic department (Computer Science students) but from different cultural backgrounds. After a demonstration of the system (apart from the smart-home display), the users were asked the following questions. The results are shown in Fig. 8.
• Is the system easy to use?
• Is the system useful?
• Would you use the system to track the location and time of your family and friends?
• Would you let others track your location and time always?
• Would you let others track your location and time on a case-by-case basis?

Figure 8. User Evaluation Results

The third part of the user evaluation was to interview separately a group of four potential users. The users ranged in age from 40 to 60 and consisted of three females and one male. The entire system was explained to the potential users. Then the users were shown the smart-home display in Fig. 7, the functionality was explained, and two potential scenarios involving journeys were described, illustrating where the system might be useful. An alternative screen, similar to Fig. 7 but with a different driver just starting off, was shown, as well as an alternative design to Fig. 7. The questions asked were:
• In general, do you think all the information you require is there?
• Do you think the idea of using speech input is a good idea?
• What do you like about the display and what don't you like?
• What do you think is best: showing the (planned) route taken as:
  o a series of position markers?
  o a solid path, with a difference in colour and a direction marker showing where the driver is?
• Do you think it is useful to show the planned route, so that the viewer can see where the driver is going or whether the driver has deviated from the planned route?
• Do you think it is useful to show the current speed?
• Time to arrival:
  o Do you think it is useful to show the time to arrival?
  o Alternatively, do you think it is better to show the estimated arrival time?
  o Do you think it would be good to have the option to select either the time to arrival or the arrival time?
• Which do you think is best:
  o showing the messages in a box?
  o showing the messages as a speech bubble?
• Do you think showing the driver list in the way shown is a good idea?
• Do you think that showing the kind of map display (referred to as View on the screen mock-up) is a good idea? Do you like it?
• What improvements would you like to see?

Speech input was thought to be a good idea, as the hands are not tied up interacting with the screen; other activities, such as cooking, could be performed while interacting with the display. Speech input was also perceived to be a more natural way to interact with a computer. Furthermore, speech was thought to be a better way for people with disabilities to interact with a system, and also good for people who are not adept at using a keyboard. However, there was the worry that speech other than the commands could present a problem, due to speech not intended as commands being misinterpreted by the system.

Another theme that emerged was the ability to have options and flexibility in what is displayed: for instance, to have a basic operating mode, but also a more flexible or more sophisticated mode. One example was the choice of whether to display the message box. Another example was to have different map display options, such as the map occupying the whole of the top of the display, or perhaps the entire display area. A further example of flexibility is the option of displaying either or both of the estimated time to arrival and the estimated arrival time. Yet another is displaying either or both of the current speed and a short-term average speed (perhaps over the last five minutes); one idea proposed was to have the speeds scrolling across the bottom of the screen in some way.

There were mixed feelings about the map display. Some liked the position markers, some did not. One interviewee thought that position markers of the same size were confusing, as it was not clear where the driver currently was or was going. It was suggested that there might be some way to distinguish the current position, for example having only one position marker, or position markers decreasing in size as they get older. One idea mentioned in two interviews was to have a flashing position marker. It was also thought that having two lines, one showing where the driver had been and a line in a different colour showing where the driver was going, could be confusing. There were also mixed views about the usefulness of displaying the planned route.

C Comparison of the System to Other Systems

This section presents a comparison of the proposed system and existing systems in Table I. As can be seen, most systems allow many-to-many tracking. Some are available on more than one type of mobile phone hardware or operating system. However, no other system is integrated with the smart-home environment.
The smart home is important because of energy efficiency (for example [37]), health and safety (particularly monitoring the health of the elderly), assistance for people [17], and also government grants. As was seen in Section II C, there has also been work that monitors the behaviour of the inhabitants of a smart home, for example [18], [19], [20] and [21]. So integration of systems with the smart home is important, and only the proposed system provides it. Note that this paper was finalised in mid June 2010; the functionality, availability, hardware and operating systems relating to the systems and software in Table I may have changed since then.

V. CONCLUSIONS AND FUTURE WORK

In this paper, we have discussed the development of a multi-user location-awareness system, to be used to locate or track different people, especially family members and friends. Tracking is particularly important, and the security and safety concerns greater, in bad weather and when there are traffic problems. The proposed system integrates with the smart home and incorporates a context-aware peripheral display in the smart home. The system was developed using a user-centred design and evaluation methodology. The evaluation results show that the system is useful and that users would use it to track their family members and friends. However, it is also concluded that tracking would not always be permitted, due to privacy concerns.

Changes could be made to the smart-home display, following on from comments made by the interviewees in the user evaluation: for example, to allow more flexibility in what is displayed, in terms of the speed, the arrival time, the planned route and the route taken, whether messages are displayed when new messages arrive, and the area of the screen taken up by the map.

A future application is the use of Artificial Intelligence for the system alarm function. It could be designed in a way that would allow learning of user routines based on collected data; the system would then be able to react intelligently to changing conditions. An additional feature that could be built around the existing code is the display of past positions. That would allow a user to see visited places, or, for example, to recreate the path of some trip. The system could potentially share this information with other users as recommendations of interesting places to visit. The system could also be integrated with social networking web sites and allow users to share their positions using the web site's specific interface. The system could be expanded to provide car navigation functionality, and a community could be built around this to share information about road conditions such as road works, traffic or accidents. The system could also be adapted for a specific business use, to meet the requirements of that particular market. An additional possibility is interfacing to systems that provide information on aircraft or train arrival times.

Furthermore, the functionality of the smart-home display could be increased so that a user could set an alarm to alert the user when a driver was at the destination, or was a user-specified time period (i.e., so many minutes) from reaching it. This functionality would be useful when a driver's destination is the home where the smart-home display is located. That way an inhabitant of the smart home could be ready for the arrival of the driver, for example by having refreshments ready.
The alert would mean playing an audio alarm and displaying the tracking screen on the smart-home display in the room where people were located.

REFERENCES

[1] M. Weiser, "Some Computer Science issues in Ubiquitous Computing", Communications of the ACM, vol. 36, pp. 75-84, 1993.
[2] M. Weiser and J. Seely Brown, "The coming age of calm technology", in Beyond Calculation - The Next Fifty Years of Computing, P. J. Denning and R. M. Metcalfe, Eds. Copernicus, 1996.
[3] AMI, The European project AMI (Augmented Multi-party Interaction), available via: http://www.amiproject.org
[4] B. Johanson, A. Fox and T. Winograd, "The Interactive Workspaces Project: Experiences with Ubiquitous Computing Rooms", IEEE Pervasive Computing Magazine, vol. 1, no. 2, pp. 515-523, 2009.
[5] CHIL Project, available via: http://chil.server.de
[6] V.-J. Khan, P. Markopoulos and B. Eggen, "On the role of awareness systems for supporting parent involvement in young children's schooling", in IFIP International Federation for Information Processing, vol. 241, 2007, pp. 91-101.
[7] E. Parkany and C. Xie, "A complete review of incident detection algorithms & their deployment: what works and what doesn't", University of Massachusetts Transportation Center, prepared for The New England Transportation Consortium, available via http://www.ct.gov/dot/LIB/dot/documents/dresearch/NETCR37_00-7.pdf

TABLE I. COMPARISON OF PROPOSED SYSTEM TO EXISTING SYSTEMS

Software System | Tracking | Other means of displaying tracking | Messages or Notifications | Compatibility with Hardware | Privacy | Integration with smart home
Proposed system | Many to many | Web page and mobile phone | Anytime | All hardware types | Yes | Yes
En Route HQ [22] | Many to many | Web page, or original iPhone | Anytime | iPhone 3G, iPhone 3GS, iPhone (original) | Yes | No
Glympse [23] | Many to many | Web page, iPod Touch, iPad | Anytime, but need to modify notification | iPhone 3G, iPhone 3GS, iPod Touch, mobile phone with Android or Windows Mobile operating systems | Yes | No
TomTom Buddies [24] | Many to many | No | Anytime | TomTom only; mobile phone with Bluetooth required | Yes | No
Google Latitude [26] | Many to many | Web page | Anytime | Android OS mobile phones; iPhone and iPod touch devices; most colour Blackberry mobile phones; most Windows Mobile 5+ devices; most Symbian S60 (Nokia) | Yes | No
Centrl [27] | Many to many | Web page | Anytime | iPhone, Blackberry, Android, Nokia | Yes | No
Pocket Life and Pocket Life Lite [28] | Many to many | Web page | Anytime | iPhone 3G/3GS, certain Nokia, Blackberry, Samsung, LG, Sony Ericsson and HTC | Yes | No
Locus [30] | Many to many; location posting is manual | Web page | Only with a location post | iPhone 3G, iPhone 3GS | Yes | No

[8] L. Figueiredo, I. Jesus, J.A.T. Machado, J.R. Ferreira and J.L. Martins de Carvalho, "Towards the development of intelligent transportation systems", in Proceedings of Intelligent Transportation Systems, 2001, Oakland, California, USA.
[9] Vodafone UK coverage map, available via: http://maps.vodafone.co.uk/coverageviewer/web/default.aspx
[10] Orange UK coverage map, available via: http://search.orange.co.uk/ouk/portal/coveragechecker.html
[11] 3 UK coverage map, available via: http://www.three.co.uk/Help_Support/Coverage
[12] A. Skabardonis, T. Chira-Chavala and D. Rydzewski, "The I-880 Field Experiment: Effectiveness of Incident Detection using Cellular Phones", California PATH Program Report UCB-ITS-PRR-98-1, Institute of Transportation Studies, University of California, Berkeley, 1998.
[13] K.N. Balke, "An evaluation of existing incident detection algorithms", Research Report FHWA/TX-93/1232-20, Texas Transportation Institute, Texas A&M University System, College Station, TX, 1993.
[14] M. Bell and B. Thancanamootoo, "Automatic incident detection in urban road networks", in Proceedings of the Planning and Transport Research and Computation (PTRC) Summer Annual Meeting, University of Sussex, UK, 1986, pp. 175-185.
[15] M. W. Sermons and F.S. Koppelman, "Use of vehicle positioning data for arterial incident detection", Transportation Research Part C, vol. 4, no. 2, pp. 87-96, 1996.
[16] N.E. Thomas, "Multi-state and multi-sensor incident detection systems for arterial streets", Transportation Research Part C, vol. 6, no. 2, pp. 337-357, 1998.
[17] M. Chan, D. Esteve, C. Escriba and E. Campo, "A review of smart homes - Present state and future challenges", Computer Methods and Programs in Biomedicine, vol. 91, pp. 55-81, July 2008.
[18] M.C. Mozer, "The neural network house: an environment that adapts to its inhabitants", in Proceedings of the AAAI Spring Symposium on Intelligent Environments, Menlo Park, California, USA: AAAI Press, 1998, pp. 110-114.
[19] J. Krumm, S. Harris, B. Meyers, B. Brumitt, M. Hale and S. Shafer, "Multi-camera multi-person tracking for EasyLiving", in Proceedings of the 3rd IEEE International Workshop on Visual Surveillance, 2000, pp. 3-10.
[20] B. Brumitt, B. Meyers, J. Krumm, A. Kern and S. Shafer, "EasyLiving: technologies for intelligent environments", in Proceedings of Handheld and Ubiquitous Computing, Second International Symposium, HUC 2000 (Lecture Notes in Computer Science vol. 1927), 2000, pp. 12-29.
[21] T. Yamazaki, "Beyond the smart home", in Proceedings of the International Conference on Hybrid Information Technology (ICHIT'06), 2006, pp. 350-355.
[22] En Route HQ, En Route HQ web page, available via http://www.enroutehq.com
[23] Glympse, Glympse web site, available via http://www.glympse.com
[24] TomTom, TomTom PLUS Services Buddies, available via http://www.tomtom.com
[25] Glympse on iTunes Store on the Apple iTunes application.
[26] Google Latitude, Google Latitude web page, available from http://www.google.com/intl/en_us/latitude/intro.html
[27] Centrl, Centrl web site, available from http://centrl.com
[28] Pocketlife, Pocketweb web page, available via http://www.pocketweb.com
[29] Look 8 Me, digital art media GmbH web page, available from http://www.look8me.de
[30] Locus, Nsquared web site, available from http://nsquaredsolutions.com/Locus/
[31] Locc.us, Locc.us web page, available via http://locc.us
[32] R. Iqbal, N. Shah, A. James and J. Duursma, "User-centred design and evaluation of support management system", in Proceedings of the 13th International Conference on Computer Supported Cooperative Work in Design, Santiago, Chile, April 2009, pp. 155-160.
[33] Microsoft, GPS Intermediate Driver Architecture, available via http://msdn.microsoft.com/en-us/library/bb201942.aspx
[34] Microsoft, .NET Compact Framework 3.5, available via: http://www.microsoft.com/
[35] Microsoft, About ASP.NET AJAX, available via http://www.asp.net/ajax/about/
[36] Microsoft, Data Platform Development Center - LINQ, available from http://msdn.microsoft.com/en-us/data/cc299380.aspx
[37] N. Pardo, A. Sala, A. Montero, J.F. Urchueguia and J. Martos, "Advanced control structure for energy management in ground coupled heat pump HVAC system", in Proceedings of the 17th World Congress of the International Federation of Automatic Control, 2008, pp. 2448-2453.

Dr Rahat Iqbal is a Senior Lecturer in the Distributed Systems and Modelling Applied Research Group at Coventry University. His main duties include teaching and tutorial guidance, research and other forms of scholarly activity, examining, curriculum development, coordinating and supervising postgraduate project students, and monitoring the progress of research students within the Department. His research interests lie in requirements engineering, in particular user-centred design and evaluation that balances technological factors with human aspects in order to explore implications for better design. A particular focus of his interest is how user needs can be incorporated into the enhanced design of ubiquitous computing systems, such as smart homes, assistive technologies and collaborative systems. He uses artificial intelligent agents to develop such supportive systems. He has published more than 50 papers in peer-reviewed journals and reputable conferences and workshops.

Dr Anne James is Professor of Data Systems Architecture in the Distributed Systems and Modelling Applied Research Group at Coventry University. Her main duties involve leading research, supervising research students and teaching at undergraduate and postgraduate levels. Her teaching interests are enterprise systems development, distributed applications development and legal aspects of computing. The research interests of Professor James are in the general area of creating systems to meet new and unusual data and information challenges. Examples of current projects are the development of Quality of Service guarantees in Grid Computing and the development of special techniques to accommodate appropriate handling of web transactions. Professor James has supervised around 20 research degree programmes and has published more than 100 papers in peer-reviewed journals or conferences. She is currently also involved in an EU FP7-funded programme to reduce energy consumption in homes through appropriate data collection and presentation.

Dr John Black has a B.Sc. in Physics and Astrophysics as well as a Ph.D. in astronomical image processing, both obtained from King's College, University of London. He has conducted research in vector quantisation, image and data fusion, tracking, image and video compression, and data mining. He has worked at Coventry University, the University of Warwick, QinetiQ and QinetiQ's predecessor organisations. At the time of writing of this paper he was completing an M.Sc. in software engineering at Coventry University.

Witold Poreda is a software developer at UYT Limited, Coventry, UK. UYT Limited is an automotive component manufacturing facility producing Body-in-White (BIW) components and sunroof assemblies.

Discrete Characterization of Domain Using Semantic Clustering

Sanjay Madan
Comviva Technologies Ltd., MBS-PACS, Gurgaon, India
Email: [email protected]

Shalini Batra
Computer Science and Engineering Department, Thapar University, Patiala, Punjab, India
Email: [email protected]
Abstract—Many approaches have been developed to understand software source code, and the majority of them focus on program structural information, which loses the crucial domain semantics contained in the text and symbols of the source code. To understand software as a whole, we need to enrich these approaches with conceptual insights gained from the domain semantics. This paper proposes mapping the domain to the code using information retrieval techniques applied to linguistic information, such as identifier names and comments in the source code. The concept of Semantic Clustering is introduced in this paper, and an algorithm is provided to group source artifacts based on their vocabulary, taking synonymy and polysemy into account. After the clusters are detected, the program code is automatically labeled based on semantic similarity and explored visually in three dimensions for discrete characterization. This approach works at the textual level of the source code, which makes it language independent. The approach correlates the semantics with structural information and applies at different levels of abstraction (e.g. packages, classes, methods).

Index Terms— Information retrieval, Semantic clustering, Software reverse engineering.

I. INTRODUCTION

Gaining knowledge about a software system is one of the main activities in software reengineering. It has been estimated that up to 60 percent of software maintenance effort is spent on comprehension [1]. This is because a lot of knowledge about the software system and its associated business domain is not captured in an explicit form. Most approaches that have been developed focus on program structure [2] or on external documentation [3, 4]. However, the identifier names and the source code comments are a fundamental source of information. Source code comprises two types of communication: human-machine communication through program instructions, and human-to-human communication through the names of identifiers and comments [5]. The executables are for the machine, whereas the code is written for humans, not for machines.

Let us consider a small code example, which tells whether a time value is in the morning:

    /** Return true if the given 24-hour time is in the
        morning and false otherwise. */
    public boolean isMorning(int hours, int minutes, int seconds) {
        if (!isDate(hours, minutes, seconds)) {
            throw new IllegalArgumentException("Invalid input: not a time value.");
        }
        return hours < 12 && minutes < 60 && seconds < 60;
    }

When we strip away all identifiers and comments, from the machine's point of view the functionality remains the same, but for a human reader the meaning is obfuscated and almost impossible to figure out. In our example, retaining only the formal structure yields:

    public type1 method1(type2 a, type2 b, type2 c) {
        if (!method2(a, b, c)) {
            throw new Exception(literal1);
        }
        return (a < A) && (b < B) && (c < C);
    }

Conversely, in the informal information the vocabulary is presented in arbitrary order, yet the domain of the code is still recognizable. In this example, retaining only the naming yields:

is int hours minutes int < minutes input hours is seconds && boolean morning false 24 time minutes not 60 invalid && value seconds time < seconds hour given hours 60 12 < morning date int is otherwise [5].
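To make the stripping in the example above reproducible, the following small Java sketch shows one plausible way to reduce source code to its naming vocabulary: alphabetic tokens are extracted, compound identifiers are split on camel case, and all terms are lower-cased. This is an illustrative approximation under our own assumptions, not the tool used by the authors.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Minimal sketch of turning source code into a bag of terms, as in the
     * isMorning example above. Illustrative only; a real tool would also
     * distinguish comments from code and handle other naming conventions.
     */
    public class VocabularyExtractor {

        private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

        public static List<String> terms(String sourceCode) {
            List<String> terms = new ArrayList<>();
            Matcher m = WORD.matcher(sourceCode);
            while (m.find()) {
                // Split camel case: "isMorning" -> "is", "Morning"
                for (String part : m.group().split("(?<=[a-z])(?=[A-Z])")) {
                    terms.add(part.toLowerCase());
                }
            }
            return terms;
        }

        public static void main(String[] args) {
            String snippet = "public boolean isMorning(int hours) { return hours < 12; }";
            System.out.println(terms(snippet));
            // -> [public, boolean, is, morning, int, hours, return, hours]
        }
    }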
It is a well-known fact that information retrieval provides means to analyze, classify and characterize text documents based on their content, and the representation of documents as a bag of terms is a well-established technique in information retrieval (IR) used to model documents in a text corpus. Apart from external documentation, the location and use of source-code identifiers is the most frequently consulted source of information in software maintenance. In software analysis, different approaches apply IR to external documentation [6, 7], but only little work has focused on treating the source code itself as a data source. Here we use information retrieval to derive topics from the vocabulary usage at the source code level. The first three steps of domain extraction from source code are pre-processing, applying LSI, and clustering; furthermore, we retrieve the most relevant terms for each cluster. In short, the approach is:

(1) Pre-processing the software system. Break the system into documents and build a term-document matrix that contains the vocabulary usage of the system.
(2) Applying Latent Semantic Indexing. Use LSI to compute the similarities between source code documents and illustrate the result in a correlation matrix [10].
(3) Identifying topics. Cluster the documents based on their similarity and rearrange the correlation matrix; each cluster is a linguistic topic.
(4) Describing the topics with labels. Use LSI again to retrieve, for each cluster, the top-n most relevant terms.

II. LATENT SEMANTIC INDEXING

Latent Semantic Indexing (LSI) is a technique common in information retrieval to index, analyze and classify text documents. It analyzes how terms are spread over the documents of a text corpus and creates a search space with document vectors: similar documents are located near each other in this space and unrelated documents far apart. Since LSI can be used to locate linguistic topics in a set of documents [8, 9], it is applied here to compute the linguistic similarity between source artifacts (e.g. packages, classes or methods) and to cluster them according to their similarity. This clustering partitions the system into linguistic topics that represent groups of documents using similar vocabulary. LSI can be used to analyze the linguistic information of a software system because source code is basically composed of text documents.

To elaborate, like other IR techniques, Latent Semantic Indexing is based on the vector space model (VSM). This approach models documents as bags of words and arranges them in a term-document matrix A, such that a_ij equals the number of times term t_i occurs in document d_j. LSI was developed to overcome problems with synonymy and polysemy that occurred in prior vectorial approaches, and thus improves the basic vector space model by replacing the original term-document matrix with an approximation. This is done using singular value decomposition (SVD), a principal components analysis (PCA) technique originally used in signal processing to reduce noise while preserving the original signal. Assuming that the original term-document matrix is noisy (due to synonymy and polysemy), the approximation is interpreted as a noise-reduced, and thus better, model of the text corpus.
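As a concrete illustration of step (1), the following Java sketch builds the term-document matrix A described above, where a_ij counts the occurrences of term t_i in document d_j. The method names and the representation of documents as pre-extracted term lists are our own assumptions, not part of the original approach; stop-word removal, stemming and tf-idf weighting (see Section V) would be applied around this step in a full pipeline.

    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    /**
     * Sketch of step (1): building the term-document matrix A, with
     * a[i][j] = frequency of term t_i in document d_j.
     */
    public class TermDocumentMatrix {

        public static double[][] build(List<List<String>> documents,
                                       List<String> vocabularyOut) {
            // Assign each distinct term a row index, in order of first appearance.
            Map<String, Integer> index = new LinkedHashMap<>();
            for (List<String> doc : documents)
                for (String term : doc)
                    index.putIfAbsent(term, index.size());
            vocabularyOut.addAll(index.keySet());

            // Count term occurrences per document.
            double[][] a = new double[index.size()][documents.size()];
            for (int j = 0; j < documents.size(); j++)
                for (String term : documents.get(j))
                    a[index.get(term)][j]++;
            return a;
        }
    }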
For example, a typical search engine covers a text corpus with millions of web pages, containing some tens of thousands of terms, which is reduced to a vector space with only 200-500 dimensions. In software analysis, the number of documents is much smaller and we reduce the text corpus to 20-50 dimensions. There is a wide range of applications of LSI, such as the automatic assignment of reviewers to submitted conference papers [10], cross-language search engines, spell checkers and many more. In the field of software engineering, LSI has been successfully applied to categorize source files [11] and open-source projects [12], to detect high-level conceptual clones [13], and to recover links between external documentation and source code [14, 15]. Furthermore, LSI has proved useful in psychology to simulate language understanding of the human brain, including processes such as the language acquisition of children.

Figure 1 schematically represents the LSI process. The document collection is modeled as a vector space. Each document is represented by the vector of its term occurrences, where terms are words appearing in the document. The term-document matrix A is a sparse matrix and represents the document vectors on its rows. This matrix is of size n × m, where m is the number of documents and n the total number of terms over all documents. Each entry a_ij is the frequency of term t_i in document d_j. A geometric interpretation of the term-document matrix is a set of document vectors occupying a vector space spanned by the terms. The similarity between documents is typically defined as the cosine or inner product between the corresponding vectors; two documents are considered similar if their corresponding vectors point in the same direction.

Figure 1. LSI takes as input a set of documents and the term occurrences, and returns as output a vector space containing all the terms and all the documents. The similarity between two items (terms or documents) is given by the angle between their corresponding vectors [5].

LSI starts with the term-document matrix as input, weighted by a weighting function to balance out very rare and very common terms. SVD is then used to break the vector space model down into fewer dimensions; this algorithm preserves as much information as possible about the relative distances between the document vectors while collapsing them into a much smaller set of dimensions. SVD decomposes the matrix A into its singular values and singular vectors, and yields, when truncated at the k largest singular values, an approximation A' of A with rank k. Furthermore, not only can the low-rank term-document matrix A' be computed, but also a term-term matrix and a document-document matrix. Thus, LSI allows us to compute term-document, term-term and document-document similarities. As the rank is the number of linearly independent rows and columns of a matrix, the vector space spanned by A' is of dimension k only and much less complex than the initial space. When used for information retrieval, k is typically about 200-500, while n and m may go into millions.
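A minimal sketch of this truncation step is shown below, using the Apache Commons Math library (an assumed dependency; the paper does not name a linear-algebra implementation). The SVD of A is truncated at the k largest singular values to obtain the rank-k approximation A', and the similarity between two documents is then the cosine between the corresponding columns of A'.

    import org.apache.commons.math3.linear.Array2DRowRealMatrix;
    import org.apache.commons.math3.linear.RealMatrix;
    import org.apache.commons.math3.linear.RealVector;
    import org.apache.commons.math3.linear.SingularValueDecomposition;

    /**
     * Sketch of the LSI core: rank-k approximation A' = U_k S_k V_k^T and
     * cosine similarity between document columns of A'. Note that k must
     * not exceed min(number of terms, number of documents).
     */
    public class LsiSketch {

        public static RealMatrix approximate(double[][] termDocMatrix, int k) {
            RealMatrix a = new Array2DRowRealMatrix(termDocMatrix);
            SingularValueDecomposition svd = new SingularValueDecomposition(a);
            // Keep only the k largest singular values and their vectors.
            RealMatrix uk = svd.getU().getSubMatrix(0, a.getRowDimension() - 1, 0, k - 1);
            RealMatrix sk = svd.getS().getSubMatrix(0, k - 1, 0, k - 1);
            RealMatrix vk = svd.getV().getSubMatrix(0, a.getColumnDimension() - 1, 0, k - 1);
            return uk.multiply(sk).multiply(vk.transpose());
        }

        /** sim(d_i, d_j): cosine between the i-th and j-th document columns of A'. */
        public static double similarity(RealMatrix aK, int i, int j) {
            RealVector di = aK.getColumnVector(i);
            RealVector dj = aK.getColumnVector(j);
            return di.dotProduct(dj) / (di.getNorm() * dj.getNorm());
        }
    }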
When used to analyze software, on the other hand, k is typically about 20-50, with vocabulary and documents in the range of thousands only. Since A' is the best approximation of A under the least-square-error criterion, the similarity between documents is preserved, while at the same time semantically related terms are mapped onto one axis of the reduced vector space, thus taking synonymy and polysemy into account. In other words, the initial term-document matrix A is a table of term occurrences, and by breaking it down to far fewer dimensions the latent meaning must appear in A', since there is now much less space to encode the same information. Meaningless occurrence data is transformed into meaningful concept information.

III. TERM AND DOCUMENT SIMILARITY

To show the SVD factors geometrically, the rows of the matrices are taken as coordinates of points representing the documents and terms in a k-dimensional vector space. The nearer one point lies to another, the more similar the corresponding documents or terms are (see Figure 2). Similarity is typically defined as the cosine between the corresponding vectors:

sim(d_i, d_j) = cos(v_i, v_j)

Figure 2: An LSI space with terms and documents; similar elements are placed near each other [5].

Computing the similarity between documents d_i and d_j is done by taking the cosine between the i-th and j-th rows of the matrix. The resulting similarity values range from 1 to 0: 1 for similar vectors pointing in the same direction, and 0 for dissimilar, orthogonal vectors. Theoretically, cosine values can go all the way down to -1, but because there are no negative term occurrences, similarity values never drop below zero.

IV. SEMANTIC CLUSTERING

Semantic clustering is a non-interactive and unsupervised technique to analyze the semantics of a software system. It offers a high-level view of the domain concepts of a system, abstracting concepts from software artifacts. First, Latent Semantic Indexing (LSI) is used to extract linguistic information from the source code; then clustering is applied to group related software artifacts, and groups of artifacts sharing the same vocabulary are identified: these are the clusters. Each cluster thus reveals a different concept of the system. Most of these are domain concepts, some are implementation concepts; the actual ratio depends on the naming conventions of the system. At the end, the inherently unnamed concepts are labelled with terms taken from the vocabulary of the source code: an automatic algorithm labels each cluster with its most similar terms, and in this way provides a human-readable description of the main concepts in a software system. Additionally, the clustering is visualized as a shaded correlation matrix that illustrates:

• the semantic similarity between elements of the system (the darker a dot, the more similar its artifacts);
• a partition of the system into clusters with high semantic cohesion, which reveals groups of software artifacts that implement the same domain concept;
• semantic links between these clusters, which emphasize single software artifacts that interconnect the above domain concepts.

Figure 3: From left to right: unordered correlation matrix, then sorted by similarity, then grouped by clusters, and finally including semantic links [5].
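A minimal sketch of these two steps, assuming the reduced document vectors from the previous sketch: cosine similarities form the correlation matrix, and an off-the-shelf average-linkage clustering (here from scipy) produces an ordering that places similar documents side by side. The threshold value and the use of scipy are our illustrative choices, not the authors'.

import numpy as np
from scipy.cluster.hierarchy import average, fcluster
from scipy.spatial.distance import pdist

def correlation_matrix(doc_vectors):
    """Pairwise cosine similarities between document vectors, clipped to [0, 1]."""
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    v = doc_vectors / np.clip(norms, 1e-12, None)
    # Raw term counts are non-negative, so similarities stay in [0, 1]; reduced
    # LSI vectors can dip slightly below zero, hence the clip.
    return np.clip(v @ v.T, 0.0, 1.0)

def cluster_order(doc_vectors, threshold=0.5):
    """Permutation of document indices grouping similar documents together."""
    dist = pdist(doc_vectors, metric="cosine")
    labels = fcluster(average(dist), t=threshold, criterion="distance")
    return np.argsort(labels)

doc_vecs = np.random.rand(8, 3)       # stand-in for LSI document vectors
sim = correlation_matrix(doc_vecs)
order = cluster_order(doc_vecs)
sorted_matrix = sim[np.ix_(order, order)]   # the "sorted by similarity" view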
V. BUILDING THE TEXT CORPUS

A text corpus is a large and structured set of texts. To build a semantic model, Latent Semantic Indexing (LSI) is used to analyze the distribution of terms over the text corpus. When applying LSI to a software system, we break its source code into documents and use the vocabulary found therein as terms. The system can be split into documents at any level of granularity, such as modules, classes or methods; it is even possible to use entire projects as documents [16]. The vocabulary of the source code can be extracted both from the content of comments and from the identifier names. Comments are parsed as natural-language text, and compound identifier names are split into their parts. As most modern naming conventions use camel case, it is straightforward to split identifiers: for example, FooBar becomes foo and bar. For legacy code that uses other naming conventions, more advanced algorithms and heuristics are required [17, 18]. Common stop words are excluded from the vocabulary, as they do not help to discriminate documents, and a stemming algorithm is used to reduce all words to their morphological root. Finally, the term-document matrix is weighted with tf-idf (term frequency, inverse document frequency) to balance out the influence of very rare and very common terms.
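A minimal sketch of this corpus-building step: splitting camel-case identifiers, dropping stop words, and weighting the matrix with tf-idf. The stop-word list is an illustrative subset, and a real pipeline would add a proper stemmer (e.g. a Porter stemmer) before weighting.

import re

import numpy as np

STOP_WORDS = {"the", "a", "of", "to", "and", "in", "is"}  # illustrative subset

def split_identifier(name):
    """FooBar -> ['foo', 'bar'], parseHTTPResponse -> ['parse', 'http', 'response']."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+", name)
    return [p.lower() for p in parts if p.lower() not in STOP_WORDS]

def tf_idf(A):
    """Weight term-document matrix A with tf * log(m / df), one common tf-idf variant."""
    m = A.shape[1]
    df = np.count_nonzero(A, axis=1)       # number of documents containing each term
    idf = np.log(m / np.maximum(df, 1))
    return A * idf[:, None]

print(split_identifier("FooBar"))              # ['foo', 'bar']
print(split_identifier("parseHTTPResponse"))   # ['parse', 'http', 'response']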
VI. SEMANTIC SIMILARITY AND CORRELATION MATRIX

Semantic similarity is the likeness of meaning or semantic content within a set of documents or terms. Latent Semantic Indexing (LSI) is used to extract linguistic information from the source code. The result of this process is an LSI index L with similarities between software artifacts as well as terms. Based on the index, we can determine the similarity between these elements: software artifacts are more similar if they cover the same concept, and terms are more similar if they denote related concepts. Since similarity is defined as the cosine between element vectors, its values range between 0 and 1.

The similarities between elements are arranged in a square matrix A called the correlation matrix. To visualize the similarity values, we map them to gray values: the darker, the more similar. In this way the matrix becomes a raster graphic of gray dots: each dot a_ij shows the similarity between element d_i and element d_j. The elements are arranged on the diagonal, and the dots off the diagonal show the relationships between them. Without a proper ordering, the correlation matrix looks like a television tuned to a dead channel: an unordered matrix does not reveal any patterns, and an arbitrary ordering, such as by the names of the elements, is generally as useful as a random ordering [19]. Therefore, the matrix is clustered such that similar elements are put near each other and dissimilar elements far apart. After applying the clustering algorithm, similar elements are grouped together and aggregated into concepts; hence, a concept is characterized as a set of elements that use the same vocabulary. Documents that are not related to any concept usually end up in singleton clusters in the middle or the bottom right of the correlation matrix.

The correlation matrices are ordered using the average-linkage clustering algorithm: the matrix is reordered first, and then the dots are grouped by clusters and colored with their average cluster similarity. As with the element similarities above, the similarities between clusters are arranged in a square matrix A. When visualized, this matrix becomes a raster graphic of gray rectangles: each rectangle r_ij shows the similarity between cluster R_i and cluster R_j, and has size (|R_i|, |R_j|). The clusters are arranged on the diagonal, and the rectangles off the diagonal show the relationships between them (see the third matrix in Figure 3). All dots in a cluster are grouped together and colored with their average similarity, which is their semantic cohesion [20]. This offers a high-level view of the system, abstracting from elements to concepts.

VII. DISCRETE CHARACTERIZATION OF CLUSTERS

Visualizing the clusters in three dimensions makes the domain-detection concept much simpler to extend to distributed applications. Just visualizing clusters is not enough, however; labelling is required to describe each cluster. Often, simply enumerating the names of the software artifacts in a cluster gives a sufficient interpretation. If the names are badly chosen, or unnamed software artifacts are analyzed, an automatic way to identify labels is needed. Figure 4 shows the labels for the concepts of the LAN example.

Figure 4: Automatically retrieved labels describe the concepts.

The labels are retrieved by using the documents in a concept cluster as a query to search the LSI space for related terms. To obtain the most relevant labels, the similar terms of the current cluster are compared against the similar terms of all other clusters. The full domain-extraction process from source code (pre-processing, applying LSI, clustering, retrieving the most relevant terms for each cluster, and measuring similarity to identify topics in the source code) follows the flow depicted in Figure 5.

Figure 5: Modified semantic clustering of software source code [5].

VIII. CONCLUSION

When understanding a software system, analyzing its structure reveals only half of the story; the other half resides in the domain semantics of the implementation. Developers put their domain knowledge into identifier names and comments. This work presented the use of semantic clustering to analyze the textual content of source code and recover domain concepts from the code itself [22]. To identify the different concepts in the code, we applied Latent Semantic Indexing (LSI) and clustered the source artifacts according to the vocabulary of identifiers and comments. Each cluster represents a distinct domain concept. LSI is used again to define each concept and to retrieve the most relevant labels for the clusters: for each cluster, the labels are obtained by ranking and filtering the most similar terms [16]. The result of applying LSI is a vector space, based on which we can compute the similarity between either documents or terms.

REFERENCES

[1] A. Abran, P. Bourque, R. Dupuis, L. Tripp, "Guide to the software engineering body of knowledge (ironman version)," Tech. rep., IEEE Computer Society, 2004.
[2] S. Ducasse, M. Lanza, "The class blueprint: Visually supporting the understanding of classes," IEEE Transactions on Software Engineering 31 (1) (2005) 75-90.
[3] Y. S. Maarek, D. M. Berry, G. E. Kaiser, "An information retrieval approach for automatically constructing software libraries," IEEE Transactions on Software Engineering 17 (8) (1991) 800-813.
[4] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, E. Merlo, "Recovering traceability links between code and documentation," IEEE Transactions on Software Engineering 28 (10) (2002) 970-983.
[5] A. Kuhn, S. Ducasse, T. Girba, "Semantic Clustering: Identifying Topics in Source Code," Language and Software Evolution Group, LISTIC, Université de Savoie, France, 2006.
[6] Y. S. Maarek, D. M. Berry, and G. E. Kaiser, "An information retrieval approach for automatically constructing software libraries," IEEE Transactions on Software Engineering, 17(8):800-813, August 1991.
[7] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, "Recovering traceability links between code and documentation," IEEE Transactions on Software Engineering, 28(10):970-983, 2002.
[8] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, R. A. Harshman, "Indexing by latent semantic analysis," Journal of the American Society of Information Science 41 (6) (1990) 391-407.
[9] A. Marcus, A. Sergeyev, V. Rajlich, J. Maletic, "An information retrieval approach to concept location in source code," in: Proceedings of the 11th Working Conference on Reverse Engineering (WCRE 2004), 2004, pp. 214-223.
[10] S. T. Dumais, J. Nielsen, "Automating the assignment of submitted manuscripts to reviewers," in: Research and Development in Information Retrieval, 1992, pp. 233-244.
[11] J. I. Maletic, A. Marcus, "Using latent semantic analysis to identify similarities in source code to support program understanding," in: Proceedings of the 12th International Conference on Tools with Artificial Intelligence (ICTAI 2000), 2000, pp. 46-53.
[12] S. Kawaguchi, P. K. Garg, M. Matsushita, K. Inoue, "MUDABlue: An automatic categorization system for open source repositories," in: Proceedings of the 11th Asia-Pacific Software Engineering Conference (APSEC 2004), 2004, pp. 184-193.
[13] A. Marcus, J. I. Maletic, "Identification of high-level concept clones in source code," in: Proceedings of the 16th International Conference on Automated Software Engineering (ASE 2001), 2001, pp. 107-114.
[14] A. De Lucia, F. Fasano, R. Oliveto, G. Tortora, "Enhancing an artefact management system with traceability recovery features," in: Proceedings of the 20th IEEE International Conference on Software Maintenance (ICSM 2004), 2004, pp. 306-315.
[15] A. Marcus, D. Poshyvanyk, "The conceptual cohesion of classes," in: Proceedings of the International Conference on Software Maintenance (ICSM 2005), IEEE Computer Society Press, Los Alamitos CA, 2005, pp. 133-142.
[16] A. Kuhn, S. Ducasse, and T. Girba, "Semantic clustering: Exploiting source code linguistic information," Information and Software Technology, submitted, 2006.
[17] B. Caprile and P. Tonella, "Nomen est omen: Analyzing the language of function identifiers," in: Proceedings of the 6th Working Conference on Reverse Engineering (WCRE 1999), pp. 112-122, IEEE Computer Society Press, 1999.
[18] N. Anquetil and T. Lethbridge, "Extracting concepts from file names; a new file clustering criterion," in: International Conference on Software Engineering (ICSE '98), pp. 84-93, 1998.
[19] J. Bertin, "Graphics and Graphic Information Processing," Walter de Gruyter, 1981.
[20] A. Marcus and D. Poshyvanyk, "The conceptual cohesion of classes," in: Proceedings of the International Conference on Software Maintenance (ICSM 2005), pp. 133-142, Los Alamitos CA, 2005, IEEE Computer Society Press.
[21] M. W. Berry, S. T. Dumais, and G. W. O'Brien, "Using linear algebra for intelligent information retrieval," SIAM Review, 37(4):573-597, 1995.
[22] A. Kuhn, S. Ducasse, and T. Girba, "Enriching reverse engineering with semantic clustering," in: Proceedings of the Working Conference on Reverse Engineering (WCRE 2005), pp. 113-122, Los Alamitos CA, November 2005, IEEE Computer Society Press.

Sanjay Madan has been working as a Software Engineer at Comviva Technologies Ltd, Gurgaon, since 2009. He completed his postgraduate studies at Thapar University, Patiala, and has worked on more than six professional and research projects. He is the author or co-author of four publications in international conferences and journals. His research interests include Web semantics and machine learning, particularly semantic clustering and classification. He has taught courses on Data Structures, Web Technologies and Computer Graphics.

Shalini Batra has been working as an Assistant Professor in the Computer Science and Engineering Department, Thapar University, Patiala, since 2002. She completed her postgraduate studies at BITS, Pilani, and is pursuing a Ph.D. at Thapar University in the area of semantics and machine learning. She has guided fifteen ME theses and is presently guiding four. She is the author or co-author of more than twenty-five publications in national and international conferences and journals. Her areas of interest include Web semantics and machine learning, particularly semantic clustering and classification. She teaches courses on Compiler Construction, Theory of Computation, and Parallel and Distributed Computing.

GRAAA: Grid Resource Allocation Based on Ant Algorithm

Manpreet Singh
Department of Computer Engineering, M. M. Engineering College, M. M. University, Mullana, Haryana, India
Email: [email protected]

Abstract— Selecting the appropriate resources for a particular task is one of the major challenges in a computational grid. The main objective of resource allocation in a grid is the effective scheduling of tasks and, in turn, the reduction of execution time. Hence, resource allocation must consider specific characteristics of the resources and tasks, and then decide accordingly which metrics to use. The ant algorithm is a heuristic algorithm that suits allocation and scheduling in a grid environment well. In this paper, Grid Resource Allocation based on an Ant Algorithm (GRAAA) is proposed. The simulation results show that the proposed algorithm is capable of producing high-quality allocations of grid resources to tasks.

Index Terms— Resource Allocation, Task Scheduling, Ant System, Grid

I. INTRODUCTION

Resource allocation and task scheduling are fundamental issues in achieving high performance in grid computing systems. However, the design and implementation of efficient allocation and scheduling algorithms is a big challenge.
Unlike scheduling problems in conventional distributed systems, this problem is much more complex, as new features of grid systems, such as their dynamic nature and the high degree of heterogeneity of jobs and resources, must be tackled [1]. By harmonizing and distributing the grid resources efficiently, an advanced resource allocation strategy can greatly reduce total run time and total expense and bring optimal performance [2][3].

The ant algorithm is a random search algorithm, like other evolutionary algorithms [4]. It is a model-based bionic approach with its own transition and pheromone-updating rules, inspired by the self-reinforcing foraging behavior exhibited by ant societies. It is an algorithm for solving NP-hard combinatorial optimization problems, such as the TSP (Traveling Salesman Problem) [3], and was later used for the JSP (Job-shop Scheduling Problem) [5][6], the QAP (Quadratic Assignment Problem), and so on [7]. The motivation of this paper is to develop a grid resource allocation algorithm that performs efficiently and effectively in terms of minimizing total execution time and cost. Not only does it improve the overall performance of the system, but it also adapts to the dynamic grid system. First, this paper proposes a Resource Oriented Ant Algorithm (ROAA) to find the optimal allocation of each resource within the dynamic grid system. Second, a simulation of the proposed algorithm using GridSim is presented.

II. RELATED WORK

Recently, many researchers have studied allocation and scheduling in grid environments. Some of the popular heuristic algorithms that have been developed are Min-Min [8], Fast Greedy [8], Tabu Search [8] and the Ant System [9]. The Max-Min Ant System (MMAS) [10] limits the pheromone to lie between a lower bound (Min) and an upper bound (Max) to keep the ants from converging too soon in some ranges. The authors of [11] use multiple kinds of ants to find multiple optimal paths for network routing; the idea can be applied to find multiple available resources in order to balance resource utilization in job scheduling. In [12], the scalability of the ant algorithm is validated, and a simple grid simulation architecture and a design of an ant algorithm suitable for grid task scheduling are proposed.

III. GRID RESOURCE ALLOCATION BASED ON ANT ALGORITHM (GRAAA)

GRAAA is a resource allocation framework comprising users, a resource broker, resources, and Grid Information Services (GIS). It adopts an ant colony as its major allocation strategy, as shown in Fig. 1.

Figure 1: System Model (the user submits tasks to a resource broker containing a Task Agent and a Resource Discovery Agent running the ant allocation algorithm; resources R1 ... RN register with the GIS and exchange availability information and task execution results with the broker).

The interaction among the various entities of the system model is as follows:
Step 1: Resources register with the GIS.
Step 2: The user submits a task with its complete specification to the resource broker through the grid portal.
Step 3: The Task Agent (TA) places all submitted tasks in a task set and activates the Resource Discovery Agent (RDA).
Step 4: The RDA queries the GIS about resources.
Step 5: The GIS returns the static attributes of the resources, such as the number of machines, the number of processing elements (PE), the MIPS (Million Instructions Per Second) rating of each PE, and the allocation policy.
Step 6: The RDA queries the registered resources for their availability status.
Step 7: The RDA gets the status information and makes it available to the TA.
Step 8: The TA, by deploying the ant algorithm, selects a resource for the next task assignment and dispatches the task to the selected resource through the RDA.
Step 9: After task execution, the results are received from the resources and returned to the user by the TA.

IV. RESOURCE ORIENTED ANT ALGORITHM (ROAA)

The ant algorithm [3] is inspired by an analogy with the real-life behavior of a colony of ants looking for food, and is an effective algorithm for the solution of many combinatorial optimization problems. Investigations show that ants have the ability to find an optimal path from nest to food. As ants move, they lay pheromone on the ground. While an isolated ant moves essentially at random, an ant encountering a previously laid trail can detect it and decide with high probability to follow it, thus reinforcing the trail with its own pheromone. The probability that an ant chooses a path is proportional to the concentration of the path's pheromone: the more ants choose a path, the denser its pheromone becomes, and the denser pheromone attracts more ants. Through this positive feedback mechanism, the ants can finally find an optimal path.

In ROAA, the pheromone is associated with resources rather than paths. The increase or decrease of pheromone depends on the task status at the resources. The main objective of the algorithm is the reduction of total cost and execution time. Let P be the number of tasks (ants) in the task set T maintained by the task agent, and Q the number of registered resources. When a new resource R_i is registered with the GIS, it initializes its pheromone as

τ_i(0) = N × M,

where N is the number of processing elements and M is the MIPS rating of a processing element. Whenever a new task is assigned to, or some task is returned from, R_i, the pheromone of R_i is updated as

τ_i(new) = ρ · τ_i(old) + Δτ_i,

where Δτ_i is the pheromone variation and ρ, 0 < ρ < 1, is a pheromone decay parameter. When a task is assigned to R_i, its pheromone is reduced, i.e. Δτ_i = −C, where C represents the computational complexity of the assigned task. When a task returns successfully from R_i, Δτ_i = Φ · C, where Φ is the encouragement argument. On the other hand, if a task failure occurs at R_i, Δτ_i = Θ · C, where Θ is the punishment argument. Clearly, pheromone increases when task execution at a resource is successful. The probability of assigning the next task to resource R_j is computed as

p_j(t) = [τ_j(t)]^α · [η_j]^β / Σ_r [τ_r(t)]^α · [η_r]^β,    j, r ∈ available resources,

where τ_j(t) denotes the current pheromone of resource R_j and η_j represents the initial pheromone of R_j, i.e. η_j = τ_j(0). α is the parameter weighting the relative importance of the pheromone trail intensity, and β is the parameter weighting the relative importance of the initial performance attributes. The process of the resource oriented ant algorithm is shown below.

Procedure Resource_Ant_Algorithm
Begin
  Initialize parameters and set pheromone trails.
  While (task set T ≠ ∅) do
  Begin
    Select the next task t from T.
    Determine the next resource R_i for task assignment as the one with the highest transition probability (highest pheromone intensity) among all resources, i.e. p_i(t) = max_{l ∈ Q} p_l(t).
    Schedule task t on R_i and remove it from T, i.e. T = T − {t}.
    If (any task completion or failure occurs) then
      Update the pheromone intensity of the corresponding resource and the transition probabilities of all registered resources.
  End
End
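The following is a minimal sketch of the pheromone rules and the resource-selection step above. The Resource class and helper names are illustrative stand-ins, not the authors' implementation; the constants mirror the paper's symbols and the simulation values reported later. Since the normalizing denominator of p_j(t) is the same for every resource, the argmax can be taken over the unnormalized weights.

import random

RHO, ALPHA, BETA = 0.9, 0.5, 0.5   # decay and weighting parameters
PHI, THETA = 1.1, 0.8              # encouragement and punishment arguments

class Resource:
    def __init__(self, n_pe, mips):
        self.eta = n_pe * mips        # initial pheromone: tau_i(0) = N x M
        self.tau = float(self.eta)

    def update(self, delta):
        # tau_new = rho * tau_old + delta_tau
        self.tau = RHO * self.tau + delta

def select_resource(resources):
    """Pick the resource with the highest transition probability p_i(t)."""
    weights = [(r.tau ** ALPHA) * (r.eta ** BETA) for r in resources]
    return max(range(len(resources)), key=lambda i: weights[i])

resources = [Resource(n_pe=random.randint(4, 16), mips=random.randint(300, 900))
             for _ in range(10)]
i = select_resource(resources)
C = 12000                       # computational complexity of the task (in MI)
resources[i].update(-C)         # assignment reduces pheromone
resources[i].update(PHI * C)    # successful completion increases it again

Note how the constants make the rules coherent: a successful task returns Φ·C = 1.1·C of pheromone against the C removed at assignment (a net gain), while a failed task returns only Θ·C = 0.8·C (a net loss).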
V. SIMULATION RESULTS

We analyze ROAA using the GridSim simulator [13]. The resources and tasks used in the simulation are modeled as shown in Table 1. The proposed algorithm is compared with the algorithm already used in GridSim, which selects the next resource for a task assignment in a random fashion (RandomAlgorithm). The other simulation parameters are: ρ = 0.9, α = 0.5, β = 0.5, Φ = 1.1, Θ = 0.8. In our simulation, we use 10 heterogeneous grid resources and run the simulation at five levels of workload: 50, 100, 150, 200 and 250 tasks.

Table 1: Simulation Parameters
  Number of Resources: 10
  Number of PE per Resource: 4-16
  MIPS of PE: 300-900
  Resource Cost: 9 G$
  Total Number of Tasks (Ants): 50-250
  Length of Task: 10000-18000 MI (Million Instructions)

Figure 2: Comparison of Total Cost (cost in G$ versus number of tasks, for ROAA and RandomAlgorithm).

Figure 3: Comparison of Total Execution Time (execution time in hours versus number of tasks, for ROAA and RandomAlgorithm).

Each task is submitted to the grid system randomly. Fig. 2 and Fig. 3 show that the system using ROAA outperforms the system using RandomAlgorithm in terms of both execution time and cost.

VI. CONCLUSION

In this paper, we described grid resource allocation using an ant algorithm. The results of the experiments were presented and the strengths of the algorithm investigated. The simulation results demonstrate that the ROAA algorithm increases performance in terms of reduction in total execution time and cost. In future work, we plan to add ant-level load balancing, in addition to implementing this mechanism in a more realistic environment.

REFERENCES

[1] I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of High Performance Computing Applications, vol. 15(3), pp. 200-222, 2001.
[2] C. Chapman, M. Musolesi, W. Emmerich, and C. Mascolo, "Predictive Resource Scheduling in Computational Grids," IEEE Parallel and Distributed Processing Symposium (IPDPS 2007), pp. 1-10, 26-30 March 2007.
[3] K. Krauter, R. Buyya, and M. Maheswaran, "A Taxonomy and Survey of Grid Resource Management Systems for Distributed Computing," Software: Practice and Experience (SPE) Journal, Wiley Press, USA, vol. 32(2), pp. 135-164, 2002.
[4] M. Dorigo and L. M. Gambardella, "Ant colony system: a cooperative learning approach to the traveling salesman problem," IEEE Transactions on Evolutionary Computation, vol. 1(1), pp. 53-66, 1997.
[5] A. Colorni, M. Dorigo, and V. Maniezzo, "Ant colony system for job-shop scheduling," Belgian Journal of Operations Research, Statistics and Computer Science, vol. 34(1), pp. 39-53, 1999.
[6] A. Lorpunmanee, M. N. Sap, A. H. Abdullah, and C. Chompoo-inwai, "An Ant Colony Optimization for Dynamic Job Scheduling in Grid Environment," International Journal of Computer and Information Science and Engineering, vol. 1(4), pp. 207-214, 2007.
[7] V. Maniezzo and A. Colorni, "The Ant System Applied to the Quadratic Assignment Problem," IEEE Transactions on Knowledge and Data Engineering, vol. 11(5), pp. 769-778, 1999.
[8] T. D. Braun, H. J. Siegel, N. Beck, L. L. Bölöni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, D. Hensgen, and R. F. Freund, "A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems," Journal of Parallel and Distributed Computing, vol. 61(6), pp. 810-837, 2001.
[9] Li, X. Peng, Z. Wang, and Y. Liu, "Scheduling Interrelated Tasks in Grid Based on Ant Algorithm," Journal of System Simulation, 2007.
[10] T. Stutzle, "MAX-MIN Ant System for Quadratic Assignment Problems," Technical Report AIDA-97-04, Intellectics Group, Department of Computer Science, Darmstadt University of Technology, Germany, July 1997.
[11] K. M. Sim and W. H. Sun, "Multiple Ant-Colony Optimization for Network Routing," Proceedings of the First International Symposium on Cyber Worlds, pp. 277-281, 6-8 Nov. 2002.
[12] Z. Xu, X. Hou, and J. Sun, "Ant Algorithm-Based Task Scheduling in Grid Computing," Proceedings of the IEEE Conference on Electrical and Computer Engineering, pp. 1107-1110, 2003.
[13] R. Buyya and M. Murshed, "GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource Management and Scheduling for Grid Computing," Concurrency & Computation: Practice & Experience, vol. 14, pp. 1175-1220, 2002.

A Channel Allocation Algorithm for Hot-Spot Cells in Wireless Networks

Rana Ejaz Ahmed
College of Engineering, American University of Sharjah, Sharjah, United Arab Emirates
Email: [email protected]

Abstract— Recent growth in mobile telephone traffic in wireless cellular networks, along with the limited number of channels available, presents a challenge for the efficient reuse of channels. The channel allocation problem becomes more complicated if one or more cells in the network become "hot-spots" for some period of time, i.e., the bandwidth resources currently available in those cells are not sufficient to sustain the needs of the current users in those cells. This paper presents a new hybrid channel allocation algorithm in which the base station sends a multi-level "hot-spot" notification to the central pool located at the Mobile Switching Center (MSC) on each channel request that cannot be satisfied locally at the base station. This notification requests that more than one channel be assigned to the requesting cell, proportional to the current hot-spot level of the cell. When a call using such a "borrowed" channel terminates, the cell may retain the channel depending upon its current hot-spot level. A simulation study of the protocol indicates that the protocol has low overhead, and that it behaves similarly to the Fixed Channel Allocation (FCA) scheme at high traffic loads and to the Dynamic Channel Allocation (DCA) scheme at low traffic loads. The proposed algorithm also offers low overhead in terms of the number of control messages exchanged between a base station and the MSC in the channel acquisition and release phases.

Index Terms— Cellular network architectures, Channel allocation schemes, Hot-spot cell design, Network architecture for ubiquitous computing.

I. INTRODUCTION

Recent growth in mobile telephone traffic in wireless cellular networks, along with the limited number of radio frequency channels available in the network, calls for the efficient reuse of channels.
An efficient channel allocation strategy should exploit the principle of frequency reuse to increase the availability of channels and thus support the maximum possible number of calls at any given time. A given frequency channel cannot be used at the same time by two cells in the system if they are within a distance called the minimum channel reuse distance, because doing so would cause radio interference (also known as co-channel interference).

Several channel allocation schemes have been proposed [1-4] in the literature, and they can be divided into three major categories: Fixed Channel Allocation (FCA), Dynamic Channel Allocation (DCA), and Hybrid Channel Allocation (HCA). In FCA schemes, a fixed number of channels is assigned to each cell according to predetermined traffic demand and co-channel interference constraints. FCA schemes are very simple; however, they are inflexible, as they do not adapt to changing traffic conditions and user distribution. To overcome these deficiencies of FCA schemes, DCA schemes have been introduced. In DCA schemes, channels are placed in a pool (usually centralized at the Mobile Switching Center (MSC) or distributed among the various base stations) and are assigned to new calls as needed. Any cell can use a channel as long as the interference constraints are satisfied, and after the call is over, the channel is returned to the central pool. At the cost of higher complexity and control message overhead, DCA provides flexibility and traffic adaptability. However, DCA schemes are less efficient than FCA under high load conditions [2], mainly due to the high overhead involved in exchanging control messages. To improve performance, some DCA schemes use channel reassignment, where on-going calls may be switched, when possible, to reduce the distance between co-channel cells [1,2]. Another type of DCA strategy involves borrowing channels from neighboring cells. In such a scheme, channels are assigned to each cell as in FCA; however, when a call request finds all such channels busy, a channel may be borrowed from a neighboring cell if the borrowing does not violate the co-channel interference constraints [1-5]. A generic mathematical theory for the load balancing problem in cellular networks is described in [6].

HCA techniques are designed by combining FCA and DCA schemes in an effort to take advantage of both. In HCA, channels are divided into two disjoint sets: one set of channels is assigned to each cell on an FCA basis (the fixed set), while the others are kept in a central pool for dynamic assignment (the dynamic set). The fixed set contains a number of channels that are assigned to cells as in the FCA schemes, and such channels are preferred for use in their respective cells. When a mobile host needs a channel for its call and all the channels in its fixed set are busy, only then is a request made from the dynamic set. The ratio of the number of fixed to dynamic channels plays an important role: it has been found that if the ratio is 50% or more, FCA performs better than HCA. The HCA techniques proposed in the literature are complex to implement, and they suffer from the large control overhead incurred by system state collection and dissemination.

The channel allocation problem becomes even more challenging when one or more cells in the network become "hot-spots" for some duration of time.
A cell becomes a "hot-spot" when the traffic generated in that cell far exceeds its normal traffic load for that particular hour. An example of a "hot-spot" cell could be the area covered by a football stadium for the duration of a popular game. The HCA techniques reported in the literature do not offer proactive strategies for the case in which a cell in the system will become a "hot-spot" in the very near future. This paper presents a new HCA scheme that takes into account the level of traffic intensity in a cell, in terms of a "hot-spot" signal, when the cell becomes a "hot-spot". The proposed scheme is simple to implement and offers low overhead in terms of the number of control messages exchanged between the base station and the MSC in the channel acquisition and release phases.

II. BASELINE SYSTEM ARCHITECTURE

A. System Model and Definitions

We consider a wireless cellular network where the geographic area served by the system is divided into several hexagonal cells. Each cell is served by a base station (also called the Mobile Service Station, MSS), usually located at the center of the cell. The base stations are, in general, connected with one another through a fixed wired (or wireless) network. A mobile host can communicate directly only with the base station in its cell. When a mobile host wants to set up a call, it sends a request to the base station in its cell on the control channel. The call can be set up only if a channel is assigned to support the communication between the mobile host and the base station. No two cells in the system can use the same channel at the same time if they are within the minimum channel reuse distance; otherwise, channel interference will occur. It is assumed that a base station can keep a count of the number of calls originated (successful or unsuccessful) in its cell over a given period of time. This count helps the base station determine its present "hot-spot" level and send a multi-level "hot-spot" notification to its Mobile Switching Center (MSC).

The system uses a hybrid channel allocation scheme where the total set of C channels is divided into two disjoint sets, F and D. The set F contains the channels for fixed (static) assignment, while the set D contains the channels for dynamic assignment, i.e., C = F ∪ D. Moreover, each base station maintains a temporary pool (called T) to retain channels originally transferred from the dynamic assignment pool at the MSC. The system uses a frequency reuse factor N. The fixed channels are assigned to a cell statically as in FCA, while the dynamic channels are kept in a centralized pool at the MSC. Let r be the ratio of the number of dynamic channels to the total number of channels available in the system, i.e.,

r = |D| / |C|

The ratio r remains fixed in the system (i.e., it does not change dynamically over time). The value of r is a design parameter, and it depends on the designer's view of the heterogeneity (or difference) in traffic volumes across the cells in the network. For example, if most of the cells in the network have a chance of becoming "hot-spots", then it is a good idea to keep the ratio r > 0.5.

The "hot-spot" notification level is an integer-valued number L, such that L ∈ {0, 1, 2, ..., M}, where M represents a pre-defined maximum level supported by the system. The value L represents the fact that up to L borrowed channels can be retained by the base station after a call on a borrowed channel in that cell terminates. The hybrid channel allocation algorithm (described next) uses the appropriate value of L in several of its steps.
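A minimal sketch of this static partition, under the paper's stated assumptions (300 channels in total, a fixed ratio r); the helper name and the contiguous numbering of channels are our illustrative choices.

def split_channels(total=300, r=0.5):
    """Split the channel set C into a fixed set F and a dynamic set D with r = |D|/|C|."""
    assert 0.0 <= r <= 1.0
    n_dynamic = int(total * r)
    channels = list(range(total))
    D = set(channels[:n_dynamic])    # central dynamic pool kept at the MSC
    F = set(channels[n_dynamic:])    # statically assigned across cells, FCA-style
    return F, D

F, D = split_channels(total=300, r=0.5)
print(len(F), len(D))  # 150 150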
B. Hybrid Channel Allocation Algorithm

The proposed hybrid channel allocation algorithm is described in two phases: a channel acquisition phase and a channel release phase. The steps taken by the mobile host, the base station and the MSC are outlined below (a code sketch of both phases follows the release phase). L is set to 0 at the beginning, indicating that, at present, channel requests can be accommodated from the fixed (static) list assigned to the cell.

Channel Acquisition Phase

The following steps are taken on the mobile host / base station side during the channel acquisition phase:

1. When a mobile host wants to initiate a call, it sends a channel request on the control channel to its base station.
2. If the base station has an available channel in its current fixed channel list (i.e., set F), it assigns that channel to the mobile host, and the channel acquisition phase terminates.
3. If no channel from the cell's fixed list is available, the base station updates the value of L as L = min(L + 1, M), so that L never exceeds the pre-defined maximum level M.
4. The base station then sends a request to borrow a channel from the central pool located at the MSC, including the current value of L in the channel request.
5. When the base station successfully acquires channels from the dynamic pool at the MSC, it adds them to its temporary pool (T).

The following steps are taken on the MSC side during the channel acquisition phase:

1. On receiving a channel request from a base station, the MSC assigns up to L channels, if available, from the dynamic assignment pool to the requesting base station (even if the call generated by the mobile host needs only one channel), and the channel acquisition phase terminates. The main reason for assigning up to L channels (instead of only one) is proactive: the "borrowing" event indicates that the probability of the requesting cell becoming a "hot-spot" may be rising, and an assignment of several channels with one request involves definitely less overhead (in terms of control messages exchanged) than several single-channel requests.
2. If the MSC cannot assign even one channel, the call is blocked and the channel acquisition phase terminates.

Channel Release Phase

The following steps are taken on the mobile host, base station, and MSC sides during the channel release phase:

1. When a call on a channel ci terminates at a mobile host, the base station determines which type of channel the call used. If the channel belongs to the fixed (static) pool maintained at the base station, the channel is returned to that pool and the channel release phase terminates.
2. If, however, the channel ci being returned belongs to the dynamic pool at the MSC, the base station estimates the current "hot-spot" level, h, of the cell.
3. If h is less than or equal to the old value of level L, meaning that the congestion in the cell is steady or easing, the base station checks its temporary pool (T) and retains only up to h channels, chosen in random order; all the remaining channels (if any) in T are returned to the pool at the MSC.
4. If h is greater than the old value of level L (i.e., h > L), meaning that the congestion in the cell is getting worse, the channel ci is retained in the cell and returned to the base station's temporary pool (T); the channel ci is not returned to the MSC at this time.
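The following is a hedged sketch of the two phases from the base station's point of view. The MSC and BaseStation classes are illustrative stand-ins, not the paper's implementation; we additionally let the base station serve new requests from its temporary pool before contacting the MSC, which the paper implies but does not spell out, and the estimation of the hot-spot level h is left to the caller.

M = 8  # maximum hot-spot level supported by the system

class MSC:
    def __init__(self, dynamic_channels):
        self.D = set(dynamic_channels)           # central dynamic pool

    def request(self, L):
        """Bulk grant: up to L channels per single control message."""
        return {self.D.pop() for _ in range(min(L, len(self.D)))}

    def give_back(self, ch):
        self.D.add(ch)

class BaseStation:
    def __init__(self, fixed_channels, msc):
        self.F = set(fixed_channels)             # fixed (static) channels
        self.T = set()                           # temporary pool of borrowed channels
        self.L = 0                               # current hot-spot notification level
        self.msc = msc

    def acquire(self):
        if self.F:
            return self.F.pop()                  # step 2: satisfied locally
        if not self.T:
            self.L = min(self.L + 1, M)          # step 3: escalate, capped at M
            self.T |= self.msc.request(self.L)   # step 4: one message, bulk grant
        return self.T.pop() if self.T else None  # None means the call is blocked

    def release(self, ch, was_fixed, h):
        """h is the currently estimated hot-spot level of the cell."""
        if was_fixed:
            self.F.add(ch)                       # release step 1
            return
        self.T.add(ch)
        if h <= self.L:                          # release step 3: congestion easing
            while len(self.T) > h:
                self.msc.give_back(self.T.pop())
        # release step 4: when h > L the borrowed channel is simply kept in T

msc = MSC(range(100, 130))
bs = BaseStation(fixed_channels=range(3), msc=msc)
ch = bs.acquire()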
III. PERFORMANCE EVALUATION

Several metrics can be used to evaluate and compare the performance of the proposed algorithm against existing ones. In this paper, we consider the following metrics: the call blocking (denial) probability, the average number of control messages sent from the base station to the MSC to acquire one channel from the dynamic channel pool, and the average number of control messages sent from the base station to the MSC to release one channel back to the centralized pool. The call blocking probability is defined as the ratio of the number of new calls initiated by mobile hosts that cannot be supported by the existing channel arrangement to the total number of new calls initiated; i.e., the probability that a call arriving in a cell finds both the fixed and the dynamic channels busy.

In the classical hybrid channel allocation schemes reported in the literature, one control message is exchanged between the base station and the MSC to acquire exactly one channel from the dynamic pool resident at the MSC, and similar comments apply to the channel release phase. The proposed algorithm, however, makes "bulk" acquisitions and "bulk" releases of channels per request, depending upon the hot-spot level. The number of control messages needed by the proposed algorithm is therefore expected to be low compared to the classical hybrid allocation schemes.

A. Simulation Parameters

The simulation parameters used in this paper are quite similar to those used in [3]. The simulated cellular network consists of a 2D structure of 6 x 6 hexagonal cells, with each cell having six neighbors. There are 300 channels in total in the system, and a frequency reuse factor of 3 is assumed (i.e., N = 3). The arrival of calls in any cell is assumed to be a Poisson process, and the call duration is exponentially distributed with a mean of 3 minutes. A cell can be in one of two states at any time: normal or "hot-spot". The mean call arrival rate in a normal (i.e., non "hot-spot") cell is λ calls per minute; when a cell is in the "hot-spot" state, the call arrival rate in the cell is assumed to be 3λ calls per minute. The mean rate (per minute, uniformly distributed) of changing from the normal to the "hot-spot" state is 1/30, while the mean rate (per minute, uniformly distributed) of changing from the "hot-spot" state back to normal is 1/3. Note that a cell can also become a "hot-spot" through short-term traffic fluctuations, even if the cell is not explicitly forced into the "hot-spot" state as mentioned above. The ratio r (the ratio of the number of dynamic channels to the total number of channels available in the system) is varied from 0.1 to 0.8. The system load (or traffic intensity) is defined with respect to the arrival rates and the call service rate in a cell that can switch back and forth between the "normal" and "hot-spot" states. The value of M (the maximum "hot-spot" level supported by the system) is varied from 4 to 8.

B. Simulation Results

The proposed algorithm was simulated using a discrete-event simulation model in C++, and the call blocking probability and the number of control messages exchanged were studied under the various system parameters discussed in Section III-A.
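A minimal sketch of the traffic model just described (in Python rather than the C++ used in the paper): Poisson arrivals drawn as exponential inter-arrival times, exponentially distributed call durations with a 3-minute mean, and a tripled arrival rate while the cell is in the hot-spot state. The function names are ours.

import random

MEAN_DURATION = 3.0                      # minutes
TO_HOT, TO_NORMAL = 1 / 30, 1 / 3        # mean state-change rates per minute

def inter_arrival_time(lam, hot):
    """Time until the next call arrives in one cell (lam in calls/minute)."""
    return random.expovariate(3 * lam if hot else lam)

def call_duration():
    return random.expovariate(1.0 / MEAN_DURATION)

def time_in_state(hot):
    """How long the cell stays in its current state before switching."""
    return random.expovariate(TO_NORMAL if hot else TO_HOT)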
Figure 1 shows the blocking probabilities of the proposed algorithm for various values of r with M = 4, while Figure 2 shows the results for various values of r with M = 8. The results are also compared with the static Fixed Channel Assignment (FCA) algorithm. Figure 3 shows the average number of control messages sent per dynamic channel acquired from the pool at the MSC, while Figure 4 shows the average number of control messages sent per dynamic channel returned to the centralized pool at the MSC.

Figure 1: Blocking probability versus system load of the proposed algorithm for r = 0.1, 0.5 and 0.8 with M = 4, compared with FCA.

Figure 2: Blocking probability versus system load of the proposed algorithm for r = 0.1, 0.5 and 0.8 with M = 8, compared with FCA.

C. Comparisons and Discussion

The main advantage of the proposed protocol is that it can adapt from a dynamic strategy at low traffic loads to a static strategy (FCA) at higher traffic loads. This is verified by simulation, as shown in Figures 1 and 2. When M = 4 (Figure 1), the proposed strategy is better than FCA at system loads below about 0.8, and higher values of r give better results in this region. At higher system loads, the performance of the proposed algorithm approaches that of FCA, and lower values of r give better performance (closer to FCA) in this region. If we increase the maximum "hot-spot" level M, the system performance generally improves in both the low and the high load regions; this can be seen in Figure 2. The main reason for this improvement is that at higher traffic loads, more channels are available and retained in a "hot-spot" cell.

Figures 3 and 4 show the average number of control messages exchanged per dynamic channel acquired from (or returned to) the central pool at the MSC. It should be noted that in all classical HCA (or DCA) algorithms, one control message must be sent on each channel acquisition or release event for every channel request from the base station. The proposed algorithm offers very low overhead in terms of control message exchanges, as "bulk" channel acquisitions and releases are done through a single control message. Higher values of the ratio r offer even lower overhead, especially at higher traffic intensities. This is because most of the channels are then available in the central pool at the MSC for dynamic channel assignment, and channel requests are likely to be fully fulfilled at higher traffic intensities.

IV. CONCLUSIONS

This paper presents a new hybrid channel allocation algorithm that sends a multi-level "hot-spot" notification to the central pool on each channel request that cannot be satisfied locally at the base station. This notification requests that more than one channel be assigned to the requesting cell, proportional to the current hot-spot level of the cell; this also reduces the control message overhead needed to acquire each channel individually. When a call using such a "borrowed" channel terminates, the cell may retain the channel depending upon its current hot-spot level.
The simulation study of the protocol indicates that the protocol has low overhead, and that it behaves similarly to FCA at high traffic loads and to DCA at low traffic loads.

Figure 3 (a): Average number of control messages per dynamic channel acquired, for M = 4.

Figure 3 (b): Average number of control messages per dynamic channel acquired, for M = 8.

Figure 4: Average number of control messages per dynamic channel returned, for M = 8.

REFERENCES

[1] I. Katzela and M. Naghshineh, "Channel Assignment Schemes for Cellular Mobile Telecommunication Systems: A Comprehensive Survey," IEEE Personal Communications, vol. 3, no. 3, June 1996.
[2] K. L. Yeung and T. P. Yum, "Compact Pattern Based Dynamic Channel Assignment for Cellular Mobile Systems," IEEE Transactions on Vehicular Technology, vol. 43, no. 4, November 1994, pp. 892-896.
[3] J. Yang, et al., "A Fault-Tolerant Distributed Channel Allocation Scheme for Cellular Networks," IEEE Transactions on Computers, vol. 54, no. 5, May 2005, pp. 616-629.
[4] R. Prakash, N. Shivaratri, and M. Singhal, "Distributed Dynamic Fault-Tolerant Channel Allocation for Cellular Networks," IEEE Transactions on Vehicular Technology, vol. 48, no. 6, November 1999, pp. 1874-1888.
[5] J. Yang and D. Manivannan, "An Efficient Fault-Tolerant Distributed Channel Allocation Algorithm for Cellular Networks," IEEE Transactions on Mobile Computing, vol. 4, no. 6, Nov./Dec. 2005, pp. 578-587.
[6] O. Tonguz and E. Yanmaz, "The Mathematical Theory of Dynamic Load Balancing in Cellular Networks," IEEE Transactions on Mobile Computing, vol. 7, no. 12, December 2008, pp. 1504-1518.

JPEG Compression Steganography & Cryptography Using Image-Adaptation Technique

Meenu Kumari
BVUCOE/IT Dept, Pune, India
Email: [email protected]

Prof. A. Khare and Pallavi Khare
BVUCOE/IT Dept, Pune, India; SSSIST/E&TC Dept, Bhopal, India
Email: [email protected]

Abstract—In today's world, security is the most important issue in any communication. Many data security and data hiding algorithms have been developed in the last decade, which served as motivation for our research. In this paper, "JPEG Compression Steganography & Cryptography using Image-Adaptation Technique", we design a system that allows an average user to securely transfer text messages by hiding them in a digital image file, using the local characteristics within the image. The work combines steganography and encryption algorithms, which together provide a strong backbone for its security. The proposed system not only hides large volumes of data within an image, but also limits the perceivable distortion that might occur in the image while processing it. This software has an advantage over other information security software because the hidden text is carried in images, which are not obvious text-information carriers. The work involves several challenges that make it interesting to develop. The central task is to survey the available steganography and encryption algorithms and pick the combination that offers strong encryption, usability and performance. A further advantage of this project is a simple, powerful and user-friendly GUI, which plays a very large role in the success of the application.
Index Terms—Steganography, Cryptography, Compression, JPEG, DCT, Local Criteria, Image-Adaptation, Huffman coding, ET, SEC scheme

Manuscript submitted on May 15, 2010; revised May 17, 2010; accepted May 31, 2010.

I. INTRODUCTION

In simple words, steganography can be defined as the art and science of invisible communication. This is accomplished by hiding information in other information, thus hiding the existence of the communicated information. Although the goals of steganography and cryptography are related, steganography differs from cryptography: cryptography [24] focuses on keeping the contents of a message secret, while steganography focuses on keeping the existence of the message secret. Steganography and cryptography are both ways to protect information from unwanted parties, but neither technology alone is perfect, and each can be compromised. Once the presence of hidden information is revealed, or even suspected, the purpose of steganography is partly defeated. The strength of steganography can thus be amplified by combining it with cryptography.

Almost all digital file formats can be used for steganography, but the most suitable formats are those with a high degree of redundancy. Redundancy can be defined as the bits of an object that provide accuracy far greater than necessary for the object's use and display; the redundant bits of an object are those bits that can be altered without the alteration being easily detected. Image and audio files especially comply with this requirement, while research has also uncovered other file formats that can be used for information hiding. Given the proliferation of digital images, especially on the Internet, and the large number of redundant bits present in the digital representation of an image, images are the most popular cover objects for steganography. In the domain of digital images, many different image file formats exist, most of them for specific applications, and for these different image file formats, different steganographic algorithms exist. Among all these file formats, the JPEG file format is the most popular image file format on the Internet, because of the small size of the images.

II. OVERVIEW

When working with larger images of greater bit depth, the images tend to become too large to transmit over a standard Internet connection. In order to display an image in a reasonable amount of time, techniques must be incorporated to reduce the image's file size. These techniques use mathematical formulas to analyze and condense image data, resulting in smaller file sizes; this process is called compression [3]. For images there are two types of compression: lossy and lossless [3]. Compression plays a very important role in choosing which steganographic algorithm to use. Lossy compression techniques result in smaller image file sizes, but they increase the possibility that the embedded message may be partly lost, because excess image data is removed. Lossless compression, on the other hand, keeps the original digital image intact without any chance of loss, although it does not compress the image to as small a file size. To compress an image into JPEG format, the RGB colour representation is first converted to a YUV representation.
In this representation, the Y component corresponds to the luminance (or brightness), and the U and V components stand for the chrominance (or colour). According to research, the human eye is more sensitive to changes in the brightness (luminance) of a pixel than to changes in its colour. JPEG compression exploits this fact by downsampling the colour data to reduce the size of the file: the colour components (U and V) are halved in the horizontal and vertical directions, decreasing the file size by a factor of 2.

The next step is the actual transformation of the image. For JPEG [18], the Discrete Cosine Transform (DCT) is used; a similar transform is, for example, the Discrete Fourier Transform (DFT). These mathematical transforms convert the pixels in such a way as to give the effect of "spreading" the location of the pixel values over part of the image. The DCT [18] transforms a signal from an image representation into a frequency representation, by grouping the pixels into 8 x 8 pixel blocks and transforming each pixel block into 64 DCT coefficients. A modification of a single DCT coefficient will affect all 64 image pixels in that block.

The next step is the quantization [18] phase of the compression. Here another biological property of the human eye is exploited: the human eye is fairly good at spotting small differences in brightness over a relatively large area, but not so good at distinguishing between different strengths of high-frequency brightness. This means that the strength of the higher frequencies can be diminished without changing the appearance of the image. JPEG does this by dividing all the values in a block by a quantization coefficient. The results are rounded to integer values, and the coefficients are then encoded using Huffman coding to further reduce the size.

Originally it was thought that steganography would not be possible with JPEG images, since they use lossy compression [3], which results in parts of the image data being altered. One of the major characteristics of steganography is that information is hidden in the redundant bits of an object, and since redundant bits are left out when using JPEG, it was feared that the hidden message would be destroyed. Even if one could somehow keep the message intact, it would be difficult to embed the message without the changes being noticeable, because of the harsh compression applied. However, properties of the compression algorithm have been exploited in order to develop a steganographic algorithm for JPEGs. One of these properties is used to make the changes to the image invisible to the human eye: during the DCT transformation phase of the compression algorithm, rounding errors occur in the coefficient data that are not noticeable. Although this property is what classifies the algorithm as lossy, it can also be used to hide messages.

It is not feasible to embed information in the stage of the pipeline where lossy compression takes place, since the compression would destroy that information in the process. Thus it is important to recognize that the JPEG compression algorithm is actually divided into lossy and lossless stages [3]: the DCT and the quantization phase form the lossy stage, while the Huffman encoding used to further compress the data is lossless. Steganography can take place between these two stages. Using the same principles as LSB insertion, the message can be embedded into the least significant bits of the quantized coefficients before applying the Huffman encoding. By embedding the information at this stage, in the transform domain, it is extremely difficult to detect, since it is not in the visual domain.
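A hedged sketch of embedding between the two stages just described: hide one message bit in the LSB of each selected quantized DCT coefficient of an 8x8 block, before entropy coding. This is a generic illustration, not the paper's scheme; scipy's DCT is used, the uniform quantization step q = 16 is a stand-in for a full JPEG luminance quantization table, and zero-valued coefficients are simply skipped rather than handled with the bookkeeping a real embedder would need.

import numpy as np
from scipy.fft import dctn

def embed_in_block(block, bits, q=16):
    """Hide bits in the LSBs of the quantized AC coefficients of one 8x8 block."""
    coeffs = dctn(block.astype(float) - 128, norm="ortho")   # lossy stage: DCT...
    quant = np.round(coeffs / q).astype(int)                 # ...and quantization
    flat = quant.flatten()
    for i, bit in enumerate(bits, start=1):                  # index 0 is the DC term
        if flat[i] != 0:                                     # leave zeros untouched
            flat[i] = (flat[i] & ~1) | bit
    return flat.reshape(quant.shape)   # these values would then be Huffman-coded

stego_block = embed_in_block(np.random.randint(0, 256, (8, 8)), bits=[1, 0, 1])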
Using the same principle as LSB insertion, the message can be embedded in the least significant bits of the coefficients before the Huffman encoding is applied. Because the information is embedded at this stage, in the transform domain, it is extremely difficult to detect, since it is not in the visual domain.

III. PROPOSED SYSTEM

We propose a framework for hiding large volumes of data in images while incurring minimal perceptual degradation. The embedded data can be recovered successfully, without any errors, after operations such as decompression, additive noise, and image tampering. The proposed methods can be employed for applications that require high-volume embedding with robustness against certain non-malicious attacks. The hiding methods we propose are guided by the growing literature on the information theory of data hiding [22]. The key novelty of our approach is that our coding framework permits the use of local criteria to decide where to embed data. In order to robustly hide large volumes of data in images without causing significant perceptual degradation, hiding techniques must adapt to local characteristics within an image. The main ingredients of our embedding methodology are as follows.
(a) As is well accepted, data embedding is done in the transform domain, with a set of transform coefficients in the low and mid frequency bands selected as possible candidates for embedding. (These are preserved better under compression attacks than high-frequency coefficients.)
(b) A novel feature of our method is that, from the candidate set of transform coefficients, the encoder employs local criteria to select the subset of coefficients in which it will actually embed data. In example images, the use of local criteria for deciding where to embed is found to be crucial to maintaining image quality under high-volume embedding.
(c) For each of the selected coefficients, the data to be embedded indexes the choice of a scalar quantizer for that coefficient. We motivate this by information-theoretic analysis.
(d) The decoder does not have explicit knowledge of the locations where data is hidden, but employs the same criteria as the encoder to guess these locations. The distortion due to attacks may now lead to insertion errors (the decoder guessing that a coefficient has embedded data when it actually does not) and deletion errors (the decoder guessing that a coefficient does not have embedded data when it actually does). In principle, this can lead to desynchronization of the encoder and decoder.
(e) An elegant solution based on erasures and errors correcting codes is provided for the synchronization problem caused by the use of local criteria. Specifically, we use a code on the hidden data that spans the entire set of candidate embedding coefficients and that can correct both errors and erasures. The subset of these coefficients in which the encoder does not embed can be treated as erasures at the encoder. Insertions now become errors, and deletions become erasures (in addition to the erasures already guessed correctly by the decoder, using the same local criteria as the encoder). While the primary purpose of the code is to solve the synchronization problem, it also provides robustness to errors due to attacks.
Two methods for applying local criteria are considered.
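Before turning to those two methods, ingredient (c) can be made concrete. One plausible reading of "the data indexes the choice of a scalar quantizer" is odd/even (quantization-index-modulation) embedding: the bit value selects one of two interleaved quantizers. A minimal sketch, assuming NumPy and a quantization step delta; the paper does not fix these details, so treat this as one possible instantiation rather than the authors' exact quantizers:

```python
import numpy as np

def qim_embed(coeff, bit, delta=8.0):
    """Embed one bit by moving the coefficient to the nearest quantizer
    level whose parity matches the bit (0 -> even multiple, 1 -> odd)."""
    q = np.round(coeff / delta)
    if int(q) % 2 != bit:                  # nudge to the right-parity neighbour
        q += 1 if coeff / delta > q else -1
    return q * delta

def qim_extract(coeff, delta=8.0):
    """Recover the bit as the parity of the nearest quantizer level."""
    return int(np.round(coeff / delta)) % 2

c = 37.2   # illustrative DCT coefficient value
for b in (0, 1):
    y = qim_embed(c, b)
    assert qim_extract(y) == b
```

With that in hand, we return to the two local-criteria methods.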
The first is the block-level Entropy Thresholding (ET) method, which decides whether or not to embed data in each block (typically 8 × 8) of transform coefficients, depending on the entropy, or energy, within that block. The second is the Selectively Embedding in Coefficients (SEC) method, which decides whether or not to embed data based on the magnitude of the coefficient. Reed-Solomon (RS) codes [24] are a natural choice for the block-based ET scheme, while a "turbo-like" Repeat Accumulate (RA) code is employed for the SEC scheme. We are able to hide high volumes of data under both JPEG and AWGN attacks [24]. Moreover, the hidden data also survives wavelet compression, image resizing, and image tampering attacks.

Figure 1. Image-adaptive embedding methodology

It is observed that both the perceptual quality and the PSNR are better for the image with data hidden using local criteria. Note that although the PSNR is only marginally better, the actual perceptual quality is much better. This indicates that local criteria must be used for robust and transparent high-volume embedding. Although we do not use specific perceptual models, we refer to our criteria as 'perceptual' because our goal in using local adaptation is to limit perceivable distortion. Figure 1 shows a high-level block diagram of the hiding methods presented. Both embedding methods, the entropy thresholding (ET) scheme and the selectively embedding in coefficients (SEC) scheme, are based on the Joint Photographic Experts Group (JPEG) compression standard. As seen in Figure 1, the techniques involve taking the 2D discrete cosine transform (DCT) of non-overlapping 8 × 8 blocks, followed by embedding in selected DCT coefficients.

Coding for Insertions and Deletions: We noted that the use of image-adaptive criteria is necessary when hiding large volumes of data in images. A threshold is used to determine whether to embed in a block (ET scheme) or in a coefficient (SEC scheme). More advanced image-adaptive schemes would exploit human visual system (HVS) models to determine where to embed information. Distortion due to an attack may cause an insertion (the decoder guessing that there is hidden data where there is none) or a deletion (the decoder guessing that there is no data where data was hidden). There can also be decoding errors, where the decoder makes a mistake in decoding the embedded bit. While decoding errors can be countered using simple error correction codes, insertions and deletions can potentially cause catastrophic loss of synchronization between encoder and decoder. In the ET scheme, insertions and deletions are observed when the attack quality factor is mismatched with the design quality factor for a JPEG attack. For the SEC scheme, however, there are no insertions or deletions for most images under JPEG attacks with a quantization interval smaller than or equal to the design interval. This is because no hidden coefficient with magnitude ≤ t can be ambiguously decoded as t+1 under JPEG quantization with an interval smaller than the design one. Both the ET and SEC schemes suffer insertions/deletions under other attacks.

Coding Framework: The coding framework employs the idea of erasures at the encoder. The bit stream to be hidden is coded, using a low-rate code, assuming that all host coefficients that meet the global criteria will actually be employed for hiding. A code symbol is erased at the encoder if the local perceptual criterion for the block or coefficient is not met.
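Those local tests reduce to simple threshold comparisons. A minimal sketch, assuming NumPy; the threshold values are purely illustrative, as the paper does not list its exact values here:

```python
import numpy as np

def et_embeddable(block, energy_threshold=2000.0):
    """ET rule: embed in an 8x8 DCT block only if its AC energy exceeds a threshold."""
    ac = block.astype(np.float64).copy()
    ac[0, 0] = 0.0                          # exclude the DC coefficient
    return float(np.sum(ac ** 2)) > energy_threshold

def sec_embeddable(coeff, magnitude_threshold=2):
    """SEC rule: embed in an individual coefficient only if its magnitude is large enough."""
    return abs(int(coeff)) > magnitude_threshold

block = np.random.randint(-30, 30, (8, 8))  # stand-in for one block of DCT coefficients
print(et_embeddable(block), [sec_embeddable(c) for c in block.flat[:8]])
```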
Since we code over the entire set of coefficients that lie in a designated low-frequency band, long codewords can be constructed to achieve very good correction ability. A maximum distance separable (MDS) code [24], such as a Reed-Solomon (RS) code, does not incur any penalty for erasures at the encoder. Turbo-like codes, which operate very close to capacity, incur only a minor overhead due to erasures at the encoder. When the sequence is decoded in the presence of attacks, insertions become errors and deletions become additional erasures. It should be noted that a deletion, which causes an erasure, is about half as costly as an insertion, which causes an error. Hence it is desirable to adjust the data-hiding scheme [4] so that there are only a few insertions. Thus, using a good erasures-and-errors correcting code, one can deal with insertions/deletions without a significant decline in the original embedding rate. Reed-Solomon codes have been used for the ET scheme and Repeat Accumulate codes for the SEC scheme.

IV. RESULT ANALYSIS

All steganographic algorithms have to comply with a few basic requirements: invisibility, payload capacity, robustness against statistical attacks, robustness against image manipulation, independence of file format, and unsuspicious files. The following table compares least significant bit (LSB) insertion in BMP and in GIF files, JPEG compression steganography, the patchwork approach, and spread spectrum techniques according to these requirements:

TABLE I. COMPARISON OF IMAGE STEGANOGRAPHY ALGORITHMS

  Requirement                              LSB in BMP   LSB in GIF   JPEG compression   Patchwork   Spread spectrum
  Invisibility                             High*        Medium*      High               High        High
  Payload capacity                         High         Medium       Medium             Low         Medium
  Robustness against statistical attacks   Low          Low          Medium             High        High
  Robustness against image manipulation    Low          Low          Medium             High        Medium
  Independent of file format               Low          Low          Low                High        High
  Unsuspicious files                       Low          Low          High               High        High

  * Depends on the cover image used

The levels at which the algorithms satisfy the requirements are defined as high, medium, and low. A high level means that the algorithm completely satisfies the requirement, while a low level indicates a weakness in that requirement. A medium level indicates that the requirement depends on outside influences, for example the cover image used. LSB in GIF images has the potential of hiding a large message, but only when the most suitable cover image has been chosen. The ideal, in other words perfect, steganographic algorithm would have a high level in every requirement. Unfortunately, none of the algorithms evaluated here satisfies all of the requirements, so in most cases a trade-off will exist, depending on which requirements matter most for the specific application. The process of embedding information during JPEG compression results in a stego image with a high level of invisibility, since the embedding takes place in the transform domain. JPEG is the most popular image file format on the Internet, and the image sizes are small because of the compression, making it the least suspicious algorithm to use. However, the compression process is highly mathematical, making the algorithm more difficult to implement.
The JPEG file format can be used for most applications of steganography, but it is especially suitable for images that have to be communicated over an open systems environment such as the Internet.

V. CONCLUSION AND SCOPE FOR FUTURE WORK

Steganography means hiding information, together with the related technologies. There is a principal difference between steganography and encryption; however, they can meet at some points too, and they can be applied together: encrypted information can also be hidden. To hide something, a covering medium is always needed (a picture, sound track, text, or even the structure of a file system, etc.). The covering medium must be redundant; otherwise the hidden information could be detected easily. The technology of hiding should match the nature of the medium. The hidden information should not be lost if the carrying medium is edited, modified, formatted, re-sized, compressed, or printed; that is a difficult task to realize. The application is primarily intended for inconspicuously hiding confidential and proprietary information by anyone seeking to hide information. This software has an advantage over other information security systems because the hidden text is carried in an image, which is not an obvious carrier of text information. Because of its user-friendly interface, the application can also be used by anyone who wants to securely transmit private information. The main advantage of this program for individuals is that they do not need any prior knowledge of steganography or encryption. The visual way of encoding the text, plus the visual key, makes it easy for average users to navigate within the program.

The Digital Image Steganography system allows an average user to securely transfer text messages by hiding them in a digital image file. A combination of steganography and encryption algorithms provides a strong backbone for its security. The system features innovative techniques for hiding text in a digital image file, or even using the image as a key to the encryption. The Digital Image Steganography [2] system allows a user to securely transfer a text message by hiding it in a digital image file. 128-bit AES encryption is used to protect the content of the text message even if its presence were to be detected. Currently, no methods are known for breaking this kind of encryption within a reasonable period of time (i.e., a couple of years). Additionally, compression is used to maximize the space available in an image. To send a message, a source text, an image in which the text should be embedded, and a key are needed. The key is used to aid in encryption and to decide where the information should be hidden in the image; a short text can be used as a key. To receive a message, a source image containing the information and the corresponding key are both required; the result will appear in the text tab after decoding. The common Internet-friendly JPEG format is offered. It is inherently more difficult to hide information in a JPEG image because that is exactly what the designers of JPEG wanted to avoid: the transmission of extra information that does not affect the appearance of the image.

ACKNOWLEDGEMENT

The work on this paper was supported by the Bharati Vidyapeeth University College of Engineering, Pune. The views and conclusions contained herein are those of the authors, and the paper contains the original work of the authors.
We took help from many books, papers, and other materials.

REFERENCES

[1] N. Provos, "Defending Against Statistical Steganalysis," Proc. 10th USENIX Security Symposium, 2001.
[2] N. Provos and P. Honeyman, "Hide and Seek: An Introduction to Steganography," IEEE Security & Privacy, 2003.
[3] S. W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing, California Technical Publishing, 1997.
[4] S. Katzenbeisser and F. Petitcolas, Information Hiding Techniques for Steganography and Digital Watermarking, Artech House, Norwood, MA, 2000.
[5] L. Reyzin and S. Russell, "More Efficient Provably Secure Steganography," 2007.
[6] S. Lyu and H. Farid, "Steganalysis using higher-order image statistics," IEEE Trans. Inf. Forens. Secur., 2006.
[7] S. Venkatraman, A. Abraham, and M. Paprzycki, "Significance of Steganography on Data Security," Proc. International Conference on Information Technology: Coding and Computing, 2004.
[8] J. Fridrich, M. Goljan, and D. Hogea, "New Methodology for Breaking Steganographic Techniques for JPEGs," Electronic Imaging 2003.
[9] http://aakash.ece.ucsb.edu/datahiding/stegdemo.aspx, UCSB data hiding online demonstration, released Mar. 9, 2005.
[10] M. Iwamoto and H. Yamamoto, "The Optimal n-out-of-n Visual Secret Sharing Scheme for Gray-Scale Images," IEICE Trans. Fundamentals, vol. E85-A, no. 10, October 2002, pp. 2238-2247.
[11] D. Shaked, N. Arad, A. Fitzhugh, and I. Sobel, "Color Diffusion: Error Diffusion for Color Halftones," HP Laboratories Israel, May 1999.
[12] Z. Zhou, G. R. Arce, and G. Di Crescenzo, "Halftone Visual Cryptography," IEEE Trans. on Image Processing, vol. 15, no. 8, August 2006, pp. 2441-2453.
[13] M. Naor and A. Shamir, "Visual Cryptography," Proc. Eurocrypt 1994, Lecture Notes in Computer Science, vol. 950, 1994, pp. 1-12.
[14] R. Ulichney, "The void-and-cluster method for dither array generation," IS&T/SPIE Symposium on Electronic Imaging and Science, San Jose, CA, 1993, vol. 1913, pp. 332-343.
[15] E. R. Verheul and H. C. A. van Tilborg, "Constructions and properties of k out of n visual secret sharing schemes," Designs, Codes and Cryptography, vol. 1, no. 2, 1997, pp. 179-196.
[16] D. L. Lau, R. Ulichney, and G. R. Arce, "Fundamental Characteristics of Halftone Textures: Blue-Noise and Green-Noise," Image Systems Laboratory, HP Laboratories Cambridge, March 2003.
[17] C. Yang and C. Laih, "New colored visual secret sharing schemes," Designs, Codes and Cryptography, vol. 20, 2000, pp. 325-335.
[18] A. K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall of India, 1989.
[19] C. Chang, C. Tsai, and T. Chen, "A new scheme for sharing secret color images in computer network," Proc. International Conference on Parallel and Distributed Systems, 2000, pp. 21-27.
[20] R. L. Adler, B. P. Kitchens, and M. Martens, "The mathematics of halftoning," IBM J. Res. & Dev., vol. 47, no. 1, Jan. 2003, pp. 5-15.
[21] R. Lukac, K. N. Plataniotis, and B. Smolka, "A new approach to color image secret sharing," EUSIPCO 2004, pp. 1493-1496.
[22] H. Ancin, A. K. Bhattacharjya, and J. Shu, "Improving void-and-cluster for better halftone uniformity," International Conference on Digital Printing Technologies.
[23] D. Hankerson, P. D. Johnson, and G. A. Harris, Introduction to Information Theory and Data Compression.
[24] R. Bose, Information Theory, Coding and Cryptography.

Meenu Kumari completed her B.E.
in Information Technology from Sanjivani Educational Society & College of Engineering, Kopargaon, Pune University, in 2005. She is pursuing an M.Tech. in IT from Bharati Vidyapeeth University College of Engineering, Pune. She has presented at one national conference on image compression, published research papers in one e-journal and one international journal, and submitted research papers to other national and international journals for publication.

Prof. A. Khare completed his B.E. and M.E. from Bhopal. He is currently working as Assistant Professor in the Information Technology Department, Bharati Vidyapeeth University College of Engineering, Pune. He has presented at many national and international conferences and published in journals.

Pallavi Khare is a research student of SSSIST, E&TC Department, Bhopal, India.

Review of Machine Learning Approaches to Semantic Web Service Discovery

Shalini Batra
Computer Science and Engineering Department, Thapar University, Patiala, Punjab, India
Email: [email protected]

Dr. Seema Bawa
Computer Science and Engineering Department, Thapar University, Patiala, Punjab, India
Email: [email protected]

Abstract—A Web service can discover and invoke any service anywhere on the Web, independently of the language, location, machine, or other implementation details. The goal of Semantic Web Services is the use of richer, more declarative descriptions of the elements of dynamic distributed computation, including services, processes, message-based conversations, transactions, etc. In recent years, text mining and machine learning have been used efficiently for automatic classification and labeling of documents. Various Web service discovery frameworks are applying machine learning techniques such as clustering, classification, and association rules to discover services semantically. This paper provides an exhaustive review of machine learning approaches used for Web Services discovery and of frameworks developed based on these approaches. A thorough analysis of existing frameworks for semantic discovery of Web Services is also provided.

Index Terms—Machine Learning, Semantics, Web Services, Web Services Discovery, Web Service Discovery Frameworks

I. INTRODUCTION

Semantic Web Services (SWS) lie at the intersection of two important trends in the World Wide Web's evolution. The first is the rapid development of Web service technologies; the second is the Semantic Web. The Semantic Web focuses on the publication of more expressive metadata in a shared knowledge framework, enabling the deployment of software agents that can intelligently use Web resources. The driving force behind the use of Web services is the need for reliable, vendor-neutral software interoperability across heterogeneous platforms and networks. Another important objective behind the development of Web Services has been the ability to coordinate business processes involving heterogeneous components (deployed as services) across ownership boundaries. These objectives have led to the development of widely recognized Web service standards such as WSDL, UDDI, and BPEL. The Semantic Web brings knowledge-representation languages and ontologies into the fabric of the Internet, providing a foundation for powerful new approaches to organizing, describing, searching, and reasoning about information and activities on the Web (and other networked environments).
The Semantic Web proposes to extend traditional Web Services technologies by consolidating ontologies and semantics so that services are able to dynamically adapt themselves to changes without human intervention. The description of Semantic Web services enables fuller, more flexible automation of service provision and use, and the construction of more powerful tools and methodologies for working with services. As a rich representation framework permits a more comprehensive specification of many aspects of services, SWS can provide a solid foundation for a broad range of activities throughout the Web service life cycle. For example, richer service descriptions can support:
• greater automation of service selection and invocation,
• automated translation of message content between heterogeneous interoperating services,
• automated or semi-automated approaches to service composition, and
• more comprehensive approaches to service monitoring and recovery from failure [1].

Semantic Web Services enable the automatic discovery of distributed Web services based on comprehensive semantic representations. However, although SWS technology supports the automatic allocation of resources for a given well-defined task, it does not entail the discovery of appropriate SWS representations for a given context. One of the major problems with the existing structure is that UDDI does not capture the relationships between entities in its directory and is therefore not capable of using semantic information to infer relationships during search. Secondly, UDDI supports search based only on the high-level information specified about businesses and services; it does not get to the specifics of the capabilities of services during matching [2].

Several upper (i.e., application-independent) ontologies have already been proposed for service description. The first was DAML-S [3], based on the DAML+OIL ontology definition language. However, with the wide acceptance of the Web Ontology Language (OWL) [4] family of languages, DAML-S was replaced by OWL-S [5]. On similar specifications, various SWS frameworks such as WSDL-S [6] and WSMO [7] were also developed. All these specifications, although sharing many modeling elements, differ in terms of expressiveness, complexity, and tool support [8]. The only change required for discovering services semantically is that some metadata should be available providing a functional description of the Web services, to which machine learning techniques such as classification, clustering, and association mining can be applied. Our contribution in this paper is to present a survey of SWS discovery frameworks based on machine learning approaches and of the methodologies and techniques applied for discovering Web services semantically, and to analyze the shortcomings of these approaches along with future directions for accomplishing Web service discovery successfully.

II. MACHINE LEARNING BASED FRAMEWORKS

In machine learning there are two major settings in which a function can be learned: supervised learning and unsupervised learning. In supervised learning the variables under investigation can be split into two groups: explanatory variables and one (or more) dependent variables. The target of the analysis is to specify a relationship between the explanatory variables and the dependent variable, as is done in regression analysis.
To apply directed data mining techniques, the values of the dependent variable must be known for a sufficiently large part of the data set. In unsupervised learning all variables are treated in the same way; there is no distinction between explanatory and dependent variables. Supervised learning requires that the target variable is well defined and that a sufficient number of its values are given. For unsupervised learning, typically either the target variable is unknown or it has been recorded for too small a number of cases.

Classification models are created by examining already classified data (cases) and inductively finding a predictive pattern. Classification problems aim to identify the characteristics that indicate the group to which each case belongs. This pattern can be used both to understand the existing data and to predict how new instances will behave. Clustering is a multivariate statistical technique that allows the automatic generation of groups in data. The result of clustering is a partitioning of the collection of objects into groups of related objects. From a machine learning perspective, clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. Conceptual clustering is a machine learning paradigm for unsupervised classification, distinguished from ordinary data clustering by generating a concept description for each generated class. Most conceptual clustering methods are capable of generating hierarchical category structures.

A. Classification Based Approaches to Semantic Web Service Discovery

Some Web service discovery frameworks combine text mining and machine learning techniques for classifying Web services, and hence semi-automatic and automatic methods have been proposed for Web service discovery through classification. Some approaches are based on matching argument definitions [9, 10], document classification techniques [11, 12], and matching semantic annotations [13]. MWSAF [9] is an approach for classifying Web services based on matching argument definitions. First, MWSAF translates the WSDL definitions into a graph; it then uses graph similarity techniques to compare graphs. On similar lines, Duo et al. [10] propose to translate a definition into an ontology instead of a graph; an ontology alignment technique then attempts to map one ontology onto another [7]. METEOR-S [11] describes a further improved version of MWSAF. The problem of determining a Web service category is abstracted to a document classification problem: the graph matching technique is replaced with a Naïve Bayes classifier. To do this, METEOR-S extracts the names of all operations and arguments declared in the WSDL documents of pre-categorized Web services. Assam [14] is an ensemble machine learning approach for determining Web service category. Assam combines the Naïve Bayes and SVM [15] machine learning algorithms to classify WSDL files into manually defined hierarchies, taking into account Web service natural language documentation and descriptions. Automatic Web Service Classification (AWSC) compares a Web service description with other descriptions that have been manually classified. AWSC applies a two-stage process to classify a Web service: text mining techniques at the first stage (preprocessing) extract relevant information from a WSDL document, and a supervised document classifier is applied at the second stage (classification).
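As a minimal sketch of the document-classification view taken by METEOR-S and AWSC, terms extracted from WSDL files can be treated as text documents and fed to a Naïve Bayes classifier. This assumes scikit-learn; the toy service descriptions and category labels are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus: terms extracted from WSDL documents of pre-categorized services.
docs = [
    "get weather forecast temperature city",
    "convert currency exchange rate amount",
    "get stock quote ticker price",
    "book flight departure arrival airline",
]
labels = ["weather", "finance", "finance", "travel"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # bag-of-words term counts
clf = MultinomialNB().fit(X, labels)

# Classify a new, unseen service description.
new_service = ["retrieve share price for ticker symbol"]
print(clf.predict(vectorizer.transform(new_service)))   # 'finance' expected
```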
The second-stage classifier deduces a sequence of candidate categories for a preprocessed Web service description.

B. Cluster Based Approaches to Semantic Web Service Discovery

The clustering methodology re-organizes a set of data into different groups based on some standard of similarity, thus transforming a complex problem into a series of simpler ones that can be handled more easily. Based on the clustered service groups, a set of matched services can be returned by comparing the similarity between the query and the related group, rather than computing the similarity between the query and each service in the dataset. If the services returned are not compatible with the user's query, the second-best cluster is chosen and the computation proceeds to the next iteration.

Various clustering approaches have been used for discovering Web services. Dong [16] puts forward a clustering approach to searching Web services in which the search consists of two main stages. A service user first types keywords into a service search engine, looking for the corresponding services. Then, based on the initial Web services returned, the approach extracts semantic concepts from the natural language descriptions provided in the Web services. In [17] Abramowicz proposed an architecture for Web services filtering and clustering. The service filtering is based on profiles representing user and application information, which are further described through the Web Ontology Language for Services (OWL-S). In order to improve the effectiveness of the filtering process, a clustering analysis is applied to it by comparing services with the related clusters. Another similar approach, followed in [18], concentrates on Web service discovery with OWL-S and clustering technology and consists of three main steps: OWL-S is first combined with WSDL to represent service semantics, a clustering algorithm then groups the collections of heterogeneous services together, and finally a user query is matched against the clusters in order to return the suitable services. Web services are clustered into predefined hierarchical business categories in [19], and service discovery is based on a directory. In this situation, the performance of reasonable service discovery relies on both service providers and service requesters having prior knowledge of the service organization schemes. In [20] Probabilistic Latent Semantic Analysis (PLSA) is used to capture the semantic concepts hidden behind the words in a query and the advertisements in services, so that service matching is carried out at the concept level. The Singular Value Decomposition (SVD) matrix approach to matching Web services [21] has been extended with a different methodology, Probabilistic Latent Semantic Analysis (PLSA), based on the aspect model. The model indirectly associates keywords with their corresponding documents by introducing an intermediate layer called the hidden factor variable Z = {z1, z2, ..., zk} [22]. The Clustering Probabilistic Semantic Approach (CPLSA) discussed in [24], an extension of [20], uses a dynamic algorithm that partitions a service working dataset into smaller pieces. It includes two main phases: eliminating irrelevant services and matching services at the semantic concept level. Once the irrelevant services are eliminated, a Probabilistic Latent Semantic Analysis approach is applied to the working dataset to capture semantic concepts.
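Stepping back from any single paper, the cluster-then-match pattern shared by these approaches can be sketched as follows. This assumes scikit-learn, SciPy, and NumPy; the toy descriptions, the TF-IDF features, and the distance threshold are illustrative stand-ins, not any one framework's algorithm:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster

services = [
    "send sms text message", "send email notification",
    "get weather forecast", "get temperature by city",
    "validate credit card", "process card payment",
]
vec = TfidfVectorizer()
X = vec.fit_transform(services).toarray()

# Group similar service descriptions; the cosine-distance cut-off is illustrative.
labels = fcluster(linkage(X, method='average', metric='cosine'),
                  t=0.8, criterion='distance')

# Match a query against cluster centroids instead of every individual service.
q = vec.transform(["pay by credit card"]).toarray()[0]
centroids = {c: X[labels == c].mean(axis=0) for c in set(labels)}
best = max(centroids, key=lambda c: float(np.dot(q, centroids[c])))
print([s for s, c in zip(services, labels) if c == best])  # likely the card-payment group
```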
The outcome of CPLSA is that Web services are clustered into a finite number of semantically related groups. The Semantic Web Service Classification (SWSC) method discussed in [25] analyses the WSDL and checks its configuration and structure for further processing. The WSDL information of Web services is transformed into the richer semantic representation language OWL-S [4] using a series of methods. The hierarchical agglomerative clustering method, often used in information retrieval for grouping similar documents, is used in [27] for Web service clustering. This method uses a bottom-up strategy that starts by placing each Web service in its own cluster and then successively merges clusters together until a stopping criterion is satisfied. The clusters (terms representing Web services) are stored in the UDDI registry database. The SWSC method improves the search function by retrieving the best offers of services using cluster matching; it ranks the matched Web services and indicates the degree of relevance according to term existence in the clusters.

C. Context-Aware SWS Discovery

Contexts have increasingly been considered for better service provision. Dey [27] describes a context-aware computing system as one that uses contexts to provide relevant information and/or services to users, where relevancy depends on the user's tasks, while Korkea-aho [26] defines context as any situational information available at the time of interaction between users and computing systems. Contexts can be useful for service discovery as well: UDDI could perform better if the contexts of service consumers, service providers, and Web services were considered at discovery time. Context awareness has been applied in Web services discovery research. The WASP project [27] attempts to enhance the standard UDDI into UDDI+ by adding semantic and contextual features. Its approach focuses on semantic analysis of service provider contexts described by the ontology-based DAML-S specification, and hence its contexts are static only. Doulkeridis et al. [29] propose a context-aware service discovery architecture that accommodates various registry technologies, including the UDDI and ebXML registries, but they consider the provision and consumption of services via mobile devices; their approach therefore focuses on contexts related to mobility and handheld devices and does not cater for a generic context model. Lee et al. [30] enhance context-aware discovery by introducing context attributes as part of service descriptions in the service registry, but their contexts are dynamic attributes only. The CB-SeC framework [31] also enables more sophisticated discovery and composition of services by augmenting a Web service's WSDL with context functions, which are invoked to determine the values of the service contexts. In this way, however, the WSDL becomes cluttered with operations that do not reflect service capability. Keidl et al. [32] introduce the concept of a context type in their context framework, but they focus on adapting service provision according to the consumer's contexts, specified under particular context types; their framework does not consider service discovery. Conceptual Spaces (CS), introduced by Gärdenfors [32, 33], follow a theory of describing entities at the conceptual level in terms of their natural characteristics, similar to natural human cognition, in order to avoid the symbol grounding issue.
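To make the conceptual-space idea tangible before stating it formally, a minimal sketch with NumPy; the quality dimensions, the context vectors, and the resource names are invented purely for illustration:

```python
import numpy as np

# Hypothetical quality dimensions of a conceptual space: (latitude, longitude, temperature)
situation = np.array([48.1, 11.6, 21.0])     # an observed real-world situation
resources = {
    "munich_weather_service": np.array([48.1, 11.6, 20.0]),
    "oslo_weather_service":   np.array([59.9, 10.8,  8.0]),
}

# Rank candidate resource representations by Euclidean distance to the situation.
ranked = sorted(resources, key=lambda r: np.linalg.norm(resources[r] - situation))
print(ranked[0])   # -> 'munich_weather_service'
```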
Formally, semantic similarity between situations is calculated in terms of their Euclidean distance within a CSS. Context-aware discovery and invocation of Web services and data sources is highly desired across a wide variety of application domains and has been the subject of intensive research throughout the last decade [34, 35, 24]. The authors in [37] propose that extending merely symbolic SWS descriptions with context information at a conceptual level through CSS enables similarity-based matchmaking between real-world situation characteristics and predefined resource representations within SWS descriptions. CSS are mapped to standardized SWS representations to enable the context-aware discovery of appropriate SWS descriptions and the automatic discovery and invocation of appropriate resources (Web services and data) to achieve a given task within a particular situation.

D. Web Services Discovery Based on Schema Matching

In [31] the authors propose an SVD-based algorithm to locate matched services for a given service. Their algorithm uses characteristics of singular value decomposition to find relationships among services, but it considers only textual descriptions and cannot reveal the semantic relationships between Web services. Wang et al. [38] proposed a method based on information retrieval and structure matching: given a potentially partial specification of the desired service, all textual elements of the specification are extracted and compared against the textual elements of the available services, to identify the most similar service description files and order them according to their similarity. The approach in [38] is similar to that followed in [16], but the focus here is on semantic rather than structural similarity. Woogle [16] develops a clustering algorithm to group the names of parameters of Web-service operations into semantically meaningful concepts, which are then used to measure the similarity of Web-service operations; it relies heavily on parameter names and does not deal with the composition problem. A schema tree matching algorithm proposed in [23] employs a cost model to compute tree edit distances for supporting Web-service operation matching and captures the semantic information of schemas; an agglomerative algorithm is then employed to cluster similar Web-service operations and rank them to satisfy a user's top-k requirements.

E. Some Prevalent Frameworks and Methodologies

A considerable body of research has emerged proposing different methods of improving the accuracy of Web service discovery. A Web service discovery method combining semantic and statistical association with hyperclique pattern discovery [39] has been proposed. Algorithms using Singular Value Decomposition (SVD) [15] and probabilistic latent semantic analysis [20] have been proposed to find the similarity between various Web services and enhance the accuracy of service discovery. However, none of these methods provides empirical and theoretical analysis showing that they improve the process of Web service discovery. In [36] the authors proposed an extension of SVD [15] to a support-based latent semantic kernel to further increase the accuracy of Web service discovery, using random projection [20] for service discovery.
In random projection, the initial corpus is first projected to l dimensions, for some l > k, where k is the dimension of the semantic kernel, to obtain a smaller representation that is close to the original corpus; SVD is then performed on the reduced-dimension matrix. In [36] the semantic kernel is created on a large Wikipedia corpus for dimensionality reduction by introducing the concept of merging documents, and the constructed kernel is then used on a general-purpose corpus to find semantically similar Web services for a user query. Hyperclique patterns [39] are a type of association pattern containing items that are strongly associated with each other: every pair of items within a hyperclique pattern is guaranteed to have an uncentered correlation coefficient above a certain level. When used in the Web services field, the items are the input or output parameters, and a transaction is the set of input and output parameters of an individual Web service. Hyperclique pattern discovery can be adapted to capture frequently occurring local operation parameter structures in Web services: the parameters of a Web service are represented as a vector, each entry records the terms of the operations' inputs and outputs, and each such collection of terms forms a transaction. The Web service collection is mined to find the frequent hyperclique patterns that satisfy a given support level and h-confidence level [23]. This is followed by a pruning of the hyperclique patterns on the basis of the ranking of semantic relationships among the terms.

III. COMPARATIVE ANALYSIS OF EXISTING FRAMEWORKS

The major concerns in Semantic Web Service Discovery are that not all new services have semantically tagged descriptions and that the vast majority of already existing Web services have no associated semantics. The problem with discovering Web services semantically is that there are too few annotated services; the semantic approach therefore suffers from a cold-start problem, as it assumes that a corpus of previously annotated services is available. Incorporating semantic annotation support into Web services is necessary, and a sensible classification system may "guide" the annotating process by deducing a small set of similar services. Existing efforts to classify Web services have several shortcomings. Natural language documentation, usually present in WSDL files and service registries, has not been considered thoroughly. Some frameworks are based on the false premise that an operation and its argument names are independent, while others do not consider natural language documentation at all. Some frameworks assume that a corpus of previously classified services is available, which makes it impossible to create categories dynamically without re-building the classifier. Clustering annotated resources enables the definition of new emerging concepts (concept formation) on the grounds of the concepts defined in a knowledge base; supervised methods can exploit these clusters to induce new concept definitions or to refine existing ones (ontology evolution); and intentionally defined groupings may speed up the task of search and discovery. Although the idea of clustering similar Web services into groups is well supported and appreciated, it lacks a common base: some authors propose hierarchical clustering while others prefer agglomerative clustering. The major effort required is to achieve incremental clustering and dynamic classification.
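The project-then-SVD pipeline described in Section II.E can be sketched as follows. This assumes scikit-learn; the toy corpus and the values of l and k are illustrative, and the real method in [36] builds its kernel from a large Wikipedia corpus rather than from the service descriptions themselves:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import TruncatedSVD

corpus = [
    "weather forecast by city", "rain and temperature forecast",
    "currency exchange rates", "foreign exchange conversion",
    "flight booking and reservation", "airline ticket booking",
    "hotel room reservation", "stock price quote service",
    "email sending service", "sms message gateway",
    "image resizing service", "pdf conversion service",
]
X = TfidfVectorizer().fit_transform(corpus)

l, k = 10, 4   # project to l dimensions first, then SVD down to k, with l > k
X_rp = GaussianRandomProjection(n_components=l, random_state=0).fit_transform(X)
X_lsa = TruncatedSVD(n_components=k, random_state=0).fit_transform(X_rp)

print(X_lsa.shape)   # (12, 4): a compact latent representation of the corpus
```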
A common problem with the SVD-based approaches is that computing the high-dimensional matrix representing the training documents is expensive; hence there have been attempts to reduce the dimensionality of the matrix prior to applying SVD. Current SWS frameworks such as WSMO and OWL-S address the allocation of distributed services for a given (semantically) well-described task, but none fully solves the issues related to symbolic Semantic Web based knowledge representations. Although much research has been done in this area and it is moving in the right direction, no major breakthrough has yet been achieved, and a great deal more needs to be done to accomplish the task of Semantic Web service discovery. Metadata-based classification is a realistic option, as it produces more specific semantic types, and combining metadata with content-based classification can indeed improve the performance of semantics-based discovery. If the clustering option is explored, an incremental clustering approach may be more suitable. A functional description of a Web service can be provided in the documentation tag of the published service; this serves as an additional source of information that semantically annotates the service and can easily be extracted by text pre-processing techniques including de-tagging, tokenizing, stop word removal, etc. Using trained classifiers alone is not accurate enough, and soft-classification-based frameworks should be considered. For more effective ranking of the results, semantic weights should be associated with the retrieved set of Web services. Applying a content-based classification algorithm is an efficient method of classifying Web services into their respective groups. Pattern learning algorithms are another option that can be used to identify similar patterns in heterogeneous data and match them semantically.

REFERENCES

[1] David Martin and John Domingue, "Semantic Web Services: Trends & Controversies".
[2] A. Schmidt and C. Winterhalter, "User Context Aware Delivery of E-Learning Material: Approach and Architecture", Journal of Universal Computer Science (JUCS), vol. 10, no. 1, January 2004.
[3] http://www.daml.org/index.html.
[4] W3C, "OWL Web Ontology Language Overview", http://www.w3.org/TR/2004/REC-owl-features-20040210/.
[5] D. Martin et al., "OWL-S: Semantic Markup for Web Services", W3C Member Submission.
[6] R. Akkiraju, J. Farrell, J. Miller, M. Nagarajan, M. Schmidt, A. Sheth, and K. Verma, "Web Service Semantics - WSDL-S", a joint UGA-IBM Technical Note, Version 1.0, April 18, 2005.
[7] U. Keller, R. Lara, and A. Polleres, "WSMO D5.1 Discovery", http://www.wsmo.org/TR/d5/d5.1.
[8] Jorge Cardoso, Semantic Web Services: Theory, Tools, and Applications, Information Science Reference, University of Madeira, Portugal, ISBN 978-1-59904-045-5, 2007.
[9] Abhijit A. Patil, Swapna A. Oundhakar, Amit P. Sheth, and Kunal Verma, "METEOR-S Web Service Annotation Framework", in Proc. of the 13th International Conference on WWW, ACM Press, 2004.
[10] Zhang Duo, Li Zi, and Xu Bin, "Web service annotation using ontology mapping", IEEE International Workshop on Service-Oriented System Engineering, pages 235-242, 2005.
[11] Nicole Oldham, Christopher Thomas, Amit P. Sheth, and Kunal Verma, "METEOR-S Web service annotation framework with machine learning classification", in Semantic Web Services and Web Process Composition, volume 3387 of LNCS, pages 137-146, San Diego, CA, USA, 2004, Springer.
[12] Andreas Heß and Nicholas Kushmerick, "Learning to Attach Semantic Metadata to Web Services", in Dieter Fensel, Katia P. Sycara, and John Mylopoulos, editors, International Semantic Web Conference, volume 2870 of Lecture Notes in Computer Science, pages 258-273, Springer, 2003.
[13] Miguel Ángel Corella and Pablo Castells, "Semi-automatic Semantic-based Web Service Classification", Business Process Management Workshops, volume 4103 of LNCS, pages 459-470, Vienna, Austria, September 4-7, 2006, Springer.
[14] Andreas Heß, Eddie Johnston, and Nicholas Kushmerick, "ASSAM: A Tool for Semi-automatically Annotating Semantic Web Services", in McIlraith et al., pages 320-334.
[15] Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, New York, NY, USA, 2000.
[16] X. Dong, A. Halevy, J. Madhavan, E. Nemes, and J. Zhang, "Similarity Search for Web Services", Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004.
[17] W. Abramowicz, K. Haniewicz, M. Kaczmarek, and D. Zyskowski, "Architecture for Web Services Filtering and Clustering", Internet and Web Applications and Services (ICIW '07), May 13-19, 2007, Le Morne, Mauritius.
[18] Le Duy Ngan, Tran Minh Hang, and Angela Eck Soong Goh, "MOD - A Multi-Ontology Discovery System", International Workshop on Semantic Matchmaking and Resource Retrieval (co-located with VLDB'06), Seoul, Korea, Sept. 2006.
[19] Shou-jian Yu, Jing-zhou Zhang, Xiao-kun Ge, and Guo-wen Wu, "Semantics Based Web Services Discovery", Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications & Conference on Real-Time Computing Systems and Applications (PDPTA 2006), Las Vegas, Nevada, USA, June 26-29, 2006, Volume 1, CSREA Press, 2006, ISBN 1-932415-86-6.
[20] Jiangang Ma, Jinli Cao, and Yanchun Zhang, "A Probabilistic Semantic Approach for Discovering Web Services", WWW 2007, May 8-12, 2007, Banff, Alberta, Canada.
[21] Atul Sajjanhar, Jingyu Hou, and Yanchun Zhang, "Algorithm for Web Services Matching", in Proceedings of the 6th Asia-Pacific Web Conference (APWeb), vol. 3007, pp. 665-670, Hangzhou, China, April 14-17, 2004.
[22] Thomas Hofmann, "Probabilistic Latent Semantic Analysis", in Proceedings of the 22nd Annual ACM Conference on Research and Development in Information Retrieval, Berkeley, California, pages 50-57, ACM Press, 1999.
[23] Natenapa Sriharee, "Semantic Web Services Discovery Using Ontology-based Rating Model", Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), 18-22 Dec. 2006, Hong Kong, China.
[24] Jiangang Ma, Jinli Cao, and Yanchun Zhang, "Efficiently Finding Web Services Using a Clustering Semantic Approach", CSSSIA 2008, April 22, 2008, Beijing, China.
[25] Richi Nayak and Bryan Lee, "Web Service Discovery with Additional Semantics and Clustering", IEEE Computer Society.
[26] M. Korkea-aho, "Context-aware Applications Survey", Internetworking Seminar (Tik-110.551), Helsinki University of Technology, Spring 2000.
[27] A. Dey, "Providing Architectural Support for Building Context-aware Applications", Ph.D. dissertation, Georgia Institute of Technology, Atlanta, 2000.
[28] S. Pokraev, J. Koolwaaij, and M.
Wibbels, "Extending UDDI with Context-aware Features based on Semantic Service Descriptions", Proc. of the 1st Intl. Conf. on Web Services, Las Vegas, Nevada, USA, 2003, 184-190.
[29] C. Doulkeridis, N. Loutas, and M. Vazirgiannis, "A System Architecture for Context-aware Service Discovery", Proc. of the Intl. Workshop on Context for Web Services (CWS'05), Paris, France, July 5, 2005, 101-116.
[30] C. Lee and S. Helal, "Context Attributes: An Approach to Enable Context-awareness for Service Discovery", Proc. of the Symposium on Applications and the Internet, Florida, USA, 2003, 22-30.
[31] S. K. Mostefaoui, H. Gassert, and B. Hirsbrunner, "Context Meets Web Services: Enhancing WSDL with Context-Aware Features", Proc. of the 1st Intl. Workshop on Best Practices and Methodologies in Service-Oriented Architectures: Paving the Way to Web-services Success, Vancouver, British Columbia, Canada, 2004, 1-14.
[32] M. Keidl and A. Kemper, "Towards Context-aware Adaptable Web Services", Proc. of the 13th Intl. World Wide Web Conf. - Alternate Track Papers & Posters, New York, USA, 2004, 55-65.
[33] Nicola Fanizzi, Claudia d'Amato, and Floriana Esposito, "Randomized Metric Induction and Evolutionary Conceptual Clustering for Semantic Knowledge Bases", CIKM'07, November 6-8, 2007, Lisboa, Portugal.
[34] S. Dietze, A. Gugliotta, and J. Domingue, "A Semantic Web Services-based Infrastructure for Context-Adaptive Process Support", Proceedings of the IEEE 2007 International Conference on Web Services (ICWS), Salt Lake City, Utah, USA.
[35] H. Xiong, P. Tan, and V. Kumar, "Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution", IEEE International Conference on Data Mining (ICDM), 387-394.
[36] C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala, "Latent Semantic Indexing: A Probabilistic Analysis", in Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Seattle, Washington, United States, 1998, pp. 159-168.
[37] Nicola Fanizzi, Claudia d'Amato, and Floriana Esposito, "Instance-based Retrieval by Analogy", SAC'07, March 11-15, 2007, Seoul, Korea.
[38] Y. Wang and E. Stroulia, "Flexible Interface Matching for Web Service Discovery", in WISE, 2003.
[39] J. Lin and D. Gunopulos, "Dimensionality Reduction by Random Projection and Latent Semantic Indexing", in Third SIAM International Conference on Data Mining, San Francisco, CA, USA, 2003.

Mrs. Shalini Batra joined the Computer Science and Engineering Department, Thapar University, Patiala, as a Lecturer in 2002 and has been working as an Assistant Professor in the same department since 2009. She did her post-graduation at BITS, Pilani, and is pursuing a Ph.D. at Thapar University in the area of semantics and machine learning. She has guided fifteen M.E. theses and is presently guiding four. She is the author or co-author of more than twenty-five publications in national and international conferences and journals. Her areas of interest include Web semantics and machine learning, particularly semantic clustering and classification. She teaches courses on compiler construction, theory of computation, and parallel and distributed computing.

Dr. Seema Bawa did her M.Tech. at IIT Kharagpur and her Ph.D. at Thapar University, Patiala. She joined the Computer Science and Engineering Department, Thapar University, Patiala, as Assistant Professor in 1999 and has been serving as Professor since 2004. She has guided four Ph.D.s and more than thirty M.E. theses.
She served the computer industry for more than six years before joining the University and has more than ten years of teaching experience. She has undertaken various projects and consultancy assignments in industry and academia. She is the author or co-author of more than 75 publications in technical journals and conferences of international repute. She has served as advisor/track chair for various national and international conferences. Her areas of interest include parallel, distributed, and grid computing and cultural computing.

Call for Papers and Special Issues

Aims and Scope

JAIT is intended to reflect new directions of research and report the latest advances. It is a platform for rapid dissemination of high-quality research / application / work-in-progress articles on IT solutions for managing challenges and problems within the highlighted scope. JAIT encourages a multidisciplinary approach towards solving problems by harnessing the power of IT in the following areas:
• Healthcare and Biomedicine - advances in healthcare and biomedicine, e.g. for fighting impending dangerous diseases; using IT to model transmission patterns and effectively manage patients' records; expert systems to help diagnosis, etc.
• Environmental Management - climate change management; environmental impacts of events such as rapid urbanization and mass migration; air and water pollution (e.g. flow patterns of water or airborne pollutants); deforestation (e.g. processing and management of satellite imagery); depletion of natural resources; exploration of resources (e.g. using geographic information system analysis).
• Popularization of Ubiquitous Computing - foraging for computing / communication resources on the move (e.g. vehicular technology); smart / 'aware' environments; security and privacy in these contexts; human-centric computing; possible legal and social implications.
• Commercial, Industrial and Governmental Applications - how to use knowledge discovery to help improve productivity, resource management, day-to-day operations, decision support, deployment of human expertise, etc. Best practices in e-commerce, e-government, IT in construction / large project management, IT in agriculture (to improve crop yields and supply chain management), IT in business administration and enterprise computing, etc., with potential for cross-fertilization.
• Social and Demographic Changes - providing IT solutions that can help policy makers plan and manage issues such as rapid urbanization, mass internal migration (from rural to urban environments), graying populations, etc.
• IT in Education and Entertainment - complete end-to-end IT solutions for students of different abilities to learn better; best practices in e-learning; personalized tutoring systems; IT solutions for storage, indexing, retrieval and distribution of multimedia data for the film and music industry; virtual / augmented reality for entertainment purposes; restoration and management of old film/music archives.
• Law and Order - using IT to coordinate different law enforcement agencies' efforts so as to give them an edge over criminals and terrorists; effective and secure sharing of intelligence across national and international agencies; using IT to combat corrupt practices and commercial crimes such as frauds, rogue/unauthorized trading activities and accounting irregularities; traffic flow management and crowd control.

The main focus of the journal is on technical aspects (e.g.
data mining, parallel computing, artificial intelligence, image processing (e.g. satellite imagery), video sequence analysis (e.g. surveillance video), predictive models, etc.), although a small element of social implications/issues could be allowed to put the technical aspects into perspective. In particular, we encourage a multidisciplinary / convergent approach based on broadly based branches of computer science for the application areas highlighted above.

Special Issue Guidelines

Special issues feature specifically aimed and targeted topics of interest contributed by authors responding to a particular Call for Papers or by invitation, edited by guest editor(s). We encourage you to submit proposals for creating special issues in areas that are of interest to the Journal. Preference will be given to proposals that cover some unique aspect of the technology and ones that include subjects that are timely and useful to the readers of the Journal. A Special Issue is typically made up of 10 to 15 papers, with each paper 8 to 12 pages in length. The following information should be included as part of the proposal:
• Proposed title for the Special Issue
• Description of the topic area to be focused upon and justification
• Review process for the selection and rejection of papers
• Name, contact, position, affiliation, and biography of the Guest Editor(s)
• List of potential reviewers
• Potential authors to the issue
• Tentative time-table for the call for papers and reviews

If a proposal is accepted, the guest editor will be responsible for:
• Preparing the "Call for Papers" to be included on the Journal's Web site.
• Distribution of the Call for Papers broadly to various mailing lists and sites.
• Getting submissions, arranging the review process, making decisions, and carrying out all correspondence with the authors. Authors should be informed of the Instructions for Authors.
• Providing us the completed and approved final versions of the papers formatted in the Journal's style, together with all authors' contact information.
• Writing a one- or two-page introductory editorial to be published in the Special Issue.

Special Issue for a Conference/Workshop

A special issue for a Conference/Workshop is usually released in association with the committee members of the Conference/Workshop, such as general chairs and/or program chairs, who are appointed as the Guest Editors of the Special Issue. A Special Issue for a Conference/Workshop is typically made up of 10 to 15 papers, with each paper 8 to 12 pages in length. Guest Editors are involved in the following steps in guest-editing a Special Issue based on a Conference/Workshop:
• Selecting a title for the Special Issue, e.g. "Special Issue: Selected Best Papers of XYZ Conference".
• Sending us a formal "Letter of Intent" for the Special Issue.
• Creating a "Call for Papers" for the Special Issue, posting it on the conference web site, and publicizing it to the conference attendees. Information about the Journal and Academy Publisher can be included in the Call for Papers.
• Establishing criteria for paper selection/rejection. Papers can be nominated based on multiple criteria, e.g. rank in the review process plus the evaluation of the Session Chairs and feedback from the Conference attendees.
• Selecting and inviting submissions, arranging the review process, making decisions, and carrying out all correspondence with the authors. Authors should be informed of the Author Instructions. Usually, the Proceedings manuscripts should be expanded and enhanced.
• Providing us the completed and approved final versions of the papers formatted in the Journal's style, together with all authors' contact information.
• Writing a one- or two-page introductory editorial to be published in the Special Issue.

More information is available on the web site at http://www.academypublisher.com/jait/.