Journal of Advances in Information Technology
ISSN 1798-2340
Volume 1, Number 3, August 2010
Special Issue: Ubiquitous Computing
Guest Editors: Neeraj Kumar Nehra and Pranay Chaudhuri
Contents
Guest Editorial
Neeraj Kumar Nehra and Pranay Chaudhuri
103
SPECIAL ISSUE PAPERS
A Location Dependent Connectivity Guarantee Key Management Scheme for Heterogeneous
Wireless Sensor Networks
Kamal Kumar, A. K. Verma, and R.B. Patel
105
Peripheral Display for Multi-User Location Awareness
Rahat Iqbal, Anne James, John Black, and Witold Poreda
116
Discrete Characterization of Domain Using Semantic Clustering
Sanjay Madan and Shalini Batra
127
GRAAA: Grid Resource Allocation Based on Ant Algorithm
Manpreet Singh
133
A Channel Allocation Algorithm for Hot-Spot Cells in Wireless Networks
Rana Ejaz Ahmed
136
JPEG Compression Steganography & Cryptography Using Image-Adaptation Technique
Meenu Kumari, A. Khare, and Pallavi Khare
141
Review of Machine Learning Approaches to Semantic Web Service Discovery
Shalini Batra and Seema Bawa
146
Special Issue on Ubiquitous Computing
Guest Editorial
Ubiquitous computing is an important and fast-growing research area, but its development is still in its infancy, although a few ubiquitous services have already been deployed in our daily lives, such as mobile audio/video streaming, mobile e-learning, and remote video surveillance. With the development of numerous interesting ubiquitous applications that interact with the outside world, this fast-growing field will emerge as a major research area in the near future.
The main aim of this special issue was to collect original research papers that present recent advances and future directions in ubiquitous environments from both theoretical and practical points of view. The issue covers all aspects of this emerging field, e.g. design, implementation, and future prospects, as well as its challenges and constraints. It contains a diverse collection of high-quality papers authored by eminent academicians and researchers in the field.
In the first paper, Kamal et al. propose LOCK, a location dependent connectivity guarantee key management scheme for heterogeneous wireless sensor networks that does not use deployment knowledge. Pair-wise, group-wise, and cluster keys are generated efficiently for the participating nodes. LOCK provides dynamicity in two ways: it does not depend completely on pre-deployed information, and it does not depend completely on location. The scheme is shown to support the largest possible network with the smallest storage overhead compared to existing key management schemes.
In the second paper, Rahat et al. describe a multi-user location-awareness system developed by following a user-centred design and evaluation approach. The authors discuss the development of a system that allows users to share informative feedback about their current geographical location. The proposed system can be used by various users, for example family members, relatives, or a group of friends, to share information related to their locations and to interact with each other.
In the next paper, Sanjay et al. propose an approach to recovering knowledge about software systems in software reengineering. The approach maps the domain to the code using information retrieval techniques and linguistic information, such as identifier names and comments in source code. The paper also introduces the concept of semantic clustering and provides an algorithm that groups source artifacts according to synonymy and polysemy. After the clusters are detected, the program code is automatically labeled based on semantic similarity and explored visually in three dimensions for discrete characterization.
In the next paper, Manpreet et al. propose resource allocation on grids using an ant colony algorithm. The major objective of resource allocation in a grid is effective scheduling of tasks and, in turn, reduction of execution time. For efficient resource allocation an ant colony algorithm is proposed, a heuristic that is well suited to allocation and scheduling in grid environments.
In the next paper, the author proposes a channel allocation algorithm for hot-spot cells in wireless networks. The proposed scheme presents a new hybrid channel allocation algorithm in which the base station sends a multi-level hot-spot notification to the central pool located at the Mobile Switching Centre (MSC) on each channel request that cannot be satisfied locally at the base station. This notification requests that more than one channel be assigned to the requesting cell, proportional to the cell's current hot-spot level. When a call using such a borrowed channel terminates, the cell may retain the channel depending upon its current hot-spot level.
In the next paper, Meenu Kumari et al. propose JPEG compression steganography and cryptography using an image-adaptation technique. The authors have designed a system that allows an average user to transfer text messages by hiding them in a digital image file using the local characteristics within the image. The method combines steganography and encryption algorithms, which provides a strong backbone for its security. The proposed system not only hides a large volume of data within an image, but also limits the perceivable distortion that might occur in the image while processing it.
In the next paper, Shalini et al. provide an exhaustive review of machine learning approaches used for Web Service discovery and of frameworks developed based on these approaches. A thorough analysis of existing frameworks for semantic discovery of Web Services is also provided.
Special Thanks
The Guest Editors would like to extend sincere thanks to all the people who have contributed their time and effort to making this special issue a success. We are thankful to all the authors who contributed their papers, and to all the reviewers for providing valuable suggestions and comments on the submitted manuscripts. We are also thankful to the Editor-in-Chief, Prof. ACM Fong, for his encouragement and strong support during the preparation of this special issue.
Guest Editors
Dr. Neeraj Kumar Nehra
Assistant Professor, School of CSE, SMVD University, Katra (J&K), India
E-mail: [email protected]; [email protected]
Dr. Pranay Chaudhuri
Professor, Department of CSE, JUIT, Waknaghat (H.P.), India
Email: [email protected]
Dr. Neeraj Kumar Nehra is working as Assistant Professor in the School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra (India). He received his Ph.D. in CSE from Shri Mata Vaishno Devi University, Katra (India) and a PDF from the UK. He has more than 30 publications in reputed journals and conferences including IEEE, Springer, and ACM. His research is focused on mobile computing, parallel/distributed computing, multi-agent systems, service-oriented computing, and routing and security issues in wireless ad hoc, sensor, and mesh networks. He leads the Mobile Computing and Distributed System Research Group. Prior to joining SMVDU, Katra, he worked with HEC Jagadhri and MMEC Mullana, Ambala, Haryana, India. He has delivered invited talks and lectures at various IEEE international conferences in India and abroad, and has organized special sessions at international conferences in his areas of expertise. He serves on the TPCs of various IEEE-sponsored conferences in India and abroad, and on the review/editorial boards of various journals, e.g. Journal of Supercomputing (Springer), International Journal of Network Security (IJNS), Journal of Emerging Trends in Web Intelligence, Journal of Advances in Information Technology, IJCA, and many more. He is a senior member of ACEEE and IACSIT.
Prof. Pranay Chaudhuri has been Head of the Department of Computer Science, Mathematics and Physics, University of the West Indies, which he joined in June 2000 as Professor of Computer Science. Prior to that, he held faculty positions at the Indian Institute of Technology, James Cook University of North Queensland, the University of New South Wales, and Kuwait University, and he is currently a professor at Jaypee University of Information Technology, India. Professor Chaudhuri's research interests include parallel and distributed computing, grid computing, self-stabilization, and graph theory. In these areas he has published extensively in leading international journals and conference proceedings. He is also the author of the book Parallel Algorithms: Design and Analysis (Prentice-Hall, 1992). Professor Chaudhuri is the recipient of several international awards for his research contributions.
A Location Dependent Connectivity Guarantee
Key Management Scheme for Heterogeneous
Wireless Sensor Networks
Kamal Kumar
M.M. Engineering College, Mullana, Ambala, Haryana, India
[email protected]

A. K. Verma
Thapar Institute of Engineering and Technology, Patiala, Punjab, India
[email protected]

R. B. Patel
M.M. Engineering College, Mullana, Ambala, Haryana, India
[email protected]
Abstract – Wireless sensor networks pose new security and privacy challenges. One of the important challenges is how to bootstrap secure communications among nodes. Several key management schemes have been proposed; however, they either cannot offer strong resilience against node capture attacks, or require too much memory to achieve the desired connectivity. In this paper, we propose a LOcation dependent Connectivity guarantee Key management scheme for heterogeneous wireless sensor networks (LOCK) that does not use deployment knowledge. In our scheme, a target field is divided into hexagonal clusters using a new clustering scheme crafted out of the nodes' heterogeneity. Even without deployment knowledge, we drastically reduce the number of keys to be stored at each node. Pair-wise, group-wise, and cluster keys can be generated efficiently among nodes. LOCK provides dynamicity in two ways: it does not depend completely on pre-deployed information, and it does not depend completely on location. Compared with existing schemes, ours achieves higher connectivity with a much lower memory requirement. It also outperforms other schemes in terms of resilience against node capture and node replication attacks, and is shown to support the largest possible network with the smallest storage overhead compared to existing key management schemes.
Index Terms – Deployment, Heterogeneous, Connectivity, Geographical Group

I. INTRODUCTION

Wireless sensor networks (WSNs) are commonly used in ubiquitous and pervasive applications such as military, homeland security, health-care, and industry automation. WSNs consist of numerous small, low-cost, independent sensor nodes, which have limited computing and energy resources. Secure and scalable WSN applications require efficient key distribution and key management mechanisms.
These systems have traditionally been composed of a large number of homogeneous nodes with extreme resource constraints. This combination of austere capabilities and physical exposure makes security in sensor networks an extremely difficult problem. Because traditional asymmetric encryption is not practical in this environment, a number of clever symmetric-key management schemes have been introduced. One well-received solution that has been extended by several researchers is to pre-distribute a certain number of randomly selected keys in each of the nodes throughout the network [9], [4], [7], [16]. Using this approach, one can achieve a known probability of connectivity within a network. These previous efforts have assumed a deployment of homogeneous nodes and have therefore suggested a balanced distribution of random keys to each of the nodes to achieve security. Likewise, the analysis of those solutions relies on assumptions specific to a homogeneous environment. A deviation from the homogeneous system model has been increasingly discussed in the research community. Instead of assuming that sensor networks are comprised entirely of low-ability nodes, a number of authors have started exploring the idea of deploying a heterogeneous mix of platforms and harnessing the available "microservers" for a variety of needs. For example, Mhatre et al. [1] automatically designate nodes with greater inherent capabilities and energy as cluster heads in order to maximize network lifetime. Traynor et al. [32] extend this idea to mobile groups by having a more powerful node perform group handoffs for neighboring sensors.
In this paper, we propose LOCK, which does not use deployment knowledge. In our scheme, a target field is divided into hexagonal clusters using a new clustering scheme crafted out of the nodes' heterogeneity. Even without deployment knowledge, we drastically reduce the number of keys to be stored at each node. Pair-wise, group-wise, and cluster keys can be generated efficiently among nodes. LOCK provides dynamicity in two ways: it does not depend completely on pre-deployed information, and it does not depend completely on location. The rest of the paper is organized as follows. Section II describes the network elements, and Section III the network deployment and clustering approach. Section IV presents LOCK, and Section V discusses security and performance. Section VI concludes the paper.
II. NETWORK ELEMENTS
Basically, two architectures are available for wireless networks: the distributed flat architecture and the hierarchical architecture. The former has better survivability, since it has no single point of failure; the latter provides simpler network management and can help further reduce transmissions. WSNs are distributed event-driven systems that differ from traditional wireless networks in several ways, such as extremely large network size, severe energy constraints, redundant low-rate data, and many-to-one flows. It is clear that in many sensing applications connectivity between all Sensor Nodes (SNs) is not required, although some applications do require explicit connectivity between every pair of nodes. Mostly, wireless SNs merely observe and transmit data to nodes with better routing and processing capabilities, and do not share data among themselves. Data-centric mechanisms should be employed to aggregate redundant data in order to reduce the energy consumption and traffic load in WSNs (out of the scope of our proposal). Therefore, the hierarchical heterogeneous network model has more operational advantages than the flat homogeneous model for WSNs, given their inherent limitations on power and processing capabilities [11][12][13][8]. Moreover, the recent trend is towards secure connectivity between geographically neighboring nodes. This requires a group key, i.e. a symmetric key shared among a group of neighboring nodes.
In this paper, we focus on large-scale WSNs with the same three-tier hierarchical architecture as in [2], [3]. SNs are divided into two categories, H-Sensors and L-Sensors. H-Sensors are a small number of SNs possessing more memory, longer transmission range, multiple transmission ranges, more processing power, and longer battery life. Our network model has four kinds of wireless devices, classified by functionality: the sink node/base station (BS), cluster head nodes (CH), Anchor Nodes (AN), and sensor nodes (SN).
Sensor node (SN): Sensor nodes are L-Sensors, which are inexpensive, limited-capability, generic wireless devices. Each SN has limited battery power, memory size, and data processing capability, and a short radio transmission range. An SN communicates with its CH, other SNs, and the SINK.
Cluster head node (CH): Cluster head nodes are a kind of H-Sensor and have considerably more resources than SNs. Equipped with high-power batteries, large memory storage, powerful antennas, and data processing capacity, a CH can execute relatively complicated numerical operations and has a much longer radio transmission range than an SN. CHs can communicate with each other directly and relay data between their cluster members and the sink node (base station). An SN which needs to communicate with neighbors in a neighboring cluster relays its data through the CHs.
Anchor Nodes (AN): Anchor Nodes are a kind of H-Sensor with multiple power levels for transmission. Thus ANs can transmit over multiple ranges, which can be changed on demand. ANs are placed at triangular/hexagonal points to realize a new clustering approach, which divides the nodes into clusters of hexagonal shape. This approach makes our scheme location dependent, yet without using deployment knowledge.
Sink node/base station (SINK/BS): The sink node is the most powerful node in a WSN; it has virtually unlimited computational and communication power, unlimited memory storage capacity, and a very large radio transmission range that can reach all the SNs in the WSN. The sink node can be located either in the center or at a corner of the network, depending on the application.
In our network model, a large number of SNs are randomly distributed in an area. A sink node/base station (BS) is located in a well-protected place and takes charge of the whole network's operation. After the deployment, CHs partition the WSN into several distinct clusters using the clustering algorithm discussed below. Each cluster is composed of a CH and a set of SNs (distinct from other sets). SNs monitor the surrounding environment and transmit the sensed readings to their respective CH for relay. SNs may use a multi-hop or single-hop communication pattern for communication with the CHs.
III. NETWORK DEPLOYMENT AND CLUSTERING APPROACH
SNs are large in number and have limited capabilities.
SNs are deployed randomly in the field for deployment
like can be dropped from an aircraft. ANs are placed
uniformly and in controlled manner using a manned or
unmanned deployment vehicle which is equipped with
GPS system to connect with satellite to retrieve exact
location for
ANs . Using hexagonal/triangular
deployment of ANs in the deployment field the network
deployment
field
is
roughly
divided
into
hexagonal/triangular field using multiple transmission
power levels of ANs . As shown in the Fig. 1 the lines in
dark are transmission radius of ANs placed at triangular
points. The higher is the transmission level larger is the
transmission radius. For sake of convenience we
approximated and drawn arc shaped lines by straight
lines and thus resulting each field is subdivided into
approximately triangular cells. Depending upon the
Depending upon the number of ANs whose transmission ranges cover a triangle completely, the SNs in that triangular cell receive the equivalent number of nonces, considering that each transmission level of an AN transmits an entirely different nonce. For example, nodes in the blue cluster receive selected nonces: all of them from AN_5, and N_65 and N_66 from AN_6; SNs in other cells of the same cluster receive nonces depending upon their location in the field.

Figure 1: Hexagonal deployment of ANs and the resultant hexagonal clusters. For convenience the circular arcs are approximated as straight lines. Transmission ranges from closely placed Anchor Nodes at the six corners intersect with each other, resulting in triangular cells. Adjoining cells may be joined to give hexagonal clusters, each managed by a cluster head.

Further, adjoining neighboring triangular cells form a cluster, and each cluster is administered by a CH. This step is followed by another controlled deployment, using the same GPS-equipped vehicle, of the H-Sensors which will work as CHs. Considering the placement of nodes shown in Fig. 1, AN_1, AN_2, AN_3, AN_4, AN_5, and AN_6 are able to transmit at different power levels and thus over multiple ranges. We assume in Fig. 1 that the Anchor Nodes can transmit at six power levels.
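As a rough illustration of this reception rule, the sketch below computes which nonces a node would hear; the coordinates, radii, and data layout are illustrative assumptions, not details from the paper. A node inside the level-i radius of an AN is also inside every larger radius, matching the type-i/type-j observation made later in Section V.

```python
import math

def nonces_received(node_pos, anchors, level_radii, nonces):
    """Collect the nonces a node hears; assumes each AN broadcasts a
    distinct nonce per power level, and that a node within the level-i
    radius is within every larger radius (radii sorted ascending)."""
    heard = []
    for an_id, an_pos in anchors.items():
        d = math.dist(node_pos, an_pos)
        for level, radius in enumerate(level_radii, start=1):
            if d <= radius:                 # reached by this power level
                heard.append(nonces[(an_id, level)])
    return heard

# Example: two ANs with six power levels each (all values illustrative)
anchors = {"AN5": (0.0, 0.0), "AN6": (80.0, 0.0)}
radii = [20, 40, 60, 80, 100, 120]
nonces = {(a, l): f"n_{a}_{l}" for a in anchors for l in range(1, 7)}
print(nonces_received((30.0, 0.0), anchors, radii, nonces))
```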
IV. LOCK
A. Underlying Approach
In existing key pre-distribution schemes, two communicating sensors either use one or some of their shared pre-loaded keys directly as their communication key [15][9], or compose a pairwise key from their pre-loaded secret shares. Although this kind of mechanism has low computational overhead, it can lead to a serious security threat in practice. If some SNs are captured after deployment, an adversary may crack some or even all the communication keys in the network from the compromised keys or secret shares. This node capture attack is the main threat to a key pre-distribution scheme. To address the limitations of existing key pre-distribution schemes, we propose to incorporate location dependence into pre-distribution. Our proposal allows each pair of neighboring SNs to have a unique pairwise key between them, which cannot be derived from the pre-loaded setup keys by other nodes. An adversary cannot crack the pairwise keys among non-captured SNs, even if some SNs are captured and their stored key information is compromised. Therefore, compromise of any SN cannot affect the communication between non-compromised SNs.
B. Procedures in LOCK
Our proposed LOCK scheme has two phases:
(a) the setup key assignment phase, and
(b) the location dependent key generation phase.
The key generation phase includes the generation of group, cluster, and pair-wise keys between nodes. An off-line authority center called the SINK is in charge of the initialization of the SNs in LOCK. Before deployment, each sensor node is assigned a unique ID generated by the SINK. Each sensor node is also assigned the IDs of two CHs, which provide part of the information required to generate a pair-wise key with its post-deployment CH. The SINK also generates a large key pool P composed of more than 2^20 distinct symmetric keys. For each sensor node SN_i, the SINK randomly selects a secret key from P and stores it in SN_i's memory; this pre-loaded key, denoted k_{SN_i-SINK}, is the shared pairwise key between node SN_i and the sink node, and is used to encrypt the data exchanged between SN_i and the SINK.
Setup Key Assignment Phase: Before the SNs are deployed, setup keys need to be pre-loaded into them in a way that ensures any two nodes can find some common keys after deployment. Each sensor node is also assigned the IDs of two CHs, which provide part of the information required to generate a pair-wise key with its post-deployment CH. For each SN_i, the SINK randomly selects some keys from P and pre-loads them into the intended SN's memory; in our scheme this pre-loaded information is called the network setup keys. Besides these, a common key K is pre-loaded as a common setup key into the memory of each SN. To ensure that any two SNs share some keys after deployment depending upon their location, beside the common key K we use a simple but efficient setup key assignment method for heterogeneous wireless sensor networks (HWSNs).
Suppose there are n SNs in the network. First, the SINK randomly selects n distinct keys from the key pool P and constructs a two-dimensional (m × m) matrix M, where m = √n. Fig. 2 illustrates an example of the constructed key matrix M, in which each entry is a symmetric key with a unique two-dimensional id denoted k_{i,j} (i, j = 1, 2, ..., m). For convenience, we use R_i and C_j (i, j = 1, 2, ..., m) to represent the ith row and the jth column of M, respectively. An equivalent representation of the matrix M is given in Fig. 4, where the nodes in black represent the diagonal entries of the matrix, which are also the roots of Dual Skewed Hash Binary Trees (DHBTs), a modification of the Hash Binary Tree of Fig. 3. A root can be used to derive its node's left-skewed and right-skewed branches; e.g., k_{3,3} can be used to derive row 3 completely, and similarly for all the diagonal elements of the matrix M. Besides these, each SN is informed of N = 2t index values, with t (1 ≤ t ≤ m), of which t values represent row numbers and the remaining t represent column numbers, assigned to the SNs by their post-deployment CH (the deployment of CHs is discussed in the previous section). Before the complete rows of the matrix can be generated, the key matrix must be customized with respect to the SN's location. To perform this customization and make the scheme location dependent, the diagonal elements of an SN in a cluster are brought into conformance with its administering cluster and corresponding geographical location: the CH computes the common content of the broadcasts received by all constituent cells and sends this common broadcast vector to each node in its cluster, either as a plain broadcast message or encrypted under the cluster key.
Equation (1) is used to customize the diagonal elements in M, where K^j_{i,i} is the customized diagonal element of the ith row and ith column with respect to the location of the jth cluster. COMM_j is the common content received by each SN in the jth cluster, defined as COMM_j = K_1 ⊕ K_2 ⊕ K_3 ⊕ ..., where K_1, K_2, etc. are nonces/keys shared by all the nodes in the jth cluster (CH_j). COMM_j is a vector and is communicated by the cluster head to each node in its broadcast.

K^j_{i,i} = H_{COMM_j}(K_{i,i})    (1)

Each node is now provided with localized keys representing the diagonal elements of the key matrix. Next we use a Dual Skewed Hash Binary Tree (DHBT), in which the left or right branch is generated by hashing the left-shifted or right-shifted value of its parent; the diagonal elements are the roots of these DHBTs. Applying the procedure repeatedly generates the complete key matrix M, where the hash function is assumed to be hardwired in each SN.
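To make the derivation concrete, here is a minimal sketch of the customization (Equation (1)) and the DHBT row expansion. The paper does not specify the hash function, the shift width, the keying convention, or which shift direction maps to which branch, so SHA-256, one-bit shifts, concatenation-keying, and the direction mapping below are all assumptions.

```python
import hashlib

def HASH(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def shift(v: bytes, direction: int) -> bytes:
    """One-bit left (+1) or right (-1) shift, kept at a fixed byte width."""
    n = int.from_bytes(v, "big")
    n = (n << 1) if direction > 0 else (n >> 1)
    return n.to_bytes(len(v) + 1, "big")[-len(v):]

def customize(k_ii: bytes, comm_j: bytes) -> bytes:
    """Equation (1): K^j_{i,i} = H_COMM_j(K_{i,i}); the keyed hash is
    assumed to be Hash(COMM_j || K_{i,i})."""
    return HASH(comm_j + k_ii)

def derive_row(diag_key: bytes, i: int, m: int) -> dict:
    """Grow row i from its (customized) diagonal root, as in the DHBT of
    Figs. 3-4: entries to the right via Hash(leftShift(.)), entries to
    the left via Hash(rightShift(.))."""
    row = {i: diag_key}
    for j in range(i + 1, m + 1):           # right-skewed branch
        row[j] = HASH(shift(row[j - 1], +1))
    for j in range(i - 1, 0, -1):           # left-skewed branch
        row[j] = HASH(shift(row[j + 1], -1))
    return row

# e.g. derive row 3 of a 6 x 6 matrix from its localized diagonal key
row3 = derive_row(customize(b"\x01" * 16, b"cluster-j-common"), i=3, m=6)
```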
Now consider the network setup keys pre-loaded in an SN when t = 2. In our case, each SN stores only m keys in its memory instead of m^2. This is the alternative to storing t rows and t columns, i.e. 2 × t × m values [10]. This is where our scheme performs better in terms of memory requirements, as it requires only m keys in memory; for higher values of t the saving is even greater, since the memory requirement of [10] shoots up with t, and thus our scheme offers a memory-efficient approach for establishing pair-wise keys in HWSNs. Any two SNs share at least 2t^2 common keys in their memories; therefore, our setup key assignment, which is deployment-knowledge independent yet location dependent compared to the procedure in [10], still guarantees the connectivity between any two nodes in the network. Compared to the scheme proposed in [6], our scheme ensures 100 percent connectivity among the nodes of the WSN. So, compared with existing key pre-distribution schemes, our approach is the first to support full network connectivity without any prior deployment information, no matter how the SNs are deployed, and it offers higher memory and computational efficiency; these are the main contributions of our proposed scheme.
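A quick back-of-the-envelope check of the storage claim, with illustrative values for n and t (the paper fixes neither):

```python
import math

n = 10_000                    # network size (illustrative)
m = math.isqrt(n)             # matrix dimension, m = sqrt(n) = 100
t = 4                         # row/column indices assigned per node

keys_lock = m                 # LOCK: m diagonal keys plus 2t small indices
keys_rows_cols = 2 * t * m    # storing t full rows and t full columns, as in [10]
print(keys_lock, keys_rows_cols, 2 * t * t)   # 100 vs. 800; 32 shared keys
```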
Key Generation Phase: This phase includes the procedures for generating the inter-cluster, administrative, cluster, and pair-wise symmetric keys.
Inter-Cluster Key Establishment (K_{CH_a-CH_b}): Each node is assigned a node ID by the SINK. Provided CH_a and CH_b are the participating cluster heads, the CHs can generate the pair-wise key between them using (2), where sh_1 and sh_2 are shares of the symmetric keys exchanged between the participating CHs:

K_{CH_a-CH_b} = H_K(sh_1 ⊕ sh_2)    (2)
Administrative Key (K_{CH_i-SN_i}) Generation: The nodes are preloaded with a symmetric key, i.e. K_{SN_i-CH_i}, which can be used directly. The CH has to construct this pair-wise symmetric key using the information stored in the SN. Each SN is provided with the IDs of two CHs. These IDs are sent to the CH of the parent cluster, which receives the shares k_1 and k_2 from the two CHs whose IDs were sent to it by the SN. Equation (3) is used to set up K_{CH_i-SN_i}, where K is the preloaded common setup key:

K_{CH_i-SN_i} = H_K(k_1 ⊕ k_2)    (3)
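Equations (2) and (3) follow one pattern: a keyed hash over the XOR of two exchanged shares. A minimal sketch, with HMAC-SHA256 standing in for the unspecified H_K:

```python
import hashlib, hmac

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def H_K(k: bytes, msg: bytes) -> bytes:
    """Keyed hash H_K; HMAC-SHA256 is an assumption, the paper leaves H open."""
    return hmac.new(k, msg, hashlib.sha256).digest()

def inter_cluster_key(K: bytes, sh1: bytes, sh2: bytes) -> bytes:
    return H_K(K, xor_bytes(sh1, sh2))      # equation (2)

def administrative_key(K: bytes, k1: bytes, k2: bytes) -> bytes:
    return H_K(K, xor_bytes(k1, k2))        # equation (3)
```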
Figure 2: Setup key Matrix and Keys Assignment
Figure 3: Hash Binary Tree. S(1,0) is obtained as Hash(leftShift(S(0,0))); similarly, S(1,1) is obtained as Hash(rightShift(S(0,0))). The complete HBT can be obtained in this manner up to the required height.
Figure 4: Dual Skewed Hash Binary Tree Representation of Key Matrix K
Pairwise Key Generation Phase: To secure the communication between two neighboring nodes, any SN needs to generate a pairwise key with each of its one-hop neighbors after the deployment.
In our proposed scheme, the pairwise key generation phase has three steps. First, node SN_i randomly selects l (1 < l < t) rows and l columns from its stored setup keys, and generates a random nonce rn_i. Then, node SN_i broadcasts a handshaking message including its node ID_i, the random nonce rn_i, and the indices of its selected rows and columns to its one-hop neighbors. After two neighboring nodes have exchanged handshaking messages, they can generate a pairwise key using their shared setup keys and the random nonces. To explain the procedure clearly, we use an example to illustrate how two communicating nodes generate a pairwise key between them. Suppose nodes SN_a and SN_b are two neighboring SNs after the deployment. As shown in Fig. 2, SN_a has the indices of the 3rd and 6th columns and the 1st and 4th rows of key matrix K pre-loaded in its memory, while SN_b has the indices of the 1st and 4th columns and the 3rd and 6th rows. To establish a pairwise key between the nodes under consideration, SN_a first generates a random nonce rn_a and broadcasts the handshaking message {SN_a, R_1, R_4, C_3, C_6, rn_a} to node SN_b.
Similarly, SN_b generates a random nonce rn_b and broadcasts {SN_b, R_3, R_6, C_1, C_4, rn_b}. After exchanging their handshaking messages, node SN_a obtains rn_b as well as the indices of the setup keys it shares with SN_b, namely {k_{1,1}, k_{1,4}, k_{3,3}, k_{6,3}, k_{4,1}, k_{4,4}, k_{3,6}, k_{6,6}}, which are the intersections of the corresponding key rows and columns. Node SN_b likewise obtains rn_a and the setup keys shared with SN_a. Now, nodes SN_a and SN_b can calculate a pairwise key between them by Equation (4):

pk_{SN_a-SN_b} = rn_a ⊕ k_{1,1} ⊕ k_{1,4} ⊕ k_{3,3} ⊕ k_{6,3} ⊕ k_{4,1} ⊕ k_{4,4} ⊕ k_{3,6} ⊕ k_{6,6} ⊕ rn_b    (4)

In (4), ⊕ is the exclusive-or operator, pk_{SN_a-SN_b} denotes the pair-wise key between nodes SN_a and SN_b, and rn_a and rn_b are the two random nonces generated by SN_a and SN_b respectively.
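The index intersection and Equation (4) can be sketched as follows; the helper names are illustrative, and the example reproduces the Fig. 2 scenario.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def shared_entries(rows_a, cols_a, rows_b, cols_b):
    """The 2t^2 intersections: A's rows x B's columns plus B's rows x A's columns."""
    return sorted([(i, j) for i in rows_a for j in cols_b] +
                  [(i, j) for i in rows_b for j in cols_a])

def pairwise_key(rn_a: bytes, rn_b: bytes, shared_keys):
    """Equation (4): XOR of both nonces with all shared matrix entries."""
    return reduce(xor_bytes, shared_keys, xor_bytes(rn_a, rn_b))

# Fig. 2 example: SN_a holds rows {1,4}, columns {3,6}; SN_b rows {3,6}, columns {1,4}
print(shared_entries({1, 4}, {3, 6}, {3, 6}, {1, 4}))
# [(1, 1), (1, 4), (3, 3), (3, 6), (4, 1), (4, 4), (6, 3), (6, 6)]
```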
In LOCK, each SN stores the m diagonal keys of the constructed matrix M together with t row and t column indexes. Since each pair of row and column has an intersection entry, any two SNs can find 2t^2 common keys after they exchange handshaking messages. This means that any two SNs which are members of the same cluster and within radio transmission range of each other can directly set up a secure link without a third node's participation. In other words, the path-key establishment phase of existing key pre-distribution schemes is eliminated in our approach, which not only reduces the communication overhead but also increases the security level of the generated pairwise keys. On the other hand, since each generated pairwise key is distinct from the others, LOCK improves the network resilience against node capture attacks. Further, customizing the diagonal elements to a cluster strengthens the resilience against node capture, since the same key is never used outside the cluster head's transmission range.
Geographical Group Key Generation (k_GoG): Sensor nodes in the same geographical group, i.e. the same triangular cell, can construct a group key k_GoG using the broadcasts received from the ANs and the membership information obtained from the CHs, as follows:

K_GoG = H_{K_CH_i}(k_1^1, k_1^2, ..., k_i^j, ..., list_of_IDs)    (5)

where the k_i^j are the keys broadcast from AN_i transmitting at the jth power (transmission) level, and list_of_IDs is a unique value obtained as the XOR of the IDs of the nodes residing in a cell, as defined in (6):

list_of_IDs = ID_{i,1} ⊕ ID_{i,2} ⊕ ... ⊕ ID_{i,m}    (6)

where ID_{i,j} is the jth node's ID in the ith cell, assigned at the pre-deployment stage by the SINK. list_of_IDs is securely sent to the SNs using the pair-wise symmetric key K_{CH_i-SN_i}.
Cluster Key Generation (K_{CH_i}): Equation (7) can be used to generate K_{CH_i}:

K_{CH_i} = H_K(COMM_i)    (7)

where COMM_i = K_1 ⊕ K_2 ⊕ K_3 ⊕ ..., K_1, K_2, etc. are keys shared by all the nodes in the ith cluster (CH_i), and K is pre-deployed in the SNs as described earlier. For subsequent uses the common key is replaced by K_{CH_i}, as K is deleted once bootstrapping is over.
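A combined sketch of Equations (5)-(7); the serialization of the broadcast keys and the use of HMAC-SHA256 for the keyed hashes are assumptions, since the paper leaves both unspecified.

```python
import hashlib, hmac
from functools import reduce

def xor_all(values):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), values)

def group_key(k_ch_i: bytes, an_keys, member_ids) -> bytes:
    """Equations (5)-(6): hash keyed with the cluster key over the AN
    broadcast keys and the XOR of the member IDs in the cell."""
    list_of_ids = xor_all(member_ids)                    # equation (6)
    return hmac.new(k_ch_i, b"".join(an_keys) + list_of_ids,
                    hashlib.sha256).digest()             # equation (5)

def cluster_key(K: bytes, shared_keys) -> bytes:
    """Equation (7): K_CH_i = H_K(COMM_i), with COMM_i the XOR of the
    keys shared by all nodes of the ith cluster."""
    return hmac.new(K, xor_all(shared_keys), hashlib.sha256).digest()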
V. SECURITY ANALYSIS AND PERFORMANCE EVALUATION
We analyze the security properties and evaluate the performance of our proposed LOCK scheme in this section.
A. Security Analysis
Node Replication Attack: Because of the unattended
mode operation, some SNs could be physically captured
by an adversary during the operating period. Node replication is thus a severe threat for WSNs, owing to their infrastructure-less architecture. In [9], the pair-wise keys are used directly from the pre-loaded keys. After the network bootstrapping phase, if an SN is captured and all its stored keys are compromised, the adversary can create duplicate malicious nodes and deploy them into the network to execute attacks such as eavesdropping, Denial-of-Service (DoS), etc.
In LOCK, the keys are not the same throughout the operational life of an SN. The cluster key is updated as and when needed using the most recent broadcast from the ANs. The geographical group key is updated using the new remaining list of nodes from the cluster head and a new broadcast from the ANs. The diagonal entries of the matrix are customized to the corresponding cluster using the common part of the broadcast received by the nodes in the cluster head's coverage range. Moreover, any pair of SNs has a unique pairwise key after the network initialization phase, which can be used to mutually authenticate the communicating parties. Without proper authentication, any stranger's packets are simply ignored. Consequently, node replication attacks can be completely prevented by our proposed scheme.
Resiliency against Node Capture Attack: An adversary can physically capture some SNs to compromise their secret information. Node capture is the most serious threat in WSNs: the communication between non-captured nodes can be cracked even though they are not physically captured. In [15], if each SN stores 200 keys in its memory and the probability that any two nodes share at least one common key is 0.33, the capture of 50 nodes can compromise 10% of the communication among the non-captured nodes. Although [9] claims that the network resilience can be improved by requiring two nodes to share at least q (q > 1) common keys to establish a secure link, this only works while the number of captured nodes is below a critical value. When the number of captured nodes exceeds that value, the fraction of compromised communication among non-captured nodes increases at an even faster rate than in [15]. In LOCK, after the pairwise key generation phase each pair of neighboring nodes has a unique pairwise key, hence no node's capture can affect the secure communication between non-captured nodes. In other words, our approach guarantees the communication security among non-captured nodes no matter how many SNs are captured by the adversary, which is one of the main contributions of our work. Fig. 5 shows that over 30% of the communication between non-captured nodes is compromised in [9] when 200 nodes are captured; if the number of captured nodes increases to 500, more than 60% of the communication of the rest of the network is compromised. By contrast, no communication between non-captured nodes can be compromised in LOCK, no matter how many SNs are captured by the adversary. Fig. 6 shows the cluster sizes LOCK supports.
As a result of location dependence, the network size supported is much larger than under any existing key management scheme. If we assume the number of clusters is 7 and the size of the network is almost 7 times the cluster size, the supported network size is as drawn in Fig. 6. In LOCK, the maximum supported cluster size increases exponentially as the key ring size increases linearly, which means our proposed scheme has better scalability than any existing key pre-distribution scheme to date. Consider the network size that can be supported in our deployment: the random key pre-distribution scheme can support a network of about 200 nodes using 50 keys per node, and q-composite [4] key distribution is not much better.
In LOCK the same matrix is localized per cluster, so the expression which earlier bounded the size of the network now bounds only the size of a cluster. Moreover, because we propose storing only the diagonal entries, the memory requirements are lower still.
Network Connectivity: Random key pre-distribution schemes cannot guarantee that any two SNs establish a pairwise key directly. To increase the network connectivity, intermediate nodes must be involved in a path-key establishment procedure. Even so, by probability theory, some SNs or some portions of the network may still be isolated if no path-keys can be established.
LOCK can guarantee complete network connectivity, since any two SNs can find common setup keys between them, which is the second contribution of our work. Fig. 7 shows that LOCK can generate a connected network with only a one-hop exchange of neighbor information. For random key pre-distribution schemes, neighbors two or three hops away must be involved to set up an almost connected network, which not only reduces the security of the established pairwise keys but also produces more communication overhead in the network.
Communication Overhead: In random key pre-distribution schemes, each SN exchanges all of its stored key information with its neighbors. For a large-scale network, substantial communication and memory storage overheads are produced in this procedure. LOCK guarantees that any two nodes can establish a pairwise key directly; therefore, its communication overhead is much lower than that of the previous schemes.
C. Performance Reviewed in the Light of Multiple Transmission Levels of Anchor Nodes
The scheme proposed in the previous sections supports all three types of keys, namely the cluster key, pair-wise key, and group key. The key refresh mechanism for the cluster and pair-wise keys is achieved with the help of periodic or event-based broadcasts from the ANs; the broadcast information is used by the nodes in a cluster to generate the cluster and pairwise keys. To measure the effect of the number of power levels and the broadcast radius, we re-examine the performance of the scheme under various configurations and analyze the effect on memory and connectivity.
Figure 5: Fraction of communication compromised vs. number of compromised nodes, comparing random pre-distribution key management and LOCK.

Figure 6: Cluster size supported vs. number of keys per node.

Figure 7: Connectivity ratio with same-cluster neighbors vs. number of hops needed for pair-wise keys, comparing LOCK and random key pre-distribution (k = 50).
We start by investigating the expected number of keys stored on each sensor node when using LOCK. This gives a measure of the memory capacity every sensor must devote to LOCK during refresh.
Location dependence measurements: The number of keys stored on a sensor node is only momentarily contributed to by the number of messages the node receives from the various ANs, and almost completely by the number of generation keys stored on each sensor node. Starting with the memory required by the former factor: each message contains a nonce, which is used to form or customize the pre-distributed keys with respect to the node's location in the deployment field, to derive a key for the geographical group, and to derive a cluster key. After these uses the nonces obtained by the node are deleted. Hence these keys are stored only momentarily and deleted thereafter, so receiving broadcasts from the ANs causes no major memory consumption at the individual nodes. If we assume the memory consumption is contributed equally by the former factor, then we need to determine the expected number of messages received by a sensor node. To do this we divide the messages transmitted by each AN into N_P categories, where N_P is the number of power levels on each AN. The messages transmitted at the ith power level are called type-i messages; type-1 messages correspond to the lowest power level and type-N_P messages to the highest. Therefore, when a sensor node receives type-i messages it also receives messages of type j for all j ≥ i.
E[N] = Σ_{j=1}^{N_P} Σ_{i=1}^{N_a} i (N_P − j + 1) · (ρ_L π(R_j² − R_{j−1}²))^i · e^{−ρ_L π(R_j² − R_{j−1}²)} / i!    (8)
Equation (8), as derived in [5], gives the expected number of nonces a node receives, summed over the power levels. Here R_j is the outer radius of the annulus centered on the SN under consideration and R_{j−1} its inner radius, ρ_L is the density of AN deployment, N_P is the number of power levels, and N_a is the total number of anchor nodes in the network; the area of the annulus is A_a = π(R_j² − R_{j−1}²). All nonces broadcast from ANs at higher transmission levels are also received. This determines the dependence on the average number of keys at each node. We further assume a maximum transmission radius R_{N_P} for an AN, and want to determine the number of sub-keys needed to ensure a high degree of location dependence. Given this equation, we analyze the effect of the number of power levels, and thus the measure of location dependence, and thereby reduce the memory requirement at the pre-distribution stage.
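Equation (8) is easy to evaluate numerically; the parameter values in the sketch below are purely illustrative, as the paper reports none.

```python
import math

def expected_nonces(rho_L, radii, N_a):
    """Evaluate equation (8): for each power level j, weight the nonce
    count by the Poisson probability of i ANs lying in the annulus
    between R_{j-1} and R_j (lambda = rho_L * annulus area)."""
    N_P = len(radii)
    R = [0.0] + list(radii)              # R_0 = 0
    total = 0.0
    for j in range(1, N_P + 1):
        lam = rho_L * math.pi * (R[j] ** 2 - R[j - 1] ** 2)
        for i in range(1, N_a + 1):
            total += i * (N_P - j + 1) * lam ** i * math.exp(-lam) / math.factorial(i)
    return total

# Illustrative: AN density 1e-4 per m^2, six level radii, 20 ANs in total
print(expected_nonces(1e-4, [50, 100, 150, 200, 250, 300], N_a=20))
```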
For lower values of R_{N_P} the degree of location dependence is very high; it falls to lower levels for higher values of R_{N_P}. This is because with increased R_{N_P} the same AN covers more nodes, so the probability that neighbors share the same diagonal contents increases, which lowers location dependence. We can still achieve the desired connectivity, but the compromise ratio rises as a result. To achieve a connectivity ratio of 1 and a low compromise ratio, we must increase the size of the matrix to achieve the desired uniqueness in row and column assignment.
The memory requirement depends on the number of power levels. The reason is that with even a single power level, a node in the coverage of an AN receives one nonce/subkey; the pre-distributed and subsequently customized contents are then the same throughout, which requires a large matrix and thus a higher memory requirement at each SN to achieve the desired connectivity at a low compromise ratio.
With only one power level, the impact of compromised nodes is very severe. To limit the effect on the compromise ratio we need to increase the size of the matrix, and thus the memory requirement at each SN. To achieve uniqueness in the pre-distributed information we need an extremely large key matrix. Thus the memory requirement is heavily dependent upon the number of power levels. This is because, when using a single power level, any node in the transmission range of an AN knows all the secrets transmitted by that AN. When the number of power levels increases for the same value of R_{N_P}, the number of secrets of the ANs known to a sensor node depends upon the distance of the sensor from the ANs of interest.
Consider the case where all the intermediate power levels are eliminated. Then a sensor in any region receives all the secrets from all the transmitting anchor nodes, and the compromise of any node in the region jeopardizes the communication of every other sensor node in the network unless we increase the matrix size. On the other hand, with three power levels per AN, the nodes in a region do not receive all the secrets from an AN; in that case the compromise of a node jeopardizes fewer secure links between non-compromised nodes. This effect is attributed to the fact that, as the number of power levels increases, the degree of location dependence increases, reducing the number of SNs in each cell and thus in each cluster. This reduces the size of the matrix many-fold, in proportion to the number of clusters obtained by sectoring the network deployment field. This factor continues to lower the memory requirement until each node is in a separate cell. We can increase the number of levels up to the point where there are at least two
nodes in each cell. Even at a high degree of location dependence, the connectivity ratio is 1 for every node in the same cluster. Beyond the threshold at which each SN is in a separate cell, this factor no longer affects the compromise ratio but reduces the connectivity ratio to 0. Thus the compromise ratio and the connectivity ratio are sensitive only to very low and very high values of N_P.
We next study the effects of density, number of power levels, and transmission levels of the ANs on the compromise ratio; for a sensor node we consider only the density and the maximum transmission range.
For a sensor node, connectivity remains the same and the compromise ratio increases as the sensor density is increased. This is because with increased sensor density more nodes share the same customized diagonal entries. As more nodes are close by and able to connect with their neighbors, a node can set up secure links with more of its neighbors. In addition, the effect of compromising a node is unchanged as long as we retain uniqueness in the row and column assignment. With increasing density, the size of the matrix may need to be increased to preserve this uniqueness, increasing the memory requirement at each node.
Further, as the transmission radius of the sensor nodes is increased, the nodes have more neighbors, yet a node can communicate with another node only if they share a commonly customized matrix; if a node belongs to a neighboring cluster, the two cannot communicate. We have not considered this scenario, but it would of course reduce the number of linked nodes compared to the number of potential neighbors. The connectivity ratio, on the other hand, can be reduced: increasing the radius increases the number of neighbors, some of which might not share the same secrets, since the new neighbors might not be covered by the same ANs as the node concerned. This reduces the node's capacity to connect to all of its neighbors, and hence the connectivity ratio.
The compromise ratio, on the other hand, should not be affected. Changing the transmission range does not affect the number of non-compromised nodes impacted by the compromise of any node, because a non-compromised node is impacted only when it shares keys with the compromised nodes, and the sharing of keys is not governed by the transmission range of a sensor node. Increasing the transmission range might allow more non-compromised nodes to set up secure links, and the fraction of these new links that are impacted cannot be predicted.
Increasing the number of power levels N_P on an AN while keeping the density of ANs and the maximum transmission range R_{N_P} the same affects neither the connectivity ratio nor the compromise ratio.
Increasing the density of ANs without changing either N_P or R_{N_P} has a positive impact on both the connectivity ratio and the compromise ratio. This is because with more ANs, more sensor nodes can receive the beacons/nonces which allow them to derive their own customized diagonals; the compromise ratio is reduced, since location dependence increases with the density of ANs. Increasing the maximum transmission radius of the ANs has a negative impact on location dependence, because with a larger R_{N_P} more sensor nodes receive beacons from the same AN, making it easier for neighboring nodes to share a common diagonal; this also increases the compromise ratio. Thus, from the above we conclude that the AN density has to be increased while ensuring that both N_P and R_{N_P} are not large, in order to reduce the impact of compromised nodes; but this can increase the deployment cost. If node compromise can be tolerated, the system can deploy a low density of ANs with a large transmission range and fewer power levels.
VI. CONCLUSION AND FUTURE WORK
With the above proposal we have highlighted the effect of heterogeneity on the performance of a key management scheme in wireless sensor networks. We considered a special kind of heterogeneity, namely the number of power levels, and drew out its effect on performance in terms of memory requirements and the size of the network supported by LOCK. The average number of nonces/keys received depends not only on the maximum transmission radius but also on the number of power levels. Some issues, such as simulation results, still need to be addressed; we hope to present fuller results in future work. Future scope lies in making the scheme scalable with respect to new node addition and routing-aware, and thus achieving secure communication. We have not considered an inter-cluster communication model among the sensor nodes; this remains an open challenge.
VII. REFERENCES
[1] Mhatre, V. P., Rosenberg, C., Kofman, D., Mazumdar, R., and Shroff, N., "A Minimum Cost Heterogeneous Sensor Network with a Lifetime Constraint", IEEE Transactions on Mobile Computing, 4(1), Jan. 2005, pp. 4-15.
[2] M. Younis, M. Youssef, and K. Arisha, "Energy-Aware Routing in Cluster-Based Sensor Networks," In Proceedings of the 10th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2002), 2002.
[3] K. Arisha, M. Youssef, and M. Younis, “Energy-Aware
TDMA-Based MAC for Sensor Networks,” In Proceedings
of the IEEE Workshop on Integrated Management of
Power Aware Communications, Computing and
Networking (IMPACCT 2002), May, 2002.
[4] H. Chan , A. Perrig, D. Song, “Random Key
Predistribution Schemes for Sensor Networks”, In the
Proceedings of the 2003 IEEE Symposium on Security and
Privacy, p.197, May 11-14, 2003
10
© 2010 ACADEMY PUBLISHER
JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, VOL. 1, NO. 3, AUGUST 2010
[5] Anjum, F., “Location dependent key management using
random key-predistribution in sensor networks”, In
Proceedings of the 5th ACM Workshop on Wireless
Security (Los Angeles, California, September 29 - 29,
2006). WiSe '06. ACM, New York, NY, pp 21-30.
[6] Kausar Firdous, Sajid Hussain, Laurence Tianruo Yang,
Masood Ashraf, “Scalable and efficient key management
for heterogeneous sensor networks”, In Journal of
Supercomputing 45(1): 44-65 (2008)
[7] W. Du , J. Deng , Y. S. Han , P. K. Varshney, “A pairwise
key pre-distribution scheme for wireless sensor networks”,
In the Proceedings of the 10th ACM conference on
Computer and communications security, October 27-30,
2003, pp. 42 - 51 , Washington D.C., USA
[8] B. Liu, Z. Liu, and D. Towsley, “On the capacity of hybrid
wireless networks”, In Proceedings of IEEE INFOCOM,
April 2003, volume 2, pages 1543--1552, San Francisco,
CA.
[9] L. Eschenauer and V. Gligor, "A Key Management
Scheme for Distributed Sensor Networks", In Proceedings
of the 9th ACM Conference on Computer and
Communication Security, pp. 41-47, November 2002.
[10] Cheng Y., Aggarwal D.P., "An Improved Key Mechanism for Large Scale Hierarchical Wireless Sensor Networks", Proc. of Security Issues in Sensor and Adhoc Networks, Elsevier, Vol. 5(1), pp. 35-48.
[11] P. Gupta and P. Kumar, “Internets in the sky: The capacity
of three dimensional wireless networks”, In Proceedings of
Communications in Information and Systems, 1(1), pp. 33-50, 2001.
[12] S. Zhao, K. Tepe, I. Seskar, and D. Raychaudhuri,
“Routing protocols for self-organizing hierarchical ad-hoc
wireless networks,” In Proceedings of IEEE Sarnoff 2003
Symposium, 2003.
[13] P. Gupta and P. R. Kumar, “The capacity of wireless
networks,” IEEE Trans. Inform. Theory, vol. 46, no. 2, pp.
388–404, Mar. 2000.
[14] Gang Zhou, Chengdu Huang, Ting Yan, Tian He, John A. Stankovic and Tarek F. Abdelzaher, "MMSN: Multi-Frequency Media Access Control for Wireless Sensor Networks," In Proceedings of IEEE INFOCOM 2006, Barcelona, Spain, April 2006.
[15] A. D. Wood and J. A. Stankovic, “Denial of service in
sensor networks,” Computer 35(10):54–62, 2002.
[16] D. Liu and P. Ning, "Location-based pairwise key establishments for static sensor networks," in Proceedings of the 1st ACM Workshop on Security of Ad Hoc and Sensor Networks, in association with the 10th ACM Conference on Computer and Communications Security, Fairfax, Va, USA, October 2003, pp. 72-82.
Kamal Kumar received his M.Tech. as well as B.Tech. degree from Kurukshetra University, Kurukshetra, India. Presently he is working as Associate Professor in the Computer Engineering Department at M.M. Engineering College, Ambala, India. He is pursuing a Ph.D. from Thapar University, Patiala, India.
A. K. Verma is currently working as Assistant Professor in the Department of Computer Science and Engineering at Thapar University, Patiala, Punjab (India). He received his B.S. and M.S. in 1991 and 2001 respectively, majoring in Computer Science and Engineering. He worked as a Lecturer at M.M.M. Engg. College, Gorakhpur, from 1991 to 1996, and has been associated with the same university since 1996. He has been a visiting faculty member at many institutions. He has published over 80 papers in refereed journals and conferences (India and abroad). He is a member of various program committees for international/national conferences and is on the review boards of various journals. He is a senior member (ACM), LMCSI (Mumbai), GMAIMA (New Delhi), and a certified software quality auditor by MoCIT, Govt. of India. His main areas of interest are programming languages, soft computing, bioinformatics, and computer networks. His research interests include wireless networks, routing algorithms, and securing ad hoc networks.
R. B. Patel received a PDF from the Highest Institute of Education, Science & Technology (HIEST), Athens, Greece, in 2005. He received a PhD in Computer Science and Technology from the Indian Institute of Technology (IIT), Roorkee, India. He is a member of IEEE and ISTE. His current research interests are in mobile and distributed computing, security, fault-tolerant systems, peer-to-peer computing, cluster computing, and sensor networks. He has published more than 100 papers in international journals and conferences and 17 papers in national journals/conferences. Dr. Patel also holds two patents in the fields of mobile agent technology and sensor networks.
Peripheral Display for
Multi-User Location Awareness
Rahat Iqbal, Anne James, John Black
Faculty of Engineering and Computing,
Department of Computing and the Digital Environment,
Coventry University,
Coventry, UK
Email: {r.iqbal, a.james, john.black}@coventry.ac.uk
Witold Poreda
UYT Limited
Coventry Business Park
Coventry, UK
Email: [email protected]
Abstract— An important aspect of Ubiquitous Computing
(UbiComp) is augmenting people and environments with
computational resources which provide information and
services unobtrusively whenever and wherever required. In
line with the vision of UbiComp, we have developed a multi-user location-awareness system by following a user-centred
design and evaluation approach. In this paper, we discuss
the development of the system that allows users to share
informative feedback about their current geographical
location. Most importantly, the proposed system is to be
integrated in the smart-home environment by portraying
location-awareness information on a peripheral display. The
proposed system can be used by various users, for example
family members, relatives or a group of friends, in order to
share the information related to their locations and to
interact with each other.
Index Terms— Location awareness, Ubiquitous Computing,
Calm Technology, Global Positioning System (GPS),
Tracking, Focus group, Interviews
I. INTRODUCTION
According to Mark Weiser’s vision of Ubiquitous
Computing (UbiComp), the trend is to move away from
the traditional desktop computing paradigm, to integrate
seamlessly with the environment, augmenting people and
their environment with computational resources which
provide information and services unobtrusively whenever
and wherever required [1].
The UbiComp paradigm, where computers are
everywhere, calls for new technology that prevents
humans from feeling overwhelmed by information. ‘Calm
technology’ implements this idea by putting computers in
the periphery of our attention until needed [2]. Calm
technologies easily and seamlessly move to the periphery of someone’s attention and back again when appropriate. In this way, peripheral displays can convey non-critical information without distracting or burdening their users.
© 2010 ACADEMY PUBLISHER
doi:10.4304/jait.1.3.116-126
In line with this vision of ubiquitous computing and
calm technology, several projects such as AMI [3],
Interactive Workspaces [4] and CHIL [5] envisage
systems that, rather than being used as a tool, support
human-human communication in an implicit and
unobtrusive way, by constantly monitoring humans, their
activities and their intentions.
Traffic congestion is an increasing problem around the world, so there is growing scope for being delayed, whether by the sheer number of vehicles or by an accident. When we are on a car journey, family members, friends and colleagues may want to know where we are. They may also wish to know whether we are all right and making adequate progress, and to be able to interact with us (for example by exchanging messages).
A few years ago we had to depend on our sense of direction and paper maps while travelling. Today many of us cannot imagine driving without satellite navigation based on the Global Positioning System (GPS). GPS can be used not only to locate unknown places but also to provide location-awareness information about its carrier.
This research raises some security concerns, but we contend that there is a good trade-off between the benefits we can get from this technology and the disadvantages of being tracked by someone else. Within the family, tracking can be perceived as a positive step [6]. For instance, when a member of the family is travelling at a busy time or in bad weather, it is good to know that they have reached their destination safely.
There are automatic algorithms and systems which
use data gathered from phone calls, sensors embedded in,
near or above the road surface or using GPS receivers in
some cars themselves, called probe vehicles [7]. In
addition there are a variety of Intelligent Transport
Systems which attempt to help manage traffic and
provide information to travelers [8]. However these
systems are limited by the geographic spread of the
JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, VOL. 1, NO. 3, AUGUST 2010
sensors or the number of probe vehicles, so an entire country cannot be covered. Moreover, these systems do not track a single person. This motivates our research
and the development of the location-awareness system
presented in this paper.
We develop a system which supports the sharing of
positional data amongst users carrying GPS enabled
devices. It is also possible to support non-GPS enabled
devices if the user is prepared to input position data
manually on an interactive map. Most importantly, the
proposed system is to be integrated in the smart-home environment by portraying location-awareness information on a peripheral display. We have not been
able to identify a product which would provide a similar
level of functionality to that of our system. There are
many GPS tracking solutions available on the market at present, but existing systems are mainly tied to designated GPS receiver hardware which reports position to a server, with positions then viewed in a web browser, or are otherwise limited to particular hardware. The scope of functionality is in many cases very limited and based on one-way communication. None of the systems we investigated was able to share positions with a smart display in a smart-home environment.
The rest of the paper is organised as follows: Section II
reviews the technological background; Section III
discusses the development of the proposed system. This
section also describes the requirements captured using a
user-centred design and evaluation method; Section IV
presents the results of an evaluation of the system in
terms of a user evaluation and a comparison of the system
to other similar systems. Finally Section V concludes this
paper and outlines our future work.
II. REVIEW OF EXISTING SYSTEMS
A Global Positioning Systems and Mobile Data Modems
Part of the technological infrastructure chosen to find location and time is Global Positioning System (GPS) receivers and mobile data modems. GPS receivers are
increasingly being used in consumer applications such as
navigation aids for walkers, navigation devices for boats
and aeroplanes and in-car satellite navigation systems.
The technology is getting increasingly cheap. In-car
satellite navigation systems are fairly standard on luxury
cars and are prevalent on top-of-the-range models in most
categories of cars. Alternatively, in-car satellite
navigation systems are available as an optional extra or
can be bought as a separate item.
Mobile data modems, which use packetised mobile-phone data transmission technology (either the General Packet Radio Service (GPRS), a second-generation (2G) technology, or third-generation (3G) networks), are also cheap.
In addition mobile networks are widely deployed in the
UK, covering most of the UK population and landmass,
for example, Vodafone UK coverage map [9], Orange
UK coverage map [10] and 3 UK coverage map [11].
B Incident Detection Systems and Intelligent Transport Systems
As has been outlined above there has been research
and development of systems which detect incidents.
There are a variety of approaches [7]: driver-based
algorithms, for example correlating mobile phone calls
[12]; roadway-based algorithms [13], [14]; probe-based
algorithms which use GPS-equipped vehicles [15] and
sensor-fusion-based algorithms using, for example, data
from fixed detectors and probe vehicles [16].
As discussed above, the limitation of these systems is that their coverage depends on the extent of the deployment of fixed sensors or on the number and distribution of probe vehicles. This cannot possibly cover a whole country, and in any case such systems do not address the movements of a single vehicle. GPS and the mobile phone networks, by contrast, cover the vast proportion of the UK.
C Smart Home
There has been much research into the smart home. As
will be seen below, the smart home is the embodiment of
ubiquitous computing, because the computing integrates
seamlessly with the environment. The smart home augments its occupants: the computational resources in the home provide information and services unobtrusively whenever and wherever required.
Smart-home projects fall into two types [17]:
• Non-healthcare
• Healthcare
Some examples of non-healthcare smart-home projects
are briefly described, as these are the most relevant.
Mozer [18] describes the development of a system that
learns the behaviour of the inhabitants of a building and
controls the lighting and heating, water heating and
ventilation.
Microsoft’s EasyLiving project was concerned with the production of an architecture and technologies for intelligent environments [19], [20]. For example, if someone wants music to be played, the system switches on the speakers based on the location of that person [20].
The Ubiquitous Home project in Japan uses cameras and microphones in each room to record the residents, while pressure sensors are used to track the inhabitants and also to determine the position of furniture [21]. Other sensors include infrared (IR) sensors placed in each room and at foot level in the kitchen and corridor, and two radio-frequency ID systems (one active, one passive). Appliances and visible robots in the home are controlled by the Ubiquitous Home system.
D Tracking Systems
Three systems have been identified that allow
tracking of vehicles. The first is the En Route HQ app on
the iPhone [22]. The second is Glympse which runs on
iPhone and mobile phones running the Android and
Windows Mobile operating systems [23]. The third is the
TomTom Plus Buddies service [24]. Note that this paper
was finalised in mid June 2010. The functionality,
availability and hardware and operating systems relating
to systems and software described in this subsection may
have changed since mid June 2010.
At the time of writing, the En Route HQ app required
an Apple iPhone 3G or 3GS [22]; the iPhone 3G and
iPhone 3GS possess a GPS receiver. The trip can be
viewed on any sort of Apple iPhone existing at the time
of writing or on the En Route HQ web site [22]. Fig. 1
shows some screenshots of the iPhone app. En Route HQ
has the ability to exchange messages between users of
Apple iPhones or the En Route HQ website. The problem
with the En Route HQ app, at the time of writing, is that
it is limited to an iPhone 3G or 3GS for the tracking
function [22]. Another problem, at the time of writing, is
that only one trip can be seen at a time in a single web
browser window or window tab [22].
The Glympse app requires an iPhone 3G or 3GS or a
mobile phone running the Android or Windows Mobile
operating systems and which have a GPS receiver for
tracking [23]. In addition an iPod touch connected to Wi-Fi can be used to send a position. A person’s location can
be viewed, at the time of writing, using the app on an
Apple iPhone, an Apple iPod Touch, Apple iPad or a
mobile phone running the Android or Windows Mobile
operating systems or on the Glympse web site [23].
Multiple people can be tracked using the Glympse app, and they appear on different screens [23], [25]. Fig. 2 shows
some screen shots of the Glympse app on an Apple
iPhone. A problem with Glympse, at the time of writing,
is that there is limited scope for interaction by sending
messages. A message can be sent by a user of the app when initially notifying his or her location. However, in order to send another message, the location notification must be resent. There is no way for viewers of a location, whether using the app or the Glympse website, to send messages back.
The TomTom Buddies service requires a TomTom satellite navigation device linked to a mobile phone via Bluetooth. A screenshot of the TomTom is shown in Fig.
3. Two problems with the TomTom Buddies system are:
the position update is not automatic and requires a user to
request an update; and the system is tied to having the
TomTom system.
In addition to the tracking applications mentioned above, there are a number of other multi-user location-awareness systems based around GPS-enabled mobile phones. Examples are as follows:
• Google Latitude [26]
• Centrl [27]
• Pocket Life and Pocket Life Lite [28]
• Look 8 Me tracking [29]
• Locus [30]
• Locc.us [31]
Figure 1. Screenshots of the En Route HQ app from the En Route HQ website [22]
Figure 2. Screenshots of the Glympse app from the iTunes Store on the iTunes application [25]
Figure 3. Screenshot of the location of a TomTom buddy [24]
III. DEVELOPMENT OF THE SYSTEM
We developed the location-awareness system by following a user-centred design and evaluation methodology [32]. We conducted a user study consisting of focus groups and interviews. The purpose of this study was to inform the design and development of the location-awareness system.
A Focus Groups
We established two focus groups of seven people each, covering different age ranges, to discuss the core requirements and, most importantly, the usefulness of the system and whether participants would use such a system.
The merits and demerits of such a system were discussed
and the participants’ opinions were recorded for later
analysis.
All participants of the focus group thought the idea
was interesting and useful. However they all expressed
concerns about the ethics and privacy of the system. The
consensus was that tracking was to be on a case-by-case
basis and they would not want to be tracked all the time.
The results also showed that there would be some
situations where users would not want to share position
information and others where sharing position
information would be very useful.
In addition, the second focus group raised two interesting issues regarding the use of the system to track drivers. The first was a concern about the safety of using such a system while driving. The second concerned a person at home wanting to know when the driver (e.g., a family member or friend) would arrive back home or at a house they were going to visit, so that the person in the house could schedule their activities before the driver's arrival.
B Interviews
The interviews were used to capture further user requirements. They were held with three doctoral students, two post-doctoral researchers and two undergraduate students, each lasting one hour. A usability expert also participated in the interview sessions. The interviews were used to inform the system design and, importantly, to record missing functionality for the second iteration of the user-centred design process.
C User Requirements and Design Considerations
The system provides three different user interfaces: a
smart-phone solution, a web site solution and a smart-home display. The smart-phone and web solutions
provide a similar level of functionality. The system
allows the Web user to: register; confirm email address;
invite friends; accept or refuse invitations; see friends’
positions; send and receive messages, attach information
to locations; set alerts on positional data; and manually
set their own position. The smart-home display allows the viewing of positions and messages, selection of the person to view, and the sending of messages to that person. As a
source of GPS data, the mobile user uses the internal or
external GPS receiver connected to his/her mobile device.
The system protects the user against potential problems
with connection and will re-establish connectivity if
necessary.
During the registration process the users need to
provide the system with a valid email address which is
used to deliver a dynamically generated, unique link. This
is required to fully activate the account and confirm ownership of the specified email address. A logged-in user can use the system interface to invite friends with whom they would like to share positions.
The system provides a high level of privacy. Only an
account owner should decide who will see his or her
position. To achieve this level of privacy, the system uses
invitations, which must be manually approved by the
account owner. The user may see other users’ positions
from his friends’ list only if those friends have agreed. If
a mobile device malfunctions, it is possible to set the
position manually using a web interface. Another
functional requirement is that a user can create a number
of locations with associated names which could
potentially be used to create position related alerts.
Additional requirements for the specific driver-tracking application include supplying the monitoring parties with the estimated arrival time at the destination and the reasons for any delay.
The non-functional requirements are related to cost,
energy effectiveness, security and usability aspects of the
system design.
Cost effectiveness - the smart-phone application
requires the use of a network connection in order to work
correctly. Mobile Internet is becoming more and more
popular and in many cases it is included in contracts for
free. The system also targets users who need to pay for
their network use. Thus the amount of data sent between
the client application and the server should be minimised.
To further improve cost effectiveness, a periodical update
should be used. This gives the user an opportunity to
decide how often position data should be transferred to
the server.
Energy effectiveness – because the system is
operating on mobile devices, energy use is a very
important matter. The system should not interfere with
normal phone functions. When an application is using an
internal GPS receiver it could easily lead to phone battery
discharge. Because of that, periodical updates, used to
save cost, could also increase battery life. The application
should not use GPS hardware constantly, instead the
normal state should be off with activation only for short
periods of time to obtain position. In this way the system
will give a user the opportunity to decide how often to
update position.
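To make the duty-cycling idea concrete, the following minimal sketch shows a periodic position-update loop in Java. It is an illustration only, not the system's actual Windows Mobile code: GpsReceiver, readPosition and sendToServer are hypothetical names standing in for the real GPSID-based implementation.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical GPS abstraction; the real system uses the Windows
    // Mobile GPS Intermediate Driver (GPSID) rather than this interface.
    interface GpsReceiver {
        void powerOn();
        void powerOff();
        double[] readPosition(); // {latitude, longitude}
    }

    class PeriodicTracker {
        private final GpsReceiver gps;
        private final long intervalSeconds; // user-chosen update period

        PeriodicTracker(GpsReceiver gps, long intervalSeconds) {
            this.gps = gps;
            this.intervalSeconds = intervalSeconds;
        }

        void start() {
            ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
            // The receiver stays off between updates; it is powered up
            // only briefly to take a fix, saving battery and data cost.
            scheduler.scheduleAtFixedRate(() -> {
                gps.powerOn();
                try {
                    sendToServer(gps.readPosition());
                } finally {
                    gps.powerOff();
                }
            }, 0, intervalSeconds, TimeUnit.SECONDS);
        }

        private void sendToServer(double[] position) {
            // Placeholder for the minimised client-server data exchange.
            System.out.printf("upload %.5f, %.5f%n", position[0], position[1]);
        }
    }

A longer interval trades positional freshness for battery life and data cost, which is exactly the choice the system leaves to the user.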
Security – the system will handle confidential data,
namely user position. Thus it should provide a high level
of security and make sure that only selected users can
have access to the data. To transfer data safely between the client applications and the server, SSL is used. As an additional security measure, a ticket authentication system is introduced; this reduces password traffic.
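The paper does not give implementation details of the ticket mechanism, but the idea can be sketched as follows: the password travels (over SSL) only once per session, and every later request carries a short-lived random ticket instead. All names below are hypothetical.

    import java.security.SecureRandom;
    import java.util.Base64;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal ticket-authentication sketch: the password is exchanged
    // once at login; subsequent requests are authenticated by ticket.
    class TicketService {
        private final Map<String, String> ticketToUser = new ConcurrentHashMap<>();
        private final SecureRandom random = new SecureRandom();

        // Called once over the SSL link with the real credentials.
        String login(String user, String password) {
            if (!checkPassword(user, password))
                throw new SecurityException("bad credentials");
            byte[] raw = new byte[24];
            random.nextBytes(raw);
            String ticket = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
            ticketToUser.put(ticket, user);
            return ticket; // the client attaches this to later requests
        }

        // Later requests carry only the ticket, never the password.
        String authenticate(String ticket) {
            String user = ticketToUser.get(ticket);
            if (user == null)
                throw new SecurityException("invalid or expired ticket");
            return user;
        }

        private boolean checkPassword(String user, String password) {
            return true; // placeholder for a real credential store lookup
        }
    }

In the actual system the ticket would travel in a SOAP header over TLS; the sketch omits ticket expiry for brevity.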
Platform compatibility – part of the system is targeted
to be used on mobile devices. The system should be
easily expandable to other mobile platforms available on
the market. The requirement is achieved by using with
XML and Web Services which are platform independent.
The system stores all logic on the server side of the
system. Thus all major changes are made in one single
place.
Smart Home - The smart home will have Internet-connected screens in more than one room, for example in
the kitchen, living room and dining room. The screens
would be built into the wall. The tracking application
could either be on a web page or be a web-service
application. The application could display the current
position, the intended route, the estimated arrival time
and/or the estimated time to arrival and any messages
sent by the driver. One of the major design considerations
is that the display should be calm, unobtrusive and
peripheral. The display could switch on by itself or
display a message when one arrives.
The home's inhabitants could switch the display on, or switch to the tracking display, when they wanted to check the location of the driver.
Another interaction of the system with the smart home is that, if no one is home, the heating could be activated automatically as the tracked individual nears home. Similarly, the system on the smart-phone
device could be integrated with a smart home’s control
system, and thus, at the same time as revealing personal location, could also control the smart-home environment.
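As a hypothetical sketch of this interaction (the paper proposes it only as a possibility, and all names here are illustrative), a simple distance check against the home's coordinates could trigger the heating:

    // Hypothetical geofence check: activate heating when the tracked
    // person comes within a threshold distance of home and no one is in.
    class HeatingTrigger {
        static final double EARTH_RADIUS_M = 6_371_000;

        // Great-circle distance between two lat/lon points (haversine).
        static double distanceMetres(double lat1, double lon1,
                                     double lat2, double lon2) {
            double dLat = Math.toRadians(lat2 - lat1);
            double dLon = Math.toRadians(lon2 - lon1);
            double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                     + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                     * Math.sin(dLon / 2) * Math.sin(dLon / 2);
            return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
        }

        static boolean shouldActivateHeating(double[] person, double[] home,
                                             boolean anyoneHome, double thresholdMetres) {
            return !anyoneHome
                && distanceMetres(person[0], person[1], home[0], home[1]) < thresholdMetres;
        }
    }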
To be specific, the requirements for the smart-home display, in addition to those for the web page, are that it should be calm and unobtrusive and in the periphery of someone’s attention when there is
no information to convey. When there is information to
convey it should move to the centre of attention. So when
there is nothing to report, the display should be switched
off. The display could be used for other functions, for
example watching TV or a DVD, playing video games or
browsing the internet. Alternatively if the display is being
used for anything else, then the tracking display should
be minimized. When there is a message, then the display
should switch on if it was off. Alternatively, if there is a
message and if the display is on and it is being used for
another purpose, then the tracking display should become
more prominent. The information to be displayed is the
position, speed and direction of travel of the driver. If
known, then the destination and the intended route should
be displayed. In these ways the display would be
peripheral and unobtrusive.
The interactions with the display are as follows. The
tracking display can be selected. There should be the
option to select either the entire trip or centre the display
on the current location. The display should be able to
handle more than one driver, as there maybe more than
one driver who is being tracked. There should be the
option to switch between viewing individual drivers or
see a map covering all drivers. There should be the ability
to send messages to the drivers and to receive messages
from drivers. The inhabitant of the home should be able
to interact with the display using voice, keyboard or
smart phone.
D Technologies Used
This section presents the main technologies used in the development. The system
consists of three sub systems (as shown in Fig. 4) which
represent different programming areas: web applications;
web services; and mobile applications. All interact with
each other so determination of the requirements and
techniques of each was crucial to avoid interoperability
problems. The sub systems and technologies used are
listed below.
The System - .NET Framework: There is a wide range
of technologies and programming environments
available, from open-source platforms to solutions
provided by companies like Microsoft. Only two
platforms meet the requirements of the three aspects of
this development. One of them was the open-source Java platform provided by Sun Microsystems and the other was the .NET Framework supplied by Microsoft. Each has specific pros and cons. The system was built using
.NET Framework, mainly to provide some level of
similarity between code and functionality in different
parts of system.
The Web Site - ASP.NET Framework 3.5: The web
site was created using ASP.NET Framework 3.5 SP1
which was the latest version available at the time.
Figure 4. System Overview Design
The Mobile Application - Windows Mobile 6.1 SDK: The mobile application was developed using Visual Studio and C# with the Windows Mobile 6.1 SDK, which brought full mobile functionality to Visual Studio. The SDK includes full documentation, code samples, libraries and tools: everything necessary to develop a rich mobile application. The SDK libraries, in conjunction with GPSID (the GPS Intermediate Driver), made it possible to access a GPS receiver and obtain GPS data [33]. GPSID works as an agent between the application and the GPS hardware. The main benefit of this solution is that GPSID enables a single GPS receiver to serve multiple applications at the same time. To run the application on a smart phone, the .NET Compact Framework 3.5 was necessary.
The Server Functionality - XML Web Services: The initial intention was to use the Web Services provided by WCF (Windows Communication Foundation). WCF is
an API introduced in .NET Framework 3.0 which is used
to build distributed systems. Unlike past solutions WCF
provides a single, unified and extendable programming
object model.
Authentication - Microsoft Membership API: Security
is an important aspect of the proposed system. Providing
the user with a high level of privacy was one of the main
concerns. Login facilities are fairly standard and provide almost the same scope of functionality in most applications. To save time and provide the developer with a high-quality solution, Microsoft has included a Membership API with ASP.NET 3.5 [34]. The Membership API allows the developer to avoid repetitive implementation of authentication features and to rely on a well-tested solution. It provides functionality for the most common activities: registration, password change, password recovery, and login.
The Maps - Google Maps: This project uses Google
Maps. Google Maps does not have a dedicated API for
ASP.NET and so interfacing was achieved through
Javascript.
The User Interfaces - ASP.NET AJAX: ASP.NET
AJAX was used in this project to create user interfaces
[35]. AJAX allows an application to retrieve data from
the server while the user is interacting with the web site.
As a consequence, only a selected part of the interface is changed during user interaction, and the user does not have to wait for the full web page to reload, which creates a better experience.
Security - SSL (TLS, Transport Layer Security)/SOAP Headers: Security is a very important task in this project. The application operates on confidential information, such as the user's actual location. That is why TLS was used to create a secure link between the client application and the server.
The Database - SQL Server: Visual Studio and the
.NET Framework are well integrated with MS SQL
Server. Use of MS SQL Server guaranteed a simplified
access to diagnostics, database management and use of
framework-provided security mechanisms.
Data Querying - LINQ: LINQ is a .NET Framework component developed by Microsoft that expands the functionality of .NET languages by supporting the construction and compilation of data queries [36].
E System Interfaces
Fig. 5 shows a screenshot of the system website. As can
be seen in the figure, the main content of a page is
divided into two parts. On the left hand side there is a
map showing users’ positions; on the right hand side a
vertical menu including a contact list, some additional
features and messages. Graphics were limited to a
minimum to reduce web page loading time. The whole
layout is based on CSS (Cascading Style Sheets). Fig. 6
shows the main screen of the smart-phone interface.
In comparison with the web page, we see in Fig. 6 that the smart-phone layout is very simple, mainly due to the limited amount of space provided by a mobile device. In the centre we can see a map, and just below it there is a ‘friend’ list with a scroll bar. At the top and bottom of the screen there are two status bars, ‘message’ and ‘application’. It was not possible to provide all the functionality on one screen, so most of the features use separate screens or partially cover the main screen.
Fig. 7 is the design for the smart-home display. The interface is designed to be interacted with by voice. It is a multifunction display; in the lower left there is the mechanism for selecting what is to be displayed, which is highlighted. The display has an area for showing the position of one of the friends. There is also an area for the messages received from that friend. In addition, the text below the tracking area shows the friend list; the highlighted name is the one being displayed. Below there is a means for selecting what is displayed on the map. Below that is a box for entering and sending a message to the friend; the message is entered using speech input.
In order to be peripheral, when there is no message and the display is not being used for other purposes, the display could be blank.
the driver then an audio-alert would be played (similar to
that when a text message arrives on a mobile phone). In
addition the display could illuminate and the screen seen
in Fig. 7 be displayed. If the display is being used for
another function then the new message would appear on
the screen in some way. For example the message could
be superimposed in the middle of the display, or at the
top or bottom of the screen. In addition the display could
be integrated with a smart-home environment so that a
system would sense in which room or rooms the
occupants of the house were and only display the tracking
screen on a display in those rooms. If the inhabitants of
the house were not present in a room with a smart
display, then some kind of audio alert could be played in
those rooms, for example like the audio alert for a text
message on a mobile phone. Alternatively something
more meaningful such as a speech synthesised alert could
be played for example “Message received from Adam”.
The inhabitants could then go to a room with a smart-home display. In these ways the display would be calm
and move from the periphery to the centre of attention
when required, as discussed above.
Figure 5. GPS Tracking System Web Page Main Screen
Figure 6. Smart-phone Main Screen
IV. EVALUATION
A Introduction
Two types of evaluation of the system were made. The first was a user evaluation of the system other than the smart-home display and, separately, an evaluation of the smart-home display on its own. The second was a comparison of the system to other existing systems identified in Section II D.
Figure 7. Smart-home Display
B User Evaluation
Both qualitative and quantitative evaluations were carried out in order to test the functionality and behaviour of the system, as well as its usefulness with potential users.
The system, apart from the smart-home display, was tested with fifteen potential users. The users were from different age ranges; the age factor was very important in simulating potential user behaviour. All participants were from different academic and cultural backgrounds and represented various levels of IT skills.
For evaluation purposes, the system was deployed on
a laptop computer and the users were connected to the
same private network. In this way it was possible to test the system using different computers and different browser
versions. The system operated well and the user
feedback was positive.
The second evaluation was conducted with ten users.
All the participants were from the same academic
department (Computer Science students) but from
different cultural backgrounds. After the demonstration of the system (apart from the smart-home display), the users were asked the following questions; the results are shown in Fig. 8.
• Is the system easy to use?
• Is the system useful?
• Would you use the system to track location and
time of your family and friends?
• Would you let others track your location and time
always?
• Would you let others track your location and time
on a case-by-case basis?
Figure 8. User Evaluation Results
The third part of the user evaluation was to interview
separately a group of four potential users. The users
ranged in age from 40 to 60 and consisted of three
females and one male. The entire system was explained
to the potential users. Then the users were shown the smart-home display in Fig. 7, the functionality was explained, and two potential scenarios involving journeys were described to illustrate where the system may be useful. An alternative screen with a different driver just starting off, similar to Fig. 7, was shown, as well as an alternative design to Fig. 7. The questions asked were:
• In general, do you think all the information you require is there?
• Do you think the idea of using speech input is a good idea?
• What do you like about the display and what don’t you like?
• What do you think is best: showing the (planned) route taken as:
  o A series of position markers?
  o A solid path, with a difference in colour and a direction marker showing where the driver is?
• Do you think it is useful to show the planned route so that the viewer can see where the driver is going or whether the driver has deviated from the planned route?
• Do you think it is useful to show the current speed?
• Time to arrival:
  o Do you think it is useful to show the time to arrival?
  o Alternatively, do you think it is better to show the estimated arrival time?
  o Do you think it would be good to have the option to select either the time to arrival or the arrival time?
• Which do you think is best:
  o Showing the messages in a box?
  o Showing the messages as a speech bubble?
• Do you think showing the driver list in the way shown is a good idea?
• Do you think that showing the kind of map display shown (referred to as View on the screen mock-up) is a good idea? Do you like it?
• What improvements would you like to see?
Speech input was thought to be a good idea, as the hands are not tied up interacting with the screen, so other activities, such as cooking, could be performed at the same time. Speech input was also perceived to be a more natural way to interact with a computer. Furthermore, speech was thought to be a better way for people with disabilities to interact with the system, and also good for people who are not adept at using a keyboard. However, there was a worry that speech not intended as commands could be misinterpreted by the system.
Another theme that emerged was the ability to have
options and flexibility in what was displayed. For
instance, to have a basic operating mode, but also to have
a more flexible mode or a mode which was more
complicated. One example was to choose to display the
message box or not. Another example was to have
different map display options. For example, to have the
map occupying the whole of the top of the display, or
perhaps the entire display area.
A further example of flexibility is the option of displaying either or both of the estimated time to arrival and the estimated arrival time. Yet another is displaying either or both of the current speed and a short-term average speed (perhaps over the last five minutes). One idea proposed was to have speeds scrolling across the bottom of the screen in some way.
There were mixed feelings about the map display.
Some liked the position markers, some did not. One
interviewee thought that the position markers of the same
size were confusing as it was not clear where the driver
currently was or was going. It was suggested that there might be some way to distinguish the current position, for example by having only one position marker, or by position markers decreasing in size as they get older. One idea, mentioned in two interviews, was to have a flashing position marker. It was also thought that having one line showing where the driver had been and a line in a different colour showing where the driver was going could be confusing. There were also mixed views about the
usefulness of displaying the planned route.
C Comparison of System to Other Systems
This section presents a comparison of the proposed system with existing systems in Table I. As can be seen, most systems allow many-to-many tracking. Some are available on more than one type of mobile phone hardware or operating system. However, no other system is integrated with the smart-home environment. The smart home is important because of energy efficiency (for example [37]), health and safety (particularly monitoring the health of the elderly), assistance for people [17], and also government grants. As was seen in Section II C, there has also been work that monitors the behaviour of the inhabitants of a smart home, for example [18], [19], [20] and [21]. So integration with the smart home is important, and only the proposed system provides it. Note that this paper was finalised in
mid June 2010. The functionality, availability and
hardware and operating systems relating to systems and
software in Table I may have changed since mid June
2010.
V. CONCLUSIONS AND FUTURE WORK
In this paper, we have discussed the development of a
multi-user location-awareness system, to be used to
locate or track different people, especially family
members and friends. Tracking is particularly important, and the security and safety concerns are greater, in bad weather and when there are traffic problems. The
proposed system integrates with the smart home and
incorporates a context-aware peripheral display in the
smart home.
The system is developed using a user-centred design
and evaluation methodology. The evaluation results show
that the system is useful and would be used by the users
to track their family members and friends. However, it is
also concluded that tracking would not always be
permitted due to privacy concerns.
Changes could be made to the smart home display,
following on from comments made by the interviewees in
the user evaluation. For example to allow more flexibility
in what is displayed, in terms of speed, arrival time and t
the planned route ant the route taken, if messages are
© 2010 ACADEMY PUBLISHER
displayed if there are new messages and the area of the
screen taken up by the map.
A future application is the use of Artificial Intelligence
for the system alarm function. It could be designed in a
way that would allow learning of user routines based on
collected data. The system would then be able to react
intelligently to changing conditions.
An additional feature that could be built around
existing code is the display of past positions. That would
allow user to see visited places, or for example to recreate
a path of some trip. The system could potentially share
this information with other users as recommendations of
interesting places to visit. The system could be also
integrated with social networking web sites and allow
users to share their positions using the web site specific
interface.
The system could be expanded to provide car
navigation functionality. A community could be built
around this to share information about road conditions
such as road works, traffic or accidents. The system could also be adapted for a specific business use, to meet the requirements of that particular market. An additional
possibility is the interfacing to systems that provide
information on aircraft or train arrival times.
Furthermore the functionality of the smart-home
display could be increased so that a user could set an alarm to alert them when a driver was at the destination or was a user-specified time period (i.e. so many minutes) from reaching their destination. This
functionality would be useful when a driver’s destination
was a home where the smart-home display is located.
That way an inhabitant of the smart home could be ready
for the arrival of a driver, for example have refreshments
ready. The alert would mean playing an audio alarm and
displaying the tracking screen on the smart-home display
in the room where people were located.
REFERENCES
[1] M. Weiser, “Some Computer Science issues in Ubiquitous
Computing”, Communications of the ACM, vol. 36, pp.
75-84, 1993
[2] M. Weiser and J. Seely Brown, “The coming age of calm
technology”, in Beyond Calculation - The Next Fifty Years
of Computing, P. J. Denning and R. M. Metcalfe, Eds.
Copernicus, 1996
[3] AMI, The European project AMI (Augmented Multiparty
Interaction), available via: http://www.amiproject.org
[4] B. Johanson, A. Fox and T. Winograd, “The Interactive
Workspaces Project: Experiences with Ubiquitous
Computing Rooms”, IEEE Pervasive Computing
Magazine, vol. 1, no. 2, pp. 515-523, 2009.
[5] CHIL Project, available via: http://chil.server.de
[6] V.-J. Khan, P. Markopoulos and B. Eggen, “On the role
of awareness systems for supporting parent involvement in
young children’s schooling”, in IFIP International
Federation for Information Processing, vol 241, 2007, pp.
91–101.
[7] E. Parkany, and C. Xie, “A complete review of incident
detection algorithms & their deployment: what works and
what doesn’t”, University of Massachusetts Transportation
Centre, prepared for The New England Transportation Consortium, available via http://www.ct.gov/dot/LIB/dot/documents/dresearch/NETCR37_00-7.pdf
TABLE I. COMPARISON OF PROPOSED SYSTEM TO EXISTING SYSTEMS

Software or system | Tracking | Other means of displaying tracking | Messages or notifications | Compatibility with hardware | Privacy | Integration with smart home
Proposed system | Many to many | Web page and mobile phone | Anytime | All hardware types | Yes | Yes
En Route HQ [22] | Many to many | Web page, or original iPhone | Anytime | iPhone 3G, iPhone 3GS, iPhone (original) | Yes | No
Glympse [23] | Many to many | Web page, iPod Touch, iPad | Anytime, but need to modify notification | iPhone 3G, iPhone 3GS, iPod Touch, mobile phones with the Android or Windows Mobile operating systems | Yes | No
TomTom Buddies [24] | Many to many | No | Anytime | TomTom only; mobile phone with Bluetooth required | Yes | No
Google Latitude [26] | Many to many | Web page | Anytime | Android OS mobile phones; iPhone and iPod Touch devices; most colour BlackBerry mobile phones; most Windows Mobile 5+ devices; most Symbian S60 (Nokia) | Yes | No
Centrl [27] | Many to many | Web page | Anytime | iPhone, BlackBerry, Android, Nokia | Yes | No
Pocket Life and Pocket Life Lite [28] | Many to many | Web page | Anytime | iPhone 3G/3GS, certain Nokia, BlackBerry, Samsung, LG, Sony Ericsson and HTC | Yes | No
Locus [30] | Many to many; location posting is manual | No | Only with a location post | iPhone 3G, iPhone 3GS | Yes | No
[8] L. Figueiredo, I. Jesus, J. A. T. Machado, J. R. Ferreira and J. L. Martins de Carvalho, “Towards the development of intelligent transportation systems,” in Proceedings of Intelligent Transportation Systems, 2001, Oakland, California, USA
[9] Vodafone UK coverage map, available via: http://maps.vodafone.co.uk/coverageviewer/web/default.aspx
[10] Orange UK coverage map, available via: http://search.orange.co.uk/ouk/portal/coveragechecker.html
[11] 3 UK coverage map, available via: http://www.three.co.uk/Help_Support/Coverage
[12] A. Skabardonis, T. Chira-Chavala and D. Rydzewski, “The I-880 Field Experiment: Effectiveness of Incident Detection using Cellular Phones”, California PATH Program Report UCB-ITS-PRR-98-1, University of California, Berkeley, 1998
[13] K.N. Balke, “An evaluation of existing incident detection
algorithms”. Research Report, FHWA/TX-93/1232-20,
Texas Transportation Institute, Texas A&M University
System, College Station, TX , 1993
[14] M. Bell and B. Thancanamootoo, “Automatic incident
detection in urban road networks”, in Proceedings of
Planning and Transport Research and Computation
(PTRC) Summer Annual Meeting, University of Sussex,
UK, 1986, pp. 175-185.
[15] M. W. Sermons and F. S. Koppelman, “Use of vehicle positioning data for arterial incident detection”, Transportation Research Part C, vol. 4, no. 2, pp. 87-96, 1996
[16] N. E. Thomas, “Multi-state and multi-sensor incident detection systems for arterial streets”, Transportation Research Part C, vol. 6, no. 2, pp. 337-357, 1998
[17] M. Chan, D. Esteve, C. Escriba and E. Campo, “A review of smart homes – Present state and future challenges”, Computer Methods and Programs in Biomedicine, vol. 91, pp. 55-81, July 2008
[18] M. C. Mozer, “The neural network house: an environment that adapts to its inhabitants”, in Proceedings of the AAAI Spring Symposium on Intelligent Environments, Menlo Park, California, USA: AAAI Press, 1998, pp. 110-114
[19] J. Krumm, S. Harris, B. Meyers, B. Brumitt, M. Hale and S. Shafer, “Multi-camera multi-person tracking for EasyLiving”, in Proceedings of the 3rd IEEE International Workshop on Visual Surveillance, 2000, pp. 3-10
[20] B. Brumitt, B. Meyers, J. Krumm, A. Kern and S. Shafer, “EasyLiving: technologies for intelligent environments”, in Proceedings of Handheld and Ubiquitous Computing, Second International Symposium, HUC 2000 (Lecture Notes in Computer Science vol. 1927), 2000, pp. 12-29
[21] T. Yamazaki, “Beyond the smart home”, in Proceedings of the International Conference on Hybrid Information Technology (ICHIT’06), 2006, pp. 350-355
[22] En Route HQ, En Route HQ web page, available via http://www.enroutehq.com
[23] Glympse, Glympse web site, available via http://www.glympse.com
[24] TomTom, TomTom PLUS Services Buddies, available via http://www.tomtom.com
[25] Glympse, iTunes Store entry in the Apple iTunes application
[26] Google Latitude, Google Latitude web page, available from http://www.google.com/intl/en_us/latitude/intro.html
[27] Centrl, Centrl web site, available from http://centrl.com
[28] Pocket Life, Pocketweb web page, available via http://www.pocketweb.com
[29] Look 8 Me, digital art media GmbH web page, available from http://www.look8me.de
[30] Locus, Nsquared web site, available from http://nsquaredsolutions.com/Locus/
[31] Locc.us, Locc.us web page, available via http://locc.us
[32] R. Iqbal, N. Shah, A. James and J. Duursma, “User-centred design and evaluation of support management system”, in Proceedings of the 13th International Conference on Computer Supported Cooperative Work in Design, Santiago, Chile, April 2009, pp. 155-160
[33] Microsoft, GPS Intermediate Driver Architecture, available via http://msdn.microsoft.com/en-us/library/bb201942.aspx
[34] Microsoft, .NET Compact Framework 3.5, available via: http://www.microsoft.com/
[35] Microsoft, About ASP.NET AJAX, available via http://www.asp.net/ajax/about/
[36] Microsoft, Data Platform Development Center – LINQ, available from http://msdn.microsoft.com/en-us/data/cc299380.aspx
[37] N. Pardo, A. Sala, A. Montero, J. F. Urchueguia and J. Martos, “Advanced control structure for energy management in ground coupled heat pump HVAC system,” in Proceedings of the 17th World Congress of the International Federation of Automatic Control, 2008, pp. 2448-2453
Dr Rahat Iqbal is a Senior Lecturer in the Distributed Systems
and Modelling Applied Research Group at Coventry University.
His main duties include teaching and tutorial guidance, research
and other forms of scholarly activity, examining, curriculum
development, coordinating and supervising postgraduate project
students and monitoring the progress of research students within
the Department.
His research interests lie in requirements engineering, in
particular with regard to user-centred design and evaluation in
order to balance technological factors with human aspects to
explore implications for better design. A particular focus of his
interest is how user needs could be incorporated into the
enhanced design of ubiquitous computing systems, such as
smart homes, assistive technologies, and collaborative systems.
He is using Artificial Intelligence agents to develop such supportive systems. He has published more than 50 papers in peer-reviewed journals and reputable conferences and workshops.
Dr Anne James is Professor of Data Systems Architecture in
the Distributed Systems and Modelling Applied Research
Group at Coventry University. Her main duties involve leading
research, supervising research students and teaching at
undergraduate and postgraduate levels. Her teaching interests
are enterprise systems development, distributed applications
development and legal aspects of computing.
The research interests of Professor James are in the general
area of creating systems to meet new and unusual data and
information challenges. Examples of current projects are the
development of Quality of Service guarantees in Grid
Computing and the development of special techniques to
accommodate appropriate handling of web transactions.
Professor James has supervised around 20 research degree programmes and has published more than 100 papers in peer-reviewed journals or conferences. She is currently also involved
in an EU FP7 funded programme to reduce energy consumption
in homes, through appropriate data collection and presentation.
Dr John Black has a B.Sc. in Physics and Astrophysics as well as a Ph.D. in astronomical image processing, both obtained from King’s College, University of London. He has conducted
research in vector quantisation, image and data fusion, tracking,
image and video compression and data mining. He has worked
at Coventry University, University of Warwick, QinetiQ and
QinetiQ’s predecessor organisations. At the time of writing of
this paper he was completing an M.Sc. in software engineering
at Coventry University.
Witold Poreda is a software developer at UYT Limited,
Coventry, UK. UYT Limited is an automotive component
manufacturing facility producing Body-in-White (BIW)
components and sunroof assemblies.
Discrete Characterization of Domain Using
Semantic Clustering
Sanjay Madan
Comviva Technologies Ltd.,
MBS-PACS, Gurgaon, India.
Email: [email protected]
Shalini Batra
Computer Science and Engineering Department,
Thapar University, Patiala, Punjab, India
Email: [email protected]
Abstract—Many approaches have been developed to understand software source code, and the majority of them focus on program structural information, which results in the loss of crucial domain semantic information contained in the text and symbols of the source code. To understand software as a whole, we need to enrich these approaches with conceptual insights gained from the domain semantics. This paper proposes mapping the domain to the code using information retrieval techniques applied to linguistic information, such as identifier names and comments in source code. The concept of semantic clustering is introduced, and an algorithm is provided to group source artifacts based on synonymy and polysemy. After the clusters are detected, the program code is automatically labeled based on semantic similarity and explored visually in three dimensions for discrete characterization. The approach works at the textual level of the source code, which makes it language independent; it correlates the semantics with structural information and applies at different levels of abstraction (e.g. packages, classes, methods).
Index Terms— Information retrieval, Semantic clustering,
Software reverse engineering.
I. INTRODUCTION
GAINING knowledge about a software system is one of the main activities in software reengineering. It has been estimated that up to 60 percent of software maintenance effort is spent on comprehension [1]. This is because a lot of knowledge about the software system and its associated business domain is not captured in an explicit form. Most approaches that have been developed focus on program structure [2] or on external documentation [3, 4]. However, identifier names and source code comments are a fundamental source of information.
Source code comprises two types of communication: human-machine communication through program instructions, and human-to-human communication through identifier names and comments [5]. The executables are for the machine, whereas the code itself is written for humans, not for machines. Let us consider a small code example, which tells whether a time value is in the morning:
/** Return true if the given 24-hour time is in the
    morning and false otherwise. */
public boolean isMorning(int hours, int minutes, int seconds) {
    if (!isDate(hours, minutes, seconds))
        throw new IllegalArgumentException("Invalid input: not a time value.");
    return hours < 12 && minutes < 60 && seconds < 60;
}
When we strip away all identifiers and comments,
from the machine point of view the functionality remains
the same, but for a human reader the meaning is
obfuscated and almost impossible to figure out. In our example, retaining only the formal information yields:
public type1 method1(type2 a, type2 b, type2 c) {
    if (!method2(a, b, c)) throw new Exception(literal1);
    return (a < A) && (b < B) && (c < C);
}
On the other hand, retaining only the informal information, i.e. the naming, yields:

is int hours minutes int < minutes input hours is seconds && boolean morning false 24 time minutes not 60 invalid && value seconds time < seconds hour given hours 60 12 < morning date int is otherwise [5].

Even though this vocabulary is presented in random order, the domain of the code is still recognizable.
It is well known that information retrieval provides means to analyze, classify and characterize text documents based on their content, and the representation of documents as bags of terms is a well-established technique in information retrieval (IR) for modeling documents in a text corpus. Apart from external documentation, the location and use of source-code identifiers is the most frequently consulted source of
information in software maintenance. In software analysis there are different approaches that apply IR to external documentation [6, 7], but little work has focused on treating the source code itself as a data source. Here we use information retrieval to derive topics from the vocabulary usage at the source code level.
The first three steps of domain extraction from source code are pre-processing, applying LSI, and clustering; in addition, we retrieve the most relevant terms for each cluster. In short, the approach is:
(1) Pre-process the software system: break the system into documents and build a term-document matrix that contains the vocabulary usage of the system.
(2) Apply Latent Semantic Indexing: use LSI to compute the similarities between source code documents and illustrate the result in a correlation matrix [10].
(3) Identify topics: cluster the documents based on their similarity and rearrange the correlation matrix; each cluster is a linguistic topic.
(4) Describe the topics with labels: use LSI again to retrieve the top-n most relevant terms for each cluster.
II. LATENT SEMANTIC INDEXING
Latent Semantic Indexing (LSI) is a technique common in information retrieval, used to index, analyze and classify text documents. It analyzes how terms are spread over the documents of a text corpus and creates a search space of document vectors: similar documents are located near each other in this space and unrelated documents far apart from each other. Since LSI can be used to locate linguistic topics in a set of documents [8, 9], it is applied to compute the linguistic similarity between source artifacts (e.g. packages, classes or methods) and to cluster them according to their similarity. This clustering partitions the system into linguistic topics that represent groups of documents using similar vocabulary. LSI can be used to analyze the linguistic information of a software system because source code is basically composed of text documents.
To illustrate further: like other IR techniques, Latent Semantic Indexing is based on the vector space model (VSM) approach. This approach models documents as bags of words and arranges them in a term-document matrix A, such that ai,j equals the number of times term ti occurs in document dj.
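As a small illustration (not part of the approach's own code), the following Java sketch builds such a term-document matrix from already-tokenised documents; each row of the resulting map is the occurrence vector of one term across all documents.

    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class TermDocumentMatrix {
        // counts.get(term)[j] = number of times 'term' occurs in document j.
        static Map<String, int[]> build(List<List<String>> documents) {
            Map<String, int[]> counts = new TreeMap<>();
            for (int j = 0; j < documents.size(); j++) {
                for (String term : documents.get(j)) {
                    counts.computeIfAbsent(term,
                        t -> new int[documents.size()])[j]++;
                }
            }
            return counts;
        }

        public static void main(String[] args) {
            // Two toy "documents" drawn from the isMorning example above.
            List<List<String>> docs = List.of(
                List.of("hours", "minutes", "seconds", "morning", "hours"),
                List.of("date", "time", "hours", "invalid"));
            build(docs).forEach((term, row) ->
                System.out.println(term + " -> " + Arrays.toString(row)));
        }
    }

For instance, the term "hours" maps to the vector [2, 1], occurring twice in the first document and once in the second.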
LSI was developed to overcome the problems with synonymy and polysemy that occurred in prior vectorial approaches; it improves the basic vector space model by replacing the original term-document matrix with an approximation. This is done using singular value decomposition (SVD), a principal components analysis (PCA) technique originally used in signal processing to reduce noise while preserving the original signal. Assuming that the original term-document matrix is noisy (due to synonymy and polysemy), the approximation is interpreted as a noise-reduced – and thus better – model of the text corpus.
For example, a typical search engine covers a text corpus with millions of web pages containing some tens of thousands of terms, which is reduced to a vector space with only 200-500 dimensions. In software analysis, the number of documents is much smaller and we reduce the text corpus to 20-50 dimensions.
There is a wide range of applications of LSI, such as automatic assignment of reviewers to submitted conference papers [10], cross-language search engines, spell checkers and many more. In the field of software engineering, LSI has been successfully applied to categorize source files [11] and open-source projects [12], to detect high-level conceptual clones [13], and to recover links between external documentation and source code [14, 15]. Furthermore, LSI has proved useful in psychology to simulate language understanding of the human brain, including processes such as the language acquisition of children.
Figure 1 schematically represents the LSI process. The document collection is modeled as a vector space. Each document is represented by the vector of its term occurrences, where terms are words appearing in the document. The term-document matrix A is a sparse matrix holding one document vector per column. This matrix is of size n × m, where m is the number of documents and n the total number of terms over all documents. Each entry ai,j is the frequency of term ti in document dj. A geometric interpretation of the term-document matrix is a set of document vectors occupying a vector space spanned by the terms. The similarity between documents is typically defined as the cosine or inner product between the corresponding vectors. Two documents are considered similar if their corresponding vectors point in the same direction.
Figure 1: LSI takes as input a set of documents and the term occurrences, and returns as output a vector space containing all the terms and all the documents. The similarity between two items (terms or documents) is given by the angle between their corresponding vectors [5].
LSI starts with the term-document matrix as input, weighted by a weighting function to balance out very rare and very common terms. SVD is used to break the vector space model down into fewer dimensions, preserving as much information as possible about the relative distances between the document vectors while collapsing them into a much smaller set of dimensions. SVD decomposes the matrix A into its singular values and singular vectors, and yields – when truncated at the k largest singular values – an approximation A' of A with rank k. Furthermore, not only the low-rank term-document matrix A' can be computed, but also a term-term matrix and a document-document matrix. Thus, LSI allows us to compute term-document, term-term and document-document similarities.
As the rank is the number of linearly independent rows and columns of a matrix, the vector space spanned by A' has dimension k only and is much less complex than the initial space. When used for information retrieval, k is typically about 200-500, while n and m may go into the millions. When used to analyze software, on the other hand, k is typically about 20-50, with vocabulary and documents in the range of thousands only. Since A' is the best approximation of A under the least-square-error criterion, the similarity between documents is preserved, while at the same time semantically related terms are mapped onto the same axes of the reduced vector space, thus taking synonymy and polysemy into account. In other words, the initial term-document matrix A is a table of term occurrences, and by breaking it down to far fewer dimensions the latent meaning must appear in A', since there is now much less space to encode the same information. Meaningless occurrence data is thus transformed into meaningful concept information.
III. TERM AND DOCUMENT SIMILARITY
To show the SVD factors geometrically, the rows of the matrices are taken as coordinates of points representing the documents and terms in a k-dimensional vector space. The nearer one point lies to another, the more similar the corresponding documents or terms are (see Figure 2). Similarity is typically defined as the cosine between the corresponding vectors:

sim(di, dj) = cos(vi, vj)
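A direct transcription of this definition (a sketch; the vector names are illustrative):

    import numpy as np

    def cosine(vi, vj):
        # sim(di, dj) = cos(vi, vj) for two row vectors of the LSI space.
        return float(np.dot(vi, vj) / (np.linalg.norm(vi) * np.linalg.norm(vj)))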
Figure 2: On the left, an LSI space with terms and documents; similar elements are placed near each other [5].

Computing the similarity between documents di and dj is done by taking the cosine between the i-th and j-th rows of the matrix. The resulting similarity values range from 0 to 1: 1 for similar vectors with the same direction, 0 for dissimilar, orthogonal vectors. Theoretically, cosine values can go all the way down to −1, but because there are no negative term occurrences, similarity values never drop below zero.

IV. SEMANTIC CLUSTERING

Semantic clustering is a non-interactive and unsupervised technique to analyze the semantics of a software system. It offers a high-level view of the domain concepts of a system, abstracting concepts from software artifacts. First, Latent Semantic Indexing (LSI) is used to extract linguistic information from the source code; then clustering is applied to group the related software artifacts. The groups of artifacts sharing the same vocabulary are called clusters. Each cluster thus reveals a different concept of the system. Most of these are domain concepts, some are implementation concepts; the actual ratio depends on the naming conventions of the system.

At the end, the inherently unnamed concepts are labelled with terms taken from the vocabulary of the source code. An automatic algorithm labels each cluster with its most similar terms, and in this way provides a human-readable description of the main concepts in a software system. Additionally, the clustering is visualized as a shaded correlation matrix that illustrates:
• the semantic similarity between elements of the system – the darker a dot, the more similar its artifacts;
• a partition of the system into clusters with high semantic cohesion, which reveals groups of software artifacts that implement the same domain concept;
• semantic links between these clusters, which emphasize single software artifacts that interconnect the above domain concepts.

Figure 3: From left to right: unordered correlation matrix, then sorted by similarity, then grouped by clusters, and finally including semantic links [5].

V. BUILDING THE TEXT CORPUS

A text corpus is a large and structured set of texts. To build a semantic model, Latent Semantic Indexing (LSI) is used to analyze the distribution of terms over a text corpus. When applying LSI to a software system, we break its source code into documents and use the vocabulary found therein as terms. The system can be split into documents at any level of granularity, such as modules, classes or methods; it is even possible to use entire projects as documents [16].
The vocabulary of the source code can be extracted both from the content of comments and from the identifier names. Comments are parsed as natural language text, and compound identifier names are split into their parts. As most modern naming conventions use camel case, it is straightforward to split identifiers: for example, FooBar becomes foo and bar. In the case of legacy code that uses other naming conventions, more advanced algorithms and heuristics are required [17, 18].
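A minimal sketch of such camel-case splitting (the regular expression below is one common choice, not the one prescribed by the paper):

    import re

    def split_identifier(name):
        # Split a camel-case identifier into lowercase terms, e.g. FooBar -> ['foo', 'bar'].
        parts = re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+', name)
        return [p.lower() for p in parts]

    print(split_identifier("FooBar"))             # ['foo', 'bar']
    print(split_identifier("parseHTTPResponse"))  # ['parse', 'http', 'response']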
Common stop words are excluded from the vocabulary, as they do not help to discriminate documents, and a stemming algorithm is used to reduce all words to their morphological roots. Finally, the term-document matrix is weighted with tf-idf (term frequency, inverse document frequency) to balance out the influence of very rare and very common terms.
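The following sketch illustrates these three filters; the stop list is a minimal placeholder and the stemmer is a crude stand-in for a real one (e.g. Porter's):

    import math

    STOP = {"the", "a", "is", "of", "and", "to", "in"}   # illustrative stop list

    def stem(word):
        # Placeholder for a proper stemming algorithm: crude suffix stripping only.
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[:-len(suffix)]
        return word

    def tf_idf(tf, df, n_docs):
        # Weight a raw term frequency tf by log(N / df), where df is the number
        # of documents containing the term: rare terms are boosted, common ones damped.
        return tf * math.log(n_docs / df)

    words = [stem(w) for w in "parsing the input values".split() if w not in STOP]
    print(words, tf_idf(tf=3, df=2, n_docs=100))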
VI. SEMANTIC SIMILARITY AND CORRELATION MATRIX
Semantic similarity is the likeness of meaning or semantic content within a set of documents or terms. Latent Semantic Indexing (LSI) can be used to extract linguistic information from the source code. The result of this process is an LSI index L with similarities between software artifacts as well as terms. Based on the index we can determine the similarity between these elements: software artifacts are more similar if they cover the same concept, terms are more similar if they denote related concepts. Since similarity is defined as the cosine between element vectors, its values range between 0 and 1. The similarities between elements are arranged in a square matrix A called the correlation matrix.
To visualize the similarity values we map them to gray values: the darker, the more similar. In that way the matrix becomes a raster graphic with gray dots: each dot ai,j shows the similarity between element di and element dj. The elements are arranged on the diagonal, and the dots off the diagonal show the relationships between them.
Without proper ordering, the correlation matrix looks like a television tuned to a dead channel. An unordered matrix does not reveal any patterns: an arbitrary ordering, such as by the names of the elements, is generally as useful as a random ordering [19]. Therefore the matrix is clustered such that similar elements are put near each other and dissimilar elements far apart from each other. After applying the clustering algorithm, similar elements are grouped together and aggregated into concepts. Hence, a concept is characterized as a set of elements that use the same vocabulary. Documents that are not related to any concept usually end up in singleton clusters in the middle or in the bottom right of the correlation matrix. The correlation matrices are ordered using the average-linkage clustering algorithm.
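A sketch of this ordering step using SciPy's hierarchical clustering (the similarity matrix is a toy example):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, leaves_list
    from scipy.spatial.distance import squareform

    # sim: square correlation matrix of similarities in [0, 1].
    sim = np.array([
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.2, 0.1],
        [0.1, 0.2, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0],
    ])

    # Average-linkage clustering works on distances, so use 1 - similarity.
    dist = 1.0 - sim
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")

    order = leaves_list(Z)             # permutation placing similar elements side by side
    print(sim[np.ix_(order, order)])   # the sorted correlation matrix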
The matrix is first reordered; then the dots are grouped by clusters and coloured with their average cluster similarity. As with the element similarities above, the similarities between clusters are arranged in a square matrix. When visualized, this matrix becomes a raster graphic with gray rectangles: each rectangle ri,j shows the similarity between cluster Ri and cluster Rj and has size (|Ri|, |Rj|). The clusters are arranged on the diagonal, and the rectangles off the diagonal show the relationships between them; see the third matrix in Figure 3.
The average similarity with which a cluster's dots are coloured is its semantic cohesion [20]. The clustered and sorted correlation matrix thus offers a high-level view of the system, abstracting from elements to concepts.
VII. DISCRETE CHARACTERIZATION OF CLUSTERS
Visualizing the clusters in three dimensions makes the detected domain concepts considerably easier to grasp, particularly for distributed applications. Just visualizing clusters is, however, not enough; labelling is required to describe each cluster. Often just enumerating the names of the software artifacts in a cluster gives a sufficient interpretation. If the names are badly chosen, or if unnamed software artifacts are analyzed, we need an automatic way to identify labels. Figure 4 shows the labels in the context of the LAN example.
Figure 4: Automatically retrieved labels describe the concepts. The
labels were retrieved using the documents in a concept cluster as query
to search the LSI space for related terms.
To obtain the most relevant labels, a comparison is performed between the similar terms of the current cluster and the similar terms of all other clusters.
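One plausible reading of this labeling step, sketched below: rank all terms by their similarity to the centroid of a cluster's document vectors in the LSI space (the function and variable names are invented for illustration):

    import numpy as np

    def top_n_labels(term_vectors, terms, cluster_doc_vectors, n=5):
        # Use the cluster's documents as a query: rank terms by cosine
        # similarity to the centroid of the cluster in the LSI space.
        centroid = cluster_doc_vectors.mean(axis=0)
        centroid = centroid / np.linalg.norm(centroid)
        tv = term_vectors / np.linalg.norm(term_vectors, axis=1, keepdims=True)
        scores = tv @ centroid
        best = np.argsort(scores)[::-1][:n]
        return [(terms[i], float(scores[i])) for i in best]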
Altogether, the steps of the domain extraction from source code – pre-processing, applying LSI, clustering, and retrieving the most relevant terms for each cluster – together with the similarity measurement used to identify topics in the source code follow the flow depicted in Figure 5:
Figure 5: Modified Semantic clustering of software source code [5].
VIII. CONCLUSION
When understanding a software system, analyzing its structure reveals only half of the story; the other half resides in the domain semantics of the implementation. Developers put their domain knowledge into identifier names and comments. This work presented the use of semantic clustering to analyze the textual content of source code and to recover domain concepts from the code itself [22]. To identify the different concepts in the code, we applied Latent Semantic Indexing (LSI) and clustered the source artifacts according to the vocabulary of identifiers and comments. Each cluster represents a distinct domain concept. To define a concept and to retrieve the most relevant labels for its cluster, the LSI technique is used again: for each cluster, the labels are obtained by ranking and filtering the most similar terms [16]. The result of applying LSI is a vector space, based on which we can compute the similarity between either documents or terms.
REFERENCES
[1] A. Abran, P. Bourque, R. Dupuis, L. Tripp, “Guide to the
software engineering body of knowledge (ironman
version),” Tech. rep., IEEE Computer Society (2004).
[2] S. Ducasse, M. Lanza, “The class blueprint: Visually
supporting the understanding of classes,” IEEE
Transactions on Software Engineering 31 (1) (2005) 75–
90.
[3] Y. S. Maarek, D. M. Berry, G. E. Kaiser, “An information
retrieval approach for automatically constructing software
libraries,” IEEE Transactions on Software Engineering 17
(8) (1991) 800–813.
[4] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, E.
Merlo, “Recovering traceability links between code and
documentation,” IEEE Transactions on Software
Engineering 28 (10) (2002) 970–983.
[5] Adrian Kuhn, Stéphane Ducasse, Tudor Gîrba, “Semantic Clustering: Identifying Topics in Source Code,” Language and Software Evolution Group, LISTIC, Université de Savoie, France, 2006.
[6] Yoëlle S. Maarek, Daniel M. Berry, and Gail E. Kaiser, “An information retrieval approach for automatically constructing software libraries,” IEEE Transactions on Software Engineering, 17(8):800–813, August 1991.
[7] Giuliano Antoniol, Gerardo Canfora, Gerardo Casazza,
Andrea De Lucia, and Ettore Merlo, “Recovering
traceability links between code and documentation,” IEEE
Transactions on Software Engineering, 28(10):970–983,
2002.
[8] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G.
W. Furnas, R. A. Harshman, “Indexing by latent semantic
analysis,” Journal of the American Society of Information
Science 41 (6) (1990) 391–407.
[9] A. Marcus, A. Sergeyev, V. Rajlich, J. Maletic, “An information retrieval approach to concept location in source code,” in: Proceedings of the 11th Working Conference on Reverse Engineering (WCRE 2004), 2004, pp. 214–223.
[10] S. T. Dumais, J. Nielsen, “Automating the assignment
of submitted manuscripts to reviewers,” In Research and
Development in Information Retrieval, 1992, pp. 233–244.
[11] J. I. Maletic, A. Marcus, “Using latent semantic analysis to identify similarities in source code to support program understanding,” in: Proceedings of the 12th International Conference on Tools with Artificial Intelligence (ICTAI 2000), 2000, pp. 46–53.
[12] S. Kawaguchi, P. K. Garg, M. Matsushita, K. Inoue, “MUDABlue: An automatic categorization system for open source repositories,” in: Proceedings of the 11th Asia-Pacific Software Engineering Conference (APSEC 2004), 2004, pp. 184–193.
[13] A. Marcus, J. I. Maletic, “Identification of high-level
concept clones in source code,” in: Proceedings of the 16th
International Conference on Automated Software
Engineering (ASE 2001), 2001, pp. 107–114.
[14] A. De Lucia, F. Fasano, R. Oliveto, G. Tortora, “Enhancing an artefact management system with traceability recovery features,” in: Proceedings of the 20th IEEE International Conference on Software Maintenance (ICSM 2004), 2004, pp. 306–315.
[15] A. Marcus, D. Poshyvanyk, “The conceptual cohesion of classes,” in: Proceedings of the International Conference on Software Maintenance (ICSM 2005), IEEE Computer Society Press, Los Alamitos CA, 2005, pp. 133–142.
[16] Adrian Kuhn, Stéphane Ducasse, and Tudor Gîrba, “Semantic clustering: Exploiting source code linguistic information,” Information and Software Technology, submitted, 2006.
[17] Bruno Caprile and Paolo Tonella, “Nomen est omen: Analyzing the language of function identifiers,” in Proceedings of the 6th Working Conference on Reverse Engineering (WCRE 1999), pages 112–122. IEEE Computer Society Press, 1999.
[18] Nicolas Anquetil and Timothy Lethbridge, “Extracting concepts from file names; a new file clustering criterion,” in International Conference on Software Engineering (ICSE’98), pages 84–93, 1998.
[19] Jacques Bertin, “Graphics and Graphic Information Processing,” Walter de Gruyter, 1981.
[20] Andrian Marcus and Denys Poshyvanyk, “The conceptual cohesion of classes,” in Proceedings of the International Conference on Software Maintenance (ICSM 2005), pages 133–142, Los Alamitos CA, 2005. IEEE Computer Society Press.
[21] Michael W. Berry, Susan T. Dumais, and Gavin W. O’Brien, “Using linear algebra for intelligent information retrieval,” SIAM Review, 37(4):573–597, 1995.
[22] Adrian Kuhn, Stéphane Ducasse, and Tudor Gîrba, “Enriching reverse engineering with semantic clustering,” in Proceedings of the Working Conference on Reverse Engineering (WCRE 2005), pages 113–122, Los Alamitos CA, November 2005. IEEE Computer Society Press.
Sanjay Madan has been working as a Software Engineer at Comviva Technologies Ltd, Gurgaon, since 2009. He completed his post-graduation at Thapar University, Patiala. He has worked on more than six professional/research projects and is the author/co-author of four publications in international conferences and journals. His research areas of interest include Web semantics and machine learning, particularly semantic clustering and classification. During his teaching career he has taught courses on Data Structures, Web Technologies and Computer Graphics.
Shalini Batra has been working as an Assistant Professor in the Computer Science and Engineering Department, Thapar University, Patiala, since 2002. She completed her post-graduation at BITS, Pilani, and is pursuing a Ph.D. at Thapar University in the area of semantics and machine learning. She has guided fifteen ME theses and is presently guiding four. She is the author/co-author of more than twenty-five publications in national and international conferences and journals. Her areas of interest include Web semantics and machine learning, particularly semantic clustering and classification. She teaches courses on Compiler Construction, Theory of Computation, and Parallel and Distributed Computing.
GRAAA: Grid Resource Allocation Based on
Ant Algorithm
Manpreet Singh
Department of Computer Engineering, M. M. Engineering College, M. M. University, Mullana, Haryana, India
Email: [email protected]
Abstract— Selecting the appropriate resources for a particular task is one of the major challenges in a computational grid. The major objective of resource allocation in a grid is effective scheduling of tasks and, in turn, a reduction in execution time. Hence resource allocation must consider specific characteristics of the resources and tasks, and then decide the metrics to be used accordingly. The ant algorithm, a heuristic algorithm, suits the allocation and scheduling problems of the grid environment well. In this paper, a Grid Resource Allocation based on Ant Algorithm (GRAAA) is proposed. The simulation results show that the proposed algorithm is capable of producing high-quality allocations of grid resources to tasks.

Index Terms— Resource Allocation, Task Scheduling, Ant System, Grid
I. INTRODUCTION
Resource allocation and task scheduling are fundamental issues in achieving high performance in grid computing systems. However, the design and implementation of efficient allocation and scheduling algorithms is a big challenge. Unlike scheduling problems in conventional distributed systems, this problem is much more complex, as new features of grid systems such as their dynamic nature and the high degree of heterogeneity of jobs and resources must be tackled [1]. By harmonizing and distributing the grid resources efficiently, an advanced resource allocation strategy can greatly reduce total run time and total expense and bring optimal performance [2][3].

The ant algorithm is a random search algorithm, like other evolutionary algorithms [4]. The algorithm is a model-based bionic approach with distinct transition and pheromone-updating rules, inspired by the self-reinforcing foraging behavior exhibited by ant societies. It is an algorithm for solving NP-hard combinatorial optimization problems, such as the TSP (Traveling Salesman Problem) [3]; it was later used for the JSP (Job-shop Scheduling Problem) [5][6], the QAP (Quadratic Assignment Problem), and so on [7].

The motivation of this paper is to develop a grid resource allocation algorithm that performs efficiently and effectively in terms of minimizing total execution time and cost. Not only does it improve the overall performance of the system, it also adapts to the dynamic grid system. First, this paper proposes a Resource Oriented Ant Algorithm (ROAA) to find the optimal allocation of each resource within the dynamic grid system. Second, a simulation of the proposed algorithm is presented using GridSim.
II. RELATED WORK
Recently, many researchers have studied allocation and scheduling in the grid environment. Some of the popular heuristic algorithms that have been developed are Min-Min [8], Fast Greedy [8], Tabu Search [8] and the Ant System [9]. The Max-Min Ant System (MMAS) [10] limits the pheromone range to be greater than or equal to a lower bound (Min) and smaller than or equal to an upper bound (Max), to prevent the ants from converging too soon in some ranges. The authors of [11] use multiple kinds of ants to find multiple optimal paths for network routing; the idea can be applied to find multiple available resources and balance resource utilization in job scheduling. In [12], the scalability of the ant algorithm is validated, and a simple grid simulation architecture and a design of an ant algorithm suitable for grid task scheduling are proposed.
III. GRID RESOURCE ALLOCATION BASED ON ANT ALGORITHM (GRAAA)

The GRAAA is a resource allocation framework comprising users, a resource broker, resources and the Grid Information Service (GIS). It adopts an ant colony as its major allocation strategy, as shown in Fig. 1.
Figure 1: System Model (the user submits tasks to the resource broker, whose task agent, ant allocation algorithm and resource discovery agent interact with the Grid Information Service and the resources R1 … RN for registration, status queries, task submission and results).
The interaction among the various entities of the system model is as follows:
Step 1: Resources register with the GIS.
Step 2: The user submits a task with its complete specification to the resource broker through the grid portal.
Step 3: The Task Agent (TA) places all submitted tasks in a task set and activates the Resource Discovery Agent (RDA).
Step 4: The RDA queries the GIS about resources.
Step 5: The GIS returns the static attributes of the resources, such as the number of machines, the number of processing elements (PE), the MIPS (Million Instructions Per Second) rating of each PE, and the allocation policy.
Step 6: The RDA sends a query to the registered resources for their availability status.
Step 7: The RDA gets the status information and makes it available to the TA.
Step 8: The TA, by deploying the ant algorithm, selects a resource for the next task assignment and dispatches the task to the selected resource through the RDA.
Step 9: After task execution, the results are received from the resources and returned to the user by the TA.
IV. RESOURCE ORIENTED ANT ALGORITHM (ROAA)

The ant algorithm [3] is inspired by an analogy with the real-life behavior of a colony of ants looking for food, and it is an effective algorithm for the solution of many combinatorial optimization problems. Investigations show that ants have the ability to find an optimal path from nest to food. As ants move, they lay pheromone on the ground. While an isolated ant moves essentially at random, an ant encountering a previously laid trail can detect it and decide with high probability to follow it, thus reinforcing the trail with its own pheromone. The probability that an ant chooses a path is proportional to the concentration of pheromone on that path: the more ants choose a path, the denser its pheromone becomes, and denser pheromone attracts more ants. Through this positive feedback mechanism, the ants can finally find an optimal path.
In ROAA, the pheromone is associated with resources rather than paths. The increase or decrease of pheromone depends on the task status at the resources. The main objective of the algorithm is a reduction in total cost and execution time.

Let P be the number of tasks (ants) in the task set T maintained by the task agent, and Q the number of registered resources.

When a new resource Ri is registered with the GIS, it initializes its pheromone as:

τi(0) = N × M,

where N represents the number of processing elements and M corresponds to the MIPS rating of a processing element.
Whenever a new task is assigned to, or some task is returned from, Ri, the pheromone of Ri is changed as:

τi,new = ρ · τi,old + Δτi,

where Δτi is the pheromone variation and ρ, 0 < ρ < 1, is a pheromone decay parameter. When a task is assigned to Ri, its pheromone is reduced, i.e. Δτi = −C, where C represents the computational complexity of the assigned task. When a task is successfully returned from Ri, Δτi = Φ · C, where Φ is the encouragement argument. On the other hand, if a task failure occurs at Ri, Δτi = Θ · C, where Θ is the punishment argument. It is clear that the pheromone increases when task execution at a resource is successful.
The probability of the next task being assigned to resource Rj is computed as:

pj(t) = [τj(t)]^α · [ηj]^β / Σr [τr(t)]^α · [ηr]^β,

where j, r ∈ available resources, τj(t) denotes the current pheromone of resource Rj, and ηj represents the initial pheromone of Rj, i.e. ηj = τj(0). α is the parameter weighting the relative importance of the pheromone trail intensity, and β is the parameter weighting the relative importance of the initial performance attributes. The process of the resource oriented ant algorithm is shown below.
Procedure Resource_Ant_Algorithm
Begin
    Initialize parameters and set pheromone trails.
    While (task set T ≠ Φ) do
    Begin
        Select the next task t from T.
        Determine the next resource Ri for task assignment as the resource with the highest transition probability (highest pheromone intensity) among all resources, i.e. pi(t) = max over l ∈ Q of pl(t).
        Schedule task t to Ri and remove it from T, i.e. T = T − {t}.
        If (any task completion or failure occurs) then
            Update the pheromone intensity of the corresponding resource and the transition probabilities of all registered resources.
    End
End
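The following Python sketch condenses the pheromone bookkeeping and the selection rule above; the class and variable names are illustrative, and the parameter values are those given in Section V:

    import random

    RHO, ALPHA, BETA = 0.9, 0.5, 0.5    # decay and weighting parameters (Section V)
    PHI, THETA = 1.1, 0.8               # encouragement / punishment arguments

    class Resource:
        def __init__(self, name, n_pe, mips):
            self.name = name
            self.eta = float(n_pe * mips)   # initial pheromone: tau_i(0) = N x M
            self.tau = self.eta             # current pheromone

        def update(self, delta):
            self.tau = RHO * self.tau + delta   # tau_new = rho * tau_old + delta_tau

    def select_resource(resources):
        # p_j(t) = tau_j^alpha * eta_j^beta / sum_r tau_r^alpha * eta_r^beta;
        # the task goes to the resource with the highest transition probability.
        weights = [(r.tau ** ALPHA) * (r.eta ** BETA) for r in resources]
        total = sum(weights)
        return max(zip(resources, weights), key=lambda rw: rw[1] / total)[0]

    resources = [Resource("R1", 4, 300), Resource("R2", 16, 900)]
    C = 500.0                               # computational complexity of the task

    r = select_resource(resources)
    r.update(-C)                            # assignment: delta = -C
    ok = random.random() < 0.9              # simulated task outcome
    r.update((PHI if ok else THETA) * C)    # return: +phi*C on success, +theta*C on failure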
V. SIMULATION RESULTS

We analyze ROAA using the GridSim simulator [13]. The resources and tasks used in the simulation are modeled as shown in Table 1. The proposed algorithm is compared with the algorithm already used in GridSim, which selects the next resource for task assignment in a random fashion (RandomAlgorithm). The other simulation parameters are:

ρ = 0.9, α = 0.5, β = 0.5, Φ = 1.1, Θ = 0.8
In our simulation, we use 10 heterogeneous grid resources and run the simulation at five levels of workload: 50, 100, 150, 200 and 250 tasks.

Table 1: Simulation Parameters

Parameter                        Value
Number of Resources              10
Number of PEs per Resource       4-16
MIPS of PE                       300-900
Resource Cost                    9 G$
Total Number of Tasks (Ants)     50-250
Length of Task                   10000-18000 MI (Million Instructions)

Figure 2: Comparison of Total Cost (total cost in G$ versus number of tasks, for ROAA and RandomAlgorithm).

Figure 3: Comparison of Total Execution Time (execution time in hours versus number of tasks, for ROAA and RandomAlgorithm).

Each task is submitted into the grid system randomly. Fig. 2 and Fig. 3 show that the system using ROAA outperforms the system using RandomAlgorithm in terms of both execution time and cost.

VI. CONCLUSION

In this paper, we described grid resource allocation using an ant algorithm. The results of the experiments are presented and the strengths of the algorithm are investigated. The simulation results demonstrate that the ROAA algorithm increases performance in terms of the reduction in total execution time and cost. In future work, we plan to add ant-level load balancing, in addition to implementing this mechanism in a more realistic environment.

REFERENCES

[1] I. Foster, C. Kesselman, and S. Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations,” International Journal of High Performance Computing Applications, vol. 15(3), pp. 200-222, 2001.
[2] C. Chapman, M. Musolesi, W. Emmerich, and C. Mascolo, “Predictive Resource Scheduling in Computational Grids,” IEEE Parallel and Distributed Processing Symposium (IPDPS 2007), pp. 1-10, 26-30 March 2007.
[3] K. Krauter, R. Buyya, and M. Maheswaran, “A Taxonomy and Survey of Grid Resource Management Systems for Distributed Computing,” Software: Practice and Experience (SPE) Journal, Wiley Press, USA, vol. 32(2), pp. 135-164, 2002.
[4] M. Dorigo and L. M. Gambardella, “Ant colony system: a cooperative learning approach to the traveling salesman problem,” IEEE Transactions on Evolutionary Computation, vol. 1(1), pp. 53-66, 1997.
[5] A. Colorni, M. Dorigo, and V. Maniezzo, “Ant colony system for job-shop scheduling,” Belgian Journal of Operations Research, Statistics and Computer Science, vol. 34(1), pp. 39-53, 1999.
[6] A. Lorpunmanee, M. N. Sap, A. H. Abdullah, and C. Chompoo-inwai, “An Ant Colony Optimization for Dynamic Job Scheduling in Grid Environment,” International Journal of Computer and Information Science and Engineering, vol. 1(4), pp. 207-214, 2007.
[7] V. Maniezzo and A. Colorni, “The Ant System Applied to the Quadratic Assignment Problem,” IEEE Transactions on Knowledge and Data Engineering, vol. 11(5), pp. 769-778, 1999.
[8] T. D. Braun, H. J. Siegel, N. Beck, L. L. Bölöni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, D. Hensgen, and R. F. Freund, “A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems,” Journal of Parallel and Distributed Computing, vol. 61(6), pp. 810-837, 2001.
[9] Li, X. Peng, Z. Wang, and Y. Liu, “Scheduling Interrelated Tasks in Grid Based on Ant Algorithm,” Journal of System Simulation, 2007.
[10] T. Stutzle, “MAX-MIN Ant System for Quadratic Assignment Problems,” Technical Report AIDA-97-04, Intellectics Group, Department of Computer Science, Darmstadt University of Technology, Germany, July 1997.
[11] K. M. Sim and W. H. Sun, “Multiple Ant-Colony Optimization for Network Routing,” Proceedings of the First International Symposium on Cyber Worlds, pp. 277-281, 6-8 Nov. 2002.
[12] Z. Xu, X. Hou, and J. Sun, “Ant Algorithm-Based Task Scheduling in Grid Computing,” Proceedings of the IEEE Conference on Electrical and Computer Engineering, pp. 1107-1110, 2003.
[13] R. Buyya and M. Murshed, “GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource Management and Scheduling for Grid Computing,” Concurrency and Computation: Practice and Experience, vol. 14, pp. 1175-1220, 2002.
A Channel Allocation Algorithm for Hot-Spot
Cells in Wireless Networks
Rana Ejaz Ahmed
College of Engineering, American University of Sharjah
Sharjah, United Arab Emirates
Email: [email protected]
Abstract— Recent growth in mobile telephone traffic in wireless cellular networks, along with the limited number of channels available, presents a challenge for the efficient reuse of channels. The channel allocation problem becomes more complicated if one or more cells in the network become “hot-spots” for some period of time, i.e., the bandwidth resources currently available in those cells are not sufficient to sustain the needs of the current users in the cells. This paper presents a new hybrid channel allocation algorithm in which the base station sends a multi-level “hot-spot” notification to the central pool located at the Mobile Switching Center (MSC) on each channel request that cannot be satisfied locally at the base station. This notification requests that more than one channel be assigned to the requesting cell, proportional to the current hot-spot level of the cell. When a call using such a “borrowed” channel terminates, the cell may retain the channel depending upon its current hot-spot level. The simulation study of the protocol indicates that the protocol has low overhead, and that it behaves similarly to the Fixed Channel Allocation (FCA) scheme at high traffic loads and to the Dynamic Channel Allocation (DCA) scheme at low traffic loads. The proposed algorithm also offers low overhead in terms of the number of control messages exchanged between a base station and the MSC in the channel acquisition and release phases.

Index Terms— Cellular network architectures, Channel allocation schemes, Hot-spot cell design, Network architecture for ubiquitous computing.
I. INTRODUCTION

Recent growth of mobile telephone traffic in wireless cellular networks, along with the limited number of radio frequency channels available in the network, requires efficient reuse of channels. An efficient channel allocation strategy should exploit the principle of frequency reuse to increase the availability of channels and support the maximum possible number of calls at any given time. A given frequency channel cannot be used at the same time by two cells in the system if they are within a distance called the minimum channel reuse distance, because this would cause radio interference (also known as co-channel interference).
Several channel allocation schemes have been proposed in the literature [1-4], and they can be divided into three major categories: Fixed Channel Allocation (FCA), Dynamic Channel Allocation (DCA), and Hybrid Channel Allocation (HCA). In FCA schemes, a fixed number of channels is assigned to each cell according to predetermined traffic demand and co-channel interference constraints. FCA schemes are very simple; however, they are inflexible, as they do not adapt to changing traffic conditions and user distribution. In order to overcome these deficiencies of FCA schemes, DCA schemes have been introduced. In DCA schemes, channels are placed in a pool (usually centralized at the Mobile Switching Center (MSC) or distributed among the various base stations) and are assigned to new calls as needed. Any cell can use a channel as long as the interference constraints are satisfied. After the call is over, the channel is returned to the central pool. At the cost of higher complexity and control message overhead, DCA provides flexibility and traffic adaptability. However, DCA schemes are less efficient than FCA under high load conditions [2], mainly due to the high overhead involved in exchanging control messages. To improve performance, some DCA schemes use channel reassignment, where on-going calls may be switched, when possible, to reduce the distance between co-channel cells [1,2]. Another type of DCA strategy involves a channel borrowing mechanism between neighboring cells. In such a scheme, channels are assigned to each cell as is normally done in FCA; however, when a call request finds all such channels busy, a channel may be borrowed from a neighboring cell if the borrowing will not violate the co-channel interference constraints [1-5]. A generic mathematical theory for the load balancing problem in cellular networks is described in [6].
HCA techniques are designed by combining FCA and DCA schemes in an effort to take advantage of both. In HCA, the channels are divided into two disjoint sets: one set of channels is assigned to each cell on an FCA basis (the fixed set), while the other is kept in a central pool for dynamic assignment (the dynamic set). The fixed set contains a number of channels that are assigned to cells as in the FCA schemes, and such channels are preferred for use in their respective cells. When a mobile host needs a channel for its call and all the channels in the fixed set are busy, only then is a request made from the dynamic set. The ratio of the number of fixed to dynamic channels plays an important role: it has been found that if the ratio is 50% or more, FCA performs better than HCA. The HCA techniques proposed in the literature are complex to implement, and they suffer from the large control overhead incurred by system state collection and dissemination.
The channel allocation problem becomes even more challenging when one or more cells in the network become “hot-spots” for some duration of time. A cell becomes a “hot-spot” when the traffic generated in that cell goes far beyond its normal traffic load for that particular hour. An example of a “hot-spot” cell could be the area covered by a football stadium for the duration of a popular game. The HCA techniques reported in the literature do not offer proactive strategies for the case that a cell in the system will become a “hot-spot” in the very near future.

This paper presents a new HCA scheme that takes into account the level of traffic intensity in a cell, in terms of a “hot-spot” signal, in case the cell becomes a “hot-spot”. The proposed scheme is simple to implement and offers low overhead in terms of the number of control messages exchanged between the base station and the MSC in the channel acquisition and release phases.
II. BASELINE SYSTEM ARCHITECTURE
A. System Model and Definitions

We consider a wireless cellular network where the geographic area served by the system is divided into several hexagonal cells. Each cell is served by a base station (also called the MSS, Mobile Service Station), usually located at the center of the cell. The base stations are, in general, connected with one another through a fixed wired (or wireless) network. A mobile host can communicate directly only with the base station in its cell. When a mobile host wants to set up a call, it sends a request to the base station of its cell on the control channel. The call can be set up only if a channel is assigned to support the communication between the mobile host and the base station. No two cells in the system can use the same channel at the same time if they are within the minimum channel reuse distance; otherwise, channel interference will occur.

It is assumed that a base station can keep count of the number of calls originated (successful or unsuccessful) in its cell over a given period of time. This count helps the base station to determine its present “hot-spot” level and to send a multi-level “hot-spot” notification/signal to its Mobile Switching Center (MSC).
The system uses a hybrid channel allocation scheme where the total set C of channels is divided into two disjoint sets, F and D. The set F contains the channels for fixed (or static) assignment, while the set D contains the channels for dynamic assignment, i.e., C = F ∪ D. Moreover, each base station maintains a temporary pool (called T) to retain channels that were originally transferred from the dynamic assignment pool at the MSC. The system uses a frequency reuse factor N. The fixed channels are assigned to a cell statically as in FCA, while the dynamic channels are kept in a centralized pool at the MSC. Let r be the ratio of the number of dynamic channels to the total number of channels available in the system, i.e.,

r = |D| / |C|
The ratio r remains fixed in the system (i.e., it does not change dynamically over time). The value of r is a design parameter, and it depends on the designer’s view of the heterogeneity (or difference) in traffic volumes across the cells of the network. For example, if most of the cells in the network have a chance of becoming “hot-spots”, then it is a good idea to keep the ratio r > 0.5.

The “hot-spot” notification level is an integer-valued number L, such that

L ∈ {0, 1, 2, …, M}

where M represents a pre-defined maximum level supported by the system. The value of L represents the fact that up to L borrowed channels can be retained by the base station after a call on a borrowed channel in that cell terminates. The hybrid channel allocation algorithm (described next) uses the appropriate value of L in several of its steps.
B. Hybrid Channel Allocation Algorithm

The proposed hybrid channel allocation algorithm is described in two phases: the channel acquisition phase and the channel release phase. The steps taken by the mobile host, the base station and the MSC are outlined below. L is set to 0 at the beginning to indicate that, at the present time, channel requests can be accommodated from the fixed (static) list assigned to the cell.

Channel Acquisition Phase

The following steps are taken on the Mobile Host / Base Station side during the channel acquisition phase:
1. When a mobile host wants to initiate a call, it sends a channel request on the control channel to its base station.
2. If the base station has an available channel in its current fixed channel list (i.e., the set F), it assigns the channel to the mobile host, and the channel acquisition phase terminates.
3. If no channel from the fixed list of the cell is available, the base station updates the value of L as follows:
L = L + 1;
L = min(L, M);
4. The base station then sends a request to borrow a channel from the central pool located at the MSC. It also includes the current value of L in the channel request; the maximum value of L is the pre-defined number M.
5. When the base station successfully acquires channels from the dynamic pool at the MSC, it also adds them to its temporary pool (T).
The following steps are taken on the MSC side during the channel acquisition phase:
1. The MSC, on receiving a channel request from the base station, assigns up to L channels, if available, from the pool allocated for dynamic assignment to the requesting base station (even though the call generated by the mobile host needs only one channel), and the channel acquisition phase terminates. The main reason for assigning up to L channels (instead of only one) is a proactive measure: the “borrowing” event indicates that the probability of the cell covered by the requesting base station becoming a “hot-spot” might be on the rise, and an assignment of several channels with one request involves less overhead (in terms of control messages exchanged) than several single-channel requests.
2. If the MSC cannot assign even one channel, the call is blocked and the channel acquisition phase terminates.
Channel Release Phase
The following steps are taken on the Mobile Host, Base Station, and MSC sides during the channel release phase:
1. When a call on a channel ci terminates at a mobile host, the base station needs to find out which type of channel the call used. If the channel belongs to the fixed (static) pool maintained at the base station, the channel is returned to that pool and the channel release phase terminates.
2. However, if the channel ci being returned belongs to the dynamic pool at the MSC, the base station estimates the current “hot-spot” level h of the cell.
3. If h is less than or equal to the old value of level L, meaning that the congestion in the cell is unchanged or easing, the base station checks its temporary pool (T) and retains only up to h channels, chosen in random order; all the remaining channels (if any) in T are returned to the pool at the MSC.
4. If h is greater than the old value of level L (i.e., h > L), meaning that the congestion in the cell is getting worse, the channel ci is retained in the cell and returned to the base station’s temporary pool (T); the channel ci is not returned to the MSC at this time.
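A minimal sketch of this two-phase logic (the data structures and the reuse of retained channels in T are simplifying assumptions, not part of the paper’s specification):

    M = 4                                    # maximum hot-spot level (varied 4-8 in Section III)

    class MSC:
        def __init__(self, dynamic_channels):
            self.D = set(dynamic_channels)   # centralized dynamic pool

        def borrow(self, l):
            # Hand out up to l channels in one request ("bulk" acquisition).
            return {self.D.pop() for _ in range(min(l, len(self.D)))}

        def give_back(self, ch):
            self.D.add(ch)

    class BaseStation:
        def __init__(self, fixed_channels, msc):
            self.F = set(fixed_channels)     # fixed (static) pool
            self.T = set()                   # temporary pool of borrowed channels
            self.L = 0                       # current hot-spot notification level
            self.msc = msc

        def acquire(self):
            if self.F:
                return self.F.pop()          # step 2: serve from the fixed list
            if self.T:
                return self.T.pop()          # assumption: reuse a retained borrowed channel
            self.L = min(self.L + 1, M)      # step 3: raise the level, capped at M
            self.T |= self.msc.borrow(self.L)          # step 4: bulk request of up to L channels
            return self.T.pop() if self.T else None    # None: the call is blocked

        def release(self, ch, from_fixed, h):
            if from_fixed:
                self.F.add(ch)               # step 1: back to the fixed pool
                return
            self.T.add(ch)
            if h <= self.L:                  # step 3: congestion easing, keep only h channels
                while len(self.T) > h:
                    self.msc.give_back(self.T.pop())
            # step 4 (h > L): the channel simply stays in T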
III. PERFORMANCE EVALUATION

Several metrics can be used to evaluate and compare the performance of the proposed algorithm with existing ones. In this paper, we considered the following metrics: the call blocking (denial) probability, the average number of control messages sent from the base station to the MSC in order to acquire one channel from the pool holding the dynamic channels, and the average number of control messages sent from the base station to the MSC in order to release one channel to the centralized pool.
The call blocking probability is defined as the ratio of the number of new calls initiated by a mobile host which cannot be supported by the existing channel arrangement to the total number of new calls initiated; i.e., the probability that a call arriving at a cell finds both the fixed and the dynamic channels busy.
In the classical hybrid channel allocation schemes reported in the literature, one control message is exchanged between the base station and the MSC in order to acquire exactly one channel from the dynamic pool resident at the MSC, and similar comments apply to the channel release phase. The proposed algorithm, however, makes “bulk” acquisitions of channels per request, and “bulk” releases of channels per request, depending upon the hot-spot level. This means that the number of control messages needed in the proposed algorithm is expected to be low compared to the classical hybrid allocation schemes.
A. Simulation Parameters
The simulation parameters used in this paper are quite
similar to the ones used in [3]. The simulated cellular
network consists of a 2D structure of 6 x 6 hexagonal
cells, with each cell having six neighbors. There are 300
channels in total in the system. A frequency reuse factor
of 3 is assumed (i.e., N = 3).
The arrival of calls at any cell is assumed to be a Poisson process, and the call duration is exponentially distributed with a mean of 3 minutes. A cell can be in one of two states at any time: normal or “hot-spot”. The mean call arrival rate in a normal (i.e., non “hot-spot”) cell is λ calls per minute. When a cell is in the “hot-spot” state, the call arrival rate in the cell is assumed to be 3λ calls per minute. The mean rate (per minute, uniformly distributed) of changing from the normal to the “hot-spot” state is 1/30, while the mean rate (per minute, uniformly distributed) of changing from the “hot-spot” state back to the normal state is 1/3. It is to be noted that a cell can also become a “hot-spot” through short-term traffic fluctuations, even if the cell is not explicitly forced into the “hot-spot” state as mentioned above.
The ratio r (the ratio of the number of dynamic channels to the total number of channels available in the system) is varied from 0.1 to 0.8. The system load (or traffic intensity) is defined with respect to the arrival rates and the call service rate in a cell that can switch back and forth between the “normal” and “hot-spot” states. The value of M (the maximum “hot-spot” level supported by the system) is also varied from 4 to 8.
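For reference, the traffic model just described can be sketched as follows (λ’s numeric value is not given in the paper, so the one below is a placeholder):

    import random

    LAM = 0.5           # mean arrivals per minute in a normal cell (placeholder value)
    HOT_FACTOR = 3      # a hot-spot cell generates 3*lambda calls per minute
    MEAN_CALL = 3.0     # mean call duration in minutes (exponentially distributed)

    def next_interarrival(hot):
        # Poisson arrivals: exponential inter-arrival times with the state's rate.
        rate = LAM * (HOT_FACTOR if hot else 1)
        return random.expovariate(rate)

    def call_duration():
        return random.expovariate(1.0 / MEAN_CALL)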
B. Simulation Results

The proposed algorithm was simulated using a discrete-event simulation model in C++, and the call blocking probability and the number of control messages exchanged were studied under the various system parameters discussed in Section III.A.
Figure 1 shows the blocking probabilities of the proposed algorithm for various values of r with M = 4, while Figure 2 shows the results for various values of r with M = 8. The results are also compared with the static Fixed Channel Assignment (FCA) algorithm. Figure 3 shows the average number of control messages sent per dynamic channel acquired from the pool at the MSC, while Figure 4 shows the average number of control messages sent per dynamic channel returned to the centralized pool at the MSC.
C. Comparisons and Discussion

The main advantage of the proposed protocol is that it adapts from a dynamic strategy at low traffic loads to a static strategy (FCA) at higher traffic loads. This is verified by the simulation, as shown in Figures 1 and 2.

When M = 4 (as shown in Figure 1), the proposed strategy is better than FCA at system loads of less than about 0.8, and a higher value of r gives better results in this region. At higher system loads, the performance of the proposed algorithm approaches FCA, and lower values of r give better performance (closer to FCA) in this region.

If we increase the value of the maximum “hot-spot” level (M), the system performance in general improves, in both the low and high system load regions. This can easily be observed in Figure 2. The main reason for the improvement is that, at higher traffic loads, more channels are made available to, and retained in, a “hot-spot” cell.
Figures 3 and 4 show the average number of control messages exchanged for each dynamic channel acquired from (or returned to) the central pool at the MSC. It should be noted that in all classical HCA (or DCA) algorithms, one control message must be sent on each channel acquisition or release event for every channel request from the base station. The proposed algorithm offers a very low overhead in terms of control message exchanges, as the “bulk” channel acquisitions and releases are done through a single control message. Higher values of the ratio r offer even lower overhead, especially at higher traffic intensities. This is due to the fact that most of the channels are then available in the central pool at the MSC for dynamic channel assignment, and the channel requests are likely to be fully fulfilled at higher traffic intensities.
Figure 1: Simulation results of the proposed algorithm (blocking probability versus system load) for various values of r with M = 4, and their comparison with FCA.

Figure 2: Simulation results of the proposed algorithm (blocking probability versus system load) for various values of r with M = 8, and their comparison with FCA.

Figure 3 (a): Average number of control messages per dynamic channel acquired, for M = 4.

Figure 3 (b): Average number of control messages per dynamic channel acquired, for M = 8.

Figure 4: Average number of control messages per dynamic channel returned, for M = 8.

IV. CONCLUSIONS

This paper presents a new hybrid channel allocation algorithm that sends a multi-level “hot-spot” notification to the central pool on each channel request that cannot be satisfied locally at the base station. This notification requests that more than one channel be assigned to the requesting cell, proportional to the current hot-spot level of the cell. This also reduces the control message overhead needed to acquire each channel individually. When a call using such a “borrowed” channel terminates, the cell may retain the channel depending upon its current hot-spot level. The simulation study of the protocol indicates that the protocol has low overhead, and that it behaves similarly to FCA at high traffic loads and to DCA at low traffic loads.

REFERENCES

[1] I. Katzela and M. Naghshineh, “Channel Assignment Schemes for Cellular Mobile Telecommunication Systems: A Comprehensive Survey,” IEEE Personal Communications, vol. 3, no. 3, June 1996.
[2] K. L. Yeung and T. P. Yum, “Compact Pattern Based Dynamic Channel Assignment for Cellular Mobile Systems,” IEEE Trans. on Vehicular Technology, vol. 43, no. 4, November 1994, pp. 892-896.
[3] J. Yang, et al., “A Fault-Tolerant Distributed Channel Allocation Scheme for Cellular Networks,” IEEE Transactions on Computers, vol. 54, no. 5, May 2005, pp. 616-629.
[4] R. Prakash, N. Shivaratri, and M. Singhal, “Distributed Dynamic Fault-Tolerant Channel Allocation for Cellular Networks,” IEEE Trans. on Vehicular Technology, vol. 48, no. 6, November 1999, pp. 1874-1888.
[5] J. Yang and D. Manivannan, “An Efficient Fault-Tolerant Distributed Channel Allocation Algorithm for Cellular Networks,” IEEE Transactions on Mobile Computing, vol. 4, no. 6, Nov./Dec. 2005, pp. 578-587.
[6] O. Tonguz and E. Yanmaz, “The Mathematical Theory of Dynamic Load Balancing in Cellular Networks,” IEEE Transactions on Mobile Computing, vol. 7, no. 12, December 2008, pp. 1504-1518.
JPEG Compression Steganography &
Cryptography Using Image-Adaptation Technique
Meenu Kumari
BVUCOE/IT Dept, Pune, India
Email: [email protected]
Prof. A. Khare and Pallavi Khare
BVUCOE/IT Dept, Pune, India
SSSIST/E&TC Dept, Bhopal, India
Email: [email protected]
Abstract—In any communication, security is the most
important issue in today’s world. Lots of data security and
data hiding algorithms have been developed in the last
decade, which worked as motivation for our research. In
this paper, named “JPEG Compression Steganography &
Cryptography using Image-Adaptation Technique”, we
have designed a system that will allow an average user to
securely transfer text messages by hiding them in a digital
image file using the local characteristics within an image.
This paper is a combination of steganography and
encryption algorithms, which provides a strong backbone
for its security. The proposed system not only hides large
volume of data within an image, but also limits the
perceivable distortion that might occur in an image while
processing it. This software has an advantage over other
information security software because the hidden text is in
the form of images, which are not obvious text information
carriers. The project posed several challenges that made it interesting to develop. The central task was to survey the available steganography and encryption algorithms and to pick the ones that offer the best combination of strong encryption, usability and performance. A further advantage of this project is a simple, powerful and user-friendly GUI, which plays a very large role in the success of the application.
Index Terms—Steganography, Cryptography, Compression,
JPEG, DCT, Local Criteria, Image-Adaptation, Huffman
coding, ET, SEC scheme
I. INTRODUCTION
In simple words, steganography can be defined as the art and science of invisible communication. This is accomplished by hiding information in other information, thus hiding the existence of the communicated information.
Though steganography and cryptography share the goal of protecting information, steganography differs from cryptography: cryptography [24] focuses on keeping the contents of a message secret, while steganography focuses on keeping the existence of a message secret.
Steganography and cryptography are both ways to protect information from unwanted parties, but neither technology alone is perfect and both can be compromised.
Once the presence of hidden information is revealed or
even suspected, the purpose of steganography is partly
defeated. The strength of steganography can thus be
amplified by combining it with cryptography.
Almost all digital file formats can be used for
steganography, but the formats that are more suitable are
those with a high degree of redundancy. Redundancy can
be defined as the bits of an object that provide accuracy
far greater than necessary for the object’s use and
display. The redundant bits of an object are those bits that
can be altered without the alteration being detected
easily. Image and audio files especially comply with this
requirement, while research has also uncovered other file
formats that can be used for information hiding.
Given the proliferation of digital images, especially on the Internet, and the large number of redundant bits present in the digital representation of an image, images are the most popular cover objects for steganography. In the domain of digital images, many different image file formats exist, most of them for specific applications, and for these different formats different steganographic algorithms exist. Among them, the JPEG file format is the most popular image format on the Internet because of the small size of its images.
II. OVERVIEW
When working with larger images of greater bit depth,
the images tend to become too large to transmit over a
standard Internet connection. In order to display an image
in a reasonable amount of time, techniques must be
incorporated to reduce the image’s file size. These
techniques make use of mathematical formulas to analyze
and condense image data, resulting in smaller file sizes.
This process is called compression [3]. For images there are two types of compression: lossy and lossless [3]. Compression plays a very important role in choosing which steganographic algorithm to use. Lossy compression techniques result in smaller image files, but they increase the possibility that the embedded message may be partly lost, because excess image data are removed. Lossless compression, by contrast, keeps the original digital image intact without any chance of loss, although it does not compress the image to as small a file size.
To compress an image into JPEG format, the RGB colour representation is first converted to a YUV representation, in which the Y component corresponds to luminance (brightness) and the U and V components carry chrominance (colour). Research shows that the human eye is more sensitive to changes in the brightness (luminance) of a pixel than to changes in its colour. JPEG compression exploits this fact by down-sampling the colour data to reduce the file size: the colour components (U and V) are halved in the horizontal and vertical directions, decreasing the file size by a factor of 2.
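For concreteness, the colour conversion and chroma down-sampling step can be sketched in a few lines of Python. This is an illustrative sketch only; the ITU-R BT.601 conversion constants and simple 2x2 averaging are standard choices assumed here, not taken from this paper:

import numpy as np

def rgb_to_ycbcr(rgb):
    # Convert an HxWx3 uint8 RGB image to Y, Cb, Cr planes (BT.601 constants).
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.4187 * g - 0.0813 * b + 128.0
    return y, cb, cr

def subsample_420(plane):
    # Halve a chroma plane in both directions by 2x2 averaging (4:2:0).
    h, w = plane.shape[0] // 2 * 2, plane.shape[1] // 2 * 2
    p = plane[:h, :w]
    return (p[0::2, 0::2] + p[0::2, 1::2] + p[1::2, 0::2] + p[1::2, 1::2]) / 4.0

img = np.random.randint(0, 256, (16, 16, 3), dtype=np.uint8)
y, cb, cr = rgb_to_ycbcr(img)        # luminance is kept at full resolution
cb_small, cr_small = subsample_420(cb), subsample_420(cr)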
The next step is the actual transformation of the image. JPEG [18] uses the Discrete Cosine Transform (DCT) [18]; similar transforms include the Discrete Fourier Transform (DFT). These mathematical transforms convert the pixels in such a way as to give the effect of “spreading” the location of the pixel values over part of the image. The DCT [18] transforms a signal from an image representation into a frequency representation by grouping the pixels into 8 × 8 blocks and transforming each block into 64 DCT coefficients. A modification of a single DCT coefficient will affect all 64 image pixels in that block.
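A minimal sketch of the block-wise 2D DCT, using SciPy's orthonormal type-II DCT (the 128 level shift is the standard JPEG convention; the block values are random placeholders):

import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    # Type-II 2D DCT of an 8x8 block, applied along rows and then columns.
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    # Inverse 2D DCT.
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.random.randint(0, 256, (8, 8)).astype(np.float64) - 128.0
coeffs = dct2(block)      # 64 DCT coefficients for this block
coeffs[3, 2] += 5.0       # changing one coefficient ...
modified = idct2(coeffs)  # ... spreads over all 64 pixels on reconstruction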
The next step is the quantization [18] phase of the compression. Here another property of the human eye is exploited: the eye is fairly good at spotting small differences in brightness over a relatively large area, but less able to distinguish the exact strength of high-frequency brightness variations. This means that the strength of the higher frequencies can be diminished without visibly changing the appearance of the image. JPEG does this by dividing all the values in a block by a quantization coefficient. The results are rounded to integer values, and the coefficients are then encoded using Huffman coding to further reduce the size.
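As a sketch of this step, the following divides a block of DCT coefficients by the standard example luminance quantization table from the JPEG specification and rounds to integers; the rounded integers are what the Huffman coder subsequently compresses:

import numpy as np

# Standard example luminance quantization table from the JPEG specification.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(coeffs):
    # Large high-frequency table entries diminish those coefficients most.
    return np.round(coeffs / Q).astype(np.int32)

def dequantize(qcoeffs):
    return qcoeffs * Q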
Originally it was thought that steganography would
not be possible to use with JPEG images, since they use
lossy compression [3] which results in parts of the image
data being altered. One of the major characteristics of
steganography is the fact that information is hidden in the
redundant bits of an object and since redundant bits are
left out when using JPEG it was feared that the hidden
message would be destroyed. Even if one could somehow
keep the message intact it would be difficult to embed the
message without the changes being noticeable because of
the harsh compression applied. However, properties of
the compression algorithm have been exploited in order
to develop a steganographic algorithm for JPEGs.
One of these properties of JPEG is exploited to make
the changes to the image invisible to the human eye.
During the DCT transformation phase of the compression
algorithm, rounding errors occur in the coefficient data
that are not noticeable. Although this property is what
classifies the algorithm as being lossy, this property can
also be used to hide messages.
Naively embedding information in an image that uses lossy compression would fail, since the compression could destroy the hidden information in the process. It is therefore important to recognize that the JPEG compression algorithm is actually divided into lossy and lossless stages [3]. The DCT and the quantization phase form part of the lossy stage, while the Huffman encoding used to further compress the data is lossless.
Steganography can take place between these two stages.
Using the same principles of LSB insertion the message
can be embedded into the least significant bits of the
coefficients before applying the Huffman encoding. By
embedding the information at this stage, in the transform
domain, it is extremely difficult to detect, since it is not in
the visual domain.
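A minimal sketch of this idea, in the spirit of the classic JSteg approach (skipping the DC coefficient and coefficients equal to 0 or 1 so that the entropy coder's zero runs are preserved). This is an illustration of LSB insertion in the transform domain, not the exact embedding rule of the proposed system:

import numpy as np

def embed_lsb(qcoeffs, bits):
    # Embed message bits into the LSBs of quantized DCT coefficients,
    # between the lossy (quantization) and lossless (Huffman) stages.
    out = qcoeffs.copy().ravel()
    i = 0
    for k in range(1, out.size):        # index 0 is the DC coefficient: skip
        if i == len(bits):
            break
        if out[k] not in (0, 1):
            out[k] = (out[k] & ~1) | bits[i]
            i += 1
    if i < len(bits):
        raise ValueError("cover block too small for the message")
    return out.reshape(qcoeffs.shape)

def extract_lsb(qcoeffs, n_bits):
    # Recover the bits by reading the same LSBs in the same order.
    flat, bits = qcoeffs.ravel(), []
    for k in range(1, flat.size):
        if len(bits) == n_bits:
            break
        if flat[k] not in (0, 1):
            bits.append(int(flat[k]) & 1)
    return bits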
III. PROPOSED SYSTEM
We propose a framework for hiding large volumes of
data in images while incurring minimal perceptual
degradation. The embedded data can be recovered
successfully, without any errors, after operations such as
decompression, additive noise, and image tampering. The
proposed methods can be employed for applications that
require high-volume embedding with robustness against
certain non-malicious attacks. The hiding methods we
propose are guided by the growing literature on the
information theory of data hiding [22].
The key novelty of our approach is that our coding
framework permits the use of local criteria to decide
where to embed data. In order to robustly hide large
volumes of data in images without causing significant
perceptual degradation, hiding techniques must adapt to
local characteristics within an image. The main
ingredients of our embedding methodology are as
follows.
(a) As is well accepted, data embedding is done in the
transform domain, with a set of transform coefficients in
the low and mid frequency bands selected as possible
candidates for embedding (these are preserved better under compression attacks than high-frequency coefficients).
(b) A novel feature of our method is that, from the
candidate set of transform coefficients, the encoder
employs local criteria to select which subset of
coefficients it will actually embed data in. In example
images, the use of local criteria for deciding where to
embed is found to be crucial to maintaining image quality
under high volume embedding.
(c) For each of the selected coefficients, the data to be
embedded indexes the choice of a scalar quantizer for
that coefficient. We motivate this by information
theoretic analysis.
(d) The decoder does not have explicit knowledge of
the locations where data is hidden, but employs the same
criteria as the encoder to guess these locations. The
distortion due to attacks may now lead to insertion errors
(the decoder guessing that a coefficient has embedded
data, when it actually does not) and deletion errors (the
decoder guessing that a coefficient does not have
embedded data, when it actually does). In principle, this
can lead to desynchronization of the encoder and
decoder.
(e) An elegant solution based on erasures and errors
correcting codes is provided to the synchronization
problem caused by the use of local criteria.
Specifically, we use a code on the hidden data that
spans the entire set of candidate embedding coefficients,
and that can correct both errors and erasures. The subset
of these coefficients in which the encoder does not embed
can be treated as erasures at the encoder. Insertions now
become errors, and deletions become erasures (in
addition to the erasures already guessed correctly by the
decoder, using the same local criteria as the encoder).
While the primary purpose of the code is to solve the
synchronization problem, it also provides robustness to
errors due to attacks.
Two methods for applying local criteria are
considered. The first is the block-level Entropy
Thresholding (ET) method, which decides whether or not
to embed data in each block (typically 8X8) of transform
coefficients, depending on the entropy, or energy, within
that block. The second is the Selectively Embedding in
Coefficients (SEC) method, which decides whether or not
to embed data based on the magnitude of the coefficient.
Reed-Solomon (RS) codes [24] are a natural choice for
the block-based ET scheme, while a “turbo-like" Repeat
Accumulate (RA) code is employed for the SEC scheme.
We are able to hide high volumes of data under both
JPEG and AWGN attacks [24]. Moreover, the hidden
data also survives wavelet compression, image resizing
and image tampering attacks.
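The two selection rules can be sketched as follows; the candidate band positions and the threshold values below are illustrative assumptions, not the tuned values of the proposed system:

import numpy as np

def et_embeds_in_block(qcoeffs, band, threshold=12.0):
    # Entropy Thresholding (ET): embed in an 8x8 block only if the energy
    # of its candidate low/mid-frequency coefficients exceeds a threshold.
    energy = sum(float(qcoeffs[r, c]) ** 2 for r, c in band)
    return energy > threshold

def sec_embeds_in_coeff(value, t=1):
    # Selectively Embedding in Coefficients (SEC): embed in a coefficient
    # only if its magnitude exceeds the design threshold t.
    return abs(value) > t

band = [(0, 1), (1, 0), (1, 1), (0, 2), (2, 0)]   # example candidate band
block = np.random.randint(-4, 5, (8, 8))
print(et_embeds_in_block(block, band), sec_embeds_in_coeff(block[0, 1]))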
Figure 1. Image-adaptive embedding methodology
It is observed that the perceptual quality as well as the
PSNR is better for the image with hidden data using local
criteria. Note that though the PSNR is only marginally
better, the actual perceptual quality is much better. This
indicates that the local criteria must be used for robust
and transparent high volume embedding.
Although we do not use specific perceptual models,
we refer to our criteria as `perceptual' because our goal in
using local adaptation is to limit perceivable distortion.
Figure 1 shows a high-level block diagram of the hiding
methods presented. Both the embedding methods, the
entropy thresholding (ET) scheme, and the selectively
embedding in coefficients (SEC) scheme, are based on
the Joint Photographic Experts Group (JPEG) compression standard. As seen in Figure 1, the techniques involve taking the 2D discrete cosine transform (DCT) of non-overlapping 8×8 blocks, followed by embedding in selected DCT coefficients.
Coding for Insertions & Deletions:
We noted that use of image-adaptive criteria is
necessary when hiding large volumes of data into images.
A threshold is used to determine whether to embed in a
block (ET scheme) or in a coefficient (SEC scheme).
More advanced image-adaptive schemes would exploit
the human visual system (HVS) models to determine
where to embed information. Distortion due to attack
may cause an insertion (decoder guessing that there is
hidden data where there is no data) or a deletion (decoder
guessing that there is no data where there was data
hidden). There could also be decoding error, where the
decoder makes a mistake in correctly decoding the bit
embedded. While the decoding errors can be countered
using simple error correction codes, insertions and
deletions can potentially cause catastrophic loss of
synchronization between encoder and decoder.
In the ET scheme, insertions and deletions are
observed when the attack quality factor is mismatched
with the design quality factor for JPEG attack. However,
for the SEC scheme, there are no insertions or deletions
for most of the images for JPEG attacks with quantization
interval smaller than or equal to the design interval. This
is because no hidden coefficient with magnitude ≤ t can
be ambiguously decoded to t+1 due to JPEG quantization
with an interval smaller than the design one. Both the ET
and SEC schemes have insertions/deletions under other
attacks.
Coding Framework:
The coding framework employs the idea of erasures at
the encoder. The bit stream to be hidden is coded, using a
low rate code, assuming that all host coefficients that
meet the global criteria will actually be employed for
hiding. A code symbol is erased at the encoder if the
local perceptual criterion for the block or coefficient is
not met. Since we code over the entire space of coefficients
that lie in a designated low-frequency band, long
codewords can be constructed to achieve very good
correction ability. A maximum distance separable (MDS)
code [24], such as Reed Solomon (RS) code, does not
incur any penalty for erasures at the encoder. Turbo-like
codes, which operate very close to capacity, incur only a
minor overhead due to erasures at the encoder. In the presence of attacks the sequence is decoded as follows: insertions become errors, and deletions become additional erasures. It should be noted
that a deletion, which causes an erasure, is about half as
costly as an insertion, which causes an error. Hence, it is
desirable that the data-hiding scheme [4] be adjusted in
such a manner that there are only a few insertions. Thus,
using a good erasures and errors correcting code, one can
deal with insertions/deletions without a significant
decline in original embedding rate. Reed-Solomon codes
have been used for ET scheme and Repeat Accumulate
codes have been used for the SEC scheme.
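The cost accounting behind these statements is the classical errors-and-erasures bound: a code of minimum distance d corrects e errors and s erasures whenever 2e + s <= d - 1, which is why an insertion (an error) is about twice as costly as a deletion (an erasure). A small sketch, with RS parameters chosen purely for illustration:

def rs_budget_ok(n, k, errors, erasures):
    # An RS(n, k) code has minimum distance d = n - k + 1 and corrects
    # e errors together with s erasures whenever 2e + s <= d - 1.
    d = n - k + 1
    return 2 * errors + erasures <= d - 1

# RS(255, 191) has d = 65: it tolerates, e.g., 20 insertions (errors)
# together with 24 deletions/encoder erasures, since 2*20 + 24 = 64 <= 64.
print(rs_budget_ok(255, 191, errors=20, erasures=24))   # True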
IV. RESULT ANALYSIS
All steganographic algorithms have to comply with a few basic requirements: invisibility, payload capacity, robustness against statistical attacks, robustness against image manipulation, independence of file format, and unsuspicious files. The following table compares least significant bit (LSB) insertion in BMP and in GIF files, JPEG compression steganography, the patchwork approach, and spread-spectrum techniques against these requirements:
TABLE I.
COMPARISON OF IMAGE STEGANOGRAPHY ALGORITHMS

Requirement                             LSB in BMP  LSB in GIF  JPEG compression  Patchwork  Spread spectrum
Invisibility                            High*       Medium*     High              High       High
Payload capacity                        High        Medium      Medium            Low        Medium
Robustness against statistical attacks  Low         Low         Medium            High       High
Robustness against image manipulation   Low         Low         Medium            High       Medium
Independent of file format              Low         Low         Low               High       High
Unsuspicious files                      Low         Low         High              High       High

* Depends on cover image used
The levels at which the algorithms satisfy the
requirements are defined as high, medium and low. A
high level means that the algorithm completely satisfies
the requirement, while a low level indicates that the
algorithm has a weakness in this requirement. A medium
level indicates that the requirement depends on outside
influences, for example the cover image used. LSB in
GIF images has the potential of hiding a large message,
but only when the most suitable cover image has been
chosen.
The ideal, in other words perfect, steganographic algorithm would have a high level in every requirement. Unfortunately, among the algorithms evaluated here there is not one that satisfies all of the requirements. Thus a trade-off will exist in most cases, depending on which requirements are more important for the specific application.
The process of embedding information during JPEG
compression results in a stego image with a high level of
invisibility, since the embedding takes place in the
transform domain. JPEG is the most popular image file
format on the Internet and the image sizes are small
because of the compression, thus making it the least
suspicious algorithm to use. However, the compression process is highly mathematical, making the algorithm more difficult to implement. The JPEG file format can be used for most applications of steganography, but it is especially suitable for images that have to be communicated over an open-systems environment such as the Internet.
V. CONCLUSION AND SCOPE FOR FUTURE WORK
Steganography is about hiding information and the technologies for doing so. There is a principal difference between steganography and encryption, although the two can meet at some points: they can be applied together, i.e., encrypted information can be hidden in turn. To hide something, a covering medium is always needed (a picture, sound track, text, or even the structure of a file system, etc.). The covering medium must be redundant; otherwise the hidden information could be detected easily. The technique used for hiding should match the nature of the medium, and the hidden information should not be lost if the carrying medium is edited, modified, formatted, re-sized, compressed, or printed. That is a difficult task to realize. The application is primarily intended for inconspicuously hiding confidential and proprietary information. This software has an advantage over other information-security systems because the hidden text is carried in an image, which is not an obvious text-information carrier.
Because of its user-friendly interface, the application
can also be used by anyone who wants to securely
transmit private information. The main advantage of this
program for individuals is that they do not have to have
any knowledge about steganography or encryption. The
visual way to encode the text, plus the visual key makes
it easy for average users to navigate within the program.
Digital Image Steganography system allows an
average user to securely transfer text messages by hiding
them in a digital image file. A combination of
Steganography and encryption algorithms provides a
strong backbone for its security. Digital Image
Steganography system features innovative techniques for
hiding text in a digital image file or even using it as a key
to the encryption.
Digital Image Steganography [2] system allows a user
to securely transfer a text message by hiding it in a digital
image file. 128 bit AES encryption is used to protect the
content of the text message even if its presence were to
be detected. Currently, no methods are known for
breaking this kind of encryption within a reasonable
period of time (i.e., a couple of years). Additionally,
compression is used to maximize the space available in
an image.
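As an illustration of this combination (the paper does not specify a library, a mode of operation, or the compression scheme, so the choices below, the Python cryptography package, AES-128 in CTR mode, and zlib, are assumptions):

import os
import zlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def protect(text, key16):
    # Compress to maximize the space available in the image, then encrypt
    # with 128-bit AES so the content stays secret even if detected.
    payload = zlib.compress(text.encode("utf-8"))
    nonce = os.urandom(16)
    enc = Cipher(algorithms.AES(key16), modes.CTR(nonce)).encryptor()
    return nonce + enc.update(payload) + enc.finalize()

def recover(blob, key16):
    nonce, ct = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(key16), modes.CTR(nonce)).decryptor()
    return zlib.decompress(dec.update(ct) + dec.finalize()).decode("utf-8")

key = os.urandom(16)                      # 128-bit key
assert recover(protect("secret note", key), key) == "secret note"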
To send a message, a source text, an image in which
the text should be embedded, and a key are needed. The
key is used to aid in encryption and to decide where the
information should be hidden in the image. A short text
can be used as a key. To receive a message, a source
image containing the information and the corresponding
key are both required. The result will appear in the text
tab after decoding.
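One plausible way to realize "the key decides where the information is hidden" is to hash the short text key into a PRNG seed and draw the embedding positions from it, so that sender and receiver derive the same locations independently. This is an illustrative sketch; the paper does not describe its actual key schedule:

import hashlib
import random

def embedding_positions(key_text, n_candidates, n_needed):
    # Hash the shared key to a seed; both sides derive identical positions.
    seed = int.from_bytes(hashlib.sha256(key_text.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return rng.sample(range(n_candidates), n_needed)

positions = embedding_positions("short text key", n_candidates=4096, n_needed=256)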
A common, Internet-friendly format is offered. It is inherently more difficult to hide information in a JPEG image, because that is exactly what the designers of JPEG wanted to avoid: the transmission of extra information that does not affect the appearance of the image.
ACKNOWLEDGEMENT
The work on this paper was supported by the Bharati Vidyapeeth University College of Engineering, Pune. The views and conclusions contained herein are those of the authors, and the paper contains their original work. We drew on many books, papers, and other materials.
REFERENCES
[1] N. Provos, “Defending Against Statistical Steganography,” Proc. 10th USENIX Security Symposium, 2005.
[2] N. Provos and P. Honeyman, “Hide and Seek: An Introduction to Steganography,” IEEE Security & Privacy, 2003.
[3] Steven W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing.
[4] Katzenbeisser and Petitcolas, Information Hiding Techniques for Steganography and Digital Watermarking, Artech House, Norwood, MA, 2000.
[5] L. Reyzen and S. Russell, “More Efficient Provably Secure Steganography,” 2007.
[6] S. Lyu and H. Farid, “Steganography Using Higher-Order Image Statistics,” IEEE Trans. Inf. Forens. Secur., 2006.
[7] Venkatraman, S., Abraham, A., and Paprzycki, M., “Significance of Steganography on Data Security,” Proceedings of the International Conference on Information Technology: Coding and Computing, 2004.
[8] Fridrich, J., Goljan, M., and Hogea, D., “New Methodology for Breaking Steganographic Techniques for JPEGs,” Electronic Imaging 2003.
[9] http://aakash.ece.ucsb.edu/datahiding/stegdemo.aspx, UCSB data hiding online demonstration, released Mar. 09, 2005.
[10] Mitsugu Iwamoto and Hirosuke Yamamoto, “The Optimal n-out-of-n Visual Secret Sharing Scheme for Gray-Scale Images,” IEICE Trans. Fundamentals, vol. E85-A, no. 10, October 2002, pp. 2238-2247.
[11] Doron Shaked, Nur Arad, Andrew Fitzhugh, Irwin Sobel, “Color Diffusion: Error Diffusion for Color Halftones,” HP Laboratories Israel, May 1999.
[12] Z. Zhou, G. R. Arce, and G. Di Crescenzo, “Halftone Visual Cryptography,” IEEE Trans. on Image Processing, vol. 15, no. 8, August 2006, pp. 2441-2453.
[13] M. Naor and A. Shamir, “Visual Cryptography,” in Proceedings of Eurocrypt 1994, Lecture Notes in Computer Science, vol. 950, 1994, pp. 1-12.
[14] Robert Ulichney, “The Void-and-Cluster Method for Dither Array Generation,” IS&T/SPIE Symposium on Electronic Imaging and Science, San Jose, CA, 1993, vol. 1913, pp. 332-343.
[15] E. R. Verheul and H. C. A. van Tilborg, “Constructions and Properties of k out of n Visual Secret Sharing Schemes,” Designs, Codes, and Cryptography, vol. 1, no. 2, 1997, pp. 179-196.
[16] Daniel L. Lau, Robert Ulichney, Gonzalo R. Arce, “Fundamental Characteristics of Halftone Textures: Blue-Noise and Green-Noise,” Image Systems Laboratory, HP Laboratories Cambridge, March 2003.
[17] C. Yang and C. Laih, “New Colored Visual Secret Sharing Schemes,” Designs, Codes and Cryptography, vol. 20, 2000, pp. 325-335.
[18] Anil K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall of India, 1989.
[19] C. Chang, C. Tsai, and T. Chen, “A New Scheme for Sharing Secret Color Images in Computer Network,” in Proc. of International Conference on Parallel and Distributed Systems, 2000, pp. 21-27.
[20] R. L. Alder, B. P. Kitchens, M. Martens, “The Mathematics of Halftoning,” IBM J. Res. & Dev., vol. 47, no. 1, Jan. 2003, pp. 5-15.
[21] R. Lukac, K. N. Plataniotis, B. Smolka, “A New Approach to Color Image Secret Sharing,” EUSIPCO 2004, pp. 1493-1496.
[22] H. Ancin, Anoop K. Bhattacharjya, Joseph Shu, “Improving Void-and-Cluster for Better Halftone Uniformity,” International Conference on Digital Printing Technologies.
[23] D. Hankerson, P. D. Johnson, and G. A. Harris, Introduction to Information Theory and Data Compression.
[24] Ranjan Bose, Information Theory, Coding and Cryptography.
Meenu Kumari completed her B.E. in Information Technology from Sanjivani Educational Society & College of Engineering, Kopargaon, Pune University, in 2005. She is pursuing an M.Tech. in IT at Bharati Vidyapeeth University College of Engineering, Pune. She has presented at one national conference on image compression, published research papers in one e-journal and one international journal, and has submitted research papers to other national and international journals for publication.

Prof. A. Khare completed his B.E. and M.E. from Bhopal. He is currently working as Assistant Professor in the Information Technology Department, Bharati Vidyapeeth University College of Engineering, Pune. He has presented at many national and international conferences and published in journals.

Pallavi Khare is a research student of the E&TC Department, SSSIST, Bhopal, India.
Review of Machine Learning Approaches to
Semantic Web Service Discovery
Shalini Batra
Computer Science and Engineering Department, Thapar University, Patiala, Punjab, India
Email: [email protected]

Dr. Seema Bawa
Computer Science and Engineering Department, Thapar University, Patiala, Punjab, India
Email: [email protected]
Abstract--- A Web service can discover and invoke any
service anywhere on the Web, independently of the
language, location, machine, or other implementation
details. The goal of Semantic Web Services is the use of
richer, more declarative descriptions of the elements of
dynamic distributed computation including services,
processes, message-based conversations, transactions, etc. In
recent years text mining and machine learning have been
efficiently used for automatic classification and labeling of
documents. Various Web service discovery frameworks are
applying machine learning techniques like clustering,
classification, association rules, etc., to discover the services
semantically. This paper provides an exhaustive review of
machine learning approaches used for Web Services
discovery and frameworks developed based on these
approaches. A thorough analysis of existing frameworks for
semantic discovery of Web Services is provided in the
paper.
Index Terms--- Machine Learning, Semantics, Web Services,
Web services Discovery, Web Service Discovery
Frameworks
I. INTRODUCTION
Semantic Web Services (SWS) lie at the intersection
of two important trends in the World Wide Web’s
evolution. The first is rapid development of Web service
technologies and the second is the Semantic Web.
Semantic Web focuses on the publication of more
expressive metadata in a shared knowledge framework,
enabling the deployment of software agents that can
intelligently use Web resources. The driving force behind the usage of Web services is the need for reliable, vendor-neutral software interoperability across heterogeneous platforms and networks. Another important objective
behind the development of Web Services has been the
ability to coordinate business processes involving
heterogeneous components (deployed as services) across
ownership boundaries. These objectives have led to the
development of widely recognized Web service standards
such as WSDL, UDDI, and BPEL.
The Semantic Web brings knowledge-representation languages and ontologies into the fabric of the Internet, providing a foundation for powerful new approaches to organizing, describing, searching, and reasoning about information and activities on the Web (and other networked environments). The Semantic Web proposes to
extend the traditional Web Services technologies on the
way to consolidate ontologies and semantics such that
services are able to dynamically adapt themselves to
changes without human intervention.
The description of Semantic Web services enables
fuller, more flexible automation of service provision and
use and the construction of more powerful tools and
methodologies for working with services. As a rich
representation framework permits a more comprehensive
specification of many aspects of services, SWS can
provide a solid foundation for a broad range of activities
throughout the Web service life cycle. For example,
richer service descriptions can support:
• greater automation of service selection and
invocation,
• automated translation of message content
between heterogeneous interoperating services,
• automated or semi-automated approaches to
service composition, and
• more comprehensive approaches to service
monitoring and recovery from failure [1].
Semantic Web Services enable the automatic
discovery of distributed Web services based on
comprehensive semantic representations. However,
although SWS technology supports the automatic
allocation of resources for a given well defined task, it
does not entail the discovery of appropriate SWS
representations for a given context. One of the major problems with the existing structure is that UDDI does not capture the relationships between entities in its directory and therefore is not capable of using semantic information to infer relationships during search. Secondly, UDDI supports search based only on the high-level information specified about businesses and services; it does not get to the specifics of the capabilities of services during matching [2].
Several upper (i.e., application-independent) ontologies have already been proposed for service
description. The first one was DAML-S [3] based on the
DAML+OIL ontology definition language. However,
with the wide acceptance of the Web Ontology Language
(OWL) [4] family of languages, DAML-S was replaced
by OWL-S [5]. On similar specifications various SWS
frameworks like WSDL-S [6], WSMO [7], etc. were also
developed. All these specifications, although sharing
many modeling elements, differ in terms of
expressiveness, complexity and tool support [8].
The only change required for discovering services semantically is that some metadata be available that provides a functional description of the Web services, to which machine learning techniques such as classification, clustering, and association mining can be applied. Our contribution in this paper is a survey of SWS discovery frameworks based on machine learning approaches, of the methodologies and techniques applied for semantic discovery of Web services, and an analysis of the shortcomings of these approaches, along with future directions for accomplishing Web service discovery successfully.
II. MACHINE LEARNING BASED FRAMEWORKS
In machine learning there are two major settings in
which a function can be described: supervised learning
and unsupervised learning. In supervised learning the
variables under investigation can be split into two groups:
explanatory variables and one (or more) dependent
variables. The target of the analysis is to specify
a relationship between the explanatory variables and the
dependent variable as it is done in regression analysis. To
apply directed data mining techniques the values of the
dependent variable must be known for a sufficiently large
part of the data set. In unsupervised learning all variables
are treated in the same way, there is no distinction
between explanatory and dependent variables. Supervised
learning requires that the target variable is well defined
and that a sufficient number of its values are given. For
unsupervised learning typically either the target variable
is unknown or has only been recorded for too small
a number of cases.
Classification models are created by examining
already classified data (cases) and inductively finding a
predictive pattern. Classification problems aim to identify
the characteristics that indicate the group to which each
case belongs. This pattern can be used both to understand
the existing data and to predict how new instances will
behave. Clustering is a multivariate statistical technique
that allows an automatic generation of groups in data.
The result of the clustering is a partitioning of the
collection of objects in groups of related objects. From a
machine learning perspective clusters correspond to
hidden patterns, the search for clusters is unsupervised
learning, and the resulting system represents a data
concept. Conceptual clustering is a machine learning paradigm for unsupervised classification, distinguished from ordinary data clustering by generating a concept description for each generated class. Most conceptual clustering methods are capable of generating hierarchical category structures.
A. Classification Based Approaches to Semantic Web
Service Discovery
Some of the Web service discovery frameworks
combine text mining and machine learning techniques for
classifying Web services and hence some semiautomatic
and automatic methods have been proposed for Web
service discovery through classification. Some
approaches are based on argument definitions matching
[9, 10], document classification techniques [11, 12] and
semantic annotations matching [13]. MWSAF [9] is an
approach for classifying Web services based on argument
definitions matching. First, MWSAF translates the
WSDL definitions into a graph. Then, MWSAF uses
graph similarity techniques for comparing both. On
similar lines Duo et al. [10] propose to translate a
definition into an ontology, instead of a graph. Then, an
ontology alignment technique attempts to map one
ontology on another [7]. METEOR-S [11] describes a
further improved version of MWSAF. The problem of
determining a Web service category is abstracted to a
document classification problem. The graph matching
technique is replaced with a Na¨ıve Bayes classifier. To
do this, METEOR-S extracts the names of all operations
and arguments declared in WSDL documents of precategorized Web services.
Assam [14] is an ensemble machine learning approach
for determining Web service category. Assam combines the Naïve Bayes and SVM [15] machine learning algorithms to classify WSDL files into manually defined
hierarchies. Assam takes into account Web service
natural language documentation and descriptions.
Automatic Web Service Classification (AWSC)
compares a Web service description with other
descriptions that have been manually classified. In
AWSC, a two-stage process to classify a Web service is
applied which uses text mining techniques at the first
stage, namely preprocessing, to extract relevant
information from a WSDL document and a supervised
document classifier at the second stage, namely
classification. This classifier deduces a sequence of
candidate categories for a preprocessed Web service
description.
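In the spirit of these classifiers, the following sketch trains a Naïve Bayes model on bag-of-words features; the toy documents stand in for terms mined from WSDL operation and argument names, and both the texts and the category labels are invented for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["get stock quote price ticker", "send sms message phone text",
        "currency convert exchange rate", "weather forecast temperature city"]
labels = ["finance", "communication", "finance", "weather"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(docs, labels)
# A new service description is assigned the most likely category.
print(clf.predict(["convert dollar euro rate"]))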
B. Cluster Based Approaches to Semantic Web Service
Discovery
The clustering methodology re-organizes a set of data
into different groups based on some standards of
similarity thus transforming a complex problem into a
series of simpler ones, which can be handled more easily.
Based on the clustered service groups, a set of matched
services can be returned by comparing the similarity
between the query and related group, rather than
computing the similarity between query and each service
in the dataset. If the service results returned are not compatible with the user's query, the second-best cluster is chosen and the computation proceeds to the next iteration.
Various clustering approaches have been used for
discovering Web services. Dong [16] puts forward a
clustering approach to search Web services where the
search consisted of two main stages. A service user first
types keywords into a service search engine, looking for
the corresponding services. Then, based on the initial
Web services returned, the approach extracts semantic
concepts from the natural language descriptions provided
in the Web services. In [17] Abramowicz proposed an architecture for Web services filtering and clustering. The service filtering is based on profiles representing users and application information, which are further described through the Web Ontology Language for Services (OWL-S). In order to improve the effectiveness of the filtering process, a clustering analysis is applied by comparing services with the related clusters. Another similar approach followed in [18]
concentrates on Web service discovery with OWL-S and
clustering technology, which consists of three main steps.
The OWL-S is first combined with WSDL to represent
service semantics before a clustering algorithm is used to
group the collections of heterogeneous services together.
Finally, a user query is matched against the clusters, in
order to return the suitable services.
Web services are clustered into the predefined
hierarchical business categories in [19] and service
discovery is based on a directory. In this situation, the
performance of reasonable service discovery relies on
both service providers and service requesters having prior
knowledge of the service organization schemes. In [20] Probabilistic Latent Semantic Analysis (PLSA) is used to capture the semantic concepts hidden behind the words in a query and the advertisements in services, so that service matching is carried out at the concept level. The Singular Value Decomposition (SVD) approach to matching Web services [21] has been extended with a different methodology, Probabilistic Latent Semantic Analysis (PLSA), based on the aspect model. The model indirectly associates keywords with their corresponding documents by introducing an intermediate layer called the hidden factor variable Z = {z1, z2, ..., zk} [22].
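A simplified latent-space sketch in the spirit of the SVD matching of [21] (PLSA itself would replace the SVD step with EM-fitted aspect probabilities); the service descriptions and the query below are invented:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

services = ["book flight ticket airline", "hotel room booking reservation",
            "flight status arrival departure", "car rental vehicle hire"]
query = "airline flight reservation"

vec = TfidfVectorizer()
X = vec.fit_transform(services + [query])
Z = TruncatedSVD(n_components=2).fit_transform(X)   # latent concept space
scores = cosine_similarity(Z[-1:], Z[:-1])          # query vs. each service
print(scores)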
The Clustering Probabilistic Semantic Approach (CPLSA) discussed in [24], an extension of [20], uses a dynamic algorithm that partitions a service working dataset into smaller pieces. It includes two main phases: eliminating irrelevant services and matching services at the semantic concept level. Once the irrelevant services are eliminated, Probabilistic Latent Semantic Analysis is applied to the working dataset to capture semantic concepts. As a result, Web services are clustered into a finite number of semantically related groups. The Semantic Web Service Classification (SWSC) method discussed in [25] analyses the WSDL and checks its configuration and structure for further processing. The WSDL information of Web services is transformed into the richer semantic representation language OWL-S [4] using a series of methods.
The hierarchical agglomerative clustering method, often used in information retrieval for grouping similar documents, is used in [27] for Web service clustering. This method uses a bottom-up strategy that starts by placing each Web service in its own cluster and then successively merges clusters together until a stopping criterion is satisfied. The clusters (terms representing Web services) are stored in the UDDI registry database. The SWSC method improves the search function by retrieving the best offers of services using cluster matching; it ranks the matched Web services and indicates the degree of relevance according to term existence in the clusters.
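A bottom-up agglomerative grouping of this kind can be sketched with a few lines of scikit-learn; the four descriptions are toy placeholders:

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

descriptions = ["send email message", "send sms text message",
                "stock price quote", "share price lookup"]
X = TfidfVectorizer().fit_transform(descriptions).toarray()

# Each service starts in its own cluster; the closest clusters are merged
# until the requested number of groups remains.
print(AgglomerativeClustering(n_clusters=2).fit_predict(X))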
C. Context-Aware SWS Discovery
Contexts have increasingly been considered for better
service provision. Dey [27] describes a context-aware
computing system as a system that uses contexts to
provide relevant information and/or services to the users,
where relevancy depends on the user tasks, while
Korkea-aho [26] defines contexts as any situational
information that is available at the time of interaction
between users and computing systems. Contexts can be
useful for service discovery also. The UDDI can better
perform if contexts of service consumers, service
providers, and Web services are considered at discovery
time. Context awareness has been applied in Web
service discovery research. The WASP project [27] attempts to enhance the standard UDDI into UDDI+ by adding semantic and contextual features. Their approach focuses on semantic analysis of service-provider contexts, which are described by the ontology-based DAML-S specification; hence their contexts are static only.
Doulkeridis et al [29] propose a context-aware service
discovery architecture which accommodates various
registry technologies including the UDDI and ebXML
registry, but they consider the provision and consumption
of services via mobile devices. Their approach therefore
focuses on contexts related to mobility and handheld
devices and does not cater for a generic context model.
Lee et al. [30] enhance context-aware discovery by introducing context attributes as part of service descriptions in the service registry, but their contexts are
dynamic attributes only. The CB-SeC framework [31]
also enables more sophisticated discovery and
composition of services, by having a WSDL of a Web
service augmented with context functions; these context
functions will be invoked to determine the values of the
service contexts. In this way, however, the WSDL will be
cluttered with operations that do not reflect service
capability. Keidl et al. [32] introduce the concept of a context type in their context framework, but they focus on
adapting service provision according to the consumer’s
contexts which are specified under particular context
types; their framework does not consider service
discovery.
Conceptual Spaces (CS), introduced by Gärdenfors [32, 33], follow a theory of describing entities at the conceptual level in terms of their natural characteristics, similar to natural human cognition, in order to avoid the symbol grounding issue. Semantic similarity between situations is calculated in terms of their Euclidean distance within a CSS. Context-aware discovery and invocation of Web services and data sources is highly desired across a wide variety of application domains and has been subject to intensive research throughout the last decade [34, 35, 24].
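For intuition, a situation and the prototype points of candidate concepts are all vectors in the space, and the concept whose prototype is nearest in Euclidean distance is the best match. The dimensions and values below are purely hypothetical:

import numpy as np

situation = np.array([0.7, 0.2])             # observed situation (normalized)
prototypes = {"summer_day": np.array([0.8, 0.3]),
              "rainy_day":  np.array([0.4, 0.9])}

# Semantic similarity = closeness in the conceptual space.
best = min(prototypes, key=lambda k: np.linalg.norm(situation - prototypes[k]))
print(best)   # -> summer_day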
Authors in [37] propose that extending merely
symbolic SWS descriptions with context information on a
conceptual level through CSS enables similarity-based
matchmaking between real-world situation characteristics
and predefined resource representations as part of SWS
descriptions. CSS are mapped to standardized SWS
representations to enable the context-aware discovery of
appropriate SWS descriptions and the automatic
discovery and invocation of appropriate resources - Web
services and data - to achieve a given task within a
particular situation.
D. Web Services Discovery Based On Schema Matching
In [31] the authors propose an SVD-based algorithm to locate matched services for a given service. Their algorithm uses characteristics of singular value decomposition to find relationships among services, but it considers only textual descriptions and cannot reveal the semantic relationships between Web services. Wang et al. [38] proposed a method based on information retrieval and structure matching. Given a potentially partial specification of the desired service, all textual elements of the specification are extracted and compared against the textual elements of the available services, to identify the most similar service description files and to order them according to their similarity. The approach in [38] is similar to that followed in [16], but here the focus is on semantic rather than structural similarity. Woogle [16] develops a clustering algorithm to group the names of parameters of Web-service operations into semantically meaningful concepts, which are then used to measure the similarity of Web-service operations. It relies heavily on parameter names and does not deal with the composition problem.
A schema tree matching algorithm has been proposed in [23], which employs a cost model to compute tree edit distances for supporting Web-service operation matching and captures the semantic information of schemas; an agglomeration algorithm is then employed to cluster similar Web-service operations and rank them to satisfy a user's top-k requirements.
E. Some Prevalent Frameworks and Methodologies
A considerable body of research has emerged
proposing different methods of improving accuracy of
Web service discovery. A Web service discovery method
combining semantic and statistical association with
hyperclique pattern discovery [39] has been proposed.
Algorithms using Singular Vector Decomposition (SVD)
[15] and probabilistic latent semantic analysis [20] have
been proposed to find the similarity between various Web
services to enhance the accuracy of service discovery.
However, none of these methods provides an empirical and theoretical analysis showing that it improves the process of Web service discovery. In [36] the authors proposed an extension of SVD [15] to a support-based latent semantic kernel to further increase the accuracy of Web service discovery, using random projection [20]. In random projection, the initial corpus is projected to l dimensions, for some l > k, where k is the dimension of the semantic kernel, to obtain a smaller representation close to the original corpus; SVD is then performed on the reduced-dimension matrix. The semantic kernel was created on a large Wikipedia corpus for dimensionality reduction by introducing the concept of merging documents, and the constructed kernel was used on a general-purpose corpus to find semantically similar Web services for a user query.
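The project-then-decompose pipeline can be sketched as follows; the matrix sizes and the choice of a Gaussian random projection are illustrative assumptions:

import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import TruncatedSVD

# Toy term-document matrix standing in for a large corpus (values invented).
X = np.abs(np.random.randn(500, 2000))

l, k = 100, 20                 # project to l > k dimensions, then run SVD
X_rp = GaussianRandomProjection(n_components=l).fit_transform(X)
kernel = TruncatedSVD(n_components=k).fit_transform(X_rp)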
Hyperclique patterns [39] are described as a type of association pattern containing items that are strongly associated with each other: every pair of items within a hyperclique pattern is guaranteed to have an uncentered correlation coefficient above a certain level. When used in the Web services field, the items are the input or output parameters, and a transaction is the set of input and output parameters of an individual Web service. Hyperclique pattern discovery can be adapted to capture frequently occurring local structures of operation parameters in Web services: the parameters of a Web service are represented as a vector, each entry records the terms of the operations' inputs and outputs, and each of these collections of terms forms a transaction. The Web service collection is mined to find the frequent hyperclique patterns that satisfy a given support level and h-confidence level [23]. This is followed by a pruning of the hyperclique patterns on the basis of the ranking of semantic relationships among the terms.
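The h-confidence measure underlying this mining step is easy to state: the support of the itemset divided by the largest support of any single item in it. A small sketch with toy parameter-term transactions:

def h_confidence(itemset, transactions):
    # h-confidence = supp(itemset) / max over items of supp({item}).
    # A hyperclique pattern is an itemset whose h-confidence exceeds a
    # chosen level, guaranteeing strong pairwise association.
    def support(items):
        return sum(items <= t for t in transactions) / len(transactions)
    return support(itemset) / max(support({i}) for i in itemset)

# Transactions = the input/output parameter terms of individual services.
tx = [{"zip", "city", "state"}, {"zip", "city"}, {"price", "ticker"},
      {"zip", "city", "street"}]
print(h_confidence({"zip", "city"}, tx))   # 0.75 / 0.75 = 1.0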
III. COMPARATIVE ANALYSIS OF EXISTING FRAMEWORKS
The major concerns in Semantic Web service discovery are that new services do not all have semantically tagged descriptions and that the vast majority of already existing Web services have no associated semantics. The problem with discovering Web services semantically is that there are too few annotated services; the semantic approach therefore suffers from a cold-start problem, as it assumes that a corpus of previously annotated services is available. Incorporating semantic annotation support into Web services is necessary.
A sensible classification system may “guide” the annotating process by suggesting a handful of similar services. Existing efforts for classifying Web services have several shortcomings. Natural language documentation, usually present in WSDL files and service registries, has not been considered thoroughly.
Some frameworks are based on the false premise that an operation and its argument names are independent, while others do not consider natural language documentation. Some frameworks assume that a corpus of previously classified services is available, which makes it impossible to create categories dynamically without re-building the classifier.
Clustering annotated resources enables the definition of new emerging concepts (concept formation) on the grounds of the concepts defined in a knowledge base; supervised methods can exploit these clusters to induce new concept definitions or to refine existing ones (ontology evolution); and intentionally defined groupings may speed up the task of search and discovery.
Although the idea of clustering similar Web services into groups is well supported and appreciated, it lacks a common base: some authors propose hierarchical clustering while others prefer agglomerative clustering. The major effort required is to achieve incremental clustering and dynamic classification.
A common problem with the SVD-based approaches is that the computation of the high-dimensional matrix representing the training documents is expensive. There have been some attempts to reduce the dimensionality of the matrix prior to applying SVD.
Current SWS frameworks such as WSMO and OWL-S address the allocation of distributed services for a given (semantically) well-described task, but none of them fully solves the issues related to symbolic Semantic Web-based knowledge representations. Although a lot of research has been done in this area and it is moving in the right direction, no major breakthrough has yet been achieved, and much more needs to be done to accomplish the task of Semantic Web service discovery.
Metadata-based classification is a realistic option, as it produces more specific semantic types. Combining metadata- and content-based classification can indeed improve the performance of semantics-based discovery. If the clustering option is explored, an incremental clustering approach might be the most suitable.
A functional description of a Web service can be provided in the documentation tag of the published service; this serves as an additional source of information and semantically annotates the service, and it can easily be extracted by text pre-processing techniques including de-tagging, tokenizing, stop-word removal, etc.
Using trained classifiers alone is not accurate enough, and soft-classification-based frameworks should be considered. For more effective ranking of the results, semantic weights should be associated with the retrieved set of Web services.
Applying a content-based classification algorithm is an efficient method for classifying Web services into their respective groups. Pattern-learning algorithms are another option, which can be used to identify similar patterns in heterogeneous data and match them semantically.
REFERENCES
[1] David Martin and John Domingue, “Semantic Web Services: Trends & Controversies”.
[2] Schmidt, A. and Winterhalter, C., “User Context Aware Delivery of E-Learning Material: Approach and Architecture”, Journal of Universal Computer Science (JUCS), vol. 10, no. 1, January 2004.
[3] http://www.daml.org/index.html
[4] w3.org, “OWL Web Ontology Language Overview”, http://www.w3.org/TR/2004/REC-owl-features-20040210/
[5] Martin et al., “OWL-S: Semantic Markup for Web Services”, W3C Member Submission.
[6] R. Akkiraju, J. Farrell, J. Miller, M. Nagarajan, M. Schmidt, A. Sheth, K. Verma, “Web Service Semantics - WSDL-S”, a joint UGA-IBM Technical Note, Version 1.0, April 18, 2005.
[7] Keller, U., Lara, R., Polleres, A., “WSMO D5.1 Discovery”, http://www.wsmo.org/TR/d5/d5.1
[8] Jorge Cardoso, “Semantic Web Services: Theory, Tools, and Applications”, Information Science Reference, University of Madeira, Portugal, ISBN 978-1-59904-045-5, 2007.
[9] Abhijit A. Patil, Swapna A. Oundhakar, Amit P. Sheth, and Kunal Verma, “METEOR-S Web Service Annotation Framework”, in Proc. of the 13th International Conference on WWW, ACM Press, 2004.
[10] Zhang Duo, Li Zi, and Xu Bin, “Web Service Annotation Using Ontology Mapping”, IEEE International Workshop on Service-Oriented System Engineering, pages 235-242, 2005.
[11] Nicole Oldham, Christopher Thomas, Amit P. Sheth, and Kunal Verma, “METEOR-S Web Service Annotation Framework with Machine Learning Classification”, in Semantic Web Services and Web Process Composition, volume 3387 of LNCS, pages 137-146, San Diego, CA, USA, 2004, Springer.
[12] Andreas Heß and Nicholas Kushmerick, “Learning to Attach Semantic Metadata to Web Services”, in Dieter Fensel, Katia P. Sycara, and John Mylopoulos, editors, International Semantic Web Conference, volume 2870 of Lecture Notes in Computer Science, pages 258-273, Springer, 2003.
[13] Miguel Ángel Corella and Pablo Castells, “Semi-automatic Semantic-based Web Service Classification”, Business Process Management Workshops, volume 4103 of LNCS, pages 459-470, Vienna, Austria, September 4-7, 2006, Springer.
[14] Andreas Heß, Eddie Johnston, and Nicholas Kushmerick, “ASSAM: A Tool for Semi-automatically Annotating Semantic Web Services”, in McIlraith et al., pages 320-334.
[15] Nello Cristianini and John Shawe-Taylor, “An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods”, Cambridge University Press, New York, NY, USA, 2000.
[16] X. Dong, A. Halevy, J. Madhavan, E. Nemes and J. Zhang, “Similarity Search for Web Services”, Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004.
[17] W. Abramowicz, K. Haniewicz, M. Kaczmarek and D. Zyskowski, “Architecture for Web Services Filtering and Clustering”, Internet and Web Applications and Services (ICIW '07), May 13-19, 2007, Le Morne, Mauritius.
[18] Le Duy Ngan, Tran Minh Hang, and Angela Eck Soong Goh, “MOD - A Multi-Ontology Discovery System”, International Workshop on Semantic Matchmaking and Resource Retrieval (co-located with VLDB'06), Seoul, Korea, Sept. 2006.
[19] Shou-jian Yu, Jing-zhou Zhang, Xiao-kun Ge, Guo-wen Wu, “Semantics Based Web Services Discovery”, Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications & Conference on Real-Time Computing Systems and Applications (PDPTA 2006), Las Vegas, Nevada, USA, June 26-29, 2006, Volume 1, CSREA Press, 2006, ISBN 1-932415-86-6.
[20] Jiangang Ma, Jinli Cao, Yanchun Zhang, “A Probabilistic Semantic Approach for Discovering Web Services”, WWW 2007, May 8-12, 2007, Banff, Alberta, Canada, ACM 978-1-59593-654-7/07/0005.
[21] Atul Sajjanhar, Jingyu Hou and Yanchun Zhang, “Algorithm for Web Services Matching”, in Proceedings of the 6th Asia-Pacific Web Conference (APWeb), vol. 3007, pp. 665-670, Hangzhou, China, April 14-17, 2004.
[22] Thomas Hofmann, “Probabilistic Latent Semantic Analysis”, in Proceedings of the 22nd Annual ACM Conference on Research and Development in Information Retrieval, Berkeley, California, pages 50-57, ACM Press, 1999.
[23] Natenapa Sriharee, “Semantic Web Services Discovery Using Ontology-based Rating Model”, Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), 18-22 Dec. 2006, Hong Kong, China, 0-7695-2747-7/06.
[24] Jiangang Ma, Jinli Cao, Yanchun Zhang, “Efficiently Finding Web Services Using a Clustering Semantic Approach”, CSSSIA 2008, April 22, Beijing, China.
[25] Richi Nayak and Bryan Lee, “Web Service Discovery with Additional Semantics and Clustering”, IEEE Computer Society.
[26] M. Korkea-aho, “Context-aware Applications Survey”, Internetworking Seminar (Tik-110.551), Helsinki University of Technology, Spring 2000.
[27] A. Dey, “Providing Architectural Support for Building Context-aware Applications”, Ph.D. dissertation, Georgia Institute of Technology, Atlanta, 2000.
[28] S. Pokraev, J. Koolwaaij and M. Wibbels, “Extending UDDI with Context-aware Features based on Semantic Service Descriptions”, Proc. of 1st Intl. Conf. on Web Services, Las Vegas, Nevada, USA, 2003, 184-190.
[29] C. Doulkeridis, N. Loutas, and M. Vazirgiannis, “A System Architecture for Context-aware Service Discovery”, Proc. of Intl. Workshop on Context for Web Services (CWS'05), Paris, France, July 5, 2005, 101-116.
[30] C. Lee and S. Helal, “Context Attributes: An Approach to Enable Context-awareness for Service Discovery”, Proc. of Symposium on Applications and the Internet, Florida, USA, 2003, 22-30.
[31] S. K. Mostefaoui, H. Gassert, and B. Hirsbrunner, “Context Meets Web Services: Enhancing WSDL with Context-Aware Features”, Proc. of 1st Intl. Workshop on Best Practices and Methodologies in Service-Oriented Architectures: Paving the Way to Web-services Success, Vancouver, British Columbia, Canada, 2004, 1-14.
[32] M. Keidl and A. Kemper, “Towards Context-aware Adaptable Web Services”, Proc. of 13th Intl. World Wide Web Conf. - Alternate Track Papers & Posters, New York, USA, 2004, 55-65.
[33] Nicola Fanizzi, Claudia d’Amato, and Floriana Esposito, “Randomized Metric Induction and Evolutionary Conceptual Clustering for Semantic Knowledge Bases”, CIKM'07, November 6-8, 2007, Lisboa, Portugal.
[34] Dietze, S., Gugliotta, A., Domingue, J., “A Semantic Web Services-based Infrastructure for Context-Adaptive Process Support”, Proceedings of the IEEE 2007 International Conference on Web Services (ICWS), Salt Lake City, Utah, USA.
[35] Xiong, H., Tan, P., and Kumar, V., “Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution”, IEEE International Conference on Data Mining (ICDM), 387-394.
[36] Papadimitriou, C. H., Raghavan, P., Tamaki, H., and Vempala, S., “Latent Semantic Indexing: A Probabilistic Analysis”, in Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Seattle, Washington, United States, 1998, pp. 159-168.
[37] Nicola Fanizzi, Claudia d’Amato, Floriana Esposito, “Instance-based Retrieval by Analogy”, SAC'07, March 11-15, 2007, Seoul, Korea.
[38] Wang, Y. and Stroulia, E., “Flexible Interface Matching for Web Service Discovery”, in WISE, 2003.
[39] Lin, J. and Gunopulos, D., “Dimensionality Reduction by Random Projection and Latent Semantic Indexing”, in Third SIAM International Conference on Data Mining, San Francisco, CA, USA, 2003.
Mrs. Shalini Batra joined the Computer Science and Engineering Department, Thapar University, Patiala, as a Lecturer in 2002 and has been working as an Assistant Professor in the same department since 2009. She completed her postgraduate studies at BITS, Pilani, and is pursuing a Ph.D. at Thapar University in the area of semantics and machine learning. She has guided fifteen M.E. theses and is presently guiding four. She is the author/co-author of more than twenty-five publications in national and international conferences and journals. Her areas of interest include Web semantics and machine learning, particularly semantic clustering and classification. She teaches courses on Compiler Construction, Theory of Computation, and Parallel and Distributed Computing.
Dr. Seema Bawa received her M.Tech. from IIT, Kharagpur, and her Ph.D. from Thapar University, Patiala. She joined the Computer Science and Engineering Department, Thapar University, Patiala, as an Assistant Professor in 1999 and has been serving as a Professor since 2004. She has guided four Ph.D. students and more than thirty M.E. theses. She served in the computer industry for more than six years before joining the university and has more than ten years of teaching experience. She has undertaken various projects and consultancy assignments in industry and academia. She is the author/co-author of more than 75 publications in technical journals and conferences of international repute. She has served as Advisor/Track Chair for various national and international conferences. Her areas of interest include Parallel, Distributed and Grid Computing, and Cultural Computing.
Call for Papers and Special Issues
Aims and Scope
JAIT is intended to reflect new directions of research and report the latest advances. It is a platform for rapid dissemination of high-quality research / application / work-in-progress articles on IT solutions for managing challenges and problems within the highlighted scope. JAIT encourages a multidisciplinary approach towards solving problems by harnessing the power of IT in the following areas:
• Healthcare and Biomedicine - advances in healthcare and biomedicine, e.g. fighting dangerous emerging diseases by using IT to model transmission patterns; effective management of patients’ records; expert systems to aid diagnosis, etc.
• Environmental Management - climate change management, environmental impacts of events such as rapid urbanization and mass migration,
air and water pollution (e.g. flow patterns of water or airborne pollutants), deforestation (e.g. processing and management of satellite imagery),
depletion of natural resources, exploration of resources (e.g. using geographic information system analysis).
• Popularization of Ubiquitous Computing - foraging for computing / communication resources on the move (e.g. vehicular technology), smart
/ ‘aware’ environments, security and privacy in these contexts; human-centric computing; possible legal and social implications.
• Commercial, Industrial and Governmental Applications - how to use knowledge discovery to help improve productivity, resource management, day-to-day operations, decision support, deployment of human expertise, etc. Best practices in e-commerce, e-government, IT in construction/large project management, IT in agriculture (to improve crop yields and supply chain management), IT in business administration and enterprise computing, etc., with potential for cross-fertilization.
• Social and Demographic Changes - provide IT solutions that can help policy makers plan and manage issues such as rapid urbanization, mass
internal migration (from rural to urban environments), graying populations, etc.
• IT in Education and Entertainment - complete end-to-end IT solutions for students of different abilities to learn better; best practices in e-learning; personalized tutoring systems. IT solutions for storage, indexing, retrieval and distribution of multimedia data for the film and music industry; virtual / augmented reality for entertainment purposes; restoration and management of old film/music archives.
• Law and Order - using IT to coordinate different law enforcement agencies’ efforts so as to give them an edge over criminals and terrorists;
effective and secure sharing of intelligence across national and international agencies; using IT to combat corrupt practices and commercial
crimes such as frauds, rogue/unauthorized trading activities and accounting irregularities; traffic flow management and crowd control.
The main focus of the journal is on technical aspects (e.g. data mining, parallel computing, artificial intelligence, image processing (e.g. satellite imagery), video sequence analysis (e.g. surveillance video), predictive models, etc.), although a small element of social implications/issues may be included to put the technical aspects into perspective. In particular, we encourage a multidisciplinary / convergent approach based on broadly based branches of computer science applied to the areas highlighted above.
Special Issue Guidelines
Special issues feature specifically aimed and targeted topics of interest contributed by authors responding to a particular Call for Papers or by
invitation, edited by guest editor(s). We encourage you to submit proposals for creating special issues in areas that are of interest to the Journal.
Preference will be given to proposals that cover some unique aspect of the technology and ones that include subjects that are timely and useful to the
readers of the Journal. A Special Issue typically comprises 10 to 15 papers, with each paper 8 to 12 pages in length.
The following information should be included as part of the proposal:
• Proposed title for the Special Issue
• Description of the topic area to be focused upon and justification
• Review process for the selection and rejection of papers
• Name, contact, position, affiliation, and biography of the Guest Editor(s)
• List of potential reviewers
• Potential authors for the issue
• Tentative timetable for the call for papers and reviews
If a proposal is accepted, the guest editor will be responsible for:
• Preparing the “Call for Papers” to be included on the Journal’s Web site.
• Distributing the Call for Papers broadly to various mailing lists and sites.
• Receiving submissions, arranging the review process, making decisions, and carrying out all correspondence with the authors. Authors should be informed of the Instructions for Authors.
• Providing us the completed and approved final versions of the papers formatted in the Journal’s style, together with all authors’ contact information.
• Writing a one- or two-page introductory editorial to be published in the Special Issue.
Special Issue for a Conference/Workshop
A special issue for a Conference/Workshop is usually released in association with the committee members of the Conference/Workshop, such as the general chairs and/or program chairs, who are appointed as the Guest Editors of the Special Issue. A Special Issue for a Conference/Workshop typically comprises 10 to 15 papers, with each paper 8 to 12 pages in length.
Guest Editors are involved in the following steps in guest-editing a Special Issue based on a Conference/Workshop:
• Selecting a Title for the Special Issue, e.g. “Special Issue: Selected Best Papers of XYZ Conference”.
• Sending us a formal “Letter of Intent” for the Special Issue.
• Creating a “Call for Papers” for the Special Issue, posting it on the conference web site, and publicizing it to the conference attendees. Information about the Journal and Academy Publisher can be included in the Call for Papers.
• Establishing criteria for paper selection/rejection. The papers can be nominated based on multiple criteria, e.g. rank in the review process plus the evaluation from the Session Chairs and the feedback from the Conference attendees.
• Selecting and inviting submissions, arranging the review process, making decisions, and carrying out all correspondence with the authors. Authors should be informed of the Author Instructions. Usually, the Proceedings manuscripts should be expanded and enhanced.
• Providing us the completed and approved final versions of the papers formatted in the Journal’s style, together with all authors’ contact information.
• Writing a one- or two-page introductory editorial to be published in the Special Issue.
More information is available on the web site at http://www.academypublisher.com/jait/.