Final - Volume 3 No 4
ABSTRACT
Gate-level testing, also called low-level testing, is generally appropriate at design
time and for small circuits. Chip-level testing and board-level testing, also called
high-level testing, are preferred when circuit complexity is too high to perform
low-level testing in a reasonable amount of time. The cost of low-level testing is
also generally very high; such cost and time are only justified when design changes
are required. In this paper, a high-level quick checking method, known as the
Linear Checking Method, is presented which can be used to qualify the
functionality of a microprocessor. It can also be used to check hard faults in
memory chips.
3.1.5 Logical-OR instruction (f(x, y) = x OR y)

K = f(x, y) + f(x̄, y) + f(x, ȳ) + f(x̄, ȳ)

3.1.11 2's complement instruction (f(x) = x̄ + 1)

K = f(x) + f(x̄) + f(x) + f(x̄)
  = 1011 + 1011 + 0110 + 0110
  = 11 + 11 + 6 + 6 = 34

Generalized form: K = 2(2^n + 1)

Table 2: address versus contents in memory testing (columns: Address, Contents)
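A short sketch (our own illustrative code; the helper names and the 4-bit word length are assumptions, not from the paper) verifies that the 2's-complement check constant K = 2(2^n + 1) holds for every operand:

```python
def f(x, n=4):
    # 2's complement instruction: bitwise complement plus one (carry kept)
    return (x ^ ((1 << n) - 1)) + 1

def linear_check(x, n=4):
    # K = f(x) + f(x_bar) + f(x) + f(x_bar)
    xbar = x ^ ((1 << n) - 1)
    return 2 * (f(x, n) + f(xbar, n))

# For n = 4 and x = 0101: f(0101) = 1010 + 1 = 1011 = 11,
# f(1010) = 0101 + 1 = 0110 = 6, so K = 2 * (11 + 6) = 34 = 2 * (2**4 + 1)
```

Because f(x) + f(x̄) = (x̄ + 1) + (x + 1) = 2^n + 1 regardless of x, any operand whose check sum deviates from K signals a fault.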
ABSTRACT
The technological trend of recent years has led to the emergence of complete
systems on a single chip with integrated low-power communication and transducer
capabilities. This has opened the way for wireless sensor networks: a paradigm of
hundreds or even thousands of tiny, smart sensors with transducer and
communication capabilities. Managing such a complex network, which has to work
unattended for months or years while being aware of the limited power resources
of battery-supplied nodes, is a challenging task. Addressing that task requires an
adequate software platform, in other words an operating system specifically suited
for wireless sensor networks. This paper presents a brief overview of the best-known
operating systems, highlighting the key challenges that have driven their
design.
TinyOS
  Objectives: manage concurrent data flows; scale easily with technology; modularity
  Structure: event-based approach; tiny scheduler and a set of components; no blocking or polling; developed in nesC
  Special features: a byte code interpreter for non-expert programmers

Mantis
  Objectives: small learning curve
  Structure: multithreaded OS; UNIX-style scheduler; statically-allocated thread table; developed in C
  Special features: specific idle task that adjusts kernel parameters to conserve energy; remote debugging and reprogramming

Contiki
  Objectives: preemptive multithreading support; runtime loading and linking of libraries
  Structure: lightweight event-driven kernel; multithreading features as an optionally linked library
  Special features: capable of changing the communication layer on the run

PicOS
  Objectives: aimed at microcontrollers with tiny RAM
  Structure: each process thought of as a FSM; multiplexing at state boundaries; written in C
  Special features: memory allocator; a set of configurable device drivers

MagnetOS
  Objectives: adaptability to resource and network changes; manage nodes with heterogeneous capabilities
  Structure: Single System Image, the entire network is a unified Java virtual machine
  Special features: two on-line special algorithms to reduce energy consumption

EYES
  Objectives: address problems of scarce memory and power supply
  Structure: event-driven OS; structured in modules executed as responses to external events; each task runs to completion
  Special features: two layers of abstraction with specific API for applications and physical support; four-step procedure to update the code
Table 2: the seven expected features of the next-generation WSN operating systems.
Power-aware policies
Self-organization
Portability
Easy programming language for non-technical users
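The event-driven, run-to-completion structure shared by TinyOS and EYES can be sketched in a few lines (this is our own illustrative pseudo-kernel, not code from any of these systems): tasks are posted to a FIFO queue and each runs to completion, so the kernel needs no blocking or polling.

```python
from collections import deque

class EventKernel:
    """Minimal run-to-completion scheduler in the spirit of the
    event-driven kernels described above (illustrative only)."""

    def __init__(self):
        self.tasks = deque()             # FIFO task queue

    def post(self, task, *args):
        self.tasks.append((task, args))  # posting an event never blocks

    def run(self):
        while self.tasks:
            task, args = self.tasks.popleft()
            task(*args)                  # each task runs to completion

kernel = EventKernel()
log = []
kernel.post(log.append, "sample")
kernel.post(log.append, "transmit")
kernel.run()
# log is now ["sample", "transmit"]
```

A single stack and a single run loop are what make this model fit the scarce RAM budgets the table emphasizes; multithreaded designs such as Mantis trade that frugality for a smaller learning curve.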
ABSTRACT
Real-time applications are characterized by their delay bounds. To satisfy the
Quality of Service (QoS) requirements of such flows over wireless
communications, we enhance the 802.11 protocol to support the Deadline
Monotonic (DM) scheduling policy. We then evaluate the performance
of DM in terms of throughput, average medium access delay and medium access
delay distribution. To evaluate the performance of the DM policy, we develop a
Markov chain based analytical model and derive expressions for the throughput, the
average MAC layer service time and the service time distribution. Finally, we
validate the mathematical model and extend the analytical results to a multi-hop
network by simulation using the ns-2 network simulator.
2.3 Real time scheduling over 802.11

A distributed solution for the support of real-time sources over IEEE 802.11, called Blackburst, is discussed in [8]. This scheme modifies the MAC protocol to send short transmissions in order to gain priority for real-time service. It is shown that this approach is able to support bounded delays. The main drawback of this scheme is that it requires constant intervals for high-priority traffic; otherwise the performance degrades significantly.

In [18], the authors introduced a distributed priority scheduling over 802.11 to support a class of dynamic priority schedulers such as Earliest Deadline First (EDF) or Virtual Clock (VC). The EDF policy is used to schedule real-time flows according to their absolute deadlines, where the absolute deadline is the arrival time at the node plus the delay bound.

To realize a distributed scheduling over 802.11, the authors of [18] used a priority broadcast mechanism where each station maintains an entry for the highest-priority packet of all other stations. Thus, stations can adjust their backoff according to the other stations' priorities.

The overhead introduced by the priority broadcast mechanism is negligible, because priorities are exchanged using native DATA and ACK packets. Nevertheless, the authors of [18] proposed a generic backoff policy that can be used by a class of dynamic priority schedulers, whether the scheduler targets delay-sensitive flows or rate-sensitive flows.

In this paper, we focus on delay-sensitive flows and propose to support the fixed-priority Deadline Monotonic (DM) policy over 802.11 to schedule delay-sensitive flows. We use a priority broadcast mechanism similar to [18] and introduce a new medium access backoff policy where the

3.1 Distributed Scheduling over 802.11

To realize a distributed scheduling over 802.11, we introduce a priority broadcast mechanism similar to [18]. Each station maintains a local scheduling table with entries for the head-of-line (HOL) packets of all other stations. Each entry in the scheduling table of node S_i comprises two fields (S_j, D_j), where S_j is the source node MAC address and D_j is the deadline of the HOL packet of node S_j. To broadcast HOL packet deadlines, we propose to use the two-way handshake DATA/ACK access mode.

When a node S_i transmits a DATA packet, it piggybacks the deadline of its HOL packet. Nodes hearing the DATA packet add an entry for S_i in their local scheduling tables by filling the corresponding fields. The receiver of the DATA packet copies the priority of the HOL packet into the ACK before sending the ACK frame. All the stations that did not hear the DATA packet add an entry for S_i using the information in the ACK packet.

3.2 DM medium access backoff policy

Let us consider two stations S_1 and S_2 transmitting two flows with the same deadline D_1 (D_1 is expressed as a number of 802.11 slots). The two stations having the same delay bound can access the channel with the same priority using the native 802.11 DCF.

Now, we suppose that S_1 and S_2 transmit flows with different delay bounds D_1 and D_2 such that D_1 < D_2, and generate two packets at time instants t_1 and t_2. If S_2 had the same delay bound as S_1, its packet would have been generated at time t'_2 such that t'_2 = t_2 + D_21, where D_21 = D_2 - D_1. At that time, S_1 and S_2 would have the same priority and transmit their packets according to the
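The priority-broadcast bookkeeping of Section 3.1 can be sketched as follows (the class and method names are our illustrative assumptions, not the paper's implementation): each station keeps a scheduling table mapping a source MAC address to the deadline of that node's HOL packet, updated both from overheard DATA packets and from ACKs that echo the deadline.

```python
class Station:
    def __init__(self, mac):
        self.mac = mac
        self.table = {}                     # scheduling table: S_j -> D_j

    def hear_data(self, src_mac, hol_deadline):
        # DATA packets piggyback the sender's HOL deadline
        self.table[src_mac] = hol_deadline

    def hear_ack(self, src_mac, echoed_deadline):
        # stations that missed the DATA recover the entry from the ACK
        self.table[src_mac] = echoed_deadline

    def highest_priority(self):
        # under DM, the smallest deadline has the highest priority
        return min(self.table, key=self.table.get)

s3 = Station("S3")
s3.hear_data("S1", 40)   # S3 overhears S1's DATA (deadline = 40 slots)
s3.hear_ack("S2", 25)    # S3 missed S2's DATA but hears the echoing ACK
# s3.highest_priority() == "S2"
```

Knowing every neighbor's HOL deadline is what lets a station stretch its backoff by the deadline difference D_21, which is the essence of the DM policy sketched in Section 3.2.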
and (7), we obtain a system of non-linear equations as follows:

T_s = T_PHY + T_MAC + T_p + T_D + SIFS + T_PHY + T_ACK + T_D + DIFS   (33)

T_c = T_PHY + T_MAC + T_p + T_D + EIFS   (34)

P_suc^11: the probability that S_1 observes a successful transmission on the channel, while S_1 is in one of the states of Δ_1:

P_suc^11 = (n_1 - 1) τ^11 (1 - τ^11)^(n_1 - 2)   (36)

P_suc^12: the probability that S_1 observes a successful transmission on the channel, while S_1 is in one of the states of Δ_1:

P_suc^12 = (n_1 - 1) τ^12 (1 - τ^12)^(n_1 - 2) (1 - τ^22)^(n_2) + n_2 τ^22 (1 - τ^22)^(n_2 - 1) (1 - τ^12)^(n_1 - 1)   (37)

Hence, the expression of the throughput of a category C_i station is given by:

Figure 5: Normalized throughput as a function of the number of contending stations.

All the curves show that DM performs service differentiation over 802.11 and offers better throughput for category C_1 stations independently of the number of contending stations.

6 SERVICE TIME ANALYSIS

In this section, we evaluate the average MAC layer service time of category C_1 and category C_2 stations. We evaluate H^R_{1,(i, i->j), D21}(Z) for each state of the S_1 Markov chain: the recursive expressions for H_{1,~C2,(i,i),D21}(Z) and H_{1,C2,(i,i+D21),D21}(Z) combine the probabilities P_suc, p_11 and p_12 with the durations T_e, T_s and T_c as powers of Z over the backoff window W.

6.1.2 Service time Z-transform of a category C_2 station:

All the curves drop gradually to 0 as the delay increases. Category C_1 stations' curves drop to 0 faster than category C_2 curves. Indeed, when D_21 = 4 slots, the probability that the S_1 service time

Analytical and simulation results show that complementary service time curves drop faster when the number of contending stations is small, for both category C_1 and category C_2 stations. This means that all stations' service time increases as the
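As a numeric sketch of (36) and (37) (the function names and any sample parameter values are our illustrative choices; the attempt probabilities τ come from solving the Markov chain, which is not reproduced here):

```python
def p_suc_11(n1, tau11):
    # (36): S1 sees exactly one of the other n1 - 1 category-C1 stations transmit
    return (n1 - 1) * tau11 * (1 - tau11) ** (n1 - 2)

def p_suc_12(n1, n2, tau12, tau22):
    # (37): a success comes either from another C1 station (first term, all n2
    # C2 stations silent) or from exactly one C2 station (second term)
    return ((n1 - 1) * tau12 * (1 - tau12) ** (n1 - 2) * (1 - tau22) ** n2
            + n2 * tau22 * (1 - tau22) ** (n2 - 1) * (1 - tau12) ** (n1 - 1))
```

With n2 = 0 the second expression collapses to the first, a quick consistency check on the two formulas.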
ABSTRACT
A Temporal Information System, that incorporates temporal information into the
traditional Information System of Rough Set Theory (RST) and Variable Precision
Rough Sets (VPRS), is presented. Mobile Ad hoc Networks (MANETs)
dynamically form a network without an existing infrastructure. The Dynamic
Source Routing (DSR) protocol of MANETs is modified in this paper to use recent
routes. Weighted elementary sets are introduced in temporal information systems
and used to route packets in mobile ad hoc networks. Notions from VPRS are also
brought into weighted temporal information systems and used in routing. The
performance of these proposed routing protocols is studied.
1.3 Routing in Mobile Ad Hoc Networks

Sending data from a source mobile node to a destination mobile node through a route (a sequence of intermediate mobile nodes) is routing. Routing is one of the most difficult issues in mobile ad hoc networks. Each node in an ad hoc network is responsible for routing. Hence each node should maintain the information necessary for routing, that is, the next hop or the path through which the data has to be routed to the destination. This information is either available even before it is needed (proactive) or is obtained only when necessary (reactive).

Proactive routing is usually done using table-driven protocols, where tables are formed initially and are updated either periodically or when some change is known.

Reactive routing protocols are also known as on-demand routing protocols. In this kind of protocol, a route is found for a destination only when there is a need to send information to the destination. Each node does not have knowledge of the location of all other nodes in the network. All nodes, when they learn or use routes, store the routes to the destinations in a routing table or a cache.

Most of the on-demand routing protocols [5], [25], [9] discover routes when needed by flooding route request packets in the network. A route reply is sent back to the source node either by an intermediate node that has a route to the destination or by the destination.

This paper presents a Temporal Information System that brings temporal information into the traditional Information System of Rough Set Theory and Variable Precision Rough Sets. The DSR protocol is modified to study the use of recent routes. The paper then introduces the notion of weighted elementary sets in temporal information systems and uses this to route packets in mobile ad hoc networks. The paper then uses VPRS in weighted temporal information systems and uses this in routing. The performance of these proposed routing protocols is studied.

2 TEMPORAL INFORMATION SYSTEMS

2.1 Information Systems and Decision Systems

Consider a universe U of elements. An information system I is defined as I = (U, A, V, ρ), where

A is a non-empty, finite set of attributes;
V = ∪_{a∈A} V_a is the set of attribute values of all attributes, where V_a is the set of possible values of attribute a;
ρ : U × A → V is an information function, such that for every element x ∈ U and attribute a ∈ A, ρ(x, a) ∈ V_a is the value of attribute a for element x.

The information system can also be viewed as an information table, where each element x ∈ U corresponds to a row, and each attribute a ∈ A corresponds to a column.

I = (U, A, V, ρ) is known as a decision system when an attribute d ∈ A is specified as the decision attribute.
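For illustration, the tuple I = (U, A, V, ρ) maps naturally onto a small table (the elements, attributes and values below are our own toy example, not data from the paper); elementary sets then group elements that are indistinguishable on a chosen attribute subset:

```python
from collections import defaultdict

# Toy information system I = (U, A, V, rho)
U = ["x1", "x2", "x3", "x4"]
A = ["next_hop", "link_up"]
rho = {  # information function: (element, attribute) -> attribute value
    ("x1", "next_hop"): "n2", ("x1", "link_up"): True,
    ("x2", "next_hop"): "n2", ("x2", "link_up"): False,
    ("x3", "next_hop"): "n5", ("x3", "link_up"): True,
    ("x4", "next_hop"): "n2", ("x4", "link_up"): True,
}

def elementary_sets(B):
    """Partition U into elementary sets: elements with identical values
    on every attribute in B fall into the same class."""
    classes = defaultdict(set)
    for x in U:
        classes[tuple(rho[(x, a)] for a in B)].add(x)
    return sorted(classes.values(), key=sorted)

# elementary_sets(["next_hop"]) -> [{"x1", "x2", "x4"}, {"x3"}]
```

Refining the attribute subset B splits the partition further, which is exactly how the routing sections below discriminate between candidate next hops.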
The conditional probability that an element in the elementary set is negative is the ratio of the number of negative elements in the elementary set to the size of the elementary set. When the context is clear, the conditional probability of an elementary set is taken to be this ratio. The β-positive region is the union of the elementary sets whose conditional probability is at least β.

2.4 Information System in a Mobile Node

The use of an information system in mobile ad hoc routing was introduced in [26]. The information system was modified in [27] to represent the route better, by using the link information rather than the node information. A threshold was used in the identification of a good next hop.

findRecentRoute() {
    besttime = 0;
4 MOBILE AD HOC ROUTING USING WEIGHTED TEMPORAL INFORMATION SYSTEMS (WTIS)

In Temporal Information Systems, each elementary set is associated with a particular time interval. In Weighted Temporal Information Systems, elementary sets in different time intervals have weights. Since it is seen in the previous section that the recent route in the route cache is useful, higher weights are given to recent time intervals.

4.1 Routing Based on WTIS

Here, the route cache of the mobile node is used as the WTIS. Routes that are learnt and used are added to the cache of the mobile node. When routes are added, the time stamp of each link is added along with the routes. However, unlike DSR, even if the same route is present in the cache earlier, the new route is added with the new time stamps. So, the cache now has the same route multiple times, but with different time stamps.

In the source node, initially, as in DSR, a shortest route in the route cache, if available, is placed as the source route in the data packet. If not available, route discovery is done.

Then in the source node, and in any intermediate forwarding node, the WTIS is used to determine the best next hop (using algorithm findWeightBasedHop). If the next hop is found, and does not result in a loop, the data packet will be forwarded to this next hop. If this next hop is different from the one in the source route that is already in the data packet, this new next hop is appended to the source route in the data packet at the current node and the route is invalidated by setting a flag in the data packet.

If a next hop cannot be determined from the WTIS, or if the next hop results in a loop, then if the source route in the data packet has not been invalidated earlier, the data packet is forwarded according to the source route. Else, a route discovery is done.

The total time is divided into time intervals. The list of next hops to the destination that are present in the route cache is found. For each possible next hop, from the current time interval till the initial time interval, a weighted sum of the number of times that the particular next hop is used is found, with more weight given to recent time intervals. The ratio of the weighted sum of the usage of the node to the total weight is found. The node for which the ratio is greater than a threshold β′ is chosen as the next hop.

Figure 3: Average hop count vs. number of connections for DSR and DSRrecent.

Figure 4: Average end-to-end delay vs. number of connections for DSR and DSRrecent.

4.2 Performance Evaluation

Figure 8: Average end-to-end delay vs. number of connections for DSR and TIME_WT.

5 MOBILE AD HOC ROUTING USING BETA-POSITIVE REGIONS IN WTIS

5.1 Routing Based on Beta-Positive Regions

The experiment described in this section determines the predicted next hop as that value of the decision attribute for which the union of these elementary sets is in the β-positive region, as described in section 2.5.

The proposed protocol used in this section is referred to as VPRS_WT. The packet delivery ratio for VPRS_WT is less than that for DSR for 5 and 10 connections. When the number of connections is increased from 20 to 40, there is a slight improvement in the packet delivery ratio, from 2% to 6% (Fig. 9).

The normalized control overhead for VPRS_WT is more than that for DSR when the number of sources is 5 or 10. With the increase in the number of connections from 20 to 40, there is an average improvement of 14% to 19% over DSR (Fig. 10).
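The weighted next-hop selection described for WTIS routing can be sketched as follows (the linear weighting of intervals, the function signature and all names are our illustrative assumptions; the paper's findWeightBasedHop is not reproduced here):

```python
def find_weight_based_hop(uses, current_interval, beta_prime=0.5):
    """uses: list of (next_hop, interval) pairs recorded in the route cache
    for one destination; intervals run 0..current_interval, larger = more
    recent. Returns the next hop whose weighted usage ratio exceeds
    beta_prime, or None if no hop qualifies."""
    weights, total = {}, 0.0
    for hop, interval in uses:
        w = (interval + 1) / (current_interval + 1)  # recent intervals weigh more
        weights[hop] = weights.get(hop, 0.0) + w
        total += w
    if not weights:
        return None
    hop = max(weights, key=weights.get)
    return hop if weights[hop] / total > beta_prime else None

# "n2" used twice recently, "n5" once long ago:
# find_weight_based_hop([("n2", 4), ("n2", 3), ("n5", 0)], 4) == "n2"
```

Returning None corresponds to the fallback path in the text: forward along the source route if it is still valid, otherwise start a route discovery.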
Figure 11: Average hop count vs. number of connections for DSR and VPRS_WT.

Figure 12: Average end-to-end delay vs. number of connections for DSR and VPRS_WT.

6 CONCLUSIONS

This paper presents temporal extensions to Rough Set Theory and Variable Precision Rough Sets. These extensions are applied to Mobile Ad hoc routing. Illustrative experiments are described and the results are presented.

[2] A. Skowron and P. Synak. Reasoning Based on Information Changes in Information Maps. In Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, volume 2639 of Lecture Notes in Artificial Intelligence, pages 229-236. Springer, 2003.
[3] A. Wieczorkowska, J. Wroblewski, P. Synak and D. Slezak. Application of temporal descriptors to musical sound recognition. Journal of Intelligent Information Systems, 21(1):71-93, 2003.
[4] J.K. Baltzersen. An attempt to predict stock market data: a rough sets approach. Master's thesis, 1996.
[5] David B. Johnson and David A. Maltz. Dynamic source routing in ad hoc wireless networks. In Imielinski and Korth, editors, Mobile Computing, volume 353. Kluwer Academic Publishers, 1996.
[6] Zhongmin Cai, Xiaohong Guan, Ping Shaoa, and Guoji Sun. A rough set theory based method for anomaly intrusion detection in computer network systems. In Expert Systems, volume 20, pages 251-259, 2003.
[7] Frank Chiang and Robin Braun. Intelligent failure domain prediction in complex telecommunication networks with hybrid rough sets and adaptive neural nets. In 3rd International Information and Telecommunication Technologies Symposium, 2004.
[8] Slezak D., Synak P., Wieczorkowska A., and Wroblewski J. KDD-based approach to musical instrument sound recognition. In Proc. of the 13th International Symposium on Foundations of Intelligent Systems, Vol. 2366, Lecture Notes in Artificial Intelligence, Springer, pages 28-36, 2002.
[9] Rohit Dube, Cynthia D. Rais, Kuang-Yeh Wang, and Satish K. Tripathi. Signal stability-based adaptive routing (SSA) for ad hoc mobile networks. IEEE Personal Communications, 4(1):36-45, 1997.
[10] K. Fall and K. Varadhan. The ns manual
D. Singh
Institute of Technology, Banaras Hindu University, Varanasi, UP, INDIA
ABSTRACT
This paper proposes a fuzzy inference based neural network for the forecasting of
short-term loads. The forecasting model is the integration of a fuzzy inference engine
and a neural network, known as a Fuzzy Inference Neural Network (FINN). A
FINN initially creates a rule base from existing historical load data. The parameters
of the rule base are then tuned through a training process, so that the output of the
FINN adequately matches the available historical load data. Results show that the
FINN can forecast future loads with an accuracy comparable to that of neural
networks, while training much faster. Simulation results indicate that the hybrid
fuzzy neural network is one of the best candidates for the analysis and forecasting
of electricity demand. A Radial Basis Function Neural Network (RBFNN)
integrated with a Fuzzy Inference Engine has been used to create a Short Term
Load Forecasting model.
fuzzy sets defined in U. Hence, the fuzzification interface provides a link between the non-fuzzy outside world and the fuzzy system framework. The fuzzy rule base is a set of linguistic rules or conditional statements of the form: "IF a set of conditions is satisfied, THEN a set of consequences is inferred". The fuzzy inference engine is a decision-making logic performing the inference operations of the fuzzy rules. Based on the fuzzy IF-THEN rules in the fuzzy rule base and the compositional rule of inference [14], the appropriate fuzzy sets are inferred in the output space.

y_mp = Σ_{i=1}^{N} w_mi o_i + b_m   (2)

where
N = number of hidden-layer nodes (RBF units);
y_mp = output value of the m-th node in the output layer for the p-th incoming pattern;
w_mi = weight between the i-th RBF unit and the m-th output node;
b_m = biasing strength of the m-th output node;
o_i = i-th input to the linear layer.
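Equation (2) is just a weighted linear combination; a minimal sketch (toy numbers of our own, not values from the paper):

```python
def output_node(weights_m, rbf_outputs, bias_m):
    # y_mp = sum_i w_mi * o_i + b_m  (the linear output layer of eq. (2))
    return sum(w * o for w, o in zip(weights_m, rbf_outputs)) + bias_m

y = output_node([0.5, -0.2, 1.0], [0.9, 0.4, 0.1], 0.3)
# y = 0.45 - 0.08 + 0.10 + 0.30 = 0.77
```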
The values of the different parameters of the RBF network are determined during training. These parameters are the spread, the cluster centers, and the weights and biases of the linear layer. The number of neurons for the network and the spread are determined through experimentation with a large number of combinations of spread and number of neurons. The best combination is the one which produces the minimum Sum Squared Error (SSE) on the testing data.

Suppose the mapping µ_A from the discussed region U to the range [0, 1], µ_A : U → [0, 1], x → µ_A(x), defines a fuzzy subset of U, named A; the mapping µ_A(x) is known as the membership function of A. The value of µ_A(x) shows the membership degree of x in the fuzzy set A, which is called the membership degree for short. In practice, the membership function can be selected according to the characteristics of the object.

3 FUZZY INFERENCE
Fuzzy inference based on fuzzy estimation is a
Fuzzy inference is the process of formulating the method by which a new and approximate fuzzy
mapping from a given input to the output using fuzzy estimation conclusion is inferred using fuzzy
logic. This process numerically evaluates the language rule. This paper adopts composite fuzzy
information embedded in the fuzzy rule base. The inference method, which is inference method based
fuzzy rule base consists of “IF-THEN” type rules. on fuzzy relation composing principle. A fuzzy
For a set of input variables, there will be fuzzy inference engine can process mixed data. Input data
membership in several fuzzy input variables. By received from the external world is analyzed for its
using the fuzzy inference mechanism, the validity before it is propagated into a fuzzy inference
information is processed to evaluate the actual value engine. The capability of processing mixed data is
from the fuzzy rule base. A good precision can be based on the membership function concept by which
achieved by applying appropriate membership all the input data are eventually transformed into the
definitions along with well-defined membership same unit before the inference computations. A
functions. This is an information processing system fuzzy inference engine normally includes several
that draws conclusions based on given conditions or antecedent fuzzy variables. If the number of
evidences. A fuzzy inference engine is an inference antecedent variables is k then there will be k
engine using fuzzy variables. Fuzzy inference refers information collected from the external world.
to a fuzzy IF-THEN structure. The fact that fuzzy Fuzzification and normalization are the two typical
inference engines evaluates all the rules transformations. Another important property is that
simultaneously and do not search for matching when an input data set is partially ambiguous or
antecedents on a decision tree makes them perfect unacceptable, a fuzzy inference engine may still
candidates for parallel processing computers. produce reasonable answers.
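As a concrete example of a membership function and the degree it assigns, here is a triangular membership function; the shape and the load values below are our own illustrative choices, not definitions from the paper:

```python
def tri_mf(x, a, b, c):
    """Triangular membership function rising from a to a peak at b and
    falling back to zero at c (one common, illustrative choice of shape)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Degree to which a 5100 MW load is "medium" for a set peaking at 5200 MW:
mu = tri_mf(5100, 4800, 5200, 5600)   # 0.75
```

Fuzzification evaluates such functions for every linguistic label of every input variable, producing the membership degrees that the rule base then combines.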
A fuzzy set is a set without a crisp, clearly defined boundary, and can contain fuzzy variables with a partial degree of membership, which is represented by the membership functions within the

4 FUZZY INFERENCE NEURAL NETWORK

A fuzzy Inference neural network approach,
5 SIMULATION RESULTS
Fig. 3, Fig. 4, Fig. 5: actual and forecasted load (MW) vs. hour (0-160), one plot for each of the three test weeks.
designed network is used to produce the day-ahead and week-ahead forecasts on an hourly basis. Forecasting has been done on one year of load data of the TransAlta Electric Utility for Alberta, Canada. The load varies from 3900 MW to 6200 MW. The FINN is trained using the last four weeks of hourly load data and is then used to forecast the load for the next 168 hours, i.e., one week. The results are reported for three weeks, one each for the winter, spring and summer seasons. This reflects the behaviour of the network during seasonal changes, and the corresponding results are shown in Table 2. It is observed that the performance of the day-ahead and week-ahead forecasts is equally good. Load shape curves for the three weeks are shown in Fig. 3, Fig. 4 and Fig. 5. The errors are tabulated in Table 2. It is observed from the figures that the forecaster captures the load shape quite accurately and the forecasting error on most of the week days is low, with slightly higher error on weekend days.

For a comparative study, the proposed FINN method is compared with two other methods, conventional Multi Layer Regression (MLR) and RBF neural networks, for the same period of time. The result (Table 3) shows that the average MAPE for FINN is better than for MLR in all seasons, and the average MAPE for RBFNN is even better than for FINN. At the same time, it is also noticeable that the training time required for forecasting through the RBFNN integrated with Fuzzy Inference is approximately ten times less than the training time required for the simple RBFNN.

6 CONCLUSION

accurate as compared to MLR. The error depends on many factors such as homogeneity in the data, network parameters, choice of model and the type of solution. The flexibility of fuzzy logic, offering a logical set of IF-THEN rules which can be easily understood by an operator, makes it a good solution for practical implementation. FINN training time was much faster, and it also effectively incorporated linguistic IF-THEN rules. Load forecasting results show that FINN is equally good for week-ahead and day-ahead forecasting and requires less training time as compared to other forecasting techniques, conventional regression (MLR) and the simple RBF neural network.

ACKNOWLEDGEMENT

The authors would like to thank TransAlta, Alberta, Canada for providing the load data used in the studies.

7 REFERENCES

[1.] A.D. Papalexopoulos, T. Hasterberg: A Regression based Approach to Short Term System Load Forecast, IEEE Trans. on Power Systems, Vol. 5, No. 4, pp. 1535-1544, (1990).

[2.] A. Khotanzad, E. Zhou and H. Elragal: A Neuro-Fuzzy approach to Short-term load forecasting in a price sensitive environment, IEEE Trans. Power Syst., vol. 17, no. 4, pp. 1273-1282, (2002).
[6.] D.K. Ranaweera, N.F. Hubele and A.D. Papalexopoulos: Application of Radial Basis Function Neural Network Model for Short Term Load Forecasting, IEE Proc. Gener. Trans. Distrib., vol. 142, No. 1, (1995).

[7.] D. Singh and S.P. Singh: Self selecting neural network for short-term load forecasting, Jour. of Electric Power Components and Systems, vol. 29, pp. 117-130, (2001).

[8.] E.H. Mamdani and S. Assilian: An experiment in linguistic synthesis with a fuzzy logic controller, Int. J. Man-Mach. Stud., vol. 7, no. 1, pp. 1-12, (1975).

[9.] F. Meslier: New advances in short term load forecasting using Box and Jenkins approach, Paper A78 051-5, IEEE/PES Winter Meeting, (1978).

[10.] Hiroyuki Mori and Hidenori Kobayashi: Optimal fuzzy inference for short term load forecasting, IEEE Trans. on Power Systems, vol. 11, No. 2, pp. 390-396, (1996).

[11.] I. Mogram and S. Rahman: Analysis and evaluation of five short term load forecast techniques, IEEE Trans. on Power Systems, Vol. 4, No. 4, pp. 1484-1491, (1989).

[12.] Kwang-Ho Kim, Hyoung-Sun Youn, Yong-Cheol Kang: Short-term Load Forecasting for Special Days in Anomalous Load Conditions Using Neural Network and Fuzzy Inference Method, IEEE Trans. on Power Systems, Vol. 15, pp. 559-569, (2000).

[15.] K.Y. Lee, Y.T. Cha, and J.H. Park: Short-Term Load Forecasting Using An Artificial Neural Network, IEEE Trans. on Power Systems, vol. 7, no. 1, pp. 124-132, (1992).

[16.] Ranaweera D.K., Hubele N.F. and Karady G.G.: Fuzzy logic for short-term load forecasting, Electrical Power and Energy Systems, Vol. 18, No. 4, pp. 215-222, (1996).

[17.] T. Takagi and M. Sugeno: Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Syst., Man, Cybern., vol. 15, pp. 116-132, (1985).

[18.] T.S. Dillon, S. Sestito, and S. Leung: Short term load forecasting using an adaptive neural network, Elect. Power Energy Syst., vol. 13, (1991).

[19.] W.R. Christiaanse: Short Term Load Forecasting using general exponential smoothing, IEEE Trans. on Power Appar. Syst., PAS-3, pp. 900-911, (1988).

[20.] Zadeh L.A.: Roles of Soft Computing and Fuzzy Logic in the Conception, Design and Deployment of Information/Intelligent Systems, Computational Intelligence, Soft Computing and Fuzzy-Neuro Integration with Applications, O. Kaynak, L.A. Zadeh, B. Turksen, I.J. Rudas (Eds.), pp. 1-9, (1998).
ABSTRACT
Search plays an important role in ubiquitous computing. In this paper, we
investigate the expected cost and latency of three unstructured search schemes:
broadcast, TTL-based, and random-walk-based search schemes. We build a unified
analytical model for unstructured search schemes. We demonstrate, through
simulation, that the different search schemes exhibit very different cost and latency
tradeoffs, leaving large gaps in the performance landscape. We propose
randomized mixing schemes to bridge such performance gaps and bring out a new
Pareto frontier that offers more diverse performance choices for applications.
paid outright, but the cost of a later action, C_i, is paid only if all previous actions have failed, which happens with probability F_{i−1}. Similarly, the expected latency is

E[D'] ≡ Σ_{i=1}^{l} ( Σ_{j=1}^{i} D'_j ) (F_{i−1} − F_i) = Σ_{i=1}^{l} F_{i−1} D'_i   (6)

Our analysis rests on the following assumptions:

A1 The cost and latency of each step of a random walk is 1.
A2 The cost of a search action that requires broadcasting to n nodes is n, i.e., C_i = N_i in equations (1) and (2).
A3 The m target replicas are independently, identically distributed among the nodes.

The above assumptions are admittedly idealistic, but they are used here to focus on issues intrinsic to the merits of a particular search scheme and to exclude external factors such as network conditions, implementation efficiency, etc. Assumption A1 holds only if the links are lossless, which is not true in practical situations. However, introducing loss requires the specification of a loss probability, an extraneous detail outside the scope of our model. Writing the failure probability after covering N_i nodes as

F_i = F(N_i)   (7)

the particular form of F depends on the distribution of the replicas. Now we are ready for the following proposition.

Proposition 2.1 A: The expected cost is parameterized by (7) and thus depends only on the search sequence [N_1, N_2, …, N_l], regardless of the network topology. B: The expected latency depends on both the search sequence and the network topology.

Proof: The validity of statement A is apparent by combining equations (1) and (7). The validity of statement B derives from the fact that different latencies are incurred in different network topologies to cover the same number of nodes. For example, it incurs just one-hop latency to cover all one-hop neighbors of the originator.

The minimum expected cost is C_min = Σ_{j=1}^{N} F(j − 1), where F(j − 1) is defined in the sense of (7). This minimum expected cost can be achieved by an idealized random walk.

Proof: First, we show that for an arbitrary search scheme, the expected cost is at least C_min. Suppose the scheme's search action sequence is [A_1, A_2, …, A_l], and the corresponding node coverage sequence is [N_1, N_2, …, N_l = N]. It is clear that C_i ≥ ΔN_i = N_i − N_{i−1}, since it takes at least a cost of ΔN_i to cover ΔN_i nodes. So we have

E[C] = Σ_{i=1}^{l} F_{i−1} C_i ≥ Σ_{i=1}^{l} F_{i−1} ΔN_i = Σ_{i=1}^{l} F(N_{i−1}) ΔN_i ≥ Σ_{j=1}^{N} F(j − 1)

where the last step uses the fact that F is non-increasing. Second, an idealized random walk is a sequential visit of each node. Without loss of generality, let the visiting sequence be j = 1, 2, …, N; the walk thus incurs exactly the minimum expected cost Σ_{j=1}^{N} F(j − 1). Q.E.D.

The next proposition is obvious, and we state it without proof.

Proposition 2.2 The minimum expected latency is the minimum expected distance in hops between the originator and any one of the target replicas, which can be achieved by a broadcast search.
Under assumption A3, the probability that a given replica lies within r hops of the originator is

P_r ≡ P(x ≤ r) = (r/R)^d

where R is the radius of the network in hops. The probability that the minimum distance between the originator and the m target replicas is no larger than r hops is

P_r^m ≡ P(x_m ≤ r) = 1 − [1 − (r/R)^d]^m

And according to Proposition 2.2, the expected latency of a broadcast search can be calculated as the expected minimum distance between the originator and the targets:

E[D_BCast] = ∫_0^R r dP_r^m = − ∫_0^R r d[1 − (r/R)^d]^m = mR Γ(m) Γ(1/d + 1) / Γ(m + 1/d + 1)   (10)

where Γ(·) is the gamma function. For d = 2, which is our main focus here, we have N = απR², with α being the node density. The node density, defined as the …

4 TTL-BASED SEARCH SCHEMES

A TTL-based search consists of a sequence of broadcasts with increasing TTL values, with the last action being a network-wide broadcast. To motivate our analysis, we consider the first two actions in a TTL-based search, i.e., A_1 and A_2 with increasing TTL values, covering N_1 and N_2 nodes and incurring costs of C_1 and C_2, respectively. Note that the node coverage of A_2 includes that of A_1. We consider two options: a) a sequence of search actions [A_1, A_2], or b) a single action A_2. Clearly, these two options have the same probability of success at the conclusion of the search. However, their costs are different, i.e., C_a = C_1 + F_1 C_2 for option a, and C_b = C_2 for option b. Option a is preferable to b only if C_1 + F_1 C_2 < C_2. Rearranging terms, we have the following lemma, which was similarly stated in [6] but is derived here using a simpler argument.

Lemma 4.1 The cost of a search action A_2 can be reduced by using a sequence of search actions [A_1, A_2] if the following inequality holds:

F_1 < 1 − C_1/C_2   (13)

Specializing to our model, under the assumptions A1-A3 we have C_1 = N_1, C_2 = N_2, and the inequality becomes …
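The closed form (10) can be sanity-checked numerically: draw m replica distances with CDF (r/R)^d and compare the average minimum distance with the formula. The parameter values below are illustrative, not taken from the paper.

```python
import math
import random

def expected_bcast_latency(m, R, d=2):
    # Equation (10): E[D_BCast] = m R Gamma(m) Gamma(1/d + 1) / Gamma(m + 1/d + 1)
    return (m * R * math.gamma(m) * math.gamma(1 / d + 1)
            / math.gamma(m + 1 / d + 1))

def simulated_latency(m, R, d=2, trials=100_000, seed=7):
    # A replica distance x with P(x <= r) = (r/R)^d is drawn as x = R * u**(1/d)
    # for uniform u; broadcast latency is the minimum over the m replicas.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(R * rng.random() ** (1 / d) for _ in range(m))
    return total / trials

analytic = expected_bcast_latency(m=4, R=10.0)
estimate = simulated_latency(m=4, R=10.0)
print(analytic, estimate)  # both close to 4.06
```

The Monte Carlo estimate and the gamma-function expression agree to within sampling noise.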
[Plot residue: E[D] versus p1 curves for m = 2, 4, 6, 8, 10; panel (a).]

which survive with probability q at each step. The persistent walker guarantees that the search is always successful, while the non-persistent walkers speculatively wander away trying to reduce latency. In the following, we discuss the expected cost and latency of SRW and MRW in turn.

We model an SRW search scheme as a sequence of actions A_1, A_2, …, A_l, with C_i = 1, D_i = 1, and l ≥ N. Let s_i denote the number of distinct nodes visited in the first i steps. The probability of failing to find any of the m replicas of the target is then (since the search for each replica has to fail)

F_i = (1 − s_i/N)^m = f_i^m   (17)

where

f_i = 1 − s_i/N   (18)

Substituting C_i = D_i = 1 into the general expressions, the expected cost and latency of SRW coincide:

E[C_SRW] = E[D_SRW] = Σ_{i=0}^{l−1} f_i^m
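Equations (17)-(18) require the coverage sequence s_i, which depends on the topology. A small sketch (our own construction, using a ring graph purely as an example) simulates a walk to obtain s_i and plugs it into the cost formula:

```python
import random

def coverage_sequence(adj, start, steps, seed=0):
    # s_i: number of distinct nodes visited in the first i steps of a
    # random walk on the adjacency-list graph `adj`.
    rng = random.Random(seed)
    node, seen = start, {start}
    s = [len(seen)]
    for _ in range(steps):
        node = rng.choice(adj[node])
        seen.add(node)
        s.append(len(seen))
    return s

def srw_expected_cost(s, N, m):
    # E[C] = sum_i F_{i-1} with C_i = 1 and F_i = (1 - s_i/N)^m  (eqs. 17-18).
    return sum((1.0 - si / N) ** m for si in s[:-1])

# Example topology: a ring of N nodes, each linked to its two neighbours.
N = 200
ring = [[(i - 1) % N, (i + 1) % N] for i in range(N)]
s = coverage_sequence(ring, start=0, steps=5 * N)
print(srw_expected_cost(s, N, m=2))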
Therefore, in a practical implementation, the expected cost of MRW is

E[C_MRW] = k Σ_{i=0}^{l−1} P_i f_i^m + Σ_{i=0}^{l−1} f_i^m   (21)

where

P_i = (q^i − q^{l+1}) / (1 − q^{l+1})   (22)

The expected latency is determined by the minimum latency among the multiple walkers, whose population at the ith step is

K_i = 1 + k P_i   (23)

The search continues only if all K_i walkers fail to find any of the m replicas, so the expected latency is

E[D_MRW] = Σ_{i=0}^{l−1} f_i^{m K_i}   (24)

By varying k and q, we can construct a family of RW-based schemes. Simulation results are shown in Figure 2 for a network with N = 1000, a random topology with average degree five, and one persistent walker plus one to five non-persistent walkers. To approximate a practical, optimized implementation, previously visited nodes are flagged.

Figure 2: Expected cost E[C] (a) and latency E[D] (b) versus q for RW-based schemes (k = 1, …, 5).

From the figure, one can see that large performance variation occurs only where q is close to 1. Once q drops below roughly 0.9, the expected cost and latency quickly converge to those achieved by a single persistent walker. Figure 2 also shows the tradeoff between expected cost and latency: latency can be reduced, at the expense of increased cost, by either increasing the number of non-persistent walkers (k) or reducing the survival probability (q). An important point, which will be elaborated in the next section, is that the cost-latency tradeoff here exhibits a very different character from that of TTL-based schemes (compare Figures 1 and 2).
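Equations (21)-(24) can be evaluated directly. The sketch below assumes the idealized coverage s_i = min(i, N) (so f_i = 1 − s_i/N) and q < 1 so that (22) is well defined; both are our simplifying assumptions, not claims about the paper's simulation.

```python
def mrw_cost_latency(N, m, k, q):
    # Expected cost (21) and latency (24) for one persistent walker plus
    # k non-persistent walkers surviving each step with probability q.
    l = N
    f = [1.0 - min(i, N) / N for i in range(l)]                          # (18)
    P = [(q ** i - q ** (l + 1)) / (1 - q ** (l + 1)) for i in range(l)]  # (22)
    cost = k * sum(p * fi ** m for p, fi in zip(P, f)) + sum(fi ** m for fi in f)  # (21)
    K = [1 + k * p for p in P]                                           # (23)
    latency = sum(fi ** (m * Ki) for fi, Ki in zip(f, K))                # (24)
    return cost, latency

c1, d1 = mrw_cost_latency(N=1000, m=5, k=0, q=0.9)  # single persistent walker
c5, d5 = mrw_cost_latency(N=1000, m=5, k=5, q=0.9)
print(c1, d1)  # with k = 0, cost and latency coincide
print(c5, d5)  # extra walkers: higher cost, lower latency
```

This reproduces the qualitative tradeoff discussed around Figure 2: adding non-persistent walkers trades cost for latency.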
[Plot residue: E[C] (a) and E[D] (b) for BCast, RW, and mixed schemes.]

In this section, we examine more closely the … We use C_A, D_A to indicate E[C_A], E[D_A] of a particular search scheme A. A mixed scheme runs scheme A with probability p and scheme B with probability 1 − p.

Proposition 6.1 The feasible region of a mixed scheme is convex.

Proof: … any point (pC_A + (1 − p)C_B, pD_A + (1 − p)D_B) is realizable. Therefore the feasible region is convex. Q.E.D.

Equipped with the above background, let us … RW-based schemes achieve a slightly larger latency than …
The cost-latency slopes of the RW-based family follow from (21)-(24). Varying k at fixed q,

(∂C_RW/∂k) / (∂D_RW/∂k) |_q = [ Σ_{i=0}^{l−1} P_i f_i^m ] / [ m Σ_{i=0}^{l−1} P_i f_i^{m K_i} ln f_i ]

and varying q at fixed k,

(∂C_RW/∂q) / (∂D_RW/∂q) |_k = [ Σ_{i=0}^{l−1} (dP_i/dq) f_i^m ] / [ m Σ_{i=0}^{l−1} (dP_i/dq) f_i^{m K_i} ln f_i ]

From the above, the smaller cost-latency slope can be explained by the absence of the √N term that appears with TTL-based schemes. Further, the m factor in the denominator explains why the slope becomes smaller with larger m, as shown in Figure 3.

9 REFERENCES

[12] …, in Proc. of INFOCOM (2004).
[13] Gkantsidis, C., Mihail, M., Saberi, A.: Hybrid search schemes for unstructured peer-to-peer networks, in Proc. of INFOCOM (2005).
[14] L. A. Adamic, R. M. Lukose, B. Huberman, and A. R. Puniyani: Search in power-law networks, Physical Review E, vol. 64, no. 046135 (2001).
[15] R. Gaeta, G. Balbo, S. Bruell, M. Gribaudo, M. Sereno: A simple analytical framework to analyze search strategies in large-scale peer-to-peer networks, Performance Evaluation, vol. 62, issue 1-4 (2005).
Abstract— Correlation among queries is an important factor to analyze, as it may affect the results delivered by a search engine. In this paper, we analyze correlation among queries and how it affects the information retrieved from the Web. We analyze two types of queries: (i) queries with embedded semantics, and (ii) queries without any semantics. In our analysis, we consider parameters such as search latencies and search relevance. We focus on two major search portals that are mainly used by end users. Further, we discuss a unified criteria for comparison among the performance of the search engines.

Index Terms— Query correlation, search portals, Web information retrieval, unified criteria for comparison, earned points.

I. INTRODUCTION

The Internet, which was aimed at communicating research activities among a few universities in the United States, has now become a basic need of life for all people who can read and write throughout the world. This has become possible only due to the proliferation of the World Wide Web (WWW), which is now simply called the Web. The Web has become the largest source of information in all parts of life. Users from different domains often extract information that fits their needs. The term Web information retrieval1 is used for extracting information from the Web.

Although Web information retrieval finds its roots in traditional database systems [4], [5], the retrieval of information from the Web is more complex than information retrieval from a traditional database. This is due to subtle differences in their respective underlying databases2. In a traditional database, the data is often organized, limited, and static. In contrast, the Webbase is unorganized, unlimited, and often dynamic. Every second, a large number of updates are carried out in the Webbase. Moreover, as opposed to a traditional database, which is controlled by a specific operating system and whose data is located either at a central location or at least at a few known locations, the Webbase is not controlled by any specific operating system and its data may not reside at a central site or at a few known locations. Further, the Webbase can be thought of as a collection of a large number of traditional databases of various organizations. The expectations of a user searching for information on the Web are much higher than those of a user simply retrieving some information from a traditional database. This makes the task of extracting information from the Web challenging [1].

Since Web searching is an important activity and the results obtained may affect decisions and directions for individuals as well as organizations, it is of utmost importance to analyze the parameters or constituents involved in it. Many researchers have analyzed different issues pertaining to Web searching, including index quality [2], user-effort measures [3], Web page reputation [6], and user-perceived quality [7].

In this paper, we try to answer the following question: What happens when a user fires correlated queries to a search engine one by one? Specifically, we wish to evaluate the effect of correlation among the queries submitted to a search engine (or a search portal).

The rest of this paper is organized as follows. In Section II, we briefly review methodologies used in popular search engines. In Section III, we describe query correlation. Section IV contains results and discussion. In Section V, we describe a criteria for comparison of search engines. Finally, Section VI is for conclusion and future work.

II. A REVIEW OF METHODOLOGIES USED IN SEARCH ENGINES

First we discuss a general strategy employed for retrieving information from the Web, and then we review some of the search portals.

A. A General Strategy for Searching

A general strategy for searching information on the Web is shown in Fig. 1. Broadly, a search engine consists of the following components: User Interface, Query Dispatcher, Cache3, Server Farm, and Web Base. The way these components interact with one another depends upon the strategy employed in a particular search engine. We describe here a broad view. An end user fires a query using an interface, say the User Interface. The User Interface provides a form to the user. The user fills the form with a set of keywords to be searched. The query goes to the Query Dispatcher which, after performing some refinements, sends it to the Cache. If the query obtained after

1 The terms Web surfing, Web searching, Web information retrieval, and Web mining are often used in the same context. However, they differ depending upon the methodologies involved, the intensity of seeking information, and the intentions of the users who extract information from the Web.
2 Let us use the term Webbase for the collection of data in the case of the Web, in order to differentiate it from the traditional database.
3 We use the word Cache to mean Search Engine Cache, i.e., the storage space where results matching previously fired queries or words are kept for future use.
UbiCC Journal - Volume 3 59
[Fig. 1 (diagram residue): query and response flow among the User Interface, Query Dispatcher, Cache, Server Farm, and Web Base.]
refinement4 is matched to a query in the Cache, the results are immediately sent by the Query Dispatcher to the User Interface and hence to the user. Otherwise, the Query Dispatcher sends the query to one of the Servers in the Server Farm, which are busy building a Web Base for the search engine. The server so contacted, after due consideration of the Web Base, sends the results to the Cache so that the Cache may store them for future reference, if any. The Cache sends them to the Query Dispatcher. Finally, through the User Interface, the response is returned to the end user.

In what follows, we briefly review the strategies employed by different search portals.

B. Review of Strategies of Search Portals

The major search portals or search engines5 which end users generally use for searching are GoogleTM and YahooTM. Let us briefly review the methodologies behind the respective search engines6 of these search portals.

Google is based on the PageRank scheme described in [8]. It is somewhat similar to the scheme proposed by Kleinberg in [9], which is based on hub and authority weights and focuses on the citations of a given page. To understand Google's strategy, one has to first understand the HITS (Hyperlink-Induced Topic Search) algorithm proposed by Kleinberg. The readers are directed to [9] for HITS and to [8] for PageRank.

On the other hand, Yahoo employs an ontology-based search engine. An ontology is a formal term used to mean a hierarchical structure of terms (or keywords) that are related. The relationships among the keywords are governed by a set of rules. As a result, an ontology-based search engine such as Yahoo may search other related terms that are part of the ontology of the given term. Further, an ontology-based search engine may not search words that are not part of its ontology. It can modify its ontology with time. One step more, an ontology-based search engine may also shorten the set of results searched before presenting it to the end users, dropping results that are not part of the ontology of the given term.

We now describe an important aspect pertaining to information retrieval from the Web. The results delivered by a search engine may depend on how the queries are formulated and what relation a given query has with previously fired queries, if any. We wish to study the effect of correlation among the queries submitted to a search engine.

III. QUERY CORRELATION

The searched results may differ depending upon whether a search engine treats a set of words as an ordered set or an unordered set. In what follows, we consider each one of them.

A. Permutations

Searched results delivered by a search engine may depend upon the order of the words appearing in a given query7. If we take into account the order of words, the same set of words may form different queries for different orderings. The different orderings of the set of words of the given query are called permutations. The formal definition of permutations of a given query is as follows.

Definition 1: Let the query Q = {w_i | 1 ≤ i ≤ m}, Q ≠ φ, be a set of words excluding the stop words of a natural language, and let P = {x_j | 1 ≤ j ≤ m} be a set of words excluding stop words. If P is such that w_i = x_j for some j not necessarily equal to i, and for every w_i ∈ Q there exists x_j ∈ P such that w_i = x_j, where j may not be equal to i, then P is called a permutation of Q.

In the above definition, stop words are language dependent. For example, in the English language, the set of stop words, S, is often taken as

S = {a, an, the, is, am, are, will, shall, of, in, for, …}

4 By refinement of a query, we mean that the given query is transformed in such a way that the words and forms that are not so important are eliminated, so that they do not affect the results.
5 A search engine is a part of a search portal. A search portal provides many other facilities or services such as Advanced Search, News, etc.
6 The respective products are trademarks of their organizations.
7 The term 'query' means a set of words that is given to a search engine to search for the information available on the Web.
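Definition 1 can be realized directly with the standard library. The stop-word set below is the one listed in the text (the full set is open-ended), and the example query is drawn from the paper's Table III:

```python
from itertools import permutations

# Stop words as listed in the text; the actual set is open-ended.
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "will", "shall", "of", "in", "for"}

def query_permutations(query):
    # All orderings of the query's non-stop words (Definition 1).
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    return [" ".join(p) for p in permutations(words)]

perms = query_permutations("node disjoint multipath")
print(len(perms))                          # 3! = 6 orderings
print("multipath node disjoint" in perms)  # True
```

A query with m non-stop words yields m! permutations, which is why the experiments below restrict attention to a handful of them.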
Note that if there are m words (excluding the stop words) in a given query, there can be up to m! different orderings of the same set of words. However, one would like to know how the given search engine behaves when …

… engine such that Q1 and Q2 are sets of words of a natural language and Q1 ∩ Q2 ≠ φ. Q1 and Q2 are said to be k-correlated only if |C| = k, where C = Q1 ∩ Q2 and |·| denotes cardinality.

For two queries that are correlated, we define a parameter called the Correlation Factor8 as follows:

Correlation Factor = |Q1 ∩ Q2| / |Q1 ∪ Q2|   (1)

For two disjoint queries, Q1 ∩ Q2 = φ, the Correlation Factor is 0. Further, for a cluster of O queries the size of the union is given by the inclusion-exclusion principle:

|Q1 ∪ Q2 ∪ … ∪ QO| = Σ_o |Qo| − Σ_{i<j} |Qi ∩ Qj| + Σ_{i<j<k} |Qi ∩ Qj ∩ Qk| − … + (−1)^{O−1} |Q1 ∩ Q2 ∩ … ∩ QO|   (2)

Correlation Factor = |∩_{o=1}^{O} Qo| / |∪_{o=1}^{O} Qo|   (3)

A high correlation factor means that the queries in the cluster …

8 This correlation factor is nothing but Jaccard's Coefficient, which is often used as a measure of similarity.

Fig. 3. Latency versus page number for permutation P2.
Fig. 4. Latency versus page number for permutation P3.
Fig. 5. Latency versus page number for permutation P4.
[Plot residue: latency versus page number curves for Google and Yahoo.]
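Equations (1)-(3) amount to the Jaccard coefficient over word sets. A small sketch, using the query pair E3 from Table III (which shares 3 of 5 distinct words):

```python
def correlation_factor(*queries):
    # |Q1 ∩ ... ∩ QO| / |Q1 ∪ ... ∪ QO|: eq. (3), which reduces to
    # eq. (1) for two queries.  Queries are treated as unordered word sets.
    sets = [set(q.lower().split()) for q in queries]
    return len(set.intersection(*sets)) / len(set.union(*sets))

# E3 from Table III: 3 shared words out of 5 distinct words.
q1 = "node disjoint multipath routing"
q2 = "edge disjoint multipath routing"
print(correlation_factor(q1, q2))  # 0.6
```

Note that the correlation k of Table III counts the shared words (|Q1 ∩ Q2| = 3 here), while the Correlation Factor normalizes by the union.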
TABLE I
SEARCH LATENCIES, QUERY SPACE, AND THE NUMBER OF RELEVANT RESULTS FOR DIFFERENT PERMUTATIONS OF THE QUERY: Ash Mohammad Abbas, FOR GOOGLE.
(For each permutation, the three rows give the latency, query space, and number of relevant results on pages p1-p10.)

Permutation       p1     p2     p3     p4     p5     p6     p7     p8     p9     p10
1  latency        0.22   0.15   0.04   0.08   0.33   0.29   0.15   0.13   0.16   0.17
   query space    300000 300000 300000 300000 300000 300000 300000 300000 300000 300000
   relevant       8      5      1      0      2      0      0      3      0      0
2  latency        0.51   0.15   0.22   0.19   0.13   0.12   0.10   0.27   0.16   0.15
   query space    300000 300000 300000 300000 300000 300000 300000 300000 300000 300000
   relevant       3      2      2      1      1      0      2      0      0      0
3  latency        0.30   0.08   0.18   0.20   0.14   0.25   0.13   0.21   0.14   0.21
   query space    300000 300000 300000 300000 300000 300000 300000 300000 300000 300000
   relevant       6      4      1      3      2      1      1      0      0      0
4  latency        0.60   0.07   0.35   0.11   0.13   0.15   0.23   0.13   0.28   0.26
   query space    300000 300000 300000 300000 300000 300000 300000 300000 300000 300000
   relevant       3      0      2      1      0      0      2      0      1      1
5  latency        0.38   0.09   0.39   0.14   0.17   0.15   0.14   0.16   0.15   0.13
   query space    300000 300000 300000 300000 300000 300000 300000 300000 300000 300000
   relevant       3      2      1      2      1      0      0      1      1      1
6  latency        0.36   0.15   0.10   0.12   0.18   0.17   0.15   0.13   0.20   0.15
   query space    300000 300000 300000 300000 300000 300000 300000 300000 300000 300000
   relevant       5      4      1      3      0      2      1      2      2      0
TABLE II
SEARCH LATENCIES, QUERY SPACE, AND THE NUMBER OF RELEVANT RESULTS FOR DIFFERENT PERMUTATIONS OF THE QUERY: Ash Mohammad Abbas, FOR YAHOO.
(Rows per permutation as in Table I: latency, query space, relevant results on pages p1-p10.)

Permutation       p1     p2     p3     p4     p5     p6     p7     p8     p9     p10
1  latency        0.15   0.15   0.27   0.24   0.25   0.23   0.34   0.21   0.27   0.30
   query space    26100  26400  26400  27000  27000  26900  26900  26900  25900  25900
   relevant       10     4      1      0      0      1      0      0      0      0
2  latency        0.18   0.13   0.20   0.15   0.19   0.10   0.15   0.09   0.12   0.13
   query space    26900  27000  27000  26900  25800  26900  26900  26800  26800  26800
   relevant       4      6      1      1      0      1      1      1      0      0
3  latency        0.12   0.11   0.15   0.14   0.11   0.10   0.11   0.12   0.09   0.13
   query space    26900  27100  26900  26900  26500  26800  26800  26500  26500  26700
   relevant       10     3      1      2      0      0      0      0      0      0
4  latency        0.03   0.10   0.14   0.13   0.12   0.20   0.10   0.19   0.12   0.17
   query space    27000  26400  26400  26700  27000  26700  26400  26900  26800  26800
   relevant       7      4      0      2      1      0      0      1      0      1
5  latency        0.12   0.12   0.20   0.08   0.13   0.10   0.12   0.09   0.13   0.20
   query space    26400  26800  26800  26800  26800  26700  26700  26800  26700  26200
   relevant       8      5      1      1      0      0      0      0      0      1
6  latency        0.16   0.10   0.16   0.12   0.13   0.11   0.10   0.11   0.12   0.15
   query space    27100  26700  27100  26700  26600  27000  26600  26900  26500  26500
   relevant       10     5      0      0      0      1      0      0      0      0
Fig. 6. Latency versus page number for permutation P5.
Fig. 7. Latency versus page number for permutation P6.
[Plot residue: latency versus page number (pages 1-10) for Google and Yahoo.]
A. Query Permutations

To see how a search engine behaves for different permutations of a query, we consider the following query.
… differentiate from one another. We wish to analyze search results on the basis of search time, number of relevant results, and query space. The query space is nothing but the cardinality of all results returned by a given search engine in response to a given query. Note that search time is defined as the actual time taken by the search engine to deliver the searched results. Ideally, it does not depend upon the speeds of the hardware, software, and network components from where the queries are fired, because it is the time taken by the search engine server. Relevant results are those which the user intends to search. For example, the user intends to search information about Ash Mohammad Abbas9. Therefore, all results that contain Ash Mohammad Abbas are relevant for the given query.

In what follows, we discuss the results obtained for different permutations of a given query. Let the given query be Ash Mohammad Abbas. For all permutations, all results that contain Ash Mohammad Abbas are counted as relevant results. Since both Google and Yahoo deliver results page wise, we list all the parameters mentioned in the previous paragraph page wise. We go up to 10 pages for both search engines, as the results beyond that are rarely significant.

Table I shows search latencies, query space, and the number of relevant results for different permutations of the given query. The search portal is Google. Our observations are as follows.
- For all permutations, the query space remains the same and does not vary along the pages of results.
- The time to search the first page of results in response to the given query is the largest for all permutations.

9 We have intentionally taken the query: Ash Mohammad Abbas. We wish to search for different permutations of a query and study the effect of those permutations on query space and on the number of relevant results. The relevance is partly related to the intentions of an end user. Since we already know the relevant results for the chosen query, it is easier to decide which of the returned results are relevant. The reader may take any other query, if he/she so wishes. In that case, he/she has to decide which results are relevant to his/her query, and this will partly depend upon what he/she intended to search.

Fig. 9. Latency versus correlation for random queries.
Fig. 10. Query Space versus correlation for queries with embedded semantics.
Fig. 11. Query Space versus correlation for random queries.
[Plot residue: curves for Google:Q1, Google:Q2, Yahoo:Q1, Yahoo:Q2 versus correlation.]
TABLE III
QUERIES WITH EMBEDDED SEMANTICS.
S. No. Query No. Query Correlation
E1 Q1 node disjoint multipath 1
Q2 edge disjoint multicast
E2 Q1 node disjoint multipath routing 2
Q2 edge disjoint multicast routing
E3 Q1 node disjoint multipath routing 3
Q2 edge disjoint multipath routing
E4 Q1 node disjoint multipath routing ad hoc 4
Q2 wireless node disjoint multipath routing
TABLE IV
QUERIES WITHOUT EMBEDDED SEMANTICS (RANDOM QUERIES).
S. No. Query No. Query Correlation
R1 Q1 adhoc node ergonomics 1
Q2 quadratic power node
R2 Q1 computer node constellations parity 2
Q2 hiring parity node biased
R3 Q1 wireless node parity common mitigate 3
Q2 mitigate node shallow rough parity
R4 Q1 few node parity mitigate common correlation 4
Q2 shallow mitigate node parity common stanza
TABLE V
SEARCH TIME AND QUERY SPACE FOR QUERIES WITH EMBEDDED SEMANTICS.
S. No. Query No. Google Yahoo
Time Query Space Time Query Space
E1 Q1 0.27 43100 0.37 925
Q2 0.23 63800 0.28 1920
E2 Q1 0.48 37700 0.40 794
Q2 0.32 53600 0.32 1660
E3 Q1 0.48 37700 0.40 794
Q2 0.24 21100 0.34 245
E4 Q1 0.31 23500 0.64 79
Q2 0.33 25600 0.44 518
TABLE VI
SEARCH TIME AND QUERY SPACE FOR RANDOM QUERIES.
S. No. Query No. Google Yahoo
Time Query Space Time Query Space
R1 Q1 0.44 28500 0.57 25
Q2 0.46 476000 0.28 58200
R2 Q1 0.46 34300 0.55 164
Q2 0.42 25000 0.35 90
R3 Q1 0.47 25000 0.40 233
Q2 0.33 754 0.68 31
R4 Q1 0.34 20000 0.58 71
Q2 1.02 374 0.64 23
Table II shows the same set of parameters for different permutations of the given query for the search portal Yahoo. From the table, we observe that
- As opposed to Google, the query space does not remain the same; rather, it varies with the pages of the searched results.
- The query space in this case is less than that of Google.
- The time to search the first page of results is not necessarily the largest of the pages considered. More precisely, it is larger for the pages where there is no relevant result.
- Further, the time taken by Yahoo is less than that of Google.
- In most of the cases, the first page contains the largest number of relevant results. For permutation 2 (i.e. Ash Abbas Mohammad), the second page contains the largest number of relevant results.

Let us discuss the reasons for the above mentioned observations. Consider the question why the query space in the case of Google is larger than that of Yahoo. We have pointed out that Google is based on page ranks. For a given query (or a set of words), it ranks the pages. It delivers all the ranked pages that contain the words contained in the given query. On the other hand, Yahoo is an ontology based search engine. As mentioned earlier, it will search only that part of its Webbase that constitutes the ontology of the given query. This is the reason why the query space in the case of Google is larger than that of Yahoo.

Let us answer the question why the query space changes in the case of Yahoo and why it remains constant in the case of Google. Note that an ontology may change with time and with the order of words in the given query. For every page of results, Yahoo estimates the ontology of the given permutation of the query before delivering the results to the end user. Therefore, the query space for different permutations of the given query is different, and it changes with the pages of the searched results10. However, page ranks change neither with pages nor with the order of words. The page ranks will only change when new links or documents that are relevant to the given query are added to the Web. Since neither a new link nor a new document is added to the Web during the evaluation of the permutations of the query, the query space does not change in the case of Google.

In order to compare the performance of Google and Yahoo, the latencies versus page numbers for different permutations of the query are shown in Figures 2 through 7. Let us consider the question why the search time in the case of Google is larger than that of Yahoo. Note that Google ranks the results before delivering them to end users, while Yahoo does not. The ranking of pages takes time. This is the reason why the search time taken by Google is larger than that of Yahoo.

In what follows, we discuss how a search engine behaves for correlated queries.

B. Query Correlation

We have formulated k-correlated queries as shown in Table III. Since all words contained in a query are related11, we call them queries with embedded semantics. On the other hand, we have another set of k-correlated queries, shown in Table IV. The words contained in these queries are random and are not related semantically.

We wish to evaluate the performance of a search engine for k-correlated queries. For that, we evaluate the search time and query space of a search engine for the first page of results. Since both Google and Yahoo deliver 10 results per page, looking at the first page of results means that we are evaluating the 10 topmost results of these search engines. Note that we do not consider the number of relevant results, because relevancy in this case would be query dependent. Since there is no single query, evaluation of relevancy would not be so useful.

Table V shows search time and query space for k-correlated queries with embedded semantics (see Table III). The second query, Q2, is fired after the first query, Q1. On the other hand, Table VI shows search time and query space for k-correlated queries whose words may not be related (see Table IV).

10 This observed behavior may also be due to the use of a randomized algorithm. To understand the behavior of randomized algorithms, readers are referred to any text on randomized algorithms such as [10].
11 More precisely, all words in these queries are from ad hoc wireless networks, an area in which the authors of this paper like to work.

TABLE VII
LATENCY MATRIX, L, FOR DIFFERENT PERMUTATIONS.
(1: Google wins; 0: Yahoo wins; 1/2: tie.)
P    p1   p2   p3   p4   p5   p6   p7   p8   p9   p10
1    1    1/2  1    1    0    0    1    1    1    1
2    0    0    0    0    1    0    1    0    0    0
3    0    1    0    0    0    0    0    0    0    0
4    0    1    0    1    0    1    0    1    0    0
5    0    1    0    0    0    0    0    0    0    1
6    0    0    1    1/2  0    0    0    0    0    1/2

TABLE IX
RELEVANCE MATRIX FOR DIFFERENT PERMUTATIONS FOR GOOGLE.
P    p1   p2   p3   p4   p5   p6   p7   p8   p9   p10
1    8    5    1    0    2    0    0    3    0    0
2    3    2    2    1    1    0    2    0    0    0
3    6    4    1    3    2    1    1    0    0    0
4    3    0    2    1    0    0    2    0    1    1
5    3    2    1    2    1    0    0    1    1    1
6    5    4    1    3    0    2    1    2    2    0

TABLE XI
RELEVANCE FOR DIFFERENT PERMUTATIONS.
P      Google   Yahoo
1      19       16
2      11       15
3      18       16
4      10       16
5      12       16
6      20       16
Total  90       95

TABLE XII
Earned Points (EP) FOR DIFFERENT PERMUTATIONS.
       ------------ Google ------------   ------------ Yahoo -------------
P      Latency   Query Space   EP         Latency   Query Space   EP
1      14.5      19            33.5       3         0             3
2      3         11            14         14        0             14
3      4         18            22         13        0             13
4      1         10            11         9         0             9
5      3         12            15         10        0             10
6      2.5       20            22.5       16        0             16
Total                          118                                65
In order to compare the performance of Yahoo and Google, the latencies versus correlation for queries with embedded semantics are shown in Figure 8, and those for randomized queries are shown in Figure 9. Similarly, the query space for queries with embedded semantics is shown in Figure 10, and that for randomized queries is shown in Figure 11.

The query space of Yahoo is much less than that of Google, for the reasons discussed in the previous subsection. Other important observations are as follows.
- In the case of k-correlated queries with embedded semantics, the time to search for Q2 is generally less than that for Q1. This is due to the fact that, since the queries are correlated, some of the words of Q2 have already been searched while searching for Q1.
- The query space is increased when the given query has a word that is frequently found in Web pages (e.g. in R1:Q2, the word quadratic, which is frequently used in Engineering, Science, Maths, Arts, etc.). The query space is decreased when the query includes a rarely used word (e.g. mitigate in R3,R4:Q1,Q2 and shallow in R3,R4:Q2).
- The search time is larger in the case of randomized queries as compared to queries with embedded semantics. The reason for this observation is as follows. In the case of queries with embedded semantics, the words of a given query are related and are found in Web pages that are not too far from one another, either from the point of view of page rank as in Google or from the point of view of ontology as in Yahoo.
- One cannot infer anything general about the relative search time of Google and Yahoo, as it depends upon the query. More precisely, it depends upon which strategy takes more time: page ranking in Google or estimation of the ontology in Yahoo.

However, from Table V and Table VI, one can infer the following. Google is better in the sense that its query space is much larger than that of Yahoo. However, Yahoo takes less time as compared to Google for different permutations of the same query. For k-correlated queries with embedded semantics, Google takes less time to search for the first query as compared to Yahoo. This also applies to randomized queries, with some exceptions. In exceptional cases, Google takes much more time as compared to Yahoo. As mentioned previously, this depends upon the given query as well as the strategy employed in the search engine.

In what follows, we describe a unified criteria for comparing the search engines considered in this paper.

V. A UNIFIED CRITERIA FOR COMPARISON

Let us denote Google by a superscript '1' and Yahoo by a superscript '0'12. Let L = [l_ij] be a matrix where l_ij is defined as follows:

l_ij = 1 if latency1_ij < latency0_ij; 1/2 if latency1_ij = latency0_ij; 0 otherwise.   (4)

Similarly, let S = [s_ij] be a matrix where s_ij is defined as follows:

s_ij = 1 if space1_ij > space0_ij; 1/2 if space1_ij = space0_ij; 0 otherwise.   (5)

In the matrices defined above, a '1' means that at that place Google is the winner, and a '1/2' represents a tie between Google and Yahoo. We now define a parameter that we call Earned Points (EP), as follows:

EP^k = Σ_{i=1}^{pages} relevant_i^k (L_i^k + S_i^k)   (6)

where the superscript k ∈ {0, 1} denotes the search engine.

Table VII shows the latency matrix, L, for different permutations of the query considered in Table I and Table II, and has been constructed using both of them. In the latency matrix, there are 40 '0's, 17 '1's, and 3 '1/2's. We observe from the latency matrix that Yahoo is the winner (as far as latencies are concerned), as there are 40 '0's out of 60 entries in total.

On the other hand, Table VIII shows the query space matrix, S, for different permutations of the same query, and is constructed using the tables mentioned in the preceding paragraph. One can see that, as far as query space is concerned, Google is the sole winner. In fact, the query space of Google is much larger than that of Yahoo.

The relevance matrix for Google is shown in Table IX and that for Yahoo in Table X. The total relevance for the first ten pages is shown in Table XI for both Google as well as Yahoo. It is seen from Table XI that the total relevance for Google is 90 and that for Yahoo is 95. Average relevance per …

12 This is simply a representation. One may consider the reverse representation; even then, there will be no effect on the criteria.

TABLE XIII
CEP FOR DIFFERENT PERMUTATIONS FOR GOOGLE.
P      Latency Contribution   Query Space Contribution
1      123.834                5700000
2      61.262                 3300000
3      116.534                5400000
4      35.918                 3000000
5      73.458                 3600000
6      119.370                6000000
Total  530.376                27000000

TABLE XIV
CEP FOR DIFFERENT PERMUTATIONS FOR YAHOO.
P      Latency Contribution   Query Space Contribution
1      101.385                419900
2      107.821                404100
3      131.558                431000
4      308.197                428700
5      130.833                425000
6      121.591                431500
Total  901.385                2540200
UbiCC Journal - Volume 3 66
permutation and per page for Google is 1.5 and that for Yahoo is 1.583. Therefore, as far as average relevance is concerned, Yahoo is the winner.

Table XII shows the number of earned points for both Google as well as Yahoo for different permutations of the query mentioned earlier. We observe that the number of earned points for Google is 118 and that for Yahoo is 65. The number of earned points of Google is far greater than that of Yahoo. The reason behind this is that the query space of Yahoo is always less than that of Google, and so it does not contribute to Yahoo's number of earned points.

A closer look at the definition of EP reveals that, while defining the parameter EP in (6) together with (4) and (5), we have assumed that a search engine either has a constituent parameter (latency or query space) or it does not have that parameter at all. The contribution of some of the parameters is lost due to the fact that the effective contribution of the other parameter, by which the given parameter is multiplied, is zero. Note that our goal behind the introduction of (6) was to rank the given set of search engines. We call this type of ranking of search engines Lossy Constituent Ranking (LCR). We, therefore, feel that there should be a method of comparison between a given set of search engines that is lossless in nature. For that purpose, we define another parameter that we call Contributed Earned Points (CEP). The definition of CEP is as follows.

    CEP^k = Σ_{i=1}^{pages} relevant^k_i (1/d^k_i + q^k_i)   (7)

where the superscript k ∈ {0, 1} denotes the search engine, d denotes the actual latency, and q denotes the actual query space. The reason behind having an inverse of the actual latency in (7) is that the better search engine is the one that takes less time.

Table XIII shows the contributions of latency and query space in CEP for Google. Similarly, Table XIV shows the same for Yahoo. We observe that the contribution of latency for Google is 530.376 and that for Yahoo is 901.385. However, the contribution of query space for Google is 27000000 and that for Yahoo is 2540200. In other words, the contribution of query space for Google is approximately 11 times that for Yahoo. Adding these contributions shall result in a larger CEP for Google as compared to Yahoo. The CEP defined using (7) has a problem that we call the dominating constituent problem (DCP): the larger parameter suppresses the smaller parameter. Note that the definition of CEP in (7) assumes equal weights for latency and query space. On the other hand, one may be interested in assigning different weights to the constituents of CEP depending upon their importance. Let us rewrite (7) to incorporate weights. Let w_l and w_q be the weights assigned to latency and query space, respectively. Then (7) can be written as follows.

    CEP^k = Σ_{i=1}^{pages} relevant^k_i (w_l/d^k_i + w_q q^k_i)   (8)

The weights should be chosen carefully. For example, the weights w_l = 1, w_q = 10^-6 will add 27 to the contribution in CEP due to query space for Google and 2.54 for Yahoo. On the other hand, the set of weights w_l = 1, w_q = 10^-5 shall add 270 for Google and 25.4 for Yahoo. Table XV shows the contribution of query space in CEP for different sets of weights. It is to be noted that w_l is fixed to 1 for all sets, and only w_q is varied. As w_q is increased beyond 10^-5, the contribution of query space starts dominating over the contribution of latency. The set of weights w_l = 1, w_q = 10^-5 indicates that one can ignore the contribution of query space in comparison to the contribution of latencies, provided that one is more interested in comparing search engines with respect to latency. In that case, an approximate expression for CEP can be written as follows.

    CEP^k ≈ Σ_{i=1}^{pages} relevant^k_i (1/d^k_i)   (9)

TABLE XV
CONTRIBUTION DUE TO QUERY SPACE IN CEP FOR DIFFERENT SETS OF WEIGHTS.

    Weights                  Google         Yahoo
    w_l = 1, w_q = 10^-6     27.00          2.54
    w_l = 1, w_q = 10^-5     270.00         25.40
    w_l = 1, w_q = 10^-4     2700.00        254.02
    w_l = 1, w_q = 10^-3     27000.00       2540.20
    w_l = 1, w_q = 10^-2     270000.00      25402.00
    w_l = 1, w_q = 10^-1     2700000.00     254020.00
    w_l = 1, w_q = 1         27000000       2540200

Alternatively, one may consider an approach that is a combination of the definition of EP in (6) (together with (4) and (5)) and that of CEP in (7). In that approach we may use the definition of the matrix S, which converts the contribution of query space into binary form. (By this we mean that the matrix S says either there is a contribution of the query space of a search engine, provided that its query space is larger than that of the other one, or there is no contribution of query space at all.) The modified definition is as follows.

    CCEP^k = Σ_{i=1}^{pages} relevant^k_i (1/d^k_i + S^k_i)   (10)

where S^k_i is in accordance with the definition of S given by (5). The acronym CCEP stands for Combined Contributory Earned Points. If one wishes to incorporate weights, then the definition of CCEP becomes as follows.

    CCEP^k = Σ_{i=1}^{pages} relevant^k_i (w_l/d^k_i + w_q S^k_i)   (11)

TABLE XVI
CCEP FOR DIFFERENT SETS OF COMPARABLE WEIGHTS.

    Weights                  Google         Yahoo
    w_l = 0.9, w_q = 0.1     486.3384       811.2465
    w_l = 0.8, w_q = 0.2     442.3008       721.1080
    w_l = 0.7, w_q = 0.3     398.2632       630.9695
    w_l = 0.6, w_q = 0.4     354.2256       540.8310
    w_l = 0.5, w_q = 0.5     310.1880       450.6925
    w_l = 0.4, w_q = 0.6     266.1504       360.5540
    w_l = 0.3, w_q = 0.7     222.1128       270.4155
    w_l = 0.2, w_q = 0.8     178.0752       180.2770
    w_l = 0.1, w_q = 0.9     134.0376       90.1385
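As a concrete illustration, the weighted definitions in (8) and (11) can be sketched in Python. The per-page numbers used in the sketch are made up for illustration; only the aggregate check at the end uses the totals reported above for Google (latency contribution 530.376 and query-space matrix contribution 90).

```python
def cep(relevant, d, q, wl=1.0, wq=1.0):
    """Weighted Contributed Earned Points, eq. (8).
    relevant: per-page relevance counts; d: actual latencies;
    q: actual query-space values."""
    return sum(r * (wl / di + wq * qi) for r, di, qi in zip(relevant, d, q))

def s_entry(space_google, space_yahoo):
    """Binary query-space entry of the matrix S, eq. (5)."""
    if space_google > space_yahoo:
        return 1.0
    if space_google == space_yahoo:
        return 0.5
    return 0.0

def ccep(relevant, d, s, wl=0.5, wq=0.5):
    """Weighted Combined Contributory Earned Points, eq. (11)."""
    return sum(r * (wl / di + wq * si) for r, di, si in zip(relevant, d, s))

# Aggregate check with the totals reported for Google: a latency
# contribution of 530.376 and a query-space (S) contribution of 90
# give, for wl = wq = 0.5, the Table XVI value 310.188.
google_ccep = 0.5 * 530.376 + 0.5 * 90
```

Note how the all-zero S column for Yahoo makes its query-space term vanish in `ccep`, which is exactly the behavior discussed for Table XVI.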
In the definition of CCEP given by (11) the weights can be comparable, and the dominant constituent problem mentioned earlier can be mitigated for comparable weights. We define comparable weights as follows.

Definition 3: A set of weights W = {w_i}, w_i >= 0, is said to have comparable weights if and only if Σ_i w_i = 1 and the condition 1/9 <= w_i/w_j <= 9 is satisfied for all w_i, w_j ∈ W.

Table XVI shows the values of CCEP for different sets of comparable weights. We observe that the rate of decrease of CCEP for Yahoo is larger than that of Google. For example, for w_l = 0.9, w_q = 0.1, the CCEP for Google is 486.3384 and that for Yahoo is 811.2465. For w_l = 0.8, w_q = 0.2, the CCEP for Google is 442.3008 and that for Yahoo is 721.1080. In other words, the rate of decrease in CCEP for Google is 9.05% and that for Yahoo is 11.11%. The reason is that in the query space matrix, S (see Table VIII), all entries are '1'. It means that the query space of Google is always larger than that of Yahoo. Therefore, in case of Yahoo, the contribution due to query space is always 0, irrespective of the weight assigned to it. However, in case of Google the contribution due to query space is nonzero and increases with an increase in the weight assigned to it. Moreover, for the set of weights W = {w_l = 0.5, w_q = 0.5}, the values of CCEP are 310.1880 and 450.6925 for Google and Yahoo, respectively. It means that if one wishes to assign equal weights to latency and query space, then Yahoo is the winner in terms of the parameter CCEP.

In case of CCEP, the effect of the dominating constituent problem is less as compared to that in case of CEP. In other words, the effect of large values of query space is fairly smaller in case of CCEP as compared to that in case of CEP. This is with reference to our remark that with the use of CCEP the dominating constituent problem is mitigated.

VI. CONCLUSIONS

In this paper, we analyzed the impact of correlation among queries on search results for two representative search portals, namely Google and Yahoo. The major accomplishments of the paper are as follows:

– We analyzed the search time, the query space and the number of relevant results per page for different permutations of the same query. We observed that these parameters vary with pages of searched results and are different for different permutations of the given query.
– We analyzed the impact of k-correlation among two subsequent queries given to a search engine. In that we analyzed the search time and the query space. We observed that:
  – The search time is less in case of queries with embedded semantics as compared to randomized queries without any semantic consideration.
  – In case of randomized queries, the query space is increased in case the given query includes a word that is frequently found on the Web, and vice versa.

Further, we considered a unified criteria for comparison between the search engines. Our criteria is based upon the concept of earned points. An end-user may assign different weights to different constituents of the criteria, namely latency and query space. Our observations are as follows.

– We observed that the performance of Yahoo is better in terms of the latencies; however, Google performs better in terms of query space.
– We discussed the dominant constituent problem. We discussed that this problem can be mitigated using the concept of contributory earned points if the weights assigned to constituents are comparable. If both the constituents are assigned equal weights, we found that Yahoo is the winner.

However, the performance of a search engine may depend upon the criteria itself, and only one criterion may not be sufficient for an exact analysis of the performance. Further investigations and improvements in this direction form our future work.
ARTIFICIAL IMMUNE SYSTEMS FOR ILLNESSES DIAGNOSTIC
ABSTRACT
Lately, many new illnesses are frequently observed in our societies that could be avoided by regular visits to the doctor. Cancer is one of these illnesses, which patients often discover only when it is too late. In this work we propose an artificial Cancer diagnostic system that can classify patients as affected by Cancer or not; for this goal we have developed an artificial immune system for Cancer diagnostic. The artificial immune system is one of the newest approaches used in several domains such as pattern recognition, robotics, intrusion detection and illness diagnostics. Several methods have been proposed, such as negative selection, clonal selection and the artificial immune network (AINet). In this paper we present the natural immune system, develop four versions of the Artificial Immune Recognition System (AIRS), and then present results for Cancer diagnostic with some critiques of and remarks on these methods.
The changes made to the AIRS algorithm are small, but they offer simplicity of implementation, data reduction and reduced processing time. The AIRS2 training steps are the same as those of AIRS, with some changes, presented as follows:
1- It is not necessary to initialize the ARB set.
2- It is not necessary to mutate the ARBs' class feature, because in AIRS2 we are interested only in cells of the same class as the antigen.
3- Resources are only allocated to ARBs of the same class as the antigen, and are allocated in proportion to the ARB's stimulation level in reaction to the antigen.
4- The training stopping criterion no longer takes into account the stimulation value of ARBs in all classes, but only accounts for the stimulation value of the ARBs of the same class as the antigen.

6 AIRS AND AIRS2 ALGORITHMS USING MERGING FACTOR

In this section we present another modification of AIRS and AIRS2. This modification concerns the last training step (memory cell introduction), mainly the cell introduction criterion; the condition was as follows:

    CandStim ← Stimulation(ag, mc_candidate)
    MatchStim ← Stimulation(ag, mc_match)
    CellAff ← affinity(mc_candidate, mc_match)          (7)
    if (CandStim > MatchStim)
        if (CellAff < AT × ATS)
            MC ← MC − mc_match
        MC ← MC ∪ mc_candidate

To determine the relative performance of the AIRS algorithms, it was necessary to test them on databases; we have chosen three Cancer databases from the hospital academic center of Wisconsin: Breast Cancer Wisconsin (BCW), Wisconsin Prognostic Breast Cancer (WPBC) and Wisconsin Diagnostic Breast Cancer (WDBC). The descriptions of these databases are as follows:
- Breast Cancer Wisconsin (BCW): This database was obtained from the hospital academic center of Wisconsin in 1991. It describes the cancerous symptoms and classifies them into two classes, 'Malignant' or 'Benign'. The distribution of patients is as follows: (Malignant, 214) (Benign, 458).
- Wisconsin Prognostic Breast Cancer (WPBC): This database was conceived by the same hospital academic center in 1995, but it gives more details than BCW, giving nucleus-of-cell observations. Based on these characteristics, patients are classified into two classes, 'Recur' and 'NonRecur', with the following distribution: (Recur, 47) (NonRecur, 151).
- Wisconsin Diagnostic Breast Cancer (WDBC): This database was also conceived by the same hospital academic center in 1995. It has the same attributes as WPBC, but it classifies its patients into two classes, 'Malignant' and 'Benign', with the following distribution: (Malignant, 212) (Benign, 357).

All training data are antigens, represented by characteristic vectors; antibodies have the same characteristic vector size as antigens. The ARB is represented as a structure holding the antibody characteristic vector, its stimulation with the antigen, and the resources it is allowed.
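The memory-cell introduction criterion in (7) can be sketched as follows; the one-dimensional, distance-based stimulation and affinity functions below are toy stand-ins for the real vector-based ones used by AIRS.

```python
def introduce_memory_cell(mc_set, mc_match, mc_candidate, ag,
                          stimulation, affinity, AT, ATS):
    """Memory-cell introduction step, following condition (7):
    the candidate enters the memory set when it is more stimulated
    by the antigen, and replaces the matching memory cell when the
    two cells are sufficiently close (affinity below AT * ATS)."""
    cand_stim = stimulation(ag, mc_candidate)
    match_stim = stimulation(ag, mc_match)
    cell_aff = affinity(mc_candidate, mc_match)
    if cand_stim > match_stim:
        if cell_aff < AT * ATS:
            mc_set.discard(mc_match)      # MC <- MC - mc_match
        mc_set.add(mc_candidate)          # MC <- MC U {mc_candidate}
    return mc_set

# Toy 1-D stand-ins for illustration only.
stim = lambda ag, cell: 1.0 - abs(ag - cell)
aff = lambda a, b: abs(a - b)

mc = introduce_memory_cell({0.9}, 0.9, 0.6, ag=0.5,
                           stimulation=stim, affinity=aff,
                           AT=0.8, ATS=0.5)
# The candidate 0.6 is closer to the antigen 0.5, and |0.6 - 0.9| = 0.3
# is below AT * ATS = 0.4, so it replaces the old memory cell.
```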
ABSTRACT
Runtime hot method detection, an important dynamic compiler optimization parameter, has challenged researchers to explore and refine techniques that address the expensive profiling overhead incurred in the process. Although the recent trend has been toward the application of machine learning heuristics in compiler optimization, their role in the identification and prediction of hot methods has been ignored. The aim of this work is to develop a model, using the machine learning algorithm the Support Vector Machine (SVM), to identify and predict hot methods in a given program, to which the best set of optimizations could be applied. When trained with ten static program features, the derived model predicts hot methods with an appreciable 62.57% accuracy.
authors have used the PCA to reduce the feature set. A linear regression model and an artificial neural network model are used for building the prediction model, which is shown to work better than the non-feature-based predictors. In their work, Fursin et al. [14] have used

Figure 3: Optimal hyperplane and margin of separation

4 HOT METHOD PREDICTION

This section briefly describes how machine learning is used in hot method identification and prediction. UTDSP is a C benchmark and SPEC CPU2000 INT has C and C++ benchmarks. Evaluation of the system is based on only the C programs.

5.3 Tools and platform

The system is implemented in the Low Level Virtual Machine (LLVM) version 1.6 [6]. LLVM is an open source compiler infrastructure that supports compile time, link time, run time and idle time optimizations. The results are evaluated on an Intel(R) Pentium(R) D with 2.80 GHz and 480 MB of RAM running Fedora Core 4. This system uses the libSVM tool [7], a simple library for Support Vector Machines written in C.

Figure 7: Total prediction accuracy on the SPEC CPU2000 INT benchmark (hot method thresholds 50%, 40%, 30%)

The total method prediction accuracy on the SPEC CPU2000 INT and UTDSP benchmark suites is shown in Fig. 7 and 9. The total method prediction accuracy for all C programs on the SPEC CPU2000 INT varies from 36% to 100%, with averages of 68.43%, 71.14% and 71.14% for the three hot method thresholds respectively. This averages to 70.24%. The average prediction accuracies obtained on the UTDSP benchmark suite are 69%, 71% and 58% respectively for the 50%, 40% and 30% hot method thresholds. This averages to 66%. Overall the system predicts both hot and cold methods in a program with 68.15% accuracy.
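The per-suite averages quoted above can be reproduced directly from the three per-threshold figures:

```python
# Per-threshold average prediction accuracies reported above,
# for the 50%, 40% and 30% hot method thresholds respectively.
spec_int = [68.43, 71.14, 71.14]   # SPEC CPU2000 INT
utdsp = [69.0, 71.0, 58.0]         # UTDSP

spec_mean = round(sum(spec_int) / len(spec_int), 2)   # 70.24
utdsp_mean = sum(utdsp) / len(utdsp)                  # 66.0
```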
Optimizers depend on profile information to identify the hot methods of program segments.
Figure 9: Total prediction accuracy on the UTDSP benchmark
ABSTRACT
Keywords: Handoff, Path loss, Received signal strength, ping pong, cellular mobile.
    Distance (km)    COST 231 (dB)    Hata-Okumura (dB)    ECC33 (dB)
    0.5              104.7            105.1                130.7

Fig. 2 Comparison of Path loss
Fig. 3 Received signal strength for suburban models
Fig. 4 Received signal strength for highway models

The received signal strengths for the COST 231, Hata-Okumura and ECC 33 models are calculated using eq (7), shown in Table 3 for the suburban area, and the comparison is shown in Fig. 3. The received signal strength using the ECC 33 model is -103.3 dBm at four kilometers, which is below the threshold level of the mobile receiver (-102 dBm). The received signal strength values using the COST 231 and Hata-Okumura models are above the sensitivity threshold of the mobile, so these two models are preferred for maximum coverage area and a reduced number of handoffs.

4.3 Highway propagation model

The general area around a highway can be treated as suburban because of its location. Path loss calculation is a major concern on highways, with and without a noise level. In this paper the suburban model is modified for highways with a small correction factor with respect to the location. The RSS value for a highway without the noise factor was calculated using the suburban model, while the highway with the noise factor was calculated using an additional correction factor of 5.4 with the suburban model. Here, up to 2.5 km the received signal strength was calculated using the suburban model, and beyond 2.5 km it was calculated with the additional noise factor of 5.4.
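The highway RSS rule described above can be sketched as follows. The suburban model is passed in as a function; the linear model used in the usage line is a placeholder, not one of the models above, and the assumption that the 5.4 dB correction lowers the RSS is ours.

```python
RX_THRESHOLD_DBM = -102.0   # mobile receiver threshold quoted above

def highway_rss(distance_km, suburban_rss, correction_db=5.4, cutoff_km=2.5):
    """Highway received signal strength: the suburban model is used
    up to 2.5 km; beyond that, the additional noise correction factor
    of 5.4 dB is applied (assumed here to lower the RSS)."""
    rss = suburban_rss(distance_km)
    if distance_km > cutoff_km:
        rss -= correction_db
    return rss

def needs_handoff(rss_dbm):
    """A signal below the receiver threshold triggers a handoff."""
    return rss_dbm < RX_THRESHOLD_DBM

# Placeholder suburban model for illustration: -80 dBm minus 5 dB/km.
model = lambda d: -80.0 - 5.0 * d
```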
G.A. Sathishkumar (1) and Dr. K. Boopathy Bagan (2)
(1) Assistant Professor, Department of Electronics and Communication Engineering, Sri Venkateswara College of Engineering, Sriperumbudur - 602108.
(2) Professor, Department of Electronics, Madras Institute of Technology, Anna University Chrompet Campus, Chennai - 600044, Tamil Nadu, INDIA.
[email protected], [email protected]
ABSTRACT
This paper presents a hardware implementation of efficient algorithms that use a mathematical framework based on Winograd's Fourier Transform Algorithm, obtaining a set of formulations that simplify cyclic convolution (CC) computations and the CRT. In particular, this work focuses on the arithmetic complexity of a multiplication, where the product represents a CC computational operation. The proposed algorithms are compared against existing algorithms developed using the FFT, and it is shown that the proposed algorithms exhibit an advantage in computational efficiency. This design is most useful when dealing with large integers, and is required by many modern cryptographic systems.
The Winograd Fourier Transform Algorithm (WFTA) is a technique that combines Rader's index mapping and Winograd's short convolution modules for prime factors into a composite-N Fourier Transform structure with fewer multipliers (O(N)). While theoretically interesting, WFTAs are very complicated and different for every length. The algorithm can be implemented on modern processors with few hardware multipliers and hence is very useful in practice today.
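For reference, the cyclic convolution at the heart of these short convolution modules is, directly from its definition:

```python
def cyclic_convolution(x, h):
    """Direct O(N^2) cyclic convolution:
    y[i] = sum_j x[j] * h[(i - j) mod N].
    Fast algorithms such as the WFTA compute the same result
    with fewer multiplications."""
    n = len(x)
    return [sum(x[j] * h[(i - j) % n] for j in range(n)) for i in range(n)]
```

A quick sanity check: the outputs of a cyclic convolution always sum to (Σx)(Σh), here 6 × 15 = 90.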
INTRODUCTION
Many popular crypto-systems like the RSA encryption scheme [12], the Diffie-Hellman (DH) key agreement scheme [13], or the Digital Signature Algorithm (DSA) [14] are based on long integer modular exponentiation. A major difference between the RSA scheme and cryptosystems based on the discrete logarithm problem is the fact that the modulus used in the RSA encryption scheme is the product of two prime numbers. This allows utilizing the Chinese Remainder Theorem (CRT) in order to speed up the private key operations. From a mathematical point of view, the usage of the CRT for RSA decryption is well known. However, for a hardware implementation, a special multiplier architecture is necessary to meet the requirements for efficient CRT-based decryption. This paper presents the basic algorithmic and architectural concepts of the WFTA crypto chip, and describes how they were combined to provide optimum performance. The major design goal with the WFTA was the maximization of performance on several levels, including the implemented hardware algorithms.

In digital signal processing, the design of fast and computationally efficient algorithms has been a major focus of research activity. The objective, in most cases, is the design of algorithms and their respective implementation in a manner that performs the required computations in the least amount of time. In order to achieve this goal, parallel processing has also received a lot of attention in the research community [1]. This document is organized as follows. First, the mathematical foundations needed for the study of algorithms to compute the DFT and the FFT algorithm are summarized. Second, an identification is established between the Winograd Fourier Transform and Rader's algorithm. Third, the algorithm development for the basic problem of multiplication, using the conceptual framework developed in the two previous sections, is explained.
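The CRT speed-up for RSA private-key operations mentioned above replaces one full-size modular exponentiation with two half-size ones. A minimal sketch with textbook-sized numbers follows; the 61/53/17 parameter set is a standard toy example, not taken from this paper.

```python
def rsa_decrypt_crt(c, p, q, d):
    """RSA decryption using the Chinese Remainder Theorem
    (Garner's recombination): two exponentiations with
    half-size moduli instead of one modulo n = p * q."""
    dp, dq = d % (p - 1), d % (q - 1)
    m1 = pow(c, dp, p)             # c^dp mod p  (half-size)
    m2 = pow(c, dq, q)             # c^dq mod q  (half-size)
    q_inv = pow(q, -1, p)          # q^{-1} mod p (Python 3.8+)
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q

# Toy parameters: p = 61, q = 53, n = 3233, e = 17, d = 2753.
c = pow(65, 17, 3233)              # encrypt the message m = 65
```

Decrypting `c` with `rsa_decrypt_crt(c, 61, 53, 2753)` gives the same result as the direct `pow(c, 2753, 3233)`, namely 65.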
3. Compute ... for i = 0, 1, 2, ..., k, and ... for i = 0, 1, ..., k.

4. REALIZATION OF WFTA IN VERILOG HDL
Figure 7: Comparisons of normal DFT and WDFT

4.2 CONCLUSION AND FUTURE ENHANCEMENTS

5. REFERENCES
[5] M. D. Macleod and N. L. Bragg, "A fast hardware implementation of the Winograd Fourier transform algorithm," Electron. Lett., vol. 19, pp. 363-365, May 1983.
ABSTRACT
In the global world there is a huge network formed by different companies with their suppliers, warehouses, distribution centers and retailers; with the help of these entities an organization acquires raw material, transforms it, and delivers finished goods. The major challenges for an industrial organization are to reduce product development time, improve quality, and reduce the cost of production. This is possible only when the relationships among the various organizations/industrial houses are good, and it cannot be achieved by changing industrial processes/methods alone: the latest electronic tools, controlled by computer software, need to be established in this competitive world. Software agents take on one or many responsibilities of the supply chain, and each agent can interact with the others without human intervention in the planning and execution of its responsibilities. This paper presents a solution for the construction, architecture, coordination and design of agents. It integrates bilateral negotiation, an order monitoring system, and a production planning and scheduling multiagent system.
Keywords: Agent, Supply chain, Multiagent, Multiagent System Architecture for supply chain management
1. INTRODUCTION

To improve the efficiency of supply chain management it is mandatory to take intelligent strategic, tactical and operational decisions at each end of the chain. Under strategic decisions, the agent decides about suppliers, warehouses, production units, the transportation system, etc. Tactical decisions are taken to meet actual demand. The agent at the operational level is responsible for executing the whole plan. For all of this to run smoothly, coordination among the agents is a must: if material does not arrive on time the production will stop, and if finished goods are ready but the warehouses are not empty, great confusion is created. The ability to manage all levels of the supply chain system [1], coordination, and accurate and timely dissemination of information is the enterprise goal.

The different activities of the supply chain can be distributed among agents. A typical example of a multiagent system is that of a coffee maker and a toast maker. Suppose a person wants the toast ready as soon as the coffee is ready; this means coordination between the coffee maker and the toast maker is essential. Otherwise many situations may arise: the coffee is ready but the toast is not prepared and comes only some time later, or the toast is ready and the coffee is not prepared.

1. Agents are problem solvers.
2. Agents are pro-active.
3. Agents are goal-oriented.
4. Agents are context-aware.
5. Agents are autonomous.
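The coffee maker / toast maker coordination above can be sketched with two threads and an event; the agent names and timings are our own illustration.

```python
import threading

# Toy coordination between a coffee-maker agent and a toast-maker
# agent: the toast maker waits until the coffee is ready, mirroring
# the example in the text.
coffee_ready = threading.Event()
log = []

def coffee_maker():
    log.append("brewing coffee")
    coffee_ready.set()          # announce: the coffee is ready

def toast_maker():
    coffee_ready.wait()         # start toasting only once coffee is ready
    log.append("toasting bread")

t1 = threading.Thread(target=toast_maker)
t2 = threading.Thread(target=coffee_maker)
t1.start(); t2.start()
t1.join(); t2.join()
# log == ["brewing coffee", "toasting bread"] regardless of thread scheduling
```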
2 Agent

In software we can define an agent as an entity which consists of various attributes defining a particular domain. For example, an agent dealing with warehousing holds its local attributes as well as the details that will be coordinated with other entities (agents). Agents thus emulate the mental process or simulate rational behavior. A multi-agent system is a loosely coupled network of problem-solver entities that work together to find answers to problems that are beyond the individual capabilities or knowledge of each entity. The first issue is how

2.1 Requirement / Logistics agent

These agents coordinate all activities of the plant and find the various demands of the various sections. They hold the data of day-to-day production and find how much material has been consumed in a day, depending on the working hours of a machine. They categorize each component in different tables and coordinate with other agents, like the Demand agent. The intelligent part of the agent is to find the efficiency of a machine, minimize cost, increase throughput, etc. It can also hold feedback on the finished goods and suggest appropriate changes if required.
This agent coordinates with other agents, like the requirement/logistics agent. The main objective of this agent is to fulfill the requirements of the various sections of the company/customer. The intelligent part of this agent is to acquire orders from various vendors and compare them on the basis of quality, price, availability, etc. In case any demand increases or decreases, the vendor is communicated with automatically.

2.3 Transport agent

This agent is responsible for the availability of transport and the dispatching of finished goods to particular destinations. It manages all the transportation routes.

2.4 Financial agent

...and generating schedules that are sent to the dispatching agent for execution. It assigns resources, new orders and start times to activities that are feasible while at the same time optimizing certain criteria such as minimizing work in progress or tardiness. It can generate a schedule from scratch or repair an existing schedule that has violated some constraints. In anticipation of domain uncertainties like machine breakdowns or material unavailability, the agent may reduce the precision of a schedule by increasing the degrees of freedom in the schedule for the dispatcher to work with. For example, it may "temporally pad" a schedule by increasing an activity's duration, or "resource pad" an operation by either providing a choice of more than one resource or increasing the capacity required so that more is available.
[Figure: Multiagent system architecture, showing Retailer, Warehouse, Plant and Supplier agents with data mining, logistics, purchase, resource management, operation scheduling and raw material modules, built by the application developer over a software agent system exchanging XML data.]

Agents that exploit encoded knowledge create services for a higher level of applications.

• A common format for the content of communication

Behavior depends on the generic templates and the workflow, i.e. receiving and sending messages. The agent executes the stored application and gives the necessary derived decision using an inference engine. There are four types of workflow terminals.

6.1 Add-on terminals

For the addition of predefined functions.

6.2 Execute terminals

A new agent can be created from an existing one, which serves as a template for creating agent instances during the design of a multiagent system architecture.

6.6 Data base for agent

This unit acts as a storage facility to ensure inter-functionality between all system components. In this system the database stores ontologies, behaviors, types of agents and the historical data to be mined. This unit can be designed using RMI.

6.7 Agent training system (ATS)

This system gathers information from the data mining procedure, then takes the decision and sends this decision into the newly created agent.

6.8 Data Mining System

This system holds the implementation of the data mining algorithms executed by the data mining procedures, which gives a new decision model that is again enabled into the agent via the ATS. It is also responsible for embedding specific knowledge into agents. This data mining module receives the information from the XML document and executes the suitable data mining functions designed by the application developer. These models, represented in the Predictive Model Markup Language [8], a data mining standard defined by the DMG (Data Mining Group) [9], provide the agent platform with versatility and compatibility with other tools. Major data mining software packages include Oracle, SAS, SPSS and MineIT.

1. Zhou, L.; Xu, X.; Huang, T. and Deng, S.: Enterprise Interoperability: New Challenges and Approaches.
2. Nwana, H. S.: Software Agents: An Overview, The Knowledge Engineering Review, October/November 1996, Vol. 11, No. 3, pp. 205-244.
3. Data Mining Group: Predictive Model Markup Language Specifications (PMML), ver. 2.0, available at: http://www.dmg.org
4. Bradshaw, J. M.; Dutfield, S.; Benoit, P. and Woolley, J.D.: KAoS: Toward An Industrial-Strength Open Agent Architecture, in Software Agents, Bradshaw, J.M. (Ed.), Menlo Park, Calif., AAAI Press, 1997, pp. 375-418.
5. Russell, S. J. and Norvig, P.: Artificial Intelligence: A Modern Approach, Prentice Hall, Englewood Cliffs, N.J., 1995.
Sikander Singh
Research Scholar, Department of Computer Science & Engineering, Punjab Engineering College (PEC),
Deemed University, Sector-12, Chandigarh-160012 (India)
[email protected],
ABSTRACT
A new routing protocol called Scalable Energy-Efficient Ad-hoc On-Demand
Distance Vector (SEE-AODV), which has good scalability properties and is more
energy efficient than the existing Ad-hoc On-Demand Distance Vector (AODV)
routing protocol in wireless mesh networks, is proposed in this paper. Wireless
mesh networks (WMNs) consist of mesh routers and mesh clients, where mesh
routers have minimal mobility and form the backbone of WMNs. They provide
network access for both mesh and conventional clients. Two techniques, called
Clustering and Blocking-Expanding Ring Search, have been applied to the existing
AODV routing protocol to improve its scalability and energy efficiency. Results
show that the performance of the SEE-AODV routing protocol is better than that of
the existing AODV routing protocol in wireless mesh networks. To show the
efficiency of the proposed routing protocol, simulations have been done using
Network Simulator 2 (NS-2).
ERS curve after ring 3. As is clear from Fig. 2, the difference in the amount of energy consumed by these two mechanisms becomes larger as the distance between the source and the route node increases. The total energy consumption of Blocking-ERS is given by Eq. (3):

E_{Blocking-ERS} = 2\left(1 + \sum_{i=1}^{H_r - 1} n_i\right) + E_{RREP} \quad (UnitEnergy) \quad (3)

Similarly, the total energy consumption of the conventional TTL-sequence-based ERS is given by Eq. (4):

E_{TTL-ERS} = H_r + \sum_{i=1}^{H_r - 1} \sum_{j=1}^{i} n_j + E_{RREP} \quad (UnitEnergy) \quad (4)

Subtracting Eq. (3) from Eq. (4), the E_{RREP} terms cancel and the energy saved by Blocking-ERS is given by Eq. (5):

E_{Saved} = H_r - 2 + \sum_{i=1}^{H_r - 1} \left(\left(\sum_{j=1}^{i} n_j\right) - 2 n_i\right) \quad (UnitEnergy) \quad (5)