Cryptography Notes PDF
Cryptography Notes PDF
Cryptography Notes PDF
UNIT – I
PART-A
Every element of the set has an inverse element. If we take any element of
the set p, there is another element q such that p op q = q op p = e.
The operation is associative. For any three elements of the set, (a op b) op c
always equals a op (b op c).
Rings:
A ring is a set of elements with two operations, one of which is like
addition, the other of which is like multiplication, which we will call add and mul. It
has the following properties:
The elements of the ring, together with the addition operation, form a group.
Addition is commutative. That is, for any two elements of the set p and q, p
add q = q add p. (The word Abelian is also used for "commutative", in honor
of the mathematician Niels Henrik Abel.)
The multiplication operation is associative.
Multiplication distributes over addition: that is, for any three elements of the
group a, b, and c, a mul ( b add c ) = (a mul b) add (a mul c).
18. Convert the given text “Anna University” into cipher text using Rail-fence.
Plaintext : anna university
Rail-fence :
A N U I E S T
N A N V R I Y
Ciphertext : ANUIESTNANVRIY
19. What are the aspects required for Network security model?
Using this model requires us to:
Design a suitable algorithm for the security transformation
Generate the secret information (keys) used by the algorithm
Develop methods to distribute and share the secret information
Specify a protocol enabling the principals to use the transformation and secret
information for a security service
20. Write short notes on Euclidean Algorithm
An efficient way to find the GCD(a,b)
Uses theorem that:
GCD(a,b) = GCD(b, a mod b)
Euclidean Algorithm to compute GCD(a,b) is:
EUCLID(a,b)
1. A = a; B = b
2. if B = 0 return A = gcd(a, b)
3. R = A mod B
4. A = B
5. B = R
6. goto 2
5
+ 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6
0 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0
1 1 2 3 4 5 6 7 0
1 0 1 2 3 4 5 6
2 2 3 4 5 6 7 0 1
2 0 2 4 6 1 3 5
3 3 4 5 6 7 0 1 2
4 4 5 6 7 0 1 2 3 3 0 3 6 2 5 1 4
5 5 6 7 0 1 2 3 4 4 0 4 1 5 2 6 3
6 6 7 0 1 2 3 4 5
5 0 5 3 1 6 4 2
7 7 0 1 2 3 4 5 6
6 0 6 5 4 3 2 1
3. A= 11(mod37) A= 42(mod49)
Find the value ‘A’ using Chinese Remainder Theorem. Suppose add the value 678
with ‘A’ value, what we do by method of CRT.
4. Derive Fermat’s and Euler’s Theorem with suitable example.
5. Encrypt the message “PAY” using Hill Cipher with the following matrix and
show the decryption to get the original plain text.
UNIT – II
PART-A
length is 64-bit, DES has an effective key length of 56 bits, since 8 of the 64
bits of the key are not used by the encryption algorithm (function as check bits
only).
Interestingly, AES performs all its computations on bytes rather than bits.
Hence, AES treats the 128 bits of a plaintext block as 16 bytes. These 16 bytes
are arranged in four columns and four rows for processing as a matrix −
Unlike DES, the number of rounds in AES is variable and depends on the length
of the key. AES uses 10 rounds for 128-bit keys, 12 rounds for 192-bit keys and
14 rounds for 256-bit keys. Each of these rounds uses a different 128-bit round
key, which is calculated from the original AES key.
12. Draw the block diagram for AES
13. What are the transformation functions for each round in AES?
Substitution bytes
Shift rows
Mix columns
Add round key
14. What are the approaches to attack the RSA algorithm?
Brute attack
Mathematical attack
Timing attack
Hardware fault based attack
Chosen cipher text attack
15. How to manage the keys?
Encryption is an effective way to secure data, but the encryption keys used
must be carefully managed to ensure data remains protected and accessible
when needed.
Symmetric key distribution using Symmetric encryption
Symmetric key distribution using Asymmetric encryption
16. Define Elliptic curve cryptography
Elliptic curve cryptography (ECC) is an approach to public-key
cryptography based on the algebraic structure of elliptic curves over finite
fields. ECC requires smaller keys compared to non-ECC cryptography (based
on plain Galois fields) to provide equivalent security.
17. List four general characteristics of schema for the distribution of the
public key.
Public announcement
Publicly available directory
Public-key authority
Public-key certificates
18. Where is the miller-rabin algorithm is used?
9
1. n = p x q = 3 x 11 = 33
j(n) = (p-1) x (q-1) = 2 x 10 = 20
gcd(j(n), e) = gcd(20, 7) = 1
∵ d ≡ e-1(mod j(n))
d x e mod j(n) = 1
7d mod 20 = 1
∴ d=3
So: Public Key pu = {e, n} = {7, 33}
Private Key pr = {d, n} = {3, 33}
Encryption:
C = Me mod n = 57 mod 33 = 14
Decryption: M = Cd mod n = 143 mod 33 = 5
20. What is the effect on the cipher text of a error in block P1 of plain text?.
Assume CBC mode. What is the error at the reciever?
If a bit of a plain text block P1 is in error the entire cipher text block will
be effected and will be erroneous. (Though, the encryption algorithm is
correctly encrypting what is given to it.) All subsequent cipher blocks will also
be effected each cipher text block is fed to next stage and XOR with next plain
text block. However, at the receiver, only the block P1 of plain text recovered
reproduces the same bit error. All the subsequent plain text blocks are
reproduced correcltly.
21. Define Public key certificate
In cryptography, a public key certificate (also known as a digital certificate or
identity certificate) is an electronic document used to prove ownership of a
public key.
22. Define RC5 algorithm
The RC5 encryption algorithm is a fast, symmetric block cipher suitable for
hardware or software implementations. A novel feature of RC5 is the heavy
use of data-dependent rotations. RC5 has a variable-length secret key,
providing flexibility in its security level.
23. What are the uses of Public Key Cryptography
10
Encryption/ Decryption
Digital Signature
Key exchange
24. Define RSA Cryptosystem-( three scholars Ron Rivest, Adi Shamir, and
Len Adleman)
Two aspects of the RSA cryptosystem, firstly generation of key pair and
secondly encryption-decryption algorithms.RSA does not directly operate on
strings of bits as in case of symmetric key encryption.
Generate the RSA modulus (n)
Select two large primes, p and q.
Calculate n=p*q. For strong unbreakable encryption, let n be a large
number, typically a minimum of 512 bits.
Find Derived Number (e)
Number e must be greater than 1 and less than (p − 1)(q − 1).
There must be no common factor for e and (p − 1)(q − 1) except for 1.
In other words two numbers e and (p – 1)(q – 1) are coprime.
Form the public key
The pair of numbers (n, e) form the RSA public key and is made
public.
Interestingly, though n is part of the public key, difficulty in factorizing
a large prime number ensures that attacker cannot find in finite time
the two primes (p & q) used to obtain n. This is strength of RSA.
Generate the private key
Private Key d is calculated from p, q, and e. For given n and e, there is
unique number d.
Number d is the inverse of e modulo (p - 1)(q – 1). This means that d is
the number less than (p - 1)(q - 1) such that when multiplied by e, it is
equal to 1 modulo (p - 1)(q - 1).
PART - B
11
1. Identify the possible threats for RSA algorithm and list their counter measures
2. Perform decryption and encryption using RSA algorithm with p=3, q=11, e=7
and N=5.
3. Draw the general structure of DES and explain the encryption decryption
process.
4. Explain the generation sub key and S Box from the given 32-bit key by
Blowfish.
5. In AES, how the encryption key is expanded to produce keys for the 10 rounds
6. Explain AES algorithm with block diagram
7. Describe about RC4 and RC5 algorithm.
8. Mention the strengths and weakness of DES algorithm.
12
UNIT – III
PART - A
PART - B
1. Define hash Function and Explain its properties in cryptography.
It is a procedure that verifies whether the received message comes from assigned
source has not been altered. It uses message authentication codes, hash algorithms to
authenticate the message
2. Define the classes of message authentication function.
Message encryption: The entire cipher text would be used for authentication.
Message Authentication Code: It is a function of message and secret key produce a fixed
length value.
Hash function: Some function that map a message of any length to fixed length which
serves as authentication.
3. What are the requirements for message authentication?
The requirements for message authentication are
Disclosure: Release of message contents to any person or process not processing the
appropriate cryptographic key
Traffic Analysis: Discovery of the pattern of traffic between parties. In a connection
oriented application, the frequency and duration of connections could be determined. In
either a connection oriented or connectionless environment, the number and length of
messages between parties could be determined.
Masquerade: Insertion of messages into the network from a fraudulent source. This
includes the creation of messages by an opponent that are purported to come from an
authorized entity. Also included are fraudulent acknowledgements of message receipt or no
receipt by someone other than the message recipient.
MAC: In Message Authentication Code, the secret key shared by sender and receiver. The
MAC is appended to the message at the source at a time which the message is assumed or
known to be correct.
Hash Function: The hash value is appended to the message at the source at time when the
message is assumed or known to be correct. The hash function itself not considered to be
13
secret.
6. Write Any three hash algorithm.
MD5 (Message Digest version 5) algorithm.
SHA_1 (Secure Hash Algorithm).
RIPEMD_160 algorithm.
7. What are the requirements of the hash function?
H can be applied to a block of data of any size. H produces a fixed length output.
H(x) is relatively easy to compute for any given x, making both hardware and software
implementations practical
h = H(M)
M = Variable length Message
H(M) = Fixed length hash value
8. What you meant by MAC?
MAC is Message Authentication Code. It is a function of message and secret key
which produce a fixed length value called as MAC.
MAC = Ck(M)
Where,
M = variable length message
K = secret key shared by sender and receiver.
CK (M) = fixed length authenticator.
9. Differentiate internal and external error control.
Internal error control:
In internal error control, an error detecting code also known as frame check
sequence or checksum.
External error control:
In external error control, error detecting codes are appended after encryption.
10. What is the meet in the middle attack?
This is the cryptanalytic attack that attempts to find the value in each of the range
and domain of the composition of two functions such that the forward mapping of one
through the first function is the same as the inverse image of the other through the second
function-quite literally meeting in the middle of the composed function.
11. What is the role of compression function in hash function?
The hash algorithm involves repeated use of a compression function f, that takes two
inputs and produce a n-bit output. At the start of hashing the chaining variable has an initial
value that is specified as part of the algorithm. The final value of the chaining variable is
the hash value usually b>n; hence the term compression.
12. Distinguish between direct and arbitrated digital signature?
Direct digital signature Arbitrated Digital Signature The direct digital signature
involves only the communicating parties.
The arbiter plays a sensitive and crucial role in this digital signature.
This may be formed by encrypting the entire message with the sender’s private key.
Every signed message from a sender x to a receiver y goes first to an arbiter A, who
subjects the message and its signature to a number of tests to check its origin and
content.
14. What requirements should a digital signature scheme should satisfy?
The signature must be bit pattern that depends on the message being signed.
The signature must use some information unique to the sender, to prevent both
forgery and denial.
It must be relatively easy to produce the digital signature.
It must be relatively easy to recognize and verify the digital signature. It must be
computationally infeasible to forge a digital
Signature, either by constructing a new message for an existing digital signature
or by constructing a fraudulent digital signature for a given message.
It must be practical to retain a copy of the digital signature in storage.
15. What types of attacks are addressed by message authentication?
Content modification: Changes to the contents of the message
Sequence Modification: Any modification to a sequence of messages between parties
including insertion, deletion and recording
Timing Modification: Delay or replay of message
16. What is the difference between a message authentication code and a one-way hash
function?
The difference between a MAC and a one way hash function is that unlike a MAC a hash
code does not use a key but is a function only of the input message.
17. Is it necessary to recover the secret key in order to attack a MAC algorithm?
A number of keys will produce the correct MAC and the opponent has no way of knowing
which the correct key is. On an average 2 (n-k) keys produce a match. Therefore attacks do
not require the discovery of the key.
18. What is the function of a compression function in a hash function?
The hash function involves repeated use of a compression function. The motivation is that
if the compression function is collision resistant, then the hash function is also collision
resistant function. So a secure hash function can be produced.
19. What are the two types of certificates?
Two types of certificates are,
i. Forward Certificate
ii. Reverse Certificate
the surprising result that the probability that two or more people in a group of 23 share the
same birthday is greater than 0.5. such a result is called a birthday paradox.
24. What do you mean by one way property in hash function?
For any given value h, it is computationally infeasible to find x such that H(x) = h.
25. What is digital signature?
Digital signature is an authentication mechanism that enables the creator of a message to
attach a code that acts as a signature.
26. What is one way property?
A function that maps an arbitrary length message to a fixed length message digest is a one-
way property hash function if it is a one-way function.
27. Write any two differences between MD5 and secure hash algorithm.
MD5 SHA
Pad message so its length is a multiple of
Pad message so its length is 448 mod 512
512 bits
Initialize the 4 word buffer (A, B, C, D) Initialize the 5 bit buffer (A, B, C, D, E)
Process the message in 16 word chunks
Process the message in 16 word chunks
using3 rounds of 16 bit operations each on
using 4 rounds of 20 bit operations
chunk and buffer
28. What are the performance difference between MD5, SHA-512, and RIPEMD-16?
MD5 produces a 128 bit hash value. SHA-512 produces 160 bit hash values
Brute force attack is harder
Not vulnerable to known attacks
Slower than MD5
All designed as simple and compact
SHA-1 optimized for big-endian CPUs vs RIPEMD-160 & MD5 optimized for little-
endian CPUs
2. Explain Digital signature with ElGamal public key cryptosystems.
4. Explain the process of deriving eighty 64 bit words from the 1024-bits for processing of
a single block and also discuss single round function in SHA-512 algorithm. Show the
values of W16, W17, W18 and W19.
5. Alice chooses Q=101 and P=7879. Assume (q,p,and y): Alice public key, Alicxe selects
h=3 and calculates g. Alice choose x=75 as the private key and calculates y. Now Alice
can send a message to bob. Assume that H(M) =22 and alice chooses secret no
K=50.Verify the signature.
6. Explain in detail about MD5 algorithm with necessary diagrams.
16
UNIT – IV
PART-A
1. What is an intruder?
Accessing a network unauthorized is called intrusion.
2. What is intrusion deduction system?
An Intrusion Deduction System (IDS) is a system for deduction unauthorized access to the system
3. What are audit reports? Give its two forms?
Audit report is a fundamental tool for intrusion deducting. Two forms of audit are:
1. Native audit records
2.Detective specific audit records.
4. Define malicious program.
A program that is intentionally included or inserted in a system for harmful purpose is malicious
program.
5. What is a virus?
A virus is a piece of program code that can infect other programs by modifying them.
6. What is a worm?
A worm is a program designed to copy itself and send copies from a computer to other computer
across the network.
7. Enlist four types of viruses?
1. Parasitic virus.
2. Memory resident virus.
3. Boot sector virus.
4. Stealth virus.
8. What is Trojan horse?
A Trojan horse is a computer program that appears to be useful but that actually does damage.
9. What is a logic bomb?
A logic bomb is a software embedded in some legitimate programs and is set to explode under
certain conditions..
10. What are the steps in virus removal process?
a) Detection of virus
b) Identification of virus
c) Removal of traces of virus.
11. What is generic decryption technology?
A generic decryption technology can detect most complex polymorphic viruses with fast
scanning speed.
12. What is Denial Of Service?
A denial of service is an attempt to prevent a genuine user of service from using it.
13. What are the design goals of firewalls?
1. All the traffic must pass through it.
2. Only authorized traffic is allowed to pass.
3. Firewall itself is immune to penetration.
2. Clandestine user: A user who seizes supervisory control of system to suppress audit
collection.
16. What is mentioned by a trusted system? .
A trusted system is a computer and operating system that can be verified to implement a given
security policy. Typically, the focus of a trusted system is data access control. A policy is
implemented that dictates what objects may be accessed.
17. Define honey pots.
A honey pot is a trap set to detect, deflect or in some manner counteract attempts at unauthorized
use of information systems.
18. List out the types of viruses.
Types of viruses are;
a) Parasitic virus
b) Memory-resident virus
c) Boot sector virus
d) Stealth virus
e) Polymorphic virus
19. What are the major issues derived by porras about the design of a distributed intrusion
deduction system?
Porras points out following major issues :
a) System may need to deal with different audit record formats.
b) One or more nodes in the network will serve as collision and analysis points for the data
from the systems on the networks.
c) Either centralized or decentralized architecture can be used.
20. What are three main components involved in the distributed intrusion detection system ?
Components :
a) Host agent module: An audit collection module operating as a background process on a
monitored system.
b) Lan monitor agent module: Same as host agent except that it analysis LAN traffic and reports.
c) Central manager module: Receives reports from LAN monitor and hos agents and processes
and correlates these reports to deduction intrusion.
21. Define intruder. Name three different classes of intruders.
An intruder is a person who attempts to gain unauthorized access to a system, to damage that
system, or to disturb data on that system. Classes of intruders: masquerader, misfeasor, clandestine
user.
22. What do you mean by Trojan horses?
Trojan horse is a computer program that appears to be useful but that actually does damage.
23. What is honey pot?
A System placed there just so it will be attacked, so attackers waste time, and so their attacks
can be analyzed.
24. Write down the role of security standards.
Standard allows products from multiple vendors to communicate, giving the purchaser more
flexibility in equipment selection in use.
25. Write down the system security standards?
Security standards development and publication are done by Internet architecture board,
Internet engineering task force and Internet engineering steering group.
26. Define: Intrusion.
Intrusion is an illegal act of entering, seizing or taking possession of another’s property.
27. Give few examples of worms.
Example of worm is the Morris worm and My doom.
28. Mention the two levels of hackers.
18
Two level’s of hackers are criminal hackers , disgruntled employees, ideological hackers and
underemployed adult hackers etc.
29. What are the effects of malicious software? Write an two?
The effects of malicious viruses on a computer system include occupation of disk space. Tantacle
2 virus will change icons on a computer screen.
30. What are Zombies?
Zombies are computer connected to internet that has been compromised by a hacker, computer
virus or Trojan horse and can be used to perform malicious tasks under remote direction.
31. Difference between spyware and virus
Sno Delete Truncate
1. Spyware is specific unwanted software A virus is a specific software that can be
that collects user information without distributed (spread) from computer to
appropriate notice and consent. computer usually by e-mail.
36. What are the principle differences between Kerberos version 4 and version 5?
1)Kerberos V.4 requires DES and V.5 allows many encryption techniques
2)V.4 requires use of IP and V.5 allows other network protocols.
3)Version 5 has a longer ticket life time
4)Version 5 allows tickets to be renewed
5)Version 5 can accept any symmetric-key algorithm
6)Version 5 uses a different protocol for describing data types
7)Version 5 has more overhead than Version 4
37. Define :malicious software
Malicious software is any software that gives partial to full control of your computer to do whatever
the malware creator wants. Malware can be a virus, worm , trojan, adware, spyware, root kit , etc.
38. Differentiate macro virus and boot virus.
A macro virus is platform independent virtually all of the macro viruses in fact
MS word documents. Macro viruses take advantages of a feature found in word
and other office applications such as Microsoft excel , namely the macro. Boot
sector virus infects a master boot record or boot record and spreads when a system
is booted from the disk containing the virus.
19
PART - B
UNIT – V
PART-A
Used for end-to-end communication between two It is used when one or both ends of an SA is
host. a security gateway , such as firewall or
router that implement IPSec.
AH: Authentication IP payload and selected Authentication entire inner IP packet
portions of IP header and IPv6 extension Plus selected portions of outer IP header and
header . outer IPv6 extension headers
PART - B
UNIT I INTRODUCTION
1. Define Graph.
A graph G = (V, E) consists of a set of objects V={v1, v2, v3, … } called vertices (also called
points or nodes) and other set E = {e1, e2, e3, .......} whose elements are called edges (also called lines
or arcs).
The set V(G) is called the vertex set of G and E(G) is the edge set of G.
For example :
A graph G is defined by the sets V(G) = {u, v, w, x, y, z} and E(G) = {uv, uw, wx, xy, xz}.
v
Graph G: u
x y
w
z
A graph with p-vertices and q-edges is called a (p, q) graph. The (1, 0) graph is called trivial
graph.
w x y w x y
Simple Graph Pseudo Graph
e6 e7
T
v3 v4 v5
Finite Graphs
Infinite Graphs
6. Define Isolated and pendent vertex.
A vertex having no incident edge is called an isolated vertex. In other words, isolated vertices
are vertices with zero degree. A vertex of degree one is called a pendant vertex or an end vertex.
Graph G: v1 v2 e1
e3
e5 e4 e2
e6 e7
v7 v3 v4 v5 v6
v7 v3 v4 v5 v6
8. Define Multigraph
In a multigraph, no loops are allowed but more than one edge can join two vertices, these edges
are called multiple edges or parallel edges and a graph is called multigraph.
26
Graph G: v1 v2
e3
e5 e4 e2
27
e6 e7
v3 v4 v5 v6
The edges e5 and e4 are multiple (parallel) edges.
Two graphs G and G' are said to be isomorphic to each other if there is a one-to-one
correspondence between their vertices and between their edges such that the incidence relationship is
preserved.
a v5
Graph G: v1 e1
e Graph G':
5 2 e3
4 v
1 4 v3
e2 e4 e6
6 c
3
b d
v1 e5 v2
28
e3 e4 e5 e4
e6 e4 e4
v4 v5 v6 v5 v6
g a g a c g a
c c
b b b
v2 v2 v2
v3 v3 v3
e e f d e f
d f d
h h h
v4 v5 v4 v5 v4 v5
e6 e7
v3 v4 v5 v3 e6 v6 v4 v5
Connected Graph G Disconnected Graph H
29
e5 e4 e3 e2
v3 e6 v6 v4 v5
30
e1
e2 e3 e5
e6
v3 e7 v4
v4 e1 v1 e2 v3 e3 v1 e4 v2 e5 v4 e6 v3 e7 v4 is an Euler circuit. So the above graph is Euler graph.
v1 e v2
Graph G: d
31
f j k
a c h
v6 b v3 g v4 i v5
32
Paths between vertices v6 and v2 are (a, e), (a, c, f), (b, c, e), (b, f), (b, g, h), and (b, g, i, k).
The shortest paths between vertices v6 and v2 are (a, e) and (b, f), each of length two.
Hence d(v6 , v2) =2
The eccentricity E(v) of a vertex v in a graph G is the distance from v to the vertex farthest
from v in G; that is,
= max ( , )
∈
A vertex with minimum eccentricity in graph G is called a center of G
Graph G: a
d c
b
Distance d(a, b) = 1, d(a, c) =2, d(c, b)=1, and so on.
Eccentricity E(a) =2, E(b) =1, E(c) =2, and E(d) =2.
Center of G = A vertex with minimum eccentricity in graph G = b.
v3 e7 v4 v3 v4
v1 e4 v2 v1
v1 e4 v2 v2
e1
e2 e3 e5
e3 e1 e2 e5
e6 e
6
v3 e7 v4 v3 e7
v3 v4 v4
∪ =
34
Possible cut sets are {a, c, d, f}, {a, b, e, f}, {a, b, g}, {d, h, f}, {k}, and so on.
{a, c, h, d} is not a cut set, because its proper subset {a, c, h} is a cut set.
{g, h} is not a cut set.
A minimal set of edges in a connected graph whose removal reduces the rank by one is called
minimal cut set (simple cut-set or cocycle). Every edge of a tree is a cut set.
v1
v1
v1 v2
v1
v1 is an articulation point.
The max. flow between two vertices = Min. of the capacities of all cut-sets.
To eliminate the distinction between finite and infinite regions, a planar graph is often
embedded in the surface of sphere. This is done by stereographic projection.
41
The generating function for the geometric sequence 1, a, a2, a3, ... for any constant a:
∞
( ) =
−
=
What is Partitions of integer?
Partitioning a positive n into positive summands and seeking the number of such partitions
without regard to order is called Partitions of integer.
This number is denoted by p(n). For example
P(1) = 1: 1
P(2) = 2: 2=1+1
P(3) = 3: 3 = 2 +1 = 1 + 1 +1
P(4) = 5: 4=3+1=2+2=2+1+1=1+1+1+1
P(5) = 7: 5 = 4 + 1 = 3 + 2 = 3 + 1 + 1 = 2 + 2 + 1 = 2 + 1 + 1+ 1 = 1 + 1 + 1 + 1 + 1
Define Exponential generating function
For a sequence a0, a1, a2, a3,, … of real numbers.
2 3 ∞
= 0 + 1 + + 3 +⋯=
2
2! 3! !
=
is called the exponential generating function for the given sequence.
−
=1− + − + −⋯
Adding these two series together, we get, 2! 3! 4!
2 4
−
+−
+ = 2(1
= 1++ + +⋯)
2 2! 4!
Define Summation operator 2 4
+ +⋯
2! 4!
= 0 + ( 0 + 1) + 0 + 1+ 2 2 + + 0 + 1+ 2 + 3 3 + ⋯
So f(x)/(1-x) generates the sequence of sums a0, a0 + a1, a0 + a1 + a2, a0 + a1 + a2 + a3,,
1/(1-x) is called the summation operator.
43
Recurrence relations
A recurrence relation is an equation that recursively defines a sequence or multidimensional
array of values, once one or more initial terms are given: each further term of the sequence or array is
defined as a function of the preceding terms.
The term difference equation sometimes (and for the purposes of this article) refers to a
specific type of recurrence relation. However, "difference equation" is frequently used to refer
to any recurrence relation.
Fibonacci numbers
The recurrence satisfied by the Fibonacci numbers is the archetype of a homogeneous linear
recurrence relation with constant coefficients (see below). The Fibonacci sequence is defined using the
recurrence
Fn = Fn-1 + Fn-2
with seed values F0 = 0 and F1 = 1
We obtain the sequence of Fibonacci numbers, which begins
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
PART – B
1. Find the shortest spanning tree for the following graph.
PART – B
42. Prove that any simple planar graph can be embedded in a plane such that every edge is drawn as a
straight line.
43. Show that a connected planar graph with n vertices and e edges has e-n+2 regions.
3. Define chromatic polynomial. Find the chromatic polynomial for the following graph.
PART – B
47
2. What is QOS?
Grid computing system is the ability to provide the quality of service requirements necessary for
the end-user community. QOS provided by the grid like performance, availability, management
aspects, business value and flexibility in pricing.
3. What are the derivatives of grid computing?
There are 8 derivatives of grid computing. They are as follows:
a) Compute grid
b) Data grid
c) Science grid
d) Access grid
e) Knowledge grid
f) Cluster grid
g) Terra grid
h) Commodity grid
4. What are the features of data grids?
Business On Demand is not just about utility computing as it has a much broader set
of ideas about the transformation of business practices, process transformation, and
technology implementations.
The essential characteristics of on-demand businesses are responsiveness to the
dynamics of business, adapting to variable cost structures, focusing on core business
49
50
The formation of virtual task forces, or groups, to solve specific problems associated with
the virtual organization.
The dynamic provisioning and management capabilities of the resource required meeting
the SLA’s.
8. What are the properties of Cloud Computing?
There are six key properties of cloud computing: Cloud computing is
x user-centric
x task-centric
x powerful
x accessible
x intelligent
x programmable
9. Sketch the architecture of Cloud.
Schedulers are types of applications responsible for the management of jobs, such as
allocating resources needed for any specific job, partitioning of jobs to schedule parallel
execution of tasks, data management, event correlation, and service-level management
capabilities.
12. What is meant by resource broker?
Resource broker provides pairing services between the service requester and the
service provider. This pairing enables the selection of best available resources from the
service provider for the execution of a specific task.
13. What is load balancing?
Load balancing is concerned with the integrating the system in order to avoid
processing delays and over -commitment of resources. It involves partitioning of jobs,
identifying the resources and queuing the jobs.
14. What is grid infrastructure?
Grid infrastructure forms the core foundation for successful grid applications.
This infrastructure is a complex combination of number of capabilities and resources identified
for the specific problem and environment being addressed.
PART – B
1) Explain in detail about virtual organization. (16)
3) Explain some of the grid application and their usage patterns. (16)
5) What are the data and functional requirements of grid computing? (16)
7) Describe in detail about the Technologies for network based systems? (16)
53
CLOUD COMPUTING
UNIT 2 QUESTION BANK
Part - A
1. Define – OSGI.
Open Grid Services Architecture (OGSA) is a set of standards defining the way in which
information is shared among diverse components of large, heterogeneous grid systems. In this
context, a grid system is a scalable wide area network (WAN) that supports resource sharing and
distribution. OGSA is a trademark of the Open Grid Forum.
2. Define – OSGA.
The Open Grid Services Infrastructure (OGSI) was published by the Global Grid Forum
(GGF) as a proposed recommendation in June 2003.[1] It was intended to provide an
infrastructure layer for the Open Grid Services Architecture (OGSA). OGSI takes the
statelessness issues (along with others) into account by essentially extending Web services to
accommodate grid computing resources that are both transient and stateful.
3. Define – Peer to Peer Computing.
Peer to Peer computing is a relatively new computing discipline in the realm of distributed
computing. P2P system defines collaboration among a larger number of individuals and/or
organizations, with a limited set of security requirements and a less complex resource-sharing
topology.
4. What is Dynamic Accounting System?
DAS provides the following enhanced categories of accounting functionality to the IPG
community:
Allows a grid user to request access to a local resource via the presentation of grid
credentials
Determines and grants the appropriate authorizations for a user to access a local resource
without requiring a preexisting account on the resource to govern local authorizations.
4. Define – SOA.
2 Identify the use cases that can drive the OGSA platform components.
3 Identify and define the core OGSA platform components.
4 Define hosting and platform specific bindings.
5 Define resource models and resource profiles with interoperable solutions.
54
55
GRAM provides resource allocation, process creation, monitoring, and management services.
The most common use of GRAM is the remote job submission and control facility. GRAM
simplifies the use of remote systems.
11. What is the role of the grid computing organization?
SOAP is a simple and lightweight XML-based mechanism for creating structured data packages
that can be exchanged between network applications. SOAP provides a simple enveloping
mechanism and is proven in being able to work with existing networking services technologies
such as HHTP.SOAP is also flexible and extensible. SOAP is based on the fact that it builds
upon the XML info set.
15. Define WSDL.
WSDL is an XML Info set based document, which provides a model and XML format for describe
web services. This enables services to be described and enables the client to consume these services
in a standard way without knowing much on the lower level protocol exchange binding including
SOAP and HTTP. This high level abstraction on the service limits human interaction and enables
the automatic generation of proxies for web services, and these proxies can be static or dynamic.
It allows both document and RPC - oriented messages.
57
PART – B
8) Write short notes on Open Grid Service Architecture. (16)
9) Explain in detail, the functional requirements of OGSA. (16)
10) Explain Practical & Detailed view of OGSA/OGSI. (16)
11) Explain in detail, OGSA services.(16)
12) Describe about the relation of grid architecture with other distributed technologies.(16)
13) What are the third generation initiatives of grid computing?
14) Discuss briefly about organization building and using grid based solution to
solve their computing data and network requirements.
58
Part - A
1. What is the working principle of Cloud Computing?
The cloud is a collection of computers and servers that are publicly accessible via the This
hardware is typically owned and operated by a third party on a consolidated basis in one or
more data center locations. The machines can run any combination of operating systems.
2. What is Virtualization?
Virtualization is a foundational element of cloud computing and helps deliver on the value
of cloud computing," Adams said. "Cloud computing is the delivery of shared computing
resources, software or data — as a service and on-demand through the Internet.
3. Define Cloud services with example.
Any web-based application or service offered via cloud computing is called a cloud Cloud
services can include anything from calendar and contact applications to word processing and
presentations.
4. What are the types of Cloud service development?
x Software as a Service
x Platform as a Service
x Infrastructure as a Service
5. Explain cloud provider and cloud broker?
Cloud Provider: Is a company that offers some component of cloud computing typically
infrastructure as a service, software as a Service or Platform as a Service. It is something referred
as CSP.
Cloud Broker: It is a third party individual or business that act as an intermediary between the
purchase of cloud computing service and sellers of that service.
6. Define - Private Cloud.
The private cloud is built within the domain of an intranet owned by a single organization.
Therefore, they are client owned and managed. Their access is limited to the owning clients and
their partners. Their deployment was not meant to sell capacity over the Internet through publicly
accessible interfaces. Private clouds give local users a flexible and agile private infrastructure to
run service workloads within their administrative domains.
7. Define - Public Cloud.
A public cloud is built over the Internet, which can be accessed by any user who has paid for the
service. Public clouds are owned by service providers. They are accessed by subscription. Many
59
companies have built public clouds, namely Google App Engine, Amazon AWS, Microsoft Azure,
IBM Blue Cloud, and Salesforce Force.com. These are commercial providers that offer a publicly
accessible remote interface for creating and managing VM instances within their proprietary
infrastructure.
8. Define - Hybrid Cloud.
A hybrid cloud is built with both public and private clouds; Private clouds can also support a hybrid
cloud model by supplementing local infrastructure with computing capacity from an external
public cloud. For example, the research compute cloud (RC2) is a private cloud built by IBM.
9. Define anything-as-a-service?
Providing services to the client on the basis on meeting their demands at some pay per use cost such as
data storage as a service, network as a service, communication as a service etc. It is generally denoted as
anything as a service (XaaS).
7 What is Hypervisor?
A hypervisor or virtual machine monitor (VMM) is a piece of computer software, firmware or hardware
that creates and runs virtual machines. A computer on which a hypervisor is running one or more
virtual machines is defined as a host machine. Each virtual machine is called a guest machine.
16. What are the types of hypervisor?
6. Type 1 (bare-metal)
7. Type 2 (hosted)
Type 1 hypervisors run directly on the system hardware. They are often referred to as a "native"
or "bare metal" or "embedded" hypervisors in vendor literature.
Type 2 hypervisors run on a host operating system. When the virtualization movement first
began to take off, Type 2 hypervisors were most popular. Administrators could buy the software
and install it on a server they already had.
61
PART – B
11. Write short notes on cloud deployment model. (16)
12. Explain in detail, categories of cloud. (16)
13. Explain in detail, pros and cons of cloud. (8)
14. Explain in detail, different implementation level of virtualization? (16)
15. Write short notes on OS level virtualization. List the pros and cons of OS
level virtualization. (16)
16. Explain in detail, the virtualization of CPU, Memory and I/O devices. (16)
17. Write short notes on virtual clusters. (8)
18. Explain in detail, the virtualization for data center automation. (16)
62
Part -A
1. What is The Globus Toolkit Architecture (GT4)
The Globus Toolkit, started in 1995 with funding from DARPA, is an open
middleware library for the grid computing communities. The toolkit addresses common
problems and issues related to grid resource discovery,management, communication, security, fault
detection, and portability. The library includes a rich set of service implementations.
18. What are two types of nodes that control the job execution process?
a jobtracker and a number of tasktrackers controls the job execution process. The
jobtracker coordinates all the jobs run on the system by scheduling tasks to run on tasktrackers.
Tasktrackers run tasks and send progress reports to the jobtracker, which keeps a record of the
overall progress of each job. If a task fails, the jobtracker can reschedule it on a different
tasktracker.
Part -B
UNIT V
PART A
1. List the challenges in building trust management?
2. What are the security requirements of grid?
3. What are the types of message level security?
4. What is IAM?
5. List the components in IAM architecture provider.
6. What is privacy in cloud?
7. List the important tasks in the management of identities in cloud?
8. What is SD?
9. What is TI?
10. List some potential security issues.
11. What is security assurance condition?
12. Give the steps accomplished in fuzzy inference.
13. Which information are taken into account for calculating site trust worthiness?
14. What are the major authenticated methods?
15. Give the category classifications of authority?
16. What is the role of GSI functional layers?
17. What are the additional protection mechanisms of GSI?
18. Give the various levels of security.
19. Name the Cloud security controls.
20. Give some of the data security issues.
21. List the types of PHRs.
PART B
1. Discuss in detail about the various trust models in grids?
2. Write about Authorization and Delegation in Grids?
3. Explain briefly about Grid Security Infrastructure?
4. Explain briefly about the aspects of data security, provider data and security.
5. Describe in detail about the IAM architecture and its practices in cloud.
6. Write about the various key privacy issues in the cloud?
UNIT I: INTRODUCTION (16 marks)
Part-B
2. Explain about Scalable Computing Trends and its New Paradigms. What do you mean
by the Internet of Things and Cyber-Physical Systems? Discuss.
Scalable Computing Trends and New Paradigms
Moore’s law indicates that processor speed doubles every 18 months.
Gilder’s law indicates that network bandwidth has doubled each year in the past.
Degrees of Parallelism
bit-level parallelism (BLP) converts bit-serial processing to word-level processing
gradually This led us to the next wave known as instruction-level parallelism (ILP)Data-level
parallelism (DLP) was made popular through SIMD (single instruction, multiple data) and vector
66
67
machines using vector or array types of instructions. From chip multiprocessors (CMPs), we
have been exploring task-level parallelism (TLP).
Innovative Applications
Both HPC and HTC systems desire transparency in many application aspects
Applications of High-Performance and High-Throughput Systems
The Trend toward Utility Computing
These paradigms are composable with QoS and SLAs (service-level agreements).
The Hype Cycle of New Technologies
This cycle shows the expectations for the technology at five different stages. The
expectations rise sharply from the trigger period to a high peak of inflated expectations.
The Internet of Things and Cyber-Physical Systems
The Internet of Things
Three communication patterns co-exist: namely H2H (human-to-human), H2T (human-
to-thing), and T2T (thing-to-thing
Cyber-Physical Systems
A cyber-physical system (CPS) is the result of interaction between computational
processes and the physical world.
67
68
68
69
are linked with customized, high-level communication systems: SOAP, RMI, and IIOP. These
communication systems are built on message-oriented middleware infrastructure such as Web
Sphere MQ or Java Message Service (JMS) which provide rich functionality and support
virtualization of routing, senders, and recipients. In the case of fault tolerance, the features in
the Web Services Reliable Messaging (WSRM) framework mimic the OSI layer capability
modified to match the different abstractions
At the entity levels. Security is a critical capability that either uses or re implements the
capabilities seen in concepts such as Internet Protocol Security (IPsec) and secure sockets in the
OSI layers. The CORBA Trading Service, UDDI (Universal Description, Discovery, and
Integration), LDAP (Lightweight Directory Access Protocol), and ebXML (Electronic Business
using eXtensible Markup Language)
Web Services and Tools
Both web services and REST systems have very distinct approaches to building reliable
Interoperable systems:This specification is carried with communicated messages using
Simple Object Access Protocol (SOAP). The hosting environment then becomes a universal
distributed operating system with fully distributed capability carried by SOAP
Messages: REST can use XML schemas but not those that are part of SOAP; “XML over
HTTP” is a popular design choice in this regard.
The Evolution of SOA
Filter services (fs in the figure) are used to eliminate unwanted raw data, in order to
respond to specific requests from the web, the grid, or web services.
Grids versus Clouds
THE general approach used in workflow, the BPEL Web Service standard, and several
important workflow approaches including Pegasus, Taverna, Kepler, Trident, and Swift. May
end up building with a system of systems: such as a cloud of clouds, a grid of clouds, or a cloud
of grids, or inter-clouds as a basic SOA architecture.
69
70
with a Grid service.OGSI provides the Web Service Definition Language (WSDL) definitions
for these key interfaces.
OGSA-DAI
The OGSA-DAI (data access and integration) project is concerned with
Constructing middleware to assist with access and integration of data from separate data sources
via the grid.
GridFTP
GridFTP is a secure and reliable data transfer protocol providing high performance and
optimized for wide-area networks that have high bandwidth. GridFTP uses basic Grid security
on both control (command) and data channels. Features include multiple data channels for
parallel transfers, partial file transfers, third-party transfers, and more.
GridFTP can be used to move files (especially large files) across a network efficiently and
reliably.
WSRF
WSRF defines a set of specifications for defining the relationship between Web services
and stateful resources. WSRF is a general term that encompasses several related proposed
standards that cover:
Resources
Resource lifetime
Resource properties
Service groups (collections of resources)
Faults
Notifications
Topics
Web services related standards
Standards commonly associate with Web services are
XML
WSDL
SOAP
OGSA Interfaces
The OGSA is centered on grid services. These services demand special well-
defined application interfaces. These interfaces provide resource discovery, dynamic service
creation, lifetime management, notification, and manageability. Two key properties of a grid
service are transience and statefulness. These properties have significant implications regarding
how a grid service is named, discovered, and managed. Being transient means the service can be
created and destroyed dynamically; statefulness refers to the fact that one can distinguish one
service instance from another.
Grid Service Handle
A GSH is a globally unique name that distinguishes a specific grid service instance
from all others. The OGSA employs a “handle-resolution” mechanism for mapping from a GSH
to a GSR. The GSH must be globally defined for a particular
Instance.
72
73
73
74
GGF calls OGSI the “base for OGSA.” Specifically, there is a relationship between
OGSI and distributed object systems and also a relationship between OGSI and the existing Web
services framework.
Relationship to Distributed Object Systems.
A given grid service implementation is an addressable and potentially
stateful instance that implements one or more interfaces described by WSDL portTypes. Grid
service factories can be used to create instances implementing a given set of portType(s). Each
grid service instance has a notion of identity with respect to the other instances in the distributed
grid. Each instance can be characterized as state coupled with behaviour published through type-
specific operations.
Grid service instances are made accessible to client applications through
the use of a grid service handle and a grid service reference (GSR).A client application can use
a grid service reference to send requests, represented by the operations defined in the portType(s)
of the target service description directly to the specific instance at the specified network-attached
service endpoint identified by the grid service reference.
Client-Side Programming Patterns.
OGSI exploits an important component of the Web services framework:
the use of WSDL to describe multiple protocol bindings, encoding styles, messaging styles, and
so on, for a given Web service. The Web Services Invocation Framework (WSIF) and Java API
for XML RPC (JAX-RPC) are among the many examples of infrastructure software that provide
this capability.
Various tools can take the WSDL description of the Web service and
generate interface definitions in a wide range of programming-language-specific constructs.
A proxy provides a client-side representation of remote service instance’s
interface. Proxy behaviors specific to a particular encoding and network protocol are
encapsulated in a protocol-specific (binding-specific) stub. This includes both application-
specific services and common infrastructure services that are defined by OGSA.
Client Use of Grid Service Handles and References.
A grid service handle (GSH) can be thought of as a permanent network
pointer to a particular grid service instance. The client resolves a GSH into a GSR by invoking
a HandleResolver grid service instance identified by some out-of-band mechanism. The
HandleResolver may have the GSR stored in a local cache. The HandleResolver may need to
invoke another HandleResolver to resolve the GSH.
Relationship to Hosting Environment.
OGSI does not dictate a particular service-provider-side implementation
architecture. A container implementation may provide a range of functionality beyond simple
argument demarshaling.
75
76
Defining grid service description and grid service instance, as organizing principles for
their extension and their use
Defining how OGSI models time
Defining the grid service handle and grid service reference constructs that are used to
refer to grid service instances
Defining a common approach for conveying fault information from operations.
This approach defines a base XML schema definition and associated semantics for
WSDL fault messages to support a common interpretation; the approach simply defines
the base format for fault messages, without modifying the WSDL fault message model.
Defining the life cycle of a grid service instance
WSDL Extensions and Conventions
It uses WSDL as the mechanism to describe the public interfaces of grid services.
er, WSDL 1.1 is deficient in two critical areas: lack of interface (portType) extension and the
inability to describe additional information elements on a portType.
Service Data
The approach to stateful Web services introduced in OGSI identified the need for
a common mechanism to expose a service instance’s state data to service requestors for query,
update, and change notification. The GGF is endeavouring to introduce this concept to the
broader Web services community.
Service data can be exposed for read, update, or subscription purposes. Since
WSDL defines operations and messages for portTypes, the declared state of a service must be
externally accessed only through service operations defined as part of the service interface. To
avoid the need to define service Data-specific operations for each service Data element, the grid
service portType provides base operations for manipulating service Data elements by name.
Elements of the publicly available state exposed by the service’s interface.
Motivation and Comparison to JavaBean Properties.
The OGSI specification introduces the serviceData concept to provide a
flexible, properties-style approach to accessing state data of a Web service. The OGSI
specification has chosen not to require getXXX and setXXX WSDL Operations for each
serviceData element.
Extending portType with serviceData
ServiceData defines a new portType child element named serviceData,
used to define serviceData elements, or SDEs, associated with that portType. These serviceData
element definitions are referred to as serviceData declarations, or SDDs.
ServiceDataValues
Each service instance is associated with a collection of serviceData
elements: those serviceData elements defined within the various portTypes that form the
service’s interface, and also, potentially, additional service-Data elements added at runtime.
OGSI calls the set of serviceData elements associated with a service instance its “serviceData
set.”
Each service instance must have a “logical” XML document, with a root
element of serviceDataValues that contains the serviceData element values. An example of a
serviceDataValues element was given above.
SDE Aggregation within a portType Interface Hierarchy
WSDL 1.2 has introduced the notion of multiple portType extension, and
one can model that construct within the GWSDL namespace. A portType can extend zero or
more other portTypes. There is no direct relationship between a wsdl: service and the portTypes
supported by the service modeled in the WSDL syntax.”
76
77
77
78
Event
Policy and Agreements
Base Data Services
Other Data Services
Discovery Services
Job Agreement Service
Reservation Agreement Service
Data Access Agreement Service
Queuing Service
Open Grid Services Infrastructure
Common Management Model
Monadic model: This is a centralized data repository model. All the data is saved in a central
data repository. When users want to access some data they have to submit requests directly to
78
79
the central repository. No data is replicated for preserving data locality. This model is the
simplest to implement for a small grid. For a large grid, this model is not efficient in terms of
performance and reliability. Data replication is permitted in this model only when fault tolerance
is demanded.
Hierarchical model: The hierarchical model, is suitable for building a large data grid which has
only one large data access directory. The data may be transferred from the source to a second-
level center. Then some data in the regional center is transferred to the third-level center. After
being forwarded several times, specific data objects are accessed directly by users. Generally
speaking, a higher-level data center has a wider coverage area. It provides higher bandwidth for
access than a lower-level data center. KI security services are easier to implement in this
hierarchical data access model.
Federation model: This data access model is better suited for designing a data grid with multiple
sources of data supplies. Sometimes this model is also known as a mesh model. The data sources
are distributed to many different locations. Although the data is shared, the data items are still
owned and controlled by their original owners. According to predefined access policies, only
authenticated users are authorized to request data from any data source. This mesh model may
cost the most when the number of grid institutions becomes very large.
Hybrid model: This data access model combines the best features of the hierarchical and mesh
models. Traditional data transfer technology, such as FTP, applies for networks with lower
bandwidth. Network links in a data grid often have fairly high bandwidth, and other data transfer
models are exploited by high-speed data transfer tools such as GridFTP developed with the
Globus library. The cost of the hybrid model can be traded off between the two extreme models
for hierarchical and mesh-connected grids.
Parallel versus Striped Data Transfers
Compared with traditional FTP data transfer, parallel data transfer opens multiple
data streams for passing subdivided segments of a file simultaneously.
. In striped data transfer, a data object is partitioned into a number of sections, and each section
is placed in an individual site in a data grid. When a user requests this piece of data, a data stream
is created for each site, and all the sections of data objects are transferred simultaneously. Striped
data transfer can utilize the bandwidths of multiple sites more efficiently to speed up data transfer
UNIT III: VIRTUALIZATION (16 marks)
1. Explain NIST Cloud Computing Architecture?
The NIST definition “Cloud computing”
“… a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of
configurable computing resources (e.g., networks, servers, storage, applications, and services)
that can be rapidly provisioned and released with minimal management effort or service provider
interaction.”
79
80
4 deployment models
o Public: Accessible, via the Internet, to anyone who pays
Owned by service providers; e.g., Google App Engine, Amazon Web
Services, Force.com.
o Community: Shared by two or more organizations with joint interests, such as
colleges within a university
o Private: Accessible via an intranet to the members of the owning organization
Can be built using open source software such as CloudStack or OpenStack
Example of private cloud: NASA’s cloud for climate modeling
o Hybrid
A private cloud might buy computing resources from a public cloud.
3 service models
o Cloud Software as a Service (SaaS)
Use provider’s applications over a network
o Cloud Platform as a Service (PaaS)
Deploy customer-created applications to a cloud
o Cloud Infrastructure as a Service (IaaS)
Rent processing, storage, network capacity, and other fundamental computing
resources
5 essential characteristics
o On-demand self-service: consumers can acquire the necessary computational
resources without having to interact with human service providers.
o Ubiquitous network access: cloud features don’t require special devices – laptops,
mobile phones, etc. are generally supported.
o Resource pooling: cloud resources are pooled to serve many customers “… using a
multi-tenant model, with different physical and virtual resources…”
o Rapid elasticity: resources can be allocated and de-allocated quickly as needed.
o Measured service: resource use is measured and monitored; charges are made based
on usage and service type (e.g., storage, CPU cycles, etc.)
2. Explain Cloud Design Objectives?
80
81
81
82
This PaaS model enables a collaborated software development platform for users from different
parts of the world.
Software as a Service (SaaS)
This refers to browser-initiated application software over thousands of cloud customers.
Services and tools offered by PaaS are utilized in construction of applications and management
of their deployment on resources offered by IaaS providers. The SaaS model provides software
applications as a service. As a result, on the customer side, there is no upfront investment in
servers or software licensing. On the provider side, costs are kept rather low, compared with
conventional hosting of user applications. Customer data is stored in the cloud that is either
vendor proprietary or publicly hosted to support PaaS and IaaS. The best examples of SaaS
services include Google Gmail and docs, Microsoft SharePoint, and the CRM software from
Salesforce.com. They are all very successful in promoting their own business or are used by
thousands of small businesses in their dayto- day operations. Providers such as Google and
Microsoft offer integrated IaaS and PaaS services, whereas others such as Amazon and GoGrid
offer pure IaaS services and expect third-party PaaS providers such as Manjrasoft to offer
application development and deployment services on top of their infrastructure services.
At the ISA level, virtualization is performed by emulating a given ISA by the ISA of the
host machine. For example, MIPS binary code can run on an x86-based host machine with the
help of ISA emulation. The basic emulation method is through code interpretation. An interpreter
program interprets the source instructions to target instructions one by one. One source
instruction may require tens or hundreds of native target instructions to perform its function. This
approach translates basic blocks of dynamic source instructions to target instructions. The basic
blocks can also be extended to program traces or super blocks to increase translation efficiency.
Instruction set emulation requires binary translation and optimization. A virtual instruction set
architecture (V-ISA) thus requires adding a processor-specific software translation layer to the
compiler.
Hardware Abstraction Level
Hardware-level virtualization is performed right on top of the bare hardware. On the one
hand, this approach generates a virtual hardware environment for a VM. On the other hand, the
process manages the underlying hardware through virtualization. The idea is to virtualize a
computer’s resources, such as its processors, memory, and I/O devices. The intention is to
upgrade the hardware utilization rate by multiple users concurrently. The idea was implemented
in the IBM VM/370 in the 1960s. More recently, the Xen hypervisor has been applied to
virtualize x86-based machines to run Linux or other guest OS.
Operating System Level
This refers to an abstraction layer between traditional OS and user applications. OS-level
virtualization creates isolated containers on a single physical server and the OS instances to
utilize the hardware and software in data centers. The containers behave like real servers. OS-
level virtualization is commonly used in creating virtual hosting environments to allocate
hardware resources among a large number of mutually distrusting users.
Library Support Level
Most applications use APIs exported by user-level libraries rather than using lengthy
system calls by the OS. Since most systems provide well-documented APIs, such an interface
becomes another candidate for virtualization. Virtualization with library interfaces is possible by
controlling the communication link between applications and the rest of a system through API
hooks. The software tool WINE has implemented this approach to support Windows applications
on top of UNIX hosts.
User-Application Level
Virtualization at the application level virtualizes an application as a VM. On a traditional
OS, an application often runs as a process. Therefore, application-level virtualization is also
known as process-level virtualization. The most popular approach is to deploy high level
language (HLL) VMs. In this scenario, the virtualization layer sits as an application program on
top of the operating system, and the layer exports an abstraction of a VM that can run programs
written and compiled to a particular abstract machine definition. Any program written in the
HLL and compiled for this VM will be able to run on it. The Microsoft .NET CLR and Java
Virtual Machine (JVM) are two good examples of this class of VM.
Relative Merits of Different Approaches
The column headings correspond to four technical merits. “Higher Performance” and
“Application Flexibility” are self-explanatory. “Implementation Complexity” implies the cost to
implement that particular virtualization level. “Application Isolation” refers to the effort required
to isolate resources committed to different VMs. Each row corresponds to a particular level of
virtualization.
83
84
virtual memory to a physical memory mapping, the VMM updates the shadow page tables to
enable a direct lookup.
I/O Virtualization
I/O virtualization involves managing the routing of I/O requests between virtual devices
and the shared physical hardware. At the time of this writing, there are three ways to implement
I/O virtualization: full device emulation, para-virtualization, and direct I/O.
Full device emulation is the first approach for I/O virtualization. Generally, this approach
emulates well-known, real-world devices. All the functions of a device or bus infrastructure,
such as device enumeration, identification, interrupts, and DMA, are replicated in software. This
software is located in the VMM and acts as a virtual device. The I/O access requests of the guest
OS are trapped in the VMM which interacts with the I/O devices.
The para-virtualization method of I/O virtualization is typically used in Xen. It is also
known as the split driver model consisting of a frontend driver and a backend driver. The
frontend driver is running in Domain U and the backend driver is running in Domain 0.They
interact with each other via a block of shared memory. The frontend driver manages the I/O
requests of the guest OSes and the backend driver is responsible for managing the real I/O
devices and multiplexing the I/O data of different VMs.
Virtualization in Multi-Core Processors
Virtualizing a multi-core processor is relatively more complicated than virtualizing
a unicore processor. Though multicore processors are claimed to have higher performance by
integrating multiple processor cores in a single chip, muti-core virtualization has raised some
new challenges to computer architects, compiler constructors, system designers, and application
programmers. There are mainly two difficulties: Application programs must be parallelized to
use all cores fully, and software must explicitly assign tasks to the cores, which is a very complex
problem.
Physical versus Virtual Processor Cores
This technique alleviates the burden and inefficiency of managing hardware
resources by software. It is located under the ISA and remains unmodified by the operating
system or VMM (hypervisor).
Virtual Hierarchy
The emerging many-core chip multiprocessors (CMPs) provides a new
computing landscape. To optimize for space-shared workloads, they propose using virtual
hierarchies to overlay a coherence and caching hierarchy onto a physical processor. Unlike a
fixed physical hierarchy, a virtual hierarchy can adapt to fit how the work is space shared for
improved performance and performance isolation.
6. Explain Live VM Migration Steps and Performance Effects?
In a cluster built with mixed nodes of host and guest systems, the normal method of
operation is to run everything on the physical machine. When a VM fails, its role could be
replaced by another VM on a different node, as long as they both run with the same guest OS.
The advantage is enhanced failover flexibility.
The migration copies the VM state file from the storage area to the host machine.
There are four ways to manage a virtual cluster.
First, you can use a guest-based manager, by which the cluster manager resides on a guest
system. The host-based manager supervises the guest systems and can restart the guest system
on another physical machine.
These two cluster management systems are either guest-only or host-only, but they do
not mix.
85
86
A third way to manage a virtual cluster is to use an independent cluster manager on both
the host and guest systems. This will make infrastructure management more complex.
Finally, you can use an integrated cluster on the guest and host systems.
VM can be in one of the following four states.
An inactive state is defined by the virtualization platform, under which the VM is not
enabled.
An active state refers to a VM that has been instantiated at the virtualization platform to
perform a real task.
A paused state corresponds to a VM that has been instantiated but disabled to process a
task or paused in awaiting state.
86
87
The provisioning of VMs to a virtual cluster is done dynamically to have the following
Interesting properties:
• The virtual cluster nodes can be either physical or virtual machines. Multiple VMs running
with different OSes can be deployed on the same physical node.
• A VM runs with a guest OS, which is often different from the host OS, that manages the
resources in the physical machine, where the VM is implemented.
• The purpose of using VMs is to consolidate multiple functionalities on the same server. This
will greatly enhance server utilization and application flexibility.
• VMs can be colonized (replicated) in multiple servers for the purpose of promoting distributed
parallelism, fault tolerance, and disaster recovery.
• The size (number of nodes) of a virtual cluster can grow or shrink dynamically, similar to the
way an overlay network varies in size in a peer-to-peer (P2P) network.
• The failure of any physical nodes may disable some VMs installed on the failing nodes. But
the failure of VMs will not pull down the host system.
87
88
Cloud OS for Building Private Clouds (VI: Virtual Infrastructure, EC2: Elastic Compute Cloud).
88
89
Techniques for establishing trusted zones for virtual cluster insulation and VM isolation
89
90
of resources and services, in scientific and engineering applications. The shared resources can
be computers, storage, data, services, networks, science instruments (e.g., sensors), and so on.
Figure: Globus Toolkit GT4 supports distributed and cluster computing services
The GT4 Library
The GT4 Library GT4 offers the middle-level core services in grid applications. The
high-level services and tools, such as MPI, Condor-G, and Nirod/G, are developed by third
parties for general purpose distributed computing applications. The local services, such as LSF,
TCP, Linux, and Condor, are at the bottom level and are fundamental tools supplied by other
developers.
Globus Job Workflow
A typical job execution sequence proceeds as follows: The user delegates his credentials
to a delegation service. The user submits a job request to GRAM with the delegation identifier
as a parameter. GRAM parses the request, retrieves the user proxy certificate from the delegation
service, and then acts on behalf of the user. GRAM sends a transfer request to the RFT, which
applies GridFTP to bring in the necessary files.
GRAM invokes a local scheduler via a GRAM adapter and the SEG initiates a set of user jobs.
The local scheduler reports the job state to the SEG. Once the job is complete, GRAM uses RFT
and GridFTP to stage out the resultant files.
90
91
Description, discovery, access, authentication, authorization, and the like. GT4 makes extensive
use of Java, C, and Python to write user code. Web service mechanisms define
specific interfaces for grid computing. Web services provide flexible, extensible, and widely
adopted XML-based interfaces.
These demand computational, communication, data, and storage resources. We must
enable a range of end-user tools that provide the higher-level capabilities needed in specific user
applications. Developers can use these services and libraries to build simple and complex
systems quickly.
Figure: Client and GT4 server interactions; vertical boxes correspond to service programs and
horizontal boxes represent the user codes.
The horizontal boxes in the client domain denote custom applications and/or third-party
tools that access GT4 services. The toolkit programs provide a set of useful infrastructure
services.
Three containers are used to host user-developed services written in Java, Python, and C,
respectively. These containers provide implementations of security, management, discovery,
state management, and other mechanisms frequently required when building services.
91
92
92
93
93
94
The Hadoop Core MapReduce framework requires a shared file system. This shared file
system does not need to be a system-level file system, as long as there is a distributed file system
plug-in available to the framework.
The Hadoop Core framework comes with plug-ins for HDFS, CloudStore, and S3. Users
are also free to use any distributed file system that is visible as a system-mounted file system,
such as Network File System (NFS), Global File System (GFS), or Lustre.
The Hadoop Distributed File System (HDFS)MapReduce environment provides the user with
a sophisticated framework to manage the execution of map and reduce tasks across a cluster of
machines.
The user is required to tell the framework the following:
• The location(s) in the distributed file system of the job input
• The location(s) in the distributed file system for the job output
• The input format
• The output format
• The class containing the map function
• Optionally. the class containing the reduce function
• The JAR file(s) containing the map and reduce functions and any support classes
The final output will be moved to the output directory, and the job status will be reported
to the user.MapReduce is oriented around key/value pairs. The framework will convert each
record of input into a key/value pair, and each pair will be input to the map function once. The
map output is a set of key/value pairs—nominally one pair that is the transformed input pair. The
map output pairs are grouped and sorted by key. The reduce function is called one time for each
key, in sort sequence, with the key and the set of values that share that key. The reduce method
may output an arbitrary number of key/value pairs, which are written to the output files in the
job output directory. If the reduce output keys are unchanged from the reduce input keys, the
final output will be sorted. The framework provides two processes that handle the management
of MapReduce jobs:
• TaskTracker manages the execution of individual map and reduce tasks on a compute
node in the cluster.
• JobTracker accepts job submissions, provides job monitoring and control, and manages
the distribution of tasks to the TaskTracker nodes.
The JobTracker is a single point of failure, and the JobTracker will work around the
failure of individual TaskTracker processes.
The Hadoop Distributed File System
94
95
HDFS is a file system that is designed for use for MapReduce jobs that read input in large chunks
of input, process it, and write potentially large chunks of output. HDFS does not handle random
access particularly well. For reliability, file data is simply mirrored to multiple storage nodes.
This is referred to as replication in the Hadoop community. As long as at least one replica of a
data chunk is available, the consumer of that data will not know of storage server failures.
HDFS services are provided by two processes:
• NameNode handles management of the file system metadata, and provides management and
control services.
• DataNode provides block storage and retrieval services.
There will be one NameNode process in an HDFS file system, and this is a single point
of failure. Hadoop Core provides recovery and automatic backup of the NameNode, but no hot
failover services. There will be multiple DataNode processes within the cluster, with typically
one DataNode process per storage node in a cluster.
IdentityReducer.java
package org.apache.hadoop.mapred.lib;
95
96
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.MapReduceBase;
/** Performs no reduction, writing all input values directly to the output. */
public class IdentityReducer<K, V>
extends MapReduceBase implements Reducer<K, V, K, V> {
Chapter 2 ■ THE BASICS OF A MAPREDUCE JOB 35
/** Writes all keys and values directly to output. */
public void reduce(K key, Iterator<V> values,
OutputCollector<K, V> output, Reporter reporter)
throws IOException {
while (values.hasNext()) {
output.collect(key, values.next());
}
}
If you require the output of your job to be sorted, the reducer function must pass the key objects
to the output.collect() method unchanged. The reduce phase is, however, free to output any
number of records, including zero records, with the same key and different values.
96
97
It is also possible to run a secondary namenode, which despite its name does not act as a
namenode. Its main role is to periodically merge the namespace image with the edit log to prevent
the edit log from becoming too large. The secondary namenode usually runs on a separate
physical machine, since it requires plenty of CPU and as much memory as the namenode to
perform the merge. It keeps a copy of the merged namespace image, which can be used in the
event of the namenode failing.
HDFS Federation
The namenode keeps a reference to every file and block in the filesystem in memory,
which means that on very large clusters with many files, memory becomes the limiting factor for
scaling.
HDFS Federation, introduced in the 0.23 release series, allows a cluster to scale by adding
namenodes, each of which manages a portion of the filesystem namespace. For example, one
namenode might manage all the files rooted under /user, say, and a second
Namenode might handle files under /share.Under federation, each namenode manages
a namespace volume, which is made up of the metadata for the namespace, and a block
pool containing all the blocks for the files in the namespace. Namespace volumes are
independent of each other, which means namenodes do not communicate with one another, and
furthermore the failure of one namenode does not affect the availability of the namespaces
managed by other namenodes.
Block pool storage is not partitioned, however, so datanodes register with each namenode in the
cluster and store blocks from multiple block pools.
HDFS High-Availability
The combination of replicating namenode metadata on multiple filesystems, and using
the secondary namenode to create checkpoints protects against data loss, but does not provide
high-availability of the filesystem. The namenode is still a single point of failure (SPOF), since
if it did fail, all clients—including MapReduce jobs—would be unable to read, write, or list files,
because the namenode is the sole repository of the metadata and the file-to-block mapping. In
such an event the whole Hadoop system would effectively be out of service until a new namenode
could be brought online. In the event of the failure of the active namenode, the standby takes
over its duties to continue servicing client requests without a significant interruption.
A few architectural changes are needed to allow this to happen:
• The namenodes must use highly-available shared storage to share the edit log.
When a standby namenode comes up it reads up to the end of the shared edit log to synchronize
its state with the active namenode, and then continues to read new entries as they are written by
the active namenode.
• Datanodes must send block reports to both namenodes since the block mappings are
stored in a namenode’s memory, and not on disk.
• Clients must be configured to handle namenode failover, which uses a mechanism that
is transparent to users.
If the active namenode fails, then the standby can take over very quickly since it has the latest
state available in memory: both the latest edit log entries, and an up-to-date block mapping. The
actual observed failover time will be longer in practice (around a minute or so), since the system
needs to be conservative in deciding that the active namenode has failed.
Failover and fencing
The transition from the active namenode to the standby is managed by a new entity in the
system called thefailover controller. Failover controllers are pluggable, but the first
implementation uses ZooKeeper to ensure that only one namenode is active. Each namenode
97
98
runs a lightweight failover controller process whose job it is to monitor its namenode for failures
and trigger a failover should a namenode fail.
Failover may also be initiated manually by an administrator, in the case of routine
maintenance, for example.
In the case of an ungraceful failover, however, it is impossible to be sure that the failed
namenode has stopped running. The HA implementation goes to great lengths to ensure that the
previously active namenode is prevented from doing any damage and causing corruption—a
method known as fencing. The system employs a range of fencing mechanisms, including killing
the namenode’s process, revoking its access to the shared storage directory, and disabling its
network port via a remote management command. As a last resort, the previously active
namenode can be fenced with a technique rather graphically known as STONITH, or “shoot the
other node in the head”, which uses a specialized power distribution unit to forcibly power down
the host machine. Client failover is handled transparently by the client library. The simplest
implementation uses client-side configuration to control failover. The HDFS URI uses a logical
hostname which is mapped to a pair of namenode addresses, and the client library tries each
namenode address until the operation succeeds.
5. Explain Anatomy of a File Read?
The client opens the file it wishes to read by calling open () on the FileSystem object,
which for HDFS is an instance of DistributedFileSystem. DistributedFileSystem calls the
namenode, using RPC, to determine the locations of the blocks for the first few blocks in the file.
The namenode returns the addresses of the datanodes that have a copy of that block.
If the client is itself a datanode ,then it will read from the local datanode, if it hosts a copy
of the block .TheDistributedFileSystem returns an FSDataInputStream to the client for it to read
data from. FSDataInputStream in turn wraps a DFSInputStream, which manages the datanode
and namenode I/O.
98
99
The client then calls read () on the stream. DFSInputStream, which has stored the datanode
addresses for the first few blocks in the file, then connects to the first (closest) datanode for the
first block in the file. Data is streamed from the datanode back to the client, which calls read
() repeatedly on the stream. When the end of the block is reached, DFSInputStream will close
the connection to the datanode, then find the best datanode for the next block. This happens
transparently to the client, which from its point of view is just reading a continuous stream.
Blocks are read in order with the DFSInputStream opening new connections to datanodes
as the client reads through the stream. It will also call the namenode to retrieve the datanode
locations for the next batch of blocks as needed. When the client has finished reading, it
calls close () on the FSDataInputStream .
99
100
100
101
First the pipeline is closed, and any packets in the ack queue are added to the front of the
data queue so that datanodes that are downstream from the failed node will not miss any packets.
The current block 0on the good datanodes is given a new identity, which is communicated to the
namenode, so that the partial block on the failed datanode will be deleted if the failed. Datanode
recovers later on. The failed datanode is removed from the pipeline and the remainder of the
block’s data is written to the two good datanodes in the pipeline. The namenode notices that the
block is under-replicated, and it arranges for a further replica to be created on another node.
Subsequent blocks are then treated as normal. It’s possible, but unlikely, that multiple datanodes
fail while a block is being written. As long as dfs.replication.min replicas (default one) are
written, the write will succeed, and the block will be asynchronously replicated across the cluster
until its target replication factor is reached.
When the client has finished writing data, it calls close () on the stream (step 6). This
action flushes all the remaining packets to the datanode pipeline and waits for acknowledgments
before contacting the namenode to signal that the file is complete (step 7). The namenode already
knows which blocks the file is made up.
101
102
102
103
Deterministic model is a model which does not take uncertainty into account.
12. What is an example of Descriptive model?
An opinion poll, any survey.
13. Define the term Iconic model with example.
This is a physical, or pictorial representation of various aspects of a system.
Example: Toy, Miniature model of a building, scaled up model of a call in biology etc.
14. What are slack variables?
The non-negative variable which added to LHS of the constraint to convert the
inequality ‘≤’,into an equation is called the slack variable.
n
a j 1
ij xi si bi i 1,2,...m Where si are called the slack variables.
a
j 1
ij xi si bi i 1,2,...m
The non-negative variable which is removed from LHS of the constraint to convert the
inequality ‘≥’,into an equation is called the surplus variable.
16. What is meant by decision variable?
While making mathematical modeling of operation research problems, the variables
which are used and the value of which gives the solution are the decision variables.
17. Define artificial variable.
Any non-negative variable which is introduced in the constraint in order to get the
initial basic feasible solution is called artificial variable.
18. What are the methods used to solve an LPP involving artificial variables?
(i) Big M method or penalty method
(ii) Two-phase simplex method
19. What is degeneracy?
A solution is degenerate if one or more basic variables vanished.
20. Explain the importance of the LPP.
In LPP, all the decision variables were allowed to take any non-negative real values as
it is quite possible and appropriate to have fractional values in many solutions and
which are meaningless in the content of the actual decision problem. This is the main
reason where LPP is so important for marginal decisions.
PART- B
1. a. Explain the scope of Operation Research.
(8)
b. List the phases of OR and explain them.
(8)
2. a. Explain classification of models
(8)
b. What is an iconic model in the study of operations research?
(8)
3. a. A manufacturer of packing material manufactures two types of packing tins, Round
and flat. Major production facilities involved are cutting and joining. The cutting
department can process 300 round tins or 500 flat tins per hour. The joining
department can process 500 round tins or 300 flat tins per hour. The contribution
towards profit for a round tin is the same as that of a flat tin. Formulate a linear
103
104
104
105
2x1 +3 x2 ≥ 12 and x1 , x2 ≥ 0
(8)
b. Apply graphical method to find non-negative values of x1 and x2 which
minimize z = 10x1+25x2 subject to x1+x2 ≥ 50,x1 ≥ 20 and x2 ≤ 40.
7. a. Describe simplex method for solving linear programming problem.
(8)
b. Express the following LPP in the canonical form, Maximize z = 3x1+x2
Subject to x1+2x2 ≥ -5
3x1+5x2 ≤ 6 and x1 , x2 ≥ 0
(8)
8. a. Use simplex method to solve the following LPP, Maximize z = 4x1+10x2
Subject to the constraints 2x1+x2 ≤ 50
2x1+5x2 ≤ 100
2x1+3x2 ≤ 90 and x1 , x2 ≥ 0
(8)
b. Solve the following problem by simplex method, Minimize z = x 1-3x2+2x3
Subject to the constraints 3x1- x2 + 2x3 ≤ 7
-2x1+4x2 ≤ 12
-4x1+3x2+8x3 ≤ 10 and x1 , x2,x3 ≥ 0
(8)
10. Solve the following LPP by simplex method. (AU
2016)
Maximize z = 4x1+x2+3x3+5x4
Subject to the constraints 4x1-6 x2-5 x3+4 x4 ≥-20
3x1-2 x2+4x3+x4 ≤ 10
8x1-3x2+3x3+2 x4 ≤ 20 and x1 , x2,x3, x4 ≥ 0
(16)
9. a. Solve by simplex method: Maximize z = 3x1+5x2+4x3
Subject to the constraints 2x1+3 x2 ≤ 8
2x2+5x3 ≤ 10
3x1+2x2+4x3 ≤ 15 and x1 , x2,x3 ≥ 0
(8)
b. A manufacturer is engaged in producing 2 products X and Y, the contribution margin
being Rs.15 and Rs.45 respectively. A unit of product X requires 1 unit of facility A
and 0.5 unit of facility B.A unit of product Y requires 1.6 units of facility A,2.0 units of
facility B and 1 unit of raw material C. The availability of total facility A and b and raw
material c during a particular time period are 240,162and 50 units respectively. Find
out the product mix which will maximize the contribution margin, by simplex
method.
(8)
10. a. Use Two phase simplex method to Maximize z = 5x1-4x2+3x3
Subject to the constraints 2x1+x2-6x3 = 20
6x1+5x2+10x3 ≤ 76
8x1-3x2+6x3 ≤ 50 and x1 , x2,x3 ≥ 0
(8)
b. Use Two phase simplex method to Maximize z = - 4x1 - 3x2 - 9x3
Subject to the constraints 2x1+4x2+6x3 ≥ 15
105
106
5. What is the difference between regular simplex method and dual simplex
method?
In regular simplex method we first determine the entering variable and then the
leaving variable while in the case of dual simplex method we first determine the
leaving variable and then the entering variable.
6. What do you understand by transportation problem?
T.P is a special class of LPP in which we transport the commodity (single
product) from the source to a destination in such a way that the total transportation
problem is minimum.
7. list any three approaches used with T.P for determining the starting solution.
(i) North west corner rule
(ii) Least cost method (or) Matrix minima method
(iii) Vogel’s approximation method
8. What do you mean by degeneracy in a T.P?
If the number of occupied cells in a mxn T.P is less than m+n-1 then it’s called a
degeneracy in a T.P.
9. What do you mean by an unbalanced T.P?
Any T.P is said to be unbalanced if
m n
ai
i 1
b
j 1
j
square matrix.
(ii) supply and demand at any source at any destination may be positive quantity
ai , b j in T.P whereas in A.P it will be 1 i.e., a i b j =1
14.What is the objective of the travelling salesman problem?
The objective of the travelling salesman problem is that the salesman has to visit
Various cities, not visiting twice to the same place and return to the starting place
by spending minimum transportation cost.
15.State the necessary and sufficient condition for a transportation problem to have a
solution.
(AU 2016)
Shortest – Path problem which the objective is to find the shortest distance and the
corresponding path from a given source node to a given destination node in a given
distance network.
16. What is the name of the method used in getting the optimum assignment?
Hungarian Method
PART-B
1. Find the maximum of Z = 6x+8y subject to 5x+2y ≤ 20 , x+2y ≥10 , x,y ≥ 0 by
(16)
solving its dual problem.
2. Solve by Dual simplex method the following LPP
(16)
Minimize Z = 5x1+6x2
Subject to the constraints
x1+x2≥2,
4x1+x2≥ 4
x1, x2 ≥0
O1 6 4 1 5 14
O2 8 9 2 7 16
O3 4 3 6 2 5
Required 6 10 15 4 35
(b)Obtain an initial basic feasible solution to the following TP using Matrix
minima method
(8)
D1 D2 D3 D4 Supply
O1 1 2 3 4 6
O2 4 3 2 0 8
O3 0 2 2 1 10
Required 4 6 8 6 24
6. Obtain an initial basic feasible solution to the following TP using VAM (16)
D1 D2 D3 D4 Supply
O1 11 13 17 14 250
O2 16 18 14 10 300
O3 21 24 13 10 400
7. Solve the following transportation problem starting with the initial solution obtained by
VAM
(16)
D1 D2 D3 D4 Supply
O1 2 2 2 1 3
O2 10 8 5 4 7
O3 7 6 6 8 5
Required 4 3 4 4 15
109
110
8. Consider the problem of assigning four sales persons to four different sales regions as shown
in the following table such that the total sales is maximized. (AU
2016)(16)
Sales region
1 2 3 4
1 10 22 12 14
Salesman 2 16 18 22 10
3 24 20 12 18
4 16 14 24 20
The cell entries represent annual sales figures in lakhs of rupees. Find the optional
allocation of the sales persons to different regions.
8. A company has 3 plants A,B, and C , three warehouses X,Y,Z. A number of units
(16)
available at the plants is 60,70,80 and the demand at X,Y,Z are 50,80,80
respectively. The unit cost of the transportation is given in the following table
X Y Z
A 8 7 3
B 3 8 9
C 11 3 5
110
111
2004 490 -- --
(b)Assuming that a
car must be kept in service at least 2 years, with a maximum service
life of 4 years. The planning horizon is from the start of 2001 to the end of 2005. The
following table provides the necessary data (8)
UNIT-III
INTEGER DYNAMIC PROGRAMMING
PART-A
1. What do you mean by integer programming problem?
An LPP in which some or all of the variables in the optimal solution are restricted to
assume non – negative integer values is called an integer programming problem.
2. Define a pure integer programming problem.
In a LPP if all the variables in the optimal solution are restricted to assume non-
negative integer values then it is called a pure IPP.
3. Mention some important applications of integer programming problem. (AU 2016)
Capital budgeting, Construction scheduling, Routing and shipping schedule, Capacity
expansion.
4. Define a mixed integer programming problem.
In a LPP if only some of the variables in the optimal solution are restricted to assume
non – negative integer values, while the remaining variables are free to take any non-
negative values then it is called a mixed IPP.
5. Differentiate between pure and mixed IPP.
In a pure IPP all the variables in the optimal solution are restricted to assume non-
negative integer values. Whereas in mixed IPP, only some of the variables in the optimal
solutions are restricted to assume non-negative integer values.
6. What are the methods used in solving IPP
(AU 2016)
There are two methods namely
1. Cutting methods (Gomory’s cutting plane method)
2. Search method ( Branch and Bound typing)
7. Explain Gomorian constraint (or) Fractional Cut constraint.
A new constraint introduced to the problem such that the new set of feasible solution
includes all the original feasible integer solution but does not include the optimum non-
integer solution initially found. This new constraint is called Gomorian constraint (or)
Fractional Cut constraint
7. Where is branch and bound method used?
111
112
This method is an enumeration method which is used when all feasible integer
points are not enumerated.
8.What is dynamic programming?
Many decision making problems involve a process that takes place in multiple
stages in such a way that in each stage the process is dependent and the strategy
chosen. Such type of problems are called dynamic programming problem.
9. Define the terms in dynamic programming : stage, state ,state variables
Stage : A stage may be defined as a portion of the problem that possess a set of mutually
exclusive alternatives from which the best alternative is to be selected.
State : States are various possible conditions in which the system may find itself at that
stage of the problem
State Variables : The current situation of the system at a stage is described by a set of
variables called state variables.
10. Give a few applications of DPP.
(i) It is used to determine the optimal combination of advertising media (TV, radio,
newspapers) and the frequency of advertising.
(ii) It has been used to determine the inventory level and for formulating the
inventory recording.
11. State Bellman’s principle of optimality.
It states that “an optimal policy has the property that whatever be the initial state and the
initial decisions , the remaining decisions must constitute an optimal policy for the state
resulting from the first decision”.
12. What are the advantages of Dynamic programming?
The decision making process consist of selecting a combination of plans from a large
number alternative combinations, which also need a lot of computational work, where too
much time is involved. Also the number of combinations is very large. These drawbacks
can be avoided by using DPP as it divides a given problem in sub-problems or stages.
Only one stage is considered at a time and various feasible combinations are eliminated
by reducing the volume of computations.
13. Write the any two need of dynamic programming ?
(i)All the decisions of a combination are specified
(ii)The number of combinations is so large.
14. Write the any two characteristics of dynamic programming problems?
(i) The problem can be divided in to stages, with a policy decision required at each
stage.
(ii) The effect of the policy decision at each stage is to transform the current state into a
state associated with the next stage.
15. Define forward computational procedure.
If the dynamic programming problem is solved by using the recursive equation starting
from the first through the last stage, i.e., obtaining the sequence f1→f2→f3→……fn of the
optimal solutions. This computation is called the forward computational procedure.
PART B
1. Find an optimum integer solution to the following LPP
(16)
Maximize Z = x1+2x2
Subject to the constraints
2x2≤ 7,
x1+x2≤ 7
112
113
2x2≤ 11
x1, x2 ≥0 , x1, x2 are integers
2. Solve the following integer programming problem (16)
Maximize Z = 2x1+20 x2-10x3
Subject to the constraints
2x1+20x2+4x3 ≤15,
6x1+20x2+4x3 =20,
x1, x2, x3 ≥0 and are integers
3. Solve the following LPP.
(AU 2016)(16)
Minimize Z = -2x1-3x2
Subject to the constraints
2x1+2x2≤ 7
x1 ≤ 2
x2 ≤ 2
and x1, x2 ≥0 and are integer
4. A manufacturer of baby-dolls makes two types of dolls, doll X and doll Y. Processing of
these two dolls is done on two machines, A and B. Doll X requires two hours on machine
A and six hours on machine B. Doll Y requires five hours on machine A and also five
hours on machine B. There are sixteen hours of time per day available on machine A
and thirty hours on machine B. The profit gained on both the dolls is same, ie., one
rupee per doll. What should be the daily production of each of the two dolls?
(a) Set up and solve the I.P.P
(b) If the optimal solution is not integer valued, use the Gomory technique to derive the
optimal solution.
(16)
5. Using Gomory’s cutting plane method
(16)
Maximize Z = 2x1+2x2
Subject to the constraints
5x1+3x2≤8,
2x1+4x2≤ 8
x1, x2 ≥0 and all are integers
6. Solve the following mixed integer programming problem by using Gomory’s cutting
plane method
(16)
Maximize Z = x1+x2
Subject to the constraints
3x1+2x2≤ 5,
x2 ≤ 2
x1, x2 ≥0 , x1 is an integer
7. Solve the following mixed integer programming problem:
(16)
Maximize Z = x1+x2
Subject to the constraints
2x1+5x2≤16
6x1+5x2≤ 30
x2 ≥0,x1,non negative integers
113
114
Maximize Z = x1+4x2
Subject to the constraints
2x1+4x2≤ 7,
5x1+3x2≤15
x1, x2 ≥0 and are integers
12. Use Branch and Bound technique to solve the following (16)
Maximize Z = 2x1+2x2
Subject to the constraints
5x1+3x2≤ 8,
x1+2x2≤4
x1, x2 ≥0 and integers
13. Use Branch and Bound technique to solve the following (16)
Maximize Z = 3x1+4x2
Subject to the constraints
7x1+16x2≤ 52,
3x1-2x2≤18
x1, x2 ≥0 and integers
14. Use dynamic programming to solve the following LPP
(16)
Maximize Z = x1+9x2
Subject to the constraints
2x1+x2≤25
x2≤11,
114
115
x1, x2 ≥0
15. A student has to take examinations is three courses A,B and C. He has three
days available for study. He feels is would be best to denote a whole day to the
study of the same course. So that he may study a course for one day, two days or
three days or not at all. His estimates of grades he may get by study are as follows
Course
/ study A B C
days
0 0 1 0
1 1 1 1
2 1 3 3
3 3 4 3
How should he plan to study so that he maximizes the sum of his grades. (AU
2016)(16)
16. The owner of a chain of four grocery stores has purchased six crates of fresh (16)
strawberries. The estimated probability distribution of potential sales of the
strawberries before spoilage differ among four stores. The following table gives the
estimated total expected profit at each store when various number of crates are
allocated to it. For administrative reasons, the owner does not wish to split crates
between stores. However, he is willing to distribute zero crates to any of his stores.
Find the allocation of six crates to four stores so as to maximize the expected profit
1 4 2 6 2
2 6 4 8 3
3 7 6 8 4
4 7 8 8 4
5 7 9 8 4
6 7 10 8 4
x1, x2 ≥0
UNIT – IV
CLASSICAL OPTIMIZATION THEORY
PART -A
116
117
z g1
.
C mxnm z g .
Define is called the Control matrix
.
g
z m
6. Define sensitivity analysis in the Jacobean method
The Jacobean method can be used to study the effect of small changes in the right
hand side of the constraints on the optimal value of f. what is the effect of changing
gi(X)=0 to gi(X)= g i on the optimal value of f. this type of investigation is called
sensitivity analysis
117
118
2 ) [ X f( X * ) X g( X ) ]X
* * *
= 0
3) X
*
0
4) g( X * ) b
5 ) ( g( X ) b )
* *
= 0
6)
*
0
13. Define maximization type objective function.
The stationary points will give the maximum objective function value if the sign
of each of the last (n-m) principal minor determinants of the bordered Hessian matrix
is the same as that of (-1)m+1, ending with the (2m+1)th principal minor determinant.
14. Define minimization type objective function.
The stationary points will give the minimum objective function value if the sign of
each of the last (n-m) principal minor determinants of the bordered Hessian matrix is
the same as that of (-1)m, ending with the (2m+1)th principal minor determinant.
15.Examine f(x)=6x5-4x3+10 for extreme points (AU 2016)
f(x)=6x5-4x3+10
f’(x)=30x4-12x2
f’(x)=0 implies x=0, x2=2/5
PART - B
g 2 ( X ) x1 5 x1 x 2 x3 7 0
2 2
118
119
f(X) x1 x 2 x3 x 4
2 2 2 2
Maximize
subject to g 1 (X) x 1 2 x 2 3 x3 5 x 4 10 0
g 2 ( X ) x 1 2 x 2 5 x3 6 x 4 15 0
Show that by selecting x3 and x4 as independent variables, the Jacobean method fails
to provide a solution and state the reason
(8)
3.(a) Consider the linear program, by Lagrangean method (8)
Maximize f(X) x1 x 2 x3
2 2 2
subject to g 1 (X) x 1 x 2 3 x3 2 0
g 2 ( X ) 5x 1 2 x 2 x3 5 0
(b) Solve the following linear programming problem, by Jacobean and Lagrangean
methods
(8)
Maximize f(X) 5 x1 3 x 2
subject to g 1 (X) x 1 2 x 2 x3 6 0
g 2 ( X ) 3x 1 x 2 x 4 9 0
x 1 , x 2 , x3 , x 4 0
subject to g 1 (X) x 1 x 2 x3 5 0
2
g 2 ( X ) x 1 5 x 2 x3 7 0
Suppose that g1(X)=.01 and g2(X)=.02. Find the corresponding change in the optimal
Value of f(X)
(8)
(b) Solve the following nonlinear programming problem using Lagrangean method
(8)
Maximize z 4 x1 0.02 x1 x 2 0.02 x 2
2 2
x 1 2 x 2 120
x 1 , and x 2 , 0
5.(a) Solve the following nonlinear programming problem using Lagrangean method (8)
Maximize z 2 x1 3x 2 18 x 2
2 2
2 x 1 x2 8
x 1 , and x 2 , 0
119
120
(b) Solve the following nonlinear programming problem using Lagrangean method (8)
Maximize z x1 2 x 2 x3
2 2 2
2 x 1 x 2 2 x3 30
x 1 , x 2 and x3 , 0
6.(a) Solve the following nonlinear programming problem using Lagrangean method (8)
Maximize z x1 2 x 2 1.5 x3
2 2 2
2 x 1 2 x 2 3 x3 30 0
3 x 1 4 x 2 4 x3 20 0
x 1 , x 2 and x3 , 0
(b) Solve the following nonlinear programming problem using Lagrangean multipliers
Method
(8)
Maximize z 4 x1 2 x 2 x3 4 x1 x 2
2 2 2
subject to x 1 x 2 x3 15
2 x 1 x 2 2 x3 20
x 1 , x 2 and x3 , 0
7.(a) Solve the following nonlinear programming problem using Lagrangean multipliers
Method
(8)
Maximize z x1 x 2 x3
2 2 2
subject to x 1 x 2 3 x3 2
5x 1 2 x 2 x3 5
x 1 , x 2 and x3 , 0
(b) Obtain the necessary condition for the optimum solution of the following nonlinear
Programming problem (NLLP)
(8)
Minimize z 2 x1 24 x1 2 x 2 8 x 2 2 x3 12 x3 200
2 2 2
(b) Solve the nonlinear Programming problem (NLLP) using the method of lagrangian
Multipliers
(8)
120
121
Optimize z 4 x1 2 x 2 x3 4 x1 x 2
2 2 2
Minimize z 6 x1 5x2
2 2
(8)
z 3 x1 14 x1 x 2 8 x 2
2 2
Maximize
subject to 3x 1 6 x 2 72
x1 , x 2 0
11.(a) Solve the following nonlinear programming problem using Kuhn-Tucker conditions
(8)
z x1 x1 x 2 2 x 2
2 2
Maximize
subject to 4x 1 2 x 2 24
5x 1 10 x 2 30
x1 , x 2 0
(b) Solve the following nonlinear programming problem using Kuhn-Tucker conditions
(8)
121
122
z x1 x1 x 2 2 x 2
2 2
Maximize
subject to 4x 1 2 x 2 24
x1 , x 2 0
12.(a) Solve the following nonlinear programming problem using Kuhn-Tucker conditions
(8)
z 8 x1 10 x 2 x1 x 2
2 2
Maximize
subject to 3x 1 2 x 2 6
x1 , x 2 0
(b) Solve the following nonlinear programming problem using Kuhn-Tucker conditions
(8)
f(X) x1 x 2 x1 x3
3 2 3
Maximize
x 1 x 2 x3 5
2
subject to
5x 1 x 2 x3 2
2 2
x 1 , x 2 ,x 3 0
13.(a) Solve the following nonlinear programming problem using Kuhn-Tucker conditions
(8)
f(X) x1 x 2 5 x1 x 2 x3
4 2
Minimize
x 1 x 2 x3 10
2 2 3
subject to
x 1 x 2 4 x3 20
3 2 3
x 1 , x 2 ,x 3 0
122
123
Minimize z 2 x1 3 x 2 x1 2 x 2
2 2
Maximize z 3 x1 x 2
x 1 3x2 5
2 2
subject to the constraints
x 1 x2 1
x1 , x 2 0
UNIT-V
OBJECT SCHEDULING
PART-A
1. Define activity?
This is a task or job of work, which takes or consumes time and resources. For example,
“Build a wall”, Dig foundations for a building”, “Verify the names of debtors in a sales ledger’,
etc.
An activity is represented in a network by an arrow. The tail of the arrow indicates where the
task begins and the head where the task ends. The arrow points from left to right and it is not
drawn to scale.
2. Define event?
This is a point in time and indicates the “start” or “finish” of an activity or activities. An
event is represented in a network by a circle or node .
3. Define dummy activity?
This is an activity, which does not consume time or resources. It issued to merely show
clear, logical dependencies between activities so as not to violets the rules for drawing networks.
It is represented in a network by dotted arrow thus.
4. Define Network?
This is the combination of activities, dummy activities and events in logical sequence
according to the rules for drawing networks.
5. State the rules in drawing a NEWORK DIAGRAM?
i. Node 1 represents the ‘start” of the project. An arc or arcs should lead from node 1 to
represent such activity that has no predecessor. A network has only one “start” node.
ii. A node (called the “finish” node) representing the completion of the project
should be included in the network. A network has only one “finish” node.
iii. Number the nodes in the network so that the node representing the completion of an
activity always has a larger number than the node representing the beginning of an
activity.
(There may be more than one numbering scheme)
iv. An activity should not be represented by more than one arc in the network
v. Two nodes can be connected by at most one arc. That is activities should not share the
same “start” node (tail event) and “finish” node (head event).
vi. Every activity must have one preceding event (tail event) and one succeeding event(head
event).
6. State the FULKERSON’S RULE?
123
124
Calculations begin from the ‘finish’ node and move to the ‘start’ node. At each node a number
is computed representing the Latest Completion Time (LCT) the corresponding event.
10 .Define FLOAT?
Float (spare time or slack time) is the amount of time a path of activities could be
delayed without delaying the overall project. A float can only be associated with
activities, which are non-critical. By definition, activities on the critical path cannot have
a float (spare time). In other words, the float for an activity is the difference between the
maximum time available for that activity and the duration of that activity. The float for an
activity with given by:
Float = Latest Start Time – Earliest Start Time (F= LST– EST ) or (LCT-ECT)
11.What is independent float time?
The independent float time of an activity is the amount by which the duration of an
activity could be extended without affecting the total project time, the time available for
subsequent activities or the time available for the preceding activities. = [Free Floatij – (Slack
of event i)] or ZERO, whichever is higher. Also EST of following activity – LFT of preceding
activity – Duration of current activity or Zero, whichever is higher.
12. What is interfering float time?
The interfering float time is the part of total float which causes a reduction in the float
of successor activities. It is that portion of the activity float which cannot be consumed without
affecting adversely the float of the subsequent activity or activities. = LFT – (EST of following
activity) or ZERO, whichever is higher.
13. What are the benefits of PERT?
PERT is useful because it provides the following information:
Expected project completion time.
Probability of completion before a specified date.
The critical path activities that directly impact the completion time.
The activities that have slack time and that can lend resources to critical path
activities.
Activity starts and end dates.
14. What are the steps in the pert planning process?
PERT planning involves the following steps:
Identify the specific activities and milestones.
Determine the proper sequence of the activities.
Construct a network diagram.
Estimate the time required for each activity.
Determine the critical path.
Update the PERT chart as the project progresses.
15. What do you mean by project crashing?
Project Crashing: There are usually compelling reasons to complete the project
earlier than the originally estimated duration of critical path computed on the
normal basis of a new project.
Direct Cost: This is the cost of the materials, equipment and labour required to perform
the activity. When the time duration is reduced the project direct cost increases.
Activity Cost Slope = (Cc- Nc)÷(Nt-Ct)
Where, Cc = Crash Cost = Direct cost that is anticipated in completing
an
activity within crash time. Nc = Normal Cost = This is the lowest possible
direct cost required to complete an activity Nt = Normal Time = Min.
time
125
126
A 3 wks C
2 wks D 2 wks
3 wks
1 wk End
Start
4 wks
9 wks
2 wks
B E
126
127
PART-B
1. a. Depict the following dependency relationships by means of network diagrams.(The
Alphabets stands for activities)
1. A and B control F; B and C control G.
2. A and B control F; B controls G while C controls G and H.
3. A controls F and G; B controls G while C controls G and H.
4. A controls F and G; B and C control G with H depending upon C.
5. F and G are controlled by A, G and H are controlled by B with H controlled by B and C.
127
128
b. Construct the project network comprised of activities A to P that satisfies the following
precedence relationships:
(a) A,B and C, the first activities of the project can be executed concurrently
(b) D,E and F follow A
(c) I and G follow both B and D
(d) H follows both C & G
(e) K and L follow I
(f) J succeeds both E and H
(g) M and N succeed F, but cannot start until both E and H are completed.
(h) O succeeds both M and I
(i) P succeeds J,L and O
(j) K,N and P are the terminal activities of the project. (8)
3. a. Construct the project network comprised of activities A to P that satisfies the following
precedence relationships:
(a) A,B and C, the first activities of the project can be executed concurrently
(b) D,E and F follow A
(c) I and G follow both B and D
(d) H follows both C & G
(e) K and L follow I
(f) J succeeds both E and H
(g) M and N succeed F, but cannot start until both E and H are completed.
(h) O succeeds both M and I
(i) P succeeds J,L and O
(j) K,N and P are the terminal activities of the project. (8)
128
129
4. a. The footing of a building can be completed in four consecutive sections. The activities
for
each section include (1) digging, (2) placing steel, and (3) pouring concrete. The
digging
of one section cannot start until that of the preceding section has been completed. The
same restriction applies placing steel & pouring concrete. Develop the project network.
1-3 1 5-7 8
2-4 1 6-8 1
3-4 1 7-8 2
3-5 6 8-9 1
4-9 5 8-10 8
9-10 7
(i) Construct the PERT network
(ii) Compute E and L for each event;
Float for each activity; and
(iii) Find critical path and its duration.
b. The utility data for a network are given below. Determine the total, free, independent and
Interfering floats and identify the critical path.
Activity: 0-1 1-2 1-3 2-4 2-5 3-4 3-6 4-7 5-7 6-7
Duration: 2 8 10 6 3 3 7 5 2 8
(8)
6. a. For the network given below, compute E and L for each event & determine the
total, free, independent and interfering floats and identify the critical path.
129
130
(8)
b. The following table gives the activities in a construction project and the time duration of
each activity:
Normal Time
Activity Preceding activity (Days)
A - 16
B - 20
C A 8
D A 10
E B, C 6
F D, E 12
(i)Draw the activity network of the project.
(ii) Find critical path.
(iii) Find the total float and free-float for each activity.
(8)
7.a. Consider the network shown below. The three time estimates for the activities are given
along the arrows. Determine the critical path. What is the probability that the project will be
completed in 20 days? (8)
b. Consider the schedule of activities and related information as given below, for the
construction of a plant: (8)
Activiy Expected Time Variance Expected Cost
(Millions of
(Months) Rs.)
1-2 4 1 5
2-3 2 1 3
3-6 3 1 4
2-4 6 2 9
1-5 2 1 2
5-6 5 1 12
4-6 9 5 20
130
131
5-7 7 8 7
7-8 10 16 14
6-8 1 1 4
Assuming that the cost and time required for one activity is independent of the time an cost of
any other activity are expected to follow normal distribution.
Draw a network based on the above data and calculate:
a. Critical path
b. Expected cost of construction of the plant.
c. Expected time required to build the plant.
d. The standard deviation of the expected time.
8. a A project consists of seven activities and the time estimates of the activities are furnished
as under:
Activity Optimistic Most likely Pessimistic
Days Days Days
1-2 4 10 16
1-3 3 6 9
1-4 4 7 16
2-5 5 5 5
3-5 8 11 32
4-6 4 10 16
5-6 2 5 8
(i) Draw the network diagram.
(ii) Identify the critical path and its duration.
(8)
(iii) What is the probability that project will be completed in 5 days earlier than the
critical path duration?
(iv) What project duration will provide 95% confidence level of completion ?
b. The time estimate (in weeks) for the activities of a PERT network is given below:
Activity to tm tp
1-2 1 1 7
1-3 1 4 7
1-4 2 2 8
2-5 1 1 1
3-5 2 5 14
4-6 2 5 8
5-6 3 6 15
(a) Draw the project network and identify all the paths through it. (8)
(b) Determine the expected project length.
(c) Calculate the standard deviation and variance of the project length.
(d) What is the probability that the project will be completed.
1. at least 4 weeks earlier than expected time?
2. no more that 4 weeks later than expected time?
(e) If the projecttime
(f) completion
The probability dueisthat
date is
the19project
20 weeks.weeks, will
what be
is the probability
completed on ofschedule
not meeting the scheduled
if the due date?
What should be the scheduled completion time for the probability of completion to be
90%?
9. a .Given the following project network, determine:
131
132
(8)
9. b. A small project is composed of seven activities, whose time estimates are listed below.
Activities are identified by their beginning (i) and (j) node number.
Activity Estimated durations (in days)
Pessimisti
(i-j) Optimistic Most Likely c
1-2 2 2 14
1-3 2 8 14
1-4 4 4 16
2-5 2 2 2
3-5 4 10 28
4-6 4 10 16
(a) Draw the project network.
(b) Find the expected duration and variance for each activity. What is the expected project
length. (8)
10.a. A project consists of the following activities, whose time estimates are given against each
as under:
Estimated duration (weeks)
Activit Most Pessimisti
y Optimistic likely c
1-2 3 6 15
1-3 2 5 14
1-4 6 12 30
2-5 2 5 8
2-6 5 11 17
3-6 3 6 15
4-7 3 9 27
5-7 1 4 7
6-7 4 19 28
132
133
Required :
(i)Draw the project net work.
(ii) Find the expected duration and variance of each activity.
(iii) Determine the critical path and the expected project duration.
(iv) What is the probability that the project will be completed in 38 weeks? (8)
b. An Engineering Project has the following activities, whose time estimates are listed below:
Activity Estimated Duration (in months)
(i-j) Optimistic Most Likely Pessimistic
1-2 2 2 14
1-3 2 8 14
1-4 4 4 16
2-5 2 2 2
3-5 4 10 28
4-6 4 10 16
5-6 6 12 30
(a) Draw the project network and find the critical path.
(b) Find the expected duration and variance for each activity. What is the expected project
length?
(c) Calculate the variance and standard deviation of the project length.
(d) What is the probability that the project will be completed at least eight months earlier
than expected time?
(e) If the project due date is 38 months, what is the probability of not meeting the due date?
Given:
Given: z 0.50 0.67 1.00 1.33 2.00
P 0.3085 0.2514 0.1587 0.0918 0.0228
11.a.A civil engineering firm has to bid for the construction of a dam. The activities and their
time estimates are given below:
Activity Optimistic Most likely Pessimistic
1-2 14 17 25
2-3 14 18 21
2-4 13 15 18
2-8 16 19 28
3-4
(dummy) 0 0 0
3-5 15 18 27
4-6 13 17 21
5-7
(dummy) 0 0 0
5-9 14 18 20
6-7
(dummy) 0 0 0
6-8
(dummy) 0 0 0
7-9 16 20 41
133
134
8-9 14 16 22
The policy of the firm with respect to submitting bids is to bid the minimum amount that will
provide a 95% of probability of at best breaking-even. The fixed costs for the project are eight
lakhs and the variable costs are 9000 every day spent working on the project. The duration is
in days and the costs are in rupees.
What amount should the firm bid under this policy? (8)
b. The optimistic, most likely and pessimistic times of the activities of a project are given
below. Activity 40-50 must not start before 22 days, while activity 70-90 must end by 35 days.
The scheduled completion time of the project is 46 days. Draw the network and (8) determine
the critical path. What is the probability of completing the project in scheduled time?
to-tm-
Activity to-tm-tp Activity tp
10-20 4-8-12 50-70 3-6-9
20-30 1-4-7 50-80 4-6-8
20-40 8-12-16 60-100 4-6-8
30-50 3-5-7 70-90 4-8-12
40-50 0-0-0 80-90 2-5-8
40-60 3-6-9 90-100 4-10-16
H 3 F,G 4
The contract specifies that the project must be completed in 14 weeks. This company will assign
a fixed number of workers to the project for its entire duration, and so it would like to ensure that
the minimum number of workers is assigned and that the project will be completed in 14 weeks.
Find a schedule which will do this. (8)
15.a. Office Automation, Inc., has developed a proposal for introducing a new computerized
office system that will improve word processing and interoffice communications for a
particular company. Contained in the proposal is a list of activities that must be
accomplished to complete the new office system project. Information about the activities is
shown below.
135
136
b. The data shown in the following table relates to a contract being undertaken. There are
also site costs of £500 per day.
You are required to:
(1) calculate and state the time for completion on a normal basis;
(2) calculate and state the critical path on this basis, and the cost;
(3) calculate and state the cost of completion in the shortest possible time. (8)
136
137
Extensible markup language. It offer a standard, flexible and inherently extensible data
format, XML significantly reduces the burden of deploying the many technologies needed to
ensure the success of Web services.
2. Define XML attributes
• XML elements can have attributes in the start tag, just like HTML.
• Attributes are used to provide additional information about elements.
• Attributes cannot contain multiple values (child elements can)
• Attributes are not easily expandable (for future changes)
3. Write the main difference between XML and HTML
Main Difference between XML
and HTML XML was designed to
carry data.
XML is not a replacement for HTML.
XML and HTML were designed with different goals:
XML was designed to describe data and to focus on
what data is. HTML was designed to display data and to
focus on how data looks.
HTML is about displaying information, while XML is about describing information
4. What is meant by a XML namespace?
XML Namespaces provide a method to avoid element name conflicts. When using prefixes in
XML, a so-called namespace for the prefix must be defined. The namespace is defined by the
xmlns attribute in the start tag of an element. The namespace declaration has the following
syntax. xmlns:prefix="URI".
<root> <h:table xmlns:h="http://www.w3.org/TR/html4/"> <h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr> </h:table>
<f:table xmlns:f="http://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name> <f:width>80</f:width>
<f:length>120</f:length> </f:table> </root>
5. What is XML namespace?
XML allows document authors to create custom elements.
This extensibility can result in naming collisions (i.e. different elements that have the
same name) among elements in an XML document.
An XML namespace is a collection of element and attribute names. Each namespace has
a unique name that provides a means for document authors to unambiguously refer to
elements with the same name (i.e. prevent collisions).
6. What is the purpose of namespace?
XML Namespaces provide a method to avoid element name conflicts. In XML, element
names are defined by the developer. This often results in a conflict when trying to mix XML
documents from different XML applications.
137
138
139
140
UNIT – II
2. What is SAX?
SAX is an example of a grass- roots development effort to provide a simple; Java based API
for processing XML.
CSS can be used with HTML.But XSL can’t be used in HTML Both can
be used in XML
CSS is not a transformation language but XSL.
<Signature>
<SignedInfo>
<CanonicalizationMethod />
<SignatureMethod />
<Reference>
<Transforms>
<DigestMethod>
<DigestValue>
</Reference>
140
141
<Reference />
</SignedInfo>
<SignatureValue/>
<KeyInfo/>
<Object/>
</Signature>
7. What is DOM?
The Document Object Model (DOM) is the model that describes how all elements in an HTML
page, like input fields, images, paragraphs etc., are related to the topmost structure: the
document itself. By calling the element by its proper DOM name, we can influence it.
The window object represents an open window in a browser.If a document contain frames ( or
tags), the browser creates one window object for the HTML document, and one additional
window object for each frame.some of the window object properties are:
closed document frames history.
SAX (Simple API for XML) Parser DOM (Document Object Model) Parser and XSLT
(XML Style Sheet) Parsers.
XSL (XML Stylesheet) Programming is the Next Generation of the CSS (Cascading Style
Sheet Programming). In CSS, users can use certain style tags which the browsers can
understand and are predefined by W3 consortium. XSL takes ths to one step ahead and users
can define any tags in the XML file. XML sheets can help in showing the same data in
different formats.
XSLT stands for XSL Transformations XSLT is the most important part of XSL XSLT
transforms an XML document into another XML document XSLT uses XPath to navigate in
XML documents XSLT is a W3C Recommendation.
Servlets are Java technology's answer to CGI programming. They are programs that run on a
Web server and build Web pages.
XML is used in many aspects of web development, often to simplify data storage and sharing.
· Security
· Portability
· Scalability
142
143
· Reliability
· Relational Databases
It identifies the version of the XML specification to which the document conforms.
Example:
<?xml version=”1.0”?>
· Encoding Declaration
143
144
UNIT – III
1. What is Service Oriented Architecture?
Contemporary SOA represents an architecture that promotes service orientation through the use
of web services.
144
145
6. What is Architecture?
Application architecture is a template for all others which specifically explained the technology,
boundaries, rules, limitations, and design characteristics that apply to all solutions based on this
template.
9. List out the primary characteristics of the two tier client server architecture?
The primary characteristics of the two tier client server architectures is given below which is
compared to SOA
i. Application logic
ii. Application processing
145
146
iii. Technology
iv. Security
v. Administration
11. What are the issues that are raised in the client-server and the distributed Internet
architecture?
The issues that are raised in the client-server and the distributed Internet architecture
comparisons are discussed in a comparison between multi-tier client-server and SOA.
i. Application logic
ii. Application processing
iii. Technology
iv. Security
v. Administration
12. List some of the characteristics of Application Service layer.
146
147
Enterprise architectures often contain a long-term vision of how the organization plans to evolve
its technology and environments. For example, the goal of phasing out an outdated technology
platform may be established in this specification.
Fundamental parts of the framework 1. SOAP messages 2. Web service operations 3. Web
services 4. Activities
Messages = units of communication (A message represents the data required to complete some
or all parts of a unit of work).
Operations = units of work (An operation represents the logic required to process messages in
order to complete a unit of work.)
22. Define services.
Services = units of processing logic (A service represents a logically grouped set of operations
capable of performing related units of work.)
23. Define processes.
Processes = units of automation logic. (A process contains the business rules that determine
which service operations are used to complete a unit of automation.
The three layers of abstraction we identified for SOA are: *) the application service layer *) the
business service layer *) the orchestration service layer.
147
148
While application services are responsible for representing technology and application logic, the
business service layer introduces a service concerned solely with representing business logic,
called the business service.
Business services are the lifeblood of contemporary SOA. They are responsible for expressing
business logic through service-orientation and bring the representation of corporate business
models into the Web services arena.
148
149
UNIT IV
1. Expand UDDI.
Client-server remote procedure call (RPC) connection is used for remote communication
between components residing on client workstations and servers.
The definition element is the root element in WSDL. It defines the name of the web
service and specifies the namespace that would be used in the WSDL document.
The <message> element describes the data being exchanged between the web service
providers and consumers. The <message> element assigns the message a name and
contains one or more part child elements that each are assigned a type.
The binding element begins the concrete portion of the service definition, to assign a
communications protocol that can be used to access and interact with the WSDL. The
binding construct contains one or more operation elements.
Element Defines
149
150
10. What are the basic parts comprised in the web services framework?
150
151
The service provider is used to identify the organization (or individual) responsible for
actually providing the web service. It simply referred as the service being invoked.
Service requestor is a processing logic unit capable of issuing a request message that can
be understood by the service provider.
A WSDL service description explains how the service description document itself is
organized. It is also known as WSDL service definition or just WSDL definition.
An abstract description establishes the interface characteristics of the web service without
any reference to the technology used to host or enable a web service to transmit messages.
Port type provides a high-level view of the service interface by sorting the messages a
service can process into groups of functions.
The Simple Object Access Protocol (SOAP) is used to define a standard message format
151
152
<?xml version=”1.0”?>
<soap:Envelope
xmlns:soap=”http://www.w3.org/2001/12/soap-envelope”
soap:encodingStyle=”http://www.w3.org/2001/12/soap-encoding”>
<soap:Header>
........
</soap:Header>
<soap:Body>
......
<soap:Fault>
......
</soap:Fault>
</soap:Body>
</soap:Envelope>
The programs that use services to transmit and receive SOAP messages are referred to as
SOAP nodes.
152
153
The route taken by the message is called the SOAP message path. The set of SOAP nodes
through which the SOAP message passes, including the initial sender, the ultimate
receiver and one or more intermediaries are called the SOAP message path.
Message Exchange Pattern (MEP) defines the way that SOAP messages are exchanged
between the web service requester and web service provider. It represents a set of
templates.
The style attribute of the soap:binding element defines whether the SOAP messages used
to support an operation are to be formatted.
34. List out the format supported by the style attribute of the soap:binding element.
The soap:body element defines the data type system to be used SOAP processors, via the
use attribute. The use attribute can be set to “encoding” or “literal”.
The import element is used to import parts of the WSDL definition as well as XSD
schemas.
153
154
The SOAP <Envelope> is the root element in every SOAP message, and contains two
child elements
i. an optional <Header>
ii. a mandatory <Body>
41. What is the use of Header element?
The SOAP <Fault> is a sub-element of the SOAP body, which i used for reporting errors.
It is used to carry error and status information within a SOAP message.
<participantType name=”Buyer”>
<description type=”documentation”>
Buyer Participant
</description>
<roleType typeRef=”tns:BuyerRole”/>
</partcipantType>
46. How will you declare the relationship between the roles in WS-Choreography?
<relationshipType name=”ncname”>
<role type=”qname” behavior=”list of ncname”?/>
<role type=”qname” behavior=”list of ncname”?/>
</relationshipType>
UNIT V
1. What is Service oriented analysis?
The service oriented analysis is the process of determining how business automation
requirements can be represented through service orientation.
155
156
Business-centric SOA is the process of introducing service oriented principles into the
domain of business analysis.
i. Reusability
ii. Autonomy
iii. Statelessness
iv. Discoverability
Service oriented design phase is a process that transforms previously modeled service
candidates into physical service designs.
156
157
12. What are the steps needed to design the Entity-centric business
service?
13. List out the SOA principles supported by Application service design.
i. Reusability
ii. Autonomy
iii. Statelessness
iv. Discoverability
14. Write down the steps for Task-centric business service design.
157
158
JAX-WS is a technology for building web services using XML. In JAX-WS, a web
service operation invocation is represented by an XML-based protocol such as SOAP.
SEI is a java interface or class that declares the methods that a client can invoke on the
service.
Java Architecture for XML binding API (JAXB) provides a means of generating Java
classes from XSD schemas and further abstracting XML-level development.
The Java API for XML Registries (JAXR) provides a uniform and standard Java API for
accessing various kinds of XML registries.
158
159
i. JAXR client
ii. JAXR provider
i. javax.xml.registry
ii. javax.xml.registry.infomodel
JAX-RPC is used for building and deploying SOAP+WSDL web services clients and
endpoints. It enables clients to invoke web services developed across heterogeneous
platforms.
159
160
describe business process activities as Web Services and define how they can be
connected to accomplish specific tasks.
WSDL XML
WSFL XLANG
(IBM 2001) (Microsoft 2001)
BPEL4WS 1.1
BPEL4WS 2.0
(current standard)
WS-Policy defines a framework for allowing web services to express their constraints and
requirements in relation to security, processing, or message content.
WS-Policy provides the mechanisms needed to enable web services application to specify
160
161
policies.
37. Give the specifications of WS-Policy framework.
<Envelope>
<Header>
.......
UNIT - I
161
162
1. Draw the XML Tree Structure or XML Document structures with style sheets or well
formed and Valid document
bookstore
{ display: block }
title{ display: block;
font-family: arial;
color: #008000;
162
163
font-weight: 600;
font-size: 22;
text-align: center }
author
{ display: block;
font-family: arial;
color: #000080;
font-weight: 400;
font-size: 20 }
year
{ display: block;
list-style-type: decimal;
font-family: arial;
color: #000000;
font-weight: 400;
font-size: 18 }
price
{ display: block;
list-style-type: square;
font-family: arial;
color: #0000ff;
font-weight: 200;
font-size: 14 }
XML namespaces are used for providing uniquely named elements and
attributes in an XML document. They are defined in a W3C recommendation.
An XML instance may contain element or attribute names from more than
one XML vocabulary.
XML Namespaces provide a method to avoid element name conflicts.
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
<state>Tamilnadu</state>
<zip>639101</zip>
<phone>788595785</phone>
</employee>
<employee>
<name>ranjith</name>
<position>teamleader</position>
<age>26</age>
<sex>male</sex>
<status>unmarried</status>
<address>Cauvery nagar</address>
<city>karur</ciy>
<state>Tamilnadu</state>
<zip>639102</zip>
<phone>788785785</phone>
</employee>
</employees>
The External DTD:
Employee.xml
<?xml version="1.0"?>
<employees>
<employee>
<name>john</name>
<position>manager</position>
<age>25</age>
<sex>male</sex>
<status>unmarried</status>
<address>Cauvery nagar</address>
<city>karur</ciy>
<state>Tamilnadu</state>
<zip>639101</zip>
<phone>978595785</phone>
</employee>
<employee>
<name>ram</name>
<position>programmer</position>
<age>23</age>
<sex>male</sex>
<status>unmarried</status>
<address>Cauvery nagar</address>
<city>karur</ciy>
<state>Tamilnadu</state>
<zip>639101</zip>
<phone>788595785</phone>
165
166
</employee>
<employee>
<name>ranjith</name>
<position>teamleader</position>
<age>26</age>
<sex>male</sex>
<status>unmarried</status>
<address>Cauvery nagar</address>
<city>karur</ciy>
<state>Tamilnadu</state>
<zip>639102</zip>
<phone>788785785</phone>
</employee>
employees.dtd:
<!ELEMENT employees(employee+)>
<!ELEMENT employee(name,position,age,sex,status,address,city,state,zip,phone)?
<!ELEMENT name (#PCDATA)>
<!ELEMENT position (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT sex (#PCDATA)>
<!ELEMENT status (#PCDATA)>
<!ATTLIST address (#PCDATA>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
The purpose of an XML Schema is to define the legal building blocks of an XML
document: the elements and attributes that can appear in a document. the number of
(and order of) child elements. data types for elements and attributes.
<xs:simpleType name="inttype">
<xs:restriction base="xs:positiveInteger"/>
</xs:simpleType>
<xs:simpleType name="dectype">
166
167
<xs:restriction base="xs:decimal"/>
</xs:simpleType>
<xs:complexType name="shiptotype">
<xs:sequence>
<xs:element name="name" type="stringtype"/>
<xs:element name="address" type="stringtype"/>
<xs:element name="city" type="stringtype"/>
<xs:element name="country" type="stringtype"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="itemtype">
<xs:sequence>
<xs:element name="title" type="stringtype"/>
<xs:element name="note" type="stringtype" minOccurs="0"/>
<xs:element name="quantity" type="inttype"/>
<xs:element name="price" type="dectype"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="shipordertype">
<xs:sequence>
<xs:element name="orderperson" type="stringtype"/>
<xs:element name="shipto" type="shiptotype"/>
<xs:element name="item" maxOccurs="unbounded" type="itemtype"/>
</xs:sequence>
<xs:attribute name="orderid" type="orderidtype" use="required"/>
</xs:complexType>
<xs:element name="shiporder" type="shipordertype"/>
</xs:schema>
5. Difference between DTD and XSD.
No. DTD XSD
1) DTD stands for Document Type Definition. XSD stands for XML Schema Definition.
2) DTDs are derived from SGML syntax. XSDs are written in XML.
3) DTD doesn't support datatypes. XSD supports datatypes for elements and
attributes.
5) DTD doesn't define order for child XSD defines order for child elements.
elements.
167
168
7) DTD is not simple to learn. XSD is simple to learn because you don't need to
learn new language.
8) DTD provides less control on XML structure. XSD provides more control on XML structure.
/bookstore/book[1] Selects the first book element that is the child of the bookstore
element
/bookstore/book[last()] Selects the last book element that is the child of the bookstore
element
/bookstore/book[last()-1] Selects the last but one book element that is the child of
the bookstore element
/bookstore/book[position()<3] Selects the first two book elements that are children of the
bookstore element
document.To link to a specific part of a page, add a number sign(#) and an XPointer expression
xlink:href = "http://www.books.com/bookdata.xml#xpointer(id('java'))"
Shorthand for above XPointer statement look like this xlink:href =
"http://www.books.com/bookdata.xml#java"
X-Link:
XLink is used to create hyperlinks in XML documents.
XLink is used to create hyperlinks within XML documents.
With XLink, the links can be defined outside the linked files.
<bookstore xmlns:xlink="http://www.w3.org/1999/xlink">
Example explained:
The xlink:show="new" specifies that the link should open in a new window
UNIT – II
1. Explain the DOM (DOCUMENT OBJECT MODEL)
The DOM defines a standard for accessing and manipulating documents.
"The W3C Document Object Model (DOM) is a platform and language-neutral interface that
allows programs and scripts to dynamically access and update the content, structure, and style
of a document."
DOM Levels:
DOM Level 1
It allows traversal of an XML document as well as the manipulation of the content in that document
DOM Level 2
Extend level 1 with additional features such as namespace support, events , Traversals and ranges,
DOM Level3
The DOM Level 3 specification contains five different specifications: The DOM3 Core, Load
and Save, Validation, Events, and XPath
2. Explain XML Parser using DOM.
Following are the steps used while parsing a document using DOM Parser.
<?xml version="1.0"?>
<company>
<staff id="1001">
<firstname>yong</firstname>
<lastname>mook kim</lastname>
<nickname>mkyong</nickname>
<salary>100000</salary>
</staff>
<staff id="2001">
<firstname>low</firstname>
<lastname>yin fong</lastname>
<nickname>fong fong</nickname>
<salary>200000</salary>
</staff>
</company>
DomParserDemo.java
package com.mkyong.seo;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
try {
doc.getDocumentElement().normalize();
System.out.println("----------------------------");
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
Java SAX Parser provides API to parse XML documents. SAX Parser is different from DOM
parser because it doesn’t load complete XML into memory and read xml document
sequentially.
SAX callback methods :
Steps
1. Create the SAX parser and parse the XML file: In this step we will take one factory
instance from SAXParserFactory to parse the xml file this factory instance in turns give us
instance of parser using the parse() method will parse the Xml file.
2. Event Handling: when Sax Parser starts the parsing whenever it founds the start or end tag it
will invoke the corresponding event handling method which is public void startElement (…)
and public void end Element (...).
3. Register the events: The class extends the Default Handler class to listen for callback
events and we register this handler to sax Parser to notify us for call back event
4. XML file
<?xml version="1.0"?>
<company>
<staff>
<firstname>yong</firstname>
<lastname>mook kim</lastname>
<nickname>mkyong</nickname>
<salary>100000</salary>
</staff>
<staff>
<firstname>low</firstname>
174
175
<lastname>yin fong</lastname>
<nickname>fong fong</nickname>
<salary>200000</salary>
</staff>
</company>
5. Java file
Use SAX parser to parse the XML file
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXMLFile {
public static void main(String argv[]) {
try {
if (qName.equalsIgnoreCase("FIRSTNAME")) {
bfname = true;
}
if (qName.equalsIgnoreCase("LASTNAME")) {
blname = true;
}
if (qName.equalsIgnoreCase("NICKNAME")) {
bnname = true;
}
if (qName.equalsIgnoreCase("SALARY")) {
bsalary = true;
}
175
176
public void characters(char ch[], int start, int length) throws SAXException {
if (bfname) {
System.out.println("First Name : " + new String(ch, start, length));
bfname = false;
}
if (blname) {
System.out.println("Last Name : " + new String(ch, start, length));
blname = false;
}
if (bnname) {
System.out.println("Nick Name : " + new String(ch, start, length));
bnname = false;
}
if (bsalary) {
System.out.println("Salary : " + new String(ch, start, length));
bsalary = false;
}
};
saxParser.parse("c:\\file.xml", handler);
} catch (Exception e) {
e.printStackTrace();
}
Result:
176
177
XSL Technologies:
XSL Transformation Language
XSL Formatting Object Language
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<country>USA</country>
<company>Columbia</company>
<price>10.90</price>
<year>1985</year>
</cd>
.
.
</catalog>
Create an XSL Style Sheet ("cdcatalog.xsl") with a transformation template:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
XSL Formatting Objects is itself an XML-based markup language that lets you specify
in great detail the pagination, layout, and styling information that will be applied to
your content.
Page Masters and Page Sequences
A sequence of pages follows a page master.
A page master defines the basic layout of a page and the regions on the page.
There is a "simple page master" built-in with the following regions:
region-before - The header of the page.
178
179
Page Layouts:
Let's start out by specifying the page widths and heights and margins. The units below
are all in centimeters, but you may use any of the CSS units, such as px (pixel), pt
(point), em, in, mm, etc. Each of these specifications is called a simple-page-master and
must be given a master-name so you can refer to it later.
<fo:layout-master-set>
<fo:simple-page-master master-name="cover"
page-height="12cm"
page-width="12cm"
margin-top="0.5cm"
margin-bottom="0.5cm"
margin-left="1cm"
margin-right="0.5cm">
</fo:simple-page-master>
<fo:simple-page-master master-name="leftPage"
page-height="12cm"
page-width="12cm"
margin-left="0.5cm"
margin-right="1cm"
margin-top="0.5cm"
margin-bottom="0.5cm">
</fo:simple-page-master>
<fo:simple-page-master master-name="rightPage"
page-height="12cm"
179
180
page-width="12cm"
margin-left="1cm"
margin-right="0.5cm"
margin-top="0.5cm"
margin-bottom="0.5cm">
</fo:simple-page-master>
<!-- more info will go here -->
</fo:layout-master-set>
The margins are areas which will not contain any printed output.
The Content Area
All of the printing occurs within the dotted lines in the diagram above. This is the page
content area (officially called the page-reference-area), which can be divided into five
regions as shown below.
Region Dimensions
The cover page doesn't need a header or footer, so we need only specify information for
the region-body by adding the information shown in bold below.
<fo:simple-page-master master-name="cover"
page-height="12cm"
page-width="12cm"
margin-top="0.5cm"
margin-bottom="0.5cm"
margin-left="1cm"
margin-right="0.5cm">
<fo:region-body
margin-top="3cm" />
</fo:simple-page-master>
The left and right pages will have a header and footer, so we must specify the extent of
the region-before and region-after.
<fo:simple-page-master master-name="leftPage"
page-height="12cm"
page-width="12cm"
margin-left="0.5cm"
margin-right="1cm"
margin-top="0.5cm"
180
181
margin-bottom="0.5cm">
<fo:region-before extent="1cm"/>
<fo:region-after extent="1cm"/>
<fo:region-body
margin-top="1.1cm"
margin-bottom="1.1cm" />
</fo:simple-page-master>
<fo:simple-page-master master-name="rightPage"
page-height="12cm"
page-width="12cm"
margin-left="1cm"
margin-right="0.5cm"
margin-top="0.5cm"
margin-bottom="0.5cm">
<fo:region-before extent="1cm"/>
<fo:region-after extent="1cm"/>
<fo:region-body
margin-top="1.1cm"
margin-bottom="1.1cm" />
</fo:simple-page-master>
UNIT – III
12. List out some characteristics of Contemporary SOA.
4. Comparing SOA with client server architecture and distributed internet architectures.
What is Architecture?
Application architecture.
Enterprise architecture.
SOA vs. Client server architecture.
Client server architecture.
Characteristics. The primary characteristics of the two tier client server architectures is given
below which is compared to SOA
Application logic
Application processing
Technology
Security
Administration
182
183
UNIT – IV
1. Explain Web Services in detail.
Service Roles:
Service Provider.
Service Requestor.
Service Intermediaries.
Service Models:
Business Service Model.
Utility service Model.
Controller Service Model.
Primitive MEPs
Request – Response
Fire- and – forget
Complex MEPS
MEPs and SOAP
MEPs and WSDL
Request – Response operation
Solicit – Response operation
One way operation
Notification operation
MEPs and SOA
184
185
5. Discuss in detail about SOA Support with J2EE and its API’s.
1) Platform overview
1) Primitive SOA support
2) Support for service orientation principles
3) Contemporary SOA support.
6. Discuss in detail about SOA Support with .NET.
Platform overview
Primitive SOA support
Support for service orientation principles
Contemporary SOA support.
7. Discuss in detail about the WS – BPEL with code snippets.
WS-BPEL language basics
A brief history of BPEL 4 WS and WS-BPEL
Prerequisites
The process element
The partner links and partner link element
The partner link type element
The variables element
186
187
Information Retrieval
UNIT – I
INTRODUCTION
-PART – A
Information characterisation
Search formulation in information seeking
System Integration
187
188
Support functions
User simply enters a query, suggests what needs to be done, and the system executes the
query to return results.
Full Automation. User queries are entered and the rest is done by the system.
188
189
Relevance appears to be a subjective quality, unique between the individual and a given document
supporting the assumption that relevance can only be judged by the information user.Subjectivity and
fluidity make it difficult to use as measuring tool for system performance.
Stemming is techniques used to find out the root/stem of a word. Used to improve effectiveness of
IR and text mining.Stemming usually refers to a crude heuristic process that chops off the ends of
words in the hope of achieving this goal correctly most of the time, and often includes the removal
of derivational affixes.
Document indexing is the process of associating or tagging documents with different “search”
terms. Assign to each document (respectively query) a descriptor represented with a set of features,
usually weighted keywords, derived from the document (respectively query) content.
The impacts of information retrieval on the web are influenced in the following areas.
189
190
Web search is often not informational -- it might be navigational (give me the url of the site I ant
to reach) or transactional (show me sites where I can perform a certain transaction, e.g. shop,
download a file, or find a map).
Web search engines crawl the Web, downloading and indexing pages in order to allow full-text
search. There are many general purpose search engines; unfortunately none of them come close to
indexing the entire Web. There are also thousands of specialized search services that index specific
content or specific sites.
Generally there are three basic components of a search engine as listed below:
1. Web Crawler
2. Database
3. Search Interfaces
This is the part of the search engine which combs through the pages on the internet and gathers the
information for the search engine. It is also known as spider or bots. It is a software component that
traverses the web to gather information.
Indexing Process
Text acquisition
Text transformation
Index creation
Query Process
User interaction
Ranking
Evaluation
190
191
191
192
PART – B
▪ 2005+: Google gains search share, dominating in Europe and very strong in North America
IR helps users find information that matches their information needs expressed as queries.
Historically, IR is about document retrieval, emphasizing document as the basic unit.
– Finding documents relevant to user queries.
IR Queries
• Keyword queries
• Boolean queries (using AND, OR, NOT)
• Phrase queries
• Proximity queries
• Full document queries
• Natural language questions
IR Models
– Boolean model
192
193
Areas of AI for IR
Natural language processing
Knowledge representation
❖ Expert systems
❖ Ex: Logical formalisms, conceptual graphs, etc
Machine learning
❖ Short term: over a single session
❖ Long term: over multiple searches by multiple users
Computer Vision
❖ Ex: OCR
Reasoning under uncertainty
❖ Ex: Dempster-Shafer, Bayesian networks, probability theory, etc
Cognitive theory
❖ Ex: User modelling
AI applied to IR
Four main roles investigated
❖ Information characterisation
❖ Search formulation in information seeking
❖ System Integration
❖ Support functions
AI has a very valuable contribution to make
❖ Specialised systems where domain is controlled, well-integrated and understood
❖ Support functions
❖ Case-based reasoning and dialogue functions
❖ Integrated functions
4. Explain in detail about Search Engine.
Search Engine is in the field of IR .Searching authors, titles and subjects in library card catalogs or
computers. Document classification and categorization, user interfaces, data visualization, filtering
Types of Search Engines
Search by Keywords (e.g. AltaVista,
Excite, Google, and Northern Light)
Search by categories (e.g. Yahoo!)
Specialize in other languages (e.g.
Chinese Yahoo! and Yahoo! Japan)
Interview simulation (e.g. Ask Jeeves!)
193
194
Faceted search
▪ Social search
195
196
UNIT II
INFORMATION RETRIEVAL
PART – A
◼ wij {0,1}
196
197
◼ Clean Formalism
◼ Easy to implement
◼ Intuitive concept
◼ Difficult to rank output, some documents are more important than others
197
198
198
199
◼ Sim(dj, q) = t
( wi, j x wi,q )
dj q
i 1
t 2 t 2
dj x q wi, j x wi,q
i 1 j 1
◼ The order in which the terms appear in the document is lost in the vector
space representation
9. What are the Parameters in calculating a weight for a document term or query term?
Term Frequency (tf): Term Frequency is the number of times a term i appears in
document j (tfij )
– Document Frequency (df): Number of documents a term i appears in, (dfi ).
– Inverse Document Frequency (idf): A discriminating measure for a term i in
collection, i.e., how discriminating term i is. (idf i) = log10(n / dfi), where n is the
number of document
199
200
200
201
is idfi=log(N/ni) or idfi=log(D/dfi)
Where
2016) 11.How do you calculate the term weighting in document and Query term weight ?(nov/dec
Cosine ɵ=
Q.D
|Q|.|D|
201
202
W
eig
ht
var
iab
les
all
are
bin
ary
,
i.e.
wi,
j
€{
0,1
}
an
d
wi,
q
€{
0,1
}
q-
a
qu
ery
is
a
su
bs
et
of
ind
ex
ter
ms
202
203
where
P(dj|R) - probability of randomly selecting the document dj from the set R
of relevant documents
P(R) - probability of randomly selecting the document from the entire collection is
relevant
203
204
Advantages
204
205
◼ Elimination of noise
◼ Removal of redundancy
20.Define Relevance feedback model:-(nov/dec 2016)
205
206
After initial retrieval results are presented allow the user to provide feedback on the
relevance of one or more of the retrieved documents. use this feedback information to
reformulate the query and produce new results based on reformulated query. Thus allows
more interactive multi pass process.
21.Draw the flow diagram for relevance feedback query processing model:(nov/dec 2016)
206
207
◼ It shields the user from the details of the query reformulation process because all
the user has to provide is a relevance judgement on documents
◼ It breaks down the whole searching task into a sequence of small steps which
are easier to grasp
207
208
PART – B
209
210
211
212
◼ To each element Mij of this matrix is assigned a weight wijassociated with the
pair
[ki,dj]
◼ The weight wij can be based on a tf-idf weighting scheme, like Vector model
The matrix vec(M) can be decomposed into 3 matrices (singular value decomposition) as
follows:
213
214
◼ Keep the corresponding columns in (K) and (D) t i.e. The remaining singular
values of the S ae deleted.
s t
Ms = Ks Ss D
6.Give brief notes about user Relevance feedback method and how it is used in query
expansion:It is the most popular query formulation strategy
◼ Query expansion
◼ Term reweighting
7. Write the advantages and disadvantages for classic models which are used in IR and
discriminate their techniques:
214
215
215
216
◼ Where,
D – a set composed of logical views for the documents in the collection
Q – a set composed of logical views for the user information needs – queries
ƒ – a framework for modeling doc representations, queries and their
relationships
R(qi,dj) – a ranking function, qi € Q and dj € D, ranking based on qi
To build the model
truck" Answer:
216
217
217
218
Finally we sort and rank the documents in descending order according to the similarity
values
Rank 1: Doc 2 = 0.8246
Rank 2: Doc 3 = 0.3271
Rank 3: Doc 1 = 0.0801
218
219
UNIT-III
PART – A
A web search engine is a software system that is designed to search for information on the World
Wide Web. The search results are generally presented in a line of results often referred to as search
engine results pages (SERPs).
Security Commercial transactions over the Internet are not yet a completely safe procedure Privacy
Frequently, people are willing to exchange information as long as it does not become public Copyright
and patent rights It is far from clear how the wide spread of data on the Web affects copyright and
patent laws in the various countries Scanning, optical character recognition (OCR), and cross-
language retrieval
219
220
220
221
microscopic level: related to the statistical properties of links and individual nodes
mesoscopic level: related to the properties of areas or regions of the Web
macroscopic level: related to the structure of the Web at large
The crawler start downloading a set of seed pages, that are parsed and scanned for new
links
221
222
The links to pages that have not yet been downloaded are added to a central queue for
download later
Next, the crawler selects a new page for download and the process is repeated until a
stop criterion is met.
14. What are the Types of Web search based on crawling?(nov/dec 2016)
16. What are the basic rules for Web crawler operation are?
A Web crawler must identify itself as such, and must not pretend to be a regular Web user
A Web crawler must obey the robots exclusion protocol (robots.txt)
A Web crawler must keep a low bandwidth usage in a given Web site.
222
223
223
224
PART – B
Search Engine refers to a huge database of internet resources such as web pages, newsgroups,
programs, images etc. It helps to locate information on World Wide Web.User can search for
any information by passing query in form of keywords or phrase. It then searches for relevant
information in its database and return to the user.
Generally there are three basic components of a search engine as listed below:
1. Web Crawler
2. Database
3. Search Interfaces
Web crawler
It is also known as spider or bots. It is a software component that traverses the web to gather
information.
Database
All the information on the web is stored in database. It consists of huge web resources.
Search Interfaces
This component is an interface between user and the database. It helps the user to search through
the database.
Web crawler, database and the search interface are the major component of a search engine that
actually makes search engine to work. Search engines make use of Boolean expression AND, OR,
NOT to restrict and widen the results of a search. Following are the steps that are performed by the
search engine:
The search engine looks for the keyword in the index for predefined database instead of
going directly to the web to search for the keyword.
It then uses software to search for the information in the database. This software component
is known as web crawler.
224
225
Once web crawler finds the pages, the search engine then shows the relevant web pages as a
result. These retrieved web pages generally include title of page, size of text portion, first
several sentences etc.
These search criteria may vary from one search engine to the other. The retrieved information is
ranked according to various factors such as frequency of keywords, relevancy of information, links
etc.
Architecture
The search engine architecture comprises of the three basic layers listed below:
Indexing Process
225
226
Text acquisition
Text transformation
Index creation
Text acquisition
Text Transformation
Index Creation
It takes index terms created by text transformations and create data structures to suport fast
searching.
Query Process
User interaction
Ranking
Evaluation
User interaction
It supporst creation and refinement of user query and displays the results.
Ranking
Evaluation
Examples
Search Description
Engine
Google It was originally called BackRub. It is the most popular search engine globally.
Bing It was launched in 2009 by Microsoft. It is the latest web-based search engine that
226
227
Web crawlers
▪ Starts with a set of seeds, which are a set of URLs given to it asparameters
▪ Seeds are added to a URL request queue
▪ Crawler starts fetching pages from the request queue
▪ Downloaded pages are parsed to find link tags that might contain other useful URLs to fetch
▪ New URLs added to the crawler’s request queue, or frontier
Continue until no more new URLs or disk full
Explain each types in details.
Focused retrieval
Structured documents
227
228
Types of Compression
◼ Lossy
◼ Loseless
◼ Loseless compression
❑ Elias Codes
❑ n-s encoding
❑ Golomb encoding
❑ CPSS-Tree
228
229
▪ Number of documents/hour
▪ (Average document size)
▪ How fast does it search
▪ Latency as a function of index size
▪ Expressiveness of query language
▪ Ability to express complex information needs
▪ Speed on complex queries
▪ Uncluttered UI
▪ Is it free?
▪ All of the preceding criteria are measurable: we can quantify speed/size
▪ we can make expressiveness precise
▪ The key measure: user happiness
▪ What is this?
▪ Speed of response/size of index are factors
▪ But blindingly fast, useless answers won’t make a user happy
▪ Need a way of quantifying user happiness
Issue: who is the user we are trying to make happy?
▪ Depends on the setting
▪ Web engine:
▪ User finds what s/he wants and returns to the engine
▪ Can measure rate of return users
▪ User completes task – search as a means, not end
▪ See Russell http://dmrussell.googlepages.com/JCDL-talk-June-2007-short.pdf
▪ eCommerce site: user finds what s/he wants and buys
▪ Is it the end-user, or the eCommerce site, whose happiness we measure?
▪ Measure time to purchase, or fraction of searchers who become buyers?
Motives
229
230
▪ Forums
▪ E.g., Web master world ( www.webmasterworld.com )
▪ Search engine specific tricks
Discussions about academic papers
More spam techniques
▪ Doorway pages
▪ Pages optimized for a single keyword that re-direct to the real target page
▪ Link spamming
▪ Mutual admiration societies, hidden links, awards – more on these later
▪ Domain flooding: numerous domains that point or re-direct to a target page
▪ Robots
▪ Fake query stream – rank checking programs
▪ “Curve-fit” ranking programs of search engines
Millions of submissions via Add-Url
230
231
UNIT IV
PART – A
Authority: The pages that will emerge with high authority scores.
231
232
Example: In this approach stems from a particular insight into the creation of web pages,
that there are two primary kinds of web pages useful as results for broad-topic searches. By a
broad topic search we mean an informational query such as "I wish to learn about
leukemia". There are authoritative sources of information on the topic; in this case, the National
Cancer Institute's page on leukemia would be such a page. We will call such pages authorities;
in the computation we are about to describe, they are the pages that will emerge with high
authority scores.
Hub: These hub pages are the pages that will emerge with high hub scores
On the other hand, there are many pages on the Web that are hand-compiled lists of
links to authoritative web pages on a specific topic. These hub pages are not in themselves
authoritative sources of topic-specific information, but rather compilations that someone with an
interest in the topic has spent time putting together. The approach we will take, then, is to use these
hub pages to discover the authority pages. In the computation we now develop, these hub pages are
the pages that will emerge with high hub scores
how well a retrieved document or set of documents meets the information need of the user.
Relevance may include concerns such as timeliness, authority or novelty of the result.
When the user gives a query, the index is consulted to get the documents most relevant to the query.
The relevant documents are then ranked according to their degree of relevance, importance etc.
232
233
13.What is MapReduce?
The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map takes a set
of data and converts it into another set of data, where individual elements are broken down into tuples
(key/value pairs). Secondly, reduce task, which takes the output from a map as an input and combines
those data tuples into a smaller set of tuples.
14.DefineHadoop?
Hadoop is an open-source framework that allows to store and process big data in a distributed
environment across clusters of computers using simple programming models. It is designed to scale
up from single servers to thousands of machines, each offering local computation and storage.
15. What are the factors affecting the performance of CLIR systems.
16. What are the Challenges in CLIR.( Cross lingual Information Retrieval)
17.Define Snippets.
Snippets are short fragments of text extracted from the document content or its metadata. They may
be static or query based. In static snippet it always the first 50 words of the document pe the content
233
234
of its description. In query based snippet is one selectively extracted on the basis of its relation to
the searcher’s query.
It is a method of making automatic predictions about the interests of a single user by collecting
preferences or taste information from many users.
Item based CF is a model based approach which produces recommendations based on the relationship
between items inferred from the rating matrix. The assumption behind this approach is that users will
prefer items that are similar to other items they like.
The two main problems of user based CF are that the whole user databases has to be kept in
memory and that expensive similarity computation between the active user and all other users in the
database has to be performed.
User based collaborative filtering algorithms work off the premise that if a user A has a similar profile
to another user B, then A is more likely to prefer things that it prefers when compared with a user
chosen at random.
Macroscopic level
Limitations of Link analysis:
Meta tags/invisible text
Pay-for-place
Stability
Topic Drift
Convent evolution
235
236
UNIT V
PART – A
Information filtering delivers to users only the information that is relevant to them, filtering out
all irrelevant new data items
Information retrieval is about fulfilling immediate queries from a library of information available.
Example : you have a deal store containing 100 deals and a query comes from a user. You show the
deals that are relevant to that query.
Information Filtering is about processing a stream of information to match your static set of likes,
tastes and preferences.Example: a clipper service which reads all the news articles published today
and serves you content that is relevant to you based on your likes and interests.
Feedback given by the user about the relevance of the documents in the initial set of results.
6. Differentiate Text Mining vs. Data Mining ,web mining, information retrieval
In Text Mining, patterns are extracted from natural language text rather than databases
Text Mining vs • Web Mining – In Text Mining, the input is free unstructured text, whilst web
sources are structured.
236
237
K-Means clustering.
Agglomerative hierarchical clustering.
Text pre-processing is an essential part of any NLP system, since the characters, words, and
sentences identified at this stage are the fundamental units passed to all further processing stages,
from analysis and tagging components, such as morphological analyzers and part-of-speech taggers,
through applications, such as information retrieval and machine translation systems.
9. Define classification
Classification is a data mining function that assigns items in a collection to target categories or
classes. The goal of classification is to accurately predict the target class for each case in the data.
Clustering is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes,
called clusters. Help users understand the natural grouping or structure in a data set. Used either as
a stand-alone tool to get insight into data distribution or as a preprocessing step for other
algorithms.
In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on
applying Bayes' theorem with strong (naive) independence assumptions between the features.
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node
denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a
class label. The topmost node in the tree is the root node.
Agglomerative hierarchical clustering is a bottom-up clustering method where clusters have sub-
clusters, which in turn have sub-clusters, etc. The classic example of this is species taxonomy. Gene
expression data might also exhibit this hierarchical quality (e.g. neurotransmitter gene families).
Agglomerative hierarchical clustering starts with every single object (gene or sample) in a single
cluster. Then, in each successive iteration, it agglomerates (merges) the closest pair of clusters by
satisfying some similarity criteria, until all of the data is in one cluster.
237
238
In supervised learning both input and output are provided. The network then processes the inputs
and compares its resulting output against the desired outputs. Errors are then propagated back through
the systems causing the system to adjust the weights which control the network.
Unsupervised learning is a type of machine learning algorithm used to draw inferences from
datasets consisting of input data without labeled responses. The most common unsupervised
learning method is cluster analysis.
A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters
produced by hierarchical clustering. Dendrograms are often used in computational biology to
illustrate the clustering of genes or samples, sometimes on top of heatmaps.
PART – B
In general all of Machine Learning Algorithms need to be trained for supervised learning tasks like
classification, prediction etc. or for unsupervised learning tasks like clustering.
By training it means to train them on particular inputs so that later on we may test them for
unknown inputs (which they have never seen before) for which they may classify or predict etc (in
case of supervised learning) based on their learning. This is what most of the Machine Learning
techniques like Neural Networks, SVM, Bayesian etc. are based upon.
So in a general Machine Learning project basically you have to divide your input set to a
Development Set (Training Set + Dev-Test Set) & a Test Set (or Evaluation set). Remember your
basic objective would be that your system learns and classifies new inputs which they have never
seen before in either Dev set or test set.
The test set typically has the same format as the training set. However, it is very important that the
238
239
test set be distinct from the training corpus: if we simply reused the training set as the test set, then
a model that simply memorized its input, without learning how to generalize to new examples,
would receive misleadingly high scores.
In general, for an example, 70% can be training set cases. Also remember to partition the original
set into the training and test sets randomly.
To demonstrate the concept of Naïve Bayes Classification, consider the example given below:
The Naive Bayes Classifier technique is based on the so-called Bayesian theorem and is
particularly suited when the dimensionality of the inputs is high. Despite its simplicity, Naive
Bayes can often outperform more sophisticated classification methods.
To demonstrate the concept of Naïve Bayes Classification, consider the example displayed in the
illustration above. As indicated, the objects can be classified as either GREEN or RED. Our task is
to classify new cases as they arrive, i.e., decide to which class label they belong, based on the
currently exiting objects.
Since there are twice as many GREEN objects as RED, it is reasonable to believe that a new case
(which hasn't been observed yet) is twice as likely to have membership GREEN rather than RED.
In the Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based
on previous experience, in this case the percentage of GREEN and RED objects, and often used to
predict outcomes before they actually happen.
Since there is a total of 60 objects, 40 of which are GREEN and 20 RED, our prior probabilities for
class membership are:
239
240
Having formulated our prior probability, we are now ready to classify a new object (WHITE
circle). Since the objects are well clustered, it is reasonable to assume that the more GREEN (or
RED) objects in the vicinity of X, the more likely that the new cases belong to that particular color.
To measure this likelihood, we draw a circle around X which encompasses a number (to be chosen
a priori) of points irrespective of their class labels. Then we calculate the number of points in the
circle belonging to each class label. From this we calculate the likelihood:
From the illustration above, it is clear that Likelihood of X given GREEN is smaller than
Likelihood of X given RED, since the circle encompasses 1 GREEN object and 3 RED ones. Thus:
Although the prior probabilities indicate that X may belong to GREEN (given that there are twice
as many GREEN compared to RED) the likelihood indicates otherwise; that the class membership
of X is RED (given that there are more RED objects in the vicinity of X than GREEN). In the
Bayesian analysis, the final classification is produced by combining both sources of information,
i.e., the prior and the likelihood, to form a posterior probability using the so-called Bayes' rule
(named after Rev. Thomas Bayes 1702-1761).
240
241
Finally, we classify X as RED since its class membership achieves the largest posterior probability.
Note.The above probabilities are not normalized. However, this does not affect the classification
outcome since their normalizing constants are the same.
As indicated, the objects can be classified as either GREENor RED. Our task is to classify new
cases as they arrive, i.e., decide to which class label they belong, based on the currently existing
objects.
Since there are twice as many GREENobjects as RED, it is reasonable to believe that a new case
(which hasn't been observed yet) is twice as likely to have membership GREENrather than RED. In
the Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on
previous experience, in this case the percentage of GREENand REDobjects, and often used to
predict outcomes before they actually happen.
Since there is a total of 60objects, 40of which are GREENand 20 RED, our prior probabilities
for class membership are:
Having formulated our prior probability, we are now ready to classify a new object (WHITEcircle
in the diagram below). Since the objects are well clustered, it is reasonable to assume that the more
GREEN(or RED) objects in the vicinity of X, the more likely that the new cases belong to that
particular color. To measure this likelihood, we draw a circle around X which encompasses a
number (to be chosen a priori) of points irrespective of their class labels. Then we calculate the
number of points in the circle belonging to each class label
241
242
Ans.Very simply, ID3 builds a decision tree from a fixed set of examples. ... The leaf nodes of
the decision tree contain the class name whereas a non-leaf node is a decision node. The decision
node is an attribute test with each branch (to another decision tree) being a possible value of the
attribute.
DecisionTreeAlgorithmID3
Ans.In the agglomerative hierarchical approach, we start by defining each data point to be a cluster
and combine existing clusters at each step. Here are four different methods for doing this:
1. Single Linkage: In single linkage, we define the distance between two clusters to be the
minimum distance between any single data point in the first cluster and any single data point in the
second cluster. On the basis of this definition of distance between clusters, at each stage of the
process we combine the two clusters that have the smallest single linkage distance.
242
243
2. Complete Linkage: In complete linkage, we define the distance between two clusters to be the
maximum distance between any single data point in the first cluster and any single data point in the
second cluster. On the basis of this definition of distance between clusters, at each stage of the
process we combine the two clusters that have the smallest complete linkage distance.
3. Average Linkage: In average linkage, we define the distance between two clusters to be the
average distance between data points in the first cluster and data points in the second cluster. On the
basis of this definition of distance between clusters, at each stage of the process we combine the
two clusters that have the smallest average linkage distance.
4. Centroid Method: In centroid method, the distance between two clusters is the distance between
the two mean vectors of the clusters. At each stage of the process we combine the two clusters that
have the smallest centroid distance.
5. Ward’s Method: This method does not directly define a measure of distance between two
points or clusters. It is an ANOVA based approach. At each stage, those two clusters marge,
which provides the smallest increase in the combined error sum of squares from one-way univariate
ANOVAs that can be done for each variable with groups defined by the clusters at that stage of the
process
Ans.Clustering is the process of partitioning a group of data points into a small number of clusters.
For instance, the items in a supermarket are clustered in categories (butter, cheese and milk are
grouped in dairy products). Of course this is a qualitative kind of partitioning. A quantitative
approach would be to measure certain features of the products, say percentage of milk and others,
and products with high percentage of milk would be grouped together. In general, we have n data
points xi,i=1...n that have to be partitioned in k clusters. The goal is to assign a cluster to each data
point. K-means is a clustering method that aims to find the positions μi,i=1...k of the clusters that
minimize the distance from the data points to the cluster. K-means clustering solves
argminc∑i=1k∑x∈cid(x,μi)=argminc∑i=1k∑x∈ci∥x−μi∥22
whereci is the set of points that belong to cluster i. The K-means clustering uses the square
of the Euclidean distance d(x,μi)=∥x−μi∥22. This problem is not trivial (in fact it is NP-
hard), so the K-means algorithm only hopes to find the global minimum, possibly getting
stuck in a different solution.
243
244
Given the statistical model which generates a set of observed data, a set of unobserved latent
data or missing values , and a vector of unknown parameters , along with a likelihood
function , the maximum likelihood estimate (MLE) of the unknown parameters is determined
by the marginal likelihood of the observed dataHowever, this quantity is often intractable (e.g. if
is a sequence of events, so that the number of values grows exponentially with the sequence
length, making the exact calculation of the sum extremely difficult).
The EM algorithm seeks to find the MLE of the marginal likelihood by iteratively applying these
two steps:
Expectation step (E step): Calculate the expected value of the log likelihood function, with
respect to the conditional distribution of given under the current estimate of the
parameters :
Maximization step (M step): Find the parameter that maximizes this quantity:
1. The observed data points may be discrete (taking values in a finite or countably infinite
set) or continuous (taking values in an uncountably infinite set). Associated with each data
point may be a vector of observations.
2. The missing values (aka latent variables) are discrete, drawn from a fixed number of
values, and with one latent variable per observed unit.
3. The parameters are continuous, and are of two kinds: Parameters that are associated with all
data points, and those associated with a specific value of a latent variable (i.e., associated
with all data points which corresponding latent variable has that value).
244
245
However, it is possible to apply EM to other sorts of models.The motive is as follows. If the value
of the parameteris known, usually the value of the latent variables can be found by maximizing the
log-likelihood over all possible values of , either simply by iterating over or through an
algorithm such as the Viterbi algorithm for hidden Markov models. Conversely, if we know the
value of the latent variables , we can find an estimate of the parameters fairly easily,
typically by simply grouping the observed data points according to the value of the associated latent
variable and averaging the values, or some function of the values, of the points in each group. This
suggests an iterative algorithm, in the case where bothandare unknown:
The algorithm as just described monotonically approaches a local minimum of the cost function.
245
246