
LP – IV LAB MANUAL

List of Experiments
1. Implement Union, Intersection, Complement and Difference operations on fuzzy sets. Also create a fuzzy relation by the Cartesian product of any two fuzzy sets and perform max-min composition on any two fuzzy relations.

2. Implement a genetic algorithm for a benchmark function (e.g. Square, Rosenbrock function, etc.). Initialize the population from the Standard Normal Distribution. Evaluate the fitness of all its individuals. Then run multiple generations of the genetic algorithm; a generation consists of applying selection, crossover, mutation, and replacement. Use:
   • Tournament selection without replacement with tournament size s
   • One point crossover with probability Pc
   • Bit-flip mutation with probability Pm
   • Full replacement strategy

3. Implement Particle Swarm Optimization for a benchmark function (e.g. Square, Rosenbrock function). Initialize the population from the Standard Normal Distribution. Evaluate the fitness of all particles. Use:
   • c1 = c2 = 2
   • Inertia weight linearly varied between 0.9 and 0.4
   • Global best variant

4. Implement basic logic gates using McCulloch-Pitts or Hebb net neural networks.

5. Write a program to find the Boolean function to implement the following single layer perceptron. Assume all activation functions to be the threshold function, which is 1 for all input values greater than zero and 0 otherwise.

6. Implement Union, Intersection, Complement and Difference operations on fuzzy sets. Also create a fuzzy relation by the Cartesian product of any two fuzzy sets and perform max-min composition on any two fuzzy relations.

7. Mini-Project on Genetic Algorithm: Apply the Genetic Algorithm for optimization on a dataset obtained from the UCI ML repository. For example: IRIS Dataset, Travelling Salesman Problem, or KDD Dataset.

8. Use Business Intelligence and analytics tools to recommend the combination of share purchases and sales for maximizing the profit.

9. On Twitter data, perform computing using BIA tools effectively.

10. Mini-Project on BI: Sentiment analysis of WhatsApp data using analytics tools.


Assignment No: 01

Aim: Implement Union, Intersection, Complement and Difference operations on fuzzy sets.
Also create fuzzy relation by Cartesian product of any two fuzzy sets and perform max-min
composition on any two fuzzy relations.

Objectives:
1. To learn different fuzzy operations on fuzzy sets.
2. To learn max-min composition on fuzzy sets.

Software Requirements:
Ubuntu 18.04

Hardware Requirements:
Pentium IV system with latest configuration

Theory:

Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the
concept of partial truth: truth values between "completely true" and "completely false". As its
name suggests, it is the logic underlying modes of reasoning which are approximate rather than
exact. The importance of fuzzy logic derives from the fact that most modes of human reasoning,
and especially common sense reasoning, are approximate in nature.
The essential characteristics of fuzzy logic as founded by Lotfi Zadeh are as follows.

• In fuzzy logic, exact reasoning is viewed as a limiting case of approximate reasoning.

• In fuzzy logic, everything is a matter of degree.

• Any logical system can be fuzzified.

• In fuzzy logic, knowledge is interpreted as a collection of elastic or, equivalently, fuzzy constraints on a collection of variables.

• Inference is viewed as a process of propagation of elastic constraints.


The third statement, hence, defines Boolean logic as a subset of fuzzy logic.

Fuzzy Sets
Fuzzy Set Theory was formalised by Professor Lotfi Zadeh at the University of California in
1965. What Zadeh proposed is very much a paradigm shift that first gained acceptance in the Far
East, and its successful application has ensured its adoption around the world.

A paradigm is a set of rules and regulations which defines boundaries and tells us what to do to
be successful in solving problems within these boundaries. For example the use of transistors
instead of vacuum tubes is a paradigm shift - likewise the development of Fuzzy Set Theory
from conventional bivalent set theory is a paradigm shift.

Bivalent Set Theory can be somewhat limiting if we wish to describe a 'humanistic' problem
mathematically. For example, Fig 1 below illustrates bivalent sets to characterise the temperature
of a room.

The most obvious limiting feature of bivalent sets that can be seen clearly from the diagram is that
they are mutually exclusive: it is not possible to have membership of more than one set (opinion
would vary widely as to whether 50 degrees Fahrenheit is 'cold' or 'cool', hence the expert knowledge
we need to define our system is mathematically at odds with the humanistic world). Clearly, it is not
accurate to define a transition from a quantity such as 'warm' to 'hot' by the application of one degree
Fahrenheit of heat. In the real world a smooth (unnoticeable) drift from warm to hot would occur.
This natural phenomenon can be described more accurately by Fuzzy Set Theory. Fig.2 below shows
how fuzzy sets quantifying the same information can describe this natural drift.


The whole concept can be illustrated with this example. Let's talk about people and "youthness".
In this case the set S (the universe of discourse) is the set of people. A fuzzy subset YOUNG is
also defined, which answers the question "to what degree is person x young?" To each person in
the universe of discourse, we have to assign a degree of membership in the fuzzy subset
YOUNG. The easiest way to do this is with a membership function based on the person's age.
young(x) = { 1, if age(x) <= 20,
(30-age(x))/10, if 20 < age(x) <= 30,
0, if age(x) > 30 }
A graph of this looks like:


Given this definition, here are some example values:

Person      Age   Degree of youth
Johan       10    1.00
Edwin       21    0.90
Parthiban   25    0.50
Arosha      26    0.40
Chin Wei    28    0.20
Rajkumar    83    0.00

So given this definition, we'd say that the degree of truth of the statement "Parthiban is YOUNG"
is 0.50.
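
The piecewise definition of young(x) can be evaluated directly; the short Python sketch below (illustrative only, not part of the manual's listings) reproduces the degrees of youth in the table above.

def young(age):
    # Membership of the fuzzy subset YOUNG, as defined above.
    if age <= 20:
        return 1.0
    elif age <= 30:
        return (30 - age) / 10.0
    else:
        return 0.0

# Ages from the example table
for name, age in [("Johan", 10), ("Edwin", 21), ("Parthiban", 25),
                  ("Arosha", 26), ("Chin Wei", 28), ("Rajkumar", 83)]:
    print(f"{name:10s} {age:3d} {young(age):.2f}")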
Note: Membership functions almost never have as simple a shape as age(x). They will at least tend to
be triangles pointing up, and they can be much more complex than that. Furthermore, membership
functions have so far been discussed as if they are always based on a single criterion, but this isn't always the
case, although it is the most common case. One could, for example, want to have the membership
function for YOUNG depend on both a person's age and their height (Arosha's short for his age).
This is perfectly legitimate, and occasionally used in practice. It's referred to as a two-dimensional
membership function. It's also possible to have even more criteria, or to have the membership
function depend on elements from two completely different universes of discourse.



Fuzzy Set Operations.

Union
The membership function of the Union of two fuzzy sets A and B, with membership functions
µA(x) and µB(x) respectively, is defined as the maximum of the two individual membership functions:

µA∪B(x) = max(µA(x), µB(x))

This is called the maximum criterion.

The Union operation in Fuzzy set theory is the equivalent of the OR operation in Boolean algebra.

Intersection
The membership function of the Intersection of two fuzzy sets A and B, with membership
functions µA(x) and µB(x) respectively, is defined as the minimum of the two individual membership functions:

µA∩B(x) = min(µA(x), µB(x))

This is called the minimum criterion.


The Intersection operation in Fuzzy set theory is the equivalent of the AND operation in Boolean
algebra

Complement
The membership function of the Complement of a fuzzy set A, with membership function µA(x), is defined
as the negation of the specified membership function:

µĀ(x) = 1 - µA(x)

This is called the negation criterion.

The Complement operation in Fuzzy set theory is the equivalent of the NOT operation in
Boolean algebra.
The following rules which are common in classical set theory also apply to Fuzzy set theory.


Let A1, A2, …, An be fuzzy sets in U1, U2, …, Un, respectively. The Cartesian product of A1, A2,
…, An is a fuzzy set in the space U1 × U2 × … × Un with the membership function:

µA1×A2×…×An(x1, x2, …, xn) = min[µA1(x1), µA2(x2), …, µAn(xn)]

The Cartesian product of A1, A2, …, An is denoted by A1 × A2 × … × An.
Cartesian Product Example: Let A = {(3, 0.5), (5, 1), (7, 0.6)} and B = {(3, 1), (5, 0.6)}.
The product is the set of all pairs from A and B with the minimum of the associated memberships:
A × B = {[(3, 3), min(0.5, 1)], [(5, 3), min(1, 1)], [(7, 3), min(0.6, 1)], [(3, 5), min(0.5, 0.6)], [(5, 5), min(1, 0.6)], [(7, 5), min(0.6, 0.6)]}
      = {[(3, 3), 0.5], [(5, 3), 1], [(7, 3), 0.6], [(3, 5), 0.5], [(5, 5), 0.6], [(7, 5), 0.6]}.
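
The worked example can be checked with a few lines of Python (illustrative only; the dictionary representation of a discrete fuzzy set is an assumption, not part of the manual's MATLAB listings):

# Discrete fuzzy sets as {element: membership} dictionaries
A = {3: 0.5, 5: 1.0, 7: 0.6}
B = {3: 1.0, 5: 0.6}

# Cartesian product: membership of each pair is the minimum of the two memberships
AxB = {(a, b): min(mu_a, mu_b) for a, mu_a in A.items() for b, mu_b in B.items()}

for pair, mu in sorted(AxB.items()):
    print(pair, mu)
# Expected: (3,3)->0.5, (3,5)->0.5, (5,3)->1.0, (5,5)->0.6, (7,3)->0.6, (7,5)->0.6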

MATLAB Code:

Union Operation:
clear all; %clear all previous workspace variables
x=(0:0.1:10)'; %x-axis from 0 to 10 with 0.1 interval
u1=gaussmf(x,[1,4]); %Gaussian Membership function
u2=trimf(x,[3 6.5 9]); %Triangular Membership function
u_reun=max(u1,u2); %Union Operation on Set u1 and u2
hold on
plot(x,u1,'r') %plotting set1
plot(x,u2,'m') %plotting set2
axis([0 10 0 1.05]);
legend('A','B')
plot(x,u_reun,'color','b','linewidth',4);
title('Union operation on Fuzzy Sets');

Fig: Union operation of two fuzzy sets

Intersection Operation:
clear all; %clear all previous workspace variables
x=(0:0.1:10)'; %x-axis from 0 to 10 with 0.1 interval
u1=gaussmf(x,[1,4]); %Gaussian Membership function


u2=trimf(x,[3 6.5 9]); %Triangular Membership function


u_reun=min(u1,u2); %Intersection Operation on Set u1 and u2
hold on
plot(x,u1,'r') %plotting set1
plot(x,u2,'m') %plotting set2
axis([0 10 0 1.05]);
legend('A','B')
plot(x,u_reun,'color','b','linewidth',4);
title('Intersection operation on Fuzzy Sets');

Fig: Intersection operation on two fuzzy sets

Complement Operation:
clear all; % clear all previous workspace variables
x=(0:0.1:10)'; % x-axis from 0 to 10
u1=gaussmf(x,[1,4]); % Gaussian Membership function
u_reun=1-u1; % Complement of Set u1
hold on
plot(x,u1,'r') % plot graph of set u1
axis([0 10 0 1]); % x-axis and y-axis definition
plot(x,u_reun,'color','b','linewidth',4); % plot of complement operation
title('Complement operation on Fuzzy Set');

Fig: Complement operation on a set


Difference:
clear all; % clear all previous workspace variables
x=(0:0.1:10)'; % x-axis from 0 to 10
u1=gaussmf(x,[1,4]); % Gaussian Membership function
u2=trimf(x,[2 5 7]); % Triangular membership function
u3=1-u2; %complement of u2
u_reun=min(u1,u3); % Difference u1 - u2 = min(u1, 1-u2)
hold on
plot(x,u1,'r') % plot graph of set u1
plot(x,u3,'b') % plot graph of complement of u2
axis([0 10 0 1]); % x-axis and y-axis definition
plot(x,u_reun,'color','m','linewidth',4); % plot of difference operation
title('Difference operation on two fuzzy sets');

Fig: Difference operation on two fuzzy set

Cartesian product:
clear all; %clear all previous workspace variables
x=(0:0.1:10)'; %x-axis from 0 to 10 with 0.1 interval
u1=gaussmf(x,[1,4]); %Gaussian Membership function
u2=trimf(x,[3 6.5 9]); %Triangular Membership function
u_reun=times(u1,u2); %Element-wise (algebraic) product of u1 and u2, used here as a 1-D illustration of the Cartesian product
hold on
plot(x,u1,'r') %plotting set1
plot(x,u2,'m') %plotting set2
axis([0 10 0 1.05]);
legend('A','B')
plot(x,u_reun,'color','b','linewidth',4);
title('Cartesian Product on two fuzzy Sets');


Fig: Cartesian product on two fuzzy sets


MAX-MIN Composition:
clear all; % clear all previous workspace variables
x=(0:0.1:10)'; % x-axis from 0 to 10
u1=trimf(x,[1 5 9]); % Triangular Membership function
u2=trimf(x,[2 4 7]); % Triangular membership function
u_reun=max(min(u1,u2)); % MAX-MIN composition value (height of the intersection) of u1 and u2
hold on
plot(x,u1,'r') % plot graph of set u1
plot(x,u2,'m') % plot graph of set u2
axis([0 10 0 1]); % x-axis and y-axis definition
plot(x,u_reun*ones(size(x)),'color','b','linewidth',4); % plot MAX-MIN composition value as a horizontal line
title('MAX-MIN composition on two fuzzy sets');

Fig: MAX-MIN composition on two fuzzy sets
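
The MATLAB listing above composes two membership curves defined over the same axis. For max-min composition of two fuzzy relations proper, i.e. matrices R on X × Y and S on Y × Z, a minimal Python/NumPy sketch (with small example relations assumed for illustration) is:

import numpy as np

# Example fuzzy relations: R on X x Y (2x3) and S on Y x Z (3x2)
R = np.array([[0.6, 0.3, 1.0],
              [0.2, 0.9, 0.4]])
S = np.array([[1.0, 0.5],
              [0.7, 0.8],
              [0.4, 0.6]])

# Max-min composition: T(x, z) = max over y of min(R(x, y), S(y, z))
T = np.zeros((R.shape[0], S.shape[1]))
for i in range(R.shape[0]):
    for k in range(S.shape[1]):
        T[i, k] = np.max(np.minimum(R[i, :], S[:, k]))

print(T)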

Conclusion: Thus we learnt the implementation of Union, Intersection, Complement and
Difference operations on fuzzy sets. We also created a fuzzy relation by the Cartesian product of two
fuzzy sets and performed max-min composition.


Assignment No: 02

Aim: Implement a genetic algorithm for a benchmark function (e.g. Square, Rosenbrock function,
etc.). Initialize the population from the Standard Normal Distribution. Evaluate the fitness of all its
individuals. Then run multiple generations of the genetic algorithm. A generation consists
of applying selection, crossover, mutation, and replacement. Use:

• Tournament selection without replacement with tournament size s


• One point crossover with probability Pc
• Bit-flip mutation with probability Pm
• Use full replacement strategy

Objectives:
1. To learn Genetic Algorithm.
2. To learn about optimization algorithm.

Software Requirements:
Ubuntu 18.04

Hardware Requirements:
Pentium IV system with latest configuration

Theory:
Genetic Algorithms are a class of stochastic, population based optimization algorithms inspired
by the biological evolution process using the concepts of “Natural Selection” and “Genetic
Inheritance” (Darwin 1859) and originally developed by Holland.
GAs are now used in engineering and business optimization applications, where the search space
is large and/or too complex (non-smooth) for analytic treatment. This algorithm reflects the


process of natural selection where the fittest individuals are selected for reproduction in order to
produce offspring of the next generation.

The process of natural selection starts with the selection of fittest individuals from a population. They
produce offspring which inherit the characteristics of the parents and will be added to the next
generation. If parents have better fitness, their offspring will be better than parents and have a
better chance at surviving. This process keeps on iterating and at the end, a generation with the
fittest individuals will be found.
This notion can be applied for a search problem. We consider a set of solutions for a problem
and select the set of best ones out of them.
Five phases are considered in a genetic algorithm.
1. Initial population
2. Fitness function
3. Selection
4. Crossover
5. Mutation

Initial Population
The process begins with a set of individuals which is called a Population. Each individual is a
solution to the problem you want to solve.
An individual is characterized by a set of parameters (variables) known as Genes. Genes are
joined into a string to form a Chromosome (solution).
In a genetic algorithm, the set of genes of an individual is represented using a string, in terms of
an alphabet. Usually, binary values are used (a string of 1s and 0s). We say that we encode the
genes in a chromosome.


Fitness Function :

The fitness function determines how fit an individual is (the ability of an individual to compete
with other individuals). It gives a fitness score to each individual. The probability that an
individual will be selected for reproduction is based on its fitness score.

Selection

The idea of selection phase is to select the fittest individuals and let them pass their genes to the
next generation.
Two pairs of individuals (parents) are selected based on their fitness scores. Individuals with
high fitness have more chance to be selected for reproduction.

Crossover

Crossover is the most significant phase in a genetic algorithm. For each pair of parents to be
mated, a crossover point is chosen at random from within the genes. For example, consider the
crossover point to be 3 as shown below.


Offspring are created by exchanging the genes of parents among themselves until the crossover
point is reached.

The new offspring are added to the population

Mutation
In certain new offspring formed, some of their genes can be subjected to a mutation with a low
random probability. This implies that some of the bits in the bit string can be flipped


Mutation occurs to maintain diversity within the population and prevent premature convergence.

Termination
The algorithm terminates if the population has converged (does not produce offspring which are
significantly different from the previous generation). Then it is said that the genetic algorithm
has provided a set of solutions to our problem.

Comments
The population has a fixed size. As new generations are formed, individuals with the least fitness
die, providing space for new offspring.
The sequence of phases is repeated to produce individuals in each new generation which are
better than the previous generation.

GA Software: Python, Weka, Mendal Soft, MATLAB

Python Code:
Assign2.py
import numpy
import GA

"""
The target is to maximize this equation ASAP:
    y = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
    where (x1,x2,x3,x4,x5,x6) = (4,-2,3.5,5,-11,-4.7)
What are the best values for the 6 weights w1 to w6?
We are going to use the genetic algorithm to find the best possible values after a number of
generations.
"""
# Inputs of the equation.
equation_inputs = [4, -2, 3.5, 5, -11, -4.7]
# Number of the weights we are looking to optimize.
num_weights = 6
"""
Genetic algorithm parameters:
    Mating pool size
    Population size
"""
sol_per_pop = 8
num_parents_mating = 4

# Defining the population size.
# The population will have sol_per_pop chromosomes where each chromosome has num_weights genes.
pop_size = (sol_per_pop, num_weights)
# Creating the initial population.
new_population = numpy.random.uniform(low=-4.0, high=4.0, size=pop_size)
print(new_population)

num_generations = 5
for generation in range(num_generations):
    print("Generation : ", generation)
    # Measuring the fitness of each chromosome in the population.
    fitness = GA.cal_pop_fitness(equation_inputs, new_population)

    # Selecting the best parents in the population for mating.
    parents = GA.select_mating_pool(new_population, fitness, num_parents_mating)

    # Generating the next generation using crossover.
    offspring_crossover = GA.crossover(parents,
                                       offspring_size=(pop_size[0]-parents.shape[0], num_weights))

    # Adding some variation to the offspring using mutation.
    offspring_mutation = GA.mutation(offspring_crossover)

    # Creating the new population from the parents and offspring.
    new_population[0:parents.shape[0], :] = parents
    new_population[parents.shape[0]:, :] = offspring_mutation

    # The best result in the current iteration.
    print("Best result : ", numpy.max(numpy.sum(new_population*equation_inputs, axis=1)))

# Getting the best solution after all generations have finished.
# First, the fitness is calculated for each solution in the final generation.
fitness = GA.cal_pop_fitness(equation_inputs, new_population)
# Then return the index of the solution corresponding to the best fitness.
best_match_idx = numpy.where(fitness == numpy.max(fitness))

print("Best solution : ", new_population[best_match_idx, :])
print("Best solution fitness : ", fitness[best_match_idx])

GA.py:
import numpy

def cal_pop_fitness(equation_inputs, pop):
    # Calculating the fitness value of each solution in the current population.
    # The fitness function calculates the sum of products between each input and its corresponding weight.
    fitness = numpy.sum(pop*equation_inputs, axis=1)
    return fitness

def select_mating_pool(pop, fitness, num_parents):
    # Selecting the best individuals in the current generation as parents for producing
    # the offspring of the next generation.
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = numpy.where(fitness == numpy.max(fitness))
        max_fitness_idx = max_fitness_idx[0][0]
        parents[parent_num, :] = pop[max_fitness_idx, :]
        fitness[max_fitness_idx] = -99999999999
    return parents

def crossover(parents, offspring_size):
    offspring = numpy.empty(offspring_size)
    # The point at which crossover takes place between two parents. Usually it is at the center.
    crossover_point = numpy.uint8(offspring_size[1]/2)

    for k in range(offspring_size[0]):
        # Index of the first parent to mate.
        parent1_idx = k % parents.shape[0]
        # Index of the second parent to mate.
        parent2_idx = (k+1) % parents.shape[0]
        # The new offspring will have its first half of its genes taken from the first parent.
        offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
        # The new offspring will have its second half of its genes taken from the second parent.
        offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring

def mutation(offspring_crossover):
    # Mutation changes a single gene in each offspring randomly.
    for idx in range(offspring_crossover.shape[0]):
        # The random value to be added to the gene.
        random_value = numpy.random.uniform(-1.0, 1.0, 1)
        offspring_crossover[idx, 4] = offspring_crossover[idx, 4] + random_value
    return offspring_crossover

Output:
Generation : 1
Fitness values:
[ 18.24112489 17.0688537 15.99527402 14.40299221 -8.46075629 31.73289712
6.10307563 24.08733441]
Selected parents:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]
[ 2.12480298 2.97122243 3.60375452 -1.40103767 -1.5781162 0.30567304]
[-0.63698911 -2.8638447 2.93392615 -1.40103767 -1.20313655 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -0.72163167 0.7516408 0.00677938]]
Crossover result:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.5781162 0.30567304]
[ 2.12480298 2.97122243 3.60375452 -1.40103767 -1.20313655 0.30567304]
[-0.63698911 -2.8638447 2.93392615 -0.72163167 0.7516408 0.00677938]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]]
Mutation result:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.2392086 0.30567304]
[ 2.12480298 2.97122243 3.60375452 -1.40103767 -0.38610586 0.30567304]
[-0.63698911 -2.8638447 2.93392615 -0.72163167 1.33639943 0.00677938]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.13941727 2.29682254]]
Best result after generation 1 : 34.1663669207
Generation : 2
Fitness values:
[ 31.73289712 24.08733441 18.24112489 17.0688537 34.16636692 10.97522073 -4.89194068 22.86998223]
Selected Parents:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.2392086 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]
[ 2.12480298 2.97122243 3.60375452 -1.40103767 -1.5781162 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.13941727 2.29682254]]
Crossover result:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.5781162 0.30567304]
[ 2.12480298 2.97122243 3.60375452 -1.56909315 -1.13941727 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.2392086 0.30567304]]


Mutation result:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -0.73543721 0.30567304]
[ 2.12480298 2.97122243 3.60375452 -1.56909315 -0.50581509 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.20089639 0.30567304]]
Best result after generation 2: 34.5930432629
Generation : 3
Fitness values:
[ 34.16636692 31.73289712 24.08733441 22.86998223 34.59304326 28.6248816
2.09334217 33.7449326 ]
Selected parents:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.2392086 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.20089639 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]]
Crossover result:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.2392086 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.20089639 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]]
Mutation result:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -2.20744102 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.16589294 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.37553107 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.44124005 2.29682254]]
Best result after generation 3: 44.8169235189
Generation : 4
Fitness values
[ 34.59304326 34.16636692 33.7449326 31.73289712 44.81692352
33.35989464 36.46723397 37.19003273]
Selected parents:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -2.20744102 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.44124005 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.37553107 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]]
Crossover result:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.37553107 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -2.20744102 0.30567304]]
Mutation result:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.13382082 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.98105233 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.27638584 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.70558545 0.30567304]]
Best result after generation 4: 44.8169235189
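
The listing above selects the fittest chromosomes directly, always splits at the middle gene, and mutates by adding a random real value. The assignment statement instead calls for tournament selection without replacement (tournament size s), one point crossover with probability Pc, bit-flip mutation with probability Pm and a full replacement strategy, which apply to a binary-encoded population. The following Python sketch is an illustrative outline only (function and parameter names are assumptions, not part of the original listing); one reading of "without replacement" is used, namely that the s competitors in a tournament are drawn without repetition:

import numpy as np

def tournament_selection(pop, fitness, s):
    # Draw s distinct individuals and return a copy of the fittest one.
    idx = np.random.choice(len(pop), size=s, replace=False)
    return pop[idx[np.argmax(fitness[idx])]].copy()

def one_point_crossover(p1, p2, pc):
    # With probability pc, swap the tails of the two parents at a random cut point.
    c1, c2 = p1.copy(), p2.copy()
    if np.random.rand() < pc:
        point = np.random.randint(1, len(p1))
        c1[point:], c2[point:] = p2[point:], p1[point:]
    return c1, c2

def bit_flip_mutation(chrom, pm):
    # Flip each bit independently with probability pm.
    mask = np.random.rand(len(chrom)) < pm
    chrom[mask] = 1 - chrom[mask]
    return chrom

def next_generation(pop, fitness, s=3, pc=0.9, pm=0.01):
    # pop is assumed to be a 2-D array of 0/1 genes; fitness is a 1-D array of scores.
    # Full replacement strategy: the offspring completely replace the old population.
    new_pop = []
    while len(new_pop) < len(pop):
        p1 = tournament_selection(pop, fitness, s)
        p2 = tournament_selection(pop, fitness, s)
        c1, c2 = one_point_crossover(p1, p2, pc)
        new_pop.append(bit_flip_mutation(c1, pm))
        new_pop.append(bit_flip_mutation(c2, pm))
    return np.array(new_pop[:len(pop)])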

Conclusion: Thus we learnt the implementation of a genetic algorithm for a benchmark function.


Assignment No: 3

Aim: Implement Particle Swarm Optimization for a benchmark function (e.g. Square, Rosenbrock
function). Initialize the population from the Standard Normal Distribution. Evaluate the fitness of all
particles. Use:
• c1 = c2 = 2
• Inertia weight linearly varied between 0.9 and 0.4
• Global best variant

Objectives:
1. To learn swarm algorithm
2. To learn about optimization algorithm.

Software Requirements:
Ubuntu 18.04

Hardware Requirements:
Pentium IV system with latest configuration

Theory:
Particle swarm optimization (PSO) is a population based stochastic optimization technique
developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by social behavior of bird flocking
or fish schooling.
PSO shares many similarities with evolutionary computation techniques such as Genetic
Algorithms (GA). The system is initialized with a population of random solutions and searches
for optima by updating generations. However, unlike GA, PSO has no evolution operators such
as crossover and mutation. In PSO, the potential solutions, called particles, fly through the
problem space by following the current optimum particles. Compared to GA, the advantages of
PSO are that PSO is easy to implement and there are few parameters to adjust. PSO has been
successfully applied in many areas: function optimization, artificial neural network training ,
fuzzy system control, and other areas where GA can be applied.


PSO uses a bunch of particles called the swarm. These particles are allowed to move around &
explore the search-space.
These particles move in a direction which is guided by —
1. The particle’s own previous velocity (Inertia)
2. Distance from the individual particles’ best known position (Cognitive Force)
3. Distance from the swarms best known position (Social Force)

Particle movement is overpowered by the swarm direction.

Essentially the particles collectively communicate with each other to converge faster. The swarm
doesn’t fully explore the search space but potentially finds a better solution.
Interestingly the overall direction of the swarm movement can be changed at any point of time
when a particle’s individual best is better than the swarm best. This allows a lot of disorder and
more chances of getting close to the global minima of the cost function

Algorithm
PSO is initialized with a group of random particles (solutions) and then searches for optima by
updating generations. In every iteration, each particle is updated by following two "best" values.
The first one is the best solution (fitness) it has achieved so far. (The fitness value is also stored.)
This value is called pbest. Another "best" value that is tracked by the particle swarm optimizer is
the best value, obtained so far by any particle in the population. This best value is a global best
and called gbest. When a particle takes part of the population as its topological neighbors, the
best value is a local best and is called lbest.


After finding the two best values, the particle updates its velocity and positions with following
equation (a) and (b).

v[] = v[] + c1 * rand() * (pbest[] - present[]) + c2 * rand() * (gbest[] - present[])    (a)

present[] = present[] + v[]    (b)

v[] is the particle velocity, present[] is the current particle (solution). pbest[] and gbest[] are
defined as stated before. rand() is a random number between (0, 1). c1, c2 are learning factors;
usually c1 = c2 = 2.

The pseudo code of the procedure is as follows

For each particle


Initialize particle
END
Do
For each particle
Calculate fitness value
If the fitness value is better than the best fitness value (pBest) in
history set current value as the new pBest
End

Choose the particle with the best fitness value of all the particles as the gBest
For each particle
Calculate particle velocity according equation (a)
Update particle position according equation (b)
End

While maximum iterations or minimum error criteria is not attained

Particles' velocities on each dimension are clamped to a maximum velocity Vmax, which is a
parameter specified by the user. If the sum of accelerations would cause the velocity on a
dimension to exceed Vmax, then
the velocity on that dimension is limited to Vmax.
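
To make equations (a) and (b) and the parameter choices in the Aim concrete, here is a minimal Python sketch of the global-best update loop with c1 = c2 = 2 and the inertia weight decreased linearly from 0.9 to 0.4 over the run. It is an illustrative outline only; the array names and the simple sphere objective are assumptions, not part of the MATLAB program given below.

import numpy as np

def sphere(x):
    # Simple benchmark objective (minimization).
    return np.sum(x**2, axis=1)

n, dim, max_iter = 30, 3, 100
c1 = c2 = 2.0
w_max, w_min = 0.9, 0.4

pos = np.random.randn(n, dim)   # initial positions from the Standard Normal Distribution
vel = np.zeros((n, dim))
pbest = pos.copy()
pbest_val = sphere(pos)
gbest = pbest[np.argmin(pbest_val)].copy()

for it in range(max_iter):
    w = w_max - (w_max - w_min) * it / max_iter   # linearly decreasing inertia weight
    r1, r2 = np.random.rand(n, dim), np.random.rand(n, dim)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)   # equation (a)
    pos = pos + vel                                                     # equation (b)
    val = sphere(pos)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("best value:", pbest_val.min(), "at", gbest)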


Flowchart :

MATLAB Code:
ofun.m:
function f=ofun(x)
% objective function (minimization)
of=10*(x(1)-1)^2+20*(x(2)-2)^2+30*(x(3)-3)^2;
% constraints (all constraints must be converted into <=0 type)
% if there is no constraints then comments all c0 lines below
c0=[];
c0(1)=x(1)+x(2)+x(3)-5; % <=0 type constraints
c0(2)=x(1)^2+2*x(2)-x(3); % <=0 type constraints
% defining penalty for each constraint
for i=1:length(c0)
if c0(i)>0
c(i)=1;
else
c(i)=0;
end
end
penalty=10000; % penalty on each constraint violation
f=of+penalty*sum(c); % fitness function


run_pso.m:
tic
clear all
close all
rng default
LB=[0 0 0]; %lower bounds of variables
UB=[10 10 10]; %upper bounds of variables
% pso parameters values
m=3; % number of variables
n=100; % population size
wmax=0.9; % inertia weight
wmin=0.4; % inertia weight
c1=2; % acceleration factor
c2=2; % acceleration factor
% pso main program----------------------------------------------------start
maxite=1000; % set maximum number of iteration
maxrun=10; % set maximum number of runs need to be
for run=1:maxrun
run
% pso initialization----------------------------------------------start
for i=1:n
for j=1:m
x0(i,j)=round(LB(j)+rand()*(UB(j)-LB(j)));
end
end
x=x0; % initial population
v=0.1*x0; % initial velocity
for i=1:n
f0(i,1)=ofun(x0(i,:));
end
[fmin0,index0]=min(f0);
pbest=x0; % initial pbest
gbest=x0(index0,:); % initial gbest
% pso initialization------------------------------------------------end
% pso algorithm---------------------------------------------------start
ite=1;
tolerance=1;
while ite<=maxite && tolerance>10^-12
w=wmax-(wmax-wmin)*ite/maxite; % update inertial weight
% pso velocity updates
for i=1:n
for j=1:m
v(i,j)=w*v(i,j)+c1*rand()*(pbest(i,j)-x(i,j))...
+c2*rand()*(gbest(1,j)-x(i,j));
end
end


% pso position update


for i=1:n
for j=1:m
x(i,j)=x(i,j)+v(i,j);
end
end
% handling boundary violations
for i=1:n
for j=1:m
if x(i,j)<LB(j)
x(i,j)=LB(j);
elseif x(i,j)>UB(j)
x(i,j)=UB(j);
end
end
end
% evaluating fitness
for i=1:n
f(i,1)=ofun(x(i,:));
end
% updating pbest and fitness
for i=1:n
if f(i,1)<f0(i,1)
pbest(i,:)=x(i,:);
f0(i,1)=f(i,1);
end
end
[fmin,index]=min(f0); % finding out the best particle
ffmin(ite,run)=fmin; % storing best fitness
ffite(run)=ite; % storing iteration count
% updating gbest and best fitness
if fmin<fmin0
gbest=pbest(index,:);
fmin0=fmin;
end
% calculating tolerance
if ite>100;
tolerance=abs(ffmin(ite-100,run)-fmin0);
end
% displaying iterative results
if ite==1
disp(sprintf('Iteration Best particle Objective fun'));
end
disp(sprintf('%8g %8g %8.4f',ite,index,fmin0));
ite=ite+1;
end
% pso algorithm-----------------------------------------------------end


gbest;
fvalue=10*(gbest(1)-1)^2+20*(gbest(2)-2)^2+30*(gbest(3)-3)^2;
fff(run)=fvalue;
rgbest(run,:)=gbest;
disp(sprintf('--------------------------------------'));
end
% pso main program------------------------------------------------------end
disp(sprintf('\n'));
disp(sprintf('*********************************************************'));
disp(sprintf('Final Results-----------------------------'));
[bestfun,bestrun]=min(fff)
best_variables=rgbest(bestrun,:)
disp(sprintf('*********************************************************'));
toc
% PSO convergence characteristic
plot(ffmin(1:ffite(bestrun),bestrun),'-k');
xlabel('Iteration');
ylabel('Fitness function value');
title('PSO convergence characteristic')

Output :

Conclusion: Thus we learnt the implementation of particle swarm optimization for a benchmark function.


Assignment No: 04

Aim: Implement basic logic gates using McCulloch-Pitts or Hebb net neural networks.

Objectives:
1. The student will be able to obtain the fundamentals and different architectures of neural
networks.

2. The student will have a broad knowledge of developing the different algorithms for neural
networks.

Software Requirements:
Ubuntu 18.04

Hardware Requirements:
Pentium IV system with latest configuration

Outcomes: The students will be able to,
• Describe the relation between real brains and simple artificial neural network models.
• Understand the role of neural networks in engineering.
• Apply the knowledge of computing and engineering concepts to this discipline.

Theory:

Neural network was inspired by the design and functioning of human brain and components.
Definition:
"An information processing model that is inspired by the way biological nervous systems (i.e. the
brain) process information is called a Neural Network."
A Neural Network has the ability to learn by examples. It is not designed to perform a fixed/specific
task, but rather tasks which need thinking (e.g. predictions).

An ANN is composed of a large number of highly interconnected processing elements (neurons)
working in unison to solve problems. It mimics the human brain. It is configured for special


application such as pattern recognition and data classification through a learning process. ANN is
85-90% accurate.

Basic Operation of a Neural Network:

X1 and X2 – input neurons.

Y- output neuron

Weighted interconnection links- W1 and W2.

The net input calculation is:

Yin = x1w1 + x2w2

The output is:

y = f(Yin)

where f is the activation function.

The McCulloch-Pitts Model of Neuron:

The early model of an artificial neuron was introduced by Warren McCulloch and Walter Pitts in
1943. The McCulloch-Pitts neural model is also known as the linear threshold gate. It is a neuron
with a set of inputs I1, I2, I3, …, Im and one output y. The linear threshold gate simply classifies the set
of inputs into two different classes; thus the output y is binary. Such a function can be described
mathematically using these equations:

Sum = W1·I1 + W2·I2 + … + Wm·Im
y = f(Sum) = 1 if Sum ≥ T, and 0 otherwise


W1,W2…Wm are weight values normalized in the range of either (0,1) or (-1,1) and associated
with each input line, Sum is the weighted sum, and T is a threshold constant. The function f is a
linear step function at threshold T as shown in figure

A simple M-P neuron is shown in the figure.


A connection is excitatory with weight w (w > 0) or inhibitory with weight -p (p > 0).
In the figure, inputs x1 to xn possess excitatory weighted connections and xn+1 to xn+m have
inhibitory weighted interconnections.
Since the firing of the neuron is based on a threshold, the activation function is defined as

f(yin) = 1 if yin ≥ θ, and 0 if yin < θ

For inhibition to be absolute, the threshold of the activation function should satisfy the
condition: θ > nw - p

The output will fire if it receives k or more excitatory inputs but no inhibitory inputs, where

kw ≥ θ > (k - 1)w

- The M-P neuron has no particular training algorithm.


- An analysis is performed to determine the weights and the threshold.


- It is used as a building block where any function or phenomenon is modelled based on a
logic function.

Consider the XOR function with the following truth table:

X1  X2  Y
0   0   0
0   1   1
1   0   1
1   1   0

The net input Yin is calculated as:

Yin = x1w1 + x2w2

Since XOR is not linearly separable, a single M-P neuron cannot realise it; it is therefore decomposed into two intermediate functions:

Z1 = x1 AND (NOT x2), Z2 = (NOT x1) AND x2, and Y = Z1 OR Z2

X1  X2  Z1
0   0   0
0   1   0
1   0   1
1   1   0

For Z1,
W11=1 and W12=-1
Θ=1

X1  X2  Z2
0   0   0
0   1   1
1   0   0
1   1   0

For Z2,

W11=-1 and W12=1

Θ=1

Y=Z1+Z2

Z1 Z2 Y
0 0 0
0 1 1
1 0 1
1 1 0

For Y,

W11=1 and W12=1

Θ=1
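
The weights and thresholds derived above can be checked with a few lines of code. The following Python sketch (illustrative only; not part of the manual's listing) implements the two-layer McCulloch-Pitts network for XOR using Z1, Z2 and Y exactly as analysed:

def mp_neuron(inputs, weights, theta):
    # McCulloch-Pitts unit: fires (returns 1) when the weighted sum reaches the threshold.
    net = sum(i * w for i, w in zip(inputs, weights))
    return 1 if net >= theta else 0

def xor_net(x1, x2):
    z1 = mp_neuron([x1, x2], [1, -1], 1)   # Z1 = x1 AND (NOT x2)
    z2 = mp_neuron([x1, x2], [-1, 1], 1)   # Z2 = (NOT x1) AND x2
    return mp_neuron([z1, z2], [1, 1], 1)  # Y  = Z1 OR Z2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
# Prints the XOR truth table: 0 0 0, 0 1 1, 1 0 1, 1 1 0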

Input:

Input1 Input2 Output


0 0 0
0 1 0
1 0 0
1 1 1

Python Code:
import numpy as np

class Perceptron:
    def __init__(self, input_length, weights=None):
        if weights is None:
            self.weights = np.ones(input_length) * 0.5
        else:
            self.weights = weights

    @staticmethod
    def unit_step_function(x):
        if x > 0.5:
            return 1
        return 0

    def __call__(self, in_data):
        weighted_input = self.weights * in_data
        weighted_sum = weighted_input.sum()
        return Perceptron.unit_step_function(weighted_sum)

p = Perceptron(2, np.array([0.5, 0.5]))
for x in [np.array([0, 0]), np.array([0, 1]),
          np.array([1, 0]), np.array([1, 1])]:
    y = p(np.array(x))
    print(x, y)

Output:
[0 0] 0
[0 1] 0
[1 0] 0
[1 1] 1

Conclusion: The McCulloch-Pitts model is implemented for the XOR function by using the
thresholding activation function. The activation of M-P neurons is binary (i.e. at any time step the
neuron may fire or may not fire). The threshold plays the major role here.


Assignment No: 05

Aim: Write a program to find the Boolean function to implement the following single layer
perceptron. Assume all activation functions to be the threshold function, which is 1 for all input
values greater than zero and 0 otherwise.

Objectives:
1. To learn single layer perceptron.
2. To learn Boolean logic implementation using perceptron.

Software Requirements:
Ubuntu 18.04

Hardware Requirements:
Pentium IV system with latest configuration

Theory:
The most common Boolean functions are AND, OR, and NOT. The Boolean logic AND only returns
1 if both inputs are 1, else 0; Boolean logic OR returns 1 for all inputs with a 1, and will only return
0 if both inputs are 0; and lastly, logic NOT returns the inverse of the input: if the input is 0 it returns
1, if the input is 1 it returns 0. To make it clear, the image below shows the truth table for the
basic Boolean functions.


The columns A and B are the inputs and column Z is the output. So, for the inputs A = 0, B = 0
the output is Z = 0.
Perceptron

A perceptron is the basic building block of a neural network. A perceptron represents a single neuron in a
human brain; it is composed of the inputs (Xm), the weights (Wm), a bias, and an activation
function that produces the output.
The inputs are converted into an ndarray which is then matrix multiplied with another
ndarray that holds the weights. Summing up the products and adding a bias creates the
net input; the result is then passed into an activation function that determines
whether the neuron fires an output or not.
The most common activation function used for classification is the sigmoid function (although
sigmoid is not the leading activation function for the middle layers of neural networks
[e.g. ReLU / Leaky ReLU], it is still widely used for final classification). The perceptron simply
separates the input into 2 categories, those that cause a fire and those that don't. It does this by
looking at (in the 2-dimensional case): w1I1 + w2I2 < t

If the LHS is < t, it doesn't fire, otherwise it fires. That is, it is drawing the line:

w1I1 + w2I2 = t

and looking at where the input point lies. Points on one side of the line fall into 1 category,
points on the other side fall into the other category. And because the weights and thresholds can
be anything, this is just any line across the 2 dimensional input space. So what the perceptron is
doing is simply drawing a line across the 2-d input space. Inputs to one side of the line are
classified into one category, inputs on the other side are classified into another. e.g. the OR
perceptron, w1=1, w2=1, t=0.5, draws the line: I1 + I2 = 0.5


across the input space, thus separating the points (0,1),(1,0),(1,1) from the point (0,0):

As you might imagine, not every set of points can be divided by a line like this. Those that can
be, are called linearly separable.
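
The OR example can be verified directly: the snippet below (illustrative only) evaluates w1I1 + w2I2 against t = 0.5 for the four binary inputs and shows that only (0, 0) falls on the non-firing side of the line.

w1, w2, t = 1, 1, 0.5   # weights and threshold of the OR perceptron
for I1, I2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    fires = w1 * I1 + w2 * I2 >= t
    print((I1, I2), int(fires))
# (0, 0) -> 0, all other inputs -> 1, i.e. the OR function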
In 2 input dimensions, we draw a 1 dimensional line. In n dimensions, we are drawing the (n-1)
dimensional hyperplane:
w1I1 + .. + wnIn = t

Perceptron Learning Algorithm


We initialize w with some random vector. We then iterate over all the examples in the data (P ∪
N), both positive and negative examples. Now, if an input x belongs to P, ideally what should the
dot product w.x be? It would be greater than or equal to 0, because that's the only thing our
perceptron wants at the end of the day, so let's give it that. And if x belongs to N, the dot product
MUST be less than 0. So if you look at the if conditions in the while loop:

Case 1: When x belongs to P and its dot product w.x < 0

Case 2: When x belongs to N and its dot product w.x ≥ 0

Only for these cases do we update our randomly initialized w. Otherwise, we don't touch w at
all, because Case 1 and Case 2 are violating the very rule of a perceptron. So we are adding x to
w (vector addition) in Case 1 and subtracting x from w in Case 2.
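
Written out in code, the update rule above looks as follows. This Python sketch is an illustrative outline (the example data and epoch limit are assumptions); it simply applies Case 1 and Case 2 until every example lies on the correct side of the hyperplane:

import numpy as np

def train_perceptron(P, N, epochs=100):
    # P: array of positive examples, N: array of negative examples (each row is an input vector).
    w = np.random.randn(P.shape[1])   # random initial weight vector
    for _ in range(epochs):
        converged = True
        for x in P:
            if np.dot(w, x) < 0:      # Case 1: positive example on the wrong side
                w = w + x
                converged = False
        for x in N:
            if np.dot(w, x) >= 0:     # Case 2: negative example on the wrong side
                w = w - x
                converged = False
        if converged:
            break
    return w

# Example: learning OR, with a constant bias input appended as a third component
P = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 1]])   # inputs that should fire
N = np.array([[0, 0, 1]])                          # input that should not fire
print(train_perceptron(P, N))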

Python Code:
import numpy as np

def perceptron(weights, inputs, bias):
    model = np.add(np.dot(inputs, weights), bias)
    logit = activation_function(model, type="sigmoid")
    return np.round(logit)

def activation_function(model, type="sigmoid"):
    return {
        "sigmoid": 1 / (1 + np.exp(-model))
    }[type]

def compute(data, logic_gate, weights, bias):
    weights = np.array(weights)
    output = np.array([perceptron(weights, datum, bias) for datum in data])
    return output

# This function is taken from
# https://github.com/fjcamillo/Neural-Representation-of-Logic-Functions/blob/master/Logic.py
def print_template(dataset, name, data):
    print("Logic Function: {}".format(name.upper()))
    print("X0\tX1\tX2\tY")
    toPrint = ["{1}\t{2}\t{3}\t{0}".format(output, *datas) for datas, output in zip(dataset, data)]
    for i in toPrint:
        print(i)

def main():
    dataset = np.array([
        [0, 0, 0],
        [0, 0, 1],
        [0, 1, 0],
        [0, 1, 1],
        [1, 0, 0],
        [1, 0, 1],
        [1, 1, 0],
        [1, 1, 1]
    ])
    gates = {
        "and": compute(dataset, "and", [1, 1, 1], -2),
        "or": compute(dataset, "or", [1, 1, 1], -0.9),
        "not": compute(np.array([[0], [1]]), "not", [-1], 1),
        "nand": compute(dataset, "nand", [-1, -1, -1], 3),
        "nor": compute(dataset, "nor", [-1, -1, -1], 1),
    }
    for gate in gates:
        print_template(dataset, gate, gates[gate])

if __name__ == '__main__':
    main()

Output:

Logic Function: AND
X0 X1 X2 Y
0 0 0 0.0
0 0 1 0.0
0 1 0 0.0
0 1 1 0.0
1 0 0 0.0
1 0 1 0.0
1 1 0 0.0


1 1 1 1.0
Logic Function: OR
X0 X1 X2 Y
0 0 0 0.0
0 0 1 1.0
0 1 0 1.0
0 1 1 1.0
1 0 0 1.0
1 0 1 1.0
1 1 0 1.0
1 1 1 1.0
Logic Function: NOT
X0 X1 X2 Y
0 0 0 1.0
0 0 1 0.0
Logic Function: NAND
X0 X1 X2 Y
0 0 0 1.0
0 0 1 1.0
0 1 0 1.0
0 1 1 1.0
1 0 0 1.0
1 0 1 1.0
1 1 0 1.0
1 1 1 0.0
Logic Function: NOR
X0 X1 X2 Y
0 0 0 1.0
0 0 1 0.0
0 1 0 0.0
0 1 1 0.0
1 0 0 0.0
1 0 1 0.0
1 1 0 0.0
1 1 1 0.0

Conclusion: Thus we learned the implementation of a single layer perceptron for the AND, OR, NOT,
NAND and NOR Boolean functions.


Assignment No: 06
Aim: Implement Union, Intersection, Complement and Difference operations on fuzzy sets. Also create
fuzzy relation by Cartesian product of any two fuzzy sets and perform max-min composition on any two
fuzzy relations.

Objectives:
1. To learn different fuzzy operations on fuzzy sets.
2. To learn max-min composition on fuzzy sets.

Software Requirements:
Ubuntu 18.04

Hardware Requirements:
Pentium IV system with latest configuration

Theory:
Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the
concept of partial truth: truth values between "completely true" and "completely false". As its
name suggests, it is the logic underlying modes of reasoning which are approximate rather than
exact. The importance of fuzzy logic derives from the fact that most modes of human reasoning,
and especially common sense reasoning, are approximate in nature.
The essential characteristics of fuzzy logic as founded by Lotfi Zadeh are as follows.

• In fuzzy logic, exact reasoning is viewed as a limiting case of approximate reasoning.

• In fuzzy logic, everything is a matter of degree.

• Any logical system can be fuzzified.

• In fuzzy logic, knowledge is interpreted as a collection of elastic or, equivalently, fuzzy constraints on a collection of variables.

• Inference is viewed as a process of propagation of elastic constraints.

The third statement, hence, defines Boolean logic as a subset of fuzzy logic.


Fuzzy Sets

Fuzzy Set Theory was formalised by Professor Lotfi Zadeh at the University of California in
1965. What Zadeh proposed is very much a paradigm shift that first gained acceptance in the Far
East, and its successful application has ensured its adoption around the world.

A paradigm is a set of rules and regulations which defines boundaries and tells us what to do to
be successful in solving problems within these boundaries. For example the use of transistors
instead of vacuum tubes is a paradigm shift - likewise the development of Fuzzy Set Theory
from conventional bivalent set theory is a paradigm shift.

Bivalent Set Theory can be somewhat limiting if we wish to describe a 'humanistic' problem
mathematically. For example, Fig 1 below illustrates bivalent sets to characterise the temperature
of a room.

The most obvious limiting feature of bivalent sets that can be seen clearly from the diagram is that
they are mutually exclusive: it is not possible to have membership of more than one set (opinion
would vary widely as to whether 50 degrees Fahrenheit is 'cold' or 'cool', hence the expert knowledge
we need to define our system is mathematically at odds with the humanistic world). Clearly, it is not
accurate to define a transition from a quantity such as 'warm' to 'hot' by the application of one degree
Fahrenheit of heat. In the real world a smooth (unnoticeable) drift from warm to hot would occur.


This natural phenomenon can be described more accurately by Fuzzy Set Theory. Fig.2 below shows
how fuzzy sets quantifying the same information can describe this natural drift.

The whole concept can be illustrated with this example. Let's talk about people and "youthness".
In this case the set S (the universe of discourse) is the set of people. A fuzzy subset YOUNG is
also defined, which answers the question "to what degree is person x young?" To each person in
the universe of discourse, we have to assign a degree of membership in the fuzzy subset
YOUNG. The easiest way to do this is with a membership function based on the person's age.

young(x) = { 1, if age(x) <= 20,


(30-age(x))/10, if 20 < age(x) <= 30,
0, if age(x) >30 }
A graph of this looks like:


Given this definition, here are some example values:

Person      Age   Degree of youth
Johan       10    1.00
Edwin       21    0.90
Parthiban   25    0.50
Arosha      26    0.40
Chin Wei    28    0.20
Rajkumar    83    0.00

So given this definition, we'd say that the degree of truth of the statement "Parthiban is YOUNG"
is 0.50.
Note: Membership functions almost never have as simple a shape as age(x). They will at least tend to
be triangles pointing up, and they can be much more complex than that. Furthermore, membership
functions have so far been discussed as if they are always based on a single criterion, but this isn't always the
case, although it is the most common case. One could, for example, want to have the membership
function for YOUNG depend on both a person's age and their height (Arosha's short for his age).
This is perfectly legitimate, and occasionally used in practice. It's referred to as a two-dimensional
membership function. It's also possible to have even more criteria, or to have the membership
function depend on elements from two completely different universes of discourse.

Fuzzy Set Operations.

Union

The membership function of the Union of two fuzzy sets A and B, with membership functions
µA(x) and µB(x) respectively, is defined as the maximum of the two individual membership functions:

µA∪B(x) = max(µA(x), µB(x))

This is called the maximum criterion.


The Union operation in Fuzzy set theory is the equivalent of the OR operation in Boolean algebra.
Intersection
The membership function of the Intersection of two fuzzy sets A and B, with membership
functions µA(x) and µB(x) respectively, is defined as the minimum of the two individual
membership functions:

µA∩B(x) = min(µA(x), µB(x))

This is called the minimum criterion.

The Intersection operation in Fuzzy set theory is the equivalent of the AND operation in Boolean
algebra.

Complement
The membership function of the Complement of a fuzzy set A, with membership function µA(x), is defined
as the negation of the specified membership function:

µĀ(x) = 1 - µA(x)

This is called the negation criterion.


The Complement operation in Fuzzy set theory is the equivalent of the NOT operation in
Boolean algebra.
The following rules which are common in classical set theory also apply to Fuzzy set theory.

Let A1, A2, …, An be fuzzy sets in U1, U2, …, Un, respectively. The Cartesian product of A1, A2,
…, An is a fuzzy set in the space U1 × U2 × … × Un with the membership function:

µA1×A2×…×An(x1, x2, …, xn) = min[µA1(x1), µA2(x2), …, µAn(xn)]

The Cartesian product of A1, A2, …, An is denoted by A1 × A2 × … × An.


Cartesian Product Example: Let A = {(3, 0.5), (5, 1), (7, 0.6)} and B = {(3, 1), (5, 0.6)}.
The product is the set of all pairs from A and B with the minimum of the associated memberships:
A × B = {[(3, 3), min(0.5, 1)], [(5, 3), min(1, 1)], [(7, 3), min(0.6, 1)], [(3, 5), min(0.5, 0.6)], [(5, 5), min(1, 0.6)], [(7, 5), min(0.6, 0.6)]}
      = {[(3, 3), 0.5], [(5, 3), 1], [(7, 3), 0.6], [(3, 5), 0.5], [(5, 5), 0.6], [(7, 5), 0.6]}.
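
The MATLAB listings below operate on sampled membership curves; for discrete fuzzy sets the same four basic operations can be sketched in a few lines of Python (illustrative only; the dictionary representation and example memberships are assumptions, not part of the manual's code):

# Discrete fuzzy sets over the same universe, as {element: membership} dictionaries
A = {1: 0.2, 2: 0.7, 3: 1.0}
B = {1: 0.5, 2: 0.3, 3: 0.6}
U = A.keys()

union        = {x: max(A[x], B[x]) for x in U}        # maximum criterion
intersection = {x: min(A[x], B[x]) for x in U}        # minimum criterion
complement_A = {x: 1 - A[x] for x in U}               # negation criterion
difference   = {x: min(A[x], 1 - B[x]) for x in U}    # A minus B = A intersect (complement of B)

print(union, intersection, complement_A, difference, sep="\n")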
MATLAB Code:

Union Operation:
clear all; %clear all previous workspace variables
x=(0:0.1:10)'; %x-axis from 0 to 10 with 0.1 interval
u1=gaussmf(x,[1,4]); %Gaussian Membership function
u2=trimf(x,[3 6.5 9]); %Triangular Membership function
u_reun=max(u1,u2); %Union Operation on Set u1 and u2
hold on
plot(x,u1,'r') %plotting set1
plot(x,u2,'m') %plotting set2
axis([0 10 0 1.05]);
legend('A','B')
plot(x,u_reun,'color','b','linewidth',4);
title('Union operation on Fuzzy Sets');

Fig: Union operation of two fuzzy sets


Intersection Operation:
clear all; %clear all previous workspace variables
x=(0:0.1:10)'; %x-axis from 0 to 10 with 0.1 interval
u1=gaussmf(x,[1,4]); %Gaussian Membership function
u2=trimf(x,[3 6.5 9]); %Triangular Membership function
u_reun=min(u1,u2); %Intersection Operation on Set u1 and u2
hold on
plot(x,u1,'r') %plotting set1
plot(x,u2,'m') %plotting set2
axis([0 10 0 1.05]);
legend('A','B')


plot(x,u_reun,'color','b','linewidth',4);
title('Intersection operation on Fuzzy Sets');

Fig: Intersection operation on two fuzzy sets

Complement Operation:
clear all; % clear all previous workspace variables
x=(0:0.1:10)'; % x-axis from 0 to 10
u1=gaussmf(x,[1,4]); % Gaussian Membership function
u_reun=1-u1; % Complement of Set u1
hold on
plot(x,u1,'r') % plot graph of set u1
axis([0 10 0 1]); % x-axis and y-axis definition
plot(x,u_reun,'color','b','linewidth',4); % plot of complement operation
title('Complement operation on Fuzzy Set');

Fig: Complement operation on a set


Difference:
clear all; % clear all previous workspace variables
x=(0:0.1:10)'; % x-axis from 0 to 10
u1=gaussmf(x,[1,4]); % Gaussian Membership function
u2=trimf(x,[2 5 7]); % Triangular membership function
u3=1-u2; %complement of u2


u_reun=min(u1,u3); % Difference u1 - u2 = min(u1, 1-u2)


hold on
plot(x,u1,'r') % plot graph of set u1
plot(x,u3,'b') % plot graph of complement of u2
axis([0 10 0 1]); % x-axis and y-axis definition
plot(x,u_reun,'color','m','linewidth',4); % plot of difference operation
title('Difference operation on two fuzzy sets');

Fig: Difference operation on two fuzzy set

Cartesian product:
clear all; %clear all previous workspace variables
x=(0:0.1:10)'; %x-axis from 0 to 10 with 0.1 interval
u1=gaussmf(x,[1,4]); %Gaussian Membership function
u2=trimf(x,[3 6.5 9]); %Triangular Membership function
u_reun=times(u1,u2); %Element-wise (algebraic) product of u1 and u2, used here as a 1-D illustration of the Cartesian product
hold on
plot(x,u1,'r') %plotting set1
plot(x,u2,'m') %plotting set2
axis([0 10 0 1.05]);
legend('A','B')
plot(x,u_reun,'color','b','linewidth',4);
title('Cartesian Product on two fuzzy Sets');

Fig: Cartesian product on two fuzzy sets


MAX-MIN Composition:
clear all; % clear all previous workspace variables
x=(0:0.1:10)'; % x-axis from 0 to 10
u1=trimf(x,[1 5 9]); % Triangular Membership function
u2=trimf(x,[2 4 7]); % Triangular membership function
u_reun=max(min(u1,u2)); % MAX-MIN composition value (height of the intersection) of u1 and u2
hold on
plot(x,u1,'r') % plot graph of set u1
plot(x,u2,'m') % plot graph of set u2
axis([0 10 0 1]); % x-axis and y-axis definition
plot(x,u_reun*ones(size(x)),'color','b','linewidth',4); % plot MAX-MIN composition value as a horizontal line
title('MAX-MIN composition on two fuzzy sets');

Fig: MAX-MIN composition on two fuzzy sets

Conclusion: Thus we learnt the implementation of Union, Intersection, Complement and
Difference operations on fuzzy sets. We also created a fuzzy relation by the Cartesian product of two
fuzzy sets and performed max-min composition.


Mini-Project on Genetic Algorithm

Aim: Apply the Genetic Algorithm for optimization on a dataset obtained from the UCI ML
repository. For example: IRIS Dataset, Travelling Salesman Problem, or KDD Dataset.

Objectives:
1. To learn the Genetic Algorithm.
2. To learn about optimization algorithms.
3. To find a set of solutions for the Travelling Salesman Problem.

Software Requirements:
Ubuntu 18.04
Python 3.6 (64-bit)

Theory:
Genetic Algorithms are a class of stochastic, population based optimization algorithms inspired
by the biological evolution process using the concepts of “Natural Selection” and “Genetic
Inheritance” (Darwin 1859) and originally developed by Holland.
GAs are now used in engineering and business optimization applications, where the search space
is large and/or too complex (non-smooth) for analytic treatment. This algorithm reflects the
process of natural selection where the fittest individuals are selected for reproduction in order to
produce offspring of the next generation.
The process of natural selection starts with the selection of the fittest individuals from a population. They produce offspring which inherit the characteristics of the parents and are added to the next generation. If the parents have better fitness, their offspring will be better than the parents and have a better chance of surviving. This process keeps iterating, and at the end a generation with the fittest individuals will be found.


This notion can be applied for a search problem. We consider a set of solutions for a problem
and select the set of best ones out of them.

Five phases are considered in a genetic algorithm.

1. Initial population
2. Fitness function
3. Selection
4. Crossover
5. Mutation

Initial Population
The process begins with a set of individuals which is called a Population. Each individual is a
solution to the problem you want to solve.
An individual is characterized by a set of parameters (variables) known as Genes. Genes are
joined into a string to form a Chromosome (solution).
In a genetic algorithm, the set of genes of an individual is represented using a string, in terms of
an alphabet. Usually, binary values are used (a string of 1s and 0s). We say that we encode the genes in a chromosome.
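As an illustration (not part of the assignment program listed later), a population of bit-string chromosomes can be generated randomly; the chromosome length and population size here are arbitrary choices:

import random

CHROMOSOME_LENGTH = 8   # number of genes (bits) per individual, chosen only for illustration
POPULATION_SIZE = 6

def random_chromosome(length=CHROMOSOME_LENGTH):
    # a chromosome is a string of 1s and 0s, one bit per gene
    return ''.join(random.choice('01') for _ in range(length))

population = [random_chromosome() for _ in range(POPULATION_SIZE)]
print(population)   # e.g. ['01101001', '11100010', ...]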


Fitness Function
The fitness function determines how fit an individual is (the ability of an individual to compete with other individuals). It gives a fitness score to each individual. The probability that an individual will be selected for reproduction is based on its fitness score.
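For example, under fitness-proportionate (roulette-wheel) selection, the selection probability of each individual can simply be its fitness score divided by the total fitness of the population. A hedged sketch with a toy "count the 1-bits" fitness (both the fitness and the individuals are illustrative):

def fitness(chromosome):
    # toy fitness for illustration: number of 1-bits (the "one-max" problem)
    return chromosome.count('1')

population = ['01101001', '11100010', '00010001']   # illustrative bit-string individuals
scores = [fitness(c) for c in population]
total = sum(scores)
probabilities = [s / total for s in scores]   # selection probability proportional to fitness
print(probabilities)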

Selection
The idea of the selection phase is to select the fittest individuals and let them pass their genes to the next generation. Two pairs of individuals (parents) are selected based on their fitness scores; individuals with high fitness have a higher chance of being selected for reproduction.
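One common fitness-based scheme is tournament selection: pick a few individuals at random and keep the fittest of them as a parent. A minimal sketch, assuming the toy bit-string individuals and one-max fitness used above:

import random

def fitness(chromosome):
    return chromosome.count('1')        # toy fitness: number of 1-bits

def tournament_select(population, s=2):
    # draw s random contestants and return the fittest one
    contestants = random.sample(population, s)
    return max(contestants, key=fitness)

population = ['01101001', '11100010', '00010001', '10111011']   # illustrative individuals
parent1 = tournament_select(population)
parent2 = tournament_select(population)
print(parent1, parent2)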

Crossover
Crossover is the most significant phase in a genetic algorithm. For each pair of parents to be
mated, a crossover point is chosen at random from within the genes. For example, consider the
crossover point to be 3 as shown below.

Offspring are created by exchanging the genes of parents among themselves until the crossover
point is reached.


The new offspring are added to the population.
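A minimal sketch of one-point crossover on bit strings (the parent strings are illustrative; with the crossover point set to 3, the tails after the third gene are exchanged, as described above):

import random

def one_point_crossover(parent1, parent2, point=None):
    # swap the tails of the two parents after the crossover point
    if point is None:
        point = random.randint(1, len(parent1) - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

# example with crossover point 3
print(one_point_crossover('11111111', '00000000', point=3))
# ('11100000', '00011111')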

Mutation
In certain new offspring formed, some of their genes can be subjected to a mutation with a low
random probability. This implies that some of the bits in the bit string can be flipped.

Mutation occurs to maintain diversity within the population and prevent premature convergence.
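A minimal sketch of bit-flip mutation, where each bit is flipped independently with a small probability pm (the value of pm below is only an example):

import random

def bit_flip_mutation(chromosome, pm=0.01):
    # flip each bit independently with probability pm
    mutated = []
    for bit in chromosome:
        if random.random() < pm:
            mutated.append('1' if bit == '0' else '0')
        else:
            mutated.append(bit)
    return ''.join(mutated)

print(bit_flip_mutation('11100000', pm=0.1))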

Termination

The algorithm terminates if the population has converged (does not produce offspring which are
significantly different from the previous generation). Then it is said that the genetic algorithm
has provided a set of solutions to our problem.
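Putting the five phases together, the skeleton of a generational GA can be sketched as follows. This is an illustrative toy (one-max fitness, arbitrary parameter values), not the assignment program listed later in this write-up:

import random

LENGTH, POP_SIZE, GENERATIONS = 12, 20, 50
PC, PM = 0.6, 0.01          # crossover and mutation probabilities (illustrative settings)

def fitness(c):
    return c.count('1')     # toy fitness: number of 1-bits

def random_chromosome():
    return ''.join(random.choice('01') for _ in range(LENGTH))

def tournament(pop, s=2):
    return max(random.sample(pop, s), key=fitness)

def crossover(p1, p2):
    if random.random() < PC:
        point = random.randint(1, LENGTH - 1)
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1, p2

def mutate(c):
    return ''.join(('1' if b == '0' else '0') if random.random() < PM else b for b in c)

population = [random_chromosome() for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    next_generation = []
    while len(next_generation) < POP_SIZE:
        child1, child2 = crossover(tournament(population), tournament(population))
        next_generation += [mutate(child1), mutate(child2)]
    population = next_generation[:POP_SIZE]           # full replacement of the old generation
    if max(fitness(c) for c in population) == LENGTH: # stop once an all-ones chromosome appears
        break

print(max(population, key=fitness))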


Genetic Algorithm :

Optimization problems

In a genetic algorithm, a population of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem is evolved toward better solutions. Each candidate solution has a set of properties (its chromosomes or genotype) which can be mutated and altered; traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible.

The evolution usually starts from a population of randomly generated individuals and is an iterative process, with the population in each iteration called a generation. In each generation, the fitness of every individual in the population is evaluated; the fitness is usually the value of the objective function in the optimization problem being solved. The more fit individuals are stochastically selected from the current population, and each individual's genome is modified (recombined and possibly randomly mutated) to form a new generation. The new generation of candidate solutions is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population.


A typical genetic algorithm requires:

1. a genetic representation of the solution domain,
2. a fitness function to evaluate the solution domain.

A standard representation of each candidate solution is as an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operations. Variable length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in genetic programming and graph-form representations are explored in evolutionary programming; a mix of both linear chromosomes and trees is explored in gene expression programming.

Once the genetic representation and the fitness function are defined, a GA proceeds to initialize a
population of solutions and then to improve it through repetitive application of the mutation,
crossover, inversion and selection operators.

Initialization
The population size depends on the nature of the problem, but typically contains several
hundreds or thousands of possible solutions. Often, the initial population is generated randomly,
allowing the entire range of possible solutions (the search space). Occasionally, the solutions
may be "seeded" in areas where optimal solutions are likely to be found.

Selection

During each successive generation, a portion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as the former process may be very time-consuming.
The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. For instance, in the knapsack problem one wants to maximize the total value of objects that can be put in a knapsack of some fixed capacity. A representation of a solution might be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object
is in the knapsack. Not every such representation is valid, as the size of objects may exceed the
capacity of the knapsack. The fitness of the solution is the
sum of values of all objects in the knapsack if the representation is valid, or 0 otherwise.
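A hedged sketch of this knapsack fitness, with made-up item values, weights, and capacity:

# illustrative knapsack instance: item values, item weights, and knapsack capacity
VALUES   = [10, 40, 30, 50]
WEIGHTS  = [ 5,  4,  6,  3]
CAPACITY = 10

def knapsack_fitness(bits):
    # bits[i] == '1' means item i is packed; overweight (invalid) solutions score 0
    value  = sum(v for v, b in zip(VALUES,  bits) if b == '1')
    weight = sum(w for w, b in zip(WEIGHTS, bits) if b == '1')
    return value if weight <= CAPACITY else 0

print(knapsack_fitness('0101'))   # items 2 and 4: value 90, weight 7  -> fitness 90
print(knapsack_fitness('1111'))   # all items: weight 18 > 10          -> fitness 0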

In some problems, it is hard or even impossible to define the fitness expression; in these cases, a
simulation may be used to determine the fitness function value of a phenotype (e.g.
computational fluid dynamics is used to determine the air resistance of a vehicle whose shape is
encoded as the phenotype), or even interactive genetic algorithms are used.

Genetic operators
The next step is to generate a second generation population of solutions from those selected
through a combination of genetic operators: crossover (also called recombination), and mutation.
For each new solution to be produced, a pair of "parent" solutions is selected for breeding from
the pool selected previously. By producing a "child" solution using the above methods of
crossover and mutation, a new solution is created which typically shares many of the
characteristics of its "parents". New parents are selected for each new child, and the process
continues until a new
population of solutions of appropriate size is generated. Although reproduction methods that are
based on the use of two parents are more "biology inspired", some research suggests that more
than two "parents" generate higher quality chromosomes.
These processes ultimately result in the next generation population of chromosomes that is
different from the initial generation. Generally the average fitness will have increased by this
procedure for the population, since only the best organisms from the first generation are selected
for breeding, along with a small proportion of less fit solutions. These less fit solutions ensure g

genetic diversity within the genetic pool of the parents and therefore ensure the genetic diversity
of the subsequent generation of children.
Opinion is divided over the importance of crossover versus mutation. There are many references
in Fogel (2006) that support the importance of mutation-based search.

Although crossover and mutation are known as the main genetic operators, it is possible to use
other operators such asregrouping, colonization-extinction, or migration in genetic algorithms.
It is worth tuning parameters such as the mutation probability, crossover probability and
population size to find reasonable settings for the problem class being worked on. A very small
mutation rate may lead to genetic drift (which is non-ergodic in nature). A recombination rate
that is too high may lead to premature convergence of the genetic algorithm. A mutation rate that
is too high may lead to loss of good solutions, unless elitist selection is employed.
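Elitist selection mentioned here simply copies the best individual(s) of the current generation unchanged into the next one, so a good solution can never be lost to crossover or mutation. A minimal sketch (fitness and individuals are illustrative):

def next_generation_with_elitism(population, offspring, fitness, elite_count=1):
    # keep the elite_count best individuals of the old population unchanged,
    # and fill the rest of the new generation with offspring
    elites = sorted(population, key=fitness, reverse=True)[:elite_count]
    return elites + offspring[:len(population) - elite_count]

fitness = lambda c: c.count('1')                      # toy one-max fitness
old_pop   = ['0111', '0001', '1100', '0000']
offspring = ['0010', '0101', '1000', '0011']
print(next_generation_with_elitism(old_pop, offspring, fitness))
# ['0111', '0010', '0101', '1000'] - the best old individual survives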


Heuristics
In addition to the main operators above, other heuristics may be employed to make the calculation
faster or more robust. The speciation heuristic penalizes crossover between candidate solutions
that are too similar; this encourages population diversity and helps prevent premature
convergence to a less optimal solution.

Termination
This generational process is repeated until a termination condition has been reached. Common
terminating conditions are:
• A solution is found that satisfies minimum criteria
• A fixed number of generations is reached
• The allocated budget (computation time/money) is reached
• The highest ranking solution's fitness has reached a plateau such that successive iterations no longer produce better results
• Manual inspection
• Combinations of the above

Program :

from random import *


from DataSet import *

class Gene:
def __init__(self, InclusiveLowerBound, InclusiveUpperBound, NumberOfGeneDigits):
X = randint(InclusiveLowerBound, InclusiveUpperBound) # a random integer between the inclusive bounds
self.__BitString = self.Int2BitString(X, NumberOfGeneDigits)

def Int2BitString(self, Int, NumberOfGeneDigits): ## example Int2BitString(128) -> '10000000'
IntIsNegative = Int < 0
BitString = (bin(Int))[2:]
if IntIsNegative:
BitString = (bin(Int)[3:])
else:
BitString = (bin(Int)[2:])
# BitString is as short as 1 character, so pad 0's until len(BitString) == GeneLength
while len(BitString) < NumberOfGeneDigits:
BitString = '0' + BitString
if IntIsNegative:


BitString = '1' + BitString


else:
BitString = '0' + BitString
return BitString

def BitString2Int(self): ## example BitString2Int('1111111111111111') -> 65535


AbsoluteValue = int(self.__BitString[1:], 2)
if self.__BitString[0] == '1':
return -1 * AbsoluteValue
else:
return AbsoluteValue

def BitString(self):
return self.__BitString

def ToString(self):
Str = ''
if self.BitString2Int() < 0:
Str += '-' + self.BitString()[1:]
else:
Str += '+' + self.BitString()[1:]
return Str + ', '

def SetBitString(self, NewBitString):


self.__BitString = NewBitString

'''
G = Gene(-3, 3, 5)
print(G.ToString())
print(str(G.BitString2Int()))
'''

class Genotype: # where each of the genes are the same length


def __init__(self, NumberOfGenes, InclusiveLowerBound, InclusiveUpperBound,
NumberOfGeneDigits):
self.__NumberOfGeneDigits = NumberOfGeneDigits
self.__NumberOfGenes = NumberOfGenes
self.__GeneList = []
for i in range(0, NumberOfGenes):
self.__GeneList.append(Gene(InclusiveLowerBound, InclusiveUpperBound,
NumberOfGeneDigits))


def BitString(self):
Str = ''
for Gene in self.__GeneList:
#print(Gene.ToString())
Str += Gene.BitString()
return Str

def BitStringToString(self):
Str = ''
for Gene in self.__GeneList:
Str += Gene.ToString()
return Str

def ValuesToString(self):
Str = 'Genotype\'s values: '
for Gene in self.__GeneList:
Str += str(Gene.BitString2Int()) + ', '
return Str

def SetBitString(self, NewBitString):


for i in range(0, len(self.__GeneList)):
LowIndex = i*(self.__NumberOfGeneDigits + 1)
HighIndex = LowIndex + self.__NumberOfGeneDigits + 1
self.__GeneList[i].SetBitString( NewBitString[LowIndex:HighIndex] )

def Gene(self, Index):


return self.__GeneList[Index]

def Invalid(self):
for i in range(0, self.__NumberOfGenes):
if i == 1 or i == 3 or i == 5:
if self.Gene(i).BitString2Int() == 0:
return True
return False
'''
G = Genotype(2, -10, 10, 6)
G2 = Genotype(2, -10, 10, 6)
print('G.BitString(): ' + G.BitString())
print(G.ToString())
print('G2.BitString(): ' + G2.BitString())


print(G2.ToString())
G.SetBitString(G2.BitString())
print('G.BitString(): ' + G.BitString())
print(G.ToString())
print('G2.BitString(): ' + G2.BitString())
print(G2.ToString())
'''

class Individual:
DataSet = DataSet()

def Classification(self, Case):


A = self.Genotype.Gene(0).BitString2Int()
B = self.Genotype.Gene(1).BitString2Int()
C = self.Genotype.Gene(2).BitString2Int()
D = self.Genotype.Gene(3).BitString2Int()
E = self.Genotype.Gene(4).BitString2Int()
F = self.Genotype.Gene(5).BitString2Int()
Number = (Case.PetalLength)*(A/B) + (Case.PetalWidth)*(C/D) + (E/F)
## print('Number: ' + str(Number))
DifferenceToZero = abs(Number - 0)
DifferenceToOne = abs(Number - 1)
DifferenceToTwo = abs(Number - 2)
## if (DifferenceToZero <= DifferenceToOne) and (DifferenceToZero <= DifferenceToTwo):
## return 0
## elif (DifferenceToOne <= DifferenceToZero) and (DifferenceToOne <= DifferenceToTwo):
## return 1
## else:
## return 2
if abs(Number) > 1.5:
return 2
elif abs(Number) > .5:
return 1
else:
return 0

def Fitness(self):
# returns a numeric value as the raw fitness
if self.Genotype.Invalid():


return 0
CorrectnessCount = 0
for Case in Individual.DataSet.ListOfCases:
if self.Classification(Case) == Case.Classification:
CorrectnessCount += 1
return CorrectnessCount

def WriteCorrectnessToFile(self):
if self.Genotype.Invalid():
return 0
OutputFile = open( 'Classifier1.txt' , 'w')
CorrectnessCount = 0
for Case in Individual.DataSet.ListOfCases:
OutputFile.write("%s\n" % (self.Classification(Case)))
if self.Classification(Case) == Case.Classification:
CorrectnessCount += 1
OutputFile.close() # close the file before returning; in the original listing this line came after the return and was never reached
return CorrectnessCount

def Solution(self):
# returns a numeric value as the solution to the fitness function
return Individual.DataSet.Length

def FitnessAsPercentage(self, Fitness):


CorrectnessFraction = Fitness / Individual.DataSet.Length
return round(CorrectnessFraction*100, 4)

def __init__(self):
## self.Genotype = XAsBitString + YAsBitString
## self.Genotype = '0'
self.Genotype = Genotype(6, -100, 100, 7)

def OnePointCrossover(self, OtherIndividual):


CrossoverIndex = randint(0, len(self.Genotype.BitString()) )
## print('CrossoverIndex: ' + str(CrossoverIndex))
SelfLeftHalf = self.Genotype.BitString()[:CrossoverIndex]
SelfRightHalf = self.Genotype.BitString()[CrossoverIndex:]
## print('I1: ' + SelfLeftHalf + ' ' + SelfRightHalf)
OtherLeftHalf = OtherIndividual.Genotype.BitString()[:CrossoverIndex]


OtherRightHalf = OtherIndividual.Genotype.BitString()[CrossoverIndex:]
## print('I2: ' + OtherLeftHalf + ' ' + OtherRightHalf)

self.Genotype.SetBitString(SelfLeftHalf + OtherRightHalf)
OtherIndividual.Genotype.SetBitString(OtherLeftHalf + SelfRightHalf)
## print('I1: ' + self.Genotype.BitString()[:CrossoverIndex] + ' ' + self.Genotype.BitString()[CrossoverIndex:])
## print('I2: ' + OtherIndividual.Genotype.BitString()[:CrossoverIndex] + ' ' + OtherIndividual.Genotype.BitString()[CrossoverIndex:])

def TwoPointCrossover(self, OtherIndividual):


LeftCrossoverIndex = randint(0, len(self.Genotype.BitString()) )
RightCrossoverIndex = randint(LeftCrossoverIndex, len(self.Genotype.BitString()) )
## print('LeftCrossoverIndex: ' + str(LeftCrossoverIndex))
## print('RightCrossoverIndex: ' + str(RightCrossoverIndex))
SelfLeftThird = self.Genotype.BitString()[:LeftCrossoverIndex]
SelfMiddleThird = self.Genotype.BitString()[LeftCrossoverIndex:RightCrossoverIndex]
SelfRightThird = self.Genotype.BitString()[RightCrossoverIndex:]
## print('I1: ' + SelfLeftThird + ' ' + SelfMiddleThird + ' ' + SelfRightThird)
OtherLeftThird = OtherIndividual.Genotype.BitString()[:LeftCrossoverIndex]
OtherMiddleThird = OtherIndividual.Genotype.BitString()[LeftCrossoverIndex:RightCrossoverIndex]
OtherRightThird = OtherIndividual.Genotype.BitString()[RightCrossoverIndex:]
## print('I2: ' + OtherLeftThird + ' ' + OtherMiddleThird + ' ' + OtherRightThird)

self.Genotype.SetBitString(OtherMiddleThird + SelfLeftThird + SelfRightThird)


OtherIndividual.Genotype.SetBitString(SelfMiddleThird + OtherLeftThird +
OtherRightThird)

NewLeftCrossoverIndex = RightCrossoverIndex - LeftCrossoverIndex

## print('I1: ' + self.Genotype.BitString()[:NewLeftCrossoverIndex] + ' ' + self.Genotype.BitString()[NewLeftCrossoverIndex:RightCrossoverIndex] + ' ' + self.Genotype.BitString()[RightCrossoverIndex:])
## print('I2: ' + OtherIndividual.Genotype.BitString()[:NewLeftCrossoverIndex] + ' ' + OtherIndividual.Genotype.BitString()[NewLeftCrossoverIndex:RightCrossoverIndex] + ' ' + OtherIndividual.Genotype.BitString()[RightCrossoverIndex:])

def ApplyCrossover(self, OtherIndividual, NumberOfCrossoverPoints):


## Applies Crossover to self and to OtherInvidual, editing their Genotype


if NumberOfCrossoverPoints == 1:
self.OnePointCrossover(OtherIndividual)
else:
self.TwoPointCrossover(OtherIndividual)

def ApplyMutation(self):
RandomIndex = randint(0, len(self.Genotype.BitString()) - 1)
GenotypeAsList = list(self.Genotype.BitString())
if GenotypeAsList[RandomIndex] == '1':
GenotypeAsList[RandomIndex] = '0'
else:
GenotypeAsList[RandomIndex] = '1'
NewGenotype = ''
for Character in GenotypeAsList:
NewGenotype += Character
self.Genotype.SetBitString(NewGenotype)

def ToString(self, NumberOfSpaces):


return self.Genotype.BitStringToString()
## Str = str(self.FitnessAsPercentage(self.Fitness()))
## Str = NumberOfSpaces + 'Genotype: ' + self.Genotype + '\n'
#### Str += NumberOfSpaces + 'XAsBitString: ' + self.XAsBitString() + '\n'
#### Str += NumberOfSpaces + 'YAsBitString: ' + self.YAsBitString() + '\n'
#### Str += NumberOfSpaces + 'XAsInt: ' + str(self.XAsInt()) + '\n'
#### Str += NumberOfSpaces + 'YAsInt: ' + str(self.YAsInt())
## Str += NumberOfSpaces + 'Fitness: ' + str(self.Fitness())
## return Str

##I1 = Individual()
##print(I1.ToString(''))
##print(I1.Fitness())
##print(I1.FitnessAsPercentage(I1.Fitness()))
##I1.ApplyMutation()
##print(I1.ToString(''))
##print(I1.Fitness())
##I2 = Individual()
##print(I2.ToString(''))
##print('ApplyCrossover')
##I1.ApplyCrossover(I2, 2)
##print(I1.ToString(''))
##print(I2.ToString(''))


####print(Int2BitString(65535))
####print(BitString2Int( Int2BitString(65535) ))
####print(Int2BitString(1))
####print(BitString2Int( Int2BitString(1) ))
####print(Int2BitString(0))
####print(BitString2Int( Int2BitString(0) ))
####print(BitString2Int('1111111111111111'))
####print(BitString2Int('0000000000000001'))

class IndividualList:
def __init__(self):
self.List = []

def GetRandomIndividualWithReplacement(self):
return sample(self.List, 1)[0]

def GetAndRemoveRandomIndividual(self):
RandomIndividual = self.GetRandomIndividualWithReplacement()
self.List.remove( RandomIndividual )
return RandomIndividual

def ResetList(self):
self.List = []

def ShuffleList(self):
shuffle(self.List)

def BestFitness(self):
BestFitness = self.List[0].Fitness()
for Individual in self.List:
IndividualFitness = Individual.Fitness()
if IndividualFitness > BestFitness:
BestFitness = IndividualFitness
return BestFitness

def MostFitIndividual(self):
MostFitIndividual = self.List[0]
for Individual in self.List:
if Individual.Fitness() > MostFitIndividual.Fitness():


MostFitIndividual = Individual
return MostFitIndividual

def ToString(self, NumberOfSpaces, NumberOfAdditionalSpacesForIndividualObject):


Str = 'BestFitness: ' + str(self.BestFitness()) + '\n'
for i in range(1, len(self.List) + 1):
Str += NumberOfSpaces + 'Individual ' + str(i)
Str += self.List[i-1].ToString(NumberOfSpaces +
NumberOfAdditionalSpacesForIndividualObject)
Str += '\n'
return Str

##I1 = Individual()
##print(I1.ToString(' '))
##I2 = Individual()
##IL = IndividualList()
##IL.List.append(I1)
##IL.List.append(I2)
##print(IL.ToString('', ' '))
##print(I1 == I2)
##I3 = I1
##print(I1 == I3)

class GA:
def __init__(self, PopulationSize, MutationRate, CrossoverRate, NumberOfCrossoverPoints,
MaxNumberOfGenerations):
self.MutationRate = MutationRate
self.CrossoverRate = CrossoverRate
self.NumberOfCrossoverPoints = NumberOfCrossoverPoints
self.MaxNumberOfGenerations = MaxNumberOfGenerations
self.Population = IndividualList()
self.MatingPool = IndividualList()
self.PopulationSize = PopulationSize
self.MatingPoolSize = PopulationSize
self.Finished = False
self.GenerationCount = 1
self.InitialBestFitness = 0
self.FinalBestFitness = 0
self.PercentInitialBestFitness = 0
self.PercentFinalBestFitness = 0


self.AverageFitnessDelta = 0
self.InitializePopulation()

def CalculateAverageFitnessDelta(self): # e.g. if fitness improves, then the average improvement per generation
self.AverageFitnessDelta = (self.PercentFinalBestFitness - self.PercentInitialBestFitness) / self.GenerationCount

def InitializePopulation(self):
# return an IndividualList object of random Individual objects
while len(self.Population.List) < self.PopulationSize:
self.Population.List.append( Individual() )

def InitializeMatingPool(self):
self.MatingPool.ResetList()
while len(self.MatingPool.List) < self.MatingPoolSize:
Individual1 = self.Population.GetRandomIndividualWithReplacement()
Individual2 = self.Population.GetRandomIndividualWithReplacement()
if Individual1.Fitness() >= Individual2.Fitness():
self.MatingPool.List.append(Individual1)
else:
self.MatingPool.List.append(Individual2)

def ShouldApplyCrossover(self):
RandomFraction = uniform(0, 1)
if self.CrossoverRate >= RandomFraction:
return True
else:
return False

def ShouldApplyMutation(self):
RandomFraction = uniform(0, 1)
if self.MutationRate >= RandomFraction:
return True
else:
return False

def NOffspring(self, N, ParentList):


OffSpringList = IndividualList()
for i in range(0, N):
# make OffSpring a clone of a random parent


ParentList.ShuffleList()
OffSpring = ParentList.List[0]
if self.ShouldApplyCrossover():
# choose to cross OffSpring with a non-cloned parent, unless len(ParentList.List) == 0,
# then cross with OffSpring (crossing Offspring with Offspring does not change OffSpring whatsoever)
OffSpring.ApplyCrossover( ParentList.List[-1], self.NumberOfCrossoverPoints )
if self.ShouldApplyMutation():
OffSpring.ApplyMutation()
OffSpringList.List.append( OffSpring )
## print('OffSpringList:\n' + OffSpringList.ToString(' ', ' '))
return OffSpringList

def UpdatePopulation(self):
self.Population.ResetList()
while len(self.Population.List) < self.PopulationSize:
ParentList = IndividualList()
ParentList.List.append( self.MatingPool.GetAndRemoveRandomIndividual() )

## if MatingPool is empty, but we need 1 more Parent, then the two parents are clones
if len(self.MatingPool.List) == 0:
ParentList.List.append( ParentList.List[0] )
else:
ParentList.List.append( self.MatingPool.GetAndRemoveRandomIndividual() )
## print('ParentList:\n' + ParentList.ToString(' ', ' '))
self.Population.List += self.NOffspring(2, ParentList).List

## too many offspring can be added to Population, so remove the latest added offspring until
## len(self.Population.List) == self.PopulationSize
while len(self.Population.List) != self.PopulationSize:
self.Population.List = self.Population.List[:-1]

def HasConverged(self):
if Individual().FitnessAsPercentage(self.Population.BestFitness()) >= 96:
return True
if self.GenerationCount == self.MaxNumberOfGenerations:
return True
return False

def Run(self):


## print('InitialPopulation.List:\n'+ self.Population.ToString(' ', ' '))

# calculate a statistic
self.InitialBestFitness = self.Population.BestFitness()

# the actual body of the GA


while not self.Finished:
self.InitializeMatingPool()
## print('MatingPool:\n' + self.MatingPool.ToString(' ', ' '))
self.UpdatePopulation()
self.Finished = self.HasConverged()
self.GenerationCount += 1
if self.GenerationCount == self.MaxNumberOfGenerations + 1:
self.GenerationCount -= 1

# calculate the rest of the statistics for this GA run


self.FinalBestFitness = self.Population.BestFitness()
self.PercentInitialBestFitness = Individual().FitnessAsPercentage(self.InitialBestFitness)
self.PercentFinalBestFitness = Individual().FitnessAsPercentage(self.FinalBestFitness)
self.CalculateAverageFitnessDelta()

class GARunner:
def __init__(self):
self.PopulationSize = 10
self.MutationRate = .01
self.CrossoverRate = .6
self.NumberOfCrossoverPoints = 2
self.NumberOfTimesToRun = 4
self.RunCount = 0
self.TotalNumberOfGenerations = 0
self.TotalOfFitnessDeltas = 0
self.OutputFileName = 'a.txt'
self.MaxNumberOfGenerations = 1000

def SetPreferences(self):
print("Please state the population size (an integer >= 1)")
self.PopulationSize = int(input("=> "))
print("Please state the mutation rate (a decimal point '.', followed by a nonnegative integer,
so that 0 <= mutation rate <= 1)")
self.MutationRate = float(input("=> "))


print("Please state the crossover rate (a decimal point '.', followed by a nonnegative integer,
so that 0 <= crossover rate <= 1)")
self.CrossoverRate = float(input("=> "))
print("Please state the number of crossover points (either '1' or '2')")
self.NumberOfCrossoverPoints = int(input("=> "))
print("Please state the maximum number of generations (an integer >= 1)")
self.MaxNumberOfGenerations = int(input("=> "))
print("Please state the number of times to run the GA with these parameters (an integer >=
1)")
self.NumberOfTimesToRun = int(input("=> "))
## print("Please state the name of the output file. (file.txt)")
## self.OutputFileName = input("=> ")

def PrintRunResults(self):
## print('Population.List: \n'+ self.GA.Population.ToString(' ', ' '))
print()
print('Results of Run #' + str(self.RunCount) + ':')
print('Generation count: ', str(self.GA.GenerationCount))
print('Population\'s initial best fitness: ' + str(self.GA.InitialBestFitness) + ', or ' +
str(self.GA.PercentInitialBestFitness) + ' percent fit')
print('Population\'s final best fitness: ' + str(self.GA.FinalBestFitness) + ', or ' +
str(self.GA.PercentFinalBestFitness) + ' percent fit')
print('Average fitness delta: ' + str(round(self.GA.AverageFitnessDelta, 3)))
print('Most fit individual: ' +
self.GA.Population.MostFitIndividual().Genotype.ValuesToString())

def WriteResultsToOutputFile(self):
OutputFile = open( self.OutputFileName , 'a')
OutputFile.write("%s, %s\n" % (self.GA.GenerationCount, self.GA.AverageFitnessDelta))
OutputFile.close()

def PrintSummary(self):
print()
print('Summary of runs of GA with the given parameters:')

def main(self):
self.SetPreferences()
## OutputFile = open( self.OutputFileName , 'a')
## OutputFile.write("%s, %s\n" % ('Run\'s total number of generations', 'Run\'s average fitness delta (a percentage)'))
## OutputFile.close()


while self.RunCount < self.NumberOfTimesToRun:


self.RunCount += 1

# initialize the GA
self.GA = GA(self.PopulationSize, self.MutationRate, self.CrossoverRate,
self.NumberOfCrossoverPoints, self.MaxNumberOfGenerations)

# run the GA, print the run's results, print the results, write the results to the output file
self.GA.Run()
self.PrintRunResults()
## self.WriteResultsToOutputFile()
## if self.RunCount == 1:
## self.GA.Population.MostFitIndividual().WriteCorrectnessToFile()

GARunner = GARunner()
GARunner.main()

Output :
database@database-HP-Pro-3090-Microtower-PC:~$ python GA.py
Please state the population size (an integer >= 1)
=> 5
Please state the mutation rate (a decimal point '.', followed by a nonnegative integer, so that 0 <=
mutation rate <= 1)
=> .4
Please state the crossover rate (a decimal point '.', followed by a nonnegative integer, so that 0 <=
crossover rate <= 1)
=> .3
Please state the number of crossover points (either '1' or '2')
=> 2
Please state the maximum number of generations (an integer >= 1)
=> 1
Please state the number of times to run the GA with these parameters (an integer >= 1)
=> 3
()
Results of Run #1:
('Generation count: ', '1')
Population's initial best fitness: 83, or 0.0 percent fit
Population's final best fitness: 83, or 0.0 percent fit
Average fitness delta: 0.0

Most fit individual: Genotype's values: -78, -72, 9, 91, 76, -48,


()
Results of Run #2:
('Generation count: ', '1')
Population's initial best fitness: 53, or 0.0 percent fit
Population's final best fitness: 51, or 0.0 percent fit
Average fitness delta: 0.0
Most fit individual: Genotype's values: 120, 124, -1, 44, -107, 11,
()
Results of Run #3:
('Generation count: ', '1')
Population's initial best fitness: 81, or 0.0 percent fit
Population's final best fitness: 64, or 0.0 percent fit
Average fitness delta: 0.0
Most fit individual: Genotype's values: -107, -90, -12, -2, -101, 96,

Conclusion :

We have successfully studied the Genetic Algorithm for optimization on a dataset obtained from the UCI ML repository, for example the IRIS dataset.

References :

1) https://en.wikipedia.org/wiki/Genetic_algorithm
2) https://www.geeksforgeeks.org/genetic-algorithms/


Assignment No: 08

Title: Use Business intelligence and analytics tools to recommend the combination of share
purchases and sales for maximizing the profit.

Objectives:
1. To study different types of BI analytics tools.
2. To study the representation and implementation of BI tools for maximizing shares profit.

Theory:

Business Intelligence: -

Business intelligence (BI) is a technology-driven process for analyzing data and presenting
actionable information to help corporate executives, business managers and other end users make
more informed business decisions. BI encompasses a variety of tools, applications and
methodologies that enable organizations to collect data from internal systems and external
sources, prepare it for analysis, develop and run queries against the data, and create reports,
dashboards and data visualizations to make the analytical results available to corporate decision
makers as well as operational workers. The potential benefits of business intelligence programs
include accelerating and improving decision making; optimizing internal business processes;
increasing operational efficiency; driving new revenues; and gaining competitive advantages
over business rivals. BI systems can also help companies identify market trends and spot
business problems that need to be addressed. Business intelligence combines a broad set of data
analysis applications, including ad hoc analysis and querying, enterprise reporting, online
analytical processing (OLAP), mobile BI, real-time BI, operational BI, cloud and software as a
service BI, open source BI, collaborative BI and location intelligence. BI technology also
includes data visualization software for designing charts and other infographics, as well as tools
for building BI dashboards and performance scorecards that display visualized data on business
metrics and key performance indicators in an easy-to-grasp way. BI applications can be bought
separately from different vendors or as part of a unified BI platform from a single vendor.

BI Programs: BI programs can also incorporate forms of advanced analytics, such as data
mining, predictive analytics, text mining, statistical analysis and big data analytics. In many
cases though, advanced analytics projects are conducted and managed by separate teams of data
scientists, statisticians, predictive modelers and other skilled analytics professionals, while BI
teams oversee more straightforward querying and analysis of business data.

BI Data: Business intelligence data typically is stored in a data warehouse or smaller data marts
that hold subsets of a company's information. In addition, Hadoop systems are increasingly being
used within BI architectures as repositories or landing pads for BI and analytics data, especially
for unstructured data, log files, sensor data and other types of big data. Before it's used in BI
applications, raw data from different source systems must be integrated, consolidated and
cleansed using data integration and data quality tools to ensure that users are analyzing accurate
and consistent information.

BI Tools:
Business Intelligence tools provide companies reliable information and true insights in order to
improve decision making and social collaboration. With business intelligence you’ll be able to
produce much better company results. The BI tools provide the means for efficient reporting,
thorough analysis of (big) data, statistics and analytics, and dashboards displaying KPIs.

Bring your company data to life with BI tools

Bring your company data to life by combining, analyzing and visualizing all that data very easily. The tools will help you to see and understand the success factors of your business more quickly, and where things (might) go wrong and where you need to make adjustments. They give employees and managers the possibility to improve business processes on a daily basis by using correct information and relevant insights.

Selecting the wrong tool might hurt

Companies that are not successful often have an issue with their information infrastructure. They may have selected the wrong Business Intelligence tool, or perhaps they don't use business intelligence tools at all. They have not been able to implement BI, are still in the dark, and that hurts company results.

What are the biggest benefits of BI tools?

√ Improve the overall performance of your organization, departments and teams.
√ Make fact-based decisions without neglecting the intuition of experienced employees.
√ Enhance the business processes in the organization using the right visualizations.
√ Easy monitoring and reporting of your genuine KPIs using role-based dashboards.

How can you easily select one of the tools for your organization?

Step 1: Define the key bi tool selection criteria, both the user and IT requirements.
Step 2: With a list of questions you need to contact all the vendors to get the answers.
Step 3: Validate and analyze all the information from the Business Intelligence vendors.
Step 4: Make a short list for a proof-of-concept (POC) and perform the POC.

Step 5: Choose the tool / platform that suits your needs and price criteria best.
How can you compare BI tools very quickly?


Business Intelligence tools come in many different flavors. All the tools run on the Windows
platform for example, but only a few support the different flavors of Unix and Linux. Some have
excellent functionality for pixel perfect reporting and others do better in dash-boarding and
predictive analytics.

KPIs

What are Key Performance Indicators (KPIs) and how do you choose which ones to use? It is difficult to determine which are genuine KPIs, a little like looking for a needle in a haystack. The problem is that we are trying to shed light on the (future) performance of our most critical internal processes, which will allow us to achieve the organization's (strategic) goals within the chosen policies. A KPI should be important enough that achieving it will have a huge impact on the total performance of the organization (or part of the organization). The King of KPIs, David Parmenter, once said: "Key performance indicators (KPIs), while used commonly around the world, have never until now been clearly defined." This is the reason it is essential to understand which types of (performance) indicators actually exist:

1. Key Results Indicators (KRIs): These are complex indicators based on the total achievement
of a broad range of actions, for example, the profitability of a company, the level of satisfaction
of the employees, or customer satisfaction levels.

2. Performance Indicators (PIs): These are very specific indicators, they tell us what we need
to achieve in a given area and are not particularly “key” to achieving efficient performance in the
organization’s internal processes. Examples are: the profit achieved from the company’s 10
largest customers or the revenue growth percentage for a given product or service. These can, of
course, be very useful things to measure, but in general they are not critical for achieving better
performance in multiple result areas of the organization at the same time. By concentrating
solely on performance in terms of, for example, revenue growth we often miss the fact that we
are no longer making any profit because the customers and employees have become dissatisfied.

3. Indicators (IND): These are measures of a single action which can form the basis for any
KRI, PI or KPI. They are, in general, neither critical, nor key in achieving major performance
improvement. Some examples are: number of new customers, gross revenue, number of lost
customers, number of orders etc.

4. Key Performance Indicators (KPIs): These are the indicators that measure one activity or
action, they are directly connected to an organization’s strategic goals and improvement
measured using these indicators can (and should) mean a dramatic improvement of performance in multiple organizational areas. Examples are: the number of minutes that the
departure of an airplane is delayed, or the number of times a product cannot be sold due to lack
of inventory.
KPIs can often be recognized as missed chances as seen by the “lost sales” example or when, if
the value of a KPI decreases it causes rework. Rework means that actions have to be carried out
again because they didn’t work correctly the first time.


Proprietary Products:

ActiveReports, Actuate Corporation, ApeSoft, Diamond Financial Management System, Birst, BOARD, ComArch, Data Applied, Decision Support Panel, Dexon Business Intelligence, Domo, Dundas Data Visualization, Inc., Dimensional Insight, Dynamic AI, Entalysis, Grapheur, GoodData (cloud based), InfoCaptor Dashboard, IBM Cognos, icCube, IDV Solutions Visual Fusion, InetSoft, RubyReport, Information Builders, InfoZoom, JackBe, Jaspersoft (now TIBCO: iReport, Jasper Studio, Jasper Analysis, Jasper ETL, Jasper Library), Jedox, JReport (from Jinfonet Software), Klipfolio Dashboard, Lavastorm, LIONsolver, List and Label, Logi Analytics, Looker, Lumalytics, Manta Tools, Microsoft (SQL Server Reporting Services, SQL Server Analysis Services, PerformancePoint Server 2007, ProClarity, Power Pivot), MicroStrategy, myDIALS, NextAction, Numetric, Oracle (Hyperion Solutions Corporation, Business Intelligence Suite Enterprise Edition), Panorama Software.

Tableau Software:
It is an American computer software company headquartered in Seattle, Washington. It produces
a family of interactive data visualization products focused on business intelligence.

Tableau offers five main products:


1. Tableau Desktop,
2. Tableau Server,
3. Tableau Online,
4. Tableau Reader and
5. Tableau Public.


Tableau Public and Tableau Reader are free to use, while both Tableau Server and Tableau
Desktop come with a 14-day fully functional free trial period, after which the user must pay for
the software. Tableau Desktop comes in both a Professional and a lower cost Personal edition.
Tableau Online is available with an annual subscription for a single user, and scales to support
thousands of users. Tableau is the most powerful, secure, and flexible end-to-end analytics
platform for your data. Elevate people with the power of data. Designed for the individual, but
scaled for the enterprise, Tableau is the only business intelligence platform that turns your data
into insights that drive action. Tableau has a highly scalable, n-tier client-server architecture that
serves mobile clients, web clients and desktop-installed software. Tableau Desktop is the
authoring and publishing tool that is used to create shared views on Tableau Server.

Fig: Tableau provides a scalable solution for creation and delivery of web, mobile and desktop
analytics.

Evaluation of Tableau

Strengths

Speed - The greatest strength of Tableau is its speed: it evaluates millions of rows and provides the necessary responses in moments.

Ease of use - It is very simple to use. One can begin with Tableau even with no prior programming experience. With just fundamental MS Excel skills an individual can easily learn Tableau.

Beautiful and interactive dashboard - The Tableau dashboard is very interactive and provides dynamic outcomes. Rich visualizations can be produced very easily. The graphics and charts are smart and beautiful. Images, web pages and documents can be added to the dashboard for simple storytelling. All this gives much deeper insight into the data.

Direct connection - Tableau permits users to connect directly to databases, cubes, data warehouses, etc. Data access is simple, requires no advanced setup, and the data stays live, updating on its own. One can select anything from spreadsheet tables to data from Hadoop to generate a great mash-up and get the preferred outcomes in no time. This is easy ad hoc business analytics.

Weakness

Not a comprehensive solution; it specializes in BI

No predictive analytical capabilities

Customization and Integration with other apps

Expandability for analytics

Social media integration

Robust enterprise-class security

Similar experience is available at: http://www.tableau.com/solutions/real-estate-analysis

Conclusion: Hence, we studied Business Intelligence and analytical tools to recommend the
combination of share purchases and sales for maximizing the profit.


Assignment No: 9
Title: On Twitter Data performs computing using BIA tools electively

Objectives:
1. To handle Twitter Data for performing computing.
2. To analyze data using R programming tool.

Theory:

Why Text Processing using R?

With the increasing importance of computational text analysis in research, many researchers face
the challenge of learning how to use advanced software that enables this text analysis. Currently,
one of the most popular environments for computational methods and the emerging field of “data
science” is the R statistical software. However, for researchers that are not well-versed in
programming, learning how to use R can be a challenge, and performing text analysis in
particular can seem daunting. In this step by step guide one can learn that performing text
analysis in R is not as hard as some might fear. This is an introduction into the use of common
techniques, with the aim of helping researchers get acquainted with computational text analysis
in general, as well as getting a start at performing advanced text analysis studies in R.

Why R?
R was specifically designed for statistical analysis, which makes it highly suitable for data
science applications. Although the learning curve for programming with R can be steep,
especially for people without prior programming experience, the tools now available for carrying
out text analysis in R make it easy to perform powerful, cutting-edge text analytics using only a
few simple commands. One of the keys to R’s explosive growth has been its densely populated
collection of extension software libraries, known in R terminology as packages, supplied and
maintained by R’s extensive user community. Each package extends the functionality of the base
R language and core packages, and in addition to functions and data must include documentation
and examples, often in the form of vignettes demonstrating the use of the package. The best-
known package repository, the Comprehensive R Archive Network (CRAN), currently has over
10,000 packages that are published.

Text analysis in particular has become well established in R. There is a vast collection of
dedicated text processing and text analysis packages, from low-level string operations to
advanced text modeling techniques such as fitting Latent Dirichlet Allocation models, R
provides it all. One of the main advantages of performing text analysis in R is that it is often possible, and relatively easy, to switch between different packages or to combine them. Recent efforts among the R text analysis developers' community are designed to promote this interoperability to maximize flexibility and choice among users. As a result, learning the basics for text analysis in R provides access to a wide range of advanced text analysis features.

Why Twitter Data?


Twitter is an online microblogging tool that disseminates more than 400 million messages per
day, including vast amounts of information about almost all industries from entertainment to
sports, health to business etc. One of the best things about Twitter — indeed, perhaps its greatest
appeal — is in its accessibility. It’s easy to use both for sharing information and for collecting it.
Twitter provides unprecedented access to our lawmakers and to our celebrities, as well as to
news as it’s happening. Twitter represents an important data source for the business models of
huge companies as well. All the above characteristics make Twitter one of the best places to collect real-time, up-to-date data to analyse and do any sort of research on real-life situations.

Twitter sentiment analysis using R


In the past one decade, there has been an exponential surge in the online activity of people across
the globe. The volume of posts that are made on the web every second runs into millions. To add
to this, the rise of social media platforms has led to flooding to content on the internet.
Social media is not just a platform where people talk to each other, but it has become very vast
and serves many more purposes. It has become a medium where people
• Express their interests.
• Share their views.
• Share their displeasures.
• Compliment companies for good and poor services.

Implementing sentiment analysis application in R


Now, we will try to analyze the sentiments of tweets made by a Twitter handle. We will develop
the code in R step by step and see the practical implementation of sentiment analysis in R.

The code is divided into following parts:


Extracting tweets using Twitter application
Cleaning the tweets for further analysis
Getting sentiment score for each tweet
Segregating positive and negative tweets
Extracting tweets using Twitter application

We will first install the relevant packages that we need. To extract tweets from Twitter, we will
need package ‘twitteR’.
‘Syuzhet’ package will be used for sentiment analysis; while ‘tm’ and ‘SnowballC’ packages are
used for text mining and analysis.


# Install Required Packages

install.packages("SnowballC")
install.packages("tm")
install.packages("twitteR")
install.packages("syuzhet")

# Load Required Packages


library("SnowballC")
library("tm")
library("twitteR")
library("syuzhet")


# Create a vector of numbers


numbers = c(1, 2, 3, 4, 5)
print(numbers)

# [1] 1 2 3 4 5
# Create a vector of letters
ltrs = c('a', 'b', 'c', 'd', 'e')
# [1] "a" "b" "c" "d" "e"
# Concatenate both
mixed_vec = c(numbers, ltrs)
print(mixed_vec)
# [1] "1" "2" "3" "4" "5" "a" "b" "c" "d" "e"

Let’s analyze what happened in the previous code. We can see it is possible to create vectors
with numbers and with letters. We did not need to tell R what type of data type we wanted
beforehand. Finally, we were able to create a vector with both numbers and letters. The vector mixed_vec has coerced the numbers to character; we can see this by visualizing how the values are printed inside quotes. The following code shows the data type of different vectors as returned by the function class. It is common to use the class function to "interrogate" an object, asking it what its class is.

### Evaluate the data types using class


### One dimensional objects
# Integer vector
num = 1:10
class(num)
# [1] "integer"

# Numeric vector, it has a float, 10.5


num = c(1:10, 10.5)
class(num)
# [1] "numeric"

# Character vector
ltrs = letters[1:10]
class(ltrs)
# [1] "character"

# Factor vector
fac = as.factor(ltrs)
class(fac)
# [1] "factor"
R supports two-dimensional objects also. In the following code, there are examples of the two
most popular data structures used in R: the matrix and data.frame.
# Matrix
M = matrix(1:12, ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12

lM = matrix(letters[1:12], ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] "a" "d" "g" "j"
# [2,] "b" "e" "h" "k"
# [3,] "c" "f" "i" "l"


#Coerces the numbers to character


# cbind concatenates two matrices (or vectors) in one matrix
cbind(M, lM)

# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]


# [1,] "1" "4" "7" "10" "a" "d" "g" "j"
# [2,] "2" "5" "8" "11" "b" "e" "h" "k"
# [3,] "3" "6" "9" "12" "c" "f" "i" "l"

class(M)
# [1] "matrix"
class(lM)
# [1] "matrix"

# data.frame
# One of the main objects of R, handles different data types in the same object.
# It is possible to have numeric, character and factor vectors in the same data.frame

df = data.frame(n = 1:5, l = letters[1:5])


df
#   n l
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d
# 5 5 e

As demonstrated in the previous example, it is possible to use different data types in the same object. In general, this is how data is presented by databases and APIs: part of the data is text or character vectors and the rest is numeric. It is the analyst's job to determine which statistical data type to assign and then use the correct R data type for it. In statistics we normally consider variables to be of the following types −

Numeric
Nominal or categorical
Ordinal
In R, a vector can be of the following classes −
Numeric - Integer
Factor
Ordered Factor


R provides a data type for each statistical type of variable. The ordered factor is however rarely
used, but can be created by the function factor, or ordered.
The following section treats the concept of indexing. This is a quite common operation, and
deals with the problem of selecting sections of an object and making transformations to them.

# Let's create a data.frame


df = data.frame(numbers = 1:26, letters)
head(df)
# numbers letters
#1 1 a
#2 2 b
#3 3 c
#4 4 d
#5 5 e
#6 6 f
# str gives the structure of a data.frame, it’s a good summary to inspect an object
str(df)
# 'data.frame': 26 obs. of 2 variables:
# $ numbers: int 1 2 3 4 5 6 7 8 9 10 ...
# $ letters: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
# The latter shows the letters character vector was coerced as a factor.
# This can be explained by the stringsAsFactors = TRUE argumnet in data.frame
# read ?data.frame for more information

class(df)
# [1] "data.frame"

### Indexing
# Get the first row
df[1, ]

# numbers letters
#1 1 a
# Used for programming normally - returns the output as a list
df[1, , drop = TRUE]
# $numbers
# [1] 1
#
# $letters
# [1] a


# Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
# Get several rows of the data.frame
df[5:7, ]

# numbers letters
#5 5 e
#6 6 f
#7 7 g
### Add one column that mixes the numeric column with the factor column
df$mixed = paste(df$numbers, df$letters, sep = "")

str(df)
# 'data.frame': 26 obs. of 3 variables:
# $ numbers: int 1 2 3 4 5 6 7 8 9 10 ...
# $ letters: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ mixed :chr "1a" "2b" "3c" "4d" ...
### Get columns
# Get the first column
df[, 1]
# It returns a one dimensional vector with that column
# Get two columns
df2 = df[, 1:2]
head(df2)
# numbers letters
#1 1 a
#2 2 b
#3 3 c
#4 4 d
#5 5 e
#6 6 f

# Get the first and third columns


df3 = df[, c(1, 3)]
df3[1:3, ]

# numbers mixed
#1 1 1a
#2 2 2b
#3 3 3c
### Index columns from their names
names(df)


# [1] "numbers" "letters" "mixed"


# This is the best practice in programming, as many times indices change, but variable names don't
# We create a variable with the names we want to subset
keep_vars = c("numbers", "mixed")

df4 = df[, keep_vars]


head(df4)
# numbers mixed
#1 1 1a
#2 2 2b
#3 3 3c
#4 4 4d
#5 5 5e
#6 6 6f
### subset rows and columns
# Keep the first five rows
df5 = df[1:5, keep_vars]
df5
# numbers mixed
#1 1 1a
#2 2 2b
#3 3 3c
#4 4 4d
#5 5 5e
# subset rows using a logical condition
df6 = df[df$numbers< 10, keep_vars]
df6
# numbers mixed
#1 1 1a
#2 2 2b
#3 3 3c
#4 4 4d
#5 5 5e
#6 6 6f
#7 7 7g
#8 8 8h
#9 9 9i

Conclusion: Hence, we studied how to perform computing on Twitter data using Business Intelligence and analytics tools.


Mini project On BI

Aim : Sentiment analysis of WhatsApp data using data analytics tools.

Objective :

1) To learn concept of Sentiment Analysis.

2) To study the data analytics tools like R programming.

Software Requirements: R Studio

Theory :

Sentiment Analysis :

Sentiment analysis is the measurement of positive and negative language.

It is a way to evaluate written or spoken language to determine if the expression is favorable,


unfavorable, or neutral, and to what degree. Today’s algorithm-based sentiment analysis tools
can handle huge volumes of customer feedback consistently and accurately. Paired with text
analytics, sentiment analysis reveals the customer’s opinion about topics ranging from your
products and services to your location, your advertisements, or even your competitors.

Why is sentiment analysis important?

Sentiment analysis is critical because it helps you see what customers like and dislike about you and your brand. Customer feedback—from social media, your website, your call center agents, or any other source—contains a treasure trove of useful business information. But it isn't enough to know what customers are talking about; you must also know how they feel. Sentiment analysis is one way to uncover those feelings. Sometimes known as "opinion mining," sentiment analysis can let you know if there has been a change in public opinion toward any aspect of your business. Peaks or valleys in sentiment scores give you a place to start if you want to make product improvements, train sales or customer care agents, or create new marketing campaigns. Sentiment analysis is not a once-and-done effort. By reviewing your customers' feedback on your business regularly, you can be more proactive regarding the changing dynamics in the marketplace.

Working :

The world is moving towards a fully digitalized economy at an incredible pace, and as a result an enormous amount of data is produced each day by the internet, social media, smartphones, technical equipment and many other sources, which has led to the evolution of Big Data management and analytics. Sentiment analysis is one such tool and the most popular branch of textual analytics; with the help of statistics and natural language processing, it examines and classifies unorganized textual data into various sentiments. It is also known as opinion mining, as it largely focuses on the opinions and attitudes of people by analyzing their texts.

Figure: Branches of Textual Analysis

More than 34 billion texts are exchanged over WhatsApp every day. Imagine if we could analyze this data, extract valuable insights from it, and leverage those insights not only to take better real-time decisions but also to add value for stakeholders at much lower cost and time, and hence align operational efficiency with organizational strategy. In this assignment, we leverage the power of sentiment analysis to investigate a WhatsApp chat using R, and we visualize and interpret the results.

Applications of WhatsApp Chat Analysis:

WhatsApp is the most popular chat app, with more than 700 million monthly active users. Its popularity has made it a near-essential app among smartphone users, and even businesses and organizations use WhatsApp for daily communication within groups and across departments. Corporations obtain a huge amount of textual data from WhatsApp, and they can leverage WhatsApp chat sentiment analysis to gain better insights about their employees and to avoid unforeseen conflicts caused by redundancies and inefficiencies in business processes.

Flowchart :

How it’s done :


Firstly, we need to select a chat in WhatsApp and export it to our system, which is an easy task and can be done either from the phone or from WhatsApp for desktop. After this, the process is fairly simple and is explained below with all the coding details needed to analyze the texts.
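A minimal sketch of reading and pre-cleaning the exported chat, assuming the common plain-text export format in which each message line begins with a date/time stamp and the sender's name (the exact pattern varies by phone locale, so the regular expression below is an assumption to adapt):

# Read the exported chat and strip a "dd/mm/yy, hh:mm - Sender:" style prefix
raw <- readLines("chat.txt")
messages <- gsub("^\\d{1,2}/\\d{1,2}/\\d{2,4}, \\d{1,2}:\\d{2}( [AP]M)? - [^:]+: ", "", raw)
head(messages)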

Program :

#Load required packages


library(ggplot2)
library(tm)
library(wordcloud)
library(syuzhet)

#get the data from whatsapp chat


text <- readLines("chat.txt")

#let us create the corpus


docs <- Corpus(VectorSource(text))

#clean our chat data


trans <- content_transformer(function (x , pattern ) gsub(pattern, " ", x))
docs <- tm_map(docs, trans, "/")
docs <- tm_map(docs, trans, "@")
docs <- tm_map(docs, trans, "\\|")
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("english"))  # remove standard English stop words
docs <- tm_map(docs, removeWords, c("where","how"))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, stripWhitespace)
docs <- tm_map(docs, stemDocument)

#create the document term matrix


dtm <- TermDocumentMatrix(docs)
mat <- as.matrix(dtm)
v <- sort(rowSums(mat),decreasing=TRUE)

#Data frame
data <- data.frame(word = names(v),freq=v)
head(data, 10)

#generate the wordcloud


set.seed(1056)
wordcloud(words = data$word, freq = data$freq, min.freq = 1,
max.words=200, random.order=FALSE, rot.per=0.35,
colors=brewer.pal(8, "Dark2"))

#fetch sentiment words from texts


Sentiment <- get_nrc_sentiment(text)
head(Sentiment)

texts <- cbind(text,Sentiment)

#count the sentiment words by category


TotalSentiment <- data.frame(colSums(texts[,c(2:11)]))
names(TotalSentiment) <- "count"
TotalSentiment <- cbind("sentiment" = rownames(TotalSentiment), TotalSentiment)
rownames(TotalSentiment) <- NULL

#total sentiment score of all texts


ggplot(data = TotalSentiment, aes(x = sentiment, y = count)) +
geom_bar(aes(fill = sentiment), stat = "identity") +
theme(legend.position = "none") +
xlab("Sentiment") + ylab("Total Count") + ggtitle("Total Sentiment Score")

Output :

We can easily infer from the bar plot that the chat had the largest number of positive sentiments, followed by negative in second place and anticipation in third.

Issues with Sentiment Analysis:

Though sentiment analysis has been one of the most popular textual analysis tools among businesses, scholars and analysts for decision making and research, it has its own limitations: language is very complex, and the meaning of every word changes with time and from person to person. Also, the accuracy of the analysis cannot be measured precisely or compared directly with how human beings analyze emotions.

The problem can be classified into three main factors:


1. Sarcasm: Sarcasm is a popular form of mockery used to ridicule or convey insult. Lexicon-based analytics often fails to recognize this form of emotion and might prove ineffective in such cases, though efforts are being made to address the problem through the extensive use of machine learning and artificial intelligence, and we may see an improved version of sentiment analysis in the near future. Example: "I am so proud of your stupidity, you make me feel good about myself."
2. Multiple Meanings: A word can have many meanings and may represent multiple emotions as we move from one geography to another, or even from one person to another. Many English words used in the UK mean something different in American English. For example: "I think you've been playing horribly dope."
3. Dependency: Sentiment analysis largely depends on predefined words and their individual scores, which leads to problems such as ambiguity in the context of a sentence. A sentence that includes "good" might not have any emotion attached to it but will still be shown as positive by the analysis, as illustrated in the sketch below.
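A small illustration of this dependency, assuming the default lexicon of the syuzhet package (the scores are indicative only):

library(syuzhet)
# "good" contributes a positive score even though the sentence expresses no praise
get_sentiment("A good three hours were spent filling in the form")
# Simple word-level lexicons also tend to miss negation, so a sentence like
# this one may still receive a positive score
get_sentiment("The support team was not good")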

Despite its limitations, sentiment analysis is an extremely popular and widely used analytical tool in business intelligence for social media monitoring, brand health examination, measuring the effect of ad campaigns or new product launches, and various research purposes. It is frequently applied to Twitter data and customer reviews by marketers and customer service teams to identify the feelings of consumers. Sentiment analysis has also started to gain popularity in areas such as psychology, political science and other similar fields where textual data is obtained and explored from books, transcripts, and reports.

Conclusion:
We have successfully studied Sentiment Analysis of WhatsApp data using data analytics tools.

References :
1) https://www.clarabridge.com/sentiment-analysis/
2) http://www.planetanalytics.in/2017/09/whatsapp-chat-sentiment-analysis-in-r.html
3) https://creately.com/diagram/example/h11q823n2/sentiment%20analysis
