LP IV Lab Manual Sem II AY 2019-2020 (1) (1) - Removed
LP IV Lab Manual Sem II AY 2019-2020 (1) (1) - Removed
LP IV Lab Manual Sem II AY 2019-2020 (1) (1) - Removed
List of Experiments
Assignment Assignment Name Page
Number No.
Implement Union, Intersection, Complement and Difference
operations on fuzzy sets. Also create fuzzy relation by Cartesian
1
product of any two fuzzy sets and Perform max-min composition on
any two fuzzy relations.
Assignment No: 01
Aim: Implement Union, Intersection, Complement and Difference operations on fuzzy sets.
Also create fuzzy relation by Cartesian product of any two fuzzy sets and perform max-min
composition on any two fuzzy relations.
Objectives:
1. To learn different fuzzy operation on fuzzy set.
2. To learn min-max composition on fuzzy set.
Software Requirements:
Ubuntu 18.04
Hardware Requirements:
Pentium IV system with latest configuration
Theory:
Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the
concept of partial truth- truth values between "completely true" and "completely false". As its
name suggests, it is the logic underlying modes of reasoning which are approximate rather than
exact. The importance of fuzzy logic derives from the fact that most modes of human reasoning
and especially common sense reasoning are approximate in nature.
The essential characteristics of fuzzy logic as founded by Zader Lotfi are as follows.
In fuzzy logic, exact reasoning is viewed as a limiting case of approximate reasoning.
In fuzzy logic everything is a matter of degree.
Any logical system can be fuzzified
In fuzzy logic, knowledge is interpreted as a collection of elastic or, equivalently , fuzzy
constraint on a collection of variables
Inference is viewed as a process of propagation of elastic constraints.
The third statement hence, defines Boolean logic as a subset of Fuzzy logic.
Fuzzy Sets
Fuzzy Set Theory was formalised by Professor Lofti Zadeh at the University of California in
1965. What Zadeh proposed is very much a paradigm shift that first gained acceptance in the Far
East and its successful application has ensured its adoption around the world.
A paradigm is a set of rules and regulations which defines boundaries and tells us what to do to
be successful in solving problems within these boundaries. For example the use of transistors
instead of vacuum tubes is a paradigm shift - likewise the development of Fuzzy Set Theory
from conventional bivalent set theory is a paradigm shift.
Bivalent Set Theory can be somewhat limiting if we wish to describe a 'humanistic' problem
mathematically. For example, Fig 1 below illustrates bivalent sets to characterise the temperature
of a room.
The most obvious limiting feature of bivalent sets that can be seen clearly from the diagram is that
they are mutually exclusive - it is not possible to have membership of more than one set ( opinion
would widely vary as to whether 50 degrees Fahrenheit is 'cold' or 'cool' hence the expert knowledge
we need to define our system is mathematically at odds with the humanistic world). Clearly, it is not
accurate to define a transiton from a quantity such as 'warm' to 'hot' by the application of one degree
Fahrenheit of heat. In the real world a smooth (unnoticeable) drift from warm to hot would occur.
This natural phenomenon can be described more accurately by Fuzzy Set Theory. Fig.2 below shows
how fuzzy sets quantifying the same information can describe this natural drift.
The whole concept can be illustrated with this example. Let's talk about people and "youthness".
In this case the set S (the universe of discourse) is the set of people. A fuzzy subset YOUNG is
also defined, which answers the question "to what degree is person x young?" To each person in
the universe of discourse, we have to assign a degree of membership in the fuzzy subset
YOUNG. The easiest way to do this is with a membership function based on the person's age.
young(x) = { 1, if age(x) <= 20,
(30-age(x))/10, if 20 < age(x) <= 30,
0, if age(x) > 30 }
A graph of this looks like:
Parthiban 25 0.50
Arosha 26 0.40
Chin Wei 28 0.20
Rajkumar 83 0.00
So given this definition, we'd say that the degree of truth of the statement "Parthiban is YOUNG"
is 0.50.
Note: Membership functions almost never have as simple a shape as age(x). They will at least tend to
be triangles pointing up, and they can be much more complex than that. Furthermore, membership
functions so far is discussed as if they always are based on a single criterion, but this isn't always the
case, although it is the most common case. One could, for example, want to have the membership
function for YOUNG depend on both a person's age and their height (Arosha's short for his age).
This is perfectly legitimate, and occasionally used in practice. It's referred to as a two-dimensional
membership function. It's also possible to have even more criteria, or to have the membership
function depend on elements from two completely different universes of discourse.
So given this definition, we'd say that the degree of truth of the statement "Parthiban is YOUNG"
is 0.50.
Note: Membership functions almost never have as simple a shape as age(x). They will at least
tend to be triangles pointing up, and they can be much more complex than that. Furthermore,
membership functions so far is discussed as if they always are based on a single criterion, but
this isn't always the case, although it is the most common case. One could, for example, want to
have the membership function for YOUNG depend on both a person's age and their height
(Arosha's short for his age). This is perfectly legitimate, and occasionally used in practice. It's
referred to as a two-dimensional membership function. It's also possible to have even more
criteria, or to have the membership function depend on elements from two completely different
universes
Union
The membership function of the Union of two fuzzy sets A and B with membership functions
and respectively is defined as the maximum of the two individual membership functions. This is
called the maximum criterion.
The Union operation in Fuzzy set theory is the equivalent of the OR operation in Boolean algebra.
Intersection
The membership function of the Intersection of two fuzzy sets A and B with membership
functions and respectively is defined as the minimum of the two individual membership
functions. This is called the minimum criterion.
The Intersection operation in Fuzzy set theory is the equivalent of the AND operation in Boolean
algebra
Complement
The membership function of the Complement of a Fuzzy set A with membership function is defined
as the negation of the specified membership function. This is caleed the negation criterion.
The Complement operation in Fuzzy set theory is the equivalent of the NOT operation in
Boolean algebra.
The following rules which are common in classical set theory also apply to Fuzzy set theory.
Let A1, A2, ….., An be fuzzy sets in U1, U2, …Un, respectively. The Cartesian product of A1, A2,
….., An is a fuzzy set in the space U1 x U2 x…x Un with the membership function as: µA1 x A2
x…x An (x1, x2,…, xn) = min [µA1 (x1), µA2 (x2), …. µAn (xn)]
So, the Cartesian product of A1, A2, ….., An are donated by A1 x A2 x….. x An.
Cartesian Product: Example Let A = {(3, 0.5), (5, 1), (7, 0.6)} Let B = {(3, 1), (5, 0.6)}
The product is all set of pairs from A and B with the minimum associated memberships Ax B =
{[(3, 3), min (0.5, 1)], [(5, 3), min(1, 1)], [(7, 3), min(0.6, 1)], [(3, 5), min(0.5, 0.6)], [(5, 5), min(1,
0.6)], [(7, 5), min(0.6, 0.6)]} = {[(3, 3), 0.5], [(5, 3), 1], [(7, 3), 0.6], [(3, 5), 0.5], [(5, 5), 0.6], [(7, 5),
0.6]}.
MATLAB Code:
Union Operation:
clear all; %clear all previous workspace variables
x=(0:0.1:10)'; %x-axis from 0 to 10 with 0.1 interval
u1=gaussmf(x,[1,4]); %Gaussian Membership function
u2=trimf(x,[3 6.5 9]); %Triangular Membership function
u_reun=max(u1,u2); %Union Operation on Set u1 and u2
hold on
plot(x,u1,'r') %plotting set1
plot(x,u2,'m') %plotting set2
axis([0 10 0 1.05]);
legend('A','B')
plot(x,u_reun,'color','b','linewidth',4);
title('Union operation on Fuzzy Sets');
Intersection Operation:
clear all; %clear all previous workspace variables
x=(0:0.1:10)'; %x-axis from 0 to 10 with 0.1 interval
u1=gaussmf(x,[1,4]); %Gaussian Membership function
Complement Operation:
clear all; % clear all previous workspace variables
x=(0:0.1:10)'; % x-axis from 0 to 10
u1=gaussmf(x,[1,4]); % Gaussian Membership function
u_reun=1-u1; % Complement of Set u1
hold on
plot(x,u1,'r') % plot graph of set u1
axis([0 10 0 1]); % x-axis and y-axis definition
plot(x,u_reun,'color','b','linewidth',4); % plot of complement operation
title('Complement operation on Fuzzy Set');
Difference:
clear all; % clear all previous workspace variables
x=(0:0.1:10)'; % x-axis from 0 to 10
u1=gaussmf(x,[1,4]); % Gaussian Membership function
u2=trimf(x,[2 5 7]); % Triangular membership function
u3=1-u2; %complement of u2
u_reun=min(u1,u3); % Complement of Set u1
hold on
plot(x,u1,'r') % plot graph of set u1
plot(x,u3,'b') % plot graph of set u1
axis([0 10 0 1]); % x-axis and y-axis definition
plot(x,u_reun,'color','m','linewidth',4); % plot of difference operation
title('Difference operation on two fuzzy sets');
Cartesian product:
clear all; %clear all previous workspace variables
x=(0:0.1:10)'; %x-axis from 0 to 10 with 0.1 interval
u1=gaussmf(x,[1,4]); %Gaussian Membership function
u2=trimf(x,[3 6.5 9]); %Triangular Membership function
u_reun=times(u1,u2); %Cartesian Product Operation on Set u1 and u2
hold on
plot(x,u1,'r') %plotting set1
plot(x,u2,'m') %plotting set2
axis([0 10 0 1.05]);
legend('A','B')
plot(x,u_reun,'color','b','linewidth',4);
title('Cartesian Product on two fuzzy Sets');
Assignment No: 02
Aim: Implement genetic algorithm for benchmark function (eg. Square, Rosenbrock function
etc)Initialize the population from the Standard Normal Distribution. Evaluate the fitness of all its
individuals. Then you will do multiple generation of a genetic algorithm. A generation consists
of applying selection, crossover, mutation, and replacement. Use:
Objectives:
1. To learn Genetic Algorithm.
2. To learn about optimization algorithm.
Software Requirements:
Ubuntu 18.04
Hardware Requirements:
Pentium IV system with latest configuration
Theory:
Genetic Algorithms are a class of stochastic, population based optimization algorithms inspired
by the biological evolution process using the concepts of “Natural Selection” and “Genetic
Inheritance” (Darwin 1859) and originally developed by Holland.
GAs are now used in engineering and business optimization applications, where the search space
is large and/or too complex (non-smooth) for analytic treatment. This algorithm reflects the
process of natural selection where the fittest individuals are selected for reproduction in order to
produce offspring of the next generation.
The process of natural selection starts with the selection of fittest individuals from a population. They
produce offspring which inherit the characteristics of the parents and will be added to the next
generation. If parents have better fitness, their offspring will be better than parents and have a
better chance at surviving. This process keeps on iterating and at the end, a generation with the
fittest individuals will be found.
This notion can be applied for a search problem. We consider a set of solutions for a problem
and select the set of best ones out of them.
Five phases are considered in a genetic algorithm.
1. Initial population
2. Fitness function
3. Selection
4. Crossover
5. Mutation
Initial Population
The process begins with a set of individuals which is called a Population. Each individual is a
solution to the problem you want to solve.
An individual is characterized by a set of parameters (variables) known as Genes. Genes are
joined into a string to form a Chromosome (solution).
In a genetic algorithm, the set of genes of an individual is represented using a string, in terms of
an alphabet. Usually, binary values are used (string of 1s and 0s). We say that we encode the
genesin a chromosome
Fitness Function :
The fitness function determines how fit an individual is (the ability of an individual to compete
with other individuals). It gives a fitness score to each individual. The probability that an
individual will be selected for reproduction is based on its fitness score.
Selection
The idea of selection phase is to select the fittest individuals and let them pass their genes to the
next generation.
Two pairs of individuals (parents) are selected based on their fitness scores. Individuals with
high fitness have more chance to be selected for reproduction.
Crossover
Crossover is the most significant phase in a genetic algorithm. For each pair of parents to be
mated,a crossover point is chosen at random from within the genes. For example, consider the
crossover point to be 3 as shown below.
Offspring are created by exchanging the genes of parents among themselves until the crossover
point is reached.
Mutation
In certain new offspring formed, some of their genes can be subjected to a mutation with a low
random probability. This implies that some of the bits in the bit string can be flipped
Mutation occurs to maintain diversity within the population and prevent premature convergence.
Termination
The algorithm terminates if the population has converged (does not produce offspring which are
significantly different from the previous generation). Then it is said that the genetic algorithm
has provided a set of solutions to our problem.
Comments
The population has a fixed size. As new generations are formed, individuals with least fitness
die, providing space for new offspring.
The sequence of phases is repeat ed to produce individuals in each new generation which are
better than the previous generation.
Python Code:
Assign2.py
import numpy
import GA
"""
The y=target is to maximize this equation ASAP:
y = w1x1+w2x2+w3x3+w4x4+w5x5+6wx6
where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7)
What are the best values for the 6 weights w1 to w6?
We are going to use the genetic algorithm for the best possible values after a number of
generations.
"""
# Inputs of the equation.
equation_inputs = [4,-2,3.5,5,-11,-4.7]
# Number of the weights we are looking to optimize.
num_weights = 6
"""
Genetic algorithm parameters:
Mating pool size
Population size
"""
sol_per_pop = 8
num_parents_mating = 4
num_generations = 5
for generation in range(num_generations):
print("Generation : ", generation)
# Measing the fitness of each chromosome in the population.
fitness = GA.cal_pop_fitness(equation_inputs, new_population)
GA.py:
import numpy
for k in range(offspring_size[0]):
# Index of the first parent to mate.
parent1_idx = k%parents.shape[0]
# Index of the second parent to mate.
parent2_idx = (k+1)%parents.shape[0]
# The new offspring will have its first half of its genes taken from the first parent.
offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
# The new offspring will have its second half of its genes taken from the second parent.
offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
return offspring
def mutation(offspring_crossover):
# Mutation changes a single gene in each offspring randomly.
for idx in range(offspring_crossover.shape[0]):
# The random value to be added to the gene.
random_value = numpy.random.uniform(-1.0, 1.0, 1)
offspring_crossover[idx, 4] = offspring_crossover[idx, 4] + random_value
return offspring_crossover
Output:
Generation : 1
Fitness values:
[ 18.24112489 17.0688537 15.99527402 14.40299221 -8.46075629 31.73289712
6.10307563 24.08733441]
Selected parents:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]
[ 2.12480298 2.97122243 3.60375452 -1.40103767 -1.5781162 0.30567304]
[-0.63698911 -2.8638447 2.93392615 -1.40103767 -1.20313655 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -0.72163167 0.7516408 0.00677938]]
Crossover result:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.5781162 0.30567304]
[ 2.12480298 2.97122243 3.60375452 -1.40103767 -1.20313655 0.30567304]
[-0.63698911 -2.8638447 2.93392615 -0.72163167 0.7516408 0.00677938]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]]
Mutation result:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.2392086 0.30567304]
[ 2.12480298 2.97122243 3.60375452 -1.40103767 -0.38610586 0.30567304]
[-0.63698911 -2.8638447 2.93392615 -0.72163167 1.33639943 0.00677938]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.13941727 2.29682254]]
Best result after generation 1 : 34.1663669207
Generation : 2
Fitness values:
[ 31.73289712 24.08733441 18.24112489 17.0688537 34.16636692 10.97522073 -
4.89194068 22.86998223]
Selected Parents:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.2392086 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]
[ 2.12480298 2.97122243 3.60375452 -1.40103767 -1.5781162 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.13941727 2.29682254]]
Crossover result:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.5781162 0.30567304]
[ 2.12480298 2.97122243 3.60375452 -1.56909315 -1.13941727 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.2392086 0.30567304]]
Mutation result:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -0.73543721 0.30567304]
[ 2.12480298 2.97122243 3.60375452 -1.56909315 -0.50581509 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.20089639 0.30567304]]
Best result after generation 2: 34.5930432629
Generation : 3
Fitness values:
[ 34.16636692 31.73289712 24.08733441 22.86998223 34.59304326 28.6248816
2.09334217 33.7449326 ]
Selected parents:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.2392086 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.20089639 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]]
Crossover result:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.2392086 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.20089639 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]]
Mutation result:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -2.20744102 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.16589294 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.37553107 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.44124005 2.29682254]]
Best result after generation 3: 44.8169235189
Generation : 4
Fitness values
[ 34.59304326 34.16636692 33.7449326 31.73289712 44.81692352
33.35989464 36.46723397 37.19003273]
Selected parents:
[[ 3.00912373 -2.745417 3.27131287 -1.40103767 -2.20744102 0.30567304]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.44124005 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.37553107 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]]
Crossover result:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.37553107 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -2.20744102 0.30567304]]
Mutation result:
[[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.13382082 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.98105233 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.27638584 2.29682254]
[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.70558545 0.30567304]]
Best result after generation 4: 44.8169235189
Assignment No: 3
Aim: Implement Particle swarm optimization for benchmark function (eg. Square, Rosenbrock
function).Initialize the population from the Standard Normal Distribution. Evaluate fitness of all
particles. Use:
c1=c2 = 2
Inertia weight is linearly varied between 0.9 to 0.4.
Global best variation
Objectives:
1. To learn swarm algorithm
2. To learn about optimization algorithm.
Software Requirements:
Ubuntu 18.04
Hardware Requirements:
Pentium IV system with latest configuration
Theory:
Particle swarm optimization (PSO) is a population based stochastic optimization technique
developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by social behavior of bird flocking
or fish schooling.
PSO shares many similarities with evolutionary computation techniques such as Genetic
Algorithms (GA). The system is initialized with a population of random solutions and searches
for optima by updating generations. However, unlike GA, PSO has no evolution operators such
as crossover and mutation. In PSO, the potential solutions, called particles, fly through the
problem space by following the current optimum particles. Compared to GA, the advantages of
PSO are that PSO is easy to implement and there are few parameters to adjust. PSO has been
successfully applied in many areas: function optimization, artificial neural network training ,
fuzzy system control, and other areas where GA can be applied.
PSO uses a bunch of particles called the swarm. These particles are allowed to move around &
explore the search-space.
These particles move in a direction which is guided by —
1. The particle’s own previous velocity (Inertia)
2. Distance from the individual particles’ best known position (Cognitive Force)
3. Distance from the swarms best known position (Social Force)
Essentially the particles collectively communicate with each other to converge faster. The swarm
doesn’t fully explore the search space but potentially finds a better solution.
Interestingly the overall direction of the swarm movement can be changed at any point of time
when a particle’s individual best is better than the swarm best. This allows a lot of disorder and
more chances of getting close to the global minima of the cost function
Algorithm
PSO is initialized with a group of random particles (solutions) and then searches for optima by
updating generations. In every iteration, each particle is updated by following two "best" values.
The first one is the best solution (fitness) it has achieved so far. (The fitness value is also stored.)
This value is called pbest. Another "best" value that is tracked by the particle swarm optimizer is
the best value, obtained so far by any particle in the population. This best value is a global best
and called gbest. When a particle takes part of the population as its topological neighbors, the
best value is a local best and is called lbest.
After finding the two best values, the particle updates its velocity and positions with following
equation (a) and (b).
v[] is the particle velocity, persent[] is the current particle (solution). pbest[] and gbest[] are
defined as stated before. rand () is a random number between (0,1). c1, c2 are learning factors.
usually c1 = c2 = 2.
Choose the particle with the best fitness value of all the particles as the gBest
For each particle
Calculate particle velocity according equation (a)
Update particle position according equation (b)
End
While maximum iterations or minimum error criteria is not attainedParticles' velocities on each
dimension are clamped to a maximum velocity Vmax. If the sum of accelerations would cause
the velocity on that dimension to exceed Vmax, which is a parameter specified by the user. Then
the velocity on that dimension is limited to Vmax.
Flowchart :
MATLAB Code:
ofun.m:
function f=ofun(x)
% objective function (minimization)
of=10*(x(1)-1)^2+20*(x(2)-2)^2+30*(x(3)-3)^2;
% constraints (all constraints must be converted into <=0 type)
% if there is no constraints then comments all c0 lines below
c0=[];
c0(1)=x(1)+x(2)+x(3)-5; % <=0 type constraints
c0(2)=x(1)^2+2*x(2)-x(3); % <=0 type constraints
% defining penalty for each constraint
for i=1:length(c0)
if c0(i)>0
c(i)=1;
else
c(i)=0;
end
end
penalty=10000; % penalty on each constraint violation
f=of+penalty*sum(c); % fitness function
run_pso.m:
tic
clear all
close all
rng default
LB=[0 0 0]; %lower bounds of variables
UB=[10 10 10]; %upper bounds of variables
% pso parameters values
m=3; % number of variables
n=100; % population size
wmax=0.9; % inertia weight
wmin=0.4; % inertia weight
c1=2; % acceleration factor
c2=2; % acceleration factor
% pso main program----------------------------------------------------start
maxite=1000; % set maximum number of iteration
maxrun=10; % set maximum number of runs need to be
for run=1:maxrun
run
% pso initialization----------------------------------------------start
for i=1:n
for j=1:m
x0(i,j)=round(LB(j)+rand()*(UB(j)-LB(j)));
end
end
x=x0; % initial population
v=0.1*x0; % initial velocity
for i=1:n
f0(i,1)=ofun(x0(i,:));
end
[fmin0,index0]=min(f0);
pbest=x0; % initial pbest
gbest=x0(index0,:); % initial gbest
% pso initialization------------------------------------------------end
% pso algorithm---------------------------------------------------start
ite=1;
tolerance=1;
while ite<=maxite && tolerance>10^-12
w=wmax-(wmax-wmin)*ite/maxite; % update inertial weight
% pso velocity updates
for i=1:n
for j=1:m
v(i,j)=w*v(i,j)+c1*rand()*(pbest(i,j)-x(i,j))...
+c2*rand()*(gbest(1,j)-x(i,j));
end
end
gbest;
fvalue=10*(gbest(1)-1)^2+20*(gbest(2)-2)^2+30*(gbest(3)-3)^2;
fff(run)=fvalue;
rgbest(run,:)=gbest;
disp(sprintf('--------------------------------------'));
end
% pso main program------------------------------------------------------end
disp(sprintf('\n'));
disp(sprintf('*********************************************************'));
disp(sprintf('Final Results-----------------------------'));
[bestfun,bestrun]=min(fff)
best_variables=rgbest(bestrun,:)
disp(sprintf('*********************************************************'));
toc
% PSO convergence characteristic
plot(ffmin(1:ffite(bestrun),bestrun),'-k');
xlabel('Iteration');
ylabel('Fitness function value');
title('PSO convergence characteristic')
Output :
Conclusion: Thus we learnt implementation of particle swarm optimization for benchmark function.
Assignment No: 04
Aim: Implement basic logic gates using Mc-Culoch-Pitts or Hebbnet neural networks.
Objectives:
1.The student will be able to obtain the fundamentals and different architecture of neural
networks.
2. The student will have a broad knowledge in developing the different algorithms for neural
networks.
Software Requirements:
Ubuntu 18.04
Hardware Requirements:
Pentium IV system with latest configuration
Theory:
Neural network was inspired by the design and functioning of human brain and components.
Definition:
―Information processing model that is inspired by the way biological nervous system (i.e the
brain) process information, is called Neural Network.
Neural Network has the ability to learn by examples. It is not designed to perform fix /specific
task, rather task which need thinking (e.g. Predictions).
application such as pattern recognition and data classification through a learning process. ANN is
85-90% accurate.
Y- output neuron
Yin= x1w1+x2w2
Output is :
y=f(Yin)
Output= function
The early model of an artificial neuron is introduced by Warren McCulloch and Walter Pitts in
1943. The McCulloch-Pitts neural model is also known as linear threshold gate. It is a neuron of
a set of inputs I1,I2,I3…Im and one output y . The linear threshold gate simply classifies the set
of inputs into two different classes. Thus the output y is binary. Such a function can be described
mathematically using these equations:
W1,W2…Wm are weight values normalized in the range of either (0,1) or (-1,1) and associated
with each input line, Sum is the weighted sum, and T is a threshold constant. The function f is a
linear step function at threshold T as shown in figure
For inhibition to be absolute, the threshold with the activation function should satisfy the
following condition:θ >nw –p
Output will fire if it receives ―k‖ or more excitatory inputs but no inhibitory inputs where
kw≥θ>(k-1) w
-
X1 X2 Y
0 0 0
0 1 1
1 0 1
1 1 0
-
Yin=x1w1+x2w2
As we know,
-
-
X1 X2 Z1
0 0 0
0 1 0
1 0 1
1 - 1 0
For Z1,
W11=1 and W12=-1
Θ=1
X1 X2 Z2
0 0 0
0 1 1
1 0 0
1 1 0
For Z2,
Θ=1
Y=Z1+Z2
Z1 Z2 Y
0 0 0
0 1 1
1 0 1
1 1 0
For Y,
Θ=1
Input:
Python Code:
import numpy as np
class Perceptron:
def __init__(self, input_length, weights=None):
if weights is None:
self.weights = np.ones(input_length) * 0.5
else:
self.weights = weights
@staticmethod
def unit_step_function(x):
if x > 0.5:
return 1
return 0
def __call__(self, in_data):
Output:
[0 0] 0
[0 1] 0
[1 0] 0
[1 1] 1
Conclusion: Mc-Culloch pits Model is implemented for XOR function by using the
thresholding activation function. Activation of M-P neurons is binary (i.e) at any time step the
neuron may fire or may not fire. Threshold plays major role here.
Assignment No: 05
Aim: Write a program to find the Boolean function to implement following single layer
perceptron.Assume all activation functions to be the threshold function which is 1 for all input
values greater than zero and 0, otherwise
Objectives:
1. To learn single layer perceptron.
2. To learn Boolean logic implementation using perceptron.
Software Requirements:
Ubuntu 18.04
Hardware Requirements:
Pentium IV system with latest configuration
Theory:
The most common Boolean functions are AND, OR, NOT. The Boolean logic AND only returns
1 if both inputs are 1 else 0, Boolean logic OR returns 1 for all inputs with 1, and will only return
0 if both input is 0 and lastly logic NOT returns the invert of the input, if the input is 0 it returns
1, if the input is 1 it returns 0. To make it clear the image below shows the truth table for the
basic Boolean Function.
The columns A and B are the inputs and column Z is the output. So, for the inputs A = 0, B = 0
the output is Z = 0.
Perceptron
A perceptron is the basic part of a neural network. A perceptron represents a single neuron on a
human’s brain, it is composed of the dataset ( Xm ) , the weights ( Wm ) and an activation
function, that will then produce an output and a bias.
The datasets ( inputs ) are converted into an ndarray which is then matrix multiplied to another
ndarray that holds the weights. Summing up all matrix multipy and adding a bias will create the
net input function, the output would then passed into an activation function that would determine
if the neuron needs to fire an output or not.
Most common activation function used for classification used is a sigmoid function, which is a
great function for classification (Although sigmoid is not the leading activation function for
middle layers of neural networks [ ehem ReLU / Leaky ReLU ] it still is widely used for final
classifications. ) The perceptron is simply separating the input into 2 categories, those that cause
a fire, and those that don't. It does this by looking at (in the 2-dimensional case): w1I1 + w2I2< t
If the LHS is < t, it doesn't fire, otherwise it fires. That is, it is drawing the line:
w1I1 + w2I2 = t
and looking at where the input point lies. Points on one side of the line fall into 1 category,
points on the other side fall into the other category. And because the weights and thresholds can
be anything, this is just any line across the 2 dimensional input space. So what the perceptron is
doing is simply drawing a line across the 2-d input space. Inputs to one side of the line are
classified into one category, inputs on the other side are classified into another. e.g. the OR
perceptron, w1=1, w2=1, t=0.5, draws the line: I1 + I2 = 0.5
across the input space, thus separating the points (0,1),(1,0),(1,1) from the point (0,0):
As you might imagine, not every set of points can be divided by a line like this. Those that can
be, are called linearly separable.
In 2 input dimensions, we draw a 1 dimensional line. In n dimensions, we are drawing the (n-1)
dimensional hyperplane:
w1I1 + .. + wnIn = t
We initialize w with some random vector. We then iterate over all the examples in the data, (P U
N) both positive and negative examples. Now if an input x belongs to P, ideally what should the
dot product w.x be? It would be greater than or equal to 0 because that’s the only thing what our
perceptron wants at the end of the day so let's give it that. And if x belongs to N, the dot product
MUST be less than 0. So if you look at the if conditions in the while loop:
Python Code:
import numpy as np
def perceptron(weights, inputs, bias):
model = np.add(np.dot(inputs, weights), bias)
logit = activation_function(model, type="sigmoid")
return np.round(logit)
def activation_function(model, type="sigmoid"):
return {
"sigmoid": 1 / (1 + np.exp(-model))
}[type]
def compute(data, logic_gate, weights, bias):
weights = np.array(weights)
output = np.array([ perceptron(weights, datum, bias) for datum in data ])
return output
# This function is taken from https://github.com/fjcamillo/Neural-Representation-of-Logic-
Functions/blob/master/Logic.py
def print_template(dataset, name, data):
# act = name[6:]
print("Logic Function: {}".format(name.upper()))
print("X0\tX1\tX2\tY")
toPrint = ["{1}\t{2}\t{3}\t{0}".format(output, *datas) for datas, output in zip(dataset, data)]
for i in toPrint:
print(i)
def main():
dataset = np.array([
[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 1, 1],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]
])
gates = {
"and": compute(dataset, "and", [1, 1, 1], -2),
"or": compute(dataset, "or", [1, 1, 1], -0.9),
"not": compute(np.array([ [0], [1] ]), "not", [-1], 1),
"nand": compute(dataset, "nand", [-1, -1, -1], 3),
"nor": compute(dataset, "nor", [-1, -1, -1], 1),
# _xor = compute(dataset, "and", [1], dataset),
# _xnor = compute(dataset, "xnor", [], dataset)
}
for gate in gates:
print_template(dataset, gate, gates[gate])
if __name__ == '__main__':
main()
Output:
runfile('C:/Users/Bhavesh/.spyder-py3/temp.py', wdir='C:/Users/Bhavesh/.spyder-py3')
Logic Function: AND
X0 X1 X2 Y
0 0 0 0.0
0 0 1 0.0
0 1 0 0.0
0 1 1 0.0
1 0 0 0.0
1 0 1 0.0
1 1 0 0.0
1 1 1 1.0
Logic Function: OR
X0 X1 X2 Y
0 0 0 0.0
0 0 1 1.0
0 1 0 1.0
0 1 1 1.0
1 0 0 1.0
1 0 1 1.0
1 1 0 1.0
1 1 1 1.0
Logic Function: NOT
X0 X1 X2 Y
0 0 0 1.0
0 0 1 0.0
Logic Function: NAND
X0 X1 X2 Y
0 0 0 1.0
0 0 1 1.0
0 1 0 1.0
0 1 1 1.0
1 0 0 1.0
1 0 1 1.0
1 1 0 1.0
1 1 1 0.0
Logic Function: NOR
X0 X1 X2 Y
0 0 0 1.0
0 0 1 0.0
0 1 0 0.0
0 1 1 0.0
1 0 0 0.0
1 0 1 0.0
1 1 0 0.0
1 1 1 0.0
Conclusion : Thus we learned implementation of single layer perceptron for AND ,OR and
NOT Boolean function.
Assignment No: 06
Aim: Implement Union, Intersection, Complement and Difference operations on fuzzy sets. Also create
fuzzy relation by Cartesian product of any two fuzzy sets and perform max-min composition on any two
fuzzy relations.
Objectives:
1. To learn different fuzzy operation on fuzzy set.
2. To learn min-max composition on fuzzy set.
Software Requirements:
Ubuntu 18.04
Hardware Requirements:
Pentium IV system with latest configuration
Theory:
Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the
concept of partial truth- truth values between "completely true" and "completely false". As its
name suggests, it is the logic underlying modes of reasoning which are approximate rather than
exact. The importance of fuzzy logic derives from the fact that most modes of human reasoning
and especially common sense reasoning are approximate in nature.
The essential characteristics of fuzzy logic as founded by ZaderLotfi are as follows.
In fuzzy logic, exact reasoning is viewed as a limiting case of approximate reasoning.
In fuzzy logic everything is a matter of degree.
Any logical system can be fuzzified
In fuzzy logic, knowledge is interpreted as a collection of elastic or, equivalently , fuzzy
constraint on a collection of variables
Inference is viewed as a process of propagation of elastic constraints.
The third statement hence, defines Boolean logic as a subset of Fuzzy logic.
Fuzzy Sets
Fuzzy Set Theory was formalised by Professor Lofti Zadeh at the University of California in
1965. What Zadeh proposed is very much a paradigm shift that first gained acceptance in the Far
East and its successful application has ensured its adoption around the world.
A paradigm is a set of rules and regulations which defines boundaries and tells us what to do to
be successful in solving problems within these boundaries. For example the use of transistors
instead of vacuum tubes is a paradigm shift - likewise the development of Fuzzy Set Theory
from conventional bivalent set theory is a paradigm shift.
Bivalent Set Theory can be somewhat limiting if we wish to describe a 'humanistic' problem
mathematically. For example, Fig 1 below illustrates bivalent sets to characterise the temperature
of a room.
The most obvious limiting feature of bivalent sets that can be seen clearly from the diagram is that
they are mutually exclusive - it is not possible to have membership of more than one set ( opinion
would widely vary as to whether 50 degrees Fahrenheit is 'cold' or 'cool' hence the expert knowledge
we need to define our system is mathematically at odds with the humanistic world). Clearly, it is not
accurate to define a transiton from a quantity such as 'warm' to 'hot' by the application of one degree
Fahrenheit of heat. In the real world a smooth (unnoticeable) drift from warm to hot would occur.
This natural phenomenon can be described more accurately by Fuzzy Set Theory. Fig.2 below shows
how fuzzy sets quantifying the same information can describe this natural drift.
The whole concept can be illustrated with this example. Let's talk about people and "youthness".
In this case the set S (the universe of discourse) is the set of people. A fuzzy subset YOUNG is
also defined, which answers the question "to what degree is person x young?" To each person in
the universe of discourse, we have to assign a degree of membership in the fuzzy subset
YOUNG. The easiest way to do this is with a membership function based on the person's age.
Parthiban 25 0.50
Arosha 26 0.40
Chin Wei 28 0.20
Rajkumar 83 0.00
So given this definition, we'd say that the degree of truth of the statement "Parthiban is YOUNG"
is 0.50.
Note: Membership functions almost never have as simple a shape as age(x). They will at least tend to
be triangles pointing up, and they can be much more complex than that. Furthermore, membership
functions so far is discussed as if they always are based on a single criterion, but this isn't always the
case, although it is the most common case. One could, for example, want to have the membership
function for YOUNG depend on both a person's age and their height (Arosha's short for his age).
This is perfectly legitimate, and occasionally used in practice. It's referred to as a two-dimensional
membership function. It's also possible to have even more criteria, or to have the membership
function depend on elements from two completely different universes of discourse.
Union
The membership function of the Union of two fuzzy sets A and B with membership functions
and respectively is defined as the maximum of the two individual membership functions.
This is called the maximum criterion.
The Union operation in Fuzzy set theory is the equivalent of the OR operation in Boolean algebra.
Intersection
The membership function of the Intersection of two fuzzy sets A and B with membership
functions and respectively is defined as the minimum of the two individual
membership functions. This is called the minimum criterion.
The Intersection operation in Fuzzy set theory is the equivalent of the AND operation in Boolean
algebra. Complement The membership function of the Complement of a Fuzzy set A with
membership function is defined as the negation of the specified membership function. This is caleed
the negation criterion.
The Complement operation in Fuzzy set theory is the equivalent of the NOT operation in
Boolean algebra.
The following rules which are common in classical set theory also apply to Fuzzy set theory.
Let A1, A2, ….., An be fuzzy sets in U1, U2, …Un, respectively. The Cartesian product of A1, A2,
….., An is a fuzzy set in the space U1 x U2 x…x Un with the membership function as: µA1 x A2
x…x An (x1, x2,…, xn) = min [µA1 (x1), µA2 (x2), …. µAn (xn)]
So, the Cartesian product of A1, A2, ….., An are donated by A1 x A2 x….. x An.
Cartesian Product: Example Let A = {(3, 0.5), (5, 1), (7, 0.6)} Let B = {(3, 1), (5, 0.6)}
The product is all set of pairs from A and B with the minimum associated memberships Ax B =
{[(3, 3), min (0.5, 1)], [(5, 3), min(1, 1)], [(7, 3), min(0.6, 1)], [(3, 5), min(0.5, 0.6)], [(5, 5), min(1,
0.6)], [(7, 5), min(0.6, 0.6)]} = {[(3, 3), 0.5], [(5, 3), 1], [(7, 3), 0.6], [(3, 5), 0.5], [(5, 5), 0.6], [(7, 5),
0.6]}.
MATLAB Code:
Union Operation:
clear all; %clear all previous workspace variables
x=(0:0.1:10)'; %x-axis from 0 to 10 with 0.1 interval
u1=gaussmf(x,[1,4]); %Gaussian Membership function
u2=trimf(x,[3 6.5 9]); %Triangular Membership function
u_reun=max(u1,u2); %Union Operation on Set u1 and u2
hold on
plot(x,u1,'r') %plotting set1
plot(x,u2,'m') %plotting set2
axis([0 10 0 1.05]);
legend('A','B')
plot(x,u_reun,'color','b','linewidth',4);
title('Union operation on Fuzzy Sets');
plot(x,u_reun,'color','b','linewidth',4);
title('Intersection operation on Fuzzy Sets');
Complement Operation:
clear all; % clear all previous workspace variables
x=(0:0.1:10)'; % x-axis from 0 to 10
u1=gaussmf(x,[1,4]); % Gaussian Membership function
u_reun=1-u1; % Complement of Set u1
hold on
plot(x,u1,'r') % plot graph of set u1
axis([0 10 0 1]); % x-axis and y-axis definition
plot(x,u_reun,'color','b','linewidth',4); % plot of complement operation
title('Complement operation on Fuzzy Set');
Cartesian product:
clear all; %clear all previous workspace variables
x=(0:0.1:10)'; %x-axis from 0 to 10 with 0.1 interval
u1=gaussmf(x,[1,4]); %Gaussian Membership function
u2=trimf(x,[3 6.5 9]); %Triangular Membership function
u_reun=times(u1,u2); %Cartesian Product Operation on Set u1 and u2
hold on
plot(x,u1,'r') %plotting set1
plot(x,u2,'m') %plotting set2
axis([0 10 0 1.05]);
legend('A','B')
plot(x,u_reun,'color','b','linewidth',4);
title('Cartesian Product on two fuzzy Sets');
MAX-MIN Composition:
clear all; % clear all previous workspace variables
x=(0:0.1:10)'; % x-axis from 0 to 10
u1=trimf(x,[1 5 9]) % Triangular Membership function
u2=trimf(x,[2 4 7]); % Triangular membership function
u_reun=max(min(u1,u2)); % MAX-MIN Composition on sets u1, u2
hold on
plot(x,u1,'r') % plot graph of set u1
plot(x,u2,'m') % plot graph of set u2
axis([0 10 0 1]); % x-axis and y-axis definition
plot(x,u_reun,'color','b','linewidth',4); % plot of MAX-MIN Composition
title('MAX-MIN composition on two fuzzy sets');
Aim: Apply the Genetic Algorithm for optimization on a dataset obtained from UCI ML
repository.For Example: IRIS Dataset or Travelling Salesman Problem or KDD Dataset.
Objectives:
1.To learn Genetic Algorithm .
2.To learn about optimization algorithm
3.To find the set of solution for Travelling Salesman Problem.
Software Requirements:
Ubuntu 18.04
Python (3.6)64 bit
Theory:
Genetic Algorithms are a class of stochastic, population based optimization algorithms inspired
by the biological evolution process using the concepts of “Natural Selection” and “Genetic
Inheritance” (Darwin 1859) and originally developed by Holland.
GAs are now used in engineering and business optimization applications, where the search space
is large and/or too complex (non-smooth) for analytic treatment. This algorithm reflects the
process of natural selection where the fittest individuals are selected for reproduction in order to
produce offspring of the next generation.
The process of natural selection starts with the selection of fittest individuals from a population. They
produce offspring which inherit the characteristics of the parents and will be added to the
nextgeneration. If parents have better fitness, their offspring will be better than parents and have
a better chance at surviving. This process keeps on iterating and at the end, a generation with the
fittest individuals will be found.
This notion can be applied for a search problem. We consider a set of solutions for a problem
and select the set of best ones out of them.
1.Initial population
2. Fitness function
3.Selection
4.Crossover
5. Mutation
Initial Population
The process begins with a set of individuals which is called a Population. Each individual is a
solution to the problem you want to solve.
An individual is characterized by a set of parameters (variables) known as Genes. Genes are
joined into a string to form a Chromosome (solution).
In a genetic algorithm, the set of genes of an individual is represented using a string, in terms of
an alphabet. Usually, binary values are used (string of 1s and 0s). We say that we encode the
genesin a chromosome.
The fitness function determines how fit an individual is (the ability of an individual to compete
with other individuals). It gives a fitness score to each individual. The probability that an
individual will be selected for reproduction is based on its fitness score.
Selection
The idea of selection phase is to select the fittest individuals and let them pass their genes to the
next generation.
Two pairs of individuals (parents) are selected based on their fitness scores. Individuals with
high fitness have more chance to be selected for reproduction.
Crossover
Crossover is the most significant phase in a genetic algorithm. For each pair of parents to be
mated,a crossover point is chosen at random from within the genes. For example, consider the
crossover point to be 3 as shown below.
Offspring are created by exchanging the genes of parents among themselves until the crossover
point is reached.
Mutation
In certain new offspring formed, some of their genes can be subjected to a mutation with a low
random probability. This implies that some of the bits in the bit string can be flipped.
Mutation occurs to maintain diversity within the population and prevent premature convergence.
Termination
The algorithm terminates if the population has converged (does not produce offspring which are
significantly different from the previous generation). Then it is said that the genetic algorithm
has provided a set of solutions to our problem.
Genetic Algorithm :
Optimization problems
traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are
also possible.
Once the genetic representation and the fitness function are defined, a GA proceeds to initialize a
population of solutions and then to improve it through repetitive application of the mutation,
crossover, inversion and selection operators.
Initialization
The population size depends on the nature of the problem, but typically contains several
hundreds or thousands of possible solutions. Often, the initial population is generated randomly,
allowing the entire range of possible solutions (the search space). Occasionally, the solutions
may be "seeded" in areas where optimal solutions are likely to be found.
Selection
During each successive generation, a portion of the existing population is selected to breed a new
generation. Individual solutions are selected through a fitness-based process, where fitter
solutions (as measured by a fitness function) are typically more likely to be selected. Certain
selection methods rate the fitness of each solution and preferentially select the best solutions.
Other methods rate only a random sample of the population, as the former process may be very
time-consuming.
The fitness function is defined over the genetic representation and measures the quality of the
represented solution. The fitness function is always problem dependent. For instance, in the
knapsack problem one wants to maximize the total value of objects that can be put in a knapsack
of some fixed capacity. A representation of a solution might be an array of bits, where each bit
represents a different object, and the value of the bit (0 or 1) represents whether or not the object
is in the knapsack. Not every such representation is valid, as the size of objects may exceed the
capacity of the knapsack. The fitness of the solution is the
sum of values of all objects in the knapsack if the representation is valid, or 0 otherwise.
In some problems, it is hard or even impossible to define the fitness expression; in these cases, a
simulation may be used to determine the fitness function value of a phenotype (e.g.
computational fluid dynamics is used to determine the air resistance of a vehicle whose shape is
encoded as the phenotype), or even interactive genetic algorithms are used.
Genetic operators
The next step is to generate a second generation population of solutions from those selected
through a combination of genetic operators: crossover (also called recombination), and mutation.
For each new solution to be produced, a pair of "parent" solutions is selected for breeding from
the pool selected previously. By producing a "child" solution using the above methods of
crossover and mutation, a new solution is created which typically shares many of the
characteristics of its "parents". New parents are selected for each new child, and the process
continues until a new
population of solutions of appropriate size is generated. Although reproduction methods that are
based on the use of two parents are more "biology inspired", some research suggests that more
than two "parents" generate higher quality chromosomes.
These processes ultimately result in the next generation population of chromosomes that is
different from the initial generation. Generally the average fitness will have increased by this
procedure for the population, since only the best organisms from the first generation are selected
for breeding, along with a small proportion of less fit solutions. These less fit solutions ensure g
genetic diversity within the genetic pool of the parents and therefore ensure the genetic diversity
of the subsequent generation of children.
Opinion is divided over the importance of crossover versus mutation. There are many references
in Fogel (2006) that support the importance of mutation-based search.
Although crossover and mutation are known as the main genetic operators, it is possible to use
other operators such asregrouping, colonization-extinction, or migration in genetic algorithms.
It is worth tuning parameters such as the mutation probability, crossover probability and
population size to find reasonable settings for the problem class being worked on. A very small
mutation rate may lead to genetic drift (which is non-ergodic in nature). A recombination rate
that is too high may lead to premature convergence of the genetic algorithm. A mutation rate that
is too high may lead to loss of good solutions, unless elitist selection is employed.
Heuristics
In addition to the man operators above, other heuristics may be employed to make the calculation
faster or more robust. The speciation heuristic penalizes crossover between candidate solutions
that are too similar; this encourages population diversity and helps prevent premature
convergence to a less optimal solution.
Termination
This generational process is repeated until a termination condition has been reached. Common
terminating conditions are:
A solution is found that satisfies minimum criteria
Fixed number of generations reached Allocated budget (computation time/money)
reached
The highest ranking solution's fitness is reaching or has reached a plateau such that
successive iterations no longer produce better results
Manual inspection
Combinations of the above
Program :
def BitString(self):
return self.__BitString
def ToString(self):
Str = ''
if self.BitString2Int() < 0:
Str += '-' + self.BitString()[1:]
else:
Str += '+' + self.BitString()[1:]
return Str + ', '
'''
G = Gene(-3, 3, 5)
print(G.ToString())
print(str(G.BitString2Int()))
'''
def BitString(self):
Str = ''
for Gene in self.__GeneList:
#print(Gene.ToString())
Str += Gene.BitString()
return Str
def BitStringToString(self):
Str = ''
for Gene in self.__GeneList:
Str += Gene.ToString()
return Str
def ValuesToString(self):
Str = 'Genotype\'s values: '
for Gene in self.__GeneList:
Str += str(Gene.BitString2Int()) + ', '
return Str
def Invalid(self):
for i in range(0, self.__NumberOfGenes):
if i == 1 or i == 3 or i == 5:
if self.Gene(i).BitString2Int() == 0:
return True
return False
'''
G = Genotype(2, -10, 10, 6)
G2 = Genotype(2, -10, 10, 6)
print('G.BitString(): ' + G.BitString())
print(G.ToString())
print('G2.BitString(): ' + G2.BitString())
print(G2.ToString())
G.SetBitString(G2.BitString())
print('G.BitString(): ' + G.BitString())
print(G.ToString())
print('G2.BitString(): ' + G2.BitString())
print(G2.ToString())
'''
class Individual:
DataSet = DataSet()
def Fitness(self):
# returns a numeric value as the raw fitness
if self.Genotype.Invalid():
return 0
CorrectnessCount = 0
for Case in Individual.DataSet.ListOfCases:
if self.Classification(Case) == Case.Classification:
CorrectnessCount += 1
return CorrectnessCount
def WriteCorrectnessToFile(self):
if self.Genotype.Invalid():
return 0
OutputFile = open( 'Classifier1.txt' , 'w')
CorrectnessCount = 0
for Case in Individual.DataSet.ListOfCases:
OutputFile.write("%s\n" % (self.Classification(Case)))
if self.Classification(Case) == Case.Classification:
CorrectnessCount += 1
return CorrectnessCount
OutputFile.close()
def Solution(self):
# returns a numeric value as the solution to the fitness function
return Individual.DataSet.Length
def __init__(self):
## self.Genotype = XAsBitString + YAsBitString
## self.Genotype = '0'
self.Genotype = Genotype(6, -100, 100, 7)
OtherRightHalf = OtherIndividual.Genotype.BitString()[CrossoverIndex:]
## print('I2: ' + OtherLeftHalf + ' ' + OtherRightHalf)
self.Genotype.SetBitString(SelfLeftHalf + OtherRightHalf)
OtherIndividual.Genotype.SetBitString(OtherLeftHalf + SelfRightHalf)
## print('I1: ' + self.Genotype.BitString()[:CrossoverIndex] + ' ' +
self.Genotype.BitString()[CrossoverIndex:])
## print('I2: ' + OtherIndividual.Genotype.BitString()[:CrossoverIndex] + ' ' +
OtherIndividual.Genotype.BitString()[CrossoverIndex:])
if NumberOfCrossoverPoints == 1:
self.OnePointCrossover(OtherIndividual)
else:
self.TwoPointCrossover(OtherIndividual)
def ApplyMutation(self):
RandomIndex = randint(0, len(self.Genotype.BitString()) - 1)
GenotypeAsList = list(self.Genotype.BitString())
if GenotypeAsList[RandomIndex] == '1':
GenotypeAsList[RandomIndex] = '0'
else:
GenotypeAsList[RandomIndex] = '1'
NewGenotype = ''
for Character in GenotypeAsList:
NewGenotype += Character
self.Genotype.SetBitString(NewGenotype)
##I1 = Individual()
##print(I1.ToString(''))
##print(I1.Fitness())
##print(I1.FitnessAsPercentage(I1.Fitness()))
##I1.ApplyMutation()
##print(I1.ToString(''))
##print(I1.Fitness())
##I2 = Individual()
##print(I2.ToString(''))
##print('ApplyCrossover')
##I1.ApplyCrossover(I2, 2)
##print(I1.ToString(''))
##print(I2.ToString(''))
####print(Int2BitString(65535))
####print(BitString2Int( Int2BitString(65535) ))
####print(Int2BitString(1))
####print(BitString2Int( Int2BitString(1) ))
####print(Int2BitString(0))
####print(BitString2Int( Int2BitString(0) ))
####print(BitString2Int('1111111111111111'))
####print(BitString2Int('0000000000000001'))
class IndividualList:
def __init__(self):
self.List = []
def GetRandomIndividualWithReplacement(self):
return sample(self.List, 1)[0]
def GetAndRemoveRandomIndividual(self):
RandomIndividual = self.GetRandomIndividualWithReplacement()
self.List.remove( RandomIndividual )
return RandomIndividual
def ResetList(self):
self.List = []
def ShuffleList(self):
shuffle(self.List)
def BestFitness(self):
BestFitness = self.List[0].Fitness()
for Individual in self.List:
IndividualFitness = Individual.Fitness()
if IndividualFitness > BestFitness:
BestFitness = IndividualFitness
return BestFitness
def MostFitIndividual(self):
MostFitIndividual = self.List[0]
for Individual in self.List:
if Individual.Fitness() > MostFitIndividual.Fitness():
MostFitIndividual = Individual
return MostFitIndividual
##I1 = Individual()
##print(I1.ToString(' '))
##I2 = Individual()
##IL = IndividualList()
##IL.List.append(I1)
##IL.List.append(I2)
##print(IL.ToString('', ' '))
##print(I1 == I2)
##I3 = I1
##print(I1 == I3)
class GA:
def __init__(self, PopulationSize, MutationRate, CrossoverRate, NumberOfCrossoverPoints,
MaxNumberOfGenerations):
self.MutationRate = MutationRate
self.CrossoverRate = CrossoverRate
self.NumberOfCrossoverPoints = NumberOfCrossoverPoints
self.MaxNumberOfGenerations = MaxNumberOfGenerations
self.Population = IndividualList()
self.MatingPool = IndividualList()
self.PopulationSize = PopulationSize
self.MatingPoolSize = PopulationSize
self.Finished = False
self.GenerationCount = 1
self.InitialBestFitness = 0
self.FinalBestFitness = 0
self.PercentInitialBestFitness = 0
self.PercentFinalBestFitness = 0
self.AverageFitnessDelta = 0
self.InitializePopulation()
def InitializePopulation(self):
# return an IndividualList object of random Individual objects
while len(self.Population.List) < self.PopulationSize:
self.Population.List.append( Individual() )
def InitializeMatingPool(self):
self.MatingPool.ResetList()
while len(self.MatingPool.List) < self.MatingPoolSize:
Individual1 = self.Population.GetRandomIndividualWithReplacement()
Individual2 = self.Population.GetRandomIndividualWithReplacement()
if Individual1.Fitness() >= Individual2.Fitness():
self.MatingPool.List.append(Individual1)
else:
self.MatingPool.List.append(Individual2)
def ShouldApplyCrossover(self):
RandomFraction = uniform(0, 1)
if self.CrossoverRate >= RandomFraction:
return True
else:
return False
def ShouldApplyMutation(self):
RandomFraction = uniform(0, 1)
if self.MutationRate >= RandomFraction:
return True
else:
return False
ParentList.ShuffleList()
OffSpring = ParentList.List[0]
if self.ShouldApplyCrossover():
# choose to cross OffSpring with a non-cloned parent, unless len(ParentList.List) == 0,
# then cross with OffSpring (crossing Offspring with Offspring does not change
OffSpring whatsoever)
OffSpring.ApplyCrossover( ParentList.List[-1], self.NumberOfCrossoverPoints )
if self.ShouldApplyMutation():
OffSpring.ApplyMutation()
OffSpringList.List.append( OffSpring )
## print('OffSpringList:\n' + OffSpringList.ToString(' ', ' '))
return OffSpringList
def UpdatePopulation(self):
self.Population.ResetList()
while len(self.Population.List) < self.PopulationSize:
ParentList = IndividualList()
ParentList.List.append( self.MatingPool.GetAndRemoveRandomIndividual() )
## if MatingPool is empty, but we need 1 more Parent, then the two parents are clones
if len(self.MatingPool.List) == 0:
ParentList.List.append( ParentList.List[0] )
else:
ParentList.List.append( self.MatingPool.GetAndRemoveRandomIndividual() )
## print('ParentList:\n' + ParentList.ToString(' ', ' '))
self.Population.List += self.NOffspring(2, ParentList).List
## too many offspring can be added to Population, so remove the latest added offspring
until
## len(self.Population.List) == self.PopulationSize
while len(self.Population.List) != self.PopulationSize:
self.Population.List = self.Population.List[:-1]
def HasConverged(self):
if Individual().FitnessAsPercentage(self.Population.BestFitness()) >= 96:
return True
if self.GenerationCount == self.MaxNumberOfGenerations:
return True
return False
def Run(self):
# calculate a statistic
self.InitialBestFitness = self.Population.BestFitness()
class GARunner:
def __init__(self):
self.PopulationSize = 10
self.MutationRate = .01
self.CrossoverRate = .6
self.NumberOfCrossoverPoints = 2
self.NumberOfTimesToRun = 4
self.RunCount = 0
self.TotalNumberOfGenerations = 0
self.TotalOfFitnessDeltas = 0
self.OutputFileName = 'a.txt'
self.MaxNumberOfGenerations = 1000
def SetPreferences(self):
print("Please state the population size (an integer >= 1)")
self.PopulationSize = int(input("=> "))
print("Please state the mutation rate (a decimal point '.', followed by a nonnegative integer,
so that 0 <= mutation rate <= 1)")
self.MutationRate = float(input("=> "))
print("Please state the crossover rate (a decimal point '.', followed by a nonnegative integer,
so that 0 <= crossover rate <= 1)")
self.CrossoverRate = float(input("=> "))
print("Please state the number of crossover points (either '1' or '2')")
self.NumberOfCrossoverPoints = int(input("=> "))
print("Please state the maximum number of generations (an integer >= 1)")
self.MaxNumberOfGenerations = int(input("=> "))
print("Please state the number of times to run the GA with these parameters (an integer >=
1)")
self.NumberOfTimesToRun = int(input("=> "))
## print("Please state the name of the output file. (file.txt)")
## self.OutputFileName = input("=> ")
def PrintRunResults(self):
## print('Population.List: \n'+ self.GA.Population.ToString(' ', ' '))
print()
print('Results of Run #' + str(self.RunCount) + ':')
print('Generation count: ', str(self.GA.GenerationCount))
print('Population\'s initial best fitness: ' + str(self.GA.InitialBestFitness) + ', or ' +
str(self.GA.PercentInitialBestFitness) + ' percent fit')
print('Population\'s final best fitness: ' + str(self.GA.FinalBestFitness) + ', or ' +
str(self.GA.PercentFinalBestFitness) + ' percent fit')
print('Average fitness delta: ' + str(round(self.GA.AverageFitnessDelta, 3)))
print('Most fit individual: ' +
self.GA.Population.MostFitIndividual().Genotype.ValuesToString())
def WriteResultsToOutputFile(self):
OutputFile = open( self.OutputFileName , 'a')
OutputFile.write("%s, %s\n" % (self.GA.GenerationCount, self.GA.AverageFitnessDelta))
OutputFile.close()
def PrintSummary(self):
print()
print('Summary of runs of GA with the given parameters:')
def main(self):
self.SetPreferences()
## OutputFile = open( self.OutputFileName , 'a')
## OutputFile.write("%s, %s\n" % ('Run\'s total number of generations', 'Run\'s average
fitness delta (a percentage)'))
## OutputFile.close()
# initialize the GA
self.GA = GA(self.PopulationSize, self.MutationRate, self.CrossoverRate,
self.NumberOfCrossoverPoints, self.MaxNumberOfGenerations)
# run the GA, print the run's results, print the results, write the results to the output file
self.GA.Run()
self.PrintRunResults()
## self.WriteResultsToOutputFile()
## if self.RunCount == 1:
## self.GA.Population.MostFitIndividual().WriteCorrectnessToFile()
GARunner = GARunner()
GARunner.main()
Output :
database@database-HP-Pro-3090-Microtower-PC:~$ python GA.py
Please state the population size (an integer >= 1)
=> 5
Please state the mutation rate (a decimal point '.', followed by a nonnegative integer, so that 0 <=
mutation rate <= 1)
=> .4
Please state the crossover rate (a decimal point '.', followed by a nonnegative integer, so that 0 <=
crossover rate <= 1)
=> .3
Please state the number of crossover points (either '1' or '2')
=> 2
Please state the maximum number of generations (an integer >= 1)
=> 1
Please state the number of times to run the GA with these parameters (an integer >= 1)
=> 3
()
Results of Run #1:
('Generation count: ', '1')
Population's initial best fitness: 83, or 0.0 percent fit
Population's final best fitness: 83, or 0.0 percent fit
Average fitness delta: 0.0
Most fit individual: Genotype's values: -78, -72, 9, 91, 76, -48,
()
Results of Run #2:
('Generation count: ', '1')
Population's initial best fitness: 53, or 0.0 percent fit
Population's final best fitness: 51, or 0.0 percent fit
Average fitness delta: 0.0
Most fit individual: Genotype's values: 120, 124, -1, 44, -107, 11,
()
Results of Run #3:
('Generation count: ', '1')
Population's initial best fitness: 81, or 0.0 percent fit
Population's final best fitness: 64, or 0.0 percent fit
Average fitness delta: 0.0
Most fit individual: Genotype's values: -107, -90, -12, -2, -101, 96,
Conclusion :
We have successfully studied the Genetic Algorithm for optimization on a dataset obtained
from UCI ML repository.For Example: IRIS Dataset
References :
1) https://en.wikipedia.org/wiki/Genetic_algorithm
2) https://www.geeksforgeeks.org/genetic-algorithms/
Assignment No: 08
Title: Use Business intelligence and analytics tools to recommend the combination of share
purchases and sales for maximizing the profit.
Objectives:
1. To study different types of BI analytics tools.
2. To study the representation and implementation of BI tools for maximizing shares profit.
Theory:
Business Intelligence: -
Business intelligence (BI) is a technology-driven process for analyzing data and presenting
actionable information to help corporate executives, business managers and other end users make
more informed business decisions. BI encompasses a variety of tools, applications and
methodologies that enable organizations to collect data from internal systems and external
sources, prepare it for analysis, develop and run queries against the data, and create reports,
dashboards and data visualizations to make the analytical results available to corporate decision
makers as well as operational workers. The potential benefits of business intelligence programs
include accelerating and improving decision making; optimizing internal business processes;
increasing operational efficiency; driving new revenues; and gaining competitive advantages
over business rivals. BI systems can also help companies identify market trends and spot
business problems that need to be addressed. Business intelligence combines a broad set of data
analysis applications, including ad hoc analysis and querying, enterprise reporting, online
analytical processing (OLAP), mobile BI, real-time BI, operational BI, cloud and software as a
service BI, open source BI, collaborative BI and location intelligence. BI technology also
includes data visualization software for designing charts and other infographics, as well as tools
for building BI dashboards and performance scorecards that display visualized data on business
metrics and key performance indicators in an easy-to-grasp way. BI applications can be bought
separately from different vendors or as part of a unified BI platform from a single vendor.
BI Programs: BI programs can also incorporate forms of advanced analytics, such as data
mining, predictive analytics, text mining, statistical analysis and big data analytics. In many
cases though, advanced analytics projects are conducted and managed by separate teams of data
scientists, statisticians, predictive modelers and other skilled analytics professionals, while BI
teams oversee more straightforward querying and analysis of business data.
BI Data: Business intelligence data typically is stored in a data warehouse or smaller data marts
that hold subsets of a company's information. In addition, Hadoop systems are increasingly being
used within BI architectures as repositories or landing pads for BI and analytics data, especially
for unstructured data, log files, sensor data and other types of big data. Before it's used in BI
applications, raw data from different source systems must be integrated, consolidated and
cleansed using data integration and data quality tools to ensure that users are analyzing accurate
and consistent information
BI Tools:
Business Intelligence tools provide companies reliable information and true insights in order to
improve decision making and social collaboration. With business intelligence you’ll be able to
produce much better company results. The BI tools provide the means for efficient reporting,
thorough analysis of (big) data, statistics and analytics and dashboards displaying KPIs. Bring
your company data to life with BI tools Bring your company data to life by combining, analyzing
and visualizing all that data very easily. The tools will help you to see and understand the success
factors of your business more quickly. And where things (might) go wrong and where you need
to make adjustments. They give employees and managers the possibility to improve business
processes on a daily basis by using correct information and relevant insights.
Companies who are not successful often have an issue with their information infrastructure. They
may have selected the wrong Business Intelligence tool or perhaps they don’t use business
intelligence-tools at all.They have not been able to implement BI and are still in the dark and that
hurts company results.
How can you easily select one of the tools for your organization?
Step 1: Define the key bi tool selection criteria, both the user and IT requirements.
Step 2: With a list of questions you need to contact all the vendors to get the answers.
Step 3: Validate and analyze all the information from the Business Intelligence vendors.
Step 4: Make a short list for a proof-of-concept (POC) and perform the POC.
Step 5: Choose the tool / platform that suits your needs and price criteria best.
How can you compare BI tools very quickly?
Business Intelligence tools come in many different flavors. All the tools run on the Windows
platform for example, but only a few support the different flavors of Unix and Linux. Some have
excellent functionality for pixel perfect reporting and others do better in dash-boarding and
predictive analytics.
KPIs
What are Key Performance Indicators (KPIs) and how do you choose which ones to use? It is
difficult to determine what are genuine KPIs, a little like looking for a needle in a haystack. The
problem being that we are trying to shed light on is the (future) performance of our most critical
internal processes which will allow us to achieve the organization’s (strategic) goals within the
chosen policies.A KPI should be important enough that achieving it will have a huge impact on
the total performance of theorganization (or part of the organization). The King of KPIs, David
Parmenter, said once “Key performance indicators (KPIs), while used commonly around the
world, have never until now been clearly defined.” This is the reason that it is essential to
understand which types of (performance) indicators actually exist:
1. Key Results Indicators (KRIs): These are complex indicators based on the total achievement
of a broad range of actions, for example, the profitability of a company, the level of satisfaction
of the employees, or customer satisfaction levels.
2. Performance Indicators (PIs): These are very specific indicators, they tell us what we need
to achieve in a given area and are not particularly “key” to achieving efficient performance in the
organization’s internal processes. Examples are: the profit achieved from the company’s 10
largest customers or the revenue growth percentage for a given product or service. These can, of
course, be very useful things to measure, but in general they are not critical for achieving better
performance in multiple result areas of the organization at the same time. By concentrating
solely on performance in terms of, for example, revenue growth we often miss the fact that we
are no longer making any profit because the customers and employees have become dissatisfied.
3. Indicators (IND): These are measures of a single action which can form the basis for any
KRI, PI or KPI. They are, in general, neither critical, nor key in achieving major performance
improvement. Some examples are: number of new customers, gross revenue, number of lost
customers, number of orders etc.
4. Key Performance Indicators (KPIs): These are the indicators that measure one activity or
action, they are directly connected to an organization’s strategic goals and improvement
measured using these indicators can (and should) mean a dramatic improvement of the
performance in more multiple organizational areas. Examples are: the number of minutes that the
departure of an airplane is delayed, or the number of times a product cannot be sold due to lack
of inventory.
KPIs can often be recognized as missed chances as seen by the “lost sales” example or when, if
the value of a KPI decreases it causes rework. Rework means that actions have to be carried out
again because they didn’t work correctly the first time.
Proprietary Products:
Tableau Software:
It is an American computer software company headquartered in Seattle, Washington. It produces
a family of interactive data visualization products focused on business intelligence.
Tableau Public and Tableau Reader are free to use, while both Tableau Server and Tableau
Desktop come with a 14-day fully functional free trial period, after which the user must pay for
the software. Tableau Desktop comes in both a Professional and a lower cost Personal edition.
Tableau Online is available with an annual subscription for a single user, and scales to support
thousands of users. Tableau is the most powerful, secure, and flexible end-to-end analytics
platform for your data. Elevate people with the power of data. Designed for the individual, but
scaled for the enterprise, Tableau is the only business intelligence platform that turns your data
into insights that drive action. Tableau has a highly scalable, n-tier client-server architecture that
serves mobile clients, web clients and desktop-installed software. Tableau Desktop is the
authoring and publishing tool that is used to create shared views on Tableau Server.
Fig: Tableau provides a scalable solution for creation and delivery of web, mobile and desktop
analytics.
Evaluation of Tableau
Strengths Speed - The best strength point of Tableau is its speed through which it evaluates
millions of rows and provides the necessary responses in moment.
Ease of use - It is very simple to use. One can begin with Tableau even with no earlier
programming experience. With just fundamental MS Excel skills individual can simply learn
Tableau.
Beautiful and interactive dashboard - The Dashboard of Tableau is very interactive and
provides dynamic outcomes. Rich visualizations can be produced very easily. The graphics and
charts are smart and beautiful. Images, web pages and documents can be added hooked on to the
dashboard for simple story telling. All this guides to a large extent coming into the data.
Direct connection - Tableau permits the users to directly hook up with databases, cubes, and
data warehouses etc. The data access is so simple without any advanced setup and the data is live
to be getting updated on its own. Single can select tables as of spreadsheets to data from Hadoop
to generate a great mash-up and get preferred outcomes in no time. This is easy ad hoc business
analytics.
Weakness
Conclusion: Hence, we studied Business Intelligence and analytical tools to recommend the
combination of share purchases and sales for maximizing the profit.
Assignment No: 9
Title: On Twitter Data performs computing using BIA tools electively
Objectives:
1. To handle Twitter Data for performing computing.
2. To analyze data using R programming tool.
Theory:
With the increasing importance of computational text analysis in research, many researchers face
the challenge of learning how to use advanced software that enables this text analysis. Currently,
one of the most popular environments for computational methods and the emerging field of “data
science” is the R statistical software. However, for researchers that are not well-versed in
programming, learning how to use R can be a challenge, and performing text analysis in
particular can seem daunting. In this step by step guide one can learn that performing text
analysis in R is not as hard as some might fear. This is an introduction into the use of common
techniques, with the aim of helping researchers get acquainted with computational text analysis
in general, as well as getting a start at performing advanced text analysis studies in R.
Why R?
R was specifically designed for statistical analysis, which makes it highly suitable for data
science applications. Although the learning curve for programming with R can be steep,
especially for people without prior programming experience, the tools now available for carrying
out text analysis in R make it easy to perform powerful, cutting-edge text analytics using only a
few simple commands. One of the keys to R’s explosive growth has been its densely populated
collection of extension software libraries, known in R terminology as packages, supplied and
maintained by R’s extensive user community. Each package extends the functionality of the base
R language and core packages, and in addition to functions and data must include documentation
and examples, often in the form of vignettes demonstrating the use of the package. The best-
known package repository, the Comprehensive R Archive Network (CRAN), currently has over
10,000 packages that are published.
Text analysis in particular has become well established in R. There is a vast collection of
dedicated text processing and text analysis packages, from low-level string operations to
advanced text modeling techniques such as fitting Latent Dirichlet Allocation models, R
provides it all. One of the main advantages of performing text analysis in R is that it is often
possible, and relatively easy, to switch between different packages or to combine them. Recent
efforts among the R text analysis developers’ community are designed to promote this
interoperability to maximize flexibility and choice among users.4 As a result, learning the basics
for text analysis in R provides access to a wide range of advanced text analysis features.
We will first install the relevant packages that we need. To extract tweets from Twitter, we will
need package ‘twitteR’.
‘Syuzhet’ package will be used for sentiment analysis; while ‘tm’ and ‘SnowballC’ packages are
used for text mining and analysis.
# [1] 1 2 3 4 5
# Create a vector of letters
ltrs = c('a', 'b', 'c', 'd', 'e')
# [1] "a" "b" "c" "d" "e"
# Concatenate both
mixed_vec = c(numbers, ltrs)
print(mixed_vec)
# [1] "1" "2" "3" "4" "5" "a" "b" "c" "d" "e"
Let’s analyze what happened in the previous code. We can see it is possible to create vectors
with numbers and with letters. We did not need to tell R what type of data type we wanted
beforehand. Finally, we were able to create a vector with both numbers and letters. The vector
mixed_vec has coerced the numbers to character, we can see this by visualizing how the values
are printed inside quotes. The following code shows the data type of different vectors as returned
by the function class. It is common to use the class function to "interrogate" an object, asking
him what his class is.
# Character vector
ltrs = letters[1:10]
class(ltrs)
# [1] "character"
# Factor vector
fac = as.factor(ltrs)
class(fac)
# [1] "factor"
R supports two-dimensional objects also. In the following code, there are examples of the two
most popular data structures used in R: the matrix and data.frame.
# Matrix
M = matrix(1:12, ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
lM = matrix(letters[1:12], ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] "a" "d" "g" "j"
# [2,] "b" "e" "h" "k"
# [3,] "c" "f" "i" "l"
class(M)
# [1] "matrix"
class(lM)
# [1] "matrix"
# data.frame
# One of the main objects of R, handles different data types in the same object.
# It is possible to have numeric, character and factor vectors in the same data.frame
As demonstrated in the previous example, it is possible to use different data types in the same
object. In general, this is how data is presented in databases, APIs part of the data is text or
character vectors and other numeric. In is the analyst job to determine which statistical data type
to assign and then use the correct R data type for it. In statistics we normally consider variables
are of the following types −
Numeric
Nominal or categorical
Ordinal
In R, a vector can be of the following classes −
Numeric - Integer
Factor
Ordered Factor
R provides a data type for each statistical type of variable. The ordered factor is however rarely
used, but can be created by the function factor, or ordered.
The following section treats the concept of indexing. This is a quite common operation, and
deals with the problem of selecting sections of an object and making transformations to them.
class(df)
# [1] "data.frame"
### Indexing
# Get the first row
df[1, ]
# numbers letters
#1 1 a
# Used for programming normally - returns the output as a list
df[1, , drop = TRUE]
# $numbers
# [1] 1
#
# $letters
# [1] a
# Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
# Get several rows of the data.frame
df[5:7, ]
# numbers letters
#5 5 e
#6 6 f
#7 7 g
### Add one column that mixes the numeric column with the factor column
df$mixed = paste(df$numbers, df$letters, sep = ’’)
str(df)
# 'data.frame': 26 obs. of 3 variables:
# $ numbers: int 1 2 3 4 5 6 7 8 9 10 ...
# $ letters: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ mixed :chr "1a" "2b" "3c" "4d" ...
### Get columns
# Get the first column
df[, 1]
# It returns a one dimensional vector with that column
# Get two columns
df2 = df[, 1:2]
head(df2)
# numbers letters
#1 1 a
#2 2 b
#3 3 c
#4 4 d
#5 5 e
#6 6 f
# numbers mixed
#1 1 1a
#2 2 2b
#3 3 3c
### Index columns from their names
names(df)
Mini project On BI
Objective :
Theory :
Sentiment Analysis :
Sentiment analysis is critical because helps you see what customers like and dislike about you
and your brand. Customer feedback—from social media, your website, your call center agents,
or any other source—contains a treasure trove of useful business information. But, it isn’t
enough to know what customers are talking about. You must also know how they feel. Sentiment
analysis is one way to uncover those feelings. Sometimes known as “opinion mining,” sentiment
analysis can let you know if there has been a change in public opinion toward any aspect of your
business. Peaks or valleys in sentiment scores give you a place to start if you want to make
product improvements, train sales or customer care agents, or create new marketing campaigns
Sentiment analysis is not a once and done effort. By reviewing your customer’s feedback on your
business regularly you can be more proactive regarding the changing dynamics in the market
place.
Working :
The world is moving towards a fully digitalized economy at an incredible pace and as a result, a
ginormous amount of data is being produced by the internet, social media, smartphones, tech
equipment and many other sources each day which has led to the evolution of Big Data
management and analytics. Sentiment analysis is one such tool and the most popular branch of
textual analytics which with the help of statistics and natural language processing examine and
classify the unorganized textual data into various sentiments. It is also known as opinion mining
as it largely focuses on the opinion and attitude of the people through analyzing their texts.
More than 34 billion texts are exchanged over the WhatsApp every day and just imagine if we
could analyze and get valuable insights from this data and leverage it to not only take better real-
time decisions but also add value to the stakeholders at much lower cost and time and hence
align our operational efficiency with organizational strategy. In this article, we’ll leverage the
power of sentiment analysis to investigate the WhatsApp chat using R, visualize and interpret the
results at the same time.
WhatsApp is most popular chat app with monthly active users of more than 700 million. The
popularity of this app has made it a necessary app among smartphone users and even businesses
and organizations use WhatsApp for daily communication in groups and across departments.
Corporations get a huge amount of textual data from WhatsApp and they can leverage WhatsApp
chat sentiment analysis to gain better insights about their employees and try to avoid unforeseen
conflicts due to various redundancies and inefficiency of business processes.
Flowchart :
Program :
#Data frame
data <- data.frame(word = names(v),freq=v)
head(data, 10)
Output :
We can easily infer from the bar plot that the chat had a maximum number of positive sentiments
followed bynegative as second and anticipation at third. Issues with Sentiments and Analytics
Though Sentiment analysis has been one of the most popular textual analysis tools among
businesses, scholars and analysts to take decisions and for research purposes Sentiment analysis
has its own limitations as language is very complex and the meaning of each and every word
changes with time and from person to person. Also, the accuracy of the analysis can’t be
accurately measured and compared with how human beings analyze emotions.
Despite its limitations Sentiment Analysis is extremely popular and widely used analytical tool
in business intelligence for social media monitoring, brand health examination, effects of ad
campaigns or new product launch and various research purposes. It is frequently applied to
Twitter data and Customer reviews by marketers and customer service teams to identify the
feelings of consumers. Sentiment analysis has also started to gain popularity in areas like
psychology, political science and other alike fields where textual data is obtained and explored
from books, transcripts, and reports.
Conclusion :
We have successfully studied Sentiment Analysis of Whatsapp data analytics tools.
References :
1) https://www.clarabridge.com/sentiment-analysis/
2) http://www.planetanalytics.in/2017/09/whatsapp-chat-sentiment-analysis-in-r.html
3) https://creately.com/diagram/example/h11q823n2/sentiment%20analysis