Lab Assignment Report: ECS 851 Data Warehousing and Data Mining
On
ECS 851
Data Warehousing and Data Mining
TMU, MORADABAD
Enrollment No. :
Course :
Section:
EXPERIMENT NO: 1
Aim:
Create an Employee Table with the help of Data Mining Tool WEKA.
Description:
We need to create an Employee table with a training data set that includes attributes such as name, id, salary, experience, gender, and phone number.
Procedure:
Steps:
@relation employee
@attribute name {x,y,z,a,b}
@attribute id numeric
@attribute salary {low,medium,high}
@attribute exp numeric
@attribute gender {male,female}
@attribute phone numeric
@data
x,101,low,2,male,250311
y,102,high,3,female,251665
z,103,medium,1,male,240238
a,104,low,5,female,200200
b,105,high,2,male,240240
Training Data Set Employee Table
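The table above can also be checked outside WEKA. Below is a minimal, hand-rolled ARFF reader in Python (a sketch, not WEKA's own loader) applied to this employee file:

```python
# Minimal ARFF reader: collects attribute names and data rows.
# Hand-rolled sketch; WEKA itself performs far more validation.

def parse_arff(text):
    attributes, rows = [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):
            continue                      # skip blanks and comments
        lower = line.lower()
        if lower.startswith('@attribute'):
            attributes.append(line.split()[1])
        elif lower.startswith('@data'):
            in_data = True
        elif in_data:
            rows.append([v.strip() for v in line.split(',')])
    return attributes, rows

employee_arff = """@relation employee
@attribute name {x,y,z,a,b}
@attribute id numeric
@attribute salary {low,medium,high}
@attribute exp numeric
@attribute gender {male,female}
@attribute phone numeric
@data
x,101,low,2,male,250311
y,102,high,3,female,251665
z,103,medium,1,male,240238
a,104,low,5,female,200200
b,105,high,2,male,240240"""

attrs, data = parse_arff(employee_arff)
print(attrs)          # ['name', 'id', 'salary', 'exp', 'gender', 'phone']
print(len(data))      # 5 training instances
```

This is the same structure WEKA's Explorer shows under the Edit button: six attributes and five instances.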
Result:
The Employee table has been successfully created using WEKA.
EXPERIMENT NO:2
Aim:
Create a Weather Table with the help of Data Mining Tool WEKA.
Description:
We need to create a Weather table with a training data set that includes attributes such as outlook, temperature, humidity, windy, and play.
Procedure:
Steps:
@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes
Training Data Set Weather Table
Result:
The Weather table has been successfully created using WEKA.
EXPERIMENT NO:3
Aim:
Apply Pre-Processing techniques to the training data set of the Weather Table using WEKA.
Description:
Real-world databases are highly susceptible to noisy, missing, and inconsistent data because of their huge size, so the data should be pre-processed to improve its quality and the mining results; pre-processing also improves mining efficiency.
1) Add
2) Remove
3) Normalization
Procedure:
@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes
3) After that, save the file with the .arff file extension.
4) Minimize the arff file and then open Start -> Programs -> weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) In that dialog box there are four modes; click on Explorer.
7) Explorer shows many options. In that, click on Open file and select the .arff file.
8) Click on the Edit button, which shows the weather table in WEKA.
Add Pre-Processing Technique:
Procedure:
Remove Pre-Processing Technique:
Procedure:
Weather Table after removing attributes WINDY, PLAY:
Procedure:
Result:
The Weather table has been successfully pre-processed using WEKA.
EXPERIMENT NO:4
Aim:
Apply Pre-Processing techniques to the training data set of the Employee Table using WEKA.
Description:
Real-world databases are highly susceptible to noisy, missing, and inconsistent data because of their huge size, so the data should be pre-processed to improve its quality and the mining results; pre-processing also improves mining efficiency.
1) Add
2) Remove
3) Normalization
Procedure:
@relation employee
@attribute name {x,y,z,a,b}
@attribute id numeric
@attribute salary {low,medium,high}
@attribute exp numeric
@attribute gender {male,female}
@attribute phone numeric
@data
x,101,low,2,male,250311
y,102,high,3,female,251665
z,103,medium,1,male,240238
a,104,low,5,female,200200
b,105,high,2,male,240240
Training Data Set Employee Table
Procedure:
Employee Table after adding new attribute ADDRESS:
Procedure:
Employee Table after removing attributes SALARY, GENDER:
Procedure:
Employee Table after Normalizing ID, EXP, PHONE:
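WEKA's unsupervised Normalize filter rescales each numeric attribute to the [0, 1] range by min-max normalization. A sketch of the same computation on the ID and EXP columns of the employee table above:

```python
# Min-max normalization: v' = (v - min) / (max - min), giving values in [0, 1].
# This mirrors what WEKA's unsupervised Normalize filter does per numeric attribute.

def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ids = [101, 102, 103, 104, 105]   # ID column of the employee table
exp = [2, 3, 1, 5, 2]             # EXP column of the employee table

print(normalize(ids))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(normalize(exp))   # [0.25, 0.5, 0.0, 1.0, 0.25]
```

Note that the ordering of the values is preserved; only their scale changes, which is why normalization helps distance-based methods without altering the data's structure.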
Result:
The Employee table has been successfully pre-processed using WEKA.
EXPERIMENT NO:5
Aim:
Demonstrate the Knowledge Flow interface of WEKA using the Weather Table.
Description:
Knowledge Flow is a work in progress, so some of the functionality from Explorer is not yet available. On the other hand, there are things that can be done in Knowledge Flow but not in Explorer. Knowledge Flow presents a data-flow interface to WEKA: the user can select WEKA components from a toolbar, place them on a layout canvas, and connect them together to form a knowledge flow for processing and analyzing the data.
Procedure:
@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes
3) After that, save the file with the .arff file extension.
4) Minimize the arff file and then open Start -> Programs -> weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) In that dialog box there are four modes; click on Explorer.
7) Explorer shows many options. In that, click on Open file and select the .arff file.
8) Click on the Edit button, which shows the Weather table in WEKA.
Output:
EXPERIMENT NO:6
Aim:
Description:
Knowledge Flow is a work in progress, so some of the functionality from Explorer is not yet available. On the other hand, there are things that can be done in Knowledge Flow but not in Explorer. Knowledge Flow presents a data-flow interface to WEKA: the user can select WEKA components from a toolbar, place them on a layout canvas, and connect them together to form a knowledge flow for processing and analyzing the data.
Procedure:
Output:
13) Check whether the output is created or not by browsing to the chosen path.
14) Rename the data file as a.arff
15) Double click on a.arff; the output will then automatically be opened in MS-Excel.
Result:
The program has been successfully executed.
EXPERIMENT NO:7
Description:
In data mining, association rule learning is a popular and well researched method for discovering interesting
relations between variables in large databases. It can be described as analyzing and presenting strong rules discovered
in databases using different measures of interestingness. In market basket analysis association rules are used and they
are also employed in many application areas including Web usage mining, intrusion detection and bioinformatics.
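The two standard interestingness measures, support and confidence, can be computed directly. The sketch below uses a small hypothetical basket list (the item names are illustrative, not part of the lab data):

```python
# Support and confidence for an association rule A -> B:
#   support(A -> B)    = fraction of transactions containing A and B together
#   confidence(A -> B) = support(A and B) / support(A)

transactions = [
    {'bread', 'milk'},
    {'bread', 'butter'},
    {'bread', 'milk', 'butter'},
    {'milk', 'butter'},
]

def support(itemset):
    # fraction of transactions that contain every item of the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

print(support({'bread', 'milk'}))        # 0.5: milk and bread co-occur in 2 of 4 baskets
print(confidence({'bread'}, {'milk'}))   # 2/3: milk appears in 2 of the 3 bread baskets
```

A rule is reported as "strong" when both values clear user-chosen minimum thresholds, which is exactly what the Apriori settings in WEKA control.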
Procedure:
Output:
Procedure for Association Rules:
Result:
The program has been successfully executed.
EXPERIMENT NO:8
Description:
In data mining, association rule learning is a popular and well researched method for discovering interesting
relations between variables in large databases. It can be described as analyzing and presenting strong rules discovered
in databases using different measures of interestingness. In market basket analysis association rules are used and they
are also employed in many application areas including Web usage mining, intrusion detection and bioinformatics.
Procedure:
Procedure for Association Rules:
Output:
Result:
The program has been successfully executed.
EXPERIMENT NO:9
Description:
In data mining, association rule learning is a popular and well researched method for discovering interesting
relations between variables in large databases. It can be described as analyzing and presenting strong rules discovered
in databases using different measures of interestingness. In market basket analysis association rules are used and they
are also employed in many application areas including Web usage mining, intrusion detection and bioinformatics.
Procedure:
@data
youth, high, A
youth,medium,B
youth, low, C
middle, low, C
middle, medium, C
middle, high, A
senior, low, C
senior, medium, B
senior, high, B
middle, high, B
Training Data Set Employee Table
Output:
Result:
The program has been successfully executed.
EXPERIMENT NO:10
Aim:
Generate a Decision Tree for the Weather Table using the J48 algorithm in WEKA.
Description:
Classification is the process of finding a model that describes data values and concepts for the purpose of prediction.
Decision Tree:
A Decision Tree is a classification scheme that generates a tree consisting of a root node, internal nodes, and external nodes. The root node and the internal nodes represent attributes, the external nodes represent the classes, and each branch represents a value of an attribute.
A Decision Tree also yields a set of rules for a given data set. Decision-tree learning uses two subsets: one is the training data set and the second is the testing data set. The training data set is previously classified data; the testing data set is newly generated data.
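J48, WEKA's C4.5 implementation, selects its splits by information gain. For the 14-instance weather data set used in this experiment (9 yes / 5 no), the gain of Outlook can be verified by hand; the sketch below assumes the class counts read off that table:

```python
from math import log2

# Entropy of a class distribution and information gain of an attribute,
# computed for the 14-instance weather data set (9 yes / 5 no).

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

# Class counts of "play" within each Outlook value: (yes, no)
outlook = {'sunny': (2, 3), 'overcast': (4, 0), 'rainy': (3, 2)}

h_all = entropy((9, 5))
n = 14
gain = h_all - sum((y + no) / n * entropy((y, no)) for y, no in outlook.values())

print(f"{h_all:.3f}")   # 0.940
print(f"{gain:.3f}")    # 0.247 -> Outlook is a strong first split, as J48 finds
```

The overcast partition is pure (zero entropy), which is what pushes Outlook's gain above the other attributes and places it at the root of the tree.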
Procedure:
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) In that dialog box there are four modes; click on Explorer.
7) Explorer shows many options. In that, click on Open file and select the .arff file.
8) Click on the Edit button, which shows the weather table in WEKA.
Output:
Decision Tree:
EXPERIMENT NO:11
Aim:
Generate a Decision Tree for the Customer Table using the J48 algorithm in WEKA.
Description:
Classification is the process of finding a model that describes data values and concepts for the purpose of prediction.
Decision Tree:
A Decision Tree is a classification scheme that generates a tree consisting of a root node, internal nodes, and external nodes. The root node and the internal nodes represent attributes, the external nodes represent the classes, and each branch represents a value of an attribute.
A Decision Tree also yields a set of rules for a given data set. Decision-tree learning uses two subsets: one is the training data set and the second is the testing data set. The training data set is previously classified data; the testing data set is newly generated data.
Procedure:
@data
x,youth,high,A
y,youth,low,B
z,middle,high,A
u,middle,low,B
v,senior,high,A
l,senior,low,B
w,youth,high,A
q,youth,low,B
r,middle,high,A
n,senior,high,A
Training Data Set Customer Table
Output:
Decision Tree:
EXPERIMENT NO:12
Aim:
Generate a Decision Tree for the Location Table using the J48 algorithm in WEKA.
Description:
Classification is the process of finding a model that describes data values and concepts for the purpose of prediction.
Decision Tree:
A Decision Tree is a classification scheme that generates a tree consisting of a root node, internal nodes, and external nodes. The root node and the internal nodes represent attributes, the external nodes represent the classes, and each branch represents a value of an attribute.
A Decision Tree also yields a set of rules for a given data set. Decision-tree learning uses two subsets: one is the training data set and the second is the testing data set. The training data set is previously classified data; the testing data set is newly generated data.
Procedure:
@data
21,hyd
21,hyd
24,blr
24,blr
24,blr
24,blr
21,hyd
25,kdp
25,kdp
25,kdp
3) After that, save the file with the .arff file extension.
4) Minimize the arff file and then open Start -> Programs -> weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) In that dialog box there are four modes; click on Explorer.
7) Explorer shows many options. In that, click on Open file and select the .arff file.
8) Click on the Edit button, which shows the location table in WEKA.
Procedure for Decision Trees:
Output:
Decision Tree:
Result:
The program has been successfully executed.
EXPERIMENT NO:13
Aim:
Write a procedure for Visualization of the Weather Table in WEKA.
Description:
This program calculates and compares values across the data set once the selection of attributes and the methods of manipulation have been chosen. The visualization is shown as a 2-D representation of the information.
Procedure:
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
Training Data Set Weather Table
Procedure:
5) After that, select the Select Attribute button, then select the Outlook attribute and click OK.
6) Click on the Update button to display the output.
7) After that, select the Select Attribute button, then select the Temperature attribute and click OK.
8) Increase the Plot Size and Point Size.
9) Click on the Update button to display the output.
10) After that, select the Select Attribute button, then select the Humidity attribute and click OK.
11) Click on the Update button to display the output.
12) After that, select the Select Attribute button, then select the Windy attribute and click OK.
13) Increase the Jitter Size.
14) Click on the Update button to display the output.
15) After that, select the Select Attribute button, then select the Play attribute and click OK.
16) Click on the Update button to display the output.
Output:
Output:
Output:
Output:
Result:
The program has been successfully executed.
EXPERIMENT NO:14
Aim:
Write a procedure for Visualization of the Banking Table in WEKA.
Description:
This program calculates and compares values across the data set once the selection of attributes and the methods of manipulation have been chosen. The visualization is shown as a 2-D representation of the information.
Procedure:
Procedure:
2-D Plot Matrix:
5) After that, select the Select Attribute button, then select the Cust attribute and click OK.
6) Click on the Update button to display the output.
Output:
7) After that, select the Select Attribute button, then select the Accno attribute and click OK.
8) Increase the Plot Size and Point Size.
9) Click on the Update button to display the output.
Output:
10) After that, select the Select Attribute button, then select the Bankname attribute and click OK.
11) Click on the Update button to display the output.
Output:
12) After that, select the Select Attribute button, then select the location attribute and click OK.
13) Increase the Jitter Size.
14) Click on the Update button to display the output.
Output:
15) After that, select the Select Attribute button, then select the Deposit attribute and click OK.
16) Click on the Update button to display the output.
Output:
Result:
The program has been successfully executed.
EXPERIMENT NO:15
Aim:
Write a procedure for cross-validation using the J48 Algorithm on the Weather Table.
Description:
Cross-validation, sometimes called rotation estimation, is a technique for assessing how the results of a
statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction,
and one wants to estimate how accurately a predictive model will perform in practice. One round of cross-validation
involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the
training set), and validating the analysis on the other subset (called the validation set or testing set).
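The fold-splitting idea behind the Cross-Validation Fold Maker can be sketched in a few lines. This shows only the partitioning (a plain round-robin split, no class stratification), with illustrative integer data:

```python
# k-fold cross-validation: split the data into k complementary folds;
# each fold serves once as the test set while the rest form the training set.

def kfold(items, k):
    folds = [items[i::k] for i in range(k)]      # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

data = list(range(10))
for train, test in kfold(data, 5):
    print(test, '<- held out, trained on', train)
```

Every instance is held out exactly once, so the k evaluation scores together cover the whole data set; WEKA averages them into the final estimate.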
Procedure:
Procedure:
1) Start -> Programs -> Weka 3.4
2) Open Knowledge Flow.
3) Select Data Source tab & choose Arff Loader.
4) Place Arff Loader component on the layout area by clicking on that component.
5) Specify an Arff file to load by right clicking on Arff Loader icon, and then a pop-up menu will appear.
In that select Configure & browse to the location of weather.arff
6) Click on the Evaluation tab & choose Class Assigner & place it on the layout.
7) Now connect the Arff Loader to the Class Assigner by right clicking on Arff Loader, and then select
Data Set option, now a link will be established.
8) Right click on Class Assigner & choose Configure option, and then a new window will appear & specify
a class to our data.
9) Select Evaluation tab & select Cross-Validation Fold Maker & place it on the layout.
10) Now connect the Class Assigner to the Cross-Validation Fold Maker.
11) Select Classifiers tab & select J48 component & place it on the layout.
12) Now connect Cross-Validation Fold Maker to J48 twice; first choose Training Data Set option and
then Test Data Set option.
13) Select Evaluation Tab & select Classifier Performance Evaluator component & place it on the layout.
14) Connect J48 to Classifier Performance Evaluator component by right clicking on J48 & selecting
Batch Classifier.
15) Select Visualization tab & select Text Viewer component & place it on the layout.
16) Connect the Classifier Performance Evaluator to the Text Viewer by right clicking on Classifier
Performance Evaluator & selecting the Text option.
17) Start the flow of execution by selecting Start Loading from Arff Loader.
18) For viewing result, right click on Text Viewer & select the Show Results, and then the result will be
displayed on the new window.
Output:
Result:
The program has been successfully executed.
EXPERIMENT NO:16
Aim: Write a procedure for Clustering Buying data using Cobweb Algorithm.
Description:
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the
objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters.
Clustering is a main task of exploratory data mining, and a common technique for statistical data analysis used in
many fields, including machine learning, pattern recognition, image analysis, information retrieval, and
bioinformatics.
Procedure:
Training Data Set Buying Table
Procedure:
1) Click Start -> Programs -> Weka 3.4
2) Click on Explorer.
3) Click on open file & then select Buying.arff file.
4) Click on the Cluster menu; in this, different algorithms are available.
5) Click on Choose button and then select cobweb algorithm.
6) Click on Start button and then output will be displayed on the screen.
Output:
Result:
The program has been successfully executed.
EXPERIMENT NO:17
Aim: Write a procedure for Clustering Weather data using the EM Algorithm.
Description:
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the
objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters.
Clustering is a main task of exploratory data mining, and a common technique for statistical data analysis used in
many fields, including machine learning, pattern recognition, image analysis, information retrieval, and
bioinformatics.
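As a sketch of what the EM algorithm does, here is one-dimensional EM for a two-component Gaussian mixture on toy numbers (fixed unit variances and equal priors for brevity; WEKA's EM also estimates variances and priors, and can choose the number of clusters by cross-validation):

```python
from math import exp

# EM for a two-component 1-D Gaussian mixture with fixed unit variances:
# E-step: soft-assign each point to the components; M-step: re-estimate means.

data = [1.0, 1.2, 0.8, 1.1, 0.9, 9.0, 9.2, 8.8, 9.1, 8.9]
means = [0.0, 5.0]                      # deliberately poor starting guesses

for _ in range(20):
    # E-step: responsibility of component 0 for each point (equal priors)
    r0 = []
    for x in data:
        p0 = exp(-0.5 * (x - means[0]) ** 2)
        p1 = exp(-0.5 * (x - means[1]) ** 2)
        r0.append(p0 / (p0 + p1))
    # M-step: responsibility-weighted mean update
    w0, w1 = sum(r0), sum(1 - r for r in r0)
    means = [sum(r * x for r, x in zip(r0, data)) / w0,
             sum((1 - r) * x for r, x in zip(r0, data)) / w1]

print([round(m, 1) for m in means])     # [1.0, 9.0] -> the true cluster centres
```

Unlike k-means, the assignments here are soft probabilities, which is why EM reports a log-likelihood and per-cluster distribution parameters in its output.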
Procedure:
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
Training Data Set Weather Table
Procedure:
1) Click Start -> Programs -> Weka 3.4
2) Click on Explorer.
3) Click on open file & then select Weather.arff file.
4) Click on the Cluster menu; in this, different algorithms are available.
5) Click on Choose button and then select EM algorithm.
6) Click on Start button and then output will be displayed on the screen.
Output:
Result:
The program has been successfully executed.
EXPERIMENT NO:18
Aim: Write a procedure for Clustering Banking data using the Farthest First Algorithm.
Description:
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the
objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters.
Clustering is a main task of exploratory data mining, and a common technique for statistical data analysis used in
many fields, including machine learning, pattern recognition, image analysis, information retrieval, and
bioinformatics.
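Farthest-first traversal greedily chooses each new cluster centre as the point farthest from the centres picked so far. A one-dimensional sketch on toy numbers (deterministic first centre; WEKA's FarthestFirst starts from a randomly chosen point):

```python
# Farthest-first traversal: greedily pick k centres, each one the point
# with the maximum distance to its nearest already-chosen centre.

def farthest_first(points, k):
    centres = [points[0]]                         # deterministic start
    while len(centres) < k:
        next_pt = max(points, key=lambda p: min(abs(p - c) for c in centres))
        centres.append(next_pt)
    return centres

data = [1.0, 1.5, 2.0, 8.0, 8.5, 15.0]
print(farthest_first(data, 3))    # [1.0, 15.0, 8.0] -> well-spread centres
```

Because the centres are forced apart, the algorithm needs only one pass per centre, which is why FarthestFirst runs much faster than iterative clusterers on large data.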
Procedure:
Procedure:
1) Click Start -> Programs -> Weka 3.4
2) Click on Explorer.
3) Click on open file & then select Banking.arff file.
4) Click on the Cluster menu; in this, different algorithms are available.
5) Click on Choose button and then select FarthestFirst algorithm.
6) Click on Start button and then output will be displayed on the screen.
Output:
Result:
The program has been successfully executed.
EXPERIMENT NO:19
Aim: Write a procedure for Clustering Employee data using the Make Density Based Clusterer Algorithm.
Description:
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the
objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters.
Clustering is a main task of exploratory data mining, and a common technique for statistical data analysis used in
many fields, including machine learning, pattern recognition, image analysis, information retrieval, and
bioinformatics.
Procedure:
@data
101,raj,10000,4,pdtr
102,ramu,15000,5,pdtr
103,anil,12000,3,kdp
104,sunil,13000,3,kdp
105,rajiv,16000,6,kdp
106,sunitha,15000,5,nlr
107,kavitha,12000,3,nlr
108,suresh,11000,5,gtr
109,ravi,12000,3,gtr
110,ramana,11000,5,gtr
111,ram,12000,3,kdp
112,kavya,13000,4,kdp
113,navya,14000,5,kdp
3) After that, save the file with the .arff file extension.
4) Minimize the arff file and then open Start -> Programs -> weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) In that dialog box there are four modes; click on Explorer.
7) Explorer shows many options. In that, click on Open file and select the .arff file.
8) Click on the Edit button, which shows the employee table in WEKA.
Procedure:
1) Click Start -> Programs -> Weka 3.4
2) Click on Explorer.
3) Click on open file & then select Employee.arff file.
4) Click on the Cluster menu; in this, different algorithms are available.
5) Click on Choose button and then select MakeDensityBasedClusterer algorithm.
6) Click on Start button and then output will be displayed on the screen.
Output:
Result:
The program has been successfully executed.
EXPERIMENT NO:20
Aim: Write a procedure for Clustering Customer data using Simple KMeans Algorithm.
Description:
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the
objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters.
Clustering is a main task of exploratory data mining, and a common technique for statistical data analysis used in
many fields, including machine learning, pattern recognition, image analysis, information retrieval, and
bioinformatics.
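The core loop of k-means can be sketched in one dimension on toy numbers (first k points taken as initial centroids for determinism; WEKA's SimpleKMeans uses seeded random initial centroids and Euclidean distance over all attributes):

```python
# Simple k-means in one dimension: assign each point to its nearest
# centroid, then move each centroid to the mean of its assigned points.

def kmeans(points, k, iters=10):
    centroids = points[:k]                        # deterministic initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # keep a centroid in place if its cluster ever becomes empty
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
print([round(x, 2) for x in kmeans(data, 2)])    # [1.0, 9.0]
```

The assign-then-update loop repeats until the centroids stop moving; WEKA additionally reports the within-cluster sum of squared errors for the final assignment.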
Procedure:
@data
x,youth,high,A
y,youth,low,B
z,middle,high,A
u,middle,low,B
v,senior,high,A
l,senior,low,B
w,youth,high,A
q,youth,low,B
r,middle,high,A
n,senior,high,A
Training Data Set Customer Table
Procedure:
1) Click Start -> Programs -> Weka 3.4
2) Click on Explorer.
3) Click on open file & then select Customer.arff file.
4) Click on the Cluster menu; in this, different algorithms are available.
5) Click on Choose button and then select SimpleKMeans algorithm.
6) Click on Start button and then output will be displayed on the screen.
Output:
Result:
The program has been successfully executed.