DWM Mini Project

Manjra Charitable Trust's
RAJIV GANDHI INSTITUTE OF TECHNOLOGY

JUHU VERSOVA LINK ROAD, ANDHERI (W), MUMBAI 400 053
Introduction:
TANAGRA is free DATA MINING software for academic and research purposes. It
offers several data mining methods from the fields of exploratory data analysis, statistical
learning, machine learning and databases.
TANAGRA is the successor of SIPINA, which implements various supervised
learning algorithms, especially the interactive and visual construction of decision trees.
TANAGRA is more powerful: it contains some supervised learning methods but also other
paradigms such as clustering, factorial analysis, parametric and nonparametric statistics,
association rules, and feature selection and construction algorithms.
TANAGRA is an "open source project": every researcher can access the source
code and add their own algorithms, provided they agree to and comply with the software
distribution license.
The main purpose of the Tanagra project is to give researchers and students easy-to-use
data mining software that conforms to the current norms of software development in
this domain (especially in the design of its GUI and the way it is used), and that allows them
to analyze either real or synthetic data.
The second purpose of TANAGRA is to offer researchers an architecture
that allows them to easily add their own data mining methods and compare their performance.
TANAGRA acts more as an experimental platform. It lets researchers focus on the essentials of
their work and spares them the unpleasant part of programming this kind of tool: data
management.
The third and last purpose, aimed at novice developers, is to disseminate a
possible methodology for building this kind of software. They can take advantage of free
access to the source code to see how this sort of software is built, the problems to avoid, the
main steps of the project, and which tools and code libraries to use. In this way, Tanagra
can be considered a pedagogical tool for learning programming techniques.
At present, TANAGRA does not include what gives commercial software in this domain
its strength: a wide range of data sources, direct access to data warehouses and databases,
data cleansing, interactive use, and so on.
DEPARTMENT: INFORMATION TECHNOLOGY Page 1


Import dataset into Tanagra:
1. Choose “File/New…” in the main menu of TANAGRA
2. Enter a title for the diagram: « TANAGRA : Importing Data »

3. Enter the name of the associated file in which you will save your work
(« TANAGRA_ImportingData.bdm »).
4. Before clicking the Save button, browse the hard disk and navigate to the
directory « …\TANAGRA\Tutorials ».
5. Click the Open button icon to locate the file you have created, “weather.txt”.


6. Click OK to start the data import.
A new diagram is created, based on the file « weather.txt ». You can see the
description of its contents in the right frame.


This project was undertaken for the subject Data Warehousing, Mining and Business
Intelligence. It is a tool-based project: we use the Tanagra tool, and the dataset is a weather
report. The project shows the attributes that describe the weather, such as temperature,
humidity and windy, which together give a brief picture of the weather in the area. Using
Tanagra, we derive different conclusions about the dataset: visualization, regression,
association and k-means clustering each yield different observations. A scatter plot is used
to view the data in graphical form. Tanagra thus gives us an overview of this dataset.
Database details:
The database used in this mini project contains weather records. Each record
consists of several fields. The data is available as an Excel document containing 15
weather records.
Tanagra loads data from tab-separated text files, structured as follows:
- 1st line: names of the attributes
- Next lines: values of the attributes for each example (one line per record).
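To illustrate this layout, the following Python sketch parses tab-separated text in which the first line holds the attribute names and each following line holds one record. It is a minimal sketch, not Tanagra's own loader. The function name load_tab_separated is hypothetical, and the sample rows are illustrative only, not the project's 15 records.

```python
import csv
import io

# Illustrative content in Tanagra's layout: tab-separated, attribute names first.
# These rows are hypothetical, not the project's actual weather records.
SAMPLE = (
    "Outlook\tTemp\tHumidity\tWindy\tClass\n"
    "sunny\t85\t85\tno\tdontplay\n"
    "overcast\t83\t78\tno\tplay\n"
    "rain\t70\t96\tno\tplay\n"
)


def load_tab_separated(text):
    """Parse tab-separated text: the first line holds the attribute names,
    and each following non-empty line holds one record."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    records = [dict(zip(header, row)) for row in reader if row]  # skip blank lines
    return header, records


header, records = load_tab_separated(SAMPLE)
print(header)
print(len(records), "records; first outlook:", records[0]["Outlook"])
```

The same function could be applied to the real file, for example with `load_tab_separated(open("weather.txt").read())`. All values are kept as strings at this stage.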
This text file (Dataset) includes the following attributes:

1. Outlook
2. Temp
3. Humidity
4. Windy
5. Class
The dataset contains two continuous attributes (Temp, Humidity) and three discrete
attributes (Outlook, Windy, Class).
The discrete attributes take the following values:

Outlook = “sunny”, “overcast”, “rain”.
Windy = “yes”, “no”.
Class = “play”, “dontplay”.
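The attribute types and discrete values listed above can be checked with a short Python sketch before the file is imported into Tanagra. The sketch assumes each record has already been read as a dictionary of strings (for example, by a tab-separated parser). The function name check_record and the sample record are hypothetical.

```python
# Schema taken from the attribute list above: two continuous, three discrete.
CONTINUOUS = ["Temp", "Humidity"]
DISCRETE = {
    "Outlook": {"sunny", "overcast", "rain"},
    "Windy": {"yes", "no"},
    "Class": {"play", "dontplay"},
}


def check_record(record):
    """Return a typed copy of one record (a dict of strings).
    Raises ValueError if a value does not fit the schema."""
    typed = {}
    for name in CONTINUOUS:
        typed[name] = float(record[name])  # float() raises ValueError if not numeric
    for name, allowed in DISCRETE.items():
        value = record[name]
        if value not in allowed:
            raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")
        typed[name] = value
    return typed


# Hypothetical record, for illustration only.
row = {"Outlook": "rain", "Temp": "71", "Humidity": "91", "Windy": "yes", "Class": "dontplay"}
print(check_record(row))
```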
This project analyzes the above dataset in terms of:

1. Scatter plot with label
2. Clustering
3. Association
4. Regression tree


Scatter plot with label:-
Problem statement:
The dataset provides information about the weather. This data is plotted as a scatter
plot with label. The features taken into account to plot the scatter graph are:
1. Temp
2. Humidity
The scatter plot with label provides a graphical view that shows both of these features at
once.
Steps for creating the scatter plot with label:

1. Load the dataset (weather.txt) into Tanagra.
2. Open the data visualization tab in the components bar.
3. Select the scatter plot with label component from the visualization tab.
4. Drag this component onto the dataset node and open it.
5. The output appears in the right frame.
6. It contains the scatter plot for the chosen features.
By using data visualization, we have produced the scatter plot of the attributes humidity and
temperature.
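Conceptually, a scatter plot with label draws one point per record at (Temp, Humidity) and marks each point with the value of a label attribute. The Python sketch below shows only that grouping step. It assumes Class is the label attribute, and the records are hypothetical. Each group could then be drawn, for instance, with one call to matplotlib's pyplot.scatter per label.

```python
from collections import defaultdict

# Hypothetical (Temp, Humidity, Class) records, for illustration only.
RECORDS = [
    (85.0, 85.0, "dontplay"),
    (83.0, 78.0, "play"),
    (70.0, 96.0, "play"),
    (72.0, 95.0, "dontplay"),
]


def scatter_series(records):
    """Group (x, y, label) records into one list of points per label value.
    Each list would be drawn in its own colour or marker."""
    series = defaultdict(list)
    for x, y, label in records:
        series[label].append((x, y))
    return dict(series)


series = scatter_series(RECORDS)
for label, points in sorted(series.items()):
    print(label, points)
```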


Clustering:-
Problem statement:
The dataset provides information on different characteristics and features.
Clustering is the task of segmenting a diverse group into a number of more similar subgroups,
or clusters. Here, clustering is done on the attributes Temp and Humidity.
Steps for creating the clustering (k-means):

1. Add a Define Status operator under the “Dataset” node by clicking its icon in the
shortcuts toolbar. A dialog box appears automatically, allowing you to define the
status of the attributes.
2. First, make sure that the active tab in the dialog is the “Input” one. Then select the
continuous attributes in the left list by clicking the corresponding button below the list
(as shown in the following screenshot), and click the arrow button to bring them into the
Input list.


3. Select the two continuous attributes, Temp and Humidity, as input values.

4. The descriptors are now defined. Click OK to validate and close this dialog box.
5. Drag the K-Means component onto Define Status 1, where the descriptors were defined.
6. Right-click K-Means 1 and select View.
7. The output appears in the right frame.
8. It contains the clustering for the chosen features.
By performing k-means clustering, we have grouped the data into a fixed number of distinct,
more manageable clusters.
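The idea behind k-means can be sketched in Python as Lloyd's algorithm on (Temp, Humidity) points. Each point is assigned to its nearest centroid, each centroid moves to the mean of its points, and the two steps repeat until nothing changes. This is a minimal sketch, not Tanagra's implementation. Seeding the centroids with the first k points, the iteration cap and the sample points are all assumptions made for illustration.

```python
import math

# Hypothetical (Temp, Humidity) points, for illustration only.
POINTS = [(64.0, 65.0), (85.0, 85.0), (68.0, 70.0), (83.0, 86.0), (65.0, 68.0), (81.0, 88.0)]


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means: returns (cluster label per point, centroids).
    Centroids are seeded with the first k points (an assumption made
    here for reproducibility; Tanagra may initialise differently)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(POINTS, k=2)
print(labels)
print(centroids)
```

Here the cooler, drier points and the warmer, more humid points end up in separate clusters. When attributes are on very different scales, they are usually standardized before running k-means. Temp and Humidity happen to have similar ranges in this example.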


Association:-
Problem statement:
The dataset provides information on different characteristics and features.
Association analysis is used to find relationships in a database. Here, the relationships are
sought among three attributes: outlook, windy and class.
Steps for creating the association:

1-2. As in the clustering steps above, add a Define Status operator under the “Dataset” node;
the dialog box for defining the status of the attributes opens with the Input tab active.
3. Select the discrete attributes outlook and windy in the left list.
4. Click the arrow button to bring them into the Input list.
5. In the same dialog box, activate the Target tab. Select the « class » attribute in the list
and click the arrow button.
6. You have now defined the class attribute (« class » = Target) and the descriptors
(outlook and windy = Input).
7. Click OK to validate and close this dialog box.


8. Drag the Apriori component onto Define Status 1; the output appears in the right frame.
9. It contains the association rules for the chosen attributes.


By performing association analysis, we have managed to show distinct links between the attributes.
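The Apriori algorithm can be sketched in Python as follows. It first finds the itemsets (here, "attribute=value" pairs) whose support reaches a threshold, then keeps the rules whose confidence reaches a second threshold. The records, the encoding of items as "attribute=value" strings, and the thresholds min_support and min_confidence are assumptions for illustration. Tanagra's own parameters and defaults may differ.

```python
from itertools import combinations

# Hypothetical records over the three discrete attributes (not the project's data).
RECORDS = [
    {"Outlook": "sunny", "Windy": "no", "Class": "dontplay"},
    {"Outlook": "sunny", "Windy": "yes", "Class": "dontplay"},
    {"Outlook": "overcast", "Windy": "no", "Class": "play"},
    {"Outlook": "rain", "Windy": "no", "Class": "play"},
    {"Outlook": "rain", "Windy": "yes", "Class": "dontplay"},
    {"Outlook": "overcast", "Windy": "yes", "Class": "play"},
]


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset contained in at least
    a fraction min_support of the transactions."""
    n = len(transactions)
    frequent = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: unite pairs of frequent itemsets into itemsets one item larger.
        # Prune step: drop any candidate that has an infrequent subset.
        candidates = {
            a | b
            for a, b in combinations(list(level), 2)
            if len(a | b) == size
            and all(frozenset(s) in level for s in combinations(a | b, size - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for each rule X -> Y,
    where confidence = support(X and Y) / support(X)."""
    for itemset, support in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, k)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence


# Encode each record as a set of "attribute=value" items.
transactions = [frozenset(f"{name}={value}" for name, value in r.items()) for r in RECORDS]
freq = apriori(transactions, min_support=2 / 6)
found = sorted((sorted(a), sorted(c), conf) for a, c, conf in rules(freq, min_confidence=1.0))
for antecedent, consequent, conf in found:
    print(" & ".join(antecedent), "->", " & ".join(consequent), f"(confidence {conf:.2f})")
```

With these six hypothetical records, the sketch prints two rules: Outlook=overcast -> Class=play and Outlook=sunny -> Class=dontplay, each with confidence 1.00.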


Regression Tree:-
Problem statement:
The dataset contains information on the various attributes. We use a regression
tree to find the relationship between the variables temp and humidity.
Steps for creating the regression tree:

1-2. As in the clustering steps above, add a Define Status operator under the “Dataset” node;
the dialog box for defining the status of the attributes opens with the Input tab active.
3. Select the continuous attribute temp and click the arrow button to bring it into the Input
list.
4. In the same dialog box, activate the Target tab.
5. Select the continuous attribute « humidity » in the list and click the arrow button.
6. You have now defined the target attribute (« humidity » = Target) and the descriptor
(« temp » = Input).
7. Click OK to validate and close this dialog box.
8. Drag the Regression tree component onto Define Status 1; the output appears in the right frame.
9. It contains the regression tree for the chosen features.


By constructing the regression tree, we have been able to show the relationship between temp and humidity.
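A regression tree predicts a continuous target (here humidity) by repeatedly splitting on a descriptor (here temp). At each node it picks the threshold that minimizes the squared error of the target in the two resulting groups, and each leaf predicts the mean target of its records. The Python sketch below illustrates this under stated assumptions. The data points are hypothetical, and the stopping rules max_depth and min_size are illustrative choices, not Tanagra's settings. Tanagra's component may also prune the tree.

```python
# Hypothetical (temp, humidity) pairs, for illustration only.
DATA = [(64.0, 65.0), (68.0, 80.0), (70.0, 96.0), (72.0, 95.0), (81.0, 75.0), (85.0, 85.0)]


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(data, min_size=2):
    """Return (threshold, cost) for the split on x that minimises the total
    SSE of y in both groups, or None if no split leaves min_size records per side."""
    xs = sorted({x for x, _ in data})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2  # midpoint between consecutive distinct x values
        left = [y for x, y in data if x <= threshold]
        right = [y for x, y in data if x > threshold]
        if len(left) < min_size or len(right) < min_size:
            continue
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


def build_tree(data, max_depth=2, min_size=2):
    """Grow the tree recursively; a leaf is the mean y of its records."""
    split = best_split(data, min_size) if max_depth > 0 else None
    if split is None:
        return sum(y for _, y in data) / len(data)
    threshold = split[0]
    return {
        "threshold": threshold,
        "left": build_tree([(x, y) for x, y in data if x <= threshold], max_depth - 1, min_size),
        "right": build_tree([(x, y) for x, y in data if x > threshold], max_depth - 1, min_size),
    }


def predict(tree, x):
    """Follow the splits down to a leaf and return its mean."""
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree


tree = build_tree(DATA)
print(tree)
print(predict(tree, 75.0))
```

Each leaf is a constant prediction, so the tree approximates the temp-humidity relationship as a step function. This piecewise form is what a regression tree output describes.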


This was a mini project in DWMI using the Tanagra tool.
We have successfully completed the analysis of the above dataset. The dataset
contains weather details. Using Tanagra, we analyzed the data with the tools it provides.
A scatter plot with label was produced from the above dataset. The features
taken into account to plot the scatter graph are the temperature and humidity of the weather.
The scatter plot gives a graphical view that includes all this information.
Clustering analysis segments a diverse group into a number of more similar subgroups;
here, clustering is done on the attributes Temp and Humidity.
Association analysis is used to find relationships in a database. The relationships were
sought among three attributes: outlook, windy and class.
A regression tree was built on the data to find the relationship between the variables
temperature and humidity.
Hence, we have successfully completed the mini project.
