DWM Mini Project
Introduction:
TANAGRA is free data mining software for academic and research purposes. It
offers several data mining methods from the areas of exploratory data analysis, statistical
learning, machine learning and databases.
TANAGRA is an "open source project": every researcher can access the source
code and add their own algorithms, provided they agree to and comply with the software
distribution license.
The main purpose of the Tanagra project is to give researchers and students
easy-to-use data mining software that conforms to the current norms of software development
in this domain (especially in the design of its GUI and the way it is used) and that can
analyze either real or synthetic data.
A further purpose, aimed at novice developers, is to share a possible methodology
for building this kind of software. They can take advantage of the free access to the source
code to see how this sort of software is built, the problems to avoid, the main steps of the
project, and which tools and code libraries to use. In this way, Tanagra can be considered
a pedagogical tool for learning programming techniques.
TANAGRA does not currently include the features that are the main strength of
commercial software in this domain: a wide set of data sources, direct access to data
warehouses and databases, data cleansing, interactive use, and so on.
A new diagram is created, based on the file « weather.txt ». You can see the
description of its contents in the right frame.
This project is undertaken for the subject Data Warehouse and Mining and Business
Intelligence. It is a tool-based project: we use the Tanagra tool, and the database used is
a weather report. In this project we show the attributes that describe the weather,
such as temperature, humidity and windy. These give us a brief idea of the weather of the
area. Using the Tanagra tool we can derive different conclusions about the given database:
visualization, regression, association and the k-means method each help us derive
observations and conclusions about it. To view the data in graphical form we use a scatter
plot. In this way the Tanagra tool gives us an overview of the database.
Database details:
The database used in this mini project contains weather records and their related
information. The data consists of several fields. The database is available as an Excel
document, which contains 15 weather records.
Tanagra loads data from tab-separated text files, built in the following way:
- 1st line: names of attributes
- Next lines: values of the attributes for the sample (one line for each record).
The dataset contains two continuous attributes and three discrete attributes.
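The file layout described above can be illustrated outside Tanagra with a short Python sketch. The sample rows and the attribute names in SAMPLE are made up for illustration and are not the project's actual file; the helper names load_tab_separated and is_continuous are also hypothetical. The sketch treats an attribute as continuous when every value parses as a number, which is only a rough guess at how a tool might type the columns.

```python
import csv
import io

# Hypothetical sample in the described layout (NOT the project's real data):
# the first line holds attribute names, each following line is one record.
SAMPLE = (
    "outlook\ttemp\thumidity\twindy\tclass\n"
    "sunny\t85\t85\tfalse\tno\n"
    "overcast\t83\t78\tfalse\tyes\n"
    "rain\t70\t96\tfalse\tyes\n"
)


def load_tab_separated(text):
    """Return (attribute_names, records) parsed from tab-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    names = next(reader)  # 1st line: names of attributes
    records = [dict(zip(names, row)) for row in reader if row]
    return names, records


def is_continuous(records, name):
    """Assumption: an attribute is continuous if all its values are numeric."""
    try:
        for record in records:
            float(record[name])
    except ValueError:
        return False
    return True


names, records = load_tab_separated(SAMPLE)
continuous = [n for n in names if is_continuous(records, n)]
discrete = [n for n in names if n not in continuous]
print("continuous:", continuous)
print("discrete:", discrete)
```

With this sample, two attributes are detected as continuous and three as discrete, which matches the split described for the weather dataset.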
Problem statement:
The dataset provides information about the weather. This dataset is plotted as a scatter
plot with labels. The features used to plot the scatter graph are
1. Temp
2. Humidity
The scatter plot with label is a tool that provides a graphical view of all this
information.
Using data visualization, we have derived the scatter plot of the attributes humidity and
temperature.
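To make the idea of a labelled scatter plot concrete, here is a minimal text-mode sketch in Python. It is not how Tanagra draws its plot; the function ascii_scatter and the sample points are hypothetical, and the (temp, humidity, class) values are invented for illustration. Each point is drawn with the first letter of its label.

```python
def ascii_scatter(points, width=30, height=10):
    """Render (x, y, label) points on a character grid.

    Each point is marked with the first letter of its label.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y, label in points:
        # Scale each coordinate onto the grid; "or 1" avoids division by zero
        # when all values on an axis are equal.
        col = round((x - x_min) / ((x_max - x_min) or 1) * (width - 1))
        row = round((y - y_min) / ((y_max - y_min) or 1) * (height - 1))
        grid[height - 1 - row][col] = label[0]  # grid row 0 is the top
    return "\n".join("".join(line) for line in grid)


# Hypothetical (temp, humidity, class) points, not the project's data.
sample_points = [
    (64, 65, "yes"),
    (85, 90, "no"),
    (83, 86, "yes"),
    (80, 95, "no"),
    (68, 66, "yes"),
]
plot = ascii_scatter(sample_points)
print(plot)
```

Temperature runs along the horizontal axis and humidity along the vertical axis, so the coolest, least humid point appears in the bottom-left corner.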
Clustering:-
Problem statement:
The data set provides information based on different characteristics and features.
Clustering is the task of segmenting a diverse group into a number of more similar subgroups,
or clusters. Here the clustering is done on the attributes Temp and Humidity.
By performing the k-means clustering operation we have grouped the data into a fixed
number of distinct, more manageable clusters.
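The k-means idea used above can be sketched in plain Python. This is an illustration under stated assumptions, not Tanagra's implementation: the function kmeans is hypothetical, the (temp, humidity) points are invented, the initial centroids are simply the first k points, and the values are not rescaled before distances are computed.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm); returns (centroids, labels)."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical (temp, humidity) points, not the project's data.
sample = [(64, 65), (85, 90), (65, 70), (83, 86), (68, 66), (80, 95)]
centroids, labels = kmeans(sample, k=2)
print(labels)
print(centroids)
```

On this sample the cool, dry points and the hot, humid points end up in separate clusters. The number of clusters k has to be fixed in advance, which is why the result is a fixed number of groups.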
Association:-
Problem statement:
The data set provides information based on different characteristics and features.
Association analysis is used to find relationships in the database. Here the relationship is
shown among three attributes: outlook, windy and class.
8. Drag Apriori onto Define status and see the output that appears in the right frame.
9. It contains the associations for the chosen features.
By performing association we have managed to show distinct links between the attributes.
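The level-wise search behind Apriori can be sketched in Python as follows. This is a simplified illustration, not Tanagra's component: the functions apriori and confidence are hypothetical, the four records are invented, each record is encoded as a set of "attribute=value" items, and the minimum support of 0.5 is an arbitrary choice.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        level = {s: support(s) for s in current}
        level = {s: v for s, v in level.items() if v >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return frequent[antecedent | consequent] / frequent[antecedent]


# Hypothetical records over outlook, windy and class (not the project's data).
records = [
    frozenset({"outlook=sunny", "windy=false", "class=no"}),
    frozenset({"outlook=sunny", "windy=true", "class=no"}),
    frozenset({"outlook=overcast", "windy=false", "class=yes"}),
    frozenset({"outlook=rain", "windy=false", "class=yes"}),
]
frequent = apriori(records, min_support=0.5)
rule_conf = confidence(
    frequent, frozenset({"outlook=sunny"}), frozenset({"class=no"})
)
print(sorted(tuple(sorted(s)) for s in frequent))
print(rule_conf)
```

Support measures how often a combination of attribute values occurs, and confidence measures how often the consequent holds when the antecedent does. Rules with high values of both are the links reported between attributes.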
Regression Tree:-
Problem statement:
The data set contains information on the various attributes. We attempt to use a
regression tree to find the relationship between the variables temp and humidity.
5. In the same dialog box, activate the Target tab. Select the « class » attribute in the list
and click the arrow button.
6. Now you have defined the class attribute (« class » = Target), and the descriptors
used to predict it (the others = Input).
7. Click OK to validate and close this dialog box.
8. Drag Regression tree onto Define status and see the output that appears in the right frame.
9. It contains the regression tree for the chosen features.
By constructing the regression tree we have been able to show the relationship.
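The core step of a regression tree, choosing one split, can be sketched in Python. This is a depth-1 illustration only, not Tanagra's algorithm: the function best_split is hypothetical, the temp and humidity values are invented, and the choice of humidity as the target predicted from temp is an assumption made for the example.

```python
def best_split(xs, ys):
    """Find the threshold on x that minimises the summed squared error of y
    around the mean on each side (a depth-1 regression tree).

    Returns (error, threshold, left_mean, right_mean), or None if no split exists.
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            best = (
                error,
                threshold,
                sum(left) / len(left),
                sum(right) / len(right),
            )
    return best


# Hypothetical values, not the project's data.
temps = [64, 65, 68, 80, 83, 85]
humidity = [65, 70, 66, 95, 86, 90]
error, threshold, left_mean, right_mean = best_split(temps, humidity)
print(threshold, left_mean, right_mean)
```

Each leaf predicts the mean of the target values that fall on its side of the threshold. A full regression tree repeats this split search recursively inside each side until a stopping rule is met.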
We have successfully completed the analysis of the above data set. The data set
contained weather details. Using Tanagra, we could carry out the analysis of the data with
the tools provided.
The Scatter Plot with Label was produced for the above dataset. The features
used to plot the scatter graph are the temperature and humidity of the weather. The
scatter plot is a tool that provides a graphical view of all this information.
Clustering analysis was used to segment a diverse group into a number of more
similar subgroups; the clustering was done on the attributes Temp and Humidity.
A regression tree was built on the data to find the relationship between variables such as
temperature and humidity.