Lecture 7 - Weka
Sampath Jayarathna
Cal Poly Pomona
Today's Workshop!
• What is WEKA?
• The Explorer:
  • Preprocess data
  • Classification
  • Clustering (later)
  • Attribute Selection
  • Data Visualization
• KnowledgeFlow
  • Generate multiple ROC curves
What is WEKA?
• Waikato Environment for Knowledge Analysis
• A data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.
• Weka is also a bird found only on the islands of New Zealand.
• Website: http://www.cs.waikato.ac.nz/ml/weka/
• Supports multiple platforms (written in Java):
  • Windows, Mac OS X and Linux
Main Features
• Main components:
  • “The Explorer” (exploratory data analysis)
  • “The Experimenter” (experimental environment)
  • “The KnowledgeFlow” (process-model-inspired interface)
  • “Simple CLI” (command-line interface)
Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
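WEKA's filters are Java classes (for example, weka.filters.unsupervised.attribute.Normalize and weka.filters.unsupervised.attribute.Discretize). The Python sketch below is not their implementation; it is a minimal illustration of what two of the listed operations compute: min-max normalization, which rescales a numeric attribute to [0, 1], and equal-width discretization, which maps each value to one of a fixed number of intervals. Passing missing values (None) through unchanged and the choice of 3 bins are assumptions of this sketch.

```python
from typing import List, Optional


def min_max_normalize(values: List[Optional[float]]) -> List[Optional[float]]:
    """Rescale numeric values to [0, 1]; missing values (None) stay missing."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    span = hi - lo
    # A constant column has no spread, so every present value maps to 0.0.
    return [None if v is None else (0.0 if span == 0 else (v - lo) / span)
            for v in values]


def equal_width_bins(values: List[Optional[float]], n_bins: int) -> List[Optional[int]]:
    """Map each value to a 0-based bin index over n_bins equal-width intervals."""
    present = [v for v in values if v is not None]
    if not present:
        return [None] * len(values)
    lo, hi = min(present), max(present)
    width = (hi - lo) / n_bins
    bins: List[Optional[int]] = []
    for v in values:
        if v is None:
            bins.append(None)          # missing values are not binned
        elif width == 0:
            bins.append(0)             # constant column: everything in one bin
        else:
            # Clamp so the maximum value lands in the last bin, not bin n_bins.
            bins.append(min(int((v - lo) / width), n_bins - 1))
    return bins


# Values taken from the fourth column of the ARFF example on the next slide.
cholesterol = [233.0, 286.0, 229.0, None]
print(min_max_normalize(cholesterol))
print(equal_width_bins(cholesterol, 3))  # [0, 2, 0, None]
```

Equal-width binning is sensitive to outliers, since one extreme value stretches every interval; an equal-frequency variant would instead place roughly the same number of values in each bin.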
WEKA only deals with “flat” (single-table) data; its native file format is ARFF (Attribute-Relation File Format):
@relation heart-disease-simplified
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
• Numeric attributes hold numbers (here, the first and fourth columns)
• Missing values are represented by ?
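A complete ARFF file also declares every attribute with an @attribute line between @relation and @data: numeric attributes as numeric, nominal attributes as a set of allowed values in braces. The declarations in the sketch below are illustrative assumptions chosen to match the data rows above, not the original file's header. The parser is a deliberately simplified, hypothetical reader (no quoted names, string/date attributes or sparse rows) showing how the header types the data and how ? becomes a missing value; WEKA itself reads ARFF through its own loaders (e.g. the ArffLoader class).

```python
from typing import List, Tuple, Union

Value = Union[float, str, None]
AttrType = Union[str, List[str]]  # "numeric" or the list of nominal labels

NUMERIC_TYPES = {"numeric", "real", "integer"}

# Hypothetical header: attribute names are assumptions matching the slide's rows.
ARFF_TEXT = """\
% lines starting with a percent sign are comments
@relation heart-disease-simplified
@attribute age numeric
@attribute sex {female, male}
@attribute chest_pain_type {typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina {no, yes}
@attribute class {present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text: str) -> Tuple[str, List[Tuple[str, AttrType]], List[List[Value]]]:
    """Parse a simple dense ARFF file into (relation, attributes, rows)."""
    relation = ""
    attributes: List[Tuple[str, AttrType]] = []
    rows: List[List[Value]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        lower = line.lower()                    # ARFF keywords are case-insensitive
        if in_data:
            fields = [f.strip() for f in line.split(",")]
            if len(fields) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line}")
            row: List[Value] = []
            for (name, kind), field in zip(attributes, fields):
                if field == "?":
                    row.append(None)            # '?' marks a missing value
                elif kind == "numeric":
                    row.append(float(field))
                elif field in kind:
                    row.append(field)           # nominal value from the declared set
                else:
                    raise ValueError(f"{field!r} is not a declared value of {name}")
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                labels = [s.strip() for s in spec.strip("{}").split(",")]
                attributes.append((name, labels))
            elif spec.lower() in NUMERIC_TYPES:
                attributes.append((name, "numeric"))
            else:
                raise ValueError(f"attribute type not handled by this sketch: {spec}")
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation)  # heart-disease-simplified
print(rows[3])   # [38.0, 'female', 'non_anginal', None, 'no', 'not_present']
```

Checking nominal values against the declared set mirrors why ARFF headers exist: a value that was never declared for an attribute is rejected rather than silently accepted.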
Exercise 1
05/09/2024 University of Waikato
Exercise 4
Knowledge Flow Interface