Weka Assign 1 Text
In this assignment, your task is to familiarize yourself with the WEKA machine learning tool and its attribute ranking facilities (the Select attributes feature in the WEKA Explorer). You will use the contact-lenses, iris, and soybean data sets, all of which are available in the required .arff format in the WEKA package. The contact-lenses data set has 24 instances with 5 nominal attributes, the last of which (contact-lenses) is the class dimension. The iris set has 150 instances with 4 continuous attributes and a nominal class, which is the last (5th) dimension. The soybean set has 683 instances with 36 nominal attributes, the last of which is the class dimension. Unlike the other two sets, soybean has missing values.
B. Load the iris.arff data set. Perform attribute ranking on the iris.arff data set using the two attribute ranking methods with default parameters.
C. Go back to Preprocess and load the iris.arff data set. Perform discretization of all non-class attributes into 10 equal-width bins as follows: under Filter in the Preprocess window of the Explorer, select filters->unsupervised->attribute->Discretize (use default parameters of the Discretize filter) and hit Apply. Verify that all attributes are nominal by clicking on individual attributes in the Attributes window in Preprocess. Then perform attribute ranking on the discretized set using the two attribute ranking methods with default parameters.
D. Go back to Preprocess and load the original iris.arff data set again. Perform discretization of all non-class attributes into 5 close-to-equal-height bins by selecting the Discretize filter. Then, select appropriate parameters by clicking on the Discretize filter in the Filter window, and setting bins to 5 and useEqualFrequency to true. After you verify that all attributes are nominal, perform attribute ranking on the new set using the two attribute ranking methods with default parameters.
E. Load the soybean.arff data set. Then perform attribute ranking on the soybean.arff data set using the two attribute ranking methods with default parameters.
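To see why steps C and D can produce different nominal attributes, it may help to work through the two binning strategies by hand. The following is a minimal Python sketch, not WEKA's implementation: the function names, the tie-handling rule (a value equal to a cut point goes to the lower bin), and the sample values are all assumptions made for illustration.

```python
from bisect import bisect_left


def equal_width_cuts(values, bins):
    """Cut points that split [min, max] into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]


def equal_frequency_cuts(values, bins):
    """Cut points that place roughly the same number of values in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, bins):
        k = i * n // bins
        # Place the cut halfway between the two neighbouring sorted values.
        cut = (ordered[k - 1] + ordered[k]) / 2
        # Skip repeated cut points, which arise when many values are equal.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def discretize(values, cuts):
    """Map each value to a bin index; a value equal to a cut goes to the lower bin."""
    return [bisect_left(cuts, v) for v in values]


# Made-up sample with one outlier (illustrative only, not taken from iris.arff).
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]

print(discretize(values, equal_width_cuts(values, 2)))
print(discretize(values, equal_frequency_cuts(values, 2)))
```

With this sample, equal-width binning puts nine of the ten values into the first bin because the outlier stretches the range, while equal-frequency binning splits the values five and five. Differences of this kind are one reason the rankings in steps C and D need not match.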
Evaluation
Once you have performed the experiments, you should spend some time evaluating your results. In particular, try to answer at least the following questions: Why would one need attribute relevance ranking? Do these attribute ranking methods often agree or disagree? On which data set(s), if any, do these methods disagree? Do discretization and the choice of discretization method affect the results of attribute ranking? Do missing values affect the results of attribute ranking? Record these and any other observations in a Word file called Observations.doc.
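One way to make "agree or disagree" concrete is to compare the two rankings numerically. The sketch below computes the Spearman rank correlation between two orderings of the same attributes. It is only an optional aid and is not required by the assignment; the function name and the attribute labels a1 to a4 are placeholders, not WEKA output.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two orderings of the same items (no ties).

    Returns 1.0 for identical orderings and -1.0 for exactly reversed ones.
    """
    if len(ranking_a) != len(ranking_b) or set(ranking_a) != set(ranking_b):
        raise ValueError("both rankings must contain the same items")
    if len(set(ranking_a)) != len(ranking_a):
        raise ValueError("rankings must not contain duplicates")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items to compare rankings")
    # Position of every item in the second ranking.
    position_b = {item: i for i, item in enumerate(ranking_b)}
    # Sum of squared differences between the two positions of each item.
    d_squared = sum((i - position_b[item]) ** 2 for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Placeholder rankings (best attribute first); substitute the orderings
# reported by the two ranking methods in the Select attributes panel.
method_1 = ["a1", "a2", "a3", "a4"]
method_2 = ["a2", "a1", "a3", "a4"]

print(spearman_rho(method_1, method_2))
```

A value close to 1 indicates that the two methods order the attributes almost identically; values near 0 or below indicate substantial disagreement.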