Assignment3 - CSE4002
Assignment Specification
Semester 1, 2024
General Information
1. Due Date
2. Submission (please read this carefully before you submit the assignment)
The assignment submission is electronic only. Any handwritten content, including diagrams, will not be
assessed; in other words, it will receive a mark of 0. Submit the assignment through the Assignment 3
Submission Chute, which is available in the Assessment folder of the subject LMS site.
Use your student ID to name the file. For example, if your student ID is 1234567, then the file name is
ID_1234567.docx/.doc/.pdf. Otherwise, it will not be assessed, which means you will receive a mark of 0. The
reason is that your submission must generate a similarity score (you are responsible for checking
this). Submitting in Word format is the best way to ensure this. If your submission does not generate a
similarity score, it cannot be checked for plagiarism and therefore will not be marked.
In this Word file, please answer Task 1, Task 2, and Task 3 one by one. Task 1 and Task 2 are Prolog
programming problems; please write your solutions, queries, and answers. Task 3 is a Python programming
problem; you need to write a report on how you solved it.
• A .py or .ipynb file must also be submitted. This Python program solves Task 3 of this assignment. If your
student ID is 1234567, then the file name is ID_1234567.py or ID_1234567.ipynb. Otherwise, it will not be
assessed, which means you will receive a mark of 0.
3. Weighting
If you are a Master's student, you need to answer all three tasks. The assignment contributes 40% of the
final assessment for this subject.
4. Academic Integrity
This assessment must be done individually. This means that answers to questions and code that you
write must be your own. You must not collude with other students in any way, and you must not outsource
your work to any third party. La Trobe University treats plagiarism seriously. When detected, penalties are
strictly imposed. Further information can be found at
http://www.latrobe.edu.au/plagiarism/plagiarism.html
Assignment Specification
Task 1
Problem Description:
Write a PROLOG program to represent the following facts and enter the program into the online
executor. There were six people at a family reunion: Anna, Lily, Rebecca, Elizabeth, Mia, and
Olivia. They provided some information about their family relations.
The program defines a predicate mother(X, Y) (X is the mother of Y) and assumes that the
statements above are represented by the following facts:
mother(anna, lily).
mother(anna, rebecca).
mother(lily, elizabeth).
mother(lily, mia).
mother(rebecca, olivia).
Requirements:
Once the program is entered, make the following queries in the PROLOG executor:
Task 2
Problem Description:
A fruit shop exported a basket containing two types of fruit, strawberries and oranges, and the owner wants to
measure their weights. The weight of a strawberry is between 15g and 25g. The weight of an orange is
between 125g and 200g. A robot picks up the fruits from the basket one by one, weighs each fruit, and enters the
weight into the PROLOG program that you wrote. The robot enters 0 (zero) when all fruits have been
picked out of the basket. Your program will then display the average weight of the strawberries and the
average weight of the oranges in the basket.
Hint: We will use a few variables, S_sum, S_num, O_sum, and O_num, to represent the total weight of
input strawberries, total number of input strawberries, total weight of input oranges, and total number of
input oranges, respectively. Then, by dividing the total weight by the total number, we can obtain the
average weight.
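The accumulator logic in the hint can be sketched as follows. This is an illustration in Python rather than PROLOG, intended only to show how the four variables are updated; the function name average_weights and the handling of out-of-range readings are assumptions, and the readings are the sample inputs from this task with the terminating 0 appended.

```python
def average_weights(readings):
    """Return (strawberry average, orange average) for a stream of weights.

    The stream ends at the first 0. Readings outside both weight ranges
    are ignored (an assumption; the specification does not cover them).
    """
    s_sum = s_num = 0
    o_sum = o_num = 0
    for weight in readings:
        if weight == 0:               # sentinel entered by the robot
            break
        if 15 <= weight <= 25:        # strawberry range in grams
            s_sum += weight
            s_num += 1
        elif 125 <= weight <= 200:    # orange range in grams
            o_sum += weight
            o_num += 1
    # Guard against division by zero when a fruit type never appears.
    s_avg = s_sum / s_num if s_num else None
    o_avg = o_sum / o_num if o_num else None
    return s_avg, o_avg


sample = [155.2, 188, 17.3, 126.7, 19.9, 179.9, 24.5, 0]
strawberry_avg, orange_avg = average_weights(sample)
print(strawberry_avg, orange_avg)
```

A PROLOG solution would carry the same four running totals as arguments of a recursive predicate instead of updating them in a loop.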
Requirements:
?- weight.
155.2
188
17.3
126.7
19.9
179.9
24.5
Task 3
Background
Decision Trees (DTs) are a non-parametric supervised learning method used for
classification and regression. The goal is to create a model that predicts the value of a
target variable by learning simple decision rules inferred from the data features. A tree
can be obtained after training. In practice, a predictive decision tree model will
incrementally select the best decisions to split on (evaluated based on the entropy
principle) to provide an output classification based on our input data. For this
assessment, you will be describing a new problem and utilising some machine learning
Python modules to create an ID3 predictive decision tree model, along with a
visualisation to better understand the classification process.
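To make the entropy principle concrete, the following minimal sketch computes Shannon entropy and the information gain of a candidate split. The function names and the toy labels are hypothetical and are not part of the assignment or of scikit-learn.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting `parent` into `left` and `right`."""
    weight_left = len(left) / len(parent)
    weight_right = len(right) / len(parent)
    children = weight_left * entropy(left) + weight_right * entropy(right)
    return entropy(parent) - children


# Hypothetical toy labels, not taken from the wine dataset.
parent = ["class_0"] * 4 + ["class_1"] * 4

# A split that separates the classes perfectly removes all uncertainty.
pure_split_gain = information_gain(parent, parent[:4], parent[4:])

# A split that leaves both children as mixed as the parent gains nothing.
mixed_split_gain = information_gain(parent, parent[::2], parent[1::2])

print(pure_split_gain, mixed_split_gain)
```

A decision tree trained with the entropy criterion prefers, at each node, the split with the highest information gain.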
Task 3 will measure your ability to 1) study a problem that can be tackled by an
artificial intelligence method, e.g., a decision tree; 2) implement the decision tree using
the given instructions; 3) visualize and analyse the decision tree. The objective of this
assessment is to utilise the pandas and scikit-learn libraries to implement the ID3
decision tree machine learning algorithm to create a classifier that can tackle a specific
problem. By the end of this assessment, you should have a better understanding of how
decision trees work. Utilising the trained tree visualisation, you should have a reinforced
understanding of the core decision tree principles and how the tree’s split evaluations
operate.
Problem Description
You will be creating a decision tree that will predict wine classes based on provided attributes.
Imagine that you are a wine producer compiling data for a study. The data is the results of a
chemical analysis of wines grown in the same region in Italy by three different cultivators. There
are thirteen different measurements taken for different constituents found in the three types of
wine, class_0, class_1, and class_2.
Those thirteen different measurements include: Alcohol, Malic acid, Ash, Alcalinity of ash,
Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color
intensity, Hue, OD280/OD315 of diluted wines and Proline.
Therefore, the model’s input parameters are those thirteen measurements, and the
model’s output should be the wine class.
Implementation instructions
Assignment dependency installation
• scikit-learn
• pandas
• matplotlib
Imports
You will be utilising a number of well-known machine learning Python modules in this
task. These steps allow for ease of implementation of our decision tree and include
numerous learning tools to help boost your understanding.
import pandas as pd
from sklearn.datasets import load_wine

# Load the built-in wine dataset (13 features, 3 classes)
data = load_wine()
The pandas python module is a very powerful, frequently used data analysis tool in all
forms of machine learning; it allows you to store and manipulate large datasets very
easily and is highly compatible/integrated with other machine learning tools/modules.
You will be storing your dataset into a pandas DataFrame. A DataFrame is very similar to
a dictionary in standard Python but has many additional useful features. Add our data
into this DataFrame by specifying the data keys and corresponding values.
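One possible way to build the DataFrame is sketched below. The variable name df and the column name "target" are assumptions chosen for illustration; the feature names come from the dataset object itself.

```python
import pandas as pd
from sklearn.datasets import load_wine

data = load_wine()

# Keys are the thirteen feature names; values are the measurement columns.
df = pd.DataFrame(data.data, columns=data.feature_names)

# Append the class label (0, 1 or 2) as an extra column named "target".
df["target"] = data.target

print(df.shape)
print(list(data.target_names))
```

Calling df.head() afterwards is a quick way to check that the columns and labels were loaded as expected.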
Next, we need to create a training set for training the classifier and a test
set for evaluating the quality of the trained classifier. Creating the training and test sets
is straightforward for this task: we can simply split the collected data into two groups.
For example, if we have collected 100 records, we can use 80 records as the training
set, leaving the remaining 20 records as the test set.
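The 80/20 split described above could be done with scikit-learn's train_test_split, as in this sketch. The random_state value and the use of stratify are assumptions made for reproducibility and balanced classes; the specification does not prescribe them.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# X holds the thirteen measurements, y holds the class labels.
X, y = load_wine(return_X_y=True)

# Hold out 20% of the records for testing. stratify=y keeps the class
# proportions similar in both sets (an optional choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), len(X_test))
```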
Once you have correctly formatted your data, you can move on to creating the decision
tree. Create a new scikit-learn DecisionTreeClassifier and pass ‘entropy’ as the
criterion so that splits are evaluated by information gain. Scikit-learn is a powerful machine learning
framework. You will be utilising the included DecisionTreeClassifier class to create
and train your decision tree. Please follow the instructions at
https://scikit-learn.org/stable/modules/tree.html
Call the fit method on this classifier object to train the decision tree.
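A minimal sketch of creating and training the classifier follows. The split parameters and random_state values are assumptions for reproducibility, not requirements of the assignment.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# criterion="entropy" makes the tree choose splits by information gain.
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)

# fit() learns the decision rules from the training set only.
clf.fit(X_train, y_train)

print(clf.get_depth(), clf.get_n_leaves())
```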
Next, utilise the plt.figure method to create the figure on which the trained
decision tree graph is drawn. The decision tree graph visualisation will be saved in your working
directory as a PDF named output_graph.
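One way to produce the output_graph PDF is sketched below using matplotlib together with scikit-learn's plot_tree function. The use of plot_tree, the figure size, and the non-interactive "Agg" backend are assumptions; the specification names only plt.figure. For brevity the tree here is trained on the full dataset, whereas your submission should visualise the tree trained on your training set.

```python
import matplotlib

matplotlib.use("Agg")  # render to file without needing a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier, plot_tree

data = load_wine()
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(data.data, data.target)

# Create the figure, draw the tree on it, then save it as a PDF.
fig = plt.figure(figsize=(16, 10))
plot_tree(
    clf,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    filled=True,  # colour each node by its majority class
)
fig.savefig("output_graph.pdf")
plt.close(fig)
```

Each node in the resulting graph shows the split condition, the entropy at that node, the number of samples, and the class distribution, which is the information to discuss in the report.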
Discuss the generated graph visualisation in detail in your report. You must also include
the generated graph visualisation (output_graph) in the Word file. Some useful
information you may need for the visualization of a decision tree:
To test your developed model, pass the test set into your trained
decision tree classifier to obtain its predictions, then calculate the
classification accuracy.
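The evaluation step could look like the following sketch, which uses scikit-learn's accuracy_score. The split parameters and random_state values are assumptions, and the accuracy you obtain will depend on your own split.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_train, y_train)

# Predict the class of every record in the held-out test set.
y_pred = clf.predict(X_test)

# Accuracy is the fraction of test records classified correctly.
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```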
According to the above descriptions, this task requires you to fulfill the following
objectives:
(1) Load the wine dataset correctly, and split it into train/test sets appropriately;
(2) Appropriate implementation of a decision tree for solving the classification task
using the loaded dataset;
(3) Train the decision tree correctly, obtaining good test results (the actual test
performance should be stated in the report);
(4) Visualize the trained decision tree correctly.
Task 3 requires a detailed report explaining what you have done and what you have
learned by completing it. You are required to write up the problem definition, the logic of
your decision tree implementation, the results (the accuracy) of your code, and
so on.