ITS66604 MidTerm Individual - Preview
STUDENT DECLARATION
1. I confirm that I am aware of the University’s Regulation Governing Cheating in a University
Test and Assignment and of the guidance issued by the School of Computing and IT
concerning plagiarism and proper academic practice, and that the assessed work now
submitted is in accordance with this regulation and guidance.
2. I understand that, unless already agreed with the School of Computing and IT, assessed
work may not be submitted that has previously been submitted, either in whole or in part,
at this or any other institution.
3. I recognise that should evidence emerge that my work fails to comply with either of the
above declarations, then I may be liable to proceedings under the Regulation.
Instructions:
1. Follow the instructions provided in the MyTimes questions.
3. Create a document (Word or Google Doc) in an online location and paste its URL into the
designated box in MyTimes. All screenshots should be pasted into this document. A template
is provided in MyTimes.
5. Attach the URL of your Google Colab notebook, or your original program files (.py or .ipynb),
to the report upon submission. Write a proper Python program and execute the code. When
writing your code, follow best practices and document it thoroughly. Test your code
thoroughly to ensure that it works correctly.
The Case Study:
This case study investigates the use of machine learning to predict the total conversion from a
social media ad campaign. The dataset contains 1,143 observations of 11 variables.
Metadata:
Attribute         Description
age               Age of the person to whom the ad is shown
gender            Gender of the person to whom the ad is shown
interest          A code specifying the category to which the person's interest belongs
                  (interests are as listed in the person's public Facebook profile)
Impressions       Number of times the ad was shown
Clicks            Number of clicks on that ad
Spent             Amount paid by company XYZ to Facebook to show that ad
Total conversion  Total number of people who enquired about the product after seeing the ad
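A minimal loading sketch, assuming the data arrives as a CSV file. The helper name `load_campaign_data` and the filename in the usage comment are hypothetical; the expected shape (1,143 rows by 11 columns) comes from the case-study description, and the check only prints a note rather than failing, so a slightly different export can still be inspected.

```python
from io import StringIO

import pandas as pd

EXPECTED_SHAPE = (1143, 11)  # rows x variables, as stated in the case study


def load_campaign_data(source) -> pd.DataFrame:
    """Read the campaign CSV (a path or file-like object) and flag an unexpected size."""
    df = pd.read_csv(source)
    if df.shape != EXPECTED_SHAPE:
        print(f"Note: expected {EXPECTED_SHAPE} but got {df.shape}; check the file.")
    return df


# Usage with the real file (hypothetical name): df = load_campaign_data("conversion_data.csv")
# Tiny in-memory demo so the snippet runs without the dataset:
demo = load_campaign_data(StringIO("age,gender,Clicks\n30-34,M,1\n35-39,F,2\n"))
print(demo.dtypes)
```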
Objective
The objective of this case study is to develop an optimized regression model to predict total
conversion. The model should be able to accurately predict total conversion for new ad
campaigns, based on the features that are available.
Approach
The following approach will be used to develop the optimized regression model:
• Exploratory data analysis (EDA) will be performed to understand the distribution of the
variables and the relationship between the variables.
• A baseline linear regression model will be trained to predict total conversion.
• The performance of the baseline model will be evaluated using various metrics, such as
mean squared error (MSE), root mean squared error (RMSE), and R-squared.
• More complex regression models, such as polynomial regression, will be trained and
evaluated.
• The best performing regression model will be selected, tuned, optimized and used to
predict total conversion for new ad campaigns.
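For reference, the baseline model, a degree-2 polynomial model, and the three metrics named above can be written as follows. The symbols are introduced here for illustration: $y_i$ is the observed total conversion for observation $i$, $\hat{y}_i$ the model's prediction, $\bar{y}$ the mean of the observed values, $x_1, \dots, x_p$ the selected features, and $\beta_j$ the fitted coefficients. The polynomial line shows the two-feature case only.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p

\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_1 x_2 + \beta_5 x_2^2

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

A lower MSE or RMSE means smaller prediction errors (RMSE is in the same units as total conversion), while $R^2$ close to 1 means the model explains most of the variance in the target.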
Benefits
The development of an optimized regression model to predict total conversion will provide the
following benefits:
• The model can be used to identify the ad campaigns that are most likely to be successful.
• The model can be used to optimize the ad campaigns to improve total conversion.
• The model can be used to predict the total conversion for new ad campaigns, before they
are launched.
This case study will demonstrate the use of machine learning to solve a real-world problem. The
optimized regression model that is developed in this case study can be used by businesses to
improve their social media ad campaigns and increase their sales.
Question 1: Exploratory Data Analysis (30 marks)
Perform exploratory data analysis (EDA) on the dataset to understand the distribution of the
variables and the relationships between them.
a. Use relevant visualizations to inspect the data quality, including checking for missing
values.
Provide a description of your main findings in the MyTimes text box and paste the screenshots
into the online document.
(5 marks)
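One way to start the data-quality check in part (a) is a per-column table of missing values, sketched below with pandas. The helper names and the toy rows are hypothetical and made up for illustration; `plot_missing` imports matplotlib only when called, so the table can be produced even where plotting is unavailable. A duplicate-row count is included as a further quality check.

```python
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column."""
    counts = df.isnull().sum()
    percent = (counts / len(df) * 100).round(2)
    return pd.DataFrame({"missing": counts, "percent": percent})


def plot_missing(report: pd.DataFrame) -> None:
    """Bar chart of missing-value counts (matplotlib is imported only when plotting)."""
    import matplotlib.pyplot as plt

    ax = report["missing"].plot(kind="bar", title="Missing values per column")
    ax.set_ylabel("Number of missing values")
    plt.tight_layout()
    plt.show()


# Made-up toy rows standing in for the real CSV.
demo = pd.DataFrame({
    "age": ["30-34", "35-39", None, "45-49"],
    "gender": ["M", "F", "M", "F"],
    "Clicks": [1, 2, 0, 1],
})
report = missing_value_report(demo)
print(report)
print("Duplicate rows:", demo.duplicated().sum())
```

In the notebook, calling `plot_missing(missing_value_report(df))` on the real dataframe gives a screenshot-ready chart.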
b. Examine the range of values among all the features. Analyze the spread of the data and
identify any potential outliers.
Provide a description of your main findings in the MyTimes text box and paste the screenshots
into the online document.
(5 marks)
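For part (b), `describe()` summarises the range and spread of each numeric feature, and the interquartile-range (IQR) rule is one common way to flag potential outliers. The sketch below assumes the conventional 1.5 × IQR fences; the helper name and the toy values are hypothetical. Box plots (for example `df.boxplot()`) visualise the same fences.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # lower/upper are Series indexed by column name, so the comparison aligns per column.
    outside = (numeric < lower) | (numeric > upper)
    return outside.sum()


# Made-up toy values: "Spent" has one extreme value.
demo = pd.DataFrame({
    "Clicks": list(range(1, 11)),
    "Spent": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
})
# Range and spread of every numeric feature.
print(demo.describe().T[["min", "max", "mean", "std"]])
outliers = iqr_outlier_counts(demo)
print(outliers)
```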
c. Identify and visualize the features that have the strongest correlations with total
conversion.
Provide a description of your main findings in the MyTimes text box and paste the screenshots
into the online document.
(10 marks)
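A sketch for part (c): compute the Pearson correlation of every numeric column with the target and rank by absolute value, so strong negative relationships are not overlooked. The target column name `Total_Conversion` is an assumption (check `df.columns` in your copy), and the toy rows are made up. A heatmap of the full correlation matrix, for example with seaborn's `heatmap`, is a common way to visualise the result.

```python
import pandas as pd

TARGET = "Total_Conversion"  # assumed column name; check df.columns


def target_correlations(df: pd.DataFrame, target: str = TARGET) -> pd.Series:
    """Pearson correlation of each numeric feature with the target, strongest first."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    # Rank by absolute value so strong negative correlations are not hidden.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


demo = pd.DataFrame({
    "gender": ["M", "F", "M", "F", "M"],  # non-numeric, skipped by select_dtypes
    "Impressions": [1000, 2000, 3000, 4000, 5000],
    "Clicks": [5, 1, 4, 2, 3],
    TARGET: [1, 2, 3, 4, 5],
})
corrs = target_correlations(demo)
print(corrs)
```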
d. Perform feature encoding on ‘Age’ using label encoding and ‘Gender’ using one-hot
encoding techniques.
Provide the name of the dataframe that stores this output in the MyTimes text box and paste
screenshots of the dataframe into the online document.
(10 marks)
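A minimal encoding sketch for part (d), using scikit-learn's `LabelEncoder` for age and `pandas.get_dummies` for gender. The output name `df_encoded` and the toy age bands and gender codes are illustrative assumptions. `LabelEncoder` assigns integers in sorted order of the labels, which for age bands such as "30-34" < "35-39" happens to match the natural age order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Label-encode 'age' and one-hot encode 'gender' on a copy of df."""
    df_encoded = df.copy()
    # Label encoding: each age band becomes an integer code (classes are sorted).
    df_encoded["age"] = LabelEncoder().fit_transform(df_encoded["age"])
    # One-hot encoding: 'gender' is replaced by one 0/1 column per category.
    df_encoded = pd.get_dummies(df_encoded, columns=["gender"], prefix="gender", dtype=int)
    return df_encoded


demo = pd.DataFrame({
    "age": ["30-34", "45-49", "35-39", "30-34"],
    "gender": ["M", "F", "M", "F"],
    "Clicks": [1, 0, 2, 1],
})
df_encoded = encode_features(demo)
print(df_encoded)
```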
Question 2: Baseline Regression Model (30 marks)
Train a baseline linear regression model to predict the total conversion. Evaluate the
performance of the baseline model.
a. Based on your EDA findings in Question 1, select at least two relevant features as
independent variables (X) to predict total conversion (y).
Name the features you selected in the MyTimes text box, and provide functional code with its
output in your Google Colab notebook.
(5 marks)
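A small sketch for part (a): separate the chosen features from the target and fail loudly if a column name is misspelled. The feature list below is only an example, not a recommended answer; base your own choice on the Question 1 correlations. The target name `Total_Conversion` and the toy rows are assumptions.

```python
import pandas as pd

# Example feature choice only; justify your own selection from the Question 1 correlations.
FEATURES = ["Impressions", "Clicks", "Spent"]
TARGET = "Total_Conversion"  # assumed column name; check df.columns


def select_xy(df, features=FEATURES, target=TARGET):
    """Return the feature matrix X and target vector y, failing loudly on typos."""
    missing = [c for c in [*features, target] if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in dataframe: {missing}")
    return df[list(features)], df[target]


demo = pd.DataFrame({
    "Impressions": [1000, 2000, 3000],
    "Clicks": [1, 3, 2],
    "Spent": [1.5, 4.0, 2.5],
    "gender_M": [1, 0, 1],
    "Total_Conversion": [1, 2, 1],
})
X, y = select_xy(demo)
print(X.shape, y.name)
```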
Provide the line of code in the MyTimes text box, and provide functional code with its output in
your Google Colab notebook.
(5 marks)
c. Train the linear regression model and report the equation of the model with the trained
coefficients. (10 marks)
Provide the equation of the model in the MyTimes text box, provide a screenshot of the features
and coefficients in the online document, and provide functional code with its output in your
Google Colab notebook.
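A sketch of part (c) with scikit-learn's `LinearRegression`. The hold-out split, its 80/20 ratio, and `random_state=42` are assumptions made here (a common practice, not taken from the brief), as are the helper names. The toy data follow a known noise-free relationship, so the printed equation should recover those coefficients; the real coefficients will of course differ.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

TARGET = "Total_Conversion"  # assumed column name


def fit_baseline(X, y, test_size=0.2, random_state=42):
    """Hold out a test set, then fit ordinary least squares on the rest."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    return model, X_test, y_test


def model_equation(model, feature_names, target=TARGET):
    """Format the fitted model as: target = b0 +/- b1*x1 +/- ..."""
    terms = " ".join(
        f"{'-' if coef < 0 else '+'} {abs(coef):.4f}*{name}"
        for coef, name in zip(model.coef_, feature_names)
    )
    return f"{target} = {model.intercept_:.4f} {terms}"


# Toy data with a known linear relationship (values are made up, not from the dataset).
rng = np.random.default_rng(0)
X = pd.DataFrame({"Impressions": rng.uniform(0, 100, 50), "Clicks": rng.uniform(0, 10, 50)})
y = 2 + 0.03 * X["Impressions"] + 0.5 * X["Clicks"]

model, X_test, y_test = fit_baseline(X, y)
coef_table = pd.DataFrame({"feature": X.columns, "coefficient": model.coef_})
print(coef_table)
print(model_equation(model, X.columns))
```

The `coef_table` dataframe is the kind of features-and-coefficients view the brief asks to screenshot.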
d. Evaluate the model performance using appropriate metrics and provide a detailed
explanation of the results. (10 marks)
Provide the metrics and a detailed explanation in the MyTimes text box, provide relevant
screenshots of the features and coefficients in the online document, and provide functional code
with its output in your Google Colab notebook.
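A sketch of part (d) computing the metrics named in the Approach (MSE, RMSE, R-squared) with scikit-learn. RMSE is taken as the square root of MSE rather than through a library flag, which avoids version differences. The helper name and the four-point example are made up so the numbers can be checked by hand (errors 0.5, 0, -1, -1 give MSE 0.5625 and RMSE 0.75).

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """MSE, RMSE and R-squared for a set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),  # same units as total conversion
        "R2": float(r2_score(y_true, y_pred)),
    }


# Small made-up example so the numbers can be checked by hand.
report = regression_report([3, 5, 2, 7], [2.5, 5, 3, 8])
print(report)
```

On the real data, pass the held-out targets and `model.predict(X_test)`; comparing these metrics with the same ones on the training set helps show whether the model under- or overfits.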
Question 3: Polynomial regression (20 marks)
Refer to MyTimes.
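The full Question 3 tasks are in MyTimes; as a sketch of the polynomial regression mentioned in the Approach, the pipeline below expands the features with scikit-learn's `PolynomialFeatures` and fits `LinearRegression`, comparing degrees by mean 5-fold cross-validated R-squared. The degree range, `cv=5`, the helper name, and the toy data (built with a known quadratic relationship) are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def polynomial_cv_scores(X, y, degrees=(1, 2, 3), cv=5):
    """Mean cross-validated R-squared for a polynomial regression of each degree."""
    scores = {}
    for d in degrees:
        model = make_pipeline(
            PolynomialFeatures(degree=d, include_bias=False),  # adds powers and interactions
            LinearRegression(),
        )
        scores[d] = float(cross_val_score(model, X, y, cv=cv, scoring="r2").mean())
    return scores


# Toy data with a quadratic relationship (made up, not from the dataset).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(-3, 3, size=(200, 2)), columns=["Clicks", "Spent"])
y = 1 + X["Clicks"] ** 2 + X["Clicks"] * X["Spent"]
scores = polynomial_cv_scores(X, y)
print(scores)
```

Higher degrees add many more terms, so when two degrees score about the same, the lower one is usually the safer choice.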
MACHINE LEARNING AND PARALLEL COMPUTING ITS66604
Mid Term Marking Scheme (APRIL 2024)
Score bands (percentage of the allocated marks for each task):
Excellent: >= 90% (18 marks)
Good: < 90% and >= 70% (14-17 marks)
Average: < 70% and >= 40% (8-13 marks)
Poor: < 40% (0-7 marks)

Question 1: EDA
• Excellent: Demonstrates a deep understanding of the data and its relationships. Able to
identify key patterns and insights, and communicate findings clearly and concisely.
• Good: Demonstrates a good understanding of the data and its relationships. Able to identify
key patterns and insights, and communicate findings clearly.
• Average: Demonstrates a basic understanding of the data and its relationships. Able to
identify key patterns and insights, and communication may be less clear and concise.
• Poor: Demonstrates a limited understanding of the data and its relationships. Unable to
identify key patterns and insights, and communication is unclear and not concise.
- END -