
A Project Report

on
Early Prediction Of Student’s Performance Based on
Machine Learning
Submitted in partial fulfillment of the requirements
for the degree of

B.Tech. in Information Technology


by
Sakshi Chetan More (2030331246003)
Alisha Jakariya Inamdar (2030331246005)
Sapna Jayram Shelar (2030331246012)
Pratiksha Prakash Zore (1930331246035)
under the guidance of
S. R. Hivre

Department of Information Technology


Dr. Babasaheb Ambedkar Technological University,
Lonere-402103, Dist. Raigad, (MS) INDIA.
December 2022
Certificate

A Project report, Early Prediction Of Student’s Performance Based on Machine Learning, submitted by Sakshi Chetan More (2030331246003), Alisha
Jakariya Inamdar (2030331246005), Sapna Jayram Shelar (2030331246012), and
Pratiksha Prakash Zore (1930331246035), is approved for the partial fulfillment
of the requirements for the degree of B.Tech. in Information Technology of Dr.
Babasaheb Ambedkar Technological University, Lonere - 402 103, Raigad (MS).

Examiner(s)

(1) —————————————————————— Sign.: ——————–

(2) —————————————————————— Sign.: ——————–

Prof. S. R. Hivre (Guide)
Dr. Sanjay R. Sutar (Head of Department, IT)

Place: Dr. Babasaheb Ambedkar Technological University, Lonere - 402 103.


Date:

Acknowledgments
It gives us great pleasure to present the project entitled “Early Prediction Of Student’s Performance Based on Machine Learning”. We express our deep sense
of gratitude and sincere regards to our guide, Prof. S. R. Hivre, whose timely guidance and
friendly discussions helped us immensely in selecting this topic and completing the project work.
We are thankful to Dr. V. J. Kadam, Project Coordinator, Information Technology Department, for his inspiration and encouragement. He helped us immensely by providing
all the opportunities and facilities needed for the project work.
We are thankful to Dr. Babasaheb Ambedkar Technological University, Lonere, for providing
institutional facilities and suggestions. We are also thankful to all the faculty members
of the Information Technology Department and to the library for their help, which has been immensely
useful in our work.
We would be failing in our duty if we did not mention our classmate, who was a constant source
of inspiration during the project work. Last but not least, we are thankful to all those
who directly or indirectly helped us complete this work.

Abstract
The prediction of students’ academic achievement is crucial in a university for the early detection of students at risk. This project aims to present data mining
models that use classification methods to predict students’ academic achievement.
To predict the academic performance of students early, this project presents a combination of models: the Student’s t-hidden Markov model (Student’s
t-HMM) and nuisance attribute projection (NAP). NAP can remove nuisance attributes
caused by individual differences from the feature space. The Student’s t-HMM uses the
finite Student’s t-mixture models (SMMs) to describe the observation emission densities
associated with each hidden state, which makes it more tolerant of outliers than
conventional HMMs.
The prediction of academic performance is important not only to help students take control of their own learning and become self-regulated learners but also to allow educators
to identify at-risk students and reduce the chances of failure.

Contents

1 Introduction
1.1 Keywords
1.2 Objectives

2 System Overview
2.1 Problem Statement
2.2 Proposed System
2.2.1 Components of the Proposed System

3 Project Planning and System Requirements
3.1 Project Plan and Action Plan
3.2 System Requirements
3.2.1 Software Requirements
3.2.2 Hardware Requirements

4 Methodology
4.1 Methodology for the system
4.2 Methodology Used in the Proposed System
4.2.1 Decision Tree
4.2.2 K-Nearest Neighbour
4.2.3 Logistic Regression

5 Data Processing
5.1 Support Vector Machines
5.2 Naive Bayes
5.3 K-Nearest Neighbor Algorithm

6 Building a Prediction Model
6.1 Analytical Approach
6.2 Data Analysis
6.2.1 Importing Libraries and Loading Data
6.2.2 Visualizing Data and Gaining Insights
6.2.3 Plotting the Data
6.2.4 Prepare Data for Machine Learning Algorithm

7 Applications And Advantages
7.1 Applications
7.2 Advantages

8 Future Work

9 Conclusion

Bibliography

List of Figures

2.1 The Main Steps and Components of the Proposed System
3.1 Action Plan
Chapter 1

Introduction

Student performance is an important part of the learning process. Predicting student
performance is key to identifying students who are likely to have low academic
performance in the future. Educational data can be used for such predictions once it has
been processed into knowledge, and that knowledge can enhance teaching and learning
quality and help students achieve their academic goals.
Machine Learning (ML) is a research field that applies various algorithms and techniques
to information generated in educational settings. Applying ML also helps policymakers
plan how to improve student performance; consequently, it enhances the learning process
and the students’ experience in the educational institution. Renz et al. investigate the
impact and integration of data-driven and Artificial Intelligence models in education.
The emergence of Artificial Intelligence (AI), and specifically of ML models, has greatly
enhanced prediction performance in various domains such as health, business, and
agriculture. This motivates the application of ML to educational data as well. In the
proposed study, we use an ML model to predict students’ final grades.
This project aims at the early prediction of students’ grades and at finding the most
significant factors that affect students’ results. These factors can help find ways to
improve the educational system so that students achieve better academic results. In the
proposed study, academic data, i.e., marks, grades, and placement information, was used
to predict students’ performance.

Due to the huge amount of data in educational databases, predicting the performance of
students has become more difficult. Moreover, an established framework for evaluating
and tracking the success of students is currently lacking. There are two primary reasons
for this. First, research on existing prediction methods is still insufficient to determine
the most appropriate methods for predicting student performance in institutions. Second,
specific courses have not been investigated.
The real goal is to give an overview of the artificial intelligence systems that have been
used to predict academic learning. This research also focuses on identifying the most
relevant attributes in student data by using prediction algorithms. Using educational
machine learning methods, we could potentially improve the performance and progress
of students more efficiently. Students, educators, and academic institutions could all
benefit from this.
Educators’ duties in traditional educational systems were generally confined to delivering
lectures in the classroom to improve students’ knowledge. However, today’s instructors
contribute to the overall growth of students in order to achieve the best possible
development of both skills and personality. As a result, it is the responsibility of academic
institutions to provide better counselling to help students find the proper career path
based on their talents, skills, and abilities. In the present digital world, this evolving
technology covers various sectors such as banking, education, and hospitals. Online
digital platforms play a vital role in educational institutions, where courses have been
delivered online to students due to the COVID-19 pandemic.

1.1 Keywords
• Machine Learning

• Student Performance

• Academics

• Data Analytics

• Predictions

• Classification

1.2 Objectives
• To increase the academic achievement level of each student

• To provide early predictions of student performance

• To make the system easy to use and handle

• To examine the impact of sport on academic performance.

Chapter 2

System Overview

2.1 Problem Statement


It is known that students drop out at a certain rate from bachelor’s programs at
different universities. There are many reasons for this, but this study is mainly interested
in motivation and the perception of progress. In the technical sense, however, it is
interested in comparing different ML models used in ML frameworks to reach a consensus
on which model performs better, in terms of the chosen parameters, at predicting student
performance.
The aim is to build an application that students and teachers can use to represent and
analyze their performance on a monthly and yearly basis in tabular and graphical form.
The goal of the online student performance analysis system is to develop high-quality
software. For any school, college, or other educational institute, students of great quality
who excel in academics, practical knowledge, self-development, and innovative thinking
are an important asset.
To achieve this, it has become essential for every school, college, or other educational
institute to analyze the performance of its students. Academic performance (AP) can be
measured by conducting various examinations, assessments, and other forms of
measurement. Managing the grades of an entire class makes the grading process easier
and gives teachers a clear overview.

2.2 Proposed System
The proposed system gives students easy access to accurate data about their projects
and academic percentages. Students can view all the information with a single click, which
saves a lot of time and effort. The proposed system maintains a database to store all the
information, which minimizes the risk of losing data. Adding and searching information
is easy and does not take much time or physical effort.

2.2.1 Components of the Proposed System


I. The first step is collecting the data from the data sources. In our case, the data was
collected using a survey given to the students and from the students’ grade book.
II. The second step is pre-processing the data to obtain a normalized dataset and
then labelling the data rows.
III. In the third step, the output of the second step, i.e., the training and testing datasets, is
fed to the machine learning algorithm.

The machine learning algorithm builds a model using the training data and tests the
model using the test data. Finally, the algorithm produces a trained model, or trained
classifier, that can take a new data row as input and predict its label.
Figure 2.1 shows the main steps and components of the proposed machine learning system.
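As a minimal sketch of the three steps above, assuming scikit-learn (which the report imports in Section 6.2.1): the three features, the 200 synthetic rows, the pass/fail labelling rule, and the choice of a decision tree are hypothetical stand-ins, not the project's survey or grade-book data.

```python
import random

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Step I (hypothetical data): each row is [study_hours_per_week, attendance_pct, internal_marks],
# standing in for survey answers and grade-book entries.
rng = random.Random(42)
rows = [[rng.uniform(0, 20), rng.uniform(40, 100), rng.uniform(0, 50)] for _ in range(200)]

# Step II: label each row (hypothetical rule: "pass" when internal marks >= 20),
# split into training and testing sets, then normalize every feature to [0, 1].
labels = ["pass" if marks >= 20 else "fail" for _, _, marks in rows]
X_train, X_test, y_train, y_test = train_test_split(
    rows, labels, test_size=0.25, random_state=0, stratify=labels
)
scaler = MinMaxScaler().fit(X_train)  # fitted on training rows only, so no test information leaks in
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step III: the algorithm builds a model from the training data ...
model = DecisionTreeClassifier(random_state=0).fit(X_train_scaled, y_train)

# ... and the model is tested on the held-out test data.
accuracy = model.score(X_test_scaled, y_test)
print(f"test accuracy: {accuracy:.2f}")

# The trained classifier can now predict the label of a new data row.
new_row = [[12.0, 85.0, 31.0]]
print(model.predict(scaler.transform(new_row))[0])
```

Here the features are normalized after the split, so the scaler sees only training rows and the test data stays unseen; a decision tree does not strictly need scaling, but the step is kept to mirror step II.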

Figure 2.1: The Main Steps and Components of the Proposed System

Chapter 3

Project Planning and System Requirements

3.1 Project Plan and Action Plan

Sr. No.   Activity                           Duration
1.        Project planning                   2 weeks
2.        Designing / module creation        3 weeks
3.        Coding                             2 weeks
4.        Testing                            1 week
5.        Report writing / documentation     1 week
Table 3.1: Project Plan

Figure 3.1: Action Plan

3.2 System Requirements

3.2.1 Software Requirements

Sr. No.   Particulars        Specification
1.        IDE                Spyder
2.        Coding language    Python 3.8
3.        Operating system   Windows 10
Table 3.2: Software Requirements

3.2.2 Hardware Requirements

Sr. No.   Particulars     Specification
1.        Processor       Pentium IV
2.        RAM             512 MB (minimum)
3.        Hard disk       40 GB
4.        Keyboard        Standard Windows keyboard
5.        Mouse           Two- or three-button mouse
6.        Monitor         LCD/LED
Table 3.3: Hardware Requirements

Chapter 4

Methodology

4.1 Methodology for the system


To achieve our goal, the study methodology comprises a few stages: accumulating a
diabetes dataset with the relevant attributes of the patients, preprocessing the numeric
attributes so that various machine learning classification techniques can be applied, and
carrying out the corresponding predictive analysis on that data. We briefly discuss these
phases below.
A. Dataset and Attributes
In this work, we collect diabetes diagnostic data from the Medical Centre Chittagong
(MCC), Bangladesh. The dataset consists of various attributes, or risk factors, of diabetes
mellitus for 200 patients. We have summarized the attributes and their corresponding
values in Fig.

4.2 Methodology Used in the Proposed System

4.2.1 Decision Tree


A decision tree classifier is a predictive model represented in the form of a tree
structure. Its purpose is to break the dataset down into smaller subsets. The tree consists
of decision nodes and leaf nodes [19]. In our proposed architecture, the attribute that
delivers the maximum information acts as a decision node. The topmost decision node,
which acts as the best predictor, is called the root node. A node that cannot be divided
further is known as a leaf node.
The steps involved in the decision tree are as follows:

• Process 1: Start at the root.

• Process 2: Perform the test at the current node.

• Process 3: Follow the edge corresponding to the outcome.

• Process 4: Go to Process 2 until a leaf node is reached.

• Process 5: Predict the outcome associated with the leaf.
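The attribute-selection rule and Processes 1-5 can be sketched in plain Python as a small ID3-style tree. This assumes categorical attributes and entropy-based information gain (the report's conclusion mentions entropy); the function names, attributes, and seven records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Drop in entropy achieved by splitting the rows on one categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    """ID3-style: the attribute with maximum information gain becomes the decision node."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf node: majority class
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs], [l for _, l in pairs], remaining)
    return {"attribute": best, "branches": branches}


def predict(tree, row):
    """Processes 1-5 (raises KeyError for an attribute value never seen in training)."""
    node = tree                                # Process 1: start at the root
    while isinstance(node, dict):              # Process 4: repeat until a leaf is reached
        outcome = row[node["attribute"]]       # Process 2: perform the test
        node = node["branches"][outcome]       # Process 3: follow the matching edge
    return node                                # Process 5: the leaf holds the prediction


# Hypothetical student records with two categorical attributes.
rows = [
    {"attendance": "high", "assignments": "done"},
    {"attendance": "high", "assignments": "missed"},
    {"attendance": "high", "assignments": "done"},
    {"attendance": "low", "assignments": "done"},
    {"attendance": "low", "assignments": "missed"},
    {"attendance": "low", "assignments": "done"},
    {"attendance": "low", "assignments": "done"},
]
labels = ["pass", "pass", "pass", "fail", "fail", "pass", "pass"]

tree = build_tree(rows, labels, ["attendance", "assignments"])
print(tree["attribute"])                                              # attendance
print(predict(tree, {"attendance": "low", "assignments": "missed"}))  # fail
```

Attendance becomes the root because splitting on it separates the classes better (higher information gain) than splitting on assignments.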

4.2.2 K-Nearest Neighbour


K-Nearest Neighbour (KNN) is one of the basic and essential classification algorithms in
machine learning. It is non-parametric and makes no underlying assumptions about the
distribution of the data. The steps involved in KNN are listed below:

• Store the training data in an array of sample points.

• Measure the Euclidean distance from the new point to every sample point.

• Select the k sample points with the least distance and assign the majority class among them.
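The three KNN steps above can be sketched in plain Python (3.8 or later, for `math.dist`). The function name `knn_predict`, the default k = 3, and the sample points are hypothetical; the majority vote in the last step is the usual way KNN turns the nearest points into a class label.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every stored sample point.
    distances = [(dist(point, query), label) for point, label in zip(train_points, train_labels)]
    # Step 3: keep the k points with the least distance and vote.
    nearest = sorted(distances)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Step 1: hypothetical training data stored as an array of sample points
# (attendance %, internal marks out of 50) with pass/fail labels.
train_points = [(90, 40), (85, 35), (80, 38), (45, 12), (50, 15), (40, 10)]
train_labels = ["pass", "pass", "pass", "fail", "fail", "fail"]

print(knn_predict(train_points, train_labels, (82, 33)))  # pass
print(knn_predict(train_points, train_labels, (48, 14)))  # fail
```

In practice, features measured on different scales (here, a percentage and marks out of 50) are normally rescaled first, since Euclidean distance is dominated by the feature with the largest range.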

4.2.3 Logistic Regression
Logistic regression is a classification technique that works with a threshold value: a
sample is assigned to the positive class when its predicted probability reaches the
threshold. The threshold is chosen according to the desired trade-off:

• Low Precision / High Recall (lower threshold)

• High Precision / Low Recall (higher threshold)

Based on the number of categories, logistic regression can be classified as:

• Binomial

• Multinomial

• Ordinal
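To illustrate how the threshold drives the precision/recall trade-off for binomial logistic regression, here is a plain-Python sketch. The weights `w` and `b` and the eight labelled examples are invented for illustration; in practice the weights would come from fitting the model, for example with scikit-learn's `LogisticRegression`.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def classify(probabilities, threshold):
    """Binomial decision rule: predict 1 when the probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical already-fitted model: score = w * hours_studied + b (made-up weights).
w, b = 0.8, -4.0
hours = [1, 2, 3, 4, 5, 6, 7, 8]
y_true = [0, 0, 1, 0, 1, 1, 1, 1]  # 1 = passed (hypothetical labels)
probabilities = [sigmoid(w * h + b) for h in hours]

low = classify(probabilities, threshold=0.3)
high = classify(probabilities, threshold=0.8)
print(precision_recall(y_true, low))   # (0.8, 0.8): low threshold, higher recall
print(precision_recall(y_true, high))  # (1.0, 0.4): high threshold, higher precision
```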

Chapter 5

Data Processing

To achieve the goal of this research, some preprocessing has been done on the diabetes
dataset. For instance, the exact numeric values of the attributes are not meaningful for
predicting diabetes, so we convert the numeric attribute values into nominal ones. For
example, the patient’s age is classified into three categories: Young (10-25 years),
Adult (26-50 years), and Old (above 50 years). Similarly, the patient’s weight is classified
into three categories: Underweight (less than or equal to 40 kg), Normal (41-60 kg),
and Overweight (above 60 kg). Finally, blood pressure is classified into three categories:
Normal (120/80 mmHg), Low (less than 80 mmHg), and High (greater than 120 mmHg).
C. Apply Machine Learning Techniques
Once the data is ready for modelling, we employ four popular machine learning
classification techniques to predict diabetes mellitus. An overview of these techniques is
given below.
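The numeric-to-nominal conversion for age and weight can be sketched as two small functions. The cut-offs follow the categories above; treating ages below 10 as Young and values between categories (such as 40.5 kg) as belonging to the next category up are assumptions, and blood pressure is left out because the categories given above do not cover readings between 80 and 120 mmHg. The patient records are hypothetical.

```python
def age_group(age_years):
    """Young (10-25), Adult (26-50), Old (above 50); ages under 10 fall into Young (assumption)."""
    if age_years <= 25:
        return "Young"
    if age_years <= 50:
        return "Adult"
    return "Old"


def weight_group(weight_kg):
    """Underweight (<= 40 kg), Normal (41-60 kg), Overweight (above 60 kg)."""
    if weight_kg <= 40:
        return "Underweight"
    if weight_kg <= 60:
        return "Normal"
    return "Overweight"


# Hypothetical patient records with raw numeric attributes.
patients = [{"age": 23, "weight": 38}, {"age": 47, "weight": 72}, {"age": 61, "weight": 55}]

# Replace each numeric value with its nominal category.
nominal = [{"age": age_group(p["age"]), "weight": weight_group(p["weight"])} for p in patients]
print(nominal)
```

For a whole DataFrame column, `pandas.cut` with explicit `bins` and `labels` performs the same kind of binning.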

5.1 Support Vector Machines


This is one of the most popular classification techniques, described by J. Platt et al.
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a
separating hyperplane. SVM separates entities into specified classes. It can also identify
and classify instances that are not present in the training data. SVM makes no
assumptions about the distribution of the data in each class. One extension of this
algorithm performs regression analysis to produce a linear function, and another
extension learns to rank elements to produce classifications for individual elements.
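To make the separating-hyperplane idea concrete, here is a minimal linear SVM in plain Python, trained with Pegasos-style stochastic subgradient descent on the hinge loss. It is an illustrative substitute for a library solver such as scikit-learn's `SVC`, not the method used in the report; the points, the regularization strength `lam`, and the epoch count are hypothetical.

```python
import random


def train_linear_svm(points, labels, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    `labels` must be +1 or -1. A constant 1.0 feature is appended so the last weight
    acts as the bias of the hyperplane (it is regularized too, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(points, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                         # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]      # shrink: gradient of the regularizer
            if margin < 1.0:                              # hinge loss active: move toward y
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, point):
    """Side of the hyperplane w . [x, 1] = 0 on which the point falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(point) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable 2-D data (e.g. two scaled exam scores).
points = [(2, 3), (3, 3), (3, 4), (-2, -1), (-3, -2), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(points, labels)
print(svm_predict(w, (2.5, 3.5)), svm_predict(w, (-2, -2)))  # 1 -1
```

The learned weights `w` define the separating hyperplane; only points inside the margin (or misclassified) change the weights, which is why SVM is driven by the points nearest the boundary.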

5.2 Naive Bayes


Naive Bayes is a popular probabilistic classification technique described by John et al.
Naive Bayes, which is based on Bayes’ theorem, is a simple, effective, and commonly used
machine learning classifier. The algorithm calculates probabilities by counting the
frequencies and combinations of values in the dataset. Using Bayes’ theorem, it assumes
that all attributes are independent given the value of the class variable. In real-world
applications, the conditional independence assumption rarely holds true, yet the
classifier often performs well, sometimes rivalling more sophisticated classifiers.
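The frequency counting described above can be sketched for categorical attributes in plain Python. This sketch assumes nominal attributes and adds Laplace (add-one) smoothing so that an attribute value never seen with a class does not force that class's probability to zero; the attribute names and six records are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, attribute) -> Counter of values
    values_seen = defaultdict(set)        # attribute -> set of observed values
    for row, label in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[(label, attribute)][value] += 1
            values_seen[attribute].add(value)
    return class_counts, value_counts, values_seen


def predict_naive_bayes(model, row):
    """Pick the class maximizing P(class) * product of P(value | class), Laplace-smoothed."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total                          # prior P(class)
        for attribute, value in row.items():
            # Independence assumption: multiply the per-attribute likelihoods.
            numerator = value_counts[(label, attribute)][value] + 1
            denominator = count + len(values_seen[attribute])
            score *= numerator / denominator
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical student records.
rows = [
    {"attendance": "high", "assignments": "done"},
    {"attendance": "high", "assignments": "done"},
    {"attendance": "high", "assignments": "missed"},
    {"attendance": "low", "assignments": "missed"},
    {"attendance": "low", "assignments": "done"},
    {"attendance": "low", "assignments": "missed"},
]
labels = ["pass", "pass", "pass", "fail", "fail", "fail"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, {"attendance": "low", "assignments": "missed"}))  # fail
print(predict_naive_bayes(model, {"attendance": "high", "assignments": "done"}))   # pass
```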

5.3 K-Nearest Neighbor Algorithm


K-nearest neighbor is a simple classification and regression algorithm that uses a
non-parametric method, described by Aha et al. The algorithm stores all available
instances and classifies new instances based on a similarity measure. To determine the
distances from the point of interest to the points in the training dataset, it can use a
tree-like data structure. An instance is classified by its neighbors. In classification, k,
the number of nearest neighbors, is always a positive integer.

Chapter 6

Building a Prediction Model

6.1 Analytical Approach


Supervised machine learning is applied to predict and analyze a student’s marks. For
this task, we approach the problem using a technique called the “simple linear regression
model”. It is a statistical model commonly used to estimate the relationship between two
quantitative variables, one dependent variable and one independent variable, using a
straight line. This algorithm is fast and efficient for small and medium-sized datasets
and is useful for quickly discovering insights from labelled data.
Our two quantitative variables are:

1. The percentage of marks scored by each student in a particular subject (dependent variable).

2. The number of hours studied by each student for that subject (independent variable).
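For the two variables above, the least-squares line can be computed directly: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope · x̄, with x the hours studied and y the percentage scored. The plain-Python sketch below applies these formulas; the five data points are hypothetical and chosen to lie exactly on a line so the result is easy to check. Scikit-Learn's `LinearRegression` fits the same model.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x (one independent variable)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: hours studied (independent) and percentage scored (dependent).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [20.0, 30.0, 40.0, 50.0, 60.0]  # lies exactly on score = 10 + 10 * hours

intercept, slope = fit_simple_linear_regression(hours, scores)
print(intercept, slope)          # 10.0 10.0
print(intercept + slope * 6.5)   # predicted percentage for 6.5 hours: 75.0
```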

6.2 Data Analysis

6.2.1 Importing Libraries and Loading Data


We first import the required libraries; Scikit-Learn is imported later on. The next
step is to load the given data into the Python interpreter (we used Jovian) to proceed
with training the model. Pandas is used to load the CSV file and to confirm that the data
has been loaded successfully.

6.2.2 Visualizing Data and Gaining Insights


Before proceeding further, we check a summary of the technical information of our
data. The info() function prints a concise summary of a DataFrame, including but not
limited to:

• index type

• column dtypes

• non-null values

• memory usage

Based on this information, there are two columns, Hours and Scores, with a total of 25
values in each column. Thus, 25 records are fed to the machine learning model.
The data type (dtype) of Hours is float, while the dtype of Scores is integer. For later
processing, both columns should have the same data type.
Once the data has been imported and both columns have been converted to the same data
type, the data can be previewed using the head() function.
Note that head() previews only the first five rows by default. A different number of rows
can be shown by passing that number between the parentheses, e.g., head(10).
Working with data requires a good sense of numbers: one should be able to analyze and
interpret what the numbers are saying, which requires a firm grasp of statistics and room
for interpretation. Fortunately, the describe() function provides a set of important
summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum)
for further statistical analysis.
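The loading and inspection steps described in Sections 6.2.1 and 6.2.2 can be sketched with pandas as follows. The six rows are hypothetical placeholders read from an in-memory string; with the real 25-row file, `pd.read_csv` would be given the CSV path instead.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the report's Hours/Scores CSV file.
CSV_TEXT = """Hours,Scores
1.5,18
2.5,27
4.0,41
5.5,58
7.0,69
8.5,84
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
print("Data loaded successfully:", df.shape)

# Index type, column dtypes, non-null counts and memory usage.
df.info()

# Hours is read as float64 and Scores as int64; cast Scores so both columns match.
df["Scores"] = df["Scores"].astype("float64")

print(df.head())       # first five rows by default
print(df.head(3))      # pass a number to preview a different count
print(df.describe())   # count, mean, std, min, 25%, 50%, 75%, max
```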

6.2.3 Plotting the Data
The next phase is to plot the score distribution. The data points are plotted on a 2-D
graph to visualize the dataset and see whether any relationship between the variables can
be identified.
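A minimal scatter plot of the two variables, assuming matplotlib (the report does not name a plotting library). The six (hours, score) pairs are hypothetical placeholders for the real Hours and Scores columns, and the non-interactive Agg backend is used so the figure is saved to a PNG file rather than displayed.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file; no display window needed
import matplotlib.pyplot as plt

# Hypothetical (hours, score) pairs; substitute the real Hours and Scores columns.
hours = [1.5, 2.5, 4.0, 5.5, 7.0, 8.5]
scores = [18, 27, 41, 58, 69, 84]

fig, ax = plt.subplots()
ax.scatter(hours, scores)            # one point per student
ax.set_title("Hours vs Percentage")
ax.set_xlabel("Hours studied")
ax.set_ylabel("Percentage score")
fig.savefig("hours_vs_scores.png")
plt.close(fig)
```

A roughly straight upward trend in such a plot is what justifies trying the simple linear regression model of Section 6.1.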

6.2.4 Prepare Data for Machine Learning Algorithm


A crucial part of a data scientist’s job is to prepare the data by cleaning, organizing,
and optimizing it for use by end users, such as business stakeholders, analysts, and
programmers. The prepared data is then used to interpret the results and relay
information to management so that it can make better-informed decisions.
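A minimal sketch of cleaning and organizing the Hours/Scores data before training, in plain Python: incomplete rows are dropped, the features are arranged as a 2-D list and the target as a 1-D list, and a reproducible 80/20 train/test split is made. The rows and the helper `split_train_test` are hypothetical; `sklearn.model_selection.train_test_split` returns the same four lists in the same order.

```python
import random

# Hypothetical raw rows of (hours, score); None marks a missing value to be cleaned out.
raw_rows = [
    (1.5, 18), (2.0, 24), (2.5, 27), (3.0, None), (4.0, 41), (5.5, 58),
    (6.0, 63), (7.0, 69), (7.5, 74), (8.5, 84), (9.0, 88), (None, 50),
]

# Cleaning: drop incomplete rows and make both columns floats.
clean = [(float(h), float(s)) for h, s in raw_rows if h is not None and s is not None]

# Organizing: features as a 2-D list (one column), target as a 1-D list.
X = [[h] for h, _ in clean]
y = [s for _, s in clean]


def split_train_test(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and hold out a fraction of rows for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))  # 8 2
```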

Chapter 7

Applications And Advantages

7.1 Applications
1. Provides support in selecting courses and designing appropriate future study plans
for students.

2. Tracks a student or a class to obtain the overall performance of the student or class.

3. Provides students with an easy way to search the details of projects, attendance
reports, and marks/percentage details with graphs.

4. The academic performance of students is usually stored in various formats such as files,
documents, and records.

5. All the details of projects, student attendance, and marks are added by teachers and
HODs.

6. Predicting student performance has become an urgent need in most educational entities
and institutes.

7. Helps to increase the academic achievement level of each student.

8. Easy to use and handle.

9. Provides performance predictions as early as possible.

7.2 Advantages
1. Improved prediction of students’ performance for multi-class problems.

2. Reduces official warnings as well as the expulsion of students due to poor
performance.

3. Helps in maintaining students’ records.

4. Helps teachers to get their assigned work done.

5. Provides an easy way of displaying notices.

Chapter 8

Future Work

Predicting students’ success has proven to be incredibly useful in helping educational
institutions enhance their teaching quality. This research claims that by considering a
student’s scholarly characteristics, one can forecast their performance. Educational
institutions play a critical role in the development of any country. Education changes the
lives of individuals, families, communities, societies, nations, and ultimately the world,
and it is a large part of why we enjoy such a good life today.
Today’s education is not limited to classroom instruction; it also includes online education
systems, web-based education systems, seminars, workshops, and MOOCs. Given the
massive amount of data stored in educational databases and learning management
systems, it is much more difficult to predict a student’s success.
The performance of students can be evaluated using a variety of readily available
methodologies. This is a developing area of focus that emphasizes methodologies such as
classification, prediction, and feature selection. These methodologies are used to predict
students’ performance and learning behaviour by extracting hidden knowledge from
learning records or other information associated with instruction.

Chapter 9

Conclusion

• In the modern educational system, predicting students’ academic achievement is
critical. However, the number of studies in this area that use information from students
has been found to be limited. In this study, personal characteristics, demographics,
university entry scores, and academic grades were used to predict tertiary students’
academic grades.

• An empirical study was conducted using the most prevalent machine learning
algorithms on a student dataset. MLP and Naive Bayes gave a higher TP rate, indicating
that students were classified with a high rate of correct classification. Predicting
students’ academic grades and learning outcomes helps academics and university
administration give students any necessary assistance.

• The output model can also be linked to a university’s management information
system to provide an early warning system for at-risk students. In future research, we
hope to address the class-imbalance problem in academic datasets to improve prediction
quality.

• The project focuses on analysing students’ academic growth using machine learning
techniques. For the analysis, binomial logistic regression, decision tree (entropy-based),
and KNN classifiers are used. This process can help instructors easily assess the
performance of students and plan better methods for improving their academics. In
future, additional features will be added to our dataset to achieve better accuracy.

• Existing studies show that students’ academic performance depends primarily on their
past performance. Our investigation confirms that past performance indeed has a
significant influence on students’ performance. Further, we confirmed that the
performance of neural networks increases as the dataset size increases.

• Machine learning has come a long way from its nascent stages and can prove to be a
powerful tool in academia. In the future, applications similar to the one developed here,
as well as any improvements on them, may become an integral part of every academic
institution.

Bibliography

[1] J. Xu, K. H. Moon, and M. Van Der Schaar, “A Machine Learning Approach for
Tracking and Predicting Student Performance in Degree Programs,” IEEE J. Sel.
Top. Signal Process., vol. 11, no. 5, pp. 742–753, 2017.

[2] K. P. Shaleena and S. Paul, “Data mining techniques for predicting student perfor-
mance,” in ICETECH 2015 - 2015 IEEE International Conference on Engineering
and Technology, 2015, no. March, pp. 0–2.

[3] A. M. Shahiri, W. Husain, and N. A. Rashid, “A Review on Predicting Student’s
Performance Using Data Mining Techniques,” in Procedia Computer Science, 2015.

[4] Y. Meier, J. Xu, O. Atan, and M. Van Der Schaar, “Predicting grades,” IEEE Trans.
Signal Process., vol. 64, no. 4, pp. 959–972, 2016.

[5] P. Guleria, N. Thakur, and M. Sood, “Predicting student performance using decision
tree classifiers and information gain,” Proc. 2014 3rd Int. Conf. Parallel, Distrib. Grid
Comput. PDGC 2014, pp. 126–129, 2015.

[6] P. M. Arsad, N. Buniyamin, and J. L. A. Manan, “A neural network students’
performance prediction model (NNSPPM),” 2013 IEEE Int. Conf. Smart Instrumentation,
Meas. Appl. ICSIMA 2013, no. July 2006, pp. 26–27, 2013.

[7] K. F. Li, D. Rusk, and F. Song, “Predicting student academic performance,” Proc.
- 2013 7th Int. Conf. Complex, Intelligent, Softw. Intensive Syst. CISIS 2013, pp.
27–33, 2013.

[8] G. Gray, C. McGuinness, and P. Owende, “An application of classification models
to predict learner progression in tertiary education,” in Souvenir of the 2014 IEEE
International Advance Computing Conference, IACC 2014, 2014.
