
CHAPTER-1

INTRODUCTION

Predicting student performance has become an urgent need in most educational institutions. It is essential for helping at-risk students before they are held back, for providing them with learning resources and a better educational experience, and for improving the university's ranking and reputation.
This research also explores ways of identifying the key indicators present in the dataset, which are used to build the most accurate prediction model, using different types of visualization, clustering, and regression algorithms. The best indicators were chosen to feed the various machine learning algorithms, which were then evaluated to find the most accurate model. The results show how well each algorithm detects key parameters in small datasets.

Student performance prediction poses a crucial challenge within the realm of education. Various factors influence students' academic outcomes, which in turn affect the reputation of universities. Sustaining high educational standards becomes difficult when many students underperform. Extensive research offers diverse avenues for adapting the educational system to students' individual requirements. The complexity of educational data presents novel research opportunities: the techniques grouped under educational data mining automatically extract valuable insights from raw data.
LITERATURE
EXISTING WORK
LGBM:
LightGBM (Light Gradient Boosting Machine) is a popular machine
learning algorithm that falls under the category of gradient boosting
frameworks. It is specifically designed for speed and efficiency, making
it a powerful choice for handling large datasets and achieving high
performance in various tasks such as classification, regression, and
ranking.
Here are some key features and characteristics of the LightGBM
algorithm:

Gradient Boosting Algorithm: LightGBM implements the gradient boosting framework, an ensemble learning technique that builds strong predictive models by combining multiple weak learners (usually decision trees) sequentially. It uses gradient descent optimization to minimize the loss function at each iteration.
Leaf-Wise Tree Growth: LightGBM grows trees in a leaf-wise manner
rather than level-wise. This strategy selects the leaf with the maximum
delta loss to grow the tree, resulting in faster convergence and
potentially better accuracy.
Gradient-Based Learning: LightGBM uses gradient-based techniques
to optimize the loss function. It computes gradients of the loss with
respect to predictions and employs these gradients to guide the tree
construction process, making it more efficient in finding informative
splits.
Histogram-Based Learning: One of LightGBM's distinguishing features is its histogram-based approach to split finding. It bins continuous feature values into discrete bins and builds histograms of gradient statistics, so that candidate splits can be evaluated efficiently. This speeds up the training process and reduces memory usage.
Categorical Feature Handling: LightGBM has built-in support for categorical features. It encodes categories as integers and searches for good partitions of the categories at each split, so the algorithm can handle categorical data efficiently without the need for one-hot encoding.
Regularization: LightGBM provides several options for regularization
to prevent overfitting. These include parameters like max_depth,
min_child_samples, and lambda (L2 regularization).
Lightweight and Fast: As the name suggests, LightGBM is designed to
be lightweight and fast. Its efficient tree construction and histogram-
based approach contribute to its speed advantage over other gradient
boosting implementations.
Parallel and GPU Learning: LightGBM supports parallel and GPU
learning, further enhancing its training speed, especially when dealing
with large datasets.
Tuning and Hyperparameters: Like any machine learning algorithm,
LightGBM has several hyperparameters that can be tuned to achieve
optimal performance. Common hyperparameters include learning rate,
number of trees (boosting rounds), tree depth, and more.
Python Interface: LightGBM provides a Python interface along with
APIs for other languages like R, Java, and C++. This makes it accessible
and usable within a wide range of programming environments.

LightGBM has gained popularity in machine learning competitions and real-world applications due to its speed, efficiency, and strong predictive performance. However, as with any algorithm, it is important to experiment with different hyperparameters and settings to find the configuration that works best for a specific task and dataset.
CHAPTER-2
SOFTWARE ENVIRONMENT
2.1. PYTHON: Python is a powerful, interactive, object-oriented, interpreted scripting language. Python was designed to be very readable: it has fewer syntactic constructions than many other languages and generally employs English keywords where other languages use punctuation. The main justification for using Python to carry out the data gathering and processing is given below.

Increased programmer productivity: The language boosts a programmer's output through its sizable support library and its clean object-oriented design.
Integration features: Python integrates CORBA or COM components to enable Enterprise Application Integration, which enhances web services. It also has strong control capabilities, since Python can directly call into C, C++, or Java. Python can process XML and other markup languages, and it runs on every modern operating system using the same byte code.

Extensive support libraries: Python's standard library is vast. Examples include modules for internet services, operating-system commands and interfaces, and many other tasks served by high-quality libraries. A large share of commonly used programming functionality is already provided, which limits the amount of code that has to be written in Python. In this project, the classifiers are called through built-in functions of Python's sklearn (scikit-learn) machine learning package.
GUI programming: Python allows GUI applications to be created and ported across a variety of system calls, libraries, and windowing platforms, including Windows MFC, Macintosh, and the X Window System of Unix.
Scalable: Python offers better structure and support for large projects than shell scripting.
2.1.2. NumPy:
NumPy is a general-purpose library for managing arrays. It provides an extremely fast multidimensional array object along with tools for working with these arrays. It is, simply put, the foundational Python module for scientific computing. The software is freely available. It has many features, of which the most important are:
• a powerful N-dimensional array object;
• sophisticated (broadcasting) functions;
• tools for integrating C/C++ and Fortran code;
• useful linear algebra, Fourier transform, and random number capabilities.

Beyond science, NumPy is a potent multi-dimensional data container with a wide range of uses. Because arbitrary data types can be defined, NumPy can connect quickly and efficiently with many kinds of databases.
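The array object and broadcasting mentioned above can be shown in a few lines; the small array here is illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # N-dimensional array object (3 rows, 4 columns)
col_means = a.mean(axis=0)        # vectorised reduction along each column
shifted = a - col_means           # broadcasting: shape (3, 4) minus shape (4,)
print(shifted.mean(axis=0))       # → [0. 0. 0. 0.]  (each column centred)
```

Broadcasting lets the (4,)-shaped row of means be subtracted from every row of the (3, 4) array without an explicit loop.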

2.1.3. Seaborn:
Python's Seaborn visualization module is excellent for plotting statistical visualizations. It offers attractive default styles and color schemes that enhance the appeal of statistical charts. It is built tightly on top of the matplotlib library. With Seaborn, visualization becomes central to data exploration and understanding. It offers dataset-oriented APIs, allowing us to move between various visual representations of the same variables for a better understanding of the dataset.
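A short sketch of Seaborn's dataset-oriented API, using a tiny hypothetical table shaped like the student performance data (the column names and values are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import pandas as pd
import seaborn as sns

# Hypothetical mini-dataset in the shape of the student performance data.
df = pd.DataFrame({
    "test_prep": ["yes", "no", "yes", "no", "yes", "no"],
    "math_score": [78, 65, 82, 60, 90, 70],
})
# Seaborn works directly on DataFrame columns by name.
ax = sns.barplot(data=df, x="test_prep", y="math_score")
ax.set_title("Mean math score by test preparation")
ax.figure.savefig("scores.png")
```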

2.1.4. Matplotlib:
Matplotlib is an excellent Python visualization package for 2D displays of arrays. It is a cross-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. John Hunter first introduced it in 2002. One of visualization's main benefits is that it gives us visual access to enormous amounts of data in easily understandable forms. Matplotlib offers a wide variety of plots, including line, bar, scatter, and histogram.
Plotting Functions: Matplotlib offers a wide range of functions for
creating different types of plots, including line plots, scatter plots, bar
plots, histogram plots, pie charts, 3D plots, and more. These functions
provide a simple and intuitive interface for visualizing data in various
formats.
Integration with NumPy: Matplotlib seamlessly integrates with NumPy
arrays, making it easy to visualize data stored in NumPy arrays. This
integration enables users to plot data directly without the need for
extensive data format conversions.
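The plotting functions and NumPy integration described above can be sketched as follows; the score values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import numpy as np
import matplotlib.pyplot as plt

scores = np.array([55, 60, 64, 70, 71, 75, 78, 80, 85, 92])  # illustrative marks

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(scores, bins=5)               # histogram of the score distribution
ax1.set_xlabel("score")
ax2.plot(np.sort(scores), marker="o")  # line plot of sorted scores
ax2.set_xlabel("student rank")
fig.savefig("score_plots.png")
```

Note that the NumPy array is passed to the plotting functions directly, with no format conversion.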

2.1.5. Sklearn:
Sklearn (scikit-learn) is the most effective and dependable Python machine learning library. Through a consistent Python interface, it offers a variety of efficient tools for statistical modeling and machine learning, including classification, regression, clustering, and dimensionality reduction. The library is built on NumPy, SciPy, and Matplotlib and was written primarily in Python.
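Scikit-learn's consistent fit/predict interface can be shown with a classification example on one of its bundled datasets:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # small bundled classification dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```

Every estimator in the library, including the regressors used later in this project, follows this same fit/score pattern.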
CHAPTER-3
SYSTEM ANALYSIS
Finding the optimal answer to an issue involves analysis. In order to
readily accomplish them, a system must be developed after careful
analysis of a business or operation to determine its goals and objectives.
The process of learning about current issues, defining objects and
requirements, and assessing potential solutions is known as system
analysis. It is a mode of thinking about the organization and the issues it
faces, as well as a group of technologies that aid in problem solving. In
system analysis, which provides the aim for design and development,
feasibility studies are crucial.
Requirements analysis is an important step of systems engineering and
software engineering. It concentrates on tasks such as analyzing,
validating, documenting, and managing the software or system
requirements while taking into account the potentially conflicting
requirements of various stakeholders.
A systems or software project's success or failure depends on the results
of the requirements analysis. The requirements ought to be well-
documented, usable, quantifiable, testable, traceable, tied to recognized
business opportunities or needs, and sufficiently defined for system
design.
3.1 Proposed System
In this system we will manage the various features to be considered for performance prediction; by reading information about the students, we will analyze student performance using features from the dataset.
We will train the machine learning model using a student performance dataset obtained from Kaggle.
We will use the Light Gradient Boosting Machine (LGBM) Regressor algorithm to predict the students' performance.

3.2 Software Requirements


The Software Requirements specify the logical characteristics of each
interface and software components of the system.
The following are the required software specifications
• Operating system : Windows 8, 10
• Languages : Python
• Back end : Machine Learning
• IDE : Jupyter

3.3 Hardware Requirements:


The Hardware interfaces specify the logical characteristics of each
interface between the software program and the hardware components
of the system.
The following are the required hardware specifications
• Processor : Intel Dual Core CPU @ 2.90 GHz
• Hard disk : 500GB and Above
• RAM : 4GB and Above
CHAPTER-4
IMPLEMENTATION
4.1 Introduction
The implementation of Student Performance Prediction uses one algorithm, LGBM (Light Gradient Boosting Machine). The IDE used for this implementation is Jupyter Notebook.

4.2 Jupyter Notebook


Step 1: To install Jupyter Notebook, first open a command prompt and check the Python version using the "python --version" command. It should show something like Python 3.7.8; otherwise Python has to be installed first.
Step 2: After that, check the pip version as well, using "pip --version".
Step 3: Install JupyterLab using the "pip install jupyterlab" command in the command prompt.
Step 4: After successful installation of JupyterLab, install Jupyter Notebook using the command "pip install jupyter notebook". If any upgrade is required, update it.
Step 5: Create a folder named test jupyter and open that folder's path in the command prompt.
Step 6: Launch Jupyter; the Jupyter home page will appear as shown. Using this we can create Python files for our model.
4.3 Data set
This dataset contains information of high school students and their
performance in mathematics, including their grades and demographic
information. The data was acquired from three high schools in the
United States of America.
Columns:
• Gender: The gender of the student (male/female)
• Race/ethnicity: Racial or ethnic background of the students (Asian,
African-American, Hispanic, etc.)
• Parental level of education: The highest level of education attained
by the student's parent(s) or guardian(s)
• Lunch: Whether the student is receiving reduced price or free lunch
(yes/no)
• Test preparation course: Whether the student has completed a test
preparation course (yes/no)
• Math score: The score of the students in the standardized mathematics
test
• Reading score: The score of the students in the standardized reading
test
• Writing score: The score of the students in the standardized writing
test
This dataset could be used for various research questions related to
education, such as examining the impact of parental education or test
preparation courses on student performance. It could also be used to
develop machine learning models to predict student performance based
on demographic and other factors.
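One such research question, the effect of test preparation on scores, can be sketched with a groupby over rows shaped like this dataset; the four rows below are invented for illustration.

```python
import pandas as pd

# Hypothetical rows following the column layout described above.
df = pd.DataFrame({
    "test preparation course": ["yes", "no", "yes", "no"],
    "math score": [88, 61, 75, 58],
    "reading score": [90, 70, 72, 65],
})

# Mean scores per test-preparation group.
means = df.groupby("test preparation course")[["math score", "reading score"]].mean()
print(means)
```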
4.4 Data Visualization

Importing libraries and reading the dataset path: The required libraries are imported to perform operations and analyze the project by executing the following commands in a Jupyter notebook, and the dataset path, pointing to a CSV file, is set.
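A sketch of this step is below. In the project the path would point at the Kaggle CSV file on disk; since that file name is not given here, a small inline CSV (with illustrative columns) keeps the sketch runnable.

```python
import io
import pandas as pd

# Stand-in for the dataset CSV; in practice this would be
# pd.read_csv("<path to the Kaggle dataset>.csv").
csv_text = """gender,lunch,math score
female,yes,72
male,no,47
female,yes,90
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)               # → (3, 3): three rows, three columns
print(df.columns.tolist())
```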
4.5 Data Preprocessing
Data preprocessing is the cleaning of the data, the next step in the execution of our project. This technique is applied at an early stage of machine learning and improves the quality of the data. Data preprocessing transforms raw data into a format that is easier and more efficient to work with.
Handling missing values:
To handle the missing values in our data there are two methods: 1. removing the records with null values from the dataset; 2. filling the null values with the column's mean. Here we fill the null values with their mean, because deleting records would make the dataset smaller. The missing values are filled with their mean value by executing the following command.
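The mean-imputation step can be sketched with pandas' fillna; the column and values are illustrative.

```python
import numpy as np
import pandas as pd

# One column with a missing entry, standing in for a dataset column.
df = pd.DataFrame({"math score": [72.0, np.nan, 90.0, 66.0]})

# Replace NaN with the column mean instead of dropping the record.
df["math score"] = df["math score"].fillna(df["math score"].mean())
print(df["math score"].tolist())   # → [72.0, 76.0, 90.0, 66.0]
```

The mean of the observed values (72, 90, 66) is 76, so the record is kept rather than discarded.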
CHAPTER-5

OUTPUT

Applying LGBM Regressor Algorithm to get the result :


CHAPTER-6

CONCLUSION

In this project we have used the Light Gradient Boosting Machine (LGBM) Regressor algorithm to obtain accurate results. The LGBM algorithm gives good accuracy and can handle large amounts of data. Using this algorithm, we have predicted how many marks each student in the sample data would get.
