PCE20CS705 - Rameshber Goswami - ITSREPORT


An

Industrial Training Report on


MACHINE LEARNING & DATA SCIENCE
At
POORNIMA INSTITUTE OF ENGINEERING & TECHNOLOGY,
JAIPUR (RAJASTHAN)
Submitted in partial fulfillment of the requirements for
the award of the degree of
Bachelor of Technology
in
Computer Engineering

(Session 2022-23)

Submitted to:
Dr. Nikita Jain
Assistant Professor
(Coordinator, Industrial Training)

Submitted by:
Rameshber Goswami
PCE20CS705
Class: 5CS-C

DEPARTMENT OF COMPUTER ENGINEERING


POORNIMA COLLEGE OF ENGINEERING, JAIPUR
RAJASTHAN TECHNICAL UNIVERSITY, KOTA
SEPTEMBER, 2022
DECLARATION

I hereby declare that the work which is being presented in the Industrial Training report titled
Machine Learning & Data Science in partial fulfillment for the award of the Degree of
Bachelor of Technology in Computer Engineering and submitted to the Department of
Computer Engineering, Poornima College of Engineering, Jaipur, is an authentic record of
my work carried out at Poornima Institute of Engineering & Technology, Jaipur (Rajasthan)
during the session 2022-23.

I have not submitted the matter presented in this report anywhere for the award of any other
Degree.

Signature of the Student


Name: Rameshber Goswami
Reg. No.: PCE20CS705
Place: Jaipur
Date: 10/10/2022

Training Certificate from Company

DEPARTMENT OF COMPUTER ENGINEERING
Date: 10/10/2022

CERTIFICATE
This is to certify that the Industrial Training report "Machine Learning & Data Science" has been submitted by Rameshber Goswami, PCE20CS705, in partial fulfillment for the award of the Degree of Bachelor of Technology in Computer Engineering during the session 2022-23. The industrial training work is found satisfactory and approved for submission.

Dr. Nikita Jain


Professor & Head
Department of Computer Engineering
Coordinators- Industrial Training

ACKNOWLEDGEMENT

I would like to convey my profound sense of reverence and admiration to my supervisors, Mr.
Deepak Moud, HOD (Computer Engineering), and Dr. Uday Pratap Singh, Assistant Professor,
Poornima Institute of Engineering & Technology, Jaipur (Rajasthan), for their intense concern,
attention, priceless direction, guidance, and encouragement throughout this internship.

I extend my heartiest gratitude to Dr. Nikita Jain, Coordinator (Industrial Training) & Head,
Department of Computer Engineering, Poornima College of Engineering, for her unwavering
support, guidance, and motivation during the course of this training.

I am grateful to Dr. Mahesh Bundele, Director, Poornima College of Engineering, for his
helpful attitude and keen interest in completing this training work on time.

I would like to express my deep sense of gratitude towards the management of Poornima College
of Engineering including Dr. S. M. Seth, Chairman Emeritus, Poornima Group, and former
Director NIH, Roorkee, Shri Shashikant Singhi, Chairman, Poornima Group, Mr. M. K. M.
Shah, Director Admin & Finance, Poornima Group, and Ar. Rahul Singhi, Director of
Poornima Group for the establishment of the institute and providing facilities for my studies.

I am deeply thankful to my parents and all other family members for their blessings and
inspiration. Last but not least, I would like to give special thanks to God, who enabled me to
complete my training work on time.

Rameshber Goswami

PCE20CS705

TABLE OF CONTENTS

PARTICULARS

Title Page
Candidate’s Declaration
Training Certificate
Certificate by the Department
Acknowledgment
Table of Contents
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 About Company
1.2 Training Platform
1.3 Training Starting Date
1.4 Training End Date
1.5 Total Training Duration
1.6 Date of Certification
1.7 Training Photos
1.8 Conclusion
Chapter 2: Technical Training Platform
2.1 Introduction
2.2 Reason for selecting this platform
2.3 Profile of Organization
2.4 Conclusion
Chapter 3: Overview of Technology Learned
3.1 Python and its Libraries
3.2 Machine Learning and Data Science
Chapter 4: Project Description
4.1 Description
4.2 Project Snapshots
Conclusion
References

LIST OF TABLES

Table No. Title

1. Table of Contents
2. List of Tables
3. List of Figures



LIST OF FIGURES

Figure No. Figure Description

1. Training Photo 1
2. Training Photo 2
3. Training Photo 3
4. Training Photo 4
5. Training Photo 5
6. NumPy Example
7. Data Visualization
8. Types of Plots
9. Matplotlib Example
10. Data Science
11. Types of Machine Learning
12. Project
13. Project (GitHub)



ABSTRACT

The field of Machine Learning is concerned with the question of how to construct computer
programs that automatically improve with experience. The answer to that question lies in data.
Machine Learning is considered a subset of AI that uses statistical methods to enable machines
to improve with experience. It enables a computer system to make decisions in order to carry
out a certain task. These programs or algorithms are designed in such a way that they can learn
and improve over time as they observe new data. Machine Learning aims to derive meaning
from data; thus, data is the key to unlocking Machine Learning. The more high-quality data an
ML algorithm has, the more accurate it tends to become.
Data science is the study of data to establish its origin, its content, and how it can be of
benefit. It equips you to extract meaning from large amounts of complex data. The data may be
either structured or unstructured, and the goal is to obtain valuable insights about business or
market patterns to help inform business decisions. Data scientists are specialists who work to
convert raw data into meaningful business insights. They are usually trained and skilled in
algorithmic coding, data mining, machine learning, and statistics. Data science also
incorporates fields like mathematics, statistics, and computation to understand and present
data. The two fields are related the way squares and rectangles are: every square is a
rectangle, but not every rectangle is a square. Data science is the rectangle, while machine
learning is the square, and each requires its own skill set. Data science involves researching,
building, and interpreting a model, while machine learning involves producing that model. Data
science uses a scientific approach to obtain meaning from data, while machine learning deals
with programming systems to automate and improve learning from data. Machine learning
cannot exist without data science, since the data must be prepared before the model can be
created, trained, and tested.
CHAPTER 1
INTRODUCTION
1.1 About Company
I completed an in-house Summer Internship offered by Poornima Institute of Engineering &
Technology, Sitapura, Jaipur 302022 (Rajasthan). Poornima Institute of Engineering &
Technology (PIET) is a constituent college of the Poornima Group of Colleges. The institute
was established in 2007 in Jaipur, Rajasthan. PIET offers a 4-year B.Tech program in 4
disciplines with an annual intake of 420 students. The institute is affiliated with Rajasthan
Technical University (RTU).
The institute collaborates with IBM Lab for research on Business Intelligence and Cloud
Computing.
The institute has an MTLC (Mission 10X Technology Learning Center) by Wipro.
The institute organizes several workshops on technical and non-technical topics and has
tie-ups with industry and academia.
The institute collaborates with Wipro for the Wipro Mission10X program.
It has two Centres of Excellence recognized by Rajasthan Technical University:
Integrated Design and Innovations in Advanced Digital Manufacturing, and AI & Big Data.

1.2 Training Platform

I did my Summer Internship in Machine Learning & Data Science at Poornima Institute of
Engineering & Technology. The institute offered an in-house internship opportunity for
students of Poornima Group. Being a student of Poornima College of Engineering, which is also
a part of Poornima Group, I took up that opportunity. The training was conducted offline at PIET.

Our training was held in the Neural Network & Deep Learning Laboratory in offline mode for 45
days.

1.3 Training Starting Date


The Summer Internship in Machine Learning & Data Science started on 27th June 2022.


1.4 Training End Date

The Summer Internship in Machine Learning & Data Science, offered by PIET, ended on 8th
August 2022.

1.5 Total Training Duration

The Summer Internship in Machine Learning & Data Science, offered by PIET, gave us exposure
to the industry and how it works. The program included sessions and industrial visits that
showed us real industry scenarios. The total duration of the training was 45 days.

1.6 Date of Certification

The industrial training lasted 45 days. After its completion, we were assigned major projects
that had to be submitted in order to receive our certificates. A project exhibition was held on
18th August 2022, and I received my certificate that day after submitting my major project.
1.7 Training Photos

Fig.no-1: Training Photo 1

Fig.no-2: Training Photo 2

Fig.no-3: Training Photo 3
Visit to Bhamashah Techno Hub

Fig.no-4: Training Photo 4
Visit to Bhamashah Techno Hub

Fig.no-5: Training Photo 5
Visit to Auriga IT Solutions

1.8 Conclusion

The Summer Internship focused on Machine Learning & Data Science. The institute and our
training coordinators worked hard to train us. During the internship, I learned, starting from
the basics, Python, NumPy, Pandas, and machine learning techniques. I can now perform data
cleaning, data scraping, and data manipulation, and present conclusions in a format that
clients can understand.

CHAPTER 2
TECHNICAL TRAINING PLATFORM

2.1 Introduction
I did my Summer Internship in Machine Learning & Data Science at Poornima Institute of
Engineering & Technology. The institute offered an in-house internship opportunity for students
of Poornima Group. Being a student of Poornima College of Engineering, which is also a part of
Poornima Group, I took up that opportunity. The training was conducted offline at PIET.

Our training was held in the Neural Network & Deep Learning Laboratory in offline mode
for 45 days.

2.2 Reason for selecting this platform


I selected Poornima Institute of Engineering & Technology for my summer training because the
institute was offering an in-house internship program in offline mode. The institute
collaborates with IBM Lab for research on Business Intelligence and Cloud Computing, and it has
an MTLC (Mission 10X Technology Learning Centre) by Wipro. The institute is also doing well in
the field of Artificial Intelligence and Machine Learning. That is why I chose to do my summer
internship training at Poornima Institute of Engineering & Technology.

2.3 Profile of Organization


Poornima Institute of Engineering & Technology (PIET) offers a 4-year B.Tech program
in 4 disciplines with an annual intake of 420 students. The institute is affiliated with
Rajasthan Technical University (RTU).
The institute collaborates with IBM Lab for research on Business Intelligence and Cloud Computing.
The institute has an MTLC (Mission 10X Technology Learning Center) by Wipro.
The institute organizes several workshops on technical and non-technical topics.
It has two Centres of Excellence recognized by Rajasthan Technical University:
Integrated Design and Innovations in Advanced Digital Manufacturing, and AI & Big Data.
2.4 Conclusion
The institute offered an in-house internship opportunity for students of Poornima Group. Being
a student of Poornima College of Engineering, which is also a part of Poornima Group, I took up
that opportunity. The training was conducted offline at PIET.

Our training was held in the Neural Network & Deep Learning Laboratory in offline mode for 45
days.

The Summer Internship focused on Machine Learning & Data Science. The institute and our
training coordinators worked hard to train us. During the internship, I learned, starting from
the basics, Python, NumPy, Pandas, and machine learning techniques. I can now perform data
cleaning, data scraping, and data manipulation, and present conclusions in a format that
clients can understand.

CHAPTER 3
OVERVIEW OF TECHNOLOGY LEARNED

3.1 Python and its Libraries


Python is a high-level, interpreted scripting language developed in the late 1980s by Guido van
Rossum at the National Research Institute for Mathematics and Computer Science in the
Netherlands. The initial version was published on the alt.sources newsgroup in 1991, and version
1.0 was released in 1994.

Python 2.0 was released in 2000, and the 2.x versions were the prevalent releases until December
2008. At that time, the development team decided to release version 3.0, which contained a few
relatively small but significant changes that were not backward compatible with the 2.x
versions. Python 2 and 3 are very similar, and some features of Python 3 have been backported to
Python 2. In general, however, the two are not fully compatible.
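Two well-known changes illustrate the incompatibility: in Python 3, print is a function rather than a statement, and / applied to two integers performs true division (in Python 2 it truncated the result). The short snippet below, chosen for illustration rather than as an exhaustive list of differences, shows the Python 3 behaviour.

```python
# Python 3 behaviour of two features that changed from Python 2.
# In Python 2, `print "hi"` was a statement and 7 / 2 evaluated to 3.
print("print is a function in Python 3")

true_div = 7 / 2    # true division: 3.5 in Python 3
floor_div = 7 // 2  # floor division: 3 in both Python 2 and Python 3
print(true_div, floor_div)
```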

Python Libraries

The Python libraries mainly used for machine learning and data science are as follows:

1. NumPy

2. Matplotlib

3. Pandas

1. NumPy

NumPy stands for Numerical Python. It is a Python package for the computation and
processing of single-dimensional and multidimensional arrays. Travis Oliphant created the
NumPy package in 2005 by incorporating features of the competing Numarray module into its
predecessor, Numeric. It is an extension module for Python and is mostly written in C.
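As a minimal illustration of single-dimensional and multidimensional arrays, the sketch below builds one of each with np.array and inspects their number of dimensions and shape. The values are arbitrary.

```python
import numpy as np

# A one-dimensional array built from a Python list
a = np.array([1, 2, 3, 4])

# A two-dimensional array: 2 rows x 3 columns
m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ndim, a.shape)  # 1 (4,)
print(m.ndim, m.shape)  # 2 (2, 3)
print(m.dtype)          # an integer dtype; its exact width depends on the platform
```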


The Need for NumPy

With the rise of data science, data analysis libraries like NumPy, SciPy, and Pandas have seen a
lot of growth. With a simpler syntax than many other programming languages, Python has become
the first-choice language for many data scientists.

NumPy provides a convenient and efficient way to handle vast amounts of data. It is also very
convenient for matrix multiplication and data reshaping. NumPy is fast, which makes it
practical to work with large data sets.
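Reshaping and matrix multiplication, the two conveniences mentioned above, can be sketched in a few lines. The matrix here is arbitrary example data; the @ operator is equivalent to np.matmul.

```python
import numpy as np

# Reshape a flat array of six numbers into a 2 x 3 matrix
flat = np.arange(6)      # [0 1 2 3 4 5]
a = flat.reshape(2, 3)   # [[0 1 2], [3 4 5]]

# Matrix multiplication: (2 x 3) @ (3 x 2) -> (2 x 2)
b = a.T                  # transpose, shape (3, 2)
product = a @ b          # same as np.matmul(a, b)
print(product)           # [[ 5 14] [14 50]]
```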

NumPy offers the following advantages for data analysis:

1. NumPy performs array-oriented computing.

2. It efficiently implements multidimensional arrays.

3. It performs scientific computations.

4. It is capable of performing Fourier transforms and reshaping the data stored in
multidimensional arrays.
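The Fourier-transform capability can be sketched with NumPy's np.fft module. The example below assumes a synthetic 5 Hz sine wave sampled 64 times over one second, and recovers its frequency as the bin with the largest magnitude in the spectrum.

```python
import numpy as np

# Sample a pure 5 Hz sine wave at 64 samples per second for one second
fs = 64
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 5 * t)

# Real-input FFT; rfftfreq gives the frequency (in Hz) of each output bin
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The bin with the largest magnitude corresponds to the sine's frequency
peak = freqs[np.argmax(np.abs(spectrum))]
print(peak)  # 5.0
```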

Nowadays, NumPy in combination with SciPy and Matplotlib is often used as a replacement for
MATLAB, since Python is widely regarded as a more complete and easier programming language.
Fig.no-6: NumPy

2. Matplotlib

The human mind grasps a visual representation of data more readily than textual data. We
understand things more easily when they are visualized. Representing data through graphs lets
us analyze it more efficiently and make specific decisions based on that analysis. Before
learning Matplotlib, we need to understand data visualization and why it is important.

Data Visualization

Fig.no-7: Data Visualization


Data visualization is a relatively new term. It expresses an idea that involves more than just
representing data in graphical form instead of textual form.

There are five key plots that are used for data visualization.

Fig.no-8: Types of plots

Need for Matplotlib

o It identifies areas that need improvement and attention.

o It clarifies the factors involved.

o It helps to understand which product to place where.

o It helps predict sales volumes.

Example:

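from matplotlib import pyplot as plt

# Sample data: x-coordinates and the matching y-values
x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]

# Connect the points with a line in the order given, then display the figure
plt.plot(x, y)
plt.show()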
Output:

Fig.no-9: Matplotlib


3. Pandas

Pandas is an open-source library that provides high-performance data manipulation in Python.
The name Pandas is derived from the term "panel data", an econometrics term for
multidimensional data. It is used for data analysis in Python and was developed by Wes
McKinney in 2008.

Data analysis requires a lot of processing, such as restructuring, cleaning, or merging data.
Different tools are available for fast data processing, such as NumPy, SciPy, Cython, and
Pandas. We prefer Pandas because working with it is fast, simple, and more expressive than
with other tools.
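The cleaning and merging steps mentioned above can be sketched on two small tables. The student and city records below are hypothetical; the sketch drops a duplicated row, fills a missing mark with the column mean (one of several possible imputation choices), and joins the tables on a shared id column.

```python
import pandas as pd

# Hypothetical student records with a duplicated row and a missing mark
students = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "name": ["Asha", "Ravi", "Ravi", "Meena"],
    "marks": [78.0, 65.0, 65.0, None],
})
cities = pd.DataFrame({"id": [1, 2, 3], "city": ["Jaipur", "Kota", "Ajmer"]})

# Cleaning: drop exact duplicate rows, then fill the missing mark with the mean
clean = students.drop_duplicates().copy()
clean["marks"] = clean["marks"].fillna(clean["marks"].mean())

# Merging: join the two tables on the shared "id" column
merged = clean.merge(cities, on="id", how="left")
print(merged)
```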

Benefits of Pandas:

o Data Representation: It represents data in a form suited for data analysis through its
DataFrame and Series structures.
o Clear code: The clear API of Pandas allows you to focus on the core part of the code,
so it lets the user write clear and concise code.

Python Pandas Data Structures

Pandas provides two data structures for processing data, Series and DataFrame,
which are discussed below:

1. Series is defined as a one-dimensional array that is capable of storing various data types.
The row labels of a Series are called the index. We can easily convert a list, tuple, or
dictionary into a Series using the pd.Series() constructor. A Series cannot contain multiple
columns. Its main parameter is:

Data: It can be any list, dictionary, or scalar value.
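The three kinds of data listed above can each be passed to pd.Series. In this sketch (values are arbitrary), a list receives a default integer index, dictionary keys become the index labels, and a scalar is repeated once for every label of an explicitly supplied index.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0..n-1
from_list = pd.Series([10, 20, 30])

# From a dictionary: the keys become the index labels
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a scalar: an explicit index is needed; the value is repeated for each label
from_scalar = pd.Series(5, index=["x", "y", "z"])

print(from_list.index.tolist())  # [0, 1, 2]
print(from_dict["b"])            # 2
print(from_scalar.tolist())      # [5, 5, 5]
```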


Creating Series from Array:

import pandas as pd
import numpy as np

# Wrap a NumPy array of characters in a Series; pandas adds a default 0..5 index
info = np.array(['P', 'a', 'n', 'd', 'a', 's'])
a = pd.Series(info)
print(a)

Output

0 P
1 a
2 n
3 d
4 a
5 s
dtype: object
2. DataFrame is a widely used pandas data structure that works with a two-dimensional
array with labelled axes (rows and columns). A DataFrame is a standard way to store data and
has two different indexes, i.e., a row index and a column index.

Example:

import pandas as pd

# A plain list becomes a single-column DataFrame: column label 0, row labels 0 and 1
x = ['Python', 'Pandas']
df = pd.DataFrame(x)
print(df)

Output

        0
0  Python
1  Pandas
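To show the row index and column index explicitly, the sketch below builds a DataFrame from a dictionary (keys become column labels) with hand-picked row labels, then selects one cell by row and column label with .loc. The row labels are arbitrary; the release years are the ones given earlier in this chapter.

```python
import pandas as pd

# Dictionary keys become column labels; `index` sets the row labels
df = pd.DataFrame(
    {"library": ["Python", "Pandas"], "first_release": [1991, 2008]},
    index=["language", "package"],
)

print(df.columns.tolist())           # ['library', 'first_release']  (column index)
print(df.index.tolist())             # ['language', 'package']  (row index)
print(df.loc["package", "library"])  # Pandas  (row label, column label)
```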


3.2 Machine Learning and Data Science

Data Science

Data science is the in-depth study of massive amounts of data. It involves extracting
meaningful insights from raw, structured, and unstructured data, processed using the
scientific method, different technologies, and algorithms.

It is a multidisciplinary field that uses tools and techniques to manipulate data so that you can
find something new and meaningful.

Data science uses powerful hardware, programming systems, and efficient algorithms to solve
data-related problems, and it is widely seen as central to the future of artificial intelligence.
In short, we can say that data science is all about:

o Asking the correct questions and analysing the raw data.

o Modelling the data using various complex and efficient algorithms.

o Visualizing the data to get a better perspective.

o Understanding the data to make better decisions and finding the final result.
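The activities listed above can be walked through on a toy dataset. The sketch below uses pandas with hypothetical sales figures: it poses a question, aggregates the raw rows, and reads off an answer. A chart (for example with Matplotlib) would be the visualization step and is omitted to keep the example short.

```python
import pandas as pd

# Hypothetical raw sales records
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "North"],
    "units": [120, 80, 150, 60, 95, 130],
})

# Question: which region sells the most units on average?
# Analysis: aggregate the raw rows by region
avg_units = sales.groupby("region")["units"].mean()

# Understanding: pick the region with the highest average
best_region = avg_units.idxmax()
print(avg_units)
print("Best region:", best_region)  # North
```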

Need for Data Science

Some years ago, there was less data, and it was mostly available in a structured form that could
easily be stored in Excel sheets and processed using BI tools.

In today's world, however, data has become so vast that approximately 2.5 quintillion bytes of
data are generated every day, leading to a data explosion. It has been estimated that by 2020,
1.7 MB of data would be created every second for every person on earth. Every company requires
data to work, grow, and improve its business.

Handling such a huge amount of data is a challenging task for every organization. Handling,
processing, and analysing it requires complex, powerful, and efficient algorithms and
technology, and that technology came into existence as data science.


Following are some main reasons for using data science technology:

o Data science technology is being adopted by companies of every size, from big brands to
start-ups. Google, Amazon, Netflix, and other companies that handle huge amounts of data use
data science algorithms to provide a better customer experience.

o Data science is being used to automate transportation, for example in self-driving cars,
which many consider the future of transportation.

o Data science can help with different predictions, such as survey outcomes, elections, flight
ticket confirmation, etc.
Fig.no-10: Data Science


Machine Learning

Machine learning is a subset of AI that enables a machine to automatically learn from data,
improve its performance from past experience, and make predictions. Machine learning comprises
a set of algorithms that work on large amounts of data. Data is fed to these algorithms to train
them, and based on that training, they build a model and perform a specific task.

These ML algorithms help solve different business problems such as regression, classification,
forecasting, clustering, and association.
Based on the method of learning, machine learning is mainly divided into four types:

1. Supervised Machine Learning

2. Unsupervised Machine Learning

3. Semi-Supervised Machine Learning

4. Reinforcement Learning

Fig.no-11: Types of Machine Learning


1. Supervised Machine Learning

As its name suggests, supervised machine learning is based on supervision. In the supervised
learning technique, we train the machine using a "labelled" dataset, and based on that
training, the machine predicts the output. Here, labelled data means that the inputs are already
mapped to their outputs. More precisely, we first train the machine with inputs and their
corresponding outputs, and then we ask it to predict the outputs for a test dataset.
The main goal of the supervised learning technique is to map the input variable (x) to the
output variable (y). Some real-world applications of supervised learning are risk assessment,
fraud detection, spam filtering, etc.
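The train-then-predict cycle described above can be sketched with a one-nearest-neighbour classifier written in plain NumPy. The dataset is hypothetical, and 1-NN is simply one compact supervised method chosen for brevity, not one named in the text: the model is "trained" on labelled examples, then asked to predict labels for a held-out test set.

```python
import numpy as np

# Hypothetical labelled dataset: hours studied (x) and pass/fail label (y)
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = fail, 1 = pass

# Hold out two samples as the test set; train on the rest
test_idx = np.array([1, 4])
train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

def predict_1nn(X_train, y_train, X_new):
    """Predict each new point's label as the label of its nearest training point."""
    # Pairwise Euclidean distances, shape (n_new, n_train)
    dists = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(dists, axis=1)]

y_pred = predict_1nn(X_train, y_train, X_test)
print(y_pred, "accuracy:", np.mean(y_pred == y_test))
```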

Supervised machine learning can be classified into two types of problems, which are given
below:

o Classification

o Regression

Classification

Classification algorithms are used to solve classification problems in which the output
variable is categorical, such as "Yes" or "No", Male or Female, Red or Blue, etc. Classification
algorithms predict the categories present in the dataset.

Some popular classification algorithms are given below:

o Random Forest Algorithm

o Decision Tree Algorithm

o Logistic Regression Algorithm

o Support Vector Machine Algorithm
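Of the algorithms listed above, logistic regression is the easiest to write from scratch. The sketch below fits it by batch gradient descent on the log-loss using NumPy, on hypothetical, linearly separable one-dimensional data; the feature is centred first, and the learning rate and iteration count are arbitrary. In practice a library implementation would normally be used.

```python
import numpy as np

# Hypothetical 1-D data: label 1 for large feature values, 0 for small ones
X = np.array([0.5, 1.0, 1.5, 2.0, 4.0, 4.5, 5.0, 5.5])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Centre the feature so that gradient descent converges quickly
x = X - X.mean()

def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Fit weight w and bias b by batch gradient descent on the mean log-loss
w, b, lr = 0.0, 0.0, 0.1
for _ in range(1000):
    p = sigmoid(w * x + b)          # predicted probability of class 1
    w -= lr * np.mean((p - y) * x)  # gradient of the mean log-loss w.r.t. w
    b -= lr * np.mean(p - y)        # gradient of the mean log-loss w.r.t. b

# Classify by thresholding the predicted probability at 0.5
pred = (sigmoid(w * x + b) >= 0.5).astype(int)
print(pred)  # [0 0 0 0 1 1 1 1]
```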


Regression

Regression algorithms are used to solve regression problems in which there is a relationship,
often assumed to be linear, between the input and output variables. They are used to predict
continuous output variables, such as market trends, weather, etc.

Some popular Regression algorithms are given below:


o Simple Linear Regression Algorithm

o Multivariate Regression Algorithm

o Decision Tree Algorithm

o Lasso Regression
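Simple linear regression, the first algorithm listed above, can be sketched with ordinary least squares via np.linalg.lstsq. The data below is hypothetical and generated without noise from y = 2x + 1, so the fitted slope and intercept should come out as 2 and 1.

```python
import numpy as np

# Hypothetical data generated from y = 2x + 1 (no noise, so the fit is exact)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# Design matrix with a column of ones for the intercept term
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: minimise ||A @ [slope, intercept] - y||^2
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(slope, 6), round(intercept, 6))  # 2.0 1.0

# Predict a continuous output for a new input
print(slope * 5 + intercept)  # approximately 11.0
```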

2. Unsupervised Machine Learning

Unsupervised learning differs from the supervised learning technique; as its name suggests,
there is no need for supervision. In unsupervised machine learning, the machine is trained
using an unlabeled dataset, and it predicts the output without any supervision.

In unsupervised learning, the models are trained with data that is neither classified nor
labelled, and the model acts on that data without any supervision.

The main aim of an unsupervised learning algorithm is to group or categorize an unsorted
dataset according to similarities, patterns, and differences. Machines are instructed to find
hidden patterns in the input dataset.


Unsupervised Learning can be further classified into two types, which are given below:

o Clustering

o Association

Clustering
The clustering technique is used when we want to find the inherent groups in the data. It groups
objects into clusters such that objects with the most similarities remain in one group and have
few or no similarities with objects in other groups. An example of clustering is grouping
customers by their purchasing behavior.

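Some of the popular clustering algorithms are given below:

o K-Means Clustering algorithm

o Mean-shift algorithm

o DBSCAN Algorithm

Principal Component Analysis and Independent Component Analysis are also widely used
unsupervised techniques, although they reduce or transform the dimensions of the data rather
than cluster it.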
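K-Means, the first clustering algorithm listed, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sketch below is a plain NumPy version of that loop (Lloyd's algorithm) applied to hypothetical customer data; the random initialisation, seed, and iteration cap are illustrative choices.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm): returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct data points at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: move each centroid to the mean of its points
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers described by (annual visits, average spend)
X = np.array([[2, 10], [3, 12], [2, 11],
              [20, 80], [22, 85], [21, 82]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)  # first three customers in one cluster, last three in the other
```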

Association

Association rule learning is an unsupervised learning technique that finds interesting relations
among variables in a large dataset. The main aim of this learning algorithm is to find the
dependency of one data item on another and map those variables accordingly, for example to
maximize profit. It is mainly applied in market basket analysis, web usage mining, continuous
production, etc.

Some popular association rule learning algorithms are the Apriori, Eclat, and FP-growth
algorithms.
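The two measures behind association rules are support (how often items appear together) and confidence (how often the right-hand item appears given the left-hand one). The sketch below scores every single-item rule {a} -> {b} on hypothetical market-basket transactions by exhaustive pair counting. This is not Apriori itself; Apriori avoids the exhaustive search by pruning any candidate itemset that has an infrequent subset. The 0.4 minimum-support threshold is an arbitrary choice.

```python
from itertools import combinations

# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Score every rule {lhs} -> {rhs} built from pairs of items
items = sorted(set().union(*transactions))
rules = []
for a, b in combinations(items, 2):
    for lhs, rhs in ((a, b), (b, a)):
        pair_support = support({lhs, rhs})
        if pair_support >= 0.4:  # minimum support threshold
            confidence = pair_support / support({lhs})
            rules.append((lhs, rhs, pair_support, confidence))

for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```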


3. Semi-Supervised Learning

Semi-supervised learning is a type of machine learning algorithm that lies between supervised
and unsupervised machine learning. It represents the intermediate ground between supervised
learning (with labelled training data) and unsupervised learning (with no labelled training
data), and it uses a combination of labelled and unlabeled datasets during training.

Although semi-supervised learning is the middle ground between supervised and unsupervised
learning and operates on data that contains a few labels, the data mostly consists of unlabeled
examples. Labels are costly to obtain, so for practical purposes an organization may have only
a few of them. This sets it apart from supervised and unsupervised learning, which are defined
by the presence and the absence of labels respectively.

The concept of semi-supervised learning was introduced to overcome the drawbacks of supervised
and unsupervised learning algorithms. Its main aim is to make effective use of all the available
data, rather than only the labelled data as in supervised learning. Initially, similar data is
clustered using an unsupervised learning algorithm, which then helps to turn the unlabeled data
into labelled data. This is useful because labelled data is comparatively more expensive to
acquire than unlabeled data.

We can illustrate these algorithms with an example. Supervised learning is like a student
learning under the supervision of an instructor at home and at college. If that student analyses
the same concept alone, without any help from the instructor, that is unsupervised learning.
Under semi-supervised learning, the student revises the concept on his own after first analysing
it under the guidance of an instructor at college.
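The "cluster first, then label" idea can be sketched as seeded k-means: the few labelled points seed one centroid per class, every point is assigned to its nearest centroid, and centroids are recomputed from the pseudo-labels until nothing changes. The data below is hypothetical and one-dimensional, and this is one simple variant among many semi-supervised methods, not the only way to do it.

```python
import numpy as np

# Hypothetical 1-D data: only two points are labelled; -1 marks "unlabelled"
X = np.array([1.0, 1.2, 1.4, 5.0, 5.3, 5.6, 1.1, 5.2])
y = np.array([0, -1, -1, 1, -1, -1, -1, -1])
known = y != -1  # mask of the original ground-truth labels

labels = y.copy()
for _ in range(10):
    # Class centroids from the currently labelled (or pseudo-labelled) points
    centroids = np.array([X[labels == c].mean() for c in (0, 1)])
    # Assign every point to its nearest centroid (a k-means-style step)
    new_labels = np.argmin(np.abs(X[:, None] - centroids[None, :]), axis=1)
    new_labels[known] = y[known]  # never overwrite the real labels
    if np.array_equal(new_labels, labels):
        break  # pseudo-labels have stopped changing
    labels = new_labels

print(labels)  # [0 0 0 1 1 1 0 1]
```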


4. Reinforcement Learning

Reinforcement learning works on a feedback-based process, in which an AI agent (a software
component) automatically explores its surroundings by trial and error, taking actions, learning
from experience, and improving its performance. The agent is rewarded for each good action and
punished for each bad action; hence, the goal of a reinforcement learning agent is to maximize
its rewards.

Unlike supervised learning, reinforcement learning uses no labelled data; agents learn only
from their experiences.

The reinforcement learning process is similar to the way a human being learns; for example, a
child learns various things through experience in day-to-day life. An example of reinforcement
learning is playing a game, where the game is the environment, the agent's moves at each step
take it from one state to the next, and the goal of the agent is to get a high score. The agent
receives feedback in terms of punishments and rewards.

Owing to the way it works, reinforcement learning is employed in fields such as game theory,
operations research, information theory, and multi-agent systems.

A reinforcement learning problem can be formalized using a Markov Decision Process (MDP). In an
MDP, the agent constantly interacts with the environment and performs actions; after each
action, the environment responds and generates a new state.
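To make the MDP loop concrete, the sketch below applies tabular Q-learning, one standard reinforcement learning algorithm (not one named in the text), to a hypothetical five-state corridor where reaching the last state earns a reward of +1. After each action the agent updates Q(s, a) towards r + γ·max Q(s′, a′). The learning rate α, discount γ, exploration rate ε, seed, and episode count are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical corridor MDP: states 0..4, state 4 is the goal (terminal).
# Actions: 0 = move left, 1 = move right. Reaching the goal gives reward +1.
N_STATES, GOAL = 5, 4

def step(state, action):
    """Environment: return (next_state, reward, done) for one action."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))  # action-value table Q[state, action]
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(200):  # episodes
    state, done = 0, False
    while not done:
        # Epsilon-greedy: sometimes explore at random, otherwise exploit Q
        if rng.random() < epsilon:
            action = int(rng.integers(2))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update towards reward + discounted best future value
        target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

policy = np.argmax(Q[:GOAL], axis=1)  # greedy action in each non-terminal state
print(policy)  # [1 1 1 1]: always move right
```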

Reinforcement learning is mainly categorized into two types of methods:

o Positive Reinforcement Learning: Positive reinforcement increases the tendency that the
required behavior will occur again by adding something. It strengthens the agent's behavior
and has a positive impact on it.

o Negative Reinforcement Learning: Negative reinforcement works in the opposite way to
positive reinforcement. It increases the tendency that a specific behavior will occur again by
avoiding a negative condition.


CHAPTER 4
PROJECT DESCRIPTION
Description

Title: Predict Dropout or Academic Success

The project Predict Dropout or Academic Success aims to contribute to the reduction of
academic dropout and failure in higher education by using machine learning techniques to
identify students at risk at an early stage of their academic path, so that strategies to support
them can be put into place. The dataset includes information known at the time of student
enrollment: academic path, demographics, and socio-economic factors. The problem is formulated
as a three-category classification task (dropout, enrolled, and graduate) at the end of the
normal duration of the course.

The data is used to build classification models that predict students' academic success and
dropout. The three classes are strongly imbalanced towards one of them.

Predict Dropout or Academic Success is a machine learning model that predicts whether a student
will drop out or achieve academic success based on the given variables, i.e., curricular units
in the 1st and 2nd semesters, age, and gender.
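Because the classes are strongly imbalanced, a classifier can score well simply by favouring the majority class. One common remedy, sketched below with made-up labels rather than the project's real data, is to weight each class inversely to its frequency. The formula used, n_samples / (n_classes × class_count), is the one scikit-learn documents for class_weight="balanced"; whether the project used it is not stated here.

```python
import numpy as np

# Hypothetical target labels for the three classes (not the project's real data)
y = np.array(["Graduate"] * 6 + ["Dropout"] * 3 + ["Enrolled"] * 1)

# "Balanced" class weights: n_samples / (n_classes * count_of_class),
# so rarer classes get proportionally larger weights during training
classes, counts = np.unique(y, return_counts=True)
weights = len(y) / (len(classes) * counts)

for cls, cnt, w in zip(classes, counts, weights):
    print(f"{cls:<9} count={cnt}  weight={w:.2f}")
```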


Project Snapshots:
Fig.no-12: Project

Fig.no-13: Project (GitHub)

CONCLUSION
Machine learning can be supervised or unsupervised. If you have a smaller amount of clearly
labelled data for training, opt for supervised learning. Unsupervised learning generally gives
better performance and results on large data sets. If you have a huge data set easily available,
go for deep learning techniques. You have also learned about reinforcement learning and deep
reinforcement learning, and you now know what neural networks are, along with their applications
and limitations.

Finally, when it comes to developing your own machine learning models, you looked at the choice
of various development languages, IDEs, and platforms. The next thing you need to do is start
learning and practicing each machine learning technique. The subject is vast in breadth, but in
terms of depth, each topic can be learned in a few hours. Each topic is independent of the
others. Take one topic at a time, learn it, practice it, and implement its algorithm(s) in a
language of your choice. This is the best way to start studying machine learning. By practicing
one topic at a time, you will soon acquire the breadth eventually required of a machine learning
expert.

Machine learning is a powerful tool that can help you gain valuable insights from your data.
However, it is important to remember that it is still an art to master, and you need a good
understanding of how to organize and use data. Data science is a complex field that spans a
variety of domains, and machine learning is one of its most exciting parts. This technology can
help a business solve problems and make better decisions by using data to predict the future.
For example, machine learning algorithms can help prevent financial fraud: they analyze billions
of online transactions, recognize patterns in them, and use those patterns to generate insights
about new data.

Because machine learning derives its lessons from examples of good data, it is a versatile
and powerful tool. The applications of these techniques are wide-ranging, and there is no
shortage of opportunities for data scientists with expertise in them.


REFERENCES
[1] Machine Learning by Tom M. Mitchell
[2] Machine Learning Using Python by Manaranjan Pradhan
[3] Superintelligence by Nick Bostrom
[4] docs.python.org
[5] Building a Reproducible Machine Learning Pipeline (Paper)
[6] A Tour of End-to-End Machine Learning Platforms (Article)
[7] Efficient ML engineering: Tools and best practices (Article)
[8] MLOps.community (Community)
