Internship Report File
Bachelor of Technology
In
Computer Science & Engineering
by
Mohit Agrawal (16EELCS020)
I carried out my internship at Dzone Software Solution & Service Provider, Jaipur. Dzone
Software Solution & Service Provider represents the connected world, offering innovative
and customer-centric information technology experiences, enabling Enterprises,
Associates and the Society to Rise. Dzone Software Solution & Service Provider provides
internship opportunities to students in various emerging technologies.
The purpose of the program is to fulfil a core requirement for the award of the degree of
Bachelor of Technology in Computer Science and Engineering, to give a practical aspect to
the theoretical work studied at the university, to understand how the corporate sector
operates, and to enable students to gain experience in different tasks.
During my internship period, I was placed in the Machine Learning department, where
I was assigned to build a Disease Prediction Software. There I interacted with many
working professionals.
There I gained knowledge of how things actually work in an organization, such as the
complete procedure of implementing a project: understanding the problem, estimating the
cost of the project, choosing the methodology, and finally implementing the project.
In conclusion, this was an opportunity to develop and enhance skills and competencies in
my career field, and I was able to achieve this.
ACKNOWLEDGEMENT
I would like to take this opportunity to thank and express my deep sense of
gratitude to my corporate mentor Mr. Hemant Saxena and my faculty
mentor Prof. Arvind Singh Chaudhary. I am greatly indebted to both of
them for their valuable guidance at all stages of the study, their
advice, constructive suggestions, positive and supportive attitude and
continuous encouragement, without which it would not have been possible to
complete the project.
I would also like to thank my supervisor Ms. Surbhi Saxena, who helped me
a lot in completing my machine learning project during my internship period.
I hope that I can build upon the experience and knowledge that I have gained
and make a valuable contribution to this industry in the future.
Mohit Agrawal
CERTIFICATE
Table of Contents
INTRODUCTION
The Company
Dzone Software Solution & Service Provider represents the connected world, offering
innovative and customer-centric information technology experiences, enabling Enterprises,
Associates and the Society to Rise.
Dzone Software Solution & Service Provider provides services in the field of software
solutions for the application development sector and ERP design, with accelerated growth
over the last 10 years. Our mission is to provide our customers with cost-effective,
state-of-the-art products and services that enable them to implement straight-through
processes to better serve and retain their clients. We employ highly trained, specialized
and motivated people to deliver outstanding consulting, implementation and training services.
We believe in “Innovate from inside”, i.e. we offer innovative solutions to our valuable
customers that enable them to realize their full potential; we anticipate future trends and
demand by engaging in active dialogue with our customers. Our commitment to customer
satisfaction is matched only by a relentless quest for strategic alliances
with world-class software vendors and business consultants that help us expand and
improve our value proposition to the benefit of our customers.
Vision
We will Rise™ to be among the top three leaders in each of our chosen market segments
while fostering innovation and inclusion.
We will consistently achieve top quartile growth by contributing to our customers' success,
by enabling our employees to realize their potential and by creating value for all our
stakeholders.
History
Dzone Software Solution & Service Provider started in 2010 as a technology outsourcing company.
SOLUTIONS AND SERVICES
Python Programming
Swift Programming
JavaScript
Java
Infrastructure and Cloud Services
Mobile App Development
Customer Experience
DevOps
Enterprise Architecture
Machine Learning
Machine Learning
Introduction
Learning Algorithms
The types of machine learning algorithms differ in their approach, the type of data they
input and output, and the type of task or problem that they are intended to solve.
o Supervised Learning
o Semi-Supervised Learning
o Unsupervised Learning
o Reinforcement Learning
o Feature Learning
Features of Machine Learning
2) Explore new learning methods and develop general learning algorithms independent of
applications.
5) ML will produce smarter computers capable of all the above intelligent behavior.
ML Applications
There are many machine learning applications in the market. The top categories are:
o Banking
o Financial Market Analysis
o Medical Diagnosis
o Natural Language Processing
o Sentiment Analysis
o Recommendation Systems
o Time Series Forecasting etc.
History
Arthur Samuel, an American pioneer in the field of computer gaming and artificial
intelligence, coined the term "Machine Learning" in 1959 while at IBM.
A representative book of machine learning research during the 1960s was Nilsson's
book Learning Machines, which dealt mostly with machine learning for pattern
classification.
However, an increasing emphasis on the logical, knowledge-based approach caused a rift
between AI and machine learning. Probabilistic systems were plagued by theoretical and
practical problems of data acquisition and representation.
HL vs ML
Machine Learning Architecture
Machine Learning Algorithms
Supervised Learning
Supervised learning is a machine learning technique for learning a function from training
data. The training data consist of pairs of input objects (typically vectors), and desired
outputs. The output of the function can be a continuous value (called regression), or can
predict a class label of the input object (called classification).
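To make the two kinds of output concrete, here is a minimal sketch using only the Python standard library and made-up toy data. The same idea of learning from (input, desired output) pairs is used once to fit a straight line by ordinary least squares (regression) and once to predict a class label with a nearest-neighbour rule (classification). The function names and data are illustrative assumptions, not part of the project.

```python
# Toy illustration of supervised learning: learn from (input, desired output) pairs.

def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = a*x + b (continuous output)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def nearest_neighbour(train, query):
    """Classification: return the label of the closest training vector (discrete output)."""
    def sq_dist(u, v):
        return sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], query))
    return label

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 2.0 1.0

train = [((1.0, 1.0), "healthy"), ((5.0, 5.0), "sick")]
print(nearest_neighbour(train, (4.5, 4.0)))  # sick
```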
Unsupervised Learning -
Unsupervised learning is a type of machine learning where manual labels of inputs are not
used. It is distinguished from supervised learning approaches which learn how to perform
a task, such as classification or regression, using a set of human prepared examples.
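As an illustration of learning without labels, the sketch below runs k-means clustering, one common unsupervised algorithm chosen here as an example, on a few made-up 2-D points. It only groups points by similarity; no label is ever supplied. Initialising the centroids with the first k points is a simplifying assumption for determinism.

```python
# Minimal k-means clustering on unlabeled 2-D points.

def kmeans(points, k, iterations=10):
    # Initialise centroids with the first k points (simple, deterministic choice).
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster became empty
                centroids[i] = [sum(coord) / len(cluster) for coord in zip(*cluster)]
    return centroids, clusters

points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (2, 1), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters[0])  # [(1, 1), (1.5, 2), (2, 1)]
```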
Semi-supervised Learning -
Semi-supervised learning combines both labeled and unlabeled examples to generate an
appropriate function or classifier.
Reinforcement Learning -
In reinforcement learning, the algorithm learns a policy of how to act given an
observation of the world. Every action has some impact on the environment, and the
environment provides feedback that guides the learning algorithm.
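A minimal sketch of this loop, assuming a made-up five-state corridor as the environment and tabular Q-learning (one standard reinforcement-learning algorithm, picked here only as an example). The agent observes its state, acts, receives a reward as feedback, and updates its estimate of how good each action is in each state; the learned policy is the best action per state.

```python
import random

N_STATES = 5        # states 0..4; reaching state 4 ends the episode with reward 1
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # exploit current estimates, breaking ties randomly
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Feedback: move Q(state, action) toward reward + discounted future value.
            future = 0.0 if done else gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = next_state
    # Learned policy: the greedy action in every non-terminal state.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}

policy = q_learning()
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}, i.e. always move right toward the goal
```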
Transduction -
Similar to supervised learning, but does not explicitly construct a function.
Learning to Learn -
The algorithm learns its own inductive bias based on previous
experience.
Algorithms Types
Linear Classifiers -
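In machine learning, the goal of classification is to group items that have similar feature
values into the same class. A linear classifier makes this decision from a linear
combination of the features.
1. Fisher’s Linear Discriminant
2. Naïve Bayes Classifier
3. Perceptron
4. Support Vector Machine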
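The perceptron from the list above is the simplest of these to write out. This sketch, using only the standard library, learns weights w and bias b so that sign(w · x + b) gives the class, adjusting them only when a training point is misclassified. The toy data (logical AND with labels -1/+1) is an assumption for illustration.

```python
# Perceptron: a linear classifier that predicts sign(w . x + b).

def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """samples: list of feature tuples; labels: +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary: nudge the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every training point is on the correct side
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [-1, -1, -1, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees the loop stops once no point is misclassified.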
Decision Tree -
A decision tree is a hierarchical data structure implementing the divide-and-conquer
strategy. It is an efficient nonparametric method, which can be used for both classification
and regression. A decision tree is a hierarchical model for supervised learning whereby the
local region is identified in a sequence of recursive splits in a smaller number of steps. A
decision tree is composed of internal decision nodes and terminal leaves (see figure).
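Each recursive split has to choose an attribute and a threshold. The sketch below shows one split chosen with Gini impurity, a common criterion assumed here for illustration (the source does not name one). A full tree builder would apply `best_split` again to each child until the nodes are pure. The features and labels are made up.

```python
# Choosing one decision-tree split with Gini impurity.
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Try every (feature, threshold); return the lowest weighted child impurity."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= threshold]
            right = [lab for r, lab in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must send samples to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best  # (weighted impurity, feature index, threshold) or None

# Toy data: (fever 0/1, cough 0/1) -> diagnosis
rows = [(1, 1), (1, 0), (0, 1), (0, 0)]
labels = ["flu", "flu", "cold", "healthy"]
print(best_split(rows, labels))  # (0.25, 0, 0)
```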
Machine Learning Development Lifecycle
Machine learning projects are highly iterative; as you progress through the ML lifecycle,
you’ll find yourself iterating on a section until reaching a satisfactory level of performance,
then proceeding forward to the next task (which may be circling back to an even earlier
step).
Data Collection and labelling-
Model Exploration-
Model Refinement-
Evaluate model on test distribution; understand differences between train and test
set distributions (how is “data in the wild” different than what you trained on)
Revisit model evaluation metric; ensure that this metric drives desirable
downstream user behavior
Model Deployment-
Setting up a ML Codebase
data/ provides a place to store raw and processed data for your project.
docker/ is a place to specify one or many Docker files for the project.
models/ defines a collection of machine learning models for the task, unified by a common
API defined in base.py.
train.py defines the actual training loop for the model. This code interacts with the
optimizer and handles logging during training.
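The idea of models "unified by a common API defined in base.py" can be sketched as an abstract base class plus a training function that works with any subclass. The class names `BaseModel` and `MajorityClassModel` and the `train` signature are hypothetical; a real `train.py` would also drive an optimizer, which this simplified sketch omits.

```python
import logging
from abc import ABC, abstractmethod
from collections import Counter

logger = logging.getLogger(__name__)

class BaseModel(ABC):
    """Common API that every model in models/ would implement (hypothetical base.py)."""

    @abstractmethod
    def fit(self, X, y):
        """Learn parameters from inputs X and targets y; return self."""

    @abstractmethod
    def predict(self, X):
        """Return one prediction per input in X."""

class MajorityClassModel(BaseModel):
    """Simplest possible model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

def train(model, X_train, y_train, X_val, y_val):
    """Hypothetical train.py entry point: works with any BaseModel subclass."""
    model.fit(X_train, y_train)
    predictions = model.predict(X_val)
    accuracy = sum(p == t for p, t in zip(predictions, y_val)) / len(y_val)
    logger.info("validation accuracy: %.3f", accuracy)
    return accuracy
```

Because `train` only relies on `fit` and `predict`, a new model can be added to `models/` without touching the training loop.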
ML-based System Testing and Monitoring-
Training system processes raw data, runs experiments, manages results, stores weights.
Test the full training pipeline (from raw data to trained model) to ensure that
changes haven't been made upstream with respect to how data from our application
is stored. These tests should be run nightly/weekly.
Prediction system constructs the network, loads the stored weights, and makes
predictions.
Run inference on the validation data (already processed) and ensure model score
does not degrade with new model/weights. This should be triggered every code
push.
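A per-push check of this kind might look like the following sketch. It runs inference on already processed validation data and fails if accuracy drops below a stored baseline. The function name, the `tolerance` parameter and the threshold "model" in the example are assumptions for illustration.

```python
def check_model_not_degraded(predict, X_val, y_val, baseline_accuracy, tolerance=0.01):
    """Fail loudly if validation accuracy falls below the stored baseline minus a tolerance."""
    preds = [predict(x) for x in X_val]
    acc = sum(p == t for p, t in zip(preds, y_val)) / len(y_val)
    if acc < baseline_accuracy - tolerance:
        raise AssertionError(f"accuracy dropped: {acc:.3f} < baseline {baseline_accuracy:.3f}")
    return acc

# Example with a hypothetical threshold "model"
X_val = [0.2, 0.8, 0.6, 0.1]
y_val = [0, 1, 1, 0]
print(check_model_not_degraded(lambda x: int(x > 0.5), X_val, y_val, baseline_accuracy=0.9))  # 1.0
```

In a CI pipeline this function would typically be called from a test so that a failing check blocks the push.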
The serving system is exposed to accept "real world" input and perform inference on production
data. This system must be able to scale to demand.
Required monitoring:
Alerts for downtime and errors
Check for distribution shift in data
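One way to check a single numeric feature for distribution shift is the Population Stability Index (PSI), used here as an example technique the source does not prescribe. It compares binned proportions of training data with production data. A commonly quoted rule of thumb treats values above roughly 0.2 as a notable shift; that threshold, the bin count and the data below are assumptions to tune.

```python
import math

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """Compare a feature's training sample (expected) with production data (actual)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant training feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Bin edges come from the training range; out-of-range values go to the end bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train_feature = [i / 100 for i in range(100)]  # values seen during training
shifted = [0.5 + i / 200 for i in range(100)]  # production values drifted upward
print(population_stability_index(train_feature, list(train_feature)))  # 0.0
print(population_stability_index(train_feature, shifted) > 0.2)  # True
```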
Machine Learning Project Structure-
Various businesses use machine learning to manage and improve operations. While ML
projects vary in scale and complexity requiring different data science teams, their general
structure is the same.
Disease Prediction:
When a patient wants to consult a doctor, it may take a long time, or the patient may be
unable to consult a doctor at that moment. A solution to this problem is that the patient
can use Disease Prediction Software at a primary level.
In this case, a user or patient enters his or her symptoms into the software, and the
machine learning model then predicts the disease using machine learning algorithms.
Data is the foundation for any machine learning project. The second stage of project
implementation is complex and involves data collection, selection, preprocessing, and
transformation. Each of these phases can be split into several steps.
Data Collection
It’s time for a data analyst to pick up the baton and lead the way to machine learning
implementation. The job of a data analyst is to find ways and sources of collecting relevant
and comprehensive data, interpreting it, and analyzing results with the help of statistical
techniques.
Data Visualization
A large amount of information represented in graphic form is easier to understand and
analyze. Some companies specify that a data analyst must know how to create slides,
diagrams, charts, and templates.
Data Cleaning
This set of procedures removes noise and fixes inconsistencies in data. A data
scientist can fill in missing data using imputation techniques, e.g. substituting missing
values with the mean value of the attribute.
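Mean imputation can be sketched in plain Python as below, assuming missing values are stored as `None` and each column is numeric. The column names in the comment and the numbers are made up. With pandas, the project's data library, the equivalent is typically `df.fillna(df.mean(numeric_only=True))`.

```python
# Mean imputation: replace missing values (None) in each column with that column's mean.

def impute_mean(rows):
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        present = [r[c] for r in rows if r[c] is not None]
        means.append(sum(present) / len(present) if present else None)
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)] for r in rows]

data = [[36.5, 70], [None, 80], [38.5, None]]  # [temperature, heart rate] with gaps
print(impute_mean(data))  # [[36.5, 70], [37.5, 80], [38.5, 75.0]]
```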
3. Dataset Splitting
A dataset used for machine learning should be partitioned into three subsets - training, test,
and validation sets.
Training Set:
A data scientist uses a training set to train a model and define its optimal parameters -
parameters it has to learn from data.
Testing Set:
A test set is needed to evaluate the trained model and its capability for
generalization, that is, a model’s ability to identify patterns in new, unseen data
after having been trained on the training data. It’s crucial to use different subsets for
training and testing; otherwise overfitting, the inability to generalize mentioned
above, goes undetected.
Validation Set:
The purpose of a validation set is to tweak a model’s hyperparameters, higher-level
structural settings that can’t be directly learned from data. These settings can express, for
instance, how complex a model is and how fast it finds patterns in data.
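The three-way split described above can be sketched as a single shuffle followed by cutting the list into disjoint parts. The 70/15/15 proportions and the fixed seed are assumptions for illustration; a fixed seed makes the split reproducible between runs.

```python
import random

def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a copy once, then cut it into three disjoint subsets."""
    items = list(data)
    random.Random(seed).shuffle(items)  # random but reproducible order
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```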
4. Modelling
During this stage, a data scientist trains numerous models to define which one of them
provides the most accurate predictions.
Model training
After a data scientist has preprocessed the collected data and split it into three subsets, he
or she can proceed with model training. This process entails “feeding” the algorithm with
training data. The algorithm processes the data and outputs a model that is able to find a
target value (attribute) in new data: the answer you want to get from predictive analysis. The
purpose of model training is to develop such a model.
Supervised learning: Supervised learning allows for processing data with target attributes
or labeled data. These attributes are mapped in historical data before the training begins.
With supervised learning, a data scientist can solve classification and regression problems.
Unsupervised learning: During this training style, an algorithm analyzes unlabeled data.
The goal of model training is to find hidden interconnections between data objects and
structure objects by similarities or differences. Unsupervised learning aims at solving such
problems as clustering, association rule learning, and dimensionality reduction. For
instance, it can be applied at the data preprocessing stage to reduce data complexity.
Decision Tree Algorithm
The decision tree algorithm falls under the category of supervised learning. It can
be used to solve both regression and classification problems.
A decision tree uses a tree representation to solve the problem, in which each leaf
node corresponds to a class label and attributes are represented on the internal nodes
of the tree.
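The representation described above (attributes at internal nodes, class labels at leaves) can be sketched with nested dictionaries. This tree is hand-built with hypothetical attributes and placeholder labels rather than learned from data; classifying a sample is a walk from the root to a leaf.

```python
# A tiny hand-built tree: internal nodes test an attribute, leaves are class labels.
tree = {
    "attribute": "fever",
    "branches": {
        "yes": {"attribute": "cough", "branches": {"yes": "disease_A", "no": "disease_B"}},
        "no": "no_disease",
    },
}

def classify(node, sample):
    """Follow the branch matching the sample's attribute value until a leaf is reached."""
    while isinstance(node, dict):  # internal node
        node = node["branches"][sample[node["attribute"]]]
    return node  # leaf: class label

print(classify(tree, {"fever": "yes", "cough": "no"}))  # disease_B
```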
Random Forest Algorithm
A Random Forest is an ensemble technique capable of performing both regression
and classification tasks with the use of multiple decision trees and a technique
called Bootstrap Aggregation, commonly known as bagging.
The basic idea behind this is to combine multiple decision trees in determining the
final output rather than relying on individual decision trees.
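Bootstrap aggregation can be sketched as: draw a bootstrap sample (n rows with replacement), train one tree on it, repeat, then take a majority vote. To keep the sketch short, each "tree" is a one-level decision stump and the extra random feature selection a full random forest also performs is omitted; both are simplifying assumptions. Data and names are illustrative.

```python
import random
from collections import Counter

def train_stump(rows, labels):
    """Depth-1 decision tree: the single threshold rule with the best training accuracy."""
    classes = sorted(set(labels))
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for left_label in classes:
                for right_label in classes:
                    acc = sum((left_label if r[f] <= t else right_label) == lab
                              for r, lab in zip(rows, labels))
                    if best is None or acc > best[0]:
                        best = (acc, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label

def bagged_trees(rows, labels, n_estimators=25, seed=0):
    """Bagging: train each tree on a bootstrap resample, combine by majority vote."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        trees.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx]))

    def predict(x):
        # Majority vote for classification (a regression forest would average instead).
        return Counter(tree(x) for tree in trees).most_common(1)[0][0]

    return predict

rows = [(1,), (2,), (3,), (4,), (6,), (7,), (8,), (9,)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
forest = bagged_trees(rows, labels)
print(forest((0,)), forest((10,)))  # A B
```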
Naïve Bayes Algorithm
Naive Bayes classifiers are a collection of classification algorithms based on Bayes’
Theorem. It is not a single algorithm but a family of algorithms that all share a
common principle: every pair of features is assumed to be independent, given the class.
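The independence assumption lets the classifier score each class as a prior multiplied by one probability per feature. Below is a Bernoulli-style Naive Bayes sketch (one member of the family, chosen for illustration) for symptom-present/absent data with Laplace smoothing, working in log space to avoid underflow. The symptoms, diseases and records are made up and are not the project's dataset.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(records, labels):
    """records: one set of present symptoms per patient; labels: the diagnosed disease."""
    class_counts = Counter(labels)
    symptom_counts = defaultdict(Counter)  # disease -> how often each symptom appeared
    for symptoms, label in zip(records, labels):
        symptom_counts[label].update(symptoms)
    vocabulary = sorted(set().union(*records))
    return class_counts, symptom_counts, vocabulary

def predict_naive_bayes(model, symptoms):
    class_counts, symptom_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior P(disease)
        for s in vocabulary:
            # Laplace-smoothed P(symptom present | disease), multiplied in independently.
            p = (symptom_counts[label][s] + 1) / (n + 2)
            score += math.log(p if s in symptoms else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

records = [{"fever", "cough"}, {"fever", "headache"}, {"sneezing", "cough"}, {"sneezing"}]
labels = ["flu", "flu", "cold", "cold"]
model = train_naive_bayes(records, labels)
print(predict_naive_bayes(model, {"fever"}))  # flu
```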
Model Evaluation and Testing
The goal of this step is to develop the simplest model able to formulate a target value fast
and well enough. A data scientist can achieve this goal through model tuning. That’s the
optimization of model parameters to achieve an algorithm’s best performance.
Cross-validation:
Cross-validation is the most commonly used tuning method. In its common ten-fold form, it
entails splitting a training dataset into ten equal parts (folds). A given model is trained
on nine folds and then tested on the tenth one (the one previously left out). This is
repeated until every fold has been left aside once and used for testing. From these
performance measurements, a specialist calculates a cross-validated score for each set of
hyperparameters. A data scientist trains models with different sets of hyperparameters to
determine which model has the highest prediction accuracy. The cross-validated score
indicates the average model performance across the ten hold-out folds.
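The procedure above can be sketched with the standard library only. `cross_val_score` here is a hypothetical helper, not scikit-learn's function of the same name. Folds are contiguous, so the data is assumed to be in random order already. The hyperparameter being compared is the number of neighbours of a tiny 1-D k-nearest-neighbour classifier, and the data is made up.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_score(fit_predict, X, y, k=10):
    """Train on k-1 folds, test on the held-out fold, repeat for every fold; mean accuracy."""
    scores = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

def make_knn(n_neighbors):
    """Hyperparameter n_neighbors -> a 1-D k-nearest-neighbour fit_predict function."""
    def fit_predict(X_train, y_train, X_test):
        preds = []
        for x in X_test:
            nearest = sorted(range(len(X_train)), key=lambda i: abs(X_train[i] - x))[:n_neighbors]
            votes = [y_train[i] for i in nearest]
            preds.append(max(set(votes), key=votes.count))
        return preds
    return fit_predict

X = [0.1, 0.4, 0.35, 0.8, 0.9, 0.05, 0.7, 0.95, 0.2, 0.6,
     0.15, 0.85, 0.3, 0.75, 0.5, 0.65, 0.25, 0.55, 0.45, 0.99]
y = [int(x > 0.5) for x in X]
for n_neighbors in (1, 3, 5):
    print(n_neighbors, cross_val_score(make_knn(n_neighbors), X, y, k=10))
```

The setting with the highest printed score would be the one kept, before a final evaluation on the separate test set.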
5. Model Deployment
The model deployment stage covers putting a model into production use.
Once a data scientist has chosen a reliable model and specified its performance
requirements, he or she delegates its deployment to a data engineer or database
administrator. The distribution of roles depends on your organization’s structure and the
amount of data you store.
Tkinter in Python
# GUI setup ---------------------------------------------------------------
from tkinter import Tk, Label, Entry, Text, StringVar, LEFT, W

root = Tk()
root.title("My Doctor")
root.configure(background='white')
Heading in Window
# Heading
w2 = Label(root, justify=LEFT, text="My Doctor : Disease Predictor", fg="Black", bg="white")
w2.config(font=("Aharoni", 25))
w2.grid(row=1, column=1, columnspan=2, padx=100)
w2 = Label(root, justify=LEFT, text="A Project by Mohit Agrawal", fg="Green", bg="white")
w2.config(font=("Aharoni", 15))
w2.grid(row=2, column=1, columnspan=2, padx=100)
# labels
NameLb = Label(root, text="Patient Name", fg="black", bg="white")
NameLb.grid(row=6, column=0, pady=25, sticky=W)
NameLb.config(font=("Aharoni", 15))
S2Lb = Label(root, text="Symptom 2", fg="black", bg="white")
S2Lb.grid(row=8, column=0, pady=20, sticky=W)
S2Lb.config(font=("Aharoni", 15))
List View
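# entries
# l1 is the list of symptom names defined elsewhere in the project
OPTIONS = sorted(l1)
Name = StringVar()  # holds the text typed into the patient-name field
NameEn = Entry(root, textvariable=Name, width=50, bg="black", fg="white")
NameEn.grid(row=6, column=1, padx=10)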
Button
Text Fields
# text fields
t1 = Text(root, height=1, width=40, bg="black", fg="white", pady=5)
t1.grid(row=15, column=1, padx=10, pady=5)

root.mainloop()  # start the GUI event loop
Disease Predictor Prototype
This machine learning project predicts a disease based on the symptoms given by the user.
The patient can enter up to 5 symptoms, and based on these symptoms the machine
learning models predict the disease.
It predicts the disease using three different machine learning algorithms.
It uses Tkinter for the GUI, and NumPy and Pandas for data processing.
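Before a model can use the symptoms chosen in the GUI, they typically have to become a fixed-length feature vector. The sketch below assumes a one-hot (0/1) encoding over the full symptom list, like the `l1` list used by the GUI code; the helper name and the symptom list are hypothetical, and the project's actual encoding may differ. The same vector would then be passed to each of the three algorithms.

```python
def encode_symptoms(selected, all_symptoms):
    """Turn up to five chosen symptom names into a 0/1 vector over the full symptom list."""
    if len(selected) > 5:
        raise ValueError("at most 5 symptoms can be entered")
    chosen = set(selected)
    unknown = chosen - set(all_symptoms)
    if unknown:
        raise ValueError(f"unknown symptoms: {sorted(unknown)}")
    return [1 if s in chosen else 0 for s in all_symptoms]

l1 = ["back_pain", "cough", "fever", "headache", "nausea"]  # hypothetical symptom list
print(encode_symptoms(["fever", "cough"], l1))  # [0, 1, 1, 0, 0]
```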
Conclusion
During my two months of summer internship at Dzone Software Solution and Service
Provider, I gained exposure to the real working environment of a company and
learned how things work in real life.
As I did my summer internship in Machine Learning with Python, I learnt a lot about
this technology, and there is a lot more to be learned; a great deal can be done using
it. During the training period, I also reached an intermediate level in developing an
Android app, and there is a lot more to be explored.