Ritesh Mangla ML Practical File
Problem 1: Retrieve data from a database using Python
1. Objective
Retrieve data from a database using Python.
2. Data Overview
The dataset utilized for this task will be analyzed, with key statistical summaries and descriptive
metrics presented to provide insights into the data structure and content.
3. Techniques used (Algorithm)
● Python Database Connectivity: Using Python libraries like sqlite3 or MySQL connector to
establish a connection to a database and extract data.
Input
● Database connection details (e.g., database file for SQLite, or host, user, password for
other databases).
● SQL query: A query string to select and retrieve the necessary data.
● Optional: Any query parameters to customize the SQL command safely.
Output
● The retrieved records, loaded into a pandas DataFrame for further analysis.
Steps
1. Establish a connection to the database using the connection details.
2. Execute the SQL query (with any optional parameters) through the connection.
3. Fetch the result set and load it into a pandas DataFrame.
4. Close the database connection.
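A minimal sketch of these steps is given below, assuming an SQLite database; the file name students.db, the table name, and the example query are placeholders rather than the actual connection details used in this practical.

```python
import sqlite3
import pandas as pd

# Placeholder connection details: replace the file name and query
# with the actual database and SQL used in the practical.
DB_FILE = "students.db"
QUERY = "SELECT * FROM records WHERE marks > ?"

# 1. Establish a connection to the SQLite database file.
conn = sqlite3.connect(DB_FILE)

# 2-3. Execute the parameterized query (safe against SQL injection)
#      and load the result set directly into a pandas DataFrame.
df = pd.read_sql_query(QUERY, conn, params=(40,))

# 4. Close the connection once the data has been retrieved.
conn.close()

print(df.head())
print(df.describe())
```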
4. Results
Data extraction successful. The data has been loaded into a pandas DataFrame.
Problem 2: Write a program to implement Linear and Logistic
Regression
1. Objective
This program performs both Linear and Logistic Regression on a given dataset, splitting it into
training and testing sets for evaluation.
2. Data Statistics
Data statistics and descriptions will be presented based on the dataset used for this problem.
3. Techniques used (Algorithm)
● Linear Regression: Linear regression is used to predict a continuous
variable based on one or more independent variables.
● Logistic Regression: Logistic regression is used for classification
problems where the dependent variable is categorical.
Output:
● Predicted values for the target variable on the test set.
● Evaluation metrics such as Mean Squared Error (MSE) and R-squared (R²) score.
● (Optional) A scatter plot showing the actual vs predicted values.
Steps:
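A minimal sketch of the Linear Regression part is given below. It uses synthetic data so that it runs on its own; the single feature and continuous target are illustrative assumptions, not the dataset used in this practical.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative synthetic data: one feature with a noisy linear relation.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.5 * X.ravel() + 2.0 + rng.normal(0, 1.5, size=200)

# Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the model and predict on the test set.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation metrics: MSE and R-squared.
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))

# Optional: actual vs predicted scatter plot.
plt.scatter(y_test, y_pred)
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()
```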
Input:
● A dataset containing features (independent variables) and a binary
target variable (0 or 1). For example, a dataset where features include
age, gender, fare, etc., and the target is whether the person survived
(1) or not (0) on the Titanic.
Output:
● Predicted binary class labels on the test set (e.g., survived or not
survived).
● Evaluation metrics such as Accuracy, Precision, Recall, F1-score,
and a Confusion Matrix.
● (Optional) Visualization of the confusion matrix.
Steps:
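A minimal sketch of the Logistic Regression part is given below. It assumes a Titanic-style CSV named titanic.csv with columns Age, Sex, Fare, and Survived; the file and column names are assumptions and may differ from the actual dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Assumed file and column names (Titanic-style); adjust to the real dataset.
df = pd.read_csv("titanic.csv")
df = df[["Age", "Sex", "Fare", "Survived"]].dropna()
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

X = df[["Age", "Sex", "Fare"]]
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the classifier and predict binary labels on the test set.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy, precision, recall, F1-score, and confusion matrix.
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```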
4. Results
The Linear Regression and Logistic Regression models have been trained.
Confusion matrix and evaluation metrics are provided for Logistic
Regression.
Problem 3: Write a program to implement the Naive Bayes classifier
2. Data Statistics
Data statistics and descriptions will be presented based on the dataset used
for this problem.
3. Techniques used (Algorithm)
● Naive Bayes Classifier: Naive Bayes is a simple probabilistic classifier
based on applying Bayes' theorem with strong independence
assumptions.
Input:
● A dataset stored in a CSV file containing features (independent
variables) and a target variable (dependent variable).
Output:
● Classification results (predicted class labels).
● Performance metrics such as accuracy, precision, recall, F1-score,
and Confusion Matrix.
Steps:
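A minimal sketch of a Gaussian Naive Bayes classifier on a CSV file is given below; the file name data.csv and the target column name label are placeholders for the actual dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder file and target column; replace with the actual dataset.
df = pd.read_csv("data.csv")
X = df.drop(columns=["label"])
y = df["label"]

# Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the Gaussian Naive Bayes model and predict on the test set.
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)

# Accuracy, precision, recall, F1-score, and confusion matrix.
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```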
4. Results
Naive Bayes classifier has been implemented. Accuracy metrics and
confusion matrix are provided.
Problem 4: Write a program to implement K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) classifiers
2. Data Statistics
Data statistics and descriptions will be presented based on the dataset used
for this problem.
3. Techniques used (Algorithm)
● K-Nearest Neighbors (KNN): KNN is a simple, instance-based learning
algorithm where the class of a new sample is predicted based on the
majority vote of its k-nearest neighbors.
● Support Vector Machine (SVM): SVM is a supervised learning algorithm
that finds the optimal hyperplane which maximizes the margin between
different classes.
Input:
● A dataset containing features (independent variables) and a target
variable (dependent variable).
Output:
● Classification results (predicted class labels).
● Performance metrics such as accuracy, precision, recall, F1-score,
and Confusion Matrix for both KNN and SVM.
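A minimal sketch training and evaluating both classifiers is given below; the built-in Iris dataset is used only as a stand-in for the actual dataset, and the specific hyperparameters (k = 5, RBF kernel) are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

# Iris is used here only as a stand-in dataset; swap in the actual data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Both KNN and SVM are distance/margin based, so feature scaling helps.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

models = {
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}

# Train each model, then report confusion matrix and classification report.
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"--- {name} ---")
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred))
```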
4. Results
Both KNN and SVM models have been trained on the dataset. Confusion
matrix and classification report are provided for both.
Problem 5: Implement classification of a given dataset using
random forest
1. Problem Statement
Implement classification of a given dataset using Random Forest.
2. Data Statistics
Data statistics and descriptions will be presented based on the dataset used
for this problem.
3. Techniques used (Algorithm)
Random Forest is a popular and powerful ensemble learning method
used for classification and regression tasks. It builds multiple decision trees
during training and merges their outputs to improve the overall performance
and control overfitting.
Steps:
1. Load the Dataset:
● Input: Load the dataset from a CSV file using pandas.
● Output: DataFrame holding the dataset.
2. Preprocess the Data:
● Handle missing values, encode categorical variables using
LabelEncoder, and optionally scale the data.
3. Define Features (X) and Target (y):
● Separate the features (X) and the target variable (y).
4. Split the Data:
● Use train_test_split to divide the dataset into training and testing sets.
5. Initialize the Random Forest Classifier:
● Initialize RandomForestClassifier from sklearn and specify the
number of decision trees (n_estimators).
6. Train the Model:
● Fit the Random Forest model on the training data using .fit().
7. Make Predictions:
● Predict class labels for the test set using .predict().
8. Evaluate the Model:
● Calculate accuracy, precision, recall, F1-score, and the confusion
matrix using sklearn.metrics. Visualize the confusion matrix using a
heatmap if needed.
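A minimal sketch of these eight steps is given below; the file name dataset.csv and the target column name target are placeholders for the actual dataset.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# 1-2. Load the dataset and preprocess it (placeholder file/column names):
#      drop missing values and label-encode categorical columns.
df = pd.read_csv("dataset.csv").dropna()
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])

# 3. Define features (X) and target (y); 'target' is an assumed column name.
X = df.drop(columns=["target"])
y = df["target"]

# 4. Split the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 5-6. Initialize the Random Forest with 100 trees and fit it on the training data.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# 7. Predict class labels for the test set.
y_pred = rf.predict(X_test)

# 8. Evaluate and visualize the confusion matrix as a heatmap.
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt="d")
plt.show()
```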
4. Results
The Random Forest classifier has been trained on the dataset. Accuracy, classification report, and confusion matrix are provided.
Problem 6: Build an Artificial Neural Network (ANN) by implementing the Backpropagation algorithm and test the same using appropriate data sets.
1. Problem Statement
Build an Artificial Neural Network (ANN) by implementing the Backpropagation algorithm and test the same using appropriate data sets.
2. Data Statistics
Data statistics and descriptions will be presented based on the dataset used
for this problem.
3. Techniques used (Algorithm)
Artificial Neural Network (ANN)
Input:
● A dataset with features and target labels (for classification or
regression).
Output:
● Trained ANN model and performance metrics (like accuracy or mean
squared error).
Algorithm Steps:
7. Backpropagation:
○ Compute gradients of the loss with respect to the output layer's
weights and biases using the chain rule.
○ Propagate the gradients backward through the network:
■ Calculate gradients for the hidden layer weights and
biases.
○ Update the weights and biases using an optimization algorithm
(e.g., Gradient Descent).
8. Repeat Training:
○ Repeat steps 5 to 7 for a specified number of epochs or until
convergence (i.e., the loss is minimized).
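A minimal sketch of the full training loop (forward pass, backpropagation, and gradient-descent weight updates) is given below, using NumPy and a one-hidden-layer network; the XOR data is only a stand-in for an appropriate dataset.

```python
import numpy as np

# Stand-in dataset (XOR); replace with the actual data used in the practical.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Initialize weights and biases for one hidden layer of 4 units.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))
b2 = np.zeros((1, 1))
lr = 0.5

for epoch in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)      # hidden-layer activations
    out = sigmoid(h @ W2 + b2)    # network output

    # Backpropagation: gradients of the squared-error loss via the chain rule.
    d_out = (out - y) * out * (1 - out)        # output-layer delta
    d_h = (d_out @ W2.T) * h * (1 - h)         # hidden-layer delta

    # Gradient-descent updates for weights and biases.
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print("Predictions:", out.round(3).ravel())
```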
Problem 7: Apply the K-Means algorithm to cluster a set of data stored in a .CSV file and compare with Hierarchical clustering
1. Problem Statement
Apply the K-Means algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering with Hierarchical clustering. Compare the results of these two algorithms and comment on the quality of clustering. You can use Python ML library classes in the program.
2. Data Statistics
Data statistics and descriptions will be presented based on the dataset used
for this problem.
3. Techniques used (Algorithms)
Input
1. Dataset:
a. A CSV file (laptop.csv) containing structured data. The
dataset should ideally have a mix of numerical and
categorical features, such as:
i. Numerical Columns: E.g., Price, RAM, Storage, etc.
ii. Categorical Columns: E.g., Brand, Model, etc.
Output
Cluster Distribution:
○ Output of the distribution of clusters formed by both K-Means and Hierarchical clustering.
Steps:
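A minimal sketch of the comparison is given below. It uses the laptop.csv file named in the input section, but the column handling and the choice of k = 3 clusters are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Load the CSV named in the input section; label-encode categorical columns
# and scale everything so that distance-based clustering behaves sensibly.
df = pd.read_csv("laptop.csv").dropna()
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])
X = StandardScaler().fit_transform(df)

k = 3  # assumed number of clusters

# K-Means clustering.
km_labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)

# Hierarchical (agglomerative) clustering.
hc_labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)

# Compare the cluster distributions and a quality metric for each method.
print("K-Means cluster sizes     :", pd.Series(km_labels).value_counts().to_dict())
print("Hierarchical cluster sizes:", pd.Series(hc_labels).value_counts().to_dict())
print("K-Means silhouette        :", silhouette_score(X, km_labels))
print("Hierarchical silhouette   :", silhouette_score(X, hc_labels))
```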
4. Results
K-Means and Hierarchical clustering have been applied to the dataset. The resulting cluster distributions are reported and compared.
Problem 8: Write a program to implement a Self-Organizing Map (SOM).
1. Problem Statement
Write a program to implement a Self-Organizing Map (SOM).
2. Data Statistics
Data statistics and descriptions will be presented based on the dataset used
for this problem.
3. Techniques used (Algorithms)
Steps
1. Import Libraries:
● Import necessary libraries like NumPy, Matplotlib, and MiniSom for
creating and visualizing the SOM.
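A SOM maps high-dimensional data onto a low-dimensional grid of neurons, so nearby cells respond to similar inputs. A minimal sketch using the MiniSom library is given below; the Iris dataset and the 10x10 map size are assumptions, not the actual data used here.

```python
import matplotlib.pyplot as plt
from minisom import MiniSom
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# Iris is only a stand-in dataset; SOMs work best on features scaled to [0, 1].
X, y = load_iris(return_X_y=True)
X = MinMaxScaler().fit_transform(X)

# 10x10 map, input dimension equal to the number of features.
som = MiniSom(10, 10, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=42)
som.random_weights_init(X)
som.train_random(X, 1000)

# Visualize the distance map (U-matrix); darker cells separate clusters.
plt.pcolor(som.distance_map().T, cmap="bone_r")
plt.colorbar()
plt.title("SOM distance map")
plt.show()
```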
4. Results
Problem 9: Write a Program for empirical comparison of
different supervised learning algorithms.
1. Problem Statement
Write a program for empirical comparison of different supervised learning
algorithms.
2. Data Statistics
Data statistics and descriptions will be presented based on the dataset used
for this problem.
3. Techniques used (Algorithms)
Empirical Comparison
Empirical comparison refers to the process of systematically evaluating and
contrasting the performance of different algorithms, models, or methods
based on actual observed data rather than theoretical predictions. This
approach is widely used in fields like machine learning, statistics, and
experimental sciences to derive insights and conclusions based on real-world
performance.
Supervised Learning
Definition: Supervised learning is a type of machine learning in which a
model is trained on labeled data. In this context, labeled data consists of
input-output pairs, where each input (feature) is associated with a known
output (target or label). The objective is to learn a mapping from inputs to
outputs so that the model can make accurate predictions on unseen data.
Input
● Dataset: In this example, the program uses the Iris dataset, which
consists of:
○ Features: Sepal length, Sepal width, Petal length, Petal width.
○ Target: Iris species (Setosa, Versicolor, Virginica).
Output
1. Evaluation Results:
○ The program outputs a DataFrame containing evaluation
metrics for each model.
Steps:
1. Import Libraries:
● Import necessary libraries (e.g., pandas, numpy, sklearn, matplotlib).
5. Initialize Models:
● Create a list or dictionary of supervised learning models to compare
(e.g., Logistic Regression, Decision Tree, Random Forest, SVM, k-NN).
8. Visualize Results:
● Create bar plots or other visualizations to compare the performance of
different models.
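A minimal sketch of the comparison is given below, using the Iris dataset and the models listed above; the use of 5-fold cross-validated accuracy as the evaluation metric is an assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Dictionary of supervised models to compare.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
}

# 5-fold cross-validated accuracy for each model, collected in a DataFrame.
results = pd.DataFrame({
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy")
    for name, model in models.items()
})
print(results.mean().sort_values(ascending=False))

# Bar plot of the mean accuracy per model.
ax = results.mean().plot(kind="bar")
ax.set_ylabel("Mean CV accuracy")
plt.tight_layout()
plt.show()
```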
4. Results
Problem 10: Write a Program for empirical comparison of
different unsupervised learning algorithms.
1. Problem Statement
Write a program for empirical comparison of different unsupervised learning
algorithms.
2. Data Statistics
Data statistics and descriptions will be presented based on the dataset used
for this problem.
3. Techniques used (Algorithms)
Empirical Comparison
Empirical comparison refers to the process of systematically evaluating and
contrasting the performance of different algorithms, models, or methods
based on actual observed data rather than theoretical predictions. This
approach is widely used in fields like machine learning, statistics, and
experimental sciences to derive insights and conclusions based on real-world
performance.
Unsupervised Learning
Unsupervised learning is a type of machine learning where models are
trained on data without labeled outputs. In this context, the algorithm
attempts to identify patterns, groupings, or structures within the data without
any explicit guidance on what to look for. The primary objective is to
uncover hidden patterns or intrinsic structures in the input data.
Input
● A dataset in a CSV file containing features for clustering.
● The number of clusters (for algorithms that require this parameter,
such as K-Means).
Output
● Visual comparison of clustering results.
● Evaluation metrics (Silhouette Score, Davies-Bouldin Index) for each
algorithm.
Steps
1. Load the Dataset:
○ Import the necessary libraries (pandas, numpy, sklearn,
matplotlib, seaborn).
○ Load the dataset using pd.read_csv().
7. Display Results:
○ Print the evaluation metrics for each algorithm.
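A minimal sketch of the comparison is given below; the file name clustering_data.csv, the choice of three clusters, and the inclusion of DBSCAN alongside K-Means and Hierarchical clustering are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Placeholder file name; the CSV is assumed to contain only numeric features.
X = StandardScaler().fit_transform(pd.read_csv("clustering_data.csv").dropna())

n_clusters = 3  # required by K-Means and hierarchical clustering
algorithms = {
    "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=42),
    "Hierarchical": AgglomerativeClustering(n_clusters=n_clusters),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
}

# Fit each algorithm and print its evaluation metrics.
for name, algo in algorithms.items():
    labels = algo.fit_predict(X)
    # Both metrics are only defined when at least two clusters are found.
    if len(set(labels)) > 1:
        print(f"{name}: silhouette={silhouette_score(X, labels):.3f}, "
              f"davies_bouldin={davies_bouldin_score(X, labels):.3f}")
    else:
        print(f"{name}: produced a single cluster; metrics undefined")
```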
4. Results