ANN-Classification-Customer-Churn-Prediction

This project aims to predict customer churn using an Artificial Neural Network (ANN) model. The project includes data preprocessing, model training, and deployment using a Streamlit app.

Overview

This code develops an artificial neural network to perform binary classification on a bank customer churn dataset, predicting whether a customer will leave the bank. It loads and preprocesses the dataset, one-hot encodes categorical variables, and trains a simple multi-layer perceptron with two hidden layers using TensorFlow Keras. The trained model is then evaluated on the held-out test set by generating predictions and computing classification metrics such as a confusion matrix and accuracy score. Together, these steps demonstrate a basic end-to-end workflow for building, training, and evaluating a neural network classifier on a real-world problem with categorical features.

Dataset

The dataset used in this project is Churn_Modelling.csv. It contains customer information and churn status, which is used to train the ANN model to predict churn.

Methodology

Data Collection

Gather the data required for the project. The specified dataset contains information about customers, including features like credit score, geography, gender, age, tenure, balance, and more.

Exploratory Data Analysis

Explore and analyze the dataset to understand its characteristics. This involves:

Checking for missing values
Handling duplicates
Visualizing distributions of variables
Exploring relationships between variables

The EDA process helps to gain insights into the data and make informed decisions about preprocessing steps.
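
As a rough illustration of the first two checks, the sketch below counts missing values per column and duplicate rows using only the standard library. The inline sample rows and the helper name summarize_rows are made up for this example; in the project the same checks would be run against Churn_Modelling.csv, most likely with pandas.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample standing in for Churn_Modelling.csv (values are illustrative only).
SAMPLE_CSV = """CreditScore,Geography,Gender,Age,Exited
619,France,Female,42,1
608,Spain,Female,,0
619,France,Female,42,1
502,Germany,Male,35,0
"""

def summarize_rows(rows):
    """Return (missing-value count per column, number of duplicate rows)."""
    missing = Counter()
    seen = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1
        # Rows with identical column/value pairs count as duplicates.
        seen[tuple(row.items())] += 1
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return dict(missing), duplicates

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing, duplicates = summarize_rows(rows)
print("missing per column:", missing)
print("duplicate rows:", duplicates)
```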

Data Preprocessing

Clean and prepare the data for modeling. Steps may include:

Removing null values
Dropping unnecessary columns (e.g., RowNumber, CustomerId, Surname)
Handling duplicate records
Encoding categorical variables (label encoding for Gender, one-hot encoding for Geography)
Standardizing numerical features
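
The encoding and scaling steps above can be sketched without any third-party dependencies. This is only an illustration of what the steps compute, under the assumption that the project itself uses scikit-learn encoders and a scaler; the sample records, the chosen numeric columns, and the function name preprocess are hypothetical.

```python
from statistics import mean, pstdev

# Made-up records mirroring a few columns of the dataset (illustrative values only).
customers = [
    {"RowNumber": 1, "Geography": "France", "Gender": "Female", "Age": 42, "Balance": 0.0},
    {"RowNumber": 2, "Geography": "Spain", "Gender": "Male", "Age": 41, "Balance": 83807.86},
    {"RowNumber": 3, "Geography": "Germany", "Gender": "Female", "Age": 39, "Balance": 159660.8},
]

DROP_COLUMNS = {"RowNumber", "CustomerId", "Surname"}  # identifier columns
NUMERIC_COLUMNS = ["Age", "Balance"]                   # assumed subset for the example

def preprocess(records):
    # Label encoding: map each Gender category to an integer (sorted for determinism).
    gender_codes = {g: i for i, g in enumerate(sorted({r["Gender"] for r in records}))}
    # One-hot encoding: one 0/1 column per Geography category.
    geographies = sorted({r["Geography"] for r in records})
    # Standardization: zero mean and unit (population) standard deviation per column.
    stats = {c: (mean(r[c] for r in records), pstdev(r[c] for r in records))
             for c in NUMERIC_COLUMNS}

    processed = []
    for record in records:
        kept = {k: v for k, v in record.items() if k not in DROP_COLUMNS}
        row = {"Gender": gender_codes[kept["Gender"]]}
        for g in geographies:
            row[f"Geography_{g}"] = int(kept["Geography"] == g)
        for c in NUMERIC_COLUMNS:
            mu, sigma = stats[c]
            row[c] = (kept[c] - mu) / sigma if sigma else 0.0
        processed.append(row)
    return processed

processed = preprocess(customers)
print(processed[0])
```

In practice the encoder categories and scaling statistics should be fitted on the training set only and then reused for the test set and for new data.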

Splitting Data

Divide the dataset into training and testing sets. The training set is used to train the ANN, while the testing set is used to evaluate its performance.
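
A minimal sketch of such a split is shown below. The 80/20 ratio, the fixed seed, and the helper name split_rows are assumptions made for the example, not values taken from the project.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))
train, test = split_rows(rows)
print(len(train), len(test))
```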

Model Building

Construct an Artificial Neural Network for predicting customer churn. Design the architecture of the neural network, including the number of layers, the number of neurons per layer, and the activation functions. A Keras Sequential model built from Dense layers is used.
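
To make the architecture concrete, the following dependency-free sketch shows the forward pass that a stack of dense layers computes: two hidden layers followed by a single sigmoid output for the churn probability. The layer sizes (8 and 4 units), the ReLU activations, the weight initialization, and all function names are assumptions for illustration; the actual model is defined with TensorFlow Keras.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One dense layer: activation(W @ inputs + b), with W stored as a list of rows."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for a layer with n_out units."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_network(n_features, hidden_units=(8, 4), seed=0):
    """Return a list of (weights, biases) pairs: hidden layers plus one output unit."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden_units, 1]
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

def predict_proba(network, features):
    """Forward pass: ReLU on hidden layers, sigmoid on the output unit."""
    activations = features
    for index, (weights, biases) in enumerate(network):
        is_output = index == len(network) - 1
        activations = dense(activations, weights, biases, sigmoid if is_output else relu)
    return activations[0]

network = build_network(n_features=5)
probability = predict_proba(network, [0.1, -1.2, 0.5, 0.0, 2.0])
print(probability)
```

The weights here are random and untrained, so the printed probability carries no meaning; training adjusts the weights to minimize the classification loss.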

Model Training

Train the ANN using the training set. Adjust model hyperparameters such as the number of epochs, batch size, and learning rate. Training progress can be monitored with TensorBoard, as described in the next section. The trained model is saved as model.h5.

TensorBoard Visualization

Monitor the training process using TensorBoard for performance visualization. TensorBoard helps in visualizing the following metrics:

Accuracy: Track the accuracy of the model over epochs.
Loss: Observe how the loss decreases over training epochs.
Learning Rate: Visualize how the learning rate changes during training.

To set up TensorBoard, follow these steps:

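Import the callbacks and create them:

from tensorflow.keras.callbacks import EarlyStopping, TensorBoard
import datetime

# Write logs to a timestamped directory so each run shows up separately in TensorBoard
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorflow_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)

# Stop once validation loss stops improving (the patience value here is illustrative)
early_stopping_callback = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)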

Include the callback in the model training process:

model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=100,
    callbacks=[tensorflow_callback, early_stopping_callback]
)

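To launch TensorBoard, run the following command in your terminal:
```
tensorboard --logdir=logs/fit
```
In a Jupyter notebook, first load the extension with %load_ext tensorboard, then run %tensorboard --logdir logs/fit instead.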

Model Evaluation

Evaluate the trained model using the testing set. Calculate metrics such as accuracy, precision, recall, and F1-score. Generate a confusion matrix and a classification report to assess the model's performance.
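
The metrics named above follow directly from the four cells of the confusion matrix. The sketch below computes them in plain Python on made-up labels; the function names are hypothetical, and in the project these values would more likely come from scikit-learn's metrics module.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means 'churned'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels for illustration only.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred))
print(metrics)
```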

Model Prediction

Use the prediction.ipynb notebook to demonstrate how to use the trained model for predicting customer churn on new data. Load the pre-trained model and apply it to new datasets to generate predictions.
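
A rough sketch of that flow is shown below; it is not the notebook's actual code. The file names match the artifacts stored in the repository, while the helper names, the assumption that the model outputs a single sigmoid probability, and the 0.5 decision threshold are illustrative.

```python
import pickle

def load_artifacts(model_path="model.h5",
                   encoder_paths=("label_encoder_gender.pkl",
                                  "onehot_encoder_geo.pkl",
                                  "scaler.pkl")):
    """Load the saved Keras model and the pickled preprocessing objects."""
    # Imported lazily so churn_label below can be used without TensorFlow installed.
    from tensorflow.keras.models import load_model

    model = load_model(model_path)
    encoders = []
    for path in encoder_paths:
        with open(path, "rb") as handle:
            encoders.append(pickle.load(handle))
    return model, encoders

def churn_label(probability, threshold=0.5):
    """Turn a predicted churn probability into a readable label (threshold is an assumption)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "likely to churn" if probability > threshold else "not likely to churn"

print(churn_label(0.73))
print(churn_label(0.12))
```

New data must pass through the same encoders and scaler that were fitted during training before it is given to the model.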

Results

The developed Artificial Neural Network (ANN) model achieved an accuracy of 86.4% on customer churn prediction.

Deployment

Deploy the model using a Streamlit app (app.py). The app allows users to input customer data and get churn predictions. A hosted version is available at:

https://ann-classification-customer-churn-prediction-ananavr8p8fapjabn.streamlit.app/

To run the app locally, execute the following command from the repository root:

streamlit run app.py

This starts a web server and opens the app in the default web browser, enabling interaction with the model for churn predictions.

Challenges and Solutions

Mixed Data Types

Label encoding was used to convert categorical 'Gender' to numeric.
One-hot encoding handled multi-class 'Geography'.
The dataset has no free-text features; the remaining numeric features were standardized.

Class Imbalance

Models trained on imbalanced classes tend to predict the minority class poorly. Over-sampling could be used to duplicate minority class examples.
Using accuracy alone as a metric would mask poor minority class prediction. A confusion matrix helps identify true/false positives and negatives.
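
The over-sampling idea mentioned above can be sketched as random duplication of minority class rows until the classes are the same size. The function name random_oversample and the toy data are hypothetical, and this is one simple variant rather than a method the project is known to use.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equally large."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra rows with replacement to fill the gap to the largest class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Toy data: 8 customers who stayed (0) and 2 who churned (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Over-sampling should be applied to the training set only, so that the test set keeps the original class distribution.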

Model Optimization

Started with a simple network with two hidden layers, then gradually adjusted the number of units, activations, dropout, etc.
Did not pursue additional convolutional/LSTM layers, since the dataset contains no sequence or image data.
Used callback functions like EarlyStopping to prevent overfitting.
Permutation feature importance helped identify impactful predictors.
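
Permutation feature importance, mentioned in the last point, measures how much a metric drops when one feature column is shuffled. The sketch below shows the idea with a hypothetical stand-in model and random toy data; the number of repeats and all names are assumptions for the example.

```python
import random

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in model: predicts churn (1) when feature 0 exceeds 0.5, ignores feature 1.
def toy_model(row):
    return int(row[0] > 0.5)

data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

A feature the model ignores gets an importance of zero, while shuffling a feature the model relies on lowers accuracy noticeably.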

Prediction Interpretation

Studied relationships between features and targets via visualization.
Identified customer profiles most/least likely to churn based on predictions.
Used the model to simulate retention programs: checked whether a customer profile is predicted as unlikely to churn once the proposed changes are applied.

Installation

To run this project, you need to have Python installed on your machine. Follow the steps below to set up the environment:

Clone the repository:

git clone https://github.com/Jayita11/ANN-Classification-Customer-Churn-Prediction.git
cd ANN-Classification-Customer-Churn-Prediction

Install the required packages:
```
pip install -r requirements.txt
```
