Elsarticle Template New


Elsevier LaTeX template⋆

Elsevier1
Radarweg 29, Amsterdam

Elsevier Inca,b , Global Customer Serviceb,∗

a 1600 John F Kennedy Boulevard, Philadelphia
b 360 Park Avenue South, New York

Abstract
This template helps you to create a properly formatted LaTeX manuscript.
Keywords: elsarticle.cls, LaTeX, Elsevier, template
2010 MSC: 00-01, 99-00

1. Introduction

Email is one of the most popular and convenient means of communication today. As the use of email has
grown, so has the volume of emails people receive. This has made it difficult for individuals and organizations
to manage their email effectively. One solution to this problem is to classify emails automatically using
Natural Language Processing (NLP).
Email classification through NLP involves using machine learning algorithms and linguistic rules to
automatically assign categories to incoming emails based on their content. These categories can include
spam, marketing, personal, work-related, and more. Once emails are categorized, they can be prioritized,
labeled, and directed to the appropriate recipient or folder.
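As a minimal sketch of the rule-based side of this idea, the following Python snippet assigns a category by keyword overlap. The category names come from the paragraph above; the keyword lists, the `categorize` function, and the `personal` fallback are illustrative assumptions, not part of the project.

```python
import re

# Hypothetical keyword rules: the keywords are illustrative assumptions.
RULES = {
    "spam": {"winner", "lottery", "prize"},
    "marketing": {"sale", "discount", "offer"},
    "work-related": {"meeting", "deadline", "report"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def categorize(text, rules=RULES, default="personal"):
    """Return the category whose keyword set overlaps the email the most.

    Falls back to `default` when no rule keyword appears in the text.
    """
    tokens = set(tokenize(text))
    best, best_hits = default, 0
    for category, keywords in rules.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


print(categorize("Team meeting moved, report deadline is Friday"))
```

A learned classifier would replace the hand-written keyword sets with weights estimated from labelled emails, but the routing step (pick the best-scoring category) stays the same.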
The purpose of this report is to provide an overview of email classification through NLP. We will discuss
the various techniques used for email classification, including feature extraction, machine learning algorithms,
and rule-based systems. We will also examine the advantages and limitations of these techniques and explore
the challenges in building an accurate and effective email classification system.
The problem of spam email has been growing for years. According to recent statistics, 40% of all emails are
spam, which amounts to about 15.4 billion emails per day and costs internet users about $355 million per year.
Organisations and individuals face the challenge of effectively managing and prioritising the large volumes of
emails that they receive regularly. The task can be overwhelming and very time consuming, since it is difficult
to distinguish important messages from less important ones or from spam. Furthermore, email security is a
concern, as users may receive spam or phishing emails that pose a risk to their personal information or the
organisation’s data.
To address these issues, this project aims to provide a solution that manages incoming email efficiently
and strengthens email security. By preventing spam or phishing emails from reaching users’ inboxes,
the solution can help protect against potential data breaches.

2. Problem Statement

Organisations and individuals face the challenge of effectively managing and prioritising the large volumes of
emails that they receive regularly. The task can be overwhelming and very time consuming, since it is difficult
to distinguish important messages from less important ones or from spam. Furthermore, email security is a
concern, as users may receive spam or phishing emails that pose a risk to their personal information or the
organisation’s data.
The purpose of this project is twofold:
1. Develop an automated system that can effectively manage and organise incoming emails. By detecting
and filtering out spam messages, the system can help users save time and focus on important messages.
2. Cluster similar emails together, so that the system can help users organise their inbox and identify groups
of related messages.

⋆ Fully documented templates are available in the elsarticle package on CTAN.
∗ Corresponding author
Email address: [email protected] (Global Customer Service)
URL: www.elsevier.com (Elsevier Inc)
1 Since 1880.

Preprint submitted to Journal of LaTeX Templates May 5, 2023

3. Objective

The purpose of this project is to develop an automated system that can effectively manage and organise
incoming emails. By detecting and filtering out spam messages, the system can help users save time and
focus on important messages. By clustering similar emails together, the system can help users organise their
inbox and identify groups of related messages. The project can be useful for individuals or organisations
that receive large volumes of emails and need a way to efficiently manage and prioritise them. It can also be
useful for improving email security by preventing spam or phishing messages from reaching users’ inboxes.
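To make the clustering goal concrete, here is a rough Python sketch. The report does not name a specific clustering algorithm, so this uses a simple greedy single-pass scheme over bag-of-words cosine similarity purely for illustration; `cluster_emails`, the `threshold` value, and the sample emails are all assumptions.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn an email into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_emails(texts, threshold=0.3):
    """Greedy single-pass clustering.

    Each email joins the first existing cluster whose summed word-count
    vector is at least `threshold` similar; otherwise it starts a new one.
    Returns a list of clusters, each a list of indices into `texts`.
    """
    centroids = []  # one summed Counter per cluster
    clusters = []   # one list of email indices per cluster
    for i, text in enumerate(texts):
        vec = vectorize(text)
        for centroid, members in zip(centroids, clusters):
            if cosine(vec, centroid) >= threshold:
                centroid.update(vec)
                members.append(i)
                break
        else:
            centroids.append(Counter(vec))
            clusters.append([i])
    return clusters


# Made-up sample emails, for illustration only.
emails = [
    "quarterly budget meeting on monday",
    "budget meeting moved to tuesday",
    "cheap flights and hotel deals",
    "hotel deals for cheap summer flights",
]
print(cluster_emails(emails))
```

The result depends on the order of the emails and on the threshold, which is why algorithms such as k-means or hierarchical clustering are usually preferred on real data.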

4. Methodology

1. Spam Detection of Emails: Identifying and filtering unwanted emails by implementing models such as
Naive Bayes and LSTM, which learn patterns and features from labelled emails and classify new emails as
spam or not.
2. Clustering of Emails: Grouping similar emails into clusters based on their content and other features
using clustering algorithms.
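The spam-detection step can be sketched with a small multinomial Naive Bayes classifier, one of the models named above. This is a from-scratch illustration under stated assumptions, not the project's implementation: the class name `NaiveBayesSpamFilter`, the Laplace smoothing parameter `alpha`, the tokenizer, and the four training emails are all made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training emails
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        """Log prior plus summed log likelihoods of the known tokens."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.doc_counts, key=lambda lbl: self._log_score(tokens, lbl))


# Tiny made-up training set, for illustration only.
train_texts = [
    "win a free prize now",
    "free money claim your prize",
    "meeting agenda for monday",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(model.predict("claim your free prize"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a word seen in only one class from forcing the other class's probability to zero.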

5. Related Work

6. Dataset Description

The dataset chosen for our analysis is the Enron Email Dataset, which contains emails generated by
employees of Enron Corporation.
The data was obtained by the Federal Energy Regulatory Commission during its investigation of Enron’s
collapse.
The form of the dataset we have chosen contains 6 parts, each with two folders, one for spam and the other
for ham.
After data extraction, the data will be represented in the form below:

7. Data Retrieval

- The dataset was retrieved from 3 folders, each with a spam and a ham subfolder containing text files.
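The retrieval step described above could look roughly like the following Python sketch. The directory layout (`<root>/<part>/spam/*.txt` and `<root>/<part>/ham/*.txt`), the function name `load_emails`, and the `latin-1` encoding are assumptions made for illustration; the report does not specify them.

```python
from pathlib import Path


def load_emails(root):
    """Collect (text, label) rows from <root>/<part>/{spam,ham}/*.txt.

    The label of each row is taken from the name of the folder that
    contains the file.
    """
    rows = []
    for part in sorted(Path(root).iterdir()):
        if not part.is_dir():
            continue
        for label in ("spam", "ham"):
            folder = part / label
            if not folder.is_dir():
                continue
            for path in sorted(folder.glob("*.txt")):
                # latin-1 is an assumed encoding; it can decode any byte sequence.
                text = path.read_text(encoding="latin-1")
                rows.append((text, label))
    return rows
```

Calling `load_emails("path/to/dataset")` (a placeholder path) returns a list of two-field rows that matches the text and label columns used in the analysis.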

8. Exploratory Data Analysis

General
- The data consists of 16541 rows and 2 columns (text and label, spam/ham).
- Spam messages in the dataset are labelled spam and non-spam messages are labelled ham.
- The dataset was cleared of null values after retrieval.
- The dataset had 484 duplicate rows.
Final dataset shape after null-value and duplicate removal: 16057 × 2
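The cleaning steps above can be sketched in a few lines of Python. The helpers `clean_rows` and `label_distribution` are hypothetical names, and treating a duplicate as an exact repeat of the same text and label is an assumption; the report does not say how duplicates were defined. The row counts are consistent with each other: 16541 − 484 = 16057.

```python
from collections import Counter


def clean_rows(rows):
    """Drop rows with missing or blank text/label, then drop exact duplicates.

    The first occurrence of each (text, label) pair is kept.
    """
    seen = set()
    cleaned = []
    for text, label in rows:
        if text is None or label is None or not text.strip():
            continue
        key = (text, label)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned


def label_distribution(rows):
    """Count how many rows carry each label (e.g. spam vs ham)."""
    return Counter(label for _, label in rows)


# Made-up rows, for illustration only.
sample = [("hi", "ham"), ("hi", "ham"), (None, "spam"), ("win", "spam"), ("  ", "ham")]
cleaned = clean_rows(sample)
print(cleaned, label_distribution(cleaned))
```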
Spam and Ham Distribution

9. Front matter

The author names and affiliations could be formatted in two ways:


(1) Group the authors per affiliation.
(2) Use footnotes to indicate the affiliations.
See the front matter of this document for examples. You are recommended to conform your choice to the
journal you are submitting to.

10. Bibliography styles

There are various bibliography styles available. You can select the style of your choice in the preamble of
this document. These styles are Elsevier styles based on standard styles like Harvard and Vancouver. Please
use BibTeX to generate your bibliography and include DOIs whenever available.
Here are two sample references: [? ? ].

References
