PFE Report
Reference:
« Facial Recognition Access Management System Based on Artificial Intelligence »
Class: SR A - RST C
Acknowledgements
We would first and foremost like to take this occasion to address our sincere thanks to the “Institut Supérieur des Études Technologiques en Communication” for having provided us with the opportunity to carry out and complete the present project.
It is likewise our pleasure to thank Ms. Haifa Ben Saber, whose unwavering commitment and substantial contributions have significantly enriched the substance of this report. Ms. Ben Saber’s guidance, wealth of expertise, and consistent assistance were pivotal in steering us through the complexities of the project and ensuring its successful culmination.
With due regard to Ms. Haifa Ben Saber, it is also important to note her desire to foster a learning and thinking environment for students. Far from a mere apprenticeship, the knowledge she has endowed us with has left imprints on our character and on our academic and professional careers.
Contents

General Introduction

1 Project Context
1.1 Introduction
1.2 Project context
1.3 Host company
1.4 SFM Solutions
1.5 SFM activities
1.6 Functional chart of SFM
1.7 Problem Statement
1.8 Solution
1.9 Functional Needs
1.10 Non-Functional Needs
1.11 Theoretical Gantt chart
1.12 Development methodology
1.12.1 Agile Method
1.12.2 SCRUM
1.12.3 Roles
1.12.4 Meetings
1.12.5 Sprints
1.12.6 Artefacts
1.13 Choice of the modeling language
1.14 Conclusion

2 State of the Art
3 System design
3.1 Introduction
3.2 System Overview
3.3 System Requirements
3.3.1 Accuracy
3.3.2 Performance
3.3.3 Security
3.4 General Design
4 Implementation
4.1 Introduction
4.2 Working environment
4.2.1 Hardware environment
4.2.2 Software environment
4.3 Tech stack
4.3.1 Programming languages
4.3.2 Frameworks
4.3.3 Libraries
4.3.4 Application Servers
4.4 Model Training
4.4.1 LBPH (Local Binary Patterns)
4.4.2 ResNet (CNN)
4.4.3 FaceNet
4.5 Implementation
4.5.1 Facial detection component
4.5.2 Main Server
4.5.3 AI model server
4.6 Performance Testing
4.6.1 Model Performance
4.6.2 Detection Performance
4.6.3 Processing Performance
4.6.4 Results
4.7 Optimisation
4.8 Security
4.8.1 HTTPS implementation
4.8.2 Authentication API implementation
4.9 Conclusion
List of Figures

2.1 The 5 different types of Haar-like features extracted from an image patch. [3]
2.2 Haar cascades are a class of object detection algorithms. [4]
2.3 Gray-scaled image and surrounding pixels. [5]
2.4 Original image. [5]
2.5 HOG image representation with characteristics. [5]
2.6 Face detected using a HOG model generated from multiple faces. [5]
2.7 CNN architecture. [6]
2.8 MTCNN 3-network stages. [7]
2.9 FaceNet embedding vector extracted from an image
2.10 Triplet loss example
2.11 VGG-Face data augmentation example
2.12 Facial Embeddings Extraction System
General Introduction
This report delves into the world of facial recognition systems. We will explore the core functionalities of these systems, examining how they leverage facial features for accurate identification.
The first chapter will explore the project context, introducing the host company and its activities; it will address the problem statement as well as the proposed solution in light of the functional and non-functional needs, and define the agile method applied during the project.
The second chapter will review the state of the art in facial detection and recognition, presenting the methods considered and justifying the choices made for this project.
The third chapter will address the system design and its modules, along with multiple diagrams explaining the process of accurate face recognition, from the camera input to the identification of the user.
The last chapter will discuss the implementation of the system, performance testing, and optimisation, while ensuring the highest security levels.
Chapter 1
Project Context
1.1 Introduction
As security concerns rise with the advancement of technology, traditional access management systems such as key cards and codes are becoming increasingly inadequate. Facial recognition systems provide a compelling solution, addressing several key challenges and offering enhanced security and scalability.
1. Technical expertise:
SFM provides and transfers its expertise to regulators, authorities, and telecommunications operators in several forms.
2. Strategic Consulting:
SFM conducts strategic studies for ministries, general directorates of operators, as well as regulatory authority presidencies, in the context of restructuring actions, sector regulation, pricing of scarce resources, migration to new architectures or technologies (2G/3G/3G+/LTE, WiMAX, NGN, etc.), sale of licenses, opening of capital, participation, etc.
clients, whether during training or missions. SFM has developed tools as part
of its SFM Lab laboratory dedicated to the research and development of new
solutions, whose purpose is to meet the evolving needs of clients.
4. Training:
SFM offers custom training with high added value. On its premises, in the field, or at its clients’ sites, it periodically organizes training missions on QoS measurement techniques, during which the trainers readily transfer their expertise to the client’s teams.
We had the pleasure of working in the Business Unit IT department while collaborating with the commercial department.
1.7 Problem Statement
• Performance: The existing algorithms take too much time for both the detection and recognition phases, which makes real-time operation difficult, and they are not optimized to handle data streams from the multiple cameras installed throughout the building.
• Integration: The system lacks seamless integration with SFM’s central management platform, which hinders information exchange and limits the development of value-added functionalities.
From these shortcomings arises the need for a new system that solves these problems.
1.8 Solution
The objective of this project is to establish a facial recognition system for access management in SFM’s building. The system will automate personnel and visitor identification, eliminating the need for manual processes and physical access credentials.
It will be integrated with SFM’s central management platform for efficient data management and utilization.
Furthermore, the system will optimize its performance by achieving efficient execution
times, scalability across multiple cameras, and reliable recognition results.
1.9 Functional Needs
• Access Control:
Authenticate authorized users attempting to gain access to secure areas using facial recognition technology.
Grant access only to authorized users according to the privileges they have been assigned.
Collect access logs with timestamps and user identification.
1.10 Non-Functional Needs
• Accuracy: The system must reliably identify users from images of sufficient quality, minimizing both false acceptances and false rejections.
• Security: The exchange of data should be secure and the system shouldn’t be easily bypassed.
1.12 Development methodology
1.12.1 Agile Method
Agile is a modern way of running projects, especially in tech. Instead of planning everything in advance and sticking to that plan no matter what, Agile is more flexible: it is like taking small steps, checking where you are frequently, and adjusting your direction as needed. It is an iterative approach that emphasizes collaboration, adaptability, and customer feedback throughout the project’s life cycle.[10]
Agile is not a one-size-fits-all approach. It offers several methodologies:
• Scrum
• Kanban
• Lean
• Crystal
1.12.2 SCRUM
Scrum is a management framework that teams use to self-organize and work towards
a common goal. It describes a set of meetings, tools, and roles for efficient project
delivery. Much like a sports team practicing for a big match, Scrum practices allow
teams to self-manage, learn from experience, and adapt to change. Software teams use
Scrum to solve complex problems cost-effectively and sustainably.[11]
1.12.3 Roles
A Scrum Team needs three specific roles: a Product Owner, a Scrum leader, and the development team.[11]
• Scrum leader: Scrum leaders are the champions for Scrum within their teams. They are accountable for the Scrum Team’s effectiveness. They coach teams, Product Owners, and the business to improve their Scrum processes and optimize delivery.
1.12.4 Meetings
Scrum events or Scrum ceremonies are a set of sequential meetings that Scrum Teams
perform regularly.[11] Some Scrum events include the following:
• Sprint Planning: In this event, the team estimates the work to be completed in
the next Sprint. Members define Sprint Goals that are specific, measurable, and
attainable. At the end of the planning meeting, every Scrum member knows how
each Increment can be delivered in the Sprint.
• Sprint: A Sprint is the actual period when the Scrum Team works together to finish an Increment. Two weeks is the typical length for a Sprint, but it can vary depending on the needs of the project and the team. The more complex the work and the more unknowns, the shorter the Sprint should be.
• Sprint Review: At the end of the Sprint, the team gets together for an informal session to review the work completed and showcase it to stakeholders. The Product Owner might also rework the Product Backlog based on the current Sprint.
• Sprint Retrospective: The team comes together to document and discuss what
worked and what didn’t work during the Sprint. Ideas generated are used to
improve future Sprints.
1.12.5 Sprints
1.12.6 Artefacts
Scrum Teams use tools called Scrum artifacts to solve problems and manage projects. Scrum artifacts provide critical planning and task information to team members and stakeholders.[11] There are three primary artifacts: the Product Backlog, the Sprint Backlog, and the Increment.
1.13 Choice of the modeling language
However, to avoid overloading the report while still covering some technical details, we will present only the diagrams that we found useful for the comprehension of the project, namely the use case diagram, activity diagram, sequence diagram, class diagram, and deployment diagram.
1.14 Conclusion
In this chapter, we presented our host organization “SFM Technologies”. We defined the problem statement of the project and the solution to be implemented. Finally, we described in detail our “SCRUM” working method as well as the choice of modeling language and the different diagrams to be presented. In the next chapter, we will explore the concept of facial recognition and review the related research.
Chapter 2
State of the Art
2.1 Introduction
In this chapter we will define facial recognition and detection, present the different methods for facial recognition and facial detection that we found during the research phase, and finally justify which method was chosen for this project.
• Element occlusion: the presence of items like eyewear, facial hair, or hats can affect performance and detection.
• Face pose: in an ideal situation the user is always facing the camera directly, but this is generally not the case, especially in video feeds.
• Facial expressions: expressions alter facial traits and so affect both detection and recognition.
• Image conditions: different cameras have different levels of distortion and quality, and the image is also heavily affected by lighting conditions.
Figure 2.1: The 5 different types of Haar-like features extracted from an image patch.[3]
Figure 2.2: Haar cascades are a class of object detection algorithms [4]
Analyzing these five rectangular regions and their corresponding differences in pixel sums, we can create features that help classify different parts of a face. Then, over an entire dataset of such features, we use the AdaBoost algorithm, a supervised learning algorithm that classifies data by combining multiple weak or base learners (e.g., decision trees) into a strong learner, to select which features correspond to the facial regions of an image. Haar cascade offers several advantages. It quickly computes Haar-like features using integral images, enhancing processing speed. Additionally, it efficiently selects features using the AdaBoost algorithm. A key benefit is the ability to detect faces in images irrespective of their location or scale. Lastly, the Viola-Jones object detection algorithm can operate in real time, making it practical for various applications. Haar cascade is far from a perfect algorithm, however: it is notorious for false positives and will report a face where no face is present in an image.
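As an illustration, the following minimal sketch shows how such a detector can be used through OpenCV, which bundles pre-trained Haar cascade files (the image path is illustrative):

    import cv2

    # Load OpenCV's pre-trained frontal-face Haar cascade (shipped with the library).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    # The detector operates on grayscale intensity values.
    image = cv2.imread("example.jpg")  # illustrative path
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # scaleFactor sets the image-pyramid step; minNeighbors discards weakly
    # supported detections, which helps suppress the false positives noted above.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)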
HOG is a feature descriptor used in computer vision and image processing, particularly in object detection and recognition tasks. It was introduced by Navneet Dalal and Bill Triggs in 2005 and is widely used for its effectiveness in capturing local shape and appearance information from images.[13] Initially, the input image is converted to grayscale to simplify computations. Additionally, it may undergo gamma and color normalization to enhance contrast.
Next, the method calculates gradients using filters such as the Sobel filter. These
filters determine the horizontal and vertical derivatives of the image. For each pixel,
the gradient magnitude (strength) and orientation (angle) are computed. The image is then divided into small square or rectangular regions known as cells. Typically, these cells are around 8x8 or 16x16 pixels in size. For each cell, a histogram of gradients is
calculated, representing the distribution of gradient orientations within the cell. The
orientations are divided into bins, such as nine bins for orientations ranging from 0 to
180 degrees or from 0 to 360 degrees. Each gradient contributes to one or more bins in
the histogram based on its orientation and magnitude, with contributions weighted by
the magnitude of the gradient.
Cells are grouped into overlapping blocks, such as 2x2 cells. The histograms of
cells within each block are concatenated and normalized to account for variations in
lighting and contrast. Finally, the histograms of all cells (after normalization) are
concatenated to form a feature descriptor for the image. This descriptor can be used
as input to machine learning algorithms such as Support Vector Machines (SVMs) for
tasks such as pedestrian detection and other object detection applications. HOG is
effective because it captures local object shape and appearance in a way that is robust
to changes in lighting and image noise.
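As a sketch, this descriptor can be computed with scikit-image’s hog function, using parameter values matching the typical configuration described above (the image path is illustrative):

    from skimage import color, io
    from skimage.feature import hog

    # Load an image (illustrative path) and convert it to grayscale.
    image = color.rgb2gray(io.imread("example.jpg"))

    # 9 orientation bins, 8x8-pixel cells, and 2x2-cell blocks with block
    # normalization, matching the configuration described in the text.
    descriptor = hog(image, orientations=9, pixels_per_cell=(8, 8),
                     cells_per_block=(2, 2), block_norm="L2-Hys")

    # 'descriptor' is a 1D feature vector that can be fed to a classifier
    # such as an SVM for detection tasks.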
Figure 2.6: Face detected using a HOG model generated from multiple faces.[5]
While these algorithms do provide good results, a more accurate approach is to use CNN (Convolutional Neural Network) and DNN (Deep Neural Network) models. CNNs and DNNs automatically learn complex features from the data, whereas algorithms like Haar cascade and HOG rely on handcrafted features. This ability to learn hierarchical features makes CNNs more robust and effective. They also achieve higher accuracy in face detection tasks due to their ability to model complex patterns and relationships in data. The higher accuracy and reduced false negatives and positives
make them better suited for tasks that require accuracy, but this comes at the cost of
performance when resources are limited or the task is time-sensitive like real-time face
detection.
2.4.4 MTCNN
The models operate sequentially, where the output of one model becomes the input for the next, allowing for intermediate processing steps such as non-maximum suppression (NMS) to filter bounding boxes. This cascade structure enables refined detection across stages. Implementing the MTCNN architecture can be complex, but there are open-source implementations and pre-trained models available for immediate use and further training on new datasets for face detection tasks.
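One such open-source implementation is the mtcnn Python package; a minimal usage sketch (the image path is illustrative):

    import cv2
    from mtcnn import MTCNN  # open-source package wrapping a pre-trained MTCNN

    detector = MTCNN()

    # MTCNN expects RGB input; OpenCV loads images as BGR, so reverse the channels.
    image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)

    # Each detection holds a bounding box, a confidence score, and five landmarks.
    for face in detector.detect_faces(image):
        x, y, w, h = face["box"]
        print(face["confidence"], face["keypoints"]["left_eye"])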
Non-deep-learning methods, such as Eigenfaces, Fisherfaces, and Local Binary Patterns (LBP), rely on handcrafted features. These features are manually designed based on domain knowledge and may not capture complex patterns as effectively as deep learning models. Feature extraction and decision-making are often separate processes: features are extracted first, and then a classifier (e.g., SVM or k-NN) is used for recognition. These methods are advantageous in terms of data requirements, needing only small datasets to operate at an acceptable level.
Deep learning is a sub-field of machine learning that involves training artificial neural networks with multiple layers (deep neural networks) to learn representations and patterns from data. These networks are capable of automatically learning complex features from raw data, making deep learning a powerful tool for tasks such as image recognition, natural language processing, and speech recognition.
FaceNet is an example of a deep learning model intended for face recognition. It is a 22-layer deep model published by Google researchers Schroff et al. It takes an input image and outputs a 128-dimension vector, called an embedding in reference to the amount of information embedded within this vector.[15]
The idea behind FaceNet is to train a model to generate these 128-dimension vectors. The training process involves loading three images: two belong to the same person while the third shows a different person. The algorithm then generates a vector for each image and
adjusts the neural network weights according to the measured distances, so that the two images of the same person end up closer to each other than to the third. This method is called triplet loss. This model can be used in conjunction with a classification model trained on the generated embeddings.
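Formally, for an anchor image x^a, a positive x^p of the same person, and a negative x^n of a different person, the triplet loss defined in the FaceNet paper [15] can be written as

\[
L = \sum_{i=1}^{N} \max\Big( \big\| f(x_i^a) - f(x_i^p) \big\|_2^2 - \big\| f(x_i^a) - f(x_i^n) \big\|_2^2 + \alpha,\ 0 \Big),
\]

where f(\cdot) is the embedding function and \alpha is a margin enforced between positive and negative pairs.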
VGG-Face is another model that generates a vector for recognition. It was described by Omkar Parkhi in a 2015 paper titled "Deep Face Recognition".[16] This model uses data augmentation during training to improve its performance and robustness. This technique involves applying random transformations to the training data to increase its diversity and help the model generalize better. VGG-Face generates a larger, 4096-dimension vector, which makes it more accurate. This algorithm can also be used along with a classification model to skip the correspondence step.
2.6.1 Classification
Once a new photo has been loaded into the system, it proceeds through an analysis process to acquire a feature matrix. This matrix, which is like a fingerprint of the face, captures its most essential elements. It stores no raw distances; instead, it contains encoded vital measurements such as the distance between the eyes, the width of the nose, and the proportions of facial features.
Next, the feature matrix for the new face is computed and the system employs a distance metric, such as the Euclidean distance, to compare the new face to the database of stored faces. Each enrolled face has a corresponding stored feature matrix, and the distance metric estimates the similarity between the new face’s feature matrix and each matrix in the database. The database entry at the smallest distance, i.e. the closest match, is designated as the supposed match.
In other words, the feature matrix stands for the face’s characteristics, and the distance metric returns the best match from the database after comparing it against the encoded features.
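Concretely, for two n-dimensional feature vectors x and y, the Euclidean distance used for this comparison is

\[
d(x, y) = \| x - y \|_2 = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},
\]

and the stored face with the smallest distance d is taken as the supposed match.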
For the face detection process we chose Haar cascade because it is fast and overall reliable. While it is prone to false positives, we did not find this to be an issue in practice: we encountered false positives only a handful of times, and they were easily distinguishable from real faces. Other methods were either unreliable or consumed too many resources; since detection is a continuous process, resource consumption was a major factor in choosing Haar cascade over other models and methods.
For the face recognition model, we could not decide on a single approach from research alone. Almost all the methods and models were promising, so we decided to try a number of them, some trained by us and others pre-trained, with the goal of determining which one is best for our system. We tried both deep learning and non-deep-learning methods. The best-performing model was a pre-trained FaceNet model, used as an embedding extractor without the classification step. Part of why we forwent classification is the difficulty of adding and removing users.
When a new user is added, the model must be retrained and tested with the new user’s images, which can be time-consuming. Also, with the classification technique, the system must be taken offline and then restarted with the newly added user for the change to take effect. Removing a user is significantly more difficult because we can’t "untrain" the model with the user’s data. To remove a user, we must either revert to a previous version of the model that has not been trained on the user’s data or add a filter to the decision-making system. Adding a filter for removed users may be a workable solution, but we still end up with a model that considers their data in its predictions.
2.8 Conclusion
In this chapter we discussed how face detection and recognition are achieved, citing important papers on the subject, and then decided which approach is best suited for our needs. In the next chapter, we will talk about the design of the system, how these techniques play a role in it, and how the different components operate and communicate.
Chapter 3
System design
3.1 Introduction
In this chapter we discuss the design of the system: how it behaves, the different requirements and components that make it up, and how they communicate with each other as well as with other systems. In the first section we give an overview of the system; we then discuss the system’s requirements, the general design, and finally the data flow throughout.
3.3.1 Accuracy
Perhaps the most important requirement is accuracy: the system needs to be able to accurately identify a user when given an image of sufficient quality. A failure to correctly identify a person could lead to wasted time, in the case of wrongly denied access, or to broken security measures, in the case of an identification wrongly attributed to a person.
3.3.2 Performance
3.3.3 Security
As with any system that deals with restricted resource access and sensitive personnel data, there is a minimum security requirement that needs to be met. API requests between the various components need to be secured, and users’ personal data needs to be stored in a safe and legally compliant manner so as to guarantee the privacy of users.
3.4 General Design
After reviewing the needs and requirements of the system, the following composition of modules and techniques was chosen.
The facial detection component is responsible for handling the multiple video feeds and for detecting and sending faces. It takes camera information from a database, starts a detection thread on each feed, and communicates any detected face to the main component. Along with the face image, it sends the timestamp of when the face was detected and the camera identifier, to record where the user was detected. Using this information, the main component can proceed with the recognition process and send the user id to the appropriate system.
The facial recognition model component is responsible for converting face images into embeddings for use in identification or for adding users to a database. This is where the main use of AI comes into play. It listens for incoming requests that contain image data encoded in Base64, converts the data back into a readable image form, and runs it through the AI model. The resulting embedding vector is then sent back to the client.
Because the face detection and face recognition modules are separate, an intermediate component is needed. This component is responsible for interacting with the detection component and the AI model. It receives the face images, stores them on disk, searches for corresponding embeddings in the database, and sends the results to the decision-making component. It also handles authentication with the central system for database access.
REST API stands for "Representational State Transfer Application Programming Interface". It is a way to design networked applications using a set of constraints and principles that help standardize the way systems communicate over the web. Because our modules and the overall system need to be independent, we chose this method of exchanging information. The key features of REST API design are:
• Stateless
• Client-Server Architecture
• Resource-Based
• HTTP methods
• JSON/XML Format
• Layered-system
• Cacheability
The choice of REST APIs also stems from the performance and security requirements. In terms of performance, it allows us to run the different components on separate machines tailored to each module’s resource consumption, as well as to develop these components in more performance-oriented languages if the need arises. Since we can use HTTPS/TLS for communication and the modules can be deployed in separate networks (allowing, for example, the database to be hosted on a secure machine), the security requirement is met.
3.4.6 Database
There are many types of databases to choose from for this type of application, but the two main types are SQL databases and NoSQL databases. SQL databases are perfect for relational data, while NoSQL databases are best suited for non-relational data of different types. The choice of database type is also restricted by what is already in use at SFM Group; since they already have an SQL database management system, we will be using SQL in our system to eliminate compatibility issues and reduce type conversion processes.
The use-case diagram depicts how users will interact with the system. The only effective interaction is when the user walks inside the range of facial detection. Aside from this interaction, the user has no other options that can influence the behavior or outcome of the recognition process.
The class diagram defines the classes from which objects will be derived and used. For our system the main classes are the user and camera classes, with other classes defined for ease of implementation and abstraction of the system, such as the level class. The following diagram describes the main classes in the system.
The purpose of the sequence diagram is to explain the interactions between modules in a time-based manner. The following diagram explains when each component intervenes in the overarching recognition process.
This diagram describes the general behavior and activity of the system explained in the previous section. It defines the general behavior of the system at each step and the results of decision nodes.
3.5.5 Authentication
The authentication component is a utility that already exists within the central management system of SFM. It receives a username and password and sends back a token to be used to access the databases and to send the user’s id back to the central system.
3.6 Conclusion
In conclusion, the system needs to be divided into independent components that can reliably communicate with each other. A central module links the detection, AI model, and database to execute the recognition, and then communicates the result to a decision-making system. The communication is ensured by using web requests in the form of REST APIs, and security is guaranteed by an authentication service and the use of TLS encryption. In the next chapter we will discuss the implementation of this design in depth.
Chapter 4
Implementation
4.1 Introduction
This chapter discusses the working environment and covers the chosen technology stack in detail. It goes on to describe the various models trained and the reasons and processes behind the model selection. Lastly, system testing takes place and security considerations are reviewed.
4.2 Working environment
Windows 10: A Microsoft operating system for personal computers, used as the main development environment.
Ubuntu Server: An open-source server operating system that delivers scale-out performance and is available for free.
4.3 Tech stack
4.3.1 Programming languages
Python
• Use Case: Many libraries and frameworks for AI are designed with Python.
4.3.2 Frameworks
Flask
• Use Case: Flask is easy to get started with and keeps the core of the application
simple and scalable.
TensorFlow
• Use Case: It is designed to run machine learning and deep learning workloads and to streamline the development and execution of advanced analytics applications.
DeepFace
4.3.3 Libraries
OpenCV
• Description: An open-source computer vision and machine learning software library.
NumPy
• Use Case: It provides a multidimensional array object and various derived objects (such as masked arrays and matrices).
Requests
• Use Case: It makes HTTP requests behind a simple API that returns a Response object with all the necessary data (content, encoding, etc.).
4.3.4 Application Servers
Gunicorn
• Use Case: It is broadly compatible with various web frameworks, simply implemented, light on server resources, and allows multiple instances of the same application to be deployed.
Nginx
• Description: It is open source software for web serving and reverse proxying.
• Use Case: It directs the client’s request to the appropriate back-end server and handles HTTPS encryption.
Postman
4.4 Model Training
4.4.1 LBPH (Local Binary Patterns)
The first model we tried was a non-deep-learning algorithm called LBPH, which summarizes facial textures using histograms (distributions of numerical data) of the unknown face and then compares them to those of known faces in a database. We used a dataset of 5 faces, with 50 images per face. We reached a maximum accuracy of 46%.
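A minimal sketch of this approach using OpenCV’s contrib module (the dataset loading is elided; the placeholder arrays stand in for our grayscale face images and integer user labels):

    import cv2
    import numpy as np

    # LBPH recognizer from the opencv-contrib-python package; it summarizes each
    # face as a grid of local-binary-pattern histograms.
    recognizer = cv2.face.LBPHFaceRecognizer_create()

    # train() expects grayscale face images and one integer label per image; the
    # placeholders below stand in for our 5-person, 50-images-per-person dataset.
    faces = [np.zeros((100, 100), dtype=np.uint8)]
    labels = np.array([0])
    recognizer.train(faces, labels)

    # predict() returns the closest label and a confidence value, where a lower
    # value means a closer histogram match.
    label, confidence = recognizer.predict(faces[0])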
4.4.2 ResNet (CNN)
The second type of model we trained was a ResNet, a type of feed-forward convolutional neural network with a number of regularization effects, in which the feature engineering is done by the model itself through filter optimization. We again trained on 5 faces, with 50 images per face.
The main architecture is composed of two convolutional layers, with 32 and 64 filters of size 3x3 respectively. These layers learn various low-level features from the input image, and the later convolutional layers use 1x1 filters to learn more localized features.
The activation function used was ReLU (Rectified Linear Unit), applied after most of the convolutional layers to add the non-linear transformations that help the network learn non-linear features.
After some of the convolutional layers we applied batch normalization to improve the speed and stability of training.
We also applied Dropout layers to minimize the risk of overfitting, configured so that neurons are dropped with a 50% probability during the training phase. A MaxPooling layer then downsamples the image while keeping the maximum activation in each region of interest.
The obtained features are then flattened into a 1D vector and passed through two fully-connected layers.
The first dense layer has 32 neurons and is followed by a final dense layer with as many neurons as there are classes to predict; sigmoid activation is used in the output layer.
Lastly, the model was compiled using categorical cross-entropy as the loss function for the multi-class dataset, and stochastic gradient descent is used as the optimizer.
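The following is a simplified Keras sketch of the architecture described above (the input size and exact layer placements are assumptions; dataset loading and the 1x1 convolutional layers are omitted):

    from tensorflow.keras import layers, models

    num_classes = 5  # one class per enrolled face in our experiment

    model = models.Sequential([
        # Two convolutional layers with 32 and 64 filters of size 3x3; the
        # 128x128 grayscale input shape is an assumption.
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 1)),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),  # down-sample, keeping the max activations
        layers.Dropout(0.5),          # 50% dropout against overfitting
        layers.Flatten(),             # flatten the features into a 1D vector
        layers.Dense(32, activation="relu"),
        layers.Dense(num_classes, activation="sigmoid"),  # output layer
    ])

    model.compile(optimizer="sgd", loss="categorical_crossentropy",
                  metrics=["accuracy"])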
4.4.3 FaceNet
While comparing the performance of the different models that could be used in our facial recognition system, we determined that the best one is by far FaceNet, since it is a deep learning architecture that produces robust, high-quality facial embeddings.
4.5 Implementation
4.5.1 Facial detection component
First, we need to establish a way to receive video feeds from the cameras. At SFM, a working CCTV system is already in place, using RTSP to send video feeds to a central server. Using the RTSP URL allows us to handle the video feed in code and pass frames to the detection function. We used OpenCV’s Haar cascade model to detect faces; once a face is detected, we use the bounding box to extract the face from the frame, and this face image is then converted to Base64 so it can be sent in a JSON payload. The Base64 string is then sent, along with the timestamp of when the face was detected and the id of the camera it was detected from, to the intermediate server component, which handles the conversion, recognition, logging, and communication with the decision-making component.
Once a response confirms that the data has been handled correctly, the system resumes analysing the video frames, looking for face patterns to repeat the process again. In order to handle multiple cameras at the same time, the program first fetches information about the cameras from a database. This database contains the unique identifier of each camera and the RTSP link used to establish the connection. Once the data is
received, a thread is launched for each camera with its ID and link. Using multi-threading allows each camera to operate independently, optimizes resource usage, and allows the program to be dynamic and scalable.
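A condensed sketch of this detection loop is shown below; the endpoint URL, the camera record, and the JSON field names are illustrative assumptions rather than the actual deployment values:

    import base64
    import threading
    import time

    import cv2
    import requests

    SERVER_URL = "https://main-server.example/faces"  # hypothetical endpoint
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def watch_camera(camera_id, rtsp_url):
        capture = cv2.VideoCapture(rtsp_url)  # OpenCV reads the RTSP stream
        while True:
            ok, frame = capture.read()
            if not ok:
                continue
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
                face = frame[y:y + h, x:x + w]        # crop via the bounding box
                _, jpeg = cv2.imencode(".jpg", face)  # encode the crop as JPEG
                payload = {
                    "image": base64.b64encode(jpeg.tobytes()).decode("ascii"),
                    "timestamp": time.time(),
                    "camera_id": camera_id,
                }
                # Wait for the server's confirmation before resuming analysis.
                requests.post(SERVER_URL, json=payload, timeout=10)

    # One thread per camera; real records would come from the camera database.
    cameras = [(1, "rtsp://user:pass@10.0.0.10/stream")]
    for cam_id, url in cameras:
        threading.Thread(target=watch_camera, args=(cam_id, url),
                         daemon=True).start()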
The software moves on to the recognition phase. The face features stored in memory
are supplied to a function that loops over the database, calculating the distance between the detected face’s features and each recorded set of facial features. If a distance below a certain threshold is found, the method returns the user id associated with the database embedding; otherwise, the text "unknown" is returned. When the recognition process produces a result, it is passed along with the timestamp and camera id to the decision-making process. At the same moment, a "200 OK" response is delivered back to the detecting component, allowing it to resume its work. Finally, the intermediate server continues to listen for incoming requests.
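A sketch of this matching function, assuming an illustrative distance threshold and a list of (user_id, embedding) pairs loaded from the database:

    import numpy as np

    THRESHOLD = 10.0  # illustrative cutoff; the real value is tuned empirically

    def identify(embedding, known_embeddings):
        # Loop over stored (user_id, embedding) pairs, tracking the closest one.
        best_id, best_distance = "unknown", float("inf")
        for user_id, stored in known_embeddings:
            distance = np.linalg.norm(np.asarray(embedding) - np.asarray(stored))
            if distance < best_distance:
                best_id, best_distance = user_id, distance
        # Accept the match only below the threshold; otherwise report "unknown".
        return best_id if best_distance < THRESHOLD else "unknown"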
Figure 4.7: Automatically produced folders containing face photos, each with a unique
id.
For the implementation of the AI model we used the DeepFace library, an open-source Python wrapper for many popular models, including FaceNet. The primary reason
for using this library is its representation function, which takes an image as input, in the form of a numpy array or an image location, and returns the image embedding. Although the library has other functions, all we needed was a simple code-based method of interacting with the model.
The idea of creating an API for the model stems from the performance requirements: if we included the model loading and represent function in our verification code, the model would be loaded and unloaded from memory each time the function is called, adding significant latency given that the model occupies approximately 100 MB of memory. So the goal is to load the model into memory once and then access it several times via a web API.
When the model server starts, it listens for new requests. Each request is a JSON payload that includes an "image" key with a Base64 representation of the image. This Base64 string is transformed into a JPEG image in memory, then into a numpy array, which is supplied to the "represent" function. The function’s return value is subsequently sent back to the client.
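A minimal sketch of such a server using Flask and DeepFace (the route name is an assumption; recent versions of DeepFace return a list of dictionaries, each carrying an "embedding" key):

    import base64
    import io

    import numpy as np
    from deepface import DeepFace
    from flask import Flask, jsonify, request
    from PIL import Image

    app = Flask(__name__)

    @app.route("/represent", methods=["POST"])  # route name is an assumption
    def represent():
        # Decode the Base64 string to a JPEG image in memory, then a numpy array.
        data = base64.b64decode(request.get_json()["image"])
        pixels = np.asarray(Image.open(io.BytesIO(data)))
        # The input is already a cropped face, so detection is not enforced.
        result = DeepFace.represent(img_path=pixels, model_name="Facenet",
                                    enforce_detection=False)
        return jsonify({"embedding": result[0]["embedding"]})

    if __name__ == "__main__":
        app.run(port=5000)  # a production deployment would sit behind Gunicorn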
4.6 Performance Testing
4.6.1 Model Performance
Several tests were performed on the model to assess its performance, primarily how accurate it is and how well it can build a facial embedding from an image. Using a modest 40-image dataset, the model achieved 98 percent accuracy and performed well even with varying facial expressions. During testing, we discovered that the time needed to produce a result from an image is proportional to the size of the image: the larger the image, the longer it takes to build an embedding.
4.6.2 Detection Performance
Conditions such as face pose and lighting influence detection, although distance can be mostly ignored if those two conditions are met. The accuracy of detection cannot be easily measured. False positives are common with Haar cascade face detection, but this is insignificant under ideal conditions. Hardware has a considerable impact on performance, with weaker CPUs processing fewer frames in the same time interval than capable CPUs. The detection procedure runs continuously and in real time, putting a constant load on processors and memory.
4.6.3 Processing Performance
Processing the images includes resizing them and converting them from Base64 to byte data. This processing adds strain on the system, but only when a face is sent for identification. We see spikes in resource consumption when an image is being processed, so we do the processing in the intermediate phase to reduce the load on the AI model component.
4.6.4 Results
Latency Test: The response latency is the time that a server takes to respond to a request; for our system, it is the time it takes to identify a face once a user is positioned in front of the camera. Both the intermediate server and the AI model server were tested separately in order to identify where the performance bottlenecks are.
Load Test: The purpose of the load test is to see how many requests the system can handle at the same time without slowing down or crashing. The test is conducted using Postman’s runner function, which allows us to define a request and send it multiple times simultaneously. One example is sending ten requests three times a second, for a total of 107 per minute. Even when running the system on low-performance hardware, we did not observe a significant drop in response time or any crashes.
4.7 Optimisation
Various measures were taken to optimize overall performance. For example, image processing is done in the main component and not in the detection phase, to reduce resource consumption when treating multiple frames per second. Copies of the user database are kept on the local machine with the main component, to reduce the number of requests made to the database server and speed up the search for a user. Both the main server and the AI model server are deployed using Gunicorn, which allows us to deploy multiple workers in order to handle multiple requests at the same time.
4.8 Security
Figure 4.16: Example auth response (image is only representative and not reflective of actual token size)
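As described in Section 3.5.5, the component exchanges a username and password for a token and then presents that token on subsequent requests. A hypothetical sketch of this flow (the endpoint URLs and JSON field names are placeholders, not SFM’s actual API):

    import requests

    AUTH_URL = "https://central.example/auth"     # placeholder endpoint
    USERS_URL = "https://central.example/users"   # placeholder endpoint

    # Exchange service credentials for a token, as described in Section 3.5.5.
    response = requests.post(AUTH_URL, timeout=10,
                             json={"username": "service", "password": "secret"})
    token = response.json()["token"]  # field name is an assumption

    # Present the token as a bearer credential on subsequent HTTPS requests.
    users = requests.get(USERS_URL, timeout=10,
                         headers={"Authorization": f"Bearer {token}"})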
4.9 Conclusion
In this chapter we detailed the implementation of the design, explaining how each component works and communicates. We also discussed the security measures and the performance testing conducted.
General Conclusion
In summary, this report details the research, design, and implementation of an access management system using artificial-intelligence-based facial recognition. The result is a rudimentary system that is extensible and easy to implement within an existing software and hardware architecture.
The research conducted facilitated the design process by allowing us to compare the different facial detection and facial recognition techniques, algorithms, and models. The methods chosen during the design process were justified according to the requirements specified in the project overview.
The results obtained were very promising. The system was fairly accurate, utilizing only one picture to recognize a person, and the recognition process was very fast considering it was running on average hardware.
On a critical note, the system could have been designed in a more object-oriented way to facilitate extensibility. We could also have used more performance-oriented programming languages for the components with such requirements. The way we encode and send images could be further optimized to reduce the time spent on conversion. Likewise for the database: if we had used a document-based database instead of an SQL one, we would have gained some flexibility in saving and loading data.
Overall, the project is a good starting point for creating a facial recognition access management system.
Bibliography
[5] Adam Geitgey. Machine learning is fun! Part 4: Modern face recognition with deep learning. https://miro.medium.com/v2/resize:fit:1100/format:webp/1*6xgev0r-qn4oR88FrW6fiA.png, 2016. Accessed: 2024-4-11.
[7] Ramandeep Kaur and Er. Himanshi. Face recognition using principal component
analysis. In 2015 IEEE International Advance Computing Conference (IACC),
pages 585–589, 2015.
[8] Kawtar Choubari. SQL vs NoSQL: What’s the best option for your database? https://medium.com/ieee-ensias-student-branch/sql-vs-nosql-whats-the-best-option-for-your-database-3e0fe08c1449, 2020. Accessed: 2024-5-13.
[10] Nandkishor More. What is agile methodology, in simple words? https://medium.com/@nandkishor204/what-is-agile-methodology-in-simple-words-741ee44ede51, 2023.
[12] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, volume 1, pages I–I, 2001.
[13] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection.
In 2005 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR’05), volume 1, pages 886–893 vol. 1, 2005.
[14] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503, October 2016.
[15] Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 815–823, 2015.
[16] Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition.
In British Machine Vision Conference, 2015.
Résumé
Ce projet présente le développement et la mise en œuvre d'un système de
gestion d'accès par reconnaissance faciale utilisant des technologies
d'intelligence artificielle (IA). L'objectif principal du système est d'améliorer la
sécurité et de rationaliser le contrôle d'accès dans divers environnements
tels que les bureaux d'entreprise, les installations sécurisées et les
complexes résidentiels. En utilisant l'apprentissage profond pour l'extraction
et la reconnaissance des traits faciaux, le système offre une grande précision
et efficacité dans l'identification des individus autorisés. L'architecture du
système comprend le traitement d'image en temps réel. Le système est
conçu pour être facilement intégré aux infrastructures de sécurité existantes.
Le déploiement de ce système de gestion d'accès par reconnaissance
faciale basé sur l'IA promet une sécurité renforcée, une réduction des
charges administratives et une expérience utilisateur fluide, en faisant une
solution viable pour les défis modernes du contrôle d'accès.
Abstract
This project presents the development and implementation of a facial recognition
access management system leveraging artificial intelligence (AI) technologies.
The primary objective of the system is to enhance security and streamline access
control in various environments such as corporate offices, secure facilities, and
residential complexes. Utilizing deep learning for facial feature extraction and
recognition, the system offers high accuracy and efficiency in identifying authorized
individuals. The system’s architecture includes real-time image processing. The system
is designed to be easily integrated with existing security infrastructures.
The deployment of this AI-based facial recognition access management system
promises enhanced security, reduced administrative overhead, and a seamless user
experience, making it a viable solution for modern access control challenges.