PFE Report
Reference:
« Facial Recognition Access Management System Based on Artificial Intelligence »
Class: SR A - RST C
Acknowledgements
We would first and foremost like to take this occasion to address our sincere thanks to the “Institut Supérieur des Études Technologiques en Communication” for having provided us with the opportunity to carry out and complete the present project.
It is likewise our pleasure to thank Ms. Haifa Ben Saber, whose unwavering commitment and substantial contributions have significantly enriched the substance of this report. Ms. Ben Saber’s guidance, wealth of expertise, and consistent assistance were pivotal in steering us through the complexities of the project and ensuring its successful culmination.
With due regard to Ms. Haifa Ben Saber, it is also important to note her desire to foster a learning and thinking environment for students. Far from a mere apprenticeship, the knowledge she has endowed us with has left imprints on our character and on our academic and professional careers.
Contents

General Introduction

1 Project Context
1.1 Introduction
1.2 Project context
1.3 Host company
1.4 SFM Solutions
1.5 SFM activities
1.6 Functional chart of SFM
1.7 Problem Statement
1.8 Solution
1.9 Functional Needs
1.10 Non-Functional Needs
1.11 Theoretical Gantt chart
1.12 Development methodology
1.12.1 Agile Method
1.12.2 SCRUM
1.12.3 Roles
1.12.4 Meetings
1.12.5 Sprints
1.12.6 Artefacts
1.13 Choice of the modeling language
1.14 Conclusion

2 State of the Art
3 System design
3.1 Introduction
3.2 System Overview
3.3 System Requirements
3.3.1 Accuracy
3.3.2 Performance
3.3.3 Security
3.4 General Design
4 Implementation
4.1 Introduction
4.2 Working environment
4.2.1 Hardware environment
4.2.2 Software environment
4.3 Tech stack
4.3.1 Programming languages
4.3.2 Frameworks
4.3.3 Libraries
4.3.4 Application Servers
4.4 Model Training
4.4.1 LBPH (Local Binary Patterns)
4.4.2 ResNet (CNN)
4.4.3 FaceNet
4.5 Implementation
4.5.1 Facial detection component
4.5.2 Main Server
4.5.3 AI model server
4.6 Performance Testing
4.6.1 Model Performance
4.6.2 Detection Performance
4.6.3 Processing Performance
4.6.4 Results
4.7 Optimisation
4.8 Security
4.8.1 HTTPS implementation
4.8.2 Authentication API implementation
4.9 Conclusion
List of Figures

2.1 The 5 different types of Haar-like features extracted from an image patch. [3]
2.2 Haar cascades are a class of object detection algorithms. [4]
2.3 Gray-scaled image and surrounding pixels. [5]
2.4 Original image. [5]
2.5 HOG image representation with characteristics. [5]
2.6 Face detected using a HOG model generated from multiple faces. [5]
2.7 CNN architecture. [6]
2.8 MTCNN 3-network stages. [7]
2.9 FaceNet embedding vector extracted from an image
2.10 Triplet loss example
2.11 VGG-Face data augmentation example
2.12 Facial Embeddings Extraction System
General Introduction
This report delves into the world of facial recognition systems. We will explore the core functionalities of these systems, examining how they leverage facial features for accurate identification.
The first chapter will explore the project context, introducing the host company and its activities; it will address the problem statement as well as the proposed solution in light of the functional and non-functional needs, and define the agile method applied during the project.
The second chapter will review the state of the art in facial detection and recognition, presenting the methods considered and justifying the choices made for this project.
The third chapter will address the system design and its modules, along with multiple diagrams explaining the process of accurate face recognition, from the camera input to the identification of the user.
The last chapter will discuss the implementation of the system, performance testing, and optimisation, while ensuring the highest security levels.
Chapter 1
Project Context
1.1 Introduction
As security concerns rise with the advancement of technology, traditional access management systems such as key cards and codes are becoming increasingly inadequate. Facial recognition systems provide a compelling solution, addressing several key challenges and offering enhanced security and scalability.
1. Technical expertise:
SFM provides and transfers its expertise to regulators, authorities, and telecommunications operators in several forms.
2. Strategic Consulting:
SFM conducts strategic studies for ministries, general directorates of operators, as well as regulatory authority presidencies, in the context of restructuring actions, sector regulation, pricing of scarce resources, migration to new architectures or technologies (2G/3G/3G+/LTE, WiMAX, NGN, etc.), sale of licenses, opening of capital, participation, etc.
clients, whether during training or missions. SFM has developed tools as part
of its SFM Lab laboratory dedicated to the research and development of new
solutions, whose purpose is to meet the evolving needs of clients.
4. Training:
SFM offers custom training with high added value. On its premises, in the field, or at its clients’ sites, it periodically organizes training missions on QoS measurement techniques, during which the trainers readily transfer their expertise to the client’s teams.
We had the pleasure of working in the Business Unit IT department while collaborating with the commercial department.
1.7 Problem Statement
• Performance: The existing algorithms take too much time for both the detection and recognition phases, which makes real-time operation difficult, and they are not optimized to handle data streams from the multiple cameras installed throughout the building.
• Integration: The system lacks seamless integration with SFM’s central management platform, which hinders information exchange and limits the development of value-added functionalities.
From these shortcomings arises the need for a new system that solves these problems.
1.8 Solution
The objective of this project is to establish a facial recognition system for access management in SFM’s building. The system will automate personnel and visitor identification, eliminating the need for manual processes and physical access credentials.
It will be integrated with SFM’s central management platform for efficient data management and utilization.
Furthermore, the system will optimize its performance by achieving efficient execution
times, scalability across multiple cameras, and reliable recognition results.
1.9 Functional Needs
• Access Control:
Authenticate authorized users attempting to gain access to secure areas using facial recognition technology.
Grant access only to authorized users according to the privileges they have been assigned.
Collect access logs with timestamps and user identification.
1.10 Non-Functional Needs
• Accuracy: The system must reliably identify users from images of sufficient quality, minimizing both false acceptances and false rejections.
• Security: The exchange of data should be secure and the system shouldn’t be easily bypassed.
1.12 Development methodology
1.12.1 Agile Method
Agile is a modern way of running projects, especially in tech. Instead of planning everything in advance and sticking to that plan no matter what, Agile is more flexible: it is like taking small steps, checking where you are frequently, and adjusting your direction as needed. It is an iterative approach that emphasizes collaboration, adaptability, and customer feedback throughout the project’s life cycle.[10]
Agile is not a one-size-fits-all approach. It offers several methodologies:
• Scrum
• Kanban
• Lean
• Crystal
1.12.2 SCRUM
Scrum is a management framework that teams use to self-organize and work towards
a common goal. It describes a set of meetings, tools, and roles for efficient project
delivery. Much like a sports team practicing for a big match, Scrum practices allow
teams to self-manage, learn from experience, and adapt to change. Software teams use
Scrum to solve complex problems cost-effectively and sustainably.[11]
1.12.3 Roles
A Scrum Team needs three specific roles: a Product Owner, a Scrum leader, and the development team.[11]
• Scrum leader: Scrum leaders are the champions for Scrum within their teams. They are accountable for the Scrum Team’s effectiveness. They coach teams, Product Owners, and the business to improve their Scrum processes and optimize delivery.
1.12.4 Meetings
Scrum events or Scrum ceremonies are a set of sequential meetings that Scrum Teams
perform regularly.[11] Some Scrum events include the following:
• Sprint Planning: In this event, the team estimates the work to be completed in
the next Sprint. Members define Sprint Goals that are specific, measurable, and
attainable. At the end of the planning meeting, every Scrum member knows how
each Increment can be delivered in the Sprint.
• Sprint: A Sprint is the actual period when the Scrum Team works together to finish an Increment. Two weeks is the typical length for a Sprint, but it can vary depending on the needs of the project and the team. The more complex the work and the more unknowns, the shorter the Sprint should be.
• Sprint Review: At the end of the Sprint, the team gets together for an informal session to review the work completed and showcase it to stakeholders. The Product Owner might also rework the Product Backlog based on the current Sprint.
• Sprint Retrospective: The team comes together to document and discuss what
worked and what didn’t work during the Sprint. Ideas generated are used to
improve future Sprints.
1.12.5 Sprints
1.12.6 Artefacts
Scrum Teams use tools called Scrum artifacts to solve problems and manage projects. Scrum artifacts provide critical planning and task information to team members and stakeholders.[11] There are three primary artifacts: the Product Backlog, the Sprint Backlog, and the Increment.
1.13 Choice of the modeling language
However, to avoid overloading the report while still covering some technical details, we will present only the diagrams that we found useful for the comprehension of the project, namely the use case diagram, activity diagram, sequence diagram, class diagram, and deployment diagram.
1.14 Conclusion
In this chapter, we presented our host organization “SFM Technologies”. We defined the problem statement of the project and the solution to be implemented. Finally, we described in detail our “SCRUM” working method as well as the choice of modeling language and the different diagrams to be presented. In the next chapter, we will explore the concept of facial recognition and review the related research.
Chapter 2
State of the Art
2.1 Introduction
In this chapter we will define facial recognition and detection, present the different methods for facial recognition and facial detection that we found during the research phase, and finally justify which method was chosen for this project.
• Element occlusion: the presence of items like eyewear, facial hair, or hats can affect performance and detection.
• Face pose: in an ideal situation the user is always facing the camera directly, but this is generally not the case, especially in video feeds.
• Facial expressions: expressions alter facial traits and so affect both detection and recognition.
• Image conditions: different cameras have different levels of distortion and quality, and the image is also heavily affected by lighting conditions.
Figure 2.1: The 5 different types of Haar-like features extracted from an image patch.[3]
Figure 2.2: Haar cascades are a class of object detection algorithms [4]
Analyzing these five rectangular regions and their corresponding differences in pixel sums, we can create features that help classify different parts of a face. Then, over an entire dataset of such features, we use the AdaBoost algorithm, a supervised learning algorithm that classifies data by combining multiple weak or base learners (e.g., decision trees) into a strong learner, to select which features correspond to the facial regions of an image. Haar cascade offers several advantages. It quickly computes Haar-like features using integral images, enhancing processing speed. Additionally, it efficiently selects features using the AdaBoost algorithm. A key benefit is the ability to detect faces in images irrespective of their location or scale. Lastly, the Viola-Jones object detection algorithm can operate in real time, making it practical for various applications. Haar cascade is far from a perfect algorithm, however: it is notorious for false positives and will report a face where no face is present in an image.
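As an illustration, the following minimal sketch shows how such a detector can be used through OpenCV, which bundles pre-trained Haar cascade files (the image path is illustrative):

    import cv2

    # Load OpenCV's pre-trained frontal-face Haar cascade (shipped with the library).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    # The detector operates on grayscale intensity values.
    image = cv2.imread("example.jpg")  # illustrative path
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # scaleFactor sets the image-pyramid step; minNeighbors discards weakly
    # supported detections, which helps suppress the false positives noted above.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)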
HOG is a feature descriptor used in computer vision and image processing, particularly in object detection and recognition tasks. It was introduced by Navneet Dalal and Bill Triggs in 2005 and is widely used for its effectiveness in capturing local shape and appearance information from images.[13] Initially, the input image is converted to grayscale to simplify computations. Additionally, it may undergo gamma and color normalization to enhance contrast.
Next, the method calculates gradients using filters such as the Sobel filter. These
filters determine the horizontal and vertical derivatives of the image. For each pixel,
the gradient magnitude (strength) and orientation (angle) are computed. The image is then divided into small square or rectangular regions known as cells. Typically, these cells are around 8x8 or 16x16 pixels in size. For each cell, a histogram of gradients is
calculated, representing the distribution of gradient orientations within the cell. The
orientations are divided into bins, such as nine bins for orientations ranging from 0 to
180 degrees or from 0 to 360 degrees. Each gradient contributes to one or more bins in
the histogram based on its orientation and magnitude, with contributions weighted by
the magnitude of the gradient.
Cells are grouped into overlapping blocks, such as 2x2 cells. The histograms of
cells within each block are concatenated and normalized to account for variations in
lighting and contrast. Finally, the histograms of all cells (after normalization) are
concatenated to form a feature descriptor for the image. This descriptor can be used
as input to machine learning algorithms such as Support Vector Machines (SVMs) for
tasks such as pedestrian detection and other object detection applications. HOG is
effective because it captures local object shape and appearance in a way that is robust
to changes in lighting and image noise.
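As a sketch, this descriptor can be computed with scikit-image’s hog function, using parameter values matching the typical configuration described above (the image path is illustrative):

    from skimage import color, io
    from skimage.feature import hog

    # Load an image (illustrative path) and convert it to grayscale.
    image = color.rgb2gray(io.imread("example.jpg"))

    # 9 orientation bins, 8x8-pixel cells, and 2x2-cell blocks with block
    # normalization, matching the configuration described in the text.
    descriptor = hog(image, orientations=9, pixels_per_cell=(8, 8),
                     cells_per_block=(2, 2), block_norm="L2-Hys")

    # 'descriptor' is a 1D feature vector that can be fed to a classifier
    # such as an SVM for detection tasks.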
Figure 2.6: Face detected using a HOG model generated from multiple faces.[5]
While these algorithms do provide good results, a more accurate approach is to use CNN (Convolutional Neural Network) and DNN (Deep Neural Network) models. CNNs and DNNs automatically learn complex features from the data, whereas algorithms like Haar cascade and HOG rely on handcrafted features. This ability to learn hierarchical features makes CNNs more robust and effective. They also achieve higher accuracy in face detection tasks due to their ability to model complex patterns and relationships in data. The higher accuracy and reduced false negatives and positives
make them better suited for tasks that require accuracy, but this comes at the cost of
performance when resources are limited or the task is time-sensitive like real-time face
detection.
2.4.4 MTCNN
The models operate sequentially, where the output of one model becomes the input for the next, allowing for intermediate processing steps such as non-maximum suppression (NMS) to filter bounding boxes. This cascade structure enables refined detection across stages. Implementing the MTCNN architecture can be complex, but there are open-source implementations and pre-trained models available for immediate use and further training on new datasets for face detection tasks.
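One such open-source implementation is the mtcnn Python package; a minimal usage sketch (the image path is illustrative):

    import cv2
    from mtcnn import MTCNN  # open-source package wrapping a pre-trained MTCNN

    detector = MTCNN()

    # MTCNN expects RGB input; OpenCV loads images as BGR, so reverse the channels.
    image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)

    # Each detection holds a bounding box, a confidence score, and five landmarks.
    for face in detector.detect_faces(image):
        x, y, w, h = face["box"]
        print(face["confidence"], face["keypoints"]["left_eye"])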
Non-deep-learning methods, such as Eigenfaces, Fisherfaces, and Local Binary Patterns (LBP), rely on handcrafted features. These features are manually designed based on domain knowledge and may not capture complex patterns as effectively as deep learning models. Feature extraction and decision-making are often separate processes: features are extracted first, and then a classifier (e.g., SVM or k-NN) is used for recognition. These methods are advantageous in terms of data requirements, needing only small datasets to operate at an acceptable level.
Deep learning is a sub-field of machine learning that involves training artificial neural networks with multiple layers (deep neural networks) to learn representations and patterns from data. These networks are capable of automatically learning complex features from raw data, making deep learning a powerful tool for tasks such as image recognition, natural language processing, and speech recognition.
FaceNet is an example of a deep learning model intended for face recognition. It is a 22-layer deep model published by Google researchers Schroff et al. It takes an input image and outputs a 128-dimension vector, called an embedding in reference to the amount of information embedded within this vector.[15]
The idea behind FaceNet is to train a model to generate these 128-dimension vectors. The training process involves loading three images: two belong to the same person while the third shows a different person. The algorithm then generates a vector for each image and
adjusts the neural network weights according to the measured distances, so that the two images of the same person end up closer to each other than to the third. This method is called triplet loss. This model can be used in conjunction with a classification model trained on the generated embeddings.
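Formally, for an anchor image x^a, a positive x^p of the same person, and a negative x^n of a different person, the triplet loss defined in the FaceNet paper [15] can be written as

\[
L = \sum_{i=1}^{N} \max\Big( \big\| f(x_i^a) - f(x_i^p) \big\|_2^2 - \big\| f(x_i^a) - f(x_i^n) \big\|_2^2 + \alpha,\ 0 \Big),
\]

where f(\cdot) is the embedding function and \alpha is a margin enforced between positive and negative pairs.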
VGG-Face is another model that generates a vector for recognition. It was described by Omkar Parkhi in a 2015 paper titled "Deep Face Recognition".[16] This model uses data augmentation during training to improve its performance and robustness. This technique involves applying random transformations to the training data to increase its diversity and help the model generalize better. VGG-Face generates a larger, 4096-dimension vector, which makes it more accurate. This algorithm can also be used along with a classification model to skip the correspondence step.
2.6.1 Classification
Once a new photo has been loaded into the system, it proceeds through an analysis process to acquire a feature matrix. This matrix, which is like a fingerprint of the face, captures its most essential elements. It stores no raw distances; instead, it contains encoded vital measurements such as the distance between the eyes, the width of the nose, and the proportions of facial features.
Next, the feature matrix for the new face is computed and the system employs a distance metric, such as the Euclidean distance, to compare the new face to the database of stored faces. Each enrolled face has a corresponding stored feature matrix, and the distance metric estimates the similarity between the new face’s feature matrix and each matrix in the database. The database entry at the smallest distance, i.e. the closest match, is designated as the supposed match.
In other words, the feature matrix stands for the face’s characteristics, and the distance metric returns the best match from the database after comparing it against the encoded features.
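Concretely, for two n-dimensional feature vectors x and y, the Euclidean distance used for this comparison is

\[
d(x, y) = \| x - y \|_2 = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},
\]

and the stored face with the smallest distance d is taken as the supposed match.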
For the face detection process we chose Haar cascade because it is fast and overall reliable. While it is prone to false positives, we did not find this to be an issue in practice: we encountered false positives only a handful of times, and they were easily distinguishable from real faces. Other methods were either unreliable or consumed too many resources; since detection is a continuous process, resource consumption was a major factor in choosing Haar cascade over other models and methods.
For the face recognition model, we could not decide on a single approach from research alone. Almost all the methods and models were promising, so we decided to try a number of them, some trained by us and others pre-trained, with the goal of determining which one is best for our system. We tried both deep learning and non-deep-learning methods. The best-performing model was a pre-trained FaceNet model, used as an embedding extractor without the classification step. Part of why we forwent classification is the difficulty of adding and removing users.
When a new user is added, the model must be retrained and tested with the new user’s images, which can be time-consuming. Also, with the classification technique, the system must be taken offline and then restarted with the newly added user for the change to take effect. Removing a user is significantly more difficult because we can’t "untrain" the model with the user’s data. To remove a user, we must either revert to a previous version of the model that has not been trained on the user’s data or add a filter to the decision-making system. Adding a filter for removed users may be a workable solution, but we still end up with a model that considers their data in its predictions.
2.8 Conclusion
In this chapter we discussed how face detection and recognition are achieved, citing important papers on the subject, and then decided which approach is best suited for our needs. In the next chapter, we will talk about the design of the system, how these techniques play a role in it, and how the different components operate and communicate.
Chapter 3
System design
3.1 Introduction
In this chapter we discuss the design of the system: how it behaves, the different requirements and components that make it up, and how they communicate with each other as well as with other systems. In the first section we give an overview of the system; we then discuss the system’s requirements, the general design, and finally the data flow throughout.
3.3.1 Accuracy
Perhaps the most important requirement is accuracy: the system needs to be able to accurately identify a user when given an image of sufficient quality. A failure to correctly identify a person could lead to wasted time, in the case of wrongly denied access, or to broken security measures, in the case of an identification wrongly attributed to a person.
3.3.2 Performance
3.3.3 Security
As with any system that deals with restricted resource access and sensitive personnel data, there is a minimum security requirement that needs to be met. API requests between the various components need to be secured, and users’ personal data needs to be stored in a safe and legally compliant manner so as to guarantee the privacy of users.
3.4 General Design
After reviewing the needs and requirements of the system, the following composition of modules and techniques was chosen.
The facial detection component is responsible for handling the multiple video feeds and for detecting and sending faces. It takes camera information from a database, starts a detection thread on each feed, and communicates any detected face to the main component. Along with the face image, it sends the timestamp of when the face was detected and the camera identifier, to record where the user was detected. Using this information, the main component can proceed with the recognition process and send the user id to the appropriate system.
The facial recognition model component is responsible for converting face images into embeddings for use in identification or for adding users to a database. This is where the main use of AI comes into play. It listens for incoming requests that contain image data encoded in Base64, converts the data back into a readable image form, and runs it through the AI model. The resulting embedding vector is then sent back to the client.
Because the face detection and face recognition modules are separate, an intermediate component is needed. This component is responsible for interacting with the detection component and the AI model. It receives the face images, stores them on disk, searches for corresponding embeddings in the database, and sends the results to the decision-making component. It also handles authentication with the central system for database access.
REST API stands for "Representational State Transfer Application Programming Interface". It is a way to design networked applications using a set of constraints and principles that help standardize the way systems communicate over the web. Because our modules and the overall system need to be independent, we chose this method of exchanging information. The key features of REST API design are:
• Stateless
• Client-Server Architecture
• Resource-Based
• HTTP methods
• JSON/XML Format
• Layered-system
• Cacheability
The choice of REST APIs also stems from the performance and security requirements. In terms of performance, it allows us to run the different components on separate machines tailored to each module’s resource consumption, as well as to develop these components in more performance-oriented languages if the need arises. Since we can use HTTPS/TLS for communication and the modules can be deployed in separate networks (allowing, for example, the database to be hosted on a secure machine), the security requirement is met.
3.4.6 Database
There are many types of databases to choose from for this type of application, but the two main types are SQL databases and NoSQL databases. SQL databases are perfect for relational data, while NoSQL databases are best suited for non-relational data of different types. The choice of database type is also restricted by what is already in use at SFM Group; since they already have an SQL database management system, we will be using SQL in our system to eliminate compatibility issues and reduce type conversion processes.
The use-case diagram depicts how users will interact with the system. The only effective interaction is when the user walks inside the range of facial detection. Aside from this interaction, the user has no other options that can influence the behavior or outcome of the recognition process.
The class diagram defines the classes from which objects will be derived and used. For our system the main classes are the user and camera classes, with other classes defined for ease of implementation and abstraction of the system, such as the level class. The following diagram describes the main classes in the system.
The purpose of the sequence diagram is to explain the interactions between modules in a time-based manner. The following diagram explains when each component intervenes in the overarching recognition process.
This diagram describes the general behavior and activity of the system explained in the previous section. It defines the general behavior of the system at each step and the results of decision nodes.
3.5.5 Authentication
The authentication component is a utility that already exists within the central management system of SFM. It receives a username and password and sends back a token to be used to access the databases and to send the user’s id back to the central system.
3.6 Conclusion
In conclusion, the system needs to be divided into independent components that can reliably communicate with each other. A central module links the detection, AI model, and database to execute the recognition, and then communicates the result to a decision-making system. The communication is ensured by using web requests in the form of REST APIs, and security is guaranteed by an authentication service and the use of TLS encryption. In the next chapter we will discuss the implementation of this design in depth.
Chapter 4
Implementation
4.1 Introduction
This chapter discusses the working environment and covers the chosen technology stack in detail. It goes on to describe the various models trained and the reasons and processes behind the model selection. Lastly, system testing takes place and security considerations are reviewed.
4.2 Working environment
Windows 10: A Microsoft operating system for personal computers, used as the main development environment.
Ubuntu Server: An open-source server operating system that delivers scale-out performance and is available for free.
4.3 Tech stack
4.3.1 Programming languages
Python
• Use Case: Many libraries and frameworks for AI are designed with Python.
4.3.2 Frameworks
Flask
• Use Case: Flask is easy to get started with and keeps the core of the application
simple and scalable.
TensorFlow
• Use Case: It is designed to run machine learning and deep learning workloads and to streamline the development and execution of advanced analytics applications.
DeepFace
4.3.3 Libraries
OpenCV
• Description: An open-source computer vision and machine learning software library.
NumPy
• Use Case: It provides a multidimensional array object and various derived objects (such as masked arrays and matrices).
Requests
• Use Case: It makes HTTP requests behind a simple API that returns a Response object with all the necessary data (content, encoding, etc.).
4.3.4 Application Servers
Gunicorn
• Use Case: It is broadly compatible with various web frameworks, simply implemented, light on server resources, and allows multiple instances of the same application to be deployed.
Nginx
• Description: It is open source software for web serving and reverse proxying.
• Use Case: It directs the client’s request to the appropriate back-end server and handles HTTPS encryption.
Postman
4.4 Model Training
4.4.1 LBPH (Local Binary Patterns)
The first model we tried was a non-deep-learning algorithm called LBPH, which summarizes facial textures using histograms (distributions of numerical data) of the unknown face and then compares them to those of known faces in a database. We used a dataset of 5 faces, with 50 images per face. We reached a maximum accuracy of 46%.
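A minimal sketch of this approach using OpenCV’s contrib module (the dataset loading is elided; the placeholder arrays stand in for our grayscale face images and integer user labels):

    import cv2
    import numpy as np

    # LBPH recognizer from the opencv-contrib-python package; it summarizes each
    # face as a grid of local-binary-pattern histograms.
    recognizer = cv2.face.LBPHFaceRecognizer_create()

    # train() expects grayscale face images and one integer label per image; the
    # placeholders below stand in for our 5-person, 50-images-per-person dataset.
    faces = [np.zeros((100, 100), dtype=np.uint8)]
    labels = np.array([0])
    recognizer.train(faces, labels)

    # predict() returns the closest label and a confidence value, where a lower
    # value means a closer histogram match.
    label, confidence = recognizer.predict(faces[0])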
4.4.2 ResNet (CNN)
The second type of model we trained was a ResNet, a type of feed-forward convolutional neural network with a number of regularization effects, in which the feature engineering is done by the model itself through filter optimization. We again trained on 5 faces, with 50 images per face.
The main architecture is composed of two convolutional layers, with 32 and 64 filters of size 3x3 respectively. These layers learn various low-level features from the input image, and the later convolutional layers use 1x1 filters to learn more localized features.
The activation function used was ReLU (Rectified Linear Unit), applied after most of the convolutional layers to add the non-linear transformations that help the network learn non-linear features.
After some of the convolutional layers we applied batch normalization to improve the speed and stability of training.
We also applied Dropout layers to minimize the risk of overfitting, configured so that neurons are dropped with a 50% probability during the training phase. A MaxPooling layer then downsamples the image while keeping the maximum activation in each region of interest.
The obtained features are then flattened into a 1D vector and passed through two fully-connected layers.
The first dense layer has 32 neurons and is followed by a final dense layer with as many neurons as there are classes to predict; sigmoid activation is used in the output layer.
Lastly, the model was compiled using categorical cross-entropy as the loss function for the multi-class dataset, and stochastic gradient descent is used as the optimizer.
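The following is a simplified Keras sketch of the architecture described above (the input size and exact layer placements are assumptions; dataset loading and the 1x1 convolutional layers are omitted):

    from tensorflow.keras import layers, models

    num_classes = 5  # one class per enrolled face in our experiment

    model = models.Sequential([
        # Two convolutional layers with 32 and 64 filters of size 3x3; the
        # 128x128 grayscale input shape is an assumption.
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 1)),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),  # down-sample, keeping the max activations
        layers.Dropout(0.5),          # 50% dropout against overfitting
        layers.Flatten(),             # flatten the features into a 1D vector
        layers.Dense(32, activation="relu"),
        layers.Dense(num_classes, activation="sigmoid"),  # output layer
    ])

    model.compile(optimizer="sgd", loss="categorical_crossentropy",
                  metrics=["accuracy"])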
4.4.3 FaceNet
While comparing the performance of the different models that could be used in our facial recognition system, we determined that the best one is by far FaceNet, since it is a deep learning architecture that produces robust, high-quality facial embeddings.
4.5 Implementation
4.5.1 Facial detection component
First, we need to establish a way to receive video feeds from the cameras. At SFM, a working CCTV system is already in place, using RTSP to send video feeds to a central server. Using the RTSP URL allows us to handle the video feed in code and pass frames to the detection function. We used OpenCV’s Haar cascade model to detect faces; once a face is detected, we use the bounding box to extract the face from the frame, and this face image is then converted to Base64 so it can be sent in a JSON payload. The Base64 string is then sent, along with the timestamp of when the face was detected and the id of the camera it was detected from, to the intermediate server component, which handles the conversion, recognition, logging, and communication with the decision-making component.
Once a response confirms that the data has been handled correctly, the system resumes analysing the video frames, looking for face patterns to repeat the process again. In order to handle multiple cameras at the same time, the program first fetches information about the cameras from a database. This database contains the unique identifier of each camera and the RTSP link used to establish the connection. Once the data is
received, a thread is launched for each camera with its ID and link. Using multi-threading allows each camera to operate independently, optimizes resource usage, and allows the program to be dynamic and scalable.
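A condensed sketch of this detection loop is shown below; the endpoint URL, the camera record, and the JSON field names are illustrative assumptions rather than the actual deployment values:

    import base64
    import threading
    import time

    import cv2
    import requests

    SERVER_URL = "https://main-server.example/faces"  # hypothetical endpoint
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def watch_camera(camera_id, rtsp_url):
        capture = cv2.VideoCapture(rtsp_url)  # OpenCV reads the RTSP stream
        while True:
            ok, frame = capture.read()
            if not ok:
                continue
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
                face = frame[y:y + h, x:x + w]        # crop via the bounding box
                _, jpeg = cv2.imencode(".jpg", face)  # encode the crop as JPEG
                payload = {
                    "image": base64.b64encode(jpeg.tobytes()).decode("ascii"),
                    "timestamp": time.time(),
                    "camera_id": camera_id,
                }
                # Wait for the server's confirmation before resuming analysis.
                requests.post(SERVER_URL, json=payload, timeout=10)

    # One thread per camera; real records would come from the camera database.
    cameras = [(1, "rtsp://user:pass@10.0.0.10/stream")]
    for cam_id, url in cameras:
        threading.Thread(target=watch_camera, args=(cam_id, url),
                         daemon=True).start()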
The software moves on to the recognition phase. The face features stored in memory
are supplied to a function that loops over the database, calculating the distance between the detected face’s features and each recorded set of facial features. If a distance below a certain threshold is found, the method returns the user id associated with the database embedding; otherwise, the text "unknown" is returned. When the recognition process produces a result, it is passed along with the timestamp and camera id to the decision-making process. At the same moment, a "200 OK" response is delivered back to the detecting component, allowing it to resume its work. Finally, the intermediate server continues to listen for incoming requests.
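A sketch of this matching function, assuming an illustrative distance threshold and a list of (user_id, embedding) pairs loaded from the database:

    import numpy as np

    THRESHOLD = 10.0  # illustrative cutoff; the real value is tuned empirically

    def identify(embedding, known_embeddings):
        # Loop over stored (user_id, embedding) pairs, tracking the closest one.
        best_id, best_distance = "unknown", float("inf")
        for user_id, stored in known_embeddings:
            distance = np.linalg.norm(np.asarray(embedding) - np.asarray(stored))
            if distance < best_distance:
                best_id, best_distance = user_id, distance
        # Accept the match only below the threshold; otherwise report "unknown".
        return best_id if best_distance < THRESHOLD else "unknown"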
Figure 4.7: Automatically produced folders containing face photos, each with a unique
id.
For the implementation of the AI model we used the DeepFace library, an open-source Python wrapper for many popular models, including FaceNet. The primary reason
for using this library is its representation function, which takes an image as input, in the form of a numpy array or an image location, and returns the image embedding. Although the library has other functions, all we needed was a simple code-based method of interacting with the model.
The idea of creating an API for the model stems from the performance requirements: if we included the model loading and represent function in our verification code, the model would be loaded and unloaded from memory each time the function is called, adding significant latency given that the model occupies approximately 100 MB of memory. So the goal is to load the model into memory once and then access it several times via a web API.
When the model server starts, it listens for new requests. Each request is a JSON payload that includes an "image" key with a Base64 representation of the image. This Base64 string is transformed into a JPEG image in memory, then into a numpy array, which is supplied to the "represent" function. The function’s return value is subsequently sent back to the client.
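A minimal sketch of such a server using Flask and DeepFace (the route name is an assumption; recent versions of DeepFace return a list of dictionaries, each carrying an "embedding" key):

    import base64
    import io

    import numpy as np
    from deepface import DeepFace
    from flask import Flask, jsonify, request
    from PIL import Image

    app = Flask(__name__)

    @app.route("/represent", methods=["POST"])  # route name is an assumption
    def represent():
        # Decode the Base64 string to a JPEG image in memory, then a numpy array.
        data = base64.b64decode(request.get_json()["image"])
        pixels = np.asarray(Image.open(io.BytesIO(data)))
        # The input is already a cropped face, so detection is not enforced.
        result = DeepFace.represent(img_path=pixels, model_name="Facenet",
                                    enforce_detection=False)
        return jsonify({"embedding": result[0]["embedding"]})

    if __name__ == "__main__":
        app.run(port=5000)  # a production deployment would sit behind Gunicorn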
4.6 Performance Testing
4.6.1 Model Performance
Several tests were performed on the model to assess its performance, primarily how accurate it is and how well it can build a facial embedding from an image. Using a modest 40-image dataset, the model achieved 98 percent accuracy and performed well even with varying facial expressions. During testing, we discovered that the time needed to produce a result from an image is proportional to the size of the image: the larger the image, the longer it takes to build an embedding.
4.6.2 Detection Performance
Conditions such as face pose and lighting influence detection, although distance can be mostly ignored if those two conditions are met. The accuracy of detection cannot be easily measured. False positives are common with Haar cascade face detection, but this is insignificant under ideal conditions. Hardware has a considerable impact on performance, with weaker CPUs processing fewer frames in the same time interval than capable CPUs. The detection procedure runs continuously and in real time, putting a constant load on processors and memory.
4.6.3 Processing Performance
Processing the images includes resizing them and converting them from Base64 to byte data. This processing adds strain on the system, but only when a face is sent for identification. We see spikes in resource consumption when an image is being processed, so we do the processing in the intermediate phase to reduce the load on the AI model component.
4.6.4 Results
Latency Test: The response latency is the time that a server takes to respond to a request; for our system, it is the time it takes to identify a face once a user is positioned in front of the camera. Both the intermediate server and the AI model server were tested separately in order to identify where the performance bottlenecks are.
Load Test: The purpose of the load test is to see how many requests the system can handle at the same time without slowing down or crashing. The test is conducted using Postman’s runner function, which allows us to define a request and send it multiple times simultaneously. One example is sending ten requests three times a second, for a total of 107 per minute. Even when running the system on low-performance hardware, we did not observe a significant drop in response time or any crashes.
4.7 Optimisation
Various measures were taken to optimize overall performance. For example, image processing is done in the main component and not in the detection phase, to reduce resource consumption when treating multiple frames per second. Copies of the user database are kept on the local machine with the main component, to reduce the number of requests made to the database server and speed up the search for a user. Both the main server and the AI model server are deployed using Gunicorn, which allows us to deploy multiple workers in order to handle multiple requests at the same time.
4.8 Security
Figure 4.16: Example auth response (image is only representative and not reflective of actual token size)
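As described in Section 3.5.5, the component exchanges a username and password for a token and then presents that token on subsequent requests. A hypothetical sketch of this flow (the endpoint URLs and JSON field names are placeholders, not SFM’s actual API):

    import requests

    AUTH_URL = "https://central.example/auth"     # placeholder endpoint
    USERS_URL = "https://central.example/users"   # placeholder endpoint

    # Exchange service credentials for a token, as described in Section 3.5.5.
    response = requests.post(AUTH_URL, timeout=10,
                             json={"username": "service", "password": "secret"})
    token = response.json()["token"]  # field name is an assumption

    # Present the token as a bearer credential on subsequent HTTPS requests.
    users = requests.get(USERS_URL, timeout=10,
                         headers={"Authorization": f"Bearer {token}"})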
4.9 Conclusion
In this chapter we detailed the implementation of the design, explaining how each component works and communicates. We also discussed the security measures and the performance testing conducted.
General Conclusion
In summary, this report details the research, design, and implementation of an access management system using artificial-intelligence-based facial recognition. The result is a rudimentary system that is extensible and easy to implement within an existing software and hardware architecture.
The research conducted facilitated the design process by allowing us to compare the different facial detection and facial recognition techniques, algorithms, and models. The methods chosen during the design process were justified according to the requirements specified in the project overview.
The results obtained were very promising. The system was fairly accurate, utilizing only one picture to recognize a person, and the recognition process was very fast considering it was running on average hardware.
On a critical note, the system could have been designed in a more object-oriented way to facilitate extensibility. We could also have used more performance-oriented programming languages for the components with such requirements. The way we encode and send images could be further optimized to reduce the time spent on conversion. Likewise for the database: if we had used a document-based database instead of an SQL one, we would have gained some flexibility in saving and loading data.
Overall, the project is a good starting point for creating a facial recognition access management system.
Bibliography
[5] Adam Geitgey. Machine learning is fun! Part 4: Modern face recognition with deep learning. https://miro.medium.com/v2/resize:fit:1100/format:webp/1*6xgev0r-qn4oR88FrW6fiA.png, 2016. Accessed: 2024-4-11.
[7] Ramandeep Kaur and Er. Himanshi. Face recognition using principal component
analysis. In 2015 IEEE International Advance Computing Conference (IACC),
pages 585–589, 2015.
[8] Kawtar Choubari. SQL vs NoSQL: What’s the best option for your database? https://medium.com/ieee-ensias-student-branch/sql-vs-nosql-whats-the-best-option-for-your-database-3e0fe08c1449, 2020. Accessed: 2024-5-13.
[10] Nandkishor More. What is agile methodology, in simple words? https://medium.com/@nandkishor204/what-is-agile-methodology-in-simple-words-741ee44ede51, 2023.
[12] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, volume 1, pages I–I, 2001.
[13] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection.
In 2005 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR’05), volume 1, pages 886–893 vol. 1, 2005.
[14] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503, October 2016.
[15] Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 815–823, 2015.
[16] Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition.
In British Machine Vision Conference, 2015.
Résumé
Ce projet présente le développement et la mise en œuvre d'un système de
gestion d'accès par reconnaissance faciale utilisant des technologies
d'intelligence artificielle (IA). L'objectif principal du système est d'améliorer la
sécurité et de rationaliser le contrôle d'accès dans divers environnements
tels que les bureaux d'entreprise, les installations sécurisées et les
complexes résidentiels. En utilisant l'apprentissage profond pour l'extraction
et la reconnaissance des traits faciaux, le système offre une grande précision
et efficacité dans l'identification des individus autorisés. L'architecture du
système comprend le traitement d'image en temps réel. Le système est
conçu pour être facilement intégré aux infrastructures de sécurité existantes.
Le déploiement de ce système de gestion d'accès par reconnaissance
faciale basé sur l'IA promet une sécurité renforcée, une réduction des
charges administratives et une expérience utilisateur fluide, en faisant une
solution viable pour les défis modernes du contrôle d'accès.
Abstract
This project presents the development and implementation of a facial recognition
access management system leveraging artificial intelligence (AI) technologies.
The primary objective of the system is to enhance security and streamline access
control in various environments such as corporate offices, secure facilities, and
residential complexes. Utilizing deep learning for facial feature extraction and
recognition, the system offers high accuracy and efficiency in identifying authorized
individuals. The system’s architecture includes real-time image processing. The system
is designed to be easily integrated with existing security infrastructures.
The deployment of this AI-based facial recognition access management system
promises enhanced security, reduced administrative overhead, and a seamless user
experience, making it a viable solution for modern access control challenges.