Master's Thesis: Houcem Eddine TESTOURI
SERCOM Laboratory
2020-2021
Acknowledgement
Also, I would like to thank my fiancée Meriem Ata for her wise counsel
and sympathetic ear. You were always there for me.
Finally, I could not have completed this work without the support of my family and friends, who provided me with stimulating discussions as well as happy distractions to rest my mind outside of my research.
Contents
List of Figures

Introduction

1 Background
    Introduction
    I    Internet of Things
         I.1   Definition
         I.2   Internet of Multimedia Things
    II   Big data
         II.1  Definition
         II.2  Big Data characteristics
         II.3  Definition and characteristics of multimedia big data
         II.4  Big Data applications in multimedia
    III  Multimedia big data and IoT
    Conclusion

3 Related work
    Introduction
    I    AMSEP
    II   RAM3S
    III  Rate adaptive multicast video streaming over software defined network
    IV   Real-time Data Infrastructure at Uber
    Conclusion

4 Solution approach
    Introduction
    I    System architecture
    II   MAFC architecture components
         II.1  Receiver
         II.2  Flow controller
         II.3  Policy adapter
         II.4  Control policies database
         II.5  Message broker
         II.6  Stream processing engine
         II.7  Third parties services
    III  Application to surveillance systems
         III.1 Implementation of multimedia stream controller
         III.2 Message broker
         III.3 Implementation of multimedia data processing engine
    IV   Experimental evaluation
         IV.1  Processing time
         IV.2  End-to-end latency
         IV.3  Flow controller impact
         IV.4  Data Loss
    Conclusion

Bibliography
List of Figures
Introduction
Big data and the Internet of Things (IoT) are evolving rapidly, affecting
many aspects of technology and business and increasing the benefits to com-
panies and individuals. The explosion of data generated by the IoT has had
a significant impact on the big data landscape [7]. Multimedia data, such as text, audio, images and videos, is rapidly evolving and becoming one of the main channels for controlling, processing, exchanging and storing information in the modern era [8]. The fact that 2.5 quintillion bytes (2.5 billion GB) of data are generated every day poses a huge challenge in itself. Approximately 90 percent of the world's data has been created in the last two years [9]. Video streaming and downloads are expected to account for 82 percent of global internet traffic by 2022, according to estimates. To put this in perspective, in 2018, 56 exabytes (one exabyte being one billion gigabytes) of internet video were consumed each month [10].
We can therefore predict the amount of data that will be generated in the
coming years. The enormous development of media data accessible to users
poses a whole new set of problems in terms of data control and processing.
A great deal of crucial data is transferred in the IoT context. Due to the growing relevance of public security in many locations, the demand for intelligent visual surveillance has increased. It is necessary to build computer-vision-based control and processing tools for the real-time detection of fraud, suspicious movements, and criminal suspects' vehicles, among other things. The detection and subsequent tracking of moving objects is an important task for computer-vision-based surveillance systems. Recognising a moving object means identifying an object in a video whose position varies relative to the field of view of the scene; the detection process can be characterised as tracing the trajectory of an object through the video.
While traditional real-world multimedia systems provide real-time multimedia processing, scalability remains a serious problem due to the volume and velocity of this data: such systems cannot keep pace with the addition of multimedia sources, such as cameras, with exponentially higher bandwidth and processing rates. Higher bandwidth demands more memory, computing power and energy consumption [1]. The problem is that the incoming data rate can exceed the processing rate, which can lead to data loss or delays in real-time processing. In addition, intensive data processing can impact power consumption and even damage the system hardware. For such mission-critical applications, information loss or long notification delays after a security breach are not acceptable. However, in a big data context, with a large number of data sources and heterogeneous networks, deploying solutions that deal with these constraints is difficult.
In this study, we are interested in the real-time control and processing of massive multimedia data in order to extract interesting events with low latency and without data loss. We propose MAFC, a Multimedia Adaptable
Flow Controller for big data systems. Compared to conventional multime-
dia applications, we show that MAFC enables scalability and improves the
processing performance of big data multimedia systems.
The rest of the report is organized as follows:
Chapter 1
Background
Introduction
In this chapter, we will present relevant background on the Internet of Things and the Internet of Multimedia Things. Furthermore, we will introduce basic big data concepts: the definition and characteristics of big data, multimedia big data, and its applications.
I Internet of Things
I.1 Definition
The Internet of Things (IoT) designates a group of interconnected objects, each possessing a unique identifier and the ability to exchange data. This interconnection is guaranteed by a network that does not require human supervision. First, a group of embedded sensors collects the data; the data is then stored, processed and shared through the network in order to achieve a certain functionality depending on the studied case.
network architecture. Quality of experience (QoE) refers to how users perceive QoS. QoE can be assessed objectively or subjectively. The subjective QoE of customers is difficult to quantify and varies greatly with needs; service providers evaluate subjective QoE through the network's Mean Opinion Score (MOS). The amount of multimedia data is multiplying, which creates additional problems in terms of data transmission, processing, storage and sharing. New processing algorithms are needed for edge, fog, and cloud devices, and new compression and decompression algorithms are being introduced for multimedia data storage. The standard routing protocol for the IoT is the Routing Protocol for Low-Power and Lossy Networks (RPL); it requires further work to cover IoMT deployment scenarios, taking energy, fault tolerance and latency into account.
Multimedia communications are supported by IoT features, although
multimedia applications are bandwidth intensive and delay sensitive. The
increasing expansion of multimedia traffic in the Internet of Things has neces-
sitated further development of new techniques to meet its requirements. To
process data, IoMT devices need more bandwidth, more memory and faster
processing resources. Multipoint-to-point and multipoint-to-multipoint scenarios are common communication scenarios. Emergency response systems, traffic monitoring, crime inspection, smart cities, smart homes, smart hospitals, smart agriculture, surveillance systems, the Internet of Bodies (IoB), and the industrial IoT (IIoT) are examples of real-world multimedia applications.
Figure 1.1: Key Data Characteristics of IoT and IoMT [1]
Multimedia communication in the IoT faces enormous challenges due to
network dynamics, device and data diversity, QoS stringency, latency sensi-
tivity, and reliability requirements in a resource-constrained IoT. Figure 1.2
shows a variety of IoMT uses.
II Big data
II.1 Definition
The term "big data" refers to datasets that are too large to be processed by traditional database management tools and too unwieldy to use. It also implies datasets with great variety and velocity, which requires the development of solutions to extract value and insights from large, rapidly changing datasets [12]. According to the Oxford English Dictionary, big data is described as "very massive data sets that can be processed by computer to reveal patterns, trends, and relationships, especially in human behavior and relationships." However, according to Arunachalam et al.
(2018) [13], this definition does not capture all of big data, as big data must
be distinguished from data that is difficult to process using typical data ana-
lytics. Due to the exponential increase in complexity, big data requires more
sophisticated approaches to process it.
Velocity: refers to the time it takes to analyze a large amount of data. Fast processing maximizes efficiency because some activities are extremely vital and require instantaneous responses. Big data streams should be examined and leveraged as they come into businesses for time-sensitive activities, such as fraud detection, to maximize the value of the information (e.g., processing 5 million business events created each day to identify potential fraud, or analyzing 500 million daily call detail records in real-time to predict customer choices more quickly).
Variety: refers to the types of data found in big data. This information can be either structured or unstructured. Any kind of data, structured or unstructured, such as text, sensor data, audio, video, click streams and log files, is classified as big data (e.g., tracking hundreds of live video streams from surveillance cameras to target points of interest, or using the 80% growth in data in photos, videos and documents to improve customer satisfaction). Analyzing the merged data types generates new challenges, scenarios, etc.
Value: the added value that the acquired data can provide to the intended process, business, or predictive analysis/hypothesis is an important element of the data. The value of the data is determined by the events or processes it represents, such as stochastic, probabilistic, regular or random occurrences or processes. Depending on this, obligations may be imposed to collect all the data, store it for a longer period (for a potentially interesting occurrence), and so on. In this regard, the volume and diversity of the data are closely related to its value.
size. Compared to ordinary big data, it contains a wide variety of data types.
Compared to machines, humans can better interpret multimedia datasets. In
addition, multimedia big data is more complex to manage than traditional
big data, which includes a variety of audio and video files such as interactive
videos, three-dimensional stereoscopic movies, social videos, etc. Multimedia
big data is difficult to describe and categorize because it is collected from
a variety of (heterogeneous) sources, including ubiquitous mobile devices,
sensor-embedded gadgets, the Internet of Things (IoT), the Internet, digital
games, virtual worlds and social media. Analyzing the content and context of
multimedia big data, which is inconsistent across time and space, is challenging. Due to the increasing growth of sensitive video communication data, securing multimedia big data is complex. To keep up with the speed of network transmission, multimedia big data must be processed quickly and continuously; to transmit massive amounts of data in real-time, it must also be stored in a way that supports real-time computing.
Based on the above characteristics, it is clear that scientific multimedia
big data poses some fundamental challenges, such as cognition and com-
plexity understanding, complex and heterogeneous data analysis, distributed
data security management, quality of experience, quality of service, detailed
requirements, and performance constraints. The problems listed above are
related to the processing, storage, transmission, and analysis of multimedia
big data, which leads to new avenues of study in this area. The term "big data" here again refers to datasets that, due to their size and complexity, cannot be processed by typical data processing and analysis application software. These vast data sets, both structured and unstructured, make various tasks such as querying and sharing extremely difficult.
Smartphones: have recently surpassed the use of other electronic devices
such as desktops and laptops. Smartphones are equipped with modern
technology and features such as Bluetooth, a camera, network connec-
tion, GPS, and a large capacity central processing unit (CPU), among
others. Users can edit, process, and access heterogeneous multimedia
data via smartphones. In the big data context, challenges arise related to mobile smartphone sensors and data analysis, such as data sharing, influence, security, and privacy issues. Other smartphone-related topics, such as big data, security, and multimedia cloud computing, are also explored.
for the implementation of IoT applications in the development of multime-
dia big data. The rapid growth of the Internet of Things, along with a huge
volume of multimedia data, has created new opportunities for the growth of
multimedia big data. These two well-known technological advances are mu-
tually dependent and should be developed in combination, providing more
opportunities for IoT research.
Conclusion
This chapter gave us a better idea of the concepts treated in this work, such as the Internet of Multimedia Things and multimedia in big data systems.
Chapter 2
Introduction
Throughout this chapter, we will focus on the processing and analysis of data streams: their definition, key features, and some stream processing technologies.
Programming models
A stream processing engine's programming model consists of a processing job that accepts stream items from a source, processes them, and emits items to a sink [18]. A programming model, in this case, is an abstraction of the stream processing engine that enables stream transformations and processing. Given the unbounded nature of streaming data, having a global perspective of the incoming streaming data is not possible [19]. As a result, data is handled in pieces on top of an endless data stream by constructing discrete, bounded time periods. In most stream processing engines, (i) a window is specified by either a time length or a count of records. The count-based window is sized by the number of events included in the window, whereas the time-based window defines a moving view that decomposes a stream into subsets and computes a result over each subset [20] [21]. This makes it possible to run transformations on the records in the window [18]. In stream applications, time-based windows are common [21]. (ii) Sliding windows divide the incoming data stream into fixed-size segments, each with a window length and a sliding interval. A sliding window is equivalent to a fixed window when the sliding interval equals the window length. If the sliding interval is less than the window length, one or more pieces of data will be included in more than one sliding window; because the windows overlap, aggregation transformations in a sliding window provide a smoother outcome than in a fixed window. (iii) Unlike fixed and sliding windows, session windows do not have a predetermined window length; instead, a session window is usually delimited by a period of inactivity that exceeds a certain threshold. A session is defined as a burst of user activity followed by inactivity or a timeout [18].

Many stream transformations are stateless because they are performed independently on each record. The most frequent stateless transformations are (i) Map, which converts one record to another, (ii) Filter, which discards records that do not fulfill a given predicate, and (iii) FlatMap, which converts each record to a stream and then flattens the stream elements into a single stream. Stateful transformations are those that require the processor to retain internal state across many records. Common stateful transformations are (i) Aggregation, which combines all the records into a single value, (ii) Group-and-aggregate, which extracts a grouping key from each record and computes a separate aggregated value per key, (iii) Join, which joins same-keyed records from multiple streams, and (iv) Sort, which sorts the records observed in the stream. In stateful transformations, all ingested records within a specified time range are included in the computation. Windowing represents distinct or partially overlapping time frames and meaningfully limits the scope of the aggregation [18].

In summary, stream processing engines can do real-time streaming data processing, in which tuples are processed as they arrive; batch data processing, in which time windows are used to divide tuples into batches; or both. A batch of stream elements for a given stream is a time-interval-based subset of the stream. Incoming data streams are usually processed natively (one tuple at a time) or in micro-batches by distributed stream processing engines. A query or process is represented as a DAG, which consists of a set of node operators as well as data stream source and sink operators. Operators can be either stateless or stateful. The continuous operator model, in which streaming calculations are done by a set of stateful operators, is used by the majority of distributed stream processing engines. Each operator updates its internal state and emits new records in response to data stream records as they arrive.
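The window semantics described above can be sketched in a few lines of plain Python. This is an illustrative toy, not any engine's API; real engines assign records to windows incrementally rather than rescanning the whole stream as done here.

```python
def sliding_windows(stream, window_length, sliding_interval):
    """Split a list of (timestamp, value) records into time-based
    windows of `window_length` time units, advancing the window start
    by `sliding_interval` each step. When the sliding interval equals
    the window length, this degenerates into fixed (tumbling) windows."""
    if not stream:
        return []
    start, end = stream[0][0], stream[-1][0]
    windows = []
    t = start
    while t <= end:
        # collect every record whose timestamp falls inside [t, t + length)
        windows.append([(ts, v) for ts, v in stream if t <= ts < t + window_length])
        t += sliding_interval
    return windows

records = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
overlapping = sliding_windows(records, window_length=2, sliding_interval=1)
tumbling = sliding_windows(records, window_length=2, sliding_interval=2)
```

Because the overlapping windows share records, an aggregation (say, a sum per window) over `overlapping` changes more gradually between consecutive windows than the same aggregation over `tumbling`, which is exactly the smoothing effect mentioned above.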
State management
Operators can be stateless or stateful, as mentioned at the start of this section. Stateless operators are purely functional in that they produce output solely dependent on the input they receive. Stateful operators, on the other hand, compute their output based on a series of inputs and may employ additional side information stored in an internal data structure known as state. In a data stream application, a state is a data structure that keeps track of previous operations and influences the processing logic for future calculations. In traditional stream processing engines, state was isolated from the application logic that computed and manipulated the data, and the state information was kept in a centralized database management system to be shared among applications. Managing state effectively poses a number of technical difficulties. The state management facilities in different stream processing engines naturally fall along a complexity continuum, from a simple in-memory-only option to a persistent state that can be queried and replicated.
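The stateless/stateful distinction can be made concrete with a minimal sketch in plain Python (not any particular engine's API): the operator below is stateful because each output depends on a per-key count it retains across records.

```python
class GroupAndCount:
    """A minimal stateful operator: it keeps a per-key running count as
    its internal state and emits the updated count for each incoming
    record. A stateless operator such as Map would instead compute its
    output from the current record alone."""

    def __init__(self):
        self.state = {}  # internal state: key -> count seen so far

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1
        return key, self.state[key]

op = GroupAndCount()
outputs = [op.process(k) for k in ["a", "b", "a", "a"]]
# the same input key "a" yields different outputs over time,
# which is only possible because the operator carries state
```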
Fault tolerance and recovery
Data stream processing engines must remain up and running, and must recover from failures as quickly as possible. Bugs in the system logic, network problems, the crash of a compute node in a cluster environment, other hardware issues, dependencies on third-party software, and bottlenecks caused by the volume and speed of incoming data are all causes of failure. A stream processing engine's fault-tolerance qualities are demonstrated by its capacity to remain functional in the face of failures; it should handle faults with little impact and overhead. Data loss (for example, due to a node crash, or because the data was held in memory) and loss of access to resources (for example, due to a cluster manager fault) are common consequences of stream processing engine failures. Failure recovery requires resources beyond those of regular processing, and repeated processing adds overhead. For real-time applications, reprocessing from the start of the pipeline is impractical; ideally, a system should be able to restore a state prior to the fault and repeat only the failed parts of the data processing pipeline. There are two forms of fault tolerance in distributed stream processing engines: passive techniques (such as checkpointing, upstream buffering, and source replay) and active techniques (such as replicas).
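The passive techniques can be illustrated with a toy operator, a hypothetical sketch rather than any engine's actual recovery protocol: it checkpoints its state together with the stream offset, and after a crash restores the last checkpoint and replays only the records the upstream source buffered since then.

```python
class CheckpointingCounter:
    """Toy stateful operator with passive fault tolerance: it
    checkpoints (state, offset) every few records; on recovery it
    restores the last checkpoint and replays only the suffix of the
    source after the checkpointed offset, instead of the whole stream."""

    def __init__(self, checkpoint_every=3):
        self.state = 0            # running sum over the stream
        self.offset = 0           # number of records consumed so far
        self.checkpoint_every = checkpoint_every
        self.checkpoint = (0, 0)  # last durable (state, offset) pair

    def process(self, value):
        self.state += value
        self.offset += 1
        if self.offset % self.checkpoint_every == 0:
            self.checkpoint = (self.state, self.offset)  # "persist"

    def recover(self, source):
        # restore the checkpoint, then replay only the missed suffix
        self.state, self.offset = self.checkpoint
        for value in source[self.offset:]:
            self.process(value)

source = [1, 2, 3, 4, 5]       # upstream buffer enabling source replay
op = CheckpointingCounter()
for v in source:
    op.process(v)
op.state = op.offset = -1      # simulate a crash losing in-memory state
op.recover(source)             # state is rebuilt from checkpoint + replay
```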
features are still very important today, but other features have also been added to the framework, allowing Kafka to perform simple stream processing. Kafka offers stream processing through the Kafka Streams library, and also offers streaming SQL via its own SQL language, Kafka SQL (KSQL). Kafka's coordinator, ZooKeeper, is responsible for coordinating brokers and for failure recovery. Brokers are nodes that manage topics, store events and perform the processing. This is illustrated in Figure 2.1. The figure shows that there can be multiple brokers and topics, where topics can be distributed over several brokers. A topic can be considered a hub, with some systems publishing to it and others subscribing to it. This allows Kafka to simplify event processing: a producer publishes to one topic, Kafka Streams processes it, and the result is published to another topic and forwarded to a consumer. In short, producers write events to the Kafka cluster, while consumers read them. Kafka can guarantee "exactly-once" semantics, which means that an event will be processed exactly once. On the other hand, this requires considerable coordination, which slows down operation noticeably; as this is a relatively new feature of Kafka, it will probably be improved in the future. Otherwise, Kafka is extremely adaptable, allowing different semantics such as "at-most-once", which guarantees that an event is processed no more than once, and "at-least-once", which guarantees that an event is processed at least once.
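The topic-based decoupling described above can be mimicked with a small in-memory sketch. This is not the Kafka client API; the `MiniBroker` name and its methods are invented for illustration. Because each consumer group tracks its own offset into a topic's append-only log, producers and consumers remain fully independent of one another.

```python
from collections import defaultdict

class MiniBroker:
    """In-memory sketch of Kafka's topic model (not the real client
    API): producers append events to a topic's log, and each consumer
    group reads from its own committed offset."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic -> append-only event log
        self.offsets = defaultdict(int)   # (group, topic) -> next offset

    def publish(self, topic, event):
        self.topics[topic].append(event)

    def poll(self, group, topic):
        log = self.topics[topic]
        start = self.offsets[(group, topic)]
        events = log[start:]
        # committing the offset after delivery gives at-least-once
        # behaviour: a crash between delivery and commit causes a redelivery
        self.offsets[(group, topic)] = len(log)
        return events

broker = MiniBroker()
broker.publish("camera-frames", {"camera": 1, "frame": "f0"})
broker.publish("camera-frames", {"camera": 1, "frame": "f1"})
first = broker.poll("detector", "camera-frames")
second = broker.poll("detector", "camera-frames")  # nothing new yet
```

Note that a second group (say, an archiver) polling the same topic would receive both events again from offset 0, since offsets are tracked per group; this is what lets several independent consumers share one topic.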
II.2 MQTT
In 1999, Andy Stanford-Clark and Arlen Nipper created the MQTT protocol as a bandwidth-efficient means of communication. The protocol supports several quality-of-service levels and provides a lightweight publish/subscribe messaging mechanism to create links between different systems. The importance of MQTT becomes apparent when connecting remote sites with small hardware. To ensure this connection, it decouples the publisher (the transmitter) from the subscriber (the receiver); these two elements become totally independent and do not need to know each other to exchange data. An MQTT session is divided into four stages:
Termination: When a publisher or subscriber wishes to terminate an MQTT session, it sends a DISCONNECT message to the broker and then closes the connection.
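Subscriptions in MQTT are matched against hierarchical topic names using the wildcards `+` (exactly one level) and `#` (all remaining levels, only valid as the last level of a filter). A small sketch of this matching rule as defined by the MQTT specification:

```python
def mqtt_match(topic_filter, topic):
    """Check whether an MQTT topic name matches a subscription filter:
    '+' matches exactly one topic level, '#' matches any number of
    remaining levels."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True                     # swallow everything remaining
        if i >= len(topic_levels):
            return False                    # topic ran out of levels
        if level != "+" and level != topic_levels[i]:
            return False                    # literal level mismatch
    # every filter level matched; the topic must not have extra levels
    return len(filter_levels) == len(topic_levels)
```

A subscriber to `sensors/+/temperature`, for instance, receives messages from every sensor's `temperature` sub-topic without knowing the publishers' identities, which is the decoupling described above.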
Spouts, like producers in Kafka, are the inputs to the stream, while bolts are the processing engines for the stream. The capacity of a bolt is comparable to Hadoop's MapReduce, with a single bolt being able to Map and Reduce at the same time. In addition, to ensure maximum speed, the workload of spouts and bolts is spread across all workers in the architecture. However, without the Trident high-level API, which relies on micro-batching, Storm cannot guarantee exactly-once semantics. The Storm core, on the other hand, has built-in at-least-once semantics, which is more than sufficient in most cases.
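How a single bolt can combine Map-like and Reduce-like work can be sketched in plain Python. This mimics the idea only; it is not the Storm bolt API, and the class name is invented for illustration.

```python
class WordCountBolt:
    """Sketch of a Storm-style bolt: for each incoming tuple it both
    maps (splits a sentence into words) and reduces (accumulates
    per-word counts), illustrating how one bolt can play the role of
    Hadoop's Map and Reduce at the same time."""

    def __init__(self):
        self.counts = {}  # per-word running counts (the bolt's state)

    def execute(self, sentence):
        emitted = []
        for word in sentence.split():                         # map step
            self.counts[word] = self.counts.get(word, 0) + 1  # reduce step
            emitted.append((word, self.counts[word]))         # emit downstream
        return emitted

# A spout would normally feed sentences in; here we drive the bolt directly.
bolt = WordCountBolt()
emitted = bolt.execute("to be or not to be")
```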
Figure 2.4: Flink architecture [4]
streaming, with each batch including many events. Due to its fundamental batching structure, it cannot compete with pure streaming implementations, which have significantly lower latency; this may rule Spark out for systems with strict latency requirements. On the other hand, data in Spark is processed consistently, and Spark can guarantee exactly-once semantics, among other delivery guarantees. These and other features make Spark a viable option for streaming systems. Spark Structured Streaming, which is part of the Spark SQL library, is another streaming library. This library did not support streaming prior to Spark 2.x, but has since become a well-known streaming library. Structured Streaming is an attempt to unify batching and streaming, with only minor variations in method calls and syntax.
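Micro-batching itself is easy to picture: buffer the incoming stream into small chunks and hand each chunk to an ordinary batch computation. A count-based toy version follows (Spark actually batches by time interval, so this is a simplification):

```python
def micro_batches(stream, batch_size):
    """Group an incoming stream into micro-batches, the way Spark
    Streaming buffers events for a short interval and processes each
    buffer as one small batch job (batching here by count, not time,
    for simplicity)."""
    return [stream[i:i + batch_size] for i in range(0, len(stream), batch_size)]

events = list(range(10))
batches = micro_batches(events, batch_size=4)
# each batch is then handed to the batch execution engine, e.g. a sum:
totals = [sum(b) for b in batches]
```

The latency trade-off is visible even in this sketch: an event arriving just after a batch closes waits a full batch before being processed, which is why pure record-at-a-time engines achieve lower latency.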
II.6 Apache Samza
Apache Samza [6] is a scalable data processing engine that allows you to
process and analyze your data in real-time. Here is a summary of Samza’s
features that simplify application creation:
Unified API: Use a simple API to describe your application logic indepen-
dent of your data source. The same API can handle both batch and
streaming data.
Pluggability at all levels: Process and transform data from any source.
Samza offers built-in integrations with Apache Kafka, AWS Kinesis,
Azure EventHubs, ElasticSearch and Apache Hadoop. In addition, it
is quite easy to integrate your own sources.
Fault tolerance: Seamlessly migrate jobs and their associated state in the
event of failure. Samza supports host affinity and incremental check-
pointing to enable rapid recovery from failures.
Massive scale: The solution has been tested on applications that use multi-
ple terabytes of state and run on thousands of cores. It powers several
large enterprises, including LinkedIn, Uber, TripAdvisor, Slack, and
more.
II.7 Others
Apache Apex is another stream processor that we did not consider. In many ways, it is identical to Spark in terms of functionality. However, it is a relatively new system that is only used by a few major companies. In addition, there are few comparable benchmarks between Apex and other stream processors.
Figure 2.6: Apache Samza stack [6]
Conclusion
This chapter made us grasp the importance of stream analysis for real-time data processing. In the next chapter, we present some previous works that address issues similar to those targeted by our proposed solution.
Chapter 3
Related work
Introduction
Many research works have studied multimedia big data systems using several methods, such as incoming stream control and preprocessing method optimization, while others have built their own architectures for stream processing systems. In this chapter, some of these works are presented along with their results.
I AMSEP
AMSEP [25] is an Automated Multi-level Security Management system for Multimedia Event Processing that provides end-to-end secure publish/subscribe communication. Each broker topic is associated with a notable event. Surveillance cameras generate multimedia messages at the gateways; to avoid unnecessary broadcasting of video data, scalar sensors activate them only when needed. Based on a set of Deep Learning (DL) processors, AMSEP enables intelligent and autonomous labeling of messages. Multimedia event messages are sent to DL classifiers, which determine the probability of an event occurring. If any of the DL detectors detects an event (e.g., human presence, fire, flood, etc.), a new message containing the event class and event metadata (e.g., when, where) is sent to the security manager. At this point, the security manager parses a security policy to assign the appropriate label to the multimedia event message. In the presented scenario, a DL processor evaluates the likelihood that a human being is present in the office. The system generates a message that includes the type of event (intrusion), the office number and the time the event occurred. The security manager assigns a security label to the event based on these three pieces of information. For example, a nighttime intrusion into the president's office is classified as a high-risk security event that must be reported with the highest level of confidentiality. Consumers, on the other hand, must have a sufficient security level on a topic to read its communications. Only security managers with a high security level are informed of the intrusion, which in some scenarios may be criminal or legitimate (for secrecy reasons). Consumer security levels and approved topics are assigned and controlled by the security manager. Therefore, consumers do not discover any information that could compromise the security of the system.
The architecture of the proposed publish/subscribe secure messaging
II RAM3S
RAM3S [26] is a software framework for the real-time analysis of massive multimedia streams. Using RAM3S in conjunction with a big data streaming platform, information from many data sources (such as sensors and cameras) can be efficiently integrated, with the goal of discovering new and hidden information in the output of the data sources as it happens, i.e., with very low latency. RAM3S thus serves as a bridge between two challenging technologies, one applied (multimedia stream analysis) and the other architectural (big data platforms), allowing the former to be implemented seamlessly on top of the latter.
The Analyzer and Receiver interfaces are the core of RAM3S; used together, they can function as a general-purpose multimedia flow processor. The Analyzer component is responsible for analyzing the individual
Figure 3.3: RAM3S implementation for face detection system
media server. The video data is broadcast by the media server, and users
join a session based on their ability to process it. Compared to unicast trans-
mission, the bandwidth consumption in multi-bitrate multicast transmission
is lower.
While this research work provides a frame-rate adaptation mechanism, such flow control may affect important video streams that must be sent at the highest rate.
IV Real-time Data Infrastructure at Uber
In this paper, the authors present the overall architecture of Uber's real-time data infrastructure and identify three scaling challenges that they need to continuously address for each component of the architecture, relying on open-source technologies for the key areas of the infrastructure. On top of this open-source software, they add significant improvements and customizations to make the open-source solutions fit Uber's environment and bridge the gaps to meet Uber's unique scale and requirements. For the implementation, Uber's engineers use Apache Kafka for streaming storage, associated with Apache Flink for stream processing. They also employ Apache Pinot for OLAP, HDFS for the archival store, and Presto for interactive SQL queries. The proposed solution is used in several real-time use cases across the four broad categories (as in Figure 3.4) in production at Uber, such as analytical applications (surge pricing), dashboards (UberEats Restaurant Manager), machine learning (real-time prediction monitoring) and ad-hoc exploration (UberEats Ops automation).
Despite using robust data processing systems such as Apache Flink combined with Kafka to power streaming applications that calculate updated pricing, improve driver dispatch and combat fraud on its platform, Uber did not apply this infrastructure to multimedia data.
Conclusion
While these related works propose efficient solutions in the fields of flow
control and multimedia data processing, they show some limitations and
point to the need for another solution approach to improve the efficiency
of multimedia processing in big data systems. For this reason, we propose
the MAFC architecture in the next chapter, drawing inspiration from these
previous works.
Chapter 4
Solution approach
Introduction
Based on the previous chapter, very few research works provide a flow con-
trol solution for multimedia big data systems. In this context, we propose
MAFC, an adaptable multimedia flow controller for Big Data systems. In
this chapter, we present the architecture of the MAFC system, an overview
of its components, and a description of its workflow. We then implement
this architecture by applying it to a surveillance system use case, which gives
a global view of some of the essential concepts adopted in the chosen archi-
tecture. Finally, we present a series of performance and adaptability tests,
and we conclude with a discussion of the results.
I System architecture
We propose MAFC, a multimedia adaptable flow controller for Big Data
systems that ensures adaptability with system scalability, without data loss,
and with performance optimization according to the application context. As
illustrated in Figure 4.1, we assume that the multimedia data comes only
from video sources. For data collection, the proposed architecture can ingest
video data either from unbounded video streams or from stored multimedia
sequences (1). Once the multimedia data has been acquired, it is decomposed
into a sequence of individual multimedia objects that are fed one by one to
the flow controller (2). Based on the control policies calculated by the policy
adapter (8), the flow controller chooses which frames are published to the
message broker on the topic called "Frame Processing" and which are not
(3). Then, once a frame has been published to the message broker (4), the
real-time data processing engine extracts the specific events from the incom-
ing frames in real time and publishes them to the concerned topics (6), where
they are consumed by the media stream controller in order to automatically
update the control policies, and by third-party services to notify the con-
cerned parties (7).
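The flow-controller decision in steps (2), (3), and (8) can be sketched in Python. The `Frame` type, the policy signature, and the `every_nth` sampling policy below are illustrative assumptions, not the actual MAFC implementation (which publishes admitted frames to the "Frame Processing" Kafka topic):

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Frame:
    # Hypothetical multimedia object produced by the receiver.
    camera_id: int
    timestamp: float
    data: bytes = b""

# A control policy maps a frame to True (publish) or False (drop).
Policy = Callable[[Frame], bool]

@dataclass
class FlowController:
    """Decides which frames go to the 'Frame Processing' topic (step 3)."""
    policies: List[Policy] = field(default_factory=list)

    def update_policies(self, policies: List[Policy]) -> None:
        # Called by the policy adapter (step 8) with recomputed policies.
        self.policies = policies

    def admit(self, frame: Frame) -> bool:
        # A frame is published only if every active policy accepts it.
        return all(policy(frame) for policy in self.policies)

def every_nth(n: int) -> Policy:
    """Example policy: keep every n-th frame to cap the processing rate."""
    counter = {"i": 0}
    def policy(frame: Frame) -> bool:
        counter["i"] += 1
        return counter["i"] % n == 0
    return policy

controller = FlowController(policies=[every_nth(3)])
frames = [Frame(camera_id=1, timestamp=float(t)) for t in range(9)]
admitted = [f for f in frames if controller.admit(f)]
print(len(admitted))  # 3 of the 9 frames pass the sampling policy
```

In the real system the admitted frames would be serialized and sent to the message broker instead of being collected in a list.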
Figure 4.1: MAFC architecture
II MAFC architecture components
II.1 Receiver
The main role of the receiver is to break down an incoming multimedia
stream into discrete elements, called multimedia objects, that can be exam-
ined one by one. What counts as an "individual multimedia object" depends
on the application concerned: in a surveillance system scenario, the object is
a single image extracted from a camera video sequence. In this case, the
receiver acts as a video-to-image converter.
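As a minimal sketch, the receiver's decomposition step can be modeled as a generator; the frame-iterable source here stands in for a real camera feed (a production receiver would wrap something like OpenCV's `VideoCapture` to pull images from the video sequence):

```python
from typing import Iterable, Iterator, Tuple

def receive(stream: Iterable[bytes], camera_id: int) -> Iterator[Tuple[int, int, bytes]]:
    """Break an incoming multimedia stream into individually numbered
    multimedia objects (here: single images from one camera sequence)."""
    for index, image in enumerate(stream):
        # Each yielded tuple is one multimedia object, examined one by one.
        yield (camera_id, index, image)

# Simulated video sequence: three raw image payloads from camera 7.
objects = list(receive([b"img0", b"img1", b"img2"], camera_id=7))
print(objects[0])  # (7, 0, b'img0')
```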
Time-based controller: the incoming stream is controlled based on time.
For example, in a surveillance system, we increase the multimedia
stream rate in the daytime, when there is heavy movement of people,
and decrease it at night, which makes it possible to add indoor cameras
to secure prohibited areas.
In addition to these principal policies, the user can add other custom policies
according to their needs, and prioritize those policies to choose the order in
which they control the multimedia stream.
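A hypothetical time-based policy can be sketched as follows; the day/night boundaries and the frame rates are illustrative assumptions, not values prescribed by MAFC:

```python
from datetime import datetime, time

def time_based_rate(now: time = None, day_fps: int = 25, night_fps: int = 5,
                    day_start: time = time(6, 0), day_end: time = time(20, 0)) -> int:
    """Return the target frame rate for the current time of day.
    Daytime (heavy movement of people) gets a higher rate; at night the
    rate is lowered so extra indoor cameras can be accommodated."""
    now = now or datetime.now().time()
    return day_fps if day_start <= now < day_end else night_fps

print(time_based_rate(time(12, 0)))  # 25 (daytime rate)
print(time_based_rate(time(23, 0)))  # 5  (night rate)
```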
on a substructure or component called a message queue, which stores and or-
ders the messages until the consuming applications can process them. In a
message queue, messages are stored in the exact order in which they were
transmitted and remain in the queue until receipt is confirmed.
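This FIFO behavior can be illustrated with Python's standard `queue` module; note this is a toy in-process queue used only to show the ordering and acknowledgement semantics, not the actual broker used by MAFC:

```python
import queue

# FIFO message queue: messages are stored in the exact order transmitted
# and stay queued until the consumer acknowledges them with task_done().
mq = queue.Queue()
for msg in ("frame-1", "frame-2", "frame-3"):
    mq.put(msg)

consumed = []
while not mq.empty():
    msg = mq.get()
    consumed.append(msg)
    mq.task_done()  # confirm receipt of the message

print(consumed)  # ['frame-1', 'frame-2', 'frame-3'] -- order preserved
```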
messaging system, it can handle a large volume of data, which enables you to
send messages end to end. Apache Kafka was developed at LinkedIn and is
available as an open-source project with the Apache Software Foundation.
The following points describe why we chose Kafka.
Figure 4.2: MAFC implementation
III.3 Implementation of the multimedia data processing engine
In order to process all the real-time data coming through Kafka, we have
built a stream processing platform on top of Apache Flink [42]. Apache
Flink is an open-source, distributed stream processing framework with a
high-throughput, low-latency engine widely adopted in industry. We
adopted Apache Flink for a number of reasons. First, it is robust enough to
continuously support a large number of frames, with built-in state manage-
ment and checkpointing features for failure recovery. Second, it is easy
to scale and can handle back pressure efficiently when faced with a massive
input Kafka lag. Third, it has a large and active open-source community as
well as a rich ecosystem of components and tooling. Based on comparisons
with Apache Storm and Apache Spark in previous research, Flink was
deemed the better choice of technology for this layer: Storm performed
poorly in handling back pressure when faced with a massive input backlog
of millions of messages, taking several hours to recover where Flink took
only 20 minutes, and Spark jobs consumed 5-10 times more memory than
the corresponding Flink jobs for the same workload.
On top of Flink, we use TensorFlow with SSD-MobileNet as the object de-
tection model, for its high accuracy at low latency.
In the system implementation, we associate Kafka with Flink for its persis-
tent channel, which prevents immediate back pressure on the processing
engine and avoids data loss.
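The per-frame detection step that would run inside the Flink job can be sketched as a map function. The `dummy_model` below is a stand-in for real SSD-MobileNet inference; an actual deployment would load a TensorFlow model and consume frames from the Kafka topic:

```python
from typing import Callable, Dict, List

# Stand-in signature for SSD-MobileNet inference: raw frame bytes in,
# list of {"label", "score"} detections out.
Detector = Callable[[bytes], List[Dict]]

def make_detection_map(model: Detector, threshold: float = 0.5):
    """Build the map function applied to each frame of the stream:
    run the detector and keep detections above the confidence threshold."""
    def detect(frame: bytes) -> List[Dict]:
        return [d for d in model(frame) if d["score"] >= threshold]
    return detect

# Dummy detector standing in for real SSD-MobileNet inference.
def dummy_model(frame: bytes) -> List[Dict]:
    return [{"label": "person", "score": 0.91},
            {"label": "car", "score": 0.32}]

detect = make_detection_map(dummy_model, threshold=0.5)
print(detect(b"frame-bytes"))  # only the 'person' detection survives
```

In the real pipeline, `detect` would be registered as the transformation applied to each incoming frame, and the surviving detections would be published to the concerned output topics.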
IV Experimental evaluation
In order to choose the optimal parameters and to show the impact of MAFC
on system performance for the situation at hand, we performed a series
of experiments on our use case on different machines, varying CPU and
RAM to test their impact. These tests were performed on four virtual
machines. The first was a VirtualBox VM on a PC, allocated 2 virtual
processors, 8 GB of RAM, and 20 GB of storage. For the second, we doubled
the number of CPUs to 4 virtual processors. The third was a virtual machine
on Microsoft Azure cloud with 4 virtual processors, 16 GB of RAM, and
20 GB of storage. For the fourth, we doubled the RAM to 32 GB. The first
set of experiments evaluates the time spent in each processing step, from
acquiring the video to sending the extracted objects to third parties.
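The per-stage time measurement can be instrumented as sketched below; the stage names and the sleeps standing in for real work are illustrative assumptions:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage: str):
    """Accumulate wall-clock time spent in one processing stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start

# Simulated pipeline stages for one frame (sleeps stand in for real work).
with timed("extract"):
    time.sleep(0.01)
with timed("detect"):
    time.sleep(0.02)
with timed("send"):
    time.sleep(0.005)

total = sum(timings.values())
shares = {k: round(100 * v / total) for k, v in timings.items()}
print(shares)  # per-stage percentage of the end-to-end processing time
```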
IV.1 Processing time
Figure 4.3 plots the four evaluation measures for the different virtual ma-
chines. On the first one, it took half a second to process an image, of which
almost 0.45 seconds went to detection and classification. The second VM
was faster, taking a quarter of a second to process a frame; most of this
time was again spent on classification and detection, and the rest on ex-
tracting the images from the video and sending them. This demonstrates
that the processing time is halved when we double the RAM or processor
performance.
According to Figure 4.4, the bottleneck of the system is the object detection
and classification phase: about 90% of the latency is spent in object
detection, while the image splitting and image sending tasks are relatively
fast. Flow control therefore takes a relatively negligible share of the
processing time.
Figure 4.4: Percentage of processing time per stage
video itself. On the second virtual machine, the system can process 4 fps in
real time without latency; beyond that, the latency increases as the video
rate increases. The same behavior was observed for the other virtual ma-
chines.
IV.3 Flow controller impact
Our third test aimed to evaluate the adaptation of a surveillance system
with the Multimedia Adaptable Flow Controller in big data systems by
changing the number of cameras and measuring the end-to-end latency.
Figure 4.6 shows that a traditional surveillance system, which processes all
incoming multimedia streams without control, was not able to handle the
increasing number of cameras: the end-to-end latency increased exponen-
tially, to limits intolerable for a surveillance system. With the addition of
the multimedia flow controller, in contrast, our system was able to operate
efficiently without latency by adapting the total incoming flow to the number
of cameras.
IV.4 Data Loss
In the last experiment, we implemented our system without a Kafka broker
in order to show the impact of a persistent channel for managing the stream
buffer before processing. We bombarded the system with a frame throughput
higher than it can handle, simulating a flow of 30 fps while the system can
only process 13 fps. Figure 4.7 shows that without a persistent channel,
Flink's memory cannot hold all the incoming streams; it was quickly over-
whelmed, causing the system to crash.
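This overflow scenario can be illustrated with a bounded in-memory buffer; the 30 fps and 13 fps figures come from the experiment, while the one-second window and the simple queue model are simplifying assumptions:

```python
import queue

# Without a persistent channel, the processing engine's in-memory buffer
# must absorb the whole incoming stream. If the producer (30 fps) outruns
# the consumer (13 fps), a bounded buffer overflows and frames are lost.
buffer = queue.Queue(maxsize=13)  # what the engine can absorb per second

dropped = 0
for frame in range(30):           # one second of 30 fps input
    try:
        buffer.put_nowait(frame)
    except queue.Full:
        dropped += 1              # frame lost: no persistent channel to park it

print(dropped)  # 17 of the 30 frames are dropped
```

A persistent channel such as Kafka avoids this by durably parking the excess frames until the consumer catches up, at the cost of added consumer lag rather than data loss.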
Conclusion
In this chapter, we have presented the proposed solution. First, we intro-
duced the workflow. Then, in order to clarify the contribution, we gave a
detailed description of each component of the architecture. All of the above
facilitated our next step, implementation and evaluation, where we used
open-source technologies to build the MAFC system. Through experimen-
tation, we were able to adapt the system to run efficiently with low latency
despite the high resource consumption of the processing stage.
Conclusions and Further
Directions
ASFC
As a continuation of this research, it would be possible to extend the MAFC
architecture to make it generic and ready to use with any type of data, not
only multimedia data. It could then be called ASFC, for "Adaptable
Streams Flow Controller in big data systems".
Bibliography
[3] “Apache Storm,” last accessed: June 21, 2021. [Online]. Available:
https://storm.apache.org/
[5] “Apache Spark™ - Unified Analytics Engine for Big Data,” last
accessed: June 21, 2021. [Online]. Available: https://spark.apache.org/
[6] “Apache Samza,” last accessed: June 21, 2021. [Online]. Available:
http://samza.apache.org
[9] “Big Data, for better or worse: 90% of world’s data generated over last
two years.”
[11] Y. A. Qadri, A. Nauman, Y. B. Zikria, A. V. Vasilakos, and S. W.
Kim, “The future of healthcare internet of things: A survey of emerging
technologies,” IEEE Communications Surveys Tutorials, vol. 22, no. 2,
pp. 1121–1167, 2020.
[14] M. Cheung, J. She, and Z. Jie, “Connection discovery using big data of
user-shared images in social media,” IEEE Transactions on Multimedia,
vol. 17, no. 9, pp. 1417–1428, 2015.
[15] Z. Tufekci, “Big questions for social media big data: Representativeness,
validity and other methodological pitfalls,” CoRR, vol. abs/1403.7400,
2014.
[22] S. Hamdi, E. Bouazizi, and S. Faiz, “Real-Time Data Stream Partition-
ing over a Sliding Window in Real-Time Spatial Big Data,” in Algo-
rithms and Architectures for Parallel Processing, ser. Lecture Notes in
Computer Science, J. Vaidya and J. Li, Eds. Cham: Springer Interna-
tional Publishing, 2018, pp. 75–88.
[23] “Apache Kafka,” last accessed: June 21, 2021. [Online]. Available:
https://kafka.apache.org/
[24] “How does Flink maintain rapid development even as one of the most
active Apache projects?”