Secure Search and Storage Services in Cloud and Fog/Edge Computing
Qin Liu
1 Introduction
Many companies (e.g., Amazon Web Services, Microsoft Azure, and Google) offer cloud services to users in a pay-as-you-go fashion.
Although cloud computing has overwhelming advantages over traditional computing models, the adoption of clouds is still far below expectations. Most enterprises are willing to outsource only the data and services that are unrelated to their core business to the cloud. The main reason is that customers worry that their sensitive data may be deliberately or unintentionally leaked by the CSP. Actually, the concerns about cloud security are not unfounded. State-of-the-art CSPs experience noteworthy outages and security breaches from time to time due to attacks, malfunctions, or misconfigurations: Gmail's mass email deletions in 2006, Microsoft Azure's 22-hour outage in 2008, and the more recent incidents of Apple iCloud leaking celebrities' sensitive photos [4], the Dropbox password leak [5], and the medical data leak on Amazon [6]. Therefore, a natural way to keep sensitive data confidential against an untrusted CSP is to apply cryptographic approaches, disclosing decryption keys only to authorized users. In this way, only authorized entities can decrypt the data with the appropriate keys; unauthorized entities, even the CSPs, cannot learn the data contents. Indeed, state-of-the-art CSPs already adopt cryptographic techniques to preserve data confidentiality. For example, Amazon EC2 encrypts users' data by default, and Amazon Simple Storage Service (S3) allows users to encrypt their data before outsourcing.
This book chapter studies the problem of secure search and storage services in cloud computing. We will first introduce related work on preserving storage
security and search privacy in Sect. 2, before discussing the system model and threat
model in Sect. 3. Then, we will describe cryptographic primitives in Sect. 4 before
proposing possibly feasible approaches to provide secure cloud storage and search
services in Sects. 5 and 6, respectively. Finally, we will discuss how the proposed
schemes can be applied to the fog/edge computing environment in Sect. 7 before
concluding this book chapter in Sect. 8.
2 Related Work
For data encryption, symmetric key and public key cryptosystems are widely used tools. In a symmetric key cryptosystem [7], a key shared between the sender and the recipient serves as both the encryption key and the decryption key. It is quick and efficient, but its security relies on the length of the shared key. Moreover, different messages require different shared keys for confidentiality, so the number of keys maintained by each user grows linearly with the number of messages. In
the public key cryptosystem [8], each user has a public/private key pair. Messages
are encrypted with the recipient’s public key, and can only be decrypted with the
corresponding private key. No keys need to be shared between the sender and the
recipient before communication. It ensures high security, but it is much slower than
a symmetric key cryptosystem.
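To make the comparison concrete, the following sketch exercises both settings with the widely used Python cryptography package; the message, key size, and padding choices here are illustrative assumptions, not prescriptions.

# Sketch: symmetric vs. public-key encryption (illustrative parameters).
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

message = b"outsourced record"

# Symmetric setting: one shared key is both encryption and decryption key.
shared_key = Fernet.generate_key()
f = Fernet(shared_key)
assert f.decrypt(f.encrypt(message)) == message

# Public-key setting: encrypt with the recipient's public key, decrypt with
# the matching private key; no key needs to be shared in advance.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
ciphertext = private_key.public_key().encrypt(message, oaep)
assert private_key.decrypt(ciphertext, oaep) == message

In practice the two are combined: a fast symmetric key encrypts the bulk data, while the slower public-key operation protects only that key, which is exactly the hybrid pattern used by Goh et al. [10] discussed below.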
In cloud computing environments, data is generally shared by many data users with different roles and attributes, so achieving fine-grained access control over ciphertexts becomes a pressing issue. Much research has been conducted
on achieving fine-grained access controls based on symmetric and public key
cryptosystems, all of which face various challenges. For example, Kallahalla et
al. [9] classified data with similar Access Control Lists (ACLs) into a file group, and
then encrypted each file group with a symmetric key. This key will be distributed
to the users in the ACL, so that only they can access this group of files. The
main drawback of this approach is that the number of symmetric keys managed
by each user grows linearly with the number of file groups he/she is authorized
to access. Goh et al. [10] encrypted the data with a symmetric key, which was in
turn encrypted with the public keys of the users in the ACL. Therefore, only the
users in the ACL can use their secret keys to recover the symmetric key, and then
use the symmetric key to recover the data. The main drawback of this approach is
that the encryption cost grows linearly with the number of users in the ACL. Liu et
al. [11] constructed an efficient data sharing scheme in cloud computing by using the
one-to-many property of the Hierarchical Identity-Based Encryption (HIBE) [12],
where a ciphertext can be decrypted by the recipient and all his ancestors with their
own private keys. Though each piece of data needs to be encrypted only once, the length of the ciphertext is affected by the number of recipients.
Attribute-based encryption (ABE) is an effective cryptosystem for ensuring
fine-grained access controls on ciphertexts. The predecessor of ABE is fuzzy identity-based encryption [13], which was first designed to achieve fault tolerance for biometric identities. ABE has since developed into two branches: key-policy ABE (KP-ABE) [14] and ciphertext-policy ABE (CP-ABE) [15, 16]. The main difference between KP-ABE and CP-ABE lies in the location of the access structure: in the key and in the ciphertext, respectively. CP-ABE allows the data owner to take more
initiative on specifying access structure for the ciphertext, and thus is more suitable
for a data sharing environment.
The original ABE systems only support monotone access policy and assume
the existence of a single private key generator (PKG). A lot of research has been
done to achieve more expressive access policy [17–20], and distributed key manage-
ment [21–23]. To achieve a flexible access control in cloud computing, our previous
work [24–26] proposed a Hierarchical Attribute-Based Encryption (HABE) scheme,
by combining HIBE and ABE systems. The HABE scheme supports both ID-based
and attribute-based access policies, with hierarchical key generation properties.
On the basis of the ABE scheme, Zhu et al. [27] proposed a Comparison-based
encryption (CBE) scheme by making use of forward and backward derivation
functions and applied CBE to the cloud environment. However, the encryption cost
of the CBE scheme grows linearly with the number of attributes in the access
policy. To solve this problem, our previous work proposed a Hierarchical CBE
scheme by incorporating the attribute hierarchy into the CBE scheme [28]. Wu et
al. [29] proposed a Multi-message Ciphertext Policy Attribute-Based Encryption
(MCP-ABE) scheme to achieve scalable media data sharing in clouds. Li et al. [30]
leveraged ABE techniques to achieve fine-grained and scalable data access control
on Personal Health Records (PHRs) in cloud environments. ABE inherently exposes
the access structure. To achieve both payload hiding and attribute hiding, Predicate
Encryption (PE) [31, 32] was proposed as a stronger security notion of ABE. In this
book chapter, we will introduce the HABE scheme, which simultaneously achieves
a fine-grained access control, high performance, practicability, and scalability, so as
to be more applicable in cloud computing.
User revocation is a well-studied but non-trivial task. The key problem is that the revoked users still retain the keys issued earlier, and thus can still decrypt ciphertexts. Therefore, whenever a user is revoked, re-keying and re-encryption operations need to be executed by the data owner to prevent the revoked user from accessing future data. For example, when ABE is adopted to encrypt data, the work in [33] requires the data owner to periodically re-encrypt the data and re-distribute new keys to authorized users. This approach is very inefficient due to the heavy workload placed on the data owner.
A better solution is to let the data owner delegate a third party to execute some computationally intensive tasks, e.g., re-encryption, while leaking the least information. Proxy re-encryption [34, 35] is a good choice, where a semi-trusted
proxy is able to convert a ciphertext that can be decrypted by Alice into another
ciphertext that can be decrypted by Bob, without knowing the underlying data and
user secret keys. For example, the work in reference [36] is the first to combine KP-
ABE and PRE to delegate most of the computation tasks involved in user revocation
to the CSP. Our previous work [25] is the first to combine PRE and a CP-ABE
system (HABE) to achieve a scalable revocation mechanism in cloud computing.
The work in reference [37] that supports attribute revocation may be applicable to
a cloud environment. This approach requires that once a user is revoked from a
system, the data owner should send PRE keys to the CSP, with which the CSP can
be delegated to execute re-encryption. The main problem of this approach is that the
data owner should be online in order to send the PRE keys to the CSP in a timely
fashion, to prevent the revoked user from accessing the data. The delay of issuing
PRE keys may cause potential security risks.
Shi et al. [38] proposed a directly revocable key-policy ABE scheme with verifiable ciphertext delegation (drvuKPABE) to achieve direct revocation and verifiable ciphertext delegation. Liu et al. [39] proposed a time-based proxy
re-encryption (TimePRE) scheme, which allowed the cloud to automatically re-
encrypt the ciphertexts based on time. Yang et al. [40] presented a conditional
proxy re-encryption scheme to achieve cloud-enabled user revocation. Yang et
al. [41] proposed a novel scheme that enables efficient access control with dynamic
policy updating in cloud computing. They designed policy updating algorithms for various types of access policies.
The first searchable encryption scheme was proposed by Song et al. [42], where both the user query and the data are encrypted under a symmetric-key setting. Depending on the selection of the index key pk and the search key sk, existing SE solutions can be classified into symmetric-key settings (sk = pk) and public-key settings (sk ≠ pk). In their scheme, each word in the file is encrypted independently
with a symmetric key, and is encapsulated with a two-layer structure. On receiving
a trapdoor from the user, the server can get rid of the outer layer and determine
whether the file matches user query or not. The main drawback of this approach
is that the server has to scan the whole file collection while conducting searches,
and thus the searching cost grows linearly with the number of files in the collection.
Since then, much work has been conducted in this field; for example, both Goh [43] and Chang et al. [44] developed secure searchable index schemes to solve the problem of linear search cost. Furthermore, to allow users to verify
the integrity of the search results returned by an untrusted platform, Kurosawa et
al. [45] proposed a verifiable searchable symmetric encryption scheme. However,
their scheme only supports verification of static data. Kamara et al. [46] proposed
a dynamic verifiable searchable symmetric encryption scheme to achieve integrity
verification of search results from dynamic data sets.
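As a rough illustration of how a searchable index avoids the linear scan, the sketch below keys each index entry by a keyed hash of the keyword, so the server can match a trapdoor with one lookup; the use of HMAC-SHA256 and all names here are our own assumptions, not the constructions of [43, 44].

# Sketch: keyword index with keyed-hash labels (file ids left unencrypted
# for brevity; real SSE schemes also protect them).
import hashlib
import hmac
from collections import defaultdict

def trapdoor(key: bytes, keyword: str) -> bytes:
    # The server only ever sees this keyed hash, never the keyword itself.
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()

def build_index(key: bytes, files: dict) -> dict:
    # files maps file_id -> set of keywords; result maps label -> [file_id].
    index = defaultdict(list)
    for fid, words in files.items():
        for w in words:
            index[trapdoor(key, w)].append(fid)
    return index

index = build_index(b"user-secret", {1: {"cloud"}, 2: {"cloud", "fog"}})
assert index[trapdoor(b"user-secret", "fog")] == [2]  # one lookup, not a scan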
The work in [47] proposed the first public key-based searchable encryption
protocol, where anyone with the public key can encrypt data, but only users with
the private key can generate queries. The typical application of their work is to
employ a gateway to test whether certain keywords are contained in an email without
learning any information about the email and keywords. The above work supports only an OR semantics over keywords. As an attempt to enrich search predicates,
searchable encryption schemes that support conjunctive keyword search [48], subset
query [50], and range query [51], have also been proposed. Popa et al. [52] proposed
a CryptDB system that allows users to execute SQL queries over encrypted data, by
using adjustable searchable encryption incorporating with an SQL-aware encryption
strategy and onion encryption.
3 Problem Formulation
As shown in Fig. 1, our system model consists of three types of entities: data owner,
data user, and cloud server.
We assume that the data owner and the authorized data users are fully trusted.
Furthermore, we assume that communication channels are secured under existing
security protocols such as SSL. According to the attacks that a CSP may initiate, we mainly consider the honest-but-curious model: an honest-but-curious CSP correctly executes the prespecified protocol, but may still attempt to learn extra information about the stored data and the received messages.
There are two kinds of privacy to protect against the adversary: file privacy and query privacy. In terms of file privacy, files should be accessible only to authorized users. That is, the attacker cannot deduce the contents of files from the encrypted data stored in the cloud.
In terms of query privacy, the ideal case is that the server should learn nothing
during the query process. However, this perfect level of privacy requires expensive
primitives, such as oblivious RAM [72] and fully homomorphic encryption [73].
To be practical, existing work trades leakage for efficiency. This leakage typically
includes search pattern and access pattern. As defined in [74], access pattern refers
to the outcome of search results, i.e., which documents have been returned; search
pattern refers to whether two searches have been performed for the same keyword.
4 Preliminaries
Definition 1 (Bilinear Map) Let G_1 and G_2 be two cyclic groups of some large prime order q, where G_1 is an additive group and G_2 is a multiplicative group. A bilinear map ê: G_1 × G_1 → G_2 satisfies the following properties: (1) Computable: there is a polynomial time algorithm to compute ê(P, Q) ∈ G_2 for any P, Q ∈ G_1. (2) Bilinear: ê(α_0·P, α_1·Q) = ê(P, Q)^{α_0·α_1} for all P, Q ∈ G_1 and all α_0, α_1 ∈ Z_q^*. (3) Non-degenerate: the map does not send all pairs in G_1 × G_1 to the identity in G_2.
Definition 2 (BDH Parameter Generator) A randomized algorithm IG is called a BDH parameter generator if IG takes a sufficiently large security parameter K as input, runs in polynomial time in K, and outputs a prime number q, the description of two groups G_1 and G_2 of order q, and the description of a bilinear map ê: G_1 × G_1 → G_2.
Definition 3 (BDH Problem) Given a random element P ∈ G_1, as well as α_0·P, α_1·P, and α_2·P for some α_0, α_1, α_2 ∈ Z_q^*, compute ê(P, P)^{α_0·α_1·α_2} ∈ G_2.
Definition 4 (BDH Assumption) If IG is a BDH parameter generator, the advantage Adv_IG(B) that an algorithm B has in solving the BDH problem is defined to be the probability that B outputs ê(P, P)^{α_0·α_1·α_2} on inputs q, G_1, G_2, ê, P, α_0·P, α_1·P, α_2·P, where (q, G_1, G_2, ê) are the outputs of IG for a sufficiently large security parameter K, P is a random element of G_1, and α_0, α_1, α_2 are random elements of Z_q^*. The BDH assumption is that Adv_IG(B) is negligible for any efficient algorithm B.
Let us illustrate the motivation of the PRE scheme [34] with the following example: Alice receives emails encrypted under her public key PK_A via a semi-trusted mail server. When she leaves for vacation, she wants to delegate her email to Bob, whose public key is PK_B, but does not want to share her secret key SK_A with him. The PRE scheme allows Alice to provide a PRE key RK_{A→B} to the mail server, with which the mail server can convert a ciphertext encrypted under Alice's public key PK_A into another ciphertext that can be decrypted with Bob's secret key SK_B, without seeing the underlying plaintext, SK_A, or SK_B.
Let G be a multiplicative group of prime order q, and let g be a random generator of G. The PRE scheme consists of the following algorithms:
Key Generation Alice chooses a random element a ∈ Z_q^* as her secret key SK_A, and her public key PK_A is g^a ∈ G. In the same way, Bob's secret/public key pair (SK_B, PK_B) is (b, g^b). The PRE key RK_{A→B} = b/a (mod q) is used to transform a ciphertext encrypted under PK_A into a ciphertext that can be decrypted with SK_B, and vice versa.
Encryption To encrypt a message m ∈ G to Alice, the sender randomly chooses r ∈ Z_q^* and generates the ciphertext C_A = (C_{A1}, C_{A2}) = (g^r·m, g^{ar}).
Decryption Given the ciphertext C_A = (C_{A1}, C_{A2}), Alice recovers message m with her secret key a by calculating C_{A1}/(C_{A2})^{1/a}.
Re-encryption Given RK_{A→B}, the mail server converts C_A to C_B as follows: C_{B1} = C_{A1} and C_{B2} = (C_{A2})^{RK_{A→B}}. Given the ciphertext (C_{B1}, C_{B2}), Bob recovers message m with his secret key b by calculating C_{B1}/(C_{B2})^{1/b}.
Note that although the data is encrypted twice, first with Alice's public key and then re-encrypted with a PRE key, Bob only needs to execute decryption once to recover the data. The PRE scheme is based on ElGamal encryption [75]; thus the ciphertext is semantically secure, and given the PRE key, the mail server cannot recover the secret key a or b. Please refer to [34] for more details.
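To fix ideas, here is a toy, deliberately insecure implementation of the above algorithms over a small subgroup of Z_p^*; the parameters and names are illustrative only.

# Toy BBS98-style proxy re-encryption (insecure demo parameters).
import secrets

p, q, g = 467, 233, 4              # p = 2q + 1; g generates the order-q subgroup

a = secrets.randbelow(q - 1) + 1   # Alice's secret key SK_A
b = secrets.randbelow(q - 1) + 1   # Bob's secret key SK_B
pk_a = pow(g, a, p)                # PK_A = g^a
rk_ab = (b * pow(a, -1, q)) % q    # RK_{A->B} = b/a mod q

m = pow(g, 42, p)                  # message encoded into the subgroup
r = secrets.randbelow(q - 1) + 1
c1, c2 = (pow(g, r, p) * m) % p, pow(pk_a, r, p)   # C_A = (g^r * m, g^{ar})

# Alice decrypts: C_A1 / (C_A2)^{1/a}
assert (c1 * pow(pow(c2, pow(a, -1, q), p), -1, p)) % p == m

# The proxy re-encrypts without seeing m, a, or b: C_B2 = (C_A2)^{RK}
d1, d2 = c1, pow(c2, rk_ab, p)

# Bob decrypts once: C_B1 / (C_B2)^{1/b}
assert (d1 * pow(pow(d2, pow(b, -1, q), p), -1, p)) % p == m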
Since the cloud service provider (CSP) is outside the users' trusted domain, existing research suggests encrypting data before outsourcing. In cloud computing environments, data is generally shared by many data users with different roles and attributes, so achieving fine-grained access control over ciphertexts becomes a pressing issue.
In CP-ABE, users are identified by a set of attributes rather than an exact identity.
The data is encrypted with an attribute-based access structure, such that only the
users whose attributes satisfy the access structure can decrypt the ciphertext using
their private keys. For example, for data which is encrypted with the access structure
{(Student ∧ CIS) ∨ Staff}, either users with attributes Student and CIS,
or users with attribute Staff, can recover data. The original ABE systems only
support monotone access policy and assume the existence of a single private key
generator (PKG). A lot of research has been done to achieve more expressive access
policy [17, 18], and distributed key management [22, 23].
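Since a DNF policy such as {(Student ∧ CIS) ∨ Staff} is satisfied exactly when a user holds every attribute of at least one conjunctive clause, the matching logic itself is simple; the sketch below shows only this plaintext check, not the cryptographic enforcement that CP-ABE provides.

# Plaintext DNF policy check: one fully covered clause satisfies the policy.
def satisfies(policy: list, attributes: set) -> bool:
    # policy is a list of conjunctive clauses, each a set of attributes.
    return any(clause <= attributes for clause in policy)

policy = [{"Student", "CIS"}, {"Staff"}]     # (Student AND CIS) OR Staff
assert satisfies(policy, {"Student", "CIS"})
assert satisfies(policy, {"Staff", "Math"})
assert not satisfies(policy, {"Student", "Math"})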
However, new problems, such as fine-grained access control on the encrypted
data and scalable user revocation, emerge for ABE schemes. To illustrate, let
us consider the following application scenario, as shown in Fig. 2. Suppose that
University A outsources the electronic library database to a cloud for easy access
by its staff and students. For the protection of copyright, each piece of data is
encrypted before outsourcing. In this application, the staff and students are users,
and University A is the data owner, who will specify the access structure for each piece of data and distribute decryption keys to users. Upon joining University A, each
user will first be assigned an access right with certain validity for accessing the
outsourced database. Once the period of validity passes, this user should request
an extension for his access right from University A. In Fig. 2, data F ’s access
structure stipulates that only Alice or the students in computer information science
(CIS) department have the right to access it. In this access structure, the data
owner describes the intended recipients using not only their IDs but also descriptive
attributes, such as Staff, Student, and CIS. Therefore, the adopted encryption
scheme should have the ability to efficiently implement a fine-grained access control
over ID and attributes simultaneously.
Furthermore, from the above application scenario, we observe that each user’s
access right is only effective in a predetermined period of time. For example, the
effective time of Alice’s access right is from 01/01/2020 to 12/31/2020 and she can
access the database in year 2020, but the effective time of Bob’s access right is
from 05/01/2020 to 06/30/2020 and thus he cannot access the database after June.
Therefore, the adopted encryption scheme should support a scalable revocation mechanism to efficiently handle a dynamic set of users.
To achieve a flexible access control in cloud computing, our previous work [24–
26] proposed a Hierarchical Attribute-Based Encryption (HABE) scheme, by
combining HIBE and ABE systems. The main merits of the HABE scheme are as follows: (1) It supports both ID-based and attribute-based access policies, with hierarchical key generation properties. (2) It requires only a constant number of bilinear map operations during decryption, and is tailored to best serve the needs of accessing data anytime and anywhere.
Our subsequent work proposed a time-based proxy re-encryption (TimePRE) scheme, by incorporating the concept of time into a combination of HABE and proxy re-encryption (PRE). The TimePRE scheme takes full advantage of the abundant resources in a cloud by delegating the CSP to execute computationally intensive tasks in user revocations, while leaking the least information to the CSP. The main merits of the TimePRE scheme are as follows: (1) It enables the CSP to automatically re-encrypt data without receiving any PRE keys from the data owner. Therefore, our scheme avoids the security risks caused by the delay of issuing PRE keys. (2) It allows a user's access right to automatically expire after a predetermined period of time. Therefore, the data owner, who can be offline in the process of user revocations, has a much lighter workload. In the next subsections, we will elaborate on the HABE scheme and the TimePRE scheme. The most relevant notations used in HABE and TimePRE are shown in Table 1.
The access structure in HABE is expressed in disjunctive normal form (DNF). For example, access structure A = ∨_{i=1}^{N} CC_i = ∨_{i=1}^{N} (∧_{j=1}^{n_i} a_{ij}) consists of N conjunctive clauses, CC_1, ..., CC_N, where the i-th conjunctive clause CC_i is expressed as a_{i1} ∧ a_{i2} ∧ ... ∧ a_{in_i}. The users that possess all of the attributes in CC_i can decrypt the
ciphertext. The original HABE allows a delegation mechanism in the generation of
decryption keys. For ease of illustration, we simplify the HABE scheme to a one-
layer structure and provide a modified version as follows:
Setup(K, UA) → (PK, MK): takes the security parameter K and the universal attribute set UA as input, and outputs the system public key PK = (q, G_1, G_2, ê, P_0, P_1, Q_0, {PK_a}_{a∈UA}) and the system master key MK = (mk_0, mk_1, SK_1), where (q, G_1, G_2, ê) are the outputs of a BDH parameter generator IG, P_0 is a random generator of G_1, and P_1 is a random element in G_1; sk_a ∈ Z_q^* is the secret key of attribute a and PK_a = sk_a·P_0 ∈ G_1 is the public key of attribute a; mk_0 and mk_1 are random elements in Z_q^*, Q_0 = mk_0·P_0 ∈ G_1, and SK_1 = mk_0·P_1 ∈ G_1.
GenKey(PK, MK, ID_u, a) → dk_{u,a}: takes the system public key PK, the system master key MK, user identity ID_u, and attribute a as inputs to generate a decryption key dk_{u,a} = (SK_u, SK_{u,a}).
Encrypt(PK, A, F) → C_A: to encrypt data F under access structure

A = ∨_{i=1}^{N} CC_i = ∨_{i=1}^{N} (∧_{j=1}^{n_i} a_{ij}),

the data owner picks a random r ∈ Z_q^* and outputs the ciphertext:

U_0 = r·P_0,
U_i = r·Σ_{a∈CC_i} PK_a, 1 ≤ i ≤ N,
V = F · ê(Q_0, r·n_A·P_1)
From the above construction, we know that the Encrypt and Decrypt algorithms require only a constant number of bilinear map operations, and that the length of the ciphertext is related to the number of conjunctive clauses (O(N)), instead of the number of attributes (O(m)), in the access structure. To encrypt a piece of data, we need to execute one bilinear map and O(N) point multiplication operations to output a ciphertext of O(N) length, where N is the number of conjunctive clauses in the access structure; to recover a piece of data, we only need to execute O(1) bilinear map operations. The HABE scheme is proven to be semantically secure under the random oracle model and the BDH assumption. In Table 2, we briefly compare our scheme with the work by Bethencourt et al. [15] and the work by Muller et al. [16]. We believe that the most expensive computation is the bilinear map operation, abbreviated as map; the next is the exponentiation operation, abbreviated as exp. In Table 2, n, S, N, and T denote the number of attributes associated with a user, the number of attributes in an access structure, the number of conjunctive clauses in an access structure, and the number of attributes in an access structure that are matched by attributes in a user's private key, respectively. More details can be found in [25, 26].
When a user exits the system, his/her access right should be revoked. A typical application is that a company authorizes its staff to access corporate data; when a staff member leaves the company, his/her access right should be revoked. In the case of accessing encrypted data, the revoked users still retain the keys issued earlier, and thus can still decrypt ciphertexts. Therefore, whenever a user is revoked, re-keying and re-encryption operations need to be executed by the data owner to prevent the revoked user from accessing future data.
User revocation is a well-studied but non-trivial task. In an ABE system, it is trickier to achieve effective user revocation, since each attribute is conceivably possessed by multiple different users, so that multiple different users might match the same decryption policy [76]. The usual solution is to require each
attribute to contain a time frame within which it is valid, and the latest version
of attributes and user information will be periodically distributed. As discussed in
Bethencourt et al. [33], this type of solution, which demands the users to maintain a
large amount of private key storage, lacks flexibility and scalability.
Rather than requiring the data owner to do all of the work of user revocation, a better solution is to let the data owner delegate a third party (e.g., the service provider) to execute some computationally intensive tasks, e.g., re-encryption. Since the third party is not fully trusted, delegation should leak the least information. Proxy Re-
Encryption (PRE) [34, 77] is a good choice, where a semi-trusted proxy is able to
convert a ciphertext that can be decrypted by Alice into another ciphertext that can
be decrypted by Bob, without knowing the underlying data and user secret keys.
In recent work, Yu et al. [36] and our previous work [24–26] applied proxy
re-encryption (PRE) [34] into KP-ABE and CP-ABE, respectively, to achieve a
scalable revocation mechanism in cloud computing. Once a user is revoked from
a system, these approaches, also called instruction-based PRE, require that the data owner distribute new keys to unrevoked users and send PRE keys to the CSPs, which can be delegated to execute re-encryption in a lazy way [9].
Although the revocation can be performed on demand, the data owner needs to send
the PRE keys to the servers in a timely fashion, to prevent the revoked user from
accessing the data. The delay of issuing PRE keys may cause potential security
risks. To solve this problem, Liu et al. [39] proposed the TimePRE scheme, which allows the servers to perform re-encryption automatically. In their work, each user's access right is effective within a pre-determined time period, and the servers re-encrypt ciphertexts automatically based on their own time. Thus, the data owner can be offline in the process of user revocations. The differences between instruction-based PRE and time-based PRE are shown in Fig. 3. Suppose that T1 < T2 < ... < T6. In instruction-based PRE, data user A, who is revoked at T2, can still access data from the cloud before the CSP re-encrypts the data at T4.
The main idea of the TimePRE scheme is to incorporate the concept of time
into the combination of HABE and PRE. Intuitively, each user is identified by a set
of attributes and a set of effective time periods that denote how long the user is
eligible for these attributes, i.e., the period of validity of the user’s access right. The
data accessed by the users is associated with an attribute-based access structure and
an access time. The access structure is specified by the data owner, but the access
time is updated by the CSP with the time of receiving an access request. The data
Fig. 3 Proxy re-encryption in clouds. (a) Instruction-based PRE. (b) Time-based PRE
Fig. 4 Three-level time tree and attribute a’s PRE key tree. (a) Sample time tree. (b) Sample PRE
key tree for attribute a
can be recovered only by the users whose attributes satisfy the access structure and whose access rights are effective at the access time.
To enable the CSP to update the access time automatically, we first express actual
time as a time tree. The height of the time tree can be changed as required. For ease
of presentation, in this chapter we only consider a three-layer time tree as shown in
Fig. 4a, where time is accurate to the day, and the time tree is classified into three
layers in order: year, month, and day. We use (y, m, d), (y, m), and (y) to denote
a particular day, month, and year, respectively. For example, (2020, 4, 5) denotes
April 5, 2020. The access time associated with a data corresponds to a leaf node in
the time tree, and the effective time periods associated with a user correspond to a
set of nodes in the time tree. If there is a node corresponding to an effective time
period that is an ancestor of (or the same as) the node corresponding to the access
time, then the user’s access right is effective in the access time.
Then, we allow the data owner and the CSP to share a root secret key s in
advance, with which the CSP can calculate required PRE keys based on its own time
and re-encrypt corresponding ciphertext automatically. Specifically, at any time,
each attribute a is associated with one initial public key PK_a and three time-based public keys: the day-based public key PK_a^(y,m,d), the month-based public key PK_a^(y,m), and the year-based public key PK_a^(y), each of which denotes a's public key in a particular day (y, m, d), month (y, m), and year (y), respectively. For example, given current time (2020, 4, 5), attribute a's public keys include PK_a, PK_a^(2020,4,5), PK_a^(2020,4), and PK_a^(2020). In the TimePRE scheme, the original ciphertexts are encrypted by
the data owner using the initial public keys of attributes in the data access structure.
On receiving a request, the CSP first uses the root secret key s to calculate PRE keys
on all attributes in the access structure based on its own time, and then uses these
PRE keys to re-encrypt the original ciphertext by updating the initial public keys of
all attributes in the access structure to time-based public keys.
We use s_a^(y), s_a^(y,m), and s_a^(y,m,d) to denote the PRE keys on attribute a in time (y), (y, m), and (y, m, d), which can be used to update attribute a's initial public key PK_a to the time-based public keys PK_a^(y), PK_a^(y,m), and PK_a^(y,m,d), respectively. As shown in Fig. 4b, for each attribute a, the CSP can use the root secret key s and the time tree to hierarchically calculate the time-based PRE keys with Eq. (1):

s_a^(y) = H_{s_a}(y)   (1a)
s_a^(y,m) = H_{s_a^(y)}(m)   (1b)
s_a^(y,m,d) = H_{s_a^(y,m)}(d)   (1c)

where s_a = H_s(a) is the root PRE key of attribute a.
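Because Eq. (1) only needs a keyed hash, the CSP-side derivation can be sketched in a few lines; instantiating H with HMAC-SHA256 is our assumption for illustration.

# Sketch of the hierarchical PRE-key derivation of Eq. (1).
import hashlib
import hmac

def H(key: bytes, msg: str) -> bytes:
    # Keyed hash H_key(msg), modeled here as HMAC-SHA256.
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def time_based_pre_key(s: bytes, a: str, y: int, m: int = None, d: int = None) -> bytes:
    k = H(s, a)              # s_a = H_s(a)
    k = H(k, str(y))         # s_a^(y)     = H_{s_a}(y)        (1a)
    if m is not None:
        k = H(k, str(m))     # s_a^(y,m)   = H_{s_a^(y)}(m)    (1b)
    if d is not None:
        k = H(k, str(d))     # s_a^(y,m,d) = H_{s_a^(y,m)}(d)  (1c)
    return k

# The data owner and the CSP derive identical keys from the shared root secret.
assert time_based_pre_key(b"s", "CIS", 2020, 4, 5) == \
       time_based_pre_key(b"s", "CIS", 2020, 4, 5)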
1. Setup(K, UA) → (PK, MK, s): similar to the Setup algorithm in HABE, where (q, G_1, G_2, ê) are the outputs of a BDH parameter generator [49] IG, P_0 is a random generator of G_1, and P_1 is a random element in G_1; sk_a ∈ Z_q^* is the initial secret key of attribute a and PK_a = sk_a·P_0 ∈ G_1 is the initial public key of attribute a; mk_0 and mk_1 are random elements in Z_q^*, Q_0 = mk_0·P_0 ∈ G_1, and SK_1 = mk_0·P_1 ∈ G_1. PK will be published, MK will be kept secret, and s will be sent to the CSP.
2. GenKey(PK, MK, s, PK_u, a, T_u) → (SK_u, SK_{u,a}^{T_u}): After authenticating that user u is eligible for attribute a and that his access right is effective in time period T_u, the data owner takes the system public key PK, the system master key MK, the user public key PK_u, and the root secret key s as inputs, and generates a UIK SK_u and a time-based UAK SK_{u,a}^{T_u}. The time-based attribute public keys involved are computed as:
PK_a^(y) = PK_a + s_a^(y)·P_0   (2a)
PK_a^(y,m) = PK_a + s_a^(y,m)·P_0   (2b)
PK_a^(y,m,d) = PK_a + s_a^(y,m,d)·P_0   (2c)
The original ciphertext C_A is generated with the initial attribute public keys:

U_0 = r·P_0,   (3a)
U_i = r·Σ_{a∈CC_i} PK_a, 1 ≤ i ≤ N   (3b)
On receiving an access request at time t = (y, m, d), the CSP chooses a random r' ∈ Z_q^* and re-encrypts the ciphertext as:

U_0^t = U_0 + r'·P_0,   (4a)
U_{(y)i}^t = U_i + Σ_{a∈CC_i} (r'·PK_a + s_a^(y)·U_0^t),   (4b)
U_{(y,m)i}^t = U_i + Σ_{a∈CC_i} (r'·PK_a + s_a^(y,m)·U_0^t),   (4c)
U_{(y,m,d)i}^t = U_i + Σ_{a∈CC_i} (r'·PK_a + s_a^(y,m,d)·U_0^t),   (4d)
V^t = V · ê(Q_0, r'·n_A·P_1)   (4e)
The key technique of the TimePRE scheme is that the root secret key s is
simultaneously used by the data owner to generate time-based UAKs, and by the
CSP to generate PRE keys. Note that each time-based UAK is generated with time-
based attribute public key, which is in turn generated by s and an effective time period T_u; each piece of data is first encrypted with the initial attribute public keys, and will be
updated by the CSP to day-based attribute public keys, which are in turn generated
by s and the access time of receiving a request t. Therefore, even if s is only shared
between the data owner and the CSP, the users still can decrypt the ciphertext when
their attributes satisfy the access structure and their access rights are effective in the
access time. Furthermore, the GenKey algorithm takes the system master key MK as input, which is kept secret by the data owner. Thus, given the root secret key, which has nothing to do with the system master key, the CSP cannot learn any information about the UAKs.
Security Analysis of TimePRE The Encrypt algorithm in the TimePRE scheme
is the same as the Encryption algorithm in HABE, which has been proven to be
semantically secure in [25]. Therefore, we consider that the TimePRE scheme is
secure if the following propositions hold:
• Proposition 1. The keys produced by the GenKey algorithm are secure.
• Proposition 2. The ciphertext produced by the ReEncrypt algorithm is semanti-
cally secure.
• Proposition 3. Given the root secret key and the original ciphertext, the CSP can learn neither the underlying data nor the UAKs while executing re-encryption.
For Proposition 1, we prove that the GenKey algorithm is as secure as the Key
Generation algorithm in HABE. First, the way to generate UIK is the same in both
algorithms. Then, given the system public key PK, the system master key MK, user public key PK_u, and attribute a, if the data owner takes the time-based attribute public key PK_a^{T_u} as input of the Key Generation algorithm in HABE, then the produced UAK is the same as that of the GenKey algorithm that takes time T_u, the initial attribute public key PK_a, and the root secret key s as inputs. As proven
in [25], due to the BDH assumption, the malicious users cannot obtain MK, even if
all of them collude. Therefore, Proposition 1 is correct.
For Proposition 2, we prove that the ReEncrypt algorithm is as secure as the
Encryption algorithm in HABE. Given the system public key PK and data F with access structure A, if the data owner takes the time-based attribute public keys {PK_a^(y)}_{a∈A}, {PK_a^(y,m)}_{a∈A}, {PK_a^(y,m,d)}_{a∈A}, and a random number r'' = r + r' as inputs of the Encryption algorithm in HABE, then the produced ciphertext is the same as that of the ReEncrypt algorithm that takes time t = (y, m, d), the original ciphertext C_A, and the root secret key s as inputs. Therefore, Proposition 2 is correct.
For completeness, we provide an intuitive security proof for the ReEncrypt
algorithm as follows:
Recall that data F is re-encrypted to V^t = F · ê(Q_0, (r + r')·n_A·P_1) in time t = (y, m, d). Therefore, an adversary A needs to construct ê(Q_0, (r + r')·n_A·P_1) = ê(U_0^t, SK_1)^{n_A} to recover F. From the GenKey algorithm, we know that the only occurrence of SK_1 is in the UAKs. In our security model, we assume that the CSP will not collude with the malicious users, who possess UAKs. Therefore, we only consider the cases where malicious users work independently or collude with each other to compromise data security.
We consider the TimePRE scheme insecure if one of the following cases happens: (1) Adversary A, whose effective time period satisfies the access time but whose attributes do not satisfy the access structure, can recover data F. (2) Adversary A, whose attributes satisfy the access structure but whose effective time does not satisfy the access time, can recover data F.
For case (1), we make the following assumptions for ease of presentation: Adversary A has requested UAKs on all but one of the attributes a_{i1}, ..., a_{i(k−1)}, a_{i(k+1)}, ..., a_{in_i} in CC_i for user u, and has requested a UAK on the missing attribute a_{ik} for user u'. Both users' effective time periods T_u and T_{u'} satisfy the access time t = (y, m, d). Based on Proposition 1, we know that the adversary cannot generate fake keys. The only occurrence of SK_1 is in the UAKs, so the adversary has to use the UAKs requested for users u and u' in the bilinear map, yielding for some α:

ê(U_0^t, (n_A/n_i)·Σ_{j=1,j≠k}^{n_i} SK_{u,a_{ij}}^{T_u} + (n_A/n_i)·SK_{u',a_{ik}}^{T_{u'}} + α)
  = ê(U_0^t, SK_1)^{n_A} · ê(r''·P_0, α) · ê(SK_{u'}, r''·PK_{a_{ik}}^{T_{u'}})^{n_A/n_i} · ê(SK_u, Σ_{j=1,j≠k}^{n_i} r''·PK_{a_{ij}}^{T_u})^{n_A/n_i}

where r'' = r + r'. To obtain ê(U_0^t, SK_1)^{n_A}, the last three elements have to be eliminated. Note that SK_u and SK_{u'} are known to adversary A, but r is randomly chosen by the data owner for the original ciphertext C_A and r' is randomly chosen by the CSP for the re-encrypted ciphertext C_A^t. The adversary cannot know r''·PK_{a_{ik}}^{T_{u'}} or Σ_{j=1,j≠k}^{n_i} r''·PK_{a_{ij}}^{T_u}, even if he knows U_{(T_u)i}^t and U_{(T_{u'})i}^t, due to the BDH assumption. Therefore, adversary A cannot recover the data from V^t.
For case (2), we make the following assumptions for ease of presentation: Adversary A has requested UAKs on all attributes in CC_i for user u, and no effective time period T_u of this user satisfies the access time t = (y, m, d). Based on Proposition 1, we know that the adversary cannot generate fake keys. The only occurrence of SK_1 is in the UAKs, so the adversary has to use the UAKs requested for user u in the bilinear map, yielding for some α:

ê(U_0^t, (n_A/n_i)·Σ_{j=1}^{n_i} SK_{u,a_{ij}}^{T_u} + α)
  = ê(U_0^t, SK_1)^{n_A} · ê(r''·P_0, α) · ê(SK_u, Σ_{j=1}^{n_i} r''·PK_{a_{ij}}^{T_u})^{n_A/n_i}

where r'' = r + r'. To obtain ê(U_0^t, SK_1)^{n_A}, the last two elements have to be eliminated. Note that SK_u is known to adversary A, but r is randomly chosen by the data owner for the original ciphertext C_A and r' is randomly chosen by the CSP for the re-encrypted ciphertext C_A^t. The adversary cannot know Σ_{j=1}^{n_i} r''·PK_{a_{ij}}^{T_u}, even if he knows U_i^t and U_i^{T_u}, due to the BDH assumption. Therefore, adversary A cannot recover the data from V^t.
For Proposition 3, we first prove that the CSP cannot derive the system master
key MK and UAKs from the root secret key s. As compared to HABE, the TimePRE
scheme discloses an additional root secret key to the CSP, which is randomly chosen
by the data owner and has nothing to do with the system master key. Therefore,
the CSP cannot derive the system master key from the root secret key. Based on
Proposition 1, the CSP cannot obtain UAKs without the system master key.
Then, we prove that the CSP cannot compromise data security given the original
ciphertext. Note that the original ciphertext is encrypted with the Encrypt algorithm,
which is semantically secure. Therefore, the ciphertext can be decrypted by only the
entity who possesses UAKs on the initial attribute public keys. In the TimePRE scheme, a user's UAKs are generated on the time-based attribute public keys, rather than the initial attribute public keys. Therefore, only the data owner, who holds the initial attribute secret keys, can recover the data from the original ciphertext. Neither the users nor the CSP can decrypt the original ciphertext.
Since the cloud service provider (CSP) may leak users’ private data consciously
or unconsciously, existing research suggests encrypting data before outsourcing
to preserve user privacy. However, data encryption would make searching over
ciphertexts a very challenging task. The simple solution that downloads the whole
data set from the cloud will incur extensive communication costs. Therefore,
searchable encryption (SE) [42, 47] was proposed to enable a user to retrieve data
of interest while keeping user privacy from the CSP.
In previous SE solutions, the search permission is granted in a coarse-grained way. That is, the data user has the ability to generate search tokens for all keywords by using the search key. In many situations, this kind of search authorization causes a potential risk of privacy disclosure. To illustrate, let us
consider the following scenario: Company A outsources the file management system
to Amazon S3 for easy access by its staff. Suppose that the collaboration agreement,
F, described with keywords “Company B” and “Project X” can be accessed only by
the manager of Company A. If attacker Alice is allowed to first search with keyword
“Project X”, and then with keyword “Company B”, she can infer that Company A
is cooperating with Company B on Project X from the search results returned, even
if she cannot recover file content.
To alleviate this problem, the work in [69] proposed the attribute-based keyword
search (ABKS) scheme, which utilizes attribute-based encryption (ABE) [14, 15] to
achieve fine-grained search authorization for public-key settings. In ABKS, each keyword is encrypted under an attribute-based access policy, so that only the users whose attributes satisfy the policy can generate valid search tokens for it.
6.1 Preliminaries
Fig. 5 Access tree. (a) Sample access tree. (b) Sample tree implementation
The most commonly used notations are shown in Table 5. The DABKS scheme consists of the following algorithms:
1. System setup. The TTP runs the Init algorithm to generate the system public key PK and the system master key MK. It then sends PK to the CSP.
2. Data creation. The data owner processes the data before outsourcing as follows: (1) It classifies the files into several file groups, where a group of files shares the same access policy. (2) Suppose that file F_i is associated with access policy AP_K, which will be expressed as access tree T_w. Given T_w, it runs the EncKW algorithm to encrypt each keyword w ∈ W_i, and runs the EncFile algorithm to encrypt F_i. (3) It sends the keyword ciphertexts and file ciphertexts, denoted as {{cph_w}_{w∈W_i}, C_{F_i}} for each file F_i in the collection, to the CSP.
3. User grant. When a new cloud user u joins the system, the TTP first determines the attribute set S_u, which stipulates the search permission and data access right for u. It then runs the GenKey and KeyGen algorithms to generate a decryption key SK_u and a search key sk_u for u, respectively. Finally, the TTP sends the generated keys (SK_u, sk_u) along with the system public key PK to u.
4. Policy update. If the access policy for file F_i is changed to AP_K', the data owner first builds a new access tree T_w' for AP_K', and then runs the GenUpd algorithm to generate the update key UK and some auxiliary information aux. The update instruction sent to the CSP is set to Δ = {Fid, UO, UK, aux}, where Fid is the ID of file F_i and UO is the specific update operation. On receiving the policy update request, the CSP first locates F_i, and then runs the ExeUpd algorithm to generate new ciphertexts {cph_w'}_{w∈W_i} for W_i based on Δ.
5. Data access. If data user u wants to retrieve files containing keyword w, it runs the TokenGen algorithm to generate a search token Tok_w, which will be sent to the CSP. On receiving the data access request, the CSP runs the Search algorithm by evaluating the search token on the whole keyword ciphertext set, and returns the search results, denoted as {{cph_w}_{w∈W_i}, C_{F_i}}_{Search(Tok_w, cph_w)=1}. If u's attributes satisfy the access policy of F_i, it can run the Decrypt algorithm to recover the file content.
Taking Fig. 6 as an example, the files F_1, ..., F_7 are classified into two groups, denoted as FG_1 and FG_2. Let AP_{K,1} and AP_{K,2} denote the access policies associated with file groups FG_1 and FG_2, respectively. The data owner will encrypt F_1, ..., F_4 with AP_{K,1} and F_5, F_6, F_7 with AP_{K,2}. Therefore, data user u with attribute set S_u = {A_1, A_2} has permission to search files only in FG_1. That is, u can generate valid search tokens for keywords a, b, c, d only. If AP_{K,2} is updated to AP_{K,2}' = {A_1 ∧ A_2}, u can search all files in the collection.
Fig. 7 Policy updating operations. (a) Att2OR and AttRmOR. (b) Att2AND and AttRmAND
Since a policy update may require re-encrypting the keywords of a large number of files, the workload on the data owner is heavy. In this chapter, we propose a DABKS scheme that achieves an efficient update of access policy by delegating the policy updating operations to the cloud.
Inspired by the work in [41], we consider four basic operations involved in policy updating (Fig. 7): Att2OR denotes adding an attribute to an OR gate, Att2AND denotes adding an attribute to an AND gate, AttRmOR denotes removing an attribute from an OR gate, and AttRmAND denotes removing an attribute from an AND gate. Given access tree T over access policy AP_K, the data owner needs to preserve the share set {q_x(0)}_{x∈T} generated by SSS, where q_x(0) is the share of secret σ for node x (a secret re-sharing sketch in code follows the list below).
Let node y be the AND/OR gate that will be updated, where A_1, ..., A_m are the original attributes under y. Let q_y(0) and {q_{x_1}(0), ..., q_{x_m}(0)} denote the secret shares for node y and y's children nodes x_1, ..., x_m, where att(x_i) = A_i for i ∈ [1, m]. Given an access tree T_w, the DABKS scheme will produce a ciphertext for each leaf node x based on share q_x(0). The original and new ciphertexts for node x_i are denoted as C_i and C_i', respectively.
• Att2OR: This operation can be transformed to updating a (1, m) gate to a (1, m+1) gate. Given q_y(0) and the new access tree T_w', the data owner runs SSS to generate new shares {q_{x_1}'(0), ..., q_{x_{m+1}}'(0)} for attributes A_1, ..., A_{m+1}. Since q_{x_i}(0) = q_{x_i}'(0) = q_y(0) for i ∈ [1, m+1], the ciphertexts for the original attributes will not be changed, i.e., C_i' = C_i for i ∈ [1, m]. For the newly added attribute A_{m+1}, the data owner needs to generate a new ciphertext C_{m+1}' based on q_{x_{m+1}}'(0). Finally, it sends the update instruction Δ = {Fid, Att2OR, NULL, C_{m+1}'} to the CSP, which will add C_{m+1}' to cph_w and update the access tree to T_w' by adding A_{m+1} under node y.
• AttRmOR: This operation can be transformed to updating a (1, m) gate to a (1, m−1) gate. As in the Att2OR operation, we have q_{x_i}(0) = q_{x_i}'(0) = q_y(0) for i ∈ [1, m−1]. Therefore, the data owner will send Δ = {Fid, AttRmOR, NULL, NULL} to the CSP, which will remove C_m from cph_w and update the access tree to T_w' by removing A_m under node y.
• Att2AND: This operation can be transformed to updating a (m, m) gate to a (m+1, m+1) gate. Given q_y(0) and the new access tree T_w', the data owner runs SSS to generate new shares {q_{x_1}'(0), ..., q_{x_{m+1}}'(0)} for attributes A_1, ..., A_{m+1}. Next, it executes the GenUpd algorithm to generate the update key UK for attributes A_1, ..., A_m. Moreover, it generates a ciphertext C_{m+1}' for the newly added attribute A_{m+1} based on q_{x_{m+1}}'(0). Finally, it sends Δ = {Fid, Att2AND, UK, C_{m+1}'} to the CSP, which will execute the ExeUpd algorithm to update the ciphertext C_i to C_i' for i ∈ [1, m], add the new ciphertext C_{m+1}' to cph_w, and update the access tree to T_w' by adding A_{m+1} under node y.
• AttRmAND: This operation can be transformed to updating a (m, m) gate to a (m−1, m−1) gate. Given q_y(0) and the new access tree T_w', the data owner runs SSS to generate new shares {q_{x_1}'(0), ..., q_{x_{m−1}}'(0)} for attributes A_1, ..., A_{m−1}. Next, it executes the GenUpd algorithm to generate the update key UK for C_1, ..., C_{m−1}. Finally, it sends Δ = {Fid, AttRmAND, UK, NULL} to the CSP, which will execute the ExeUpd algorithm to update the ciphertext C_i to C_i' for i ∈ [1, m−1], remove C_m from cph_w, and update the access tree to T_w' by removing A_m under node y.
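For concreteness, the sketch below shows the secret-sharing step these operations rely on: re-sharing a gate's secret q_y(0) among a new number of children and recovering it by Lagrange interpolation at x = 0. The field modulus and all names are illustrative assumptions.

# Shamir secret sharing over a prime field (illustrative modulus).
import secrets

P = 2**127 - 1   # a Mersenne prime, used here as the field modulus

def share(secret: int, t: int, n: int) -> list:
    # Split `secret` into n shares such that any t of them recover it.
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(t - 1)]
    poly = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares: list) -> int:
    # Lagrange interpolation of the sharing polynomial at x = 0.
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

sigma = 1234                       # stands in for the gate secret q_y(0)
new_shares = share(sigma, 4, 4)    # Att2AND: a (3, 3) gate grows to (4, 4)
assert reconstruct(new_shares) == sigma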
6.4 Construction
Init(λ): Let e: G_0 × G_0 → G_1 be a bilinear map, where G_0 and G_1 are cyclic groups of prime order p. The TTP takes the security parameter λ as input, and sets the system public key PK and the system master key MK as follows:
For each attribute A_j ∈ S_u, it computes B_j = g^r·H_1(A_j)^{r_j} and B̄_j = g^{r_j}. The search key is set as follows:
Search(Tok_w, cph_w): On receiving the search token Tok_w from data user u, the CSP first constructs a set S ⊆ S_u that satisfies the access tree T_w specified in cph_w, and then computes E_{x_i} = e(B_i, C_{x_i})/e(B̄_i, C̄_{x_i}) = e(g, g)^{r·s·q_{x_i}(0)} for each attribute A_i ∈ S, where A_i = att(x_i) for x_i ∈ lev(T). Next, it executes Lagrange interpolation to recover E_R = e(g, g)^{r·s·σ}. Finally, it tests whether Eq. (6) holds.
The GenUpd algorithm outputs the update key UK = {(UK_{1,i}, UK_{2,i})}, where UK_{1,i} = g^{σ_i}, UK_{2,i} = H_1(att(x_i))^{σ_i}, and σ_i = q_{x_i}'(0) − q_{x_i}(0). Furthermore, for adding an attribute A_{m+1} under gate node y, the data owner generates the new ciphertext C_{m+1}' for A_{m+1} as follows: it computes C_{x_{m+1}}' = g^{q_{x_{m+1}}'(0)} and C̄_{x_{m+1}}' = H_1(att(x_{m+1}))^{q_{x_{m+1}}'(0)}.

For correctness, observe that

E_{x_i} = e(B_j, C_{x_i}) / e(B̄_j, C̄_{x_i})
        = e(g^{rs}·H_1(A_j)^{r_j·s}, g^{q_{x_i}(0)}) / e(g^{r_j·s}, H_1(att(x_i))^{q_{x_i}(0)})   (8)
        = e(g, g)^{r·s·q_{x_i}(0)}
If S_u satisfies T_w, we can recover E_R = e(g, g)^{r·s·q_R(0)} = e(g, g)^{r·s·σ} by executing Lagrange interpolation. Therefore, the right side of Eq. (6) evolves as follows:
C_{x_i}' = C_{x_i} · UK_{1,i} = g^{q_{x_i}(0)} · g^{q_{x_i}'(0)−q_{x_i}(0)} = g^{q_{x_i}'(0)},
C̄_{x_i}' = C̄_{x_i} · UK_{2,i} = H_1(att(x_i))^{q_{x_i}'(0)}   (11)
The new ciphertext C_{m+1}' corresponding to A_{m+1} is constructed as follows:

C_{m+1}' = (C_{x_{m+1}}', C̄_{x_{m+1}}') = (g^{q_{x_{m+1}}'(0)}, H_1(att(x_{m+1}))^{q_{x_{m+1}}'(0)})   (12)

Given the updated ciphertexts C_i' for i ∈ [1, m+1], E_R = e(g, g)^{r·s·q_R(0)} = e(g, g)^{r·s·σ} can be recovered only when S_u satisfies T_w'. Therefore, the DABKS scheme is correct.
The work in [69] has proven that, given the one-way hash function H_2, the ABKS scheme is selectively secure against chosen-keyword attacks in the generic bilinear group model and achieves keyword secrecy in the random oracle model. The correctness analysis above shows that the DABKS scheme carries out correct search control after the update of access policy AP_K. Therefore, the security of our scheme can be derived from that of the ABKS scheme.
The KeyGen algorithm generates a search key sk for the data users, whose complexity relates to S, the number of attributes associated with a user. For the data owner, the EncKW algorithm is mainly impacted by N, the number of attributes in the specified access policy, and the GenUpd algorithm is mainly impacted by m, the number of attributes under the updated gate node. For the data user, the cost of the TokenGen algorithm grows linearly with S. For the CSP, the ExeUpd algorithm is also impacted by m.
If the requested ciphertexts are cached locally, the fog devices perform searches and return the search results to the user; otherwise, the fog devices contact the cloud tier to retrieve the corresponding ciphertexts before performing searches. The cloud tier centralizes sufficient storage and computing resources, providing storage services. The cloud servers are considered to be honest but curious. Therefore, all data will be encrypted before being uploaded to the cloud.
In the tiered architecture, the main problem is which ciphertexts should be distributed to the fog devices. A naive solution is to let the fog devices store all the ciphertexts, so that the user can retrieve all the required data at the edge layer. However, this simple solution requires that the storage space of a fog device be as large as that of the central cloud. Therefore, a cache algorithm is required to place appropriate ciphertexts on fog devices, as sketched below.
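As a baseline for such a cache algorithm (the concrete placement policy is left open here), a fog device could simply keep the most recently queried ciphertexts within a fixed capacity; the LRU sketch below, with hypothetical names, illustrates the idea.

# Least-recently-used cache of ciphertexts held by a fog device.
from collections import OrderedDict

class FogCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()         # file_id -> ciphertext

    def get(self, fid):
        if fid not in self.store:
            return None                    # miss: fetch from the cloud tier
        self.store.move_to_end(fid)        # mark as most recently used
        return self.store[fid]

    def put(self, fid, ciphertext):
        self.store[fid] = ciphertext
        self.store.move_to_end(fid)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False) # evict the least recently used

cache = FogCache(capacity=2)
cache.put(1, b"cph1"); cache.put(2, b"cph2")
cache.get(1); cache.put(3, b"cph3")
assert cache.get(2) is None                # evicted; must be re-fetched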
8 Conclusion
References
1. Research and Markets: Cloud storage market - forecasts from 2017 to 2022. https://www.
researchandmarkets.com/research/lf8wbx/cloud_storage/
2. Rimal, B.P., Choi, E., Lumb, I.: A taxonomy and survey of cloud computing systems. In:
Proc. of the 5th International Joint Conference on INC, IMS and IDC (NCM 2009), pp. 44–51
(2009)
3. Zhou, M., Zhang, R., Zeng, D., Qian, W.: Services in the cloud computing era: a survey.
In: Proc. of the 4th International Conference on Universal Communication Symposium (IUCS
2010), pp. 40–46 (2010)
4. Wikipedia: iCloud leaks of celebrity photos. https://en.wikipedia.org/wiki/ICloud_leaks_of_
celebrity_photos
5. Khandelwal, S.: Download: 68 million hacked dropbox accounts are just a click away!. https://
thehackernews.com/2016/10/dropbox-password-hack.html
6. McGee, M.K.: Blood test results exposed in cloud repository. https://www.databreachtoday.
com/blood-test-results-exposed-in-cloud-repository-a-10382
7. Ferguson, N., Kelsey, J., Lucks, S., Schneier, B., Stay, M., Wagner, D., Whiting, D.:
Improved cryptanalysis of Rijndael. In: Fast Software Encryption, pp. 213–230. Springer,
Berlin (2001)
8. Kleinjung, T., Aoki, K., Franke, J., Lenstra, A.K., Thomé, E., Bos, J.W., Gaudry, P., Kruppa,
A., Montgomery, P.L., Osvik, D.A., et al.: Factorization of a 768-bit RSA modulus. In:
Advances in Cryptology–CRYPTO 2010, pp. 333–350. Springer, Berlin (2010)
9. Kallahalla, M., Riedel, E., Swaminathan, R., Wang, Q., Fu, K.: Plutus: scalable secure file
Collaborative Intrusion Detection
Schemes in Fog-to-Things Computing
1 Background
The IoT is an emerging technology that aims to provide a virtual presence for a
massive number of intelligent physical objects. In other words, it is a new network
of things that extend IT systems to physical environments in order to form a
globally connected heterogeneous network of smart objects [1]. This has ushered
in the era of smart connectivity, where the power of computing has moved from
desktops to pockets and wearable devices [2]. One of the main driving factors for
the rapid adoption of the IoT is the miniaturization of hardware, which has enabled
the mass production of smart objects and made them affordable.
The potential of IoT technologies and services has prompted wide interest across
the globe in the adoption of IoT applications. IoT technologies have been applied
to several critical infrastructures, in particular, to devices which are used in smart
homes, smart cities and operational technologies [3]. The cumulative effect of this
massive connectivity and wide applicability is a surge of data (over 60 ZB by 2020)
at the edge of the network, with an estimated economic impact of about nine trillion
USD [4, 5]. The big data generated by IoT devices can influence people's day-to-day
activities, for example by offering huge profitability and improving quality of life.
At the same time, the surge in traffic from IoT applications increases the potential
danger of cyber-attacks, as the IoT comes with previously unseen protocols,
workflows, interfaces and applications.
Centralized platforms such as the cloud suffer from poor scalability and high delays
in data collection, communication and reaction time for IoT applications. Because
of this, the ability to leverage embedded intelligence in a distributed IoT network is
an essential architectural component [6, 7]. This necessitates extending the cloud to a
cognitive and distributed network known as fog computing (FC). FC is an emerging
architecture of distributed computing [8] that pushes the cloud to the edge of the
network for efficient data collection, computation and control, communication and
networking, data access and storage [9]. It is a horizontal distributed architecture
that distributes the services and resources closer to the origin of the data. FC differs
substantially from edge and mobile computing in data processing mechanisms, and
the location of intelligence and computational power. It is the push of centralised
computing towards the edge of the network, particularly wireless networks, to solve
the limitations of the cloud. The emerging usage of FC, however, is not to replace
the cloud, but to complement it by bringing intelligence into distributed
fog nodes. The FC architecture provides a cloud-to-things continuum which caters
for latency, bandwidth and the communication requirements of next generation
networks [10]. In addition to controlling and managing the massive amount of IoT
data as a mini-cloud at the edge, FC provides a layered architecture as a cloud-to-
things continuum for big data storage and analytics.
FC enables the delivery of new types of services closer to the IoT. As it
confines data movement, storage and processing to the edge of the network, it
offers scalability and low latency for real-time systems [11]. IoT applications
benefit from FC services in offloading computation and storage services, and in
communicating with nearby nodes for service delivery. In such networks, obtaining
networking services from nearby nodes guarantees the IoT the urgent response times
required by real-time applications, while delegating computation and storage to
nearby nodes helps smart applications to be resource-efficient. As it is closer
to clients, FC supports IoT applications that leverage artificial intelligence (AI),
such as augmented and virtual reality. It is an ideal paradigm for next-generation
networks that support the convergence of IoT, 5G and AI applications. The shift from
centralisation, such as the cloud, to fog networking has also been necessitated by
other unique features and advantages in the IoT, such as mobility support, location
awareness, scalability, heterogeneity, low latency, and geographic distribution [12].
Because of this, smart city applications such as smart grids, smart transportation
and health care systems benefit greatly from fog computing as it provides embedded
and distributed intelligence for IoT in data collection and resource utilisation. This
reinforces that FC is an ideal architectural element for IoT security applications,
especially cooperative intrusion detection systems.
2.1 Significance
The construction of a resilient IDS is one of the network security challenges faced by
organizations and industries despite the advancement of detection techniques [27].
These challenges include decreasing the high false alarm rate (FAR), the detection
of novel attacks, the lack of training and test data and dynamic network and system
behaviours. These challenges are some of the reasons for the continued widespread
use of inefficient and inaccurate signature-based solutions and the reluctance to
adopt anomaly-based systems. Thus, as an additional layer of defence to thwart
cyber-criminals, effective and robust IDSs should be deployed to monitor malicious
activities in real-time for IoT applications. IDSs for the IoT in particular, should be
characterized by:
• fast and accurate detection for a timely incident response and recovery with less
damage
• deterrence of attacks, acting as prevention systems
• logging of intrusion events for future prevention
However, the detection technologies for IoT devices are challenged by:
1. Big data: increased connectivity means the IoT must process and inspect large
data volumes with a fast response time.
2. Depth of monitoring: highly accurate detection requires deep pattern and
attribute extraction.
Despite its huge advantages, the centralised model of the cloud cannot satisfy
the requirements of data collection, processing, storage and sharing for distributed
services and applications such as the IoT [34]. The application of cloud computing for
the security of the IoT forms a far-apart IoT-cloud communication architecture that
consists of the IoT on the user side and the cloud at a remote site. As the distance
between things and the cloud is large, this two-tier architecture of cloud-to-things
computing is not scalable and suffers from large delays for IoT-based applications.
This could be aggravated by the increase in the number of IoT devices, which
necessitates an architectural change in the way data processing, storage, control
and communication can be handled on the Internet. The architectural arrangement
should consider the resource-constrained nature of IoT devices, which need to offload
data storage and processing to nodes in proximity [35]. This indicates that the
cloud should be extended towards the IoT to form a distributed layer between
smart objects and the cloud to monitor resource management and to control IoT
devices more closely. The requirements of IoT applications such as latency, location
awareness, mobility and resource constraints could be solved using the newly
formed distributed architecture formed by pushing the cloud towards the edge.
To this end, one of the emerging architectures that supports these requirements is
known as fog computing.
The consideration of appropriate network architecture is as important as selecting
suitable security algorithms [36]. The individual node security and perimeter
security schemes of IT devices cannot be adapted to IoT security because of
processing and storage limitations. Simultaneously, the lack of scalability and the
remoteness of centralized-cloud hosted security mechanisms cannot satisfy specific
requirements such as delay sensitivity and bandwidth efficiency [37]. Such schemes
were designed for the traditional Internet where hosts are sparsely connected
with sufficient storage, processing and communication resources. This highlights
that neither security approach is suitable for the IoT, whose devices have limited
resources and are massively connected to support real-time applications [38, 39].
Apart from offloading computations and storage from both things and the cloud, a
fog-based security mechanism seems to be scalable and distributed in the ecosystem
that quickly recognizes attacks or suspicious behaviors before escalating to the
end devices. This is a crucial property for a fast incident response in IoT without
service disruptions, especially for industrial IoT with real-time mission-critical
applications. For instance, the service interruption by DDoS on smart grids and
connected vehicles can be detected on nearby fog nodes without disrupting the
power grids and the car engines, respectively. Suspicious events and threats from
the Internet are deterred more quickly at the fog network than in the cloud, which
makes FC-based IDS appealing for critical infrastructures.
One of the most important design and implementation aspects of an IDS is choosing
an architecture that determines the distribution of the IDS components. Though
evolving continuously, there are architecturally three main approaches to
implementing IDS in fog computing, summarized in Table 1.
Table 1 Resource overheads and scalability of IDS architectures for fog computing

Design approach          Processing  Storage   Communication latency  Scalability
Things-fog-level         Medium      Medium    Low                    High
Fog-cloud-level          Low         Low       High                   Medium
Things-fog-cloud-level   Very low    Very low  Very high              Medium
Fog-Cloud-Level IDS Approach This approach spans the IoT layer to the fog
environment and distributes IDS functionalities across the fog and cloud levels [41].
The fog level can be leveraged to detect anomalies based on machine learning
models, which enables suspicious events to be identified quickly, before they
propagate to critical infrastructures. Distributing the intrusion detection model
across local fog nodes can be viewed as a technique for offloading computing and
storage overheads from resource-constrained devices [42]. At the same time, fog
nodes can share machine learning models and parameters to exchange local
experiences. The second level is the cloud, and the distribution of the fog gives this
approach of IDS deployment a complete picture of the network. As the IDS can
also be deployed at the IoT level, it is extremely useful for detecting malicious
encrypted traffic. While this approach can provide improved accuracy, detection
time and efficient use of resources, it might complicate the design and
implementation of the IDS. Despite its complexity, the three-tier IDS architecture
is reliable and secure for fog computing (Fig. 2).
However, the distance to the cloud can expose data communication to security
and privacy threats. It is also complicated to coordinate the IDS functionalities
distributed across three tiers.
ML is a field of AI that deals with the mechanism of learning from data. The
prevalent machine learning algorithms include linear regression, decision trees,
naive-Bayes classifiers, logistic regression, SVMs and ANNs [44]. Deep learning
(DL) is an advanced machine learning scheme consisting of deeply layered neural
networks (NNs) through which hierarchical features are learned automatically. It is
an ML scheme with several hierarchical layers of complex non-linear processing
stages that model complex relationships using multiple levels of representation.
DL is a recent breakthrough in ML that mimics the ability of the human brain to
learn from experience. Just as the human brain processes raw data through neuron
inputs to learn high-level features, DL feeds input data into deep NNs for
hierarchical feature extraction, training and classification [45]. It has the capacity
to stabilize and generalize model training on big data [46]. Its algorithms can
discover complex functions mapping inputs to outputs without manual intervention
by human experts. DL algorithms have benefited
from the advancement of hardware technologies (e.g. GPU) and algorithms (e.g.
deep neural networks). The generation of a massive amount of learning data has
also significantly contributed to this evolution as observed in globally dominant
companies such as Google and Facebook.
Several deep learning applications have flourished in the field of big data such as
image classification, object recognition, natural language processing and computer
vision [47]. While large global companies such as Google, Microsoft, and Facebook
have already embraced deep learning and have embedded it into their products, the
cybersecurity industry is lagging behind in applying this technology to safeguard
businesses and products. The success of DL in various areas can be leveraged to
overcome the limitations of traditional ML in combating cyber-threats. For
example, mutations due to attacks can be small, much like small changes to image
pixels, which means that DL can take cybersecurity to the next level by
discriminating even small variations of cyber-attacks. Because of the massive
amount of data generated by IoT devices and systems, deep learning methods are
highly effective in detecting and analyzing intrusions into IoT systems, as has been
shown in traffic classification and IDS applications [40]. Hence, DL's capability
for compact, automatic feature engineering and self-learning gives IDSs resource
efficiency, a high cyber-attack detection rate and quick detection.
There are several ongoing efforts in academia to bring the success of ML in various
fields to cyber security applications. In adopting ML as an underlying technology,
cybersecurity can harness the potentials of ML in numerous applications. Malware
detection is probably the most prominent application of ML [48, 49]. Network
anomaly detection [50] is another widely researched area of ML in cybersecu-
rity. ML is also the backbone technology behind the newly emerged biometric
authentication for network and mobile security [51]. The ultimate goal of these
efforts is to improve the security posture of industries, businesses and governments
by effectively assisting security experts in distinguishing malicious activities from
normal network profiles at machine speed in big data. This saves the time, effort
and cost of deciding how to segregate normal network data from cyber-attacks.
While tremendous achievements have been made in applying ML to
cybersecurity, there is still a large research gap in bringing autonomy to
cybersecurity [52]. Existing ML-based security systems have made massive
progress in predicting and classifying cyber-attacks using network traffic. However,
the human in the loop is too slow, and too limited in knowledge, to make rapid
actionable decisions on the outputs of ML models. To solve this issue and improve
the security posture, ML should be permitted to take actions and decisions in
response to cyber-attacks at machine speed. Despite technical and administrative
challenges, autonomous and intelligent security systems using AI will revolutionize
the security of all sectors.
Classical ML algorithms are constrained by the steps of decisions made to solve
a problem [53]. While some problems are single-step in nature, others require
multiple steps to solve. The multi-step decision process calls for learning systems
that provide feedback, particularly reinforcement learning. Reinforcement learning
functions without labelled data (unlike supervised learning), yet it requires
feedback to guide the algorithm (unlike unsupervised learning). This makes it a
semi-supervised learning scheme that needs to understand the consequences of its
actions, reinforcing behaviour (positively or negatively) so as to continuously learn
to react to its environment. In the effort to improve the security posture of
organisations and governments, embracing autonomy reduces the human loop in
taking security actions and decisions. Reinforcement learning fits perfectly with
making autonomous decisions in cybersecurity, especially for detecting and
defending against cyber-criminals. Beyond learning automatically like other ML
algorithms, it provides algorithms that learn a policy for the next system/network
profile from the observable state of systems/networks. This enables automatic
learning over historical data from past experiences in various actions, conditions
and settings, using a feedback signal to adapt to previously unseen conditions.
With the prevalence of machine learning, such as reinforcement learning in
game playing, cybersecurity benefits massively from the autonomy of AI
applications. Like game playing, autonomous cybersecurity requires gathering
experience, defeating an opponent and measurable rewards. By observing the
normal network profiles of a dynamic network at various points in time,
reinforcement learning can learn a general policy about the state of the network
through a Markov Decision Process (MDP). As a multi-step problem-solving
process, an MDP can be defined using four components: state space, action space,
reward function and discount factor [54].
Observation Space: Commonly known as the state space, this describes what
the reinforcement learning agent can see and observe about the world to use in its
decision-making process. In the course of learning to solve a problem, the
observable relevant features, i.e., everything the algorithm monitors in order to
decide, determine the state space. Actions: This is the set of actions the agent can
choose from and execute as part of its policy; in other words, the set of actions and
conditions under which an algorithm can be used. Reward Signal: This provides
both positive and negative reinforcement to the agent and informs the learning
algorithm as to whether actions are leading to favourable or unfavourable
outcomes. The reward function is the feedback signal the algorithm uses to learn
and optimize policies. After receiving a reward as a consequence of an action, the
change in the environment enables the agent to update its policy and maximize the
reward of future actions. The Discount Factor specifies the value of long-term vs.
short-term rewards.
As shown in Fig. 3, an agent is responsible for updating and optimizing the
security program and function of a network. For instance, the agent can perform
actions such as scans, logging, monitoring and classification. Referring to its policy
and the observed current state of the network, the agent can decide on the best next
course of action at any given time. The actions taken from the observation, learned
from prior experiences, become new experience for the next observation and are
used to update the policy. The feedback is applied as a positive or negative reward,
improving the security posture of the network in an iterative manner. Hence,
reinforcement learning algorithms can adapt to, and autonomously solve, complex
long-term unseen problems of network security in a dynamic environment.
An MDP can be formulated as a 5-tuple consisting of the state at time t (s_t), an
action at time t (a_t), the state transition probability at time t, P(s_{t+1} | s_t, a_t),
a discount factor γ and the reward function at time t (r_t). The objective is to find
the appropriate action to take at each state (i.e., to produce an optimal policy π∗)
so as to maximize the agent's reward over a long period of time.
As a common reinforcement learning algorithm, Q-learning is a temporal-difference
learning method that uses a recursive algorithm to estimate a value function. It
works by forming a Q-table in which states form the rows and actions the columns.
The quality function Q(s, a) estimates the maximum overall reward r the agent
obtains in state s when performing action a to transition to the next state s′. The
solution to the Bellman equation Q(s, a) = r + γ max_{a′} Q(s′, a′) provides the
value of Q(s, a) for all states and actions; that is, Q(s, a) converges to the optimal
quality function Q∗(s, a) after a random start by iterating the Bellman equation. In
practice, however, the derivative form of the Bellman equation, known as the
temporal-difference learning rule, is used to update the Q value to Q′:
Q′(s, a) := Q(s, a) + α (r + γ max_{a′} Q(s′, a′) − Q(s, a)),
where α (0 < α < 1) is a learning rate and γ (0 < γ < 1) is a constant weighting
immediate versus delayed reward.
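To make the update rule concrete, the following is a minimal tabular Q-learning
sketch in Python; it is an illustration only, not the chapter's implementation. The
env object, with reset() and step(action) returning (next_state, reward, done), is a
hypothetical stand-in for the monitored network environment.

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
        Q = defaultdict(float)          # Q[(state, action)] -> estimated quality
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # Epsilon-greedy: explore occasionally, otherwise exploit Q.
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda x: Q[(s, x)])
                s_next, r, done = env.step(a)
                best_next = max(Q[(s_next, x)] for x in actions)
                # Temporal-difference update:
                # Q(s,a) := Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s_next
        return Q

The learned table induces the policy: in each observed state, the agent takes the
action with the highest Q value.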
4.1 Architecture
4.2 Algorithms
The proposed self-taught system has two learning modules and a classifier which
aims at:
1. Reducing dimensionality for lightweight IDS construction
2. Accelerating training and detection
3. Improving the accuracy of detection
Stochastic gradient descent (often abbreviated SGD) is an iterative method for
optimizing an objective function with suitable smoothness properties (e.g.
differentiable or subdifferentiable). It can be regarded as a stochastic approximation
of gradient descent optimization, since it replaces the actual gradient (calculated
from the entire data set) with an estimate thereof (calculated from a randomly
selected subset of the data) [58]. The training and aggregation algorithms are
adapted from sequential SGD for the distributed fog network. Given initial training
weights W_n, a learning rate α and bias parameters b_n, DATA_total is the total
data space across all m distributed nodes; that is, the per-node data sets DATA_1,
DATA_2, . . . , DATA_m are subsets of DATA_total. The local data DATA_i on a
given fog node can be further divided into samples DATA_i1, DATA_i2, . . . ,
DATA_is, and the DATA_is samples on each fog node are further split into
DATA_isc across the n_c processor threads. Algorithm 1 shows the local training
and model/parameter updates on each fog node, while Algorithm 2 gives the
aggregation function at the coordinator node.
Algorithm 1 Local-learning(α, n_receive, n_send)
step = 0
while TRUE do
    if (step mod n_receive == 0) then
        get-parameters(parameters)
    k ← get-minibatch(k ∈ DATA_isc)
    sgd ← compute-sgd(parameters, k)
    update weights W_ji := W_ji − α · ∂L(W, b | j) / ∂W_ji
    if (step mod n_send == 0) then
        send-parameters(W, b)
    step := step + 1
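As a concrete companion to Algorithm 1, here is a minimal Python sketch of the
local update loop under toy assumptions: the Coordinator class (a stand-in for
Algorithm 2's aggregation, here plain averaging), the least-squares loss, and all
sizes and rates are illustrative and not taken from the original text.

    import numpy as np

    rng = np.random.default_rng(0)

    class Coordinator:
        # Toy stand-in for the coordinator node: averages received models.
        def __init__(self, d):
            self.W, self.b = np.zeros(d), 0.0
        def get_parameters(self):
            return self.W.copy(), self.b
        def send_parameters(self, W, b):
            self.W, self.b = (self.W + W) / 2, (self.b + b) / 2

    def get_minibatch(X, y, k=32):
        # Random subset of the node's local data: the stochastic part of SGD.
        idx = rng.choice(len(y), size=k, replace=False)
        return X[idx], y[idx]

    def local_learning(coord, X, y, alpha=0.01, n_receive=10, n_send=10,
                       steps=200):
        W, b = coord.get_parameters()
        for step in range(steps):
            if step % n_receive == 0:
                W, b = coord.get_parameters()        # pull aggregated model
            Xk, yk = get_minibatch(X, y)
            err = Xk @ W + b - yk                    # least-squares residual
            grad_W, grad_b = Xk.T @ err / len(yk), err.mean()
            W, b = W - alpha * grad_W, b - alpha * grad_b   # W := W - a dL/dW
            if step % n_send == 0:
                coord.send_parameters(W, b)          # share local experience
        return W, b

    # One fog node training on its local share of DATA_total:
    X = rng.normal(size=(500, 8))
    y = X @ np.ones(8)
    coord = Coordinator(8)
    W, b = local_learning(coord, X, y)

In a deployment, several such nodes would run this loop concurrently against the
same coordinator, which is what confines raw training data to the edge.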
In these self-taught schemes, the pre-training process considers only the
unsupervised extraction of features in the encoding half of the encode–decode
scheme. This arrangement aims to decrease the computational cost of training and
detection while delivering almost the same accuracy as the full encode–decode
process. The other approach incorporates supervised learning, in particular the
LSTM model. All the unsupervised models use shallow learning systems as the
final layer for classification. The rationale behind choosing these algorithms over
others is their effectiveness in other fields and their ability to produce lightweight
features for the resource-constrained IoT environment. The deep models are
compared with classical machine learning models such as softmax alone and
random forest. It is also essential to discuss the implications of the obtained results.
The AE Model In the experimentation phase, the deep learning package Keras on
Theano and the distributed processing framework Apache Spark [59] are combined
to implement the system (Fig. 6). The first learning module in our training
procedure provides optimal parameters for the second module, as can be seen from
the steps of our attack detection system in Fig. 5. First, m unlabelled samples
x_u^(1), x_u^(2), x_u^(3), . . . , x_u^(m) ∈ R^n are extracted from the training
data by removing the labels. These samples are input to the unsupervised module
to learn the optimal parameter set, such as the weights W and bias b (as shown in
Fig. 5). Then, m labelled training samples {(x_l^(1), y^(1)), (x_l^(2), y^(2)),
(x_l^(3), y^(3)), . . . , (x_l^(m), y^(m))} ∈ R^n are fed to the supervised module
to produce a reduced, latent representation of the same data, {(h_l^(1), y^(1)),
(h_l^(2), y^(2)), (h_l^(3), y^(3)), . . . , (h_l^(m), y^(m))}, through the
reconstruction of the output values {(x′_l^(1), y^(1)), (x′_l^(2), y^(2)),
(x′_l^(3), y^(3)), . . . , (x′_l^(m), y^(m))}. At this stage, AEs extract layer-wise
hierarchical and discriminatory features, which enable the model to learn the
complex relationships in the data. The module learns features from the
d-dimensional input vector x ∈ R^d, mapping it to a latent representation
h_i ∈ R^{d_i} using a deterministic function of the form h = f(Wx + b), where f is
a non-linear activation such as the sigmoid.
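As an illustration of this two-module design, here is a minimal sketch in the Keras
API. The input width n = 122 (e.g., one-hot encoded NSL-KDD features), the code
size, activations and optimizer are assumptions for illustration; the original system
also distributed this training over Apache Spark, which is omitted here.

    from tensorflow import keras
    from tensorflow.keras import layers

    n, d = 122, 64   # assumed input features and latent code size

    # Unsupervised module: autoencoder trained to reconstruct its input;
    # only the encoder half (h = f(Wx + b)) is kept for feature extraction.
    inp = keras.Input(shape=(n,))
    code = layers.Dense(d, activation="sigmoid")(inp)
    recon = layers.Dense(n, activation="sigmoid")(code)
    autoencoder = keras.Model(inp, recon)
    encoder = keras.Model(inp, code)
    autoencoder.compile(optimizer="sgd", loss="mse")
    # autoencoder.fit(x_unlabelled, x_unlabelled, epochs=10, batch_size=128)

    # Supervised module: a softmax classifier over the latent representation h.
    classifier = keras.Sequential([
        keras.Input(shape=(d,)),
        layers.Dense(2, activation="softmax"),   # normal vs. attack
    ])
    classifier.compile(optimizer="sgd",
                       loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])
    # classifier.fit(encoder.predict(x_labelled), y_labelled, epochs=10)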
Fig. 6 AE model training and test curve on 2-class NSL-KDD dataset in distributed settings
and RF in FAR, as shown in Table 2. The ROC curve of the shallow model (RF)
lies below that of the deep model, with a higher number of false alarms.
The DBN Model As in the case of the AE, features are learned by layer-wise
pre-training using RBMs to provide the initial weights. The learning step involves
adjusting the network weights in the successive hidden layers of the RBMs. After
initializing the DBN from the weights obtained in the pre-training phase, iterative
global parameter adjustment (fine-tuning) is accomplished by a backpropagation
algorithm that trains the whole network. The weights obtained through this process
enhance the discrimination capacity of the model. Cross-entropy is used as the loss
function in this fine-tuning phase, and regularization techniques are applied to
solve the problem of over-fitting. The fine-tuning stage ends with a trained model
and weights for classification purposes. The classification task takes the last hidden
layer k as input to discriminate the network traffic into normal/attack or
normal/multi-attack classes. For the performance measures, unseen test data are
chosen to represent zero-day attack detection. The training curves of the accuracy
of the algorithm are shown in Fig. 9.
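For a rough, self-contained stand-in, scikit-learn's BernoulliRBM can stack
layer-wise RBM feature extractors in a pipeline. Unlike the DBN described above,
this fine-tunes only the final classifier rather than backpropagating through the
whole network, so it is only an approximation of the scheme; all sizes and learning
rates below are illustrative.

    from sklearn.neural_network import BernoulliRBM
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    # Two RBM layers pre-trained greedily (contrastive divergence), then a
    # logistic-regression head on the last hidden layer for classification.
    dbn_like = Pipeline([
        ("rbm1", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20)),
        ("rbm2", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    # Features should be scaled to [0, 1] before fitting:
    # dbn_like.fit(x_train, y_train)
    # print(dbn_like.score(x_test, y_test))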
As shown in Table 3, the detection accuracy of DL is better than that of the classic
ML model. Though there are performance differences among the various deep
models, they all achieved better performance than the centralized model. For
instance, in binary classification, Table 3 shows that the FAR of DL (0.22%) is
much lower than that of the ML model (6.57%). In multi-class classification, as
shown in Table 4, the performance of DL is also better than that of the normal ML
model for most attack categories. For instance, the R2L.U2R recall of DL is 91%,
while the traditional model achieves a lower recall.
Table 6 The performance of LSTM vs RF on ISCX and AWID datasets in binary classification

                  ISCX dataset                   AWID dataset
Algorithm   ACC     Recall   Precision    ACC     Recall   Precision
LSTM        99.91   99.96    99.85        98.22   98.9     98.5
RF          90.92   99       89.11        84.87   90       85
model per second is 54.3, while the LSTM required 374.6 training instances per
second. It can be seen that the training time of the LSTM network is considerably
greater than that of the LR model. Moreover, it has been observed that the LR model
detected 1177.8 instances per second while the LSTM model detected 80.2 instances
per second. However, the memory storage of the LR model is almost 50 times
greater than that of the LSTM model. This indicates the compactness of deep
learning features and models.
The LSTM model has also been compared with softmax in multi-class
classification. The overall detection rate of the LSTM model (98.85%) is greater
than that of softmax (86%), similar to the binary classification scheme (Table 6).
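A minimal Keras sketch of an LSTM classifier of this kind might look as follows;
the window length, feature count and layer sizes are assumptions, since the text
reports only the resulting metrics.

    from tensorflow import keras
    from tensorflow.keras import layers

    t, f = 10, 20   # assumed sequence length and features per record

    model = keras.Sequential([
        keras.Input(shape=(t, f)),
        layers.LSTM(32),                        # sequence encoder
        layers.Dense(1, activation="sigmoid"),  # binary normal/attack output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy",
                           keras.metrics.Recall(),
                           keras.metrics.Precision()])
    # model.fit(x_sequences, y_labels, epochs=5, batch_size=256)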
IDSs are of significant importance for monitoring IoT systems against internal and
external cyber-attacks. The emergence of fog computing has brought about new,
alternative distributed architectures for implementing IDS in the context of IoT
applications. The choice of the architecture in which an IDS is deployed depends on
parameters such as resource efficiency, scalability and response time.
Pre-training has been effective in accelerating data training and attack detection.
For instance, the training times of SL are higher than those of DL and increase
sharply as the number of nodes grows, while the training times of DL are almost
constant and remain below those of SL. Similarly, the detection time of DL remains
below that of SL as the number of nodes increases. The experimental results clearly
show that the reduction of dimensions gained through the pre-training techniques
made it possible to build a lightweight IDS, as shown in Figs. 8 and 7, respectively.
The results indicate that the deep learning model is superior to the shallow model in
detection accuracy, detection rate and false alarm rate. In the proposed distributed
and parallel learning approach, every fog node trains on its data locally using an AE
and shares the best parameters with the other nodes via the coordinator node. The
details of the architecture are depicted in Fig. 4. The architecture makes it possible
to confine training data to the edge, data that would otherwise have been trained in
a single location, the cloud. It distributes storage, processing and control to local
nodes so that each fog node can detect cyber-threats for the nearby IoT devices
while the best model is exchanged via the coordinator node. As shown in Table 2,
the boost in accuracy of the deep model could be a result of model and parameter
sharing among the fog computing nodes. This is compelling evidence that the
parameter-sharing approach improves the scalability and accuracy of the detector
compared to a standalone detector.
From the detailed analysis, it seems that the things-fog architecture provides a
significant reduction in the latency of communication between IoT devices and
fog nodes for detecting suspicious events and cyber-attacks. Though it also enables
computational and processing activities to be offloaded, it does not offer massive
resource savings, in contrast to the things-fog-cloud scheme. Distributing detection
schemes across things-fog-cloud, however, saves processing and storage resources,
as the cloud offloads most of the operations. Nevertheless, this approach complicates
the IDS design and incurs a relatively higher communication latency than the first
scheme.
With regard to improving IDS algorithms, the limited number of works in the
literature indicates that there is a strong need for lightweight IDSs that can reduce
the false alarm rate. In this regard, the things-fog architecture is ideal for hosting
learning algorithms in such a way that fog nodes can share the best parameters or
models peer-to-peer or via a coordinator.
It is worth exploring the application of a blockchain-based fog architecture for
implementing IDS, as it has the potential benefit of overcoming poisoning and
evasion attacks on machine learning models. Technologies such as software-defined
networking (SDN) can also provide a global view of fog networks by utilizing their
controllers to make fog nodes cooperate.
References
1. Diro, A.A., Reda, H.T., Chilamkurti, N.: Differential flow space allocation scheme in SDN
based fog computing for IoT applications. J. Ambient Intell. Humaniz. Comput. 1–11 (2018).
https://doi.org/10.1007/s12652-017-0677-z
2. David, M., Murman, C.: Designing Apps for Success: Developing Consistent App Design
Practices. Focal Press, Burlington (2014)
3. CISCO: Cisco visual networking index predicts near-tripling of IP traffic by 2020. https://
newsroom.cisco.com/press-release-content?type=press-release&articleId=1771211 (2016).
Accessed on June 2018
4. Caron, X., Bosua, R., Maynard, S.B., Ahmad, A.: The internet of things (IoT) and its impact
on individual privacy: an Australian perspective. Comput. Law Secur. Rev. 32(1), 4–15 (2016)
5. Louis, C.: 2017 roundup of internet of things forecasts. https://www.forbes.com/sites/
louiscolumbus/2017/12/10/2017-roundup-of-internet-of-things-forecasts/#3a6fc1051480
(2017). Accessed on July 2018
6. Frahim, J., Pignataro, C., Apcar, J., Morrow, M.: Securing the internet of things: a pro-
posed framework. https://www.cisco.com/c/en/us/about/security-center/secure-iot-proposed-
framework.html (2015). Accessed on June 2018
7. Diro, A.A., Chilamkurti, N., Veeraraghavan, P.: Elliptic curve based cybersecurity schemes
for publish-subscribe internet of things. In: International Conference on Heterogeneous
Networking for Quality, Reliability, Security and Robustness, pp. 258–268. Springer, Berlin
(2016)
8. Vaquero, L.M., Rodero-Merino, L.: Finding your way in the fog: towards a comprehensive
definition of fog computing. ACM SIGCOMM Comput. Commun. Rev. 44(5), 27–32 (2014)
9. Bonomi, F., Milito, R., Zhu, J., Addepalli, S.: Fog computing and its role in the internet
of things. In: Proceedings of the First Edition of the MCC Workshop on Mobile Cloud
Computing, pp. 13–16. ACM, New York (2012)
10. Consortium, O.: Definition of fog computing. https://www.openfogconsortium.org/resources/
definition-of-fog-computing (2015) Accessed on Dec 2017
11. Stojmenovic, I., Wen, S.: The fog computing paradigm: Scenarios and security issues. In: 2014
Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 1–8.
IEEE, Piscataway (2014)
12. Klas, G.I.: Fog computing and mobile edge cloud gain momentum: Open Fog Consortium,
ETSI MEC and Cloudlets (2015)
13. Almseidin, M., Alzubi, M., Kovacs, S., Alkasassbeh, M.: Evaluation of machine learning
algorithms for intrusion detection system. In: Intelligent Systems and Informatics (SISY) (2017)
14. Andrea, I., Chrysostomou, C., Hadjichristofi, G.: Internet of things: security vulnerabilities
and challenges. In: 2015 IEEE Symposium on Computers and Communication (ISCC), pp.
180–187. IEEE, Piscataway (2015)
15. Solutions, C.F.C.: Cisco Fog Computing Solutions: Unleash the Power of the Internet of
Things. Cisco Systems Inc, San Jose (2015)
16. Diro, A., Chilamkurti, N., Kumar, N.: Lightweight cybersecurity schemes using elliptic curve
cryptography in publish-subscribe fog computing. Mob. Netw. Appl. 22, 848–858 (2017)
17. Ab Rahman, N.H., Glisson, W.B., Yang, Y., Choo, K.-K.R.: Forensic-by-design framework
for cyber-physical cloud systems. IEEE Cloud Comput. 3(1), 50–59 (2016)
18. Pajic, M., Weimer, J., Bezzo, N., Sokolsky, O., Pappas, G.J., Lee, I.: Design and implementa-
tion of attack-resilient cyberphysical systems: with a focus on attack-resilient state estimators.
IEEE Control Syst. 37(2), 66–81 (2017)
19. Ring, T.: Connected cars–the next target for hackers. Netw. Secur. 2015(11), 11–16 (2015)
20. McDermott, C.D., Petrovski, A., Majdani, F.: Towards situational awareness of botnet activity
in the internet of things (2018)
21. Kolias, C., Kambourakis, G., Stavrou, A., Voas, J.: Ddos in the IoT: Mirai and other botnets.
Computer 50(7), 80–84 (2017)
22. Packard, H.: Internet of Things Research Study 2015, vol. 2. http://www8.hp.com/h20195
(2014)
23. Fenrich, K.: Securing your control system: the “cia triad” is a widely used benchmark for
evaluating information system security effectiveness. Power Eng. 112(2), 44–49 (2008)
24. Banks, A., Gupta, R.: MQTT version 3.1.1. OASIS Standard, 29 Oct 2014
25. Wu, H., Schwab, S., Peckham, R.L.: Signature based network intrusion detection system and
method, 9 Sept 2008. US Patent 7,424,744
26. Garcia-Teodoro, P., Diaz-Verdejo, J., Maciá-Fernández, G., Vázquez, E.: Anomaly-based
network intrusion detection: techniques, systems and challenges. Comput. Secur. 28(1–2), 18–
28 (2009)
27. Lee, B., Amaresh, S., Green, C., Engels, D.: Comparative study of deep learning models for
network intrusion detection. SMU Data Sci. Rev. 1(1), 8 (2018)
28. Deepak, B., Pavithra, H.: Applications of machine learning in cyber security. Digit. Image
Process. 10(5), 93–98 (2018)
29. Ford, V., Siraj, A.: Applications of machine learning in cyber security. In: Proceedings of the
27th International Conference on Computer Applications in Industry and Engineering (2014)
30. Guy, C.: Introducing deep learning: boosting cybersecurity with an artificial brain. http://
www.darkreading.com/analytics/introducing-deep-learning-boosting-cybersecurity-with-an-
artificial-brain/a/d-id/1326824 (2016). Accessed on July 2018
31. Diro, A., Chilamkurti, N.: Distributed attack detection scheme using deep learning approach
for internet of things. Future Gener. Comput. Syst. 82, 761–768 (2018)
32. Diro, A., Chilamkurti, N.: Deep learning: the frontier for distributed attack detection in fog-to-
things computing. IEEE Commun. Mag. 56(2), 169–175 (2018)
33. Hitaj, B., Ateniese, G., Perez-Cruz, F.: Deep models under the GAN: information leakage
from collaborative deep learning. In: Proceedings of the 2017 ACM SIGSAC Conference on
Computer and Communications Security, Dallas, TX, 30 Oct–3 Nov 2017, pp. 603–618. ACM,
New York (2017)
34. Botta, A., De Donato, W., Persico, V., Pescapé, A.: Integration of cloud computing and internet
of things: a survey. Future Gener. Comput. Syst. 56, 684–700 (2016)
35. El Jaouhari, S.: A secure design of WoT services for smart cities. PhD thesis, Ecole nationale
supérieure Mines-Télécom Atlantique (2018)
36. Al Shayokh, M., Abeshu, A., Satrya, G., Nugroho, M.: Efficient and secure data delivery in
software defined wban for virtual hospital. In: 2016 International Conference on Control, Elec-
tronics, Renewable Energy and Communications (ICCEREC), pp. 12–16. IEEE, Piscataway
(2016)
37. Ni, J., Zhang, A., Lin, X., Shen, X.S.: Security, privacy, and fairness in fog-based vehicular
crowdsensing. IEEE Commun. Mag. 55(6), 146–152 (2017)
38. Yi, S., Li, C., Li, Q.: A survey of fog computing: concepts, applications and issues. In:
Proceedings of the 2015 Workshop on Mobile Big Data, pp. 37–42. ACM, New York (2015)
39. Yi, S., Qin, Z., Li, Q.: Security and privacy issues of fog computing: a survey. In: International
Conference on Wireless Algorithms, Systems, and Applications, pp. 685–695. Springer, Berlin
(2015)
40. Diro, A., Chilamkurti, N.: Leveraging LSTM networks for attack detection in fog-to-things
communications. IEEE Commun. Mag. 56(9), 124–130 (2018)
41. Illy, P., Kaddoum, G., Moreira, C.M., Kaur, K., Garg, S.: Securing fog-to-things envi-
ronment using intrusion detection system based on ensemble learning (2019). Preprint,
arXiv:1901.10933
42. Prabavathy, S., Sundarakantham, K., Shalinie, S.M.: Design of cognitive fog computing for
intrusion detection in Internet of Things. J. Commun. Netw. 20(3), 291–298 (2018)
43. Hosseinpour, F., Vahdani Amoli, P., Plosila, J., Hämäläinen, T., Tenhunen, H.: An intrusion
detection system for fog computing and IoT based logistic systems using a smart data approach.
Int. J. Digit. Content Technol. Appl. 10, 34–46 (2016)
44. Berman, D.S., Buczak, A.L., Chavis, J.S., Corbett, C.L.: A survey of deep learning methods
for cyber security. Information 10(4), 122 (2019)
45. Deng, L.: A tutorial survey of architectures, algorithms, and applications for deep learning.
APSIPA Trans. Signal Inf. Process. 3, e2 (2014)
46. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising au-
toencoders: learning useful representations in a deep network with a local denoising criterion.
J. Mach. Learn. Res. 11, 3371–3408 (2010)
47. Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagic, E.:
Deep learning applications and challenges in big data analytics. J. Big Data 2(1), 1 (2015)
48. Li, J., Sun, L., Yan, Q., Li, Z., Srisaan, W., Ye, H.: Significant permission identification for
machine learning based android malware detection. IEEE Trans. Industr. Inf. 14, 3216–3225
(2018)
49. Bartos, K., Sofka, M., Franc, V.: Optimized invariant representation of network traffic for
detecting unseen malware variants. In: USENIX Security Symposium (2016)
50. Tuor, A., Kaplan, S., Hutchinson, B., Nichols, N., Robinson, S.: Deep learning for unsupervised
insider threat detection in structured cybersecurity data streams (2017). arXiv:1710.00811
51. Staff, M., Flieshman, G.: Face ID on the iPhone X: everything you need to know about Apple’s
facial recognition. Macworld, 25 Dec 2017. https://www.macworld.com/article/3225406/
iphone-ipad/face-id-iphone-x-faq.html. Accessed 12 June 2018
52. Buczak, A.L., Guven, E.: A survey of data mining and machine learning methods for cyber
security intrusion detection. IEEE Commun. Surv. Tutor. 18(2), 1153–1176 (2016)
53. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., et al.:
Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)
54. Wright, R., Dora, R.: Learning to win: making the case for autonomous cyber security
solutions, Aug 2018. https://www.csiac.org/csiac-report/learning-to-win-making-the-case-
for-autonomous-cyber-security-solutions/. Accessed 16 Jan 2020
55. Qinbin, L., Wen, Z., Bingsheng, H.: Federated learning systems: vision, hype and reality for
data privacy and protection (2019). Preprint, arXiv:1907.09693
56. Preuveneers, D., Vera, R., Ilias, T., Jan, S., Wouter, J., Elisabeth, I.: Chained anomaly detection
models for federated learning: an intrusion detection case study. Appl. Sci. 8(12), 2663 (2018)
57. Du, M., Li, F., Zheng, G., Srikumar, V.: DeepLog: anomaly detection and diagnosis from
system logs through deep learning. In: Proceedings of the 2017 ACM SIGSAC Conference
on Computer and Communications Security, Dallas, TX, 30 Oct–3 Nov 2017, pp. 1285–1298.
ACM, New York (2017)
58. Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Suvrit, S., Nowozin, S.,
Wright, S.J. (eds.) Optimization for Machine Learning, pp. 351–368. MIT Press, Cambridge
(2012). ISBN 978-0-262-01646-9
59. Zaharia, M., et al.: Apache spark: a unified engine for big data processing. Commun. ACM
59(11), 56–65 (2016)
On the Feasibility of Byzantine
Agreement to Secure Fog/Edge Data
Management
1 Introduction
Fog and edge computing extend the cloud computing model by offloading some of
the storage and computation closer to the data source or user [5, 50, 57, 61]. This
is achieved by introducing extra storage and computing layers between the cloud
data center and the often latency-sensitive fog/edge applications. The benefits
are many: reduced response time, less bandwidth utilization, higher security and
privacy, among others [5, 56, 75].
In particular, security (and privacy) can be improved since data is kept close to
its source rather than being exposed to vulnerable environments all the way to the
cloud center. However, this form of isolation makes the fog/edge nodes prone to
data integrity problems due to potential malicious attacks or arbitrary faults usually
tolerated via replication, e.g., Byzantine fault tolerance [8, 42]. On the other hand,
applications that benefit from replicated or decentralized data across fog/edge nodes
have to face the challenge of data management in untrusted environments.
The need for trusted systems against malicious and arbitrary, a.k.a., Byzan-
tine [19, 41, 69], behaviors is gaining a lot of attention given the rise of the
blockchain technology. Beyond its prime application, i.e., cryptocurrency [48, 73],
blockchains proposed a generic solution to decentralized systems whose nodes are
not (all) trusted. This has recently caused disruption in several fields of computer
technology and economy [51], and revived a new wave of Byzantine fault tolerance
(BFT) research [3, 7, 9, 21, 35, 37, 43, 44], as traditionally known in academia for
A. Shoker ()
VORTEX Colab, Porto, Portugal
H. Yactine
HASLab, INESC TEC and Universidade do Minho, Braga, Portugal
decades [8, 42]. This raises the question whether BFT and blockchain protocols can
be leveraged to empower fog/edge applications in untrusted environments.
A notable observation is that fog/edge computing and Blockchain technologies
introduce complementary security solutions and challenges. While fog/edge com-
puting improves the edge node security and privacy by reducing the attack surface,
i.e., through limiting the exposure to external edge domains, blockchain is geared
towards remote data access in untrusted environments.
In particular, blockchain ensures the integrity of the system against Byzantine
behaviors through employing some variants of agreement protocols to maintain
a total order on operations. This brings in the tradeoffs of the classical CAP and
PACELC theorems [1, 22] between availability and consistency, which require
trading one for the other in geo-replicated systems, where network partitions are
often unavoidable.
In this work, we shed light on the feasibility of BFT/blockchain protocols
to fog/edge computing from a data management perspective. Our aim is to explore
(in Sect. 4) the available BFT approaches and analyze their application to fog/edge
computing considering the different security, consistency, and availability tradeoffs.
While we do not intend to do an exhaustive survey of existing techniques, we try to
ease the understanding of the existing synergies among different areas like fog/edge
computing, blockchain, security, and data management. Our study addresses three
different tradeoff approaches:
1. The first is using the Strong Consistency (SC) model as those Byzantine fault
tolerant State-Machine Replication (SMR) protocols [8, 28, 76]. In general, these
solutions are not effective in the geo-replicated fog/edge setting due to their high
latency on both read and write operations [71].
2. The second is an Eventual Consistency (EC) approach that makes use of
blockchain protocols in a peer-to-peer fashion while using a variant of proof-of-
something protocols (e.g., Proof-of-Work, Proof-of-Stake, etc.) [15, 20, 34, 45].
Such protocols provide a low response time on read (although stale) operations,
but they are blocking on write operations.
3. The third approach is based on Strong Eventual Consistency (SEC) [58, 62, 79],
which allows (stale) reads as well as immediate writes. This works as long as
operations are designed to be commutative, provided with a conflict-resolution
technique as in Conflict-free Replicated Data Types or Cloud types [6, 59]; a
minimal sketch of such a commutative datatype follows this list. This approach
has recently gained great adoption in industry in the fault-recovery model, which
has encouraged the development of solutions in the Byzantine fault model as
well [62, 79].
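As a concrete example of the commutativity SEC relies on, below is a minimal
grow-only counter (G-Counter), one of the simplest CRDTs, sketched in Python;
the node identifiers are illustrative.

    class GCounter:
        # Grow-only counter CRDT: writes are immediate and local, reads may
        # be stale, and replicas converge via a commutative, idempotent merge.
        def __init__(self, node_id):
            self.node_id = node_id
            self.counts = {}                 # node_id -> increments seen there

        def increment(self, n=1):            # immediate local write
            self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

        def value(self):                     # possibly stale read
            return sum(self.counts.values())

        def merge(self, other):              # per-node maximum: order-insensitive
            for node, c in other.counts.items():
                self.counts[node] = max(self.counts.get(node, 0), c)

    # Two edge replicas update independently, then converge after merging.
    a, b = GCounter("edge-1"), GCounter("edge-2")
    a.increment(); b.increment(2)
    a.merge(b); b.merge(a)
    assert a.value() == b.value() == 3

Because merges commute, no agreement on operation order is needed, which is
what makes this model attractive for latency-sensitive fog/edge replication.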
Our findings, presented in Sect. 5, demonstrate that addressing Byzantine behav-
iors in fog/edge computing is in its infancy. Despite the recent development of
BFT/blockchain solutions, they are yet to meet the needs of latency-sensitive
fog/edge applications. Interestingly, there is significant overlap in different aspects
of these technologies and their application requirements which encourages further
research in this direction.
Fog/edge computing and blockchain emerged in the last decade as two distinct tech-
nologies with different purposes [48, 57]. However, they currently share common
fundamentals and applications. We first overview these technologies and their
definitions.
point of failure. In addition, access to remote data beyond the local edge domain
becomes more challenging since edge nodes and peers are often less protected or
equipped compared to cloud data centers.
The Blockchain was first proposed as an underlying layer for the Bitcoin cryptocur-
rency [48]. It proposed a distributed peer-to-peer infrastructure that implemented
distributed consensus in untrusted environments, based on cryptography and heavy
computation. Any two untrusted peers can transact over the network that guarantees
total order on transactions and immutability via a Proof-of-Work (PoW) leader
election protocol: the peer that solves a cryptographic puzzle [16] is chosen as leader
and has the privilege to commit a set of transactions in a block (in the blockchain).
This has later progressed in three interesting directions.
The first is abstracting the blockchain as an infrastructure in untrusted environ-
ments for applications where a trusted centralized authority cannot be assumed or
guaranteed [7, 37, 73]. This gave rise to a plethora of applications in Fintech, supply
chain, social networks, industry, IoT, economy, etc.
The second direction is the observation that the blockchain security model is
equivalent to the Byzantine model that targets malicious or arbitrary behaviors, and
has been studied for three decades in academia [8, 36, 42]. This suggested new
sustainable alternatives to the PoW protocols that are criticized for being extremely energy-consuming.
Despite growing in two distinct areas, the fog/edge computing and blockchain tech-
nologies intersect in several aspects. For instance, they both (1) embrace distribution
or decentralization as a means to boost the availability of the deployed service;
and (2) they try to reduce the latency on applications by locating computation and
storage closer to the data source or user.
More interestingly, blockchain can be complementary to fog/edge computing in
security [33, 80]: Although keeping the data close to its source in edge computing
avoids vulnerabilities outside the local edge domain, it makes the edge domain a single point of failure that is prone to malicious attacks. This is common in edge applications
where nodes are fat edge nodes with abundant resources that can empower
1 https://www.bbc.com/news/technology-48853230.
The benefits of fog/edge computing come at the price of new challenges at the
data management level. Indeed, bringing storage and computation close to the
data source, e.g., smart things, is deemed efficient as long as the data queried or
updated is recent and local. Once access to remotely geolocated data sources or more
historical data is sought, requests (Writes and Reads) are either offloaded close to the
data storage or the data is pulled to close-by edge nodes. This is at odds with
the low latency and autonomy requirements of fog/edge applications. The problem
is further aggravated in the presence of faults and Byzantine behaviors or malicious
attacks where an agreement is usually required to maintain a replicated state and
thus exclude such non-benign behaviors.
These challenges suggest some tradeoffs specified by the CAP theorem [22]: one
has to trade availability for consistency or vice versa, given that network partitions cannot be prevented in geo-replicated systems. Consequently, there are three main
data consistency approaches that are usually followed in such systems:
1. Strong Consistency: this trades availability for consistency through ensuring
atomic writes and reads on data before delivering to the application or end
user. This is often implemented through a State-Machine-Replication (SMR)
consensus protocol, like Paxos or Raft [40, 49] under crash-recovery faults, and
PBFT [8] under Byzantine faults.
2. Eventual Consistency: this trades consistency for availability to ensure low
response time (mainly on reads) and high autonomy of fog/edge applications [63,
70]. However, these approaches suffer from finality delays or rollbacks on writes,
which cannot be afforded in most fog/edge applications. Blockchain’s proof-
of-something (PoX), e.g., proof-of-stake and proof-of-work [15, 20, 34] or a
combination between them, is the prominent approach here.
3. Strong Eventual Consistency: this allows immediate (although possibly stale) reads as well as immediate writes, as long as operations are commutative and a conflict-resolution technique ensures convergence, as in Conflict-free Replicated DataTypes or Cloud Types [6, 59].
Safety in such systems requires the system to satisfy linearizability (or serializ-
ability), i.e., execute operations atomically one at a time [42]. In particular, this
ensures that data invariants in the service cannot be broken by Byzantine clients
or replicas. Safety is usually maintained despite network partitions, and thus the protocols run under partial synchrony: there is an unknown bound on message delays after which the system eventually synchronizes [8]. Liveness on the other hand means that clients
eventually receive replies to their requests, provided that at most one third of the
replicas is Byzantine. Due to the impossibility of implementing consensus in an
asynchronous system [17], such systems rely on a weak form of synchrony, i.e.,
partial synchrony [8], to maintain liveness.
Read and write quorums are chosen to intersect in more than f replicas:

|RQ ∩ WQ| > f

[Figure: message flow of a BFT-SMR protocol between a client C and four replicas (replica 0 to replica 3).]
This guarantees that despite network partitions and up to f Byzantine nodes, the
system can still make progress and commit requests.
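To make this quorum arithmetic concrete, the following minimal Python sketch (illustrative only; the function name and structure are ours, not part of any cited protocol) derives the standard BFT-SMR parameters for a given number f of tolerated Byzantine replicas:

# Illustrative quorum arithmetic for BFT-SMR with n = 3f + 1 replicas.
def bft_parameters(f: int):
    n = 3 * f + 1             # replicas needed to tolerate f Byzantine ones
    quorum = 2 * f + 1        # read/write quorum size
    overlap = 2 * quorum - n  # minimum intersection of any two quorums
    assert overlap >= f + 1   # so every intersection holds a correct replica
    return n, quorum, overlap

# Tolerating f = 1 gives the classical 4-replica setting with quorums of 3
# that pairwise intersect in at least 2 replicas (hence at least 1 correct).
print(bft_parameters(1))  # (4, 3, 2)

For f = 1 this yields the classical four-replica PBFT deployment with quorums of size three.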
The Practical Byzantine Fault Tolerance (PBFT) protocol is the seminal BFT protocol from which its successors draw inspiration. Before the blockchain era, this protocol was considered very costly due to its high latency, low throughput, and limited fault-scalability (practical only up to f = 5). Consequently, several optimizations have been introduced in other protocols such as Zyzzyva, Q/U, HQ, and UpRight [2, 10, 12, 36].
After the introduction of blockchain’s Proof-of-Work protocol variants, known to
be heavily computation-hungry, PBFT is now referred to as a sustainable protocol.
This urged another round of optimizations that led to modern protocols like Tendermint, HyperLedger, HotStuff, SBFT, FBFT [3, 7, 25, 37, 44], etc. Although these protocols managed to improve on the latency, throughput, and scalability of PBFT, they could not completely avoid the synchronization overhead of consensus [28, 71]. In particular, the best-case latency where no Byzantine faults exist can range from one messaging round-trip (e.g., SBFT) to three (e.g., HotStuff). On the other hand,
Tendermint and HotStuff make use of primary node rotation (which makes fair use
of primary node features), and thus optimize for the view-change protocol. Finally,
HyperLedger and FBFT try to divide the load over federated groups or channels to
improve the scalability of the system, but they are backed by protocols similar to the
previous ones.
The system model in this category assumes that edge nodes are peers connected
through a peer-to-peer (P2P) network. Nodes communicate through messages that are propagated via a gossip protocol in an asynchronous multi-hop manner. Membership can be private or public: whereas in the former case nodes' identities are known a priori, in the latter nodes have to establish Sybil-resistant identities (i.e., public/private keys) [48, 73].
A standard method is to solve a computationally-hard puzzle on their locally-
generated identities (i.e., public keys) verified by all other non-Byzantine nodes.
All messages sent in the network are authenticated with the sender’s private key
to prevent spoofing. (To avoid repetition, all assumptions on the network and
cryptographic techniques in the BFT-SMR case hold here as well.) To improve the
scalability, the system can be sharded in such a way that several shards can commit transactions in parallel. Another possibility is to arrange nodes in committees to
assist in committing atomic transactions. Nodes can be fat or thin nodes. Thin nodes
that cannot afford retaining the entire state can hold a cache, and request the missing
data from a fat node.
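As an illustration of such authenticated messaging, the following Python sketch uses Ed25519 signatures via the third-party cryptography package (our choice for illustration; deployed systems may use other schemes), where a node's public key doubles as its network identity:

# Sketch: node identities are public keys; every gossiped message is
# signed with the sender's private key so peers can reject spoofed traffic.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

sk = Ed25519PrivateKey.generate()   # kept secret by the node
node_identity = sk.public_key()     # doubles as the node's identity

msg = b"tx: A -> B: 5"
sig = sk.sign(msg)                  # attached to the gossiped message

# Receivers verify against the sender's identity; verify() raises
# cryptography.exceptions.InvalidSignature on altered or spoofed messages.
node_identity.verify(sig, msg)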
The safety properties of BFT-P2P are often more relaxed than in BFT-SMR.
In particular, serializability is only enforced eventually. This means there is an
instability period after the execution of a transaction and before it is confirmed. The
guarantee here is that all confirmed transactions are totally-ordered. This however
only holds for update requests. Read requests are delivered locally or requested from
a remote fat node in the case of a cache miss. Importantly, concurrent writes are not observed by reads until they get confirmed (which reduces freshness).
On the other hand, the liveness property requires transactions to be eventually
committed. This necessitates the successful election of a leader which is the core
of such systems as explained next. In general, despite this guarantee, such systems
suffer from finality delays due to the starvation caused by concurrent transactions at
scale.
The BFT-P2P approach optimizes for scalability and decentralization. It allows for
thousands to millions of nodes across the globe to participate in the system through
asynchronous messaging via a P2P gossip protocol.
In particular, any node can issue write requests in transactions that are eventually
confirmed by the system. Confirmation is crucial to maintain a consistent state
through ensuring total order of write operations. Since consensus is not practical
at this (up to millions) scale, as discussed in BFT-SMR, this approach relies on a
single leader at a time to confirm requests. For security and availability reasons, this
requires a fair (e.g., random or contribution-based) leader election technique. The
uniqueness property of leader election ensures that no concurrent confirmations are
happening at a time.
Another scalability technique in such systems is aggregating transactions (of requests) into coarse-grained batches, called blocks. In its epoch, a leader can confirm one block instead of a single request or transaction. The purpose is to reduce the overhead of the (costly) leader election process by amortizing it over the larger number of requests confirmed in each epoch. The ultimate goal is to reduce the finality delay of the system, which is sometimes orders of magnitude higher than in BFT-SMR.
A security requirement in blockchain applications is to ensure the immutability of confirmed blocks against a strong adversary (e.g., one that controls half of the network). To this end, confirmed blocks are linked in a chain of blocks, i.e., a blockchain, by including the reference (i.e., a hash digest) of the last globally committed block in a newly committed block. This makes it extremely hard for an adversary to revert confirmed blocks (unless it controls more than 50% of the network).
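The chaining mechanism itself is simple, as the following Python sketch illustrates (our toy construction; SHA-256 is chosen for concreteness, whereas, e.g., Bitcoin hashes block headers with double SHA-256):

# Sketch: immutability via hash-linking blocks into a chain.
import hashlib, json

def block_digest(block: dict) -> str:
    # Canonical serialization so all peers compute the same digest.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, txs: list) -> None:
    prev = block_digest(chain[-1]) if chain else "0" * 64  # genesis reference
    chain.append({"prev": prev, "txs": txs})

chain: list = []
append_block(chain, ["A -> B: 5"])
append_block(chain, ["B -> C: 2"])

# Tampering with an old block changes its digest and breaks the link kept
# in every later block, so rewriting history means re-confirming the whole
# suffix of the chain (and winning the leader elections to do so).
chain[0]["txs"] = ["A -> B: 500"]
assert chain[1]["prev"] != block_digest(chain[0])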
Leader election in a BFT-P2P system is a daunting task due to the huge number of peers in the decentralized, likely globally-scaled, system. Consequently, choosing a leader through atomic broadcast is impractical. There are two main approaches to leader election, i.e., via Proof-of-Something (PoX) or Committees. Proof-of-Something is a generalization of Proof-of-Work (PoW), originally used for leader election in Bitcoin's underlying blockchain [48] (a process called mining).
In PoW, leader election is a competition through which peers try to solve an
extremely hard-to-solve cryptographic puzzle [16]: to find a salt that when hashed
with a digest of the last block in the chain satisfies a publicly known difficulty property (e.g., a given number of leading zeros). The salt stands as a PoW for
the node to confirm a new block (i.e., implicitly chosen as leader for that epoch).
This technique boosts the security since it is arguably very hard to control 51% of
the computing power of the network [72]. Solving the puzzle is also key to reduce
Sybil attacks [14] in public settings. For these reasons, PoW is currently advocated
for scenarios that require high decentralization and security (e.g., Cryptocurrencies).
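The following toy Python miner illustrates this competition, with a simplified difficulty rule of a fixed number of leading zero hex digits (the parameters are ours and far below real-network difficulty):

# Toy Proof-of-Work: find a salt whose hash together with the last
# block's digest meets a public difficulty target.
import hashlib
from itertools import count

def mine(last_digest: str, difficulty: int = 4) -> int:
    target = "0" * difficulty  # e.g., 4 leading zero hex digits (toy value)
    for salt in count():       # brute-force search: this is the "work"
        h = hashlib.sha256(f"{last_digest}{salt}".encode()).hexdigest()
        if h.startswith(target):
            return salt        # the proof, implicitly electing the miner

salt = mine("digest-of-last-confirmed-block")

Verifying a claimed salt costs a single hash, which is what makes the proof expensive to produce yet cheap for every peer to check.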
Unfortunately, the PoW approach is known to be extremely energy-consuming
and not green-friendly [60, 68]. This encouraged the research for several less-costly
alternatives like Proof-of-Stake [34] that is biased to peers that contribute more
to the network. Although PoS is less secure than PoW, it has many benefits over
PoW, such as faster processing of transactions and lower energy consumption. In addition to PoS, there is Proof-of-Elapsed-Time (PoET), which relies on a random waiting-time protocol running in a trusted execution environment such as Intel's Software Guard Extensions (SGX), among many others [4, 11]. However, it is argued that none of the new alternatives can achieve the security level of PoW. Regardless of the PoX method used, these protocols are often criticized for having finality delays.
In particular, providing a PoX incurs some waiting time by design, ranging from seconds to minutes [48, 73]. This directly impacts write requests, which cannot be committed with the low latency required in most fog/edge applications.
An alternative leader election method is to use BFT-SMR protocols within peers
organized in Committees [37]. A committee is a group of elected peers that have
some extra privileges. Since the committee size is usually orders of magnitude
smaller than the entire network size, it is possible to run a consensus protocol.
The committee that runs the consensus protocols can thus choose a leader and
commit transactions. This solution reduces the fairness in the system, depending on the committee selection technique, and diminishes the decentralization that is sought in several applications. On the other hand, this technique reduces the finality delay of transactions, although it is at best as good as the BFT-SMR protocol used within the committee.
Common BFT-P2P protocols are mainly those developed for blockchain applications, all inspired by the Bitcoin blockchain that proposed the Proof-of-Work approach [48]. However, Bitcoin's protocol only considered cryptocurrency applications. The blockchain protocol used in Ethereum [73] is similar to that of Bitcoin; however, it generalizes the blockchain to any Turing-machine-style application, among them supply chains, decentralized organizations, etc. These two protocols
imposed a tuned latency on issuing new blocks (on average 10 min in Bitcoin
and several seconds in Ethereum) and required extensive computation power
(e.g., equivalent to those of complete countries [27]). The underlying Blockchain
protocols of both Bitcoin and Ethereum are BFT-P2P. These protocols, however, do not use classical BFT-SMR consensus to reach agreement. They follow an asynchronous majority agreement approach combined with Game Theory [47] in PoW to decide on a leader that confirms transactions (in a block).
These protocols can maintain security if up to half of the network nodes are
Byzantine [72].
Several protocols have been proposed since then to address these caveats. Among them, the Proof-of-Stake variants suggested freezing a stake (i.e., an amount of cryptocoins) for an amount of time in exchange for the chance of confirming a block [34]. Other Proof-of-Something protocols, like Proof-of-Storage, Proof-of-Stake-Velocity, Proof-of-Credit, and Proof-of-Exercise [60], followed the same concept without requiring solving a crypto-puzzle. Proof-of-Elapsed-Time (PoET) proposed replacing the puzzle with a time delay provided by a protocol running in a trusted execution environment like Intel's Software Guard Extensions (SGX), among many others [9, 11].
Alternative protocols based on BFT are considered hybrids of PoX and scalable versions of PBFT (or its variants discussed in Sect. 4.2). For instance, Tendermint extends the PBFT protocol with a Proof-of-Stake approach, i.e., by giving weights to the peers' votes. Algorand [21] uses a Pure Proof-of-Stake (PPoS) consensus protocol built on Byzantine Agreement (BA) similar to
PBFT. Each block is approved by a unique committee of nodes that are randomly
selected by a lottery-based verifiable random function based on public information
from the blockchain.
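A rough Python sketch of such a hash-based lottery follows (our simplification: real designs such as Algorand's use a verifiable random function, which additionally lets a winner prove its ticket without revealing its private key):

# Toy committee lottery: each node derives a ticket from a public random
# seed and its identity; the lowest tickets form the committee.
import hashlib

def ticket(seed: str, node_id: str) -> int:
    # Pseudo-random but deterministic: every peer derives the same tickets.
    return int(hashlib.sha256(f"{seed}:{node_id}".encode()).hexdigest(), 16)

def elect_committee(seed: str, nodes: list, size: int) -> list:
    # The nodes holding the smallest tickets form the committee.
    return sorted(nodes, key=lambda n: ticket(seed, n))[:size]

nodes = [f"node-{i}" for i in range(100)]
committee = elect_committee("beacon-output-epoch-42", nodes, size=5)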
A similar randomness approach used by Dfinity [29] is to have a decen-
tralized randomness beacon to produce random outputs as the seed. Similarly,
Omniledger [35] used a committee selection process in a similar fashion as the
lottery algorithm of Algorand, in addition to sharding: splitting the overheads of processing transactions among groups of nodes, called shards. In a similar fashion,
Elastico [43] uses shards processed by smaller BFT-based committees whose
number grows near linearly in the total computational power of the network.
All of these protocols are considered less secure than PoW-based methods as they can only tolerate up to one third of Byzantine nodes, in addition to sometimes being vulnerable to DoS attacks. The advantage is that their finality latency, e.g., a few seconds, is often less than that of PoX-based protocols.
BFT-P2P protocols are geared for large numbers of peers communicating through
gossip protocols. This limits their finality speed and thus imposes high latency
on write requests. In this paradigm, applications must wait some time (seconds or minutes) to make sure a request has been confirmed.
Although this can be acceptable in some fog/edge applications that aim at reducing
the overhead on cloud datacenters, it is not feasible for fog/edge applications where
timely responses are crucial.
In addition, some PoW variants allow for forks that are only resolved later, by
rolling back confirmed blocks. This can have serious implications on some fog
applications where actuators are controlled based on this data. These solutions are also very expensive to operate due to their extensive energy consumption, and energy is especially scarce in thin edge networks.
Nevertheless, these protocols are typical for cryptocurrencies due to their high
scalability in public permissionless settings.
This approach assumes a system of edge nodes that are loosely coupled in an asyn-
chronous network. Nodes communicate through messaging via a reliable broadcast
or gossip P2P protocol. The system assumes that at most f fog/edge servers out of
3f + 1 can be Byzantine. In addition, Strong Eventual Consistency (SEC) [58]
is assumed on data: to reduce response time, edge nodes can serve (concurrent)
read and write requests of clients immediately, and they eventually synchronize
their states in the background. Therefore, a client's read request retrieves the value of the locally observed state, whereas write requests may yield conflicts with remote nodes. This has two implications at the data level: client applications are
assumed to tolerate stale reads, and employ a conflict-resolution technique to
ensure convergence. Examples are the use of Last Writer Wins in Cassandra [38],
Cloud Types [6], or Conflict-free Replicated DataTypes where operations are
assumed/made commutative [59].
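As a concrete example of such a commutative design, the following Python sketch implements a standard state-based grow-only counter (G-Counter) CRDT [59] (the code itself is our illustration): it converges because its merge is a pointwise maximum, which is commutative, associative, and idempotent.

# Sketch of a state-based grow-only counter (G-Counter) CRDT.
class GCounter:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: dict = {}  # per-replica increment totals

    def increment(self, n: int = 1) -> None:
        # A local write: immediate, no coordination with other replicas.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Pointwise max is commutative, associative, and idempotent, so
        # replicas converge regardless of the gossip order of states.
        for nid, c in other.counts.items():
            self.counts[nid] = max(self.counts.get(nid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("edge-A"), GCounter("edge-B")
a.increment(3); b.increment(2)      # concurrent local writes
a.merge(b); b.merge(a)
assert a.value() == b.value() == 5  # convergence without consensus

Note that this sketch only handles benign concurrency; tolerating Byzantine replicas that lie about their state requires the additional mechanisms discussed next [62, 79].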
Some hybrid variants adopt a two-tier structure where a separate BFT cluster is
used in the backend. In this case, the same assumptions used in Sect. 4.1 hold. In addition, the cryptographic assumptions made in Sect. 4.1 are also assumed here. Although in the SEC model [58] concurrent writes can lead to divergence, a conflict-resolution technique can ensure system convergence under benign faults.
Regarding the safety property, the system requires that all resulting states generated
at all correct edge/fog servers are eventually equivalent. This assumes a commutativity property of the used datatypes, which leads to an equivalent state upon executing the same log of operations [79]. Furthermore, the fog/edge nodes' agreement on that
stable state is required to be final.
As for liveness, all correct fog/edge nodes should always agree on a stable state once operations are delivered to them. This is often guaranteed by having the distinct
write quorums intersect in more than 2f + 1 nodes despite network partitions.
For the BFT cluster side, when used, liveness and safety properties follow the
same rules described in the BFT-SMR case in Sect. 4.1.
[Figure: a two-tier architecture in which edge servers S1 to S4 disseminate their logs via reliable broadcast (RBcst) and, through proxies Proxy1 to Proxy4, submit the entire log to a backend BFT cluster (B1 to B4) that outputs a certified log.]
Fog/edge computing is a promising model that is optimized for data- and latency-
sensitive applications as it keeps computation and data close within the secure edge
domain, i.e., close to data source and user-end [50, 57]. This mitigates the potential
threats that otherwise arise along the way to the cloud center. It however raises new
challenges on the availability of the system against malicious attacks that can exploit
this form of isolation.
More recently, fog/edge computing systems are becoming more distributed [50, 61], which requires secure data management in untrusted environments where different edge nodes belong to untrusted owners or domains. We therefore believe that making fog/edge systems more resilient to Byzantine behaviors is of high relevance and worth further research.
In this work, we analyzed the relevance of Blockchain or Byzantine resilient
solutions to fog/edge computing focusing on the availability and consistency
tradeoffs raised by the CAP theorem in scalable and available systems [22]. Our goal is to make it easier for researchers in the different areas to build synergies
between these topics. A summary of the main characteristics and differences
between these approaches is presented in Table 1.
This study led to the following findings and conclusions:
• Blockchain and BFT protocols have the potential to bridge the distributed
fog/edge systems gap in untrusted environments. This is due to the commonalities in fog/edge and blockchain architectures and applications (as discussed in Sects. 2 and 3).
• Blockchain solutions are still too immature to be used in fog/edge computing due to
their limitations on finality latency. However, there are promising advancements
that encourage further research.
• BFT-SMR protocols are strongly consistent solutions that have limited scalability and high latency on reads and writes. This is not suitable for many latency-sensitive fog/edge applications. However, applications that can afford a latency of up to one second, e.g., with no use of actuators, may benefit from these solutions to improve their tolerance to Byzantine attacks.
• BFT-P2P PoX protocols are eventually consistent solutions that have high finality latency on writes (seconds or minutes), which is unacceptable in most fog/edge applications.
• Hybrid BFT-P2P (PoX-BFT-based) protocols are eventually consistent solutions that have the potential to support scalable fog/edge applications where a finality latency of less than a second is acceptable. We encourage more research in this direction.
Table 1 Comparison of the three different Byzantine-resilient data management approaches for
fog/edge
Approach | Considered solutions | Consistency | Latency | Security | Fog/edge compatibility
BFT-SMR | PBFT, Zyzzyva, Q/U, HQ, UpRight, BFTSMaRt, HotStuff, SBFT, FBFT... | Strong | Medium | Medium | Low
BFT-P2P | PoW, PoS, DPoS, PoE, PoET, PoSV, SGX, PPoS... | Eventual | High | High | Medium
BFT-SEC | OBFT, ByzEc | Strong eventual | Low | Medium | Medium
Acknowledgments The research leading to these results has received funding support from
the project “NORTE-06-3559-FSE-000046—Emprego altamente qualificado nas empresas—
Contratação de Recursos Humanos Altamente Qualificados (PME ou CoLAB)”, financed by the
Norte’s Regional Operational Programme (NORTE 2020) through the European Social Fund
(ESF).
References
1. Abadi, D.: Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. Computer 45(2), 37–42 (2012)
2. Abd-El-Malek, M., Ganger, G.R., Goodson, G.R., Reiter, M.K., Wylie, J.J.: Fault-scalable
byzantine fault-tolerant services. ACM SIGOPS Oper. Syst. Rev. 39(5), 59–74 (2005)
3. Abraham, I., Malkhi, D., Nayak, K., Ren, L., Yin, M.: Sync HotStuff: simple and practical
synchronous state machine replication
4. Aissi, S.: Method and apparatus for secure application execution. US Patent 9,317,689, 19 Apr
2016
5. Bonomi, F., Milito, R., Zhu, J., Addepalli, S.: Fog computing and its role in the internet
of things. In: Proceedings of the First Edition of the MCC Workshop on Mobile Cloud
Computing, pp. 13–16 (2012)
6. Burckhardt, S., Gotsman, A., Yang, H., Zawirski, M.: Replicated data types: specification,
verification, optimality. In: ACM Sigplan Not., vol. 49, pp. 271–284. ACM, New York (2014)
7. Cachin, C.: Architecture of the hyperledger blockchain fabric. In: Workshop on Distributed
Cryptocurrencies and Consensus Ledgers, vol. 310 (2016)
8. Castro, M., Liskov, B.: Practical Byzantine fault tolerance and proactive recovery. ACM Trans.
Comput. Syst. 20(4), 398–461 (2002)
9. Chen, L., Xu, L., Shah, N., Gao, Z., Lu, Y., Shi, W.: On security analysis of proof-of-elapsed-
time (PoET). In: International Symposium on Stabilization, Safety, and Security of Distributed
Systems, pp. 282–297. Springer, Berlin (2017)
10. Clement, A., Kapritsos, M., Lee, S., Wang, Y., Alvisi, L., Dahlin, M., Riche, T.: Upright
cluster services. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems
Principles, pp. 277–290. ACM, New York (2009)
11. Costan, V., Devadas, S.: Intel SGX explained. IACR Cryptol. ePrint Arch. 2016(086), 1–118
(2016)
12. Cowling, J., Myers, D., Liskov, B., Rodrigues, R., Shrira, L.: HQ replication: a hybrid quorum
protocol for Byzantine fault tolerance. In: Proceedings of the 7th symposium on Operating
Systems Design and Implementation, pp. 177–190 (2006)
13. Da Xu, L., He, W., Li, S.: Internet of things in industries: a survey. IEEE Trans. Ind. Inform.
10(4), 2233–2243 (2014)
14. Douceur, J.R.: The Sybil attack. In: International Workshop on Peer-to-Peer Systems, pp. 251–
260. Springer, Berlin (2002)
15. Duong, T., Fan, L., Zhou, H.-S.: 2-hop blockchain: combining proof-of-work and proof-of-
stake securely. Cryptol. ePrint Arch. Report 2016/716 (2016)