ADAM-DPGAN: a differential private mechanism for generative adversarial network
https://doi.org/10.1007/s10489-022-03902-9
Abstract
Privacy preserving data release is a major concern of many data mining applications. Using Generative Adversarial
Networks (GANs) to generate an unlimited number of synthetic samples is a popular replacement for data sharing. However,
GAN models are known to implicitly memorize details of sensitive data used for training. To this end, this paper proposes
ADAM-DPGAN, which guarantees differential privacy of training data for GAN models. ADAM-DPGAN specifies the
maximum effect of each sensitive training record on the model parameters at each step of the learning procedure when
the Adam optimizer is used, and adds appropriate noise to the parameters during the training procedure. ADAM-DPGAN
leverages the Rényi differential privacy accountant to track the spent privacy budget. In contrast to prior work, by accurately
determining the effect of each training record, this method can perturb the parameters more precisely and generate higher-quality
outputs, while provably preserving the convergence properties of its non-private GAN counterpart.
Through experimental evaluations on different image datasets, the ADAM-DPGAN is compared to previous methods and
the superiority of the ADAM-DPGAN over the previous methods is demonstrated in terms of visual quality, realism and
diversity of generated samples, convergence of training, and resistance to membership inference attacks.
Keywords Differential privacy · Generative adversarial network · Deep learning · Information leakage
Different attacks [4–6] are conducted against these models, which can infer information about training datasets. Therefore, it is highly desirable to incorporate privacy mechanisms into GANs.

For this reason, studies [7–14] have recently been conducted on combining privacy mechanisms with GAN models. These studies generally fall into two categories. The first category focuses on applying a strong privacy standard, i.e. differential privacy, to GAN models [7–13], and the second category only defends against a particular attack [14]. Although the first category provides strong privacy, some of these approaches [7–9] suffer from problems such as low synthetic sample quality, a low convergence rate and the need to fine-tune hyper-parameters, while the others [10–13] provide privacy only for the generator network of the GAN model. On the other hand, the second category's defense mechanisms make assumptions about the adversary's knowledge and are not universal solutions. Therefore, the need for a solution that can provide differential privacy for both the discriminator and the generator, while generating high-quality synthetic samples without a low training convergence rate, is an open problem.

To this end, in this paper, ADAM-DPGAN, a differentially private method for GAN models, is proposed, which provides differential privacy for both the discriminator and the generator networks when the Adam optimizer [15] is used. In contrast to previous work, ADAM-DPGAN accurately determines the impact of each sensitive data sample on the model parameters at each step of the learning procedure and adds appropriate noise to the parameters to guarantee differential privacy of the training data samples without meaningfully impacting the quality of the final synthesized outputs. By specifying the maximum effect of each sample on the model parameters, the need for traditionally expensive computational operations such as per-example calculation of gradients [7, 8], changing the GAN architecture (e.g. additional neural network training) [9–12] and clipping the parameters/gradients during the training procedure [7, 8] is eliminated. As a result, there is no need for public data access to adaptively adjust hyper-parameters such as the clipping bound, as is done in [7]. Moreover, ADAM-DPGAN guarantees differential privacy for both the discriminator and the generator networks; unlike [14], it makes no assumptions about the attacker's knowledge, and unlike [10–13], it can be used when the discriminator network needs to be differentially private. ADAM-DPGAN is built upon the improved Wasserstein Generative Adversarial Network (WGAN) [16], but the idea can be used with any type of model, even discriminative models, that uses the Adam optimizer during training. To measure the privacy loss, ADAM-DPGAN leverages the Rényi differential privacy accountant [17] and can generate high-quality synthetic samples with reasonable privacy budgets.

Through experimental evaluations, ADAM-DPGAN is also compared with the GANobfuscator method [7], the GS-WGAN method [12] and the method of reference [13] on the MNIST, Fashion-MNIST and CelebA datasets. Experimental results demonstrate the superiority of ADAM-DPGAN over the prior methods in terms of visual quality, realism and diversity of generated samples, convergence of training and resistance to membership inference attacks. In summary, the contributions of this paper are as follows:

1. Determining the global sensitivity of the GAN model parameters in each training step when the Adam optimizer is used.
2. Presenting the ADAM-DPGAN algorithm, which uses the introduced sensitivity to guarantee differential privacy for GAN models.
3. Proving the differential privacy property of ADAM-DPGAN.
4. Evaluating ADAM-DPGAN under various image datasets and network structures and demonstrating its performance with reasonable privacy budgets.

The remainder of the paper is organized as follows. Section 2 reviews the related work. Section 3 introduces GAN and relevant concepts used in the paper. Section 4 presents the details of the proposed ADAM-DPGAN method. In Section 5, the proposed method is evaluated and, finally, in Section 6, the findings are summarized and the conclusion is presented.

2 Related work

This paper is mostly related to two strands of literature: first, the literature on attacks conducted against machine learning (ML) models to infer information about their training dataset; second, the literature on designing privacy preserving mechanisms for ML models. The following two subsections overview the related work in these areas.

2.1 Privacy attacks against ML models

Membership inference attacks and model inversion attacks are the two main attacks that infer information about training data from a trained model. In a model inversion attack, the attacker uses the trained model's output to derive the values of the training records' attributes. A model inversion attack against neural networks is introduced by [20], where the attacker finds the input that maximizes the returned classification confidence value. Later, Yang et al. [21] propose a model inversion attack under an adversarial setting where a second neural network is trained to reconstruct the sample based on the prediction vector.
In a membership inference attack, given an input record and access to the trained model, the attacker determines whether the record was in the model's training dataset or not. The first membership inference attack against neural network models was introduced by [22]. Yeom et al. [23] formulate the quantitative advantage of adversaries for membership inference attacks in terms of generalization error and influence. Sablayrolles et al. [24] exploit a probabilistic framework to derive an optimal strategy for membership inference attacks. Nasr et al. [25] present white-box membership inference attacks against deep neural networks.

The first membership inference attack against generative models was introduced by [4], in which the authors use the discriminator's output to learn statistical differences between training data members and non-members. Hilprecht et al. [5] conduct a membership inference attack using generated samples of the model, where a record with the largest number of the nearest generated samples is inferred as a member. Later, Chen et al. [6] extend Hilprecht's attack to the white-box setting, the partial black-box generator setting and the full black-box setting.

2.2 Privacy preserving mechanisms in ML models

Many approaches have been developed to protect ML models against privacy attacks. These approaches generally fall into two broad categories: 1) differentially private approaches and 2) empirical approaches. The first category provides a strong theoretical guarantee for privacy but low synthetic sample quality. In contrast, the second category cannot guarantee strict privacy; it empirically protects training data against a particular attack while imposing a negligible utility loss.

In the first category, the efforts are mostly based on gradient perturbation [7, 8, 10–12, 26], input perturbation [9, 27, 28] and objective perturbation [29].

Abadi et al. [26] propose gradient perturbation to protect against privacy leakage of training data in discriminative models. In their method, per-example gradients of the loss function are computed, the gradients are clipped based on clipping bounds, and then, to guarantee differential privacy, random noise is added to the clipped gradients. Abadi's method has also been adapted for GAN models, named DP-SGD GAN [7, 8]. GANobfuscator [7] uses gradient perturbation in the discriminator of the improved WGAN [16]. Torkzadehmahani et al. [8] exploit Abadi's method with a conditional GAN [30] to generate synthetic samples with corresponding labels in a differentially private manner. While the above methods provide a privacy guarantee, per-example calculation of gradients has a computational overhead [31, 32]. Moreover, clipping bounds have a considerable impact on the model's performance, and optimal bounds depend on many hyper-parameters such as the learning rate, the model's architecture and the training dynamics. Furthermore, the effect of per-example clipping is theoretically examined in [33], and it is confirmed that it causes slower convergence than its non-private counterpart. In GAN training, clipping the gradients and adding noise based on clipping bounds lead to lower convergence, training instability and low-quality synthetic samples. Although GANobfuscator exploits a small public dataset to tackle these problems, the availability of such public data is not a practical assumption for many applications.

Input perturbation is used by Papernot et al. [27] to provide differential privacy. In their method, which is called Private Aggregation of Teacher Ensembles (PATE), multiple teacher models are trained on disjoint partitions of the training data, and a differentially private student model is trained on public data labeled by noisy voting among all of the teachers. Scalable PATE [28] improves the utility of PATE with a Confident-GNMax aggregator. However, both PATE and Scalable PATE require unlabeled public data availability, and their aggregators can only be applied to categorical data. PATE-GAN [9] is a modified version of PATE. In PATE-GAN, K teacher-discriminators and one student-discriminator are trained, and the student-discriminator is trained on the generated synthetic samples labeled by the teachers. In PATE-GAN, the need for public data access has been resolved, but at the beginning of training the generator cannot generate enough samples labeled as real by the teachers, and as the training progresses, the opposite is true. So it seems the student-generator training procedure may fail to learn the real distribution without seeing real samples.

The PATE method has also been combined with gradient perturbation to guarantee differential privacy only for the generator. In G-PATE [10], aggregated information provided by teacher-discriminators is used to train a student-generator. DATALENS [11] improves the utility of G-PATE [10] with top-k gradient compression. GS-WGAN [12] is another method based on the combination of PATE with gradient perturbation. In this method, at each training step, a randomly selected discriminator and the generator update their parameters. To prevent information leakage from the selected teacher to the student-generator, Abadi's method [26] with the improved WGAN [16] is used, and the gradient clipping bound is set to one. Han et al. [13] propose another method which provides a privacy guarantee for the generator network. In this method, in each update of the generator parameters, the discriminator loss is clipped and appropriate noise is added to it. Ensuring privacy only for the generator is one of the shortcomings of these methods.

Objective perturbation is another method that falls in the first category. Phan et al. [29] exploit objective perturbation to guarantee differential privacy for a deep auto-encoder.
While their approach provides privacy for auto-encoder models, it cannot be generalized to other types of deep neural networks.

As described, the second category of defense mechanisms only focuses on mitigating a particular type of attack. Nasr et al. [34] introduce a privacy model that is robust against membership inference attacks with black-box access. To do this, they design a multi-objective learning algorithm with the goal of minimizing both the classification loss and the maximum gain of the membership inference attack. MemGuard [35] is another defense mechanism against black-box membership inference attacks, which adds a noise vector to the predicted classification confidence scores. Yang et al. [36] introduce a filtering framework to defend against model inversion and membership inference attacks. PrivGAN [14] is another empirical defense for GAN models which defends against membership inference attacks. The fact that these methods only defend against a particular attack is their shortcoming compared to the method presented in this paper.

3 Preliminaries

In this section, differential privacy [37–39] and related concepts are reviewed. Then GAN [3] and its variants are introduced; and finally, Adam optimization [15] is discussed.

3.1 Differential privacy

Differential privacy is a strong standard for quantifying an individual's privacy loss in algorithms on aggregated data. Differential privacy is defined using adjacent databases. In ML applications, adjacent databases refer to two training datasets that differ in one training record. Informally, a differentially private algorithm is a randomized algorithm whose output is nearly identical on adjacent datasets. The formal definition of differential privacy is presented below.

Definition 1 ((ε, δ)-Differential privacy [37]) A randomized mechanism $F : \mathcal{X} \rightarrow \mathcal{Y}$ complies with (ε, δ)-differential privacy if for any two neighboring inputs $X, X' \in \mathcal{X}$ and for any possible set of outputs $S \subseteq \mathcal{Y}$,

$$\Pr[F(X) \in S] \le e^{\epsilon}\,\Pr[F(X') \in S] + \delta \qquad (1)$$

Here, ε and δ are referred to as the privacy budget and the confidence parameter, respectively.

Post-processing is a useful feature of differential privacy [38]. It means that any function of the output of a differentially private algorithm does not invade privacy. Formally, if $F : \mathcal{X} \rightarrow \mathcal{Y}$ satisfies (ε, δ)-differential privacy and $G : \mathcal{Y} \rightarrow \mathcal{Y}$ is an arbitrary data-independent mapping, then $G \circ F : \mathcal{X} \rightarrow \mathcal{Y}$ also satisfies (ε, δ)-differential privacy.

The Gaussian noise-based mechanism is one of the common mechanisms that provides differential privacy for a real-valued function by adding Gaussian noise scaled to the sensitivity of the function. The sensitivity of a function $f$ (i.e. $S_f$) is the maximum distance between its outputs on two adjacent inputs. Formally, $S_f$ is defined as:

$$S_f = \max_{\text{adjacent } X, X'} \left| f(X) - f(X') \right| \qquad (2)$$

Regarding sensitivity $S_f$, the Gaussian noise-based mechanism is formulated as:

$$F(X) = f(X) + N(0, S_f^{2}\sigma^{2}) \qquad (3)$$

where $N(0, S_f^{2}\sigma^{2})$ is the added noise, randomly selected according to a normal distribution with mean 0 and standard deviation $S_f\sigma$. A Gaussian mechanism complies with (ε, δ)-differential privacy if $\sigma^{2} \ge 2\ln(1.25/\delta)/\epsilon^{2}$ [38].
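As a concrete illustration of Eqs. (2)-(3), the following minimal Python sketch releases a real-valued query through the Gaussian mechanism. It is an illustration only: the function and parameter names are ours, and the noise multiplier sigma is assumed to have been chosen to satisfy the condition above.

```python
import numpy as np

def gaussian_mechanism(f_value, sensitivity, sigma, rng=None):
    """Release f(X) with Gaussian noise scaled to the global sensitivity S_f (Eq. 3).

    `sigma` is the noise multiplier; the standard deviation of the added noise is
    sensitivity * sigma, matching N(0, S_f^2 * sigma^2) in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    f_value = np.asarray(f_value, dtype=float)
    return f_value + rng.normal(loc=0.0, scale=sensitivity * sigma, size=f_value.shape)

# Example: a counting query has sensitivity 1, since adjacent datasets differ in one record.
noisy_count = gaussian_mechanism(f_value=42.0, sensitivity=1.0, sigma=4.0)
```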
Composability is a feature of differential privacy that enables the combination of multiple differentially private mechanisms into one. However, a basic problem is tracking the overall privacy loss of the composite mechanism. For this purpose, several methods have been developed, including advanced composition theorems [39], the moments accountant method [26] and the Rényi Differential Privacy (RDP) accountant method [40]. The moments accountant [26] and the RDP accountant [40] work particularly well with uniformly subsampled Gaussian mechanisms, and RDP provides a more accurate bound for the privacy loss than the moments accountant [17]. The RDP accountant is defined in terms of the Rényi divergence.

Definition 2 (Rényi divergence [40]) The λ-order Rényi divergence between two distributions $P$ and $P'$ is stated as:

$$D_{\lambda}(P \,\|\, P') \triangleq \frac{1}{\lambda-1}\log\,\mathbb{E}_{x\sim P'}\!\left[\left(\frac{P(x)}{P'(x)}\right)^{\lambda}\right] = \frac{1}{\lambda-1}\log\,\mathbb{E}_{x\sim P}\!\left[\left(\frac{P(x)}{P'(x)}\right)^{\lambda-1}\right] \qquad (4)$$

Definition 3 (Rényi differential privacy [40]) A randomized mechanism $F$ is (λ, ε)-RDP with $\lambda \ge 1$ if, for any neighboring datasets $X$ and $X'$, the Rényi divergence between $F(X)$ and $F(X')$ satisfies:

$$D_{\lambda}\!\left(F(X) \,\|\, F(X')\right) = \frac{1}{\lambda-1}\log\,\mathbb{E}_{y\sim F(X)}\!\left[\left(\frac{\Pr[F(X)=y]}{\Pr[F(X')=y]}\right)^{\lambda-1}\right] \le \epsilon \qquad (5)$$
In this paper, we exploit the following two properties, which allow the composition of RDP mechanisms and the conversion of an RDP bound to an (ε, δ)-differential privacy bound, as proved by Mironov et al. [40].

Theorem 1 (RDP composition [40]) Let $F_1, \ldots, F_k$ be a sequence of (λ, ε_i)-RDP mechanisms; then the composition $F_k \circ \ldots \circ F_1$ guarantees $\left(\lambda, \sum_{i=1}^{k}\epsilon_i\right)$-RDP.

Theorem 2 (Conversion from RDP to (ε, δ)-DP [40]) If $F$ is a (λ, ε)-RDP mechanism, then for any $0 < \delta < 1$ it also satisfies $\left(\epsilon + \frac{\log(1/\delta)}{\lambda-1},\, \delta\right)$-differential privacy.

The GAN [3] architecture typically comprises two neural networks, a generator G and a discriminator D, in which G learns to map from a latent distribution $p_z$ to the true data distribution $p_{data}$, while D discriminates between instances sampled from $p_{data}$ and those generated by G. G's objective is to "deceive" D by synthesizing instances that appear to be from $p_{data}$. The training goal is formulated as

$$\min_{\theta_G}\max_{\theta_D}\; \mathbb{E}_{x\sim p_{data}}\!\left[\log\!\left(D_{\theta_D}(x)\right)\right] + \mathbb{E}_{z\sim p_z}\!\left[\log\!\left(1 - D_{\theta_D}(G_{\theta_G}(z))\right)\right] \qquad (6)$$

where $\theta_G$ and $\theta_D$ represent the parameters of the generator network and the discriminator network, respectively [3]. Figure 1 shows the GAN architecture.

Despite its simplicity, the original GAN formulation is unstable and inefficient to train. A number of subsequent studies propose new training procedures and network architectures to improve the stability and convergence rate. In particular, WGAN [41] and the improved training of WGANs [16] attempt to minimize the Earth Mover's distance between the synthesized distribution and the true distribution rather than the Jensen-Shannon divergence of the original GAN formulation. The objective function of the improved WGAN [16] is given by:

$$\min_{\theta_G}\max_{\theta_D}\; \mathbb{E}_{x\sim p_{data}}\!\left[D_{\theta_D}(x)\right] - \mathbb{E}_{z\sim p_z}\!\left[D_{\theta_D}(G_{\theta_G}(z))\right] + \lambda\left(\left\|\nabla_{\tilde{x}} D_{\theta_D}(\tilde{x})\right\|_2 - 1\right)^{2} \qquad (7)$$
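The gradient-penalty term of Eq. (7) is commonly evaluated at random interpolates x̃ between real and generated samples. The following PyTorch-style sketch shows that standard construction; it is a generic illustration under our own naming (discriminator `D`, penalty weight `lam`, image-shaped batches), not the authors' implementation.

```python
import torch

def gradient_penalty(D, real, fake, lam=10.0):
    """Gradient penalty of Eq. (7): lam * (||grad_xhat D(xhat)||_2 - 1)^2,
    evaluated at random interpolates between real and generated samples."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)   # one mixing weight per sample
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    d_out = D(x_hat)
    grads, = torch.autograd.grad(outputs=d_out.sum(), inputs=x_hat, create_graph=True)
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lam * ((grad_norm - 1.0) ** 2).mean()
```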
Adam is a method that adapts the learning rate of each neural network weight using estimates of the first and second moments of the gradient. To estimate these moments, exponentially moving averages of the gradient and the squared gradient are computed on the current mini-batch [15], denoted by $m_t$ and $v_t$:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t \qquad (8)$$

$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^{2} \qquad (9)$$

where $\beta_1, \beta_2 \in [0, 1)$. Since the moment estimators are biased toward zero, Adam uses the bias-corrected estimates $\hat{m}_t$ and $\hat{v}_t$ defined in (10) and (11):

$$\hat{m}_t = \frac{m_t}{1-\beta_1^{t}} \qquad (10)$$

$$\hat{v}_t = \frac{v_t}{1-\beta_2^{t}} \qquad (11)$$

Using these moving averages, Adam's parameters are updated through (12):

$$w_t = w_{t-1} - \alpha\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\gamma} \qquad (12)$$

where $\gamma$ is a small constant for ensuring numerical stability and $\alpha$ is the stepsize [15].
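For reference, Eqs. (8)-(12) correspond to the following minimal NumPy sketch of a single Adam step (a generic illustration with the usual default hyper-parameters, not the paper's training code):

```python
import numpy as np

def adam_step(w, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, gamma=1e-8):
    """One Adam update following Eqs. (8)-(12).

    w: parameters, g: current mini-batch gradient, m/v: moment estimates
    from the previous step, t: 1-based step counter.
    """
    m = beta1 * m + (1.0 - beta1) * g            # Eq. (8)
    v = beta2 * v + (1.0 - beta2) * g ** 2       # Eq. (9)
    m_hat = m / (1.0 - beta1 ** t)               # Eq. (10), bias correction
    v_hat = v / (1.0 - beta2 ** t)               # Eq. (11)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + gamma)   # Eq. (12)
    return w, m, v
```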
4 Proposed method

At each step, the accumulated privacy loss is tracked (line 8 of Algorithm 1), as explained in detail in Section 4.2. This process is iterated $I_d$ times (the number of discriminator iterations per generator iteration) and then the generator parameters are updated (lines 10-12). The optimization procedure is repeated until convergence.

Intuitively, ADAM-DPGAN preserves the privacy of the sensitive training data for the following reasons: (1) since in each training step Gaussian noise calibrated to the global sensitivity of the parameters is added to the discriminator parameters, the discriminator parameters preserve the privacy of the sensitive data; (2) since the generator parameters are updated based on the discriminator parameters, which preserve the privacy of the sensitive data, using the post-processing property of differential privacy it can be concluded that in each step of training the generator parameters also preserve the privacy of the sensitive data; (3) since at each step of training the discriminator and the generator preserve the privacy of the sensitive real data, using the composability property of differential privacy it can be proved that the final trained model also preserves the privacy of the sensitive data. Formal privacy analysis is detailed in Section 4.2.

The global sensitivity ($\Delta_t$) in Algorithm 1 is the main parameter that must be specified. This is due to the fact that, to guarantee differential privacy of the discriminator parameters, the noise magnitude should depend on the global sensitivity of the discriminator's parameters. Therefore, the following theorem computes the global sensitivity of the discriminator's parameters in each step of training.

Theorem 3 If the Adam optimizer is used in the optimization procedure of the discriminator network, the global sensitivity of the discriminator parameters in each step of training will be $\max\!\left(\alpha_D,\ \frac{\alpha_D(1-\beta_{1D})}{\sqrt{1-\beta_{2D}}}\right)$, where $\alpha_D$, $\beta_{1D}$, $\beta_{2D}$ are the Adam hyper-parameters of the discriminator.

Proof As Kingma et al. [15] describe, the absolute value of the effective stepsize ($|\Delta_t|$) in the Adam update rule has two upper bounds: if $(1-\beta_{1D}) > \sqrt{1-\beta_{2D}}$, then $|\Delta_t| \le \frac{\alpha_D(1-\beta_{1D})}{\sqrt{1-\beta_{2D}}}$; otherwise $|\Delta_t| \le \alpha_D$. Therefore, for any setting of the hyper-parameters, the bound $\max\!\left(\alpha_D,\ \frac{\alpha_D(1-\beta_{1D})}{\sqrt{1-\beta_{2D}}}\right)$ on the change of each parameter in one training step is met.

The following theorem provides a new upper bound for the global sensitivity, which is independent of Adam's gradient moment estimates.

Theorem 4 If the Adam optimizer is used in the optimization procedure of the discriminator network, assuming that $\beta_{1D} \le \sqrt{\beta_{2D}}$ (the common setting of Adam's hyper-parameters), the global sensitivity of the discriminator parameters in each step of training will be $\alpha_D \times \frac{1}{\sqrt{1-\beta_{2D}}} \times \frac{1}{1-\beta_{1D}/\sqrt{\beta_{2D}}}$.

Proof According to (12), Adam's parameter update rule is $w_t = w_{t-1} - \alpha_D\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\gamma}$. As $\gamma \cong 0$, the effective step taken in parameter space at time step $t$ is $\Delta_t = \alpha_D\frac{\hat{m}_t}{\sqrt{\hat{v}_t}}$. Therefore, expanding $\hat{m}_t$ and $\hat{v}_t$, the upper bound of $\Delta_t$ is formulated as

$$\left|\alpha_D\frac{\hat{m}_t}{\sqrt{\hat{v}_t}}\right| = \left|\alpha_D\,\frac{\sqrt{1-\beta_{2D}^{t}}}{1-\beta_{1D}^{t}}\times\frac{(1-\beta_{1D})\sum_{j=1}^{t}\beta_{1D}^{t-j}g_j}{\sqrt{(1-\beta_{2D})\sum_{k=1}^{t}\beta_{2D}^{t-k}g_k^{2}}}\right|$$

Since $\sqrt{\sum_{k=1}^{t}\beta_{2D}^{t-k}g_k^{2}} \ge \beta_{2D}^{(t-j)/2}|g_j|$ for every $j$, the ratio of the two sums is at most $\sum_{j=1}^{t}\left(\beta_{1D}/\sqrt{\beta_{2D}}\right)^{t-j}$, and together with $\sqrt{1-\beta_{2D}^{t}} \le 1$ and $\frac{1-\beta_{1D}}{1-\beta_{1D}^{t}} \le 1$ this yields

$$\left|\alpha_D\frac{\hat{m}_t}{\sqrt{\hat{v}_t}}\right| \le \alpha_D\times\frac{1}{\sqrt{1-\beta_{2D}}}\times\frac{1-\left(\beta_{1D}/\sqrt{\beta_{2D}}\right)^{t}}{1-\beta_{1D}/\sqrt{\beta_{2D}}} \le \alpha_D\times\frac{1}{\sqrt{1-\beta_{2D}}}\times\frac{1}{1-\beta_{1D}/\sqrt{\beta_{2D}}}$$

The last inequality is obtained supposing that $\beta_{1D} \le \sqrt{\beta_{2D}}$ and $1-\left(\beta_{1D}/\sqrt{\beta_{2D}}\right)^{t} < 1$ (if $\beta_{1D} = 0$ and $\beta_{2D} = 0$, given that $0^{0} = 1$, the ratio is equal to one and the bound still holds). It should be noted that if the condition $\beta_{1D} \le \sqrt{\beta_{2D}}$ is not met, $\left|\alpha_D\times\frac{1}{\sqrt{1-\beta_{2D}}}\times\frac{1-\left(\beta_{1D}/\sqrt{\beta_{2D}}\right)^{t}}{1-\beta_{1D}/\sqrt{\beta_{2D}}}\right|$ can be used as the effective stepsize bound.

According to Theorem 4, as in Theorem 3, the higher $\alpha_D$, the more noise is needed to provide privacy at each training step. Also, when $\beta_{2D} \rightarrow 1$ or $\beta_{1D}/\sqrt{\beta_{2D}} \rightarrow 1$, more noise is needed. In Section 5, we evaluate the proposed ADAM-DPGAN algorithm with the global sensitivities of Theorem 3 and Theorem 4, which are called ADAM-DPGAN-1 and ADAM-DPGAN-2, respectively.
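The following sketch illustrates how the two sensitivity bounds and the per-step parameter perturbation of ADAM-DPGAN could be wired together. All names are ours, and the placement of the noise follows the description in Section 4.2 (per-coordinate standard deviation σ_n·Δ_t after the Adam step); it is a simplified reading of Algorithm 1, not the authors' reference implementation.

```python
import numpy as np

def global_sensitivity_thm3(alpha_d, beta1_d, beta2_d):
    """Upper bound of Theorem 3 (used by ADAM-DPGAN-1)."""
    return max(alpha_d, alpha_d * (1.0 - beta1_d) / np.sqrt(1.0 - beta2_d))

def global_sensitivity_thm4(alpha_d, beta1_d, beta2_d):
    """Upper bound of Theorem 4 (used by ADAM-DPGAN-2), assuming beta1_d <= sqrt(beta2_d)."""
    assert beta1_d <= np.sqrt(beta2_d)
    return alpha_d / np.sqrt(1.0 - beta2_d) / (1.0 - beta1_d / np.sqrt(beta2_d))

def noisy_discriminator_update(w, adam_update, sigma_n, delta_t, rng):
    """Apply the Adam step, then perturb each coordinate with N(0, (sigma_n * delta_t)^2),
    where delta_t is the per-step global sensitivity from Theorem 3 or Theorem 4."""
    w = w + adam_update                      # ordinary Adam step on the sampled mini-batch
    noise = rng.normal(0.0, sigma_n * delta_t, size=w.shape)
    return w + noise

# Example with common Adam hyper-parameters:
rng = np.random.default_rng(0)
delta_1 = global_sensitivity_thm3(alpha_d=2e-4, beta1_d=0.5, beta2_d=0.9)  # ADAM-DPGAN-1
delta_2 = global_sensitivity_thm4(alpha_d=2e-4, beta1_d=0.5, beta2_d=0.9)  # ADAM-DPGAN-2
w = noisy_discriminator_update(np.zeros(10), adam_update=np.zeros(10),
                               sigma_n=4.0, delta_t=delta_1, rng=rng)
```

With these hyper-parameters the Theorem 3 bound is smaller than the Theorem 4 bound, which is consistent with the experimental observation that ADAM-DPGAN-1 needs less noise than ADAM-DPGAN-2.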
4.2 Privacy analysis

Given the number of discriminator iterations executed per generator iteration ($I_d$), the overall RDP for each training step of the generator can be calculated using RDP composition (Theorem 1). Because the RDP computations are transformed to DP based on Theorem 2, and following the post-processing feature of DP, differential privacy for the generator with respect to all records of the training dataset is guaranteed in each training step. As the generator training step iterates $I_g$ times, the overall privacy of the final trained model can be calculated by applying Theorem 1 to the RDP bound of each step and transforming it to DP based on Theorem 2. Therefore, the main remaining challenge is to compute the RDP of each training step of the discriminator.

During each training step of the discriminator, a batch of size $b$ from the real dataset is sampled and d-dimensional Gaussian noise with per-coordinate variance $\sigma_n^{2}\Delta_t^{2}$ is added to the discriminator parameters. If the number of all training records is $N$, the sampling rate $q = b/N$ adds another level of privacy protection and, according to Mironov et al. [17], this mechanism is called the Sampled Gaussian Mechanism (SGM). Mironov et al. show that in the case of SGM, for assessing the privacy loss, it is sufficient to use the following theorem.

Theorem 5 (SGM privacy loss [17]) Assume $X$ and $X'$ are two neighboring datasets and $F$ is an SGM applied to a function $f$ with $l_2$-sensitivity of one. If we have:

$$F(X) \sim \mu_0 \triangleq N(0, \sigma^{2}) \qquad (13)$$

$$F(X') \sim \mu_1 \triangleq (1-q)\,N(0, \sigma^{2}) + q\,N(1, \sigma^{2}) \qquad (14)$$

where $\mu_0$ and $\mu_1$ are probability density functions, then the mechanism $F$ satisfies (λ, ε)-RDP for:

$$\epsilon \le \frac{1}{\lambda-1}\log\!\left(\max\{A_{\lambda}, B_{\lambda}\}\right) \qquad (15)$$

where $A_{\lambda} \triangleq \mathbb{E}_{x\sim\mu_0}\!\left[\left(\mu_1(x)/\mu_0(x)\right)^{\lambda}\right]$ and $B_{\lambda} \triangleq \mathbb{E}_{x\sim\mu_1}\!\left[\left(\mu_0(x)/\mu_1(x)\right)^{\lambda}\right]$.

Mironov et al. [17] show that $A_{\lambda} \ge B_{\lambda}$ and that an upper bound of $A_{\lambda}$ should be specified to track the privacy loss. For the upper bound of $A_{\lambda}$, a stable numerical procedure and a closed-form bound are presented by them. The following theorem shows the closed-form bound of $A_{\lambda}$ for an SGM with sampling rate $q$ and Gaussian noise $N(0, \sigma^{2})$.

Theorem 6 (RDP for SGM [17]) If $q \le \frac{1}{5}$, $\sigma \ge 4$, $1 < \lambda \le \frac{1}{2}\sigma^{2}L - 2\ln\sigma$, and $\lambda \le \frac{\frac{1}{2}\sigma^{2}L^{2} - \ln 5 - 2\ln\sigma}{L + \ln(q\lambda) + \frac{1}{2\sigma^{2}}}$, where $L = \ln\!\left(1 + \frac{1}{q(\lambda-1)}\right)$, then applying the SGM to a function with $l_2$-sensitivity of one satisfies (λ, ε)-RDP, where $\epsilon \le \frac{2q^{2}\lambda}{\sigma^{2}}$.

According to Theorems 1, 2 and 6, a larger total number of iterations ($I_g \times I_d$) leads to a larger privacy budget. Also, for a fixed $\sigma$, a larger sampling rate ($q$) leads to a larger privacy budget (i.e. less privacy).

In practice, we use the numerical implementation of the RDP accountant.¹ In each step, the ε of the RDP bound is calculated for different orders (λ values). For any given λ, the overall privacy for $t$ steps can be calculated as $\epsilon \times t$. Then, to find the tighter upper bound, the minimum value of ε and the corresponding λ order are used to compute (ε, δ)-DP.

¹ https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/analysis/rdp_accountant.py
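A possible way to track the spent budget with the accountant cited in the footnote is sketched below. The `compute_rdp`/`get_privacy_spent` helpers follow the classic tensorflow_privacy module referenced above; their signatures may differ in newer releases, and the sampling rate, noise multiplier and step count are placeholder values, not the settings used in the experiments.

```python
# Illustrative use of the RDP accountant (TensorFlow Privacy); API names follow the
# classic rdp_accountant module referenced in the footnote.
from tensorflow_privacy.privacy.analysis.rdp_accountant import compute_rdp, get_privacy_spent

q = 64 / 60000          # sampling rate b/N (e.g. an MNIST-sized training set)
sigma_n = 4.0           # noise multiplier of the Sampled Gaussian Mechanism
steps = 10000           # total discriminator iterations I_g * I_d
orders = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))  # candidate lambda values

rdp = compute_rdp(q=q, noise_multiplier=sigma_n, steps=steps, orders=orders)
eps, _, best_order = get_privacy_spent(orders, rdp, target_delta=1e-5)
print(f"({eps:.2f}, 1e-5)-DP after {steps} steps (tightest bound at lambda={best_order})")
```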
5 Experimental results

In this section, the evaluation of ADAM-DPGAN is presented. ADAM-DPGAN is compared to a DP-SGD GAN method, i.e. GANobfuscator [7], and also to GS-WGAN [12] and the method proposed by Han et al. [13], in five aspects: 1) the effect of the privacy level on the visual image quality, 2) the effect of the privacy level on the realism and diversity of the generated samples, evaluated by the Inception Score (IS) [18], the Frechet Inception Distance (FID) [19] and the Jensen-Shannon score [3], 3) the effect of the privacy level on the stability of training, 4) the effect of the privacy level on the membership inference attack's accuracy, and 5) the runtime. In the following, the experimental setting is described and then the results are discussed.
Fig. 2 Synthetic generated images with ADAM-DPGAN-1 for different datasets. (a) Synthetic samples versus different privacy budgets on MNIST dataset. (b) Synthetic samples versus different privacy budgets on Fashion-MNIST dataset. (c) Synthetic samples versus different privacy budgets on CelebA dataset
Fig. 3 Synthetic generated images with ADAM-DPGAN-2 for different datasets. (a) Synthetic samples versus different privacy budgets on MNIST dataset. (b) Synthetic samples versus different privacy budgets on Fashion-MNIST dataset. (c) Synthetic samples versus different privacy budgets on CelebA dataset
Fig. 4 Synthetic generated images of different methods (GANobfuscator [7], GS-WGAN [12] and the method proposed by Han et al. [13]) with ε = 1, δ = 10−5 on different datasets. (a) Synthetic samples on MNIST dataset. (b) Synthetic samples on Fashion-MNIST dataset. (c) Synthetic samples on CelebA dataset
Fig. 5 Inception scores of synthetic samples for δ = 10−5 and different privacy budgets
Fig. 6 Frechet Inception Distance of generated samples for δ = 10−5 and different privacy budgets
The reason is the lower global sensitivity of ADAM-DPGAN-1 compared to ADAM-DPGAN-2.

The FID score is another metric for evaluating GAN performance, which captures the similarity of the generated images to the real ones. Formally, FID is defined as [19]:

$$FID = \left\|\mu_t - \mu_s\right\|_2^{2} + tr\!\left(\Sigma_t + \Sigma_s - 2\left(\Sigma_t\Sigma_s\right)^{\frac{1}{2}}\right) \qquad (17)$$

where $x_t \sim N(\mu_t, \Sigma_t)$ and $x_s \sim N(\mu_s, \Sigma_s)$ are the feature vectors of a specific layer of an Inception network for real and generated images, respectively, and $tr$ denotes the trace of a matrix. A lower FID value indicates more similarity between the real and the synthetic images, which corresponds to a higher quality of the generated images. Figure 6 demonstrates the FID score of the different methods on the labeled datasets (i.e. MNIST and Fashion-MNIST). ε = ∞ corresponds to the model without employing any privacy mechanism. As this figure shows, the FID score increases when the privacy budget decreases. The FID score of the proposed method is slightly higher than that of the improved WGAN [16] without any privacy mechanism, and it is considerably lower than GANobfuscator [7], GS-WGAN [12] and Han's method [13]. ADAM-DPGAN-1 is slightly better than ADAM-DPGAN-2, because the global sensitivity in ADAM-DPGAN-1 is lower than in ADAM-DPGAN-2.

To quantitatively assess the proposed method on the unlabeled dataset (i.e. CelebA), first another discriminator, D, is trained to classify real and synthetic samples. Using the output of this discriminator, the Jensen-Shannon divergence between the conditional probability of the discriminator's output and a Bernoulli distribution with parameter p = 0.5 is measured. Formally, the Jensen-Shannon divergence of these distributions is defined as [3]:

$$S(G) = \frac{1}{2}KL\!\left(p(w\,|\,u)\,\|\,B_p\right) + \frac{1}{2}KL\!\left(B_p\,\|\,p(w\,|\,u)\right) \qquad (18)$$

where $B_p$ is the Bernoulli distribution with parameter $p = 0.5$ and $p(w\,|\,u)$ is the conditional distribution of the discriminator's output predicting $u$'s label as $w$ (real/synthetic sample label). The more similar the synthetic samples are to the real samples, the shorter the distance between the conditional distribution and the Bernoulli distribution. Therefore, a lower value of $S(G)$ indicates a better generator.

Fig. 7 Jensen-Shannon score of synthetic samples on CelebA for δ = 10−5 and different privacy budgets
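A minimal NumPy/SciPy sketch of the two metrics defined in Eqs. (17) and (18) is given below. It assumes pre-computed Inception activations for FID and per-sample discriminator outputs p(w|u) for the Jensen-Shannon score; it is an illustration rather than the evaluation code used in the experiments.

```python
import numpy as np
from scipy import linalg

def fid(act_real, act_fake):
    """Frechet Inception Distance of Eq. (17) from Inception activations
    (rows = samples); mu/Sigma are the empirical means and covariances."""
    mu_t, mu_s = act_real.mean(axis=0), act_fake.mean(axis=0)
    cov_t = np.cov(act_real, rowvar=False)
    cov_s = np.cov(act_fake, rowvar=False)
    cov_mean = linalg.sqrtm(cov_t @ cov_s).real
    return float(np.sum((mu_t - mu_s) ** 2) + np.trace(cov_t + cov_s - 2.0 * cov_mean))

def js_score(p_real_given_u, eps=1e-12):
    """Jensen-Shannon score of Eq. (18): symmetric KL between the discriminator's
    output distribution p(w|u) and a Bernoulli(0.5), averaged over samples; lower is better."""
    p = np.clip(np.asarray(p_real_given_u, dtype=float), eps, 1.0 - eps)
    q = np.stack([p, 1.0 - p], axis=1)      # p(w|u) over {real, synthetic}
    b = np.array([0.5, 0.5])
    kl_pb = np.sum(q * np.log(q / b), axis=1)
    kl_bp = np.sum(b * np.log(b / q), axis=1)
    return float(np.mean(0.5 * kl_pb + 0.5 * kl_bp))
```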
Fig. 11 LOGAN [4] attack performance against ADAM-DPGAN for different privacy budgets and δ = 10−5 on different datasets
Figure 7 shows the Jensen-Shannon score on the CelebA dataset. As the figure shows, the Jensen-Shannon score decreases with the increase of the privacy budget. ADAM-DPGAN-1 generates more realistic samples than the other methods, and ADAM-DPGAN-2 is better than GS-WGAN [12] and Han's method [13].

5.4 The effect of privacy level on training stability

To evaluate the effect of the privacy loss on the stability and convergence of the GAN models, the generator and discriminator losses are tracked during the training steps and are reported every 50 generator iterations ($T_g$). Figures 8, 9 and 10 compare the discriminator and generator losses during training on the MNIST, Fashion-MNIST, and CelebA datasets for all methods when the privacy budgets are set to 8 and 2. As seen, in ADAM-DPGAN-1, ADAM-DPGAN-2 and GS-WGAN [12], the variance of the discriminator and generator losses remains modest, and only small fluctuations are observed, which is the result of min-max training. In GANobfuscator [7] and Han's method [13], the oscillations in the losses are more considerable.

5.5 The effect of privacy level on the membership inference attack's accuracy

Although the privacy feature of the proposed method has been proved in Section 4.2, it is appropriate to examine the resistance of the proposed solution against attacks. In this section, the resistance of the proposed method against membership inference attacks is examined. According to [6], the LOGAN discriminator-accessible attack [4] outperforms the others [5, 6], so we evaluate our methods against this attack.
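For illustration, the discriminator-accessible LOGAN attack used in this evaluation can be summarized by the following scoring rule (our sketch, not the original implementation): candidate records that receive the highest discriminator scores are predicted to be training members.

```python
import numpy as np

def logan_discriminator_attack(d_scores_candidates, n_members):
    """LOGAN-style membership inference with discriminator access [4]:
    the records with the highest discriminator (realness) scores are
    predicted to be training members. Returns a boolean membership guess."""
    d_scores_candidates = np.asarray(d_scores_candidates, dtype=float)
    threshold = np.sort(d_scores_candidates)[-n_members]
    return d_scores_candidates >= threshold
```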
Fig. 12 LOGAN [4] attack performance against different methods for various privacy budgets and δ = 10−5 on MNIST and Fashion-MNIST datasets
40. Mironov I (2017) Rényi differential privacy. In: IEEE 30th computer security foundations symposium (CSF), pp 263–275
41. Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. In: International conference on machine learning, pp 214–223
42. Chen X, Liu S, Sun R, Hong M (2019) On the convergence of a class of ADAM-type algorithms for non-convex optimization. In: 7th International conference on learning representations (ICLR 2019), pp 1–43
43. Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Behrouz Shahgholi Ghahfarokhi received his B.Sc. in Computer Engineering (2004), his M.S. in Artificial Intelligence (2006), and his Ph.D. in Computer Architecture (2011) from the University of Isfahan. He joined the University of Isfahan in 2011 and is now an associate professor at the Faculty of Computer Engineering. His research interests include mobile communications, network security, and artificial intelligence.