Discrete Bilal Distribution With Right-Censored Da
Discrete Bilal Distribution With Right-Censored Da
Discrete Bilal Distribution With Right-Censored Da
ORG
Preto Medical School, University of São Paulo (USP), Ribeirão Preto, Brazil
ARTICLE HISTORY
Compiled November 4, 2021
ABSTRACT
This paper presents inferences for the discrete Bilal (DB) distribution introduced
by Altun et al. (2020). We consider parameter estimation for DB distribution in the
presence of randomly right-censored data. We use maximum likelihood and Bayesian
methods for the estimation of the model parameters. We also consider the inclusion
of a cure fraction in the model. The usefulness of the proposed model was illustrated
with three examples considering real datasets. These applications suggested that the
model based on DB distribution performs at least as good as some other traditional
discrete models as the DsFx-I, discrete Lindley, discrete Rayleigh, and discrete Burr-
Hatke distributions. R codes are provided in an appendix at the end of the paper
so that reader can carry out their own analysis.
KEYWORDS
Survival analysis; Maximum likelihood estimation; Cure fraction; Bayesian
inference; Discrete distributions; Censored data.
1. Introduction
Survival analysis is one of the statistical techniques most commonly encountered in the
medical literature (Flynn, 2012). These methods are applied when the time until the
occurrence of an event is the object of interest. Examples in medical research include
the time to respond to treatment, relapse-free survival time, time to death, time to
device failure, and time to regain mobility (Myers, 2007). The Kaplan-Meier plots, log-
rank tests, and Cox (proportional hazards) regression model are the most widely used
survival analysis techniques in medical studies (Le Rademacher and Wang, 2021). As
an alternative to the traditional proportional hazard model, parametric models have
become popular in the last decades. The parametric models assume that the time-to-
event variable follows a known probability distribution, such as Weibull, gamma, or
the log-normal distributions. Among the discrete distributions proposed in the sta-
tistical literature to model time-to-event data, we have the discrete Weibull distribu-
tion (Nakagawa and Osaki, 1975), the discrete Lindley distribution (Gómez-Déniz and
Calderı́n-Ojeda, 2012), the exponentiated discrete Weibull distribution (Nekoukhou
and Bidram, 2015; Cardial et al., 2020; Freitas et al., 2021), the discrete generalized
Corresponding author: Ribeirão Preto Medical School, Av. Bandeirantes 3900, University of São Paulo (USP),
Ribeirão Preto, 14049-900, Brazil. E-mail: [email protected]
Rayleigh distribution (Alamatsaz et al., 2016), and the discrete Sushila distribution
(Oliveira et al., 2019).
Let X be a random variable denoting a survival time, and let x be an observation
of X. The continuous Bilal distribution introduced by Abd-Elrahman (2013) has a
probability density function (pdf) given by
6 2x x
fX (x) = e− θ 1 − e− θ , x ≥ 0, θ > 0
θ
and probability accumulated distribution function given by
2x x
FX (x) = 1 − e− θ 3 − 2e− θ .
The survival function, that is, the probability that an individual survives at least until
time x, is given by SX (x) = 1 − FX (x). The author named this distribution as Bilal
since this is his youngest son’s name (Abd-Elrahman, 2013). Classical and Bayesian
approaches to find an estimated value for the parameter θ of a Bilal distribution based
on a given type-2 right censoring sample are given by Abd-Elrahman and Niazi (2017).
Generalizations of the Bilal distribution are found in the works of Abd-Elrahman
(2019) and Shi et al. (2019).
To obtain a discrete version of the Bilal distribution, Altun et al. (2020) considered
that the random variable T has probability mass function (pmf) given by
where X is the underlying continuous random variable, T = [X] (the largest integer
less than or equal to X), and SX (x) = P (X > x) (see Methodology-IV
in the article
− 2t − t
by Chakraborty (2015)). Thus, replacing SX (t) by e θ 3 − 2e θ and SX (t + 1) by
2(t+1)
t+1
e− θ 3 − 2e− θ in the expression (1), we have
2t
t
2(t+1)
t+1
P (T = t) = e− θ 3 − 2e− θ − e− θ 3 − 2e− θ . (2)
1
Let us assume the parameter transformation p = e− θ , 0 < p < 1. From (2) and,
following the notation of Altun et al. (2020), the pmf of the discrete Bilal distribution
is given by
To simplify the obtaining of estimators for the parameters of the discrete Bilal distri-
bution, we consider the reparameterization p = e−β , where β > 0. Thus, the pmf is
2
given by
Altun et al. (2020) showed that the mean and the variance of a random variable T
that follows a discrete Bilal distribution with parameter β are respectively given by
and
3
(a) f(t), β = 0.05 (b) S(t), β = 0.05 (c) h(t), β = 0.05
0.05 1.0 ●●
0.10
●●●● ●● ●●
●● ●● ●● ●●● ●●● ●●●●
0.04 ● ●● 0.08 ●● ●● ●●
0.8 ● ● ● ●●
● ●● ●
●● ●●●
0.03 ●● ●● 0.06 ● ●●
S(t)
0.6
h(t)
● ●●
f(t)
●● ●●
●● ●● ●
0.02 ●● 0.04 ●
● ●● 0.4 ●● ●
●● ●● ●
●● ●●
0.01 ●●● 0.02
●
0.2 ● ●● ●● ●
●● ●● ●● ●
0.00 0.0 0.00
0 2 4 6 8 11 14 17 20 23 26 29 0 2 4 6 8 11 14 17 20 23 26 29 32 35 0 2 4 6 8 11 14 17 20 23 26 29 32 35
S(t)
0.6
h(t)
● ●
f(t)
0.10 ●
● ● ●
0.04 ● 0.4 ●
● ● ●
● ●
● ● ● ● ● 0.05
0.02 ● ● 0.2 ● ●
● ● ● ● ● ● ●
● ● ● ●
0.00 0.0 0.00
0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20
S(t)
0.6
h(t)
● 0.15
f(t)
●
● ●
● 0.4 0.10
0.05 ● ● ●
● ●
● 0.2 ● 0.05 ●
● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ●
0.00 0.0 0.00
0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20
0.6
h(t)
●
f(t)
0.10 ●
0.2 ●
● ●
●
0.4
●
0.05 ● ● 0.1 ●
● 0.2 ●
● ● ● ●
● ● ● ● ● ● ● ● ● ●
0.00 ● ● ● ● ● ● 0.0 ● ● ● ● ● ● 0.0
0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20
0.6
h(t)
f(t)
0.2 ●
0.4
0.4
●
0.1 ●
0.2
0.2
●
● ●
0.0 ● ● ● ● ● ● 0.0 ● ● ● ● ● ● ● 0.0
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
Figure 1. The pmf, survival function and hazard function of DB(β) for different values of β.
2. Methods
4
By deriving the log-likelihood function with respect to β, we have the following equa-
tion:
n n
2e−β(ti −1) 1 + 2e−β + 2ti e−βti e−2β + e−β + 1 − 3e−β
d` X ne−β X
= −2 ti − −β − .
dβ e −1 2e−βti (e−2β + e−β + 1) − 3e−β − 3
i=1 i=1
(5)
The maximum-likelihood (ML) estimator βbM L for β is obtained by equating the right-
hand side of (5) to zero and solving for β. Nevertheless, the resulting expression has not
a closed-form solution and so numerical methods are needed to find the ML estimate
for β. In this article, we use the maxLik package in R program to obtain the ML
estimate of the parameter β (Henningsen and Toomet, 2011). A confidence interval
for β can be constructed from the asymptotic normality of the ML estimate considering
large sample sizes, given by
βbM L ∼ N β, Vd
ar(βbM L ) ,
where zυ denotes the upper υ-th percentile of the standard normal distribution. The
asymptotic variance of a ML estimator can be estimated by the negative of the inverse
of the second derivative of the log-likelihood function evaluated at βbM L . Thus, we have
n
d2 ` ne−β X Ai
2
= − 2 + 2, (6)
dβ −β
(e − 1) −βt −2β + e + 1) − 3e−β − 3]
−β
i=1 [2e
i (e
where
Ai = −2e−β(ti −1) (3 − 2e−βti + 6e−2β + 12e−β ) + 2ti e−β(ti −1) (3 + 9e−β − 6e−2β + 4e−βti )
+6ti e−βti 2e−2β − e−2β − e−β + ti + 3e−β 2e−βti − 2e−2β e−βti − 3
+4e−2β (6ti − e−β + 6ti e−β + 4ti e−2β − 4) + 6t2i e−β(1+ti ) 2e−β + e−2β + 2 .
The negative second derivative of the log-likelihood function is the observed informa-
tion denoted by In (β), that is,
d2 `
In (β) = − .
dβ 2
The expected value of In (β), say in (β), is called the expected Fisher information.
The asymptotic variance of βbM L is given by the inverse of the expected information
evaluated at the ML estimate of β, that is,
h i−1
Vd
ar(βbM L ) = in (βbM L ) .
5
2.2. Maximum-likelihood estimation in presence of censored data
Considering a random sample (ti , di ) of size n, i = 1, · · · , n, the contribution of the
ith individual to the likelihood function is given by
Setting this expression equal to zero, we get the corresponding score equation whose
numerical solution leads to the ML estimator. The second derivative of the log-
likelihood function with respect to β is given by
n n
d2 ` X e−β X Bi
= − di 2 − di 2
dβ 2 (e−β − 1) [2e−βti (e−2β + e−β + 1) − 3e−β − 3]
i=1 i=1
n
X e−β(ti +1) (ti + 1)2
−6 (1 − di ) 2 ,
i=1 2e−β(ti +1) − 3
6
where
Bi = 6t2i e−βti e−3β + 2e−2β + 2e−β e−βti + 1
+12ti e−βti e−2β e−β + 2 + 6e−βti e−β e−2β + 4e−β + 2
−4e−2βti e−β e−2β + 4e−β + 1 − 9e−β .
where η is the proportion of immune, cured or not susceptible individuals, and S0 (t)
is the baseline survival function for the susceptible individuals (Farewell, 1982). Con-
sidering a random sample (ti , di ) of size n, i = 1, · · · , n, the contribution of the ith
7
individual to the likelihood function is given by
Li = [f (ti )]di [S (ti )]1−di = [(1 − η)f0 (ti )]di [η + (1 − η)S0 (t)]1−di ,
where di is a binary censoring indicator variable and f0 (t) is the corresponding baseline
probability function. Assuming the mixture model based on the DB distribution, the
likelihood function for β and η is given by
n
Y h id i
L( β, η| t, d) = (1 − η)di e−2βdi ti (e−β − 1)di 2e−βti (e−2β + e−β + 1) − 3e−β − 3
i=1
n h i o1−di
× η + (1 − η) 3 − 2e−β(ti +1) e−2β(ti +1) .
− di
2e−βti (e−2β + e−β + 1) − 3e−β − 3
i=1
n
(ti + 1) e−2β(ti +1) e−β(ti +1) − 1
X
+6 (η − 1) (1 − di ) n 1 o,
η + (1 − η) e−2β(ti +1) 3+2 e −2β(ti +1) 2
i=1
and the first derivative of the log-likelihood function with respect to η is given by
n n −β(t +1) 2
+ 1 e−β(ti +1) − 1
∂` 1 X X 2e i
=− di + (1 − di ) n 1 o.
∂η 1−η
η + (1 − η) e−2β(ti +1) 3 + 2 e−2β(ti +1) 2
i=1 i=1
Setting these expressions equal to zero and solving them simultaneously we get the ML
estimators of the parameters β and η. Although we cannot obtain explicit expressions
for the ML estimators for these parameters, they can be estimated numerically using
iterative algorithms such as the Newton-Raphson method and its variants.
The second partial derivatives of the ML function are given as follows:
n n
∂2` X e−β X Ci
2
= − d i 2 + 6 (η − 1) (1 − di ) (ti + 1)2 e−2β(ti +1) ,
∂β (e−β − 1) Di
i=1 i=1
8
where
( )
−β ti +1
Ci = η 2 − 3e + 5 (η − 1) e−3β(ti +1)
and
Di = η 2 + 6η (1 − η) e−2β(ti +1) + 4ηe−3β(ti +1) + 9 (1 − 2η) e−4β(ti +1) + 4 (1 − η)2 e−6β(ti +1)
(
−β ti +1 )
+12 (1 − 2η) e−4β(ti +1) e + 9η 2 e−4β(ti +1) − 4η 2 e−3β(ti +1) + 12η 2 e−5β(ti +1) ,
n n −β(t +1) 2
+ 1 e−β(ti +1) − 1 3e−2β(ti +1) + 2e−3β(ti +1) − 1
∂2` 1 X X 2e i
= di + (1−di ) ,
∂η 2 (η − 1)2 i=1 i=1
Ei
where
and
n
(ti + 1) e−2β(ti +1) e−β(ti +1) − 1
∂2` X
=6 (1 − di ) ,
∂β∂η Fi
i=1
where
h i h i
Fi = η 2 +2η (1 − η) 3e−2β(ti +1) + 2e−3β(ti +1) +(1 − η)2 9e−4β(ti +1) + 12e−5β(ti +1) + 4e−6β(ti +1) .
where V11 is the variance of βbM L , V22 is the variance of ηbM L , and V12 = V12 is the
covariance between βbM L and ηbM L . Approximate 100(1 − υ)% Wald-type CIs for β and
η are, respectively, given by
q q
βbM L ∓ zυ/2 Vd
ar(βbM L ) and ηbM L ∓ zυ/2 Vd
ar(ηbM L ),
where zυ denotes the upper υ-th percentile of the standard normal distribution. The
asymptotic variances of the ML estimators are given by the elements of the inverse of
the Fisher’s information matrix. The expected information matrix is given by
2 2
∂ ` ∂ `
−E ∂β 2 −E ∂β∂η
in (β, η) = ,
∂2` ∂2`
−E ∂β∂η −E ∂η 2
9
where the derivatives are provided above. R code for implementing this procedure is
presented in the Appendix.
3. Examples
In this section, we illustrate the estimation procedure proposed here with three ex-
amples from the literature. We compare the fits of the discrete Bilal distribution with
some competitive models such as DsFx-I (Eliwa and El-Morshedy, 1996), discrete
Lindley (Gómez-Déniz and Calderı́n-Ojeda, 2012), discrete Rayleigh (Roy, 2004), and
discrete Burr-Hatke (El-Morshedy et al., 2020) distributions. All these distributions
have only one parameter to estimate. The fitted models are compared using Akaike
and Bayesian information criteria (AIC and BIC). When the sample size is small,
many authors have suggested using the corrected AIC (AICC) as an alternative to
AIC (Hurvich and Tsai, 1989).
10
1.0
●
● ●
● ●
● ● ● Kaplan−Meier estimates
● ●
● ● DB model (AIC 132.6, BIC 133.7, AICC 132.8)
● ● ●
0.8
●
● DsFx−I model (AIC 135.3, BIC 136.3, AICC 135.5)
● ● ● ● Discrete Lindley model (AIC 133.2, BIC 134.3, AICC 133.4)
●
Survival function ● ● ● ● Discrete Rayleigh model (AIC 136.0, BIC 137.0, AICC 136.2)
●
0.6
● ● ● ● Discrete Burr−Hatke model (AIC 178.8, BIC 180.0, AICC 179.0)
● ●
● ●
● ● ●
● ●
0.4
●
● ●
●
●
● ● ●
●
● ●
● ●
●
0.2
● ●
● ●
●
● ●
● ●
● ●
● ●
● ●
● ●
● ●
● ● ● ● ●
● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ●
● ● ●
● ● ●
●
●
0.0
0 5 10 15 20 25
Time (weeks)
Figure 2. Survival function for the acute leukemia patients’ data estimated by the Kaplan-Meier method
and by using the models based on the DB, DsFx-I, discrete Lindley, discrete Rayleigh, and discrete Burr-Hatke
distributions.
1.5
● ●
●
●
Sample Quantiles
Sample Quantiles
●
1.0
●
● ●
●● ●●
0.5
●● ●●
● ●● ●●●●
●●
0.0
●
1
−0.5
●● ●●
●● ●●
●
● ● ●
●
K−S, p−value 0.9238
● ● ● ●
−1.0
−1.5
● K−S, p−value 0.7939
Sample Quantiles
●
● ●
●
● ● ● −2 −1 0 1 2 −2 −1 0 1 2
0
2.0
●
2
●
● ●
Sample Quantiles
Sample Quantiles
● ●
● ●● ● ●
−1
● ●●●●
●
1.0
● ●● ● ●●
●● ●●
0
● ●
●
●●● ●
0.0
● ●
● ●●
●
−2
●
● K−S, p−value < 0.0001
−2
−2 −1 0 1 2 −2 −1 0 1 2 −2 −1 0 1 2
Figure 3. Randomized quantile residuals for the models based on the (a) DB, (b) DsFx-I, (c) discrete
Lindley, (d) discrete Rayleigh, and (e) discrete Burr-Hatke distributions, fitted to data from patients with
acute leukemia.
11
residual analysis and the corresponding p-values for the Kolmogorov-Smirnov (K-S)
test for normality. The model based on the DB distribution appears to fit the data
reasonably well, as does the model based on the Lindley distribution. Figures 2 and
3 suggest that the discrete Rayleigh and discrete Burr-Hatke distributions do not fit
the data well.
(a) Traceplot (b) Posterior density (c) ACF plot
1.0
30
0.14
25
0.8
0.12 20
0.6
Density
ACF
0.10 15
0.09111
0.4
0.08 10
0.2
0.06 5
0.0
0 ● ● ●
0.04
0e+00 2e+05 4e+05 6e+05 8e+05 1e+06 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0 5 10 15 20
Iterations Lag
Figure 4. Posterior samples for the model parameters of the DB distribution applied to data from patients
with acute leukemia. (a) Traceplot of posterior samples, (b) histogram and posterior density with the corre-
spondent 95% HDI (blue line), and (c) auto-correlation function (ACF) plot for the posterior samples of the
model parameters.
12
1.0
●
● ● ●
● ● ●
● ● ● Kaplan−Meier estimates
●
● ● ● ● DB model (AIC 1019.9, BIC 1023.0, AICC 1019.9)
● ●
●
0.8
● ● DsFx−I model (AIC 1019.0, BIC 1022.1.3, AICC 1019.0)
● ●
● ●
● ●
● Discrete Lindley model (AIC 1014.6.2, BIC 1017.6, AICC 1014.6)
● ●
● Discrete Rayleigh model (AIC 1054.6, BIC 1057.7, AICC 1054.6)
Survival function ● ●
● ●
0.6
● ●
● ● ● Discrete Burr−Hatke model (AIC 1359.3, BIC 1362.3, AICC 1359.3)
● ● ●
● ● ●
● ●
●
● ●
0.4
●
●
● ●
● ● ●
●
● ● ●
● ●
● ●
●
● ●
0.2
● ●
● ●
● ●
● ●
● ●
● ●
● ● ●
● ● ●
● ● ●
● ● ● ● ●
● ● ● ● ●
● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ●
0.0
0 5 10 15 20 25 30
Figure 5. Survival function for the COVID-19 patients’ data estimated by the Kaplan-Meier method and
by using the models based on the DB, DsFx-I, discrete Lindley, discrete Rayleigh, and discrete Burr-Hatke
distributions.
estimated by the Kaplan-Meier method and fitted by parametric models based on the
DB and other distributions. Figure 6 shows the randomized quantile residuals from
the fitted models. We note that the DB distribution fitted the data as well as the
DsFx-I and the discrete Lindley distributions, but the models based on the discrete
Rayleigh and discrete Burr-Hatke distributions did not fit the data well.
In the Bayesian analysis, as in the previous example, we assumed a gamma prior dis-
tribution for the parameter β given by β ∼ Gamma(0.001, 0.001). Figure 7 describes
the posterior samples for β. The traceplot in panel (a) shows that the MCMC algorithm
has stabilized, and the corresponding Geweke z-score is 0.033, also suggesting satisfac-
tory convergence. Panel (b) describes the shape of the posterior distribution and shows
the 95% HDI. The posterior mean for β is βbBayes = 0.07051, and the corresponding
95% HDI is (0.0626, 0.0787). This MCMC estimate is closer to the corresponding ML
estimate. The ACF plot shows that autocorrelations are not significantly different from
zero (panel (c)).
13
(a) DB model (b) DsFx−I model (c) Discrete Lindley model
● ●
● ●
● ●
2
● ●● ●●
●●●
2
●●
Sample Quantiles
Sample Quantiles
●● ●
●●
●●●
●●●●
● ●
● ●●
●●
●●
●● ●
●
●●●
●
●
●●
●●●
●●
● ● ●●
●
1
●● ●
1
●
●
●●
● ●●
●
●● ●
●
●
●
●●
● ●
●
●●
●
●
●●
●
2
●
●●
● ●
●
●●
●
● ●
●
●●
●●
●
●●
●●
●
●●
●
●●
●●
●
●●
●●
●●
● ●●
● ●
●●
● ●
●
0
●
●●●●●●
●
● ●
●
0
●
● ●
●●
●
●
●
●●
●
● ●
●
●
●
● ●●
●
●● ●●
●
●
●●
●
●●● ●●
●
●
●
●●
●
●●
●
● ●●
●
●●
● ●
●●
●
●
●●
●
●
●
●●
●
●●
●
● ●
●●
−1
−1
●●● ●
●
●●
●●
●●●●●● ●
●●
●●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●●
●● ●●
●
● ●●● K−S, p−value 0.1712 ●
●●
●
●
●
●
● ●● ●●●
●
1
● ●●●
−2
●●
−2
●
●
●
●
● K−S, p−value 0.5905
●
●●●
●
●● ●
Sample Quantiles
● ● ●
●
●●
●●
●
●●
●●●
●
● ●
●
●
●
●
●●●
● −2 −1 0 1 2 −2 −1 0 1 2
●
●●
●
●●
●
●
●
●
●●
●
●
●
0
●●
●
●●
●
●●
●
●
●
●
●
●
● Theoretical Quantiles Theoretical Quantiles
●●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
● (d) Discrete Rayleigh model (e) Discrete Burr−Hatke model
●
4
●
●●
●●
●
4
●
−1
●●
●
● ●
Sample Quantiles
Sample Quantiles
●
●
3
●
●
3
● ●
●
● ●●
●
●● ●●●
●●● ●●●●
●●
2
●● ● ●●
2
● ●●
●●
●
●● ●●
●
●●
●●
● ●
● ●●
●●
●●
●
●
●●
● ●●●●
●
●●
●● ●●
●● ●●
●
●
●●
●●●
● ●●
●
●●
● ●●
●
●●
●
●●
● ●
●
●
●●
●
●
●● ●●
●●
●●
●●●
●
●●
1
●● ●
●●
●●
●
●●●
●
●●●
●
1
●
● ●
●●
●
●
●●●
●
●
●●
●
●
●● ●
●
●
●●●
−2
●
●
●
● ●●
●●
●
●●
●
●
●
●
●
● ●●
●●
●●
●
●
●
●
● ●●
● ● ●
−1 0
●●
●●
●
●● ●●●●
●
●●
● ●
●
●●
●
●
●
● ●
● ●
●
●
●
●●
●
●
●
−1
●
●●
●
● ●
●●
●
●
●●
●
●
●●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●● K−S, p−value < 0.0001
●
●●●●●●
● K−S, p−value < 0.0001
−3
−3
−3
● ●
−2 −1 0 1 2 −2 −1 0 1 2 −2 −1 0 1 2
Figure 6. Randomized quantile residuals for the models based on the (a) DB, (b) DsFx-I, (c) discrete Lindley,
(d) discrete Rayleigh, and (e) discrete Burr-Hatke distributions, fitted to data from hospitalized patients with
COVID-19.
1.0
0.085 100
0.080
0.8
80
0.075
0.6
60
Density
0.07051
ACF
0.070
0.4
40
0.065
0.2
20
0.060 0.0
0 ● ● ●
0e+00 2e+05 4e+05 6e+05 8e+05 1e+06 0.055 0.060 0.065 0.070 0.075 0.080 0.085 0 5 10 15 20
Iterations Lag
Figure 7. Posterior samples for the parameter of the DB distribution applied to data from hospitalized
patients with COVID-19. (a) Traceplot of posterior samples, (b) histogram and posterior density with the
correspondent 95% HDI (blue line), and (c) auto-correlation function (ACF) plot for the posterior samples of
the model parameter.
tributions. We can note that the results provided by models based on DB and discrete
Lindley distributions are almost identical and are overlapping on the graph. Figure 9
describes the randomized quantile residuals from the fitted models. The model based
on the DB distribution appears to fit the data reasonably well, as does the mod-
els based on the DsFx-I and the Lindley distribution. Models based on the discrete
Rayleigh and discrete Burr-Hatke distributions did not fit the data well.
Assuming the Bayesian model introduced in Section 2.4, we assumed prior dis-
tributions β ∼ Gamma(0.001, 0.001) and η ∼ Beta(1, 1), that are approximately
non-informative priors for the model parameters. Posterior means for β and η are
βbBayes = 0.02597 (95% HDI 0.0082 to 0.0464) and ηbBayes = 0.51048 (95% HDI 0.1879
to 0.8286), respectively. Figure 10 describes the posterior samples for the parameters
β and η. Panels (a) and (d) show that the MCMC chains reached satisfactory conver-
gence for both parameters and the Geweke Z scores for β and η chains are -0.060 and
-0.729, respectively. Panels (b) and (e) describe the posterior densities and the 95%
HDI. Panels (c) and (f) shows that autocorrelations within each chain were reasonably
14
1.0
●
●●●●●●●●●
●●●●● ●●●
●●●●●● ●●
●●
●●
●●●●● ●●●
●● ●●
●● ●●●●●
●● ●
● ●●●●●●●
●●●
0.8
●● ●
●●●
●●
●●
● ●● ●●
●●
● ●●
●●●●
●● ●●●● ●●
●●●● ●●● ●●●●
Survival function
●●●●●●●●●●●● ●●
●●●● ●
●● ●●●● ●●●●●●
●●●●●●●●●●●●●●●●● ● ●● ●●
● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● ●●
●●●●
●●
●●●●●
●●●●●●●●
●●●●●
●●● ●●●
●●●
●●
0.6
●●● ●●● ● ●●●●●●
●● ●●●●●●
●●●●●●●●●●●●●●●● ●●
●●●●
●●●●●●●●●●
●●●●
●●●●
●●●●●●
●● ● ● ● ●● ●● ●● ●●
●●
Kaplan−Meier estimates
0.4
0 20 40 60 80 100
Follow up (months)
Figure 8. Survival function for the pelvic tumors patients’ data estimated by the Kaplan-Meier method and
by using the models based on the DB, DsFx-I, discrete Lindley, discrete Rayleigh, and discrete Burr-Hatke
distributions.
1.5
● ●
●
2
●
●
Sample Quantiles
Sample Quantiles
●
1.0
●
●● ●
0.5
● ● ●
●● ●
●
● ●●●●
●
●●●
0.0
●● ●
●
● ● ●●
−0.5
●●
● ● ● ●
●
1
●
● ●
−1.0
●
K−S, p−value 0.1004 K−S, p−value 0.127
−1.5
Sample Quantiles
● ● ● ● ●
● −2 −1 0 1 2 −2 −1 0 1 2
●
0
●
● (d) Discrete Rayleigh model (e) Discrete Burr−Hatke model
●
3
● ●
●
●
Sample Quantiles
Sample Quantiles
1
−1
●
●
2
● ●
● ●
●●●
●● ●
●●
0
1
●
●
● ● ●●
● ●● ● ●
●●
0
● ●●●
−1
●
● ● ● ● ●●
●
−2
●
−2
−2 −1 0 1 2 −2 −1 0 1 2 −2 −1 0 1 2
Figure 9. Randomized quantile residuals for the models based on the (a) DB, (b) DsFx-I, (c) discrete Lindley,
(d) discrete Rayleigh, and (e) discrete Burr-Hatke distributions, fitted to data from patients with pelvic tumors.
low.
4. Concluding Remarks
The literature contains few articles on the DB distribution introduced by (Altun et al.,
2020). Hence the contribution of the present article is the introduction of parameter
estimation for DB distribution considering the inclusion of right-censored data and a
cure fraction. Three applications to real datasets show that the model based on DB
distribution performs at least as good as some other traditional discrete models as
the DsFx-I, discrete Lindley, discrete Rayleigh, and discrete Burr-Hatke distributions.
Therefore, the model based on DB distribution showed to be a suitable way to analyze
discrete survival data, even including the presence of immune individuals. Moreover,
the model can be easily implemented in computational programs as R, as showed in
15
(a) Traceplot (b) Posterior density (c) ACF plot
40
0.08
0.8
0.06 30
Parameter β
Density
ACF
20
0.4
0.04
●
0.02597
0.02 10
0.0
0 ● ● ●
0e+00 4e+05 8e+05 0.00 0.02 0.04 0.06 0.08 0 5 10 15 20
Iterations Lag
0.8
2.5
Parameter η
0.6 2.0
Density
0.51048
ACF
1.5
0.4
●
0.4
1.0
0.2 0.5
0.0
0.0 0.0 ● ● ●
0e+00 4e+05 8e+05 0.0 0.2 0.4 0.6 0.8 0 5 10 15 20
Iterations Lag
Figure 10. Posterior samples for the model parameters of the DB distribution applied to data from patients
with pelvic tumors. (a) Traceplots of posterior samples, (b) histograms and posterior densities with the corre-
spondent 95% HDI (blue lines), and (c) auto-correlation function (ACF) plots for the posterior samples of the
model parameters.
the Appendix. Currently, we find many examples of application of cure rate models to
medical data (Gallardo et al., 2021; Leão, 2020; Rafati et al., 2020), which makes these
models attractive to be assumed in lifetime data analysis. The methods introduced in
this paper could be very helpful to researchers dealing with discrete survival data.
References
16
[12] Eliwa, M. S. and El-Morshedy, M. (2021). A one-parameter discrete distribution for over-
dispersed data: Statistical and reliability properties with applications, Journal of Applied
Statistics, 1–21.
[13] El-Morshedy, M., Eliwa, M. S. and Altun, E. (2020). Discrete Burr-Hatke distribution
with properties, estimation methods and regression model, IEEE Access, 8, 74359–74370.
[14] Farewell, V. T. (1982). The use of mixture models for the analysis of survival data with
long-term survivors, Biometrics, 38, 1041–1046.
[15] Flynn, R. (2012). Survival analysis, Journal of Clinical Nursing, 21, 2789–2797.
[16] Freireich, E. J., Gehan, E., Frei-3rd, E., Schroeder, L. R., Wolman, I. J., Anbari, R.,
Burgert, E. O., Mills, S. D., Pinkel, D., and Selawry, O. S. (1963). The effect of 6-
mercaptopurine on the duration of steroid induced remissions in acute leukemia - A model
for evaluation of other potentially useful therapy, Blood, 21, 699–716.
[17] Freitas, B. C. L., Oliveira-Peres, M. V., Achcar, J. A., and Martinez, E. Z. (2021). Classical
and Bayesian inference approaches for the exponentiated discrete Weibull model with
censored data and a cure fraction, Pakistan Journal of Statistics and Operation Research,
17, 467–481.
[18] Gallardo, D. I., Castro, M., and Gómez, H. W. (2021). An alternative promotion time
cure model with overdispersed number of competing causes: an application to melanoma
data, Mathematics, 9, 1815.
[19] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B.
(2013). Bayesian data analysis, 3rd ed., Chapman Hall.
[20] Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calcula-
tion of posterior moments, Bayesian Statistics, 4, 641—649.
[21] Gómez-Déniz, E. and Calderı́n-Ojeda, E. (2011). The discrete Lindley distribution: prop-
erties and applications, Journal of Statistical Computation and Simulation, 81, 1405–1416.
[22] Henningsen, A. and Toomet, O. (2011). maxLik: A package for maximum likelihood esti-
mation in R, Computational Statistics, 26, 443–458.
[23] Hurvich, C. M. and Tsai, C. L. (1989). Regression and time series model selection in small
samples, Biometrika, 76, 297—307.
[24] Lambert, P. C. (2007). Modeling of the cure fraction in survival studies, The Stata Journal,
7, 351–375.
[25] Le Rademacher, J. and Wang, X. (2021). Time-to-event data: An overview and analysis
considerations, Journal of Thoracic Oncology, 16, 1067–1074.
[26] Leão, J., Bourguignon, M., Gallardo, D. I., Rocha, R., and Tomazella, V. (2020). A
new cure rate model with flexible competing causes with applications to melanoma and
transplantation data, Statistics in Medicine, 39, 3272–3284.
[27] Maller, R. A. and Zhou, X. (1996). Survival analysis with long-term survivors, John Wiley
& Sons, New York.
[28] Martin, A. D. and Quinn, K. M. (2006). Applied Bayesian inference in R using MCMC-
pack, R News, 6, 2–7.
[29] Martinez, E. Z and Achcar, J. A. and Jácome, A. A. and Santos, J. S (2013). Mixture
and non-mixture cure fraction models based on the generalized modified Weibull distri-
bution with an application to gastric cancer data, Computer Methods and Programs in
Biomedicine, 112, 343–355.
[30] Myers, J. (2007). Survival analysis techniques in clinical research, The Journal of the
Kentucky Medical Association, 105, 545–550.
[31] Nakagawa, T. and Osaki, S. (1975). The discrete Weibull distribution, IEEE Transactions
on Reliability, 24, 300-301.
[32] Nekoukhou, V. and Bidram, H. (2015). The exponentiated discrete Weibull distribution,
SORT Statistics and Operations Research Transactions, 39, 127–146.
[33] Oliveira, R. P., Oliveira-Peres, M. V. Martinez, E. Z., and Achcar, J. A. (2019). Use of a
discrete Sushila distribution in the analysis of right-censored lifetime data, Model Assisted
Statistics and Applications, 14, 255–2681.
[34] Othus, M., Barlogie, B., LeBlanc, M. L., and Crowley, J. J. (2012). Cure models as a
17
useful statistical tool for analyzing survival, Clinical Cancer Research, 18, 3731–3736.
[35] Paranjpe, I., Fuster, V., Lala, A., Russak, A. J., Glicksberg, B. S., Levin, M. A., Charney,
A. W., Narula, J., Fayad, Z. A., Bagiella, E., Zhao, S., and Nadkarni, G. N. (2020). As-
sociation of treatment dose anticoagulation with in-hospital survival among hospitalized
patients with COVID-19, Journal of the American College of Cardiology, 76, 122–124.
[36] Peng, Y. and Yu, B. (2021). Cure models: Methods, applications, and implementation,
CRC Press.
[37] Rafati, S., Baneshi, M. R., and Bahrampour, A. (2020). Factors affecting long-survival of
patients with breast cancer by non-mixture and mixture cure models using the Weibull,
log-logistic and Dagum distributions: a Bayesian approach, Asian Pacific Journal of Can-
cer Prevention: APJCP, 21, 485.
[38] Rohatgi, A. (2004). WebPlotDigitizer version 4.4, Available from:
https://automeris.io/WebPlotDigi-tizer/.
[39] Roy, D. (2004). Discrete Rayleigh distribution, IEEE Transactions on Reliability, 53,
255–260.
[40] Shi, X., Shi, Y. and Zhou, K. (2021). Estimation for entropy and parameters of generalized
Bilal distribution under adaptive type II progressive hybrid censoring scheme, Entropy,
23, 206.
[41] Wang, B., Xie, X., Yin, J., Zou, C., Wang, J., Huang, G., Wang, Y., and Shen, J. (2015).
Reconstruction with modular hemipelvic endoprosthesis after pelvic tumor resection: a
report of 50 consecutive cases, PLoS One, 10, e0127263.
Appendice: R Codes
Under the frequentist approach, the following R code is used to implement the model
for survival data with a cure fraction based on the DB distribution, as presented in
subsection 2.3. We used the function maxLik of the maxLik package (Henningsen and
Toomet, 2011) for the maximization of the likelihood function.
# Reading data (Wang et al., 2015)
t <- c(3,7,11,18,22,25,28,32,34,35,35,36,40,40,41,54,66,76,84,88,92)
d <- c(1,1,0,1,0,1,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0)
n <- length(t) # the sample size
K <- 2 # number of parameters
18
L <- sum(log(like))
if (is.na(L)==TRUE) {return(-Inf)} else {return(L)} }
# Obtaining the ML estimates
mle <- c()
mle <- maxLik(logLik=log.f,start=c(0.08,0.6))
summary(mle)
betaDB <-mle$estimate[1]
etaDB <-mle$estimate[2]
s <- vcov(mle)
# The 95% confidence intervals
llimDB <- round(betaDB - qnorm(0.975) * sqrt(s[1,1]),4)
ulimDB <- round(betaDB + qnorm(0.975) * sqrt(s[1,1]),4)
llimDBe <- round(etaDB - qnorm(0.975) * sqrt(s[2,2]),4)
ulimDBe <- round(etaDB + qnorm(0.975) * sqrt(s[2,2]),4)
cat("n = ",n,"\n")
cat("Beta = ",betaDB, "95%CI: (",llimDB,",",ulimDB, ") \n")
cat("Eta = ",etaDB, "95%CI: (",llimDBe,",",ulimDBe, ") \n")
# Calculating AIC, BIC and AICC
aic <- AIC(mle)
bic <- AIC(mle,k = log(n))
aicc <- aic + (2*K^2+2*K)/(n-K-1)
cat("AIC = ",aic,", BIC = ",bic,", AICC = ",aicc,"\n")
This is the R code for the Bayesian model for survival data with a cure fraction
based on the DB distribution, as presented in subsection 2.4:
# The log posterior function
log.post <- function(t,d,parms) {
beta <- parms[1]
eta <- parms[2]
if (parms[1]<0) return(-Inf)
if (parms[2]<0) return(-Inf)
if (parms[2]>1) return(-Inf)
p <- exp(-beta)
St0 <- (3-2*p^(t+1))*p^(2*(t+1))
ft0 <- p^(2*t)*(p-1)*(2*p^t*(p^2+p+1)-3*p-3)
St <- eta + (1-eta)*St0
ft <- (1-eta)*ft0
like <- ft^d * St^(1-d)
log.like <- sum(log(like))
prior <- dgamma(beta,0.001,0.001)*dbeta(eta,1,1)
log.prior <- log(prior)
L <- log.like + log.prior
if (is.na(L)==TRUE) {return(-Inf)} else {return(L)} }
19
# Obtaining the HPD intervals
HPDinterval(posterior, prob = 0.95)
# Geweke z scores
geweke.diag(posterior)
20