2011 IEEE International Symposium on Information Theory Proceedings
Population Size Estimation
Using a Few Individuals as Agents
Farid Movahedi Naini∗ , Olivier Dousse†, Patrick Thiran∗ and Martin Vetterli∗
∗
†
EPFL, Lausanne, Switzerland
Nokia Research Center, Lausanne, Switzerland
Abstract—We conduct an experiment where ten attendees of
an open-air music festival are acting as Bluetooth probes. We
then construct a parametric statistical model to estimate the
total number of visible Bluetooth devices in the festival area. By
comparing our estimate with ground truth information provided
by probes at the entrances of the festival, we show that the total
population can be estimated with a surprisingly low error (1.26%
in our experiment), given the small number of agents compared
to the area of the festival and the fact that they are regular
attendees who move randomly. Also, our statistical model can
easily be adapted to obtain more detailed estimates, such as the
evolution of the population size over time.
I. I NTRODUCTION
Nearly every current mobile phone is equipped with a Bluetooth radio interface, each having a unique MAC address. This
technology was originally designed to replace wires between
electronic devices. In order to ease the peering of devices, it
includes a detection functionality, where enabled devices can
detect each other within a small radius (typically 10-20m).
It has also been observed [1] that a non-trivial fraction of
mobile phone users leave the detection feature of their phone
turned on constantly (“visible mode”), most probably because
the energy autonomy of their phone is not much affected
to attract their attention. A particularly interesting feature is
that when they are in visible mode, phones broadcast their
MAC address, which makes them uniquely identifiable. This
possibility allows therefore to use mobile phones as sensing
devices, and to evaluate different features of a population
related to their mobility patterns. We focus here on a more
specific problem, which is the population size estimation.
In this paper, we only consider the case where measurements are performed by mobile agents that move randomly,
as every other user (the mobile phones carried by “standard”
users), and not by agents who would carefully swipe the
monitored area. Is it possible to estimate with a good accuracy
the size of the population in a closed environment from such
traces? To the best of our knowledge, this is the first effort to
use such measurements for population size estimation.
In order to study the feasibility and accuracy of population
size estimation, we conducted an experiment at Paléo Music
Festival [2] that took place in Nyon, Switzerland in July 2010.
As explained in the next section, this festival provides a good
environment to perform experiments related to population
sampling. We use the obtained data from this experiment as a
basis to benchmark our method.
978-1-4577-0594-6/11/$26.00 ©2011 IEEE
The problem and the solution exposed in this paper are
closely related to problems addressed in some fields such as
ecology, biostatistics and information theory. Ecologists and
biostatisticians are interested in estimating population sizes of
certain animals (refer to [3], [4], [5] for a review). One of
their techniques is called “Capture-Recapture”, where some
of the animals in a population are first caught (by setting
up traps), marked and released. In the recapture process,
some of the animals are captured again and the number of
previously marked animals will provide information that is
used to infer about the population size [6], [7]. Thanks to the
unique Bluetooth MAC address attached to every device, we
can keep a similar record of the individuals who have already
been seen and thus apply similar methods in our setting. In
the field of information theory, alphabet size estimation [8],
pattern likelihood maximization [9], and sequence probability
estimation [10], [11] also address related problems.
In contrast to the above works, we do not place monitoring
devices or traps at given places, and we cannot start and
terminate the measurement campaign at given times. In our
case, the “sensing devices” are carried by regular individuals
from the population, with an uncontrolled, random mobility
pattern, and who arrive and leave the monitored area at
different, random times. Consequently, after describing the
experiment we conducted at Paléo Music Festival and the
obtained measurements in Section II, we develop a method
that factors in these sources of uncertainty in Section III. We
discuss the estimation results in Section IV.
II. E XPERIMENT
A. Experiment description
Paléo Music Festival is one the major music festivals in
Europe, which attracts more than thirty thousand attendees
per day. It is an open-air festival that allows GPS coverage,
and takes place within a closed area with fixed entrance/exit
points. The surface of the festival covers around 120000 m2 .
These characteristics make this festival a good environment
for performing experiments related to population sampling. In
order to have a better understanding of the environment of the
festival, a map is shown in Figure 1. Our idea is to sample the
population by sending some attendees as “agents” inside the
festival. Each agent is equipped with a mobile phone (Nokia
N95) that is programmed to regularly scan for Bluetooth
devices within its range (around 10-20 meters). The phone then
collects Bluetooth MAC addresses of mobile devices that have
2426
DÔME
7
Ecopoint
12
1000
14
17
O
6
15
1
2
3
JURA
18
HES-S
4
11
QUARTIER
DE l'ORIENT
10
WC
E DESS
SOL
PASSAG
RNE
TOU
16
5
WC
LE DÉTOUR
RACCOU
E DU
PASSAG
QUARTIER
LATIN
8
E DES
PASSAGISANS
ART
WC
WC
QUARTIER DE
LA TERRASSE
COIN
PIQUE-NIQUE
ESPACE
MIELIMÉLO
LA TERRASSE
800
RCI
ENTRÉE
WC
WC
CHAPITEAU
ESPACE
Cie CARABOSSE
FORUM
11
LA PL’ASSE
LA PLAT’FORME
CAMPING
LA
RUCHE
QUARTIER DES ALPES
13
WC
11
GRANDE
SCÈNE
WC
8
QUARTIER DU MIDI
WC
400
BANCOMAT
WC
SCÈNES
BARS
STANDS
RESTAURANTS
BOUTIQUES A
PLÉO
WC
STANDS/BARS SPONSORS
600
MOTOS
200
CAISSES
BANCOMAT
CLUB
TENT
WC
9
1
2
3
4
5
6
7
8
9
10
Agent
GARDERIE
LA LUCIOLE
Fig. 2. Number of different Bluetooth devices detected by each agent (left
bar), and the duration of stay (in minutes) for each agent (right bar).
ACCRÉDITATIONS
LA
RU
E
PALÉO
SHOP
WC
BANCOMATS
MOTOS
GARE
MOTOS
Fig. 1. Paléo Music Festival map. The surface covers around 120000 m2 .
Position of the entrance phones is indicated by dark triangular markers.
their Bluetooth visibility turned on. Bluetooth MAC addresses
are unique to each device and can be used as identifiers of
attendees. The goal is to use this information to estimate the
population size of attendees (or the subset of them that carry
visible Bluetooth devices).
In order to have a ground truth of the number of visible
Bluetooth devices at the festival, a regular Bluetooth scanning
is done at the entrances of the festival as well. Two mobile
phones are installed at the main entrance of the festival, and
another phone is installed at the back entrance. The position of
these three mobile phones is shown by markers in Figure 1.
The same gates are used both for the entrance and for the
exit of attendees. Some additional information, such as the
estimated total number of attendees at the festival (obtained
on the basis of the number of sold tickets and counted tickets
at the entrance gates), is also provided by the organizers of
the festival.
In our experiment, ten (unrelated) people were chosen to
take part as agents. Agents’ phones and entrance phones are
programmed to perform Bluetooth scanning every 80 seconds.
The experiment was performed during one day of the festival,
and the duration of the festival on that day was 13 hours.
3) Measurements by agents: The agents were able to detect
2637 out of 3326 Bluetooth devices detected at the festival,
which corresponds to 79.3% of the Bluetooth devices. We
expect this ratio to be less than 100%, because there were
only a few agents present in the festival and the mobile phones
have a short Bluetooth range. Nevertheless, this ratio is pretty
large: 10 agents, spending a few hours at a large area, and
with more than 3300 visible Bluetooth devices, have detected
nearly 80% of them.
Figure 2 shows the number of different Bluetooth devices
detected by each agent, and the time duration of stay (in
minutes) for each agent. As mentioned before, our goal is
to estimate the total number of visible Bluetooth devices at
the festival (3326) based on agents’ Bluetooth traces.
III. M ODEL
A. Data structure and notation
B. Measurements
In this section we discuss the measurements obtained in the
experiment.
1) Preprocessing: The measurements are first preprocessed
in order to discard irrelevant information. For the entrance
phones, we consider only the Bluetooth traces that were
collected during the opening hours of the festival. For the
agents’ phones, we consider only the Bluetooth traces that
were collected during the period when the agents were on
the festival grounds. Using the entrance phones traces, it is
possible to determine the time period during which the agents
were on the festival grounds.
2) Measurements at entrance: 3326 different Bluetooth
devices were detected at the entrance. The estimated number
of attendees given by the organizers of the festival is 40536.
By dividing the number of detected Bluetooth devices at the
entrance by the total number of attendees, we get the approximate percentage of attendees that have visible Bluetooth
devices. This ratio is equal to 8.2%, which is close to the
values reported in the literature (4.7% to 7% in [1])1 .
1) Population: The population is comprised of attendees
with visible Bluetooth devices. We denote its size by N .
We call the population members individuals and use variable
i for indexing them. Denote the festival duration by Tf est .
For simplicity, we shift the time origin such that the festival
opening time is at time 0 and its closing time is at time Tf est .
Let sti and dti denote, respectively, the entrance and departure
times of individual i to the festival; these variables are not
directly observable (at least not for all individuals), and will
be treated as random variables, which are assumed to be i.i.d.
across the population. We denote by f (st, dt) their probability
density distribution (pdf), on which we will elaborate later.
Moreover, let tif rst and tilast denote the first and the last time,
respectively, when individual i has been detected by any of the
agents. This information indicates that individual i has been
on the festival grounds between tif rst and tilast .
2) Agents: We denote the number of agents by M and
A
use variable j for indexing them. Let stA
j and dtj denote
the entrance and departure times of agent j to the festival.
Note that, unlike individuals, agents’ entrance and departure
times are known to us. Let tjsti ,dti denote the duration of time
between the entrance and departure of individual i, which is
overlapped with the entrance and departure of agent j 2 . We
A
have tjsti ,dti = max min(dtA
j , dti ) − max(sti , sti ), 0 .
3) Detection: The data that each agent provides consists of
a list of MAC addresses detected by the agent together with
the corresponding detection times. Denote the total number
1 The ratio is a bit higher probably because the population structure (such
as age) at Paléo is different than the population structure in [1].
2 We assume that when an individual or an agent enters the festival, he stays
on the festival grounds until he departs from the festival.
2427
of detected MAC addresses by S and map the detected MAC
addresses to the set {1, . . . , S}. Note that this mapping is not
unique. Denote by kij the number of times that individual i
PM
has been detected by agent j 3 . Let ni = j=1 kij denote the
total number of times that individual i has been detected. Note
that individual i is observed if and only if ni > 0 (if it has
been detected by at least one of the agents).
B. Likelihood based estimation
Our model is mainly based on the following two assumptions.
• Poisson detection: We assume that the number of times
an agent detects an individual is Poisson distributed.
• Independence: We assume that the detection of any individual by any agent is independent of all other individuals
and agents.
More precisely, we assume that the number kij of times that
agent j detects individual i is a Poisson random variable with
parameter λi tjsti ,dti . In other words, we set the mean number
of detections of individual i by agent j to be proportional to
the amount of time during which both individual i and agent
j are on the festival grounds (tjsti ,dti ) and to a factor specific
to i which we call detection rate (λi ) of individual i.
Moreover, we treat λi as a random variable. We assume that
for individual i, λi is drawn from a Gamma distribution with
parameters α and β, independently from other individuals and
from its arrival and departure time. We use the Gamma prior
because it is a flexible distribution and it is the conjugate prior
of the Poisson distribution. The probability density function of
λi therefore reads: fλi (λi ; α, β) = β α e−βλi λα−1
/Γ(α).
i
The Poisson-mixed model has previously been used in
the literature to address problems related to population size
estimation [12], [13]. In these methods, all the population
members (animals for example) are vulnerable to the sampling process (traps for example) for the entire duration of
experiment. However, in our experiment, this assumption does
not hold, and we account for this by using the pdf f (dt, st).
Some other methods [7], [3] could be applied to this problem,
but they will only account for whether individual i has been
detected by agent j or not. In other words, they only take
into account 11{kij >0} and not kij . These methods attack
the problem by modeling the detection probability of an
individual. A limitation of this approach is that the detection
probability of an individual does not linearly scale with time
and hence the effect of time cannot be readily included. In
contrast, in the Poisson model, the average number of times
agent j detects individual i scales linearly with time, as
one would expect. Moreover, parameters λi and tjsti ,dti have
meaningful interpretations.
In order to derive the estimator for N , we compute the
probability of observing the obtained measurements under
the model described above with parameters N, α, β. This is
usually called the likelihood function. We then pick the set
of parameters, in particular N , that maximize this likelihood.
The likelihood function has the following form:
S
Y
N
N −S
L(N, α, β) =
(1 − pdet (α, β))
(1)
·
Pi ,
S
|
{z
} i=1
| {z }
L1 (N,α,β)
L2 (α,β)
where pdet and Pi are given below.
The first term (L1 ) is related to the likelihood of the
unobserved individuals, and the second term (L2 ) is related
to the likelihood of the pattern of the observed individuals.
We discuss below each part of the likelihood function.
(st,dt,λ)
be the
1) Likelihood of the unobserved: Let pdet
probability of observing an individual having detection rate
λ, and entrance and departure times st, dt. Using the Poisson
detection assumption, we have
(st,dt,λ)
pdet
=1−
M
Y
j
e−λtst,dt = 1 − e−λ
PM
j
j=1 tst,dt
.
(2)
j=1
Since λ, st and dt are random variables, we compute the
expectation of this probability over (st, dt, λ):
#α %
"!
β
.
(3)
pdet (α, β) = 1 − Est,dt
P
β + j tjst,dt
The likelihood of the unobserved individuals is equal to the
probability of not observing N − S of the individuals:
N
N −S
L1 (N, α, β) =
(1 − pdet (α, β))
N −S
"!
#α %#N −S
!
N
β
=
Est,dt
.
PM
S
β + j=1 tjst,dt
(4)
2) Likelihood of the observed: We first compute the probability of the observed pattern of detection by each agent for
one of the observed individuals. Given that individual i has
detection rate λ and entrance and departure times st, dt, the
probability for him to be detected kij times by agent j for
j = 1, . . . , M , with tif rst > st and tilast < dt, is
(st,dt,λ)
Pi
=
M
Y
j=1
j
e−λtst,dt
(λtjst,dt )kij
kij !
11{st<tif rst ,dt>tilast } . (5)
Again taking expectations, we get
kij
M
Γ(α + ni )β α 11{st<tif rst ,dt>tilast } Y
(tjst,dt )
.
Pi = Est,dt
PM
kij !
Γ(α)(β + j=1 tjst,dt )α+ni j=1
(6)
The second part of the likelihood is equal to the probability
of the observed pattern for all the observed individuals. Using
the independence assumption we have
3 As
Bluetooth scanning is performed every 80 seconds, if we observe a
burst of repeated detections of individual i by agent j, we only consider the
first detection of the burst.
2428
L2 (α, β) =
S
Y
i=1
Pi .
(7)
3) Maximum likelihood estimator: We define the maximum
likelihood estimators for N, α, β as
(N̂ , α̂, β̂) = arg max log L(N, α, β).
Parameter
α̂
β̂
p̂det (α, β)
N̂
(N − N̂ )/N
(8)
N,α,β
Where L(N, α, β) is the full likelihood given by (1), (4) and
(7). N̂ is the maximum likelihood estimator for the population
size.
C. Estimating the total number of attendees
Remember that N is the number of attendees who carry
visible Bluetooth devices. By applying the ratio of attendees
that have visible Bluetooth devices to the estimated N , we can
estimate the total number of attendees. Let NT ot be the total
number of attendees and let r be the ratio of attendees carrying
visible Bluetooth devices: r = N/NT ot . Let N̂ = N (1 + ∆N )
and r̂ = r(1 + ∆r) be the estimates for N and r, respectively,
with relative errors equal to ∆N and ∆r. If |∆N | ≪ 1 and
|∆r| ≪ 1 then,
N̂T ot =
N̂
N (1 + ∆N )
=
≈ NT ot (1 + ∆N − ∆r),
r̂
r(1 + ∆r)
which means in the worst case, the relative error in estimating
the total number of attendees is approximately equal to the
sum of the relative errors in estimating N and r.
Choice of f (st, dt)
f1 (st, dt)
f2 (st, dt)
f3 (st, dt)
1.588
1669.4
0.850
3104
6.67%
1.994
1653.9
0.796
3314
0.36%
1.935
1624.2
0.803
3284
1.26%
TABLE I
C OMPARISON OF THE ESTIMATED POPULATION SIZE WITH THE GROUND
TRUTH (N = 3326) FOR THREE DIFFERENT DISTRIBUTIONS OF
ENTRANCE AND DEPARTURE TIMES .
Method
Mth in [7]
[8]
N̂
(N − N̂ )/N
3013
2676
9.46%
19.54%
TABLE II
R ESULT OF APPLYING THE ESTIMATORS IN [7], [8] TO THE
MEASUREMENTS .
For generating a valid (st, dt), we draw an entrance time
and a positive duration of stay according to the described
distributions; the departure time is accepted only if it is smaller
than Tf est .
B. Estimating the population size
IV. R ESULTS
In this section we discuss some results from the application
of our model to the data. We first elaborate on the choice of
the model for arrival and departure times f (st, dt).
A. Choice of f (st, dt)
We use three different entrance and departure times distributions which we discuss below.
1) Deterministic: One extreme choice for f (st, dt) is a
deterministic entrance time and departure time for all the
individuals. We choose f1 (st, dt) = δ(st)δ(dt − Tf est ), where
δ(·) is the Dirac function. This distribution assumes that all
the individuals enter at the beginning of the festival (time 0)
and leave at the end of the festival (Tf est ), similarly to the
studies in [7], [12], [13].
2) Estimated actual distribution: The opposite extreme
choice for f (st, dt) is to use the Bluetooth traces obtained
from entrance phones to estimate the distribution of f (st, dt).
This information is in general not available, but is used in
our experiment for benchmarking purposes. Recall that the
entrance phones perform a Bluetooth scanning at the entrance
gates; as a result, they measure entrance and departure times
for all individuals. After observing entrance phones traces, we
computed the empirical distribution of f (st, dt).
3) Low informative: In practice, we do not have detailed
enough information of entrance and departure times to estimate f (st, dt). We assume that individuals enter uniformly
at random between the start of the festival until the midtime of the festival. In other words, st ∼ U(0, Tf est /2). We
also assume that the duration of stay for each individual is
N (Tf est /2, 2hours ) and is independent of the entrance time.
For each of the three pdf f (st, dt) described above, we
maximized the full likelihood give in (8) using numerical
methods. The result is given in Table I.
We observe in the table that the naive choice of deterministic
entrance and departure times gives a relatively large undershoot. An explanation for this undershoot is that based on
f1 (st, dt), all the individuals are in contact with all the agents,
and hence the overlap time between agents and individuals
is overestimated. The detection probability is overestimated,
which results in an undershoot. By using a probabilistic
f (st, dt) instead, individuals are on average in contact with
the agents for a smaller time duration, hence the detection
probability decreases and we have an increase in the estimated
population. We also observe that by estimating f (st, dt) from
the entrances traces, we get surprisingly close to the true value
(N = 3326). Finally, the low informative f3 (st, dt) gives a
reasonably good result.
We compare our method with the capture-recapture method
described in [7] and with the method in [8]. The results
are shown in Table II. Both methods exhibit an undershoot.
Remember that the time duration which each individual is
vulnerable to the sampling process is random (according to its
entrance and departure time), which is not taken into account
in [7]. Therefore, the result has an undershoot similar to our
method for the choice of f1 (st, dt). The method in [7] assumes
uniform sampling of the population, which is not valid in our
experiment and is the reason for the undershoot. We remark
that the approximation used in the estimator in [8] is not valid
for our measurements, thus we have used the exact expression.
2429
3000
S(t)
2500
E2 [S(t)]
2000
E1 [S(t)]
E3 [S(t)]
1500
1000
500
0
3PM
6PM
9PM
12AM
3AM
Fig. 3. The dashed line is the cumulative number of individuals detected over
time (S(t) defined in IV-C). The time axis is shifted to the opening/closing
hours of the festival. The solid lines Ei [S(t)] are the average of S(t),
computed using the distribution fi (st, dt).
C. Average detected individual versus time
One way to compare the method against the actual traces
is to look at the evolution of expected number of detected
individuals versus time. Recall that the total number of detected individual is denoted by S. We denote by S(τ ) the
total number
PN of individuals detected by agents up to time τ :
S(τ ) = i=1 11{tif rst ≤τ } . In particular, S = S(Tf est ). The
obtained value of S(τ ) based on agents’ traces is the dashed
line plotted in Figure 3. We observe that S(τ ) is zero before
any agent enters the festival, and then rapidly grows.
Based on the model, we can estimate E[S(τ )] as follows.
By linearity of expectation we have E[S(τ )] = N P[tf rst ≤ τ ],
where P[tf rst ≤ τ ] is the probability for an individual to be
detected by at least one agent before time τ . For any value
of 0 ≤ τ ≤ Tf est , the agents can be categorized into two
types. Type I agents are those who enter the festival after time
τ . These agents cannot detect any individual before time τ .
Type II agents are the remaining agents who enter the festival
before time τ . Type II agents can detect an individual before
time τ relative to the duration of time they stay on the festival
up to time τ . In fact by setting dtA = τ for type II agents,
we can use (3) to estimate P[tf rst ≤ τ ]. We do this for the
three choices of f (st, dt) and using the estimated α and β
that are given in Table I. E[S(τ )] is then equal to N̂ P[tf rst ≤
τ ], where N̂ is the estimated population size. The results are
plotted in Figure 3. Note that the dashed line in Figure 3 is one
realization of S(τ ). However, the solid lines are expectations
of S(τ ) based on the model for three different f (st, dt). We
observe that the solid lines follow the dashed line closely,
and that the model can predict the time evolution of S(τ ).
Similarly, by restricting agents’ entrance and departure times
to a particular time interval, it is straightforward to use the
method to estimate the size of the population present at the
festival at that time interval.
V. C ONCLUSION & FUTURE
WORK
In this paper we introduced a novel application that exploits
the opportunistic contacts between mobile devices, namely,
population size estimation by using mobile devices to sample
a population. In order to test the feasibility of this method, we
conducted an experiment at Paléo Music Festival. We derived
a model to estimate the population of people that carry visible
Bluetooth devices. We observed that the resulting estimate
is surprisingly close to the ground truth, even with a small
number of agents.
Furthermore, the model that we presented can be easily
applied to specific parts of the collected data in order to obtain
more specific estimates. For example, a simple extension allows to estimate the population size at different time intervals.
We believe that similar extensions can be made to estimate
the population size in different areas of the festival, provided
that we include some information about the agent’s location
in the dataset. Although having an estimate for the number
of attendees requires the knowledge of the ratio of visible
Bluetooth devices, some population characteristics such as the
relative density of attendees in different time periods or in
different areas of the festival scale linearly with the size of the
subset of visible Bluetooth devices. Therefore, the method can
be used to study such population characteristics. Our future
work will focus on the inclusion of location information and
local estimates.
ACKNOWLEDGMENT
We thank Stéphane Szilagyi for his help in organizing the
experiment at Paléo, Pascal Viot for providing us with an
estimate of the total attendees of the festival, Julien Eberle for
his help in programming the mobile phones, and the agents
for participating in the experiment. This work was supported
by the Nokia Research Center grant ‘Accidental sampling’.
R EFERENCES
[1] R. Jose, N. Otero, S. Izadi, and R. Harper, “Instant places: Using bluetooth for situated interaction in public displays,” Pervasive Computing,
IEEE, vol. 7, no. 4, pp. 52–57, 2008.
[2] http://www.paleo.ch.
[3] J. Bunge and M. Fitzpatrick, “Estimating the number of species: A
review,” Journal of the American Statistical Association, vol. 88, no.
421, pp. pp. 364–373, 1993.
[4] I. J. Good, “The population frequencies of species and the estimation of
population parameters,” Biometrika, vol. 40, no. 3/4, pp. pp. 237–264,
1953.
[5] C. J. Schwarz and G. A. F. Seber, “Estimating animal abundance: Review
iii,” Statistical Science, vol. 14, no. 4, pp. 427–456, 1999.
[6] A. Chao, “An overview of closed capture-recapture models,” Journal of
Agricultural, Biological, and Environmental Statistics, vol. 6, no. 2, pp.
pp. 158–175, 2001.
[7] S. Lee and A. Chao, “Estimating population size via sample coverage
for closed capture-recapture models,” Biometrics, vol. 50, no. 1, pp. pp.
88–97, 1994.
[8] A. Orlitsky, N. Santhanam, and K. Viswanathan, “Population estimation
with performance guarantees,” in ISIT’07, 2007, pp. 2026–2030.
[9] J. Acharya, A. Orlitsky, and S. Pan, “The maximum likelihood probability of unique-singleton, ternary, and length-7 patterns,” in ISIT’09,
pp. 1135–1139.
[10] G. M. Gemelos and T. Weissman, “On the entropy rate of pattern
processes,” IEEE Trans. on Info. Theory, vol. 52, no. 9, pp. 3994–4007,
2006.
[11] A. B. Wagner, P. Viswanath, and S. R. Kulkami, “A better good-turing
estimator for sequence probabilities,” in ISIT’07, 2007.
[12] A. Chao and J. Bunge, “Estimating the number of species in a stochastic
abundance model,” Biometrics, vol. 58, no. 3, pp. 531–539, 2002.
[13] J. Wang, “Estimating species richness by a poisson-compound gamma
model,” Biometrika, vol. 97, no. 3, pp. 727–740, 2010.
2430