Moments of random variables:
a systems-theoretic interpretation
Alberto Padoan, Member, IEEE, and Alessandro Astolfi, Fellow, IEEE
an analytic function. The representation of an analytic function
through its Taylor series [9] allows to know the whole function
once all the derivatives (at a point) are specified. Similarly, the
set of all moments of a random variable uniquely identifies the
probability distribution of the random variable, provided that
certain conditions are satisfied. This means that the essential
features of a random variable can be captured by its moments,
which can be then used as an alternative description of the
random variable.
Moments have been used in a number of different contexts in
probability theory, including the Stieltjes moment problem [10],
the Hamburger moment problem [11], the Hausdorff moment
problem [12] and the Vorobyev moment problem [13]. Moments
Index Terms— Differential equations, interpolation, non- have proved instrumental in the first rigorous (yet incomplete)
linear systems, probability, random variables, statistics, proof of the central limit theorem [14], but also in the proof of
transfer functions.
the Feynman-Kac formula [15], in the characterisation of the
eigenvalues of random matrices [16], and in the study of birthdeath processes [17]. The correspondence between moments
I. I NTRODUCTION
and probability distributions is also exploited in statistics to fit
One of the most important and challenging problems in curves and to design parameter estimation procedures [1 – 4].
probability theory and statistics is that of determining the For example, the method of moments introduced in [18] takes
probability distribution of the random process that is thought advantage of this correspondence to build consistent (usually
to have produced a given set of data. Without making any biased) estimators. Approximations of probability density
assumptions on the underlying random process, this problem functions may be computed exploiting such correspondence
is extremely difficult in general. In parametric statistics [1 – 4], through Pearson’s and Johnson’s curves [2]. A revisitation of
where the probability distribution which generates the given this correspondence has also led to the exploratory projection
data is assumed to belong to a family of probability distributions pursuit [19 – 21], a graphical visualisation method for the
parameterized by a fixed set of parameters, the problem can interpretation of high-dimensional data. While this list of
be dealt with using different approaches. A classical way [1 – examples is far from being exhaustive, it illustrates the central
4] to solve this problem is to find mathematical objects that, role of moments in a number of interesting problems. Further
under specific assumptions, uniquely identify a probability detail on theory and applications of moments can be found,
distribution in the family. In probability theory [5 – 8], a similar e.g., in [22 – 24].
necessity arises when one is confronted with the problem of
This work represents a step towards bridging the gap between
specifying uniquely a probability measure through a sequence the notions of moment in probability theory and in systems
of numbers. The determination of simple, yet meaningful, theory. Connections are first established between moments
objects with these features is therefore of paramount importance. of random variables and moments of systems [25]. These
The significance of the moments of a random variable in this connections are then used to support the claim that a system
context is comparable to that of the derivatives (at a point) for can be seen as an alternative, equivalent description of the
probabilistic structure of a random variable. This, in turn,
This work has been partially supported by the European Union’s Horiindicates that systems-theoretic techniques and tools can be
zon 2020 Research and Innovation Programme under grant agreement
No 739551 (KIOS CoE).
used to revisit and shed new light on classical results of
A. Padoan
is
with
the
Department
of
Engineering, probability theory and statistics.
University of Cambridge, Cambridge CB2 1PZ, UK (email:
The benefits of a dialogue between the two areas of research
[email protected]).
A. Astolfi is with the Department of Electrical and Electronic Engi- has been highlighted, e.g., in [26] and briefly mentioned in [27].
neering, Imperial College London, London SW7 2AZ, UK and with the The literature on the analysis of results of probability theory
Dipartimento di Ingegneria Civile e Ingegneria Informatica, Università
di Roma “Tor Vergata”, Via del Politecnico 1, 00133, Rome, Italy (email: in systems-theoretic terms, however, appears to be limited to
[email protected]).
specific topics and scattered across different research fields.
Abstract— Moments of continuous random variables admitting a probability density function are studied. We show
that, under certain assumptions, the moments of a random
variable can be characterised in terms of a Sylvester equation and of the steady-state output response of a specific
interconnected system. This allows to interpret well-known
notions and results of probability theory and statistics in
the language of systems theory, including the sum of independent random variables, the notion of mixture distribution and results from renewal theory. The theory developed
is based on tools from the center manifold theory, the theory of the steady-state response of nonlinear systems, and
the theory of output regulation. Our formalism is illustrated
by means of several examples and can be easily adapted to
the case of discrete and of multivariate random variables.
1
Linear systems theory has allowed to reinterpret basic results the notion of mixture distribution [7 , 68], and results from
of probability theory for probability density functions with renewal theory [6 , 7 , 69 , 70]. Given the conceptual nature of
rational Laplace transform, e.g., in [28 , 29]. Concepts and the paper, we focus on linear systems, linear time-delay systems
tools of systems theory have been used to investigate phase- and systems in explicit form to streamline the exposition,
type distributions [30 , 31] and matrix exponential distributions although generalisations to further classes of systems are
[32]. The study of the properties of these distributions have possible. The formalism is developed for univariate continuous
grown into a well-established area of research which goes random variables, but can be easily extended to discrete and
under the name of “matrix analytic methods” [33 , 34]. Another multivariate random variables.
well-known class of models which has attracted the interest
Preliminary results have been presented in [71], where
of systems theorists over the past century is that of Markov a first connection between moments of probability density
chains [5 , 6 , 8 , 35 – 37]. One of the reasons for this interest is functions and moments of linear time-invariant systems has
that these models can be regarded as a special class of positive been established. As a result, the moment generating function
systems [38], which can be studied exploiting the theory of non- of a random variable and the solution of a Sylvester equation
negative matrices [39 – 42]. Queuing systems have been studied have been shown to be closely related. The present manuscript
borrowing analysis and design tools from systems theory, extends our results to probability density functions which may
e.g., in [43 , 44], whereas a systems-theoretic approach has not be described by means of a linear time-invariant system.
been adopted to study risk theory, e.g., in [45 , 46]. In circuits To this end, the notion of moment of a system is revisited and
theory, moments have been used for the approximation of the extended to systems the impulse response of which is defined
propagation delay in a RC tree network by interpreting the on the whole real line or on a compact subset of the real
impulse response of a circuit as a probability density function line. The theory is supported by several worked-out examples
[47]. Finally, we emphasise that a number of fundamental tools and applications to identifiability, queueing theory and renewal
for modelling, estimation and prediction may not be sharply theory.
categorised as belonging to the sphere of probability theory or
The rest of the paper is organized as follows. Section II
to that of systems theory, but rather lie at the intersection
provides basic definitions and the necessary background
of these areas of research. The Kalman filter [48 , 49] is
concerning moments of linear systems, of time-delay systems,
perhaps the most successful example of exchange of ideas
of systems in explicit form and of random variables. Section III
between probability theory and systems theory, but several
contains our main results and includes a characterisation of
other examples can be found (see, e.g., [50 – 56] and references
the moments of a random variable admitting a (linear or
therein).
explicit) realization in terms of the systems-theoretic notion of
The main objective of this work is to establish a one-to-one
moment. The moments of probability density functions defined
correspondence between moments of random variables and
on the real axis as well as probability density functions with
moments of systems [25]. This goal is readily achieved by
compact support are also characterised by means of Sylvester
interpreting probability density functions as impulse responses
equations and steady-state responses. Section IV provides
whenever these can be realized by linear time-invariant systems.
selected applications of the theoretical results developed,
The situation is more delicate when a probability density
including the sum of independent random variables, the notion
function can be only realized, e.g., by means of a linear
of mixture distribution, and results from renewal theory. Finally,
time-varying system or a system in explicit form [57 , 58],
conclusions and future research directions are outlined in
for which the interpretation in terms of impulse responses
Section V.
does not possess a direct counterpart. This issue is overcome
Notation: Z≥0 and Z>0 denote the set of non-negative
exploiting recent developments in systems theory [58 – 63],
integer
numbers and the set of positive integer numbers,
where the moments of a linear system have been characterised
respectively.
R, Rn and Rp×m denote the set of real numbers,
as solutions of a Sylvester equation [59 , 60] and, under certain
of
n-dimensional
vectors with real entries and of p × mhypotheses, as steady-state responses of the output of particular
dimensional
matrices
with real entries, respectively. R≥0 and
interconnected systems [61 – 63]. The characterisation of the
R
denote
the
set
of
non-negative real numbers and the
>0
moments of a system in terms of steady-state responses is based
set
of
positive
real
numbers,
respectively. C denotes the
on tools arising in the center manifold theory [64], the theory
C
set
of
complex
numbers.
<0 denotes the set of complex
of the steady-state response of a nonlinear system [65], and the
numbers
with
negative
real
part.
i denotes the imaginary unit.
output regulation theory for nonlinear systems [66]. Existing
I
denotes
the
identity
matrix.
e
j denotes the vector with the
results on moments of linear and nonlinear systems [61 – 63]
j-th
entry
equal
to
one
and
all
other entries equal to zero.
are extended and the notion of moment for systems which
σ(A)
denotes
the
spectrum
of
the
matrix A ∈ Rn×n . M ′
only possess a representation in explicit form is revisited [58].
p×m
. kxk denotes
As a direct by-product of our approach, Sylvester equations denotes the transpose of the matrix M ∈ R
n
the
standard
Euclidean
norm
of
the
vector
x
∈
R
while, with
and Sylvester-like differential equations may be interpreted
some
abuse
of
notation,
kAk
denotes
the
induced
norm
of the
as powerful computational tools which allow to calculate
n×n
k
matrix
A
∈
R
.
∆
denotes,
for
all
k
∈
Z
,
the
standard
≥0
n
o
moments of random variables. Our approach allows to revisit
k+1
k
and re-interpret well-known notions and results of probability k-simplex, i.e. ∆ = w ∈ R≥0 : w1 + . . . + wk+1 = 1 .
theory and statistics using the language of systems theory, diag(λ) denotes the diagonal matrix whose diagonal elements
including the sum of independent random variables [6 , 8 , 67], are the entries of the vector λ ∈ Rn . J0 denotes the matrix
2
where Ψk ∈ R(k+1)×(k+1) is a signature matrix1 and
Υk ∈ R(k+1)×n is the unique solution of the Sylvester equation
with ones on the superdiagonal and zeros elsewhere. Jλ
denotes the Jordan block associated with the eigenvalue λ ∈ C,
i.e. Jλ = λI + J0 . k! and (k)!! denote the factorial and the
double factorial of k ∈ Z≥0 , respectively. ẋ denotes the time
derivative of the function x, provided it exists. f1 ∗ f2 denotes
the convolution of the functions f1 and f2 . L{f } denotes
the bilateral Laplace transform of the function f . δ0 and
δ−1 denote the Dirac δ-function and the right-continuous
Heaviside unit step function, respectively. The time reversal
of the function f : R → R is defined as t 7→ f (−t). δ−1
denotes the right-continuous time reversal of δ−1 . 1S denotes
the indicator function of the subset S of a set X , i.e. the
function 1S : X → {0, 1} defined as 1S (x) = 1 if x ∈ S
and as 1S (x) = 0 if x 6∈ S. By a convenient abuse of
notation, 1 also denotes the vector the elements of which
are all equal to one. The translation of the function f :
R → R by τ ∈ R is defined as fτ : t 7→ f (t − τ ). C[−T, 0]
denotes the set of continuous functions mapping the interval
[−T, 0] into Rn with the topology of uniform convergence.
For all s⋆ ∈ C and A(s) ∈ Cn×n , s⋆ ∈ C \ σ(A(s)) denotes
det(s⋆ I − A(s⋆ )) 6= 0, while σ(A(s)) ⊂ C<0 indicates that
if s⋆ ∈ C is such that det(s⋆ I − A(s⋆ )) = 0, then s⋆ ∈ C<0 .
E[X] denotes the expectation of the random variable X.
Js⋆ Υk + ek+1 C = Υk A.
(3)
Lemma 2. Consider system (1). Assume k ∈ Z≥0 and
s⋆ ∈ C \ σ(A). Then there exists a one-to-one correspondence
between the moments η0 (s⋆ ), η1 (s⋆ ), . . . , ηk (s⋆ ) and the matrix Υk B, where Υk ∈ R(k+1)×n is the unique solution of the
Sylvester equation
SΥk + M C = Υk A,
(4)
in which S ∈ R(k+1)×(k+1) is a non-derogatory2 matrix with
characteristic polynomial
χ(s) = (s − s⋆ )k+1
(5)
and M ∈ Rk+1 is such that the pair (S, M ) is reachable.
The moments of system (1) can be also described under
special circumstances by means of the (well-defined) steadystate output response3 of the interconnection of system (1) with
a system described by equations
ω̇ = Sω + M v,
d = ω,
(6)
with ω(t) ∈ Rk+1 , v(t) ∈ R, d(t) ∈ Rk+1 and v = y, i.e. of
the interconnected system
II. P RELIMINARIES
ẋ = Ax + Bu, ω̇ = Sω + M Cx, d = ω,
(7)
This section contains a short digression on the notion of
(k+1)×(k+1)
is a non-derogatory matrix with
moment, which is used with different meanings in the literature in which S ∈ R
k+1
is such that the pair
and throughout the paper. To give a compact and self-contained characteristic polynomial (5), M ∈ R
exposition, we first provide some material on the notion of (S, M ) is reachable, as detailed by the following statement,
moment of a linear system [25 , 59 – 62]. We then illustrate and which is a variation of [62, Theorem 1]. In what follows,
extend notions and results concerning moments of a system in we tacitly assume that the pair (S, M ) always satisfies these
explicit form [58] and of a time-delay systems [72]. Finally, properties.
we recall the notion of moment of a random variable [73].
Theorem 1. Consider system (1), system (6), and
the interconnected system (7). Assume σ(A) ⊂ C<0 ,
⋆
s
∈ C \ σ(A), x(0) = 0, ω(0) = 0 and u = δ0 . Then there
A. Moments of a linear system
exists a one-to-one correspondence4 between the moments
Consider a single-input, single-output, continuous-time, η (s⋆ ), η (s⋆ ), . . . , η (s⋆ ) and the (well-defined) steady-state
0
1
k
linear, time-invariant system described by the equations
response of the output d of system (7).
ẋ = Ax + Bu,
y = Cx,
(1)
B. Moments of a time-delay system
in which x(t) ∈ Rn , u(t) ∈ R, y(t) ∈ R and A ∈ Rn×n , B ∈
Rn×1 and C ∈ R1×n are constant matrices. Throughout the
paper we assume that the system (1) is minimal, i.e. reachable
and observable, and let W (s) = C(sI − A)−1 B be its transfer
function.
Consider a single-input, single-output, continuous-time, linear, time-delay system with discrete constant delays described
by equations
ẋ(t) =
Definition 1. [61] The moment of order zero of system (1) at
s⋆ ∈ C \ σ(A) is the complex number η0 (s⋆ ) = W (s⋆ ). The
moment of order k ∈ Z>0 of system (1) at s⋆ ∈ C \ σ(A) is
the complex number
(−1)k dk
⋆
ηk (s ) =
W (s)
.
(2)
k!
dsk
s=s⋆
ς
X
j=0
Aj x τ j +
̺
X
j=ς+1
Bj uτj , y(t) =
ς
X
Cj xτj ,
(8)
j=0
with x(θ) = ϕ(θ) for −T ≤ θ ≤ 0, in which x(t) ∈ Rn ,
u(t) ∈ R, y(t) ∈ R, τ0 = 0, τj ∈ R>0 , for 1 ≤ j ≤ ̺,
1A
signature matrix is a diagonal matrix with ±1 on the diagonal [74].
matrix is non-derogatory if its characteristic and minimal polynomials
coincide [41].
3 The notion of steady-state response is taken from [75] (see
also [65 , 66 , 76]). Note that if system (1) is stable and the signal u is bounded
backward and forward in time, system (1) has a unique, well-defined steadystate response.
4 The terminology is borrowed from [61 , 62]. By one-to-one correspondence
we mean that the steady-state response of the output d is uniquely determined
by the moments and vice versa.
2A
The moments of system (1) can be characterised in terms of
the solution of a Sylvester equation [59 – 62]. The following
statements provide a version of the results presented in [61 , 62].
Lemma 1. Consider system (1). Assume k ∈ Z≥0 and
′
s⋆ ∈ C \ σ(A). Then [ ηk (s⋆ ) · · · η1 (s⋆ ) η0 (s⋆ ) ] = Ψk Υk B,
3
T = max0≤j≤̺ τj ,
ϕ ∈ C[−T, 0],
and
Aj ∈ Rn×n ,
n×1
1×n
Bj ∈ R
and Cj ∈ R
are constant matrices for
0 ≤ j ≤ ς and ς + 1 ≤ j ≤ ̺, respectively. Throughout the
paper the system (8) is assumed to be minimal, i.e. reachable
and observable, and let W (s) = C(s)(sI −PA(s))−1 B(s)
ς
be its Ptransfer function, with A(s)
Aj e−τj s ,
Pς = j=0
̺
−τj s
−τj s
B(s) = j=ς+1 Bj e
and C(s) = j=0 Cj e
.
Definition 2. [72] The moment of order zero of system (8) at s⋆ ∈ C \ σ(A(s)) is the complex number
η0 (s⋆ ) = W (s⋆ ). The moment of order k ∈ Z>0 of system (8)
at s⋆ ∈ C \ σ(A(s)) is the complex number
(−1)k dk
⋆
W (s)
.
(9)
ηk (s ) =
k!
dsk
s=s⋆
This, in turn, allows to define a notion of moment for systems
in explicit form [77]. To this end, the following technical
assumptions are needed.
Assumption 1. The matrix Λ(t, t0 ) is non-singular for every
t ≥ t0 .
Assumption 2. The function t 7→ Λ̇(t, t0 )Λ(t, t0 )−1 is piecewise continuous.
Assumption 3. There exist K ∈ R≥0 and α ∈ R>0 such that
kΛ(t, t0 )k ≤ Ke−α(t−t0 ) for every t ≥ t0 .
Assumption 4. The point s⋆ ∈ C is such that Re(s⋆ ) < −α.
Assumption 5. The entries of Λ have a strictly proper Laplace
transform.
Assumption 6. The pair (x0 , C) is such that the poles of the
A theory of moments can be also developed for time-delay Laurent series associated with L{CΛx0 } and L{Λ} coincide.
systems exploiting a particular Sylvester-like equation [72].
Remark 1. The manifold
For completeness, we briefly formulate the results of [72] in a
form that is more convenient for our purposes.
M = (x, ω, t) ∈ Rn+k+2 : ω(t) = Υ(t)x(t) ,
(14)
Lemma 3. Consider system (8). Assume k ∈ Z≥0 and
is an invariant integral manifold6 of system (13) whenever
′
⋆
sP
∈ C \ σ(A(s)). Then [ ηk (s⋆ ) · · · η1 (s⋆ ) η0 (s⋆ ) ] =
the function Υ : R → R(k+1)×n satisfies (except at points of
̺
−Js⋆ τj
Υk Bj , where Ψk ∈ R(k+1)×(k+1) is a
j=ς+1 Ψk e
discontinuity) the ordinary differential equation
signature matrix and Υk ∈ R(k+1)×n is the unique solution
Υ̇(t) = SΥ(t) − Υ(t)Λ̇(t, t0 )Λ(t, t0 )−1 + M C,
(15)
of the equation
J s⋆ Υ k +
ς
X
ek+1 Cj e−Js⋆ τj =
ς
X
Aj Υk e−Js⋆ τj .
the solution of which is
(10)
S(t−t0 )
j=0
j=0
Z t
Υ(t0 )Λ(t0 , t) + eS(t−ζ) M CΛ(ζ, t)dζ, (16)
Υ(t) = e
t0
Lemma 4. Consider system (8). Assume k ∈ Z≥0 and
⋆
(k+1)×n
s ∈ C \ σ(A(s)). Then there exists a one-to-one correspon- with initial condition Υ(t0 ) ∈ R
, for all t ≥ t0 .
△
⋆
⋆
⋆
η
(s
),
η
(s
),
.
.
.
,
η
(s
)
and
dence between
the
moments
0
1
k
P̺
The following result is instrumental to define the notion of
the matrix j=ς+1 e−Sτj Υk Bj , where Υk ∈ R(k+1)×n is the
moment for system (12) and is adapted to the purposes of this
unique solution of the equation
paper from a statement originally presented in [77].
ς
ς
X
X
Theorem 2. Consider system (6), system (12) and the interconSΥk +
Aj Υk e−Sτj ,
(11)
M Cj e−Sτj =
nected system (13). Suppose Assumptions 1, 2, 3 and 4 hold.
j=0
j=0
Then there exists a unique matrix Υ∞ (t0 ) ∈ R(k+1)×n such
in which S ∈ R(k+1)×(k+1) is a non-derogatory matrix with that limt→∞ kΥ(t) − Υ∞ (t)k = 0 for any Υ(t0 ) ∈ R(k+1)×n ,
characteristic polynomial (5) and M ∈ Rk+1 is such that the where Υ and Υ∞ are the solutions of (15) with initial
pair (S, M ) is reachable.
conditions Υ(t0 ) and Υ∞ (t0 ), respectively. Moreover, the
manifold (14) is an attractive invariant integral manifold of
system (13).
C. Moments of a system in explicit form
Consider a single-output, continuous-time system in explicit
form5 described by equations
x(t) = Λ(t, t0 )x0 ,
y(t) = Cx(t),
Proof. To begin with note that by Assumption 1 the righthand side of (15) is well-defined. Let Υ1 and Υ2 be the
solutions of (15) (except at point of discontinuity) corresponding to the initial conditions Υ1 (t0 ) ∈ R(k+1)×n and
Υ2 (t0 ) ∈ R(k+1)×n . Defining the function E as the difference of Υ1 and Υ2 and differentiating with respect to
the time argument yields the ordinary differential equation
Ė(t) = SE(t) − E(t)Λ̇(t, t0 )Λ(t, t0 )−1 the solution of which,
in view of (16), is E(t) = eS(t−t0 ) E(t0 )Λ(t0 , t). By Assumptions 3 and 4, this implies limt→∞ kE(t)k = 0. As
a result, there exists a solution Υ∞ to which every solution of (15) converges asymptotically, i.e. there exists
Υ∞ (t0 ) ∈ R(k+1)×n such that limt→∞ kΥ(t) − Υ∞ (t)k = 0
for any Υ(t0 ) ∈ R(k+1)×n , where Υ and Υ∞ are the solutions
of (15) with initial conditions Υ(t0 ) and Υ∞ (t0 ), respectively.
(12)
n
in which x(t) ∈ R , y(t) ∈ R, t0 ∈ R≥0 , x(t0 ) = x0 ∈ Rn ,
C ∈ R1×n and Λ : R × R → Rn×n is piecewise continuously
differentiable in the first argument and such that Λ(t0 , t0 ) = I.
In analogy with the theory developed for linear systems,
a characterisation of the steady-state output response of the
interconnection of system (6) and of system (12), with v = y,
i.e. of the system
x(t) = Λ(t, t0 )x0 , ω̇ = Sω + M Cx, d = ω,
(13)
can be given in terms of the solution of a matrix differential
equation which plays the role of the Sylvester equation (4).
5 The
6 See
terminology is borrowed from [57 , 58].
4
[78].
A direct computation shows that the moment of order k ∈ Z>0
of the random variable X satisfies the relation µk = λk µk−1 ,
N
with µ0 = 1, and hence µk = λk!k .
Moreover, by Assumption 2, Υ∞ is unique. To complete the
proof, note that by Remark 1 the set (14) is an invariant integral
manifold of system (13) and that the set (14) is attractive since
every solution of (15) converges asymptotically to Υ∞ .
Example 2 (The hyper-exponential distribution). The probability density function of a random variable X having hyperexponential distribution is defined as
Theorem 2 allows to introduce the following definition,
which is reminiscent of the notion of moment given in [58].
n
X
Definition 3. Consider system (6), system (12) and the interf
:
R
→
R,
t
→
7
pj λj e−λj t δ−1 (t),
(19)
X
connected system (13). Suppose Assumptions 1, 2, 3 and 4
⋆
j=1
hold. The moment of system (12) at s is the function Υ∞ x,
where Υ∞ is the unique solution of (15) with Υ(t0 ) = 0.
where n ∈ Z>0 is referred to as the number of phases
n
Theorem 3. Consider system (6), system (12) and the intercon- of X and λ = (λ1 , λ2 , . . . , λn ) ∈ R>0
and p =
nected system (13). Suppose Assumptions 1, 2, 3, 4, 5 and 6 (p1 , p2 , . . . , pn ) ∈ ∆n−1 are given parameters. Observe
P
hold. Then there exists a one-to-one correspondence between that fX can be written as fX = nj=1 pj fXj , in which
⋆
the moment of system (12) at s and the (well-defined) steady- fXj : R → R, t 7→ λj e−λj t δ−1 (t) is the probability density
state response of the output d of system (13).
function of a random variable Xj having exponential distribution with parameter λj . Thus, by linearity of the expectation
Proof. By Assumptions 1, 2, 3 and 4, in view of Theorem 2,
operator, thePmoment of order k ∈ Z≥0 of the random variable
the steady-state response of the output d is well-defined. By
n
X is µk = j=1 pj λk!k .
N
j
Assumption 5 and 6, the steady-state response of the output
d coincides with Υ∞ x, which by definition is the moment of
system (12) at s⋆ .
III. M AIN RESULTS
Remark 2. Equations (12) describe a considerably general
class of continuous-time signals. In particular, under the stated
assumptions, this class contains all (exponentially bounded)
piecewise continuously differentiable functions that can be
generated as the solution of a single-input, single-output,
continuous-time, linear, time-varying system of the form
ẋ = A(t)x + B(t)u,
y = Cx,
This section contains the main results of the paper. The
moments of a random variable are shown to be in one-to-one
correspondence with the moments of a system at zero. This
is established first for the special situation in which a given
probability density functions can be identified with the impulse
response of a linear system. The theory developed is then
extended to the broader class of probability density functions
which can be represented by a system in explicit form using the
theory of moments presented in the previous section. Finally,
the case of probability density functions defined on the whole
real axis and on compact sets is considered.
(17)
with x(t) ∈ Rn , u(t) ∈ R, y(t) ∈ R, x(0) = 0, u = δ0 ,
A : R → Rn×n defined as A(t) = Λ̇(t, t0 )Λ(t, t0 )−1 for all
t ≥ t0 such that Λ is differentiable in the first argument and
B : R → Rn such that the pair (A(t), B(t)) is reachable for
all t ∈ R≥0 .
△
A. Probability density functions realized by linear systems
Definition 5. Consider system (1) and a random variable X
with probability density function fX . The probability density
function fX is realized by system (1) if fX (t) = CeAt Bδ−1 (t)
for all t ∈ R, in which case system (1) is referred to as a (linear)
realization of fX .
D. Moments of a random variable
For completeness, we now recall the notion of moment of a
random variable.
Definition 4. [73] The moment of order k ∈ Z≥0 of the
random variable X is defined as µk = E[X k ] whenever the
expectation exists.
Necessary and sufficient conditions for a probability density
function to be realized by a linear system can be established
using well-known results of linear realization theory [79 – 81].
Note that every linear realization of a probability function must
be stable7 , as detailed by the following statement.
To simplify the exposition, in the sequel we consider
exclusively continuous random variables admitting a probability
density function with finite moments of all orders. We also
ignore all measure-theoretic considerations as they are not
essential for any of the arguments. A discussion on the
extension of our results to more general situations is deferred
to Section V.
To illustrate our results and to demonstrate our approach we
use several worked-out examples throughout this work.
Lemma 5. Consider system (1) and a random variable X with
probability density function fX . If system (1) is a realization
of fX , then σ(A) ⊂ C<0 .
As a direct application of Definitions 1, 4, 5 and of Lemma 5
the moments of a random variable can be characterised by
means of moments of systems.
Example 1 (The exponential distribution). The probability
density function of a random variable X having exponential
distribution with parameter λ ∈ R>0 is defined as
fX : R → R, t 7→ λe−λt δ−1 (t).
Theorem 4. Consider system (1) and a random variable X
with probability density function fX . Assume system (1) is a
7
(18)
5
See [25] for the definition.
realization of fX . Then the moments of the random variable
X and the moments of system (1) at zero are such that
µk
= ηk (0),
k!
equation (4). Finally, note that the components of the (welldefined) steady-state response of the output d of system (7)
j
Pl−1
Pl−1
can be written as dl (t) = j=0 (−1)j λ1j tj = j=0 µj (−t)
j! ,
for all l ∈ {1, 2, . . . , k + 1}, and hence there is a one-to-one
correspondence between the moments up to the order k of the
random variable X and the steady-state response of the output
d of system (7).
N
(20)
for all k ∈ Z≥0 .
Corollary 1. Consider system (1) and a random variable X
with probability density function fX . Assume system (1) is a
realization of fX . Then the moments up to the order k ∈ Z≥0
of the random variable X are given by the entries of Ψk Υk B,
where Υk is the unique solution of the Sylvester equation (3),
with Ψk = diag((−1)k k!, . . . , −1, 1) and s⋆ = 0.
B. Probability density functions realized by systems in
explicit form
We have seen that a systems-theoretic interpretation can be
given to probability density function which can be realized
Remark 3. The moments of a random variable can be deter- by a linear system. However, the vast majority of probability
mined by direct application of Definition 4 or by “pattern density functions cannot be described by a linear time-invariant
matching” using existing tables of moments. The one-to-one differential equation. To provide a generalisation of the results
correspondence established in Corollary 1, on the other hand, established which accounts for more general probability density
indicates that a closed-form expression for the moments of functions, we develop a parallel of the formulation presented in
a random variable can be computed from the solution of a the previous section using the theory of moments for systems
Sylvester equation, which can be solved with numerically in explicit form [58]. To begin with, we introduce the following
reliable techniques [25]. The computation of moments of definition.
random variables through Sylvester-like equations is one of
Definition 6. Consider system (12) and a random varithe leitmotifs underlying our approach.
△
able X with probability density function fX . The probaCorollary 2. Consider system (1) and a random variable X bility density function fX is realized8 by system (12) if
with probability density function fX . Assume system (1) is a fX (t) = CΛ(t, t0 )x0 δ−1 (t) for all t ≥ t0 , in which case sysrealization of fX . Then the moments up to the order k ∈ Z≥0 tem (12) is referred to as a (explicit) realization of fX .
of the random variable X are in one-to-one correspondence Theorem 5. Consider system (6), system (12) and the interwith the matrix Υk B, where Υk is the unique solution of the connected system (13). Suppose Assumptions 1, 2, 3 and 4
Sylvester equation (4), in which S is a non-derogatory matrix hold. Let X be a random variable with probability density
with characteristic polynomial (5) and M is such that the pair function f and assume system (12) is a realization of f .
X
X
(S, M ) is reachable.
Then the moments of the random variable X up to the order
Corollary 3. Consider system (1), system (6), and the intercon- k ∈ Z≥0 are in one-to-one correspondence with the moment
nected system (7). Let X be a random variable with probability of system (12) at zero.
density function fX . Assume system (1) is a realization of fX ,
Proof. To begin with, note that by Assumptions 1, 2, 3
s⋆ = 0, x(0) = 0, ω(0) = 0 and u = δ0 . Then there exists a
and 4 the moment of system (12) at zero is well-defined.
one-to-one correspondence between the moments up to the
By Definition 3 and by (16), the moment of system (12) at
order k ∈ Z≥0 of the random variable X and the (well-defined)
zero reads as
steady-state response of the output d of system (7).
Z t
S(t−ζ)
Example 3 (The exponential distribution, continued). Consider
Υ∞ (t)x(t) =
e
M CΛ(ζ, t)dζ Λ(t, t0 )x0
t0
a random variable X having exponential distribution with
Z t
probability density function fX and parameter λ ∈ R>0 . A
= eSt
e−Sζ M CΛ(ζ, t0 )x0 dζ
direct inspection shows that the probability density function
t0
Z t
fX is realized by the linear, time-invariant system
= eSt
e−Sζ M fX (ζ)dζ,
t
0
ẋ = −λx + λu, y = x,
(21)
where the last identity holds
since system (12) is a realization
R
i.e. by system (1) with A = −λ, B = λ and C = 1. Note that of fX . Define H+ (t) = t e−Sζ M fX (ζ)dζ. Since S is a nont0
the only eigenvalue of system (21) has negative real part, which derogatory matrix with characteristic polynomial (5), with s⋆ =
is consistent with Lemma 5. A direct application of Definition 1 0, and since the pair (S, M ) is reachable, there exists a nonyields ηk (0) = λ1k , which, in view of Example 1 and in singular matrix T ∈ R(k+1)×(k+1) such that T −1 M = ek+1
agreement with Theorem 4, shows that the moments of the and T −1 ST = J0 . This implies
random variable X are in one-to-one correspondence with the
h
i′
k
lim H+ (t) = (−1)
(22)
moments of system (21) at zero. In accordance with Corollary 1,
· · · −µ1 µ0
k! µk
t→∞
k+1
setting Ψk =
diag((−1)
k!,
.
.
.
,
−1,
1)
and
Σ
=
J
yields
k
0
′
8 The impulse response may depend on the time of the impulse t for time
Ψk Υk B = λk!k · · · λ1 1 , in which Υk is the unique
0
solution of the Sylvester equation (3). By Corollary 2, a varying systems. Note, however, that our purpose is to model the probabilistic
structure of a random variable representing its probability density function
one-to-one correspondence can be also inferred between by means of a system and its impulse response. This means that t0 can be
the moments of the random variable X and the Sylvester considered as a parameter that can be assigned.
6
and hence that the components of the moment of system (12) at
zero grow polynomially as t → ∞, with coefficients uniquely
determined by the moments µ0 , µ1 , . . . , µk , which proves the
claim.
Corollary 4. Consider system (6), system (12) and the interconnected system (13). Suppose system (12) is a realization of
fX and Assumptions 1, 2, 3, 4, 5 and 6 hold. Then there exists
a one-to-one correspondence between the moments up to the
order k ∈ Z≥0 of the random variable X and the (well-defined)
steady-state response of the output d of system (13).
Remark 4. Assumption 1 is violated by any explicit realization
of the form (12) associated with a probability density function
with compact support, i.e. zero everywhere except on a compact
subset of the real line. A discussion on the extension of
Theorem 5 to such class of probability density functions is
deferred to Section III-D.
△
C. Probability density functions on the whole real axis
The results established so far characterise probability density
functions with support on the non-negative real axis. These
Remark 5. While every linear realization of a probability results are not satisfactory because most probability density
density function must be internally stable (Lemma 5), it is not functions are defined over the whole real line. This issue,
possible to prove that every explicit realization of a probability however, can be easily resolved using the following approach.
Every probability density function fX can be decomposed
density function must satisfy Assumption 3. The reason is
as
the sum of a function fc which vanishes on the non-positive
that there exist probability density functions with a “tail”
real
axis and of a function fac which vanishes on the nonthat is “heavier” than the one of the exponential [69 , 82],
negative
real axis, i.e. fX (t) = fc (t)δ−1 (t) + fac (t) δ−1 (t)
including those of Pareto, Weibull, and Cauchy random
for
all
t
∈
R. We call fc the causal part of fX and fac the
variables. Assumption 3 is therefore a strong assumption
anticausal
part
of fX . Note that the function need not to be
which rules out important probability density functions. Note,
continuous,
but
only
integrable.
however, that a generalisation of our results to probability
With
these
premises,
the following result holds.
density functions with a heavy tail can be established with
more advanced measure-theoretic tools.
△ Theorem 6. Consider a random variable X with probability
Example 4 (The half-normal distribution). The probability density function fX . Let fc and fac be the causal part and the
density function of a random variable X having half-normal anti-causal part of fX , respectively. Assume fc is realized by
the minimal system
distribution with parameter σ ∈ R>0 is defined as
r
ẋc = Ac xc + Bc uc , yc = Cc xc ,
(27)
2 − t22
(23)
fX : R → R, t 7→
e 2σ δ−1 (t).
2
πσ
and the time reversal of fac is realized by the minimal system
A direct inspection shows that the probability density function
ẋac = Aac xac + Bac uac , yac = Cac xac ,
(28)
fX is realized by the linear, time-varying system
r
with xj (t) ∈ Rnj , uj (t) ∈ R, yj (t) ∈ R, and Aj ∈ Rnj ×nj ,
2
t
ẋ = − 2 x +
u, y = x,
(24) Bj ∈ Rnj ×1 and Cj ∈ R1×nj constant matrices for
σ
πσ 2
j ∈ {c, ac}. Then the moments of the random variable
in which x(t) ∈ R, u(t) ∈ R, y(t) ∈ R. Consider now the X and the moments of systems (27) and (28) at zero satisfy
interconnection of system (6) and of system (24), with v = y, the identity
µk
set s⋆ = 0 and note that Assumptions 1, 2, 3 and 4 hold.
(29)
= (−1)k ηkac (0) + ηkc (0).
k!
Equation (15) boils down to
for every k ∈ Z≥0 .
t
Υ̇(t) = S + 2 I Υ(t) + M
(25)
σ
Proof. Let k ∈ Z≥0 and note that
Z ∞
and can be solved by direct application of formula (16), with
St
µ
=
tk fX (t)dt
k
Υ(t0 ) = 0, yielding Υ∞ (t) = e H+ (t), with
−∞
2
2)
Z 0
Z ∞
(−1)k R t k − (ζ −t
2σ 2
ζ
e
dζ
k
k!
0
=
t fac (t)dt +
tk fc (t)dt
..
−∞
0
Z ∞
Z ∞
.
(26)
H+ (t) =
.
R t − (ζ2 −t2 )
k
k
=
(−1)
t
f
(−t)dt
+
tk fc (t)dt
ac
− 0 ζe 2σ2 dζ
0
0
R t − (ζ2 −t2 )
e 2σ2 dζ
= (−1)k k!ηkac (0) + k!ηkc (0),
0
q
t2
Since x(t) = πσ2 2 e− 2σ2 δ−1 (t), by Definitions 3 and 4, this which proves the claim.
implies (22) holds, with µ0 = 1 and
Corollary 5. Consider system (1) and a random variable X with
( k
probability density function fX . Assume fX is even and the
σ (k − 1)!!,
if k is even,
causal part of fX is realized by system (1). Then the moments
µk = q 2 k
π σ (k − 1)!!, if k is odd.
of the random variable X and the moments of system (1) at
In accordance with Theorem 4, this shows that the moments zero satisfy the identity
of the random variable X up to the order k ∈ Z≥0 uniquely
µk
(−1)k + 1
=
ηk (0).
(30)
specify the moment of system (24) at zero as t → ∞.
N
k!
2
7
for all k ∈ Z≥0 .
realize the causal part and the anticausal part of the probability
density function of the random variable of interest. For brevity
we do not repeat other versions of these results for probability
density functions realized by systems in explicit form; instead,
we consider the following important example.
Example 6 (The normal distribution). The probability density
function of a random variable X having a normal distribution
with parameter σ ∈ R>0 is defined as
t2
1
fX : R → R, t 7→ √
(36)
e− 2σ2 .
2
2πσ
Proof. Let fc and fac be the causal part and the anticausal part of fX , respectively. By hypothesis the identity fX (t) = fc (t)δ−1 (t) + fac (t) δ−1 (t) = fc (t)δ−1 (t) +
fc (−t) δ−1 (t) holds for all t ∈ R. By the time-reversal property of the Laplace transform [83, p. 687], this implies
L{fX }(s) = L{fc }(s) + L{fc }(−s), from which the claim
follows.
Example 5 (The Laplace distribution). The probability density
function of a random variable X having a Laplace distribution
with parameter λ ∈ R>0 is defined as
fX : R → R, t 7→
λ −λ|t|
e
.
2
t2
1
e− 2σ2 δ−1 (t),
The causal part of fX is fc : R → R, t 7→ √2πσ
2
while the anticausal part of fX is fac : R → R,
t2
1
e− 2σ2 δ−1 (t). The causal part and the time
t 7→ √2πσ
2
reversal of the anti-causal part of fX are both realized by the
linear, time-varying system
1
t
u, y = x,
(37)
ẋ = − 2 x + √
σ
2πσ 2
(31)
The causal part of fX is fc : R → R, t 7→ λ2 e−λt δ−1 (t), while
the anticausal part of fX is fac : R → R, t 7→ λ2 eλt δ−1 (t).
The causal part and the time reversal of the anti-causal part of
fX are both realized by the minimal system
λ
u, y = x,
(32)
2
in which x(t) ∈ R, u(t) ∈ R, y(t) ∈ R. Thus, by Theorem 6,
the moment of order k ∈ Z≥0 of the random variable X is
in which x(t) ∈ R, u(t) ∈ R, y(t) ∈ R. Consider now the
interconnection of system (6) and of system (37), with
v = y, set s⋆ = 0 and note that Assumptions 1, 2, 3
and 4 hold. In analogy with Example 4, noting that equation (15) boils down to (25) gives Υ∞ (t) = eSt H+ (t), with
H+ (t) defined as in (26). For a suitable signature matrix
Ψk , defining ΥT,∞ (t) = Υ∞ (t) + Ψk Υ∞ (t) and noting that
ẋ = −λx +
k!
k!
+ k.
(33)
2λk
2λ
In agreement with Corollary 5, since fX is even and the
causal part of fX is realized by system (32), the moment
of order k ∈ Z≥0 of the random variable X can be written
k
as µk = (−1)2 +1 λk!k , which is consistent with formula (33).
Finally, we emphasise that a simple exercise in integration
shows that the moments of the random variable X are indeed
given by (33).
N
µk = (−1)k
t2
1
x(t) = √2πσ
e− 2σ2 δ−1 (t), by Definitions 3 and 4, allows to
2
conclude (22) holds, with µ0 = 1 and
(
σ k (k − 1)!!, if k is even,
µk =
0,
if k is odd.
Generalising the results of Theorem 6, this shows that the
moments up to the order k ∈ Z≥0 of the random variable X
uniquely specify the moment of system (24) at zero as t → ∞.
Note also that ΥT,∞ can be written as ΥT,∞ = (I + Ψk ) Υ∞ ,
which, in a broad sense, is in agreement with Corollary 5. N
Remark 6. Theorem 6 and Corollary 5 allow to establish
Corollaries 1, 2 and 3 for a random variable X with probability
density function defined on the whole real axis, provided that
its causal part and the time reversal of its anti-causal part are
realized by systems of the form (27) and (28), respectively. D. Probability density functions with compact support
This can be achieved noting that the moments up to the order
We now concentrate on probability density functions with
k ∈ Z≥0 of the random variable
X are given
the entries of compact support. To begin with a limitation of the characterisa
by (k+1)×(2k+2)
ΨT ΥT BT , where ΨT = Dk Ψk Dk ∈ R
, tion of the moments of a random variables in terms of explicit
with Ψk ∈ R(k+1)×(k+1) a signature matrix and systems is illustrated through a simple example.
Dk = diag(k!, . . . , 1!, 1),
and
ΥT ∈ R(2k+2)×(na +nac )
Example 7 (The uniform distribution). Suppose we wish to find
is the unique solution of the Sylvester equation
a realization of the probability density function of a random
J0 ΥT + ek+1 CT = ΥT AT ,
(34) variable X having a uniform distribution with parameters
a, b ∈ R>0 , with a < b, defined as
with
1
′
1[a,b] (t).
(38)
fX : R → R, t 7→
Bac
Cac
Aac 0
′
b−a
, BT =
, CT =
. (35)
AT =
′
Cc
0 Ac
Bc
Clearly, any explicit realization of the form (12) necessarily
△ violates Assumption 1, since fX is zero everywhere except on a
The arguments used to prove the results above extend compact subset of the real line. As a result, the theory developed
immediately to the case in which the probability density in Section III-B does not apply. However, the probability density
function a given random variable is defined on the whole function (38) can be also interpreted as the impulse response
real axis and its causal and anticausal parts are realized by of the linear, time-delay system with discrete constant delays
a system in explicit form. The key point is that one has to
1
ẋ =
(ua − ub ) , y = x,
(39)
consider a signed sum of the moments of the systems which
b−a
8
Pν−1
in which a, b ∈ R>0 , with a < b, and q(t) =
q k tk ,
Pν−1 kk=0
k
with qν−1 6= 0. Defining Q1 (s) =
=
k=0 q1 s
P
Pν−1 Pk k! i ν−k+i−1
ν−1 k k
and
Q
(s)
=
q
s
=
a
s
2
k=0 2
k=0 Pi=0 i!
Pν−1
k
k! i ν−k+i−1
the
Laplace
transform
k=0
i=0 i! b s
−sa
−sb
2 (s)e
of (43) can be written as L{fX } = Q1 (s)e s−Q
ν
and has aP removable singularity at zero if and
l
k
q1k al−k − q2k bl−k = 0 for all
only if
k=0 (−1)
l ∈ {0, 1, . . . , ν − 1}. Under these conditions, the probability
density function (43) can be realised by system (8) setting,
e.g., τ1 = a, τ2 = b,
0
J0 0
eν
, B2 =
A0 =
, B1 =
,
0
eν
0 J0
i.e. by system (8) with x(t) ∈ R, u(t) ∈ R, y(t) ∈ R, τ1 = a,
1
1
τ2 = b, B1 = b−a
, B2 = − b−a
and C0 = 1, A(s) = 0,
1
−sa
−sb
− e ), C(s) = 1. Note that the moments
B(s) = b−a (e
of system (39) are not classically defined at zero, since
0 ∈ σ(A(s)). However, since zero is a removable singularity9
of the transfer function of system (39), the moments of
system (39) can be defined and characterised by means of
Sylvester equations and impulse responses using the notions
and results introduced in [85 – 87]. In particular, the moments
of system (39) at zero satisfy the identity
−J0 a
e
− e−J0 b
′
[ ηk (0) · · · η1 (0) η0 (0) ] = Ψk
Υk , (40)
b−a
for every k ∈ Z≥0 , with Ψk ∈ R(k+1)×(k+1) a signature matrix
and Υk ∈ Rk+1 a solution of the (Sylvester) equation
J0 Υk + ek+1 = 0,
C1 =
(41)
C2 =
To see this, note that
ηk (0) =
ak+1 − bk+1
.
(k + 1)!(b − a)
j=1
=
k+1
X
j=1
q11
0
0
···
···
q1ν−1
0 q20
0 0
q21
···
···
0
q2ν−1
,
,
Example 8 (The triangular distribution). The probability density
function of a random variable X having a triangular distribution
with parameter τ ∈ R>0 is defined as
|t|
1
.
(44)
fX : R → R, t 7→ max 0, 1 −
τ
τ
allows to conclude
−J0 a
∞
X
(−1)j aj − bj j
e
− e−J0 b
Υk =
J Υk
b−a
j!
b−a 0
j=0
=
q10
and the moments of the random variable X can be shown to
be in one-to-one correspondence with the moments at zero
of such system. To illustrate this point, we consider a simple
example.
(42)
Exploiting the definition of matrix exponential, the identity
(41) and the property
(
ej , for 1 ≤ j ≤ k + 1,
j−1
J0 ek+1 =
0, for j ≥ k + 2,
k+1
X
The causal part of fX is fc : R → R, t 7→ τ1 1 − τt 1[0,τ ] (t),
while the anticausal part of fX is fac : R → R,
t 7→ τ1 1 + τt 1[−τ,0] (t). The causal part and the time
reversal of the anti-causal part of fX are both realized by the
time-delay system
(−1)j+1 aj − bj j−1
J
ek+1
j!
b−a 0
(−1)j+1 aj − bj
ej
j!
b−a
ẋ = e2 u + e4 uτ ,
which, in view of (42), proves the identity (40). We emphasise
that, in line with the results developed for probability density
functions realized by linear systems, the relation (20) between
the moments of fX and the moments of the corresponding
realization is satisfied: a one-to-one correspondence exists
between the moments of the random variable X and the
ak+1 −bk+1
moments of system (39), since µk = (k+1)(b−a)
.
N
y=
e′
τ e′2 − e′1
x + 32 xτ ,
2
τ
τ
(45)
i.e. by system (8) with x(t) ∈ R4 , u(t) ∈ R, y(t) ∈ R, τ1 = 0,
τ e′ −e′
e′
τ2 = τ , A0 = 0, B1 = e2 , B2 = e4 , C1 = 2τ 2 1 , C2 = τ 32 .
The moment at zero of order k ∈ Z≥0 of the system is
ηk (0) =
(−1)k τ k + τ k
,
2(k + 2)!
(46)
The main reason why it is possible to characterise in systemstheoretic terms the moments of a random variable having a which is consistent with the identity (20), since the moment
uniform distribution is that zero is a removable singularity of order k ∈ Z≥0 of the random variable X reads as
of the transfer function of the associated time-delay system.
(−1)k τ k + τ k
This observation allows to generalise the argument used in
.
(47)
µk =
2(k + 2)(k + 1)
Example 7 to treat random variables the probability density
function of which has compact support and is polynomial on
the complement of its zero set. To see this, consider a random This also implies that a one-to-one correspondence exists
variable X having a probability density function of the form between the moments of the random variable X and the
moments of the system. We emphasise that exploiting the
fX : R → R, t 7→ q(t)1[a,b] (t),
(43) argument of Example 7 the moments of the system (and thus
those of the random variable X) may be computed using the
9 See, e.g., [84].
formula (11).
N
9
IV. A PPLICATIONS
This section contains a series of applications of the proposed
ideas. We first focus on the identifiability of probability density
functions admitting a linear realization. Then, a systemstheoretic interpretation for sums of independent random variables, the notion of mixture distribution and basic results from
renewal theory are provided. Finally, connections between the
approximation of probability density functions and the model
reduction problem are studied.
A. Identifiability of probability density functions with linear
realizations
systems have the same impulse response. However, a oneto-one correspondence between impulse responses and their
realizations can be established resorting to canonical forms
[80], such as the observer canonical form, defined by constant
matrices A ∈ Rn×n , B ∈ Rn and C ∈ R1×n of the form
−αn 1 0 . . . 0
βn
..
..
.
. 0 1 . . . ..
.
′
..
.. . . . .
A = ..
, B=
, C = e1 , (48)
.
.
.
0
.
.
−α2 0 . . . 0 1
β2
β1
−α1 0 . . . 0 0
with α = (α1 , . . . , αn ) ∈ Rn and β = (β1 , . . . , βn ) ∈ Rn .
With these premises, we may recover the following wellknown result [51].
We begin this section considering the case in which the
probability density function of a given random variable is
parameterized by a fixed set of parameters. In other words, Lemma 6. Let Θ be an open subset of Rd , representing the
while in the previous sections the parameters of probability parameter set, and let FX be a family of probability density
density functions have been assumed to be known, in this functions defined on the real axis and associated with a random
section parameters are constant unknown quantities, which in variable X. Assume every fX ∈ FX is realized by system (1)
principle can (or must) be estimated. In particular, we study the with A ∈ Rn×n , B ∈ Rn and C ∈ R1×n as in (48) and let
identifiability of parametric families of probability density func- θ = (α, β) ∈ Θ . Then the family of probability density
tions the elements of which admit a linear realization. This is functions FX is identifiable if and only if every pair (A, B)
important, for example, in the context of parametric estimation, is reachable.
where identifiability allows to avoid redundant parametrisations
Proof. Note that the family of probability density functions
and to achieve consistency of estimates [50 , 51 , 88].
d
Let Θ be an open subset of R representing the parameter FX is identifiable if and only if for every fX ∈ FX the map
set and let FX be a family of probability density functions
βn sn−1 + . . . + β2 s + β1
(α, β) 7→ FX (s; α, β) = n
defined on the real axis and associated with a random variable
s + αn sn−1 + . . . + α2 s + α1
X. Every element fX ∈ FX is a probability density function
t 7→ fX (t; θ) which is known once the element θ ∈ Θ has been with FX (s; α, β) = L{fX }, is injective. This, in turn, corresponds to the numerator and denominator of the rational
fixed.
function FX (s; α, β) being coprime. As a result, the identiDefinition 7. [88] The parameters θ1 ∈ Θ and θ2 ∈ Θ are
fiability of the family FX is equivalent to the minimality of
observationally equivalent if fX (t; θ1 ) = fX (t; θ2 ) for almost
system
(1) and, by observability of the pair (A, C), to the
all10 t ∈ R.
reachability of the pair (A, B), which proves the claim.
The notion of observational equivalence induces an equivalence relation on the parameter set, defined as θ1 ∼ θ2 if θ1 ∈ Θ Remark 7. A dual result can be proved using the controllability
and θ2 ∈ Θ are observationally equivalent. The parameter set is canonical form as long as observability, and hence minimality,
therefore partitioned into equivalence classes the cardinality of is enforced. This suggests that the identifiability of a family of
which determines the identifiability of the family of probability probability density functions admitting a linear realization is
density functions considered, as specified by the following equivalent to the minimality of a given canonical realization,
which can be thus taken as the definition of identifiability. △
definition.
Definition 8. The family of probability density functions FX
is identifiable if θ1 ∼ θ2 implies θ1 = θ2 for all θ1 ∈ Θ and
θ2 ∈ Θ.
B. Sums of independent random variables
A classical theorem of probability theory states that the probA characterisation of the identifiability of a family of ability density function of the sum of two jointly continuous,
probability density functions admitting a linear realization independent random variables is given by the convolution of
can be given by means of the systems-theoretic notion of their probability density functions (see, e.g., [73]). This result
minimality. To this end, note that the description of proba- can be given a simple systems-theoretic interpretation.
bility density functions as impulse responses has an inherent
Theorem 7. Let X1 and X2 be jointly continuous, independent
non-uniqueness issue, since algebraically equivalent11 linear
random variables with probability density functions fX1 and
fX2 realized by the minimal system
10 A property is satisfied for almost all t ∈ R if the set where the property
does not hold has Lebesgue measure equal to zero.
11 The single-input, single-output, continuous-time, linear, time-invariant
systems ẋ1 = A1 x1 + B1 u1 , y1 = C1 x1 , with x1 (t) ∈ Rn , u1 (t) ∈ R,
y1 (t) ∈ R and ẋ2 = A2 x2 + B2 u2 , y2 = C2 x2 , with x2 (t) ∈ Rn ,
u2 (t) ∈ R, y2 (t) ∈ R are algebraically equivalent if there exists a non-singular
matrix T ∈ Rn×n such that A2 = T A1 T −1 , B2 = T B1 , C2 = C1 T −1 .
10
ẋ1 = A1 x1 + B1 u1 ,
y1 = C1 x1 ,
(49)
y2 = C2 x2 ,
(50)
and the minimal system
ẋ2 = A2 x2 + B2 u2 ,
with xj (t) ∈ Rnj , uj (t) ∈ R, yj (t) ∈ R, and Aj ∈ Rnj ×nj ,
Bj ∈ Rnj ×1 and Cj ∈ R1×nj constant matrices for j ∈ {1, 2},
respectively. Then the probability density functions of the random variable Y = X1 + X2 is realized by the interconnection
of system (49) and system (50) with u1 = y2 .
In light of Theorem 7 of Corollary 6, these notions can
be characterised in systems-theoretic terms. In particular,
decomposability (and divisibility) of a random variable are
related to the possibility of describing the corresponding system
as the series interconnection of finitely many (and possibly
identical) systems.
△
Proof. Recall that the probability density function of the sum of
The following result provides a systems-theoretic necessary
two jointly continuous, independent random variables is given
and
sufficient condition which ensures the identifiability of a
by the convolution of their probability density functions [73,
family
of probability density function the elements of which can
Theorem 6.38], i.e. fY = fX1 ∗ fX2 . Taking the Laplace transbe
represented
as the sum of random variables with probability
form on both sides, this implies L{fY } = L{fX1 }L{fX1 }.
density
functions
admitting a linear realization.
Thus, the probability density function fY is realized by the
Theorem
8.
Let
F
interconnection of systems (49) and (50) with u1 = y2 , since
X be a family of probability density functions
the
elements
of
which can be realized as the sum of the
the transfer function associated with the probability density
probability
density
functions fX1 and fX2 . Assume fX1 and
function fY is the product of the transfer functions associated
f
are
realized
by
the minimal systems (49) and (50),
with the probability density functions fX1 and fX2 .
X2
respectively. Then the family of probability density functions
The following result is an immediate extension of Theorem 7. F is identifiable if and only if
X
Corollary 6. Let X1 , X2 , . . . , XN be jointly continuous, in(i) the polynomials C1 adj(sI − A1 )B1 and det(sI − A2 )
dependent random variables. Assume the probability density
have no common roots and
function fXj is realized by the minimal system
(ii) the polynomials C2 adj(sI − A2 )B2 and det(sI − A1 )
have no common roots.
ẋj = Aj xj + Bj uj , yj = Cj xj ,
(51)
with xj (t) ∈ Rnj , uj (t) ∈ R, yj (t) ∈ R, and Aj ∈ Rnj ×nj ,
Bj ∈ Rnj ×1 and Cj ∈ R1×nj constant matrices for
j ∈ {1, 2, . . . , N }, respectively. Then the probability density
functions of the random variable Y = X1 + X2 + . . . + XN is
realized by the interconnection of the family of systems (51)
with u1 = y2 , u2 = y3 , . . . , uN −1 = yN .
Example 9 (The Erlang distribution). Suppose we wish to show that the probability density function fY of the random variable Y = X1 + X2 + . . . + XN, in which N ∈ Z>0 and X1, X2, . . . , XN are jointly continuous, independent random variables having exponential distribution with parameter λ ∈ R>0, is that of an Erlang distribution with parameters λ and N, defined as

fY : R → R : t ↦ (λ^N/(N − 1)!) t^{N−1} e^{−λt} δ−1(t).   (52)

Recall that a minimal realization of the probability density function fXj of the random variable Xj is described by system (21) for all j ∈ {1, 2, . . . , N}. Thus, by Corollary 6, a realization of the probability density function fY is given by system (1), in which

A = J−λ,   B = eN,   C = (λ^N/(N − 1)!) e1′.   (53)

We conclude this example noting that a direct computation shows that the probability density function fY is indeed given by (52). N
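Example 9 can be checked numerically through Corollary 6 directly. A sketch, again under the assumption that system (21) is the realization (A, B, C) = (−λ, λ, 1) of the exponential density: the N-fold cascade (a relabelling of the interconnection u1 = y2, . . . , uN−1 = yN) reproduces the Erlang density (52) to machine precision.

    import numpy as np
    from math import factorial
    from scipy.linalg import expm

    lam, N = 2.0, 4
    # Cascade of N copies of (-lam, lam, 1): stage 1 is driven by the input,
    # each further stage by the output of the previous one; the output is
    # read from the last stage.
    A = -lam * np.eye(N) + lam * np.diag(np.ones(N - 1), k=-1)
    B = np.zeros((N, 1)); B[0, 0] = lam
    C = np.zeros((1, N)); C[0, -1] = 1.0

    for t in (0.5, 1.0, 2.5):
        h = (C @ expm(A * t) @ B).item()                  # impulse response
        erlang = lam**N * t**(N - 1) * np.exp(-lam * t) / factorial(N - 1)
        print(abs(h - erlang))                            # ~1e-16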
Remark 8. A random variable Y is decomposable if there exist N ∈ Z>0, with N ≥ 2, and jointly continuous, independent random variables X1, X2, . . . , XN such that Y = X1 + X2 + . . . + XN [89]. In case the random variables X1, X2, . . . , XN are also identically distributed, then Y is said to be divisible [89]. The notions of decomposability and of divisibility play an important role in probability theory [90], particularly in the analysis of Lévy processes [7, 67].

In light of Theorem 7 and Corollary 6, these notions can be characterised in systems-theoretic terms. In particular, decomposability (and divisibility) of a random variable are related to the possibility of describing the corresponding system as the series interconnection of finitely many (and possibly identical) systems. △
The following result provides a systems-theoretic necessary and sufficient condition which ensures the identifiability of a family of probability density functions the elements of which can be represented as the sum of random variables with probability density functions admitting a linear realization.

Theorem 8. Let FX be a family of probability density functions the elements of which can be realized as the sum of the probability density functions fX1 and fX2. Assume fX1 and fX2 are realized by the minimal systems (49) and (50), respectively. Then the family of probability density functions FX is identifiable if and only if

(i) the polynomials C1 adj(sI − A1)B1 and det(sI − A2) have no common roots, and
(ii) the polynomials C2 adj(sI − A2)B2 and det(sI − A1) have no common roots.

Proof. The identifiability of the family of probability density functions FX is equivalent to the minimality of the series interconnection of the minimal systems (49) and (50), which, in turn, is equivalent to conditions (i) and (ii) [80, 91].
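Conditions (i) and (ii) say that no pole of one subsystem is cancelled by a zero of the other, which is straightforward to test numerically. A sketch with illustrative polynomials (not data from the paper):

    import numpy as np

    def no_common_roots(p, q, tol=1e-9):
        # True if the polynomials p and q (coefficients, highest degree first)
        # share no root.
        rp, rq = np.roots(p), np.roots(q)
        if rp.size == 0 or rq.size == 0:
            return True
        return all(np.min(np.abs(r - rq)) > tol for r in rp)

    # A zero of subsystem (49) at s = -2 against a pole of (50) at s = -2:
    num1 = np.poly([-2.0])              # C1 adj(sI - A1) B1, up to scaling
    den2 = np.poly([-2.0, -5.0])        # det(sI - A2)
    print(no_common_roots(num1, den2))  # False: condition (i) fails

    # Two exponential realizations: constant numerators, so no cancellation.
    num1, den1 = np.array([1.0]), np.poly([-1.0])
    num2, den2 = np.array([3.0]), np.poly([-3.0])
    print(no_common_roots(num1, den2) and no_common_roots(num2, den1))  # True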
C. Mixture distributions

We have seen that the sum of two jointly continuous independent random variables has a natural interpretation in terms of the series interconnection of the realizations of their probability density functions. To provide a probabilistic counterpart of the notion of parallel interconnection we recall the following definition, which is adapted from [92].

Definition 9. A random variable Z with probability density function fZ is said to arise from a finite mixture distribution if there exist N ∈ Z>0 and jointly continuous random variables X1, X2, . . . , XN with probability density functions fX1, fX2, . . . , fXN such that the probability density function fZ satisfies fZ = w1fX1 + w2fX2 + · · · + wNfXN, for some w = (w1, w2, . . . , wN) ∈ ∆^{N−1}. N is referred to as the number of components of fZ, fX1, fX2, . . . , fXN are referred to as the components of fZ, and w1, w2, . . . , wN are referred to as the weights of fZ.
Theorem 9. Under the hypotheses of Theorem 7, if the random
variable Z with probability density function fZ arises from
a finite mixture distribution with components fX1 and fX2
and weights w1 ≠ 0 and w2 ≠ 0, then the probability density
function fZ is realized by the interconnection of system (49)
and system (50) with u = u1 = u2 and y = w1 y1 + w2 y2 .
Proof. By hypothesis, the probability density function fZ arises
from a finite mixture distribution with components fX1 and fX2
and weights w1 and w2 , i.e. fZ = w1 fX1 + w2 fX2 . Taking the
Laplace transform on both sides, by linearity, yields L{fZ } =
w1 L{fX1 }+w2 L{fX2 }. Thus, the probability density function
fZ is realized by the interconnection of system (49) and
system (50) with u = u1 = u2 and y = w1 y1 + w2 y2 , as
desired.
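Theorem 9 can also be checked numerically. The sketch below (exponential components with illustrative rates and weights, realized as in the earlier sketches) builds the parallel interconnection and compares its impulse response with the mixture density; the realization (54) of Example 10 has exactly this structure, with weights (p1, p2).

    import numpy as np
    from scipy.linalg import expm

    lam1, lam2, w1, w2 = 1.0, 3.0, 0.3, 0.7
    A = np.diag([-lam1, -lam2])     # block-diagonal: the subsystems are uncoupled
    B = np.array([[lam1], [lam2]])  # common input u = u1 = u2
    C = np.array([[w1, w2]])        # weighted output y = w1*y1 + w2*y2

    for t in (0.2, 1.0, 3.0):
        h = (C @ expm(A * t) @ B).item()
        mix = w1 * lam1 * np.exp(-lam1 * t) + w2 * lam2 * np.exp(-lam2 * t)
        print(abs(h - mix))         # ~1e-16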
Example 10 (G/H2 /1 queueing system). Consider the G/H2 /1
queueing system in Fig. 1, in which the arrival process is a
general random process and the service process is governed by
a two-phase hyper-exponential random variable X. A customer
accesses either the service offered by the first server at rate
λ1 ∈ R>0 with probability p1 ∈ (0, 1) or the service offered
by the second server at rate λ2 ∈ R>0 with probability
p2 = 1 − p1 . The probability density function of the random
variable X which represents the service time, i.e. the time
spent by an arbitrary customer in the service process, is
fX (t) = p1 λ1 e−λ1 t δ−1 (t) + p2 λ2 e−λ2 t δ−1 (t) for all t ∈ R≥0 .
In view of Theorem 9, the probability density function of the
service time is realized by system (1), with
A = [ −λ1 0 ; 0 −λ2 ],   B = [ λ1 λ2 ]′,   C = [ p1 p2 ].   (54)
N
A straightforward generalisation of Theorem 9 is given by
the following result.
Corollary 7. Under the hypotheses of Corollary 6, let the
random variable Z with probability density function fZ
arise from a finite mixture distribution with components
fX1 , fX2 , . . . , fXN and weights w1 , w2 , . . . , wN ≠ 0. Then
the probability density function fZ is realized by the interconnection of systems (51) with u = u1 = u2 = . . . = uN and
y = w1 y1 + w2 y2 + . . . + wN yN .
We conclude this section presenting a systems-theoretic
necessary and sufficient condition which guarantees the identifiability of finite mixtures admitting a linear realization (see
also [92]).
Theorem 10. Let FX be a family of probability density
functions, the elements of which arise from a finite mixture
distribution with components fX1 and fX2 and weights
w1 ∈ R>0 and w2 ∈ R>0 . Assume fX1 and fX2 are realized
by the minimal systems (49) and (50), respectively. Then the
family of probability density functions FX is identifiable if
and only if σ(A1 ) ∩ σ(A2 ) = ∅.
Proof. The identifiability of the family of probability density functions FX is equivalent to the minimality of the parallel interconnection with weights w1 ∈ R>0 and w2 ∈ R>0 of the minimal systems (49) and (50), which, in turn, is equivalent to σ(A1) ∩ σ(A2) = ∅ [80, 91].

[Fig. 1: G/H2/1 queueing system.]
[Fig. 2: G/M/1 queueing system.]

Example 11 (G/H2/1 queueing system, continued). Suppose we are interested in finding conditions under which the family of probability density functions

FX = { t ↦ fX(t; λ) = Σ_{j=1}^{2} pj λj e^{−λj t} δ−1(t), λ ∈ R^2>0 },

which describes the service time of the G/H2/1 queueing system displayed in Fig. 1, is identifiable. Note that the probability density function of the service time is realized by system (1), with matrices defined as in (54). By Theorem 10, since p1, p2 ∈ (0, 1), the family of probability density functions FX is identifiable if and only if λ ∈ R^2>0 does not lie on the bisector of the first quadrant, i.e. λ1 ≠ λ2. Note that in the case λ1 = λ2 = λ ∈ R>0 a customer accesses the service offered by a server at rate λ with probability one and hence the model is overparameterised. In other words, the queueing system in question may be equivalently described by the G/M/1 queueing system displayed in Fig. 2, in which the service process is governed by an exponential random variable X with parameter λ. As anticipated in Remark 7, this phenomenon is also captured by the systems-theoretic notion of minimality: in accordance with Theorem 10, for λ1 = λ2 the system (54) is not minimal. N
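The loss of minimality at λ1 = λ2 noted in Example 11 can be made concrete with the Kalman rank tests. A sketch for the realization (54), with illustrative parameters:

    import numpy as np

    def is_minimal(A, B, C):
        # Kalman rank tests: minimal iff controllable and observable.
        n = A.shape[0]
        ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
        obsv = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
        return np.linalg.matrix_rank(ctrb) == n and np.linalg.matrix_rank(obsv) == n

    p1, p2 = 0.4, 0.6
    for lam1, lam2 in ((1.0, 3.0), (2.0, 2.0)):
        A = np.diag([-lam1, -lam2])
        B = np.array([[lam1], [lam2]])
        C = np.array([[p1, p2]])
        print(lam1 != lam2, is_minimal(A, B, C))  # the two flags always coincide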
D. Renewal processes
We complete this section showing that elementary results
from renewal theory [6 , 7 , 69] can be translated in the language
of systems theory using the notion of feedback interconnection.
Definition 10. [7] A sequence of random variables {Sj}j∈Z≥0 constitutes a renewal process if it is of the form¹² Sj = T1 + T2 + . . . + Tj, where {Tj}j∈Z>0 is a sequence of mutually independent random variables with a common distribution F such that F(0) = 0.

¹² By convention, S0 = 0 and 0 counts as zero occurrences [7].
The random variable Sj in the above definition is often
referred to as (the j-th) renewal, while the elements of the
sequence {Tj }j∈Z>0 are referred to as waiting times [7 , 69].
The common distribution of the waiting times of a renewal
process is called the waiting-time distribution [69].
The probabilistic behaviour of a renewal process {Sj}j∈Z≥0 is closely related to the random variable Nt, with t ∈ R>0, defined as the largest j ∈ Z≥0 for which Sj ≤ t [93]. The random variable Nt describes the number of renewals that have occurred by time t and its expected value H(t) = E[Nt], referred to as the renewal function, satisfies the integral equation of renewal theory [93]. Moreover, if the waiting-time distribution of the renewal process is absolutely continuous then the renewal density, defined as h(t) = Ḣ(t) for all t ∈ R>0, satisfies the renewal density integral equation [93], i.e.

h(t) = f(t) + ∫_0^t h(t − ζ) f(ζ) dζ,   (55)

in which f is the derivative of the waiting-time distribution.
Theorem 11. Let {Sj }j∈Z≥0 be a renewal process. Assume
the waiting-time distribution of the renewal process {Sj }j∈Z≥0
admits a probability density function f which is realized by
the minimal system (1). Then the renewal density h of the
renewal process {Sj }j∈Z≥0 is realized by the system obtained
from system (1) with input v(t) ∈ R, output y(t) ∈ R, and
interconnection equation u = v + y.
Proof. To begin with note that under the stated assumptions the renewal density h of the renewal process {Sj}j∈Z≥0 is well-defined [69, Proposition 2.7]. In addition, the renewal density h satisfies the renewal density integral equation (55). This implies that the Laplace transform of the renewal density is such that L{h} = L{f}/(1 − L{f}) [93, p. 252]. Thus, since by hypothesis L{f} coincides with the transfer function of system (1), the renewal density h is realized by the system obtained from system (1) with input v(t) ∈ R, output y(t) ∈ R, and interconnection equation u = v + y.
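The feedback construction is easy to exercise numerically: closing the loop u = v + y around a realization (A, B, C) of f replaces A by A + BC. For the exponential waiting times of the forthcoming Example 12, L{h} = λ/s, so the renewal density is the constant λ. A sketch, with the same assumed exponential realization as in the earlier sketches:

    import numpy as np
    from scipy.linalg import expm

    lam = 2.0
    A, B, C = np.array([[-lam]]), np.array([[lam]]), np.array([[1.0]])
    Acl = A + B @ C   # u = v + y turns xdot = Ax + Bu into xdot = (A + BC)x + Bv

    for t in (0.1, 1.0, 5.0):
        print((C @ expm(Acl * t) @ B).item())   # 2.0 at every t: h(t) = lam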
Example 12 (Poisson processes). Poisson processes are renewal
processes in which the waiting times have an exponential
distribution [70]. In other words, a renewal process is a Poisson
process if there exists λ ∈ R>0 such that the probability density
function of each waiting time is f (t) = λe−λt δ−1 (t), for all
t ∈ R≥0 . In view of Theorem 11, the renewal density of the
process is realized by the system obtained from system (21)
with input v(t) ∈ R, output y(t) ∈ R, and interconnection
equation u = v + y, i.e. by system (1), with A = 0, B = λ,
C = 1. Note that the impulse response of the system is given
by h(t) = λδ−1 (t) for all t ∈ R, which is consistent with the
fact that the renewal function of a Poisson process can be
written as H(t) = λt for all t ∈ R≥0 [94].
N
E. On the approximation of probability density functions

This section investigates some connections between the approximation of probability density functions and the model reduction problem [25]. In particular we show that these problems are essentially the same problem when probability density functions are regarded as impulse responses. As a guiding example, we consider phase-type distributions [30, 31, 33, 34], which play an important role in the analysis of queueing networks [94] and can be represented by a random variable describing the time until absorption of a Markov chain with one absorbing state [95]. Note, however, that similar considerations can be performed for more general classes of probability density functions.

Consider a continuous-time Markov chain over the set S = {0, 1, . . . , n}, with n ∈ Z>0, in which 0 is an absorbing state and 1, . . . , n are transient states. The random variable X which characterises the time until absorption is described by a continuous phase-type distribution represented by the Q-matrix [37]

Q = [ 0 0 ; Q0 Q ],

in which Q ∈ R^{n×n} is such that Q0 = −Q1. Assuming that the initial probability of the Markov chain is

π0 = [ 0 α ],

with α ∈ R^{1×n} such that α1 = 1, the probability density function of X reads as

fX(t) = Q0′ e^{Q′t} α′ δ−1(t),   (56)

for every t ∈ R. Note that (56) can be regarded as the impulse response of system (1), with A = Q′, B = α′ and C = Q0′.
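A sketch of (56) for a concrete chain (the two-transient-state chain of the forthcoming Example 13, with illustrative rates): the density is computed as the impulse response of system (1) with A = Q′, B = α′, C = Q0′, and is checked to be a bona fide probability density.

    import numpy as np
    from scipy.linalg import expm

    lam, mu = 1.0, 0.5
    Q  = np.array([[-lam, 0.0], [lam, -(lam + mu)]])   # transient sub-generator
    Q0 = -Q @ np.ones((2, 1))                          # Q0 = -Q 1 = [lam, mu]'
    alpha = np.array([[0.0, 1.0]])                     # initial transient distribution

    A, B, C = Q.T, alpha.T, Q0.T                       # realization of (56)

    t = np.linspace(0.0, 40.0, 8001)
    f = np.array([(C @ expm(A * s) @ B).item() for s in t])
    print(f.min() >= 0.0, np.sum(f) * (t[1] - t[0]))   # nonnegative, total mass ~ 1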
This indicates that the problem of approximating the probability density function (56) can be regarded as the problem of approximating the impulse response of system (1) or, equivalently, the problem of constructing a reduced order model of system (1) [25]. In particular, approximating the probability density function (56) by another phase-type distribution boils down to constructing a system

ξ̇ = Fξ + Gv,   ψ = Hξ,   (57)

in which ξ(t) ∈ R^ν, with ν < n, v(t) ∈ R, ψ(t) ∈ R and F ∈ R^{ν×ν}, G ∈ R^{ν×1} and H ∈ R^{1×ν} are constant matrices such that 1′G = 1 and H = −1′F. To illustrate this point we consider a simple example and exploit the model reduction technique devised in [61, 62] to obtain reduced order models (i.e. approximations of a given probability density function) which match a prescribed number of (systems-theoretic and, thus, probabilistic) moments. Note, however, that different model reduction techniques can be used to solve the problem in question (see [25] and references therein for an overview of available model reduction techniques).

Example 13. Consider the Markov chain described by the diagram

[Diagram: states 0, 1 and 2, with transitions 1 → 0 at rate λ, 2 → 1 at rate λ and 2 → 0 at rate µ.]

The Q-matrix of the Markov chain is

Q = [ 0 0 0 ; λ −λ 0 ; µ λ −(λ + µ) ],

and

Q0 = [ λ ; µ ],   Q = [ −λ 0 ; λ −(λ + µ) ].

Let α = [ 0 1 ], so that the initial condition of the Markov chain is π0 = [ 0 0 1 ], and let X be the random variable which characterises the time until absorption. A realization of the probability density function of the random variable X is given by system (1), with

A = [ −λ λ ; 0 −(λ + µ) ],   B = [ 0 ; 1 ],   C = [ λ µ ].
Following [62], to construct a reduced order model which matches the moment of order one of the system at zero one needs to solve the Sylvester equation (4), with S = 0 and M = 1, which gives Υ1 = [ −1 −1 ]. Then one defines the reduced order model (57) as

F = S − M∆,   G = −Υ1B,   H = ∆,

with ∆ ∈ (0, 1) a free parameter that can be assigned, yielding

F = −∆,   G = 1,   H = ∆.

Note that the structure of the original system is preserved in the reduced order model, since 1′G = 1 and H = −1′F. Moreover, in agreement with the results of [62], the moment at zero of the reduced order model coincides with the moment at zero of the original system, regardless of the value of ∆.
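The reduction step can be reproduced in a few lines. A sketch with illustrative rates; the sign convention for the Sylvester equation (4) is an assumption (taken here as ΥA = C for S = 0 and M = 1, which reproduces the reported value Υ1 = [ −1 −1 ]):

    import numpy as np

    lam, mu = 1.0, 0.5
    A = np.array([[-lam, lam], [0.0, -(lam + mu)]])
    B = np.array([[0.0], [1.0]])
    C = np.array([[lam, mu]])

    Y = C @ np.linalg.inv(A)            # solves 0 = Y A - C, giving Y = [-1, -1]
    print(Y)

    Delta = 0.8                         # free parameter in (0, 1)
    F = np.array([[-Delta]])            # F = S - M*Delta
    G = -Y @ B                          # G = -Upsilon_1 B = 1
    H = np.array([[Delta]])             # H = Delta

    # The moment at zero (transfer function evaluated at s = 0) is preserved
    # for every admissible Delta: both values below equal 1.0.
    print((-C @ np.linalg.inv(A) @ B).item(), (-H @ np.linalg.inv(F) @ G).item())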
From a probabilistic point of view, the impulse response of the reduced order model corresponds to the probability density function of the random variable X̃ which quantifies the time until absorption of the “reduced” Markov chain described by the diagram

[Diagram: states 0 and 1, with transition 1 → 0 at rate ∆.]

and the Q-matrix

Q̃ = [ 0 0 ; ∆ −∆ ].

It is interesting to note that the “reduced” Markov chain can be interpreted as a Markov chain built from the original Markov chain by aggregation of the states 1 and 2, thus showing the connection between the model reduction problem and the use of the concept of aggregation to reduce the state space of a Markov chain [96–98].

We conclude this example by emphasising that there is no natural choice of the parameter ∆. For example, one may select ∆ = λ to ensure that in the “reduced” Markov chain the transition rate from 1 to 0 matches that of the original Markov chain. Another sensible choice is ∆ = (λ + µ)/2, which guarantees that the moments of order two at zero of the reduced order model and of the original system coincide, so that the moments of order two of the random variables X and X̃ also coincide. N

TABLE I: Random variables and associated probability density function, parameters and realization.

    Random variable   Probability density function            Parameters          Realization
    Exponential       λ e^{−λt} δ−1(t)                        λ ∈ R>0             (21)
    Half-normal       √(2/(πσ²)) e^{−t²/(2σ²)} δ−1(t)         σ ∈ R>0             (24)
    Laplace           (λ/2) e^{−λ|t|}                         λ ∈ R>0             (32)
    Normal            (1/√(2πσ²)) e^{−t²/(2σ²)}               σ ∈ R>0             (37)
    Uniform           (1/(b − a)) 1_{[a,b]}(t)                a, b ∈ R (a < b)    (39)
    Triangular        (1/τ) max{0, 1 − |t|/τ}                 τ ∈ R>0             (45)
    Erlang            (λ^n t^{n−1}/(n − 1)!) e^{−λt} δ−1(t)   λ ∈ R>0             (53)

TABLE II: Correspondence between concepts of probability theory and of systems theory.

    Probability theory                     Systems theory
    Sum of independent random variables    Series interconnection
    Mixture distribution                   Parallel interconnection
    Renewal process                        Feedback interconnection

V. CONCLUSION

Moments of continuous random variables admitting a probability density function have been studied. Under certain hypotheses, the moments of a random variable have been shown to be in one-to-one correspondence with the solutions of a Sylvester equation and with the steady-state output response of a specific interconnected system. The results established in this work have shown that, under certain assumptions, a system can be seen as an alternative, equivalent description of the probabilistic structure of a random variable. This, in turn, indicates that systems-theoretic techniques and tools can be used to revisit and shed new light on classical results of probability theory and statistics. Table I displays a short list of random variables along with their probability density functions, parameters and associated realization, while Table II summarises the correspondence between certain concepts of probability theory and their systems-theoretic counterpart.

The present work is a first step towards a unified understanding of the role of systems-theoretic concepts in probability theory and statistics. Several directions for interdisciplinary research are left open. For example, discrete-time systems [99] provide the right tool to carry out the analysis presented in this work for discrete random variables. Hybrid systems [100, 101] may be used to deal with random variables the distribution function of which has both a discrete part and an absolutely continuous part. Moments of multivariate distributions may be studied resorting to systems described by PDEs [102, 103], for continuous random variables, and nD systems [104], for discrete random variables. As a consequence, conditional probabilities may be characterised in systems-theoretic terms. Note, however, that modelling moments of multivariate distributions using PDEs might raise challenging computational issues. Further connections may be explored with positive systems [38] and, in particular, with Markov chains [5, 6, 8, 35–37]. The interplay between the role of Hankel matrices in realization theory [79–81] and in probability theory [5–8] may be studied. The notions of information and of entropy may be investigated using the proposed framework. The discussion on the approximation of probability density functions which arise from phase-type distributions can be generalised to probability density functions with a systems-theoretic representation in explicit form, for which model reduction techniques are available [58, 77]. Finally, a systems-theoretic counterpart of the method of moments [18] may be developed modifying existing data-driven model reduction methods [63].

REFERENCES

[1] H. Cramér, Mathematical methods of statistics. Princeton, NJ: Princeton Univ. Press, 1946.
[2] M. G. Kendall and A. Stuart, The advanced theory of statistics. London,
U.K.: Griffin, 1977, vol. I.
[3] R. J. Serfling, Approximation theorems of mathematical statistics. New
York: Wiley, 1980.
[4] E. L. Lehmann and G. Casella, Theory of point estimation (2nd edition).
New York: Springer, 1998.
[5] J. L. Doob, Stochastic processes. New York: Wiley, 1953.
[6] W. Feller, An introduction to probability theory and its applications
(3rd edition). New York: Wiley, 1971, vol. 1.
[7] ——, An introduction to probability theory and its applications (3rd
edition). New York: Wiley, 1971, vol. 2.
[8] P. Billingsley, Probability and measure (2nd edition). New York:
Wiley, 1986.
[9] W. Rudin, Principles of mathematical analysis. New York: McGraw-Hill, 1964.
[10] T. J. Stieltjes, “Recherches sur les fractions continues,” in Ann. Fac.
Sci. Toulouse Sci. Math. Sci. Phys., vol. 8, 1894, pp. 1–122.
[11] H. Hamburger, “Über eine Erweiterung des Stieltjesschen Momentenproblems,” Math. Ann., vol. 81, pp. 235–319, 1920.
[12] F. Hausdorff, “Momentprobleme für ein endliches Intervall,” Math. Z.,
vol. 16, pp. 220–248, 1923.
[13] Y. V. Vorobyev, Methods of moments in applied mathematics (translated
from russian by B. Seckler). New York: Gordon and Breach, 1965.
[14] H. Fischer, A history of the central limit theorem: from classical to
modern probability theory. New York: Springer, 2010.
[15] M. Kac, “On some connection between probability theory and differential and integral equations,” Proc. 2nd Berkeley Sympos. Math. Stat.
Prob, 1951, pp. 189–215.
[16] E. P. Wigner, “On the distribution of the roots of certain symmetric
matrices,” Ann. Math., vol. 67, pp. 325–327, 1958.
[17] S. Karlin and J. L. McGregor, “The differential equations of birth-and-death processes, and the Stieltjes moment problem,” Trans. Amer. Math.
Soc., vol. 85, no. 2, pp. 481–546, 1957.
[18] K. Pearson, “Contributions to the mathematical theory of evolution,”
Phil. Trans. Roy. Soc., vol. 185a, pp. 71–110, 1894.
[19] J. H. Friedman and J. W. Tukey, “A projection pursuit algorithm for
exploratory data analysis,” IEEE Trans. Comput., vol. 23, pp. 881–890,
1974.
[20] P. J. Huber, “Projection pursuit,” Ann. Stat., vol. 13, no. 2, pp. 435–475,
1985.
[21] J. H. Friedman, “Exploratory projection pursuit,” J. Amer. Stat. Ass.,
vol. 82, pp. 249–266, 1987.
[22] N. I. Akhiezer, The classical moment problem. New York: Hafner,
1965.
[23] S. Karlin and W. J. Studden, Tchebycheff systems: with applications in
analysis and statistics. New York: Interscience, 1966.
[24] H. J. Landau, Ed., Moments in mathematics. Providence, RI: Amer.
Math. Soc., 1987.
[25] A. C. Antoulas, Approximation of large-scale dynamical systems.
Philadelphia, PA: SIAM, 2005.
[26] C. Commault and S. Mocanu, “Phase-type distributions and representations: some results and open problems for system theory,” Int. J.
Control, vol. 76, no. 6, pp. 566–580, 2003.
[27] L. Benvenuti and L. Farina, “A tutorial on the positive realization
problem,” IEEE Trans. Autom. Control, vol. 49, no. 5, pp. 651–664,
2004.
[28] B. Hanzon and R. J. Ober, “A state-space calculus for rational probability
density functions and applications to non-Gaussian filtering,” SIAM J.
Control Optim., vol. 40, no. 3, pp. 724–740, 2001.
[29] ——, “State space calculations for discrete probability densities,” Linear
Algebra Appl., vol. 350, no. 1, pp. 67–87, 2002.
[30] C. A. O’Cinneide, “Characterization of phase-type distributions,” Stoch.
Models, vol. 6, no. 1, pp. 1–57, 1990.
[31] ——, “Phase-type distributions: open problems and a few properties,”
Stoch. Models, vol. 15, no. 4, pp. 731–757, 1999.
[32] M. Bladt and M. F. Neuts, “Matrix-exponential distributions: Calculus
and interpretations via flows,” Stoch. Models, vol. 15, no. 4, pp. 731–757,
2003.
[33] M. F. Neuts, Matrix-geometric solutions in stochastic models: an
algorithmic approach. Baltimore, MD: Johns Hopkins Univ. Press,
1981.
[34] G. Latouche and V. Ramaswami, Introduction to matrix analytic methods
in stochastic modeling. Philadelphia, PA: SIAM, 1999.
[35] E. B. Dynkin, Markov processes. New York: Plenum, 1963.
[36] K. L. Chung, Markov chains with stationary transition probabilities.
New York: Springer-Verlag, 1967.
15
[37] J. R. Norris, Markov chains. Cambridge, U.K.: Cambridge Univ. Press,
1998.
[38] L. Farina and S. Rinaldi, Positive linear systems: theory and applications. New York: Wiley, 2000.
[39] F. Gantmacher, The theory of matrices. New York: Chelsea, 1959,
vol. II.
[40] E. Seneta, Non-negative matrices and Markov chains (2nd edition).
New York: Springer-Verlag, 1981.
[41] R. A. Horn and C. R. Johnson, Matrix analysis. Cambridge, U.K.:
Cambridge Univ. Press, 1985.
[42] H. Minc, Nonnegative matrices. New York: Wiley, 1988.
[43] C. G. Cassandras and S. Lafortune, Introduction to discrete event
systems. New York: Kluwer, 1999.
[44] P. Brémaud, Markov chains: Gibbs fields, Monte Carlo simulation, and
queues. New York: Springer-Verlag, 2001.
[45] M. Andretta, “Some considerations on the definition of risk based on
concepts of systems theory and probability,” Risk Analysis, vol. 34,
no. 7, pp. 1184–1195, 2014.
[46] N. Leveson, “A systems approach to risk management through leading
safety indicators,” Reliab. Eng. Syst. Saf., vol. 136, pp. 17–34, 2015.
[47] W. C. Elmore, “The transient response of damped linear networks with
particular regard to wideband amplifiers,” J. App. Phys., vol. 19, no. 1,
pp. 55–63, 1948.
[48] R. E. Kalman, “A new approach to linear filtering and prediction
problems,” J. Basic Eng., vol. 82, pp. 35–45, 1960.
[49] R. E. Kalman and R. S. Bucy, “New results in linear filtering and
prediction theory,” J. Basic Eng., vol. 83, pp. 95–108, 1961.
[50] T. D. Söderström and P. G. Stoica, System identification. Upper Saddle
River, NJ: Prentice Hall, 1989.
[51] L. Ljung, System identification - theory for the user (2nd edition).
Upper Saddle River, NJ: Prentice Hall, 1999.
[52] T. Kailath, A. H. Sayed, and B. Hassibi, Linear estimation. Englewood
Cliffs, NJ: Prentice Hall, 1999.
[53] P. Van Overschee and B. De Moor, Subspace identification for linear
systems: theory - implementation - applications. Dordrecht, The
Netherlands: Kluwer, 1996.
[54] T. Katayama, Subspace methods for system identification. London,
U.K.: Springer, 2006.
[55] M. Verhaegen and V. Verdult, Filtering and system identification: a
least squares approach. Cambridge, U.K.: Cambridge Univ. Press,
2007.
[56] A. Lindquist and G. Picci, Linear Stochastic Systems. Berlin, Germany:
Springer-Verlag, 2015.
[57] L. A. Zadeh and C. A. Desoer, Linear system theory: the state space
approach. New York: McGraw-Hill, 1963.
[58] G. Scarciotti and A. Astolfi, “Model reduction by matching the steady-state response of explicit signal generators,” IEEE Trans. Autom. Control,
vol. 61, no. 7, pp. 1995–2000, 2016.
[59] K. A. Gallivan, A. Vandendorpe, and P. Van Dooren, “Sylvester
equations and projection-based model reduction,” J. Comp. Applied
Math., vol. 162, no. 1, pp. 213–229, 2004.
[60] ——, “Model reduction and the solution of Sylvester equations,” in
Proc. MTNS, Kyoto, Japan, 2006.
[61] A. Astolfi, “Model reduction by moment matching for linear and
nonlinear systems,” IEEE Trans. Autom. Control, vol. 55, no. 10, pp.
2321–2336, 2010.
[62] ——, “Model reduction by moment matching, steady-state response
and projections,” in Proc. 49th Conf. Decision Control. IEEE, 2010,
pp. 5344–5349.
[63] G. Scarciotti and A. Astolfi, “Data-driven model reduction by moment
matching for linear and nonlinear systems,” Automatica, vol. 79, pp.
340–351, 2017.
[64] J. Carr, Applications of centre manifold theory. New York: Springer-Verlag, 1981.
[65] C. I. Byrnes and A. Isidori, “Steady state response, separation principle
and the output regulation of nonlinear systems,” in Proc. 28th Conf.
Decision Control, Tampa, FL, USA, 1989, pp. 2247–2251.
[66] A. Isidori and C. I. Byrnes, “Output regulation of nonlinear systems,”
IEEE Trans. Autom. Control, vol. 35, no. 2, pp. 131–140, 1990.
[67] B. V. Gnedenko and A. N. Kolmogorov, Limit distributions for sums
of independent random variables (2nd edition). Cambridge, MA:
Addison-Wesley, 1968.
[68] B. S. Everitt and D. J. Hand, Finite mixture distributions. London,
U.K.: Chapman and Hall, 1981.
[69] S. Asmussen, Applied probability and queues. New York: Springer,
2008.
[70] R. G. Gallager, Stochastic processes: theory for applications. Cambridge, U.K.: Cambridge Univ. Press, 2013.
[71] A. Padoan and A. Astolfi, “Moments of random variables: a linear
systems-theoretic view,” in Amer. Control Conf., Seattle, WA, USA,
2017, pp. 1035–1040.
[72] G. Scarciotti and A. Astolfi, “Model reduction of neutral linear and
nonlinear time-invariant time-delay systems with discrete and distributed
delays,” IEEE Trans. Autom. Control, vol. 61, no. 6, pp. 1438–1451,
2016.
[73] G. Grimmett and D. Welsh, Probability: an introduction. Oxford,
U.K.: Oxford Univ. Press, 2014.
[74] R. B. Bapat, Graphs and matrices. London, U.K.: Springer, 2010.
[75] A. Isidori and C. I. Byrnes, “Steady-state behaviors in nonlinear systems
with an application to robust disturbance rejection,” Ann. Rev. Control,
vol. 32, no. 1, pp. 1–16, 2008.
[76] A. Isidori, Nonlinear control systems (3rd edition). New York: Springer-Verlag, 1995.
[77] G. Scarciotti and A. Astolfi, “Characterization of the moments of a
linear system driven by explicit signal generators,” Proc. Amer. Control
Conf., Chicago, IL, USA, 2015, pp. 589–594.
[78] J. K. Hale, Ordinary differential equations (2nd edition). Malabar, FL,
USA: Krieger, 1980.
[79] R. E. Kalman, P. L. Falb, and M. A. Arbib, Topics in mathematical
system theory. New York: McGraw-Hill, 1969, vol. 33.
[80] T. Kailath, Linear systems. Englewood Cliffs, NJ: Prentice Hall, 1980.
[81] E. D. Sontag, Mathematical control theory: deterministic finite dimensional systems (2nd edition). New York: Springer-Verlag, 1998.
[82] P. Embrechts and H. Schmidli, “Modelling of extremal events in
insurance and finance,” ZOR, vol. 39, no. 1, pp. 1–34, 1994.
[83] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab, Signals and systems.
Upper Saddle River, N.J.: Prentice Hall, 1997.
[84] J. B. Conway, Functions of one complex variable. New York: Springer-Verlag, 1973.
[85] A. Padoan and A. Astolfi, “Model reduction by moment matching at
isolated singularities for linear systems: a complex analytic approach,”
in IFAC World Congr., Toulouse, France, 2017, pp. 6525–6528.
[86] ——, “Model reduction by moment matching at isolated singularities
for linear systems: a geometric approach,” in Proc. 56th Conf. Decision
Control, Melbourne, Australia, 2017, pp. 4807–4812.
[87] ——, “Isolated singularities and moments of linear and nonlinear
systems,” IEEE Trans. Autom. Control, 2017, under review.
[88] T. J. Rothenberg, “Identification in parametric models,” Econometrica,
no. 3, pp. 577–591, 1971.
[89] F. W. Steutel and K. Van Harn, Infinite divisibility of probability
distributions on the real line. New York: Marcel Dekker, 2003.
[90] E. Lukacs, Characteristic functions (2nd edition). London, U.K.:
Griffin, 1970.
[91] C. T. Chen and C. Desoer, “Controllability and observability of
composite systems,” IEEE Trans. Autom. Control, vol. 12, no. 4, pp.
402–409, 1967.
[92] S. Frühwirth-Schnatter, Finite mixture and Markov switching models.
New York: Springer, 2006.
[93] W. L. Smith, “Renewal theory and its ramifications,” J. Roy. Statist.
Soc., pp. 243–302, 1958.
[94] W. J. Stewart, Probability, Markov chains, queues, and simulation: the
mathematical basis of performance modeling. Princeton, NJ: Princeton
Univ. Press, 2009.
[95] M. Harchol-Balter, Performance modeling and design of computer
systems: queueing theory in action. Cambridge, U.K.: Cambridge
Univ. Press, 2013.
[96] G. Kotsalis and M. Dahleh, “Model reduction of irreducible Markov
chains,” in Proc. 42nd Conf. Decision Control, Maui, Hawaii, USA,
2003, pp. 5727–5728.
[97] T. Runolfsson and Y. Ma, “Model reduction of nonreversible Markov
chains,” in Proc. 46th Conf. Decision Control, New Orleans, LA, USA,
2007, pp. 3739–3744.
[98] K. Deng and D. Huang, “Model reduction of Markov chains via low-rank approximation,” in Proc. Amer. Control Conf., Montreal, Canada,
2012, pp. 2651–2656.
[99] K. Ogata, Discrete-time control systems. Englewood Cliffs, NJ: Prentice
Hall, 1995.
[100] A. J. Van Der Schaft and J. M. Schumacher, An introduction to hybrid
dynamical systems. London, U.K.: Springer-Verlag, 2000.
[101] R. Goebel, R. G. Sanfelice, and A. R. Teel, “Hybrid dynamical systems,”
IEEE Control Syst. Mag., vol. 29, no. 2, pp. 28–93, 2009.
[102] R. F. Curtain and H. Zwart, An introduction to infinite-dimensional
linear systems theory. New York: Springer-Verlag, 1995.
16
[103] M. Tucsnak and G. Weiss, Observation and control for operator
semigroups. Basel, Switzerland: Birkhäuser, 2009.
[104] N. K. Bose, Multidimensional systems theory and applications (2nd
ed.). Amsterdam, The Netherlands: Kluwer, 2003.
Alberto Padoan was born in Ferrara, Italy, in
1989. He received the Laurea Magistrale degree
(M.Sc. equivalent) cum laude in automation engineering from the University of Padova, Italy,
in 2013. He then joined the Control and Power
Group at the Electrical and Electronic Engineering Department, Imperial College London, U.K.,
from which he received the Ph.D. degree in
2018.
He is currently a Research Associate in the
Control Group, Department of Engineering, University of Cambridge, U.K., and a Research Associate of Sidney Sussex
College. His research interests are focused on dynamical systems and
control theory, with special emphasis on modelling, system identification
and model reduction.
Alessandro Astolfi was born in Rome, Italy, in
1967. He graduated in electrical engineering from
the University of Rome in 1991. In 1992 he
joined ETH-Zurich where he obtained a M.Sc.
in Information Theory in 1995 and the Ph.D.
degree with Medal of Honor in 1995 with a thesis
on discontinuous stabilization of nonholonomic
systems. In 1996 he was awarded a Ph.D. from
the University of Rome La Sapienza for his
work on nonlinear robust control. Since 1996
he has been with the Electrical and Electronic
Engineering Department of Imperial College London, London (UK),
where he is currently Professor of Nonlinear Control Theory and Head
of the Control and Power Group. From 1998 to 2003 he was also an
Associate Professor at the Dept. of Electronics and Information of the
Politecnico of Milano. Since 2005 he has also been a Professor at
Dipartimento di Ingegneria Civile e Ingegneria Informatica, University of
Rome Tor Vergata. He has been a visiting lecturer in Nonlinear Control in
several universities, including ETH-Zurich (1995-1996); Terza University
of Rome (1996); Rice University, Houston (1999); Kepler University, Linz
(2000); SUPELEC, Paris (2001); Northeastern University (2013).
His research interests are focused on mathematical control theory
and control applications, with special emphasis on the problems of
discontinuous stabilization, robust and adaptive control, observer design
and model reduction. He is the author of more than 150 journal papers,
of 30 book chapters and of over 240 papers in refereed conference
proceedings. He is the author (with D. Karagiannis and R. Ortega) of the
monograph Nonlinear and Adaptive Control with Applications (Springer-Verlag).
He is the recipient of the IEEE CSS A. Ruberti Young Researcher
Prize (2007), the IEEE RAS Googol Best New Application Paper Award
(2009), the IEEE CSS George S. Axelby Outstanding Paper Award
(2012), the Automatica Best Paper Award (2017). He is a Distinguished
Member of the IEEE CSS, IET Fellow, IEEE Fellow and IFAC Fellow. He
is the recipient of the 2015 Sir Harold Hartley Medal, for outstanding
contribution to the technology of measurement and control, from the
Institute of Measurement and Control.
He served as Associate Editor for Automatica, Systems and Control
Letters, the IEEE Trans. on Automatic Control, the International Journal of
Control, the European Journal of Control and the Journal of the Franklin
Institute; as Area Editor for the Int. J. of Adaptive Control and Signal
Processing; as Senior Editor for the IEEE Trans. on Automatic Control;
and as Editor-in-Chief for the European Journal of Control. He is currently
Editor-in-Chief of the IEEE Trans. on Automatic Control. He served as
Chair of the IEEE CSS Conference Editorial Board (2010-2017) and in
the IPC of several international conferences. He is a Member of the IEEE
Fellow Committee (2016).