
Moments of Random Variables: A Systems-Theoretic Interpretation

2019, IEEE Transactions on Automatic Control

Moments of continuous random variables admitting a probability density function are studied. We show that, under certain assumptions, the moments of a random variable can be characterised in terms of a Sylvester equation and of the steady-state output response of a specific interconnected system. This makes it possible to interpret well-known notions and results of probability theory and statistics in the language of systems theory, including the sum of independent random variables, the notion of mixture distribution and results from renewal theory. The theory developed is based on tools from center manifold theory, the theory of the steady-state response of nonlinear systems, and the theory of output regulation. Our formalism is illustrated by means of several examples and can be easily adapted to the case of discrete and of multivariate random variables.

Alberto Padoan, Member, IEEE, and Alessandro Astolfi, Fellow, IEEE

Index Terms— Differential equations, interpolation, nonlinear systems, probability, random variables, statistics, transfer functions.

I. INTRODUCTION

One of the most important and challenging problems in probability theory and statistics is that of determining the probability distribution of the random process that is thought to have produced a given set of data. Without making any assumptions on the underlying random process, this problem is extremely difficult in general. In parametric statistics [1-4], where the probability distribution which generates the given data is assumed to belong to a family of probability distributions parameterized by a fixed set of parameters, the problem can be dealt with using different approaches. A classical way [1-4] to solve this problem is to find mathematical objects that, under specific assumptions, uniquely identify a probability distribution in the family. In probability theory [5-8], a similar necessity arises when one is confronted with the problem of specifying a probability measure uniquely through a sequence of numbers. The determination of simple, yet meaningful, objects with these features is therefore of paramount importance.

The significance of the moments of a random variable in this context is comparable to that of the derivatives (at a point) of an analytic function. The representation of an analytic function through its Taylor series [9] makes it possible to recover the whole function once all the derivatives (at a point) are specified. Similarly, the set of all moments of a random variable uniquely identifies the probability distribution of the random variable, provided that certain conditions are satisfied. This means that the essential features of a random variable can be captured by its moments, which can then be used as an alternative description of the random variable. Moments have been used in a number of different contexts in probability theory, including the Stieltjes moment problem [10], the Hamburger moment problem [11], the Hausdorff moment problem [12] and the Vorobyev moment problem [13]. Moments have proved instrumental in the first rigorous (yet incomplete) proof of the central limit theorem [14], but also in the proof of the Feynman-Kac formula [15], in the characterisation of the eigenvalues of random matrices [16], and in the study of birth-death processes [17]. The correspondence between moments and probability distributions is also exploited in statistics to fit curves and to design parameter estimation procedures [1-4]. For example, the method of moments introduced in [18] takes advantage of this correspondence to build consistent (usually biased) estimators. Approximations of probability density functions may be computed exploiting this correspondence through Pearson's and Johnson's curves [2]. A revisitation of this correspondence has also led to exploratory projection pursuit [19-21], a graphical visualisation method for the interpretation of high-dimensional data. While this list of examples is far from exhaustive, it illustrates the central role of moments in a number of interesting problems. Further detail on the theory and applications of moments can be found, e.g., in [22-24].

This work represents a step towards bridging the gap between the notions of moment in probability theory and in systems theory. Connections are first established between moments of random variables and moments of systems [25].
These connections are then used to support the claim that a system can be seen as an alternative, equivalent description of the probabilistic structure of a random variable. This, in turn, indicates that systems-theoretic techniques and tools can be used to revisit and shed new light on classical results of probability theory and statistics.

This work has been partially supported by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No 739551 (KIOS CoE). A. Padoan is with the Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK (email: [email protected]). A. Astolfi is with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, UK and with the Dipartimento di Ingegneria Civile e Ingegneria Informatica, Università di Roma "Tor Vergata", Via del Politecnico 1, 00133, Rome, Italy (email: [email protected]).

The benefits of a dialogue between the two areas of research have been highlighted, e.g., in [26] and briefly mentioned in [27]. The literature on the analysis of results of probability theory in systems-theoretic terms, however, appears to be limited to specific topics and scattered across different research fields. Linear systems theory has made it possible to reinterpret basic results of probability theory for probability density functions with rational Laplace transform, e.g., in [28, 29]. Concepts and tools of systems theory have been used to investigate phase-type distributions [30, 31] and matrix exponential distributions [32]. The study of the properties of these distributions has grown into a well-established area of research which goes under the name of "matrix analytic methods" [33, 34]. Another well-known class of models which has attracted the interest of systems theorists over the past century is that of Markov chains [5, 6, 8, 35-37].
One of the reasons for this interest is that these models can be regarded as a special class of positive systems [38], which can be studied exploiting the theory of non-negative matrices [39-42]. Queueing systems have been studied borrowing analysis and design tools from systems theory, e.g., in [43, 44], whereas a systems-theoretic approach has been adopted to study risk theory, e.g., in [45, 46]. In circuit theory, moments have been used to approximate the propagation delay in an RC tree network by interpreting the impulse response of a circuit as a probability density function [47]. Finally, we emphasise that a number of fundamental tools for modelling, estimation and prediction may not be sharply categorised as belonging to the sphere of probability theory or to that of systems theory, but rather lie at the intersection of these areas of research. The Kalman filter [48, 49] is perhaps the most successful example of the exchange of ideas between probability theory and systems theory, but several other examples can be found (see, e.g., [50-56] and references therein).

The main objective of this work is to establish a one-to-one correspondence between moments of random variables and moments of systems [25]. This goal is readily achieved by interpreting probability density functions as impulse responses whenever these can be realized by linear time-invariant systems. The situation is more delicate when a probability density function can only be realized, e.g., by means of a linear time-varying system or a system in explicit form [57, 58], for which the interpretation in terms of impulse responses does not possess a direct counterpart. This issue is overcome exploiting recent developments in systems theory [58-63], where the moments of a linear system have been characterised as solutions of a Sylvester equation [59, 60] and, under certain hypotheses, as steady-state responses of the output of particular interconnected systems [61-63]. The characterisation of the moments of a system in terms of steady-state responses is based on tools arising in center manifold theory [64], the theory of the steady-state response of a nonlinear system [65], and the output regulation theory for nonlinear systems [66]. Existing results on moments of linear and nonlinear systems [61-63] are extended and the notion of moment for systems which only possess a representation in explicit form is revisited [58]. As a direct by-product of our approach, Sylvester equations and Sylvester-like differential equations may be interpreted as powerful computational tools for calculating moments of random variables. Our approach also makes it possible to revisit and reinterpret well-known notions and results of probability theory and statistics using the language of systems theory, including the sum of independent random variables [6, 8, 67], the notion of mixture distribution [7, 68], and results from renewal theory [6, 7, 69, 70]. Given the conceptual nature of the paper, we focus on linear systems, linear time-delay systems and systems in explicit form to streamline the exposition, although generalisations to further classes of systems are possible. The formalism is developed for univariate continuous random variables, but can be easily extended to discrete and multivariate random variables.

Preliminary results have been presented in [71], where a first connection between moments of probability density functions and moments of linear time-invariant systems has been established. As a result, the moment generating function of a random variable and the solution of a Sylvester equation have been shown to be closely related. The present manuscript extends our results to probability density functions which may not be described by means of a linear time-invariant system. To this end, the notion of moment of a system is revisited and extended to systems the impulse response of which is defined on the whole real line or on a compact subset of the real line. The theory is supported by several worked-out examples and applications to identifiability, queueing theory and renewal theory.

The rest of the paper is organized as follows. Section II provides basic definitions and the necessary background concerning moments of linear systems, of time-delay systems, of systems in explicit form and of random variables. Section III contains our main results and includes a characterisation of the moments of a random variable admitting a (linear or explicit) realization in terms of the systems-theoretic notion of moment. The moments of probability density functions defined on the real axis, as well as of probability density functions with compact support, are also characterised by means of Sylvester equations and steady-state responses. Section IV provides selected applications of the theoretical results developed, including the sum of independent random variables, the notion of mixture distribution, and results from renewal theory. Finally, conclusions and future research directions are outlined in Section V.
Notation: Z≥0 and Z>0 denote the set of non-negative integer numbers and the set of positive integer numbers, respectively. R, Rn and Rp×m denote the set of real numbers, of n-dimensional vectors with real entries and of p×m-dimensional matrices with real entries, respectively. R≥0 and R>0 denote the set of non-negative real numbers and the set of positive real numbers, respectively. C denotes the set of complex numbers. C<0 denotes the set of complex numbers with negative real part. i denotes the imaginary unit. I denotes the identity matrix. ej denotes the vector with the j-th entry equal to one and all other entries equal to zero. σ(A) denotes the spectrum of the matrix A ∈ Rn×n. M′ denotes the transpose of the matrix M ∈ Rp×m. ‖x‖ denotes the standard Euclidean norm of the vector x ∈ Rn while, with some abuse of notation, ‖A‖ denotes the induced norm of the matrix A ∈ Rn×n. Δk denotes, for all k ∈ Z≥0, the standard k-simplex, i.e. Δk = { w ∈ R(k+1)≥0 : w1 + … + wk+1 = 1 }. diag(λ) denotes the diagonal matrix whose diagonal elements are the entries of the vector λ ∈ Rn. J0 denotes the matrix with ones on the superdiagonal and zeros elsewhere. Jλ denotes the Jordan block associated with the eigenvalue λ ∈ C, i.e. Jλ = λI + J0. k! and (k)!! denote the factorial and the double factorial of k ∈ Z≥0, respectively. ẋ denotes the time derivative of the function x, provided it exists. f1 ∗ f2 denotes the convolution of the functions f1 and f2. L{f} denotes the bilateral Laplace transform of the function f. δ0 and δ−1 denote the Dirac δ-function and the right-continuous Heaviside unit step function, respectively. The time reversal of the function f : R → R is defined as t ↦ f(−t). δ̄−1 denotes the right-continuous time reversal of δ−1. 1S denotes the indicator function of the subset S of a set X, i.e. the function 1S : X → {0, 1} defined as 1S(x) = 1 if x ∈ S and 1S(x) = 0 if x ∉ S. By a convenient abuse of notation, 1 also denotes the vector the elements of which are all equal to one. The translation of the function f : R → R by τ ∈ R is defined as fτ : t ↦ f(t − τ). C[−T, 0] denotes the set of continuous functions mapping the interval [−T, 0] into Rn with the topology of uniform convergence. For all s⋆ ∈ C and A(s) ∈ Cn×n, s⋆ ∈ C \ σ(A(s)) means det(s⋆I − A(s⋆)) ≠ 0, while σ(A(s)) ⊂ C<0 means that if s⋆ ∈ C is such that det(s⋆I − A(s⋆)) = 0, then s⋆ ∈ C<0. E[X] denotes the expectation of the random variable X.
II. PRELIMINARIES

This section contains a short digression on the notion of moment, which is used with different meanings in the literature and throughout the paper. To give a compact and self-contained exposition, we first provide some material on the notion of moment of a linear system [25, 59-62]. We then illustrate and extend notions and results concerning moments of a system in explicit form [58] and of a time-delay system [72]. Finally, we recall the notion of moment of a random variable [73].

A. Moments of a linear system

Consider a single-input, single-output, continuous-time, linear, time-invariant system described by the equations

ẋ = Ax + Bu,   y = Cx,   (1)

in which x(t) ∈ Rn, u(t) ∈ R, y(t) ∈ R and A ∈ Rn×n, B ∈ Rn×1 and C ∈ R1×n are constant matrices. Throughout the paper we assume that the system (1) is minimal, i.e. reachable and observable, and let W(s) = C(sI − A)−1B be its transfer function.

Definition 1. [61] The moment of order zero of system (1) at s⋆ ∈ C \ σ(A) is the complex number η0(s⋆) = W(s⋆). The moment of order k ∈ Z>0 of system (1) at s⋆ ∈ C \ σ(A) is the complex number

ηk(s⋆) = ((−1)^k / k!) [ d^k W(s) / ds^k ]_{s=s⋆}.   (2)

The moments of system (1) can be characterised in terms of the solution of a Sylvester equation [59-62]. The following statements provide a version of the results presented in [61, 62].

Lemma 1. Consider system (1). Assume k ∈ Z≥0 and s⋆ ∈ C \ σ(A). Then

[ ηk(s⋆) ⋯ η1(s⋆) η0(s⋆) ]′ = Ψk Υk B,

where Ψk ∈ R(k+1)×(k+1) is a signature matrix¹ and Υk ∈ R(k+1)×n is the unique solution of the Sylvester equation

Js⋆ Υk + ek+1 C = Υk A.   (3)

Lemma 2. Consider system (1). Assume k ∈ Z≥0 and s⋆ ∈ C \ σ(A). Then there exists a one-to-one correspondence between the moments η0(s⋆), η1(s⋆), …, ηk(s⋆) and the matrix Υk B, where Υk ∈ R(k+1)×n is the unique solution of the Sylvester equation

S Υk + M C = Υk A,   (4)

in which S ∈ R(k+1)×(k+1) is a non-derogatory² matrix with characteristic polynomial

χ(s) = (s − s⋆)^{k+1}   (5)

and M ∈ Rk+1 is such that the pair (S, M) is reachable.

The moments of system (1) can also be described, under special circumstances, by means of the (well-defined) steady-state output response³ of the interconnection of system (1) with a system described by the equations

ω̇ = Sω + Mv,   d = ω,   (6)

with ω(t) ∈ Rk+1, v(t) ∈ R, d(t) ∈ Rk+1 and v = y, i.e. of the interconnected system

ẋ = Ax + Bu,   ω̇ = Sω + MCx,   d = ω,   (7)

in which S ∈ R(k+1)×(k+1) is a non-derogatory matrix with characteristic polynomial (5) and M ∈ Rk+1 is such that the pair (S, M) is reachable, as detailed by the following statement, which is a variation of [62, Theorem 1]. In what follows, we tacitly assume that the pair (S, M) always satisfies these properties.

Theorem 1. Consider system (1), system (6), and the interconnected system (7). Assume σ(A) ⊂ C<0, s⋆ ∈ C \ σ(A), x(0) = 0, ω(0) = 0 and u = δ0. Then there exists a one-to-one correspondence⁴ between the moments η0(s⋆), η1(s⋆), …, ηk(s⋆) and the (well-defined) steady-state response of the output d of system (7).

¹ A signature matrix is a diagonal matrix with ±1 on the diagonal [74].
² A matrix is non-derogatory if its characteristic and minimal polynomials coincide [41].
³ The notion of steady-state response is taken from [75] (see also [65, 66, 76]). Note that if system (1) is stable and the signal u is bounded backward and forward in time, system (1) has a unique, well-defined steady-state response.
⁴ The terminology is borrowed from [61, 62]. By one-to-one correspondence we mean that the steady-state response of the output d is uniquely determined by the moments and vice versa.
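The Sylvester-equation characterisation of Lemma 1 is straightforward to check numerically. The sketch below is not from the paper: the system matrices and the interpolation point are made up for illustration. It solves equation (3) with SciPy and compares the entries of Υk B, up to the signs collected in the signature matrix Ψk, with the moments obtained from Definition 1 via the identity ηm(s⋆) = C(s⋆I − A)^{−(m+1)}B.

```python
# Numerical check of Lemma 1 (a sketch, assuming made-up data).
# System (1): xdot = A x + B u, y = C x, with W(s) = C (sI - A)^{-1} B.
# Equation (3): J_{s*} Y + e_{k+1} C = Y A, with J_{s*} = s* I + J0.
import numpy as np
from scipy.linalg import solve_sylvester

A = np.array([[-2.0, 1.0], [0.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
s_star, k = 1.0, 3

n = A.shape[0]
J0 = np.diag(np.ones(k), 1)              # ones on the superdiagonal
Js = s_star * np.eye(k + 1) + J0         # Jordan block J_{s*}
e_last = np.zeros((k + 1, 1)); e_last[-1] = 1.0

# Rewrite (3) as Js Y + Y (-A) = -e_{k+1} C and solve with SciPy.
Y = solve_sylvester(Js, -A, -e_last @ C)

# Moments from Definition 1: eta_m(s*) = C (s* I - A)^{-(m+1)} B.
R = np.linalg.inv(s_star * np.eye(n) - A)
eta = np.array([(C @ np.linalg.matrix_power(R, m + 1) @ B).item()
                for m in range(k + 1)])

# The entries of Y B reproduce eta_k, ..., eta_0 up to the signs
# collected in the signature matrix Psi_k of Lemma 1.
print(np.abs((Y @ B).ravel()))   # |eta_k|, ..., |eta_1|, |eta_0|
print(np.abs(eta[::-1]))
```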
B. Moments of a time-delay system

Consider a single-input, single-output, continuous-time, linear, time-delay system with discrete constant delays described by the equations

ẋ(t) = Σ_{j=0}^{ς} Aj x_{τj} + Σ_{j=ς+1}^{ϱ} Bj u_{τj},   y(t) = Σ_{j=0}^{ς} Cj x_{τj},   (8)

with x(θ) = ϕ(θ) for −T ≤ θ ≤ 0, in which x(t) ∈ Rn, u(t) ∈ R, y(t) ∈ R, τ0 = 0, τj ∈ R>0 for 1 ≤ j ≤ ϱ, T = max_{0≤j≤ϱ} τj, ϕ ∈ C[−T, 0], and Aj ∈ Rn×n, Bj ∈ Rn×1 and Cj ∈ R1×n are constant matrices for 0 ≤ j ≤ ς and ς+1 ≤ j ≤ ϱ, respectively. Throughout the paper the system (8) is assumed to be minimal, i.e. reachable and observable, and we let W(s) = C(s)(sI − A(s))−1B(s) be its transfer function, with A(s) = Σ_{j=0}^{ς} Aj e^{−τj s}, B(s) = Σ_{j=ς+1}^{ϱ} Bj e^{−τj s} and C(s) = Σ_{j=0}^{ς} Cj e^{−τj s}.

Definition 2. [72] The moment of order zero of system (8) at s⋆ ∈ C \ σ(A(s)) is the complex number η0(s⋆) = W(s⋆). The moment of order k ∈ Z>0 of system (8) at s⋆ ∈ C \ σ(A(s)) is the complex number

ηk(s⋆) = ((−1)^k / k!) [ d^k W(s) / ds^k ]_{s=s⋆}.   (9)

A theory of moments can also be developed for time-delay systems exploiting a particular Sylvester-like equation [72]. For completeness, we briefly formulate the results of [72] in a form that is more convenient for our purposes.

Lemma 3. Consider system (8). Assume k ∈ Z≥0 and s⋆ ∈ C \ σ(A(s)). Then

[ ηk(s⋆) ⋯ η1(s⋆) η0(s⋆) ]′ = Ψk Σ_{j=ς+1}^{ϱ} e^{−Js⋆ τj} Υk Bj,

where Ψk ∈ R(k+1)×(k+1) is a signature matrix and Υk ∈ R(k+1)×n is the unique solution of the equation

Js⋆ Υk + Σ_{j=0}^{ς} ek+1 Cj e^{−Js⋆ τj} = Σ_{j=0}^{ς} Aj Υk e^{−Js⋆ τj}.   (10)

Lemma 4. Consider system (8). Assume k ∈ Z≥0 and s⋆ ∈ C \ σ(A(s)). Then there exists a one-to-one correspondence between the moments η0(s⋆), η1(s⋆), …, ηk(s⋆) and the matrix Σ_{j=ς+1}^{ϱ} e^{−S τj} Υk Bj, where Υk ∈ R(k+1)×n is the unique solution of the equation

S Υk + Σ_{j=0}^{ς} M Cj e^{−S τj} = Σ_{j=0}^{ς} Aj Υk e^{−S τj},   (11)

in which S ∈ R(k+1)×(k+1) is a non-derogatory matrix with characteristic polynomial (5) and M ∈ Rk+1 is such that the pair (S, M) is reachable.
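For time-delay systems, the moments of Definition 2 can be obtained by symbolic differentiation of the transfer function. A minimal sketch, assuming (this is not an example from the paper) the scalar delay system ẋ(t) = −λx(t) + u(t − τ), y = x, whose transfer function is W(s) = e^{−τs}/(s + λ):

```python
# Moments of a scalar time-delay system via Definition 2 / equation (9)
# (a sketch under assumed data): W(s) = exp(-tau*s) / (s + lam).
import sympy as sp

s = sp.Symbol('s')
lam, tau = sp.symbols('lam tau', positive=True)
W = sp.exp(-tau * s) / (s + lam)

def eta(k, s_star):
    """Moment of order k at s_star, equation (9)."""
    return sp.simplify(((-1) ** k / sp.factorial(k))
                       * sp.diff(W, s, k).subs(s, s_star))

# Moments at s* = 0 (here 0 is not a root of det(sI - A(s)) = s + lam).
for k in range(3):
    print(k, eta(k, 0))
# eta_0 = 1/lam, eta_1 = (1 + lam*tau)/lam**2, ...
```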
C. Moments of a system in explicit form

Consider a single-output, continuous-time system in explicit form⁵ described by the equations

x(t) = Λ(t, t0)x0,   y(t) = Cx(t),   (12)

in which x(t) ∈ Rn, y(t) ∈ R, t0 ∈ R≥0, x(t0) = x0 ∈ Rn, C ∈ R1×n and Λ : R × R → Rn×n is piecewise continuously differentiable in the first argument and such that Λ(t0, t0) = I. In analogy with the theory developed for linear systems, a characterisation of the steady-state output response of the interconnection of system (6) and of system (12), with v = y, i.e. of the system

x(t) = Λ(t, t0)x0,   ω̇ = Sω + MCx,   d = ω,   (13)

can be given in terms of the solution of a matrix differential equation which plays the role of the Sylvester equation (4). This, in turn, makes it possible to define a notion of moment for systems in explicit form [77]. To this end, the following technical assumptions are needed.

Assumption 1. The matrix Λ(t, t0) is non-singular for every t ≥ t0.

Assumption 2. The function t ↦ Λ̇(t, t0)Λ(t, t0)−1 is piecewise continuous.

Assumption 3. There exist K ∈ R≥0 and α ∈ R>0 such that ‖Λ(t, t0)‖ ≤ K e^{−α(t−t0)} for every t ≥ t0.

Assumption 4. The point s⋆ ∈ C is such that Re(s⋆) > −α.

Assumption 5. The entries of Λ have a strictly proper Laplace transform.

Assumption 6. The pair (x0, C) is such that the poles of the Laurent series associated with L{CΛx0} and L{Λ} coincide.

Remark 1. The manifold

M = { (x, ω, t) ∈ R^{n+k+2} : ω(t) = Υ(t)x(t) }   (14)

is an invariant integral manifold⁶ of system (13) whenever the function Υ : R → R(k+1)×n satisfies (except at points of discontinuity) the ordinary differential equation

Υ̇(t) = SΥ(t) − Υ(t)Λ̇(t, t0)Λ(t, t0)−1 + MC,   (15)

the solution of which is

Υ(t) = e^{S(t−t0)} Υ(t0)Λ(t0, t) + ∫_{t0}^{t} e^{S(t−ζ)} M C Λ(ζ, t) dζ,   (16)

with initial condition Υ(t0) ∈ R(k+1)×n, for all t ≥ t0. △

The following result is instrumental in defining the notion of moment for system (12) and is adapted to the purposes of this paper from a statement originally presented in [77].

Theorem 2. Consider system (6), system (12) and the interconnected system (13). Suppose Assumptions 1, 2, 3 and 4 hold. Then there exists a unique matrix Υ∞(t0) ∈ R(k+1)×n such that lim_{t→∞} ‖Υ(t) − Υ∞(t)‖ = 0 for any Υ(t0) ∈ R(k+1)×n, where Υ and Υ∞ are the solutions of (15) with initial conditions Υ(t0) and Υ∞(t0), respectively. Moreover, the manifold (14) is an attractive invariant integral manifold of system (13).

Proof. To begin with, note that by Assumption 1 the right-hand side of (15) is well-defined. Let Υ1 and Υ2 be the solutions of (15) (except at points of discontinuity) corresponding to the initial conditions Υ1(t0) ∈ R(k+1)×n and Υ2(t0) ∈ R(k+1)×n. Defining the function E as the difference of Υ1 and Υ2 and differentiating with respect to the time argument yields the ordinary differential equation Ė(t) = SE(t) − E(t)Λ̇(t, t0)Λ(t, t0)−1, the solution of which, in view of (16), is E(t) = e^{S(t−t0)} E(t0) Λ(t0, t). By Assumptions 3 and 4, this implies lim_{t→∞} ‖E(t)‖ = 0. As a result, there exists a solution Υ∞ to which every solution of (15) converges asymptotically, i.e. there exists Υ∞(t0) ∈ R(k+1)×n such that lim_{t→∞} ‖Υ(t) − Υ∞(t)‖ = 0 for any Υ(t0) ∈ R(k+1)×n, where Υ and Υ∞ are the solutions of (15) with initial conditions Υ(t0) and Υ∞(t0), respectively. Moreover, by Assumption 2, Υ∞ is unique. To complete the proof, note that by Remark 1 the set (14) is an invariant integral manifold of system (13) and that the set (14) is attractive since every solution of (15) converges asymptotically to Υ∞.

Theorem 2 allows us to introduce the following definition, which is reminiscent of the notion of moment given in [58].

Definition 3. Consider system (6), system (12) and the interconnected system (13). Suppose Assumptions 1, 2, 3 and 4 hold. The moment of system (12) at s⋆ is the function Υ∞x, where Υ∞ is the unique solution of (15) with Υ(t0) = 0.

Theorem 3. Consider system (6), system (12) and the interconnected system (13). Suppose Assumptions 1, 2, 3, 4, 5 and 6 hold. Then there exists a one-to-one correspondence between the moment of system (12) at s⋆ and the (well-defined) steady-state response of the output d of system (13).

Proof. By Assumptions 1, 2, 3 and 4, in view of Theorem 2, the steady-state response of the output d is well-defined. By Assumptions 5 and 6, the steady-state response of the output d coincides with Υ∞x, which by definition is the moment of system (12) at s⋆.

Remark 2. Equations (12) describe a considerably general class of continuous-time signals. In particular, under the stated assumptions, this class contains all (exponentially bounded) piecewise continuously differentiable functions that can be generated as the solution of a single-input, single-output, continuous-time, linear, time-varying system of the form

ẋ = A(t)x + B(t)u,   y = Cx,   (17)

with x(t) ∈ Rn, u(t) ∈ R, y(t) ∈ R, x(0) = 0, u = δ0, A : R → Rn×n defined as A(t) = Λ̇(t, t0)Λ(t, t0)−1 for all t ≥ t0 such that Λ is differentiable in the first argument, and B : R → Rn such that the pair (A(t), B(t)) is reachable for all t ∈ R≥0. △

⁵ The terminology is borrowed from [57, 58].
⁶ See [78].
D. Moments of a random variable

For completeness, we now recall the notion of moment of a random variable.

Definition 4. [73] The moment of order k ∈ Z≥0 of the random variable X is defined as µk = E[X^k], whenever the expectation exists.

To simplify the exposition, in the sequel we consider exclusively continuous random variables admitting a probability density function with finite moments of all orders. We also ignore all measure-theoretic considerations as they are not essential for any of the arguments. A discussion on the extension of our results to more general situations is deferred to Section V. To illustrate our results and to demonstrate our approach we use several worked-out examples throughout this work.

Example 1 (The exponential distribution). The probability density function of a random variable X having exponential distribution with parameter λ ∈ R>0 is defined as

fX : R → R,   t ↦ λe^{−λt} δ−1(t).   (18)

A direct computation shows that the moment of order k ∈ Z>0 of the random variable X satisfies the relation µk = (k/λ) µk−1, with µ0 = 1, and hence µk = k!/λ^k.

Example 2 (The hyper-exponential distribution). The probability density function of a random variable X having hyper-exponential distribution is defined as

fX : R → R,   t ↦ Σ_{j=1}^{n} pj λj e^{−λj t} δ−1(t),   (19)

where n ∈ Z>0 is referred to as the number of phases of X and λ = (λ1, λ2, …, λn) ∈ R^n_{>0} and p = (p1, p2, …, pn) ∈ Δ^{n−1} are given parameters. Observe that fX can be written as fX = Σ_{j=1}^{n} pj fXj, in which fXj : R → R, t ↦ λj e^{−λj t} δ−1(t) is the probability density function of a random variable Xj having exponential distribution with parameter λj. Thus, by linearity of the expectation operator, the moment of order k ∈ Z≥0 of the random variable X is µk = Σ_{j=1}^{n} pj k!/λj^k.
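A quick numerical sanity check of Examples 1 and 2 (a sketch with made-up weights and rates, not taken from the paper): the closed-form moments of a hyper-exponential random variable are compared with numerical integration of Definition 4.

```python
# Check of Example 2: mu_k = sum_j p_j * k! / lam_j**k for the
# hyper-exponential density (19). Parameters below are assumptions.
import math
import numpy as np
from scipy.integrate import quad

p = np.array([0.3, 0.7])          # weights, p in the simplex
lam = np.array([1.0, 4.0])        # rates lam_j > 0

def f_X(t):
    """Hyper-exponential density (19) evaluated at t >= 0."""
    return float(np.sum(p * lam * np.exp(-lam * t)))

for k in range(4):
    closed_form = float(np.sum(p * math.factorial(k) / lam ** k))
    numeric, _ = quad(lambda t: t ** k * f_X(t), 0, np.inf)
    print(k, closed_form, numeric)
```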
III. MAIN RESULTS

This section contains the main results of the paper. The moments of a random variable are shown to be in one-to-one correspondence with the moments of a system at zero. This is established first for the special situation in which a given probability density function can be identified with the impulse response of a linear system. The theory developed is then extended to the broader class of probability density functions which can be represented by a system in explicit form, using the theory of moments presented in the previous section. Finally, the case of probability density functions defined on the whole real axis and on compact sets is considered.

A. Probability density functions realized by linear systems

Definition 5. Consider system (1) and a random variable X with probability density function fX. The probability density function fX is realized by system (1) if fX(t) = Ce^{At}Bδ−1(t) for all t ∈ R, in which case system (1) is referred to as a (linear) realization of fX.

Necessary and sufficient conditions for a probability density function to be realized by a linear system can be established using well-known results of linear realization theory [79-81]. Note that every linear realization of a probability density function must be stable⁷, as detailed by the following statement.

⁷ See [25] for the definition.

Lemma 5. Consider system (1) and a random variable X with probability density function fX. If system (1) is a realization of fX, then σ(A) ⊂ C<0.

As a direct application of Definitions 1, 4, 5 and of Lemma 5, the moments of a random variable can be characterised by means of moments of systems.

Theorem 4. Consider system (1) and a random variable X with probability density function fX. Assume system (1) is a realization of fX. Then the moments of the random variable X and the moments of system (1) at zero are such that

µk / k! = ηk(0),   (20)

for all k ∈ Z≥0.

Corollary 1. Consider system (1) and a random variable X with probability density function fX. Assume system (1) is a realization of fX. Then the moments up to the order k ∈ Z≥0 of the random variable X are given by the entries of Ψk Υk B, where Υk is the unique solution of the Sylvester equation (3), with Ψk = diag((−1)^k k!, …, −1, 1) and s⋆ = 0.

Remark 3. The moments of a random variable can be determined by direct application of Definition 4 or by "pattern matching" using existing tables of moments. The one-to-one correspondence established in Corollary 1, on the other hand, indicates that a closed-form expression for the moments of a random variable can be computed from the solution of a Sylvester equation, which can be solved with numerically reliable techniques [25]. The computation of moments of random variables through Sylvester-like equations is one of the leitmotifs underlying our approach. △

Corollary 2. Consider system (1) and a random variable X with probability density function fX. Assume system (1) is a realization of fX. Then the moments up to the order k ∈ Z≥0 of the random variable X are in one-to-one correspondence with the matrix Υk B, where Υk is the unique solution of the Sylvester equation (4), in which S is a non-derogatory matrix with characteristic polynomial (5) and M is such that the pair (S, M) is reachable.

Corollary 3. Consider system (1), system (6), and the interconnected system (7). Let X be a random variable with probability density function fX. Assume system (1) is a realization of fX, s⋆ = 0, x(0) = 0, ω(0) = 0 and u = δ0. Then there exists a one-to-one correspondence between the moments up to the order k ∈ Z≥0 of the random variable X and the (well-defined) steady-state response of the output d of system (7).

Example 3 (The exponential distribution, continued). Consider a random variable X having exponential distribution with probability density function fX and parameter λ ∈ R>0. A direct inspection shows that the probability density function fX is realized by the linear, time-invariant system

ẋ = −λx + λu,   y = x,   (21)

i.e. by system (1) with A = −λ, B = λ and C = 1. Note that the only eigenvalue of system (21) has negative real part, which is consistent with Lemma 5. A direct application of Definition 1 yields ηk(0) = 1/λ^k, which, in view of Example 1 and in agreement with Theorem 4, shows that the moments of the random variable X are in one-to-one correspondence with the moments of system (21) at zero. In accordance with Corollary 1, setting Ψk = diag((−1)^k k!, …, −1, 1) and S = J0 yields Ψk Υk B = [ k!/λ^k ⋯ 1/λ 1 ]′, in which Υk is the unique solution of the Sylvester equation (3). By Corollary 2, a one-to-one correspondence can also be inferred between the moments of the random variable X and the solution of the Sylvester equation (4). Finally, note that the components of the (well-defined) steady-state response of the output d of system (7) can be written as

dl(t) = Σ_{j=0}^{l−1} (−1)^j t^j / λ^j = Σ_{j=0}^{l−1} µj (−t)^j / j!,

for all l ∈ {1, 2, …, k+1}, and hence there is a one-to-one correspondence between the moments up to the order k of the random variable X and the steady-state response of the output d of system (7).
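The leitmotif of Remark 3 can be made concrete. The sketch below (illustrative, with an assumed rate λ) recovers the moments of an exponential random variable by solving the Sylvester equation (3) at s⋆ = 0 for the realization (21) and applying the identity µm = m! ηm(0) of Theorem 4; absolute values are taken so the check does not depend on the sign convention of the signature matrix Ψk.

```python
# Moments of an exponential random variable from the Sylvester
# equation (3) with s* = 0, as in Corollary 1 and Example 3
# (a sketch, not the authors' code; lam is an assumed parameter).
import math
import numpy as np
from scipy.linalg import solve_sylvester

lam, k = 2.0, 4
A = np.array([[-lam]]); B = np.array([[lam]]); C = np.array([[1.0]])

J0 = np.diag(np.ones(k), 1)
e_last = np.zeros((k + 1, 1)); e_last[-1] = 1.0
Y = solve_sylvester(J0, -A, -e_last @ C)   # J0 Y + e_{k+1} C = Y A

# |Y B| carries eta_k(0), ..., eta_0(0); mu_m = m! * eta_m(0) by (20).
eta = np.abs((Y @ B).ravel())[::-1]        # eta_0, eta_1, ..., eta_k
mu = [math.factorial(m) * eta[m] for m in range(k + 1)]
print(mu)                                   # recovered moments
print([math.factorial(m) / lam ** m for m in range(k + 1)])  # m!/lam^m
```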
B. Probability density functions realized by systems in explicit form

We have seen that a systems-theoretic interpretation can be given to probability density functions which can be realized by a linear system. However, the vast majority of probability density functions cannot be described by a linear time-invariant differential equation. To provide a generalisation of the results established which accounts for more general probability density functions, we develop a parallel of the formulation presented in the previous section using the theory of moments for systems in explicit form [58]. To begin with, we introduce the following definition.

Definition 6. Consider system (12) and a random variable X with probability density function fX. The probability density function fX is realized⁸ by system (12) if fX(t) = CΛ(t, t0)x0δ−1(t) for all t ≥ t0, in which case system (12) is referred to as an (explicit) realization of fX.

⁸ The impulse response may depend on the time of the impulse t0 for time-varying systems. Note, however, that our purpose is to model the probabilistic structure of a random variable representing its probability density function by means of a system and its impulse response. This means that t0 can be considered as a parameter that can be assigned.

Theorem 5. Consider system (6), system (12) and the interconnected system (13). Suppose Assumptions 1, 2, 3 and 4 hold. Let X be a random variable with probability density function fX and assume system (12) is a realization of fX. Then the moments of the random variable X up to the order k ∈ Z≥0 are in one-to-one correspondence with the moment of system (12) at zero.

Proof. To begin with, note that by Assumptions 1, 2, 3 and 4 the moment of system (12) at zero is well-defined. By Definition 3 and by (16), the moment of system (12) at zero reads as

Υ∞(t)x(t) = ( ∫_{t0}^{t} e^{S(t−ζ)} M C Λ(ζ, t) dζ ) Λ(t, t0)x0
          = e^{St} ∫_{t0}^{t} e^{−Sζ} M C Λ(ζ, t0)x0 dζ
          = e^{St} ∫_{t0}^{t} e^{−Sζ} M fX(ζ) dζ,

where the last identity holds since system (12) is a realization of fX. Define H+(t) = ∫_{t0}^{t} e^{−Sζ} M fX(ζ) dζ. Since S is a non-derogatory matrix with characteristic polynomial (5), with s⋆ = 0, and since the pair (S, M) is reachable, there exists a non-singular matrix T ∈ R(k+1)×(k+1) such that T−1M = ek+1 and T−1ST = J0. This implies

lim_{t→∞} H+(t) = [ ((−1)^k/k!) µk ⋯ −µ1 µ0 ]′,   (22)

and hence that the components of the moment of system (12) at zero grow polynomially as t → ∞, with coefficients uniquely determined by the moments µ0, µ1, …, µk, which proves the claim.

Corollary 4. Consider system (6), system (12) and the interconnected system (13). Suppose system (12) is a realization of fX and Assumptions 1, 2, 3, 4, 5 and 6 hold. Then there exists a one-to-one correspondence between the moments up to the order k ∈ Z≥0 of the random variable X and the (well-defined) steady-state response of the output d of system (13).
Remark 4. Assumption 1 is violated by any explicit realization of the form (12) associated with a probability density function with compact support, i.e. zero everywhere except on a compact subset of the real line. A discussion on the extension of Theorem 5 to such a class of probability density functions is deferred to Section III-D. △

Remark 5. While every linear realization of a probability density function must be internally stable (Lemma 5), it is not possible to prove that every explicit realization of a probability density function must satisfy Assumption 3. The reason is that there exist probability density functions with a "tail" that is "heavier" than that of the exponential [69, 82], including those of Pareto, Weibull, and Cauchy random variables. Assumption 3 is therefore a strong assumption which rules out important probability density functions. Note, however, that a generalisation of our results to probability density functions with a heavy tail can be established with more advanced measure-theoretic tools. △

Example 4 (The half-normal distribution). The probability density function of a random variable X having half-normal distribution with parameter σ ∈ R>0 is defined as

fX : R → R,   t ↦ √(2/(πσ²)) e^{−t²/(2σ²)} δ−1(t).   (23)

A direct inspection shows that the probability density function fX is realized by the linear, time-varying system

ẋ = −(t/σ²) x + √(2/(πσ²)) u,   y = x,   (24)

in which x(t) ∈ R, u(t) ∈ R, y(t) ∈ R. Consider now the interconnection of system (6) and of system (24), with v = y, set s⋆ = 0 and note that Assumptions 1, 2, 3 and 4 hold. Equation (15) boils down to

Υ̇(t) = ( S + (t/σ²) I ) Υ(t) + M   (25)

and can be solved by direct application of formula (16), with Υ(t0) = 0, yielding Υ∞(t) = e^{St} H+(t), with

H+(t) = [ ((−1)^k/k!) ∫_0^t ζ^k e^{−(ζ²−t²)/(2σ²)} dζ  ⋯  −∫_0^t ζ e^{−(ζ²−t²)/(2σ²)} dζ  ∫_0^t e^{−(ζ²−t²)/(2σ²)} dζ ]′.   (26)

Since x(t) = √(2/(πσ²)) e^{−t²/(2σ²)} δ−1(t), by Definitions 3 and 4, this implies that (22) holds, with µ0 = 1 and

µk = σ^k (k − 1)!!, if k is even,   µk = √(2/π) σ^k (k − 1)!!, if k is odd.

In accordance with Theorem 5, this shows that the moments of the random variable X up to the order k ∈ Z≥0 uniquely specify the moment of system (24) at zero as t → ∞.

C. Probability density functions on the whole real axis

The results established so far characterise probability density functions with support on the non-negative real axis. These results are not satisfactory because most probability density functions are defined over the whole real line. This issue, however, can be easily resolved using the following approach.

Every probability density function fX can be decomposed as the sum of a function fc which vanishes on the non-positive real axis and of a function fac which vanishes on the non-negative real axis, i.e. fX(t) = fc(t)δ−1(t) + fac(t)δ̄−1(t) for all t ∈ R. We call fc the causal part of fX and fac the anticausal part of fX. Note that the function need not be continuous, but only integrable.

With these premises, the following result holds.

Theorem 6. Consider a random variable X with probability density function fX. Let fc and fac be the causal part and the anticausal part of fX, respectively. Assume fc is realized by the minimal system

ẋc = Ac xc + Bc uc,   yc = Cc xc,   (27)

and the time reversal of fac is realized by the minimal system

ẋac = Aac xac + Bac uac,   yac = Cac xac,   (28)

with xj(t) ∈ R^{nj}, uj(t) ∈ R, yj(t) ∈ R, and Aj ∈ R^{nj×nj}, Bj ∈ R^{nj×1} and Cj ∈ R^{1×nj} constant matrices for j ∈ {c, ac}. Then the moments of the random variable X and the moments of systems (27) and (28) at zero satisfy the identity

µk / k! = (−1)^k ηk^{ac}(0) + ηk^{c}(0),   (29)

for every k ∈ Z≥0.
R t − (ζ2 −t2 ) k k   = (−1) t f (−t)dt + tk fc (t)dt ac − 0 ζe 2σ2 dζ   0 0 R t − (ζ2 −t2 ) e 2σ2 dζ = (−1)k k!ηkac (0) + k!ηkc (0), 0 q t2 Since x(t) = πσ2 2 e− 2σ2 δ−1 (t), by Definitions 3 and 4, this which proves the claim. implies (22) holds, with µ0 = 1 and Corollary 5. Consider system (1) and a random variable X with ( k probability density function fX . Assume fX is even and the σ (k − 1)!!, if k is even, causal part of fX is realized by system (1). Then the moments µk = q 2 k π σ (k − 1)!!, if k is odd. of the random variable X and the moments of system (1) at In accordance with Theorem 4, this shows that the moments zero satisfy the identity of the random variable X up to the order k ∈ Z≥0 uniquely µk (−1)k + 1 = ηk (0). (30) specify the moment of system (24) at zero as t → ∞. N k! 2 7 for all k ∈ Z≥0 . realize the causal part and the anticausal part of the probability density function of the random variable of interest. For brevity we do not repeat other versions of these results for probability density functions realized by systems in explicit form; instead, we consider the following important example. Example 6 (The normal distribution). The probability density function of a random variable X having a normal distribution with parameter σ ∈ R>0 is defined as t2 1 fX : R → R, t 7→ √ (36) e− 2σ2 . 2 2πσ Proof. Let fc and fac be the causal part and the anticausal part of fX , respectively. By hypothesis the identity fX (t) = fc (t)δ−1 (t) + fac (t) δ−1 (t) = fc (t)δ−1 (t) + fc (−t) δ−1 (t) holds for all t ∈ R. By the time-reversal property of the Laplace transform [83, p. 687], this implies L{fX }(s) = L{fc }(s) + L{fc }(−s), from which the claim follows. Example 5 (The Laplace distribution). The probability density function of a random variable X having a Laplace distribution with parameter λ ∈ R>0 is defined as fX : R → R, t 7→ λ −λ|t| e . 2 t2 1 e− 2σ2 δ−1 (t), The causal part of fX is fc : R → R, t 7→ √2πσ 2 while the anticausal part of fX is fac : R → R, t2 1 e− 2σ2 δ−1 (t). The causal part and the time t 7→ √2πσ 2 reversal of the anti-causal part of fX are both realized by the linear, time-varying system 1 t u, y = x, (37) ẋ = − 2 x + √ σ 2πσ 2 (31) The causal part of fX is fc : R → R, t 7→ λ2 e−λt δ−1 (t), while the anticausal part of fX is fac : R → R, t 7→ λ2 eλt δ−1 (t). The causal part and the time reversal of the anti-causal part of fX are both realized by the minimal system λ u, y = x, (32) 2 in which x(t) ∈ R, u(t) ∈ R, y(t) ∈ R. Thus, by Theorem 6, the moment of order k ∈ Z≥0 of the random variable X is in which x(t) ∈ R, u(t) ∈ R, y(t) ∈ R. Consider now the interconnection of system (6) and of system (37), with v = y, set s⋆ = 0 and note that Assumptions 1, 2, 3 and 4 hold. In analogy with Example 4, noting that equation (15) boils down to (25) gives Υ∞ (t) = eSt H+ (t), with H+ (t) defined as in (26). For a suitable signature matrix Ψk , defining ΥT,∞ (t) = Υ∞ (t) + Ψk Υ∞ (t) and noting that ẋ = −λx + k! k! + k. (33) 2λk 2λ In agreement with Corollary 5, since fX is even and the causal part of fX is realized by system (32), the moment of order k ∈ Z≥0 of the random variable X can be written k as µk = (−1)2 +1 λk!k , which is consistent with formula (33). Finally, we emphasise that a simple exercise in integration shows that the moments of the random variable X are indeed given by (33). N µk = (−1)k t2 1 x(t) = √2πσ e− 2σ2 δ−1 (t), by Definitions 3 and 4, allows to 2 conclude (22) holds, with µ0 = 1 and ( σ k (k − 1)!!, if k is even, µk = 0, if k is odd. 
Remark 6. Theorem 6 and Corollary 5 make it possible to establish Corollaries 1, 2 and 3 for a random variable X with probability density function defined on the whole real axis, provided that its causal part and the time reversal of its anticausal part are realized by systems of the form (27) and (28), respectively. This can be achieved noting that the moments up to the order k ∈ Z≥0 of the random variable X are given by the entries of ΨT ΥT BT, where ΨT = [ Dk Ψk  Dk ] ∈ R^{(k+1)×(2k+2)}, with Ψk ∈ R(k+1)×(k+1) a signature matrix and Dk = diag(k!, …, 1!, 1), and ΥT ∈ R^{(2k+2)×(nc+nac)} is the unique solution of the Sylvester equation

J0 ΥT + ek+1 CT = ΥT AT,   (34)

with

AT = [ Aac 0 ; 0 Ac ],   BT = [ Bac ; Bc ],   CT = [ Cac  Cc ].   (35) △

The arguments used to prove the results above extend immediately to the case in which the probability density function of a given random variable is defined on the whole real axis and its causal and anticausal parts are realized by a system in explicit form. The key point is that one has to consider a signed sum of the moments of the systems which realize the causal part and the anticausal part of the probability density function of the random variable of interest. For brevity we do not repeat other versions of these results for probability density functions realized by systems in explicit form; instead, we consider the following important example.

Example 6 (The normal distribution). The probability density function of a random variable X having a normal distribution with parameter σ ∈ R>0 is defined as

fX : R → R,   t ↦ (1/√(2πσ²)) e^{−t²/(2σ²)}.   (36)

The causal part of fX is fc : R → R, t ↦ (1/√(2πσ²)) e^{−t²/(2σ²)} δ−1(t), while the anticausal part of fX is fac : R → R, t ↦ (1/√(2πσ²)) e^{−t²/(2σ²)} δ̄−1(t). The causal part and the time reversal of the anticausal part of fX are both realized by the linear, time-varying system

ẋ = −(t/σ²) x + (1/√(2πσ²)) u,   y = x,   (37)

in which x(t) ∈ R, u(t) ∈ R, y(t) ∈ R. Consider now the interconnection of system (6) and of system (37), with v = y, set s⋆ = 0 and note that Assumptions 1, 2, 3 and 4 hold. In analogy with Example 4, noting that equation (15) boils down to (25) gives Υ∞(t) = e^{St} H+(t), with H+(t) defined as in (26). For a suitable signature matrix Ψk, defining ΥT,∞(t) = Υ∞(t) + Ψk Υ∞(t) and noting that x(t) = (1/√(2πσ²)) e^{−t²/(2σ²)} δ−1(t), by Definitions 3 and 4, allows us to conclude that (22) holds, with µ0 = 1 and

µk = σ^k (k − 1)!!, if k is even,   µk = 0, if k is odd.

Generalising the results of Theorem 6, this shows that the moments up to the order k ∈ Z≥0 of the random variable X uniquely specify the moment of system (37) at zero as t → ∞. Note also that ΥT,∞ can be written as ΥT,∞ = (I + Ψk) Υ∞, which, in a broad sense, is in agreement with Corollary 5.

D. Probability density functions with compact support

We now concentrate on probability density functions with compact support. To begin with, a limitation of the characterisation of the moments of a random variable in terms of explicit systems is illustrated through a simple example.

Example 7 (The uniform distribution). Suppose we wish to find a realization of the probability density function of a random variable X having a uniform distribution with parameters a, b ∈ R>0, with a < b, defined as

fX : R → R,   t ↦ (1/(b−a)) 1_{[a,b]}(t).   (38)

Clearly, any explicit realization of the form (12) necessarily violates Assumption 1, since fX is zero everywhere except on a compact subset of the real line. As a result, the theory developed in Section III-B does not apply. However, the probability density function (38) can also be interpreted as the impulse response of the linear, time-delay system with discrete constant delays

ẋ = (1/(b−a)) (ua − ub),   y = x,   (39)
i.e. by system (8) with x(t) ∈ R, u(t) ∈ R, y(t) ∈ R, τ1 = a, τ2 = b, B1 = 1/(b−a), B2 = −1/(b−a), C0 = 1, A(s) = 0, B(s) = (1/(b−a))(e^{−sa} − e^{−sb}) and C(s) = 1. Note that the moments of system (39) are not classically defined at zero, since 0 ∈ σ(A(s)). However, since zero is a removable singularity⁹ of the transfer function of system (39), the moments of system (39) can be defined and characterised by means of Sylvester equations and impulse responses using the notions and results introduced in [85-87]. In particular, the moments of system (39) at zero satisfy the identity

[ ηk(0) ⋯ η1(0) η0(0) ]′ = Ψk Υk (e^{−J0 a} − e^{−J0 b})/(b − a),   (40)

for every k ∈ Z≥0, with Ψk ∈ R(k+1)×(k+1) a signature matrix and Υk ∈ R^{k+1} a solution of the (Sylvester) equation

J0 Υk + ek+1 = 0.   (41)

⁹ See, e.g., [84].

To see this, note that

ηk(0) = (b^{k+1} − a^{k+1}) / ((k+1)!(b − a)).   (42)

Exploiting the definition of the matrix exponential, the identity (41) and the property

J0^{j−1} ek+1 = ej, for 1 ≤ j ≤ k+1,   J0^{j−1} ek+1 = 0, for j ≥ k+2,

allows us to conclude that

(e^{−J0 a} − e^{−J0 b})/(b−a) Υk = Σ_{j=0}^{∞} ((−1)^j (a^j − b^j)/(j!(b−a))) J0^j Υk
 = Σ_{j=1}^{k+1} ((−1)^{j+1} (a^j − b^j)/(j!(b−a))) J0^{j−1} ek+1
 = Σ_{j=1}^{k+1} ((−1)^{j+1} (a^j − b^j)/(j!(b−a))) ej,

which, in view of (42), proves the identity (40). We emphasise that, in line with the results developed for probability density functions realized by linear systems, the relation (20) between the moments of fX and the moments of the corresponding realization is satisfied: a one-to-one correspondence exists between the moments of the random variable X and the moments of system (39), since µk = (b^{k+1} − a^{k+1})/((k+1)(b−a)).

The main reason why it is possible to characterise in systems-theoretic terms the moments of a random variable having a uniform distribution is that zero is a removable singularity of the transfer function of the associated time-delay system. This observation allows us to generalise the argument used in Example 7 to treat random variables the probability density function of which has compact support and is polynomial on the complement of its zero set. To see this, consider a random variable X having a probability density function of the form

fX : R → R,   t ↦ q(t) 1_{[a,b]}(t),   (43)

in which a, b ∈ R>0, with a < b, and q(t) = Σ_{k=0}^{ν−1} qk t^k, with qν−1 ≠ 0. Defining Q1(s) = Σ_{k=0}^{ν−1} q1^k s^k = Σ_{k=0}^{ν−1} Σ_{i=0}^{k} (k!/i!) qk a^i s^{ν−k+i−1} and Q2(s) = Σ_{k=0}^{ν−1} q2^k s^k = Σ_{k=0}^{ν−1} Σ_{i=0}^{k} (k!/i!) qk b^i s^{ν−k+i−1}, the Laplace transform of (43) can be written as

L{fX} = (Q1(s)e^{−sa} − Q2(s)e^{−sb}) / s^ν

and has a removable singularity at zero if and only if Σ_{k=0}^{l} (−1)^k (q1^k a^{l−k} − q2^k b^{l−k}) = 0 for all l ∈ {0, 1, …, ν−1}. Under these conditions, the probability density function (43) can be realised by system (8) setting, e.g., τ1 = a, τ2 = b,

A0 = [ J0 0 ; 0 J0 ],   B1 = [ eν ; 0 ],   B2 = [ 0 ; eν ],
C1 = [ q1^0 q1^1 ⋯ q1^{ν−1}  0 ⋯ 0 ],   C2 = [ 0 ⋯ 0  q2^0 q2^1 ⋯ q2^{ν−1} ],

and the moments of the random variable X can be shown to be in one-to-one correspondence with the moments at zero of such a system. To illustrate this point, we consider a simple example.

Example 8 (The triangular distribution). The probability density function of a random variable X having a triangular distribution with parameter τ ∈ R>0 is defined as

fX : R → R,   t ↦ (1/τ) max(0, 1 − |t|/τ).   (44)

The causal part of fX is fc : R → R, t ↦ (1/τ)(1 − t/τ) 1_{[0,τ]}(t), while the anticausal part of fX is fac : R → R, t ↦ (1/τ)(1 + t/τ) 1_{[−τ,0]}(t). The causal part and the time reversal of the anticausal part of fX are both realized by the time-delay system

ẋ = e2 u + e4 uτ,   y = ((τ e2′ − e1′)/τ²) x + (e3′/τ²) xτ,   (45)

i.e. by system (8) with x(t) ∈ R⁴, u(t) ∈ R, y(t) ∈ R, τ1 = 0, τ2 = τ, A0 = [ J0 0 ; 0 J0 ] (with J0 ∈ R^{2×2}), B1 = e2, B2 = e4, C1 = (τe2′ − e1′)/τ², C2 = e3′/τ². The moment at zero of order k ∈ Z≥0 of the system is

ηk(0) = ((−1)^k τ^k + τ^k) / (k+2)!,   (46)

which is consistent with the identity (20), since the moment of order k ∈ Z≥0 of the random variable X reads as

µk = ((−1)^k τ^k + τ^k) / ((k+2)(k+1)).   (47)

This also implies that a one-to-one correspondence exists between the moments of the random variable X and the moments of the system. We emphasise that, exploiting the argument of Example 7, the moments of the system (and thus those of the random variable X) may be computed using the formula (11).
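The closed forms of Example 7 can be cross-checked numerically. The sketch below (with assumed endpoints a < b, not from the paper) compares µk = k! ηk(0), with ηk(0) given by (42), against numerical integration of t^k against the uniform density (38).

```python
# Check of the uniform-distribution moment formulas in Example 7:
# mu_k = k! * eta_k(0) = (b**(k+1) - a**(k+1)) / ((k+1) * (b - a)).
# The endpoints a and b below are assumptions for illustration.
import math
from scipy.integrate import quad

a, b = 1.0, 3.0
for k in range(4):
    eta_k = (b ** (k + 1) - a ** (k + 1)) / (math.factorial(k + 1) * (b - a))
    mu_closed = math.factorial(k) * eta_k          # identity (20)
    mu_quad, _ = quad(lambda t: t ** k / (b - a), a, b)
    print(k, mu_closed, mu_quad)
```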
IV. APPLICATIONS

This section contains a series of applications of the proposed ideas. We first focus on the identifiability of probability density functions admitting a linear realization. Then, a systems-theoretic interpretation of sums of independent random variables, of the notion of mixture distribution and of basic results from renewal theory is provided. Finally, connections between the approximation of probability density functions and the model reduction problem are studied.

A. Identifiability of probability density functions with linear realizations

We begin this section considering the case in which the probability density function of a given random variable is parameterized by a fixed set of parameters. In other words, while in the previous sections the parameters of probability density functions have been assumed to be known, in this section parameters are constant unknown quantities, which in principle can (or must) be estimated. In particular, we study the identifiability of parametric families of probability density functions the elements of which admit a linear realization. This is important, for example, in the context of parametric estimation, where identifiability allows one to avoid redundant parametrisations and to achieve consistency of estimates [50, 51, 88].

Let Θ be an open subset of R^d representing the parameter set and let FX be a family of probability density functions defined on the real axis and associated with a random variable X. Every element fX ∈ FX is a probability density function t ↦ fX(t; θ) which is known once the element θ ∈ Θ has been fixed.

Definition 7. [88] The parameters θ1 ∈ Θ and θ2 ∈ Θ are observationally equivalent if fX(t; θ1) = fX(t; θ2) for almost all¹⁰ t ∈ R.

¹⁰ A property is satisfied for almost all t ∈ R if the set where the property does not hold has Lebesgue measure equal to zero.

The notion of observational equivalence induces an equivalence relation on the parameter set, defined as θ1 ∼ θ2 if θ1 ∈ Θ and θ2 ∈ Θ are observationally equivalent. The parameter set is therefore partitioned into equivalence classes, the cardinality of which determines the identifiability of the family of probability density functions considered, as specified by the following definition.

Definition 8. The family of probability density functions FX is identifiable if θ1 ∼ θ2 implies θ1 = θ2 for all θ1 ∈ Θ and θ2 ∈ Θ.

A characterisation of the identifiability of a family of probability density functions admitting a linear realization can be given by means of the systems-theoretic notion of minimality. To this end, note that the description of probability density functions as impulse responses has an inherent non-uniqueness issue, since algebraically equivalent¹¹ linear systems have the same impulse response. However, a one-to-one correspondence between impulse responses and their realizations can be established resorting to canonical forms [80], such as the observer canonical form, defined by constant matrices A ∈ Rn×n, B ∈ Rn and C ∈ R1×n of the form

A = [ −αn 1 0 ⋯ 0 ; ⋮ 0 1 ⋱ ⋮ ; ⋮ ⋮ ⋱ ⋱ 0 ; −α2 0 ⋯ 0 1 ; −α1 0 ⋯ 0 0 ],   B = [ βn ; ⋮ ; β2 ; β1 ],   C = e1′,   (48)

with α = (α1, …, αn) ∈ Rn and β = (β1, …, βn) ∈ Rn.

¹¹ The single-input, single-output, continuous-time, linear, time-invariant systems ẋ1 = A1x1 + B1u1, y1 = C1x1, with x1(t) ∈ Rn, u1(t) ∈ R, y1(t) ∈ R, and ẋ2 = A2x2 + B2u2, y2 = C2x2, with x2(t) ∈ Rn, u2(t) ∈ R, y2(t) ∈ R, are algebraically equivalent if there exists a non-singular matrix T ∈ Rn×n such that A2 = TA1T−1, B2 = TB1, C2 = C1T−1.

With these premises, we may recover the following well-known result [51].

Lemma 6. Let Θ be an open subset of R^d, representing the parameter set, and let FX be a family of probability density functions defined on the real axis and associated with a random variable X. Assume every fX ∈ FX is realized by system (1) with A ∈ Rn×n, B ∈ Rn and C ∈ R1×n as in (48) and let θ = (α, β) ∈ Θ. Then the family of probability density functions FX is identifiable if and only if every pair (A, B) is reachable.

Proof. Note that the family of probability density functions FX is identifiable if and only if for every fX ∈ FX the map

(α, β) ↦ FX(s; α, β) = (βn s^{n−1} + … + β2 s + β1) / (s^n + αn s^{n−1} + … + α2 s + α1),

with FX(s; α, β) = L{fX}, is injective. This, in turn, corresponds to the numerator and denominator of the rational function FX(s; α, β) being coprime. As a result, the identifiability of the family FX is equivalent to the minimality of system (1) and, by observability of the pair (A, C), to the reachability of the pair (A, B), which proves the claim.
Remark 7. A dual result can be proved using the controllability canonical form as long as observability, and hence minimality, is enforced. This suggests that the identifiability of a family of probability density functions admitting a linear realization is equivalent to the minimality of a given canonical realization, which can thus be taken as the definition of identifiability. △

B. Sums of independent random variables

A classical theorem of probability theory states that the probability density function of the sum of two jointly continuous, independent random variables is given by the convolution of their probability density functions (see, e.g., [73]). This result can be given a simple systems-theoretic interpretation.

Theorem 7. Let X1 and X2 be jointly continuous, independent random variables with probability density functions fX1 and fX2 realized by the minimal system

ẋ1 = A1x1 + B1u1,   y1 = C1x1,   (49)

and the minimal system

ẋ2 = A2x2 + B2u2,   y2 = C2x2,   (50)

with xj(t) ∈ R^{nj}, uj(t) ∈ R, yj(t) ∈ R, and Aj ∈ R^{nj×nj}, Bj ∈ R^{nj×1} and Cj ∈ R^{1×nj} constant matrices for j ∈ {1, 2}, respectively. Then the probability density function of the random variable Y = X1 + X2 is realized by the interconnection of system (49) and system (50) with u1 = y2.

Proof. Recall that the probability density function of the sum of two jointly continuous, independent random variables is given by the convolution of their probability density functions [73, Theorem 6.38], i.e. fY = fX1 ∗ fX2. Taking the Laplace transform on both sides, this implies L{fY} = L{fX1}L{fX2}. Thus, the probability density function fY is realized by the interconnection of systems (49) and (50) with u1 = y2, since the transfer function associated with the probability density function fY is the product of the transfer functions associated with the probability density functions fX1 and fX2.
The following result is an immediate extension of Theorem 7.

Corollary 6. Let X1, X2, …, XN be jointly continuous, independent random variables. Assume the probability density function fXj is realized by the minimal system

ẋj = Aj xj + Bj uj,   yj = Cj xj,   (51)

with xj(t) ∈ R^{nj}, uj(t) ∈ R, yj(t) ∈ R, and Aj ∈ R^{nj×nj}, Bj ∈ R^{nj×1} and Cj ∈ R^{1×nj} constant matrices for j ∈ {1, 2, …, N}, respectively. Then the probability density function of the random variable Y = X1 + X2 + … + XN is realized by the interconnection of the family of systems (51) with u1 = y2, u2 = y3, …, uN−1 = yN.

Remark 8. A random variable Y is decomposable if there exist N ∈ Z>0, with N ≥ 2, and jointly continuous, independent random variables X1, X2, …, XN such that Y = X1 + X2 + … + XN [89]. In case the random variables X1, X2, …, XN are also identically distributed, then Y is said to be divisible [89]. The notions of decomposability and of divisibility play an important role in probability theory [90], particularly in the analysis of Lévy processes [7, 67]. In light of Theorem 7 or Corollary 6, these notions can be characterised in systems-theoretic terms. In particular, decomposability (and divisibility) of a random variable are related to the possibility of describing the corresponding system as the series interconnection of finitely many (and possibly identical) systems. △

The following result provides a systems-theoretic necessary and sufficient condition which ensures the identifiability of a family of probability density functions the elements of which can be represented as the sum of random variables with probability density functions admitting a linear realization.

Theorem 8. Let FX be a family of probability density functions the elements of which can be realized as the convolution of the probability density functions fX1 and fX2. Assume fX1 and fX2 are realized by the minimal systems (49) and (50), respectively. Then the family of probability density functions FX is identifiable if and only if

(i) the polynomials C1 adj(sI − A1)B1 and det(sI − A2) have no common roots, and
(ii) the polynomials C2 adj(sI − A2)B2 and det(sI − A1) have no common roots.

Proof. The identifiability of the family of probability density functions FX is equivalent to the minimality of the series interconnection of the minimal systems (49) and (50), which, in turn, is equivalent to conditions (i) and (ii) [80, 91].

Example 9 (The Erlang distribution). Suppose we wish to show that the probability density function fY of the random variable Y = X1 + X2 + … + XN, in which N ∈ Z>0 and X1, X2, …, XN are jointly continuous, independent random variables having exponential distribution with parameter λ ∈ R>0, is that of an Erlang distribution with parameters λ and N, defined as

fY : R → R,   t ↦ (λ^N / (N−1)!) t^{N−1} e^{−λt} δ−1(t).   (52)

Recall that a minimal realization of the probability density function fXj of the random variable Xj is described by system (21) for all j ∈ {1, 2, …, N}. Thus, by Corollary 6, a realization of the probability density function fY is given by system (1), in which

A = J−λ,   B = eN,   C = (λ^N / (N−1)!) e1′.   (53)

We conclude this example noting that a direct computation shows that the probability density function fY is indeed given by (52).
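Corollary 6 and Example 9 can be illustrated by building the cascade realization explicitly and comparing its impulse response with the Erlang density (52). A sketch, with assumed values of N and λ; the cascade of N copies of system (21) used below is algebraically equivalent to the realization (53).

```python
# Series interconnection of N identical exponential realizations
# (Corollary 6 / Example 9): the cascade realizes the Erlang density.
# N and lam are assumed parameters for illustration.
import math
import numpy as np
from scipy.signal import StateSpace, impulse

N, lam = 3, 2.0
# Cascade of N copies of xdot = -lam x + lam u, y = x, with
# u_{j} = y_{j+1}: A has -lam on the diagonal and lam couplings.
A = -lam * np.eye(N) + lam * np.diag(np.ones(N - 1), 1)
B = np.zeros((N, 1)); B[-1] = lam
C = np.zeros((1, N)); C[0, 0] = 1.0

t = np.linspace(0, 8, 400)
_, h = impulse(StateSpace(A, B, C, np.zeros((1, 1))), T=t)

f_erlang = (lam ** N * t ** (N - 1) * np.exp(-lam * t)
            / math.factorial(N - 1))
print(np.max(np.abs(h - f_erlang)))   # ~ 0 up to numerical error
```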
Remark 8. A random variable Y is decomposable if there exist N ∈ Z>0, with N ≥ 2, and jointly continuous, independent random variables X1, X2, . . . , XN such that Y = X1 + X2 + . . . + XN [89]. In case the random variables X1, X2, . . . , XN are also identically distributed, then Y is said to be divisible [89]. The notions of decomposability and of divisibility play an important role in probability theory [90], particularly in the analysis of Lévy processes [7, 67]. In light of Theorem 7 and Corollary 6, these notions can be characterised in systems-theoretic terms. In particular, decomposability (and divisibility) of a random variable are related to the possibility of describing the corresponding system as the series interconnection of finitely many (and possibly identical) systems. △

The following result provides a systems-theoretic necessary and sufficient condition which ensures the identifiability of a family of probability density functions the elements of which can be represented as the sum of random variables with probability density functions admitting a linear realization.

Theorem 8. Let FX be a family of probability density functions the elements of which can be realized as the sum of the probability density functions fX1 and fX2. Assume fX1 and fX2 are realized by the minimal systems (49) and (50), respectively. Then the family of probability density functions FX is identifiable if and only if
(i) the polynomials C1 adj(sI − A1) B1 and det(sI − A2) have no common roots, and
(ii) the polynomials C2 adj(sI − A2) B2 and det(sI − A1) have no common roots.

Proof. The identifiability of the family of probability density functions FX is equivalent to the minimality of the series interconnection of the minimal systems (49) and (50), which, in turn, is equivalent to conditions (i) and (ii) [80, 91].

C. Mixture distributions

We have seen that the sum of two jointly continuous, independent random variables has a natural interpretation in terms of the series interconnection of the realizations of their probability density functions. To provide a probabilistic counterpart of the notion of parallel interconnection we recall the following definition, which is adapted from [92].

Definition 9. A random variable Z with probability density function fZ is said to arise from a finite mixture distribution if there exist N ∈ Z>0 and jointly continuous random variables X1, X2, . . . , XN with probability density functions fX1, fX2, . . . , fXN such that the probability density function fZ satisfies fZ = w1 fX1 + w2 fX2 + · · · + wN fXN, for some w = (w1, w2, . . . , wN) ∈ Δ^{N−1}. N is referred to as the number of components of fZ, fX1, fX2, . . . , fXN are referred to as the components of fZ, and w1, w2, . . . , wN are referred to as the weights of fZ. △

Theorem 9. Under the hypotheses of Theorem 7, if the random variable Z with probability density function fZ arises from a finite mixture distribution with components fX1 and fX2 and weights w1 ≠ 0 and w2 ≠ 0, then the probability density function fZ is realized by the interconnection of system (49) and system (50) with u = u1 = u2 and y = w1 y1 + w2 y2.

Proof. By hypothesis, the probability density function fZ arises from a finite mixture distribution with components fX1 and fX2 and weights w1 and w2, i.e. fZ = w1 fX1 + w2 fX2. Taking the Laplace transform on both sides, by linearity, yields L{fZ} = w1 L{fX1} + w2 L{fX2}. Thus, the probability density function fZ is realized by the interconnection of system (49) and system (50) with u = u1 = u2 and y = w1 y1 + w2 y2, as desired.

Example 10 (G/H2/1 queueing system). Consider the G/H2/1 queueing system in Fig. 1, in which the arrival process is a general random process and the service process is governed by a two-phase hyper-exponential random variable X. A customer accesses either the service offered by the first server at rate λ1 ∈ R>0 with probability p1 ∈ (0, 1) or the service offered by the second server at rate λ2 ∈ R>0 with probability p2 = 1 − p1. The probability density function of the random variable X which represents the service time, i.e. the time spent by an arbitrary customer in the service process, is fX(t) = p1 λ1 e^{−λ1 t} δ−1(t) + p2 λ2 e^{−λ2 t} δ−1(t) for all t ∈ R≥0. In view of Theorem 9, the probability density function of the service time is realized by system (1), with

A = [ −λ1 0 ; 0 −λ2 ],   B = [ λ1 ; λ2 ],   C = [ p1 p2 ].   (54)   ■

[Fig. 1: G/H2/1 queueing system.]
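The parallel interconnection of Theorem 9 can be verified directly on Example 10. The sketch below is an illustration, not part of the original text; the rates λ1, λ2 and the probability p1 are arbitrary values chosen for the check. It assembles the matrices (54) and confirms that the impulse response of system (1) reproduces the hyper-exponential service density.

```python
# A sketch (not from the original text) of Theorem 9 on Example 10: the
# parallel interconnection (54) realizes the two-phase hyper-exponential
# service density. The rates and the probability p1 are illustrative.
import numpy as np
from scipy.signal import lti, impulse

lam1, lam2, p1 = 1.0, 3.0, 0.3
p2 = 1.0 - p1
A = np.diag([-lam1, -lam2])
B = np.array([[lam1], [lam2]])
C = np.array([[p1, p2]])

t = np.linspace(0.0, 8.0, 800)
_, h = impulse(lti(A, B, C, np.zeros((1, 1))), T=t)

f_X = p1 * lam1 * np.exp(-lam1 * t) + p2 * lam2 * np.exp(-lam2 * t)
print(np.max(np.abs(h - f_X)))  # ~ 0 up to numerical error
```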
A straightforward generalisation of Theorem 9 is given by the following result.

Corollary 7. Under the hypotheses of Corollary 6, let the random variable Z with probability density function fZ arise from a finite mixture distribution with components fX1, fX2, . . . , fXN and weights w1, w2, . . . , wN ≠ 0. Then the probability density function fZ is realized by the interconnection of systems (51) with u = u1 = u2 = . . . = uN and y = w1 y1 + w2 y2 + . . . + wN yN.

We conclude this section presenting a systems-theoretic necessary and sufficient condition which guarantees the identifiability of finite mixtures admitting a linear realization (see also [92]).

Theorem 10. Let FX be a family of probability density functions, the elements of which arise from a finite mixture distribution with components fX1 and fX2 and weights w1 ∈ R>0 and w2 ∈ R>0. Assume fX1 and fX2 are realized by the minimal systems (49) and (50), respectively. Then the family of probability density functions FX is identifiable if and only if σ(A1) ∩ σ(A2) = ∅.

Proof. The identifiability of the family of probability density functions FX is equivalent to the minimality of the parallel interconnection with weights w1 ∈ R>0 and w2 ∈ R>0 of the minimal systems (49) and (50), which, in turn, is equivalent to σ(A1) ∩ σ(A2) = ∅ [80, 91].

Example 11 (G/H2/1 queueing system, continued). Suppose we are interested in finding conditions under which the family of probability density functions

FX = { t ↦ fX(t; λ) = Σ_{j=1}^{2} pj λj e^{−λj t} δ−1(t), λ ∈ R²>0 },

which describes the service time of the G/H2/1 queueing system displayed in Fig. 1, is identifiable. Note that the probability density function of the service time is realized by system (1), with matrices defined as in (54). By Theorem 10, since p1, p2 ∈ (0, 1), the family of probability density functions FX is identifiable if and only if λ does not lie on the bisector of the first quadrant, i.e. λ1 ≠ λ2. Note that in the case λ1 = λ2 = λ ∈ R>0 a customer accesses the service offered by a server at rate λ with probability one and hence the model is overparameterised. In other words, the queueing system in question may be equivalently described by the G/M/1 queueing system displayed in Fig. 2, in which the service process is governed by an exponential random variable X with parameter λ. As anticipated in Remark 7, this phenomenon is also captured by the systems-theoretic notion of minimality: in accordance with Theorem 10, for λ1 = λ2 the system (54) is not minimal. ■

[Fig. 2: G/M/1 queueing system.]

D. Renewal processes

We complete this section showing that elementary results from renewal theory [6, 7, 69] can be translated in the language of systems theory using the notion of feedback interconnection.

Definition 10 ([7]). A sequence of random variables {Sj}j∈Z≥0 constitutes a renewal process if it is of the form¹² Sj = T1 + T2 + . . . + Tj, where {Tj}j∈Z>0 is a sequence of mutually independent random variables with a common distribution F such that F(0) = 0. △

¹² By convention, S0 = 0 and 0 counts as zero occurrences [7].

The random variable Sj in the above definition is often referred to as (the j-th) renewal, while the elements of the sequence {Tj}j∈Z>0 are referred to as waiting times [7, 69]. The common distribution of the waiting times of a renewal process is called the waiting-time distribution [69]. The probabilistic behaviour of a renewal process {Sj}j∈Z≥0 is closely related to the random variable Nt, with t ∈ R>0, defined as the largest j ∈ Z≥0 for which Sj ≤ t [93]. The random variable Nt describes the number of renewals occurred by time t and its expected value H(t) = E[Nt], referred to as the renewal function, satisfies the integral equation of renewal theory [93]. Moreover, if the waiting-time distribution of the renewal process is absolutely continuous, then the renewal density, defined as h(t) = Ḣ(t) for all t ∈ R>0, satisfies the renewal density integral equation [93], i.e.

h(t) = f(t) + ∫₀ᵗ h(t − ζ) f(ζ) dζ,   (55)

in which f is the derivative of the waiting-time distribution.

Theorem 11. Let {Sj}j∈Z≥0 be a renewal process. Assume the waiting-time distribution of the renewal process {Sj}j∈Z≥0 admits a probability density function f which is realized by the minimal system (1). Then the renewal density h of the renewal process {Sj}j∈Z≥0 is realized by the system obtained from system (1) with input v(t) ∈ R, output y(t) ∈ R, and interconnection equation u = v + y.

Proof. To begin with, note that under the stated assumptions the renewal density h of the renewal process {Sj}j∈Z≥0 is well-defined [69, Proposition 2.7]. In addition, the renewal density h satisfies the renewal density integral equation (55). This implies that the Laplace transform of the renewal density is such that L{h} = L{f}/(1 − L{f}) [93, p. 252]. Thus, since by hypothesis L{f} coincides with the transfer function of system (1), the renewal density h is realized by the system obtained from system (1) with input v(t) ∈ R, output y(t) ∈ R, and interconnection equation u = v + y.
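The feedback construction of Theorem 11 can be tested on any waiting-time density with a rational Laplace transform. In the sketch below (an illustration, not from the original text) the waiting times are Erlang(2, λ), a choice for which the renewal density is available in closed form, h(t) = (λ/2)(1 − e^{−2λt}); closing the loop u = v + y turns a realization (A, B, C) of f into the closed-loop realization (A + BC, B, C) of h.

```python
# A sketch (not from the original text) of Theorem 11: closing the loop
# u = v + y around a realization (A, B, C) of the waiting-time density f
# gives the closed-loop realization (A + B C, B, C) of the renewal
# density h, since L{h} = L{f}/(1 - L{f}). The waiting times here are
# Erlang(2, lam), an illustrative choice for which h is known in closed
# form: h(t) = (lam/2)(1 - exp(-2 lam t)).
import numpy as np
from scipy.signal import lti, impulse

lam = 1.0
# Cascade realization of the Erlang(2, lam) waiting-time density
A = np.array([[-lam, lam], [0.0, -lam]])
B = np.array([[0.0], [lam]])
C = np.array([[1.0, 0.0]])

Acl = A + B @ C  # feedback interconnection u = v + y

t = np.linspace(0.0, 10.0, 1000)
_, h = impulse(lti(Acl, B, C, np.zeros((1, 1))), T=t)

h_exact = 0.5 * lam * (1.0 - np.exp(-2.0 * lam * t))
print(np.max(np.abs(h - h_exact)))  # ~ 0 up to numerical error
```

Note that the closed-loop matrix A + BC is only marginally stable, reflecting the fact that the renewal density tends to a positive constant.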
Example 12 (Poisson processes). Poisson processes are renewal processes in which the waiting times have an exponential distribution [70]. In other words, a renewal process is a Poisson process if there exists λ ∈ R>0 such that the probability density function of each waiting time is f(t) = λ e^{−λt} δ−1(t) for all t ∈ R≥0. In view of Theorem 11, the renewal density of the process is realized by the system obtained from system (21) with input v(t) ∈ R, output y(t) ∈ R, and interconnection equation u = v + y, i.e. by system (1), with A = 0, B = λ, C = 1. Note that the impulse response of the system is given by h(t) = λ δ−1(t) for all t ∈ R, which is consistent with the fact that the renewal function of a Poisson process can be written as H(t) = λt for all t ∈ R≥0 [94]. ■

E. On the approximation of probability density functions

This section investigates some connections between the approximation of probability density functions and the model reduction problem [25]. In particular, we show that these problems are essentially the same problem when probability density functions are regarded as impulse responses. As a guiding example, we consider phase-type distributions [30, 31, 33, 34], which play an important role in the analysis of queueing networks [94] and can be represented by a random variable describing the time until absorption of a Markov chain with one absorbing state [95]. Note, however, that similar considerations can be performed for more general classes of probability density functions.

Consider a continuous-time Markov chain over the set S = {0, 1, . . . , n}, with n ∈ Z>0, in which 0 is an absorbing state and 1, . . . , n are transient states. The random variable X which characterises the time until absorption is described by a continuous phase-type distribution represented by the Q-matrix [37]

[ 0 0 ; Q0 Q ],

in which Q ∈ R^{n×n} is such that Q0 = −Q1. Assuming that the initial probability of the Markov chain is

π0 = [ 0 α ],

with α ∈ R^{1×n} such that α1 = 1, the probability density function of X reads as

fX(t) = Q0′ e^{Q′t} α′ δ−1(t),   (56)

for every t ∈ R. Note that (56) can be regarded as the impulse response of system (1), with A = Q′, B = α′ and C = Q0′. This indicates that the problem of approximating the probability density function (56) can be regarded as the problem of approximating the impulse response of system (1) or, equivalently, the problem of constructing a reduced order model of system (1) [25]. In particular, approximating the probability density function (56) by another phase-type distribution boils down to constructing a system

ξ̇ = F ξ + G v,   ψ = H ξ,   (57)

in which ξ(t) ∈ R^ν, with ν < n, v(t) ∈ R, ψ(t) ∈ R, and F ∈ R^{ν×ν}, G ∈ R^{ν×1} and H ∈ R^{1×ν} are constant matrices such that 1′G = 1 and H = −1′F. To illustrate this point we consider a simple example and exploit the model reduction technique devised in [61, 62] to obtain reduced order models (i.e. approximations of a given probability density function) which match a prescribed number of (systems-theoretic and, thus, probabilistic) moments. Note, however, that different model reduction techniques can be used to solve the problem in question (see [25] and references therein for an overview of available model reduction techniques).
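The correspondence between (56) and the impulse response of system (1) can be made concrete as follows. The sketch below is not part of the original text; it encodes the two transient states of the chain used in Example 13, with illustrative rates λ and μ, under the row convention for the transient block Q, and checks that the impulse response of (A, B, C) = (Q′, α′, Q0′) integrates to one and has mean equal to the expected absorption time 2/(λ + μ).

```python
# A sketch (not from the original text) of the correspondence between the
# phase-type density (56) and the impulse response of system (1). The
# transient block Q, exit rates Q0 = -Q 1 and initial distribution alpha
# encode the two transient states of the chain of Example 13; the rates
# lam, mu are illustrative.
import numpy as np
from scipy.linalg import expm

lam, mu = 1.0, 2.0
Q = np.array([[-lam, 0.0], [lam, -(lam + mu)]])  # transient block
Q0 = -Q @ np.ones((2, 1))                        # exit rates into state 0
alpha = np.array([[0.0, 1.0]])                   # chain starts in state 2

A, B, C = Q.T, alpha.T, Q0.T                     # system (1) matrices

t = np.linspace(0.0, 15.0, 3000)
f_X = np.array([(C @ expm(A * s) @ B).item() for s in t])

dt = t[1] - t[0]
print(f_X.sum() * dt)        # ~ 1: f_X is a probability density
print((t * f_X).sum() * dt)  # ~ 2/(lam + mu): mean absorption time
```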
Example 13. Consider the Markov chain described by the diagram

[Diagram: states 0, 1, 2, with transitions 2 → 1 at rate λ, 1 → 0 at rate λ, and 2 → 0 at rate μ.]

The Q-matrix of the Markov chain is

[ 0 0 0 ; λ −λ 0 ; μ λ −(λ + μ) ],

so that

Q0 = [ λ ; μ ]   and   Q = [ −λ 0 ; λ −(λ + μ) ].

Let π0 = [ 0 0 1 ] be the initial condition of the Markov chain, i.e. α = [ 0 1 ], and let X be the random variable which characterises the time until absorption. A realization of the probability density function of the random variable X is given by system (1), with

A = [ −λ λ ; 0 −(λ + μ) ],   B = [ 0 ; 1 ],   C = [ λ μ ].

Following [62], to construct a reduced order model which matches the moment of order one of the system at zero, one needs to solve the Sylvester equation (4), with S = 0 and M = 1, which gives Υ1 = [ −1 −1 ]. Then one defines the reduced order model (57) as

F = S − M∆,   G = −Υ1 B,   H = ∆,

with ∆ ∈ R>0 a free parameter that can be assigned, yielding

F = −∆,   G = 1,   H = ∆.

Note that the structure of the original system is preserved in the reduced order model, since 1′G = 1 and H = −1′F. Moreover, in agreement with the results of [62], the moment at zero of the reduced order model coincides with the moment at zero of the original system, regardless of the value of ∆. From a probabilistic point of view, the impulse response of the reduced order model corresponds to the probability density function of the random variable X̃ which quantifies the time until absorption of the “reduced” Markov chain described by the diagram

[Diagram: states 0, 1, with transition 1 → 0 at rate ∆.]

and the Q-matrix

Q̃ = [ 0 0 ; ∆ −∆ ].

It is interesting to note that the “reduced” Markov chain can be interpreted as a Markov chain built from the original Markov chain by aggregation of the transient states 1 and 2, thus showing the connection between the model reduction problem and the use of the concept of aggregation to reduce the state space of a Markov chain [96–98].

We conclude this example by emphasising that there is no natural choice of the parameter ∆. For example, one may select ∆ = λ to ensure that in the “reduced” Markov chain the transition rate from 1 to 0 matches that of the original Markov chain. Another sensible choice is ∆ = (λ + μ)/2, which guarantees that the moments of order two at zero of the reduced order model and of the original system coincide, so that the moments of order two of the random variables X and X̃ also coincide. ■
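The reduction step of the example can be reproduced numerically. In the sketch below (not part of the original text; λ, μ and the first value of ∆ are illustrative) the Sylvester equation with S = 0 and M = 1 is taken in the form that reduces to Υ1 = C A⁻¹, an assumption consistent with the value Υ1 = [ −1 −1 ] reported in the text.

```python
# A sketch (not from the original text) of the reduction step in
# Example 13. With S = 0 and M = 1 the Sylvester equation reduces to
# Upsilon = C A^{-1} (the form consistent with Upsilon_1 = [-1, -1] in
# the text); lam, mu and the first value of Delta are illustrative.
import numpy as np

lam, mu = 1.0, 2.0
A = np.array([[-lam, lam], [0.0, -(lam + mu)]])
B = np.array([[0.0], [1.0]])
C = np.array([[lam, mu]])

Upsilon = C @ np.linalg.inv(A)  # -> [[-1., -1.]] for any lam, mu > 0

for Delta in (0.3, (lam + mu) / 2):
    F = np.array([[-Delta]])    # F = S - M Delta with S = 0, M = 1
    G = -Upsilon @ B            # -> [[1.]]
    H = np.array([[Delta]])
    # Moment at zero: the transfer function evaluated at s = 0
    m_full = (-C @ np.linalg.inv(A) @ B).item()
    m_red = (-H @ np.linalg.inv(F) @ G).item()
    # Means of X and of the reduced random variable
    iA, iF = np.linalg.inv(A), np.linalg.inv(F)
    mean_full = (C @ iA @ iA @ B).item()
    mean_red = (H @ iF @ iF @ G).item()
    print(Delta, m_full, m_red, mean_full, mean_red)
```

The printed values show that the moment at zero (the integral of the density) is matched for every ∆, while the means agree only for ∆ = (λ + μ)/2, in line with the discussion above.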
TABLE I: Random variables and associated probability density function, parameters and realization.

  Random variable   Probability density function               Parameters         Realization
  Exponential       λ e^{−λt} δ−1(t)                           λ ∈ R>0            (21)
  Half-normal       √(2/(πσ²)) e^{−t²/(2σ²)} δ−1(t)            σ ∈ R>0            (24)
  Laplace           (λ/2) e^{−λ|t|}                            λ ∈ R>0            (32)
  Normal            (1/√(2πσ²)) e^{−t²/(2σ²)}                  σ ∈ R>0            (37)
  Uniform           (1/(b − a)) 1_{[a,b]}(t)                   a, b ∈ R (a < b)   (39)
  Triangular        (1/τ) max{0, 1 − |t|/τ}                    τ ∈ R>0            (45)
  Erlang            (λⁿ t^{n−1}/(n − 1)!) e^{−λt} δ−1(t)       λ ∈ R>0            (53)

TABLE II: Correspondence between concepts of probability theory and of systems theory.

  Probability theory                    Systems theory
  Sum of independent random variables   Series interconnection
  Mixture distribution                  Parallel interconnection
  Renewal process                       Feedback interconnection

V. CONCLUSION

Moments of continuous random variables admitting a probability density function have been studied. Under certain hypotheses, the moments of a random variable have been shown to be in one-to-one correspondence with the solutions of a Sylvester equation and with the steady-state output response of a specific interconnected system. The results established in this work have shown that, under certain assumptions, a system can be seen as an alternative, equivalent description of the probabilistic structure of a random variable. This, in turn, indicates that systems-theoretic techniques and tools can be used to revisit and shed new light on classical results of probability theory and statistics. Table I displays a short list of random variables along with their probability density functions, parameters and associated realization, while Table II summarises the correspondence between certain concepts of probability theory and their systems-theoretic counterpart.

The present work is a first step towards a unified understanding of the role of systems-theoretic concepts in probability theory and statistics. Several directions for interdisciplinary research are left open. For example, discrete-time systems [99] provide the right tool to carry out the analysis presented in this work for discrete random variables. Hybrid systems [100, 101] may be used to deal with random variables the distribution function of which has both a discrete part and an absolutely continuous part. Moments of multivariate distributions may be studied resorting to systems described by PDEs [102, 103], for continuous random variables, and nD systems [104], for discrete random variables. As a consequence, conditional probabilities may be characterised in systems-theoretic terms. Note, however, that modelling moments of multivariate distributions using PDEs might raise challenging computational issues. Further connections may be explored with positive systems [38] and, in particular, with Markov chains [5, 6, 8, 35–37]. The interplay between the role of Hankel matrices in realization theory [79–81] and in probability theory [5–8] may be studied. The notions of information and of entropy may be investigated using the proposed framework. The discussion on the approximation of probability density functions which arise from phase-type distributions can be generalised to probability density functions with a systems-theoretic representation in explicit form, for which model reduction techniques are available [58, 77]. Finally, a systems-theoretic counterpart of the method of moments [18] may be developed modifying existing data-driven model reduction methods [63].

REFERENCES

[1] H. Cramér, Mathematical methods of statistics. Princeton, NJ: Princeton Univ. Press, 1946.
[2] M. G. Kendall and A. Stuart, The advanced theory of statistics. London, U.K.: Griffin, 1977, vol. I.
[3] R. J. Serfling, Approximation theorems of mathematical statistics. New York: Wiley, 1980.
[4] E. L. Lehmann and G. Casella, Theory of point estimation (2nd edition). New York: Springer, 1998.
[5] J. L. Doob, Stochastic processes. New York: Wiley, 1953.
[6] W. Feller, An introduction to probability theory and its applications (3rd edition). New York: Wiley, 1971, vol. 1.
[7] ——, An introduction to probability theory and its applications (3rd edition). New York: Wiley, 1971, vol. 2.
[8] P. Billingsley, Probability and measure (2nd edition). New York: Wiley, 1986.
[9] W. Rudin, Principles of mathematical analysis. New York: McGraw-Hill, 1964.
[10] T. J. Stieltjes, “Recherches sur les fractions continues,” in Ann. Fac. Sci. Toulouse Sci. Math. Sci. Phys., vol. 8, 1894, pp. 1–122.
[11] H. Hamburger, “Über eine Erweiterung des Stieltjesschen Momentenproblems,” Math. Ann., vol. 81, pp. 235–319, 1920.
[12] F. Hausdorff, “Momentprobleme für ein endliches Intervall,” Math. Z., vol. 16, pp. 220–248, 1923.
[13] Y. V. Vorobyev, Methods of moments in applied mathematics (translated from Russian by B. Seckler). New York: Gordon and Breach, 1965.
[14] H. Fischer, A history of the central limit theorem: from classical to modern probability theory. New York: Springer, 2010.
[15] M. Kac, “On some connection between probability theory and differential and integral equations,” in Proc. 2nd Berkeley Sympos. Math. Stat. Prob., 1951, pp. 189–215.
[16] E. P. Wigner, “On the distribution of the roots of certain symmetric matrices,” Ann. Math., vol. 67, pp. 325–327, 1958.
[17] S. Karlin and J. L. McGregor, “The differential equations of birth-and-death processes, and the Stieltjes moment problem,” Trans. Amer. Math. Soc., vol. 85, no. 2, pp. 481–546, 1957.
[18] K. Pearson, “Contributions to the mathematical theory of evolution,” Phil. Trans. Roy. Soc., vol. 185a, pp. 71–110, 1894.
[19] J. H. Friedman and J. W. Tukey, “A projection pursuit algorithm for exploratory data analysis,” IEEE Trans. Comput., vol. 23, pp. 881–890, 1974.
[20] P. J. Huber, “Projection pursuit,” Ann. Stat., vol. 13, no. 2, pp. 435–475, 1985.
[21] J. H. Friedman, “Exploratory projection pursuit,” J. Amer. Stat. Ass., vol. 82, pp. 249–266, 1987.
[22] N. I. Akhiezer, The classical moment problem. New York: Hafner, 1965.
[23] S. Karlin and W. J. Studden, Tchebycheff systems: with applications in analysis and statistics. New York: Interscience, 1966.
[24] H. J. Landau, Ed., Moments in mathematics. Providence, RI: Amer. Math. Soc., 1987.
[25] A. C. Antoulas, Approximation of large-scale dynamical systems. Philadelphia, PA: SIAM, 2005.
[26] C. Commault and S. Mocanu, “Phase-type distributions and representations: some results and open problems for system theory,” Int. J. Control, vol. 76, no. 6, pp. 566–580, 2003.
[27] L. Benvenuti and L. Farina, “A tutorial on the positive realization problem,” IEEE Trans. Autom. Control, vol. 49, no. 5, pp. 651–664, 2004.
[28] B. Hanzon and R. J. Ober, “A state-space calculus for rational probability density functions and applications to non-Gaussian filtering,” SIAM J. Control Optim., vol. 40, no. 3, pp. 724–740, 2001.
[29] ——, “State space calculations for discrete probability densities,” Linear Algebra Appl., vol. 350, no. 1, pp. 67–87, 2002.
[30] C. A. O’Cinneide, “Characterization of phase-type distributions,” Stoch. Models, vol. 6, no. 1, pp. 1–57, 1990.
[31] ——, “Phase-type distributions: open problems and a few properties,” Stoch. Models, vol. 15, no. 4, pp. 731–757, 1999.
[32] M. Bladt and M. F. Neuts, “Matrix-exponential distributions: Calculus and interpretations via flows,” Stoch. Models, vol. 15, no. 4, pp. 731–757, 2003.
[33] M. F. Neuts, Matrix-geometric solutions in stochastic models: an algorithmic approach. Baltimore, MD: John Hopkins Univ. Press, 1981.
[34] G. Latouche and V. Ramaswami, Introduction to matrix analytic methods in stochastic modeling. Philadelphia, PA: SIAM, 1999.
[35] E. B. Dynkin, Markov processes. New York: Plenum, 1963.
[36] K. L. Chung, Markov chains with stationary transition probabilities. New York: Springer-Verlag, 1967.
[37] J. R. Norris, Markov chains. Cambridge, U.K.: Cambridge Univ. Press, 1998.
[38] L. Farina and S. Rinaldi, Positive linear systems - theory and applications. New York: Wiley, 2000.
[39] F. Gantmacher, The theory of matrices. New York: Chelsea, 1959, vol. II.
[40] E. Seneta, Non-negative matrices and Markov chains (2nd edition). New York: Springer-Verlag, 1981.
[41] R. A. Horn and C. R. Johnson, Matrix analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.
[42] H. Minc, Nonnegative matrices. New York: Wiley, 1988.
[43] C. G. Cassandras and S. Lafortune, Introduction to discrete event systems. New York: Kluwer, 1999.
[44] P. Brémaud, Markov chains: Gibbs fields, Monte Carlo simulation, and queues. New York: Springer-Verlag, 2001.
[45] M. Andretta, “Some considerations on the definition of risk based on concepts of systems theory and probability,” Risk Analysis, vol. 34, no. 7, pp. 1184–1195, 2014.
[46] N. Leveson, “A systems approach to risk management through leading safety indicators,” Reliab. Eng. Syst. Saf., vol. 136, pp. 17–34, 2015.
[47] W. C. Elmore, “The transient response of damped linear networks with particular regard to wideband amplifiers,” J. App. Phys., vol. 19, no. 1, pp. 55–63, 1948.
[48] R. E. Kalman, “A new approach to linear filtering and prediction problems,” J. Basic Eng., vol. 82, pp. 35–45, 1960.
[49] R. E. Kalman and R. S. Bucy, “New results in linear filtering and prediction theory,” J. Basic Eng., vol. 83, pp. 95–108, 1961.
[50] T. D. Söderström and P. G. Stoica, System identification. Upper Saddle River, NJ: Prentice Hall, 1989.
[51] L. Ljung, System identification - theory for the user (2nd edition). Upper Saddle River, NJ: Prentice Hall, 1999.
[52] T. Kailath, A. H. Sayed, and B. Hassibi, Linear estimation. Englewood Cliffs, NJ: Prentice Hall, 1999.
[53] P. Van Overschee and B. De Moor, Subspace identification for linear systems: theory - implementation - applications. Dordrecht, The Netherlands: Kluwer, 1996.
[54] T. Katayama, Subspace methods for system identification. London, U.K.: Springer, 2006.
[55] M. Verhaegen and V. Verdult, Filtering and system identification: a least squares approach. Cambridge, U.K.: Cambridge Univ. Press, 2007.
[56] A. Lindquist and G. Picci, Linear Stochastic Systems. Berlin, Germany: Springer-Verlag, 2015.
[57] L. A. Zadeh and C. A. Desoer, Linear system theory: the state space approach. New York: McGraw-Hill, 1963.
[58] G. Scarciotti and A. Astolfi, “Model reduction by matching the steady-state response of explicit signal generators,” IEEE Trans. Autom. Control, vol. 61, no. 7, pp. 1995–2000, 2016.
[59] K. A. Gallivan, A. Vandendorpe, and P. Van Dooren, “Sylvester equations and projection-based model reduction,” J. Comp. Applied Math., vol. 162, no. 1, pp. 213–229, 2004.
[60] ——, “Model reduction and the solution of Sylvester equations,” in Proc. MTNS, Kyoto, Japan, 2006.
[61] A. Astolfi, “Model reduction by moment matching for linear and nonlinear systems,” IEEE Trans. Autom. Control, vol. 55, no. 10, pp. 2321–2336, 2010.
[62] ——, “Model reduction by moment matching, steady-state response and projections,” in Proc. 49th Conf. Decision Control. IEEE, 2010, pp. 5344–5349.
[63] G. Scarciotti and A. Astolfi, “Data-driven model reduction by moment matching for linear and nonlinear systems,” Automatica, vol. 79, pp. 340–351, 2017.
[64] J. Carr, Application of center manifold theory. New York: Springer-Verlag, 1981.
[65] C. I. Byrnes and A. Isidori, “Steady state response, separation principle and the output regulation of nonlinear systems,” in Proc. 28th Conf. Decision Control, Tampa, FL, USA, 1989, pp. 2247–2251.
[66] A. Isidori and C. I. Byrnes, “Output regulation of nonlinear systems,” IEEE Trans. Autom. Control, vol. 35, no. 2, pp. 131–140, 1990.
[67] B. V. Gnedenko and A. N. Kolmogorov, Limit distributions for sums of independent random variables (2nd edition). Cambridge, MA: Addison-Wesley, 1968.
[68] B. S. Everitt and D. J. Hand, Finite mixture distributions. London, U.K.: Chapman and Hall, 1981.
[69] S. Asmussen, Applied probability and queues. New York: Springer, 2008.
[70] R. G. Gallager, Stochastic processes: theory for applications. Cambridge, U.K.: Cambridge Univ. Press, 2013.
[71] A. Padoan and A. Astolfi, “Moments of random variables: a linear systems-theoretic view,” in Amer. Control Conf., Seattle, WA, USA, 2017, pp. 1035–1040.
[72] G. Scarciotti and A. Astolfi, “Model reduction of neutral linear and nonlinear time-invariant time-delay systems with discrete and distributed delays,” IEEE Trans. Autom. Control, vol. 61, no. 6, pp. 1438–1451, 2016.
[73] G. Grimmett and D. Welsh, Probability: an introduction. Oxford, U.K.: Oxford Univ. Press, 2014.
[74] R. B. Bapat, Graphs and matrices. London, U.K.: Springer, 2010.
[75] A. Isidori and C. I. Byrnes, “Steady-state behaviors in nonlinear systems with an application to robust disturbance rejection,” Ann. Rev. Control, vol. 32, no. 1, pp. 1–16, 2008.
[76] A. Isidori, Nonlinear control systems (3rd edition). New York: Springer-Verlag, 1995.
[77] G. Scarciotti and A. Astolfi, “Characterization of the moments of a linear system driven by explicit signal generators,” in Proc. Amer. Control Conf., Chicago, IL, USA, 2015, pp. 589–594.
[78] J. K. Hale, Ordinary differential equations (2nd edition). Malabar, FL, USA: Kreiger, 1980.
[79] R. E. Kalman, P. L. Falb, and M. A. Arbib, Topics in mathematical system theory. New York: McGraw-Hill, 1969, vol. 33.
[80] T. Kailath, Linear systems. Englewood Cliffs, NJ: Prentice Hall, 1980.
[81] E. D. Sontag, Mathematical control theory: deterministic finite dimensional systems (2nd edition). New York: Springer-Verlag, 1998.
[82] P. Embrechts and H. Schmidli, “Modelling of extremal events in insurance and finance,” ZOR, vol. 39, no. 1, pp. 1–34, 1994.
[83] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab, Signals and systems. Upper Saddle River, NJ: Prentice Hall, 1997.
[84] J. B. Conway, Functions of one complex variable. New York: Springer-Verlag, 1973.
[85] A. Padoan and A. Astolfi, “Model reduction by moment matching at isolated singularities for linear systems: a complex analytic approach,” in IFAC World Congr., Toulouse, France, 2017, pp. 6525–6528.
[86] ——, “Model reduction by moment matching at isolated singularities for linear systems: a geometric approach,” in Proc. 56th Conf. Decision Control, Melbourne, Australia, 2017, pp. 4807–4812.
[87] ——, “Isolated singularities and moments of linear and nonlinear systems,” IEEE Trans. Autom. Control, 2017, under review.
[88] T. J. Rothenberg, “Identification in parametric models,” Econometrica, no. 3, pp. 577–591, 1971.
[89] F. W. Steutel and K. Van Harn, Infinite divisibility of probability distributions on the real line. New York: Marcel Dekker, 2003.
[90] E. Lukacs, Characteristic functions (2nd edition). London, U.K.: Griffin, 1970.
[91] C. T. Chen and C. Desoer, “Controllability and observability of composite systems,” IEEE Trans. Autom. Control, vol. 12, no. 4, pp. 402–409, 1967.
[92] S. Frühwirth-Schnatter, Finite mixture and Markov switching models. New York: Springer, 2006.
[93] W. L. Smith, “Renewal theory and its ramifications,” J. Roy. Statist. Soc., pp. 243–302, 1958.
[94] W. J. Stewart, Probability, Markov chains, queues, and simulation: the mathematical basis of performance modeling. Princeton, NJ: Princeton Univ. Press, 2009.
[95] M. Harchol-Balter, Performance modeling and design of computer systems: queueing theory in action. Cambridge, U.K.: Cambridge Univ. Press, 2013.
[96] G. Kotsalis and M. Dahleh, “Model reduction of irreducible Markov chains,” in Proc. 42nd Conf. Decision Control, Maui, Hawaii, USA, 2003, pp. 5727–5728.
[97] T. Runolfsson and Y. Ma, “Model reduction of nonreversible Markov chains,” in Proc. 46th Conf. Decision Control, New Orleans, LA, USA, 2007, pp. 3739–3744.
[98] K. Deng and D. Huang, “Model reduction of Markov chains via low-rank approximation,” in Proc. Amer. Control Conf., Montreal, Canada, 2012, pp. 2651–2656.
[99] K. Ogata, Discrete-time control systems. Englewood Cliffs, NJ: Prentice Hall, 1995.
[100] A. J. Van Der Schaft and J. M. Schumacher, An introduction to hybrid dynamical systems. London, U.K.: Springer-Verlag, 2000.
[101] R. Goebel, R. G. Sanfelice, and A. R. Teel, “Hybrid dynamical systems,” IEEE Control Syst. Mag., vol. 29, no. 2, pp. 28–93, 2009.
[102] R. F. Curtain and H. Zwart, An introduction to infinite-dimensional linear systems theory. New York: Springer-Verlag, 1995.
[103] M. Tucsnak and G. Weiss, Observation and control for operator semigroups. Basel, Switzerland: Birkhäuser, 2009.
[104] N. K. Bose, Multidimensional systems theory and applications (2nd ed.). Amsterdam, The Netherlands: Kluwer, 2003.

Alberto Padoan was born in Ferrara, Italy, in 1989. He received the Laurea Magistrale degree (M.Sc. equivalent) cum laude in automation engineering from the University of Padova, Italy, in 2013. He then joined the Control and Power Group at the Electrical and Electronic Engineering Department, Imperial College London, U.K., from which he received the Ph.D. degree in 2018. He is currently a Research Associate in the Control Group, Department of Engineering, University of Cambridge, U.K., and a Research Associate of Sidney Sussex College. His research interests are focused on dynamical systems and control theory, with special emphasis on modelling, system identification and model reduction.

Alessandro Astolfi was born in Rome, Italy, in 1967. He graduated in electrical engineering from the University of Rome in 1991. In 1992 he joined ETH-Zurich, where he obtained an M.Sc. in Information Theory in 1995 and the Ph.D. degree with Medal of Honor in 1995 with a thesis on discontinuous stabilization of nonholonomic systems. In 1996 he was awarded a Ph.D. from the University of Rome La Sapienza for his work on nonlinear robust control. Since 1996 he has been with the Electrical and Electronic Engineering Department of Imperial College London, London (UK), where he is currently Professor of Nonlinear Control Theory and Head of the Control and Power Group. From 1998 to 2003 he was also an Associate Professor at the Dept. of Electronics and Information of the Politecnico of Milano. Since 2005 he has also been a Professor at Dipartimento di Ingegneria Civile e Ingegneria Informatica, University of Rome Tor Vergata. He has been a visiting lecturer in Nonlinear Control in several universities, including ETH-Zurich (1995-1996); Terza University of Rome (1996); Rice University, Houston (1999); Kepler University, Linz (2000); SUPELEC, Paris (2001); Northeastern University (2013). His research interests are focused on mathematical control theory and control applications, with special emphasis on the problems of discontinuous stabilization, robust and adaptive control, observer design and model reduction.
He is the author of more than 150 journal papers, of 30 book chapters and of over 240 papers in refereed conference proceedings. He is the author (with D. Karagiannis and R. Ortega) of the monograph Nonlinear and Adaptive Control with Applications (Springer-Verlag). He is the recipient of the IEEE CSS A. Ruberti Young Researcher Prize (2007), the IEEE RAS Googol Best New Application Paper Award (2009), the IEEE CSS George S. Axelby Outstanding Paper Award (2012) and the Automatica Best Paper Award (2017). He is a Distinguished Member of the IEEE CSS and an IET, IEEE and IFAC Fellow. He is the recipient of the 2015 Sir Harold Hartley Medal, for outstanding contribution to the technology of measurement and control, from the Institute of Measurement and Control. He served as Associate Editor for Automatica, Systems and Control Letters, the IEEE Trans. on Automatic Control, the International Journal of Control, the European Journal of Control and the Journal of the Franklin Institute; as Area Editor for the Int. J. of Adaptive Control and Signal Processing; as Senior Editor for the IEEE Trans. on Automatic Control; and as Editor-in-Chief for the European Journal of Control. He is currently Editor-in-Chief of the IEEE Trans. on Automatic Control. He served as Chair of the IEEE CSS Conference Editorial Board (2010-2017) and in the IPC of several international conferences. He is a Member of the IEEE Fellow Committee (2016).