
Causality Analysis with Information Geometry: A Comparison


Heng Jie Choong 1,*, Eun-jin Kim 1 and Fei He 2

1 Centre for Fluid and Complex Systems, Coventry University, Coventry CV1 5FB, UK; [email protected]
2 Centre for Computational Science and Mathematical Modelling, Coventry University, Coventry CV1 5FB, UK; [email protected]
* Correspondence: [email protected]

Abstract: The quantification of causality is vital for understanding various important phenomena in nature and laboratories, such as brain networks, environmental dynamics, and pathologies. The two most widely used methods for measuring causality are Granger Causality (GC) and Transfer Entropy (TE), which rely on measuring the improvement in the prediction of one process based on the knowledge of another process at an earlier time. However, they have their own limitations, e.g., in applications to nonlinear, non-stationary data, or non-parametric models. In this study, we propose an alternative approach to quantify causality through information geometry that overcomes such limitations. Specifically, based on the information rate that measures the rate of change of the time-dependent distribution, we develop a model-free approach called information rate causality that captures the occurrence of the causality based on the change in the distribution of one process caused by another. This measurement is suitable for analyzing numerically generated non-stationary, nonlinear data. The latter are generated by simulating different types of discrete autoregressive models which contain linear and nonlinear interactions in unidirectional and bidirectional time-series signals. Our results show that information rate causality can capture the coupling of both linear and nonlinear data better than GC and TE in the several examples explored in the paper.

Keywords: causality; information geometry; Transfer Entropy; Granger Causality; information causal rate; signal processing; nonlinear models; non-stationary; probability distribution

Entropy 2023, 25, 806; https://doi.org/10.3390/e25050806. Academic Editor: Hector Zenil. Received: 6 March 2023; revised: 12 May 2023; accepted: 13 May 2023; published: 16 May 2023. © 2023 by the authors. Licensee MDPI, Basel, Switzerland. Open access under the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

The study of causality aims to explore cause-and-effect relationships between processes. In general, causality can be investigated in two categories: causal discovery and causal analysis. Causal discovery shows the inherent causal relationship within the dataset by analyzing and creating causal models based on graph theory. The construction of these models can be achieved via algorithmic dynamics and evaluating the algorithms against the observed data [1–3]. Generally, the complexity of the models' networks is characterized and measured using graph-theoretic measures. A recent approach using classical information theory, such as Shannon entropy, has been considered as an alternative measurement, but its high dependence on the distribution can lead to spurious, disparate values for the same complexity, as shown in [4]. Alternatively, causal analysis, which is the focus of this paper, studies the potential changes in a system caused by another.
In practical terms, causal discovery requires more detailed background information about the provided dataset to accurately conduct the study. Causal analysis, however, does not require detailed underlying knowledge of the connectivity embedded within the dataset [3,5]. Among the various analysis methods used to quantify causality, Granger causality (GC) and Transfer Entropy (TE) are extensively used in various fields, including economics, climate, education, meteorology, neuroscience, and more [5–16]. For instance, Rebecca M. Pullon et al. utilized GC to demonstrate the decrease in cortical information flow throughout the brain as the subject loses responsiveness [10]; Yunyuan Gao et al. used TE to investigate the coupling strength of the brain between the motor and sensory areas [15]. Both of these methods are derived from the same notion of examining how much a present process depends on past process(es). Despite sharing the same notion, their approaches to quantifying causality are completely different. GC evaluates the dependency based on signal models, which means that modelling the signals incorrectly can lead to the wrong conclusion about the causality [11,17,18]. On the other hand, TE is model-free and employs a statistical-dependency approach to assess causality. However, it is sensitive to the estimation of the probability distribution, such as the choice of bin size and the high dimensionality of the distribution due to the number of lags [19,20].

More specifically, inspired by Norbert Wiener, Clive Granger introduced GC [6,21], which suggests that causality between two stochastic processes (x1(t) and x2(t)) exists if and only if one of the processes is able to improve the prediction of the other. For instance, if including the past observations of x1(t) along with those of x2(t) improves the prediction of the current state of x2(t) compared to using only past information of x2(t), then this would suggest that x1(t) is Granger-causing x2(t). This predictability is quantitatively studied by comparing the errors of the autoregressive (linear regressive) model that is used in modelling the signals [6,18]. As natural time series such as the electroencephalogram (EEG) often exhibit oscillatory aspects, it is interesting to study the spectrum of GC. John Geweke discussed the evaluation of GC in the frequency domain instead of the time domain [17,22,23]. Additionally, the time-varying frequency domain of GC studied in [18,24,25] is able to show both the spectrum of GC and the period during which causation occurs. The frequency domain of GC is the GC spectrum, which shows the amplitude of GC associated with a specific frequency of the entire signals, while the time-varying frequency of GC shows when the coupling of the signals occurs alongside the GC spectrum. When utilizing GC, it is necessary to model the provided signals using a linear autoregressive model [17,18]. However, this linear model cannot accommodate possible nonlinear and non-stationary effects within real-world signals, such as brain EEG signals [26,27]. Therefore, instead of studying causality through deterministic or parametric linear models, causality can be statistically quantified using information-theoretic measures like TE [19].
In the event of two stochastic processes, TE measures the additional information required from one process to achieve the current realization of another process [28]. Therefore, TE is not limited to any specific model or linearity assumption. Although TE uses a different approach in evaluating causality, it can yield similar results to GC for a Gaussian process, as discussed analytically by Lionel Barnett et al. [29].

Alternatively, causality can be investigated by examining the changes in the probability distribution that are caused by the inclusion of the other variables [30]. In a statistical space, the change in distribution is known as a distance and can be quantified via the Kullback–Leibler (KL) divergence. However, the KL divergence is not well-defined as a distance, as it is neither symmetric nor path-dependent. Hence, the information rate and information length are used with the intention of having a symmetric, path-dependent measurement of the distance [30–33]. The information rate quantifies the rate of the temporal change in distributions, while the information length measures the total statistical change in the process over time [30,32]. Refs. [30,31] proposed an information-geometric causality measure based on the information rate, the so-called information rate causality, by quantifying causality through the effects of the information rate (see also [34]) and applied it to a solvable, linear Gaussian process (Kramer model) which has an exact time-dependent distribution.

In this paper, we further develop this information rate causality for analyzing numerically generated time-series signals that are in general time-varying and nonlinear and compare it with GC and TE. Specifically, we generate the signals by simulating (discrete) autoregressive equations which contain unidirectional or bidirectional interactions. For GC, both the frequency-domain and the time-varying frequency-domain results will be presented. For TE, the signals are first evaluated as a whole at different lags. Next, the net TE is calculated by taking the difference of the TE in the two directions; for instance, T^{net}_{1,2} = T_{1→2} − T_{2→1} represents the net TE of signals 1 and 2. The net TE enables us to identify the appropriate lag(s) that yield significant results in TE. Using the knowledge of the appropriate lag(s) from the net TE, the TE is then calculated through a sliding-window approach, with the TE evaluated at the designated lag in each small window of the signals. Similarly, information rate causality will be evaluated through the sliding-window approach with the information rate calculated within each window. In the scenario where the signals are coupled, it is expected that there will be spikes in the information rate causality, indicating changes in the distribution of one signal caused by another. Note that the methodology and concepts of the analyses used in this paper are mainly from the previous works [17–19,28,30,32].

The remainder of this paper is organized as follows. In Section 2, we briefly review the concepts of GC, TE, and information rate causality along with their implementation. The three different types of autoregressive models—unidirectional, interchange unidirectional, and bidirectional models—are introduced and analyzed in Sections 3 to 5, respectively. Section 6 contains our conclusions. Appendices A–C contain some background theory to make the paper self-contained. As a reference, the basic model of continuous coupling is studied and analyzed in Appendix D.
2. Methodologies

2.1. Granger Causality

The general idea behind GC is to evaluate the dependency of one process on another process [17,18]. This dependency is calculated by comparing the errors/noises of the signals modelled through an autoregressive model. Consider two stochastic processes, x1(t) and x2(t), which can be modelled using the information from their own past (as shown in Equations (1) and (2)) or by also including information from each other (as shown in Equations (3) and (4)):

x_1(t) = \sum_{i=1}^{n} a_i x_1(t-i) + \epsilon_1(t),    (1)

x_2(t) = \sum_{i=1}^{n} c_i x_2(t-i) + \epsilon_2(t),    (2)

x_1(t) = \sum_{i=1}^{n} a_i^* x_1(t-i) + \sum_{i=1}^{n} b_i^* x_2(t-i) + \epsilon_1^*(t),    (3)

x_2(t) = \sum_{i=1}^{n} c_i^* x_1(t-i) + \sum_{i=1}^{n} d_i^* x_2(t-i) + \epsilon_2^*(t).    (4)

Here, {a_i, c_i, a_i^*, b_i^*, c_i^*, d_i^*} ∈ R are the constant parameters that represent the fractions of contribution from the past observations towards x1(t) and x2(t). ε_i, i ∈ [1, 2], is the uncorrelated external additive noise needed for modelling the processes, with variance Σ_ii. ε*_i, i ∈ [1, 2], is the correlated noise with variance Σ*_ij, (i, j) ∈ [1, 2], which can be represented by the covariance matrix

\Sigma = \begin{pmatrix} \Sigma_{11}^* & \Sigma_{12}^* \\ \Sigma_{21}^* & \Sigma_{22}^* \end{pmatrix}.

The total interdependence or total causality index (F_{x1·x2}) between x1(t) and x2(t) is composed of two directional causalities (F_{x1→x2} and F_{x2→x1}) and one instantaneous causality (F_{x1↔x2}). These are defined in Equations (5) to (8):

F_{x_1 \cdot x_2} = \log \frac{\Sigma_{11}\Sigma_{22}}{|\Sigma|} = \log \frac{\Sigma_{11}\Sigma_{22}}{\Sigma_{11}^*\Sigma_{22}^* - (\Sigma_{12}^*)^2} = F_{x_1 \to x_2} + F_{x_2 \to x_1} + F_{x_1 \leftrightarrow x_2},    (5)

F_{x_1 \to x_2} = \log \frac{\Sigma_{22}}{\Sigma_{22}^*},    (6)

F_{x_2 \to x_1} = \log \frac{\Sigma_{11}}{\Sigma_{11}^*},    (7)

F_{x_1 \leftrightarrow x_2} = \log \frac{\Sigma_{11}^*\Sigma_{22}^*}{|\Sigma|}.    (8)

In conjunction, the spectral version of GC can be evaluated in the frequency domain (ω = 2πf, f ≡ frequency) using the spectral matrix S(ω), the transfer matrix H(ω), and the covariance matrix Σ. They are related according to Equation (9):

S(\omega) = \begin{pmatrix} S_{11}(\omega) & S_{12}(\omega) \\ S_{21}(\omega) & S_{22}(\omega) \end{pmatrix} = \left\langle x(\omega)\, x^{\dagger}(\omega) \right\rangle = H(\omega)\, \Sigma\, H^{\dagger}(\omega).    (9)

Here, x(\omega) = \begin{pmatrix} x_1(\omega) \\ x_2(\omega) \end{pmatrix} and

H(\omega) = \begin{pmatrix} H_{11}(\omega) & H_{12}(\omega) \\ H_{21}(\omega) & H_{22}(\omega) \end{pmatrix} = \begin{pmatrix} 1 - \sum_{i=1}^{n} a_i^* e^{-j\omega i} & -\sum_{i=1}^{n} b_i^* e^{-j\omega i} \\ -\sum_{i=1}^{n} c_i^* e^{-j\omega i} & 1 - \sum_{i=1}^{n} d_i^* e^{-j\omega i} \end{pmatrix}^{-1};

⟨·⟩ is defined as an ensemble average and † denotes the complex conjugate transpose of a matrix (see Appendix A.3 for the derivation). Similar to the analysis in the time domain, the total interdependence in the frequency domain between x1(ω) and x2(ω) (I_{x1·x2}) also consists of two directional causalities (I_{x1→x2} and I_{x2→x1}) and one instantaneous causality (I_{x1↔x2}). They are expressed as Equations (10) to (13):

I_{x_1 \cdot x_2} = \log \frac{S_{11}(\omega)\, S_{22}(\omega)}{|S(\omega)|} = I_{x_1 \to x_2} + I_{x_2 \to x_1} + I_{x_1 \leftrightarrow x_2},    (10)

I_{x_1 \to x_2} = \log \frac{S_{22}(\omega)}{\hat{H}_{22}(\omega)\,\Sigma_{22}^*\,\bar{\hat{H}}_{22}(\omega)},    (11)

I_{x_2 \to x_1} = \log \frac{S_{11}(\omega)}{\tilde{H}_{11}(\omega)\,\Sigma_{11}^*\,\bar{\tilde{H}}_{11}(\omega)},    (12)

I_{x_1 \leftrightarrow x_2} = \log \frac{\left(\tilde{H}_{11}(\omega)\,\Sigma_{11}^*\,\bar{\tilde{H}}_{11}(\omega)\right)\left(\hat{H}_{22}(\omega)\,\Sigma_{22}^*\,\bar{\hat{H}}_{22}(\omega)\right)}{|S(\omega)|}.    (13)

Here, \hat{H}_{22}(\omega) = H_{22}(\omega) + (\Sigma_{12}^*/\Sigma_{22}^*)\, H_{21}(\omega) and \tilde{H}_{11}(\omega) = H_{11}(\omega) + (\Sigma_{12}^*/\Sigma_{11}^*)\, H_{12}(\omega); the overbar denotes the complex conjugate of the element.

In this paper, the frequency domain and the time-varying frequency domain of GC are calculated through the non-parametric method. The general idea of this method is to decompose the spectral matrix S(ω) into the transfer matrix H(ω) and the covariance matrix of the noises Σ. This decomposition can be achieved using Wilson's algorithm [18,24,25,35,36].
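As a rough illustration of the time-domain quantities in Equations (1)–(8), the sketch below fits the restricted and joint autoregressive models by ordinary least squares and forms the causality indices from the residual variances. This is only a minimal, hypothetical sketch (the function names, the least-squares fitting route, and the fixed model order are our own choices); the analyses reported in this paper use the non-parametric spectral route described above rather than this parametric one.

```python
import numpy as np

def ar_residual_cov(x, order):
    """Least-squares fit of a (vector) AR(order) model; returns the residual covariance.
    x: array of shape (T, d). Output: (d, d) covariance of the one-step prediction errors."""
    T, d = x.shape
    # regressors [x(t-1), ..., x(t-order)] for t = order, ..., T-1
    X = np.hstack([x[order - k - 1:T - k - 1] for k in range(order)])
    Y = x[order:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    return resid.T @ resid / (len(resid) - 1)

def granger_time_domain(x1, x2, order=2):
    """Time-domain GC indices of Equations (5)-(8) from residual variances."""
    s11 = ar_residual_cov(x1[:, None], order)[0, 0]           # restricted model of x1, Eq. (1)
    s22 = ar_residual_cov(x2[:, None], order)[0, 0]           # restricted model of x2, Eq. (2)
    sj = ar_residual_cov(np.column_stack([x1, x2]), order)    # joint model, Eqs. (3)-(4)
    f_x2_to_x1 = np.log(s11 / sj[0, 0])                       # Eq. (7)
    f_x1_to_x2 = np.log(s22 / sj[1, 1])                       # Eq. (6)
    f_inst = np.log(sj[0, 0] * sj[1, 1] / np.linalg.det(sj))  # Eq. (8)
    return f_x1_to_x2, f_x2_to_x1, f_inst, f_x1_to_x2 + f_x2_to_x1 + f_inst
```

For two equal-length 1-D arrays x1 and x2, a positive f_x1_to_x2 would indicate that adding the past of x1(t) reduces the prediction error of x2(t), i.e., x1(t) Granger-causes x2(t) in the sense of Equation (6).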
The spectral matrices for the frequency domain and the time-varying frequency domain are expressed as Equations (14) and (15), respectively:

S(\omega) = \frac{\left\langle x_i(\omega)\,\bar{x}_j(\omega) \right\rangle}{T}, \quad i, j \in \mathbb{R},    (14)

S(t, \omega) = \frac{\left\langle s_i(t,\omega)\,\bar{s}_j(t,\omega) \right\rangle}{T}, \quad i, j \in \mathbb{R}.    (15)

Here, T denotes the total period of the signal, x_n(ω), n ∈ [i, j], is the discrete Fourier transform of the signal, and s_n(t, ω), n ∈ [i, j], is the short-time Fourier transform of the signal x_n(t). Note that the overbar denotes the complex conjugate. The Hann window function is used when evaluating the short-time Fourier transform; the window moves through the whole series with half of the data points overlapping. The general flow of the procedure is shown in Figure 1a.

Figure 1. The procedure of implementing the causality analyses used in this paper; each colour of the lines in the image represents a different simulation. In this study, each window contains 0.5 s of data points and overlaps with the previous window by 0.25 s. The causality analyses are conducted within each window. (a) illustrates the components (S(...), H(...)) calculated within the windows for the non-parametric GC analysis; refer to Section 2.1 for the corresponding components. (b) shows the estimation of the distributions p(...) for the TE evaluation; the distributions are estimated based on the samples x_{n+1}, x_n^{(1)}, and y_n^{(1)}; refer to Section 2.2 for the definition of TE. (c) demonstrates the evaluation of information rate causality: each window (labelled as the 1st window) is divided into two, with the first half labelled as the 2nd window; the distribution p(x(t), y(t_i)) estimates the evolution of the distribution of x(t) while y(t_i) is fixed at the 2nd window; refer to Section 2.3 for the definition of information rate causality.

2.2. Transfer Entropy

TE is a model-free approach used to calculate causality by evaluating the dependency between processes. The general expression of TE is given by Equations (16) and (17), which measure the additional information required for the realization of one state of a process (x1(t)) depending on the past states of the processes (e.g., x1(t−1) and x2(t−1)) [19,28,37]:

TE_{x_2(t) \to x_1(t)} = \sum_{x \in x_1(t),\, y \in x_2(t)} p(x_{n+1}, x_n^{(k)}, y_n^{(l)}) \log_2 \frac{p(x_{n+1} \mid x_n^{(k)}, y_n^{(l)})}{p(x_{n+1} \mid x_n^{(k)})}    (16)

= \sum_{x \in x_1(t),\, y \in x_2(t)} p(x_{n+1}, x_n^{(k)}, y_n^{(l)}) \log_2 \frac{p(x_{n+1}, x_n^{(k)}, y_n^{(l)})\, p(x_n^{(k)})}{p(x_{n+1}, x_n^{(k)})\, p(x_n^{(k)}, y_n^{(l)})}.    (17)

Here, x_{n+1} represents the state of x1(t) at the (n+1)-th moment, x_n^{(k)} represents the state at the n-th moment of x1(t) consisting of k previous states of x1(t), and y_n^{(l)} represents the state at the n-th moment of x2(t) consisting of l previous states. Note that the numbers of previous states k and l are arbitrary, depending on the interest of the study. For instance, in a collectively observed stochastic process X = (x1, x2, ..., x10), x_8^{(2)} = (x_n, x_m) | n, m ∈ [1, 8] → (x1, x2) or (x3, x8), etc. Thus, TE quantifies the additional information needed for the realization of x1(t) at (n + 1) from x2(t) with the assumption that x1(t) is independent of x2(t). If x2(t) has no impact on the future evolution of x1(t), then p(x_{n+1} | x_n^{(k)}, y_n^{(l)}) = p(x_{n+1} | x_n^{(k)}) and TE_{x_2(t) \to x_1(t)} = 0.
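To make the estimator in Equation (17) concrete, the following is a minimal sketch (not the authors' code) of a plug-in histogram estimate of TE with k = l = 1. The function name, the fixed bin count, and the way the lag is applied are our own illustrative assumptions.

```python
import numpy as np

def transfer_entropy(x_target, y_source, lag=1, bins=5):
    """Histogram (plug-in) estimate of TE_{y -> x} in Eq. (17) with k = l = 1.
    x_target, y_source: 1-D arrays of equal length; 'lag' is the past offset used."""
    x_next = x_target[lag:]      # x_{n+1}
    x_past = x_target[:-lag]     # x_n^{(1)}
    y_past = y_source[:-lag]     # y_n^{(1)}
    joint, _ = np.histogramdd(np.column_stack([x_next, x_past, y_past]), bins=bins)
    p_xyz = joint / joint.sum()              # p(x_{n+1}, x_n, y_n)
    p_xnext_xn = p_xyz.sum(axis=2)           # p(x_{n+1}, x_n)
    p_xn_yn = p_xyz.sum(axis=0)              # p(x_n, y_n)
    p_xn = p_xyz.sum(axis=(0, 2))            # p(x_n)
    te = 0.0
    for i, j, k in zip(*np.nonzero(p_xyz)):
        den = p_xnext_xn[i, j] * p_xn_yn[j, k]
        if den > 0:
            te += p_xyz[i, j, k] * np.log2(p_xyz[i, j, k] * p_xn[j] / den)
    return te

# Net TE of Section 2.2 (sketch): TE_{x1 -> x2} - TE_{x2 -> x1} at a given lag
# net_12 = transfer_entropy(x2, x1, lag=1) - transfer_entropy(x1, x2, lag=1)
```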
For the TE calculation, the probability distribution p(...) will be estimated via a histogram with a bin size of 5, as this bin size is determined by the cubic root of Rice's rule to accommodate the 3-dimensional joint probability within TE (p(x_{n+1}, x_n^{(1)}, y_n^{(1)})). Using a larger number of bins would not properly depict the distribution and would bring spurious results when calculating the TE. To simplify the analysis, the number of past values k and l is set to 1 (k = l = 1) in this paper.

Two different evaluations will be conducted for TE. First, assuming that the processes are stationary, the net TE (TE_{x_1(t), x_2(t)} = TE_{x_1(t) \to x_2(t)} − TE_{x_2(t) \to x_1(t)}) is evaluated on the entire processes at different lags. This enables us to choose the appropriate past value(s)/lag(s) for evaluating the TE based on the significance of the net TE. Second, using the knowledge of the appropriate lag from the net TE, the TE is evaluated via a sliding-window approach, which calculates the TE at each instance of the window of interest. In this sliding-window approach, a small number of data points of the stochastic processes is sampled within the window and the TE is calculated. These calculations are sketched in Figure 1b.

2.3. Information Rate Causality

In a dimensionless statistical manifold, the distance between two probability distributions is defined by their statistical difference. One commonly used measure of this difference is the KL divergence (relative entropy), but this measure is neither symmetric nor path-dependent. For a time-dependent probability distribution, the changes in the distribution over time can be measured through the information rate and the information length. The information rate (Γ(t)) quantifies the rate of change of the distribution, while the information length (L) measures the total change of the distribution. These measures are defined as Equations (18) and (19) [30–32]:

\Gamma^2(t) = \int dx\, p(x,t) \left[\partial_t \ln p(x,t)\right]^2 = 4 \int dx \left[\partial_t \sqrt{p(x,t)}\right]^2,    (18)

L = \int dt\, \Gamma(t).    (19)

Here p(x, t) is the probability distribution of a stochastic process x at the time t.

For studying causality, we consider two stochastic processes (x1(t) and x2(t)) and evaluate the information rate for a joint probability distribution. The causality between these processes can be quantified by the changes in the information rate (statistical changes, or changes in the distribution) while the other process remains static. For instance, the information rate of x2(t) causing x1(t) is defined as Equation (21) and is called information rate causality in this paper:

\Gamma^2(t)_{x_2(t) \to x_1(t)} = \int dx_1\, dx_2\, p(x_1(t), x_2(t_i)) \left[\partial_t \ln p(x_1(t), x_2(t_i))\right]^2    (20)

= 4 \int dx_1\, dx_2 \left[\partial_t \sqrt{p(x_1(t), x_2(t_i))}\right]^2.    (21)

Referring to Figure 1c, t_i denotes the reference time that remains static in the process. Therefore, p(x1(t), x2(t_i)) is a probability distribution that is sampled by keeping the process x2(t) static at time t_i while the process x1(t) evolves along the time t. Since the information rate can only quantify changes in the distribution, the information rate causality is evaluated within each window of interest instead of over the whole time series, to determine whether the coupling of the processes still persists.
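A minimal sketch of how Γ²(t)_{x2→x1} in Equations (20) and (21) can be estimated from an ensemble of trials is given below; it already uses the discretized form introduced next as Equation (22). The array layout (trials × time steps), the fixed number of bins, and the function name are our own assumptions, not the authors' implementation.

```python
import numpy as np

def information_rate_causality(x1_win, x2_win, t_ref=0, dt=1.0, bins=8):
    """Sketch of Gamma^2(t)_{x2 -> x1} (Eqs. (20)-(22)) for one analysis window.
    x1_win, x2_win: arrays of shape (n_trials, n_steps); x2 is frozen at column t_ref."""
    n_trials, n_steps = x1_win.shape
    x2_ref = x2_win[:, t_ref]                   # x2(t_i): held static at the reference time
    # fixed bin edges so that the histograms at t and t + dt are directly comparable
    e1 = np.linspace(x1_win.min(), x1_win.max(), bins + 1)
    e2 = np.linspace(x2_ref.min(), x2_ref.max(), bins + 1)
    gamma2 = np.empty(n_steps - 1)
    for t in range(n_steps - 1):
        p_now, _, _ = np.histogram2d(x1_win[:, t], x2_ref, bins=[e1, e2])
        p_next, _, _ = np.histogram2d(x1_win[:, t + 1], x2_ref, bins=[e1, e2])
        p_now, p_next = p_now / n_trials, p_next / n_trials   # joint probability mass per bin
        # Eq. (22): 4/dt^2 * sum of (sqrt p(t+dt) - sqrt p(t))^2; bin areas are absorbed in the mass
        gamma2[t] = 4.0 * np.sum((np.sqrt(p_next) - np.sqrt(p_now)) ** 2) / dt**2
    return gamma2
```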
To accommodate the calculation, a discretized version of Equation (21) is used; it is expressed as Equation (22):

\Gamma^2(t)_{x_2 \to x_1} = \frac{4}{[\Delta t]^2} \sum_{x_1 \in X_1,\, x_2 \in X_2} \Delta x_1\, \Delta x_2 \left[\sqrt{p(x_1(t+\Delta t), x_2(t_i))} - \sqrt{p(x_1(t), x_2(t_i))}\right]^2.    (22)

For calculating the information rate, the joint probability distribution for the two processes is estimated via a histogram with the bin size (b) determined by the square root of Rice's rule, as expressed in Equation (23). Rice's rule is used because it appropriately determines the bin size for sampling the obtained data [38,39] for a 1-dimensional distribution; the square root of Rice's rule is therefore taken to accommodate the 2-dimensional distribution in this case:

b = \left\lceil \sqrt{2\sqrt[3]{n}} \right\rceil, \quad n = \text{number of sampled data}.    (23)

In this paper, we evaluate the information rate causality by employing Equation (22) to examine the impact of one process, which remains static in time, on the distribution of another process within a specific time window. To estimate the joint probability distribution utilized in Equation (22), a histogram is employed with a bin size determined by Equation (23). Figure 1c illustrates the overall procedure of this evaluation.

2.4. Summary of the Procedure: Data and Analysis

We further develop information rate causality (refer to Section 2.3) for the analysis of our numerically generated data and compare it with the non-parametric GC (refer to Section 2.1) and the window sliding TE (refer to Section 2.2). Our numerical data are generated by simulating different discrete autoregressive models covering both linear and nonlinear models. The simulation is conducted for 10,000 trials, with each trial running for 25 s (physical time) at 200 Hz (physical sampling frequency). Different coupling/causal relationships between the signals are considered in this paper, such as unidirectional (refer to Section 3), interchanging unidirectional (refer to Section 4), and bidirectional (refer to Section 5). Note that the noncoupling cases are also considered in this paper to check the robustness of the causality analyses; specifically, we refer to the uncoupling of x1(t) towards x2(t) after the physical time of 10 s. Using the simulated signals, the causality analyses (GC, TE, information rate causality) are conducted according to the sketch shown in Figure 1. To be consistent with the data points used for each window, each sampling window contains 0.5 s of data points with an overlap of 0.25 s (0.5 × 0.5 s), which is equivalent to 100 data points for one simulation. Since the models are simulated for 10,000 trials, each window consists of 1 × 10^6 sample points (100 points × 10,000 trials). Note that this window is not applied to the frequency domain of GC and the net TE, as both are calculated based on the whole series.
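The bookkeeping for the sliding windows described above can be sketched as follows; this is a hypothetical helper (names and layout are our own), assuming the data for one model are stored as a trials × time-steps array, with constants mirroring the 200 Hz sampling and the 0.5 s / 0.25 s window settings.

```python
import numpy as np

FS = 200                 # sampling frequency (Hz)
WIN = int(0.5 * FS)      # 0.5 s window   -> 100 data points
STEP = int(0.25 * FS)    # 0.25 s overlap -> slide by 50 data points

def sliding_windows(trials):
    """Yield (start_time_s, window) pairs from a (n_trials, n_steps) array.
    Each window has shape (n_trials, WIN); consecutive windows overlap by 0.25 s."""
    n_steps = trials.shape[1]
    for start in range(0, n_steps - WIN + 1, STEP):
        yield start / FS, trials[:, start:start + WIN]
```

With 10,000 trials, each yielded window holds 100 × 10,000 = 10^6 sample points, over which the window sliding TE or the information rate causality is then evaluated.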
2.5. Summary of the Different Models and Key Results

Prior to the discussion of the different models in Sections 3 to 5 with the implementation of the different analysis methods, we provide the key results/findings in Table 1.

Table 1. Summary of the performance of GC, TE, and information rate causality for the different models discussed in Sections 3 to 5. In general, GC performs well for linear and stationary signals; TE performs well only if the lags are chosen correctly; information rate causality performs well and can recover the underlying lags of the signals.

GC — Unidirectional (Section 3): linear, performs well; nonlinear, fails in non-stationary signals. Interchange (Section 4): linear and nonlinear, performs well. Bidirectional (Section 5): linear and nonlinear, fails to capture the bidirectional coupling of the signals.
TE — all models: performs well if the correct lags are chosen, or else produces spurious results.
Information rate causality — all models: performs well and is able to recover the underlying lags of the coupling.

3. Unidirectional Model

In this part, unidirectional causality, where the first process (x1(t)) influences the second process (x2(t)), is considered for both linear and nonlinear autoregressive models, given by Equations (24) and (25), respectively. The causality between the processes occurs as the processes are coupled with each other through the Heaviside step function (H(...)).

Linear model:

x_1(t) = 0.55\, x_1(t-1) + \epsilon_1(t), \qquad x_2(t) = H(\tau - 10)\, x_1(t-2) + \epsilon_2(t).    (24)

Nonlinear model:

x_1(t) = 0.55\, x_1(t-1)\, e^{1-|x_1(t-1)|} + \epsilon_1(t), \qquad x_2(t) = H(\tau - 10)\, x_1(t-2) + \epsilon_2(t).    (25)

Here, t is the time step and τ is the physical time; the two are related through the physical sampling frequency (τ = t/[physical sampling frequency]). H(τ − 10) is a Heaviside step function that allows the coupling to occur starting at the physical time of 10 s. ε_n, n ∈ [1, 2], is the Gaussian noise that perturbs the systems. In this study, the noises are set to have zero mean (⟨ε_n⟩ = 0, n ∈ [1, 2]) along with a specified covariance/variance (Σ_nm = ⟨(ε_n − ⟨ε_n⟩)(ε_m − ⟨ε_m⟩)⟩, n, m ∈ [1, 2]), expressed in the covariance matrix
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.
Due to the exponential term in the nonlinear model, the process will have non-stationary oscillation when the power of the exponential term becomes positive (e^R → diverge). Hence, two different values of the noise covariance will be considered to study the stationary and non-stationary oscillation of the processes. They are labelled as large noise and small noise, with the respective matrices
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} = \begin{pmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0.005 & 0.0 \\ 0.0 & 0.005 \end{pmatrix}.
Note that the cross-covariance of the noises is set to zero (Σ12 = Σ21 = 0.0) to ensure the coupling is purely due to the intrinsic interaction in the model and not through mutual noises. The linear model has stationary oscillation for either large or small noise, and hence only one noise level (large noise) is used to simulate the linear model. The general structure of the models is shown in Figure 2.

To explore all possible combinations of cases, we investigate the coupling and noncoupling cases for both linear and nonlinear models. To this end, we investigate the following six different cases: [linear: couple], [linear: noncouple], [nonlinear: large noise, coupling], [nonlinear: large noise, noncoupling], [nonlinear: small noise, coupling], and [nonlinear: small noise, noncoupling].

Figure 2. Model of the flow of information of the processes x1(t) and x2(t) for Equations (24) and (25). In this paper, the equations are simulated with a physical time of 25 s and a sampling frequency of 200 Hz (5000 data points per trial) with either large noise or small noise. The coupling between the processes occurs at the physical time of 10 s.
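As a concrete illustration of Equations (24) and (25) and of the Heaviside-type switch at 10 s, the following is a hypothetical simulation sketch; the function name, argument defaults, and vectorization over trials are our own choices, with sigma denoting the noise variance (1.0 for large noise, 0.005 for small noise).

```python
import numpy as np

def simulate_unidirectional(n_trials=10_000, fs=200, duration=25.0,
                            nonlinear=False, sigma=1.0, t_couple=10.0, rng=None):
    """Simulate Eq. (24) (linear) or Eq. (25) (nonlinear) for many independent trials.
    Returns x1, x2 of shape (n_trials, n_steps); coupling switches on at t_couple seconds."""
    rng = np.random.default_rng(rng)
    n_steps = int(duration * fs)
    couple_step = int(t_couple * fs)
    x1 = np.zeros((n_trials, n_steps))
    x2 = np.zeros((n_trials, n_steps))
    e1 = rng.normal(0.0, np.sqrt(sigma), (n_trials, n_steps))
    e2 = rng.normal(0.0, np.sqrt(sigma), (n_trials, n_steps))
    for t in range(2, n_steps):
        prev = x1[:, t - 1]
        drive = prev * np.exp(1.0 - np.abs(prev)) if nonlinear else prev
        x1[:, t] = 0.55 * drive + e1[:, t]
        # Heaviside switch H(tau - 10): x1 drives x2 (two steps back) only after 10 s
        x2[:, t] = (t >= couple_step) * x1[:, t - 2] + e2[:, t]
    return x1, x2
```

For example, simulate_unidirectional(nonlinear=True, sigma=0.005) would correspond to the [nonlinear: small noise, couple] case, while setting the switch permanently to zero reproduces the noncoupling cases.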
The result of the simulation for Equations (24) and (25) is shown in Figure 3. Notice that the signals for the nonlinear model exhibit non-stationary oscillation when the perturbing noise is small, as shown in Figure 3e,f. This is due to the divergence of the positive power of the exponential term of Equation (25) (e^{1−|x1(t−1)|} = e^R → diverge when ε1(t) is the small noise) and the influence of the previous state on the current state (x1(t) ← 0.55 x1(t−1)). The rest of the simulated signals have stationary oscillation, as shown in the phase space where the observed states are localized around (x1, x2) = (0, 0).

Figure 3. Result of the simulation based on Equations (24) and (25) (refer to Figure 2 for a graphical explanation of the process); panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple]. The [top left, red] trace is the result for x1 and [bottom, green] is the result for x2; [right, blue] is the phase space of x1 and x2. Note that the coupling of the processes occurs at 10 s.

3.1. Granger Causality

As shown in Figure 4, the coupling of x1 to x2 can be captured well through the non-parametric GC analysis for stationary linear signals (Figure 4a). Concurrently, the detection of the causality works well for stationary nonlinear signals (Figure 4c). This is because the nonlinear exponential term is well approximated by a constant value as the spectral matrix is decomposed into the transfer matrix and the covariance matrix of the noise via Wilson's algorithm. Note that the amplitude of the GC decreases for this nonlinear stationary signal. This is due to the contribution of the exponential term (e^{−R}), which eventually affects the S22(ω) element of the spectral matrix (refer to Equation (11)) and hence the amplitude of the causality (I_{x1→x2}). For non-stationary signals, the frequency domain of GC is still able to show the causality within the signals, but the time-varying frequency of GC cannot represent the period of causation well, as shown in Figure 4e. From the figure, we can see that the non-parametric GC gives a false result, as it suggests that the coupling between the processes occurs throughout the whole time. Similarly, the non-parametric GC analysis works fine in evaluating the non-causality cases for the stationary cases but not so well for the non-stationary case (refer to Figure 4b,d,f). GC analysis fails to work on signals with non-stationary oscillation, as such signals cannot be modelled well via the linear autoregressive equations (refer to Equations (3) and (4)).

Figure 4. Result of the spectral and time-varying frequency of GC; panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple]. Note that for each subfigure, the [left] shows the spectral (frequency domain) of GC, where [blue] indicates that x2 causes x1 and [orange] indicates that x1 causes x2; [top, right] shows the time-varying frequency of GC from x2 to x1; [bottom, right] shows the time-varying frequency of GC from x1 to x2. The results shown are based on the processes in Equations (24) and (25); refer to Figure 2 for the graphical explanation of the process.

3.2. Transfer Entropy

The results from the TE analysis are shown in Figure 5. By first assuming that the signals are stationary, the net TE (TE_{X,Y} = TE_{X→Y} − TE_{Y→X}) is first calculated to find the suitable lag for evaluating the sliding-window TE later.
Figure 5a,c,e show that the net TE between the signals is significant at the lag of 1. This is because the simulated models (Equations (24) and (25)) have the coupling occurring at one lag (compare x1(t) := x1(t−1) and x2(t) := x1(t−2)). Next, the window sliding TE is conducted through the conditional (multidimensional) probability that contains the data from one previous lag (refer to Equation (17)) at each window of the TE evaluation. As shown in Figure 5a,c,e, the window sliding TE is able to show the time (10 s physical time) when the causality occurs. Similarly, TE is also able to capture the non-causality situation well for all the cases, as shown in Figure 5b,d,f. Even though TE is able to detect the causality between the signals for either the linear stationary or the nonlinear non-stationary cases, it requires extra inputs/assumptions (such as bin sizes, the number of lags, and the dimension of the multidimensional probability) to work properly/accurately. The failure of TE in capturing the causality for the linear model (Equation (24)) is shown in Figure 6, where the lag is set to 9 instead of 1 when evaluating the TE.

Figure 5. Result of the net TE and window sliding TE; panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple]. Note that for each subfigure, [left, 4 × 4] depicts the net TE for the whole signals, while the [right] shows the window sliding TE. The results shown are based on the processes in Equations (24) and (25); refer to Figure 2 for the graphical explanation of the process.

Figure 6. Failure of TE in capturing the causality between the signals x1 and x2 for [linear: couple] (refer to Equation (24)) when lag 9 is used for evaluation, as compared to Figure 5a, where the TE accurately shows that the coupling occurs after 10 s.

3.3. Information Rate Causality

Both GC and TE detect causality by examining the increase in the predictability of the stochastic processes (measured by the inverse of entropy) when the past information is included. In comparison, information rate causality evaluates causality by quantifying the rate of change of the probability distribution of one variable conditioned on another, as noted previously. The results are shown in Figures 7 and 8, where the changes of the distribution due to the causality between the processes are well reflected via the changes in the information rate (Γ). For the noncoupling cases, the information rate does not change and remains constant, which suggests that neither of the signals causes changes to the joint distribution (refer to Figure 7b,d,f); hence, no causality occurs between the signals. Similarly, prior to the coupling of the signals, the causality is not seen, and hence the information rate again remains constant. Notice that the information rate does not stay at zero when no causality happens; this is due to the noises/spikes of the estimated distribution, as shown in Figure S1. In the presence of the couplings, the changes in the distribution due to the causality among the signals can be observed via the changes in the information rate. For instance, the changes in the information rate due to the causality of x1(t) to x2(t) can be observed in Figure 7a,c,e (zoom-in: Figure 8a,c,e).
Referring to the models, the signals are coupled with a lag difference of one (x1(t) = x1(t−1) and x2(t) = x1(t−2)), and hence the peak of the information rate is observed after one lag. From Figure 8a,c,e, changes in the information rate prior to the coupling (at the physical time of 10 s) are observed because the evaluation window of [9.75 s to 10.25 s] picks up the causality of the signals from [10 s to 10.25 s]. Referring to the starting time (near 0 s) of Figure 7e,f, there is a sharp difference in the information rate; this is due to the non-stationary oscillation of x1(t), which causes a high disturbance of the distribution.

Figure 7. Result of information rate causality for Equations (24) and (25); panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple].

Figure 8. Zoom-in of Figure 7 for the information rate causality between 9.6 s and 10.4 s. Note that the change in the causality occurs at 10 s.

To demonstrate the capability of information rate causality to detect lag differences within a signal, we simulated and analyzed Equation (26), which has the coupling starting at 10 s with a lag difference of 9 (compare x1(t) = x1(t−1) and x2(t) = x1(t−10)):

x_1(t) = 0.55\, x_1(t-1) + \epsilon_1(t), \qquad x_2(t) = H(\tau - 10)\, x_1(t-10) + \epsilon_2(t).    (26)

Figure 9 shows the section of the information rate from 9.6 s to 10.4 s. The figure reveals that the peak of the information rate appears at the 9th time step (0.045 s) in every window of evaluation, which accurately reflects the lag difference specified in Equation (26). This result suggests that information rate causality can effectively capture the underlying causality of the signals at each evaluation window.

Figure 9. The information rate causality of Equation (26) between 9.6 s and 10.4 s, where the causality occurs at 10 s.

4. Interchange of Unidirectional Model

In this part, models similar to those in Section 3 are used, with slight modifications, to study the interchange of the unidirectional causality. The models are modified to Equations (27) and (28) for the linear and nonlinear models, respectively. The modification with the Heaviside step functions (H(...)) means that the signal x2(t) causes x1(t) prior to the physical time (τ) of 10 s, while the signal x1(t) causes x2(t) after 10 s. The general structure of Equations (27) and (28) is illustrated in Figure 10.

Linear model:

x_1(t) = H(10 - \tau)\, 0.55\, x_2(t-1) + \epsilon_1(t), \qquad x_2(t) = H(\tau - 10)\, x_1(t-2) + \epsilon_2(t).    (27)

Nonlinear model:

x_1(t) = H(10 - \tau)\, 0.55\, x_2(t-1)\, e^{1-|x_2(t-1)|} + \epsilon_1(t), \qquad x_2(t) = H(\tau - 10)\, x_1(t-2) + \epsilon_2(t).    (28)

Here t is the time step and τ is the physical time. The Heaviside step functions are used with H(10 − τ) indicating that the coupling occurs from the beginning of the simulation until the physical time of 10 s; conversely, H(τ − 10) indicates that the coupling occurs after 10 s. Similar to Section 3, six possible cases are simulated based on Equations (27) and (28). The simulated signals are evaluated by GC, TE, and information rate causality.
The result of the simulations is shown in Figure 11, which suggests that all the signals oscillate in a stationary manner. Unlike Equation (25), in which the signal x1(t) is perturbed by its previous state by a factor of 0.55 (0.55 x1(t−1)) and the small noise (ε1(t)), in Equation (28) x1(t) is perturbed by the past state of x2(t) (x2(t−1)) instead. Hence, the power of the exponential will always be negative; consequently, all the simulated signals have stationary oscillations. Note that the noncoupling here refers to the decoupling of x1(t) to x2(t) (H(τ − 10) = 0).

Figure 10. Model of the flow of information of the processes x1(t) and x2(t) for Equations (27) and (28). In this paper, the equations are simulated with a physical time of 25 s and a sampling frequency of 200 Hz (5000 data points per trial) with either large noise or small noise. Note that the coupling of the process x2(t) with x1(t) occurs before 10 s and the interchange of the coupling occurs after 10 s. In the context of noncoupling, it refers to H(τ − 10) = 0.

Figure 11. Result of the simulation based on Equations (27) and (28) (refer to Figure 10 for a graphical explanation of the process); panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple]. The [top left, red] trace is the result for x1 and [bottom, green] is the result for x2; [right, blue] is the phase space of x1 and x2. Note that in the context of noncoupling, it refers to H(τ − 10) = 0.

4.1. Granger Causality

Based on the models, the signal x2(t) first causes x1(t) from the beginning of the process until the physical time of 10 s, and x1(t) causes x2(t) after 10 s. This pattern is captured in the time-varying frequency of GC, as shown in Figure 12, for all the cases. Prior to 10 s, x1(t) is caused by x2(t) by a factor of 0.55, and hence a similar factor of GC is shown in the time-varying frequency domain. Similarly, the coupling between the signals can also be observed in the frequency domain of GC. Notice that the [nonlinear: small noise] cases are able to show the significance of x2(t) causing x1(t) prior to 10 s because, with the small noise and the exponential term, the factor by which x2(t) causes x1(t) is increased to about 1.5 (e^1 × 0.55 ≈ 1.5).

Figure 12. Result of the spectral and time-varying frequency of GC; panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple]. Note that for each subfigure, the [left] shows the spectral (frequency domain) of GC, where [blue] indicates that x2 causes x1 and [orange] indicates that x1 causes x2; [top, right] shows the time-varying frequency of GC from x2 to x1; [bottom, right] shows the time-varying frequency of GC from x1 to x2. The results shown are based on the processes in Equations (27) and (28); refer to Figure 10 for the graphical explanation of the process. Note that the noncoupling refers to H(τ − 10) = 0 for all the cases.

4.2. Transfer Entropy

TE is able to show the correct direction of the causality between the signals if the correct lag is chosen.
Therefore, for models (like Equations (27) and (28)) with the coupling occurring at different time lags, the window sliding TE cannot properly show the direction of the causality. For instance, the window sliding TE shows that x2(t) is causing x1(t) when lag 0 is chosen, while the reverse is shown when lag 1 is used, as shown in Figure 13 (for lag 0) and Figure 14 (for lag 1). This is depicted well in the underlying models (Equations (27) and (28)), in which x1(t) causes x2(t) one lag later. As a result, TE cannot show the whole causality of the signals with just one lag.

Figure 13. Result of the net TE and window sliding TE with the lag of 0; panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple]. Note that for each subfigure, [left, 4 × 4] depicts the net TE for the whole signals, while the [right] shows the window sliding TE. The results shown are based on the processes in Equations (27) and (28); refer to Figure 10 for the graphical explanation of the process. Note that the noncoupling refers to H(τ − 10) = 0 for all the cases.

Figure 14. Result of the net TE and window sliding TE with the lag of 1. Note that for each subfigure, [left, 4 × 4] depicts the net TE for the whole signals, while the [right] shows the window sliding TE. The results shown are based on the processes in Equations (27) and (28); refer to Figure 10 for the graphical explanation of the process. Note that the noncoupling refers to H(τ − 10) = 0 for all the cases.

4.3. Information Rate Causality

Alternatively, the information rate causality is able to show the coupling of the signals along with the lag of the coupling in each window of the evaluation. As discussed, the information rate causality evaluates the causality by observing the changes in the probability distribution of one signal after it is influenced by another signal. As shown in Figures 15 and 16, the information rate causality can capture the interchange of the coupling among the signals of the simulated models (Equations (27) and (28)). As discussed for TE, signal x2(t) causes x1(t) at a lag of 0, and x1(t) causes x2(t) at a lag of 1. These lags can be observed in each window of the information rate evaluation. For instance, as shown in Figure 16, the information rate of 2 to 1 (Γ²_{2→1}) has a spike at the beginning of the window, while the information rate of 1 to 2 (Γ²_{1→2}) has a spike after one time step (one lag). Hence, the information rate causality is capable of showing the underlying causality of the signals.

Figure 15. Result of information rate causality for Equations (27) and (28); panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple]. Note that the interchange of the causality occurs at 10 s.
Figure 16. Zoom-in of Figure 15 for the information rate causality between 9.6 s and 10.4 s. Note that the interchange of the causality occurs at 10 s.

5. Bidirectional Model

In this section, the bidirectional causality is studied by modifying the linear and nonlinear models from Section 3 as Equations (29) and (30), respectively. These modified models have the signal x2(t) continuously influencing x1(t) at all times t ≥ 1. In addition, at the time of 10 s, the Heaviside step function (H(τ − 10)) enables the coupling of x1(t) and x2(t), which allows x1(t) to influence x2(t) while the previous coupling remains intact. The general structure of Equations (29) and (30) is illustrated in Figure 17.

Linear model:

x_1(t) = 0.55\, x_2(t-1) + \epsilon_1(t), \qquad x_2(t) = H(\tau - 10)\, x_1(t-2) + \epsilon_2(t).    (29)

Nonlinear model:

x_1(t) = 0.55\, x_2(t-1)\, e^{1-|x_2(t-1)|} + \epsilon_1(t), \qquad x_2(t) = H(\tau - 10)\, x_1(t-2) + \epsilon_2(t).    (30)

Here, t is the time step and τ is the physical time; H(τ − 10) is the Heaviside step function that allows the coupling of the signals starting at the physical time of 10 s. Similar to the previous models, six cases are simulated and the signals are evaluated through GC, TE, and information rate causality. Recall that the noncoupling cases here refer to the decoupling of x1(t) to x2(t) (H(τ − 10) = 0).

Figure 17. Model of the flow of information of the processes x1(t) and x2(t) for Equations (29) and (30). In this paper, the equations are simulated with a physical time of 25 s and a sampling frequency of 200 Hz (5000 data points per trial) with either large noise or small noise. Note that x2(t) is coupled with x1(t) throughout the process and the bidirectional causality occurs after 10 s. In the context of noncoupling, it refers to H(τ − 10) = 0.

The simulation result for all the aforementioned cases is shown in Figure 18. From the observation of the individual signals and the phase space, the scenario of [nonlinear: small noise, couple] has non-stationary oscillation after the coupling, as shown in Figure 18e. This is due to the divergence caused by the nonlinear exponential term embedded within the simulated equation (e^{1−|x2(t−1)|} = e^R) along with the factor of 0.55 of the previous state of x2(t) (0.55 x2(t−1)). On the other hand, [nonlinear: small noise, noncouple] has stationary oscillation, as shown in Figure 18f; this is because x1(t) only receives the input from x2(t), which keeps the power of the exponential term negative and hence results in stationary oscillation.

Figure 18. Result of the simulation based on Equations (29) and (30) (refer to Figure 17 for a graphical explanation of the process); panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple]. The [top left, red] trace is the result for x1; [bottom, green] shows the result for x2; [right, blue] shows the phase space of x1 and x2. Note that the noncoupling in this context refers to H(τ − 10) = 0.

5.1. Granger Causality

From the GC analysis, the coupling from x2(t) to x1(t) cannot be captured well except for the [nonlinear: small noise] cases.
This result is due to a similar reason as discussed in Section 4, i.e., in those cases the coupling between the signals is affected by a factor of about 1.5 (e^1 × 0.55 ≈ 1.5). Therefore, the rest of the cases cannot show a significant contrast of the causality from x2(t) to x1(t), as shown in Figure 19. Furthermore, referring to the time-varying GC, the bidirectional coupling/causality between x1(t) and x2(t) that occurs after 10 s is not captured in all the coupling cases.

Figure 19. Result of the spectral and time-varying frequency of GC; panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple]. Within each subfigure, the [left] shows the spectral (frequency domain) of GC, with [blue] showing that x2 causes x1 (I_{x2→x1}) and [orange] indicating that x1 causes x2 (I_{x1→x2}). The [top, right] panel shows the time-varying frequency of GC of x2 causing x1 and the [bottom, right] shows that of x1 causing x2. The results shown are based on the expressions in Equations (29) and (30); refer to Figure 17 for the graphical explanation of the process. Note that the noncoupling refers to H(τ − 10) = 0 for all the cases.

5.2. Transfer Entropy

The results of TE, shown in Figures 20 and 21, are once again quite similar to those shown in Section 4. The window sliding TE, evaluated with lag 0, shows that signal x2(t) causes x1(t) throughout the time, as shown in Figure 20. For the coupling cases, the amplitude of the TE changes when x1(t) feeds back to x2(t) (after the time of 10 s). Similar to Section 4, the window sliding TE only shows that signal x1(t) is causing x2(t) when evaluated at lag 1 for the coupling cases, as shown in Figure 21. Hence, TE cannot fully capture the causality between the signals with just one lag.

Figure 20. Result of the net TE and window sliding TE with the lag at 0; panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple]. Within each subfigure, the [left, 4 × 4] panels show the net TE for the whole signals, while the [right] shows the window sliding TE. The results shown are based on the expressions in Equations (29) and (30); refer to Figure 17 for the graphical explanation of the process. Note that the noncoupling refers to H(τ − 10) = 0 for all the cases.

Figure 21. Result of the net TE and window sliding TE with the lag at 1. Within each subfigure, the [left, 4 × 4] panels show the net TE for the whole signals, while the [right] shows the window sliding TE. The results shown are based on the expressions in Equations (29) and (30); refer to Figure 17 for the graphical explanation of the process. Note that the noncoupling refers to H(τ − 10) = 0 for all the cases.

5.3. Information Rate Causality

The information rate causality analysis evaluates the causality through the change in the probability distribution of an original signal after including another signal that is assumed to cause the original signal.
The information rate causality is able to show the coupling among the signals well, as shown in Figures 22 and 23. Similar to Section 4, prior to the time of 10 s, the causality from signal x2(t) to x1(t) can be depicted with the presence of the peak of Γ²_{2→1}. The peak appearing at the beginning of the evaluation window suggests that the coupling occurs at lag 0. After 10 s, the mutual feedback between the signals is shown within the evaluation window of the information rate causality. This mutual causality between the signals is observed through the alternating oscillations of the peak occurrence in the information rate within the window. Hence, this reflects well the underlying model of the signals. Note that this is different from Section 4, which is an interchange of unidirectional causality that has the feedback solely from one signal to the other without retaining information from itself.

Figure 22. Result of information rate causality for Equations (29) and (30); panels (a–f) correspond to [linear: couple], [linear: noncouple], [nonlinear: large noise, couple], [nonlinear: large noise, noncouple], [nonlinear: small noise, couple], and [nonlinear: small noise, noncouple]. Note that the bidirectional causation begins at 10 s. Refer to Figure 17 for the graphical explanation of the process; the noncoupling refers to H(τ − 10) = 0 for all the cases.

Figure 23. Zoom-in of Figure 22 for the information rate causality between 9.6 s and 10.4 s. Note that the bidirectional causation begins at 10 s. Refer to Figure 17 for the graphical explanation of the process; the noncoupling refers to H(τ − 10) = 0 for all the cases.

6. Conclusions

In this paper, we proposed an alternative causal analysis method to the popular GC and TE for causality quantification. Specifically, based on the information rate, which measures the rate of temporal change in the time-dependent distribution, we developed a model-free, information-geometric measure of causality—information rate causality—that is suitable for analyzing numerically generated non-stationary, nonlinear data. To compare GC, TE, and information rate causality, we applied the methods to numerical data generated by simulating different types of discrete autoregressive models which contain linear and nonlinear interactions. In Section 3, we showed that information rate causality performs equally well compared to GC on the standard linear stationary signals. Later, we showed that GC did not perform well for non-stationary and/or nonlinear signals. Furthermore, it failed to capture the mutual feedback between the signals, as shown specifically in Section 5 in all the coupling cases. TE performed slightly better than GC, since it is model-independent. However, the information on the lag is needed to properly evaluate the TE, as shown in Sections 4 and 5.
In comparison with GC and TE, information rate causality has been shown to be able to uncover the underlying mechanism of causality in the signals, such as the interchanging oscillatory feedback between the signals discussed in Section 5, which was not captured by either GC or TE. While our results in this paper were obtained based on the interaction of two time series of different types of (non)linearity and (non)stationarity, they have at least pointed out some limitations of GC and TE that can be resolved by employing information rate causality. It remains future work to extend this study and to investigate a larger class of time series data, including real data such as EEG signals.

Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/e25050806/s1, Figure S1: Estimation of the joint probability distribution via histogram.

Author Contributions: Conceptualization, E.-j.K.; methodology, E.-j.K., F.H. and H.J.C.; software, H.J.C.; writing—original draft, H.J.C.; writing—review and editing, E.-j.K.; supervision, E.-j.K. and F.H. All authors have read and agreed to the published version of the manuscript.

Funding: E.-j.K. and F.H. are partly supported by EPSRC Grant EP/W036770/1.

Institutional Review Board Statement: Not applicable.

Data Availability Statement: The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. Granger Causality

Appendix A.1. Autoregressive Process

Given a stochastic process (x(t)), it can be modelled through an autoregressive (AR) model with an available/appropriate number n of observations from the process (e.g., AR(n), known as an autoregressive model of n-th order). The general expression of the AR(n) model is given by Equation (A1):

x(t) = \sum_{i=1}^{n} \alpha_i x(t-i) + \epsilon(t),    (A1)

such that t is the time step and α_i is the constant coefficient that gives the weight of the contribution from the i-th past observation towards the process x(t). ε(t) is the additive noise with variance Σ, which indicates the proportion of the external influence on the process. Hence, a process that is strongly related to its past observations will have small noise (e.g., small Σ).

Appendix A.2. Granger Causality: Time Domain

Given two stochastic processes (x1(t) and x2(t)), they can be modelled independently, as shown in Equation (A1), and expressed as Equations (A2) and (A3):

x_1(t) = \sum_{i=1}^{n} a_i x_1(t-i) + \epsilon_1(t),    (A2)

x_2(t) = \sum_{i=1}^{n} c_i x_2(t-i) + \epsilon_2(t),    (A3)

such that ε1(t) and ε2(t) are the noises with variances Σ11 and Σ22, respectively; a_i and c_i are the constant parameters that give the fractions of the contribution from the past observations towards x1(t) and x2(t), respectively. If there is a causal relationship between the two processes, they can be modelled with the past values from each other, as shown in Equations (A4) and (A5):

x_1(t) = \sum_{i=1}^{n} a_i x_1(t-i) + \sum_{i=1}^{n} b_i x_2(t-i) + \epsilon_1^*(t),    (A4)

x_2(t) = \sum_{i=1}^{n} c_i x_1(t-i) + \sum_{i=1}^{n} d_i x_2(t-i) + \epsilon_2^*(t).    (A5)

Here a_i, b_i, c_i, and d_i are the constant parameters. The noises (ε*1(t) and ε*2(t)) have their variances presented as the covariance matrix Σ shown in Equation (A6):

\Sigma = \begin{pmatrix} \Sigma_{11}^* & \Sigma_{12}^* \\ \Sigma_{21}^* & \Sigma_{22}^* \end{pmatrix}.    (A6)
Appendix A.2. Granger Causality: Time Domain

Given two stochastic processes x1(t) and x2(t), each can be modelled independently, as in Equation (A1), and expressed as Equations (A2) and (A3):

x_1(t) = \sum_{i=1}^{n} a_i x_1(t-i) + \epsilon_1(t),    (A2)
x_2(t) = \sum_{i=1}^{n} c_i x_2(t-i) + \epsilon_2(t),    (A3)

where ε1(t) and ε2(t) are the noises with variances Σ11 and Σ22, respectively; a_i and c_i are the constant parameters that quantify the contributions of the past observations to x1(t) and x2(t), respectively. If a causal relationship between the two processes is present, each process can be modelled with the past values of the other, as shown in Equations (A4) and (A5):

x_1(t) = \sum_{i=1}^{n} a_i x_1(t-i) + \sum_{i=1}^{n} b_i x_2(t-i) + \epsilon_1^{*}(t),    (A4)
x_2(t) = \sum_{i=1}^{n} c_i x_1(t-i) + \sum_{i=1}^{n} d_i x_2(t-i) + \epsilon_2^{*}(t).    (A5)

Here a_i, b_i, c_i, and d_i are constant parameters. The noises ε*1(t) and ε*2(t) have their variances collected in the covariance matrix Σ shown in Equation (A6):

\Sigma = \begin{pmatrix} \Sigma_{11}^{*} & \Sigma_{12}^{*} \\ \Sigma_{21}^{*} & \Sigma_{22}^{*} \end{pmatrix}.    (A6)

Similarly to Equations (A2) and (A3), Σ*11 and Σ*22 are the variances of ε*1(t) and ε*2(t), respectively, while Σ*12 = Σ*21 is the covariance of the noises ε*1(t) and ε*2(t).

The causal relationship between x1(t) and x2(t) can be determined by evaluating the noises from Equations (A2) to (A5). For instance, Σ*11 < Σ11 indicates that the predictability of x1(t) increases when the past information of x2(t) is included. This suggests a causal relationship from x2(t) to x1(t), which is known as GC (x2(t) Granger-causes x1(t), F_{x2→x1}). Similarly, Σ*22 < Σ22 suggests that x1(t) Granger-causes x2(t), F_{x1→x2}. A nonzero covariance of the noises, Σ*12 = Σ*21 ≠ 0, suggests that the processes are correlated with one another via a common external force that drives both of them; consequently, it indicates an instantaneous causality (F_{x1↔x2}).

The idea behind GC is to evaluate the total interdependence (also known as the total causality index) between x1(t) and x2(t), which consists of two directional causalities (F_{x1→x2} and F_{x2→x1}) and one instantaneous causality (F_{x1↔x2}). The total interdependence between x1(t) and x2(t) is defined as Equation (A7):

F_{x_1 \cdot x_2} = \log \frac{\Sigma_{11}\Sigma_{22}}{|\Sigma|} = \log \frac{\Sigma_{11}\Sigma_{22}}{\Sigma_{11}^{*}\Sigma_{22}^{*} - (\Sigma_{12}^{*})^{2}} = F_{x_1 \to x_2} + F_{x_2 \to x_1} + F_{x_1 \leftrightarrow x_2},    (A7)

where |Σ| is the determinant of the covariance matrix Σ. If either one of the signals Granger-causes or is correlated with the other, then F_{x1·x2} > 0, because Σ*11 < Σ11, Σ*22 < Σ22, or Σ*12 ≠ 0. On the contrary, F_{x1·x2} = 0 indicates that there is neither a GC relationship nor a correlation between the signals. The directional causalities and the instantaneous causality are defined as Equations (A8) to (A10):

F_{x_1 \to x_2} = \log \frac{\Sigma_{22}}{\Sigma_{22}^{*}},    (A8)
F_{x_2 \to x_1} = \log \frac{\Sigma_{11}}{\Sigma_{11}^{*}},    (A9)
F_{x_1 \leftrightarrow x_2} = \log \frac{\Sigma_{11}^{*}\Sigma_{22}^{*}}{|\Sigma|}.    (A10)

Similarly to Equation (A7), the individual causalities in Equations (A8) to (A10) have the range [0, ∞), with F(...) = 0 indicating the absence of a causal relationship. A minimal numerical sketch of the time-domain indices is given below.
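In this sketch, the restricted models (A2) and (A3) and the full models (A4) and (A5) are fitted by ordinary least squares, and the directional indices of Equations (A8) and (A9) are computed from the residual variances. The helper names (ols_residual_var, granger_time_domain) are ours, and this is not the implementation used in the paper.

```python
import numpy as np

def ols_residual_var(y, X):
    """Residual variance of an ordinary least-squares fit y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return np.var(resid)

def granger_time_domain(x1, x2, order):
    """Time-domain GC indices of Eqs. (A8)-(A9) via OLS residual variances."""
    n = len(x1)
    # Lagged design matrices: columns are x(t-1), ..., x(t-order).
    lags1 = np.column_stack([x1[order - i:n - i] for i in range(1, order + 1)])
    lags2 = np.column_stack([x2[order - i:n - i] for i in range(1, order + 1)])
    y1, y2 = x1[order:], x2[order:]
    # Restricted models (own past only, Eqs. A2-A3) vs. full models (both pasts, Eqs. A4-A5).
    s11  = ols_residual_var(y1, lags1)
    s22  = ols_residual_var(y2, lags2)
    s11f = ols_residual_var(y1, np.hstack([lags1, lags2]))
    s22f = ols_residual_var(y2, np.hstack([lags1, lags2]))
    F_12 = np.log(s22 / s22f)   # x1 -> x2, Eq. (A8)
    F_21 = np.log(s11 / s11f)   # x2 -> x1, Eq. (A9)
    return F_12, F_21
```

For instance, for the coupled linear model of Equation (A65), one would expect granger_time_domain(x1, x2, order=2) to return a clearly positive F_{x1→x2} and an F_{x2→x1} close to zero.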
Appendix A.3. Granger Causality: Frequency Domain

The spectral version of GC is formulated in the frequency domain of the AR model. Unlike the time-domain case, the frequency-domain evaluation of GC requires both the constant parameters and the variances of the noises. To set up the formulation, the lag operator L^k is used, defined as Equation (A11):

L^{k} x(t) = x(t-k).    (A11)

Hence, by applying the lag operator of Equation (A11) to Equations (A4) and (A5), they can be rewritten as

x_1(t) = \sum_{i=1}^{n} a_i L^{i} x_1(t) + \sum_{i=1}^{n} b_i L^{i} x_2(t) + \epsilon_1^{*}(t), \qquad x_2(t) = \sum_{i=1}^{n} c_i L^{i} x_1(t) + \sum_{i=1}^{n} d_i L^{i} x_2(t) + \epsilon_2^{*}(t),    (A12)

\Big(1 - \sum_{i=1}^{n} a_i L^{i}\Big) x_1(t) + \Big(-\sum_{i=1}^{n} b_i L^{i}\Big) x_2(t) = \epsilon_1^{*}(t), \qquad \Big(-\sum_{i=1}^{n} c_i L^{i}\Big) x_1(t) + \Big(1 - \sum_{i=1}^{n} d_i L^{i}\Big) x_2(t) = \epsilon_2^{*}(t),    (A13)

\begin{pmatrix} a(L) & b(L) \\ c(L) & d(L) \end{pmatrix} \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = \begin{pmatrix} \epsilon_1^{*}(t) \\ \epsilon_2^{*}(t) \end{pmatrix}.    (A14)

Applying the Fourier transform then gives

\begin{pmatrix} a(\omega) & b(\omega) \\ c(\omega) & d(\omega) \end{pmatrix} \begin{pmatrix} x_1(\omega) \\ x_2(\omega) \end{pmatrix} = \begin{pmatrix} \epsilon_1^{*}(\omega) \\ \epsilon_2^{*}(\omega) \end{pmatrix} \;\equiv\; A(\omega)\, x(\omega) = \epsilon^{*}(\omega),    (A15)

\begin{pmatrix} x_1(\omega) \\ x_2(\omega) \end{pmatrix} = \begin{pmatrix} a(\omega) & b(\omega) \\ c(\omega) & d(\omega) \end{pmatrix}^{-1} \begin{pmatrix} \epsilon_1^{*}(\omega) \\ \epsilon_2^{*}(\omega) \end{pmatrix},    (A16)

\begin{pmatrix} x_1(\omega) \\ x_2(\omega) \end{pmatrix} = \begin{pmatrix} H_{11}(\omega) & H_{12}(\omega) \\ H_{21}(\omega) & H_{22}(\omega) \end{pmatrix} \begin{pmatrix} \epsilon_1^{*}(\omega) \\ \epsilon_2^{*}(\omega) \end{pmatrix} \;\equiv\; x(\omega) = H(\omega)\,\epsilon^{*}(\omega),    (A17)

where A(ω) is known as the coefficient matrix. The elements of the coefficient matrix,

a(\omega) = 1 - \sum_{i=1}^{n} a_i e^{-j\omega i},    (A18)
b(\omega) = -\sum_{i=1}^{n} b_i e^{-j\omega i},    (A19)
c(\omega) = -\sum_{i=1}^{n} c_i e^{-j\omega i},    (A20)
d(\omega) = 1 - \sum_{i=1}^{n} d_i e^{-j\omega i},    (A21)

are obtained via the Fourier transform from the time domain to the frequency domain. Note that j is the imaginary unit, while ω is the angular frequency, defined as ω = 2πf with f the frequency. The matrix H(ω) is known as the transfer matrix. The spectral matrix S(ω) can be obtained by a proper ensemble averaging of Equation (A17), as expressed in Equation (A22):

S(\omega) = \begin{pmatrix} S_{11}(\omega) & S_{12}(\omega) \\ S_{21}(\omega) & S_{22}(\omega) \end{pmatrix} = \big\langle x(\omega)\, x^{\dagger}(\omega) \big\rangle = H(\omega)\,\Sigma\, H^{\dagger}(\omega),    (A22)

where ⟨·⟩ denotes an ensemble average and † denotes the complex conjugate transpose of a matrix. The elements S_{nm} with n = m are known as auto-spectra, and the elements S_{nm} with n ≠ m are known as cross-spectra. Similarly to Equation (A7), the total interdependence between x1(ω) and x2(ω) is defined as Equation (A23), which consists of two directional causalities (I_{x1→x2} and I_{x2→x1}) and one instantaneous causality (I_{x1↔x2}):

I_{x_1 \cdot x_2} = \log \frac{S_{11}(\omega)\, S_{22}(\omega)}{|S(\omega)|} = I_{x_1 \to x_2} + I_{x_2 \to x_1} + I_{x_1 \leftrightarrow x_2},    (A23)

where |S(ω)| is the determinant of the spectral matrix S(ω). The individual causalities are defined as

I_{x_1 \to x_2} = \log \frac{S_{22}(\omega)}{\hat{H}_{22}(\omega)\,\Sigma_{22}\,\hat{H}_{22}^{*}(\omega)},    (A24)
I_{x_2 \to x_1} = \log \frac{S_{11}(\omega)}{\tilde{H}_{11}(\omega)\,\Sigma_{11}\,\tilde{H}_{11}^{*}(\omega)},    (A25)
I_{x_1 \leftrightarrow x_2} = \log \frac{\big(\tilde{H}_{11}(\omega)\,\Sigma_{11}\,\tilde{H}_{11}^{*}(\omega)\big)\big(\hat{H}_{22}(\omega)\,\Sigma_{22}\,\hat{H}_{22}^{*}(\omega)\big)}{|S(\omega)|}.    (A26)

Here \hat{H}_{22}(\omega) = H_{22}(\omega) + \frac{\Sigma_{12}}{\Sigma_{22}} H_{21}(\omega) and \tilde{H}_{11}(\omega) = H_{11}(\omega) + \frac{\Sigma_{12}}{\Sigma_{11}} H_{12}(\omega); the superscript * denotes the complex conjugate of the element [17]. As shown in Equations (A23) to (A26), the elements of the transfer matrix and of the noise covariance matrix are needed to evaluate GC in the frequency domain. These elements can be obtained from the constant parameters of Equations (A4) and (A5), which can in turn be estimated via the Yule–Walker equations [18,40]. This approach of evaluating GC through knowledge of the constant parameters is known as the parametric method; its disadvantage is that information on the lags (the number of past values) is needed when estimating the constant parameters. Alternatively, the non-parametric method conducts the GC analysis by directly decomposing the spectral matrix S(ω) into the transfer matrix H(ω) and the noise covariance matrix Σ. The decomposition is conducted via a recursive algorithm known as the Wilson algorithm [18,35], whose derivation is discussed in the next section. Once H(ω) and Σ are available, the spectral causalities above can be evaluated directly, as in the sketch below.
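As an illustration of Equations (A22) and (A25), the following sketch computes the spectral matrix and the directional causality I_{x2→x1}(ω) from a given transfer matrix H(ω) and noise covariance Σ (obtained, for example, from a parametric fit or from the Wilson factorization described next). The function name and the array layout (n_freq × 2 × 2) are our assumptions.

```python
import numpy as np

def spectral_gc_x2_to_x1(H, Sigma):
    """Spectral GC I_{x2->x1}(omega) of Eq. (A25), given the transfer matrix
    H(omega) with shape (n_freq, 2, 2) and the noise covariance Sigma (2 x 2)."""
    # Spectral matrix S(omega) = H Sigma H^dagger, Eq. (A22).
    S = H @ Sigma @ np.conj(np.swapaxes(H, -1, -2))
    # Partialized transfer element: H~_11 = H_11 + (Sigma_12 / Sigma_11) H_12.
    H11t = H[:, 0, 0] + (Sigma[0, 1] / Sigma[0, 0]) * H[:, 0, 1]
    intrinsic = np.real(H11t * Sigma[0, 0] * np.conj(H11t))
    return np.log(np.real(S[:, 0, 0]) / intrinsic)
```

I_{x1→x2}(ω) of Equation (A24) follows in the same way by exchanging the roles of the two channels.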
Appendix A.4. Wilson Algorithm

The spectral matrix S(ω), defined on the range [−π, π], is known as the spectral density function. It can be expressed as Equation (A27):

S(\omega) = \sum_{i=-\infty}^{\infty} C_i e^{j\omega i},    (A27)
C_i = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega)\, e^{-j\omega i}\, d\omega,    (A28)

where C_i is the correlation coefficient obtained via the inverse Fourier transform of Equation (A28). Following Wilson's factorization theorem (Theorem 1.1 in [35]), the spectral density function is the product of a generating function ψ(e^{jω}) with its complex conjugate transpose, defined on the unit circle |z| = 1:

S(\omega) = \psi(e^{j\omega})\, \psi(e^{j\omega})^{\dagger},    (A29)
\psi(e^{j\omega}) = \sum_{i=0}^{\infty} A_i e^{j\omega i},    (A30)
A_i = \frac{1}{2\pi} \int_{-\pi}^{\pi} \psi(e^{j\omega})\, e^{-j\omega i}\, d\omega.    (A31)

Here A_i is the moving average coefficient, obtained by the inverse Fourier transform expressed in Equation (A31). With Equation (A30) extended holomorphically to the interior of the unit circle (|z| < 1), it can be rewritten as Equation (A32):

\psi(e^{j\omega}) = \psi(z) = \sum_{i=0}^{\infty} A_i z^{i} = A_0 + \sum_{i=1}^{\infty} A_i z^{i},    (A32)

\therefore\ S(z) = \psi(z)\psi(z)^{\dagger} = \Big(\sum_{i=0}^{\infty} A_i z^{i}\Big)\Big(\sum_{i'=0}^{\infty} A_{i'}^{\top} z^{-i'}\Big) = \sum_{i=0}^{\infty}\sum_{i'=0}^{\infty} A_i A_{i'}^{\top} z^{\,i-i'}.    (A33)

Comparing Equations (A22) and (A33) at z = e^{jω} = 0 gives

S(0) = \psi(0)\psi(0)^{\dagger} = H(0)\,\Sigma\, H^{\dagger}(0) = I\,\Sigma\, I^{\top} = \Sigma = A_0 A_0^{\top},    (A34)

where I is the identity matrix and [\cdot]^{\top} indicates the transpose of a matrix. Since A_0^{-1} A_0 A_0^{\top} A_0^{-\top} = I, the transfer matrix H(ω) can be obtained via

S(z) = \psi(z)\, I\, \psi(z)^{\dagger} = \psi(z) A_0^{-1} A_0 A_0^{\top} A_0^{-\top} \psi(z)^{\dagger} = \psi(z) A_0^{-1}\,\Sigma\, A_0^{-\top} \psi(z)^{\dagger},    (A35)

\therefore\ H(\omega) = \psi(e^{j\omega}) A_0^{-1}, \qquad H(\omega)^{\dagger} = A_0^{-\top} \psi(e^{j\omega})^{\dagger}.    (A36)

As shown, the transfer matrix H(ω) and the noise covariance matrix Σ can be obtained once the generating function ψ(e^{jω}) is known. G.T. Wilson introduced an iterative solution for ψ(e^{jω}) using the plus operator [·]_+ of a given spectral density function f(ω), as expressed in Equations (A37) to (A39):

f(\omega) = \sum_{i=-\infty}^{\infty} \beta_i e^{j\omega i}, \qquad \beta_i = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\omega)\, e^{-j\omega i}\, d\omega,    (A37)

[f(\omega)]_{+} = 0.5\,\beta_0 + \sum_{i=1}^{\infty} \beta_i e^{j\omega i},    (A38)

\psi_{i+1} = \psi_i \Big[ \psi_i^{-1}\, S\, \big(\psi_i^{-1}\big)^{\dagger} + I \Big]_{+}.    (A39)

It was shown in [35] that the iteration of Equation (A39) converges uniquely if the initial generating function ψ_0 is an upper-triangular matrix obtained via Cholesky factorization. A schematic implementation of the iteration is given below.
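The following is a bare-bones numpy sketch of the iteration in Equation (A39), not the implementation used in the paper. It assumes that S(ω) is sampled on a uniform grid of n_freq frequencies covering one full period, implements the plus operator of Equation (A38) with FFTs, and initializes ψ_0 with the upper-triangular Cholesky factor of the frequency-averaged spectrum; practical codes based on [35] add further normalization steps and convergence safeguards.

```python
import numpy as np

def plus_operator(F):
    """[.]_+ of Eq. (A38): keep 0.5*beta_0 and the positive-index Fourier
    coefficients of F(omega), sampled on n_freq points of one period."""
    n_freq = F.shape[0]
    beta = np.fft.fft(F, axis=0) / n_freq      # coefficients of exp(+j*omega*i)
    beta[0] *= 0.5
    beta[n_freq // 2 + 1:] = 0.0               # discard negative-index terms
    return np.fft.ifft(beta, axis=0) * n_freq

def wilson_factorization(S, n_iter=200, tol=1e-10):
    """Sketch of Eq. (A39): factorize S(omega), shape (n_freq, m, m), into
    S = psi psi^dagger and recover H(omega) and Sigma via Eqs. (A34), (A36)."""
    n_freq, m, _ = S.shape
    # Upper-triangular Cholesky factor of the averaged spectrum as psi_0.
    psi0 = np.linalg.cholesky(S.mean(axis=0)).conj().T
    psi = np.tile(psi0, (n_freq, 1, 1)).astype(complex)
    eye = np.eye(m)
    for _ in range(n_iter):
        psi_inv = np.linalg.inv(psi)
        g = psi_inv @ S @ np.conj(np.swapaxes(psi_inv, -1, -2)) + eye
        psi_new = psi @ plus_operator(g)
        if np.max(np.abs(psi_new - psi)) < tol:
            psi = psi_new
            break
        psi = psi_new
    A0 = np.real(psi.mean(axis=0))             # zeroth coefficient of psi, Eq. (A32)
    Sigma = A0 @ A0.T                          # Eq. (A34)
    H = psi @ np.linalg.inv(A0)                # Eq. (A36)
    return H, Sigma
```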
Appendix B. Transfer Entropy

Given a stochastic process X, the average information (uncertainty) needed to describe the process is defined by the Shannon entropy I_X, expressed as Equation (A40):

I_X = -\sum_{x \in X} p(x) \log_2 p(x),    (A40)

where p(x) is the discrete probability distribution at state x. The unit of the Shannon entropy varies with the base of the logarithm (e.g., the unit is bits when the logarithm is base 2) [28,41,42]. For two stochastic processes X and Y, the statistical dependency of the two processes can be evaluated via the Mutual Information (MI), defined as Equation (A41):

MI = \sum_{x \in X,\, y \in Y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\, p(y)}.    (A41)

The definition of MI is derived from the Kullback–Leibler (KL) divergence. The KL divergence measures the difference between two probability distributions p(x) and q(x) non-symmetrically; it is defined as Equation (A42) [37]:

D_{KL}(P \| Q) = \sum_{x \in X} p(x) \log_2 \frac{p(x)}{q(x)}.    (A42)

Hence, MI measures the difference between the product of the marginal probability distributions, p(x)p(y), and the joint probability distribution p(x, y), thereby quantifying the common information shared between the processes. The MI is therefore zero when the two processes are statistically independent and no common information is shared, since the joint probability distribution then equals the product of the marginals (p(x, y) = p(x)p(y), so MI = \sum_{x \in X, y \in Y} p(x, y) \log_2 1 = 0) [19,37].

As discussed above, MI only measures the commonly shared information and consequently does not indicate the directional flow of information. In contrast, Transfer Entropy (TE) reveals the directional flow of information by considering the conditional probability p(·|·); it is defined as Equation (A43):

TE_{Y \to X} = \sum_{x \in X,\, y \in Y} p\big(x_{n+1}, x_n^{(k)}, y_n^{(l)}\big) \log_2 \frac{p\big(x_{n+1} \mid x_n^{(k)}, y_n^{(l)}\big)}{p\big(x_{n+1} \mid x_n^{(k)}\big)},    (A43)

where x_{n+1} is the state of X at the (n+1)-th moment, x_n^{(k)} is the state at the n-th moment of X consisting of k previous states of X, and y_n^{(l)} is the state at the n-th moment of Y consisting of l previous states. Note that the numbers of previous states k and l are arbitrary and depend on the interest of the study. For instance, given a collectively observed stochastic process X = (x_1, x_2, ..., x_{10}), x_8^{(2)} = (x_n, x_m) with n, m ∈ [1, 8], e.g., (x_1, x_2) or (x_3, x_8). Hence, TE measures the additional information provided by Y for the realization of X at (n+1), relative to the assumption that X is independent of Y. If Y does not affect the future evolution of X, then p(x_{n+1} | x_n^{(k)}, y_n^{(l)}) = p(x_{n+1} | x_n^{(k)}) and consequently TE_{Y→X} = 0. Using the general definition of the conditional probability distribution, p(x|y) = p(x, y)/p(y), Equation (A43) can be rewritten as Equation (A44) [19]:

TE_{Y \to X} = \sum_{x \in X,\, y \in Y} p\big(x_{n+1}, x_n^{(k)}, y_n^{(l)}\big) \log_2 \frac{p\big(x_{n+1}, x_n^{(k)}, y_n^{(l)}\big)\, p\big(x_n^{(k)}\big)}{p\big(x_{n+1}, x_n^{(k)}\big)\, p\big(x_n^{(k)}, y_n^{(l)}\big)}.    (A44)

A histogram-based sketch of Equation (A44) is given below.
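This sketch evaluates Equation (A44) with k = l = 1 by estimating the required joint distributions with a fixed-bin histogram and summing over the occupied bins. The bin number and the function name are illustrative assumptions on our part; the paper's own probability estimation is described in Figure S1. As discussed in Sections 4 and 5, the embedding lag must match the coupling delay for the TE estimate to be informative, so in practice x_n and y_n should be constructed with the appropriate lag rather than the single step used here.

```python
import numpy as np

def transfer_entropy(x, y, n_bins=16):
    """Histogram estimate of TE_{Y->X}, Eq. (A44), with k = l = 1."""
    x_next, x_now, y_now = x[1:], x[:-1], y[:-1]
    # Joint histogram p(x_{n+1}, x_n, y_n) over the data range.
    p_xxy, _ = np.histogramdd((x_next, x_now, y_now), bins=n_bins)
    p_xxy /= p_xxy.sum()
    p_xx = p_xxy.sum(axis=2)          # p(x_{n+1}, x_n)
    p_xy = p_xxy.sum(axis=0)          # p(x_n, y_n)
    p_x  = p_xxy.sum(axis=(0, 2))     # p(x_n)
    te = 0.0
    for i, j, k in zip(*np.nonzero(p_xxy)):
        num = p_xxy[i, j, k] * p_x[j]
        den = p_xx[i, j] * p_xy[j, k]
        te += p_xxy[i, j, k] * np.log2(num / den)
    return te

# Example direction convention: transfer_entropy(x2, x1) estimates TE_{x1 -> x2}.
```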
Appendix C. Information Rate, Information Length, and Information Rate Causality

In a dimensionless statistical manifold, the distance between two probability distributions is defined by their difference. One such distance, discussed in Appendix B, is the KL divergence (relative entropy) of Equation (A42). As mentioned, the KL divergence measures the difference non-symmetrically and can hardly be considered a metric distance, since it does not satisfy the triangle inequality [32]. A symmetrized version of the KL divergence is the Jensen divergence, expressed as Equation (A45); its square root has been shown to be a metric distance that satisfies the triangle inequality [32,43,44]:

D_{JS} = \frac{1}{2}\big[ D_{KL}(P \| Q) + D_{KL}(Q \| P) \big].    (A45)

As seen in Equations (A42) and (A45), these distances are measured between the end points only (e.g., the initial and final probabilities) and are therefore not path-dependent. A path-dependent distance can be evaluated with the information length L, which is computed from the information rate Γ. A detailed discussion and derivation are given in [32]; a brief derivation of Γ and L follows.

Consider a time-dependent probability distribution p(x, t) that evolves over time; for instance, p(x, t) changes to p(x, t + dt) over a small time step dt → 0. The difference generated by this evolution can be expanded with dt as the leading order:

\lim_{dt \to 0} \frac{1}{(dt)^2} D_{KL}\big(p(x, t+dt)\,\|\,p(x, t)\big) = \lim_{dt \to 0} \frac{1}{(dt)^2} D_{KL}\big(p(x, t)\,\|\,p(x, t+dt)\big) = \lim_{dt \to 0} \frac{1}{(dt)^2} D_{JS},    (A46)

\lim_{dt \to 0} \frac{1}{(dt)^2} \int dx\; p(x, t+dt) \ln \frac{p(x, t+dt)}{p(x, t)},    (A47)

= \lim_{dt \to 0} \frac{1}{(dt)^2} \int dx\; \Big[ p(x, t) + (dt)\partial_t p + \tfrac{1}{2}(dt)^2 \partial_{tt} p + O\big((dt)^3\big) \Big] \ln\Big[ 1 + \frac{(dt)\partial_t p}{p(x, t)} + \frac{1}{2}\frac{(dt)^2 \partial_{tt} p}{p(x, t)} + O\big((dt)^3\big) \Big],    (A48)

= \lim_{dt \to 0} \frac{1}{(dt)^2} \int dx\; \Big[ p(x, t) + (dt)\partial_t p \Big] \Big[ \frac{(dt)\partial_t p}{p(x, t)} - \frac{1}{2}\Big(\frac{(dt)\partial_t p}{p(x, t)}\Big)^2 \Big],    (A49)

= \lim_{dt \to 0} \frac{1}{(dt)^2} \int dx\; \Big[ p(x, t)\frac{(dt)\partial_t p}{p(x, t)} - \frac{p(x, t)}{2}\Big(\frac{(dt)\partial_t p}{p(x, t)}\Big)^2 + (dt)\partial_t p\,\frac{(dt)\partial_t p}{p(x, t)} - \frac{(dt)\partial_t p}{2}\Big(\frac{(dt)\partial_t p}{p(x, t)}\Big)^2 \Big],    (A50)

= \lim_{dt \to 0} \frac{1}{(dt)^2} \int dx\; \Big[ (dt)\partial_t p - \frac{1}{2}\frac{(dt)^2(\partial_t p)^2}{p(x, t)} + \frac{(dt)^2(\partial_t p)^2}{p(x, t)} - \frac{1}{2}\frac{(dt)^3(\partial_t p)^3}{(p(x, t))^2} \Big],    (A51)

= \lim_{dt \to 0} \frac{1}{(dt)^2} \int dx\; \Big[ (dt)\partial_t p + \frac{1}{2}\frac{(dt)^2(\partial_t p)^2}{p(x, t)} - \frac{1}{2}\frac{(dt)^3(\partial_t p)^3}{(p(x, t))^2} \Big],    (A52)

= \lim_{dt \to 0} \Big[ \frac{1}{dt}\int dx\; \partial_t p + \int dx\; \frac{1}{2}\frac{(\partial_t p)^2}{p(x, t)} - \int dx\; \frac{1}{2}\frac{(dt)(\partial_t p)^3}{(p(x, t))^2} \Big] = \int dx\; \frac{1}{2}\frac{(\partial_t p)^2}{p(x, t)},    (A53)

\therefore\ \lim_{dt \to 0} \frac{1}{(dt)^2} D_{JS} = \frac{1}{2}\,\Gamma^2(t),    (A54)

\because\ \int dx\; \partial_t p(x, t) = \int dx\; \partial_{tt} p(x, t) = 0, \quad \because\ \int dx\; p(x, t) = 1,    (A55)

\Gamma^2(t) \equiv \int dx\; p(x, t)\big[\partial_t \ln p(x, t)\big]^2 = \int dx\; \frac{(\partial_t p(x, t))^2}{p(x, t)} = 4\int dx\; \Big(\partial_t \sqrt{p(x, t)}\Big)^2.    (A56)

Note that the Taylor series is used in Equation (A48) and the Mercator series in Equation (A49). The final result Γ(t) is defined as the information rate, which measures the rate of statistical change (e.g., due to new input of information) from one state to another. Hence, the total statistical change in p(x, t), or the total path-dependent distance traversed by p(x, t) as it evolves in time, is defined as the information length L, expressed in Equation (A57):

L = \int dt\; \Gamma(t).    (A57)

To accommodate the computation, the continuous differentiations (df(t)/dt, ∂f(x, t)/∂t) and integrations (∫ dt f(t), ∫ dt f(x, t)) are discretized as shown in Equations (A58) to (A61):

\frac{df(t)}{dt} = \frac{f(t + \Delta t) - f(t)}{\Delta t},    (A58)
\frac{\partial f(x, t)}{\partial t} = \frac{f(x, t + \Delta t) - f(x, t)}{\Delta t},    (A59)
\int dt\; f(t) = \sum_{t_i = 0}^{N} \Delta t\, f(t_i),    (A60)
\int dt\; f(x, t) = \sum_{t_i = 0}^{N} \Delta t\, f(x, t_i),    (A61)

where Δt = t_{i+1} − t_i and N is the number of time steps. Hence, the discretized versions of the information rate Γ, the information length L, and the information rate causality Γ_{x_n→x_m} (n ≠ m, n, m ∈ [1, 2]), shown in Equations (A62) to (A64), are used when computing the results included in this paper:

\Gamma^2 = 4 \sum_{x \in X} \frac{\Delta x}{[\Delta t]^2} \Big( \sqrt{p(x, t + \Delta t)} - \sqrt{p(x, t)} \Big)^2,    (A62)

L = \sum_{t \in T} \Delta t\; \Gamma(t),    (A63)

\Gamma^2_{x_2 \to x_1} = 4 \sum_{x_1 \in X_1,\, x_2 \in X_2} \frac{\Delta x_1 \Delta x_2}{[\Delta t]^2} \Big( \sqrt{p\big(x_1(t + \Delta t), x_2(t_i)\big)} - \sqrt{p\big(x_1(t), x_2(t_i)\big)} \Big)^2.    (A64)

Appendix D. Basic Model: Stationary and Continuous Coupling

In this part, we investigate the coupling between two processes, x1(t) and x2(t), where the first process x1(t) influences the second process x2(t) throughout the entire process. We consider both linear and nonlinear cases, represented by Equations (A65) and (A66), respectively.

Linear model:
x_1(t) = 0.55\, x_1(t-1) + \epsilon_1(t), \qquad x_2(t) = x_1(t-2) + \epsilon_2(t).    (A65)

Nonlinear model:
x_1(t) = 0.55\, x_1(t-1)\, e^{\,1 - |x_1(t-1)|} + \epsilon_1(t), \qquad x_2(t) = x_1(t-2) + \epsilon_2(t).    (A66)

Here, t is the time step and ε_n (n ∈ [1, 2]) is Gaussian white noise. The white noise has zero mean (⟨ε_n⟩ = 0, n ∈ [1, 2]) and unit variance:

\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} = \begin{pmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{pmatrix}, \qquad \Sigma_{nm} = \big\langle (\epsilon_n - \langle\epsilon_n\rangle)(\epsilon_m - \langle\epsilon_m\rangle) \big\rangle, \quad n, m \in [1, 2].

Additionally, we also consider the noncoupling scenario, in which the process x2(t) depends solely on the noise ε2(t) (x2(t) = ε2(t), since the coupling term x1(t − 2) is removed). A sketch that simulates the linear model and evaluates the discretized information rate causality of Equation (A64) on it is given below.
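This sketch illustrates how Equation (A64) can be evaluated numerically: an ensemble of realizations of the linear model (A65) is generated, the joint distribution of x2(t) with x1 fixed at a reference time is estimated by a two-dimensional histogram across the ensemble at each time step, and the squared information rate causality (here Γ²_{x1→x2}, i.e., Equation (A64) with the roles of x1 and x2 exchanged) is obtained from the change of the square-root densities between successive steps. The ensemble size, bin number, reference time t_ref (playing the role of t_i), and Δt = 1 are illustrative choices on our part and do not reproduce the paper's exact estimation settings.

```python
import numpy as np

def simulate_coupled_linear(n_real, n_steps, seed=0):
    """Ensemble of realizations of the linear model, Eq. (A65)."""
    rng = np.random.default_rng(seed)
    x1 = np.zeros((n_real, n_steps))
    x2 = np.zeros((n_real, n_steps))
    for t in range(2, n_steps):
        x1[:, t] = 0.55 * x1[:, t - 1] + rng.standard_normal(n_real)
        x2[:, t] = x1[:, t - 2] + rng.standard_normal(n_real)
    return x1, x2

def gamma2_causality(a, b, t_ref, n_bins=30):
    """Discretized information rate causality, Eq. (A64): rate of change of the
    joint density p(a(t), b(t_ref)) estimated over the ensemble (axis 0), dt = 1."""
    n_real, n_steps = a.shape
    # Common bin edges so that successive histograms share the same support.
    a_edges = np.linspace(a.min(), a.max(), n_bins + 1)
    b_edges = np.linspace(b.min(), b.max(), n_bins + 1)
    da = a_edges[1] - a_edges[0]
    db = b_edges[1] - b_edges[0]
    gamma2 = np.zeros(n_steps - 1)
    p_prev, _, _ = np.histogram2d(a[:, 0], b[:, t_ref], bins=(a_edges, b_edges), density=True)
    for t in range(1, n_steps):
        p_next, _, _ = np.histogram2d(a[:, t], b[:, t_ref], bins=(a_edges, b_edges), density=True)
        diff = np.sqrt(p_next) - np.sqrt(p_prev)
        gamma2[t - 1] = 4.0 * np.sum(diff ** 2) * da * db   # Eq. (A64) with dt = 1
        p_prev = p_next
    return gamma2

# Gamma^2_{x1 -> x2}: change of p(x2(t), x1(t_ref)) caused by x1.
x1, x2 = simulate_coupled_linear(n_real=20000, n_steps=60)
g_12 = gamma2_causality(x2, x1, t_ref=10)
```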
The simulation results for Equations (A65) and (A66) are presented in Figure A1. It can be observed that both the linear and nonlinear processes exhibit stationary oscillations, regardless of whether they are coupled or noncoupled.

Figure A1. Result of the simulation based on Equations (A65) and (A66). Panels: (a) linear: coupling; (b) linear: noncoupling; (c) nonlinear: coupling; (d) nonlinear: noncoupling. In each subfigure, the top left (red) panel shows x1, the bottom left (green) panel shows x2, and the right (blue) panel shows the phase space of x1 and x2. Note that noncoupling here refers to the processes oscillating by themselves without sharing any information with each other.

Appendix D.1. Granger Causality

The results of GC for all the cases are shown in Figure A2. Both the coupling and noncoupling cases are depicted well by the nonparametric GC approach (refer to Section 2.1). The continuous causality from x1 to x2 is captured well in Figure A2a,c. Similarly, the noncoupling between the signals is shown well in Figure A2b,d.

Figure A2. Result of the spectral and time-varying frequency of GC. Panels: (a) linear: coupling; (b) linear: noncoupling; (c) nonlinear: coupling; (d) nonlinear: noncoupling. In each subfigure, the left panel shows the spectral (frequency-domain) GC, with the blue line indicating that x2 causes x1 and the orange line indicating that x1 causes x2; the top right panel illustrates the time-varying frequency of GC from x2 to x1, while the bottom right panel shows that from x1 to x2. The results are based on Equations (A65) and (A66).

Appendix D.2. Transfer Entropy

The results of TE, shown in Figure A3, lead to similar findings as GC for both the coupling and noncoupling of the signals. As depicted in Figure A3a,c, TE can effectively capture the coupling from x1 to x2 when an appropriate lag is selected. The suitable lag can be determined by observing the peak of the net TE (TE_{x1,x2} = TE_{x1→x2} − TE_{x2→x1}), as it indicates the lag at which a significant result can be expected. Similarly, the noncoupling cases are depicted well in Figure A3b,d, where the TE between the two signals is extremely low, of the order of 10^{-4}.

Figure A3. Result of the net TE and window-sliding TE. Panels: (a) linear: coupling; (b) linear: noncoupling; (c) nonlinear: coupling; (d) nonlinear: noncoupling. In each subfigure, the left (4 × 4) panel depicts the net TE for the whole signals, while the right panel shows the window-sliding TE. The results are based on Equations (A65) and (A66).

Appendix D.3. Information Rate Causality

Information rate causality quantitatively measures the changes in the probability distribution of one signal caused by another. In the coupling cases, a high peak is expected, indicating a significant change in the distribution due to the causal relationship between the signals, as depicted in Figure A4a,c (with the magnification shown in Figure A5a,c). In this scenario, the causation from x1(t) to x2(t) is represented well by peaks of the order of 10^4. In the noncoupling cases, shown in Figure A4b,d (with the magnification shown in Figure A5b,d), the information rate causality does not reduce to zero, owing to the noise present in the estimation of the probability distribution (refer to Figure S1).
However, this noise does not affect the evaluation, as it does not produce the high-order peaks observed in the coupling cases.

Figure A4. Result of information rate causality for Equations (A65) and (A66). Panels: (a) linear: coupling; (b) linear: noncoupling; (c) nonlinear: coupling; (d) nonlinear: noncoupling.

Figure A5. Zoom-in of Figure A4 for the information rate causality between 9.6 s and 10.4 s. Panels: (a) linear: coupling; (b) linear: noncoupling; (c) nonlinear: coupling; (d) nonlinear: noncoupling.

References
1. Zenil, H.; Kiani, N.A.; Marabita, F.; Deng, Y.; Elias, S.; Schmidt, A.; Ball, G.; Tegnér, J. An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems. iScience 2019, 19, 1160–1172. [CrossRef] [PubMed]
2. Zenil, H.; Kiani, N.A.; Zea, A.A.; Tegnér, J. Causal deconvolution by algorithmic generative models. Nat. Mach. Intell. 2019, 1, 58–66. [CrossRef]
3. Nogueira, A.R.; Pugnana, A.; Ruggieri, S.; Pedreschi, D.; Gama, J. Methods and tools for causal discovery and causal inference. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2022, 12, e1449. [CrossRef]
4. Zenil, H.; Kiani, N.A.; Tegnér, J. Low-algorithmic-complexity entropy-deceiving graphs. Phys. Rev. E 2017, 96, 012308. [CrossRef] [PubMed]
5. Danks, D.; Davis, I. Causal inference in cognitive neuroscience. WIRES Cogn. Sci. 2023, e1650. [CrossRef]
6. Granger, C.W.J. Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econom. J. Econom. Soc. 1969, 37, 424–438. [CrossRef]
7. Guo, R.; Cheng, L.; Li, J.; Hahn, P.R.; Liu, H. A Survey of Learning Causality with Data: Problems and Methods. ACM Comput. Surv. (CSUR) 2020, 53, 1–37. [CrossRef]
8. Kaufmann, R.K.; Stern, D.I. Evidence for human influence on climate from hemispheric temperature relations. Nature 1997, 388, 39–44. [CrossRef]
9. Imbens, G.W. Nonparametric Estimation of Average Treatment Effects under Exogeneity: A Review. Rev. Econ. Stat. 2004, 86, 4–29. [CrossRef]
10. Pullon, R.M.; Yan, L.; Sleigh, J.W.; Warnaby, C.E. Granger causality of the electroencephalogram reveals abrupt global loss of cortical information flow during propofol-induced loss of responsiveness. Anesthesiology 2020, 133, 774–786. [CrossRef]
11. Zhao, Y.; Billings, S.A.; Wei, H.; He, F.; Sarrigiannis, P.G. A new NARX-based Granger linear and nonlinear casual influence detection method with applications to EEG data. J. Neurosci. Methods 2013, 212, 79–86. [CrossRef] [PubMed]
12. Ebert-Uphoff, I.; Deng, Y. Causal discovery for climate research using graphical models. J. Clim. 2012, 25, 5648–5665. [CrossRef]
13. Dehejia, R.H.; Wahba, S. Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs. J. Am. Stat. Assoc. 1999, 94, 1053–1062. [CrossRef]
14. Shovon, M.H.I.; Nandagopal, N.; Vijayalakshmi, R.; Du, J.T.; Cocks, B. Directed Connectivity Analysis of Functional Brain Networks during Cognitive Activity Using Transfer Entropy. Neural Process. Lett. 2017, 45, 807–824. [CrossRef]
15. Gao, Y.; Su, H.; Li, R.; Zhang, Y. Synchronous analysis of brain regions based on multi-scale permutation transfer entropy. Comput. Biol. Med. 2019, 109, 272–279. [CrossRef]
16. Seth, A.K.; Barrett, A.B.; Barnett, L. Granger causality analysis in neuroscience and neuroimaging. J. Neurosci. 2015, 35, 3293–3297. [CrossRef]
17. Ding, M.; Chen, Y.; Bressler, S.L. Granger Causality: Basic Theory and Application to Neuroscience. In Handbook of Time Series Analysis: Recent Theoretical Developments and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2006.
18. Lima, V.; Dellajustina, F.J.; Shimoura, R.O.; Girardi-Schappo, M.; Kamiji, N.L.; Pena, R.F.; Roque, A.C. Granger causality in the frequency domain: Derivation and applications. Rev. Bras. Ensino Fis. 2020, 42. [CrossRef]
19. van Milligen, B.P.; Birkenmeier, G.; Ramisch, M.; Estrada, T.; Hidalgo, C.; Alonso, A. Causality detection and turbulence in fusion plasmas. Nucl. Fusion 2014, 54, 023011. [CrossRef]
20. Sigtermans, D. Is Information Theory Inherently a Theory of Causation? arXiv 2020, arXiv:2010.01932.
21. Bressler, S.L.; Seth, A.K. Wiener-Granger Causality: A well established methodology. Neuroimage 2011, 58, 323–329. [CrossRef]
22. Geweke, J. Measurement of Linear Dependence and Feedback Between Multiple Time Series. J. Am. Stat. Assoc. 1982, 77, 304–313. [CrossRef]
23. Geweke, J.F. Measures of Conditional Linear Dependence and Feedback Between Time Series. J. Am. Stat. Assoc. 1984, 79, 907–915. [CrossRef]
24. Dhamala, M.; Rangarajan, G.; Ding, M. Estimating Granger causality from Fourier and wavelet transforms of time series data. Phys. Rev. Lett. 2008, 100, 018701. [CrossRef]
25. Dhamala, M.; Rangarajan, G.; Ding, M. Analyzing information flow in brain networks with nonparametric Granger causality. NeuroImage 2008, 41, 354–362. [CrossRef] [PubMed]
26. Klonowski, W. Everything you wanted to ask about EEG but were afraid to get the right answer. Nonlinear Biomed. Phys. 2009, 3, 2. [CrossRef] [PubMed]
27. He, F.; Yang, Y. Nonlinear System Identification of Neural Systems from Neurophysiological Signals. Neuroscience 2021, 458, 213–228. [CrossRef]
28. Schreiber, T. Measuring Information Transfer. Phys. Rev. Lett. 2000, 85, 461–464. [CrossRef]
29. Barnett, L.; Barrett, A.B.; Seth, A.K. Granger causality and transfer entropy are equivalent for Gaussian variables. Phys. Rev. Lett. 2009, 103, 238701. [CrossRef] [PubMed]
30. Kim, E.J.; Guel-Cortez, A.J. Causal information rate. Entropy 2021, 23, 1087. [CrossRef]
31. Kim, E.J. Information geometry and non-equilibrium thermodynamic relations in the over-damped stochastic processes. J. Stat. Mech. Theory Exp. 2021, 2021, 93406. [CrossRef]
32. Kim, E.J. Information geometry, fluctuations, non-equilibrium thermodynamics, and geodesics in complex systems. Entropy 2021, 23, 1393. [CrossRef]
33. Ito, S.; Dechant, A. Stochastic Time Evolution, Information Geometry, and the Cramér-Rao Bound. Phys. Rev. X 2020, 10, 021056. [CrossRef]
34. O'Donnell, R.T.; Korb, K.B.; Allison, L. Causal KL: Evaluating Causal Discovery. arXiv 2021, arXiv:2111.06029.
35. Wilson, G.T. The Factorization of Matricial Spectral Densities. SIAM J. Appl. Math. 1972, 23, 420–426. [CrossRef]
36. Detto, M.; Molini, A.; Katul, G.; Stoy, P.; Palmroth, S.; Baldocchi, D. Causality and persistence in ecological systems: A nonparametric spectral Granger causality approach. Am. Nat. 2012, 179, 524–535. [CrossRef] [PubMed]
37. Gencaga, D.; Knuth, K.H.; Rossow, W.B. A recipe for the estimation of information flow in a dynamical system. Entropy 2015, 17, 438–470. [CrossRef]
38. Sahann, R.; Muller, T.; Schmidt, J. Histogram binning revisited with a focus on human perception. In Proceedings of the 2021 IEEE Visualization Conference (VIS), New Orleans, LA, USA, 24–29 October 2021. [CrossRef]
39. Terrell, G.R.; Scott, D.W. Oversmoothed nonparametric density estimates. J. Am. Stat. Assoc. 1985, 80, 209–214. [CrossRef]
40. Schelter, B.; Winterhalder, M.; Timmer, J. Handbook of Time Series Analysis; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2006; pp. 335–491.
41. Shannon, C.E. A Mathematical Theory of Communication, Part III: Mathematical Preliminaries. Bell Syst. Tech. J. 1948, 27, 623–656. [CrossRef]
42. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [CrossRef]
43. Virosztek, D. The metric property of the quantum Jensen-Shannon divergence. Adv. Math. 2021, 380, 107595. [CrossRef]
44. Briët, J.; Harremoës, P. Properties of classical and quantum Jensen-Shannon divergence. Phys. Rev. A 2009, 79, 052311. [CrossRef]

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.