A Markov model is a widely used tool for modeling sequences of events from a finite state space and
hence can be employed to identify the transition probabilities across treatments based on treatment
sequence data. To understand how patient-level covariates impact these treatment transitions, the
transition probabilities are modeled as a function of patient covariates. This approach enables the
visualization of the effect of patient-level covariates on the treatment transitions across patient visits.
The proposed method automatically estimates as constants those entries of the transition matrix with
few empirical transitions; the user can set a desired cutoff on the number of empirical transition
counts required for a particular transition probability to be estimated as a function of covariates.
First, this strategy forces the final estimated transition matrix to contain zeros at the locations
corresponding to zero empirical transition counts, handling sparsity efficiently without further
complicated model constructs. Second, it restricts estimation of transition probabilities as a function
of covariates when the number of empirical transitions is particularly small, avoiding the
identifiability issues that can arise in the p > n scenario when each transition probability is
estimated as a function of patient covariates. To optimize the multi-modal likelihood, a parallelized,
scalable global optimization routine is also developed. The
proposed method is applied to understand how the transitions across disease modifying therapies
(DMTs) in Multiple Sclerosis (MS) patients are influenced by patient-level demographic and clinical
phenotypes.
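As a rough sketch of the cutoff mechanism, the Python snippet below builds one patient's transition matrix with a multinomial-logit link applied only to entries whose pooled empirical counts exceed a user-chosen cutoff; below-cutoff entries keep a constant (intercept-only) logit, and zero-count entries are fixed at zero. The function name, the softmax parameterization, and the cutoff argument are illustrative assumptions, not the paper's exact construction.

import numpy as np

def transition_probs(counts, intercept, beta, x, cutoff=30):
    """One patient's transition matrix (illustrative sketch).

    counts    : (S, S) empirical transition counts pooled over patients
    intercept : (S, S) constant logits for sparsely observed transitions
    beta      : (S, S, q) covariate coefficients, used only above the cutoff
    x         : (q,) patient covariate vector
    cutoff    : assumed user-set minimum count for covariate-dependent entries
    """
    S = counts.shape[0]
    P = np.zeros((S, S))
    for i in range(S):
        logits = intercept[i].copy()
        rich = counts[i] >= cutoff        # enough data: add covariate effects
        logits[rich] += beta[i][rich] @ x
        w = np.exp(logits)
        w[counts[i] == 0] = 0.0           # zero counts -> structural zeros
        P[i] = w / w.sum()                # row-wise softmax normalization
    return P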
In this paper we develop a novel hidden Markov graphical model to investigate time-varying
interconnectedness between different financial markets. To identify conditional correlation structures
under varying market conditions and accommodate stylized facts embedded in financial time series, we
rely upon the generalized hyperbolic family of distributions with time-dependent parameters evolving
according to a latent Markov chain. We exploit its location-scale mixture representation to build a
penalized EM algorithm for estimating the state-specific sparse precision matrices by means of an
L1 penalty. The proposed approach leads to regime-specific conditional correlation graphs that allow us
to identify different degrees of network connectivity of returns over time. The methodology's
effectiveness is validated through simulation exercises under different scenarios. In the empirical
analysis we apply our model to daily returns of a large set of market indexes, cryptocurrencies and
commodity futures over the period 2017-2023.
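A minimal sketch of the penalized M-step idea, simplified here to Gaussian emissions (the paper's generalized hyperbolic case adds mixing variables from the location-scale representation): given the E-step's state responsibilities, each state's precision matrix is re-estimated by an L1-penalized fit on the responsibility-weighted covariance. The alpha penalty weight and the use of scikit-learn's graphical_lasso are illustrative choices, not the paper's algorithm.

import numpy as np
from sklearn.covariance import graphical_lasso

def m_step_precisions(Y, gamma, alpha=0.05):
    """Sparse state-specific precision matrices (simplified M-step sketch).

    Y     : (T, p) observed return series
    gamma : (T, K) E-step responsibilities for each hidden state
    alpha : L1 penalty weight (assumed tuning parameter)
    """
    K = gamma.shape[1]
    precisions = []
    for k in range(K):
        w = gamma[:, k] / gamma[:, k].sum()
        mu = w @ Y                                  # responsibility-weighted mean
        Yc = Y - mu
        S = (w[:, None] * Yc).T @ Yc                # weighted covariance
        _, theta = graphical_lasso(S, alpha=alpha)  # L1-penalized precision
        precisions.append(theta)
    return precisions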
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Systems and Control (eess.SY)
In this paper we study the asymptotics of linear regression in settings where the covariates exhibit a
linear dependency structure, departing from the standard assumption of independence. We model the
covariates using stochastic processes with spatio-temporal covariance and analyze the performance of
ridge regression in the high-dimensional proportional regime, where the number of samples and
feature dimensions grow proportionally. A Gaussian universality theorem is proven, demonstrating that
the asymptotics are invariant under replacing the covariates with Gaussian vectors preserving mean
and covariance. Next, leveraging tools from random matrix theory, we derive a precise characterization
of the estimation error via a fixed-point equation involving the spectral properties of the
spatio-temporal covariance matrices, enabling efficient computation. We then
study optimal regularization, overparameterization, and the double descent phenomenon in the context
of dependent data. Simulations validate our theoretical predictions, shedding light on how
dependencies influence estimation error and the choice of regularization parameters.
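As a toy version of the kind of simulation described, the snippet below draws covariates with a separable spatio-temporal covariance (AR(1) across samples and exponential decay across features, both assumed forms), fits ridge regression over a grid of penalties, and reports the estimation error; the fixed-point characterization itself is not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 200

# Assumed separable dependency: AR(1) correlation across samples (time)
# and exponential-decay correlation across features (space).
Ct = 0.6 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Cs = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = np.linalg.cholesky(Ct) @ rng.standard_normal((n, p)) @ np.linalg.cholesky(Cs).T

beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

for lam in [1e-3, 1e-1, 1.0, 10.0]:
    # Closed-form ridge estimator with penalty lam.
    b_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    print(f"lambda={lam:g}  estimation error = {np.sum((b_hat - beta) ** 2):.4f}")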
Causal inference is vital for informed decision-making across fields such as biomedical research and
social sciences. Randomized controlled trials (RCTs) are considered the gold standard for the internal
validity of inferences, whereas observational studies (OSs) often provide the opportunity for greater
external validity. However, both data sources have inherent limitations preventing their use for broadly
valid statistical inferences: RCTs may lack generalizability due to their selective eligibility criteria, and
OSs are vulnerable to unobserved confounding. This paper proposes an innovative approach to
integrating an RCT and an OS in which each study's strengths remedy the other's limitations. The
method uses a novel triplet matching algorithm to align RCT and OS samples and a new two-parameter
sensitivity analysis framework to quantify internal and external biases. This combined approach yields
causal estimates that are more robust to hidden biases than OSs alone and provides reliable inferences
about the treatment effect in the general population. We apply this method to investigate the effects of
lactation on maternal health using a small RCT and a long-term observational health records dataset
from the California National Primate Research Center. This application demonstrates the practical
utility of our approach in generating scientifically sound and actionable causal estimates.
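The triplet matching algorithm is the paper's own contribution and is not specified here; as a loose illustration of the general shape of the problem, the sketch below forms (treated, RCT control, OS unit) triplets by two optimal assignments on covariate distance. The function name, the distance metric, and the assignment strategy are all assumptions, not the paper's procedure.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

def match_triplets(X_treated, X_rct_controls, X_os_units):
    """Form (treated, RCT control, OS unit) index triplets via two optimal
    assignments on Euclidean covariate distance. An illustrative stand-in
    for the paper's triplet matching algorithm, not its actual procedure.
    Assumes each comparison pool is at least as large as the treated group.
    """
    _, rct_idx = linear_sum_assignment(cdist(X_treated, X_rct_controls))
    _, os_idx = linear_sum_assignment(cdist(X_treated, X_os_units))
    return [(t, int(rct_idx[t]), int(os_idx[t])) for t in range(len(X_treated))]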
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP)
Complex networked systems driven by latent inputs are common in fields like neuroscience, finance,
and engineering. A key inference problem here is to learn edge connectivity from node outputs
(potentials). We focus on systems governed by steady-state linear conservation laws:
X_t = L* Y_t, where X_t and Y_t ∈ R^p denote inputs and potentials, respectively, and the sparsity
pattern of the p×p Laplacian L* encodes the edge connectivity. Assuming X_t to be a wide-sense
stationary stochastic process with a known spectral density matrix, we learn the support of L* from
samples of Y_t via an ℓ1-regularized Whittle's maximum likelihood estimator (MLE). The regularization
is particularly useful for learning large-scale networks in the high-dimensional setting where the
network size p exceeds the number of samples n.
We show that the MLE problem is strictly convex, admitting a unique solution. Under a novel mutual
incoherence condition and certain sufficient conditions on
(n, p, d), we show that the MLE recovers the support of L* with high probability, where d is the
maximum degree of the graph underlying L*. We also provide recovery guarantees for L* in
element-wise maximum, Frobenius, and operator norms. Finally, we complement our theoretical
results with several simulation studies on synthetic and benchmark datasets, including engineered
systems (power and water networks), and real-world datasets from neural systems (such as the human
brain).
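The full estimator is beyond a short snippet, but its core computational pattern, an ℓ1 penalty handled by proximal-gradient soft-thresholding, can be sketched on a simplified objective. The sketch below replaces the Whittle likelihood (a sum over frequencies of log-determinant and trace terms of the spectral density) with a single Gaussian log-likelihood in an empirical covariance S, and omits the Laplacian structure; function names, step size, and penalty weight are assumptions.

import numpy as np

def soft_threshold(A, t):
    """Elementwise soft-thresholding: the proximal map of the l1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def l1_penalized_precision(S, lam=0.1, step=0.01, iters=500):
    """Proximal-gradient sketch for
        minimize_Theta  -log det(Theta) + tr(S @ Theta) + lam * ||Theta||_1,
    a single-covariance stand-in for the frequency-averaged Whittle
    objective (Laplacian constraints and spectral weighting omitted).
    A small step size keeps iterates positive definite in practice;
    no explicit safeguard is added in this sketch.
    """
    p = S.shape[0]
    Theta = np.eye(p)
    for _ in range(iters):
        grad = S - np.linalg.inv(Theta)                  # gradient of smooth part
        Theta = soft_threshold(Theta - step * grad, step * lam)
        Theta = (Theta + Theta.T) / 2                    # re-symmetrize
    return Theta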