2003
This paper reviews realizations of the ML and EM procedures in the presence of missing data when the data are correlated. The motivation came from a geostatistical analysis of spatial data in which some observations were missing. A version of the EM algorithm was suggested there, but its theoretical properties were not established. This paper analyzes the standard and the new procedure under the simplest correlation structure, that of an AR(1) process, and shows that they differ by terms of order O_p(n^{-1}).
2005
© 2005 Stanislav Kolenikov. ALL RIGHTS RESERVED.
ABSTRACT
STANISLAV KOLENIKOV: A modification of the EM algorithm with applications to spatio-temporal modeling. (Under the direction of Prof. Richard L. Smith.)
Motivation
In a recent paper, Smith, Kolenikov & Cox (forthcoming) analyze a spatio-temporal data set featuring repeated measurements of spatially correlated data. For each moment of time t = 1, . . . , T, there are up to K measurements available at certain fixed monitor locations. After certain transformations, the log likelihood of such a model (assuming normality) looks like

ln L(θ, β; y, X) ∼ −(1/2) Σ_{t=1}^T [ ln det Σ_t(θ) + (y_t − X_t β)^T Σ_t(θ)^{−1} (y_t − X_t β) ], (1.1)

where X is the design matrix, the vector β represents the trends in time, space, and other covariates, and θ is the set of parameters of the geostatistical model for the spatial covariance. The subindex t indicates that the dimensionality of the vectors and matrices may depend on time t when some data are missing. The likelihood (1.1) can be maximized directly with a nonlinear optimization routine, but that would involve inverting T matrices of rather large size (in Smith et al. (forthcoming), K = 74) and computing their determinants. An appealing method to reduce those computational costs is the EM algorithm (Dempster, Laird & Rubin 1977, McLachlan & Krishnan 1997). This way, only one matrix inversion and one determinant computation are required, for the largest K × K covariance matrix Σ(θ) of all observation sites. (The question would still remain which of the ML and EM procedures is faster in computer time, as EM takes more iterations to converge.)
Then the maximization procedure, which also splits the maximization step of the EM algorithm into maximization over the trend parameter β subspace and the covariance parameter θ subspace (a version of the EM known as expectation-conditional maximization), proceeds as follows:
1. Initialize the trend parameters β by OLS over the available cases;
2. Initialize the covariance parameters θ by some reasonable guess;
3. (E-step, h-th iteration) Compute y^{(h)}, the data completed by the conditional expectations of the missing values;
4. (E-step, h-th iteration) Compute the conditional expectation of the sufficient statistic;
5. (M-step, h-th iteration, part 1) Maximize the log likelihood (1.1) with respect to the covariance parameters θ;
6. (M-step, h-th iteration, part 2) Run the GLS regression of y^{(h)} with the covariance matrix Σ(θ^{(h)});
7. Declare convergence, or reiterate from step 3.
The most subtle point of the above procedure is step 4: computing the conditional expectation of the sufficient statistic of the data given the observed values and the current parameter estimates. The suggestion in Smith et al. (forthcoming) was to use e_it e_jt in the (i, j)-th position of the t-th term of (1.2) whenever both residuals are available, and σ_ij(θ^{(h)}) otherwise. An alternative that seems more in line with the EM methodology would be to perform universal kriging at each time point at the locations of the missing data, and to use the predicted residuals instead. This, however, takes us back to inverting all the covariance matrices of different sizes. The question, then, is by how much the above approximation to the conditional expectation is biased, and what other problems it may lead to. This question is more pertinent for the covariance parameter estimates, since the trend estimates will be unbiased and consistent for any weighting matrix in the GLS estimator.
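The substitution rule can be sketched in a few lines; the covariance matrix and the residual vector below are illustrative stand-ins, not the paper's geostatistical model:

```python
import numpy as np

# Sketch of the substitution rule: in the t-th term E[e_t e_t^T], keep
# e_it * e_jt where both residuals are observed and substitute the model
# covariance sigma_ij(theta) where either is missing.
K = 4
Sigma = 0.8 ** np.abs(np.subtract.outer(np.arange(K), np.arange(K)))
e = np.array([0.5, np.nan, -1.2, np.nan])     # NaN marks a missing site
seen = ~np.isnan(e)
both = seen[:, None] & seen[None, :]          # True iff both residuals observed
E = np.where(both, np.outer(e, e), Sigma)
```

The filled matrix E then plays the role of the t-th term of the sufficient statistic without any matrix inversion at time t.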
The remainder of the paper considers the simplest correlation structure, that of an AR(1) process, and analyzes the performance of the procedure suggested above in that case.
AR(1) process
Consider an AR(1) process

y_t = δ + ρ y_{t−1} + ε_t,

where the ε_t are independent, Gaussian innovations with zero mean and variance σ². I shall assume |ρ| < 1, so the process is stationary. The mean of the process is

µ = E y_t = δ / (1 − ρ),

so the process can also be written in deviations form

y_t − µ = ρ (y_{t−1} − µ) + ε_t. (2.3)

The variance of the process is

Var y_t = σ² / (1 − ρ²).
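With the process written as y_t = δ + ρ y_{t−1} + ε_t, so that µ = δ/(1 − ρ) and Var y_t = σ²/(1 − ρ²), the stationary moments can be checked by simulation (parameter values are illustrative):

```python
import numpy as np

# Simulate a stationary AR(1) and compare the sample mean and variance
# with the stationary values mu = delta/(1-rho), sigma^2/(1-rho^2).
rng = np.random.default_rng(0)
delta, rho, sigma, T = 1.0, 0.6, 2.0, 200_000
mu = delta / (1 - rho)             # stationary mean, 2.5 here
v = sigma**2 / (1 - rho**2)        # stationary variance, 6.25 here

y = np.empty(T)
y[0] = mu + rng.normal(scale=np.sqrt(v))   # start in the stationary law
for t in range(1, T):
    y[t] = delta + rho * y[t - 1] + rng.normal(scale=sigma)

print(y.mean(), y.var())           # close to 2.5 and 6.25
```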
If a sample of y t , t = 1, ..., T is observed, then the log likelihood of the data is
where 1I = (1, . . . , 1)^T is the column vector of ones,
From (2.9), the sufficient statistic of the data is
This fact is crucial in the EM algorithm theory. Note that the variance estimator is
so the log likelihood concentrated with respect to σ² is then
Differentiating with respect to µ gives
Differentiating with respect to ρ gives
ML estimates of the parameters are
The subindex indicates that the estimates are obtained from the complete data set by maximum likelihood. Those can be computed with an iterative procedure: first, the OLS estimates
are obtained, which are consistent estimates of the corresponding parameters; then α(ρ̂_OLS, T) and Q(y, µ̂_OLS, ρ̂_OLS) are computed to improve the estimates up to the first-order terms by (2.17) and (2.18).
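The first-step OLS estimates can be obtained by regressing y_t on (1, y_{t−1}): the slope estimates ρ, and the intercept divided by 1 − ρ̂_OLS recovers µ. A sketch with illustrative values:

```python
import numpy as np

# First-step OLS estimates for the AR(1) parameters: regress y_t on
# (1, y_{t-1}); rho_ols is the slope and mu_ols = intercept/(1 - rho_ols).
# These are the consistent starting values that the iterative procedure
# in the text then refines.
rng = np.random.default_rng(1)
rho, mu, sigma, T = 0.5, 3.0, 1.0, 100_000

y = np.empty(T)
y[0] = mu + rng.normal(scale=sigma / np.sqrt(1 - rho**2))
for t in range(1, T):
    y[t] = mu + rho * (y[t - 1] - mu) + rng.normal(scale=sigma)

X = np.column_stack([np.ones(T - 1), y[:-1]])
b = np.linalg.lstsq(X, y[1:], rcond=None)[0]   # [intercept, slope]
rho_ols = b[1]
mu_ols = b[0] / (1 - rho_ols)
print(rho_ols, mu_ols)                         # close to 0.5 and 3.0
```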
AR(1) with missing data: ML approach

Now, suppose a portion of the data that came from the AR(1) process is missing: we first have n observed data points (y_1, . . . , y_n), then l missing observations (y_{n+1}, . . . , y_{n+l}), and then again m observed points (y_{n+l+1}, . . . , y_T),
where Σ_o has a block structure:
each an s × s matrix, s = n, m, whose inverse is given by an expression similar to (2.7); the off-diagonal block R is a rank-1 matrix of dimensions n × m. By using the formulae for block inverses,
the component blocks of Σ_o^{−1} can be obtained as follows.
To proceed with (T_n − R T_m^{−1} R^T)^{−1}, the following matrix identity can be used:
In using this formula, we shall have V = T n with known inverse, and W is minus the result of (3.7). Then
which is the upper left block of Σ_o^{−1}. The next block of Σ_o^{−1} is
The computations identical to (3.7)-(3.10) yield
and finally Σ_o^{12} = (Σ_o^{21})^T. The changes due to the missing data are concentrated near the place where they are missing, as should be expected from the Markovian character of the AR(1) process. Let us consider the two limiting cases.
If l = 0, no data are missing, and one can verify that φ(ρ, 0) = 1 + ρ², ψ(ρ, 0) = ρ. Then Σ_o^{−1} has the form of (2.7).
If l = ∞, there are two independent samples of possibly different length from AR(1), and both the covariance matrix and its inverse are block matrices with off-diagonal block equal to zero matrices.
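The l = 0 case can be verified numerically: the inverse of the complete-sample AR(1) covariance is tridiagonal, with diagonal (1, 1 + ρ², . . . , 1 + ρ², 1) and off-diagonal entries −ρ, all scaled by 1/σ², which is the form (2.7) refers to. A direct check:

```python
import numpy as np

# Complete-sample AR(1) covariance Sigma_ij = sigma^2/(1-rho^2) * rho^|i-j|
# and its claimed tridiagonal inverse.
rho, sigma, n = 0.7, 1.3, 6
i = np.arange(n)
Sigma = sigma**2 / (1 - rho**2) * rho ** np.abs(i[:, None] - i[None, :])

P = np.diag(np.r_[1.0, np.full(n - 2, 1 + rho**2), 1.0])
P += np.diag(np.full(n - 1, -rho), 1) + np.diag(np.full(n - 1, -rho), -1)
P /= sigma**2

print(np.allclose(np.linalg.inv(Sigma), P))   # True
```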
The determinant of Σ_o^{−1} can be shown to be
which can be verified to give the appropriate limits 1 − ρ² and (1 − ρ²)² when l = 0 and l = ∞, respectively.
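The two determinant limits can be checked numerically. The closed form used below, det Σ_o^{−1} = (1 − ρ²)²/(1 − ρ^{2(l+1)}) for σ = 1, is our reconstruction from the Markov factorization of the determinant, not copied from the paper; it reproduces both quoted limits:

```python
import numpy as np

# det(Sigma_o^{-1}) for n observed points, a gap of length l, and m more
# observed points, with sigma = 1 and marginal variance 1/(1 - rho^2).
rho, n, m = 0.6, 5, 4

def det_inv_sigma_o(l):
    t = np.concatenate([np.arange(n), np.arange(n + l, n + l + m)])
    Sigma_o = rho ** np.abs(t[:, None] - t[None, :]) / (1 - rho**2)
    return 1.0 / np.linalg.det(Sigma_o)

for l in (0, 1, 3, 50):
    closed_form = (1 - rho**2) ** 2 / (1 - rho ** (2 * (l + 1)))
    assert np.isclose(det_inv_sigma_o(l), closed_form)

print(np.isclose(det_inv_sigma_o(0), 1 - rho**2))          # True: limit at l = 0
print(np.isclose(det_inv_sigma_o(50), (1 - rho**2) ** 2))  # True: limit at large l
```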
Let us now derive the normal equations. The generalized sum of squares is
The likelihood can again be concentrated with respect to σ²:
where the first term is O_p(1) and all the others are O_p((n + m)^{−1}). The second term is the correction due to the terminal points, similar to the complete-data case, and the other two are corrections for the missing data.
The derivatives with respect to ρ are as follows:
Then the normal equation for ρ is
where the terms on the LHS are of order O_p(1), and the terms on the RHS are of order O_p((n + m)^{−1}). Just as in the case of the ML estimates for the complete data, an explicit analytic solution cannot be obtained. In practice, one would need to use two-step or iterative procedures. The starting values can be based on the sample mean and the sample correlation between y_t and y_{t−1}, as before. It should be noted, however, that the corrections due to the missing data are of order O_p((n + m)^{−1}).
AR(1) with missing data: the EM algorithm
The EM algorithm is an iterative procedure for finding the critical points of the likelihood function. In this procedure, one alternates between estimating the conditional expectation of the missing data, or of the sufficient statistic of the complete data set if one is available, given the current parameter estimates and the observed data (E-step), and maximizing the likelihood over the parameters using the imputed data / sufficient statistic (M-step). For details and a multitude of applications, see McLachlan & Krishnan (1997).
The version most convenient to consider in this example of the AR(1) process is the expectation-conditional maximization version of the EM algorithm, in which the maximization can be performed separately over subspaces of the parameter space that together span the whole space. We split the maximization into maximization over the mean parameter subspace (µ in our case, or a regression mean if covariates are allowed for) and the variance subspace; the latter may be split further into the overall constant σ², which lends itself to a simple estimate such as (2.11), and the covariance structure parameter, which in this case is ρ.
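The conditional-maximization split can be illustrated on a complete-data AR(1) sample. The sketch below alternates a one-dimensional grid maximization of the concentrated log likelihood over ρ with a GLS update of µ; it illustrates the alternating scheme only, and does not reproduce the exact complete-data updates (2.17)–(2.18):

```python
import numpy as np

# Alternating (conditional) maximization for a complete AR(1) sample:
# update rho with mu held fixed, then mu with rho held fixed.
rng = np.random.default_rng(4)
mu_t, rho_t, T = 1.0, 0.7, 20_000
y = np.empty(T)
y[0] = mu_t + rng.normal(scale=1 / np.sqrt(1 - rho_t**2))
for t in range(1, T):
    y[t] = mu_t + rho_t * (y[t - 1] - mu_t) + rng.normal()

def conc_loglik(rho, mu):
    # log likelihood with sigma^2 concentrated out (constants dropped)
    d = y - mu
    q = (1 - rho**2) * d[0]**2 + ((d[1:] - rho * d[:-1]) ** 2).sum()
    return -0.5 * (T * np.log(q / T) - np.log(1 - rho**2))

mu, rho = y.mean(), 0.0
grid = np.linspace(-0.99, 0.99, 397)
for _ in range(10):                      # conditional-maximization sweeps
    rho = grid[np.argmax([conc_loglik(r, mu) for r in grid])]
    # GLS weights for the mean: proportional to (1, 1-rho, ..., 1-rho, 1)
    w = np.r_[1.0, np.full(T - 2, 1 - rho), 1.0]
    mu = (w * y).sum() / w.sum()
print(rho, mu)                           # close to 0.7 and 1.0
```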
As was shown above in (2.10), the sufficient statistic of the complete data set is
Thus, to get an implementation of the EM algorithm, we need to be able to predict the three sums given the parameter estimates and the available observations, which boils down to predicting y_t, y_t², and y_t y_{t−1} whenever at least one of y_t, y_{t−1} is missing.
To derive the aforementioned conditional expectation, consider the four-variate normal distribution of (y_n, y_{n+r−1}, y_{n+r}, y_{n+l+1}).
By the standard multivariate normal theory, the distribution of the unobserved y_{n+r−1}, y_{n+r} conditional on y_n, y_{n+l+1} is bivariate normal with mean
(4.3) and covariance matrix
This is also the kriging estimator. Hence, the conditional expectations of the relevant sums are as follows:
This can now be substituted into (2.14):
which after a number of simplifications coincides with (3.18).
Summary: It appears that the substitution of the missing values through kriging produces sums of geometric progressions that reduce to the ML expressions obtained before.
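The Markov structure behind the conditional expectation (4.3) can be checked numerically: for a zero-mean AR(1) with unit marginal variance (an illustrative special case), conditioning the gap on all observed values and conditioning on just the two bracketing observations y_n and y_{n+l+1} give identical predictions:

```python
import numpy as np

# Kriging an interior gap of an AR(1): by the Markov property, the
# conditional mean of the missing block given everything observed equals
# the conditional mean given only the two bracketing observations.
rho, T = 0.8, 12
gap = np.arange(5, 8)                         # indices of the missing block
obs = np.setdiff1d(np.arange(T), gap)
i = np.arange(T)
S = rho ** np.abs(i[:, None] - i[None, :])    # Corr(y_s, y_t) = rho^|s-t|

rng = np.random.default_rng(6)
y_obs = rng.normal(size=obs.size)             # arbitrary observed values

cond_all = S[np.ix_(gap, obs)] @ np.linalg.solve(S[np.ix_(obs, obs)], y_obs)
ends = np.array([gap[0] - 1, gap[-1] + 1])    # y_n and y_{n+l+1}
cond_ends = S[np.ix_(gap, ends)] @ np.linalg.solve(S[np.ix_(ends, ends)],
                                                   y_obs[np.isin(obs, ends)])
print(np.allclose(cond_all, cond_ends))       # True
```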
AR(1) with missing data: the "lazy" EM algorithm

Now, the proposed "lazy" version of the EM algorithm suggests that the predictions of the missing data statistics be simply
which is obviously quite different from what was derived in the previous section, and does not account for the observed data at all. Those are the values that can be substituted in place of the missing components of (y − 1Iµ)(y − 1Iµ)^T in (1.1).
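If these lazy predictions are read as the unconditional stationary moments, E y_t = µ, E y_t² = µ² + σ²/(1 − ρ²), and E y_t y_{t−1} = µ² + ρσ²/(1 − ρ²), a quick Monte Carlo check with illustrative parameter values confirms the formulas:

```python
import numpy as np

# Unconditional moments of a stationary AR(1): draw many steady-state
# pairs (y_{t-1}, y_t) and compare sample averages with the formulas.
rng = np.random.default_rng(5)
mu, rho, sigma, N = 2.0, 0.5, 1.0, 400_000
v = sigma**2 / (1 - rho**2)                        # stationary variance

y_prev = mu + rng.normal(scale=np.sqrt(v), size=N)   # y_{t-1} in steady state
y_curr = mu + rho * (y_prev - mu) + rng.normal(scale=sigma, size=N)

print(y_curr.mean())             # ~ mu
print((y_curr**2).mean())        # ~ mu^2 + v
print((y_curr * y_prev).mean())  # ~ mu^2 + rho*v
```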
Let us see what kind of estimating equations are implied by this version of the EM algorithm. The approximate conditional expectation of the generalized sum of squares is
(5.4) and the approximate, or pseudo-likelihood being maximized is
Let us concentrate the pseudo-likelihood with respect to σ²:
which differs by O_p((n + m)^{−1}) from the ML estimate. The concentrated pseudo-likelihood is now
Differentiating with respect to µ gives an estimator that differs from the ML estimate by O_p((n + m)^{−1}). The derivatives with respect to ρ are as follows.
The terms on the LHS are of order O_p(1), and the terms on the RHS are of order O_p((n + m)^{−1}). Note that the principal term is the same as in the ML case except for the term (y_{n+l+1} − µ)(y_n − µ)/(n + m), which is O_p((n + m)^{−1}), so the two expressions differ by O_p((n + m)^{−1}).
Summary: As noted at the end of Section 3, the corrections for the missing data are of order (sample size)^{−1}. The suggested procedure essentially affects only the weights of the bordering observations y_n and y_{n+l+1}, with differences in the resulting estimating equations of order (sample size)^{−1}; it is thus asymptotically equivalent to the EM algorithm, and leads to estimates asymptotically equivalent to the maximum likelihood estimates.
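A minimal runnable sketch of the lazy iteration for an AR(1) sample with one interior gap follows. The M-step below uses simple conditional-least-squares updates built from the filled-in sufficient statistics, not the exact complete-data formulas (2.17)–(2.18), and all parameter values are illustrative:

```python
import numpy as np

# "Lazy" EM for AR(1) with a gap: the E-step fills statistics involving
# missing values with their unconditional expectations under the current
# parameters; the M-step re-estimates (mu, rho, sigma^2) from the sums.
rng = np.random.default_rng(3)
mu_true, rho_true, T = 2.0, 0.6, 5000
y = np.empty(T)
y[0] = mu_true + rng.normal(scale=1 / np.sqrt(1 - rho_true**2))
for t in range(1, T):
    y[t] = mu_true + rho_true * (y[t - 1] - mu_true) + rng.normal()
missing = np.zeros(T, bool)
missing[2000:2100] = True            # mask a gap (values kept only to simulate)

mu, rho, s2 = y[~missing].mean(), 0.0, y[~missing].var()
for _ in range(200):
    v = s2 / (1 - rho**2)            # stationary variance at current parameters
    # E-step (lazy): unconditional moments in place of missing statistics
    Ey = np.where(missing, mu, y)
    Ey2 = np.where(missing, mu**2 + v, y**2)
    Eyy = np.where(missing[1:] | missing[:-1], mu**2 + rho * v,
                   y[1:] * y[:-1])
    # M-step (conditional least squares on the filled-in sums)
    mu_new = Ey.mean()
    rho_new = (Eyy.sum() - (T - 1) * mu_new**2) / \
              (Ey2[:-1].sum() - (T - 1) * mu_new**2)
    s2_new = (Ey2.mean() - mu_new**2) * (1 - rho_new**2)
    done = max(abs(mu_new - mu), abs(rho_new - rho), abs(s2_new - s2)) < 1e-10
    mu, rho, s2 = mu_new, rho_new, s2_new
    if done:
        break
print(mu, rho, s2)                   # close to 2.0, 0.6, 1.0
```

With only a short gap in a long series, the fixed point is dominated by the observed data, which is exactly the asymptotic-equivalence claim above.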