SSRN Id3334458
Horse Race in
High Dimensional Space
Abstract
In this paper, we study the predictive power of dense and sparse estimators in a high-dimensional
space. We propose a new forecasting method, called Elastically Weighted Principal Components
Analysis (EWPCA), that selects the variables with respect to the target variable, taking into account
the collinearity among the data using the Elastic Net soft thresholding. We then weight the
selected predictors using the Elastic Net regression coefficients, and finally apply principal
component analysis to the new “elastically” weighted data matrix. We compare this method to
common benchmarks and other methods used to forecast macroeconomic variables in a data-rich
environment, divided into dense representations, such as Dynamic Factor Models and Ridge regressions,
and sparse representations, such as LASSO regression. All these models are adapted to take into
account the linear dependency of the macroeconomic time series.
Moreover, to estimate the hyperparameters of these models, including the EWPCA, we propose
a new procedure called “brute force”. This method allows us to treat all the hyperparameters of
the model uniformly and to take the longitudinal feature of the time-series data into account.
Our findings can be summarized as follows. First, the “brute force” method for estimating the
hyperparameters is more stable and gives better forecasting performance, in terms of MSFE, than
the traditional criteria used in the literature to tune the hyperparameters. This result holds for all
sample sizes and forecasting horizons. Secondly, our two-step forecasting procedure enhances the
forecasts’ interpretability. Lastly, the EWPCA leads to better forecasting performance, in terms of
mean square forecast error (MSFE), than the other sparse and dense methods or the naïve benchmark,
at different forecast horizons and sample sizes.
Keywords: Variable selection, High-dimensional time series, Dynamic factor models, Shrinkage
methods, Cross-validation.
We are grateful to Tommaso Proietti, Marco Lippi, Lucrezia Reichlin, Filippo Pellegrino, Thomas Hasenzagel, the
conference participants at the ’1st Vienna Workshop on Economic Forecasting 2018’ and the Now-casting Economics
team for helpful comments and suggestions.
The opinions expressed and conclusions drawn are those of the authors and do not necessarily reflect the views of
the Bank of Italy.
2 Methodology
In this section we compare the most common variable-selection methods in the high-dimensional
literature, adapting these models to take into account the autoregressive features of the
macroeconomic time series and to give a better assessment of the predictive power of these
techniques.
In particular, this section presents a brief overview of the more sophisticated approaches used in
the literature to forecast in a high-dimensional framework. These methods can be classified into two
different categories: sparse and dense modeling techniques.
Models in the first category select a small subset of variables with high predictive power, “throwing
away” the others. We consider the LASSO-based approach by Tibshirani (1996) as a typical example of
this category, augmenting this method to include the linear dependency of both the target variable and
the covariates in selecting the variables, as in Li et al. (2014).
Dense models, on the other hand, consider all the variables as important for prediction at the same
time, even if their individual impact might be small. Among others, we compare the Dynamic Factor
Model, first proposed by Stock and Watson (2002a, b) and based on principal component analysis;
the Ridge regression by Hoerl and Kennard (1970); and the Elastic Net regression by Zou and Hastie
(2005), which can be seen as an “almost” sparse representation model.
where p = 1, ..., P denotes the linear dependency of our target variable x_t on x_{j,T−p}, with j = 1, ..., N;
the error term e_{i,T+h} is stationary and Gaussian, with mean zero and variance σ_e²; and h denotes the
forecast horizon. We proceed by estimating the coefficients β_i^p ≡ {β_{i,j}^p}, with j = 1, ..., N and p = 1, ..., P,
by minimizing the penalized least squares:

$$\hat{\beta}_i^p = \arg\min_{\beta}\; \frac{1}{2}\sum_{t=1+P}^{T}\Big(x_{i,t+h} - \mu - \sum_{p=1}^{P}\sum_{j=1}^{N}\beta_{i,j}^p\, x_{j,t-p}\Big)^2 + \lambda_1 \sum_{p=1}^{P}\sum_{j=1}^{N}\big|\beta_{i,j}^p\big| \qquad (2)$$
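As an illustration of equation (2), a minimal sketch of a LASSO fit on a lagged design matrix. The helper `build_lagged_X`, the toy data, and the value of the penalty are ours, not the paper's:

```python
import numpy as np
from sklearn.linear_model import Lasso

def build_lagged_X(X, P):
    """Stack lags 1..P of every column of X (T x N) into a (T-P) x (N*P) matrix."""
    T, N = X.shape
    return np.hstack([X[P - p:T - p, :] for p in range(1, P + 1)])

rng = np.random.default_rng(0)
T, N, P, h = 120, 10, 4, 1
X = rng.standard_normal((T, N))
y = np.zeros(T)
y[2:] = X[:-2, 0] + 0.1 * rng.standard_normal(T - 2)  # y_t driven by x_{1,t-2}

Z = build_lagged_X(X, P)[:-h]      # predictors dated t-1..t-P, aligned with t+h
target = y[P + h:]                 # x_{i,t+h}
model = Lasso(alpha=0.1).fit(Z, target)
selected = np.flatnonzero(model.coef_)   # surviving lagged predictors
```

With h = 1, the lag-1 column of the true driver matches the target exactly, so the LASSO keeps it while zeroing most of the noise columns.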
In this class of models the shrinkage parameter is λ2, and it allows us to use the normal
equations to minimize equation (3). To estimate both the coefficients and the critical parameters of
the Ridge regression we proceed as for the LASSO, using the “brute force” cross-validation.
where there are two penalty coefficients, meaning that our minimization problem has to fulfill two
constraints. The Elastic Net regression allows a lower instability in the variable selection since it
decreases the uncertainty related to large covariance matrices. The above formula refers to the naïve
Elastic Net, which can introduce extra bias in the estimates because of the double shrinkage in equation
(4), and it does not reduce the variance. Writing $\alpha = \lambda_2/(\lambda_1+\lambda_2)$, the above formula becomes:

$$\hat{\beta}_i^p = \arg\min_{\beta}\; \frac{1}{2}\sum_{t=1+P}^{T}\Big(x_{i,t+h} - \mu - \sum_{p=1}^{P}\sum_{j=1}^{N}\beta_{i,j}^p\, x_{j,t-p}\Big)^2 + (1-\alpha)\sum_{p=1}^{P}\sum_{j=1}^{N}\big|\beta_{i,j}^p\big| + \alpha\sum_{p=1}^{P}\sum_{j=1}^{N}\big(\beta_{i,j}^p\big)^2 \qquad (5)$$
Following Zou and Hastie (2005), augmenting the data¹, we can determine the Elastic Net estimates
($\hat{\beta}^{en}$) as:

$$\hat{\beta}^{en,p}_{i,j} = (1+\lambda_2)\,\beta^{*p}_{i,j}$$

where β* are the naïve Elastic Net estimates obtained from equation (4) using the augmented data.
This scaling transformation preserves the variable-selection property and undoes the extra shrinkage
due to the Ridge component in the Elastic Net. The reason why (1 + λ2) is selected as the scaling factor
is the decomposition of the Ridge operator. Moreover, Zou and Hastie (2005) show (Th. 2²) that
the Elastic Net can be seen as a decorrelation estimator solved by a LASSO minimization. To simplify
the notation, and using matrix form, we can reframe equation (5) as follows:

$$\hat{\beta} = \arg\min_{\beta}\; \beta'\,\frac{X'X + \lambda_2 I}{1+\lambda_2}\,\beta - 2y'X\beta + \lambda_1|\beta|_1 \qquad (6)$$

where β is the vector containing the Elastic Net estimates for all predictors, and (1 + λ2) is the decorrelation
operator that stabilizes the parameter estimation and controls the estimation of the variance.
To calibrate the values of the critical parameters we proceed as for the previous models, using the “brute
force” cross-validation.
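The augmented-data construction behind this rescaling (Zou and Hastie, 2005) can be checked numerically. The sketch below verifies that the naïve Elastic Net criterion equals a LASSO criterion on augmented data; all variable names and the toy data are ours:

```python
# Numerical check of the Elastic Net augmentation identity: the naive EN
# criterion in beta equals a LASSO criterion in beta* = sqrt(1+lam2)*beta
# on augmented data.
import numpy as np

rng = np.random.default_rng(1)
T, N = 60, 5
X = rng.standard_normal((T, N))
y = rng.standard_normal(T)
beta = rng.standard_normal(N)
lam1, lam2 = 0.3, 0.7

# Naive Elastic Net criterion
en = (np.sum((y - X @ beta) ** 2)
      + lam2 * np.sum(beta ** 2)
      + lam1 * np.sum(np.abs(beta)))

# Augmented-data LASSO criterion: X* = (1+lam2)^(-1/2) [X; sqrt(lam2) I], y* = [y; 0]
X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(N)]) / np.sqrt(1 + lam2)
y_aug = np.concatenate([y, np.zeros(N)])
beta_star = np.sqrt(1 + lam2) * beta
lasso = (np.sum((y_aug - X_aug @ beta_star) ** 2)
         + lam1 / np.sqrt(1 + lam2) * np.sum(np.abs(beta_star)))

assert np.isclose(en, lasso)  # the two criteria coincide for any beta
```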
$$x_t = \Lambda f_t + \xi_t \qquad (7)$$
1 Details in Appendix A.2.
2 Proof in Appendix A.2.
3 EWPCA
The Elastically Weighted Principal Component Analysis (EWPCA) is a two-step forecasting procedure
that allows us both to select the most important variables with respect to a target variable and to
perform a parsimonious forecasting exercise. Using the same notation as in Section 2.1, in the first step
we perform the variable screening using the Elastic Net soft-thresholding formalized by Zou and
Hastie (2005), minimizing the objective function in equation (5) framed as a scaled minimization
problem so as not to bias the estimates. We use this penalization technique because it overcomes the two
most significant shortcomings of LASSO. First, it makes the variable selection more stable
over time, because as new data become available the subset of the most important predictors does
not change considerably; second, it deals in an optimal way with multicollinear data, i.e. with variables
that have a very high pairwise correlation. LASSO keeps only one of these variables, regardless of which
one, while the Elastic Net, with its double regularization, is able to pick the most important one,
thanks to the decorrelation mechanism.
Following Zou and Hastie (2005), the β̂ estimates of the Elastic Net presented in Section 2.3 can be
written in matrix form as:³

$$\hat{\beta} = \arg\min_{\beta}\; \beta'\,\frac{X'X + \lambda_2 I}{1+\lambda_2}\,\beta - 2y'X\beta + \lambda_1|\beta|_1 \qquad (8)$$
3 For convenience we drop all the subscripts and superscripts referring to variable i and lags p.
where y_{t+h} = x_{i,t+h}, α1(L) are the coefficients of the target variable and its lags up to pY, and α2(L)
are the coefficients of the factors F̂_t and their lags up to pF. We consider only a linear combination of
the factors, and we do not add any non-linear feature to our estimation procedure, as in Bai and Ng
(2008), since it does not help to improve the forecasts of quarterly variables. In this second step, we
use principal component analysis because it allows us to represent the space in a parsimonious way.
Indeed, the estimated factors reduce the number of parameters to estimate and the model uncertainty
related to them. Moreover, the bulk of the variation in X* can be explained by a few mutually orthogonal
linear combinations of the variables selected in the first step, through the factor representation.
Therefore, to compute the forecast, we need to set five hyperparameters, ϑ = (λ1, λ2, r, pY, pF), and
they are estimated as defined in the previous section. In the empirical application, consistency checks
are performed to evaluate this forecast estimation procedure and to show the improved stability of the
variable selection over time with respect to the standard LASSO.
This two-step procedure also helps the macroeconomic interpretability of the forecast and the
model, since from step one we can see which are the most important variables, and their
weights, that help to forecast the target variable. Moreover, from the second step, we can determine
which variables contribute most to the factors in the forecasting exercise.
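A rough sketch of the two-step idea follows. The hyperparameter values, the toy data, and the use of scikit-learn's Elastic Net are our illustrative choices, not the authors' code:

```python
# Step 1: Elastic Net screens and weights the predictors.
# Step 2: PCA on the weighted, selected columns gives the factors used to forecast.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
T, N, r, h = 150, 30, 3, 1
X = rng.standard_normal((T, N))
y = np.zeros(T)
y[h:] = X[:-h, :5] @ np.ones(5) + 0.5 * rng.standard_normal(T - h)

# Step 1: Elastic Net selection and weighting
en = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X[:-h], y[h:])
keep = np.flatnonzero(en.coef_)
X_star = X[:, keep] * en.coef_[keep]      # "elastically" weighted data matrix

# Step 2: PCA on the weighted matrix, then a factor-based forecast equation
factors = PCA(n_components=r).fit_transform(X_star)
coef, *_ = np.linalg.lstsq(factors[:-h], y[h:], rcond=None)
y_hat = factors[-1] @ coef                # h-step-ahead point forecast
```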
• ϑLASSO = (λ1, p): the amount of shrinkage and the number of lags to be included in the prediction
equation;⁴
• ϑDFM = (r, pY, pF): the number of factors and the number of lags of both the dependent and
the independent variables;
• ϑEWPCA = (λ1, λ2, r, pY, pF): the amount of shrinkage and selection, the number of factors, and the
number of lags of the dependent and independent variables.
4 Further details are included in Section 2.3.
Most of these parameters can be estimated using criteria proposed in the literature. In particular,
the number of lags in the regression (p, pY, pF) is usually estimated by information criteria like the Akaike
Information Criterion (AIC), the Bayesian Information Criterion (BIC) or the Schwarz Information Criterion
(SIC), while the optimal amount of shrinkage (λ1, λ2) is estimated using the cross-sectional K-fold cross-
validation procedure or other similar techniques, such as “leave-one-out” or what is proposed
by Bergmeir et al. (2015). The number of factors (r) can be estimated using the methods proposed
by Alessi et al. (2010) (ABC) or Bai and Ng (2002).
All these criteria rely on evaluating the performance of the model, expressed in terms of fit
and complexity, as the single hyperparameter of interest is varied. But it could be the case that
combining them is not enough to find the optimal prediction setup. In this paper, we propose
an estimation method that treats all the hyperparameters of the model uniformly.
The estimation procedure we propose is as follows:
ii. to evaluate the performance of all possible compositions of ϑ in the calibration set, we define a
rolling window scheme. We choose the size of this window as τ = (‖ϑ‖ + h) × 2, which roughly
corresponds to a little less than 20 years;
iii. for each rolling window we compute {ŷ_{t+h}(ϑ)}_{t∈cal}, i.e. the h-step-ahead prediction of the
dependent variable based on a specific composition of the vector ϑ. In particular, we aim to
try all possible combinations of hyperparameters. In practice, for each of them, we define a set
of values sufficiently large to include the optimal one, because we want to avoid results on the
bounds of our grid of values. For each prediction horizon we compute
$MSFE_h^i(\vartheta) = \sum_t \big(\hat{y}_{t+h}(\vartheta) - y_{t+h}\big)^2$, the Mean Square
Forecast Error of model i based on the hyperparameters in ϑ;
iv. the optimal set of hyperparameters ϑ* is such that MSFE_h^i(ϑ*) is minimum;
v. ϑ* is then used in the proper dataset to produce the prediction of the dependent variable.
For each model, the MSFE_h^i is computed, and it is the basis of the performance comparison.
The procedure described above is data-driven, and it can be seen as a greedy cross-validation
algorithm in which we take into consideration all the information available. We call it “brute force”.
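The rolling-window "brute force" search can be sketched as follows. The `ridge_fit_predict` routine, the grids, the window size, and the toy data are illustrative placeholders, not the paper's implementation:

```python
# Evaluate every hyperparameter combination on rolling calibration windows
# and keep the MSFE-minimising one.
import itertools
import numpy as np

def ridge_fit_predict(X_win, y_win, x_new, theta, h):
    """Fit y_{t+h} on x_t by ridge within the window, predict from x_new."""
    Z, tgt = X_win[:-h], y_win[h:]
    beta = np.linalg.solve(Z.T @ Z + theta["lam"] * np.eye(Z.shape[1]), Z.T @ tgt)
    return x_new @ beta

def rolling_msfe(y, X, fit_predict, theta, tau, h):
    """Average squared h-step forecast error for one hyperparameter vector."""
    errors = []
    for end in range(tau, len(y) - h):
        y_hat = fit_predict(X[end - tau:end], y[end - tau:end], X[end], theta, h)
        errors.append((y_hat - y[end + h]) ** 2)
    return float(np.mean(errors))

def brute_force(y, X, fit_predict, grids, tau, h):
    """Try all combinations of the grids; return the MSFE-minimising theta."""
    keys = list(grids)
    combos = list(itertools.product(*grids.values()))
    scores = [rolling_msfe(y, X, fit_predict, dict(zip(keys, c)), tau, h)
              for c in combos]
    return dict(zip(keys, combos[int(np.argmin(scores))]))

rng = np.random.default_rng(5)
T = 100
X = rng.standard_normal((T, 3))
y = np.zeros(T)
y[1:] = X[:-1] @ np.array([1.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(T - 1)
best = brute_force(y, X, ridge_fit_predict, {"lam": [0.1, 1.0, 10.0]}, tau=40, h=1)
```

The same `brute_force` driver can score joint grids over several hyperparameters at once, which is the point of treating them uniformly.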
5.1 Cross-validation
In the cross-validation procedure, we aim to determine the optimal values of the hyperparameters for
each model within the calibration set, to be used in the whole forecasting exercise. As we have outlined
in Section 4, each model has its own set of parameters to calibrate, and the best parameters are selected
in the sense of minimizing the MSFE for each horizon. The general setup illustrated in Section 2
is used for the cross-validation step, where we also consider the autoregressive component of the target
variable and the covariates, to calibrate the hyperparameters with a forward-looking procedure.⁶
Table 1 shows the optimal values of the critical parameters selected by this cross-validation procedure.
To initialize and find the grid for λ1 and λ2, related to the variable selection and shrinkage on the
coefficients, we implement a common algorithm used in Hastie et al. (2009). As mentioned above, the
“brute-force” cross-validation procedure is compared to the classical criteria to check consistency and
stability in the values of the critical parameters.
Table 1: Optimal values of the critical parameters selected by cross-validation

Forecasting horizon      1        2        3        4        8
LASSO
  λ1                  0.3266   0.2646   0.3164   0.3186   0.2764
  p                   2        2        2        2        2
RIDGE
  λ2                  6.0753   7.9300   6.0652   6.0652   9.7598
  p                   0        0        0        0        0
DFM
  r                   14       10       12       10       3
  pY                  1        1        1        2        1
  pF                  9        7        7        9        9
EWPCA
  λ1                  0.9141   0.5377   0.8644   0.4623   0.7638
  λ2                  0.030    0.2935   0.0792   0.0372   0.008
  r                   14       4        16       18       18
  pY                  1        1        1        1        1
  pF                  7        9        9        7        6
It is interesting to note that the Ridge regression does not admit any lag of the target nor the
6 See Section 2 for more details.
5.1.1 LASSO
The vector of hyperparameters to calibrate for the LASSO is ϑLASSO = (λ1, p), where λ1 is the
amount of shrinkage and p the number of lags of the covariates and the target variable. The grid
of values for the number of lags is p = 0, ..., P, where P = 12 is chosen to be large because we want
to avoid solutions on the boundary; p can also be equal to 0, in which case no lags are considered. The lag
refers to both the covariates and the target variable because, as shown in equation (2), the vector
x_j also contains the lags of the target variable.
As a first step, we create a grid of values for λ1, i.e., the parameter governing the amount of shrinkage.
This grid is composed of 200 different values, such that the highest corresponds to the largest value
of λ1 for a model with at least one coefficient different from zero when we run a regression on
the whole calibration set (not the single time window). The values of λ1 go from 0.005 to 0.995; we do
not consider λ1 = 0 because it corresponds to the original OLS regression. Using this approach, we do
not perform the K-fold cross-validation usually applied to LASSO regressions, differently from what
is proposed by Bergmeir et al. (2015), since it deteriorates our results in prediction.⁷ Once we have
this grid, for each time window and each lag, we compute the prediction of the dependent variable
using the coefficients from the LASSO regression for every λ1, one at a time. We save the optimal
λ̂1, the optimal lag p, and the optimal β_LASSO. The optimal values selected are the ones that correspond
to the minimum MSFE produced in the calibration set, and they are kept fixed in the proper sample.
The optimal β̂ will be used in Section 5.3 for the variable selection stability check. This procedure is
implemented for each forecast horizon h = 1, 2, 3, 4, 8.
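One common way to anchor such a grid, which may differ from the paper's exact construction, uses the smallest penalty that zeroes every LASSO coefficient; the sketch below is ours:

```python
# For the (1/2n)*RSS + lam*|beta|_1 objective with centered data, the smallest
# penalty that sets every LASSO coefficient to zero is lam_max = max_j |x_j' y| / n.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_lambda_grid(X, y, n_values=200, eps=0.005):
    """Log-spaced grid from eps*lam_max up to lam_max, largest value first."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    lam_max = np.max(np.abs(Xc.T @ yc)) / n
    return lam_max * np.logspace(np.log10(eps), 0, n_values)[::-1]

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 20))
y = X[:, 0] + 0.1 * rng.standard_normal(100)
grid = lasso_lambda_grid(X, y)

# At the top of the grid the LASSO solution is the all-zero coefficient vector
top = Lasso(alpha=grid[0]).fit(X, y)
```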
5.1.2 Ridge
The cross-validation procedure for the vector of hyperparameters in the Ridge regression,
ϑRidge = (λ2, p), is similar to the one performed for the LASSO, with the difference that the grid of values
used to cross-validate λ2 is created following the procedure proposed by Hastie, Tibshirani, and Friedman (2009).
In order to evaluate each time window of the calibration set using the same grid of λ2, we perform the
SVD of the augmented matrix X2′X2, which also includes the target variable, X2 = (y, X′)′, with the
time span of the data going from 1, the first observation, up to Tcal − h. Then, we multiply the
7 Alternative for the grid of λ1, from Hastie et al. (2009): perform a Singular Value Decomposition (SVD) of the matrix
X′X, take the square of the highest eigenvalue, and then compute a grid of equidistant lambdas. We do not consider
this method in our paper and leave the investigation of the properties of this approach to future research.
The vector of critical parameters in the Dynamic Factor Model is ϑDFM = (r, pY, pF), where r is
the number of factors, which goes from 1 to 20, also in this case to avoid boundary solutions; pY is
the number of lags of the target variable only, differently from LASSO and Ridge; and pF is the
number of lags of the factors, which goes from pF = 1, ..., 10. The factors of the dynamic factor model
are estimated from the data matrix without using lags, through principal component analysis.
The critical parameters are estimated using the same rolling window scheme used for LASSO and
Ridge; for each time window we compute the prediction of the dependent variable and store
the set of optimal parameters (r̂, p̂Y, p̂F) that minimize the MSFE_h for each forecast horizon.
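A minimal sketch of this factor-based forecast step follows; the values of r, pY, pF, the toy data, and the alignment conventions are our illustrative choices:

```python
# Factors from PCA on the standardized, unlagged data matrix, then a regression
# of y_{t+h} on lags of y and of the factors.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
T, N, r, pY, pF, h = 200, 40, 3, 1, 2, 1
X = rng.standard_normal((T, N))
y = X[:, 0] + 0.5 * rng.standard_normal(T)

F = PCA(n_components=r).fit_transform((X - X.mean(0)) / X.std(0))

# Regressor at time t: (y_t, ..., y_{t-pY+1}, F_t, ..., F_{t-pF+1})
start = max(pY, pF) - 1
rows = [np.concatenate([y[t - pY + 1:t + 1][::-1],
                        F[t - pF + 1:t + 1][::-1].ravel()])
        for t in range(start, T)]
Z = np.asarray(rows)

# Regress y_{t+h} on the time-t regressors, forecast from the last observation
coef, *_ = np.linalg.lstsq(Z[:-h], y[start + h:], rcond=None)
y_hat = Z[-1] @ coef
```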
5.1.4 EWPCA
The Elastically Weighted PCA has a vector of five hyperparameters to calibrate, ϑEWPCA = (λ1, λ2, r, pY,
pF). The grid of values for each critical parameter is determined as for the previous models using the
rolling window estimation, and we store the optimal set of parameters (λ̂1, λ̂2, r̂, p̂Y, p̂F) that minimizes
the MSFE_h for each forecast horizon. To reduce the computational burden, for the EWPCA we use a
grid of fifty values for λ1 and fifty values for λ2.
Forecasting horizon 1 2 3 4 8
EWPCA vs AR(1) 0.4097∗∗∗ 0.5156∗∗∗ 0.5148∗∗∗ 0.2940∗∗∗ 0.6363∗∗∗
EWPCA vs AR(p) 0.4263∗∗∗ 0.5080∗∗∗ 0.5168∗∗∗ 0.2943∗∗∗ 0.6489∗∗∗
Notes: Entries are Relative Mean Square Forecast Errors of the EWPCA with respect to the AR(1) and
AR(bic). Bold values mean that we reject the null hypothesis while the asterisks indicate at which level we
reject the null hypothesis. ∗∗∗ p < 0.01, ∗∗ p < 0.05, ∗ p < 0.1
In Table 2 we can see that our methodology performs better than both the AR(1) and the AR(p) at
each forecast horizon. The results are all significant with p < 0.01. We notice that the performance of
our methodology deteriorates faster than that of both AR specifications as the forecast horizon increases,
but it still outperforms these two standard benchmarks.
The better forecasting performance is due to the two steps of our methodology, in which we are able
to select the most important variables from which we extract the principal components. In both steps
our procedure tries to be parsimonious in the variables selected, avoiding the multicollinearity issue thanks
to the Elastic Net penalization procedure; the subsequent use of principal component analysis then allows for
a parsimonious forecasting method. Moreover, the cross-validation procedure helps us detect the
optimal values of the critical parameters to use in the out-of-sample exercise. In all the steps of our
methodology, including the cross-validation, we want to preserve the forecasting ability of the variables
and let this feature drive the procedure.
Forecasting horizon 1 2 3 4 8
EWPCA vs Lasso 0.3208∗∗∗ 0.3633∗∗∗ 0.3922∗∗∗ 0.2297∗∗∗ 0.4807∗∗∗
EWPCA vs Ridge 0.2610∗∗∗ 0.3088∗∗∗ 0.1680∗∗∗ 0.1315∗∗∗ 0.2234∗∗∗
EWPCA vs DFM 0.9164∗∗ 1.3889 0.6644∗∗∗ 0.7067∗∗∗ 0.9108∗∗∗
Notes: Entries are Relative Mean Square Forecast Errors of the EWPCA with respect to the LASSO, Ridge,
and DFM. All the critical parameters have been selected using the cross-validation procedure in section 2.3.
Bold values mean that we reject the null hypothesis while the asterisks indicate at which level we reject the
null hypothesis. ∗∗∗ p < 0.01, ∗∗ p < 0.05, ∗ p < 0.1
Forecasting horizon 1 2 3 4 8
EWPCA 0.1647 0.3146 0.2823 0.1614 0.3731
DFM 0.1797 0.2488 0.4249 0.2283 0.4097
AR(1) 0.4387 0.4826 0.5484 0.5487 0.5863
AR(p) 0.4216 0.4898 0.5463 0.5481 0.5749
Notes: Entries are Mean Square Forecast Errors of the EWPCA, DFM, AR(1), and AR(bic). All the critical
parameters have been selected using the cross-validation procedure in section 2.3.
forecast errors between the models, but as we can see in Table 4, the difference in MSFE, even
between EWPCA and DFM at h = 2, when the DFM outperforms the EWPCA, is not so large.¹⁰ The gap
between the EWPCA and the DFM increases as the forecast horizon becomes larger; hence we can
assert that the EWPCA produces lower MSFEs than the DFM at longer forecast horizons. We can
also notice that the MSFE of each model increases as the forecast horizon increases. In general, we
10 We also report the AR(1) and AR(bic) in the table to show their MSFE values.
Notes: Each column in the graph represents the fluctuation test of the EWPCA versus DFM (column 1),
LASSO (column 2) and Ridge (column 3) for each forecast horizon. The dotted lines indicate the 5%
critical values.
Notes: Each column in the graph represents the fluctuation test of the EWPCA versus AR(1) (column 1)
and AR(bic) (column 2) for each forecast horizon. The dotted lines indicate the 5% critical values.
In this paragraph, we analyze the model sparsity and the increased interpretability obtained thanks to
the Elastic Net soft-thresholding. In Table 5 we show the model sparsity of the LASSO versus the
Elastic Net regression used in step one of our procedure. We summarize the sparsity as the proportion
of independent variables selected by these shrinkage methods in the out-of-sample exercise for each
forecast horizon. To compute this measure, we simply count how many predictors are selected in each
rolling window and then take the average. The standard deviation of this measure shows whether
the number of variables selected is consistent over time.
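The measure, as we read it, can be computed as follows; the toy coefficient paths stand in for the estimated ones:

```python
# Per rolling window, the fraction of predictors with nonzero coefficients,
# then its mean and standard deviation across windows.
import numpy as np

def selection_sparsity(coef_paths):
    """coef_paths: (n_windows, N) array of coefficients, one row per window."""
    frac = (coef_paths != 0).mean(axis=1)   # share of predictors selected per window
    return float(frac.mean()), float(frac.std())

# Toy paths: 3 windows, 4 predictors; selected fractions are 0.25, 0.5, 0.0
paths = np.array([[1.0, 0.0, 0.0, 0.0],
                  [1.0, 2.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0]])
mean_sel, sd_sel = selection_sparsity(paths)
```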
The number of variables selected by the Elastic Net is much larger than that selected by the
LASSO, but it is quite stable over time; this means that the group of variables selected in each
rolling window comprises almost the same predictors. Moreover, this result is supported by
the fact that the standard deviation is very small, meaning that the number of variables selected
in each rolling window does not diverge. It is interesting to notice that the number of variables
selected by the LASSO is very small at each h; it might be the case that the out-of-sample data
structure changes considerably with respect to the calibration data set, making the optimal values of
the hyperparameters inadequate to select the predictors in the proper data set. Furthermore, this
difference can be explained by the fact that the LASSO imposes a sparse structure on the data, while the
Elastic Net adopts both penalizations (the L1 and L2 norms) without imposing a strict sparse structure,
and is therefore able to “kill” fewer predictors.
Forecast horizon        1               2               3               4               8
                    mean  st.dev   mean  st.dev   mean  st.dev   mean   st.dev   mean   st.dev
EWPCA               0.288 0.026    0.068 0.032    0.173 0.016    0.406  0.035    0.515  0.035
LASSO               0.014 0.022    0.014 0.013    0.002 0.008    0.0015 0.005    0.0015 0.006
Notes: At each forecast horizon we compute the mean and standard deviation of the proportion of predictors
selected across time for LASSO and EWPCA.
Figure 3 shows the heatmaps of the variables selected by the Elastic Net in step one of the
EWPCA and their stability over time. We can notice that many variables, especially at the first
three forecast horizons, are not selected, while some of them are selected in almost all the rolling
windows. This fact can enhance the model interpretability and help detect which variables are the
most important to forecast the target variable at each forecast horizon. We can assert that, even if we
do not impose a sparse structure, for h = 1, 2, 3 the data structure is sparse enough, and the variables
selected, thanks to the decorrelation correction of the Elastic Net, are stable over time. This evidence
overcomes one of the drawbacks of the LASSO, namely the instability of the predictors’ selection over time.
We can notice that at short horizons, h = 1 and 3, the core predictors selected are almost the same,
and it might be the case that the data set contains some variables with a lot of forecasting ability at
short horizons, while others can be completely useless; hence, our two-step procedure can give us a less
noisy measure of GDP.
The situation at longer horizons (i.e. h = 4 and 8) is different because we cannot see a clear sparsity
Notes: Each graph represents the variables selected (yellow points) by the Elastic Net soft-thresholding for
each rolling window, at each forecast horizon.
In this paragraph, we study extensively the ability of our cross-validation procedure to detect the best
hyperparameter combination, with respect to the common criteria. To determine which method is
better, we use the MSFE for each horizon computed in the proper set. The criteria used are the ABC
criterion (Alessi, Barigozzi and Capasso, 2010) for the selection of the number of factors, and the Schwarz
criterion to determine the number of lags of the target variable and the factors. The values of the critical
parameters determined by the “brute force” cross-validation are computed in the calibration set following
the procedure outlined in Section 2 and kept fixed for all the out-of-sample periods. The values
of the hyperparameters estimated through the criteria are computed for each out-of-sample rolling
window, in order to always select the best value. The criteria can adapt the optimal value at each time,
and this is an advantage with respect to our cross-validation procedure. To calibrate λ1 and λ2 we
Table 6: Mean Square Forecast Errors EWPCA “brute force” versus criteria
Forecast horizon 1 2 3 4 8
Size = 40%
EWPCAbf 0.1647 0.3146 0.2823 0.1614 0.3731
EWPCAabc 1.4607 1.2910 1.1325 1.3112 1.8725
EWPCAs 0.3571 0.4790 0.8911 0.5192 0.5417
EWPCAabc−s 0.4278 0.4921 0.7409 0.5833 0.5755
Size = 50%
EWPCAbf 0.1169 0.1932 0.2436 0.1286 0.1587
EWPCAabc 1.8345 0.4900 2.2464 0.9961 0.5020
EWPCAs 0.3464 0.3843 0.8824 0.3838 0.5884
EWPCAabc−s 0.3360 0.4605 0.5852 0.5485 0.7795
Size = 70%
EWPCAbf 0.1102 0.1319 0.2947 0.1624 0.1774
EWPCAabc 3.6321 2.3120 1.2297 0.3313 0.6540
EWPCAs 0.3457 1.1921 0.4432 0.4076 0.7028
EWPCAabc−s 0.5473 0.5791 0.8976 0.6653 1.5115
Notes: Entries are Mean Square Forecast Errors of the EWPCA calibrated in different ways. Bold values
are the best forecast at each forecast horizon among the different model specifications.
In Table 6 we show the out-of-sample MSFE at each forecast horizon, using different sizes of the
calibration set ({40%, 50% and 70%} of T), for different EWPCA_c, where c indicates the criterion used,
c = {bf, abc, s, abc-s}.¹¹ The values of the critical parameters for each size and model are reported
in the online appendix.¹² We can notice that in all cases the MSFE referring to our “brute force”
cross-validation procedure is the lowest for each forecast horizon and size of the calibration data set,
and it is highlighted in bold. We can also notice that the MSFE of our cross-validation
procedure is stable across the different sizes used for this accuracy and consistency check. Looking at
the other values in Table 6, we can assert that the ABC criterion, used to select the number of factors,
is the most unstable: its MSFE is highly volatile across both the forecast horizons and the sizes of the
calibration data set. This effect can be due to the reduction of the high-dimensional space performed
in the first step of the EWPCA, making the selection criterion unstable and unable to detect the optimal
number of factors.¹³
It is clear from Table 6 that treating all the parameters in the same way and exploiting the space
of all the possible combinations helps to improve the forecasting ability of the EWPCA, which has
a considerable number of hyperparameters. The hybrid methods, EWPCAabc and EWPCAs, where we
11 bf = brute force, abc = ABC, s = SIC, abc-s = ABC-SIC.
12 Available upon request.
13 We perform an extensive exercise where we compare the ABC criterion against a simpler factor selection, where we fix
the number of factors = {1, 2, 3}, and we notice that for each forecast horizon the latter method produces lower MSFEs
with a more stable pattern.
In this section, we perform the last consistency check: we compare the EWPCA calibrated using the
brute force method with the other models at each forecast horizon, for different sizes of the calibration
set, as in the paragraph above. The hyperparameters of the LASSO, Ridge, and DFM are estimated
using the same cross-validation procedure outlined in Section 4, but with different sizes of the dataset.
Table 7: Mean Square Forecast Errors, EWPCA brute force versus other models
Forecast horizon 1 2 3 4 8
Size = 50%
EWPCAbf 0.1169 0.1932 0.2436 0.1286 0.1587
Lasso 0.4523 0.5695 0.6315 0.6224 0.7351
Ridge 0.5857 0.7056 0.9247 1.0275 1.2355
DFMbf 0.0911 0.2528 0.2015 0.2608 0.3471
Size = 70%
EWPCAbf 0.1102 0.1319 0.2947 0.1624 0.1774
Lasso 0.5561 0.8074 0.9525 0.9276 1.2516
Ridge 0.5949 0.6790 0.9270 1.0839 1.2828
DFMbf 0.1508 0.2843 0.1987 0.3665 0.3208
Notes: Entries are Mean Square Forecast Errors of the EWPCA, LASSO, Ridge and DFM using different
sizes of the calibration data set. Bold values are the best forecast at each forecast horizon among the different
model specifications.
In Table 7 we report the out-of-sample MSFE for all the models using different sizes of the calibration
data set ({50% and 70%}). Bold characters highlight the lowest mean square forecast error for
each forecast horizon. We can notice that when the calibration size is equal to 50% of the
sample, the EWPCA is on average the model that performs better and has the most stable MSFE
across the different horizons. At h = 1 and 3 the DFM slightly outperforms the EWPCA, and for h = 1
the difference is statistically significant, meaning that the DFM outperforms the EWPCA in this
case. When the calibration set is equal to 70% of our data, the EWPCA performs much better than
the DFM at each forecast horizon, except h = 3; also in this case, the volatility of the mean
square forecast error is lower than that of the DFM, meaning that the results of the EWPCA are more stable.
6 Conclusion
Forecasting high dimensional time series is an important and timely problem that researchers are addressing both theoretically and empirically. Until recently, especially in economics, the words “big” or “high dimensional” referred to the simultaneous modeling of 15-20 variables to predict the target variable, in order to avoid the curse of dimensionality in the parameter estimation and to preserve the interpretability of the model. More recently, dense methods such as the DFM have been able to exploit many more data series at the same time, but at the cost of model interpretability. This paper proposes a new method that can handle a huge number of series and improve the forecast of the target variable without losing the model interpretability that is useful for macroeconomic implications.
The EWPCA selects the variables with the highest predictive power and, through principal component analysis, extracts the most important information to forecast the target variable. This method is preferred to the DFM because it performs better and enhances the interpretability of the model in a high dimensional and complex space where many of the variables are correlated. Moreover, the EWPCA delivers a variable selection that is stable over time, overcoming the drawbacks of the LASSO selection. In the first step we decorrelate the variables and select, for each forecast horizon, the optimal subsample of predictors. The empirical evidence shows that the optimal subsample changes at each forecast horizon, meaning that the variables have different forecasting ability due to the different structure and complexity of the data.
Finally, this work also proposes an alternative cross-validation procedure for high dimensional time series, called “brute force”, which gives overall more stable results in terms of MSFE and treats all the critical parameters in the same way by exploiting the space of all possible combinations.
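To fix ideas, exhaustively scoring every hyperparameter combination on an expanding-window out-of-sample MSFE can be sketched as follows. This is an illustrative sketch, not the paper's exact procedure; `fit_predict` is a hypothetical user-supplied routine that fits a model on the training window and returns an h-step-ahead forecast:

```python
import itertools
import numpy as np

def brute_force_cv(y, X, fit_predict, grids, min_train=60, horizon=1):
    """Score every hyperparameter combination on expanding-window MSFE.
    grids: dict mapping parameter name -> list of candidate values.
    Returns the combination with the lowest out-of-sample MSFE."""
    best_params, best_mse = None, np.inf
    for combo in itertools.product(*grids.values()):
        params = dict(zip(grids, combo))
        errors = []
        # expanding window: train on y[:s], forecast y[s + horizon - 1]
        for s in range(min_train, len(y) - horizon + 1):
            forecast = fit_predict(y[:s], X[:s], params, horizon)
            errors.append((y[s + horizon - 1] - forecast) ** 2)
        mse = float(np.mean(errors))
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse
```

Because every combination is evaluated on the same expanding-window splits, all hyperparameters are treated uniformly and the longitudinal ordering of the data is never broken.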
The T-transformation codes, Tcode in the above table, refer to how we transform a raw time series into a stationary one. Letting $X_t$ be a raw series, the transformations adopted are:
$$
Z_t =
\begin{cases}
X_t & \text{if Tcode} = 1 \\
(1-L)X_t & \text{if Tcode} = 2 \\
(1-L)(1-L^{12})X_t & \text{if Tcode} = 3 \\
\log X_t & \text{if Tcode} = 4 \\
(1-L)\log X_t & \text{if Tcode} = 5 \\
(1-L)(1-L^{12})\log X_t & \text{if Tcode} = 6
\end{cases}
$$
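For illustration, the six transformation codes can be implemented directly; the sketch below assumes monthly data (seasonal lag 12, as in the formulas above), and differencing shortens the series:

```python
import numpy as np

def seasonal_diff(x, s=12):
    """(1 - L^s) x: seasonal difference at lag s."""
    return x[s:] - x[:-s]

def transform(x, tcode):
    """Apply a T-transformation code (1-6) to a raw series x."""
    x = np.asarray(x, dtype=float)
    if tcode == 1:
        return x                                  # level
    if tcode == 2:
        return np.diff(x)                         # (1 - L) X_t
    if tcode == 3:
        return np.diff(seasonal_diff(x))          # (1 - L)(1 - L^12) X_t
    if tcode == 4:
        return np.log(x)                          # log X_t
    if tcode == 5:
        return np.diff(np.log(x))                 # (1 - L) log X_t
    if tcode == 6:
        return np.diff(seasonal_diff(np.log(x)))  # (1 - L)(1 - L^12) log X_t
    raise ValueError("Tcode must be between 1 and 6")
```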
$$
X^*_{(t+N)\times N} = (1+\lambda_2)^{-1/2}
\begin{pmatrix} X \\ \sqrt{\lambda_2}\, I \end{pmatrix},
\qquad
y^*_{(t+N)\times 1} = \begin{pmatrix} y \\ 0 \end{pmatrix}
$$

$$
\hat{\beta}^* = \arg\min_{\beta}\, \left| y^* - X^*\beta \right|_2^2 + \frac{\lambda_1}{\sqrt{1+\lambda_2}}\,|\beta|_1
$$

We can write the above equation using the definition of the Elastic Net in the following way:

$$
\hat{\beta}^* = \arg\min_{\beta}\, \left\| y^* - X^*\frac{\beta}{\sqrt{1+\lambda_2}} \right\|_2^2 + \frac{\lambda_1}{\sqrt{1+\lambda_2}}\left\|\frac{\beta}{\sqrt{1+\lambda_2}}\right\|_1 \tag{10}
$$

$$
= \arg\min_{\beta}\, \left( \beta'\,\frac{X^{*\prime}X^*}{1+\lambda_2}\,\beta \;-\; 2\,\frac{y^{*\prime}X^*}{\sqrt{1+\lambda_2}}\,\beta \;+\; y^{*\prime}y^* \;+\; \frac{\lambda_1|\beta|_1}{1+\lambda_2} \right) \tag{11}
$$

Using the definitions of $X^*$ and $y^*$:

$$
X^{*\prime}X^* = \frac{X'X + \lambda_2 I}{1+\lambda_2},
\qquad
y^{*\prime}X^* = \frac{y'X}{\sqrt{1+\lambda_2}},
\qquad
y^{*\prime}y^* = y'y
$$

so that

$$
\hat{\beta}^* = \arg\min_{\beta}\, \left\{ \frac{1}{1+\lambda_2}\left( \beta'\,\frac{X'X+\lambda_2 I}{1+\lambda_2}\,\beta - 2\,y'X\beta + \lambda_1|\beta|_1 \right) + y'y \right\} \tag{12}
$$

$$
= \arg\min_{\beta}\, \beta'\,\frac{X'X+\lambda_2 I}{1+\lambda_2}\,\beta - 2\,y'X\beta + \lambda_1|\beta|_1, \tag{13}
$$

since the constant $y'y$ and the positive factor $1/(1+\lambda_2)$ do not affect the minimizer.
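The augmented-data identities above can be verified numerically. A minimal NumPy sketch (the dimensions t, N and the value of λ2 are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
t, N = 50, 8                      # illustrative dimensions
X = rng.standard_normal((t, N))
y = rng.standard_normal(t)
lam2 = 0.5

# augmented design matrix and response used in the derivation
X_star = (1 + lam2) ** -0.5 * np.vstack([X, np.sqrt(lam2) * np.eye(N)])
y_star = np.concatenate([y, np.zeros(N)])

# X*'X* = (X'X + lam2 * I) / (1 + lam2)
assert np.allclose(X_star.T @ X_star, (X.T @ X + lam2 * np.eye(N)) / (1 + lam2))
# y*'X* = y'X / sqrt(1 + lam2)
assert np.allclose(y_star @ X_star, (y @ X) / np.sqrt(1 + lam2))
# y*'y* = y'y
assert np.isclose(y_star @ y_star, y @ y)
```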
[2] Bai, J., & Ng, S. Determining the number of factors in approximate factor models. Econometrica, 70(1), 191-221, 2002.
[3] Bai, J., & Ng, S. Forecasting economic time series using targeted predictors. Journal of Economet-
rics, 146(2), 304-317, 2008.
[4] Banbura, M., Giannone, D., & Reichlin, L. Large Bayesian vector auto regressions. Journal of Applied Econometrics, 25(1), 71-91, 2010.
[5] Bergmeir, C., Hyndman, R. J., & Koo, B. A note on the validity of cross-validation for evaluating autoregressive time series prediction. Computational Statistics & Data Analysis, 120, 70-83, 2018.
[6] D’Agostino, A., Gambetti, L., & Giannone, D. Comparing alternative predictors based on large-panel factor models. Oxford Bulletin of Economics and Statistics, 74(2), 306-326, 2012.
[7] De Mol, C., Giannone, D., & Reichlin, L. Forecasting using a large number of predictors: is Bayesian
shrinkage a valid alternative to principal components?. Journal of Econometrics, 146(2), 318-328,
2008.
[8] Diebold, F. X., & Mariano, R. S. Comparing predictive accuracy. Journal of Business and Economic Statistics, 13(3), 253-263, 1995.
[9] Forni, M., Giovannelli, A., Lippi, M., & Soccorsi, S. Dynamic factor model with infinite-dimensional factor space: Forecasting. CEPR Discussion Paper no. DP11161, 1-43, 2016.
[10] Forni, M., Hallin, M., Lippi, M., & Reichlin L. The generalized dynamic-factor model: Identifica-
tion and estimation. Review of Economics and Statistics, 82(4), 540-554, 2000.
[11] Forni, M., Hallin, M., Lippi, M., & Reichlin, L. The generalized dynamic-factor model: One-sided estimation and forecasting. Journal of the American Statistical Association, 100, 830-840, 2005.
[12] Forni, M., & Lippi, M. The generalized dynamic-factor model: One-sided representation results. Journal of Econometrics, 163(1), 23-28, 2011.
[13] Friedman, J., Hastie, T., & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22, 2010.
[14] Giacomini, R., & Rossi, B. Forecast comparison in unstable environments. Journal of Applied
Econometrics, 25(4), 595-620, 2010.
[16] Hastie, T., Tibshirani, R. & Friedman, J. The elements of statistical learning: Data mining,
inference and prediction. Springer Series in Statistics. New York, 2009.
[17] Kim, H. H., & Swanson, N. R. Forecasting financial and macroeconomic variables using data
reduction methods: New empirical evidence. Journal of Econometrics, 178, 352-367, 2014.
[18] Ledoit, O., & Wolf, M. Honey, I shrunk the sample covariance matrix. Economics Working Papers 691, Department of Economics and Business, Universitat Pompeu Fabra, 1-22, 2003.
[19] Ledoit, O., & Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices.
Journal of Multivariate Analysis, 88(2), 365-411, 2004.
[20] Li, J. & Chen, W. Forecasting macroeconomic time series: LASSO-based approaches and their
forecast combinations with dynamic factor models. International Journal of Forecasting, 30, 996-
1015, 2014.
[21] Marcellino, M., Stock, J. H., & Watson, M. W. A comparison of direct and iterated multistep AR methods for forecasting macroeconomic time series. Journal of Econometrics, 135, 499-526, 2006.
[22] McCracken, M., & Ng, S. FRED-MD: A monthly database for macroeconomic research. Journal of Business and Economic Statistics, 2015.
[23] McCracken, M., & Ng, S. FRED-MD: A quarterly database for macroeconomic research. Journal of Business and Economic Statistics, 2014.
[24] Ng, S. Variable selection in predictive regressions. In G. Elliott & A. Timmermann (Eds.), Handbook of Economic Forecasting, 2(B), 752-789, 2013.
[25] Stock, J. H., & Watson, M. W. Macroeconomic forecasting using diffusion indexes. Journal of
Business and Economic Statistics, 20, 147-162, 2002(a).
[26] Stock, J. H., & Watson, M. W. Forecasting using principal components from a large number of
predictors. Journal of the American Statistical Association, 97(460), 1167-1179, 2002(b).
[27] Tibshirani, R. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B, 58, 267-288, 1996.
[28] Zou, H., & Hastie, T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(2), 301-320, 2005.