1. Banque de France, [email protected]
2. CREST, CNRS, Ecole Polytechnique, ENSAE, [email protected]
We would like to thank Roberto Golinelli, Michele Lenza, Francesca Monti, Giorgio Primiceri, Simon Sheng,
Hal Varian, Pablo Winant and the participants of the 10th ECB Conference on Macro forecasting with large
datasets, and Data Day@HEC conference for useful comments. We would like to thank Per Nymand-
Andersen (ECB) for sharing the Google dataset as well as Dario Buono and Rosa Ruggeri-Cannata (Eurostat)
for sending the real-time euro area GDP data. We are grateful to Vivien Chbicheb for outstanding research
assistance. Anna Simoni gratefully acknowledges financial support from ANR-11-LABEX-0047 and the Fondation
Banque de France for its hospitality. A first version of this paper was circulated under the title
"Macroeconomic nowcasting with big data through the lens of a targeted factor model".
Working Papers reflect the opinions of the authors and do not necessarily express the views of the
Banque de France. This document is available on publications.banque-france.fr/en
Nowcasting GDP growth is extremely useful for policy-makers to assess macroeconomic conditions
in real time. The concept of macroeconomic nowcasting has been popularized by many researchers
(see e.g. Giannone et al., 2008) and differs from standard forecasting approaches in the sense that it aims at
evaluating current macroeconomic conditions on a high-frequency basis.
In the existing literature, GDP nowcasting tools integrate standard official macroeconomic
information stemming, for instance, from National Statistical Institutes, Central Banks, or International
Organizations. However, more recently, a lot of emphasis has been put on the possible gain that
forecasters can get from using alternative sources of high-frequency information, referred to as Big
Data (see for example Varian, 2014, Giannone et al., 2017, or Buono et al., 2018). One of the main
sources of alternative data is Google search; seminal papers on the use of such data for forecasting are
the ones by Choi and Varian (2009) and Choi and Varian (2012). Overall, empirical papers show
evidence of some forecasting power for Google data, at least for some specific macroeconomic
variables such as consumption (Choi and Varian, 2012). However, when Google data are properly
compared to other sources of information, the jury is still out on the gain that economists can get from
using them for forecasting and nowcasting.
In this paper, we estimate both pseudo real-time and true real-time nowcasts for the euro area
quarterly GDP growth between 2014q1 and 2016q1 by plugging Google data into the analysis, in
addition to official variables on industrial production and opinion surveys, commonly used as
predictors for GDP growth. The approach that we carry out is deliberately extremely simple and relies
on a bridge equation that integrates variables selected from a large set of Google data, as proposed by
Angelini et al. (2011). More precisely, we pre-select Google variables by targeting GDP growth with
the Sure Independence Screening method put forward by Fan and Lv (2008), which retains only the
Google variables most related to GDP growth before they enter the bridge equation. After pre-selection,
we use Ridge regularization to estimate the bridge equation, as the number of pre-selected
variables may still be large.
ABSTRACT
Real-time assessment (nowcasting) of the GDP growth rate is extremely useful for economic
policy-makers to gauge macroeconomic activity correctly at a high frequency. In this work, we
nowcast euro area GDP using a large database of Google searches by keywords. Our objective is to
check whether, and when, this type of information improves the accuracy of nowcasts once we control
for official survey and production variables. To do so, we estimate regression models that allow for
dimension reduction based on optimally pre-selected Google data, and we show empirically the
effectiveness of this approach for nowcasting euro area GDP. In particular, we show that Google data
contain useful information during the first four weeks of the quarter, when official macroeconomic
information about the current quarter is not yet available. However, once official data become
available, the relative gain from Google data dissipates rapidly. Finally, we show that an analysis
under true real-time conditions, based on vintage data, confirms all the previous results, notably that
Google data constitute a credible alternative when official data are not yet available.
Working Papers reflect the personal views of their authors and do not necessarily express the
position of the Banque de France. They are available on publications.banque-france.fr
In this paper, we estimate both pseudo real-time and true real-time nowcasts for the
euro area quarterly GDP growth between 2014q1 and 2016q1 by plugging Google data
into the analysis, in addition to official variables on industrial production and opinion
surveys, commonly used as predictors for GDP growth. Google data are indexes
of weekly volume changes of Google searches by keywords in the six main euro area
countries about different topics, which are gathered in 26 broad categories such as auto
and vehicles, finance, food and drinks, real estate, etc. Those broad categories are then
split into a total of 269 sub-categories per country, leading to a total of 1776 variables.
Four main stylized facts emerge from our empirical analysis. First, we point out
the usefulness of Google search data for nowcasting euro area GDP during the first four
weeks of the quarter, when no official information about the state of the economy is
available. Indeed, we show that at the beginning of the quarter, Google data provide
an accurate picture of the GDP growth rate. Against this background, such data are a
good alternative in the absence of official information and can be used by policy-makers.
Second, we find that as soon as official data become available, that is, starting from the
fifth week of the quarter, the gain from using Google data for GDP nowcasting rapidly
vanishes. This result contributes to the debate on the use of big data for short-term
macroeconomic assessment when controlling for standard macroeconomic information.
Third, we show that pre-selecting Google data before entering the nowcasting models
appears to be a pertinent strategy in terms of nowcasting accuracy. Indeed, this approach
enables us to retain only the Google variables that have some link with the targeted
variable. This result confirms previous analyses that have been carried out when dealing
with large datasets through dynamic factor models (see e.g. Bai and Ng [2008] or
Schumacher [2010]). Finally, we carry out a true real-time analysis by nowcasting the
euro area GDP growth rate using the official Eurostat timeline and vintages of data. We
show that the three previous results still hold in real time, in spite of an expected increase
in the size of errors, suggesting that Google search data can be effectively used in
practice to help the decision-making process.
The rest of the paper is organized as follows. In Section 2 we describe the model we
consider for nowcasting, the Sure Independence Screening (SIS) approach to pre-select
the data, as well as the Ridge regularization. Section 3 describes the structure of the
[2] See for example Bontempi et al. [2018] for a detailed description of this dataset, in a different framework.
2 Methodology
2.1 The nowcasting approach
In order to get GDP nowcasts, we focus on linear bridge equations that link quarterly
GDP growth rates and monthly economic variables. The classical bridging approach is
based on linear regressions of quarterly GDP growth on a small set of key monthly in-
dicators as for example in Diron [2008]. In our exercise, in addition to those monthly
variables, we also consider Google data, available at a higher frequency, and we aim at
assessing their nowcasting power. More precisely, Google data are available on a weekly
basis, providing thus additional information when official information is not yet avail-
able. Even if Google data are not on average extremely correlated with the GDP growth
rate, we are going to show that they still provide accurate GDP nowcasts if conveniently
treated.
Therefore, we assume that we have three types of data at our disposal: soft data, such as
opinion surveys; hard data, such as industrial production or sales; and data stemming
from Google searches. Let t denote a given quarter of interest identified by its
last month; for example, the first quarter of 2005 is dated t = March 2005. A general
model to nowcast the growth rate of any macroeconomic series of interest $Y_t$ for a specific
quarter t is the following, for $t = 1, \ldots, T$:

$$Y_t = \beta_0 + \beta_s' x_{t,s} + \beta_h' x_{t,h} + \beta_g' x_{t,g} + \varepsilon_t, \qquad E[\varepsilon_t \mid x_{t,s}, x_{t,h}, x_{t,g}] = 0, \qquad (2.1)$$

where $x_{t,s}$ is the $N_s$-vector containing the soft variables, $x_{t,h}$ is the $N_h$-vector containing
the hard variables, $x_{t,g}$ is the $N_g$-vector of variables coming from Google search, and $\varepsilon_t$ is an
unobservable shock. In our empirical analysis $Y_t$ is the quarterly GDP growth rate of
the euro area. Because the variables $x_{t,s}$, $x_{t,h}$ and $x_{t,g}$ are sampled at different frequencies
(monthly vs weekly), the relevant dataset for calculating the nowcast evolves within
the quarter. Denoting by $x_{t,j}^{(w)}$, $j \in \{s, h, g\}$, the $j$-th series released at week
$w = 1, \ldots, 13$ of quarter t, we define the relevant information set at week w of quarter
t as

$$\Omega_t^{(w)} := \{x_{t,j}^{(w)},\ j \in \{s, h, g\}\ \text{such that}\ x_{t,j}^{(w)}\ \text{is released at}\ w\}.$$

For simplicity, we keep in $\Omega_t^{(w)}$ only the observations relative to the current quarter t.
To explicitly account for the different frequencies of the variables, we replace model
(2.1) by a model for each week w such that:

$$\widehat{Y}_{t|w} = E[Y_t \mid \Omega_t^{(w)}], \qquad t = 1, \ldots, T \ \text{ and } \ w = 1, \ldots, 13,$$
$$\text{and} \quad E[Y_t \mid \Omega_t^{(w)}] = \beta_{0,w} + \beta_{s,w}' x_{t,s}^{(w)} + \beta_{h,w}' x_{t,h}^{(w)} + \beta_{g,w}' x_{t,g}^{(w)}. \qquad (2.2)$$
Let us start from a standard linear regression equation with only the standardized
$N_g$ Google variables as explanatory variables, that is, $\beta_0 = \beta_s = \beta_h = 0$ in equation
(2.1). Let $M^* = \{1 \le j \le N_g : \beta_{g,j} \neq 0\}$ be the true sparse model, with non-sparsity size
$s = |M^*|$. The other $N_g - s$ variables can also be correlated with Y via linkage to the
predictors contained in the true sparse model. Let Y denote the T-vector of quarterly
GDP growth: $Y = (Y_1, \ldots, Y_T)'$. We compute $\omega = (\omega_1, \ldots, \omega_{N_g})'$, the vector of marginal
correlations of the predictors with the response variable $Y_t$, as

$$\omega = \bar{X}_g' Y, \qquad (2.3)$$
where $\bar{X}_g$ is the $T \times N_g$ matrix of quarterly-averaged Google data that has been centered
and standardized columnwise. The average over each quarter is taken to make the weekly
Google data comparable to the quarterly GDP growth data in terms of frequency. For any
given $\lambda \in (0, 1)$, the $N_g$ componentwise magnitudes of the vector $\omega$ are sorted in decreasing
order and we define a submodel $M_\lambda$ as
$M_\lambda = \{1 \le j \le N_g : |\omega_j| \text{ is among the } [\lambda T] \text{ largest of all}\}$, where $[\lambda T]$
denotes the integer part of $\lambda T$. Since only the ordering of the componentwise magnitudes of
$\omega$ is used, this procedure is invariant under scaling and thus is identical to selecting
predictors using their correlations with the response. This approach is an easy way to
filter out the Google variables with the weakest correlations with the GDP growth rate, so that
we are left with $d = [\lambda T] < T$ Google variables. An important feature of the SIS procedure
is that it uses each covariate $x_{t,g,j}$ independently as a predictor to decide how useful it
is for predicting the response. Fan and Lv [2008] show that, under regularity conditions,
the sure screening property holds: $P(M^* \subset M_\lambda) \to 1$
as $N_g \to \infty$. In particular, SIS can reduce the dimension to $[\lambda T] = O(T^{1-\theta}) < T$ for
some $\theta > 0$ and the reduced model $M_\lambda$ still contains all the variables of the true model
$M^*$ with probability converging to one.
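To fix ideas, here is a minimal sketch of the SIS step in Python, assuming quarterly-averaged Google series; the function and variable names (sis_preselect, X_google_quarterly) are illustrative and not taken from the paper.

```python
import numpy as np

def sis_preselect(X_g, y, lam):
    """Sure Independence Screening: keep the [lam*T] Google series whose
    absolute marginal correlation with GDP growth is largest.

    X_g : (T, Ng) matrix of quarterly-averaged Google series
    y   : (T,) vector of quarterly GDP growth rates
    lam : screening parameter lambda in (0, 1)
    """
    T = X_g.shape[0]
    # centre and standardize each column, as in equation (2.3)
    Xs = (X_g - X_g.mean(axis=0)) / X_g.std(axis=0)
    omega = Xs.T @ y                       # marginal correlations (up to scale)
    d = max(int(lam * T), 1)               # [lam*T] variables are retained
    M_lambda = np.argsort(-np.abs(omega))[:d]
    return np.sort(M_lambda)               # indices of the retained series

# Illustrative call with simulated data (T = 40 quarters, Ng = 1776 series).
rng = np.random.default_rng(0)
X_google_quarterly = rng.standard_normal((40, 1776))
gdp_growth = rng.standard_normal(40)
selected = sis_preselect(X_google_quarterly, gdp_growth, lam=0.5)
```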
In the following, we write $X_{t,M_\lambda} = [1, x_{t,s}', x_{t,h}', x_{t,g,M_\lambda}']'$, where $x_{t,g,M_\lambda} = \{x_{t,g,j};\ j \in M_\lambda\}$
is the vector containing only the selected Google variables. Moreover, for a vector
$\beta \in \mathbb{R}^p$ and a set $M \subset \{1, \ldots, p\}$ we write $M^c$ for the complement of M in $\{1, \ldots, p\}$
and $\beta_M = \{\beta_j;\ j \in M\}$. The empirical choice of the hyperparameter $\lambda$ is discussed in
subsection 3.3.
where $\alpha > 0$ is a regularization parameter that tunes the amount of shrinkage. The
estimated coefficients in $\widehat{\beta}$ are then shrunk towards zero. By using model (2.2) for each
week w, the Ridge estimator writes

$$\widehat{\beta}_{\mathrm{Ridge},M_\lambda}^{(w)} = \left(\frac{1}{T}\sum_{t=1}^{T} X_{t,M_\lambda} X_{t,M_\lambda}' + \alpha I\right)^{-1} \frac{1}{T}\sum_{t=1}^{T} X_{t,M_\lambda} Y_t, \qquad \widehat{\beta}_{\mathrm{Ridge},M_\lambda^c}^{(w)} = 0,$$
and $I$ is the $|M_\lambda|$-dimensional identity matrix. This is the estimator we are going to use
in our empirical analysis. Even though it depends on $\alpha$ in a crucial way, we leave this
dependence implicit. The empirical choice of the hyperparameter $\alpha$ is a crucial issue because it
has an important impact on nowcasting accuracy. We discuss this choice in Section
3.3.
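As an illustration, the closed-form Ridge estimator above can be coded in a few lines; ridge_fit and ridge_nowcast are hypothetical names, and the intercept is handled by including a column of ones in the regressor matrix, as in the definition of $X_{t,M_\lambda}$.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form Ridge estimator on the pre-selected regressors.

    X     : (T, p) matrix whose rows are [1, soft, hard, selected Google]
    y     : (T,) vector of quarterly GDP growth rates
    alpha : regularization parameter (> 0) tuning the amount of shrinkage
    """
    T, p = X.shape
    gram = X.T @ X / T + alpha * np.eye(p)      # (1/T) sum_t X_t X_t' + alpha I
    return np.linalg.solve(gram, X.T @ y / T)   # shrunk coefficient vector

def ridge_nowcast(X_train, y_train, x_new, alpha):
    """Nowcast of Y for a new quarter from its week-w regressor vector x_new."""
    beta = ridge_fit(X_train, y_train, alpha)
    return float(x_new @ beta)
```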
3.1 Data
Our objective in this paper is to assess the role of Google data for nowcasting euro
area GDP, in particular (i) whether these big data are relevant when no official data
are available to the forecaster, and (ii) to what extent these data provide useful
information once official data become available. In this respect, the variable $Y_t$ in
model (2.1)-(2.2) that we target is the quarterly growth rate of real euro area GDP,
stemming from Eurostat. The official data that we consider are of two kinds: industrial
production for the euro area as a whole, provided by Eurostat, which is a global measure
of hard data and is denoted by $IP_t$; and a composite index of opinion surveys from
various sectors computed by the European Commission (the so-called euro area Sentiment
Index), denoted by $S_t$.
As regards the dates of availability, we mimic the exact release dates as published by
[3] Applying this standard seasonal filter eliminates a large part of the seasonal effects in the dataset. We
also test for outliers in our study, but dropping detected outliers does not seem to improve nowcasting
accuracy. Obviously, this question needs to be tackled in more detail in further research.
By denoting with $x_{t,g,M_\lambda,w}$ the vector of pre-selected variables from the Google search
data for week w of quarter t, we construct the variable $x_{t,g}^{(w)}$ in equation (2.2) as the
average of the vectors of selected Google variables up to the w-th week:
$x_{t,g}^{(w)} = \frac{1}{w}\sum_{v \le w} x_{t,g,M_\lambda,v}$.
That is, take for instance w = 3 (i.e. Model 3, which is used at week 3); then $x_{t,g}^{(3)}$ is
equal to $(x_{t,g,M_\lambda,1} + x_{t,g,M_\lambda,2} + x_{t,g,M_\lambda,3})/3$.[4] The other variables in equation (2.2) denote,
respectively: $Y_t$ the euro area GDP growth rate, $x_{t,s}^{(w)}$ the monthly data from surveys,
available at the end of each month, and $x_{t,h}^{(w)}$ the growth rate of the index of
industrial production, available about 45 days after the end of the reference month.
Because of the frequency mismatch within the whole dataset, the thirteen models include a
different number of predictors, as we have explained above. As regards the survey, $x_{t,s}^{(w)}$,
and the industrial production, $x_{t,h}^{(w)}$, we impose the following specific structure, which
mimics the data releases explained above and will be used throughout our exercise.
The variable $x_{t,s}^{(w)}$ is not present in models 1 to 4 because the survey is not available in the
first four weeks of the quarter, so that $\beta_{s,1} = \beta_{s,2} = \beta_{s,3} = \beta_{s,4} = 0$. Then, for models
5 to 8, $x_{t,s}^{(w)}$ is the value of the survey for the first month of the quarter: $x_{t,s}^{(w)} = S_{t,1}$.
Figure 1: Timeline of data release in the pseudo real-time exercise within the quarter.
[4] In our empirical analysis, we also test models that do not use the average over weeks of the Google
search data as explanatory variables but, instead, take the Google search data for each new week
as the variable for the quarter. Results clearly indicate that models integrating averaged Google search
data give smaller Mean Squared Forecasting Errors than models that do not.
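To make the construction concrete, the sketch below assembles the week-w regressor vector under the release pattern just described (pre-selected Google series averaged over weeks 1 to w, the first-month survey entering from week 5, industrial production from week 11). It is a simplification of the full 13-model structure given in Table 1, and the function names are ours.

```python
import numpy as np

def weekly_google_regressor(google_weeks, w):
    """x_{t,g}^{(w)}: average of the pre-selected Google variables over weeks 1..w.

    google_weeks : (13, d) array, one row per week of quarter t,
                   columns = pre-selected Google series
    """
    return google_weeks[:w].mean(axis=0)

def build_regressors(google_weeks, survey_m1, ip_m1, w):
    """Week-w regressor vector [1, soft, hard, Google] for quarter t.

    Variables not yet released are simply left out, which amounts to
    constraining their coefficient to zero in model w.  Only the first
    releases of the survey and of industrial production are handled here.
    """
    parts = [np.array([1.0])]                   # intercept
    if w >= 5:
        parts.append(np.array([survey_m1]))     # soft data: first-month survey
    if w >= 11:
        parts.append(np.array([ip_m1]))         # hard data: first-month IP growth
    parts.append(weekly_google_regressor(google_weeks, w))
    return np.concatenate(parts)
```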
One of the main issues in the literature on big data is to know whether and when
such alternative data are able to bring an additional gain with respect to standard types
of variables, like hard and soft data. To contribute to the existing literature on this
issue, we have also estimated nowcasting models without including the vector of variables
selected from the Google search data. That is, these models only include as predictors
the survey and the growth rate of the index of industrial production (i.e. $\beta_{g,w} = 0$ in
equation (2.2)). We have in total four such models, one for each release of data of these
two variables within the quarter, denoted $NoGoogle_1, \ldots, NoGoogle_4$ in Table 2 in the
Annex, which will be used for comparison purposes.
An additional issue with the reporting lags concerns the release of GDP figures. In
fact, the first GDP assessment is generally released about 45 days after the end of the
reference quarter, but sometimes the delay may be longer. For instance, GDP figures
for the first quarter of 2014 were only released on the 4th of June 2014. For this reason,
if one wants to nowcast GDP growth for 2014q2 in real time, it is not possible to use
the model fitted with the data available up to 2014q1, because one does not observe
the GDP for 2014q1. Instead, one has to use the parameters estimated with
the data available up to 2013q4. Because of this, we impose a gap of two quarters
between the sample used for fitting the model (training sample) and the sample used for
the out-of-sample analysis (test data). For coherence, we use this structure in both the
pseudo real-time and the true real-time analysis.
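A minimal sketch of this two-quarter gap, with hypothetical quarter indexing (quarters numbered 0, 1, 2, ...):

```python
def training_quarters(nowcast_quarter, gap=2):
    """Quarters available for fitting the model when nowcasting `nowcast_quarter`.

    The two-quarter gap reproduces the publication lag discussed above: when
    nowcasting 2014q2, the last quarter whose GDP release can be used for
    estimation is 2013q4.
    """
    last_training_quarter = nowcast_quarter - gap
    return list(range(last_training_quarter + 1))
```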
Another issue concerns the inclusion of lagged GDP among the explanatory vari-
ables. Because of the delay in the release of the GDP we cannot include the lagged GDP
$$\widehat{R}_T(\alpha) = X_{M_\lambda}\left(T^{-1}\sum_{t=1}^{T} X_{t,M_\lambda} X_{t,M_\lambda}' + \alpha I\right)^{-1} T^{-1}\sum_{t=1}^{T} X_{t,M_\lambda} \cdots$$
For $\lambda$ we consider a grid of 99 equispaced values in (0, 1], denoted by $\Lambda$. The selection
is made sequentially: for each value of $\lambda$ in the grid, we select for $\alpha$ in model w the
value $\widehat{\alpha}_T^{(w)}$ that solves $\widehat{\alpha}_T^{(w)} := \arg\min_{\alpha \in A} \widehat{Q}_T^{(w)}(\alpha)$. This is done for each of the thirteen
models and for each nowcasting period, by using only the training sample corresponding
to the specific nowcasting period we are considering. We notice that T depends on the
nowcasting period.

Once a value $\widehat{\alpha}_T^{(w)}$ is selected for each value of $\lambda$ in the grid, we select the value
of $\lambda$ that minimizes the MSFE for the GDP growth of the last quarter of the training
sample obtained by using the selected $\widehat{\alpha}_T^{(w)}$. That is, if T denotes the last quarter of
the training sample, then we select for $\lambda$ in model w the value
$$\widehat{\lambda}_T^{(w)} = \arg\min_{\lambda \in \Lambda}\ \big(Y_T - X_{T,M_\lambda}'\, \widehat{\beta}_{\mathrm{Ridge},M_\lambda}^{(w)}(\widehat{\alpha}_T^{(w)})\big)^2.$$
4 Empirical Results
In this section we present the results of our empirical exercises aiming at nowcasting
the euro area GDP growth using various types of data sources. This section is split into
three parts. First, we look at the accuracy gains stemming from using Google data when
controlling for standard official macroeconomic data, by comparing nowcasts obtained
with and without such data, in a pseudo real-time exercise. Then we look at the effects
of pre-selecting Google data before estimating Ridge regressions. Third, we perform a
true real-time analysis.
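The accuracy measure used throughout is the root mean squared forecast error (RMSFE) over the out-of-sample quarters. A minimal sketch of the comparison, with purely illustrative numbers rather than the paper's results, is:

```python
import numpy as np

def rmsfe(nowcasts, realized):
    """Root mean squared forecast error over the out-of-sample quarters."""
    nowcasts, realized = np.asarray(nowcasts), np.asarray(realized)
    return float(np.sqrt(np.mean((nowcasts - realized) ** 2)))

# Illustrative numbers only: week-3 nowcasts with and without Google data
# over the nine quarters 2014q1-2016q1.
realized_gdp      = [0.3, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.4, 0.5]
nowcast_google    = [0.4, 0.1, 0.3, 0.5, 0.4, 0.4, 0.2, 0.5, 0.4]
nowcast_no_google = [0.6, 0.0, 0.1, 0.6, 0.3, 0.6, 0.1, 0.6, 0.2]
print(rmsfe(nowcast_google, realized_gdp), rmsfe(nowcast_no_google, realized_gdp))
```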
Figure 2: The importance of Google data. Pseudo-real-time analysis with pre-selection of Google
data. RMSFEs from: (i) models M1-M13 with only variables extracted from Google data (in light
gray), (ii) models M1-M13 with all the variables ($S_t$, $IP_t$ and Google data) (in gray), (iii) models with
only official variables $NoGoogle_1$-$NoGoogle_4$ (in black).
The first striking feature that we observe in Figure 2 is the downward sloping evolution
of RMSFEs stemming from the models with full information (Google, Industrial Production
and Survey) over the quarter. This is in line with what could be expected from
nowcasting exercises when integrating more and more information throughout the quarter
(see Angelini et al. [2011]). When using Google information only (light gray bars),
we still observe a decline, but to a much lesser extent, and the RMSFEs stay above 0.25
even at the end of the quarter. However, when focusing on the beginning of the quarter,
models that only integrate Google information provide reasonable RMSFEs that do not
exceed 0.30 (see Figure 2). This result shows that Google search data possess an informational
content that can be valuable for nowcasting GDP growth for the first four weeks
of the quarter.
The idea of the SIS pre-selection method is to identify ex ante specific Google vari-
ables, among the initial large dataset, that have the highest absolute correlation with
the targeted variable, namely the GDP growth rate. First, let us have a look at the
relationship between the number of selected variables through the SIS procedure and
the absolute correlation between each Google variable and the GDP growth rate at the
same quarter. We recall that for the Google variables we take the average over each
quarter (see Section 2.2). This relationship is described in Figure 3. We clearly observe
an inverse non-linear relationship, with a kind of plateau starting from an absolute
correlation of about 0.25. Indeed, most of the Google variables present an absolute correlation
with the current GDP growth rate lower than 0.30. Thus it seems useful to focus only on a
core dataset with the highest correlations.
We then analyse the performances in terms of RMSFEs from bridge regressions that use
the SIS pre-selection approach associated with Ridge regularization. Figure 4 presents
the evolution over the 13 weeks of the quarter of RMSFEs stemming from bridge models
estimated using Google search data and Ridge regression coupled or not with the SIS
pre-selection approach. We clearly see that the SIS pre-selection approach (gray bars,
similar to the gray bars in Figure 2) allows for an overall improvement in nowcasting
accuracy. A striking result is that the RMSFE is lower for all the weeks when the SIS
pre-selection approach is used. Moreover, when pre-selection is implemented, RMSFEs
evolve over the quarter in a more smoother way. For example, without any pre-selection,
we observe that in week 6 the RMSFE jumps to 0.3829, from 0.3239 in week 5. The
overall gain underlines the need for pre-selecting data using a targeted approach. Panel
2 in Figure 8 in the Annex reports the exact values of the RMSFEs with and without
pre-selection.
[5] Survey data are generally not revised.
Table 3 in the Annex gives the exact weeks in the out-of-sample period 2014q1-2016q1 where
the lagged GDP growth is included in the real-time analysis.
In Figure 5 we show that pre-selecting Google data is still worthwhile in real time. Indeed,
RMSFEs obtained from models integrating pre-selected Google data are systematically
lower, for all weeks, than those obtained without any pre-selection. The corresponding
RMSFE values are reported in Panel 3 of Table 8.
In Figure 6, we show the impact of Google search data on GDP growth nowcast-
ing accuracy in the context of a true real-time nowcasting analysis. The corresponding
RMSFE values are reported in Panel 4 of Table 8. Similarly to the pseudo real-time
exercise, we find that during the first four weeks of the quarter, when only Google
information is available, RMSFEs are quite reasonable. This fact is reassuring about the
real-time use of Google search data when nowcasting GDP. However, starting from week
5, as soon as the first survey of the quarter is released, the marginal gain from using Google
data instantaneously vanishes.[6]
[6] There is an exception in week 11, where it is surprising to note that the integration of surveys, the past
GDP value and industrial production tends to suddenly increase the RMSFEs, in contrast to what
could be expected from previous empirical results. This stylized fact has to be further explored.
Finally, in order to compare the results of the real-time analysis with the ones from
the pseudo-real-time analysis, we compute GDP growth nowcasts without including the
lagged GDP growth among the explanatory variables. The results are given in Figure
7. The corresponding RMSFE values are reported in Panel 5 of Table 8. We see that
both analyses lead to a similar shape in the evolution of RMSFEs within the quarter,
although, as expected, the uncertainty around weekly nowcasts is a bit higher in real
time.
Figure 7: Pseudo-Real-time versus True Real-time analysis (with pre-selection). Comparison of RMS-
FEs within the quarter from pseudo-real-time (in light gray) and true real-time (in gray) analysis. The
true real-time analysis does not include lagged GDP growth among the explanatory variables.
Four salient facts emerge from our empirical analysis. First, against the background
of a pseudo real-time analysis, we point out the usefulness of Google search data in
nowcasting the euro area GDP growth rate for the first four weeks of the quarter, when there
is no official information about the state of the economy. We show that at the beginning of the
quarter, Google data provide an accurate picture of the GDP growth rate.
Second, as soon as official data become available, that is, starting from week 5 with
the release of opinion surveys, the relative nowcasting power of Google data instantaneously
vanishes.
Third, we show that pre-selecting Google data before entering the nowcasting models
appears to be a pertinent strategy in terms of nowcasting accuracy. Specifically, we
implement the Sure Independence Screening approach put forward by Fan and Lv [2008],
which enables us to retain only the Google variables that are the most correlated with the
targeted variable, that is, the GDP growth rate. This result confirms previous results obtained
with bridge equations augmented with dynamic factors (see e.g. Bai and Ng [2008] or
Schumacher [2010]).
Finally, we show that, when using Google search data in the context of a true real-time
analysis, the three previous salient facts remain valid. This result argues in favor of
the use of Google search data at the beginning of the quarter, when no official
information is available, for real-time policy-making.
J. Bai and S. Ng. Forecasting economic time series using targeted predictors. Journal of
Econometrics, 146(2):304–317, 2008. Honoring the research contributions of Charles
R. Nelson.
K. Barhoumi, O. Darne, and L. Ferrara. Are disaggregate data useful for forecasting
French GDP with dynamic factor models? Journal of Forecasting, 29(1-2):132–144,
2010.
J. Boivin and S. Ng. Are more data always better for factor analysis? Journal of
Econometrics, 132:169–194, 2006.
H. Choi and H. Varian. Predicting initial claims for unemployment insurance using
Google trends. Google Technical Report, 2009.
D. Coble and P. Pincheira. Nowcasting building permits with Google Trends. MPRA
Paper 76514, University Library of Munich, Germany, 2017.
M. Diron. Short-term forecasts of euro area real GDP growth: An assessment of real-time
performance based on vintage data. Journal of Forecasting, 27:371–390, 2008.
J. Fan and J. Lv. Sure independence screening for ultrahigh dimensional feature space.
Journal of the Royal Statistical Society B, 70:849–911, 2008.
D. Giannone, M. Lenza, and G. Primiceri. Economic predictions with big data: The
illusion of sparsity. mimeo, 2017.
T. Goetz and T. Knetsch. Google data in bridge equation models for German GDP.
International Journal of Forecasting, 35(1):45–66, 2019.
R. Golinelli and G. Parigi. Tracking world trade and GDP in real time. International
Journal of Forecasting, 30(4):847–862, 2014.
X. Li. Nowcasting with big data: Is Google useful in the presence of other information?
mimeo, 2016.
M. Modugno, B. Soybilgen, and E. Yazgan. Nowcasting Turkish GDP and news decom-
position. International Journal of Forecasting, 32(4):1369–1384, 2016.
F. Narita and R. Yin. In search for information: Use of Google Trends’ data to narrow
information gaps for low-income developing countries. Technical Report WP/18/286,
IMF Working Paper, 2018.
S. Scott and H. Varian. Bayesian variable selection for nowcasting economic time series.
In A. Goldfarb, S. Greenstein, and C. Tucker, editors, Economic Analysis of the Digital
Economy, pages 119–135. NBER, 2015.
H. Varian. Big data: New tricks for econometrics. Journal of Economic Perspectives,
28(2):3–28, 2014.
Table 1: Equations of the 13 models (M1, . . . , M13) used to nowcast GDP growth over each quarter.
Equations include the variables pre-selected from Google data as well as information stemming from
surveys ($S_t$) and industrial production ($IP_t$). $S_{t,i}$ denotes the survey variable $S_t$ referring to the i-th
month of the current quarter t, and $IP_{t,i}$ denotes the growth rate of industrial production, available
at the 11th week of the current quarter t and referring to the i-th month of the current quarter t.
Table 2: Equations of the four models used to nowcast GDP growth without the variables extracted
from Google data. $S_{t,i}$ denotes the survey variable $S_t$ referring to the i-th month of the current
quarter t, and $IP_{t,i}$ denotes the growth rate of industrial production, available at the 11th week of
the current quarter t and referring to the i-th month of the current quarter t.
Table 3: Timeline of GDP releases in real time within the quarter. The first column gives the last
period used for the in-sample analysis (training sample), the second column indicates the nowcasting
period, the third column indicates the date of the first vintage which contains the GDP growth for the
last period of the training sample (indicated in the first column), and the fourth column indicates whether
a lagged GDP growth is available to be included among the explanatory variables (the corresponding
date and week of availability are given in the third and fifth columns, respectively). Finally, the fifth
column gives the week, and hence the model, corresponding to the date in the third column.