Web Appendix: Model Estimation
For a given brand j (j = 1, …, J), Equations (2)-(6) can be combined in a single model and written as

(W.1a) Yt = F1tθ1t + F2tθ2 + εt,
(W.1b) θ1t = Gθ1t−1 + ht + νt.

In Equations (W.1a) and (W.1b), Yt is a (3S+K)×1 vector of dependent variables including log sales, log regular price, log price index, and K (= 4 in the application) marketing mix variables. F1t is a matrix of explanatory variables with a time-varying parameter (brand and store intercepts, log regular price, …), θ1t is the corresponding vector of time-varying parameters, and εt is a (3S+K)×1 vector of observation equation errors. F2t is a (3S+K)×M matrix of regressors with non-time-varying parameters θ2. In the system equation (W.1b), G is the transition matrix of decay parameters, and the vector ht = α + β′Zt−1 includes the lagged marketing mix and the system equation intercepts. Both Yt and θ1t have multivariate normal distributions, and so do the associated error terms. We assume that εt ~ N(0, V), where the variance matrix V, of size (3S+K)×(3S+K), is time invariant and full. Note that we correlate the sales, regular price, promotional price, and marketing mix error terms within each brand. This allows us to capture unobserved shocks that may cause endogeneity. The system errors are distributed multivariate normal, νt ~ N(0, W), where W is a diagonal matrix.
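As a concrete illustration, the combined model (W.1a)-(W.1b) can be simulated in a few lines. All dimensions and parameter values below (S = 2 stores, K = 4 marketing mix variables, two time-varying states) are illustrative assumptions, not the paper's estimates, and the F2tθ2 term is omitted for brevity.

```python
import numpy as np

# Sketch: simulate the combined model (W.1a)-(W.1b) for one brand.
# All dimensions and parameter values are illustrative assumptions.
rng = np.random.default_rng(0)
S, K = 2, 4                      # stores and marketing mix variables (assumed)
n_y = 3 * S + K                  # length of the stacked dependent-variable vector Yt
p, T = 2, 50                     # time-varying states (intercept, elasticity), periods

G = np.diag([0.8, 0.6])          # transition matrix of decay parameters
alpha = np.array([0.1, -0.05])   # system equation intercepts
beta = np.array([0.3, 0.1])      # long-term marketing mix effects
V = 0.05 * np.eye(n_y)           # observation covariance (full in the paper)
W = np.diag([0.01, 0.01])        # diagonal system covariance

Zlag = rng.normal(size=T)        # stands in for the lagged marketing mix Z_{t-1}
theta = np.zeros((T, p))
Y = np.zeros((T, n_y))
theta_prev = np.zeros(p)
for t in range(T):
    h_t = alpha + beta * Zlag[t]                       # ht = alpha + beta'Z_{t-1}
    theta[t] = (G @ theta_prev + h_t
                + rng.multivariate_normal(np.zeros(p), W))              # (W.1b)
    F1t = rng.normal(size=(n_y, p))                    # time-varying-parameter regressors
    Y[t] = F1t @ theta[t] + rng.multivariate_normal(np.zeros(n_y), V)   # (W.1a)
    theta_prev = theta[t]
```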
We place normal priors on all parameters of Equations (W.1a) and (W.1b). The evolution equation (W.1b) error covariance matrix is assumed to be diagonal, and we place an Inverse Gamma prior on its diagonal elements. As we allow for correlation between the observation equation error terms and the marketing mix equation error terms in (W.1a), the associated error covariance matrix is full; therefore we place an Inverse Wishart prior on it. Given these priors, the estimation is carried out using DLM updating within a Gibbs sampler. Conditional on θ2, V, W, G, and ht, the time-varying parameters (θ1t) are obtained via the forward filtering, backward sampling procedure (Carter and Kohn 1994; Frühwirth-Schnatter 1994). The long-term marketing mix effects (β) are estimated using a random walk Metropolis-Hastings algorithm. Next, we derive the full conditional posterior distributions used in the sampling chain.
First, define Yt = [Y′1t Y′2t]′ such that Y1t includes log sales of the focal brand, and Y2t includes the rest (log regular price, log price index, and K marketing mix variables). Also define θ2 = [θ′21 θ′22]′ and F2t = diag([F21t F22t]), where θ21 and θ22 contain the non-time-varying parameters from the sales equation and the remaining equations, respectively. As Y1t and Y2t are jointly normally distributed, the conditional covariance matrix is given by Ṽ = V11 − V12V22⁻¹V21, and the conditional mean vector (net of sales attributed to the variables with non-time-varying parameters) is given by

Ỹ1t = Y1t − V12V22⁻¹(Y2t − F22tθ22) − F21tθ21.
Assuming that the DLM is closed to external information at times t ≥ 1 (i.e., given initial information D0 with θ10 | D0 ~ N(m0, C0)), and conditional on these parameters, the solution is given by West and Harrison (1997). The prior at time t is θ1t | Dt−1 ~ N(at, Rt), where the mean and covariance matrix are at = Gmt−1 + ht and Rt = GCt−1G′ + W. The one-step-ahead forecast at time t is Ỹ1t | Dt−1 ~ N(ft, Qt), where ft = Ftat and Qt = FtRtF′t + Ṽ. Then the posterior distribution at time t is θ1t | Dt ~ N(mt, Ct), where

mt = at + RtF′tQt⁻¹(Ỹ1t − ft), and Ct = Rt − RtF′tQt⁻¹FtRt.
Step 1: θ1t|rest
To sample from the conditional distribution of θ1t for each brand j, we adopt the forward filtering, backward sampling algorithm proposed by Carter and Kohn (1994) and Frühwirth-Schnatter (1994). The sampling of the system parameters starts with the standard DLM updating. For t = 1, …, T we apply forward filtering to obtain the moments mt and Ct. At t = T we sample a vector of system parameters from the distribution N(mT, CT); then we sequence backwards for t = T−1, …, 1, sampling from p(θ1t | θ1t+1, Dt) ~ N(q*t, Q*t), where q*t = mt + Bt(θ1t+1 − at+1), Q*t = Ct − BtRt+1B′t, and Bt = CtG′Rt+1⁻¹. For the starting values of the time-varying parameters, we use the prior moments m0 and C0.
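The forward filtering, backward sampling recursions above can be sketched as follows. The dimensions, data, and parameter values are illustrative stand-ins: Ft, Ṽ, and ht play the roles defined in the text, with a single observation per period.

```python
import numpy as np

# Sketch of forward filtering, backward sampling (Carter-Kohn FFBS) for a
# small version of (W.1b); all data and parameter values are illustrative.
rng = np.random.default_rng(1)
T, p = 40, 2
G = np.diag([0.7, 0.5])
W = 0.02 * np.eye(p)
F = rng.normal(size=(T, 1, p))      # Ft, one observation per period
V_tilde = np.array([[0.1]])         # conditional variance Vtilde (scalar here)
h = 0.05 * np.ones((T, p))          # ht = alpha + beta'Z_{t-1}
y = rng.normal(size=(T, 1))         # stands in for Ytilde_1t

# Forward filtering: store at, Rt, mt, Ct for every t.
m = np.zeros((T, p)); C = np.zeros((T, p, p))
a = np.zeros((T, p)); R = np.zeros((T, p, p))
m_prev, C_prev = np.zeros(p), np.eye(p)          # m0, C0
for t in range(T):
    a[t] = G @ m_prev + h[t]                     # at = G m_{t-1} + ht
    R[t] = G @ C_prev @ G.T + W                  # Rt = G C_{t-1} G' + W
    f = F[t] @ a[t]                              # forecast mean ft
    Q = F[t] @ R[t] @ F[t].T + V_tilde           # forecast variance Qt
    gain = R[t] @ F[t].T @ np.linalg.inv(Q)      # Rt F't Qt^{-1}
    m[t] = a[t] + gain @ (y[t] - f)
    C[t] = R[t] - gain @ F[t] @ R[t]
    m_prev, C_prev = m[t], C[t]

# Backward sampling: draw theta_T, then sequence backwards.
theta = np.zeros((T, p))
theta[-1] = rng.multivariate_normal(m[-1], C[-1])
for t in range(T - 2, -1, -1):
    B = C[t] @ G.T @ np.linalg.inv(R[t + 1])     # Bt = Ct G' R_{t+1}^{-1}
    q = m[t] + B @ (theta[t + 1] - a[t + 1])     # q*t
    Qs = C[t] - B @ R[t + 1] @ B.T               # Q*t
    theta[t] = rng.multivariate_normal(q, (Qs + Qs.T) / 2)
```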
Step 2: V|rest
For each brand j, we allow for correlations between all error terms and place an Inverse Wishart prior on the error covariance matrix. We use a diffuse prior for V with prior scale matrix SV0 = .001I(3S+K) and prior degrees of freedom nV0 = (3S+K)+2. The full conditional posterior distribution then has degrees of freedom nV1 = nV0 + T and scale matrix SV1 = SV0 + Σt εtε′t, where εt = Yt − F1tθ1t − F2tθ2.
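Under this conjugate update, a draw of V can be sketched in a few lines. The residuals and dimensions below are simulated stand-ins, and the Inverse Wishart draw is taken by inverting a Wishart draw built from normal vectors.

```python
import numpy as np

# Sketch of Step 2: Inverse Wishart full conditional for V. The residuals
# e_t = Yt - F1t*theta1t - F2t*theta2 are simulated stand-ins here.
rng = np.random.default_rng(2)
S, K, T = 2, 4, 60
n_y = 3 * S + K
E = rng.normal(scale=0.2, size=(T, n_y))   # stand-in residual matrix

S_V0 = 0.001 * np.eye(n_y)                 # diffuse prior scale S_V0
n_V0 = n_y + 2                             # prior degrees of freedom
n_V1 = n_V0 + T                            # posterior degrees of freedom
S_V1 = S_V0 + E.T @ E                      # posterior scale: S_V0 + sum_t e_t e_t'

# Inverse Wishart draw: invert a Wishart(n_V1, S_V1^{-1}) draw built from
# n_V1 multivariate normal vectors (valid since n_V1 is an integer >= n_y).
X = rng.multivariate_normal(np.zeros(n_y), np.linalg.inv(S_V1), size=n_V1)
V_draw = np.linalg.inv(X.T @ X)
```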
Step 3: θ2|rest
To obtain the conditional posterior distribution of the non-time-varying parameters for each brand j, we define Y*t = Yt − F1tθ1t and VT = V ⊗ IT. We place a diffuse Normal prior on these parameters, θ2 ~ N(μθ2, Σθ2). Stacking Y*t and F2t over time, the full conditional posterior is θ2 ~ N(μ̄θ2, Σ̄θ2), where μ̄θ2 = Σ̄θ2{Σθ2⁻¹μθ2 + F′2tVT⁻¹Y*t}, and Σ̄θ2 = {Σθ2⁻¹ + F′2tVT⁻¹F2t}⁻¹.
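This is the standard Bayesian regression update; a sketch with simulated data and, for brevity, a diagonal VT:

```python
import numpy as np

# Sketch of Step 3: Normal full conditional for the non-time-varying
# parameters theta2. Data are simulated; VT is diagonal here for brevity.
rng = np.random.default_rng(3)
n_obs, M = 200, 3
F2 = rng.normal(size=(n_obs, M))                # stacked regressors F2t
theta2_true = np.array([1.0, -0.5, 0.25])
Ystar = F2 @ theta2_true + rng.normal(scale=0.1, size=n_obs)  # Y*t = Yt - F1t*theta1t
VT_inv = np.eye(n_obs) / 0.01                   # VT^{-1} (noise variance 0.1^2)

mu0 = np.zeros(M)                               # diffuse Normal prior mean
Sig0_inv = np.eye(M) / 1000.0                   # diffuse Normal prior precision
Sig1 = np.linalg.inv(Sig0_inv + F2.T @ VT_inv @ F2)          # posterior covariance
mu1 = Sig1 @ (Sig0_inv @ mu0 + F2.T @ VT_inv @ Ystar)        # posterior mean
theta2_draw = rng.multivariate_normal(mu1, Sig1)
```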
Step 4: W|rest
For each brand j, we assume that the system equation error covariance matrix is diagonal and place an Inverse Gamma prior on the diagonal elements of this matrix, with nW0/2 degrees of freedom and a scale parameter of SW0/2. The full conditional posterior distribution is also Inverse Gamma, with (nW0 + T − 1)/2 degrees of freedom and scale parameter (SW0 + Σt ν²it)/2, where νit = θ1it − Giθ1it−1 − hit.
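A draw from this Inverse Gamma full conditional, for one diagonal element of W, might look as follows; the residuals and prior values are assumed for illustration.

```python
import numpy as np

# Sketch of Step 4: Inverse Gamma full conditional for one diagonal element
# of W. The system-equation residuals nu_t and prior values are assumed.
rng = np.random.default_rng(4)
T = 80
nu = rng.normal(scale=0.1, size=T - 1)   # stand-in residuals theta1it - G*theta1it-1 - hit

n_W0, S_W0 = 3.0, 0.01                   # prior degrees of freedom / scale (assumed)
shape1 = (n_W0 + (T - 1)) / 2.0          # posterior shape
scale1 = (S_W0 + np.sum(nu ** 2)) / 2.0  # posterior scale

# An Inverse Gamma draw is the reciprocal of a Gamma(shape, 1/scale) draw.
W_ii = 1.0 / rng.gamma(shape1, 1.0 / scale1)
```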
Step 5: δ|rest
In this step we derive the full conditional posteriors of the decay parameters for each brand j and system equation i (i = 1 for intercept, i = 2 for elasticity). We place a Normal prior on all parameters, δij ~ N(μδij, Σδij), where μδij = 0 and Σδij = 1000. We first stack the observations θ1it across time in vectors θ1iT and θ1iT−1, running from t = 2, …, T and t = 1, …, T−1, respectively. We also stack the corresponding components of ht in hiT. Then for each i we define ỹiT = θ1iT − hiT and WiT = Wi ⊗ IT−1. Given the normal priors and the likelihoods, the full conditional posterior distributions are δij ~ N(μ̄δij, Σ̄δij), where μ̄δij = Σ̄δij{Σδij⁻¹μδij + θ′1iT−1WiT⁻¹ỹiT}, and Σ̄δij = {Σδij⁻¹ + θ′1iT−1WiT⁻¹θ1iT−1}⁻¹.
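The scalar conjugate update for a decay parameter can be sketched as below; the same pattern, with a vector of ones as the regressor, yields the intercept update of Step 6. Data are simulated.

```python
import numpy as np

# Sketch of Step 5: scalar Normal full conditional for a decay parameter
# delta_ij, regressing ytilde = theta1it - hit on theta1it-1. Simulated data.
rng = np.random.default_rng(5)
T, delta_true, w = 300, 0.7, 0.05
h = 0.1 * np.ones(T)                     # stand-in for the stacked components of ht
theta = np.zeros(T)
for t in range(1, T):
    theta[t] = delta_true * theta[t - 1] + h[t] + rng.normal(scale=np.sqrt(w))

y_tilde = theta[1:] - h[1:]              # ytilde_iT = theta1iT - hiT
x = theta[:-1]                           # theta1iT-1
mu0, sig0 = 0.0, 1000.0                  # Normal prior, as in the text
sig1 = 1.0 / (1.0 / sig0 + x @ x / w)    # posterior variance
mu1 = sig1 * (mu0 / sig0 + x @ y_tilde / w)   # posterior mean
delta_draw = rng.normal(mu1, np.sqrt(sig1))
```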
Step 6: α|rest
In this step we derive the full conditional posteriors of the intercepts for each brand j and system equation i (i = 1 for intercept, i = 2 for elasticity). We place a Normal prior on all parameters, αij ~ N(μαij, Σαij), where μαij = 0 and Σαij = 1000. We stack the observations θ1it across time in vectors θ1iT and θ1iT−1, running from t = 2, …, T and t = 1, …, T−1, respectively. We also stack the corresponding components of ht in hiT. Then for each i we define ỹiT = θ1iT − δijθ1iT−1 − hiT and WiT = Wi ⊗ IT−1. Given the normal priors and the likelihoods, the full conditional posterior distributions are αij ~ N(μ̄αij, Σ̄αij), where μ̄αij = Σ̄αij{Σαij⁻¹μαij + 1′WiT⁻¹ỹiT}, and Σ̄αij = {Σαij⁻¹ + 1′WiT⁻¹1}⁻¹.
Step 7: β|rest
We use a random walk Metropolis-Hastings step within the Gibbs sampler to obtain each marketing mix coefficient β in the two system equations. We generate the candidate draw by β(m) = β(m−1) + τz, where β(m) denotes the mth iteration and z is a random draw from N(0, I). We select τ such that the acceptance rate is between 20% and 50% (Chib and Greenberg 1995). The candidate draw is accepted with probability

(W.3) αMH = min{1, [L(β(m) | θ1, W, δ, α) p(β(m) | μβ, Σβ)] / [L(β(m−1) | θ1, W, δ, α) p(β(m−1) | μβ, Σβ)]},

where L(β(·) | ·) is the conditional likelihood of Equation (W.2), and p(β(·) | ·) is the prior density evaluated at the respective draw.
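A minimal random walk Metropolis-Hastings chain implementing this step might look as follows. The Gaussian log target is a stand-in for the conditional likelihood of Equation (W.2) times the prior in the ratio (W.3), and the step size τ is an assumed tuning value.

```python
import numpy as np

# Sketch of Step 7: random walk Metropolis-Hastings for a scalar marketing
# mix coefficient beta; the target below is an illustrative stand-in.
rng = np.random.default_rng(6)

def log_target(b):
    # Stand-in for log L(b | theta1, W, delta, alpha) + log p(b | mu, Sigma)
    return -0.5 * (b - 0.3) ** 2 / 0.04

tau = 0.5                        # step size, tuned toward a 20%-50% acceptance rate
n_iter, beta, accepted = 5000, 0.0, 0
draws = np.empty(n_iter)
for m in range(n_iter):
    cand = beta + tau * rng.normal()         # beta^(m) = beta^(m-1) + tau*z
    if np.log(rng.uniform()) < log_target(cand) - log_target(beta):
        beta, accepted = cand, accepted + 1  # accept with prob min(1, ratio)
    draws[m] = beta

acc_rate = accepted / n_iter
```

Monitoring `acc_rate` during burn-in and adjusting τ is the usual way to land in the 20%-50% band the text targets.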
REFERENCES
Carter, C. and R. Kohn (1994), "On Gibbs Sampling for State Space Models," Biometrika, 81 (3), 541–53.
Chib, Siddhartha and Edward Greenberg (1995), "Understanding the Metropolis-Hastings Algorithm," The American Statistician, 49 (4), 327–35.
Frühwirth-Schnatter, Sylvia (1994), "Data Augmentation and Dynamic Linear Models," Journal of Time Series Analysis, 15 (2), 183–202.
West, Mike and Jeff Harrison (1997), Bayesian Forecasting and Dynamic Models, 2d ed. New York: Springer-Verlag.