Eviews
Oct 5, 2016
EViews
EViews documents (also known as “workfiles”) are files that hold different objects (which
contain data). Unlike Microsoft Word or Excel, EViews does not open up a blank document upon
launch. Instead, the user must always specify the structure (or frequency) of the file, a range for
the data, and a sample. A workfile is made up of one or more pagefiles.
EViews allows the user to create an empty workfile (i.e., one without any data) and to create a
workfile by importing data from another program.
A workfile can have multiple pagefiles, each of a specific (possibly different) type. However,
series objects in a specific pagefile must have the same structure (frequency).
Creating a workfile for annual time series data involves 4 steps: (1) specifying the workfile
structure as “dated – regular frequency”, (2) specifying the frequency of data as annual, (3)
selecting a start and end date, and (4) naming your workfile. The 4th step is optional.
After creating a workfile, EViews will automatically create two (2) series objects. Please identify
(i.e., check) these two objects in the following list.
The range of the workfile allows for up to 140 observations while the current (or active) sample
allows for a maximum of 100 observations.
Notice that EViews automatically detects that the source data was a time series. Choose the pre-
defined range called “Bahamas” to import the economic series relevant to The Bahamas and then
click Next.
EViews then prompts you to enter information about each of the columns of data. From Header
type, select “Names in first line” to make sure that the variable names will appear at the top of
each column.
As a last step, EViews asks you how you would like to import the data; select “Create new page”
and click Finish. Your workfile should now appear with a pagefile containing data on The
Bahamas.
EViews also allows the user to import data by simply copying data from an existing spreadsheet
(in Microsoft Excel or other applications) and pasting it into an existing workfile. Using the steps
in the chart below, work in EViews to discover the correct way to do so. Then place the steps
below in the correct order.
One way to create a new page is by loading a new data file into the page. Let’s add to our file the
same macroeconomic variables for another country, this time Jamaica. On the New Page tab,
click Load Workfile Page… and select the Caribbean Excel file.
The Excel Read box will open up again. Select Jamaica as the pre-defined range, specifying the
structure (or frequency) of your data in the new page, and then click Finish to load the data.
Similar to when you created the original workfile, specify the structure type, data range and
name of the page (e.g., “trin”). You now have a blank page for this country.
Open the Caribbean file and copy and paste the data into the new “trin” page.
series
The series object is the main data object in EViews, represented by a yellow icon with a graph in
it. By nature, a series object contains one column of data (objects with more than one series are
called “groups”). Series objects in the same pagefile will have the same range, which is always
shown in the left-most column. This column shows the dates associated with each observation in
the case of time series data or the observation number in the case of cross-sectional data.
The first two choices will create empty series with the same workfile structure as the other series
on that pagefile. The last three options are ways to transform existing series using equations.
They all involve performing an EViews operation on a series in your current workfile to create a
new series. In particular, the last choice allows you to create a new (transformed) series with
sample restrictions.
After you double-click on a series, the series will open in the standard EViews spreadsheet view.
At the top of the window, you will notice the menu items associated with the series object, each
one allowing you to perform specific analyses of the series.
A Graph Option dialog box will appear asking for you to specify some items for your graph.
Under Option pages, select Basic type page. Under the Graph type section, select Basic graph
from the General category.
Now create a line graph. Select Line & Symbol from the Specific category. Leave the Details
setting as specified by the default options and click OK.
A new graph appears showing the % change in real GDP (annual basis). Notice that at the bottom
of the graph there is a slider bar that allows you to restrict the sample under consideration.
(Incidentally, any changes to the sample made using the slider will not affect the rest of the
workfile unless you right-click and select “Set as workfile sample”).
Group
Groups help you work with a collection of series. A Group is a list of series names (and
potentially mathematical expressions) that provides access to all the data in that list. Once you
create a Group Object, you can use the group name in many EViews commands to refer to all the
series contained in that group. Note that a group is a “live” feed and is NOT a copy of each
individual series. This means that if the data in any one of the series changes, these changes will
be reflected in the group with which it’s associated. Lastly, note that a group object can be
created without it having any series (i.e., empty groups are allowed). This is why the last
statement is false.
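As a loose analogy outside EViews, the "live" behaviour can be sketched in Python: the group stores only series names and looks the data up each time it is accessed. The Group class, the dictionary standing in for a workfile, and the numbers are all hypothetical.

```python
class Group:
    """A list of series names resolved against the workfile on every access."""

    def __init__(self, workfile, names=()):
        self.workfile = workfile      # reference to the shared data, not a copy
        self.names = list(names)      # may be empty

    def data(self):
        # Look the series up at call time, so later edits are visible.
        return {name: self.workfile[name] for name in self.names}


# Made-up values standing in for two series in a pagefile.
workfile = {"jam_rgdp": [1.0, 2.0], "dom_rgdp": [3.0, 4.0]}
group01 = Group(workfile, ["jam_rgdp", "dom_rgdp"])

workfile["jam_rgdp"][0] = 9.9         # edit the underlying series
empty_group = Group(workfile)         # a group with no members is allowed
```

After the edit, group01.data() returns the changed value, because nothing was copied when the group was created.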
Highlight the series you wish to group together, right-click and select Open → as Group.
Even after creating a group, you can add more series to the group. To do so, click View → Group
Members. Then you can edit the window to add more series. Clicking the Update Group button
applies your changes.
Creating a Group
Open up your Caribbean workfile. Now create a group consisting of jam_rgdp and dom_rgdp.
To do so, select Object → New Object from the main menu. In the New Object box, select
Group, keep the default name for the object (group01), and press OK. The Series List window
will now appear. Here, you choose what series to include in the group. Type in the series names
you want to be included in the group, separated by spaces, and then click OK.
1) Holding the Ctrl key, click on the series you wish to group together, right-click and select:
Open→as Group.
2) Type in the command window: group groupname jam_rgdp dom_rgdp, and press Enter. Here,
group is the EViews command to create a group, and groupname is the name you have decided to
use.
Open your recently created group01 and click View → Graph. In the Graph Option dialog box,
select Basic type page and choose Basic graph under the Graph type section. Select Line &
Symbol from the Specific category and Single Graph under Details/Multiple Series. Click OK.
Notice that EViews has plotted the raw data of both series in a single frame. You can also change
the default view to be differenced data, % change, etc.
As described in the lecture, editing a series in EViews is easy. Open any series and click on the
Edit +/- tab to turn on “editing mode”. This allows you to add data to the series or change
existing values in the series.
Similar to Excel, you can fill in data manually using the operators shown below:
Interpolating a Value
Better yet, let’s have EViews interpolate the data using any one of the built in interpolation
procedures. EViews has special symbols for various interpolation techniques, some of which were
discussed in the previous session:
Adjusting a Series
One useful EViews tool not discussed in the previous session is the so-called Adjust mode. Series
Adjust mode allows you to compare data in the original series to any changes you may make.
Unlike in Edit mode, the changes in Adjust mode are not permanent and can be easily reversed.
Click on the Adjust +/- button in the Series toolbar. EViews will add additional columns to the
spreadsheet view: an “Unadjusted” column containing data before any adjustment, one column
that detects the change in the new value from the unadjusted data (before you entered Adjust
mode), and one that detects the percentage difference.
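The extra columns amount to simple arithmetic on the unadjusted and adjusted values. A minimal Python sketch follows, with a hypothetical function name and made-up numbers. It is not how EViews computes the columns internally, only an illustration of what they contain.

```python
def adjust_table(unadjusted, adjusted):
    """Return rows of (unadjusted, adjusted, change, percent change)."""
    rows = []
    for old, new in zip(unadjusted, adjusted):
        delta = new - old
        # The percent change is undefined when the unadjusted value is zero.
        pct = 100.0 * delta / old if old != 0 else None
        rows.append((old, new, delta, pct))
    return rows


# Made-up data: the first observation is adjusted, the second is left alone.
rows = adjust_table([100.0, 200.0], [110.0, 200.0])
```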
Sample
The user can change the sample under consideration using “smpl” statements. There are a number
of useful @ functions to help you define the active sample more easily. Some examples of such
@ functions, as well as a description of the statements, are listed in the chart below:
create an equation in EViews
Select Object → New Object from the main menu and choose Equation.
Highlight the series you wish to estimate the equation with, right-click and select “Open” and
then “as Equation…”.
Auto series will be useful in forthcoming sessions on equation objects and regression. Auto series
allow the user to create functions of series “on the fly” and can be useful in keeping your
workfile parsimonious (i.e., to not have too many series objects).
EViews allows you to work with expressions (functions of existing series) directly, without
having to create and save new series.
Let’s use auto series to examine the relationship between inflation in Jamaica and The Bahamas.
Select both bah_cpi_eop and jam_cpi_eop (Use the Ctrl key to select both) and double click to
Open equation.
In the equation window, transform both cpi variables using the log difference. Under Estimation
settings, select LS for regression type and include the whole sample period, 1980 2014. Click
OK.
Recall that the notation “(-1)” in EViews means the previous period (or first “lag”).
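To make the lag and log-difference transformations concrete outside EViews, here is a small Python sketch. The function names and the index values are hypothetical, and None marks observations for which no previous period exists.

```python
import math


def lag(series, k=1):
    """Shift a series back by k >= 1 periods; the first k entries are missing."""
    return [None] * k + series[:-k]


def dlog(series):
    """Log difference: log(x_t) - log(x_{t-1}); missing for the first period."""
    previous = lag(series)
    return [
        None if p is None else math.log(x) - math.log(p)
        for x, p in zip(series, previous)
    ]


cpi = [100.0, 102.0, 105.06]   # made-up price index
cpi_lag = lag(cpi)             # [None, 100.0, 102.0]
cpi_dlog = dlog(cpi)           # roughly the period-on-period inflation rate
```

For small changes the log difference is close to the percentage change, which is why it is a common way to turn a price index into an inflation rate.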
As you know, the Phillips curve refers to the historical relationship between unemployment and
inflation in an economy. The relationship is typically negative; that is, on average, higher
unemployment rates have (historically at least) been associated with lower rates of inflation.
From the main menu, select Quick→Estimate Equation, and click OK. The equation box will
now appear.
In the Equation specification box, type atl_infl c atl_unemp_rate, where the first term, namely
atl_infl, is the dependent variable, c is the constant (a vector of ones), and atl_unemp_rate is the
independent variable.
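A specification of this form is a least squares regression of the dependent variable on a constant and one regressor. The Python sketch below shows the arithmetic for that case. The data are made up (they are not the Atlantis series) and the function name is hypothetical.

```python
def ols_simple(y, x):
    """Least squares fit of y = intercept + slope * x; returns (intercept, slope, r2)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals are actual minus fitted values.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ssr = sum(e * e for e in resid)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ssr / sst
    return intercept, slope, r2


# Made-up unemployment and inflation rates, in percent.
unemp = [4.0, 5.0, 6.0, 7.0, 8.0]
infl = [10.5, 7.5, 6.0, 4.5, 1.5]
intercept, slope, r2 = ols_simple(infl, unemp)
```

With these numbers the slope is negative, which is the sign a Phillips curve relationship would suggest.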
For Panel 1:
For Panel 2:
For Panel 3:
The R² indicates that this simple regression explains about a fifth of the variance in the inflation rate. Other
variables could be added to increase the explanatory power.
The coefficient on this period’s unemployment rate falls to -1.9561. This implies that a 1
percentage point increase in the unemployment rate is associated with about a 2 percentage point decrease in inflation (on average). Note
that the coefficient on the lagged unemployment rate is positive and significant. Including it in
the regression helps better specify the coefficient on current unemployment.
As shown in the explanation to Question 2.42, the Durbin-Watson statistic of the regression was
1.95. As a rule of thumb, a DW statistic substantially lower than 2 is generally a sign of positive
autocorrelation.
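The DW statistic is the sum of squared changes in successive residuals divided by the sum of squared residuals. A minimal Python sketch follows, using made-up residuals rather than those of the regression above.

```python
def durbin_watson(resid):
    """Durbin-Watson statistic: sum of (e_t - e_{t-1})^2 over sum of e_t^2."""
    numerator = sum(
        (resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid))
    )
    denominator = sum(e * e for e in resid)
    return numerator / denominator


# Made-up residuals: runs of the same sign (positive autocorrelation)
# versus residuals that flip sign every period (negative autocorrelation).
dw_persistent = durbin_watson([1.0, 1.0, -1.0, -1.0])
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
```

The persistent residuals give a value below 2 and the alternating residuals a value above 2, in line with the rule of thumb.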
You can easily look for autocorrelation in the residuals of an estimation using various methods.
What are some ways in which to investigate serial correlation? Select all that apply:
Use Proc→Make Residual Series to capture the residuals of the most recent regression
and save them as a new series.
After running your regression, select View→Actual, Fitted, Residual→Actual, Fitted,
Residual Table or Actual, Fitted, Residual Graph, click Freeze and then click Name to
save this table or graph.
After running your regression, select View→Residual Diagnostics→ Serial Correlation
LM test, and look to see whether or not the null can be rejected.
After running a regression, you can examine the fitted residuals by making a new residual series
or viewing the residual series in a table or graph. Note that the residuals will represent the
difference between the actual values of the dependent variable and the fitted value created by
your regression for each period.
While the resid series will show the residuals post-regression, note that this series changes with
your most recent estimation. Thus, it is advisable to use Proc→Make Residual Series to save the
residuals of particular regressions (for future testing, for instance).
Spotting Autocorrelation
To investigate, examine the residuals of the regression by selecting View → Actual, Fitted,
Residual → Graph. Note that because the DW statistic is close to 2 in this case, the serial
correlation, if it exists, will be difficult to see.
Looking at the graph, it is not evident which period is responsible for causing the DW statistic
to be less than 2. Thus, including a dummy variable (as done in the previous session) is unlikely
to deal with the issue.
Adding Dynamics
Let’s try adding a lag of the dependent variable to the right-hand side. Note that this will
create a dynamic equation, leading to specific issues about how to generate forecasts from the
equation. Do we, for example, use the forecast value of the lag to predict the current level of
inflation or do we use (assuming it is available) the observed value in the previous period?
But let’s first see if the past period’s inflation can help explain inflation in the current period.
Estimate a multiple term regression including the lagged dependent variable (inflation) for
Atlantis.
No, the lagged dependent variable is insignificant and its inclusion actually causes the DW
statistic to decrease marginally to 1.87.
Let’s run the LM test on a model that includes a constant, the unemployment rate, lagged
unemployment, and lagged inflation for Atlantis.
As performed in the lecture, estimate your regression, then select View → Residual Diagnostics
→ Serial Correlation LM Test. Make sure you allow for 3 lags in the computation of the statistic.
The null hypothesis of the Breusch-Godfrey test is no serial correlation. Running the test with 3 lags
yields the following results.
Session 4: Basic Forecasting Using the Equation Object, Part 1
A within sample forecast utilizes a subset of the available data to forecast values outside of the
estimation period and compare them to the corresponding known or actual outcomes. This is
done to assess the ability of the model to forecast known values. For example, a within sample
forecast from 1980 to 2015 might use data from 1980 to 2012 to estimate the model. Using this
model, the forecaster would then predict values for 2013-2015 and compare the forecasted values
to the actual known values.
An out of sample forecast instead uses all available data in the sample to estimate a model. For
the previous example, estimation would be performed over 1980-2015, and the forecast(s) would
commence in 2016.
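The within sample setup from the example can be sketched in Python as a split into an estimation sample and a holdout sample. The function name is hypothetical and the values are placeholders, since only the years matter here.

```python
def split_sample(data, last_estimation_year):
    """Split {year: value} into an estimation sample and a holdout sample."""
    estimation = {y: v for y, v in data.items() if y <= last_estimation_year}
    holdout = {y: v for y, v in data.items() if y > last_estimation_year}
    return estimation, holdout


# Placeholder observations for 1980 to 2015.
data = {year: 0.0 for year in range(1980, 2016)}

# Within sample forecast: estimate on 1980-2012, evaluate on 2013-2015.
estimation, holdout = split_sample(data, 2012)
```

For an out of sample forecast, the estimation sample would be the full 1980-2015 range and there would be no actual values to compare against.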
As opposed to a static forecast, which always uses the known (and hence error-free) value of data
to forecast the next period’s value, a dynamic forecast uses the forecasted value of the dependent
variable to generate the next predicted value (and predicted values further out in the forecasting
horizon). As there will typically be an error associated with each forecast, errors will most likely
accumulate (or build on themselves) as the forecast horizon increases.
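The difference can be sketched in Python for an equation with one lag of the dependent variable, y_t = const + phi * y_{t-1}. The coefficients and data below are made up, and the function names are hypothetical.

```python
def static_forecast(actual, const, phi, start):
    """One-step-ahead forecasts: each uses the observed previous value."""
    return [const + phi * actual[t - 1] for t in range(start, len(actual))]


def dynamic_forecast(actual, const, phi, start):
    """Multi-step forecasts: after the first step, each uses the previous forecast."""
    forecasts = []
    previous = actual[start - 1]      # last observed value before the forecast
    for _ in range(start, len(actual)):
        previous = const + phi * previous
        forecasts.append(previous)
    return forecasts


actual = [2.0, 3.0, 2.0, 4.0]         # made-up data
static = static_forecast(actual, const=1.0, phi=0.5, start=2)
dynamic = dynamic_forecast(actual, const=1.0, phi=0.5, start=2)
```

The two forecasts agree in the first period and diverge afterwards, because the dynamic forecast carries its own earlier forecast (and its error) forward.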
The first step is to restrict the active sample. Next, estimate your equation. Third, generate the
forecast while setting the forecast sample to be outside of the active sample. Last, compare the actual
and fitted residuals.
As explained in the lecture, the model simulator is a generalized forecasting tool that can deal
with models with many equations. The values obtained from the model simulator for 2012-14
(atl_infl_0) are the same as those obtained using the Forecast tab (atl_infl_od). This should come
as no surprise as the EViews simulator is solving the same equation that we used to forecast in
previous questions.
create alternative scenarios using EViews. A scenario in EViews is a (dynamic or static) forecast
conditional on a specific set of assumptions regarding the exogenous variables in the model.
Obviously, the baseline forecast is an example of one scenario. EViews allows you to define any
number of scenarios.
As time increases, all the pdfs shown in the graph (there is one for each period t) appear to be
centered at zero. Therefore the mean of Y does not change. However, the dispersion around zero
increases with t, since the pdfs become flatter as time t increases (that is, a smaller number of
observations are close to the zero mean). Thus, the mean remains constant over time but the
variance increases.
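One process that behaves this way is a random walk without drift. The text does not name the process behind the graph, so the Python sketch below assumes, purely for illustration, a random walk with standard normal shocks, and compares the cross-section of many simulated paths at an early and a late date.

```python
import random
import statistics


def simulate_random_walks(n_paths, n_steps, seed=0):
    """Simulate paths of y_t = y_{t-1} + e_t with y_0 = 0 and e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        y, path = 0.0, []
        for _ in range(n_steps):
            y += rng.gauss(0.0, 1.0)
            path.append(y)
        paths.append(path)
    return paths


def cross_section(paths, t):
    """Values of all paths at period t (1-based)."""
    return [path[t - 1] for path in paths]


paths = simulate_random_walks(n_paths=2000, n_steps=50)
mean_5 = statistics.mean(cross_section(paths, 5))
mean_50 = statistics.mean(cross_section(paths, 50))
var_5 = statistics.pvariance(cross_section(paths, 5))
var_50 = statistics.pvariance(cross_section(paths, 50))
```

Under this assumption the theoretical variance at period t is t times the shock variance, so the simulated variance at period 50 should be roughly ten times that at period 5, while both means stay near zero.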
Because observed time series are actually finite draws from infinite stochastic processes, there is
always a chance that you may never correctly identify the true stochastic process. What the
methods and tools discussed in this module will allow is for you to come up with an educated
guess of what the stochastic process is, which will in turn allow you to produce a reasonable
forecast for the time series being analyzed. However, in some cases it may boil down to
judgment, as there might not exist an unambiguously “best” model.
As long as there is nonzero correlation between y and any of its lags, y cannot be white noise.
However, y can be either stationary or nonstationary, as we do not know whether the mean,
variance, or covariance (correlations) change over time. Furthermore, with the given lag-one
correlation of 0.65, we cannot determine if y is an AR(p), MA(q), or ARMA(p,q). All three models can
produce a correlation of 0.65 at the first lag.
An upward trend implies that the mean is changing over time, therefore y is nonstationary and all
of the stationary processes (AR, ARMA, and white noise) can be eliminated.
The ACF of a pure AR(1) decays geometrically, and therefore the ACF at lag 1 is exactly the
autoregressive coefficient (0.55), and at lag 2 is equal to the square of the autoregressive
coefficient, (0.55)^2 = 0.3025.
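The geometric decay can be checked with a few lines of Python. The function name is hypothetical.

```python
def ar1_acf(phi, lag):
    """Theoretical autocorrelation of a stationary AR(1) at the given lag."""
    return phi ** lag


# First four theoretical autocorrelations for phi = 0.55.
acf_values = [ar1_acf(0.55, k) for k in range(1, 5)]
```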
For one lag, the PAC is defined as the estimated coefficient b1 from the following regression:
For the theoretical AR(1) process defined above for Variable A, the estimated coefficient should
be exactly equal to 0.55. We also know that the theoretical PAC of an AR(p) drops to zero after
lag p. In this case, p = 1, therefore the PAC at lag 2 should be equal to zero.
Estimating the PAC for one period amounts to regressing A on its 1-period lag, and taking the
estimated coefficient on the lag as the PAC. For the 2-period lag PAC, you must regress A on its
1-period and 2-period lags, with the estimated coefficient on the 2-period lag being the PAC at
two lags.
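The regression approach can be sketched in Python using only the standard library. The simulated AR(1) below uses a coefficient of 0.55 and 500 observations, but the seed, burn-in, and function names are assumptions made for illustration, so the estimates will not match those reported from EViews.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(y, columns):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve_linear(xtx, xty)


def pac(series, lag):
    """PAC at `lag`: coefficient on the longest lag in a regression
    of the series on a constant and its lags 1..lag."""
    y = series[lag:]
    const = [1.0] * len(y)
    lags = [series[lag - j: len(series) - j] for j in range(1, lag + 1)]
    return ols(y, [const] + lags)[-1]


# Simulate an AR(1) with coefficient 0.55; drop the first 100 values as burn-in.
rng = random.Random(42)
a = [0.0]
for _ in range(599):
    a.append(0.55 * a[-1] + rng.gauss(0.0, 1.0))
a = a[100:]   # 500 observations

pac_1 = pac(a, 1)
pac_2 = pac(a, 2)
```

In a finite sample pac_1 should be near 0.55 and pac_2 near zero, though neither will equal its theoretical value exactly.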
Note that the theoretical AR(1) should have a 2-period PAC of zero, whereas our simulated
AR(1) has a 2-period PAC that is small but still statistically significant. This shows that even
with a relatively large number of observations (500), a simulation of A does not necessarily
replicate the behavior of the true stochastic process underlying it. A researcher looking at this
time series (without knowing its true nature) might plausibly (but incorrectly!) infer that A is an
AR(2). This example also suggests how difficult it can be to infer the true stochastic process from
actual economic time series.
MA(1): R² = 0.852
MA(2): R² = 0.854
MA(3): R² = 0.855
Indicate which of the following could explain why the MA(3) is not the best model for Z (select
all that apply): Although the MA(3) has the best fit, it does not have the lowest AIC. Not all of
the three MA(3) coefficients are statistically significant.
EXPLANATION
Options a and b are correct and related; introducing additional terms into the regression will
improve the fit (if only marginally), but these might not be statistically significant. The AIC
weighs the improved fit with the additional parameters introduced, so it may actually be higher
(i.e., worse) for MA(3) than for the other two specifications. If another of the two models has a
lower AIC, then MA(3) cannot be the best. Also, there may be significant AR terms that could be
included that lower AIC even further. Option c is incorrect; parsimony is not an objective on its
own, otherwise, we would choose only AR(1) or MA(1) models.
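The trade-off between fit and the number of parameters can be illustrated with one common textbook form of the criteria, AIC = ln(SSR/T) + 2k/T and SBC = ln(SSR/T) + k ln(T)/T. This form is an assumption made for illustration: EViews reports likelihood-based versions, so the levels will differ. The SSR values and sample size below are made up.

```python
import math


def aic(ssr, t, k):
    """Textbook AIC: log of average squared residual plus a penalty of 2k/T."""
    return math.log(ssr / t) + 2.0 * k / t


def sbc(ssr, t, k):
    """Textbook SBC: same fit term, with the heavier penalty k*ln(T)/T."""
    return math.log(ssr / t) + k * math.log(t) / t


T = 100
# Made-up results: the larger model fits slightly better but uses two more parameters.
aic_small, sbc_small = aic(100.0, T, 2), sbc(100.0, T, 2)
aic_large, sbc_large = aic(99.5, T, 4), sbc(99.5, T, 4)
```

Here the larger model has the lower SSR but the higher AIC and SBC, and the SBC penalizes the extra parameters more heavily than the AIC.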
Consider the following three models that have been estimated for pe_ind (the stock market price-
earnings ratio in India).
Model 1 is an AR(1);
Model 2 has AR terms at lags 1 and 8 only;
Model 3 has AR terms at lags 1 and 8, and an MA term at lag 8.
For each model, the regression results and the ACF, PACF and Q-Statistics for the residuals are
shown below: Note: You are strongly encouraged to replicate the results yourself by using
Quick->Estimate equation-> then typing in the equation pe_ind c ar(1) for example or running
the command equation model1.ls pe_ind c ar(1) which will save the results in an equation object
model1. Note that SIGMASQ which appears in the output if you are using EViews 9 is not the
coefficient of an additional variable in the regression. It is the estimate of the error variance of
the regression based on the Maximum Likelihood method. You can ignore it in the context of this
module. Also note that if you use EViews 8, which uses a different default estimation method,
SIGMASQ does not appear and the numerical results could be different.
In discussing the AIC and SBC in the lecture, we referred to SSR as the sum of squares of
residuals, which reflects the goodness of fit of the regression. Model 3, with the highest number
of estimated parameters, has the lowest SSR (297.8219) and, equivalently, the highest
R² (0.840784).
In addition to having the highest R², Model 3 also has the lowest AIC; therefore the additional
parameters are significant enough to lower AIC. Note, however, that the AR(8) term is not
statistically significant.
Is the model with the lowest AIC also the one with the lowest Schwarz Criterion (SBC)?
With which model or models can you be reasonably confident that the residuals are not serially correlated?
(Select all that apply)
Note: to replicate the correlograms of the residuals you can either click on the regression window
then View->Residual diagnostics->Correlogram – Q-statistics or run the command
model1.correl(12) if you already defined the equation “model1” as shown in the note above.
The null hypothesis of the Q-statistic is that there is no autocorrelation in the true residuals for all
lags up to s. All of the reported Q-statistics for Models 2 and 3 are lower than the corresponding
10 percent critical value. (Equivalently, the reported p-values are greater than 10%.) Hence one
fails to reject the null hypothesis that the residuals are not serially correlated. On the other hand,
Model 1 rejects this null hypothesis using a 10 percent level of significance, but only after lag 7
(i.e., for lags 8-11).
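Assuming the Q-statistic is of the Ljung-Box form, Q(s) = T(T+2) times the sum over k = 1..s of r_k^2 / (T - k), where r_k is the sample autocorrelation of the residuals at lag k, it can be computed in Python as follows. The residuals are made up, and comparing Q with a chi-square critical value is left out to keep the sketch within the standard library.

```python
def autocorrelation(resid, k):
    """Sample autocorrelation of the residuals at lag k."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    numerator = sum(dev[t] * dev[t - k] for t in range(k, n))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


def ljung_box_q(resid, s):
    """Ljung-Box Q-statistic using autocorrelations at lags 1..s."""
    n = len(resid)
    total = sum(autocorrelation(resid, k) ** 2 / (n - k) for k in range(1, s + 1))
    return n * (n + 2) * total


# Made-up residuals that flip sign every period (strongly autocorrelated).
resid = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
r1 = autocorrelation(resid, 1)
q1 = ljung_box_q(resid, 1)
```

A large Q relative to the chi-square critical value leads to rejecting the null of no autocorrelation up to lag s.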
Based on your reading of the results, what will you do now to determine the best model for pe_ind?
Although here we enter into the realm of judgment, it would be uncontroversial to say that
Model 3 is better than the other two; it achieves lower AIC and SBC, and the residuals are well-
behaved. However, it is also noticeable that the AR term at lag 8 is not significant, therefore it is
possible that the MA term at lag 8 is correcting for the autocorrelation observed in Model 1, and
therefore the AR term at lag 8 might not be needed. Therefore, one additional specification that
one might consider is an AR(1) with a single MA term at lag 8. It might achieve uncorrelated
residuals with fewer parameters, and might lower AIC and SBC further. So our work is not quite
done yet!
The true model, the AR(2) regression, has a lower AIC and SBC, exhibits significant coefficients
for both lags, and serially uncorrelated residuals. Therefore, the diagnostics favor the true model.
Looking at the Model Selection Criteria table, one can see that the true model, AR(2) was one of
the best in terms of AIC, but actually has the lowest Schwarz Information Criterion (SIC, reported
as BIC in the table). Therefore, using SIC (BIC) would lead one to choose the true model as
the best.
In the assessments for Session 5, we suggested that a model with an AR(1) and a single MA term
at lag 8 might outperform the three models we had observed. Run this model, which we will call
Model 4 (note that Automatic ARIMA Modeling would not consider such a model, as it
would include all MA terms up to lag 8 rather than the 8th lag alone). The results for the
previous models are summarized below:
Model 4 achieves serially uncorrelated residuals with only two estimated parameters, both of
which are highly significant. Its AIC (3.396371) and SBC (3.466789) are lower than for any of
the three previous models, and also lower than that of the ARMA(5,6) chosen by Automatic
ARIMA Modeling.
Lesson: it might be possible to find a model that outperforms those chosen by Automatic ARIMA
Modeling, by choosing more parsimonious models that include some higher-order lags but
exclude intermediate lags. Automatic ARIMA Modeling is constrained to include all lags up to p
or q.