Kuan-Pin Lin
ETEXT
Los Angeles
www.etext.net
COMPUTATIONAL ECONOMETRICS
GAUSS Programming for Econometricians and Financial Analysts
ISBN 0-9705314-3-5
All rights reserved. No part of this book and the accompanying software may be reproduced,
stored in a retrieval system, translated or transcribed, in any form or by any means—
electronic, mechanical, photocopying, recording, or otherwise—without the prior written
permission of the copyright owner.
Although every precaution has been taken in the preparation of this book and the accompanying
software, the publisher and author make no representations or warranties with respect to the
accuracy or completeness of the contents, and specifically disclaim any implied warranties of
merchantability or fitness for any particular purpose, and shall in no event be liable for any
loss of profit or any damages arising out of the use of this book and the accompanying
software.
Trademarks
GAUSS is a trademark of Aptech Systems, Inc. GPE2 is a product name of Applied Data
Associates. All other brand names and product names used in this book are trademarks,
registered trademarks, or trade names of their respective holders.
Preface
Computational Econometrics is an emerging field of applied economics which focuses on the
computational aspects of econometric methodology. To explore an effective and efficient
approach for econometric computation, GAUSS Programming for Econometricians and
Financial Analysts (GPE) was originally developed as the outcome of a faculty-student joint
project. The author developed the econometric program and used it in the classroom. The
students learned the subject materials and wrote about their experiences in using the program
and GAUSS.
We know that one of the obstacles in learning econometrics is the need to do computer
programming. Who really wants to learn a new programming language while at the same time
struggling with new econometric concepts? This is probably the reason that “easy-to-use”
packages such as RATS, SHAZAM, EVIEWS, and TSP are often used in teaching and
research. However, these canned packages are inflexible and do not allow the user sufficient
freedom in advanced modeling. GPE is an econometrics package running in the GAUSS
programming environment. You write simple codes in GAUSS to interact with GPE
econometric procedures. In the process of learning GPE and econometrics, you learn GAUSS
programming at your own pace and for your future development.
Still, it takes some time to become familiar with GPE, not to mention the GAUSS language.
The purpose of this GPE project is to provide hands-on lessons with illustrations on using the
package and GAUSS. GPE was first developed in 1991 and has since undergone several
updates and revisions. The first version of the project, code-named LSQ, started in the
summer of 1995 with limited functions of least squares estimation and prediction. This book
and CD-ROM represent a major revision of this work in progress, including linear and
nonlinear regression models, simultaneous linear equation systems, and time series analysis.
Here, in your hands, is the product of GPE. The best way to learn GPE is to read the book,
type in and run each lesson, and explore the sample programs and output. For your
convenience, all the lessons and data files are available on the distribution disk.
During several years of teaching econometrics using the GPE package, many students
contributed to the ideas and codes in GPE. Valuable feedback and suggestions were
incorporated into developing this book. In particular, the first LSQ version was a joint project
with Lani Pennington, who gave this project its shape. Special thanks are due to Geri
Manzano, Jennifer Showcross, Diane Malowney, Trish Atkinson, and Seth Blumsack for their
efforts in editing and proofreading many draft versions of the manuscript and program
lessons. As always, I am grateful to my family for their continuing support and understanding.
Table of Contents
PREFACE................................................................................................................................iii
TABLE OF CONTENTS ............................................................................................................. v
I INTRODUCTION ................................................................................................................... 1
Why GAUSS?..................................................................................................................... 1
What is GPE? .................................................................................................................... 1
Using GPE......................................................................................................................... 2
II GAUSS BASICS .............................................................................................................. 5
Getting Started................................................................................................................... 5
An Introduction to GAUSS Language................................................................................ 7
Creating and Editing a GAUSS Program........................................................................ 17
Lesson 2.1 Let’s Begin ...............................................................................................................18
File I/O and Data Transformation................................................................................... 21
Lesson 2.2: File I/O ....................................................................................................................23
Lesson 2.3: Data Transformation................................................................................................25
GAUSS Built-In Functions............................................................................................... 26
Lesson 2.4: Data Analysis ..........................................................................................................32
Controlling Execution Flow ............................................................................................ 33
Writing Your Own Functions........................................................................................... 36
User Library .................................................................................................................... 40
GPE Package................................................................................................................... 41
III LINEAR REGRESSION MODELS ..................................................................................... 43
Least Squares Estimation ................................................................................................ 43
Lesson 3.1: Simple Regression...................................................................................................44
Lesson 3.2: Residual Analysis ....................................................................................................46
Lesson 3.3: Multiple Regression ................................................................................................48
Estimating Production Function...................................................................................... 50
Lesson 3.4: Cobb-Douglas Production Function ........................................................................51
Lesson 3.5: Testing for Structural Change..................................................................................55
Lesson 3.6: Residual Diagnostics ...............................................................................................58
IV DUMMY VARIABLES .................................................................................................... 63
Seasonality....................................................................................................................... 63
Lesson 4.1: Seasonal Dummy Variables.....................................................................................64
Lesson 4.2: Dummy Variable Trap.............................................................................................67
Structural Change............................................................................................................ 68
Lesson 4.3: Testing for Structural Change: Dummy Variable Approach ...................................68
V MULTICOLLINEARITY ..................................................................................................... 73
Detecting Multicollinearity.............................................................................................. 73
Lesson 5.1: Condition Number and Correlation Matrix..............................................................73
Lesson 5.2: Theil’s Measure of Multicollinearity.......................................................................75
Lesson 5.3: Variance Inflation Factors (VIF) .............................................................................77
Correction for Multicollinearity ...................................................................................... 78
Lesson 5.4: Ridge Regression and Principal Components..........................................................78
VI NONLINEAR OPTIMIZATION ......................................................................................... 81
Solving Mathematical Functions ..................................................................................... 81
Lesson 6.1: One-Variable Scalar-Valued Function.....................................................................82
Lesson 6.2: Two-Variable Scalar-Valued Function....................................................................85
Estimating Probability Distributions ............................................................................... 87
Lesson 6.3: Estimating Probability Distributions....................................................................... 88
Lesson 6.4: Mixture of Probability Distributions ....................................................................... 91
Statistical Regression Models .......................................................................................... 93
Lesson 6.5: Minimizing Sum-of-Squares Function.................................................................... 94
Lesson 6.6: Maximizing Log-Likelihood Function.................................................................... 96
VII NONLINEAR REGRESSION MODELS ............................................................................99
Nonlinear Least Squares.................................................................................................. 99
Lesson 7.1: CES Production Function...................................................................................... 100
Maximum Likelihood Estimation ................................................................................... 101
Lesson 7.2: Box-Cox Variable Transformation........................................................................ 104
Statistical Inference in Nonlinear Models...................................................................... 108
Lesson 7.3: Hypothesis Testing for Nonlinear Models ............................................................ 109
Lesson 7.4: Likelihood Ratio Tests of Money Demand Equation............................................ 112
VIII DISCRETE AND LIMITED DEPENDENT VARIABLES ..................................................113
Binary Choice Models.................................................................................................... 113
Lesson 8.1: Probit Model of Economic Education................................................................... 115
Lesson 8.2: Logit Model of Economic Education .................................................................... 119
Limited Dependent Variable Models ............................................................................. 121
Lesson 8.3: Tobit Analysis of Extramarital Affairs.................................................................. 122
IX HETEROSCEDASTICITY ..............................................................................................127
Heteroscedasticity-Consistent Covariance Matrix ........................................................ 127
Lesson 9.1: Heteroscedasticity-Consistent Covariance Matrix ................................................ 127
Weighted Least Squares ................................................................................................. 130
Lesson 9.2: Goldfeld-Quandt Test and Correction for Heteroscedasticity ............................... 130
Lesson 9.3: Breusch-Pagan and White Tests for Heteroscedasticity........................................ 132
Nonlinear Maximum Likelihood Estimation .................................................................. 134
Lesson 9.4: Multiplicative Heteroscedasticity........................................................... 135
X AUTOCORRELATION .....................................................................................................143
Autocorrelation-Consistent Covariance Matrix............................................................. 143
Lesson 10.1: Heteroscedasticity-Autocorrelation-Consistent Covariance Matrix .................... 144
Detection of Autocorrelation ......................................................................................... 146
Lesson 10.2: Tests for Autocorrelation .................................................................................... 147
Correction for Autocorrelation ...................................................................................... 149
Lesson 10.3: Cochrane-Orcutt Iterative Procedure .................................................................. 151
Lesson 10.4: Hildreth-Lu Grid Search Procedure .................................................................... 153
Lesson 10.5: Higher-Order Autocorrelation............................................................................. 154
Autoregressive and Moving Average Models: An Introduction ..................................... 157
Lesson 10.6: ARMA(1,1) Error Structure ................................................................................ 158
Nonlinear Maximum Likelihood Estimation .................................................................. 161
Lesson 10.7: Nonlinear ARMA Model Estimation .................................................................. 161
XI DISTRIBUTED LAG MODELS .......................................................................................167
Lagged Dependent Variable Models.............................................................................. 167
Lesson 11.1: Testing for Autocorrelation with Lagged Dependent Variable ........................... 167
Lesson 11.2: Instrumental Variable Estimation........................................................................ 170
Polynomial Lag Models ................................................................................................. 173
Lesson 11.3: Almon Lag Model Revisited............................................................................... 173
Autoregressive Distributed Lag Models......................................................................... 176
Lesson 11.4: Almon Lag Model Once More............................................................................ 176
XII GENERALIZED METHOD OF MOMENTS .....................................................................179
GMM Estimation of Probability Distributions............................................................... 179
Lesson 12.1 Gamma Probability Distribution .......................................................................... 181
GMM Estimation of Econometric Models ......................................................................185
Lesson 12.2 A Nonlinear Rational Expectations Model ...........................................................188
Linear GMM ...................................................................................................................192
Lesson 12.3 GMM Estimation of U.S. Consumption Function ................................................192
XIII SYSTEM OF SIMULTANEOUS EQUATIONS ............................................................... 195
Linear Regression Equations System..............................................................................195
Lesson 13.1: Klein Model I ......................................................................................................197
Lesson 13.2: Klein Model I Reformulated................................................................................202
Seemingly Unrelated Regression Equations System (SUR) ............................................204
Lesson 13.3: Berndt-Wood Model............................................................................................204
Lesson 13.4: Berndt-Wood Model Extended............................................................................207
Nonlinear Maximum Likelihood Estimation...................................................................209
Lesson 13.5: Klein Model I Revisited ......................................................................................211
XIV UNIT ROOTS AND COINTEGRATION ....................................................................... 215
Testing for Unit Roots.....................................................................................................216
Lesson 14.1: Augmented Dickey-Fuller Test for Unit Roots .................................................217
Testing for Cointegrating Regression.............................................................................223
Lesson 14.2: Cointegration Test: Engle-Granger Approach .....................................................225
Lesson 14.3: Cointegration Test: Johansen Approach ..............................................................230
XV TIME SERIES ANALYSIS ........................................................................................... 233
Autoregressive and Moving Average Models .................................................................234
Lesson 15.1: ARMA Analysis of Bond Yields.........................................................................235
Lesson 15.2: ARMA Analysis of U.S. Inflation .......................................................................239
Autoregressive Conditional Heteroscedasticity..............................................................240
Lesson 15.3 ARCH Model of U.S. Inflation.............................................................................243
Lesson 15.4 ARCH Model of Deutschemark-British Pound Exchange Rate ...........................245
XVI PANEL DATA ANALYSIS ........................................................................................ 251
Fixed Effects Model ........................................................................................................251
Lesson 16.1: One-Way Panel Data Analysis: Dummy Variable Approach ..............................253
Random Effects Model....................................................................................................256
Lesson 16.2: One-Way Panel Data Analysis: Deviation Approach ..........................................258
Lesson 16.3: Two-Way Panel Data Analysis............................................................................261
Seemingly Unrelated Regression System ........................................................................263
Lesson 16.4: Panel Data Analysis for Investment Demand: Deviation Approach....................264
Lesson 16.5: Panel Data Analysis for Investment Demand: SUR Method ...............................266
XVII LEAST SQUARES PREDICTION .............................................................................. 271
Predicting Economic Growth .........................................................................................271
Lesson 17.1: Ex-Post Forecasts and Forecast Error Statistics...................................................272
Lesson 17.2: Ex-Ante Forecasts ...............................................................................................277
EPILOGUE ................................................................................................................. 281
APPENDIX A GPE CONTROL VARIABLES ............................................................ 283
Input Control Variables..................................................................................................283
General Purpose Input Control Variables .................................................................................283
ESTIMATE Input Control Variables........................................................................................284
FORECAST Input Control Variables .......................................................................................292
Output Control Variables ...............................................................................................293
ESTIMATE Output Control Variables .....................................................................................293
FORECAST Output Control Variables ....................................................................................294
APPENDIX B GPE APPLICATION MODULES .......................................................... 297
Application Module B-1: GMM.GPE .............................................................................297
Application Module B-2: JOHANSEN.GPE .................................................................. 299
Application Module B-3: PANEL1.GPE........................................................................ 300
Application Module B-4: PANEL2.GPE........................................................................ 303
APPENDIX C STATISTICAL TABLES ......................................................................313
Table C-1. Critical Values for the Dickey-Fuller Unit Root Test Based
on t-Statistic ................................................................................................................... 313
Table C-2. Critical Values for the Dickey-Fuller Unit Root Test Based
on F-Statistic.................................................................................................................. 315
Table C-3. Critical Values for the Dickey-Fuller Cointegration t-Statistic
τρ Applied on Regression Residuals.............................................................................. 316
Table C-4. Critical Values for Unit Root and Cointegration Tests Based
on Response Surface Estimates...................................................................................... 317
Table C-5: Critical Values for the Johansen’s Cointegration Likelihood
Ratio Test Statistics........................................................................................................ 319
REFERENCES ..........................................................................................................321
INDEX ...........................................................................................................................323
I
Introduction
GAUSS Programming for Econometricians and Financial Analysts (GPE) is a
package of econometric procedures written in GAUSS, and this book is about
GAUSS programming for econometric analysis and applications using GPE. To
explore the computational aspects of applied econometrics, we adopt the
programming environment of GAUSS and GPE.
You cannot learn econometrics by just reading your textbook or by just writing
GAUSS code or programs. You must interact with the computer and textbook by
working through the examples. That is what this book is all about—learning by
doing.
Why GAUSS?
GAUSS is a programming language similar to C or Pascal. GAUSS code works on
matrices as the basis of a complete programming environment. It is flexible and
applies easily to any kind of matrix-based computation.
GAUSS comes with about 400 intrinsic commands ranging from file input/output
(I/O) and graphics to high-level matrix operations. There are many GAUSS libraries
and application packages, which take advantage of these built-in commands and
procedures for implementing accurate and efficient computations.
The use of libraries and packages hides complex programming details and simplifies
the interface with a set of extended procedures and control variables. For instance,
GAUSS supports publication-quality graphics through a library that extends the
main system with a set of control variables governing the defined graphics
procedures.
What is GPE?
GPE is a GAUSS package for linear and nonlinear regression useful for econometric
analysis and applications. GPE contains many econometric procedures controllable
by a few groups of global variables. It covers most basic econometric computations.
However, beyond econometric computation, GPE does not provide a user interface
for data input and output, nor are there any procedures for data transformation. Both
of these operations and other related topics, which build the interaction between GPE
and the GAUSS programming environment, will be discussed in the next chapter on
GAUSS Basics. Using the GPE package in a GAUSS environment is first introduced
in Chapter III on linear least squares estimation and is the foundation of the rest of
the book.
Using GPE
This book and CD-ROM were developed based on the latest version of GAUSS for
Windows.1 Before using the GPE package, you must install it properly with your
GAUSS program. Install GPE according to the instructions given with the
distribution CD. Make sure that the version number of GPE matches that of
your GAUSS program.2
Following the completion of GPE installation, the compiled GPE program named
GPE2.GCG should reside in the GAUSS directory. GPE2.GCG is a compiled
GAUSS program. It is an encoded binary file, which requires the correct version of
GAUSS. In addition, a GPE subdirectory is created and stores all the lesson
programs and data files. GPE is the working directory for all the empirical lessons.
By going through this book lesson by lesson, program files may be overwritten and
additional output files are generated. If you want a fresh start, just reinstall the GPE
package.
All the GPE lesson programs are written with direct reference to the GPE
subdirectory created during installation. Using the default GPE subdirectory is
convenient because all the lesson programs and data files are already there for you to
explore. Alternatively, you may want to use a working diskette to practice
creating each lesson. If you don’t mind typing, using a working diskette is not only
portable but also a true hands-on experience. You need only change the references
of the GPE subdirectory in each lesson program to the floppy drive your working
diskette resides on (a: is assumed). That is, in the beginning of each lesson program,
replace gpe\ with a:\. You may also need to copy the required data files to the
working diskette. A working diskette is recommended especially if you are using
GAUSS in a laboratory environment.
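For example, a data-loading line near the top of a lesson program would change as follows (the file name and matrix dimensions here are only illustrative):

```gauss
/* original reference to the GPE subdirectory */
load data[17,7] = gpe\longley.txt;

/* changed to read from the working diskette */
load data[17,7] = a:\longley.txt;
```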
It is important to recognize that this book is not a GAUSS how-to manual or program
documentation, for which you are advised to consult GAUSS for Windows User
Guide and GAUSS Language References supplied by Aptech Systems. Also, this is
not a book on econometrics, although many fundamental formulas for econometric
computation are introduced in order to use the implemented algorithms and routines.
There are many textbooks on econometrics that describe the technical details. Rather,
this is a book on computational aspects of implementing econometric methods. We
provide step-by-step instructions using GPE and GAUSS, complete with explanations
and sample program codes. GAUSS program codes are given in small chunks in a
piecemeal construction. Each chunk, or lesson, offers hands-on practice for
economic data analysis and econometric applications. Most examples can be used on
different computer platforms without modification.

1. This writing is based on GAUSS for Windows version 5.0.
2. GPE is also available for earlier versions of GAUSS.
A number of abbreviations for statistical and econometric terms are used in this text.
Although all are defined upon their first appearance, we provide a list of these
abbreviations below for reference purposes:
3. We thank Aptech Systems for permission to use their GAUSS 3.2 “hammer on numbers” icon.
II
GAUSS Basics
GAUSS is a high-level computer language suitable for mathematical and matrix-
oriented problem solving. It can be used to solve any kind of mathematical,
statistical, or econometric model. Since GAUSS is a computer language, it is
flexible. But it is also more difficult to learn than most canned (prewritten)
econometric programs such as EVIEWS, SHAZAM, and TSP.
In this chapter we begin with the basics of starting GAUSS for Windows. After
learning how to get in and out of GAUSS, we discuss much of the GAUSS language.
At the end of the chapter, we introduce the GPE (GAUSS Programming for
Econometricians and Financial Analysts) package and briefly describe its capacity
for econometric analysis and applications.
Getting Started
Start GAUSS for Windows in one of the following ways:
• Click the short-cut (an icon with GAUSS logo) on the desktop.
• From the Start button at the lower left corner, open the Programs menu, then select
and run GAUSS.
• Use Windows Explorer or File Manager to locate the GAUSS directory4 and
execute the file GAUSS.EXE.
To quit and exit GAUSS for Windows, do either one of the following:
Windows Interface
If you are new to the GAUSS programming environment, you need to spend some
time to familiarize yourself with the GAUSS Windows interface. A good reference is
GAUSS for Windows User Guide. Or, from the menu bar go to Help/Contents to
learn about GAUSS and its Windows interface. Understanding the function of each
button on the menu bar, toolbar (below the menu bar), and status bar (bottom bar of
the main window) is a crucial first step in GAUSS programming.
Briefly, GAUSS for Windows runs in two modes: Command and Edit. Each mode
has its own window. The Command Input-Output window (or Command mode) is
4. GAUSS directory refers to the directory in which you have successfully installed the GAUSS program on your computer. Assuming C: is your boot drive, by default installation the GAUSS directory may be C:\GAUSS, C:\GAUSS50 (for Version 5.0), or C:\GAUSSLT (for Light Version 5.0). In the following, we refer to C:\GAUSS as the GAUSS directory.
You may want to configure the programming environment to your taste. This is done
from the menu bar button Configure, from which you can change the program setup
and window properties. In the GAUSS programming environment, you can also trace
and debug a program file in the Debug window. This is better suited to programmers
developing large programs, which we will not cover in this book.
We have seen that GAUSS commands are either written in the Command or Edit
mode. Command mode executes each line of code as it is written. Simple GAUSS
commands can be typed and executed (by pressing the carriage return or Enter key)
line by line at the “>>” prompt in the Command window. In the beginning, to
introduce the basic commands and statements of GAUSS, we shall stay in the
Command Input-Output window and use the Command or interactive mode. If the
Output window is open, close it.

5. This session is written based on introductory materials for MathWorks’ MATLAB prepared by William F. Sharpe for his finance course at Stanford (http://www.stanford.edu/~wfsharpe/mia/mat/mia_mat3.htm). We thank Professor Sharpe for his helpful comments and suggestions. Both GAUSS and MATLAB are matrix programming languages, and they are syntactically similar.
First of all, each line of GAUSS code must end with a semi-colon (;).
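Consider, for example, a statement of the following form (the particular values of A and B are only illustrative):

```gauss
A = {1 2, 3 4};
B = {5 6, 7 8};
C = A + B;
print C;
```

The discussion below describes what C will be when A and B are scalars, vectors, or matrices.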
If both A and B are scalars (1 by 1 matrices), C will be a scalar equal to their sum. If
A and B are row vectors of identical length, C will be a row vector of the same
length. Each element of C will be equal to the sum of the corresponding elements of
A and B. Finally, if A and B are, say, 3×4 matrices, C will also be a 3×4 matrix, with
each element equal to the sum of the corresponding elements of A and B.
In short, the symbol "+" means "perform a matrix addition." But what if A and B are
of incompatible sizes? Not surprisingly, GAUSS will complain with an error
message. So the symbol "+" means "perform a matrix addition if you can and let me
know if you can't." Similar rules and interpretations apply to matrix operations such as "-"
(subtraction) and "*" (multiplication).
Assignment Statements
GAUSS uses a pattern common in many programming languages for assigning the
value of an expression to a variable. The variable name is placed on the left of an
equal sign and the expression on the right. The expression is evaluated and the result
assigned to the variable. In GAUSS, there is no need to declare a variable before
assigning a value to it. If a variable has previously been assigned a value, a number,
or a string, the new value overrides the predecessor. Thus if A and B are of size
20×30, the statement:
C = A + B;
creates a variable named C that is also 20×30 and fills it with the appropriate values
obtained by adding the corresponding elements in A and B. If C already existed and
was, say, 20×15 it would be replaced with the new 20×30 matrix. Therefore, matrix
variables in GAUSS are not fixed in size. In GAUSS, unlike some languages, there is
no need to pre-dimension or re-dimension variables. It all happens without any
explicit action on the part of the user.
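As a quick sketch of this behavior (the values are only illustrative):

```gauss
C = 0;           @ C starts out as a scalar @
C = ones(2,3);   @ C is now a 2x3 matrix of ones @
C = ones(4,5);   @ and now a 4x5 matrix; no re-dimensioning is needed @
```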
Variable Names
The GAUSS environment is case insensitive. Typing variable names in uppercase,
lowercase, or a combination of both does not matter. That is, GAUSS does not
distinguish between uppercase and lowercase except inside double quotes. A variable
name can have up to 32 characters, including letters, numbers and underscores. The
first character must be alphabetic or an underscore. Therefore the variable
PersonalDisposableIncome is the same as personaldisposableincome.
While it is tempting to use long names for easy reading, small typing errors can mess
up your programs. If you do mistype a variable name, you may get lucky (e.g. the
system will complain that you have asked for the value of an undefined variable) or
you may not (e.g. you will assign the new value to a newly created variable instead
of the old one desired). In programming languages there are always tradeoffs. You
don’t have to declare variables in advance in GAUSS. This avoids a great deal of
effort, but it allows for the possibility that nasty and difficult-to-detect errors may
creep into your programs.
Showing Values
If at any time you wish to see the contents of a variable, just type its name. GAUSS
will do its best, although the result may extend beyond the Command or Output
window if the variable is a large matrix (remember that you can always resize the
window). If the variable, say x, is not defined or has not previously been given a
value, a message such as:
Undefined symbols:
x (0)
will appear.
GAUSS will not show you the result of an assignment statement unless you
specifically request it. Thus if you type:
C = A + B;
No values will be shown although C is now assigned with values of the sum of A and
B. But, if you type:
C;
GAUSS will show you the value of C. It may be a bit daunting if C is, say, a 20 by 30
matrix. If the variable C is not of interest, and what you want to see is the result of A
plus B, simply type:
A + B;
Initializing Matrices
If a matrix is small enough, one can provide initial values by simply typing them in
the Command window. For example:
a = 3;
b = {1 2 3};
c = {4, 5, 6};
d = {1 2 3, 4 5 6};
Here, a is a scalar, b is a 1×3 row vector, c a 3×1 column vector, and d is a 2×3
matrix. Thus, typing
d;
produces:
1.0000000 2.0000000 3.0000000
4.0000000 5.0000000 6.0000000
The system for indicating matrix contents is very simple. Values separated by spaces
belong on the same row; those separated by commas are on separate rows. All values
are enclosed in brace brackets.
The alternative to creating a matrix using constants is to use the GAUSS built-in
command let. If dimensions are given, a matrix of that size is created. The
following statement creates a 2×3 matrix:
let d[2,3] = 1 2 3 4 5 6;
Note that dimensions of d are enclosed in square brackets, not curly brace
brackets. If dimensions are not given, a column vector is created:
let d = 1 2 3 4 5 6;
If curly braces are used, the let is optional. That is, the following two
expressions will create the same matrix d:
let d = {1 2 3, 4 5 6};
d = {1 2 3, 4 5 6};
While
a = {1 2 3};
b = {4 5 6};
d = a|b;
print d;
Matrices can easily be pasted together in this manner, a process that is both simple
and easily understood by anyone reading a procedure. Of course, the sizes of the
matrices must be compatible. If they are not, GAUSS will tell you.
The same matrix can also be created with:
d = {a,b};
Individual elements of a matrix are referenced by indexing with row and column
numbers enclosed in square brackets. For example,
d[1,2];
equals:
2.0000000
While
d[2,1];
equals:
4.0000000
In every case the first bracketed expression indicates the desired row (or rows), while
the second expression indicates the desired column (or columns). If a matrix is a
vector, a single expression may be given to indicate the desired element, but it is
often wise to give both row and column information explicitly.
The real power of GAUSS comes into play when more than a single element of a
matrix is wanted. To indicate “all the rows” use a dot for the first expression. To
indicate “all the columns,” use a dot for the second expression. Thus,
d[1,.];
equals:
1.0000000 2.0000000 3.0000000
That is, d[1,.] yields a matrix containing the entire first row of d. While,
d[.,2];
equals:
2.0000000
5.0000000
That is, d[.,2] yields a matrix containing the entire second column of d. In fact,
you may use any expression in this manner as long as it evaluates to valid row or
column numbers. For example,
d[2,2:3];
equals:
5.0000000 6.0000000
And
d[2,3:2];
equals:
6.0000000 5.0000000
While
d[2 1 2,2:3];
equals:
5.0000000 6.0000000
2.0000000 3.0000000
5.0000000 6.0000000
Similarly,
d[.,1 3];
equals:
1.0000000 3.0000000
4.0000000 6.0000000
Recall that "." is a wildcard symbol and may be used when indexing a matrix, rows,
or columns, to mean "any and all."
Text Strings
GAUSS is wonderful with numbers. It deals with text too, but one can tell that its
heart isn’t in it.
A variable in GAUSS is one of two types: numeric or string. A string is like any
other variable, except the elements in it are interpreted as ASCII numbers. Thus the
number 32 represents a space, and the number 65 a capital A, etc. To create a string
variable, enclose a string of characters in double quotation marks. Thus:
stg = "This is a string";
The variable named stg is assigned a string of characters: “This is a string.” Since a
string variable is in fact a row vector of numbers, it is possible to create a list of
strings by creating a matrix in which each row or column is a separate string. As with
all standard matrices, each element of a string matrix can hold only up to 8
characters, which is exactly the size (8 bytes) of a double precision number. To print a string
matrix, the variable must be prefixed with a dollar sign ($). Thus the statement
x = {"ab", "cd"};
print $x;
produces:
ab
cd
While
x = {"ab" "cd"};
print $x;
produces:
ab cd
To see the importance of including the dollar sign in front of a variable, type:
print x;
Matrix Operations
Matrix transposition is as easy as adding a prime (apostrophe) to the name of the
matrix. Thus
x = {1 2 3};
print x';
produces:
1.0000000
2.0000000
3.0000000
To add two matrices of the same size, use the plus (+) sign. To subtract one matrix
from another of the same size, use a minus (-) sign. If a matrix needs to be "turned
around" to conform, use its transpose. Thus, if A is 3×4 and B is 4×3, the statement
C = A + B;
results in an error message, while
C = A + B';
conforms and produces the desired 3×4 result.
In GAUSS, there are some cases in which addition or subtraction works when the
matrices are of different sizes. If one is a scalar, it is added to or subtracted from all
the elements in the other. If one is a row vector and its size matches with the number
of columns in the other matrix, this row vector is swept down to add or subtract the
corresponding row elements of the matrix. Similarly, if one is a column vector and
its size matches with the number of rows in the other matrix, this column vector is
swept across to add or subtract the corresponding column elements of the matrix.
For instance,
x = {1 2 3};
y = {1 2 3, 4 5 6, 7 8 9};
x + y;
produces:
2.0000000 4.0000000 6.0000000
5.0000000 7.0000000 9.0000000
8.0000000 10.000000 12.000000
While,
x' + y;
produces:
2.0000000 3.0000000 4.0000000
6.0000000 7.0000000 8.0000000
10.000000 11.000000 12.000000
GAUSS provides two notations for matrix division which provide rapid solutions to
simultaneous equation or linear regression problems. They are better discussed in the
context of such problems later.
Array Operations
To indicate an array (element-by-element) multiplication, precede a standard
operator with a period (dot). Thus,
x = {1 2 3};
y = {4 5 6};
x.*y;
produces:
4.0000000 10.000000 18.000000
You may divide all the elements in one matrix by the corresponding elements in
another, producing a matrix of the same size, as in:
C = A ./ B;
In each case, one of the operands may be a scalar or the matrices must be element-
by-element compatible. This proves handy when you wish to raise all the elements in
a matrix to a power. For example:
x = {1 2 3};
x.^2;
produces:
1.0000000 4.0000000 9.0000000
GAUSS supports the usual relational operators for comparisons:
• LT or < Less than
• LE or <= Less than or equal to
• GT or > Greater than
• GE or >= Greater than or equal to
• EQ or == Equal
• NE or /= Not equal
Note carefully the difference between the double equality and the single equality.
Thus A==B should be read “A is equal to B,” while A=B should be read “A is
assigned the value of B.” The former is a logical relation, the latter an assignment
statement. For comparisons between character data and comparisons between strings,
these operators should be preceded by a dollar sign ($).
For example,
x = 3 > 1;
print x;
produces:
1.0000000
While
x = 1 > 3;
print x;
produces:
0.0000000
When the operands are matrices, a non-dotted comparison returns a single scalar
result: 1 only if the relation holds for every element. Given
A = {1 2, 3 4};
the statement
A > 2;
produces:
0.0000000
This is because there is at least one element of A that is not greater than 2. While
A .> 2;
produces:
0.0000000 0.0000000
1.0000000 1.0000000
Similarly,
A = {1 2, 3 4};
B = {3 1, 2 2};
A > B;
produces:
0.0000000
While
A .> B;
produces:
0.0000000 1.0000000
1.0000000 1.0000000
You may also use logical operators of which we will only mention the frequently
used ones in passing:
• not
• and
• or
If the logical operator is preceded by a dot (.), the result will be a matrix of 1’s and
0’s based on an element-by-element logical comparison of two matrices. Each
operator works with matrices on an element-by-element basis and conforms to the
ordinary rules of logic, treating any non-zero element as true and a zero element as
false.
Relational and logical operators are used frequently with if statements (described
below) and scalar variables, as in more mundane programming languages. But the
ability to use them with matrices offers major advantages in statistical and
econometric applications.
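For example, combining dotted relational and logical operators produces a matrix of 1's and 0's (a small sketch; the matrix A here is illustrative):

```gauss
A = {1 2, 3 4};
d = (A .> 1) .and (A .< 4);  @ 1 where the element lies strictly between 1 and 4 @
print d;
```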
GAUSS for Windows provides a consistent and convenient window interface for
program development. From the menu bar File/New (or by clicking on the blank
page icon from the toolbar), you can open a blank Edit window to create a file from
scratch. If the file exists, from the menu bar File/Open (or by clicking on the open
folder icon from the toolbar), then select the name of the file to load its contents into
the Edit window. You can also open a file in the Edit window by typing the file
name in the Command window, including the directory in which the file is stored.
This Edit window will then “pop up” and layer over the Command window. Note
that the title of the Edit window is the name of the file you open for editing. After
editing, selecting Run Active File from the Run menu button saves and runs the
program file, with outputs shown in the Command or Output window. If you are not
running the program file after editing, do not forget to save it.
A group of program and data files may be involved in a project. They can be created,
loaded, and edited each in their separate Edit windows. GAUSS keeps track of two
types of files: an active file and a main file. The active file is the file that is currently
displayed (in the front highlighted Edit windows). The main file is the file that is
executed to run the current job or project. An active program file can be executed,
and put in the main file list (that is, in the pull-down menu on the toolbar). The main
file list contains the program files you have been running (the results of which appear
in the Command window or in the Output window). Any files on the main file list
can be selected, edited, and executed repeatedly. The list of main files may be
retained or cleared anytime as you wish.
Many Edit/Run cycles are involved in the writing and testing of a GAUSS program.
The convention adopted in this book is that all example lessons (with only a few
exceptions such as the first one below) will be set up to have two files. The first
(program) file contains the GAUSS code, and the second (output) file will contain all
output from running the program in the first file. You will not only see the results in
the Command or Output window, but the output will also be stored in a file you specify.
The benefit of using Edit mode is the ability to have a record of each line of code.
This is especially helpful when troubleshooting a long or complicated program.
Alternatively, a file can be opened from the Command window by typing the file
name at the “>>” prompt:
edit gpe\lesson2.1;
You are now ready for program editing. The full path of file name
c:\gauss\gpe\lesson2.1 (or something like that depending on your GAUSS
installation) shows up as the title of the Edit window. lesson2.1 is just the name of
the program file for the following exercise. GAUSS will create a file named
lesson2.1 in the c:\gauss\gpe directory if it does not already exist. If a file named
lesson2.1 does exist, GAUSS will simply bring the file to the Edit window. When
working on your own project, you should use the name of your file.
The purpose of this lesson is to demonstrate some basic matrix and array operations
in the GAUSS language we have learned so far and to familiarize you with the
Edit/Run dual mode operation of GAUSS. If you are typing the following lesson for
practice, do not type the line number in front of each line of code. The numbering
system is for reference and discussion only.
/*
** Lesson 2.1: Let’s Begin
*/
1 A = {1 2 3,
0 1 4,
0 0 1};
2 C = {2,7,1};
3 print "Matrix A" A;
4 print;
5 print "Matrix C" c;
6 print "A*C" a*c;
7 print "A.*C" a.*c;
8 print "A.*C'" a.*c';
9 end;
From the menu bar, click on the Run button and select Run Active File. This will
save and run the program. The name of the program file lesson2.1 appears in the
main file list located on the toolbar as a pull-down menu item. As of now, lesson2.1
is the active file. You can run, edit, compile, and debug the main file all by clicking
on the four buttons next to the main file list.
Each line of code must end with a semi-colon (;). In line 1, we have typed in the
numbers in matrix form to make them easier to read. Spaces separate columns while commas
separate rows. A carriage return is not seen by GAUSS. That is,
A = {1 2 3, 0 1 4, 0 0 1};
is equivalent to the matrix A defined in line 1.
The GAUSS command, print, is used to print output to the screen. You may have
wondered about the extra print statement in line 4. This creates an empty line
between matrix A and matrix C, making the output easier to read. The rest of
lesson2.1 demonstrates the difference between matrix multiplication (*) and
element-by-element array multiplication (.*) with matrices. In addition, the use of
matrix transpose notation (') is demonstrated.
Matrix A
1.0000000 2.0000000 3.0000000
0.00000000 1.0000000 4.0000000
0.00000000 0.00000000 1.0000000
Matrix C
2.0000000
7.0000000
1.0000000
A*C
19.000000
11.000000
1.0000000
A.*C
2.0000000 4.0000000 6.0000000
0.00000000 7.0000000 28.000000
0.00000000 0.00000000 1.0000000
A.*C'
2.0000000 14.000000 3.0000000
0.00000000 7.0000000 4.0000000
0.00000000 0.00000000 1.0000000
Notice that matrix multiplication requires that the number of columns in the first
matrix equal the number of rows in the second matrix. Element-by-element array
multiplication requires that the two matrices be of the same size, or that one of them
be a vector whose length matches the corresponding dimension of the other.
It “sweeps across” each row, multiplying every element of matrix A by the
corresponding element in matrix C (line 7). Element-by-element array multiplication
is “swept down” each column if C is transposed first (C') into a horizontal row
vector as shown in line 8.
Programming Tips
Just a few comments on programming in general. Professional programmers judge
their work by two criteria: Does it do what it is supposed to? Is it efficient? We
would like to add a third criterion: Will you be able to understand what your program
is supposed to be doing six months from now? Adding a blank line between sections
in your program will not affect how it runs, but it will make reading your program
easier. Describing the function of each section within comment symbols will benefit
you not only in troubleshooting now, but also in understanding your program in the
future. To do so in GAUSS, put the comment statement between a pair of “at” (@)
signs or in between “/*” and “*/” symbols. Notice that the “*” is always adjacent
to the comment text. Everything between the sets of “@” signs or between “/*” and
“*/” symbols will be ignored by GAUSS. Comments can extend more than one line
as desired. The difference between these two kinds of comments is shown in the
following:
/* This kind of
/* comment */
can be nested */
@ This kind of comment cannot be nested @
Another important programming style observed throughout this book is that we will
keep each program small. Break down your problem into smaller tasks, and write
sub-programs for each task in separate blocks of a larger program or in separate
programs. Avoid long lines of coding. Write clear and readable code. Use indention
where applicable. Remember that programming is very fluid, and there are always
multiple routes to achieve any desired task.
Most useful programs need to communicate and interact with peripheral devices such
as a file storage device, console display, printer, etc. A typical GAUSS program will
read input data from the keyboard or a file, perform the computation, show results on
the screen, and send outputs to a printer or store in a file.
GAUSS can handle at least three kinds of data formats: GAUSS data sets, GAUSS
matrix files, and text (or ASCII) files. The first two data file formats are unique and
efficient in GAUSS. For file transfer (import and export) between GAUSS and other
application software or across platforms, the text file format is preferred. Although
we do not limit the use of any particular file format, we focus here on text-formatted
file input and output. For the use of data set and matrix files, see GAUSS Language
References or on-line help from menu bar Help/References for more information.
Data Input
The most straightforward way to get information into GAUSS is to type it in the
Command window as we have been doing in the first part of this chapter. This
approach is useful for a small amount of data input. For example:
prices = {12.50 37.875 12.25};
assets = {"cash", "bonds", "stocks"};
holdings = {100 200,
300 400,
500 600};
For long series of data, it is recommended that you create a text file for the data
series using the GAUSS editor. That is, create the file and type the data in the Edit
window. Such a file should have numeric ASCII text characters, with each element
in a row separated from its neighbor with a space and each row on a separate line.
Now, we will introduce a text data file named longley.txt which comes with the GPE
package. If you installed GAUSS and GPE correctly, this data file should be located
in the GPE subdirectory of the GAUSS directory. The easiest way to bring it into the
Edit window is to click on the menu bar button File/Open and select the file name
longley.txt located in the GPE directory.
The alternative is typing the following in the Command window at the “>>”
prompt:
edit gpe\longley.txt;
The data matrix is arranged in seventeen rows and seven columns, and there are no
missing values. The first row contains only variable names, so it must not be
included in statistical operations. Each variable name is short, no longer than four
characters in this case. All values, except the first two columns (YEAR and PGNP),
are too large to handle easily. Scaling these variables may make interpreting the
resulting data easier. The bottom of the file contains the data source and description
for reference purposes. Of course, the descriptive information should not be included
for statistical analysis.
The data file can be loaded directly into a 17×7 matrix named data with the statement:
load data[17,7] = gpe\longley.txt;
Alternatively, it can be re-coded as the following two lines, using the GAUSS command
reshape to form the desired 17×7 matrix:
load data[] = gpe\longley.txt;
data = reshape(data,17,7);
Notice that the size of a matrix created with load must be equal to the size of the
file being loaded (not counting optional reference information at the bottom of the
file). If the matrix is larger than the actual file size, bogus data will be read. If it is
smaller, part of the data series will be discarded. In either case, computations will be
inaccurate.
Data Output
A simple way to output data is to display a matrix. This can be accomplished by
either giving its name in interactive mode or using the print function as we have
shown so far.
print data;
You can use the format statement to control the formats of matrices and numbers
printed out. For prettier output, the GAUSS function printfm can print a matrix
using different format for each column of the matrix.
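For instance, a format statement such as the following (a sketch; the field width of 12 and 4 decimal places are arbitrary choices) affects all subsequent printed numbers:

```gauss
format /rd 12,4;   @ right-justified decimal, width 12, 4 decimal places @
print data;
```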
If you want to save essentially everything that appears on your screen (i.e. the output
from your GAUSS program), issue the following command:
output file = [filename] [option];
where [filename] represents the name of a new file that will receive the
subsequent output. When using the command output file you must designate
one of three options in [option]: Reset, On, or Off. The option Reset clears
all the file contents so that each run of the program stores fresh output; On is
cumulative, each output is appended to the previous one; Off creates an output file,
but no data are directed to it. An output file is not created if none of these three
options is used. When you are through directing output, don’t forget to issue the
command:
output off;
You may want to examine the output files. To create a text file containing the data
from a matrix use output and print statements in combination. For example:
output file = gpe\output2.1 reset;
print data;
output off;
will save the matrix named data in the file named output2.1 in the directory
GPE.
If you are in the Edit mode to write a program file, it is a good habit to end your
program with the statement:
end;
This will automatically perform output off and gracefully close all the files still
open.
Click on the menu bar button File/Open and select the file name lesson2.2 located in
the GPE directory.
Make sure that the highlighted Edit window, with the title c:\gauss\gpe\lesson2.2, is
layered over the Command window and stays in the front. To run it, click on menu
bar button Run/Run Active File.
/*
** Lesson 2.2: File I/O
*/
1 output file = gpe\output2.2 reset;
2 load data[17,7] = gpe\longley.txt;
3 data = data[2:17,.];
4 PGNP = data[.,2];
5 GNP = data[.,3]/1000;
6 POPU = data[.,6]/1000;
7 EM = data[.,7]/1000;
8 X = PGNP~GNP~POPU~EM;
9 print X;
10 end;
For those of you who are using a working diskette (a:\ is assumed) and want to type
in the program, type these lines exactly as written. Misspellings, missing
semicolons, or improper spaces will all result in error messages.
The first line of the program code tells GAUSS to direct the output of this program
to a file named output2.2 located in the GPE subdirectory. If you want a printed
copy of your work, just change it to:
output file = lpt1 reset;
Let’s examine the code for data loading. In line 2, a matrix named data, containing
17 rows and 7 columns, is created using the GAUSS command load. A text file
located in the GPE subdirectory named longley.txt is then loaded into the variable
data.
Remember that the first row of data contains variable names. Chopping off the first
row, or indexing the matrix, is one way to remove these names from statistical
analysis. In line 3, the new data takes from the old data the second row through
the seventeenth row. After line 3, the matrix named data contains 16 rows and 7
columns. Now try to make sense of what line 4 is doing. It assigns the second
column of the modified data matrix to PGNP. Notice that when a matrix is being
created, the brackets are to the left of the equal sign. When a matrix is indexed, the
brackets are to the right of the equal sign. In general, information is taken from the
right side of an equal sign and assigned to either a matrix or variable on the left side
of the equal sign.
The next few lines, 4 through 7, create new variables by picking the corresponding
columns of data. For easier handling of large numbers, quantity variables are
scaled down by 1000-fold: GNP is now in billions of 1954 dollars; POPU and EM are
in millions of persons. PGNP is kept as given. Note that only the variables needed for
study are named and identified.
We now have four variables (vectors) that have been scaled down to a workable size.
Statistical operations can be done on each variable separately, or they can be joined
together and then operated on with one command. Line 8 concatenates all of the four
variables horizontally with a “~” symbol, forming a new data matrix named X.
If your output extends beyond your screen (in the Command or Output window),
you can resize the window for a better view. You can also try another font such as
Courier New, size 10, from the Configure button on the menu bar.
For those of you who are using a working diskette (a:\ is assumed), the first two
blocks of code in lesson2.2 can be used again. Duplicating and renaming lesson2.2
to lesson2.3 and then editing it will save typing and time. To do that, just start with
lesson2.2 in the Edit window and click on File/Save As. Since your working
diskette is in a:\, make sure that in the “Select File to Save…” dialog window the
“Save In:” line shows: “3 ½ Floppy (A):”. Type a:\lesson2.3 in the “File Name” line
and click on “Save.”
/*
** Lesson 2.3: Data Transformation
*/
1 output file = gpe\output2.3 reset;
2 load data[17,7] = gpe\longley.txt;
3 data = data[2:17,.];
4 PGNP = ln(data[.,2]);
5 GNP = ln(data[.,3]/1000);
6 POPU = ln(data[.,6]/1000);
7 EM = ln(data[.,7]/1000);
8 X = PGNP~GNP~POPU~EM;
9 print X;
10 end;
In case the function outputs are not of interest, the command call is used to call the
requested function or procedure without using any returned values. The syntax is,
call functionName(input1,input2,...);
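For example, the built-in procedure dstat (used again in Lesson 2.4 below) prints a table of descriptive statistics; if its returned values are not needed, it may be invoked as:

```gauss
call dstat(0,x);   @ print descriptive statistics of the data matrix x; returned values ignored @
```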
The GAUSS function seqa creates a vector as an additive sequence. For example,
seqa(0,0.1,10);
creates a 10×1 vector beginning at 0 and increasing with a 0.1
increment (i.e. 0, 0.1, …, 0.9).
To convert or reshape an existing matrix to a new matrix of a different size, use the
reshape function as in the following example:
x=seqa(1,1,5);
print x;
y=reshape(x,5,5);
print y;
Equivalently, using the function delif to delete the rows of x for which the condition is true:
y=delif(x, x[.,1] .<= 0.5);
print y;
There are many other GAUSS functions like maxc and minc which work on the
columns of a matrix. For example:
x = {1, 2, 3};
y = sumc(x) + 10;
print y;
Since sumc computes the sum of each column of a matrix, this will produce:
16.000000
For a matrix with several columns, sumc returns the sum of each column. For example, with
x = {1 2 3, 4 5 6};
sumc(x);
produces:
5.0000000
7.0000000
9.0000000
To compute the cumulative sum of elements in each column of a matrix, use the
function cumsumc as follows:
cumsumc(x);
We further list a few descriptive statistics functions which are applied to each
column of a matrix:
These functions will sort the rows of a matrix with respect to a specified column.
That is, they will sort the elements of a column and will arrange all rows of the
matrix in the same order as the sorted column. The sort is in ascending order.
Another useful sort function, sortind, returns the sorted index of a column vector.
This can be used to sort several matrices in the same way that some other reference
matrix is sorted. For example,
x = {5, 2, 8};
idx = sortind(x);
y = x[idx];
print idx~y;
produces two columns containing the ordering index of the original x and the sorted
x:
2.0000000 2.0000000
1.0000000 5.0000000
3.0000000 8.0000000
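The same index vector can be used to reorder the rows of any companion matrix so that several matrices stay sorted in step (a sketch; the matrix z is hypothetical, with the same number of rows as x):

```gauss
x = {5, 2, 8};
z = {50 51, 20 21, 80 81};
idx = sortind(x);   @ idx = {2, 1, 3} @
x = x[idx];         @ sorted x: 2, 5, 8 @
z = z[idx,.];       @ rows of z rearranged in the same order @
```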
To solve a system of linear equations a*x = b for x, one may use the matrix inverse. For example,
a = { 6 8,
-2 4};
b = {2, 1};
x = inv(a)*b;
print x;
If the matrix A is symmetric positive definite, use the GAUSS function solpd to
solve for x. Note that the solpd function takes two arguments. The first is the
matrix on the right-hand side of the matrix equation, while the second is the matrix
being inverted. For example:
a = {40 40,
40 72}; @ a is symmetric positive definite @
b = {2, 1};
x = solpd(b,a);
print x;
Therefore, if the matrix A is n×k and n is equal to or greater than k, solving x from A*x
= b is equivalent to solving x from (A'A)*x = (A'b). In other words, x =
invpd(A'A)*(A'b). Using the solpd function:
a = { 6 8,
-2 4};
b = {2, 1};
x = solpd(a'b,a'a);
print x;
This is exactly the GAUSS division (/) operator for finding the least squares (LS)
solution of A*x = b:
x = b/a;
print x;
To compute the eigenvalues and eigenvectors of a symmetric matrix a, use the function eigrs2:
{r,v} = eigrs2(a);
print r~v;
We note that the function eigrs2 returns two values: the first is a vector of
eigenvalues, while the second is a matrix of the corresponding eigenvectors. The
returned results are listed to the left of the equal sign, enclosed in brace brackets.
Running the above block of code, we have:
The first column of the matrix is the vector of two eigenvalues. The last two columns
of eigenvectors correspond to each of the two eigenvalues, respectively.
The condition number, cn, is defined as the square root of the ratio of the largest
eigenvalue to the smallest. Compared with the GAUSS built-in function cond, the
identical result is:
print cond(x);
Not listed, but of great use, are the many functions that provide data plotting in two
or three dimensions, as well as a number of more specialized functions. To whet the
econometrician’s appetite, let’s name a few more in the following:
The full list of functions and information on each one can be obtained via GAUSS’s
on-line help system.
In Lesson 2.4 we write a GAUSS program to review what we have learned so far.
First we load the data matrix from the file longley.txt. Recall that the first row of this
data matrix consists of variable names, therefore it will not be used in statistical
calculations. We define y as the last (7th) column of the data matrix. In addition, we
select all values of the first 6 variables and add a column of ones (constant vector) to
form the matrix x.
First, we call the built-in function dstat to report the descriptive statistics of all
data series including y and x. Then the ordinary least squares (OLS) estimator of y
on x is computed. Finally, the data matrix x is checked for its condition number.
Here is the program:
/*
** Lesson 2.4: Data Analysis
*/
1 output file = gpe\output2.4 reset;
2 load x[17,7]=gpe\longley.txt;
3 y=x[2:17,7];
4 x=x[2:17,1:6]~ones(16,1);
5 call dstat(0,y~x);
6 b=y/x; @ b=invpd(x'x)*x'y=solpd(x'x,x'y) @
7 print b;
8 xx=x'*x;
9 r=eigrs(xx);
10 cn=sqrt(maxc(r)./minc(r));
11 print cn cond(x);
12 end;
Note that in line 5, dstat is a GAUSS built-in procedure which, when called, prints
the descriptive statistics of a data matrix in a table, arranged row-wise for each
variable. In dstat, the first input argument 0 means that the data come from a
matrix already defined in the program rather than from a file. In this case, it is the
matrix y~x, with y defined in line 3 and x defined in line 4.
Line 6 demonstrates a simple way to obtain the least squares estimator: b = y/x.
To compute the condition number of x, we first get the eigenvalues of x'x (line 9)
and then take the square root of the ratio of maximum and minimum eigenvalues
(line 10). The result of the formal calculation of the condition number should be the
same as that from calling the GAUSS built-in function cond. We leave the rest of
running the program and interpreting the results to you as an exercise.
For Loops
The For Loop is easy to use. The most common use of a For Loop arises when a set
of statements is to be repeated a fixed number of times, as in:
for i (0, 9, 1);
.......
endfor;
where i is the counter integer followed by a pair of parentheses which enclose three
arguments. The first argument is the initial value of the counter, the second is its
final value, and the last is the increment value. The statements within the loop will be
executed 10 times, with the counter i running from 0 through 9 in increments of 1.
Note that a For Loop ends with an endfor statement.
There are fancier ways to use For Loops, but for our purposes, the standard one
suffices.
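For example, a complete For Loop that accumulates the sum of the integers 0 through 9 (a toy illustration of our own) looks like this:

```gauss
s = 0;
for i (0, 9, 1);
   s = s + i;         @ add the current counter value @
endfor;
print s;              @ 45 @
```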
Do Loops
There are two kinds of Do Loops: do while and do until. The difference
between a do while loop and a do until loop is that the former continues
executing the loop as long as the condition is true, while the latter executes the loop
as long as the condition is false (that is, until the condition becomes true). A Do
Loop always ends with an endo statement:
do while condition;
.......
endo;

do until condition;
.......
endo;
The statements break and continue are used within Do Loops to control
execution flow. When break is encountered, the execution will jump to the
statement following the endo. This terminates the loop. When continue is
encountered, the execution will jump to the top of the loop and reevaluate the do
while or do until expression. It reiterates the loop without executing any more
of the statements inside the loop. For the For Loops, both break and continue
statements work the same way as described for the Do Loops.
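A small sketch of our own may help. The following do while loop uses continue to skip even values of the counter and break to leave the loop once the counter exceeds 7:

```gauss
i = 0;
do while i < 100;
   i = i + 1;
   if i > 7;
      break;          @ jump past the endo, terminating the loop @
   endif;
   if i % 2 == 0;
      continue;       @ jump back and reevaluate the do while condition @
   endif;
   print i;           @ prints 1, 3, 5, 7 @
endo;
```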
It is, of course, crucial that at some point a statement will be executed that will cause
the condition in the do while (or do until) statement to be false (true). If this
is not the case, you have created an infinite loop—one that will go merrily on until
you pull the plug.
For readability, it is sometimes useful to create variables for true and false, then use
them in a do while or do until loop. For example:
true = 1==1;
false = 1==0;
.....
done = false;
do while not done;
........
endo;
Of course, somewhere in the loop there should be a statement that will at some point
set done equal to true.
If Statements
An If statement provides a method for executing certain statements if a condition is
true and other statements (or none) if the condition is false. A complicated if
section can come with elseif and else statements, but it always ends with an
endif statement. For example:
if x > 0.5;
.......
elseif x > 0;
.......
else;
.......
endif;
In this case, if x is greater than 0.5 the first set of statements will be executed; if not,
x is checked again for a positive value. If x is greater than 0, the second set of
statements will be executed. Otherwise, the last set of statements will be executed.
A simpler version uses only the if and else sections:
if x > 0.5;
.......
else;
.......
endif;
In this case, if x is greater than 0.5 the first set of statements will be executed; if not,
the second set will be executed. An even simpler version omits the else section, as
in:
if x > 0.5;
.......
endif;
Here, the statements will be executed if (and only if) x exceeds 0.5.
Nesting
All of these flow control structures allow nesting, in which one type of structure lies
within another. For example:
j = 1;
do until j > n;
for k (1,n,1);
if x[j,k] > 0.5;
x[j,k] = 1.5;
endif;
endfor;
j=j+1;
endo;
The indentation is purely for the reader's benefit, but it is highly recommended in
this and other situations for purposes of clarity. It is wise to pair up endo (endfor, endif)
statements with preceding do (for, if) statements in a last-come-first-served
manner. It is up to the programmer to ensure that this will give the desired results.
Indenting can help, but hardly guarantees success on every occasion.
Finally, a warning about loops: do not use a loop where a matrix expression will do.
For example, rather than accumulating value element by element over price and
quantity in a do loop, write:
value = price'*quantity;
The matrix expression is more succinct, far clearer, and will run much, much faster.
GAUSS performs matrix operations at blinding speed, but is downright glacial when
loops are to be executed a great many times, since it must do a certain amount of
translation of each statement every time it is encountered.
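To make the comparison concrete with some made-up numbers, the loop and the matrix expression below compute the same total value:

```gauss
price = {1, 2, 3};
quantity = {10, 20, 30};

@ slow: accumulate element by element in a do loop @
value = 0;
i = 1;
do until i > rows(price);
   value = value + price[i]*quantity[i];
   i = i + 1;
endo;
print value;               @ 140 @

@ fast: one matrix operation @
print price'*quantity;     @ 140 @
```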
A Practical Example
Do you know the accuracy of your computer’s numerical calculation? The following
example addresses this important problem. Suppose e is a known small positive
number, and the 5×4 matrix X is defined as follows:
1 1 1 1
e 0 0 0
0 e 0 0
0 0 e 0
0 0 0 e
Verify that the eigenvalues of X'X are 4+e^2, e^2, e^2, and e^2. How small a
value of e can your computer use and still successfully invert X'X? Try to make
some sense out of the following segment of code:
one=ones(1,4);
e=1.0;
do until e<1.0e-16;
x=one|(e.*eye(4));
print "e = " e;
print invpd(x'x);
e=e./10;
endo;
end;
Single-Line Functions
A single-line function starts with a fn statement declaring the function, followed by
the name of the function with its arguments enclosed in parentheses. The “guts” of
the function are defined on the right-hand side of the equal sign, all in one line. It is
called the same way as GAUSS built-in functions. However, it returns only one
argument. For example:
fn value(p,q) = p'*q;
This value function takes two arguments, p and q, to produce the inner product of
them. The result may be a scalar, a vector, or a matrix. Of course, this will only work
if p and q vectors or matrices are compatible for matrix multiplication. A more
complex multi-line version (or procedure) could examine the sizes of these two
matrices p and q, then use transpositions, etc., as required.
It is important to note that the argument and output names used in a function are
strictly local variables that exist only within the function itself. Thus, in a program
one could write the following statement to use the value function defined above:
cost = value(price,quantity);
In this calling statement, the function value takes two arguments, price and
quantity, which are assigned to the matrices p and q of the function,
respectively. The result of the function is then assigned to the variable cost on the
left-hand side. There is no need for the names to be the same in any respect.
Moreover, the function cannot change the original arguments in any way. It can only
return information via its output.
Here is another example of a single-line function, built from the GAUSS functions
reshape and meanc:
fn qtoa1(x) = meanc(reshape(x,rows(x)/4,4)');
This function converts a quarterly time series into an annual series by taking the
average of every four data points. Of course, this function will work only if the input
data series starts from the first quarter, and it is designed to handle one series at a
time. That is, the input argument x is a column vector of quarterly series, and the
function returns a column vector of annual series. Note that if the last year does not
have a complete quarterly series of four data points for conversion, it is ignored.
Consider the function f(x) = ln(x) - x^2 together with its first and second
derivatives, written as single-line functions:
fn f(x) = ln(x) - x^2;
fn f1(x) = 1/x - 2*x;
fn f2(x) = -1/x^2 - 2;
Now we check the maximum at x = sqrt(0.5), for which f1(x) = 0 and f2(x) < 0:
xmax = sqrt(0.5);
f(xmax);
f1(xmax);
f2(xmax);
Remember that the built-in procedures gradp and hessp serve the same purpose of
finding the first and second derivatives, that is, the gradient vector and hessian
matrix of a user-defined function, without writing out their analytical forms f1 and
f2 as above. Try this:
gradp(&f,xmax);
hessp(&f,xmax);
The use of gradp and hessp procedures to numerically evaluate the first and
second derivatives of a function is particularly useful when the analytical forms of
derivatives are difficult to write. Consider the following function of two variables:
g(x1,x2) = (x1^2 + x2 - 11)^2 + (x1 + x2^2 - 7)^2
With the 2×1 parameter vector x, the function is easily defined in GAUSS:
fn g(x) = (x[1]^2 + x[2] – 11)^2 + (x[1] + x[2]^2 –7)^2;
Writing out the analytical formulas of the first and second derivatives using single-
line functions may be difficult. For this function, there are four minima: (3, 2),
(3.5844, -1.8481), (-3.7793, -3.2832), and (-2.8051, 3.1313). Using gradp and
hessp to check them is easy.
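For instance, at the first minimum (3, 2) the gradient returned by gradp should be numerically zero and the hessian from hessp should be positive definite:

```gauss
xmin = {3, 2};
print gradp(&g,xmin);      @ approximately zero in both components @
print hessp(&g,xmin);
```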
At this point you may be tempted to try using a graph to find the minima of the
above function. GAUSS is not good at graphics. Nevertheless, there are functions
available to do publication quality graphics in GAUSS. See GAUSS for Windows
User Guide or the on-line help menu for more details.
Procedures
A procedure in GAUSS is basically a user-defined function which can be more than
one line and as complicated as necessary to perform a given task. Any GAUSS built-
in command or function may be used in a procedure, as well as any user-defined
function or other procedure. Procedures can refer to any global variable and declare
local variables within. The basic structure of a GAUSS procedure consists of the
following components:
1. proc statement: declaration of the procedure, its output arguments, and its input arguments
2. local statement: declaration of local variables used within the procedure
3. Body of procedure: the statements that carry out the task
4. retp statement: returning the output arguments
5. endp statement: end of the procedure definition
There is always one proc statement and one endp statement in a procedure
definition. Anything that comes between these two statements is part of the
procedure. local and retp statements are optional, and may occur more than once
in a procedure. GAUSS does not allow nested procedure definitions. That is, a
procedure cannot be defined within another procedure.
Variables other than input and output arguments may be included in procedures as
needed. There are global and local variables. A global variable is already declared
and used outside the procedure. A local variable is only visible to the procedure and
has no existence outside the procedure. Indeed, a local variable in one procedure may
have the same name as a different local variable in another function or procedure; the
two will coexist with neither bothering the other.
A procedure can return multiple arguments of output through retp statements and
by specifying the number of returned items in the beginning of the proc statement.
As an example, the procedure version of the value function takes inputs of p and q
and returns the total (called s) and average (called m) values as follows:
proc (2) = value(p,q);
local s, m;
s = p'*q;
m = s./sumc(q);
retp(s,m);
endp;
In the proc statement, the equal sign preceded by the number of returned
arguments enclosed in parentheses (that is, the "(2) =" in the above example)
is not needed for a procedure with a single output argument (the default case).
To use the multiple output arguments of a procedure call, simply assign them names
in the calling statement, enclosed in braces, as in:
{sum,mean} = value(price,quantity);
Here, variables price and quantity are assigned to the input arguments p and q,
respectively. Similarly, sum and mean are assigned to the output arguments s and
m. All the input and output arguments are local variables.
Note that as with inputs, the correspondence between outputs in the calling statement
and the procedure itself is strictly by order. When the procedure has finished its
work, its output values are assigned to the variables in the calling statement.
If a procedure does not return any items, or you want to discard the returned items,
just call it with the call statement, as we have demonstrated in the earlier lessons.
For example:
call value(price,quantity);
Now let’s extend the earlier single-line version of time series conversion function
qtoa1 to a multi-line procedure so that it can handle the conversion of a more
general data matrix. The working method of the following procedure qtoa is to take
a data matrix x which consists of quarterly data series in columns and convert it into
a matrix of the yearly averages. The procedure takes advantage of the previously
defined single-line function qtoa1 to compute the annual average series from each
column of the quarterly data matrix, all in a Do Loop. Here is the procedure:
proc qtoa(x);
local r,c,y,i;
r = rows(x);
c = cols(x);
y = qtoa1(x[.,1]);
i = 2;
do until i > c;
y = y~qtoa1(x[.,i]);
i = i+1;
endo;
retp(y);
endp;
Of course, the above time series conversion function and procedure are limited to a
quarterly data series. We can make them more flexible by specifying the number of
periods of each seasonal cycle as an input argument in addition to the data matrix.
The following function tss1 and procedure tss are essentially the same as qtoa1
and qtoa, respectively. The difference is that now the number of periods n for time
series conversion is specified as one of the input arguments. Depending on the
seasonal cycle of the data series, you can use the same procedure for either quarterly
or monthly conversion.
fn tss1(x,n) = meanc(reshape(x,rows(x)/n,n)');
proc tss(x,n);
local r,c,y,i;
r = rows(x);
c = cols(x);
y = tss1(x[.,1],n);
i = 2;
do until i > c;
y = y~tss1(x[.,i],n);
i = i+1;
endo;
retp(y);
endp;
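For example, converting three years of a made-up monthly series calls tss with n = 12:

```gauss
m = seqa(1,1,36);     @ 36 monthly observations: 1, 2, ..., 36 @
a = tss(m,12);        @ three annual averages @
print a;              @ 6.5, 18.5, 30.5 @
```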
As an exercise, write the procedures to compute the analytical first and second
derivatives of this scalar-valued function of two variables:
User Library
The purpose of writing functions and procedures is to keep tasks organized and self-
contained. Each function or procedure will perform a few given tasks and nothing
else. To make programmers’ work easier, procedures allow programmers to build on
their previous work and on the work of others rather than starting over again and
again to perform related tasks. One way to organize the work is to collect a group of
functions and procedures into a program file and register the file and its contents
with the GAUSS library system. Note that you must have your own copy of GAUSS
installed on your own computer in order to access and modify the GAUSS library
facility. We will assume that your GAUSS comes with the User Library, to which
you can add your creative functions and procedures.
First, let’s put the function tss1 and procedure tss together in a file named
TSS.SRC (SRC is the default file extension for GAUSS source code,
although you can use any other name you want). Put the program file TSS.SRC in
the SRC subdirectory of GAUSS path. Next, we will add the following lines to the
library file USER.LCG located in the LIB directory of the GAUSS path:
TSS.SRC
tss1 : fn
tss : proc
Similar to the idea of using a dictionary, the function tss1 and procedure tss
defined in the program file TSS.SRC are now part of the GAUSS library system,
which will be searched for name recognition every time GAUSS executes a program. You
can also add variable names as matrices or strings in the library. Refer to GAUSS
Language References or the on-line help system for more details on using and
maintaining the library system.
From now on, both tss1 and tss functions are an integral part of your version of
GAUSS. You have just extended the environment for GAUSS programming!
GPE Package
The other way of extending GAUSS is to use a package, which is a set of compiled
GAUSS libraries for special purposes. GAUSS Programming for Econometricians
and Financial Analysts (GPE) is a GAUSS package of econometric procedures. The
GAUSS command use is used to load a package at the beginning of your program.
For example,
use gpe2;
will load the GPE package (version 2) for econometric analysis and applications.
Note that use can only appear once and must occur at the top of a program.
Output global variables are the results of calling estimate and forecast. They
can be assigned to new variables for further analysis. Depending on which input
global variables are used to control the econometric routines, not all of the output
global variables will be available. The name of an input control variable starts with a single
underscore (for example, _b), while an output control variable starts with a double
underscore (for example, __b). Refer to Appendix A for a complete list of GPE
global control variables and their default or predefined values.
/*
** Comments on program title, purposes,
** and the usage of the program
*/
/*
** Writing output to file or sending it to printer:
** specify file name for output
*/
/*
** Loading data:
** read data series from data files.
*/
/*
** Generating or transforming data series:
** create and generate variables with
** data scaling or transformation
** (e.g. y and x are generated here and will be used below)
*/
/*
** Set input control variables for model estimation
** (e.g. _names for variable names, see Appendix A)
*/
/*
** Retrieve output control variables for
** model evaluation and analysis
*/
/*
** Set more input control variables if needed,
** for model prediction
** (e.g. _b for estimated parameters)
*/
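Putting the pieces of this template together, a minimal complete GPE program might look like the following (patterned after the lessons in the next chapter; the file and variable names are illustrative only):

```gauss
/*
** A minimal GPE program (illustrative)
*/
use gpe2;                           @ must be the first executable statement @
output file = gpe\output0 reset;    @ write results to a file @

load data[17,7] = gpe\longley.txt;  @ read the data file @
data = data[2:17,.];                @ drop the row of variable names @
y = data[.,7];                      @ dependent variable @
x = data[.,2];                      @ one independent variable @

call reset;                         @ initialize GPE control variables @
_names = {"Y","X"};                 @ set input control variables @
call estimate(y,x);                 @ model estimation @
end;
```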
Using the GPE package in a GAUSS environment is the main focus of the rest of this
book, which begins in the next chapter on linear regression models. If you are
already familiar with linear least squares estimation, you can jump to the nonlinear
models discussion which begins in Chapter VI. The topic of simultaneous equation
systems is covered in Chapter XIII. In addition to many classical econometric
methods, modern approaches such as generalized method of moments (Chapter XII),
autoregressive conditional heteroscedasticity (Chapter XV), and panel data analysis
(Chapter XVI), are programmed and solved with GPE (version 2) for GAUSS.
Don’t forget that we are learning GAUSS as a tool to do econometrics. The package
GPE written in GAUSS acts as a bridge between the domain knowledge
(econometrics) and the programming environment (GAUSS). With this approach,
only a limited knowledge of computer programming is required in the beginning.
After gaining experience with GPE and GAUSS in general, you should be ready for
your own programming adventure in advanced econometrics, by either extending
GPE or writing new programs.
III
Linear Regression Models
GPE (GAUSS Programming for Econometricians and Financial Analysts) is a
GAUSS package for linear and nonlinear regressions useful for econometric analysis
and applications. The purpose of this chapter is to show you how to use GPE for
basic linear least squares estimation.
Y = Xβ + ε
The ordinary least squares regression amounts to the following estimation results:
b = (X'X)^-1 X'Y        Estimator of β.
Var(b) = s^2 (X'X)^-1   Estimated variance-covariance matrix of b.
e = Y - Xb              Estimated errors ε, or residuals.
s^2 = e'e/(N-K)         Estimated regression variance σ^2. N is the number of
sample observations; K is the number of parameters.
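These formulas translate line by line into GAUSS. As a sketch, assuming the vector y and the matrix x (including a constant column) are already in memory:

```gauss
b = invpd(x'x)*(x'y);              @ least squares estimator of beta @
e = y - x*b;                       @ residuals @
s2 = (e'e)/(rows(x)-cols(x));      @ regression variance, e'e/(N-K) @
vb = s2.*invpd(x'x);               @ variance-covariance matrix of b @
```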
Consider the simple case of regressing the dependent variable Y against one
independent variable X in addition to a constant:
Y = α + βX + ε
where ε is the error term, estimated by the difference between the observed and the
fitted value of Y, or the residual. The
parameter α is the intercept and β is the slope of the linear regression equation.
Continuing with the text data file longley.txt we used in Chapter II, Lesson 3.1
introduces the use of GPE to estimate a simple regression equation. Lesson 3.2
examines a set of regression statistics obtained from the simple regression. Lesson
3.3 is a multiple regression model.
/*
** Lesson 3.1: Simple Regression
*/
1 use gpe2; @ using GPE package (version 2) @
2 output file = gpe\output3.1 reset;
3 load data[17,7] = gpe\longley.txt;
4 data = data[2:17,.];
5 PGNP = data[.,2];
6 GNP = data[.,3]/1000;
7 EM = data[.,7]/1000;
8 RGNP = 100*GNP./PGNP;
9 call reset;
10 _names = {"EM","RGNP"};
11 call estimate(EM,RGNP);
12 end;
In order to use GPE package for econometric analysis and applications, the first
executable statement in your GAUSS program must be:
use gpe2;
This tells GAUSS where to look when GPE commands are used. However, remarks
enclosed in comment symbols are permitted before the first line of program code.
In Lesson 3.1, line 9 initializes all the GPE global control variables by calling
reset procedure. Then, in line 10, the input control variable _names is defined to
be a list of character names for variables used in the regression (dependent variable
first, followed by independent variables in the order of appearance in the equation).
In this example, EM is the dependent variable, RGNP the independent variable, and
_names is a character vector of variable names as:
_names = {"EM","RGNP"};
A common mistake is not starting a GPE input control variable such as _names with
an underscore (_). GAUSS ignores such variables without a warning or error
message, and your program using GPE just will not work like it should. See Appendix A for more
information about the usage of _names and other input control variables.
If _names is not specified, then the default variable names are used for the
procedure estimate. That is, Y for the name of the dependent variable and X# for
the names of the independent variables (# indicates the number in sequence, i.e., 1,
2, … ).
The GPE econometric procedure estimate is called in line 11. It takes the
dependent variable as the first argument, and the list of independent variables as the
second. A constant vector for the estimated intercept term is automatically added to
the model estimation.
The basic output of estimate is presented in four blocks. The first block gives
general information about the regression. Goodness of fit of the estimated regression
and several model selection criteria are given in block two. Block three is the
standard Analysis of Variance (AOV). The following discussion focuses on the last
block of output information. Values of the estimated coefficient, standard error, and
t-ratio for each variable are given row-wise. Reading the output for each variable
gives the estimated model as:
Interpreting this output tells us that, on average for each one billion dollar increase of
RGNP (measured in 1954 value), there will be an increase of about 59 thousand in
people employed (EM).
Since the expected value of the residuals is zero, it is not in the estimated regression
equation. However, a list of error values for each observation is available for further
analysis to be discussed in later lessons.
Testing of the simple hypothesis that a given coefficient is equal to zero takes the
estimated coefficient’s t-ratio and compares it with the critical value from the
Student's t distribution listed for the given degrees of freedom (DF). Prob > |t| is the
corresponding P-value, that is, the probability of observing a t-ratio at least this
large in absolute value when the null hypothesis (that the corresponding coefficient
equals zero) is true. We know that RGNP's coefficient is statistically significant
from its t-ratio of 22.5: our chance of wrongly rejecting the null hypothesis is
2×10^-12, or very close to zero.
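The reported P-value can be reproduced with the GAUSS built-in function cdftc, which returns the upper-tail probability of the Student's t distribution:

```gauss
t = 22.5;               @ t-ratio of RGNP's coefficient @
df = 14;                @ degrees of freedom, N-K = 16-2 @
print 2*cdftc(t,df);    @ two-tailed P-value, essentially zero @
```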
The partial regression coefficient measures the marginal contribution of the variable
when the effects of other variables have already been taken into account. For a linear
regression including only one independent variable, this is just the R-square (0.9732
in this case) of the regression.
The GPE estimate routine is the foundation of most models that you will use to
estimate a regression equation. Global control variables are then added in different
combinations to check, test, and hopefully correct the fitted regression. In the next
several lessons, further analysis of the regression is achieved through the use of input
control variables for the estimate procedure. Again, refer to Appendix A for a
more detailed description of GPE input control variables.
/*
** Lesson 3.2: Residual Analysis
*/
1 use gpe2; @ using GPE package (version 2) @
2 output file = gpe\output3.2 reset;
3 load data[17,7] = gpe\longley.txt;
4 data = data[2:17,.];
5 PGNP = data[.,2];
6 GNP = data[.,3]/1000;
7 EM = data[.,7]/1000;
8 RGNP = 100*GNP./PGNP;
Before using global control variables, as we have mentioned earlier, you need to call
the reset procedure once to initialize them. Calling reset returns all GPE global
control variables to their default setting.
The reset option used with an output file has nothing to do with a called
reset in GPE. The latter is a GPE procedure, while the former is an option
associated with the GAUSS command output. They are simply two different
concepts.
_rstat is the residual analysis tool most frequently used in conjunction with
GPE's estimate. Setting this input control variable to a non-zero value (the
convention is 1, meaning true or yes) provides a set of simple residual statistics.
These statistics are: squared correlation of the observed (actual) and predicted (fitted)
values of the dependent variable, sum-of-squared residuals, sum of absolute
residuals, sum of residuals, and the serial correlation coefficient, first-order Rho. The
well-known Durbin-Watson test statistic is useful for testing the presence of first-
order serial correlation. The output of residual statistics is:
Squared Correlation of Observed and Predicted = 0.97320
Sum of Squared Residuals = 4.9582
Sum of Absolute Residuals = 7.6446
Sum of Residuals = 5.32197E-012
First-Order Rho = 0.23785
Durbin-Watson Test Statistic = 1.4408
The option _rlist = 1 lists the observed (actual) and predicted (fitted) values of
the dependent variable. The residual is computed as the difference between the actual
and fitted values. Each observation of residual and its standard error is listed as well.
List of Observed, Predicted and Residuals
Obs Observed Predicted Residual Std Error
1 60.323 59.841 0.48195 0.52253
2 61.122 60.479 0.64314 0.53477
3 60.171 60.446 -0.27506 0.53419
4 61.187 61.938 -0.75124 0.55639
5 63.221 63.347 -0.12561 0.56955
6 63.639 64.037 -0.39762 0.57341
7 64.989 64.938 0.050585 0.57597
8 63.761 64.588 -0.82719 0.57531
9 66.019 66.329 -0.31005 0.57446
10 67.857 66.798 1.0588 0.57246
11 68.169 67.251 0.91781 0.56979
12 66.513 66.826 -0.31280 0.57232
13 68.655 68.439 0.21576 0.55933
14 69.564 69.110 0.45430 0.55112
15 69.331 69.565 -0.23401 0.54454
16 70.551 71.140 -0.58873 0.51511
Plotting is a quick way to evaluate the result of model estimation. Setting _rplot
= 1 will return a plot of estimated residuals, while setting _rplot = 2 produces
both the residual graph and fitted-vs.-actual dependent variable series. The graph is
shown in a separate window when running the program. By viewing the plot of
residuals, the correlation patterns in residuals may indicate a need to re-specify the
model.
In the following, we add a few new twists to both the programs of Lesson 3.1 and
3.2. In addition to _rstat, _rplot, _rlist, this lesson introduces the use of
another input control variable, _vcov. By setting it to a non-zero value (i.e., 1), the
regression output will include a variance-covariance matrix as well as a correlation
matrix of the estimated coefficients. It is often useful to examine the relationship
among estimated coefficients in a multiple regression.
/*
** Lesson 3.3: Multiple Regression
*/
1 use gpe2; @ using GPE package (version 2) @
2 output file = gpe\output3.3 reset;
3 load data[17,7] = gpe\longley.txt;
4 data = data[2:17,.];
5 PGNP = data[.,2];
6 GNP = data[.,3]/1000;
7 POPU = data[.,6]/1000;
8 EM = data[.,7]/1000;
9 RGNP = 100*GNP./PGNP;
Including POPU has changed our estimated regression, but is it better? Analyzing the
following result will tell the story.
Least Squares Estimation
------------------------
Dependent Variable = EM
Estimation Range = 1 16
Number of Observations = 16
Mean of Dependent Variable = 65.317
Standard Error of Dependent Variable = 3.5120
RGNP still has about the same influence on EM, as reported in the previous lesson.
Based on residual statistics, the model has a similar performance to the simple
regression without the POPU variable.
Pay special attention to the “Partial Regression Coefficient,” which gauges the
marginal contribution of each variable when the effects of other variables have
already been taken into account. In terms of model interpretation, the negative slope
coefficient of POPU is not what we would expect. POPU has an extremely low partial
regression coefficient and it is not statistically significant as seen by its near zero t-
ratio. Looking at the outputs of the variance-covariance matrix and the correlation
matrix of coefficients, the estimated coefficient of POPU has a relatively large
variance and it is strongly correlated with that of RGNP. Therefore, our regression is
better without POPU.
Out of the cjx.txt data series we will use only the following selected variables:
To make our following presentation easier, L1 has been renamed L and K1 has been
renamed K. With the introduced notation, a simple two-input Cobb-Douglas
production function is written as
X = α L^β1 K^β2
To transform the model into a more useful form, natural logarithms are taken on both
sides of the equation:
ln(X) = β0 + β1 ln(L) + β2 ln(K) + ε
where β0 = ln(α) is the intercept of the log model, and the slopes β1 and β2 are
interpreted as input elasticities. Econometric estimation of this Cobb-Douglas
production function is the focus of the following few lessons.
/*
** Lesson 3.4: Cobb-Douglas Production Function
*/
1 use gpe2;
2 output file = gpe\output3.4 reset;
3 load data[40,6] = gpe\cjx.txt;
4 year = data[2:40,1];
5 X = ln(data[2:40,2]);
6 L = ln(data[2:40,3]);
7 K = ln(data[2:40,5]);
8 call reset;
9 _names = {"X","L","K"};
10 call estimate(X,L~K);
11 _restr = {1 1 1};
12 call estimate(X,L~K);
13 end;
Before examining the output, let’s look at the programming style. This program is
efficient, that is, many actions are combined in few lines of code. Line 4 removes the
first row (variable names) of the data file and indexes the matrix data into a vector,
all in one step. Lines 5, 6, and 7 go one step further. In addition to indexing the
matrix, they take the natural logarithm of each variable.
Least Squares Estimation
------------------------
Dependent Variable = X
Estimation Range = 1 39
Number of Observations = 39
Mean of Dependent Variable = 5.6874
Standard Error of Dependent Variable = 0.46096
Interpreting the estimation result of the log model takes into account that estimated
slope coefficients translate directly into elasticities. In other words, the influence of
labor L and capital K on output GNP (X) is expressed in terms of “percentage
change.” For every one percent increase in labor input L, GNP increases by 1.45
percent. When capital input K is increased by one percent, GNP increases 0.38
percent.
An adjusted R2 of 0.994 reveals a very good fit of the regression equation. Large t-
ratios and small P-values for all variables show that the chance of the elasticity
coefficients being zero is small. Moreover, the partial regression coefficients are
strong for both ln(L) and ln(K). The resulting model is the basis for many
lessons to come.
Now, let’s consider the theory of constant returns to scale often assumed in many
classical productivity studies. Restricted least squares is the technique used to
estimate models with linear restrictions. GPE’s estimate procedure can be
modified to perform restricted least squares with the use of the input control variable
_restr (see Appendix A for details).
The last part of the program (lines 11 and 12) demonstrates a simple example of
restricting β1 + β2 = 1, in order to test for CRS. To understand what line 11 is doing,
we need to describe a little matrix algebra. Linear restrictions on least squares
coefficients can be expressed by the equation
Rβ = q
LINEAR REGRESSION MODELS
            [ βs ]
[ Rs  R0 ]  [ β0 ]  =  q

_restr = [ Rs  q ]
Linear restrictions involving the intercept term require the explicit inclusion of a
constant column as part of the data matrix of independent variables, and the model
must then be estimated without an intercept. This is done by setting the input control
variable _const = 0 before calling the estimate procedure.
_restr = {1 1 1};
Now let’s look at the matrix to the right of the equal sign, {1 1 1}. Each row of
_restr specifies a single restriction; therefore, only one restriction (i.e., β1 + β2 = 1)
is called out in this example. The number of columns of _restr is the number of
slope coefficients in βs plus one column for the restricted value q. The first two
columns of 1’s in _restr (the Rs part) select β1 and β2. Multiplied by the
corresponding slope coefficients, they form the sum of β1 and β2. The last column of
_restr (the q part) specifies that this sum equals 1. In other words, the _restr
matrix calculates:
        [ β1 ]
[ 1 1 ] [ β2 ]  =  1*β1 + 1*β2  =  1
Alternatively, to restrict β1 = β2 (that is, β1 - β2 = 0), we would write:

_restr = {1 -1 0};

That is,

         [ β1 ]
[ 1 -1 ] [ β2 ]  =  1*β1 - 1*β2  =  0
Multiple restrictions are specified one per row. For example, with four slope
coefficients, the two restrictions β2 = 0 and β3 + β4 = 1 are written as:

_restr = {0 1 0 0 0,
          0 0 1 1 1};

That is,

            [ β1 ]
[ 0 1 0 0 ] [ β2 ]     [ 0*β1 + 1*β2 + 0*β3 + 0*β4 ]     [ 0 ]
[ 0 0 1 1 ] [ β3 ]  =  [ 0*β1 + 0*β2 + 1*β3 + 1*β4 ]  =  [ 1 ]
            [ β4 ]
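Mechanically, each row of _restr is just [Rs | q]. GPE itself is a GAUSS package, but the bookkeeping can be sketched in a few lines of Python; the coefficient values below are made up purely to satisfy the two restrictions:

```python
import numpy as np

# Each row of _restr in GPE notation is [Rs | q]; here, the two example
# restrictions b2 = 0 and b3 + b4 = 1 from above.
restr = np.array([[0, 1, 0, 0, 0],
                  [0, 0, 1, 1, 1]], dtype=float)
R, q = restr[:, :-1], restr[:, -1]   # split off the last column as q

# hypothetical slope estimates (b1, b2, b3, b4) that satisfy both rows
beta = np.array([0.7, 0.0, 0.4, 0.6])

print(R @ beta)                  # [0. 1.]
print(np.allclose(R @ beta, q))  # True
```

Each row of R multiplied into the coefficient vector must reproduce the corresponding entry of q; that is all the restriction matrix encodes.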
Look at the output produced by restricted least squares estimation (line 12):
Least Squares Estimation
------------------------
Dependent Variable = X
Estimation Range = 1 39
Number of Observations = 39
Mean of Dependent Variable = 5.6874
Standard Error of Dependent Variable = 0.46096
Before testing for CRS, let’s look at the format of the output. Notice the warning
near the top. Due to the imposed restrictions, standard statistics based on residuals
are reliable only when the restrictions are correct. The Wald test statistic of linear
restrictions is given along with its P-value, directly under the warning near the top of
the output. The Wald test statistic uses residual sum-of-squares (RSS) from both
unrestricted and restricted models to check if the stated restrictions yield a model that
is statistically different from the model without the restrictions.
[(RSS* - RSS) / J] / [RSS / (N-K)]  ~  F(J, N-K)

[(0.55874 - 0.04338) / 1] / [0.04338 / (39-3)]  =  427.66  ~  F(1, 36)
At a 5% level of significance, the F critical value of 4.17 places our computed value,
427.66, in the right-tail rejection region of the F distribution. Together with a near
zero P-value for the Wald statistic, this result leads us to reject the linear restriction
β1 + β2 = 1.
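The arithmetic of the Wald F statistic can be verified directly. This is not GPE output; it is a small Python check using the RSS values reported above:

```python
# Wald F statistic for J linear restrictions: RSS* is the restricted and
# RSS the unrestricted residual sum of squares
def wald_f(rss_restricted, rss_unrestricted, j, n, k):
    return ((rss_restricted - rss_unrestricted) / j) / (rss_unrestricted / (n - k))

# values from the CRS example: J = 1 restriction, N = 39, K = 3
f = wald_f(0.55874, 0.04338, j=1, n=39, k=3)
print(round(f, 1))  # 427.7, agreeing with the reported 427.66 up to rounding
```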
In addition to the Wald test for linear restrictions, large-sample test statistics such as
the Lagrange multiplier test and the likelihood ratio test are reported in the output.
Refer to econometrics textbooks for the derivation and application of these tests. The
corresponding P-values of these statistics are all very small, pointing to the same
conclusion: reject the linear restriction of constant returns to scale. Based on this
simple two-input Cobb-Douglas specification of the production technology, the data
series from cjx.txt does not support the theory of constant returns to scale. As a
matter of fact, U.S. production technology exhibited a pattern of increasing
returns to scale (i.e., β1 + β2 > 1) at least from 1929 to 1967.
Having rejected the hypothesis of constant returns to scale, the next interesting
issue about the production function presents itself: is there any difference in factor
productivity between pre-war and post-war periods?
/*
** Lesson 3.5: Testing for Structural Change
*/
1 use gpe2;
2 output file = gpe\output3.5 reset;
3 load data[40,6] = gpe\cjx.txt;
4 year = data[2:40,1];
5 X = ln(data[2:40,2]);
6 L = ln(data[2:40,3]);
7 K = ln(data[2:40,5]);
8 call reset;
9 _names = {"X", "L", "K"};
10 call estimate(X,L~K); @ whole sample @
11   _begin = 1;
12   _end = 20;
13   call estimate(X,L~K);   @ sub-sample: 1929-1948 @
14   _begin = 21;
15   _end = 39;
16   call estimate(X,L~K);   @ sub-sample: 1949-1967 @
17   end;
Run the above program to analyze the output. It calls estimate three times. The
first time it estimates the entire sample (the default case):
Least Squares Estimation
------------------------
Dependent Variable = X
Estimation Range = 1 39
Number of Observations = 39
Mean of Dependent Variable = 5.6874
Standard Error of Dependent Variable = 0.46096
The first regression is named the restricted model because it restricts the entire time
period to having the same structure. The estimated restricted model is:
Sub-samples for the second and for the third regression estimations are controlled by
_begin and _end, set to the desired observation numbers, respectively. _begin
and _end allow regressions of varying sizes to be estimated from a single data
series. Since the default value of _begin is 1, it really is not necessary in line 11.
The next line _end = 20 tells GPE to use up to, and including, the 20th row of
data for estimate. Here is the output of the second regression equation:
Least Squares Estimation
------------------------
Dependent Variable = X
Estimation Range = 1 20
Number of Observations = 20
Mean of Dependent Variable = 5.3115
Standard Error of Dependent Variable = 0.27867
Notice that the estimation range is from 1 to 20, using 20 observations. Running the
regression using only the time period from 1929 to 1948 returns the following
model:
The third regression with _begin = 21 (line 14) and _end = 39 (line 15) tells
GPE to estimate the model from the 21st row of the data series up to, and including,
the last or the 39th row. Let’s look at the regression result:
Least Squares Estimation
------------------------
Dependent Variable = X
Estimation Range = 21 39
Number of Observations = 19
Mean of Dependent Variable = 6.0832
Standard Error of Dependent Variable = 0.21025
Now, notice that the estimation range is from 21 to 39, using 19 observations.
Regressing only the time period from 1949 to 1967 returns the following model:
Back to the question at hand: was there a structural change in productivity between
the years 1929 and 1967? We have processed our raw data using 1948 as the
break point, and only need to apply the formal Chow test to the results. The Chow
test statistic is computed as follows:

[(RSS* - (RSS1 + RSS2)) / K] / [(RSS1 + RSS2) / (N - 2K)]  ~  F(K, N-2K)
where RSS* is the restricted residual sum-of-squares for the whole sample (1929-
1967); RSS1 is the residual sum-of-squares for the first sub-sample (1929-1948);
RSS2 is the residual sum-of-squares for the second sub-sample (1949-1967); K is the
number of variables, including constant, in each regression (again, do not confuse K
with the variable name for capital input in this program); and N is the number of
observations for the whole sample.
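As a sketch of this computation (in Python rather than GAUSS; the actual RSS*, RSS1, and RSS2 come from the three regression outputs, which are not reproduced here, so the call below uses purely illustrative numbers):

```python
# Chow test statistic: compare pooled RSS* against the sum of sub-sample
# residual sums of squares RSS1 + RSS2; K coefficients per regression,
# N observations in the pooled sample
def chow_f(rss_pooled, rss1, rss2, n, k):
    numerator = (rss_pooled - (rss1 + rss2)) / k
    denominator = (rss1 + rss2) / (n - 2 * k)
    return numerator / denominator   # ~ F(K, N-2K) under no structural change

print(chow_f(0.05, 0.02, 0.02, n=39, k=3))  # illustrative inputs only
```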
At a 5% level of significance, comparing the Chow test statistic (1.27) against the F
critical value of 2.92 leads us to conclude that, based on the Cobb-Douglas
specification, there was no structural change in productivity between 1929 and 1967.
or in exponential form:

X = 0.02 L^1.45 K^0.38
Besides examining standard residual statistics and plotting residual series, GPE
offers a set of diagnostic information to check the characteristics of residuals in
depth.
The first half of the program below is the same as that of lesson3.4. After removing
_rplot and setting _rstat to typical values, we add the following two lines:
_bjtest = 1;
_rlist = 2;
Setting _bjtest = 1 (meaning yes or true) will carry out the Bera-Jarque
normality test on the residuals.
We have seen the use of _rlist = 1 which lists each observation of the residuals
and their standard errors, in addition to observed (actual) and predicted (fitted) data
series. With _rlist = 2, in addition to residuals and their standard errors, useful
information on influential observations and outliers is available.
/*
** Lesson 3.6: Residual Diagnostics
*/
1 use gpe2;
2 output file = gpe\output3.6;
3 load data[40,6] = gpe\cjx.txt;
4 year = data[2:40,1];
5 X = ln(data[2:40,2]);
6 L = ln(data[2:40,3]);
7 K = ln(data[2:40,5]);
8 call reset;
9 _names = {"X", "L", "K"};
10 _rstat = 1;
11 _rlist = 2; @ check influential obs. @
12 _bjtest = 1; @ normality test @
13 call estimate(X,L~K);
14 end;
Running the above program, the output file output3.6 is generated. For model
evaluation, we now refer to output3.6. After reporting basic residual statistics,
the Bera-Jarque Wald test for normality computes the statistic based on the
measurements of skewness and kurtosis for the residuals as follows:
Bera-Jarque Wald Test for Normality
Asymptotic Standard Error of Residuals = 0.033352
Skewness of Residuals = 0.84226
Kurtosis of Residuals = 4.7072
Chi-Sq( 2)   Prob>Chi-Sq
    9.3472     0.0093379
The resulting test statistic follows the Chi-squared probability distribution with 2
degrees of freedom. The computed value of 9.35 for the Bera-Jarque test statistic is
far greater than the critical value at either a 5 or 10 percent level of significance (its
P-value is less than 1%). The null hypothesis of residual normality is rejected!
For a perfect normal distribution, residual skewness should be 0 and residual kurtosis
should be 3. The rejection of normality is not a surprise. However, non-normal
residuals create potential problems for statistical inference.
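The Bera-Jarque statistic itself is simple arithmetic on the reported skewness and kurtosis. As a check (in Python, not GAUSS):

```python
# Bera-Jarque normality test: JB = N/6 * (S^2 + (K - 3)^2 / 4), where S is
# residual skewness and K residual kurtosis; asymptotically Chi-squared(2)
def bera_jarque(skew, kurt, n):
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

jb = bera_jarque(0.84226, 4.7072, 39)   # values reported in the output
print(round(jb, 2))  # 9.35
```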
The last part of the output reports the regression diagnostics for influential
observations and outliers:
To check for influential observations and outliers, we first look at the column
“Leverage,” which measures the potential influence of each observation based on its
regressor values. We flag any leverage greater than 2×(K/N), where K is the number
of estimated coefficients and N is the number of observations. In this case, 2×(K/N) =
2×(3/39) = 0.154. Observations 4 and 5 (leverage 0.201 and 0.185, respectively) are
quite influential.
A more robust measure of outliers uses the studentized residuals (or standardized
predicted residuals), which follow the Student’s t distribution with N-K-1 degrees of
freedom. Given the critical value of 1.69 at a 5% level of significance, observations
16, 17, 19, and 20 are candidates for outliers.
The last column, “DFFITS,” measures the contribution of each observation to the
prediction of the model. The cutoff value 2×(K/N)^0.5 is suggested (that is, 0.555 in
the case of this Cobb-Douglas production model). The contributions of observations
17, 19, and 20 are rather large.
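Both rule-of-thumb cutoffs above depend only on K and N. A quick Python check of the arithmetic:

```python
# Influence-diagnostic cutoffs for K = 3 estimated coefficients and
# N = 39 observations, as used in the text
k, n = 3, 39
leverage_cutoff = 2 * k / n            # flag leverage > 2*(K/N)
dffits_cutoff = 2 * (k / n) ** 0.5     # flag |DFFITS| > 2*(K/N)^0.5

print(round(leverage_cutoff, 3))  # 0.154
print(round(dffits_cutoff, 3))    # 0.555
```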
Materials of this lesson on influential observations and outliers can be found in Judge
et al. (1988) and Maddala (1988). In summary, for our study of the Cobb-Douglas
production function, the model is sensitive to the use of data near the end of World
War II (i.e., observations 17, 19, and 20). The model may be better explained
without them.
IV
Dummy Variables
Dummy variables are widely used in econometrics to isolate sub-group effects in a
given sample. These sub-groups may be geographical regions, yearly quarters,
gender, or periods in time. How dummy variables are used in regression estimation
determines in which way the sub-groups differ. The so-called dummy variables
themselves remain vectors of ones and zeros. A one indicates the presence of a given
characteristic, while a zero indicates its absence. In most cases, one less dummy
variable is used than there are sub-groups. Estimated regressions from these sub-
groups may have an additive difference, a multiplicative difference, or a combined
additive and multiplicative difference. An additive difference refers to a parallel shift
in the level of an estimated regression. This shift is reflected in a change of the
intercept term, while the other coefficients remain unchanged. The slope coefficients
will vary with their associated multiplicative dummy variables. The estimated
changes in slope coefficients among sub-groups are measured by the coefficients of
multiplicative dummy variables. A combined additive and multiplicative difference
in sub-groups is achieved by a change in all coefficients, both intercept and slope
terms.
Seasonality
Determining seasonal patterns in time series data is one application of dummy
variables. A new text data file named almon.txt will be used to study quarterly
seasonality. It has three variables. The first column is the date, in years and quarters
(YEARQT). The second column is capital expenditures in millions of dollars
(CEXP). The last column holds capital appropriations in millions of dollars (CAPP).
The basic Almon model describes the simple relationship between capital
expenditures and appropriations as follows:
CEXP = β0 + β1 CAPP + ε
There are 60 observations in total, although Almon’s original study used the first 36
observations from 1953 to 1961. Lesson 4.1 is devoted to the study of seasonal
differences with Almon’s quarterly time series on capital appropriations and
expenditures.
Does the use of dummy variables matter? Lesson 4.1 continues the hypothesis testing
procedure for significant differences in quarterly seasonality. It is achieved by
comparing regression results from restricted (without seasonal dummy variables) and
unrestricted (with seasonal dummy variables) least squares.
In Lesson 4.2, the notorious problem of the “dummy variable trap” is discussed with
an alternative use and interpretation of dummy variables in conjunction with the
regression without intercept.
First of all, seasonality implies that the best-fitting regression for each season
(quarter) may be different. In other words, the intercept and slope terms that provide
the best fit for one quarter may not provide the best fit for different quarters. Before
generating the seasonal dummy variable matrix, you need to have some idea of what
it should look like. It has a repeating set pattern of four columns, one for each
quarter. Consider all 60 observations of time series data in almon.txt; a pattern of
0’s and 1’s is created to represent one cycle of seasonality (that is, one year). The
pattern is reshaped into a 4-column matrix with the desired 60 rows:
pattern = {1 0 0 0,
0 1 0 0,
0 0 1 0,
0 0 0 1};
D = reshape(pattern,60,4);
q1 = D[.,1];
q2 = D[.,2];
q3 = D[.,3];
To avoid perfect collinearity with the constant column associated with the intercept,
only three columns of the dummy variable matrix D will be used. That is, four
quarters are indicated with only three dummies: q1, q2, and q3. Lesson 4.2 on the
dummy variable trap explains why we must do this.
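The same dummy matrix can be built in any matrix language. Here is a Python/NumPy sketch of the reshape trick, where np.tile recycles the 4×4 identity pattern the same way GAUSS’s reshape recycles pattern:

```python
import numpy as np

# One year of the quarterly 0/1 pattern, recycled down to 60 rows
pattern = np.eye(4)                      # same as the 4x4 pattern matrix
D = np.tile(pattern, (15, 1))            # 60 x 4 seasonal dummy matrix
q1, q2, q3 = D[:, 0], D[:, 1], D[:, 2]   # keep three dummies; q4 is the base

print(D.shape)        # (60, 4)
print(D.sum(axis=0))  # [15. 15. 15. 15.]: each quarter appears 15 times
```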
CEXP = β0 + β1CAPP + δ1 Q1 + δ2 Q2 + δ3 Q3 + ε
We also will address the significance of seasonal differences in the model by testing
whether the three coefficients δ1, δ2, and δ3 are jointly equal to zero. This test
procedure for the presence of seasonality in the model will be examined later.
/*
** Lesson 4.1: Seasonal Dummy Variables
*/
1 use gpe2;
2 output file = gpe\output4.1 reset;
3 load almon[61,3] = gpe\almon.txt;
4 cexp = almon[2:61,2];
5 capp = almon[2:61,3];
6 qt = almon[2:61,1];
7 pattern = {1 0 0 0,
0 1 0 0,
0 0 1 0,
0 0 0 1};
8 D = reshape(pattern,60,4);
9 q1 = D[.,1]; @ quarterly seasonal dummies @
10 q2 = D[.,2];
11 q3 = D[.,3];
12 call reset;
13 _names = {"cexp", "capp", "q1", "q2", "q3"};
14 call estimate(cexp,capp~q1~q2~q3);
15 _restr = {0 1 0 0 0,
0 0 1 0 0,
0 0 0 1 0};
16 call estimate(cexp,capp~q1~q2~q3);
17 end;
The estimation is carried out with three quarter dummy variables named q1, q2, and
q3. The fourth quarter is the base case, and the coefficients of three dummy
variables identify the additive differences from that of the fourth quarter, or the
intercept term.
There are many ways to generate the dummy variables other than the suggested use
of the reshape command. A simple alternative is to rely on the quarter indicator
qt, appearing in the first column of the data file almon.txt. Lines 7 through 11 of
the above program can be replaced by the following three lines:
q1 = (qt%10) .== 1;
q2 = (qt%10) .== 2;
q3 = (qt%10) .== 3;
The modulo division “%” returns the remainder of the integer division, and the
notation “.==” in GAUSS performs element-by-element equality comparison. In
other words, each line compares the last digit of qt to a given quarter, placing a one
in the dummy variable if the comparison turns out to be true.
GAUSS has its own commands for creating dummy variables: dummy, dummybr,
dummydn. The command dummy creates a matrix of dummy variables by breaking
a vector of data into multiple groups. To make sense of this example, lines 7 and 8
of the above program may be replaced by the following:
seasons = {1,2,3,4};
D = dummy(qt%10, seasons);
where the column vector seasons containing four quarter indicators is used to
compare with the last digit of the variable qt. The GAUSS command dummy
creates a matrix of four columns of dummy variables, D. It compares each data
observation of qt%10 to the breakpoints designated in the vector seasons. If the
data are in the range designated, a one is placed in the corresponding element of
matrix D, if not, a zero is placed.
Running the program lesson4.1 will produce two sets of regression results in the
output file output4.1. The first estimated model looks like this:
We can also write the estimated model as four separate equations, one for each
quarter:
We have estimated the linear relationship between capital expenditures (CEXP) and
appropriations (CAPP) with varying intercept terms to represent the seasonal
differences in the model. Is there a real or significant difference among the four
estimated regression equations? Analyzing both the t-ratios and the P-values reveals
that the coefficients of dummy variables are not statistically significantly different
from zero. Furthermore, the partial regression values are very small for the dummy
variables. A more formal procedure is to test the hypothesis that all of the
coefficients of dummy variables are jointly equal to zero. The hypothesis is that δ1 =
0, δ2 = 0, and δ3 = 0 hold simultaneously. The GPE input control variable _restr
(line 15) defines Almon’s equation with the three quarterly dummy variables jointly
equaling zero.
_restr = {0 1 0 0 0,
0 0 1 0 0,
0 0 0 1 0};
Then, restricted least squares estimation is carried out in line 16. Here is the second
set of estimation results in which the coefficients of three quarterly dummy variables
are restricted to zero:
This condition is called the “dummy variable trap.” The dummy variable trap gets
just about everyone at some time. Understanding how the dummy variable trap
happens will make avoiding it easier. Remember that a typical regression equation
contains a constant vector of ones associated with the intercept coefficient. Now, if
there is a dummy variable for each group, summing all the dummy variables together
equals one. The problem of perfect collinearity exists! Dropping one dummy
variable is not the only solution to stay out of the “trap.” The alternative is to include
all dummy variables but to estimate the regression without the intercept term. In
GPE, regression estimation without intercept is carried out by setting the input
control variable:
_const = 0;
/*
** Lesson 4.2: Dummy Variable Trap
*/
1 use gpe2;
2 output file = gpe\output4.2 reset;
3 load almon[61,3] = gpe\almon.txt;
4 cexp = almon[2:61,2];
5 capp = almon[2:61,3];
6 qt = almon[2:61,1];
7 pattern = {1 0 0 0,
0 1 0 0,
0 0 1 0,
0 0 0 1};
8 D = reshape(pattern,60,4);
9 call reset;
10 _const = 0; @ regression without intercept @
11 _names = {"cexp","capp","q1","q2","q3","q4"};
12 call estimate(cexp,capp~D);
13 end;
Run this program, and refer to the output file output4.2 for details. The important
catch is the statement of line 10:
_const = 0;
Without it, you will fall into the “dummy variable trap”! The estimated model can be
summarized as follows:
The coefficients associated with the four quarterly dummy variables are interpreted
directly as the intercept values of the four equations:
A careful eye will see that these results are the same as those of the first regression
equation in Lesson 4.1 using three dummies and a constant term.
Structural Change
In the next lesson, we will use a dummy variable approach to estimate and test for
structural change previously studied in the production function of Lesson 3.5. Recall
that a simple Cobb-Douglas production function was estimated using time series of
U.S. real output (X), labor (L) and capital (K) inputs obtained from the data file
cjx.txt. The question was, is there a change in both intercept and slope terms during
post-war expansion after 1948? In Lesson 3.5, a Chow test was formulated and
performed with two separate samples: 1929-1948 and 1949-1967. The alternative
approach is to use a dummy variable for sample separation, and check for the
difference in intercept and slope terms of the regression for each sub-sample. To
check for the intercept difference, the use of an additive dummy variable would
suffice. To check for the slope difference, a multiplicative dummy variable
associated with each explanatory variable should be used.
A simple way to create D is to compare a given vector available in the original time
series to a set value. In this lesson we create the dummy variable D by comparing
each observation in the vector YEAR to the value 1948. For every observation
greater than 1948, D is set to one; otherwise D is set to zero. Notice that the dot (.)
before the “>” means element-by-element greater-than comparison:
D = year.>1948;
If the numbers of consecutive observations for the base and alternative periods are
known, vertically concatenating a vector of zeros to a vector of ones is a simple
method of creating the dummy variable D. In this case,

D = zeros(20,1)|ones(19,1);
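The element-by-element comparison and multiplication translate directly to other matrix languages. A small Python illustration with a made-up four-year sample (the lesson itself uses the 39 annual observations from cjx.txt):

```python
import numpy as np

year = np.array([1947, 1948, 1949, 1950])
L = np.array([4.5, 4.6, 4.7, 4.8])    # hypothetical log-labor values

D = (year > 1948).astype(float)       # GAUSS: D = year.>1948
DL = D * L                            # GAUSS: DL = D.*L

print(D)   # [0. 0. 1. 1.]
```

For entries of D equal to one, DL reproduces L; elsewhere it is zero, exactly as described for lines 9 and 10 of the program.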
/*
** Lesson 4.3: Testing for Structural Change
** Dummy Variable Approach
*/
1 use gpe2;
2 output file = gpe\output4.3 reset;
3 load data[40,6] = gpe\cjx.txt;
4 year = data[2:40,1];
5 X = ln(data[2:40,2]);
6 L = ln(data[2:40,3]);
7 K = ln(data[2:40,5]);
8 D = year.>1948;
9 DL = D.*L;
10 DK = D.*K;
11 call reset;
12 _names = {"X","L","K","DL","DK","D"};
13 call estimate(X,L~K~DL~DK~D);
14 _restr = {0 0 1 0 0 0,
0 0 0 1 0 0,
0 0 0 0 1 0};
15 call estimate(X,L~K~DL~DK~D);
16 end;
Line 8 creates the additive dummy variable named D. Lines 9 and 10 use D to set up
multiplicative dummy variables in association with the other two explanatory
variables L and K, respectively. Thus, for entries of D equal to one, the corresponding
entry in DL equals L and the corresponding entry in DK equals K. Otherwise, the
entries of DL and DK are zeros. The three dummy variables, one additive and two
multiplicative, are added to estimate in line 13. In this example, our model can be
written in two ways. It may be written with two separate regressions, one for years
before 1948, and one for the years after. This example demonstrates how to construct
both situations into one combined regression as follows:
X = β0 + β1 L + β2 K + δ0 D + δ1 DL + δ2 DK + ε
When D equals zero (that is, for the period 1929-1948), we have what is called the
base case. When D equals one (that is, 1949-1967), the estimated coefficients of the
dummy variables are added to the estimated coefficients of the independent variables
including the constant vector. In other words,
For 1929-1948, X = β0 + β1 L + β2 K + ε;
For 1949-1967, X = (β0 + δ0) + (β1 + δ1) L + (β2 + δ2) K + ε.
Run the program so that we can check out the first estimated regression from the
output:
One look at t-ratios and P-values tells us that the dummy variables are not
statistically significant. To test for the structural change, we need to verify that the
coefficients of both additive and multiplicative dummy variables are all zero. In
other words, we must show that δ0 = 0, δ1 = 0, and δ2 = 0 jointly. The GPE input
control variable _restr, defines these three dummy variables in a 3 by 6 matrix as
shown in line 14:
_restr = {0 0 1 0 0 0,
0 0 0 1 0 0,
0 0 0 0 1 0};
Line 15 estimates the restricted model (restricting all coefficients associated with
dummy variables to zeros) in which no structural change is assumed. Here is the
result of the restricted least squares estimation:
Comparing the result of the Chow test presented in Lesson 3.5 with the above output
shows an identical computed Wald F-test statistic of 1.27 for the three linear
restrictions on the dummy variables. In other words, based on the Cobb-Douglas log
specification of the production function, there is no reason to believe that there was a
structural change in output productivity between the years 1929 and 1967. Both
Lesson 3.5 (sample separation approach) and Lesson 4.3 (dummy variable approach)
reach the same conclusion. However, to a careful eye, there are subtle differences in
the estimated standard errors and t-ratios for the regression coefficients obtained
from the two approaches. Why?
V
Multicollinearity
Multicollinearity is a data problem that arises when a group of highly correlated
explanatory variables is used in the regression equation. The consequence of
multicollinearity is large standard errors of the coefficient estimates. The size of
these errors suggests that there are too many explanatory variables and that some of
them may not be needed. The question, then, is how to identify and treat the
irrelevant explanatory variables in the regression.
The famous Longley data are known for the problem of multicollinearity. Instead of
constructing a meaningful model, we will demonstrate a hypothetical relationship
with the dependent variable (EM), regressed against a set of four other variables
(YEAR, PGNP, GNP, and AF).
Detecting Multicollinearity
Given the regression equation

EM = β0 + β1 YEAR + β2 PGNP + β3 GNP + β4 AF + ε
the focus of this chapter is to examine how closely the four explanatory variables
(YEAR, PGNP, GNP, and AF) are related. Lessons 5.1, 5.2, and 5.3 address the
techniques of detecting multicollinearity. These include: condition number and
correlation matrix (Lesson 5.1), Theil’s measure of multicollinearity (Lesson 5.2),
and Variance Inflation Factors (Lesson 5.3).
GPE reports a partial regression coefficient for each explanatory variable; it is the
squared partial correlation, computed from the variable’s t-ratio as t2 / (t2 + DF),
where DF is the degrees of freedom of the regression.
Another useful tool to check for the problem of multicollinearity is the data
correlation matrix, which describes the simple pair-wise correlation among all the
variables used in the regression. The built-in GAUSS command corrx can do
exactly that, but the GPE package offers the convenience of a data correlation matrix
by setting the following input control variable:
_corr = 1;
Using the Longley data, the following program estimates the model with the
dependent variable (EM) regressed against a set of four other variables (YEAR,
PGNP, GNP, and AF). The problem of multicollinearity is detected by examining the
partial regression coefficients, as well as the condition number and correlation matrix
of the explanatory variables.
/*
** Lesson 5.1: Condition Number and Correlation Matrix
*/
1 use gpe2;
2 output file = gpe\output5.1 reset;
3 load data[17,7] = gpe\longley.txt;
4 data = data[2:17,.];
5 year = data[.,1];
6 pgnp = data[.,2];
7 gnp = data[.,3];
8 af = data[.,5];
9 em = data[.,7];
10 call reset;
11 _corr = 1; @ cond# and correlation matrix @
12 _names = {"em","year","pgnp","gnp","af"};
13 call estimate(em,year~pgnp~gnp~af);
14 end;
With the exception of the variable GNP, small partial regression coefficients are
strong indications of irrelevant explanatory variables. The added information from
the use of the input control variable _corr = 1 (line 11) includes the condition
number and correlation matrix of the explanatory variables. The correlation
coefficients between the dependent variable and each independent variable are given
in the first column of the correlation matrix. These measure the individual effect of
each independent variable on the dependent variable. With the exception of the
variable AF, the explanatory variables have a rather high correlation with the
dependent variable. However, these variables are also highly correlated among
themselves, as seen from the rest of the correlation matrix. In addition, the condition
number of explanatory variables is extremely large, suggesting severe
multicollinearity for this set of variables.
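A condition number is the ratio of the largest to smallest singular value of the data matrix of explanatory variables (some packages report that of X'X, its square). A Python illustration of why near-collinear columns blow it up; the data here are synthetic, not Longley’s:

```python
import numpy as np

x1 = np.linspace(0.0, 1.0, 16)
X_ok = np.column_stack([x1, np.cos(x1)])        # loosely related columns
X_bad = np.column_stack([x1, 2.0 * x1 + 1e-9])  # nearly collinear columns

# condition number = largest / smallest singular value
print(np.linalg.cond(X_ok) < 1e2)    # True: well conditioned
print(np.linalg.cond(X_bad) > 1e6)   # True: severe multicollinearity
```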
Theil’s measure of multicollinearity is defined as m = R2 - Σj (R2 - R-j2), where R2
is the R-square (that is, the coefficient of determination) of the full model, including
all explanatory variables, and R-j2 is the R-square of the same regression model
excluding the j-th explanatory variable. Therefore, the difference R2 - R-j2 measures
the net contribution of the j-th explanatory variable in terms of R-square. K is the
number of explanatory variables of the full regression, in which the first one is the
constant term. Notice that the index j for summation does not count the constant
term. In the ideal case of no multicollinearity, Theil’s measure equals or is close to
zero.
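This formula is plain arithmetic on the R-squares. A Python transcription with hypothetical R-square values (the real ones come from the five regressions in the program below):

```python
# Theil's measure: m = R^2 - sum_j (R^2 - R_{-j}^2), summed over the
# non-constant explanatory variables
def theil_measure(r2_full, r2_drop_one):
    return r2_full - sum(r2_full - r2j for r2j in r2_drop_one)

# hypothetical values: dropping any one variable barely hurts the fit,
# so m stays near R^2 (near unity => multicollinearity)
m = theil_measure(0.995, [0.993, 0.994, 0.990, 0.995])
print(round(m, 3))  # 0.987
```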
The first regression in the following program (lines 10-13) estimates the full model
with dependent variable (EM) on a set of four independent variables (YEAR, PGNP,
GNP, and AF). The rest of the program (lines 14-33) estimates four regression
equations; each corresponds to the partial model with one of the independent
variables removed. The R-squares from the full model and from the four partial
models are then used to compute the Theil’s measure of multicollinearity.
Instead of showing the lengthy results of each regression estimation, we explain the
use of output control variables in GPE for keeping track of the information from
each of the regression runs. The use of an output control variable is first introduced
in line 13. In GPE, output control variables take on new values each time
estimate or forecast is called. An output control variable is identified with a
name beginning with a double underscore ( __ ). For example, __r2 is the value of
R-square computed in the previous estimation. Therefore, in line 13, assigning __r2
to a variable named r2 allows us to use that value later in the program. See
Appendix A for a complete list of output control variables available in GPE.
/*
** Lesson 5.2: Theil’s Measure of Multicollinearity
*/
1 use gpe2;
2 output file = gpe\output5.2 reset;
3 load data[17,7] = gpe\longley.txt;
4 data = data[2:17,.];
5 year = data[.,1];
6 pgnp = data[.,2];
7 gnp = data[.,3];
8 af = data[.,5];
9 em = data[.,7];
10 call reset;
11 _names = {"em","year","pgnp","gnp","af"};
12 call estimate(em,year~pgnp~gnp~af);
13 r2 =__r2;
14 call reset;
15 print"Partial Regression 1: EM = PGNP GNP AF";
16 _names = {"em","pgnp","gnp","af"};
17 call estimate(em,pgnp~gnp~af);
18 r2x1 = __r2;
19 print"Partial Regression 2: EM = YEAR GNP AF";
20 _names = {"em","year","gnp","af"};
21 call estimate(em,year~gnp~af);
22 r2x2 = __r2;
23 print"Partial Regression 3: EM = YEAR PGNP AF";
24 _names = {"em","year","pgnp","af"};
25 call estimate(em,year~pgnp~af);
26 r2x3 = __r2;
27 print"Partial Regression 4: EM = YEAR GNP PGNP";
28 _names = {"em","year","gnp","pgnp"};
29 call estimate(em,year~gnp~pgnp);
30 r2x4 = __r2;
31 print "Theil’s Measure of Multicollinearity =";;
32 print r2-sumc(r2-(r2x1|r2x2|r2x3|r2x4));
33 end;
In each of the four partial regressions, we again use the output variable __r2 to
keep track of the R-square. Each saved R-square is subtracted from the R-square of
the full model; the net differences are then concatenated and summed using the
GAUSS command sumc (see line 32). Running the program, the output displays the
results of all the regressions before the line:
In summary, the near unity of Theil's measure confirms the problem of
multicollinearity.
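For reference, the quantity computed in line 32 is Theil's measure,

   m = R² − ∑j=1,…,4 (R² − R²(-j))

where R² is the R-square of the full model and R²(-j) is the R-square of the partial
model with the j-th variable removed. With orthogonal regressors m is near zero,
while a value near one signals a severe multicollinearity problem.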
MULTICOLLINEARITY
VIFj = 1/(1 − Rj²)

It can be used to detect multicollinearity, where Rj² is the R-square from regressing
the j-th explanatory variable on all the other explanatory variables. A near-unity Rj²,
and hence a high value of VIF, indicates a potential problem of multicollinearity
with the j-th variable.
The following program computes VIF for each explanatory variable through a set of
four auxiliary regressions similar to the procedure used in computing Theil’s
measure of multicollinearity.
/*
** Lesson 5.3: Variance Inflation Factors (VIF)
*/
1 use gpe2;
2 output file = gpe\output5.3 reset;
3 load data[17,7] = gpe\longley.txt;
4 data = data[2:17,.];
5 year = data[.,1];
6 pgnp = data[.,2];
7 gnp = data[.,3];
8 af = data[.,5];
9 em = data[.,7];
10 call reset;
11 print "Aux Regression 1: YEAR = PGNP GNP AF";
12 y = year;
13 x = pgnp~gnp~af;
14 _names = {"year","pgnp","gnp","af"};
15 call estimate(y,x);
16 r2x1 = __r2;
17 print "Aux Regression 2: PGNP = YEAR GNP AF";
18 y = pgnp;
19 x = year~gnp~af;
20 _names = {"pgnp","year","gnp","af"};
21 call estimate(y,x);
22 r2x2 = __r2;
23 print "Aux Regression 3: GNP = YEAR PGNP AF";
24 y = gnp;
25 x = year~pgnp~af;
26 _names = {"gnp","year","pgnp","af"};
27 call estimate(y,x);
28 r2x3 = __r2;
29 print "Aux Regression 4: AF = YEAR GNP PGNP";
30 y = af;
31 x = year~gnp~pgnp;
32 _names = {"af","year","gnp","pgnp"};
33 call estimate(y,x);
34 r2x4 = __r2;
35 r2=r2x1|r2x2|r2x3|r2x4;
36 print "Variance Inflation Factors:";
37 print " Model R-Square VIF";;
38 print seqa(1,1,4)~r2~(1/(1-r2));
39 end;
The first part of the program performs four auxiliary regression estimations. Each
corresponds to the regression of one selected explanatory variable on the rest. Only
the R-squares from the four estimated regressions are of interest in
computing the VIF. First, these values are retained using the output variable __r2,
then they are concatenated into a vector (line 35) for calculating the VIF of each
variable (line 38). Based on the R-square measure of each auxiliary regression, VIF
for each explanatory variable is reported as follows:
Again, all explanatory variables except AF (Model 4) have higher-than-normal
values of VIF, indicating a severe problem of multicollinearity.
br = (I + r(X'X)^-1)^-1 b
bpc = VV'b
Var(bpc) = (VV')Var(b)(VV')
lesson5.4 is a GAUSS program that implements ridge regression and principal
components estimation for the hypothetical regression equation with the Longley
data described in Lessons 5.1 to 5.3. After obtaining the ordinary least
squares result, we introduce several GAUSS commands to perform ridge regression
and principal components estimation. For detailed explanations of the GAUSS
commands used therein, refer to the GAUSS Command References or consult the
on-line help menu.
/*
** Lesson 5.4: Ridge Regression and Principal Components
*/
1 use gpe2;
2 output file = gpe\output5.4 reset;
3 load data[17,7] = gpe\longley.txt;
4 data = data[2:17,.];
5 year = data[.,1];
6 pgnp = data[.,2];
7 gnp = data[.,3];
8 af = data[.,5];
9 em = data[.,7];
10 call reset;
11 _names = {"em","year","pgnp","gnp","af"};
12 call estimate(em,year~pgnp~gnp~af);
/* Principal Components */
@ compute char. roots and vectors of X'X @
21 {r,v}=eigrs2(x'x);
22 v = selif(v',r.>0.1)';
23 bpc = v*v'__b;
24 vbpc = v*v'__vb*v*v';
25 print;
26 print "Principal Components Model:";
27 print " Coefficient Std Error";;
28 print bpc~sqrt(diag(vbpc));
29 end;
First we estimate the equation using ordinary least squares (lines 10-12), then from
line 13 on we focus on the data matrix of explanatory variables including the
constant term, to perform ridge regression and principal components. Ridge
regression is obtained for a shrinkage parameter of 0.3 (lines 14-20). We could try
several small positive values for a shrinkage parameter to find the most “stable”
coefficient estimates. The following output is the result of ridge regression:
As we can see from the above example, the computation of ridge regression and
principal components is easy, but the interpretation of the resulting coefficient
estimates will be difficult.
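The ridge-regression block (lines 13-20 of lesson5.4) is missing from this excerpt.
Based on the formula br = (I + r(X'X)^-1)^-1 b and the shrinkage parameter of 0.3
mentioned in the text, the block may have looked roughly like the following sketch
(the variable names x, r, a, br, and vbr are our assumptions, not the original code;
__b and __vb are the GPE output variables used in lines 23-24):

```gauss
@ hypothetical reconstruction of the ridge block, lines 13-20 @
x=ones(rows(em),1)~year~pgnp~gnp~af;  @ data matrix incl. constant @
r=0.3;                                @ shrinkage parameter @
a=invpd(eye(cols(x))+r*invpd(x'x));   @ (I+r(X'X)^-1)^-1 @
br=a*__b;                             @ ridge estimator @
vbr=a*__vb*a';                        @ variance-covariance of br @
print "Ridge Regression Model:";
print " Coefficient Std Error";;
print br~sqrt(diag(vbr));
```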
VI
Nonlinear Optimization
Finding an optimal (maximal or minimal) solution of a scalar-valued function is at
the core of econometric methodology. The technique of least squares estimation is an
example: it minimizes the nonlinear “sum-of-squares” objective function. For a linear
regression model, the exact solution is derived using the analytical formula of matrix
algebra. However, the problem may be more complicated if the regression equation
is nonlinear in the parameters. In this case, approximation or iterative methods of
nonlinear optimization will be necessary. We will consider only the case of
unconstrained optimization. In most cases, simple equality constraints can be
substituted into the objective function so that the problem is essentially the
unconstrained one. Nonlinear optimization with inequality constraints is difficult,
though not impossible.
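For instance, for the linear model ε = Y − Xβ, minimizing the sum-of-squares
S(β) = ε′ε leads to the familiar closed-form solution

   b = (X′X)^-1 X′Y

and no iteration is needed; it is the nonlinear-in-parameters case that requires the
iterative methods discussed in this chapter.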
fn FunctionName(Data,Parameters) = …;
or
proc FunctionName(Data,Parameters);
…
endp;
where FunctionName is the name of the function, Data are the sample
observations of data series, and Parameters are the parameters or coefficients of
the function. For a statistical model, both Data and Parameters are used to
define the function FunctionName. For a mathematical function, only the
Parameters matter, therefore Data can be set to 0 (or a dummy value) in this
case.
Here, &FunctionName denotes the code address (holding place) of the function
FunctionName we declared earlier, which itself is defined with Data (a set of
sample data) and Parameters (a vector of initial values of parameters).
To use GPE for nonlinear function optimization (or estimation), the following input
control variables are required:
• _nlopt
• _b
• _iter
The GPE input control variable _nlopt defines the type of optimization problem
involved. _nlopt=0 indicates a minimization problem, while _nlopt=1 indicates
a maximization problem. Since numerical iteration is used for solving a nonlinear
model, the solution found can be at best a local one. The input variable _b provides
the initial guess of parameters as the starting point of iterations. Different starting
values of _b may lead to different (local) solutions. In an effort to find a global
solution for the function, several different values of _b should be tried. The variable
_iter sets the maximal number of iterations allowed for a particular problem.
Usually we keep _iter low for testing the function. When the function is debugged
and ready for solving, _iter should be set large enough to ensure the convergence
of an iterative solution.
Calling the procedure estimate for nonlinear model estimation (or optimization)
is similar to the case of the linear regression model. The differences are that under
nonlinear estimation or optimization, the first argument of estimate is now an
address for the objective function and the second argument (for the data matrix) is
more forgiving in its structure. Remember that the objective function must be
defined with both data and parameters before calling the estimate procedure.
f(x) = ln(x) – x2
/*
** Lesson 6.1: One-Variable Scalar-Valued Function
** f(x) = ln(x) – x^2
*/
1 use gpe2;
2 output file=output6.1 reset;
3 fn f(data,x)=ln(x)-x^2;
4 call reset;
5 _nlopt=1;
6 _iter=100;
7 _b=0.5;
8 call estimate(&f,0);
9 end;
Line 5 indicates the maximization problem involved, and line 6 sets the iteration
limit for finding the solution. The estimation (maximization, in particular) of
function f starts with the initial value of x at 0.5 as shown in line 7. The GPE input
variable _b controls the starting value of iteration. Notice that here we do not use
sample data or parameter names in defining the function and its maximization.
Running the above lesson program, we obtain the following result:
Initial Result:
Function Value = -0.94315
Parameters = 0.50000
Final Result:
Iterations = 6 Evaluations = 38
Function Value = -0.84657
Parameters = 0.70711
Gradient Vector = -4.2549e-006
Hessian Matrix = -4.0000
Starting at x = 0.5 with function value -0.94315, it takes six iterations to reach
convergence. The solution 0.70711 is indeed a maximum with function value
-0.84657, where the gradient is almost zero at -4.2549e-06 and the hessian is
negative at -4.0.
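This solution can be verified analytically. Setting the first derivative to zero,

   f′(x) = 1/x − 2x = 0  ⇒  x² = 1/2  ⇒  x = 1/√2 ≈ 0.70711

and the function value there is f(1/√2) = ln(1/√2) − 1/2 = −(1/2)ln(2) − 1/2 ≈ −0.84657.
The second derivative f″(x) = −1/x² − 2 equals −4 at the solution, confirming a
maximum.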
You may want to define the function’s analytical derivatives and use them for
solving the function. For this example, they are:
fn f1(data,x) = 1/x – 2*x;
fn f2(data,x) = -1/(x^2) – 2;
The functions f1 and f2 are the first and second derivatives of f, respectively. It
may be necessary to write a multi-line procedure for the derivatives of a more
complicated function. To solve the function with analytical derivatives, just
concatenate the first and second derivatives together and assign them to the input
control variable _deriv before calling the procedure estimate as below:
_deriv = &f1|&f2;
call estimate(&f,0);
There is no need to use both first and second derivatives. Using only the first
derivative will work. That is,
_deriv = &f1;
The use of analytical derivatives will speed up the computation and increase the
numerical precision of the solution. However, for a complicated function, it is often
a difficult task to write and code the analytical formulas of derivatives.
The bare-bones program of Lesson 6.1 does not take advantage of the many options
available in GPE to fine-tune the optimization process. For a simple problem such as
the one shown above, the default settings of the optimization method (i.e., the
steepest-ascent method) and convergence criteria (i.e., convergence in function value
and solution relative to the tolerance level of 0.001) may be acceptable.
We now explain some of the GPE input control variables, which provide the option
to select one of many optimization methods and control its behavior in order to find
the optimal solution for a more complicated and difficult function. These control
variables are:
• _method
• _step
• _conv
• _tol
• _restart
All the optimization or estimation methods should be combined with a line search to
determine the step size of optimization for each iteration. The default line search
method is a simple cutback method (_step=0). Setting _step=1 causes the
quadratic step size to be used in the search. Readers interested in a more detailed
discussion and comparison of different optimization methods should check the
references (e.g., Quandt, 1983; Judge, et al., 1985, Appendix B; Greene, 1999,
Chapter 5) for details.
The other optional input variables control the accuracy and convergence of the
solution. The variable _tol sets the tolerance level of convergence. Typically _tol
is a small number (default value 0.001). The variable _conv checks for two
consecutive iterations to reach convergence, relative to the tolerance level. When
_conv=0 (default), only the function values and solutions are checked for
convergence with _tol; when _conv=1, the convergence of function values,
solutions, and zero gradients are checked with _tol. Finally, the variable
_restart sets the number of times to restart the computation when the function
value fails to improve. A maximum of 10 restarts is allowed, with no restart as the
default (_restart=0).
As will be demonstrated in many example lessons below, we use all sorts of different
optimization methods or algorithms for different types of problems. It is not unusual
that a different (local) solution may be found due to the particular algorithm in use.
Although there is no clear indication which method should be used for what type of
problem, we recommend a mixed bag of optimization methods in conjunction with a
variety of options controlling the numerical optimization. It is a matter of
experimentation to find the best suite of solution tools for a particular problem.
There are four minima, (3,2), (3.5844, -1.8481), (-3.7793, -3.2832), and
(-2.8051, 3.1313) with the same function value 0, although we can only find one
minimum at a time. With various initial starting values of the variables, we are able
to find all of the four solutions. Also, the maximal function value 181.62 is found at
the solution (-0.27084, -0.92304). Be warned that sometimes the solutions are
difficult to find because there are several saddle points, (0.08668, 2.88430),
(3.38520, 0.07358), and (-3.07300, -0.08135), in the way. Here is the program:
/*
** Lesson 6.2: Two-Variable Scalar-Valued Function
** g(x) = (x[1]^2+x[2]-11)^2 + (x[1]+x[2]^2-7)^2
*/
1 use gpe2;
2 output file=output6.2 reset;
3 fn g(data,x)=(x[1]^2+x[2]-11)^2+(x[1]+x[2]^2-7)^2;
4 call reset;
5 _nlopt=0;
6 _method=1;
7 _iter=100;
8 _step=1;
9 _conv=1;
10 _b={3,-2};
11 call estimate(&g,0);
12 end;
Line 3 defines the one-line objective function g. Again, data is not used for
defining such a function, in which only the vector of parameters x matters. In this
example, a version of the quasi-Newton method (BFGS, i.e., _method=1) is used
(line 6) for optimization. It takes seven iterations to find one of the four minima
(3.58, -1.85) from the initial starting point _b=(3, -2) given in line 10. Run this
program, and refer to the output file output6.2 for more details.
For pedagogical purposes, we write out the procedures for analytical first and second
derivatives g1 and g2, although we do not use them in the above lesson. We note
that g1 is a row-vector gradient and g2 is a hessian matrix:
proc g1(data,x); @ 1st derivative of g(x) @
local f1,f2;
f1=4*x[1]*(x[1]^2+x[2]-11)+2*(x[1]+x[2]^2-7);
f2=2*(x[1]^2+x[2]-11)+4*x[2]*(x[1]+x[2]^2-7);
retp(f1~f2);
endp;
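The companion hessian procedure g2, mentioned in the text but not shown in this
excerpt, can be sketched by differentiating g1 once more (this is our reconstruction;
the local names h11, h12, h22 are assumptions):

```gauss
proc g2(data,x);  @ 2nd derivative (hessian) of g(x): reconstructed sketch @
   local h11,h12,h22;
   h11=12*x[1]^2+4*x[2]-42;  @ d2g/dx1dx1 @
   h12=4*(x[1]+x[2]);        @ d2g/dx1dx2 = d2g/dx2dx1 @
   h22=12*x[2]^2+4*x[1]-26;  @ d2g/dx2dx2 @
   retp((h11~h12)|(h12~h22));
endp;
```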
To use both derivatives g1 and g2 in the optimization, we need only set
_deriv = &g1|&g2;
By changing the initial values of the parameters in line 10 of lesson6.2, all solutions
may be found. We suggest the following values and the corresponding minima to
which they converge. Try them out:
Unfortunately, without knowing the solution ahead of time, the search is rather a
blind process. The general rule of thumb is to try as many different initial values as
possible. As an exercise, modify the program of Lesson 6.2 to find the maximum
(-0.27, -0.92) with function value 181.62. Hint: Try _nlopt=1 (line 5) and
_b={0,0} (line 10).
The problem is to maximize the log-likelihood function ll(θ) so that the solution θ
characterizes the probability distribution of the random variable X under
consideration. To find the θ that maximizes ll(θ) is the essence of maximum
likelihood estimation. The corresponding variance-covariance matrix of θ is derived
from the information matrix (negatives of the expected values of the second
derivatives) of the log-likelihood function as follows:
Var(θ) = [−E(∂²ll/∂θ∂θ′)]^-1
The familiar example is the likelihood function derived from a normal probability
distribution:
f(X,θ) = (1/√(2πσ²)) exp(−(X − µ)²/(2σ²))

For the log-normal distribution, the density is

f(X,θ) = (1/√(2πσ²)) (1/X) exp(−(ln(X) − µ)²/(2σ²))
with the solution µ = (1/N) ∑i=1,…,N ln(Xi) and σ² = (1/N) ∑i=1,…,N (ln(Xi) − µ)², the
corresponding mean and variance of X are E(X) = exp(µ + σ²/2) and Var(X) =
exp(2µ + σ²)[exp(σ²) − 1], respectively. Many economic variables are described with a
log-normal instead of a normal probability distribution. If µ is re-parameterized in
terms of a set of non-random variables Z and additional parameters β, µ = Zβ for
example, we get the statistical regression model, to be discussed in the next section.
f(X,θ) = (λ^ρ / Γ(ρ)) exp(−λX) X^(ρ−1)
where θ = (λ, ρ) is the parameter vector with λ > 0 and ρ > 0. The mean of X is ρ/λ,
and the variance is ρ/λ2. Many familiar distributions, such as the exponential and
Chi-square distributions, are special cases of the gamma distribution.
As with the normal distribution, the technique of maximum likelihood can be used to
estimate the parameters of the gamma distribution. Sampling from N independent
observations from the gamma distribution, the log-likelihood function is:
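The display itself is missing from this excerpt; summing the log of the gamma
density over the N independent observations gives the standard form

   ll(θ) = N[ρ ln(λ) − ln Γ(ρ)] − λ ∑i=1,…,N Xi + (ρ − 1) ∑i=1,…,N ln(Xi)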
/*
** Lesson 6.3: Estimating Probability Distributions
** See Greene (1999), Chapter 4
*/
1 use gpe2;
2 output file=output6.3 reset;
3 load data[21,2]=gpe\yed20.txt;
4 x=data[2:21,1]/10; @ income data: scaling may be helpful @
9 call reset;
10 _nlopt=1;
11 _method=4;
12 _iter=100;
13 _b={3.0,2.0};
14 call estimate(&llfn,x);
15 _b={1.0,0.5};
16 call estimate(&llfln,x);
17 _b={2.0,0.5};
18 call estimate(&llfg,x);
19 end;
Initial Result:
Function Value = -44.174
Parameters = 3.0000 2.0000
Final Result:
Iterations = 3 Evaluations = 39
Function Value = -43.974
Parameters = 3.1278 2.1809
Gradient Vector = -0.00010836 0.00040791
Asymptotic Asymptotic
Parameter Std. Error t-Ratio
X1 3.1278 0.48766 6.4139
X2 2.1809 0.34483 6.3247
For the case of log-normal distribution, starting from the initial values of (µ, σ) at
(1.0, 0.5) in line 15, the maximum likelihood solution is found at (0.9188, 0.6735) as
shown in line 16. The maximal value of log-likelihood function is –38.849. Here is
the output:
Initial Result:
Function Value = -41.299
Parameters = 1.0000 0.50000
Final Result:
Iterations = 3 Evaluations = 44
Function Value = -38.849
Parameters = 0.91880 0.67349
Gradient Vector = -0.0018227 0.027819
Asymptotic Asymptotic
Parameter Std. Error t-Ratio
X1 0.91880 0.15053 6.1039
X2 0.67349 0.10637 6.3317
Convergence Criterion = 0
Tolerance = 0.001
Initial Result:
Function Value = -40.628
Parameters = 2.0000 0.50000
Final Result:
Iterations = 3 Evaluations = 42
Function Value = -39.324
Parameters = 2.4106 0.77070
Gradient Vector = -0.0036550 0.0078771
Asymptotic Asymptotic
Parameter Std. Error t-Ratio
X1 2.4106 0.71610 3.3663
X2 0.77070 0.25442 3.0293
f1(X, µ1, σ1) = (1/√(2πσ1²)) exp(−(X − µ1)²/(2σ1²)),
f2(X, µ2, σ2) = (1/√(2πσ2²)) exp(−(X − µ2)²/(2σ2²)).
where λ is the probability that an observation is drawn from the first distribution
f1(X,µ1,σ1), and 1−λ is the probability that it is drawn from the second. θ =
(µ1,µ2,σ1,σ2,λ) is the unknown parameter vector that must be estimated.
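Written out, the mixture density and its log-likelihood (implemented in the
procedure llf of lesson6.4 below) are

   f(X,θ) = λ f1(X,µ1,σ1) + (1 − λ) f2(X,µ2,σ2)
   ll(θ) = ∑i=1,…,N ln[λ f1(Xi,µ1,σ1) + (1 − λ) f2(Xi,µ2,σ2)]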
Continuing from the previous example, suppose each observation of the variable
INCOME is drawn from one of two different normal distributions. There are five
parameters, the first two are the mean and standard error of the first normal
distribution, while the second pair of parameters corresponds to the second
distribution. The last parameter is the probability that the data are drawn from the
first distribution. Lines 12 to 17 of lesson6.4 below define the log-likelihood
function for the mixture of two normal distributions.
/*
** Lesson 6.4: Mixture of Two Normal Distributions
** See Greene (1999), Chapter 4
*/
1 use gpe2;
2 output file=output6.4 reset;
3 load data[21,2]=gpe\yed20.txt;
4 x=data[2:21,1]/10; @ income data: scaling may help @
5 call reset;
6 _nlopt=1;
7 _method=5;
8 _iter=100;
9 _b={3,3,2,2,0.5};
10 call estimate(&llf,x);
11 end;
/*
mixture of two normal distributions
mu1=b[1], mu2=b[2]
se1=b[3], se2=b[4]
prob.(drawn from the 1st distribution)=b[5]
*/
12 proc llf(x,b);
13 local pdf1,pdf2;
14 pdf1=pdfn((x-b[1])/b[3])/b[3];
15 pdf2=pdfn((x-b[2])/b[4])/b[4];
16 retp(sumc(ln(b[5]*pdf1+(1-b[5])*pdf2)));
17 endp;
Initial Result:
Function Value = -44.174
Final Result:
Iterations = 11 Evaluations = 406
Function Value = -38.309
Parameters = 2.0495 5.7942 0.81222 2.2139 0.71204
Gradient Vector = 1.1441e-005 0.00000 2.1870e-005 5.1351e-006 -4.7899e-005
Asymptotic Asymptotic
Parameter Std. Error t-Ratio
X1 2.0495 0.24421 8.3922
X2 5.7942 1.4988 3.8659
X3 0.81222 0.19424 4.1816
X4 2.2139 0.90617 2.4431
X5 0.71204 0.14456 4.9254
With the maximum log-likelihood function value of -38.309, describing the variable
INCOME as drawn from a mixture of two different normal probability distributions
is as convincing as describing it with a single non-normal distribution (log-normal
or gamma), as demonstrated in Lesson 6.3.
As with the linear regression model, values of several output control variables are
available after nonlinear least squares or maximum likelihood estimation:
Appendix A, GPE Control Variables, lists and explains the usage of these input and
output control variables.
where ε is the error term and β’s are the unknown parameters. The data matrix X =
(L, K, Q) is available in the text file judge.txt. The method of least squares
estimation is to find the vector β = (β1,β2,β3,β4) so that the sum-of-squared errors
S(β) = ε'ε is minimized.
/*
** Lesson 6.5: Minimizing Sum-of-Squares Function
** Estimating a CES Production Function
** See Judge, et al. (1988), Chapter 12
*/
1 use gpe2;
2 output file=output6.5 reset;
3 load x[30,3]=gpe\judge.txt;
4 call reset;
5 _nlopt=0;
6 _method=5;
7 _iter=100;
8 _tol=1.0e-5;
9 _vcov=1;
10 _b={1.0,0.5,-1.0,-1.0};
11 call estimate(&cessse,x);
12 end;
/* Objective Function */
13 proc cessse(data,b); @ sum-of-squares function @
14 local l,k,q,e;
15 l=data[.,1];
16 k=data[.,2];
17 q=data[.,3];
18 e=ln(q)-b[1]-b[4]*ln(b[2]*l^b[3]+(1-b[2])*k^b[3]);
19 retp(sumc(e^2));
20 endp;
We keep the definition of the objective function cessse outside the main program
(that is, beyond the end statement). There is no strict rule dictating where to place
the functions you define. Putting the function or procedure outside of the main
program makes it accessible to other procedures you write for other purposes.
The final solution is found after 36 iterations. To save space, we report only the final
result of the iterations. The output file output6.5 contains the details of all the
iterations for reference.
In your program, setting _print=0 will suppress the printing of iteration outputs
to the file and the screen.
Initial Result:
Function Value = 37.097
Parameters = 1.0000 0.50000 -1.0000 -1.0000
Final Result:
Iterations = 36 Evaluations = 1012
Function Value = 1.7611
Parameters = 0.12449 0.33668 -3.0109 -0.33631
Gradient Vector = 2.6755e-006 4.6166e-007 2.5664e-006 1.7166e-006
Hessian Matrix =
60.000 -5.7563 35.531 295.65
-5.7563 19.377 -3.4569 -23.595
35.531 -3.4569 35.461 298.10
295.65 -23.595 298.10 2509.4
Asymptotic
Parameter Std. Error t-Ratio
X1 0.12449 0.074644 1.6678
X2 0.33668 0.10809 3.1147
X3 -3.0109 2.2904 -1.3145
X4 -0.33631 0.26823 -1.2538
Both the gradient and the hessian at the solution confirm that it indeed minimizes
the sum-of-squares objective function, at the value 1.761. The estimated model is
presented as follows:
(1/√(2πσ²))^N exp(−ε(X,β)′ε(X,β)/(2σ²))
where N is the sample size, and ε(X, β) = ln(Q) - β1 - β4 ln (β2L β3 + (1-β2)K β3). The
corresponding log-likelihood function of the unknown parameters θ = (β,σ) is
written as
ll(θ) = −N/2 ln(2π) − N/2 ln(σ²) − 1/2 (ε(X,β)/σ)′(ε(X,β)/σ)
The program below follows the same basic structure as in the previous lesson. The
relevant modifications of lesson6.6 include changing the objective function in the
call estimate to cesll (line 11) and setting the variable _nlopt=1 (line 5).
The objective log-likelihood function cesll is defined from lines 13 to 21. In
addition to β, the standard error of the model σ must be estimated simultaneously.
Line 10 sets out the initial values of θ = (β,σ).
/*
** Lesson 6.6: Maximizing Log-Likelihood Function
** Estimating a CES Production Function
** See Judge, et al. (1988), Chapter 12
*/
1 use gpe2;
2 output file=output6.6 reset;
3 load x[30,3]=gpe\judge.txt;
4 call reset;
5 _nlopt=1;
6 _method=5;
7 _iter=100;
8 _tol=1.0e-5;
9 _vcov=1;
10 _b={1.0,0.5,-1.0,-1.0,1.0};
11 call estimate(&cesll,x);
12 end;
/* Objective Function */
13 proc cesll(data,b); @ log-likelihood function @
14 local l,k,q,e,n;
15 l=data[.,1];
16 k=data[.,2];
17 q=data[.,3];
18 e=ln(q)-b[1]-b[4]*ln(b[2]*l^b[3]+(1-b[2])*k^b[3]);
19 n=rows(e);
20 retp(-0.5*n*(ln(2*pi)+ln(b[5]^2))-0.5*sumc((e./b[5])^2));
21 endp;
Alternatively, the variance parameter can be substituted out using σ²(β) = ε(X,β)′ε(X,β)/N.
The advantage of using the concentrated log-likelihood function is that there is one
less parameter (that is, σ) to estimate directly.
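To see the simplification: with σ² = ε(X,β)′ε(X,β)/N, the last term of ll(θ) collapses
to −N/2 and ln(σ²) = ln(ε(X,β)′ε(X,β)) − ln(N), so the concentrated log-likelihood is

   ll*(β) = −N/2 [1 + ln(2π) − ln(N)] − N/2 ln(ε(X,β)′ε(X,β))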
Running the program lesson6.6, we obtain the following result (again, the details of
interim iterations can be found in the output file output6.6):
Initial Result:
Function Value = -46.117
Parameters = 1.0000 0.50000 -1.0000 -1.0000 1.0000
It is no surprise that the solution is identical to the one obtained from minimizing the
sum-of-squares function in Lesson 6.5. In addition, the estimated standard error of
the normal distribution is found to be 0.2423, or σ2 = 0.0587. This also confirms the
minimal sum-of-squares S(β) = Nσ2 = 1.761.
VII
Nonlinear Regression Models
Many economic and econometric problems can be formulated as optimization
(minimization or maximization) problems. In econometrics, sum-of-squares
minimization and log-likelihood maximization are standard in empirical model
estimation. In the previous chapter, we defined a scalar-valued objective function to
minimize (maximize) and interpreted the parameter estimates in accordance with the
classical least squares (maximum likelihood) model. This approach is flexible
enough to encompass many different econometric models. In many situations,
however, it becomes troublesome to write out the objective function in detail. It is
more desirable to present only the functional form which defines the model directly,
such as
F(Z,β) = ε
where Z is the data matrix, β is the parameter vector, and ε is the error term. Both Z
and β are used to define the functional form of the model (that is, the error structure).
The data matrix Z can be further decomposed as Z = [Y, X] where Y consists of
endogenous (dependent) variables and X is a list of predetermined (independent)
variables. For a classical single regression equation, Y = f(X,β) + ε or ε = Y - f(X,β).
The special case of the linear model is simply ε = Y − Xβ.
S(β) = ε'ε
∂S(b)/∂β = 2ε′(∂ε/∂β) = 0

∂²S(b)/∂β∂β′ = 2[(∂ε/∂β)′(∂ε/∂β) + ∑i=1,2,…,N εi (∂²εi/∂β∂β′)]
where s2 = e'e/N is the estimated regression variance σ2, e = F(Z,b) is the estimated
error, and N is the sample size used for estimation. It becomes clear that only the
information concerning the functional form F(Z,β) and its first and second
derivatives is needed to carry out the nonlinear least squares estimation of b, e, s2,
and Var(b).
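In particular, dropping the second-derivative term of the hessian (the familiar
Gauss-Newton approximation), the variance-covariance matrix is estimated as

   Var(b) = s² [(∂ε/∂β)′(∂ε/∂β)]^-1

evaluated at the solution b.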
The setup of input control variables is the same as in Lesson 6.5. The difference is
the use of the residual function (instead of the sum-of-squares objective function) in
calling the estimate procedure (line 12). The residual function ces is defined in
the block from line 14 to line 20.
/*
** Lesson 7.1: CES Production Function Revisited
** Judge, et al. (1988), Chapter 12
*/
1 use gpe2;
2 output file=gpe\output7.1 reset;
3 load x[30,3]=gpe\judge.txt;
4 call reset;
11 _b={1.0,0.5,-1.0,-1.0};
12 call estimate(&ces,x);
13 end;
The regression output duplicates that of Lesson 6.5, and it is available in the output
file output7.1.
(1/√(2πσ²)) exp(−F(Z,β)²/(2σ²)) |J(Z,β)|
ll(θ) = −N/2 ln(2π) − N/2 ln(σ²) − 1/2 (F(Z,β)/σ)′(F(Z,β)/σ) + ∑i=1,2,…,N ln|J(Zi,β)|
The technique of maximum likelihood estimation is to find the θ that maximizes the
log-likelihood function ll(θ). Usually the computation is performed by substituting
out the variance estimate σ2 = ε'ε/N = F(Z,β)'F(Z,β)/N. Then the following
concentrated log-likelihood function is maximized with respect to the parameter
vector β:
ll*(β) = −N/2 [1 + ln(2π) − ln(N)] − N/2 ln(F(Z,β)′F(Z,β)) + ∑i=1,2,…,N ln|J(Zi,β)|
      = −N/2 [1 + ln(2π) − ln(N)] − N/2 ln(F*(Z,β)′F*(Z,β))
where F*(Z,β) = ε* is the weighted error, with the weight being the inverse of the
geometric mean of the Jacobians (that is, 1/[(∏i=1,2,…,N |Ji|)^(1/N)]). Therefore,
maximizing the concentrated log-likelihood function ll*(β) is equivalent to
minimizing the corresponding sum-of-squared weighted errors S*(β) = ε*′ε*.
The maximum likelihood estimator b of β is obtained by solving the first-order
condition (recall that S* = ε*′ε* and ε* = F*(Z,β)):
We must also check that the hessian matrix is negative definite (the second-order
condition for maximization) at b:
∂²ll*(b)/∂β∂β′ = 1/2 (N/S*)(1/S*)(∂S*/∂β)′(∂S*/∂β) − 1/2 (N/S*) ∂²S*/∂β∂β′

At the solution b, where ∂S*/∂β = 0, this reduces to

∂²ll*(b)/∂β∂β′ = −(N/S*) 1/2 ∂²S*/∂β∂β′
             = −(N/S*)[(∂ε*/∂β)′(∂ε*/∂β) + ∑i=1,2,…,N εi* (∂²εi*/∂β∂β′)]

so that

Var(b) = [−E(∂²ll*(b)/∂β∂β′)]^-1 = s²* [(∂ε*/∂β)′(∂ε*/∂β)]^-1

where s²* = S*/N.
We now introduce the GPE input variable _jacob, which controls the use of
Jacobians in deriving the objective function (log-likelihood or sum-of-squares) from
the residuals. Notice that a Jacobian transformation is nothing but a function of data
and parameters. If you define a Jacobian function of your own, then _jacob should
be set to the location (address) of that function. An example of a Jacobian function
is given later in Lesson 7.2 on the Box-Cox variable transformation. If you do not
wish to write out the Jacobian analytically, you may set
_jacob = 1;
Then the numerical Jacobian is computed for each sample observation, which is
usually a time-consuming process. When requesting numerical Jacobians, the
first column of the data matrix used to define the residuals must be the dependent
variable Y (recall that J(Z,β) = ∂ε/∂Y and Z = [Y,X]).
Here, based on Lesson 7.1 above, we insert the following statement before calling
the estimate procedure in line 12:
_jacob = 0;
It is no surprise that the empirical results are identical for both techniques of
nonlinear least squares and maximum likelihood.
You must be sure that the first column of the data matrix data used to define the
residual function ces(data,b) corresponds to the dependent variable of the
model, ln(Q) in this case. As presented in Lesson 7.1, this rule is not followed. You
may want to rearrange the data matrix and rewrite the procedure ces(data,b) so
that you can use the numerical Jacobians, which are all ones in this case. The
estimation result should not be affected.
X(λ) = (X^λ − 1)/λ
Although the range of λ can cover the whole set of real numbers, -2 ≤ λ ≤ 2 is the
area of interest in many econometric applications. λ = 2 corresponds to a quadratic
transformation, while λ = ½ is a square-root transformation. A linear model
corresponds to λ = 1, and the logarithmic transformation is the limiting case as λ
approaches 0 (by L'Hôpital's rule, limλ→0 (X^λ − 1)/λ = ln(X)).
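The limiting case follows in one step: since d(X^λ)/dλ = X^λ ln(X),

   limλ→0 (X^λ − 1)/λ = limλ→0 X^λ ln(X) / 1 = ln(X)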
The value of the power transformation parameter λ need not be the same for
each variable in the model. In particular, the dependent variable and the independent
variables as a group may need different Box-Cox transformations. Let β = (α,λ,θ) be
the vector of unknown parameters for a regression model:
ε = F(Z,β) = Y(θ) − X(λ)α

or, equivalently,

Y(θ) = X(λ)α + ε

The log-likelihood function of the model is

ll(β) = −N/2 [ln(2π) + ln(σ²)] − ½ F(Z,β)′F(Z,β)/σ² + (θ − 1) Σi=1,2,…,N ln(|Yi|)
GAUSS PROGRAMMING FOR ECONOMETRICIANS AND FINANCIAL ANALYSTS
where Z = [Y,X] and β = (α,λ,θ). For each data observation i, the Jacobian term of the
transformation is derived as J(Yi,θ) = ∂εi/∂Yi = Yi^(θ−1). By substituting out the variance σ² =
ε′ε/N, the concentrated log-likelihood function is

ll*(β) = −N/2 [1 + ln(2π) − ln(N)] − N/2 ln(F*(Z,β)′F*(Z,β))

where F*(Z,β) is the residual F(Z,β) weighted by the inverse of the geometric mean of the
Jacobian terms.
As described in Greene’s Example 10.9, M is the real money stock M2, R is the
discount interest rate, and Y is real GNP. money.txt is the data text file consisting of
these variables. Several variations of the Box-Cox transformation may be estimated
and tested for selecting the most appropriate functional form of the money demand
equation:
∂ln(M)/∂ln(R) = (R/M)(∂M/∂R) = α1 R^λ / M^θ
Similarly, the elasticity with respect to GNP is ∂ln(M)/∂ln(Y) = α2 Y^λ / M^θ. The
elasticities at the means (of the data variables) should be reported for model
interpretation.
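As a quick check of the elasticity formulas, here is a Python sketch (illustrative only; the parameter values and sample means passed in below would come from the estimated model, and the ones shown are hypothetical placeholders, not estimates from money.txt):

```python
def interest_elasticity(a1, lam, theta, r_mean, m_mean):
    """Interest elasticity d ln(M)/d ln(R) = a1 * R^lam / M^theta,
    evaluated at the sample means of R and M."""
    return a1 * r_mean ** lam / m_mean ** theta

def income_elasticity(a2, lam, theta, y_mean, m_mean):
    """Income elasticity d ln(M)/d ln(Y) = a2 * Y^lam / M^theta,
    evaluated at the sample means of Y and M."""
    return a2 * y_mean ** lam / m_mean ** theta

# hypothetical parameter values and sample means, for illustration only
e_r = interest_elasticity(-0.02, 0.5, 0.3, 4.0, 1.5)
e_y = income_elasticity(1.2, 0.5, 0.3, 10.0, 1.5)
```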
/*
** Lesson 7.2: Box-Cox Transformation
** U.S. Money Demand Equation
** Greene (1999), Chapter 10
*/
1 use gpe2;
2 output file=gpe\output7.2 reset;
3 load x[21,4]=gpe\money.txt;
@ scale and re-arrange data: m,r,y @
4 x=(x[2:21,3]/1000)~(x[2:21,2])~(x[2:21,4]/1000);
5 call reset;
6 _method=0;
7 _iter=200;
8 _step=1;
9 _conv=1;
10 _jacob=&jf;
The residual function for the general model rf is defined in lines 21 through 28.
Notice that the residual function is written in such a way that the first column of the
data matrix is the dependent variable. The procedure jf, given in the block from
lines 16 to 20, defines the Jacobian terms for the likelihood function. Recall that the
Jacobian term is just Yθ-1 for the Box-Cox model, where Y is the left-hand side
dependent variable. As we have mentioned earlier in Lesson 7.1, the GPE input
variable _jacob controls the use of Jacobian transformation in defining the log-
likelihood and sum-of-squares functions. In the case of Box-Cox variable
transformation, the residuals are weighted by the inverse of the geometric mean of the
Jacobians, 1/[(∏i=1,2,…,N |Ji|)^(1/N)], where Ji = Yi^(θ−1). When _jacob=0, the Jacobians are
not used (vanishing log-Jacobians are assumed). When _jacob=1, the numerical
Jacobians are computed from the residual function with the assumption that the first
column of the data matrix is the dependent variable under consideration. When
_jacob is set to the location (address) of a procedure defining the analytical
Jacobians, the result of the procedure called is used. Here, in line 10:
_jacob = &jf;
the input control variable _jacob is assigned the result of the procedure
jf(data,b). Therefore, the analytical Jacobian transformation is used for log-
likelihood maximization and sum-of-squares (weighted) minimization. By defining
and applying the Jacobian transformation, as we have done here, we guarantee that
our parameters will be efficiently estimated.
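The weighting just described can be made concrete with a small Python sketch (not GPE code; the function name and arguments are ours). It forms the Jacobian terms Ji = Yi^(θ−1), divides the residuals by the geometric mean of |Ji|, and returns the weighted sum of squares:

```python
import math

def weighted_sse(resid, y, theta):
    """Sum of squared residuals weighted by the inverse geometric mean
    of the Box-Cox Jacobian terms J_i = Y_i^(theta - 1)."""
    jac = [yi ** (theta - 1.0) for yi in y]
    # geometric mean of |J_i|, computed via logs for numerical stability
    gm = math.exp(sum(math.log(abs(j)) for j in jac) / len(jac))
    return sum((e / gm) ** 2 for e in resid)
```

When θ = 1 every Ji is one and the weighted sum of squares reduces to the ordinary one, which is why an estimation with all-ones Jacobians is unaffected by the weighting.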
Two sets of starting values of the parameters may be tried: one from the linear model
estimates and the other from log model. In the program, we start with the linear
model estimates (lines 11 and 12). The alternative is to start with the log model
estimates as given in the comment block immediately below line 12. Just to make
sure that you achieve the same solution starting from several different initial values
of the parameters, run program lesson7.2, and check the following result:
Initial Result:
Sum of Squares = 7.0963
Log Likelihood = -18.017
Parameters = -3.1694 -0.014921 1.5881 1.0000 1.0000
Final Result:
Iterations = 165 Evaluations = 29500
Sum of Squares = 0.11766
Log Likelihood = 22.978
Gradient of Log Likelihood = -8.2374e-005 -3.0382e-005 -2.8533e-005 -0.00032734 -6.0704e-005
Asymptotic
Parameter Std. Error t-Ratio
X1 -14.503 12.127 -1.1959
X2 -14.067 50.134 -0.28058
X3 56.399 106.75 0.52831
The same model may be estimated with the technique of weighted least squares. As
long as the Jacobian is used, the sum-of-squares function is derived from the
residuals, weighted by the inverse of the geometric mean of the Jacobians. Just
replace line 13 with:
_nlopt=0;
This should produce the same result as that from the maximum likelihood
estimation. However, if we attempt to minimize the sum-of-squared unweighted
residuals, then the estimation result will not be efficient. It can even be biased.
Check it out by deleting line 10 or changing it to:
_jacob=0;
The program of Lesson 7.2 is readily modifiable to accommodate all of the special
cases of Box-Cox transformations. For example, for the case θ = λ, let’s define the
residual function rf1 as follows:
proc rf1(data,b);
local r,m,y,e;
@ box-cox transformation @
m=(data[.,1]^b[4]-1)/b[4];
r=(data[.,2]^b[4]-1)/b[4];
y=(data[.,3]^b[4]-1)/b[4];
e=m-b[1]-b[2]*r-b[3]*y;
retp(e);
endp;
To run this special case, modify the starting values for the parameters (note that the
number of parameters is changed as well) and call estimate with &rf1. That is,
the lines from 12 to 14 should read like this:
_b=b|1.0;
_nlopt=1;
call estimate(&rf1,x);
Other cases such as linear or log transformation on one side of the equation can be
estimated as long as the respective residual function is defined and used correctly.
You may have to experiment with different combinations of optimization options
and starting values to find all the solutions. Also, the last two cases of linear and log
models may be more conveniently estimated with linear least squares. We leave the
remainder of these special cases to you as exercises.
From Lesson 7.2, we have estimated the general Box-Cox transformation model,
together with many special cases. It is useful to tabulate and compare the estimation
results of all these models. Based on the sample data series in money.txt, what is the
most appropriate functional form for the U.S. money demand equation? To answer
this question, some knowledge of statistical inference will be necessary.
Var(b) = [−E(∂²ll(b)/∂β′∂β)]^(−1)
where s2 = S(b)/N is the estimated asymptotic variance of the model. Note that S(b)
= e'e and e = F(Z,b).
Suppose the hypothesis imposes the restrictions

c(β) = 0
If there are J active parameter restrictions, let the restricted parameter estimator and
its variance-covariance matrix be b* and Var(b*), respectively. For example, the
simplest case of a linear restriction c(β) = β - β0 (possibly a vector) confines the
parameter vector β to be near β0. The following three tests are useful for inference
about the model restrictions.
Wald Test
Without estimating the constrained model, the unconstrained parameter estimator b
is expected to satisfy the constraint equation closely, if the hypothesis is true. That is,
c(b) ≈ 0. The Wald test statistic

W = c(b)′ {Var[c(b)]}^(−1) c(b)

is asymptotically distributed as Chi-square with J degrees of freedom. Estimating
Var[c(b)] from the first derivatives of the constraints, the statistic is computed as

W = c(b)′ {[∂c(b)/∂β] Var(b) [∂c(b)/∂β]′}^(−1) c(b)
Note that this test statistic does not require the computation of the constrained
parameters.
Lagrangian Multiplier Test
The Lagrangian multiplier test is based on the score vector ∂ll(b*)/∂β of the original
parameterization of the log-likelihood function. If the constraints hold, then
∂ll(b*)/∂β should be close to ∂ll(b)/∂β evaluated at the unconstrained parameter estimator b,
which is of course zero. The Lagrangian multiplier test statistic is written as:
LM = [e*′(∂e*/∂β)] [(∂e*/∂β)′(∂e*/∂β)]^(−1) [(∂e*/∂β)′e*] / (e*′e*/N)

where e* denotes the residuals evaluated at the constrained parameter estimates b*.
Likelihood Ratio Test
If both the constrained and unconstrained models are estimated, the Likelihood Ratio
test statistic is

LR = −2(ll(b*) − ll(b))

Expressed in terms of sum-of-squares, the equivalent statistic is

LR = N ln(S(b*)/S(b))
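Given the two log-likelihood values, the LR statistic and its Chi-square comparison take only a few lines. The following Python sketch (illustrative; the lessons compute these quantities in GAUSS) uses the closed-form Chi-square CDF for 1 degree of freedom:

```python
import math

def lr_statistic(ll_restricted, ll_unrestricted):
    """Likelihood ratio statistic LR = -2(ll(b*) - ll(b))."""
    return -2.0 * (ll_restricted - ll_unrestricted)

def chi2_cdf_1df(x):
    """CDF of a Chi-square variate with 1 degree of freedom:
    P(X <= x) = erf(sqrt(x/2))."""
    return math.erf(math.sqrt(x / 2.0))
```

For a single restriction, the null hypothesis is rejected at the 5 percent level when the statistic exceeds 3.84, since chi2_cdf_1df(3.84) is about 0.95.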
Returning to the CES production function of Lesson 7.1, let's verify the nonlinear
equality constraint β4 = 1/β3. The following program
implements the Wald, Lagrangian multiplier, and Likelihood Ratio tests, based on
constrained and unconstrained maximum likelihood estimates of the parameters. The
unconstrained model is the same as in Lesson 7.1 except that we are working with
maximum likelihood estimation (instead of sum-of-squares). The constrained
residual function rfc is defined in lines 38 through 45 in which the constraint β4 =
1/β3 is substituted into the function, eliminating the parameter β4. The single
constraint, expressed as β4β3 – 1 = 0, is given in lines 46 through 48 and is named as
the eqc procedure. In lines 11 and 12, the constrained model is estimated, and the
estimated parameters and log-likelihood function value are saved for later use.
/*
** Lesson 7.3: Hypothesis Testing for Nonlinear Models
** CES Production Function: b[4]=1/b[3]
** Judge, et al. (1988), Chapter 12
*/
1 use gpe2;
2 output file=gpe\output7.3 reset;
3 load x[30,3]=gpe\judge.txt;
4 call reset;
5 _nlopt=1; @ MAXLIK: log-likelihood maximization @
6 _method=5;
7 _iter=100;
8 _tol=1.0e-5;
9 _conv=1;
10 _jacob=0; @ vanishing log-jacobians @
/* Wald Test */
@ based on unconstrained estimation @
19 _b={1.0,0.25,-1.0,-1.0};
20 call estimate(&rf,x);
21 b2=__b; @ estimated parameters @
22 vb2=__vb; @ estimated var-cov. of parameters @
23 ll2=__ll; @ log-likelihood @
24 w=eqc(b2)'*invpd(gradp(&eqc,b2)*vb2*gradp(&eqc,b2)')*eqc(b2);
Here is the estimation result of the constrained model from line 12 (see also
output7.3 for more details):
Initial Result:
Sum of Squares = 37.097
Log Likelihood = -45.753
Parameters = 1.0000 0.50000 -1.0000
Final Result:
Iterations = 11 Evaluations = 7050
Sum of Squares = 1.7659
Log Likelihood = -0.080162
Gradient of Log Likelihood = -6.2600e-006 3.4304e-006 -5.5635e-007
Asymptotic
Parameter Std. Error t-Ratio
X1 0.11849 0.070742 1.6749
X2 0.32238 0.10324 3.1225
X3 -3.4403 1.7791 -1.9338
We now return to the program of Lesson 7.3. To compute the Lagrangian multiplier
test statistic, the estimated errors are recalculated from the residual function rf (line
15). In addition, the variance (line 16) and the derivatives (line 17) of estimated
errors are needed for implementing the LM formula in line 18. Note that the gradient
computation of line 17 is for a function with two arguments, the data matrix as the
first and parameter vector as the second. The procedure gradp2 is built into GPE
with the consideration that user-defined functions are constructed from the
combination of a data matrix and a parameter vector. It serves the same purpose as
the GAUSS built-in procedure gradp to compute the gradient vector of a
continuous differentiable function with respect to the parameters. The result of
gradp2(&rf,x,b1) in line 17 is a 30-by-4 matrix of derivatives of the residual
function rf with respect to the 4 parameters of b1, over a sample of 30 observations
in x.
The Wald test is based on the unconstrained model. The unrestricted regression
model is the same as reported in Lesson 7.1. Based on the maximum likelihood
estimation using the unconstrained residual function rf (lines 30-37), the Wald test
statistic is computed from the constraint function eqc (and its first derivatives
gradp(&eqc,b2)) evaluated at the estimated parameter b2 (lines 19-24). Finally,
log-likelihood function values of both constrained and unconstrained estimations are
used to compute the Likelihood Ratio test statistic in line 25. The following output
summarizes the result:
All three test statistics are small and close to 0. Compared with the critical value of
the Chi-square distribution with 1 degree of freedom, we conclude that the restriction
β4 = 1/β3 should not be rejected. Therefore the CES production function should be
represented as follows:
The obvious conclusion is that the linear equation of the log model (θ, λ → 0) is the
choice for this data set. The model is linear in the parameters and can be
estimated more accurately using linear least squares. The log model is the limiting
case of the Box-Cox transformation, and the estimates obtained from linear regression
are close to those of the nonlinear method. The above calculation is based on the
following estimation result, which you should be able to duplicate:
VIII
Discrete and Limited Dependent Variables
There are many situations in which the dependent variable of a regression equation is
discrete or limited (truncated) rather than continuous. As we have seen in the
discussion of dummy variables in Chapter IV, some or all of the explanatory
variables in a regression model are qualitative in nature, and therefore only take on a
limited number of values. In the case of dummy variables, those values are 0 and 1.
In this chapter we will consider only the simplest form of qualitative choice models:
binary choice and tobit (censored regression) models. The binary choice (or the “yes
or no” decision) will take on one of two discrete values, 1 or 0. The censored
regression model allows for the dependent variable to follow a mix of discrete and
continuous distributions. Here we learn how to implement and estimate the binary
choice and tobit limited dependent variable models as applications of nonlinear
regression.
Yi = 1 with probability Pi
Yi = 0 with probability 1 − Pi
∂E(Yi|Xi)/∂Xi = [∂F(Xiβ)/∂(Xiβ)] β = f(Xiβ)β
The first-order condition for maximizing the log-likelihood function is

∂ll(β)/∂β = Σi=1,2,…,N [Yi/Fi − (1 − Yi)/(1 − Fi)] fi Xi
= Σi=1,2,…,N [(Yi − Fi)/(Fi(1 − Fi))] fi Xi = 0
The range of Var(εi) is between 0 and 0.25. Furthermore, since E(Yi|Xi) = F(Xiβ) =
Xiβ, a linear function, there is no guarantee that the estimated probability Pi or 1-Pi
will lie within the unit interval. We can get around the problem of Pi taking values
outside the unit interval by considering a specific probability distribution or
functional transformation for Pi. A commonly used probability distribution is the
normal distribution giving rise to the probit model, while a commonly used
functional transformation is the logistic curve function giving rise to the logit model.
Probit Model
Let Pi = F(Xiβ) = ∫−∞^(Xiβ) (2π)^(−1/2) exp(−z²/2) dz. Then we call Pi (based on the
cumulative normal distribution) the probit for the i-th observation. The model Yi =
F⁻¹(Pi) + εi is called the probit model, where F⁻¹(Pi) = Xiβ is the inverse of the
cumulative normal distribution F(Xiβ). For those concerned that we have chosen the
above specification seemingly out of thin air, note that the probit model can be derived
from a model involving a continuous, unobserved, or latent, variable Yi* such that Yi* =
Xiβ + εi, where εi follows a standard normal density.6 Suppose the value of the
observed binary variable Yi depends on the sign of Yi* as follows:

Yi = 1 if Yi* > 0
Yi = 0 if Yi* ≤ 0
6
If εi is a normal random variable with zero mean and standard error σ, then the probability of
Yi = 1 is written as Pi = F(Xiβ/σ). Since β/σ appears in the density function as a ratio, they are
not separately identified. Therefore, it is convenient to normalize σ to be one. The standard
normal distribution is sometimes referred to as the z-distribution, where the random variable
is zi = εi/σ = εi, given σ =1.
Therefore,
Σi=1,2,…,N [(Yi − Fi)/(Fi(1 − Fi))] fi Xi = 0
where fi and Fi are, respectively, the probability density and cumulative density
functions of a standard normal random variable evaluated at Xiβ. That is,
Fi = F(Xiβ) = ∫−∞^(Xiβ) 1/√(2π) exp(−z²/2) dz

and

fi = ∂F(Xiβ)/∂(Xiβ) = 1/√(2π) exp(−(Xiβ)²/2)
Furthermore, it can be shown that for the maximum likelihood estimates of β the
expected value of the (negative definite) hessian is
E[∂²ll(β)/∂β∂β′] = −Σi=1,2,…,N fi² Xi′Xi / [Fi(1 − Fi)]
The marginal effect of the j-th explanatory variable Xij is

∂E(Yi|Xi)/∂Xij = [∂F(Xiβ)/∂(Xiβ)] βj = f(Xiβ)βj = fi βj
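The probability and the marginal effects can be computed with the standard normal density and CDF. Here is a Python sketch of these building blocks (illustrative only; in the lessons the GAUSS function cdfn plays the role of the normal CDF):

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution F(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density f(z) = exp(-z^2/2)/sqrt(2*pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_marginal_effect(xb, beta_j):
    """Slope of E(Y|X) with respect to X_j: f(X*beta) * beta_j."""
    return norm_pdf(xb) * beta_j
```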
The data file grade.txt is used. The following program estimates the probit model
specification of the above equation. The log-likelihood function for each data
observation is defined in lines 20 through 26 with the procedure named probitf.
Since the first derivatives of the log-likelihood function are rather straightforward
analytically, we also write the procedure probitf1 to calculate these derivatives
(lines 27 to 34). The analytical first derivatives may be used in optimization to
improve the accuracy of the solution (see Chapter VI).
If the residual function is defined and called when estimating a nonlinear regression
model, the GPE default objective function is the log-likelihood function for
maximization and sum-of-squares for minimization. Whether the problem is
maximization or minimization is controlled by the value of the input variable
_nlopt. Setting _nlopt=1 specifies a maximization problem. Setting _nlopt=0
indicates a minimization problem. There are certain cases of maximum likelihood
estimation in which the log-likelihood function may be defined instead of the
residual function for each sample observation. All the qualitative choice models
discussed in this chapter fall in this second category. Setting _nlopt=2 (see line
11) informs GPE that the maximization is performed on the sum of the component
(log-likelihood) functions. We could write the total log-likelihood function to
estimate the model, but there may be a loss of numerical accuracy due to
compounding rounding errors in evaluating the function and its derivatives.
/*
** Lesson 8.1: Probit Model of Economic Education
** Greene (1999), Example 19.1
** See also Spector and Mazzeo (1980)
*/
1 use gpe2;
2 output file=gpe\output8.1 reset;
3 n=33;
4 load data[n,4]=gpe\grade.txt;
5 gpa=data[2:n,1];
6 tuce=data[2:n,2];
7 psi=data[2:n,3];
8 grade=data[2:n,4];
9 z=gpa~tuce~psi~ones(rows(grade),1);
10 call reset;
19 end;
Running the program, we obtain the estimated probit model as follows (see also
output8.1 for the detailed results of interim iterations):
Initial Result:
Log Likelihood = -62.697
Parameters = 0.50000 0.00000 0.50000 0.00000
Final Result:
Iterations = 5 Evaluations = 3808
Log Likelihood = -12.819
Parameters = 1.6258 0.051729 1.4263 -7.4523
Gradient Vector = -2.5489e-005 -0.00031045 -3.6749e-006 -5.7386e-006
Asymptotic Asymptotic
Parameter Std. Error t-Ratio
X1 1.6258 0.69373 2.3436
X2 0.051729 0.083925 0.61637
X3 1.4263 0.59520 2.3964
X4 -7.4523 2.5467 -2.9263
Recall that, given the estimated parameters, we are mostly concerned with the
probability or the conditional expected value E(Y|X). With respect to the three
explanatory variables, GPA, TUCE, and PSI, the slopes (or marginal effects) are of
interest. Lines 16 through 18 of lesson8.1 calculate and report the probability and the
relevant marginal effects. The next part of the output shows these results for each
observation:
Probability Slopes
If you do not want to see the long list of E(Y|X) (probability) and ∂E(Y|X)/∂X
(marginal effects), they may be computed at the means of the explanatory variables.
To see what happens, insert the following statement after line 16:
z=meanc(z)';
Logit Model
Let Pi = F(Xiβ) = 1/(1 + exp(−Xiβ)), where Pi as defined is the logistic curve. The
model Yi = Xiβ + εi = F⁻¹(Pi) + εi is called the logit model. We can easily derive the
logit model from the odds-ratio model, in which we assume that the log of the ratio of
the probabilities (Pi and 1 − Pi) is equal to Xiβ. That is, we assume ln[Pi/(1 − Pi)] = Xiβ.
Solving for Pi yields:

Pi = exp(Xiβ)/(1 + exp(Xiβ)) = 1/(1 + exp(−Xiβ))
For maximum likelihood estimation, the first-order condition is

Σi=1,2,…,N [(Yi − Fi)/(Fi(1 − Fi))] fi Xi = 0
where

Fi = F(Xiβ) = 1/(1 + exp(−Xiβ))

and

fi = ∂F(Xiβ)/∂(Xiβ) = exp(−Xiβ)/[1 + exp(−Xiβ)]² = Fi(1 − Fi)

Therefore, the first-order condition simplifies to

Σi=1,2,…,N (Yi − Fi) Xi = 0
The estimated variance-covariance matrix of the parameters is

Var(β) = [−E(∂²ll(β)/∂β∂β′)]^(−1)
To interpret the model, we define the marginal effect of the j-th explanatory variable
Xij as:
∂E(Yi|Xi)/∂Xij = [∂F(Xiβ)/∂(Xiβ)] βj = f(Xiβ)βj = fi βj = Fi(1 − Fi)βj
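The logit probability and its marginal effect can be sketched in Python as follows (an illustration, not GPE code):

```python
import math

def logit_prob(xb):
    """P = F(X*beta) = 1/(1 + exp(-X*beta)), the logistic curve."""
    return 1.0 / (1.0 + math.exp(-xb))

def logit_marginal_effect(xb, beta_j):
    """Slope F_i(1 - F_i) * beta_j of the logit model."""
    p = logit_prob(xb)
    return p * (1.0 - p) * beta_j
```

The log-odds identity is easy to verify: math.log(p/(1-p)) recovers X*beta for any p = logit_prob(xb).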
As you can see, the logit model is similar in construction to the probit model. Only
the choice of transformation function is different.
/*
** Lesson 8.2: Logit Model of Economic Education
** Greene (1999), Example 19.1
** See also Spector and Mazzeo (1980)
*/
1 use gpe2;
2 output file=gpe\output8.2 reset;
3 n=33;
4 load data[n,4]=gpe\grade.txt;
5 gpa=data[2:n,1];
6 tuce=data[2:n,2];
7 psi=data[2:n,3];
8 grade=data[2:n,4];
9 z=gpa~tuce~psi~ones(rows(grade),1);
10 call reset;
15 call estimate(&logitf,grade~z);
/*
_deriv=&logitf1;
call estimate(&logitf,grade~z);
*/
@ logit model: interpretation @
16 p=1./(1+exp(-z*__b));
17 print " Probability Slopes";;
18 print p~(p.*(1-p).*__b[1:rows(__b)-1]');
19 end;
The estimated results are similar to those of the probit model. Instead of showing the
detailed output for comparison, we present the estimated probabilities and marginal
effects of the probit and logit models, evaluated at the means of three explanatory
variables:
Probit Logit
Probability 0.26581 0.25282
Marginal Effects GPA 0.53335 0.53386
TUCE 0.01697 0.01798
PSI 0.04679 0.04493
Extensions of binary choice models to the cases with more than two choices are
interesting, though the derivations are tedious. Multiple choice models include
unordered (independent or nested) and ordered (with a preference rank) choices.
Both the probit and logit model specifications for multiple choice are possible, but
they are beyond the scope of the current discussion.
Yi* = Xiβ + εi
Yi = 1 if Yi* > 0
Yi = 0 if Yi* ≤ 0
Suppose, however, that Yi is censored—that is, we restrict the number (or kinds) of
values that Yi can take. As an example, consider the case in which the latent variable
is observed only when it is positive. That is,

Yi = Yi* if Yi* > 0
Yi = 0 if Yi* ≤ 0
This model is called the tobit (or Tobin’s probit) model. Define fi and Fi to be the
probability density function and cumulative density function of a standard normal
random variable evaluated at Xiβ/σ. That is,
Fi = F(Xiβ/σ) = ∫−∞^(Xiβ/σ) 1/√(2π) exp(−z²/2) dz

fi = f(Xiβ/σ) = 1/√(2π) exp(−(Xiβ/σ)²/2)
For the observations such that Yi = 0 or Yi* = Xiβ + εi ≤ 0, the likelihood function is

Prob(Yi = 0) = Prob(εi ≤ −Xiβ) = 1 − F(Xiβ/σ) = 1 − Fi

If Yi > 0, on the other hand, then the likelihood function is simply the normal density
function:

1/√(2πσ²) exp[−(Yi − Xiβ)²/(2σ²)]
Therefore the likelihood function for the tobit model is a mixture of the above
discrete and continuous distributions depending on the values taken by the dependent
variable (i.e., zero or positive):
ll = Σ{i|Yi=0} ln(1 − Fi) − ½ Σ{i|Yi>0} [ln(2π) + ln(σ²) + (Yi − Xiβ)²/σ²]
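The mixture structure of this log-likelihood is perhaps clearest in code. The following Python sketch (illustrative only; the lesson's tobitf procedure does the equivalent in GAUSS, observation by observation) accumulates ln(1 − Fi) for the censored observations and the normal log-density for the uncensored ones:

```python
import math

def tobit_loglik(y, xb, sigma2):
    """Tobit log-likelihood: ln(1 - F_i) for censored (y = 0) observations,
    the normal log-density for uncensored (y > 0) ones."""
    s = math.sqrt(sigma2)
    ll = 0.0
    for yi, xbi in zip(y, xb):
        if yi <= 0.0:
            # F_i = standard normal CDF evaluated at X_i*beta/sigma
            fi_cdf = 0.5 * (1.0 + math.erf(xbi / (s * math.sqrt(2.0))))
            ll += math.log(1.0 - fi_cdf)
        else:
            ll += -0.5 * (math.log(2.0 * math.pi) + math.log(sigma2)
                          + (yi - xbi) ** 2 / sigma2)
    return ll
```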
Then, for the maximum likelihood estimation, we solve the following first-order
conditions:

∂ll/∂β = −(1/σ) Σ{i|Yi=0} fi Xi/(1 − Fi) + (1/σ²) Σ{i|Yi>0} (Yi − Xiβ) Xi = 0

∂ll/∂σ² = ½(1/σ³) Σ{i|Yi=0} fi Xiβ/(1 − Fi) − ½(1/σ²) Σ{i|Yi>0} [1 − (Yi − Xiβ)²/σ²] = 0
To interpret the estimated coefficients of the model, we may use three conditional
expected values:

E(Yi*|Xi) = Xiβ
E(Yi|Xi, Yi > 0) = Xiβ + σ fi/Fi
E(Yi|Xi) = Fi E(Yi|Xi, Yi > 0) = Fi Xiβ + σ fi

The first expected value (corresponding to the “uncensored” case) is easy to obtain.
The last expected value will be of particular interest if our sample contains many
censored observations. Accordingly, for the j-th explanatory variable, the
corresponding marginal effects are:

∂E(Yi*|Xi)/∂Xij = βj
∂E(Yi|Xi, Yi > 0)/∂Xij = βj [1 − (Xiβ/σ)(fi/Fi) − (fi/Fi)²]
∂E(Yi|Xi)/∂Xij = Fi βj
We note that the last censored marginal effect differs from the first uncensored one
by a scale factor equal to the probability of that observation not being censored. In
other words, the scale factor is equal to Fi (recall that Fi is 1-Prob(Yi = 0)).
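These standard tobit expectations and the Fi scale factor can be sketched in Python (illustrative only; the values of Xiβ and σ passed in would come from the estimated model):

```python
import math

def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_expectations(xb, sigma):
    """Return (E(Y*|X), E(Y|X,Y>0), E(Y|X)) for the standard tobit model."""
    big_f = _cdf(xb / sigma)
    small_f = _pdf(xb / sigma)
    ey_star = xb                           # uncensored expectation
    ey_pos = xb + sigma * small_f / big_f  # expectation conditional on Y > 0
    ey = big_f * ey_pos                    # censored expectation
    return ey_star, ey_pos, ey

def censored_marginal_effect(xb, sigma, beta_j):
    """dE(Y|X)/dX_j = F(X*beta/sigma) * beta_j: the uncensored slope
    scaled by the probability of not being censored."""
    return _cdf(xb / sigma) * beta_j
```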
The tobit model is often estimated for comparison with the alternative probit or count
model specifications. The model can be easily extended to consider more than one
censoring point. For instance, we could censor both tails of the distribution. This is
an example of a doubly censored regression.
Y Number of affairs in the past year: 0, 1, 2, 3, 4-10 (coded as 7), 11-365 (coded
as 12).
The preponderance of zeros (no affairs) may mean that the tobit model is not the best
choice for this study. The complete data set used in the article is available in the text
file fair.txt, but we present only a restricted model using a subset of five explanatory
variables as follows:
Z2 Age.
Z3 Number of years married.
Z5 Degree of religiousness: 1 (anti-religious), … , 5 (very religious).
Z7 Hollingshead scale of occupation: 1, … , 7.
Z8 Self-rating of marriage satisfaction: 1 (very unhappy), … , 5 (very happy).
Y = β0 + β2 Z2 + β3 Z3 + β5 Z5 + β7 Z7 + β8 Z8 + ε
The conclusion and interpretation of the estimated model are left to the interested
reader. Our emphasis here is the implementation of tobit analysis using GPE and
GAUSS. To do so, we need to briefly explain the maximum likelihood estimation
procedure. Recall that the likelihood function of the tobit model is a mixture of
discrete and continuous normal likelihoods, depending on the censored point (zero)
of the dependent variable. Unlike in the probit model, the standard error is an explicit
unknown parameter which must be estimated together with the regression
parameters. In lines 22 to 28 of the following program, the procedure
tobitf defines the log-likelihood function for each sample observation. For
maximum likelihood estimation, we need to set _nlopt=2 (line 10), which
instructs GPE to maximize the sum of the individual log-likelihood functions.7
/*
** Lesson 8.3: Tobit Analysis of Extramarital Affairs
** Greene (1999), Example 20.12
** See also R. Fair, JPE, 86, 1978, 45-61
*/
1 use gpe2;
2 output file=gpe\output8.3 reset;
3 n=602;
4 load data[n,15]=gpe\fair.txt;
5 y=data[2:n,13];
6 z=data[2:n,5 6 8 11 12]; @ use z2, z3, z5, z7, z8 @
7 call reset;
7
Because the size of this nonlinear optimization is beyond the limits of GAUSS Light, the
professional version of GAUSS should be used for Lesson 8.3.
18 em=cdfn(z*b/s).*b';
19 print "Expected Value Marginal Effects";;
20 print ey~em[.,1:5];
21 end;
Remember that tobit is a nonlinear model. First, the uncensored model is estimated
by ordinary least squares (line 8). The estimated parameters are then used as the
initial values in the tobit model estimation (line 14). Here is the result of the
estimated tobit model, which converges after 60 iterations:
Initial Result:
Log Likelihood = -886.21
Parameters = -0.050347 0.16185 -0.47632 0.10601 -0.71224
5.6082 5.0000
Final Result:
Iterations = 60 Evaluations = 1981497
Log Likelihood = -705.58
Parameters = -0.17933 0.55414 -1.6862 0.32605 -2.2850
8.1742 8.2471
Gradient Vector = 3.0579e-005 3.0303e-006 2.2826e-006 1.4280e-006
2.3271e-006 4.5474e-007 -2.7294e-006
Asymptotic Asymptotic
Parameter Std. Error t-Ratio
X1 -0.17933 0.079136 -2.2661
X2 0.55414 0.13459 4.1174
X3 -1.6862 0.40378 -4.1761
X4 0.32605 0.25443 1.2815
X5 -2.2850 0.40787 -5.6022
X6 8.1742 2.7419 2.9812
X7 8.2471 0.55363 14.896
To save space we do not list the 601 observations of expected values and marginal
effects. One alternative is to calculate the average of the estimated expected values
and marginal effects.
A second alternative is to evaluate the expected values and marginal effects at the
means of the explanatory variables by inserting the following statement before line
16:
z=meanc(z)';
IX
Heteroscedasticity
Heteroscedasticity is a common problem with cross-sectional data, in which unequal
model variance is observed. Ordinary least squares estimation with a
heterogeneously distributed error structure leads to inefficient estimates of the
regression parameters. In order to correct for this inefficiency, the source of
heteroscedasticity in relation to one or more variables must be identified.
To illustrate how to test and correct for the problem of heteroscedasticity, the
following relationship of public school spending (SPENDING) and income
(INCOME) across 50 states in the U.S. is considered:

SPENDING = β0 + β1 INCOME + β2 INCOME² + ε
To estimate this equation, which is used for all the lessons in this chapter, a cross-
sectional data file greene.txt is used.8 It gives per capita public school expenditure
and per capita income by state in 1979. Let’s take a look at the data file greene.txt
we will be dealing with. The data file contains three columns. The first column is the
state identifier (STATE), the second column is per capita expenditure on public
schools (SPENDING), and the third column is per capita income (INCOME).
Viewing greene.txt in the Edit window reveals a problem with the data. Notice that
WI (Wisconsin) is missing a data observation. The row WI has “NA” for the
corresponding value in the SPENDING column. GAUSS sees “NA” as a character
string, not suitable for numerical computation. GAUSS has commands that convert
character strings, such as “NA,” to a symbol that it can interpret as a missing value.
The first part of each lesson in this chapter walks you through the process of
converting greene.txt with its missing values to useable data. Several new GAUSS
commands are introduced for this purpose.
8
This example was used in Greene (1997, Chapter 12), but it has been removed from the
updated fourth edition (Greene, 1999).
The heteroscedasticity-consistent estimator of the variance-covariance matrix of the
least squares estimator is

Var(β̂) = (X′X)⁻¹ X′Σ̂X (X′X)⁻¹

where X is the data matrix of regressors, β̂ is the ordinary least squares estimator of
the parameter vector β, and Σ̂ is a diagonal variance-covariance matrix (i.e., the
estimator of E(εε′)) with the elements being the squares of the estimated regression
residuals.
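For a single-regressor model the sandwich formula reduces to a scalar, which makes the computation easy to verify by hand. A Python sketch (illustrative only; GPE computes the full matrix version when _hacv=1):

```python
def white_variance(x, e):
    """Heteroscedasticity-consistent variance of the OLS slope in a
    one-regressor (no intercept) model.  The matrix formula
    (X'X)^-1 X' diag(e_i^2) X (X'X)^-1 collapses to
    sum(x_i^2 e_i^2) / (sum(x_i^2))^2."""
    sxx = sum(xi * xi for xi in x)
    meat = sum((xi * ei) ** 2 for xi, ei in zip(x, e))
    return meat / (sxx * sxx)
```

Comparing this estimate with the ordinary s²-based one reveals how much the heteroscedasticity matters.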
From two least squares estimations, one with the ordinary variance-covariance
matrix and the other with the heteroscedasticity-consistent covariance matrix, we can
directly compare the results of these regressions. In GPE, by setting the input
variable _vcov=1 (see line 11 of Lesson 9.1), the details of the variance-covariance
matrix are presented. The second regression estimation with the newly introduced
input variable _hacv=1 (see line 13 of Lesson 9.1) computes the heteroscedasticity-
consistent estimates of the variance-covariance matrix instead of the inefficient one
from the ordinary least squares.
/*
** Lesson 9.1: Heteroscedasticity-Consistent
** Variance-Covariance Matrix
*/
1 use gpe2;
2 output file = gpe\output9.1 reset;
3 load greene[52,3]= gpe\greene.txt;
4 data = greene[2:52,.];
5 data = miss(data,"NA"); @ NA to missing value @
6 data = packr(data); @ deletes row w/miss value@
7 spending = data[.,2];
8 income = data[.,3]/10000;
9 call reset;
Line 5 introduces the miss command of GAUSS. It modifies the matrix data by
comparing each element in the matrix data to the character string “NA.” If an
element in data equals “NA,” it is replaced with a dot (.), GAUSS’s symbol for a
missing value. In the next line, packr(data) deletes any rows that contain any
missing values in the matrix data. After data has been packed (line 6), the number
of rows in data is reduced to 50. Refer to the GAUSS manual or on-line help to
find more information on the commands miss and packr.
The result of the first least squares estimation with the option to print out the
estimated variance-covariance matrix (lines 10 to 12) is:
In order to compare the ordinary least squares estimates of standard errors and the
variance-covariance matrix with the heteroscedasticity-consistent variance-
covariance matrix, we look at a portion of the output from the second regression
estimation (lines 13 and 14):
INCOME 1.0000
INCOME^2 -0.99796 1.0000
CONSTANT -0.99789 0.99182 1.0000
INCOME INCOME^2 CONSTANT
The Goldfeld-Quandt test requires the data to be sorted according to the size of the
independent variable suspected to be the source of heteroscedasticity. The entire data
set is then divided into three parts, the middle group is dropped, and separate
regressions are estimated on the two remaining groups containing the smallest and
largest values. The residual sums-of-squares (RSS) from the two groups are then
compared in the Goldfeld-Quandt test statistic.
As in the previous Lesson 9.1, the data are read from greene.txt and corrected for
missing values. The Goldfeld-Quandt test requires a sorted data series in accordance
with the suspected source of heteroscedasticity. Sorting the rows in the matrix,
data, by the information in the third column (that is, the variable INCOME) is done
in line 7. INCOME is sorted from its smallest value to its largest. The GAUSS Help
menu gives greater details about the data sorting commands such as sortc used
here.
/*
** Lesson 9.2: Goldfeld-Quandt Test and
** Correction for Heteroscedasticity
*/
1 use gpe2;
2 output file = gpe\output9.2 reset;
3 load greene[52,3]= gpe\greene.txt;
4 data = greene[2:52,.];
5 data = miss(data,"NA"); @ NA to missing value @
6 data = packr(data); @ deletes row w/miss value @
7 data = sortc(data,3); @ sort data (income), in ascending order @
8 spending = data[.,2];
9 income = data[.,3]/10000;
10 call reset;
/* Goldfeld-Quandt Test */
11 _names = {"spending","income","income^2"};
12 _begin = 1;
13 _end = 17;
14 call estimate(spending,income~income^2);
15 mss1 =__rss/14; @ N-K = 17-3 = 14 @
16 _begin = 34;
17 _end = 50;
18 call estimate(spending,income~income^2);
19 mss2 =__rss/14; @ N-K = 17-3 = 14 @
Selecting the first group of the 17 smallest observations to regress for the Goldfeld-
Quandt test is done by restricting the data matrix to only include observations from 1
to 17 (lines 12 and 13). The use of the output control variable, __rss, is introduced
in line 15. __rss stores the sum-of-squared residuals from the latest regression
estimation. Each time the procedure estimate is called, the output control
variables are assigned new values. To save the value of __rss/14, the mean sum
of squared residuals, for later use in the Goldfeld-Quandt test statistic, it is
assigned to variable mss1. Similarly, lines 16, 17, and 18 select the 17 largest
observations and run the regression, assigning the resulting __rss/14 to the
variable mss2.
Since we are only interested in the RSS from the regressions, the outputs from the
first two estimations are not printed here. Instead, the result of Goldfeld-Quandt test
statistic from line 20 is given as follows:
GQ = [RSS2/(N2−K)] / [RSS1/(N1−K)]
which follows the F-distribution with N2-K and N1-K degrees of freedom, where N1
and N2 are the number of observations corresponding to the two separate samples,
and K is the number of regressor parameters, including the constant term. Since the
Goldfeld-Quandt method requires that the test statistic be greater than one, the
larger RSS (RSS2) must be placed in the numerator. The computed value of 1.94 is
smaller than the critical value of F(14,14) at the 5% level of significance (that
is, 2.40), so we cannot reject the hypothesis of homoscedasticity.
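The test statistic above is simple enough to compute by hand. The following Python sketch reproduces the arithmetic; the RSS inputs shown are illustrative values chosen so that, with N1 = N2 = 17 and K = 3, the statistic equals 1.94 (they are not the lesson's actual residual sums of squares):

```python
def goldfeld_quandt(rss1, n1, rss2, n2, k):
    """GQ = [RSS2/(N2-K)] / [RSS1/(N1-K)]; the larger RSS goes on top so
    the statistic exceeds one, compared with F(N2-K, N1-K)."""
    return (rss2 / (n2 - k)) / (rss1 / (n1 - k))

# Illustrative inputs only: 17 observations per group, K = 3 regressors.
gq = goldfeld_quandt(100.0, 17, 194.0, 17, 3)   # degrees of freedom 14 and 14
```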
tells GPE to estimate the model using 1/INCOME as the weight incorporated in the
regression estimation called on line 23. All the variables, both dependent and
independent, including the constant term, are weighted (or divided by INCOME).
The rationale is that the variable INCOME2 may be used to approximate the
heterogeneous variances of the model.
Because input variables _begin and _end had been set to non-default values
earlier in the program, calling reset again (line 21) is the simplest way to insure
that all control variables are reset to their default values for the new estimation to
come.
The regression output corrected for heteroscedasticity is available in the output file
output9.2. Compare the earlier ordinary least squares estimation with the
heteroscedasticity-consistent covariance matrix from Lesson 9.1,
and with the weighted least squares estimation using 1/INCOME as the weighting
variable,
Notice that the numbers in parentheses are the estimated standard errors of the
coefficients. The results of the two regressions are similar, but in theory, weighted
least squares estimation is more efficient.
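The weighting described above — dividing the dependent variable and every regressor, including the constant, by INCOME — can be sketched in NumPy as follows (simulated data and the helper name `weighted_ls` are assumptions for illustration):

```python
import numpy as np

def weighted_ls(X, y, w):
    """Weighted least squares: divide the dependent and all independent
    variables (including the constant column) by the weight series,
    then run ordinary least squares on the transformed data."""
    return np.linalg.lstsq(X / w[:, None], y / w, rcond=None)[0]

rng = np.random.default_rng(1)
n = 500
inc = rng.uniform(0.5, 1.5, n)
X = np.column_stack([np.ones(n), inc, inc**2])
# error variance proportional to income^2, so the weight is income itself
y = 1.0 + 2.0*inc + 3.0*inc**2 + 0.05*inc*rng.standard_normal(n)
b = weighted_ls(X, y, inc)   # roughly recovers (1, 2, 3)
```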
sorted. However, the Breusch-Pagan test does assume residual normality for accurate
results. The alternative Koenker-Bassett test is more forgiving with regard to the
normality assumption.
We note that both the Breusch-Pagan and White tests for general heteroscedasticity
do not offer information about the source and the form of heteroscedasticity. To
correct for this problem, a more specific heteroscedastic structure of the model may
be required.
/*
** Lesson 9.3: Breusch-Pagan and White Tests
** for Heteroscedasticity
*/
1 use gpe2;
2 output file = gpe\output9.3 reset;
3 load greene[52,3]= gpe\greene.txt;
4 data = greene[2:52,.];
5 data = miss(data,"NA"); @ NA to missing value @
6 data = packr(data); @ deletes row w/miss value @
7 spending = data[.,2];
8 income = data[.,3]/10000;
9 call reset;
10 _names = {"spending","income","income^2"};
11 _bjtest = 1;
12 _bptest = 1;
13 call estimate(spending,income~income^2);
14 end;
Lines 1 through 10 are similar to those of lesson9.1. Because the Breusch-Pagan
test assumes that the residuals are normally distributed, we have included the
Bera-Jarque test for residual normality on line 11 (_bjtest=1). Line 12
(_bptest=1) performs the Breusch-Pagan and White tests for general
heteroscedasticity.
Let’s examine the output now. The regression result from the first estimation (line
13) is the same as the result discussed in the previous Lesson 9.2, with additional
pieces of information: the Bera-Jarque test for normality, and the Breusch-Pagan and
White tests for heteroscedasticity:
The result of the Bera-Jarque test reveals normality in the residuals (refer to Lesson
3.6 for details on the Bera-Jarque test for residual normality). The last section of the
regression output is what we are interested in: the Breusch-Pagan and White tests for
heteroscedasticity. It is set up to test the null hypothesis of homoscedasticity. That is,
for the Breusch-Pagan test, if the computed test value is less than the critical value of
the Chi-square distribution with two degrees of freedom, we fail to reject the
hypothesis that the model error is homogeneously distributed. A similar conclusion
is obtained from the White test, which is based on the R2 statistic of the auxiliary
regression with 4 degrees of freedom. Note the low P-values for both the Breusch-
Pagan and White test statistics, leading us to reject the hypothesis of
homoscedasticity and conclude that heteroscedasticity exists in the model.
Remember the requirement of residual normality for the Breusch-Pagan test? If the
residuals are not normally distributed, we need to use a more general version of the
Breusch-Pagan test, called the Koenker-Bassett test. The closer the residuals are to
normal, the more similar these two test statistics are. If the residuals were exactly
normal, the computed values of the two tests would be identical. Since the estimated
residuals are indeed normally distributed for this example, as shown earlier, both
tests return rather close values, 18.9 and 15.8, respectively. Our conclusion of
heteroscedasticity is the same from both the Breusch-Pagan and Koenker-Bassett test
statistics.
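The relationship between the two statistics can be made concrete in a short sketch: the Breusch-Pagan LM is half the explained sum of squares from regressing e²/mean(e²) on the test regressors, while the Koenker-Bassett (studentized) variant is N·R² from regressing e² on the same regressors. The data below are simulated with strong heteroscedasticity, and the function name is hypothetical:

```python
import numpy as np

def bp_and_kb(e, z):
    """Breusch-Pagan LM (ESS/2 of the regression of e^2/mean(e^2) on z)
    and the Koenker-Bassett variant (N*R^2 of the regression of e^2 on z).
    Both are compared with Chi-square(dim(z))."""
    n = len(e)
    Z = np.column_stack([np.ones(n), z])
    g = e**2 / np.mean(e**2)                       # standardized squared residuals
    ghat = Z @ np.linalg.lstsq(Z, g, rcond=None)[0]
    bp = 0.5 * np.sum((ghat - g.mean())**2)        # ESS/2
    u = e**2
    uhat = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
    kb = n * np.sum((uhat - u.mean())**2) / np.sum((u - u.mean())**2)
    return bp, kb

rng = np.random.default_rng(2)
n = 500
z = rng.uniform(1.0, 5.0, n)
e = z * rng.standard_normal(n)     # strongly heteroscedastic residuals
bp, kb = bp_and_kb(e, z)           # both large: reject homoscedasticity
```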
Σ = σ2 Ω
Given the general form of heteroscedasticity, there are too many unknown
parameters. For practical purposes, some hypothesis of heteroscedasticity must be
assumed:
σi2 = σ2 h(Xi,α)
where σ2 > 0 and the heteroscedastic function h depends on part or all of the
regressors and a vector of parameters α. Given a specific formulation of
heteroscedasticity, hi(α) = h(Xi,α) for brevity, the log-likelihood function is written
as:
Let εi*(β,α) = εi(β) / √hi(α) and substitute out σ2 with ε*(β,α)'ε*(β,α)/N; then
the concentrated log-likelihood function is
As the variances must be explicitly estimated, σi2 = σ2 hi(α), the objective log-
likelihood function is inevitably complicated. To maximize the log-likelihood
function, the techniques of nonlinear optimization of Chapter VI are applicable.
3. ln(σi2)= ln(σ2) + α Xi
4. ln(σi2)= ln(σ2) + α ln(Xi)
The data set is given in greene.txt. This time we will find and compare the
maximum likelihood estimates based on the following hypothesis of multiplicative
heteroscedasticity:
σi2 = σ2 INCOMEiα
Lesson 9.2 has demonstrated the weighted least squares estimation for the case of α
= 2. The alternative expression of multiplicative heteroscedasticity is σi2 = σ2 exp(α INCOMEi).
/*
** Lesson 9.4: Multiplicative Heteroscedasticity
** Greene (1997), Chap. 12.5
*/
1 use gpe2;
2 output file=gpe\output9.4 reset;
3 load data[52,3]=gpe\greene.txt;
4 data=data[2:52,2]~(data[2:52,3]/10000); @ scale data @
5 data=packr(miss(data,"NA")); @ take care of missing obs @
6 b=data[.,1]/(ones(rows(data),1)~data[.,2]~(data[.,2]^2));
7 call reset;
8 _method=4;
9 _iter=100;
10 _restart=10;
11 _b=b|2.0;
12 _nlopt=1;
13 call estimate(&llf,data);
14 end;
15 proc llf(data,b);
16 local n,y,x,e,h,ll;
17 y=data[.,1]; @ public school spending @
18 x=data[.,2]; @ income @
19 n=rows(y);
20 h=x^b[4]; @ multiplicative hetero @
/*
h=exp(b[4]*x);
*/
21 e=(y-b[1]-b[2]*x-b[3]*(x^2))./sqrt(h);
22 ll=-0.5*n*(1+ln(2*pi)+ln(e'e/n))-0.5*sumc(ln(h));
23 retp(ll);
24 endp;
The first part of the program loads and scales the data, which are the same as in
previous lessons. Line 6 computes the linear model estimates as the starting values of
parameters for nonlinear maximum likelihood estimation (see line 11). The objective
log-likelihood function llf is defined in lines 15 through 23. The specific form of
multiplicative heteroscedasticity is given in line 20. Since the estimation has
experienced some difficulty in improving the function value in its final iterations, we
set _restart=10 in line 10 to restart the iteration in case of failure. If you have
trouble understanding what this program is doing, a review of Chapter VI and the
program lessons on nonlinear models there is recommended.
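For readers following along in another language, the concentrated log-likelihood coded in proc llf (lines 15 through 24) can be mirrored in Python. This is a sketch only: the data here are simulated with α = 2 rather than loaded from greene.txt, and the evaluation at two parameter vectors simply illustrates that the likelihood is higher near the true parameters:

```python
import numpy as np

def llf(b, y, x):
    """Concentrated log-likelihood with multiplicative heteroscedasticity
    h_i = x_i**b[3]; mirrors the GAUSS proc llf of Lesson 9.4."""
    n = len(y)
    h = x**b[3]
    e = (y - b[0] - b[1]*x - b[2]*x**2) / np.sqrt(h)
    return (-0.5*n*(1 + np.log(2*np.pi) + np.log(e @ e / n))
            - 0.5*np.sum(np.log(h)))

rng = np.random.default_rng(3)
x = rng.uniform(0.5, 1.5, 200)
y = 1.0 + 2.0*x + 3.0*x**2 + 0.1*x*rng.standard_normal(200)   # alpha = 2
good = llf(np.array([1.0, 2.0, 3.0, 2.0]), y, x)
bad = llf(np.array([10.0, 2.0, 3.0, 2.0]), y, x)   # far-off intercept
```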
Initial Result:
Function Value = -268.70
Parameters = 832.91 -1834.2 1587.0 2.0000
Final Result:
Iterations = 5 Evaluations = 126
Function Value = -268.09
Parameters = 560.69 -1124.1 1132.4 3.2985
Gradient Vector = 2.0276e-008 0.00000 0.00000 1.7233e-006
Hessian Matrix =
-0.020623 -0.014837 -0.010848 -0.00024046
-0.014837 -0.010848 -0.0080655 9.0987e-005
-0.010848 -0.0080655 -0.0061037 -0.00013901
-0.00024046 9.0987e-005 -0.00013901 -0.58515
Asymptotic
Parameter Std. Error t-Ratio
X1 560.69 354.11 1.5834
X2 -1124.1 943.28 -1.1917
X3 1132.4 621.04 1.8233
X4 3.2985 1.3790 2.3920
σi2 = σ2 INCOMEi3.3
which is currently ignored by GAUSS within the comment notations (/* */). Make
the change and run the modified program. The resulting regression equation looks
like this,
Refer to the output file generated from the program and verify the above regression
results. To summarize the discussion of heteroscedastic regression models, we put
together and compare the estimation results of the public school spending-income
relationship:
The numbers in parentheses are the estimated standard errors of the parameters. To
recap the essence of each model: (1) Lesson 9.1 is an ordinary least squares with
heteroscedasticity-consistent variance-covariance matrix; (2) Lesson 9.2 is a
weighted least squares using 1/INCOME as the weight (i.e., σi2 = σ2 INCOMEi2); (3)
and (4) are the maximum likelihood estimators with multiplicative heteroscedasticity
(i.e., σi2 = σ2 INCOMEiα and σi2 = σ2 exp(α INCOMEi), respectively) as presented
in Lesson 9.4.
X
Autocorrelation
Autocorrelation is a problem most likely associated with time series data. It concerns
the relationship between previous and current error terms in a regression model. In
the simplest case, the serial correlation is of first order where the correlation between
current and immediate previous errors is nonzero. A more complicated error
structure can include autoregressive (AR) and moving average (MA) terms.
OLS (ordinary least squares) estimation with an autocorrelated error structure
results in a loss of efficiency, and the conventional estimates of the standard
errors are biased. Therefore, statistical inference using t and F test statistics
cannot be trusted.
In this chapter, we revisit the multiple regression model of U.S. production function,
using the labor (L), capital (K), and output (X) data series of cjx.txt:
Lesson 10.1 below demonstrates the use of the input control variable _hacv to
obtain a consistent estimator of the variance-covariance matrix, when ordinary least
squares is used. Several tests for the existence of autocorrelation are given in Lesson
10.2. Correction methods for first-order autocorrelation to improve the efficiency of
parameter estimates are presented in Lessons 10.3 and 10.4. Since a more
complicated structure of autocorrelation may be identified, the estimation of
autoregressive and moving average error structures is considered in the last three
lessons. Lesson 10.5 is a model with higher-order autocorrelation. The technique of
maximum likelihood is introduced in Lesson 10.6, while Lesson 10.7 covers the
nonlinear method.
Var(β̂) = (X′X)-1 X′Σ̂X (X′X)-1
where X is the data matrix of regressors, β̂ is the ordinary least squares estimator of
the parameter vector β , and Σ̂ is the Newey-West covariance estimator for
autocorrelated and possibly heterogeneous disturbances. Refer back to Lesson 9.1 for
more details on the consistent covariance matrix in the context of heteroscedasticity.
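A NumPy sketch of the Newey-West estimator with Bartlett kernel weights follows (the function name and simulated data are assumptions; GPE computes this internally via _hacv):

```python
import numpy as np

def newey_west_cov(X, e, lags):
    """Newey-West HAC covariance: (X'X)^-1 S (X'X)^-1, where S sums the
    outer products of X_t*e_t plus Bartlett-weighted autocovariances
    up to the given lag order."""
    Xe = X * e[:, None]
    S = Xe.T @ Xe
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)        # Bartlett weight
        G = Xe[l:].T @ Xe[:-l]
        S += w * (G + G.T)
    bread = np.linalg.inv(X.T @ X)
    return bread @ S @ bread

rng = np.random.default_rng(4)
n = 300
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
e = rng.standard_normal(n)
V = newey_west_cov(X, e, 4)    # analog of _hacv = {0,4} below
```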
/*
** Lesson 10.1: Heteroscedasticity Autocorrelation
** Consistent Variance-Covariance Matrix
*/
1 use gpe2;
2 output file = gpe\output10.1 reset;
3 load data[40,6]= gpe\cjx.txt;
4 year = data[2:40,1];
5 X = ln(data[2:40,2]);
6 L = ln(data[2:40,3]);
7 K = ln(data[2:40,5]);
8 call reset;
Recall that by setting the GPE input variable _vcov = 1 (see line 10), the
estimated variance-covariance matrix is presented. Instead of using the inefficient
variance-covariance matrix from ordinary least squares, computation of the
consistent covariance matrix is controlled by the input variable _hacv. _hacv is
either a scalar or a two-element vector. The first element of _hacv is reserved for
heteroscedasticity correction as shown earlier in Lesson 9.1, while the second
element is the order of autocorrelation to be considered for the estimator of an
autocorrelation-consistent variance-covariance matrix. Therefore, line 12 of the
program:
_hacv = {0,4};
We now analyze the output of three regression estimations. The first least squares
estimation with the option to print out the estimated variance-covariance matrix
(lines 10 to 11) is as follows:
Since autocorrelation is suspected for most time series data, the second regression
estimation (lines 12 and 13) is carried out with the fourth-order autocorrelation-
consistent standard errors and the variance-covariance matrix:
Putting together the estimated regression equations with the three sets of estimated
standard errors, we have
In general, the consistent estimators of the covariance matrix are larger than their
ordinary least squares counterparts. The consequence is a higher probability of type
II error (incorrectly accepting the null hypothesis) for the estimators. In this example,
all three estimated variance-covariance matrices of the coefficients are quite similar
in spite of the consistency correction for autocorrelation and heteroscedasticity.
Detection of Autocorrelation
Given each observation of a linear regression model
Yi = Xi β + εi
the linear form of autocorrelated errors is written as:
The popular Durbin-Watson test statistic is a part of the residual statistics output
reported with the regression estimation. That is, it is available with the statement:
_rstat = 1;
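The Durbin-Watson statistic itself is easy to compute from the residual series: DW = Σ(et − et-1)² / Σet², which is approximately 2(1 − ρ). A NumPy sketch on simulated residuals (function name hypothetical):

```python
import numpy as np

def durbin_watson(e):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, roughly 2(1 - rho)."""
    return np.sum(np.diff(e)**2) / np.sum(e**2)

rng = np.random.default_rng(5)
white = rng.standard_normal(2000)          # no autocorrelation: DW near 2
ar1 = np.empty(2000)
ar1[0] = white[0]
for t in range(1, 2000):
    ar1[t] = 0.8 * ar1[t-1] + white[t]     # rho = 0.8: DW well below 2
```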
/*
** Lesson 10.2: Tests for Autocorrelation
*/
1 use gpe2;
2 output file = gpe\output10.2 reset;
3 load data[40,6]= gpe\cjx.txt;
4 year = data[2:40,1];
5 X = ln(data[2:40,2]);
6 L = ln(data[2:40,3]);
7 K = ln(data[2:40,5]);
8 call reset;
9 _names = {"X","L","K"};
10 _rstat = 1;
11 _rplot = 2;
12 _bgtest = 4; @ Breusch-Godfrey, 4th order @
13 _acf = 12; @ auto & partial auto,12 lags @
14 call estimate(X,L~K);
15 end;
Looking at the program, the use of the input control variables _bgtest (line 12) and
_acf (line 13) is new. You may set the value of _bgtest to any order of
autocorrelation for the Breusch-Godfrey LM test; in this case, up to 4 orders are
tested. Similarly, for calculating and plotting autocorrelation and partial
autocorrelation functions, the number of lags must be given to the variable _acf. In
this example, 12 lags seem sufficient for a data size of about 40 observations.
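The Breusch-Godfrey LM statistic that _bgtest reports can be sketched in NumPy: regress the residuals on the original regressors plus p lagged residuals, and compute N·R² from that auxiliary regression (the function name and simulated data are illustrative, not GPE's internal code):

```python
import numpy as np

def breusch_godfrey(e, X, p):
    """LM = N*R^2 from the auxiliary regression of the residuals on the
    original regressors X and p lagged residuals (missing initial lags
    padded with zeros); compare with Chi-square(p)."""
    n = len(e)
    lagged = np.column_stack(
        [np.concatenate([np.zeros(l), e[:-l]]) for l in range(1, p + 1)])
    Z = np.column_stack([X, lagged])
    ehat = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]
    r2 = 1.0 - np.sum((e - ehat)**2) / np.sum((e - e.mean())**2)
    return n * r2

rng = np.random.default_rng(6)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
e = np.empty(n)
e[0] = rng.standard_normal()
for t in range(1, n):
    e[t] = 0.8 * e[t-1] + rng.standard_normal()   # autocorrelated residuals
lm = breusch_godfrey(e, X, 4)   # _bgtest = 4 analog; large value rejects
```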
To display more than one graph in multiple windows, the following statement must be
included anywhere before calling estimate:
pqgwin many;
Let’s analyze the first part of the estimation result (which is the same as that of
Lesson 10.1), paying attention to the Durbin-Watson test statistic:
The Breusch-Godfrey LM test is compared with the critical value of the Chi-square
distribution with degrees of freedom equal to the number of orders of
autocorrelation. P-values of each order tested are given to simplify the analysis. It
becomes clear that autocorrelation exists from the very first order upward. Since the
first order exhibits the problem of serial correlation as the Durbin-Watson bounds
test suggests, all LM tests for cumulative higher orders will certainly also identify
autocorrelation.
The last part of the output lists and displays autocorrelation and partial
autocorrelation coefficients for regression residuals up to 12 lags. Standard errors of
these coefficients are useful to spot the significance of lags for autoregressive and
moving average structures of autocorrelated residuals. In addition, both Box-Pierce
and Ljung-Box Q test statistics are computed for each lag. Similar to the Breusch-
Godfrey LM test, these accumulative tests follow a Chi-square distribution with
degrees of freedom corresponding to each individual number of lags, adjusted for the
number of regression coefficients whenever necessary.
Both moving average and autoregressive processes of lower orders are visibly
identifiable from the significant values of autocorrelation and partial autocorrelation
coefficients, respectively. These include the first lag of autocorrelation as well as the
first and second lags of partial autocorrelation coefficients. Moreover, Box-Pierce
and Ljung-Box Q test statistics confirm the problem of autocorrelation starting from
the first lag.
In summary, all these tests for autocorrelation suggest that our model may need to be
re-specified. Moreover, the correct specification may not involve just the simple
first-order correction. Nevertheless, the next two lessons will explain the correction
mechanisms of autocorrelation for the first-order model. For higher-order
autocorrelation and even the mixture of autoregressive and moving average error
structure, a proper model specification must first be identified.
In GPE, use the input control variable _ar to specify the order of autocorrelation for
estimation. Since the correction mechanism may require several iterations to
complete, another control variable _iter must be set to a large enough number to
achieve convergence. Therefore the following statements are minimal for estimating
a regression model, which corrects for the first-order serial correlation within the
limit of 50 iterations:
_ar = 1;
_iter = 50;
will use the traditional Cochrane-Orcutt method to estimate and to correct for an
AR(1) error process. As a matter of fact, this method applies to the autocorrelated
error structure of both first and higher orders. It just drops more observations in the
beginning of the sample, with a cost of inefficiency, partially due to the loss of
degrees of freedom. For estimation with higher-order autocorrelation, it is not
necessary to specify the variable _drop. For example,
_ar = 4;
will estimate and correct for the AR(4) process using the traditional Cochrane-Orcutt
iterative method in which the initial four observations are dropped automatically.
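The Cochrane-Orcutt iteration for AR(1) can be sketched in NumPy: estimate ρ from the OLS residuals, quasi-difference the data (dropping the first observation), re-estimate, and repeat until ρ converges (a sketch with simulated data; the function name is hypothetical, and GPE's implementation may differ in detail):

```python
import numpy as np

def cochrane_orcutt(X, y, tol=0.001, max_iter=50):
    """Cochrane-Orcutt for AR(1): alternate between estimating rho from
    residuals and re-running OLS on the quasi-differenced data."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    rho = 0.0
    for _ in range(max_iter):
        e = y - X @ b
        rho_new = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
        b = np.linalg.lstsq(X[1:] - rho_new * X[:-1],
                            y[1:] - rho_new * y[:-1], rcond=None)[0]
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return b, rho_new

rng = np.random.default_rng(7)
n = 1000
x = rng.standard_normal(n)
u = np.empty(n)
u[0] = rng.standard_normal()
for t in range(1, n):
    u[t] = 0.7 * u[t-1] + 0.3 * rng.standard_normal()   # AR(1) errors
y = 1.0 + 2.0*x + u
b, rho = cochrane_orcutt(np.column_stack([np.ones(n), x]), y)
```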
The Cochrane-Orcutt method only converges to a local solution. In rare cases, there
may exist more than one solution for an autocorrelated error structure. The Hildreth-
Lu grid search method guarantees that the global solution will be found for an AR(1)
model. Similar to the Prais-Winsten modification to the original Cochrane-Orcutt
iterative method, the Hildreth-Lu grid search method may include the first
observation with proper transformation. Alternatively, dropping the first observation
is an option with the cost of a decrease in efficiency. Again, we note that the
Hildreth-Lu method applies to an AR(1) model only. The Hildreth-Lu method is
activated by letting:
_method = 2;
Based on the GAUSS program of Lesson 10.3, the Hildreth-Lu grid search method is
introduced in Lesson 10.4, in which the global solution for the estimated AR(1)
model is ensured.
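The grid search itself is straightforward to sketch: for each candidate ρ in (−1, 1), apply the Prais-Winsten transformation (keeping the scaled first observation) and retain the ρ with the smallest transformed sum of squared residuals — the least squares criterion mentioned below. The function name and simulated data are illustrative:

```python
import numpy as np

def hildreth_lu(X, y, grid=None):
    """Hildreth-Lu grid search over rho using the least squares criterion
    on Prais-Winsten transformed data (first observation retained)."""
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 199)   # step 0.01
    best_rss, best_rho, best_b = np.inf, None, None
    for rho in grid:
        w = np.sqrt(1.0 - rho**2)
        ys = np.concatenate([[w * y[0]], y[1:] - rho * y[:-1]])
        Xs = np.vstack([w * X[:1], X[1:] - rho * X[:-1]])
        b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
        rss = np.sum((ys - Xs @ b)**2)
        if rss < best_rss:
            best_rss, best_rho, best_b = rss, rho, b
    return best_b, best_rho

rng = np.random.default_rng(8)
n = 1000
x = rng.standard_normal(n)
u = np.empty(n)
u[0] = rng.standard_normal()
for t in range(1, n):
    u[t] = 0.7 * u[t-1] + 0.3 * rng.standard_normal()
y = 1.0 + 2.0*x + u
b, rho = hildreth_lu(np.column_stack([np.ones(n), x]), y)
```

Because every grid point is evaluated, the minimum found is global over the grid, which is the advantage over the locally convergent Cochrane-Orcutt iteration.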
Both the Cochrane-Orcutt iterative and the Hildreth-Lu grid search methods offer the
option of using least squares or maximum likelihood criterion for optimization. They
are the same if the first observation is dropped. However, with the transformed first
observation included, the use of different optimization criteria may result in finding
different solutions, although they are usually close. We have noted the use of the
input control variable _method to select different methods of estimation for an
autocorrelated error structure. _method can be either a scalar or a 2-element vector.
When employing a 2-element vector, the first element of _method selects the
estimation method, while the second element selects either least squares or
maximum likelihood criterion for optimization. For example,
_method = {0,1};
/*
** Lesson 10.3: Cochrane-Orcutt Iterative Procedure
*/
1 use gpe2;
2 output file = gpe\output10.3 reset;
3 load data[40,6]= gpe\cjx.txt;
4 year = data[2:40,1];
5 X = ln(data[2:40,2]);
6 L = ln(data[2:40,3]);
7 K = ln(data[2:40,5]);
8 call reset;
9 _names = {"X","L","K"};
10 _rstat = 1;
11 _ar = 1; @ AR(1) error structure @
12 _iter = 50; @ 50 iter for default C-O @
13 call estimate(X,L~K);
14 end;
As shown below, the regression output of this program is more complicated than
previous ones without the autocorrelation correction.
Notice that the second block of output reports the Cochrane-Orcutt iterative results
set in lines 11, 12, and 13 of the program. Every iteration is listed until convergence
is reached. At the end of the iterations, we have the following results: the estimated
Rho, standard error, and t-ratio associated with the first-order serial coefficient. In
this example, the significant non-zero Rho value of 0.805 is used to correct the least
squares regression. In summary,
εi = 0.805 εi-1
s.e. (0.096)
The verbose listing of iterations may be suppressed by setting the control variable
_print = 0. See Appendix A for details.
There is another optional input control variable _tol, which adjusts the
convergence tolerance level. Its default value is set to 0.001.
in line 12 as follows:
/*
** Lesson 10.4: Hildreth-Lu Grid Search Procedure
*/
1 use gpe2;
2 output file = gpe\output10.4 reset;
3 load data[40,6]= gpe\cjx.txt;
4 year = data[2:40,1];
5 X = ln(data[2:40,2]);
6 L = ln(data[2:40,3]);
7 K = ln(data[2:40,5]);
8 call reset;
9 _names = {"X","L","K"};
10 _rstat = 1;
11 _ar = 1; @ AR(1) error structure @
12 _method = 2; @ H-L method @
To call for the estimation of autocorrelation coefficients using the Hildreth-Lu grid
search procedure based on the maximum likelihood criterion, try this in place of line
12:
_method = {2,1};
Running the above program returns many iterations and lengthy output. Remember
that these iterations are controlled by the global variables _iter and _tol. For
viewing the estimation result, we refer readers to the output file output10.4.
εi = 0.826 εi-1
s.e. (0.090)
The results are basically the same as those obtained by using the Cochrane-Orcutt
method. Although the Hildreth-Lu grid search is costly in terms of computer
resources, the global nature of the estimated autocorrelation coefficients is superior
to the local solution found with either of the Cochrane-Orcutt methods.
/*
** Lesson 10.5: Higher-Order Autocorrelation
*/
1 use gpe2;
2 output file = gpe\output10.5 reset;
3 load data[40,6]= gpe\cjx.txt;
4 year = data[2:40,1];
5 X = ln(data[2:40,2]);
6 L = ln(data[2:40,3]);
7 K = ln(data[2:40,5]);
8 call reset;
9 _names = {"X","L","K"};
10 _rstat = 1;
11 _rplot = 2;
12 _bgtest = 4;
13 _acf = 12;
Lines 12 and 13 add the options to perform tests for higher-order autocorrelation.
These include the Breusch-Godfrey LM test up to the fourth order of autocorrelation
and a plot of 12-lag autocorrelation functions. The AR(1) model is re-estimated (line
16) with the following test results:
From the visual display of autocorrelation functions as well as the results of several
test statistics (Breusch-Godfrey LM test, Box-Pierce and Ljung-Box Q tests), higher
orders of autocorrelation, or a mixture of autocorrelation and moving average
processes is suspected. In particular, the coefficients for the first lag of
autocorrelation and the first two lags of partial autocorrelation are still statistically
significantly different from zero. The second part of Lesson 10.5 goes ahead to
estimate the AR(3) model using the traditional Cochrane-Orcutt iterative method.
The possibility of a mixed error structure with a moving average process is discussed
in the next lesson.
Since the option to test for higher orders of autocorrelation is still included in the
program (see lines 12 and 13), the estimated AR(3) model is also tested for problems
of autocorrelation. Here are the estimation and test results with the AR(3) error
structure:
Based on the Breusch-Godfrey LM test up to the fourth order as well as the plot of
the 12-lag autocorrelation function and the associated Box-Pierce and Ljung-Box Q
test statistics, the estimated model with AR(3) error structure is now free of
autocorrelation. The estimated model is superior to that of correcting only for first-
order autocorrelation.
In summary,
εi = ρ1εi-1 + ρ2εi-2 + … + ρpεi-p − θ1υi-1 − θ2υi-2 − … − θqυi-q + υi
With GPE, the following input control variables are relevant to the estimation of an
error structure with autoregressive and moving average components:
Among these control variables, only _arma is new. The other variables are related
to nonlinear model estimation discussed in Chapters VI and VII. The variable
_arma is used to specify the orders of the autoregressive and moving average
components of the error structure. It is a column vector consisting of at least two
elements: the first is the order of the autoregressive component, followed by the
order of the moving average. For example,
_arma = {1,1};
/*
** Lesson 10.6: ARMA(1,1) Error Structure
*/
1 use gpe2;
2 output file = gpe\output10.6 reset;
3 load data[40,6]= gpe\cjx.txt;
4 year = data[2:40,1];
5 X = ln(data[2:40,2]);
6 L = ln(data[2:40,3]);
7 K = ln(data[2:40,5]);
8 call reset;
9 _names = {"X","L","K"};
10 _rstat = 1;
11 _rplot = 2;
12 _bgtest = 4;
13 _acf = 12;
Initial Result:
Sum of Squares = 0.043221
Log Likelihood = 77.359
Parameters = 1.4508 0.38381 -3.9377 0.00000 0.00000
Final Result:
Iterations = 34 Evaluations = 50154
Sum of Squares = 0.019363
Log Likelihood = 93.016
Parameters = 1.1051 0.54833 -2.8796 0.62424 -0.67141
Gradient of Log Likelihood = 0.022545 0.019469 0.0044089 -0.00020935 0.00015670
Both the parameter estimates of AR(1) and MA(1) are statistically significant and
useful for correcting autocorrelation. Based on the Breusch-Godfrey LM test up to
the fourth order, as well as the 12-lag autocorrelation function plot and the
associated Box-Pierce and Ljung-Box Q test statistics, the estimated ARMA(1,1)
model is as good as that of the AR(3) specification presented in the Lesson 10.5.
Both models are essentially equivalent.
The nonlinearity of AR(1) is clearly due to the product of the parameters β and ρ,
while MA(1) is recursively weighted by θ. ARMA(1,1) is a mixed process of AR(1) and
MA(1), and therefore contains both of the aforementioned nonlinearities. For model
estimation, the beginning observations of the data series may be lost if not
properly initialized. The built-in ARMA estimation of GPE is conditional upon the
simple data initialization with the sample mean. We have seen the Prais-Winsten
transformation for the first observation of AR(1): √(1−ρ2) Y1 and √(1−ρ2) X1. This
adds more nonlinear complexity to the model and makes maximum likelihood the
preferred method for estimation.
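The transformation just described can be sketched as a one-line helper (a hypothetical name, used here only to make the formula concrete):

```python
import numpy as np

def prais_winsten(v, rho):
    """Quasi-difference a series for AR(1) but keep the first observation,
    scaled by sqrt(1 - rho^2), as in the Prais-Winsten transformation."""
    return np.concatenate([[np.sqrt(1.0 - rho**2) * v[0]],
                           v[1:] - rho * v[:-1]])

pw = prais_winsten(np.array([1.0, 2.0, 3.0]), 0.5)
```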
For MA(1), the recursive process starts with the initial residual υ0 which is typically
set to its expected value (i.e., zero) or the sample mean. An alternative is to estimate
υ0 directly. The concentrated log-likelihood function is simpler but conditional to the
initialization of υ0 as follows:
This lesson differs from the previous ones in that the program setup is for nonlinear
maximum likelihood estimation. Nonlinear maximum likelihood was covered in
Chapter VII, and it is helpful to go back for a thorough review of the estimation
technique.
For each of the three models, the residual function must be defined in
order to maximize the objective log-likelihood function. The block from line 34 to
line 41 defines the residual function for an AR(1) model. The MA(1) specification is
given in lines 42 through 48, while the ARMA(1,1) is specified from line 49 to line
57. Notice that AR(1) and ARMA(1,1) require the use of Jacobians in the likelihood
function. The Jacobian function is defined in the block from line 29 to line 33.
/*
** Lesson 10.7: Maximum Likelihood Estimation
** AR(1), MA(1), ARMA(1,1)
*/
1 use gpe2;
2 output file = gpe\output10.7 reset;
3 load data[40,6]= gpe\cjx.txt;
4 year = data[2:40,1];
5 X = ln(data[2:40,2]);
6 L = ln(data[2:40,3]);
7 K = ln(data[2:40,5]);
8 data=X~L~K;
9 b=data[.,1]/(ones(rows(data),1)~data[.,2:3]); @ ols initial values @
10 call reset;
11 _nlopt=1;
12 _method=0;
13 _iter=100;
14 _conv=1;
15 _b=b|0.5;
16 _jacob=&jcb;
17 _names = {"CONSTANT","LN(L)","LN(K)","AR(1)"};
18 call estimate(&ar,data);
19 _b=b|0;
20 _jacob=0;
21 _names = {"CONSTANT","LN(L)","LN(K)","MA(1)"};
22 call estimate(&ma,data);
23 _b=b|0.5|0;
24 _jacob=&jcb;
25 _names = {"CONSTANT","LN(L)","LN(K)","AR(1)","MA(1)"};
26 call estimate(&arma,data);
27 end;
34 proc ar(x,b);
35 local n,e,u;
36 n=rows(x);
37 e=x[.,1]-b[1]-b[2]*x[.,2]-b[3]*x[.,3];
38 u=e-b[4]*lagn(e,1);
@ first obs transformation @
39 u[1]=sqrt(1-b[4]^2)*e[1];
40 retp(u);
41 endp;
42 proc ma(x,b);
43 local n,e,u;
44 n=rows(x);
45 e=x[.,1]-b[1]-b[2]*x[.,2]-b[3]*x[.,3];
46 u=recserar(e,e[1],b[4]); @ u[1]=e[1] since u[0]=0 @
/*
@ recursive computation of errors using @
@ built-in RECSERAR is the same as below: @
u=e; @ initialize: u[1]=e[1] @
i=2;
do until i>n;
u[i]=e[i]+b[4]*u[i-1];
i=i+1;
endo;
*/
47 retp(u);
48 endp;
49 proc arma(x,b);
50 local n,e,u,v;
51 n=rows(x);
52 e=x[.,1]-b[1]-b[2]*x[.,2]-b[3]*x[.,3];
53 u=e-b[4]*lagn(e,1);
@ first obs transformation @
54 u[1]=sqrt(1-b[4]^2)*e[1];
55 v=recserar(u,u[1],b[5]);
56 retp(v);
57 endp;
Using the linear least squares estimates as initial values of parameters (line 9), lines
15-18 carry out the estimation for the AR(1) model. Here is the result:
Initial Result:
Sum of Squares = 0.029940
Log Likelihood = 84.518
Parameters = -3.9377 1.4508 0.38381 0.50000
Final Result:
Iterations = 17 Evaluations = 4914
Sum of Squares = 0.027133
Log Likelihood = 86.438
Gradient of Log Likelihood = 4.9312e-005 0.00023054 0.00020964 2.6217e-005
Asymptotic
Parameter Std. Error t-Ratio
CONSTANT -2.8220 0.56933 -4.9567
LN(L) 1.0926 0.17753 6.1546
LN(K) 0.54995 0.096866 5.6775
The BHHH method is used for log-likelihood maximization (see line 12),
and it converges in 17 iterations. As seen in line 15, the initial value of the AR(1)
coefficient is set to 0.5. The solution is close to that of the Cochrane-Orcutt (Lesson
10.3) and Hildreth-Lu (Lesson 10.4) procedures. The crucial point of this model is
the use of the first-observation transformation (line 39); the resulting Jacobian
function must be incorporated for exact maximum likelihood estimation.
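Outside GAUSS, the transformation in proc ar (lines 34-41) can be sketched in a few lines. The following Python fragment is our own illustration, not part of the GPE package: quasi-difference the residuals with ρ, then rescale the first observation by sqrt(1-ρ^2), which is exactly the term that contributes the Jacobian to the exact likelihood.

```python
import math

def ar1_residuals(e, rho):
    """Transformed residuals u for exact ML of an AR(1) error model.
    u[t] = e[t] - rho*e[t-1] for t >= 1; the first observation is
    rescaled as u[0] = sqrt(1 - rho^2)*e[0], keeping all n points."""
    u = [e[t] - rho * e[t - 1] for t in range(1, len(e))]
    u.insert(0, math.sqrt(1.0 - rho**2) * e[0])
    return u

u_ar = ar1_residuals([1.0, 0.5, 0.25], 0.5)  # hypothetical residual series
```

With ρ = 0.5 the quasi-differenced residuals after the first are zero, and only the rescaled first observation carries information.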
Similarly, the MA(1) model is estimated in lines 19-22 with the starting value of the
MA(1) coefficient at 0. Here is the estimation result:
Initial Result:
Sum of Squares = 0.043382
Log Likelihood = 77.286
Parameters = -3.9377 1.4508 0.38381 0.00000
Final Result:
Iterations = 29 Evaluations = 8073
Sum of Squares = 0.022342
Log Likelihood = 90.226
Gradient of Log Likelihood = -0.00016182 -0.00094511 -0.00082061 4.1086e-005
Asymptotic
Parameter Std. Error t-Ratio
CONSTANT -3.6176 0.27477 -13.166
LN(L) 1.3379 0.094876 14.102
LN(K) 0.44264 0.054316 8.1495
MA(1) -0.81620 0.095539 -8.5431
As the MA(1) model does not use the first-observation transformation of AR(1), the
Jacobian function should not be called (see line 20). The residual function is defined
with an autoregressive recursive series using the GAUSS built-in function
recserar (line 46). The initialization of the recursive series is the expected value
of the series, which is zero. Line 46 shows the use of recserar with initialization.
Check the GAUSS manual or online help for more information about the procedure
recserar. The computation of autoregressive recursive series is also explained in
the comment block immediately below line 46.
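For readers without GAUSS at hand, the recursion that recserar performs here can be written out directly. This Python fragment is an illustration only (the function name is ours):

```python
def ma1_errors(e, theta):
    """Recursively build u[t] = e[t] + theta*u[t-1] with u[0] = e[0],
    the same series recserar(e, e[1], theta) returns in GAUSS."""
    u = [e[0]]
    for t in range(1, len(e)):
        u.append(e[t] + theta * u[-1])
    return u

u_ma = ma1_errors([1.0, 1.0, 1.0], 0.5)  # hypothetical residual series
```

Each error accumulates a geometrically discounted history of past errors, which is why the initialization u[0] matters.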
We have obtained the maximum likelihood estimates of β and θ as shown above. For
i = 1, υ0 = 0 is assumed. The alternative is to estimate υ0 together with β and θ.
Simply replace line 46 with the following:
u=recserar(e,e[1]+b[5]*b[4],b[4]);
where b[5] is the unknown υ0 to be estimated with the rest of the parameters.
Starting from the initial guess of (θ,υ0) at (0,0), in addition to the linear least squares
estimator of β, the model is estimated exactly the same way as before except that
line 19 should be:
_b=b|0|0;
How do the estimation results differ from those of Lesson 10.7, which assumed υ0 =
0? We leave this question to you.
For ARMA(1,1), both the first observation transformation of AR(1) and the
autoregressive recursive series with initialization of MA(1) are required. The model
is estimated in lines 23-26. Here is the estimation result:
Initial Result:
Sum of Squares = 0.029940
Log Likelihood = 84.518
Parameters = -3.9377 1.4508 0.38381 0.50000 0.00000
Final Result:
Iterations = 21 Evaluations = 7059
Sum of Squares = 0.018525
Log Likelihood = 93.879
Gradient of Log Likelihood = -0.00012631 -0.00065739 -0.00055447 0.00015394 -1.6021e-006
Asymptotic
Parameter Std. Error t-Ratio
CONSTANT -2.6041 0.42941 -6.0644
LN(L) 1.0321 0.12537 8.2320
LN(K) 0.57271 0.071019 8.0641
AR(1) 0.66145 0.14519 4.5559
MA(1) -0.71077 0.12402 -5.7309
where b[6] is the unknown υ0 to be estimated with the rest of the parameters.
Starting from the initial guess of (ρ,θ,υ0) at (0.5,0,0), in addition to the linear least
squares estimator of β, the model is estimated exactly the same way as before except
where X is output, and the two factor inputs are labor L and capital K. The following
table presents the parameter estimates (numbers in parentheses are the estimated
standard errors of the parameters) and the corresponding log-likelihood function
value ll for each model.
The top row of the table identifies the model and its corresponding lesson: (1)
Lesson 10.1 is the ordinary least squares estimates without autocorrelation
correction; (2) Lesson 10.3 is the AR(1) model using the Cochrane-Orcutt iterative
procedure; (3) Lesson 10.4 is the same AR(1) model using Hildreth-Lu grid search
method; (4), (5), and (6) are based on Lesson 10.7, using nonlinear maximum
likelihood estimation for the model AR(1), MA(1), and ARMA(1,1), respectively.
The last column (7) is the ARMA(1,1) model estimated with the GPE built-in
conditional maximum likelihood method in Lesson 10.6. All the methods use the
entire sample of 39 observations from 1929 to 1967. For model comparison, pair-wise
likelihood ratio (LR) test statistics are useful. It is immediately clear that the
model must be corrected for autocorrelation. The plain OLS model (1) is rejected
by LR tests against all the other models. Finally, the structure of the
autoregressive moving average ARMA(1,1) of both Lessons 10.6 and 10.7 cannot be
rejected.
XI
Distributed Lag Models
With the proper use of distributed lags, regression models can be expanded to
include dynamic features such as long-run and short-run elasticities and multipliers
for analysis. In this chapter we will consider two popular setups of distributed lags:
geometric, or Koyck lags, and polynomial, or Almon lags. The former is an infinite
distributed lags model with a geometric declining pattern, which in turn can be
transformed into a lagged dependent variable model. The latter is a finite distributed
lags model with polynomial functional restrictions. The combination of the two is the
so-called autoregressive distributed lag (ARDL) model.
For estimating such a model with lagged dependent variables, instrumental variable
(IV) estimation is suggested. IV estimation weighs the trade-off between bias and
inefficiency and obtains a consistent parameter estimator that minimizes the
ill effects of using lagged dependent variables. Instrumental variable estimation for a
lagged dependent variable model is the focus of Lesson 11.2.
C = β0 + β1Y + β2 C-1 + ε
In this lesson we will estimate a lagged dependent variable model. The only new
input control variable is _dlags, which is used to specify the number of lags
desired. For example, by setting _dlags = 1, GPE inserts the first lag of the
dependent variable into the regression equation for least squares estimation. The
lagged dependent variable is added in front of the other explanatory variables. The
Durbin-H test statistic is automatically included in the output of _rstat when
_dlags is set to a value greater than zero.
/*
** Lesson 11.1: Lagged Dependent Variable Model
** Estimation and Testing for Autocorrelation
*/
1 use gpe2;
2 output file = gpe\output11.1 reset;
3 load z[67,3] = gpe\usyc87.txt;
4 y = z[2:67,2];
5 c = z[2:67,3];
6 call reset;
7 _names = {"c","y"};
8 _rstat = 1;
9 _dlags = 1;
10 call estimate(c,y);
11 _ar = 1;
12 _iter = 50;
13 call estimate(c,y);
14 end;
If more than one lag is needed, just change the value of _dlags to the desired
positive number of lags.
To estimate the model is simple, but to evaluate and interpret the effect of a lagged
dependent variable is not. Line 9 specifies that the model to be estimated includes
the first lag of the dependent variable. The following call to estimate (line 10)
proceeds to carry out least squares estimation of the model. Since _rstat is set to 1
in line 8, a summary of residual statistics including the new Durbin-H test statistic is
presented.
Alternatively, you can create the lagged dependent variable and then include it in
estimate as an independent variable. In GAUSS, the lagged variable is
constructed with the command lag1 or lagn. This method requires explicitly
handling the initial observation lost from lagging the dependent variable
so that the program will run. Setting the GPE control variable _begin to the
beginning of the usable sample for estimation may be necessary. In addition, GPE
will not treat the variable you created differently from the rest of the explanatory
variables. Therefore, the Durbin-H test statistic, unique to the lagged dependent
variable model, is not computed. In passing, we note that testing linear restrictions
involving lagged dependent variables requires specifying restrictions on those
variables explicitly.
The result of the first least squares estimation (line 10) is given below:
Number of Observations = 65
Mean of Dependent Variable = 1588.2
Standard Error of Dependent Variable = 955.14
Notice that the estimation range given in the first block of the output is from 2 to 66,
using 65 observations. This is because of the use of the first lag of the dependent
variable on the right-hand side of the regression equation. Next is a statement giving
the number of lags included in the estimation.
The last line of the first block of output is the Durbin-H statistic. Given the first-
order Rho at 0.45 and a Durbin-H test statistic as high as 4.2 (compared with the
critical values of a standard normal distribution), the problem of autocorrelation
is readily apparent. Methods of correction for autocorrelation discussed in the
previous chapter should be used to improve the results of the model estimation.
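The Durbin-H statistic itself has a simple closed form. The Python sketch below uses the standard formula h = ρ̂ √(n/(1 − n·Var(c))), where Var(c) is the estimated variance of the coefficient on the lagged dependent variable; the Rho of 0.45 and n = 65 come from this lesson, while the 0.01 variance is a hypothetical value for illustration only.

```python
import math

def durbin_h(rho_hat, n, var_lag_coef):
    """Durbin's h statistic for testing first-order autocorrelation in a
    model with a lagged dependent variable. Undefined if n*Var(c) >= 1."""
    denom = 1.0 - n * var_lag_coef
    if denom <= 0:
        raise ValueError("h undefined; use an alternative test")
    return rho_hat * math.sqrt(n / denom)

h = durbin_h(0.45, 65, 0.01)  # 0.01 is a hypothetical coefficient variance
```

Values of h well beyond the standard normal critical values (here h is around 6) signal autocorrelation, consistent with the output discussed above.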
Lines 11 to 13 of the program correct and then re-estimate the model with a first-
order autocorrelated error structure. The default Cochrane-Orcutt iterative method is
used. We refer the reader to the output file output11.1 for details of the
regression results. In summary, here is our estimated lagged dependent variable
model with AR(1) error structure:
ε = 0.648 ε-1
s.e. (0.098)
The correction for first-order autocorrelation is certainly a step in the right direction
for improving the model. It may not be a bad idea to continue to carry out testing for higher orders of
autocorrelation. Remember to use the following statements before the last call of
Although the results are not shown here, we did not find a significant autocorrelation
problem at higher orders.
The alternative is to specify the entire data matrix for the variable _ivar. This is
useful for applying instrumental variable estimation in other contexts such as
measurement error in the regression model. We note that the matrix specification of
_ivar requires that its size (rows and columns) be at least as large as that of the
data matrix of explanatory variables.
We now continue on from the end of Lesson 11.1, adding the option to estimate the
model using instrumental variables in the following program:
/*
** Lesson 11.2: Lagged Dependent Variable Model
** Instrumental Variable Estimation
*/
1 use gpe2;
2 output file=gpe\output11.2 reset;
3 load z[67,3]=gpe\usyc87.txt;
4 y=z[2:67,2];
5 c=z[2:67,3];
6 call reset;
7 _names={"c","y"};
8 _rstat=1;
9 _dlags=1;
10 _ar=1;
11 _iter=50;
12 call estimate(c,y);
13 _ivar=1;
14 call estimate(c,y);
15 end;
which calls for the use of internal instrumental variables that GPE will construct for
the lagged dependent variables. In addition, the autocorrelation correction is
requested for the first order (line 10), hence one additional lag of explanatory
variables is needed as part of the instrumental variables.
The advantage is that you have more control over the addition of relevant
instrumental variables in order to improve the small-sample properties of the
estimator. In contexts other than the lagged dependent variable model, instrumental
variable estimation may be requested with the variable _ivar explicitly assigned to
a data matrix no smaller than that of explanatory variables.
We note that the scalar definition of _ivar = 1 will only work when specifying a
positive number of _dlags. _ivar = 1 without _dlags (or _dlags = 0)
will result in a program error.
Looking at the output file output11.2, the results of the first regression in this
lesson are the same as the results of the second regression of Lesson 11.1. The
second estimation of this lesson performs instrumental variable estimation while at
the same time correcting for first-order serial correlation. We will show you only a
partial result of the second regression estimation:
The model uses four instrumental variables: the original, the first and second lags of
the explanatory independent variable Y, and the constant term. The second lag is
included due to the first-order serial correlation being specified for model estimation.
Comparing the estimation results obtained when instrumental variables are not used,
ε = 0.648 ε-1
s.e. (0.098)
ε = 0.668 ε-1
s.e. (0.097)
we see that their parameter estimates are similar. But the current estimated standard
errors of the parameters are slightly larger than the standard errors resulting from not
using the instrumental variables. Nevertheless, the conclusions of statistical
inferences are not affected in this example.
We will keep the interpretation of the estimated model using instrumental variables
brief. First, the short-run marginal propensity to consume is 0.51. With the estimated
coefficient 0.46 for the lagged dependent variable, the long-run consumption change
is about 0.94 for each dollar increase of income. To realize 50% of the total effect
(that is, half of 0.94 or 0.47) will take 0.89 years. This is the concept of median lag
frequently used in dynamic analysis. The other measurement is the lag-weighted
average or the mean lag, which is computed at about 0.85 years.
Remember the formulas for computing the median lag and mean lag? Let λ be the
estimated parameter of the lagged dependent variable. Then the median lag is
computed as ln(0.5)/ln(λ), and the mean lag is λ/(1−λ).
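Plugging in the estimated coefficient of 0.46 on the lagged dependent variable reproduces the figures quoted above; a two-line Python check:

```python
import math

lam = 0.46  # estimated coefficient on the lagged dependent variable

median_lag = math.log(0.5) / math.log(lam)  # years to realize half the total effect
mean_lag = lam / (1.0 - lam)                # lag-weighted average lag
```

The median lag works out to about 0.89 years and the mean lag to about 0.85 years, matching the dynamic analysis in the text.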
In GPE, a polynomial lag model is defined with the input control variable _pdl. The
variable _pdl is a 3-column matrix with the number of the rows corresponding to
the number of explanatory variables. A polynomial lag structure must be defined for
each variable (row) in the _pdl matrix. The 3-column entry for each explanatory
variable specifies, in order, the number of lags q, the polynomial order p, and the
end-point restrictions r. End-point restrictions “tie down” one or both ends
of the polynomial’s curve, enforcing a theoretical or empirical justification of the lag
structure. For variables that do not have the polynomial lag structure, the 3-column
entry 0 0 0 should be used. Normally the constant term is not included in the
_pdl matrix, unless otherwise specified.
Estimating a model with a polynomial lag structure defined for each explanatory
variable is essentially the same as restricted least squares. The number of restrictions
is (q-p) polynomial restrictions plus the number of end-point restrictions. Any
additional linear restrictions imposed with the variable _restr must take into
account the correct structure of right-hand side variables that _pdl may add to the
original equation.
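The restriction count can be tallied mechanically. The sketch below (our own Python, not part of GPE) uses GPE's end-point coding of 0, -1, 1, and 2:

```python
def pdl_restrictions(q, p, r):
    """Number of linear restrictions implied by one row {q p r} of _pdl:
    (q - p) polynomial restrictions plus end-point restrictions, where
    r = 0 (none), -1 or 1 (one end), 2 (both ends)."""
    endpoints = {0: 0, -1: 1, 1: 1, 2: 2}[r]
    return (q - p) + endpoints

n_capp = pdl_restrictions(7, 4, 2)  # the {7 4 2} row used in Lesson 11.3
```

For the {7 4 2} structure this gives 5 restrictions; adding the one seasonal-sum restriction explains the 6 restrictions reported in the Lesson 11.3 output below.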
variables whose four coefficients were constrained to sum to zero. Also, both end-
points are restricted on the polynomial lags.
/*
** Lesson 11.3: Polynomial Distributed Lag Model
** Almon Lag Model Revisited
*/
1 use gpe2;
2 output file = gpe\output11.3 reset;
3 load almon[61,3] = gpe\almon.txt;
4 cexp = almon[2:61,2];
5 capp = almon[2:61,3];
6 qt = almon[2:61,1];
11 call reset;
12 _rstat = 1;
13 _end = 36;
In her original discussion, Almon used only 36 observations; therefore, we end our
estimation at 36 (line 13). As you will remember, suppressing the constant term (line
14) is necessary to avoid the dummy variable trap when using all four seasonal
dummy variables. The reason for not dropping one dummy variable is so that we can
impose a linear restriction, summing the coefficients of these four dummy variables
to zero (line 15). Line 16 defines the polynomial lag structures with _pdl. Each row
of _pdl controls a different explanatory variable and rows are separated by
commas. Remember that carriage returns are not “seen” by GAUSS. Three columns
of _pdl specify the following: q = lags, p = orders, and r = end-point restrictions,
respectively. There are four possible settings for end-point restrictions: -1
(beginning), 1 (ending), 2 (both), or 0 (no restriction). The first row of _pdl in line
16 assigns the variable CAPP 7 lags with a fourth-order polynomial and both end-points
restricted. The four dummy variables are not affected since each entry is set to 0 from
the second to the last rows of _pdl.
9. As pointed out by Greene (1997, Chapter 17), it was not possible to reproduce Almon's
original results. Our regression results match Greene's.
------------------------
Dependent Variable = CEXP
Estimation Range = 8 36
Number of Observations = 29
Mean of Dependent Variable = 2568.3
Standard Error of Dependent Variable = 468.69
In the output, seven lags of CAPP are estimated with the adjusted sample size. Look
at the results of hypothesis testing for linear restrictions. Although we have explicitly
defined only one restriction to sum all the seasonal effects across four quarters to
zero, there are six restrictions. How do the other five restrictions enter the model?
Remember the 7 lags and 4 orders of the polynomial lag structure for the variable
CAPP? Equivalently, there are 3 restrictions (that is, 7-4). On top of them, there
were two end-point restrictions. The computed Wald F-test statistic of 1.13 (with P-
value 0.38) suggests these 6 linear restrictions cannot be jointly rejected. Similar
results are obtained from the other test statistics. We notice the insignificant seasonal
effect for all four quarters. The model may be re-estimated without quarterly dummy
variables but with a constant.
Of course, there are problems of model misspecification. As you can see from the
Durbin-Watson statistic, autocorrelation is a serious problem that was not addressed
in the original Almon study. We leave for an exercise the challenge of refining this
model further.
To implement an ARDL model using GPE, we need to specify _pdl for the
appropriate polynomial lag structure (restricted or unrestricted) of the explanatory
variables and _dlags for the number of lags of dependent variable. Additional
parameter restrictions may be imposed on these lag variables as well. We have seen
the input control variables _pdl and _dlags used separately in previous lessons.
For specifying an ARDL model, both _pdl and _dlags are required.
10. To correct for autocorrelation, we could continue with the model of Lesson 11.3 and assume
that the error structure is AR(1) or even higher-order. Such a derivation results in
complicated non-linear restrictions involving lagged variables (dependent and independent).
/*
** Lesson 11.4: Autoregressive Distributed Lag Model
** Almon Lag Model Once More
*/
1 use gpe2;
2 output file = gpe\output11.4 reset;
3 load almon[61,3] = gpe\almon.txt;
4 cexp = almon[2:61,2];
5 capp = almon[2:61,3];
6 qt = almon[2:61,1];
11 call reset;
12 _rstat = 1;
13 _end = 36;
14 _pdl = {7 4 2};
15 _dlags = 2;
16 _names={"cexp","capp"};
17 call estimate(cexp,capp);
18 end;
We can view it as a restricted version of an ARDL model. For example, assuming AR(1)
correlation ε = ρ ε-1 + u for the long-run relation Y = α + β X + ε is the same as assuming the
short-run dynamics Y = a + b X + c X-1 + ρ Y-1 + u with the non-linear restriction b = -c/ρ. In
other words, given ε = ρ ε-1 + u, we must have a = α(1-ρ), b = β, c = -βρ.
The estimated model is a restricted ARDL model. The restrictions come in the form
of the fourth-order (of 7 lags) polynomial and end-point restrictions. There are 5
restrictions because of the polynomial lag structure assumed, and these restrictions
are statistically significant based on all the tests. The first two lags of the dependent
variable, with coefficients 1.25 and –0.63, are statistically significant. The stability
of a dynamic equation hinges on the characteristic equation for the autoregressive
part of the model. It is easy to show that the model is stable.11 By augmenting two
lags of the dependent variables, the model is free of autocorrelation as required.
11. Solving the characteristic equation 1 - 1.25z + 0.63z^2 = 0 gives z = 0.9921 ± 0.7766i.
Both complex solutions are greater than 1 in absolute value (|z| ≈ 1.26); that is, they lie
outside the unit circle.
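The root computation in footnote 11 is easy to verify numerically; this Python fragment is an illustration:

```python
import cmath

# characteristic equation of the autoregressive part: 1 - 1.25 z + 0.63 z^2 = 0
a, b, c = 0.63, -1.25, 1.0
disc = cmath.sqrt(b * b - 4 * a * c)
z1, z2 = (-b + disc) / (2 * a), (-b - disc) / (2 * a)
stable = abs(z1) > 1 and abs(z2) > 1  # both roots outside the unit circle
```

The two conjugate roots have modulus about 1.26, confirming the stability of the estimated dynamic equation.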
XII
Generalized Method of Moments
Recall from the study of maximum likelihood estimation that assumptions regarding
the underlying probability density or likelihood function of a model structure are
rather strong, typically including the assumption that the model error is normally
distributed. The alternative to the maximum likelihood approach, known as
generalized method of moments (GMM), does away with assumptions regarding the
probability density or likelihood function. Instead, GMM estimation begins by
specifying a set of identities, or moment functions, involving the model variables and
parameters, and then finds the set of parameters that best satisfies those identities
according to a quadratic criterion function. As a result, the GMM estimator is
consistent. For some ideal cases, it can be shown to be as efficient as a maximum
likelihood estimator. In addition to the classical least squares and maximum
likelihood methods, GMM serves as an alternative for regression parameter
estimation. Even for the purpose of estimating the parameters for a probability
distribution of a random variable, GMM is a viable alternative to maximum
likelihood estimation.
GMM estimation is nonlinear in nature. In the following, we shall revisit the problem
of estimating a probability distribution first seen in Lesson 6.3. Instead of using the
maximum likelihood method, GMM is introduced to estimate the parameters of a
gamma probability distribution. It is generalized to study a nonlinear regression
model of rational expectations as done by Hansen and Singleton (1982), where a set
of moment equations or orthogonality conditions are estimated. Finally, the special
cases of linear regression models are considered. For linear models, GMM is more
general than the least squares estimation. It is analogous to an instrumental variable
estimator which accounts for heteroscedasticity and autocorrelation. One of the
advantages of GMM estimation is that it is less dependent on the underlying
probability density or likelihood function. Classical least squares and maximum
likelihood estimation methods are special cases of GMM.
E[m(X,θ)] = 0
Let’s consider a simple example. Suppose X is a random variable for which the
population mean is defined as θ = E(X). Then, E(X) - θ = E(X-θ) = 0. The moment
function is m(X,θ) = X - θ = 0 so that E[m(X,θ)] = 0. In the now familiar maximum
likelihood case, m(X,θ) = ∂ll(θ)/∂θ and E[∂ll(θ)/∂θ] = 0, where ll(θ) is the log-
likelihood function with unknown parameters θ. Moments are used to describe the
characteristics of a distribution, and much of the statistical estimation focuses on the
To ensure that W is positive definite, we need to make some assumptions about its
autocovariance structure. For example,

Var(m(θ)) = S0 + ∑j=1,2,...,p (1 - j/(p+1)) (Sj + Sj')

where Sj = 1/N ∑i=j+1,...,N m(Xi,θ) m(Xi-j,θ)' and p is the degree of autocovariance
assumed in the model. This is the White-Newey-West estimator of Var(m(θ)), which
guarantees positive definiteness by down-weighting higher-order autocovariances.
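A direct transcription of this estimator (our own Python/NumPy sketch, with the moment observations stored as an N x L matrix) makes the Bartlett down-weighting explicit:

```python
import numpy as np

def newey_west(m, p):
    """White-Newey-West estimate of Var(m): S0 plus Bartlett-weighted
    autocovariances Sj up to lag p, Sj = (1/N) sum_i m_i m_{i-j}'."""
    n = m.shape[0]
    s = m.T @ m / n
    for j in range(1, p + 1):
        sj = m[j:].T @ m[:-j] / n
        s += (1.0 - j / (p + 1)) * (sj + sj.T)
    return s

rng = np.random.default_rng(0)
v = newey_west(rng.standard_normal((200, 3)), 2)  # simulated moment data
```

The declining weights (1 - j/(p+1)) are what keep the resulting matrix positive definite, unlike an unweighted sum of autocovariances.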
For an exactly identified model, the optimal value of Q is zero and therefore the
choice of weighting matrix W is irrelevant. For an over-identified case, there are L-K
moment restrictions which must be satisfied with a minimal positive value (or
penalty) of Q. The function of the weighting matrix W as constructed is to weight the
importance of each individual moment function. Typically, the first iteration of
GMM estimation starts with the special case of W = I (the identity matrix). In other
words, we find the estimator θ0 of θ that minimizes the quadratic function, Q(θ) =
m(θ)'m(θ), with the associated asymptotic covariance matrix:
The asymptotic covariance matrix for the resulting GMM estimator θ1 of θ is:
Var(θ*) = [G(θ*)'W(θ*)G(θ*)]-1
Q* = Q(θ*) = m(θ*)'W(θ*)m(θ*)
Q* serves as the basis for hypothesis testing of moment restrictions. If there are L
moment equations with K parameters (L > K), the Hansen test statistic Q* follows a
Chi-square distribution with L-K degrees of freedom. Justification for including L-K
additional moment functions is based on the value of Q*.
The GMM estimator of θ = (λ,ρ) is obtained from minimizing the weighted sum-of-
squares:
where m(θ) = (m1(θ), m2(θ), m3(θ), m4(θ))' and W is a positive definite symmetric
matrix. Conditional to the weighting scheme W, the variance-covariance matrix of θ
is estimated by:
If we let W equal the inverse of the covariance matrix of m(θ), or [Var(m(θ))]-1, then
Var(θ) = [G(θ)'W G(θ)]-1.
From here, we can show that the GMM class of estimators includes the maximum
likelihood estimator as a special case. Solving from the score of the log-likelihood
function based on the gamma distribution,

∂ll/∂λ = N (ρ/λ) - ∑i=1,2,...,N Xi = 0
∂ll/∂ρ = N [ln(λ) - dlnΓ(ρ)/dρ] + ∑i=1,2,...,N ln(Xi) = 0

where ll(X,θ) = N [ρln(λ) - lnΓ(ρ)] - λ ∑i=1,2,...,N Xi + (ρ-1) ∑i=1,2,...,N ln(Xi). Thus, the
maximum likelihood estimate of θ = (λ,ρ) is an exactly identified GMM with m(θ) =
(m1(θ), m3(θ)). For this exactly identified case, the weighting matrix W is irrelevant,
and the optimal criterion Q is zero.
/*
** Lesson 12.1: GMM Estimation of a Gamma Distribution
** See Greene (1999), Example 4.26 and Example 11.4
*/
1 use gpe2;
2 output file=gpe\output12.1 reset;
3 load data[21,2]=gpe\yed20.txt;
4 x=data[2:21,1]/10; @ income: data scaling may help @
5 call reset;
6 _nlopt=0; @ it is a minimization problem @
7 _method=5;
8 _iter=100;
12. Our implementation of this example differs slightly from Example 11.4 in Greene (1999).
Instead of using the scaled moment equations as Greene does, we scale the data series first
and then estimate the original moment equations as described. The numerical results are more
stable and easier to evaluate.
16 end;
/*
User-defined moments equations, must be named mf
based on gamma distribution: b[1]=rho, b[2]=lambda
*/
17 proc mf(x,b);
18 local n,m;
19 n=rows(x);
20 m=zeros(n,4);
21 m[.,1]=x-b[1]/b[2];
22 m[.,2]=x^2-b[1]*(b[1]+1)/(b[2]^2);
23 m[.,3]=ln(x)-gradp(&lngamma,b[1])+ln(b[2]);
24 m[.,4]=1/x-b[2]/(b[1]-1);
25 retp(m);
26 endp;
/*
Log of gamma distribution function
*/
27 fn lngamma(x)=ln(gamma(x));
28 #include gpe\gmm.gpe;
Most of the computation details and algorithms of GMM estimation in GAUSS are
grouped in a module named GMM.GPE. There are many ways to include a module
in your program. The simplest is to use the GAUSS compiler directive #include.
It will include the specified module during the compilation of your program. We
suggest including the module GMM.GPE at the end of your program. If you have
properly installed the GPE package with your version of GAUSS, GMM.GPE is
located in the GPE subdirectory. Putting source codes in a separate file hides their
implementation “secrets.” If you are interested in the programming details, you can
examine the program listing of GMM.GPE available in Appendix B-1.
The module GMM.GPE defines two objective functions, gmmqw and gmmq. The
former uses a predefined weighting matrix, while the latter computes the weighting
matrix together with the unknown parameters. In addition, the procedure gmmout
prints the regression output. Since GMM is a nonlinear optimization method, it
requires a user-defined moment function with the name mf which, like the other
functions (e.g. residual function for nonlinear least squares or maximum likelihood
estimation), depends on a sample data matrix x and a parameter vector b.
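To see what mf computes, here is a plain Python transcription of the four gamma moment conditions (our own illustration; like the GAUSS code's use of gradp, it approximates the digamma term by numerical differentiation of ln Gamma):

```python
import math

def digamma(r, h=1e-5):
    """Central-difference derivative of ln Gamma, mimicking gradp(&lngamma, r)."""
    return (math.lgamma(r + h) - math.lgamma(r - h)) / (2.0 * h)

def gamma_moments(x, rho, lam):
    """One row of four moment conditions per observation, as in proc mf:
    E(X) = rho/lam, E(X^2) = rho(rho+1)/lam^2,
    E(ln X) = digamma(rho) - ln(lam), E(1/X) = lam/(rho - 1)."""
    return [[xi - rho / lam,
             xi**2 - rho * (rho + 1) / lam**2,
             math.log(xi) - digamma(rho) + math.log(lam),
             1.0 / xi - lam / (rho - 1)]
            for xi in x]

mm = gamma_moments([1.0, 2.0], 2.0, 1.0)  # two hypothetical observations
```

GMM then drives the sample averages of these four columns toward zero; with only the first and third moments the problem is exactly identified and coincides with maximum likelihood, as noted above.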
One of the key advantages of GMM is that it allows for a flexible specification of
covariance structure, including heteroscedasticity and autocorrelation. We have seen
the use of the GPE control variable _hacv to compute the heteroscedasticity-
autocorrelation-consistent covariance matrix in the context of heteroscedasticity
(Chapter IX) and autocorrelation (Chapter X). _hacv is used similarly for nonlinear
GMM estimation, except that heteroscedasticity-consistent covariance is the default
option here. If we assume first-order autocovariance, then line 10 needs to be
modified as follows:
_hacv = {1,1};
We begin the GMM estimation by minimizing the objective function gmmqw (line
11) with the default weighting matrix I (the identity matrix). The estimated parameters
are then used as the starting point for the next iteration: we start from the estimated
parameters (line 12) and compute the corresponding weighting matrix (line 13),
then the improved consistent parameter estimates are obtained (line 14) and printed
(line 15). We could continue updating the weighting matrix and estimating the
parameters until convergence. Equivalently, we could estimate the parameters
together with the associated weighting matrix. However, finding a convergent
solution is not guaranteed due to a high degree of nonlinearity in the objective
function.
Running lesson12.1, the first set of GMM estimation results, based on the identity
weighting matrix, is as follows:
Initial Result:
Function Value = 6.4658
Parameters = 3.0000 1.0000
Final Result:
Iterations = 8 Evaluations = 123
Function Value = 0.0068077
Parameters = 2.3691 0.74112
Gradient Vector = -0.21423 0.44266
Asymptotic Asymptotic
Parameter Std. Error t-Ratio
X1 2.3691 0.010345 229.02
X2 0.74112 0.0028050 264.21
The second set of GMM estimation results is based on the previous estimates of the
parameters and the associated weighting matrix (see lines 12-14 of the program
lesson12.1). As is standard practice in nonlinear optimization, the estimated standard
errors and t-ratios of the parameters are computed from the inverse of the Hessian
matrix calculated during the minimization of the quadratic criterion function. Note that
these are not the GMM estimates of the standard errors and t-ratios. The correct
estimates of the standard errors and t-statistics of the parameters are computed at the
end. In addition, the Hansen test statistic of the moment restrictions is presented.
Calling the procedure gmmout (see line 15 of the program lesson12.1) gives us the
following result:
Initial Result:
Function Value = 13.294
Parameters = 2.3691 0.74112
Final Result:
Iterations = 5 Evaluations = 103
Function Value = 3.2339
Parameters = 2.8971 0.84839
Gradient Vector = -487.16 -6.2162
Asymptotic Asymptotic
Parameter Std. Error t-Ratio
X1 2.8971 0.0079292 365.37
X2 0.84839 0.062556 13.562
For the two parameters of the gamma distribution, ρ and λ, we now compare their
GMM estimators with the maximum likelihood (ML) estimators obtained earlier
from Lesson 6.3 in Chapter VI. The standard errors are in parentheses.
ML GMM
ρ 2.4106 (0.7161) 2.8971 (0.0044)
λ 0.7707 (0.2544) 0.8484 (0.120)
contemporaneous correlation with the model’s errors. The advantage of GMM over
IV is that the model need not be homoscedastic and serially independent. The
covariance matrix of the averages of sample moments is taken into account for
minimizing the GMM criterion function.
Recall that the objective function, known as the GMM criterion function, to be
minimized is Q(β) = m(β)'W m(β). We would optimally choose W to be equal to
[Var(m(β))]⁻¹. To find the β* which minimizes Q(β), we solve the zero-gradient
conditions ∂Q(β*)/∂β = 0. Our estimator β* of β will be asymptotically efficient and
normally distributed with mean β and covariance matrix:
Var(β*) = {G(β*)'[Var(m(β*))]⁻¹G(β*)}⁻¹
where G(β) = ∂m(β)/∂β denotes the Jacobian matrix of the moment functions.
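To make the two-step logic concrete, here is a small numerical sketch written in Python with NumPy rather than GAUSS. The three moment conditions for a mean-variance problem are an illustrative assumption of this sketch, not the chapter's gamma-distribution example: the first step uses the identity weighting matrix, the second step re-weights with the inverse covariance of the sample moments.

```python
import numpy as np

def moments(theta, x):
    # Three moment conditions in two parameters (mu, s2), so the model is
    # overidentified by one restriction (purely illustrative):
    # E(x - mu) = 0, E(x^2 - mu^2 - s2) = 0, E(x^3 - mu^3 - 3*mu*s2) = 0.
    mu, s2 = theta
    return np.column_stack([x - mu,
                            x**2 - (mu**2 + s2),
                            x**3 - (mu**3 + 3.0 * mu * s2)])

def gmm_criterion(theta, x, W):
    # Q(beta) = m(beta)' W m(beta), with m(beta) the averaged sample moments
    m = moments(theta, x).mean(axis=0)
    return float(m @ W @ m)

def efficient_weight(theta, x):
    # Optimal W = [Var(m(beta))]^-1, estimated from the sample moments
    M = moments(theta, x)
    return np.linalg.inv(M.T @ M / len(x))

# First step: crude estimate under identity weighting; second step:
# re-weight with the inverse covariance of the first-step moments.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)
theta1 = (x.mean(), x.var())          # first-step estimate
W2 = efficient_weight(theta1, x)      # second-step weighting matrix
q = gmm_criterion(theta1, x, W2)
```

An actual estimator would minimize the criterion over theta at each step; the sketch only evaluates it, which is enough to see the role of the weighting matrix.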
Nonlinear IV Estimation
Now we consider the regression model ε = ε(β) = F(Y,X,β) (or Y - f(X,β)), where Y
is the endogenous or dependent variable, and X consists of predetermined (or
independent) variables. β is a K-element parameter vector. Let Z be a set of L
instrumental variables, for which we assume L ≥ K. Under the general assumptions
that E(ε) = 0 and Var(ε) = E(εε') = Σ = σ²Ω, the instrumental variables satisfy the
moment condition E(Z'ε) = 0. The sample moment functions are defined by
m(β) = 1/N Z'ε(β) with covariance matrix:
Var(m(β)) = 1/N² Z'ΣZ = σ²/N² Z'ΩZ
Linear IV Estimation
If the model is linear, or ε = ε(β) = Y - Xβ, then the GMM estimator of β is
equivalent to the IV estimator:
β* = (X'Z[Z'Σ(β*)Z]⁻¹Z'X)⁻¹ X'Z[Z'Σ(β*)Z]⁻¹Z'Y
Var(β*) = {X'Z[Z'Σ(β*)Z]⁻¹Z'X}⁻¹
If the instrumental variables coincide with the regressors (Z = X), this reduces to
β* = (X'X)⁻¹X'Y
Var(β*) = (X'X)⁻¹[X'Σ(β*)X](X'X)⁻¹
Special Cases
If the IV model is homoscedastic and serially uncorrelated, that is Σ = σ²I, then
β* = (X'Z[Z'Z]⁻¹Z'X)⁻¹X'Z[Z'Z]⁻¹Z'Y
Var(β*) = σ²(β*){X'Z[Z'Z]⁻¹Z'X}⁻¹
and, with Z = X in addition, the estimator is simply ordinary least squares:
β* = (X'X)⁻¹X'Y
Var(β*) = σ²(β*)(X'X)⁻¹
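The algebra above translates directly into a few lines of matrix code. The following Python/NumPy sketch is an illustration only (the chapter's own implementation is the GPE package in GAUSS); it computes the IV estimator for a given Σ and demonstrates the Z = X special case numerically:

```python
import numpy as np

def linear_gmm_iv(Y, X, Z, Sigma=None):
    # beta* = (X'Z [Z'Sigma Z]^-1 Z'X)^-1 X'Z [Z'Sigma Z]^-1 Z'Y
    # With Sigma = I this is the classical IV (2SLS) estimator.
    if Sigma is None:
        Sigma = np.eye(len(Y))
    A = np.linalg.inv(Z.T @ Sigma @ Z)          # [Z'Sigma Z]^-1
    XZ = X.T @ Z
    beta = np.linalg.solve(XZ @ A @ Z.T @ X, XZ @ A @ Z.T @ Y)
    # Var(beta*) = {X'Z [Z'Sigma Z]^-1 Z'X}^-1
    # (times the scalar sigma^2 in the homoscedastic case)
    var = np.linalg.inv(XZ @ A @ Z.T @ X)
    return beta, var

# When Z = X the formula collapses to ordinary least squares (X'X)^-1 X'Y.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)
beta_iv, _ = linear_gmm_iv(Y, X, X)
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
```

The equality of `beta_iv` and `beta_ols` here is an algebraic identity, not a numerical coincidence.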
Hypothesis Testing
Based on the statistical inference for nonlinear regression models (see Chapter VII,
Lesson 7.3 in particular), there are three corresponding test statistics for testing linear
or nonlinear restrictions on the GMM estimate of β. Suppose there are J constraint
equations written in the form c(β) = 0. Let β* be the unconstrained GMM estimator
of β, and let b* be the constrained estimator. All three test statistics discussed below
follow a Chi-square distribution with J degrees of freedom.
Wald Test
The Wald test statistic, based on the unconstrained estimator β*, is defined as:
W = c(β*)'[Var(c(β*))]⁻¹c(β*)
= c(β*)'{(∂c(β*)/∂β)[Var(β*)](∂c(β*)/∂β)'}⁻¹c(β*)
Lagrangian Multiplier Test
Let α = ∂Q(b*)/∂β be the gradient of the criterion function evaluated at the
constrained estimator b*. The LM test statistic is:
LM = α[Var(α)]⁻¹α'
= m(b*)'W G(b*)[G(b*)'W G(b*)]⁻¹G(b*)'W m(b*)
Likelihood Ratio Test
The likelihood ratio type test statistic is the difference between the constrained and
unconstrained values of the criterion function:
LR = Q(b*) - Q(β*)
We note that both β* and b* are computed using the same consistent estimator of the
weighting matrix W.
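As a quick illustration of the Wald formula (again in Python/NumPy for exposition only), consider a single linear restriction c(β) = Rβ - q, for which the Jacobian ∂c/∂β is just R. The point estimates below are the gamma-distribution estimates reported earlier, but the diagonal Var(β*) is an assumed simplification for this sketch:

```python
import numpy as np

def wald_stat(c_val, jac, var_beta):
    # W = c(beta*)' { (dc/dbeta) Var(beta*) (dc/dbeta)' }^-1 c(beta*)
    c_val = np.atleast_1d(c_val)
    mid = jac @ var_beta @ jac.T
    return float(c_val @ np.linalg.solve(mid, c_val))

beta = np.array([2.3691, 0.74112])          # unconstrained estimates
V = np.diag([0.010345**2, 0.0028050**2])    # assumed diagonal Var(beta*)
R = np.array([[1.0, -1.0]])                 # restriction: beta1 - beta2 = 0
W = wald_stat(R @ beta, R, V)               # compare to Chi-square with 1 df
```

With J restrictions, the statistic is referred to the Chi-square distribution with J degrees of freedom, as stated above.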
Consider a consumer who maximizes the expected discounted lifetime utility
max E[ Σ τ=0,1,2,... β^τ u(Ct+τ) | Zt ]
where Zt is the information available to the consumer at time t and Ct+τ is the
consumption τ periods from t. 0 < β < 1 is known as the discount factor of time
preference. Given N different stocks, the optimal consumption-investment allocation
must satisfy the following condition:
u'(Ct) = β E[ u'(Ct+1)(Pi,t+1+Di,t+1)/Pi,t | Zt ]
for i = 1,...,N. u'(Ct) = ∂u/∂Ct is the marginal utility of consumption at time t. Pi,t+1 is
the price of stock i at time t+1 and Di,t+1 is the dividend per share of stock i at t+1.
The ratio (Pi,t+1+Di,t+1)/Pi,t represents the returns of investment in stock i between
periods t and t+1. In other words, this merely defines the equilibrium condition that
the marginal utility of consumption in the current period must equal the expected
return next period from investing in stock i. Assume that the utility function exhibits
constant relative risk aversion:
u(Ct) = Ct^α/α
where 1-α is known as the coefficient of relative risk aversion, and 1-α > 0. Then,
for each i = 1, ..., N, the optimal decision rule is
E[ β (Ct+1/Ct)^(α-1) (Pi,t+1+Di,t+1)/Pi,t | Zt ] = 1
For a more detailed description of the model, see Hansen and Singleton (1982).13 In
terms of our econometric estimation, the model may be expressed with the
orthogonality condition E[Z ε(X,θ)] = 0, where X = [X1,X2,X3], θ = (β,α), and
ε(X,θ) = [ β X1^(α-1) X2 - 1,  β X1^(α-1) X3 - 1 ]
The data file gmmq.txt installed in the GPE subdirectory consists of three variables
(335 observations from January 1959 to December 1978, though not the original
Hansen-Singleton data):
X1 = Ct+1/Ct, the ratio of consumption in consecutive periods;
X2 = VWR, the value-weighted average return on assets;
X3 = RFR, the risk-free rate of return.
We note that this model consists of a system of two nonlinear equations. The
instrumental variables Z consist of one or several lags of X and a constant. The
following program lesson12.2 implements and estimates the Hansen-Singleton
rational expectations model. The program structure looks similar to that of Lesson
12.1. The main difference is the model specification described in the block from line
19 to line 27 for the moment function procedure mf(x,b), which reflects exactly
the model described above. We use one lag of each of the three variables X1, X2, X3,
and a constant as the instrumental variables (see line 22). More lags may be included
for the additional instrumental variables. The two orthogonality equations for stock
and risk-free returns are concatenated to form the moment functions system (see
lines 23 and 24). The rest of the program performs the GMM estimation. First, the
initial estimates of (β,α) are obtained by assuming the default identity weighting
matrix in the objective function for minimization (line 10). Then, with the resulting
covariance matrix of the moment functions, the consistent parameter estimates are
calculated (see lines 11-13). An efficient solution is obtained from simultaneously
estimating the parameters and the corresponding covariance matrix in the second
iteration (see lines 15 and 16).
/*
** Lesson 12.2: A Nonlinear Rational Expectation Model
** GMM Estimation of Hansen-Singleton Model (Econometrica, 1982)
*/
1 use gpe2;
2 output file=gpe\output12.2 reset;
3 load x[335,3]=gpe\gmmq.txt; @ data columns: @
@ (1) c(t+1)/c(t) (2)vwr (3)rfr @
4 call reset;
5 _nlopt=0;
6 _method=5;
7 _tol=1.0e-5;
8 _iter=100;
13 For the computational implementation of the model and the data file gmmq.txt used in this
lesson example, see also the Hansen-Heaton-Ogaki GMM package from the American
University GAUSS archive at
http://www.american.edu/academic.depts/cas/econ/gaussres/GMM/GMM.HTM.
18 end;
/*
User-defined moments functions, must be named mf
*/
19 proc mf(x,b);
20 local z,n,m;
21 n=rows(x);
22 z=ones(n,1)~lagn(x,1); @ IV @
23 m=z.*(b[1]*(x[.,1]^(b[2]-1)).*x[.,2]-1);
24 m=m~(z.*(b[1]*(x[.,1]^(b[2]-1)).*x[.,3]-1));
25 @ nonlinear multiple equations system @
26 retp(packr(m));
27 endp;
28 #include gpe\gmm.gpe;
The first part of the output (the first iteration, using the identity matrix as the
weighting matrix) is only preparation for computing the consistent parameter
estimates in the second iteration of the GMM estimation. The result of the second,
consistent estimation is shown below:
Initial Result:
Function Value = 55.406
Parameters = 0.99977 -0.00012883
Final Result:
Iterations = 13 Evaluations = 239
Function Value = 9.4685
Parameters = 0.99919 0.85517
Gradient Vector = -0.00026951 0.00017303
Asymptotic Asymptotic
Parameter Std. Error t-Ratio
X1 0.99919 0.00012573 7946.9
X2 0.85517 0.044497 19.219
The consistent estimates (0.9992, 0.8552) of the two parameters are obtained using 8
instrumental variables (4 for each of the two equations). The Hansen test statistic of
the extra 6 moment restrictions is barely statistically significant at the 5 percent level
(the critical value of the Chi-square distribution with 6 degrees of freedom is 12.59).
Initial Result:
Function Value = 13.306
Parameters = 0.99919 0.85517
Final Result:
Iterations = 7 Evaluations = 127
Function Value = 12.474
Parameters = 0.99950 0.78735
Gradient Vector = 0.0069545 0.0056644
Asymptotic Asymptotic
Parameter Std. Error t-Ratio
X1 0.99950 9.8686e-005 10128.
X2 0.78735 0.037531 20.979
The final GMM estimate of (β,α) at (0.9995, 0.7874) is consistent with the result of
Hansen and Singleton (1982). Since it is the convergent solution, it is also efficient.
However, the 6 moment restrictions are significant only at the 10 percent level.
Linear GMM
GMM estimation of a linear regression model is essentially the same as IV
estimation with a general covariance matrix. In Chapters IX and X, we discussed the
computation of White-Newey-West estimates of heteroscedasticity-autocorrelation-
consistent covariance by using the GPE control variable _hacv. In the case of the
IV approach, Lesson 11.2 introduced the use of another GPE control variable,
_ivar, to specify a set of instruments for linear model estimation. _ivar may be
internally determined in the case of autocorrelation correction for a lagged dependent
variable model. Alternatively, it may be externally specified as a data matrix
consisting of a list of instrumental variables. Refer to your econometrics textbook for
the requirements and proper use of instrumental variables in a regression model.
In GPE, _ivar may be combined with _hacv to carry out GMM estimation. In
addition, as in a nonlinear model estimation, we can set the control variable _iter
to allow for iterated computation of the parameter estimates. If the solution
converges, it is efficient. Even if it does not converge, the GMM estimator is still
consistent.
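The weighting behind _hacv can be pictured as a Bartlett-kernel (Newey-West) estimate of the covariance of the sample moments. The following Python/NumPy fragment is a generic construction for illustration; GPE's internal implementation details may differ:

```python
import numpy as np

def newey_west(M, lags):
    # M: N x J matrix of per-observation moments (e.g. each row Z_t * e_t).
    # Returns the HAC estimate S = Gamma_0 + sum_j w_j (Gamma_j + Gamma_j')
    # with Bartlett weights w_j = 1 - j/(lags+1).
    N = M.shape[0]
    S = M.T @ M / N                      # Gamma_0 (White-type term)
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)
        G = M[j:].T @ M[:-j] / N         # j-th order autocovariance Gamma_j
        S += w * (G + G.T)
    return S

rng = np.random.default_rng(2)
M = rng.normal(size=(200, 3))
S0 = newey_west(M, 0)    # no autocorrelation terms: heteroscedasticity only
S1 = newey_west(M, 1)    # one lag, as with _hacv={1,1}
```

With zero lags only the heteroscedasticity correction remains, which corresponds to the White-type estimate discussed in Chapters IX and X.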
Consider a simple autoregressive consumption function:
Ct+1 = β0 + β1Ct + εt+1
Ct+1 and Ct are expected and current consumption, respectively. Let Zt be historical
information available to the consumer at time t or earlier. Then the orthogonality
condition becomes:
E(Ztεt+1) = 0
From other theories of consumption, the instrumental variables Z may include levels
of income Y and consumption C in addition to a constant. That is,
Zt = [ 1 Ct Yt]
Further lags of C and Y may be added to the model as needed. The consumption-
income relationship was studied in Chapter XI with income as the explanatory
variable. In this example, based on the Euler equation of a constrained expected
utility maximization, future consumption is affected only indirectly by current and
past income, which serve as instrumental variables. Using the U.S. time series data
of usyc87.txt, the following program lesson12.3 carries out the GMM estimation of
this linear regression model:
/*
** Lesson 12.3: GMM Estimation of U.S. Consumption Function
** GMM Estimation of a Linear Regression Model
*/
1 use gpe2;
2 output file=gpe\output12.3 reset;
3 load z[67,3]=gpe\usyc87.txt;
4 y = z[2:67,2];
5 c = z[2:67,3];
6 call reset;
7 _names={"c","c1"};
8 _rstat=1;
9 _rplot=2;
10 _dlags=1;
11 _ivar=ones(rows(y),1)~lagn(c~y,1);
12 _hacv={1,1};
13 _iter=100;
14 call estimate(c,0);
15 end;
Two variables, named C and Y, are read from the data file usyc87.txt. Line 10,
_dlags = 1, specifies the lagged dependent variable model. GMM estimation for
the linear autoregressive consumption function is given in line 14, with the
instrumental variables _ivar specified in line 11. We use only the first lag of
income and consumption variables in addition to a constant as the instruments. The
first-order autocovariance structure is specified in line 12, in which the White-
Newey-West estimate will be computed. The computation of GMM estimates will be
iterated until convergence or until the limit set in line 13, _iter = 100. The
empirical results in greater detail can be found in the output file output12.3.
XIII
Systems of Simultaneous Equations
GPE can estimate systems of linear and nonlinear equations. For a system of linear
equations, you need to define the endogenous and predetermined (including lagged
endogenous, current and lagged exogenous) variables. By selecting and identifying
the relevant variables for each equation, the procedure estimate carries out the
system model estimation as in the case of a single equation model. For a system of
nonlinear equations, it becomes more involved to define the functional form for the
model equations. In this chapter, the classic example of Klein Model I (1950) is used
to demonstrate the estimation of a system of linear regression equations. The special
case of seemingly unrelated regressions (SUR) is considered with the Berndt-Wood
model of energy demand. Finally, re-examining the Klein Model, nonlinear
maximum likelihood estimation is shown to be useful for estimating a system of
nonlinear equations.
YB + XΓ = U
Given the data matrices Y (N observations on G endogenous variables) and X (N
observations on K predetermined variables), the unknown parameters in B (G×G)
and Γ (K×G) can be
estimated using a variety of methods. GPE implements both single equation (limited
information) methods and simultaneous equations (full information) methods. Before
the parameter estimation, the model of multiple equations must be properly specified
to account for the relevant variables and restrictions. In GPE, this is done by
specifying the structure of the parameter matrices B and Γ. It uses a couple of
specification matrices to define the stochastic equations and fixed identities of the
system by representing the parameter matrices of the system as arrays of 1’s, 0’s, and
–1’s, signifying the inclusion or exclusion of variables from particular equations. In
the following, we discuss the three most important input variables that control the
specification and estimation of a simultaneous linear equations model.
• _eq
• _id
• _method
First, the variable _eq specifies the stochastic equation matrix for system model
estimation. This is a Gs by (G+K) matrix with elements -1, 0, and 1 arranged in
accordance with the order of endogenous variables Y followed by the predetermined
variables X. Note that Gs is the number of stochastic equations and G ≥ Gs. For each
stochastic equation, there is exactly one row of _eq to define it. In the stochastic
equation specification matrix _eq, an element -1 indicates the left-hand side
endogenous variable. Only one -1 entry is allowed in each equation. An element 1
indicates the use of an endogenous and/or a predetermined variable on the right-hand
side of an equation. The zeros indicate the corresponding unused or excluded
variables in the equation. Constant terms are not normally included in the equation
specification. If _eq is not specified, or _eq=0 by default, a SUR equations system
is assumed. In this case, Gs = G, and the _eq matrix consists of a G×G sub-matrix
with -1 on the diagonal and zeros elsewhere (the endogenous variables portion of
_eq), and a G×K sub-matrix consisting entirely of ones (the predetermined variables
portion of _eq).
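The default SUR structure just described is easy to picture. Here is a hypothetical helper, in Python/NumPy rather than GAUSS, that builds the default _eq matrix for G endogenous and K predetermined variables:

```python
import numpy as np

def default_eq(G, K):
    # Endogenous block: -1 on the diagonal (each equation's own LHS variable);
    # predetermined block: all ones (every X appears in every equation).
    return np.hstack([-np.eye(G, dtype=int), np.ones((G, K), dtype=int)])

eq = default_eq(3, 4)    # 3 stochastic equations, 4 predetermined variables
```

Each row is one stochastic equation; a user-supplied _eq simply replaces selected 1's and 0's to include or exclude particular variables.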
The second input variable _id specifies the identity equation specification matrix
for a system model. _id is similar in size and setup to _eq, except that its entries
can be any value as required by the model. If _id is not specified, or _id=0 by
default, there is no identity. To ensure system compatibility, the number of rows in
two specification matrices _eq and _id must sum to G, the total number of
endogenous variables or equations in the system.
The input variable _method controls the use of the specific method of estimation.
In the context of simultaneous linear equations, the available estimation methods are:
_method = 0  Ordinary Least Squares (OLS)
_method = 1  Limited Information Maximum Likelihood (LIML)
_method = 2  Two-Stage Least Squares (2SLS)
_method = 3  Three-Stage Least Squares (3SLS)
_method = 4  Full Information Maximum Likelihood (FIML)
Note that LIML and FIML are not true nonlinear maximum likelihood estimation
methods. Instead they are types of instrumental variables estimation. In GPE, three
variants of the FIML method are available:
_method = {4,0}  FIML based on instrumental variables (equivalent to _method = 4)
_method = {4,1}  Linearized FIML
_method = {4,2}  FIML by the Newton method
2SLS and 3SLS are flavors of the instrumental variables estimation method, where
the instrumental variables used are all the predetermined variables in the system. For
estimation of a linear system model, external instrumental variables may be
requested and specified in the matrix _ivar. The data matrix _ivar will be
combined with all the predetermined variables to form the basis for instrumental
variable estimation. A constant term is automatically included in _ivar. For
technical details and comparisons of different estimation methods for a linear
equations system, refer to your econometrics textbook.
SYSTEM OF SIMULTANEOUS EQUATIONS
Consider the Klein Model I (1950):
C = α0 + α1P + α2P-1 + α3(W1+W2) + ε1  (Consumption)
I = β0 + β1P + β2P-1 + β3K-1 + ε2  (Investment)
W1 = γ0 + γ1X + γ2X-1 + γ3A + ε3  (Private Wages)
X = C + I + G
P = X - T - W1
K = K-1 + I
where C is consumption, I investment, W1 and W2 the private and government wage
bills, X total output, P profits, K the capital stock, G government spending, T
business taxes, and A a time trend. The first three equations are stochastic with
unknown parameters α’s, β’s, and γ’s, respectively. The remaining three equations
are accounting identities. Since the sum of private and public wage bills (W1+W2)
appears in the first equation, it is more convenient to introduce one additional
endogenous variable W, the total wage bill, with the accounting identity:
W = W1 + W2
We explain the construction of the first line of these two matrices, and leave you to
puzzle out the rest.
First, the _eq matrix (line 21) specifies which variables are to be included in which
stochastic equations. Recall the first stochastic equation in the Klein Model I:
C = α0 + α1P + α2P-1 + α3(W1+W2) + ε1
Note that the column under the variable C contains a –1. This means that C is the
left-hand side variable of the first stochastic equation. Since I and W1 are not in the
first stochastic equation, we place 0’s in their respective columns. Looking again at
this equation, we see that the right-hand side variables are: constant, P, P-1, and W
(remember that W = W1+ W2). To let GPE know that P, P-1, and W are in the first
equation, we place 1’s in their respective places in the _eq matrix. This does not
mean that their coefficients are restricted to be equal to one, it merely tells GPE to
include those particular variables in the equation. GPE includes the constant
automatically. All other variables (namely X, K, X-1, K-1, W2, A, G, T) are not in the
first equation. Putting 0’s in the respective places of these variables in the _eq
matrix lets GPE know that it should not include these variables in the first stochastic
equation of the system.
/*
** Lesson 13.1: Klein’s Model I
** Simultaneous Equation System of Klein’s Model I
*/
1 use gpe2;
2 output file = gpe\output13.1 reset;
3 load data[23,10] = gpe\klein.txt;
16 yvar=c~i~w1~x~p~k~w;
17 xvar=lag1(x~p~k)~w2~a~g~t;
18 call reset;
19 _names={"c","i","w1","x","p","k","w",
"x-1","p-1","k-1","w2","a","g","t"};
20 _vcov=1;
@ C I W1 X P K W XL PL KL W2 A G T 1 @
21 _eq = {-1 0 0 0 1 0 1 0 1 0 0 0 0 0,
0 -1 0 0 1 0 0 0 1 1 0 0 0 0,
0 0 -1 1 0 0 0 1 0 0 0 1 0 0};
22 _id = { 1 1 0 -1 0 0 0 0 0 0 0 0 1 0,
0 0 -1 1 -1 0 0 0 0 0 0 0 0 -1,
0 1 0 0 0 -1 0 0 0 1 0 0 0 0,
0 0 1 0 0 0 -1 0 0 0 1 0 0 0};
23 _begin=2;
30 _iter=100;
31 _method=3; @ 3SLS estimation (iterative) @
32 call estimate(yvar,xvar);
33 _method=4; @ FIML estimation @
34 call estimate(yvar,xvar);
35 end;
Similarly, line 22 specifies the identity equations of the model. Take the first identity
equation as an example:
X=C+I+G
It involves the variables X, C, I, and G. Therefore, in the first row of the _id
matrix, only the relevant columns have non-zero values. Typically these entries are 1
or -1, but they could be any other values as required. Variables not used in the
definition of an identity have zeros in the corresponding places of the _id matrix.
The first row of the _id matrix looks like this:
@ C I W1 X P K W XL PL KL W2 A G T 1 @
_id = { 1 1 0 -1 0 0 0 0 0 0 0 0 1 0, …
The easiest way to understand the construction of the system model described so far
is to relate it with the matrix representation:
YB + XΓ = U
Because of the use of lag variables, the first observation of the data is lost and
estimation must start from the second observation (see line 23). In lines 24 through
34, five estimation methods are carried out. They are OLS, LIML, 2SLS, 3SLS, and
FIML. It is of interest to see the covariance matrices of the equations, thus line 20
sets the option _vcov=1 to show the variance-covariance matrix across equations
and across parameters as well. Note that 3SLS and FIML are iterative methods, and
it is wise to set an iteration limit for the solution to converge. Line 30, _iter=100,
does the job.
Running the program of Lesson 13.1 will generate about 20 pages of output. To save
space, we will only present the results of 2SLS because of its popularity in the
literature. You should run the program in its entirety and check the output file to see
the complete results. In a summary table, the parameter estimates of these methods
are listed and compared. You need to check your econometrics textbook for the
evaluation of the pros and cons of these estimation methods, in particular the
differences between limited information and full information estimation methods.
The regression results of a typical linear system model are divided into two parts: the
results of the system of equations as a whole, and the results of each separate
equation. Here we present only the first part of 2SLS for the estimated parameters,
including the variance-covariance matrices across equations. For the rest of
regression results by equation, we refer to the output file output13.1.
P-1 0.026499
K-1 -0.0038717 0.0013051
CONSTANT 0.77760 -0.26903 56.892
X -0.00033125 -9.0028E-005 0.013436 0.0012696
X-1 0.00061121 9.9812E-005 -0.020120 -0.0011997 0.0015082
A -0.00066991 0.00017448 -0.032062 -0.00030915 5.1023E-005
A 0.00084920
CONSTANT 0.015608 1.3174
A CONSTANT
The system estimation methods such as 3SLS and FIML may be iterated until the
convergent solution is found. To control the iteration, as in the case of nonlinear
iteration methods, the following input control variables may be applied: _iter,
_tol, and _restart. We refer readers to Appendix A for more details of these
control variables. For example, the statement _iter=100 of line 30 sets the
maximum number of iterations for 3SLS and FIML estimation. This ensures the near
efficiency (theoretically speaking) of the parameter estimates, provided a solution is
found before the iteration limit is exhausted. A warning is issued, indicating that the
results may be unreliable, when the iterations exceed the limit.
The statement _method=4 (line 33) indicates that the instrumental variables method
of FIML is used for estimation. Stating the scalar 4 is equivalent to the vector {4,0}.
The alternatives are setting _method to {4,1} for linearized FIML, or to {4,2} for
the Newton method. It is mind-boggling that, for the same problem, not all the FIML
methods will converge to the same solution (provided a solution converges at all). It
is now common wisdom that different methods may produce different results, due to
the different algorithms in use for nonlinear model estimation.
We now present the summary table of the parameter estimates obtained from five
methods we use in estimating the Klein Model I. Numbers in parentheses are
asymptotic standard errors.
/*
** Lesson 13.2: Klein’s Model I Reformulated
** Using _dlags and _restr
*/
1 use gpe2;
2 output file = gpe\output13.2 reset;
3 load data[23,10] = gpe\klein.txt;
18 call reset;
19 _names={"c","i","w1","x","p","k",
"w2","a","g","t"};
20 _vcov=1;
22 _id = { 1 1 0 -1 0 0 0 0 1 0,
0 0 -1 1 -1 0 0 0 0 -1,
0 1 0 0 0 -1 0 0 0 0};
27 _iter=100;
28 _method=3; @ 3SLS estimation (iterative) @
29 call estimate(yvar,xvar);
We notice the use of _dlags in line 23 to specify the lagged endogenous variables
in each equation. _dlags is a G×G matrix whose entries indicate the number of
lags for each endogenous variable (column) in each equation (row). If there is no lag
of a particular endogenous variable in a particular equation, the corresponding entry
in _dlags is set to 0. The resulting lag variables are appended to the list of
endogenous variables to form the complete system. Instead of hard-coding the
relevant lags as in Lesson 13.1, the advantage of using _dlags to specify the
model’s lag structure is to let GPE control the dynamic specification of the model.
This feature will be useful for forecasting and simulation.
Leaving Klein Model I for the moment, let’s move on to consider a special class of
simultaneous linear regression equations which has broad application. More
examples involving the usage of the above mentioned input control variables will
follow.
Y = XΓ + E
The system is seemingly unrelated because of the correlated error structure of the
model due to the embedded parameter restrictions or data constraints. In other words,
errors in one equation may be correlated with errors in other equations.
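Before turning to the application, the SUR estimation idea itself can be sketched numerically. The following Python/NumPy fragment is an illustration, not GPE's implementation: it performs two-step feasible GLS and demonstrates the textbook result that with identical regressors in every equation, SUR reproduces equation-by-equation OLS.

```python
import numpy as np

def sur_fgls(Ys, Xs):
    # Feasible GLS for a SUR system. Ys: list of G response vectors,
    # Xs: list of G regressor matrices, all with N observations.
    G, N = len(Ys), len(Ys[0])
    # Step 1: equation-by-equation OLS to estimate the residual covariance
    ols = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xs, Ys)]
    E = np.column_stack([y - X @ b for y, X, b in zip(Ys, Xs, ols)])
    Sig = E.T @ E / N
    # Step 2: GLS on the stacked system with weight inv(Sig) kron I
    Xbig = np.zeros((G * N, sum(X.shape[1] for X in Xs)))
    c = 0
    for g, X in enumerate(Xs):
        Xbig[g * N:(g + 1) * N, c:c + X.shape[1]] = X
        c += X.shape[1]
    W = np.kron(np.linalg.inv(Sig), np.eye(N))
    ybig = np.concatenate(Ys)
    A = Xbig.T @ W
    return np.linalg.solve(A @ Xbig, A @ ybig)

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
Y1 = X @ np.array([1.0, 0.5]) + rng.normal(size=30)
Y2 = X @ np.array([-1.0, 2.0]) + rng.normal(size=30)
b = sur_fgls([Y1, Y2], [X, X])
```

SUR therefore pays off only when the equations differ in their regressors or are bound together by cross-equation restrictions, exactly the situation of the model below.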
Consider the Berndt-Wood model, based on a translog unit cost function for the
three factors capital (K), labor (L), and energy (E):
ln(c) = β0 + Σi βi ln(pi) + ½ Σi Σj βij ln(pi)ln(pj),  i, j = K, L, E
where c = (C/PM)/Q is the normalized unit cost (C is total cost and Q is total output),
and the normalized factor price is pi = Pi/PM for i = K, L, E. All the βs are unknown
parameters of the cost function. Invoking Shephard's Lemma, we can derive the
factor shares as Si = PiXi/C = ∂ln(c)/∂ln(pi), where Xi is the quantity demanded of the
i-th factor (i = K, L, E). Therefore, adding the error terms, the system of factor
demand equations for model estimation is written as:
Si = βi + Σj βij ln(pj) + εi,  i, j = K, L, E
/*
** Lesson 13.3: Berndt-Wood Model
** Seemingly Unrelated Regression Estimation
** Factor Shares System with Symmetry Restrictions
*/
1 use gpe2;
3 n=26;
4 load x[n,6] = gpe\bwq.txt;
5 year=x[2:n,1];
6 qy=x[2:n,2];
7 qk=x[2:n,3];
8 ql=x[2:n,4];
9 qe=x[2:n,5];
10 qm=x[2:n,6];
11 load x[n,6] = gpe\bwp.txt;
12 py=x[2:n,2];
13 pk=x[2:n,3];
14 pl=x[2:n,4];
15 pe=x[2:n,5];
16 pm=x[2:n,6];
25 yv=sk~sl~se;
26 xv=pk~pl~pe;
27 call reset;
28 _names={"sk","sl","se","pk","pl","pe"};
29 _restr = { 0 1 0 -1 0 0 0 0 0 0,
0 0 1 0 0 0 -1 0 0 0,
0 0 0 0 0 1 0 -1 0 0};
30 _method=0;
31 call estimate(yv,xv);
32 _method=1;
33 call estimate(yv,xv);
34 _method=2;
35 call estimate(yv,xv);
36 _iter=50;
37 _method=3;
38 call estimate(yv,xv);
39 _method=4;
40 call estimate(yv,xv);
41 _method={4,1};
42 call estimate(yv,xv);
43 _method={4,2};
44 call estimate(yv,xv);
45 end;
The program is rather straightforward. It loads the two data files and calculates the
necessary variables such as factor shares and normalized prices needed for model
estimation. We do not use either _eq or _id to define the equations system. First,
there are no identity equations. All the equations in the model are stochastic. The
model is in the form Y = XΓ + E where Y corresponds to yv and X corresponds to
xv in the program, and all variables in the xv matrix appear in each equation. Recall
that this is exactly the default structure of the _eq matrix.
The more challenging task is to specify the symmetry condition for the parameters.
As shown in lesson13.2, linear restrictions are expressed in Rβ= q and _restr =
[R|q]. Recall that the number of rows in _restr is the number of restrictions
imposed. As explained before, the restrictions are stacked horizontally in matrix R,
and q is a column vector of the restricted values. There are three symmetry
restrictions for the Berndt-Wood Model across three equations: βKL = βLK, βKE = βEK,
and βLE = βEL. Line 29 does exactly that by setting the correct entries in the _restr
matrix. No restrictions on the constant terms are needed.
The first row of _restr corresponds to the first restriction βKL = βLK. The entry for
the variable PL of SK equation is set to 1, while the entry for the variable PK of SL
equation is –1. Since there is a zero for the q’s column, it amounts to βKL - βLK = 0,
or βKL = βLK. By the same token, the other two restrictions, βKE = βEK and βLE = βEL,
are expressed in the second and third rows of _restr, respectively:
@ PK PL PE|PK PL PE|PK PL PE| q @
_restr = { 0 1 0 -1 0 0 0 0 0 0,
0 0 1 0 0 0 -1 0 0 0,
0 0 0 0 0 1 0 -1 0 0};
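A quick way to convince yourself that these rows encode the symmetry conditions is to apply R to a stacked coefficient vector. In the Python sketch below (for checking only; the 9-element ordering mirrors the slope columns of the three share equations), a symmetric coefficient vector satisfies Rβ = q exactly, while breaking one symmetry violates it:

```python
import numpy as np

# Slopes stacked as [bKK bKL bKE | bLK bLL bLE | bEK bEL bEE]
R = np.array([[0, 1, 0, -1, 0, 0,  0,  0, 0],   # bKL - bLK = 0
              [0, 0, 1,  0, 0, 0, -1,  0, 0],   # bKE - bEK = 0
              [0, 0, 0,  0, 0, 1,  0, -1, 0]])  # bLE - bEL = 0
q = np.zeros(3)

sym = np.array([.3, .1, .05, .1, .4, .02, .05, .02, .2])   # symmetric slopes
asym = sym.copy()
asym[3] = 0.0                                              # breaks bKL = bLK
```

The number of rows of R (and of _restr) is the number of restrictions, here three, matching the symmetry conditions of the model.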
Although we estimate the model with all the available methods, there are only two
sets of solutions. One is from the limited information method, and the other from full
information. Of course, the single equation method is not appropriate for a seemingly
unrelated equations system in which parameter restrictions bind the equations
together. Cross equation covariance is of interest in the multiple equations system.
Iterative 3SLS and FIML are the methods of choice for this particular class of model.
In the literature, the estimated parameters of a production function are rich in terms
of elasticity interpretation. For example, the Allen partial elasticities of substitution
are simple to compute once the parameters have been estimated, according to the
formula:
σij = (βij + SiSj)/(SiSj) for i ≠ j, and σii = (βii + Si(Si-1))/Si²
We leave the elasticity interpretation of the production function to the reader. Here is
the summary output of 3SLS estimation of the Berndt-Wood Model (see
output13.3 for the estimated results by equation):
Estimation Range = 1 25
The unit cost function itself may be added to the model of Lesson 13.3 to form a
four-equation system. The idea
is to explicitly estimate the cost function from which the factor share equations are
derived. In particular, this model allows us to estimate the scale parameter β0 of the
cost function. In addition, both first-order βi and second-order βij parameters are
constrained to equal the corresponding parameters of the factor share equations.
The parameter restrictions are certainly more involved in the extended model. Since
the restrictions involve constant terms of each equation, we need to address the issue
of regression intercept explicitly. In lesson13.4 below, we first define the constant
vector one in line 32, and include it in the list of exogenous variables xv in line 34.
The model is then estimated without the automatic constant term, or _const=0 (line
38). Line 40 specifies 12 linear restrictions on the 22 parameters of the system,
including the constant term of each equation (the last column of _restr is the
vector q of restricted values). Identifying and restricting the parameters of the unit
cost function with those of the derived factor share equations is accomplished in the
first 9 rows of
_restr. The last 3 rows are the familiar three symmetry conditions across factor
demand equations, as specified in Lesson 13.3.
/*
** Lesson 13.4: Berndt-Wood Model Extended
** Seemingly Unrelated Regression Estimation
** Full System with Restrictions
*/
1 use gpe2;
2 output file = gpe\output13.4 reset;
3 n=26;
4 load x[n,6] = gpe\bwq.txt;
5 year=x[2:n,1];
6 qy=x[2:n,2];
7 qk=x[2:n,3];
8 ql=x[2:n,4];
9 qe=x[2:n,5];
10 qm=x[2:n,6];
11 load x[n,6] = gpe\bwp.txt;
12 py=x[2:n,2];
13 pk=x[2:n,3];
14 pl=x[2:n,4];
15 pe=x[2:n,5];
16 pm=x[2:n,6];
33 yv=sk~sl~se~c;
34 xv=pk~pl~pe~pkpk~pkpl~pkpe~plpl~plpe~pepe~one;
35 call reset;
36 _names={"sk","sl","se","c","pk","pl","pe",
"pkpk","pkpl","pkpe","plpl","plpe","pepe","one"};
37 _iter=50;
38 _const=0;
@ |----yv----|------------xv--------------| @
@ SK SL SE C PK PL PE KK KL KE LL LE EE 1 @
39 _eq[4,14] = { -1 0 0 0 1 1 1 0 0 0 0 0 0 1,
0 -1 0 0 1 1 1 0 0 0 0 0 0 1,
0 0 -1 0 1 1 1 0 0 0 0 0 0 1,
0 0 0 -1 1 1 1 1 1 1 1 1 1 1};
40 _restr[12,23] =
@ P P P K K K L L E @
@PK PL PE 1|PK PL PE 1|PK PL PE 1|K L E K L E L E E 1|q @
{0 0 0 -1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0,
0 0 0 0 0 0 0 -1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0,
0 0 0 0 0 0 0 0 0 0 0 -1 0 0 1 0 0 0 0 0 0 0 0,
-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0,
0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0,
0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 1 0 0,
0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0,
0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0,
0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0,
0 1 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0,
0 0 1 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0,
0 0 0 0 0 0 1 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0};
41 _method=3;
42 call estimate(yv,xv);
43 end;
The extended Berndt-Wood Model is estimated with 3SLS. To save space, we leave
out the lengthy results of the model estimation. You should run the program and
check out the results yourself. Instead, we compare the parameter estimates for two
versions of the Berndt-Wood Model. We do not expect the parameter estimates to be
the same, or close, for the two models, even though the same 3SLS method is used
for model estimation. Numbers in the parentheses are standard errors.
F(Y,X,β) = U
F1(Y,X,β1) = U1
F2(Y,X,β2) = U2
…
FG(Y,X,βG) = UG
Note that identity equations are substituted out in order to avoid the complication of
using constrained optimization. Also, not all the columns of data matrices Y and X
are used in each equation of the system. However, there must be at least one distinct
endogenous variable appearing in each equation. The parameter vector βj effectively
selects the variables included in the equation j.
Constructing from the joint normal probability density of Ui and the Jacobian factor
Ji = Ji(β) = det(∂Ui./∂Yi.) for each observation i, the concentrated log-likelihood
function of the system is
ll(β) = -NG/2 [1+ln(2π)] - N/2 ln|det(U'U/N)| + Σi=1,...,N ln|Ji|
The FIML estimator of the parameters vector β = [β1, β2, ..., βG]' is obtained by
maximizing the above concentrated log-likelihood function.
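As an illustration of this objective function, the following Python sketch evaluates the concentrated log-likelihood from user-supplied residual and Jacobian functions. The helper signatures `resid_fn` and `jacobian_fn` are hypothetical conveniences for this sketch, not part of GPE:

```python
import numpy as np

def fiml_concentrated_ll(resid_fn, jacobian_fn, beta, Y, X):
    """Concentrated FIML log-likelihood for a system F(Y, X, beta) = U.

    resid_fn(beta, Y, X)       -> (N x G) matrix of residuals U
    jacobian_fn(beta, Y, X, i) -> (G x G) Jacobian dU_i./dY_i. at observation i
    (hypothetical helper signatures -- not part of GPE)
    """
    U = resid_fn(beta, Y, X)
    N, G = U.shape
    # sum of log absolute Jacobian determinants, one term per observation
    log_jac = sum(np.log(abs(np.linalg.det(jacobian_fn(beta, Y, X, i))))
                  for i in range(N))
    # covariance concentrated out: Sigma = U'U/N
    _, logdet = np.linalg.slogdet(U.T @ U / N)
    return -N * G / 2 * (1 + np.log(2 * np.pi)) + log_jac - N / 2 * logdet
```

For a single linear equation the Jacobian is 1 and the expression collapses to the familiar concentrated log-likelihood of a regression model.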
Note that the original Klein model (Klein, 1950) used a variable G' = G + W2, and
many parameter restrictions have to be built into the system for correct estimation.
To represent the model in the form YB + XΓ = U, we have

Y = [P W1 K]
X = [P1 K1 X1 W2 (G+W2) T A 1]

B = [ -1   b1   r1
      a1   -1    0
      a1    0   -1 ]

Γ = [ a3    0   r2
     -a2    0   r3
       0   b2    0
      a1    0    0
      a2    0    0
     -a2   b1    0
       0   b3    0
       1    1    1 ]
/*
** Lesson 13.5: Klein Model I Revisited
** Nonlinear FIML Estimation, Goldfeld-Quandt (1972), p.34
*/
1 use gpe2;
2 output file=gpe\output13.5 reset;
3 load data[23,10]=gpe\klein.txt;
28 _names={"a1","a2","a3","b1","b2","b3","r1","r2","r3"};
29 call estimate(&klein,data);
30 end;
The first part of the data manipulation is the same as in the linear system of Lesson
13.1. The relevant input variables controlling nonlinear optimization are discussed in
chapters VI and VII. The objective log-likelihood function is defined in lines 30
through 53. For nonlinear maximum likelihood estimation, we reduce the size of the
problem by using the deviation from the mean of the data series so that the constant
terms of each equation are eliminated (see line 17). The following is the result of
running lesson13.5:
Initial Result:
Function Value = -116.37
Parameters = 0.20410 0.10250 0.22967 0.72465 0.23273
0.28341 0.23116 0.54100 0.85400
Final Result:
Iterations = 19 Evaluations = 1369
Function Value = -83.324
Parameters = -0.16079 0.81143 0.31295 0.30568 0.37170
0.30662 -0.80101 1.0519 0.85190
Asymptotic Asymptotic
Parameter Std. Error t-Ratio
A1 -0.16079 0.098663 -1.6297
A2 0.81143 0.38368 2.1149
A3 0.31295 0.11847 2.6417
B1 0.30568 0.16223 1.8843
B2 0.37170 0.049169 7.5596
B3 0.30662 0.047628 6.4378
R1 -0.80101 0.84311 -0.95007
R2 1.0519 0.42533 2.4730
R3 0.85190 0.046840 18.187
a0 = α0/(1-α1)       b0 = γ0/(1-γ1)       r0 = β0
a1 = (α2-1)/(1-α1)   b1 = γ1/(1-γ1)       r1 = β1
a2 = 1/(1-α1)        b2 = γ2/(1-γ1)       r2 = β2
a3 = α3/(1-α1)       b3 = γ3/(1-γ1)       r3 = β3+1
XIV
Unit Roots and Cointegration
So far the econometric models we have constructed have been based mostly on
economic theory or empirical evidence. In many situations involving time series
data, we must rely on information drawn from the data generating process (DGP).
An example of this would be a time series with an autocorrelated error structure.
Considering a time series as a DGP, the data may possess a trend, cycle, or
seasonality (or any combination of these). By removing these deterministic patterns,
we would hope that the remaining DGP is stationary. However, many nonstationary
data series are stochastic rather than deterministic in nature. "Spurious" regressions
with a high R-square but a Durbin-Watson statistic far below two, often found in the
time series literature, are mainly due to the use of stochastically nonstationary data
series.
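The spurious regression phenomenon is easy to reproduce by simulation. The following Python sketch (an illustration only, not GPE code) regresses one simulated random walk on another, independent one, and reports the R-square and Durbin-Watson statistics:

```python
import numpy as np

# Two independent random walks: any estimated relationship is spurious.
rng = np.random.default_rng(42)
n = 500
x = np.cumsum(rng.normal(size=n))
y = np.cumsum(rng.normal(size=n))

# OLS of y on a constant and x
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

r2 = 1 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))
dw = np.sum(np.diff(e) ** 2) / (e @ e)   # Durbin-Watson statistic
print(r2, dw)   # typically a sizable R-square with DW far below 2
```

Because the residuals inherit the stochastic trends of the two series, the Durbin-Watson statistic collapses toward zero while R-square can be deceptively large.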
Given a time series DGP, testing for a random walk is a test of stationarity. It is also
called a unit root test. Testing each time series for unit roots is more of a process
than a single step. This chapter will chart the procedure to test for unit roots. If a
unit root is identified, the original data are differenced and tested again. In this way,
we are able to identify the order of the integrated process for each data series. Once
all data series have completed this process, they are regressed together and tested for
a cointegrating relationship.
Since the tests we use, Dickey-Fuller (DF) and augmented Dickey-Fuller (ADF),
require the model's errors to be independently and homogeneously distributed,
serial correlation in the time series must be treated before these tests can be applied.
Therefore, serial correlation is tested and corrected in the pretest step of each unit
root test. Instead of directly correcting the error structure through the integration
process, we modify the dynamics of the data generating process with lagged
dependent variables.
We follow the “top down” approach to carry out both the DF and ADF tests for unit
roots, by testing for the most complicated problems first and then simplifying our
model if problems are absent. We will formulate and test a hierarchy of three
models. First, we estimate the Random Walk Model with trend and drift, or Model
III, as follows:

∆Xt = α + β t + (ρ-1) Xt-1 + ∑i=1,2,… ρi ∆Xt-i + εt

where the dependent variable ∆Xt = Xt - Xt-1 is the first difference of the data series
Xt. Using augmented lags of the dependent variable, ∑i=1,2,… ρi ∆Xt-i, ensures a white
noise εt for the unit root test. The optimal lag may be selected based on criteria such
as AIC (Akaike Information Criterion) and BIC (Schwarz Bayesian Information
Criterion). Testing the hypothesis that ρ = 1 (so that the coefficient of Xt-1 is equal to
zero) is the focus of the unit root tests.
If the unit root is not found in Model III, we continue the process by estimating the
Random Walk Model with Drift, or Model II, as follows:

∆Xt = α + (ρ-1) Xt-1 + ∑i=1,2,… ρi ∆Xt-i + εt

And finally, if the unit root is not found in Model II, we estimate the Random Walk
Model, or Model I:

∆Xt = (ρ-1) Xt-1 + ∑i=1,2,… ρi ∆Xt-i + εt
Testing for unit roots is the first step of time series model building. For a univariate
case, several versions of the DF and ADF tests are available. For multivariate time
series, after unit root tests for each variable, a cointegration test should be carried out
to ensure that the multiple regression model is not spurious. For testing cointegration
of a set of variables, the necessary causal relationship among variables may not be
available for the single equation ADF-type testing due to Engle and Granger (1987).
Johansen’s vector autoregression (VAR) representation of the model and the relevant
Likelihood Ratio tests are suggested for the multivariate case.
1. Test ρ = 1, using the ADF τρ distribution (t-statistic) for Model III. If the null
hypothesis is rejected, we conclude that there are no unit roots in X. Otherwise,
continue on to Step 2.
1. Test ρ = 1, using the ADF τρ distribution (t-statistic) for Model II. If the null
hypothesis is rejected, we conclude that there are no unit roots in X. Otherwise,
continue on to Step 2.
1. Test ρ = 1, using the ADF τρ distribution (t-statistic) for Model I. If the null
hypothesis is rejected, we conclude that there are no unit roots in X. Otherwise, we
conclude that the data series is nonstationary, and restart the test process using the
differenced data series.
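The test equation of Model III is estimated by ordinary least squares. Below is a minimal Python sketch (an illustration only, not GPE code) that regresses ∆Xt on a constant, trend, Xt-1, and augmented lags, and returns the t-statistic on the coefficient of Xt-1, to be compared with the nonstandard ADF τρ critical values:

```python
import numpy as np

def adf_model3(x, lags=0):
    """ADF test equation for Model III (drift and trend):
       dX_t = a + b*t + (rho-1)*X_{t-1} + sum_i rho_i*dX_{t-i} + e_t.
    Returns the t-statistic on (rho-1); compare it with ADF tau_rho
    critical values (e.g. about -3.50 at 5% for moderate samples),
    not the usual t table."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    y = dx[lags:]                 # dX_t, first usable observation after lags
    n = y.size
    cols = [np.ones(n), np.arange(1, n + 1), x[lags:-1]]   # const, trend, X_{t-1}
    for i in range(1, lags + 1):                           # augmented lags of dX
        cols.append(dx[lags - i:-i])
    Z = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    s2 = e @ e / (n - Z.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z).diagonal())
    return b[2] / se[2]           # t-statistic on the coefficient of X_{t-1}
```

A strongly stationary series produces a large negative statistic, while a random walk typically does not reject ρ = 1.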
Many macroeconomic time series have been scrutinized for unit roots and
cointegration. In this chapter, two economic time series, Y (real personal disposable
income) and C (real personal consumption expenditure), from usyc87.txt are used to
illustrate the process of ADF tests for unit roots and cointegration. The same data
series were used in the example of U.S. income-consumption relationship studied in
Chapter XI.
/*
** Lesson 14.1: Unit Root Tests
*/
1 use gpe2;
2 output file = gpe\output14.1 reset;
3 load z[67,3] = gpe\usyc87.txt;
4 y = z[2:67,2];
5 c = z[2:67,3];
6 x = c;
16 call reset;
17 _names={"dx","trend","x1"};
18 _rstat = 1;
19 _dlags = 3; @ augmented terms if needed @
/* Model III */
20 call estimate(dx,trend~x1);
21 _restr = {0 0 0 1 0 0,
0 0 0 0 1 0}; @ DF joint test @
22 call estimate(dx,trend~x1);
23 end;
Let’s walk through the program. Lines 7 through 12 introduce a Do Loop to simplify
taking the difference of our data, if needed. Line 7 specifies the number of
differences diff on the data series. Then, from line 8 to 12, a Do Loop is used to
transform the necessary differences for the data series when the variable diff is
greater than 0. In line 7 we begin with the original data series in level:
diff = 0;
The next two lines (lines 13 and 14) work on the selected variable to compute the
lagged and the differenced values necessary for the test specification. A GAUSS
command packr is used to eliminate the initial observations of data which are lost
due to the lag operation. Next, a trend variable is generated (line 15) and included for
the estimation of Model III.
Line 19 is the result of a pretest of the model to ensure a classical or white noise
error structure, which is necessary for the ADF test of unit roots. Through a process
of trial and error, we found that for the consumption series C, the addition of three
lags of the dependent variable to the test equation is enough to remove
autocorrelation and maintain the classical assumption for the model error.
Model III is now estimated and tested for unit roots (line 20). Keep in mind that most
computed t-statistics will fall in the left tail of the ADF τ distribution and will be
negative. The second restricted least squares estimation (lines 21 to 22) is there to
carry out the ADF φ-test (based on the F-statistic) for the joint hypotheses of unit
root and no trend, provided that the first regression equation reveals a unit root. We
note that the definition of the restriction matrix _restr must take into account the
three lagged dependent variables placed in front of the explanatory variables.
The following is the result of the estimated Model III in which three lags of the
dependent variable are augmented for analyzing personal consumption.
Because of the use of lagged dependent variables, the sample range is adjusted. As a
pretest, we see that the errors for the test model are not autocorrelated, therefore
various ADF tests for unit roots are applicable. Starting at Step 1, with the estimated
t-statistic of –1.10 for the coefficient of the lagged variable X1 in the test equation
(vs. the τρ critical value of –3.5 at 5% level of significance, see Table C-1), the unit
root problem is clearly shown. Given the unit root, we continue on to Step 2, testing
the zero-value coefficient of the trend variable. Based on the ADF t-statistic for the
variable TREND, the hypothesis of no trend is barely rejected at a 10% level of
significance. Notice that, from Table C-1, the τβ critical value is 2.81 and 2.38 at 5%
and 10% levels of significance, respectively. However, the joint hypotheses of unit
root and no trend may be better served with the ADF φ-test based on the F-statistic.
The following result of hypothesis testing is due to the restrictions specified in line
21:
With the Wald F-test statistic of 10.836, compared with the critical value of the ADF
φ3 distribution for Model III (6.73 at 5% significance, see Table C-2), the conclusion
of unit root and no trend leads to the confirmation of unit root with a traditional
normal test. Unit root for the variable C is confirmed, so the level series of C is
nonstationary.
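The φ3 statistic itself is an ordinary Wald F statistic computed from the restricted and unrestricted sums of squared residuals; only its critical values are nonstandard. A Python sketch (assuming a Model III test equation with constant, trend, lagged level, and augmented lags; an illustration only, not GPE code):

```python
import numpy as np

def phi3_fstat(x, lags=0):
    """ADF phi_3 joint test for Model III: H0: (rho-1) = 0 and trend = 0.
    Computed as a standard F statistic from restricted and unrestricted
    sums of squared residuals, but compared with the nonstandard ADF
    phi_3 critical values (e.g. 6.73 at 5%)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    y = dx[lags:]
    n = y.size
    lagged = [dx[lags - i:-i] for i in range(1, lags + 1)]
    # unrestricted: const, trend, X_{t-1}, augmented lags
    Zu = np.column_stack([np.ones(n), np.arange(1, n + 1), x[lags:-1]] + lagged)
    # restricted: drop trend and X_{t-1} (impose unit root and no trend)
    Zr = np.column_stack([np.ones(n)] + lagged)
    ssr = lambda Z: ((y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]) ** 2).sum()
    q, k = 2, Zu.shape[1]          # q restrictions, k unrestricted parameters
    return ((ssr(Zr) - ssr(Zu)) / q) / (ssr(Zu) / (n - k))
```

For a stationary series the joint null is strongly rejected; for a random walk the statistic is typically small.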
Since the level data series is nonstationary, it must be differenced then estimated and
tested again for unit roots. Based on the above program, it is easy to make changes to
carry out the unit root test for the first differenced consumption series. First, line 7
should read as:
diff = 1;
The Do Loop of lines 8 through 12 translates the original level series to the first
difference series. From this point on, the program will evaluate the data series in the
first difference. We also found that there is no need to augment the model with
lagged dependent variables, since the model error is already free of correlation.
Therefore, line 19 is changed to:
_dlags = 0;
Model III is estimated and tested for unit roots for the first difference of the
consumption series. The ADF test for the joint hypotheses of unit root and no trend
in lines 21 to 22 must be modified or deleted. A simple way to remove these lines
from being “seen” by GAUSS is to comment them out in between “/*” and “*/”.
Here is the output of the modified program running the unit root test on the first
difference of the consumption series:
Based on the ADF t-statistic of –5.55 for the lagged variable X1, the conclusion of
no unit root in the first differenced data is immediate and obvious. Therefore, the
consumption series is an integrated series of order one, I(1): taking the first
difference makes the data stationary.
Turning to the income level series, the pretest for the classical error structure shows
that augmenting the first lag of the dependent variable is necessary for “whitening”
the error term. Therefore, in line 19:
_dlags = 1;
Accordingly, for computing the F-statistic from the second restricted least squares
estimation, we also modify the restriction matrix in line 21:
_restr = {0 1 0 0,
0 0 1 0};
Here is the estimation result of Model III for personal income level series:
We see that by comparing the t-statistic of X1 (–1.71) with the corresponding ADF
critical value (–3.50 at 5% significance, see Table C-1), there is a unit root. Based
on the joint test of the unit root and no trend hypotheses, a trend is also present.
This is the purpose of the second regression estimation. The following Wald F-test
result should be checked against the critical values of the ADF φ3 distribution for
Model III (6.73 at 5% significance, see Table C-2):
After differencing the income series and deleting the last part of the ADF joint
F-test, the first difference of the income series, augmented with one lag of the
dependent variable, is reexamined as follows:
Based on the ADF t-test statistic, -4.75, for the lagged variable X1, the first
differenced income series is stationary and free of unit roots. As with consumption,
personal income is an integrated series of order one.
The above example of testing for unit roots in the personal consumption and income
data series is carried out based on Model III. We did not go down the hierarchy
further to test Model II or Model I since most of the test results are clear-cut at the
level of Model III. For other macroeconomic time series, you may be required to test
Model II or Model I as well.
All tests are based on Model III. The following annual data series from 1929 to 1994
are tested (in rows): C = personal consumption expenditure in billions of 1987
dollars; Y = personal disposable income in billions of 1987 dollars; ∆C = annual
change in personal consumption expenditure; ∆Y = annual change in personal
disposable income. The following notations are used (in columns): N = number of
observations; Lags = augmented lag terms in the test equation; ρ-1 = estimated
coefficient of the lagged variable; β = estimated coefficient of the trend variable; τρ =
t-statistic for the hypothesis of unit root; τβ = t-statistic for the hypothesis of no
trend, given a unit root; and φ3 = F-statistic for the joint hypotheses of unit root and
no trend. The asterisk (*) indicates rejection of the null hypothesis at a 5%
significance level based on the ADF distributions (see Table C-1 for critical values
of t-statistics and Table C-2 for critical values of F-statistics).
As many previous studies have suggested, the income and consumption data are
nonstationary in level, but their first difference or change series are stationary. In
summary, both income and consumption are first-order integrated, or I(1), series.
Suppose there are M variables, Z1, … , ZM. Let Yt = Zt1 and Xt = [Zt2, ..., ZtM].
Consider the following regression equation:
Yt = α + Xtβ + εt
In general, if Yt, Xt ~ I(1), then εt ~ I(1). But, if εt can be shown to be I(0), then the
set of variables [Yt, Xt] is said to be cointegrated, and the vector [1 -β]' (or any
multiple of it) is called a cointegrating vector. Depending on the number of variables
M, there are up to M-1 linearly independent cointegrating vectors. The number of
linearly independent cointegrating vectors that exists in [Yt, Xt] is called the
cointegrating rank.
To test for the cointegration of the set of variables [Yt, Xt], two approaches are used.
If the causality of Y on X is clear, then the Engle-Granger or ADF test based on the
regression residuals may be applied. The alternative is to work with the VAR system
of all variables under consideration. This is the Johansen approach to the
cointegration test, to be discussed later.
Given the cointegrating regression equation

Yt = α + Xtβ + εt

the Engle-Granger test for cointegration is a test for unit root in the residuals of the
above regression model. That is, based on Model I, the auxiliary test equation is
written as:

∆εt = (ρ-1)εt-1 + ut
where εt = Yt - α - Xtβ, and ∆εt is defined as εt - εt-1. The rationale is that if the
residual εt has unit root, regressing Y on X may not completely capture the
underlying (nonstationary) trends of all these variables. The estimated model does
not reveal the meaningful relationship, although it may fit the data well. This is the
crux of the spurious regression problem. However, if a cointegrating vector can be
found among the variables that causes the error term εt to be stationary or I(0), then
we can attach meaning to the estimated regression parameters.
We note that the above unit root test equation on the regression residuals does not
have a drift or trend. In order to apply ADF-type testing for a unit root, the model
may be augmented with lagged dependent variables as needed:

∆εt = (ρ-1)εt-1 + ∑i=1,2,… ρi ∆εt-i + ut

If we can reject the null hypothesis of unit root on the residuals εt, we can say that
the variables [Yt, Xt] in the regression equation are cointegrated. The cointegrating
regression model may be generalized to include trend as follows:
Yt = α + γ t + Xtβ + εt
Notice that the trend in the cointegrating regression equation may be the result of
combined drifts in X and/or Y. Critical values of the ADF τρ distribution for spurious
cointegrating regression are given in Table C-3 of Appendix C. These values are
based on the work of Phillips and Ouliaris (1990), and depend on the number of
cointegrating variables and their trending behaviors for large samples.
Applied to the income-consumption relationship, the cointegrating regression

Ct = β0 + β1 Yt + εt

will be meaningful only if the error εt is free of unit roots. The test for cointegration
between C and Y thus becomes a unit root test on the regression residuals:
/*
** Lesson 14.2: Cointegration Test
** Engle-Granger Approach
*/
1 use gpe2;
2 output file = gpe\output14.2 reset;
6 call reset;
7 _names = {"c","y"};
8 call estimate(c,y);
13 _rstat = 1;
14 _dlags = 2; @ augmented terms if needed @
15 _const = 0; @ no intercept term @
16 call estimate(dx,x1);
17 end;
The program reads in and uses both income (Y) and consumption (C) data series, and
runs a regression of the consumption-income relationship. Here, we are not
interested in investigating or refining the error structure of the regression equation
(though we must make sure that no autocorrelated structure exists in the error term of
the cointegrating regression). Instead, we want to test the residuals for the presence
of unit roots. In GPE, residuals are available as the output variable __e immediately
after the regression equation is estimated. Line 9 sets the variable X to the vector of
residuals:
x = __e;
and prepares for unit root testing on this variable in the rest of the program. This
latter portion of code (lines 10 through 16) is the same as that in lesson14.1 for
testing unit roots of a single variable. Again, line 14 is the result of a pretest to
ensure a white noise error structure for the unit root test:
_dlags = 2;
It turns out that we need two lags of the dependent variable augmented to the test
equation. We recall from our earlier unit root tests that both the income (Y) and
consumption (C) variables include a linear trend. This fact must be considered when
we use the appropriate ADF τρ distribution for cointegration tests (using Model 2a
or Model 3 of Table C-3). The alternative is to use
MacKinnon’s table (Table C-4) for testing the cointegrating regression model. We
present only the results relevant to the cointegration test in the following (see the
generated output file output14.2 for more details):
Testing for the cointegration of two variables, C and Y with trend, the computed t-
statistic for the lagged variable X1 in the test equation is –3.52, which is right on the
borderline of rejecting the null hypothesis of unit root at a 5% level of significance
(looking at Table C-3, the critical value of ADF cointegration t-statistic τρ for K=2 at
5% is –3.42 for Model 2a). A similar conclusion is obtained by using the critical
values of MacKinnon (Table C-4). Although these results do not give us
overwhelming confidence that the long-run income-consumption relationship is
legitimate, empirical studies based on the Permanent Income Hypothesis still stand.
Single equation cointegration tests can only be valid when the specific causal
relation of the underlying multiple regression is correct. If the causal relationship of
C and Y is not as clean-cut as the Permanent Income Hypothesis suggests, we need
to run and test the reverse regression equation.
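The two-step Engle-Granger procedure described above can be sketched in a few lines of Python (an illustration only, not GPE code): step one runs the cointegrating regression; step two applies a Model I unit root test to its residuals:

```python
import numpy as np

def engle_granger_tstat(y, x, lags=0):
    """Engle-Granger two-step cointegration test (a sketch).
    Step 1: OLS of y on a constant and x.
    Step 2: ADF Model I (no drift, no trend) on the residuals.
    The t-statistic on e_{t-1} is compared with the nonstandard
    Phillips-Ouliaris / MacKinnon critical values, not the t table."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    X1 = np.column_stack([np.ones(y.size), x])
    e = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]   # cointegrating residuals
    de = np.diff(e)
    u = de[lags:]
    n = u.size
    cols = [e[lags:-1]] + [de[lags - i:-i] for i in range(1, lags + 1)]
    Z = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(Z, u, rcond=None)
    r = u - Z @ b
    s2 = r @ r / (n - Z.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z).diagonal())
    return b[0] / se[0]       # t-statistic on the coefficient of e_{t-1}
```

When the two series share a common stochastic trend, the residuals are stationary and the statistic is strongly negative.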
Given a set of M variables Zt = [Zt1, Zt2, ..., ZtM], and considering their feedback
simultaneity, Johansen's cointegration test based on FIML (full information
maximum likelihood) is derived as follows.
Similar to the random walk (unit roots) hypothesis testing for a single variable with
augmented lags, we write a VAR(p) linear system for the M variables Zt:

Zt = Zt-1Π1 + Zt-2Π2 + ... + Zt-pΠp + Π0 + Ut

where Πj, j=1,2,...,p, are the MxM parameter matrices, Π0 is a 1xM vector of
deterministic factors (drifts and trends). Moreover, we assume the 1xM error vector
Ut is independently normally distributed with a zero mean and a constant covariance
matrix Σ = Var(Ut) = E(Ut'Ut) across M variables.
The VAR(p) system can be transformed using the difference series of the variables,
resembling the error correction model, as follows:
∆Zt = ∆Zt-1Γ1 + ∆Zt-2Γ2 + ... + ∆Zt-(p-1)Γp-1 + Zt-1Π + Γ0 + Ut
where I denotes the identity matrix, Π = ∑j=1,2,...,pΠj - I, Γ1 = Π1 - Π - I , Γ2 = Π2 + Γ1,
Γ3 = Π3 + Γ2, … , and Γ0 = Π0 for notational convenience. Recall that Γ0 is a vector
of deterministic factors including drifts and trends. If both drift and trend (µ0 + µ1t)
exist in Zt, then Γ0 = -µ0 Π + µ1(Γ+Π) - µ1Π t where Γ = I - ∑j=1,2,...,p-1Γj.
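The algebra of this transformation is easy to verify numerically. The following Python sketch (an illustration only) maps the VAR(p) coefficient matrices into Π and the Γj of the error correction form and lets us check that both representations generate the same ∆Zt:

```python
import numpy as np

def ecm_coefficients(Pi_list):
    """Map VAR(p) coefficient matrices Pi_1,...,Pi_p (row-vector convention
    Z_t = Z_{t-1}Pi_1 + ... + Z_{t-p}Pi_p + U_t) into the error correction
    form of the text: Pi = sum_j Pi_j - I, Gamma_1 = Pi_1 - Pi - I, and
    Gamma_j = Pi_j + Gamma_{j-1} for j = 2,...,p-1."""
    M = Pi_list[0].shape[0]
    Pi = sum(Pi_list) - np.eye(M)
    gammas = [Pi_list[0] - Pi - np.eye(M)]        # Gamma_1
    for j in range(1, len(Pi_list) - 1):
        gammas.append(Pi_list[j] + gammas[-1])    # Gamma_{j+1} = Pi_{j+1} + Gamma_j
    return Pi, gammas
```

For p = 2 the recursion gives Γ1 = -Π2, and substituting back reproduces the original VAR exactly.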
A few words about the vector Γ0 (or Π0) of the deterministic factors. We consider
only the case of constant vector Γ0 that is restricted such that µ1Π = 0 (no trend),
then Γ0 = -µ0 Π + µ1 Γ. It is easy to see that (1) if µ1 = 0, µ0 is the only deterministic
factor (drift) for Zt, or Γ0 = -µ0 Π; (2) if µ1 ≠ 0, then the VAR(p) model consists of
drift and linear trend components, or Γ0 = -µ0 Π + µ1 Γ.
Plugging in the auxiliary regressions, we can now write the concentrated log-
likelihood function as
ll*(W(Φ1,Φ2,...,Φp-1,Φ0), V(Ψ1,Ψ2,...,Ψp-1,Ψ0),Π)
= - NM/2 (1+ln(2π)-ln(N)) - N/2 ln|det((W-VΠ)'(W-VΠ))|
det((W-VΠ)'(W-VΠ)) = det(W'(I-V(V'V)-1V')W)
= det((W'W)(I-(W'W)-1(W'V)(V'V)-1(V'W)))
= det(W'W) det(I-(W'W)-1(W'V)(V'V)-1(V'W))
= det(W'W) (∏i=1,2,...,M(1-λi))
where λ1, λ2, ..., λM are the descending ordered eigenvalues of the matrix
(W'W)-1(W'V)(V'V)-1(V'W). Therefore the resulting double concentrated log-
likelihood function (concentrating on both Σ = U'U/N and Π = (V'V)-1V'W) is
ll**(W(Φ1,Φ2,...,Φp-1,Φ0), V(Ψ1,Ψ2,...,Ψp-1,Ψ0))
= - NM/2 (1+ln(2π)-ln(N)) - N/2 ln|det(W'W)| - N/2 ∑i=1,2,...,M ln(1-λi)
Given the parameter constraints that there are 0<r<M cointegrating vectors, that is Π
= -BA' where A and B are Mxr matrices, the restricted concentrated log-likelihood
function is similarly derived as follows:
llr**(W(Φ1,Φ2,...,Φp-1,Φ0), V(Ψ1,Ψ2,...,Ψp-1,Ψ0))
= - NM/2 (1+ln(2π)-ln(N)) - N/2 ln|det(W'W)| - N/2 ∑i=1,2,...,rln(1-λi)
Therefore, with M-r degrees of freedom, the Likelihood Ratio test statistic for at
most r cointegrating vectors is

-2(llr** - ll**) = -N ∑i=r+1,...,M ln(1-λi)

Similarly, the Likelihood Ratio test statistic for r cointegrating vectors against r+1
vectors is

-N ln(1-λr+1)

A more general form of the Likelihood Ratio test statistic for r1 cointegrating vectors
against r2 vectors (0 ≤ r1 ≤ r2 ≤ M) is

-N ∑i=r1+1,...,r2 ln(1-λi)

The following table summarizes the two popular cointegration test statistics: the
maximal eigenvalue test statistic λmax(r) = -N ln(1-λr+1) and the trace test statistic
λtrace(r) = -N ∑i=r+1,...,M ln(1-λi). By definition, λtrace(r) = ∑r1=r,r+1,…,M-1 λmax(r1). For
the case of r = 0, they are the tests for no cointegration. If M = r+1, the two tests are
identical.
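Given the residual matrices W and V defined above, the eigenvalues and both test statistics can be computed directly. A Python sketch under the text's row-vector convention (an illustration only, not the GPE johansen procedure):

```python
import numpy as np

def johansen_stats(W, V):
    """Eigenvalue and trace statistics from the two residual matrices
    W (from regressing dZ_t on the lagged differences) and V (from
    regressing Z_{t-1} on the same), following the text's notation.
    Returns the descending eigenvalues and, for each r, lambda_max(r)
    and lambda_trace(r); compare with Table C-5 critical values."""
    N, M = W.shape
    # eigenvalues of (W'W)^-1 (W'V) (V'V)^-1 (V'W)
    S = np.linalg.solve(W.T @ W, W.T @ V) @ np.linalg.solve(V.T @ V, V.T @ W)
    lam = np.sort(np.linalg.eigvals(S).real)[::-1]   # descending order
    lam_max = -N * np.log(1 - lam)                   # lambda_max(r), r = 0..M-1
    lam_trace = np.array([lam_max[r:].sum() for r in range(M)])
    return lam, lam_max, lam_trace
```

The eigenvalues are squared canonical correlations between W and V, so they lie in [0, 1), and the trace statistic is the running sum of the maximal eigenvalue statistics.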
The critical values of λmax(r) and λtrace(r) for testing the specific number of
cointegrating vectors or rank r are given in Statistical Table C-5. Three models (no
constant, drift only, and trend drift) are presented.
subdirectory (see also Appendix B-2). Interested readers can study the details of the
implementation of the Likelihood Ratio tests we outline above. To perform
Johansen's cointegration test in lesson14.3, we include the module at the end of the
program (line 17). The test is done by calling the procedure johansen with three
input arguments: z = data matrix, p = lags of VAR structure, and c is the model or
constant (0=no, 1=drift only, 2=trend drift):
call johansen(z,p,c);
/*
** Lesson 14.3: Cointegration Test
** Johansen Approach
*/
1 use gpe2;
2 output file = gpe\output14.3 reset;
8 call reset;
9 _rstat=1;
10 _method=3;
11 _dlags=3; @ find the proper order p @
12 call estimate(data,0); @ for VAR(p) model estimation @
We present only the summary of the estimation results, and leave out the details of
each equation. We note that the model estimation is used to determine the lag
structure of the VAR system. In this example, VAR(3) has been shown to be the
appropriate model.
----------------------------------------
Number of Endogenous Variables = 2
Number of Predetermined Variables = 7
Number of Stochastic Equations = 2
Number of Observations = 63
Estimation Range = 4 66
Most importantly, cointegration tests based on eigenvalue and trace statistics are
given for each of the three models: trend drift (Model 3), drift only (Model 2), and
no constant (Model 1), in that order. These computed test statistics are compared
with the critical values of Statistical Table C-5 in Appendix C. Consider the case of
no cointegration (that is, the cointegrating rank equals 0, with 2 degrees of
freedom): both the λmax(0) and λtrace(0) statistics are greater than the corresponding
critical values at a 5% level of significance. We reject the null hypothesis of no
cointegration for the data series under consideration. Therefore, the time series
income (Y) and consumption (C) are cointegrated, confirming the previous Engle-
Granger or ADF test result based on the cointegrating regression residuals.
XV
Time Series Analysis
Continuing from the previous chapter in which we discussed a stationary vs.
nonstationary data generating process, in this chapter, we focus on the modeling of
stationary time series data. If the data series under consideration is nonstationary, we
assume that it is an integrated process and can be made stationary with the proper
amount of differencing. A random data generating process which is difference
stationary is the subject of modern time series analysis.
Typically, time series analysis is carried out in several steps: model identification,
estimation, diagnostic checking, and prediction. In this chapter we emphasize model
identification and estimation. Diagnostic checking is the repetition of the
identification step on the estimated model. Prediction is taken up later in Chapter
XVII. In many circumstances, economic theory offers no a priori data generating
process for a given variable, so model identification is often a trial and error process.
To extract structural information from a random variable, the process of model
identification consists of testing and estimation for the mean and variance of the
variable under consideration. In Chapter X, we used several procedures to test for
autocorrelation in an ARMA model. These tests include the Durbin-Watson bounds
test for first-order serial correlation, the Breusch-Godfrey LM test for higher-order
autocorrelation, and Box-Pierce and Ljung-Box Q test statistics based on different
lags of autocorrelation coefficients. In addition, the autocorrelation function (ACF)
and partial autocorrelation function (PACF) gave us useful clues as to the model’s
structure. Many examples shown in Chapter X demonstrated the use of a
combination of the above-mentioned testing procedures and statistics to study the
time series.
For ARCH modeling, the idea of variance correlation is new but the mechanics are
similar to ARMA analysis. Working on the squares of mean-deviation (or
regression) residuals, the corresponding ACF and PACF can assist in detecting the
autocorrelation in the variance. The associated Box-Pierce and Ljung-Box statistics
are useful to test the potential ARCH process. Analogous to the Breusch-Godfrey
LM test for serial correlation in the mean, an LM test on the squared residuals may
be applied to test for higher-order ARCH effects.
In the following, we present the basic formulation of ARMA and ARCH models.
The GPE implementation of model identification and estimation for regression
models with ARMA and ARCH effects is illustrated by examples.
Consider a time series Yt following an ARMA(p,q) process:

Yt = δ + ρ1Yt-1 + ... + ρpYt-p + εt - θ1εt-1 - ... - θqεt-q

where εt is independently distributed with zero mean and constant variance σ2, or εt ~
ii(0,σ2), t = 1,2,...,N. As described in Chapter X, ARMA(p,q) is a mixed process of
AR(p) and MA(q), where p and q represent the highest orders of the autoregressive
and moving average parameters in the model, respectively. The model may also be
written as a general linear stochastic process:
Recall that the stationarity requirements for the process imply that the mean,
variance, and autocovariances of the variable must be finite constants:

µ = E(Yt) = δ/(1-ρ1-...-ρp), γ0 = Var(Yt), γj = Cov(Yt, Yt-j), j = 1,2,...
For parameter estimation, the ARMA(p,q) model may be written in the “inverted”
form as follows:
ρ(B)Yt = δ + θ(B)εt
or,
θ(B)-1[-δ+ρ(B)Yt] = εt
where B is the backshift operator, ρ(B) = 1 - ρ1B - ρ2B2 - ... - ρpBp, and θ(B) = 1 -
θ1B - θ2B2 - ... - θqBq. Conditional to the historical information (YN, ..., Y1), and data
initialization (Y0, ..., Y-p+1) and (ε0, ..., ε-q+1), the error sum-of-squares is defined by
S = ∑t=1,2,...,Nεt2
In order to utilize all N data observations, data initialization may be needed for the
observations Y0, Y-1, ..., Y-p+1 with E(Yt) = δ / (1-ρ1-...-ρp), and ε0, ε-1, ..., ε-q+1 with
E(εt) = 0 (see footnote 14). In GPE, the data initialization used for the pre-sample
observations is simply the sample mean of the series.
Concentrating out the error variance σ2 = ∑t=1,2,...,Nεt2/N, the log-likelihood function
to be maximized is

ll = -N/2 [1+ln(2π)-ln(N)+ln(∑t=1,2,...,Nεt2)]
Using nonlinear optimization methods, maximizing the above function with respect
to the parameters ρs, θs, and δ is straightforward (see Chapter VII for more details
on maximum likelihood estimation of a nonlinear regression model). The GPE
package implements the nonlinear maximum likelihood estimation for the ARMA
error structure in a linear regression framework.
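For a given set of parameters, the conditional log-likelihood above can be evaluated by recursively generating the errors εt from the inverted form. A Python sketch (an illustration of the objective function only, not the GPE implementation):

```python
import numpy as np

def arma_concentrated_ll(y, rho, theta, delta=0.0):
    """Conditional (concentrated) log-likelihood of an ARMA(p,q) model
    rho(B)Y_t = delta + theta(B)e_t, computed by recursively generating
    the errors e_t. As the text describes for GPE, pre-sample Y's are
    initialized with the sample mean and pre-sample e's with zero."""
    y = np.asarray(y, float)
    n, p, q = y.size, len(rho), len(theta)
    ypad = np.concatenate([np.full(p, y.mean()), y])   # Y_0, ..., Y_{-p+1}
    e = np.zeros(n + q)                                # e_0, ..., e_{-q+1}
    for t in range(n):
        ar = sum(rho[i] * ypad[p + t - 1 - i] for i in range(p))
        ma = sum(theta[j] * e[q + t - 1 - j] for j in range(q))
        e[q + t] = ypad[p + t] - delta - ar + ma       # e_t from the inverted form
    s = e[q:] @ e[q:]                                  # error sum-of-squares
    return -n / 2 * (1 + np.log(2 * np.pi) - np.log(n) + np.log(s))
```

Maximizing this function over ρ's, θ's, and δ with a nonlinear optimizer yields the conditional maximum likelihood estimates.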
Note that the model specification posited above is only tentative, pending diagnostic
checking on the estimated residuals. We do not know whether or not we have
included sufficiently high orders of AR and MA terms in the model specification. In
other words, we do not know whether our choice of orders, p for AR and q for MA,
were adequate. The “correct” p and q are usually determined through an iterative
process. We choose an initial number of AR and MA terms (usually at low values,
zero or one) and estimate the model. We then use the diagnostic tests on the
estimated residuals (e.g., Durbin-Watson, Breusch-Godfrey, Box-Pierce, and Ljung-
Box) to determine if serial correlation is still present in the model. If we still have
problems with serial correlation, we add AR or MA terms (i.e., increase the values of
p and q), re-estimate the model and rerun the diagnostic tests on the “new” residuals.
This process continues until the error term has been sufficiently “whitened.” In so
doing, we find the combination of AR and MA terms that removes the serial
correlation from the model. Note that when performing the diagnostic checking on
the estimated residuals, the degrees of freedom used to choose the critical value for
each test statistic is N-(K+p+q), where K is the number of regression parameters.
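For reference, the Box-Pierce and Ljung-Box Q statistics used in this diagnostic loop are simple functions of the residual autocorrelations. A minimal Python/NumPy sketch (illustrative only; the function names are ours, not GPE's):

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations r_1, ..., r_nlags of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n, denom = len(x), np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:n - k]) / denom
                     for k in range(1, nlags + 1)])

def q_stats(resid, nlags):
    """Box-Pierce Q = N*sum(r_k^2) and Ljung-Box
    Q' = N(N+2)*sum(r_k^2/(N-k)); both are referred to a chi-square
    distribution whose degrees of freedom are adjusted for estimated
    parameters, as discussed in the text."""
    n = len(resid)
    r = acf(resid, nlags)
    bp = n * np.sum(r ** 2)
    lb = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, nlags + 1)))
    return bp, lb
```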
Consider the following second-order autoregressive model for the bond yield series:
Yt = ρ0 + ρ1 Yt-1 + ρ2 Yt-2 + ut
14 The alternative to data initialization is to estimate the unknown pre-sample observations of
ε0, ε-1, ..., ε-q+1 together with the model parameters. The problem becomes highly nonlinear
and complicated.
where ut ~ ii(0,σ2) or nii(0,σ2). We may examine the data series Yt by plotting the
correlogram of its autocorrelation and partial autocorrelation coefficients. For
univariate analysis, the ACF and PACF of the time series will be identical to those of
the residuals obtained from the mean-deviation regression. Up to the maximum
number of lags specified for the ACF and PACF, Box-Pierce and Ljung-Box test
statistics are useful for identifying the proper order of AR(p), MA(q), or ARMA(p,q)
process. In addition, the Breusch-Godfrey LM test may be used to verify higher
orders of autocorrelation, if any exist.
Although the entire diagnostic checking procedure is not shown here, by examining
the autocorrelation and partial autocorrelation coefficients as well as the relevant
diagnostic test statistics, we can show that an AR(2) specification is sufficient for the
bond yield model. The result is consistent with that of the stationarity test. Using a
time series of bond yields, a regression model with two (first and second) lagged
dependent variables is estimated in the following program lesson15.1.
/*
** Lesson 15.1: ARMA Analysis of Bond Yields
*/
1 use gpe2;
2 output file=gpe\output15.1 reset;
5 y=bonds[2:n,2];
6 call reset;
7 _names={"yields"};
8 _rstat=1;
9 _rplot=2;
10 _dlags=2;
/*
_ar=2;
_iter=50;
*/
11 _bgtest=4;
12 _acf=12;
13 call estimate(y,0);
14 end;
Further study of the regression residuals using the ACF and PACF up to 12 lags does
not reveal higher-order autocorrelated structure in this model. The other tests
(Durbin-Watson test for the first lag and Breusch-Godfrey LM test up to the fourth
lag) suggest that a structure beyond AR(2) may be present. But if such structure
does exist in this model, it is not reflected in the ACF and PACF. Conflicting results
from the use of different diagnostic tests are not unusual in empirical analysis. Run
this program, and see the output file output15.1 for details.
Alternatively, the bond yield model may be specified with an AR(2) error structure:
Yt = µ + εt
εt = φ1 εt-1 + φ2 εt-2 + ut
Or, equivalently
Yt = µ + φ1 εt-1 + φ2 εt-2 + ut
We note that µ = ρ0 /(1- ρ1 - ρ2) from the earlier specification with lagged dependent
variables. We now modify Lesson 15.1 by replacing line 10 of the program
(_dlags=2) with the following two statements:
_ar=2;
_iter=50;
Running the revised lesson15.1, we obtain the following result (standard errors are in
parentheses):
15 With the model of Lesson 15.1, the divisor used is N-K, where K is 3.
In general, the regression model with an ARMA(p,q) error structure is written as:
Yt = Xtβ + εt
φ(B)εt = θ(B)ut
Or, equivalently,
φ(B)Yt = φ(B)Xtβ + θ(B)ut
where φ(B) and θ(B) are the autoregressive and moving average polynomials in the
backshift operator B.
As mentioned earlier, in GPE, ARMA analysis is called with the input control
variable _arma. _arma is a column vector containing at least two elements
specifying the type of ARMA model to be estimated. The first element of _arma
denotes autoregressive order of the ARMA process, while the second element
denotes the moving average order. Specifying only the autoregressive portion and
including a zero for the moving average portion yields a pure AR specification (vice
versa for a pure MA specification). Optional initial values of the autoregressive and
moving average coefficients may be appended to the vector _arma along with their
respective orders. Supplying the initial values is useful for starting the iterative
estimation process from non-zero values of ARMA coefficients. For example,
_arma = {1,1,0.5,0.1};
The model ARMA(1,1) is specified, with the initial values 0.5 for the AR(1)
coefficient and 0.1 for the MA(1) coefficient. Nonlinear estimation of this model will
begin from the specified set of starting values.
∆Pt = β0 + β1(∆Mt-1-∆Yt-1) + εt
The lagged values of the inflation rate (or the disturbance term) will serve to model
the effects of external shocks to the economy. The data file usinf.txt consists of 136
quarterly observations (from 1950 Q1 to 1984 Q4) of data for price (implicit GNP
deflator) Pt, money stock M1t, and output (GNP) Yt.
To keep the model simple, we include the first lag of the dependent variable in the
regression and examine the patterns of ACF and PACF. Significant spikes (or non-
zero values of autocorrelation and partial autocorrelation coefficients) appear up to
the third or fourth lags for both functions, indicating a complicated structure in the
model’s error term. We will not go through the entire identification process here.
Interested readers can “comment out” lines 15 to 18 in the program lesson15.2
below, and decide the proper ARMA or ARMAX specification for themselves based
on their observations of the behavior of the ACF and PACF.
/*
** Lesson 15.2: ARMA Analysis of U.S. Inflation
** Greene (1999), Example 18.11
*/
1 use gpe2;
2 output file=gpe\output15.2 reset;
3 n=137;
4 load data[n,4]=gpe\usinf.txt;
5 y=ln(data[2:n,2]);
6 m=ln(data[2:n,3]);
7 p=ln(data[2:n,4]);
8 dp=packr(100*(p-lagn(p,1)));
9 dm=packr(100*(m-lagn(m,1)));
10 dy=packr(100*(y-lagn(y,1)));
11 call reset;
12 _rstat=1;
13 _rplot=2;
14 _acf=12;
15 _dlags=1;
16 _arma={0,3};
17 _method=5;
18 _iter=100;
19 call estimate(dp,lagn(dm-dy,1));
20 end;
The final model for estimation is a lagged dependent variable model (line 15) with a
third-order moving average specification (line 16). Maximum likelihood estimation
of the model is carried out using the QHC optimization method (line 17). The output
of running lesson15.2 is stored in the file output15.2.
The lag of the dependent variable (∆Pt-1) plays an important role in the regression
equation. Although the second lag of the moving average is insignificant, the first
and third are significant. Further analysis of the ACF and PACF does not show
autocorrelation in the regression residuals.
Yt = Xtβ + εt
At time t, conditional on the available historical information Ht, we assume that the
error structure follows a normal distribution: εt|Ht ~ nii(0,σ2t), where the variance is
written as:
σ2t = α0 + Σi=1,2,...,q αiε2t-i + Σj=1,2,...,p δjσ2t-j
Let υt = ε2t-σ2t, αi = 0 for i > q, δj = 0 for j > p, and m = max(p,q). Then the above
serially correlated variance process may be conveniently rewritten as an
ARMA(m,p) model for ε2t. That is,
ε2t = α0 + Σi=1,2,...,m (αi+δi)ε2t-i - Σj=1,2,...,p δjυt-j + υt
Since E(υt) = 0, the conditional variance σ2t is the conditional expectation E(ε2t|Ht).
This is the general specification of generalized autoregressive conditional
heteroscedasticity, or GARCH(p,q). Setting δj = 0 for all j (that is, p = 0) yields the
ARCH(q) process:
σ2t = α0 + Σi=1,2,...qαiε2t-i
ARCH(1) Process
The simplest case, pioneered by Engle (1982) sets q = 1 (while p = 0). This
ARCH(1) process can be written as:
σ2t = α0 + α1ε2t-1
Yt = Xtβ + εt
εt = ut(α0 + α1ε2t-1)½ where ut ~ nii(0,1)
This specification gives us the conditional mean and variance, E(εt|εt-1) = 0 and σ2t =
E(ε2t|εt-1) = α0 + α1ε2t-1, respectively. Note that the unconditional variance of εt is
E(ε2t) = E[E(ε2t|εt-1)] = α0 + α1E(ε2t-1). If σ2 = E(ε2t) = E(ε2t-1), then σ2 = α0/(1-α1),
provided that |α1| < 1. In other words, the model may be free of general
heteroscedasticity even when we assume that conditional heteroscedasticity is
present.
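The relation σ2 = α0/(1-α1) between the conditional and unconditional variances is easy to check by simulation. A hypothetical Python/NumPy sketch (not GPE code; simulate_arch1 is our own name):

```python
import numpy as np

def simulate_arch1(alpha0, alpha1, n, seed=0):
    """Simulate e_t = u_t*(alpha0 + alpha1*e_{t-1}^2)^(1/2), u_t ~ nii(0,1),
    starting the recursion from the unconditional variance alpha0/(1-alpha1)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    e = np.zeros(n)
    s2 = alpha0 / (1.0 - alpha1)          # unconditional variance as start-up
    for t in range(n):
        if t > 0:
            s2 = alpha0 + alpha1 * e[t - 1] ** 2   # conditional variance
        e[t] = u[t] * np.sqrt(s2)
    return e
```

In a long simulated series, the sample variance of e should be close to α0/(1-α1), even though each σ2t varies over time.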
ARCH-M(1) Model
An extension of the ARCH(1) model is ARCH(1) in mean, or the ARCH-M(1) model,
which adds the conditional variance term directly into the regression (mean) equation
(assuming a linear model):
Yt = Xtβ + γσ2t + εt
σ2t = α0 + α1 ε2t-1
The last variance term of the regression may be expressed in log form ln(σ2t) or in
standard error σt. For example, Yt = Xtβ + γln(σ2t) + εt. Moreover, to ensure the
model stability and positive values of variances, we will need to constrain σ2t by
forcing α0 > 0 and 0 ≤ α1 < 1.
For example, to compute 12 lags of the ACF and PACF of the squared residuals:
_acf2 = 12;
and to compute the Engle-Bollerslev LM test statistics of ARCH effects up to the sixth lag:
_ebtest = 6;
For more information about the use of _acf2 and _ebtest, see Appendix A.
The parameter vector (α,δ) is estimated together with the regression parameters β
(where ε = Y - Xβ) by maximizing the log-likelihood function, conditional on the data
initialization ε20, ε2-1, ..., ε2-q, σ20, σ2-1, ..., σ2-p. In GPE, the data initialization used for
the pre-sample observations is simply the sample variance of the error series E(ε2t) =
Σt=1,2,...,N ε2t/N.
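For the GARCH(1,1) case, the variance recursion and the Gaussian log-likelihood, with the sample-variance initialization just described, can be sketched in Python/NumPy (an illustration of the computation, not the GPE source; garch11_loglik is our own name):

```python
import numpy as np

def garch11_loglik(e, a0, a1, d1):
    """Log-likelihood of residuals e under s2_t = a0 + a1*e_{t-1}^2 + d1*s2_{t-1},
    initializing the pre-sample e^2 and s2 with the sample variance of e."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    init = np.mean(e ** 2)            # pre-sample value, as in the text
    s2 = np.empty(n)
    s2[0] = a0 + a1 * init + d1 * init
    for t in range(1, n):
        s2[t] = a0 + a1 * e[t - 1] ** 2 + d1 * s2[t - 1]
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + e ** 2 / s2)
```

Maximizing this function over (a0, a1, d1) together with the regression parameters reproduces the estimation strategy described above.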
In estimating a GARCH model, the estimated variance for each observation must be
positive. We could assume the following parameter restrictions:
α0 > 0; αi ≥ 0, i=1,2,...,q; δj ≥ 0, j=1,2,...,p
However, this set of restrictions is sufficient but not necessary (see Nelson and Cao,
1992) for positive values of the variances.
To estimate a model with ARCH or GARCH effects, we introduce the input control
variable _garch. _garch is a column vector with at least two elements. The first
element defines the autoregressive order of the GARCH model, while the second
element is the moving average order of the model. The remaining components of the
vector _garch, if given, specify the initial values of the GARCH parameters in
accordance with the orders given, as well as the initial value of the constant term.
The constant term in the variance equation is important because it may indicate a
homoscedastic structure if all the other slope parameters are insignificant. For
example,
_garch = {1,1};
specifies a GARCH(1,1) model. If, instead, we write:
_garch = {1,1,0.1,0.1,0.5};
then the initial values of the GARCH(1,1) are also given. The first two values of 0.1
are the initial values of autoregressive and moving average components of the
GARCH (1,1) process, respectively. The last element, 0.5, is the constant. The
nonlinear model estimation will begin from this set of starting values. Finally, we
remark that GPE implementation of the GARCH model estimation includes another
input variable _garchx to allow for external effects in the variance equation:
σ2t = α0 + Σi=1,2,...,q αiε2t-i + Σj=1,2,...,p δjσ2t-j + Xtγ
where Xt is a data matrix of external variables which may influence the variances, and γ
is the corresponding parameter vector. Setting the data matrix to the variable
_garchx will do the trick. For example,
_garchx = x;
where x is the data matrix of the external variables already in place, which must have
the same number of observations as the residual variances. For more information
about the use of _garch and _garchx, see Appendix A.
We will test, identify, and estimate the appropriate GARCH variance structure for
the variable ∆Pt, defined as the percentage change of the implicit GNP price deflator. We
specify 12 lags of ACF and PACF for the squared mean-deviation residuals and
compute Engle-Bollerslev LM test statistics up to the sixth lag:
_acf2 = 12;
_ebtest = 6;
Just to be sure, the ARMA structure in the mean is also investigated by examining 12
lags of ACF and PACF for the mean-deviation residuals and 6 lags for Breusch-
Godfrey LM test statistics:
_acf = 12;
_bgtest = 6;
/*
** Lesson 15.3: ARCH Analysis of U.S. Inflation
** Greene (1999), Example 18.12
*/
1 use gpe2;
2 output file=gpe\output15.3 reset;
3 n=137;
4 load data[n,4]=gpe\usinf.txt;
5 p=ln(data[2:n,4]);
6 dp=packr(100*(p-lagn(p,1)));
7 call reset;
8 _rstat=1;
9 _rplot=2;
10 _acf=12;
11 _bgtest=6;
12 _acf2=12;
13 _ebtest=6;
14 _dlags=3;
15 _garch={1,1,0.5,0.5,0.1};
16 _method=5;
17 _iter=100;
18 call estimate(dp,0);
19 end;
We note that the initial values of the GARCH(1,1) parameters are used in order to
successfully estimate the model (see line 15). Running lesson15.3, we obtain the
following result (see the generated output file output15.3 for details):16
Based on the standard normal test, we see that σ2t-1 is statistically different from zero,
but the constant term and ε2t-1 are not. The model may be re-estimated with a
GARCH(1,0) specification.
To be sure that the estimated GARCH(1,1) model does not have higher-order
structures in either the ARMA or GARCH specifications, the following extract of
output on diagnostic checking of the estimated model consists of: (1) ACF and
PACF for the estimated residuals and Breusch-Godfrey LM test for ARMA
specification; (2) ACF and PACF for the squared estimated standardized residuals
and Engle-Bollerslev LM test for ARCH specification. With an exception at the
twelfth lag of ACF and PACF for GARCH specification (possibly an outlier), the
estimated GARCH(1,1) model is acceptable for describing the U.S. inflation rate.
16 Because we use a different set of U.S. inflation rate data, the estimated model does not
match Example 18.12 of Greene (1999).
At this point, you may be wondering whether there exist ARCH effects for the
inflation rate model we considered earlier in Lesson 15.2. The mixture of ARMA
and ARCH effects may be identified and estimated for the model. We leave the
validation of ARCH effects in Lesson 15.2 to interested readers.
sample, longer lags may be used for the tests with ACF and PACF.17 We leave out
the details of identification and report only the chosen model for estimation.
/*
** Lesson 15.4: GARCH(1,1) Model of DM/BP Exchange Rate
** Bollerslev and Ghysels (1996), JBES, 307-327.
*/
1 use gpe2;
2 output file=gpe\output15.4 reset;
5 x=data[.,1];
6 call reset;
7 _names={"xrate"};
8 _rstat=1;
9 _rplot=2;
@ model identification @
10 _acf2=12;
11 _ebtest=6;
12 _acf=12;
13 _bgtest=6;
@ model estimation @
14 _garch={1,1};
15 _method=6;
16 _iter=100;
17 call estimate(x,0);
18 end;
Using the modified QHC method (line 15), the result of maximum likelihood
estimation of the GARCH(1,1) model is given below:
17 As the size of the data set exceeds the limit of GAUSS Light, the professional version of
GAUSS should be used.
Convergence Criterion = 0
Tolerance = 0.001
Initial Result:
Log Likelihood = -1722.8
Parameters = -0.016427 0.00000 0.00000 0.10000
Final Result:
Iterations = 11 Evaluations = 596148
Log Likelihood = -1106.6
Parameters = -0.0061905 0.80598 0.15313 0.010761
Gradient Vector = 0.067109 -3.2813 -2.7642 -17.896
With the exception of the constant term, all other parameters are significantly
different from zero based on the standard normal test.
Diagnostic checking on the estimated GARCH(1,1) model does not suggest a higher-
order ARMA or GARCH specification. All the statistical tests presented below
confirm that the estimated GARCH(1,1) model describes the volatility of the returns
of the Deutschemark-British pound exchange rate reasonably well.
XVI
Panel Data Analysis
We have seen two popular types of data used in econometric analysis: time-series
and cross-sectional data. However, in some circumstances, the economic data may
be a composition of time series and cross sections (i.e., the observations of several
individuals over time). International statistics, company surveys, and longitudinal
data sets are common examples. Modeling these panel data sets calls for some quite
complex stochastic specifications. In this chapter, we introduce the basic
programming techniques for panel data analysis.
For each cross section (individual) i=1,2,...,N and each time period t=1,2,...,T, we
write the regression equation as follows:
Yit = Xitβit + εit
Suppose that the regressors Xit include a constant term. Let βit = β and assume εit = ui
+ vt + eit. Note that we assume the identical β for all i and t, and consider their
differences in the components of the error term εit. Here ui represents the individual
difference in intercept (so that the individual effect is β0+ui, where β0 is the intercept
parameter in β) and vt is the time difference in intercept (so that the time effect is
β0+vt). Two-way analysis includes both time and individual effects. Throughout
much of this chapter, however, we will assume vt = 0. That is, there is no time effect
and only the one-way individual effects will be analyzed.
We further assume that eit is a classical error term with zero mean, homogeneous
variance, and neither serial correlation nor contemporaneous correlation.
That is, the error term is not correlated across individuals or time periods. Also, eit is
assumed to be uncorrelated with the regressors Xit. That is,
E(eit) = 0
E(eit2) = σ2e
E(eitejt) = 0, for i≠j
E(eiteiτ) = 0, for t≠τ
E(Xiteit) = 0
known as the fixed effects model. To estimate a model with individual fixed effects,
consider the following equation:
Yit = Xitβ + ui + eit
Let Yi = [Yi1, Yi2, ..., YiT]', Xi = [Xi1, Xi2, ..., XiT]', εi = [εi1, εi2, ..., εiT]', and υi = [ui,
ui, ..., ui]' (a column vector of T elements of ui). The pooled (stacked) model is
Y1     X1        υ1     ε1
Y2  =  X2  β  +  υ2  +  ε2    , or    Y = Xβ + υ + ε
…      …         …      …
YN     XN        υN     εN
Then D = [D1, D2, ..., DN-1] is an NT×(N-1) matrix of N-1 dummy variables, where
Di is the dummy variable for the i-th individual. Mathematically, D consists of the
first N-1 columns of the matrix I⊗ι, where I is an N×N identity matrix and ι is a
T×1 column vector of ones.
Ordinary least squares can be used to estimate the model with dummy variables as
follows:
Y = Xβ + Dδ +ε
Since X includes a constant term, we will only need N-1 dummy variables for
estimation and the estimated δ measures the individual change from the intercept.
The individual effects are then computed as the sum of the intercept coefficient and
the estimated dummy variable parameter for each individual.
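The dummy variable computation can be sketched compactly. The following Python/NumPy fragment (illustrative only; lsdv is our own name) forms the N-1 dummies as the first N-1 columns of I⊗ι and returns the slope estimates and the N individual effects:

```python
import numpy as np

def lsdv(y, x, n, t):
    """Least squares dummy variable (fixed effects) estimation:
    regress y on [1, x, D] with n-1 individual dummies, then compute the
    individual effects as intercept + dummy coefficient (base case: 0)."""
    nt, k = x.shape
    D = np.kron(np.eye(n), np.ones((t, 1)))[:, :-1]  # first n-1 columns of I kron ones
    Z = np.hstack([np.ones((nt, 1)), x, D])
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    intercept, beta, delta = b[0], b[1:1 + k], b[1 + k:]
    effects = intercept + np.concatenate([delta, [0.0]])
    return beta, effects
```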
Deviation Approach
Although the dummy variable approach is simple, the size of the problem may
become difficult to handle if the number of cross sections (individuals) is large. An
alternative is the deviation approach, in which the individual means are subtracted
from the data. That is, for each individual i, the mean-deviation regression is
Yit - Ymi = (Xit - Xmi)β + (εit - εmi)
where Ymi, Xmi, and εmi denote the means of the respective variables for the i-th
individual.
Note that the constant term drops out due to the deviation transformation. As a result,
we can recover the individual effects as ui = Ymi - Xmiβ. The variance-covariance
matrix of the individual effects can be estimated as follows:
Var(ui) = v/T + Xmi Var(β) Xmi'
where v is the estimated variance of the mean deviation regression with NT-N-K
degrees of freedom. Note that K is the number of explanatory variables not counting
the constant term.
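A sketch of the within (full deviation) computation in Python/NumPy (illustrative only; the name within_estimator is ours). It reproduces the slope estimates of the dummy variable approach and recovers ui = Ymi - Xmiβ:

```python
import numpy as np

def within_estimator(y, x, n, t):
    """Demean y and x by individual means, run OLS on the deviations
    (no intercept), and recover the individual effects u_i."""
    y = np.asarray(y, dtype=float).reshape(n, t)
    x = np.asarray(x, dtype=float).reshape(n, t, -1)
    yd = (y - y.mean(axis=1, keepdims=True)).ravel()
    xd = (x - x.mean(axis=1, keepdims=True)).reshape(n * t, -1)
    beta = np.linalg.lstsq(xd, yd, rcond=None)[0]
    effects = y.mean(axis=1) - x.mean(axis=1) @ beta  # u_i = Ym_i - Xm_i*beta
    return beta, effects
```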
We can also estimate the model by using only the calculated individual means (as
opposed to the deviations from the mean):
Ymi = Xmiβ + ui + emi, i=1,2,...,N
The parameter estimates produced from this specification are referred to as the
between-estimates, and are related to the within-estimates of the parameters.
The Wald F statistic for testing the hypothesis of no fixed effects, with N-1 and
NT-N-K degrees of freedom, is computed from the residual sums of squares of the
restricted (pooled) and unrestricted (dummy variable) regressions:
F = [(RSSR - RSSU)/(N-1)] / [RSSU/(NT-N-K)]
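Given the two residual sums of squares, the statistic is a one-line computation; a Python sketch mirroring line 22 of Lesson 16.1 below (wald_f is our own name):

```python
def wald_f(rss_r, rss_u, df_r, df_u):
    """Wald F statistic for no fixed effects, comparing the restricted
    (pooled) and unrestricted (dummy variable) regressions; compare with
    the F(df_r - df_u, df_u) distribution."""
    return ((rss_r - rss_u) / (df_r - df_u)) / (rss_u / df_u)
```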
For panel data analysis, allowing for individual effects, the model for the total cost of
production is:
ln(Cit) = αi + β1ln(Qit) + β2ln(PFit) + β3LFit + εit
where the explanatory variables are output (Q), fuel price (PF), and load factor (LF).
We notice that the intercept αi is taken to be constant over time t and specific to the
individual firm i. The interpretation of slope parameters is straightforward in that
β1 > 0, β2 > 0, and β3 < 0. Moreover, the economies of scale, defined as 1/β1 - 1,
measure the efficiency of production.
The following program implements the fixed effects analysis using dummy
variables. For a typical regression, we need only to include five dummy variables for
the case of six firms. The estimated parameters associated with dummy variables
represent the change from the intercept (or the base case). If you are interested in the
fixed effects for each individual firm, you may use the full set of 6 dummy variables
in the regression equation without including the intercept. Since the use of dummy
variables in the regression was explained earlier in Chapter IV, this program is easy
to follow. In passing, note the use of a GAUSS built-in command dummybr to
generate the necessary dummy variables (see line 18).
/*
Lesson 16.1: One-Way Panel Data Analysis, Dummy Variable Approach
Cost of Production for Airline Services I
*/
1 use gpe2;
2 output file = gpe\output16.1 reset;
3 load data[91,6] = gpe\airline.txt;
4 panel=data[2:91,1:2]; @ panel definition @
5 n=6;
6 t=15;
11 call reset;
12 _names = {"c","q","pf","lf","d1","d2","d3","d4","d5","d6"};
/* pooled estimates */
13 ys=cs;
14 xs=qs~pfs~lfs;
15 call estimate(ys,xs);
16 rssr=__rss;
17 dfr=__df;
22 f=((rssr-rssur)/(dfr-dfur))/(rssur/dfur);
23 print "Wald F Test Statistic";
24 print "for No Fixed Individual Effects = " f;
25 end;
The estimation results include the pooled regression and the dummy variable
regression. The Wald F-test statistic for fixed effects is computed from the estimated
sum-of-squares of the restricted (pooled) and unrestricted (dummy variables)
regressions (see line 22 in the program). Here is the output of running lesson16.1:
Given the critical value of the distribution F(5, 81) at 5% level of significance, it is
clear that the cost structures among the six airline firms are somewhat different. In
other words, we reject the null hypothesis that there are no fixed effects. The fixed
effects are calculated by adding the parameters of the dummy variables to the
intercept.
Remember that an alternative is to include all six dummy variables and estimate the
model without an intercept. That is, replace line 18 with the following two
statements:
_const=0;
d=dummybr(panel[.,1],seqa(1,1,n));
The individual fixed effects are summarized in the following table (numbers in
parentheses are the estimated standard errors):
Then, the model error is εit = ui + eit, which has the following structure:
E(εit) = 0
E(ε2it) = σ2u + σ2e
E(εitεiτ) = σ2u, for t≠τ
E(εitεjt) = 0, for i≠j
In other words, for each cross section i, the variance-covariance matrix of the model
error εi = [εi1, εi2, ...,εiT]' is the following T×T matrix:
       σ2e+σ2u   σ2u       …   σ2u
Σ  =   σ2u       σ2e+σ2u   …   σ2u        =  σ2eI + σ2uιι'
       …         …         …   …
       σ2u       σ2u       …   σ2e+σ2u
where I is a T×T identity matrix and ι is a T×1 column vector of ones.
If we let ε be an NT-element vector of the stacked errors ε1, ε2, ..., εN, then E(ε) = 0
and E(εε') = Σ⊗I, where I is an N×N identity matrix and Σ is the T×T variance-
covariance matrix defined above.
β = [X'(Σ−1⊗I)X]-1X'(Σ−1⊗I)y
Since Σ-1 can be derived from the estimated variance components σ2e and σ2u, in
practice the model is estimated using the following partial deviation approach.
2. Assuming the randomness of ui, estimate the between parameters of the model:
E(ui + emi) = 0
E((ui + emi)2) = σ2u + (σ2e/T)
E((ui + emi)(uj + emj)) = 0, for i≠j
E(ε*it) = 0
E(ε*2it) = σ2e
E(ε*itε*iτ) = 0 for t≠τ
E(ε*itε*jt) = 0 for i≠j
The least squares estimate of [w (Ymi - Xmiβ)] is interpreted as the change in the
individual effects.
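A sketch of the partial deviation transform in Python/NumPy. The weight w = 1 - (σ2e/(Tσ2u+σ2e))^(1/2) is the standard quasi-demeaning factor; we state it as an assumption here, since the formula for w falls outside this excerpt (quasi_demean is our own name):

```python
import numpy as np

def quasi_demean(z, n, t, sig2_e, sig2_u):
    """Partial deviation z*_it = z_it - w*zbar_i, with
    w = 1 - sqrt(sig2_e/(t*sig2_u + sig2_e)).  w = 0 (sig2_u = 0) leaves the
    pooled data unchanged; w -> 1 approaches the full (within) deviation."""
    w = 1.0 - np.sqrt(sig2_e / (t * sig2_u + sig2_e))
    z = np.asarray(z, dtype=float).reshape(n, t, -1)
    zstar = z - w * z.mean(axis=1, keepdims=True)
    return zstar.reshape(n * t, -1), w
```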
To test for the existence of the random effects (that is, the hypothesis σ2u = 0), the
Breusch-Pagan LM test statistic is computed from the pooled OLS residuals εit as
follows:
LM = [NT/(2(T-1))] [Σi=1,2,...,N(Σt=1,2,...,Tεit)2 / (Σi=1,2,...,NΣt=1,2,...,Tεit2) - 1]2
or,
LM = [NT/(2(T-1))] [Σi=1,2,...,N(Tεmi)2 / (Σi=1,2,...,NΣt=1,2,...,Tεit2) - 1]2
where εmi is the sample mean of the residuals for the i-th individual. Under the null
hypothesis, the statistic follows a chi-square distribution with one degree of freedom.
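The LM statistic above is a direct computation from the pooled OLS residuals; a Python/NumPy sketch (illustrative only; bp_lm is our own name):

```python
import numpy as np

def bp_lm(e, n, t):
    """Breusch-Pagan LM statistic for random effects from pooled OLS
    residuals e, stacked as n blocks of t observations; compare with a
    chi-square(1) distribution under the null of no random effects."""
    e = np.asarray(e, dtype=float).reshape(n, t)
    num = np.sum(e.sum(axis=1) ** 2)   # sum over i of (sum over t of e_it)^2
    den = np.sum(e ** 2)               # sum over i and t of e_it^2
    return n * t / (2.0 * (t - 1)) * (num / den - 1.0) ** 2
```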
The Hausman test begins by noting that the difference between the fixed and random
effects models is in their respective covariance matrices. Let bfixed be the estimated
slope parameters of the fixed effects model (using the dummy variable approach),
and let brandom be the estimated slope parameters of the random effects model.
Similarly, let Var(bfixed) and Var(brandom) be the estimated covariance matrices for the
fixed and random effects models, respectively. The Hausman specification test
statistic is:
(bfixed-brandom)'[Var(bfixed)-Var(brandom)]-1(bfixed-brandom)
Under the null hypothesis that the two estimates are not systematically different, this
statistic follows a chi-square distribution with degrees of freedom equal to the
number of slope parameters compared.
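Computationally the test is a single quadratic form; a Python/NumPy sketch (hausman is our own name):

```python
import numpy as np

def hausman(b_fixed, var_fixed, b_random, var_random):
    """Hausman specification statistic comparing fixed- and random-effects
    slope estimates; chi-square distributed under the null with degrees of
    freedom equal to the number of parameters compared."""
    d = np.asarray(b_fixed, dtype=float) - np.asarray(b_random, dtype=float)
    v = np.asarray(var_fixed, dtype=float) - np.asarray(var_random, dtype=float)
    return float(d @ np.linalg.solve(v, d))
```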
We put the include directive at the end of the program (see line 15 of lesson16.2). Then
one-way panel data analysis is called with the statement:
call panel1(y,x,n,t);
where y is the dependent variable and x is the data matrix of explanatory variables.
Both y and x are stacked according to the panel definition of n blocks (cross
sections) of t observations (time periods). To analyze fixed and random effects for
the airline services example, the program of Lesson 16.2 is given below:
/*
Lesson 16.2: One-Way Panel Data Analysis, Deviation Approach
Cost of Production for Airline Services II
*/
1 use gpe2;
2 output file = gpe\output16.2 reset;
3 load data[91,6] = gpe\airline.txt;
4 panel=data[2:91,1:2]; @ panel definition @
5 n=6;
6 t=15;
14 end;
15 #include gpe\panel1.gpe;
There are four sets of regression output, but we will present only the important
results of fixed effects and random effects models here:
The end of the estimation output produces a summary of the panel data analysis.
Three sets of hypothesis testing for fixed and random effects are given. Based on the
Wald F-test and the Breusch-Pagan LM test, it is clear that there exist both fixed
effects and random effects for this model. Based on the Hausman specification test,
however, there is no significant difference between the fixed and random effects.
Within-Groups Estimates:
Fixed S.E. Random S.E.
0.91928 0.029890 0.90668 0.026404
0.41749 0.015199 0.42278 0.014451
-1.0704 0.20169 -1.0645 0.20615
-6.1586e-016 0.0063356 1.1873 0.026704
One-Way Effects:
Section/Period Fixed S.E. Random S.E.
1.0000 9.7059 0.19323 9.6378 0.18313
2.0000 9.6647 0.19908 9.5979 0.18716
3.0000 9.4970 0.22505 9.4408 0.20686
4.0000 9.8905 0.24185 9.7780 0.21918
5.0000 9.7300 0.26102 9.6299 0.23371
6.0000 9.7930 0.26374 9.6831 0.23544
Finally, within-groups estimates of the slope parameters and the individual intercept
parameters are presented for the fixed effects and random effects models,
respectively. Note that the estimated fixed effects, derived from the deviation
approach, are the same as those of dummy variables approach. Furthermore, the
random effects are similar to the fixed effects, reinforcing the result of the Hausman
specification test that there is no significant difference between the two models.
Notice that the procedure panel1 is designed for the study of individual (cross-
section) effects. To study the time effects, swap the panel definition n and t and
rearrange the stacked data series accordingly. For example, in lesson16.2, you can
insert the following statements (with comments for clarity) between lines 10 and 11:
@ re-arrange data, then swap n and t @
cs=vec(reshape(cs,n,t));
qs=vec(reshape(qs,n,t));
pfs=vec(reshape(pfs,n,t));
lfs=vec(reshape(lfs,n,t));
n=15;
t=6;
We leave the estimation and interpretation of the time period effects as an exercise.
Once you understand and master the idea of one-way panel data analysis, it is
straightforward to extend it to two-way analysis. Both cross-section and time period
effects are analyzed simultaneously under the respective assumptions of fixed effects
and random effects. Greene (1999) presented such an extension as two exercises in
Chapter 14. We implement the two-way analysis in the module program
PANEL2.GPE, which extends the module PANEL1.GPE for one-way analysis
used in Lesson 16.2. You may want to examine the code of PANEL2.GPE in
comparison with the outlined formula of Greene (1999), pp. 587-589. In essence, the
two-way analysis runs five regressions: a pooled regression, two between-groups
(time periods and cross sections) regressions, and two within-groups (full deviations
and partial deviations) regressions. From these regression estimations, we calculate
overall, cross section, and time period effects. As with one-way analysis, statistics
for testing fixed effects, random effects, and for comparing fixed and random effects
are computed. The module program PANEL2.GPE hides the details of
implementation from all but the most curious eyes. PANEL2.GPE can be found in
Appendix B-4 and it is installed in the GPE subdirectory.
Similar to the one-way procedure, two-way panel data analysis is called with the
statement:
call panel2(y,x,n,t);
where y, the dependent variable, and x, the independent variables, are stacked
according to the panel definition of n blocks (cross sections) of t observations (time
periods). The rest of the program for two-way analysis is identical to the previous
lesson for one-way analysis.
/*
Lesson 16.3: Two-Way Panel Data Analysis
Cost of Production for Airline Services III
*/
1 use gpe2;
2 output file = gpe\output16.3 reset;
3 load data[91,6] = gpe\airline.txt;
4 panel = data[2:91,1:2]; @ panel definition @
5 n=6;
6 t=15;
12 call reset;
13 _names = {"c","q","pf","lf"};
14 call panel2(cs,xs,n,t);
15 end;
16 #include gpe\panel2.gpe;
It takes five regression estimations to carry out two-way panel data analysis. To save
space, we will report only the summary information as follows:
Within-Groups Estimates:
Fixed S.E. Random S.E.
0.81725 0.031851 0.90237 0.029742
0.16861 0.16348 0.42418 0.016306
-0.88281 0.26174 -1.0531 0.22948
6.1829e-016 0.0054155 1.0109 0.025968
Two-Way Effects:
Fixed Random
Overall 12.667 1.6784
From the two-way analysis, we can see that the model exhibits significant fixed
effects and random effects. The magnitude and the pattern of the two effects are
different. From examining the “Time Periods Effects” in the output, we see that the
fixed effects are larger than the random effects. On the other hand, we see that for
the “Cross Sections Effects,” the magnitude of the random effects is greater than that
of the fixed effects.
More generally, each individual (cross section) may have its own regression equation
Yi = Xiβi + εi. Stacking all N equations:
Y1     X1  0   …  0          ε1
Y2  =  0   X2  …  0    β  +  ε2
…      …   …   …  …          …
YN     0   0   …  XN         εN
Notice that not only the intercept but also the slope terms of the estimated parameters
are different across individuals. Of course, the restrictions of identical slope terms
across individuals may be imposed for direct comparison with the classical methods.
The error structure of the model is summarized as follows:
E(ε) = 0
E(Xε) = 0
E(εε') = Σ⊗I
System estimation techniques such as 3SLS and FIML should be used for parameter
estimation in this kind of model, which is seemingly unrelated regression estimation
in the current context. The SUR estimation method was discussed in Chapter XIII.
Denote b and S as the estimated β and Σ, respectively. Then,
b = [X'(S-1⊗I)X]-1X'(S-1⊗I)y
Var(b) = [X'(S-1⊗I)X]-1
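A two-step feasible version of this estimator can be sketched in Python/NumPy (illustrative only; sur_fgls is our own name, and estimating S from first-step OLS residuals is one common choice):

```python
import numpy as np

def sur_fgls(ys, xs):
    """SUR feasible GLS: ys is a list of N dependent vectors (length T each),
    xs a list of N regressor matrices (T x k_i).  Step 1: equation-by-equation
    OLS residuals give S.  Step 2: b = [X'(S^-1 kron I)X]^-1 X'(S^-1 kron I)y
    with X block diagonal."""
    N, T = len(ys), len(ys[0])
    E = np.column_stack([y - x @ np.linalg.lstsq(x, y, rcond=None)[0]
                         for y, x in zip(ys, xs)])
    S = E.T @ E / T                       # N x N residual covariance
    W = np.kron(np.linalg.inv(S), np.eye(T))
    X = np.zeros((N * T, sum(x.shape[1] for x in xs)))
    col = 0
    for i, x in enumerate(xs):            # block-diagonal regressor matrix
        X[i * T:(i + 1) * T, col:col + x.shape[1]] = x
        col += x.shape[1]
    yv = np.concatenate(ys)
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ yv)
```

When every equation has the same regressor matrix, this reduces to equation-by-equation OLS, a convenient check on the implementation.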
The advantage of the SUR estimation method for panel data analysis is that it not
only allows the intercept difference between individuals (as in the fixed and random
effects models), but also allows the slope to vary among individuals. If the slope
parameters are assumed to be constant across individuals, the method differs from
the random effects model in the fundamental assumption of the covariance structure.
By allowing cross-section correlation, the restricted SUR method is more general
than the classical random effects model.
Lesson 16.4: Panel Data Analysis for Investment Demand: Deviation Approach
To demonstrate the different approaches for panel data analysis, we consider the
following classical example of investment demand (Greene, 1999, Chap. 15;
Grunfeld and Griliches, 1960; Boot and deWitt, 1960):
Iit = β0i + β1iFit + β2iCit + εit
where Iit is gross investment, Fit the market value, and Cit the capital stock of firm i
at time t.
The panel data of 20 years for 5 companies are available in 5 separate files, one for
each company. The data files used are: ifcgm.txt (General Motor), ifcch.txt
(Chrysler), ifcge.txt (General Electric), ifcwe.txt (Westinghouse), ifcus.txt (United
Steel).
First we assume that β1i = β1 and β2i = β2 for all firms. In other words, we are
estimating the restricted SUR model by assuming that the slope parameters do not
vary across firms. To estimate and compare the fixed effects and random effects for
the model, we use the following program which is essentially the same as that of
lesson16.2. Since the five company data sets are read in separately as time series,
some manipulation is necessary to convert them into a stacked vector of dependent
variables and a stacked matrix of independent variables (see lines 8 through 14 in
lesson16.4 below). The stacked data format is required in order to use the
PANEL1.GPE module program.
/*
Lesson 16.4: Panel Data Analysis for Investment Demand
Deviation Approach
*/
1 use gpe2;
2 output file = gpe\output16.4 reset;
15 call reset;
16 _names={"i","f","c"};
17 call panel1(ys,xs,n,t);
18 end;
19 #include gpe\panel1.gpe;
As described earlier, using the module PANEL1.GPE to estimate the one-way fixed
and random effects gives us four sets of regression output: the pooled regression,
between-groups means regression, within-groups full deviations regression, and
within-groups partial deviations regression. You should check the details of each
regression output. We present only the summary results of the analysis.
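The deviation transformations behind these regressions can be sketched as follows. This is a Python illustration on made-up data, and the weight theta below is an arbitrary stand-in for the GLS weight that PANEL1.GPE computes from the estimated variance components.

```python
import numpy as np

# Sketch of the deviation (within-groups) transformations behind the four
# regressions, for a balanced panel with n sections and t periods.
rng = np.random.default_rng(1)
n, t = 5, 20
y = rng.normal(size=(n, t))                     # y[i, s]: section i, period s

group_means = y.mean(axis=1, keepdims=True)     # between-groups means regression data
within_full = y - group_means                   # full deviations (fixed effects)
theta = 0.6                                     # illustrative GLS weight, 0 < theta <= 1
within_partial = y - theta * group_means        # partial deviations (random effects)
```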
Within-Groups Estimates:
Fixed S.E. Random S.E.
0.10598 0.015891 0.10489 0.015112
0.34666 0.024161 0.34602 0.024770
-1.6507e-014 6.9118 -8.8082 8.1293
One-Way Effects:
Section/Period Fixed S.E. Random S.E.
1.0000 -76.067 66.886 -69.356 58.234
2.0000 -29.374 19.814 -33.176 19.376
3.0000 -242.17 33.321 -213.56 31.028
4.0000 -57.899 19.703 -57.575 19.263
5.0000 92.539 33.947 72.218 31.535
It is interesting to find that the classical estimates of fixed effects and random
effects are similar. This is consistent with the very small Hausman specification
test statistic shown in the output.
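As a rough illustration of how that statistic is formed, the following Python fragment recomputes a Hausman-type statistic from the two slope rows of the table above. Only standard errors are reported there, so a diagonal covariance difference is assumed here; PANEL1.GPE uses the full covariance matrices and reports the absolute value of the quadratic form.

```python
import numpy as np

# Hausman statistic h = (b_FE - b_RE)' [V_FE - V_RE]^(-1) (b_FE - b_RE),
# using the reported slope estimates and standard errors only
# (diagonal-covariance approximation, for illustration).
b_fe = np.array([0.10598, 0.34666]); se_fe = np.array([0.015891, 0.024161])
b_re = np.array([0.10489, 0.34602]); se_re = np.array([0.015112, 0.024770])

d = b_fe - b_re
V = np.diag(se_fe**2 - se_re**2)        # V_FE - V_RE (diagonal approximation)
h = abs(d @ np.linalg.solve(V, d))      # compare with a chi-square(2) critical value
```

Even under this crude approximation the statistic is far below any conventional chi-square(2) critical value, consistent with the text.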
Lesson 16.5: Panel Data Analysis for Investment Demand: SUR Method
By restricting β1i = β1 and β2i = β2 for all firms, the restricted SUR estimation
method is used in direct comparison with the classical methods of panel data
analysis. In Chapter XIII we implemented and estimated a system of linear demand
equations using the SUR estimation method. The use of the input control variable
_eq in estimating the simultaneous linear equations system was discussed in detail
in Chapter XIII. In Chapter III we introduced the implementation of restricted least
squares with the use of input control variable _restr. Parameter restrictions across
equations in a linear equations system were again discussed in Chapter XIII. You
may want to review these chapters and the relevant examples before working on this
lesson.
In Lesson 16.5, the restricted SUR method is estimated using iterative three-stage
least squares (_method=3). The result is the same as full information maximum
likelihood.
/*
Lesson 16.5: Panel Data Analysis for Investment Demand Function
Seemingly Unrelated Regression Estimation
*/
1 use gpe2;
2 output file = gpe\output16.5 reset;
11 yvar=i;
12 xvar=f~c;
13 call reset;
14 _names={"i-gm","i-ch","i-ge","i-we","i-us",
"f-gm","f-ch","f-ge","f-we","f-us",
"c-gm","c-ch","c-ge","c-we","c-us"};
@ I I I I I F F F F F C C C C C 1@
15 _eq = {-1 0 0 0 0 1 0 0 0 0 1 0 0 0 0,
0 -1 0 0 0 0 1 0 0 0 0 1 0 0 0,
0 0 -1 0 0 0 0 1 0 0 0 0 1 0 0,
0 0 0 -1 0 0 0 0 1 0 0 0 0 1 0,
0 0 0 0 -1 0 0 0 0 1 0 0 0 0 1};
17 _iter=200;
18 _method=3;
19 call estimate(yvar,xvar);
20 end;
You should run the program to get the full report of the estimation results. The
output of the restricted SUR estimation is lengthy, but can be summarized as
follows:
To compare the fixed effects, random effects, and SUR method, the estimated
parameters of the investment function are tabled together. The individual effects for
three methods (under different covariance assumptions) are shown in the rows of
intercept terms for each firm. Numbers in parentheses are the estimated standard
errors.
Although the estimates from the models of fixed effects and random effects are
similar, the parameter estimates obtained from the SUR estimation method are quite
different. The impact of different covariance assumptions when estimating the model
is obvious. Since the SUR method is typically applied to estimating a model with
varying slope as well as intercept terms, we can easily estimate the unrestricted
model by removing (or commenting out) the restriction statement in line 16 of
lesson16.5 above. By comparing the results to those of the restricted model, the
validity of the assumption of constant slopes may be tested. The following table
presents the comparison results of restricted and unrestricted estimates (standard
errors are in parentheses). The large Likelihood Ratio statistic of the two models,
calculated as 2 × [-459.092 -(-490.753)] = 63.322, leads us to the conclusion that the
slope parameters are not the same across the five firms under consideration.
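The quoted statistic is easy to verify from the two log-likelihood values. The degrees of freedom below (8, from equalizing two slope coefficients across five firms) is our reading of the restrictions rather than a figure stated in the text:

```python
# Likelihood-ratio statistic for restricted vs. unrestricted SUR,
# recomputed from the log-likelihoods quoted above.
ll_restricted, ll_unrestricted = -490.753, -459.092
lr = 2 * (ll_unrestricted - ll_restricted)   # the 63.322 quoted in the text
df = 2 * (5 - 1)                             # assumed: 2 slopes equalized across 5 firms
```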
In summary, we have presented the classical methods of panel data analysis: fixed
effects and random effects. A more general SUR approach was introduced, allowing
us to consider the contemporaneous correlation across individuals that the
classical methods ignore. Misspecification issues such as autocorrelation and
heteroscedasticity in panel data are important as well; these problems were
discussed in Chapter X.
XVII
Least Squares Prediction
The art of forecasting lies in building a practical model for real world application,
and in the preceding chapters, we have presented all the tools necessary to do so in
GPE. This chapter introduces the few remaining GPE control variables dedicated
solely to least squares prediction and time series forecasting.
Least squares prediction is nothing more than the extrapolation of the estimated
regression model from a set of historical observations into the unknown future. It is
assumed that given the stable model structure, the future state is predictable from the
replication of history.
The lessons in this chapter use the data file gdp96.txt. It consists of four variables:
QUARTER (quarterly index), GDP (Gross Domestic Product in billions of dollars),
PGDP (Implicit Price Deflator of GDP, 2000 = 100), and LEADING (Composite
Economic Leading Indicator, 1996 = 100). We note that the quarterly series
LEADING is taken from the last month of each quarter.
The target variable is the annual growth rate of real GDP. The following GAUSS
statements generate the required data series of GDP growth:
rgdp = 100*gdp./pgdp;
growth = 100*(rgdp-lagn(rgdp,4))./lagn(rgdp,4);
First, Real Gross Domestic Product is expressed in billions of 2000 dollars. Then,
GDP growth is measured as the annual percentage rate of change in real GDP from
the same quarter last year. Although the causal relationship of the variables
LEADING and GROWTH is well grounded, we have to make sure that these two
variables are cointegrated. It turns out that both variables are stationary or I(0)
processes and thus do not have unit roots. Moreover, the two variables are
cointegrated. We leave the details of the unit roots and cointegration tests of
LEADING and GROWTH as exercises. See also Chapter XVI for a thorough
review.
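For readers following along in another language, the two GAUSS statements above amount to the following (Python, with a short made-up series rather than the gdp96.txt data):

```python
import numpy as np

# rgdp = 100*gdp./pgdp and growth = 100*(rgdp-lagn(rgdp,4))./lagn(rgdp,4),
# sketched on an illustrative 8-quarter series (not the gdp96.txt data).
gdp  = np.array([100.0, 102.0, 104.0, 106.0, 108.0, 110.5, 113.0, 115.0])
pgdp = np.array([ 96.0,  97.0,  97.5,  98.0,  99.0, 100.0, 100.5, 101.0])

rgdp = 100 * gdp / pgdp                    # real GDP in base-year (2000) dollars
growth = np.full_like(rgdp, np.nan)        # first 4 quarters have no year-ago base
growth[4:] = 100 * (rgdp[4:] - rgdp[:-4]) / rgdp[:-4]
```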
We are now ready to construct a working model suitable for short-run structural
estimation and prediction. Since forecasting is a time-sensitive business, we reserve
the last two years of data for ex-post forecast evaluation. In other words, we are
going to estimate the model using data through 2001, and see how well the model
predicts real GDP growth in 2002 and 2003. We need to construct a working model
not only for historical estimation but also for forecasting.[18]
If the variable LEADING can actually predict GROWTH several quarters ahead,
then a distributed lag structure must be specified. As a result of trial and error, we
have found that both the first and fifth quarter lags of LEADING are useful in
explaining historical GROWTH. In addition, the momentum effect of GDP growth is
captured with a lagged dependent variable. The model error is identified to be a
MA(4) process. By construction, the dependent variable GROWTH is the annual rate
of GDP growth based on the same quarter in the previous year. The specification of
fourth-order moving average for the model error term should not, therefore, be
surprising. Of course, this may not be the only working specification of the model
you can construct. Throughout this book we have given examples of model building.
We leave the process of finding the best model for forecasting to you. However, we
emphasize the importance of using GPE variables such as _pdl and _dlags to
determine the short-run dynamics of the model structure.
We now turn to new forecasting features of the GPE package. In GPE, the work of
least squares prediction is done by a procedure called forecast. Forecasts are
usually computed after the estimation of the model. Calling forecast is similar to
calling estimate. In calling forecast, you need to specify only the dependent
and independent variables (in that order). The estimated parameters and the
associated covariance matrix of the regression model in the immediately preceding
estimate statement are used to compute the forecasts for the same model. The
forecasting period defaults to begin one observation after the estimation period ends
and continues to the end of the data series. If future observations of the dependent
variable become available, ex-post forecast error statistics based on the results of
least squares prediction can be used for model evaluation.
If there are longer series of right-hand side explanatory variables, ex-ante forecasts
can be computed upon request. The GPE control variables _fbegin and _fend are
used to specify the beginning and end of the multiple-step-ahead forecasts. In
most cases, ex-ante forecasting depends on the forecasts or scenario assumptions
made regarding the explanatory independent variables. If the Composite Economic
Leading Indicator can predict the economy three to nine months ahead as claimed,
our model can certainly point out the direction of GROWTH about one year ahead of
the current Leading Indicator. Furthermore, by making scenario assumptions
about the variable LEADING (for example assuming no change in LEADING for
the next year or so) we can predict the future value of GROWTH even further out on
the horizon.
[18] This chapter is based on the 2004 forecasts and is subject to annual revision and
updates. The latest version of the forecasts can be found in the e-book copy of this
chapter on the CD-ROM.
/*
** Lesson 17.1: Ex-Post Forecasts and
** Forecast Error Statistics
*/
1 use gpe2;
2 output file = gpe\output17.1 reset;
/* Model Estimation */
11 call reset;
12 _rstat=1;
13 _dlags=1;
/*
_bgtest=4;
_ebtest=4;
_acf=12;
_acf2=12;
*/
14 _arma={0,4};
15 _iter=100;
16 _method=5;
17 _begin=9; @ 1961Q1 @
18 _end=172; @ 2001Q4 @
19 call estimate(growth,xvar);
/* Forecasting */
20 _fstat=1;
21 _fplot=1;
@ _dynamic=1; @
22 call forecast(growth,xvar);
23 end;
The program is divided into two main sections: estimation and forecasting. Notice
that line 10 assigns the matrix of independent variables to the variable XVAR.
XVAR is then passed to both estimate (line 19) and forecast (line 22).
Modifying the independent variable matrix can quickly be done by editing only line
10.
The distributed lag structure of the model includes the first and fifth quarter lags of
the independent variable LEADING (line 10) and a one-quarter lag of the dependent
variable GROWTH (line 13). The first five quarters of data series are lost due to
variable transformation. Through model identification, we determine that the error
structure follows an MA(4) process. Using the QHC method for maximum
likelihood estimation (lines 15-16), the model is estimated from the first quarter of
1961 (or the 9th observation) to the fourth quarter of 2001 (or the 172nd
observation):
_begin = 9;
_end = 172;
The _begin statement (line 17) safely blocks out the unusable data series for
estimation, while _end statement (line 18) reserves the rest of the data series (the
last two years of 2002 and 2003) for ex-post forecast evaluation. The output of the
estimated model follows:
Initial Result:
Log Likelihood = -220.68
Parameters = 0.64250 0.32175 -0.33188 1.6991 0.00000
0.00000 0.00000 0.00000
Final Result:
Iterations = 23 Evaluations = 241900
Log Likelihood = -190.95
Parameters = 0.70158 0.29521 -0.30270 1.3044 -0.14822
-0.18625 -0.049189 0.66183
Gradient Vector = -0.24025 -5.8682 -5.7777 -0.070409
0.0014879 -0.00099907 -0.00010876 0.0028283
Although the estimated model with MA(4) error structure looks satisfactory, add the
following few lines before the estimation call (line 19):
_bgtest = 4;
_ebtest = 4;
_acf = 12;
_acf2 = 12;
and rerun the model to verify the serial correlation problem in the conditional mean
and variance,[19] if any.
The next section of the program (lines 20 to 22) calls for least squares prediction
based on the immediately preceding estimated model. Simply calling forecast
specifies the default prediction period, which begins after the last observation used in
the regression estimation, and ends with the end of the sample. We note that the
beginning and end of the prediction period can be controlled by two GPE input
variables, _fbegin and _fend, respectively.
Ex-post forecast error statistics are computed by setting the input control variable
_fstat=1 (line 20). This control variable is similar to its counterpart, _rstat,
used for estimate. In addition, plotting the forecasts and actuals together can
provide visual clues as to the model’s performance. This is done in line 21 by setting the
input control variable _fplot = 1.
[19] The dynamic model may be correlated in terms of conditional variance, identifiable
with an ARCH or GARCH specification.
Each observation in the forecast period is listed, complete with observed and
predicted values. Residuals (or forecast errors) and their standard errors are also
given. Since we have put aside the last two years (eight quarters) of the GROWTH
data series to be compared with the forecasts, ex-post forecast error statistics,
including mean squared error and its components, are computed from the last eight
quarters of GDP growth. Moreover, forecasts in pairs with actuals are plotted
together with the band of two standard errors. We note that the upper and lower
bounds of the forecast represent the minimal spread of the prediction. In reality, the forecast
interval tends to be much wider due to additional non-sampling errors. Econometrics
texts describe the model evaluation based on this set of forecast error statistics in
detail. We leave judgment of the model’s performance to you.
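As a sketch of the kind of statistics _fstat = 1 produces (the exact set and layout of GPE's output may differ, and the actual and forecast series below are made up), the mean squared error and its Theil bias/variance/covariance components can be computed as:

```python
import numpy as np

# Generic ex-post forecast error statistics on an illustrative 8-quarter window.
actual   = np.array([2.1, 2.4, 2.0, 1.8, 2.6, 3.0, 3.3, 3.5])
forecast = np.array([2.0, 2.2, 2.1, 2.0, 2.4, 2.7, 3.0, 3.1])

err  = actual - forecast
mae  = np.mean(np.abs(err))
mse  = np.mean(err ** 2)
rmse = np.sqrt(mse)

# Theil's decomposition of MSE into bias, variance, and covariance proportions
sa, sf, rho = actual.std(), forecast.std(), np.corrcoef(forecast, actual)[0, 1]
bias_p = (forecast.mean() - actual.mean()) ** 2 / mse
var_p  = (sf - sa) ** 2 / mse
cov_p  = 2 * (1 - rho) * sf * sa / mse          # the three proportions sum to one
```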
It can be shown that the method of least squares yields the best linear unbiased
predictor. Since the model is dynamic in nature (with a lagged dependent variable),
we have an option to perform a dynamic forecast. A dynamic forecast is obtained by
using the predicted lagged dependent variable on the right-hand side of the
forecasting equation, instead of the actual lagged dependent variable. Let’s turn on
the dynamic option of least squares prediction:
_dynamic = 1;
Make sure that the dynamic option is added before calling forecast, and run the
program to see the result:
As expected, the model performance deteriorates when we forecast farther ahead into
the future. This is because the predicted value of the lagged dependent variable is
used in place of the actual value of the lagged dependent variable. Including the
predicted value of the lagged dependent variable simply means that each forecast
error is compounded over the forecast period. One important characteristic of the
dynamic forecast is that the further in the future we try to predict, the less reliable the
forecasts we get.
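The distinction can be sketched with a toy first-order dynamic equation (the coefficients are made up and are not the estimated model):

```python
# Static vs. dynamic forecasting for y_t = a + b*y_{t-1} (illustrative values).
a, b = 0.5, 0.8
y_hist = [2.0, 2.1, 2.2]               # last observed values of the dependent variable
y_actual_future = [2.3, 2.5, 2.4]      # actuals that become available ex post

# static: each forecast uses the ACTUAL lagged value
static = [a + b * y_prev for y_prev in [y_hist[-1]] + y_actual_future[:-1]]

# dynamic: each forecast feeds back the PREDICTED lagged value,
# so forecast errors compound over the forecast period
dynamic = []
y_prev = y_hist[-1]
for _ in range(3):
    y_prev = a + b * y_prev
    dynamic.append(y_prev)
```

The first step is identical under both schemes; from the second step on, the dynamic forecast drifts away from the static one.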
Recall that the variable z is the original data matrix read from the data file
gdp96.txt. We could create a “pessimistic scenario” such as the following, in
which the variable LEADING declines at a 2 percent annual rate over the next year:
leading = z[2:n,4]|115.0|114.4|113.8|113.2|112.6;
Or, we could assume an “optimistic scenario” (2 percent annual growth rate) as well:
leading = z[2:n,4]|115.0|115.6|116.2|116.8|117.4;
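The two scenario vectors simply extend LEADING by a fixed step of 0.6 index points per quarter, roughly 2 percent of the base value 115.0 per year:

```python
# The scenario tails appended to LEADING in the two GAUSS statements above:
# a linear path of +/- 0.6 index points per quarter from a base of 115.0.
base, step = 115.0, 0.6
pessimistic = [round(base - step * q, 1) for q in range(5)]
optimistic  = [round(base + step * q, 1) for q in range(5)]
```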
In other words, ex-ante forecasts are nothing but a crystal-ball prediction about
uncertain future conditions. To keep the model performance in line with the available
information, we do not use the dynamic features of the least squares prediction
during the ex-post forecast periods. Dynamic forecasting is automatic during the
ex-ante forecast periods anyway, since the value of the lagged dependent variable
is not available and must itself be predicted for each period ahead.
The following program is almost identical to that of Lesson 17.1. Pay attention to
the change we made in line 7, which assumes a scenario for the ex-ante forecast:
/*
** Lesson 17.2: Ex-Ante Forecasts
*/
1 use gpe2;
2 output file = gpe\output17.2 reset;
8 rgdp = 100*gdp./pgdp;
9 growth = 100*(rgdp-lagn(rgdp,4))./lagn(rgdp,4);
10 xvar = lagn(leading,1)~lagn(leading,5);
/* Model Estimation */
11 call reset;
12 _rstat=1;
13 _dlags=1;
14 _arma={0,4};
15 _iter=100;
16 _method=5;
17 _begin=9; @ 1961Q1 @
18 _end=172; @ 2001Q4 @
19 call estimate(growth,xvar);
/* Forecasting */
20 _fstat=1;
21 _fplot=1;
@ _dynamic=1; @
22 call forecast(growth,xvar);
23 end;
Similarly, we run the other two scenarios, pessimistic (LEADING decreases) and
optimistic (LEADING increases), respectively. Instead of listing each of the
forecasting results, we compare the respective ex-ante forecasts in the following
table:
Furthermore, the following graph summarizes the ex-post and ex-ante forecasts for
three different scenarios. A picture can tell a complicated story more clearly.
What can you say about the predictability of the Composite Economic Leading
Indicator? No matter which scenario is used, the economy appears to be heading
towards a so-called “jobless recovery.” Will the recovery continue? Even under the
optimistic view, the growth may not be sustained through the end of 2004. It will
certainly depend on effective government policy to revive growth. Only time will tell!
Epilogue
This is not the end of Gauss Programming for Econometricians and
Financial Analysts!
Many important topics in econometrics that we did not cover here would certainly
be good candidates for GAUSS implementation.
Beyond GPE, you may feel ready to write your own code for econometric and
statistical applications. More than 400 GAUSS commands, procedures, and functions
are at your disposal as part of the GAUSS programming environment. As a
consequence, we have seen powerful procedures being developed over the past
years.
Whatever your eventual goals, you will probably agree that learning econometrics
with GPE is certainly the right first step. We have demonstrated that GAUSS is a
programming environment built on the convenient syntax and operations of matrix
algebra. As you step through each lesson, learning to master GPE, you also learn
GAUSS and econometrics. From here, the next step is up to you!
Appendix A
GPE Control Variables
There are two types of global control variables in GPE: input control variables and
output control variables. For reference purposes, consider the following general
regression equation:
F(Z, β) = ε
where Z is the data matrix of variables and β is the vector of parameters, which
define the functional form F. ε is the error structure of the model. Z can be further
decomposed as Z = [Y, X] with Y denoting the endogenous (dependent) variables
and X the predetermined (independent) variables. If Y consists of more than one
column, it is a system of linear equations. For a classical regression model, Y =
f(X, β) + ε or F(Z, β) = Y - f(X, β) = ε. The simple case of single linear regression
equation is written as:
Y = Xβ + ε
where Y is the left-hand side (LHS) or dependent variable, and X denotes the right-
hand side (RHS) explanatory or independent variables. β is the vector of estimated
parameters, and ε is the vector of estimated residuals.
Three categories of input control variables are listed below: general-purpose input
control variables, estimate (and optimize) input control variables, and
forecast input control variables. Unless otherwise specified, setting each variable
to 1 (that is, true or yes) activates or turns on the optional behavior specified. If the
variable is not defined or specified, then its default value is assumed.
_acf Specify the number of lags for computing autocorrelation and partial
autocorrelation coefficients from the estimated regression residuals.
Useful for testing the ARMA error structure. Display and plot the
functions if _rplot > 0 (see below). In addition, standard errors of
coefficients and Box-Pierece and Ljung-Box portmanteau test statistics
are presented up to the number of lags specified in _acf. For example,
12 lags of autocorrelation and partial autocorrelation functions are
requested by setting:
_acf = 12;
_acf = 0; (default)
As an option for computing autocorrelation coefficients and the
associated standard errors using the regression method, the second element
of the vector _acf may be set to a positive value, with the first element
indicating the number of lags requested. For example:
_acf = {12,1};
_acf2 Same as _acf except that the autocorrelation and partial autocorrelation
coefficients are computed from the squared standardized residuals.
Useful for testing the GARCH error structure.
_acf2 = 0; (default)
_dlags A scalar or a 2x1 column vector to specify the use of lagged dependent
variables. As a scalar, it is the order of the regular lagged dependent
variables in use. As a 2x1 column vector, a seasonal lagged dependent
variables model is identified with order _dlags[1] and seasonal span
_dlags[2] (requires _dlags[2] > 0). Normally, _dlags[2] = 4 for a model
with quarterly data series, while _dlags[2] = 12 for the monthly case.
_dlags[1] or the scalar _dlags is always the order number. For a pure
(regular or seasonal) lagged dependent variables model, set the RHS
variable X = 0 when calling the ESTIMATE procedure and specify the column
vector _dlags accordingly. For example:
_dlags = q; (or equivalently, _dlags = {q,1};)
_dlags = {q,s};
where q is the lag order and s is the seasonal span.
_dlags = 0; (default)
For a system model, _dlags is a gxg matrix with the value of its entry
indicating the number of lags for each endogenous variable (column) in
each equation (row). A zero ij-element of _dlags signifies that no lag is
used for the j-th variable in the i-th equation. Here, g is the number of
endogenous variables or equations.
_drop Drop the first few observations for model estimation. Depending on the
method of estimation, initial unusable observations may be dropped
automatically.
_drop = 1: Drop the first observation or the first seasonal span of
observations for AR model estimation;
_drop = 0 (default): Keep the first observation or the first seasonal span
of observations for AR model estimation with appropriate data
transformation.
_eq Specify the stochastic equation specification matrix for a system model.
_id Specify the identity equation specification matrix for a system model.
Similar size and setup as _eq (see above), except that its entries can take
any value as required. If _id is not specified (or _id = 0, the default),
there are no identity equations. Note: gs = rows(_eq|_id) to ensure system
compatibility.
_ivar may be used together with _iter and _hacv (see above) to produce
the GMM estimation.
_restart = 0; (default)
_restr Perform restricted least squares estimation with the linear restrictions
defined in accordance with the form: Rb = q, or [R1 R0][b1 b0]’ = q,
where b1 is the vector of slope coefficients and R1 is the restriction
matrix corresponding to b1. Similarly, b0 is the intercept and R0
corresponds to b0. q is the vector of restricted values. Linear restrictions
involving the intercept should be specified in conjunction with _const = 0.
If _restr is specified, then _restr = [R1 q]. It requires rows(_restr) =
number of restrictions, and cols(_restr) = cols(X).
_restr = 0; (default)
For a nonlinear model, the residual plot is meaningful only for the cases
specified with error component functions (i.e., _nlopt = 0 or 1). In
defining the error component function, the dependent variable must be
the first column of the data matrix and must not be transformed
within the definition of the error component function.
For a nonlinear model, residual statistics are meaningful only for the
cases specified with error component functions (i.e., _nlopt = 0 or 1). In
defining the error component function, the dependent variable must be
the first column of the data matrix and must not be transformed
within the definition of the error component function.
residual normality;
_rtest[2] is the same as _bptest (see above) for Breusch-Pagan test for
heteroscedasticity;
_rtest[3] is the same as _bgtest (see above) for Breusch-Godfrey test
for higher-order autocorrelation;
_rtest[4] is the same as _ebtest (see above) for Engle-Bollerslev test
for higher-order autoregressive conditional heteroscedasticity.
_rtest = 0 (default): No tests are performed.
_step Specify step size of line search method for iterative or nonlinear model
estimation.
_step = 0 (default): Cut back (half) step size is used;
_step = 1: Quadratic step size is used.
_tol Set the convergence tolerance level for iterative or nonlinear model
estimation.
_tol = 0.001; (default)
_weight Perform weighted least squares estimation with the weighting variable
defined in _weight. _weight must be a column vector and
rows(_weight) >= rows(X).
_weight = 0; (default)
Variable Description
Miscellaneous
A few GAUSS built-in procedures have been modified, and they can be called
throughout the program using the GPE package.
Procedure Description
Appendix B
GPE Application Modules
Each of the GPE application modules is given AS IS. The user is free to use and to
make changes as needed for different purposes. However, the usual disclaimer
applies. In particular, the following copyright statement must be included as long as
all or part of the program code is used in your work:
** ==>
** ==> call gmmout(x,__b);
*/
declare gmmw ?= 1;
/*
Sample average of moments
*/
proc gmmm(x,b);
local m,d;
m=meanc(mf(x,b));
retp(m);
endp;
/*
Covariance matrix of sample averages of moments
considering White-Newey-West autocovariances
depending on global _hacv
*/
proc gmmv(x,b);
local n,m,v,s,j;
n=rows(x);
m=mf(x,b)/n;
v=m'm; @ hetero. variances @
j=1;
do until j>_hacv[2]; @ autocovariances @
s=m'*missrv(lagn(m,j),0);
v=v+(1-j/(_hacv[2]+1))*(s+s');
j=j+1;
endo;
retp(v);
endp;
/*
GMM criterion function: depending on global gmmw
Weighted sum of squared sample averages of moments
*/
proc gmmqw(x,b);
local m;
m=gmmm(x,b);
retp(m'*gmmw*m);
endp;
/*
GMM criterion function: general
Weighted sum of squared sample averages of moments
*/
proc gmmq(x,b);
local m;
m=gmmm(x,b);
gmmw=invpd(gmmv(x,b));
retp(m'*gmmw*m);
endp;
print;
print "Hansen Test Statistic of the Moment Restrictions";
print ftos(rows(m)-rows(b),"Chi-Sq(%*.*lf) = ",4,0);;
print q;
__vb=vb; @ using the GMM var-cov matrix @
endp;
one=ones(n,1);
if p>1; @ VAR(p), p>1 @
x=y[.,2*m+1:cols(y)]; @ RHS x data matrix @
if c>0;
if c==1; @ with drift only @
e=one-x*(one/x);
endif; @ constant regression residuals @
if c==2; @ with trend drift @
x=x~one;
endif;
endif;
@ auxiliary regression residuals @
u=dy-x*(dy/x); @ (1) difference regression @
v=y1-x*(y1/x); @ (2) lag regression @
else; @ p==1, or VAR(1) @
if c>0;
u=dy-meanc(dy)';
v=y1-meanc(y1)';
if c==1; e=one; endif;
else;
u=dy; v=y1;
endif;
endif;
z=packr(ys~xs);
y=z[.,1];
x=z[.,2:k+1];
call estimate(y,x);
rssr=__rss;
dfr=__df;
ws=vec(reshape(w,t,n));
z=packr((ys-ws.*yms)~(xs-ws.*xms));
y=z[.,1];
x=z[.,2:k+1];
call estimate(y,x);
b2=__b;
vb2=__vb*(__df/dfur);
xm2=xm~(-1/w); @ w must be > 0 @
a2=w.*(ym-xm2*b2);
va2=(w^2).*(v/meanc(ts)+xm2*vb2*xm2');
h=(b1[1:k]-b2[1:k])'*inv(vb1[1:k,1:k]-vb2[1:k,1:k])*(b1[1:k]-b2[1:k]);
/* print output */
print;
print "Panel Data Model Estimation Procedure:";
print "(1) Pooled Regression";
print "(2) Between-Groups Regression";
print "(3) Fixed Effects (Within-Groups) Regression";
print "(4) Random Effects (Weighted Within-Groups) Regression";
print;
print "Wald F Test Statistic for Fixed Effects";
print ftos(dfr-dfur,"F(%*.*f,",4,0);;
print ftos(dfur,"%*.*f) = ",4,0);;
print wf;
print;
print "Breusch-Pagan LM Test Statistic for Random Effects";
print ftos(1,"Chi-Sq(%*.*f) = ",4,0);;
print bp;
print;
print "Hausman's Test for Fixed and Random Effects";
print ftos(k,"Chi-Sq(%*.*f) = ",4,0);;
print abs(h);
print;
print "Within-Groups Estimates:";
print " Fixed S.E. Random S.E.";;
print b1~sqrt(diag(vb1))~b2~sqrt(diag(vb2));
print;
print "One-Way Effects:";
print "Section/Period Fixed S.E. Random S.E.";;
print seqa(1,1,n)~a1~sqrt(diag(va1))~a2~sqrt(diag(va2));
endp;
i=1;
do until i>n;
i1=(i-1)*t;
y1=ys[i1+1:i1+t];
x1=xs[i1+1:i1+t,.];
z=packr(y1~x1);
ts[i]=rows(z);
zm=meanc(z);
ymi[i]=zm[1];
xmi[i,.]=zm[2:rows(zm)]';
i=i+1;
endo;
i=1;
do until i>t;
i1=(i-1)*n;
y1=y[i1+1:i1+n];
x1=x[i1+1:i1+n,.];
z=packr(y1~x1);
ns[i]=rows(z);
zm=meanc(z);
ymt[i]=zm[1];
xmt[i,.]=zm[2:rows(zm)]';
i=i+1;
endo;
/*
bp=(n*t/2)*(
(1/(t-1))*((sumc(sumc(reshape(__e,n,t)')^2)/sumc(sumc(__e^2))-1)^2)+
(1/(n-1))*((sumc(sumc(reshape(__e,n,t))^2)/sumc(sumc(__e^2))-1)^2));
*/
wf=((rssr-rssur)/(dfr-dfur))/(rssur/dfur);
b1=__b;
vb1=__vb*(__df/dfur);
c1=ymm-xmm*b1[1:k]; @ overall effects, note: b1[k+1]=0 @
a1i=(ymi-ymm)-(xmi-xmm)*b1[1:k]; @ cross sections effects @
a1t=(ymt-ymm)-(xmt-xmm)*b1[1:k]; @ time periods effects @
/* random effects model (weights must be computed for each obs (nxt)) */
v3=meanc(v1)+meanc(v2)-v; @ v3 is a scalar @
w1=1-sqrt(v./v1); @ w1 is a nx1 vector @
w1=(w1.<=0).*__macheps + (w1.>0).*w1; @ 0 < w1 <= 1 @
w2=1-sqrt(v./v2); @ w2 is a tx1 vector @
w2=(w2.<=0).*__macheps + (w2.>0).*w2; @ 0 < w2 <= 1 @
w3=maxc((1-sqrt(v./v3))|__macheps);
w3=meanc(w1)+meanc(w2)-w3; @ w3 is a scalar @
w1s=vec(reshape(w1,t,n));
w2s=reshape(w2,n*t,1);
ystar=ys-w1s.*ymis-w2s.*ymts+w3.*ymm;
xstar=xs-w1s.*xmis-w2s.*xmts+w3.*xmm;
z=packr(ystar~xstar);
y=z[.,1];
x=z[.,2:k+1];
call estimate(y,x);
b2=__b;
vb2=__vb*(__df/dfur);
c2=w3.*(ymm-xmm*b2[1:k])+b2[k+1]; @ overall effect @
a2i=(w1.*ymi-w3.*ymm)-(w1.*xmi-w3.*xmm)*b2[1:k]; @ individual effects @
a2t=(w2.*ymt-w3.*ymm)-(w2.*xmt-w3.*xmm)*b2[1:k]; @ period effects @
h=(b1[1:k]-b2[1:k])'*inv(vb1[1:k,1:k]-vb2[1:k,1:k])*(b1[1:k]-b2[1:k]);
/* print output */
print;
print "Panel Data Model Estimation Procedure:";
print "(1) Pooled Regression";
print "(2) Between-Groups (Cross Sections) Regression";
print "(3) Between-Groups (Time Periods) Regression";
print "(4) Fixed Effects (Within-Groups) Regression";
print "(5) Random Effects (Weighted Within-Groups) Regression";
print;
print "Wald F Test Statistic for Fixed Effects";
print ftos(dfr-dfur,"F(%*.*f,",4,0);;
print ftos(dfur,"%*.*f) = ",4,0);;
print wf;
print;
print "Breusch-Pagan LM Test Statistic for Random Effects";
print ftos(2,"Chi-Sq(%*.*f) = ",4,0);;
print bp;
print;
print "Hausman's Test for Fixed and Random Effects";
print ftos(k,"Chi-Sq(%*.*f) = ",4,0);;
print abs(h);
print;
print "Within-Groups Estimates:";
print " Fixed S.E. Random S.E.";;
print b1~sqrt(diag(vb1))~b2~sqrt(diag(vb2));
print;
print "Two-Way Effects:";
print " Fixed Random";
print " Overall " c1~c2;
print;
print "Cross Sections Effects:";
print " Sections Fixed Random";;
print seqa(1,1,n)~a1i~a2i;
print;
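The Hausman statistic h computed above is the quadratic form (b1-b2)'[Var(b1)-Var(b2)]^(-1)(b1-b2), asymptotically Chi-squared with k degrees of freedom under the null of no systematic difference between the fixed and random effects estimates. A minimal sketch of the same computation in Python with NumPy (illustrative only, not GPE code; the function name and the one-parameter example are ours):

```python
import numpy as np

def hausman(b_fixed, b_random, v_fixed, v_random):
    """Hausman statistic h = d' [V1 - V2]^(-1) d with d = b1 - b2;
    under the null it is asymptotically Chi-squared with k df."""
    d = np.asarray(b_fixed, dtype=float) - np.asarray(b_random, dtype=float)
    dv = np.asarray(v_fixed, dtype=float) - np.asarray(v_random, dtype=float)
    return float(d @ np.linalg.inv(dv) @ d)

# one-parameter example: d = 0.1, dV = 0.04, so h = 0.01/0.04 = 0.25
h = hausman([1.1], [1.0], [[0.05]], [[0.01]])
```

As in the GAUSS code, the difference of the two covariance matrices may fail to be positive definite in finite samples, which is why the program prints abs(h).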
GAUSS PROGRAMMING FOR ECONOMETRICIANS AND FINANCIAL ANALYSTS
APPENDIX B
i=1;
do until i>n;
i1=(i-1)*t;
ik=(i-1)*k;
y1=ys[i1+1:i1+t];
x1=xs[i1+1:i1+t,.];
@ in case of missing values @
z1=packr(y1~x1);
y1=z1[.,1];
x1=z1[.,2:cols(z1)];
@ ts[i]=rows(z1); @
call estimate(y1,x1);
b[.,i]=__b;
vb[ik+1:ik+k,.]=__vb;
bw=bw+invpd(__vb)*__b;
vbw=vbw+invpd(__vb);
i=i+1;
endo;
gb=vcx(b');
sumgv=0;
i=1;
do until i>n;
ik=(i-1)*k;
sumgv=sumgv+invpd(gb+vb[ik+1:ik+k,.]);
i=i+1;
endo;
vbstar=invpd(sumgv); @ variance of the GLS estimator bstar @
bstar=0;
i=1;
do until i>n;
ik=(i-1)*k;
w=vbstar*invpd(gb+vb[ik+1:ik+k,.]);
bstar=bstar+w*b[.,i];
i=i+1;
endo;
ginv=invpd(gb);
bw=zeros(k,n);
vbw=zeros(k,n); @ variances only @
i=1;
do until i>n;
ik=(i-1)*k;
vinv=invpd(vb[ik+1:ik+k,.]);
a=invpd(vinv+ginv)*ginv;
a=a~(eye(k)-a);
bw[.,i]=a*(bstar|b[.,i]);
sumgv=(vbstar~vbstar)|(vbstar'~(gb+vb[ik+1:ik+k,.]));
vbw[.,i]=diag(a*sumgv*a');
i=i+1;
endo;
/* output report */
print;
print "Random Coefficients Model Estimation:";
print "(1) Individual Equation OLS Regression";
print "(2) Generalized Least Squares Regression";
print "(3) Individual Equation Parameters Prediction";
print;
print "Swamy Test Statistic for Random Coefficients";
print ftos(k*(n-1),"Chi-Sq(%*.*f) = ",4,0);;
print swamy;
print;
print "Random Coefficients Estimates:";
print " No. Parameter S.E.";;
print seqa(1,1,k)~bstar~sqrt(diag(vbstar));
print;
i=1;
do until i>k;
print "Individual Parameter Prediction:" i;
print "Section/Period Parameter S.E.";;
print seqa(1,1,n)~bw[i,.]'~sqrt(vbw[i,.]');
print;
i=i+1;
endo;
endp;
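The procedure above follows Swamy's approach: the GLS estimate b* of the mean coefficient vector is a matrix-weighted average of the individual OLS estimates b_i, with weights proportional to invpd(gb+V_i). A minimal NumPy sketch of that averaging step (illustrative only, not GPE code; the function and variable names are ours):

```python
import numpy as np

def swamy_gls(b_list, v_list, gamma):
    """Swamy's GLS mean coefficient vector:
    b* = [sum_i (Gamma+V_i)^(-1)]^(-1) * sum_i (Gamma+V_i)^(-1) b_i,
    with Var(b*) = [sum_i (Gamma+V_i)^(-1)]^(-1)."""
    sumgv = sum(np.linalg.inv(gamma + v) for v in v_list)
    vbstar = np.linalg.inv(sumgv)
    bstar = vbstar @ sum(np.linalg.inv(gamma + v) @ b
                         for b, v in zip(b_list, v_list))
    return bstar, vbstar

# with identical V_i the weights are equal, so b* is the simple mean
b_list = [np.array([1.0]), np.array([3.0])]
v_list = [np.eye(1) * 0.5, np.eye(1) * 0.5]
bstar, vbstar = swamy_gls(b_list, v_list, np.eye(1))
```

In the degenerate equal-variance case shown, b* reduces to the simple average of the two group estimates, which makes the weighting logic easy to verify by hand.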
/* pooled regression */
call estimate(ys,xs);
b=__b;
vb=__vb;
ll=__ll;
e=reshape(__e,n,t)';
v=e'e/t; @ not same as v=vcx(e); @
@ consider both cross section hetero. and serial corr. @
/*
@ consider cross section heteroscedasticity only @
v=diagrv(eye(n),diag(v));
*/
print;
print "System of Regression Equations:";
print ftos(n,"Number of Equations = %-*.*lf",12,0);
print ftos(k,"Number of Parameters = %-*.*lf",12,0);
print ftos(t,"Number of Observations = %-*.*lf",12,0);
print;
it=1; fail=0;
do until it>_iter;
b0=b; vb0=vb; ll0=ll;
/*
@ memory extensive computation @
vinv=invpd(v.*.eye(t));
vb=invpd(x'*vinv*x);
b=vb*(x'*vinv*ys);
*/
@ memory saving computation @
@ less efficient with double loops @
vinv=invpd(v);
xx=0; xy=0;
i=1;
do until i>n;
i1=(i-1)*t;
j=1;
do until j>n;
j1=(j-1)*t;
xx=xx+vinv[i,j]*x[i1+1:i1+t,.]'*x[j1+1:j1+t,.];
xy=xy+vinv[i,j]*x[i1+1:i1+t,.]'*ys[j1+1:j1+t];
j=j+1;
endo;
i=i+1;
endo;
vb=invpd(xx);
b=vb*xy;
e=reshape(ys-x*b,n,t)';
v=e'e/t;
/*
v=diagrv(eye(n),diag(v));
*/
ll=-0.5*t*(n*(1+ln(2*pi))+ln(det(v)));
it=it+1;
endo;
print;
print "Log-Likelihood Function Value = " ll;
if fail==1;
print "WARNING: Log-Likelihood Fails to Improve!";
endif;
if it>_iter;
print "WARNING: Iteration Limit Exceeded!";
endif;
print;
print " Parameter S.E.";;
format /ros; print b~sqrt(diag(vb));
print;
endp;
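The memory-saving double loop above works because block (i,j) of invpd(V.*.I_T) is simply vinv[i,j]*I_T, so X'Ω⁻¹X and X'Ω⁻¹y can be accumulated equation pair by equation pair without ever forming the NT-by-NT weight matrix. A NumPy sketch (illustrative, with names of our own choosing) that checks the block accumulation against the memory-extensive Kronecker form on a tiny system:

```python
import numpy as np

def sur_gls(x, y, v, n, t):
    """One GLS step for a SUR system: accumulate
    X' (V^(-1) kron I_T) X and X' (V^(-1) kron I_T) y
    block-by-block instead of forming the NT x NT weight matrix."""
    vinv = np.linalg.inv(v)
    k = x.shape[1]
    xx = np.zeros((k, k))
    xy = np.zeros(k)
    for i in range(n):
        xi = x[i*t:(i+1)*t]
        for j in range(n):
            xj = x[j*t:(j+1)*t]
            yj = y[j*t:(j+1)*t]
            xx += vinv[i, j] * xi.T @ xj   # block (i,j) contribution
            xy += vinv[i, j] * xi.T @ yj
    vb = np.linalg.inv(xx)
    return vb @ xy, vb

# check against the memory-hungry Kronecker form on a small example
rng = np.random.default_rng(0)
n, t, k = 2, 4, 3
x = rng.standard_normal((n*t, k))
y = rng.standard_normal(n*t)
v = np.array([[1.0, 0.3], [0.3, 2.0]])
b, vb = sur_gls(x, y, v, n, t)
w = np.kron(np.linalg.inv(v), np.eye(t))
b_direct = np.linalg.solve(x.T @ w @ x, x.T @ w @ y)
```

Both routes give the same estimate; the loop version only ever holds k-by-k accumulators, which is the point of the GAUSS code's design choice.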
Appendix C
Statistical Tables
Statistical tables for the normal, t, Chi-squared, and F distributions are available in most statistics references, and bounds for the Durbin-Watson test are given in most econometrics textbooks. In this appendix we list only the less commonly tabulated statistics, for testing unit roots and cointegration, as discussed in Chapter XVI.
Table C-1. Critical Values for the Dickey-Fuller Unit Root Test
Based on t-Statistic
Model
Model I: ∆X_t = (ρ-1) X_{t-1} + Σ_{i=1,2,…} ρ_i ∆X_{t-i} + ε_t
Model II: ∆X_t = α + (ρ-1) X_{t-1} + Σ_{i=1,2,…} ρ_i ∆X_{t-i} + ε_t
Model III: ∆X_t = α + β t + (ρ-1) X_{t-1} + Σ_{i=1,2,…} ρ_i ∆X_{t-i} + ε_t
Test Statistic
τ_ρ t-statistic (non-symmetric distribution, testing ρ = 1)
τ_α t-statistic (symmetric distribution, testing α = 0 given ρ = 1)
τ_β t-statistic (symmetric distribution, testing β = 0 given ρ = 1)
Source
Fuller (1976, p. 373); Dickey and Fuller (1981).
Table C-2. Critical Values for the Dickey-Fuller Unit Root Test
Based on F-Statistic
Model
Model II: ∆X_t = α + (ρ-1) X_{t-1} + Σ_{i=1,2,…} ρ_i ∆X_{t-i} + ε_t
Model III: ∆X_t = α + β t + (ρ-1) X_{t-1} + Σ_{i=1,2,…} ρ_i ∆X_{t-i} + ε_t
Test Statistic
φ_1 F-statistic (testing α = 0 and ρ = 1 in Model II)
φ_2 F-statistic (testing α = 0, β = 0, and ρ = 1 in Model III)
φ_3 F-statistic (testing β = 0 and ρ = 1 in Model III)
Source
Dickey and Fuller (1981).
Table C-3. Critical Values for the Cointegration Test
Based on t-Statistic
Model
Y_t = α + X_t β + ε_t
∆ε_t = (ρ-1) ε_{t-1} + Σ_{i=1,2,…} ρ_i ∆ε_{t-i} + u_t
K = Number of variables in the cointegration tests, i.e., [Y_t, X_t].
t = 1,2,…, N (500).
Test Statistic
τ_ρ t-statistic (testing ρ = 1)
Source
Phillips and Ouliaris (1990).
Note: For the case of two variables in Model 2a, X is trended but Y is not. This case is asymptotically equivalent to the ADF unit root test for Model III (see Table C-1, τ_ρ for N=500). If only Y has drift (Model 3), the cointegration equation can be written as Y_t = α + γ t + X_t β + ε_t. The critical values of Model 2a therefore apply to Model 3, with the trend t entering as one extra regressor (not counted in K).
Table C-4. Critical Values for Unit Root and Cointegration Tests
Based on Response Surface Estimates
Critical values for unit root and cointegration tests can be computed from the response surface equation:
C(e) = b + b1/N + b2/N^2
Notation
Model: 1=no constant; 2=no trend; 3=with trend;
K: Number of variables in cointegration tests (K=1 for unit root test);
N: Number of observations or sample size;
e: Level of significance, 0.01, 0.05, 0.1.
Source
MacKinnon (1991).
K Model e b b1 b2
1 1 0.01 -2.5658 -1.960 -10.04
1 1 0.05 -1.9393 -0.398 0.00
1 1 0.10 -1.6156 -0.181 0.00
1 2 0.01 -3.4335 -5.999 -29.25
1 2 0.05 -2.8621 -2.738 -8.36
1 2 0.10 -2.5671 -1.438 -4.48
1 3 0.01 -3.9638 -8.353 -47.44
1 3 0.05 -3.4126 -4.039 -17.83
1 3 0.10 -3.1279 -2.418 -7.58
2 2 0.01 -3.9001 -10.534 -30.03
2 2 0.05 -3.3377 -5.967 -8.98
2 2 0.10 -3.0462 -4.069 -5.73
2 3 0.01 -4.3266 -15.531 -34.03
2 3 0.05 -3.7809 -9.421 -15.06
2 3 0.10 -3.4959 -7.203 -4.01
3 2 0.01 -4.2981 -13.790 -46.37
3 2 0.05 -3.7429 -8.352 -13.41
3 2 0.10 -3.4518 -6.241 -2.79
3 3 0.01 -4.6676 -18.492 -49.35
3 3 0.05 -4.1193 -12.024 -13.13
3 3 0.10 -3.8344 -9.188 -4.85
4 2 0.01 -4.6493 -17.188 -59.20
4 2 0.05 -4.1000 -10.745 -21.57
4 2 0.10 -3.8110 -8.317 -5.19
4 3 0.01 -4.9695 -22.504 -50.22
4 3 0.05 -4.4294 -14.501 -19.54
4 3 0.10 -4.1474 -11.165 -9.88
5 2 0.01 -4.9587 -22.140 -37.29
5 2 0.05 -4.4185 -13.461 -21.16
5 2 0.10 -4.1327 -10.638 -5.48
5 3 0.01 -5.2497 -26.606 -49.56
5 3 0.05 -4.7154 -17.432 -16.50
5 3 0.10 -4.4345 -13.654 -5.77
6 2 0.01 -5.2400 -26.278 -41.65
6 2 0.05 -4.7048 -17.120 -11.17
6 2 0.10 -4.4242 -13.347 0.00
6 3 0.01 -5.5127 -30.735 -52.50
6 3 0.05 -4.9767 -20.883 -9.05
6 3 0.10 -4.6999 -16.445 0.00
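Using the response surface C(e) = b + b1/N + b2/N^2 with the coefficients tabulated above, finite-sample critical values follow directly. For example, in plain Python (the function name is ours):

```python
def mackinnon_cv(b, b1, b2, n):
    """MacKinnon (1991) response surface: C(e) = b + b1/N + b2/N^2."""
    return b + b1 / n + b2 / n**2

# 5% unit root critical value (K=1, Model 2: constant, no trend)
# for N = 100, from the table row -2.8621, -2.738, -8.36
cv = mackinnon_cv(-2.8621, -2.738, -8.36, 100)  # about -2.8903
```

As N grows the last two terms vanish, so b itself is the asymptotic critical value.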
Table C-5. Critical Values for the Johansen Cointegration Likelihood Ratio Tests
Notation
VAR Model: 1=no constant; 2=drift; 3=trend drift
N: Sample Size, 400
M: Number of Variables
r: Number of Cointegrating Vectors or Rank
Degrees of Freedom = M-r
Source
Johansen (1988), Johansen and Juselius (1990), and Osterwald-Lenum (1992).
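The Johansen statistics referenced here are simple functions of the estimated eigenvalues λ_1 ≥ … ≥ λ_M of the reduced rank regression: for the null of rank at most r, the trace statistic is -N Σ_{i=r+1,…,M} ln(1-λ_i) and the maximal eigenvalue statistic is -N ln(1-λ_{r+1}). A small Python sketch (illustrative only; the eigenvalues below are made up):

```python
import numpy as np

def johansen_stats(eigvals, n_obs):
    """LR statistics for cointegrating rank, given the M-r eigenvalues
    remaining after fixing rank r: trace sums over all of them, while
    the max-eigenvalue statistic uses only the largest one."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    trace = -n_obs * np.log(1.0 - lam).sum()
    max_eig = -n_obs * np.log(1.0 - lam[0])
    return trace, max_eig

# made-up eigenvalues 0.5 and 0.25 with N = 100 observations
trace, max_eig = johansen_stats([0.5, 0.25], 100)
```

These values are then compared with the tabulated quantiles for M-r degrees of freedom.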
References
GAUSS for Windows User Guide (Version 5.0), 2002, Aptech Systems, Inc.
GAUSS Language Reference (Version 5.0), 2002, Aptech Systems, Inc.
E. Berndt and D. Wood, 1975, “Technology, Prices, and the Derived Demand for
Energy,” Review of Economics and Statistics, 259-268.
T. Bollerslev, 1986, “Generalized Autoregressive Conditional Heteroskedasticity,”
Journal of Econometrics, 31, 307-327.
T. Bollerslev and E. Ghysels, 1996, “Periodic Autoregressive Conditional
Heteroscedasticity,” American Statistical Association Journal of Business and
Economic Statistics, 14, 139-151.
J. Boot and G. de Wit, 1960, “Investment Demand: An Empirical Contribution to the
Aggregation Problem,” International Economic Review, 1, 3-30.
J. Y. Campbell, A. W. Lo, and A. C. MacKinlay, 1997, The Econometrics of
Financial Markets, Princeton University Press.
R. Davidson and J. G. MacKinnon, 1993, Estimation and Inference in Econometrics,
Oxford University Press.
D. A. Dickey and W. A. Fuller, 1981, “Likelihood Ratio Statistics for Autoregressive
Time Series with a Unit Root,” Econometrica, 49, 1057-1072.
P. J. Dhrymes, 1970, Econometrics, Harper & Row.
R. Engle and C. Granger, 1987, “Co-integration and Error Correction:
Representation, Estimation and Testing,” Econometrica, 55, 251-276.
R. F. Engle, 1982, “Autoregressive Conditional Heteroscedasticity with Estimates
of the Variance of United Kingdom Inflation,” Econometrica, 50, 987-1006.
R. F. Engle, D. M. Lilien, and R. P. Robins, 1987, “Estimating Time-Varying Risk
Premia in the Term Structure: the ARCH-M Model,” Econometrica 55, 391-407.
R. Fair, 1978, “A Theory of Extramarital Affairs,” Journal of Political Economy, 86,
45-61.
W. A. Fuller, 1976, Introduction to Statistical Time Series, John Wiley.
S. Goldfeld and R. Quandt, 1972, Nonlinear Methods in Econometrics, Chapter 1:
Numerical Optimization, North-Holland, 1-38.
W. H. Greene, 2002, Econometric Analysis, 5th ed., Prentice Hall.
W. H. Greene, 1999, Econometric Analysis, 4th ed., Prentice Hall.
W. H. Greene, 1997, Econometric Analysis, 3rd ed., Prentice Hall.
Y. Grunfeld and Z. Griliches, 1960, “Is Aggregation Necessarily Bad?” Review of
Economics and Statistics, 42, 1-13.
A. Hall, 1993, “Some Aspects of Generalized Method of Moments Estimation,”
Handbook of Statistics, Vol. 11, ed. by G. S. Maddala, C. R. Rao, and H. D.
Vinod, Elsevier Science Publishers, North-Holland, 393-417.
R. Hall, 1978, “Stochastic Implications of the Life Cycle-Permanent Income
Hypothesis: Theory and Evidence,” Journal of Political Economy 86, 971-987.
J. D. Hamilton, 1994, Time Series Analysis, Princeton University Press.
L. P. Hansen and K. J. Singleton, 1982, “Generalized Instrumental Variables
Estimation of Nonlinear Rational Expectations Models,” Econometrica 50, 1269-
1286.
J. Hausman, 1975, “An Instrumental Variable Approach to Full-Information
Estimators for Linear and Certain Nonlinear Models,” Econometrica, 43, 727-738.
F. Hayashi, 2000, Econometrics, Princeton University Press.
S. Johansen, 1988, “Statistical Analysis of Cointegration Vectors,” Journal of
Economic Dynamics and Control, 12, 231-254.
S. Johansen and K. Juselius, 1990, “Maximum Likelihood Estimation and Inference
on Cointegration with Applications to the Demand for Money,” Oxford Bulletin
of Economics and Statistics, 52, 169-210.
G. G. Judge, R. C. Hill, W. E. Griffiths, H. Lütkepohl, and T.-C. Lee, 1988,
Introduction to the Theory and Practice of Econometrics, 2nd ed., John Wiley
and Sons.
G. G. Judge, R. C. Hill, W. E. Griffiths, and T.-C. Lee, 1985, Theory and Practice of
Econometrics, 2nd ed., John Wiley and Sons.
L. Klein, 1950, Economic Fluctuations in the United States: 1921-1941, John Wiley
and Sons.
J. G. MacKinnon, 1991, “Critical Values for Cointegration Tests,” in Long-Run
Economic Relationships: Readings in Cointegration, ed. by R. F. Engle and G.
W. Granger, Oxford University Press, 267-276.
T. C. Mills, 1999, The Econometric Modeling of Financial Time Series, 2nd ed.,
Cambridge University Press.
R. C. Mittelhammer, G. G. Judge, and D. J. Miller, 2000, Econometric Foundations,
Cambridge University Press.
D. B. Nelson, 1991, “Conditional Heteroscedasticity in Asset Returns, A New
Approach,” Econometrica, 59, 347-370.
D. B. Nelson and C. Q. Cao, 1992, “Inequality Constraints in the Univariate GARCH
Model,” Journal of Business and Economic Statistics, 10, 229-235.
M. Ogaki, 1993, “Generalized Method of Moments: Econometric Applications,”
Handbook of Statistics, Vol. 11, ed. by G. S. Maddala, C. R. Rao, and H. D.
Vinod, Elsevier Science Publishers, North-Holland, 455-488.
M. Osterwald-Lenum, 1992, “A Note with Quantiles of the Asymptotic Distribution
of the Maximum Likelihood Cointegration Rank Test Statistics,” Oxford Bulletin
of Economics and Statistics, 54, 461-471.
P. C. B. Phillips and S. Ouliaris, 1990, “Asymptotic Properties of Residual Based
Tests for Cointegration,” Econometrica, 58, 165-193.
R. E. Quandt, 1983, “Computational Problems and Methods,” Handbook of
Econometrics, Vol. I, ed. by Z. Griliches and M. D. Intriligator, Chapter 12, 699-
764, North-Holland.
L. Spector and M. Mazzeo, 1980, “Probit Analysis and Economic Education,”
Journal of Economic Education, 11, 37-44.
J. M. Wooldridge, 2002, Introductory Econometrics: A Modern Approach, 2nd ed.,
Thomson, South-Western.
Index
Analysis of variance, AOV, 47
Analytical derivatives, 86
Analytical Jacobian, 108
ARCH in mean, ARCH-M, 241
ARMA analysis for regression residuals, 238
ARMAX regression model, 238
Augmented Dickey-Fuller (ADF) test, 215
Autocorrelation, 143
   Beach-MacKinnon iterative maximum likelihood method, 151
   Box-Pierce Q test, 147, 155
   Breusch-Godfrey LM test, 147, 155
   Durbin-H test, 168
   Durbin-Watson bounds test, 147
   first order, AR(1), 147
   higher order, 154
   Hildreth-Lu grid search method, 150
   Ljung-Box Q test, 147, 155
   Prais-Winsten modified Cochrane-Orcutt method, 149
Autocorrelation coefficient, 147, 234
Autocorrelation function, ACF, 148, 233, 241
Autocorrelation-consistent covariance matrix, 144
Autoregressive conditional heteroscedasticity, ARCH, 233
   Box-Pierce Q test, 241
   Engle-Bollerslev LM test, 242
   Ljung-Box Q test, 241
Autoregressive moving average, ARMA, 157, 233
   Box-Pierce Q test, 234
   Breusch-Godfrey LM test, 235, 242
   Durbin-Watson bounds test, 235
   Ljung-Box Q test, 234
Berndt-Wood model, 204
Binary choice model, 115
Box-Cox variable transformation, 104, 114
Censored regression model. See Limited dependent variable model
CES production function, 96, 102, 111
Cobb-Douglas production function, 52, 70, 161
Coefficient of relative risk aversion, 188
Cointegrating rank, 224
Cointegrating vector, 223
Cointegration test, 216
   Engle-Granger test, 224
   Johansen test, 227
   maximal eigenvalue test statistic, 229
   trace test statistic, 229
Composite economic leading indicator, 271
Concentrated log-likelihood function, 99, 106, 210
Constant returns to scale, CRS, 53
Correlation matrix, 50, 75
Data generating process, DGP, 215
Dickey-Fuller (DF) test, 3, 215
Distributed lag models, 167
   Almon lag, 167
   autoregressive distributed lag, ARDL, 167, 176
   geometric lag, 167
   Koyck lag, 167
   lagged dependent variable, 167
   polynomial lag, 167, 173
Dummy variable, 65, 115
   additive, 65, 70
   multiplicative, 65, 70
Dummy variable trap, 66, 69
Economies of scale, 254
Elasticity, 54, 106
   long-run, 167
   short-run, 167
Elasticity of substitution, 206
Error correction model, 224
Euler equation, 188
Forecasting, 271
   dynamic forecast, 276
   ex-ante forecast, 272
   ex-post forecast, 271
   forecast error statistics, 272
Full information maximum likelihood, FIML, 196
Gamma probability distribution, 90, 179, 181
GAUSS, 5
   array operation, 13
   characteristic roots or eigenvalues, 30
   characteristic vectors or eigenvectors, 30
   condition number, 31
   cumulative distribution function, 31
   data transformation, 21
   descriptive statistics, 32
   element-by-element compatibility, 15
   file input and output, 21
   gradient, 32
   hessian, 32
   least squares solution, 30
   logical operator, 17
   matrix operation, 13
   probability density function, 31
   relational operator, 16
   sequential quadratic programming, 32
GAUSS for Windows, 5
   active file, 18
   Command window, 6
   Debug window, 7
   Edit window, 6
   GAUSS library system, 40
   main file, 18
   main file list, 18
   Output window, 6
Generalized autoregressive conditional heteroscedasticity, GARCH, 240
Generalized method of moments, GMM, 179
   Hansen test for moment restrictions, 181
   Lagrangian multiplier test, 187
   Likelihood ratio test, 188
   linear model, 192
   nonlinear model, 182
   quadratic criterion function, 180
   Wald test, 187
   White-Newey-West estimator, 180
Global control variables, 41, 283
   input control variables, 41
   output control variables, 41
Goodness of fit, R2, adjusted R2, 47
GPE package, 5, 41
Heteroscedasticity, 129
   Breusch-Pagan test, 134
   Goldfeld-Quandt test, 132
   Koenker-Bassett test, 135
   maximum likelihood estimation, 136
   weighted least squares, 132
   White test, 134
Heteroscedasticity-autocorrelation-consistent covariance matrix, 143, 183
Heteroscedasticity-consistent covariance matrix, 130, 143, 184
Hypothesis testing in nonlinear models, 110
   Lagrangian multiplier test, LM test, 111
   Likelihood ratio test, LR test, 111
   Wald test, 110
Information matrix, 89
Input control variable, 283
   _acf, 148, 154, 159, 236
   _acf2, 241
   _ar, 150, 158, 237, 238
   _arma, 158, 237
   _b, 84, 275
   _begin, 58, 134, 202, 273
   _bgtest, 148, 154, 159, 236
   _bjtest, 61, 135
   _const, 55, 70, 203
   _conv, 87, 158
   _corr, 75
   _dlags, 168, 176, 202, 230
   _drop, 150
   _dynamic, 276
   _ebtest, 242
   _end, 58, 134, 273
   _eq, 196, 266
   _fbegin, 275
   _fend, 275
   _fplot, 275
   _fstat, 275
   _garch, 242
   _garchx, 243
   _hacv, 129, 144, 183, 192
   _id, 196
   _iter, 150, 192, 199
   _ivar, 170, 192, 196
   _jacob, 104, 108
   _method, 86, 150, 196
   _names, 46
   _nlopt, 84, 105, 118, 125, 158
   _pdl, 173, 176
   _print, 97, 153
   _restart, 87, 138, 201
   _restr, 54, 68, 72, 173, 203, 218, 266
   _rlist, 49, 61
   _rplot, 49, 202
   _rstat, 49, 168, 202
   _step, 87
   _tol, 87, 153, 201
   _vb, 275
   _vcov, 50, 96, 130, 199
   _weight, 134
Instrumental variable, 170, 192
Instrumental variable estimation, IV, 3, 167, 185
Jacobian, 211
Jacobian transformation, 103
Klein Model I, 197
L'Hôpital's rule, 105
Latent variable, 116
Least squares estimation, 41
Least squares prediction, 41. See Forecasting
Likelihood function, 89
Limited dependent variable model, 115
Limited information maximum likelihood, LIML, 196
Linear probability model, 116
Linear restriction, 54, 203
   Wald F-test, 56, 69
Logistic curve, 116
Logit model, 116
Log-likelihood function, 89, 103, 118, 123, 210
Log-normal probability distribution, 90
Longitudinal data, 251
Maximum likelihood estimation, 86, 103
Moment function, 179
Moment restrictions, 180
Moving average, 158
Multicollinearity, 75
   condition number, 31, 76
   Theil's measure, 77
   variance inflation factors, VIF, 79
Multiple regression, 50
Multiplicative heteroscedasticity, 137
Newey-West estimator, 143
Nonlinear full information maximum likelihood, 209
Nonlinear least squares, 86, 101
Nonlinear optimization, 83
   BFGS quasi-Newton method, 86
   DFP quasi-Newton method, 86
   gradient, first derivatives, 83
   Greenstadt method, 86
   hessian, second derivatives, 83
   line search, 87
   modified quadratic hill-climbing method, 87
   Newton-Raphson method, 86
   quadratic hill-climbing (QHC) method, 87
   steepest-ascent method, 86
Nonlinear rational expectation, 179, 189
Normal probability distribution, 89
Numerical derivatives, 83
Numerical Jacobian, 104, 289
Ordinary least squares, 45
Orthogonality condition, 180
Output control variable, 293, 295
   __a, 275
   __b, 82, 275
   __e, 225
   __r2, 78
   __rss, 133
   __vb, 82, 275
Panel data, 251
Panel data analysis, 251
   between-estimates, 253
   Breusch-Pagan LM test for random effects, 258
   deviation approach, 252
   dummy variable approach, 252
   fixed effects, 252
   Hausman specification test for fixed or random effects, 258
   individual effects, 251
   one-way analysis, 251
   partial deviation approach, 257
   random effects, 258
   SUR method, 264
   time effects, 251
   two-way analysis, 261
   Wald F-test for fixed effects, 253
   within-estimates, 252
Partial adjustment, 167
Partial autocorrelation coefficient, 147, 234
Partial autocorrelation function, PACF, 4, 148, 233, 241
Partial correlation coefficient, 75
Partial regression coefficient, 52
Perfect collinearity, 66
Permanent income hypothesis, 167, 192, 223, 227
Principal components, 80
Probit model, 116
P-value, 48
Residual analysis, 48
   Bera-Jarque normality test, 61, 135
   Durbin-Watson test statistic, 49
   first-order rho, 49
   kurtosis, 62
   skewness, 62
Residual diagnostics, 61
   DFFITS, 63
   influential observations, 61
   leverage, 63
   outliers, 61
   standardized predicted residuals, 63
   standardized residuals, 63
   studentized residuals, 63
Residual sum of squares, RSS, 56
Restricted least squares, 54, 173, 218, 266
Ridge regression, 80
dummy variable approach, 70
Sum-of-squares, 83, 97
System of simultaneous equations, 195
   endogenous variable, 195
   identity equation, 196
   predetermined variable, 195
   stochastic equation, 196
Three-stage least squares, 3SLS, 196
Time series analysis, 233
   serial correlation in the mean, 233
   serial correlation in the variance, 233
Time series conversion, 37, 39
Tobit analysis, 123
Transfer function. See ARMAX regression model
Translog cost function, 204
t-ratio, 47
Two-stage least squares, 2SLS, 196
2. NO WARRANTIES: Publisher and Aptech do not warrant the accuracy of the Contents as
contained in the Media against data corruption, computer viruses, errors in file transfer data,
unauthorized revisions to the files, or any other alterations or data destruction to the file(s).
The Media and its Contents are transmitted as is. Publisher and Aptech shall not have any
liability for Recipient’s use of the Media or its Contents, including without limitation, any
transmittal of bugs, viruses, or other destructive or harmful programs, scripts, applets or files
to the computers or networks of the Recipient. Recipient acknowledges and agrees that
Recipient is fully informed of the possibility of the Media or its Contents being harmful to
Recipient’s computers or networks and the possibility that the Contents may not be an exact
and virus-free copy of masters by Publisher or Aptech. Recipient also acknowledges, agrees,
and warrants that Recipient shall be solely responsible for inspection and testing of the Media
and the Contents for bugs, viruses, or other destructive or harmful programs, scripts, applets
or files, before accessing or using the Media or Contents.
5. GAUSS LIGHT SOFTWARE LICENSE: Installation and use of this software is subject to
and governed by the License Agreement displayed in the Media. By installing and using the
GAUSS Light™ Software, Recipient indicates his or her acceptance of, and Recipient is
subject to, all such terms and conditions of the License Agreement. Violation of the License
Agreement is also a violation of the copyright laws.