Lab Assignment

Yogendra Panwar
19HS20048
Date: 29/03/2023
Variables in Panel Data

The variables in this panel dataset are:
- I = Airline (panel identifier)
- T = Year (time index)
- PF = Fuel price
- LF = Load factor, the average capacity utilization of the fleet
- Q = Output, in revenue passenger miles, index number
- C = Total cost, in $1000
Methodology
To identify the factors that influence the total cost of U.S. airlines, a
multiple linear regression model was fitted using the predictors
available in the dataset.
The model is of the form:

Total cost = β0 + β1·Airline + β2·Year + β3·Output + β4·Fuel price + β5·Load factor + ε

where β0 is the intercept, β1 to β5 are the coefficients on the predictors, and ε is the error term.
The model was estimated by ordinary least squares (OLS), and the significance of each coefficient was assessed with its t-statistic and p-value.
Goodness of fit was evaluated using the R-squared value and residual plots.
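As a cross-check on what `regress` computes, here is a minimal NumPy sketch of OLS via the normal equations, returning coefficients, conventional standard errors, t-statistics and R-squared. The function name `ols` and the toy data are illustrative assumptions, not part of the lab; p-values are omitted because they would need the t distribution (e.g. from SciPy).

```python
import numpy as np

def ols(y, X):
    """OLS of y on X; X must already include a column of ones for the intercept.

    Returns coefficients, conventional (homoskedastic) standard errors,
    t-statistics and R-squared, i.e. the main columns Stata's `regress` prints.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                 # normal equations: (X'X)^-1 X'y
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)             # residual variance with n - k degrees of freedom
    se = np.sqrt(np.diag(s2 * XtX_inv))      # standard errors from s2 * (X'X)^-1
    tstat = beta / se                        # t = coefficient / standard error
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, tstat, r2

# Illustrative toy data (not the airline data): one regressor plus an intercept
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
X = np.column_stack([np.ones_like(x), x])
beta, se, tstat, r2 = ols(y, X)
print("coef:", beta, "se:", se, "t:", tstat, "R2:", r2)
```

With a single regressor the slope and intercept agree with `np.polyfit(x, y, 1)`, and R-squared equals the squared correlation of x and y, which makes the sketch easy to sanity-check.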
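If the OLS assumptions do not hold, the estimates may be BIASED and/or INEFFICIENT.
- Biased: the expected value of the parameter estimate differs from the true parameter value.
- Consistent: an estimator is CONSISTENT if it converges in probability to the true value as the sample size grows. A biased estimator can still be consistent if its bias and variance shrink to zero as n increases; unbiasedness on its own does not guarantee consistency.
- Inefficient: (informally) the estimator is less precise, i.e. has a larger sampling variance, than an alternative estimator applied to the same data.
  o Estimators that take full advantage of the available information are more efficient.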
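A small simulation makes the bias/consistency distinction concrete. It uses the variance estimator that divides by n, which is biased (its expectation is (n - 1)/n times the true variance) but consistent, because the bias vanishes as n grows. The seed, replication count and sample sizes are illustrative assumptions, not part of the lab.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
REPS = 20_000                    # simulated samples per sample size

def average_variance_estimates(n):
    """Mean of the divisor-n and divisor-(n-1) variance estimators over REPS samples of N(0, 1) data."""
    draws = rng.standard_normal((REPS, n))
    biased = draws.var(axis=1, ddof=0)     # divides by n:     E = (n - 1) / n
    unbiased = draws.var(axis=1, ddof=1)   # divides by n - 1: E = 1 (the true variance)
    return biased.mean(), unbiased.mean()

results = {}
for n in (5, 50, 200):
    results[n] = average_variance_estimates(n)
    b, u = results[n]
    print(f"n={n:3d}  divisor n: {b:.4f} (theory {(n - 1) / n:.4f})  divisor n-1: {u:.4f}")
```

The divisor-n average sits near 0.8 for n = 5 and approaches 1 as n increases, while the divisor-(n - 1) average stays near 1 throughout.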
OLS Bias Due to Endogeneity
- Omitted variable bias
  o Intervening variables, selectivity
- Measurement error in the covariates
- Simultaneity bias
  o Feedback loops
  o Omitted variables
Conventional regression-based strategies to address endogeneity bias:
- Instrumental variables (IV) estimation
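A short simulation illustrates omitted-variable bias and how an instrument removes it. Everything here is hypothetical: the data-generating process, coefficients and variable names are assumptions chosen so that the OLS slope converges to cov(x, y)/var(x) = 9/3 = 3 instead of the true 2, while the just-identified IV ratio cov(z, y)/cov(z, x) converges to 2.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Hypothetical data-generating process (illustrative, not the airline data)
z = rng.standard_normal(n)        # instrument: shifts x, unrelated to the error
u = rng.standard_normal(n)        # omitted confounder
v = rng.standard_normal(n)
x = z + u + v                     # endogenous regressor, correlated with u
e = rng.standard_normal(n)
y = 2.0 * x + 3.0 * u + e         # true slope on x is 2; u is left out of the regression

def cov(a, b):
    """Sample covariance with divisor n (the divisor cancels in the ratios below)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

beta_ols = cov(x, y) / cov(x, x)  # simple OLS slope, ignoring u: biased upward
beta_iv = cov(z, y) / cov(z, x)   # IV (Wald) estimator using z as the instrument
print(f"OLS slope: {beta_ols:.3f}   IV slope: {beta_iv:.3f}   (true: 2)")
```

The IV estimate is consistent here only because z is assumed to be uncorrelated with both u and e; with a weak or invalid instrument the same ratio can be badly off.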
OLS Inefficiency due to Correlated Errors

Many data structures are susceptible to error correlation:
- Hierarchical data sample multiple individuals from each unit, e.g. household members, employees in firms, or multiple pupils from each school.
- Multistage probability samples often use cluster-based sampling designs, whose errors may be correlated within clusters.
- Repeated-observation (panel) data often show within-unit error correlation.
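Within-cluster error correlation is the usual motivation for cluster-robust standard errors (in Stata, the `vce(cluster clustvar)` option). Below is a minimal NumPy sketch of the cluster-robust sandwich estimator; the function name and the simulated 50-cluster data are illustrative assumptions. Because x and part of the error are shared within each cluster, the conventional OLS standard error understates the sampling variability and the clustered one comes out noticeably larger.

```python
import numpy as np

def cluster_robust_se(y, X, groups):
    """OLS coefficients with conventional and cluster-robust (sandwich) standard errors.

    No finite-sample adjustment is applied to the sandwich; Stata's vce(cluster)
    additionally multiplies it by G/(G-1) * (N-1)/(N-K).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    se_conv = np.sqrt(np.diag(bread) * (resid @ resid) / (n - k))
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]        # sum of x_i * e_i within cluster g
        meat += np.outer(score, score)
    se_cl = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, se_conv, se_cl

# Hypothetical clustered data: 50 clusters of 20; x and one error component are shared within a cluster
rng = np.random.default_rng(1)
G, m = 50, 20
groups = np.repeat(np.arange(G), m)
x = np.repeat(rng.standard_normal(G), m)
y = 1.0 + 0.5 * x + np.repeat(rng.standard_normal(G), m) + rng.standard_normal(G * m)
X = np.column_stack([np.ones_like(x), x])
beta, se_conv, se_cl = cluster_robust_se(y, X, groups)
print("conventional SE:", se_conv[1], " cluster-robust SE:", se_cl[1])
```

With every observation in its own cluster the sandwich reduces to the heteroskedasticity-robust (HC0) estimator, which is a convenient check on the implementation.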
Stata Part
Notes:
. *(6 variables, 90 observations pasted into data editor)
. xtset i t
panel variable: i (strongly balanced)
time variable: t, 1 to 15
delta: 1 unit
.
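`xtset` reports the panel as "strongly balanced", meaning every airline is observed in every year. A standard-library sketch of that check on the 6 x 15 layout of this dataset; the helper `is_strongly_balanced` is hypothetical and only mirrors the idea, it is not Stata's implementation.

```python
def is_strongly_balanced(ids, times):
    """True if every panel id is observed exactly once in every time period (hypothetical helper)."""
    pairs = list(zip(ids, times))
    if len(set(pairs)) != len(pairs):
        return False  # repeated (id, time) pairs; xtset rejects repeated time values within a panel
    panels, periods = set(ids), set(times)
    # Distinct pairs drawn from panels x periods cover the full grid only if the counts match
    return len(pairs) == len(panels) * len(periods)

# The airline layout: 6 airlines (i = 1..6) observed in years t = 1..15
ids = [i for i in range(1, 7) for _ in range(1, 16)]
times = [t for _ in range(1, 7) for t in range(1, 16)]
print(len(ids), is_strongly_balanced(ids, times))  # 90 True
```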
. summarize i t c q pf lf

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
           i |        90         3.5    1.717393           1          6
           t |        90           8    4.344698           1         15
           c |        90     1122524     1192075       68978    4748320
           q |        90    .5449946    .5335865     .037682    1.93646
          pf |        90      471683    329502.9      103795    1015610
          lf |        90    .5604602    .0527934     .432066    .676287
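The `i` and `t` rows of this table are fully determined by the balanced 6 x 15 design, which makes them a handy check on how `summarize` computes its columns. A minimal standard-library sketch, assuming the sample (divisor n - 1) standard deviation that Stata reports; the function name is illustrative.

```python
import math

def summarize(values):
    """Obs, Mean, Std. Dev., Min, Max as in Stata's `summarize` (sample SD, divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return n, mean, sd, min(values), max(values)

# i and t follow from the balanced layout: each airline appears 15 times, each year 6 times
i_values = [i for i in range(1, 7) for _ in range(15)]
t_values = [t for _ in range(6) for t in range(1, 16)]
print(summarize(i_values))  # (90, 3.5, 1.71739..., 1, 6)
print(summarize(t_values))  # (90, 8.0, 4.34469..., 1, 15)
```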
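. reg t c q pf lf

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  4,    85) =  197.84
       Model |  1517.05038     4  379.262594           Prob > F      =  0.0000
    Residual |  162.949622    85  1.91705438           R-squared     =  0.9030
-------------+------------------------------           Adj R-squared =  0.8984
       Total |        1680    89  18.8764045           Root MSE      =  1.3846

------------------------------------------------------------------------------
           t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           c |   1.37e-06   5.30e-07     2.58   0.012     3.14e-07    2.42e-06
           q |  -2.785735   1.116553    -2.49   0.015    -5.005742   -.5657278
          pf |   9.37e-06   8.26e-07    11.34   0.000     7.72e-06     .000011
          lf |   20.09563   3.790522     5.30   0.000     12.55906    27.63221
       _cons |  -7.698693   1.876634    -4.10   0.000    -11.42995   -3.967442
------------------------------------------------------------------------------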
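The header statistics of this regression can be recomputed from its ANOVA table, which is a quick way to read the output. Note that, as typed, `reg t c q pf lf` makes the year t the dependent variable and total cost c one of the regressors, so these are statistics for a regression of t, not of total cost. A plain-Python sketch using only the numbers printed above:

```python
import math

# Figures copied from the `reg t c q pf lf` output
ss_model, df_model = 1517.05038, 4
ss_resid, df_resid = 162.949622, 85
n = 90

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
F = ms_model / ms_resid                         # F(4, 85) = model MS / residual MS
r2 = ss_model / (ss_model + ss_resid)           # R-squared = model SS / total SS
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid      # penalizes the number of regressors
root_mse = math.sqrt(ms_resid)                  # Root MSE = sqrt(residual MS)
t_q = -2.785735 / 1.116553                      # t statistic for q = coefficient / std. err.

print(f"F={F:.2f} R2={r2:.4f} adjR2={adj_r2:.4f} RootMSE={root_mse:.4f} t_q={t_q:.2f}")
# F=197.84 R2=0.9030 adjR2=0.8984 RootMSE=1.3846 t_q=-2.49
```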
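. xtunitroot llc t, noconstant

Levin-Lin-Chu unit-root test for t
----------------------------------
Ho: Panels contain unit roots           Number of panels  =      6
Ha: Panels are stationary               Number of periods =     15

AR parameter: Common                    Asymptotics: root(N)/T -> 0
Panel means:  Not included
Time trend:   Not included

ADF regressions: 1 lag
LR variance:     Bartlett kernel, 7.00 lags average (chosen by LLC)
------------------------------------------------------------------------------
                  Statistic      p-value
------------------------------------------------------------------------------
 Unadjusted t        0.0000       0.5000
 Adjusted t*        -0.0229       0.4908
------------------------------------------------------------------------------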
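The LLC statistics for t are degenerate, and the first stage of the test suggests why. LLC starts from a per-panel ADF regression; with `noconstant` and one lag this is Δy_t = ρ·y_{t-1} + θ·Δy_{t-1} + e_t. Since t is just the year index 1, ..., 15, Δt = 1 in every period, so the regression fits perfectly with ρ = 0 and θ = 1, in line with the unadjusted t of 0.0000 reported above. Unit-root tests are meaningful for economic series such as c or pf rather than the time index. The NumPy sketch below covers only this first stage (the helper name is hypothetical); it omits LLC's long-run variance normalization and bias adjustment.

```python
import numpy as np

def adf_no_constant(y, lags=1):
    """Least-squares ADF regression for one panel, with no constant or trend:
        dy_t = rho * y_{t-1} + sum_j theta_j * dy_{t-j} + e_t
    Returns (rho, thetas, residuals).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                      # dy[k] = y[k+1] - y[k]
    lhs = dy[lags:]                      # dy_t for the periods with enough history
    cols = [y[lags:-1]]                  # y_{t-1}
    for j in range(1, lags + 1):
        cols.append(dy[lags - j:-j])     # dy_{t-j}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    resid = lhs - X @ coef
    return coef[0], coef[1:], resid

# Apply it to the time index t = 1..15 used in the test above
rho, theta, resid = adf_no_constant(np.arange(1, 16))
print(f"rho = {rho:.6f}, theta = {theta[0]:.6f}, max |residual| = {np.abs(resid).max():.2e}")
```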
. xtreg i c q pf lf
the panel variable i may not be included as an independent variable
r(198);
. xtreg i t c q pf lf
the panel variable i may not be included as an independent variable
r(198);
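`xtreg` treats `i` as the panel identifier set by `xtset`, so it cannot appear in the variable list; airline differences enter through the panel effects instead (for example `xtreg c t q pf lf, fe`, or `regress c i.i t q pf lf` with airline dummies). The core of the fixed-effects estimator is the within transformation, sketched below in NumPy with hypothetical toy data: demeaning each variable by panel removes the unit-specific intercepts, and OLS on the demeaned data recovers the common slope.

```python
import numpy as np

def within_transform(v, groups):
    """Subtract each panel's mean from its observations (the fixed-effects 'within' transformation)."""
    v = np.asarray(v, dtype=float)
    out = v.copy()
    for g in np.unique(groups):
        idx = groups == g
        out[idx] -= v[idx].mean()
    return out

# Hypothetical toy panel: 3 units x 4 periods, unit-specific intercepts, common slope 2
groups = np.repeat([1, 2, 3], 4)
x = np.array([1, 2, 3, 4, 2, 3, 5, 6, 0, 1, 1, 3], dtype=float)
alpha = np.array([10.0, -5.0, 3.0])[groups - 1]    # intercept of each unit
y = alpha + 2.0 * x

x_w = within_transform(x, groups)
y_w = within_transform(y, groups)
slope_fe = (x_w @ y_w) / (x_w @ x_w)               # OLS on demeaned data, no constant
print(slope_fe)  # approximately 2.0
```

In this toy panel the intercepts are correlated with the unit means of x, so a pooled OLS slope ignoring them lands far from 2, while the within estimator does not.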
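. xtreg t c q pf lf

Random-effects GLS regression                   Number of obs      =        90
Group variable: i                               Number of groups   =         6

R-sq:  within  = 0.0000                         Obs per group: min =        15
       between = 0.0000                                        avg =      15.0
       overall = 0.9030                                        max =        15

                                                Wald chi2(4)       =    791.34
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
------------------------------------------------------------------------------
           t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           c |   1.37e-06   5.30e-07     2.58   0.010     3.29e-07    2.41e-06
           q |  -2.785735   1.116553    -2.49   0.013    -4.974139   -.5973306
          pf |   9.37e-06   8.26e-07    11.34   0.000     7.75e-06     .000011
          lf |   20.09563   3.790522     5.30   0.000     12.66635    27.52492
       _cons |  -7.698693   1.876634    -4.10   0.000    -11.37683   -4.020558
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |  1.2755837
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------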
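Because sigma_u is estimated as 0 (so rho = 0), the random-effects estimator here collapses to pooled OLS, which is why its coefficients and standard errors match the `reg t c q pf lf` output (only the inference switches from t to z). A small sketch of the balanced-panel quasi-demeaning weight θ makes this explicit: random-effects GLS regresses y_it - θ·ȳ_i on x_it - θ·x̄_i, so θ = 0 is pooled OLS and θ = 1 is the within estimator. The function names are illustrative, not Stata's.

```python
import math

def re_theta(sigma_u, sigma_e, T):
    """Quasi-demeaning weight of random-effects GLS in a balanced panel with T periods per unit."""
    return 1.0 - math.sqrt(sigma_e**2 / (T * sigma_u**2 + sigma_e**2))

def rho(sigma_u, sigma_e):
    """Share of the composite error variance due to the panel effect u_i."""
    return sigma_u**2 / (sigma_u**2 + sigma_e**2)

# Values reported above: sigma_u = 0, sigma_e = 1.2755837, T = 15 years per airline
print(re_theta(0.0, 1.2755837, 15), rho(0.0, 1.2755837))  # 0.0 0.0
# Hypothetical contrast: sigma_u = sigma_e = 1 with T = 15
print(re_theta(1.0, 1.0, 15))  # 0.75
```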
Conclusion
This report provides an analysis of a subset of a larger dataset
containing cost data for U.S. airlines from 1970 to 1984. The subset
consists of 90 observations on six firms for 15 years. The aim of this
analysis is to identify the factors that influence the total cost of U.S.
airlines and to build a predictive model for total cost based on the
available predictors. The predictors include airline, year, output, fuel
price, and load factor. The response variable is the total cost, measured
in $1000.
[Figure: Graphs by I. One panel per airline (1 to 6); horizontal axis T (0 to 15); vertical axis 0 to 5.0e+06; series plotted: I, T, C, Q, PF, LF.]
Thank You
