Lab Assignment

Yogendra Panwar
19HS20048
Date: 29/03/2023

Variables in Panel Data


The variables used in this panel dataset are:

• I = Airline
• T = Year
• PF = Fuel price
• LF = Load factor, the average capacity utilization of the fleet
• Q = Output, in revenue passenger miles (index number)
• C = Total cost, in $1000
Methodology
To identify the factors that influence the total cost of U.S. airlines, a
multiple linear regression model was fitted using the predictors
available in the dataset.
The model is of the form:
Total cost = β0 + β1·Airline + β2·Year + β3·Output + β4·Fuel price +
β5·Load factor + ε
where β0 is the intercept, β1 to β5 are the coefficients of the
predictors, and ε is the error term.
The model was fitted by ordinary least squares (OLS), and the
significance of each coefficient was tested using t-tests and their
p-values.
Goodness of fit was evaluated using the R-squared value and residual
plots.
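As a minimal sketch of the fitting step described above, the following pure-Python example solves the OLS normal equations and computes R-squared. The function names `ols` and `r_squared` and the toy numbers are assumptions made for illustration; they are not the airline data and not the Stata routine used in this report.

```python
def ols(X, y):
    """Return OLS coefficients by solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def r_squared(X, y, beta):
    """R-squared = 1 - RSS/TSS for the fitted values X * beta."""
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss


# Made-up toy values (NOT the airline data): cost = 2 + 3*output - 1*load_factor
output = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
load_factor = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
X = [[1.0, q, lf] for q, lf in zip(output, load_factor)]  # leading 1.0 = intercept
cost = [2.0 + 3.0 * q - 1.0 * lf for q, lf in zip(output, load_factor)]

beta = ols(X, cost)
r2 = r_squared(X, cost, beta)
print(beta, r2)
```

Because the toy response is generated without noise, the recovered coefficients should match the generating values up to rounding error, which makes the sketch easy to check by hand.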

If assumptions do not hold, OLS estimates are BIASED and/or INEFFICIENT
• Biased - the expected value of the parameter estimate differs from the
true value.
  o Consistency: if the estimate converges to the true value as the
  sample size increases (any bias and the sampling variance both shrink
  toward zero), we say the estimator is CONSISTENT.
• Inefficient - (informally) the estimator is less precise than an
alternative estimator as the sample size increases.
  o Estimators that take full advantage of the available information
  are more efficient.
OLS Bias Due to Endogeneity
• Omitted Variable Bias
  o Intervening variables, selectivity
• Measurement Error in the Covariates
• Simultaneity Bias
  o Feedback loops
  o Omitted variables
Conventional regression-based strategies to address endogeneity bias:
• Instrumental Variables estimation
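Omitted variable bias can be illustrated with a small simulation. This is a sketch under stated assumptions: the data-generating process, the coefficients (2 on x, 3 on z, 0.5 linking z to x), the seed, and the helper name `slope` are all hypothetical and unrelated to the airline data.

```python
import random


def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(0)
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
# z is the omitted variable; it is correlated with x by construction
z = [0.5 * xi + random.gauss(0, 1) for xi in x]
# True model: y = 1 + 2*x + 3*z + noise
y = [1 + 2 * xi + 3 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

short_slope = slope(x, y)   # regression of y on x alone, z omitted
delta = slope(x, z)         # how strongly z moves with x
print(short_slope, 2 + 3 * delta)
```

In this setup the slope from the short regression lands near 2 + 3 × 0.5 = 3.5 rather than the true value 2, and the gap does not shrink as n grows, which is why omitted variable bias is a consistency problem and not just a small-sample one.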

OLS Inefficiency due to Correlated Errors


Many data structures are susceptible to error correlation:
• Hierarchical data sample multiple individuals from each unit, e.g.
household members, employees in firms, multiple pupils from each
school.
• Multistage probability samples often incorporate cluster-based
sampling designs with errors that may be correlated within clusters.
• Repeated observations data often show within-unit error correlation.
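To give a feel for how much within-unit correlation can matter, the sketch below uses the standard design-effect approximation for equal-sized clusters, 1 + (m - 1)·rho, which applies to the variance of a sample mean. The panel dimensions (90 observations, 15 per airline) come from this report; the intraclass correlations and the function names `design_effect` and `effective_n` are hypothetical.

```python
def design_effect(m, rho):
    """Variance inflation of a mean with m observations per cluster
    and intraclass correlation rho (equal cluster sizes assumed)."""
    return 1 + (m - 1) * rho


def effective_n(n, m, rho):
    """Number of independent observations that carry the same information."""
    return n / design_effect(m, rho)


n_obs, per_unit = 90, 15          # 6 airlines x 15 years
for rho in (0.0, 0.2, 0.5):       # hypothetical intraclass correlations
    print(rho, design_effect(per_unit, rho), effective_n(n_obs, per_unit, rho))
```

With rho = 0 the 90 observations count in full; as rho rises, standard errors that treat the observations as independent increasingly understate the uncertainty.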

STATA PART
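Notes: In Stata the first variable after reg or xtreg is the dependent
variable, so the regressions below model t (year) as a function of c, q,
pf, and lf, not total cost as specified in the Methodology.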

. *(6 variables, 90 observations pasted into data editor)

. xtset i t
panel variable: i (strongly balanced)
time variable: t, 1 to 15
delta: 1 unit

.
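. summarize i t c q pf lf

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           i |         90         3.5    1.717393          1          6
           t |         90           8    4.344698          1         15
           c |         90     1122524     1192075      68978    4748320
           q |         90    .5449946    .5335865    .037682    1.93646
          pf |         90      471683    329502.9     103795    1015610
-------------+---------------------------------------------------------
          lf |         90    .5604602    .0527934    .432066    .676287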

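. reg t c q pf lf

      Source |       SS           df       MS      Number of obs   =        90
-------------+----------------------------------   F(4, 85)        =    197.84
       Model |  1517.05038         4  379.262594   Prob > F        =    0.0000
    Residual |  162.949622        85  1.91705438   R-squared       =    0.9030
-------------+----------------------------------   Adj R-squared   =    0.8984
       Total |        1680        89  18.8764045   Root MSE        =    1.3846

------------------------------------------------------------------------------
           t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           c |   1.37e-06   5.30e-07     2.58   0.012     3.14e-07    2.42e-06
           q |  -2.785735   1.116553    -2.49   0.015    -5.005742   -.5657278
          pf |   9.37e-06   8.26e-07    11.34   0.000     7.72e-06     .000011
          lf |   20.09563   3.790522     5.30   0.000     12.55906    27.63221
       _cons |  -7.698693   1.876634    -4.10   0.000    -11.42995   -3.967442
------------------------------------------------------------------------------

. xtunitroot llc t, noconstant

Levin-Lin-Chu unit-root test for t
----------------------------------
Ho: Panels contain unit roots               Number of panels  =      6
Ha: Panels are stationary                   Number of periods =     15

AR parameter: Common                        Asymptotics: root(N)/T -> 0
Panel means:  Not included
Time trend:   Not included

ADF regressions: 1 lag
LR variance:     Bartlett kernel, 7.00 lags average (chosen by LLC)
------------------------------------------------------------------------------
                    Statistic      p-value
------------------------------------------------------------------------------
 Unadjusted t          0.0000       0.5000
 Adjusted t*          -0.0229       0.4908
------------------------------------------------------------------------------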

. xtreg i c q pf lf
the panel variable i may not be included as an independent variable
r(198);

. xtreg i t c q pf lf
the panel variable i may not be included as an independent variable
r(198);

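. xtreg t c q pf lf

Random-effects GLS regression                   Number of obs     =         90
Group variable: i                               Number of groups  =          6

R-sq:  within  = 0.0000                         Obs per group: min =        15
       between = 0.0000                                        avg =      15.0
       overall = 0.9030                                        max =        15

                                                Wald chi2(4)      =     791.34
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
           t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           c |   1.37e-06   5.30e-07     2.58   0.010     3.29e-07    2.41e-06
           q |  -2.785735   1.116553    -2.49   0.013    -4.974139   -.5973306
          pf |   9.37e-06   8.26e-07    11.34   0.000     7.75e-06     .000011
          lf |   20.09563   3.790522     5.30   0.000     12.66635    27.52492
       _cons |  -7.698693   1.876634    -4.10   0.000    -11.37683   -4.020558
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |  1.2755837
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------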
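The last three rows of the random-effects output can be read with two textbook formulas for a balanced panel: rho = sigma_u² / (sigma_u² + sigma_e²), and the GLS quasi-demeaning weight theta = 1 - sqrt(sigma_e² / (T·sigma_u² + sigma_e²)). The sketch below plugs in the reported sigma_u and sigma_e; the function names are assumptions for illustration, not Stata internals.

```python
import math


def rho(sigma_u, sigma_e):
    """Fraction of the total error variance due to the unit effect u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


def theta(sigma_u, sigma_e, T):
    """Random-effects quasi-demeaning weight for a balanced panel with T periods."""
    return 1 - math.sqrt(sigma_e ** 2 / (T * sigma_u ** 2 + sigma_e ** 2))


# Values reported in the xtreg output; T = 15 periods per airline
sigma_u, sigma_e, T = 0.0, 1.2755837, 15

rho_hat = rho(sigma_u, sigma_e)
theta_hat = theta(sigma_u, sigma_e, T)
print(rho_hat, theta_hat)
```

With sigma_u = 0 both quantities are zero, so no quasi-demeaning is applied and the random-effects estimator reduces to pooled OLS. This is consistent with the coefficients and standard errors in the xtreg table matching those in the reg table.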
Conclusion
This report provides an analysis of a subset of a larger dataset
containing cost data for U.S. airlines from 1970 to 1984. The subset
consists of 90 observations on six firms for 15 years. The aim of this
analysis is to identify the factors that influence the total cost of U.S.
airlines and to build a predictive model for total cost based on the
available predictors. The predictors include airline, year, output, fuel
price, and load factor. The response variable is the total cost, measured
in $1000.
[Figure: I, T, C, Q, PF, and LF plotted against T (0 to 15), one panel per airline (I = 1 to 6). Graphs by I.]

Thank You
