STATA and R
BUSINESS SCHOOL
RESEARCH ESSAY
ECONOMETRIC
tangible = √(tangible assets / total assets)
If a business holds a large share of tangible fixed assets, lenders are more willing to provide
loans, because these assets can serve as collateral and reduce the lender's risk.
3.6 tax
corporate income tax rate = income tax actually paid / total income before tax.
Tax is a very important factor for managers in determining the capital structure of an enterprise.
When the corporate income tax rate is high, businesses tend to borrow more in order to benefit from
the tax shield. In fact, the relationship between the corporate income tax rate and the
debt-to-total-assets ratio has been shown to be positive.
3.7 liquidity
The quick solvency of the business is calculated by taking the natural logarithm of total current
assets divided by current liabilities.
liquidity = ln(total current assets / current liabilities)
Quick solvency has an effect on the capital structure of a business. According to the trade-off
theory, a business with high solvency is better able to meet its obligations and can therefore carry
more debt. This theory states that a firm's solvency is positively related to its debt-to-asset ratio.
However, the pecking order theory predicts a negative relationship between the solvency of the
firm and the ratio of debt to total assets, because a business with high solvency tends to use its
own capital instead of debt.
3.8 state
state = 1 if the state holds more than 50% of the share capital, and state = 0 for the rest.
State ownership has an impact on the performance of enterprises. State-owned enterprises have the
advantage of supportive policies, such as the legal framework, tax treatment, and easier access to
capital, but state ownership also makes enterprises less motivated to create profits for
shareholders. According to the pecking order theory, state-owned enterprises tend to use their
internal capital before turning to debt financing. This theory implies a negative relationship
between state ownership and the debt-to-equity ratio.
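The definitions of tax, liquidity, and state above can be sketched as a small function. This is only an illustration: the function name, argument names, and the firm-year figures are hypothetical and do not come from the study's data set.

```python
import math

def build_variables(tax_paid, pretax_income, current_assets,
                    current_liabilities, state_share):
    """Return the tax, liquidity and state variables for one firm-year."""
    # tax = income tax actually paid / total income before tax
    tax = tax_paid / pretax_income
    # liquidity = ln(total current assets / current liabilities)
    liquidity = math.log(current_assets / current_liabilities)
    # state = 1 if the state holds more than 50% of the share capital
    state = 1 if state_share > 0.5 else 0
    return {"tax": tax, "liquidity": liquidity, "state": state}

# Hypothetical firm-year; the figures are invented for illustration only.
example = build_variables(tax_paid=20.0, pretax_income=100.0,
                          current_assets=300.0, current_liabilities=200.0,
                          state_share=0.51)
print(example)
```

A liquidity value below zero simply means that current assets are smaller than current liabilities, which is why the variable can be negative in the descriptive statistics.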
4. HYPOTHESIS
This project examines how these factors affect capital structure by testing three hypotheses:
H1: All factors are negatively related to capital structure.
H2: All factors are positively related to capital structure.
H3: Some factors are positively related and some are negatively related to capital structure.
RESEARCH RESULTS
1. SOME BASIC TESTS FOR DATA
1.1 Descriptive statistics
First, we checked the data using the command: describe.
The data set has a string variable, company, which we have to encode in numeric form as a new
variable named cty. Then, we summarized the data to get basic statistical information.
Command: encode company, generate(cty)
summarize TDTA roa tobinq grow tangible tax liquidity state
Table 1.1. Descriptive statistics
Variable Obs Mean Std. dev Min Max
TDTA 200 0.509593 0.2161634 0.1036509 1.66833
roa 200 -2.960643 0.9817643 -7.823014 -1.084289
tobinq 200 0.3971934 0.6610419 -1.726099 2.343141
grow 200 0.0038987 0.4372285 -3.468204 0.7275137
tangible 200 0.3684422 0.193659 0.0280032 1.042776
tax 200 0.1865015 0.0891136 0.027745 0.9342222
liquidity 200 0.5411906 0.4845706 -0.4314091 2.318925
state 200 0.15 0.3579675 0 1
(STATA 14.1)
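The statistics show that the average debt-to-total-assets ratio is 50.96%, while the mean of roa is
-2.96. Because roa is negative for every observation (maximum -1.08), it should be read in the
transformed units in which the variable is defined, not as a return of -296%. The mean of the growth
opportunity variable (tobinq) is about 0.40, and the mean growth rate (grow) is about 0.39%, which is
quite low. The income tax rate is 18.65% on average.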
1.2 Correlation coefficient matrix:
Command: correlate TDTA roa tobinq grow tangible tax liquidity state
Table 1.2. Correlation coefficient matrix
TDTA roa tobinq grow tangible tax liquidity state
TDTA 1.0000
roa -0.4950 1.0000
tobinq -0.0194 0.3600 1.0000
grow 0.0603 0.0871 0.1037 1.0000
tangible -0.0771 0.1595 0.0534 0.1191 1.0000
tax 0.1443 -0.5006 -0.0486 0.0320 -0.0714 1.0000
liquidity -0.3371 0.2273 0.0299 -0.1396 -0.4634 -0.0121 1.0000
state 0.0832 -0.1182 -0.0691 -0.1121 -0.1348 -0.0624 0.1242 1.0000
(STATA 14.1)
Based on the table, among the independent variables roa has the strongest correlation with the
dependent variable TDTA, with a coefficient of -0.4950, followed by liquidity (-0.3371). This means
that the capital structure of an enterprise depends considerably on how efficiently it uses its
assets, and that more profitable firms tend to carry less debt.
1.3 Multicollinearity test (post-estimation test)
Command: regress TDTA roa tobinq grow tangible tax liquidity state
estat vif, uncentered
Table 1.3. Postestimation test
Variable VIF 1/VIF
TDTA 33.81 0.029574
roa 19.74 0.050668
tobinq 7.86 0.127242
grow 6.60 0.151582
tangible 3.48 0.287437
tax 1.63 0.614177
liquidity 1.26 0.791952
state 1.06 0.947282
Mean VIF 9.43
(STATA 14.1)
The criterion used to detect multicollinearity is the VIF, the variance inflation factor.
If VIF > 10, collinearity is present and should be inspected. In Table 1.3, the roa row has
VIF = 19.74 > 10, so multicollinearity appears for this variable, while the mean VIF of 9.43 is
below 10 for the model as a whole. The first row, labelled TDTA, also exceeds 10; because TDTA is the
dependent variable, this row most likely corresponds to the intercept that estat vif, uncentered
also reports.
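The columns of Table 1.3 can be cross-checked with a few lines of arithmetic. The sketch below only reuses the VIF values printed in the table; it relies on the standard relation VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing regressor j on the other regressors (an uncentered R-squared in the case of uncentered VIFs).

```python
# VIF values copied from Table 1.3, in the order reported.
vifs = [33.81, 19.74, 7.86, 6.60, 3.48, 1.63, 1.26, 1.06]

mean_vif = sum(vifs) / len(vifs)           # the "Mean VIF" row
tolerances = [1 / v for v in vifs]         # the 1/VIF column
implied_r2 = [1 - t for t in tolerances]   # R_j^2 implied by VIF_j = 1 / (1 - R_j^2)
flagged = [v for v in vifs if v > 10]      # rows above the rule-of-thumb threshold

print(round(mean_vif, 2), flagged)
```

The mean reproduces the 9.43 reported in the table, and two rows exceed the threshold of 10.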
1.4 Unit roots test (stationarity)
We use the Fisher-type test based on augmented Dickey-Fuller tests with zero lags. The two
hypotheses are:
Ho: All panels contain unit roots
Ha: At least one panel is stationary
Command: xtunitroot fisher TDTA, dfuller lags(0)
Table 1.4. Testing for unit roots (STATA 14.1)
Statistic p-value
Inverse chi-squared P 309.0377 0.0000
Inverse normal Z -3.0451 0.0012
Inverse logit t L* -8.8747 0.0000
Modified inv. chi-squared Pm 18.1070 0.0000
Based on the test, the p-value of the inverse chi-squared statistic is 0.0000 < α = 0.05, so we
reject the hypothesis Ho in favor of Ha, which means at least one panel is stationary.
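As a rough illustration of how the Fisher-type statistics are built, the sketch below combines per-panel p-values into the inverse chi-squared statistic P = -2 Σ ln(p_i) and its modified version Pm = (P - 2N) / (2√N). The function name is hypothetical, and N = 40 panels is an assumption inferred from the 200 observations and the 40.cty dummy mentioned later; it is not stated in the test output.

```python
import math

def fisher_statistics(p_values):
    """Combine per-panel unit-root p-values into Fisher's P and modified Pm."""
    n = len(p_values)
    p_stat = -2.0 * sum(math.log(p) for p in p_values)  # chi-squared with 2n df under Ho
    pm_stat = (p_stat - 2 * n) / (2 * math.sqrt(n))     # roughly standard normal for large n
    return p_stat, pm_stat

# Consistency check against Table 1.4, assuming N = 40 panels (an inferred value).
n_panels = 40
reported_p = 309.0377
pm_from_p = (reported_p - 2 * n_panels) / (2 * math.sqrt(n_panels))
print(round(pm_from_p, 4))
```

Under that assumption the implied Pm agrees with the 18.1070 reported in the table.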
2. PANEL DATA REGRESSION
2.1 Pool OLS model
Before running the regression, we declare the panel structure (panel variable = cty; time variable = year).
Command: xtset cty year
regress TDTA roa tobinq grow tangible tax liquidity state
Table 2.1. Pool OLS regression model
TDTA Coef Std. Err. t P>t [95% Conf. Interval]
roa -0.1183558 0.0177419 -6.67 0.000 -0.15335 -0.0833617
tobinq 0.0587001 0.0206439 2.84 0.005 0.0179822 0.099418
grow 0.0373644 0.0293376 1.27 0.204 -0.020501 0.0952299
tangible -0.16112 0.0768904 -2.10 0.037 -0.3127783 -0.0094617
tax -0.3132996 0.1689867 -1.85 0.065 -0.6466084 0.0200092
liquidity -0.1267919 0.0320127 -3.96 0.000 -0.1899337 -0.0636501
state 0.0291806 0.0361333 0.81 0.420 -0.0420885 0.1004497
_cons 0.318 0.0724178 4.39 0.000 0.1749219 0.460595
(STATA 14.1)
Number of obs = 200
F(7, 192) = 15.40
Prob > F = 0.0000
R-squared = 0.3595
Adj R-squared = 0.3362
Root MSE = 0.17612
R-squared = 0.3595: the model explains 35.95% of the variation in TDTA. Adjusted R-squared = 0.3362
means that, after adjusting for the number of regressors, the independent variables in the model
explain 33.62% of the variation of the dependent variable TDTA. The remaining 100% - 33.62% = 66.38%
is due to variables outside the model.
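The adjusted R-squared can be reproduced from the figures reported with Table 2.1 using the usual formula 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below assumes n = 200 observations and k = 7 regressors, as shown in the output; the function name is illustrative.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared for a regression with an intercept."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

adj = adjusted_r2(0.3595, 200, 7)
unexplained = 1 - adj   # share of variation left to factors outside the model
print(round(adj, 3), round(unexplained, 3))
```

The result agrees with the reported 0.3362 up to the rounding of R-squared to four decimals.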
At the 5% significance level, roa, tobinq, tangible, and liquidity are statistically significant;
tax is significant only at the 10% level (p = 0.065).
Based on the estimated coefficients, the fitted model is (with the intercept as reported in Table 4.4):
TDTA = 0.318 - 0.1183558*roa + 0.0587001*tobinq + 0.0373644*grow - 0.16112*tangible
- 0.3132996*tax - 0.1267919*liquidity + 0.0291806*state + ei
The profitability of a business (roa) has a negative impact on capital structure: a one-unit
increase in roa is associated with a decrease of 0.118 in the debt ratio TDTA, other factors held
constant, and vice versa. When enterprises operate effectively, internal capital from retained
profits is used first.
The growth opportunity (tobinq) has a positive impact: a one-unit increase in tobinq is associated
with an increase of 0.059 in TDTA, and vice versa. The company raises more capital because the cost
of raising it is lower, so the enterprise tends to borrow more.
Within the scope of the Pool OLS model, the growth rate (grow) and state ownership (state) do not
affect capital structure because they are not statistically significant, and the income tax rate
(tax) is significant only at the 10% level.
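The entries of Table 2.1 can be checked against each other, since each t-statistic is the coefficient divided by its standard error and each confidence interval is centred on the coefficient. The sketch below uses only numbers printed in the table; the variable names are illustrative.

```python
# (coefficient, standard error) pairs copied from Table 2.1
rows = {
    "roa": (-0.1183558, 0.0177419),
    "tobinq": (0.0587001, 0.0206439),
    "liquidity": (-0.1267919, 0.0320127),
}
t_stats = {name: coef / se for name, (coef, se) in rows.items()}

# A symmetric confidence interval is centred on the estimate, so the
# intercept can be recovered as the midpoint of its reported interval.
cons_lower, cons_upper, cons_se = 0.1749219, 0.460595, 0.0724178
cons_coef = (cons_lower + cons_upper) / 2
cons_t = cons_coef / cons_se

print({k: round(v, 2) for k, v in t_stats.items()}, round(cons_coef, 3), round(cons_t, 2))
```

The midpoint of the reported interval gives an intercept of about 0.318, the value shown for _cons in Table 4.4, and it reproduces the reported t-statistic of 4.39.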
2.2 Test with dummy
A dummy variable for each enterprise (i.cty) can be used in a heteroskedasticity test of the
pooled OLS residuals.
Command: estat hettest i.cty
Ho: Constant variance
Ha: Non-constant variance (heteroskedasticity)
Table 2.2 Heteroskedasticity test for OLS (STATA 14.1)
chi2 = 489.77
Prob > chi2 = 0.0000
The result shows that Prob > chi2 = 0.0000 < α = 0.05 so we rejected the hypothesis Ho which
means there is heteroskedasticity.
2.3 Fixed effect model
2.3.1 Within transformation
Command: xtreg TDTA roa tobinq grow tangible tax liquidity state, fe
In this model, the state variable is omitted because of collinearity with the firm fixed effects.
Table 2.3 Fixed effect (within transformation) regression model (STATA 14.1)
TDTA Coef Std. Err. t P>t [95% Conf. Interval]
roa -0.0360356 0.0177461 -2.03 0.044 -0.0710928 -0.0009784
tobinq 0.0423695 0.0223645 1.89 0.060 -0.0018114 0.0865504
grow 0.022698 0.0200884 1.13 0.260 -0.0169865 0.0623825
tangible 0.1764481 0.1130924 1.56 0.121 -0.0469646 0.3998609
tax -0.1261964 0.1438586 -0.88 0.382 -0.4103873 0.1579945
liquidity -0.0727456 0.0352261 -2.07 0.041 -0.1423343 -0.0031568
_cons 0.3838812 0.0625423 6.14 0.000 0.2603296 0.5074328
(STATA 14.1)
Based on the regression results of the FEM model, only roa and liquidity have p-values < 0.05, so
they are the only variables that are statistically significant in affecting TDTA at the 5% level;
tobinq (p = 0.060) is significant at the 10% level. The F-test has a p-value of 0.0000 < 0.05, so
the regressors are jointly significant, and the R-squared value of 12.34% means that the model
explains 12.34% of the variation in TDTA.
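The within transformation subtracts each firm's own mean from its observations, which is why a variable that never changes within a firm drops out of the fixed effect model. The sketch below illustrates this on a hypothetical two-firm panel; it assumes that state is constant over time for every firm, which the omission of state suggests but the text does not state explicitly.

```python
from collections import defaultdict

def within_transform(groups, values):
    """Subtract each group's mean from its observations (the within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]

# Hypothetical two-firm panel with three years each (invented numbers).
firm = ["A", "A", "A", "B", "B", "B"]
state = [1, 1, 1, 0, 0, 0]                   # constant within each firm
roa = [0.10, 0.20, 0.30, 0.05, 0.05, 0.20]   # varies within each firm

state_within = within_transform(firm, state)
roa_within = within_transform(firm, roa)
print(state_within)
print([round(x, 3) for x in roa_within])
```

After demeaning, state is zero for every observation and carries no information, while roa keeps its within-firm variation and can still be estimated.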
2.3.2 First difference
The first-difference estimator addresses the problem of omitted time-invariant variables. It is
obtained by running a pooled OLS regression on the differenced variables
(d.roa d.tobinq d.grow d.tangible d.tax d.liquidity d.state).
Command: reg d.TDTA d.roa d.tobinq d.grow d.tangible d.tax d.liquidity d.state
d.state omitted because of collinearity.
Table 2.4 Fixed effect (first difference) regression model
d.TDTA Coef Std. Err. t P>t [95% Conf. Interval]
d.roa -0.0370262 0.013764 -2.69 0.008 -0.0642183 -0.009834
d.tobinq 0.0234956 0.0144384 1.63 0.106 -0.0050287 0.0520199
d.grow 0.009149 0.0128395 0.71 0.477 -0.0162167 0.0345147
d.tangible 0.3376215 0.107367 3.14 0.002 0.1255083 0.5497347
d.tax -0.1262337 0.1179178 -1.07 0.286 -0.359191 0.1067235
d.liquidity -0.1162716 0.0267494 -4.35 0.000 -0.1691174 -0.0634257
_cons -0.0165758 0.0076425 -2.17 0.032 -0.0316743 -0.0014773
(STATA 14.1)
Based on the results of the FD model, d.roa, d.tangible, and d.liquidity have p-values < 0.05, so
they are statistically significant in affecting d.TDTA.
2.3.3 LSDV
The LSDV method is also developed from the Pool OLS model with the dummy variable i.cty.
Command: regress TDTA roa tobinq grow tangible tax liquidity state i.cty
40.cty omitted because of collinearity.
Table 2.5 LSDV regression model
Number of obs = 200
F(45, 154) = 13.52
Prob > F = 0.0000
R-squared = 0.7980
Adj R-squared = 0.7390
Root MSE = 0.11043
(STATA 14.1)
R-squared = 0.7980: the model explains 79.80% of the variation in TDTA. Adjusted R-squared = 0.7390
means that, after adjusting for the number of regressors, the independent variables and firm dummies
in the model explain 73.90% of the variation of the dependent variable TDTA.
2.4 Random effect model
Command: xtreg TDTA roa tobinq grow tangible tax liquidity state, re
Table 2.6 Random effect regression model
TDTA Coef Std. Err. t P>t [95% Conf. Interval]
roa -0.0654263 0.0162754 -4.02 0.000 -0.0973255 -0.0335271
tobinq 0.042134 0.0205559 2.05 0.040 0.0018452 0.0824228
grow 0.0292957 0.0203272 1.44 0.150 -0.0105448 0.0691362
tangible -0.0032733 0.0906076 -0.04 0.971 -0.180861 0.1743144
tax -0.2372918 0.1383178 -1.72 0.086 -0.5083898 0.0338062
liquidity -0.0975874 0.0314752 -3.10 0.002 -0.1592777 -0.0358971
state 0.0508971 0.0671838 0.76 0.449 -0.0807807 0.1825748
_cons 0.3896796 0.064866 6.01 0.000 0.2625445 0.5168148
(STATA 14.1)
Based on the regression results of the REM model, roa, tobinq, and liquidity have p-values < 0.05
and are statistically significant in affecting TDTA at the 5% level; tax (p = 0.086) is significant
only at the 10% level, and the remaining variables are not significant. The test of overall
significance has a p-value of 0.0000, and the R-squared value of 31.67% means that the model explains
31.67% of the variation in TDTA.
2.5 Between model
Command: xtreg TDTA roa tobinq grow tangible tax liquidity state, be
Table 2.7 Between effect regression model
TDTA Coef Std. Err. t P>t [95% Conf. Interval]
roa -0.1623828 0.0491026 -3.31 0.002 -0.2624014 -0.0623641
tobinq 0.0954144 0.0504809 1.89 0.068 -0.0074119 0.1982408
grow 0.0224594 0.1576872 0.14 0.888 -0.2987389 0.3436576
tangible -0.1331151 0.1808944 -0.74 0.467 -0.501585 0.2353548
tax -0.0361114 0.5394235 -0.07 0.947 -1.134881 1.062658
liquidity -0.1173176 0.0849495 -1.38 0.177 -0.290354 0.0557188
state 0.0223154 0.0708643 0.31 0.755 -0.1220303 0.1666612
_cons 0.106774 0.2067308 0.52 0.609 -0.3143229 0.5278708
(STATA 14.1)
Based on the regression results of the BEM model, only roa has a p-value < 0.05, so it is the only
variable that is statistically significant in affecting TDTA at the 5% level; tobinq (p = 0.068) is
significant at the 10% level. The F-test has a p-value of 0.0006, and the R-squared value of 33.65%
means that the model explains 33.65% of the variation in TDTA.
3. CHOOSING BETWEEN OLS, FEM, AND REM
3.1 Pool OLS or Fixed effect model
To compare Pool OLS and FEM, we use the testparm command to test whether the coefficients of the
firm dummy variables are jointly zero (a test between OLS and LSDV).
Ho: There is no fixed effect
Ha: There is a fixed effect
Command: testparm i.cty
Table 3.1 Result of testparm
F (38,154) = 8.80
Prob > F = 0.0000
(STATA 14.1)
As Prob > F = 0.0000 < α = 0.05, we reject Ho in favor of Ha, which means FEM is preferred to
Pool OLS.
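The testparm statistic can be reproduced approximately from the two R-squared values reported earlier, using the standard F-test for nested models: F = ((R²_LSDV - R²_pooled) / q) / ((1 - R²_LSDV) / df), where q is the number of restrictions and df the residual degrees of freedom. The sketch below takes q = 38 and df = 154 from the testparm output; the function name is illustrative.

```python
def fixed_effects_f(r2_pooled, r2_lsdv, df_num, df_den):
    """F-statistic for the joint significance of the firm dummies."""
    return ((r2_lsdv - r2_pooled) / df_num) / ((1 - r2_lsdv) / df_den)

# R-squared of Pool OLS (0.3595) and LSDV (0.7980), degrees of freedom from testparm.
f_stat = fixed_effects_f(0.3595, 0.7980, 38, 154)
print(round(f_stat, 2))
```

The value agrees with the reported F(38,154) = 8.80 up to rounding of the R-squared figures.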
3.2 Pool OLS or Random effect model
To compare Pool OLS and REM, we use the Breusch and Pagan Lagrangian multiplier test.
Ho: There is no random effect
Ha: There is a random effect
Command: xttest0
Table 3.2 Result of xttest0
chibar2 = 107.22
Prob > chibar2 = 0.0000
(STATA 14.1)
As Prob > chibar2 = 0.0000 < α = 0.05, we reject Ho in favor of Ha, which means REM is preferred
to Pool OLS.
3.3 Fixed effect model or Random effect model
The choice between FEM and REM depends on the assumption made about the correlation between the
firm-specific error component εi and the independent variables.
If εi and the explanatory variables are not correlated, the REM model is more suitable; if they
are correlated, the REM estimates are inconsistent and the FEM model is more appropriate.
We use the Hausman test to check whether εi is correlated with the explanatory variables.
Ho: difference in coefficients not systematic
Ha: difference in coefficients is systematic
Command: est store fe (after doing the fixed effect model)
est store re (after doing the random effect model)
hausman fe re
or hausman fe re, sigmamore (if V_b-V_B is not positive definite)
Table 3.3 Hausman test
(b) (B) (b-B) sqrt(diag(V_b-V_B))
fe re difference S.E.
roa -0.0360356 -0.0654263 0.0293907 0.0083128
tobinq 0.0423695 0.042134 0.0002355 0.0103881
grow 0.022698 0.0292957 -0.0065977 0.0038451
tangible 0.1764481 -0.0032733 0.1797215 0.0731745
tax -0.1261964 -0.2372918 0.1110954 0.0530703
liquidity -0.0727456 -0.0975874 0.0248418 0.0180366
(STATA 14.1)
chi2 = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 16.62
Prob>chi2 = 0.0108
Based on the Hausman test, Prob > chi2 = 0.0108 < α = 0.05, so we reject Ho in favor of Ha: the
difference in coefficients is systematic, that is, εi is correlated with the independent
variables. Thus the FEM model is suitable for this study.
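The p-value of the Hausman statistic can be checked by hand. With six coefficients compared in Table 3.3, the statistic is referred to a chi-squared distribution with 6 degrees of freedom (an assumption based on the six rows of the table), and for an even number of degrees of freedom the upper-tail probability has a closed form. The helper name below is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable; valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# chi2 = 16.62 from Table 3.3, with 6 degrees of freedom assumed.
p_value = chi2_sf_even_df(16.62, 6)
print(round(p_value, 4))
```

The result matches the reported Prob>chi2 = 0.0108.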
4. TESTS FOR CHOSEN MODEL
4.1 Heteroskedasticity in FEM
We used the Modified Wald test for groupwise heteroskedasticity in fixed effect regression model.
Ho: sigma(i)^2 = sigma^2 for all i – there is no heteroskedasticity
Ha: there is heteroskedasticity
Command: xttest3
Table 4.1 Result of xttest3
chibar2 = 34948.00
Prob > chibar2 = 0.0000
(STATA 14.1)
We obtained Prob = 0.0000 < 0.05. Thus, we reject Ho in favor of Ha, which means there is
heteroskedasticity.
4.2 Autocorrelation test
Wooldridge test for autocorrelation in panel data
Ho: no first-order autocorrelation
Ha: there is first-order autocorrelation
Command: xtserial TDTA roa tobinq grow tangible tax liquidity state
Table 4.2 Result of autocorrelation test (STATA 14.1)
F (1,49) = 20.202
Prob > F = 0.0001
Prob > F = 0.0001 < 0.05, so we concluded that there is first-order autocorrelation.
4.3 GLS to fix heteroskedasticity and autocorrelation
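After conducting the tests, we could see that the regression model violates the assumptions of no
autocorrelation and constant variance. We address these problems by re-estimating the fixed effect
model with robust standard errors. The command below, xtreg with the fe and robust options, leaves
the fixed effect coefficients of Table 2.3 unchanged and corrects only the standard errors; the
result is stored under the name fgls and labelled GLS in the tables that follow.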
Command: xtreg TDTA roa tobinq grow tangible tax liquidity state, fe robust
est store fgls
Table 4.3 GLS regression model
TDTA Coef Robust Std. Err. t P>t [95% Conf. Interval]
roa -0.0360356 0.0147454 -2.44 0.019 -0.065861 -0.0062102
tobinq 0.0423695 0.0173665 2.44 0.019 0.0072424 0.0774966
grow 0.022698 0.014061 1.61 0.115 -0.0057429 0.051139
tangible 0.1764481 0.3012644 0.59 0.561 -0.4329166 0.7858129
tax -0.1261964 0.0995534 -1.27 0.212 -0.3275622 0.0751694
liquidity -0.0727456 0.0334044 -2.18 0.036 -0.1403123 -0.0051789
_cons 0.3838812 0.1089805 3.52 0.001 0.1634474 0.604315
(STATA 14.1)
The roa variable has a negative effect on the dependent variable TDTA and is statistically
significant at the 5% level. When roa increases by 1 unit, TDTA decreases by 0.0360356 units, if
other factors are kept constant.
The tobinq variable has a positive effect on TDTA and is statistically significant at the 5% level.
When tobinq increases by 1 unit, TDTA increases by 0.0423695 units if other factors are kept constant.
The liquidity variable has a negative effect on TDTA and is statistically significant at the 5%
level. When liquidity increases by 1 unit, TDTA decreases by 0.0727456 units if other factors are
kept constant.
4.4 Compare results
Command: esttab pool fe re fgls, r2 star(* 0.1 ** 0.05 *** 0.01) brackets nogap compress
Table 4.4 Results of 4 models with significance levels 10%, 5%, 1%
Pool OLS FEM REM GLS
roa -0.118*** -0.0360** -0.0654*** -0.0360**
[-6.67] [-2.03] [-4.02] [-2.44]
tobinq 0.0587*** 0.0424* 0.0421** 0.0424**
[2.84] [1.89] [2.05] [2.44]
grow 0.0374 0.0227 0.0293 0.0227
[1.27] [1.13] [1.44] [1.61]
tangible -0.161** 0.176 -0.00327 0.176
[-2.10] [1.56] [-0.04] [0.59]
tax -0.313* -0.126 -0.237* -0.126
[-1.85] [-0.88] [-1.72] [-1.27]
liquidity -0.127*** -0.0727** -0.0976*** -0.0727**
[-3.96] [-2.07] [-3.10] [-2.18]
state 0.0292 . 0.0509 .
[0.81] [0.76]
_cons 0.318*** 0.384*** 0.390*** 0.384***
[4.39] [6.14] [6.01] [3.52]
(STATA 14.1)
CONCLUSION
1. CONCLUSION FROM FEM
Choosing an optimal capital structure is one of the most important parts of a corporate capital
management strategy. Therefore, studying the theoretical basis of capital structure and applying it
to plan the optimal structure is an important task for managers.
In this study, after collecting and testing the sample, the following results were obtained:
- Regression results show that profit (roa), the corporate income tax rate (tax), and liquidity have
negative coefficients on capital structure, represented by the debt-to-asset ratio TDTA; roa and
liquidity are statistically significant at the 5% level in the FEM, while tax is not.
- Meanwhile, the growth opportunities (tobinq), the percentage of tangible assets (tangible), and
state ownership have positive coefficients; tobinq is significant at the 5% level with robust
standard errors, tangible is not significant in the FEM, and state is omitted from the FEM and not
significant in the Pool OLS and REM models.
Therefore, we accept hypothesis H3 and reject H1 and H2 from the hypotheses stated at the
beginning. That means some factors are positively related and some are negatively related to capital
structure.
2. LIMITATION
The results of this study can benefit credit institutions in Vietnam, but due to limited capacity
and limited time, the study still has several limitations:
- The data set is quite small, and the data collected from enterprises' financial statements and
from financial indicators on listing websites are sometimes incomplete.
- This study is based on internal factors and ignores other factors that can affect the ratio of
debt to total assets of the enterprise, such as the behavior of managers and operating policies.
- Some of the chosen factors may not be the core determinants of capital structure. Other internal
or macro variables should be added to increase the explanatory power of the model.
PICTURE OF RESULTS FROM STATA
1. Summary data
2. Descriptive statistics
12. Testparm
13. Breusch and Pagan Lagrangian multiplier test
(Rstudio)
The graph shows that the time series does not appear to be stationary, and the index fluctuates
over a wide range during this period. To be more certain, the author uses the Augmented
Dickey-Fuller test, which also serves to determine the coefficient 𝑑, by testing the original time
series, its logarithm, and the 1st-order difference.
Ho: The time series is not stationary.
Ha: The time series is stationary.
Command: dp <- diff(log(price))
adf.test(price)
adf.test(dp)
Table 5.2 Augmented Dickey-Fuller test result
Data                    Type                    Lag   ADF      p-value
price                   No drift no trend       6     -0.740   0.414
price                   With drift no trend     6     -0.858   0.752
price                   With drift with trend   6     -2.36    0.426
dp = diff(log(price))   No drift no trend       6     -10.6    0.01
dp = diff(log(price))   With drift no trend     6     -10.6    0.01
dp = diff(log(price))   With drift with trend   6     -10.6    0.01
(Rstudio)
At the 5% level of significance, the author concludes that the 1st-order difference is a stationary
time series under all three specifications, with p-value = 0.01 < 0.05. Therefore, the author sets
the coefficient 𝑑 = 1 in the ARIMA model.
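The differenced series tested above is the first difference of the log price, which is approximately the period-to-period return. A minimal sketch of the transformation, using a short invented price series in place of the index data, is:

```python
import math

def log_diff(prices):
    """First difference of the natural log of a price series."""
    logs = [math.log(p) for p in prices]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]

# Hypothetical prices, invented for illustration only.
prices = [100.0, 102.0, 101.0, 105.0]
dp = log_diff(prices)
print([round(x, 4) for x in dp])
```

Differencing shortens the series by one observation, and the differences sum to the log of the overall price change.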
2.3 Determine optimal lags p, q
The author uses the ACF and PACF correlograms combined with the AIC criterion to determine p and q.
In this test, the author also uses the logarithm of the price data.
Command: lnp <- log(price)
par(mfrow=c(3,1),mar=c(3,3,3,3))
plot(lnp,main="Log of Price")
acf(lnp,ylab='',main="ACF of price",ylim=c(-1,1))
pacf(lnp,ylab='',main="PACF of price",ylim=c(-1,1))
Graph 2.1 ACF and PACF of log price
(Rstudio)
Based on the ACF diagram, the author found that lnp is correlated with its 2nd and 3rd-order lags.
Therefore, the values 𝑞 = 3 and 𝑞 = 5 are considered for the model. In the PACF diagram, there is a
partial correlation at the 2nd and 3rd-order lags, so the values 𝑝 = 1 and 𝑝 = 2 are considered for
selection.
To select the model, the author uses
Command: auto.arima(lnp, seasonal = FALSE, approximation = FALSE, trace = TRUE)
Table 6.1 auto arima test result
Best model: ARIMA(2,1,3)
Series: lnp
ARIMA(2,1,3)
sigma^2 = 0.00024 log likelihood = 2033.79
AIC = - 4055.58 AICc = - 4055.47 BIC = - 4027.95
(Rstudio)
Thus, the author decided to choose the ARIMA(2,1,3) model for the next testing and forecasting
steps.
2.4 Estimating the selected model:
Command: arima(lnp, order=c(2,1,3),include.mean=F,method="ML")
Arima(lnp, order=c(2,1,3),include.constant=F,method="ML")
Table 7.1 Results of regression model ARIMA(2,1,3)
AR(1) AR(2) MA(1) MA(2) MA(3)
Coefficient -0.9066 -0.7879 0.9285 0.8056 -0.0722
Standard error 0.0743 0.0858 0.0840 0.1002 0.0471
sigma^2 = 0.00024: log likelihood = 2033.79
AIC=-4055.58 AICc=-4055.47 BIC=-4027.95
(Rstudio)
The AIC statistic is the smallest among the models compared by auto.arima, so ARIMA(2,1,3) is the
appropriate one.
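The information criteria in the output follow from the log likelihood. The sketch below assumes k = 6 estimated parameters (two AR coefficients, three MA coefficients, and the innovation variance). The sample size is not reported in the output, so it is backed out from the BIC purely as an illustration, not as a figure from the study.

```python
import math

log_lik = 2033.79
k = 6  # 2 AR + 3 MA coefficients plus the innovation variance (assumed count)

aic = 2 * k - 2 * log_lik

# BIC = k * ln(n) - 2 * logL, so the effective sample size implied by the
# reported BIC of -4027.95 is:
n_implied = math.exp((-4027.95 + 2 * log_lik) / k)

# Small-sample correction: AICc = AIC + 2k(k + 1) / (n - k - 1)
aicc = aic + 2 * k * (k + 1) / (n_implied - k - 1)

print(round(aic, 2), round(aicc, 2), round(n_implied))
```

With these assumptions the computed AIC and AICc agree with the reported -4055.58 and -4055.47.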
2.5 Test for residual
To check whether this is a good model, the author tests the residuals of the fitted model. First,
test the residuals for autocorrelation. Here, the author uses the Ljung-Box test with the
following hypotheses. The total number of lags used is 10.
Ho: The residuals are not autocorrelated
Ha: The residuals are autocorrelated
Command: arima1 <- arima(lnp, order=c(2,1,3),include.mean=F,method="ML")
checkresiduals(arima1)
Table 8.1 Results of Ljung-box test
Ljung-Box test
data: Residuals from ARIMA(2,1,3)
Q* = 5.6681, df = 5, p-value = 0.3399
(Rstudio)
The p-value = 0.3399 > 0.05, so Ho cannot be rejected: the residuals show no autocorrelation, which
is consistent with white noise.
Check the variance of the residuals by the ARCH heteroscedasticity test.
Ho: no ARCH effects
Command: arch.test(arima1)
ArchTest(arima1$residuals,lags=360)
Table 8.2 Results of ARCH heteroscedasticity test
ARCH LM-test
data: arima1$residuals
Chi-squared = 333.14, df = 360, p-value = 0.8418
(Rstudio)
The p-value = 0.8418 > 0.05, so Ho cannot be rejected: the residuals show no ARCH effect, which
means the residual variance does not change over time.
Check the normal distribution of the residuals with the Shapiro-Wilk normality test.
Ho: The residuals are normally distributed
Command: shapiro.test(resid(arima1))
Table 8.3 Results of Shapiro-Wilk normality test
Shapiro-Wilk normality test
Data: resid(arima1)
W = 0.96857, p-value = 1.629e-11
(Rstudio)
The p-value = 1.629e-11 < 0.05 means that the residuals are not normally distributed.
2.6 Forecasting and model performance evaluation
Here, the author makes a forecast for the next 10 days. The forecast results and graph are presented below:
Command: pred <-forecast(arima1, lead=10)
pred1 <- predict(arima1, n.ahead=10)
pred2 <- forecast::forecast(arima1, h=10)
Table 8.4 Results for forecasting
Lead Forecast Standard error Lower 95% Upper 95% Reality Difference
1 9.84 0.0154 9.81 9.87 9.80 0.04
2 9.84 0.0221 9.80 9.88 9.82 0.02
3 9.84 0.0271 9.79 9.89 9.70 0.14
4 9.84 0.0307 9.78 9.90 9.76 0.08
5 9.84 0.0344 9.77 9.91 9.75 0.09
6 9.84 0.0378 9.77 9.92 9.67 0.17
7 9.84 0.0405 9.76 9.93 9.70 0.14
8 9.84 0.0434 9.75 9.94 9.75 0.09
(Rstudio)
Graph 3.1 Predicted value for next 10 periods
(Rstudio)
After obtaining the forecast results, the author has some comments:
(1) Due to the nature of forecasting, the confidence interval widens for more distant forecast
points. This reduces the accuracy of the forecast.
(2) The actual Hang Seng Index data mostly fall outside the 95% confidence interval of the
forecast: of the 8 leads reported in the table, only leads 2 and 8 lie inside the interval, and
lead 8 sits on its lower bound. Thus, in the author's opinion, ARIMA is not a good model to
forecast the index in an environment where price shocks may occur.
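How often the realised values fall inside the forecast interval can be counted directly from the forecasting table. The sketch below copies the lower bound, upper bound, and realised value for the 8 leads shown there; the variable names are illustrative.

```python
# (lead, lower 95%, upper 95%, reality) copied from the forecasting table
rows = [
    (1, 9.81, 9.87, 9.80),
    (2, 9.80, 9.88, 9.82),
    (3, 9.79, 9.89, 9.70),
    (4, 9.78, 9.90, 9.76),
    (5, 9.77, 9.91, 9.75),
    (6, 9.77, 9.92, 9.67),
    (7, 9.76, 9.93, 9.70),
    (8, 9.75, 9.94, 9.75),
]

# Leads whose realised value lies within the interval (bounds included).
inside = [lead for lead, lower, upper, actual in rows if lower <= actual <= upper]
coverage = len(inside) / len(rows)
print(inside, coverage)
```

Only two of the eight realised values lie within the interval, one of them exactly on the lower bound, far below the 95% coverage the interval is meant to provide.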
RESULTS FROM RSTUDIO
1. Summary data
7. Arch test
8. Shapiro test
9. Forecasting