STATA and R

UEH UNIVERSITY
BUSINESS SCHOOL
RESEARCH ESSAY
ECONOMETRICS
Name: Nguyễn Lê Tú Quyên

Class: FNC03 – K47
Student No. : 31211021439
Friday, June 2nd, 2023

PART 1: STATA - FACTORS AFFECTING THE CAPITAL
STRUCTURE OF BUSINESSES IN VIETNAM
INTRODUCTION
1. RESEARCH SCOPE:
This research uses data on 40 Vietnamese enterprises, taken from the financial results those
enterprises published on the Ho Chi Minh City Stock Exchange (HOSE) for the 2018 – 2022 period.
The sample was selected carefully: the author excluded enterprises whose data are unclear, missing,
or lacking in transparency.
2. RESEARCH SUBJECTS
2.1 Dependent variable
- TDTA (debt-to-total-asset ratio): represents the capital structure.
2.2 Independent variables
- roa: profitability of the business
- tobinq: the growth opportunities of the business
- grow: growth rate of the business
- tangible: share of tangible assets in the enterprise’s total assets
- tax: effective corporate income tax rate
- liquidity: the quick liquidity of the enterprise
- state: state ownership
3. MEASURE VARIABLES
To bring the distribution of the research variables closer to normal and make them suitable for
the estimation models, the research variables are measured as follows:
3.1 TDTA
TDTA can be measured as:
TDTA = total debt / total assets
This variable represents the capital structure of the firm, a concept familiar to enterprises in
Vietnam today. Choosing the most effective debt-to-total-asset ratio is a top goal of managers. This
paper examines how the capital structure of a firm is affected by intrinsic factors such as
profitability, income tax rates, and the solvency of businesses.
3.2 roa
The profitability of a business is calculated as the natural logarithm of after-tax profit
divided by total assets.
roa = ln(after-tax profit / total assets)
ROA is an important financial indicator of how efficiently the enterprise uses its assets. The
higher the roa, the more profitable the business and the lower its probability of bankruptcy. This
suggests a higher debt ratio through greater financial leverage since, according to the signaling
theory, firms with large profits tend to increase their debt ratio to benefit from the tax shield.
However, firms prefer their own internal capital to external debt (pecking order theory), and if
external financing is required, issuing the safest securities is the top priority. Therefore, the
result can differ depending on the specific case of each business.
3.3 tobinq
The growth opportunity is measured by the relationship between market value and book value.
tobinq = ln(market value/book value)
It indicates whether a business or market is overvalued or undervalued. The ratio q is balanced at
q = 1, when market value equals book value (so that tobinq = 0). If q is high (q > 1), the company
should invest more, because its market price is high relative to the cost of raising additional
capital, which makes raising capital cheaper. If q is low, the company will not increase investment
because raising more capital is relatively expensive.
3.4 grow
grow = (revenue of this year - revenue of previous year)/revenue of previous year
It represents the growth of the company, which influences the investment decisions of the
business. The higher the growth rate, the better the company's business results.
3.5 tangible
tangible = sqrt(tangible assets / total assets)
If a business has a large percentage of tangible fixed assets, lenders are more willing to provide
loans since they can reduce the risk of lending because tangible fixed assets are used as collateral.
3.6 tax
tax = income tax actually paid / total income before tax
Tax is a very important tool for managers in determining the capital structure of an enterprise.
When the corporate income tax rate is high, businesses tend to borrow more to benefit from the tax
shield. In fact, it has been shown that the relationship between the corporate income tax rate and
the debt-to-total-assets ratio is positive.
3.7 liquidity
The quick solvency of the business is calculated as the natural logarithm of total current
assets divided by current liabilities.
liquidity = ln(total current assets / current liabilities)
Quick solvency affects the capital structure of a business. According to the trade-off theory, a
business with high solvency is better able to meet its obligations and can therefore carry more
debt. This theory states that a firm's solvency is positively related to its debt-to-asset ratio.
However, the pecking order theory predicts a negative relationship between the solvency of the
firm and the ratio of debt to total assets, because a business with high solvency tends to use its
own capital instead of debt.
3.8 state
state = 1 if the state holds more than 50% of the share capital, and state = 0 otherwise.
State ownership has an impact on the performance of enterprises. Although state-owned enterprises
benefit from supportive policies such as the legal framework, tax treatment, and easier access to
capital, state ownership makes enterprises less motivated to create profits for shareholders.
According to the pecking order theory, state-owned enterprises tend to use their internal capital
before debt financing. This theory predicts a negative relationship between state ownership and the
debt ratio.
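The variable definitions in sections 3.1 to 3.8 can be sketched in Python as follows. This is a minimal illustration rather than the data-preparation code actually used for the study: the function name build_variables, its argument names, and the example figures are all hypothetical.

```python
import math

def build_variables(total_debt, total_assets, profit_after_tax,
                    market_value, book_value, revenue, revenue_prev,
                    tangible_assets, tax_paid, pretax_income,
                    current_assets, current_liabilities, state_share):
    """Compute the regression variables of sections 3.1-3.8 for one firm-year.

    All argument names are hypothetical; the formulas follow the text.
    """
    return {
        "TDTA": total_debt / total_assets,
        # ln() is undefined for loss-making firm-years (profit <= 0)
        "roa": math.log(profit_after_tax / total_assets),
        "tobinq": math.log(market_value / book_value),
        "grow": (revenue - revenue_prev) / revenue_prev,
        # square root of the tangible-asset share, as in section 3.5
        "tangible": math.sqrt(tangible_assets / total_assets),
        "tax": tax_paid / pretax_income,
        "liquidity": math.log(current_assets / current_liabilities),
        "state": 1 if state_share > 0.5 else 0,
    }

# Made-up figures for a single firm-year (same currency unit throughout)
example = build_variables(
    total_debt=500, total_assets=1000, profit_after_tax=50,
    market_value=1500, book_value=1000, revenue=1100, revenue_prev=1000,
    tangible_assets=250, tax_paid=12, pretax_income=62,
    current_assets=400, current_liabilities=250, state_share=0.6)
print(example)
```

Because roa is a logarithm, a firm-year with a loss cannot be transformed this way and would have to be handled separately.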
4. HYPOTHESIS
This project examines how these factors affect capital structure through 3 hypotheses:
H1: All factors are negatively related to capital structure.
H2: All factors are positively related to capital structure.
H3: Some factors are positively related and some are negatively related to capital structure.
RESEARCH RESULTS
1. SOME BASIC TESTS FOR DATA
1.1 Descriptive statistics
First, we inspected the data with the describe command.
The data set contains a string variable, company, which has to be encoded in numeric form as a
new variable named cty. Then, we summarized the data to get basic statistical information.
Command: describe
encode company, generate(cty)
summarize TDTA roa tobinq grow tangible tax liquidity state
Table 1.1. Descriptive statistics
Variable    Obs   Mean         Std. dev.   Min          Max
TDTA        200   0.509593     0.2161634   0.1036509    1.66833
roa         200   -2.960643    0.9817643   -7.823014    -1.084289
tobinq      200   0.3971934    0.6610419   -1.726099    2.343141
grow        200   0.0038987    0.4372285   -3.468204    0.7275137
tangible    200   0.3684422    0.193659    0.0280032    1.042776
tax         200   0.1865015    0.0891136   0.027745     0.9342222
liquidity   200   0.5411906    0.4845706   -0.4314091   2.318925
state       200   0.15         0.3579675   0            1
(STATA 14.1)
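The statistics show that the average debt-to-total-asset ratio is 50.96%. Because roa, tobinq, and
liquidity are measured in natural logarithms, their means are in log units: the mean roa of -2.96
corresponds to a return on assets of about 5.2% (e^-2.96), and the mean tobinq of 0.397 corresponds
to a market-to-book ratio of about 1.49. The average growth rate is 0.39%, which is quite low, and
the effective tax rate is 18.65% on average.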
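Since roa, tobinq, and liquidity are defined in section 3 as natural logarithms, their means in Table 1.1 are in log units and cannot be read directly as percentages. A short sketch of the back-transformation, using only the means reported in Table 1.1:

```python
import math

# Sample means of the log-transformed variables, copied from Table 1.1
mean_log = {"roa": -2.960643, "tobinq": 0.3971934, "liquidity": 0.5411906}

# exp(mean of ln x) is the geometric mean of x, not its arithmetic mean
geometric_mean = {name: math.exp(value) for name, value in mean_log.items()}

for name, value in geometric_mean.items():
    print(f"{name}: {value:.4f}")
```

The back-transformed roa is roughly 0.05, i.e. a typical return on assets of about 5%, which is far from the -296% that a direct reading of the log mean would suggest.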
1.2 Correlation coefficient matrix:
Command: correlate TDTA roa tobinq grow tangible tax liquidity state
Table 1.2. Correlation coefficient matrix
            TDTA      roa       tobinq    grow      tangible  tax       liquidity  state
TDTA        1.0000
roa         -0.4950   1.0000
tobinq      -0.0194   0.3600    1.0000
grow        0.0603    0.0871    0.1037    1.0000
tangible    -0.0771   0.1595    0.0534    0.1191    1.0000
tax         0.1443    -0.5006   -0.0486   0.0320    -0.0714   1.0000
liquidity   -0.3371   0.2273    0.0299    -0.1396   -0.4634   -0.0121   1.0000
state       0.0832    -0.1182   -0.0691   -0.1121   -0.1348   -0.0624   0.1242     1.0000
(STATA 14.1)
Based on the table, the dependent variable TDTA is most strongly correlated with the independent
variable roa, with a correlation coefficient of -0.4950. This negative correlation suggests that the
capital structure of an enterprise depends considerably on how efficiently it uses its assets.
1.3 Multicollinearity test (post-estimation test)
Command: regress TDTA roa tobinq grow tangible tax liquidity state
estat vif, uncentered
Table 1.3. Postestimation test
Variable VIF 1/VIF
TDTA 33.81 0.029574
roa 19.74 0.050668
tobinq 7.86 0.127242
grow 6.60 0.151582
tangible 3.48 0.287437
tax 1.63 0.614177
liquidity 1.26 0.791952
state 1.06 0.947282
Mean VIF 9.43
(STATA 14.1)
The criterion used to detect multicollinearity is the VIF, the variance inflation factor.
If VIF > 10, collinearity is present and should be inspected. Table 1.3 shows VIF values above 10
(33.81 and 19.74), so multicollinearity appears for the variable roa, but not for the model as a
whole because the mean VIF is 9.43 < 10.
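The summary figures in Table 1.3 can be re-derived from the individual VIF values. The sketch below copies the VIF column as printed and recomputes the mean VIF and the 1/VIF (tolerance) column; it does not re-estimate anything, and small differences in the last digits of 1/VIF come from the VIFs being rounded to two decimals.

```python
# VIF values copied from Table 1.3, in the order listed
vif = [33.81, 19.74, 7.86, 6.60, 3.48, 1.63, 1.26, 1.06]

mean_vif = sum(vif) / len(vif)
tolerance = [1 / v for v in vif]        # the 1/VIF column
flagged = [v for v in vif if v > 10]    # rule of thumb used in the text

print(round(mean_vif, 2))
print([round(t, 4) for t in tolerance])
print(flagged)
```

Two entries exceed the threshold of 10 even though the mean stays just below it, so a mean VIF under 10 should not be read as ruling out collinearity for individual regressors.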
1.4 Unit roots test (stationarity)
We use the Fisher-type test based on augmented Dickey-Fuller tests with zero lags. The 2
hypotheses are:
Ho: All panels contain unit roots
Ha: At least one panel is stationary
Command: xtunitroot fisher TDTA, dfuller lags(0)
Table 1.4. Testing for unit roots (STATA 14.1)
Statistic p-value
Inverse chi-squared P 309.0377 0.0000
Inverse normal Z -3.0451 0.0012
Inverse logit t L* -8.8747 0.0000
Modified inv. chi-squared Pm 18.1070 0.0000
Based on the test, the p-value of the inverse chi-squared statistic is 0.0000 < α = 0.05, so we
rejected the hypothesis Ho and accepted Ha, which means at least one panel is stationary.
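The Fisher-type test combines the p-values of the separate Dickey-Fuller tests run on each panel. The sketch below shows the inverse chi-squared statistic P and the modified statistic Pm, and checks that the two reported values in Table 1.4 are consistent with each other under the assumption of N = 40 panels. The function name and the four example p-values are hypothetical.

```python
import math

def fisher_statistics(p_values):
    """Fisher-type combination of per-panel unit-root p-values.

    Returns the inverse chi-squared statistic P = -2 * sum(ln p_i),
    its degrees of freedom 2N, and the modified statistic Pm.
    """
    n = len(p_values)
    p_stat = -2.0 * sum(math.log(p) for p in p_values)
    pm = (p_stat - 2 * n) / (2 * math.sqrt(n))
    return p_stat, 2 * n, pm

# Consistency check against Table 1.4, assuming N = 40 panels
n_panels = 40
p_reported = 309.0377
pm_check = (p_reported - 2 * n_panels) / (2 * math.sqrt(n_panels))
print(round(pm_check, 4))

# Hypothetical per-panel p-values, only to show the function in use
print(fisher_statistics([0.01, 0.20, 0.03, 0.50]))
```

The recomputed Pm agrees with the 18.1070 reported in Table 1.4, which supports reading the test as having been run on all 40 enterprises.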
2. PANEL DATA REGRESSION
2.1 Pooled OLS model
Before running the regressions, we declare the panel structure (panel variable = cty; time
variable = year).
Command: xtset cty year
regress TDTA roa tobinq grow tangible tax liquidity state
Table 2.1. Pooled OLS regression model
TDTA Coef Std. Err. t P>t [95% Conf. Interval]
roa -0.1183558 0.0177419 -6.67 0.000 -0.15335 -0.0833617
tobinq 0.0587001 0.0206439 2.84 0.005 0.0179822 0.099418
grow 0.0373644 0.0293376 1.27 0.204 -0.020501 0.0952299
tangible -0.16112 0.0768904 -2.10 0.037 -0.3127783 -0.0094617
tax -0.3132996 0.1689867 -1.85 0.065 -0.6466084 0.0200092
liquidity -0.1267919 0.0320127 -3.96 0.000 -0.1899337 -0.0636501
state 0.0291806 0.0361333 0.81 0.420 -0.0420885 0.1004497
_cons 0.3177585 0.0724178 4.39 0.000 0.1749219 0.460595
(STATA 14.1)
Number of obs = 200
F(7, 192) = 15.40
Prob > F = 0.0000
R-squared = 0.3595
Adj R-squared = 0.3362
Root MSE = 0.17612
R-squared = 0.3595, so the model explains 35.95% of the variation in TDTA. Adjusted R-squared =
0.3362 means that, after adjusting for the number of regressors, the independent variables in the
model explain 33.62% of the variation of the dependent variable TDTA. The remaining 100 – 33.62% =
66.38% is due to variables outside the model.
At the 5% significance level, roa, tobinq, tangible, and liquidity are statistically significant;
tax is significant only at the 10% level.
Based on the estimated coefficients, the model is:
TDTA = 0.3177585 - 0.1183558*roa + 0.0587001*tobinq + 0.0373644*grow - 0.16112*tangible
- 0.3132996*tax - 0.1267919*liquidity + 0.0291806*state + ei
The profitability of a business (roa) has a negative impact on capital structure. Because roa is
measured in logarithms, a one-unit increase in roa lowers TDTA by 0.1184; equivalently, a 1%
increase in the return on assets lowers the debt ratio by about 0.0012 (0.12 percentage points), and
vice versa. When enterprises operate effectively, internal capital from retained profits is
prioritized over debt.
The growth opportunity (tobinq) has a positive impact: a one-unit increase in tobinq raises TDTA
by 0.0587, so a 1% increase in the market-to-book ratio raises the debt ratio by about 0.0006 (0.06
percentage points), and vice versa. When the cost of raising capital is lower, the enterprise tends
to borrow more.
Within the scope of the Pooled OLS model, the growth rate (grow), income tax rate (tax), and
state ownership (state) are not statistically significant at the 5% level, so no effect on capital
structure can be established for them.
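Two quick checks on the Pooled OLS equation can be sketched as follows. First, an OLS regression with an intercept passes through the sample means, so plugging the means of Table 1.1 into the fitted equation should return the mean of TDTA. Second, because roa is in logs, the effect of a 1% change in return on assets is approximately the coefficient divided by 100. The constant used here is taken as the midpoint of its reported confidence interval, which is an assumption of this sketch.

```python
# Coefficients from Table 2.1; the constant is taken as the midpoint of its
# reported confidence interval (0.1749219, 0.460595)
coef = {"roa": -0.1183558, "tobinq": 0.0587001, "grow": 0.0373644,
        "tangible": -0.16112, "tax": -0.3132996, "liquidity": -0.1267919,
        "state": 0.0291806}
const = (0.1749219 + 0.460595) / 2

# Sample means from Table 1.1
mean = {"roa": -2.960643, "tobinq": 0.3971934, "grow": 0.0038987,
        "tangible": 0.3684422, "tax": 0.1865015, "liquidity": 0.5411906,
        "state": 0.15}

# An OLS line with an intercept passes through the sample means,
# so this should reproduce the mean of TDTA (0.509593)
tdta_at_means = const + sum(coef[k] * mean[k] for k in coef)
print(round(tdta_at_means, 4))

# Level-log interpretation: approximate effect on TDTA of a 1% rise in
# return on assets (coefficient times 0.01)
effect_roa_1pct = coef["roa"] * 0.01
print(round(effect_roa_1pct, 4))
```

The fitted value at the means matches the sample mean of TDTA only with a constant of about 0.318, which is also the value shown for Pooled OLS in Table 4.4.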
2.2 Test with dummy
A dummy variable for each enterprise (i.cty) can be used in the heteroskedasticity test.
Command: estat hettest i.cty
Ho: constant variance
Ha: the variance is not constant
Table 2.2 Heteroskedasticity test for OLS (STATA 14.1)
chi2 = 489.77
Prob > chi2 = 0.0000
The result shows that Prob > chi2 = 0.0000 < α = 0.05 so we rejected the hypothesis Ho which
means there is heteroskedasticity.
2.3 Fixed effect model
2.3.1 Within transformation
Command: xtreg TDTA roa tobinq grow tangible tax liquidity state, fe
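In this model, the state variable is omitted because of collinearity: it does not vary over time
within an enterprise, so it is absorbed by the fixed effects.
Table 2.3 Fixed effect (within transformation) regression model
TDTA Coef Std. Err. t P>t [95% Conf. Interval]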
roa -0.0360356 0.0177461 -2.03 0.044 -0.0710928 -0.0009784
tobinq 0.0423695 0.0223645 1.89 0.060 -0.0018114 0.0865504
grow 0.022698 0.0200884 1.13 0.260 -0.0169865 0.0623825
tangible 0.1764481 0.1130924 1.56 0.121 -0.0469646 0.3998609
tax -0.1261964 0.1438586 -0.88 0.382 -0.4103873 0.1579945
liquidity -0.0727456 0.0352261 -2.07 0.041 -0.1423343 -0.0031568
_cons 0.3838812 0.0625423 6.14 0.000 0.2603296 0.5074328
(STATA 14.1)
Based on the regression results of the FEM model, most of the p-values are > 0.05, so most
variables are not statistically significant in explaining TDTA; the exceptions are roa and
liquidity. The F-test has a p-value of 0.0000 < 0.05, so the regressors are jointly significant, and
the R-squared value of 12.34% means the model explains 12.34% of the variation in TDTA.
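The reason a variable such as state drops out of the fixed effect model can be seen from the within transformation itself, which subtracts each enterprise's own time average from its observations. The sketch below uses a hypothetical two-firm panel; the firm labels and values are made up for illustration.

```python
def within_transform(values_by_firm):
    """Subtract each firm's own time average from its observations."""
    demeaned = {}
    for firm, values in values_by_firm.items():
        firm_mean = sum(values) / len(values)
        demeaned[firm] = [v - firm_mean for v in values]
    return demeaned

# Hypothetical panel: two firms observed over five years
state = {"A": [1, 1, 1, 1, 1], "B": [0, 0, 0, 0, 0]}   # constant within firm
liquidity = {"A": [0.4, 0.5, 0.6, 0.5, 0.5], "B": [1.0, 1.2, 0.8, 1.1, 0.9]}

print(within_transform(state))       # every entry is 0.0: no variation left
print(within_transform(liquidity))   # within-firm variation is preserved
```

A regressor that is identically zero after demeaning carries no information, which is why Stata reports it as omitted because of collinearity.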
2.3.2 First difference
The first-difference estimator is an approach used to address the problem of omitted
time-invariant variables. It is obtained by running a pooled OLS estimation on a regression of the
differenced variables (d.roa d.tobinq d.grow d.tangible d.tax d.liquidity d.state)
Command: reg d.TDTA d.roa d.tobinq d.grow d.tangible d.tax d.liquidity d.state
d.state omitted because of collinearity.
Table 2.4 Fixed effect (first difference) regression model
d.TDTA Coef Std. Err. t P>t [95% Conf. Interval]
d.roa -0.0370262 0.013764 -2.69 0.008 -0.0642183 -0.009834
d.tobinq 0.0234956 0.0144384 1.63 0.106 -0.0050287 0.0520199
d.grow 0.009149 0.0128395 0.71 0.477 -0.0162167 0.0345147
d.tangible 0.3376215 0.107367 3.14 0.002 0.1255083 0.5497347
d.tax -0.1262337 0.1179178 -1.07 0.286 -0.359191 0.1067235
d.liquidity -0.1162716 0.0267494 -4.35 0.000 -0.1691174 -0.0634257
_cons -0.0165758 0.0076425 -2.17 0.032 -0.0316743 -0.0014773
(STATA 14.1)
Based on the results of the FD model, d.roa, d.tangible, and d.liquidity have p-values < 0.05,
so they are statistically significant in explaining d.TDTA.
2.3.3 LSDV
The LSDV method extends the Pooled OLS model by adding a dummy variable for each enterprise (i.cty).
Command: regress TDTA roa tobinq grow tangible tax liquidity state i.cty
40.cty omitted because of collinearity.
Table 2.5 LSDV regression model
Number of obs = 200
F(45, 154) = 13.52
Prob > F = 0.0000
R-squared = 0.7980
Adj R-squared = 0.7390
Root MSE = 0.11043
(STATA 14.1)
R-squared = 0.7980, so the model explains 79.80% of the variation in TDTA. Adjusted R-squared =
0.7390 means that, after adjusting for the number of regressors, the independent variables in the
model explain 73.90% of the variation of the dependent variable TDTA.
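The LSDV summary statistics can be cross-checked from R-squared alone. The sketch below assumes 45 model degrees of freedom (7 regressors plus 38 retained enterprise dummies) and 154 residual degrees of freedom; the value 154 also appears in the testparm result F(38,154) of section 3.1, while the split into 7 + 38 is an inference of this sketch.

```python
def adjusted_r2(r2, n_obs, df_resid):
    """Adjusted R-squared from R-squared and residual degrees of freedom."""
    return 1 - (1 - r2) * (n_obs - 1) / df_resid

def overall_f(r2, df_model, df_resid):
    """F statistic for the joint significance of all regressors."""
    return (r2 / df_model) / ((1 - r2) / df_resid)

n_obs, r2_lsdv = 200, 0.7980
# Assumed degrees of freedom: 7 regressors + 38 retained firm dummies = 45,
# leaving 200 - 45 - 1 = 154 for the residual
df_model, df_resid = 45, 154

print(round(adjusted_r2(r2_lsdv, n_obs, df_resid), 4))
print(round(overall_f(r2_lsdv, df_model, df_resid), 2))
```

With these degrees of freedom both the adjusted R-squared of 0.7390 and the F statistic of 13.52 in Table 2.5 are reproduced.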
2.4 Random effect model
Command: xtreg TDTA roa tobinq grow tangible tax liquidity state, re
Table 2.6 Random effect regression model
TDTA Coef Std. Err. z P>z [95% Conf. Interval]
roa -0.0654263 0.0162754 -4.02 0.000 -0.0973255 -0.0335271
tobinq 0.042134 0.0205559 2.05 0.040 0.0018452 0.0824228
grow 0.0292957 0.0203272 1.44 0.150 -0.0105448 0.0691362
tangible -0.0032733 0.0906076 -0.04 0.971 -0.180861 0.1743144
tax -0.2372918 0.1383178 -1.72 0.086 -0.5083898 0.0338062
liquidity -0.0975874 0.0314752 -3.10 0.002 -0.1592777 -0.0358971
state 0.0508971 0.0671838 0.76 0.449 -0.0807807 0.1825748
_cons 0.3896796 0.064866 6.01 0.000 0.2625445 0.5168148
(STATA 14.1)
Based on the regression results of the REM model, most of the p-values are > 0.05, so most
variables are not statistically significant in explaining TDTA; the exceptions are roa, tobinq, and
liquidity. The test of overall significance has a p-value of 0.0000, and the R-squared value of
31.67% means the model explains 31.67% of the variation in TDTA.
2.5 Between model
Command: xtreg TDTA roa tobinq grow tangible tax liquidity state, be
Table 2.7 Between effect regression model
TDTA Coef Std. Err. t P>t [95% Conf. Interval]
roa -0.1623828 0.0491026 -3.31 0.002 -0.2624014 -0.0623641
tobinq 0.0954144 0.0504809 1.89 0.068 -0.0074119 0.1982408
grow 0.0224594 0.1576872 0.14 0.888 -0.2987389 0.3436576
tangible -0.1331151 0.1808944 -0.74 0.467 -0.501585 0.2353548
tax -0.0361114 0.5394235 -0.07 0.947 -1.134881 1.062658
liquidity -0.1173176 0.0849495 -1.38 0.177 -0.290354 0.0557188
state 0.0223154 0.0708643 0.31 0.755 -0.1220303 0.1666612
_cons 0.106774 0.2067308 0.52 0.609 -0.3143229 0.5278708
(STATA 14.1)
Based on the regression results of the BEM model, most of the p-values are > 0.05, so the
variables are not statistically significant in explaining TDTA, with the exception of roa. The
F-test has a p-value of 0.0006, and the R-squared value of 33.65% means the model explains 33.65% of
the variation in TDTA.
3. CHOOSING BETWEEN OLS, FEM, AND REM
3.1 Pooled OLS or Fixed effect model
To compare Pooled OLS and FEM, we use the testparm command to test whether the coefficients of
the dummy variables are simultaneously zero (testing between the OLS and LSDV).
Ho: There is no fixed effect
Ha: There is a fixed effect
Command: testparm i.cty
Table 3.1 Result of testparm
F (38,154) = 8.80
Prob > F = 0.0000
(STATA 14.1)
As Prob > F = 0.0000 < α = 0.05, we rejected Ho and accepted Ha, which means FEM is preferred to
Pooled OLS.
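The testparm statistic is an F test of the restricted model (Pooled OLS) against the unrestricted one (LSDV), so it can be approximated from the two R-squared values already reported. A minimal sketch, with a hypothetical function name and the degrees of freedom taken from Table 3.1:

```python
def restriction_f(r2_unrestricted, r2_restricted, n_restrictions, df_resid):
    """F test of the restricted model (Pooled OLS) against LSDV."""
    numerator = (r2_unrestricted - r2_restricted) / n_restrictions
    denominator = (1 - r2_unrestricted) / df_resid
    return numerator / denominator

# R-squared values from the Pooled OLS output (0.3595) and Table 2.5 (0.7980);
# degrees of freedom from Table 3.1
f_stat = restriction_f(0.7980, 0.3595, n_restrictions=38, df_resid=154)
print(round(f_stat, 2))
```

The result agrees with the F(38,154) = 8.80 shown in Table 3.1, up to the rounding of the R-squared values.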
3.2 Pooled OLS or Random effect model
To compare Pooled OLS and REM, we use the Breusch and Pagan Lagrangian multiplier test.
Ho: There is no random effect
Ha: There is a random effect
Command: xttest0
Table 3.2 Result of xttest0
chibar2 = 107.22
Prob > chibar2 = 0.0000
(STATA 14.1)
As Prob > chibar2 = 0.0000 < α = 0.05, we rejected Ho and accepted Ha, which means REM is
preferred to Pooled OLS.
3.3 Fixed effect model or Random effect model
The choice between FEM and REM depends on the assumption we make about the possible correlation
between the enterprise-specific error component εi and the independent variables.
If εi and the explanatory variables are not correlated, then the REM model is more suitable;
if they are correlated, the FEM model is more appropriate.
We use the Hausman test to check whether εi is correlated with the explanatory variables.
Ho: difference in coefficients not systematic
Ha: difference in coefficients is systematic
Command: est store fe (after doing the fixed effect model)
est store re (after doing the random effect model)
hausman fe re
or hausman fe re, sigmamore (if V_b-V_B is not positive definite)
Table 3.3 Hausman test
(b) (B) (b-B) sqrt(diag(V_b-V_B))
fe re difference S.E.
roa -0.0360356 -0.0654263 0.0293907 0.0083128
tobinq 0.0423695 0.042134 0.0002355 0.0103881
grow 0.022698 0.0292957 -0.0065977 0.0038451
tangible 0.1764481 -0.0032733 0.1797215 0.0731745
tax -0.1261964 -0.2372918 0.1110954 0.0530703
liquidity -0.0727456 -0.0975874 0.0248418 0.0180366
(STATA 14.1)
chi2 = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 16.62
Prob>chi2 = 0.0108
Based on the Hausman test, Prob > chi2 = 0.0108 < α = 0.05, so we reject hypothesis Ho and accept
Ha, which means the difference in coefficients is systematic, i.e. εi is correlated with the
independent variables. Thus the FEM is suitable for this study.
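The p-value of the Hausman test follows from the chi-squared distribution with as many degrees of freedom as coefficients compared, here 6. For an even number of degrees of freedom the upper-tail probability has a simple closed form, which the sketch below uses; the function name is hypothetical and the closed form does not apply to odd degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df.

    Uses the closed form exp(-x/2) * sum_{j<df/2} (x/2)^j / j!.
    """
    if df % 2 != 0:
        raise ValueError("closed form shown here requires even df")
    half = x / 2.0
    term = math.exp(-half)      # j = 0 term
    total = term
    for j in range(1, df // 2):
        term *= half / j        # builds (x/2)^j / j! iteratively
        total += term
    return total

# Hausman statistic from Table 3.3, with 6 compared coefficients
p_value = chi2_sf_even_df(16.62, df=6)
print(round(p_value, 4))
```

The computed value matches the reported Prob>chi2 = 0.0108.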
4. TESTS FOR CHOSEN MODEL
4.1 Heteroskedasticity in FEM
We used the Modified Wald test for groupwise heteroskedasticity in fixed effect regression model.
Ho: sigma(i)^2 = sigma^2 for all i – there is no heteroskedasticity
Ha: there is heteroskedasticity
Command: xttest3
Table 4.1 Result of xttest3
chi2 = 34948.00
Prob > chi2 = 0.0000
(STATA 14.1)
We obtained Prob > chi2 = 0.0000 < 0.05. Thus, we rejected Ho and accepted the hypothesis Ha,
which means there is heteroskedasticity.
4.2 Autocorrelation test
Wooldridge test for autocorrelation in panel data
Ho: no first-order autocorrelation
Ha: there is first-order autocorrelation
Command: xtserial TDTA roa tobinq grow tangible tax liquidity state
Table 4.2 Result of autocorrelation test (STATA 14.1)
F (1,49) = 20.202
Prob > F = 0.0001
Prob > F = 0.0001 < 0.05, so we concluded that there is first-order autocorrelation.
4.3 GLS to fix heteroskedasticity and autocorrelation
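After conducting the tests, we could see that the regression model suffers from both
autocorrelation and heteroskedasticity. We address these problems by re-estimating the FEM with
robust standard errors, stored and reported here under the label GLS. This command leaves the FEM
coefficients unchanged and corrects only the standard errors, t-statistics, and confidence
intervals.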
Command: xtreg TDTA roa tobinq grow tangible tax liquidity state, fe robust
est store fgls
Table 4.3 GLS regression model
TDTA Coef Robust Std. Err. t P>t [95% Conf. Interval]
roa -0.0360356 0.0147454 -2.44 0.019 -0.065861 -0.0062102
tobinq 0.0423695 0.0173665 2.44 0.019 0.0072424 0.0774966
grow 0.022698 0.014061 1.61 0.115 -0.0057429 0.051139
tangible 0.1764481 0.3012644 0.59 0.561 -0.4329166 0.7858129
tax -0.1261964 0.0995534 -1.27 0.212 -0.3275622 0.0751694
liquidity -0.0727456 0.0334044 -2.18 0.036 -0.1403123 -0.0051789
_cons 0.3838812 0.1089805 3.52 0.001 0.1634474 0.604315
(STATA 14.1)
The roa variable has a negative effect on the dependent variable TDTA and is statistically
significant at the 5% level. When roa increases by 1 unit, TDTA decreases by 0.0360356 units, if
other factors are kept constant.
The tobinq variable has a positive effect on TDTA and is statistically significant at the 5%
level. When tobinq increases by 1 unit, TDTA increases by 0.0423695 units if other factors are kept
constant.
The liquidity variable has a negative effect on TDTA and is statistically significant at the 5%
level. When liquidity increases by 1 unit, TDTA decreases by 0.0727456 units if other factors are
kept constant.
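The t-statistic and confidence interval for roa in Table 4.3 can be rebuilt from the coefficient and its robust standard error. The sketch assumes that the robust standard errors are clustered on the 40 enterprises, so that the reference distribution is t with 39 degrees of freedom; the critical value 2.0227 is taken from a t table and is part of that assumption.

```python
# Robust FEM estimate for roa from Table 4.3
coef, robust_se = -0.0360356, 0.0147454

# Assumption: 40 firm clusters, so the reference distribution is t with 39
# degrees of freedom; 2.0227 is its 97.5% quantile taken from a t table
t_crit = 2.0227

t_stat = coef / robust_se
ci_low = coef - t_crit * robust_se
ci_high = coef + t_crit * robust_se

print(round(t_stat, 2))
print(round(ci_low, 4), round(ci_high, 4))
```

The interval reproduces the one printed in Table 4.3, which is consistent with the coefficients being those of the FEM and only the standard errors having changed.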
4.4 Compare results
Command: esttab pool fe re fgls, r2 star(* 0.1 ** 0.05 *** 0.01) brackets nogap compress
Table 4.4 Results of 4 models with significance levels 10%, 5%, 1%
Pool OLS FEM REM GLS
roa -0.118*** -0.0360** -0.0654*** -0.0360**
[-6.67] [-2.03] [-4.02] [-2.44]
tobinq 0.0587*** 0.0424* 0.0421** 0.0424**
[2.84] [1.89] [2.05] [2.44]
grow 0.0374 0.0227 0.0293 0.0227
[1.27] [1.13] [1.44] [1.61]
tangible -0.161** 0.176 -0.00327 0.176
[-2.10] [1.56] [-0.04] [0.59]
tax -0.313* -0.126 -0.237* -0.126
[-1.85] [-0.88] [-1.72] [-1.27]
liquidity -0.127*** -0.0727** -0.0976*** -0.0727**
[-3.96] [-2.07] [-3.10] [-2.18]
state 0.0292 . 0.0509 .
[0.81] [0.76]
_cons 0.318*** 0.384*** 0.390*** 0.384***
[4.39] [6.14] [6.01] [3.52]
(STATA 14.1)
CONCLUSION
1. CONCLUSION FROM FEM
Choosing an optimal capital structure is one of the most important parts of a corporate capital
management strategy. Therefore, studying the theoretical basis of capital structure and applying it
to plan the optimal structure is a key task for managers.
In this study, after collecting the sample and testing it, the following results were obtained:
- The regression results show that profitability (roa), the corporate income tax rate (tax), and
liquidity have negative coefficients on capital structure, represented by the debt-to-asset ratio
TDTA. In the FEM with robust standard errors, roa and liquidity are significant at the 5% level,
while tax is not.
- Meanwhile, the growth opportunities (tobinq), the share of tangible assets (tangible), and state
ownership have positive coefficients. In the same model only tobinq is significant at the 5% level;
state is dropped from the FEM and is estimated only in the Pooled OLS and REM.
Therefore, we accepted hypothesis H3 and rejected H1 and H2 from the hypotheses at the
beginning: some factors are positively related and some are negatively related to capital
structure.
2. LIMITATION
The results of this study can be useful to credit institutions in Vietnam, but due to limited
capacity and limited time, the study still has several limitations:
- The data set is quite small, and the data collected from the financial statements of
enterprises and the financial indicators published on listing websites are sometimes incomplete.
- This study is based on internal factors and ignores other factors that can affect the ratio of
debt to total assets of the enterprise, such as the behavior of managers and operating policies.
- Some of the chosen factors may not be core determinants of capital structure. Other internal
or macroeconomic variables should be added to increase the explanatory power of the model.
PICTURE OF RESULTS FROM STATA
1. Summary data
2. Descriptive statistics
3. Correlation coefficient matrix

4. Post-estimation test
5. Unit roots test
6. Pool OLS model

7. Heteroskedasticity test for OLS
8. Fixed effect (first difference) regression model

9. LSDV test
10. Random effect model

11. Between effect model
12. Testparm
13. Breusch and Pagan Lagrangian multiplier test
14. Hausman test
15. Heteroskedasticity in FEM

16. Autocorrelation test
17. GLS regression model
18. Compare results

PART 2: R - HANG SENG INDEX WITH ARIMA MODEL
1. RESEARCH DATA
To test and evaluate the model, the author uses daily historical data of the Hang Seng Index
over a period of 3 years, from May 26th, 2020 to May 25th, 2023, instead of 52 weeks of weekly
data, so that appropriate comparisons can be made.
The author uses the closing price for forecasting and testing.
2. RESEARCH RESULT
2.1 Basic information
The first step in model selection is to define some basic properties of the time series data to be used.
Command: summary(price)
Table 5.1 Descriptive statistics
Min. 1st Qu. Median Mean 3rd Qu. Max.
14687 20533 24181 23590 26197 31085
(Rstudio)
2.2 Test for univariate time series stationarity and determine (d)
To make a graph for univariate time series, use
Command: plot(price,main="Price")
Graph 1.1 Hang Seng Index time series
(Rstudio)
The graph shows that this time series does not appear to be stationary, and the index fluctuates
over a wide range during this period. To be more certain, the author uses the Augmented
Dickey-Fuller test, which also determines the order of differencing 𝑑, by testing the original time
series and the 1st-order difference of its logarithm.
Ho: The time series is not stationary.
Ha: The time series is stationary.
Command: dp <- diff(log(price))
adf.test(price)
adf.test(dp)
Table 5.2 Augmented Dickey-Fuller test result
Data                     Type                    Lag   ADF      p-value
price                    No drift no trend       6     -0.740   0.414
price                    With drift no trend     6     -0.858   0.752
price                    With drift with trend   6     -2.36    0.426
dp (diff(log(price)))    No drift no trend       6     -10.6    0.01
dp (diff(log(price)))    With drift no trend     6     -10.6    0.01
dp (diff(log(price)))    With drift with trend   6     -10.6    0.01
(Rstudio)
At the 5% level of significance, the author concludes that the 1st-order difference is a
stationary time series under all three specifications, with p-value = 0.01 < 0.05. Therefore, the
author sets the coefficient 𝑑 = 1 in the ARIMA model.
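The series dp tested above is the first difference of the log price, i.e. the daily log return. A minimal Python sketch of that transformation, using hypothetical closing prices rather than the actual Hang Seng data:

```python
import math

def log_returns(prices):
    """First difference of log prices: ln(p_t) - ln(p_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

# Hypothetical closing prices, not the actual Hang Seng data
prices = [23000.0, 23230.0, 22997.7, 23100.0]
dp = log_returns(prices)

print(len(prices), len(dp))          # one observation is lost per difference
print([round(r, 4) for r in dp])
```

Differencing once removes the slowly moving level of the index and leaves a series that fluctuates around zero, which is what the ADF test then finds to be stationary.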
2.3 Determine optimal lags p, q
The author uses the ACF and PACF correlograms combined with the AIC criterion to determine p and
q. In this step, the author also uses the logarithm of the price data.
Command: lnp <- log(price)
par(mfrow=c(3,1),mar=c(3,3,3,3))
plot(lnp,main="Log of Price")
acf(lnp,ylab='',main="ACF of price",ylim=c(-1,1))
pacf(lnp,ylab='',main="PACF of price",ylim=c(-1,1))
Graph 2.1 ACF and PACF of log price
(Rstudio)
Based on the ACF diagram, the author found that lnp is correlated at the 2nd and 3rd-order lags.
Therefore, the values 𝑞 = 3 and 𝑞 = 5 are considered for the model. In the PACF diagram, there is a
partial correlation at the 2nd and 3rd-order lags, so the values 𝑝 = 1 and 𝑝 = 2 are considered for
selection.
Therefore, to select the model, the author uses
Command: auto.arima(lnp, seasonal = FALSE, approximation = FALSE, trace = TRUE)
Table 6.1 auto arima test result
Best model: ARIMA(2,1,3)
Series: lnp
ARIMA(2,1,3)
sigma^2 = 0.00024 log likelihood = 2033.79
AIC = - 4055.58 AICc = - 4055.47 BIC = - 4027.95
(Rstudio)
Thus, the author decided to choose the ARIMA(2,1,3) model for the next testing and forecasting
steps.
2.4 Estimating the selected model:
Command: arima(lnp, order=c(2,1,3),include.mean=F,method="ML")
Arima(lnp, order=c(2,1,3),include.constant=F,method="ML")
Table 7.1 Results of regression model ARIMA(2,1,3)
AR(1) AR(2) MA(1) MA(2) MA(3)
Coefficient -0.9066 -0.7879 0.9285 0.8056 -0.0722
Standard error 0.0743 0.0858 0.0840 0.1002 0.0471
sigma^2 = 0.00024: log likelihood = 2033.79
AIC=-4055.58 AICc=-4055.47 BIC=-4027.95
(Rstudio)
The AIC statistic is the smallest among all the models considered, so ARIMA(2,1,3) is the appropriate one.
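The information criteria in Table 7.1 follow directly from the log likelihood. The sketch below assumes k = 6 estimated parameters (2 AR terms, 3 MA terms, and the innovation variance) and backs out the effective sample size from the reported BIC, since the number of observations is not stated in the text.

```python
import math

log_lik = 2033.79          # log likelihood from Table 7.1
k = 6                      # 2 AR + 3 MA coefficients + the innovation variance

aic = -2 * log_lik + 2 * k

# Back out the effective sample size from the reported BIC = -4027.95,
# using BIC = -2 * logL + k * ln(n)
n = round(math.exp((-4027.95 + 2 * log_lik) / k))

aicc = aic + 2 * k * (k + 1) / (n - k - 1)

print(round(aic, 2), n, round(aicc, 2))
```

The implied sample size is in line with roughly three years of daily observations after one difference, and the recomputed AIC and AICc agree with the reported values.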
2.5 Test for residual
To check whether this is a good model, the author tests the residuals of the estimated model.
First, the residuals are tested for autocorrelation with the Ljung-Box test, using a total of 10
lags and the following hypothesis.
Ho: The residuals are not autocorrelated
Command: arima1 <- arima(lnp, order=c(2,1,3),include.mean=F,method="ML")
checkresiduals(arima1)
Table 8.1 Results of Ljung-box test
Ljung-Box test
data: Residuals from ARIMA(2,1,3)
Q* = 5.6681, df = 5, p-value = 0.3399
(Rstudio)
The p-value = 0.3399 > 0.05 indicates that the residuals are not autocorrelated, which is
consistent with white noise.
Check the variance of the residuals by the ARCH heteroscedasticity test.
Ho: no ARCH effects
Command: arch.test(arima1)
ArchTest(arima1$residuals,lags=360)
Table 8.2 Results of ARCH heteroscedasticity test
ARCH LM-test
data: arima1$residuals
Chi-squared = 333.14, df = 360, p-value = 0.8418
(Rstudio)
The p-value = 0.8418 > 0.05 indicates that there is no ARCH effect, which means that the variance
of the residuals is constant.
Check the normal distribution of the residuals with the Shapiro-Wilk normality test.
Ho: the residuals are normally distributed
Command: shapiro.test(resid(arima1))
Table 8.3 Results of Shapiro-Wilk normality test
Shapiro-Wilk normality test
Data: resid(arima1)
W = 0.96857, p-value = 1.629e-11
(Rstudio)
The p-value = 1.629e-11 < 0.05 means that the residuals are not normally distributed.
2.6 Forecasting and model performance evaluation
Here, the author forecasts the next 10 days. The forecast results and graph are presented below;
Table 8.4 reports the first 8 leads, on the log scale:
Command: pred <-forecast(arima1, lead=10)
pred1 <- predict(arima1, n.ahead=10)
pred2 <- predict(arima1, h=10)
Table 8.4 Results for forecasting
Lead Forecast Standard error Lower 95% Upper 95% Reality Difference
1 9.84 0.0154 9.81 9.87 9.80 0.04
2 9.84 0.0221 9.80 9.88 9.82 0.02
3 9.84 0.0271 9.79 9.89 9.70 0.14
4 9.84 0.0307 9.78 9.90 9.76 0.08
5 9.84 0.0344 9.77 9.91 9.75 0.09
6 9.84 0.0378 9.77 9.92 9.67 0.17
7 9.84 0.0405 9.76 9.93 9.70 0.14
8 9.84 0.0434 9.75 9.94 9.75 0.09
(Rstudio)
Graph 3.1 Predicted value for next 10 periods
(Rstudio)
After obtaining the forecast results, the author has some comments:
(1) Due to the nature of forecasting, the confidence interval widens for more distant
forecast points. This reduces the precision of the forecast.
(2) The actual Hang Seng Index values for the following days fall mostly outside the 95%
confidence interval of the forecast: in the forecast table, only the values at leads 2 and 8 lie
within it, the latter exactly on the lower bound. Thus, in the author's opinion, ARIMA is not
a good model to forecast the index in an environment where price shocks may occur.
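How often the realised values fall inside the 95% interval can be counted directly from the forecast table. The sketch copies the lower bound, upper bound, and realised value for each lead as printed (on the log scale, rounded to two decimals) and treats the bounds as inclusive, which is a choice of this sketch.

```python
# (lead, lower 95%, upper 95%, reality) copied from the forecast table
rows = [
    (1, 9.81, 9.87, 9.80), (2, 9.80, 9.88, 9.82),
    (3, 9.79, 9.89, 9.70), (4, 9.78, 9.90, 9.76),
    (5, 9.77, 9.91, 9.75), (6, 9.77, 9.92, 9.67),
    (7, 9.76, 9.93, 9.70), (8, 9.75, 9.94, 9.75),
]

# Leads whose realised log index lies inside the interval (bounds inclusive)
inside = [lead for lead, low, high, actual in rows if low <= actual <= high]
coverage = len(inside) / len(rows)

print(inside, coverage)
```

Two of the eight realised values lie inside the interval, far below the nominal 95%, and every realised value is below the point forecast, which points to a downward shock that the model did not anticipate.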
RESULTS FROM R
1. Summary data
2. Hang Seng – index
3. Augmented Dickey-Fuller with original data and 1st-order difference

4. Results for the best ARIMA model
5. Estimating the selected model

6. Ljung-Box test
7. Arch test
8. Shapiro test
9. Forecasting
