Regression Problem Set
It is believed that caste plays an important role in Indian elections. Data were collected from the
2019 elections in 37 parliamentary constituencies of Tamil Nadu for the political party called
A2ZMK (A-to-Z Munnetra Kazhagam), along with the number of people belonging to the A-to-Z (A2Z)
caste in each of the 37 constituencies. Note that, depending on the candidate, the party also gets
a large number of votes from people belonging to other castes. The data description is provided in
Table 1.1 and descriptive statistics are shown in Table 1.2.
A simple linear regression model was developed with NVP as the dependent variable and A2Z as the
independent variable. The regression outputs are shown in Tables 1.3 and 1.4.
(b) A constituency has 50,000 voters belonging to the A2Z caste. Predict the votes polled for
A2ZMK in this constituency.
(c) What proportion of the variation in votes polled for A2ZMK is explained by the number
of people belonging to the A2Z caste?
(d) The President of the A2ZMK party believes that at least 40% of the people belonging to
the A2Z caste vote for them. Check whether the President’s claim is true at a 10%
significance level. Clearly write all the steps.
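The calculations behind parts (b) to (d) can be sketched as below. This is only an illustration
of the mechanics, not a worked solution: Tables 1.3 and 1.4 are not reproduced here, so every
number in the sketch (B0, B1, SE_B1, SSR, SST) is a made-up placeholder to be replaced with the
values from the regression output. The sketch also assumes that the President's claim is tested
on the slope of A2Z, with H0: slope >= 0.40 against H1: slope < 0.40 (a left-tailed t-test with
n - 2 = 35 degrees of freedom), which is one common way to set the problem up.

```python
import math


def predict_votes(intercept, slope, a2z_voters):
    """Point prediction from a simple linear regression: NVP = b0 + b1 * A2Z."""
    return intercept + slope * a2z_voters


def r_squared(ss_regression, ss_total):
    """Proportion of variation in NVP explained by the model (SSR / SST)."""
    return ss_regression / ss_total


def slope_t_statistic(slope, std_error, hypothesized=0.40):
    """t-statistic for testing the slope against a hypothesized value."""
    return (slope - hypothesized) / std_error


# PLACEHOLDER inputs, NOT taken from Tables 1.3 and 1.4 (hypothetical values).
B0 = 10_000.0   # intercept
B1 = 0.35       # slope of A2Z
SE_B1 = 0.05    # standard error of the slope
SSR = 6.0e9     # regression sum of squares
SST = 1.0e10    # total sum of squares

# Approximate one-tailed critical value t(alpha = 0.10, df = 37 - 2 = 35).
T_CRIT = 1.306

predicted = predict_votes(B0, B1, 50_000)   # part (b)
r2 = r_squared(SSR, SST)                    # part (c)
t_stat = slope_t_statistic(B1, SE_B1)       # part (d)

# Left-tailed test: reject H0 (slope >= 0.40) only if t falls below -T_CRIT.
reject_h0 = t_stat < -T_CRIT

print(f"predicted votes: {predicted:.0f}")
print(f"R-squared: {r2:.3f}")
print(f"t-statistic: {t_stat:.3f}, reject H0: {reject_h0}")
```

With the actual coefficients, the same three functions give the prediction, the R-squared value,
and the test statistic; only the final comparison against the critical value decides part (d).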
The normal P-P plot and residual plots are shown in Figures 1.1 and 1.2.
(e) Based on the P-P plot (Figure 1.1) and the residual plot (Figure 1.2), comment on the validity
of the model shown in Tables 1.3 and 1.4. Clearly identify any potential problems with the
model.
A second model is developed between ln(NVP) and ln(A2Z). The model outputs are provided in
Tables 1.5 and 1.6 and Figures 1.3 and 1.4.
(f) Coimbatore has 50,000 voters belonging to the A2Z caste. What is the predicted number of votes
polled for A2ZMK in Coimbatore using Model 2?
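Because Model 2 is fitted on logarithms, its prediction has to be converted back to the original
scale. A minimal sketch of that step follows, assuming the model has the form
ln(NVP) = b0 + b1 * ln(A2Z). The coefficients B0_LN and B1_LN are hypothetical placeholders, not
the values in Tables 1.5 and 1.6, and the function name is chosen only for illustration.

```python
import math


def predict_votes_loglog(intercept, slope, a2z_voters):
    """Prediction from a log-log model, back-transformed to the vote scale.

    Assumed model form: ln(NVP) = intercept + slope * ln(A2Z).
    """
    ln_nvp = intercept + slope * math.log(a2z_voters)
    return math.exp(ln_nvp)


# PLACEHOLDER coefficients, NOT taken from Tables 1.5 and 1.6 (hypothetical values).
B0_LN = 2.0
B1_LN = 0.8

votes = predict_votes_loglog(B0_LN, B1_LN, 50_000)
print(f"predicted votes under Model 2: {votes:.0f}")
```

Note that simply exponentiating the fitted value of ln(NVP) tends to understate the mean of NVP;
whether a correction for this is expected depends on the conventions of the course.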