Assignment - DMBA103 StAT
1. Mr. Vijay, a retired government servant, is considering investing his money in two
proposals. He wants to choose the one that has the higher average net present value and the lower
standard deviation. The relevant data are given below. Can you help him choose the
proposal?
NPV (Rs.)    Probability
-10050       0.30
5812         0.40
20584        0.30
Answer:
To suggest a proposal to Mr. Vijay, first calculate the expected (average) net
present value for both proposals.
Since the expected NPV is the same in both cases, he would choose the less
risky proposal. For this we have to calculate the standard deviation in both
cases.
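\[ S_A = \sqrt{\frac{\sum f (X - \bar{X})^2}{\sum f}} = \sqrt{87{,}21{,}404.4} = \text{Rs. } 2953.20 \]
\[ S_B = \sqrt{\frac{\sum f (X - \bar{X})^2}{\sum f}} = \sqrt{14{,}08{,}37{,}579} = \text{Rs. } 11{,}867.50 \]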
Decision: SA is much smaller than SB, so the net present value of proposal A is
more uniform, i.e. proposal A is less risky. Thus proposal A may be chosen.
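As a check on the arithmetic, the expected NPV and standard deviation can be recomputed for the one distribution whose values are listed in the data above. This is a minimal sketch; the variable names are illustrative, and the result agrees with the Rs. 11,867.50 reported for SB.

```python
import math

# NPV outcomes (Rs.) and their probabilities, as listed in the data above.
npv = [-10050, 5812, 20584]
prob = [0.30, 0.40, 0.30]

# Expected (average) NPV: sum of f * X divided by sum of f.
expected_npv = sum(p * x for x, p in zip(npv, prob)) / sum(prob)

# Variance: sum of f * (X - mean)^2 divided by sum of f.
variance = sum(p * (x - expected_npv) ** 2 for x, p in zip(npv, prob)) / sum(prob)
std_dev = math.sqrt(variance)

print(f"Expected NPV = Rs. {expected_npv:.2f}")
print(f"Standard deviation = Rs. {std_dev:.2f}")
```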
Q2
Answer:
E = a defective item
P (Ai / E) is the probability that the item was produced by the ith plant, given that the
item is defective.
P (Ai ∩ E) is the probability that an item is produced by the ith plant and is
defective.
Prior Probabilities-
\[ P(A_1) = \frac{500}{500 + 1000 + 2000} = \frac{1}{7} \]
\[ P(A_2) = \frac{1000}{3500} = \frac{2}{7} \]
\[ P(A_3) = \frac{2000}{3500} = \frac{4}{7} \]
Joint Probabilities
Posterior Probabilities
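\[ P(A_1 / E) = \frac{P(A_1 \cap E)}{P(E)} = \frac{0.005/7}{0.061/7} = \frac{5}{61} \]
\[ P(A_2 / E) = \frac{P(A_2 \cap E)}{P(E)} = \frac{0.016/7}{0.061/7} = \frac{16}{61} \]
\[ P(A_3 / E) = \frac{P(A_3 \cap E)}{P(E)} = \frac{0.040/7}{0.061/7} = \frac{40}{61} \]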
A. Since P (A3/E) is the largest of the posterior probabilities, the defective item
most probably came from the output of the third plant.
B. The probability that the defective item came from the first plant is 5/61.
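The posterior probabilities can be verified with exact fractions. The sketch below starts from the joint probabilities used in the working above (0.005/7, 0.016/7 and 0.040/7); the dictionary keys and variable names are illustrative.

```python
from fractions import Fraction

# Joint probabilities P(Ai ∩ E) taken from the working above:
# 0.005/7, 0.016/7 and 0.040/7, written as exact fractions.
joint = {
    "A1": Fraction(5, 7000),
    "A2": Fraction(16, 7000),
    "A3": Fraction(40, 7000),
}

# Total probability of a defective item, P(E).
p_e = sum(joint.values())

# Bayes' theorem: P(Ai / E) = P(Ai ∩ E) / P(E).
posterior = {plant: p / p_e for plant, p in joint.items()}
most_likely = max(posterior, key=posterior.get)

for plant, p in posterior.items():
    print(plant, p)
print("Most probable source:", most_likely)
```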
Probability Sampling
Probability sampling refers to the selection of a sample from a population, when this selection is
based on the principle of randomization, that is, random selection or chance.
Because units from the population are randomly selected and each unit's selection
probability can be calculated, reliable estimates can be produced and statistical inferences can be
made about the population.
Non-Probability Sampling
Non-probability sampling has the following limitations:
1. An unknown proportion of the entire population is not included in the sample group, i.e. the
sample lacks representation of the entire population
2. Lower level of generalization of research findings compared to probability sampling
3. Difficulties in estimating sampling variability and identifying possible bias
Systematic sampling is a type of probability sampling method in which sample members from a
larger population are selected according to a random starting point but with a fixed, periodic
interval.
This interval, called the sampling interval, is calculated by dividing the population size by the
desired sample size.
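The procedure can be sketched in a few lines of Python. This is an illustration only: the function name `systematic_sample` and the population of 100 numbered units are assumptions, not part of the assignment.

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Pick every k-th unit after a random start, where k = N // n."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    interval = len(population) // sample_size  # sampling interval k
    if interval < 1:
        raise ValueError("sample_size cannot exceed the population size")
    start = rng.randrange(interval)  # random starting point within the first k units
    return population[start::interval][:sample_size]

# Hypothetical population of 100 numbered units, sample of 10 (interval = 10).
units = list(range(1, 101))
sample = systematic_sample(units, 10, random.Random(42))
print(sample)
```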
Cluster Sampling:
In cluster sampling, researchers divide a population into smaller groups known as clusters. They
then randomly select among these clusters to form a sample. Cluster sampling is a method of
probability sampling that is often used to study large populations, particularly those that are
widely geographically dispersed.
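A minimal sketch of one-stage cluster sampling follows, in which every unit of each randomly chosen cluster enters the sample. The function name `cluster_sample` and the village data are hypothetical and serve only as an illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly choose whole clusters and return all units inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical population grouped into four geographic clusters.
villages = {
    "V1": ["a1", "a2", "a3"],
    "V2": ["b1", "b2"],
    "V3": ["c1", "c2", "c3", "c4"],
    "V4": ["d1", "d2"],
}
chosen, sample_units = cluster_sample(villages, 2, random.Random(7))
print(chosen, sample_units)
```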
Ans. Generally, non-probability sampling is less rigorous, because the selection process is
subjective and prone to bias. It is typically used to generate a hypothesis. Conversely,
probability sampling is more precise, objective and unbiased, which makes it a good fit for
testing a hypothesis.
When the population is large, applying the census method would be difficult.
Information from the sampled units is used to estimate the characteristics for the entire population of
interest.
Secondary data
Secondary data is existing data generated by large government institutions, healthcare
facilities, etc. as part of organizational record keeping. The data is then extracted from more
varied data files. In published form, secondary data is available in research papers, newspapers,
magazines, government publications, international publications, and websites.
SET-II
4. A report says that 80% of India’s females aged 15-59 are not currently
engaged in the workforce. A national agency has an opinion that this
percentage may be even more. To validate its opinion, the agency did a survey
of randomly chosen 1200 females of the age group 15-59 from the different
parts of the country and found 228 females working. Do the figures of the
survey help the agency in validation of its opinion?
Answer:
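According to the report, 80% of females are not working, i.e. 20% are working.
So, in a sample of 1200 females, the expected number of working females is 20% of
1200, i.e.
1200 × 20% = 240 females.
In the agency's survey, 228 of the 1200 females are working, which is 19%, so 81%
are not working.
To decide whether 81% is significantly more than 80%, test H0: p = 0.80 against
H1: p > 0.80, where p is the proportion of non-working females:
\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}} = \frac{0.81 - 0.80}{\sqrt{0.80 \times 0.20 / 1200}} = \frac{0.01}{0.0115} \approx 0.87 \]
Taking the conventional 5% level of significance, the one-tailed critical value is 1.645.
Since 0.87 < 1.645, H0 cannot be rejected.
Therefore, although the sample percentage of non-working females (81%) is higher
than 80%, the difference is not statistically significant, and the survey figures do not
validate the agency's opinion.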
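Whether a sample figure of 81% differs significantly from the reported 80% can be checked with a one-sample z-test for a proportion. The sketch below is one way to do it; the 5% significance level is an assumption, since the question does not state one, and the variable names are illustrative.

```python
import math

n = 1200        # females surveyed
working = 228   # females found working
p0 = 0.80       # reported proportion not in the workforce (H0: p = 0.80)
alpha = 0.05    # assumed significance level; not given in the question

p_hat = (n - working) / n                  # sample proportion not working
std_error = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
z = (p_hat - p0) / std_error

# One-tailed p-value for H1: p > 0.80, from the standard normal distribution.
p_value = 0.5 * math.erfc(z / math.sqrt(2))
reject_h0 = p_value < alpha

print(f"p_hat = {p_hat:.2f}, z = {z:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```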
Q5. What is regression analysis? Explain the steps of performing regression
analysis in detail.
The following data shows the yearly sales (in million Rs.) of A2Z- corporation
for the last nine years. Develop a regression model.
Year 2011 2012 2013 2014 2015 2016 2017 2018 2019
Sale 2.3 5.3 5.1 3.5 3.4 2.7 2.8 4.1 2.9
Answer:
Regression analysis is a set of statistical methods used for the estimation of relationships
between a dependent variable and one or more independent variables. It can be utilized to assess
the strength of the relationship between variables and for modeling the future relationship
between them.
In order to understand regression analysis fully, it’s essential to comprehend the following terms:
• Dependent Variable: This is the main factor that you’re trying to understand or predict.
• Independent Variables: These are the factors that you hypothesize have an impact on your
dependent variable.
Y = a + bX
a and b in the above equation are parameters; they remain constant as X and Y change.
Y is the dependent variable (here, sales), X is the independent variable (here, the year),
'a' is the intercept and 'b' is the slope.
By determining the values of a and b we can calculate the value of Y for a given value of
X.
Regression analysis is a predictive modelling technique used to analyse cause and effect. The
steps in performing regression analysis are:
1. Decide on purpose of model and appropriate dependent variable to meet that purpose.
2. Decide on independent variables.
3. Estimate parameters of regression equation.
4. Interpret estimated parameters, goodness of fit and qualitative and quantitative assessment of
parameters.
5. Assess appropriateness of assumptions.
6. If some assumptions are not satisfied, modify and revise estimated equation.
7. Validate estimated regression equation.
Regression Model (X = Year - 2011):
Year    X    Sale (Y)   X²    Y²       XY      (X-X̅)   (y-ӯ)   (X-X̅)*(y-ӯ)   (X-X̅)²
2011    0    2.3        0     5.29     0       -4      -1.27    5.07          16
2012    1    5.3        1     28.09    5.3     -3       1.73   -5.20           9
2013    2    5.1        4     26.01    10.2    -2       1.53   -3.07           4
2014    3    3.5        9     12.25    10.5    -1      -0.07    0.07           1
2015    4    3.4        16    11.56    13.6     0      -0.17    0.00           0
2016    5    2.7        25    7.29     13.5     1      -0.87   -0.87           1
2017    6    2.8        36    7.84     16.8     2      -0.77   -1.53           4
2018    7    4.1        49    16.81    28.7     3       0.53    1.60           9
2019    8    2.9        64    8.41     23.2     4      -0.67   -2.67          16
Total   36   32.1       204   123.55   121.8    0      -0.03   -6.60          60
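\[ \bar{X} = \frac{\sum X}{N} = \frac{36}{9} = 4, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{32.1}{9} = 3.57 \]
Regression equation: \( \hat{Y} = a + bX \)
\[ b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{-6.60}{60} = -0.11 \]
\[ a = \bar{Y} - b\bar{X} = 3.57 - (-0.11)(4) = 4.01 \]
Hence the fitted regression model is \( \hat{Y} = 4.01 - 0.11X \), where X = Year - 2011.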
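The slope and intercept can be recomputed directly from the sales data in the question. This is a minimal least-squares sketch using the same coding X = Year - 2011; the helper `predict` is illustrative and not part of the assignment.

```python
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019]
sales = [2.3, 5.3, 5.1, 3.5, 3.4, 2.7, 2.8, 4.1, 2.9]  # million Rs.

# Code the years as X = Year - 2011, as in the table.
x = [year - 2011 for year in years]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(sales) / n

# Sums of cross-products and squared deviations.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, sales))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b = s_xy / s_xx          # slope
a = y_bar - b * x_bar    # intercept

def predict(year):
    """Estimated sales (million Rs.) for a given calendar year."""
    return a + b * (year - 2011)

print(f"Y-hat = {a:.2f} + ({b:.2f}) X")
```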
Answer:
Here we apply two-way classification ANOVA, as we have two factors: one is consignment
and the other is observer.
Step 1: Null hypotheses. H01: There is no difference between observers (rows).
H02: There is no difference between consignments (columns).
Step 2: Test statistic (calculating the F ratio). Since the given values are large, subtract 10
from every value to simplify the calculation.
                   Consignment
Observer    1    2    3    4    5    6    Total
1          -1    0   -1    0    1    1    0
2           2    1   -1    1    0    0    3
3           1    0    0    2    1    0    4
4           2    1    1    4    2    0    10
Total       4    2   -1    7    4    1    GT = 17
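Correction factor (C.F.):
\[ \text{C.F.} = \frac{17^2}{24} = \frac{289}{24} = 12.04 \]
Total S.S.:
\[ (-1)^2 + (2)^2 + (1)^2 + (2)^2 + \dots + (1)^2 + (0)^2 + (0)^2 + (0)^2 - \text{C.F.} = 43 - 12.04 = 30.96 \]
S.S. between consignments:
\[ \frac{4^2 + 2^2 + (-1)^2 + 7^2 + 4^2 + 1^2}{4} - 12.04 = 21.75 - 12.04 = 9.71 \]
S.S. between observers:
\[ \frac{0^2 + 3^2 + 4^2 + 10^2}{6} - 12.04 = 20.83 - 12.04 = 8.79 \]
S.S. due to error = Total S.S. - S.S. between consignments - S.S. between observers
= 30.96 - 9.71 - 8.79 = 12.46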
ANOVA Table
Since the calculated value of F for consignments is less than the table value, we accept the null
hypothesis and conclude that there is no significant difference between consignments.
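The sums of squares and F ratios can be recomputed from the coded table (original values minus 10). The sketch below follows the same two-way layout with one observation per cell; the variable names are illustrative, and each F ratio still has to be compared with the F-table value for its degrees of freedom at the chosen significance level.

```python
# Coded data (original value - 10): rows are observers, columns are consignments.
coded = [
    [-1, 0, -1, 0, 1, 1],
    [2, 1, -1, 1, 0, 0],
    [1, 0, 0, 2, 1, 0],
    [2, 1, 1, 4, 2, 0],
]
r, c = len(coded), len(coded[0])  # 4 observers, 6 consignments

grand_total = sum(sum(row) for row in coded)
cf = grand_total ** 2 / (r * c)   # correction factor

total_ss = sum(v * v for row in coded for v in row) - cf
ss_observers = sum(sum(row) ** 2 for row in coded) / c - cf
ss_consignments = sum(sum(col) ** 2 for col in zip(*coded)) / r - cf
ss_error = total_ss - ss_observers - ss_consignments

# Degrees of freedom and mean squares.
df_cons, df_obs = c - 1, r - 1
df_error = df_cons * df_obs
ms_error = ss_error / df_error

f_consignments = (ss_consignments / df_cons) / ms_error
f_observers = (ss_observers / df_obs) / ms_error

print(f"SS consignments = {ss_consignments:.2f}, F = {f_consignments:.2f}")
print(f"SS observers    = {ss_observers:.2f}, F = {f_observers:.2f}")
print(f"SS error        = {ss_error:.2f}, df = {df_error}")
```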