Assignment No. 2
Subject: Educational Statistics (8614)
(Units 1–4)
Name: Asia Noor    Roll #: BY627591    B.Ed (1.5 Years)    Spring 2020
Q.1: What is hypothesis testing? Explain the logic of hypothesis testing.
ANS: Hypothesis testing is used to assess the plausibility of a hypothesis by using sample data. Such data may come from a larger population or from a data-generating process; the word "population" will be used for both of these cases in the following descriptions. In hypothesis testing, an analyst tests a statistical sample with the goal of providing evidence on the plausibility of the null hypothesis.
Statistical analysts test a hypothesis by measuring and examining a random sample of the population being
analyzed. All analysts use a random population sample to test two different hypotheses: the null hypothesis and
the alternative hypothesis.
The null hypothesis is usually a hypothesis of equality between population parameters; e.g., a null hypothesis may state that the population mean return is equal to zero. The alternative hypothesis is effectively the opposite of the null hypothesis (e.g., the population mean return is not equal to zero). Thus, the two are mutually exclusive, and only one can be true; however, one of the two hypotheses will always be true.
Hypothesis testing is carried out in four steps:
1. The first step is for the analyst to state the two hypotheses so that only one can be right.
2. The next step is to formulate an analysis plan, which outlines how the data will be evaluated.
3. The third step is to carry out the plan and physically analyze the sample data.
4. The fourth and final step is to analyze the results and either reject the null hypothesis, or state that the null
hypothesis is plausible, given the data.
For example, suppose someone wants to test the null hypothesis that a penny has exactly a 50% chance of landing on heads. A random sample of 100 coin flips is taken, and the null hypothesis is then tested. If it is found that the 100 coin flips were distributed as 40 heads and 60 tails, the analyst would conclude that the penny does not have a 50% chance of landing on heads, reject the null hypothesis, and accept the alternative hypothesis.
If, on the other hand, there were 48 heads and 52 tails, then it is plausible that the coin could be fair and still
produce such a result. In cases such as this where the null hypothesis is "accepted," the analyst states that the
difference between the expected results (50 heads and 50 tails) and the observed results (48 heads and 52 tails)
is "explainable by chance alone."
The Logic of Hypothesis Testing
As just stated, the logic of hypothesis testing in statistics involves four steps.
1. State the Hypothesis: We state a hypothesis (guess) about a population. Usually the hypothesis concerns
the value of a population parameter.
2. Define the Decision Method: We define a method to make a decision about the hypothesis. The method
involves sample data.
3. Gather Data: We obtain a random sample from the population.
4. Make a Decision: We compare the sample data with the hypothesis about the population. Usually we
compare the value of a statistic computed from the sample data with the hypothesized value of the
population parameter.
o If the data are consistent with the hypothesis, we conclude that the hypothesis is reasonable. NOTE: We do not conclude it is right, only reasonable! AND: We actually do this by rejecting the opposite hypothesis (called the NULL hypothesis). More on this later.
o If there is a big discrepancy between the data and the hypothesis, we conclude that the hypothesis was wrong.
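The four steps can be illustrated with a one-sample t-test (a minimal sketch assuming Python with SciPy; the sample scores and the hypothesized mean of 100 are hypothetical):

    import numpy as np
    from scipy.stats import ttest_1samp

    # Step 1: State the hypotheses.
    # H0: the population mean is 100; Ha: the population mean is not 100.
    hypothesized_mean = 100

    # Step 2: Define the decision method: a one-sample t-test at alpha = 0.05.
    alpha = 0.05

    # Step 3: Gather data (a hypothetical random sample of scores).
    sample = np.array([96, 104, 99, 110, 93, 101, 98, 107, 95, 102])

    # Step 4: Make a decision by comparing the sample with the hypothesis.
    t_stat, p_value = ttest_1samp(sample, hypothesized_mean)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
    if p_value < alpha:
        print("Reject the null hypothesis.")
    else:
        print("The null hypothesis remains plausible, given the data.")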
Q.2: Explain the types of ANOVA. Describe possible situations in which each type should be used.
ANS: ANOVA: Analysis of variance (ANOVA) is an analysis tool used in statistics that splits an observed
aggregate variability found inside a data set into two parts: systematic factors and random factors. The systematic
factors have a statistical influence on the given data set, while the random factors do not. Analysts use the ANOVA
test to determine the influence that independent variables have on the dependent variable in a regression study.
Types Of ANOVA:
There are two main types of ANOVA: one-way ANOVA and two-way ANOVA.
A one-way ANOVA: has a single independent variable (factor) with two or more levels and one dependent variable. It is used when you want to know whether the groups defined by that one factor differ on the dependent variable.
A two-way ANOVA: is an extension of the one-way ANOVA. With a one-way ANOVA, you have one independent variable affecting a dependent variable; with a two-way ANOVA, there are two independent variables. For example, a two-way ANOVA allows a company to compare worker productivity based on two independent variables, such as salary and skill set. It is used to observe the interaction between the two factors and tests the effect of the two factors at the same time.
Situations in which a one-way ANOVA should be used:
o Your independent variable is social media use, and you assign groups to low, medium, and high levels of social media use to find out if there is a difference in hours of sleep per night.
o Your independent variable is brand of soda, and you collect data on Coke, Pepsi, Sprite, and Fanta to find out if there is a difference in the price per 100ml.
o Your independent variable is type of fertilizer, and you treat crop fields with mixtures 1, 2, and 3 to find out if there is a difference in crop yield.
The null hypothesis (H0) of ANOVA is that there is no difference among group means. The alternate hypothesis
(Ha) is that at least one group differs significantly from the overall mean of the dependent variable.
If you only want to compare two groups, use a t-test instead.
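As a minimal sketch (assuming Python with SciPy; the hours-of-sleep values for the three social media groups are hypothetical), a one-way ANOVA can be run as follows:

    from scipy.stats import f_oneway

    # Hypothetical hours of sleep per night for three levels of social media use
    low = [7.9, 8.2, 7.5, 8.0, 7.7]
    medium = [7.1, 6.8, 7.4, 7.0, 6.9]
    high = [6.2, 5.9, 6.5, 6.1, 6.4]

    # H0: all group means are equal; Ha: at least one group mean differs.
    f_stat, p_value = f_oneway(low, medium, high)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")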
Q.3: What is the range of the correlation coefficient? Explain strong, moderate, and weak relationships.
ANS: Range of the Correlation Coefficient:
The correlation coefficient is an indicator of the strength of the linear relationship between two variables, x and y. A linear correlation coefficient greater than zero indicates a positive relationship, a value less than zero signifies a negative relationship, and a value of zero indicates no linear relationship between the two variables x and y.
Strong, Moderate, and Weak Relationships:
The correlation coefficient (ρ) is a measure that determines the degree to which the movement of two different variables is associated. The most common correlation coefficient, generated by the Pearson product-moment correlation, is used to measure the linear relationship between two variables. However, in a non-linear relationship, this correlation coefficient may not always be a suitable measure of dependence. The possible range of values for the correlation coefficient is -1.0 to 1.0; values cannot exceed 1.0 or be less than -1.0. As a common rule of thumb, an absolute value of the coefficient above about 0.7 indicates a strong relationship, a value between roughly 0.3 and 0.7 indicates a moderate relationship, and a value below about 0.3 indicates a weak relationship.
A correlation of -1.0 indicates a perfect negative correlation, and a correlation of 1.0 indicates a perfect positive
correlation. If the correlation coefficient is greater than zero, it is a positive relationship. Conversely, if the value is
less than zero, it is a negative relationship. A value of zero indicates that there is no relationship between the two
variables.
For example, suppose that the prices of coffee and computers are observed and found to have a correlation of
+.0008. This means that there is no correlation, or relationship, between the two variables.
Positive Correlation
A positive correlation—when the correlation coefficient is greater than 0—signifies that both variables move in the
same direction. When ρ is +1, it signifies that the two variables being compared have a perfect positive
relationship; when one variable moves higher or lower, the other variable moves in the same direction with the
same magnitude.
The closer the value of ρ is to +1, the stronger the linear relationship. For example, suppose the value of oil
prices is directly related to the prices of airplane tickets, with a correlation coefficient of +0.95. The relationship
between oil prices and airfares has a very strong positive correlation since the value is close to +1. So, if the price
of oil decreases, airfares also decrease, and if the price of oil increases, so do the prices of airplane tickets.
For example, we can compare one of the largest U.S. banks, JPMorgan Chase & Co. (JPM), with the Financial Select Sector SPDR Exchange Traded Fund (ETF) (XLF). As you can imagine, JPMorgan Chase & Co. should have a positive correlation to the banking industry as a whole. A correlation coefficient of 0.98 between them signals a very strong positive correlation; a reading above 0.50 typically signals a meaningful positive correlation.
Negative Correlation
A negative (inverse) correlation occurs when the correlation coefficient is less than 0. This is an indication that the two variables move in opposite directions; in short, any reading between 0 and -1 means that the two securities move in opposite directions. When ρ is -1, the relationship is said to be perfectly negatively correlated: if one variable increases, the other variable decreases with the same magnitude (and vice versa). However, the degree to which two securities are negatively correlated might vary over time (and they are almost never exactly correlated all the time).
Even for small datasets, the computations for the linear correlation coefficient can be too long to do manually.
Thus, data are often plugged into a calculator or, more likely, a computer or statistics program to find the
coefficient.
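As a minimal sketch of that computation (assuming Python with SciPy; the paired oil-price and airfare values are hypothetical, echoing the example above):

    from scipy.stats import pearsonr

    # Hypothetical paired observations: oil price vs. airfare
    oil_price = [50, 55, 60, 62, 68, 70, 75]
    airfare = [120, 130, 138, 141, 150, 155, 162]

    r, p_value = pearsonr(oil_price, airfare)
    print(f"r = {r:.3f}, p = {p_value:.4f}")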
Q.4: Explain the chi-square test of independence. In what situation should it be applied?
ANS: Chi-Square Test of Independence:
The Chi-square test of independence is a statistical hypothesis test used to determine whether two categorical or
nominal variables are likely to be related or not.
The Chi-Square Test of Independence determines whether there is an association between categorical variables
(i.e., whether the variables are independent or related). It is a nonparametric test.
Situations in which the test should be applied include the following examples:
o We have a list of movie genres; this is our first variable. Our second variable is whether or not the patrons of those genres bought snacks at the theater. Our idea (or, in statistical terms, our null hypothesis) is that the type of movie and whether or not people bought snacks are unrelated. The owner of the movie theater wants to estimate how many snacks to buy. If movie type and snack purchases are unrelated, estimating will be simpler than if the movie types impact snack sales.
o A veterinary clinic has a list of dog breeds they see as patients. The second variable is whether owners feed dry food, canned food, or a mixture. Our idea is that the dog breed and types of food are unrelated. If this is true, then the clinic can order food based only on the total number of dogs, without consideration for the breeds.
For a valid test, we need:
o Data values that are a simple random sample from the population of interest.
o Two categorical or nominal variables. Don't use the independence test with continuous variables that define the category combinations; the test works with the counts for the combinations of the two categorical variables.
o For each combination of the levels of the two variables, at least five expected values. When we have fewer than five for any one combination, the test results are not reliable.
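Returning to the movie-theater example, here is a minimal sketch of the test (assuming Python with SciPy; the genre-by-snacks counts are hypothetical):

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical counts: rows = movie genre, columns = bought snacks (yes, no)
    observed = np.array([
        [50, 30],  # action
        [40, 45],  # drama
        [60, 25],  # comedy
    ])

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
    print("expected counts if genre and snack purchases were independent:")
    print(expected)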
Regression Analysis
Regression analysis refers to assessing the relationship between an outcome variable and one or more other variables. The outcome variable is known as the dependent or response variable, and the risk factors and confounders are known as predictors or independent variables. The dependent variable is denoted by "y" and the independent variables are denoted by "x" in regression analysis.
In correlation analysis, the sample correlation coefficient is estimated. It ranges between -1 and +1, is denoted by r, and quantifies the strength and direction of the linear association between two variables. The correlation between two variables can be either positive (a higher level of one variable is related to a higher level of the other) or negative (a higher level of one variable is related to a lower level of the other).
The sign of the correlation coefficient shows the direction of the association, while its magnitude shows the strength of the association.
For example, a correlation of r = 0.8 indicates a positive and strong association between two variables, while a correlation of r = -0.3 indicates a negative and weak association. A correlation close to zero indicates the absence of a linear association between two continuous variables.
Linear Regression
Linear regression is a linear approach to modelling the relationship between a scalar response (the dependent variable) and one or more independent variables. If the regression has one independent variable, it is known as simple linear regression; if it has more than one independent variable, it is known as multiple linear regression. Linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on their joint probability distribution. In general, real-world regression models involve multiple predictors, so the term linear regression often describes multiple linear regression.
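A minimal sketch of simple linear regression (assuming Python with SciPy; the (x, y) pairs are hypothetical), which fits the best line predicting y from x and also reports the correlation coefficient r from the same data:

    from scipy.stats import linregress

    # Hypothetical data: x is the manipulated variable, y is the response
    x = [1, 2, 3, 4, 5, 6]
    y = [2.1, 4.3, 5.9, 8.2, 9.8, 12.1]

    fit = linregress(x, y)
    print(f"best-fit line: y = {fit.slope:.2f}x + {fit.intercept:.2f}")
    print(f"correlation coefficient r = {fit.rvalue:.3f}")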
o Correlation quantifies the degree to which two variables are associated. It does not fit a line through the data points. When r is 0.0, no linear relationship exists. When r is positive, one variable tends to increase as the other increases; when r is negative, one variable tends to decrease as the other increases.
o Linear regression finds the best line that predicts y from x, whereas correlation does not fit a line.
o Correlation is used when you measure both variables, while linear regression is mostly applied when x is a variable that is manipulated.
Objective:
o Correlation: to find a value expressing the relationship between variables.
o Regression: to estimate the values of a random variable based on the values of a fixed variable.
Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables. An independent variable is an input, assumption, or driver that is changed in order to assess its impact on a dependent variable.
Regression Analysis – Linear Model Assumptions:
Linear regression analysis is based on the following standard assumptions:
1. The relationship between the dependent and independent variables is linear.
2. The residuals (errors) are independent of one another.
3. The residuals have constant variance across all levels of the independent variables (homoscedasticity).
4. The residuals are approximately normally distributed.