Normal Distribution


NORMAL DISTRIBUTION

The normal distribution, or Gaussian distribution, is a continuous probability distribution that describes data that cluster around a mean. The graph of the associated probability density function is bell-shaped, with a peak at the mean, and is known as the Gaussian function or bell curve.

The normal curve was developed mathematically in 1733 by Abraham de Moivre (1667-1754) as an approximation to the binomial distribution.

The normal distribution can be used to describe, at least approximately, any variable that tends to cluster around the mean.

PROPERTIES OF A NORMAL DISTRIBUTION

A normal distribution is a continuous, symmetric, bell-shaped distribution of a variable. The properties of the normal distribution are as follows:

The distribution is bell-shaped.

The mean, median, and mode are equal and are located at the center of the distribution.

The normal distribution is unimodal.

The normal distribution curve is symmetric about the mean (the shape is the same on both sides).

The normal distribution is continuous.

The normal curve is asymptotic (it never touches the x-axis).

The total area under the normal distribution curve is 1.00, or 100%.

The area under the part of the normal curve that lies within 1 standard deviation of the mean is about 68%; within 2 standard deviations, about 95%; and within 3 standard deviations, about 99.7%.

STANDARD NORMAL DISTRIBUTION

A normal distribution can be converted into a standard normal distribution by obtaining the z value. A z value is the signed distance between a selected value, designated x, and the mean, μ, divided by the standard deviation. It is also called the z score, the z statistic, the standard normal deviate, or the standard normal value.

DETERMINING NORMALITY

A bell-shaped (normal) distribution is only one of many shapes that a distribution can assume; however, it is important, since many statistical methods require that the distribution of values be approximately normal.

There are many ways for a statistician to determine normality. The easiest is to draw a histogram for the data and check its shape. If the histogram is not approximately bell-shaped, then the data are not normally distributed.

Skewness can be tested by applying Pearson's index (PI) of skewness. If the index falls outside the range -1 to +1, it can be concluded that the data are significantly skewed. The data should also be tested for outliers, because even one or two outliers can have a big effect on normality.

CONFIDENCE INTERVAL AND SAMPLE SIZE ESTIMATION

One aspect of inferential statistics: the process of estimating the value of a parameter from information drawn from a sample.

OBJECTIVE: TO DETERMINE THE APPROXIMATE VALUE OF A POPULATION PARAMETER ON THE BASIS OF A SAMPLE STATISTIC.

*ESTIMATOR: sample statistic

*ESTIMATE: computed sample statistic

*POINT ESTIMATE: the value of a sample statistic that is used to estimate a population parameter. Generally, whenever we use point estimation, we calculate the margin of error associated with that point estimate: the error value on either side of the population mean.
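The z conversion and the 68-95-99.7 areas described above can be checked numerically. A minimal sketch, assuming scipy is available; the x, μ, and σ values are made up for illustration:

```python
from scipy.stats import norm

# z value: signed distance of x from the mean, in standard-deviation units
def z_value(x, mu, sigma):
    return (x - mu) / sigma

# Illustrative numbers: an IQ-style scale with mu = 100, sigma = 15
z = z_value(130, mu=100, sigma=15)
print(z)  # 2.0

# Empirical rule: area under the standard normal curve within k
# standard deviations of the mean
for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)
    print(k, round(area, 4))  # ~0.6827, ~0.9545, ~0.9973
```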

A. CONFIDENCE INTERVALS

A good estimator should satisfy these properties:

It should be an unbiased estimator: an estimator whose expected value is equal to the parameter being estimated.

It should be a consistent estimator: the difference between the estimator and the parameter grows smaller as the sample size grows larger.

It should be a relatively efficient estimator: of two unbiased estimators of a parameter, the one whose variance is smaller is relatively more efficient.

Sample size determination is very much related to estimation. To get an accurate estimate we need three things:

The maximum error of estimate

The population standard deviation

The degree of confidence

SAMPLE SIZE FOR PROPORTIONS

By using the maximum error part of the confidence interval formula, it is possible to determine the size of the sample that must be taken in order to estimate with a desired accuracy.
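The three ingredients above plug into the standard maximum-error formulas: n = (z_{α/2}·σ/E)² for estimating a mean, and n = p̂·q̂·(z_{α/2}/E)² for a proportion. A minimal sketch, assuming scipy is available; the σ, p̂, and E values are illustrative:

```python
import math
from scipy.stats import norm

def sample_size_for_mean(sigma, max_error, confidence=0.95):
    """Minimum n so the maximum error of estimate is at most max_error."""
    alpha = 1 - confidence
    z = norm.ppf(1 - alpha / 2)                  # critical value z_{alpha/2}
    return math.ceil((z * sigma / max_error) ** 2)

def sample_size_for_proportion(p_hat, max_error, confidence=0.95):
    """Same idea for a proportion, using a prior estimate p_hat."""
    alpha = 1 - confidence
    z = norm.ppf(1 - alpha / 2)
    return math.ceil(p_hat * (1 - p_hat) * (z / max_error) ** 2)

# Illustrative: sigma = 3, want the mean within E = 0.5 at 95% confidence
print(sample_size_for_mean(sigma=3, max_error=0.5))        # 139

# Illustrative: no prior information (p_hat = 0.5), E = 0.05, 95% confidence
print(sample_size_for_proportion(p_hat=0.5, max_error=0.05))  # 385
```

Rounding is always upward (`math.ceil`), since a fractional sample size must be bumped to the next whole subject to keep the error within E.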

*INTERVAL ESTIMATE:

An interval or a range of values used to estimate the parameter.

A degree of confidence (generally a percent) can be assigned before an interval estimate is made.

Each interval is constructed with regard to a given confidence level and is called a confidence interval.

*CONFIDENCE LEVEL

The confidence level associated with a confidence interval states how much confidence we have that the interval contains the true population parameter.

The confidence level is denoted by (1 - α) · 100%.

When stated as a probability, it is called the confidence coefficient and is denoted by 1 - α.

Remember that α represents the significance level.

The more frequent values are 99%, 95%, and 90% (the corresponding confidence coefficients are 0.99, 0.95, and 0.90).

MAXIMUM ERROR OF ESTIMATE is the maximum likely difference between the point estimate of a parameter and the actual value of the parameter. For a specific value, say α = 0.10, 90% of the sample means will fall within this error range on either side of the population mean.
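The confidence interval for a mean is the point estimate plus or minus the maximum error of estimate, x̄ ± z_{α/2}·σ/√n. A minimal sketch, assuming scipy is available; the sample figures are illustrative:

```python
from math import sqrt
from scipy.stats import norm

def z_confidence_interval(xbar, sigma, n, confidence=0.90):
    """xbar +/- z_{alpha/2} * sigma / sqrt(n); E is the half-width."""
    alpha = 1 - confidence
    z = norm.ppf(1 - alpha / 2)
    e = z * sigma / sqrt(n)          # maximum error of estimate
    return xbar - e, xbar + e

# Illustrative: xbar = 23.2, sigma = 2, n = 50, alpha = 0.10 (90% confidence)
lo, hi = z_confidence_interval(23.2, 2, 50, confidence=0.90)
print(round(lo, 2), round(hi, 2))  # 22.73 23.67
```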
HYPOTHESIS TESTING

Introduced by Sir Ronald Fisher, Jerzy Neyman, Karl Pearson, and Egon Pearson.

A statistical method that is used in making statistical decisions using experimental data.

Hypothesis testing is basically an assumption that we make about the population parameter.

3 METHODS TO TEST A HYPOTHESIS

TRADITIONAL METHOD

p-VALUE METHOD

CONFIDENCE INTERVAL METHOD

PROCEDURE IN HYPOTHESIS TESTING

All hypothesis-testing situations start with stating the statistical hypotheses. A statistical hypothesis is a conjecture about a population parameter. This conjecture may or may not be true.

2 TYPES OF STATISTICAL HYPOTHESES:

NULL HYPOTHESIS (H0)

ALTERNATIVE HYPOTHESIS (H1)

COMPARISON BETWEEN NULL AND ALTERNATIVE HYPOTHESIS

NULL HYPOTHESIS

Symbolized by H0.

It is a statistical hypothesis that assumes that the observation is due to a chance factor.

In hypothesis testing, the null hypothesis is denoted by H0, which states that there is no difference between the two population means (or parameters).

ALTERNATIVE HYPOTHESIS

Symbolized by H1.

It is the opposite of the null hypothesis.

It shows that observations are the result of a real effect.

It states that there is a difference between two population means (or parameters).

LEVEL OF SIGNIFICANCE

In hypothesis testing, the level of significance refers to the degree of significance at which we accept or reject the null hypothesis.

In hypothesis testing, 100% accuracy is not possible for accepting or rejecting a null hypothesis, so we select a level of significance, usually 1% or 5%.

The level of significance is the maximum probability of committing a Type I error. That is, P(Type I error) = α.

After the level of significance is chosen, a critical value is selected from a table for the appropriate test statistic. The critical value determines the critical and non-critical regions.

CRITICAL VALUE

A value that separates the critical region from the non-critical region.

CRITICAL REGION

Also known as the REJECTION REGION.

The range of values of the test value that indicates that there is a significant difference and that the null hypothesis should be rejected.

NON-CRITICAL REGION

Also known as the NON-REJECTION REGION.

The range of values of the test value that indicates that the difference was probably due to chance and that the null hypothesis should not be rejected.

TYPE I ERROR

Occurs if one rejects the null hypothesis when it is true.

Denoted by α.

In hypothesis testing, the part of the normal curve that shows the critical region is called the ALPHA REGION.

TYPE II ERROR

Occurs if one does not reject the null hypothesis when it is false.

Denoted by β.

In hypothesis testing, the part of the normal curve that shows the acceptance region is called the BETA REGION.

ONE-TAILED VS. TWO-TAILED TEST

ONE-TAILED TEST

A one-tailed test shows that the null hypothesis should be rejected when the test value is in the critical region on one side of the mean. It may be either a right-tailed or a left-tailed test, depending on the direction of the inequality in the alternative hypothesis.

TWO-TAILED TEST

A two-tailed test shows that the null hypothesis should be rejected when the test value is in either of the two critical regions.

POSSIBLE OUTCOMES OF A HYPOTHESIS TEST

STATISTICAL DECISION   H0 TRUE                  H0 FALSE
DO NOT REJECT H0       Correct decision         Type II error
                       Confidence = 1 - α       P(Type II error) = β
REJECT H0              Type I error             Correct decision
                       P(Type I error) = α      Power = 1 - β

STEPS IN CONDUCTING HYPOTHESIS TESTING

State the null hypothesis (H0) and the alternative hypothesis (H1).

Choose the level of significance, α, and the sample size.

Determine the test statistic and sampling distribution.

Determine the critical values that divide the rejection and non-rejection regions.

Collect the data and compute the value of the test statistic.

Make a statistical decision.

State the conclusion.

HYPOTHESIS TESTING USING p-VALUE
The p-value (PROBABILITY VALUE) is the probability of getting a sample statistic, or a more extreme sample statistic, in the direction of H1 when H0 is true. It is the actual area under the standard normal distribution curve representing the probability of a particular sample statistic or a more extreme sample statistic occurring if H0 is true.
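As a sketch of this definition, the p-value of a one-sample z statistic is the tail area beyond the observed test value (doubled for a two-tailed test). Assuming scipy is available; the sample figures are illustrative:

```python
from math import sqrt
from scipy.stats import norm

def one_sample_z(xbar, mu0, sigma, n, tails=2):
    """z test value and its p-value: area more extreme than z under H0."""
    z = (xbar - mu0) / (sigma / sqrt(n))     # test value
    p = tails * norm.sf(abs(z))              # tail area beyond |z|
    return z, p

# Illustrative: H0: mu = 8.0, sample of n = 36 with xbar = 8.2, sigma = 0.6
z, p = one_sample_z(xbar=8.2, mu0=8.0, sigma=0.6, n=36)
print(round(z, 2), round(p, 4))  # 2.0 0.0455
```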
STEPS FOR THE p-VALUE METHOD:

State the null hypothesis (H0) and the alternative hypothesis (H1).

Choose the level of significance, α, and the sample size.

Determine the test statistic and sampling distribution.

Compute the test value.

Determine the p-value.

Make a statistical decision.

State the conclusion.

OTHER IMPORTANT GUIDELINES FOR p-VALUES:

If the p-value ≤ 0.01, reject H0; the difference is highly significant.

If the p-value > 0.01 and the p-value ≤ 0.05, reject H0; the difference is significant.

If the p-value > 0.05 and the p-value ≤ 0.10, consider the consequences of a Type I error before rejecting H0.

If the p-value > 0.10, do not reject H0; the difference is not significant.

CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

Another concept in hypothesis testing is the relationship between hypothesis testing and confidence intervals. The relationship can be summarized in two rules:

When the confidence interval contains the hypothesized mean, DO NOT REJECT H0.

When the confidence interval does not contain the hypothesized mean, REJECT H0.

ONE-SAMPLE Z TEST

A statistical test for the mean of a population, applicable to interval and ratio scales. It is used when n ≥ 30, or when the population is normally distributed and the population standard deviation is known.

*OBSERVED VALUE: the statistic (computed sample mean) obtained from the sample data.

*EXPECTED VALUE: the parameter (the population mean) that one would expect to obtain if the null hypothesis were true (the hypothesized value).

ASSUMPTIONS IN THE ONE-SAMPLE Z TEST

Subjects are randomly selected.

The population distribution is normal.

The population standard deviation should be known.

Cases of the samples should be independent.

The sample size should be greater than or equal to 30.

ONE-SAMPLE T TEST

The one-sample t-test is a statistical procedure that is used to examine the mean difference between the sample and the known value of the population mean, based on an interval or ratio scale.

We draw a random sample from the population, compare the sample mean with the population mean, and make a statistical decision as to whether or not the sample mean is different from the population mean.

The sample size should be less than 30.
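A minimal one-sample t-test sketch using scipy; the sample data and hypothesized mean are made up for illustration:

```python
from scipy import stats

# Illustrative data: n = 8 (< 30), hypothesized population mean mu0 = 12
sample = [11.2, 12.5, 12.9, 11.8, 12.1, 13.0, 11.5, 12.4]
mu0 = 12

# ttest_1samp returns the t test value and a two-tailed p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)
print(round(t_stat, 3), round(p_value, 3))

# Decision at alpha = 0.05
print("reject H0" if p_value <= 0.05 else "do not reject H0")
```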
ASSUMPTIONS IN THE ONE-SAMPLE T TEST

The population must be approximately normally distributed.

Samples drawn from the population should be random.

Cases of the samples should be independent.

The sample size should be less than 30.

The population mean should be known.

ASSUMPTIONS IN THE Z TEST FOR INDEPENDENT POPULATIONS

Subjects are randomly selected and independently assigned to groups.

The population distribution is normal.

The population standard deviations are known.

TEST FOR A PROPORTION

Considered a binomial experiment: there are only two outcomes, and the probability of success does not change from trial to trial.

ASSUMPTIONS IN THE Z TEST FOR A PROPORTION

Subjects are randomly selected.

The population distribution is normal.

Observations are dichotomous.
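The z test for a proportion compares the sample proportion p̂ = x/n against a hypothesized p, with the standard error computed from the hypothesized value. A minimal sketch, assuming scipy is available; the counts are illustrative:

```python
from math import sqrt
from scipy.stats import norm

def proportion_z_test(x, n, p0, tails=2):
    """z test for a proportion: compare sample proportion x/n with p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0 under H0
    p_value = tails * norm.sf(abs(z))
    return z, p_value

# Illustrative: 60 successes in 100 trials, H0: p = 0.5
z, p = proportion_z_test(60, 100, 0.5)
print(round(z, 2), round(p, 4))  # 2.0 0.0455
```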

HYPOTHESIS TESTING: TWO POPULATIONS

BASIC KINDS OF SAMPLES

INDEPENDENT SAMPLE

DEPENDENT SAMPLE

*SOURCE: can be an object, a person, or something that yields a part of the data.

*INDEPENDENT SAMPLING: two unrelated sets of sources are used, one set from each population.

*DEPENDENT SAMPLING: the same set of sources is paired or matched in some way to obtain the data representing both populations.

TESTING THE DIFFERENCE BETWEEN TWO MEANS: LARGE SAMPLES

When comparing the means of two populations, we usually consider the difference between their means, μ1 - μ2. The inferences we make about μ1 - μ2 will be based on the difference between the observed sample means, x̄1 - x̄2. This observed difference belongs to a sampling distribution.
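For large samples with known population standard deviations, the test value for the difference of means is z = (x̄1 - x̄2) / √(σ1²/n1 + σ2²/n2) under H0: μ1 = μ2. A minimal sketch, assuming scipy is available; the figures are illustrative:

```python
from math import sqrt
from scipy.stats import norm

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """z statistic for the difference between two means (large samples),
    with hypothesized difference mu1 - mu2 = 0."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (xbar1 - xbar2) / se
    p = 2 * norm.sf(abs(z))          # two-tailed p-value
    return z, p

# Illustrative numbers
z, p = two_sample_z(xbar1=75.2, xbar2=73.9, sigma1=4.0, sigma2=3.5, n1=50, n2=60)
print(round(z, 2), round(p, 4))
```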
2 NATURES OF RELATIONSHIP BETWEEN VARIABLES

POSITIVE

NEGATIVE

2 TYPES OF ANALYSIS

SIMPLE

MULTIPLE

Simple Relationship:

There are two variables:

INDEPENDENT/EXPLANATORY/PREDICTOR

DEPENDENT/RESPONSE

Multiple Relationship:

Two or more independent variables are used to predict the dependent variable.

STUDENT'S T-TEST

Also known as the t-test for independent samples.

Used to test the significance of the difference between two sample means.

Can be used to compare the sample means of two independent samples.

A parametric test which assumes a normal distribution; it is used for smaller samples (sample size should be less than 30).

Developed by William Sealy Gosset in 1908 and published under the pseudonym "Student".

Assumptions in the T-Test for Independent Samples
1. Subjects are randomly selected and independently assigned to groups.
2. Population variances are homogeneous.
3. Population distribution is normal.
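A minimal independent-samples t-test sketch with scipy; the two groups are made-up data, and `equal_var=True` matches the homogeneous-variances assumption above:

```python
from scipy import stats

# Illustrative small samples (n < 30 each)
group_a = [23, 25, 28, 22, 26, 24, 27]
group_b = [20, 22, 25, 19, 21, 23, 22]

# Pooled-variance (Student's) t-test; equal_var=True assumes homogeneity
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
print(round(t_stat, 3), round(p_value, 4))

print("reject H0" if p_value <= 0.05 else "do not reject H0")
```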
SIMPLE LINEAR RELATIONSHIP

Positive Relationship: exists when both variables increase at the same time or both decrease at the same time.

Negative Relationship: exists when, as one variable increases, the other variable decreases, or vice versa.

TEST THE DIFFERENCE BETWEEN 2 MEANS: PAIRED SAMPLES

PAIRED SAMPLE T-TEST

A statistical technique that is used to compare two population means in the case of two samples that are correlated.

Used in "before-and-after" studies, when the samples are matched pairs, or in a case-control study.
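A minimal paired-sample t-test sketch with scipy; the before-and-after scores are made up for illustration:

```python
from scipy import stats

# Illustrative before-and-after scores for the same matched subjects
before = [120, 131, 125, 118, 140, 127]
after  = [115, 128, 121, 119, 133, 122]

# ttest_rel works on the pairwise differences (dependent samples)
t_stat, p_value = stats.ttest_rel(before, after)
print(round(t_stat, 3), round(p_value, 4))

print("reject H0" if p_value <= 0.05 else "do not reject H0")
```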

ASSUMPTIONS IN THE PAIRED SAMPLE T-TEST

1. Only matched pairs can be used to perform the paired sample t-test.
2. In the paired sample t-test, normal distributions are assumed.
3. The variances of the two samples are homogeneous.
4. The observations must be independent of each other.

PEARSON PRODUCT-MOMENT CORRELATION

The measure most widely used in statistics for the degree of relationship between linearly related variables.

The Pearson r correlation requires both variables to be normally distributed; it is used to measure the degree of relationship between two variables.
WEEK 10: CORRELATION AND REGRESSION ANALYSIS

CORRELATION

Statistical method used to determine whether a relationship between variables exists.

VARIABLES: characteristics of the population being observed or measured.

REGRESSION ANALYSIS

Statistical method used to describe the nature of the relationship between variables.

CORRELATION COEFFICIENT (PEARSON'S r)

A measure of the linear strength of the association between two variables.

The value of the correlation coefficient lies between -1 and +1.

When the value of the correlation coefficient is near +1 or -1, there is said to be a perfect (or very strong) degree of association between the two variables. As the value of the correlation coefficient moves closer to zero, the correlation between the two variables becomes weaker.
B. TEST OF SIGNIFICANCE

A test of significance for the coefficient of correlation may be used to determine whether the computed Pearson's r could have occurred in a population in which the two variables are not related.

The test statistic follows the t distribution with n - 2 degrees of freedom.
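The significance test above can be sketched two ways, which agree: scipy's built-in p-value for r, and the t statistic t = r·√((n-2)/(1-r²)) with n - 2 degrees of freedom. Assuming scipy is available; the paired data are illustrative:

```python
from math import sqrt
from scipy import stats

# Illustrative paired observations
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 4, 6, 8]

r, p_value = stats.pearsonr(x, y)       # r and its two-tailed p-value

# The same significance test by hand: t with n - 2 degrees of freedom
n = len(x)
t = r * sqrt((n - 2) / (1 - r**2))
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)

print(round(r, 3), round(p_value, 4), round(p_manual, 4))
```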
ASSUMPTIONS IN THE PEARSON PRODUCT-MOMENT CORRELATION TEST
1. Subjects are randomly selected.
2. Both populations are normally distributed.
OTHER IMPORTANT NOTES
When the null hypothesis has been rejected for a specific significance level, there are five possible relationships between the x and y variables:
1. There is a direct cause-and-effect relationship between the two variables.
2. There is a reverse cause-and-effect relationship between the two variables.
3. The relationship between the two variables may be caused by a third variable.
4. There may be a complexity of interrelationships among many variables.
5. The relationship between the two variables may be coincidental.
SPEARMAN RANK CORRELATION

Also known as Spearman's rho (named after Charles Edward Spearman).

Denoted by ρ (rho) or rs.

A non-parametric test that is used to measure the degree of association between two variables.

Used when the Pearson test would give misleading results.

The non-parametric counterpart of the Pearson product-moment correlation.
ASSUMPTIONS IN SPEARMAN RANK CORRELATION
TEST
1. Subjects are randomly selected
2. Observations must be at least ordinal level.
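A minimal Spearman rank correlation sketch with scipy; the rankings (e.g., from two judges) are made up for illustration:

```python
from scipy import stats

# Illustrative ordinal-level data: ranks given by two judges to 8 entries
judge1 = [1, 2, 3, 4, 5, 6, 7, 8]
judge2 = [2, 1, 4, 3, 6, 5, 8, 7]

# spearmanr ranks the data internally and returns rho with a p-value
rho, p_value = stats.spearmanr(judge1, judge2)
print(round(rho, 3), round(p_value, 4))
```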
