Psychological Statistics Reviewer


PSYCHOLOGICAL STATISTICS

Sampling Methods

● Data - facts or figures
● Parameter - value of a variable computed based on the entire population.
● Variable - a property or characteristic which can be measured on every unit of the population, e.g. height, weight, economic status, etc.
● Benchmark Data - baseline data about a population
● Sampling - process of selecting a sample from a population
● Population - totality of units to be studied or investigated.
● Sample - totality of units selected from a population; a subset of the population
● Target Population - the population about which the researcher would like to make generalizations.
● Sampling Population - the accessible population from which a sample is actually taken.
● Sampling Frame - the list of all the units in the sampling population.
● Sampling Unit - the unit in a sampling frame.
● Sample Size - the total number of units in the sample.

Why sample?
◼ Reduced cost
◼ Greater speed
◼ Greater scope
◼ Greater accuracy

Sampling methods can be of two types:
1. Probability Sampling
2. Non-Probability Sampling

Probability Sampling:
◼ based on the concept of random selection which ensures that each unit in the sampling population has a known non-zero chance of being selected in the sample.
◼ allows one to make inferences about the entire population.

Non-Probability Sampling:
◼ is the sampling method where samples are arbitrarily or purposively selected.
◼ The major disadvantage is that we can obtain no valid estimates of our risks of errors. Therefore, statistical inference is not legitimate and should not be applied.
◼ However, this does not mean that non-probability sampling can no longer be used.

PROBABILITY SAMPLING

SIMPLE RANDOM SAMPLING
- Simply drawing a sample in a manner that ensures each unit has an equal chance of being selected

When to use:
◼ if items of the population are more or less similar to each other
◼ if the population is not widely spread geographically

SYSTEMATIC RANDOM SAMPLING
- A method of selecting a sample by numbering the units consecutively from 1 to the last unit and, with a random start, selecting the subsequent units at a constant interval.

When to use:
◼ if ordering of population is essentially random
◼ if there is slight diversity in the population

Procedure:
1. Consecutively number the units in the population from 1 to N.
2. Determine the sampling interval k, where k = N/n, then round it off.
3. Randomly select a number r from 1 to k. The selected number will be the first unit in the sample.
4. Take every k-th unit after r until n units are selected. The units of the sample are r, r+k, r+2k, r+3k, ..., r+(n-1)k.

Example: A sample of 5 female workers is to be selected from a group of 10 female workers.
1. List down all the 10 female workers and assign a number to each one.

1 Leonila     2 Blessilda   3 Florencia
4 Gertrudes   5 Angela      6 Corazon
7 Agnes       8 Imelda      9 Estela
10 Virgie

2. Determine the sampling interval, k, where k = N/n = 10 / 5 = 2
3. Select a number from 1 to 2, say, r = 1.
4. The units in the sample are: 1, 3, 5, 7, 9.

Thus, the corresponding samples are: Leonila, Florencia, Angela, Agnes and Estela.
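
The systematic sampling procedure can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample` and its `r` and `seed` arguments are made up for this example, and the sample is assumed to run from unit r to unit r+(n-1)k.

```python
import random

def systematic_sample(units, n, r=None, seed=None):
    """Pick n units at a constant interval k after a random start r (1-based)."""
    N = len(units)
    k = round(N / n)  # sampling interval, rounded off as in step 2
    if r is None:
        r = random.Random(seed).randint(1, k)  # random start from 1 to k
    # 1-based positions r, r+k, r+2k, ..., r+(n-1)k
    positions = [r + i * k for i in range(n)]
    # Skip positions past N, which can occur when N/n was rounded up
    return [units[p - 1] for p in positions if p <= N]

workers = ["Leonila", "Blessilda", "Florencia", "Gertrudes", "Angela",
           "Corazon", "Agnes", "Imelda", "Estela", "Virgie"]

# Same setting as the example: N = 10, n = 5, k = 2, r = 1
print(systematic_sample(workers, n=5, r=1))
# ['Leonila', 'Florencia', 'Angela', 'Agnes', 'Estela']
```

Leaving `r` unset draws the random start automatically, which is what step 3 asks for in practice.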
STRATIFIED RANDOM SAMPLING
- A sampling method where a population is divided into subgroups, called strata.
- Each stratum is a subset of the population that shares at least one common characteristic.
- After each stratum is identified, random sampling is then used to select units from each stratum.

When to use:
◼ if precise estimates are desired for certain groups
◼ if sampling considerations differ from one group to another

Procedure:
◼ Divide the population into H strata with items within each stratum being more or less similar, while the strata are markedly different from one another. Then, a random sample is taken from each stratum.

Example: A sample of 16 establishments is to be selected from a group of 50 establishments. The establishments are classified by main economic activity: 15 agricultural establishments, 20 industrial establishments and 15 establishments engaged in services.

◼ List all the 15 agricultural establishments and get a random sample of 5.
◼ List all the 20 industrial establishments and get a random sample of 6.
◼ List all the 15 establishments engaged in services and get a random sample of 5.

CLUSTER SAMPLING
- A sampling method where the population is divided into distinct groups or clusters of smaller units, and then a random sample of clusters is taken.
- From the selected clusters, all the units will be drawn.
- Unlike strata, clusters are typical of the heterogeneous characteristic of the population.

When to use:
◼ if information about the population (benchmark) is not available
◼ if lower field cost is desired

Procedure:
◼ Identify the clusters and consecutively number them from 1 to C. Obtain a sample of clusters. Then observe all units in each selected cluster.

Example: A study of town governments will require a visit to the towns personally. With a limited budget, the country is divided into Regions. Five regions were randomly selected. All the town governments in the five (5) selected regions were considered as units in the samples.

NON-PROBABILITY SAMPLING

CONVENIENCE SAMPLING
- Also called accidental or haphazard sampling, this sampling method is based on the accessibility of units to the researcher.

When to use:
◼ if there is a need to get only an approximation of the actual value quickly and inexpensively

Example: Public opinion surveys conducted by television stations. Viewers are asked to participate by dialing the telephone number on the television screen. With this scheme, only the individuals who volunteered to call will be the samples in the survey.

PURPOSIVE OR JUDGEMENT SAMPLING
- Taking a sample which is "representative" according to some criteria or purposes in the mind of a researcher

When to use:
◼ if one wishes to select a specific group for screening purposes

Example: A market researcher is interested in single females between 30-40 years old. He will probably be sizing up the people passing by a shopping mall and will stop anyone who looks to be in that category and ask them if they are willing to be interviewed. Females that look like they fall in the category are taken as samples.

Example: Suppose a researcher wants to estimate the average amount of money a shopper spends in a mall. Then he will take sample shoppers who look like they have spent an “average” amount of money already, judging by the packages they carry.

◼ In this way, we have deliberately selected a sample to confirm our prior opinion.

QUOTA SAMPLING
- The sample is obtained by meeting a specified number (quota) of units according to certain controls to minimize serious biases. These controls are similar to strata.
- The researcher first identifies the strata and their proportions as they are represented in the population. Then convenience or judgment sampling is used to select the required sample for each stratum.

When to use:
◼ If probability sampling is impractical, costly or considered unnecessary.
◼ If one wishes to get a sample with a predetermined number of units.

Example: It is known that the urban population is 30% male and 70% female. A researcher is required to have 100 samples. The researcher will then select samples until he gets 30 males and 70 females.
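
The quota of 30 males and 70 females, like the 5, 6 and 5 establishments in the stratified example, is consistent with proportional allocation: each group receives a share of the sample equal to its share of the population. The reviewer does not name the allocation rule, so the sketch below is an assumption, and `proportional_allocation` is a hypothetical helper name.

```python
def proportional_allocation(group_sizes, n):
    """Split a sample of size n across groups in proportion to their sizes."""
    total = sum(group_sizes.values())
    # Rounding each share can make the parts add up to n-1 or n+1 in general,
    # so the total should be checked and adjusted by hand when that happens.
    return {group: round(n * size / total) for group, size in group_sizes.items()}

# Quota example: 30% male, 70% female, 100 samples
print(proportional_allocation({"male": 30, "female": 70}, 100))
# {'male': 30, 'female': 70}

# Stratified example: 16 out of 50 establishments
strata = {"agricultural": 15, "industrial": 20, "services": 15}
print(proportional_allocation(strata, 16))
# {'agricultural': 5, 'industrial': 6, 'services': 5}
```

The allocation only fixes how many units each group contributes. In stratified sampling those units are then drawn at random, while in quota sampling they are filled by convenience or judgment.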
◼ No instruction is given on how the quotas of 30 males and 70 females are to be filled.

SNOWBALL SAMPLING
- A special non-probability method used when the desired sample characteristic is rare.
- It may be extremely difficult or cost prohibitive to locate respondents in these situations.
- Relies on referrals from initial subjects to generate additional subjects.
- While this technique can dramatically lower search costs, it comes at the expense of introducing bias because the technique itself reduces the likelihood that the sample will represent a good cross section of the population.

When to use:
◼ when the desired characteristic is rare
◼ Useful in early stages of an investigation, just to learn something about the rare population.

Example: It is known that AIDS victims do not come openly to the public. In a study of the emotional characteristics of AIDS victims, one relies on referrals from doctors, other AIDS victims and some spiritual directors. The collection of data is closed once the researcher is satisfied with the number at hand.

◼ To take a snowball sample of homeless persons, you will start with just a few. Then ask each of them to identify additional homeless persons until the desired sample size is attained.
_________________________________________

LESSON 2:
Estimating Mean and Proportion

Topics
1. Estimation
1.1.1 Basic Concepts of Estimation
1.1.2 Estimating the Mean
1.1.3 Estimating the Proportion
2. Confidence Interval
2.2.1 Confidence Interval of the Population Mean
2.2.2 Confidence Interval of the Population Proportion
3. Solving for Sample Size Involving Proportions

1. ESTIMATION

Statistical Inference refers to methods by which one uses sample information to make inferences or generalizations about a population.

Two Areas of Statistical Inference
1. Estimation
● Point estimation
● Interval estimation
2. Hypothesis Testing

Basic Concepts in Estimation

❖ Point Estimation

An estimator is any statistic whose value is used to estimate an unknown parameter. A realized value of an estimator is called an estimate.
Example: The sample mean X̄ is an estimator of the population mean μ.

An estimator is said to be unbiased if the average of the estimates it produces under repeated sampling is equal to the true value of the parameter being estimated.
Example: Under random sampling, the sample mean is an unbiased estimator of the population mean, that is, E(X̄) = μ.

Estimating Mean and Proportion

❖ Interval Estimation

An interval estimator of a population parameter is a rule that tells us how to calculate two numbers based on sample data, forming an interval within which the parameter is expected to lie. This pair of numbers, (a, b), is called an interval estimate or confidence interval.

2. CONFIDENCE INTERVAL

Confidence Interval (interval estimate) is a range (or an interval) of values that is likely to contain the true value of the population parameter. It is associated with a degree of confidence, which is a measure of how certain you are that the interval contains the population parameter. The degree of confidence is the probability 1 - α that the interval contains the true parameter. Common choices for the degree of confidence are 90% (with α = 0.10), 95% (with α = 0.05), and 99% (with α = 0.01). The most common is 95% because it provides a good balance between precision and reliability. Hence, in any given problem, if no degree of confidence is given, it is understood to be 95%, with a .05 level of significance.

❖ CRITICAL VALUE

CRITICAL VALUE (or tabular value) is the number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur.

❖ MARGIN OF ERROR (E)

MARGIN OF ERROR (E) is the maximum likely (with probability 1 - α) difference between the observed sample mean X̄ and the true value of the population mean μ. It is also called the maximum error of estimate.
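
For the common degrees of confidence listed in this lesson, the two-tailed z critical values can be recovered from the standard normal distribution instead of a printed table. The sketch below uses only Python's standard library; `z_critical` is an illustrative name, not something defined in the reviewer.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed z critical value for a given degree of confidence."""
    alpha = 1 - confidence
    # Leave alpha/2 in each tail of the standard normal curve
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(z_critical(confidence), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```

Printed z tables often list the 99% value as 2.575 or 2.58 because of rounding, which is why slightly different figures appear in worked solutions.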
Remarks
1. In general, we construct a (1 - α) 100% confidence interval. The fraction (1 - α) is called the confidence coefficient (level of confidence) and the endpoints a and b are called the lower and upper confidence limits, respectively.
2. Interpretation of a (1 - α) 100% confidence interval: If we take repeated samples of size n and compute the (1 - α) 100% confidence interval for each of these samples, then (1 - α) 100% of the resulting confidence intervals will contain the unknown value of the parameter.
3. The confidence coefficient is not “the probability that the true value of the parameter falls in the interval estimate” since once a sample is drawn and a confidence interval constructed, the resulting interval estimate either encloses the true value of the parameter or it does not. Rather, the confidence coefficient is “the probability that the interval estimator encloses the true value of the parameter”.
4. A good confidence interval is one that is as narrow as possible and has a large confidence coefficient, near 1. The narrower the interval, the more exactly we have located the parameter; whereas, the larger the confidence coefficient, the more confidence we have that a particular interval encloses the true value of the parameter. However, for a fixed sample size, as the confidence coefficient increases, the length of the interval also increases.

Confidence Interval for μ

σ known: X̄ ± z_{α/2} · σ/√n
σ unknown: X̄ ± t_{α/2, n-1} · s/√n

Remarks
The above formulas hold strictly for random samples from a normal distribution. However, they provide good approximate (1 - α) 100% confidence intervals when the distribution is not normal provided the sample size is large, i.e. n > 30.

Guided Exercises:

1. A study of 50 registered psychometricians showed that their average score was 85. The standard deviation of the population is 12. Find the 95% confidence interval of the mean score of the psychometricians.

2. The running times (in minutes) of a sample of films produced by a film production outfit are as follows: 103, 94, 110, 87, 98

A 95% (0.95) confidence interval for the mean running time of films produced by the film production outfit is (87.56, 109.24).

● The number 0.95 in the example is called the confidence coefficient or the degree of confidence.
● The endpoints 87.56 and 109.24 are called the lower and upper confidence limits.

Practice Exercises

1. The dean of a university wishes to estimate the average age of students presently enrolled. From past studies, the standard deviation is known to be 2 years. A sample of 100 students is selected, and the mean is found to be 19.2 years. Find the 99% confidence interval of the average age of students.
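
The two guided exercises on the mean can be checked with a short script. The worked solutions did not survive in this copy, so the sketch assumes the standard intervals X̄ ± z·σ/√n (σ known) and X̄ ± t·s/√n (σ unknown). The function names are illustrative, and the t value 2.776 (two-tailed, α = .05, 4 degrees of freedom) is taken from a standard t table because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # margin of error E
    return xbar - margin, xbar + margin

def t_interval(data, t_critical):
    """Confidence interval for the mean when sigma is estimated by s."""
    n = len(data)
    margin = t_critical * stdev(data) / sqrt(n)  # stdev() is the sample s
    return mean(data) - margin, mean(data) + margin

# Guided exercise 1: n = 50, mean = 85, sigma = 12, 95% confidence
low, high = z_interval(85, 12, 50)
print(round(low, 2), round(high, 2))   # 81.67 88.33

# Guided exercise 2: film running times, t(.05, df = 4) = 2.776 from a table
low, high = t_interval([103, 94, 110, 87, 98], 2.776)
print(round(low, 2), round(high, 2))   # 87.56 109.24
```

The second result reproduces the interval (87.56, 109.24) stated for the running-time exercise, which suggests the t-based interval is the one the reviewer used there.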
2. The dean of a university wishes to estimate the average age of students presently enrolled. From past studies, the standard deviation is known to be 2 years. A sample of 100 students is selected, and the mean is found to be 19.2 years. Find the 95% confidence interval of the average age of students.

❖ PROPORTIONS
A proportion is a fractional expression where the number of favorable responses is in the numerator and the total number of responses is in the denominator. Division is the basic operation involved and the result is a decimal number that can be expressed as a percent.

Notations:
p̂ = x / n
q̂ = 1 - p̂
where x is the number of sample elements that possess the desired characteristic and n is the sample size.

Confidence Interval for p (large samples)

The formula for computing a large-sample confidence interval for a population proportion is
p̂ ± z_{α/2} · √(p̂q̂ / n)

Assumptions
- The sample is a random sample.
- The conditions for a binomial experiment are satisfied.

Guided Exercises
1. Fast food chains regularly ask customers to be part of their survey on the kind of service they provide. Suppose 1200 customers participate in the survey and 908 responded that they like the service. What is the estimate of the true proportion of all customers who like the service?

2. Jose conducted a survey in which 620 of the 1400 voters indicated their preference for a particular candidate. Using a 95% confidence level, estimate the true population proportion p of voters who preferred the said candidate.

3. Solving for Sample Size Involving Proportions

Steps
● Determine the confidence level
● Determine the critical value or the tabular value
● Determine the margin of error, E
● Determine p̂ and q̂

Substitute the values in the formula:
n = p̂q̂ (z_{α/2} / E)²

Guided Exercises
Find the sample size using the following information:
1. 90% confidence, E = .08, p̂ = .38
2. 99% confidence, E = .03, p̂ = .59
3. A city mayor wants to determine the sample size he needs to interview because he has plans of establishing a new landfill. He wants to be able to assert with 95% confidence that his error will be within 3%. A survey in the past revealed a 72% approval. How large a sample does the mayor need?
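
The proportion and sample-size exercises can be worked through in code. The original formula images are missing from this copy, so the sketch assumes the usual large-sample results, p̂ ± z·√(p̂q̂/n) for the interval and n = p̂q̂(z/E)² rounded up for the sample size. The function names are illustrative and the z values 1.645, 1.96 and 2.575 are the tabular values used elsewhere in this reviewer.

```python
from math import ceil, sqrt

def proportion_interval(x, n, z):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

def sample_size(p_hat, margin, z):
    """Smallest n that keeps the margin of error within the target E."""
    q_hat = 1 - p_hat
    return ceil(p_hat * q_hat * (z / margin) ** 2)  # always round up

# Point estimate for the fast food survey: 908 of 1200
print(round(908 / 1200, 4))                 # 0.7567

# Jose's survey: 620 of 1400 voters, 95% confidence (z = 1.96)
low, high = proportion_interval(620, 1400, 1.96)
print(round(low, 3), round(high, 3))        # 0.417 0.469

# Sample-size exercises
print(sample_size(0.38, 0.08, 1.645))       # 100
print(sample_size(0.59, 0.03, 2.575))       # 1783
print(sample_size(0.72, 0.03, 1.96))        # 861
```

Rounding up rather than to the nearest integer is the conventional choice, since rounding down would leave the margin of error slightly larger than requested.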
_________________________________________

LESSON 3: T-TEST

THE PARAMETRIC TEST

Parametric tests are tests that require a normal distribution and data measured at the interval or ratio level.

THE T-TEST

The t-test is used to compare two means: the means of two independent samples or two independent groups, and the means of correlated samples before and after the treatment.

THE T-TEST FOR TWO INDEPENDENT SAMPLES/GROUPS

Example: The following are the scores of 10 male and 10 female BS students in Psychology. Test the null hypothesis that there is no significant difference between the performance of male and female BS Psychology students in the said test. Use the t-test at .05 level of significance.

Hypothesis:
H0 : There is no significant difference between the performance of male and female BS students in Psychology.

H1 : There is a significant difference between the performance of male and female BS students in Psychology.

Level of Significance:
α = .05
df = n1 + n2 - 2 = 10 + 10 - 2 = 18
t.05 = 2.101

Statistics:
T-test for two independent samples

Decision Rule:
If the t-computed value is greater than or beyond the tabular/critical value, reject H0.

Conclusion:
Since the t-computed value of 2.88 is greater than the t-tabular value of 2.101 at .05 level of significance with 18 degrees of freedom, the null hypothesis is rejected in favor of the research hypothesis. This means that there is a significant difference between the performance of male and female BS students in Psychology. It implies that the male students performed better than the female students, considering that the mean score of the male students, 13.1, is greater than the mean score of the female students, 7.8.
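
The score table for this example is missing from this copy, so the computed t of 2.88 cannot be reproduced here. As a rough illustration of the computation, the sketch below assumes the pooled-variance form of the independent-samples t-test (consistent with df = n1 + n2 - 2) and uses small made-up score lists, not the reviewer's data.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighted by their degrees of freedom
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    t = (mean(group1) - mean(group2)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical scores, chosen only so the arithmetic is easy to follow
group_a = [12, 15, 11, 14, 13]   # mean 13, variance 2.5
group_b = [8, 9, 7, 10, 6]       # mean 8, variance 2.5

t, df = independent_t(group_a, group_b)
print(round(t, 2), df)   # 5.0 8
```

With 8 degrees of freedom the two-tailed tabular value at .05 is 2.306, so for these made-up scores the decision rule would reject H0.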
THE T-TEST FOR CORRELATED SAMPLES

The t-test for correlated samples is used when comparing the means before and after the treatment. It is also used to compare the means of the pre-test and the post-test.

Example: An experimental study was conducted on the effect of programmed materials in English on the performance of 20 selected college students. Before the program was implemented, the pre-test was administered, and after 5 months the same instrument was used to get the post-test result. The following is the result of the experiment.

Hypothesis:
H0: There is no significant difference between the pre-test and the post-test, or the use of programmed materials did not affect the students’ performance in English.

H1: The post-test result is higher than the pre-test result.

Level Of Significance:
α = .05
df = n-1
= 20-1
= 19
t.05 = -1.729; one-tailed test

Statistics:
t-test for correlated samples

Decision Rule:
If the t-computed value is greater than or beyond the critical value, reject H0.

Conclusion:
Since the t-computed value of -3.17 is beyond the t-critical value of -1.729 at .05 level of significance with 19 degrees of freedom, the null hypothesis is rejected in favor of the research hypothesis. This means that the post-test result is higher than the pre-test result. It implies that the use of the programmed materials in English is effective.

_____________________________________________

LESSON 4: Z-TEST

The z-test is another test under parametric statistics which requires the normality of the distribution. It utilizes the two population parameters μ and σ. It is used to compare two means: the sample mean and the perceived population mean.

The Tabular Value Of Z-Test

THE ONE SAMPLE MEAN TEST

The one sample mean test is used when the sample mean is being compared to the perceived population mean.

Example: ABC company claims that the average lifetime of a certain tire is at least 28,000 km. To check the claim, a taxi company puts 40 of these tires on its taxis and gets a mean lifetime of 25,560 km. With a standard deviation of 1,350 km, is the claim true? Use z-test at .05 level of significance.

Hypothesis:
H0: The average lifetime of a certain tire is 28,000 km.

H1: The average lifetime of a certain tire is less than 28,000 km.

Level Of Significance:
α = .05; one-tailed test
z = -1.645
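
The z statistics used in this lesson are short enough to verify directly. The sketch below assumes the usual forms z = (X̄ - μ)/(σ/√n) for one sample and z = (X̄1 - X̄2)/√(s1²/n1 + s2²/n2) for two samples; the function names are illustrative. It is applied to the tire example and to the two-sample admission test example from this lesson (means 90 and 85, variances 40 and 35, 100 students per group).

```python
from math import sqrt

def one_sample_z(sample_mean, claimed_mean, sd, n):
    """z statistic comparing a sample mean with a claimed population mean."""
    return (sample_mean - claimed_mean) / (sd / sqrt(n))

def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """z statistic comparing the means of two independent samples."""
    return (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)

# Tire lifetime example: mean 25,560 km, claim 28,000 km, sd 1,350 km, n = 40
print(round(one_sample_z(25560, 28000, 1350, 40), 2))      # -11.43

# Admission test example: Nursing vs Veterinary Medicine
print(round(two_sample_z(90, 85, 40, 35, 100, 100), 3))    # 5.774
```

The one-sample result differs in the last digit from the -11.42 reported in the reviewer, most likely because of intermediate rounding in the hand computation. Either value lies far beyond the critical value, so the decision is unchanged.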
Statistics:
Z-test for one sample mean test

Computation:

Decision Rule:
If the z-computed value is greater than or beyond the tabular value, reject H0.

Conclusion:
Since the z-computed value of -11.42 is beyond the critical value of -1.645 at .05 level of significance, the null hypothesis is rejected and the research hypothesis is accepted, which means that the average lifetime of the tire is less than 28,000 km.

THE TWO SAMPLE MEAN TEST

The two-sample mean test is used when comparing two separate samples drawn at random from a normal population. It tests whether the difference between the two values of X̄1 and X̄2 is significant or can be attributed to chance.

Example: An admission test was administered to incoming freshmen in the Colleges of Nursing and Veterinary Medicine, with 100 randomly selected students from each. The mean scores of the given samples were X̄1 = 90 and X̄2 = 85 and the variances of the test scores were 40 and 35, respectively. Is there a significant difference between the two groups? Use .01 level of significance.

Problem:
Is there a significant difference between the two groups?

Hypothesis:
H0: There is no significant difference between the two groups.
H1: There is a significant difference between the two groups.

Level Of Significance:
α = .01; Two-tailed test
z = ±2.575

Statistics:
z-test for two sample mean test

Conclusion:
Since the z-computed value of 5.774 is greater than the z-tabular value of 2.575 at .01 level of significance, the research hypothesis is accepted, which means that there is a significant difference between the two groups. It implies that the incoming freshmen from Nursing performed better than the incoming freshmen from Veterinary Medicine.
_____________________________________________

LESSON 5: F-TEST (ONE-WAY-ANOVA)

● The F-test is the Analysis of Variance (ANOVA). This is used in comparing the means of 3 or more independent groups.

● One-way ANOVA is used when there is only one variable involved, while two-way ANOVA is used when two variables are involved: the column and row variables.

● The F-test is used if we want to know if there are differences between and among columns and rows.

● This is also used in looking at the interaction effect between the variables being analyzed.

● The F-test is more efficient than other tests of differences.

F-TEST (ONE-WAY ANOVA)

Example: A sari-sari store is selling 4 brands of shampoo. The owner is interested in whether there is a significant difference in the average sales for one week. The following are recorded.

Perform the analysis of variance and test the hypothesis at .05 level of significance that the average sales of the four brands of shampoo are equal.
Hypothesis:
H0 : There is no significant difference in the average sales of the four brands of shampoo.
H1 : There is a significant difference in the average sales of the four brands of shampoo.

Level of Significance:
α = .05
df = 3 and 24
dfb = k - 1 = 4 - 1 = 3
dfw = (N - 1) - (k - 1) = (28 - 1) - (4 - 1) = 27 - 3 = 24
dft = N - 1 = 28 - 1 = 27

Statistics:
F-test One-way Analysis of Variance

Conclusion:
Since the F-computed value of 7.98 is greater than the F-tabular value of 3.01 at .05 level of significance with 3 and 24 degrees of freedom, the null hypothesis is rejected in favor of the research hypothesis, which means that there is a significant difference in the average sales of the 4 brands of shampoo.

SCHEFFÉ'S TEST

Comparison of the Average Sales of the Four Brands of Shampoo

The above table shows that there is a significant difference between brand A and brand B, brand B and brand C, and also brand B and brand D. However, brands A and C, A and D, and C and D have no significant differences in their average sales. This implies that brand B is more saleable than brands A, C and D.
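
The sales table for the shampoo example is missing from this copy, so the F value of 7.98 cannot be recomputed here. The sketch below shows how a one-way ANOVA F ratio and its degrees of freedom are obtained, using made-up numbers rather than the reviewer's data; `one_way_anova` is an illustrative name.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)                          # number of groups
    N = sum(len(g) for g in groups)          # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, N - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical data: three groups of three scores each
f_ratio, df_b, df_w = one_way_anova([[4, 5, 6], [5, 6, 7], [9, 10, 11]])
print(round(f_ratio, 2), df_b, df_w)   # 21.0 2 6
```

Here N - k gives the same within-groups degrees of freedom as the reviewer's (N - 1) - (k - 1), so four brands with 28 observations in total yield 3 and 24 degrees of freedom.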
F-TEST (TWO-WAY ANOVA WITH
INTERACTION EFFECT)

Example:
Forty-five language students were randomly assigned to one of three instructors and to one of three methods of teaching. Achievement was measured on a test administered at the end of the term. Use two-way ANOVA with interaction effect at .05 level of significance to test the hypotheses:

Problem:
1. Is there a significant difference in the performance of
students under three different instructors?
2. Is there a significant difference in the performance of
students under the three different methods of teaching?
3. Is there an interaction effect between teachers and
methods of teaching factors?

Hypotheses:
1. H0: There is no significant difference in the performance of the three groups of students under three different instructors.

H1: There is a significant difference in the performance of the three groups of students under three different instructors.

2. H0: There is no significant difference in the performance of the three groups of students under three different methods of teaching.

H1: There is a significant difference in the performance of the three groups of students under three different methods of teaching.

3. H0: Interaction effects are not present.

H1: Interaction effect is present.

Level of Significance:

α = .05
df total = N-1= 45-1 = 44
df within = k (n-1) = 9(5-1) = 36
df column = c-1 = 3-1 = 2
df row = r-1 = 3-1= 2
df c∙r = (c-1)(r-1) = (3-1)(3-1) = 4
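
The degrees of freedom listed above follow directly from the design: 3 rows, 3 columns and 5 students in each of the 9 cells. A small sketch (the function name and dictionary keys are illustrative) reproduces them and shows that the parts add up to the total.

```python
def two_way_anova_df(rows, cols, n_per_cell):
    """Degrees of freedom for a balanced two-way ANOVA with interaction."""
    cells = rows * cols            # k, the number of row-by-column cells
    N = cells * n_per_cell         # total number of observations
    return {
        "total": N - 1,
        "within": cells * (n_per_cell - 1),
        "column": cols - 1,
        "row": rows - 1,
        "interaction": (cols - 1) * (rows - 1),
    }

df = two_way_anova_df(rows=3, cols=3, n_per_cell=5)
print(df)
# {'total': 44, 'within': 36, 'column': 2, 'row': 2, 'interaction': 4}

# The component degrees of freedom partition the total: 36 + 2 + 2 + 4 = 44
print(df["within"] + df["column"] + df["row"] + df["interaction"] == df["total"])
# True
```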

Statistics:
F-test ( Two- way ANOVA with interaction)
Conclusion:

With the computed F-value (column) of 24.82 compared to the F-tabular value of 3.26 at .05 level of significance with 2 and 36 degrees of freedom, the null hypothesis is rejected in favor of the research hypothesis, which means that there is a significant difference in the performance of the three groups of students under three different instructors. It implies that instructor B is better than instructor A.

With regard to the F-value (row) of 1.95, it is less than the F-tabular value of 3.26 at .05 level of significance with 2 and 36 degrees of freedom. Hence, the null hypothesis of no significant difference in the performance of the students under the three different methods of teaching is accepted.

However, the F-value (interaction) of 13.03 is greater than the F-tabular value of 2.64 at .05 level of significance with 4 and 36 degrees of freedom. Thus, the research hypothesis is accepted, which means that the interaction effect is present. It implies that there is an interaction effect between the instructors and their methods of teaching. Students under instructor B have better performance under methods of teaching 1 and 3 while students under instructor C have better performance under method 2.
