DIFFERENT STATISTICAL TOOLS
The word statistics has two meanings. In its more common usage, statistics refers to numerical facts. The income of a family, the number of cars sold at a dealership during the past month, the number of employees of a company, the number of students enrolled in a class, the increase or decrease of enrollment in a certain university, the kind of food frequently served to customers, and the starting salary of a typical college graduate are all examples of statistics in this sense.
The second meaning of statistics refers to the field or discipline of study. In this sense, the word statistics is defined as follows.
Statistics is a group of methods that are used to collect, organize, present, analyze, and interpret
data to make decisions.
Every day we make decisions that may be personal, business related, or of some other kind. Usually, these decisions are made under conditions of uncertainty. Many times, the situations or problems we face in the real world have no precise or definite solution. Statistical methods help us make scientific and intelligent decisions in such situations. Decisions made by using statistical methods are called educated guesses. Decisions made without using statistical (or scientific) methods are pure guesses and hence may prove to be unreliable.
Like almost all fields of study, statistics has two aspects, theoretical and applied. Theoretical or mathematical statistics deals with the development, derivation, and proofs of statistical theorems, formulas, rules, and laws. Applied statistics involves the application of those theorems, formulas, rules, and laws to solve real-world problems. Applied statistics can be categorized into two areas: descriptive statistics and inferential statistics.
Descriptive Statistics consists of methods for organizing, displaying, and describing data
by using tables, graphs and summary measures.
Suppose we have information on the 2022 total sales of the top 100 companies in the Philippines. In statistical terminology, the whole set of numbers that represent the sales of the top 100 companies is called a data set, the name of each company is called an element, and the sales figure of each company is called an observation.
A data set in its original form is usually very large. Consequently, such a data set is not
very helpful in drawing conclusions or making decisions. It is easier to draw conclusions
from summary tables and diagrams than from the original version of a data set. So, we
reduce data to a manageable size by constructing tables, drawing graphs, or calculating
summary measures such as averages. The portion of statistics that helps us to do this
type of statistical analysis is called descriptive statistics.
In statistics, the collection of all elements of interest is called a population. The selected few elements from a population are called a sample.
A major portion of statistics deals with making decisions, inferences, predictions, or forecasts about populations based on results obtained from samples. For example, we may make some decisions about the views of all college and university students on the role of ethics in business based on the views of 100 students taken as a sample from a few colleges and universities. As another example, suppose a company receives a shipment of parts from a manufacturer that are to be used in CD players manufactured by this company. To check the quality of the whole shipment, the company will select a few items from the shipment, inspect them, and make a decision. The area of statistics that deals with such decision-making procedures is referred to as inferential statistics. This branch of statistics is also called inductive reasoning or inductive statistics.
Functions of Statistics
1. To provide investigators means of measuring scientifically the conditions that may be
involved in a given problem and assessing the way in which they are related.
2. To show the laws underlying facts and events that cannot be determined by individual
observations.
3. To show relations of cause and effect that otherwise may remain unknown.
4. To find the trends and behavior in related conditions which otherwise may remain
ambiguous.
TERMINOLOGIES IN STATISTICS
Some important terms are commonly used in the study of Statistics. These terms should be
understood fully in order to facilitate the study of statistics.
1. Population refers to a large collection of objects, places, or things. To illustrate this, suppose a researcher wants to determine the average income of the residents of a certain barangay and there are 1500 residents in the barangay. Then all of these residents comprise the population. A population is usually denoted or represented by N. Hence, in this case, N = 1500.
2. Sample is a small portion or part of a population. It could also be defined as a sub-group, subset, or representative of a population. For instance, suppose the above-mentioned researcher does not have enough time and money to conduct the study using the whole population and he wants to use only 200 residents. These 200 residents comprise the sample. A sample is usually denoted by n; thus, n = 200.
3. Parameter is any numerical or nominal characteristic of a population. It is a value or measurement obtained from a population, and it is usually referred to as the true or actual value. In other words, it is any measurable characteristic of a population. If, in the preceding illustration, the researcher uses the whole population (N = 1500), then the average income obtained is called a parameter.
3. Interval Scale - If data are measured at the interval level, we can say not only that one object is greater or less than another, but we can also specify the amount of difference. The scores in an examination are of the interval scale of measurement. To illustrate, suppose Kensly Kyle got 50 in a Math examination while Kwenn Anne got 40. We can say that Kensly Kyle got a higher score than Kwenn Anne by 10 points. (Scores, IQ, current temperature in Fahrenheit or Celsius, pH, SAT scores, salary grade, etc.)
4. Ratio Scale - The ratio level of measurement is like the interval level. The only difference is that the ratio level always starts from an absolute or true zero point. In addition, at the ratio level, there is always the presence of units of measure. If data are measured at this level, we can say that one object is so many times as large or as small as another. For example, suppose Mrs. Reyes weighs 50 kg, while her daughter weighs 25 kg. We can say that Mrs. Reyes is twice as heavy as her daughter. Thus, weight is an example of data measured at the ratio level. (Distance of La Carlota to Bago City, amount of money in your account, electricity bills, water consumption, weight of a baby, number of vacation leaves, your height in centimeters, etc.)
SAMPLING TECHNIQUES
Sampling Technique- is a procedure used to determine the individuals or members of a
sample.
Sampling is performed so that a population under study can be reduced to a manageable size.
A - PROBABILITY OR RANDOM SAMPLING TECHNIQUE is a sampling technique wherein each member or element of the population has an equal chance of being selected as a member of the sample.
1. Simple Random Sampling
a. Lottery Method
Suppose Mrs. Cruz wants to send five students to attend a 2-day training or seminar in basic computer programming. To avoid bias in selecting these five students from her 40 students, she can use the lottery method. This is done by assigning a number to each student and then writing these numbers on pieces of paper. These pieces of paper will be rolled or folded and placed in a box called a lottery box. The lottery box should be thoroughly shaken, and then five pieces of paper will be picked or drawn from the box. The students who were assigned the numbers chosen will be sent to the training. In this case, the selection of the students is done without bias. Note that we can simply assign 1 to the first student, 2 to the second student, and so on.
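The lottery method above can be mimicked with a random-number generator. A minimal sketch (the class size of 40 and sample size of 5 come from the example; the seed is only there to make the draw repeatable):

```python
import random

random.seed(42)  # fixed seed so the draw is reproducible; remove for a real draw

students = list(range(1, 41))        # assign numbers 1..40 to the 40 students
chosen = random.sample(students, 5)  # draw 5 "papers" without replacement

print(sorted(chosen))
```

random.sample draws without replacement, so no student can be picked twice, just like drawing folded papers from the lottery box without returning them.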
b. Sampling with the Use of a Table of Random Numbers
Below is a portion of the Table of Random Numbers.
School    N          n
A         100        (100/480)(218) ≈ 45
B         80         36
C         200        91
D         60         27
E         40         19
Total     N = 480    n = 218

With a 5% margin of error (e = 0.05), Slovin's formula gives the total sample size:

n = N / (1 + Ne²)
  = 480 / (1 + 480(0.05)²)
  = 480 / (1 + 480(0.0025))
  = 480 / (1 + 1.2)
  = 480 / 2.2
  = 218.18 ≈ 218
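The computation above (Slovin's formula followed by proportional allocation per school) can be sketched as follows, using the same figures as the table:

```python
# Slovin's formula n = N / (1 + N * e^2), then proportional allocation per school.
N_per_school = {"A": 100, "B": 80, "C": 200, "D": 60, "E": 40}
N = sum(N_per_school.values())  # 480
e = 0.05                        # 5% margin of error

n = round(N / (1 + N * e**2))   # 480 / 2.2 = 218.18... -> 218

# Each school's share of the sample is proportional to its share of the population.
allocation = {school: round(size / N * n) for school, size in N_per_school.items()}
print(n, allocation)
```

Note that rounding each share independently can leave the total one short of n (here the rounded shares sum to 217), which is why the table adjusts school E from 18 to 19 so the allocations add up to 218.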
4. Cluster Sampling
Cluster sampling is sampling wherein groups or clusters instead of individuals are randomly
chosen. Recall that in the simple random sampling we select members of the sample
individually. In cluster sampling, we will select or draw the members of the sample by group
and then we select a sample of elements from each cluster or group randomly. Cluster
sampling is sometimes called area sampling because this is usually applied when population is
large.
To illustrate the use of this sampling method, let us suppose that we want to determine the average income of the families in Manila. Let us assume there are 250 barangays in Manila. We can draw a random sample of 20 barangays using simple random sampling, and then a certain number of families from each of the 20 barangays may be chosen.
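The two stages of the Manila illustration can be sketched as below. The figure of 100 listed families per barangay and the 10 families drawn from each are hypothetical, added only to make the sketch runnable:

```python
import random

random.seed(1)  # reproducible draw for the sketch

barangays = [f"Barangay {i}" for i in range(1, 251)]  # 250 barangays in Manila

# Stage 1: randomly choose 20 barangays (the clusters).
clusters = random.sample(barangays, 20)

# Stage 2: within each chosen barangay, draw some families
# (assume 100 listed families per barangay; draw 10 from each).
families_per_barangay = {
    b: random.sample(range(1, 101), 10)
    for b in clusters
}
print(len(clusters), sum(len(f) for f in families_per_barangay.values()))
```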
5. Multi-Stage Sampling
Multi-stage sampling is a combination of several sampling techniques. This method is usually used by researchers who are interested in studying a very large population, say the whole island of Luzon or even the Philippines. This is done by starting the selection of the members of the sample using cluster sampling and then dividing each cluster or group into strata. Then, from each stratum, individuals are drawn using simple random sampling.
B. Non-Probability or Non- Random Sampling Techniques
Non-probability sampling is a sampling technique wherein members of the sample are drawn from the population based on the judgment of the researchers. The results of a study using this sampling technique are relatively biased. This technique lacks objectivity of selection; hence, it is sometimes called subjective sampling. Inferences made based on samples obtained using this technique are not so reliable.
Non-probability sampling techniques are used because they are convenient and
economical. Researchers use these methods because they are inexpensive and easy to conduct.
1. Convenience Sampling
As the name implies, convenience sampling is used because of the convenience it offers to the researcher. For example, a researcher who wishes to investigate the most popular noontime show may just interview respondents over the telephone. The result of this interview will be biased because the opinions of those without telephones will not be included. Although convenience sampling may be used occasionally, we cannot depend on it in making inferences about a population.
2. Quota Sampling
In this type of sampling, the proportions of the various subgroups in the population are
determined and the sample is drawn to have the same percentages in it. This is very similar to stratified random sampling; the only difference is that the selection of the members of the sample using quota sampling is not done randomly. To illustrate this, let us suppose that we
want to determine the teenagers’ most favorite brand of T-shirt. If there are 1000 female and
1000 male teenagers in the population and we want to draw 150 members for our sample, we
can select 75 female and 75 male teenagers from the population without using randomization.
This is quota sampling.
3. Judgment or Purposive Sampling
Another method of drawing the members of the sample without using probability is purposive sampling. Let us suppose that the target is to find out the effectiveness of a certain kind of shampoo. Of course, bald fellows will not be part of the sample.
4. Incidental Sampling
This design is applied to those samples which are taken because they are the most available. The investigator simply takes the nearest individuals as subjects of the study until the sample reaches the desired size. In an interview, for instance, an interviewer can simply choose to ask those people around him or in a coffee shop where he is taking a break.
Parametric Tests. To use the parametric tests, some conditions should be met: the data must be normally distributed, and the level of measurement must be either interval or ratio.
The data are said to be normal when the value of skewness equals zero and the value of kurtosis is 2.65.
Parametric Tests are used when the data are in the interval and ratio scales. It is assumed that the data
are normally or nearly normally distributed.
1. T-test of Independent Means - It is used to compare two sample means when the two samples are independent of one another.
2. T-test of Dependent/Correlated Means - It is used to determine the significance of the difference between two means obtained by one group from two testing conditions.
Example: Is there a significant difference in the pre- and post-test scores of children after undergoing a remedial class?
- It is used for matched samples (where the two samples are not independent of one another, as they are matched) and for pre-test/post-test comparisons where the pre-test and post-test are taken on the same group of subjects.
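A pre-test/post-test comparison of this kind can be sketched with SciPy's paired t-test, assuming SciPy is available. The scores below are hypothetical, made up only for illustration:

```python
from scipy import stats

# Hypothetical pre- and post-test scores of the same 8 children.
pre  = [10, 12, 9, 14, 11, 8, 13, 10]
post = [14, 15, 11, 16, 12, 11, 15, 13]

# ttest_rel pairs the observations, so the order of subjects must match.
t_stat, p_value = stats.ttest_rel(post, pre)
print(t_stat, p_value)
```

A small p-value (e.g. below 0.05) would suggest a significant difference between the two testing conditions for this hypothetical data.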
3. Z-test - It is used to determine the significant difference between the means of two groups or conditions with more than 30 cases or observations.
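A two-sample z-test for the difference of two means can be sketched directly from the usual formula z = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂); the summary statistics below are hypothetical:

```python
import math
from scipy.stats import norm

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """Two-tailed z-test for the difference of two means (n > 30 per group)."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    z = (mean1 - mean2) / se
    p = 2 * norm.sf(abs(z))                    # two-tailed p from the normal curve
    return z, p

# Hypothetical summary statistics from two groups of 50 students each.
z, p = two_sample_z(mean1=75.0, mean2=72.0, sd1=8.0, sd2=7.5, n1=50, n2=50)
print(z, p)
```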
4. F-test or ANOVA (Analysis of Variance) - It is a statistical method that separates observed variance in data into different components to use for additional tests. It is used to determine the significant difference among the means of three or more independent groups.
Developed by Sir Ronald Aylmer Fisher.
Example: Do the four groups of students significantly differ in terms of academic performance?
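The four-group example can be sketched with SciPy's one-way ANOVA, assuming SciPy is available; the scores below are hypothetical:

```python
from scipy import stats

# Hypothetical exam scores of four independent groups of students.
g1 = [85, 86, 88, 75, 78, 94, 98]
g2 = [91, 92, 93, 85, 87, 84, 82]
g3 = [79, 78, 88, 94, 92, 85, 83]
g4 = [78, 69, 99, 65, 97, 75, 89]

# f_oneway compares the variance between groups to the variance within groups.
f_stat, p_value = stats.f_oneway(g1, g2, g3, g4)
print(f_stat, p_value)
```

If the resulting p-value is below the chosen significance level, at least one group mean differs from the others; ANOVA alone does not say which.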
NON-PARAMETRIC TESTS – are used when the data are in nominal or ordinal scales.
Non-parametric tests are methods of statistical analysis that do not require a distribution to meet the required assumptions to be analyzed (especially when the data are not normally distributed). For this reason, they are sometimes referred to as distribution-free tests.
1. Chi-Square Test - used to determine the difference or association of two or more sets of data in a nominal or ordinal type of scale.
- It can be used to test association in one or more groups, and it does this by comparing actual
(observed) numbers in each group, with those that would be expected according to theory
or simply by chance. Chi-Square test requires that the data be expressed as frequencies, i.e.
numbers in each category; this is nominal level of measurement. It should be noted that in
most cases almost any data can be reduced to categorical or frequency data, but it is not
always wise to do this because information is invariably lost in the process.
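A chi-square test of association on a frequency table can be sketched with SciPy, assuming it is available; the 2x3 contingency table below is hypothetical:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table: gender (rows) vs. preferred show (columns).
observed = [[30, 20, 10],
            [20, 25, 15]]

# chi2_contingency compares observed counts with the counts expected by chance.
chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p, dof)
```

The expected array returned alongside the statistic holds the counts that would be expected if the two variables were independent, which is exactly the comparison the text describes.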
2. Spearman Rank Order Correlation Coefficient (Spearman rho) - Named after Charles Spearman and often denoted by the Greek letter ρ (rho) or rs, it is a nonparametric measure of rank correlation. It assesses how well the relationship between two variables can be described using a monotonic function.
It is used to measure the relationship of paired ranks assigned to individual scores on two
variables.
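The rank correlation of paired scores can be sketched with SciPy's spearmanr, assuming SciPy is available; the scores of the six students below are hypothetical:

```python
from scipy.stats import spearmanr

# Hypothetical scores of 6 students on two variables.
math_scores    = [35, 23, 47, 17, 10, 43]
english_scores = [30, 33, 45, 23, 8, 49]

# spearmanr ranks each variable, then correlates the ranks.
rho, p_value = spearmanr(math_scores, english_scores)
print(rho, p_value)
```

With no tied scores, this matches the classical formula ρ = 1 − 6Σd²/(n(n² − 1)), where d is the difference between each pair of ranks.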
3. Gamma or Goodman and Kruskal's Gamma (G) - It is a measure of rank correlation, i.e., the similarity of the orderings of the data when ranked by each of the quantities. It measures the strength of association of cross-tabulated data when both variables are measured at the ordinal level.
This statistic (which is distinct from Goodman and Kruskal’s Lambda) is named after Leo
Goodman and William Kruskal, who proposed it in a series of papers from 1954 to 1972.
It is an alternative to Spearman's rho, used to determine whether or not there is a correlation between two ordinal variables.
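As far as I know SciPy has no direct gamma function, so a small hand-rolled sketch illustrates the idea: gamma = (C − D) / (C + D), where C and D count concordant and discordant pairs. The ordinal ratings below are hypothetical:

```python
def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D) over all pairs; tied pairs are ignored."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx * dy > 0:
                concordant += 1   # both variables order the pair the same way
            elif dx * dy < 0:
                discordant += 1   # the two variables order the pair oppositely
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical ordinal ratings (1 = low ... 3 = high) from 6 respondents.
satisfaction = [1, 1, 2, 2, 3, 3]
loyalty      = [1, 2, 2, 3, 3, 3]
print(goodman_kruskal_gamma(satisfaction, loyalty))
```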
4. Mann-Whitney U Test - used to test the significant difference of independent random samples from two groups with an uneven number of cases in ordinal form.
- It is a non-parametric alternative to the independent t-test.
- A popular nonparametric test to compare outcomes between two independent groups. Sometimes called the Mann-Whitney-Wilcoxon Test or Wilcoxon Rank Sum Test, it is used to test whether two samples are likely to derive from the same population (i.e., that the two populations have the same distribution).
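The test can be sketched with SciPy's mannwhitneyu, assuming SciPy is available; the two groups of unequal size below are hypothetical (note that older and newer SciPy versions report the U statistic for different samples, so the exact value of the statistic may vary by version):

```python
from scipy.stats import mannwhitneyu

# Hypothetical ordinal scores from two independent groups of unequal size.
group_a = [12, 15, 11, 18, 14, 16]
group_b = [9, 10, 13, 8, 7]

# Ranks both groups together and compares the rank sums.
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(u_stat, p_value)
```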
5. H-Test or Kruskal-Wallis Test - used to test the significant difference of independent random samples from three or more groups with an uneven number of cases in ordinal form.
- Sometimes also called the "one-way ANOVA on ranks", it is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable. It is considered the nonparametric alternative to the one-way ANOVA, and an extension of the Mann-Whitney U test to allow the comparison of more than two independent groups.
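A three-group comparison can be sketched with SciPy's kruskal, assuming SciPy is available; the uneven groups below are hypothetical:

```python
from scipy.stats import kruskal

# Hypothetical ordinal scores from three independent groups of uneven size.
group_1 = [7, 14, 14, 13, 12, 9]
group_2 = [15, 17, 13, 15, 16]
group_3 = [6, 8, 8, 9, 5, 14, 13]

# Ranks all observations together, then compares the groups' mean ranks.
h_stat, p_value = kruskal(group_1, group_2, group_3)
print(h_stat, p_value)
```

As with ANOVA, a significant result only says that at least one group differs; follow-up pairwise comparisons are needed to locate the difference.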
6. Phi Coefficient - used to measure the degree of association between two binary variables or two nominal dichotomous variables. These are referred to as binary variables and include responses to yes/no questions or, in many contexts, gender (i.e., male/female).
The phi coefficient should be used when a measure of association is desired between two categorical variables, each with only two possible outcomes.
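The phi coefficient for a 2x2 table [[a, b], [c, d]] is φ = (ad − bc) / √((a+b)(c+d)(a+c)(b+d)), which can be sketched directly; the counts below are hypothetical:

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table [[a, b], [c, d]] of two dichotomous variables."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical 2x2 table: gender (rows) vs. yes/no answer (columns).
phi = phi_coefficient(20, 10, 5, 25)
print(phi)
```

Like a correlation coefficient, phi ranges from -1 to 1, with 0 indicating no association between the two binary variables.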
7. Friedman's Two-Way Analysis of Variance by Ranks - It is used when the data are from related samples, are at least in an ordinal scale, and have been taken from similar populations.
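The Friedman test can be sketched with SciPy's friedmanchisquare, assuming SciPy is available; the ratings of the same seven subjects under three conditions below are hypothetical:

```python
from scipy.stats import friedmanchisquare

# Hypothetical ordinal ratings of the same 7 subjects under three conditions.
cond_1 = [4, 3, 5, 4, 3, 5, 4]
cond_2 = [3, 2, 4, 3, 2, 4, 3]
cond_3 = [2, 1, 3, 2, 1, 3, 2]

# Ranks the conditions within each subject, then compares the rank sums.
chi2_stat, p_value = friedmanchisquare(cond_1, cond_2, cond_3)
print(chi2_stat, p_value)
```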
8. Kendall's Coefficient of Concordance (W) - It is used to determine the relationship among three or more sets of ranks.
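Kendall's W can be sketched from the usual tie-free formula W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of the rank totals and m judges rank n items; the judges' ranks below are hypothetical:

```python
def kendalls_w(ratings):
    """W for m judges each ranking the same n items (no ties assumed)."""
    m, n = len(ratings), len(ratings[0])
    totals = [sum(judge[i] for judge in ratings) for i in range(n)]  # rank sums
    mean_total = sum(totals) / n
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m**2 * (n**3 - n))

# Hypothetical ranks given by 3 judges to 4 contestants.
judges = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
print(kendalls_w(judges))
```

W ranges from 0 (no agreement among the judges) to 1 (all judges produce identical rankings).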
9. McNemar Test - It is a nonparametric test used to analyze paired nominal data. It is a test on a 2x2 contingency table and checks the marginal homogeneity of two dichotomous variables. The test requires one nominal variable with two categories (dichotomous) and one independent variable with two dependent groups.
This is a before-and-after design that tests whether there is a significant change between the before and after situations.
Example: Is there a significant difference in the use of seat belt before and after involvement
in an automobile accident?
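The exact McNemar test can be carried out as a binomial test on the discordant pairs (the cells where the before and after answers disagree), assuming SciPy is available; the 2x2 counts below are hypothetical:

```python
from scipy.stats import binomtest

# Hypothetical paired yes/no data: seat-belt use before vs. after an accident.
#                   after: yes   after: no
#   before: yes        a=5          b=3
#   before: no         c=15         d=7
b, c = 3, 15  # only the discordant pairs drive McNemar's test

# Exact McNemar test = two-sided binomial test of b successes in b + c trials,
# since under the null hypothesis each discordant pair flips either way with p = 0.5.
result = binomtest(b, b + c, 0.5)
print(result.pvalue)
```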
10. Sign Test for Correlated Samples (Fisher Sign Test). This test is under nonparametric
statistics. It is the counterpart of the t-test for correlated sample under the parametric test.
The Fisher Sign Test compares two correlated samples and is applicable to data composed of
N paired observations. The difference between each pair of observations is obtained. This
test is based on the idea that half the difference between the paired observations will be
positive and the other half will be negative.
Example: Is there a significant difference in the academic performance of the students before and after the implementation of the program?