DIFFERENT STATISTICAL TOOLS


STATISTICS

The word statistics has two meanings. In its more common usage, statistics refers to
numerical facts. The income of a family, the number of cars sold at a dealership during
the past month, the number of employees of a company, the number of students enrolled
in a class, the increase or decrease of enrollment in a certain university, the kind of food
frequently served to customers, and the starting salary of a typical college graduate are
examples of statistics in this sense.

The second meaning of statistics refers to the field or discipline of study. In this sense, the
word statistics is defined as follows.

Statistics is a group of methods that are used to collect, organize, present, analyze, and interpret
data to make decisions.

Every day we make decisions that may be personal, business related, or of some other kind.
Usually, these decisions are made under conditions of uncertainty. Many times, the situations or
problems we face in the real world have no precise or definite solution. Statistical methods help
us make scientific and intelligent decisions in such situations. Decisions made by using
statistical methods are called educated guesses. Decisions made without using statistical (or
scientific) methods are pure guesses and hence may prove to be unreliable.

Statistics is a scientific body of knowledge that deals with the collection, organization or
presentation, analysis, and interpretation of data.

Collection refers to the gathering of information or data.


Organization or presentation involves summarizing data or information in textual,
graphical or tabular forms.
Analysis involves describing the data by using statistical methods and procedures.
Interpretation refers to the process of making conclusions based on the analyzed data.

Statistics is the science of collecting, organizing, analyzing, and interpreting numerical
facts, which we call data.

Like almost all fields of study, statistics has two aspects: theoretical and applied.
Theoretical or mathematical statistics deals with the development, derivation, and
proofs of statistical theorems, formulas, rules, and laws. Applied statistics involves the
application of those theorems, formulas, rules, and laws to solve real-world problems.
Applied statistics can be categorized into two areas: descriptive statistics and inferential
statistics.

Descriptive Statistics consists of methods for organizing, displaying, and describing data
by using tables, graphs, and summary measures.

Suppose we have information on the 2022 total sales of the top 100 companies in the
Philippines. In statistical terminology, the whole set of numbers that represent the sales
of the top 100 companies is called a data set, the name of each company is an element,
and the sales figure of each company is called an observation.

A data set in its original form is usually very large. Consequently, such a data set is not
very helpful in drawing conclusions or making decisions. It is easier to draw conclusions
from summary tables and diagrams than from the original version of a data set. So, we
reduce data to a manageable size by constructing tables, drawing graphs, or calculating
summary measures such as averages. The portion of statistics that helps us to do this
type of statistical analysis is called descriptive statistics.
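As a small illustration of this data reduction, the summary measures mentioned above can be computed with Python's standard statistics module. The sales figures below are hypothetical, not actual 2022 company data:

```python
# A minimal sketch of descriptive statistics: reducing a raw data set
# to a few summary measures. The sales values are made-up illustrative
# numbers (say, in millions of pesos).
import statistics

sales = [120, 95, 310, 150, 150, 87, 240, 65, 410, 133]

summary = {
    "count": len(sales),
    "mean": statistics.mean(sales),      # average sales
    "median": statistics.median(sales),  # middle value when sorted
    "stdev": statistics.stdev(sales),    # spread around the mean
}
print(summary)
```

The raw list of ten numbers is reduced to four summary measures, which is exactly the kind of reduction descriptive statistics performs on a large data set.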

Inferential statistics consists of methods that use sample results to help make
predictions.

In statistics, the collection of all elements of interest is called a population. The few
elements selected from the population are called a sample.

A major portion of statistics deals with making decisions, inferences, predictions, or forecasts
about populations based on results obtained from samples. For example, we may make some
decisions about the views of all college and university students on the role of ethics in business
based on the views of 100 students taken as a sample from a few colleges and universities.

As another example, suppose a company receives a shipment of parts from a manufacturer that
are to be used in CD players manufactured by this company. To check the quality of the whole
shipment, the company will select a few items from the shipment, inspect them, and make a
decision. The area of statistics that deals with such decision-making procedures is referred to as
inferential statistics. This branch of statistics is also called inductive reasoning or inductive
statistics.

Functions of Statistics
1. To provide investigators with a means of measuring scientifically the conditions that may be
involved in a given problem and of assessing the way in which they are related.
2. To show the laws underlying facts and events that cannot be determined by individual
observations.
3. To show relations of cause and effect that otherwise may remain unknown.
4. To find the trends and behavior in related conditions which otherwise may remain
ambiguous.

Importance of Statistics to Research


1. It gives the most exact kind of description.
2. It provides the most definite and exact procedures in analyzing data.
3. It summarizes results in a meaningful and convenient form.
4. It draws a general conclusion.
5. It predicts possible outcomes under certain conditions.

TERMINOLOGIES IN STATISTICS
Some important terms are commonly used in the study of Statistics. These terms should be
understood fully in order to facilitate the study of statistics.
1. Population refers to a large collection of objects, places or things. To illustrate this,
suppose a researcher wants to determine the average income of the residents of a
certain barangay and there are 1500 residents in the barangay. Then all of these
residents comprise the population. A population is usually denoted or represented by
N. Hence, in this case, N = 1500.
2. Sample is a small portion or part of a population. It could also be defined as a subgroup,
subset, or representative of a population. For instance, suppose the above-mentioned
researcher does not have enough time and money to conduct the study using the whole
population and he wants to use only 200 residents. These 200 residents comprise the
sample. A sample is usually denoted by n, thus n = 200.
3. Parameter is any numerical or nominal characteristic of a population. It is a value or
measurement obtained from a population and is usually referred to as the true or actual
value. If, in the preceding illustration, the researcher uses the whole population
(N = 1500), then the average income obtained is called a parameter.
- It is any characteristic of a population that is measurable.

4. Statistic is an estimate of a parameter. It is a value or measurement obtained from the
sample. If the researcher in the preceding illustration makes use of the sample (n = 200),
then the average income obtained is called a statistic.
5. Data (the singular form is datum) are facts, or a set of information or observations,
under study. More specifically, data are gathered by the researcher from a population or
from a sample. Data may be classified into two categories, qualitative or quantitative.
a. Qualitative data are data which assume values that manifest the concept of
attributes. These are sometimes called categorical data. Data falling in this category
cannot be subjected to meaningful arithmetic: they cannot be added, subtracted, or
divided. Gender and nationality are qualitative data.
b. Quantitative Data are data which are numerical in nature. These are data obtained
from counting or measuring. In addition, meaningful arithmetic operations can be
done with this type of data. Test scores and height are quantitative data.
6. A Variable is a characteristic or property of a population or sample which makes the
members different from each other. If a class consists of boys and girls, then gender is a
variable in this class. Height is also a variable because different people have different
heights. Variables may be classified on the basis of whether they are discrete or
continuous and whether they are dependent or independent.
a. Discrete Variable
A discrete variable is one that can assume a finite number of values. In other words, it can
assume specific values only. The values of a discrete variable are obtained through the process
of counting. The number of students in a class is a discrete variable. If there are 40 students in
a class, it cannot be reported that there are 40.2 students or 40.5 students, because it is
impossible for a fractional part of a student to be in the class.
b. Continuous Variable
A continuous variable is one that can assume infinite values within a specified interval. The
values of a continuous variable are obtained through measuring. Height is a continuous
variable. If one reports that the height of a building is 15 m, it is also possible that another
person reports that the height of the same building is 15.1m or 15.12m, depending on the
precision of the measuring device used. In other words, height of the building can assume
several values.
c. Dependent Variable
A dependent variable is a variable which is affected or influenced by another variable.
d. Independent Variable
An independent variable is one that affects or influences the dependent variable. To
illustrate Independent and dependent variables, consider the problem entitled, The Effect of
Computer-Assisted Instruction on the Students’ Achievement in Mathematics. Here the
independent variable is the computer-assisted instruction while the dependent variable is the
achievement of students in mathematics.
7. Constant refers to fundamental quantities that do not change in value. Fixed costs
and the acceleration due to gravity are examples of constants.
SCALES OF MEASUREMENT
1. Nominal Scale- This is the most primitive level of measurement. The nominal level
of measurement is used when we want to distinguish one object from another for
identification purposes. In this level, we can only say that one object is different
from another, but the amount of difference between them cannot be determined.
We cannot tell that one is better or worse than the other. Gender, nationality and
civil status are of nominal scale.
(occupation, course & major, blood type, race, color, hotel rooms, names of
companies, different makes of cars, political affiliation, religious groupings, etc.)
2. Ordinal scale – in the ordinal level of measurement, data are arranged in some
specified order or rank. When objects are measured in this level, we can say that
one is better or greater than the other. But we cannot tell how much more or how
much less of the characteristic one object has than the other. The rankings of
contestants in a beauty contest, of siblings in the family, or of honor students in the
class are of ordinal scale.
Academic Performance [Outstanding, Very Satisfactory, Satisfactory, Fairly
satisfactory, Did Not meet expectation]
level of satisfaction: (Satisfied, partially satisfied, not satisfied)
level of work performance: High, Average, low
assessing the degree of agreement (totally agree, agree, neutral, disagree, totally
disagree)
evaluating the frequency of occurrences (Very often, often, not often, not at all)
medical condition (serious, guarded, critical)
level of aggressiveness

3. Interval Scale- If data are measured in the interval level, we can say not only that
one object is greater or less than another, but we can also specify the amount of
difference. The scores in an examination are of interval scale of measurement. To
illustrate, suppose Kensly Kyle got 50 in a Math examination while Kwenn Anne got
40. We can say that Kensly Kyle got a higher score than Kwenn Anne by 10 points.
(scores, IQ, current temperature in Fahrenheit or Celsius, pH, SAT scores, salary grade, etc.)
4. Ratio Scale- The ratio level of measurement is like the interval level. The only
difference is that the ratio level always starts from an absolute or true zero point. In
addition, in the ratio level, there is always the presence of units of measure. If data
are measured in this level, we can say that one object is so many times as large or as
small as the other. For example, suppose Mrs. Reyes weighs 50 kg while her
daughter weighs 25 kg. We can say that Mrs. Reyes is twice as heavy as her
daughter. Thus, weight is an example of data measured in the ratio scale.
(distance from La Carlota to Bago City, amount of money in your account, electricity
bills, water consumption, weight of a baby, number of vacation leaves, your height
in centimeters, etc.)
SAMPLING TECHNIQUES
Sampling Technique- is a procedure used to determine the individuals or members of a
sample.
Sampling is performed so that a population under study can be reduced to a manageable
size.
A – PROBABILITY OR RANDOM SAMPLING TECHNIQUE is a sampling technique wherein each
member or element of the population has an equal chance of being selected as members of
the sample.
1. Simple Random Sampling
a. Lottery Method
Suppose Mrs. Cruz wants to send five students to attend a 2-day training or seminar in basic
computer programming. To avoid bias in selecting these five students from her 40 students,
she can use lottery sampling. This is done by assigning a number to each student and writing
these numbers on pieces of paper. Then, these pieces of paper are rolled or folded and placed
in a box called a lottery box. The lottery box is thoroughly shaken, and then five pieces of
paper are picked or drawn from the box. The students assigned to the numbers chosen will be
sent to the training. In this case, the selection of the students is done without bias. Note that
we can simply assign 1 to the first student, 2 to the second student, and so on.
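The lottery method amounts to drawing without replacement, which can be sketched in Python with random.sample. The seed is only there to make this illustration reproducible:

```python
# A sketch of the lottery method: pulling 5 numbered slips, without
# replacement, from a box containing slips numbered 1 to 40.
import random

random.seed(2024)                      # for a reproducible illustration only
students = list(range(1, 41))          # students numbered 1 to 40
chosen = random.sample(students, 5)    # draw 5 slips without replacement
print(chosen)
```

Each student has the same chance of selection, which is exactly the requirement of probability sampling.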
b. Sampling with the use of Table of Random Numbers
Below is a portion of a Table of Random Numbers.

31871 60770 59235 41702
87134 32839 17850 37359
06728 16314 81076 42172
95646 67486 05167 07819
44085 87246 47378 98338
Let us illustrate how these random numbers are used to select the members of the sample. Let
us consider the preceding example wherein Mrs. Cruz wants to select 5 students from her 40
students. Again, we will assign a number to each student, say from 1 to 40.
Since there are 40 students, we will use two-digit numbers from the table of random numbers
when selecting the members of the sample. This is because the students have been assigned
the numbers 01, 02, 03, . . . up to 40. Looking at the first column of the table of random
numbers above, we see that the number formed by the first two digits is 31; hence, the student
assigned to number 31 is chosen as a member of the sample. If we proceed down the column,
we see that the number formed is 87, which cannot be used because we have only 40 members.
In a similar manner, the third number is 06, so the student assigned to number 6 is chosen.
Notice that the next two numbers from the table are 95 and 44, numbers we cannot use for the
same reason as before. When we get to the bottom of the column, we move up the column and
merely shift one digit to the right for the next random number. Thus, we will have 18 as our
next number. This is one of many alternatives; we can have other ways of selecting the
members of the sample until we complete the 5 students.
2. Systematic Sampling
Let us use the example wherein Mrs. Cruz wants to select 5 students from her 40
students. First, we determine the sampling interval. This is done by dividing the number of
members in the population by the number of members in the sample. Hence, in our case
we shall have i = 40/5 = 8. The next step is to write the numbers 1, 2, 3, 4, 5, 6, 7, and 8 on
pieces of paper and draw one number by lottery; this serves as the random starting point. If we
draw 5, we then select every 8th student beginning with the 5th. Therefore, the 5th, 13th, 21st,
29th, and 37th students shall be the members of the sample. If, for instance, we were to obtain
the number 6, then the members of the sample would be the 6th, 14th, 22nd, 30th, and 38th students.
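A quick sketch of the systematic procedure, with the sampling interval k = N // n and the random start passed in explicitly (a real application would draw the start by lottery):

```python
# A sketch of systematic sampling: pick a random start between 1 and k,
# then take every k-th member after it.
def systematic_sample(N, k, start):
    """Select every k-th member from `start` up to N (1-indexed)."""
    return list(range(start, N + 1, k))

# N = 40 students, n = 5, so k = 40 // 5 = 8; a start of 5 selects the
# 5th, 13th, 21st, 29th, and 37th students
print(systematic_sample(40, 8, 5))
```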
3. Stratified Random Sampling
There are some instances whereby the members of the population do not belong to the
same category, class, or group. To illustrate this, let us suppose that we want to determine the
average income of the families in a certain community or barangay. In a typical barangay,
different families belong to different income brackets. If we will draw or select members of the
sample using simple random sampling, there is a chance that none of the families, or a
disproportionate number of the families, from the low-income, average-income, or high-
income group will be included in the sample. In this case, the result of the study would turn
out to be biased. For example, if the sample comes only from the high-income families, then we
will conclude that the average income of the families living in this barangay is high. This
suggests that the sample should be drawn proportionally from each group or category: the
high-, the average-, and the low-income families.
To do this, we will use stratified random sampling. The word stratified comes from the root
word strata, which means groups or categories (the singular form is stratum). When we use this
method, we are actually dividing the elements of the population into different categories or
subpopulation and then the members of the sample are drawn or selected proportionally from
each subpopulation.
Example. Suppose a community consists of 5000 families belonging to different income
brackets. We will draw 200 families as our random sample using stratified random sampling.
Below are the subpopulations and corresponding number of families belonging to each
subpopulation or stratum.

Strata                     Number of Families
High-Income Families       1000
Average-Income Families    2500
Low-Income Families        1500
Total                      N = 5000
Solution: The first step is to find the percentage of each stratum. This is done by dividing the
number of families in each stratum by the total number of families. Then, we multiply each
percentage by the desired number of families in the sample.

Strata     Number of Families   Percentage                Number of Families in the Sample
High       1000                 1000/5000 = 0.2 or 20%    0.2 x 200 = 40
Average    2500                 2500/5000 = 0.5 or 50%    0.5 x 200 = 100
Low        1500                 1500/5000 = 0.3 or 30%    0.3 x 200 = 60
Total      N = 5000                                       n = 200
From the above table, we see that if we are going to draw 200 members from the population of
5000, we should draw 40 families belonging to the high-income, 100 from the average, and 60
from the low-income groups. Observe that the number of families drawn as sample in each
stratum is proportional to the number of families from the population.

Another example: suppose we wish to draw a sample from N = 480 students distributed among
five schools, using a 5% margin of error (e = 0.05). The sample size is computed as

n = N / (1 + Ne²) = 480 / (1 + 480(0.05)²) = 480 / (1 + 480(0.0025)) = 480 / (1 + 1.2) = 480 / 2.2 = 218.18 ≈ 218

The 218 sample members are then allocated proportionally; for example, school A gets
(100/480) x 218 ≈ 45 students.

School    N         n
A         100       45
B         80        36
C         200       91
D         60        27
E         40        19
Total     N = 480   n = 218
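The computations above (the sample-size formula n = N / (1 + Ne²), often called Slovin's formula, and the proportional allocation used in the stratified barangay example) can be sketched in Python:

```python
# A sketch of sample-size computation and proportional allocation.
def slovin(N, e):
    """Sample size for population N at margin of error e: N / (1 + N e^2)."""
    return N / (1 + N * e ** 2)

def proportional_allocation(strata_sizes, n):
    """Allocate a sample of size n across strata in proportion to stratum size.
    Note: rounding can make the allocations miss n by a unit; adjust if needed."""
    N = sum(strata_sizes.values())
    return {name: round(size / N * n) for name, size in strata_sizes.items()}

print(round(slovin(480, 0.05)))  # 480 / 2.2 = 218.18 -> 218
# barangay income example: N = 5000 families, n = 200
print(proportional_allocation({"high": 1000, "average": 2500, "low": 1500}, 200))
```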
4. Cluster Sampling
Cluster sampling is a technique wherein groups or clusters, instead of individuals, are randomly
chosen. Recall that in simple random sampling we select members of the sample
individually. In cluster sampling, we draw the members of the sample by group,
and then we select a sample of elements from each cluster or group randomly. Cluster
sampling is sometimes called area sampling because it is usually applied when the population
is large.
To illustrate the use of this sampling method, let us suppose that we want to determine the
average income of the families in Manila. Let us assume there are 250 barangays in Manila. We
can draw a random sample of 20 barangays using simple random sampling, and then a certain
number of families from each of the 20 barangays may be chosen.
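The two stages (clusters first, then elements within each chosen cluster) can be sketched with hypothetical data; the seed is only for reproducibility:

```python
# A sketch of cluster sampling: barangays (clusters) are drawn at random
# first, then families are drawn from each selected barangay.
import random

random.seed(7)  # for a reproducible illustration only

# 10 hypothetical barangays, each containing 50 hypothetical families
barangays = {f"Barangay {i}": [f"family {i}-{j}" for j in range(1, 51)]
             for i in range(1, 11)}

clusters = random.sample(sorted(barangays), 3)          # stage 1: pick clusters
sample = [fam for b in clusters
          for fam in random.sample(barangays[b], 5)]    # stage 2: pick families
print(len(sample))  # 3 clusters x 5 families = 15
```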
5. Multi-Stage Sampling
Multi-stage sampling is a combination of several sampling techniques. This method is
usually used by researchers who are interested in studying a very large population, say the
whole island of Luzon or even the entire Philippines. This is done by starting the selection of
the members of the sample using cluster sampling and then dividing each cluster or group into
strata. Then, from each stratum, individuals are drawn using simple random sampling.
B. Non-Probability or Non- Random Sampling Techniques
Non-probability sampling is a technique wherein members of the sample
are drawn from the population based on the judgment of the researcher. The results of a
study using this sampling technique are relatively biased. This technique lacks objectivity of
selection; hence, it is sometimes called subjective sampling. Inferences made from a
sample obtained using this technique are not very reliable.
Non-probability sampling techniques are used because they are convenient and
economical. Researchers use these methods because they are inexpensive and easy to conduct.
1. Convenience Sampling
As the name implies, convenience sampling is used because of the convenience it offers
to the researcher. For example, a researcher who wishes to investigate the most popular
noontime show may just interview respondents by telephone. The results of this
interview will be biased because the opinions of those without telephones will not be included.
Although convenience sampling may be used occasionally, we cannot depend on it in making
inferences about a population.
2. Quota Sampling
In this type of sampling, the proportions of the various subgroups in the population are
determined, and the sample is drawn to have the same percentages in it. This is very similar to
stratified random sampling; the only difference is that in quota sampling the selection of the
members of the sample is not done randomly. To illustrate this, let us suppose that we
want to determine the teenagers’ most favorite brand of T-shirt. If there are 1000 female and
1000 male teenagers in the population and we want to draw 150 members for our sample, we
can select 75 female and 75 male teenagers from the population without using randomization.
This is quota sampling.
3. Judgment or Purposive Sampling
Another method of drawing the members of the sample without randomization is
purposive sampling, in which subjects are deliberately chosen to fit the purpose of the study.
Suppose the target is to find out the effectiveness of a certain kind of shampoo. Of course,
bald fellows will not be included in the sample.
4. Incidental Sampling
This design is applied to those samples which are taken because they are the most
readily available. The investigator simply takes the nearest individuals as subjects of the study
until the desired sample size is reached. In an interview, for instance, an interviewer can simply
choose to ask the people around him, or in a coffee shop where he is taking a break.

KINDS OF STATISTICAL TESTS


Statistical tests can be grouped into two: parametric and nonparametric tests.

The parametric tests. To use the parametric tests, certain conditions must be met: the data
must be normally distributed, and the level of measurement must be either interval or
ratio.

The data are said to be normal when the value of skewness equals zero and the percentile coefficient of kurtosis is approximately 0.265 (equivalently, excess kurtosis is zero).

Parametric tests are used when the data are in the interval or ratio scales. It is assumed that
the data are normally or nearly normally distributed.

1. T-Test of Independent/Uncorrelated Means – used to determine whether or not an
observed difference between the averages of two different/independent groups is statistically
significant. The data are of the interval scale or in the form of scores; the number of cases is
less than 30.
Example: Is there a significant difference between the two groups of children in terms of
achievement in Mathematics?
- Is there a significant difference in the performance of students in two classrooms?
- It is used to compare two sample means when the two samples are independent of
one another.
2. T-Test of Dependent/Correlated Means – used to determine the significance of the
difference between two means obtained by one group under two testing conditions.
Example: Is there a significant difference in the pre-test and post-test scores of children after
undergoing a remedial class?
- It is used for matched samples (where the two samples are not independent of one
another, as they are matched) and for pre-test/post-test comparisons where the pre-test
and post-test are taken by the same group of subjects.
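Both t-tests described above can be run with scipy.stats, assuming SciPy is available; all scores below are hypothetical:

```python
# A sketch of both t-tests: ttest_ind for two independent groups,
# ttest_rel for paired pre-test/post-test scores of one group.
from scipy import stats

group_a = [85, 90, 78, 92, 88, 76, 81, 95]   # e.g. classroom A
group_b = [70, 65, 80, 72, 68, 75, 71, 69]   # e.g. classroom B
t_ind, p_ind = stats.ttest_ind(group_a, group_b)

pre  = [55, 60, 48, 62, 50, 58]              # before the remedial class
post = [65, 72, 55, 70, 63, 66]              # after the remedial class
t_rel, p_rel = stats.ttest_rel(pre, post)

# compare each p-value with the chosen significance level (e.g. 0.05)
print(p_ind, p_rel)
```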

3. Z-Test – used to determine the significant difference between the means of two groups or
conditions when there are more than 30 cases or observations.

4. F-Test or ANOVA (Analysis of Variance) – a statistical method that separates observed
variance into different components for use in additional tests. It is used to determine the
significant difference among the means of three or more independent groups. It was
developed by Sir Ronald Aylmer Fisher.
Example: Do the four groups of students significantly differ in terms of academic performance?
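A one-way ANOVA on hypothetical scores can be sketched with scipy.stats.f_oneway (assumed available):

```python
# A sketch of a one-way ANOVA comparing the mean scores of three
# hypothetical, clearly separated groups.
from scipy import stats

group1 = [80, 85, 88, 90, 84]
group2 = [70, 75, 72, 78, 74]
group3 = [60, 65, 58, 62, 64]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f_stat, p_value)  # a small p suggests at least one mean differs
```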

5. Pearson Product Moment Correlation (Pearson r) – used to determine whether there is a
correlation between two variables with a linear relationship and an interval-ratio type of
scale. If the relationship is curvilinear, the eta correlation is recommended.
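Pearson's r can be computed with scipy.stats.pearsonr (assumed available); the hypothetical x, y pairs below lie exactly on a line, so r is 1:

```python
# A sketch of Pearson's r on perfectly linear hypothetical data.
from scipy import stats

hours_studied = [1, 2, 3, 4, 5]
test_score    = [52, 54, 56, 58, 60]   # exactly linear in hours studied

r, p = stats.pearsonr(hours_studied, test_score)
print(r)  # r = 1 for a perfect positive linear relationship
```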
6. Eta Correlation – used when the relationship between two sets of variables is not linear.
7. Scheffé's Test/ Posteriori T-Test/ Tukey/ Post Hoc/ Duncan Multiple Range Test – used to
determine the significant difference between the means of two groups. It is used to determine
which pairs of group means are significantly different or associated when the data are in
interval-ratio scale.
8. Point Biserial Coefficient of Correlation – used to find out whether there is a correlation
between interval (quantitative) and nominal data, or when one variable is a two-category split
and this dichotomy is considered real and not arbitrary.
9. Linear Regression – simple linear regression analysis is used when there is a significant
relationship between the x and y variables. It is used to predict the value of y given the value of x.
10. Analysis of Covariance (ANCOVA) – used to control or reduce the effect of one or more
uncontrollable variables, known as covariates, on the dependent variable.

NON-PARAMETRIC TESTS – are used when the data are in nominal or ordinal scales.
Non-parametric tests are methods of statistical analysis that do not require the data to satisfy
distributional assumptions (especially when the data are not normally distributed). For this
reason, they are sometimes referred to as distribution-free tests.

1. Chi-Square Test – used to determine the difference or association between two or more sets
of data in a nominal-ordinal type of scale.
- It can be used to test association in one or more groups, and it does this by comparing the
actual (observed) numbers in each group with those that would be expected according to
theory or simply by chance. The Chi-Square test requires that the data be expressed as
frequencies, i.e., counts in each category; this is the nominal level of measurement. It should
be noted that in most cases almost any data can be reduced to categorical or frequency data,
but it is not always wise to do this because information is invariably lost in the process.
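A chi-square test of association on a 2x2 frequency table can be sketched with scipy.stats.chi2_contingency (assumed available); the counts are hypothetical:

```python
# A sketch of a chi-square test of association: rows are gender,
# columns are yes/no responses (hypothetical frequencies).
from scipy import stats

observed = [[30, 10],    # e.g. male:   30 yes, 10 no
            [20, 40]]    # e.g. female: 20 yes, 40 no

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, p, dof)  # dof = (rows - 1)(cols - 1) = 1
```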
2. Spearman Rank Order Correlation Coefficient (Spearman rho) – named after Charles Spearman
and often denoted by the Greek letter ρ or by rs, it is a nonparametric measure of rank
correlation. It assesses how well the relationship between two variables can be described
using a monotonic function.
It is used to measure the relationship of paired ranks assigned to individual scores on two
variables.
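Spearman's rho can be computed with scipy.stats.spearmanr (assumed available); the second hypothetical list rises whenever the first does, so rho is 1 even though the relationship is not linear:

```python
# A sketch of Spearman's rho on a perfectly monotonic relationship.
from scipy import stats

judge1 = [1, 2, 3, 4, 5]
judge2 = [2, 5, 9, 20, 50]   # monotonically increasing with judge1

rho, p = stats.spearmanr(judge1, judge2)
print(rho)  # rho = 1 for a perfectly monotonic relationship
```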
3. Gamma or Goodman and Kruskal's Gamma (G) – a measure of rank correlation, i.e., the
similarity of the orderings of the data when ranked by each of the quantities. It measures
the strength of association of cross-tabulated data when both variables are measured at
the ordinal level.
This statistic (which is distinct from Goodman and Kruskal's lambda) is named after Leo
Goodman and William Kruskal, who proposed it in a series of papers from 1954 to 1972.
It is an alternative to Spearman's rho, used to determine whether or not there is a
correlation between two ordinal variables.
4. Mann-Whitney U Test – used to test the significant difference between independently drawn
random samples from two groups with an uneven number of cases, in ordinal form.
- It is a nonparametric alternative to the independent t-test.
- A popular nonparametric test to compare outcomes between two independent groups. It
is sometimes called the Mann-Whitney Wilcoxon Test or Wilcoxon Rank Sum Test, and is used
to test whether two samples are likely to derive from the same population (i.e., that the two
populations have the same distribution).
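The U test can be run with scipy.stats.mannwhitneyu (assumed available) on two hypothetical groups of unequal size:

```python
# A sketch of the Mann-Whitney U test on two independent groups
# with an uneven number of cases.
from scipy import stats

group_a = [12, 15, 14, 10, 18, 17, 13]   # n = 7
group_b = [8, 9, 11, 7, 10]              # n = 5 (unequal sizes are fine)

u_stat, p_value = stats.mannwhitneyu(group_a, group_b)
print(u_stat, p_value)
```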
5. H-Test or Kruskal-Wallis Test – used to test the significant difference between independently
drawn random samples from three or more groups with an uneven number of cases, in ordinal
form.
- Sometimes also called the "one-way ANOVA on ranks," it is a rank-based nonparametric test
that can be used to determine whether there are statistically significant differences between
two or more groups of an independent variable on a continuous or ordinal dependent variable.
It is considered the nonparametric alternative to the one-way ANOVA, and an extension of the
Mann-Whitney U test that allows the comparison of more than two independent groups.
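The H test can be run with scipy.stats.kruskal (assumed available) on three hypothetical groups of unequal size:

```python
# A sketch of the Kruskal-Wallis H test on three independent groups
# with an uneven number of cases.
from scipy import stats

g1 = [7, 8, 6, 9, 8]
g2 = [5, 4, 6, 5]
g3 = [2, 3, 1, 2, 3, 2]

h_stat, p_value = stats.kruskal(g1, g2, g3)
print(h_stat, p_value)
```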
6. Phi Coefficient – used to measure the degree of association between two binary variables,
i.e., two nominal dichotomous variables. These include responses to yes/no questions or, in
many contexts, gender (i.e., male/female).
The phi coefficient should be used when a measure of association is desired between two
categorical variables, each of which has only two possible outcomes.
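The phi coefficient can be computed directly from its textbook formula, phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)), for a 2x2 table [[a, b], [c, d]] of hypothetical counts:

```python
# A sketch of the phi coefficient for a 2x2 table of two dichotomous
# variables, computed from the formula above.
import math

def phi_coefficient(a, b, c, d):
    """2x2 table [[a, b], [c, d]] of observed frequencies."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

print(phi_coefficient(10, 0, 0, 10))   # perfect association -> 1.0
print(phi_coefficient(5, 5, 5, 5))     # no association -> 0.0
```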

7. Friedman's Two-Way Analysis of Variance by Ranks – used when the data come from related
samples, are of at least an ordinal scale, and have been taken from similar populations.
8. Kendall's Coefficient of Concordance (W) – used to determine the relationship among
three or more sets of ranks.
9. McNemar Test – a nonparametric test used to analyze paired nominal data. It is a test on a
2x2 contingency table and checks the marginal homogeneity of two dichotomous variables.
The test requires one nominal variable with two categories (dichotomous) measured on two
dependent groups.
It suits before-and-after designs, which test whether there is a significant change between the
before and after situations.
Example: Is there a significant difference in the use of seat belts before and after involvement
in an automobile accident?
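McNemar's statistic can be computed from its textbook form, chi² = (b - c)² / (b + c), where b and c are the two discordant cells of the before/after table; the counts below are hypothetical:

```python
# A sketch of McNemar's test statistic for a 2x2 before/after table.
def mcnemar_statistic(b, c):
    """Discordant cells: b = yes->no switches, c = no->yes switches."""
    return (b - c) ** 2 / (b + c)

# e.g. seat-belt use before/after an accident: 10 switched one way,
# 2 the other; compare with the chi-square critical value 3.841 (df = 1)
stat = mcnemar_statistic(10, 2)
print(stat, stat > 3.841)
```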
10. Sign Test for Correlated Samples (Fisher Sign Test) – this test falls under nonparametric
statistics. It is the counterpart of the t-test for correlated samples under the parametric tests.
The Fisher Sign Test compares two correlated samples and is applicable to data composed of
N paired observations. The difference between each pair of observations is obtained. The
test is based on the idea that, under the null hypothesis, half of the differences between the
paired observations will be positive and the other half negative.
Example: Is there a significant difference in the academic performance of the students
before and after the implementation of the program?
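The idea behind the sign test can be sketched exactly: under the null hypothesis each difference is equally likely to be positive or negative, so the count of positive signs follows a Binomial(n, 0.5) distribution. The scenario below is hypothetical:

```python
# A sketch of an exact two-sided sign test using the binomial distribution.
import math

def sign_test_p(n_pos, n):
    """Two-sided exact p-value for n_pos positive signs out of n pairs."""
    k = min(n_pos, n - n_pos)                 # size of the smaller tail
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# e.g. all 8 students scored higher after the program: 8 positives of 8
print(sign_test_p(8, 8))  # 2 * (1/256) = 0.0078125, significant at 0.05
```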
