Chapter 1 Data Analysis


OBJECTIVES:

1. Define and explain statistical terms
2. Differentiate the divisions of statistics
3. Identify the scale of measurement of variables
4. Differentiate data sets
5. Present data in different ways
6. Learn how to plot/graph data using MS-Excel

Jimczyville Publication
INTRODUCTION
The word statistics carries two different meanings. First, it is the
plural form of the word statistic, which means a measure computed from a
sample. In this sense, data themselves are synonymous with statistics.
Second, statistics refers to the science that deals with the systematic
methods of collecting, classifying, presenting, analyzing, and
interpreting qualitative and numerical data.

DIVISION OF STATISTICS

Statistics may be divided into:

1. Descriptive Statistics

This refers to the methods of summarizing and presenting data in
a form that makes them easier to analyze and interpret. It
characterizes the distribution of a set of observations on a specific
variable or variables.

2. Inferential Statistics

This refers to the drawing of valid conclusions or inferences about a
population based on a representative sample systematically taken from
the same population.
The data on hand are usually a sample of the actual population of
interest. For example, most presidential election polls sample only
about 1,000 individuals, and yet the goal is to describe the expected
voting behavior of 50 million or more.
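
As a rough illustration of inferential reasoning, the sketch below estimates how precisely a poll of about 1,000 respondents might describe a much larger electorate. The 52% support figure is hypothetical, and the margin-of-error formula assumes a simple random sample and a 95% confidence level; neither assumption comes from the text above.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample; z = 1.96 is the usual normal
    critical value for 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 520 of 1,000 sampled voters favor a candidate.
n = 1000
p_hat = 520 / n
moe = margin_of_error(p_hat, n)

print(f"Sample proportion: {p_hat:.3f}")
print(f"Approximate 95% margin of error: +/- {moe:.3f}")
```

Under these assumptions the margin of error is roughly three percentage points, which suggests why a sample of this size can be informative about a population of millions.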

Figure 1: Division of Statistics

STATISTICS: Descriptive Statistics, Theory of Probability, Inferential Statistics

DESCRIPTIVE STATISTICS
• Frequency & Percent Distributions
• Levels of Measurement
• Statistical Tables
• Statistical Graphs
• The Normal Distribution & other Distributions (Skewness & Kurtosis)
• Measures of Central Tendency
   o Mean
   o Median
   o Mode
• Measures of Dispersion or Variability
   o Range
   o Variance & Standard Deviation
   o Graphical depiction
• Measures of Position (Percentiles, Deciles, Quartiles)
• Coefficients of Correlation
   o Pearson r
   o Spearman’s rho
   o Kendall’s tau / Kendall’s W
   o Multiple correlation
   o Phi, Cramer’s & Contingency Coefficients
   o Point-biserial coefficient
   o Others

INFERENTIAL STATISTICS

Statistical Estimation Theory
• Estimation of
   o Mean
   o Standard Deviation
   o Proportion
• Estimation of
   o Differences between Means
   o Differences between Proportions
   o Other estimations

Hypothesis Testing
• Hypothesis Testing
   o Real-life applications
   o Type I & Type II errors
   o Level of Significance
   o Power of a test
• Parametric Tests
   o t-test
   o Analysis of Variance (ANOVA) / F-test
   o Analysis of Co-Variance (ANCOVA)
   o Multivariate Analysis (MANOVA/MANCOVA)
   o Simple & Multiple Regression
   o Others
• Non-Parametric Tests
   o Chi-square tests
   o Mann-Whitney U / Kruskal-Wallis
   o Others

POPULATION AS DIFFERENTIATED FROM SAMPLE

The word population refers to the total collection of actual or potential realizations of the
unit of observation in a research study. Thus there are populations of students,
teachers, principals, animals, birds, insects, and many others.
The sample is the specific, finite, realized set of members drawn from a population and used to
represent its characteristics or traits. These members constitute a subset of the population.
Measures of the population are called parameters, while those of the sample are called
estimates or statistics.
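
The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population of 5,000 test scores below is simulated and purely hypothetical, as is the sample size of 100; the fixed seed is only there to make the run repeatable.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: test scores of 5,000 students.
population = [random.randint(50, 100) for _ in range(5000)]

# A sample is a subset of the population (drawn without replacement).
sample = random.sample(population, 100)

parameter = statistics.mean(population)  # population mean (a parameter)
statistic = statistics.mean(sample)      # sample mean (a statistic, an estimate)

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

The two means are usually close but not identical: the statistic only estimates the parameter, and a different sample would give a slightly different estimate.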

VARIABLE

The variable is the construct that a researcher is attempting to measure. The term
variable refers to a characteristic of the subjects or individuals. For example, course preference is a
variable that can take on values such as education, criminology, nursing, computer, and others.
Other examples of variables include gender, age, intelligence, attitude, and faculty rank.
Variables are classified into qualitative and quantitative variables. A qualitative variable
(also called categorical) has values that are described by words rather than numbers.
Qualitative variables generally have either nominal or ordinal scales. Examples include gender, disease
status, occupation, and race.
A quantitative or numerical variable has values that arise from counting, measuring
something, or some kind of mathematical operation. Its values are
intrinsically numeric. The number of children in a family and age are good examples of quantitative
variables.

VARIABLES ACCORDING TO CONTINUITY OF VALUES

1. Continuous variables. These are variables whose levels can take any value
within a range, including fractional values.
Example. The height of an individual, which can be recorded as 5’6’’, 5.5 ft, or 5.5341 ft
depending on the accuracy of measurement, is a continuous variable.

2. Discrete or discontinuous variables. These are variables whose values or
levels cannot take the form of decimals.
Example. The number N of children in a family, which can assume any of the
values 0, 1, 2, 3, … but cannot be 1.5 or 3.482, is a discrete variable.

VARIABLES ACCORDING TO SCALE OF MEASUREMENT

1. Nominal Scale. Data are classified into distinct categories in which no ordering of the labels
or of the numbers assigned to them is implied. Nominal variables have two core characteristics: a) they
consist of two or more categories and b) the categories do not have an intrinsic order.

Dichotomous variables are nominal variables that have just two categories. They have a
number of characteristics:

a. Dichotomous variables are designed to give an either/or response. For example,
you are either male or female. To the question "Do you like playing basketball?" you may answer
yes or no.

b. Dichotomous variables can be either fixed or designed. For example, the sex variable
can only be dichotomous (male or female); therefore it is fixed. In other
cases, dichotomous variables are designed by the researcher. For example, take
the question: Do you like playing basketball? We have determined that the
respondent can only select Yes or No. However, another researcher could
provide the respondent with more than two categories for this question
(e.g., most of the time, sometimes, hardly ever). Where more than two
categories are used, the variable is no longer dichotomous but a variable
with several categories.

2. Ordinal Scale. Data are classified into distinct categories in which an ordering of the labels or
of the numbers assigned to them is implied.

Example: Educational Attainment, Restaurant Rating, Pain Level

3. Interval Scale. Data are measured on an ordered scale in which the difference between measurements
is a meaningful quantity, so the intervals between scale points are equal. The absence of an absolute or
meaningful zero is a key characteristic of interval data.

Example: Temperature (in Celsius or Fahrenheit), Scores in Achievement Test, Calendar Time

4. Ratio Scale. Data are measured on an ordered scale with meaningful differences and an
absolute zero, that is, a meaningful zero that represents the absence of
the quantity being measured.

Example: Age, Weight, Height, Distance
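
One way to see the practical difference between the interval and ratio scales is to check whether a ratio of two measurements survives a change of units. The sketch below uses hypothetical readings (10 and 20 degrees Celsius, 10 and 20 kilograms) and an approximate kilogram-to-pound factor; it is an illustration, not a formal test of scale type.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def kilograms_to_pounds(kg):
    return kg * 2.20462  # approximate conversion factor

# Interval scale: temperature in Celsius has no true zero.
c_low, c_high = 10.0, 20.0
ratio_celsius = c_high / c_low
ratio_fahrenheit = celsius_to_fahrenheit(c_high) / celsius_to_fahrenheit(c_low)

# Ratio scale: weight has a true zero (0 kg means no weight at all).
kg_low, kg_high = 10.0, 20.0
ratio_kg = kg_high / kg_low
ratio_lb = kilograms_to_pounds(kg_high) / kilograms_to_pounds(kg_low)

print("Temperature ratios:", ratio_celsius, round(ratio_fahrenheit, 2))
print("Weight ratios:     ", ratio_kg, round(ratio_lb, 2))
```

The temperature ratio changes with the unit, so saying that 20 degrees is "twice as hot" as 10 degrees is not meaningful, while the weight ratio stays the same in kilograms and pounds.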

COLLECTION OF DATA

In order to ensure the accuracy of data, one must know the right sources of data and
methods of collecting them.

TYPES OF DATA

1. Primary Data. These refer to data observed or collected from firsthand
experience, gathered directly from an original source.

Advantages of primary source of data: The information obtained from a primary
source tends to be more accurate and more likely to be correct.

Disadvantages of primary source of data: Collection of primary data can be costly
and time consuming.

2. Secondary Data. These refer to information collected in the past by other
parties, that is, data previously gathered by other individuals or agencies.

Advantages of secondary source of data: (1) It can usually be obtained more
quickly; (2) It can be less expensive, as much of it can be obtained from books and
the internet; (3) It helps to make primary data collection more specific, since with
the help of secondary data we are able to identify the gaps and
deficiencies and what additional information needs to be collected.

Disadvantages of secondary source of data: (1) The available information may
not meet one’s specific needs; (2) There is no real control over data quality.

METHODS OF COLLECTING DATA

1. Direct method or interview. The researcher gets the needed data/information
directly from the respondent. The information is collected by direct personal
interview.
2. Indirect method or questionnaire. This is a very commonly used method of
collecting primary data. The information is collected through a
questionnaire. A questionnaire is a document prepared by the researcher
containing a set of questions given out to acquire the needed data/information.
3. Registration Method. It refers to the continuous, permanent, compulsory
recording of the occurrence of vital events together with certain identifying or
descriptive characteristics concerning them, as provided through the civil code,
laws or regulations. Examples of the registration method are the records of births,
marriages, and deaths at the National Statistics Office. Another example is the
registration record of all Filipinos of voting age at the COMELEC.
4. Observation Method. It involves human or mechanical observation of what
people actually do or what events take place. The information is collected by
observing the process at work.
5. Experimentation Method. An experiment is a study of cause and effect. It
differs from non-experimental methods in that it involves the deliberate
manipulation of one variable, while trying to keep all other variables constant.

PRESENTATION OF DATA

1. Textual. The data are explained or discussed in text or paragraph
form.
2. Tabular. The data are systematically presented through tables consisting of vertical
columns and horizontal rows with headings describing these rows and columns.
3. Graphical. This is often the most effective means of presenting statistical data, because a
graph can make patterns in the data clearer.

TYPES OF COMMON GRAPHS

1. Histograms. A histogram is a graphical representation of a frequency
distribution. It is a bar chart whose Y-axis shows the number of data values
within each class of a frequency distribution and whose X-axis shows the class
boundaries of each class. There should be no gaps between bars.

Figure 2: Histogram
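
Before a histogram can be drawn, the raw data have to be grouped into classes. The sketch below shows one way to build such a frequency distribution; the scores, the class width of 10, and the starting limit of 60 are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical raw scores (not from the text).
scores = [72, 85, 91, 67, 78, 88, 95, 73, 81, 84, 69, 90, 76, 83, 87]

class_width = 10
lowest_limit = 60  # lower limit of the first class

def class_index(score):
    """Return the 0-based class a score falls in (60-69 is class 0)."""
    return (score - lowest_limit) // class_width

counts = Counter(class_index(s) for s in scores)

n_classes = max(counts) + 1
for i in range(n_classes):
    lower = lowest_limit + i * class_width
    upper = lower + class_width - 1
    freq = counts.get(i, 0)
    # One '#' per observation gives a quick text version of the bars.
    print(f"{lower}-{upper}: {freq:2d} {'#' * freq}")
```

Each printed row corresponds to one bar of the histogram; on a drawn histogram the bars would meet at the class boundaries (59.5, 69.5, and so on), which is why no gaps appear between them.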

2. Frequency Polygon and Ogive. A frequency polygon is a line graph that
connects the midpoints of the histogram intervals, plus extra intervals at the
beginning and end so that the line will touch the X-axis. An ogive is a line graph
of the cumulative frequencies.

Figure 3: Frequency Polygon

Figure 4: Ogive
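
The points plotted in a frequency polygon and an ogive can be computed directly from a frequency distribution. The sketch below uses a hypothetical four-class distribution and assumes a "less than" ogive, in which cumulative frequency is plotted against the upper class boundary.

```python
from itertools import accumulate

# Hypothetical frequency distribution: (lower limit, upper limit, frequency).
classes = [(60, 69, 2), (70, 79, 4), (80, 89, 6), (90, 99, 3)]

width = classes[0][1] - classes[0][0] + 1  # class width, here 10

# Frequency polygon: plot frequency against class midpoint, with one
# extra zero-frequency class at each end so the line meets the X-axis.
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
frequencies = [f for _, _, f in classes]
polygon_x = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
polygon_y = [0] + frequencies + [0]

# Ogive: plot cumulative frequency against the upper class boundary.
upper_boundaries = [hi + 0.5 for _, hi, _ in classes]
cumulative = list(accumulate(frequencies))

print("Polygon points:", list(zip(polygon_x, polygon_y)))
print("Ogive points:  ", list(zip(upper_boundaries, cumulative)))
```

The last cumulative frequency always equals the total number of observations, which is a quick check that the ogive has been computed correctly.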

3. Line Chart. A line chart is used to display time series, to spot trends, or to
compare periods. Line charts can be used to display several variables at once.

Figure 5: Line Chart

4. Pie Chart. A pie chart is a circular graph that shows the relative contribution
of each category to an overall total. A wedge of the circle
represents each category’s contribution, such that the graph resembles a pie that
has been cut into different sized slices.

Figure 6: Pie chart
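
The size of each wedge follows from the category's share of the total: the share multiplied by 360 gives the central angle in degrees. The sketch below works this out for hypothetical course-preference counts, which are not taken from the text.

```python
# Hypothetical course-preference counts (not from the text).
preferences = {"Education": 40, "Criminology": 25, "Nursing": 20, "Computer": 15}

total = sum(preferences.values())

# Relative contribution of each category, and the central angle of its wedge.
shares = {category: count / total for category, count in preferences.items()}
angles = {category: share * 360 for category, share in shares.items()}

for category in preferences:
    print(f"{category:<12} {shares[category]:6.1%} {angles[category]:6.1f} degrees")
```

The angles always add up to 360 degrees, just as the shares add up to 100 percent.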

5. Scatter plot. A scatter plot shows n pairs of observations as dots on an X-Y
graph. We create scatter plots to investigate the relationship between two
variables. Typically we would like to know if there is an association between two
variables and what kind of association exists.

Figure 7: Scatter Plot
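
A scatter plot gives a visual impression of association; the Pearson r listed in Figure 1 is one common way to put a number on a linear association. The sketch below computes it by hand for hypothetical paired observations (hours studied and test score), so both the data and the variable names are assumptions made for illustration.

```python
import math

# Hypothetical paired observations: hours studied (x) and test score (y).
x = [1, 2, 3, 4, 5, 6]
y = [52, 55, 61, 64, 70, 72]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

A value of r near +1 corresponds to dots rising along a straight line, a value near -1 to dots falling along a line, and a value near 0 to no linear pattern.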

COMPUTER BASED SOLUTION OF GRAPHING USING MS-EXCEL

1. Open a new blank MS-Excel workbook, then select Insert from the tabs. This tab
provides the tools for creating different types of charts: Column, Line, Pie, Bar,
Area, Scatter, and others.

2. We will graph the data given below as a histogram.

Grades of Students    Frequency
70                    100
75                    400
80                    550
85                    750
90                    450
95                    200
100                   50

3. Encode the given data in the blank MS-Excel workbook.

4. Select and highlight the encoded data, then click the Insert tab above the data
window. Then select Column.

5. Select Clustered Column under 2-D Column. The resulting graph will be shown.

6. Format the graph using Chart Tools, which appears above the Layout tab when the
resulting graph is selected or highlighted.

7. Click Chart Tools; the Design tab will be the default selection. Click the
drop-down menu of the chart layouts, then select Layout 8.

8. The resulting graph will be shown.

9. To see the columns of the graph more clearly, format it further using Chart Tools.

10. Click the drop-down button under Chart Styles. Select Style 26.

11. The results will be shown below.

12. To insert the grades of the students in the graph, click the x-axis where the numbers
1, 2, 3, …, 7 are located. Right-click and a dialogue box will appear.
Click Select Data.

13. Another dialogue box will appear. Select 1 on the right side, then click Edit.

14. A dialogue box for Axis Label Range will appear. Type the grades of the
students here: ={70,75,80,85,90,95,100}. Then press OK.

15. The resulting graph will be shown below.

16. To edit the axis titles, click the boxes and replace the text with Grades of Students on
the x-axis and Frequency on the y-axis.

17. Finally, the graph of the given data will be shown below.

[Figure: Histogram. X-axis: Grades of Students (70 to 100); Y-axis: Frequency (0 to 800)]

18. To use the other types of graphs, explore the other chart types in the Insert tab; the
formatting steps are the same as those shown above.
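
For readers without MS-Excel, the same frequency table can be turned into a rough text chart with a short script. This is only a sketch of an alternative, not part of the MS-Excel procedure above; the scale of 25 students per symbol is an arbitrary choice.

```python
# Frequency table from step 2 of the procedure.
grades = [70, 75, 80, 85, 90, 95, 100]
frequencies = [100, 400, 550, 750, 450, 200, 50]

scale = 25  # each '#' stands for 25 students (an arbitrary choice)

print("Grades of Students | Frequency")
for grade, freq in zip(grades, frequencies):
    bar = "#" * (freq // scale)
    print(f"{grade:>18} | {bar} {freq}")

total = sum(frequencies)
modal_grade = grades[frequencies.index(max(frequencies))]
print(f"Total students: {total}; most frequent grade: {modal_grade}")
```

The longest row corresponds to the tallest column in the MS-Excel chart, so the two presentations can be checked against each other.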

EXERCISES for Chapter 1

Name: _________________ Score: _____________


Course & Year: _________ Date: ______________

1. Categorize each of the following as nominal, ordinal, interval, or ratio measurement.


a. Faculty are classified as
1 - contract of service
2 - consultant
3 - regular/casual
4 - regular/plantilla
b. Attitude of Filipinos toward immigrants as measured on a five-point scale from
1 (unfavorable) to 5 (highly favorable)
c. Difference between scores in the pretest and posttest
d. Types of birds that arrive at a feeder each day
e. Educational attainment categorized as:
0 - no schooling
1 - elementary
2 - high school
3 - college
4 - graduate/MA/MAT
5 - post graduate ( Ph.D./Ed.D.)

f. Economic status of students scaled as:
0 - high status
1 - low status
g. Scores in a test of statistics and probability subject
h. Grade point average of students
96-100 - superior
91-95 - above average
86-90 - average
81-85 - below average
76-80 - inferior
i. Instructors are ranked according to their performance rating
j. Ranking of five objectives by the different mathematics students
1 - most important
0 - least important
k. Years of service of school administrators
l. Students’ study habits classified as either good or poor
m. Students are classified as either leaders or non-leaders
n. Attitude towards an issue scaled as:
0 - extremely dislike
1 - dislike
2 - like
3 - extremely like

2. Identify the following as nominal level, ordinal level, interval level, or ratio level data.

a) Percentage scores on a Statistics exam.
b) Grades on an English recitation.
c) Flavors of ice cream.
d) Answers classified as: Easy, Difficult, or Impossible.
e) Student evaluations classified as: Excellent, Average, Poor.
f) Religions.
g) Sex.
h) Commuting times to office.
i) Years (BC) of important historical events.
j) Ages (in months) of statistics students.
k) Juice flavor preference.
l) Amount of money in bank accounts.
m) Students classified by their verbal ability : Above average, Below average,
Normal.
n) Year level
o) Different varieties of banana
p) Speed of an automobile in seconds
q) Boiling point of water on the Celsius scale
r) Students ranked first, second and third honors.
s) Rainfall recorded every hour.
t) Inventory of sales every month

3. A researcher gave a 50-item questionnaire on satisfaction in teaching to a group of
faculty at Gordon College. What scale of measurement is being employed in each of
the following?

a. Faculty are classified as satisfied in teaching if they score 25 or more, and not
satisfied if they score less than 25.

b. Faculty scores provide the basis for determining whether faculty should undergo
seminars on motivation in teaching.

40-50 - need not undergo seminar on motivation to teach
29-39 - should supplement their motivation to teach
by attending some sessions of the seminar
Below 29 - should attend seminar on motivation to teach

c. The number of yes answers is taken as the measure of motivation in teaching.

4. Classify the data described in the following scenarios as qualitative or quantitative.

Classify the quantitative as either discrete or continuous.

a. The students in a school are classified into one of six categories in classroom
performance as follows: excellent, very good, good, satisfactory, passed and
failed.

b. The number of questions correctly answered on a 30-item test is recorded for
each student in a mathematics class.

c. The fasting blood sugar readings are determined for several individuals in a
study involving diabetics.

d. The body mass index of the employees of a certain school.

5. The table below shows a frequency distribution of the Mathematics entrance
examination scores of 500 students. Construct a histogram and a frequency
polygon for these data.

Scores in Entrance Examination    Number
30-39                             24
40-49                             46
50-59                             58
60-69                             76
70-79                             68
80-89                             82
90-99                             48
100-109                           32
110-119                           66

6. Construct a bar graph on social classes of nursing students of Gordon College using
the given data below.

Social Class Frequency

Very Low 27
Low 45
Below Average 52
Above Average 35
High 26
Very High 17

7. Construct a bar graph using the data below.

Slightly agree          100
Somewhat agree          92
Strongly agree          103
Slightly not agree      62
Strongly not agree      147
Somewhat not agree      130

8. Construct a frequency polygon and histogram using the following salaries of a group
of clerks.

Salary Frequency
9,000-9,999 24
10,000-10,999 26
11,000-11,999 30
12,000-12,999 36
13,000-13,999 38
14,000-14,999 50
15,000-15,999 5
16,000-16,999 10
17,000-17,999 15
