1-The Nature of Statistics


NATURE OF

STATISTICS
A branch of mathematics that deals with the
scientific collection, organization,
presentation, analysis, and interpretation of
numerical data in order to obtain useful and
meaningful information. (General )

DEFINITION OF STATISTICS

#Math111
A set of procedures and rules for reducing
large masses of data into manageable
proportions allowing us to draw conclusions
from those data. (McCarthy)

DEFINITION OF STATISTICS

#Math111
A person who is trained in collecting
numerical information (data), evaluating it,
and drawing conclusions from it.

STATISTICIAN

#Math111
• Variable (data) - a characteristic or attribute that can
assume different values
Examples:
> scores of the students.
> opinion of the students about the taste of durian.

BASIC CONCEPTS
#Math111
Data Values- values (measurements or observations) that
the variables can assume.

Data set - collection of data values


Example: (Scores in a 40-item quiz)
21,27,19,21,20,24,30

BASIC CONCEPTS
#Math111
• A population consists of all subjects (human or otherwise) that are being studied.
• A sample is a group of subjects selected from a population.

Population: the complete collection of data
Sample: the portion of the population selected for analysis

BASIC CONCEPTS
#Math111
POPULATION                     SAMPLE
Banks in the Philippines       Banks in NCR
All UB Students                Students with Statistics subject
Working Students in Laguna     Working Students in Cabuyao

POPULATION VS. SAMPLE


#Math111
A measure used to describe the population is called a PARAMETER.

A measure computed from sample data is called a STATISTIC.
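
To make the distinction concrete, here is a minimal sketch that computes the same measure (the mean) once on a whole population and once on a sample. The population of 200 quiz scores is made up for illustration, and the seed and sample size of 30 are arbitrary assumptions.

```python
import random
from statistics import mean

# Hypothetical population: quiz scores of all 200 students (made-up data).
random.seed(111)
population = [random.randint(10, 40) for _ in range(200)]

# PARAMETER: a measure computed from the entire population.
population_mean = mean(population)

# STATISTIC: the same measure computed from a sample of that population.
sample = random.sample(population, 30)
sample_mean = mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The two values are usually close but not identical, which is why a statistic is treated as an estimate of the parameter.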

Population vs. Sample


#Math111
TYPES OF STATISTICS
• Descriptive statistics consists of the collection,
organization, summarization, and presentation of
data.

Example:
Describing the allocation of your weekly allowance
TYPES OF STATISTICS
#Math111
DESCRIPTIVE STATISTICS
• COLLECT DATA
    • SURVEY
• PRESENT DATA
    • TABLES AND GRAPHS
• CHARACTERIZE DATA
    • SAMPLE MEAN: $\bar{X} = \frac{\sum X_i}{n}$
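
As a worked illustration, the sample mean (the sum of the observations divided by their count n) can be computed for the 40-item quiz scores listed earlier in the deck. This is only a sketch; the variable names are illustrative.

```python
# Quiz scores from the earlier "Data set" example (40-item quiz).
scores = [21, 27, 19, 21, 20, 24, 30]

n = len(scores)                  # number of observations
sample_mean = sum(scores) / n    # sum of X_i divided by n

print(f"n = {n}, sum = {sum(scores)}, mean = {sample_mean:.2f}")
# prints: n = 7, sum = 162, mean = 23.14
```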
DESCRIPTIVE STATISTICS
Example:
• The survey, conducted from March 25 to 28, 2017
showed that VP Robredo got a “moderate” +26 net
satisfaction rating, one grade down from the “good”
+37 she received in December last year.
• 53% of respondents said they were satisfied with
Robredo’s performance, 27% were dissatisfied and
19% were undecided.
(Source: http://newsinfo.inquirer.net/888250/robredos-net-satisfaction-rating-falls-by-11-points-sws)
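
The +26 figure can be checked from the percentages quoted above, assuming the usual definition of a net satisfaction rating as percent satisfied minus percent dissatisfied (the slide does not spell out the definition). A minimal sketch:

```python
# Percentages quoted in the survey example.
satisfied = 53
dissatisfied = 27
undecided = 19   # the three figures sum to 99, presumably because of rounding

# Assumed definition: net satisfaction = % satisfied - % dissatisfied.
net_rating = satisfied - dissatisfied
previous_rating = 37   # the December rating quoted in the example

print(f"Net satisfaction: {net_rating:+d}")                  # prints: Net satisfaction: +26
print(f"Change: {net_rating - previous_rating:+d} points")   # prints: Change: -11 points
```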
DESCRIPTIVE STATISTICS
EXAMPLE:

Source:
http://espn.go.com/nba/player/stats/_/id/3975/stephen-curry
• Inferential statistics consists of generalizing
from samples to populations, performing
estimations and hypothesis tests, determining
relationships among variables, and making
predictions.

TYPES OF STATISTICS
#Math111
INFERENTIAL STATISTICS

• ESTIMATION AND HYPOTHESIS TESTING


• estimate the population mean
weight using the sample mean
weight
• test the claim that the population
mean weight is 120 pounds
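
The two tasks above can be sketched with made-up numbers. The ten weights below are hypothetical, and the one-sample t statistic is just one common way to test a claim about a population mean; the slide does not name a specific test.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of 10 weights in pounds (made-up data).
weights = [118, 125, 121, 130, 115, 122, 127, 119, 124, 120]
claimed_mean = 120   # the claim being tested: population mean weight is 120 pounds

n = len(weights)
x_bar = mean(weights)   # point estimate of the population mean
s = stdev(weights)      # sample standard deviation (n - 1 in the denominator)

# One-sample t statistic: how many standard errors x_bar lies from the claim.
t_stat = (x_bar - claimed_mean) / (s / sqrt(n))

print(f"Estimated population mean: {x_bar:.1f} lb")
print(f"t statistic: {t_stat:.3f} with {n - 1} degrees of freedom")
```

The decision would then come from comparing the t statistic with a critical value from a t table at the chosen significance level.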
INFERENTIAL STATISTICS

SEVERAL WELL-DESIGNED LONG-TERM


CLINICAL STUDIES HAVE FOUND THAT PEOPLE
WHO TAKE BLOOD-PRESSURE-LOWERING
DRUGS ACTUALLY SUFFER FROM
UNNECESSARY SIDE EFFECTS INCLUDING AN
INCREASED RISK OF HEART DISEASE.
SOURCE: HTTP://WHFOODS.ORG/GENPAGE.PHP?TNAME=DISEASE&DBID=15
• Inferential statistics
Hypothesis Testing:
Is there a significant difference
between the heights of male and
female MCL students?
Decision: There is a significant
difference between the heights
of male and female MCL
students.
TYPES OF STATISTICS
#Math111
• In each of these statements, tell whether descriptive or inferential
statistics have been used.
1. On average, 100 people choke to death on ballpoint pens every
year. (statisticbrain.com)
   Answer: Descriptive
2. The average person's left hand does 56% of the typing.
(statisticbrain.com)
   Answer: Descriptive
3. By 2040 at least 3.5 billion people will run short of water
(World Future Society).
   Answer: Inferential
4. Allergy therapy makes bees go away (Source: Prevention).
   Answer: Inferential
5. Nerve impulses to and from the brain travel as fast as 170 miles
per hour. (typepad.com)
   Answer: Descriptive

LET’S PRACTICE
#Math111
A study conducted at Manatee Community College revealed that students who
attended class 95 to 100% of the time usually received an A in the class.
Students who attended class 80 to 90% of the time usually received a B or C in
the class. Students who attended class less than 80% of the time usually received
a D or an F or eventually withdrew from the class.
Based on this information, attendance and grades are related. The more you
attend class, the more likely you will receive a higher grade. If you improve your
attendance, your grades will probably improve. Many factors affect your grade in
a course. One factor that you have considerable control over is attendance. You
can increase your opportunities for learning by attending class more often.
1. What are the variables under study?
2. Are descriptive, inferential, or both types of statistics used?
3. What is the population under study?
4. From the information given, comment on the relationship between the
variables.

LET’S PRACTICE
TYPES OF DATA
Data
• Qualitative (Categorical)
• Quantitative (Numerical)
    • Discrete
    • Continuous

TYPES OF DATA
#Math111
• Qualitative data are variables that can be placed into
distinct categories, according to some characteristic
or attribute.
• They consist of labels, category names, and such for which
representation on a numerical scale is not naturally
meaningful.
Examples:
> Opinion of Catholics on the death penalty (pro or anti)
> Names of your friends in MCL

TYPES OF DATA
#Math111
• Quantitative data are numerical and can be ordered
or ranked.
• They are counts or measurements for which
representation on a numerical scale is naturally
meaningful.
Example:
> Amount of a student's daily allowance

TYPES OF DATA
#Math111
Discrete Data
quantitative data that are countable using a
finite count, such as 0, 1, 2, and so on
integer-valued

Continuous Data
quantitative data that can take on any value
within a range of values on a numerical scale in
such a way that there are no gaps, jumps, or
other interruptions
real-valued
Examples: DISCRETE OR CONTINUOUS?
• Daytime temperature readings (in degrees Fahrenheit) in a 30-day period: continuous
• Ages of MATH111 students: continuous
• Number (0, 1, 2, and so on) of people attending a conference: discrete
• Defects per hour in a shoe company: discrete
• Number of hours you waited for your girlfriend: continuous

CDCJAURIGUE
DATA TYPES
Data
• Qualitative (Categorical): defined categories
    Examples: Marital Status, Political Party, Eye Color
• Quantitative (Numerical)
    • Discrete: counted items
        Examples: Number of Children, Defects per hour
    • Continuous: measured characteristics
        Examples: Weight, Voltage, Sales
LEVELS OF MEASUREMENT
4 Ratio
3 Interval
2 Ordinal
1 Nominal
Nominal Scale
the lowest level of data
applied to data that are used for category
identification
characterized by data that consist of names,
labels, or categories only
data cannot be arranged in an ordering scheme
arithmetic operations are not performed
for nominal data
Nominal Scale
Qualitative Variable     Data Values
Blood type               A, B, AB, O
Gender                   male, female
Status                   single, married, separated
Name of Schools          MCL, MIT, UP, ADMU, DLSU
Nominal Scale
Qualitative variable     Possible nominal level data values
Province of residence    Laguna, Batangas, Cavite, Rizal, Quezon
Color of road signs      red, white, blue, green
Religion                 Christian, Muslim, etc.


Ordinal Scale
the next higher level of data
characterized by data that apply to categories
that can be ranked
data can be arranged in an ordering scheme
arithmetic operations are not performed
on ordinal level data
Ordinal Scale
Qualitative variable     Data values
Product rating           Poor, good, excellent
Socioeconomic class      Lower, middle, upper
Pain level               None, low, moderate, severe
Interval Scale
applied to data that can be arranged in some
order and for which differences in data
values are meaningful
results from counting or measuring
the value zero is arbitrarily chosen for
interval data and does not imply an absence
of the characteristic being measured
Ex: temperature
Ratio Scale
the highest level of measurement
applied to data that can be ranked and for
which all arithmetic operations including
division can be performed
results from counting or measuring
data can be arranged in an ordering scheme
and differences and ratios can be calculated
and interpreted
Ratio Scale
data has an absolute zero and a value of zero
indicates a complete absence of the
characteristic of interest
Examples:
wages height weight
units of production
changes in stock prices
distance between branch offices
grams of fats consumed per day
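
The practical difference between the interval and ratio scales shows up when taking ratios. The sketch below uses two made-up Celsius readings: differences survive a change of scale, but a ratio is only interpretable on a scale with an absolute zero. Kelvin is used here as the ratio-scale counterpart, which is a standard fact rather than something stated in the slides.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading (arbitrary zero) to kelvins (absolute zero)."""
    return celsius + 273.15

# Two hypothetical temperature readings in degrees Celsius.
cool, warm = 10.0, 20.0

# Differences are meaningful on an interval scale: same gap in both scales.
diff_celsius = warm - cool
diff_kelvin = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)

# Ratios are not: 20 C is not "twice as hot" as 10 C.
ratio_celsius = warm / cool                                      # 2.0, an artifact of the arbitrary zero
ratio_kelvin = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)  # about 1.035

print(f"Difference: {diff_celsius:.1f} C = {diff_kelvin:.1f} K")
print(f"Ratio in Celsius: {ratio_celsius:.3f}, ratio in kelvins: {ratio_kelvin:.3f}")
```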
Data Measurement Levels
• Ratio/Interval Data (highest level): measurements, e.g., temperature. Complete analysis.
• Ordinal Data (higher level): rankings and ordered categories, e.g., age range 25-34. Mid-level analysis.
• Nominal Data (lowest level): categorical codes, e.g., ID numbers, gender. Basic analysis.
• Classify each variable as nominal, ordinal,
interval or ratio-level measurement.
1. Times required for mechanics to do a tune-up.
   Answer: Ratio
2. Ages of students in a classroom.
   Answer: Ratio
3. Classification of children in a day-care
center (infant, toddler, preschool).
   Answer: Ordinal
PRACTICE!
#Math111
SAMPLING METHODS

• Population. All of the subjects of interest.
• Sample. The subjects in the population we actually
measure.
• Sampling. The process of selecting the individuals
from the population that make up our sample.

The details of which subjects are and are not part
of our population should be carefully specified.
- Our sample is our only source of
information about the population.
The theory of sampling is as
follows:
• Researchers want to gather
information about a whole
group of people (the
population).
• Researchers can only observe a
part of the population (the
sample).
• The findings from the sample
are generalized, or extended,
back to the population.


Why Sample?
• Less time consuming than a census
• Less costly to administer than a census
• It is possible to obtain statistical
results of a sufficiently high precision
based on samples
Strive for representative samples to reflect the population
of interest accurately!
Sample sizes can be computed
by applying Slovin's formula:

$$n = \frac{N}{1 + N e^{2}}$$

where n: sample size
N: population size
e: margin of error
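
Slovin's formula, n = N / (1 + N e^2), can be sketched as a small function. The function name, the example population of 1,000, and the choice to round up to the next whole respondent are assumptions made for illustration; the slide gives only the formula.

```python
import math

def slovin_sample_size(population_size, margin_of_error):
    """Sample size n = N / (1 + N * e**2), rounded up (rounding is an assumption)."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

# Hypothetical case: population of 1,000 with a 5% margin of error.
# 1000 / (1 + 1000 * 0.0025) = 1000 / 3.5, about 285.7, so 286 respondents.
print(slovin_sample_size(1000, 0.05))   # prints: 286
```

A smaller margin of error gives a larger required sample, since e is squared in the denominator.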
SAMPLING TECHNIQUES
• Nonstatistical Sampling
    • Convenience
    • Judgment
• Statistical Sampling
    • Simple Random
    • Systematic
    • Stratified
    • Cluster
Nonstatistical Sampling
• Convenience
    • Collected in the most convenient manner for the researcher
• Judgment
    • Based on judgments about who in the population would be most likely to provide the needed information
Statistical Sampling (Probability Sampling)
• Items of the sample are chosen based on known
or calculable probabilities
• Methods: Simple Random, Stratified, Systematic, Cluster

4 METHODS OF STATISTICAL SAMPLING
1. Simple Random Sampling
• Every possible sample of a given size has an equal chance of
being selected
• The sample can be obtained using a table of random
numbers or computer random number generator
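
A computer random number generator can draw a simple random sample in a few lines. The sketch below uses Python's standard `random` module; the frame of 100 student ID numbers, the sample size of 10, and the seed are all hypothetical.

```python
import random

# Hypothetical sampling frame: ID numbers 1 to 100.
student_ids = list(range(1, 101))

rng = random.Random(111)   # fixed seed only so the sketch is repeatable

# random.sample draws without replacement, so every possible set of
# 10 IDs has the same chance of being selected.
sample = rng.sample(student_ids, 10)

print(sorted(sample))
```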
4 METHODS OF STATISTICAL SAMPLING
2. Stratified Random Sampling
• Divide population into subgroups (called strata)
according to some common characteristic
    • e.g., gender, income level
• Select a simple random sample from each subgroup
• Combine samples from subgroups into one

[Figure: population divided into 4 strata, with a sample drawn from each stratum]
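
The stratified procedure (divide into strata, take a simple random sample from each, combine) can be sketched as follows. The population of 200 students in 4 year levels and the choice of 5 students per stratum are made up for illustration.

```python
import random

rng = random.Random(111)   # fixed seed only so the sketch is repeatable

# Hypothetical population: (student_id, year_level) pairs in 4 strata.
population = [(i, f"Year {1 + i % 4}") for i in range(1, 201)]

# Step 1: divide the population into strata by the common characteristic.
strata = {}
for student_id, level in population:
    strata.setdefault(level, []).append(student_id)

# Step 2: select a simple random sample from each stratum.
# Step 3: combine the per-stratum samples into one sample.
per_stratum = 5
sample = []
for level, members in sorted(strata.items()):
    sample.extend((sid, level) for sid in rng.sample(members, per_stratum))

print(sample)
```

Equal allocation per stratum is only one option; sample sizes proportional to stratum size are also common.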
4 METHODS OF STATISTICAL SAMPLING
3. Systematic Random Sampling
• Decide on sample size: n
• Divide ordered (e.g., alphabetical) frame of N
individuals into n groups of k individuals: k = N/n
• Randomly select one individual from the 1st
group
• Select every kth individual thereafter

Example: N = 64, n = 8, k = 8 (one individual is randomly selected from the first group of 8)
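
Using the slide's numbers (N = 64, n = 8, so k = 8), the systematic procedure can be sketched like this. The frame of numbered individuals and the seed are assumptions for illustration.

```python
import random

rng = random.Random(111)   # fixed seed only so the sketch is repeatable

N, n = 64, 8                    # frame size and desired sample size
frame = list(range(1, N + 1))   # hypothetical ordered frame: individuals 1 to 64
k = N // n                      # size of each group: k = N / n = 8

# Randomly select one position within the first group of k individuals...
start = rng.randrange(k)        # index 0 to k-1

# ...then take every kth individual thereafter.
sample = frame[start::k]

print(f"k = {k}, sample = {sample}")
```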
4 METHODS OF STATISTICAL SAMPLING
4. Cluster Sampling
• Divide population into several "clusters," each
representative of the population (e.g., province)
• Select a simple random sample of clusters
• All items in the selected clusters can be used, or
items can be chosen from a cluster using another
probability sampling technique

[Figure: population divided into 16 clusters, with randomly selected clusters forming the sample]
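
Cluster sampling can be sketched with 16 clusters, matching the figure. The cluster contents (5 items each) and the decision to pick 4 clusters and use every item in them are made up for illustration.

```python
import random

rng = random.Random(111)   # fixed seed only so the sketch is repeatable

# Hypothetical population divided into 16 clusters of 5 items each.
clusters = {
    f"Cluster {c}": [f"C{c}-item{j}" for j in range(1, 6)]
    for c in range(1, 17)
}

# Select a simple random sample of clusters (not of individual items).
chosen = rng.sample(sorted(clusters), 4)

# Here all items in the selected clusters are used.
sample = [item for name in chosen for item in clusters[name]]

print(chosen)
print(len(sample), "items sampled")
```

Unlike stratified sampling, only some of the groups are observed, so each cluster should resemble the population as a whole.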
Classify each sample as random, systematic, stratified, or cluster.
1.) In a large school district, all teachers from two buildings are
interviewed to determine whether they believe the students
have less homework to do now than in previous years.
   Answer: Cluster
2.) The team needs to get a sample of 4000 students from the
population and selects 480 English, 1120 Science, 960 Computer
Science, 840 Engineering and 600 Math students, which provides
a better representation of students' college majors in the U.S.
   Answer: Stratified
3.) Every 100th hamburger manufactured is checked to determine
its fat content.
   Answer: Systematic
4.) Mail carriers of a large city are divided into four groups
according to gender (male or female) and according to whether
they walk or ride on their routes. Then 10 are selected from
each group and interviewed to determine whether they were
bitten by a dog last year.
   Answer: Stratified
PRACTICE
In each statement, tell whether descriptive or inferential statistics
have been used.

1.) Drinking decaffeinated coffee can raise cholesterol levels by 7% (source: American
Heart Association).
   Answer: inferential
2.) Expenditures for the cable industry were $5.66 billion in 1996 (source: USA Today).
   Answer: descriptive
3.) The median household income for people aged 25 – 36 is $35,888 (source: USA
Today).
   Answer: descriptive
4.) Twenty-eight percent or 17.3 million Filipino adults age 15 years and older are
current tobacco smokers, according to the results of the 2009 Global Adult Tobacco
Survey (GATS). Almost half (48 percent or 14.6 million) of adult males and 9 percent
(2.8 million) of adult females are current smokers. Moreover, 23 percent of Filipino
adults are daily tobacco smokers: 38 percent for males and 7 percent for females.
(Https://psa.Gov.Ph/article/173-million-filipino-adults-are-current-tobacco-smokers)
   Answer: descriptive
5.) Experts say that mortgage rates may soon hit bottom (source: USA Today).
   Answer: inferential
The chart shows the number of job-related injuries for each of the
transportation industries for 1998.
Industry Number of Injuries
Railroad 4520
Intercity Bus 5100
Subway 6850
Trucking 7144
Airline 9950

1. What are the variables under study?
2. Which variable is quantitative? Which is qualitative?
3. Is the quantitative variable discrete or continuous?
Classify each variable as qualitative (QL) or quantitative (QN).

1. Genre of songs played on YES FM 101.9 last year. QL
2. Rankings of NBA players in the MVP race. QL
3. Capacity in cubic feet of six cylindrical containers. QN
4. Grade Point Average (GPA) of the top five ME
students in MCL last term (3T-2016-2017). QN
5. The population of the California condor. QN
• Classify each variable as discrete or
continuous.
1. Speed of a car. CONTINUOUS
2. The weight of a bag of apples. CONTINUOUS
3. The length of a piece of wire. CONTINUOUS
4. The number of telephone calls received. DISCRETE
5. The number of felony arrests in a town. DISCRETE
• Classify each variable as nominal, ordinal,
interval or ratio-level measurement.
1. Ratings of eight local plays (poor, good,
excellent). ORDINAL
2. Number of pages in the city of Cleveland
telephone book. RATIO
3. SSS numbers of MCL Faculty members. NOMINAL
4. Salaries of the top five CEOs in the USA. RATIO
5. Rankings of tennis players. ORDINAL
Classify each sample as random, systematic,
stratified, or cluster

1.) The president of a 5-year university would like
to know if the opinions of students regarding some
issues differ based on their year level. 25
students per level were chosen as respondents. STRATIFIED
2.) In a regional study, 10 representatives per
district were chosen. STRATIFIED
3.) A survey was conducted among students of
Briston University; the first 100 students to enter
the school gate were chosen as respondents. CONVENIENCE (nonstatistical)
4.) Every seventh customer entering a shopping mall
is asked to select her/his favorite store. SYSTEMATIC
Each student at a school has a student identification number.
Counselors have a computer generate 5050 random identification
numbers and those students are asked to take a survey.
Simple random sampling
