Topic 1 - Ch1 Ch2
1 Introduction to Statistics
Slide 1:
Welcome everyone.
Introduction to Statistics
• Descriptive statistics
• Inferential statistics
• Population vs sample
• Statistic vs parameter
Slide 2:
In this video, I will briefly introduce the discipline of statistics. In particular, we will
• talk about the basic idea behind statistics,
• highlight the relevance and importance of data, and
• introduce the two main branches of statistics, namely, descriptive statistics
and inferential statistics.
We will also distinguish between some key statistical concepts. In particular, we will
differentiate between
• a population and a sample, and
• a parameter and a statistic,
and discuss how they are related to the idea of inferential statistics.
What is Statistics?
Business: make decisions using data
The most popular sources of statistical data:
• Published data
  • Primary: published by collectors (e.g. ABS)
  • Secondary: published by non-collectors (e.g. World Bank)
• Observational data
  • Example: measuring the yield of different types of rice without any control
• Experimental data
  • Example: measuring the yield of different types of rice using a certain amount of fertilizer (control factor)
Information technology: more data than ever before
Slide 3:
Let us start by discussing what statistics is all about.
The operation of businesses involves making a variety of decisions. These decisions often involve the use of
data.
Data are mostly numerical facts collected through many sources. Four of the most popular sources of statistical
data are:
• Published data,
• Observational data (that is, data collected from observational studies),
• Experimental data (that is, data collected from experimental studies), and
• data collected from surveys.
Published data is the most preferred source of data due to low cost and convenience. There are two types of
published data: 1) primary data and 2) secondary data.
• When data is published by the organization that collected it, it is known as primary data. For
example, data published by the Australian Bureau of Statistics (ABS) is primary data.
• When data is published by an organization that is not the original collector of the data, the data is
known as secondary data. For example, the World Bank publishes data collected by other
organizations.
When published data is unavailable, one needs to conduct a study to generate the data through observational
studies, experimental studies or surveys.
• An observational study is one in which measurements representing a variable of interest are
observed and recorded, without controlling any factor that might influence their values.
• Experimental studies, on the other hand, control one or more factors.
For example, when you measure the yield of different types of rice without controlling any factors, you are
conducting an observational study. However, if you run the same study but control the amount of fertilizer
applied, it becomes an experimental study.
Data collected through surveys might be more familiar to you. A good example of a survey is the student
feedback on the quality of a subject that universities collect at the end of each semester.
In today’s world, we have access to more data than ever before. Advances in information technology have made
this possible and, in turn, have increased the demand for graduates who are able to analyse data scientifically.
This makes statistics an important subject that can play a significant role in your career success.
What is Statistics?
Statistics: get information from data to make informed decisions
The science of
• Collecting data
• Analyzing data
• Drawing conclusions from data
Slide 4:
With this in mind, we are now ready to talk about the definition of statistics. There are many
definitions of statistics. The simplest one is that
• Statistics is a way to get information from data to make informed decisions.
It is therefore the science of collecting reliable data; analyzing data using statistical
techniques to extract useful information; and drawing conclusions from the data.
Two major branches of statistics
Descriptive Statistics
• Graphical
• Numerical
Inferential Statistics
Slide 5:
There are two major branches of statistics that analyse data to generate useful
information for decision making:
• Descriptive statistics and
• Inferential statistics.
Descriptive Statistics
Methods of organising, summarising, and presenting data
• Graphical techniques
• Numerical measures, for example:
  • Mean, median, mode
  • Range, variance, standard deviation
Slide 6:
Let us start with descriptive statistics.
• Descriptive statistics involves methods of organising, summarising, and presenting
data in ways that are useful, attractive and informative to the reader.
• One form of descriptive statistics uses graphical techniques, which are a very
convenient way of visually extracting useful information from complex data.
• Another form of descriptive statistics uses numerical techniques to summarise
data. One such method you would have already used frequently is calculating
an average or mean. The mean, median and mode are popular numerical measures
of the central location of the data. The range, variance and standard
deviation are popular measures of the variability of the data. We will discuss
these measures in detail later in the subject.
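To make the numerical measures concrete, here is a minimal sketch using Python's standard-library `statistics` module. The ten weekly incomes are hypothetical, chosen only for illustration. Note that `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by n - 1); the formal definitions come later in the subject.

```python
import statistics

# Hypothetical data: weekly incomes (in dollars) of ten students.
incomes = [250, 300, 300, 320, 350, 400, 420, 450, 500, 610]

# Measures of central location
mean = statistics.mean(incomes)
median = statistics.median(incomes)  # average of the two middle values here
mode = statistics.mode(incomes)      # most frequent value

# Measures of variability
data_range = max(incomes) - min(incomes)
variance = statistics.variance(incomes)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(incomes)      # sample standard deviation

print(mean, median, mode, data_range, round(variance, 2), round(std_dev, 2))
```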
Inferential Statistics
Example: weight of tuna cans
ACCC complaint: cans weigh less than advertised
Slide 7:
In order to give you some insight into inferential statistics, consider this example:
Suppose you work for the Australian Competition and Consumer Commission
(ACCC). The Commission receives a complaint from a group of consumers who claim that
the tuna cans of a particular brand weigh, on average, less than what is advertised
by the producer. You are asked to investigate whether or not this claim is true. There is,
however, a problem: you cannot go and weigh every single can of tuna in the market. So
how would you resolve the issue? This is where inferential statistics can help. The
process involves selecting a random sample of the producer’s tuna cans and using their
average weight to estimate the average weight of all the cans being produced
by the company. Methods of inferential statistics will also allow you to say something
about the accuracy of this estimate, which we will cover in detail later in this subject.
Inferential Statistics
Population: ALL cans
Sample: 100 cans (a subset of the population)
Parameter: average weight of all cans (a descriptive measure of the population)
Statistic: average weight of 100 cans (a descriptive measure of the sample)
Slide 8:
Let us clarify the idea behind inferential statistics further.
Suppose you randomly selected 100 cans.
• These randomly selected cans are referred to as the sample, whereas the
collection of all cans produced by the company is referred to as the
population.
• The average weight of the cans in the sample is referred to as a statistic. A
statistic is a descriptive measure of a sample.
• The average weight of ALL cans in the population is however referred to as a
‘parameter’ which is a descriptive measure of the population.
In inferential statistics,
• We use the sample to get some idea about the population.
• Equivalently, we calculate a statistic to get an estimate of the parameter.
Therefore, inferential statistics is a body of methods used to draw conclusions about a
population based on information provided by a randomly selected sample from that
population. In other words, it is a process that uses a sample statistic to make inferences
about a population parameter. In our example here, we used the average weight of the 100 randomly
selected tuna cans (which is a statistic) to infer the average weight of all cans
produced by the company (which is a parameter).
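The tuna-can example can be simulated in a few lines of Python. This is only a sketch: the population of 10,000 can weights, their mean of 184 g and their spread of 2 g are hypothetical assumptions. In a real investigation the population mean (the parameter) would be unknown; here we can compute it only because we generated the data ourselves, which lets us compare it with the sample mean (the statistic).

```python
import random
import statistics

# Hypothetical population: weights (grams) of every can the company produced.
# Size (10,000), mean (184 g) and spread (2 g) are illustrative assumptions.
random.seed(1)
population = [random.gauss(184, 2) for _ in range(10_000)]

# Parameter: a descriptive measure of the whole population.
# Normally unknown; computable here only because the data are simulated.
population_mean = statistics.mean(population)

# Simple random sample of 100 cans, drawn without replacement.
sample = random.sample(population, 100)

# Statistic: the same descriptive measure, computed on the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (mean of all cans): {population_mean:.2f} g")
print(f"Statistic (mean of 100 cans): {sample_mean:.2f} g")
```

The statistic will usually be close to, but not exactly equal to, the parameter; quantifying how close is exactly what later inferential methods address.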
This concludes our basic introduction to statistics.
1.2 Types of Data
Slide 9:
Next, we will talk about different types of data.
Types of Data
• Numerical Data (quantitative/interval data)
• Nominal Data (categorical data)
• Ordinal Data (ranked data)
Slide 10:
From the point of view of statistical analysis, there are three basic types of data. These
are:
• Numerical Data (also known as quantitative or interval data),
• Nominal Data (also referred to as categorical data), and
• Ordinal data (also known as ranked data).
Numerical Data
Numerical data:
• The values of numerical data are real numbers.
• E.g. income, heights, weights, prices, waiting time at a medical practice, etc.
Arithmetic operations:
• Allowed.
• It is meaningful to talk about 2*income, or income + $10, and so on.
Slide 11:
Let us discuss these one by one, starting with numerical data.
The values of numerical data are real numbers. For example, if you survey students and
ask them what their weekly income is, the data that you collect will be numerical
data.
Similarly, heights, weights, prices, waiting time at a medical practice etc. are all
numerical variables.
All arithmetic calculations are permitted on numerical data. It is therefore meaningful to
talk about the weekly income of one student being 2 times that of another, or 10 dollars
more than the other, and so on.
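A tiny Python illustration of why arithmetic is meaningful for numerical data; the three weekly incomes are hypothetical.

```python
# Hypothetical weekly incomes (in dollars) of three students.
income_a = 600
income_b = 300
income_c = 310

# Ratios and differences are meaningful for numerical data.
ratio = income_a / income_b       # student A earns 2 times as much as student B
difference = income_c - income_b  # student C earns $10 more than student B
print(ratio, difference)
```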
Nominal Data
Nominal data:
• The values of nominal data are categories.
• E.g. responses to questions about marital status coded as:
  Single = 1, Married = 2, Divorced = 3, Widowed = 4
Slide 12:
The values of nominal data, on the other hand, are categories (e.g., responses to
questions about marital status with Single, Married, Divorced, and Widowed as the
available choices). These choices may be coded as 1, 2, 3 and 4, respectively.
Arithmetic operations on nominal data do not make any sense. For example, we cannot
say Married ÷ 2 = Single, even though 2 ÷ 2 = 1, because the codes are only labels. Therefore
no calculations are allowed on nominal data except counting the number of observations
in each category and calculating their proportions.
To clarify this, suppose you surveyed 100 individuals, asking them about their marital
status, and 40 individuals said that they are single. The ratio of 40 to 100 is 0.4. This
means that 40% of those who responded to the survey are single. Counting observations and
computing proportions like this are the only calculations allowed on nominal data.
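For nominal data, counting and computing proportions is all we can do. The sketch below uses Python's `collections.Counter`; the 40 single respondents out of 100 come from the example above, while the split of the other 60 responses across Married, Divorced and Widowed is hypothetical.

```python
from collections import Counter

# Survey of 100 individuals. The 40 "Single" responses come from the example;
# the split of the remaining 60 responses is hypothetical.
responses = ["Single"] * 40 + ["Married"] * 45 + ["Divorced"] * 10 + ["Widowed"] * 5

# Valid calculations on nominal data: counts and proportions per category.
counts = Counter(responses)
n = len(responses)
proportions = {category: count / n for category, count in counts.items()}

print(counts["Single"], proportions["Single"])  # 40 0.4
```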
Ordinal Data
Ordinal data:
• Categorical, with an order or ranking to them
• E.g. university course evaluation system:
  Poor = 1, Fair = 2, Good = 3, Very good = 4, Excellent = 5
Arithmetic calculation:
• Not meaningful (e.g. does it make sense to say 2*fair = very good?!)
• It is meaningful to say that excellent > poor or fair < very good
Slide 13:
Ordinal data appear to be categorical in nature, but their values have an order (i.e., the
values have a ranking to them).
For example, consider university course evaluation surveys, in which students are asked to rate
the quality of the course at the end of the semester. The rating could be poor, fair, good,
very good, or excellent, coded numerically from 1 to 5.
While it is still not meaningful to do arithmetic calculations on this data, we can say
“excellent is better than poor”, or “fair is worse than very good”.
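One way to represent ordinal codes in Python is an `IntEnum`, which supports order comparisons. This is an illustrative choice, not the only option. The codes follow the course-evaluation example; the list of responses is hypothetical.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Ordinal codes from the course evaluation example: they encode order only."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    VERY_GOOD = 4
    EXCELLENT = 5


# Order comparisons are meaningful for ordinal data.
print(Rating.EXCELLENT > Rating.POOR)  # True
print(Rating.FAIR < Rating.VERY_GOOD)  # True

# Ranking responses from worst to best is also meaningful (hypothetical responses).
responses = [Rating.GOOD, Rating.POOR, Rating.EXCELLENT, Rating.FAIR]
ranked = sorted(responses)

# IntEnum still lets you compute 2 * Rating.FAIR, which equals 4 (the code for
# VERY_GOOD), but that arithmetic is not meaningful for ordinal data.
```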
Slide 14:
Next we will be looking at important methods of sample selection.
Sampling
Slide 15:
Recall that in previous slides, we:
• differentiated between a sample and a population;
• highlighted the difference between descriptive statistics and inferential
statistics; and
• discussed how a sample statistic is used to infer about a population
parameter in inferential statistics.
Now we will discuss some important methods of sample selection from a population,
referred to as sampling plans.
Sampling
Sampling (i.e. selecting a subset of a whole population) is often done instead of a census.
Why a sample?
• Cost
  • For example, it is less expensive to survey 1,000 television viewers than 20 million.
• Practicality
  • For example, performing a crash test on every automobile produced is impractical.
Slide 16:
Collecting data from the whole population is referred to as a census. The Australian
Bureau of Statistics, for example, conducts a census every 5 years in Australia.
Sampling refers to selecting a subset of the whole population. The main reasons for
selecting a sample from the population instead of collecting data from the entire
population are
• costs, and
• practicability.
It is, for example, less expensive to survey 1000 television viewers than 20 million
viewers. Similarly, performing a crash test on every automobile produced is impractical.
Sampling Plans
Sampling plan: a method or procedure for specifying how a sample will be taken from a population.
• Simple random sampling
• Stratified random sampling
• Cluster sampling
Slide 17:
Methods of selecting samples are referred to as sampling plans. There are three
commonly used methods of sampling. They are:
• simple random sampling,
• stratified random sampling, and
• cluster random sampling.
Let us discuss these one by one.
Example:
Sample selection:
Drawing five names from a hat containing all the names of the students in a class (any group of
five names is as likely to be picked as any other group of five names).
Slide 18:
A simple random sample is a sample selected in such a way that every possible sample
of the same size is equally likely to be chosen.
For example, suppose an instructor wants to randomly choose five students to perform a
group task in front of the class. The instructor does not mind whether the student is a
male or a female. He can simply put the names or student numbers of all students in a
hat and blindly draw any five names.
In practice, businesses often use software such as Microsoft Excel to randomly select these numbers or
names; you will learn how to do this as well.
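The same idea can be sketched in Python with the standard-library function `random.sample`, which draws items without replacement so that every group of five is equally likely to be chosen. This is an illustration alongside, not instead of, the Excel approach used in the subject; the class list of 30 students is hypothetical.

```python
import random

# Hypothetical class list; in practice these would be real names or student numbers.
students = [f"Student {i}" for i in range(1, 31)]

# Draw 5 distinct students; every group of 5 is equally likely (the "names in a hat").
chosen = random.sample(students, k=5)
print(chosen)
```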
Example:
Class of 200 students: 60% male and 40% female (male and female are two strata).
Objective: draw a random sample of 5 students (60% males and 40% females).
Sample selection:
• Draw 3 (60% of 5) names from a hat containing the names of all MALE students only, using simple random sampling.
• Draw 2 (40% of 5) names from a hat containing the names of all FEMALE students only, using simple random sampling.
Slide 19:
Let us now introduce stratified random sampling. In this type of sampling, a random
sample is obtained by separating the population into mutually exclusive sets (referred to
as strata), and then drawing simple random samples from each stratum.
For example, suppose a class has 200 students. 60% of these students are male and 40%
female. The male and female are two mutually exclusive sets or strata. A stratified
random sample draws a sample that has the same proportion of males and females as in
the class (i.e., 60% males and 40% females). If we want to select 5 students, 60% of 5 is
3, and 40% of 5 is 2.
We can separate the names of males from those of females, and randomly select 3
males and 2 females using simple random sampling. We can do so, for example, by
putting the names of male and female students in separate hats, and randomly
selecting 3 names from the hat containing the male students’ names and 2 from the hat
containing the female students’ names.
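Stratified random sampling with proportional allocation can be sketched the same way: split the population into strata, work out each stratum's share of the sample, then run simple random sampling inside each stratum. The identifiers M1 to M120 and F1 to F80 are hypothetical placeholders for the 200-student class described above.

```python
import random

# Hypothetical class of 200 students: 120 male (60%) and 80 female (40%).
strata = {
    "male": [f"M{i}" for i in range(1, 121)],
    "female": [f"F{i}" for i in range(1, 81)],
}
sample_size = 5
class_size = sum(len(names) for names in strata.values())

sample = {}
for stratum, names in strata.items():
    # Proportional allocation: 60% of 5 = 3 males, 40% of 5 = 2 females.
    k = round(sample_size * len(names) / class_size)
    # Simple random sampling within each stratum.
    sample[stratum] = random.sample(names, k)

print(sample)
```

Rounding each stratum's share works here because 60% and 40% of 5 are whole numbers; with other proportions the rounded sizes may not add up to the target sample size and would need adjusting.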
Cluster Sampling
• Suppose you do not have the names of the students.
Useful when:
• it is difficult/costly to develop a complete list of the population members
• the population members are widely dispersed geographically
  (it is easier to randomly select a street and interview all members in the
  street than to select and interview a similar number in different streets)
Slide 20:
Now, suppose you do not have the names of the students. The students work in groups and sit
at different tables in groups, or clusters, of five. Is it possible to select a random sample
without having details of all students?
The answer is yes. You can simply assign a number to each table, put the numbers in a hat and draw
one table using simple random sampling. All students at the selected table then form the sample.
This is referred to as cluster sampling.
This procedure is useful when it is difficult and costly to develop a complete list of the
population members, or when the list is available but the population members are
widely dispersed geographically. For example, it is easier to randomly select a street and
interview all members in the street than to select and interview a similar number of
respondents in different streets. This saves a great deal of time and effort.
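Cluster sampling changes what is drawn at random: a whole table rather than individual students. In this hypothetical sketch, the 200 students sit at 40 tables of five and are identified only by table and seat; one table is drawn at random and everyone seated at it forms the sample.

```python
import random

# Hypothetical class of 200 students seated at 40 tables of 5.
# Students are identified only by table and seat, not by name.
tables = {t: [f"Table {t}, seat {s}" for s in range(1, 6)] for t in range(1, 41)}

# Draw one table (cluster) at random, then take everyone seated there.
chosen_table = random.choice(list(tables))
sample = tables[chosen_table]
print(chosen_table, sample)
```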
Well done. You have now completed your reading for this week.