Chapter No. 1 Statistic Basic Terms: Subject Name: Business Statistics Code: Lecture No. Revision 0 Title
Definition of terms:
Statistics
Collection of methods for planning experiments, obtaining data, and then organizing,
summarizing, presenting, analyzing, interpreting, and drawing conclusions.
Variable
Characteristic or attribute that can assume different values
Random Variable
A variable whose values are determined by chance.
Population
All subjects possessing a common characteristic that is being studied.
Sample
A subgroup or subset of the population.
Parameter
Characteristic or measure obtained from a population.
Statistic (not to be confused with Statistics)
Characteristic or measure obtained from a sample.
Descriptive Statistics
Collection, organization, summarization, and presentation of data.
Inferential Statistics
Generalizing from samples to populations using probabilities. Performing hypothesis
testing, determining relationships between variables, and making predictions.
Qualitative Variables
Variables which assume non-numerical values.
Quantitative Variables
Variables which assume numerical values.
Discrete Variables
Variables which assume a finite or countable number of possible values. Usually
obtained by counting.
Continuous Variables
Variables which can assume any value within an interval, so the number of possible
values is infinite and cannot be counted. Usually obtained by measurement.
Nominal Level
Level of measurement which classifies data into mutually exclusive, all inclusive
categories in which no order or ranking can be imposed on the data.
Ordinal Level
Level of measurement which classifies data into categories that can be ranked.
Precise differences between the ranks do not exist.
Interval Level
Level of measurement which classifies data that can be ranked and differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
Ratio Level
Level of measurement which classifies data that can be ranked, differences are
meaningful, and there is a true zero. True ratios exist between the different units of
measure.
Random Sampling
Sampling in which the data is collected using chance methods or random numbers.
Systematic Sampling
Sampling in which data is obtained by selecting every kth object.
Convenience Sampling
Sampling in which data that is readily available is used.
Stratified Sampling
Sampling in which the population is divided into groups (called strata) according to some
characteristic. Each of these strata is then sampled using one of the other sampling
techniques.
Cluster Sampling
Sampling in which the population is divided into groups (usually geographically). Some
of these groups are randomly selected, and then all of the elements in those groups are
selected.
The population includes all objects of interest whereas the sample is only a portion of the
population. Parameters are associated with populations and statistics with samples. Parameters
are usually denoted using Greek letters (µ (mu), σ (sigma)) while statistics are usually denoted
using Roman letters (x̄ (x-bar), s).
There are several reasons why we don't work with populations. They are usually large, and it is
often impossible to get data for every object we're studying. Sampling does not usually occur
without cost, and the more items surveyed, the larger the cost.
We compute statistics and use them to estimate parameters. The computation is the first part of
the statistics course (Descriptive Statistics) and the estimation is the second part (Inferential
Statistics).
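The distinction between parameters and statistics can be illustrated with a minimal sketch. The population below is made up (200 hypothetical movie running times), and the names mu, sigma, x_bar, and s are chosen only to mirror the notation above.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: running times (minutes) of 200 movies.
population = [random.randint(80, 150) for _ in range(200)]

# Parameters: computed from every element of the population.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # population standard deviation

# Statistics: computed from a sample of 20 elements.
sample = random.sample(population, 20)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)            # sample standard deviation (divides by n - 1)

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x-bar = {x_bar:.2f}, s = {s:.2f}")
```

The sample values will usually be close to, but not equal to, the population values. Using x-bar and s to estimate mu and sigma is the inferential step.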
Discrete variables are usually obtained by counting. There are a finite or countable number of
choices available with discrete data. You can't have 2.63 people in the room.
Continuous variables are usually obtained by measuring. Length, weight, and time are all
examples of continuous variables. Since continuous variables are real numbers, we usually round
them. Each rounded value therefore stands for a range of values, and the boundaries of that range
depend on the number of decimal places. For example: 64 is really anything 63.5 <= x < 64.5.
Likewise, if there are two decimal places, then 64.03 is really anything 64.025 <= x < 64.035.
Boundaries always have one more decimal place than the data and end in a 5.
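The boundary rule can be sketched as a small function. The name boundaries is hypothetical, and the value is passed as a string so that the number of recorded decimal places is known exactly.

```python
from decimal import Decimal

def boundaries(value: str):
    """Return (lower, upper) boundaries for a rounded value given as a string."""
    d = Decimal(value)
    # Exponent of the last recorded digit: 0 for "64", -2 for "64.03".
    exponent = d.as_tuple().exponent
    # Half a unit in the last recorded place: 0.5 for "64", 0.005 for "64.03".
    half_unit = Decimal(5).scaleb(exponent - 1)
    return d - half_unit, d + half_unit

print(boundaries("64"))      # (Decimal('63.5'), Decimal('64.5'))
print(boundaries("64.03"))   # (Decimal('64.025'), Decimal('64.035'))
```

The lower boundary is included in the range and the upper boundary is not, matching the <= and < signs in the text.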
Levels of Measurement
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go from
lowest level to highest level. Data is classified according to the highest level which it fits. Each
additional level adds something the previous level didn't have.
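As an illustration of what the ratio level adds to the interval level, consider temperature. This example is not from the notes above: degrees Celsius are interval-level data because zero is arbitrary, while kelvins have a true zero. A short sketch of the arithmetic:

```python
# Temperatures in degrees Celsius (interval level: zero is arbitrary).
c_low, c_high = 10.0, 20.0

# The difference is meaningful: the second reading is 10 degrees warmer.
difference = c_high - c_low

# The ratio 20 / 10 = 2 is not meaningful, because 0 degrees Celsius
# does not mean "no temperature".
celsius_ratio = c_high / c_low

# Kelvin has a true zero (ratio level), so this ratio is meaningful.
kelvin_ratio = (c_high + 273.15) / (c_low + 273.15)

print(difference)               # 10.0
print(celsius_ratio)            # 2.0
print(round(kelvin_ratio, 3))   # 1.035
```

The same two readings give a ratio of 2 in Celsius but only about 1.035 in kelvins, which is why "twice as hot" is not a valid statement for interval data.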
Types of Sampling
There are five types of sampling: Random, Systematic, Convenience, Cluster, and Stratified.
Random sampling is analogous to putting everyone's name into a hat and drawing out
several names. Each element in the population has an equal chance of being selected. While
this is the preferred way of sampling, it is often difficult to do. It requires that a complete
list of every element in the population be obtained.
Systematic sampling is easier to do than random sampling. In systematic sampling, the
list of elements is "counted off". That is, every kth element is taken. This is similar to
lining everyone up and numbering off "1,2,3,4; 1,2,3,4; etc". When done numbering, all
people numbered 4 would be used.
Convenience sampling is very easy to do, but it's probably the worst technique to use. In
convenience sampling, readily available data is used, such as the first people the surveyor
runs into.
Cluster sampling is accomplished by dividing the population into groups -- usually
geographically. These groups are called clusters or blocks. Some of the clusters are randomly
selected, and every element in the selected clusters is used.
Stratified sampling also divides the population into groups called strata. However, this
time it is by some characteristic, not geographically. For instance, the population might
be separated into males and females. A sample is taken from each of these strata using
either random, systematic, or convenience sampling.
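The first three techniques can be compared on the same list. This is a minimal sketch: the population of 40 labelled people and the sample size of 10 are assumptions made for illustration only.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical complete list of the population (required for random sampling).
population = [f"person_{i}" for i in range(1, 41)]   # 40 elements
n = 10                                                # desired sample size

# Random sampling: every element has the same chance of being drawn.
random_sample = random.sample(population, n)

# Systematic sampling: count off 1..k and keep every kth element.
k = len(population) // n                    # k = 4, as in the "1,2,3,4" example
systematic_sample = population[k - 1::k]    # people numbered 4, 8, 12, ...

# Convenience sampling: simply take whoever comes first.
convenience_sample = population[:n]

print(random_sample)
print(systematic_sample)
print(convenience_sample)
```

Only the random sample depends on chance. The systematic and convenience samples are fully determined by the order of the list, which is why the order in which elements are listed matters for those two techniques.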
Sample Laboratory
The purpose of this laboratory exercise is to familiarize yourself with the different sampling
techniques.
You need one page from a movie listing (such as those in TV Guide). Note, if you actually use
TV Guide®, then you need to use two facing pages. Pick a page with little extraneous material,
other than the listings, on it.
For the purposes of this sampling project, a movie is included on the page or in a cluster if the
running time for the movie falls on the page.
Random Sampling
Number each movie on the page. If there are a lot of movies, you may wish to number every
other or every third movie.
Generate a random sample of 8 numbers between 1 and the number of movies on the page.
Write down the # generated and the running time for the movie corresponding to that number.
Systematic Sampling
Generate a random number between 1 and 6. Beginning with the movie corresponding to that
number, and then taking every 6th movie thereafter, write the # of the movie and the running
length of the movie.
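The selection of movie numbers for this step can be sketched as follows. The count of 40 movies is an assumption; use the number of movies on your own page.

```python
import random

random.seed(11)  # fixed seed so the sketch is repeatable

num_movies = 40                    # hypothetical number of movies on the page

# Random starting point between 1 and 6 (randint includes both ends).
start = random.randint(1, 6)

# The starting movie, then every 6th movie thereafter.
selected_numbers = list(range(start, num_movies + 1, 6))

print(start, selected_numbers)
```

Depending on the starting point, a page of 40 movies yields either 6 or 7 selected movies.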
Convenience Sampling
Stratified Sampling
On a separate piece of paper, write down the running times of all PG/PG13, R, and not-rated
(either NR or no rating given) movies in three columns -- ignore all other types (NC17, G, etc).
Split a sample of 8 proportionally to each type of movie (if R is 40%, then sample 40% of 8 =
3.2 -> 3 R movies). Use random sampling within each movie type. Record the running lengths of
the movies selected.
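The proportional split can be worked through in a short sketch. The running times below are invented for illustration, and rounding each share to the nearest whole number is one reasonable reading of the "3.2 -> 3" example.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical running times (minutes) by rating; not from a real listing.
strata = {
    "PG/PG13": [95, 102, 110, 88, 121, 99, 107],
    "R":       [105, 118, 97, 130, 112, 101, 93, 125],
    "NR":      [84, 90, 76, 108, 92],
}
sample_size = 8
total = sum(len(times) for times in strata.values())   # 20 movies in all

sample = {}
for rating, times in strata.items():
    share = len(times) / total * sample_size   # e.g. R: 40% of 8 = 3.2
    count = round(share)                       # 3.2 -> 3
    # Random sampling within the stratum.
    sample[rating] = random.sample(times, count)
    print(f"{rating}: share = {share:.1f}, sampled {count}: {sample[rating]}")
```

With these made-up counts the rounded shares (3, 3, and 2) add up to 8. With other counts the rounded shares may add up to 7 or 9, in which case one stratum has to be adjusted by hand.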
Cluster Sampling
Divide the page into equal regions (clusters) so that each cluster has roughly 3 - 4 movies.
Randomly select 3 clusters, and record the running length of all movies in those clusters.
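A minimal sketch of this step, assuming the page has been divided into six regions. The cluster numbers and running times are made up.

```python
import random

random.seed(5)  # fixed seed so the sketch is repeatable

# Hypothetical page divided into 6 clusters of 3 - 4 movies each;
# values are running times in minutes.
clusters = {
    1: [95, 102, 110],
    2: [88, 121, 99, 105],
    3: [118, 97, 130],
    4: [112, 101, 93, 125],
    5: [84, 90, 76],
    6: [108, 92, 87, 115],
}

# Randomly select 3 clusters.
chosen = random.sample(sorted(clusters), 3)

# Record every movie in each selected cluster.
cluster_sample = [time for c in chosen for time in clusters[c]]

print(chosen)
print(cluster_sample)
```

Unlike stratified sampling, chance decides which groups are used, and then nothing is left to chance within a chosen group. The final sample size (9 to 12 movies here) depends on which clusters are drawn.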