01 Nature of Statistics-1-1

CHAPTER ONE
THE NATURE OF STATISTICS
1.1 Some basic concepts
Like all fields of learning, statistics has its own vocabulary. Some of the words and
phrases encountered in the study of statistics will be new to those not previously
exposed to the subject. The following are some terms that we will use extensively in the
remainder of this book.

Data
The raw material of statistics is data. For our purpose, we may define data as numbers.
The two kinds of numbers that we use in statistics are numbers that result from taking a
measurement and those that result from the process of counting. For example, when a
nurse weighs a patient or takes a patient’s temperature, a measurement, consisting of a
number such as 30 kg or 37 °C, is obtained. A different type of number is obtained when
a hospital administrator counts the number of patients – perhaps 15 – discharged from
the hospital on a given day. Each of these three numbers is a datum, and the three
numbers taken together are data.

Population and sample


Consider the following example:

Example 1.1
Suppose we wish to study the body masses of all students of Methodist University. It
will take us a long time to measure the body masses of all students of the university and
so we may select 20 of the students and measure their body masses. Suppose we obtain
the measurements in Table 1.1.

Table 1.1: Body masses (in kg) of 20 students


49 56 48 61 59 43 58 52 64 71
57 52 63 58 51 47 57 46 53 59

In this study, we are interested in the body masses of all students of Methodist
University. The set of body masses of all students of Methodist University is called the
population of this study. The set of body masses in Table 1.1, W = {49, 56, 48, …, 53,
59}, is a sample from this population.
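
As a minimal sketch, the sample in Table 1.1 can be entered as a Python list and summarized. The variable names are illustrative and are not part of the text. Note that the sample mean describes only the 20 students measured; it is an estimate, not the exact value, of the mean body mass of the whole population.

```python
from statistics import mean

# Body masses (kg) of the 20 sampled students, copied from Table 1.1.
body_masses = [49, 56, 48, 61, 59, 43, 58, 52, 64, 71,
               57, 52, 63, 58, 51, 47, 57, 46, 53, 59]

sample_size = len(body_masses)     # number of students in the sample
sample_mean = mean(body_masses)    # average body mass in the sample

print(f"sample size = {sample_size}")
print(f"sample mean = {sample_mean} kg")
print(f"smallest = {min(body_masses)} kg, largest = {max(body_masses)} kg")
```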

Definition 1.1
2 ELEMENTARY STATISTICAL METHODS

A population is the set of all objects we wish to study.


Definition 1.2
A sample is part of the population we study to learn about the population.

Example 1.2
In a certain study, 900 men were selected from Nsawam. It was found that 25 are
smokers.
(a) What is the population in this study?
(b) What is the sample size?

Solution
(a) The population is all men in Nsawam.
(b) The sample size is 900.

Remarks
1. If we wish to study the blood pressures of Ghanaians, then our population consists
of all blood pressures of Ghanaians. If we are interested in the blood pressures of
Ghanaian men, then we have a different population – the blood pressures of
Ghanaian men.
2. In many situations, we cannot afford to study the entire population. Instead, we take
a sample from the population and study this sample. If the sample is representative
of the population, then the information from the sample can be applied to the whole
population. One way of obtaining a representative sample is discussed in
Section 1.6.
3. A population may be finite or infinite. If a population of values consists of a fixed
number of these values, the population is said to be finite; otherwise, it is infinite.
An infinite population consists of an endless succession of values. In practice, the
term infinite population is used to refer to a population that cannot be enumerated in
a reasonable period of time.

Example 1.3
Examples of finite populations include the following:
(i) Students studying Business Administration at the Methodist University.
(ii) All football clubs in the first and second divisions in Ghana.
(iii) All households in Nkawkaw.

Example 1.4
Examples of infinite populations include the following:
(i) The set of real numbers between two integers.
(ii) All fishes in River Volta.
(iii) All palm trees in West Africa.


What is statistics?
Statistics is a field of study concerned with:
(i) the collection, organization, and analysis of data, and
(ii) the drawing of inferences about a population from a sample taken from the
population.
It can be seen that statistics can be classified into two main branches – descriptive
statistics and inferential statistics. Descriptive statistics is concerned with
collecting data and describing their important features. In inferential statistics, our aim is
to make a decision about a population based on a sample from the population. Most
modern uses of statistics, particularly in engineering and the sciences, focus on
inference rather than description. For example, an engineer who designs a new
computer chip will manufacture a sample of prototypes and will want to draw
conclusions about how all these devices will work once they are in full-scale production.
There are two main methods used in inferential statistics: estimation and hypothesis
testing. These methods are discussed in chapters five and six.
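
The two branches can be contrasted with a short sketch that uses the body masses from Table 1.1. The first part only describes the sample; the second part uses the sample to say something about the population. The standard error shown here is just one simple ingredient of estimation, and the variable names are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, median, stdev

# Body masses (kg) of the 20 sampled students, copied from Table 1.1.
body_masses = [49, 56, 48, 61, 59, 43, 58, 52, 64, 71,
               57, 52, 63, 58, 51, 47, 57, 46, 53, 59]

# Descriptive statistics: summarize the sample itself.
n = len(body_masses)
sample_mean = mean(body_masses)
sample_median = median(body_masses)
sample_sd = stdev(body_masses)      # sample standard deviation (divisor n - 1)

# Inferential statistics: the sample mean serves as a point estimate of the
# population mean, and the standard error indicates how much that estimate
# would be expected to vary from one sample to another.
standard_error = sample_sd / sqrt(n)

print(f"mean = {sample_mean:.1f} kg, median = {sample_median} kg, sd = {sample_sd:.2f} kg")
print(f"point estimate of population mean = {sample_mean:.1f} kg")
print(f"standard error of the estimate = {standard_error:.2f} kg")
```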

1.2 Opportunities for statisticians


In almost every field of human endeavour, the scientific method has proven effective
for solving problems and improving performance. This approach involves the collection
of data pertinent to the particular problem. Statisticians play several important roles in
these scientific studies. First, they plan the studies to ensure that the data are collected
efficiently and answer the questions relevant to the investigation. Second, they analyze
the data to discover what the study has demonstrated and what issues need further
investigation.
In industry, statisticians design and analyze experiments to improve the safety,
reliability and performance of products of all types. Statisticians are also directly
involved with quality control issues in manufacturing to ensure consistent product
dependability.
Statisticians work with social scientists to survey attitudes and opinions. In
education, statisticians are involved with the assessment of educational aptitude and
achievement and with experiments designed to measure the effectiveness of curricular
innovations. Statisticians are an important part of research teams which search for
better varieties of agricultural crops, and for safer and more effective use of fertilizers.
In major hospitals, medical schools and government agencies, statisticians study the
control, prevention, diagnosis and treatment of diseases, injuries and other health
abnormalities. They also investigate the efficiency of health delivery systems and
practices. In the pharmaceutical industry, statisticians design experiments to measure
the efficacy of drugs in treating illnesses and to assess the likelihood of undesirable side
effects.
Statistical methods are also used in business practice, e.g. to forecast demand for
goods and services. Actuaries use statistical methods to assess risk levels and set
premium rates for insurance and pension industries.
Statisticians also play a vital role in assessing employment levels and needs of the
population for health, economic and social services. Without accurate information from
agencies like the Ghana Statistical Service, the Customs, Excise and Preventive Service
(CEPS) and the Environmental Protection Agency, the government cannot effectively allocate
its resources.
Research in statistical methods is carried out in universities, government agencies
and in private industry. Statisticians employed in these activities develop new ways to
collect and analyze data for the many types of data and experimental settings
encountered in practical studies.

1.3 Types of variables


Any type of observation which can take different values for different people, or different
values at different times, or places, is called a variable. The following are examples of
variables:
(a) family size, number of hospital beds, year of birth, number of schools in a country,
etc.
(b) height, mass, blood pressure, temperature, blood glucose level, etc.
There are, broadly speaking, two types of variables – quantitative and qualitative
variables.

1.3.1 Quantitative variables


A quantitative variable is one that can take numerical values. The variables in (a) and
(b) are examples of quantitative variables. Quantitative variables may be characterized
further as to whether they are discrete or continuous.

1.3.2 Discrete variables


The variables in (a) above can be counted. These are examples of discrete variables. A
discrete variable is characterized by gaps or interruptions in the values that it can
assume. The following example illustrates the point. The number of daily admissions to
a hospital is a discrete variable since it must be represented by a whole number, such as
0, 1, 2 or 3. The number of admissions on a given day cannot be a number such as
1.8, 3.96 or 5.33.

1.3.3 Continuous variables


The variables in (b) above can be measured. These are examples of continuous
variables. A continuous variable does not possess the gaps or interruptions characteristic
of a discrete variable. It can assume any value within a specified interval.
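
The distinction between counting and measuring can be pictured with a small sketch. The observations below are hypothetical and were invented purely for illustration. Counts are always whole numbers, whereas measurements may fall anywhere in an interval (although a particular measurement can happen to be recorded as a whole number).

```python
# Hypothetical observations, invented purely for illustration.
daily_admissions = [0, 3, 2, 5, 1]           # discrete: obtained by counting
body_temperatures = [36.6, 37.0, 38.25]      # continuous: obtained by measuring (deg C)

def is_whole_number(value):
    """Return True if value has no fractional part."""
    return float(value).is_integer()

# Every count is a whole number; there are gaps between possible values.
all_counts_whole = all(is_whole_number(v) for v in daily_admissions)

# Measurements need not be whole numbers; any value in an interval is possible.
all_temperatures_whole = all(is_whole_number(v) for v in body_temperatures)

print(all_counts_whole, all_temperatures_whole)
```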

1.3.4 Qualitative variables

Variables which cannot take numerical values are called qualitative variables. A
qualitative variable can neither be measured nor be counted.
The following are examples of qualitative variables: place of birth, nationality,
colour, colour of hair, gender, blood group, smoking habit, surname, rank in military.

1.4 Measurement scales


Variables can further be classified according to the following four levels of
measurement: nominal, ordinal, interval and ratio. A detailed discussion of this can be
found in Stevens (1946).

1.4.1 Nominal scale


This scale of measurement applies to qualitative variables only. On the nominal scale, no
order is required. For example, gender is nominal, blood group is nominal, and marital
status is also nominal. On the nominal scale, categories are mutually exclusive. Thus an
item must belong to exactly one category. Notice that we cannot do arithmetic
operations on data measured on the nominal scale.

1.4.2 Ordinal scale


This scale also applies to qualitative data. On the ordinal scale, order is necessary. This
means that one category is lower than the next one or vice versa. For example, in the
Army, the rank of private is lower than the rank of captain, which is lower than the rank
of major, and so on. Thus the rank of an army officer is measured on the ordinal scale.
In universities, the rank of a member of academic staff is measured on the ordinal scale. Grades
are also ordinal, as excellent is higher than very good, which in turn is higher than good,
and so on.
It should be noted that on the ordinal scale, differences between category values have
no meaning. For example, although Professor is higher than Lecturer, the difference
between these two ranks cannot be expressed numerically. Similarly, if 4 denotes “excellent”,
3 denotes “very good”, 2 denotes “good” and 1 denotes “fair”, it does not mean that a
candidate who is rated “excellent” is twice as competent as a candidate who is rated
“good”, just because “excellent” is denoted by 4 and “good” is denoted by 2.
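
The point about grade codes can be sketched in a few lines. The codes 1 to 4 are those given in the text; the list of ratings and the alternative coding are hypothetical and serve only to show that the ratio of codes depends on an arbitrary choice, while the ordering does not.

```python
# Codes from the text: 4 = excellent, 3 = very good, 2 = good, 1 = fair.
GRADE_CODES = {"fair": 1, "good": 2, "very good": 3, "excellent": 4}

# Hypothetical ratings of five candidates.
ratings = ["good", "excellent", "fair", "very good", "good"]

# Meaningful on the ordinal scale: ordering and comparison.
ranked = sorted(ratings, key=GRADE_CODES.get, reverse=True)
excellent_beats_good = GRADE_CODES["excellent"] > GRADE_CODES["good"]

# Not meaningful: ratios of codes. A different coding that preserves the
# same order (hypothetical) gives the same ranking but a different ratio.
ALTERNATIVE_CODES = {"fair": 10, "good": 20, "very good": 25, "excellent": 30}
ranked_alternative = sorted(ratings, key=ALTERNATIVE_CODES.get, reverse=True)
ratio_original = GRADE_CODES["excellent"] / GRADE_CODES["good"]
ratio_alternative = ALTERNATIVE_CODES["excellent"] / ALTERNATIVE_CODES["good"]

print(ranked)
print(ratio_original, ratio_alternative)
```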

1.4.3 Interval scale


This scale of measurement applies to quantitative data only. In this scale, the zero point
does not indicate a total absence of the quantity being measured. An example of such a
scale is temperature on the Celsius or Fahrenheit scale. Suppose the minimum
temperatures of 3 cities, A, B and C, on a particular day were 0 °C, 20 °C and 10 °C,
respectively. It is clear that we can find the differences between these temperatures.
For example, city B is 20 °C hotter than city A. However, we cannot say that city A has
no temperature. Note that city A has a temperature equivalent to 32 °F. Moreover, we
cannot say that city B is twice as hot as city C, just because city B is 20 °C and city C is
10 °C. The reason is that, in the interval scale, the ratio between two numbers is not
meaningful.
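
A short sketch makes the argument concrete by converting the temperatures of cities B and C to Fahrenheit. The function name is illustrative. The ratio of the two temperatures changes with the unit, which is why it carries no meaning, whereas a difference in one unit always corresponds to the same difference in the other.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

city_a_c, city_b_c, city_c_c = 0, 20, 10    # temperatures from the text (deg C)

city_a_f = celsius_to_fahrenheit(city_a_c)  # 32.0, so 0 deg C is not "no temperature"
city_b_f = celsius_to_fahrenheit(city_b_c)
city_c_f = celsius_to_fahrenheit(city_c_c)

# The ratio depends on the unit chosen, so it is not meaningful.
ratio_celsius = city_b_c / city_c_c
ratio_fahrenheit = city_b_f / city_c_f

# Differences are meaningful: 10 Celsius degrees always equal 18 Fahrenheit degrees.
difference_celsius = city_b_c - city_c_c
difference_fahrenheit = city_b_f - city_c_f

print(ratio_celsius, ratio_fahrenheit)
print(difference_celsius, difference_fahrenheit)
```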

1.4.4 Ratio scale


This scale of measurement also applies to quantitative data only and has all the
properties of the interval scale. In addition to these properties, the ratio scale has a
meaningful zero starting point and a meaningful ratio between two numbers.
An example of a variable measured on the ratio scale is weight. A weighing scale
that reads 0 kg gives an indication that there is absolutely no weight on it. So the zero
starting point is meaningful. If Yaw weighs 40 kg and Akosua weighs 20 kg, then Yaw
weighs twice as much as Akosua. Another example of a variable measured on the ratio scale is
temperature measured on the Kelvin scale. This has a true zero point.
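
As a sketch of why ratios are meaningful on the ratio scale, the weights of Yaw and Akosua can be expressed in different units: the ratio stays the same because every unit shares the same true zero. The sketch also converts the temperatures of cities B and C from Section 1.4.3 to kelvins, where the ratio is meaningful and is far from 2. The function name is an illustrative assumption.

```python
yaw_kg, akosua_kg = 40, 20                  # weights from the text

# Changing the unit (kilograms to grams) does not change the ratio.
ratio_in_kg = yaw_kg / akosua_kg
ratio_in_g = (yaw_kg * 1000) / (akosua_kg * 1000)

def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvins."""
    return celsius + 273.15

# On the Kelvin scale, zero means a true absence of thermal energy, so the
# ratio of the two city temperatures (20 deg C and 10 deg C) is meaningful.
ratio_in_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(ratio_in_kg, ratio_in_g)
print(round(ratio_in_kelvin, 3))
```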

1.4.5 Summary of types of variables


Fig. 1.1 shows a chart summarizing the relationships between the various types of
variables and measurement scales.

Variables
    Quantitative
        Continuous (e.g. height and mass)
        Discrete (e.g. number of students)
    Qualitative
        Nominal (e.g. colour)
        Ordinal (e.g. ranks and grades)

Fig. 1.1: Types of variables

Exercise 1(a)
1. For each of the following variables, state whether it is quantitative or qualitative and
specify the measurement scale that is employed when taking measurements on each.
(a) gender of babies born in a hospital, (b) marital status,
(c) temperature measured on the Kelvin scale, (d) nationality,
(e) masses of babies in kg, (f) temperature in °C,
(g) prices of items in a shop, (h) position in an exam,
(i) the rank of a member of academic staff in a university.
2. For each of the following situations, answer questions (a) through (d):
(a) What is the variable in the study? (b) What is the population?
(c) What is the sample size? (d) What measurement scale was used?
A. A study of 150 students from St. Ann School showed that 10% of the students
had blood group A.
B. A study of 100 patients admitted to St. Paul’s Hospital showed that 25 patients
lived 8 km from the hospital.
C. A study of 50 teachers in Town A showed that 5% of the teachers earn
GH¢800.00 per month.
3. Explain what is meant by descriptive statistics.
4. Explain what is meant by inferential statistics.
5. Define the following terms:
(a) population, (b) qualitative variable,
(c) discrete variable, (d) sample,
(e) continuous variable, (f) quantitative variable.

1.5 Sources of statistical data


Sources of statistical data can be put into two main categories, depending on their
originality. These are primary sources and secondary sources. Data from a primary
source are called primary data while those from a secondary source are called
secondary data.

1.5.1 Primary sources of data


When data are originally collected by the researcher, they are called primary data.
Primary data can be obtained by designing an experiment or by conducting a survey.

Experiments
Frequently, the data needed to answer a question are available only as a result of an
experiment. A researcher may wish to know which of several drugs is most effective
for treating headache. The researcher might conduct an experiment by assigning the
drugs to different patients. Subsequent evaluation of the responses to the different drugs
might enable the researcher to decide which drug is most effective for treating headache.

Surveys
In surveys, the aim of the researcher is to find a way of obtaining information from
individuals, referred to as respondents. Such information can be factual (for example,
the number of cars per household, age of respondents, or income) or can concern the
attitudes of the respondent (for example, their attitude to racial discrimination, or their
liking for a brand of cigarette).
A survey conducted on a whole population of interest is called a census and a
survey conducted on a sample from a population is called a sample survey. Surveys
involve the use of questionnaires to obtain desired information from respondents.
Questionnaires may be administered by post, by telephone, by e-mail or in person.

Personal interview
Here, we gather information through oral questioning.
Disadvantages
• It can be very costly.
• Requires specially trained interviewers.
Advantages
• It usually yields a high proportion of returns because a well-trained enumerator can
establish the necessary rapport to ensure co-operation by the respondent.
• Information on conceptually difficult items can be obtained since the enumerator
can explain what is required.
• The information obtained is likely to be more accurate than that obtained by other
methods since the interviewer can clarify seemingly unclear questions by explaining
the questions to the respondent.
• Visual materials to which the respondent is able to react can be presented.

Telephone interview
This is a variation of the personal interview.
Advantages
• It saves time.
• It is cheaper than personal interviews.
• It is easy to train and direct interviewers.
Disadvantages
• Telephone subscribers are usually not representative of the whole population. There
is therefore the risk of a biased survey, unless great care is taken in the use of the
method.
• Sensitive questions cannot be asked in this type of enquiry.
• Its use is limited to urban areas with efficient telephone services.

Postal survey
In a postal survey, questionnaires are posted to respondents, who complete them and mail
them back to the researcher. The questionnaires are usually accompanied by a letter that explains
the survey, encourages complete and candid answers and sets a deadline for returning
responses. A stamped addressed envelope is customarily included to facilitate returns.
Advantages
• It makes wide geographic coverage possible at comparatively little cost.
• There is no need to train interviewers.
• It encourages the respondent to answer questions frankly in the privacy of the home
and without the subjective influence of the interviewers.
• There is no interviewer bias.
Disadvantages
• One cannot be sure of the interpretation placed by the respondent on the questions
asked.
• There may be a delay in receiving responses.
• There is the problem of non-response to the survey. This non-response is certain to
affect the validity of the survey as it is most unlikely that the sections of the sample
that do and do not reply are similar in the characteristics under consideration.

1.5.2 Secondary sources of data


Secondary data are data not originally collected under the supervision of the person or
organization using the data. Secondary data are available from libraries, government
agencies and the internet.

Libraries
A common place to look for secondary data is a library. Here, data can be obtained
from magazines, journals and newspapers.

Government agencies
Government data can be obtained from publications issued by local, state, national and
international governments. Such data include laws, regulations, statistics and consumer
information.

Internet
Secondary data can be obtained from the internet using search engines such as Yahoo,
Google, MSN.com, etc.

Advantages of secondary data
• Immediately available.
• Cheaper than obtaining new data.

Disadvantages of secondary data
• May be incomplete.
• May have been collected to satisfy different needs.
• No control exists over the method of collection and accuracy of the data.

1.6 Methods of data collection


Why do we sample from a population? In Section 1.1, we learnt that it is sometimes not
feasible to study the entire population. Three reasons why we sample are:
(a) The determination of the characteristic under investigation may involve a destructive
test, as for example in determining the tensile strength of a metal specimen or the
lifetime of a car battery.
(b) It is sometimes impossible to check all items in a population. For example, it is not
possible to count all the fish in a lake, or an entire population of birds or snakes.
(c) Studying all the items in a population is often prohibitively costly and
time-consuming.

Sampling methods
A sampling method (or sampling design) is a definite plan for obtaining a sample from a
given population. Practical difficulties in handling certain parts of a population may
point to their elimination from the scope of a survey. Thus, any sample selection
procedure will give some individuals the chance to be included in the sample while
excluding others. The people who have a chance of being included among those
selected constitute a sample frame. Examples are the Electoral Register of Ghana
(which contains the names of all those who can vote in Ghana) and the lists of members of
professional associations (statisticians, doctors, lawyers, etc.).

Simple random sampling


Once a researcher has made a decision about a sample frame, the next question is how
to select the individual units to be included. One method is to use simple random
sampling. Here, each item in the sample frame has the same chance of being selected.
The sample obtained by using simple random sampling is called a simple random
sample. One way of obtaining a simple random sample is to use the “lottery system”.

The lottery system
The lottery system consists of writing the name of each item in the sample frame on a
slip of paper or a card and then drawing them from a container one after the other. To
ensure a bias-free selection, shuffle the cards or the slips of paper before each draw.
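
The lottery system can be imitated on a computer. The following is a minimal sketch using Python's standard `random` module; the sample frame of 50 student labels, the function name `lottery_draw` and the seed value are all hypothetical and serve only to illustrate the procedure described above.

```python
import random

# Hypothetical sample frame: one "slip of paper" per item.
sample_frame = [f"Student {i}" for i in range(1, 51)]

def lottery_draw(frame, sample_size, seed=None):
    """Draw sample_size items without replacement, shuffling before each draw."""
    rng = random.Random(seed)      # a seed makes the draw repeatable
    container = list(frame)        # copy, so the frame itself is not changed
    drawn = []
    for _ in range(sample_size):
        rng.shuffle(container)         # shuffle the slips before each draw
        drawn.append(container.pop())  # draw one slip; it is not put back
    return drawn

sample = lottery_draw(sample_frame, 5, seed=1)
print(sample)
```

In practice, the single call `random.sample(sample_frame, 5)` selects a simple random sample without replacement in the same spirit; the explicit loop is written out here only to mirror the shuffle-then-draw steps of the lottery.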

Advantages of the lottery system
• It is independent of the properties of the population.
• It is a very reliable method of selecting random samples.
• It eliminates selection bias.

Disadvantages of the lottery system
• It is time-consuming and cumbersome when the population is large.
• It cannot be used when the population is infinite.

A discussion of methods of data collection can be found in Levy and Lemeshow
(1999) and Rao (2000).

1.7 Computers and statistical analysis


The recent widespread use of computers has had a tremendous impact on statistical
analysis. Computers can perform more calculations faster and far more accurately than
can human technicians. The use of computers makes it possible for investigators to
devote more time to the improvement of the quality of raw data and the interpretation of
the results.
The current prevalence of microcomputers and the abundance of statistical software
packages have further revolutionized statistical computing. The researcher in search of
a statistical software package will find the book by Woodward et al. (1987) extremely
helpful. This book describes approximately 140 packages. Among the most prominent
ones are: Statistical Package for the Social Sciences (SPSS), S-plus, Minitab, SAS and
GENSTAT. The spreadsheet, Excel, also has facilities for statistical analysis.

Exercise 1(b)
1. Give two reasons why it is sometimes necessary to take a sample from a population.
2. State two ways of obtaining primary data.
3. State two sources of secondary data.
4. State two advantages and two disadvantages of the lottery system for taking a simple
random sample from a population.
5. State two disadvantages and one advantage of the telephone interview as a means of
collecting data.

Revision Exercises 1
1. Briefly describe the difference between descriptive statistics and inferential
statistics.
2. A doctor examined a patient to determine the cause of a disease. He took a drop of
blood and used it to determine the state of health of the patient. What aspect of
statistics is the doctor employing in order to form a judgement?
3. In your own words, explain and give an example of each of the following statistical
terms:
(a) population, (b) sample.
4. Mrs. Akrong wants to check whether the pot of soup she is cooking has the right
taste and quantity of salt. She did this by tasting a small portion of the soup scooped
in a ladle. What aspect of statistics is she employing in order to form a judgement?
Briefly explain why she decided to use this particular method.
5. Explain the difference between qualitative and quantitative data. Give examples of
qualitative and quantitative data.
6. List the four levels of measurement and give examples.
7. Explain the difference between:
(a) nominal and ordinal data, (b) a census and a sample survey,
(c) discrete data and continuous data.

References
1. Levy, P. S. and Lemeshow, S. (1999). Sampling of Populations: Methods and
Applications. John Wiley and Sons Inc., New York.
2. Rao, P. S. R. S. (2000). Sampling Methodologies with Applications. Chapman and
Hall, London.
3. Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103,
677 – 680.
4. Woodward, W. A., Elliott, A. C. and Gray, H. L. (1987). Directory of Statistical
Microcomputer Software. Marcel Dekker, New York.
