DATA COLLECTION AND SAMPLING

Preprint · January 2018
DOI: 10.13140/RG.2.2.16052.55688
Author: Hitesh Mohapatra, Veer Surendra Sai University of Technology
https://www.researchgate.net/publication/322656396


SYLLABUS OF UNIT III

Data Collection: Primary data, secondary data, Processing and analysis of data, Measurement
of relationship, Statistical measurement & significance, Random Sampling, Systematic
Sampling, Stratified Sampling, Cluster Sampling, Multistage Sampling.

UNIT III

3.1 DATA COLLECTION

The task of data collection begins after a research problem has been defined and the research design/plan chalked out. Data are of two types: primary data and secondary data.

Primary data are collected afresh, for the first time, and are thus original in character. Secondary data have already been collected by someone else and have already passed through the statistical process.

Experiments and surveys

We collect primary data from experiments in experimental research. In research of the descriptive type, a survey refers to the method of securing information from all or a selected number of respondents.

SURVEY vs EXPERIMENT
1. Surveys are conducted in descriptive research; experiments are part of experimental research.
2. Survey-type research usually has a larger sample; experimental studies need a smaller sample.
3. Surveys are concerned with describing, recording, analysing and interpreting conditions that exist or have existed; the variables that exist or have already occurred are selected and observed, with no manipulation of variables. Experimental research provides a systematic and logical method for answering questions, in which certain variables are carefully controlled or manipulated.
4. Surveys are appropriate to the social and behavioural sciences; experiments are an essential feature of the physical and natural sciences.
5. Surveys are examples of field research; experiments are examples of laboratory research.
6. Surveys are concerned with hypothesis formulation and testing; experiments provide a method for hypothesis testing.
7. Possible relations between data and unknowns can be studied through surveys; experiments are meant to determine such relationships.

3.1.1 COLLECTION OF PRIMARY DATA: There are several methods.

(A) OBSERVATION METHOD:


Observation becomes a scientific tool and a method of data collection for the researcher when it serves a formulated research purpose, is systematically planned and recorded, and is subjected to checks and controls on validity and reliability. The information is sought by the investigator's direct observation, without asking the respondent. For example, in a study of consumer behaviour the investigator, instead of asking which brand of wrist watch the respondent uses, may simply look at the watch.

Advantages of observation method

1. Subjective bias is eliminated.
2. Information obtained by this method relates to what is currently happening.
3. This method is independent of the respondent's willingness to respond.

Limitations of observation method

1. It is an expensive method.
2. Information provided by this method is limited.
3. Sometimes observation is biased, and some people create obstacles to collecting data
effectively.

Structured Observation: If the observation is characterized by a careful definition of the units to be observed, a standardized style of recording the observed information, standardized conditions of observation, and selection of pertinent data of observation, then it is called structured observation.

Unstructured Observation: If the observation lacks these characteristics, it is called unstructured observation.

Participant Observation: If the observer observes by making himself a member of the group he is observing, so that he can experience what the members of the group experience, the observation is called participant observation; otherwise it is non-participant observation.

Uncontrolled observation: If the observation takes place in the natural setting, then it is called uncontrolled observation.

Controlled observation: If the observation takes place according to definite pre-arranged plans
involving experimental procedure then it is called controlled observation. In controlled
observation we use mechanical instruments as aids to accuracy and standardization.
(B) INTERVIEW METHOD:

This method involves the presentation of oral-verbal stimuli and the reply in terms of oral-verbal responses. It can be used through personal interviews and, if possible, through telephone interviews.

(B1) Personal interview: It may be in form of direct personal investigation or indirect oral
investigation. In case of direct personal investigation the interviewer has to collect information
personally from the sources. In some cases it is impossible to contact directly the person
concerned. In such cases an indirect oral examination can be conducted under which the
interviewer has to cross examine other persons who are supposed to have knowledge about the
problem under investigation. Most of the commissions and committees appointed by government make use of this method.

Structured interviews involve the use of a set of predetermined questions and highly standardized techniques of recording.

Unstructured interviews involve flexibility of approach to questioning. They demand deep knowledge and greater skill of the interviewer, and are used for exploratory or formulative research.

Focused interview is meant to focus attention on the given experience of the respondent & its
effects. The main task of interviewer is to confine the respondent to a discussion of issues with
which he seeks conversance.

A clinical interview is concerned with feelings or motivations, or with the course of an individual's life experience. In a non-directive interview the interviewer encourages the respondent to talk about the given topic with a minimum of direct questioning.

Advantages of Personal interview Method

1. More information, and in greater depth, can be obtained.
2. The interviewer, by his own skill, can overcome the resistance, if any, of the respondent.
3. There is an opportunity to restructure questions, so there is greater flexibility.
4. The observation method can be applied while recording verbal answers.
5. Personal information can be obtained easily.
6. The interviewer can control which person answers the questions; group discussions may also be held.
7. The language of the interview can be adapted to the ability of the person interviewed.

Weaknesses of Personal interview Method

1. It is an expensive method, and bias of the interviewer is possible.
2. It is more time-consuming for a large sample.
3. The presence of the interviewer on the spot may over-stimulate the respondent.
4. The organisation required for selecting, training and supervising field staff is more complex.
5. Interviewing at times may introduce systematic errors.

(B2) Telephone interview: This method of collecting information involves contacting respondents by telephone. It is not widely used, but it is important in industrial surveys.

Merits

1. It is more flexible and faster in comparison with other methods.
2. It is cheaper than the personal interview method.
3. Recall is easy; call-backs are simple and economical.
4. There is a higher rate of response than with the mailing method.
5. Replies can be recorded without causing embarrassment to the respondent.
6. The interviewer can explain requirements more easily.
7. Access can be gained to respondents who otherwise cannot be contacted.

Demerits

1. Little time is given to respondents.
2. Surveys are restricted to telephone users.
3. It is not suitable for surveys where comprehensive answers are required.
4. There is more possibility of interviewer bias.
5. Questions have to be short and to the point; probes are difficult to handle.

(C) COLLECTION OF DATA THROUGH QUESTIONNAIRES

This method is adopted by private individuals, research workers, private and public organizations, and by government. The questionnaire is mailed to respondents, who are expected to read and understand the questions, write their replies, and return the questionnaire.

Merits

1. There is low cost.
2. It is free from the bias of the interviewer.
3. Respondents have adequate time to answer.
4. Large samples can be used, and thus the results can be made more dependable.

Demerits

1. Low rate of return of duly filled-in questionnaires.
2. It can be used only with educated and cooperating respondents.
3. Control over the questionnaire may be lost once it is sent.
4. There is inbuilt inflexibility because of the difficulty of amending the approach once questionnaires have been despatched.
5. There is also the possibility of ambiguous replies, or omission of replies altogether to certain questions; interpretation of omissions is difficult.
6. It is difficult to know whether willing respondents are truly representative.

Three Main aspects of a questionnaire:


1. General form:
The general form of a questionnaire is either structured or unstructured.
In structured questionnaires there are definite, concrete and pre-determined questions. The
questions are presented with exactly the same wording and in the same order to all respondents.
Resort is taken to this sort of standardisation to ensure that all respondents reply to the same set of
questions. The form of the question may be either closed (i.e., ‘yes’ or ‘no’) or open (i.e., inviting
free response) but should be stated in advance.
Structured questionnaires may also have fixed alternative questions in which responses of the
informants are limited to the stated alternatives. Thus a highly structured questionnaire is one in
which all questions and answers are specified and comments in the respondent’s own words are
held to the minimum.
When these characteristics are not present in a questionnaire, it can be termed as unstructured.

Structured questionnaires are simple to administer and relatively inexpensive to analyse. The
provision of alternative replies, at times, helps to understand the meaning of the question clearly.
On the basis of the results obtained in pretest (testing before final use) operations from the use of
unstructured questionnaires, one can construct a structured questionnaire for use in the main study.

2. Question sequence: To make the questionnaire effective and to ensure quality to the replies
received, a researcher should pay attention to the question-sequence. A proper sequence of
questions reduces the chances of individual questions being misunderstood.
The question-sequence must be clear and smoothly moving. The first few questions are particularly important because they are likely to influence the attitude of the respondent and to secure his desired cooperation. The opening questions should be such as to arouse human interest.
The following type of questions should generally be avoided as opening questions in a
questionnaire:
1. questions that put too great a strain on the memory or intellect of the respondent;
2. questions of a personal character;
3. questions related to personal wealth, etc.
Knowing what information is desired, the researcher can rearrange the order of the questions
to fit the discussion in each particular case. Difficult questions must be relegated towards the end
so that even if the respondent decides not to answer such questions, information would have
already been obtained. The answer to a given question is a function not only of the question itself,
but of all previous questions.

3. Question formulation and wording:

Questions must be very clear and impartial. All questions should be (a) easily understood, (b) simple, and (c) concrete, and should conform as much as possible to the respondent's way of thinking. For instance, instead of asking "How many razor blades do you use annually?", the more realistic question would be "How many razor blades did you use last week?"

Concerning the form of questions, we can talk about two principal forms, viz., the multiple-choice question and the open-end question. A question with only two possible answers ('Yes' or 'No') is called a 'closed question'.

Open-ended questions which are designed to permit a free response from the respondent rather
than one limited to certain stated alternatives are considered appropriate. Such questions give the
respondent considerable latitude in phrasing a reply. Getting the replies in respondent’s own words is
the major advantage of open-ended questions. But open-ended questions are more difficult to handle,
raising problems of interpretation, comparability and interviewer bias.
The researcher must use proper wording of questions, since reliable and meaningful returns depend on it to a large extent. Simple words, which are familiar to all respondents, should be employed. Words with ambiguous meanings must be avoided. Similarly, danger words, catch-words and words with emotional connotations should be avoided.

Essentials of a good questionnaire:


Questionnaire should be comparatively short and simple.
Questions should proceed in logical sequence moving from easy to more difficult questions.
Personal and intimate questions should be left to the end.
Technical terms and vague expressions capable of different interpretations should be avoided.
Questions may be dichotomous (yes or no answers), multiple choice (alternative answers listed) or
open-ended. The latter type of questions are often difficult to analyse and hence should be avoided in
a questionnaire. There should be some control questions in the questionnaire which indicate the
reliability of the respondent.
For instance, a question designed to determine the consumption of particular material may be
asked first in terms of financial expenditure and later in terms of weight.
The control questions, thus, introduce a cross-check to see whether the information collected
is correct or not. Questions affecting the sentiments of respondents should be avoided.
Adequate space for answers should be provided in the questionnaire to help editing and
tabulation. The quality of the paper, along with its colour, must be good so that it may attract the
attention of recipients.

(D) COLLECTION OF DATA THROUGH SCHEDULES


In this method of data collection, schedules (proforma containing a set of questions) are filled in by enumerators who are specially appointed for the purpose.
These enumerators along with schedules, go to respondents, put to them the questions from the
proforma in the order the questions are listed and record the replies in the space meant for the same in
the proforma.
In certain situations, schedules may be handed over to respondents and enumerators may help them in
recording their answers to various questions in the said schedules.
Enumerators explain the aims and objects of the investigation and also remove the difficulties which
any respondent may feel in understanding the implications of a particular question or the definition or
concept of difficult terms.
The enumerators should be selected and trained to perform their job well and the nature and
scope of the investigation should be explained to them thoroughly so that they may well
understand the implications of different questions put in the schedule.
Enumerators should be intelligent and must possess the capacity of cross-examination in
order to find out the truth. Above all, they should be honest, sincere, hardworking and should
have patience and perseverance.

This method of data collection is very useful in extensive enquiries and can lead to fairly reliable results. It is, however, very expensive and is usually adopted in investigations conducted by governmental agencies or by some big organisations.
Population censuses all over the world are conducted through this method.
DIFFERENCE BETWEEN QUESTIONNAIRES AND SCHEDULES
1. The questionnaire is sent by mail to informants to be answered; the schedule is filled out by the researcher or the enumerator, who can interpret questions when necessary.
2. Collecting data through a questionnaire is cheap; the schedule method is more expensive.
3. Non-response is high with questionnaires; it is very low with schedules.
4. With a questionnaire it is not always clear who replies; with a schedule the identity of the respondent is known.
5. Personal contact is not possible with a questionnaire; with a schedule direct personal contact is established with respondents.
6. The questionnaire can be used only when respondents are literate and cooperative; with a schedule, information can be gathered even when the respondent is illiterate.
7. A wider and more representative distribution of the sample is possible with questionnaires; there is difficulty in sending enumerators over a relatively wide area.
8. The physical appearance of the questionnaire must be attractive; this is not so in the case of schedules, as they are filled in by enumerators rather than by respondents.

SOME OTHER METHODS OF DATA COLLECTION


Some other methods of data collection are particularly used by big business houses in modern times:
1. Warranty cards: Warranty cards are usually postal sized cards which are used by dealers of
consumer durables to collect information regarding their products. The questions are printed on
the ‘warranty cards’ which is placed inside the package along with the product with a request to
the consumer to fill in the card and post it back to the dealer.

2. Distributor or store audits: Distributor or store audits are performed by distributors as well as manufacturers through their salesmen at regular intervals. Distributors get the retail stores audited through salesmen and use such information to estimate market size, market share, seasonal purchasing patterns and so on. The data are obtained in such audits by observation. The advantage of this method is that it offers the most efficient way of evaluating the effect on sales of variations of different techniques of in-store promotion.

3. Pantry audits: Pantry audit technique is used to estimate consumption of the basket of goods
at the consumer level. The investigator collects an inventory of types, quantities and prices of
commodities consumed. Data are recorded from the examination of consumer’s pantry. The
objective is to find out what types of consumers buy certain products and certain brands.
A limitation of the pantry audit is that, at times, it may not be possible to identify consumers' preferences from the audit data alone.

4. Consumer panels: An extension of the pantry audit approach on a regular basis is known as a 'consumer panel', where a set of consumers agree to maintain detailed daily records of their consumption, and the same are made available to the investigator on demand. In other words, a consumer panel is essentially a sample of consumers who are interviewed repeatedly over a period of time.
Initial interviews are conducted before the phenomenon takes place to record the attitude of the
consumer. A second set of interviews is carried out after the phenomenon has taken place to find out
the consequent changes occurred in the consumer’s attitude. Consumer panels have been used in the
area of consumer expenditure, public opinion and radio and TV listenership.
5. Use of mechanical devices: Mechanical devices have been widely used to collect information by indirect means. The eye camera, pupilometric camera, psychogalvanometer, motion picture camera and audiometer are the principal devices so far developed and commonly used by modern big business houses.
Eye cameras are designed to record the focus of a respondent's eyes on a specific portion of a sketch, diagram or written material. Such information is useful in designing advertising material. Pupilometric cameras record dilation of the pupil as a result of a visual stimulus; the extent of dilation shows the degree of interest aroused by the stimulus. The psychogalvanometer is used for measuring the extent of body excitement as a result of the visual stimulus. Motion picture cameras can be used to record the body movements of a buyer while deciding to buy a consumer good in a shop or big store.

6. Projective techniques: Projective techniques (or indirect interviewing techniques) for the collection of data use the projections of respondents to infer underlying motives, urges or intentions which are such that the respondent either resists revealing them or is unable to figure them out himself. In projective techniques the respondent, in supplying information, tends unconsciously to project his own attitudes or feelings onto the subject under study. Projective techniques are important in motivational research and in attitude surveys.

(i) Word association tests: These tests are used to extract information regarding such words which
have maximum association. This technique is frequently used in advertising research.
(ii) Sentence completion tests: These tests happen to be an extension of the technique of word
association tests. This technique permits the testing not only of words (as in case of word
association tests), but of ideas as well and thus, helps in developing hypotheses and in the
construction of questionnaires.
(iii) Story completion tests: Such tests are a step further wherein the researcher may contrive
stories instead of sentences and ask the informants to complete them. The respondent is given just
enough of story to focus his attention on a given subject and he is asked to supply a conclusion to
the story.
(iv) Verbal projection tests: These are the tests wherein the respondent is asked to comment on or
to explain what other people do. For example, why do people smoke? Answers may reveal the
respondent’s own motivations.
(v) Pictorial techniques: There are several pictorial techniques. The important ones are as follows:
(a) Thematic apperception test (T.A.T.): The TAT consists of a set of pictures that are
shown to respondents who are asked to describe what they think the pictures represent.
The replies of respondents constitute the basis for the investigator to draw inferences
about their personality structure, attitudes, etc.
(b) Rosenzweig test: This test uses a cartoon format wherein we have a series of cartoons
with words inserted in ‘balloons’ above. The respondent is asked to put his own words
in an empty balloon space provided for the purpose in the picture.
(c) Rorschach test: This test consists of ten cards having prints of inkblots. The design
happens to be symmetrical but meaningless. The respondents are asked to describe what
they perceive in such symmetrical inkblots.
(d) Holtzman Inkblot Test (HIT): This test consists of 45 inkblot cards which are based on
colour, movement, shading and other factors involved in inkblot perception. Only one
response per card is obtained from the subject (or the respondent) and the responses of a
subject are interpreted at three levels of form appropriateness. Form responses are
interpreted for knowing the accuracy (F) or inaccuracy (F–) of respondent’s percepts;
shading and colour for ascertaining his affectional and emotional needs.
(e) Tomkins-Horn picture arrangement test: This test is designed for group administration.
It consists of twenty-five plates, each containing three sketches that may be arranged in
different ways . Respondent is asked to arrange them in a sequence. The responses are
interpreted as providing evidence confirming certain norms, respondent’s attitudes, etc.
(vi) Play techniques: Subjects are asked to improvise a situation in which they have been
assigned various roles. The researcher observes such traits as hostility, dominance, sympathy,
prejudice or the absence of such traits. These techniques have been used for knowing the attitudes
of younger ones through manipulation of dolls.
(vii) Quizzes, tests and examinations: This is also a technique of extracting information regarding
specific ability of candidates indirectly. In this procedure both long and short questions are
framed to test analytical ability.
(viii) Sociometry: It is a technique for describing the social relationships among individuals in a
group. It attempts to describe attractions or repulsions between individuals by asking them to indicate
whom they would choose or reject in various situations.
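For illustration, sociometric choices are usually tabulated into a matrix of "who chooses whom", whose column totals show how often each member is chosen. The following is a minimal sketch; the member names and choices are illustrative assumptions, not data from any real study.

```python
# Tabulating sociometric choices into a choice matrix (illustrative data).
members = ["Asha", "Ben", "Chitra"]
choices = {"Asha": ["Ben"], "Ben": ["Asha", "Chitra"], "Chitra": ["Ben"]}

# matrix[i][j] = 1 if member i chose member j, else 0
matrix = [[1 if b in choices[a] else 0 for b in members] for a in members]

# Column totals: how often each member was chosen (a simple "status" count).
status = [sum(row[j] for row in matrix) for j in range(len(members))]
print(dict(zip(members, status)))  # Ben is chosen most often
```

The same matrix can be built for rejections, giving a picture of attractions and repulsions within the group.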

7. Depth interviews: Depth interviews are those interviews that are designed to discover underlying motives and desires, and are often used in motivational research. Such interviews are held to explore the needs, desires and feelings of respondents. They aim to elicit unconscious, as well as other types of, material relating especially to personality dynamics and motivations. Depth interviews require great skill on the part of the interviewer and involve considerable time.
8. Content-analysis: Content-analysis consists of analysing the contents of documentary materials such as books, magazines and newspapers, and the contents of all other verbal materials, whether spoken or printed. Since the 1950s, content-analysis has mainly been qualitative analysis concerning the message of the existing documents. Content-analysis is measurement through proportion: it measures pervasiveness, which is sometimes an index of the intensity of a force.
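The idea of "measurement through proportion" can be sketched as the share of a document's words that fall into a category of interest. The category word list and the sample sentence below are illustrative assumptions only.

```python
# A minimal sketch of content-analysis as measurement through proportion.
from collections import Counter

def category_proportion(text, category_words):
    """Proportion of tokens in the text that belong to the category."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    in_category = sum(counts[w] for w in category_words)
    return in_category / len(tokens)

text = "prices rose as inflation fears grew and prices stayed high"
p = category_proportion(text, {"prices", "inflation"})
print(round(p, 2))  # 0.3
```

A higher proportion indicates greater pervasiveness of the theme in the material analysed.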

3.2 COLLECTION OF SECONDARY DATA


Secondary data are data that are already available, i.e., they refer to data which have already been collected and analysed by someone else.
When the researcher utilises secondary data, he has to look into the various sources from which he can obtain them. Secondary data may be either published or unpublished.
Published data are available in:
(a) various publications of the central, state and local governments;
(b) various publications of foreign governments or of international bodies;
(c) technical and trade journals;
(d) books, magazines and newspapers;
(e) reports and publications of various associations connected with business and industry, banks,
stock exchanges, etc.;
(f) reports prepared by research scholars, universities, economists, etc. in different fields; and
(g) public records and statistics, historical documents, and other sources of published information.

The sources of unpublished data are many; they may be found in diaries, letters, unpublished
biographies and autobiographies and also may be available with scholars and research workers, trade
associations, labour bureaus and other public/ private individuals and organisations.

The researcher must be very careful in using secondary data. He must make a minute scrutiny, because it is quite possible that the secondary data may be unsuitable or inadequate in the context of the problem which the researcher wants to study.
Before using secondary data, the researcher must see that they possess the following characteristics:
1. Reliability of data: Reliability can be tested by finding out the following about the said data:
(a) Who collected the data?
(b) What were the sources of the data?
(c) Were they collected by using proper methods?
(d) At what time were they collected?
(e) Was there any bias of the compiler?
(f) What level of accuracy was desired? Was it achieved?

2. Suitability of data: The data that are suitable for one enquiry may not be suitable in another
enquiry. If the available data are unsuitable, they should not be used by the researcher.
Researcher must scrutinise the definition of various terms and units of collection used at the time
of collecting the data from the primary source originally. The object, scope and nature of the
original enquiry must be studied. If the researcher finds differences in these, the data will remain
unsuitable.
3. Adequacy of data: If the level of accuracy achieved in data is found inadequate for the
purpose of the present enquiry, they will be considered as inadequate and should not be used by
the researcher. The data will also be inadequate, if they are related to an area which may be either
narrower or wider than the area of the present enquiry.
The already available data should be used by the researcher only when he finds them reliable, suitable and adequate. The most desirable approach with regard to the selection of a method depends on the nature of the particular problem and on the time and resources (money and personnel) available, along with the desired degree of accuracy. Above all, it depends upon the ability and experience of the researcher.
Guidelines for Constructing Questionnaire/Schedule
1. The researcher must keep in view the problem he is to study for it provides the starting
point for developing the Questionnaire/Schedule. He must be clear about the various
aspects of his research problem to be dealt with in the course of his research project.
2. Appropriate form of questions depends on the nature of information sought, the sampled
respondents and the kind of analysis intended. The researcher must decide whether to
use closed or open-ended question.
3. Rough draft of the Questionnaire/Schedule be prepared, giving due thought to the
appropriate sequence of putting questions.
4. Researcher must invariably re-examine, and in case of need may revise the rough draft
for a better one. Technical defects must be minutely scrutinised and removed.
5. Pilot study should be undertaken for pre-testing the questionnaire. The questionnaire
may be edited in the light of the results of the pilot study.
6. The questionnaire must contain simple but straightforward directions for the respondents so that they may not feel any difficulty in answering the questions.

Guidelines for Successful Interviewing


1. Interviewer must plan in advance and should fully know the problem under
consideration. He must choose a suitable time and place so that the interviewee may be
at ease during the interview period.
2. Interviewer’s approach must be friendly and informal. Initially friendly greetings in
accordance with the cultural pattern of the interviewee should be exchanged and then
the purpose of the interview should be explained.
3. All possible effort should be made to establish proper rapport with the interviewee;
people are motivated to communicate when the atmosphere is favourable.
4. Interviewer must know that ability to listen with understanding, respect and curiosity is
the gateway to communication, and hence must act accordingly during the interview.
5. The questions must be well phrased in order to have full cooperation of the interviewee.
But the interviewer must control the course of the interview in accordance with the
objective of the study.
6. In case of big enquiries, where the task of collecting information is to be accomplished
by several interviewers, there should be an interview guide to be observed by all so as to
ensure reasonable uniformity.

3.3 Processing and Analysis of Data


The data, after collection, has to be processed and analysed in accordance with the outline laid down
for the purpose at the time of developing the research plan. This is essential for a scientific study and
for ensuring that we have all relevant data for making contemplated comparisons and analysis.
Processing implies
(1) Questionnaire checking
(2) Editing
(3) Coding
(4) Classification
(5) Tabulation
(6) Graphical representation
(7) Data cleaning
(8) Data Adjusting
(1) Questionnaire checking: This involves checking all questionnaires for completeness and quality. A questionnaire may not be acceptable if:
(a) it is fully or partially incomplete;
(b) it is answered by a person with inadequate knowledge; or
(c) it is answered in a way which suggests that the respondent could not understand the questions.
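The completeness part of this check can be automated. Below is a minimal sketch, assuming each questionnaire is a dict of question names to answers; the field names and the 80% completeness threshold are illustrative assumptions, not any standard rule.

```python
# A minimal sketch of questionnaire checking for completeness.
def is_acceptable(questionnaire, required, min_complete=0.8):
    """Reject questionnaires that are fully or largely incomplete."""
    answered = [q for q in required if questionnaire.get(q) not in (None, "")]
    return len(answered) / len(required) >= min_complete

responses = [
    {"age": 34, "brand": "A", "rating": 4},
    {"age": None, "brand": "", "rating": None},  # largely incomplete
]
required = ["age", "brand", "rating"]
accepted = [r for r in responses if is_acceptable(r, required)]
print(len(accepted))  # 1
```

Criteria (b) and (c), by contrast, require judgement about the respondent and cannot be checked mechanically.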

2. Editing: Editing of data is done to detect errors and omissions and to correct these when
possible. It involves a careful scrutiny of the completed questionnaires and/or schedules. Editing
is done to assure that the data are accurate, consistent with other facts gathered, uniformly
entered.
Field editing consists in the review of the reporting forms by the investigator for completing
(translating or rewriting) what the interviewer has written in abbreviated and/or illegible form. This
editing should be done soon after the interview. While doing field editing, the investigator must
restrain himself and must not correct errors of omission by guessing what the informant would have
said.
Central editing takes place after all forms or schedules have been completed and returned to the
office. All forms should get a thorough editing by a single editor in a small study and by a team of
editors in case of a large inquiry.
Editor(s) may correct obvious errors such as an entry in the wrong place, an entry recorded in
months when it should have been recorded in weeks, and the like. In case of inappropriate or missing
replies, the editor can sometimes determine the proper answer by reviewing the other information in
the schedule. All wrong replies, which are quite obvious, must be dropped from the final results.
Dos and don'ts for editors
(a) They should be familiar with instructions given to the interviewers and coders as well as
with the editing instructions supplied to them for the purpose.
(b) While crossing out an original entry for one reason or another, they should just draw a
single line on it so that the same may remain legible.
(c) They must make entries (if any) on the form in some distinctive colour and that too in a
standardised form.
(d) They should initial all answers which they change or supply.
(e) Editor’s initials and the date of editing should be placed on each completed form or
schedule.
3. Coding: It refers to the process of assigning numerals or other symbols to answers so that
responses can be put into a limited number of categories or classes. Such classes should be
appropriate to the research problem under consideration. They must also possess the
characteristic of exhaustiveness (i.e., there must be a class for every data item) and a specific
answer can be placed in one and only one cell in a given category set. Every class is defined in
terms of only one concept.
Coding is necessary for efficient analysis and through it the several replies may be reduced to
a small number of classes which contain the critical information required for analysis. Coding
decisions should usually be taken at the designing stage of the questionnaire. This makes it
possible to precode the questionnaire choices, which in turn is helpful for computer tabulation,
as one can key punch straight away from the original questionnaires. But in case of hand
coding we code in the margin with a coloured pencil. The other method can be to transcribe the
data from the questionnaire to a coding sheet. Coding errors are to be eliminated or minimized.

4. Classification: It is the process of arranging Data in classes or groups on the basis of common
characteristics.
Classification can be one of the following two types.
(a) Classification according to attributes: Data are classified by common characteristics which
can be descriptive (such as literacy, honesty, etc.) or numerical (such as weight, height, etc.).
Descriptive characteristics refer to qualitative phenomenon; only their presence or absence can be
noticed. Data obtained this way on the basis of certain attributes are known as statistics of
attributes and their classification is called classification according to attributes.
Such classification can be simple classification or manifold classification. In simple classification
we consider one attribute and divide the universe into two classes—one class consisting of items
possessing the given attribute and the other consisting of items which do not possess it.
In manifold classification we consider two or more attributes simultaneously, and divide the data
into a number of classes (the total number of classes of the final order is given by 2^n, where
n = number of attributes). Attributes are defined in such a manner that there is least possibility
of any doubt/ambiguity.

(b) Classification according to class intervals: Numerical characteristics refer to quantitative
phenomena which can be measured through some statistical units. Data relating to income, age,
weight, etc. come under this category. Such data are known as statistics of variables and are classified
on the basis of class intervals. For instance, persons whose incomes, say, are within Rs 201 to Rs 400
can form one group, those whose incomes are within Rs 401 to Rs 600 can form another group and so
on. In this way the entire data may be divided into a number of groups or classes which are called
‘class intervals’. Each class interval thus has an upper limit as well as a lower limit, which are
known as class limits. The difference between the two class limits is known as the class magnitude.
All classes or groups, with their respective frequencies put in a table, are described as frequency
distribution. Classification according to class intervals usually involves the following three main
problems:

Step I: How many classes should be there? What should be their magnitudes?
The objective should be to display the data in such a way as to make it meaningful for the analyst.
Typically, we may have 5 to 15 classes.
While determining class magnitudes, some statisticians adopt the following formula, suggested
by H.A. Sturges, for determining the size of class interval:
i = R / (1 + 3.3 log N)
where
i = size of class interval;
R = Range (i.e., difference between the values of the largest item and the smallest
item among the given items);
N = number of items to be grouped.
In case one or two or very few items have very high or very low values, one may use what are
known as open-ended intervals in the overall frequency distribution.
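Sturges' rule can be computed directly from raw data. The sketch below is a minimal illustration; the income values are hypothetical, used only to show the arithmetic.

```python
import math

def sturges_interval(values):
    """Size of class interval: i = R / (1 + 3.3 * log10(N)) (Sturges' rule)."""
    n = len(values)                    # N = number of items to be grouped
    r = max(values) - min(values)      # R = range of the data
    return r / (1 + 3.3 * math.log10(n))

# Hypothetical incomes (Rs) for illustration only
incomes = [210, 350, 410, 480, 520, 610, 640, 700, 820, 950]
i = sturges_interval(incomes)
print(round(i, 1))   # → 172.1, the suggested class magnitude
```

With N = 10 items and a range of 740, the rule suggests a class magnitude of about 172, which the analyst would normally round to a convenient figure such as 150 or 200.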
Step II: How to choose class limits?
The mid-point of a class-interval and the actual average of items of that class interval should
remain as close to each other as possible. Class limits should be in multiples of 2, 5, 10, 20, 100
etc.
Class limits may generally be stated in any of the following forms:
Exclusive type class intervals: They are usually stated as follows:
10–20 (read as 10 and under 20)
20–30 (read as 20 and under 30)
30–40
40–50
Under exclusive type class intervals, the upper limit of a class interval is excluded
and items with values less than the upper limit are put in the given class interval.
Inclusive type class intervals:
11–20 (read as 11 and under 21)
21–30
31–40
41–50
In inclusive type class intervals the upper limit of a class interval is also included in
the concerning class interval. Thus 20 will be put in 11–20 class interval. The stated
upper limit of the class interval 11–20 is 20 but the real limit is 20.99999

When the phenomenon under consideration happens to be a discrete one (i.e., can be
measured and stated only in integers), then we should adopt inclusive type classification. But
when the phenomenon happens to be a continuous one capable of being measured in fractions as
well, we can use exclusive type class intervals.
Step III: How to find the frequency of each class?
This can be done either by tally sheets or by mechanical aids. Under the technique of tally sheet,
the class-groups are written on a sheet of paper known as tally sheet and for each item a stroke
(a small vertical line) is marked against the class group in which it falls. After every four small
vertical lines in a class group, the fifth line for an item falling in the same group is indicated by a
horizontal line drawn through the said four lines. All this facilitates the counting of items in each of
the class groups. An illustrative tally sheet is shown below:
Table: Tally Sheet for Determining the Number of Families (70) in Different Income Groups

Income group (Rupees)    Tally marks                  Number of families (class frequency)
Below 400                IIII IIII III                              13
401–800                  IIII IIII IIII IIII                        20
801–1200                 IIII IIII II                               12
1201–1600                IIII IIII IIII III                         18
1601 and above           IIII II                                     7
Total                                                               70

Alternatively, class frequencies can be determined, in large inquiries and surveys, by machines.
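The tally-sheet counting can also be done mechanically. A minimal sketch, assuming a hypothetical sample of family incomes and the class intervals of the table above:

```python
from collections import Counter

def income_group(x):
    """Map an income (Rs) to the class intervals used in the tally sheet."""
    if x <= 400:
        return "Below 400"
    if x <= 800:
        return "401-800"
    if x <= 1200:
        return "801-1200"
    if x <= 1600:
        return "1201-1600"
    return "1601 and above"

# Hypothetical sample of family incomes, for illustration only
incomes = [250, 380, 450, 900, 1500, 1700, 620, 1100, 300, 1250]
freq = Counter(income_group(x) for x in incomes)   # class frequencies
for group, count in freq.items():
    print(group, count)
```

Each item is assigned to exactly one class (the classes are exhaustive and mutually exclusive), and the counter plays the role of the tally sheet.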
5. Tabulation: The procedure of arranging data in concise and logical order is called tabulation.
Thus, tabulation is the process of summarising raw data and displaying the same in compact form
for further analysis.
Tabulation is essential because of the following reasons.
1. It conserves space and reduces explanatory and descriptive statement to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and omissions.
4. It provides a basis for various statistical computations.

Tabulation can be done by hand or by mechanical or electronic devices. In large inquiries, we use
computer tabulation if other factors are favourable and facilities are available. The card sorting
method is flexible for hand tabulation. In this method the data are recorded on special cards of
convenient size and shape with a series of holes. Each hole stands for a code and when cards are
stacked, a needle passes through particular hole representing a particular code. These cards are then
separated and counted. Tabulation is classified as simple and complex tabulation. The former type of
tabulation gives information about one or more groups of independent questions, whereas the latter
type of tabulation shows the division of data in two or more categories.

Generally accepted principles of tabulation: (For constructing statistical tables)


1. Every table should have a clear, concise and adequate title. This title should always be
placed just above the body of the table.
2. Every table should be given a distinct number to facilitate easy reference.
3. The column headings (captions) and the row headings (stubs) of the table should be
clear and brief.
4. The units of measurement under each heading or sub-heading must always be indicated.
5. Explanatory footnotes, if any, concerning the table should be placed directly beneath the
table, along with the reference symbols used in the table.
6. Source or sources from where the data in the table have been obtained must be indicated
just below the table.
7. Usually the columns are separated from one another by lines which make the table more
readable and attractive. Lines are always drawn at the top and bottom of the table and
below the captions.
8. There should be thick lines to separate the data under one class from the data under
another class and the lines separating the sub-divisions of the classes should be
comparatively thin lines.
9. The columns may be numbered to facilitate reference.
10. Those columns whose data are to be compared should be kept side by side. Similarly,
percentages and/or averages must also be kept close to the data.
11. It is important that all column figures be properly aligned. Decimal points and (+)
or (–) signs should be in perfect alignment.
12. Abbreviations should be avoided.
13. Miscellaneous and exceptional items, if any, should be usually placed in the last row of
the table.
14. Table should be made as logical, clear, accurate and simple as possible. If the data
happen to be very large, they should not be crowded in a single table for that would
make the table unwieldy and inconvenient.
15. Total of rows should normally be placed in the extreme right column and that of
columns should be placed at the bottom.
16. The arrangement of the categories in a table may be chronological, geographical,
alphabetical or according to magnitude to facilitate comparison. Above all, the
table must suit the needs and requirements of an investigation.
6. Graphical Representation: All statistical packages and MS Excel offer graphs. In case of
qualitative data, common graphs are bar charts and pie charts.
Bar charts: A bar chart consists of a series of rectangles. The height of each rectangle is
determined by the frequency of that category.
Pie charts: A pie chart is used to emphasise the relative proportion of each category. It is a
circular chart divided into sectors. The relative frequency of each category is proportional to
the arc length (or area) of its sector.
7. Data cleaning: This includes checking the data for consistency and treatment of missing
values. Consistency checks look for data which are inconsistent or are outliers. Such data
may be discarded or replaced by the mean value. Missing values are values which are
unknown or not answered by the respondent. In place of missing values, some neutral
value (such as the mean) may be substituted.
8. Data adjusting: It is not always necessary, but it improves the quality of analysis. Each
respondent is assigned a weight to reflect his or her importance. Using this method the
collected sample can be made more representative of the target population on specific
characteristics.
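Mean substitution for missing values and respondent weighting, as described in points 7 and 8, can be sketched as follows. The response values and weights are hypothetical.

```python
# Replace missing values (None) by the mean of the observed values,
# then compute a weighted mean using respondent weights.
responses = [4, 5, None, 3, None, 5]
observed = [x for x in responses if x is not None]
mean = sum(observed) / len(observed)            # neutral substitute value
cleaned = [x if x is not None else mean for x in responses]

weights = [1.0, 0.5, 1.0, 2.0, 1.0, 0.5]        # hypothetical importance weights
weighted_mean = sum(w * x for w, x in zip(weights, cleaned)) / sum(weights)
print(cleaned, round(weighted_mean, 2))
```

In practice the weights would come from comparing the sample composition with known characteristics of the target population, not from arbitrary choices as here.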

ANALYSIS OF DATA
Analysis means the computation of certain indices or measures along with searching for patterns of
relationship that exist among the data groups. Analysis, in survey or experimental data, involves
estimating the values of unknown parameters of the population and testing of hypotheses for drawing
inferences. Analysis is categorised as descriptive analysis and inferential analysis (Inferential analysis
is called statistical analysis).
Descriptive analysis is largely the study of distributions of one variable. This study provides us with
profiles of companies, work groups, persons and other subjects on any of a multitude of characteristics
such as size, composition, efficiency, preferences, etc. This sort of analysis may be in respect of one
variable (unidimensional analysis), or of two variables (bivariate analysis) or of more than two
variables (multivariate analysis). We work out various measures that show the size and shape of a
distribution(s) along with the study of measuring relationships between two or more variables.
Correlation analysis studies the joint variation of two or more variables for determining the amount of
correlation between two or more variables.
Causal analysis is concerned with the study of how one or more variables affect changes in another
variable. It is the study of functional relationships existing between two or more variables. This
analysis is also called regression analysis.
(a) Multiple regression analysis: This analysis is adopted when the researcher has one dependent
variable which is presumed to be a function of two or more independent variables. The objective
of this analysis is to make a prediction about the dependent variable based on its covariance with
all the concerned independent variables.
(b) Multiple discriminant analysis: This analysis is appropriate when the researcher has a single
dependent variable that cannot be measured, but can be classified into two or more groups on the
basis of some attribute. It is used to predict an entity’s possibility of belonging to a particular
group based on several predictor variables.

(c) Multivariate analysis of variance (or multi-ANOVA): This analysis is an extension of two-way
ANOVA, wherein the ratio of among group variance to within group variance is worked out on a
set of variables.
(d) Canonical analysis: This analysis can be used in case of both measurable and non-
measurable variables for the purpose of simultaneously predicting a set of dependent variables
from their joint covariance with a set of independent variables.
(e)Inferential analysis is concerned with the various tests of significance for testing hypotheses in
order to determine with what validity data can be said to indicate some conclusion or conclusions.
It is also concerned with the estimation of population values. It is mainly on the basis of
inferential analysis that the task of interpretation (i.e., the task of drawing inferences and
conclusions) is performed.

3.4 MEASURES OF RELATIONSHIP

When the population consists of two or more variables, we may like to measure the relationship
between the variables. The measurement of the relationship between two or more variables can give
us an idea of the effect of one variable on the other.

Some methods of measurement of relationship are


In case of bivariate population: Correlation can be studied through
(a) cross tabulation;
(b) Charles Spearman’s coefficient of correlation;
(c) Karl Pearson’s coefficient of correlation; whereas cause and effect relationship can be studied
through simple regression equations.
In case of multivariate population: Correlation can be studied through
(a) coefficient of multiple correlation;
(b) coefficient of partial correlation; whereas cause and effect relationship can be studied
through multiple regression equations.

The cross tabulation approach is useful when the data are in nominal form. We classify each variable
into two or more categories and then cross classify the variables in these sub-categories. Then we look
for interactions between them which may be symmetrical, reciprocal or asymmetrical.
A symmetrical relationship is one in which the two variables vary together. A reciprocal
relationship exists when the two variables mutually influence each other. Asymmetrical relationship
exists if one variable (independent variable) is responsible for another variable (dependent variable).
The correlation, found through this approach is not a very powerful form of statistical correlation
and we use some other methods when data happen to be either ordinal or interval or ratio data.
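A two-way cross tabulation of nominal data can be built with a plain counter. The categories below (gender and beverage preference) are hypothetical, chosen only to illustrate the cross classification.

```python
from collections import Counter

# Hypothetical nominal data: (gender, preference) for each respondent
respondents = [("M", "tea"), ("F", "coffee"), ("M", "coffee"),
               ("F", "tea"), ("M", "tea"), ("F", "coffee")]

# Cross classify: count each (row category, column category) cell
table = Counter(respondents)
for gender in ("M", "F"):
    row = [table[(gender, drink)] for drink in ("tea", "coffee")]
    print(gender, row)
```

Inspecting such a table cell by cell is how one looks for symmetrical, reciprocal or asymmetrical interactions between the two classified variables.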

Charles Spearman’s coefficient of correlation (or rank correlation) is the technique of
determining the degree of correlation between two variables in case of ordinal data, where ranks
are given to the different values of the variables. The main objective of this coefficient is to
determine the extent to which the two sets of rankings are similar or dissimilar. This coefficient is
determined as under:

Spearman's coefficient of correlation: r_s = 1 − (6 Σ d_i²) / [n(n² − 1)]
where di = difference between ranks of ith pair of the two variables;
n = number of pairs of observations.
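The rank-correlation formula can be computed directly from two sets of ranks. The ranks below are hypothetical and assumed to be untied (the simple formula holds only in that case).

```python
def spearman_rs(rank_x, rank_y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), for untied ranks."""
    n = len(rank_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical ranks given by two judges to five candidates
ranks_judge1 = [1, 2, 3, 4, 5]
ranks_judge2 = [2, 1, 4, 3, 5]
print(spearman_rs(ranks_judge1, ranks_judge2))   # → 0.8
```

A value of 0.8 indicates that the two rankings are very similar; identical rankings would give +1 and exactly reversed rankings −1.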
Karl Pearson’s coefficient of correlation is the method of measuring degree of relationship between
two variables. This coefficient assumes the following:
(i) that there is linear relationship between the two variables;
(ii) that the two variables are causally related, which means that one of the variables is
independent and the other one is dependent; and
(iii) that a large number of independent causes are operating in both variables so as to produce a
normal distribution.

Karl Pearson’s coefficient of correlation:

r = Σ(X_i − X̄)(Y_i − Ȳ) / (n σ_X σ_Y) = Cov(X, Y) / (σ_X σ_Y)

or r = (Σ X_i Y_i − n X̄ Ȳ) / √[(Σ X_i² − n X̄²)(Σ Y_i² − n Ȳ²)]

where X_i = ith value of X variable, X̄ = mean of X, Y_i = ith value of Y variable,
Ȳ = mean of Y, n = number of pairs of observations of X and Y,
σ_X = standard deviation of X, σ_Y = standard deviation of Y.
Karl Pearson’s coefficient of correlation is also called product moment correlation coefficient.
The value of ‘r’ lies between −1 and +1. Positive values of r indicate positive correlation between
the two variables (i.e., changes in both variables take place in the same direction), whereas
negative values of ‘r’ indicate negative correlation, i.e., changes in the two variables take place
in opposite directions. A zero value of ‘r’ indicates that there is no association between the
two variables. When r = +1, it indicates perfect positive correlation and when r = −1, it
indicates perfect negative correlation.
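Pearson's r can be computed directly from the sums-of-products form of the formula. The data below are hypothetical; a perfectly linear relation is used so the expected result is known.

```python
import math

def pearson_r(xs, ys):
    """r = (sum XiYi - n*Xbar*Ybar) / sqrt((sum Xi^2 - n*Xbar^2)(sum Yi^2 - n*Ybar^2))"""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    num = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    den = math.sqrt((sum(x * x for x in xs) - n * xbar ** 2) *
                    (sum(y * y for y in ys) - n * ybar ** 2))
    return num / den

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]        # perfectly linearly related to x
print(pearson_r(x, y))      # → 1.0 (perfect positive correlation)
```

Reversing y would give −1.0, and unrelated data would give a value near zero, matching the interpretation of r stated above.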

3.5 STATISTICAL MEASUREMENT & SIGNIFICANCE


Statistical measurement is defined as the process of associating numbers or symbols with observations
obtained in a research study. It may be qualitative or quantitative. A qualitative characteristic can be
counted but cannot be computed. A quantitative characteristic can be computed (for example, mean
and standard deviation).
Nominal data are numerical in name only; they do not share any properties of numbers. For
example, if we record marital status as 1, 2 or 3 depending on whether a person is single, married
or widowed, we cannot write 3 > 1.
In ordinal data we get only inequalities. For example, Mohs scale numbers 1 to 10 are assigned
respectively to talc, gypsum, calcite, fluorite, apatite, feldspar, quartz, topaz, sapphire and diamond.
We can write 5 > 2 (apatite is harder than gypsum) or 6 < 9, but we cannot write 10 − 9 = 5 − 4,
as the difference in hardness between diamond and sapphire is much greater than that between
apatite and fluorite.
In addition to inequalities if we can form differences we refer to the data as interval data.
Example : Temperature reading (in F) 58, 70, 95, 110, 126

In addition to setting inequalities and forming differences if we can form quotients such data
refer to ratio data.
Example : measurement of height, area, weight etc.

Classification of measurement scales


The classification of measurement scales are: (a) nominal scale; (b) ordinal scale; (c) interval
scale; and (d) ratio scale.
(a)Nominal scale: Nominal scale is a system of assigning number symbols to events in order to label
them. The usual example of this is the assignment of numbers of basketball players in order to identify
them. Such numbers cannot be associated with an ordered scale for their order is of no consequence.
Nominal scales provide convenient ways of keeping track of people, objects and events. For example,
one cannot usefully average the numbers on the backs of a group of football players to obtain a meaningful result.
Counting of members in each group is the only possible arithmetic operation when a nominal scale is
employed. There is no generally used measure of dispersion for nominal scales. Chi-square test is the
most common test of statistical significance that can be utilized, and for the measures of correlation,
the contingency coefficient can be worked out.
Nominal scale is the least powerful level of measurement. It indicates no order or distance
relationship and has no arithmetic origin. The scale wastes any information about varying degrees
of attitude, skills, understandings, etc. Nominal scales are useful and are widely used in surveys
and other ex-post-facto research when data are classified by major sub-groups of the population.

(b) Ordinal scale: The lowest level of the ordered scale used is the ordinal scale. The ordinal
scale places events in order. Rank orders represent ordinal scales and are frequently used in
research relating to qualitative phenomena. A student’s rank in his graduation class involves the
use of an ordinal scale.
For instance, if Ram’s position in his class is 10 and Mohan’s position is 40, it cannot be said that
Ram’s position is four times as good as that of Mohan. Ordinal scales only permit the ranking of
items from highest to lowest. Ordinal measures have no absolute values, and the real differences
between adjacent ranks may not be equal. One person is higher or lower on the scale than
another, but more precise comparisons cannot be made.
Thus, the use of an ordinal scale implies a statement of ‘greater than’ or ‘less than’ (equality
statement is also acceptable) without our being able to state how much greater or less. Since the
numbers of this scale have only a rank meaning, the appropriate measure of central tendency is
the median. A percentile or quartile measure is used for measuring dispersion. Correlations are
restricted to various rank order methods. Measures of statistical significance are restricted to the
non-parametric methods.
(c) Interval scale: In interval scale, the intervals are adjusted in terms of some rule for making the
units equal. The units are equal only in so far as one accepts the assumptions on which the rule is
based. Interval scales can have an arbitrary zero, but it is not possible to determine for them what
may be called an absolute zero.
The limitation of the interval scale is the lack of a true zero; it does not have the capacity to
measure the complete absence of a characteristic. The Fahrenheit scale is an example of an
interval scale and shows similarities in what one can and cannot do with it. One can say that an
increase in temperature from 30° to 40° involves the same increase in temperature as an increase
from 60° to 70°, but one cannot say that a temperature of 60° is twice as warm as a
temperature of 30°, because the zero on the scale is set arbitrarily at the temperature of the
freezing point of water. The ratio of the two temperatures, 30° and 60°, means nothing
because zero is an arbitrary point.
Interval scales provide more powerful measurement than ordinal scales for interval scale also
incorporates the concept of equality of interval. Mean is the appropriate measure of central
tendency, standard deviation is used as measure of dispersion. The tests for statistical significance
are the ‘t’ test and ‘F’ test.
(d) Ratio scale: Ratio scales have an absolute or true zero of measurement. We can conceive of
an absolute zero of length and time. For example, the zero point on a centimeter scale indicates
the complete absence of length or height. But an absolute zero of temperature is theoretically
unobtainable. The number of minor traffic-rule violations and the number of incorrect letters in a
page of type script represent scores on ratio scales. Both these scales have absolute zeros and as
such all minor traffic violations and all typing errors can be assumed to be equal in significance.
Ratio scale represents the actual amounts of variables. Measures of physical dimensions such
as weight, height, distance, etc. are examples. Generally, all statistical techniques are usable with
ratio scales and all manipulations that one can carry out with real numbers can also be carried out
with ratio scale values. Multiplication and division can be used with this scale but not with other
scales mentioned above.
Thus, proceeding from the nominal scale to ratio scale, relevant information is obtained
increasingly. If the nature of the variables permits, the researcher should use the scale that provides
the most precise description. Researchers in physical sciences have the advantage to describe variables
in ratio scale form but the behavioural sciences are generally limited to describe variables in interval
scale form, a less precise type of measurement.

Goodness of Measurement Scales


The qualities of goodness are validity, reliability, practicality and accuracy. These are the major
considerations one should use in evaluating a measurement tool.

1. Validity
Validity indicates the degree to which an instrument measures what it is supposed to measure.
Validity is the extent to which differences found with a measuring instrument reflect true differences
among those being tested. We seek other relevant evidence that confirms the answers we have found with our measuring tool.
The three types of validity are: (i) Content validity; (ii) Criterion-related validity and (iii) Construct
validity.
(i) Content validity is the extent to which a measuring instrument provides adequate coverage of
the topic under study. If the instrument contains a representative sample of the universe, the
content validity is good. It can be determined by using a panel of persons who shall judge how
well the measuring instrument meets the standards.
(ii) Criterion-related validity relates to our ability to predict some outcome or estimate the existence
of some current condition. This form of validity reflects the success of measures used for some
empirical estimating purpose. The concerned criterion must possess the following qualities:
Relevance: (A criterion is relevant if it is defined in terms we judge to be the proper measure.)
Freedom from bias: (Freedom from bias is attained when the criterion gives each subject an equal
opportunity to score well.)
Reliability: (A reliable criterion is stable or reproducible.)
Availability: (The information specified by the criterion must be available.)
Criterion-related validity refers to (i) Predictive validity and (ii) Concurrent validity. The
former refers to the usefulness of a test in predicting some future performance whereas the latter
refers to the usefulness of a test in closely relating to other measures of known validity. Criterion-
related validity is expressed as the coefficient of correlation between test scores and some
measure of future performance or between test scores and scores on another measure of known
validity.
(iii) Construct validity is the most complex and abstract. A measure is said to possess construct
validity to the degree that it conforms to predicted correlations with other theoretical propositions.
Construct validity is the degree to which scores on a test can be accounted for by the explanatory
constructs of a sound theory. For determining construct validity, we associate a set of other
propositions with the results received from using our measurement instrument.
If the above stated criteria and tests are met with, we may state that our measuring instrument
is valid and will result in correct measurement; otherwise we have to look for more information.

2. Reliability
The test of reliability is another important test of sound measurement. A measuring instrument is
reliable if it provides consistent results. Reliable measuring instrument does contribute to validity,
but a reliable instrument need not be a valid instrument. Reliability is not as valuable as validity,
but it is easier to assess reliability. If the quality of reliability is satisfied by an instrument, then
while using it we can be confident that the transient and situational factors are not interfering.
Two aspects of reliability viz., stability and equivalence deserve special mention. The stability
aspect is concerned with securing consistent results with repeated measurements of the same person
and with the same instrument. Degree of stability is determined by comparing the results of repeated
measurements. The equivalence aspect considers how much error may get introduced by different
investigators or different samples of the items being studied. A good way to test for the equivalence of
measurements by two investigators is to compare their observations of the same events.
Reliability can be improved in the following two ways:
(i) By standardising the conditions under which the measurement takes place. This will
improve the stability aspect.
(ii) By carefully designed directions for measurement with no variation from group to
group, by using trained and motivated persons to conduct the research and also by
broadening the sample of items used. This will improve equivalence aspect.

3. Practicality
The practicality characteristic of a measuring instrument can be judged in terms of economy, convenience
and interpretability. From the operational point of view, the measuring instrument should be economical,
convenient and interpretable. Economy consideration suggests that some trade-off is needed between the
ideal research project and that which the budget can afford. The length of measuring instrument is an
important area where economic pressures are quickly felt. Data-collection methods to be used are also
dependent at times upon economic factors.
Convenience test suggests that the measuring instrument should be easy to administer. Attention to the
proper layout of measuring instrument is required. For instance, a questionnaire, with clear instructions is
more effective and easier to complete.
Interpretability consideration is important when persons other than the designers of the test are to interpret
the results. The measuring instrument, in order to be interpretable, must be supplemented by (a) detailed
instructions for administering the test;(b) scoring keys; (c) evidence about the reliability and (d)
guides for using the test and for interpreting results.
4. Accuracy: A measurement scale is accurate if it is a true representative of the observation of
the underlying characteristic. For example, measuring with an inch scale will provide values
accurate only up to one-eighth of an inch, whereas measuring with a centimetre scale will
provide more accurate values.

3.6 RANDOM SAMPLING


Probability sampling is also known as ‘random sampling’ or ‘chance sampling’. Under this
sampling design, every item of the universe has an equal chance of inclusion in the sample. The results
obtained from probability or random sampling can be assured in terms of probability, i.e., we can
measure the errors of estimation or the significance of results obtained from a random sample.
Random sampling ensures the law of Statistical Regularity, which states that if on an average the
sample chosen is a random one, the sample will have the same composition and characteristics as the
universe.
Random sampling from a finite population refers to that method of sample selection which
gives each possible sample combination an equal probability of being picked up. This applies to
sampling without replacement. Sampling with replacement is used less frequently.
The implications of random sampling (or simple random sampling) are:
(a) It gives each element in the population an equal probability of getting into the sample;
and all choices are independent of one another.
(b) It gives each possible sample combination an equal probability of being chosen.
Keeping this in view, we can define a simple random sample from a finite population as a
sample which is chosen in such a way that each of the C(N, n) possible samples has the same
probability, 1/C(N, n), of being selected.
For example, if we take a certain finite population consisting of six elements (say a, b, c, d, e,
f), i.e., N = 6, and we want to take a sample of size n = 3 from it, then there are C(6, 3) = 20
possible distinct samples of the required size, consisting of the elements abc, abd, abe, abf, acd,
ace, acf, ade, adf, aef, bcd, bce, bcf, bde, bdf, bef, cde, cdf, cef, and def. If we choose one of
these samples in such a way that each has the probability 1/20 of being chosen, we will then call
this a random sample.
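The 20 possible samples above can be enumerated directly. A minimal Python sketch, using the illustrative six-element population from the example:

```python
import itertools

# Illustrative finite population of N = 6 elements (from the example above)
population = ['a', 'b', 'c', 'd', 'e', 'f']

# All possible distinct samples of size n = 3, ignoring order
samples = list(itertools.combinations(population, 3))

print(len(samples))   # C(6, 3) = 20 distinct samples
print(samples[0])     # ('a', 'b', 'c')
```

Under simple random sampling, each of these 20 combinations would be drawn with the same probability, 1/20.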

HOW TO SELECT A RANDOM SAMPLE?


In simple cases we write each of the possible samples on a slip of paper, mix these slips thoroughly in
a container and then draw as a lottery either blindfolded or by rotating a drum or by any other similar
device. The practical utility of such a method is limited.
We can take a random sample in a relatively easier way without taking the trouble of enlisting all
possible samples on paper-slips. We can write the name of each element of a finite population on a
slip of paper, put the slips of paper so prepared into a box or a bag and mix them thoroughly and then
draw (without looking) the required number of slips for the sample one after the other without
replacement.
In doing so we must make sure that in successive drawings each of the remaining elements of the
population has the same chance of being selected. This procedure will also result in the same
probability for each possible sample.
We can verify this by taking the above example. Since we have a finite population of 6 elements
and we want to select a sample of size 3, the probability of drawing any one element for our sample in
the first draw is 3/6, the probability of drawing one more element in the second draw is 2/5, (the first
element drawn is not replaced) and similarly the probability of drawing one more element in the third
draw is 1/4. Multiplying these probabilities together, the joint probability of the three elements which
constitute our sample works out to 3/6 × 2/5 × 1/4 = 1/20, which verifies our earlier calculation.
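The same arithmetic can be checked with exact fractions; this is only a sketch of the verification, not part of the sampling procedure itself:

```python
from fractions import Fraction

# Joint probability of drawing one particular 3-element sample
# without replacement from a population of 6:
# 3/6 on the first draw, 2/5 on the second, 1/4 on the third
p = Fraction(3, 6) * Fraction(2, 5) * Fraction(1, 4)

print(p)  # 1/20
```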
Even this relatively easy method of obtaining a random sample can be simplified in actual
practice by the use of random number tables.
Note that it is easy to draw random samples from finite populations with the aid of random
number tables only when lists are available and items are readily numbered. But in some
situations it is often impossible to proceed in the way we have narrated above. For example, if we
want to estimate the mean height of trees in a forest, it would not be possible to number the trees,
and choose random numbers to select a random sample.

RANDOM SAMPLE FROM AN INFINITE UNIVERSE


Suppose we consider 20 throws of a fair die as a sample from the hypothetically infinite
population which consists of the results of all possible throws of the die. If the probability of getting
a particular number, say 1, is the same for each throw and the 20 throws are all independent, then we
say that the sample is random.
Similarly, it would be said to be sampling from an infinite population if we sample with replacement
from a finite population and our sample would be considered as a random sample if in each draw all
elements of the population have the same probability of being selected and successive draws happen
to be independent. In short, the selection of each item in a random sample from an infinite population is
controlled by the same probabilities, and successive selections are independent of one another.
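The die-throwing example can be simulated; the seed is an arbitrary choice made only so the sketch is reproducible:

```python
import random

random.seed(1)  # arbitrary seed for reproducibility

# 20 independent throws of a fair die: a random sample from the
# hypothetically infinite population of all possible throws
throws = [random.randint(1, 6) for _ in range(20)]

print(len(throws))                       # 20
print(all(1 <= t <= 6 for t in throws))  # True
```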
COMPLEX RANDOM SAMPLING DESIGNS
Probability sampling under restricted sampling techniques, as stated above, may result in complex
random sampling designs. Some popular complex random sampling designs are as follows:
(i) Systematic sampling: In some instances, the most practical way of sampling is to select every
ith item on a list. Sampling of this type is known as systematic sampling. An element of
randomness is introduced into this kind of sampling by using random numbers to pick up the unit
with which to start.
For instance, if a 4 % sample is desired, the first item would be selected randomly from the
first twenty-five and thereafter every 25th item would automatically be included in the
sample. Thus, in systematic sampling only the first unit is selected randomly and the
remaining units of the sample are selected at fixed intervals.
Systematic sampling can be taken as an improvement over a simple random sample in as
much as the systematic sample is spread more evenly over the entire population. It is an easier
and less costly method of sampling and can be conveniently used even in the case of large
populations. However, if there is a hidden periodicity in the population, systematic sampling will
prove to be an inefficient method of sampling. For instance, suppose every 25th item produced by
a certain production process is defective. If we select a 4% sample of the items of this process in
a systematic manner, we would get either all defective items or all good items in our sample,
depending upon the random starting position.
If all elements of the universe are ordered in a manner representative of the total population,
i.e., the population list is in random order, systematic sampling is considered equivalent to
random sampling. In practice, systematic sampling is used when lists of population are available
and they are of considerable length.
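The mechanics just described (a random start within the first interval, then every ith item) can be sketched as follows; the 500-item list and the seed are illustrative assumptions:

```python
import random

def systematic_sample(items, interval, seed=None):
    """Randomly pick a start within the first `interval` items,
    then take every `interval`-th item thereafter."""
    rng = random.Random(seed)
    start = rng.randrange(interval)   # random start: 0 .. interval-1
    return items[start::interval]

population = list(range(1, 501))               # e.g. 500 numbered items
sample = systematic_sample(population, 25, 7)  # 4% sample: every 25th item

print(len(sample))  # 20 items, i.e. 4% of 500
```

Note that only the starting position is random; every later selection is fixed once the start is known, which is exactly why hidden periodicity can bias this design.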
(ii) Stratified sampling: If a population from which a sample is to be drawn does not constitute a
homogeneous group, stratified sampling technique is applied in order to obtain a representative
sample. Under stratified sampling the population is divided into several sub-populations that are
individually more homogeneous than the total population. The different sub-populations are called
‘strata’. We select items from each stratum to constitute a sample. Since each stratum is more
homogeneous than the total population, we are able to get more precise estimates for each stratum,
and by estimating each of the component parts more accurately, we get a better estimate of the whole.
Stratified sampling thus results in more reliable and detailed information.
The following three questions are highly relevant in the context of stratified sampling:
(a) How to form strata?
(b) How should items be selected from each stratum?
(c) How many items should be selected from each stratum, i.e., how should the sample size be
allocated to each stratum?
For the 1st question, strata can be formed on the basis of common characteristic(s) of the items to be
put in each stratum. This means that the various strata should be formed in such a way as to ensure that
elements are most homogeneous within each stratum and most heterogeneous between the different
strata. Thus, strata are purposively formed and are usually based on past experience and the personal
judgement of the researcher. Consideration of the relationship between the characteristics of the
population and the characteristics to be estimated is normally used to define the strata.
For the 2nd question, the usual method of selecting items for the sample from each stratum is
simple random sampling. Systematic sampling can be used if it is considered more appropriate in
certain situations.
For the 3rd question, we follow the method of proportional allocation, under which the sizes of
the samples from the different strata are kept proportional to the sizes of the strata. That is, if Pi
represents the proportion of population included in stratum i, and n represents the total sample
size, the number of elements selected from stratum i is n . Pi.
To illustrate it, let us suppose that we want a sample of size n = 30 to be drawn from a
population of size N = 8000 which is divided into three strata of size N1 = 4000, N2 = 2400 and
N3 = 1600. Adopting proportional allocation, we shall get the sample sizes as under for the
different strata:
For strata with N1 = 4000, we have P1 = 4000/8000
and hence n1 = n . P1 = 30 (4000/8000) = 15
Similarly, for strata with N2 = 2400, we have
n2 = n . P2 = 30 (2400/8000) = 9, and
for strata with N3 = 1600, we have
n3 = n . P3 = 30 (1600/8000) = 6.
Thus, using proportional allocation, the sample sizes for different strata are 15, 9 and 6 respectively which
is in proportion to the sizes of the strata viz., 4000 : 2400 : 1600.
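The proportional-allocation arithmetic above can be computed directly:

```python
# Stratum sizes and total sample size from the example above
N1, N2, N3 = 4000, 2400, 1600
N = N1 + N2 + N3   # 8000
n = 30

# n_i = n * P_i, where P_i = N_i / N
allocation = [round(n * Ni / N) for Ni in (N1, N2, N3)]

print(allocation)  # [15, 9, 6]
```

With real data the products n · Pi are rarely whole numbers, so some rounding rule is needed; `round` here is one simple choice.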

Proportional allocation is considered most efficient and an optimal design when the cost of selecting an
item is equal for each stratum, there is no difference in within-stratum variances, and the purpose of
sampling happens to be to estimate the population value of some characteristic. But in case the purpose
happens to be to compare the differences among the strata, then equal sample selection from each stratum
would be more efficient even if the strata differ in sizes.
In cases where strata differ not only in size but also in variability, and it is considered reasonable to take
larger samples from the more variable strata and smaller samples from the less variable strata, we can
account for both factors (differences in stratum size and differences in stratum variability) by using
disproportionate sampling.

(iii) Cluster sampling: If the total area of interest is a big one, a convenient way in which a sample
can be taken is to divide the area into a number of smaller non-overlapping areas and then to
randomly select a number of these smaller areas (called clusters).
In cluster sampling the total population is divided into a number of relatively small
subdivisions which are themselves clusters of still smaller units and then some of these clusters
are randomly selected for inclusion in the overall sample. Suppose we want to estimate the
proportion of machine-parts in an inventory which are defective. Also assume that there are
20000 machine parts in the inventory at a given point of time, stored in 400 cases of 50 each.
Now, using cluster sampling, we would consider the 400 cases as clusters, randomly select ‘n’
cases and examine all the machine-parts in each randomly selected case.
Cluster sampling reduces cost by concentrating surveys in selected clusters, but it is less
precise than random sampling. There is also not as much information in ‘n’ observations within a
cluster as there is in ‘n’ randomly drawn observations. Cluster sampling is used mainly
because of the economic advantage it possesses; estimates based on cluster samples are usually
more reliable per unit cost.
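The machine-parts example can be sketched as follows; the part labels, the seed, and the choice of n = 10 cases are illustrative assumptions:

```python
import random

random.seed(3)  # arbitrary seed for reproducibility

# 20000 machine-parts stored in 400 cases of 50 each; each case is a cluster
cases = [[f"case{c}-part{p}" for p in range(50)] for c in range(400)]

n_cases = 10                            # number of clusters to select
chosen = random.sample(cases, n_cases)  # randomly select whole cases

# examine every machine-part in each selected case
inspected = [part for case in chosen for part in case]

print(len(inspected))  # 10 cases x 50 parts = 500 parts examined
```

The economy comes from inspecting 500 parts in only 10 locations, rather than visiting up to 400 cases for 500 individually randomised parts.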
(iv) Multi-stage sampling: Multi-stage sampling is a further development of the principle of
cluster sampling. Suppose we want to investigate the working efficiency of nationalized banks in India
and we want to take a sample of a few banks. The first stage is to select large primary sampling units,
such as states in the country. Then we select certain districts and interview all banks in the chosen
districts. This would represent a two-stage sampling design, with the ultimate sampling units
being clusters of districts.
If, instead of taking a census of all banks within the selected districts, we select certain towns
and interview all banks in the chosen towns, this would represent a three-stage sampling design.
If instead of taking a census of all banks within the selected towns, we randomly sample banks
from each selected town, then it is a case of using a four-stage sampling plan. If we select
randomly at all stages, we will have what is known as ‘multi-stage random sampling design’.
Multi-stage sampling is applied in big inquiries extending over a considerably large geographical
area, say, the entire country. It has two advantages:
(a) It is easier to administer than most single stage designs because sampling frame under multi-
stage sampling is developed in partial units.
(b) A large number of units can be sampled for a given cost under multistage sampling because of
sequential clustering, whereas this is not possible in most of the simple designs.
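The bank example can be sketched as a four-stage random design in which we sample at every stage; the frame below (5 states, 3 districts per state, 3 towns per district, 4 banks per town) and all the stage sample sizes are made-up illustrative numbers:

```python
import random

rng = random.Random(5)  # arbitrary seed for reproducibility

# Hypothetical sampling frame: states -> districts -> towns -> banks
frame = {
    f"state{s}": {
        f"district{d}": {f"town{t}": [f"bank{b}" for b in range(4)]
                         for t in range(3)}
        for d in range(3)
    }
    for s in range(5)
}

# Sample randomly at every stage: 2 states, 2 districts per state,
# 1 town per district, 2 banks per town
banks = []
for state in rng.sample(list(frame), 2):
    for district in rng.sample(list(frame[state]), 2):
        for town in rng.sample(list(frame[state][district]), 1):
            banks += rng.sample(frame[state][district][town], 2)

print(len(banks))  # 2 x 2 x 1 x 2 = 8 banks in the final sample
```

Note how the frame is built up in partial units: a full list of banks is never needed, only a list of states, then of districts within the chosen states, and so on.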

