Chapter Six Methods of Data Collection, Processing and Analysis



Introduction
The task of data collection begins after a research problem has been defined and the research
design or plan has been chalked out. When deciding on the method of data collection for a
study, the researcher should keep in mind two types of data, viz., primary and secondary.
Primary data are those which are collected afresh and for the first time, and thus happen
to be original in character. Secondary data, on the other hand, are those which have
already been collected by someone else and which have already passed through the
statistical process. The researcher has to decide which sort of data he will use (and thus
collect) for his study, and select a method of data collection accordingly. The methods of
collecting primary and secondary data differ, since primary data have to be collected
originally, while in the case of secondary data the work is merely one of compilation.

6.1. Collection of Primary Data


There are several methods of collecting primary data, particularly in surveys and descriptive
research. The important ones are: (i) the observation method, (ii) the interview method, and
(iii) questionnaires. We briefly take up each method separately.

Survey refers to the method of securing information concerning a phenomenon under study
from all or a selected number of respondents of the concerned universe. In a survey, the
investigator examines those phenomena which exist in the universe independent of his action.

6.2. Observation Method
The observation method is the most commonly used method, especially in studies relating to
the behavioural sciences. In a way we all observe things around us, but this sort of
observation is not scientific observation. Observation becomes a scientific tool and a method
of data collection for the researcher when it serves a formulated research purpose, is
systematically planned and recorded, and is subjected to checks and controls on validity and
reliability. Under the observation method, the information is sought by way of the
investigator's own direct observation, without asking the respondent. The main advantage of
this method is that subjective bias is eliminated, if observation is done accurately.
Secondly, the information obtained under this method relates to what is currently happening;
it is not complicated by either past behaviour or future intentions or attitudes. Thirdly,
this method is independent of respondents' willingness to respond and as such is relatively
less demanding of active cooperation on the part of respondents than the interview or the
questionnaire method. This method is particularly suitable in studies which deal with subjects
(i.e., respondents) who are not capable of giving verbal reports of their feelings for one
reason or the other.

However, the observation method has various limitations. Firstly, it is an expensive method.
Secondly, the information provided by this method is very limited. Thirdly, unforeseen
factors may sometimes interfere with the observational task. At times, the fact that some
people are rarely accessible to direct observation is an obstacle to collecting data
effectively by this method.

6.3. Interview Method


The interview method of collecting data involves presentation of oral-verbal stimuli and reply
in terms of oral-verbal responses. This method can be used through personal interviews and,
if possible, through telephone interviews.
(a) Personal interviews: The personal interview method requires a person known as the
interviewer to ask questions, generally in face-to-face contact, of the other person or
persons. (At times the interviewee may also ask certain questions and the interviewer
responds to these, but usually the interviewer initiates the interview and collects the
information.) This sort of interview may take the form of direct personal investigation or
of indirect oral investigation. In the case of direct personal investigation the interviewer
has to collect the information personally from the sources concerned. He has to be on the spot
and has to meet the people from whom data have to be collected.

This method is particularly suitable for intensive investigations. But in certain cases it may
not be possible or worthwhile to contact the persons concerned directly, or, on account of the
extensive scope of the enquiry, the direct personal investigation technique may not be usable.
In such cases an indirect oral examination can be conducted, under which the interviewer has to
cross-examine other persons who are supposed to have knowledge about the problem under
investigation, and the information obtained is recorded. Most of the commissions and
committees appointed by government to carry on investigations make use of this method.

The method of collecting information through personal interviews is usually carried out in a
structured way. Such interviews are called structured interviews. They involve the use of a
set of predetermined questions and of highly standardised techniques of recording. Thus, the
interviewer in a structured interview follows a rigid procedure laid down, asking questions
in a form and order prescribed. As against this, unstructured interviews are characterised
by a flexibility of approach to questioning. Unstructured interviews do not follow a system
of predetermined questions and standardised techniques of recording information. In an
unstructured interview, the interviewer is allowed much greater freedom to ask, in case of
need, supplementary questions, or at times he may omit certain questions if the situation so
requires. He may even change the sequence of questions. He has relatively greater freedom
while recording the responses to include some aspects and exclude others.

Despite the variations in interview techniques, the major advantages and weaknesses of
personal interviews can be enumerated in a general way. The chief merits of the interview
method are as follows:
- More information and that too in greater depth can be obtained.
- Interviewer by his own skill can overcome the resistance, if any, of the respondents; the
interview method can be made to yield an almost perfect sample of the general
population.
- There is greater flexibility under this method as the opportunity to restructure questions
is always there, especially in the case of unstructured interviews.
- Observation method can as well be applied to recording verbal answers to various
questions.
- Personal information can as well be obtained easily under this method.
- Samples can be controlled more effectively as there arises no difficulty of the missing
returns; non-response generally remains very low.
- The interviewer can usually control which person(s) will answer the questions. This is
not possible in mailed questionnaire approach. If so desired, group discussions may also
be held.
- The interviewer may catch the informant off-guard and thus may secure more
spontaneous reactions than would be the case if a mailed questionnaire is used.
- The language of the interview can be adapted to the ability or educational level of the
person interviewed and as such misinterpretations concerning questions can be avoided.
- The interviewer can collect supplementary information about the respondent's personal
characteristics and environment which is often of great value in interpreting results.

But there are also certain weaknesses of the interview method. Among the important
weaknesses, mention may be made of the following:
- It is a very expensive method, especially when a large and geographically widespread
sample is taken.
- There remains the possibility of the bias of the interviewer as well as that of the
respondent; there also remains the headache of supervision and control of interviewers.
- Certain types of respondents such as important officials or executives or people in high
income groups may not be easily approachable under this method and to that extent the
data may prove inadequate.
- This method is relatively more time-consuming, especially when the sample is large and
recalls upon the respondents are necessary.
- The presence of the interviewer on the spot may over-stimulate the respondent,
sometimes even to the extent that he may give imaginary information just to make the
interview interesting.
- Under the interview method the organisation required for selecting, training and
supervising the field-staff is more complex with formidable problems.
- Interviewing at times may also introduce systematic errors.
- Effective interview presupposes proper rapport with respondents that would facilitate
free and frank responses. This is often a very difficult requirement.

(b) Telephone interviews: This method of collecting information consists in contacting
respondents by telephone. It is not a very widely used method. The chief merits of such
a system are:
- It is more flexible in comparison to mailing method.
- It is faster than other methods i.e., a quick way of obtaining information.
- It is cheaper than personal interviewing method; here the cost per response is relatively
low.
- Recall is easy; call backs are simple and economical.
- There is a higher rate of response than what we have in mailing method; the non-
response is generally very low.
- Replies can be recorded without causing embarrassment to respondents.
- Interviewer can explain requirements more easily.
- At times, access can be gained to respondents who otherwise cannot be contacted for
one reason or the other.
- No field staff is required.
- Representative and wider distribution of sample is possible.

But this system of collecting information is not free from demerits. Some of these may be
highlighted.
- Little time is given to respondents for considered answers; interview period is not likely
to exceed five minutes in most cases.
- Surveys are restricted to respondents who have telephone facilities.
- Extensive geographical coverage may get restricted by cost considerations.
- It is not suitable for intensive surveys where comprehensive answers are required to
various questions.
- Possibility of the bias of the interviewer is relatively more.
- Questions have to be short and to the point; probes are difficult to handle.

6.4. Collection of Data through Questionnaires


This method of data collection is quite popular, particularly in case of big enquiries. It is
being adopted by private individuals, research workers, private and public organisations and
even by governments. In this method a questionnaire is sent to the persons concerned with a
request to answer the questions and return the questionnaire. A questionnaire consists of a
number of questions printed or typed in a definite order on a form or set of forms. The
questionnaire is mailed to respondents who are expected to read and understand the questions
and write down the reply in the space meant for the purpose in the questionnaire itself. The
respondents have to answer the questions on their own.

The method of collecting data by mailing the questionnaires to respondents is most
extensively employed in various economic and business surveys. The merits claimed on
behalf of this method are as follows:
- There is low cost even when the universe is large and is widely spread geographically.
- It is free from the bias of the interviewer; answers are in respondents' own words.
- Respondents have adequate time to give well thought out answers.
- Respondents, who are not easily approachable, can also be reached conveniently.
- Large samples can be made use of and thus the results can be made more dependable and
reliable.

The main demerits of this system can also be listed here:
- Low rate of return of the duly filled in questionnaires; bias due to non-response is often
indeterminate.
- It can be used only when respondents are educated and cooperative.
- The control over questionnaire may be lost once it is sent.
- There is inbuilt inflexibility because of the difficulty of amending the approach once
questionnaires have been despatched.
- There is also the possibility of ambiguous replies or omission of replies altogether to
certain questions; interpretation of omissions is difficult.
- It is difficult to know whether willing respondents are truly representative.
- This method is likely to be the slowest of all.

Before using this method, it is always advisable to conduct a 'pilot study' (pilot survey) for
testing the questionnaires. In a big enquiry the significance of a pilot survey is felt very
much. A pilot survey is in fact a replica and rehearsal of the main survey. Such a survey,
being conducted by experts, brings to light the weaknesses (if any) of the questionnaires and
also of the survey techniques. From the experience gained in this way, improvements can be
made.

The researcher should pay attention to the following three main aspects of a
questionnaire:
1. General form: So far as the general form of a questionnaire is concerned, it can be either
a structured or an unstructured questionnaire. Structured questionnaires are those in
which there are definite, concrete and predetermined questions. The questions are presented
with exactly the same wording and in the same order to all respondents. This sort of
standardisation is resorted to in order to ensure that all respondents reply to the same set
of questions. The form of the question may be either closed (i.e., of the type 'yes' or 'no')
or open (i.e., inviting free response), but it should be stated in advance and not constructed
during questioning. Structured questionnaires may also have fixed alternative questions in
which the responses of the informants are limited to the stated alternatives. Thus a highly
structured questionnaire is one in which all questions and answers are specified and comments
in the respondent's own words are held to the minimum. When these characteristics are not
present in a questionnaire, it can be termed an unstructured or non-structured questionnaire.
More specifically, we can say that in an unstructured questionnaire, the interviewer is
provided with a general guide on the type of information to be obtained, but the exact
question formulation is largely his own responsibility and the replies are to be taken down
in the respondent's own words to the extent possible; in some situations tape recorders may
be used to achieve this goal.

2. Question sequence: In order to make the questionnaire effective and to ensure the quality
of the replies received, a researcher should pay attention to the question sequence in
preparing the questionnaire. A proper sequence of questions considerably reduces the chances
of individual questions being misunderstood. The question sequence must be clear and
smoothly moving, meaning that the relation of one question to another should be
readily apparent to the respondent, with the questions that are easiest to answer being put
at the beginning. The first few questions are particularly important because they are likely
to influence the attitude of the respondent and help in securing his cooperation. The opening
questions should be such as to arouse human interest.

3. Question formulation and wording: With regard to this aspect of a questionnaire, the
researcher should note that each question must be very clear, for any sort of misunderstanding
can do irreparable harm to a survey. Questions should also be impartial in order not to give a
biased picture of the true state of affairs. Questions should be constructed with a view to their
forming a logical part of a well thought out tabulation plan. In general, all questions should
meet the following standards: (a) they should be easily understood; (b) they should be simple,
i.e., should convey only one thought at a time; (c) they should be concrete and should conform
as much as possible to the respondent's way of thinking.

6.5. Collection of Secondary Data


Secondary data means data that are already available, i.e., data which have
already been collected and analysed by someone else. When the researcher utilises secondary
data, he has to look into the various sources from which he can obtain them. In this case he
is certainly not confronted with the problems that are usually associated with the collection of
original data. Secondary data may be either published or unpublished. Usually
published data are available in: (a) various publications of the central, state or local
governments; (b) various publications of foreign governments or of international bodies and
their subsidiary organisations; (c) technical and trade journals; (d) books, magazines and
newspapers; (e) reports and publications of various associations connected with business and
industry, banks, stock exchanges, etc.; (f) reports prepared by research scholars, universities,
economists, etc. in different fields; and (g) public records and statistics, historical documents,
and other sources of published information. The sources of unpublished data are many; they
may be found in diaries, letters, unpublished biographies and autobiographies, and may also
be available with scholars and research workers, trade associations, labour bureaus and other
public or private individuals and organisations.

The researcher must be very careful in using secondary data. He must scrutinise them minutely,
because it is quite possible that the secondary data are unsuitable or inadequate in
the context of the problem which the researcher wants to study. By way of caution, the
researcher, before using secondary data, must see that they possess the following characteristics:

1. Reliability of data: Reliability can be tested by finding out the following about the
data: (a) Who collected the data? (b) What were the sources of the data? (c) Were they collected
by using proper methods? (d) At what time were they collected? (e) Was there any bias of the
compiler? (f) What level of accuracy was desired? Was it achieved?

2. Suitability of data: The data that are suitable for one enquiry may not necessarily be
found suitable in another enquiry. Hence, if the available data are found to be unsuitable, they
should not be used by the researcher. In this context, the researcher must very carefully
scrutinise the definition of various terms and units of collection used at the time of collecting
the data from the primary source originally. Similarly, the object, scope and nature of the
original enquiry must also be studied. If the researcher finds differences in these, the data will
remain unsuitable for the present enquiry and should not be used.

3. Adequacy of data: If the level of accuracy achieved in data is found inadequate for the
purpose of the present enquiry, they will be considered as inadequate and should not be used
by the researcher. The data will also be considered inadequate, if they are related to an area
which may be either narrower or wider than the area of the present enquiry.

6.6. Selection of Appropriate Method for Data Collection


Thus, there are various methods of data collection. As such the researcher must judiciously
select the method/methods for his own study, keeping in view the following factors:

1. Nature, scope and object of enquiry: This constitutes the most important factor affecting
the choice of a particular method. The method selected should be such that it suits the type of
enquiry that is to be conducted by the researcher. This factor is also important in deciding
whether the data already available (secondary data) are to be used or the data not yet
available (primary data) are to be collected.

2. Availability of funds: Availability of funds for the research project determines to a large
extent the method to be used for the collection of data. When funds at the disposal of the
researcher are very limited, he will have to select a comparatively cheaper method which may
not be as efficient and effective as some other costly method. Finance, in fact, is a big
constraint in practice and the researcher has to act within this limitation.

3. Time factor: Availability of time has also to be taken into account in deciding a particular
method of data collection. Some methods take relatively more time, whereas with others the
data can be collected in a comparatively shorter duration. The time at the disposal of the
researcher, thus, affects the selection of the method by which the data are to be collected.

4. Precision required: Precision required is yet another important factor to be considered at
the time of selecting the method of collection of data.

But one must always remember that each method of data collection has its uses and none is
superior in all situations. Thus, the most desirable approach with regard to the selection of the
method depends on the nature of the particular problem and on the time and resources
(money and personnel) available along with the desired degree of accuracy.

6.7. Processing Operations


The data, after collection, have to be processed and analysed in accordance with the outline
laid down for the purpose at the time of developing the research plan. This is essential for a
scientific study and for ensuring that we have all relevant data for making contemplated
comparisons and analysis. Technically speaking, processing implies editing, coding,
classification and tabulation of collected data so that they are amenable to analysis. The term
analysis refers to the computation of certain measures along with searching for patterns of
relationship that exist among data-groups. Thus, “in the process of analysis, relationships or
differences supporting or conflicting with original or new hypotheses should be subjected to
statistical tests of significance to determine with what validity data can be said to indicate any
conclusions”.

1. Editing: Editing of data is a process of examining the collected raw data (especially in
surveys) to detect errors and omissions and to correct these when possible. As a matter of
fact, editing involves a careful scrutiny of the completed questionnaires. Editing is done to
ensure that the data are accurate, consistent with other facts gathered, uniformly entered, as
complete as possible and well arranged to facilitate coding and tabulation.
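
The kind of scrutiny that editing involves can be sketched in code. The following is only a minimal illustration: the records, the field names `age` and `owns_phone`, the plausible age range and the helper `edit_record` are all assumptions made for this example and do not come from the text.

```python
# Hypothetical completed questionnaires; None marks an unanswered question.
records = [
    {"id": 1, "age": 34, "owns_phone": "yes"},
    {"id": 2, "age": None, "owns_phone": "no"},
    {"id": 3, "age": 340, "owns_phone": "Yes "},
]

def edit_record(record):
    """Return a cleaned copy of the record and a list of problems found."""
    cleaned = dict(record)
    problems = []
    # Omission: the question was left unanswered.
    if cleaned["age"] is None:
        problems.append("age missing")
    # Error: the value lies outside a plausible range (assumed 0-120 here).
    elif not 0 <= cleaned["age"] <= 120:
        problems.append("age out of range")
    # Uniform entry: normalise spelling variants of the same answer.
    answer = cleaned["owns_phone"]
    if answer is not None:
        cleaned["owns_phone"] = answer.strip().lower()
    return cleaned, problems

edited = [edit_record(r) for r in records]
for cleaned, problems in edited:
    print(cleaned["id"], cleaned["owns_phone"], problems)
```

Records flagged in this way would still need a human decision: whether to go back to the respondent, correct the entry from other facts gathered, or set the answer aside.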

2. Coding: Coding refers to the process of assigning numerals or other symbols to answers so
that responses can be put into a limited number of categories or classes. Such classes should
be appropriate to the research problem under consideration. They must also possess the
characteristic of exhaustiveness (i.e., there must be a class for every data item) and also that
of mutual exclusivity, which means that a specific answer can be placed in one and only one
cell in a given category set. Coding is necessary for efficient analysis and through it the
several replies may be reduced to a small number of classes which contain the critical
information required for analysis. Coding decisions should usually be taken at the designing
stage of the questionnaire.
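
As a rough sketch of what coding looks like in practice, the example below assigns numerals to the answers of one question. The question, the code frame `CODE_FRAME` and the function `code_answer` are hypothetical and chosen only for illustration.

```python
# Hypothetical code frame for the question "How do you usually travel to work?"
# The residual class "other" makes the frame exhaustive.
CODE_FRAME = {"walk": 1, "bicycle": 2, "bus": 3, "car": 4, "other": 9}

def code_answer(answer):
    """Map a raw answer to exactly one numeric code."""
    key = answer.strip().lower()
    # Each answer falls in one and only one class: either a listed
    # class or the residual class "other" (mutual exclusivity).
    return CODE_FRAME.get(key, CODE_FRAME["other"])

answers = ["Bus", "car", "walk ", "Tram", "bus"]
codes = [code_answer(a) for a in answers]
print(codes)
```

Fixing the code frame before the questionnaire goes out, as the paragraph above recommends, means the same mapping can be applied to every return.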

3. Classification: Most research studies result in a large volume of raw data which must be
reduced into homogeneous groups if we are to get meaningful relationships. This fact
necessitates classification of data which happens to be the process of arranging data in groups
or classes on the basis of common characteristics. Data having a common characteristic are
placed in one class and in this way the entire data get divided into a number of groups or
classes.
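
Classification by a common characteristic can be illustrated with class intervals. This is a sketch under assumptions: the income figures, the interval boundaries in `CLASSES` and the helper `classify` are invented for the example.

```python
# Hypothetical monthly incomes (in any currency unit) of ten respondents.
incomes = [120, 450, 310, 980, 275, 640, 150, 820, 505, 390]

# Assumed class intervals: lower limit inclusive, upper limit exclusive.
CLASSES = [(0, 250), (250, 500), (500, 750), (750, 1000)]

def classify(value):
    """Return the class interval that contains the value."""
    for lower, upper in CLASSES:
        if lower <= value < upper:
            return (lower, upper)
    raise ValueError(f"{value} falls outside every class")

# Place every item in the one class it belongs to.
groups = {interval: [] for interval in CLASSES}
for income in incomes:
    groups[classify(income)].append(income)

for (lower, upper), members in groups.items():
    print(f"{lower}-{upper}: {members}")
```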

4. Tabulation: When a mass of data has been assembled, it becomes necessary for the
researcher to arrange the same in some kind of concise and logical order. This procedure is
referred to as tabulation. Thus, tabulation is the process of summarising raw data and
displaying the same in compact form (i.e., in the form of statistical tables) for further
analysis. In a broader sense, tabulation is an orderly arrangement of data in columns and
rows.
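
A small two-way table shows the idea of arranging data in columns and rows. The responses, the table title and the variable names below are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical coded responses: (gender, preferred interview mode).
responses = [
    ("female", "telephone"), ("male", "personal"), ("female", "personal"),
    ("male", "telephone"), ("female", "telephone"), ("male", "personal"),
]

# Count how many responses fall in each (row, column) cell.
cells = Counter(responses)
rows = sorted({r for r, _ in responses})
cols = sorted({c for _, c in responses})

# Print the table with a title, column headings (captions),
# row headings (stubs) and row totals.
print("Table 1: Preferred interview mode by gender (number of respondents)")
print(f"{'':10}" + "".join(f"{c:>12}" for c in cols) + f"{'Total':>12}")
for r in rows:
    counts = [cells[(r, c)] for c in cols]
    print(f"{r:10}" + "".join(f"{n:>12}" for n in counts) + f"{sum(counts):>12}")
```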

Generally accepted principles of tabulation: Such principles of tabulation, particularly of
constructing statistical tables, can be briefly stated as follows:
- Every table should have a clear, concise and adequate title so as to make the table
intelligible without reference to the text and this title should always be placed just above
the body of the table.
- Every table should be given a distinct number to facilitate easy reference.
- The column headings (captions) and the row headings (stubs) of the table should be clear
and brief.
- The units of measurement under each heading or sub-heading must always be indicated.
- Explanatory footnotes, if any, concerning the table should be placed directly beneath the
table, along with the reference symbols used in the table.
- Source or sources from where the data in the table have been obtained must be indicated
just below the table.
- Tables should be made as logical, clear, accurate and simple as possible. If the data happen
to be very large, they should not be crowded into a single table, for that would make the
table unwieldy and inconvenient.

6.8. Elements/Types of Analysis
Analysis means the computation of certain indices or measures along with searching for
patterns of relationship that exist among the data groups. Analysis, particularly in case of
survey or experimental data, involves estimating the values of unknown parameters of the
population and testing of hypotheses for drawing inferences. Analysis may, therefore, be
categorised as descriptive analysis and inferential analysis (Inferential analysis is often
known as statistical analysis). “Descriptive analysis is largely the study of distributions of
one variable. This study provides us with profiles of companies, work groups, persons and
other subjects on any of a multiple of characteristics such as size, composition, efficiency,
preferences, etc.”. This sort of analysis may be in respect of one variable (described as
unidimensional analysis), or in respect of two variables (described as bivariate analysis) or in
respect of more than two variables (described as multivariate analysis). In this context we
work out various measures that show the size and shape of a distribution (or distributions),
and we also measure relationships between two or more variables.

We may as well talk of correlation analysis and causal analysis. Correlation analysis studies
the joint variation of two or more variables in order to determine the amount of correlation
between them. Causal analysis is concerned with the study of how one or more variables affect
changes in another variable. It is thus a study of functional relationships existing between
two or more variables. This analysis can be termed regression analysis.
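
The difference between the two can be seen on a small set of paired observations. The sketch below computes the product-moment correlation coefficient and a least-squares regression line; the data and variable names are hypothetical, and these are standard formulas rather than a procedure prescribed by the text.

```python
from math import sqrt

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]   # e.g. advertising outlay
y = [2, 4, 5, 4, 5]   # e.g. sales

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products of deviations from the means.
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# Correlation analysis: the amount of joint variation.
r = sxy / sqrt(sxx * syy)

# Regression analysis: the fitted line y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(round(r, 4), round(slope, 4), round(intercept, 4))
```

The coefficient `r` says only how closely the two series move together, while `slope` and `intercept` describe the functional relationship used to predict one variable from the other.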

In modern times, with the availability of computer facilities, there has been a rapid
development of multivariate analysis which may be defined as “all statistical methods which
simultaneously analyse more than two variables on a sample of observations”.

Multiple regression analysis: This analysis is adopted when the researcher has one dependent
variable which is presumed to be a function of two or more independent variables. The
objective of this analysis is to make a prediction about the dependent variable based on its
covariance with all the concerned independent variables.
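
A minimal sketch of multiple regression with two independent variables follows. It fits the coefficients by solving the least-squares normal equations in plain Python; the helper names `solve` and `multiple_regression` and the noise-free data are assumptions for illustration, and in practice a statistical package would normally be used.

```python
def solve(a, b):
    """Solve the linear system a.x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def multiple_regression(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    design = [[1.0] + list(row) for row in xs]  # add the intercept column
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 without noise.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coefficients = multiple_regression(xs, y)

# Prediction for a new case with x1 = 6 and x2 = 5.
b0, b1, b2 = coefficients
prediction = b0 + b1 * 6 + b2 * 5
print([round(c, 6) for c in coefficients], round(prediction, 6))
```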

Inferential analysis, then, is concerned with the various tests of significance for testing
hypotheses in order to determine with what validity data can be said to indicate some
conclusion or conclusions. It is also concerned with the estimation of population values. It is
mainly on the basis of inferential analysis that the task of interpretation (i.e., the task of
drawing inferences and conclusions) is performed.

6.9. Statistics in Research
The role of statistics in research is to function as a tool in designing research, analysing its
data and drawing conclusions therefrom. Most research studies result in a large volume of
raw data which must be suitably reduced so that the same can be read easily and can be used
for further analysis. Clearly the science of statistics cannot be ignored by any research
worker, even though he may not have occasion to use statistical methods in all their details
and ramifications. Classification and tabulation, as stated earlier, achieve this objective to
some extent, but we have to go a step further and develop certain indices or measures to
summarise the collected/classified data. Only after this can we adopt the process of
generalisation from small groups (i.e., samples) to the population. In fact, there are two
major areas of statistics, viz., descriptive statistics and inferential statistics.

Descriptive statistics concern the development of certain indices from the raw data, whereas
inferential statistics are concerned with the process of generalisation. Inferential statistics
are also known as sampling statistics and are mainly concerned with two major types of
problems: (i) the estimation of population parameters, and (ii) the testing of statistical
hypotheses. The
important statistical measures that are used to summarise the survey/research data are: (1)
measures of central tendency or statistical averages; (2) measures of dispersion; (3) measures
of asymmetry (skewness); and (4) measures of relationship.

6.10. Measures of Central Tendency

Measures of central tendency (or statistical averages) tell us the point about which items have
a tendency to cluster. Such a measure is considered the most representative figure for the
entire mass of data. Mean, median and mode are the most popular averages. Mean, also known
as the arithmetic average, is the most common measure of central tendency and may be defined
as the value which we get by dividing the total of the values of the various given items in a
series by the total number of items.

Mean is the simplest measurement of central tendency and is a widely used measure. Its chief
use consists in summarising the essential features of a series and in enabling data to be
compared. It is amenable to algebraic treatment and is used in further statistical calculations.
It is a relatively stable measure of central tendency. But it suffers from some limitations viz.,
it is unduly affected by extreme items; it may not coincide with the actual value of an item in
a series, and it may lead to wrong impressions, particularly when the item values are not
given with the average. However, mean is better than other averages, especially in economic
and social studies where direct quantitative measurements are possible.

Median is the value of the middle item of a series when it is arranged in ascending or
descending order of magnitude. It divides the series into two halves; in one half all items are
less than the median, whereas in the other half all items have values higher than the median. Median
is a positional average and is used only in the context of qualitative phenomena, for example,
in estimating intelligence, etc., which are often encountered in sociological fields. Median is
not useful where items need to be assigned relative importance and weights. It is not
frequently used in sampling statistics.

Mode is the most commonly or frequently occurring value in a series. The mode in a
distribution is that item around which there is maximum concentration. In general, mode is
the size of the item which has the maximum frequency, but at times such an item may not be
the mode on account of the effect of the frequencies of the neighbouring items. Like median,
mode is a positional average and is not affected by the values of extreme items. It is,
therefore, useful in all situations where we want to eliminate the effect of extreme variations.
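
The three averages, and their different sensitivity to extreme items, can be checked on a small series. The figures below are hypothetical; the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical series with one extreme item (95).
series = [4, 6, 6, 7, 8, 9, 95]

print(mean(series))    # total of the values divided by the number of items
print(median(series))  # middle item of the ordered series
print(mode(series))    # most frequently occurring value

# Dropping the extreme item moves the mean sharply but leaves
# the median and the mode almost unchanged.
trimmed = series[:-1]
print(mean(trimmed), median(trimmed), mode(trimmed))
```

Here the single extreme item pulls the mean above every other value in the series, which is the limitation of the mean noted above.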

6.11. Measures of Dispersion


An average can represent a series only as best as a single figure can, but it certainly cannot
reveal the entire story of any phenomenon under study. In particular, it fails to give any idea
about the scatter of the values of items of a variable in the series around the true value of
the average. In order to measure this scatter, statistical devices called measures of dispersion are
calculated. Important measures of dispersion are (a) range, (b) standard deviation.

(a) Range is the simplest possible measure of dispersion and is defined as the difference
between the values of the extreme items of a series.
(b) Standard deviation is the most widely used measure of dispersion of a series and is
commonly denoted by the symbol 'σ' (pronounced as sigma). Standard deviation is defined
as the square-root of the average of squares of deviations, when such deviations for the values
of individual items in a series are obtained from the arithmetic average.

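Standard deviation (σ) = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}

where X_i is the value of an individual item, \bar{X} is the arithmetic average of the series and n is the number of items.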

The standard deviation (along with several related measures like variance, coefficient of
variation, etc.) is used mostly in research studies and is regarded as a very satisfactory
measure of dispersion in a series. It is less affected by fluctuations of sampling. These
advantages make standard deviation and its coefficient a very popular measure of the
scatteredness of a series. It is popularly used in the context of estimation and testing of
hypotheses.
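
As a rough illustration of the two measures of dispersion, the sketch below computes the range and the standard deviation exactly as defined above (deviations taken from the arithmetic average, squares averaged over all n items). The series is hypothetical, and the coefficient of variation is computed under the usual convention of σ divided by the mean, expressed as a percentage; the text mentions the coefficient but does not spell out its formula.

```python
from math import sqrt

def value_range(items):
    """Difference between the values of the extreme items of a series."""
    return max(items) - min(items)

def standard_deviation(items):
    """Square root of the average of squared deviations from the arithmetic average."""
    n = len(items)
    average = sum(items) / n
    return sqrt(sum((x - average) ** 2 for x in items) / n)

def coefficient_of_variation(items):
    """Standard deviation as a percentage of the mean (assumed convention)."""
    return standard_deviation(items) / (sum(items) / len(items)) * 100

# Hypothetical series (illustrative only, not from the text).
series = [2, 4, 4, 4, 5, 5, 7, 9]

print("range:", value_range(series))
print("standard deviation:", standard_deviation(series))
print("coefficient of variation (%):", coefficient_of_variation(series))
```

Dividing by n, as here, follows the definition in the text; many statistical packages divide by n - 1 by default when estimating from a sample.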

6.12. Measures of Asymmetry (Skewness)


When the distribution of items in a series happens to be perfectly symmetrical, we then have
the following type of curve for the distribution:

[Figure: symmetrical, bell-shaped distribution curve]

Such a curve is technically described as a normal curve and the related distribution as a
normal distribution. It is a perfectly bell-shaped curve in which the values of the mean,
median and mode coincide and skewness is altogether absent. But if the curve is
distorted (whether on the right side or on the left side), we have an asymmetrical distribution,
which indicates that there is skewness. If the curve is distorted on the right side, we have
positive skewness, but when the curve is distorted towards the left, we have negative skewness.

Skewness is, thus, a measure of asymmetry and shows the manner in which the items are
clustered around the average. In a symmetrical distribution, the items show a perfect balance
on either side of the mode, but in a skew distribution the balance is thrown to one side. The
amount by which the balance exceeds on one side measures the skewness of the series. The
difference between the mean and the median or the mode provides an easy way of expressing
skewness in a series.
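
The idea of expressing skewness through the difference between the averages can be sketched as follows. Dividing the difference (mean minus mode) by the standard deviation is an assumption here: it is the common convention known as Pearson's first coefficient of skewness, which the text does not state explicitly. All three series are hypothetical.

```python
from statistics import mean, mode, pstdev

def skewness_coefficient(items):
    """(mean - mode) / standard deviation; the sign gives the direction of skewness."""
    return (mean(items) - mode(items)) / pstdev(items)

# Hypothetical series (illustrative only, not from the text).
symmetrical = [1, 2, 3, 3, 3, 4, 5]     # balanced on either side of the mode
right_skewed = [1, 2, 2, 2, 3, 4, 10]   # long tail on the right side
left_skewed = [1, 7, 8, 9, 9, 9, 10]    # long tail on the left side

for name, data in [("symmetrical", symmetrical),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, round(skewness_coefficient(data), 3))
```

A value of zero corresponds to no skewness, a positive value to positive skewness and a negative value to negative skewness.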

Kurtosis is the measure of flat-toppedness of a curve. It may be pointed out here that knowing
the shape of the distribution curve is crucial to the use of statistical methods in research
analysis since most methods make specific assumptions about the nature of the distribution
curve.
6.13. Measures of Relationship
So far we have dealt with those statistical measures that we use in the context of a univariate
population i.e., a population consisting of measurements of only one variable. But if we have
the data on two variables, we are said to have a bivariate population and if the data happen to
be on more than two variables, the population is known as a multivariate population.

Thus we have to answer two types of questions in bivariate or multivariate populations viz.,
(i) Does there exist association or correlation between the two (or more) variables? If yes, of
what degree?
(ii) Is there any cause and effect relationship between the two variables in case of the
bivariate population or between one variable on one side and two or more variables on the
other side in case of multivariate population? If yes, of what degree and in which direction?
The first question is answered by the use of correlation technique and the second question by
the technique of regression. There are several methods of applying the two techniques, but
the important ones are as under:

In case of bivariate population: Correlation can be studied through (a) cross tabulation; (b)
Charles Spearman's coefficient of correlation; (c) Karl Pearson's coefficient of correlation;
whereas cause and effect relationship can be studied through simple regression equations.
In case of multivariate population: Correlation can be studied through (a) coefficient of
multiple correlation; (b) coefficient of partial correlation; whereas cause and effect
relationship can be studied through multiple regression equations.
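
For a bivariate population, the two coefficients named above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the helper names and the data are hypothetical, Karl Pearson's coefficient is computed from deviations about the means, and Charles Spearman's coefficient is obtained by applying the same formula to ranks (tied values share the average of their ranks).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value; tied values get the average of their ranks."""
    ordered = sorted(values)
    # A value first found at 0-based position i and occurring c times
    # occupies ranks i+1 .. i+c, whose average is (2*i + 1 + c) / 2.
    return [(2 * ordered.index(v) + 1 + ordered.count(v)) / 2 for v in values]

def spearman_rho(xs, ys):
    """Charles Spearman's coefficient: Pearson's formula applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: Y rises with X, but not along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

print("Pearson r:   ", pearson_r(xs, ys))
print("Spearman rho:", spearman_rho(xs, ys))
```

Because the relationship is perfectly monotonic but not linear, the rank-based coefficient reaches 1 while Pearson's coefficient stays slightly below it.
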
6.14. Simple Regression Analysis
Regression is the determination of a statistical relationship between two or more variables. In
simple regression, we have only two variables, one variable (defined as independent) is the
cause of the behaviour of another one (defined as dependent variable). Regression can only
interpret what exists physically i.e., there must be a physical way in which independent
variable X can affect dependent variable Y. The basic relationship between X and Y is given
by

Y = a + bX + e

where a is the intercept, b is the regression coefficient (slope) and e is the error term.
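
The relationship Y = a + bX + e can be made concrete with a small sketch. It assumes that a and b are estimated by the method of least squares, which is the usual choice but is not stated in the text; the function name and the data are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates of a and b in Y = a + bX + e."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx             # slope: change in Y per unit change in X
    a = mean_y - b * mean_x   # intercept: the fitted line passes through the means
    return a, b

# Hypothetical observations (illustrative only, not from the text).
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.1, 7.0, 9.2, 10.8]

a, b = fit_simple_regression(xs, ys)
# The residuals play the role of the error term e for each observation.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print("a =", round(a, 4), "b =", round(b, 4))
print("residuals:", [round(e, 4) for e in residuals])
```

With least-squares estimates the residuals always sum to zero (up to rounding), which is a quick check on the arithmetic.
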
6.15. Multiple Correlation and Regression
When there are two or more than two independent variables, the analysis concerning
relationship is known as multiple correlation and the equation describing such relationship as
the multiple regression equation. We here explain multiple correlation and regression taking
only two independent variables and one dependent variable.

Multiple regression equation assumes the form


Y = a + b1X1 + b2X2 + e
where X1 and X2 are the two independent variables and Y is the dependent variable.

In multiple regression analysis, the regression coefficients (viz., b1, b2) become less reliable as
the degree of correlation between the independent variables (viz., X1, X2) increases. If there is
a high degree of correlation between independent variables, we have what is
commonly described as the problem of multicollinearity. In such a situation we should use
only one of the correlated independent variables to make our estimate. In fact, adding a second
variable, say X2, that is correlated with the first variable, say X1, distorts the values of the
regression coefficients. Nevertheless, the prediction for the dependent variable can be made
even when multicollinearity is present, but in such a situation enough care should be taken in
selecting the independent variables to estimate a dependent variable so as to ensure that
multicollinearity is reduced to the minimum.
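
The two-variable case can be sketched in plain Python. This is an illustration under stated assumptions: the coefficients are estimated by least squares (not specified in the text), the function name and data are hypothetical, and the correlation between X1 and X2 is returned alongside the coefficients as a simple multicollinearity check.

```python
from math import sqrt

def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (a, b1, b2) for Y = a + b1*X1 + b2*X2 + e,
    plus r12, the correlation between X1 and X2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]   # deviations of X1 from its mean
    d2 = [v - m2 for v in x2]   # deviations of X2 from its mean
    dy = [v - my for v in y]    # deviations of Y from its mean
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * w for u, w in zip(d1, dy))
    s2y = sum(v * w for v, w in zip(d2, dy))
    r12 = s12 / sqrt(s11 * s22)
    # The denominator equals s11 * s22 * (1 - r12**2): it shrinks towards zero
    # as X1 and X2 become more highly correlated, making b1 and b2 unstable.
    denom = s11 * s22 - s12 ** 2
    if denom == 0:
        raise ValueError("X1 and X2 are perfectly correlated; b1 and b2 cannot be separated")
    b1 = (s1y * s22 - s2y * s12) / denom
    b2 = (s2y * s11 - s1y * s12) / denom
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2, r12

# Hypothetical data built so that Y = 1 + 2*X1 + 3*X2 exactly (e = 0).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2, r12 = fit_two_predictors(x1, x2, y)
print("a =", round(a, 4), "b1 =", round(b1, 4), "b2 =", round(b2, 4))
print("correlation between X1 and X2:", round(r12, 4))
```

The comment on the denominator is the arithmetic behind the warning above: the closer the correlation between X1 and X2 is to 1, the smaller the quantity the coefficients are divided by, so small changes in the data produce large changes in b1 and b2.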
