Claire Selltiz - Research Methods in Social Relations
EDITORIAL READERS
Revised One-Volume Edition
ACKNOWLEDGMENTS
January, 1967
Copyright 1951, 1959
by Holt, Rinehart and Winston, Inc.
Library of Congress Catalog Card Number: 59-8714
27803-0119
Printed in the United States of America
PREFACE TO THE FIRST EDITION
February 1959
CLAIRE SELLTIZ
MARIE JAHODA
MORTON DEUTSCH
STUART W. COOK
CONTENTS
3. Research Design: I
Exploratory and Descriptive Studies 49
Formulative or Exploratory Studies, 51
Descriptive Studies, 65
Summary, 78
4. Research Design: II
Studies Testing Causal Hypotheses 79
The Logic of Testing Hypotheses about Causal Relationships, 80
Causal Inference from Experiments, 91
6. Data Collection: I
Observational Methods 199
Unstructured Observation, 207
Structured Observation, 221
7. Data Collection: II
Questionnaires and Interviews 235
Comparison of Interview and Questionnaire, 238
Question Content, 243
Types of Interviews and Questionnaires, 255
The Sociometric Method, 268
Visual Aids in Interviewing, 272
A Concluding Note, 276
APPENDICES
Bibliography 589
Index 607
RESEARCH METHODS IN SOCIAL RELATIONS
1
THE RESEARCH PROCESS
An Illustration
It cannot be that axioms established by argumentation can
suffice for the discovery of new works, since the subtlety of
nature is greater many times over than the subtlety of
argument. FRANCIS BACON
techniques are the tools of his trade. He needs not only to develop
skill in using them but also to understand the logic behind them.
But it is not only the student who intends to carry out research
who needs to know about research methods. The positions for which
social science students are likely to be preparing themselves-teaching,
administration in government or business, community consultation,
social work-increasingly call for the ability to evaluate and to use
research results-to judge whether a study has been carried out in such
a way that one can have reasonable confidence in its findings and
whether its findings are applicable to the specific situation at hand.
Even if one does not expect to make specific use of research findings
in his job, in our scientific age all of us are in many ways "consumers"
of research results. To use them intelligently, we need to be
able to judge the adequacy of the methods by which they have been
obtained. As a student, for example, you will find that many of the
"facts" presented in your courses rest on the results of research. But
you may discover that the "facts" reported by one study are quite different
from those produced by another study on the same point. One
investigator, for example, may report that children who are weaned
early grow up to be more independent and better adjusted than those
who are nursed for a longer time; another investigator turns up with
just the opposite finding. Or several studies may conclude that when
Negroes and whites live near each other, each group is likely to become
more favorably inclined toward the other; but other studies may
conclude that interracial hostility is likely to be especially intense in
neighborhoods where Negroes and whites live in close proximity. In
order to be able to make a tentative judgment about which conclusion
merits more confidence, you need to be able to judge the adequacy of
the studies. Later sections of this book will consider the criteria of
good research in detail. Here we may suggest simply that you will
want to ask such questions as: How do the investigators define their
terms? Are they really both talking about the same things, or have
they used the same words for different phenomena? Was the evidence
they gathered relevant to the problem? Were there any obvious sources
of bias in the way the data were gathered? Were there different conditions
in the studies that might account for the difference in findings?
Even in the course of daily living, the average citizen increasingly
needs to be able to evaluate research in order to make intelligent
decisions. This is perhaps most clear at the present time with respect
to medical research and decisions based on it: Should I have my child
inoculated with "flu" vaccine? Should I vote for fluoridation of the
local water supply? Should I stop smoking in order to lessen the risk
of getting lung cancer? With the rapid increase in social science
research, it seems likely that the average citizen will increasingly be
presented with social science findings. At the present time, he has
relatively little occasion to evaluate these findings as a basis for his own
actions. But the person who knows how research is carried out is better
able to judge the probable accuracy of opinion polls or election predic-
tions and to view with appropriate skepticism the claims that "9
doctors out of 10 approve ..."
Besides all these practical advantages of familiarity with research
Major Steps in Research
The object of this book is to describe in detail the procedures necessary
to discover answers to questions through research. But since
concern with detail often obscures perception of the whole, it is well,
before embarking on the examination of specific procedures, to point
out some over-all aspects of the research process.
The research process consists of a number of closely related activities
that overlap continuously rather than following a strictly prescribed
sequence. So interdependent are these activities that the first step of
a research project largely determines the nature of the last. If sub-
sequent procedures have not been taken into account in the early
stages, serious difficulties may arise and prevent the completion of a
study. Frequently these difficulties cannot be remedied at the time
they become apparent because they are rooted in the earlier procedures.
They can be avoided only by keeping in mind, at each step of the re-
search process, the requirements of subsequent steps.
To be sure, as research proceeds from the conception of a theme
for a study through the gathering of data to the production of a report
and the application of the findings, the focus of attention will necessarily
shift from one activity to the next. This shift reflects a difference
in emphasis, however, rather than an exclusive concentration on one
step. A mechanically consecutive sequence of procedures, in which one
research step is entirely completed before the next is begun, is rarely,
if ever, the experience of social scientists.
The usual pattern of reporting research creates an oversimplified
expectation of what is involved in doing research. Customarily, a report
on completed research, when it appears as an article in a technical
journal, resembles, with minor modifications, the following model:
1. A statement of purpose is made in the form of formulating the
problem;
2. A description of the study design is given;
3. The methods of data collection are specified;
4. The results are presented;
5. Frequently, there follows a section on conclusions and inter-
pretation.
Whatever the individual variations from this model, published
research strongly suggests the existence of a prescribed sequence of
procedures, each step presupposing the completion of the preceding
one. Although this model is entirely justified in the interest of economy
of scientific reporting, it must not be mistaken for a model of the
research process, which differs from it in two respects: (1) The re-
search process almost never follows the neat sequential pattern of
activities suggested in the organization of research reports; and (2)
the process involves many additional activities which are rarely men-
tioned in published studies.
Some of these additional activities are related to the scientific re-
quirements of the study; others to its practical demands. The ap-
parently simple reporting of the methods of data collection, for ex-
ample, summarizes decisions about the kinds of data needed and the
most efficient way of collecting them, and the activities carried out in
the development and pretesting of the data-collection instruments. In
addition to these steps, related to the scientific requirements of the
study, there are other, more "practical," demands: the budget must be
planned; funds must be obtained and administered; personnel must
be allocated and, in some cases, specially trained; the setting within
which the data are to be collected must be explored and the cooperation
of the people in it must be gained; etc. In addition, if the study
is one designed to solve an immediate, practical problem, the antici-
pated application of the findings must be considered from the outset.
An Illustration
The manner in which each step influences, and is influenced by,
others is perhaps best demonstrated by a brief case history of a research
and intimacy of contact with other women in the project, the social
supports for attitudes, and the characteristics of the housewives.
While the interview schedule was being developed, the samples
of women to be interviewed were being drawn from the lists of tenants
in the projects. During this same period, one of the staff members
began taking steps to recruit the number of skilled interviewers who
would be needed to collect the data within a reasonable time. The inter-
viewers were graduate students in social work and psychology. As soon
as the interview schedule was in a form that seemed reasonably satisfactory,
a "pilot test" was carried out; two or three of the most experienced
interviewers, and the investigators themselves, carried out interviews
with a small number of white housewives in other housing
projects similar to those that had been selected for the study. As was
expected, these pilot interviews pointed up questions that were not
clear, those that needed especially careful handling to avoid antagonizing
the respondents, those that did not seem to elicit the information
they were intended to get. Changes were made in the interview schedule
to overcome these difficulties. After another set of pilot interviews
had been checked, all interviewers were trained in use of the schedule.
Each interviewer spent approximately twelve hours in training sessions
and conducted two practice interviews with residents of projects not
included in the study.
Finally, the actual interviewing got under way. Five hundred in-
terviews were conducted: four hundred with white housewives, one
hundred with Negro housewives. The interviews lasted, on the average,
from an hour to an hour and a half. There were nineteen interviewers,
and all the interviews were completed within a month. During the
interviewing, the investigators spent time in each of the projects, supervising
the assignments and inspecting the interviews as they were completed.
Once data collection had taken place according to the specified
research design, a number of irreversible decisions had been made
which largely determined the next step-analysis and interpretation. In
this study, the plan of data collection made it possible to compare the
white housewives in the area-segregated and the integrated projects
in terms of: (1) their extent of association with Negroes, (2) their
perception of the social norms concerning association with Negroes,
(3) the relation between perceived social norms and extent of associa-
tion, (4) their attitudes toward Negroes living in the project and
Negroes in general, and (5) their attitudes toward living in an interracial
project.
The limitations and compromises that have already been pointed
out meant that it was not possible to be entirely sure that the differences
found between the housewives in the two types of project represented
the effect of occupancy pattern rather than general differences
in attitude in the two cities or differences in attitude between the two
groups of white housewives that existed before they ever moved into
the projects. Nor was it possible to say whether the findings would hold
for projects with smaller proportions of Negroes and for projects where
the pattern of segregation was less marked-for example, in projects
segregated by buildings rather than total sections.
Finally, the pattern of area-segregation in the Newark projects
meant that, even if it was found that the white housewives living in
integrated projects had more favorable attitudes toward Negroes than
those in the segregated projects, it would be difficult to draw inferences
about the processes that contributed to the difference in attitudes. In
thinking about the ways in which different occupancy patterns might
be expected to lead to differences in attitudes, the investigators con-
sidered two major factors: (1) the extent to which Negro and white
tenants had occasion to see and to meet each other and thus had
natural opportunities to become acquainted, and (2) the implications
of the integrated and segregated arrangements in terms of social approval
or disapproval of association between white and Negro families.
The pattern of segregation in the Newark projects, with white and
Negro families living in separate sections of the project rather than
in separate buildings scattered throughout the project, made it impossible
to disentangle the effects of perceived social standards from those
of simple physical proximity, since in these area-segregated projects
not only did white and Negro families live relatively far from each
other but the segregated arrangement suggested social disapproval of
interracial association. This was a limitation primarily on the potential
contribution of the study to theoretical knowledge of the dynamics
of attitude change; it did not interfere with the possibility of gathering
Negro women, and the contacts that were most frequently mentioned,
such as meeting in stores or in streets around the project, did not
provide natural opportunities for extended conversation.
Another analysis was made comparing the housewives in the in-
tegrated projects who reported that their attitudes toward Negroes
had become more favorable since living in the project with those who
reported no change in their attitudes. This showed, among other
things, that the women whose attitudes had changed tended to be
those who had more intimate contact with the Negro women in the
project and who believed that their white friends in the project ap-
proved of their friendly association with the Negro women. As was
pointed out earlier, the study design did not make possible an evalua-
tion of the relative influence of association with Negroes and of per-
ceived social approval of such association on the process of attitude
change.
Because of this and other uncertainties of interpretation men-
tioned earlier, plans were made almost immediately for another study,
which would build on this one and carry its findings still further (Wil-
ner, Walkley, and Cook, 1955). This new study was carried out in
four cities (all of them outside the New York metropolitan area).
The projects all had a quite small proportion of Negroes, and the segregated
projects took the form of scattered Negro and white buildings
rather than separate areas. These two latter characteristics made it
possible to examine separately the influence of physical proximity and
of implied official standards, since some white women in the building-segregated
projects lived closer to Negroes than did some white women
in the integrated projects. The findings of this study indicated that
physical proximity was the more important influence.
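
The difference between the two designs can be made concrete with a minimal sketch. The Python below is an illustration only, not part of either study: the factor labels ("integrated", "segregated", "near", "far") and the function name is_confounded are assumptions introduced here. It treats a design as the set of factor combinations it actually contains, and reports the two factors as confounded when some combination of their observed levels is missing.

```python
from itertools import product


def is_confounded(cells):
    """Return True if the two factors cannot be separated in this design.

    `cells` is a collection of (occupancy_pattern, proximity) pairs that
    the design actually contains. The factors can be examined separately
    only if every combination of their observed levels appears.
    """
    observed = set(cells)
    patterns = {pattern for pattern, _ in observed}
    proximities = {proximity for _, proximity in observed}
    needed = set(product(patterns, proximities))
    return not needed.issubset(observed)


# Hypothetical coding of the first study: area segregation always meant
# living far apart, and integration always meant living near.
first_study = [("integrated", "near"), ("segregated", "far")]

# Hypothetical coding of the follow-up study: with building segregation,
# some tenants in segregated projects lived nearer to neighbors of the
# other race than some tenants in integrated projects did.
follow_up = [
    ("integrated", "near"),
    ("integrated", "far"),
    ("segregated", "near"),
    ("segregated", "far"),
]

print(is_confounded(first_study))  # True
print(is_confounded(follow_up))    # False
```

Under this hypothetical coding, the first design contains only two of the four possible cells, so occupancy pattern and proximity rise and fall together; the follow-up design contains all four, which is what allowed their influence to be examined separately.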
This illustration can indicate only in barest outline the nature of
the research process. As would be the case with any illustration, it does
not cover all the possibilities of interrelation and interdependence of
research steps. The pattern of interaction among the various procedures
that constitute a scientific inquiry will, of course, vary from study to
study. The point of this illustration is to show not only that early
steps influence subsequent ones-an obvious matter-but also that the
interaction of each step with others is a major consideration in its
selection, and that subsequent steps often lead to a reconsideration of
preceding ones. Social research is not a deductive process, in which
everything follows from some clearly defined premises; it is a contin-
uous search for truth, in which tentative answers lead to a refinement
of the questions to which they apply and of the procedures by which
they were obtained.
This volume describes the major steps in the process of social re-
search. The demands of organization require that these steps be dis-
cussed separately and consecutively, but it should always be kept in
mind that the steps are not so clearly demarcated from one another
as an organized discussion makes them appear to be.
Chapter 2 discusses the problems and considerations arising in
the selection and formulation of a research question. Chapters 3 and
4 deal with research design and its function in a scientific inquiry.
Variations in design are discussed, from the relatively unstructured
exploration of a problem to the rigorous testing of hypotheses by means
of controlled experiments.
Chapter 5 presents some general problems of measurement in the
social sciences. It provides a background for the next five chapters.
Chapters 6, 7, and 8 discuss three broad groups of data-collection
methods-observational methods, questionnaires and interviews, and
projective techniques. Chapter 9 treats the use of data already available,
such as statistical records and the content of communications and
personal documents. Chapter 10 discusses techniques for placing individuals
on scales on the basis of data collected by any of the methods
considered in the preceding chapters.
Chapter 11 deals with the analysis and interpretation of data;
Chapter 12 with the writing of a research report; and Chapter 13 with
the application of research. Finally, Chapter 14 discusses the continuous
and close interrelationship between empirical research and theory.
Formulating Hypotheses
Defining Concepts
Establishing Working Definitions
this country brings together for six weeks about two hundred young
men and women from all walks of life. Every region of the country,
every creed, and every race is represented. Some of the young people
are workers or farmers; others are clerks or students. They are selected
for this summer school because of the promise of leadership they have
shown in their unions, social organizations, or colleges. The sponsor-
ing agency aims at giving these young people information about the
world they live in, an experience in living together, and skills to enable
each of them to meet the demands of a leadership role in his own
sphere of life even more effectively than before. The organizers of this
voluntary venture invited a team of social scientists to discuss the
possibility of doing research on the school. The reason for the invitation
was one that prompts many agencies to seek help from social
scientists-the institution hoped to obtain reassurance about the value
of the program and was convinced that science could establish this
value beyond doubt. The topic of the envisaged research was clearly,
then, the value of the educational venture. The discussion between
sponsors and social scientists had the purpose of transforming this
topic, step by step, into a research problem.
As is customary in such discussions, the social scientists started
with the question: What would you like to find out about your enterprise?
This was followed by the equally customary counter-question:
What can social science tell us about it? The remainder of the session
demonstrated to both parties the difficulties of research formulation.
After the original impasse had been overcome, the representa-
tives of the institution explained in complete detail their long-term
objectives. They already had considerable knowledge of the. 9'rica1
problems and short-range effects of their pioneering effort, but their
own high goals and their educational outlook prevented them from
accepting as a trustworthy yardstick of success the obvious enjoyment
of the experience by the young people. What they sought was something
that would show itself outside the confines of the educational
setting and throughout the later life of each participant. The immediate
problems of administering the school-recruitment, program,
organization, etc.-were apparently well in hand.
Although the discussion of long-term effects was of considerable
feasible to start with this aspect, it provides considerable guidance for
the other steps.
Formulating Hypotheses
A hypothesis is "a proposition, condition, or principle which is
assumed, perhaps without belief, in order to draw out its logical con-
sequences and by this method to test its accord with facts which are
known or may be determined" (Webster's New International Dic-
tionary of the English Language, Second Edition, Unabridged,
1956). The role of hypotheses in scientific research is to suggest ex-
planations for certain facts and to guide in the investigation of others.
The importance of hypotheses in research has been emphasized by
Cohen and Nagel (1934), who state:
We cannot take a single step forward in any inquiry unless
we begin with a suggested explanation or solution of the difficulty
which originated it. Such tentative explanations are suggested
to us by something in the subject matter and by our previous
knowledge. When they are formulated as propositions, they are
called hypotheses.
The function of a hypothesis is to direct our search for the
order among facts. The suggestions formulated in the hypothesis
may be solutions to the problem. Whether they are, is the task
of the inquiry. No one of the suggestions need necessarily lead
to our goal. And frequently some of the suggestions are incompatible
with one another, so that they cannot all be solutions to
the same problem.
It seems to us that this is an accurate statement of the nature and
value of hypotheses in scientific investigation, but we believe it to be
too sweeping in its assertion that research cannot begin until a hypothesis
has been formulated. As we shall argue later, a very important
type of research has as its goal the formulation of significant hypotheses
about a particular topic.
A hypothesis may assert that something is the case in a given
instance, that a particular object, person, situation, or event has a
certain characteristic. For example, Freud's book Moses and Mon-
one hand, concern with local self-government might lead the tenants
to become so absorbed in the small matters of the housing project
that they were not interested in larger political issues. On the other
hand, participation in the operation of a small community, where
the workings of political forces could easily be observed, might serve
as a training ground for participants and thus lead to a heightened in-
terest in and understanding of political forces on a national plane.
However, since it was not even known whether and to what ex-
tent the tenants availed themselves of the opportunity to participate
in self-government, it would have been premature for the investigators
to base the plan of their work on any hypothesis concerning the
effects of this participation. In the course of the investigation, it became
evident that participation was the rule. Moreover, it was discovered
that in two vastly different housing projects, concern with local politics
did not replace but rather initiated and reinforced concern with na-
tional affairs. In view of this evidence, it may now be possible to begin
an investigation in public housing-or in any other setting, for that
matter-in order to verify this hypothesis in different circumstances.
On the other hand, in the area of prejudice, where a great amount
of research has already been done, investigations are possible in which
hypotheses can be formulated in advance. This was demonstrated in
the study of public housing (Deutsch and Collins, 1951) which was
discussed in Chapter 1. It will be remembered that in this study
the problem was the impact of occupancy pattern on relations between
Negro and white tenants. In the light of previous research, the investigators
were able to formulate in advance a number of interrelated
hypotheses about the effects of the occupancy patterns on the attitudes
of the tenants.
These examples make it clear that it is pointless to regard a study
which sets out with hypotheses as more "scientific" than one which
ends with hypotheses. The time for formulation of hypotheses varies
with the nature of the problem and the extent of prior knowledge
about it. Formulation and reformulation of research questions is a
continuing process. As the German sociologist Max Weber said,
"Every scientific fulfillment raises new questions; it asks to be sur-
passed and outdated."
Defining Concepts
Any investigator, in order to organize his data so that he may
perceive relationships among them, must make use of concepts. A
concept is an abstraction from observed events or, as McClelland
(1951) puts it, "a shorthand representation of a variety of facts. Its
purpose is to simplify thinking by subsuming a number of events
under one general heading." Some concepts are quite close to the
objects or facts they represent. Thus, for example, the meaning of the
concept dog may be easily illustrated by pointing to specific dogs. The
concept is an abstraction of the characteristics all dogs have in common-characteristics
that are either directly observable or easily
measured. Other concepts, however, cannot be so easily related to the
phenomena they are intended to represent; attitudes, learning, role,
motivation are of this sort. They are inferences, at a higher level of
abstraction from concrete events, and their meaning cannot easily be
conveyed by pointing to specific objects, individuals, or events. Some-
times these higher-level abstractions are referred to as constructs, since
they are constructed from concepts at a lower level of abstraction.
The greater the distance between one's concepts, or constructs,
and the empirical facts to which they are intended to refer, the greater
the possibility of their being misunderstood or carelessly used, and the
greater the care that must be given to defining them. They must be
defined both in abstract terms, giving the general meaning they are
intended to convey, and in terms of the operations by which they will
be represented in the particular study. The former type of definition
is necessary in order to link the study with the body of knowledge using
similar concepts or constructs. The latter is an essential step in carrying
out any research, since data must be collected in terms of observable
facts.
As an illustration, let us go back to Morris and Davidsen's study
of foreign students. Their hypothesis had to do with the effects of the
student's estimate of the regard in which Americans hold his country
on his attitudes toward the United States. Actually, as we have noted,
the hypothesis was more complex than this. It involved the constructs
of "national status gain" and "national status loss" through coming
Summary
ONCE THE RESEARCH PROBLEM has been formulated clearly enough
to specify the types of information needed, the investigator
must work out his research design. A research design is the arrangement
of conditions for collection and analysis of data in a manner that aims
to combine relevance to the research purpose with economy in pro-
cedure. It follows that research designs will differ depending on the
research purpose.
Each study, of course, has its own specific purpose. But we may
think of research purposes as falling into a number of broad groupings:
(1) to gain familiarity with a phenomenon or to achieve new insights
into it, often in order to formulate a more precise research problem
or to develop hypotheses; (2) to portray accurately the characteristics
of a particular individual, situation, or group (with or without specific
initial hypotheses about the nature of these characteristics); (3) to
determine the frequency with which something occurs or with which
it is associated with something else (usually, but not always, with a
specific initial hypothesis); (4) to test a hypothesis of a causal relation-
ship between variables.
In studies that have the first purpose listed above-generally called
formulative or exploratory studies-the major emphasis is on discovery
of ideas and insights. Therefore, the research design must be flexible
enough to permit the consideration of many different aspects of a
phenomenon.
In studies having the second and third purposes listed above, a
major consideration is accuracy. Therefore a design is needed that will
minimize bias and maximize the reliability of the evidence collected.
(Bias results from the collection of evidence in such a way that one
alternative answer to a research question is favored. Evidence is reliable
to the extent that we can assert confidently that similar findings
would be obtained if the collection of evidence were repeated. For a
detailed discussion of bias and reliability in connection with measurement
procedures, see Chapter 5.) Since studies with these second and
third purposes present similar requirements for research design, we
can treat them together; we shall call them descriptive studies.
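
The definition of reliability given above can be illustrated with a small sketch. The Python below is not a procedure taken from this book (Chapter 5 gives the detailed treatment); it assumes one simple index, the proportion of respondents classified the same way on two collections of the same evidence. The function name percent_agreement and the sample answers are hypothetical.

```python
def percent_agreement(first, second):
    """Proportion of cases classified identically on two collections.

    `first` and `second` hold one classification per respondent, in the
    same respondent order. A value near 1.0 suggests that repeating the
    collection yields similar findings.
    """
    if len(first) != len(second):
        raise ValueError("both collections must cover the same respondents")
    if not first:
        raise ValueError("at least one respondent is required")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


# Hypothetical answers from five respondents, collected twice.
first_wave = ["favorable", "unfavorable", "favorable", "neutral", "favorable"]
second_wave = ["favorable", "unfavorable", "neutral", "neutral", "favorable"]

print(percent_agreement(first_wave, second_wave))  # 0.8
```

High agreement speaks only to reliability; evidence that is consistently slanted toward one answer would agree with itself on repetition and still be biased.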
Studies testing causal hypotheses require procedures that will not
only reduce bias and increase reliability but will permit inferences
about causality. Experiments are especially suited to meeting this latter
requirement. However, many studies concerned with testing hy-
potheses about causal relationships do not take the form of experi-
ments.
In practice, these different types of study are not always sharply
distinguishable. Any given research may have in it elements of two
or more of the functions we have described as characterizing different
types of study. In any single study, however, the primary emphasis is
usually on only one of these functions, and the study can be thought
of as falling into the category corresponding to its major function. In
short, although the distinctions among the different types of study
are not clearcut, by and large they can be made; and, for the purpose
of discussing appropriate research designs, it is useful to make them.
the various hypotheses that have been put forward, to evaluate their
usefulness as a basis for further research, and to consider whether they
suggest new hypotheses. More frequently, however, an exploratory
study is concerned with an area in which hypotheses have not yet been
formulated; the task then is to review the available material with
sensitivity to the hypotheses that may be derived from it.
In many areas a bibliographical survey will undoubtedly be more
time-consuming than rewarding; often one will find that no research
of significance has been done in one's area of interest. This is perhaps
less often true, however, than is assumed by those who fail to build
upon the work of previous investigators. In any case, the conclusion
that there is no relevant material would be unjustified without a
on Community Interrelations of the American Jewish Congress.
so before the interview is to take place by sending him a copy of the
questions to be discussed. This gives him an opportunity not only to
do some advance thinking, but to consult his colleagues and to add the
knowledge to be gained from their experiences.
SOME BY-PRODUCTS OF EXPERIENCE SURVEYS. An experience survey,
as well as being a good source of hypotheses, can provide information
about the practical possibilities for doing different types of research.
Where can the facilities for research be obtained? Which factors can
be controlled and which cannot in the situations one might wish to
study? What variables tend to be clustered together in community
settings? How ready are agencies, professional workers, and ordinary
citizens to cooperate in controlled research studies of the problem in
question? The answers to these and similar practical questions may
be one of the by-products of a carefully planned experience survey. In
addition, such a survey may provide a census of the problems consid-
ered urgent by the people working in a given area. This census may be
extremely useful in establishing priorities in a program of research.
The report of an experience survey also provides a summary of the
knowledge of skilled practitioners about the effectiveness of various
methods and procedures in achieving specified goals. In lieu of more
definitive knowledge, this information may be of enormous value as
a guide to "best" practices in a given field. Of course, in presenting
such a summary, it should be made clear that the survey was in no
sense based on a random sample of workers in the field. Its usefulness
comes from the presentation of insights and effective practices rather
than from the presentation of the "typical."
his intensive studies of patients. So, too, profound changes in our con-
ception of the relationship between man and society have been brought
about largely by anthropological studies of primitive cultures.
From these examples it should be clear that we are not describing
what is sometimes called the "case-study" approach, in the narrow
sense of studying the records kept by social agencies or psychothera-
pists, but rather the intensive study of selected instances of the phe-
nomenon in which one is interested. The focus may be on individuals,
on situations, on groups, on communities. The method of study may
be the examination of existing records; it may also be unstructured in-
terviewing or participant observation or some other approach.
What features of this approach make it an appropriate procedure
for the evoking of insights? A major one is the attitude of the investi-
gator, which is one of alert receptivity, of seeking rather than of test-
ing. Instead of limiting himself to the testing of existing hypotheses,
he is guided by the features of the object being studied. His inquiry
is constantly in the process of reformulation and redirection as new
information is obtained. Frequent changes are made in the types of
data collected or in the criteria for case selection as emerging
hypotheses require new information.
A second feature is the intensity of the study of the individual,
group, community, culture, incident, or situation selected for investi-
gation. One attempts to obtain sufficient information to characterize
and explain both the unique features of the case being studied and
those which it has in common with other cases. In the study of the
individual, this may entail an extensive examination of both his present
situation and his life history. In the study of a group, an incident, etc.,
individuals may be treated as informants about the object, rather
than being themselves the objects of intensive analysis.
A third characteristic of this approach is its reliance on the inte-
grative powers of the investigator, on his ability to draw together many
diverse bits of information into a unified interpretation. This last
characteristic has led many critics to view the analysis of insight-stim-
ulating instances as a sort of projective technique, in which conclusions
reflect primarily the investigator's predisposition rather than the
object of study. Even if this reproach is appropriate to many case studies,
the characteristic is not necessarily undesirable when the purpose is to
evoke rather than to test hypotheses. For even if the case material is
merely the stimulus for the explicit statement of a previously unform-
ulated hypothesis, it may serve a worth-while function.
Social scientists who work with this approach have frequently
found that the study of a few instances may produce a wealth of new
insights, whereas a host of others will yield few new ideas. Although
here, as elsewhere, no simple rules can be established for the selection
of the instances to be studied, experience indicates that for particular
problems certain types are more appropriate than others. We list below
some of these types, together with the purposes for which they have
been found most useful. The list is not exhaustive, nor are the types
mutually exclusive.
1. The reactions of strangers or newcomers may point up
characteristics of a community that might otherwise be overlooked by an
investigator reared in the culture. A stranger is likely to be sensitive
to social customs and practices that are more or less taken for granted
by the members of a community. His curiosity or surprise or
bewilderment may call attention to features of community life to which
members of the community have become so accustomed that they no
longer notice them.
2. Marginal individuals, or groups, who are moving from one cultural
grouping to another and are on the periphery of both groups, are
similar in some respects to strangers or outsiders. Because they are "in
between," exposed to conflicting pressures of the groups from and to
which they are moving, they can often reveal dramatically the major
influences operating in each group. For example, in the field of
intergroup relations, the study of emigrants, of displaced persons, of Jews
who are trying to be assimilated into local cultural groups, of Negroes
who are trying to "pass" as whites, of people who are in the process of
conversion to or from Catholicism, of people in areas of disputed
national sovereignty, is likely to be highly rewarding.
3. Study of individuals or groups who are in transition from one
stage of development to the next has been fruitful, particularly in
anthropological investigations of the influence of culture upon
personality. In his investigation of any culture, the anthropologist is
necessarily limited by time to a cross-sectional study rather than one
that would trace individuals from birth to death. The study of indi-
Descriptive Studies
to talk at an earlier age than boys? Note that none of these questions,
as they have been presented, involves a hypothesis that one of the
variables leads to or produces the other; questions embodying such
hypotheses pose different requirements for research procedures.
This is a considerable array of research interests, which we have
grouped under the heading of descriptive studies. We have grouped
them together because, from the point of view of research procedures,
they share certain important characteristics. The research questions
presuppose much prior knowledge of the problem to be investigated,
as contrasted with the questions that form the basis for exploratory
studies. The investigator must be able to define clearly what it is he
wants to measure and must find adequate methods for measuring it. In
addition, he must be able to specify who is to be included in the
definition of "a given community" or "a given population." In collecting
evidence for a study of this sort, what is needed is not so much
flexibility as a clear formulation of what and who is to be measured, and
techniques for valid and reliable measurements.6
Descriptive studies are not limited to any one method of data
collection. They may employ any or all of the methods to be presented
in subsequent chapters. Thus Lundberg, Komarovsky, and McInerny
(1934), in their study of leisure, collected information through
interviews, questionnaires, systematic direct observation, analysis of
community records, and participant observation.
The first step in a descriptive study, as in any other, is to define
the question that is to be answered. Unless the objectives are specified
with sufficient precision to ensure that the data collected are relevant
to the question raised, the study may not provide the desired
information.
In our example, the research question was: Do restaurants in New
York City discriminate against Negro patrons? But before data could
be gathered to answer this question, it was necessary to specify what
was meant by discrimination. It was defined as any inequality between
the treatment accorded white and Negro diners, unless there seemed
7 For a more detailed discussion of descriptive studies, and especially those that
take the form of surveys of opinion, attitudes, etc., see Hyman (1955, Part II),
and Parten (1950).
the Negro team consisted of two men, the white team going to the
same restaurant consisted of two men; with few exceptions, both
teams were of about the same age. Since the entire group was relatively
the sample shows families in income class "A" spending 27.1 per cent
of their income on rent, while the total survey shows that such families
spent 26.5 per cent of their income on rent; in income class "B," the
sample shows 22.6 per cent of income spent on rent, while the total
survey shows 22.7 per cent; and so on. For no income group does the
figure shown by the sample based on 1 family in 50 differ by more
than 2 percentage points from that shown by the complete survey. In
other words, by taking a sample of 1 in 50 instead of every working-
class household in the city, essentially the same results would have
been obtained. That is, a substantial saving in time and effort could
may then code, say, every twentieth case, in order to provide a check
on accuracy.
If the material is to be tabulated by machine, it must be entered
on appropriate cards; this is usually done by punching holes corre-
sponding to a given code. It is advisable to check the accuracy of punch-
ing; again, it is usual to check only a sample of the cards.
The accuracy of tabulation may also be checked by having a
sample of the tables re-done. However, at this stage it is possible to
make a rough check by comparing figures from different tables. For
example, the figures in each table should add up to the total number
of cases, unless there is reason to omit some from a given table. More-
over, certain classifications are likely to be used in more than one table,
and these figures provide a partial check on accuracy. For example, in
the restaurant survey, in addition to the basic table showing the
number of restaurants in which discrimination was found and the number
in which it was not, there were tables showing the number of
restaurants in which a given kind of discrimination was encountered, the
occurrence of discrimination in restaurants at different price levels, in
American and "foreign" restaurants, etc. If any of these more detailed
tables had shown a different number of restaurants as discriminatory
than the basic table, this would have been evidence of error.
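The cross-check described above can be sketched in a few lines of Python. This is only an illustration: the counts, the table names, and the helper functions `column_totals` and `find_inconsistencies` are hypothetical and are not taken from the restaurant survey.

```python
def column_totals(table):
    """Add up each column of a table stored as {row_label: {column_label: count}}."""
    totals = {}
    for row in table.values():
        for column, count in row.items():
            totals[column] = totals.get(column, 0) + count
    return totals


def find_inconsistencies(basic, detailed_tables):
    """Compare every detailed table with the basic table.

    Returns a list of (table_name, column, detailed_total, basic_total)
    for each column total that disagrees with the basic table.
    """
    problems = []
    for name, table in detailed_tables.items():
        totals = column_totals(table)
        for column, basic_total in basic.items():
            detailed_total = totals.get(column, 0)
            if detailed_total != basic_total:
                problems.append((name, column, detailed_total, basic_total))
    return problems


# Hypothetical counts, invented for illustration only.
basic = {"discriminatory": 25, "not discriminatory": 35}
detailed_tables = {
    "by price level": {
        "low": {"discriminatory": 5, "not discriminatory": 15},
        "medium": {"discriminatory": 10, "not discriminatory": 12},
        "high": {"discriminatory": 10, "not discriminatory": 8},
    },
    "by type of restaurant": {
        # The "discriminatory" column here adds to 26, not 25: a deliberate error.
        "American": {"discriminatory": 18, "not discriminatory": 25},
        "foreign": {"discriminatory": 8, "not discriminatory": 10},
    },
}

problems = find_inconsistencies(basic, detailed_tables)
for name, column, detailed_total, basic_total in problems:
    print(f"{name}: {column} = {detailed_total}, basic table = {basic_total}")
```

A disagreement of this kind shows only that an error exists somewhere; it does not say which of the two tables is wrong, so the original records would still have to be consulted.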
Finally, statistical computations are needed in a study of any com-
plexity; averages, percentages, correlations must be computed. Again,
these operations may be checked by having a second person re-do a
sample of them.
Statistical operations of another sort are introduced for the pur-
4
Summary
Experimental observations are only experience carefully
planned in advance, and designed to form a secure basis of new
knowledge. R. A. FISHER
84 RESEARCH DESIGN: II
could gather evidence of the three kinds needed to provide a basis for
inferring a causal relationship: concomitant variation, time order, and
the possible influence of other factors.
When an experiment is possible, it is the most effective method
of testing a hypothesis that one variable, X, causally influences another
variable, Y. Many questions of causal relationship lend themselves
easily to experimental study. For example, the investigator who is in-
terested in the relative effectiveness of group discussion and decision
versus reading a pamphlet or listening to a lecture as a method of
changing behavior can set up a situation in which one or more groups
of individuals discuss a certain issue and come to a decision about it,
while comparable individuals read a pamphlet or listen to a lecture on
the subject. Similarly, the influence of subliminal stimuli on the
perception of subsequent supraliminal stimuli can be investigated by
exposing subjects to such stimuli. Or the effects of "packaging" on the
evaluation of a product can be tested by giving one sample of people
a product in a container of a given style, a comparable sample the
same product in a different container. In such cases, the investigator
himself manipulates the independent variable.
In other problems, however, manipulation of the independent
variable by the experimenter, or assignment of subjects to different
treatments, is not feasible. Suppose one wishes to study the effects of
different methods of child-rearing on the personality structure of
children. He is not likely to be able to assign certain children to be brought
up in one way, others in another. (He might be able to do something
of this sort in the case of children in institutions, but his findings could
not well be generalized to children in family settings.) He must proceed
by locating children who have been brought up in different ways and
assessing their personalities. If he finds a correlation, he has secured
evidence of concomitant variation. In order to provide a basis for
inferring that the child-training practices (X) are a cause of the
personality structure (Y), he must gather evidence that Y did not precede
X and that other possible factors are not the determining ones.
Ordinarily, the evidence on these points will be less convincing than
that provided by an experiment.
Hypotheses about the effects of attributes of individuals (rather
than of the situations in which they are placed) often are not amenable
LOGIC OF TESTING CAUSAL HYPOTHESES 91
to experimental investigation in the sense of manipulation of the "in-
dependent" variable by the investigator. To be sure, a hypothesis that
hungry subjects will be more likely to interpret ambiguous pictures as
representing food than will subjects who are not hungry can be
tested experimentally; the degree of hunger can be controlled reason-
ably well by specifying the length of time subjects must go without
eating before viewing the pictures. Many attributes of individuals,
however, cannot be manipulated in this way. Non-manipulatable at-
tributes are involved, for example, in such hypotheses as: Brain damage
impairs the ability to think abstractly; or, People will tend to remember
those parts of a message that are consistent with their own views
and to forget those that are contrary. The investigator working with
human subjects will not manipulate the variable of brain damage
willfully by destroying portions of the brain; he must seek existing
cases of brain damage. And he cannot assign certain views to certain
individuals; they bring their views with them. The investigator achieves
the variation he wants, not by direct manipulation of the variable itself,
but by selection of individuals in whom the variable is present or
absent, strong or weak, etc. He presents brain-damaged and non-
damaged subjects with the same task; he asks individuals with different
views to read the same passage; etc.
The logic of testing hypotheses about the presumed effects of an
attribute of a person, such as brain damage, which is not created
experimentally, is essentially the same as that of testing hypotheses in any
other nonexperimental study. The nonexperimental study, in its
design, does not allow one to rule out in advance, with any confidence,
the possibility that the effect was created by some other factor that is
correlated with the presumed causal factor. Hence, one is faced with
the necessity of ruling out on an ex post facto basis (i.e., after the
presumed causal variable has already occurred) the possibility that
other factors correlated with the presumed causal factor may have
produced the observed effect. For example, if we exposed patients
with brain damage and patients without brain damage to a test,
differences in their test performance might reflect the effects of brain
damage or they might reflect such other factors as differences in
anxiety that are associated with different types of illness, differences
associated with socioeconomic variables (e.g., brain damage occurs
differ more than would be expected by chance, one may infer that the
experimental variable led to the difference. This inference, of course,
must always be made tentatively, subject to the possibility that some
other factor may have led to the difference.
R. A. Fisher (1951), one of the outstanding figures in the development
of experimental design, has pointed out that:
... the uncontrolled causes which may influence the result
[of an experiment] are always strictly innumerable. When any
such cause is named, it is usually perceived that, by increased
labour and expense, it could be largely eliminated. Too fre-
quently it is assumed that such refinements constitute improve-
ments to the experiment. ... whatever degree of care and ex-
perimental skill is expended in equalising the conditions, other
than the one under test, which are liable to affect the result,
this equalisation must always be to a greater or less extent incom-
plete, and in many important practical cases will certainly be
grossly defective . . . . the simple precaution of randomisation
will suffice to guarantee the validity of the test of significance, by
which the result of the experiment is to be judged.
To go back to our television-teaching illustration: Let us say that
our subjects are to be all the eighth-grade children in a given school,
half of whom (the experimental group) will be assigned to a class in
which television will be used, half (the control group) to a class using
the conventional methods. But the children will certainly not all have
the same IQ; some of them may already have more science information
than others; some will be more interested than others in the subject
matter; some will have better eyesight than others; etc. From the
point of view of the validity of the inferences to be drawn, it is
necessary that the experimental and the control groups shall not differ
on any of these variables to such an extent that it leads to a difference
in science information, as measured at the conclusion of the
experiment, which will be incorrectly interpreted as resulting from the
difference in teaching methods. Since all tests of statistical significance
are based on the assumption that cases have been randomly assigned
to the groups being compared, they are specifically designed to take
account of chance differences in the initial characteristics of the two
CAUSAL INFERENCE FROM EXPERIMENTS 101
groups. Therefore the statistical test of significance6 offers protection
against the possibility that differences on the dependent variable that
result from chance initial differences between the experimental and
the control group will be incorrectly interpreted as effects of the ex-
perimental treatment.
The social scientist, however, is not always in a position to assign
cases randomly to different conditions. Compromises with the ideal
of random selection are often necessitated by practical circumstances.
In our television-teaching example, for instance, it may not be feasible
to select randomly from among all eighth-grade children those who are
to be assigned to the experimental class. In order not to disrupt school
routines, it may be necessary to assign existing classes to one or the
other treatment. In this case, the classes may be randomly assigned
to one or the other treatment, but this does not afford as much protec-
tion as the random assignment of individuals. Sometimes such com-
promises may be made without invalidating the bases for inference
within the study (though, in terms of our definition, a study in which
cases are not randomly selected does not constitute an experiment).
One extreme form of nonrandom assignment, however, does seriously
impair the grounds for inference. This is assignment on the basis of
self-selection. For example, if an investigator wishes to test the
hypothesis that social case work with the families of delinquent children
reduces the delinquent behavior, he would be ill advised to draw his
experimental sample from families who have voluntarily come to
social agencies and his control sample from families with similarly
delinquent children who have not sought such help. The reason is
obvious: Families who seek help of this sort may have certain
characteristics that either directly affect the probability that the delinquent
behavior would be reduced even without the case work service or that
make the service effective with them although it would not be with
other families. We might suppose, for instance, that a mother who
seeks the help of a social agency in dealing with her child's delinquency
is both more concerned about the delinquent behavior and more aware
6 For a brief discussion of the meaning of statistical tests of significance, see
Chapter 11, pages 414-422. For a fuller discussion, consult any standard statistical
text.
of community facilities for dealing with it than a mother who does not.
Either of these characteristics might mean that she would be likely
to take steps intended to change her child's behavior even if she did
not have the help of a social case worker. And the fact that she applied
for case work help might mean that she would be more receptive to
it, and thus that it would be more likely to have an effect on the
delinquent behavior, than if she had been assigned involuntarily to
receive such help. The same principle applies whenever subjects place
themselves in the "experimental" or the "control" group.7
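The danger of self-selection can be made concrete with a small simulation, sketched here in Python. Everything in it is an assumption made for illustration: the trait called `concern`, the size of its influence, the true effect of 1.0, and the function name `estimated_effect` are all hypothetical. The sketch supposes that more concerned families improve more whether or not they receive case work, and compares what an investigator would estimate under random assignment and under self-selection.

```python
import random
from statistics import mean


def estimated_effect(n_families, true_effect, self_selected, rng):
    """Difference in mean improvement between served and unserved families."""
    served, not_served = [], []
    for _ in range(n_families):
        concern = rng.gauss(0, 1)            # hypothetical trait of the family
        if self_selected:
            gets_service = concern > 0       # concerned families seek help
        else:
            gets_service = rng.random() < 0.5  # assignment by chance
        # Improvement depends on concern even without any service.
        improvement = 2.0 * concern + rng.gauss(0, 1)
        if gets_service:
            served.append(improvement + true_effect)
        else:
            not_served.append(improvement)
    return mean(served) - mean(not_served)


rng = random.Random(0)
random_estimate = estimated_effect(20000, 1.0, self_selected=False, rng=rng)
selected_estimate = estimated_effect(20000, 1.0, self_selected=True, rng=rng)
print(f"random assignment: {random_estimate:.2f}")
print(f"self-selection:    {selected_estimate:.2f}")
```

Under these assumptions the randomly assigned comparison should land close to the true effect, while the self-selected comparison should greatly overstate it, because the difference in concern between the two groups is counted as if it were an effect of the service.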
MATCHING. Although random assignment, where it is feasible, is
generally considered to provide adequate protection against
interpreting differences on the dependent variable as resulting from the
independent variable when in fact they stem from prior differences
between the two groups, it is not the most effective procedure from the
point of view of increasing the sensitivity of the experiment. In the
interest of research efficiency, it is desirable that the experiment reveal
true differences brought about by the experimental treatment, even if
they are small in relation to differences produced by other variables.
In our television example, teaching method may have less influence
than IQ on science information. Random assignment of children to
groups being taught by one or the other method would not be likely
to lead to exact matching of the two groups in terms of IQ. This
difference in IQ might lead to a difference in information at the end
of the study. As already noted, statistical tests of significance based
on the assumption of random sampling would provide protection
against attributing this difference to the difference in teaching
methods. However, there might be a small difference in the effective-
ness of the two teaching methods which would be obscured by the
difference in information related to IQ. The more such "extraneous"
differences are reduced, the more chance there is for the effects of the
experimental treatment to show up.
An oversimplified hypothetical example may help to make this
point clear. Suppose, in our study of the effects of teaching science with
7 This is a frequent problem in studies designed to test causal hypotheses that
do not follow the pattern of controlled experiments. Methods of dealing with it are
considered in the section of this chapter which discusses such studies.
the help of television, we took existing classes rather than individual
students as our sampling units. Suppose further that eight classes were
to be used in the experiment-four to receive the televised instruction,
four to serve as controls. Let us say these classes differed in average
IQ; four had a mean IQ of over 100 (these will be called the "highs"
in the table below), and four of under 100. If the classes were ran-
domly assigned to the television or no-television treatments, we might
have a pattern such as that shown below, with grades on the final test
shown in the right-hand column.
In this example, the mean score of all classes on the final test is
75. The mean score of the high-IQ classes is 82.5; that of the low-IQ
classes, 67.5. But both those with television and those without have
mean scores of 75, even though it is apparent from inspection of the
table that high-IQ classes with television instruction score higher than
high-IQ classes without, and that low-IQ classes with television
instruction score higher than low-IQ classes without it. But the fact that
random assignment has led to an arrangement whereby three of the
four classes receiving television instruction are of low IQ, while three
of the four without television are of high IQ, obscures the effect of
television when the average score of all classes receiving the
experimental treatment is compared with that of all the control classes.
To illustrate the effect of matching, let us suppose that the
television and no-television treatments had had equal numbers of high-
and low-IQ classes. Making the same assumptions as in the preceding
table about the relative contribution of intelligence and teaching
method to scores on the final test, the results would be as follows:
Now the mean score of the classes with television is 80; that of
those without television is 70. By equating the groups in terms of in-
telligence, the effects of teaching method have been permitted to
appear.
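The arithmetic of the two comparisons can be checked with a short Python sketch. The class scores used here (90, 80, 70, and 60) are not quoted from the tables; they are the values that reproduce every mean stated in the text, on the added assumption that all classes of the same IQ level and treatment have the same score. The names `SCORE` and `treatment_means` are likewise illustrative.

```python
from statistics import mean

# Class scores inferred from the means quoted in the text, assuming every
# class in the same cell scores alike (an assumption of this sketch).
SCORE = {
    ("high", "television"): 90,
    ("high", "control"): 80,
    ("low", "television"): 70,
    ("low", "control"): 60,
}


def treatment_means(classes):
    """Return (television mean, control mean) for a list of (iq, treatment) classes."""
    tv = [SCORE[c] for c in classes if c[1] == "television"]
    control = [SCORE[c] for c in classes if c[1] == "control"]
    return mean(tv), mean(control)


# Random assignment that happened to put three low-IQ classes with television.
unmatched = (
    [("high", "television")] + [("low", "television")] * 3
    + [("high", "control")] * 3 + [("low", "control")]
)
# Matched assignment: two high-IQ and two low-IQ classes in each treatment.
matched = (
    [("high", "television")] * 2 + [("low", "television")] * 2
    + [("high", "control")] * 2 + [("low", "control")] * 2
)

print("unmatched:", treatment_means(unmatched))
print("matched:  ", treatment_means(matched))
```

In both layouts television adds ten points within each IQ level; only the matched layout lets that difference show in the comparison of treatment means.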
It should be noted that in the matching procedure it is important
not to sacrifice randomization. In our example, randomization might
have been incorporated in the procedure in a number of ways; for ex-
ample, by tossing coins to determine which two of the four high-IQ
classes, and which two of the low-IQ classes, should receive the tele-
vision treatment. Or, if there were a large number of classes from which
to select, they might be divided into two groups-high- and low-IQ's;
then, by means of a table of random numbers, two classes from each
group might be selected for the television treatment and two from
each group to serve as controls.
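The procedure of randomizing within matched groups can be sketched as follows. This is a minimal illustration, assuming eight hypothetical classes labeled A through H; the function name `assign_within_strata` is invented, and the shuffle stands in for the coin toss or the table of random numbers.

```python
import random


def assign_within_strata(classes, rng):
    """Randomly split each stratum evenly between the two treatments.

    `classes` maps a class name to its stratum ("high" or "low").
    Returns a dict mapping each class name to "television" or "control".
    """
    by_stratum = {}
    for name, stratum in classes.items():
        by_stratum.setdefault(stratum, []).append(name)

    assignment = {}
    for names in by_stratum.values():
        shuffled = names[:]
        rng.shuffle(shuffled)  # chance, not the investigator, orders the classes
        half = len(shuffled) // 2
        for name in shuffled[:half]:
            assignment[name] = "television"
        for name in shuffled[half:]:
            assignment[name] = "control"
    return assignment


# Eight hypothetical classes: four high-IQ and four low-IQ.
classes = {
    "A": "high", "B": "high", "C": "high", "D": "high",
    "E": "low", "F": "low", "G": "low", "H": "low",
}
assignment = assign_within_strata(classes, random.Random(1959))
for name in sorted(assignment):
    print(name, classes[name], assignment[name])
```

Whatever the shuffle produces, each treatment receives two high-IQ and two low-IQ classes, so the groups are matched on IQ while the choice of particular classes is still left to chance.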
The fact that matching may make an experiment more sensitive
by controlling the effects of other variables which might obscure that
of the variable in which the investigator is interested often leads
experimenters to supplement randomization by matching procedures.
Two methods are commonly used: precision control and frequency
distribution control.8 Both, when combined (as they should be) with
randomization procedures, are methods of stratified random sampling
(see Appendix B).
The equating of groups by precision control involves matching the
individuals in the groups, case by case. To take a complicated problem:
suppose we wish to determine the effect of psychoanalytic therapy
of a certain sort upon the attitudes of prejudiced people. We would try
8 For a more detailed discussion of these methods of matching, as well as of the
method of random assignment, see Greenwood (1945).
to set up two groups of persons who are matched, individual for in-
dividual, in attitudes and in factors that might be relevant to their
predisposition to attitude change. That is, for person A who is highly
prejudiced, who is exposed to pressures from his social group to be
prejudiced, who is intelligent, who has no strong unconscious needs
that motivate his prejudices, etc., we would try to find an exact
counterpart, A'. A would be assigned to one group and A' to the other.
For B, who is moderately prejudiced, who is exposed to social pressures
not to be prejudiced, who is of average intelligence, and who has
underlying insecurities that find an outlet in his prejudices, etc., we
would try to find a B'. And so on until for every individual in the ex-
perimental group we had a matched individual in the control group.
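Precision control can be sketched as exact pairing on a chosen set of factors, followed by a chance decision as to which member of each pair enters the experimental group. The sketch below is illustrative only: the five individuals, the three factors, and the function name `match_pairs` are hypothetical, and real factors would seldom reduce to such tidy categories.

```python
import random


def match_pairs(people, factors, rng):
    """Pair people who agree exactly on every factor, splitting each pair at random.

    `people` is a list of dicts; `factors` lists the keys to match on.
    Returns (pairs, unmatched), where each pair is
    (experimental_person, control_person).
    """
    groups = {}
    for person in people:
        key = tuple(person[f] for f in factors)
        groups.setdefault(key, []).append(person)

    pairs, unmatched = [], []
    for members in groups.values():
        members = members[:]
        rng.shuffle(members)  # the random order decides who is experimental
        while len(members) >= 2:
            pairs.append((members.pop(), members.pop()))
        unmatched.extend(members)  # cases for which no counterpart exists
    return pairs, unmatched


# Hypothetical individuals, invented for illustration only.
people = [
    {"name": "A", "prejudice": "high", "pressure": "toward", "intelligence": "high"},
    {"name": "A'", "prejudice": "high", "pressure": "toward", "intelligence": "high"},
    {"name": "B", "prejudice": "moderate", "pressure": "against", "intelligence": "average"},
    {"name": "B'", "prejudice": "moderate", "pressure": "against", "intelligence": "average"},
    {"name": "C", "prejudice": "high", "pressure": "against", "intelligence": "average"},
]
factors = ["prejudice", "pressure", "intelligence"]
pairs, unmatched = match_pairs(people, factors, random.Random(7))
for experimental, control in pairs:
    print(experimental["name"], "(experimental) with", control["name"], "(control)")
print("unmatched:", [person["name"] for person in unmatched])
```

Individual C has no exact counterpart and is set aside, although he had to be measured like everyone else.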
The matching of individuals is obviously a very difficult task for
several reasons. First, if matching is to be precise and if individuals are
to be matched on several factors, there must be a large number of cases
to select from in order to achieve an adequate pairing. All of these
cases have to be measured in the relevant factors, but only a few will
be used. The more precise the matching, and the greater the number
of factors on which matching is to take place, the greater the number
of cases for which no match is available. Secondly, it is frequently
difficult to know which factors, of the many possible relevant ones,
are the most important to use in obtaining precision control. Matching
on more than two or three factors with any degree of precision is rarely
possible. Fortunately, however, relevant factors are often so interrelated
that matching on one factor brings with it partial matching on other
factors; there is a "diminishing return" as additional factors are
controlled. Third, it is often difficult to obtain adequate measures of the
factors on which it may be important to match; consider, for instance,
our suggested experiment on the effects of psychoanalysis. If no
adequate measures of the assumed relevant factors are available, then
obviously matching is not likely to be very accurate.
Successful matching can greatly increase the efficiency of an
experiment by decreasing the size of the differences on the dependent
variable that would occur between the experimental and the control
groups by chance alone. When the chance differences are small, it is
easier to demonstrate a difference that is due to the effect of the ex-
greater than the chance differences taken account of by the test of
significance.
Although this design shares a problem of all social research, that
the measurement procedures used may alter the characteristic they are
intended to measure, the problem is less serious here than in "before-
after" studies. (The difficulties introduced by measurements made
before exposure to the experimental variable will be discussed later,
in connection with "before-after" designs.)
What about the effects of other contemporaneous events or ma-
turation? The assumption is made that both groups are exposed to the
same external events and undergo similar maturational processes be-
tween the time of selection and the time at which Y is measured. If
this assumption is justified, the position of the control group on the
dependent variable (Y'2) at the close of the experiment includes the
influence of the external events and maturational processes that have
affected both groups. Thus the difference (d) between Y2 and Y'2 may
be taken as an indication of the effect of the experimental treatment,
provided that neither external events nor maturational processes
interact with the experimental variable to change its effects. (The
possibility of interaction between the experimental variable and other factors
will be discussed later.)
The "after-only" design may be illustrated by a study of the effects
of a film, The Battle of Britain, carried out by the Experimental Sec-
tion of the Research Branch in the War Department's Information
and Education Division (Hovland, Lumsdaine, and Sheffield, 1949).
In this study, the experimental group was shown the film, the control
group was not. In assigning men to the two groups, random selection
of individuals did not seem feasible, since this would have required
pulling specified men out of their regular units to see the film. Such
an unusual procedure would not only have made for administrative
difficulties but would presumably have raised questions in the men's
minds about the purpose of the operation. Therefore selection of cases
for the sample was on the basis of company units rather than individ-
uals. In view of evidence that companies differed in certain ways, two
groups of companies were set up which were comparable in character-
istics such as average score on the Army General Classification Test,
education, age, region of birth, stage of training, etc. Randomization
took the form of tossing a coin to decide which of the matched groups
should see the film, which should not. 12
The experimental group was shown the film during their weekly
orientation hour, as part of the regular training procedure. The control
group did not see the film. Approximately a week later, the men in both
groups were asked to fill out a questionnaire as part of a War Depart-
ment survey "to find out how a cross-section of soldiers felt about
various subjects connected with the war." Mixed in with "camouflage"
items were a number of factual and opinion items that might have
been expected to be influenced by the film but were not so specifically
related to it as to suggest a connection between the "survey" and the
film. The measure of the effects of the film was the difference (in
excess of chance difference) between the proportion of the experi-
mental and the control groups who responded to each of the relevant
items in a given way.
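The comparison just described, a difference between two proportions judged against chance, can be sketched as a pooled two-proportion z-test. This is only an illustration under stated assumptions: the counts are hypothetical, the function name `two_proportion_z` is invented here, and the test treats respondents as independent, which is a simplification for a study that assigned whole company units rather than individuals.

```python
import math


def two_proportion_z(hits_exp, n_exp, hits_ctl, n_ctl):
    """Compare two independent proportions with a pooled z-test.

    Returns (difference, z, two_sided_p). Assumes both groups are non-empty
    and that the pooled proportion is strictly between 0 and 1.
    """
    p_exp = hits_exp / n_exp
    p_ctl = hits_ctl / n_ctl
    pooled = (hits_exp + hits_ctl) / (n_exp + n_ctl)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_exp + 1 / n_ctl))
    z = (p_exp - p_ctl) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return p_exp - p_ctl, z, p_value


# Hypothetical counts, not figures from the film study.
diff, z, p_value = two_proportion_z(hits_exp=300, n_exp=500, hits_ctl=250, n_ctl=500)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value here would suggest that the observed difference on an item exceeds what chance assignment alone would be likely to produce.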
"BEFORE-AFTER" EXPERIMENTS. In addition to the measures of Y
after exposure to the experimental variable, an investigator may wish
to have measures of Y before such exposure, for a variety of reasons:
1. As discussed in the section on selecting experimental and con-
trol groups, he may wish to increase the sensitivity of his experiment
by matching cases in terms of their initial position on the dependent
variable. As pointed out in that section, such matching should be
accompanied by procedures of random assignment.
2. He may want to check whether there is "room" for the experi-
mental variable to have an effect. For example, suppose one were
studying the effectiveness of an advertising campaign to induce women
to use perfume. If, by chance, one selected an experimental and a
control group of women such that 100 per cent of them were already
using perfume before the onset of the advertising campaign, it would
variable. Thus evidence that the two groups were initially comparable
in their position on the dependent variable is only a partial substitute
for random assignment.
Studies using "before" as well as "after" measures of position on
the dependent variable may follow various arrangements with respect
to control groups. (1) Only one group may be used in the study, with
the "before" measure serving as a "control" in the sense that it is
assumed to represent the level of the dependent variable in the absence
of the experimental treatment. (2) The "before" measure may be taken
on one group and the "after" measure on a different but presumably
equivalent group. (3) "Before" and "after" measures may be taken
both on the experimental group and on one control group. (4) There
may be two or more control groups. These four patterns will be dis-
cussed below.
Whatever the pattern of control groups, the "before-after" experi-
ment, like the "after-only," provides evidence of concomitant varia-
tion between the independent and the dependent variables by com-
paring the occurrence (or the extent, or the increase) of Y in the group
that has been exposed to X with the occurrence (etc.) of Y in the
group that has not been exposed to X. That Y did not occur before X
is inferred from the assurance provided by randomization that the
groups are not likely to have differed initially in their position on Y
by more than the specified chance amount taken account of in the
test of significance. This initial equivalence with respect to Y may be
checked by comparison of the "before" measures of the two groups.
In the event that random assignment has not been possible, the "be-
fore" measures still provide evidence of whether there were differences
in Y that preceded differences in X. But if random assignment has not
been possible, there is no basis for ruling out the possibility that there
were greater-than-chance differences on other factors that might ac-
count for a difference in position on Y after exposure to the experi-
mental treatment.
The variations in control group arrangements are concerned with
attempts to take account of contemporaneous events, maturational
processes, and the effects of the initial measurement. Although the
measuring process itself may affect the characteristic being measured
in any type of social research, the "before-after" design is especially
subject to this difficulty. For example, the attempt to measure the
subjects' attitudes before the experiment begins may crystallize the
attitudes; it may exhaust the good will of the subjects; etc. The second,
or "after," measurement may introduce other problems: the subject
may be bored and therefore unwilling to respond; he may try to give
responses that are consistent with his previous responses (thus mini-
mizing the apparent change); or he may try to make his responses
"interesting" by varying them from one interview to the next (thus
increasing the apparent change). The process of repeated measurement
may also affect the "measuring instrument"; for example, in the course
of repeated measurements, an observer may become bored, fatigued,
more sensitive or less sensitive to the phenomena he is recording.
The different control group arrangements differ in the extent of
protection they offer against mistakenly attributing to the independent
variable differences on the dependent variable that may really be due
to other contemporaneous events, to maturation, or to the effects of
the initial measurement.
The "before-after" study with a single group. Barker, Dembo, and
Lewin (1941), in their study of the effects of frustration on young
children's play, used a "before-after" design without a control group.
Each child was taken into a room where there were simple toys with
which he was allowed to play for half an hour; during this time his
play was rated by an observer on a scale of "constructiveness." Next
a partition was raised; in the part of the room now exposed was an
elaborate and attractive set of toys. When the child had become
thoroughly involved in playing with these, the experimenter took him
by the hand, led him back to the part of the room in which he had
been playing earlier, and locked the new toys behind a wire-net parti-
tion through which the child could still see them. The child's play with
the original toys was again rated for constructiveness during a half-
hour period. The difference in ratings of constructiveness of play dur-
ing the "pre-frustration" and "post-frustration" periods was taken as
evidence of the amount of regression induced by the frustrating ex-
perience. (At the end of the experiment, the child was allowed to play
as long as he wished with the more attractive toys, in order to undo
the frustrating effects of the experiment.)
In this design, each subject "serves as his own control." The differ-
ence between his position on the dependent variable before and after
exposure to the independent variable is taken as a measure of the
effect of the independent variable. (See Column 2 of table on page
110.) But other influences may have operated between the "be-
fore" and "after" measures. External events unrelated to the experi-
mental treatment may lead to a change in position on the dependent
variable; so may processes of growth and development. The initial
measurement itself (in this case, the period of play on which the initial
rating was made) may lead to changes. This design does not make it
possible to separate such effects from those of the experimental treat-
ment. Thus its use is justified only when one has good reason to believe
(as did Barker, Dembo, and Lewin): (1) that the "before" measure
itself will not in some way affect either the response to the experimental
treatment or the "after" measure; and (2) that there are not likely
to be any other influences, besides the experimental treatment, during
the course of the study that might affect the subjects' response at the
time of the second measurement. In order to be reasonably sure that
such assumptions are justified, one must have considerable knowledge
of the probable effects of his measurements and of the conditions
other than the experimental treatment that are likely to influence the
dependent variable. This may be true of many problems in such fields
as learning and sensory perception, where much experimental work
has been done; it is much less likely to be so, at the present time, in
social psychology and sociology.
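The limitation of the single-group design can be made concrete with a small arithmetic sketch. All numbers below are hypothetical, and the three component effects are assumed to add together simply; the point is only that the observed before-after change is their sum, so the design by itself cannot say how much of it is due to the experimental treatment.

```python
from statistics import mean


def mean_change(before, after):
    """Mean of the paired after-minus-before differences for one group."""
    return mean(a - b for b, a in zip(before, after))


# Hypothetical ratings and hypothetical component effects (assumed additive).
before = [5.0, 6.0, 4.0, 7.0]
treatment_effect = -2.0   # what the investigator wants to estimate
maturation = 0.5          # growth or external events during the study
testing_effect = -0.25    # influence of the first measurement itself

after = [b + treatment_effect + maturation + testing_effect for b in before]

observed = mean_change(before, after)
print(observed)
```

The observed change differs from `treatment_effect` by exactly the sum of the other two components, which is why the design is defensible only when those components can be assumed negligible.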
The "before-after" study with interchangeable groups.14 One ap-
proach to ruling out the effects of the initial measurement is to meas-
ure one group before the introduction of the experimental factor, a
different group after exposure to the experimental factor. The two
groups are selected in advance from the population that is to be ex-
posed to the experimental variable; as in other designs, random selec-
tion provides assurance that the groups probably did not differ by more
than a specifiable amount before introduction of the experimental vari-
able, and thus that they may be treated as interchangeable. Again,
matching may be used to supplement randomization. The difference
14 In the earlier edition of this book, this was called the "simulated before-after"
design. D. T. Campbell, in a forthcoming paper ("Quasi Experimental Designs
for Use in Social Science Settings"), refers to it as the "offset before and after"
design.
between the "before" measure taken on the first group (Y'1) and the
"after" measure taken on the second group (Y2) is assumed to be a
measure of the effect of the experimental factor. (See Column 3 of
the table on page 110.)
This design was used in a study of a publicity campaign about the
United Nations in the city of Cincinnati (Star and Hughes, 1950).
Two equivalent samples, of a thousand persons each, were drawn from
the city's population. One was interviewed before the start of the
publicity campaign, the other two months later. To determine the
effectiveness of the campaign, the responses of the two groups were
compared. As it turned out, there was very little difference between
them.
The "before-after" study with interchangeable groups eliminates
the possibility of confounding an effect of the initial measurement
with that of the experimental variable. Suppose the same group of
respondents had been interviewed before and after the campaign. The
initial interview might have aroused their interest in the United Na-
tions and thus made them especially sensitive to the publicity cam-
paign. If this were true, a simple "before-after" study, in which the
difference between the "before" and "after" responses of a single re-
interviewed group was taken as the measure of the effect of the experi-
comparison of the change in this group with those in the other groups.
It may be observed that this four-group design amounts to doing
the experiment twice, once with a "before-after" design with one
control group (experimental group and control group I), and once
with an "after-only" design (control groups II and III). If the results
of these two experiments are consistent, we have greater assurance that
the outcome is not an artifact than we would with either version
alone, since we have replicated the finding within the study.
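The two estimates that the four-group design yields can be sketched as follows. The scores are hypothetical and the function names are invented for illustration; the sketch also assumes that control group II is exposed to X and control group III is not, which the passage above implies but does not state.

```python
from statistics import mean


def before_after_estimate(exp_before, exp_after, ctl_before, ctl_after):
    """Change in the experimental group minus change in control group I."""
    exp_change = mean(exp_after) - mean(exp_before)
    ctl_change = mean(ctl_after) - mean(ctl_before)
    return exp_change - ctl_change


def after_only_estimate(exposed_after, unexposed_after):
    """Difference between "after" means of two groups with no "before" measure."""
    return mean(exposed_after) - mean(unexposed_after)


# Hypothetical scores for the four groups.
estimate_1 = before_after_estimate(
    exp_before=[10, 12, 11, 13], exp_after=[14, 16, 15, 17],
    ctl_before=[11, 12, 10, 13], ctl_after=[12, 13, 11, 14],
)
estimate_2 = after_only_estimate(
    exposed_after=[14, 15, 16, 15],    # control group II (assumed exposed to X)
    unexposed_after=[12, 11, 13, 12],  # control group III (assumed not exposed)
)
print(estimate_1, estimate_2)
```

When the two estimates agree, as they do in this contrived example, the initial measurement is unlikely to have distorted the result; a marked disagreement would point to an effect of the "before" measure.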
REPRESENTATIVE DESIGN
segregated projects about their initial attitudes did not differ con-
sistently, whereas their current expressions of attitude did, the in-
vestigators concluded that recall was not being systematically distorted
by attitudes at the time of the study. Further, they found that women
who were similar in education, religion, and political attitudes gave
similar reports of their initial attitude toward Negroes, regardless of
which project they lived in; this consistency was taken as further
evidence against distortion of recall by present attitudes.
Gathering evidence through studies extended over time. In studies
which are limited to a single interview or observation or other measure-
ment of each respondent, and in which the investigator does not have
supplementary information about individuals' experiences, there is
little possibility of getting evidence about time sequences except by
asking the respondent to recall when things happened. But in studies
that focus on the same people over a period of time, the investigator
may secure direct evidence of time relationships among variables. Such
longitudinal studies may take the form of repeated observations of the
same subjects, or repeated interviews with them,17 or of different meas-
urement procedures at different times.
Stouffer et al. (1949a) provide an example of a study using differ-
ent kinds of data about the same subjects at different times. The in-
vestigators were interested in the relation between acceptance of the
official value-system of the Army and promotion. Had they simply
interviewed a cross-section of Army personnel and found that those
of higher rank expressed attitudes and opinions more in line with
official Army values, they would have had no grounds for inferring
whether acceptance of the official value-system was conducive to
promotion or whether being promoted increased acceptance of the
system. To avoid this dilemma, they interviewed a group of newly
inducted soldiers, using questions from which an index of "acceptance
of Army value-system" could be constructed. Four months later, they
examined the Army records of these same men... and found that a
higher proportion of those who had expressed views in keeping with
the Army's values had become privates first-class than of those who
had not. Thus it was clear that conformity with the Army's value-
17 For a detailed discussion of studies using repeated interviews with the same
respondents ("panel studies"), see Rosenberg, Thielens, and Lazarsfeld (1951).
CAUSAL INFERENCE FROM OTHER STUDY DESIGNS 133
system was conducive to promotion. (It is, of course, entirely likely that
the relationship between these two variables is a mutually reinforcing
one; further research might well have shown that after promotion,
views were even more in line with official Army position.)
SEARCH FOR PATTERNS OF RELATIONSHIP INFERRED FROM COMPETING
CAUSAL ASSUMPTIONS. Sometimes one can infer which of two factors
that vary together is the "causal" one on the ground that the two
variables would show a certain pattern of association if X were the
"cause," a different pattern if Y were the "cause." For example, it is
sometimes reasonable to expect that if X were the cause, it would
affect Y cumulatively-that is, that individuals who had been exposed
to X for a longer time would show a higher degree of Y-but that this
would not be so if Y were the causal factor.
Such an inference was central to the plan of a study by Newcomb
(1943, 1947), which focused on the question of what kinds of people
accept certain kinds of social change. One of the hypotheses of the
study was that "values come to be values largely through the mediation
of the groups with which an individual has direct contact." Studying
students at Bennington College, Newcomb considered the college
community as a group with which the students had direct contact, and
attitude toward public affairs as a relevant value. This attitude was
selected because the college was characterized by a high degree of
concern with public affairs and a "liberal" attitude on controversial
issues. The investigator reasoned that if group membership were indeed
the causal variable, then those who had been exposed to the group
atmosphere for longer periods should show attitudes more in keeping
with those characteristic of the group (in this case, more liberal atti-
tudes). If, on the other hand, it was the possession of liberal attitudes
that led to attending the college, there would be less reason to expect
an increase in liberalism with increased years of attendance.18 Using a
variety of measures, Newcomb found that length of exposure to the
Bennington community, as indicated by college class, was accompanied
by increased information about public issues and increased liberalism
18 Again, of course, there is the possibility that attending the college might
further have strengthened initial liberal attitudes, but presumably this would not
have led to such marked differences between longer-exposed and shorter-exposed
students as would be expected if college membership were the major causal factor.
Klineberg commented:
This difference may be due to improvement in the schooling in
the South; in any case there is no evidence that the more recent
arrivals are inferior. The conclusion is therefore justified that the
superior showing of those subjects who have had a longer period
of residence is due to this longer residence, and not to any regular
change in the quality of the migrants.
There are, of course, methods other than repetition of the study
to determine whether other factors may be responsible for the dif-
ferences found on the dependent variable. For example, Klineberg
provided two other checks on the hypothesis that differences in the
intelligence of Negroes who left the South at different times might
account for differences in IQ related to length of residence in New
York. He investigated reasons for migrating from the South, and found
nothing in these reasons to support the hypothesis that factors leading
to migration might be expected to correlate with intelligence. He also
studied records of southern schools attended by Negro children, and
found no systematic difference in the relative class standing of children
who subsequently migrated and of those who did not. Both these lines
of evidence suggested that, at least during the period covered by these
Summary
5
SOME GENERAL PROBLEMS
OF MEASUREMENT
Scales of Measurement
Summary
Measurement . . . is more than the pedantic pursuit of a
decimal place. Its vital and absorbing aspect emerges most
clearly perhaps when it becomes a question of measuring some-
thing that has never been measured. Or better still, something
that has been held to be unmeasurable. S. S. STEVENS
ness, the various distractions, etc., all tend to affect responses of the
subject. If the situations of measurement vary from individual to in-
dividual or from one measurement to another, a considerable variation
in scores is likely to result from such factors quite apart from the true
differences among individuals with respect to the attribute being
measured.
5. Differences due to variations in administration. Inadequate and
nonuniform methods of administering a measuring instrument may
contribute to variations in scores. Interviewers may add questions,
change wording, revise the order, omit questions, etc., in such a way
as to make one interview noncomparable with another. A bored test
administrator may improvise his own instructions; a satiated coder
may glance at rather than read the item to be coded; a tired observer
may not be able to keep recording the constantly changing group
process. All of these variations in the use of a measuring instrument
may markedly affect both the consistency with which a given coder,
observer, etc., rates the responses of various individuals and the con-
sistency of rating from one coder, observer, etc., to another.
Both the situation in which the measurement is made and the
method of administration may influence the orientation with which
the subject answers: for example, whether he responds in terms of what
he believes to be true, of what he thinks the measurer considers the
"right" answer, etc.
6. Differences due to sampling of items. Any measuring instrument
necessarily taps only a sample of items relevant to the characteristic
being measured. Thus, an attitude questionnaire contains only a rela-
tively few items from the universe of relevant items that might have
been included. If we conceive of a score broadly, as a measure of at-
titude, rather than narrowly, as the score on a specific questionnaire,
it is apparent that the variations in attitude as measured by different
questionnaires will be, in part, dependent on the nature of the sample
of items included in the questionnaires. For example, in one question-
naire dealing with attitudes toward Negroes, the particular items in-
cluded may happen to be those on which a given individual is more
likely to respond favorably than he would on another questionnaire
consisting of a different sample of items.
It is obvious that, if other things are equal, a one-item question-
naire is likely to be a less adequate sample of the total universe than a
VARIATIONS IN SCORES 153
questionnaire with thirty items. Similarly, ratings based on a few ob-
servations or made by a single observer are not as trustworthy as ratings
based on many observations by several observers. Increasing the number
of items (provided the added items are equally appropriate to the pur-
poses of the given questionnaire), or the amount of relevant material
on which a score is based, makes it likely that the variation in scores
attributable to this source will decrease.
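One conventional way to quantify this gain, not given in the text above, is the Spearman-Brown formula from classical test theory, which projects the reliability of a score built from k comparable items when the reliability of a single item is known. The sketch below assumes the added items are parallel to the original ones (the "equally appropriate" proviso above); the single-item reliability of 0.2 is hypothetical.

```python
def spearman_brown(single_item_reliability, n_items):
    """Projected reliability of a score built from n parallel items."""
    r = single_item_reliability
    return n_items * r / (1 + (n_items - 1) * r)


# Compare a one-item, a five-item, and a thirty-item questionnaire.
for k in (1, 5, 30):
    print(k, round(spearman_brown(0.2, k), 3))
```

The projected reliability rises quickly at first and then more slowly, so each added item contributes less than the one before it.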
7. Differences due to lack of clarity of the measuring instrument.
If individuals understand the items in a measuring instrument differ-
ently, variations in their responses may reflect these differences in inter-
pretation rather than true differences in the characteristic one is
attempting to measure. Frequently the categories in a coding or
observational instrument are complex and ambiguous; different coders
or observers may interpret the categories differently and assign similar
responses to different categories. Interview questions may be so long,
or phrased in such a complex way, that some respondents do not un-
derstand them; the responses of these subjects can hardly constitute
an adequate indication of the characteristic or attitude at which the
questions were aimed. Words such as free enterprise or liberty, which
are emotionally colored or which have special connotations not com-
mon to all people measured, may set off differential reactions not di-
rectly related to the characteristic which the instrument aims to meas-
ure. Even apparently simple questions may be unclear if their context
is ambiguous. Take, for example, the following question used in a
survey of a college community: "During the last week, did you visit
the home of any faculty member?" If the interview took place im-
mediately after a week of vacation, some respondents might interpret
the question to mean "during the last regular week of classes," others
as "the preceding week, i.e., during the vacation." Simplicity, con-
creteness, and a high degree of specificity are to be desired in measuring
instruments.2
8. Differences due to mechanical factors. Circumstances such as
broken pencils, check marks in the wrong box, poorly printed instruc-
tions, lack of space to record responses fully, play their role in prevent-
ing the most effective functioning of a measuring instrument. Many
2 This statement does not apply when the characteristic the investigator is
trying to measure is the way in which a subject interprets an ambiguous situation,
as is the case in most projective techniques (see Chapter 8).
154 GENERAL PROBLEMS OF MEASUREMENT
errors.
uation from one occasion to another, rather than constant or random
CONSTRUCT VALIDITY
characteristic being reflected is not something which can be pointed
to or identified with some specific kind of behavior; rather, it is an ab-
straction, a construct. Therefore the process of validating this kind
of measuring instrument is referred to as construct validation.
Many of the measures used in the social sciences deal with con-
structs. Measures of intelligence, of attitudes, of authoritarianism, of
introversion-extroversion, of [illegible], or of more global personality pat-
terns, are of this sort. Cronbach and Meehl (1955), who first made
explicit the concept of construct validity, pointed out that the defini-
tions of such constructs consist in part of sets of propositions about
their relationships to other variables: other constructs or directly
observable behavior. Thus, in examining construct validity, it is ap-
propriate to ask such questions as: What predictions would one make,
on the basis of these sets of propositions, about the relationships to
other variables of scores based on a measure of this construct? Are the
measurements obtained by using this instrument consistent with these
predictions?
attitude toward authority figures were made on the basis of each of the
following methods: interview questions about father and present
superior officers (the respondents were Air Force trainees); a list of
traits to be checked as descriptive of father and of immediate superior;
written character descriptions of photographs of middle-aged and older
persons (intended as symbolic authority figures); stories about scenes
containing symbolic authority figures; an autobiographical inventory;
an attitude survey; and a sociometric questionnaire.
Each of these methods was also used to measure a second charac-
teristic: attitude toward "nonauthority figures" (present colleagues, a
past fellow worker, "symbolic peers" represented by pictures of young
persons). This second characteristic-attitude toward nonauthority
figures-was measured in order to determine whether the attitudes
expressed toward authority figures were indeed specific to persons in
authority or whether they were expressions of attitudes toward people
in general. If there were a high positive correlation between attitudes
expressed toward authority figures and toward nonauthority figures-
that is, if individuals favorable toward authority figures were also
favorable to nonauthority figures and those who were unfavorable
toward one were also unfavorable toward the other-one would con-
clude that what was being tapped by the first group of measures was not
a specific attitude toward authority figures but a more general attitude
toward people. On the other hand, if there were little or no correla-
tion, or a negative correlation, between the measures of the two types
of attitudes, one would conclude that the first set of measures was in-
deed getting at attitudes specifically directed toward authority figures.
As it turned out, the measures of attitude toward authority figures
showed so little agreement that there seemed no basis for believing
that any consistent attitude had been tapped; thus there was no point
in trying to determine whether these measures were getting at a specific
attitude that could be distinguished from attitude toward nonauthority
figures. Ratings made on the basis of the interviews showed a high
correlation between attitude toward father and attitude toward supe-
rior officers; had this been the only method used, the investigators
might have concluded that they had successfully measured a general-
ized attitude toward authority figures. However, the ratings based on
different methods showed little agreement with one another; more-
THE VALIDITY OF MEASUREMENTS 163
over, techniques other than the interview showed little correspondence
between attitude toward father and attitude toward superior officers.
In such a situation, one faces the question whether the measuring in-
struments are invalid or whether the construct one is attempting to
measure (in this case, "attitude toward authority figures") is somehow
faulty. In this study, the investigators reasoned that the number of
different methods they had used provided a basis for concluding that
the difficulty was with the construct rather than the measuring instru-
ments. Although they recognized that any one or more of the measures
might have been invalid, they thought it unlikely that all of them were
inadequate indicators of the construct "attitude toward authority."
In view of the fact that no two of their measures showed high agree-
ment, they concluded that their findings required a modification of
the assumption that each individual has a generalized attitude toward
authority which reflects his attitude toward his father.
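The check on agreement among methods described above can be sketched as a table of pairwise correlations. The ratings, the method labels, and the helper names `pearson_r` and `pairwise_agreement` are all hypothetical and chosen only for illustration; they are not data or procedures from the study discussed.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pairwise_agreement(scores_by_method):
    """Correlate every pair of methods; keys are (method_a, method_b)."""
    return {
        (a, b): pearson_r(scores_by_method[a], scores_by_method[b])
        for a, b in combinations(scores_by_method, 2)
    }


# Hypothetical ratings of five respondents by three methods.
scores = {
    "interview": [1, 2, 3, 4, 5],
    "trait_checklist": [3, 1, 4, 1, 5],
    "story_ratings": [2, 5, 1, 4, 3],
}
agreement = pairwise_agreement(scores)
for pair, r in agreement.items():
    print(pair, round(r, 2))
```

If several methods intended to tap the same construct show uniformly low correlations with one another, as in this contrived example, either the methods or the construct itself is open to question.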
From this discussion, it is apparent that construct validity cannot
be adequately tested by any single procedure. Evidence from a number
of sources is relevant: correlation with other tests and with other
behavior, internal consistency of items, stability over time. How
evidence from each of these sources bears on estimation of the validity
of the test depends on the relationships predicted in the theoretical
network in which the construct is embodied. The more different rela-
tionships tested and confirmed, the greater the support both for the
measuring instrument and for the underlying theory.
cluded in his test refer to topics which may be more familiar to some
individuals than to others, and which may thus test knowledge of the
topic rather than reading comprehension; whether they involve peculi-
arities of style that may present more difficulty to some individuals
than to others; etc.
time interval between the two tests is short) and, in the second test,
may give again the responses he remembers (or misremembers) having
made earlier rather than responses which are spontaneous or thought
through anew in the second situation.
There is the further possibility that the initial measure has actually
changed the characteristic being measured. (The reader is reminded
THE RELIABILITY OF MEASUREMENTS 171
of the discussion of the "before-after" experimental design in the
preceding chapter.) An interview, a situational test, an attitude ques-
tionnaire may raise questions a person has never thought about and
may heighten interest and stimulate the development of definite
opinions; thus, for example, a "don't know" response may be replaced
by a definite agreement or disagreement.
In addition to the possibility of changes brought about by the
initial measurement, there is-as with all types of measurement-the
possibility of genuine change between the two administrations of the
test. As a result of influences unrelated to the testing, some subjects
may have acquired more information, or undergone a shift in attitude,
during the interval between the two administrations of the test.
When there is both the possibility that the initial measure may
affect the results of the second measurement and the possibility of
genuine changes brought about by other factors, the common practice
is to try to steer a course between waiting long enough for the effects
of the first testing to wear off and not long enough for a significant
amount of real change to take place. If the second measurement is
administered before the effects of the first have worn off, the estimate
of stability will not be trustworthy because the results of the two
measurements will not be independent; the error is likely to be in the
direction of an overestimate of stability. On the other hand, if genuine
changes have occurred, the resulting coefficient will be an under-
estimate of the stability of the instrument itself. No hard and fast rules
can be offered for judging the optimal interval; much depends on the
specific nature of the test. Fortunately, one can expect the effects to
wear off most rapidly at the beginning, with a decreasing rate as time
goes on.9 In other words, there are diminishing returns for waiting
over longer and longer periods of time. Two weeks to one month is
commonly considered to be a suitable interval for many psychological
tests. If in doubt, however, it is better to wait a longer rather than a
shorter period of time, since with increasing time such errors as occur
are likely to be in the direction of underestimation of the stability of
the instrument rather than overestimation. One is safer with an
underestimate than an overestimate: in the former case, the investi-
9 See the curves of forgetting in any standard textbook of psychology.
tionnaire to measure worker morale in the hope that it will help us make
predictions about the rate of absenteeism under specified conditions. If
the questionnaire were completely unreliable-for example, if workers
whom it classified as having low morale were just as likely to show up
as having high morale on a second administration ten minutes later-
it would be impossible to observe a relationship between morale and
absenteeism, even if the two were in fact closely related. If the
questionnaire is not completely unreliable, we may be able to demonstrate
that some relationship exists between morale and absenteeism.
However, if we hope to discover how close the relationship between the two
variables is, it is necessary to have highly reliable measuring
instruments.
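The point can be illustrated with the classical attenuation relation, which the text does not state and which is introduced here as an assumption: under classical test theory, the correlation one can expect to observe between two measures equals the true correlation multiplied by the square root of the product of their reliabilities. The following minimal sketch uses hypothetical figures throughout (a true morale-absenteeism correlation of .80 and absenteeism records with a reliability of .90).

```python
import math

def expected_observed_correlation(true_r, reliability_x, reliability_y):
    """Classical attenuation relation: r_observed = r_true * sqrt(r_xx * r_yy)."""
    return true_r * math.sqrt(reliability_x * reliability_y)

# Hypothetical figures, not taken from any study: a true correlation of .80,
# absenteeism records with reliability .90, and three morale questionnaires
# of differing reliability.
for morale_reliability in (0.0, 0.5, 0.9):
    r = expected_observed_correlation(0.80, morale_reliability, 0.90)
    print(f"morale reliability {morale_reliability:.1f} -> expected observed r = {r:.2f}")
```

With a completely unreliable questionnaire the expected observed correlation is zero, however close the true relationship; as reliability rises, the observed figure approaches the true one.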
DISTINGUISHING AMONG INDIVIDUALS AND AMONG GROUPS. All the
methods of estimating reliability that we have described, and most of
the others in common use, consist basically in determining whether
measurements at different times or by different forms of the instrument
place individuals in the same position in relation to the total group
tested. No matter what the subject matter of the test or the method of
estimating reliability, the question being asked is essentially: Do the
results of the two testing situations agree in where they place Charlie
(and Joe and Mary and each of the others) in relation to the average
score of the group? Charlie and Joe and Mary and each of the others
may score ten points higher in one testing situation than in the other,
but this will not show up as unreliability if each is in the same position
relative to the others on both measures. Nor will different changes in
scores for different individuals affect the estimate of reliability unless
they change the position of the individuals in relation to one another.
Suppose that on the first measure Charlie scored 30, Joe 40, and Mary
50; and that on the second measure Charlie scored 33, Joe 40, and
Mary 47. Since the relative position of the three would not be changed,
these shifts would not appear as unreliability. But suppose that on
the first measure Charlie has scored 39, Joe 40, and Mary 41. If Charlie
again gained three points on the second measurement and Mary again
lost three points, their relative position would change; thus the
changes in scores between the two testing situations would appear as
unreliability.
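The two cases just described can be checked with a short sketch. It compares only rank positions, which is a simplification of the correlational procedures actually used, and it assumes that Joe's score is unchanged in the second case, as the example implies; the function name is hypothetical.

```python
def rank_positions(scores):
    """Return each person's rank (1 = highest score) within the group."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

# First case: widely spaced scores.
first_a = {"Charlie": 30, "Joe": 40, "Mary": 50}
second_a = {"Charlie": 33, "Joe": 40, "Mary": 47}

# Second case: closely spaced scores, with the same three-point shifts.
first_b = {"Charlie": 39, "Joe": 40, "Mary": 41}
second_b = {"Charlie": 42, "Joe": 40, "Mary": 38}

print(rank_positions(first_a) == rank_positions(second_a))  # True
print(rank_positions(first_b) == rank_positions(second_b))  # False
```

The same shifts leave the ordering intact in the first case and reverse it in the second, which is why only the second would appear as unreliability.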
From the nature of these operations used in estimating reliability
follow several consequences:
1. The reliability of a measurement procedure is always contingent
on the degree of uniformity of the given characteristic within the
population being measured. Small shifts in individual scores may lead
to changes in relative position in a group where the scores of many
individuals are close to one another, whereas the same shifts may not
lead to changes in relative position in a group where individuals differ
markedly from one another. Thus, a test with a low reliability in a very
homogeneous population may have a high reliability in a very hetero-
geneous population. Tests are sometimes published with deceptively
high estimates of reliability, computed on the basis of administration
to very heterogeneous populations, whereas the application of the test
may require the ability to distinguish among individuals in relatively
homogeneous groups.
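One way to see this dependence, offered as a sketch under the assumptions of classical test theory rather than as the procedure described above, is to treat reliability as the share of observed-score variance that is due to true differences among individuals. The standard deviations below are hypothetical.

```python
def reliability(true_score_sd, error_sd):
    """Classical test theory: true-score variance / total observed variance."""
    true_var = true_score_sd ** 2
    error_var = error_sd ** 2
    return true_var / (true_var + error_var)

# The same instrument (same error of measurement) applied to two populations.
error_sd = 3.0
homogeneous = reliability(true_score_sd=2.0, error_sd=error_sd)
heterogeneous = reliability(true_score_sd=12.0, error_sd=error_sd)

print(round(homogeneous, 2))    # 0.31
print(round(heterogeneous, 2))  # 0.94
```

The error of measurement is identical in both cases; only the spread of the characteristic in the population differs.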
2. High reliability is more important if we wish to make fine
discriminations among individuals than if we merely wish to identify
people who are at the extremes. To demonstrate a significant difference
between two scores, the difference between them would have to be
approximately three times as great if the reliability coefficient were .10
than if it were .90; twice as great if it were .60 rather than .90; and
about 1.4 times as great if it were .80 rather than .90. Reliability is
obviously important for precise discrimination, and without it the fine
gradations of a measuring instrument are illusory.
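The factors quoted in this paragraph are consistent with the classical standard error of measurement, which is proportional to the square root of one minus the reliability coefficient. The text does not give a formula, so the following sketch rests on that assumption; the function name is hypothetical.

```python
import math

def required_difference_ratio(reliability, reference_reliability=0.90):
    """How many times larger a difference must be, relative to a test with the
    reference reliability, assuming the error of a score difference is
    proportional to sqrt(1 - reliability)."""
    return math.sqrt((1 - reliability) / (1 - reference_reliability))

for r in (0.10, 0.60, 0.80):
    print(f"reliability {r:.2f}: {required_difference_ratio(r):.1f} times as great")
```

Under this assumption the three ratios come out at about 3, 2, and 1.4, in agreement with the figures given above.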
3. Estimates of reliability apply to the average reliability of scores
of individuals in a group. They provide no estimate of the different
reliabilities of the scores of each individual within the group. It is, of
course, an approximation of unknown degree to assign the same
reliability coefficient to scores of all individuals. Frequently, the reliability
of a score at one point on a continuum is different from that at another
point; for example, individuals who have more intense attitudes may
be more consistent than individuals who are less intense (see
Cronbach, 1949). The reliability of an average score is higher than the
reliability of the individual scores that go into the computation of that
average. If we are interested in group results, therefore, we can afford
to operate with measuring instruments of relatively low reliability,
that there are sizable differences in the correlation of items with one
another. The problem then is to select from the possible available items
or measurement operations those that correlate most highly with one
another, and to increase the reliability of the measurement procedure
as a whole by increasing its internal consistency.
This method has rarely been used outside the field of psychological
testing (including attitude measurement), but in this field it has
been quite successful. The most common practice is to begin with a
fairly large collection of items, calculate a score based on each item,
and another score based on responses to the total set of items. Then
the score for each item is correlated with the total score, and those
items are selected that correlate most highly with this score. These
items are divided into two equivalent groups; two new scores are
calculated based on the two groups of selected items; and these
scores are correlated to provide a measure of the reliability of the
"purified" test. If the new reliability is not satisfactory, the test may be
further purified in the same manner as before, or additional items may
be added of the type represented by the selected items.
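The steps of this procedure can be sketched as follows. The responses are hypothetical, the function names are illustrative, and the division of the selected items into alternate positions is an assumption standing in for whatever method is used to form two equivalent groups.

```python
from statistics import mean

def pearson(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def purify(responses, keep):
    """Correlate each item with the total score and keep the best `keep` items."""
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    item_total = {
        i: pearson([person[i] for person in responses], totals)
        for i in range(n_items)
    }
    selected = sorted(item_total, key=item_total.get, reverse=True)[:keep]
    return sorted(selected), item_total

def split_half_correlation(responses, items):
    """Correlate scores on two halves of the selected items (alternate items)."""
    half_a, half_b = items[0::2], items[1::2]
    score_a = [sum(person[i] for i in half_a) for person in responses]
    score_b = [sum(person[i] for i in half_b) for person in responses]
    return pearson(score_a, score_b)

# Hypothetical responses of six subjects to five items scored 1 to 7.
# The fourth item (index 3) is unrelated to the others.
responses = [
    [7, 6, 7, 2, 6],
    [6, 6, 5, 6, 5],
    [5, 4, 5, 1, 4],
    [3, 4, 3, 7, 3],
    [2, 2, 3, 3, 2],
    [1, 2, 1, 5, 1],
]

selected, item_total = purify(responses, keep=4)
print("items retained:", selected)
print("split-half correlation:", round(split_half_correlation(responses, selected), 2))
```

In practice the collection of items and the number of subjects would be far larger; the sketch is meant only to show the order of the operations.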
Rather than correlating the score for each item with that for the
total test, the goal of increasing internal consistency may be approached
by the following procedure: The subjects are divided into two groups
-a high-scoring and a low-scoring one-on the basis of their total
scores. If the number of subjects is quite large, as it properly should be,
one takes extreme groups-say, the top and the bottom twenty per
cent. If an item is consistent with the complete set of items, then the
proportion of high scorers who answer the item in a specified way
should be significantly different from the corresponding proportion of
low scorers. Those items are most consistent with the total set which
yield the largest differences in the appropriate direction.
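A minimal sketch of this extreme-groups comparison follows. The data are hypothetical, the "specified way" of answering is taken, as an assumption, to be a score of 5 or more on a seven-point item, and no test of significance is attempted.

```python
def extreme_group_difference(responses, item, agree, fraction=0.20):
    """Proportion answering in the specified way among the highest total
    scorers minus the corresponding proportion among the lowest scorers."""
    ranked = sorted(responses, key=sum, reverse=True)
    size = max(1, int(len(ranked) * fraction))
    high, low = ranked[:size], ranked[-size:]

    def proportion(group):
        return sum(1 for person in group if agree(person[item])) / len(group)

    return proportion(high) - proportion(low)

# Hypothetical responses of ten subjects to three items scored 1 to 7.
responses = [
    [7, 7, 4], [6, 7, 5], [6, 5, 3], [5, 5, 6], [4, 5, 2],
    [4, 3, 5], [3, 3, 4], [2, 3, 6], [2, 1, 4], [1, 1, 5],
]

def agrees(score):
    return score >= 5  # assumed definition of the "specified way"

for item in range(3):
    print(item, extreme_group_difference(responses, item, agrees))
```

Here the first two items separate the extreme groups completely, whereas the third does not separate them at all and would be a candidate for removal.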
As an example, let us consider the procedure used in constructing
a scale of anti-semitism for use in the Authoritarian Personality
investigation (Adorno et al., 1950). A questionnaire consisting of 52 items
referring to Jews was administered to a group of female college
students. Let us consider the results for five of the items on the test:
A. One trouble with Jewish businessmen is that they stick
together and connive, so that a Gentile doesn't have a fair
chance in competition.
B. Colleges should adopt a quota system by which they limit
the number of Jews in fields which have too many Jews now.
C. Anyone who employs many people should be careful not
to hire a large percentage of Jews.
D. The trouble with letting Jews into a nice neighborhood
is that they gradually give it a typical Jewish atmosphere.
E. Most hotels should deny admittance to Jews, as a general
rule.
The respondents were asked not simply to agree or disagree with
each item, but to indicate the strength of their opinion, from "strong
support, agreement" to "strong opposition, disagreement." The reply
to each item was scored on a scale ranging from 1 (strong opposition
to anti-semitism) to 7 (strong anti-semitism), with a neutral point of
4. For each item, the mean scores of the 25 per cent who scored highest
and the 25 per cent who scored lowest on the total test were computed;
the difference between the two means was taken as the "discriminatory
power" of the item. The figures for our five items are given in the
following table:
Scales of Measurement
The effectiveness of our behavior both in science and in everyday
life depends on our ability to distinguish among objects and to make
differential responses to them. Many of our activities require no more
than the distinguishing of objects possessing qualities that are rather
sharply demarcated from those of others. To take obvious examples
from daily life, it is both useful and simple to notice the differences
between a pear and an apple, between an infant and an adult, between
a tennis court and a swimming pool, between the ringing of a telephone
and a Beethoven sonata.
Similarly, in the social sciences many of the distinctions that are
made are qualitative in nature. For example, we distinguish different
languages, different types of social system, different nationalities, and
so forth.
However, both in the sciences and in everyday life, it is often de-
sirable to make distinctions of degree rather than of quality. In daily
life we are frequently faced with the problem of selecting among
alternatives: Which person is more intelligent? Which type of cloth
is sturdier? Which teacher is more interesting? In the interest of both
accuracy of judgment and the discovery of constant relationships
among characteristics that vary in amount as well as in kind, science
pursues the objective of replacing statements that simply affirm or
deny differences by more precise statements indicating the degree of
difference.
Although there is little doubt that quantification facilitates the
establishment of scientific laws, it should be recognized that measure-
ment exists in a variety of forms. Sometimes measurement has been
defined so as to exclude methods of data collection that permit only
qualitative discriminations. For example, McGregor (1935) defined
measurement as "the process of assigning numbers to represent
quantities." Other writers, such as Weyl (1949), Stevens (1946, 1951), and
Coombs (1953), have included in their concept of measurement any
empirical procedure that involves the assignment of symbols, of which
numerals are only one type, to objects or events according to rules.
Measurement is possible only because there is a certain
correspondence between the empirical relations among objects and events,
on the one hand, and the rules of mathematics, on the other. We use
empirical procedures to determine the relations among objects and
events. In the case of physical objects, these empirical procedures may
take the form of direct manipulation. Suppose that one has a number
of bars of iron and sticks of wood, which he wishes to distinguish from
one another. On the basis of criteria we need not go into here, he
identifies some of them as being wood (that is, as being equivalent
NOMINAL SCALES
a given object, individual, or response belongs in a given category or
that it does not; in other words, it is the determination of equivalence
16 Our discussion of measurement scales is directly indebted to Stevens.
17 Coombs (1950, 1953) has described, in addition, the "partially ordered
scale," which falls logically between the nominal and the ordinal scale, and the
"ordered metric," which falls logically between the ordinal and the interval scale.
The distance between 8 and 9 may be equal to, less than, or greater
than the distance between 1 and 2. With ordinal scales we are limited
to statements of greater, equal, or less; we cannot undertake to state
how much greater or how much less.
An example of an ordinal scale in the physical sciences is the
Mohs' scale of hardness, which is applied to minerals. The empirical
relation in this case is the ability of minerals to scratch one another.
A diamond is ranked highest on this scale, since it can scratch all other
known minerals but none can scratch it; however, the scale does not
assert anything about how much harder a diamond is than other
minerals.
When the operation by which objects or individuals are placed
on a scale involves direct comparison of the individuals in terms of the
extent to which they possess the attribute in question, it is easy to see
that the scale reflects only the order of positions and not the distance
between them. This is the case, for example, in a spelling bee, or in a
teacher's ranking of children in terms of cooperativeness, or in the
judgment of the relative desirability of applicants for a job. Although
the individual who is ranked highest is given the number 1, the next
highest 2, etc., it is clear that there is no necessary assumption that
#1 is as much higher than #2 as #2 is than #3, etc.
However, when the data needed for placing an individual are
gathered by an instrument that yields a numerical score, the fact that
one may still be dealing only with an ordinal scale is sometimes
obscured. Suppose that three individuals taking a spelling test, or an
attitude questionnaire, receive scores of 100, 80, and 60, respectively. Our
knowledge of mathematical relations may dispose us to think that the
person who scores 100 is as much higher than the one who scores 80
as the latter is above the one who scores 60. But unless we have reason
to believe that the distance between 80 and 100 represents the same
amount of the attribute being measured, as does the distance between
60 and 80, these scores indicate only that the first person ranks higher
than the second, and the second higher than the third.
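The point may be made concrete with a short sketch. If the scores carry only ordinal information, any order-preserving rescoring is as legitimate as the original; the squaring used below is a hypothetical choice made purely for illustration.

```python
scores = {"first": 100, "second": 80, "third": 60}

def order(values):
    """Names listed from highest score to lowest."""
    return sorted(values, key=values.get, reverse=True)

# A hypothetical order-preserving rescoring (squaring positive scores).
rescored = {name: score ** 2 for name, score in scores.items()}

print(order(scores) == order(rescored))  # True: the ranking is unchanged

gaps = [scores["first"] - scores["second"], scores["second"] - scores["third"]]
new_gaps = [rescored["first"] - rescored["second"],
            rescored["second"] - rescored["third"]]
print(gaps, new_gaps)  # [20, 20] [3600, 2800]
```

The ranking survives the rescoring, but the apparent equality of the two intervals does not; only the ranking, therefore, can be relied upon.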
The statistics applicable to data that permit only rank ordering
are limited. In addition to those applicable to nominal scales, one may,
strictly speaking, use only such statistics as medians, percentiles, and
rank-order correlations. Within recent years there has been a rapid
expansion of statistical tests appropriate to data that are simply ranked
or ordered.22
INTERVAL SCALES
tions; in fact, almost all the usual statistical methods are applicable to
an interval scale.
As in the scales previously discussed, the zero point on an interval
scale is a matter of convention. Its arbitrariness is indicated by the fact
that a constant can be added to all scale positions without changing
the form of the scale. The arbitrariness of the zero point is apparent
when one compares the Fahrenheit and Centigrade scales of tempera-
ture. In the latter, zero corresponds to the point at which water freezes;
in the former, zero is well below that freezing point. Because the zero
point is arbitrary, multiplication and division are meaningless; although
relations between positions can be stated in terms of the distance
(i.e., number of scale points) between them, they cannot be stated
in terms of ratios. Thus, with data that meet the assumptions of an
interval scale (but not of a ratio scale-see below), one cannot state
that a person's attitude is twice as favorable as that of another person,
just as one cannot state that 20° F. is twice as hot as 10° F., or 20° C.
twice as hot as 10° C. However, differences between values on an
interval scale can be treated in terms of ratios. Thus, we can say that
an individual who shifted from a score of 3 on an interval scale to a
score of 7 has changed twice as much as one who shifted from a score
of 3 to a score of 5. This is because the point of no difference provides
an absolute zero.
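The example of the two shifts can be verified by re-expressing the scores on another interval scale. The rescaling below (a new unit and a new zero point) is hypothetical; any transformation of the form a·x + b with a positive a would serve equally well.

```python
def rescale(score):
    """A hypothetical change of unit and of zero point: 10 * score + 50."""
    return 10 * score + 50

# Ratio of two scale positions: altered by the rescaling.
print(7 / 5)                    # 1.4
print(rescale(7) / rescale(5))  # 1.2

# Ratio of two differences: unaffected by the rescaling.
print((7 - 3) / (5 - 3))                                          # 2.0
print((rescale(7) - rescale(3)) / (rescale(5) - rescale(3)))      # 2.0
```

The ratio of positions depends on where the arbitrary zero happens to lie, whereas the ratio of the two changes is the same on either scale.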
RATIO SCALES
Summary
Unstructured Observation
Structured Observation
How odd it is that anyone should not see that all observation
must be for or against some view, if it is to be of any
service. CHARLES DARWIN
and did not feel a need for quantifying their observations. The richness
of their data, based as it was on their subtle and perceptive approach,
has tempted other social scientists to adopt similar methods. In so
doing, they have frequently taken over not only the subtlety of the
approach but the neglect of the possibilities for quantification as well.
This is not to imply that all observational data must be quantified,
but it is important to note that they can be.
Observation may serve a variety of research purposes. It may be
used in an exploratory fashion, to gain insights that will later be tested
by other techniques; its purpose may be to gather supplementary data
that may qualify or help to interpret findings obtained by other
techniques; or it may be used as the primary method of data collection in
studies designed to provide accurate descriptions of situations or to
test causal hypotheses. Observation may take place in "real-life" sit-
uations or in a laboratory. Observational procedures may range from
almost complete flexibility, guided only by the formulation of the
problem to be studied and some general ideas about aspects of probable
importance, to the use of detailed formal instruments developed
in advance. The observer may himself participate actively in the group
he is observing; he may be defined as a member of the group but keep
his participation to a minimum; he may be defined as an observer who
is not part of the group; or his presence may be unknown to some or
all of the people he is observing.
In general, the degree of structure and the degree of participation
tend to vary with the purpose of the study. In an exploratory study,
the observational procedures are likely to be relatively unstructured,
Unstructured Observation 3
The first question the observer must face is: What should be
observed? "Everything" is an unachievable goal, since not even the

exploratory technique, the observer's understanding of the situation is
3 For a more detailed discussion of many problems arising in this type of
observation, with emphasis on situations where the observer participates in the group
activity, see Whyte (1951). Much of the material in this section is taken verbatim
from that source, with Dr. Whyte's permission.
likely to change as he goes along. This, in turn, may call for changes
in what he observes, at least to the extent of making the content of
observation more specific; and often the changes called for may be quite
radical. These changes in the content of observation are not undesir-
able. Quite the contrary; they represent the optimal use of unstructured
observation.
Suppose that an observer wishes to explore child-rearing practices
in a foreign culture. He will probably begin by observing situations in
which mother and child are together. In the course of his initial
observations he may discover that such situations are much less frequent
than he anticipated because mothers in the particular culture go out
to work while fathers or older siblings take care of the infants. As soon
as he has satisfied himself about this fact, the focus of his observational
efforts will, of course, shift to the persons entrusted with the rearing
of the young.
The shift in focus often goes hand in hand with narrowing the
scope of observation. Suppose an observer wishes to explore social
relations among the families in a suburban community. He may begin by
observing street life, shopping centers, the local drugstore; he may
attend club meetings and lectures, watch the crowds in front of the
local theater, visit sessions of the local governing body, mix with
parents waiting for their children at the close of the school day, etc.
His initial observations may reveal that street life is hurried and
unconducive to social interchange in the particular community; that the
resident families give their shopping orders by telephone or send a
maid to the store; that the drugstore is a center of activities only for
adolescents; etc. Probably he will exclude these situations from his
schedule after an initial period of observation and narrow the focus of
attention to those more rewarding for his purpose.
However, although narrowing the range of situations to be
observed facilitates observation, it still leaves the crucial part of the
question-what to observe-unanswered. Among the features of a social
situation that has been recognized as rewarding, which should be
noted?
No hard and fast rules can be laid down; the observer must always
be prepared to take his cues from unanticipated events. Nevertheless,
it may be helpful to provide a check list such as the one which follows.
The list indicates significant elements of every social situation; it
suggests directions of observation that may otherwise be overlooked.
1. The participants. Here one wants to know: Who are the
participants, how are they related to one another, and how many are there?
There are various ways of characterizing the participants, but usually
one will want to know at least the following about any person who is
being observed: age, sex, official function (e.g., "teacher," "doctor,"
"spectator," "customer," "host," "club president") in the situation
being observed and in the occupational system of the broader com-
munity. One will also want to know how the participants are related
to one another: Are they strangers or do they know one another? Are
they members of some collectivity, and if so, what kind-e.g., an in-
formal friendship group, a fraternity or club, a factory, a church?
What structures or groupings exist among the participants-e.g., can
cliques, focal persons, or isolates be identified by their spatial
groupings or patterns of interaction?
2. The setting. A social situation may occur in different settings-
e.g., a drugstore, a busy street intersection, a factory lunchroom, a
nursery school, a slum dwelling, a palatial mansion. About the setting
one wants to know, in addition to its appearance, what kinds of
behavior it encourages, permits, discourages, or prevents. Or the social
characteristics of the setting may be described in terms of what kinds
of behavior are likely to be perceived as expected or unexpected,
approved or disapproved, conforming or deviant.
3. The purpose. Is there some official purpose that has brought
the participants together, or have they been brought together by
chance? If there is an official purpose, what is it-e.g., to attend a
funeral, to compete in a boat race, to participate in a religious
ceremony, to meet as a committee, to have fun at a party? How do the
participants react to the official purpose of the situation-e.g., with
acceptance or with rejection? What goals other than the official
purpose do the participants seem to be pursuing? Are the goals of the
various participants compatible or antagonistic?
4. The social behavior. Here one wants to know what actually
occurs. What do the participants do, how do they do it, and with whom
and with what do they do it? With respect to behavior, one usually
wants to know the following: (a) what was the stimulus or event that
initiated it; (b) what appears to be its objective; (c) toward whom or
what is the behavior directed; (d) what is the form of activity entailed
in the behavior (e.g., talking, running, driving a car, gesturing, sitting);
(e) what are the qualities of the behavior (e.g., its intensity,
persistence, unusualness, appropriateness, duration, affectivity,
mannerisms); (f) what are its effects (e.g., what behavior does it evoke from
others)?
5. Frequency and duration. Here one wants to know the answer
to such questions as the following: When did the situation occur?
How long did it last? Is it a recurring type of situation, or unique? If
it recurs, how frequently does it occur? What are the occasions that
give rise to it? How typical of such situations is the one being ob-
served?
It should be emphasized that this list is not meant to apply in its
entirety to every situation observed. Frequently it is impossible to
obtain enough clues to permit such a comprehensive description. Or
the course of events may be too rapid to permit consideration of all
dimensions of a social situation. Or some aspect of an occurrence may
need the entire attention of the observer, to the virtual exclusion of
everything else. The list has its greatest advantage in planning the
content of observational activities.
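For planning purposes, the five headings of the list might be kept as a simple record to be filled in for each situation observed. The sketch below is purely illustrative: the class, its field names, and the sample entries are hypothetical and form no part of the list itself.

```python
from dataclasses import dataclass, field

@dataclass
class ObservationRecord:
    """One record per observed situation, with one field per checklist heading."""
    participants: list = field(default_factory=list)
    setting: str = ""
    purpose: str = ""
    social_behavior: list = field(default_factory=list)
    frequency_and_duration: str = ""

    def unrecorded(self):
        """Names of checklist headings for which nothing has been noted yet."""
        return [name for name, value in vars(self).items() if not value]

# Hypothetical notes from the start of an observation period.
record = ObservationRecord(
    participants=["teacher", "six children"],
    setting="nursery school",
)
print(record.unrecorded())
# ['purpose', 'social_behavior', 'frequency_and_duration']
```

A listing of the headings still empty serves as a reminder of directions of observation not yet taken up; it does not imply that every heading must be completed for every situation.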
outsider does not take as much for granted, and his questions are a safe-
guard against growing blind spots.
Reference to a check list, such as the one on pages 209-210, may
also be helpful in overcoming blind spots, especially if it is reviewed
with the attitude, "Have I been overlooking anything about this item
which is relevant in the context of this study?"
It is also possible to overcome blind spots by deliberately breaking
up the perceptual field so that the factors that lead it to be seen in a
particular way lose much of their force. The natural way of seeing a
situation (and the most valid for most purposes) is to see the action
centered around the principal characters. But sometimes the real
center of the action is not the obvious one. For example, one of our
associates has informally described a family that he has known fairly
intimately for many years. It had always seemed obvious to him that
the mother was the central character in the group. She was the manager,
the disciplinarian, the one who gave direction and set limits to the
activities of the children. The father seemed like a negative quantity.
He rarely spoke. When he came home, no one seemed to notice him.
There was never any exchange of greetings. He would be there, reading
a book; then you would notice that he was no longer there, with hardly
more than a softly closing door to mark his departure. The children
developed certain behavioral disturbances, and it was in trying to
understand these that our colleague eventually realized that he had totally
misperceived the family constellation. Actually, that entire family
gravitated around the person of the father. The mother was constantly
interpreting the wishes of the father, regulating things so that they
would fit in with the father's notions of how things should be. The
children were very much aware that the ultimate source of approval
and disapproval was the father. And both mother and children
attributed a mystical power to the father's few words; they shared a belief
that even his most casual remark was bound to come true.
Sometimes one discovers that a parent who has been dead or
missing for many years is, nevertheless, the real center of a situation.
In any group, important leadership functions are not necessarily vested
in the manifest leaders; there may be a variety of behind-the-scenes
sources of power without formal leadership status-individuals who
crystallize opinion, individuals who take over the organization of
actions in emergencies, individuals who can block particular lines of
action, individuals who take the center of the stage on certain social
occasions, etc. By deliberately refocusing on individuals who do not
appear to be central in the group, one may gain new insights about im-
portant relationships.
A very different kind of check on the accuracy of observation and
interpretation may come from the people who are being observed, if
the investigator establishes the sort of relationship with them which
makes it possible for him to take them into his confidence about the
research. Whyte (1951) reports, for example, that in gathering
material for Street Corner Society, he had innumerable research discussions
with "Doc," one of the key figures in the group he was observing, and
that Doc read every page of the first draft of his manuscript. Usually,
of course, the participants in the situation cannot check on the validity
of theoretical interpretations, but they can tell the observer whether he
has caught the meaning the situation and the behavior have for them.
Rosenfeld (1958) has suggested that the situation of being a par-
ticipant observer is likely to create inner conflicts within the investi-
gator which may interfere with objectivity. She points out that,
especially if the group being observed is undergoing an emergency of
some sort, there is strong pressure on the observer to become an active
participant, to the extent of abandoning at least temporarily his de-
tached position as an observer. If he does not do so, he may feel guilty
about not helping when help is needed. On the other hand, if he does
enter completely into the activities of the group, he becomes anxious
about losing his identity as a scientist. In order to re-establish his
position as an objective investigator, he may lean over backward to separate
himself from the group he is observing; in doing so, he may become
susceptible to sources of negative bias and distortion. Rosenfeld
suggests that the first step in safeguarding against bias arising from
inner conflicts is to be aware of the conflicts and of the nature of one's
defenses. With this awareness, one can develop specific safeguards
appropriate to the nature of the conflicts and the situation being
studied.
The need to prepare both oneself and the field carefully for ob-
servation in a real-life setting cannot be emphasized too strongly. For
here, more than in many other techniques, mistakes in approach are
settings, the investigator can arrange the major aspects of the situation
in such a way as to suit his research purposes and reduce the danger of
unexpected interference from disturbing factors. Few of the prob-
lems that arise with observation in a community setting need trouble
the observer if he can arrange and control the situation. Here his
observational activity is often, but not necessarily, reduced to noting
the presence, absence, or intensity of clearly specified types of behavior,
much as the animal experimenter observes a rat's behavior in a specific
way under controlled conditions. Of course, such control of the situa-
tion is appropriate only when the investigator already possesses a great
deal of information about the phenomena he wishes to study.
Katz, Goldston, and Benjamin (1958), for example, created a series
of controlled situations to test predictions about Negro-white social
interaction. The hypotheses were suggested by findings in field studies
of interracial contact and in experiments on the dynamics of small
face-to-face groups. Male college students, white and Negro, were
"hired" to work together in groups of four (two whites and two
Negroes) for several three-hour sessions. Members of each group re-
mained together throughout their employment and had no contact
with other groups. The subjects were given various group tasks (osten-
sibly materials that were being developed for vocational aptitude tests),
which included mental problems, mechanical construction, human
relations problems, map drawing, and a game that required a high
degree of coordination of effort among the four participants. Two
types of hypothesis were tested. First, there were predictions having to
do with the effect of the general difference in social status between
Negroes and whites in our culture on the content and direction of
communication between them: that the white students would tend to
ignore the Negroes, and that the Negro students would speak less than
the white participants and would direct most of their remarks to the
latter. The second group of predictions concerned the effects on
interracial behavior of two experimental variables: group reward versus
individual reward, and high group prestige versus no prestige. It was
hypothesized that group reward and group prestige would tend to
reduce the divisive effects of disparity in social status: specifically, that
these experimental conditions would bring about greater friendliness
and cooperation between men of the two races, less behavioral restraint
on the part of Negroes, less "bias" in the direction of communications
of both Negroes and whites, and higher group productivity.
In order to measure amount and direction of relevant types of
behavior, it was first necessary to develop a set of reliable categories for
systematic observation and recording. Many weeks of preliminary work
with pilot groups preceded the establishment of a satisfactory classifica-
tion scheme. It consisted of twenty-eight interaction categories for
describing such facilitative and disruptive behaviors as giving help,
advice, information, and encouragement; rejecting another's sugges-
tions, hoarding materials, disparaging another's contributions, ex-
pressing anger, and so on. Direction of behavior was recorded by noting
the initiator and recipient of every social action. Specific categories
were developed for the various tasks, so that the unique effects of each
work situation could be ascertained. The observers were in a room
adjoining that in which the subjects worked, behind a one-way screen;
a recording device made it possible for them to hear what was being
said. Although the subjects could not see the observers, they were told
that they were being observed; the reason given them was that, in
order to improve the aptitude tests, it was necessary to have a detailed
record of how people went about working on them.
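The recording scheme described above (a category plus the initiator and recipient of every social action) can be pictured with a small sketch. It is only an illustration under stated assumptions: the participant labels, the sample records, and the names `Act`, `tally_direction`, and `tally_category` are invented here and are not taken from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Act:
    """One observed social action: who did what to whom."""
    initiator: str  # hypothetical participant label, e.g. "W1"
    recipient: str
    category: str   # one of the interaction categories


def tally_direction(acts):
    """Count acts for every (initiator, recipient) pair."""
    return Counter((a.initiator, a.recipient) for a in acts)


def tally_category(acts):
    """Count acts falling in each interaction category."""
    return Counter(a.category for a in acts)


# Invented records for illustration only; not data from the study.
acts = [
    Act("W1", "W2", "gives information"),
    Act("N1", "W1", "gives help"),
    Act("W2", "W1", "rejects suggestion"),
    Act("N1", "W1", "gives information"),
]

direction = tally_direction(acts)
categories = tally_category(acts)
```

With records of this shape, the direction tally shows to whom each participant addresses his remarks, and the category tally shows how often each type of behavior occurs.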
On the whole, the results of this rather elaborate investigation
tended to support hypotheses about the effects of status disparity and
of group reward; predictions about the effects of group prestige tended
to be contradicted.
Since the situation and the problem are already specified, the
observer is in a position to set up in advance the categories in terms of
which he wishes to analyze the situation. When he starts out, he is
likely to have a considerable number of categories. As he tries out his
instrument, he may find both mechanical problems in observation and
failures in reliability. To meet these problems, categories are dropped
B  Attempted Answers
    4  Gives suggestion, direction, implying autonomy for other
C  Questions
    7  Asks for orientation, information, repetition, confirmation
    8  Asks for opinion, evaluation, analysis, expression of feeling
    9  Asks for suggestion, direction, possible ways of action
D  Negative Reactions (Social-Emotional Area: Negative)
   10  Disagrees, shows passive rejection, formality, withholds help
   11  Shows tension, asks for help, withdraws out of field
   12  Shows antagonism, deflates other's status, defends or asserts self

KEY:
a  Problems of Communication        A  Positive Reactions
b  Problems of Evaluation           B  Attempted Answers
c  Problems of Control              C  Questions
d  Problems of Decision             D  Negative Reactions
e  Problems of Tension Reduction
f  Problems of Reintegration

9 Reproduced from Bales (1950).
ference. The more one is interested in details, however, the more im-
portant the choice of the proper frame of reference may become.
It often happens that certain data can be coded only in retrospect.
Whether or not a remark precipitates group tension, for example,
can be determined only in the light of the events following that re-
mark. To handle this type of categorizing, some studies have made tape
recordings of the observed situation; others have demanded that the
observer pause periodically to go back over his notes in order to make
codings in the light of subsequent events.
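Coding in retrospect can be sketched as a second pass over the observer's notes. The example below is a minimal illustration, assuming that each note has already been labeled as a remark, a sign of tension, or something else, and that a remark "precipitates tension" if a tension note appears among the next few notes. The look-ahead window of two notes, the labels, and the sample notes are all assumptions made for the example.

```python
LOOKAHEAD = 2  # hypothetical window: how many following notes to inspect


def code_in_retrospect(notes, lookahead=LOOKAHEAD):
    """Mark each note True if it is a remark followed by tension.

    notes is an ordered list of (kind, text) pairs, where kind is
    "remark", "tension", or "other". The result is parallel to notes.
    """
    coded = []
    for i, (kind, _text) in enumerate(notes):
        following = notes[i + 1 : i + 1 + lookahead]
        followed_by_tension = any(k == "tension" for k, _ in following)
        coded.append(kind == "remark" and followed_by_tension)
    return coded


# Invented notes for illustration only.
notes = [
    ("remark", "A criticizes the plan"),
    ("other", "B takes notes"),
    ("tension", "group falls silent"),
    ("remark", "C proposes a vote"),
    ("other", "vote is taken"),
]

coded = code_in_retrospect(notes)
```

The first remark is coded as precipitating tension only because of what follows it; at the moment it was made, no such coding was possible.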
TIME UNITS. The amount of time included in one notation by
an observer may range from a few seconds to several hours. The central
problem in setting up time units is to determine what a psychologically
meaningful unit would be. For example, it may not be sensible to
make a rating of the constructiveness of a child's play with a certain
toy every two minutes. Such a rating may have to be based on the
complete sequence of events in a child's use of a toy. A typical way of
meeting this problem is to use more than one observer. One of them
watches for those acts that must be noted as they occur, such as brief
comments or small bits of motor behavior. Another takes a larger view,
noting those behaviors that would be distorted by strict adherence to
time or other sampling criteria, and codes these in terms of an index or
rating scale whenever he feels that he has enough data for his purposes.
WHAT IS AN ACT? The definition of an act is difficult when one is
attempting to categorize the verbal behavior of a person. Is an act a
sentence, a pause for breath, a complete thought, or the least notice-
able difference between one idea and another in a given speech? An
act is even more difficult to define when recording motor behavior,
since the person being observed seldom separates his movements one
from another as neatly as the categories on the observation schedule.
The problem is still further confounded when dealing with group
phenomena. Is the act of a group a speech by one member, a decision
reached, a function carried out, an event, a set of events, a program
item, the completion of an item on an agenda, a mood shift, or what?
The definition of an act will be determined by the frame of reference
used as well as by the size of time units recorded.
The most frequent practice in coding verbal behavior is to code
each complete thought separately. Thus, one sentence may include
thoughts that fit several categories or none at all. The observation of
motor behavior is usually concerned with the general nature of the
behavior, such as sitting, walking, slouching, gesturing, handling, etc.
The recorded acts of a group, on the other hand, are usually decisions,
or the completion of items on an agenda or of certain phases of the
meeting.
RATING SCALES VERSUS ALL-OR-NONE CATEGORIES. If the research pur-
pose requires only a record of the objective facts of behavior, without
any further qualification, all-or-none categories are usually adequate.
"Speaking" versus "not speaking" is an example of an all-or-none
category; the observer simply places a tally on his sheet when a given
person speaks, and makes no tallies for this person until he speaks
again. A series of discrete categories intended to describe the behavior
of the speaker may take the form of all-or-none tallies. For example, in
one study the observer noted, for each act of a youth-group leader,
whether the leader was acting in the role of a comrade, a policeman,
a referee, an educator, or a coordinator.
Often, however, observers are asked to describe the behavior of
a group member by making ratings on scales. In watching the leader
above, for example, the observer rated each leadership act in terms of
the degree of freedom implied and the strength of the influence ex-
erted on the group members by the act.
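The difference between the two kinds of record can be made concrete with a short sketch. It assumes, purely for illustration, that each leadership act is noted with one role (the all-or-none category) and two ratings on five-point scales; the scale length and the sample acts are invented, and only the role names and the two rated dimensions come from the text.

```python
from collections import Counter
from statistics import mean

ROLES = {"comrade", "policeman", "referee", "educator", "coordinator"}

# Each entry: (role, degree of freedom implied, strength of influence).
# Ratings use a hypothetical 1-5 scale; the acts are invented.
leader_acts = [
    ("educator", 4, 2),
    ("policeman", 1, 5),
    ("educator", 3, 3),
    ("comrade", 5, 1),
]

for role, _freedom, _strength in leader_acts:
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")

# All-or-none categories yield simple tallies.
role_tally = Counter(role for role, _, _ in leader_acts)

# Rating scales yield values that can be averaged or otherwise summarized.
mean_freedom = mean(freedom for _, freedom, _ in leader_acts)
mean_strength = mean(strength for _, _, strength in leader_acts)
```

The tally answers "how often," while the ratings answer "how much"; which is needed depends on the research purpose.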
PRESERVING THE PATTERN OF THE PHENOMENA. The personal inter-
actions or group phenomena in any situation make up a number of
themes which it may be important to preserve in their original pattern.
In many studies, of course, it is sufficient to get a record of acts apart
from their context. An example is that of tallying the frequency of
remarks addressed to a deviant group member. In other studies, how-
ever, the main function of the data is to specify the nature of some
general pattern of behavior in the group, such as the teaching method
being used, the rise and fall of group tension, or the establishment of
group procedures for handling the feelings of members.
In part, this is a problem of analysis. However, a reliable observa-
tional device may turn out to provide only fragmented information,
which cannot be combined into a meaningful picture. It is necessary,
therefore, that the tryout stage of any instrument include attempts
to code the data in order to make sure that information of the desired
type can be obtained.
A study of leadership among nursery school children in Hungary
(Merei, 1949) provides an example of how a complex set of observa-
tions recorded by a number of different observers can be combined to
give a total pattern. Merei was interested in the relationship between
leader and group, particularly in whether leadership at this early age
is maintained when a child leader is put into groups which have various
degrees of cohesiveness and with which he is not familiar. Some of
the criteria Merei used for identifying leadership behavior were: giving
orders more frequently than following orders; being imitated more
frequently than imitating; attacking more often than being attacked.
Categories for recording the behavior of each child were established
according to these criteria. In addition, categories were developed to
describe other dimensions of the group situation. It was decided that
several observers were needed to note all of the relevant dimensions of
behavior. Some of the observers concentrated on describing the activi-
ties of the group by five-minute intervals; others recorded the behavior
of the leader, his relation to other children and to the on-going activity,
for the same intervals. Others singled out the following dimensions of
behavior: group formation and isolation; imitation; order-giving and
response to orders; taking the initiative; ownership and change of
ownership of toys; etc. Typical forms of behavior were itemized under
each of these categories. In the ownership category, for instance, the
following forms were distinguished: the child asks for something and
receives it; the child asks for something and does not receive it; the
child takes an object away from another child; etc. Each form of be-
havior was given a special written symbol to facilitate speed of record-
ing. When the observers had mastered this symbol language, the actual
recording could proceed at considerable speed. Synchronizing the
various protocols at five-minute intervals made it possible to reconstruct
the complete scene when the data were analyzed.
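The synchronizing step can be pictured as grouping every observer's entries by the five-minute interval in which they fall. The sketch below is an assumption-laden illustration, not Merei's procedure: the observer labels, the minute stamps, the sample entries, and the function names are all invented for the example.

```python
from collections import defaultdict

INTERVAL_MINUTES = 5


def interval_index(minute):
    """Return the index of the five-minute interval containing this minute."""
    return minute // INTERVAL_MINUTES


def synchronize(protocols):
    """Merge several observers' protocols interval by interval.

    protocols maps an observer label to a list of (minute, entry) pairs.
    The result maps each interval index to {observer label: [entries]}.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for observer, entries in protocols.items():
        for minute, entry in entries:
            merged[interval_index(minute)][observer].append(entry)
    return {interval: dict(by_observer) for interval, by_observer in merged.items()}


# Invented protocols for illustration only.
protocols = {
    "group activity": [(0, "block building"), (6, "circle game")],
    "leader": [(2, "gives order"), (7, "is imitated")],
    "ownership": [(3, "asks for toy and receives it")],
}

scene = synchronize(protocols)
```

Reading `scene` one interval at a time gives, side by side, what each observer recorded during the same five minutes.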
RECORDING OBSERVATIONS
In one sense, all that has been said thus far is relevant to the
problem of obtaining adequate reliability in the use of an observational
instrument. That is, wise and consistent procedures in the develop-
ment of such an instrument will greatly enhance reliability, assuming,
of course, that the observers have been trained to interpret their in-
structions similarly and have practiced enough to develop the skills
necessary for proper categorizing and recording. There are some
special problems, however, in achieving reliable and valid observation
that are worth separate consideration.
One problem grows out of inadequate definition of the kinds of
behavior that are to be taken as corresponding to a given concept.
Berkowitz and Guetzkow (1949), for example, have pointed out that
in categorizing different groups in terms of "pleasantness of group
atmosphere," one observer might be especially sensitive to the per-
sonal liking of the group members for one another, another to the
informality of the relationships among members, another to still a
different dimension of pleasantness of atmosphere.
Another factor that may lower the reliability of even well-trained
and skilled observers is the degree of confidence one must have in
one's judgment before marking a given category. If observers are re-
quired to rate the presence of "ego need" behavior (an example used
by Berkowitz and Guetzkow) even when little behavior relevant to
such a rating occurs, one observer may rate the extent of it as being
greater than it "actually" is simply because he himself has a predisposi-
tion to perceive evidence of "ego need" behavior; another observer
may rate it as lower than it "actually" is because he requires more evi-
dence or, perhaps, greater confidence before making a decision about
the presence of "ego need" behavior.
One of the greatest sources of unreliability is the constant error
introduced by the observer because of distortion of his perceptions by
his own needs or values. An observer who sharply disapproves of certain
leadership practices, for example, will have difficulty in preventing a
bias; he may code more of the leader's behavior as falling in the
categories he disapproves of than would another observer who feels
less strongly on the matter. Adequate training and practice can over-
come this in most persons, though not in all.
Although research observers have been able to rate a group reliably
on as many as fifty categories and even more, there is a point at which
the load can seriously hamper reliability. The major result of overload-
ing is that the observer cannot record all the relevant data and may
unwittingly record some aspects more adequately than others, thus
giving a biased account. This may result from fatigue, which causes
the observer to slow down and later spurt; from avoidance of more
difficult categories in order to keep up the pace; or from any of a
number of other reasons. Overloading can be prevented by standardized
rest periods, by distributing the job among several observers or by
mechanically recording the session (if such recordings are suitable).
Obviously, one important method of increasing reliability is care-
ful training of observers. A well-developed observational procedure can
be spoiled by differences among users or by failure to understand the
rules for its use. It is necessary, therefore, that the investigator plan to
invest a good portion of time in training the observers. The elaborate-
ness of the training depends on the complexity of the observer's task.
One study, which used almost one hundred observers in several dif-
ferent parts of the country, determined the length of the training
course by the amount of time it took the trainee to obtain adequate
agreement with the observer-trainer. When he had achieved sufficient
skill, he was allowed to begin observations.
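One simple way to picture "adequate agreement with the observer-trainer" is the proportion of acts that trainee and trainer code identically. The sketch below assumes this simple index and a required level of 0.85; both the index and the threshold are assumptions for illustration, since the text does not say how agreement was measured in that study, and the sample codings are invented.

```python
REQUIRED_AGREEMENT = 0.85  # hypothetical standard set by the investigator


def percent_agreement(trainee_codes, trainer_codes):
    """Return the proportion of acts coded identically by both observers."""
    if len(trainee_codes) != len(trainer_codes):
        raise ValueError("both observers must code the same acts")
    if not trainee_codes:
        raise ValueError("no acts were coded")
    matches = sum(a == b for a, b in zip(trainee_codes, trainer_codes))
    return matches / len(trainee_codes)


def ready_to_observe(trainee_codes, trainer_codes, required=REQUIRED_AGREEMENT):
    """True when the trainee's agreement with the trainer meets the standard."""
    return percent_agreement(trainee_codes, trainer_codes) >= required


# Invented codings of the same five acts.
trainer = ["help", "advice", "anger", "help", "information"]
trainee = ["help", "advice", "help", "help", "information"]

agreement = percent_agreement(trainee, trainer)
ready = ready_to_observe(trainee, trainer)
```

Here the trainee matches the trainer on four of five acts and so would continue training. Simple agreement of this kind makes no allowance for matches that occur by chance, which matters most when there are few categories.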
A typical training program begins with an explanation of the pur-
poses and theory involved in the given study and then moves on to an
explanation of the categories and the rules for their use. The purpose
of each category in relation to the theory and to specific hypotheses is
pointed out. After the trainees have had an opportunity to ask ques-
tions, they try to use the schedule on a group that is attempting to
demonstrate phenomena of the type the observers will be expected to
code when the actual data collection begins. The trainee naturally
encounters difficulty in selecting proper categories, sampling, keeping
abreast, deciding how to categorize marginal cases, etc. These difficul-
ties are ironed out by discussion and further practice. Next comes a
tryout in a pilot study on a group similar to the one the trainee will later
be expected to observe. Here again difficulties arise which can be cor-
rected. At this point, or a little later, it may be helpful to use tape re-
cordings or motion pictures in order to check events that were coded
differently by different observers. Now the observers are ready for reli-
ability tests, followed by subsequent practice and more reliability tests
until the trainer is satisfied that the observers have become useful and
comparable measuring instruments.
Here too, as in the case of less structured observation, checking
and increasing reliability does not eliminate the possibility of a con-
stant bias shared by two or more observers.10 There are no simple tech-
niques for dealing with this problem. If it seems likely that such con-
stant biases may seriously affect the findings, it may be desirable to
have two or more observers with different backgrounds record the same
events, at least during a preliminary period.
are no threat. Deutsch also found that the members of small groups
were much more aware of the observer's presence (as indicated on a
rating scale marked after each weekly meeting by the group members)
at the beginning of their experience with them than they were after
they had been observed for three meetings. Many investigators believe
that it makes little difference in the observed behavior of the group
members whether the observer sits in the room with the group, behind
a one-way screen with the group aware that he is there, or behind a
one-way screen with the group left to wonder whether he is there or not.
7
DATA COLLECTION
II. Questionnaires and Interviews
ADVANTAGES OF QUESTIONNAIRES
able variation from home to home in the conditions under which the question-
naire is filled out. In one home, for example, the questionnaire may be filled out
by the head of the family, in another by some other member; in one the question-
naire may be given time and attention, in another it may be competing with a
television broadcast or a crying baby.
ADVANTAGES OF INTERVIEWS
It has been estimated that, for purposes of filling out even simple
written questionnaires, at least 10 per cent of the adult population of
the United States is illiterate. For complex questionnaires, the per-
centage would undoubtedly be considerably higher.6 Thus, one of the
major drawbacks of the usual questionnaire is that it is appropriate only
for subjects with a considerable amount of education. Complicated
questionnaires requiring extended written responses can be used with
only a very small percentage of the population. Even many college
graduates have little facility for writing, and of those who do, few have
the patience or motivation to write as fully as they might speak. Hence,
questionnaires are not an appropriate method for large segments of
the population; for those for whom they are appropriate, the burden
of writing or of maintaining interest is great enough to limit the
number of questions that may be asked and the fullness of the
responses. On the other hand, interviews can be used with almost all
segments of the population; in fact, in contrast with the questionnaire,
a frequent problem in interviewing is that of limiting the responses of
the verbose individual.
Surveys conducted by personal interviews have an additional
advantage over surveys conducted by mailed questionnaires in that
they usually yield a much better sample of the general population.
Many people are willing and able to cooperate in a study when all they
have to do is talk. When questionnaires are mailed to a random sample
of the population, the proportion of returns is usually low, varying from
about 10 to 50 per cent. There are many factors that influence the per-
centage of returns to a mailed questionnaire. Among the most
important are: (1) the sponsorship of the questionnaire; (2) the attrac-
tiveness of the questionnaire format; (3) the length of the question-
naire; (4) the nature of the accompanying letter requesting cooperation;
6 Subjects with limited education may be able to fill out questionnaires with
the help of questionnaire administrators. However, in such cases the questionnaire
loses much of its advantage over the interview with respect to economy.
(5) the ease of filling out the questionnaire and mailing it back;
(6) the inducements offered to reply; (7) the nature of the people to
whom the questionnaire is sent. Attractively designed questionnaires
that are short, easy to fill out, simple to return, sponsored by a group
with prestige, and presented in a context that motivates the respondent
to cooperate are most likely to be returned. However, even under the
best of circumstances a sizable proportion do not return questionnaires.
The people who do return them are usually the less mobile (and thus
the more likely actually to receive the questionnaire), the more in-
terested, the more literate, and the more partisan section of the
population.7
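A little arithmetic shows why unequal returns distort a mailed sample. The sketch below assumes a mailing of 1,000 questionnaires divided into two hypothetical groups that return at different rates; every figure is invented for illustration, chosen only so that the overall return falls within the 10 to 50 per cent range mentioned above.

```python
def return_summary(groups):
    """Summarize returns from a mailing.

    groups maps a label to (questionnaires mailed, questionnaires returned).
    Returns the overall return rate and, for each group, its share of the
    mailing and its share of the returned questionnaires.
    """
    mailed_total = sum(mailed for mailed, _ in groups.values())
    returned_total = sum(returned for _, returned in groups.values())
    overall_rate = returned_total / mailed_total
    shares = {
        label: (mailed / mailed_total, returned / returned_total)
        for label, (mailed, returned) in groups.items()
    }
    return overall_rate, shares


# Invented figures for illustration only.
groups = {
    "more interested": (300, 180),
    "less interested": (700, 140),
}

overall_rate, shares = return_summary(groups)
share_mailed, share_returned = shares["more interested"]
```

In this invented case the more interested recipients make up 30 per cent of the mailing but more than half of the returns, so any opinion that goes with interest in the topic would be overstated in the tabulated replies.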
Another advantage of the interview is its greater flexibility. In a
questionnaire, if the subject misinterprets a question or records his
responses in a baffling manner, there is usually little that can be done
to remedy the situation. In an interview there is the possibility of
repeating or rephrasing questions to make sure that they are under-
stood or of asking further questions in order to clarify the meaning of
a response. Its flexibility makes the interview a far superior technique
for the exploration of areas where there is little basis for knowing either
what questions to ask or how to formulate them.
In addition, the interviewing situation offers a better opportunity
than the questionnaire to appraise validity of reports. The interviewer
is in a position to observe not only what the respondent says but also
how he says it. He can, if he wishes, follow up contradictory state-
ments. If need be, the interviewer can directly challenge the subject's
report in order to see how consistent his answers will be.
The interview is the more appropriate technique for revealing in-
formation about complex, emotionally laden subjects or for probing
the sentiments that may underlie an expressed opinion. If a verbal
report is to be accepted at face value, it must be elicited in circum-
stances that encourage the greatest possible freedom and honesty of
expression. Although, as already noted, an anonymous questionnaire
may sometimes be the most effective way of producing such a permis-
sive atmosphere, its usefulness is limited to issues on which respondents
have rather clearly formulated views that can be simply expressed. The
more or less rigid structure of questionnaires, the inability to explain
7 For a fuller discussion, see Parten (1950, Chapter 11).
fully in writing one's asocial or antisocial feelings and behavior, and
the solemnity and permanent nature of a response that is put on paper
in one's own handwriting or (if the questionnaire is not anonymous)
under one's own name, all work against frank discussions of socially
taboo or socially controversial issues in response to a questionnaire.
With respect to many questions, an interview is likely to be more
successful in creating an atmosphere that allows the respondent to
express feelings or to report behaviors that are customarily disap-
proved.8
In the interview situation, the "social atmosphere" can be varied
in other ways. Behavior in real life occurs in situations that are seldom
free from social pressures. The interview, more than the questionnaire,
allows one to approximate in the measurement situation these varying
social pressures, since the interviewer can, within limits, vary the nature
of the atmosphere as he questions the respondent. He can, for example,
point out objections to the position of the person being interviewed,
and observe how the latter responds.9 This is a very useful flexibility,
especially if the ultimate objective of the measurement is to predict
behavior in varied situations.
Question Content
A person's beliefs about what the facts are will often give very
clear indications of his feelings and his desires. The converse is also
true; an emotional reaction will sometimes reveal beliefs that a subject
is unable to verbalize. To understand a person's behavior, knowledge
of his feelings may be at least as fruitful as knowledge of his beliefs.
In questionnaires, perhaps the most common method of investi-
10 For a discussion of techniques that employ distortions in perception and
memory as a method of measuring social attitudes, see Chapter 8.
11 For a discussion of the measurement of the various characteristics of beliefs,
see Krech and Crutchfield (1948).
betting in New Jersey?" The first received more "don't know" and
more opposed answers. This suggests that the "is it desirable" form was
answered from a social or moral point of view, whereas the "would you
vote" form was answered in terms of personal preference.
In a study of reactions to prejudiced remarks (Selltiz et al., 1950),
subjects were shown a skit representing an informal situation in which
an anti-semitic remark was made before a group of people. In inter-
views following the skit, the subjects were asked a series of three
questions: "What do you think is the right thing to do or say? What
do you think you yourself really would have done in a situation like
this? What do you think most people would do or say in a situation
like this?" More than half (56 per cent) of the respondents replied
that the right thing would be to answer the anti-semitic remark in
some way (that is, verbally express disagreement with it); only 35 per
cent said they themselves would have answered the remark; and only
15 per cent said most people would answer the remark.14
feelings, etc., about a given entity: e.g., "What do your friends, rela-
tives, clubs, etc. (feel, believe, etc.) about ___?" "What evidence
is there to support your beliefs, feelings, etc., about ___?" (4) The
personal desires, motives, values, or interests involved in a given reac-
tion: e.g., "Is there anything about yourself that makes you want to
(believe, feel, or act in a given way)?" (5) The specific situations and
circumstances in which a given reaction occurs: e.g., "In what types of
situation are you most likely to (feel or act in a given way)?"
In addition to these reasons for a given belief, feeling, action, etc.,
it may be relevant to inquire into the reasons against alternative beliefs,
actions, etc. It may also be important to distinguish between past and
present influences; for example, between reasons for starting on a given
course of behavior and reasons for continuing it.
Once the investigator has decided which kinds of influences are
likely to be relevant to his particular question, he sets up an "account-
ing scheme" (see Zeisel, 1957), mapping out, in preliminary fashion,
the various kinds of reasons in which he is interested and providing
questions to tap each of them. The following illustration of a set of
questions to serve as a guide in an interview aimed at learning why an
individual selected a particular college is adapted from Zeisel. Note
that it starts with the general question, "Why?," permitting the investi-
gator to find out what is salient in the mind of the respondent, and then
provides specific questions to cover the history of the choice and the
kinds of influences in which the investigator is interested but which
the respondent may not have mentioned in his spontaneous reply:
about half and half, more satisfied than dissatisfied, definitely satisfied.
The coder may find it difficult to decide in which of the three middle
categories to place this man. The man himself, however, might have
little difficulty in making the judgment if he were presented with the
alternative positions.
Most of these advantages of fixed-alternative questions have, how-
ever, corresponding disadvantages. One of the major drawbacks of the
closed question is that it may force a statement of opinion on an issue
about which the respondent does not have any opinion. Many individ-
uals have no clearly formulated or crystallized opinions about many
issues; this important characteristic is not likely to be revealed by a
closed question. Inclusion of a "Don't know" alternative may help to
provide an indication of a lack of crystallized opinion, but the tendency
in much interviewing with questions of this sort is to press for a def-
inite response and to accept a "Don't know" only as a last resort. Under
such pressure, the answer chosen by a respondent may be an artifact of
the specific wording or phrasing of the question or of the stated alter-
native responses. Suppose one were to ask, "Do you approve or dis-
approve of the Eisenhower Doctrine for aid to Middle Eastern coun-
tries threatened by Communist aggression?" It is easy to say "Approve"
or "Disapprove," and many respondents may find this less embarras-
sing than admitting that they don't know what the Eisenhower Doc-
trine is, much less have an opinion about it. In the closed question, the
reply is taken at face value. Open-ended questions, especially when
they are used in an interview and can be followed by probes, provide
a much better indication of whether the respondent has any informa-
tion about the issue, whether he has a clearly formulated opinion about
it, and how strongly he feels about it.
Even when a respondent has a clear opinion, a fixed-alternative
question may not give an adequate representation of it because none
of the choices corresponds exactly to his position, or because they
do not allow for qualification. Take such a question as, "Which of the
following considerations are most important to you in choosing a job?
Interesting work; opportunity to assume responsibility; pleasant sur-
roundings; congenial associates; opportunity for advancement; high
salary; security. Place a 1 next to the one most important to you, 2 next
to the one that is next most important, etc." Let us suppose that the
items cover the range of relevant considerations for a given respondent,
and that he has a fairly clear view. But his view may involve interrela-
tions among the factors. In general, interesting work may be more im-
portant to him than a high salary. However, given a choice between
two jobs, one of which pays twice as much as the other but is slightly
less interesting, he may choose the higher-paying. Or there may be
some lower limit of salary beyond which he feels he cannot afford to
go, no matter how interesting the work. Such qualifications can be ex-
pressed in reply to an open-ended question; a closed question not only
makes no provision for them, but even discourages the respondent
from thinking about them.
Omission of possible alternative responses may lead to bias. Even
when a space is provided for "other" replies, most respondents limit
their answers to the alternatives provided. Omission of an alternative
may seriously change the replies to even a factual question such as
what magazines people read. In a study of applicants who were accepted
by a certain college but did not actually enter, the subjects were pre-
sented with a check list of reasons for not attending. These reasons
included such factors as the location of the college, its cost, the fact
that it was not coeducational, the fact that specific desired courses were
not offered, etc. However, the possibility that the applicant had in the
end entered another college because it had a generally higher academic
reputation was not included. Although a few respondents added this
in the space provided for "other reasons," there was no way of esti-
mating how many would have checked it had it been included among
the suggested alternatives. Unless one can be reasonably certain, on
the basis of either the logical possibilities or prior investigation, that the
alternatives presented adequately cover the complete range of prob-
able responses, it is safer to use an open-ended question, which does
not bias the responses by suggesting some but not others.
The fact that the wording of questions is the same for all respond-
ents may conceal the fact that different respondents make different in-
terpretations, some of which may be quite different from those in-
tended by the interviewer. This possibility exists, of course, in both
closed and open questions, but it is much more likely to go undetected
in the former. An instance of interpretations made from such varying
frames of reference as to make the meaning of the obtained replies
262 DATA COLLECTION: II
studies, the dolls were used not only in connection with questions
directed toward the child's awareness of his own racial identification
but toward his attitudes: "Which doll do you like best?," "Which doll
is prettier?," etc.
Clark and Clark (1950) used an interesting variation of a pictorial
technique to get at children's awareness of their own racial identifica-
tion and their feelings about it. They presented Negro children with
a box of crayons, including a range of shades of brown as well as the
colors usually included in children's crayon sets, and two line drawings.
The child was asked first to color one figure "the color that you are,"
then to make the other one "the color that you like little girls (boys)
to be."
The use of pictorial techniques has not been limited, however, to
studies of children or of intergroup attitudes. Murphy and Likert
(1938) made use of both photographs and motion pictures in a study
of the attitudes of college students. The photographs, borrowed from
news services, all showed conflict situations-strikes, war, race riots. In
connection with each picture, the subjects were asked to answer such
questions as: "Describe briefly in outline form your reaction to this
photograph. . . . In this situation, with whom do you sym-
pathize? . . . What do you like or dislike in this photograph? Why?"
Three motion pictures were shown: one portraying the aftermath of a
race riot; another, the attempt of a mob to storm a courthouse with the
intent of lynching a Negro who was in custody there; and the third,
fleet maneuvers. After seeing each film, the student was asked to write
briefly what he thought about it, and then to express his agreement or
disagreement with a number of statements related to it (e.g., "Riots of
this kind are a tragedy for both the white and black races").
A Concluding Note
From this survey of questionnaire and interview procedures, it is
apparent that the investigator interested in individuals' self-reports has
a choice of many different ways of eliciting them. In making such
decisions as whether to use a questionnaire or an interview, whether to
use a standardized or a less structured form, and whether to supple-
ment the verbal material by visual aids, he needs to consider the advan-
tages and disadvantages of each approach in the light of the purpose
of his study.
The investigator should be concerned, of course, with the reli-
ability and validity of his measures. Although it seems likely that most
investigators are concerned with these matters in the sense of hoping
that their measures are reliable and valid, this concern has not often
been expressed in attempts to determine the reliability or validity of
the instruments used. This is at least as true of interviews and question-
naires as of other types of measuring instruments-perhaps even more
so, since most interviews and questionnaires are specifically designed
for the purpose of a single study, and thus do not have the benefit of
testing by repeated use.
An occasional investigator has tested the reliability of his instru-
ments by having two different interviewers interview the same individ-
uals, or by repeating a questionnaire or interview with the same in-
dividuals after a lapse of time. To be sure, this procedure is time-con-
suming, and it is not always easy to secure the cooperation of subjects
for a repetition of the same interview or questionnaire to which they
have already responded. This latter difficulty can be lessened somewhat
by changing parts of the instrument, repeating only certain selected
questions for the test of reliability.
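The repeated-administration check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the scores are hypothetical, and the two summary figures used here (a product-moment correlation between the two administrations and the share of identical answers) are common choices rather than procedures named in the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def percent_agreement(first, second):
    """Share of respondents who gave exactly the same answer both times."""
    same = sum(1 for a, b in zip(first, second) if a == b)
    return same / len(first)


# Hypothetical scores of six respondents on the repeated questions.
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 11]

test_retest_r = pearson_r(first_administration, second_administration)
agreement = percent_agreement(first_administration, second_administration)
print(round(test_retest_r, 2), round(agreement, 2))
```

A high correlation together with low exact agreement, as in these made-up figures, would suggest that respondents keep their relative positions even though individual answers shift slightly between administrations.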
In connection with a few of the measurement techniques de-
scribed in this chapter, the investigators reported evidence about
validation; for example, Pace's testing of his measure of political-
economic-social attitudes on known "liberal" and "conservative"
groups, and Mussen's comparison of scores on the "faces test" with
sociometric choices. Many other investigators have attempted to find
ways of assessing the validity of their measures. Nevertheless, it remains
true that many-probably most-questionnaires and interviews have
been used without evidence of their validity. Again, the reasons are
not hard to find. It is not always easy to determine what would be
appropriate criteria of validity. Even if one can identify what would
constitute appropriate evidence, it may not be feasible to gather the
necessary data. But without such evidence, one can only hope that his
instruments are actually measuring what he believes they are measur-
ing. Thus it would seem desirable to devote more time than is usually
given to investigation of the reliability and validity of questionnaires
and interviews, as well as of other instruments for collecting data.
It has been found in many public opinion surveys that even slight
Projective Methods
Structured Disguised Tests of Social Attitudes
Substitute Measures
A Note on Validation
TECHNIQUES that rely on the individual's own report of his be-
havior, beliefs, feelings, etc., presuppose, as has already been
pointed out, that the person is willing and able to give such informa-
tion about himself. But this is not always true. People may be unwilling
to discuss controversial topics or to reveal intimate information about
themselves. They may be reluctant to express their true attitudes if they
believe that such attitudes are generally disapproved. Or they may be
unable to give the desired information, either because they cannot
easily put their feelings into words or because they are unaware of their
feelings about the matter in question.
To get around these limitations, techniques have been devised that
are largely independent of the subject's self-insight and of his willing-
ness to reveal himself. These indirect techniques may be grouped in two
broad classes, differing in their degree of structure. The less structured
ones are commonly referred to as projective methods; among the more
structured techniques we may identify disguised methods and sub-
stitute measures.
Projective Methods
Frank (1939), the originator of the term, has given the following
definition of a projective technique:
value-that is, with the meaning that the subject presumably would
expect them to have-but are interpreted in terms of some pre-estab-
lished psychological conceptualization of what his responses to the
specific test situation mean. 2 This underlying conceptualization pro-
vides the framework for interpreting the responses. Usually the system
of interpretation provides for considering responses not in isolation
but in terms of patterns. In effect, the clinician attempts to arrive at a
psychologically coherent picture of the individual by deriving the full
meaning of any particular response tendency from the total record of
his replies.
One of the most frequently used projective techniques in the
clinical setting is the Rorschach Test, consisting of ten cards, on each
of which is a copy of an ink blot. The subject is asked, "What might
this be?" Another commonly used technique is the Thematic Apper-
ception Test, or T.A.T. This test consists of a series of pictures about
which the subject is asked to tell stories. In some of the pictures the
persons or objects are quite clearly represented, in others they are not;
some of the pictures deal with ordinary or usual events, some with
situations that are unusual or bizarre.
Techniques such as these are designed to elicit a rich sample of
behavior, from which a great variety of inferences can be drawn. Some
inferences may have to do with adaptive aspects of the person's be-
havior; that is, how well he carries out the task posed by the test (to
tell what an ink blot looks like, to make up a story, etc.). Others have
to do with expressive aspects. The way the person deals with the
materials of the test is taken as reflecting the "style" of his personality;
for example, his approach to the test may show constriction or expan-
siveness, intellectual control or impulsiveness, etc. Inferences about
adaptive and expressive aspects are generally considered relevant to a
description of the individual's personality structure. In addition, from
the content of what the individual says, inferences may be drawn about
his needs, attitudes, values, conflicts, ideologies, and conception of
himself. Such inferences rest on the assumption that what the individ-
ual perceives in the test materials represents in some way (though not
necessarily in terms of direct correspondence) an externalization or
projection of processes within himself.
2 Some authors (e.g., Deri et al., 1948; Proshansky, 1950) consider this the
essential characteristic of projective techniques. We do not share this view because
we believe that all data-collection methods which attempt to measure the
characteristics of a person-for example, his attitudes-except those which assume
an a priori validity in self-report, involve an interpretation of responses in terms
of some psychological formulation. What is distinctive about projective tests in this
respect, perhaps, is the extent of inference involved-that is, the lack of apparent
relevance of the responses to the characteristics about which inferences are drawn.
Other projective tests have somewhat more specific focus. A rela-
tively new technique, the Tomkins-Horn Picture Arrangement Test, is
designed for group administration and machine scoring. It consists of
twenty-five plates, each containing three sketches that may be arranged
in various ways to portray a sequence of events; the subject is asked to
arrange them in the most reasonable sequence. The responses are
interpreted as providing evidence concerning the following dimensions
of personality: conformity; social orientation (sociophilia, sociophobia,
aggression, dependence, etc.); optimism-pessimism; level of function-
ing (relative emphasis on thinking, fantasy, affect, overt behavior);
and work orientation.
Other commonly used tests are: word association, sentence com-
pletion, doll play, and figure drawing. In the word-association test, the
subject is presented with a list of words; after each one, he is to respond
with the first word that comes to his mind. Both the rate and the
content of his responses may indicate areas of emotional disturbance.
In the sentence-completion test, the first few words of a possible sen-
tence are given, and the individual is asked to complete it. Like the
word-association test, the sentence-completion method may provide
clues to areas of emotional disturbance; any given area may be investi-
gated by presenting the respondent with relevant sentence beginnings.
In doll-play procedure, the subject is given a set of dolls, usually repre-
senting adults and children of both sexes, and is either encouraged to
play freely with them or to show how they would act in various circum-
stances. This procedure is, of course, especially appropriate for use
with children; it is well suited for eliciting feelings about family rela-
tionships. In the figure-drawing test, the subject is asked to "draw a
person," to "draw a man," or to "draw a woman," etc. The assumption
is that the drawing represents the person's image of himself, and that
unusual features represent areas of conflict, strain, etc. In addition to
the relatively specific function ascribed here to each of these tests,
284 DATA COLLECTION: III
from the responses to the entire series of pictures, one may be able to
make such statements as, "In describing white females, the subject
remarks upon their sex before he remarks upon their race, but in
describing white males the reverse is true; in describing Negroes, he
consistently remarks upon race first, regardless of sex." And one can
compare the patterns with those available from normative data. Can
one generalize from such a test? This is a problem of validation.
3. Sometimes access to certain populations of potential subjects
(e.g., school children, workers in a factory, etc.) may be withheld if
the topic under investigation is made explicit to the subjects, but
granted if it remains tacit even though obvious.
4. Even though the purpose of a projective attitude test is appar-
ent, it may produce more extensive information than a questionnaire
or even an interview with open-ended questions would. In the descrip-
tions of the pictured situations, for example, we may see how attitudes
color perception, or what aspects of attitude (feelings, beliefs, etc.) are
significant for the individual, and so on.
The projective techniques that have been devised for the study
of social attitudes vary in the effectiveness with which they mask their
purpose, in the richness of personality material they reveal, in the
ambiguity of the stimulus presented to the subject, and in the expendi-
ture of skill and effort necessary to the collection and analysis of re-
sponses.3 Few of the specific adaptations of these techniques have been
employed in more than one investigation; none is supported by a wide
body of experience. Nevertheless, we shall briefly describe the major
techniques, since they are directly relevant to the study of social rela-
tions and since they do not usually require the complex skills demanded
by the projective methods for the study of personality. Caution in their
use and interpretation, however, is indicated.
We shall group the many specific adaptations of projective tests
for the study of social attitudes in terms of certain general materials
and methods of approach: verbal, pictorial, play, and psychodramatic
techniques.
VERBAL TECHNIQUES. Perhaps the simplest method is based on the
classic technique of word association employed many years ago by Carl
Jung in the study of abnormal behavior. As employed in the study of
social attitudes, the technique is essentially the same: a number of
words are presented to the subject, one by one, and he is asked to in-
dicate the first thought that he associates with each word. Some of
the words used as stimuli are neutral; some relate to the social attitudes
being investigated. The speed of response and its emotional concomi-
tants, as well as its content, may constitute valuable indicators of atti-
tude. This technique has been used frequently in market research, to
discover, for example, associations to a given brand name or to a
proposed name for a new product. It has also been used in several
studies of the relation of specific attitudes to broader personality pat-
terns. Murray and Morgan (1945), for example, used a modification of
the word-association technique, asking the subject to respond to such
words as communism, religion, Negro, by giving the most descriptive
adjective he could think of.
Somewhat similar to the word-association method is the sentence-
completion technique. The individual is presented with a series of in-
complete sentences which he is asked to complete, usually under some
time pressure to ensure spontaneity of response. The content of
responses, if the items are carefully selected, may provide considerable
insight into the person's attitudes. However, for sophisticated subjects
it is unlikely that the purpose of the task is effectively masked, even
though neutral or irrelevant items are included. On the other hand,
time pressure may do much to prevent concealment of attitudes.
This technique has been used for studying attitudes of many kinds.
For example, Kerr (1943) used it in a study of national stereotypes held
by English people. Some of the sentence beginnings were:
Better Homes and Gardens (Smith, 1954) used the following sentence
beginnings, among others:
the office feel about the supervisor?" She answered: "They think
he's wonderful. They'll do anything for him." At which point the
interviewer followed with: "And how about you-how do you
feel about him?" with the reply: "I really detest him. I'm trying
to transfer out of the unit."
Weitz and Nuckols (1953), in a study of the relationship between
job satisfaction of insurance agents and continuance in the job, used
both direct questions and those asking for estimates of the reactions
of others. The indirect questions were introduced with the explana-
tion:
We want to get your opinion about the attitudes of other agents
toward their job. Below are a number of questions which can be
answered by a percentage. Circle the per cent figure you believe
best answers the question. If you don't know, guess.
The questions were of the form:
Approximately what per cent of the agents in your company think
that: The training they received was good.
0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%
The direct questions were introduced with the explanation:
Now we'd like to get your attitude about your own job. Check the
word or phrase which you feel best completes the statement for
you.
The questions were of the form:
The training I received for my present job was
__ poor
__ adequate
__ excellent
Although neither the score based on the direct questions nor that
based on the indirect questions correlated very highly with job survival
at the end of a year, these investigators found that the direct questions
about the person's own reactions provided a better basis for prediction
than did those asking for estimates of other people's reactions.
Another verbal technique is that of asking the respondent to
describe the kind of person who would behave in a specified way. This
approach has been used most frequently in market research, to elicit
respondents' "images" of a given product, but it would seem easily
adaptable to the investigation of attitudes of other kinds. Smith (1954)
gives an example of this approach in a study of attitudes toward small
cars. The person being interviewed is asked to imagine that a new
family has moved into his block. Before he sees any members of the
family, he sees their car parked outside the house; it is a Burton (a
small car). He is asked, "What kind of people would you guess they
are?"
One use of this approach in market research (Haire, 1950) has be-
come almost a classic. In a conventional survey of attitudes toward
Nescafe, an instant coffee, women were asked, "Do you use instant
coffee?" If the answer was "No," they were asked, "What do you dislike
about it?" Most of the replies were along the line, "I don't like the
flavor." The investigators, suspecting that this was a stereotype that did
not express the underlying reasons for rejection of instant coffee,
switched to an indirect approach. Half of the sample of housewives
interviewed were presented with the following shopping list made out
by a hypothetical woman:
pound and a half of hamburger
2 loaves Wonder bread
bunch of carrots
1 can Rumford's Baking Powder
Nescafe instant coffee
2 cans Del Monte peaches
5 lbs. potatoes
The other half of the sample were presented with the same list,
except that "1 lb. Maxwell House Coffee (Drip Ground)" was sub-
stituted for the Nescafe. Each respondent was asked to read the
shopping list and then to write a brief description of the personality
and character of the woman who had made it out. The differences
between the descriptions of the woman who bought Nescafe and the
one who bought Maxwell House coffee were striking. Almost half of
the women who read the list containing instant coffee described its
writer as lazy and failing to plan her household purchases well; the
woman who bought the drip-ground coffee was hardly ever described
in these terms. In addition, the woman who bought instant coffee was
more often described as a spendthrift and a poor wife. A check of the
pantries of the respondents showed that most of the women who de-
scribed the buyer of instant coffee in these unfavorable terms did not
have instant coffee on their shelves; those who did not describe her
unfavorably were much more likely to have instant coffee. In other
words, it seemed clear that the decision to buy or not to buy instant
coffee was influenced at least as much by attitudes about what con-
stitutes good housekeeping as by reaction to the flavor of instant coffee,
but these attitudes could not easily have been elicited by direct ques-
tioning.
A variation of this approach is the matching technique, in which
the respondent is given a list of various brands of the same kind of
product and another list of different kinds of people (e.g., doctors'
wives, electricians' wives, stepmothers, career women) and is asked
to match each kind of person with the brand she would be likely to buy.
Gardner and Levy (1955) report striking differences in the "image" of
different brands elicited by this technique.
PICTORIAL TECHNIQUES.6 Pictorial techniques, many of them bor-
rowed from well-established clinical procedures, have long been popular
in the projective study of social attitudes. The Thematic Apperception
Test (T.A.T.) has been the stimulus for several ventures.
Proshansky (1943) was one of the first to employ the T.A.T. type
of picture in the study of social attitudes. Ambiguous pictures of situa-
tions involving labor were intermingled with regular T.A.T. pictures
and exposed to a group for five seconds each. The subjects were asked
to write briefly what they thought the pictures represented. On the
basis of these stories, three judges rated the subjects' attitudes toward
labor on a five-point scale. The pooled ratings of the three judges cor-
related .87 for one group of subjects and .67 for another group with a
standard scale for measuring attitude toward labor.7
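The pooling-and-correlating step just described can be illustrated as follows. This is a minimal sketch, not a reconstruction of Proshansky's computation: the ratings and scale scores are hypothetical, and pooling by simple averaging is an assumption, since the text does not say how the three judges' ratings were combined.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pool_ratings(ratings_by_judge):
    """Average the judges' ratings subject by subject (assumed pooling rule)."""
    n_judges = len(ratings_by_judge)
    return [sum(subject) / n_judges for subject in zip(*ratings_by_judge)]


# Hypothetical five-point ratings: one row per judge, one column per subject.
judge_ratings = [
    [1, 2, 4, 5, 3, 2],
    [2, 2, 5, 4, 3, 1],
    [1, 3, 4, 5, 2, 2],
]
# Hypothetical scores of the same six subjects on a standard attitude scale.
scale_scores = [10, 14, 27, 30, 18, 11]

pooled = pool_ratings(judge_ratings)
validity_r = pearson_r(pooled, scale_scores)
print([round(p, 2) for p in pooled], round(validity_r, 2))
```

Averaging before correlating tends to dampen the idiosyncrasies of any single judge, which is one reason pooled ratings are often preferred to individual ones.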
6 The pictorial techniques discussed here differ from those discussed in Chapter
7 in that they use pictures to stimulate indirect expression of the respondent's
attitudes, whereas those discussed in the preceding chapter ask direct questions
about the subject's response.
7 The construction and use of non-disguised attitude scales are discussed in
Chapter 10.
Sayles (1954) used pictures somewhat differently in a study of
attitudes of union members toward grievance procedures. Finding that
many questions seemed threatening to his respondents, or too personal,
he decided to use a projective approach. On the basis of exploratory
interviews, he identified seven stages in the grievance process; for
example, informal discussion with fellow workers on what to do about
a complaint, informal meeting of the foreman and union official and
the worker involved, etc. He took photographs of such situations in
other plants, with personnel unknown to his respondents. These
photographs were then shown to the respondents, with an accompany-
ing explanation-e.g., "This person has a grievance, something bother-
ing him, but before taking it to the union he's discussing it with his
fellow workers." The respondent was then asked such questions as,
"How do you think this guy is feeling right now? What has happened
just before this picture was taken? What do you think is going to
happen next?" Sayles reports that the responses elicited by this tech-
nique were very similar to those obtained in the intensive interviews
conducted in the exploratory stage of the research. The pictorial tech-
nique had a number of advantages: It took only about ten minutes to
administer, compared with an average of two hours for the intensive
interviews; it could be administered in the factory, whereas the inter-
views had to be conducted at home in order to assure free responses; it
could be administered by someone unknown to the respondents,
whereas the interviews had to be preceded by a six-week period of
developing rapport before respondents were willing to talk freely.
Pictures of the T.A.T. type have been used in several studies of
attitudes toward minority groups. In the Authoritarian Personality
study (Adorno et al., 1950), ten pictures were presented to each sub-
ject: six pictures from the T.A.T. and four especially aimed at uncover-
ing attitudes toward minority groups. The latter were of "Jewish-look-
ing people in a poor district," "an older Negro woman with a younger
Negro boy," "a young couple in zoot suits," and "a lower-class man
accosted by a policeman wielding a nightstick." Subjects were asked to
construct a complete story about each picture, and their stories were
recorded verbatim by the examiner. Stories were analyzed quantita-
tively, in terms of the strength and frequency of expression of various
needs, and qualitatively, in terms of the theme expressed in the various
stories. Interestingly enough, the pictures designed to distinguish be-
tween those with high and low ethnocentrism were less effective for
this purpose than the pictures from the regular T.A.T. series.
carried out for a grocery store whose sales were declining showed two
women sitting at a table drinking coffee: One woman said, "Well, I feel
I have to buy food where the price is lower-that's the main thing
as far as I'm concerned." The other woman was shown as saying, "Art
and I agree that I should shop where . . ." The respondent was asked
to fill in the rest of the answer (Zober, 1956).
The device of asking the respondent to describe the kind of person
who would behave in a certain way, described in the preceding section
on verbal techniques, has also been used in conjunction with pictorial
material. Smith (1954) gives an example from a study testing proposed
ads for a new perfume. One of the ads featured a Gauguin picture of
South Sea girls; the other, a picture of a young American girl clasping
a bouquet of flowers. The interviewer said: "A good many women
prefer this picture; others prefer this one. I wonder if you could say
anything about these two types of women-the sort of people they
are." The responses made it clear that the Gauguin picture would not
appeal to the market for whom the perfume was intended.
PLAY TECHNIQUES. Techniques involving the manipulation of dolls
have been used in investigating the attitudes of young children.
Hogrefe, Evans, and Chein (unpublished study) gave their subjects a
number of "white" and "colored" dolls and asked them to play out
specific scenes, such as "going to school" or "arranging a party," as
though they were producing a movie. The inclusion or exclusion of
the colored dolls, as well as the role assigned to them, provided a simple,
objective score which was taken as a measure of the child's attitude
toward Negro children. The majority of the children they tested
showed a striking avoidance of segregated patterns; that is, they created
mixed situations far more often than would be expected by chance.
This was in keeping with the children's reports of their own play
activity; in answer to the question, "Do you ever play with Negro
children?," four fifths of the white children said "Yes." But observa-
tion of their actual behavior in an interracial recreation center showed
a striking contrast; on repeated occasions, when children were asked
to pick partners for some activity, the number of segregated pairs was
far greater than would be expected by chance.
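A comparison with what "would be expected by chance" presupposes some baseline. One simple baseline, offered here only as an illustrative assumption and not as the procedure these investigators used, is that partners are drawn at random from the whole group. The group sizes and the observed count below are hypothetical.

```python
from math import comb


def chance_share_same_group(n_group_a, n_group_b):
    """Probability that a pair drawn at random contains two children
    from the same group, given the sizes of the two groups."""
    total_pairs = comb(n_group_a + n_group_b, 2)
    if total_pairs == 0:
        raise ValueError("need at least two children to form a pair")
    same_group_pairs = comb(n_group_a, 2) + comb(n_group_b, 2)
    return same_group_pairs / total_pairs


# Hypothetical recreation group: 12 children in one group, 8 in the other.
chance_share = chance_share_same_group(12, 8)

# Hypothetical observation: 9 of the 10 pairs formed were same-group pairs.
observed_share = 9 / 10
excess_over_chance = observed_share - chance_share
print(round(chance_share, 3), round(excess_over_chance, 3))
```

With these made-up figures, random pairing would yield same-group pairs a little under half the time, so an observed share of nine in ten would lie well above the chance level.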
Hartley and Schwartz (1948) combined doll play with pictorial
material in the investigation of intergroup attitudes of children. Pic-
torial backgrounds carried characteristic symbols of the Catholic reli-
gion in one set, the Jewish religion in another, and middle-class sur-
roundings (without religious identification) in a third. Identical
family sets of dolls were placed on the three backgrounds and the
child was allowed to use them in playing out situations such as a
birthday party, school bus, etc.
PSYCHODRAMATIC AND SOCIODRAMATIC TECHNIQUES. Although psy-
chodrama and sociodrama have not been used systematically in the
study of social attitudes, the fact that they are methods of considerable
flexibility makes it reasonable to examine the possibility that they
might be used in this way. The methods require that the subject act
out a role, either as himself (psychodrama) or as somebody else (socio-
drama), as he would in a real-life situation. For example, a white sub-
ject may be presented with the problem of acting out the role of a
Negro factory worker who has been absent from work several times and
who has just been called into the foreman's office to explain his absen-
teeism. The manner in which he plays his role, the history that he
creates for the role, etc., may provide considerable insight into his
attitudes. The investigator, in much the same way as an observer, can
record the behavior for later analysis, can categorize it on the spot, or
can rate it in terms of various scales, etc. Psychodrama and sociodrama,
it should be noted, are among the few tools available for the systematic
investigation of social skills. They enable one to place a person in sit-
uations in which one can observe how skillfully he behaves in relation
to other people.
INFORMATION TESTS
REASONING TESTS
No A's are B's. Some C's are B's. From these statements it is
logical to conclude:
1. All C's are A's.
2. Some C's are A's.
3. Only a few C's are A's.
4. Some C's are not A's.
5. Most C's are not A's.
6. No C's are A's.
7. No logical conclusion can be drawn from the given statements.
The difference between the response to the abstract form and the
form referring to the Japanese was used as an indicator of attitude
toward the Japanese. The two syllogisms forming a pair were not, of
course, placed next to each other in the questionnaire as it was pre-
sented to the subjects.
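The paired-syllogism comparison can be sketched as a simple discrepancy count. The scoring rule below (one point for every pair in which the conclusion chosen for the content form differs from the one chosen for the abstract form) is an assumption made for illustration; the text does not state how the difference was quantified, and the pair labels and answers are hypothetical.

```python
def discrepancy_score(abstract_answers, content_answers):
    """Count the syllogism pairs answered differently in the two forms.

    Both arguments map a pair label to the number (1-7) of the conclusion
    the subject selected for that form of the syllogism.
    """
    if abstract_answers.keys() != content_answers.keys():
        raise ValueError("both forms must cover the same syllogism pairs")
    return sum(
        1
        for pair_id in abstract_answers
        if abstract_answers[pair_id] != content_answers[pair_id]
    )


# Hypothetical record for one subject on three pairs of syllogisms.
abstract_answers = {"pair_1": 4, "pair_2": 7, "pair_3": 4}
content_answers = {"pair_1": 6, "pair_2": 7, "pair_3": 5}

score = discrepancy_score(abstract_answers, content_answers)
print(score)
```

A count of this kind ignores the direction of the shift; a fuller scoring scheme would also record whether the altered conclusion was more or less favorable to the group named in the content form.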
subjects who received the Negro form differed significantly from those
of the "less prejudiced" subjects who received the white form, whereas
among the "more prejudiced" subjects the distributions of responses
to the two forms were essentially the same. This finding led Seeman
to question what was actually being measured by the indirect test and
by the Likert scale, and which one was more valid for what purposes.
He concluded by pointing to the "need for extreme care in interpreta-
tion of projective and semi-projective techniques for the study of
specific attitudes."
Other investigators have employed changes in judgment of literary
merit, changes in evaluation of the quality of mottoes, and changes
in level of aspiration as indicators of attitudes. The general technique
is always essentially the same. The item to be evaluated is presented
as a product or as a characteristic of a given person or group, and the
same or an equivalent item as originating with a different person or
group. The discrepancy in judgment is taken as a measure of the
attitude toward the given group. Note, however, that the discrepancies
in judgment are more complex than may appear on the surface, since
changes in meaning go along with changes in imputed origin. 13
Hovland and Sherif (1952) have suggested another way in which
judgments might be used as an indication of attitude. In a study dis-
cussed in more detail in Chapter 10, these investigators found that
people's judgment of the degree of favorableness or unfavorableness
of a given statement about Negroes was influenced by their own
attitude toward Negroes. This finding led to the suggestion that attitude
might be assessed indirectly through study of the way an individual
Substitute Measures
Still another indirect approach to the measurement of a characteristic
involves measuring something else, or some combination of other
things, that is sufficiently highly correlated with the characteristic one
wants measured to enable it to serve as a satisfactory substitute. We
may call this approach the substitute measure.
The F-scale (Adorno et al., 1950) was constructed on this principle.
It was reasoned that authoritarian attitudes should correlate
highly with anti-semitism, so that a measure of the former should also
yield a satisfactory measure of the latter. To increase the substitut-
ability of the F-scale for a measure of anti-semitism, items were selected
for inclusion in the F-scale not only on the basis of the criterion of
internal consistency, but also on the basis of how well they correlated
with a scale of anti-semitic attitudes.
Wilner, Walkley, and Cook (1955) used a different type of substitute
measure in their effort to establish post hoc whether two groups
of white tenants in housing projects (one living relatively close to
Negroes in the project, the other relatively far from them) had held
similar attitudes before moving into the project. Working from the
fact that a number of socioeconomic characteristics (for example,
religion and education) are known to be correlated with attitudes toward
Negroes, they constructed an index of "probable initial attitude" on
the basis of such characteristics.
Ideally, the method of substitute measurement calls for the combination
of a number of indices, each of which has a relatively high
correlation with the characteristic for which one needs a measure, and
a relatively low correlation with each of the other indices in the com-
bination. In principle, such a measure involves an application of the
logic of pragmatic validation. It should be noted, however, that such
a measure cannot be used as an "after" measure in an attitude-change
experiment. The experimental factor is designed to change the attitude,
not necessarily the variables which were initially correlated with it.
If successful, the experimental factor may change the basis of the rela-
tionship and, among other things, lower the pragmatic validity of the
substitute measure. 14
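The selection rule just described (a high correlation with the characteristic, a low correlation among the indices themselves) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pearson` and `screen_indices`, the cutoffs of .40 and .50, and the four small sets of scores are all hypothetical and are not taken from any of the studies cited.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def screen_indices(criterion, indices, min_validity, max_overlap):
    """Keep indices that correlate with the criterion but not with each other.

    Candidates are taken in order of their correlation with the criterion;
    one is kept only if it does not duplicate an index already kept.
    Both cutoffs are assumptions of this sketch, not values from the text.
    """
    ranked = sorted(indices,
                    key=lambda name: abs(pearson(indices[name], criterion)),
                    reverse=True)
    kept = []
    for name in ranked:
        if abs(pearson(indices[name], criterion)) < min_validity:
            continue  # too weakly related to the characteristic
        if all(abs(pearson(indices[name], indices[k])) <= max_overlap
               for k in kept):
            kept.append(name)
    return kept

# Hypothetical scores for four persons on the criterion and four indices.
criterion = [3, -1, 1, -3]
indices = {
    "index_a": [1, -1, 1, -1],
    "index_b": [1, -1, 1, 0],    # nearly a duplicate of index_a
    "index_c": [1, 1, -1, -1],   # related to the criterion, unrelated to index_a
    "index_d": [1, -1, -1, 1],   # unrelated to the criterion
}
selected = screen_indices(criterion, indices, min_validity=0.4, max_overlap=0.5)
print(selected)  # ['index_a', 'index_c']
```

With these made-up scores, index_b is dropped because it largely duplicates index_a, and index_d because it is unrelated to the criterion.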
A Note on Validation
tude validly (as indicated by its consistency with the results of a vali-
dated undisguised technique), it may also measure other aspects of an
attitude validly; nevertheless, the consistency should not be taken as
conclusive evidence of validity in measuring these other aspects.
Another approach to validation is to compare the results obtained
by a measuring instrument with observations of actual behavior. That
this approach has not often been used is undoubtedly due in part to the
difficulty of determining what kinds of behavior in what situations
would provide an adequate criterion, in part to the difficulty of securing
measures of such behavior. Nevertheless, a few of the studies mentioned
in this chapter have used this approach. Haire used it in the instant
coffee study, and found high correspondence between responses to his
indirect measure and actual behavior. Hogrefe, Evans, and Chein, in
their study of relations between white and Negro children, found high
correspondence between scores on their indirect measure and chil-
dren's reports of their own behavior, but little correspondence with
observation of actual behavior in test situations.
In summary: Much more investigation of the validity of indirect
tests is needed. Some of the studies carried out to date have given
encouraging evidence of correspondence between the results of an indirect
test and those provided by an independent criterion; others,
however, have revealed discrepancies between different measures that
raise questions about what the various tests are in fact measuring. In-
vestigation of the validity of indirect tests-and especially of projective
tests which attempt to measure more than one dimension-is hampered
by the difficulty of finding appropriate criteria. Nevertheless, more
attention needs to be paid to validation of tests of this type before
they can make their full contribution to social research.
9
THE USE OF AVAILABLE DATA
AS SOURCE MATERIAL
Statistical Records
Personal Documents
Mass Communications
Summary
Statistical Records
hours of work could not account for a consistently rising rate of produc-
tivity in their experimental groups over a period as long as one year;
they concluded that changes in the social organization of the work
groups and in their relationship to management were responsible for
the rise in productivity.
Available data may be used at other points in a study. They are
frequently helpful in selecting cases with specified characteristics for
intensive study, or a random sample for interviewing in a survey. A
study of worker morale in war industries by Katz and Hyman (1947)
illustrates both these uses of available data. First, production records
were used as a basis for selecting five shipyards which differed in pro-
ductivity; within each of these yards, a sample of workers to be inter-
viewed was selected by taking every nth name from the payroll lists.
These investigators found a circular relation between morale and
production, with high production giving a feeling of accomplishment
which led to increased effort, while low production reduced motivation,
which in turn reduced productivity. They concluded further that factors
directly associated with the job were more important determinants
of worker morale than more general community conditions such
as housing, transportation, and recreational facilities.
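Taking every nth name from a list, as was done with the payroll lists, is easily expressed in code. The sketch below is illustrative only: the function name `systematic_sample`, the random choice of a starting position, and the payroll of twenty invented names are assumptions, not details reported for the study.

```python
import random

def systematic_sample(names, n, start=None, seed=None):
    """Return every nth name from the list.

    `start` is the position (counting from 0) of the first name drawn.
    If it is not given, a starting position among the first n names is
    chosen at random; this is an assumption of the sketch.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if start is None:
        start = random.Random(seed).randrange(n)
    return names[start::n]

# Hypothetical payroll list of twenty workers.
payroll = [f"worker_{i:03d}" for i in range(1, 21)]
sample = systematic_sample(payroll, n=5, start=2)
print(sample)  # ['worker_003', 'worker_008', 'worker_013', 'worker_018']
```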
Available records may also be used to supplement or to check
information gathered specifically for the purposes of a given investi-
gation. For example, in a study of the psychological impact of long-term
unemployment in an Austrian village (Jahoda-Lazarsfeld and Zeisl,
1932), the accounts of their experience given by several unemployed
men suggested that they felt much worse at the onset of unemploy-
ment than after three years, in spite of the gradual deterioration of
their economic condition. This "shock" effect of unemployment was
checked against such records as the accounts of the local grocer, which
showed a sudden drop in sales in the months immediately after the
onset of unemployment, followed by a slight recovery and a steady
decline thereafter.
In many of these examples, the investigator's interest was in behavior
or characteristics of the sort directly reflected in the statistical
records-segregation, suicide, voting, productivity. Like other types
of data, however, records of specific behavior may be used as an indicator
of some more general concept. A series of studies by Tryon (1955)
illustrates both these uses of existing data. Tryon was concerned with
the problem of identifying subcultural groups in more meaningful and
reliable ways than the usual ratings of "social class." He was interested
in two related hypotheses: (1) that demographic social areas can be
identified on the basis of census data; and (2) that a demographic
social area is also a psychosocial area-that is, that residents of a com-
mon demographic social area will experience certain common socially
relevant situations and certain common psychological states elicited by
those situations, and will behave in certain common ways. In connec-
tion with the first hypothesis, he examined thirty-three items in the
1940 U. S. Census for the 243 census tracts in the San Francisco Bay
Area; the items included, for example, the percentage of detached
single-family homes, the percentage of women not in the labor force,
the percentage of managerial or professional workers, etc. By a statistical
technique known as cluster analysis,2 he found that these thirty-three
items fell into three main groupings, which could be described
as: socioeconomic independence, having to do with wealth and social
independence; assimilation, or the incorporation of persons into stand-
ard white-collar American culture; and orientation around the family.
Each of the census tracts could be described in terms of its position on
each of these three basic variables. Many tracts, of course, showed
the same pattern; for the entire area, eight basic patterns were found.
These could be given descriptive labels as well as index scores; for
example, "the exclusives"-above average in assimilation, in orientation
around family life, and especially in socioeconomic independence;
"the workers"-average in orientation around family life but somewhat
below average in assimilation and socioeconomic independence;
"the segregated"-low in assimilation, socioeconomic independence,
and family life as defined in the study.
As evidence to test his second hypothesis-that demographic areas
are also psychosocial areas-Tryon used voting records. His interest
was not in voting per se, but as an indicator of social attitudes. He
found a high correspondence between demographic pattern and voting
in the 1940 presidential election. In most of the "exclusive" tracts,
about a quarter of the votes were cast for Roosevelt, and in none were
2 For procedures involved in cluster analysis, see Tryon (1957a).
more than a third for him; at the other extreme, in most of the "worker"
and "segregated" tracts, three quarters or more of the votes went to
Roosevelt, and in none did he get less than half the votes. Analysis (as
yet unpublished) of election results fourteen years later showed a
continued relationship between demographic pattern (as determined
in 1940) and social attitudes as revealed in 1954 votes on such issues
as bonds for a hospital, tax exemption for welfare institutions, and pen-
sions for needy aged.
Personal Documents
been produced on the writer's own initiative or, if not, in such a way
that their introspective content has been determined entirely by the
author; and (3) documents that focus on their author's personal ex-
periences. These criteria exclude interview material, however informal
the interview situation may have been. They exclude also those literary
efforts that can be used as personal documents only through projective
interpretation. This more limited definition of personal documents
has the advantage of bringing to the fore their most distinctive
characteristic: they permit us to see other people as they see themselves.
Augustine, who produced one of the greatest personal documents
of all times, fully realized the unique contribution they can make in
this respect. In Book X of his Confessions he explains why he wrote
this personal document. He starts from the assumption that "men
are a race curious to know of other men's lives," an assumption which
is as valid now as then; and he argues that nothing but what a man says
about himself can fully satisfy this curiosity:
As to what I now am while I am writing my Confessions,
there are many who desire to know-both people who know me
personally, and people who do not, but have heard from me or
about me. Yet they have not their ear at my heart, where I am
what I am. They wish, therefore, to hear from my own confes-
sion what I am inwardly where they cannot pierce with eye or
ear or mind [italics supplied].
It is true that the social sciences have developed modern tech-
niques that aim at piercing through outward appearance and behavior
to inner experiences. Depth interviews, projective techniques, and
psychoanalysis aim at just this. Successfully applied, they can often
penetrate even beyond what a man knows of himself.4 But these techniques,
although they are able to discover the nature of selected inner
experiences, can hardly ever reconstruct the entire structure of a
person's self-image, with its spontaneous emphases and complexities.
Gordon Allport (1942), in his defense of the value of personal
documents for psychology, stresses the importance of aiming for a view
4 Augustine realized the limitation of personal documents in this respect: " ...
yet there is something of man that the very spirit of man that is in him does not
know."
of the whole before details of inner experience are subjected to more
systematic scrutiny:
Acquaintance with particulars is the beginning of all knowl-
edge-scientific or otherwise. In psychology the font and origin
of our curiosity in, and knowledge of, human nature lies in our
acquaintance with concrete individuals. To know them in their
natural complexity is an essential first step. Starting too soon with
analysis and classification, we run the risk of tearing mental life
into fragments and beginning with false cleavages that misrepre-
sent the salient organizations and natural integrations in personal
life. In order to avoid such hasty preoccupations with unnatural
segments and false abstractions, psychology needs to concern
itself with life as it is lived, with significant total processes of the
sort revealed in consecutive and complete life documents.
By and large, the rationale for the use of personal documents is
similar to that for the use of observational techniques. What the latter
may achieve for overt behavior, the former can do for inner experiences:
to reveal to the social scientist life as it is lived without the interference
of research. However, although the number of situations that can be
observed is considerable, personal documents are relatively rare; hence
the scope of their usefulness for research is rather limited.
Even when they are available, personal documents have to be used
with some caution. Augustine pointed to one of the basic reservations
-the doubt about authenticity: "And when they hear me confessing
of myself, how do they know whether I speak the truth . . . ?" He
saw clearly that there was no completely satisfactory answer to such
skepticism, for "I cannot prove to them that my confession is true."
There are two possible kinds of falsification with which a social
scientist who uses personal documents has to be concerned. In its
crudest form, falsification amounts to conscious, deliberate deceit. A
document can be produced in the form and manner of a personal document
by someone else and be presented to the world as the genuine
article. The motives for such falsification are various: material gain,
malice, practical joke, literary exercise. The most outstanding example
in the psychological literature was an extremely skillfully presented
falsification of the diary of an adolescent girl, which deceived even
Freud, who called it a "gem." Hug-Helmuth, who originated this
a blind deaf-mute, and the Confessions, also belong to the first group.
Among the most frequently used documents of the second type
are diaries of adolescents. Charlotte Buhler (1934), in developing
her psychology of adolescence, conducted and sponsored a series of
interrelated studies based on the use of such diaries. Buhler's interest
led to the establishment of a collection of diaries of adolescents at the
Psychological Institute of the University of Vienna; in a relatively
short time almost one hundred specimens were assembled. Their avail-
ability in such number permitted a more systematic comparison than
is ordinarily possible with unique documents.
At least two of Buhler's studies involved comparison of individuals
at very different periods in time-something difficult to achieve by
other methods of data collection. One study was based on three diaries
of girls from three successive generations. Buhler demonstrated that in
spite of the considerable cultural change between 1873 and 1910 (the
years in which the oldest and the youngest of the three diary writers
were born), some basic desires of adolescence, such as the need for
intimate personal relationships, remained the same. Yet in other re-
spects, such as the girls' relations to their parents, cultural changes were
reflected in the diaries.
The second study was based on diaries of two girls of the same
generation who, at the time of the study, were about twenty years
older than at the time they had produced the diaries. A comparison between
"then" and "now" revealed considerable similarities between
the two girls during adolescence and considerable differences in later
life.
Mass Communications 6
In addition to statistical records and autobiographical documents,
every literate society produces a variety of material intended to inform,
entertain, or persuade the populace. Such material may appear in the
form of literary productions, newspapers, and magazines or, more re-
cently, motion pictures and radio or television broadcasts.
Mass communication documents are not produced for the bene-
fit of the investigator, and in this respect (although not in others) are
free from the influence of his theoretical or personal bias. Like available
statistical records, they enable one to deal with the historical past
as well as with contemporary society, an advantage that can hardly be
overestimated in view of the considerable methodological difficulties
standing in the way of a historical perspective in social science. Even
more than statistical records, documents of mass communication reflect
broad aspects of the social climate in which they are produced.
PURPOSES OF ANALYSIS
TECHNIQUES OF ANALYSIS
7 Although the technique of content analysis has been worked out primarily
in relation to the mass media, it is applicable to other materials as well. For
example, personal documents, unstructured interviews, protocols of responses to
projective tests, records of patient-therapist interactions, etc., may all be
subjected to content analysis.
8 For other discussions of content analysis, see Bruner (1941), Goldsen
(1947), Janis (1943), Kaplan (1943a), Kaplan and Goldsen (1943), Lasswell
(1942a, 1942b, 1946), Lasswell, Leites, and associates (1949), and Sargent and
Saenger (1947).
has been carried out by Harold D. Lasswell and his associates (1949).
Lasswell developed a system of "symbol analysis," which was employed
during World War II in several branches of the United States govern-
ment. In this system, newspaper content is studied for the appearance
of certain symbols, such as "England," "Russia," "democracy," "Jews,"
"Stalin," etc. The frequency with which these symbols appear is noted,
as well as whether their presentation is favorable, unfavorable, or
neutral (or "indulgent," "deprivational," "neutral"). Favorable references
are sometimes further divided into those stressing "strength" and
those stressing "goodness" or "morality"; negative references into
"weakness" and "immorality" categories.
Davison's analysis of Berlin newspapers, described on page 333,
made use of this type of analysis. The symbols he considered were
the names of the countries; the favorable, neutral, or unfavorable
quality of each reference was noted. Davison did not use the dimen-
sions of strength-weakness and morality-immorality, but he made an
additional analysis in terms of "themes," another widely used method
of content analysis. In this approach, the analyst immerses himself
in the material until its recurrent ideas or propositions become evident
and then counts the frequency with which these propositions occur.
For example, some of the major themes Davison found in news items
in the Soviet-controlled Berlin newspapers were: the United States
is torn by economic unrest and industrial strife; the United States is
in the grip of reactionaries; the United States is pursuing policies of
militarism, imperialism, and dollar diplomacy.
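Once each reference has been coded by hand, the bookkeeping behind counts of this kind is simple. The sketch below tallies, for each symbol, how often it was presented favorably, unfavorably, or neutrally. The function name `tally_symbols` and the five coded references are hypothetical; the judgment of direction is assumed to have been made by a human coder beforehand. A tally of themes would follow the same pattern, with a theme label in place of the symbol.

```python
from collections import Counter, defaultdict

DIRECTIONS = ("favorable", "unfavorable", "neutral")

def tally_symbols(coded_references):
    """Count references to each symbol by direction of presentation.

    `coded_references` is a sequence of (symbol, direction) pairs, one per
    reference found in the material, as recorded by the coder.
    """
    table = defaultdict(Counter)
    for symbol, direction in coded_references:
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {direction!r}")
        table[symbol][direction] += 1
    # Report every direction for every symbol, including zero counts.
    return {symbol: {d: counts[d] for d in DIRECTIONS}
            for symbol, counts in table.items()}

# Hypothetical coding of five references in one day's newspaper.
coded = [
    ("England", "favorable"),
    ("Russia", "unfavorable"),
    ("England", "neutral"),
    ("England", "favorable"),
    ("Russia", "unfavorable"),
]
tally = tally_symbols(coded)
print(tally["England"])  # {'favorable': 2, 'unfavorable': 0, 'neutral': 1}
```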
Wright and Nelson (1939) employed a more complicated method
of content analysis of newspapers. Using a sample of editorials concerning
Japan and China in the New York Times, the Chicago Tribune
and the Chicago Daily News for the period January 1937 to March
1938, they selected a "representative statement" from each editorial
and then asked judges to classify these statements in eleven piles, ranging
from pile 1 (most hostile toward the country concerned) through
pile 6 (neutral) to pile 11 (most favorable). Over-all scores for the
different newspapers and for different periods of time were obtained
by averaging the score values of specific statements. The results showed,
among other things, that the bombing of Nanking had more influence
in promoting unfavorable references to Japan than did the Panay incident,
and considerably more than the juridical act of the League of
Nations in branding Japan the aggressor.
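The averaging step can be sketched as follows. The example assumes that each representative statement has already been given a single score value from 1 to 11 (how the placements of several judges were combined into that value is not specified here), and the newspaper labels, periods, and scores are invented for illustration.

```python
from statistics import mean

def average_scores(judged_statements):
    """Average the pile values of statements for each newspaper and period.

    `judged_statements` is a sequence of (newspaper, period, pile) triples,
    where pile runs from 1 (most hostile) through 6 (neutral) to 11
    (most favorable).
    """
    groups = {}
    for newspaper, period, pile in judged_statements:
        if not 1 <= pile <= 11:
            raise ValueError("pile values must lie between 1 and 11")
        groups.setdefault((newspaper, period), []).append(pile)
    return {key: mean(values) for key, values in groups.items()}

# Hypothetical score values for six editorial statements.
statements = [
    ("Paper A", "1937", 3),
    ("Paper A", "1937", 5),
    ("Paper A", "1938", 2),
    ("Paper B", "1937", 6),
    ("Paper B", "1937", 7),
    ("Paper B", "1937", 8),
]
scores = average_scores(statements)
print(scores[("Paper A", "1937")])  # 4
```

A drop in the average for a given newspaper from one period to the next would then indicate a shift toward more hostile editorial treatment of the country concerned.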
RELIABILITY OF CLASSIFICATION. Reliability of responses and of
classifications is, of course, a universal problem in social science.
Methods of ascertaining, and of increasing, the reliability of measure-
ment were discussed in Chapter 5. More specific discussion of problems
of reliability of coding appears in Chapter 11. Ideally, our methods of
analysis and quantification should be so clearly defined that different
judges would arrive at exactly the same results when analyzing the
same material. Perfect reliability, however, is something that can be
achieved at the present time only when the more superficial kinds of
analysis are made, such as counting the number of times a particular
word turns up in a given amount of material. As soon as some degree
of interpretation enters the analysis, judges tend to differ to some
extent in their results.
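The extent to which two judges arrive at the same results can be expressed as the proportion of statements they classify alike. The sketch below computes that proportion and, as an optional refinement not discussed in the text, Cohen's kappa, which corrects the proportion for the agreement expected by chance. The function names, the category labels, and the ten codings are hypothetical.

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Proportion of statements that two coders placed in the same category."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must classify the same, non-empty set")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def cohen_kappa(codes_a, codes_b):
    """Agreement corrected for chance (Cohen's kappa); not from the text."""
    observed = percent_agreement(codes_a, codes_b)
    n = len(codes_a)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: product of the two coders' category proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one and the same category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical codings of ten statements by two judges.
judge_1 = ["safety"] * 5 + ["other"] * 5
judge_2 = ["safety"] * 4 + ["other"] * 5 + ["safety"]
print(percent_agreement(judge_1, judge_2))      # 0.8
print(round(cohen_kappa(judge_1, judge_2), 2))  # 0.6
```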
White's study of the values stressed by Hitler and by Roosevelt,
described on pages 333-334, provides an example of difficulties that
may arise when complex judgments must be made. In the category
"other values" White included the value safety, which is illustrated by
the following statement from a Hitler speech: "It is quite unimportant
whether we ourselves live, but it is essential that our people live, that
Germany shall live." It is obvious that the assignment of the value
safety to this statement involves an interpretation of the meaning of
the sentence in the context within which it was used. The primary
method of increasing reliability of classification is to specify clearly the
characteristics of statements that are to be placed in a given category,
and to use many examples drawn from the material being analyzed to
illustrate what kinds of statements are to be considered as belonging
in a given category. But it is obviously much more difficult to give a
definition of the value safety which will be both sufficiently comprehensive
and sufficiently specific to serve as a guide for the coder, than
to give an adequate definition of a category such as "mentions racial
groups" in coding an interview question about "what kinds of people
live in this neighborhood." The difficulty is increased by the variety of
material to be considered in content analysis, which is limited only by
the interest or intention of the communicator, without the restricting
influence of a specific interview question. To increase reliability of con-
Summary
In the last four chapters we have discussed various ways of gather-
ing data needed to answer research questions: observation, interviews
and questionnaires, projective and other indirect techniques, and the
use of available data in the form of statistical records, personal documents,
and mass communications. We have pointed out that each
method has its advantages and limitations, and that each is more ap-
propriate for answering certain types of research questions than it is
for others. Moreover, we have noted that no matter what technique
an investigator uses, he must be alert to problems of reliability and
validity of his data.
In the next chapter we shall consider procedures for placing in-
dividuals on scales on the basis of data collected by any of the methods
discussed in these last four chapters.
10
PLACING INDIVIDUALS
ON SCALES
Rating Scales
Rating Scales
A number of types of rating scale have been employed; the number
of types distinguished and the names given to them tend to vary with
the measurement theorist. 1
One feature is common to all types: the rater places the person or
object being rated at some point along a continuum or in one of an
ordered series of categories; a numerical value is attached to the point
or the category. Scales differ in the fineness of the distinctions they
permit and in the procedures involved in assigning persons or objects
to positions. These differences will become apparent in our discussion
of several of the more common types of rating scale.
1 2 3 4 5 6 7 8 9
X: Is indifferent to Negroes as a group; doesn't think about them.
Y: Doesn't think of Negroes as group; considers them as individuals.
In using graphic and itemized rating scales, the rater makes his
judgment of the individual without direct reference to the positions
of other individuals or groups with which he might be compared. On
the other hand, comparative rating scales-as their name suggests-
2 Adapted from Cottrell (1947). This rating scale was used by coders in the
analysis of interviews. However, similar ratings may be made directly by
interviewers or by observers of group discussion.
the person poorly adjusted as well as shy. It is apparent that the halo
effect reduces the validity of the ratings of some traits and introduces
a spurious degree of positive correlation among the traits that are
rated.
Another frequent type of constant error is the "generosity error."
Here the tendency of the rater is to overestimate the desirable qualities
of subjects whom he likes. Still other frequent errors have been iden-
tified. Thus, raters tend to avoid making extreme judgments and to
assign individuals to the more moderate categories. Murray et al.
(1938) have identified the "contrast error," in which there is a tendency
on the part of the rater to see others as opposite to himself in a trait.
They found, for example, that raters who were themselves very orderly
rated others as being relatively disorderly, whereas raters who were
themselves less orderly tended to see others as more orderly.
Obviously, one way of reducing constant errors such as those
described above is to train the raters carefully and, especially, to make
them aware of the possibility of such biases. Specific steps may be
taken to reduce the likelihood of specific types of error. For example,
the tendency to avoid using the extreme positions may be counteracted
by giving somewhat less-than-extreme labels to these positions. People
are more likely to check "I am well satisfied with my job" than "I am
completely satisfied with my job"; at the other extreme, they are more
likely to check "There are many things about my job that I do not
like" than "There is nothing about my job that I like." The "generosity
error" may be reduced by using relatively neutral descriptive
terms for the scale positions rather than evaluative ones; for example,
"does not readily accept new opinions or ways of doing things" rather
than "rigid." Halo effects may be reduced, or eliminated altogether, by
having the various ratings of a given person made independently-
either by different raters, or by the same rater at different times, without
awareness that he is rating the same person. Obviously, this latter
condition can be met only when the ratings are made on the basis of
recorded material, such as responses to interview questions, accounts
of behavior, etc., from which identifying information can be removed.
Systematic errors, of course, reduce the validity of ratings. There
may also be random errors that reduce their reliability. One frequent
source of unreliability among different raters is the fact that some
frame of reference is implicit in any rating; different raters may use
different frames of reference in describing individuals in terms of the
characteristic in question. For example, the rating of a person as "con-
servative" or "radical" takes its meaning from the rater's reference
groups-the group norms he has in mind as he makes his rating. Lack
of correspondence between ratings by different observers is frequently
due to the fact that they make ratings with different reference groups
in mind.
Reliability can be increased not only by careful training of raters
but also by attention to the construction of the rating scale. Clear
definitions of the characteristic being measured and of the various
positions on the scale, as well as clear specification of the reference
group, help to reduce unreliability. Whenever possible, the definitions
of the scale points should include concrete illustrations of question
responses, types of behavior, or communication content. Careful consideration
should be given to distinguishing between adjacent positions
on the scale; for example, to the difference between "favorable" and
"very favorable." The example given on pages 348-349 illustrates this
procedure.
In constructing a rating scale, one must decide how many scale
positions or categories are to be used, unless one is using a graphic
scale on which the rater is free to check any point on a continuous
line. There is no simple rule for determining the optimal number of
positions. A basic consideration, of course, is the degree of differentiation
wanted in the measurement. But regardless of what is demanded
by the research problem, other factors must be taken into account:
(1) the discriminative ability of the judges or raters, including the
extent to which they are trained and experienced; (2) the kind of
characteristics to be judged; e.g., whether they are complex "inner"
attributes or more manifest "outer" attributes; (3) the conditions
under which the ratings are to be made; e.g., whether they are based
on extensive data (long periods of observation of the subject or a great
deal of communication content) or on limited data (brief observation
or limited communication content). These factors interact in
their effect on the degree of fineness possible in the rating scale. If
4 For example, Kelly and Fiske (1950) found that ratings by experienced
clinical psychologists based on unstructured interviews had little value in predicting
performance in situations that were not clearly specified.
QUESTIONNAIRES THAT FORM SCALES
DIFFERENTIAL SCALES
7 Throughout this section, for the sake of simplicity, the discussion is worded
in terms of scales measuring favorableness-unfavorableness toward some object.
A scale may, of course, be concerned with some other dimension; for example,
liberalism-conservatism of social, political, or economic views;
permissiveness-strictness of views on child-rearing, etc. In developing Thurstone
scales, the instructions to the judges specify the dimension along which the items are to be
placed. Thus, in developing a scale to measure liberalism-conservatism, the judges
would be instructed to place in the first pile the items they consider most liberal,
in the eleventh those they consider most conservative. The same principles and
procedures apply whether the dimension to be measured is favorableness-un-
favorableness or some other.
Scale    Item
value    no.
10.3      1.   I consider that the native is only fit to do the "dirty"
               work of the white community.
10.2      2.   The idea of contact with the black or dark skin of the
               native excites horror and disgust in me.
 8.6     15.   I do not think that the native can be relied upon in a
               position of trust or of responsibility.
 8.4     17.   To my mind the native is so childish and irresponsible
               that he cannot be expected to know what is in his best
               interest.
 3.8     22.   I consider that the white community in this country owe
               a real debt of gratitude to the missionaries for the way
               in which they have tried to uplift the native.
 3.1      3.   It seems to me that the white man by placing restrictions
               such as the "Colour Bar" upon the native is really trying
               to exploit him economically.
 0.8     11.   I would rather see the white people lose their position
               in this country than keep it at the expense of injustice
               to the native.
The scale values, of course, are not shown on the questionnaire,
and the items are usually arranged in random order rather than in order
of their scale value. The mean (or median)8 of the scale values of the
items the individual checks is interpreted as indicating his position
on a scale of favorable-unfavorable attitude toward the object.
Theoretically, if a Thurstone-type scale is completely reliable and
if the scale is measuring a single attitude rather than a complex of
attitudes, an individual should check only items that are immediately
contiguous in scale value-e.g., items 15 and 17 above. If the responses
of an individual scatter widely over noncontiguous items, his attitude
8 Thurstone, on the assumption that scales constructed by this method were
true interval scales (see Chapter 5), advocated the use of statistics appropriate
to interval scales-the mean and the standard deviation. Other investigators,
operating on the more cautious assumption that the intervals are not truly equal,
have favored the use of the median as appropriate to ordinal scales. For a discussion
of whether the assumption that these are true interval scales is justified, see
pages 363-365.
score is not likely to have the same meaning as a score with little scatter.
The scattered responses may indicate that the subject has no attitude
or that his attitude is not organized in the manner assumed by the
scale. There is no a priori reason to expect that all people have attitudes
toward the same things or that attitudinal dimensions are the same
for all.
The Thurstone method of equal-appearing intervals has been
widely used. Scales have been constructed to measure attitudes toward
war, toward the church, toward capital punishment, toward the
Chinese, toward Negroes, toward whites, etc. In addition, an attempt
has been made by Remmers and his colleagues (1934) to develop
generalized Thurstone scales that might be used to measure attitudes
toward any group, social institution, etc. For example, the Kelley-
Remmers "Scale for Measuring Attitudes Toward Any Institution"
consists of forty-five statements, ranging from "The world could not
exist without this institution," through such items as "Encourages
moral improvement" and "Is too radical in its views and actions," to
"Is the most hateful of institutions." In applying this generalized scale
to the measurement of attitudes toward a given institution (war, the
family, the church, advertising, or whatever), the subject is instructed
to check each of the statements with which he agrees in reference to
the given institution, or the statements may be reworded to include
mention of the specific institution being considered (e.g., "War is the
most hateful of institutions").
The Wright-Nelson study of editorial positions of newspapers
concerning Japan and China (see pages 340-341) illustrates an application
of Thurstone-type scaling to the analysis of available data. In this
study, the sorting of statements by the judges served simultaneously
to establish the position of each item on a scale of favorableness-
hostility and to determine the scores of the various newspapers. In
effect, each newspaper was treated as having "checked" all the statements
selected from its editorials; its score was obtained by averaging
the score values of the specific statements.
Several objections have been raised against the Thurstone-type
scale. First, many have objected to the amount of work involved in con-
structing it. Undoubtedly, the procedure is cumbersome. However,
Edwards (1957) has expressed the opinion that, in view of recent
developments in time-saving techniques, the amount of time and labor
involved in constructing a scale by the method of equal-appearing
intervals is not substantially different from that involved in construct-
ing a summated scale. In any case, it is doubtful that simple methods
for the rigorous construction of scales will ever be developed. The
precise measurement of attitudes is perhaps inevitably a complex affair.
A second criticism has been that, since an individual's score is the
mean or median of the scale values of the several items he checks,
essentially different attitudinal patterns may be expressed in the same
score. For example, on the scale of attitudes toward natives of South
Africa given earlier, an individual who checks the two moderately
"anti" items 15 and 17 receives a score of 8.5 (the median of their
scale values). Another individual, who checks items 1, 15, 17 and 22
(perhaps because 22 has a meaning for him which is different from
that which it had for the judges), also receives a score of 8.5 (the
median of the scale values of these items). The two individuals are
rated as having the same degree of prejudice, even though the latter
checked the most unfavorable item in the scale and the former did not.
Dudycha (1943), after six years' use of the Peterson test of attitude
toward war (a test constructed by the method of equal-appearing
intervals) with college students, reported that the average student,
instead of checking only two or three contiguous items, covered more
than a third of the scale; some students endorsed statements ranging
from those placed at the "strongly favorable" end of the scale to statements
at the "strongly opposed" end. (One must, of course, consider
the possibility that such students had no clear attitude toward war and
that it was therefore inappropriate to try to measure their attitude by
any technique.) Dudycha questioned the meaning to be given to a
median derived from such a range of responses. However, the criticism
that identical scores do not necessarily indicate identical patterns
of response is not unique to the Thurstone-type scale; it applies at
least as strongly, as we shall see, to summated scales.9
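
The scoring rule and the second criticism can be illustrated with a minimal sketch in Python. It uses only the seven illustrative items and scale values listed earlier in this section; the function names and the use of the range as an index of scatter are assumptions made for illustration, not part of the Thurstone procedure as described.

```python
from statistics import mean, median

# Scale values of the illustrative items shown earlier (item number -> scale value).
SCALE_VALUES = {1: 10.3, 2: 10.2, 15: 8.6, 17: 8.4, 22: 3.8, 3: 3.1, 11: 0.8}


def thurstone_score(checked_items, use_median=True):
    """Return the median (or mean) scale value of the items a respondent checked."""
    values = [SCALE_VALUES[item] for item in checked_items]
    return median(values) if use_median else mean(values)


def scatter(checked_items):
    """Range of scale values covered by the checked items (an assumed, crude index)."""
    values = [SCALE_VALUES[item] for item in checked_items]
    return max(values) - min(values)


first_respondent = [15, 17]           # two contiguous, moderately "anti" items
second_respondent = [1, 15, 17, 22]   # the same items plus two widely spaced ones

for checked in (first_respondent, second_respondent):
    print(checked, round(thurstone_score(checked), 2), round(scatter(checked), 2))
```

Both respondents receive the same median score, but the second covers a far wider stretch of the scale; reporting some index of scatter alongside the score is one way of making that difference visible.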
A still more serious question has to do with the extent to which
the scale values assigned to the items are influenced by the attitudes
of the judges themselves. Do the attitudes and backgrounds of the
9 The fact that different patterns may lead to identical scores is not necessarily
as serious a limitation as it might seem. This point is discussed on pages 369-370.
judges affect the position of the various items on the scale? This
obviously is a matter that is open to experimental inquiry. A number
of early studies supported the view that the scale values assigned did
not depend on the attitude of the judges. Hinckley (1932) found a
correlation of .98 between the scale positions assigned to 114 items
measuring prejudice toward Negroes by a group of Southern white
students in the United States who were prejudiced against Negroes
and those assigned by a group of unprejudiced Northern students.
Similarly, MacCrone (1937), in the study of race attitudes in South
Africa referred to earlier, found that the scale positions assigned various
items by South Africans of European background and by educated
Bantus, natives of South Africa, were similar except for a few items.
Studies of the construction of scales measuring attitudes toward a
particular candidate for political office (Beyle, 1932), toward war
(Ferguson, 1935), toward "patriotism" (Pintner and Forlano, 1937),
and toward Jews (Eysenck and Crown, 1949) all found correlations of
.98 or higher between the scale positions assigned to the items by
groups of judges with opposed attitudes toward the object of the
scale.
More recent research, however, has sharply challenged the con-
clusions of these studies. Hovland and Sherif (1952), using the items
employed in the Hinckley study mentioned above, found marked
differences between the scale values assigned to items by anti-Negro
white judges on the one hand, and those assigned by pro-Negro white
judges and Negro judges, on the other. Items rated as "neutral" or
moderately favorable by Hinckley's subjects were likely to be seen as
unfavorable by the pro-Negro white judges and the Negro judges. This
discrepancy between the earlier and the later findings can be accounted
for by the different procedures used. Hinckley followed a rule suggested
by Thurstone, that any judge who placed more than one fourth of the
statements in a single category should be eliminated as "careless."
Hovland and Sherif, however, found that judges with extreme attitudes
tended to place many statements in the same category; checks within
their procedure convinced these investigators that this was not a
matter of carelessness. Application of the rule followed by Hinckley
would have eliminated over three fourths of their Negro judges and
two thirds of their pro-Negro white judges; when they did eliminate
these judges, they found that the scale values assigned by the remain-
ing white judges were very close to those assigned by Hinckley's judges.
These findings strongly suggest that Hinckley's procedure had the
effect of ruling out judges with extreme attitudes.
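
The screening rule at issue here (drop any judge who places more than one fourth of the statements in a single category) is simple enough to sketch. The function name and the two judges' sortings below are hypothetical; only the one-fourth limit comes from the text.

```python
from collections import Counter


def exceeds_single_category_limit(placements, limit=0.25):
    """True if more than `limit` of a judge's statements fall in any one category.

    placements: one category (pile) number per statement sorted by the judge.
    """
    if not placements:
        raise ValueError("the judge sorted no statements")
    largest_pile = max(Counter(placements).values())
    return largest_pile > limit * len(placements)


# Hypothetical sortings of twelve statements into eleven piles.
evenly_spread_judge = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
extreme_judge = [1, 1, 1, 1, 1, 1, 11, 11, 11, 11, 6, 7]

print(exceeds_single_category_limit(evenly_spread_judge))
print(exceeds_single_category_limit(extreme_judge))
```

A judge who bunches statements at the ends of the continuum is flagged by this rule whether or not he was careless, which is how the rule can screen out judges with extreme attitudes.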
A subsequent study by Kelley et al. (1955), using twenty of the
Hinckley items, found marked differences between the scale values
assigned to items by white and by Negro judges, with the statements
fairly evenly distributed from "favorable" to "unfavorable" by the white
judges, but bunched at the two ends of the continuum by the Negro
judges.10 Granneberg (1955), in constructing a scale of attitudes to-
ward religion, found not only that a religious group and a nonreligious
group differed significantly in the scale values they assigned to items,
but that judges of superior and of low intelligence differed, and that
there was an interaction between attitude and intelligence which
affected the scale position to which items were assigned.
Such findings, of course, cast serious doubt on the meaning of
the scale positions and the distances between them. It should be noted,
however, that even those studies that found marked differences be-
tween groups of judges in the absolute scale values they assigned to
items found high agreement in the rank order in which judges with
differing attitudes arranged the items along the favorable-unfavorable
continuum. Thus, although the assumption that Thurstone-type scales
are true interval scales seems dubious, it is still possible for them to
constitute reasonably satisfactory ordinal scales; that is, they provide
a basis for saying that one individual is more favorable or less favorable
than another. If in practice individuals agreed with only a few contiguous
items, so that a given score had a clear meaning, the Thurstone
methods would provide highly satisfactory ordinal scales. But, as noted
above, individuals may agree with items quite widely spaced on the
scale, and in such cases the median of the items checked may not
provide a meaningful basis for ranking the individual in relation to
others.
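
The distinction between agreement in absolute scale values and agreement in rank order can be made concrete with a small sketch. The scale values for the two groups of judges are hypothetical, and the Spearman rank correlation is used here only as one convenient index of rank-order agreement; the studies cited did not necessarily use it.

```python
def ranks(values):
    """Rank values from 1 (lowest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for two equally long lists without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scale values assigned to the same five items by two groups of judges.
group_a = [1.2, 3.0, 5.1, 7.0, 9.4]    # spread fairly evenly along the continuum
group_b = [0.6, 1.1, 1.9, 9.8, 10.5]   # bunched toward the two ends

largest_gap = max(abs(a - b) for a, b in zip(group_a, group_b))
print(spearman_rho(group_a, group_b), round(largest_gap, 1))
```

In this example the two groups order the items identically even though some items differ by several scale points, which is the situation in which a scale can serve as an ordinal but not as an interval measure.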
10 These investigators found that other methods of constructing Thurstone-type
scales were less subject than the equal-interval technique to the effect of extreme
attitudes on the part of the judges. The method of successive intervals showed less
difference between white and Negro judges, and the method of paired comparisons
eliminated the differences almost entirely.
SUMMATED SCALES
CUMULATIVE SCALES
The items used in the Bogardus scale (that is, the column headings
in the illustration above) were selected on logical grounds. It seems
reasonable to expect that an individual who circles 4 in relation to
Chinese, indicating that he would be willing to accept them to employment
in his occupation, would ordinarily also circle 5 and not
circle 6 or 7. (Here, as in other scales, the content of the item must be
taken into account in deciding whether a "Yes" response is to be scored
as favorable or unfavorable. Since 6 and 7 are essentially statements of
exclusion, absence of a circle constitutes the favorable response to
these two items. Thus, neither 6 nor 7 should be circled for a given
group if any of the other numerals is circled.) If the individual did not
circle 3 (willing to admit to my street as neighbors), one would expect,
on logical grounds, that he would also not circle 2 or 1.
On the whole, the assumption that these items form a cumulative
scale has been borne out. Nevertheless, in practice some reversals do
occur. Some individuals, for example, who would object to living in a
building with Puerto Ricans would not object to having Puerto Ricans
in an informal social club (see Deutsch and Collins, 1951). Although
individuals not infrequently show such reversals in replies on the social-
distance scale, it is relatively uncommon to find an entire group revers-
ing items. Thus, the social-distance scale has been used rather effect-
ively in comparing the attitudes of different groups of people toward
various nationalities. It may be noted that reversals can almost always
be interpreted by postulating the intrusion of some factor other than
the individual's own attitude toward the group in question-e.g., the
respondent's image of how other people would interpret his living in
a certain neighborhood, or his expectation concerning the impact on
real estate values of admitting minority group members to residence
on his street, etc.
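
A reversal in the sense used above can be detected mechanically. The sketch below assumes that items 1 through 5 are ordered from the closest relationship (1) to the most distant (5), as the preceding discussion implies, and leaves the two exclusion items (6 and 7) aside; the function name and the sample replies are hypothetical.

```python
def find_reversals(circled):
    """Return pairs (closer_item, more_distant_item) that break the cumulative pattern.

    circled: list of five booleans for items 1..5, ordered from the closest
    relationship to the most distant. A reversal is a closer relationship that
    is accepted while a more distant one is not.
    """
    reversals = []
    for i, closer_accepted in enumerate(circled):
        for j in range(i + 1, len(circled)):
            if closer_accepted and not circled[j]:
                reversals.append((i + 1, j + 1))
    return reversals


cumulative_reply = [False, False, False, True, True]   # circles 4 and 5 only
reply_with_reversal = [False, True, False, True, True]  # accepts 2 but not 3

print(find_reversals(cumulative_reply))
print(find_reversals(reply_with_reversal))
```

An empty list means the reply fits the cumulative pattern; each pair returned names a closer relationship that was accepted together with a more distant one that was refused.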
With the appearance of the Thurstone and Likert scaling methods
in the late nineteen-twenties and early thirties, attention shifted away
from cumulative scales. However, the forties saw a revival of interest
and a rapid development of techniques for determining whether the
items of a scale do in fact have a cumulative relationship, regardless of
whether they appear cumulative in common-sense terms. This renewed
interest was linked to an emphasis on the development of unidimen-
sional scales-that is, scales consisting of items that do not raise issues,
or involve factors, extraneous to the characteristic being measured.
A number of investigators had pointed out that the Thurstone and
Likert scales, although ostensibly measuring "an attitude," contained
statements about various aspects of the object under consideration.
Thus, Carter (1945) pointed out that Form A of the Peterson scale of
attitude toward war (a Thurstone-type scale) had as its most favorable
statement, "War is glorious"; as its most unfavorable statement, "There
is no conceivable justification for war"; and as its mid-point, "I never
think about war and it doesn't interest me." He commented that it is
difficult to think of these statements as falling along a straight line. He
suggested that such statements as, "The benefits of war rarely pay for
its losses even for the victor" and "Defensive war is justified but other
wars are not," belong on two different scales, one having to do with the
economic results of war, the other with the ethics of war activity. It
was argued that combining items referring to different aspects of the
object made it impossible to specify exactly what the scale was measur-
ing, and also accounted for the scattering of responses, which made
it difficult to assign any clear meaning to the score based on the median
of the items checked.
There have been several approaches to this problem. We shall
discuss here only the technique developed by Guttman, commonly
called scale analysis or the scalogram method.13 One of the main purposes
of this technique is to ascertain whether the attitude or characteristic
being studied (technically termed the "universe of content"
or the "universe of attributes") actually involves only a single dimension.
In the Guttman procedure, a "universe of content" is considered
to be unidimensional only if it yields a perfect, or nearly perfect, cumulative
scale-that is, if it is possible to arrange all the responses of any
number of respondents into a pattern of the following sort:
number wanted in the final scale are selected. The items selected are
those which have the highest discriminatory coefficients in their scale
interval; for example, of all the items with scale values between 8.0 and
8.9, those with the highest discriminatory coefficients are selected. An
equal number of items is selected for each interval. (4) The items
in the resulting list are arranged in order of their scale value. The list
is then divided into two equated forms of the questionnaire by assigning
all the odd-numbered items to one form and all the even-numbered
items to the other.
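
Step (4), the division of the ordered list into two equated forms, can be sketched as follows. The item labels and scale values are hypothetical, and the sketch covers only the ordering and the odd-even split, not the earlier selection of items by discriminatory coefficient.

```python
def split_into_forms(items):
    """Split (statement, scale_value) pairs into two forms by odd and even position.

    The items are first arranged in order of scale value; the 1st, 3rd, 5th, ...
    go to one form and the 2nd, 4th, 6th, ... to the other.
    """
    ordered = sorted(items, key=lambda item: item[1])
    form_a = ordered[0::2]  # odd-numbered positions
    form_b = ordered[1::2]  # even-numbered positions
    return form_a, form_b


# Hypothetical selected items, two per scale interval.
selected = [("s1", 8.7), ("s2", 2.3), ("s3", 8.1), ("s4", 5.4), ("s5", 2.9), ("s6", 5.0)]
form_a, form_b = split_into_forms(selected)
print(form_a)
print(form_b)
```

Because an equal number of items was selected for each interval, alternating down the ordered list gives each form items from every interval, which is what makes the two forms roughly comparable.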
The Guttman and related techniques represent major contributions
to the methodology of questionnaire construction and analysis.
However, two qualifications related to the use of unidimensional scales
should be kept in mind: (1) Such a scale may not be the most effec-
tive basis either for measuring attitudes toward complex objects or for
making predictions about behavior in relation to such objects; (2) a
given scale may be unidimensional for one group of individuals but not
for another.
Let us consider the first reservation. Suppose we have devised a
unidimensional scale to measure attitude toward the economic results
of war, another to measure attitude concerning the ethics of war ac-
tivity, still others to measure whatever other aspects of attitude toward
war can be identified and measured by unidimensional scales. No single
one of these scales may give an accurate reflection of an individual's
attitude toward the complex concept "war," or provide a basis for pre-
dicting how he would vote on the question of his country's participa-
tion in a specific war. This is, of course, the same qualification noted
in connection with the discussion of internal consistency in Chapter
5, pages 178-179; a complex measure may be needed as a basis for
predicting complex behavior.
As for the second reservation, it is sometimes assumed that unidimensionality
is a property of a measuring instrument, rather than of
the patterning of an attitude among a given group of individuals. For
one group, a number of items may be arranged unidimensionally in a
given order; for another group, the same items may fall into a different
order; for still another group, they may not form a unidimensional pattern
at all. The way in which the experiences of different groups can
lead to different patternings of items is illustrated in a study by Harding
and Hogrefe (1952). These investigators interviewed three groups of
white department-store employees. The members of Group I worked
in departments in which there was at least one Negro in a job equal
in status to their own, or of higher status than their own; those in
Group II worked in departments in which all the Negroes were in jobs
of lower status than their own; those in Group III were in departments
where there were no Negroes. The interviews included six "social-dis-
tance" questions, having to do with: sitting next to Negroes in buses
or trains, sitting at the same table with a Negro in a lunchroom, taking
a job in which there were both Negroes and white people doing the
same kind of work as you, working under a Negro supervisor, living in
a building in which there were both white and Negro families, and
having a Negro for a personal friend. The investigators found that
these six questions formed satisfactory Guttman-type scales for each
of the three groups, but that the question about taking a job in which
there were both Negroes and white people doing the same kind of
work as the respondent fell in a different position for each of the three
groups. For Group I-the people who were actually in this situation-
this question tied with the one about buses and trains for the "most
acceptable" position. For Group II-those working in departments
with Negroes, but in positions of unequal status-sitting next to
Negroes in trains and buses was more acceptable than working with
them on an equal status. For those in all-white departments, both
sitting next to Negroes in buses and trains and sitting at the same
table with a Negro in a lunchroom were more acceptable than working
with them on an equal status.
In other words, as Coombs (1948) has pointed out:
. . . in a highly organized social order with standardized
education, there will tend to be certain traits generated which
will be common to the population subjected to the same pattern
of forces. There is, however, at the same time, opposition, con-
tradiction, and interaction of these forces on organisms that are
not equally endowed in the first place-with the result that the
structuring of a psychological trait is less complete in some
individuals than in others . . . . A psychological trait, in other
words, may or may not be a functional unity and it may or may
not be general, i.e., common to a large number of individuals.
control group of individuals who did not receive therapy but who
were tested several times showed a stability coefficient of .86 computed
on the basis of two administrations of the instrument at least six
months apart. Comparison of the "adjustment" scores with therapists'
ratings of the success of therapy showed considerable, though by no
means complete, agreement.
Although Q-sorts have been used most often in studies such as
this, where the emphasis is on self-image and other person-images, the
method is applicable to the study of other attitudes. Subjects might be
presented with sets of statements about methods of child-rearing, or
about labor-management relations, or about Negroes, and asked to
sort them in terms of the extent of their agreement or disagreement
with each statement. The resulting data might be used, for example,
to compare a given individual's views about different ethnic groups,
or to compare the views of different individuals about a given group.
Key
1. Taft               5. Truman
2. Policy in China    6. Atom bomb
3. Socialism          7. United Nations
4. Stalin             8. Eisenhower
A Concluding Note
In this chapter we have discussed various methods of scaling, that
is, of distinguishing among objects or individuals in terms of the degree
to which they possess a given characteristic. Here, as in connection
with other measurement techniques, we have raised questions about
reliability and validity. Although many users of scales have investigated
the reliability and validity of their measures, it is probably still true,
as one writer (Ferguson, 1957) has remarked, that there has been
Tabulation
Statistical Analysis of Data
Inferring Causal Relations
Racial groups
1. Negroes: mentioned __; not mentioned __
2. Other racial groups: mentioned __; not mentioned __
Nationality groups
3. Irish: mentioned __; not mentioned __
4. Italians: mentioned __; not mentioned __
5. Other nationality groups: mentioned __; not mentioned __
Religious groups
6. Jews: mentioned __; not mentioned __
7. Catholics: mentioned __; not mentioned __
8. Protestants: mentioned __; not mentioned __
9. Other religious groups: mentioned __; not mentioned __
10. Mention of groups listed above
Mentioned one or more __; mentioned none, but mentioned
other human grouping(s) __; mentioned none,
stating "I don't know" __; did not answer question __
Notice that, although we are dealing with the answers to only one
question, our specific research interests have led us to provide ten
sets of categories in which to classify the respondents. Every respondent
can be placed in one of the two categories of each of the first nine sets
and in one of the four categories of the final set.
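
The structure of this code (nine two-category sets and one four-category set) can be sketched as a small routine that records every category explicitly. The function, the dictionary layout, and the short labels for the residual categories are assumptions made for illustration; the judgment of whether a group was "mentioned" still rests with the coder.

```python
TWO_CATEGORY_SETS = [
    "Negroes", "Other racial groups", "Irish", "Italians",
    "Other nationality groups", "Jews", "Catholics", "Protestants",
    "Other religious groups",
]
# Short labels (assumed) for the three set-10 categories that apply when
# none of the listed groups was mentioned.
RESIDUAL_CATEGORIES = ("other human grouping(s)", "don't know", "no answer")


def code_response(mentioned, residual=None):
    """Return an explicit entry for each of the ten sets of categories.

    mentioned: set of labels from TWO_CATEGORY_SETS that the coder judged
    to have been mentioned by the respondent.
    residual: required only when nothing was mentioned; one of RESIDUAL_CATEGORIES.
    """
    unknown = set(mentioned) - set(TWO_CATEGORY_SETS)
    if unknown:
        raise ValueError(f"not in the code: {sorted(unknown)}")
    record = {
        group: "mentioned" if group in mentioned else "not mentioned"
        for group in TWO_CATEGORY_SETS
    }
    if mentioned:
        record["set 10"] = "mentioned one or more"
    else:
        if residual not in RESIDUAL_CATEGORIES:
            raise ValueError("a residual category is required when none is mentioned")
        record["set 10"] = "mentioned none: " + residual
    return record


print(code_response({"Irish", "Jews"}))
print(code_response(set(), residual="don't know"))
```

Every respondent thus receives exactly ten entries, one per set, and a "not mentioned" is written out rather than left to be inferred from a blank.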
Since the failure to check a specific "mentioned" may be taken to
imply the corresponding "not mentioned," the appearance of the
code can be simplified by omitting all of the "not mentioned" categories;
but each of the first nine sets will still remain a two-category set,
one category indicated by a check mark and the other by the absence
of a check mark. Such simplification is, however, not always wise. The
failure to check a particular "mentioned" category may represent an
oversight in coding. Such oversights can be discouraged (and certainly
made detectable) by requiring that the placement of a respondent in
a "not mentioned" category should call for as positive an act as his
placement in a "mentioned" category. Also, to anticipate a bit, when
coding such data for machine tabulation, it is highly desirable that all
categories should have an explicit identification. The machines in most
belong together, and then ask himself what led him to feel that those
he has placed in a single group are alike. Thus, for example, Chein
et al. (1952), in a study of views of prominent Jewish educators and
group workers about a number of issues in Jewish education, found it
appropriate first to sort their respondents into groups that could be
characterized in terms of their total outlook on the meaning of being
Jewish. The investigator may find that he has grouped his cases on the
basis of common characteristics; he may then examine them to see
whether those who have similar characteristics have undergone similar
experiences. Or he may find that his grouping is on the basis of similar
experiences; he may then re-read the cases to see if these similar ex-
periences seem to have led to similar consequences.
Another approach that may stimulate the formulation of working
hypotheses is to note matters that seem surprising in view of either
common-sense or theoretical expectations, and then to search for
There are many things that may operate to make the judgments of
coders 3 unreliable. These factors may arise from the data to be categorized,
from the nature of the categories that are to be applied, from
the coders themselves, etc. Let us briefly consider some of these factors
and possible safeguards against them.
DIFFICULTIES ARISING FROM THE DATA. Many of the difficulties that
occur in coding result from the inadequacies of the data. Frequently,
2 For a general discussion of problems of reliability, see Chapter ?
3 We are speaking here of persons who code the data after they have been
collected, not of respondents or data collectors.
CODING: THE CATEGORIZATION OF DATA 403
the data do not supply enough relevant information for reliable coding.
This may be the consequence of inadequate data-collection procedures
-poorly worded questions, untrained observers, etc. Perhaps more
often, however, the difficulties are of a sort that can easily be corrected
by careful editing of the data.
When the interviewer or the observer hands in his material, the
possibility of eliminating many potential coding difficulties still exists.
A careful examination of the data as soon as they are collected and, if
necessary, a systematic questioning of the interviewer or observer will
avert many coding problems. The process of scrutinizing the data to
improve their quality for coding is commonly called editing.4
Not only does editing help to avoid later coding problems; it may
also markedly improve the quality of data collection by calling atten-
tion to points at which the interviewers or observers have misunder-
stood instructions, are not recording data in sufficient detail, etc. To
serve this function, editing should be done in the course of pretesting
the interview or observation schedule and of training the interviewers
or observers, as well as throughout the period of data collection. In
any case, if editing is to remove coding problems, it must be done
while the interviewers or observers are still available for questioning.
Each interview or observation schedule should be checked for:
1. Completeness. All items should be filled in. A blank next to a
question in an interview schedule may mean "don't know," "refused
to answer," or that the question was not applicable, that the question
was omitted by mistake, etc. For many purposes, it is important to be
able to distinguish among these potential meanings.
2. Legibility. If the coder cannot decipher the handwriting of
the interviewer or observer, or the abbreviations and symbols he
employs, then coding is impossible. It is a simple matter to check for
legibility when the material is handed in and to have it rewritten if
necessary, but it is often extremely time-consuming to have the coder
attempt to decipher the handwriting or to track down the interviewer
once coding has begun.
3. Comprehensibility. Frequently a recorded behavior or response
seems perfectly comprehensible to the interviewer or observer but is
4 For a detailed discussion of the process of editing in large-scale surveys,
consult Parten (1950; Chapter 13).
404 ANALYSIS AND INTERPRETATION
Tabulation
is also quite rapid if small cards are used; 1,000 cards can be counted in
less than five minutes, if good technique is employed. At such a rate
of speed, sorting and counting by hand is likely to rival, for short
6 See her Chapter 15 for a detailed discussion of tabulation procedures.
The use of the mode dispenses with the idea of a point of balance.
In fact, it is possible to have several modes in one distribution; for
there are obviously two possibilities: either the two populations are
alike, or they are different. Suppose that our samples from the two
populations are different on a particular measure or attribute. Clearly,
this would be likely to happen if the two populations from which the
samples are drawn do in fact differ on that attribute. However, it does
not in itself constitute evidence that they do differ, since there is always
the possibility that the samples do not correspond exactly to the
populations they are intended to represent. We must consider the pos-
sibility that the element of chance which is involved in the selection
of a sample may have given us samples which differ from each other
even if the two populations do not differ. Thus the crucial question
is: Is it likely that we would have come up with samples that differ to
this extent if the two populations were actually alike? This is the
question the test of the null hypothesis answers; it tells us what the
chances are that two samples differing to this extent would have been
drawn from two populations that are in fact alike.14 Only if the
statistical test indicates that it is improbable that two samples differing
to this extent could have been drawn from similar populations can
we conclude that the two populations probably differ from each other.
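
The logic of the question just posed can be made concrete with an exact permutation (randomization) test, which is one of several tests that embody it and is not necessarily the one a given investigator would choose. The sketch pools two small samples, forms every possible division of the pooled cases into groups of the original sizes, and reports the proportion of divisions whose difference in means is at least as large as the one observed. The data and the function name are hypothetical.

```python
from itertools import combinations


def exact_permutation_p(sample_a, sample_b):
    """Proportion of all regroupings whose mean difference is at least the observed one.

    Under the null hypothesis that the two populations are alike, every division
    of the pooled cases into groups of these sizes is treated as equally likely.
    Practical only for small samples, since every division is enumerated.
    """
    pooled = list(sample_a) + list(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    total = sum(pooled)

    def abs_mean_difference(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_difference(sum(sample_a))
    as_extreme = 0
    divisions = 0
    for indices in combinations(range(len(pooled)), n_a):
        divisions += 1
        sum_a = sum(pooled[i] for i in indices)
        # Small tolerance guards against floating-point rounding in the comparison.
        if abs_mean_difference(sum_a) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / divisions


# Hypothetical scores for two small samples.
first_sample = [1, 2, 2, 3]
second_sample = [4, 5, 5, 6]
print(exact_permutation_p(first_sample, second_sample))
```

A small proportion means that a difference this large would seldom arise from chance divisions alone, which is the condition under which the null hypothesis is rejected; a large proportion means only that the hypothesis remains tenable.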
Suppose, however, that our findings show no difference between
the two samples; Jet us say that in our samples, both rural and urban
Englishmen attend the movies, on an average, two and one-half times
a month. Can we then concluder that the total populations of rural
and urban Englishmen are alike in frequency of movie-going? Not
with any certainty. Just as there. is the possibility that samples may
diller when the populations are alike, so there is the possibility that
samples may be alike when in fact the populations differ. Thus, if our
two samples are alike, all we can conclude is that we have no evidence
that the populations differ-in other words, that the idea that the
two populations are alike is tenabie. I
But to go back to the case where the two samples differ. We can
affirm that the two populations they represent probably differ if we
14 It should be kept in mind that all statistical tests of significance, and thus
all generalizations from samples to populations, rest on the assumption that the
samples are not biased-that is, that the cases to be included in the samples have
been selected by some procedure that gives every case in the population an equal,
or at least a specifiable, chance of being included in the sample. If this assumption
is not justified, significance tests become meaningless.
can reject the null hypothesis-that is, if we can show that the obtained
difference between the two samples would be unlikely to appear if the
two populations were in fact the same. It is, however, in the nature of
probability that even highly improbable events can sometimes happen.
Thus, we can never be absolutely certain of our generalizations to the
total population. Whenever we reject the null hypothesis, there is some
chance that we are wrong in doing so.
However, since we are always dealing with inferences and prob-
abilities, there is always also some chance that if we accept the null
hypothesis, we are wrong in doing so. That is, even if our statistical test
indicates that the sample differences might easily have arisen by chance
even if the two populations are alike, it may nevertheless be true that
the populations differ.
In other words, we are always confronted with the risk of making
one of two types of error. We can reject the null hypothesis when, in
fact, it is true; that is, we may conclude that there is a difference be-
tween the two populations when, in fact, they are alike. This is com-
monly referred to as the Type I error. Or, on the other hand, we can
accept the null hypothesis as tenable when, in fact, it is false; that is,
we may conclude that the two populations are alike when, in fact,
they are different. This is referred to as the Type II error.
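The two kinds of error can be illustrated by simulation. The sketch below is not taken from the text: it assumes normally distributed measures with a known standard deviation, a two-sided z-test, and a rejection criterion called `alpha` (all hypothetical choices). It draws many pairs of samples, first from populations that are alike and then from populations that differ, and counts how often each kind of error occurs.

```python
import random
from statistics import NormalDist, mean


def rejects_null(sample_a, sample_b, alpha, sigma=1.0):
    """Two-sided z-test for a difference in means (sigma assumed known)."""
    standard_error = sigma * (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def error_rates(alpha, true_difference=0.5, n=30, trials=2000, seed=1):
    """Return (Type I rate, Type II rate) estimated by repeated sampling."""
    rng = random.Random(seed)

    def draw(mu):
        return [rng.gauss(mu, 1.0) for _ in range(n)]

    # Type I: the populations are alike, yet the test rejects.
    type_1 = sum(
        rejects_null(draw(0.0), draw(0.0), alpha) for _ in range(trials)
    ) / trials
    # Type II: the populations differ, yet the test does not reject.
    type_2 = sum(
        not rejects_null(draw(0.0), draw(true_difference), alpha)
        for _ in range(trials)
    ) / trials
    return type_1, type_2


results = {alpha: error_rates(alpha) for alpha in (0.05, 0.01, 0.001)}
for alpha, (type_1, type_2) in results.items():
    print(f"criterion {alpha}: Type I rate {type_1:.3f}, Type II rate {type_2:.3f}")
```

Under these assumptions, a stricter criterion lowers the share of Type I errors and raises the share of Type II errors.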
The risk of making the Type I error is determined by the level of
significance we accept in our statistical testing. Thus, if we decide that
we will conclude that the populations truly differ whenever a test of
significance shows that the obtained difference between two samples
would be expected to occur by chance not more than 5 times in 100
if the two populations were in fact alike, we are accepting 5 chances
in 100 that we will be wrong in rejecting the null hypothesis. We can
reduce the risk of a Type I error by making our criterion for rejecting
the null hypothesis more extreme; for example, by rejecting the null
hypothesis only if the statistical test indicates that the sample dif-
ference might have appeared by chance only once in a hundred times,
or once in a thousand times, or once in ten thousand times. Un-
fortunately, however, the chances of making Type I and Type II errors
are inversely related. The more we protect ourselves against the risk
of making a Type I error (that is, the less likely we are to conclude
that two populations differ when in fact they do not), the more likely
SPURIOUS RELATIONSHIPS
TABLE 1A
(Hypothetical data)
                                  % making
                          No.     qualifications
Low interaction           107          64
Medium interaction        101          72
High interaction          140          77
Total                     348          72
TABLE 1B
(Hypothetical data)
                                                   Students from Other
                        Students from Europe       Parts of the World
                               % making                   % making
                        No.    qualifications      No.    qualifications
Low interaction          17         84              90         63
Medium interaction       45         78              56         68
High interaction         90         82              50         68
Total                   152         79             196         65
table, one would conclude that association with Americans does not
lead to qualification of generalized statements about them, and that
the observed relation between these two variables stemmed from the
fact that European students were both more likely to associate with
Americans and more likely to qualify their statements about them.
The "No." columns show more than half the European students
(90 out of 152) as scoring high on interaction with Americans,
and almost half the non-European students (90 out of 196) as scoring
low. The percentages in the "Total" row show European students as
more likely to qualify their statements; according to these hypothetical
figures, 79 per cent of them did so, compared with 65 per cent of the
non-Europeans. The columns headed "Percentage making qualifica-
tions" indicate that neither among the European nor among the non-
European students did the extent of interaction with Americans make
any marked difference in the frequency with which students qualified
their generalized statements. In other words, on the basis of such a
pattern of findings, one would conclude that the relationship between
amount of interaction with Americans and the likelihood of qualifying
generalizations about Americans was spurious.
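The reasoning can be checked by recomputing the pooled percentages from the Table 1B figures. The sketch below is illustrative only; the variable and function names are hypothetical, and because the published percentages are rounded, the pooled values agree with Table 1A only to within a point or two.

```python
# (number of students, per cent making qualifications), as given in Table 1B.
table_1b = {
    "Europe": {"low": (17, 84), "medium": (45, 78), "high": (90, 82)},
    "Other": {"low": (90, 63), "medium": (56, 68), "high": (50, 68)},
}
levels = ("low", "medium", "high")


def pooled_percentage(level):
    """Per cent making qualifications at one interaction level, all students."""
    total = sum(table_1b[group][level][0] for group in table_1b)
    qualifiers = sum(
        table_1b[group][level][0] * table_1b[group][level][1] / 100
        for group in table_1b
    )
    return 100 * qualifiers / total


def spread(values):
    """Largest minus smallest percentage across the interaction levels."""
    return max(values) - min(values)


pooled = [pooled_percentage(level) for level in levels]
within = {
    group: [table_1b[group][level][1] for level in levels] for group in table_1b
}

print("all students:", [round(value, 1) for value in pooled])
for group, percentages in within.items():
    print(group, percentages, "spread:", spread(percentages))
```

In the pooled figures the percentage rises steadily with interaction; within each group the percentages vary less and show no steady rise, which is the pattern the text describes as spurious.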
TRACING THE PROCESS INVOLVED IN A RELATIONSHIP
TABLE 2A
                                               No.    Mean "intimacy" score
Students in small colleges                      77           2.43
Students in non-metropolitan universities      139           2.17
Students in metropolitan universities          132           1.69
TABLE 2B
                          Students in "Low         Students in "Medium      Students in "High
                          Interaction-potential"   Interaction-potential"   Interaction-potential"
                          Living Arrangements      Living Arrangements      Living Arrangements
                                 Mean                     Mean                     Mean
                                 "intimacy"               "intimacy"               "intimacy"
                          No.    score             No.    score             No.    score
Students in small
  colleges                  7    1.43               20    2.45               50    2.56
Students in non-
  metropolitan
  universities             38    1.46               40    1.93               61    2.77
Students in metro-
  politan universities     64    1.16               34    1.79               34    2.59
All students              109    1.28               94    1.99              145    2.65
apparent causal relationship between X and Y to be spurious.
SPECIFICATION OF A RELATIONSHIP
TABLE 3B
(Hypothetical data)
                                                Percentage      Percentage
                                                Earning Less    Earning $3000
                                      Number    than $3000      or More
NEGRO WORKERS
  Graduated from high school            100         80              20
  Did not graduate from high school     100         90              10
  Total                                 200         85              15
WHITE WORKERS
  Graduated from high school            400         20              80
  Did not graduate from high school     400         80              20
  Total                                 800         50              50
tionship between education and income for Negroes than for whites.
For whites, it is somewhat higher than in the original relationship; for
Negroes it is considerably lower. Thus, the breakdown of the original
relationship by race has helped to specify some of the conditions
under which it is more pronounced and some of the conditions under
which it is less pronounced.
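The comparison can be made concrete with the Table 3B figures. The sketch below uses hypothetical names and measures the relationship as a simple percentage difference. The table of the original relationship is not reproduced here, so the pooled figures are reconstructed on the assumption that Table 3B breaks down the same 1,000 workers.

```python
# (number of workers, per cent earning $3000 or more), as given in Table 3B.
table_3b = {
    "Negro workers": {"graduated": (100, 20), "did not graduate": (100, 10)},
    "White workers": {"graduated": (400, 80), "did not graduate": (400, 20)},
}


def percentage_difference(rows):
    """Graduates minus non-graduates, in per cent earning $3000 or more."""
    return rows["graduated"][1] - rows["did not graduate"][1]


def pooled_rows(table):
    """Combine the groups to recover the relationship before the breakdown."""
    pooled = {}
    for status in ("graduated", "did not graduate"):
        number = sum(table[group][status][0] for group in table)
        earners = sum(
            table[group][status][0] * table[group][status][1] / 100
            for group in table
        )
        pooled[status] = (number, 100 * earners / number)
    return pooled


original = percentage_difference(pooled_rows(table_3b))
print("all workers:", original)
for group, rows in table_3b.items():
    print(group + ":", percentage_difference(rows))
```

On these figures the difference is 50 percentage points for all workers, 60 for white workers, and 10 for Negro workers, which matches the pattern described in the text.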
When I walk through the project, many a Negro man shouts from
across the street, 'Hello, Helen.' "
So far, the excerpt has mainly an illustrative function. Following
a table showing the frequency of such changes in the course of time,
this vivid account would probably lead all readers to assume that race
relations were considerably improved by living in the project, without
any awareness of qualifications other than those indicated in the table.
However, the excerpt continues: " ... 'Hello, Helen.' [a pause]. Of
course, I'd faint if they did this to me in the main street in front of
everybody."
This afterthought introduces an important qualification of the
results. It suggests that the relations of the white tenants to the Negro
tenants are only in part a consequence of their own beliefs and feelings
about Negroes; in part they are also a consequence of the white tenants'
perception of social approval or disapproval of interracial association.
Within the housing project, the white woman in the illustration ap-
parently perceived approval of such association. Outside the project,
"on the main street," she apparently perceived disapproval. Evidently
her behavior toward Negroes varied accordingly. The illustration sug-
gested not only a qualification of the findings but a lead for further
research.
In the example given, the implication is so dramatically obvious
that no analyst who examined the raw data would be likely to overlook
it. In other cases, suggestions for further research are not so obvious;
they must be deliberately sought in the raw data. Although the con-
firmation or refutation of hypotheses requires that a study be set up
with these hypotheses in mind, the purpose of discovering promising
leads for investigation is often served best by the painstaking inspec-
tion of nonquantified data.
12
THE RESEARCH REPORT
3. The results.
4. The implications drawn from the results.
This list provides the major headings for an outline of the research
report. Each point is discussed in more detail below.
It was pointed out in Chapter 2 that the first step in the research
process is a precise formulation of the question to be investigated.
Ordinarily, the research report also starts with this statement of the
issue on which the study was focused. Enough background should be
given to make clear to the reader why the problem was considered
worth investigating. Since a social science audience is likely to be more
interested in contributions to general knowledge of human behavior
than in the solution of a specific practical problem, the report to such
an audience usually stresses the relevance of the investigation to some
aspect of psychological or sociological theory.3 For example, a study
undertaken at the request of an institution to ascertain reactions to a
proposed change in personnel policy may be planned and carried out
in such a way that it provides evidence on the manner in which an
individual's role within the organization influences his perception of
the new policy. The report to a social science audience would quite
properly stress this latter aspect rather than the concrete issue with
which the specific institution is concerned.
However, it should be recognized that not all studies have a direct
bearing on theoretical issues, and that the relevance of a study to some
theoretical point may become apparent only as one seeks to under-
stand the findings. At the present time, in the social sciences, many
studies are of necessity carried out without the guidance of a systematic
theory. When this is the state of affairs, there is no reason to disguise
it; attempts to invent theoretical relevance usually strike the reader
as pretentious.
In addition to indicating the practical or theoretical importance of
the question investigated, the statement of the problem should include
3 For a discussion of the relation between research and theory, see Chapter 2,
pages 44-47, and Chapter 14.
a brief summary of other relevant research, so that the study may be
seen in context; the hypotheses of the study, if any were formulated;
and definitions of the major concepts employed (see Chapter 2). The
connections among these elements should be made clear; that is, the
logical sequence of ideas leading from the existing theory and relevant
research findings to the hypotheses and concepts of the study should
be explicitly indicated.
THE RESULTS
DISCUSSION OF IMPLICATIONS
THE SUMMARY
mean? Could the point be expressed more simply? Does the material
given in the tables justify the conclusions I have drawn? Do the various
points fit together logically?
Having at least one colleague read the report just before the final
revision is extremely helpful. Sentences that seem crystal-clear to the
writer may prove quite confusing to other people; a connection that had
seemed self-evident may strike others as a non sequitur. A friendly
critic, by pointing out passages that seem unclear or illogical, and per-
haps suggesting ways of remedying the difficulties, can be an invaluable
aid in achieving the goal of adequate communication.
13
THE APPLICATION OF
SOCIAL RESEARCH
have been left up to the principals of the various schools in the system,
the superintendent of schools may be able to point out to the investi-
gator schools which are similar in such matters as general socio-
economic level of the students, range of scholastic ability, and caliber
of teachers, but which differ in the independent variable with which
the research is concerned-the basis on which students are grouped.
Or, if the existing situation does not contain arrangements that so
neatly fit the research requirements, the superintendent may arrange
to have homogeneous classes set up in some schools and heterogeneous
ones in others, or to have some homogeneous and some heterogeneous
classes within the same school.
The agency personnel can inform the investigator whether data
needed for the study are available in records compiled in the course of
the agency's regular operations, or they may be able to suggest ways
in which the collection of data can be worked into on-going procedures.
They can advise the investigator whether his proposed design or
methods of data collection are likely to be feasible. For example, an
investigator planning a study of street-corner gangs discussed his re-
search plan with a group worker who had had experience with such
gangs. The plan called for administering Rorschach tests to the gang
members. The group worker predicted that it would be impossible to
persuade the boys to respond to the Rorschach, and suggested that
the investigator try to devise ways of getting the information he needed
by means of participant observation. At times, action personnel, view-
ing the situation from their particular vantage point, may overestimate
or underestimate difficulties; therefore the investigator should try to
become sufficiently familiar with the situation to make his own estimate
of the feasibility of his proposed procedures. Nevertheless, the practi-
tioner may be an invaluable source of advice concerning resources and
possible obstacles.
The action personnel may participate in carrying out the experi-
mental manipulations or in gathering the data for the study. However,
their participation in analysis of the data is likely to be limited, except
in the case of studies set up as self-surveys and requiring only the
simplest kind of analysis. 2
2 For a discussion of self-surveys, see Wormser and Selltiz (1951a and b).
SOME PRACTICAL PROBLEMS
Collaboration does not always run smoothly, nor are the recipients
of research findings always eager to accept and act on them-even
though they may have requested the investigation in the first place.
Even if the organization has requested the study, this request may not
represent a consensus on the part of all the relevant people or interest
groups within the organization. One individual, or a few, in a central
position, may have made the decision. There may be others who are
not in favor of the research, and these may include people whose help
is needed in carrying out the study or who will have some responsibility
for applying the findings. Some of them may believe that no problem
exists; others may recognize a problem but think that research cannot
contribute to its solution; others may feel that the nature of the people
or groups sponsoring the research is such that the study will inevitably
lead to conclusions detrimental to their interests; still others may be
sure that they know the solution and so consider research unnecessary.
For example, the principal and board of trustees of a private school
became concerned about the school's apparent lack of effect on the
character of its students. They asked a research organization to under-
take an exploratory study to evaluate whether there was, in fact, a
discrepancy between the goals of the school and the outcomes in terms
of the students' character and, if so, to identify possible reasons and
suggest courses of action that might make achievement of the goals
more likely. The plan of the study called for interviews with faculty,
parents, and students. When the interviews with faculty members
were started, it rather quickly became apparent that the faculty were
antagonistic to the study. They saw the school's accomplishments as
more nearly in keeping with its goals than did the principal and
trustees. Moreover, to the extent that they recognized the results as
falling short of what was desired, they held that the main source of
the difficulty was their own low pay and heavy work schedule. In their
opinion, the money spent for research might better have been used to
increase their salaries. Several meetings with the entire group and with
influential individuals were necessary to overcome their opposition.
In other situations, the relevant people may recognize a problem
and believe that research can contribute to its solution, but some of
may see the research as involving an evaluation of how well they are
performing their functions; they may fear that possible negative evalua-
tion will be used against them. In a work situation, for example, they
may worry that evidence collected in the course of the research will
be used as a basis for determining promotions or dismissals. Even
where no such objective consequences are likely, people may be un-
comfortable at feeling themselves "under the microscope." The chair-
man of a P.T.A. in which attendance at meetings has been declining
may ask a social scientist for help in finding out the reasons for the
decline and possible ways of reviving interest; but even though she has
asked for the help research can give, she may be apprehensive that
the findings will point to inadequacies in her performance as chair-
man.
One final source of resistance is especially likely to come into
play at the stage of applying the findings: the reluctance to change
accustomed ways of doing things. Application of research findings
often implies doing something differently, in a way that will pre-
sumably be more effective. But ordinarily it is easier to keep on doing
things as we have been doing them; inertia may lead to rejection of
the research findings or of their implications for action. A dramatic
example from medical research illustrates both this source of resistance
and the one discussed immediately above-resentment of criticism.
Beveridge (1950) recalls the fate of Semmelweis after he had dis-
covered the origin of puerperal fever:
. . . he instituted a strict routine of washing the hands . . . be-
fore the examination of the patients. As a result of this procedure,
the mortality from puerperal fever in the first obstetric clinic
of the General Hospital of Vienna fell immediately from twelve
per cent to three per cent, and later almost to one per cent. His
doctrine was well received in some quarters and taken up in some
hospitals, but such revolutionary ideas, incriminating the obste-
tricians as the carriers of death, roused opposition from en-
trenched authority and the renewal of his position as assistant was
refused.
INTERIM REPORTS
In preparing both the interim reports and the final report to the
collaborating organization or to others who are expected to act on the
findings, the investigator must again remember that the main purpose
of the report is to communicate to the audience. And this audience
is quite different from the social scientists for whom technical reports
are written. As a rule, the collaborators or the other persons who will
apply the findings constitute a lay audience as far as the theory and
methods of social science are concerned, but in the area of their
activities they are experts, and their concrete knowledge of the specific
situation being studied usually exceeds that of the social scientist. In
reporting to them, the investigator must take into account the areas in
which they are experts and those in which they are not.
A report to this audience usually starts with a formulation of the
role will be better if the shift is made deliberately and of his own
volition, rather than accidentally or in response to demands and pres-
sures. It is not within the scope of this text to offer specific suggestions
for the social scientist in his consultant role. 7 However, one cardinal
rule can be presented: He should not make recommendations prior to
full discussion of the features peculiar to the new situation. In con-
ferences with the persons who have sought his advice, he will usually
find it most helpful to recount the principal characteristics of his
investigation and to ask in return for the essential elements of the
problem being brought to him. Such conferences bring into the open
both the potentialities and the inevitable risks involved in application
of the research results to a new situation. In this process, the agency
under whose auspices the research findings are to be applied becomes
a full partner to the venture, and the responsibilities for the outcome
of the application are shared by both agency and consultant. It is the
consultant's task to specify the conditions under which his original
findings obtained; it is the agency's task to examine with him differ-
ences in conditions in the new situation. To weigh the importance of
such differences is the combined task of both.
Those who refuse to go beyond facts rarely get as far as facts.
T. H. HUXLEY
B. An Introduction to Sampling
ESTIMATING FOR A STUDY
TIME BUDGET 2
AN EXPLORATION OF THE IMPACT OF A WORKSHOP FOR TEACHERS
(The figures indicate working days)
Operation    Study Director    Consultant    Secretary
Formulation 3 3
Negotiations with workshop organizers 1
Development of interview schedule 3 1 1
20 intensive interviews 15 6
Analysis and writing, draft 8 1 2
Editing and final report 1 5 2
Miscellaneous administrative and clerical work 3
Total working days* 33 11 12
* There are about 230 to 250 working days in a year, depending on vacation
arrangements.
factory, all the households in a particular city district, all the boys in
a given community under sixteen years of age who are stamp collectors,
all the case records in a file.
By certain specifications, one population may be included in
another. Thus, the population consisting of all the men residing in
the United States is included in the population consisting of all the
people who live in the United States. In such instances, we may refer
to the included population as a sub-population, a population stratum,
or simply as a stratum (pl. strata). A stratum may be defined by one
or more specifications that divide a population into mutually exclusive
segments. For instance, a given population may be subdivided into
strata consisting of males under twenty-one years of age, females under
twenty-one years of age, males from twenty-one through fifty-nine
years, etc. Similarly, we may specify a stratum of the American popula-
tion consisting of white, male, college graduates who live in New
England and who have passed their seventy-fifth birthday; or we may
have some reason for regarding this group of individuals as a popula-
tion in its own right-that is, without reference to the fact that it is
included in a larger population.
A single member of a population is referred to as a population
element. We often want to know how certain characteristics of the
elements are distributed in a population. For example, we may want
to know the age distribution of the elements or we may want to know
the proportion of the elements who prefer one political candidate to
another. A census is a count of all of the elements in a population
and/or a determination of the distributions of their characteristics,
based on information obtained for each of the elements.
It is generally much more economical in time, effort, and money
to get the desired information for only some of the elements than for
all of them. When we select some of the elements with the intention
of finding out something about the population from which they are
taken, we refer to that group of elements as a sample. We hope, of
course, that what we find out about the sample is true of the popula-
tion as a whole. Actually, this may or may not be the case; how
closely the information we receive corresponds to what we would
find by a comparable census of the population depends largely on the
way the sample is selected.
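The distinction between a census and a sample can be sketched in a few lines. Everything in the sketch is hypothetical: the population of 10,000 elements, the two candidate names, and the sample size of 100. The census value is computed from every element, while the sample value is computed from elements drawn by simple random selection.

```python
import random

rng = random.Random(42)

# Hypothetical population: each element is the candidate that person prefers.
population = ["Jones"] * 5500 + ["Smith"] * 4500


def proportion_for(candidate, elements):
    """Share of the given elements that prefer the candidate."""
    return sum(element == candidate for element in elements) / len(elements)


# A census uses information obtained for every element.
census_value = proportion_for("Jones", population)

# A sample uses only some of the elements, here drawn at random.
sample = rng.sample(population, 100)
sample_value = proportion_for("Jones", sample)

print(f"census: {census_value:.2f}   sample of {len(sample)}: {sample_value:.2f}")
```

The sample value will usually be close to the census value but need not equal it; how close it comes depends on how the sample was drawn and on its size.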
For example, we may want to know what proportion of a popula-
tion prefers one candidate to another. We might ask one hundred
people from that population which candidate they prefer. The propor-
tion of the sample preferring Mr. Jones may or may not be the same
as the corresponding proportion in the population. For that matter,
even the actual distribution of votes in an election may not correctly
represent the distribution of preferences in the population. Unless
there is a 100 per cent turnout, the actual voters constitute only a
sample of the population of people eligible to vote. A very high propor-
tion of the people who prefer Mr. Smith may be overconfident with
respect to their candidate's chances and neglect to come to the polls; or
they may be living in a rural area and be discouraged from coming to
the polls by a heavy downpour. The election results may properly
determine which candidate will take office, but they will not neces-
sarily indicate which candidate is preferred by a majority of the popu-
lation. 2 Similarly, the early returns in an election may be taken
2 It has been a common practice to predict the outcome of an election on the
basis of a pre-election sample survey which, at best, answered only the question of
preferences. The results have occasionally been disastrous. The fiascos are by no
means attributable simply to the failure of the samples to represent the distribution
of preferences in the population at the time the polls were taken. In one instance
(the presidential election of 1948), the pre-election surveys showed that a large pro-
portion of people were undecided, and there are clear indications that an unantici-
pated consolidation of opinion in this group helped to confound the predictors. As
already indicated in the text, the fact that different proportions of those who prefer
different candidates may actually vote complicates the translation of preference
estimates into election forecasts.
There are also measurement problems involved. Preferences measured one way
may or may not correspond to preferences measured another way. Thus, behavior in
the voting booth does not necessarily correspond to preferences expressed to an
interviewer. The former is generally accepted at face value as the more valid measure,
but we have no certainty that this is the case. A housewife, for instance, may follow
her husband's preference rather than her own, at the last moment, and it is possible
that there may be enough such instances to materially affect the outcome of an
election; similarly, other kinds of subjectively felt pressures or momentary impulses
may take effect in the election booth. Practical politicians seem to feel that the
position of their candidate's name on the ballot affects his chances, as do the names
of other candidates running for other offices on the same ticket; such effects may
have bearing on voting behavior without affecting preferences.
Further complications arise from the gerrymandering of election districts and
other factors (e.g., the electoral college system), which have the effect of giving
different voters different weights in determining the outcome of an election. Perhaps
the moral of this footnote will be clear: The usefulness of findings obtained from a
sample may depend in large measure on factors which are extraneous to the sam-
pling issues per se. Nor is it easy to draw a hard-and-fast dividing line between the
factors which are extraneous and those which are not. Thus, what is extraneous to
the sampling of one population (e.g., eligible voters) may be intrinsic to the
sampling of another (e.g., actual voters); the ambiguity arises when we sample one
population with the intention of learning something about the other.
of the time; or within any other limits of accuracy and any assigned
probability. In practice, of course, we do not repeat the same study on
an indefinite number of samples drawn from the same population. But
our knowledge of what would happen in repeated studies enables us to
say that, with a given sample, there is say a 90 per cent probability that
our figures are within 5 percentage points of those that would be shown
by a census of the total population using the same measures. Having
set our level of aspiration for accuracy and confidence in the findings,
we would select from the available alternatives the sampling plan
which can be most economically carried through. Needless to say, the
higher the level of aspiration, other conditions being equal, the higher
the cost of the operation.
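What "would happen in repeated studies" can be imitated by simulation. The sketch below is illustrative and rests on assumptions not stated in the text: a population in which half the elements have the attribute, simple random selection with each draw treated as independent, and sample sizes of 50 and 271 chosen only for contrast. It draws many samples and reports the share whose figures fall within 5 percentage points of the population value.

```python
import random


def share_within(margin, sample_size, true_proportion=0.5,
                 repetitions=2000, seed=7):
    """Share of repeated samples whose proportion lies within `margin`
    of the population proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        count = sum(rng.random() < true_proportion for _ in range(sample_size))
        if abs(count / sample_size - true_proportion) <= margin:
            hits += 1
    return hits / repetitions


small = share_within(0.05, sample_size=50)
large = share_within(0.05, sample_size=271)
print(f"samples of 50 within 5 points:  {small:.2f}")
print(f"samples of 271 within 5 points: {large:.2f}")
```

Under these assumptions roughly nine samples in ten of the larger size fall within 5 points, against a much smaller share for samples of 50, which illustrates why a higher level of aspiration costs more.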
A sampling plan that carries such insurance may be referred to as
a representative sampling plan. Note that in this usage the word
"representative" does not qualify "sample," but "sampling plan." What
a representative sampling plan can do is to insure that the odds are
great enough that the selected sample is, for the purposes at hand,
sufficiently representative of the population to justify our running the
risk of taking it as representative.
The use of such a sampling plan is not the only kind of insurance
that can be taken out to decrease the likelihood of misleading sample
findings. Another involves taking steps to guarantee the inclusion in
the sample of diverse elements of the population and to make sure
(either by controlling the proportions of the various types of elements
or by analytical procedures in the handling of the data) that these
diverse elements are taken account of in the proportions in which
they occur in the population. We shall consider this type of insurance
at greater length in our discussion of quota sampling and of stratified
random sampling.
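One analytical procedure of the kind mentioned is to weight each type of element by its known share of the population. The sketch below is a minimal illustration with hypothetical figures: a population assumed to be half men and half women, and a sample in which men happen to be over-represented.

```python
# Known (assumed) make-up of the population.
population_share = {"men": 0.5, "women": 0.5}

# Hypothetical sample: (number interviewed, proportion favoring a proposal).
sample = {"men": (70, 0.40), "women": (30, 0.60)}

# Unweighted estimate: every interview counts equally, so the
# over-represented group pulls the figure toward its own value.
interviews = sum(number for number, _ in sample.values())
unweighted = sum(number * share for number, share in sample.values()) / interviews

# Weighted estimate: each group counts in proportion to its share
# of the population rather than its share of the sample.
weighted = sum(population_share[group] * sample[group][1] for group in sample)

print(f"unweighted estimate: {unweighted:.2f}")
print(f"weighted estimate:   {weighted:.2f}")
```

The weighted figure is what the sample would have shown had the two groups appeared in their population proportions, assuming the interviews within each group are themselves representative of that group.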
It should perhaps be emphasized that the dependability3 of survey
findings is affected not only by the sampling plan and the faithfulness
with which it is carried out, but also by the measurement procedures
used. This is one reason why sample surveys of a large population can,
3 Throughout this appendix, the terms "accuracy," "dependability," and
"precision" are used interchangeably. Although technical distinctions are some-
times made among these words, in most discussions of sampling they are used
as synonyms.
being included, but this is not a necessary condition. What is necessary
is that for each element there must be some specifiable probability that
it will be included. This point will be considered more fully in con-
nection with the discussions of simple random samples and stratified
random samples. In nonprobability sampling, there is no way of
estimating the probability that each element has of being included in
Nonprobability Sampling
ACCIDENTAL SAMPLES
In accidental sampling, one simply reaches out and takes the cases
that fall to hand, continuing the process until the sample reaches a
designated size. Thus, one may take the first hundred people one meets
on the street who are willing to be interviewed. Or a college professor,
wanting to make some generalization about college students, studies
the students in his classes. Or a journalist, wanting to know how "the
people" feel about a given issue, interviews conveniently available cab
drivers, barbers, and others who are presumed to reflect public opinion.
There is no known way (other than by doing a parallel study with a
probability sample or with a complete census) of evaluating the biases 8
introduced in such samples. If one uses an accidental sample, one can
only hope that one is not being too grossly misled.
QUOTA SAMPLES
tion value that would be obtained from a very large number of samples selected by a
given procedure and the actual population value, assuming identical measurement
processes.
wants to generalize-hence the notion that it "represents" that popula-
tion. If it is known that the population has equal numbers of males
and females, the interviewers are instructed to interview equal numbers
of males and females. If it is known that 10 per cent of the population
lies within a particular age range, assignments are given to the inter-
viewers in such a way that 10 per cent of the sample will fall within that
age range.
The question of the kinds of characteristics that must be taken
into account will be considered in more detail in the course of our
discussion of stratified random sampling. It is enough, for the moment,
to say that in the sampling of preferences, opinions, attitudes, etc.,
experience indicates that it is wise to take into account such bases of
stratification as age, sex, education, geographical region of residence,
socioeconomic status, and ethnic background. Not all these are equally
visible; the usual practice is to set the quotas for the interviewers in
terms of the more manifest traits and to get information in the course
of the interviews on the less manifest ones. The latter information
permits correction of the inadequacies of the sample by adjustments
introduced during the analysis, a procedure that will be illustrated in
the following paragraphs. It also calls attention to omissions, if any
should occur, of important segments of the population.
It often happens, in practice, that the various components of the
sample turn out not to be in the same proportions as the corresponding
strata are in the population. The interviewers may not have carried out
their instructions exactly; instead of interviewing equal numbers of
males and females, 55 per cent of the people they interviewed may have
been males. Disproportions between the sample and the population
are most likely to occur, of course, in the less manifest traits which
have not been included as part of the specifications for the interviewers'
quotas. Suppose it is known that, in a given population, 40 per cent
have not gone beyond grammar school; suppose, however, that only
20 per cent of the people interviewed fall in this category. The inadequacy
in the sample can be corrected in the analysis by weighting the
different strata in terms of their proportions in the population. This
may be done by multiplying or dividing the obtained results by the
appropriate figure.
Let us say that the total sample consisted of 1,000 persons, of
whom 800 had attended high school, 200 had not. Suppose we asked
this sample whether they had seen a certain television program, and
the responses were as follows:
            No High School    High School    Total
    Yes           20              400          420
    No           180              400          580
    Total        200              800        1,000
In other words, one-tenth of the people without high-school education
and half of those with such education said they had seen the program.
If we wished simply to report the figures for the educational groups
separately, no adjustment would be needed. But if we wanted to esti-
mate the proportion of the total population that had seen the program,
our sample findings would be misleading. The program had been seen
by 42 per cent of the people in our sample. But our sample underrepresented
people in the lower educational category, overrepresented
those with high-school education. To derive an estimate of the correct
figure for the total population, we must calculate what the responses
would have been if 40 per cent of the people in the sample had had only
grammar-school education, 60 per cent had attended high school (the
proportions we have assumed for the population). One way of doing
this is to multiply the responses of the no-high-school group by 2 (to
bring the 20 per cent in the sample up to 40 per cent), and of the high-
school group by three-fourths (to reduce the 80 per cent to 60 per
cent). This would give 40 "yes's" in the no-high-school group and 300
in the high-school group, or 340 for the total group; thus we would
estimate that 34 per cent of the population had seen the program,
rather than the 42 per cent we would have estimated if we had
not weighted the strata in terms of their actual proportions in the
population.
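The arithmetic of this adjustment can be checked with a short sketch. The counts and population proportions are those of the example; the variable and function names are illustrative only, and the calculation weights each stratum's rate by its population share, which is equivalent to multiplying the counts by 2 and by three-fourths as described above.

```python
# Sample counts from the television example: stratum -> (yes answers, persons interviewed).
sample = {
    "no_high_school": (20, 200),
    "high_school": (400, 800),
}
# Population proportions assumed in the text (40 per cent and 60 per cent).
population_share = {"no_high_school": 0.40, "high_school": 0.60}


def unweighted_rate(sample):
    """Proportion of 'yes' answers in the sample as it stands."""
    yes = sum(y for y, _ in sample.values())
    total = sum(n for _, n in sample.values())
    return yes / total


def weighted_rate(sample, population_share):
    """Each stratum's 'yes' rate, weighted by that stratum's population share."""
    return sum(population_share[s] * y / n for s, (y, n) in sample.items())


raw = unweighted_rate(sample)
adjusted = weighted_rate(sample, population_share)
print(f"unweighted: {raw:.0%}, weighted: {adjusted:.0%}")
```

The same function serves for any number of strata, provided that the population shares are known or can be estimated and that each stratum contains enough cases to give a usable rate.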
From this example it should be clear that the critical requirement
in quota sampling is not that the various population strata be sampled
in their correct proportions; but rather that there be enough cases from
each stratum to make possible an estimate of the population stratum
value, and that we know (or can estimate with reasonable accuracy)
the proportion that each stratum constitutes in the total population.
If these conditions are met, the estimates of the values for the various
strata can be combined to give an estimate of the total population value.
However, despite these precautions in the selection of the sample,
and the corrections in the analysis, quota sampling remains basically
an accidental sampling procedure. The part of the sample in any par-
ticular class constitutes an accidental sample of the corresponding
stratum of the population. The males in the sample are an accidental
sample of the males in the population; the twenty-to-forty-year-olds in
the sample constitute an accidental sample of the twenty-to-forty-year-
olds in the population. If the instructions received by the interviewers
and their execution of these instructions produce correct proportions
of the compound classes (e.g., white males in the twenty-to-forty age
range), the sample cases in these classes are still accidental samples of
the corresponding compound strata in the population. The total sample
is thus an accidental sample.
There is by now, however, enough experience with quota sampling
to make it possible to minimize the risks of at least certain types of
unfortunate accidents. It is known that interviewers, left to their own
devices, are especially prone to certain pitfalls. They will interview
their friends in excessive proportion. But their friends are likely to be
rather similar in many respects to themselves. Now, consider the pos-
sibility that, in certain matters, people who do interviewing and others
like them are atypical of the population at large. If these matters are
involved in the survey, the sample results are likely to be inaccurate.
Once we are aware of the danger, however, we can take steps to dis-
courage the practice.
If interviewers fill their quotas by stopping passers-by and inviting
them to be interviewed, they will tend to concentrate on areas where
there are large numbers of potential respondents: the entertainment
centers of cities, business districts, railway and air terminals, the
entrances of large department stores. Such samples will overrepresent the
kinds of people who tend to gravitate to these areas. A concentration
PURPOSIVE SAMPLES
Probability Sampling
Probability samples involve the first kind of insurance against mis-
leading results that we discussed earlier-the ability to specify the
chances that the sample findings do not differ by more than a certain
amount from the true population values. They may also include the
second kind of insurance: a guarantee that enough cases are selected
from each relevant population stratum to provide an estimate for that
stratum of the population.
SIMPLE RANDOM SAMPLES
tion of the desired number of cases equally likely. Suppose, for example,
that one wants a simple random sample of two cases from a popula-
tion of five cases. Let the five cases in the population be A, B, C, D, and
E. There are ten possible pairs of cases in this population: AB, AC,
AD, AE, BC, BD, BE, CD, CE, and DE. Write each combination on
a disc, put the ten discs in a hat, mix them thoroughly, and have a
blindfolded person pick one. Each of the discs has the same chance
of being selected. 9 The two cases corresponding to the letters on the
selected disc constitute the desired simple random sample.
There are, in the tiny illustrative population of five cases, ten pos-
sible samples of three cases: ABC, ABD, ABE, ACD, ACE, ADE,
BCD, BCE, BDE, and CDE. Using the same method, one can select
a simple random sample of three cases from this population.
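The hat-and-discs procedure can be sketched in a few lines. This is only an illustration of the idea under stated assumptions: the variable names are ours, and the fixed seed is there solely to make the sketch repeatable.

```python
from itertools import combinations
import random

population = ["A", "B", "C", "D", "E"]

pairs = list(combinations(population, 2))    # the ten "discs" for samples of two
triples = list(combinations(population, 3))  # the ten possible samples of three

# Drawing one disc from the hat: every combination is equally likely.
rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
chosen_pair = rng.choice(pairs)

# Count the discs on which each case appears; equal counts mean equal chances.
discs_per_case = {case: sum(case in pair for pair in pairs) for case in population}
print(len(pairs), len(triples), discs_per_case)
```

Listing every combination is practical only for tiny populations. For larger ones, a routine such as `random.sample(population, k)` draws the cases one at a time without listing the combinations.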
In principle, one can use this method for selecting random samples
from populations of any size, but in practice it could easily become a
lifetime occupation merely to list all the combinations of the desired
number of cases. The same result is obtained by selecting each case
individually, using a list of random numbers such as may be found in
most textbooks of statistics. These are sets of numbers that after care-
ful examination have shown no evidences of systematic order. Before
using the table of random numbers, it is first necessary to number all
the elements in the population to be studied. The table is then entered
at some random starting point (e.g., with a blind pencil stab at the
page), and the cases whose numbers come up as one moves from this
point down the column of numbers are taken into the sample until the
desired number of cases is obtained. The selection of any given case
places no limits on what other cases can be selected, thus making
equally possible the selection of any one of the many possible combinations
of cases. This procedure is therefore equivalent to selecting
randomly one of the many possible combinations of cases. 10
9 In this illustration, each of the discs (i.e., each combination of two cases)
has one chance in ten of being selected. Each of the individual cases also has the
same chance of being selected: four in ten, since each case appears on four of the
discs. There are, however, very many ways of giving each case the same chance of
being selected without getting a simple random sample. For example, suppose we
were arbitrarily to divide an illustrative population of ten cases into five pairs as
follows: AB, CD, EF, GH, IJ. If we write the designations for these pairs on five
discs, blindly pick one of the discs, and take as our sample the two cases designated
on this disc, then every case has one chance in five of being picked but, obviously,
not every possible combination has the same chance of being selected as every other;
in fact, most of the combinations (e.g., AC) have no chance at all, since they
have not been included on the discs.
Without going into the mathematical argument, it is possible only
to illustrate the underlying principles of probability sampling. Con-
sider, for this purpose, a hypothetical population of ten cases, as
follows:
    Case:   A  B  C  D  E  F  G  H  I  J
    Sex:    F  F  F  F  F  M  M  M  M  M
    Age:    Y  O  Y  O  Y  O  Y  O  Y  O
    Score:  0  1  2  3  4  5  6  7  8  9
The first five cases are females, the last five males; the cases designated
Y are younger and the O's are older. Age and sex will be considered
10 The procedure of selecting a random sample should not be confused with
the procedure of sampling from a list or a file of cases by taking every kth (for
example, every fourteenth or every sixty-third) case. The latter procedure is called
systematic sampling. Systematic samples may be either probability or nonprobability
samples, depending on how the first case is selected. Suppose one wants to select
every sixtieth case. To get a probability sample, the first case has to be selected
randomly from the first sixty, and every sixtieth case thereafter is selected. If the first
case is not selected randomly, the resulting sample is not a probability sample since
most of the cases have a zero probability of being included in the sample. Although
to the uninitiated systematic sampling seems to be the most natural and rational
way to go about sampling from a list, it involves complications not present in a
simple random sample. When the first case is drawn randomly, in a systematic
sample, there is in advance no limitation on the chances of any given case to be
included in the sample. If we are selecting a sample of 100 cases from a population
of 6,000, before the first case is selected each case has one chance in sixty (100 in
6,000) of being included in the sample, whether we are using simple random or
systematic sampling. But in a systematic sample, once the first case is selected, the
chances of other cases are altered. Suppose the first case drawn is #46. Selecting
every sixtieth case thereafter means that #106, 166, 226, etc., will be drawn; the
cases between these numbers now have no chance of being included.
This means that a systematic sampling plan does not give all possible combina-
tions of cases the same chance of being included; only combinations of elements 60
cases apart in the list have any chance of being selected for the sample. The results
may be quite deceptive if the cases in the list are arranged in some cyclical order.
Suppose, for example, that the 6,000 cases are houses in a community that was
built according to a systematic plan, and that they are listed in order of streets and
numbers. Corner houses would then appear at regular intervals throughout the list;
say, the first house and every twentieth house thereafter is a corner dwelling. A
sample consisting of cases 1, 61, 121, etc., would be made up entirely of corner
houses; one consisting of cases 2, 62, 122, etc., would contain no corner houses. But
corner houses are usually larger and more expensive than those within the block,
and their occupants may accordingly differ systematically in certain characteristics.
Thus any sample made up entirely of corner houses, or entirely lacking in corner
houses, would give misleading results if the study concerned characteristics in which
occupants of the two types of dwellings differ.
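The corner-house illustration in the note above can be reproduced with a short sketch. The figures (6,000 houses, an interval of sixty, a corner house at position 1 and at every twentieth position thereafter) are those of the note; the function names are ours and purely illustrative.

```python
POPULATION_SIZE = 6000
INTERVAL = 60         # every sixtieth case is taken
CORNER_SPACING = 20   # house 1 and every twentieth house thereafter is a corner


def is_corner(house):
    """True for houses 1, 21, 41, ... in the list."""
    return (house - 1) % CORNER_SPACING == 0


def systematic_sample(start):
    """Cases start, start + 60, start + 120, ... from a list numbered 1 to 6,000."""
    return list(range(start, POPULATION_SIZE + 1, INTERVAL))


def corner_share(houses):
    """Proportion of corner houses among the given house numbers."""
    return sum(is_corner(h) for h in houses) / len(houses)


population_share = corner_share(range(1, POPULATION_SIZE + 1))
shares_by_start = {s: corner_share(systematic_sample(s)) for s in range(1, INTERVAL + 1)}
print(population_share, shares_by_start[1], shares_by_start[2])
```

Whatever the starting point, the sample consists either entirely of corner houses or of none at all, although corner houses make up one-twentieth of the list. Choosing the start at random (for example with `random.randint(1, 60)`) makes the plan a probability sample but does not remove this all-or-nothing behavior.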
TABLE 1
MEAN SCORES OF SAMPLES FROM ILLUSTRATIVE
POPULATION OF TEN CASES WITH POPULATION MEAN SCORE
OF 4.5 (SIMPLE RANDOM SAMPLES)

                                     Number of Samples
                             Samples of   Samples of   Samples of
    Sample Means 11            2 cases      4 cases      6 cases
    .5                            1
    1.0                           1
    1.5-1.75                      2            2
    2.0-2.67                      5           10            2
    2.75-3.25                     3           25           10
    3.33-4.00                     8           43           52
    4.17-4.83                     5           50           82
    5.00-5.67                     8           43           52
    5.75-6.25                     3           25           10
    6.33-7.0                      5           10            2
    7.25-7.5                      2            2
    8.0                           1
    8.5                           1
    Total no. of samples         45          210          210
    Mean of sample means        4.5          4.5          4.5
    % of sample means
      greater than 4.00
      and less than 5.00         11           24           39
    % of sample means
      greater than 2.67
      and less than 6.33         60           89           98
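The summary rows of Table 1 can be recomputed by listing every possible sample. The sketch below assumes only the ten scores 0 through 9 of the illustrative population; the function names are ours, and exact fractions are used so that means such as 2.67 (that is, 16/6) fall on the correct side of each boundary.

```python
from fractions import Fraction
from itertools import combinations

scores = range(10)  # cases A to J carry scores 0 to 9; the population mean is 4.5


def sample_means(n):
    """Means of every possible simple random sample of n cases."""
    return [Fraction(sum(combo), n) for combo in combinations(scores, n)]


def share_between(means, low, high):
    """Proportion of sample means strictly between low and high."""
    return Fraction(sum(low < m < high for m in means), len(means))


for n in (2, 4, 6):
    means = sample_means(n)
    mean_of_means = sum(means) / len(means)
    near = share_between(means, 4, 5)
    wide = share_between(means, Fraction(267, 100), Fraction(633, 100))
    print(n, len(means), float(mean_of_means),
          round(float(near) * 100), round(float(wide) * 100))
```

Each printed line gives the sample size, the number of possible samples, the mean of the sample means, and the two percentages reported at the foot of the table.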
included. That is, the magnitude of the errors that are likely depends
more on the absolute size of the sample than on the proportion of the
population that it includes. Thus, the estimation of popular pref-
erences in a national pre-election poll, within the limits of a given
margin of error, would not require a substantially larger sample than
the estimation of the preferences in any one state where the issue is in
doubt. Conversely, it would take just about as large a sample to estimate
the preferences in one doubtful state with a given degree of accuracy
as it would to estimate the distribution of preferences in the entire
nation. This is true despite the fact that a sample of a few thousand
cases obviously includes a much larger proportion of the voters in one
state than the same-sized sample does of the voters in the nation.13
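A rough numerical check of this point can be made with the usual textbook formula for the standard error of a sample proportion under simple random sampling without replacement. The formula, with its finite population correction, is not given in this appendix, and the sample size and numbers of voters below are hypothetical figures chosen only for illustration.

```python
import math


def standard_error(p, n, N):
    """Standard error of a sample proportion for a simple random sample of n
    cases drawn without replacement from a population of N cases."""
    return math.sqrt(p * (1 - p) / n * (N - n) / (N - 1))


n = 3_000               # hypothetical sample size
state = 2_000_000       # hypothetical number of voters in one state
nation = 100_000_000    # hypothetical number of voters in the nation

se_state = standard_error(0.5, n, state)
se_nation = standard_error(0.5, n, nation)
print(f"state: {se_state:.4f}  nation: {se_nation:.4f}")
```

Under these assumptions the two standard errors are nearly identical, because the correction term stays close to 1 whenever the sample is a small fraction of the population.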
taken from other strata. Again, of course, the figures from the various
strata would have to be appropriately weighted in estimating the total
volume of sales in the city.
Another reason for taking a larger proportion of cases from one
stratum than from others is that one may want to subdivide the cases
within each stratum for further analysis. Let us say that in our survey
of retail sales we want to be able to examine separately the volume of
sales made by food stores, by clothing stores, etc. Even though these
classifications are not taken into account in selecting the sample (i.e.,
the sample is not stratified on this basis), it is clear that one needs a
reasonable number of cases in each volume-of-sales stratum to make
possible analysis of different types of stores within each stratum. If a
given stratum has relatively few cases, so that sampling in the propor-
tion used in other strata would not provide enough cases to serve as an
adequate basis for this further analysis, one may take a higher proportion
of cases in this stratum.
One of the major reasons for varying the sampling proportions for
different strata cannot be fully explained without going into the mathe-
matical theory of sampling, but the principle involved can be under-
stood on a more or less intuitive basis. Consider two strata, one of
which is much more homogeneous with respect to the characteristics
being studied than the other. For a given degree of precision, it will
take a smaller number of cases to determine the state of affairs in the
first stratum than in the second. To take an extreme example: suppose
that there is reason to know that every case in a given stratum has the
same score; one could then determine how to represent that stratum
in the total sample on the basis of a sample of one case. Of course, in
such an extreme case one is not likely to have this information without
also knowing what the common score is. But in less extreme cases one
can often anticipate the relative degrees of homogeneity or heterogeneity
of strata before carrying out the survey. For example, there may
be a great deal of experience to suggest that, with respect to certain
types of opinion questions, men will differ among themselves much
more than will women; one would accordingly plan one's sample for
a survey of such opinions so as to provide for sampling a larger proportion
of men than of women. Because women may be expected to be
more alike than men in these matters, they do not have to be sampled
as thoroughly as do the men for a given degree of precision.
In general terms, one can expect the greatest precision if the
various strata are sampled proportionately to their relative variabilities
with respect to the characteristics under study rather than proportion-
ately to their relative sizes in the population. A special case of this
principle is that, in sampling to determine the proportion of cases
possessing a particular attribute, strata in which one can anticipate that
about half the cases will have the attribute and half will not should be
sampled more thoroughly than strata in which one would expect a
more uneven division. Thus, in planning a stratified sample for predicting
a national election, using states as strata, one should not plan to
sample each state in proportion to its eligible population; it would be
wiser to sample most heavily in the most doubtful states.
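One common way of putting this principle into numbers, often called Neyman allocation in the sampling literature, divides the total sample among strata in proportion to each stratum's size multiplied by its anticipated standard deviation. The sketch below is an illustration under stated assumptions and not a procedure given in this appendix: the two states, their sizes, and the anticipated vote shares are hypothetical, and the standard deviation of a yes-or-no attribute is taken as the square root of p(1 - p).

```python
import math


def allocate(strata, total_n):
    """Split total_n across strata in proportion to N_h * S_h, where S_h is the
    anticipated standard deviation sqrt(p * (1 - p)) of a yes-or-no attribute."""
    weight = {name: size * math.sqrt(p * (1 - p)) for name, (size, p) in strata.items()}
    total = sum(weight.values())
    return {name: total_n * w / total for name, w in weight.items()}


# Hypothetical states: (eligible population, anticipated share for one candidate).
strata = {
    "doubtful_state": (1_000_000, 0.50),
    "safe_state": (1_000_000, 0.90),
}
allocation = allocate(strata, total_n=2_000)
print({name: round(n) for name, n in allocation.items()})
```

With equal populations, the state expected to divide about evenly receives the larger share of the interviews, which is the sense of the advice to sample most heavily in the most doubtful states.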
One final point about stratified sampling: There may be reason to
believe that certain criteria will provide very effective bases for stratification
(i.e., using these criteria, we would get strata which differ markedly
from one another), but, as pointed out in the discussion of quota
sampling, the relevant data may become available only in the course
of the survey. In this case one cannot use the criteria in the sampling
design, but one can apply the logic of stratified sampling theory in the
analysis of the data. Thus, one can take a simple random sample,
ascertain the information necessary for stratification during the course
of the interviews, and use this information in grouping the cases
according to their respective strata and weighting them appropriately
in the analysis of the data.
For example, suppose that we want to survey the attitudes of the
students in a certain school toward some issue and that we have some
reason to believe that the proportions of "pro's," "anti's," and
"undecideds" are likely to be different among the Negro and the white
students. Suppose, further, that we have a complete listing of the student
body but no identification of the race of the individual students,
even though we know that 30 per cent of the students are Negro and
70 per cent are white.
We could draw a simple random sample of the
students and ascertain the race of each respondent while recording his
views on the issue. The data might then come out as follows:
CLUSTER SAMPLING
school districts included in the sample, list the schools and take a
simple or stratified random sample of them. If some or all of the
schools thus selected for the sample have more seventh-grade classes
than can be studied, one may take a sample of these classes in each of
the schools. The survey instruments may then be administered to all
the children in these classes or, if it is desirable and administratively
feasible to do so, to a sample of the children.
Similarly, a survey of urban households may take a sample of
cities; within each city that is selected, a sample of districts; within
each selected district, a sample of households.
Characteristically, the procedure moves through a series of stages-
hence the common term, "multi-stage" sampling-from more inclusive
to less inclusive sampling units until one finally arrives at the popula-
tion elements that constitute the desired sample.
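The stages of the household survey just described can be sketched as nested random selections. Everything in this sketch is hypothetical: the toy sampling frame, the function name `multistage_sample`, and the numbers taken at each stage are assumptions made for illustration, and simple random selection is assumed at every stage.

```python
import random


def multistage_sample(cities, n_cities, n_districts, n_households, seed=None):
    """Sample cities, then districts within each selected city, then households
    within each selected district.

    cities maps city -> {district: [household, ...]} (a hypothetical frame).
    """
    rng = random.Random(seed)
    selected = []
    for city in rng.sample(sorted(cities), n_cities):
        districts = cities[city]
        chosen_districts = rng.sample(sorted(districts), min(n_districts, len(districts)))
        for district in chosen_districts:
            households = districts[district]
            for household in rng.sample(households, min(n_households, len(households))):
                selected.append((city, district, household))
    return selected


# A toy frame: 4 cities, each with 5 districts of 30 households.
frame = {
    f"city{c}": {f"district{d}": [f"hh{c}-{d}-{h}" for h in range(30)] for d in range(5)}
    for c in range(4)
}
sample = multistage_sample(frame, n_cities=2, n_districts=2, n_households=3, seed=1)
print(len(sample), sample[:3])
```

Only the selected cities and districts need to be listed in full, which is the practical attraction of the procedure. The cost is that households in unselected cities or districts cannot appear in the sample together with those that were selected.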
Notice that with this kind of sampling procedure it is no longer
true that every combination of the desired number of elements in the
population (or in a given stratum) is equally likely to be selected as the
sample of the population (or stratum). Hence, the kinds of effects we
noticed in our analysis of simple and stratified random sampling of our
hypothetical population of ten cases (the population value being the
most probable sample result and larger deviations from the population
value being less probable than smaller ones) cannot develop in quite
the same way. Such effects do, however, occur in a more complicated
way,17 provided that each stage of cluster sampling is carried out on a
17 The complication arises from the fact that there are two sources of sampling
error: the sampling of the larger sampling units and the sampling of population
elements within the larger units. To illustrate the point that cluster sampling does
have the same kinds of effects as simple and stratified random sampling, let us
consider the simple case in which the second source of error is eliminated by studying
all the population elements in the sampled larger units. Each larger unit has its score
(consisting, say, of the mean score of its elements). But this leaves us with a simple
or stratified random sample of the population of larger units, no different, in
principle, from a simple or stratified sample of population elements. Hence, it is clear
that the trends we noted in connection with random samples will tend to occur on
this level. Now, if instead of taking 100 per cent samples of the elements in each
larger unit, we were to take a simple or stratified random sample of the elements in
each unit, the larger units become the populations from which these samples are
drawn, and the tendencies we noted will again occur. Thus, these tendencies
toward the greatest probability of achieving a sampling result that is the same as
the population value, and toward progressively larger deviations becoming
progressively less probable, will occur with respect to both sources of error that are involved
in cluster sampling.
types of information. This does not mean that one is not concerned
with the possibility of error; but one places one's reliance on the in-
ternal consistency of the data and its coherence with other things that
one knows.
Another special case justifying the use of nonprobability samples
arises from the fact that there are many important considerations in
research in addition to the sampling design. It may be necessary to
balance one consideration against another-for example, a better sam-
pling design against a more sensitive method of data collection.
Ackerman and Jahoda (1950), for example, studied the characteristics
of patients in psychoanalytic treatment who had given expression to
anti-Semitic sentiments. With complete protection of the anonymity
of the patients, some forty analysts served as informants. The sample
of psychoanalysts was, of necessity, an accidental one and, consequently,
so was the sample of patients. Suppose that the investigators
could have solved the problem of obtaining a probability sample of all
psychoanalytic patients in a given area, should they have done so?
Assume that this would have required giving up the psychoanalysts as
informants and substituting a relatively superficial direct interview.
Similarly, in a study of factors related to the use of narcotics by
boys in juvenile street gangs, Chein and his associates (see Wilner
et al., 1957) used group workers as informants (also with complete
protection of the anonymity of the individual gang member). These
workers had spent months winning the confidence of the boys, convincing
the latter that they were not confederates of the police, social
reformers, or other things reprehensible in the eyes of the boys; and
they had been working closely with the gangs for many more months-
in some instances, for several years. Since these informants were available
only for the gangs that were being worked with, the sample of
gangs, and hence of gang members, was an accidental sample. Assuming
that (1) it would have been possible to get a probability sample of
gang members and that (2) the information obtained through the
group workers was much more dependable than would have been
information obtained through direct interview, what should the investigators
have done?
The answer to such a question is not easy. The first thing to do,
of course, is to assure oneself that the dilemma is real. If convinced that
APPLICATIONS OF NONPROBABILITY SAMPLING 541
it is, one must then decide whether the problem is, under the circum-
stances, worthy of investigation at all. If the answer is still in the
affirmative, one must decide, in terms of the research purpose, whether
it would be better to gather more adequate information based on a
not very sound sample or less adequate information based on a sounder
sample.
We come, finally, to another special and controversial case of nonprobability
sampling. Many studies in behavioral science are carried
out on accidental samples of subjects. The data are treated, however,
in a manner that is appropriate only to probability samples. For ex-
ample, statistical tests of significance which presuppose random sam-
pling are applied to the data.
One justification of this practice is completely spurious. The in-
vestigators argue that they are interested not in estimating population
values, but in studying relationships among variables. For example, the
question, "What are the effects of variations in routines of memorizing
on the retention of the memorized materials?" does not seem to have
reference to any population. Relationships, however, are subject to
sampling error just as averages and proportions are. If a great many
samples are taken from a given population, certain relationships may
appear among some of the variables in some of the samples, and may
not appear, or may appear in different degree, in others. Hence the
results for a given sample may be quite misleading. If the samples are
probability samples, we may legitimately estimate the probability of
being in error by more than a specified amount; if they are not, we
cannot legitimately make such estimates. Moreover, the answer to the
question may be quite different for different populations of subjects
(e.g., subjects differing in educational experience), for different populations
of materials to be memorized (e.g., nonsense syllables vs. meaningful
poems), and for different populations of associated conditions
does not. Does this not then point to the possibility of defining a more
inclusive population for which the relationship holds, and would we
not want to know the specifications of this more inclusive population?
Suppose, on the other hand, that it does hold only for the sophomores
at our particular college. Would we not then want to know what is
so unique about our population of sophomores? And would not the
tentative formulation of possible uniquenesses of our population sug-
gest hypotheses that we would want to explore-hypotheses that might
suggest population specifications that cut across our initial population
and that include elements not included in our initial population? In
either case, would we not want to press toward the discovery of the
specification of a population within which the trend that we have dis-
covered in our population to be statistically significant becomes a
virtual certainty?
It should perhaps be added that we are not, in these last few para-
graphs, preaching a paralyzing spirit of agnosticism that would pro-
hibit anyone from coming to any conclusions. The progress of science
and the scientific tenability of conclusions at any point in time are,
after all, based on the coherence and consistency of many bits of
fallible evidence, the articulation of theory, and the interlocking of the
individually fallible bits of evidence with theory. It has been emphasized
elsewhere in this book that science offers no possibilities of
absolute proof. The scientist can, at most, aspire to the soundest con-
clusions that can be reached in the light of the best evidence that can
be brought to bear on any issue. At the same time, science would only
degenerate into dogma if one would not constantly remain alert to
the sources of ambiguity and fallibility in the available evidence and
the semantic gaps that may lie concealed in the generalizations that are
drawn; if one would not attempt to weigh the possible alternatives that
may be compatible with the evidence, particularly in the light of the
sources of fallibility and ambiguity; if one would not attempt to pin-
point the gaps in knowledge; and if, even though one has dismissed
some alternative on the ground that it is not sufficiently plausible to
merit serious consideration or dismissed some manifest gap in knowl-
edge as not sufficiently germane to merit intensive exploration, one
would not be constantly prepared to reopen these issues and remain
sensitive to the possibility of reopening them. In the light of these con-
siderations, what we have attempted to do in these last few paragraphs
is merely to look at a considerable body of contemporary research and
research practice in the perspective of sampling theory. If there is any
preachment implied, it is only another lesson in scientific humility.
Appendix C
QUESTIONNAIRE CONSTRUCTION AND
INTERVIEW PROCEDURE
BY ARTHUR KORNHAUSER AND PAUL B. SHEATSLEY
Probably the best way to begin is to outline or list the topics for
the questionnaire, consider carefully what is likely to be the best
sequence of topics (not the logical sequence, but the best psychological
sequence from the standpoint of the respondent), and then write the
questions.
In addition to the questions deemed essential, the questionnaire
writer sometimes finds it wise to include a few extra ones aimed at
checking the reliability of responses or measuring the influence of
changes in wording. For example, two or more roughly equivalent or
closely related questions, well separated in the questionnaire, may be
asked in order to measure consistency of answers. The effect of different
wording may be determined by constructing two parallel forms
of questionnaire ("split-ballot technique"), to be used with equivalent
samples of the population. The two forms have some of their questions
in common, but certain other questions are worded in different ways
in order that the effects of these differences may be measured.
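The split-ballot procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the random half-and-half assignment, and the use of the share answering "yes" as the quantity compared are all hypothetical choices, not part of the original procedure.

```python
import random

def assign_forms(respondent_ids, seed=0):
    """Randomly split respondents into two halves, one per questionnaire form.

    Random assignment is one assumed way of obtaining "equivalent samples".
    """
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return {"A": ids[:half], "B": ids[half:]}

def share_agreeing(answers):
    """Proportion of recorded answers equal to 'yes'."""
    return sum(1 for a in answers if a == "yes") / len(answers)

def wording_effect(answers_a, answers_b):
    """Difference in the share of 'yes' answers between form A and form B."""
    return share_agreeing(answers_a) - share_agreeing(answers_b)
```

A difference close to zero would suggest that the two wordings draw similar answers; whether a given difference is larger than sampling variation alone would produce is a separate statistical question that this sketch does not address.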
After all the preceding steps have been completed, the question-
naire should be ready for use. All that remains is a final editing by the
research staff to ensure that every element passes inspection: the con-
remarkably few refusals to reply. A great deal depends upon the inter-
viewer's own attitude. If he is embarrassed or feels that the question is
too personal, his doubts are readily transmitted to the respondent. If
he confidently expects a reply, he is likely to get it.
more than one idea; items that influence responses by being over-
specific; items that overrepresent or underrepresent one side of an
issue. The following examples illustrate some of these faults:
Does the question come too early or too late from the point of
view of arousing interest and receiving sufficient attention,
avoiding resistance, etc.?
EXAMPLES: Some suspicion may be aroused if a home interview
with workers opens abruptly with the question: "Where do you work?
In what department?"
An opening question such as, "Do you think the government is
giving the public as much information as it should about the hydrogen
bomb?" is likely to arouse some resistance, since the respondent may
Within the limits of survey design, however, there is ample room
for "the art of interviewing" to come into play. The interviewer's art
consists in creating a situation wherein the respondent's answers will
be reliable and valid. The ideal usually sought is a permissive situation
in which the respondent is encouraged to voice his frank opinions
without fearing that his attitudes will be revealed to others and without
the expression of any surprise or value judgment by the interviewer.
6 This section is taken, with slight modifications, from a discussion by Paul
B. Sheatsley which appeared in Volume II of Research Methods in Social Relations,
edited by Marie Jahoda, Morton Deutsch, and Stuart W. Cook (The Dryden
Press, 1951), pages 463-492.
The first requisite for successful interviewing, therefore, is to
create a friendly atmosphere and to put the respondent at his ease.
With a pleasant, confident approach and a questionnaire that starts off
easily, this is usually not difficult to achieve. From then on, the interviewer's
art consists in asking the questions properly and intelligibly, in
obtaining a valid and meaningful response, and in recording the response
accurately and completely.
critical test of a good interviewer, and since no one can foresee all the
possible replies which may call for probes, each interviewer must
understand fully the over-all objective of each question, the precise
thing it is trying to measure. Both the written instructions and the oral
training should emphasize the purpose of the question and should give
examples of inadequate replies which were commonly encountered
during the pretest. By the time he is actually out interviewing, the
interviewer should have formed the automatic habit of asking himself,
after each reply the respondent gives him: "Does that completely
answer the question I just asked?"
When the first reply is inadequate, a simple repetition of the
question, with proper emphasis, will usually suffice to get a response
in satisfactory terms. This is particularly effective when the respondent
has seemingly misunderstood the question, or has answered it irrele-
vantly, or has responded to only a portion of it. If the respondent's
answer is vague or too general or incomplete, an effective probe is:
"That's interesting. Could you explain that a little more?" or "Let's
see, you said. . . . Just how do you mean that?"
Throughout, the interviewer must be extremely careful not to
suggest a possible reply. People sometimes find the questions difficult,
and sometimes they are not deeply interested in them. In either case,
they will welcome any least hint from the interviewer which will enable
them to give a creditable response. Interviewers must be thoroughly
impressed with the harm which results from a "leading probe," from
any remark which "puts words in their mouth." To be safe, the interviewer
should always content himself with mere repetition of all or
part of the actual question, or with such innocuous nondirective probes
as are suggested in the preceding paragraph.
The "Don't know" reply is another problem for the interviewer.
Sometimes that response represents a genuine lack of opinion; but at
other times it may hide a host of other attitudes: fear to speak one's
mind, reluctance to focus on the issue, vague opinions never yet ex-
pressed, a stalling for time while thoughts are marshaled, a lack of com-
prehension of the question, etc. It is the interviewer's job to distinguish
among all these types of "Don't know" response and, when appropriate,
to repeat the question with suitable assurances. In one case, for ex-
ample, he might say, "Perhaps I didn't make that too clear. Let me
read it again"; in another, he might say, "Well, lots of people have
never thought about that before, but I'd like to have your ideas on it,
just the way it seems to you." Or, again, he might point out, "Well, I
just want your own opinion on it. Actually, nobody really knows the
answers to many of these questions."
Qualified answers to questions that have been precoded in terms
of "Yes-No," "Approve-Disapprove" or similar dichotomies are an
interviewing problem which is actually in the domain of the study
director. As far as possible, the most frequent qualifications of opinion
should be anticipated in the actual wording of the question. If very
many people find it impossible to answer because of unspecified con-
tingencies, the question is a poor one. Most qualifications can be
foreseen as a result of the pretest, and those that are not taken care of
by revisions of the wording should be mentioned in the instructions to
interviewers, with directions on how to handle such answers. In some
cases, special codes may be provided for the most frequent qualifica-
tions; in other cases the interviewer may be instructed to record them as
"Don't know" or "Undecided." In avoiding many qualifications in-
herent in the response to almost any opinion question, the interviewer
may find it helpful to use phrases such as, "Well, in general, what
would you say?" or "Taking everything into consideration," or "On
the basis of the way things look to you now."
There are two chief means of recording opinions during the inter-
view. If the question is precoded, the interviewer need only check a
box or circle a code, or otherwise indicate which code comes closest
to the respondent's opinion. If the question has not been precoded,
the interviewer is expected to record the response verbatim.
On precoded questionnaires, errors and omissions in recording are
a frequent source of interviewer error. In the midst of trying to pin the
respondent down to a specific answer, keep his attention from flagging,
remember which question comes next, and the many other problems
that engage the interviewer's attention in the field, it is not surprising
that he will sometimes neglect to indicate the respondent's reply to one
of the items, overlook some particular question, check the wrong code
on another, or ask some other question when it should be skipped.
The better the interviewer, the fewer the mistakes he will make,
but even the best interviewers will occasionally be guilty. The unfor-
givable sin is to turn in the interview as complete when it contains such
errors and omissions. The only certain way for the interviewer to avoid
this is to make an automatic habit of inspecting each interview, im-
mediately after its completion, before he goes on to another respond-
ent, to make sure that it has been filled in accurately and completely.
If he is lacking any information, he can go back and ask the respondent
for it; if his questionnaire contains any errors or omissions, he can
correct them on the spot; if his handwriting is illegible in places, or if
he has recorded verbatim replies only sketchily, he can correct the
weakness right there. If he waits until later in the day, or until he
returns home at night, he will have forgotten many of the circumstances
of the interview, or perhaps the prospect of editing the whole
day's work will seem so forbidding that he will skip the matter completely.
The importance of clerical errors and omissions can be impressed
upon the interviewer during training by pointing out that the questionnaire
is designed as an integral whole, and that the omission or inaccurate
reporting of a single answer can make the entire interview
worthless. Thus, if for each question the responses of persons with
different amounts of education are to be shown separately, and the
interviewer neglects to record the amount of schooling the respondent
has had, that whole interview must be discarded in that part of the
analysis.
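The habit of inspecting each interview immediately after its completion can be illustrated with a small checking routine. The sketch below is hypothetical: the schedule layout, the "ask_if" skip rule, and the question names are assumptions made for illustration, and it covers only the three kinds of error named above (an omitted item, a wrong code, and a question asked when it should have been skipped).

```python
def check_interview(schedule, answers):
    """Return a list of problems found in one completed precoded interview.

    schedule: dict mapping question id -> {"codes": set of valid codes,
              "ask_if": optional (question id, code) that must hold
              for the question to be asked}
    answers:  dict mapping question id -> recorded code
    """
    problems = []
    for qid, spec in schedule.items():
        cond = spec.get("ask_if")
        # A question applies when it has no skip rule or its rule is met.
        applicable = cond is None or answers.get(cond[0]) == cond[1]
        if applicable and qid not in answers:
            problems.append(f"{qid}: omitted")
        elif not applicable and qid in answers:
            problems.append(f"{qid}: asked but should have been skipped")
        elif qid in answers and answers[qid] not in spec["codes"]:
            problems.append(f"{qid}: invalid code {answers[qid]!r}")
    return problems

# Hypothetical three-item schedule; EDUC stands for amount of schooling.
schedule = {
    "Q1": {"codes": {"yes", "no"}},
    "Q2": {"codes": {"1", "2", "3"}, "ask_if": ("Q1", "yes")},
    "EDUC": {"codes": {"grade", "high", "college"}},
}
problems = check_interview(schedule, {"Q1": "no", "Q2": "2"})
```

In this example the check flags Q2, which was recorded although the answer to Q1 called for skipping it, and EDUC, whose omission would remove the interview from any tabulation by education.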
In reporting responses to free-answer questions, interviewers
should be aware of the importance of complete, verbatim reporting.
It will often be difficult to get down everything the respondent says
in reply, but aside from obvious irrelevancies and repetitions, this
should be the goal. Interviewers should be given some idea of the
coding process, so that they can see the dangers of summarizing, abbre-
viating, or paraphrasing responses. Unless the coder can view the whole
answer, just as the respondent said it, he is likely to classify it im-
properly or lose some important distinctions that should be made.
Interviewers should be instructed to quote the respondent directly,
just as if they were news reporters taking down the statement of an
important official. Paraphrasing the reply, summarizing it in the interviewer's
own words, or "polishing up" any slang, cursing, or bad
grammar not only risks distorting the respondent's meaning and
emphasis, but also loses the color of his reply. Frequently the verbatim
responses of individuals are useful in the final report as illustrations of
the nuances of attitudes, and they should not be abbreviated or dis-
torted.
Although it is frequently difficult to record responses verbatim
without using shorthand, 7 a few simple techniques can greatly increase
the interviewer's speed and the extent to which he succeeds in the
verbatim recording of responses. It is perfectly permissible to ask the
respondent to wait until the interviewer gets down "that last thought
(that's pretty interesting)," but in order not to slow up the interview,
the following devices will be found helpful for speedy recording. First,
an interviewer should be prepared to write as soon as he has asked
a question and to write while the respondent talks, not waiting until
the entire response is completed. (Experienced interviewers often
finish their recording of the prior response while they ask the next
question and the respondent is considering his reply.) Second, the
interviewer should use common abbreviations. Third, he should not
bother to erase, but should cross out instead. Fourth, he may depart
from the ideal of verbatim recording to the extent of using a telegraphic
style; omission of "a," "the," and such parenthetical expressions as
"well," "you know," "let's see," will ordinarily not lead to loss or
distortion of meaning. But the interviewer should not speed up his
recording by merely jotting down key words here and there. The
connecting words and phrases are easily forgotten, and the recorded
answer, even if it means something to the interviewer, may prove in-
comprehensible to the coders.
It is generally helpful if, on precoded questions, the interviewer
reports verbatim anything the respondent says to explain or qualify
his coded response; but he should not solicit such comments. The
volunteered remarks of respondents often help the study director later
in evaluating the meaning of the results and warn him of any commonly
held qualifications or differences in intensity of opinion.
7 Shorthand recording, although it has the advantage of more easily achieving
a verbatim report, has the disadvantage of requiring later transcription, which may
be very time-consuming and thus expensive.