Hypothetical Vignettes in Empirical Bioethics Research

2007

EMPIRICAL METHODS FOR BIOETHICS: A PRIMER

ADVANCES IN BIOETHICS
Series Editors: Robert Baker and Wayne Shelton

Recent Volumes:
Volume 1: Violence, Neglect and the Elderly – Edited by L. Cebik, G. C. Graver and F. H. Marsh
Volume 2: New Essays on Abortion and Bioethics – Edited by R. B. Edwards
Volume 3: Values, Ethics and Alcoholism – Edited by W. N. Shelton and R. B. Edwards
Volume 4: Critical Reflections Medical Ethics – Edited by M. Evans
Volume 5: Bioethics of Medical Education – Edited by R. B. Edwards
Volume 6: Postmodern Malpractice – Edited by Colleen Clements
Volume 7: The Ethics of Organ Transplantation – Edited by Wayne Shelton and John Balint
Volume 8: Taking Life and Death Seriously: Bioethics in Japan – Edited by Takao Takahashi
Volume 9: Ethics and Epidemics – Edited by John Balint, Sean Philpott, Robert Baker and Martin Strosberg
Volume 10: Lost Virtue: Professional Character Development in Medical Education – Edited by Nuala Kenny and Martin Strosberg

ADVANCES IN BIOETHICS VOLUME 11

EMPIRICAL METHODS FOR BIOETHICS: A PRIMER

EDITED BY

LIVA JACOBY
Associate Professor of Medicine, Office of the Vice Dean for Academic Affairs and The Alden March Bioethics Institute, Albany Medical College, Albany, NY, USA

LAURA A. SIMINOFF
Professor and Chair, Department of Social and Behavioral Health, School of Medicine, Virginia Commonwealth University, VA, USA

Amsterdam – Boston – Heidelberg – London – New York – Oxford – Paris – San Diego – San Francisco – Singapore – Sydney – Tokyo

JAI Press is an imprint of Elsevier
Linacre House, Jordan Hill, Oxford OX2 8DP, UK
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA

First edition 2008
Copyright © 2008 Elsevier Ltd.
All rights reserved

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://www.elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-7623-1266-5
ISSN: 1479-3709 (Series)

For information on all JAI Press publications visit our website at books.elsevier.com

Printed and bound in the United Kingdom
08 09 10 11 12 10 9 8 7 6 5 4 3 2 1

This book is dedicated to our husbands Bill Jacoby and Jacek Ghosh for their unfailing support

CONTENTS

LIST OF CONTRIBUTORS
INTRODUCTION – Liva Jacoby and Laura A. Siminoff

SECTION I: PERSPECTIVES ON EMPIRICAL BIOETHICS
THE ROLE OF EMPIRICAL DATA IN BIOETHICS: A PHILOSOPHER’S VIEW – Wayne Shelton
THE SIGNIFICANCE OF EMPIRICAL BIOETHICS FOR MEDICAL PRACTICE: A PHYSICIAN’S PERSPECTIVE – Joel Frader

SECTION II: QUALITATIVE METHODS
QUALITATIVE CONTENT ANALYSIS – Jane Forman and Laura Damschroder
ETHICAL DESIGN AND CONDUCT OF FOCUS GROUPS IN BIOETHICS RESEARCH – Christian M. Simon and Maghboeba Mosavel
CONTEXTUALIZING ETHICAL DILEMMAS: ETHNOGRAPHY FOR BIOETHICS – Elisa J. Gordon and Betty Wolder Levin
SEMI-STRUCTURED INTERVIEWS IN BIOETHICS RESEARCH – Pamela Sankar and Nora L. Jones

SECTION III: QUANTITATIVE METHODS
SURVEY RESEARCH IN BIOETHICS – G. Caleb Alexander and Matthew K. Wynia
HYPOTHETICAL VIGNETTES IN EMPIRICAL BIOETHICS RESEARCH – Connie M. Ulrich and Sarah J. Ratcliffe
DELIBERATIVE PROCEDURES IN BIOETHICS – Susan Dorr Goold, Laura Damschroder and Nancy Baum
INTERVENTION RESEARCH IN BIOETHICS – Marion E. Broome

SUBJECT INDEX

LIST OF CONTRIBUTORS

G. Caleb Alexander – Section of General Internal Medicine, Department of Medicine, MacLean Center for Clinical Medical Ethics, The University of Chicago, Chicago, IL, USA
Nancy Baum – University of Michigan, School of Public Health, Ann Arbor, MI, USA
Marion E. Broome – Indiana University, School of Nursing, Indianapolis, IN, USA
Laura Damschroder – Ann Arbor VA HSR&D Center of Excellence, Ann Arbor, MI, USA
Jane Forman – Ann Arbor VA HSR&D Center of Excellence, Ann Arbor, MI, USA
Joel Frader – Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Susan Dorr Goold – Internal Medicine and Health Management and Policy, University of Michigan, Ann Arbor, MI, USA
Elisa J. Gordon – Alden March Bioethics Institute, Albany Medical College, Albany, NY, USA
Nora L. Jones – Center for Bioethics, University of Pennsylvania, Philadelphia, PA, USA
Maghboeba Mosavel – Center for Reducing Health Disparities, MetroHealth Medical Center, Case Western Reserve University, Cleveland, OH, USA
Sarah J. Ratcliffe – University of Pennsylvania, Department of Biostatistics, School of Medicine, Philadelphia, PA, USA
Pamela Sankar – Center for Bioethics, University of Pennsylvania, Philadelphia, PA, USA
Wayne Shelton – Program on Ethics and Health Outcomes, Alden March Bioethics Institute, Albany Medical College, Albany, NY, USA
Christian M. Simon – Department of Bioethics, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
Connie M. Ulrich – University of Pennsylvania School of Nursing, Philadelphia, PA, USA
Betty Wolder Levin – Department of Health and Nutrition Sciences, Brooklyn College/City University of New York, Brooklyn, NY, USA
Matthew K. Wynia – The Institute for Ethics, American Medical Association, Chicago, IL, USA

INTRODUCTION

Liva Jacoby and Laura A. Siminoff

In recent years, concerns over how to use the results of scientific advances, changing expectations of how medical decisions are made, and questions about the implications of demographic changes have raised ethical challenges regarding allocation of resources, justice, and patient autonomy. Bioethics – no longer the singular purview of moral philosophy – is now accepted as a legitimate field in the academic health sciences and is helping to guide policy and clinical decision-making. To achieve its full potential, it must seamlessly integrate the methods of the humanities, social sciences and medical sciences.

This volume is intended to open a window to how empirically based social research helps illuminate and answer ethical questions in health care. Its primary aim is to examine the nature, scope and benefits of the relationship between empirical social science research and bioethics. Through a thorough examination of key research methods in sociology, anthropology and psychology and their applications, the book explores the study of bioethical phenomena and its impact on clinical and policy decision making, on scholarship and on the advancement of theory. The many and varied illustrations of research investigations presented in this book allow readers to learn how different methodological approaches can address a wide range of ethical questions on both micro and macro levels.
In this vision of bioethics, fundamental questions are formulated using the tools of the social sciences, and then systematically studied with the thoughtful and methodical application of empirical methods. In this way, bioethics achieves the widest lens, allowing it to become a translational, as well as theoretical, area of inquiry.

Empirical Methods for Bioethics: A Primer. Advances in Bioethics, Volume 11, 1–10. Copyright © 2008 by Elsevier Ltd. All rights of reproduction in any form reserved. ISSN: 1479-3709/doi:10.1016/S1479-3709(07)11013-X

The book provides a primer and a guide to those who are interested in learning how to collect and analyze empirical data that inform matters of bioethical concern. It gives readers an in-depth understanding of a range of qualitative and quantitative research methodologies and is designed to convey the breadth, depth and richness of such work. This book demonstrates how synergy between the social sciences and moral philosophy can integrate into a new vision of bioethics. It is our hope that this approach will not only expand our understanding of the complex bioethical environment surrounding the provision of health care, but also encourage continued collaboration across disciplines relevant to bioethics.

This book builds on over 20 years of work in which individual researchers have attempted to shine an empirical spotlight onto bioethics. In 1984, Fox and Swazey (1984) characterized American bioethics as devoid of recognition of the social and cultural forces influencing ethical phenomena and being “sealed into itself” (p. 359). Five years later, Fox (1989) produced an eloquent analysis of the relationship between bioethics and the social sciences, characterizing it as “… tentative, distant and susceptible to strain” (p. 237).
In her analysis, she described how each field contributed to the tension: bioethics largely through its focus on individualism and its equating of the social sciences with a quantitative and non-humanistic perspective, and the social sciences through their limited interest in studying values and beliefs and their preference for structural and organizational variables which, she contended, reduced their understanding of the importance of ethical and moral values in society. Her conclusion was that the ethos of both fields, with resultant “blind spots,” constituted barriers to collaboration and synergy.

Since this bleak picture was articulated two decades ago, the relationship between the two fields has evolved to the point where bioethics is a multidisciplinary field of study (as opposed to a singular discipline), where moral philosophy, the medical sciences, the humanities and the social sciences intersect. This evolution of bioethics into a truly multidisciplinary field makes the present book not only possible but also necessary. In a sense, the book represents a culmination of the coming together of these areas of inquiry. Thus, in order to situate it appropriately, we present a brief overview of some of the literature that has helped move this development forward over the past two decades.

Leading the way are Social Science Perspectives on Medical Ethics, edited by George Weisz (1990); Bioethics and Society: Constructing the Ethical Enterprise, edited by DeVries and Subedi (1998); the SUPPORT study (The SUPPORT Principal Investigators, 1995); and several articles by sociologists, philosophers/ethicists and physician/ethicists. George Weisz, in presenting the rationale for the book, describes three key areas where he believed social scientists could make important contributions to medical ethics: (1) provide data, (2) place ethical problems in their social contexts and (3) facilitate critical self-reflection on the part of medical ethicists.
Of note is that although Weisz chronicles the important contributions of the social sciences to what he then termed “medical” ethics, he did not see the social sciences as an integral part of the field. In one of the chapters, and perhaps one of the most forceful deliberations on the topic at hand, Hoffmaster (1990) – a philosopher-ethicist – introduces the notion of ethical contextualism, whose aim is to “explain the practice of morality” (p. 250). In order to understand and explain morality and the nature of ethical dilemmas, he posits that the focus has to be on practice and on the social and historical contexts in which these dilemmas are located. In the same volume, Fox expounds on the limited attention given by bioethicists to reciprocity, interconnectedness and community when analyzing ethical phenomena. Despite the often pessimistic perspectives presented on the differences in disciplinary approaches, Weisz concludes the volume with the hope that the exchange of ideas and the “stretching of disciplinary boundaries” (p. 15) will continue.

In 1992, Hoffmaster (1992) continued his vigorous arguments for incorporating contextualism into bioethical analysis by deliberating on the question “Can Ethnography Save the Life of Medical Ethics?” Presenting ethnography as a method to better understand the structural forces and individual particularities that influence values, behaviors and decisions in medical settings, he posits that moral theory must be tested in practice in order for theoretical development to occur. To justify his position, Hoffmaster uses several highly acclaimed ethnographic studies on decision making in neonatal intensive care units and genetic counseling.

One of the most comprehensive empirical studies in bioethics so far has been the SUPPORT study, published in 1995.
With two physicians as its principal investigators, the study employed quantitative as well as qualitative methods to examine a range of end of life care issues, both descriptively and experimentally, and produced a vast amount of data that not only showed troubling results but also engendered new questions, debates and more significant research. Despite criticism from many quarters concerning the study’s methods and aspects of the intervention, this investigation was evidence of how bioethics and the social sciences informed each other, and it constituted a vital example of the testing of ethical theories in practice. In fact, it demonstrated that thinking of bioethics simply as a branch of moral philosophy that could be “aided” by social scientific inquiry was largely an outdated approach to the field.

In the first chapter of the volume Bioethics and Society: Constructing the Ethical Enterprise (1998), DeVries and Conrad (1998) call attention to the “bad habits” of analytic bioethics that have resulted in its asocial nature and practical irrelevance. They also point out the shortcoming among social scientists of separating data from norms, arguing that “a richly rendered social science … must be normative” (p. 6). Introducing the notion of “social bioethics,” they further posit that without empirical underpinnings, principlism and analytic bioethics will never lead to workable solutions to moral problems. A recurring theme in this volume is a critique of how the weight given to autonomy and individualism in American bioethics limits bioethicists’ recognition of social and cultural contexts and respect for pluralism. In one chapter, bioethics is criticized for protecting the status quo through its lack of attention to justice and structural factors in the health care system.
In 2000, the Hastings Center Report moved the discourse about the relationship between the social sciences and medical ethics forward by publishing an issue entitled “What Can the Social Scientist Contribute to Medical Ethics?” In one of the articles, sociologist Zussman (2000) recognizes how medical ethicists have become more inclined to incorporate empirical data in their analyses and that social scientists studying medical ethics have shown more openness to the normative implications of their research. Importantly, he states that the classic “ought-is” distinction and other differences between the fields denote complementarity rather than incompatibility. He concludes by calling for a combination of empirical methods and an applied ethics model as an approach to pursuing scholarly work immersed in practice. In the same issue of the Hastings Center Report, Lindemann Nelson – a philosopher – states that the social sciences can and should help bioethics by enriching an understanding of prevailing ethical values and how these “come to be installed or resisted in patterns of practice” (2000; p. 15). Using the SUPPORT study as an illustration of his arguments, he recognizes the need to give attention to how structural and institutional factors impact human behaviors and practice patterns.

Bioethics in Social Context, edited by Hoffmaster, and Methods in Medical Ethics, edited by Sugarman and Sulmasy, were published in 2001. Both constitute important work in this genre, and reflect what many of the scholars reviewed above have called for. The first volume consists of essays on qualitative research emphasizing the context within which ethical decisions take place. The second volume describes a wide range of empirical approaches to studying bioethical questions – from religion and theology, history and legal methods to ethnography, survey research, experimental methods and economics.
Setting the stage for their book with a discussion on the relationship between descriptive and normative research in medical ethics, Sugarman and Sulmasy posit that empirical research “can raise questions about the universalizability of normative claims” and “can identify areas of disagreement that are ripe for ethical inquiry” (p. 15). Claiming that “good ethics depends upon good facts” (p. 11), they take as a premise that good moral reasoning needs both moral and factual elements. Their conclusion is that descriptive and normative inquiries are mutually supportive. Including a number of methods outside as well as within the social sciences, the book sheds light on the wide range of disciplines that have contributed to the study of bioethical phenomena during the past couple of decades.

In the present book, we advance the field of empirical bioethical inquiry another step by focusing on empirical methods in bioethics and on their practical applications to investigating a wide spectrum of bioethical problems. One thing that is noteworthy, when compared to much of the work preceding this volume, is that we use the term “bioethics” rather than “medical ethics.” We believe the term “bioethics” denotes a broader meaning related to the study of the ethical, moral and social implications of the practice of medicine in all its aspects along with the social and ethical problems generated by new biotechnology and biomedical advances. We have included eight basic research methodologies and asked the authors to describe how they have employed “their” particular method to examine matters of bioethical concern. These range from informed consent, human subjects research, end of life care and decision making regarding organ donation to the tension between privacy rights and the facilitation of medical research and community standards concerning health care spending priorities.
Preceding the eight chapters on methodology are two chapters that provide two different and, in many ways, complementary perspectives on contemporary empirical bioethics – one by a practicing physician/ethicist and the other by a clinical ethicist/philosopher. Their discussions of how empirical research has contributed to their areas of expertise demonstrate how such research brings together moral philosophy, clinical practice and clinical ethics, and serve as a valuable framework for the remainder of the book.

The methodology chapters are organized under the headings of methods that are generally classified as “qualitative” research methods and those that are “quantitative.” The particular methods were chosen because they have demonstrated practical application in the study of bioethics; they include methods that are used frequently and have proven to yield valuable results. The qualitative chapters include: Content analysis, Focus groups, Ethnography and Semi-structured interviews. Chapters that focus on quantitative methods are: Survey Research, Hypothetical Vignettes, Deliberative Procedures and Intervention Studies.

The section on qualitative methods begins with a chapter on content analysis by Forman and Damschroder. As the first chapter in this section, it introduces the reader to a method that constitutes the basis for much of the analysis of qualitative research data and, as such, frames the following three chapters. The authors provide a detailed description of how qualitative content analysis, which is aimed at generating detail and depth, can be used to analyze textual data of various kinds. Their focus is on the examination of data gathered through open-ended interviews.
By using specific examples, they illustrate how content analysis provides comprehensive descriptions of phenomena; illuminates processes; captures beliefs, motivations and experiences of individuals; and explains the meaning that individuals attach to their experiences. The chapter provides ample information on the many steps inherent in content analysis, from framing the research question and deciding on the unit(s) of analysis to the various and specific forms of engaging with the data and performing the actual analysis.

In the second chapter, Simon and Mosavel discuss focus groups as a useful method to stimulate discussion and gather data on multifaceted and complex bioethical issues. The use of focus groups in bioethical inquiry has increased in recent years, and drawing on their own research experiences in South Africa, the authors explore and highlight some of the uses of this method as an investigatory tool. Referring to focus groups as a method that is comparatively cost effective and easy to implement, Simon and Mosavel present the reader with practical information on the processes and procedures of designing and conducting focus groups in an ethical, culturally appropriate and scientifically rigorous way. They go on to present a novel form of analysis of focus group data, referred to as “workshop-based summarizing and interpretation,” and describe how this approach was used with members of communities in South Africa as part of their research. Finally, they provide insights into ways of disseminating findings from focus group research.

The chapter by Gordon and Levin illuminates how ethnography, as one of the most prominent empirical methods in early bioethics research, has contributed, and still contributes, significantly to our understanding of bioethical phenomena.
The authors start by giving a brief overview of seminal ethnographic work in the field, followed by a detailed description of participant observation as “the heart” of ethnography. Using examples from research of their own and that of others conducted in a variety of health care settings, they continue by outlining the steps involved in preparing and implementing a participant observation study in the field. Their accounts give the reader valuable insights into the unique role of the participant observer, the significance of good note-taking, and common challenges encountered by ethnographers. The authors offer helpful ideas on precautions that researchers can take in order to maximize the rigor of their research and to generate valid and meaningful data. The section on the elements of data interpretation and analysis connects with Forman and Damschroder’s chapter on content analysis, providing the reader with a comprehensive guide to the collection and analysis of qualitative data. The authors end by reviewing ethical considerations in conducting ethnographic research as well as the strengths and weaknesses of such research.

The final chapter on qualitative methods describes semi-structured interviews. Along with surveys, interviews have long constituted one of the basic methods in social science research. In this chapter, Sankar and Jones begin by presenting the advantages of semi-structured interviews, which combine closed-ended questions with open-ended queries and so make possible both comparisons across subjects and the in-depth exploration of data. Comparing the method to quantitative research, the authors contend that the main strength of semi-structured interviews is in the richness of the data they generate and that the method is particularly useful in exploratory research.
Paying a good deal of attention to considerations in designing an interview guide, Sankar and Jones discuss the importance of pilot testing and, using an example from their study on medical confidentiality, address ways of maximizing the validity of interview questions and the steps involved in finalizing questions and queries. Important segments of the chapter are the discussion of sampling and the actual conducting of semi-structured interviews. Focusing on audiotaping and the digital recording of interviews, Sankar and Jones provide a valuable complement to Gordon and Levin’s discussion of note taking in ethnography. Similarly, the review of coding procedures and the particular approach referred to as “multi level consensus coding” add to the perspectives offered by Forman and Damschroder in their chapter on content analysis.

The first chapter in the book’s section on quantitative methods presents the basics of survey research. As Alexander and Wynia point out, surveys have been the bedrock of much of the research conducted by social scientists; they state that few researchers who conduct empirical research in bioethics do not use some survey research techniques. Alexander and Wynia further observe that surveys about ethically important topics have made important contributions to bioethics. However, it is not a simple task to conduct a good survey. The authors make clear that in order to obtain meaningful information from a survey, the researcher needs to pay careful attention to the development of the survey’s design, including formulating the research question, drawing the sample, developing the questionnaire and planning how the data will be managed and analyzed. The authors contend that using rigor throughout this process can be the difference between an important study that makes fundamental contributions and one that is irrelevant to ethical analysis, health policy or clinical practice.
The chapter gives the reader a primer on how to balance rigor with feasibility at all stages of survey development, fielding, analysis and presentation and helps the reader plan, develop and conduct a survey.

A technique related to survey research is the use of hypothetical vignettes. This technique is especially relevant to bioethics research where it can be difficult to directly observe certain ethical problems because of the intensely personal nature of the questions of interest (removal of ventilator support from a patient), the rarity of the occurrence (requests for organ donation in the hospital), or a sensitive question that may reside at the edge or over the edge of what is legally permitted (assisted suicide and euthanasia). As Ulrich and Ratcliffe point out, hypothetical vignettes provide a less personal and, therefore, less threatening presentation of such issues to research participants. The chapter provides an overview of hypothetical vignettes with examples of how this method has been used to examine and analyze critical ethical problems. It reviews ways to evaluate the reliability, validity, strengths and limitations of studies using vignettes. The chapter takes the reader through what constitutes a vignette, how to develop a vignette about a bioethics-relevant problem, how to evaluate the psychometric properties of vignettes, the determination of sample size and sampling considerations, and examples of published vignettes used in empirical bioethics research.

The chapter by Goold, Damschroder and Baum will introduce many readers to a methodology unfamiliar to them – deliberative procedures. This methodology is based on theories of deliberative democracy with the idea of providing community members with a “voice” in community-wide decisions, for instance about health care spending priorities or research regulation.
Deliberative procedures offer an opportunity for individuals to assess their own needs and preferences in light of the needs and desires of others. In bioethics research, deliberations involve individuals in a community decision-making process about bioethical issues with policy implications and may lend acceptance and legitimacy to a given issue within a population. The authors describe how deliberative procedures entail gathering nonprofessional members of the public to discuss, deliberate and learn about a particular topic with the intention of forming a policy recommendation or casting an informed “vote.” For researchers involved in exploring bioethical issues, deliberative procedures can be a valuable tool for gathering information about public views, preferences and values. This chapter focuses on de novo deliberative procedures used for research purposes, or combined policy and research purposes, where sampling issues and research aims are known and planned up front. The chapter offers a review of methodological considerations unique to, or particularly important for, deliberative methods, including sampling (specifically substantive representation), what to measure and when, the use of group dialog in the data collection process, and the role that deliberative procedures can play in educating the public and informing policy.

The final chapter deals with intervention research. As Broome makes clear, the use of intervention designs, while a relatively recent phenomenon in bioethical inquiry, has a distinct and important role to play in advancing the field of bioethics. Its importance will grow as more empirical bioethics research not only provides data that inform policy and/or practice, but also asks questions about what policies or practices work best.
Although many ethical questions of interest are not appropriate for intervention research, the author contends that some questions can only be answered using experimental or quasi-experimental designs. The chapter provides the reader with a review of the application of experimental methods to bioethics research, including randomized controlled trials and quasi-experimental designs ranging from the more rigorous two-group repeated measures or pre-test/post-test designs to the one-group post-test-only design. Strengths and limitations of each design, including the threats to internal and external validity, are presented. As Broome stresses, not all bioethical phenomena are appropriate for study using experimental or quasi-experimental designs. Examples of intervention research related to informed consent are provided.

With this book, we hope to create enthusiasm for empirical research that will continue to bring synergy between disciplines representing bioethics and, in so doing, further enhance our understanding of bioethical phenomena.

REFERENCES

DeVries, R., & Conrad, P. (1998). Why bioethics needs sociology. In: R. DeVries & J. Subedi (Eds), Bioethics and society: Constructing the ethical enterprise. Englewood Cliffs, NJ: Prentice Hall.
DeVries, R., & Subedi, J. (Eds). (1998). Bioethics and society: Constructing the ethical enterprise. Englewood Cliffs, NJ: Prentice Hall.
Fox, R. (1989). The sociology of bioethics. In: R. Fox (Ed.), The sociology of medicine. Englewood Cliffs, NJ: Prentice Hall.
Fox, R., & Swazey, J. (1984). Medical morality is not bioethics – medical ethics in China and the United States. Perspectives in Biology and Medicine, 27(3), 337–360.
Hoffmaster, B. (1990). Morality and the social sciences. In: G. Weisz (Ed.), Social science perspectives on medical ethics. Boston: Kluwer Academic Publishers.
Hoffmaster, B. (1992). Can ethnography save the life of medical ethics? Social Science and Medicine, 35(12), 1421–1431.
Hoffmaster, B. (Ed.) (2001). Bioethics in social context. Philadelphia, PA: Temple University Press.
Lindemann Nelson, J. (2000). Moral teachings from unexpected quarters – lessons for bioethics from the social science and managed care. Hastings Center Report, 30(1), 12–21.
Sugarman, J., & Sulmasy, D. (2001). Methods in medical ethics. Washington, DC: Georgetown University Press.
The SUPPORT Principal Investigators. (1995). The SUPPORT study. A controlled clinical trial to improve care for seriously ill hospitalized patients. Journal of the American Medical Association, 274, 1591–1598.
Weisz, G. (Ed.) (1990). Social science perspectives on medical ethics. Boston: Kluwer Academic Publishers.
Zussman, R. (2000). The contributions of sociology to medical ethics. Hastings Center Report, 30(1), 7–11.

SECTION I: PERSPECTIVES ON EMPIRICAL BIOETHICS

THE ROLE OF EMPIRICAL DATA IN BIOETHICS: A PHILOSOPHER’S VIEW

Wayne Shelton

Empirical Methods for Bioethics: A Primer. Advances in Bioethics, Volume 11, 13–20. Copyright © 2008 by Elsevier Ltd. All rights of reproduction in any form reserved. ISSN: 1479-3709/doi:10.1016/S1479-3709(07)11001-3

THE CRISIS OF TRADITIONAL ETHICAL THEORY

How many textbooks or introductory articles in bioethics begin with a section on ethical theory? In the many that do, the relevance of the basic theories of utilitarianism, deontology, virtue ethics, feminist ethics, casuistry and so on is assumed. These theories are also considered in light of the well-accepted principles of medical ethics: (1) respect for patient autonomy, (2) beneficence, (3) non-maleficence and (4) justice. Those of us trained in philosophy find these sections on theory terse summations of complex philosophical views. Physicians and nurses, and others not trained in philosophy, sometimes struggle to get their gist, and end up with an ability to make a basic analysis and formulate arguments about ethical problems from each of these perspectives, and to write and discuss the issues that arise with fellow ethicists. But how essential are these theoretical perspectives to the real work of clinical ethics consultants? It is important that we do not forget just how applied and practical that work is. Regardless of one’s background on entering bioethics, and particularly clinical ethics, anyone who wishes to become a clinical ethics consultant and work in the field of applied clinical ethics must grapple firsthand with value conflicts in real-life situations. Applied clinical ethics is the study of the range of value-laden ethical conflicts and dilemmas that arise in the clinical setting, and especially in the physician–patient relationship. The ethics consultant, a specialist in applied clinical ethics, combines the core clinical skills necessary to support and help people embroiled in value conflicts with the advanced analytical skills and knowledge needed to offer a considered analysis of the opposing value positions and to make recommendations consistent with the rights and obligations of those involved. Some ethics consultants are also expected to communicate directly with patients and families to facilitate an ethically acceptable outcome. For those ethics consultants directly involved in clinical value conflicts, both in terms of their practical resolution in individual situations and in terms of their import for academic reflection, what is the role of ethical theory in the work of applied ethics? And more specifically, can traditional ethical theory serve as a normative basis on which we judge one moral option better than another? These are indeed legitimate theoretical ethical questions. But it is doubtful that our typical educational training prepares ethics consultants to answer them.
It is unfortunate that philosophical ethical theories still have an uncertain and, I would say, awkward fit into the practical realm of applied ethics. As someone who has taught a number of graduate courses in clinical ethics, I justify providing a brief introduction to ethical theory, after the nature of real clinical value conflicts has been established, as a way to stimulate the student’s imagination in analyzing and seeing all the possible ways of viewing and justifying a case ethically. But again, there is the continuing sense of not knowing quite how to use ethical theory in bioethics in relation to problem solving, particularly in knowing the normative force of ethical theory. The purpose of this brief chapter is to present an innovative alternative that was proposed by John Dewey in the early part of the last century. I will argue that the emergence of an applied ethical field like bioethics provides the occasion to reconstruct our understanding of the relation of theory and practice, and creates a whole new appreciation of, and need for, empirical bioethics. From this perspective, ethical theory can be viewed in a different, and more constructive, light. Dewey’s critique of western philosophy and ethical theory was the springboard for a new understanding of ethics. Dewey believed that western philosophy since Plato was largely based on false dualisms that created a bogus dichotomy between some ultimate, rational understanding of truth and what can be known from ordinary experience. It is because of such dichotomies that the “Is-Ought” problem has appeared so impenetrable: because reason and experience are assumed to be disconnected, empirical knowledge is taken to be of little or no help in deriving moral obligations. (Readers interested in Dewey’s critique should see The Quest for Certainty: A Study of the Relation of Knowledge and Action (Dewey, 1960).)
Dewey was quite convinced, as are many of his followers, that western philosophy has failed to reach final understandings of rational truth, and that we are wasting our time if we continue to search for them. He contended that there has never been and will never be final agreement about such quests. It is for this reason that Dewey believed philosophy, including ethics, is in need of reconstruction – a whole new pragmatic understanding that grounds ethical inquiry in human experience. In light of this critique of western philosophy, one can fully appreciate the problem of knowing just how to use traditional ethical theory, and see why the pragmatic turn in bioethics marks a new beginning for philosophy and ethics.

PRAGMATIC ETHICS

Pragmatic ethics begins with the reality of the lived experience of human beings who are connected biologically, socially and politically within a natural environment. Thus, the moral life is connected to the conditions that best foster human flourishing and reduce suffering. “Right” and “good” are the ends of moral inquiry, not assumptions grounded in normative religion or philosophical theory. Many have taken this approach as a step toward moral relativism and a crisis of value. But pragmatists believe the turn toward naturalism is the occasion to fully grasp the vital role of human beings in shaping their own fate and promoting a better society. Human beings’ highest aspirations – justice, peace and alleviation of suffering – lie in the enhancement of human intelligence and of forms of inquiry that allow humans to better understand how to craft a better future for everyone. Questions of moral conflict thus become problems of strategy, which humans resolve using principles that guide expedient resolution within a social circumstance or context.
According to this perspective, since ethical reflection begins and ends in experience and not in pre-established, a priori patterns of reasoning, the tension between “is” and “ought” begins to subside. Because ethicists no longer work only in rarefied academic settings but more and more function as applied ethicists alongside practitioners in clinical settings, there is a need to provide working solutions to medical ethical problems. Not many applied ethicists I know are looking for ultimate answers; rather, they seek solutions that help people in conflict to make decisions and to cope better in their environments. Ethicists must themselves know the experiential landscape of a given situation in order to provide meaningful and helpful moral advice to practitioners. But in order to have viable interpretations of such situations, ethicists need to be able to use imagination to creatively propose viable strategies for amelioration of the human condition – at both the micro and macro levels. This requires knowledge and understanding of the empirical circumstances in which value conflicts arise. In order to generate empirical knowledge, the field of bioethics must rely on the same empirical methodological basis as the social sciences. This is consistent with Dewey’s hope for philosophy: that it would be put to work alongside the other sciences for the betterment of society.

EMPIRICAL BIOETHICS

In its reconstructed role, ethical theory is no longer isolated from experience in preexisting forms, as it appears in traditional philosophical ethics, but is connected to experience and informed by it. The goal of “right” action is not to make a determination of a final moral duty, but to make provisional statements that must be continually monitored and, like scientific claims, revised when warranted by further empirical knowledge.
The philosophical quest of seeking final rational answers is replaced by a commitment to improve at least certain aspects of human life. For the clinical ethics consultant, it is to improve the conditions for people with conflicting values and/or preferences in particular clinical settings. Thus, empirical knowledge, and therefore empirical bioethics, is essential to the field of bioethics and to all ethical inquiries that seek solutions to value conflicts. The rise of empirical knowledge in bioethics also allows us to generate new knowledge about the empirical landscape of clinical ethical conflicts. This knowledge leads to greater insight into the associative and causal elements that generate ethical conflicts in the first place. With more empirical knowledge, more strategic mastery of the course of clinical events is possible. This can occur by providing more effective ethics consultations in individual cases. But empirical knowledge can also become a basis for developing broader, preemptive strategies for dealing with ethical conflicts. It is a sign of a maturing field, one more aligned with other fields based in scientific methodology, that many applied ethicists, health care practitioners, policy makers and others are becoming more focused on improving quality of care by reducing the incidence of ethical conflicts. This is done by testing, through scientific empirical studies, the efficacy of new strategies that focus on improved management of the contributing elements of ethical conflict. Most ethics consultants quickly realize that it is much more efficient to prevent major ethical conflicts from arising than to have to grapple with them in their often final and intractable manifestation as ethical dilemmas. This requires the ethicist to become proficient in empirical studies of clinical decision making and outcomes and related issues that affect patient and family care.
A major empirical study in the Surgical ICU at an academic health care institution, led by a philosopher/ethicist, is an example.

AN ILLUSTRATION

The study grew out of extensive experience providing ethics consultations in the ICU setting. Based on recent ICU data accumulated from an internal study, we learned that up to 60% of patients for whom ethics consultations were done died. In these cases, families typically go through extremely stressful and, sometimes, gut-wrenching experiences of decision making for their loved ones. Families in this setting are routinely required to make decisions about whether or not to continue life-sustaining treatments, and how best to follow the expressed wishes of the patient in situations where the patient lacks capacity. In the course of providing ethics consultations, it is often necessary to have extensive conversations with the family of an incapacitated patient, allowing them to share their intimate knowledge of the patient’s values and how those values apply to medical decision making. Many times a consensus emerges about the right course of action, but unfortunately, sometimes there is deep disagreement between family members, and between family and care providers. At those times, the ethics consultant is there to help mediate what is often a value-laden conflict that has reached an impasse. Attitudes and dispositions among those involved can become hardened and people’s positions entrenched. Conflicts that drag on frequently lead to a lack of clarity in defining goals of care, and patients may stay in the ICU for an extended length of time, using costly resources. In situations where conflicts persist, it is possible that patients are receiving care that is not medically indicated. In many instances, the ethics consultant can serve as an outside mediator, clarifying facts and values and helping the parties in conflict reach mutually acceptable outcomes.
But from the ethics consultant’s point of view, the time of entry into the situation is late, and improvements in care are made only one case at a time. Commonly, much has happened prior to the point of the ethics consultation that impacts the conflict, and often the key factor is how the information flow and communication have occurred between the health care team and the family. Over the years I have often drawn from my experience as an ethics consultant dealing with ethical conflicts in the ICU for teaching purposes, to illustrate the ways in which care providers can interact with families so as to prevent conflicts from arising in the first place. Based on these observations and insights, my hunch grew into a well-formulated research question about how a highly tailored plan for focused family support might reduce ethical conflicts, as well as increase family satisfaction and reduce length of stay for patients most at risk for extended length of stay. Thus, the inspiration for a study! As a philosopher trained in clinical ethics consultation, and also in social work and health care policy, I became interested in how one might do an empirical study of what was, at that point, a hunch. After an exhaustive literature review, numerous discussions with many clinicians and researchers, several pilot studies, two related publications (Gruenberg et al., 2006; Rose & Shelton, 2006) and extensive planning and proposal writing, funding was received to begin the “ICU study”. The study is designed to test the hypothesis that a focused, multidisciplinary model of family support, including the combined resources of ethics consultation, pastoral care, social work and palliative care, all led and coordinated by a nurse practitioner, will lead to (1) increased family satisfaction with care, (2) decreased unnecessary and unwanted care, and (3) reduced cost and resource utilization.
The intervention will use a nurse practitioner to gather information about the family from the non-medical support services. Working directly with the physicians, he or she will ensure that this information is used meaningfully and robustly in the medical decision-making and goal-setting process. Thus, the nurse practitioner will be the crucial link between the physicians in charge of directing medical care for the patient and the family. In this way, the intervention is designed to resolve one of the most pervasive and well-known problems of hospital care: the absence of anyone who takes responsibility for the patient and family and provides one central line of communication to manage the flow of medical and other essential information. This perennial problem regarding the flow of information to families and patients in hospitals was identified by Michael Balint as “collusion of anonymity” in his book The Doctor, His Patient and the Illness, first published in 1957. Since that time, the problem has grown much more complicated with the rise of highly complex, specialized medical fields, especially in the context of large teaching hospitals. It is common in such hospital settings for families to come into contact with numerous physicians from many specialty services, each with their own medical perspective and sometimes at variance with one another in terms of the prognosis and goals of care for the patient. To say this situation can be confusing for a stressed family distraught over the illness of their loved one is an understatement. Such confusion can also fuel misunderstandings, leading to strong emotions of anger and resentment. Most experienced ethics consultants know, at least anecdotally from their own experiences, that such situations are a breeding ground for ethical conflicts and dilemmas, and for dissatisfaction with care among families.
The goal of the study intervention is to preclude clinical ethics dilemmas of these kinds, and to engage in what is sometimes referred to as “preemptive ethics” by preventing the problems from occurring. Instead of dealing with ethical problems one at a time, from one crisis to another, the study will provide the benefit of collecting empirical data that can illuminate and address underlying root causes of ethical conflicts at the bedside. Our premise is that stressed families who receive focused support will gain an enhanced ability to make decisions with ease and understanding, according to their values and preferences and those of the patient. The emphasis of the study is on improving the quality of care for seriously ill patients, and testing whether such an improvement reduces cost of care. Is this a clinical ethics study? I would say, yes! Most importantly, it shows how an emerging field like empirical bioethics is connected to other key areas of health care research, such as quality improvement, resource utilization and outcomes studies. Therefore, one significant point is that as bioethical inquiry evolves into empirical investigations, it clearly becomes more multidisciplinary and gains more standing as an important area of health care research.

CONCLUSIONS

In light of this emerging trend toward empirical bioethics, where does this leave ethical theory, and what is its role in applied ethics? Applied ethicists who have taken the pragmatic turn and are now exploring empirical bioethics in terms of outcomes research see the tradition of western ethical theory as a body of literature with no special moral authority. This is not to say ethical theory is not interesting or even important, but clearly the perspective of the naïve graduate student looking for foundational moral authority is gone. Instead, theories serve more as a set of tools that are handy to have at one’s disposal.
They provide ways of asking relevant questions, structuring arguments and formulating alternative resolutions to pressing problems. To the extent that theories are now used by the empirically oriented, applied philosopher, they represent structured ways of conceiving alternative moral perspectives that stem from direct experience in the empirical details of ethical problem solving. The theories stem from the imagination and provide the ethical visions for forging a better state of affairs with respect to some human value problems. Perhaps with more bioethical empirical research there will emerge a new type of fully articulated theory, as Dewey was hoping, that is better suited for practical problem solving. This also means much less of a focus, if any, on the questions that have flowed out of modern philosophy, based on dualisms that force us to separate theory and practice, mind and body, and fact and value. So at this point of a new beginning for applied philosophy, a field we call empirical bioethics, it is a good thing that philosophers are getting their hands dirty in the real world of experience and grounding in it their theoretical approaches to improving the human condition.

REFERENCES

Balint, M. (2000). The doctor, his patient and the illness (2nd ed.). Amsterdam, The Netherlands: Churchill Livingstone.
Dewey, J. (1960). The quest for certainty. New York: Capricorn Books.
Gruenberg, D., Shelton, W., Rose, S., Rutter, A., Socaris, S., & McGee, G. (2006). Factors influencing length of stay in the intensive care unit. American Journal of Critical Care, 15(5), 502–509.
Rose, S., & Shelton, W. (2006). The role of social work in the ICU: Reducing family distress and facilitating end-of-life decision-making. Journal of Social Work in End-of-Life and Palliative Care, 2(2), 3–23.
THE SIGNIFICANCE OF EMPIRICAL BIOETHICS FOR MEDICAL PRACTICE: A PHYSICIAN’S PERSPECTIVE

Joel Frader

INTRODUCTION

While some of us enjoy engaging in many forms of bioethical activity, including philosophical analysis and debate, clinical ethics consultation, and empirical research, only the last matters much to the practicing physician. Practically minded, most doctors have little concern with fine moral distinctions when faced with a patient’s request for assistance in dying or a pharmaceutical company’s offer of attendance at a product “consultation” session at a first-class resort, along with an attractive fee for participation. Physicians want to know what facts might bear on ethical questions they confront, how ethical conflicts that have an impact on patient care can be understood and resolved, and whether research reveals consistently clear, helpful findings. The following discussion offers some examples of how empirical research related to bioethical issues has provided evidence and guidance for physicians at both the individual patient-care and policy levels, and further reviews areas that warrant continued research attention.

Empirical Methods for Bioethics: A Primer. Advances in Bioethics, Volume 11, 21–35. Copyright © 2008 by Elsevier Ltd. All rights of reproduction in any form reserved. ISSN: 1479-3709/doi:10.1016/S1479-3709(07)11002-5

AREAS OF RELATIVELY CLEAR EVIDENCE

Informed Consent

During the second half of the twentieth century in the United States, medicine experienced enormous change. Scientific and technological advances made medical interventions vastly more effective, and health care became a major economic engine. In accord with the political and cultural changes emphasizing the rights of individuals, ethical and legal thinking about the relationship between professional providers and researchers on the one hand and patients and subjects on the other hand shifted from beneficent paternalism to consumerist autonomy.
The doctrine of informed consent became the judicial and ethical cornerstone of decisions about medical treatment and/or research participation. Public policy, fashioned to prevent recurrences of Nazi medical atrocities, of the kind of researcher arrogance documented by Beecher (1966), Barber, Lally, Makarushka, and Sullivan (1973) and Gray (1975), and of the medical over-treatment noted by the President’s Commission for the Study of Ethical Problems in Medicine and Biomedical and Behavioral Research (1983) and Weir (1989), focused on individual consent by the patient or subject, or on surrogate authorization. Unfortunately, research about the realities of informed consent shows that practice fails to live up to ethical theory and legal doctrine. In the context of medical treatment, evidence indicates that differences in knowledge, social status, and emotional states between practitioners and patients/family members undermine the value of information about the risks, benefits, and alternatives to proposed plans of care (Lidz et al., 1983; Appelbaum, Appelbaum, & Grisso, 1998; Siminoff & Drotar, 2004). Research also reveals that physicians often lack adequate training for and motivation to communicate clearly with patients and surrogates (Angelos, DaRosa, & Sherman, 2002; Sherman, McGaghie, Unti, & Thomas, 2005). Similar problems plague the implementation of the doctrine of informed consent in research. Potential subjects frequently suffer from the “therapeutic misconception,” believing that proposed clinical research is designed to provide them with the best known therapy, despite randomization schemes ensuring distribution of subjects into treatment arms of competing value (Lidz & Appelbaum, 2002). In recent studies on research participation among children with cancer, Kodish et al.
(2004) have dramatically demonstrated that parents commonly do not understand that their children could receive treatment without enrolling in research, nor do they typically comprehend that treatment assignment proceeds by chance, rather than by the doctor’s deliberative decision about which regimen would work best for their child. Finally, especially in the research context, studies have repeatedly shown that the obsessive, legalistic focus on written consent forms generally fails to produce truly informed consent (Baker & Taub, 1983; Ogloff & Otto, 1991; Waggoner & Mayo, 1995; Lawson & Adamson, 1995; Agre & Rapkin, 2003). Consent forms, despite all the attention lavished on them by investigators, research coordinators, institutional review board (IRB) members and staffs, commonly contain language far too complex and technical for the general population to comprehend. What seems astonishing, in the face of years and volumes of research about the failure of efforts to effectuate meaningful informed consent, is the continued reliance on incomprehensible consent forms, and the lack of adequate preparation of clinicians (Wu & Pearlman, 1988; Sherman et al., 2005) or research team members for communicating clearly to patients and their loved ones about illness, treatment, and research, and about the maintenance of ethical and legal notions of the consent process. The disconnection between ethical theory and application suggests that the theory needs to change to better reflect social practicality and/or that clinicians and investigators need to become more creative and diligent in the way they convey information and assess understanding. Some recent developments, largely beyond the scope of this brief review, provide some hope that a much better job can be done with the process of informed consent, though not without substantial investment of time and resources.
Researchers starting from the perspective of appropriate use of health care resources, particularly those noting wide regional variations in practice in the United States (Wennberg, Fisher, & Skinner, 2004), have become interested in standardized presentation of information to patients and surrogates. A recent review by O’Connor, Llewellyn-Thomas, and Flood (2004) concluded that use of various patient decision aids, such as interactive computer programs, up-to-date multimedia presentations, tools to help consumers identify and actualize their health-related values, and forms of personal coaching by health care professionals or experienced peers, can improve the quality of decisions and reduce the use of invasive interventions without a decline in health outcomes. We will need additional efforts to assess whether such mechanisms enhance the informed consent process and whether they have practical and cost-effective use in everyday health care settings. This is particularly the case in situations involving emotionally charged decisions, such as testing for genetic susceptibility to disease, forgoing life support, or the choice among alternative therapies for cancer.

Advance Directives

With the dramatic increase in intensive care units (ICUs) in American hospitals amid the medical and technical advances of the past 50 years, patients and families became concerned with over-treatment, which led to the “right to die” movement. A series of legal cases related to family members’ requests to discontinue unwanted prolongation of life for their loved ones led to the notion that patients could avoid undesired life-sustaining interventions through preparation of documents designed to communicate treatment preferences once they lost capacity to interact effectively with health care providers. The first case to receive national attention in the U.S. involved a young woman named Karen Ann Quinlan, who sustained severe brain injury.
After prolonged treatment, including mechanical ventilation, her parents requested discontinuation of life support, but the patient’s doctors and hospital administrators, fearing legal sanctions, refused. The New Jersey courts in this case (In re Quinlan, 1976) and in a subsequent seminal case involving Claire Conroy (In re Conroy, 1985), an elderly woman with several chronic conditions who was no longer able to speak for herself, helped establish criteria doctors and surrogates might use to justify forgoing life-sustaining treatment, recognizing the importance of evidence of what the patient him- or herself would want under the circumstances. This became a matter of Supreme Court consideration in the Nancy Cruzan case (Cruzan v. Director, 1990), which came to center on the issue of the quality of evidence necessary to establish the patient’s wishes. As a result of such cases, institutional policies and, eventually, federal and state laws began to recognize “advance directives” as a valid means by which patients and surrogates could communicate their wishes in the face of lost decision-making capacity. One type of document, the “instructional” directive, in which the individual attempts to project what treatments to employ or avoid if he or she can no longer express him- or herself, was designed to provide the means for limiting – or, in some cases, continuing – medical interventions, especially in situations where treatment might involve marginal benefits. Research has shown that instructional directives – whether oral or written – lead to little change in what happens to patients.
In the SUPPORT study, research nurses responsible for ascertaining patients’ treatment preferences and communicating them to physicians in ICUs failed to affect physicians’ decisions when compared to decisions among matched control patients without the enhanced communication intervention (SUPPORT Principal Investigators, 1995). Various other studies have documented that even when one can find the properly executed document, no matter what form a prescriptive written instrument takes (i.e., using check boxes, blanks that patients fill in with their own words, prepared descriptions of therapies to avoid or use, etc.), clinicians frequently feel that their patients’ circumstances insufficiently match what the document says or anticipates. Thus, as suggested in an Institute of Medicine (IOM, 1997) study, Approaching Death: Improving Care at the End of Life, and in a review by Lo and Steinbrook (2004), instructional directives may confuse, rather than clarify, end-of-life treatment decisions. Current research, and the aftermath of the Schiavo case (Quill, 2005), suggest the need for “proxy” directives, or for a combination of surrogate appointment (clarifying who should make decisions for the incapacitated patient) and instructions, rather than prescriptive documents alone.

Health Care Disparities

One of the tenets of modern bioethics involves the importance of social justice. With regard to medical care, most ethicists believe that everyone should have access to a “decent minimum” level of care, presumably including at least that which primary care physicians can deliver. In research settings, most agree, and federal regulations require, that the benefits and burdens of research and the advantages of access to research studies should be distributed equitably across social classes, ethnic groups, males and females, young and old, and other subgroups.
Unfortunately, empirical research suggests that our health care and research systems have not lived up to these ideals. Politicians like to proclaim the glories of the health care “system” in the U.S. Studies show that some miraculous things do occur, as long as patients or family members have the means to ensure payment for desired services. For example, kidney transplants for those with end stage renal disease occur in substantially greater proportion among white, middle-class patients than among those who are poor and African-American (Eggers, 1995; Epstein et al., 2000; Churak, 2005). This holds true even though kidney failure affects a significantly larger percentage of the African-American population (Martins, Tareen, & Norris, 2002). When eliminating the variable of direct payment, such as within the Veterans Administration system, research shows that African-American patients with equivalent medical conditions receive less aggressive care for serious heart disease than do Caucasians, i.e., fewer referrals for cardiac catheterization and/or coronary artery bypass surgery (Whittle, Conigliaro, Good, & Lofgren, 1993; Peterson, Wright, Daley, & Thibault, 1994). Recent findings have indicated similar results when comparing male cardiac patients with females, with the latter receiving less intervention (Maynard, Every, Martin, Kudenchuk, & Weaver, 1997; Fang & Alderman, 2006). The lack of access of poor patients to primary care has meant greater, more expensive, and less efficient and effective care (often in emergency departments), leading to delayed or inadequate maintenance and preventive services, and thus more frequent and/or more severe exacerbations of chronic conditions such as asthma, arthritis, and diabetes (Forrest & Starfield, 1998; Stevens, Seid, Mistry, & Halfon, 2006). Similar patterns can be seen in the research arena.
For example, as only a small proportion of children require extensive medical intervention, drug manufacturers often decline to include children in clinical studies on new medications (Yoon, Davis, El-Essawi, & Cabana, 2006). As a result, those treating young patients lack systematic knowledge of proper dosing and differences in toxicities for immature human bodies. In the same vein, fear of liability, rather than patient benefit, seems to have affected pharmaceutical companies’ decisions not to study the effects of new drugs in women who are pregnant or at risk of becoming pregnant, even though these women may have or may acquire the same conditions in need of treatment as women who cannot bear children. Those favoring the inclusion of known-to-be or possibly pregnant women in clinical studies view their exclusion as discriminatory and feel that the women, not corporate legal counsel, can and should balance potential harms to themselves or their fetuses compared to the benefits that a clinical trial may offer (McCullough, Coverdale, & Chervenak, 2005). Only additional research focused on this issue can begin to clarify both the likelihood of harm and the adequacy of decision making in the face of possible or actual pregnancy. Further research aimed at ameliorating injustice in the worlds of clinical care and clinical research seems imperative. As noted, while we know African-Americans receive disproportionately fewer kidney transplants than whites with end stage renal disease, it is poorly understood how much a lack of timely access to primary care, sub-specialty (nephrology) care, transplant center evaluation, or other factors contribute to the low transplantation rate. Without such knowledge, one cannot recommend ethically optimal interventions that can address the inequities. 
AREAS OF CONFLICTING OR UNCLEAR EVIDENCE

Assessment of Decision-making Capacity

As reliance on the doctrine of informed consent depends on the ability of patients or subjects to understand and make use of information about proposed clinical or research interventions, many researchers and clinicians have sought accurate, reproducible methods to determine the adequacy of a patient’s or subject’s decision-making capacity. Putting aside efforts to decide if persons in the criminal justice system have sufficient capacity to stand trial, the search for an assessment instrument or process to adequately assess decisional capacity has had mixed success. In part, the problem seems to stem from the different questions researchers or clinicians believe ought to be asked. For example, does the individual have sufficient capacity to consent to medical care, to participate in research, to return to independent living, to make financial decisions, etc.? According to one review (Tunzi, 2001), the well known MacArthur Competence Assessment Tool (Grisso & Appelbaum, 1998) applies best to persons with known psychiatric or neurological disorders while others, such as the Capacity Assessment Tool (Carney, Neugroshi, Morrison, Marin, & Siu, 2001) or the Aid to Capacity Evaluation (Etchells et al., 1999) work in a general patient population. As Baergen (2002) and Breden and Vollman (2004) comment, the instruments may over- or under-estimate patients’ understanding of their situation, focus mostly on performance of cognitive tasks, inadequately assess the importance of a person’s values and feelings, and minimize the complexity of decision making in actual medical situations, fraught, as they often are, with several levels of uncertainty. In one particular population, that of minors, confusion and conflict reign regarding decision-making capacity.
The work of ethnographers and others, most notably Bluebond-Langner (1978), shows that the experience of chronic illness brings knowledge and decision-making ability well beyond one’s years for many children with serious medical conditions. On the other hand, focusing on the need for broad and effective public policy, social psychologists and lawyers (Scott, Reppucci, & Woolard, 1995) rely on research findings that adolescents typically and excessively (1) attend to the short-term consequences of decisions and actions; (2) tolerate risks; and (3) bow to pressure from others, especially adults in authority or peers. From this perspective, one should delay adolescent decisional authority as long as possible, hoping for the onset of maturity. It seems that empirical research has failed to provide information as to how to determine medical decision-making capacity among adolescents under particular circumstances. Additional research could clarify how to balance an adolescent’s experience and maturity against population-based concerns about psychological and social development. When might an individual teenager, for example one who has lived with severe cystic fibrosis for years, despite relatively young age, say 14 years, have sufficient judgment to decline another round of mechanical ventilation in the ICU, with or without support from her parents? Targeted clinical studies could provide valuable practical data for agonizing ethical decisions regarding such thorny issues.

Protection of Human Subjects of Research

Since the mid-1970s, with the introduction of federal regulations governing how institutions must review and oversee research involving human subjects, distressed and disgruntled investigators have often wondered about the extent to which bureaucratic processes actually protect subjects from research-related risk(s).
As noted above, the common failure to produce intelligible consent forms suggests the current regulatory structure might not be effective. On the other hand, in the face of enormous increases in biomedical and behavioral research with human subjects since World War II, commentators, e.g., Emanuel (2005) and Fost (2005), point to the apparent low frequency of coercive abuses of human subjects in the era of oversight by IRBs. From this perspective, the rare dramatic and tragic deaths of research subjects prove the rule that the system works well, despite federal rebuke to IRBs at institutions where the deaths occurred. Only a few studies have systematically assessed the effectiveness of IRBs. One early study that used sham protocols submitted to IRBs around the U.S. found considerable differences in the quality and level of detail of reviews, not to mention willingness to approve or reject questionably acceptable studies (Goldman & Katz, 1982). Several more recent publications regarding differing kinds of research (critical care, genetics, health services, and adult surgery) have indicated continuing variability in IRB procedures and responses to federal regulations aimed at protecting human subjects (Silverman, Hull, & Sugarman, 2001; McWilliams et al., 2003; Dziak et al., 2005). At least one ongoing study (Lidz, 2006) should begin to shed needed light on how IRBs routinely do their work. With regard to research including children, a classic vulnerable group, one might expect greater concern for subject protection and therefore greater consistency in the application of rules and regulations. However, some studies do not bear out that expectation. A review of published human subjects research with children led to a survey of authors whose articles failed to indicate whether their research had had IRB review and/or used required informed consent procedures (Weil, Nelson, & Ross, 2002).
The survey found significant misclassification by IRBs of studies felt to be “exempt” from IRB review. Another survey of IRB chairs who frequently assessed studies involving children noted wide variations in the definitions the IRBs used for interventions constituting “minimal” or “minor increase over minimal” risk (Shah, Whittle, Wilfond, Gensler, & Wendler, 2004). A recent study showed large differences between IRBs in how they required investigators to respond to federal regulations regarding child assent to participation in research (Whittle, Shah, Wilfond, Gensler, & Wendler, 2004), while a review of IRB websites revealed some “incorrect advice” to investigators about meeting regulatory requirements concerning children (Wolf, Zandecki, & Lo, 2005). These studies begin to point to the areas where interventional research designed to determine how best to help IRB committee and staff members understand ethical considerations and regulations for pediatric research would be useful. There have also been studies of IRB members that suggest potential problems for human subjects protections in general. Campbell et al. (2003) found that of the nearly 3000 US medical school faculty members responding to their survey, 11% had served on IRBs. Almost half of these individuals (47%) consulted for industry, raising concerns about conflicts of interest in judging research protocols. Van Luijn, Musschenga, Keus, Robinson, and Aaronson (2002) found that “a substantial minority” of the 53 IRB members they interviewed in the Netherlands about reviews of phase II clinical oncology trials “felt less than fully competent at evaluating” key aspects of the research protocols. Several US federal government bodies have attempted to produce overviews of the adequacy of human subjects protections in the last several years.
In 2000, the Office of the Inspector General of the Department of Health and Human Services (2002) issued a report entitled “Protecting Human Research Subjects: Status of Recommendations.” Citing an earlier report from 1998 from the same office warning of problems in the system, the follow-up report indicated that few of its previously recommended reforms had been put in place. It noted little progress in “continuing protections” of research subjects beyond initial IRB reviews, and inadequate educational requirements for investigators or IRB members on protecting research subjects generally and on preventing or minimizing conflicts of interest especially.

In 2001, the National Bioethics Advisory Commission (NBAC, 2001), appointed by President Clinton, issued its report Ethical and Policy Issues in Research Involving Human Participants. The introduction, entitled “The need for change,” identified challenges faced by the research oversight system, highlighting the enormous workload faced by IRBs at research-intensive institutions and the high financial stakes involved. NBAC noted inadequate protections for potential subjects from vulnerable populations, inconsistency and rigidity in federal regulations, weaknesses in enforcement mechanisms available to agencies overseeing human subjects research, and inadequate resources for IRBs (administratively) and for IRB members in terms of time and education about research ethics.

In 2001 and 2002, the Institute of Medicine (IOM, 2001, 2002) published two volumes concerned with protection of human subjects. These reports point to various problems in participant protection, including the fact that federal regulations do not necessarily apply to non-federally funded research, depending on arrangements at the institution where the research takes place.
Even if that issue were resolved, the IOM studies acknowledge a host of additional problems, some of them well-documented, others simply feared or hypothesized, such as the extent to which potential subjects are exposed to “coercive” efforts to secure their enrollment in research (Emanuel, 2005).

In summary, sufficient and clear data are lacking about the adequacy of protections of human subjects to assist physician-investigators considering referrals of patients to clinical studies and subjects considering participation in research. Some would claim that the combination of federal regulations, local IRB oversight, investigator education, and public good-will provides at least adequate protection for human subjects of biomedical and behavioral research in the US and Western Europe. (The controversies about research in the developing world fall outside the scope of this review.) Alternatively, some believe that the host of demonstrated and feared inadequacies in the system, especially when one considers the financial stakes involved, suggests widespread disregard of subject protection. The skeptics point to the few well-publicized deaths of research subjects in the last decade and suggest that we have only learned about the tip of the proverbial iceberg that is a poorly regulated and possibly corrupt system. We would all benefit from much more detailed studies of actual IRB function, including field observations of research team members as they interact with potential and actual subjects, and studies that monitor or audit compliance of institutions with existing rules for the conduct of human subjects research.
For example, the latter research might attempt to generate generalizable results about subject understanding of risks and benefits, the completeness of research records, including properly executed consent forms, and the appropriate notification of subjects when new information becomes available that might affect their willingness to continue in a study.

IMPLICATIONS FOR MEDICAL PRACTICE

Empirical research in the area of bioethics has helped clarify “best practices” in many areas. There is compelling evidence that the theory and practice of informed consent have failed to live up to expectations in both clinical and research arenas. Likewise, advance directives have provided insufficient guidelines to most patients and many clinicians for treatment decisions when individuals lose decision-making capacity. Reliance on directives available to clinicians has proved frustrating because the instructions frequently do not cover all possible circumstances that patients, surrogates, and clinicians may face. As a result, health care attorneys, hospital administrators, clinicians, and ethicists now tend to recommend that patients both clearly designate a proxy decision maker and engage in detailed discussions with the appointed surrogate regarding the values that should guide decisions when the patient no longer has decision-making capacity. Empirical studies have clearly shown that ethnic, economic, and gender disparities persist in health care and clinical research despite increases in civil liberties and social justice in the last century. Unfortunately, studies have not yet pointed to ways to reduce the inequities. Clinicians need to maintain a high level of awareness about the potential influences of their unconscious biases on treatment and research recommendations, emergency decision making, and interactions with patients and family members.
Institutions may need to develop systems to monitor for inequitable patterns of care and, if discovered, system-wide methods to correct imbalances. Of course, to the extent that patterns reflect larger social problems regarding risk for disease and disability and inadequate health care insurance coverage and payment schemes, providers may face serious financial problems if they undertake to redress inequality on their own. Empirical evidence in other areas of bioethics still remains scant or presents an unclear picture. For example, with regard to the ability to assess the adequacy of patient or (potential) research subject decisional capacity, research results are somewhat mixed. Similarly, physician-investigators trying to decide whether sufficient protections exist for patients who might become research subjects – not least those who are part of vulnerable populations – will find a confusing array of reassurance, scandal, and troublesome study results. Other bioethical issues that need research attention in efforts to optimize ethical standards and quality in medical care and research include a better understanding of the potential benefits and problems of clinical ethics consultation, the consequences – intended and unintended – of universal health care insurance and rationing schemes, and methods to reduce the administrative burdens on investigators and institutions of ethics reviews of human, animal, and embryonic stem cell research. In conclusion, the above review and discussion indicate that there is fertile ground for more work at the intersection of bioethics and empirical research that can guide practitioners, investigators, patients, and their families in the quest to continue to advance medical practice and research consistent with ethical principles.

REFERENCES

Agre, P., & Rapkin, B. (2003). Improving informed consent: A comparison of four consent tools. IRB, 25, 1–7.
Angelos, P., DaRosa, D., & Sherman, H. (2002). Residents seeking informed consent: Are they adequately knowledgeable? Current Surgery, 59, 115–118.
Appelbaum, B. C., Appelbaum, P. S., & Grisso, T. (1998). Competence to consent to voluntary psychiatric hospitalization: A test of a standard proposed by APA. Psychiatric Services, 49, 1193–1196.
Baergen, R. (2002). Assessing the competence assessment tool. Journal of Clinical Ethics, 13, 160–164.
Baker, M. T., & Taub, H. A. (1983). Readability of informed consents. Journal of the American Medical Association, 250, 2646–2648.
Barber, B., Lally, J. J., Makarushka, J. L., & Sullivan, D. (1973). Research on human subjects: Problems of social control in medical experimentation. New York: Russell Sage Foundation.
Beecher, H. K. (1966). Ethics and clinical research. New England Journal of Medicine, 274, 1354–1360.
Bluebond-Langner, M. (1978). The private worlds of dying children. Princeton, NJ: Princeton University Press.
Breden, T. M., & Vollman, J. (2004). The cognitive based approach of capacity assessment in psychiatry: A philosophical critique of the MacCAT-T. Health Care Analysis, 12, 273–283.
Campbell, E. G., Weissman, J. S., Clarridge, B., Yucel, R., Causino, N., & Blumenthal, D. (2003). Characteristics of medical school faculty members serving on institutional review boards: Results of a national survey. Academic Medicine, 78, 831–836.
Carney, N. T., Neugroshi, J., Morrison, R. S., Marin, D., & Siu, A. L. (2001). The development and piloting of a capacity assessment tool. Journal of Clinical Ethics, 12, 12–23.
Churak, J. M. (2005). Racial and ethnic disparities in renal transplantation. Journal of the National Medical Association, 97, 153–160.
Cruzan v. Director, 497 U.S. 261 (1990).
Dziak, K., Anderson, R., Sevick, M. A., Weisman, C. S., Levine, D. W., & Scholle, S. H. (2005). Variations among institutional review board reviews in a multisite health services research study. Health Services Research, 40, 279–290.
Eggers, P. W. (1995). Racial differences in access to kidney transplantation. Health Care Financing Review, 17, 89–103.
Emanuel, E. J. (2005). Undue inducement: Nonsense on stilts? American Journal of Bioethics, 5, 9–13.
Epstein, A. M., Ayanian, J. Z., Keogh, J. H., Noonan, S. J., Armistead, N., Cleary, P. D., Weissman, J. S., David-Kasdan, J. A., Carlson, D., Fuller, J., Marsh, D., & Conti, R. M. (2000). Racial disparities in access to renal transplantation – clinically appropriate or due to underuse or overuse? New England Journal of Medicine, 343, 1537–1544.
Etchells, E., Darzins, P., Silberfeld, M., Singer, P. A., McKenny, J., Naglie, G., Katz, M., Guyatt, G. H., Molloy, D. W., & Strang, D. (1999). Assessment of patient capacity to consent to treatment. Journal of General Internal Medicine, 14, 27–34.
Fang, J., & Alderman, M. H. (2006). Gender differences of revascularization in patients with acute myocardial infarction. American Journal of Cardiology, 97, 1722–1726.
Forrest, C. B., & Starfield, B. (1998). Entry into primary care and continuity: The effects of access. American Journal of Public Health, 88, 1330–1336.
Fost, N. (2005). Gather Ye Shibboleths While Ye May. American Journal of Bioethics, 5, 14–15.
Goldman, J., & Katz, M. D. (1982). Inconsistency and institutional review boards. Journal of the American Medical Association, 248, 197–202.
Gray, B. H. (1975). Human subjects in medical experimentation: A sociological study of the conduct and regulation of clinical research. New York: Wiley.
Grisso, T., & Appelbaum, P. S. (1998). MacArthur competence assessment tool for treatment (MacCAT-T). Sarasota, FL: Professional Resource.
In re Conroy, 486 A.2d 1209 (N.J. 1985).
In re Quinlan, 355 A.2d 647 (N.J. 1976).
Institute of Medicine (IOM). (1997). Approaching death: Improving care at the end of life. Washington, DC: National Academy Press.
Institute of Medicine (IOM). (2001). Preserving public trust: Accreditation of human research participant protection programs. Washington, DC: National Academies Press.
Institute of Medicine (IOM). (2002). Responsible research: A systems approach to protecting research participants. Washington, DC: National Academies Press.
Kodish, E., Eder, M., Noll, R. B., Ruccione, K., Lange, B., Angiolillo, A., Pentz, R., Zyzanski, S., Siminoff, L. A., & Drotar, D. (2004). Communication of randomization in childhood leukemia trials. Journal of the American Medical Association, 291, 470–475.
Lawson, S. L., & Adamson, H. M. (1995). Informed consent readability: Subject understanding of 15 common consent form phrases. IRB, 17, 16–19.
Lidz, C. W. (2006). An observational descriptive study of IRB practices. National Institutes of Health Grant 1R01CA107295-01A2.
Lidz, C. W., & Appelbaum, P. S. (2002). The therapeutic misconception: Problems and solutions. Medical Care, 40(9 supplement), V55–V63.
Lidz, C. W., Meisel, A., Osterwies, M., Holden, J. L., Marx, J. H., & Munetz, M. R. (1983). Barriers to informed consent. Annals of Internal Medicine, 99, 539–543.
Lo, B., & Steinbrook, R. (2004). Resuscitating advance directives. Archives of Internal Medicine, 164, 1501–1506.
Martins, D., Tareen, N., & Norris, K. C. (2002). The epidemiology of end-stage renal disease among African Americans. American Journal of the Medical Sciences, 323(2), 65–71.
Maynard, C., Every, N. R., Martin, J. S., Kudenchuk, P. J., & Weaver, W. D. (1997). Association of gender and survival in patients with acute myocardial infarction. Archives of Internal Medicine, 157, 1379–1384.
McCullough, L. B., Coverdale, J. H., & Chervenak, F. A. (2005). A comprehensive ethical framework for responsibly designing and conducting pharmacologic research that involves pregnant women. American Journal of Obstetrics and Gynecology, 193, 901–907.
McWilliams, R., Hoover-Fong, J., Hamosh, A., Beck, S., Beaty, T., & Cutting, G. (2003). Problematic variation in local institutional review of a multicenter genetic epidemiology study. Journal of the American Medical Association, 290, 360–366.
National Bioethics Advisory Commission (NBAC). (2001). Ethical and policy issues in research involving human participants. Rockville, MD: NBAC. (http://www.ntis.gov)
O’Connor, A. M., Llewellyn-Thomas, H. A., & Flood, A. B. (2004). Modifying unwarranted variations in health care: Shared decision making using patient decision aids. Health Affairs (Suppl Web Exclusive): VAR63–72.
Office of the Inspector General, Department of Health and Human Services. (2002). Protecting human research subjects: Status of recommendations. Washington, DC: DHHS. (www.dhhs.gov/progorg/oei)
Ogloff, J. R. P., & Otto, R. K. (1991). Are research participants truly informed? Readability of informed consent forms used in research. Ethics and Behavior, 1, 239–252.
Peterson, E. D., Wright, S. M., Daley, J., & Thibault, G. E. (1994). Racial variation in cardiac procedure use and survival following acute myocardial infarction in the department of veterans affairs. Journal of the American Medical Association, 271, 1175–1180.
President’s Commission for the Study of Ethical Problems in Medicine and Biomedical and Behavioral Research. (1983). The elements of good decisionmaking. In: Deciding to forego life-sustaining treatment (pp. 43–90). Washington, DC: U.S. Government Printing Office.
Quill, T. E. (2005). Terri Schiavo – A tragedy compounded. New England Journal of Medicine, 352, 1630–1633.
Scott, E. S., Reppucci, N. D., & Woolard, J. L. (1995). Evaluating adolescent decision making in legal contexts. Law and Human Behavior, 19, 221–244.
Shah, S., Whittle, A., Wilfond, B., Gensler, G., & Wendler, D. (2004). How do institutional review boards apply the federal risk and benefit standards for pediatric research? Journal of the American Medical Association, 291, 476–482.
Sherman, H. B., McGaghie, W. C., Unti, S. M., & Thomas, J. X. (2005). Teaching pediatrics residents how to obtain informed consent. Academic Medicine, 80(October supplement), S10–S13.
Silverman, H., Hull, S. C., & Sugarman, J. (2001). Variability among Institutional Review Boards’ decisions within the context of a multicenter trial. Critical Care Medicine, 29, 235–241.
Siminoff, L. A., & Drotar, D. (2004). Communication of randomization in childhood leukemia trials. Journal of the American Medical Association, 291, 470–475.
Stevens, G. D., Seid, M., Mistry, R., & Halfon, N. (2006). Disparities in primary care for vulnerable children: The influence of multiple risk factors. Health Services Research, 41, 507–531.
SUPPORT Principal Investigators. (1995). A controlled trial to improve care for seriously ill hospitalized patients: The study to understand prognoses and preferences for outcomes and risks of treatment. Journal of the American Medical Association, 274, 1591–1598.
Tunzi, M. (2001). Can the patient decide? Evaluating patient capacity in practice. American Family Physician, 64, 299–306.
Van Luijn, H. E., Musschenga, A. W., Keus, R. B., Robinson, W. M., & Aaronson, N. K. (2002). Assessment of the risk/benefit ratio of phase II cancer clinical trials by Institutional Review Board (IRB) members. Annals of Oncology, 13, 1307–1313.
Waggoner, W. C., & Mayo, D. M. (1995). Who understands? A survey of 25 words or phrases commonly used in proposed clinical research consent forms. IRB, 17, 6–9.
Weil, E., Nelson, R. M., & Ross, L. F. (2002). Are research ethics standards satisfied in pediatric journal publications? Pediatrics, 110, 364–370.
Weir, R. F. (1989). Abating treatment with critically ill patients: Ethical and legal limits to the medical prolongation of life. New York: Oxford University Press.
Wennberg, J. E., Fisher, E. S., & Skinner, J. S. (2004). Geography and the debate over medicare reform. Health Affairs (Suppl Web Exclusive): W96–114.
Whittle, A., Shah, S., Wilfond, B., Gensler, G., & Wendler, D. (2004). Institutional review board practices regarding assent in pediatric research. Pediatrics, 113, 1747–1752.
Whittle, J., Conigliaro, J., Good, C. B., & Lofgren, R. P. (1993). Racial differences in the use of invasive cardiovascular procedures in the department of veterans affairs medical system. New England Journal of Medicine, 329, 621–627.
Wolf, L. E., Zandecki, J., & Lo, B. (2005). Institutional review board guidance on pediatric research: Missed opportunities. Journal of Pediatrics, 147, 84–89.
Wu, W. C., & Pearlman, R. A. (1988). Consent in medical decision making: The role of communication. Journal of General Internal Medicine, 3, 9–14.
Yoon, E. Y., Davis, M. M., El-Essawi, H., & Cabana, M. D. (2006). FDA labeling status of pediatric medications. Clinical Pediatrics, 45, 75–77.

SECTION II: QUALITATIVE METHODS

QUALITATIVE CONTENT ANALYSIS

Jane Forman and Laura Damschroder

INTRODUCTION

Content analysis is a family of systematic, rule-guided techniques used to analyze the informational contents of textual data (Mayring, 2000). It is used frequently in nursing research, and is rapidly becoming more prominent in the medical and bioethics literature. There are several types of content analysis, including quantitative and qualitative methods, all sharing the central feature of systematically categorizing textual data in order to make sense of it (Miles & Huberman, 1994). They differ, however, in the ways they generate categories and apply them to the data, and how they analyze the resulting data. In this chapter, we describe a type of qualitative content analysis in which categories are largely derived from the data, applied to the data through close reading, and analyzed solely qualitatively. The generation and application of categories that we describe can also be used in studies that include quantitative analysis.
Empirical Methods for Bioethics: A Primer
Advances in Bioethics, Volume 11, 39–62
Copyright © 2008 by Elsevier Ltd.
All rights of reproduction in any form reserved
ISSN: 1479-3709/doi:10.1016/S1479-3709(07)11003-7

QUANTITATIVE VERSUS QUALITATIVE CONTENT ANALYSIS

In quantitative content analysis, data are categorized using predetermined categories that are generated from a source other than the data to be analyzed, applied automatically through an algorithmic search process (rather than through reading the data), and analyzed solely quantitatively (Morgan, 1993). The categorized data become largely decontextualized. For example, a researcher who wants to compare usage by physicians, patients, and family members of words such as die, dying, or death versus euphemisms such as pass away or demise, would make a list of words, use a computer to search for them in relevant documents (e.g., audio-recordings of oncology outpatient visits), and compare usage in each group using statistical measures (Hsieh & Shannon, 2005).

In qualitative content analysis, data are categorized using categories that are generated, at least in part, inductively (i.e., derived from the data), and in most cases applied to the data through close reading (Morgan, 1993). There is disagreement in the literature on the precise definition of qualitative content analysis; these differences are about how the data are analyzed once they have been sorted into categories. For some authors, qualitative content analysis always entails counting words or categories (or analyzing them statistically if there is sufficient sample size) to detect patterns in the data, then analyzing those patterns to understand what they mean (Morgan, 1993; Sandelowski, 2000).
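The keyword-search comparison described above for quantitative content analysis can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure of any cited study: the word lists, the function names (count_terms, tally_by_group, chi_square_statistic), and the sample utterances are all hypothetical, and the chi-square statistic stands in for whatever "statistical measures" a real study would choose.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration; a real study would define its own.
DIRECT_TERMS = {"die", "dies", "died", "dying", "death"}
EUPHEMISMS = ("pass away", "passed away", "passing away", "demise")


def count_terms(text):
    """Return (direct, euphemism) counts for one piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    direct = sum(1 for word in words if word in DIRECT_TERMS)
    # Rejoin the words with single spaces so multi-word phrases match
    # regardless of the original spacing or punctuation.
    normalized = " ".join(words)
    euphemism = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", normalized))
        for phrase in EUPHEMISMS
    )
    return direct, euphemism


def tally_by_group(utterances):
    """Sum term counts per speaker group from (group, text) pairs."""
    totals = {}
    for group, text in utterances:
        direct, euphemism = count_terms(text)
        counts = totals.setdefault(group, Counter())
        counts["direct"] += direct
        counts["euphemism"] += euphemism
    return totals


def chi_square_statistic(totals):
    """Pearson chi-square statistic for the group-by-term-type table."""
    rows = [[c["direct"], c["euphemism"]] for c in totals.values()]
    row_sums = [sum(row) for row in rows]
    col_sums = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_sums)
    if grand_total == 0:
        return 0.0
    statistic = 0.0
    for row, row_sum in zip(rows, row_sums):
        for observed, col_sum in zip(row, col_sums):
            expected = row_sum * col_sum / grand_total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Invented transcript fragments, labeled by speaker group.
    sample = [
        ("physician", "The patient may die within months. Death is likely."),
        ("family", "We are afraid she will pass away. Her passing away scares us."),
        ("patient", "I am not afraid of dying."),
    ]
    totals = tally_by_group(sample)
    for group, counts in totals.items():
        print(group, dict(counts))
    print("chi-square statistic:", chi_square_statistic(totals))
```

Note that the search is purely mechanical: the code never "reads" the surrounding sentence, which is the sense in which the categorized data become decontextualized. Counting of this kind also appears in some qualitative content analyses, where the counted units are categories derived from the data rather than a predetermined word list.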
For example, in one study, researchers used qualitative data from semi-structured interviews to identify, count, and compare respondents who personalized the task they were asked to do versus those who did not (Damschroder, Roberts, Goldstein, Miklosovic, & Ubel, 2005). This study derived categorical data from the qualitative data in order to quantitatively analyze differences in the types of responses people gave to the different types of elicitations. Qualitative content analysis is defined more broadly by some researchers to also include techniques in which the data are analyzed solely qualitatively, without the use of counting or statistical techniques (Hsieh & Shannon, 2005; Mayring, 2000; Patton, 2002).

USES OF QUALITATIVE CONTENT ANALYSIS

Qualitative content analysis is one of many qualitative methods used to analyze textual data. It is a generic form of data analysis in that it comprises an atheoretical set of techniques that can be used in any qualitative inquiry in which the informational content of the data is relevant. Qualitative content analysis stands in contrast to methods that, rather than focusing on the informational content of the data, bring to bear theoretical perspectives. For example, narrative analysis uses a hermeneutical perspective that emphasizes interpretation and context, and focuses “on the tellings [of stories] themselves and the devices individuals use to make meaning in stories” (Sandelowski, 1991, p. 162). There are some methods of qualitative inquiry (e.g., ethnography, grounded theory, and some types of phenomenology) that, though they bring a theoretical perspective to qualitative inquiry, use content analysis as a data analysis technique. Grounded theory, which has been used extensively in nursing research, uses a specific form of content analysis whose goal is to generate theory that is grounded in the data.
Because the term “grounded theory” is often used generically to describe the technique of inductive analysis, it is often confused with a form of inquiry called qualitative description (Sandelowski, 2000) or pragmatism (Patton, 2002). The goal of these approaches to data analysis is to answer questions of practice and policy in everyday terms, rather than to generate theory. What we describe in this chapter is a generic form of content analysis commonly used in the health sciences to answer practical questions. The validity of an atheoretical approach is a source of controversy among qualitative researchers, but is widely accepted by researchers in the health sciences.

As compared to quantitative inquiry, the goal of all qualitative inquiry is to understand a phenomenon, rather than to make generalizations from the study sample to the population based on statistical inference. Examples include providing a comprehensive description of a phenomenon; understanding processes (e.g., decision-making, delivery of health care services); capturing the views, motivations, and experiences of participants; and explaining the meaning they make of those experiences. When used as part of a study or series of studies using a combination of qualitative and quantitative methods, qualitative methods can be employed to explain the quantitative results and/or to generate items for a closed-ended survey.

Qualitative content analysis examines data that is the product of open-ended data collection techniques aimed at detail and depth, rather than measurement. For example, a closed-ended survey can be used to measure the level of trust patients have in their physicians. As an alternative, open-ended interviews in which participant responses are not constrained by closed-ended categories can be used in order to explore the topic of trust more deeply.
While a closed-ended survey may provide an assessment of patient trust, it fails to provide any information on the process through which patients come to trust or distrust their physicians, and what trust means to them.

Empirical bioethics studies take advantage of the open-ended nature of qualitative research to, for example, examine and challenge bioethical assumptions, inform clinical practice, policy-making or theory, or describe and evaluate ethics-related processes or programs. For example, in a study to understand how housebound elderly patients think about and approach future illness and the end of life, interviewees “described a world view that does not easily accommodate advance care planning” (Carrese, Mullaney, Faden, & Finucane, 2002, p. 127). In another study, female patients talked about their views on medical confidentiality to inform clinical practice around confidentiality protections (Jenkins, Merz, & Sankar, 2005). Findings showed that some patients “might have expectations not met by current practice nor anticipated by doctors” (p. 499). As a first step toward developing benchmarks of clinical ethics practices, another study described and compared the structure, activity, and resources of clinical ethics services at several institutions (Godkin et al., 2005). Results indicated a high degree of variability across services and that increasing visibility was a challenge within organizations.

DOING QUALITATIVE CONTENT ANALYSIS

In the rest of this chapter, we will discuss the choice of qualitative content analysis as one of many decisions in designing a study. We will review the processes and procedures involved in this type of analysis, including data management, memoing, developing a coding scheme, coding the data, using coding categories to facilitate further analysis, and interpretation. We will also discuss the use of software developed to aid qualitative content analysis.
STUDY DESIGN

As discussed above, qualitative content analysis is one of many techniques for performing analysis of textual data. Although it is beyond the scope of this chapter to discuss study design in any detail, we will briefly illustrate the types of design decisions required for qualitative studies and their effect on data analysis.

As with all empirical research, study design starts and flows from the research question(s). Thoughtfully and deliberately matching data sources, sampling strategy, data collection methods, and data analysis techniques to the research questions is fundamental to the quality and success of any study. After formulating the research question(s), Mason suggests that the researcher ask the following questions when designing a qualitative study: “What data sources and methods of data generation are potentially available or appropriate? What can these methods and sources feasibly describe or explain? How or on what basis does the researcher think they could do this? Which elements of the background (literature, theory, research) do they relate to?” (Mason, 2002, p. 27).

Examples of data sources are individuals and groups (e.g., persons with kidney failure, family members), sites (e.g., ICU, primary care clinic, governmental agency), naturally occurring interactions (e.g., primary care visit, congressional hearing), documents and records (e.g., policies and procedures, medical records, legislation), correspondence, diaries, and audio-visual materials. Data collection techniques include individual interviews, focus groups, observations, and sampling of written text(s).

Once it is clear that a qualitative approach is appropriate, a number of related issues need to be addressed. First, an exploration of what is already known about the topic of study is necessary. The more that is known about the topic, the more structured or deductive the data collection and analysis are likely to be.
This is because previous empirical and theoretical work will provide a conceptual framework consisting of concepts and models that direct data collection and analysis (Marshall & Rossman, 2006).

The study’s research question will point to a particular unit or units of analysis. Units of analysis can be individual people, groups, programs, organizations, communities, etc. The unit of analysis is the object about which the researcher wants to say something at the end of the study. There may be more than one unit of analysis in a study (Patton, 2002). For example, in a multi-site study of communication in ICUs, the ICUs themselves, health provider groups (e.g., physicians and nurses), and individuals (e.g., individual nurses’ points of view) could all be units of analysis.

Having identified an appropriate unit of analysis, the researcher must decide how best to sample it. Sampling in qualitative studies is called purposeful sampling. In purposeful sampling, the goal is to understand a phenomenon, rather than to enable generalizations from study samples to populations. In-depth study of a particular phenomenon involves an intensive look at a relatively small sample, rather than a surface look at a large sample. “Information-rich” cases are selected for in-depth study to provide the information needed to answer research questions. It is important to choose those cases that will be of most use analytically (Patton, 2002; Sandelowski, 1995b).
Finally, the study’s conceptual framework, unit(s) of analysis, sampling, and data collection technique(s) will affect the data analysis performed: the conceptual framework will influence the categories used to code the data, as well as how deductive or inductive the analysis will be; the unit(s) of analysis will determine the entities around which the analysis is organized; the sampling strategy may create subgroups that can be compared; and the data collection technique selected will produce data of varying degrees of depth. On this last point, the more in-depth the data (e.g., a few extended in-depth interviews), the more challenging and time-intensive it is to analyze. Also, because more in-depth data provides more information about what participants mean by their statements, and may indicate causal linkages emerging from the data, it provides more evidence to support a higher inference analysis than data that focuses on breadth (e.g., a set of focus groups). By higher inference, we mean that the researcher can interpret the data at a higher level of abstraction. For example, in a focus group study, participants may name the quality of communication with various members of the health care team as a barrier to appropriate care in the ICU. Data collected through in-depth interviews may contain enough evidence to characterize types of communication barriers with more abstract concepts, such as manifest versus latent communication.

It is helpful to consider several additional points when designing a study. First, having several people analyzing the data demands a more structured approach. Second, resource constraints (e.g., time, money, personnel) force trade-offs between the richness of the data, the amount of data collected, and the quality of analysis.
The researcher must make all these decisions with the purpose of the study in mind, considering the resources available and the optimal way to expend them to achieve the study’s goals. Finally, there is only so much that can be learned from hearing, reading, and thinking about content analysis. In our experience, knowing about qualitative methods is not the same as having experience using them. We suggest that when embarking on a qualitative study, the novice seek a mentor to learn how to use methods and techniques effectively. The most effective learning comes from being actively engaged in a project. Qualitative research requires a somewhat more hands-on approach than quantitative techniques.

DATA MANAGEMENT

As in any study, before starting data collection, it is important to develop a system for labeling the data (e.g., participant, site, etc.). Qualitative data can take more complex and varied forms than quantitative data, depending on their sources (for example, interviews, direct observation, complicated textual sources, and video-recordings). Therefore, the data sources must be clearly labeled in preparation for coding and electronic entry. For instance, audio-recordings of interviews should be labeled with the participant ID number, the date, and the interviewer’s initials. After an interview, it is important to make a copy of the audio-recording so that there is a backup in case the original is lost or destroyed.

In most qualitative content analyses, the next step is to transcribe the recording so that it appears as a written text. However, it is essential to recognize that any transcription of spoken words will be incomplete. Qualitative content analysts are interested in the informational content of the data.
Therefore, the transcription focuses on a verbatim representation of the words on the audio-recording and may also include some indicators of emotion (e.g., exclamation point, notation of laughter). In contrast, transcription for analysis of discourse has highly detailed specifications that may include, for example, representation of interruptions, pauses, intonation, and simultaneous talk.

It is important to use clear rules for how recordings are to be transcribed, whether the transcription is done by the researcher or delegated to a transcriptionist. Having these rules avoids unnecessary work later on, and makes it easier to read and analyze multiple transcripts. Table 1 shows an example of rules that can be included. This is not meant to be an exhaustive list; items and rules for any specific project will vary.

Table 1. Example Rules for Transcription.

The finished product should be in Word (no tables), Version x, with the following specifications:
Times New Roman 12 point font
Create a separate file for each tape. The filename must include the tape and site identifier (e.g., TapeB-SiteX.doc) and be inserted in the first line of the file
Insert page numbers in footer
Header 1: Person ID; place on a separate line from the text
1.5″ margins
Note interruptions and inaudible conversation by inserting [INTERRUPTION], [INAUDIBLE] in the text. (Only indicate [INAUDIBLE] when words cannot be distinguished because of the recording)
Indicate significant deviations from normal conversational tone as [LAUGHTER], [ANGRY], [LOUD] after text
Insert tape counter position periodically through the document
If you are unsure of a word, put a [?] after the word(s)

After a transcript is completed, it will need to be compared to the recording to make sure it is accurate.
It is best if the interviewer can verify the accuracy of the transcript because the interviewer is more likely to recall information that may be difficult for the transcriptionist to hear due to inaudible speech. If the interviewer is also the transcriptionist, additional editorial comments about the interpretation of what occurred during the interview at specific times can be noted. If editorial comments are added, it is vital to clearly label them as such so that they are not mistaken for raw data.

The final step in preparing the transcript for analysis is stripping it of information that identifies participants, such as names and places. It is important to have clear rules for how identifying information will be replaced. The aim is to include enough information about a particular word being replaced so that the informational contents are not lost.

DATA IMMERSION, REDUCTION, AND INTERPRETATION

Content analysis requires considerably more than just reading to see what’s there. (Patton, 2002, p. 5)

In qualitative content analysis, data collection and analysis should occur concurrently. One danger in qualitative research is the collection of large amounts of data with no clear way to manage or analyze it. To maximize the chances of success, one should “engage with” the data early on by starting to develop a coding scheme. By examining the data as it is collected, the researcher will become familiar with its informational content, and may identify new topics to be explored and develop analytic hunches and connections that can be tested as analysis progresses. These insights also inform subsequent data collection. For example, a question might be added to the interview guide to explore a new topic with a subsequent interviewee. Although this chapter presents content analysis as a series of sequential steps, it is important to note that it is an inherently iterative process.
One useful approach is to divide content analysis into three phases: immersion, reduction, and interpretation (Coffey & Atkinson, 1996; Miles & Huberman, 1994; Sandelowski, 1995a). Through each of these phases, the goal is to create new knowledge from raw, unordered data. Content analysis requires both looking at each case (e.g., participant, site, etc.) as a whole and breaking up and reorganizing the data to examine individual cases systematically, and compare and contrast data across cases.

Immersion: Engagement with the Data

During immersion, the researcher engages with the data and obtains a sense of the whole before rearranging it into discrete units for analysis. There are several ways to accomplish this. First, the researcher can write what is called a “comment sheet” immediately after the data collection activity to record first impressions, new topics to be added in future data collection, comparisons to data collected previously (if this is not the first data collection activity), and analytic hunches. Second, the audio-recording is listened to. Third, when transcripts are available they can be read several times. The last process that can be employed is to write thoughts that are triggered while listening to and/or reading the data. This form of “free association” is often associated with meaningful insights that can be tested later on. While the science of qualitative content analysis (the process of developing and implementing a systematic approach to data analysis) is vital, much of the art of content analysis takes place when the analyst makes connections that occur once the data are considered holistically.

An essential part of the immersion phase is referred to as “memoing.” Memos are documents written by the researcher as he/she proceeds through the inspection of the data and can contain just about anything that will help make sense of it.
Memos serve as a way to get the researcher engaged in the data by recording early thoughts and hunches. They also serve to initiate the data analysis by identifying and sharpening categories and themes (core topics or meanings) that begin to emerge. This diminishes the potential for losing ideas and thoughts in the process. Throughout the analysis, memos can describe themes and the connections among them that are developed through interactive inspection of the data. Memos serve as an audit trail of the researchers’ analytic processes and add credibility to the final analysis and conclusions.

Memos can be coded along with the raw text data (e.g., transcripts) so that their contents appear in the reorganized data along with the raw data. The researcher can then read the raw data in a particular category along with the relevant category descriptions and analytic hunches recorded in memos during the immersion, preliminary coding, and code development phases of analysis. We describe the process of developing a coding scheme and coding the data in the next section.

Reduction: Developing a Consistent Approach to the Data

One of the most paralyzing moments in conducting qualitative research is beginning analysis, when researchers must first look at their data in order to see what they should look for in their data. (Sandelowski, 1995a, p. 371)

The reduction phase is when the researcher develops a systematic approach to the data. It constitutes the heart of the content analysis process and supplies rigor to the process. The goals of the reduction phase are to: (1) reduce the amount of raw data to that which is relevant to answering the research question(s); (2) break the data (both transcripts and memos) into more manageable themes and thematic segments; and (3) reorganize the data into categories in a way that addresses the research question(s).

Codes: What are They and Where do They Come from?
Codes provide the classification system for the analysis of qualitative data. Codes can represent topics, concepts, or categories of events, processes, attitudes, or beliefs that represent human activity and thought. Codes are used by the researcher to reorganize data and to retrieve it by categories that are analytically useful to the study, thereby aiding interpretation. The thoughtful and deliberative development of codes provides rigor to the analytic process. Codes create a means by which to exhaustively identify and retrieve data out of a data set, and enable the researcher to see a picture of the data that is not easily discernible in transcript form. As Coffey and Atkinson state, “attaching codes to data and generating concepts have important functions in enabling us rigorously to review what our data are saying” (Coffey & Atkinson, 1996, p. 27).

Codes can be either deductive or inductive. Deductive codes exist a priori and are identified or constructed from theoretical frameworks, relevant empirical work, research questions, data collection categories (e.g., interview questions or observation categories), or the unit of analysis (e.g., gender, rural versus urban, etc.). Inductive codes come from the data itself: analytical insights that emerge during immersion in the data and during what is called “preliminary coding” (see below). Although there are studies that use codes developed either deductively or inductively, content analysts most often employ a combination of both approaches. This means using a priori deductive codes as a way to “get into” the data and an inductive approach to identify new codes and to refine or even eliminate a priori codes.

Developing the Coding Scheme

We now turn to a description of how to develop a coding scheme and codebook.
Coding the data allows the researcher to rearrange the data into analytically meaningful categories. Codes must be mutually exclusive; that is, their definitions must not overlap in meaning. When coding categories are created it is important to consider how the coded data will look once retrieved, and how the rearranged data facilitate addressing the research question(s) during the next phase of the analysis.

Codebook development is an iterative process, and begins with what is called “preliminary coding.” This consists of reading through the text, highlighting or underlining passages that may be important and relevant to the research questions, and writing notes in the margins. As noted above, because code development most often is based on inductive and deductive reasoning, it often starts with deductively developed codes but remains open to new topics suggested by the data (inductive codes).

To illustrate preliminary coding and subsequent codebook development (and, later in the chapter, the application of codes and data interpretation), we will use a short interview extract from the following hypothetical empirical bioethics study.1 The study explores the development of interpersonal trust in physicians from the point of view of patients with kidney disease who are at risk of losing enough function to need dialysis or a kidney transplant. Using semi-structured interviews, it seeks to understand which physician behaviors and qualities are important in the formation and maintenance of patient trust. This study conceptualizes trust as a phenomenon that arises from the patient’s experience of illness. Phenomenologists have shown that illness disrupts patients’ previously taken-for-granted ways of being in the world (Zaner, 1991). Patients’ physical, emotional, and existential needs arise from these disruptions and attendant feelings of distress.
Trust is a central moral issue in the physician–patient relationship because patients, in their vulnerable state, need to trust that physicians use their expertise and power in their patients’ best interest.

Table 2. Extract of Transcript from an Interview with a Patient with Kidney Disease Shortly after Visiting her Nephrologist.

I: How has your kidney disease been lately?
P: Well, Dr. [name of nephrologist] said that I’m pretty stable, but that eventually I’ll need to make a decision about what kind of treatment I want. You know, the thing I like most about him is that he always explains things really well. Like what my labs are compared to last time. And he tells me in regular language, so I can understand. Not like some doctors, who, um, well, you might as well be a number, for all they care. Like one doctor I went to, he barely looked at me, much less answered my questions. Dr. [name of nephrologist] is just the opposite. When he comes into the room, he looks you in the eye.
I: When you say that Dr. [name of nephrologist] explains things really well, what do you mean?
P: Well, you know, it’s scary having this disease. Like when I get a pain in my back, I’m thinking that my kidney is deteriorating or there’s too much poison in my blood or I have an infection or something. That actually happened today. I told Dr. [name of nephrologist] I was having these pains, and he examined me and looked at my lab tests and said that it was actually muscle pain; it didn’t have anything to do with my kidneys, just normal stuff. He said that if I had an infection, I’d have a fever and a real different kind of pain, described the difference. He also said that my kidneys getting worse wouldn’t cause pain like that. So I feel relieved. I mean I thought last week maybe I should come in and have him take a look at me, but I wasn’t sure, so I didn’t. So I just spent the week worrying. You know, I’m afraid I’m going to have to start dialysis or get put on the transplant list.
I: What has Dr. [name of nephrologist] told you about that?
P: Well, he was real clear about what I should expect. This sure isn’t going to get better, but I could stay the same for a while, or I could get worse more quickly. So he doesn’t know exactly what’s going to happen, but that’s ok, as long as he’s honest about it. It just helps to know. If I do get bad enough, I’ll have to start dialysis, and decide what to do about a transplant. He told me stuff about how that works, getting on the list and what I’ll need to consider. That puts my mind at ease; I know I won’t be hit with all this stuff that I don’t know about when the time comes.

Table 2 shows an extract from an interview with a patient who has kidney disease and who is concerned about the progression of her disease. Based on the literature, which shows that physicians who more freely share complete information with patients engender more trust (Keating, Gandhi, Orav, Bates, & Ayanian, 2004), the researcher would begin with a deductive code, for example “clinical information.” The researcher would read the transcript, looking for statements related to “clinical information,” and highlight the text starting at line 4 in the transcript, in which the participant says, “… eventually, I’ll need to make a decision about what kind of treatment I want … the thing I like most about him is that he always explains things really well.” The researcher would also make a notation in the margin about the concept “explaining,” and about how the patient may use the information, namely to make a treatment decision.
After reading through several transcripts, the researcher may find descriptions of different uses of clinical information, and a deductive code called “clinical information” may evolve to “uses of clinical information.”

The researcher will also want to use inductive reasoning to develop new codes, specifically in vivo codes, which reflect the way informants make sense of their world. For example, the text beginning at line 6, “… he tells me in regular language, so I can understand,” can be highlighted with a note about the patient’s desire for information from the physician in “regular language.” In the course of codebook development this statement may be used to support a code developed inductively, called “physician communication of clinical information.” Similarly, a preliminary code called “relief from worry” might be created based on lines 18–26 in the transcript and on other passages in this and other transcripts. After reading through several transcripts, the code may evolve into a name that defines the concept more broadly, and frames it in terms of use of information, such as “reassurance.”

A codebook must be developed to organize codes and to help ensure they are used reliably. A codebook is especially important for projects using multiple coders. Table 3 shows an extract of the codebook from our hypothetical study, and contains a partial list of codes, definitions, and example quotes for each code. The example shows the fundamental elements that a codebook should contain: (1) name of the code; (2) an abbreviated label for that code (e.g., [REASSURE]); (3) the node type; (4) a description of the code that includes a clear definition, often with inclusion and exclusion criteria; and (5) example quotes that further illustrate the correct use of that code, along with a notation of the transcript and line numbers where the quote is located in the data set.
The node type refers to the hierarchical position of that code in the coding framework. For example, uses of information [INFOUSES], a parent node, is a high-level category (code) that has four different types of uses: (1) what to expect on progression of the disease; (2) making a decision; (3) reassurance; and (4) monitoring symptoms. Fig. 1 shows how these codes relate to one another and helps visualize how “parent” nodes relate to “child” nodes.

When working with a team, the code development process will proceed differently than for a solitary researcher. Team coding has its perils, but those are far outweighed by the benefits of having multiple perspectives to establish content validity and the ability to establish and test coding reliability. Multiple coders use many of the same procedures previously outlined in the chapter but their preliminary coding is done independently. The researchers come together to share their impressions of the data. It is important that all team members who might be involved in coding or later stages of the analysis (e.g., interpretation, writing manuscripts) be involved in these early meetings. The goal of this early development is not to review pages of transcripts but rather to engage in high-quality conceptualization through an iterative, negotiated process. Usually, to produce a revised (or initial) list of codes, the team will first review a few pages of one transcript. The team will then enter into an iterative process in which analysts apply the revised codes to a portion of the data and then meet again to add or delete codes, and further refine code definitions.

Table 3. Example Codebook to Guide Data Coding.

Code: Uses of clinical information (INFOUSES)
Node type: Parent
Description: Discussion of the uses of information by the patient. EXCLUDE sub-codes

Code: What to expect on progression of the disease (EXPECT)
Node type: Child
Description: Information about progression of the disease and what the patient can expect as it progresses, or the need for such information. Also, INCLUDE statements about the lack of information about disease progression and feeling in the dark about what to expect
Example: “… he was real clear about what I should expect. This sure isn’t going to get better, but I could stay the same for a while, or I could get worse more quickly. So he doesn’t know exactly what’s going to happen, but that’s ok, as long as he’s honest about it.” [30–33, 2035]

Code: Making a decision (DECISION)
Node type: Child
Description: Information that is useful to the patient to make decisions about medical treatment of the disease, or need for such information
Example: “eventually I’ll need to make a decision about what kind of treatment I want” [4, 2035]

Code: Reassurance (REASSURE)
Node type: Child
Description: Information to help relieve worry or fear, or the need for such information. Also, INCLUDE statements related to communication that causes worry or anxiety
Example: “He also said that my kidneys getting worse wouldn’t cause pain like that. So I feel relieved.” [22–23, 2035]

Code: Monitoring symptoms (MONITOR)
Node type: Child
Description: Information to help monitor symptoms, or the need for such information
Example: “Like when I get a pain in my back, I’m thinking that my kidney is deteriorating or there’s too much poison in my blood or I have an infection or something .... He said that if I had an infection, I’d have a fever and a real different kind of pain, described the difference.” [15–22, 2035]

Code: MD communication of clinical information (INFOCOMMMD)
Node type: Parent
Description: Descriptions of and/or judgments about the ability of the physician to communicate information in a way the patient can understand or apply
Examples: “he tells me in regular language, so I can understand.” [6–7, 2035]; “he barely looked at me, much less answered my questions.” [8, 2035]

Fig. 1. Diagram Showing Relationship Between Types of Coding Nodes. Parent node: INFOUSES. Child nodes: EXPECT, DECISION, REASSURE, MONITOR.
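For teams that keep their codebook electronically, the five codebook elements and the parent and child relationships of Fig. 1 might be represented as a small data structure. The following is a minimal sketch, not a feature of any particular qualitative analysis package: the class names (ExampleQuote, Code, Codebook) and their fields are hypothetical, and the descriptions are abbreviated from Table 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExampleQuote:
    text: str
    lines: str          # line range in the transcript, e.g. "22-23"
    transcript_id: str  # e.g. "2035"


@dataclass
class Code:
    name: str
    label: str
    description: str
    parent: Optional[str] = None  # label of the parent code; None marks a parent node
    examples: List[ExampleQuote] = field(default_factory=list)

    @property
    def node_type(self) -> str:
        return "Parent" if self.parent is None else "Child"


class Codebook:
    def __init__(self) -> None:
        self.codes: Dict[str, Code] = {}

    def add(self, code: Code) -> None:
        # Reject duplicate labels and children whose parent is not yet defined.
        if code.label in self.codes:
            raise ValueError(f"duplicate label: {code.label}")
        if code.parent is not None and code.parent not in self.codes:
            raise ValueError(f"unknown parent: {code.parent}")
        self.codes[code.label] = code

    def children_of(self, label: str) -> List[str]:
        return [c.label for c in self.codes.values() if c.parent == label]


book = Codebook()
book.add(Code("Uses of clinical information", "INFOUSES",
              "Discussion of the uses of information by the patient. EXCLUDE sub-codes."))
book.add(Code("What to expect on progression of the disease", "EXPECT",
              "Information about disease progression, or the need for it.",
              parent="INFOUSES"))
book.add(Code("Making a decision", "DECISION",
              "Information useful for treatment decisions, or the need for it.",
              parent="INFOUSES"))
book.add(Code("Reassurance", "REASSURE",
              "Information to help relieve worry or fear, or the need for it.",
              parent="INFOUSES",
              examples=[ExampleQuote("So I feel relieved.", "22-23", "2035")]))
book.add(Code("Monitoring symptoms", "MONITOR",
              "Information to help monitor symptoms, or the need for it.",
              parent="INFOUSES"))
book.add(Code("MD communication of clinical information", "INFOCOMMMD",
              "Judgments about the physician's ability to communicate information."))

# Print each parent node followed by its child nodes, as in Fig. 1.
for code in book.codes.values():
    if code.parent is None:
        print(code.label)
        for child in book.children_of(code.label):
            print("  " + child)
```

Deriving the node type from the parent field, rather than storing it separately, keeps the two from drifting apart as codes are added, deleted, or moved during codebook revisions.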
After each meeting, decisions and definitions must be documented as codes are proposed, refined and finalized. This makes the process both transparent and systematic, thereby increasing the rigor of the analysis. The codebook is a good place to track changes over time by dating revisions made to code definitions. Mason (2002) provides guidance as to the number of codes and precision of definitions to use when creating codes. Codes allow the researchers to index the data so that they can easily find salient text segments that relate to particular topics or concepts in the next stage of analysis. Discovering, even after coding, that it is difficult to find meaning in the data because the reorganized text is not focused sufficiently on analytically useful topics or concepts is an indication that the codes have been defined too broadly. Codes can also be defined too narrowly. When this occurs coders will have difficulty discerning which of the two closely defined codes should be applied. Too narrow coding can also obscure the ability to see larger patterns and themes. Study goals, resources and the amount of time available will dictate the depth and breadth of coding done and the extent to which multiple, independent coders can be used. The framework for codes refers to the way the codes are arranged in relation to each other to form a conceptual map. It must be carefully designed in a way that best fits the data and that meets the goals of the study. Although each study must be approached individually, the development of 20–40 codes is the norm. As we saw in the example above, it is helpful conceptually to create coding ‘‘trees’’ in which there is a primary or parent code, with all related sub-codes or child codes listed under the parent (e.g., uses of clinical information and its children). Codes should parsimoniously categorize text and yet thoroughly cover the richness of information contained in that text. 
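The parent and child arrangement described above can be pictured as a small tree. The following is a minimal sketch in Python, not a method prescribed by the chapter: the class name `Code`, its fields, and the `revised` date field (included because the codebook is described as a place to date revisions) are all assumptions made for illustration, and the definitions are shortened from Table 3.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterator, List, Optional, Tuple


@dataclass
class Code:
    """One codebook entry: a label, a name, a definition, and any child codes."""
    label: str
    name: str
    definition: str
    children: List["Code"] = field(default_factory=list)
    revised: Optional[date] = None  # hypothetical field: date the definition last changed

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Code"]]:
        """Yield (depth, code) for this code and its descendants, parent first."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Parent and child codes from the example codebook (Table 3); definitions shortened.
infouses = Code(
    "INFOUSES", "Uses of clinical information",
    "Discussion of the uses of information by the patient.",
    children=[
        Code("EXPECT", "What to expect on progression of the disease",
             "Information about progression of the disease, or the need for it."),
        Code("DECISION", "Making a decision",
             "Information useful for treatment decisions, or the need for it."),
        Code("REASSURE", "Reassurance",
             "Information to help relieve worry or fear, or the need for it."),
        Code("MONITOR", "Monitoring symptoms",
             "Information to help monitor symptoms, or the need for it."),
    ],
)


def render(root: Code) -> str:
    """Return an indented outline of the coding tree, one code per line."""
    return "\n".join(
        f"{'  ' * depth}{code.label}: {code.name}" for depth, code in root.walk()
    )


print(render(infouses))
```

Keeping children under their parent means a report or query on INFOUSES can be expanded to all four sub-codes without listing them by hand.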
The framework chosen can make the difference between juggling hundreds of unrelated codes and working with a fraction of that number of codes organized conceptually; the difference between creating a coding quagmire and providing a launching point for the next phase of analysis.

Coding Agreement

When code definitions have become substantially stable, and prior to applying the codes to the entire data set, coding agreement must be established. Agreement exists when two or more coders who code text data independently, using the same codebook, consistently apply the same codes to the same text segments. Although differences in how codes are applied are almost guaranteed to occur, regardless of how detailed codebook definitions may be, a sound conceptualization process, along with a well-constructed codebook with well-defined codes, will help guide all coders to apply codes consistently. These constitute key methods in ensuring rigor in content analysis. When working in teams, the codebook is especially important for facilitating agreement because several different people may code different portions of the data. Solo researchers who use these methods commonly assess agreement by having a second coder code a portion of the data and comparing the results.

The issue of coding agreement exposes a basic tension between the positivist view that bias introduced by human involvement in research must be minimized to increase the validity of research results, and the constructivist view that validity is derived from community consensus, through the social process of negotiation (Lincoln & Guba, 2003; Sandelowski & Barroso, 2003). A tenet of qualitative research is that the researcher is the primary instrument of the research, and brings with her particular experiences, assumptions, and points of view that will affect interpretation of the data (Mason, 2002). Also, some coders may be more familiar with study aims or the data set than others.
Thus, multiple coders mean multiple research instruments. Those with a positivist orientation label this as bias and aim to minimize it, while those with a constructivist orientation see it as an inherent feature of the interpretive process.

These differing orientations lead to two basic approaches as to how the agreement process should be structured in order to increase the validity of study findings. The first is measuring inter-coder agreement: using quantitative measures of agreement of the coding of two or more independent coders to establish coding reliability. Agreement is measured toward the end of coding scheme development; when it reaches a particular level, the codes are deemed reliable, and coding of the whole data set proceeds. There are many ways to quantitatively measure agreement (Lombard, Snyder-Duch, & Bracken, 2002) and some qualitative researchers, following a positivist philosophy, believe that using quantitative measures is essential to establish reliability, especially when working in teams (Krippendorff, 2004; Neuendorf, 2002).

The second basic approach to the agreement process is using a consensus process in which two or more coders independently code the data, compare their coding, and discuss and resolve discrepancies when they arise, rather than measuring them. Qualitative researchers who follow a constructivist philosophy do not believe that quantitative measures of reliability are appropriate in content analysis, largely because of their view that unanimity among coders often leads to over-simplification that compromises validity, and that reflexivity and reason-giving are more important aspects of an agreement process than achieving a pre-specified level of agreement independently (Harris, Pryor, & Adams, 2006; Sandelowski & Barroso, 2003).
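For readers who take the first approach, one widely used chance-corrected statistic is Cohen's kappa, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of segments on which two coders agree and p_e is the agreement expected by chance from each coder's code frequencies. The chapter does not prescribe a particular measure, so kappa is chosen here only for illustration. The sketch below assumes each coder assigns exactly one code per text segment; the two coding lists are hypothetical, not data from the example study.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of segments to which both coders applied the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of segments")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one code per segment."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the two coders' marginal proportions, summed over codes.
    p_e = sum(freq_a[c] * freq_b[c] for c in set(coder_a) | set(coder_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes applied by two independent coders to the same eight segments.
coder_a = ["REASSURE", "MONITOR", "EXPECT", "REASSURE",
           "DECISION", "MONITOR", "EXPECT", "REASSURE"]
coder_b = ["REASSURE", "MONITOR", "EXPECT", "MONITOR",
           "DECISION", "MONITOR", "EXPECT", "EXPECT"]

print(f"percent agreement: {percent_agreement(coder_a, coder_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(coder_a, coder_b):.2f}")
```

A team following the consensus approach could use the same paired lists simply to locate the segments where coders differ and bring those to discussion, without computing any statistic.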
Mason (2002) defines reflexivity as ‘‘thinking critically about what you are doing and why, confronting your own assumptions, and recognizing the extent to which your thoughts, actions and decisions shape how you research and what you see’’ (p. 5). A negotiated agreement process happens when coders meet to discuss the rationale they used to apply particular codes to the data. Through discussion, team members are able to explain their perspectives and justifications, how and why they differ from other team members’ perspectives, and reach consensus on how the data ultimately should be coded.

It is important to understand the strengths and weaknesses of each approach and develop a process that best fits the study. The preferred approach will depend on study aims, the coding process used, the type of codes that are being applied (e.g., low versus high inference), the richness of the text being analyzed, the degree of interpretation required for the final product, and the targeted venue for publication and dissemination of the study results.

Coding and Reorganizing the Data

After a codebook is developed, and the codes can be used reliably or a consensus process is established, the codes can be applied to all the text in the data set. Once this is accomplished, the text is rearranged into code reports, which list all of the text to which each particular code has been applied. When applying a code to a segment of text, the coder must be sure to include text that will provide sufficient context so that its meaning can be discerned. For example, the entire section spanning lines 15–26 in Table 2 could be coded as REASSURANCE to provide full context for how the physician was able to ‘‘put [her] mind at ease.’’ To understand why this is necessary, imagine if only lines 22–23 were coded, ‘‘He also said that my kidneys getting worse wouldn’t cause pain like that.
So I feel relieved.’’ When the text segment is read in a report that contains all of the text in the data set coded with REASSURANCE, instead of read in the context of a transcript, the reader loses information as to what led the patient to worry, including the connection to her experience of illness as ‘‘scary’’ because it could result in the need for dialysis or a kidney transplant.

Even after the codebook can be used reliably or a consensus process has been established, there may be changes in code definitions. New codes may be added as existing codes are applied to new text and as conceptualization progresses. It is a challenge to manage the tension between the desire for a predictable, sequential, and efficient process and allowing the process to be guided by intuitions, concepts, and theories arising from the data. Especially for large data sets, however, there is a point when the codebook, including code definitions, should be considered final, unless it is deemed critical to add a new code. Depending on resources, it may be possible to recode smaller data sets, say fewer than 20 transcripts, when new code definitions and codes arise.

Interpretation

Data to be interpreted include the code reports and memos that can contain anything from interpretive notes to preliminary conclusions, as mentioned earlier. These products need to be further analyzed, interpreted, and synthesized in order to formulate results. This phase of the analysis involves using the codes to help re-assemble data in ways that promote a coherent and revised understanding or explanation of it. Through this process the researcher can identify patterns, test preliminary conclusions, attach significance to particular results, and place them within an analytic framework (Sandelowski, 1995a). There is no clear line between data analysis and interpretation; ordering and interpretation of data occur throughout the analysis process.
However, by the interpretation phase, the groundwork has been laid to produce a finished product that communicates what the data mean. There are many ways to go about interpreting data, but almost all will include re-organizing it, writing descriptive and interpretive summaries, displaying key results, and drawing and verifying conclusions (Miles & Huberman, 1994).

To reorganize data in a way that facilitates interpretation, the researcher chooses to produce particular code reports, and organize these reports by cases, i.e., subsets of the data that represent the unit(s) of analysis, for example, site or health provider type. (If, as in our example study, the unit of analysis is the individual, each participant counts as a case.) Code reports can represent all of the data in the data set coded with a single code (e.g., REASSURANCE), or a combination of codes (e.g., INFOCOMMMD and DECISION). Code reports and how they are summarized are determined by the research questions, what has been learned from the data analysis to date, and the specifics of what the researcher wants to examine. Code reports can enable case-by-case analysis or can help the researcher delve more deeply into a particular topic. The process is dynamic and iterative and guided by what is learned from the data, so a number of code reports may be produced.

After choosing and organizing the code reports, the researcher writes descriptive and interpretive summaries of the data contained in each report. The structure of these code report summaries will depend on the project, but usually includes the main points obtained from reading the report, quotations selected to provide evidence for those points, and an interpretive narrative at the code and/or case level. As discussed earlier, it is vital to draw a distinction between the raw data and the interpretation of the data.
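The idea of a code report, for a single code or a combination of codes and organized by case, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than how any particular software package works: the segment records, the function name `code_report`, and its `any_of` and `all_of` parameters are invented here. The first four segments reuse quotations and line ranges from the example codebook, treating 2035 as the participant's case identifier; the case labeled X-01 is a hypothetical placeholder added so that more than one case appears.

```python
from collections import defaultdict

# Each coded segment records its case, transcript line range, applied codes, and text.
segments = [
    {"case": "2035", "lines": (30, 33), "codes": {"EXPECT"},
     "text": "he was real clear about what I should expect."},
    {"case": "2035", "lines": (22, 23), "codes": {"REASSURE"},
     "text": "He also said that my kidneys getting worse wouldn't cause pain like that. "
             "So I feel relieved."},
    {"case": "2035", "lines": (15, 22), "codes": {"MONITOR"},
     "text": "Like when I get a pain in my back, I'm thinking that my kidney is deteriorating"},
    {"case": "2035", "lines": (6, 7), "codes": {"INFOCOMMMD"},
     "text": "he tells me in regular language, so I can understand."},
    # Hypothetical second case, not from the example study.
    {"case": "X-01", "lines": (1, 3), "codes": {"INFOCOMMMD", "DECISION"},
     "text": "(placeholder text for a segment carrying two codes)"},
]


def code_report(segments, any_of=(), all_of=()):
    """Group matching segments by case.

    any_of: keep segments carrying at least one of these codes (union).
    all_of: keep segments carrying every one of these codes (intersection).
    """
    any_of, all_of = set(any_of), set(all_of)
    report = defaultdict(list)
    for seg in segments:
        if any_of and not (seg["codes"] & any_of):
            continue
        if not all_of <= seg["codes"]:
            continue
        report[seg["case"]].append(seg)
    return dict(report)


# A report on either of two codes, printed case by case.
for case, segs in code_report(segments, any_of=["INFOCOMMMD", "DECISION"]).items():
    for seg in segs:
        first, last = seg["lines"]
        print(f"[case {case}, lines {first}-{last}] {seg['text']}")
```

Because each segment keeps its case and line range, a reader of the report can return to the transcript for the surrounding context, which the text above identifies as essential for discerning meaning.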
Summaries for each case should be grouped together so that each can be examined before making cross-case comparisons.

Data displays (e.g., matrices, models, charts, networks) can be helpful for exploring a single case, but are particularly helpful in looking across cases. Miles and Huberman (1994) define a display as ‘‘an organized, compressed assembly of information that permits conclusion drawing and action’’ (p. 11). Seeing the data in a compressed form, organized in a systematic way, makes it easier to recognize patterns. It facilitates comparisons, which are important in drawing conclusions from the data. For example, a matrix with categories found to be analytically meaningful arrayed horizontally across the top and cases arrayed vertically can be created. These categories can be codes that were used to break up the data and/or themes which reflect a higher level of interpretive understanding that are developed as interpretation progresses. Each cell in the matrix is filled in with text, numbers, or ordinal groups (e.g., text excerpts; main points; 1, 2, 3; high, medium, low) that summarize the characteristics of that category in each case.

Fig. 2 is an extract from a data display matrix from our example study (an actual display would include more participants). It assumes that the researcher identified a new theme early in the interpretive phase: ‘‘Attributing physician motives,’’ defined as what motivations the participant attributes to their physician to account for the way the physician communicates with them. The researcher creates a data display that arrays attributed motivations along with a brief summary of the contents of each of the codes listed in the example codebook. The researcher would then look at the display to discern patterns in the data and draw preliminary conclusions. Data displays are powerful tools and often are used throughout the interpretation process.

| ID | Attribution of MD motivation | INFOCOMMMD | EXPECT | DECISION | REASSURE | MONITOR |
| --- | --- | --- | --- | --- | --- | --- |
| 101 | “doesn’t know what he’s doing”; “doesn’t think I’d understand” | Didn’t answer my questions | Wants more info from MD on potential progression to dialysis | Not mentioned | Worry re: acute symptoms (pain): Does it indicate disease progression? MD did not address satisfactorily when asked. | Wants to know whether should call the physician when she has pain or fever; are kidneys infected? |
| 102 | “a brilliant physician”; “cares about me” | Can understand what MD says (no jargon). Gives detailed info | Describes what MD told her about expected treatment progression; was detailed and included uncertainty | Needs to make decision about a transplant; MD gave useful info, including rationale for each treatment option | Was worried re: fatigue. MD told her it was typical and why it was occurring. | Not mentioned |

Fig. 2. Data Display Matrix.

Drawing Conclusions

Drawing and verifying conclusions involve developing preliminary conclusions and testing them by going back into the data. For example, the researcher may develop the conclusion that one of the ways patients develop trust in their physicians is when physicians explain clinical information in a way that they can understand and that these behaviors denote physician competence to the patient. It is important to look for alternative themes and conclusions that may ‘‘fit’’ the data better throughout the interpretation process and not settle on premature analytic closure. Conclusion verification is derived from going back into the data to find evidence that supports or refutes a particular conclusion. The result of verification can be finding that the conclusion holds in most cases, or is refuted, or that an alternative or refined conclusion is supported. If it does hold in most cases, it is not enough to report the theme and show supporting evidence for it.
The researcher also must examine ‘‘negative cases’’ – cases for which the conclusion does not hold. In the example, patients for whom information is not useful and who instead judge physician competence and develop trust based on non-verbal and social cues are one such instance. What about them or their situation is different and what does that say about the phenomenon under study? In the final product, discussion of these ‘‘negative’’ cases as they relate to the conclusions adds credibility to findings by showing that the researcher searched for what made most sense rather than simply using data to support one conclusion (Patton, 2002). Finally, the act of writing a report or manuscript presenting study findings refines and clarifies the interpretation, and its importance as a step in the interpretive process should not be minimized.

Using Software

Qualitative researchers are increasingly using software to manage data and facilitate data analysis and interpretation. Commonly used software includes ATLAS.ti (http://www.atlasti.com/index.php), MaxQDA (http://www.maxqda.com/), and NVivo (www.qsrinternational.com). It should be emphasized that software is a tool to help manage, retrieve, and connect data, but cannot perform data analysis. Too often, researchers unfamiliar with qualitative coding invest in a software purchase in the mistaken belief that the software will produce and code the data. Correctly used, and with the appropriate data set, such programs can enhance the efforts of the qualitative researcher. Nearly any kind of data source (e.g., text, pictures, video clips) can be imported into the software and coded or linked using tools within that software. Some researchers will use software primarily to enter codes and rearrange their data into coding reports. Others will use it more comprehensively as they work through code development, coding, creating code reports, interpretation, and final manuscript writing.
The software allows researchers to link source documents with notes, memos, summaries and even theoretical models. For example, notes in the margin of a transcript can be created within NVivo by adding ‘‘annotations,’’ and memos and summaries of code reports can be created and coded or linked to other documents in the data set. These kinds of software programs are especially helpful when working in teams because they facilitate sharing annotations, code reports, and summaries. For example, after the interview shown in Table 2 is coded, code reports can be generated to include text related to ‘‘uses of clinical information,’’ grouped by each of the sub-codes (progress, monitor, reassure, and decision). The software also allows one to do special queries, such as reporting all text coded with a designated union or intersection of codes. Analyses can be performed on subsets of transcripts (e.g., patient groups, sites) so that a variety of focused comparisons can be made.

SUMMARY

In this chapter, we have defined qualitative content analysis and discussed the choice of this method in qualitative research in the field of bioethics. As compared to quantitative inquiry, the major goal of qualitative inquiry is to understand a phenomenon, rather than to make generalizations from study samples to populations based on statistical inference. Qualitative content analysis is one of the many ways to analyze textual data, and focuses on reducing it into manageable segments through application of inductive and/or deductive codes, and reorganizing data to allow for the drawing and verification of conclusions (Miles & Huberman, 1994). The product of this process is an interpretation of the meaning of the data in a particular context.
Qualitative content analysis, which can be used by itself or in combination with other empirical methods, can be employed to examine textual data derived from several sources and constitutes a versatile strategy to explore and understand complex bioethical phenomena.

NOTE

1. The data and analysis presented here are based on an unpublished project on ‘‘The Physician-patient Relationship in Function-threatening Illness’’ funded by the Niarchos Foundation, on which Dr. Forman was a co-investigator with Dr. Daniel Finkelstein and Dr. Ruth Faden.

ACKNOWLEDGMENT

The authors wish to thank Dr. Holly A. Taylor for her insightful comments on this chapter.

REFERENCES

Carrese, J. A., Mullaney, J. L., Faden, R. R., & Finucane, T. E. (2002). Planning for death but not serious future illness: Qualitative study of housebound elderly patients. British Medical Journal, 325(7356), 125.

Coffey, A. J., & Atkinson, P. A. (1996). Making sense of qualitative data: Complementary research strategies. Thousand Oaks, CA: Sage.

Damschroder, L. J., Roberts, T. R., Goldstein, C. C., Miklosovic, M. E., & Ubel, P. A. (2005). Trading people versus trading time: What is the difference? Population Health Metrics, 3(1), 10.

Godkin, M., Faith, K., Upshur, R., MacRae, S., CS, T., & Group, T. P. (2005). Project examining effectiveness in clinical ethics (PEECE): Phase 1 – descriptive analysis of nine clinical ethics services. Journal of Medical Ethics, 31, 505–512.

Harris, J., Pryor, J., & Adams, S. (2006). The challenge of intercoder agreement in qualitative inquiry. Retrieved May 17, 2006, from http://emissary.wm.edu/templates/content/publications/intercoder-agreement.pdf

Hsieh, H. F., & Shannon, S. E. (2005). Three approaches to qualitative content analysis. Qualitative Health Research, 15(9), 1277–1288.

Jenkins, G., Merz, J. F., & Sankar, P. (2005). A qualitative study of women’s views on medical confidentiality. Journal of Medical Ethics, 31(9), 499–504.

Keating, N. L., Gandhi, T. K., Orav, E. J., Bates, D. W., & Ayanian, J. Z. (2004). Patient characteristics and experiences associated with trust in specialist physicians. Archives of Internal Medicine, 164(9), 1015–1020.

Krippendorff, K. (2004). Measuring the reliability of qualitative text analysis data. Quality and Quantity, 38, 787–800.

Lincoln, Y. S., & Guba, E. G. (2003). Paradigmatic controversies, contradictions, and emerging confluences. In: N. K. Denzin & Y. S. Lincoln (Eds), The landscape of qualitative research: Theories and issues (2nd ed., pp. 253–291). Thousand Oaks, CA: Sage.

Lombard, M., Snyder-Duch, J., & Bracken, C. C. (2002). Content analysis in mass communication: Assessment and reporting of intercoder reliability. Human Communication Research, 28(4), 587–604.

Marshall, C., & Rossman, G. B. (2006). Designing qualitative research (4th ed.). Thousand Oaks, CA: Sage.

Mason, J. (2002). Qualitative researching (2nd ed.). Thousand Oaks, CA: Sage.

Mayring, P. (2000). Qualitative content analysis. Forum on Qualitative Social Research, 1(2).

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: A sourcebook (2nd ed.). Thousand Oaks, CA: Sage.

Morgan, D. L. (1993). Qualitative content analysis: A guide to paths not taken. Qualitative Health Research, 3(1), 112–121.

Neuendorf, K. A. (2002). The content analysis guidebook. Thousand Oaks, CA: Sage.

Patton, M. Q. (2002). Qualitative research and evaluation methods (3rd ed.). Thousand Oaks, CA: Sage.

Sandelowski, M. (1991). Telling stories: Narrative approaches in qualitative research. Image: Journal of Nursing Scholarship, 23(3), 161–166.

Sandelowski, M. (1995a). Qualitative analysis: What it is and how to begin. Research in Nursing and Health, 18, 371–375.

Sandelowski, M. (1995b). Sample size in qualitative research. Research in Nursing and Health, 18, 179–183.

Sandelowski, M. (2000). What ever happened to qualitative description? Research in Nursing and Health, 23, 334–340.

Sandelowski, M., & Barroso, J. (2003). Writing the proposal for a qualitative research methodology project. Qualitative Health Research, 13(6), 781–820.

Zaner, R. (1991). Trust and the patient-physician relationship. Ethics, trust and the professions: Philosophical and cultural aspects. Washington, DC: Georgetown University Press.

ETHICAL DESIGN AND CONDUCT OF FOCUS GROUPS IN BIOETHICS RESEARCH

Christian M. Simon and Maghboeba Mosavel

ABSTRACT

Focus groups can provide a rich and meaningful context in which to explore diverse bioethics topics. They are particularly useful for describing people’s experiences of and/or attitudes toward specific ethical conundrums, but can also be used to identify ethics training needs among medical professionals, evaluate ethics programs and consent processes, and stimulate patient advocacy. This chapter discusses these and other applications of focus group methodology. It examines how to ethically and practically plan and recruit for, conduct, and analyze the results of focus groups. The place of focus groups among other qualitative research methods is also discussed.

Empirical Methods for Bioethics: A Primer. Advances in Bioethics, Volume 11, 63–81. Copyright © 2008 by Elsevier Ltd. All rights of reproduction in any form reserved. ISSN: 1479-3709/doi:10.1016/S1479-3709(07)11005-0

INTRODUCTION

Focus groups are a versatile and useful tool for bioethical inquiry. Successful focus groups shed light on the diversity of views, opinions, and experiences of individuals and groups. Their group-based, participatory nature is ideal for stimulating discussion of the kinds of multifaceted and contentious issues that bioethicists wrestle with daily.
Apart from their role as a source of information on people’s perceptions of topical issues, focus groups can also be used to determine the ethics-related needs of health care institutions and professionals, to evaluate the effectiveness of interventions, and to generate rapport and trust among research subjects and communities. Fontana and Frey (as cited in MacDougall & Fudge, 2001) have summed up the advantages of focus groups as ‘‘being inexpensive, data rich, flexible, stimulating to respondents, recall aiding and cumulative and elaborative’’ (p. 118).

This chapter explores some of the multiple characteristics and uses of focus groups with the purpose of highlighting their potential as a useful investigatory tool in bioethics. We consider a number of issues that we anticipate will be of special interest to bioethics researchers. These issues include the question of how one designs and conducts focus groups in an ethical, culturally appropriate, and scientifically rigorous way, and how researchers can use focus groups to stimulate critical reflection and generate new knowledge on key ethical and social issues. We draw on our experiences conducting focus groups in South Africa to illustrate both the challenges and rewards of using this methodology. Further information about using focus groups in empirical research can be found in a variety of sources, including our own work (Mosavel, Simon, Stade, & Buchbinder, 2005) and in several general guides to focus group methodology (Krueger & Casey, 2000; Morgan, 1997).

BACKGROUND

A focus group typically is composed of one or two moderators and six to ten individuals who may or may not share a common interest in the issue or topic under investigation. Focus group research originated in the 1930s among social scientists seeking a more open-ended and non-directive alternative to the one-on-one interview (Krueger & Casey, 2000).
Focus groups later became popular in the marketing world as a tool for establishing consumer preferences for or opinions about different products, brands, and services. This commercial use of focus groups made some social scientists mistrustful of the methodology; however, focus groups are widely used and reported on today in the literature of the social sciences and other disciplines. Focus groups are especially popular among researchers of health and patient care issues, in part because they are comparatively cost effective, easy to implement, and less intimidating to some patients or individuals than interviews, questionnaires, or other methods of inquiry.

In bioethics, focus groups have been used in a variety of ways, including exploration of public perceptions of the continued influence of the Tuskegee experiments on the disinclination of some groups to participate in biomedical research (Bates & Harris, 2004); how to improve end-of-life care (Ekblad, Marttila, & Emilsson, 2000; McGraw, Dobihal, Baggish, & Bradley, 2002); genetic testing and its medical, social, cultural, and other implications (Bates, 2005; Catz et al., 2005); environmental justice and environmental health issues (Savoie et al., 2005); and the appropriateness and effectiveness of medical informed consent procedures (Barata, Gucciardi, Ahmad, & Stewart, 2005). They have also been used as a tool for evaluating the effectiveness of medical ethics education and training initiatives (Goldie, Schwartz, & Morrison, 2000). In community-based health research, focus groups have been used to explore community health needs and concerns, build rapport and trust, and to empower community members to work toward constructive change (Mosavel et al., 2005; Clements-Nolle & Bachrach, 2003). Other uses of focus groups are possible and likely to emerge in the future.
THE FOCUS GROUP PROCESS

Some initial considerations: Empirical researchers often face the question of when it is appropriate to use focus groups, as opposed to other empirical methods such as individual interviews or surveys, in order to explore a particular issue, problem, or phenomenon. In making this decision, the researcher will want to bear in mind several factors. Focus group research typically involves far fewer research subjects than interview or survey research, where sample sizes tend to be significantly larger. This may make focus group research less costly and time consuming to conduct than individual interviews or surveys. However, the generally small sample sizes in focus group research also mean that it is harder to generalize findings to the larger group, community, or population from which the focus group participants were sampled. For these reasons, among others, focus groups are often used to obtain preliminary or formative data that can be used to gain an initial impression of participant opinions and attitudes, and to inform the development of individual interviews, surveys, vignettes, or other instruments to be administered at a later date and with larger samples.

Focus groups are often used in combination with individual interviews, surveys, or other methods of inquiry. Findings from focus groups can help improve the design of a larger study, including the content, language, and sequence of items, and other essential elements of an interview or survey instrument. An example of this approach is provided by Fernandez et al., who conducted focus groups as a first step in determining the kinds of interview questions to ask pregnant women with regard to the ethical and other issues surrounding the collection, testing, and banking of cord blood stem cells (Fernandez, Gordon, Hof, Taweel, & Baylis, 2003). However, focus groups need not always be used as a first step in empirical exploration.
In some cases, researchers have reversed the order of methods, for example, by using surveys to identify salient issues that would benefit from further, in-depth, exploration through focus group discussions (Weston et al., 2005).

Selecting Participants for Focus Groups

The proper sampling of research participants is essential for all types of empirical research. Different sampling techniques can be used in focus group research (MacDougall & Fudge, 2001); however, a primary objective of all these techniques is to bring together individuals who are generally representative of the larger group, community, or population that the research is interested in. One relatively simple way to achieve this representation is through intentional or purposive sampling, that is, by purposefully selecting specific individuals representative of the age, gender, racial and ethnic characteristics, professional training and skills, and other characteristics evident in the group or community of interest. This sampling approach has the advantage of being flexible and can evolve as the study develops (MacDougall & Fudge, 2001, p. 120). Researchers can draw on informal networks of colleagues, community organizations, advocacy groups, or other sources to help identify potential participants to invite to the focus groups.

A second approach to focus group sampling is to randomly select potential subjects. Random sampling typically has more scientific cachet than other approaches do; however, it does present unique challenges. For example, a focus group study aimed at better understanding how pediatric oncologists view the merits and problems of assent with children is unlikely to result in diverse and rich focus groups if participants simply have been randomly selected from, say, a pediatric hospital’s directory of oncologists. The resulting sample is likely to be overwhelmingly English-speaking, male, and Caucasian.
This level of homogeneity may be perfectly acceptable if the research question at hand is limited to exploring the attitudes and perceptions of individuals who share only these characteristics. However, if the attitudes of individuals of different genders and linguistic and ethnic backgrounds are also of interest to the research, stratified random sampling needs to be employed. In this case, the researcher aiming to explore the issue of assent will need to sample a cross section of pediatric oncologists, sorted into separate lists according to gender, ethnicity, among other possible characteristics. Thus, in addition to male Caucasian oncologists, the researcher may want to invite a number of female and minority oncologists to join his or her focus groups. Of course, if there happens to be only one or two female and/or minority oncologists at the institution the researcher has selected for study, stratified random sampling will not be possible. In the event that there is no diversity of gender or racial background among the oncologists at the institution, the researcher may want to convene a focus group at a second, more demographically diverse institution, or at a number of institutions. Alternatively, the sample can be broadened beyond physicians to include nurses or nurse practitioners, residents, and other oncology staff. However, the decision to take this step would depend on the particular research question at hand. A stratified random sample is therefore one way in which focus group research can be made more representative and rigorous. A variety of sources have discussed sampling procedures for focus groups in more detail. Interested readers are referred to, among other sources, MacDougall and Fudge (2001) on the purposive approach to sampling and Krueger and Casey (2000) on random sampling. 
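The stratified approach described above can be sketched in a few lines of Python. This is a minimal illustration rather than a procedure taken from the sources cited: the directory, the `gender` field, and the function name `stratified_sample` are all hypothetical, and a real study would stratify on whichever characteristics its research question requires.

```python
import random
from collections import defaultdict


def stratified_sample(people, stratum_of, per_stratum, seed=None):
    """Randomly select up to `per_stratum` people from each stratum.

    `stratum_of` is a function that maps a person to a stratum label.
    Returns a dict of {stratum label: list of selected people}.
    """
    rng = random.Random(seed)

    # Sort the directory into separate lists, one per stratum.
    strata = defaultdict(list)
    for person in people:
        strata[stratum_of(person)].append(person)

    # Draw at random within each list; a stratum smaller than the
    # requested number is simply taken in full.
    selected = {}
    for label, members in strata.items():
        k = min(per_stratum, len(members))
        selected[label] = rng.sample(members, k)
    return selected


# Hypothetical directory: 20 oncologists, 5 of whom are recorded as female.
directory = [
    {"id": f"oncologist_{i:02d}", "gender": "female" if i % 4 == 0 else "male"}
    for i in range(1, 21)
]

invited = stratified_sample(directory, lambda p: p["gender"], per_stratum=3, seed=1)
for label, members in invited.items():
    print(label, [m["id"] for m in members])
```

A stratum with fewer members than requested is invited in full, which mirrors the caveat above: when only one or two eligible individuals exist, random selection within that stratum is no longer meaningful, and recruiting at additional institutions may be the better option.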
Preparing for Focus Groups

Focus groups typically involve a series of questions that the moderator poses in order to generate discussion and get feedback on a particular topic. Depending on the length of the focus group (typically between 60 and 90 min), between 8 and 10 core questions are usually posed. Focus group questions need to be carefully developed so that they address the research question(s) at hand, are relevant, comprehensible, and interesting to participants, and can be covered in the time allotted. They also need to be appropriately sequenced, for example, by asking questions of a more complex or controversial nature after participants have had a chance to develop confidence and rapport among themselves and with the moderator. Questions also need to flow logically and be guided by participants’ particular responses and by the moderator’s comments as facilitator.

Researchers may need to consider the sensitive and complex nature of many of their bioethics-related research topics when deciding on the questions they want to pose in a focus group. Inquiring about people’s views on stem cell or genetics research, informed consent, end-of-life care, and other similar topics has the potential to intimidate focus group participants because these topics are politically and morally loaded. Similarly, questions that require conceptual or technical knowledge may be beyond participants’ level of knowledge or experience. One way of avoiding the dreaded silences that questions about such issues can introduce into a focus group is to first consult with group or community leaders on how best to approach the issue at hand. In fact, this community consulting process should be considered for a range of reasons, including to facilitate access to potential focus group participants, to develop linguistically and culturally appropriate questions, and to enrich the analysis of focus group data (see description later).
By consulting with key stakeholders, researchers can quickly establish which topics and questions are likely to be viewed as acceptable and engaging, and which are better avoided. Different levels of community or group engagement can be sought. For example, a researcher may want simply to submit a list of focus group questions to selected community members or leaders to gain their feedback on how comprehensible, engaging, and appropriate the questions are. Alternatively, he or she may involve the community from the beginning in the formulation of the research questions so that they reflect the interests and concerns of the wider community in which the research is taking place. Cost, logistics, and other considerations will likely determine which of these options the researcher can feasibly take. Regardless, focus groups are far more likely to attract a good turnout, lively discussion, and rich data if preceded by efforts to engage the target group or community in developing and validating the questions to be asked.

Informed Consent for Focus Groups

The consent process for potential participants of focus groups should be responsive to all the elements of good informed consent, plus a number of additional considerations. The group-based nature of focus groups makes it harder to ensure confidentiality when compared to individual interviews or surveys. Focus group participants may disclose personal and sensitive information about themselves to the moderators, the researchers, and other focus group participants. Some researchers attempt to discourage this kind of disclosure by asking participants to comment generally on the topic and not to share personal information about themselves.
This may be partly effective; however, the interactive and intimate nature of focus groups can get the better of participants and prompt them to share sensitive personal information before the moderator can intervene and stop them. While the researcher can take steps to keep focus group recordings and transcripts containing personal information confidential, he or she has little to no control over whether or not the information will be more widely shared by group participants once the focus group is over.

Researchers can take a number of steps to help reduce, if not eliminate, concerns about the confidentiality and privacy of information that is shared in focus groups. One such step is to ask participants to use only their first names while engaged in focus group discussions. This will help protect participants’ identities if they are not already known to one another. Another step is to balance the need for diversity in focus groups against the need for confidentiality and respect among their participants, which may mean not including in the same focus group individuals who may be antagonistic toward one another on ideological, religious, or other grounds. However, such focus groups are unlikely to stimulate constructive discussion or yield important data. Finally, focus group moderators can help dispel some anxiety about confidentiality. This should not involve a formal review of the confidentiality statement that would have been included in the consent document for the research study, but a brief verbal reminder that what is said during the focus group must remain in the group. The usual practice is to provide this reminder before the focus group discussion begins and once again when it ends.

Selecting and Training Moderators

Selecting and training effective moderators are critical steps in the successful conduct of focus groups.
A poorly selected or trained focus group moderator will not be able to promote optimal interaction among participants, keep their discussion from straying, or ask pertinent follow-up questions. Many focus groups use two moderators. This has the advantage of allowing one moderator to concentrate fully on introducing the topic, asking questions, and stimulating discussion while the other keeps track of time, operates the audio recorder, and helps in asking follow-up questions. Obviously, if two moderators are used, they need to clearly understand their own as well as one another’s roles and responsibilities, and spend time training together.

Training is particularly critical for first-time moderators, and should focus, among other things, on the development of appropriate responses to classic focus group problems. For example, training could involve a few participant-actors who respond to the moderator’s questions in any number of predetermined and realistic ways. In actual focus group encounters, it is not unusual for some participants to gravitate toward dominating discussions, while others grow increasingly passive. Training sessions that simulate this dynamic can be used to help moderators identify and negotiate this potential problem. Actors and moderators can debrief afterwards to discuss which moderator-initiated interventions do and do not work in addressing passivity, dominance, and other issues.

It is also important to consider how well moderators are matched to the focus group participants they will be interacting with. Moderators who are matched to focus group participants in terms of age, gender, social and ethnic background, dress code, and so forth will engender greater rapport and openness.
It may be appropriate for the researcher him- or herself to facilitate the focus groups, for example, if the focus groups include researcher colleagues who may expect a certain level of sophistication from their interactions with their moderator. On the other hand, some focus group participants may feel intimidated if the researcher serves as moderator. This may be the case particularly if the researcher–moderator is older or more experienced than most of the participants. These and other potential advantages and drawbacks should be carefully considered before the researcher decides whether or not to serve as a moderator.

Where to Conduct Focus Groups and How

Focus groups, particularly if they are addressing sensitive or controversial issues, will need to be conducted in facilities that are as private, comfortable, and accessible as possible. Such settings may be difficult to secure given that many researchers may want or need to conduct their focus groups in hospitals, clinics, medical schools, and other health-related facilities. Space constraints, noise, unplanned interruptions, and other typical features of many medical and clinical settings need to be negotiated as a result. Furthermore, if participants are patients and/or family members, they may not feel comfortable openly sharing their opinions in a medical or clinical setting, even if their health care providers are not immediately present. Health care facilities may also be hard for participants to access in terms of location, parking, or finding the room where the focus group is being held. Alternative locations for focus groups might include local libraries, community centers, or other public facilities that can offer quiet and comfortable environments.

The focus groups themselves should be flexible, exploratory, and not overly controlled.
At the same time, too little structure or direction in a focus group can result in a lack of focus, confusion, argument, and, ultimately, highly disconnected data. Hence, focus group researchers have used a variety of strategies in an effort to balance the need for flexibility and informality against the need for direction and focus. These strategies include the use of question-and-answer formats, discussion guides, vignettes, “show cards,” video or audiotape, and the Internet. For example, in a study designed to identify the key issues associated with the use of human genes in other organisms, the Bioethics Council in New Zealand used show cards depicting various scientific claims associated with gene research to stimulate discussion on the topic (retrieved October 6, 2005 from http://www.bioethics.org.nz/publications/human-genes). The well-established “case study” in bioethics also potentially lends itself well to stimulating focus group discussion on any number of ethics topics.

These and other techniques can be used in combination with a question-asking approach. A well-developed series of questions has the capacity to generate lively discussion and valuable feedback on the topic of interest to the researcher. Many sources caution against the temptation to ask focus group participants too many questions; typically, participants in an hour-long focus group should not be asked to consider more than five core or primary questions. Moderators should promote mutual respect among focus group participants by, for example, stating at strategic moments throughout the focus group that “there are no right or wrong answers.”

Focus groups dealing with sensitive topics can also be introduced through an “icebreaker question.” Here, again, prior consultation with community or group members can be helpful.
For example, as part of our focus group research on a key social justice issue in South Africa, namely access to women’s health care resources, we asked community members to suggest an appropriate way of starting off the focus groups. Because some of our groups included youth, the suggestion was made that each participant should provide their first name and the name of a country whose first letter matched that of their name. This simple strategy helped to get the focus groups off on a lighthearted note, without asking participants to grapple with too difficult a topic or to divulge anything too intimate about themselves.

Monitoring the Focus Group Process

Monitoring focus group processes is essential to the quality and success of a focus group study. There are many different ways in which researchers can accomplish this; here, we mention two possible steps: (1) the use of debriefing reports that are put together by moderators after each focus group, and (2) ongoing review of the focus group audiotapes and/or transcripts by research staff, including the principal investigator (PI).

Debriefing reports are completed by the moderator usually within 24 h of a focus group discussion and summarize the group dynamics, the quality of responses to questions, the main themes, any peculiarities in the group that may have affected its responses, and suggestions for conducting future groups. Data for debriefing could also be based on observations made or notes taken by moderators of participants’ nonverbal behaviors and of their own responses. Researchers can also meet and verbally debrief with moderators immediately following a focus group. However, a written debriefing report is useful as a record that can later help in the evaluation and analysis phase of the research process (see description later).
The second step, reviewing the audiotapes or transcripts from a focus group, allows researchers to evaluate for themselves the quality of interaction and response in a given focus group. Both these steps help ensure not just that quality data are being collected, but that the focus group experience is mutually rewarding for both the researcher and the participants.

DATA ANALYSIS

Preliminary Analysis

The data analysis process for focus groups is largely driven by the research question or aims of the research. Data analysis can begin immediately after the first group ends or once all focus groups have been conducted. It is a good idea to start informally analyzing the data from the outset of the study so that the moderator and researcher have the opportunity to identify any challenges in the process or with the content. Materials for this informal analysis can include the audiotapes or transcripts (if already available) of the discussions and any debriefing notes that the moderator(s) may have taken about nonverbal and other behaviors among participants. Preliminary data analysis can also be used to identify issues or themes that the researcher may want to take up in subsequent focus groups. However, this strategy can be problematic if the study design or its anticipated outcomes depend on consistency in the kinds of questions being asked from one focus group to the next.

Full Analysis

Analysis of focus group data can be and often is quite complex, especially given that the researcher may need to process large amounts of narrative. Despite the qualitative nature of the data, its analysis must be systematic, verifiable, and context driven. The analysis process should be guided by the aims, overall philosophy, and anticipated outcomes of the research. Often the main outcome will take the form of a report of the pertinent issues discussed in the focus groups.
As noted above, in other cases, the researcher may use the focus group data as part of formative research which will inform a larger, more representative study.

Researchers have found that engaging multiple participants in the data analysis process can greatly facilitate rich analysis and interpretation of focus group data. For example, the authors employed and trained South African community members as well as US-based research assistants to help in analyzing their focus group data. This approach helped address the researchers’ concern that their data might be distorted if analyzed through the social and cultural lens of only South African or American research assistants. In this approach, the South African research assistants brought their intimate knowledge of the local community and its wider social and cultural context to the data, while the US research assistants added a useful critical distance, along with their prior training and experience in data analysis. By involving South African community members in the analysis phase, the researchers were also being consistent with their participatory philosophy, which emphasized the need for active community involvement in all phases of their research.

Below, we describe one method that is increasingly used in focus group data analysis: workshop-based summarizing and interpretation of data. However, it should be noted that focus group researchers have described many different kinds of methods for analyzing data. Different software programs also exist to facilitate qualitative data analysis, including focus group data. These programs offer ways of innovatively and rapidly sorting and organizing qualitative data, moving between data sets, and streamlining their analysis. Online tutorials can help users learn how to use these programs; however, the upfront time and effort required to master these programs are still significant.
The choice of analytic method may be affected by many factors, including the size of the focus group study. Data analysis for large studies involving 10 or more focus groups, for example, may best be conducted using a computer program rather than through a workshop-based or manual cut-and-paste method (Krueger & Casey, 2000). The research objectives and anticipated outcomes should also play a key role in deciding what method to use for data analysis.

Workshop-Based Summaries and Interpretation

One practical and useful way to analyze focus group data is to begin by reviewing the audiotapes and transcripts, and then creating summaries of participants’ responses to each question asked. Since the summaries contain only a synopsis of what was said, the original audiotapes and transcripts may need to be repeatedly consulted to place the summaries back into their contexts. Research assistants can help with this process by writing summaries and offering, in a separate section, their initial interpretations of what was said and not said in the focus group, and what appeared to be the most compelling theme or themes for each question asked. Having these summaries and interpretations independently generated by at least two people will allow them to be placed side by side for comparison and validation.

Data analysis workshops can be used to review and verify summaries and interpretations. Often, what to include or leave out of a summary or interpretation will need to be decided through a process of discussion and negotiation among the researchers and research assistants. This process can be difficult, in part because people can be overzealous in their efforts to highlight certain themes in the data or to interpret the information in one way or another.
Nonetheless, this workshop-based negotiation of the data and its interpretation may be one of the most effective ways of minimizing the intrusion of individual bias into the data analysis and interpretation phase of the research.

In our South African work, we held workshops both in the US and South Africa, aimed at analyzing and interpreting our focus group findings. Research Associates (RAs) in both countries were trained to create comprehensive, substantive response summaries for each focus group question. Each summary had three components: a quantitative list of responses, showing how many times a particular behavior, issue, or theme was mentioned or talked about over the course of the focus group; a narrative synopsis of the themes and issues that emerged from participants’ responses; and the RAs’ personal interpretations of the responses to each focus group question. An example of each of these components is provided below:

Example of analyzed focus group data

Focus group question: What do young people in this community do for fun?

1. Quantitative list of responses:

Listen to music – 8
Singing – 1
Go out to malls – 1
Watch movies – 2
Washing and cleaning my place at home – 1
Hanging out with friends – 2

2. Qualitative summary

The majority of focus group participants (8 girls) reported that they liked to listen to music, particularly R&B. Other types of music the participants reported liking are hip-hop, gospel, Kwaito, jazz, and Indian music. The participants reported a wide variety of activities they do for fun. The girls reported that friends’ houses, the library, taverns, game shops, malls, and the ice rink are places where they spend time with their friends. Game shops are places where alcohol is not served and children go to play games.
One participant explained that taverns are places where grown people drink alcohol: “Sorry, there’s a difference between a game shop and the taverns, they don’t put games in the taverns, grown people go there, and at the game shops, children go there” (p. 3). On the other hand, several girls said that young girls, including some of their friends, from their school go to taverns. A few participants reported that crime makes it difficult for youth to have fun. When asked, one participant said that crime means: “Some people will rape you, take you away from your home, and kill you” (p. 4).

3. Interpretations

Interpretation (RA 1)

These girls implicitly communicate the importance of friends and the peer group, particularly in relation to how they spend their time. The prevalence of sexual violence and other types of violence in the daily lives of these girls is evident. The fact that they feel they have nowhere safe to go indicates a dangerous environment that must be navigated. These girls have identified a few safe places for themselves, however, such as the library, though from their laughter it seems that may not necessarily be a fun place to spend time.

Interpretation (RA 2)

While participants list a wide variety of fun activities that they do in their spare time, they also depict their community as a dangerous place where girls must face potential violence at many public places, even the supermarket. It seems that a lot of bad activity centers at the taverns, which several attribute to the alcohol there. The girls portray their town with a matter-of-fact attitude and seem to speak optimistically, despite all the crime.

These summaries were created using a workshop approach, both by RAs based in the United States and in South Africa. Prior to each workshop, each member of the research team read the full transcripts for each particular focus group.
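The quantitative component of such a summary, and the side-by-side comparison of two independently produced summaries, can be sketched in Python as follows. This is only an illustration under stated assumptions: the counts for the first coder mirror the example list above, whereas the theme labels as code strings, the second coder’s data, and the function names `tally` and `discrepancies` are hypothetical and are not part of the procedure the authors describe.

```python
from collections import Counter

# One theme code per mention in a transcript. The counts mirror the
# example list above; the code strings themselves are illustrative.
ra1_codes = (
    ["listen to music"] * 8
    + ["singing"]
    + ["go out to malls"]
    + ["watch movies"] * 2
    + ["washing and cleaning at home"]
    + ["hanging out with friends"] * 2
)

# Hypothetical second coder who recorded one fewer "listen to music" mention.
ra2_codes = ra1_codes[1:]


def tally(codes):
    """Count how many times each theme was coded."""
    return Counter(codes)


def discrepancies(tally_a, tally_b):
    """Return the themes whose counts differ, as {theme: (count_a, count_b)}."""
    themes = set(tally_a) | set(tally_b)
    return {
        theme: (tally_a[theme], tally_b[theme])
        for theme in sorted(themes)
        if tally_a[theme] != tally_b[theme]
    }


ra1_tally = tally(ra1_codes)
ra2_tally = tally(ra2_codes)
to_discuss = discrepancies(ra1_tally, ra2_tally)

# Print the first coder's list, most frequently mentioned theme first.
for theme, count in ra1_tally.most_common():
    print(f"{theme} - {count}")
```

Themes that end up in `to_discuss` are candidates for discussion in a data analysis workshop; the tally itself does not settle which count is right, and the transcript remains the reference.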
One RA prepared a detailed, three-part summary for each question, consisting of the list of categories, narrative synopsis, and subjective interpretations. We used the same process both in the US and in South Africa, with slight variations due to the local context and resources.

Process in the United States: Prior to each workshop, the research team, consisting of the two principal investigators (PIs) and two RAs, read the full transcripts for each particular focus group. One RA would prepare a detailed, three-part summary for each question. For the first two focus groups we analyzed, each of the two RAs summarized the same transcript and then merged their summaries. It was then determined that the summaries were similar enough that this verification process was unnecessary. In the workshops, we read the summaries of each focus group question for accuracy, and discussed modifications. In general, the summaries did not require substantive changes. Nonetheless, the subjective interpretation portions of each summary were especially illuminating in that they often revealed the contextual biases of the RAs. Interpretations were discussed, but not corrected. After we reached agreement on the accuracy of the summaries for all focus groups, we analyzed the narratives for common themes across all groups.

Process in South Africa: Similar to the US data analysis procedure, two RAs were assigned to each transcript. The summaries were written in English. To establish coder reliability, and to replicate the role of the study investigators in the US as much as possible, a senior RA was assigned to verify the accuracy and completeness of each summary. In addition, the PIs received and delivered regular feedback via telephone and email to the analysis team in South Africa.

This analytic approach had several distinctive advantages.
It allowed for the creation of a manageable and dynamic data set, consisting of short summaries of otherwise long and potentially unwieldy transcripts; a combination of qualitative and quantitative data; and interpretations reflective of the social and cultural context in which they were done. It was also a creative and lively way to analyze focus group data, and, in South Africa, a way of keeping the community members whom we hired to help us with the research involved and interested in a phase of research that can be all too easily separated from the community setting.

One potential drawback of this approach is that key information can be lost as a result of generating short narrative summaries of the much longer transcripts. We addressed this limitation by moving back and forth between the summaries and the original transcripts in order to contextualize quotations, to identify what may have been said both before and after a summarized segment, and to retain the original tone or texture of the focus group discussion.

It is possible and in some cases more appropriate to use other methods to analyze focus group data. It is also possible to construct only one or two components of our approach, for example, only the quantitative listings or only the narrative summaries. However, using them in combination will make for a richer, more multifaceted analysis of focus group data.

DATA DISSEMINATION

After the analysis, the nature and the scope of data dissemination is largely determined by the purpose of the focus groups and the overarching philosophy guiding the research. In cases where the researcher was contracted to do the research, the goal in this phase is usually to provide the client with a report. Reports have different formats including narrative, report memo, top-line, and bulleted reports (Krueger & Casey, 2000).
The narrative report is the lengthier document, which is framed by the main focus group questions or the issues that have emerged from the data. This report usually includes recommendations for the client. The report memo is typically geared toward focus group participants and its purpose is to assure the focus group participants that they were heard. It commonly focuses on progress that has been made since the groups met or includes future goals that will further address the concerns of participants. The top-line report, by contrast, is a much more concise report which includes a combination of bulleted points and narrative about the focus group. In fact, this report may be somewhat similar to the debriefing report in that it is usually presented to the client within a day or two after the group. The top-line report is usually prepared without careful data analysis but is based on a more immediate evaluation of the focus group. This report is a standard in market research but it may have value to researchers attempting to disseminate findings about sensitive or controversial issues.

When focus groups are conducted for academic research purposes, their findings are usually shared with an academic audience through journals, other publications, or conference proceedings. However, other audiences can also be reached with focus group findings. In fact, because focus group research often marks the researcher’s first entrée into a community, it provides a unique opportunity to help build support and trust through data sharing and dissemination. Researchers may find it particularly useful to share data with individuals, groups, or communities who participated in the research in order to stimulate constructive discussion of sensitive or controversial issues. Sharing data in this way can boost trust, support, and accountability between researchers and research participants. The format and scope of data dissemination need to be determined by the research question.
In their South African study, the authors returned to the community and met with various local stakeholders to share their findings (Mosavel et al., 2005). One of their goals in doing this was to demonstrate accountability to their initial contacts and to the community. The researchers used a variety of reporting methods including written reports, structured and unstructured conversational reports, as well as informal and formal briefing sessions. They met with representatives of local government, with school principals, and with potentially interested health care agencies and provided them with an executive summary of the focus group findings. They also discussed the findings in informal briefings with community teachers, clinic staff, and library personnel who had daily contact with young people in the community. Other members of the community were invited to informal presentations at which the results of the focus groups were presented for commentary and discussion.

In sharing data with research participants or communities, the focus group researcher should anticipate that some people might ask, “how is this information relevant to us?” or “why are you telling us this?” It is also important for the presenter to emphasize that the focus group does not generate generalizable data. These and other factors may lead audiences to justifiably question the relevance or significance of focus group data. Krueger and Casey call this the “ho-hum syndrome,” which tends to accompany the presentation of focus group data that has not been appropriately or clearly presented (Krueger & Casey, 2000). Researchers need to anticipate this reaction and provide clear information that indicates to the audience the relevance of the data and how it might affect their lives or the life of their communities.
CONCLUSION
Focus groups can provide a rich and meaningful context for exploring many different kinds of bioethics issues. They are an excellent tool for formative exploration of sensitive or controversial topics and for partnership building with research participants and communities. The strengths of focus group methodology lie in its participatory and interactive nature. By generating interaction and discussion, focus groups can explore issues to a degree not possible in interviews, surveys, or other empirical tools. The tradeoff for this depth and richness lies in the limited generalizability of focus group findings, the large amount of qualitative data that focus groups generate, and the challenge of analyzing these data. Researchers interested in using focus groups in their studies need to evaluate these strengths and limitations against their research goals, available resources, and other factors.
We conclude this chapter with a note on possible future innovations in focus group research. Researchers have recently started exploring the benefits of conducting ethics-focused and other kinds of focus group research utilizing the Internet or World Wide Web. The benefits of this technology include the ability to “bring together” participants who may live or work far apart, even in different countries. Online focus group research may also provide certain social groups with an appreciated sense of anonymity, social distance, and safety. One possible drawback of Internet-based focus groups is that they may be able to recruit only individuals who have access to and are able to use online computers. Economically disadvantaged, elderly, and other individuals who typically have limited access to the Internet therefore may be excluded from important research.
Adequate informed consent may also be difficult to obtain over the Internet, and discussions may be hard to keep private if they are conducted, recorded, and/or stored online. Unless they are televised in some way, online focus groups may also lack the visual and proximal intimacy that enriches face-to-face interaction. These and other potential advantages and drawbacks still need to be fully explored before the usefulness of conducting online bioethics focus group research can be determined.
REFERENCES
Barata, P. C., Gucciardi, E., Ahmad, F., & Stewart, D. E. (2005). Cross-cultural perspectives on research participation and informed consent. Social Science and Medicine, 19([Epub ahead of print]).
Bates, B. R., & Harris, T. M. (2004). The Tuskegee study of untreated syphilis and public perceptions of biomedical research: A focus group study. National Medical Association, 96(8), 1051–1064.
Bates, R. (2005). Public culture and public understanding of genetics: A focus group study. Public Understanding of Science, 14(1), 47–65.
Catz, D., Green, N., Tobin, J., Lloyd-Puryear, M., Kyler, P., Umemoto, A., et al. (2005). Attitudes about genetics in underserved, culturally diverse populations. Community Genetics, 8(3), 161–172.
Clements-Nolle, K., & Bachrach, A. (2003). Community-based participatory research with a hidden population: The transgender community health project. In: M. Minkler & N. Wallerstein (Eds.), Community-based participatory research for health (pp. 332–347). San Francisco: Wiley.
Ekblad, S., Marttila, A., & Emilsson, M. (2000). Cultural challenges in end-of-life care: Reflections from focus groups’ interviews with hospice staff in Stockholm. Journal of Advanced Nursing, 31(3), 623–630.
Fernandez, C. V., Gordon, K., Hof, M. V. d., Taweel, S., & Baylis, F. (2003). Knowledge and attitudes of pregnant women with regard to collection, testing, and banking of cord blood stem cells. Canadian Medical Association Journal, 168(6), 695–698.
Goldie, J., Schwartz, L., & Morrison, J. (2000). A process evaluation of medical ethics education in the first year of a new medical curriculum. Medical Education, 34(6), 468–473.
Krueger, R. A., & Casey, M. A. (2000). Focus groups: A practical guide for applied research. Thousand Oaks, CA: Sage.
MacDougall, C., & Fudge, E. (2001). Planning and recruiting the sample for focus groups and in-depth interviews. Qualitative Health Research, 11(1), 117–126.
McGraw, S. A., Dobihal, E., Baggish, R., & Bradley, E. H. (2002). How can we improve care at the end of life in Connecticut? Recommendations from focus groups. Connecticut Medicine, 66(11), 655–664.
Morgan, D. L. (1997). Focus groups as qualitative research. Thousand Oaks, CA: Sage.
Mosavel, M., Simon, C., Stade, D., & Buchbinder, M. (2005). Community based participatory research (CBPR) in South Africa: Engaging multiple constituents to shape the research question. Social Science and Medicine, 61(12), 2577–2587.
Savoie, K. L., Savas, S. A., Hammad, A. S., Jamil, H., Nriagu, J. O., & Abuirahim, S. (2005). Environmental justice and environmental health: Focus group assessments of the perceptions of Arab Americans in metro Detroit. Ethnicity and Disease, 15(Suppl 1), S1–S41.
Weston, C. M., O’Brien, L. A., Goldfarb, N. I., Roumm, A. R., Isele, W. P., & Hirschfeld, K. (2005). The NJ SEED project: Evaluation of an innovative initiative for ethics training in nursing homes. Journal of American Medical Directors Association, 6(1 Jan–Feb), 68–75.
CONTEXTUALIZING ETHICAL DILEMMAS: ETHNOGRAPHY FOR BIOETHICS
Elisa J. Gordon and Betty Wolder Levin
ABSTRACT
Ethnography is a qualitative, naturalistic research method derived from the anthropological tradition. Ethnography uses participant observation supplemented by other research methods to gain holistic understandings of cultural groups’ beliefs and behaviors.
Ethnography contributes to bioethics by: (1) locating bioethical dilemmas in their social, political, economic, and ideological contexts; (2) explicating the beliefs and behaviors of involved individuals; (3) making tacit knowledge explicit; (4) highlighting differences between ideal norms and actual behaviors; (5) identifying previously unrecognized phenomena; and (6) generating new questions for research. More comparative and longitudinal ethnographic research can contribute to better understanding of and responses to bioethical dilemmas.
Empirical Methods for Bioethics: A Primer. Advances in Bioethics, Volume 11, 83–116. Copyright © 2008 by Elsevier Ltd. All rights of reproduction in any form reserved. ISSN: 1479-3709/doi:10.1016/S1479-3709(07)11004-9
INTRODUCTION
Ethnography aims to understand the meanings that individuals attach to situations or events under study and the myriad factors that affect beliefs and behavior. Because of this, ethnography is well suited to the study of bioethics. Bioethical issues and dilemmas are morally charged, laden with meaning, and unfold through social interaction. Ethnographic research is therefore ideal for opening the door to the world of meanings attributed to health-related events and moral decisions, and for understanding the broader socioeconomic and political factors shaping how cultures and cultural members frame, interpret, and respond to such phenomena.
Ethnography was one of the first methods used to conduct empirical research on bioethical issues (Fox, 1959; Glaser & Strauss, 1965). Ethnographic studies relating to bioethics can be categorized into three groups. The first group is specifically about the work of bioethics itself, such as research on the role and functioning of Institutional Review Boards (IRBs) or hospital ethics committees (chapters in Weisz, 1990; DeVries & Subedi, 1998; Hoffmaster, 2001).
The second group aims to elucidate bioethical issues with a focus on how bioethical dilemmas and conflicts develop and/or are addressed (see Guillemin & Holmstrom, 1986; Levin, 1986; Anspach, 1993; Zussman, 1992; DeVries, Bosk, Orfali, & Turner, 2007). The third set of ethnographic studies is not framed primarily as research in bioethics, but is relevant to the field, such as Fox and Swazey’s (1978, 1992) classic studies of dialysis and organ transplantation and Bosk’s (1979) examination of the socialization of surgeons in the context of surgical mistakes. Other seminal studies in this genre include Bluebond-Langner (1978) on children with leukemia; Estroff (1981) on people living with mental impairments; Ginsburg (1989) on the abortion debates in an American community; Rapp (1999) and Bosk (1992) on prenatal genetic testing; and Farmer (1999) on the social context of AIDS and other infectious diseases.
Bioethicists may see ethnography as a good source of illustrative cases or dramatic stories gathered simply through observation. But ethnography involves more than just observation – it relies on understanding the social processes underlying phenomena that are observed, building on prior research and analytic methods developed by social scientists, and on the systematic analysis of data that goes beyond simple description. The art of ethnography relies on the skills, knowledge, and sensitivity of the researcher. Interpretation based on good ethnography may provide all the information one needs in many circumstances. Or, it may be only the starting point, raising questions to be further investigated with the use of other methods such as a survey instrument that can more systematically collect information from larger numbers of respondents than can be observed through ethnography.
In this chapter, we introduce the qualitative method of ethnography, provide practical guidance on how to conduct ethnographic research, particularly as it relates to bioethical issues, and in doing so, highlight the value of ethnography to bioethics. Although some ethnographic research has been undertaken in the area of public health ethics and related topics (Brugge & Cole, 2003; Marshall & Rotimi, 2001; Lamphere, 2005), most ethnographic research cited in the field of bioethics has been clinically based. Thus, we primarily discuss ethnographic research done in clinical settings and will include examples drawn from the authors’ own experiences conducting ethnographic research on decision-making within neonatal intensive care units (NICUs), and on kidney transplantation, to offer insights into this research endeavor.
Defining Ethnography and Culture
There are many ways to define ethnography, grounded in different theoretical schools of thought about social research. However, all agree that ethnography has the following characteristics: it is a qualitative, naturalistic research method that derives from the anthropological tradition. An ethnographer typically goes into the field with a research question developed to build on existing social theory and/or previous substantive research with the aim of better understanding a cultural group or groups. The ethnographer aims to explicate the behaviors as well as the meaning of those behaviors of the people observed within a holistic context. In other words, ethnographers aim to describe a culture – whether it be the culture of the Navajo, intensive care, or people involved in organ donation and transplantation – by examining the worldviews, beliefs, values, and behaviors among its members. Often ethnography seeks to explore the historical, social, economic, political, and/or ideological factors which may account for cultural phenomena.
The work is inductive, rather than deductive. It does not test a preestablished, fixed hypothesis and does not collect data only using previously defined variables. Instead, concepts and variables emerge through the ethnographic process.
As we detail below, the prime method of ethnography is participant observation. This entails immersion in the field situation, establishing rapport with individuals, and gaining knowledge through first-hand observation of social behaviors. This is complemented by direct interaction with people in the field, and participation in their activities. Ethnographers may also use other techniques such as semi-structured interviews, surveys, focus groups, and textual analysis of relevant documents. Ethnography is not only the process just described, but also a product – a written account or ethnography – derived from the process (Roper & Shapira, 2000).
The concept of culture is essential to ethnography. Some fundamental points to understand about culture are: (1) culture is shared among a group of people (i.e., members of a nation, religion, profession, or institution); (2) culture entails patterns of behavior, values, beliefs, language, thoughts, customs, rituals, morals, and material objects made by people; (3) culture provides a framework for interpreting and modeling social behavior; (4) culture is learned through social interaction; (5) structural factors determine social positions which affect people’s worldviews and behaviors; (6) culture interacts with gender, class, ethnicity/race, age, (dis)ability, and other social characteristics; (7) culture is fundamental to a person’s self-identity; and (8) cultures change over time in response to changes in social, political, economic, and physical environments. There are many definitions of culture, and ethnographers vary in the approaches they use to describe it.
Here, we present two definitions – the first is a classic definition by Tylor (1958[1871]) that provides a broad sense of culture as “… that complex whole which includes knowledge, belief, art, morals, law, custom, and any other capabilities and habits acquired by man as a member of society” (p. 1). A second and often-quoted conception of culture is provided by Geertz who stated: “man is an animal suspended in webs of significance he himself has spun … I take culture to be those webs, and the analysis of it to be … not an experimental science in search of law but an interpretive one in search of meaning” (Geertz, 1973, p. 5). According to this definition, the meanings of all actions and things are socially constructed and shared. Culture comprises the symbols and meanings attached to actions and other phenomena that help people communicate, interpret, and understand their world.
Both definitions can be helpful for analyzing the culture of medicine, and specifically the culture of bioethics. Bioethics is situated at the confluence of complex legal and moral systems. Given their moral content, bioethical issues are laden with multiple meanings, and the actions agents take to resolve ethical problems or dilemmas are symbolically charged. For example, many clinicians perceive withdrawing life-sustaining therapy as different from not initiating life-sustaining therapy. So, by custom, they try to avoid withdrawing therapy when their goals can be met by not initiating a new therapy. However, for most bioethicists, these practices are conceptually and ethically synonymous.
Most ethnographies conducted in the area of bioethics have been done in complex, pluralistic societies where biomedicine is the dominant professional medical system.
In such societies, there may be significant cultural variations between members from different backgrounds, social statuses, and other groups (e.g., health professionals, patients, ethnic/religious groups, nationalities, genders, and age groups). People from different groups often vary in worldview and behaviors. When people interact they may assume that all the social “players” in a given situation share the same assumptions, knowledge, and beliefs that they hold, even when this is not the case. Moreover, people may believe their way of seeing things is the most valid or only way of interpreting reality. Conversely, people may assume that other groups hold different beliefs from their own, even when they do not. These assumptions can be a source of confusion, and contribute to bioethical dilemmas and value conflicts. Ethnography is an excellent method for examining cultural assumptions and studying their effects on behaviors, social interactions, and decisions in the health care environment.
Objectives of Ethnography
Researchers conduct ethnography to accomplish one or more of the following objectives: (1) understanding a phenomenon from the “native’s” or “participant’s” (i.e., member of the culture’s) point of view; (2) describing a given culture by making culturally embedded norms or tacit assumptions shared by members of a cultural group explicit; (3) discerning differences between ideal and actual behavior; (4) explaining behavior, social structure, interactions within and/or between groups, or the effects of economic, institutional, global, or ecological factors; (5) examining social processes in-depth; and (6) revealing unanticipated findings that can generate new research questions. Each of these objectives is described below.
(1) Understanding human behaviors from the “insider’s” point of view.
To describe and analyze the insider’s point of view, ethnographers distinguish between “emic” and “etic.” Emic refers to words and concepts that the people who are observed in a given setting use themselves (and are therefore significant in their culture), while etic refers to the abstracted concepts that scholars or researchers use for analysis. This distinction is illustrated well in a study of terms used to describe infections: whereas the emic “folk” term “flu” was often used by lay people, the emic term “viral syndrome” was used by physicians documenting patients’ problems (McCombie, 1987). However, neither term was adequate for epidemiologists conducting enteric disease surveillance. Epidemiological investigations and disease control required the use of etic or analytic terms to more precisely categorize diseases by the specific causal agent (McCombie, 1987).
(2) Revealing culturally embedded norms or tacit assumptions shared among members of a cultural group. Because people understand their physical and cultural worlds through the lens of their cultural understandings, their values, beliefs, norms, and facts are usually assumed to be naturally given or taken for granted and are sometimes subconscious. Accordingly, members often cannot explain or even articulate many aspects of their culture. As Jenkins and Karno (1992) state, “In everyday life, culture is something people come to take for granted – their way of feeling, thinking and being in the world – the unselfconscious medium of experience, interpretation, and action” (p. 10). Traditionally, ethnographers have conducted research on cultural groups to which they do not belong. However, conducting ethnographic research about one’s own culture, which is common in current bioethics research, can be very difficult.
Ethnographers may find it easier to understand the perspectives of people from their own culture, but more difficult to identify subconscious assumptions. The ability to identify tacit assumptions depends on not having been socialized as a member of the group under examination, and/or on using techniques designed to reveal such assumptions. For example, in a 1977 study of decision-making in the NICU, the researcher initially knew little of the culture of biomedicine or bioethics and was naïve about the distinction between CPAP (a device supplying continuous positive airway pressure to keep lungs inflated) and a respirator. She was confused when a nurse told her that an infant on CPAP might not be put back on a respirator if his respiratory condition deteriorated because they believed he probably had a cerebral bleed and therefore would have a poor quality of life (Levin, 1985, 1986). The researcher knew that decisions were sometimes made not to treat a baby on the basis of the future quality of life. However, when she asked what difference there would be between keeping him on CPAP or putting him on the respirator, the clinicians just described the technical differences in the two technologies. Through participant observation of the care of infants in the unit, talking to doctors, nurses, social workers, and parents, attending rounds, reading charts, and reading the medical and bioethics literature, the ethnographer realized that clinicians made emic distinctions between treatments according to the level of “aggressiveness” as well as between “withholding” and “withdrawing” treatment. Since taking into consideration these aspects of the treatment seemed “natural” to them, they could not articulate clearly the reasons for their decision.
(3) Discerning differences between ideal behavior and what actually happens.
For example, physicians may say that it is important for patients or family members to be involved in important decisions about care. Yet in many cases, informed consent has become a ritual where clinicians follow the letter, but not the spirit, of the law in the process of obtaining consent. Physicians frequently influence patients’ treatment decisions in subtle and direct ways, through body language or by emphasizing the risks of one treatment and the benefits of another treatment to obtain the decision physicians prefer (Zussman, 1992).
(4) Viewing phenomena in their economic, political, social, ideological, and historical contexts. In the NICU study mentioned above, the ethnographer endeavored to understand the clinicians’ views and decisions that were made in the context of the history of the care of newborns, the development of life support technology and of the care of people with disabilities, as well as those who were critically and terminally ill. In the middle of the study, when the “Baby Doe Controversy” over the treatment of “handicapped newborns” occurred (Caplan & Cohen, 1987), the investigation was expanded to include examination of the ways NICU physicians’ understanding of the regulations and the controversy affected decisions about care (Levin, 1985, 1988). Examining how political, economic, and social forces impact cultural understandings, cultural values, and social structures, and how these in turn shape both behavior and the discourse surrounding an issue highlights the cultural construction and “situatedness” of given phenomena. This kind of information is valuable because it demonstrates that matters of bioethical concern do not have to be framed in only one way, and it illustrates alternative routes to construing issues or addressing them.
For example, Margaret Lock’s (2002) work on views of death and organ transplantation in Japan is an excellent illustration of the ways a non-Western cultural system leads to different ethical perspectives than a Western culture even when the biotechnology is similar in both cultures.
(5) Examining social processes and social phenomena in greater depth. Immersion in the research setting enables researchers to understand the subtleties and nuances of phenomena under study and how components of phenomena are related. For example, one can conduct a series of case studies to examine the experiences, attitudes, and behaviors of patients, family members, and health care professionals encountering ethical dilemmas. Such research aims to elucidate how social interactions and power dynamics in the clinical setting affect the resolution of bioethical problems. Additionally, immersion helps ethnographers to become sensitive to the political implications of what they observe as well as how they represent cultural perspectives in their reports (Clifford & Marcus, 1986).
(6) Uncovering factors and processes unanticipated at the beginning of the research process. Quite often, serendipitous events lead to new and revealing observations and insights. Accordingly, ethnography can be helpful in generating new analytic frameworks, research questions, and hypotheses that can be tested using other research methods. For example, during the course of a study of disparities in gaining access to kidney transplantation (Gordon, 2001b), emerging evidence indicated that patients faced difficulties with maintaining the transplant. This concern led to development of a new research question concerning long-term graft survival.
The Application of Ethnography to Bioethics
In the ethnographic study of bioethical issues, three main, albeit overlapping, sets of problems are generally examined. First is the examination of everyday ethics in clinical settings.
However, as Powers (2001) states, “The challenge of recognizing everyday ethical issues lies in their ordinariness” (p. 339). In other words, it may be difficult to problematize or consider as cultural those practices and beliefs that members of the group being studied treat as “normal” or “natural.” Ironically, it is precisely when cultural members construe issues as “normal” or “natural” that ethnographers can identify a phenomenon with important cultural dimensions.
Second, much research focuses on examining whether bioethical principles and assumptions derived from philosophy are actually applied in reality, and if not, why not. For example, a study by Drought and Koenig (2002) examining the principle of respect for autonomy and the concept of patient “choice” in decision-making of dying patients, their families, and their health care providers, found that patients did not perceive that they had choices when discussing treatment options, contrary to bioethics scholars’ expectations that patient autonomy would be respected. A study by Dill (1995) also illustrates a challenge to assumptions about the principle of respect for autonomy in hospital discharge planning decisions for older adults.
Third, ethnographic research in bioethics aims to elucidate how cultural context(s) shape ethical reasoning. Researchers taking this approach seek to highlight Western assumptions that pervade bioethics, and/or investigate ethical reasoning in other cultures for a comparative approach. For example, Jecker and Berg (1992) examined how the face-to-face dynamics of living in a small, rural American setting shaped the way scarce medical resources were allocated on an individual patient level in a primary care setting. This contrasts with philosophical expectations about justice as a blinded, impersonal process.
A related approach seeks to examine the diversity of experiences within a culture or among subgroups regarding particular bioethical phenomena. For example, numerous studies have shown that African Americans prefer more aggressive life-sustaining treatments compared to European Americans (Blackhall et al., 1999). Other research suggests that Koreans and Mexican Americans appear to favor physician disclosure of grave diagnoses or terminal prognoses to family members instead of to the patient in order to protect the patient and enable familial decision-making rather than patient autonomy (Blackhall, Murphy, Frank, Michel, & Azen, 1995; Orona, Koenig, & Davis, 1994; Hern, Koenig, Moore, & Marshall, 1998).
ETHNOGRAPHIC DATA COLLECTION TECHNIQUES
Ethnography is best learned through experience, reading ethnographies, engaging in discussions with people who have done them, and doing smaller or pilot studies, rather than through purely didactic training. Accordingly, ethnography is a difficult research method to teach. We emphasize that ethnography is more than just observing a situation. One must know what to look for, how to make observations through the lens of the complex concept of culture, and how to interpret and analyze data in light of the social, historical, and cultural contexts in which data are collected.
Ethnographers generally use multiple data sources and data collection techniques to obtain rich and overlapping data. Their primary source of data is participant observation. They may also use other techniques including interviews and case studies. These techniques are discussed below along with the skills necessary to use each one.
Participant Observation
Participant observation is the heart of ethnography; it is a strategy enabling the ethnographer to “listen to and observe people in their natural environments” (Spradley, 1979, p. 32).
Participant observation allows for the examination of several dimensions of a social situation simultaneously – the physical, behavioral, verbal, nonverbal, and interactional – in the context of the broader social and physical environment. Doing so “give[s] the researcher a grasp of the way things are organized and prioritized, how people relate to one another, and the ways in which social and physical boundaries are defined” (Schensul & LeCompte, 1999, p. 91). A strength of participant observation is that researchers are the “instrument of both data collection and analysis through [their] own experience” (Bernard, 1988, p. 152).
Ethnographic techniques vary depending on the extent to which ethnographers identify themselves as insiders or outsiders, how involved the people who are observed are in the data collection effort, and the kinds of activities that researchers engage in as part of fieldwork (Atkinson & Hammersley, 1994). Although the technique is commonly referred to as participant observation, not all ethnographers are actual “participants.” For example, in bioethics research, ethnographers are often participant observers and join providers during rounds and team meetings, or share lunchtime conversations with staff. However, they do not express their opinions about bioethical or other issues discussed. In some settings, however, the ethnographer may be only an observer. An example is attending a committee meeting to listen to and observe the interactions of its members.
Ethnographers who are studying problems in bioethics may find it enlightening to “observe” phenomena outside the clinical setting such as advocacy groups or representations of issues in the media. For example, members of Not Dead Yet, which advocates for disability rights, constitute a valuable source for understanding non-clinical perspectives on end of life practices (see, for example, http://www.notdeadyet.org/docs/about.html).
All data collection approaches can be used simultaneously. In one study examining access to transplantation, the ethnographer observed transplant team meetings, the formal interactions between transplant coordinators and transplant candidates and their families, and clinical encounters between nephrologists and dialysis patients; shadowed transplant surgeons on their medical rounds; and observed monthly social support group meetings run by the transplant center (Gordon & Sehgal, 2000; Gordon, 2000).
Skills Necessary for Effective Participant Observation
Ethnography, specifically the strategy of participant observation, requires that researchers develop many skills, including self-reflexivity, having a good memory, attending to details, flexibility, interpersonal skills, the ability to exert discretion, building rapport, and appreciating cultural differences.
Self-reflexivity is critical to an investigator’s success in using her or himself as a research tool. Self-reflexivity can be defined as working to assess one’s own biases and their potential influence on perceptions of phenomena (Frank, 1997; Ahern, 1999). (For an excellent example of how a medical anthropologist engaged in self-reflexivity to understand her informant’s disability experience, see Gelya Frank’s (2000) monograph, Venus on Wheels.)
An important skill required in ethnography – whether observing or interviewing – is having a good memory. During unstructured interviews, it is essential not only to have the next question ready at hand, but also several possible leads for additional questions or comments in mind, based on earlier parts of the conversation. In addition, it can be useful to keep a list of important points to cover during an unstructured interview.
Because the interviewer cannot always take notes or audiotape interviews, useful strategies are: writing everything down immediately after making observations; avoiding speaking to people about the observations before writing them down; recalling things chronologically as they were witnessed; and drawing a map of the physical space in which events occurred (Bernard, 1988, p. 157).

Developing “explicit awareness” of details of ordinary life is another skill (Spradley, 1979, p. 55). People go about life aware of, but not attending to, many details, e.g., what people are wearing, music playing in public places, the process involved in deciding which products to buy at supermarkets, etc. (Bernard, 1988). This occurs because many aspects of cultural life are tacit. Since one definition of culture refers to the knowledge necessary for a person to get by in his/her culture, generating information on such ordinary details provides insight into the daily lives of members in a given culture. Attending to details helps to keep biases in check and often leads to insights about assumed realities. For example, ethnographers in medical settings can gain important knowledge by noticing which cases are talked about the longest during rounds or noticing who regularly attends or skips rounds or meetings.

The researcher must exert a fair degree of flexibility when conducting ethnographic research. For example, flexibility to alter the course of inquiry if needed is a hallmark of the ethnographic approach. Drought and Koenig (2002) exercised flexibility when the data emerging during their data collection indicated that the AIDS and cancer patients in their study did not conform to the categories anticipated by the normative assumptions in bioethics regarding the existence of discrete decision points at the end of life.

Conducting ethnography requires that the researcher have interpersonal skills and exercise discretion. Building rapport with people is necessary to establish trust and open the door to communication. Ethnographers need to speak to people of diverse backgrounds to understand how power dynamics and social status shape attitudes and behaviors. In the context of bioethics research in the clinical setting, this may entail going on medical rounds daily, shadowing clinicians, talking to providers and staff, sitting at the computer station with them, drinking coffee together, staying at the unit on night shift hours, and participating in social events on the unit. The ethnographer’s goal is to “fade” into the social fabric of the group under observation with as little impact as possible on the phenomena under study.

Participant observers must also engage key informants to help guide them about the culture under study and provide insider information about aspects of a culture. One needs at least one or two key informants in each setting who have the necessary competence to provide in-depth information about a particular domain of culture. Researchers can ask key informants about how things work, about factual information, or about things that are typically perceived by members of the culture to be essential to understanding their culture. For example, to understand treatment decisions about critically ill neonates, one might ask for what kinds of conditions ethics consultations tend to be requested for NICU babies (Orfali & Gordon, 2004). However, informants are not selected for their representativeness and they may not be able to report accurately about opinions or attitudes that vary in the culture (Bernard, 1988). Key informants can also be extraordinarily helpful in guiding observations, enlisting the involvement of others, and partaking in ad hoc interviews.
In the study of dialysis patients’ choices about transplantation, one key informant was an administrative secretary who provided behind-the-scenes information about the structure and functioning of the transplant center and background of the health care professionals working there (Gordon, 2000). It is important to be mindful of how one selects key informants in terms of the political alignments among people in the particular social context. One should avoid aligning with key informants who are too marginal to the group under study so that one does not lose access to people and information. Ethnographers must also be careful that alignment with a key informant does not alienate other members of the group who are important to the study. Informants may emerge through establishing friendships based on trust or luck. The best informants tend to be articulate people who are somewhat cynical about their own culture and, even though they are insiders, feel somewhat marginal to it (Bernard, 1988).

Ethnography requires an ability to adopt a culturally relative perspective when doing research. By this we mean endeavoring to understand and respect the views of others and events within cultural, social, and historical contexts rather than making judgments about what is observed based on one’s own cultural perspective. Ethnographers must be able to appreciate that people are “rational” and systematic in their thinking (Horton, 1967), and that the beliefs and values underlying people’s thought processes and behaviors may differ from their own. As a result, when confronted with the same choices, people may vary in the kinds of conclusions they reach.
Using a culturally relative perspective when doing research does not require ethnographers to accept or adopt a different value system than their own, but to strive to be as nonjudgmental and open-minded as possible, in order to understand and explicate alternative worldviews when collecting and interpreting data.

Interviews

Interviews constitute another major technique for collecting ethnographic data. There are three main types of interviews: unstructured, semi-structured, and structured, of which unstructured interviews are the most commonly used. Unstructured and semi-structured interviews can occur spontaneously in the course of participant observation research, or researchers may seek out people to ask them about specific issues. Informal and unstructured interviews are ideally conducted in the midst of participant observation when ethnographers can get immediate input on the meaning of events as they occur, especially those that are unexpected. The ethnographer may have a general idea about the topic they want to learn about or they may let the respondent drive the course of the interview. This can allow respondents to raise issues unanticipated by the researcher. When conducting semi-structured interviews, researchers prepare a written interview schedule with a set of questions or discussion points. Whereas the bulk of unstructured interviewing is open-ended, semi-structured and structured interviews commonly include both open- and closed-ended questions (for a detailed discussion on this method, see chapter on Semi-Structured Interviews).

Case Studies

Case studies are another ethnographic technique that entails “examin[ing] most or all aspects of a particular distinctly bounded unit or case (or series of cases)” (Crabtree & Miller, 1999, p. 5; Stake, 1994). Cases can be individual patients, sets of interactions, programs, institutions, nations, etc.
The goal of this data collection method is to describe cases in great detail and context, which may generate hypotheses or explain relationships of cause and effect (Aita & McIlvain, 1999). Cases may be selected based on whether they are representative of a phenomenon, setting, or demographics, and whether they “offer an opportunity to learn” (Stake, 1994, p. 243). Alternatively, one may choose atypical cases to explore the limits of what is the norm, and to set limits to generalizability (Stake, 1994, p. 245).

Rapid Assessment Process

An adaptation of ethnography has been developed called “rapid ethnography” or “rapid assessment process” (RAP) (Scrimshaw & Hurtado, 1988; Bloor, 2001; Hahn, 1999). RAP is used as a faster approach to data collection in various applied settings, including public health. RAP was developed by the World Health Organization (WHO) to accommodate shorter time frames for conducting research and tends to be more problem-oriented than traditional ethnography. Generally, the RAP data collection period lasts from 3 days to 6 weeks, depending on time, resources, and previous data collected; and typically RAP uses small sample sizes (Trotter & Schensul, 1998). The RAP approach commonly uses several observers, a narrower research focus, and multiple collaborators and emphasizes the use of a number of methods including direct observation, informal conversation, and key informant interviews.

Other Methods in Ethnography

Ethnographers can draw upon a wealth of additional data sources, such as administrative records kept at hospitals or public records regarding morbidity and mortality in a population. Other relevant data sources include media reports, newspaper clippings, television programs, and other forms of popular culture. The Internet offers the opportunity to engage in different kinds of observations.
For example, online support groups, such as a listserv that provides a venue for dialysis patients to exchange views and give each other advice, served as a useful counterpart to observations of the in-person support group, Transplant Recipients International Organization, which was observed by one of the authors.

Ethnographers may also draw upon results of surveys previously conducted by other researchers, as Long (2002) did for her work on euthanasia in Japan to understand how Japanese people construed the issue.

Ethnographers also collect quantitative data (Bernard, 1988). For example, one can collect data on the number of cases with certain characteristics, the number of days in the hospital for people with different medical or demographic characteristics, ask individuals to list the number of people they could call on to care for them in their homes after leaving the hospital, etc. Although ethnographers conducting statistical tests should not assume that the quantitative data represent a random sample from a defined population, they can use descriptive statistics to identify rates and correlations using such data in combination with qualitative data.

DEVELOPING ETHNOGRAPHIC RESEARCH

Conducting ethnographic research requires considerable preparation in advance. Key preparatory steps include: (1) conceptualizing a research question, (2) establishing a research plan, (3) obtaining permission to gain access to the research site, and (4) determining the unit(s) of analysis and sampling frame(s). Depending on the nature of the study, it may involve additional elements.

1. Conceptualizing a research question. The first step in ethnographic research is conceptualizing a research question. An important characteristic of ethnographic research is that it does not usually aim to test an a priori hypothesis and is not geared toward achieving generalizability, characteristic of statistical hypothesis testing.
Rather, as discussed above, the ethnographer is seeking to gain an in-depth understanding of a problem, group, or social setting within its broader context. Conceptualizing a research question requires sufficient familiarity with a topic or human group to find a gap in the scholarly literature. Accordingly, ethnographers develop a research topic derived from social theory and/or from substantive gaps in knowledge. A topic may not have been investigated at all, or phenomena may have been studied with a different theoretical framework, methodological approach, or in a different group or setting or timeframe. As with all research, a comprehensive review of the existing literature is essential in this phase. The central research question is generally broad and subsumes multiple other questions. For example, the investigation of social and cultural factors shaping treatment decision-making regarding renal transplantation (Gordon, 2001a,b) included inquiries into doctor–patient communication, transplant evaluation, and patients’ decisions about donation (Gordon & Sehgal, 2000; Gordon, 2001a). In addition, researchers must consider the potential significance of their proposed research, i.e., the extent to which the knowledge gained will: (a) advance theory or methodology in bioethics and/or in the social sciences, e.g., help to re-conceptualize the doctor–patient relationship or bioethical principles, (b) change clinical practice, (c) inform health interventions, or (d) inform health policy.

2. Establishing a research plan. An important step after conceptualizing a research question is devising a preliminary plan for undertaking the research. This can be done by a comprehensive review of the existing substantive and methodological literature and by identifying and, if possible, preparing the methodological technique(s) to use during fieldwork.
The next step entails distilling the research design logistics: identifying who or what situations to observe, which people to interview, when to do these steps, and how. Although ethnographers usually enter the field with a fairly open plan to develop a holistic view of the situation, they may draft interview guides, have experts in the field review them, or conduct pilot studies with specific methods before initiating the main phase of research. Collecting background data, e.g., population statistics, medical data, or administrative information, is important for gaining an appreciation for the broader context. However, ethnographers cannot plan all of their fieldwork in advance. Serendipitous occasions and unanticipated informal conversations occur during participant observation, which may lead to the most important insights – this constitutes the heart of ethnography.

3. Obtaining permission to gain access to the research site. The next step entails gaining access to the field site. Essential components of this process illustrated below are based upon experiences by one of the authors in conducting research with kidney transplant and dialysis patients (Gordon, 2001a,b). In this study, the investigator planned to conduct the research in dialysis units. This necessitated obtaining permission from the nephrologist directing all the dialysis centers that served as observation settings. The next group from whom the ethnographer needed permission was the individual nephrologists who ran each dialysis center, followed by the other health care providers who staffed the dialysis centers and provided the ethnographer with direct entrée to the dialysis patients. Finally, IRB approval was required and consent forms were needed in order to speak with patients.
Negotiating access with chairs of clinical units or with the IRB can also be challenging given that many chairs and IRB members lack knowledge about qualitative research and often perceive it as a “fishing expedition” or “not really science” (Koenig, Black, & Crawley, 2003). Nonetheless, anticipating in advance potential critiques of social science and qualitative methods can better enable ethnographers to address them when they arise. For example, it is common for biomedical practitioners to comment that social science research – particularly qualitative research – is “subjective” rather than “objective.” By highlighting the strengths of qualitative research in terms of validity rather than generalizability, ethnographers can advance the understanding of their methods, foster acceptance and support for their research, facilitate IRB approval, and improve communication between ethnographer and study subjects (Anspach & Mizrachi, 2006; Koenig et al., 2003; Cassell, 1998, 2000; Lederman, 2007).

Gaining access to the research site is an informal as well as formal process. In addition to the formal permission from health care institutions and IRBs, it is crucial to obtain “buy-in” from relevant administrative and/or clinical staff and other individuals who can effectively facilitate or hinder data collection. Such informal gatekeepers can help the ethnographer gain access to electronic data, unfolding situations, potential research participants, and answer questions, all of which can help ethnographers to realize their research goals. Moreover, gaining access to the research site is not a one-time matter but rather a process that must be negotiated repeatedly. The researcher often needs to re-introduce him/herself and re-explain his or her role to staff in order to ensure that the research itself and the researcher’s presence are understood and accepted.
This is especially true in clinical settings with high staff turnover and multiple shifts. Staff may feel that ethnographers are interfering with their work space, time, and patients. To minimize this perception, ethnographers must make clear to staff that they are in the setting to learn about the people there, that they will keep information confidential, and that they will not spread gossip. The ethnographer can let cultural members know that his/her process of learning entails asking basic questions, the answers to which might be perceived as obvious to cultural members but help the ethnographer uncover tacit and fundamental assumptions. After being in the field for a long time, ethnographers must be careful not to be seen as regular members of the group under study, as this will diminish their effectiveness (Bosk, 2001).

Finally, ethnographers may encounter challenges gaining access to research sites because (a) respondents/subjects hold powerful positions and (b) ethical issues are sensitive in nature. Nevertheless, “studying-up” or studying the powerful – such as health care professionals, administrators, or ethicists – is important because they affect the well-being of many other members of society (Nader, 1972; Sieber, 1989, p. 1). The powerful are typically difficult to study. Anthropologist Laura Nader explains just why this is the case: “The powerful are out of reach on a number of different planes: they don’t want to be studied; it is dangerous to study the powerful; they are busy people; they are not all in one place, and so on” (Nader, 1972, p. 302). Nader provides further insights into the concerns held by the powerful about being studied: “Telling it like it is may be perceived as muckraking by the subjects of study … or by fellow professionals who feel more comfortable if data is [sic] presented in social science jargon which protect the work from common consumption” (Nader, 1972, p. 305).
Indeed, the effort by one of the co-authors to examine how ethics consultants discuss cases during ethics committee meetings was thwarted by some of the committee leaders and members because of fears about the uses of data to be obtained. Finally, the sensitive nature of many medical ethical issues requires a great deal of tact, strong communication skills, discretion, and often empathy on the part of the ethnographer. 4. Determining the units of analysis and sampling frame. Because ethnographic research operates at a number of different levels of inquiry at once, it is essential to consider the particular unit(s) of analysis before implementing a study. The choice of unit(s) is driven by the study’s aims and focus and may be at the micro level of patients or clinicians, or can be broader and include an entire medical floor or hospital or a society’s response to an ethical issue. In the study of ethics consultations, the unit of analysis was the aggregate of the patient, family, and health care professionals treating one patient (Kelly, Marshall, Sanders, Raffin, & Koenig, 1997). This study examined the interactions between individuals within these small groups and with ethics consultants to obtain a rich understanding of how consultants influence decision-making and of each party’s perspectives about the value of ethics consultations. Determining the unit of analysis is related to the issue of sampling. In ethnographic research, sampling is not geared toward statistical generalization. Instead, sampling is performed to enable ethnographers to study a sufficient scope and depth of processes and interactions to gain understanding of situations in all their complexity. An in-depth appreciation often comes at the expense of generalizability. Data collection ideally proceeds until saturation has been achieved. Saturation is the point where no new insights, themes, or patterns are being generated. 
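One way to make the notion of saturation concrete is to track, session by session, how many previously unseen codes each round of observation or interviewing contributes. The following is a minimal sketch in Python, not a method described by the authors: the function names, the example codes, and the stopping rule (a fixed number of consecutive sessions with no new codes) are all assumptions introduced here for illustration.

```python
from typing import Iterable, List, Optional, Set


def new_codes_per_session(sessions: Iterable[Set[str]]) -> List[Set[str]]:
    """For each session, return the codes not seen in any earlier session."""
    seen: Set[str] = set()
    fresh_by_session: List[Set[str]] = []
    for codes in sessions:
        fresh = set(codes) - seen
        fresh_by_session.append(fresh)
        seen |= fresh
    return fresh_by_session


def saturation_point(sessions: Iterable[Set[str]], window: int = 3) -> Optional[int]:
    """Return the 1-based index of the session that completes the first run of
    `window` consecutive sessions yielding no new codes, or None if no such
    run occurs. The window size is an illustrative assumption, not a standard."""
    run = 0
    for index, fresh in enumerate(new_codes_per_session(sessions), start=1):
        run = run + 1 if not fresh else 0
        if run >= window:
            return index
    return None


# Hypothetical codes assigned to six observation sessions (invented data).
sessions = [
    {"trust", "family_role"},
    {"trust", "cost"},
    {"cost"},
    {"family_role"},
    {"trust", "cost"},
    {"waiting_list"},
]

print(saturation_point(sessions, window=3))
```

In this invented example the rule is met at the fifth session, yet the sixth session still introduces a new code. A mechanical threshold of this kind can therefore only support, not replace, the ethnographer's judgment about whether further fieldwork is likely to yield new insights.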
However, the ability to reach saturation may depend on the unit(s) of analysis. A tradeoff often exists between the number of situations observed and the richness of data collected (Morse, 2000). To enhance validity of a study’s findings and obtain a better sense of the range of phenomena under study, ethnographers may make visits to other comparable programs, medical units, or institutions, albeit with less time than in the primary site.

Informed Consent

Ethnographic research presents challenges regarding informed consent not usually encountered in other ethnographic or bioethical research. Traditionally, much of what anthropologists observe has been “public” behavior or has occurred in settings which they have been invited to attend as participants. Often, an invitation may not even be required after anthropologists have immersed themselves and become accepted by the group or community under study. IRBs consider much ethnographic research to be exempt or subject to expedited review; oral consent is often considered adequate. Yet in clinical settings, because of the sensitive nature of the situations observed, written informed consent is expected from patients, family members, clinicians, and others who participate in formal interviews or whose cases or behavior are studied in depth. However, it may not be practical or even possible to obtain prospective informed consent from everyone who will be observed during an ethnographic study (e.g., from the dozens of people who will pass through a unit or clinic on a given day, and who are not the main focus of research). Although some IRBs will exempt ethnographers from obtaining written consent, most require verbal informed consent for observations and interviews that states the purpose of the research, guarantees confidentiality, and enables people who would be studied the choice not to participate or to withdraw from the study at any time.
Of note is also the fact that the act of requesting consent to be an observer of group events (e.g., clinical meetings and consultations) sets the researcher off from the group being studied, which may undermine efforts to establish rapport (Miller, 2001). The American Anthropological Association (AAA) (2004) has published a statement on Ethnography and Institutional Review Boards ([URL: http://www.aaanet.org/stmts/irb.htm] accessed 6-28-07). Briefly, the AAA’s position is that anthropological research often falls under expedited and exempt review, yet investigators may have difficulty getting the IRB to use such mechanisms. Some IRB members are unaware of how conducting ethnographic research diverges from the biomedical model, leading to difficulties for ethnographers in obtaining IRB approval for verbal consent. The AAA points out that the Common Rule does allow for waivers of written consent, according to certain criteria. The tension between the requirements of good ethnography and the strictures of modern informed consent regulations has been further intensified by the Health Insurance Portability and Accountability Act (HIPAA) of 1996, whose Privacy Rule took effect in 2003 and aims to protect identifying information and prevent its transmission outside of an institution. Local variations of how informed consent for ethnography is handled mean that there are few clear guidelines for ethnographers seeking to do research in clinical settings.

Other Groundwork

Throughout the preparation process ethnographers need to be involved in other groundwork. For instance, researchers may need to learn a foreign language, in this context, that of medicine, bioethics, and the local clinic. Whereas much technical medical language can be learned in advance, learning how it is used on an everyday basis occurs through observation after entering the field.
It is essential for the ethnographer to become familiar with the vocabulary, jargon, and lore of those under study; study subjects will likely teach ethnographers if they indicate a willingness to learn (Bernard, 1988). For this purpose attending rounds and case conferences can be invaluable.

Other groundwork involves learning the lay of the land, which is essential for understanding the social and symbolic meanings associated with physical spaces. For example, for the NICU study it was helpful to know that babies in the beds on one side of the unit were in the most critical condition. This awareness helped make sense of their treatment regimens and clarified that movement from room to room denoted improvement and preparation for discharge. Early stages of ethnographic research may also entail a macro-level form of site survey (Anderson & McFarlane, 2004), where one assesses the broader environmental setting in which the study takes place, i.e., the neighborhood.

DATA MANAGEMENT AND ANALYSIS

Recording Data

Ethnographic research generates large quantities of data such as field notes, responses to interviews and surveys, photographs and video-recordings, and copies of documents.

Field notes are the main method of recording data. These start as handwritten notes which are then entered into a word processing program and/or database. Field notes record what ethnographers observe about the unfolding events, the environment or context, and/or the people in it, and information provided by participants (Dewalt, Dewalt, & Wayland, 1998; Pelto & Pelto, 1978). Much has been written on how to write good field notes (Emerson, Fretz, & Shaw, 1995). Some brief, general guidelines are as follows:

First, and most importantly, ethnographers must write down notes promptly, and not rely on memory. Without the note, there is no data point (Dewalt et al., 1998; Bernard, 1988; Pelto & Pelto, 1978).
Field notes should be written as close to the time that the data are collected as possible to prevent memory loss. Notes should be taken throughout the day or course of observation, and not only at the end of the day.

Second, it is important that ethnographers exercise discretion about note-taking while undertaking participant observation. Not infrequently, observations of emotionally laden interactions common in sensitive situations may preclude taking notes as an event unfolds. Study participants and others in the field may perceive the researcher’s note-taking as intrusive. Note-taking may also disrupt the natural flow of events. When taking notes is not appropriate, writing brief quotes, key words, or short descriptions immediately after observing events, interactions, and discussions is important to optimize proper recall and documentation.

Third, ethnographers should record at a low level of abstraction. This means that researchers should note concrete details regarding the actions, interactions, communication, and nonverbal communication that they observed. Furthermore, they need to document the specific observations that led to their impression about a cultural process or pattern, such as emotional events transpiring during the observation. For example, researchers studying communication between health professionals and patients would benefit by noting nonverbal expressions, such as facial expressions or crying, as well as proxemics and the relative positions of social actors, in order to later interpret and make sense of these expressions in light of cultural meanings and expectations in the clinical setting. Otherwise, taking note that a patient expressed “agreement” (an abstraction based on one’s own cultural lens) may be grossly inaccurate since, for example, for many Asian patients, nodding is a form of respect rather than agreement (McLaughlin & Braun, 1998).
Fourth, writing field notes does not necessarily mean writing down every bit of minutiae observed; rather, note-taking should be directed toward a certain focus, depending on one’s theoretical orientation and study aim (Pelto & Pelto, 1978). For instance, noting the color and length of physicians’ jackets/coats may be important for a study of the symbolic power of physicians and interactions based on their status, but would be moot if the focus is on language used during doctor–patient communication during telephone consults. Further, it is common for researchers to jot down key phrases heard, rather than noting entire sentences word for word, to stay abreast of observations without getting bogged down with the logistics of recording. At the end of a day of data collection, ethnographers should fill in the details of their field notes and write fuller notes by reviewing their jottings and reflecting on the events observed. Notes should also record steps in the development of the ethnographer’s analysis. If an ethnographer is collaborating with others, then debriefing with mentors or colleagues can trigger memories of observations, which should also be recorded.

Fifth, it is important for ethnographers to recognize that their notes are not objective representations of events, but rather constructions infused with their own interpretation and analysis (Dewalt et al., 1998). Accordingly, there is a debate within anthropology about the use of one or more sets of field notes to record: (a) log of field activities, (b) observations and information, (c) analytic interpretations, and (d) personal perceptions and experiences to aid in self-reflexivity. Some anthropologists recommend that ethnographers keep a separate journal or diary to record personal experiences in line with the notion of self-reflexivity.
One way to keep check on biases in field notes is to write reflexively, staying as fully attuned as possible and documenting how the events observed make the ethnographer feel and respond. However, others contend that such experiences are themselves data which reveal much about the difference between the culture observed and the ethnographer’s own cultural framework. Thus, such notes yield a form of cross-cultural comparison that should be kept with notes on observations of the setting.

In addition to field notes, ethnographers may collect data from interviews and surveys which may or may not be audio-recorded. Even if they are audio-recorded, it is important to take notes during interviews as a backup in case recordings fail, and to write notes after the interview to provide greater context with details. In many clinical settings, health care professionals have expressed great reluctance to have their interactions and meetings tape recorded, likely owing to the American cultural context of litigation as well as to confidentiality considerations.

The Management of Data

The collection of extensive data requires that researchers manage them systematically. This means establishing proper storage and retrieval systems (Huberman & Miles, 1994). Most data can be entered into computer databases. Software packages for storing and analyzing qualitative data are readily available, e.g., QSR NUD*IST, THE ETHNOGRAPH, ASKSAM, NVIVO, and ATLAS-TI. Some of these programs even include statistical capacities. Additional ways to manage and protect data entail using a physical filing system, dubbing audiotapes and transcribing tape recordings, making photocopies of all documents, and backing up electronic data on a regular basis. An important facet of data management involves the development of a codebook ensuring that data are managed in a systematic manner.
This fosters clarity and enhances consistency in data collection, interpretation, and coding procedures. Codebook development, data collection, coding, and analysis are not linear but inform each other and are refined over the course of research. (For more information about these aspects of qualitative data management, see chapters on Content Analysis and Semi-Structured Interviews).

THE ANALYSIS AND INTERPRETATION OF DATA

Social scientists have developed many methods of inductive analysis. Excellent sourcebooks on how to conduct qualitative data analysis are available (Miles & Huberman, 1994; Strauss, 1987), and some of these techniques are discussed further in other chapters of this book. Here, we briefly define some analytic approaches that are commonly used in or suited to ethnographic research in bioethics: grounded social theory, componential analysis, and content analysis.

Grounded Theory

Iterative procedures have always been the norm in the practice of ethnography. While ethnographers are in the field they begin to interpret their data. Subsequent data collection is guided by earlier observations and endeavors to develop more nuanced understandings of the phenomena under study. Efforts are taken to reject or elaborate earlier interpretations and develop better explanations. In the 1960s, sociologists Glaser and Strauss (1967) formally developed a rationale for this approach and showed how such interpretations grounded in data could be used to develop theory. They labeled their approach “Grounded Theory” and called for rigorous attention to the process used by researchers to move between data and analysis. In particular, they advocated the use of a technique they called the “constant comparative method” in which the ethnographer would use later observations to verify or modify previously suggested theories (Strauss & Corbin, 1994).
According to Strauss and Corbin, an important feature that distinguishes grounded theory methodology from other more descriptive inductive approaches is that sampling, questioning, and coding are all explicitly theoretically informed and chosen with the intention of explaining the relationships between concepts and patterns of social behavior. They wrote: “In doing our analyses, we conceptualize and classify events, acts and outcomes. The categories that emerge, along with their relationships, are the foundations for our developing theory. This abstracting, reducing, and relating is what makes the difference between theoretical and descriptive coding (or theory building and doing description)” (Strauss & Corbin, 1998, p. 66). It should be made clear that grounded theory describes a broader approach for handling the analysis of data, but does not provide guidance on specific cognitive elements to focus on for analysis, as does componential analysis, described below.

Componential Analysis

Although componential analysis has not been employed much in bioethics research, it could be especially useful. Componential analysis, which is also referred to as semantic analysis, entails two goals: (1) describing how members of a cultural group categorize a given meaningful, culturally valid, behavioral issue from an emic perspective and (2) delineating the cognitive processes, components of meaning, or criteria that cultural members use to distinguish between cultural categories (Bernard, 1988; Pelto & Pelto, 1978). This process entails charting out, classifying, and contrasting semantic networks of emic constructs among group members to understand how they view an issue under study. Anthropologist Edward Sapir hypothesized that language is a reflection of a cultural group’s conception of the world, which is often tacitly embedded in how people classify the things in their world (Mandelbaum, 1949).
Accordingly, identifying how people classify, for example, kinds of personhood, the meanings attached to the concepts of person, and the rationale for the classification system, may yield insight into the moral underpinnings for decisions made about medical treatment. Research in bioethics has used techniques derived from sociolinguistics. These entail the study of discourse as it naturally occurs in terms of its content and structure. For example, bioethics researchers have inquired into the multiple meanings of constructs like “fairness” in allocation of primary health care services (Jecker & Berg, 1992) and of “do-not-resuscitate orders” for different social and cultural groups (Muller, 1992).

Content Analysis

A commonly used technique for analyzing data is content analysis. Content analysis is a process of analyzing data by systematically searching for themes and repetitions emergent from the data (Luborsky, 1994; Huberman & Miles, 1994). The first step is to prepare data for analysis – whether that means pulling out descriptions of cases, organizing hand-written and word-processed field notes systematically, transcribing audiotapes verbatim, or coding data for use with a computer program. Content analysis is an iterative process whereby themes are developed by grouping coded segments into schema of larger domains, followed by a review of the categorization schema for appropriate thematic fit, and adjusting and reviewing the schema again until the researcher(s) reach consensus. Traditionally, ethnographers coded data by hand, e.g., by highlighting or marking portions of handwritten or typed text, or by sorting index cards. Although many researchers continue to code by hand, there are now many qualitative data analysis software programs available, as noted above. (For a detailed discussion, see the chapter on Content Analysis in this volume).
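For readers who prepare coded data for a computer program, the grouping step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a procedure prescribed by the sources cited here: the code names, theme names, source labels, and placeholder excerpts are all hypothetical, and the functions `group_by_theme` and `theme_counts` are invented for the example. Coded segments are grouped into larger domains using a codebook, and any segment whose code is missing from the codebook is set aside so that the schema can be reviewed and adjusted.

```python
from collections import Counter, defaultdict

# Hypothetical codebook: each code is assigned to a larger theme (domain).
CODEBOOK = {
    "family_pressure": "decision making",
    "defers_to_physician": "decision making",
    "fear_of_litigation": "institutional context",
}

# Hypothetical coded segments: (source label, code, placeholder excerpt).
SEGMENTS = [
    ("fieldnote_01", "family_pressure", "placeholder excerpt 1"),
    ("interview_03", "defers_to_physician", "placeholder excerpt 2"),
    ("fieldnote_02", "fear_of_litigation", "placeholder excerpt 3"),
    ("interview_05", "new_unlisted_code", "placeholder excerpt 4"),
]


def group_by_theme(segments, codebook):
    """Group coded segments under their themes.

    Segments whose code is not in the codebook are returned separately,
    so the researcher can decide whether the schema needs revising.
    """
    themes = defaultdict(list)
    unmatched = []
    for source, code, excerpt in segments:
        theme = codebook.get(code)
        if theme is None:
            unmatched.append((source, code, excerpt))
        else:
            themes[theme].append((source, code, excerpt))
    return dict(themes), unmatched


def theme_counts(themes):
    """Count how many coded segments fall under each theme."""
    return Counter({theme: len(items) for theme, items in themes.items()})


themes, unmatched = group_by_theme(SEGMENTS, CODEBOOK)
counts = theme_counts(themes)
for theme, n in counts.most_common():
    print(f"{theme}: {n} segment(s)")
print(f"Segments needing codebook review: {len(unmatched)}")
```

Keeping unmatched segments visible, rather than silently dropping them, mirrors the iterative review of the categorization schema: each pass either assigns the stray code to an existing theme or prompts a revision of the codebook. A tally of this kind is a bookkeeping aid and does not replace the interpretive work of judging thematic fit.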
Validity Checking

Ethnographers use numerous techniques to enhance the internal validity of their research, that is, “the extent to which it gives the correct answer” (Kirk & Miller, 1986, p. 19; Patton, 1999). These techniques include thick description, triangulation, member checking, using paradigm cases, and maintaining a detailed paper trail. Each is discussed further below.

Thick description entails depicting phenomena in rich detail (Geertz, 1973). Geertz likened culture to “an assemblage of texts” (1973, p. 448) with socially shared and generated meanings that can be interpreted. The ethnographer’s analytic goal is to uncover and understand the meanings woven throughout his/her field notes. Detailed accounts in field notes help to increase the likelihood that interpretations remain data-driven and are truly inductive accounts of the cultural group or other phenomena under study.

Triangulation involves the use of several kinds of methods or sources of information to obtain overlapping data, which helps to support the reliability and validity of findings. There are different kinds of triangulation: (1) investigator triangulation which involves several researchers collecting data; (2) theory triangulation which entails using multiple perspectives to interpret data; (3) methodological triangulation which uses multiple data collection methods and strategies; and (4) interdisciplinary triangulation involving the inclusion of other disciplines to inform the research process (Crano, 1981).

Member checking involves asking informants to evaluate aspects of the researcher’s analysis to find out if interpretations are analytically sound. It can take several forms such as asking informants how and why cases are categorized in a certain way, or asking informants to review a written description of an aspect of their culture.
For example, after having observed meetings during which transplant professionals made decisions about placing patients on the waiting list, one of us (EJG) used member checking by having a transplant surgeon review a draft manuscript to check its accuracy, correct technical details, and provide clarification (Gordon, 2000).

Looking for “paradigm” cases generally involves researchers identifying “typical” or representative cases to illustrate situations in which certain norms apply. Ethnographers may also select anomalous or contradictory cases which do not conform to the dominant pattern. Study of these exceptional cases can help the ethnographer to identify the conditions in which the cultural or ethical norms apply while simultaneously helping to explain the limits of their application; this may reveal the “rules for breaking the rules.” In a study of perceptions about truth-telling by members of different ethnic groups, Blackhall, Frank, and Murphy (2001) deliberately used this technique by seeking further interviews with at least two respondents whose responses to an initial survey were atypical in order to obtain insight into the diversity within groups.

Finally, maintaining a detailed and accurate paper trail is another useful form of validity checking (Pelto & Pelto, 1978). This can be fostered by ensuring that dates and decision points are written on all data materials to provide a chronological account of the development of interpretations while data are being collected.

ETHICAL CONSIDERATIONS

Conducting ethnographic research raises several ethical concerns and challenges. Earlier in this chapter, we discussed informed consent and the lack of familiarity IRBs frequently have regarding ethnographic methods. Detailed accounts of the problems and proposed solutions to working with IRBs regarding ethnography can be found in Gordon (2003), Marshall (2003), and Koenig et al. (2003).
Other ethical challenges in ethnographic research pertain to “studying up,” studying one’s colleagues, confidentiality, and the question of whether to intervene in ethically troublesome situations.

Confidentiality is a major concern for ethnographers who frequently live or work intimately with those they are studying. Informants may see their roles and lives described in detail or in unanticipated ways that they feel can harm their reputations (Bosk, 2001). To protect confidentiality, most ethnographers try to conceal or mask the exact location of the work and identities of the participants. Part of the problem with good ethnography is that it “penetrate[s] deeply enough into the social world being described,” and “make[s] the latent manifest” (Bosk, 2001, p. 209). Fieldworkers can thereby make subjects uncomfortable (Bosk, 2001). Moreover, informants may feel betrayed when ethnographers leave the field after they have developed friendships or paid concerted attention to them over time. Respect for study participants and their culture can be demonstrated by sharing results with them and/or having them review interim findings during the study.

Care must be taken so that individuals in the group being studied cannot be identified in written reports. One way to circumvent this problem is to write composite cases that incorporate aspects from multiple cases into one. Additionally, ethnographers commonly use pseudonyms for participants and anonymize the research site. It is important to convey to participants that all data will be kept confidential and that their names will not be used in publications or presentations.

Ethnographers in their research can become privy to many kinds of behaviors and discussions, including illicit and unethical behavior. This raises the question of whether the ethnographer should intervene. Intervening could help those in need.
At the same time, it can potentially jeopardize the quality and reliability of data, as well as impact relationships with key informants and those providing access to the research setting. Anthropologists have examined this issue as part of the ethical standards for professional practice (http://www.aaanet.org/stmts/irb.htm, accessed 2-28-07).

SUMMARY

Ethnography is a qualitative research method based predominantly on participant observation; this is usually supplemented by the use of interviews, surveys, and document analysis. The concept of culture is central – the ethnographer seeks to articulate how the members of the culture(s) being studied themselves understand phenomena (emic perspective), and to analyze (etic perspective) the complex set of historical and contemporary forces that shape beliefs and behavior. Ethnography is best learned through experience, reading ethnographies, engaging in discussions with people who have done them, and doing smaller or pilot studies, rather than through purely didactic training. Accordingly, ethnography is a difficult research method to teach. We emphasize that ethnography is more than just observing a situation. One must know what to look for, how to make observations through the lens of the complex concept of culture, and how to interpret and analyze data in light of the social, historical, and cultural contexts in which data are collected. Below, we discuss a number of strengths and weaknesses of ethnography.

Strengths

Ethnography offers many strengths as a research method in bioethics. Foremost, ethnography enables researchers to contextualize bioethical issues in their broader social, historic, economic, political, ideological, and cultural contexts. Ethnography provides insight by uncovering elements of a culture that other methods cannot provide because of the tacit nature of culture.
It also provides unique insights because it focuses on natural behaviors occurring in specific settings. It collects data on both what people say they do and what the ethnographer observes them doing. Ethnography does not depend on a rigid research plan; therefore, it can detect unanticipated phenomena. The sensitivity ethnography brings to examining subtlety and meaning renders it “an ideal vehicle for examining normative language or decision making” (DeVries & Conrad, 1998, p. 248). Ethnography is valuable for showing “how flexible values are, how the same values are used to justify a wide range of seemingly incompatible behaviors” (Bosk, 2001, p. 200).

In addition, ethnography enables researchers to understand the complexity of a phenomenon. Since one of the hallmarks of ethnography is its flexibility, investigators are able to follow up leads that arise unexpectedly during the research, adjust techniques, and modify research questions as unforeseen information emerges to more thoroughly investigate a topic (Drought & Koenig, 2002; Briggs, 1970). Moreover, ethnography lends itself to informing other research methods. Specifically, it can help generate hypotheses that can be tested using other qualitative or quantitative methods, and interpret the findings of such research.

Weaknesses

A major weakness of ethnography is its time-consuming nature. Most studies last at least one year, frequently longer. Thus, ethnography is expensive and resource intensive. Another weakness that critics are quick to point out is that it is subjective in data collection and analysis. Because the ethnographer is a primary tool for data collection, personal biases may enter unchecked into the data collection and analysis processes. Another weakness is the limited ability to generalize study findings. Other weaknesses pertain to the process of conducting ethnography itself.
Ethnography relies, to a great extent, on luck. It is unknown what kinds of cases or events will emerge over the course of the research period, and ethnographers are essentially dependent on the luck of the draw. Moreover, levels of cooperation by members of the group under study with the researcher may vary unexpectedly. Research in clinical settings may also be affected by turnover in medical and administrative staff each month and by changes in nursing staff requiring re-negotiation throughout a study.

ACKNOWLEDGMENTS

This work was supported by grant DK063953 from the National Institute of Diabetes and Digestive and Kidney Diseases (EJG). We thank Becky Codner and Elizabeth Schilling for their research assistance, and Laura Siminoff, Ph.D., and Liva Jacoby, Ph.D., for their helpful suggestions with this manuscript.

REFERENCES

Ahern, K. (1999). Ten tips for reflexive bracketing. Qualitative Health Research, 9(3), 407–411. Aita, V. A., & McIlvain, H. (1999). An armchair adventure in case study research. In: B. F. Crabtree & W. L. Miller (Eds), Doing qualitative research (pp. 253–268). Thousand Oaks, CA: Sage Publishers. American Anthropological Association (AAA). (2004). Statement on Ethnography and Institutional Review Boards. (Adopted by the AAA Executive Board). Retrieved December 18, 2006, from http://www.aaanet.org/stmts/irb.htm. Anderson, E. T., & McFarlane, J. (2004). Windshield survey. In: E. T. Anderson & J. McFarlane (Eds), Community as partner. New York: Lippincott. Anspach, R. R. (1993). Deciding who lives: Fateful choices in the intensive-care nursery. Berkeley: University of California Press. Anspach, R. R., & Mizrachi, N. (2006). The field worker’s fields: Ethics, ethnography and medical sociology. Sociology of Health and Illness, 28, 713–731. Atkinson, P., & Hammersley, M. (1994). Ethnography and participant observation. In: N. K. Denzin & Y. S. Lincoln (Eds), Handbook of qualitative research (pp. 248–261).
Thousand Oaks, CA: Sage Publications. Bernard, H. R. (1988). Research methods in cultural anthropology. Newbury Park, CA: Sage Publications. Blackhall, L. J., Frank, G., & Murphy, S. (2001). Bioethics in a different tongue: The case of truth-telling. Journal of Urban Health: Bulletin of the New York Academy of Medicine, 78(1), 59–71. Blackhall, L. J., Frank, G., Murphy, S. T., Michel, V., Palmer, J. M., & Azen, S. (1999). Ethnicity and attitudes towards life sustaining technology. Social Science and Medicine, 48, 1779–1789. Blackhall, L. J., Murphy, S. T., Frank, G., Michel, V., & Azen, S. (1995). Ethnicity and attitudes toward patient autonomy. Journal of the American Medical Association, 274, 820–825. Bloor, M. (2001). The ethnography of health and medicine. In: P. Atkinson, A. Coffey, S. Delamont, J. Lofland & L. Lofland (Eds), The handbook of ethnography (pp. 177–187). Thousand Oaks, CA: Sage Publications. Bluebond-Langner, M. (1978). The private worlds of dying children. Princeton, NJ: Princeton University Press. Bosk, C. L. (1979). Forgive and remember: Managing medical failure. Chicago: University of Chicago Press. Bosk, C. L. (1992). All God’s mistakes: Genetic counseling in a pediatric hospital. Chicago: University of Chicago Press. Bosk, C. L. (2001). Irony, ethnography, and informed consent. In: B. Hoffmaster (Ed.), Bioethics in social context (pp. 199–220). Philadelphia, PA: Temple University Press. Briggs, J. (1970). Never in anger: Portrait of an Eskimo family. Cambridge, MA: Harvard University Press. Brugge, D., & Cole, A. (2003). A case study of community-based participatory research ethics: The healthy public housing initiative. Science and Engineering Ethics, 9(4), 485–501. Cassell, J. (1998). The woman in the surgeon’s body. Cambridge, MA: Harvard University Press. Cassell, J. (2000). Report from the field: Fieldwork among the ‘primitives.’ Anthropology Newsletter, (April), 68–69. Caplan, A., & Cohen, C. B. (1987). Imperiled newborns.
The Hastings Center Report, 17, 5–32. Clifford, J., & Marcus, G. (1986). Writing culture: The poetics and politics of ethnography. Berkeley: University of California Press. Crabtree, B. F., & Miller, W. L. (Eds). (1999). Doing qualitative research. Thousand Oaks, CA: Sage Publications. Crano, W. D. (1981). Triangulation and cross-cultural research. In: M. B. Brewer & B. E. Collins (Eds), Scientific inquiry and the social sciences. A volume in honor of Donald T. Campbell (pp. 317–344). San Francisco: Jossey-Bass. DeVries, R., Bosk, C. L., Orfali, K., & Turner, L. B. (2007). View from here: Bioethics and the social sciences. Oxford: Blackwell. DeVries, R., & Conrad, P. (1998). Why bioethics needs sociology. In: R. DeVries & J. Subedi (Eds), Bioethics and society: Constructing the ethical enterprise (pp. 233–257). Upper Saddle River, NJ: Prentice Hall. DeVries, R., & Subedi, J. (1998). Bioethics and society: Constructing the ethical enterprise. Upper Saddle River, NJ: Prentice Hall. Dewalt, K. M., Dewalt, B. R., & Wayland, C. B. (1998). Participant observation. In: H. R. Bernard (Ed.), Handbook of methods in cultural anthropology (pp. 259–299). Walnut Creek, CA: AltaMira Press. Dill, A. E. P. (1995). The ethics of discharge planning for older adults: An ethnographic analysis. Social Science and Medicine, 41(9), 1289–1299. Drought, T., & Koenig, B. (2002). ‘Choice’ in end-of-life decision making: Researching fact or fiction? The Gerontologist, 42(Special issue 3), 114–128. Emerson, R. M., Fretz, R. I., & Shaw, L. L. (1995). Writing ethnographic fieldnotes. Chicago: University of Chicago Press. Estroff, S. E. (1981). Making it crazy: An ethnography of psychiatric clients in an American community. Berkeley: University of California. Farmer, P. E. (1999). Infections and inequalities: The modern plagues. Berkeley: University of California. Fox, R. (1959). Experiment perilous. New York: Free Press. Fox, R.
C., & Swazey, J. (1978). The courage to fail: A social view of organ transplants and dialysis. Chicago: University of Chicago Press. Fox, R. C., & Swazey, J. (1992). Spare parts: Organ replacement in American society. New York: Oxford University Press. Frank, G. (1997). Is there life after categories? Reflexivity in qualitative research. The Occupational Therapy Journal of Research, 17(2), 84–97. Frank, G. (2000). Venus on wheels. Berkeley, CA: University of California Press. Geertz, C. (1973). The interpretation of cultures: Selected essays. New York: Basic Books. Ginsburg, F. D. (1989). Contested lives: The abortion debate in an American community. Berkeley: University of California. Glaser, B. G., & Strauss, A. L. (1965). Awareness of dying. Chicago: Aldine. Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. Chicago: Aldine. Gordon, E. J. (2000). Preventing waste: A ritual analysis of candidate selection for kidney transplantation. Anthropology and Medicine, 7(3), 351–372. Gordon, E. J. (2001a). “They Don’t Have To Suffer For Me”: Why dialysis patients refuse offers of living donor kidneys. Medical Anthropology Quarterly, 15(2), 1–22. Gordon, E. J. (2001b). Patients’ decisions for treatment of end-stage renal disease and their implications for access to transplantation. Social Science and Medicine, 53(8), 971–987. Gordon, E. J. (2003). Trials and tribulations of navigating institutional review boards and other human subjects provisions. Anthropological Quarterly, 76(2), 299–320. Gordon, E. J., & Sehgal, A. R. (2000). Patient-nephrologist discussions about kidney transplantation as a treatment option. Advances in Renal Replacement Therapy, 7(2), 177–183. Guillemin, J. H., & Holmstrom, L. L. (1986). Mixed blessings: Intensive care for newborns. New York: Oxford University Press. Hahn, R. A. (1999). Anthropology in public health: Bridging differences in culture and society. New York: Oxford. Hern, H.
E., Koenig, B. A., Moore, L. J., & Marshall, P. A. (1998). The difference that culture can make in end-of-life decisionmaking. Cambridge Quarterly of Healthcare Ethics, 7, 27–40. Hoffmaster, B. (Ed.) (2001). Bioethics in social context. Philadelphia: Temple University Press. Horton, R. (1967). African traditional thought and western science. Part I. From tradition to science. Africa: Journal of the International African Institute, 37(1), 50–71. Huberman, A. M., & Miles, M. B. (1994). Data management and analysis methods. In: N. K. Denzin & Y. S. Lincoln (Eds), Handbook of qualitative research (pp. 413–427). Thousand Oaks, CA: Sage. Jecker, N. S., & Berg, A. O. (1992). Allocating medical resources in rural America: Alternative perceptions of justice. Social Science and Medicine, 34(5), 467–474. Jenkins, J., & Karno, M. (1992). The meaning of ‘Expressed Emotion’: Theoretical issues raised by cross-cultural research. American Journal of Psychiatry, 149, 9–21. Kelly, S. E., Marshall, P. A., Sanders, L. M., Raffin, T. A., & Koenig, B. A. (1997). Understanding the practice of ethics consultation: Results of an ethnographic multi-site study. Journal of Clinical Ethics, 8(2), 136–149. Kirk, J., & Miller, M. L. (1986). Reliability and validity in qualitative research. Qualitative research methods, (series 1). Thousand Oaks, CA: Sage. Koenig, B., Back, A. L., & Crawley, L. M. (2003). Qualitative methods in end-of-life research: Recommendations to enhance the protection of human subjects. Journal of Pain and Symptom Management, 25, S43–S52. Lamphere, L. (2005). Providers and staff respond to medicaid managed care: The unintended consequences of reform in New Mexico. Medical Anthropology Quarterly, 19, 1–25. Lederman, R. (2007). Educate your IRB: An experiment in cross-disciplinary communication. Anthropology News, 48(6), 33–34. Levin, B. W. (1985).
Consensus and controversy in the treatment of catastrophically ill newborns: Report of a survey. In: T. H. Murray & A. L. Caplan (Eds), Which babies shall live? Humanistic dimensions of the care of imperiled newborns (pp. 169–205). Clifton, NJ: Humana Press. Levin, B. W. (1986). Caring choices: Decision making about treatment for catastrophically ill newborns. Dissertation – Sociomedical Sciences, Columbia University, University Microfilms 870354. Levin, B. W. (1988). The cultural context of decision making for catastrophically ill newborns: The case of Baby Jane Doe. In: K. Michaelson (Ed.), Childbirth in America: Anthropological perspectives (pp. 178–203). South Hadley, MA: Bergin and Garvey Publishers, Inc. Lock, M. M. (2002). Twice dead: Organ transplants and the reinvention of death. Berkeley: University of California. Long, S. O. (2002). Life is more than a survey: Understanding attitudes toward euthanasia in Japan. Theoretical Medicine, 23, 305–319. Luborsky, M. (1994). The identification and analysis of themes and patterns. In: J. F. Gubrium & A. Sankar (Eds), Qualitative methods in aging research (pp. 189–210). Thousand Oaks, CA: Sage Publications. Mandelbaum, D. (Ed.) (1949). Edward Sapir: Selected writings in language, culture, and personality. Berkeley, CA: University of California Press. Marshall, P. A. (2003). Human subjects protections, institutional review boards, and cultural anthropological research. Anthropological Quarterly, 76(2), 269–285. Marshall, P. A., & Rotimi, C. (2001). Ethical challenges in community based research. American Journal of the Medical Sciences, 322, 259–263. McCombie, S. C. (1987). Folk flu and viral syndrome: An epidemiological perspective. Social Science and Medicine, 25, 987–993. McLaughlin, L. A., & Braun, K. L. (1998). Asian and Pacific Islander cultural values: Considerations for health care decision making. Health and Social Work, 23(2), 116–126.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis. Thousand Oaks, CA: Sage. Miller, E. (2001). The danger of talk: Negotiating risk in anthropological research with a human subjects research committee. Paper presented at the panel session, “Stranger In a Familiar Land: Medical Anthropologists at Practice in Bioethics and Clinical Biomedicine,” 100th Annual Meeting of the American Anthropological Association. Washington, DC, November 28–December 2. Morse, J. M. (2000). Determining sample size. Qualitative Health Research, 10(1), 3–5. Muller, J. H. (1992). Shades of blue: The negotiation of limited codes by medical residents. Social Science and Medicine, 34(8), 885–898. Nader, L. (1972). Up the anthropologist – perspectives gained from studying up. In: D. Hymes (Ed.), Reinventing anthropology (pp. 284–311). New York, NY: Pantheon Books. Orfali, K., & Gordon, E. J. (2004). Autonomy gone awry: A cross-cultural study of parents’ experiences in neonatal intensive care units. Theoretical Medicine and Bioethics, 25(4), 329–365. Orona, C. J., Koenig, B. A., & Davis, A. J. (1994). Cultural aspects of nondisclosure. Cambridge Quarterly of Healthcare Ethics, 3, 338–346. Patton, M. Q. (1999). Enhancing the quality and credibility of qualitative analysis. HSR: Health Services Research, 34(5), 1189–1208. Pelto, P. J., & Pelto, G. H. (1978). Anthropological research: The structure of inquiry. New York: Cambridge University Press. Powers, B. A. (2001). Ethnographic analysis of everyday ethics in the care of nursing home residents with dementia: A taxonomy. Nursing Research, 50(6), 332–339. Rapp, R. (1999). Testing women, testing the fetus: The social impact of amniocentesis in America. New York: Routledge. Roper, J. M., & Shapira, J. (2000). Ethnography in nursing research. Thousand Oaks, CA: Sage Publications. Schensul, J. J., & LeCompte, M. D. (1999). Ethnographer’s toolkit. Blue Ridge Summit, Pennsylvania: AltaMira Press. Scrimshaw, S., & Hurtado, E. (1988).
Rapid assessment procedures for nutritional and primary health care: Anthropological approaches to program improvement. Los Angeles, CA: UCLA Latin American Center and the United Nations University. Sieber, J. E. (1989). On studying the powerful (or fearing to do so): A vital role for IRBs. Institutional Review Board, 11(5), 1–6. Spradley, J. P. (1979). The ethnographic interview. New York: Harcourt Brace Jovanovich College Publishers. Stake, R. E. (1994). Case studies. In: N. K. Denzin & Y. S. Lincoln (Eds), Handbook of qualitative research (pp. 236–261). Thousand Oaks, CA: Sage Publications. Strauss, A. L. (1987). Qualitative analysis for social scientists. Cambridge: Cambridge University Press. Strauss, A. L., & Corbin, J. (1994). Grounded theory methodology: An overview. In: N. K. Denzin & Y. S. Lincoln (Eds), Handbook of qualitative research (pp. 273–285). Thousand Oaks, CA: Sage Publications. Strauss, A. L., & Corbin, J. (1998). Basics of qualitative research: Techniques and procedures for developing grounded theory (2nd ed.). Thousand Oaks, CA: Sage Publications. Trotter, R. T., & Schensul, J. J. (1998). Methods in applied anthropology. In: H. R. Bernard (Ed.), Handbook of methods in cultural anthropology (pp. 691–735). Walnut Creek, CA: AltaMira Press. Tylor, E. B. (1958[1871]). Primitive culture. New York: Harper. Weisz, G. (1990). Social science perspectives on medical ethics. Philadelphia: University of Pennsylvania Press. Zussman, R. (1992). Intensive care: Medical ethics and the medical profession. Chicago: University of Chicago Press.

SEMI-STRUCTURED INTERVIEWS IN BIOETHICS RESEARCH

Pamela Sankar and Nora L. Jones

ABSTRACT

In this chapter, we present semi-structured interviewing as an adaptable method useful in bioethics research to gather data for issues of concern to researchers in the field.
We discuss the theory and practice behind developing the interview guide, the logistics of managing a semi-structured interview-based research project, developing and applying a codebook, and data analysis. Throughout the chapter we use examples from empirical bioethics literature.

Empirical Methods for Bioethics: A Primer. Advances in Bioethics, Volume 11, 117–136. Copyright © 2008 by Elsevier Ltd. All rights of reproduction in any form reserved. ISSN: 1479-3709/doi:10.1016/S1479-3709(07)11006-2

INTRODUCTION

Interviewing provides an adaptable and reliable means to gather the kind of data needed to conduct empirical bioethics research. There are many kinds of interviews to choose among, and a primary feature that distinguishes them is the degree of standardization imposed on the exchange between interviewer and respondent. Semi-structured interviews, as their name suggests, integrate structured and unstructured exchanges. They rely on a fixed set of questions but ask respondents to answer in their own words, and they allow the interviewer to prompt for a more detailed answer or for clarification. Such interviews are more structured than in-depth or ethnographic interviews that explore topics by following and probing a respondent’s largely self-directed account. But they are less structured than typical surveys that administer the same questions to all subjects and confine responses to a fixed set of choices. Semi-structured interviews combine the advantage of closed-ended questions, which allow for comparison across subjects, and the advantage of open-ended queries, which facilitate exploring individual stories in-depth. Similarly, semi-structured interview data are more flexible than data generated from strictly quantitative studies. Analysis of semi-structured interview data can run the spectrum from the statistical to the descriptive and in-depth.
This combination of types of analyses contributes to the strength and wide-ranging applicability of semi-structured interviewing.

Researchers can use semi-structured interviews for many types of inquiry, including confirming existing research results or opening up new fields of inquiry, but the method is particularly well suited to exploratory research or to very focused theory-testing. This is the case because the effort entailed in conducting good semi-structured interviews and then analyzing the considerable data that they produce works against the large sample sizes required to examine certain kinds of population-based questions characteristic of epidemiological and some sociological studies. While semi-structured interview results can be analyzed quantitatively, this is not their primary strength. Their main contribution lies in the richness of the data they provide, a strength that is sacrificed when data are reduced to numerical values.

In bioethics, semi-structured interviews have been used effectively to examine several topics, including genetic testing, end of life care, medical confidentiality, and informed consent. Genetic testing studies demonstrate the suitability of semi-structured interviews to explore new areas of ethical concern. For example, studies relying on semi-structured interviews have been instrumental in elucidating the challenge of successfully educating patients about the complexities of BRCA1/2 testing (Press, Yasui, Reynolds, Durfy, & Burke, 2001), examining the special circumstances of some patient groups, such as men who test positive for the gene (Hallowell et al., 2005; Liede et al., 2000), and studying how families communicate about test results (Claes et al., 2003; Forrest et al., 2003).
Research on end of life care demonstrates the advantages of semi-structured interviews when examining sensitive topics such as how patients and families would like clinicians to answer questions about physician-assisted suicide (PAS) (Back et al., 2002) or what factors influence a terminally ill patient’s wish to hasten death (Kelly et al., 2002). Another example of exploring sensitive topics with semi-structured interviews is the authors’ study of patients’ deliberations to tell or not tell their physicians about medical problems they consider sensitive. Using semi-structured interviews, we found that patients withhold information about a wider range of topics than practitioners might expect (Sankar & Jones, 2005). Semi-structured interviews also have been productively used to study situations distinguished by complex or confusing communication (Featherstone & Donovan, 1998; Stevens & Ahmedzai, 2004; Pang, 1994).

THE INTERVIEW PROCESS

Developing the Interview Guide

Semi-structured interviews rely on an interview guide that includes questions, possible prompts, and notes to the interviewer about how to handle certain responses. Constructing the interview guide derives from the study’s research question(s) and begins with reviewing the literature on a chosen topic to see how others have approached similar inquiries. Working back and forth between the previous research – what has been done – and the draft interview guide – what could be done – helps to clarify the aims and hypotheses of a study and brings into focus possible questions to include in the interview guide. The guide should be designed to balance the need to collect roughly similar kinds of information from all subjects while capturing each subject’s unique perspective on that information.
Novices to interview guide development might consider reading existing interview guides available from colleagues or those posted online associated with published articles (see Box 1 for an excerpt of the interview guide we developed for our study on medical confidentiality). Several excellent articles also offer suggestions on how to formulate interview questions (Britten, 1995; Fowler, 1992; Mishler, 1986a, 1986b; Sanchez, 1992; Weiss, 1994). Here we offer some pointers for crafting and asking semi-structured interview questions, as well as suggestions for helping subjects answer those questions (Box 2). Reading draft questions aloud or trying them on colleagues can be useful. Very often what makes sense to the author eludes the listener.

Box 1. The Interview Guide: A Brief Example.
7. DEFINE “CONF”: If you had to explain to a friend what “confidentiality” means, what would you tell her it means?
   7a. So then, what part of the information that you talk about with your doctor or nurse is “use respondent’s wording, e.g., ‘just between a person and her doctor’”?
   7b. Does this confidential information go into your medical record?
[SKIP 8 & 9 IF ALREADY ANSWERED IN 7]
8. RELEASE?: Do you think there are any situations where your doctor or nurse would decide NOT to keep confidential information “use respondent’s wording, e.g., ‘just between a person and her doctor’” – and give out or release your confidential medical information?
   a. NO
   b. YES
9. Could you give me examples of this kind of situation?

The order of questions in an interview guide merits careful consideration. Topics lend themselves to different schemes, such as chronological, or by degrees of complexity or intrusiveness. The interview guide for our medical confidentiality study, for example, started with questions about how subjects thought medical confidentiality operated in a clinic setting, and then asked them how they thought it should operate.
Only after 30 or 40 min of exchange did we ask respondents to recount their own personal experiences related to medical confidentiality (Jenkins, Merz, & Sankar, 2005; Sankar & Jones, 2005).

Once a rough draft of the interview guide is complete, it is essential to try it out, start to finish, with a colleague to see if the questions make sense and flow sensibly from one to the next. It is then advisable to revise the guide and try it on members of the study population, taping these pilot interviews and, if possible, taking notes while conducting them. Asking subjects to explain if and why a question seems unclear is important. Transcribing these interviews is not necessary, although if working in a large research team, transcribing allows everyone to review the results and facilitates revising the guide.

Box 2. Asking Questions: Some Pointers.
• Tense and Specificity. The verb tense of a question influences its likely responses.
  - Questions asked in the present tense, for example, “What happens when you call the doctor?” tend to elicit generalized accounts.
  - Conversely, questions phrased in the past tense, such as “What happened the last time you called the doctor?” will tend to elicit specific actions and experiences.
• Phrasing
  - Avoid phrasing that allows for yes/no answers. Questions beginning with what or how are better at eliciting fuller answers than those starting with did, as in “Did you ask the doctor any questions?”
  - Ask only one question at a time. If an interviewer asks, “What did you do after your diagnosis and how did you feel about that,” respondents can get caught up in answering the second part of the question and be unable or unwilling to go back and provide a complete answer to the first part.
  - Often the best follow-up question consists of repeating back the last phrase of the subject’s previous statement, as in, “… you called her after dinner …” when the subject has stated, “I couldn’t reach the doctor all day so I had to call her after dinner.” This response indicates that you are listening and encourages the subject to continue talking without directing him or her toward any particular response.
• Helping respondents answer your questions
  - Extending: Can you tell me more about how you met her?
  - Filling in detail: Can you walk me through that? So you were talking to him after the phone call?
  - Making indications explicit: when subjects nod or sigh, try to confirm verbally what they indicated. “So would you say that was good news?” or “Tell me what you’re thinking when you sigh like that.” Or, most directly, “Can you say what you’re thinking for the tape recorder?”

The goal is to create an interview guide that is comprehensible to respondents and that evokes responses which answer the questions the research asks. While this goal might seem self-evident, achieving it can be far more difficult than anticipated. Absent careful review of pilot interview transcripts, the insight that the interview failed to ask the “right” question may not emerge until all the data are collected and the opportunity to revise the interview guide has long passed. Reviewing the interview guide through extensive pilot testing should also mitigate the need to revise questions during the data collection phase of a project, a common remedy for poorly phrased questions that complicates data analysis and reduces internal validity. Formulating, testing, reviewing, and revising questions so that they successfully communicate to the subject what interests the researcher is roughly equivalent to the steps that quantitative researchers go through in scale development when they “validate” an instrument.
Validity refers generally to the relationship between measurements (such as scales or questions) and the phenomenon they claim to capture or measure. The closer that relationship is, the more “valid” a measure. There are several types of validity, including internal, external, and construct validity. For a useful review of these concepts written for qualitative researchers, see Maxwell (1996). A major difference between the import of these ideas for qualitative and quantitative research is the degree of specificity or formalism in assessing validity. Formal assessment is standard in quantitative research. The logical problems posed by validity concerns, however, are equally salient in qualitative research. For example, in question development for semi-structured interviews, construct validity focuses attention on assessing whether the question asked is the question answered.

In our medical confidentiality study, for example, we struggled with devising a question to determine how subjects understood medical confidentiality. Asking them directly, “What does medical confidentiality mean?” seemed to suggest a test question with a right/wrong answer. Subjects interpreted the question more like “What is the definition of medical confidentiality?” and responded with their approximation of a dictionary definition. The question they answered was not the question we meant to ask. We went through several iterations, including versions that added “to you” at the end of the original question, and another that asked subjects what it meant if a doctor or nurse told them they would keep something confidential (which elicited responses about the doctor or nurse’s character, such as “It would mean they cared about me”). Finally we arrived at the solution of asking the subject to tell us how she would explain to a friend what medical confidentiality meant. This question elicited answers such as, “I would tell her that it means keeping a secret.
That whatever I tell a doctor goes to no one else.” These were the kind of responses we were seeking, that is, what patients think it means in a medical setting when a health care practitioner characterizes an exchange as “confidential.” Formulating, testing, and reformulating questions brought the query we posed and the phenomenon we were trying to measure closer and closer into alignment.

Piloting the interview guide will require finalizing most of the bureaucratic and practical steps that actual interviews require. This includes completing all necessary human subjects research review, and deciding who will be interviewed, how they will be contacted, and where the interviews will be conducted. While such complete preparation for piloting might seem onerous and unnecessary, every moment spent testing the interview guide and refining recruitment and other logistics before the study’s full implementation will pay off by revealing obstacles that eventually might have delayed the project.

Sampling: Choosing Whom to Interview

The research question will determine the population to be sampled for interviews. How to approach the population and solicit its members for participation needs to be worked out carefully in advance based on familiarity with the targeted group and discussion with some of its members. The best designed study can founder on unanticipated obstacles in subject recruitment. If the target group is large and loosely defined, such as women who have had mammograms, the issues are quite different than if research calls for talking with a small or more isolated group, such as people who have undergone genetic testing for hearing loss. Abundance might pose a sampling problem in the former – which women who have had mammograms are of interest? Old, young, insured, uninsured, rural, or urban? Scarcity presents a challenge in the latter.
How does one find people who have had genetic testing, who are few in number, geographically dispersed and, if themselves deaf or hard of hearing, who might not be reachable by methods typically used by the hearing research community?

A method often relied on in qualitative research is snowball sampling, in which potential participants are identified by asking initial subjects to suggest names of other people who might be interested in participating in the study. The resulting sample is likely to consist of people who know one another, which also means it might represent a fairly narrowly defined group. Inferences from such a study cannot be generalized much beyond the subjects interviewed or individuals deemed highly similar to them. While this might be acceptable for studies examining very specific issues confined to a well-defined group, it is less effective for more broadly relevant issues.

Another common strategy for identifying potential participants is to use clinic staff or other gatekeepers to estimate the flow of people through a recruitment site. Relying solely on this information can be a mistake. Overestimating is common, as is failure to recall seasonal fluctuations. Researchers would do well to review any sort of available records that might document actual numbers of individuals or groups pertinent to the research.

Estimating how many and what kinds of subjects to enroll requires knowing what kind of inferences you plan to make from findings. The way one estimates and selects a sample will directly influence what inferences one can make from the data, and these sampling decisions will have to be explained in any publication that results from one’s research. There are several excellent articles on sampling for semi-structured interviews (Creswell, 1994; Curtis, Gesler, Smith, & Washburn, 2000; Marshall, 1996; Morse, 1991; Schensul, Schensul, & LeCompte, 1999).
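The clustering that snowball sampling tends to produce, noted earlier in this section, can be illustrated with a minimal sketch. It assumes a hypothetical referral network (the `referrals` dictionary and the subject labels are invented for illustration) and treats recruitment as a breadth-first walk from one initial subject; real recruitment is messier, since people decline or cannot be reached.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Enroll subjects breadth-first: each enrolled subject names further contacts."""
    enrolled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(enrolled) < max_size:
        person = queue.popleft()
        enrolled.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return enrolled

# Hypothetical referral network: two groups with no ties between them.
referrals = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": [],
    "D": ["A"],
    "X": ["Y"],  # second group, never named by anyone in the first
    "Y": ["X"],
}

sample = snowball_sample(referrals, seeds=["A"], max_size=10)
print(sample)
```

In this toy network the sample never leaves the first group, however large `max_size` is set, which is one way to picture why inferences from such a sample should stay close to the people interviewed.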
In general, sampling should control as much as possible for subject bias. Biased sampling occurs when researchers assume that the perspective of one group of potential respondents is representative of a more inclusive group, for example, using race or ethnicity as proxy for a wider range of socioeconomic and sociodemographic variables. A study focused on minority perspectives on genetic testing that only sampled northern urban African Americans would suffer from such sample bias, if it purported to speak for a general US minority population that is more diverse than the group sampled.

Finally, in studies focused on the beliefs, attitudes, and behaviors of a narrowly defined group of people, such as oncology clinic patients, or about a specific topic, such as attitudes about hospice care, the question of the exact number of subjects may be left open until after a few rounds of preliminary coding. Some researchers decide to keep recruiting subjects until a point of theoretical saturation, meaning that new interviews begin to be redundant and are not adding new substantive issues (Eliott & Olver, 2003).

Conducting the Interview

The setting for the interview should be comfortable and foster an open exchange, which often, but not necessarily, means a private room. Regardless of where the interview is conducted, it is important to make sure the space is available for the entire time needed to conduct the interview. Completing the interview in a timely manner shows respect to the participant and minimizes scheduling problems.

Regardless of what is explained when arranging to meet the respondent, he/she needs to be re-oriented to the study objectives. It is important to explain what in his/her background makes him or her able to contribute to the research.
Another essential strategy is to briefly review the aim(s) of the study, topics of the interview and the order in which they will appear, and to underscore that unlike other interviews or surveys that ask for simple yes or no answers, this type of interview emphasizes learning what respondents consider important about the research topic and relies on them to provide answers in their own words.

Semi-structured interviews allow a high degree of flexibility not only to the respondent but to the interviewer as well. A question mistakenly asked so that it can be answered with “yes” or “no” when the objective was to obtain a detailed response can simply be re-stated. Having gone to the trouble to start the interview, respondents usually share the researcher’s interest in successfully completing it.

Technicalities of Data Collection and Transcription

There are two primary reasons why audiotaping or digitally recording interviews is preferable to note taking: memory and foresight. As semi-structured interviews can be very lengthy, it is highly unlikely that the interviewer can remember the full story behind the abbreviated notes jotted down during an interesting story early in the interview. Also, as research progresses, new patterns in the interviews emerge and new questions can be asked about the data. At the time of the interview, these new issues will not catch the interviewer’s attention and will not be documented, or not be documented fully enough to permit subsequent detailed analysis.

To assure high quality recordings and to avoid contributing to the overflowing stock of “interviews that got away” stories, one should obtain reliable equipment, bring several extra batteries and cassettes, and test the recorder before each interview.
After obtaining informed consent, it is useful to establish the respondent’s permission to record the interview and have the respondent and the interviewer both speak while the recorder is in the exact position it will be in during the interview. It is also important to play back this recording and check for levels and intelligibility. It is not advisable to use the “voice activation” option, as words or whole sentences can be cut off. A good idea is to glance at the recorder every 10 min or so to make sure it is still functioning properly. Moreover, it is advisable to use one tape for each interview and not to tape the next interview on the same cassette, as it is too easy to mix up the sides and tape over the previous interview. Cassettes can be reused after transcribing a set of interviews, so this practice need not be wasteful. More and more, researchers in the field are switching to digital recording to eliminate tapes altogether, but it is important to practice a few times with any new technology before relying on it in an interview. Finally, it is essential to carefully label interview cassettes with an appropriate identification code and arrange for transcription.

With respect to transcription, every hour of talk takes upwards of 3–4 h to transcribe. The task is tedious and best left to a professional transcriptionist if the project can afford one, although in smaller studies transcribing interviews can provide a good way to review data. Whether analysis will rely on a software program or pen and paper coding, it will proceed more smoothly with established transcription guidelines to help assure consistency across transcripts and to facilitate a review of the documents for errors or to remove any identifying personal information.

Creating a data preparation chart to track the progress of individual interviews helps to organize interview tasks.
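As a rough illustration, one row of such a chart might be sketched as a small record that holds only a study code and the dates of each preparation step, from the interview itself through erasure of the recording. The class name, field names, and sample values below are assumptions made for the example, not a format prescribed by this chapter.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Preparation steps in the order they are usually completed (names are illustrative).
STEPS = (
    "sent_for_transcription",
    "transcript_received",
    "accuracy_checked",
    "identifiers_removed",
    "formatted_for_import",
    "recording_erased",
)

@dataclass
class InterviewRecord:
    study_code: str            # links to a separately stored contact file; no names here
    interview_date: date
    place: str
    interviewer_initials: str
    sent_for_transcription: Optional[date] = None
    transcript_received: Optional[date] = None
    accuracy_checked: Optional[date] = None
    identifiers_removed: Optional[date] = None
    formatted_for_import: Optional[date] = None
    recording_erased: Optional[date] = None

    def next_step(self) -> str:
        """Return the first preparation step that has no completion date yet."""
        for step in STEPS:
            if getattr(self, step) is None:
                return step
        return "complete"

# Hypothetical entry: interview done, recording sent out, transcript not yet back.
record = InterviewRecord("S-017", date(2005, 3, 14), "clinic office", "AB")
record.sent_for_transcription = date(2005, 3, 15)
print(record.next_step())
```

Keeping the chart keyed to a study code rather than a name means it can be shared within the team without exposing who was interviewed.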
It is important to keep any information that might identify a subject out of a data preparation chart and to create instead a code that links the data from this chart to other data sets containing subject contact information. A data preparation chart might contain fields that track an interview from its occurrence (including the date, time, place, and initials of the person who conducted an interview) to its completion as a fully prepared document ready for analysis (including the date the cassette was sent and received back for transcription, checks on transcript accuracy, reviews to remove personally identifying information, formatting text for database importing, and finally, the cassette’s erasure).

CODING AND DATA ANALYSIS

Researchers opt for semi-structured interviews because they provide more data, and more nuanced data, than close-ended questions. However, the amount and kind of data that such interviews can produce will also greatly increase data management and analysis tasks. Conducting fewer interviews might cut down on these burdens, but analyzing any amount of unstructured text inevitably increases the work required in the post-data collection phase of research.

Much analysis of semi-structured interview data is the same irrespective of the number of interviews conducted or the means of analysis (note cards or qualitative data analysis software programs). However, here we assume that study data will be entered into a qualitative data analysis program and that somewhere between 30 and 100 interviews have been conducted. This range is somewhat arbitrary, but reflects our experience that studies with fewer than 30 subjects are less likely to require the team coding process described here, and those that exceed 100 are likely to benefit from primarily quantitative analysis, which we address only in passing.
There are examples of studies with larger sample sizes using semi-structured interviews (Kass et al., 2004; Mechanic & Meyer, 2000), but these studies are atypical, either in their lengthy duration (extending over many years) or in their reliance on a high proportion of close-ended questions, which minimizes coding burden.

Coding is the process of mapping interview transcripts so that patterns in the data can be identified, retrieved, and analyzed. In other words, coding provides a means to gain access to interview passages or data, rather than becoming the data itself. Unlike coding survey responses for quantitative analysis, which requires reducing responses to numeric values, the goal of coding semi-structured interview transcripts is to index the data to facilitate its retrieval, while retaining the context in which data was originally identified.

Researchers should investigate different qualitative software packages before beginning the codebook development process, as each program has different systems for coding. Common programs worth exploring include N6 (2002), ATLAS.ti (Muhr, 2004) and HyperResearch (2003). See the “Computer Assisted Qualitative Data Analysis (CAQDAS)” website for software comparisons and tips for choosing a qualitative software program (www.caqdas.soc.surrey.ac.uk/).

Codes

Codes classify and represent relevant passages in the interview. Devising codes that accomplish these tasks requires identifying the central themes in the research, developing concepts to represent them, and assessing how these concepts relate to each other and to overall research questions that motivate the study. Developing and defining codes thus constitutes the first stage in data analysis, which means that developing the coding manual requires careful attention. Coding itself consists of reading transcripts and deciding how and where to apply codes.
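The idea that a code indexes a passage rather than replacing it with a number can be sketched in a few lines. This is only an illustration: the transcript store, the subject label, and the function names are hypothetical, and qualitative analysis packages implement the same idea with their own interfaces.

```python
# Hypothetical transcript store: subject code -> list of transcript lines.
transcripts = {
    "S-01": [
        "I would tell her that it means keeping a secret.",
        "That whatever I tell a doctor goes to no one else.",
        "I usually call the clinic in the morning.",
    ],
}

# The index maps a code to pointers into the transcripts, not to copied text.
index = {}

def apply_code(code, subject, start, end):
    """Record that lines start..end (inclusive) of a transcript carry this code."""
    index.setdefault(code, []).append((subject, start, end))

def retrieve(code, context=0):
    """Return coded passages, optionally with surrounding lines for context."""
    passages = []
    for subject, start, end in index.get(code, []):
        lines = transcripts[subject]
        first = max(0, start - context)
        last = min(len(lines), end + 1 + context)
        passages.append((subject, lines[first:last]))
    return passages

apply_code("DEF CONFI", "S-01", 0, 1)
for subject, passage in retrieve("DEF CONFI", context=1):
    print(subject, passage)
```

Because the index stores positions rather than summaries, a retrieved passage can always be read back in its original surroundings by widening `context`.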
Ideally the coding manual contains codes that are sufficiently well suited to the data and well enough defined that the process of coding, while unlikely to be automatic, does not require continually rethinking the meaning or limits of codes. There are several types of codes, which can be thought of as roughly paralleling the stages of coding, or at a minimum, the stages of developing a coding manual.

Coding Manual

The first stage of developing a coding manual consists of listing codes that cover information that the interview explicitly sought. For example, in our study on confidentiality, we asked women how they would explain medical confidentiality to a friend. All of the passages containing answers to this question were assigned the code “DEF CONFI,” which stood for “definition of confidentiality.” Many codes of this type need to be divided into sub-codes. Sometimes sub-codes are logically suggested by the original code, as one might expect a code such as “family member” to have the following sub-codes: “spouse,” “child,” and “parent.” Other times sub-codes are needed to represent distinctions that emerge from the data. Subdividing codes based on emergent data moves the focus from codes that are implied in the interview guide to codes that capture themes that the interviews have elicited.

Thematic coding identifies recurrent ideas that are implicit in the data. Some of these codes might be envisioned in advance of the interviews. More typically, however, thematic coding is held out as the means for capturing what one learns or discovers when examining the interviews together as a completed set.
In our confidentiality study, repeated transcript readings and examinations of the primary code “DEF CONFI” revealed that respondents often ground their definitions either in the personal relationships between patient and practitioner or in bureaucratic procedures required to safely store and accurately transmit sensitive information. Two thematic sub-codes of “DEF CONFI,” “personal” and “bureaucratic,” captured these recurrent ideas (Jenkins et al., 2005). Also, further analysis determined that these themes connected with many other features emblematic of different ways of understanding and using medical confidentiality.

A similar staged coding process was used by researchers studying patients’ interactions with their physicians regarding PAS (Back et al., 2002). The first level of primary codes included “interactions with health care providers,” “reasons for pursuing PAS,” and “planning for death.” Further examination of the text coded to the primary code “interactions with health care providers” revealed three distinctive themes, which became three individual sub-codes: “explicit PAS discussion,” “clinician willingness to discuss dying,” and “clinician empathy.” Through this process, researchers were able to see what their subjects valued in their clinician’s responses, or what they wanted but did not get.

The provisional coding manual provides draft definitions of the codes, which are tested by applying them to a set of completed interviews. The experience of trying to fit draft codes to actual interviews clarifies which codes effectively capture responses and which do not, which might with revision, and what new codes should be tested. This process is analogous to the work required to develop interview questions that accurately express the researcher’s interests in that the task is one of proposing, matching, and testing representation to concepts.
Testing draft codes also helps to identify example passages from the interviews that researchers can incorporate in the coding manual to demonstrate a code’s definition or limits.

After a test-round of coding, a revised coding manual is generated and applied to a second set of interviews that preferably includes some from the prior coding round as well as new ones. The second round continues largely in the manner of the first. Codes are evaluated and revised, and their definitions revised in parallel. This process continues until a codebook emerges that contains a set of codes judged to have sufficient range to effectively represent the material of interest to researchers and sufficient clarity to be consistently applied by coders (see Box 3 for an excerpted section of the codebook used in our confidentiality study).

Box 3. Sample Codebook.
This codebook example with numbering follows the system used in NUD*IST v4 (non-numerical unstructured data indexing searching and theorizing) (see Box 1 to compare the codebook with the interview guide).
(7) Q7 – Def confi
    (7 1) Between 2 people/personal
    (7 2) Need to know
    (7 3) Continuity of care model/bureaucratic model
    (7 4) Other
    (7 5) Don’t know
    (7 6) NA
(8) Q8 – Release?
    (8 1) Yes
    (8 2) No
(9) Q9 – Desc Release
    (9 1) Happens – w/consent
    (9 2) Happens – but not supposed to
    (9 3) Happens – continuity of care model
    (9 4) Happens – infectious disease
    (9 5) Happens – emergency
    (9 6) Happens – research
    (9 7) Happens – other
    (9 8) Happens – doesn’t specify
    (9 9) Don’t know
    (9 10) NA

Multi-Level Consensus Coding (MLCC)

With studies based on 30 or more interviews, coding typically requires, and nearly always benefits from, more than one coder. We have developed an approach to coding large semi-structured interview data sets called multi-level consensus coding (MLCC) (Jenkins et al., 2005). MLCC meets the challenge of analyzing a large number of open-ended qualitative interview transcripts by generating a high degree of inter-coder reliability and comparability among interviews, while maintaining the singular nature of interviewees’ experiences. This coding model depends on the use of a code-and-retrieve software program, in this case, N6, that aids in the analysis of qualitative data through flexible coding index systems and searching protocols that identify thematic patterns, essential in the mapping of qualitative knowledge.

MLCC differs from other coding strategies described in most standard methods texts in two ways. First, it makes explicit the connections between training to code, coding, and analysis by foregrounding the vital connection between the codes, the coders, and the process of coding. The codes do not exist separately from the coders in the sense that assigning a code (that is, coding) requires coders to have a shared understanding of the relationship between code and text. In turn, having to reach consensus about a code reproduces and, over time, changes that understanding. The change in understanding should not be taken to mean the original relationship between the code and its definition and examples was flawed (although certainly that is sometimes the case). Rather, the change in understanding refers to evolving understanding of the phenomenon to which the code was meant to apply and to the capacity of the code to represent it.

Second, MLCC formalizes and makes a virtue of the need for several stages of coding. While similar to the types of coding described in grounded theory methods (open, axial, and so on), the levels or stages laid out in MLCC are less tied to a specific interpretive framework, and are intended more to serve a broader range of projects.
Further, it highlights the steps required to code semi-structured interview data as a particular method. Although the basic steps described here differ little from those that appear in many qualitative methods textbooks, a review of articles claiming to follow qualitative methods suggests that many studies omit or abbreviate the process in ways that seriously undermine the validity of the data and the legitimacy of claims that accepted qualitative data analysis produced it.

MLCC responds to concerns about inter-coder reliability and comparability between interviews through extensive training of coders and through consensus coding, or group decision-making. At the beginning of training, coders jointly code two to three transcripts with PI supervision. When the coders have gone through a sufficient number of transcripts so that they are familiar with the content of the interviews and the coding rules, they then move on to “round-robin” coding. A round-robin coding session is a group of four coders (at least one PI among them) who each code three interviews, each interview with a different partner. The discussion session for round-robin coding is an occasion to test individual perceptions of the meaning of coding categories and rules, and an occasion where irregular coding is brought into conformity with the rules. The round-robin training sessions are similar to consensus sessions that coders will participate in during regular coding. Coders then code several more interviews individually and in pairs, mirroring the regular coding method.

Following training, the coding teams are divided into pairs, and two coders independently read and code each level of each interview. Coding pairs are rotated so that each coder codes at least a subset of interviews with every other coder, although often certain pairs seem to work better together if only because of scheduling issues.
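The round-robin arrangement and the later rotation of coding pairs can be sketched as a small scheduling exercise. The coder labels are hypothetical, and the assumption that a round-robin session uses six transcripts (one per possible pair of four coders) is an inference from the description above, not a stated rule.

```python
from collections import Counter
from itertools import combinations, cycle

coders = ["PI", "Coder A", "Coder B", "Coder C"]  # hypothetical team of four

# Round-robin training: one transcript per possible pair (assumed to be six in all).
pairs = list(combinations(coders, 2))
round_robin = {f"training-{i + 1}": pair for i, pair in enumerate(pairs)}

# Each coder should code three transcripts, each with a different partner.
load = Counter(coder for pair in pairs for coder in pair)

def assign_pairs(interview_ids, team):
    """Rotate through every possible pair so that no two coders work only together."""
    rotation = cycle(combinations(team, 2))
    return {interview: next(rotation) for interview in interview_ids}

# Regular coding: twelve hypothetical interviews spread over the six pairs.
schedule = assign_pairs([f"S-{n:02d}" for n in range(1, 13)], coders)
pair_counts = Counter(schedule.values())

print(load)
print(pair_counts)
```

A strict rotation like this ignores the scheduling constraints mentioned above; in practice a team might start from such a schedule and swap assignments by hand while checking that every pair still codes some interviews together.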
Rotating members of the pair helps to prevent “coding drift,” which occurs over the course of a long coding effort when one or more coders start to apply codes in an increasingly idiosyncratic manner. Occasional general data checks of all coded interviews can also help catch the wayward evolution of codes. Alternatively, the wayward codes these checks identify sometimes prove to be better than the approved ones. If this happens, the new code can either be added to the coding manual or substituted for an existing code. In either case, changing the coding manual should be done only after careful consideration, as it requires reviewing and amending all completed coding to bring it in line with the new codes.

After each member of the coding pair has coded an interview, they meet to compare individual coding choices and come to agreement on the final coding for the interview. Printing line and page numbers on the interview transcripts facilitates such crosschecking. Disagreements are discussed in these pairs, and any that the pair is unable to resolve are brought to a weekly consensus meeting, which all pairs of coders, as well as the project manager, attend. Data that cannot be coded after being discussed in these consensus meetings are omitted from analysis.

Using the MLCC method, interviews are coded in three levels or stages. The objective of first-level coding is to generate standard, comparable answers to basic questions asked during each interview. The objective of second-level coding is to generate a series of vignettes characterizing the central experiences, ideas, and issues informing each interviewee’s perceptions. When examined together through N6, first- and second-level coding blend a structured and comparative analytic lens with a nuanced representation of each research participant. Third-level coding uses the first two layers of coding to theorize about the themes that emerge from the interviews.
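The crosscheck that a coding pair performs before a consensus meeting can be illustrated with a short Python sketch. It assumes, purely for illustration, that each coder's work is recorded as a mapping from a (page, line) position in the transcript to the set of codes applied there, using codebook numbers in the style of Box 3. The function name and the sample data are hypothetical, and the sketch only lists disagreements; resolving them remains a matter for discussion.

```python
def find_disagreements(coding_a, coding_b):
    """Return {position: (codes_a, codes_b)} wherever two coders differ.

    A position coded by only one coder counts as a disagreement.
    """
    disagreements = {}
    for position in sorted(set(coding_a) | set(coding_b)):
        codes_a = coding_a.get(position, set())
        codes_b = coding_b.get(position, set())
        if codes_a != codes_b:
            disagreements[position] = (codes_a, codes_b)
    return disagreements


# Hypothetical first-level coding of one transcript by two coders.
# Keys are (page, line); values are codebook numbers.
coder_1 = {(1, 12): {"7 1"}, (2, 4): {"8 1"}, (2, 9): {"9 1"}}
coder_2 = {(1, 12): {"7 1"}, (2, 4): {"8 1"}, (2, 9): {"9 3"}, (3, 1): {"9 5"}}

to_discuss = find_disagreements(coder_1, coder_2)
for (page, line), (a, b) in to_discuss.items():
    print(f"page {page}, line {line}: coder 1 {sorted(a)} vs coder 2 {sorted(b)}")
```

Keying the comparison on page and line numbers mirrors the practice of printing those numbers on the transcripts.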
While requiring considerable organizational effort, the method has several advantages, foremost among them creating a high level of consistency, or validity, across interviews. This process also creates a collective memory of how codes have been applied, a regular forum that catches problem codes earlier than might be the case if coders interact less frequently, and a handful of people intimately acquainted with the data and interested in proposing ideas for its analysis.

Analysis

Once coding has been completed, the codes need to be entered into the qualitative data analysis computer program. This can even be done during the preliminary coding phase if the program will be used for codebook development. Most software programs allow for online coding, but pair coding and consensus meetings require that paper versions of coded documents be available. Our typical strategy is to start on paper and then enter coding into the program only after it has been finalized. Each program has different logistics for entering the codebook and the coded interviews, but in general, what these programs do is create a “filing cabinet” that stores the interviews and the coding. This filing cabinet organizes material in myriad ways: by interview, by code, or by demographic information. In response to different search commands the program retrieves any combination of “files,” or sections of coded text. Some programs can also use the existence of particular coding to create matrices that can be exported to various statistical software packages. It is the automated data retrieval function of these programs that so benefits qualitative analysis by allowing researchers to retrieve all data coded to a particular code and to examine relationships between different segments of coded data.
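The “filing cabinet” idea can be made concrete with a small Python sketch. It is not the interface of N6, ATLAS.ti, or HyperRESEARCH; the class, the function names, and all of the data are invented for illustration. The sketch shows the two operations described above: retrieving every passage filed under a code (optionally restricted by demographic information) and building an interview-by-code matrix of the kind that could be exported to a statistical package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedSegment:
    interview_id: str
    code: str
    text: str


def retrieve(segments, code, demographics=None, where=None):
    """Return segments filed under `code`, optionally filtered by a
    predicate applied to the interviewee's demographic record."""
    hits = [s for s in segments if s.code == code]
    if where is not None:
        hits = [s for s in hits if where(demographics[s.interview_id])]
    return hits


def presence_matrix(segments, codes):
    """Build an interview-by-code 0/1 matrix suitable for export."""
    interviews = sorted({s.interview_id for s in segments})
    seen = {(s.interview_id, s.code) for s in segments}
    return {i: [int((i, c) in seen) for c in codes] for i in interviews}


# Invented data, for illustration only.
demographics = {
    "int01": {"age": 20, "gender": "F"},
    "int02": {"age": 47, "gender": "M"},
    "int03": {"age": 21, "gender": "F"},
}
segments = [
    CodedSegment("int01", "sexuality concerns", "I did not want to bring it up."),
    CodedSegment("int01", "need to know", "Only my doctor should see it."),
    CodedSegment("int02", "need to know", "The nurse has to know, I suppose."),
    CodedSegment("int03", "sexuality concerns", "It felt too awkward to ask."),
]

young = retrieve(segments, "sexuality concerns", demographics,
                 where=lambda d: d["age"] <= 22)
matrix = presence_matrix(segments, ["sexuality concerns", "need to know"])
```

A retrieval like this only locates passages; interpreting them still requires returning to the transcripts.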
For example, a search of our medical confidentiality interviews for all passages coded to “sexuality concerns” might return primarily responses from young interviewees. A second search on these passages for the context in which these concerns were expressed might suggest they are most common among college-age women attending a university health service. These data are useful and could be reported as a quantitative statement, such as: “Three fourths of women who reported having hesitated to discuss sexuality concerns with health care practitioners were college students between the ages of 18 and 22 who used the college health service for their care.” However, ending there would miss the point of semi-structured interview data analysis. The software has simply located the respondents and passages where sexuality was characterized as a sensitive topic. Complete analysis entails going back to the transcripts and examining the selected passages in the context of the interview to discern whether the comments relate to one of the thematic codes, or whether they suggest ideas for additional analyses. Given the initial finding that younger women in university health service settings had a high number of “sexuality concerns,” one might, for example, query the status or gender of the health care practitioner seen by these women, what the women expressed as their concerns in this setting (if these factors have been coded), and how those concerns related to other themes in the interview. As patterns start to emerge linking concerns, setting, age, and gender, hypotheses about the relationships between age and gender relations in health care might be generated that can be tested on broader segments of the sample or on the whole sample. The point, however, is always to reach back into the interview, at least to the coded passages if not to lengthier exchanges, with the goal of situating patterns in the context of the respondent’s story as a whole.
These forays into the data also provide the opportunity to collect exemplary quotes that can be used to illustrate the resulting presentation or article.

Before beginning analysis, it is important to devise an overall plan that includes steps needed to examine a set of relationships between thematic codes. Researchers can and typically do deviate from such a plan. Still, having a written version of a strategy helps to highlight when one has departed from it, which encourages examining the reasons for doing so and keeping track of why old ideas were discarded and new ones proposed. Also, most qualitative data analysis programs lack an easy or effective way to keep track of the order in which analyses were conducted, so it is the responsibility of the analyst to do so.

Summary

Empirical research is increasingly gaining an equal footing with philosophical analysis in bioethics inquiry. Among the myriad methods brought to bear in this work, semi-structured interviewing is a reliable and flexible means to gather data. The semi-structured interview allows for comparison across subjects as well as the freedom to explore what distinguishes them. Semi-structured interviews are useful for examining the complex moral issues that bioethics confronts. The give and take of this method allows the interviewer to follow the subject’s lead, within the parameters of the study.

The volume of the data acquired through semi-structured interviews entails, however, a trade-off, usually in sample size. Given the demands of the method, including its time-consuming coding, and the difficulty of adapting a single interview guide to widely variant populations, the generalizability of findings will always be somewhat limited. Additionally, it can be difficult for interviewers to repeatedly hear lengthy personal stories that are distressing, such as those related by women about a breast cancer diagnosis or confidentiality breaches.
Research findings from studies relying on semi-structured interviews can be considered problematic for policymakers because results are not easily generalizable, given the uniqueness of the real-world situations from which the data are drawn (Koenig, Back, & Crawley, 2003). Similarly, for clinicians looking for strategies to improve practice, the absence of easily digested bullet points in many qualitative studies may lead readers to ignore such research. However, these limitations are offset by the distinct strengths of this method, including detailed, in-depth, and unanticipated responses. Such data can often help identify novel factors or explain complex relationships, which can contribute substantially to the understanding of complex ethical issues, policy assessment, decision-making, and development of interest in bioethics.

REFERENCES

Back, A., Starks, H., Hsu, C., Gardon, J., Bharucha, A., & Pearlman, R. (2002). Clinician-patient interactions about requests for physician-assisted suicide. Archives of Internal Medicine, 162, 1257–1265.
Britten, N. (1995). Qualitative research: Qualitative interviews in medical research. BMJ, 311, 251–253.
Claes, E., Evers-Kiebooms, G., Boogaerts, A., Decruyenaere, M., Denayer, L., & Legius, E. (2003). Communication with close and distant relatives in the context of genetic testing for hereditary breast and ovarian cancer in cancer patients. American Journal of Human Genetics, Part A, 116(1), 11–19.
Creswell, J. (1994). Chapter 5: Questions, objectives, and hypothesis. In: Research design: Qualitative and quantitative approaches. London: Sage.
Curtis, S., Gesler, W., Smith, G., & Washburn, S. (2000). Approaches to sampling and case selection in qualitative research: Examples in the geography of health. Social Science and Medicine, 50, 1001–1014.
Eliott, J., & Olver, I. (2003). Perceptions of ‘good palliative care’ orders: A discursive study of cancer patients’ comments. Journal of Palliative Medicine, 6(1), 59–68.
Featherstone, K., & Donovan, J. (1998). Random allocation or allocation at random? Patients’ perspectives of participation in a randomised controlled trial. British Medical Journal, 317(7167), 1177–1180.
Forrest, K., Simpson, S., Wilson, B., van Teijlingen, E., McKee, L., Haites, N., & Matthews, E. (2003). To tell or not to tell: Barriers and facilitators in family communication about genetic risk. Clinical Genetics, 64, 317–326.
Fowler, F. (1992). How unclear terms affect survey data. Public Opinion Quarterly, 56, 218–231.
Hallowell, N., Ardern-Jones, A., Eeles, R., Foster, C., Lucassen, A., Moynihan, C., & Watson, M. (2005). Communication about genetic testing in families of male BRCA1/2 carriers and non-carriers: Patterns, priorities and problems. Clinical Genetics, 67(6), 492–502.
HyperRESEARCH, v2.6. (2003). Boston: ResearchWare, Inc.
Jenkins, G., Merz, J., & Sankar, P. (2005). A qualitative study of women’s views on medical confidentiality. Journal of Medical Ethics, 31(9), 499–504.
Kass, N., Hull, S., Natowicz, M., Faden, R., Plantinga, L., Gostin, L., & Slutsman, J. (2004). Medical privacy and the disclosure of personal medical information: The beliefs and experiences of those with genetic and other clinical conditions. American Journal of Medical Genetics, 128, 261–270.
Kelly, B., Burnett, P., Pelusi, D., Badger, S., Varghese, F., & Robertson, M. (2002). Terminally ill cancer patients’ wish to hasten death. Palliative Medicine, 16, 339–345.
Koenig, B., Back, A., & Crawley, L. (2003). Qualitative methods in end-of-life research: Recommendations to enhance the protection of human subjects. Journal of Pain and Symptom Management, 25(4), S43–S52.
Liede, A., Metcalfe, K., Hanna, D., Hoodfar, E., Snyder, C., Durham, C., Lynch, H., & Narod, S. (2000). Evaluation of the needs of male carriers of mutations in BRCA1 or BRCA2 who have undergone genetic counseling. American Journal of Human Genetics, 67(6), 1494–1504.
Marshall, M. (1996). Sampling for qualitative research. Family Practice, 13(6), 522–525.
Maxwell, J. A. (1996). Chapter 6: Validity: How might you be wrong? In: Qualitative research design: An interactive approach (pp. 86–98). Thousand Oaks: Sage.
Mechanic, D., & Meyer, S. (2000). Concepts of trust among patients with serious illness. Social Science and Medicine, 51, 657–668.
Mishler, E. (1986a). Chapter 3: The joint construction of meaning. In: Research interviewing: Context and narrative. Boston: Harvard University Press.
Mishler, E. (1986b). Chapter 4: Language, meaning, and narrative analysis. In: Research interviewing: Context and narrative. Boston: Harvard University Press.
Morse, J. (1991). Strategies for sampling. In: J. Morse (Ed.), Qualitative nursing research: A contemporary dialogue (pp. 127–145). Newbury Park, CA: Sage.
Muhr, T. (2004). User’s manual for ATLAS.ti. Berlin: Scientific Software Development GmbH.
N6 Non-numerical Unstructured Data Indexing Searching and Theorizing Qualitative Data Analysis Program, v6 (2002). Melbourne, Australia: QSR International Pty Ltd.
Pang, K. (1994). Understanding depression among elderly Korean immigrants through their folk illnesses. Medical Anthropology Quarterly, 8(2), 209–216.
Press, N., Yasui, Y., Reynolds, S., Durfy, S., & Burke, W. (2001). Women’s interest in genetic testing for breast cancer susceptibility may be based on unrealistic expectations. American Journal of Medical Genetics, 99(2), 99–110.
Sanchez, M. (1992). Effects of questionnaire design on the quality of survey data. Public Opinion Quarterly, 56, 206.
Sankar, P., & Jones, N. (2005). To tell or not to tell: Primary care patients’ disclosure deliberations. Archives of Internal Medicine, in press.
Schensul, S., Schensul, J., & LeCompte, M. (1999). Chapter 10: Ethnographic sampling. In: Essential ethnographic methods: The ethnographer’s toolkit. Walnut Creek: AltaMira Press.
Stevens, T., & Ahmedzai, S. (2004). Why do breast cancer patients decline entry into randomised trials and how do they feel about their decision later: A prospective, longitudinal, in-depth interview study. Patient Education and Counseling, 52(3), 341–348.
Weiss, R. (1994). Chapter 3: Preparation for interviewing. In: Learning from strangers. New York: The Free Press.

SECTION III: QUANTITATIVE METHODS

SURVEY RESEARCH IN BIOETHICS

G. Caleb Alexander and Matthew K. Wynia

Empirical Methods for Bioethics: A Primer. Advances in Bioethics, Volume 11, 139–160. Copyright © 2008 by Elsevier Ltd. All rights of reproduction in any form reserved. ISSN: 1479-3709/doi:10.1016/S1479-3709(07)11007-4

ABSTRACT

Surveys about ethically important topics, when successfully conducted and analyzed, can offer important contributions to bioethics and, more broadly, to health policy and clinical care. But there is a dynamic interplay between the quantitative nature of surveys and the normative theories that survey data challenge and inform. Careful attention to the development of an appropriate research question and survey design can be the difference between an important study that makes fundamental contributions and one that is perceived as irrelevant to ethical analysis, health policy, or clinical practice. This chapter presents ways to enhance the rigor and relevance of surveys in bioethics through careful planning and attentiveness in survey development, fielding, and analysis and presentation of data.

INTRODUCTION

Surveys are a common method in empirical bioethics. They cannot say what is right or wrong, but they can reflect, within bounds, what people are actually doing or thinking. They can also provide information concerning whether consensus exists about a given issue. As a result, they can both inform ethical analysis and be useful in policy making and clinical practice.

Consider three examples of surveys examining ethically important topics. The first, a survey of healthy volunteers and firefighters regarding their attitudes towards quantity and quality of life, indicates that approximately one-fifth would choose radiation instead of surgery to manage laryngeal cancer in an effort to preserve voice, even though doing so would lead to a lower likelihood of survival (McNeil, Weichselbaum, & Pauker, 1981). The second, a survey of family members, demonstrates that their ability to accurately predict a loved one’s preference for life-sustaining treatment is quite limited (Seckler, Meier, Mulvihill, & Paris, 1991). The third, a survey of physicians, suggests that nearly two-fifths of respondents report having used one of three tactics to deceive insurance companies in order to help their patients obtain coverage for restricted services (Wynia, Cummins, VanGeest, & Wilson, 2000).

These examples illustrate the diversity of ethically important topics that can be explored by surveys, as well as the various purposes they can serve (Table 1). The first survey, examining the tension that may exist between treatments that impact both quality and length of life, demonstrates that length of life is not always paramount, and some people are willing to make trade-offs in “cure-free survival” for improved quality of life such as preservation of voice. Many surveys of ethically important topics are of this type; they are useful in the process of developing, supporting or refuting normative empirical claims (e.g. the claim that most people value quantity over quality of life).

The second survey provides important evidence of limitations in the ability of surrogates to predict what family members without decision-making capacity would want in situations of life-threatening illness.
This has profound and direct clinical implications: physicians should strive to ensure that patients’ wishes are known, work with family members to distinguish between their own wishes and those of the patient whose best interests they are trying to serve, and be cautious about relying on “substituted judgment” to truly reflect patients’ wishes. Many other noteworthy surveys examine similarly important topics in clinical ethics, such as those exploring the accuracy of physicians’ prognoses for their terminally ill patients (Christakis & Lamont, 2000).

The final example examines insurance company deception. It demonstrates regulatory failures or the inoperability of common normative assumptions that physicians should not deceive on behalf of their patients, whether to protect a third party (Novack et al., 1989) or to secure reimbursement for services (Wynia et al., 2000). Such findings may lead to policy changes to make the adjudication of appeals to managed care organizations as transparent and fair as possible and, equally important, these conclusions tell us something about how physicians view the equity of the current health insurance system.

Table 1. Select Examples of Surveys Examining Ethically Important Topics.

Group: Physicians
  Novack et al. (1989)
    Objectives: To assess physicians’ attitudes toward the use of deception in medicine
    Sample: 211 practicing physicians
    Main findings: Most physicians were willing to misrepresent a screening test as a diagnostic test to get insurance payment; one-third indicated that they would give incomplete or misleading information to a patient’s family if a mistake caused the patient’s death
  Wynia et al. (2000)
    Objectives: To examine physicians’ attitudes toward and frequency of manipulation of reimbursement rules to obtain insurance coverage for services
    Sample: 720 practicing physicians
    Main findings: Thirty-nine percent of physicians used one of the three techniques to manipulate reimbursement rules during the previous year
  Cohen, Fihn, Boyko, Jonsen, and Wood (1994)
    Objectives: To assess physicians’ attitudes toward physician-assisted suicide and euthanasia
    Sample: 938 physicians in one state
    Main findings: Forty-eight percent of physicians thought euthanasia is never ethically justified and 39% believed physician-assisted suicide is never ethically justified. Fifty-four percent believed euthanasia and 53% believed physician-assisted suicide should be legal in some situations

Group: Patients or family
  Seckler et al. (1991)
    Objectives: To determine accuracy of family members’ and physicians’ predictions of patients’ wishes to be resuscitated
    Sample: 70 patients and their family members and physicians
    Main findings: Family members had 88% and 68% agreement rates with patients on each scenario, while physicians had only 72% and 59% agreement rates with patients on each scenario, respectively
  Ganzini et al. (1998)
    Objectives: To determine attitudes toward assisted suicide among patients diagnosed with amyotrophic lateral sclerosis (ALS) and their caregivers
    Sample: 100 patients with ALS and 91 family caregivers
    Main findings: Fifty-six percent of patients would consider assisted suicide, and 73% of caregivers and patients have the same attitude toward assisted suicide

Group: General public
  McNeil et al. (1981)
    Objectives: To explore preference for laryngectomy (high survival rate, high loss of speech) versus radiation (low survival rate, low loss of speech) for throat cancer
    Sample: 37 healthy volunteers, 12 firefighters, 25 middle and upper management executives
    Main findings: Twenty percent of volunteers would choose radiation to preserve quality of life over quantity

Group: Multiple groups
  Ubel et al. (1996)
    Objectives: To determine whether individuals make cost-effective decisions about medical care given budget constraints
    Sample: 568 prospective jurors, 74 medical ethicists, 73 decision-making experts
    Main findings: A little over half of the jurors and medical ethicists made non-cost-effective medical decisions when the non-cost-effective choices offered more equity among patients
  Bachman et al. (1996)
    Objectives: To assess physician and citizen attitudes toward physician-assisted suicide
    Sample: 1119 physicians and 998 members of the general public
    Main findings: Most physicians and members of the public preferred legalizing physician-assisted suicide over banning it
  Degner and Sloan (1992)
    Objectives: To determine how involved cancer patients and members of the general public would want to be in their treatment decisions
    Sample: 436 newly diagnosed cancer patients and 482 members of the public
    Main findings: Most patients wanted physicians to make treatment decisions on their behalf; most members of the public wanted to decide on their own treatment if they developed cancer

Despite the many purposes of surveys in bioethics, there are important limitations to the data they produce, including the authenticity of the responses and how far they can be generalized. In addition, the number of people doing X does not necessarily mean that X is desirable or undesirable. Heterogeneity of an attitude or behavior is almost always present, warranting consideration at the outset of how this will affect data analysis, conclusions, and presentation of an empirical inquiry (Lantos, 2004).

FINDING A QUESTION

A successful survey begins with identifying a good research question (Table 2). Identifying such a question is not trivial. The literature is vast and certain areas of inquiry in ethics, such as end-of-life care, have been extensively explored. However, the tremendous effort to examine ethical dimensions of end-of-life care speaks to the perceived importance of the topic, and an investigator with a strong interest in a well-explored area should not be dissuaded simply because others have examined the topic already. Big problems benefit from numerous approaches, and none are solved by one study alone. On the other hand, the best research questions are not only feasible but also novel. Whether or not a topic has previously been studied, any survey topic should also be relevant, either contributing to the development and understanding of normative theories or more directly informing health policy or clinical care.

Table 2. Select Examples of Ethically Important Topics Suited for Survey-Based Analysis.

Domain: Principles of medical ethics
  Truth-telling: Cancer patient end-of-life care
  Justice: Resource allocation; duty to treat
  Non-malfeasance: Physician involvement in capital punishment
Domain: Ethically concerning behaviors
  Deception: Physician support for deception of third-party payers
  Poor quality care: Non-adherence to appropriate guidelines
  Unprofessional behavior: Sexual relations with patients
Domain: Clinically charged areas
  Prognostication: Accuracy of physicians’ prognoses
  Euthanasia: Endorsement of euthanasia to treat suffering
  Surrogate judgment: Predictive fidelity of surrogate judgments
  Informed consent: Adequacy of process of informed consent
Domain: Trade-offs
  Quantity versus quality of life: Willingness to forgo length to enhance quality of life
  Advocacy versus honesty: Endorsement of insurance company deception

Steps to identifying a good research question include a comprehensive review of the literature and existing theories, findings from qualitative studies, and discussion with content experts. Furthermore, the researcher must maintain a careful balance of creative intellectual meandering, considering multiple questions, and focusing on a specific topic.
The importance of working from a theoretical framework and hypothesis varies somewhat depending upon the type of question being examined; detailed conceptual models and hypotheses are especially crucial for studies examining predictors or determinants of behaviors or attitudes. On the other hand, simple descriptive analyses of the frequency of an important outcome may demand a less well-developed theoretical framework to make the results interesting and useful. The question to ask, as in any research, is: “So what?” Who might care, and why, if you find what you expect to find, or if you do not? What ethical, policy, and/or clinical implications would follow from having answers to the questions you hope to pose?

STUDY DESIGN

There are a host of issues to consider in survey design and analysis, many of which this chapter will explore only briefly. There are several excellent references for further reading on general survey design, conduct, and analysis (Aday, 1996; Dillman, 2000; Fink, 1995). Here, we will focus on a few aspects of survey research that are most likely to arise during surveys on ethical issues.

Who is to be Surveyed?

Most bioethics surveys are of patients, family members, clinicians, others involved in health care (chaplains, social workers, and so on), or the general public. The population selected should be driven by the research question and can inform survey development. For example, health professionals may have high literacy levels and the capacity to respond to complex survey designs due to their familiarity with standardized testing. On the other hand, health professionals also have a low threshold of tolerance for any time-consuming activity, suggesting that short surveys will be better received than longer ones. Certain populations of patients might have lower literacy or might not speak English; both would have a profound effect on survey design and administration.
Although methodologically more complex, there are times when comparing or contrasting two different populations, rather than examining a single group, may be advantageous. Such an effort allows a broader and more rigorous approach to describing and reaching conclusions about attitudes or behaviors and may highlight important similarities and differences between groups of decision-makers, such as patients and families (Ganzini et al., 1998) or patients and clinicians (Alexander, Casalino, & Meltzer, 2003).

Sampling

Once a target group to be surveyed has been selected, it is important to consider how the sample within this group will be selected; this is called “sampling” or “sample design.” The degree to which those who receive the survey reflect the larger group from which they are drawn will affect the generalizability of survey findings, and hence the survey’s relevance and usefulness. At the same time, however, important subgroups might be overlooked if special care is not taken in the survey’s sample design. Ethical issues might be rare, or might be most relevant to certain subsets of survey recipients (e.g. those who recently tested positive on a genetic screening test).

There are two main types of sampling designs: probability and non-probability designs. The most common probability sampling design is a simple random sample, in which every member of the universe has the same chance of selection, so that the expected share of any sub-group in the survey sample is proportional to the frequency of that sub-group within the universe from which the sample is derived. In some cases, more sophisticated probability sampling methods may be applied to enhance the rigor of the survey protocol. For example, a stratified random sample allows for sampling of different sub-groups or strata within the sample universe, which is particularly helpful when different strata of interest are not equally represented within the sample universe.
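The difference between the two probability designs described above can be sketched in Python. The universe, the stratum labels, and the sample sizes are hypothetical, and the sketch is a deliberately simplified illustration rather than a survey-grade procedure. Alongside the stratified sample it returns the inverse sampling fraction for each stratum (units in the stratum divided by units drawn), which is one common starting point for the sample weights used to correct for deliberate over-sampling.

```python
import random


def simple_random_sample(population, n, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Draw a fixed number of units at random within each stratum.

    Returns the sample and, per stratum, the inverse sampling fraction
    (units in the stratum divided by units sampled from it).
    """
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample, inverse_fraction = [], {}
    for name, units in strata.items():
        n = n_per_stratum[name]
        sample.extend(rng.sample(units, n))
        inverse_fraction[name] = len(units) / n
    return sample, inverse_fraction


# Hypothetical universe: 1,000 people, 100 of them in a rarer stratum.
rng = random.Random(0)
population = [
    {"id": i, "stratum": "rare" if i < 100 else "common"} for i in range(1000)
]

srs = simple_random_sample(population, 100, rng)
strat, inverse_fraction = stratified_sample(
    population, lambda u: u["stratum"], {"rare": 50, "common": 50}, rng
)

rare_in_srs = sum(u["stratum"] == "rare" for u in srs)      # varies by draw
rare_in_strat = sum(u["stratum"] == "rare" for u in strat)  # fixed by design
# Summing the inverse fractions over the sample recovers the universe size.
represented = sum(inverse_fraction[u["stratum"]] for u in strat)
print(rare_in_srs, rare_in_strat, inverse_fraction, represented)
```

In the simple random sample the rarer stratum contributes about a tenth of the respondents on average, whereas the stratified design fixes its share in advance at the cost of needing to know each unit's stratum beforehand.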
For instance, a survey of patients in the intensive care unit (ICU) might be designed to over-sample (i.e., include more than would be chosen by chance) patients who are younger than 55 years, so that it can provide more information about this particular population, which is relatively uncommon in the ICU. Although the use of stratified probability sampling may improve efficiency and allow for greater representativeness and accuracy for selected populations, it introduces increased expense and complexity into the project and also depends upon a priori information about the populations (or strata) of interest.

For analysis and reporting, most surveys using probability designs will benefit from developing sample weights. How to develop weights is beyond the scope of this chapter, but suffice it to say that weighting data allows one to report the results as if the whole sample had been drawn at random and was exactly representative of the universe of potential respondents. Weights allow for this by accounting for stratified sampling designs, as well as by taking into account potential sources of bias among survey respondents, such as differential response rates among different subsets of participants (Korn & Graubard, 1999).

There are many types of non-probability sample designs. For example, convenience samples comprise populations that are surveyed because of ease and availability, such as a sample drawn from patients and family members passing through a hospital cafeteria. Purposive samples and quota samples consist of subjects surveyed based on pre-specified criteria; in the latter case subjects are sampled until certain quotas are fulfilled, such as a certain number of subjects of a given age range or race.
Snowball samples use survey participants to recruit other potential participants, such as by asking physicians in retainer medical practices to name other physicians in similar practices who might be contacted to participate in the survey (Alexander, Kurlander, & Wynia, 2005).

Non-probability survey designs are neither inferior nor superior to probability designs overall; rather, each design has strengths and limitations and should be chosen based on the research question, available resources, and the degree to which external validity of the findings is important. External validity refers to how generalizable the data are to a larger (external) population. In some cases, survey findings only need to apply to a limited population of special interest. For example, a survey of physicians who practice in pediatric ICUs was conducted to assess whether family presence during resuscitation attempts was more or less acceptable to them, and desired by patient families (Kuzin et al., 2007). Whether these physicians’ views were representative of the general pediatrician population was not especially important, since most pediatricians outside the ICU have very little experience with acute resuscitations (Tibbals & Kinney, 2006).

On the other hand, samples that are too small or poorly reflective of the population from which they were drawn should not be used to derive conclusions about the broader universe of subjects. For example, researchers studying the coping skills of parents of children hospitalized for chronic diseases might err in generalizing the results of their study to parents of children hospitalized for acute illness; these two groups of parents may differ in important ways with respect to the research question at hand.
Or a study of family physicians’ views regarding home childbirth might be inappropriately generalized to obstetricians and general practitioners, even though important differences may exist among these specialties with regard to the topic of interest.

Survey Mode

Most surveys are administered in person or conducted by U.S. mail, telephone, or, increasingly, the Internet. Each survey modality has strengths and limitations (Table 3). A particular survey mode (e.g. telephone) should be selected with care, since it may influence both the response rate and patterns of responses. In addition, some populations may be more or less responsive to certain types of surveys. For example, those with low literacy will be less likely to respond to a written survey, and telephone surveys only reach people owning telephones. Recent trends in ownership of cellular versus landline phones also should be taken into account, such as the younger ages of people who own a cell phone but not a home phone (Tuckel & O’Neill, 2005).

Table 3. Pros and Cons of Different Survey Modes.

Mode: Phone
Commonly targeted respondents: General public
Benefits:
- Inexpensive
- Random-digit dial allows broad sampling of general public
- Avoids problems of illiteracy
- Rapid data entry with Computer Assisted Telephone Interviewing (CATI) software
Limitations:
- High non-response rates
- Misses those without phones
- Prevents use of visual aids as well as visual cues from respondent

Mode: In-person
Commonly targeted respondents: Patients; general public
Benefits:
- Allows for sampling of patients or family members in clinical setting or public space
- Offers use of visual aids and visual cues from respondents
- Allows means of assisting with survey completion
Limitations:
- Very expensive
- Use in clinical settings over-samples frequent users of care
- Use in public settings generally based on convenience sample of respondents who may differ from nonrespondents

Mode: U.S. Mail
Commonly targeted respondents: Physicians
Benefits:
- Common method to reach physicians
- Well-established protocols and mailing lists
- Some ability to track and compare respondents with nonrespondents
Limitations:
- Expensive
- Use of survey waves complicates using anonymous design

Mode: Internet
Commonly targeted respondents: Physicians; Internet users
Benefits:
- Very inexpensive
- Automatic entry of data as they are collected
- Fast data collection
Limitations:
- Limited ability to generalize findings to non-Internet users
- Email address lists often contain wrong addresses
- High non-response rates

Survey Design

Good survey questions are simple, clear, and often very easy to answer. However, survey development is a time-consuming task, and formulating specific survey questions to be “just right” may be quite difficult. Nevertheless, the time and effort required to create a good survey are well spent. Poor wording and design of surveys do not just frustrate respondents; they also lead to difficulties in item interpretation, higher rates of nonresponse and, as a result, more difficulty getting the results analyzed, published, and used.

Survey questions (often referred to as “items”) consist of three parts: (1) the introduction to the question, (2) the question itself (question stem), and (3) the responses to the question (response frame). This structure is helpful to understand, both for conceptual clarity in communicating with others and during the process of survey design.

The introduction to a question or set of questions is crucial because it orients the respondent as to what is expected. Framing the issue is often critically important in ethics surveys (see section on Precautions with Sensitive Topics). For example, respondents can be put at ease through reassurance (e.g. “there is no right or wrong answer”) or by acknowledging the challenge of the question (e.g. “please balance both the patients’ quality and length of life”).
Survey questions are either forced-choice or, less frequently, open-ended. The benefits of an open-ended question, where a respondent writes in a response (or provides the verbal equivalent during an interview), are that it is less leading and elicits a broader range of responses than a forced-choice question. Drawbacks are that the coding of such questions is technically complicated due to illegible (on self-administered surveys) or long-winded responses, and is conceptually complicated due to responses that are unclear in meaning or intent (regardless of survey mode). In addition, response rates tend to be lower for open-ended questions.

A good survey question uses simple words, presents a simple idea, and uses as few words as possible to do so. Each survey question should contain a single idea and ask a single question. So-called “double-barrelled” questions, where two questions are asked at once, are generally to be avoided because answers to these questions are very difficult to interpret (for example, “How often have you been sued or been afraid you were going to be sued?”). The challenge with question design is to navigate certain tensions that are inherent in this process. For example, sometimes more words are needed to precisely explain a question or complicated idea, yet more words may make a survey item more cumbersome to read and understand. There is a large literature devoted to the psychology of survey response, and readers interested in advanced study in this area should refer to one of several excellent references (Aday, 1996; Fink, 1995; Tourangeau, Rips, & Rasinski, 2000).

Finally, the response options available for answering a survey question are important to consider. Response frames can be simple (e.g. yes/no or agree/disagree) or more complex (e.g. excellent, very good, good, fair, or poor) and sometimes it is necessary to create unique response frames that are specific to certain questions.
The response frame of an item is crucial to making the question easy for the respondent to answer accurately. It should be developed through an iterative process of piloting and pre-testing, so that the resulting response frame is comprehensive, comprehensible, and, in the case of best choice answers, mutually exclusive.

In addition to giving careful attention to the design of the introductions, question stems, and response frames, care should be given to other survey design factors that may influence data quality. These include the survey aesthetics, framing (e.g. survey title), item grouping, and item order. For example, although selecting the title of a survey may seem a mundane or unimportant task, even this can make a considerable difference in response rates. Consider creating a title that is engaging and that will draw the recipient in (e.g. “Managed care: What do physicians think?”), but avoid titles that might alienate recipients or that suggest a bias in the research (e.g. “Are doctors aware of widespread inequities in American health care?”). Similarly, it is important to consider the ordering of items. Generally, it is advisable to avoid placing overly sensitive or conceptually difficult items early in the survey or first in a series of questions, to use open-ended prior to forced-choice questions if they are about similar topics, and to group items that have the same response frame together so as to maximize the flow and readability of the survey. In addition, it is helpful to number items within each response frame to maximize the accuracy and ease of data entry.

SURVEY PLANNING

A considerable amount of work on survey analysis, sampling, and projections of costs should be conducted prior to fielding the survey instrument. This helps to minimize both the collection of data that are never used and the inadvertent omission of data that would have been useful.
Questions to consider in planning the survey include: What is (are) the main outcome(s) of interest? Will simple descriptive statistics of the frequency of an attitude or behavior suffice, or is the goal to conduct more detailed analyses of associations between different variables? If the latter, what is the dependent variable and what are the independent (predictor) variables of interest? What potential variables may confound the associations of interest? There is considerable value in developing “mock-up” tables of the expected results or lists of potential correlates prior to fielding the survey. This can help to locate gaps in the survey and allow careful consideration of how responses will be analyzed.

The costs of surveys should also be considered during planning. Methods can be tailored to better fit the available budget, for example, by modifying the survey mode, sample size, financial incentives, or rigor of survey development and statistical analyses. Many surveys of ethically important topics have been conducted on small budgets. However, if the budget allows, there are many organizations, often university-affiliated, that can be contracted to develop, test, or administer surveys. Similarly, firms exist that will provide survey support for telephone-based or Internet samples.

Ethics surveys demand special attention to response rates. Since they are often about sensitive topics, ethics surveys may suffer from poor response rates, and there may be important differences between respondents and nonrespondents (i.e., non-response bias, see below). These issues can raise questions about a survey’s relevance.
General efforts to improve response rates are of several types: (1) financial or non-financial incentives; (2) endorsements from opinion leaders or people of influence; (3) minimizing the burden of the survey; (4) personalizing the survey through efforts such as hand-written notes, adhesive stamps, or a human signature; and (5) persuasion about the importance of the topic and the respondents’ views. This last point can be accomplished, for example, through a particular way of framing the survey in the cover letter and through its title, or through the use of special delivery methods (e.g. Federal Express for a mailed survey). Of these five methods, the use of financial incentives has been studied most extensively. In general, studies of financial incentives suggest that the marginal benefit of larger financial incentives may be relatively small compared with the impact of smaller incentives (VanGeest, Wynia, Cummins, & Wilson, 2001). In addition, financial incentives, when used, are more effective when a small amount is offered upfront to everyone than when a lottery or an incentive paid upon survey completion is used (Dillman, 2000).

SURVEY DEVELOPMENT AND PRE-TESTING

After developing a general research question and theoretical framework, efforts should turn to identifying specific conceptual domains and factors to be explored within these domains. Qualitative data are often helpful to inform this process, and there are various ways to gather such data, including key informant interviews, focus groups, and ethnography (see chapters on qualitative methods). Such efforts are often invaluable in helping to identify important areas of inquiry, and may provide sufficient material for analyses that can take place in parallel with the quantitative focus of the survey. Finally, most surveys need to be piloted and pre-tested to enhance and ensure brevity, clarity, and measurement accuracy.
Accuracy consists of both the validity and reliability of the survey instrument. Validity is the degree to which a survey actually measures the phenomenon it purports to measure. In this regard, it is important to note that ethics surveys often attempt to measure phenomena (or “constructs”) that are complex and on which clear consensus may not exist. For instance, ethics surveys may explore the meaning of “consent” or the importance of “privacy” or “fairness” in health care. When assessing such challenging and obscure constructs, particular attention must be given to the validity of the survey to ensure that it measures what it claims to measure.

There are three main types of validity relevant to survey research: face validity, content validity, and construct validity. Face validity refers to whether the survey domains and items appear reasonable at face value. A common way to assess face validity is to share the instrument with relevant parties, such as patients, family members, and clinicians, and ask if they agree that the items measure what you intend them to measure.

Content validity refers to how well the items examined represent the important content of the domains of interest. To assess content validity one must ask, does the survey address all the facets of the issue in question or, on the other hand, does it include aspects that are unrelated to the issue? For example, in the development of their survey instruments for assessing privacy in hospitals, researchers asked a group of experts to evaluate the survey items by rating each item on a 1–10 scale where 1 represented items necessary to the issue and 10 represented items extraneous to the issue. The experts were also asked for ideas on any areas that might have been missed.
As this example shows, adequate content validity is usually achieved through the researchers’ comprehensive knowledge of the subject matter the survey is examining and is supported by careful review of the survey domains and items with experts in the field of inquiry. As “soft” as simple reliance on experts may seem, content validity is crucial to a good survey on an ethically important topic. Criticisms of ethics surveys frequently focus on items that the survey team did not ask, or on items that the team asserts are related to the central question but for which there is no strong outside consensus among experts that this is so.

Finally, construct validity refers to the degree to which one can safely make inferences from a survey to broader theoretical constructs operationalized in the survey. Construct validity is very difficult to prove in many ethics surveys, because the constructs in question are often quite complex. How does one prove that constructs such as “informed consent,” “concern for privacy,” “fear,” or a sense of “disrespect” or “mistrust” are accurately measured? Construct validity is assessed by determining that the measure in use is related in expected ways to other known measures of the construct in question. For instance, the results of a survey that purports to measure “pain” would be expected to correlate with other measures that are known to be associated with pain, such as sweating, rapid pulse, and asking for pain medication. The challenge in ethics surveys is often to determine, a priori, what are the expected correlates of the relevant constructs.

In addition to being valid, surveys should be reliable as well: they should get the same results each time they are used in the same circumstances.
The reliability of a survey can be assessed through repeated administrations to one individual (intra-observer, or test-retest reliability) and by assessments of a given event or practice across multiple individuals (inter-observer reliability) (see also the chapter on the use of vignettes). Statistical tests of reliability are well described in most basic biostatistics or clinical epidemiology textbooks.

SURVEY FIELDING AND DATA ENTRY

In any ethics survey, especially one using new items, there is considerable benefit in examining early responses as they come in. For instance, it may be helpful to perform some analyses on the first wave of survey responses. This may allow for the identification of potentially serious systematic flaws, such as respondents unintentionally skipping questions printed on the back of a page.

As data are collected, several methods can be used to ensure systematic initiation of data entry and analysis. Errors in data entry are almost impossible to avoid, but the rigor of data entry may be enhanced by double entry or randomly checking a subset of respondents for the frequency of incorrectly entered data, or both. Questions about how to code unclear responses, such as when a response is somewhat illegible or does not clearly fit the response frames offered, are inevitable. The researcher should anticipate these and work to treat them in a systematic and fair way that does not introduce unnecessary bias into the measurements. Data entry can be enhanced by the use of limited data fields and specialized software that may simplify tasks such as managing the database for complicated survey skip patterns. Data entry can also be simplified by minimizing the number of steps between survey query and response in the case of telephone or in-person surveys through the use of Computer Assisted Telephone Interviewing (CATI) software.
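The double-entry check described above can be sketched in a few lines. The example assumes that the same paper surveys were keyed twice, independently, and that each keying is stored as a mapping from respondent ID to item responses; the IDs, item names, and values are hypothetical, and this is only one simple way such a comparison might be organized.

```python
# Two independent keyings of the same paper surveys (hypothetical data).
# Keys are respondent IDs; values map item names to the entered response.
entry_a = {
    101: {"q1": 3, "q2": 5, "q3": 1},
    102: {"q1": 2, "q2": 4, "q3": 4},
    103: {"q1": 5, "q2": 1, "q3": 2},
}
entry_b = {
    101: {"q1": 3, "q2": 5, "q3": 1},
    102: {"q1": 2, "q2": 1, "q3": 4},  # q2 keyed differently
    103: {"q1": 5, "q2": 1, "q3": 3},  # q3 keyed differently
}


def find_discrepancies(a, b):
    """List (respondent_id, item, value_a, value_b) wherever the two keyings differ."""
    diffs = []
    for rid in sorted(a.keys() & b.keys()):
        for item in sorted(a[rid].keys() | b[rid].keys()):
            value_a, value_b = a[rid].get(item), b[rid].get(item)
            if value_a != value_b:
                diffs.append((rid, item, value_a, value_b))
    return diffs


discrepancies = find_discrepancies(entry_a, entry_b)

# Share of keyed fields on which the two entries disagree (fields counted from entry_a).
fields_checked = sum(len(items) for items in entry_a.values())
discrepancy_rate = len(discrepancies) / fields_checked

for rid, item, value_a, value_b in discrepancies:
    print(f"respondent {rid}, {item}: first entry {value_a}, second entry {value_b}")
print(f"{len(discrepancies)} of {fields_checked} fields disagree")
```

Each flagged field would then be resolved by returning to the original paper survey rather than by choosing one keying over the other.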
Even with these efforts, data cleaning is important to ensure that the data are of high quality; that is, data for given variables are within expected ranges (e.g. a survey item with a response frame from 1–5 should not have any 8s entered as responses) and the distributions of data make sense (e.g. an item where all respondents answered the same would raise suspicions of a data entry error).

BIASES AND RESPONDENT BURDEN

Respondent bias and burden are two of the most important considerations to guide survey development. Bias refers to any systematic tendency to over- or underestimate whatever is being measured. There are numerous types of bias that are important to consider. Socially desirable response bias, sometimes referred to as “yeah-saying,” is a special threat to the rigor of ethics surveys. It can be addressed in several ways, including sensitively wording questions, carefully designing the survey framing, paying attention to item ordering, ensuring protection of confidentiality (or, rarely, even ensuring anonymity), and using statistical methods that help to adjust for the likelihood of this bias. Recall bias may be present when asking about past events, and can be minimized by carefully framing the time period in question, such as by limiting the length of the retrospective period (e.g. “In the last week …”) or by using discrete events that are more likely to be accurately recalled (e.g. “At the last Grand Rounds you attended, ...”).

Non-response bias refers to bias introduced by systematic differences between respondents and non-respondents. That is, those who return the survey may be different in some relevant way from those who do not. This type of bias is a perennial challenge to survey development, fielding, and interpretation, and it is especially relevant to ethics surveys, which often touch on very sensitive or controversial topics. For instance, survey recipients who are especially affected by the survey topic (e.g.
malpractice) may be more, or less, likely to respond. In addition, the association between respondent burden and poor response rates should not be underestimated. Although low response rates do not mean that non-response bias is present, longer and more cumbersome surveys are less likely to be completed, and lower response rates, all other things being equal, will raise concerns of possible non-response bias.

In addition to seeking to improve response rates, there are four common methods used to address non-response bias. First, one can compare respondents with non-respondents on all known variables (the absence of differences suggests that non-response bias is less likely). Second, one can look for “response-wave bias” in surveys conducted by Internet or U.S. mail by exploring whether there is any association between the length of time until survey response and the primary outcome(s) of interest. Response-wave bias is based on an assumption that respondents who took a long time to respond to the survey, such as those responding to a third survey wave, somewhat resemble non-respondents in that they were less motivated to respond than their counterparts. The absence of any association between time to response and the primary outcome(s) of interest suggests that non-response bias is less likely. Third, the active pursuit of a subset of non-respondents may be helpful to ascertain the response frequencies to one or two key questions among this group. For example, in a survey of physicians’ support for capital punishment, researchers might be concerned about a significant non-response bias among the 45% of subjects who were non-respondents. To examine for this bias, the researchers might select a random 10% sample of non-respondents and call them by phone with one short question from the survey to find out whether their global beliefs about capital punishment are similar to those of survey respondents.
Finally, advanced statistical methods can be used, such as weighting of survey responses, in an effort to account for non-response bias.

Other biases are important to consider depending upon the unique circumstances of the project (e.g. interviewer bias, which introduces variation in survey response based on the characteristics of the interviewer administering the survey), but are less ubiquitous threats to survey validity than those discussed above.

PRECAUTIONS WITH SENSITIVE TOPICS

Many ethics surveys examine the prevalence of specific behaviors. One way to identify the frequency of any behavior, sensitive or not, is to directly question the respondent (e.g. “In the last week, how often did you …?”). However, the benefit of being able to report direct prevalence must be balanced with an acknowledgment that such reports are especially prone to socially desirable response bias and therefore may over- or underestimate the actual frequency of the behavior in question. Positive behaviors are likely to be over-reported, while negative behaviors are likely to be underreported. In the case of negative behaviors, direct questions can also alienate survey recipients, because direct questions about negative behaviors are often perceived as leading (e.g. “How often, if ever, do you kick your dog?”). As a result, direct questions may be most useful for non-sensitive topics or under circumstances in which modest misestimation of prevalence might not reduce the likely ethical, policy, or clinical impact of the survey results. For instance, even if only a few respondents directly report a very concerning behavior, it might be worth investigation (e.g. admission of illicit drug use among physicians).

An alternative to direct questions is to use indirect, or third party, questions, which can allow respondents to discuss sensitive topics without having to personally admit to socially undesirable or stigmatized behavior.
For example, instead of asking individual patients whether they have ever faked an illness to skip work, inquire as to whether they know of any colleagues who have done so. The trade-off is that such questions do not allow for direct assessment of the frequency of the behavior and hence, these questions too might not produce good estimates of prevalence. For instance, perhaps only one employee has skipped work on a medical excuse to go fishing, but many employees completing the survey know about this situation; in this case, the frequency of this behavior might be overestimated.

There are several methods to help maximize respondent honesty and comfort when completing both direct and indirect questions about sensitive topics. First, the way that each question is framed is crucial, and calming stems that acknowledge the legitimacy of different or controversial viewpoints and actions are helpful to allow respondents to answer honestly (e.g. “Patients often find health care frightening and stressful, and they handle this stress in many ways …” or “There are no right answers to the following questions …”). Question order can be used to one’s advantage as well: generally, questions that are more sensitive should be introduced later in the survey, while more sensitive responses are better placed earlier in the response frame. The risk of socially desirable response bias in response to direct questions can also be diminished with any interventions that help to protect respondents’ confidentiality or anonymity. Another option, useful for both sensitive and non-sensitive topics, is hypothetical vignettes. For a detailed discussion of this method, please see the chapter on hypothetical vignettes.

DATA ANALYSIS

After data entry and cleaning, examination of univariate (single item) distributions is helpful. A blank survey instrument can be used as a template upon which to write these distributions.
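As a minimal sketch of what examining a univariate distribution can look like in practice, the example below tabulates counts and percentages for a single item, keeping skipped items visible as their own category. The item, its 1–5 response frame, and the responses are invented for illustration.

```python
from collections import Counter

# Hypothetical cleaned responses to one 5-point item (1 = strongly disagree,
# 5 = strongly agree); None marks a skipped item. Values are invented.
responses = [4, 5, 3, 4, None, 2, 4, 5, 1, 4, None, 3]


def univariate_distribution(values):
    """Return {response: (count, percent of all respondents)}, keeping skipped items."""
    counts = Counter(values)
    total = len(values)
    return {value: (n, round(100 * n / total, 1)) for value, n in counts.items()}


distribution = univariate_distribution(responses)

# Print valid responses in order, with skipped items listed last.
for value in sorted(distribution, key=lambda v: (v is None, v or 0)):
    count, percent = distribution[value]
    label = "skipped" if value is None else str(value)
    print(f"{label:>7}: {count:2d} ({percent}%)")
```

Reporting the skipped category alongside the valid responses makes item non-response easy to spot before any further analysis.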
In some, but not all, cases, bivariate and multivariate distributions may be of interest, in order to see how the primary measure(s) of interest, such as patients’ preferences for end-of-life care, may be associated with other variables of interest, such as illness chronicity, hospice availability, and the specialty of the treating clinician. Multivariate regression analyses and other advanced statistical analytic techniques are possible; however, they require additional skills that may be beyond most bioethicists and they may not be relevant to the research question at hand. While it is sometimes critical to evaluate associations while holding other factors constant, as multivariate regression allows one to do, in many important ethics-related surveys, simple descriptive statistics are sufficient to examine the question of interest (e.g. what proportion of surgeons would override a patient’s “Do Not Resuscitate” order in the immediate post-operative period?). In this regard, it is important to return to the initial survey question or hypothesis and the conceptual model that one is using to frame the question. Where advanced statistical methods are required, it may be helpful to collaborate with statisticians and others with advanced training in health services research. Despite the relative ease with which statistical programs can generate multivariate analyses, creating appropriate models, using appropriate tests, and interpreting the results of these models demand special expertise.

CONCLUSIONS

Surveys have great promise as a method to inform bioethics, clinical practice, and health policy. To achieve this promise, the researcher must balance rigor with feasibility at all stages of survey development, fielding, analysis, and presentation. Table 4 provides a case study of an empirical ethics survey and illustrates examples of some key elements of survey design, administration, and analysis.
Identifying a good research question is crucial, yet the difficulty of this stage of survey research may be easily overlooked. When conducting ethics surveys, it is particularly important to guard against constant threats to survey validity, such as unclear wording of survey questions or biased responses to questions, because the issues under study are often complex, conceptually inchoate, and/or sensitive or controversial. Finally, in bioethics the relationship between the quantitative data gathered by surveys and the qualitative nature of normative theories is a dynamic one, making survey interpretation a challenge. Nevertheless, a well-constructed, carefully analyzed survey in ethics can have a meaningful impact on policy and practice. Surveys are well suited for examining areas of health care that raise vexing ethical issues for patients, clinicians, and policymakers alike.

Table 4. Case Study: An Example of a Survey in Bioethics (Alexander & Wynia, 2003).

Aspect of survey: Goal of survey
Methods:
- To explore physicians’ bioterrorism preparedness, willingness to treat patients despite personal risk, and beliefs in a professional duty to treat during epidemics

Aspect of survey: Finding a question
Methods:
- Question motivated by: (1) significant topical interest in duty to treat since September 11th, 2001, (2) prior debates regarding duty to treat during outset of HIV epidemic, (3) long and varied historical tradition of physicians’ response to epidemics, and (4) salient ethical dilemmas regarding balance of physician self-interest with beneficence

Aspect of survey: Who is to be surveyed?
Methods:
- Decision to survey patient-care physicians

Aspect of survey: Sampling
Methods:
- Simple random sample taken from universe of all practicing patient-care physicians in the United States; use of representative sample allowed for inferences to be made regarding broader physician population

Aspect of survey: Survey mode
Methods:
- Survey conducted by U.S. mail given impracticality and costs of in-person or phone surveys, and given generally low response rates to Internet surveys and the absence of reliable email addresses for physicians

Aspect of survey: Survey design
Methods:
- Single page survey to maximize response rate; most response frames similar to reduce respondent burden
- Survey title emphasized importance of respondents’ experiences; survey layout facilitated completion in less than 5 min

Aspect of survey: Survey planning
Methods:
- Decision that main outcomes of analysis would be simple descriptive statistics; awareness a priori that causal direction of any associations would be unclear
- Length and rigor of pretesting balanced with need for timely data collection given potential shifting interest in the topic among policy makers, providers, and the general public
- Efforts to obtain maximal response rates included minimizing burden by limiting survey to one sheet of paper, persuading recipients of the importance of the topic, and using a $2 financial incentive

Aspect of survey: Development and pretesting
Methods:
- Face validity maximized through piloting and pretesting with practicing clinicians
- Content validity maximized by expert review of survey by clinicians involved in disaster response planning
- Construct validity maximized by examining expected correlations between items on training and preparedness

Aspect of survey: Fielding and data entry
Methods:
- Analysis of early survey respondents allowed us to observe an association between survey response time and duty to treat; it was unclear if this was due to response-wave bias or temporal trends
- A random sample of 100 additional physicians selected to receive survey; analyses of these new respondents suggested that temporal trends were present

Aspect of survey: Bias and respondent burden
Methods:
- Socially desirable response bias a significant threat, so survey included language reassuring subjects that survey was strictly confidential and that there may not be one right answer; also, since socially desirable response would be to report a duty to treat, our estimates provided upper bounds on physicians’ beliefs
- Non-response bias examined by comparing respondents with nonrespondents, looking for associations between response wave and main outcomes of interest

Aspect of survey: Data analysis
Methods:
- Simple descriptive statistics used in conjunction with multivariate analyses; multivariate models based on logistic regression, which allowed for a simpler presentation of results

Aspect of survey: Hypothetical vignettes
Methods:
- No vignettes utilized, as primary area of interest was not how different patient, provider, or system factors modify physicians’ beliefs regarding the duty to treat; vignettes would be helpful if this was an interest, and also potentially to diminish socially desirable response bias by providing more clinical detail in an effort to maximize validity of responses

REFERENCES

Aday, L. (1996). Designing and conducting health surveys: A comprehensive guide (2nd ed.). San Francisco, CA: Jossey-Bass.
Alexander, G. C., Casalino, L. P., & Meltzer, D. O. (2003). Patient-physician communication about out-of-pocket costs. Journal of the American Medical Association, 290, 953–958.
Alexander, G. C., Kurlander, J., & Wynia, M. K. (2005). Physicians in retainer (“concierge”) practice: A national survey of physician, patient and practice characteristics. Journal of General Internal Medicine, 20(12), 1079–1083.
Alexander, G. C., & Wynia, M. K. (2003). Ready and willing?
Physician preparedness and willingness to treat potential victims of bioterrorism. Health Affairs, 22(September/October), 189–197.
Bachman, J. G., Alcser, K. H., Doukas, D. J., Lichtenstein, R. L., Corning, A. D., & Brody, H. (1996). Attitudes of Michigan physicians and the public toward legalizing physician-assisted suicide and voluntary euthanasia. New England Journal of Medicine, 334, 303–309.
Christakis, N. A., & Lamont, E. B. (2000). Extent and determinants of error in doctors’ prognoses in terminally ill patients: Prospective cohort study. British Medical Journal, 320, 469–473.
Cohen, J. S., Fihn, S. D., Boyko, E. J., Jonsen, A. R., & Wood, R. W. (1994). Attitudes towards assisted suicide and euthanasia among physicians in Washington state. New England Journal of Medicine, 331, 89–94.
Degner, L. F., & Sloan, J. A. (1992). Decision making during serious illness: What role do patients really want to play. Journal of Clinical Epidemiology, 45(9), 941–950.
Dillman, D. (2000). Mail and Internet surveys: The tailored design method. Wiley.
Fink, A. (Ed.) (1995). The survey toolkit. Thousand Oaks, CA: Sage.
Ganzini, L., Johnston, W. S., Bentson, H., McFarland, B. H., Tolle, S. W., & Lee, M. A. (1998). Attitudes of patients with amyotrophic lateral sclerosis and their care givers toward assisted suicide. New England Journal of Medicine, 339, 967–973.
Korn, E. L., & Graubard, B. I. (1999). Analysis of health surveys (ch. 2, pp. 159–91). New York, NY: Wiley.
Kuzin, J. K., Yborra, J. G., Taylor, M. D., Chang, A. C., Altman, C. A., Whitney, G. M., & Mott, A. R. (2007). Family-member presence during interventions in the intensive care unit: Perceptions of pediatric cardiac intensive care providers. Pediatrics, 120, e895–e901.
Lantos, J. (2004). Consulting the many and the wise. American Journal of Bioethics, 4, 60–61.
McNeil, B. J., Weichselbaum, R., & Pauker, S. G. (1981).
Speech and survival: Tradeoffs between quality and quantity of life in laryngeal cancer. New England Journal of Medicine, 305, 982–987.
Novack, D. H., Detering, B. J., Arnold, R., Forrow, L., Ladinsky, M., & Pezzullo, J. C. (1989). Physicians’ attitudes toward using deception to resolve difficult ethical problems. Journal of the American Medical Association, 261, 2980–2985.
Seckler, A. B., Meier, D. E., Mulvihill, M., & Paris, B. E. (1991). Substituted judgment: How accurate are proxy predictions? Annals of Internal Medicine, 115, 92–98.
Tibbals, J., & Kinney, S. (2006). A prospective study of outcome of in-patient paediatric cardiopulmonary arrest. Resuscitation, 71, 310–318.
Tourangeau, R., Rips, L. J., & Rasinski, K. (2000). The psychology of survey response. New York, NY: Cambridge University Press.
Tuckel, P., & O’Neill, H. Ownership and usage patterns of cell phones: 2000–2005. AAPOR-ASA Section on Survey Research Methods. Available at: http://www.amstat.org/Sections/Srms/Proceedings/y2005/Files/JSM2005-000345.pdf (Accessed May 23, 2007).
Ubel, P. A., DeKay, M. L., Baron, J., & Asch, D. A. (1996). Cost-effectiveness analysis in a setting of budget constraints: Is it equitable? New England Journal of Medicine, 334, 1174–1177.
VanGeest, J. B., Wynia, M. K., Cummins, D. S., & Wilson, I. B. (2001). Effects of different monetary incentives on the return rate of a national mail survey of physicians. Medical Care, 39, 197–201.
Wynia, M. K., Cummins, D. S., VanGeest, J. B., & Wilson, I. B. (2000). Physician manipulation of reimbursement rules for patients: Between a rock and a hard place. Journal of the American Medical Association, 283, 1858–1865.

HYPOTHETICAL VIGNETTES IN EMPIRICAL BIOETHICS RESEARCH

Connie M. Ulrich and Sarah J. Ratcliffe

ABSTRACT

Hypothetical vignettes have been used as a research method in the social sciences for many years and are useful for examining and understanding ethical problems in clinical practice, research, and policy.
This chapter provides an overview of the value of vignettes in empirical bioethics research, discusses how to develop and utilize vignettes when considering ethics-related research questions, and reviews strategies for evaluating psychometric properties. We provide examples of vignettes and how they have been used in bioethics research, and examine their relevance to advancing bioethics. The chapter concludes with the general strengths and limitations of hypothetical vignettes and how these should be considered.

Empirical Methods for Bioethics: A Primer. Advances in Bioethics, Volume 11, 161–181. Copyright © 2008 by Elsevier Ltd. All rights of reproduction in any form reserved. ISSN: 1479-3709/doi:10.1016/S1479-3709(07)11008-6

INTRODUCTION

The Significance and Value of Vignettes in Empirical Bioethics Research

The value and significance of empirical bioethics research lies in its ability to advance our knowledge and understanding of a variety of ethical issues to promote opportunities for dialog among clinicians, researchers, policy-makers, and members of other disciplines. Additionally, this type of research generates new lines of descriptive and normative inquiry. Hypothetical vignettes represent one type of empirical approach to examining bioethical issues. This method can be used to better understand attitudes, beliefs, and behaviors related to bioethical considerations in clinical practice, health service delivery and financing, and health policy (Finch, 1987; Flaskerud, 1979; Gould, 1996; Hughes & Huby, 2002; Veloski, Tai, Evans, & Nash, 2005). Issues range from the macro level, such as access to care, costs of care, and allocation of resources, to micro-level concerns of bedside rationing, provider–patient relationships, and beginning-of-life and end-of-life care problems.
Given the potentially sensitive nature of many bioethical problems, hypothetical vignettes provide a less personal and, therefore, less threatening presentation of issues to research participants. Furthermore, some bioethics-related events are relatively rare, and vignettes provide a mechanism to explore the attitudes and extrapolated behaviors concerning such events using larger numbers of participants than would be otherwise possible. This chapter will give an overview of hypothetical vignettes in research with examples of how this method has been used to examine and analyze critical ethical problems. We will also review ways to evaluate the reliability, validity, strengths, and limitations of studies using vignettes.

WHAT IS A VIGNETTE?

Vignettes have been used in social science research since the 1950s (Gould, 1996; Schoenberg & Ravdal, 2000) and have been described as “short stories about hypothetical characters in specified circumstances, to whose situation the interviewee is invited to respond” (Finch, 1987, p. 105). Stories are designed to represent an issue of importance that simulates real life and require a focused response from participants. Depending on the research question of a study, vignettes are appropriate for both qualitative and quantitative methodological designs and studies using mixed methods. They can be used in isolation or as adjuncts to other data collection methods (Hughes & Huby, 2002), for example, self-administered survey questionnaires, focus groups, and face-to-face semi-structured interviews. Vignettes can be presented in a variety of ways such as verbal administration, written surveys, audiotape, videotape, and computers. The latter allows for a flexible approach to reaching groups across various settings.
Unlike attitudinal scales that ask direct questions about values and beliefs, vignettes offer an approach that assesses individuals’ attitudes or values in a contextualized scenario or situation (Alexander & Becker, 1978; Finch, 1987; Flaskerud, 1979; Gould, 1996; Hughes, 1998; Hughes & Huby, 2002; Schoenberg & Ravdal, 2000; Veloski et al., 2005). Various disciplines have used vignettes in health care and social science research among nurses, physicians, and in the general population to examine and ascertain attitudes, values, behaviors, and norms (Alexander, Werner, Fagerlin, & Ubel, 2003; Asai, Akiguchi, Miura, Tanabe, & Fukuhara, 1999; Barter & Renold, 2000; Christakis & Asch, 1995; Denk, Fletcher, & Reigel, 1997; Emanuel, Fairclough, Daniels, & Clarridge, 1996; Gump, Baker, & Roll, 2000; Kodadek & Feeg, 2002; McAlpine, Kristjanson, & Poroch, 1997; Nolan & Smith, 1995; Rahman, 1996; Wolfe, Fairclough, Clarridge, Daniels, & Emanuel, 1999). The interpretation of standardized vignettes by different groups of people within the same study can also be examined and compared (Barter & Renold, 1999).

Most studies that use vignettes rely on the constant-variable vignette method (CVVM) where identical scenarios are presented to respondents with multiple forced-choice questionnaires or rating scales (Cavanagh & Fritzsche, 1985; Wason, Polonsky, & Hymans, 2002). For example, in a survey using vignettes developed by Christakis and Asch (1995) to measure physician characteristics associated with decisions to withdraw life support, respondents were asked to rate on a 5-point Likert scale how likely they would be to withdraw life support from a hypothetical character presented in the scenarios (Box 1).
In another study to assess whether informed consent should be required for biological samples derived clinically and/or from research, Wendler and Emanuel (2002) surveyed two cohorts of subjects via telephone using three vignettes. Instead of a Likert response set, however, participants were simply asked to respond by using the following categories: “yes,” “no,” “don’t know,” and “it depends.”

Different versions of the same vignette can also be constructed in a single study using a factorial design. In this design, researchers can test multiple hypotheses by systematically manipulating two or more independent variables (for example, age and gender) within the vignette and randomly allocating subjects to the resulting vignette versions. This makes it possible to evaluate both the main effect of each independent variable on the outcome variable and the interaction effects of two or more independent variables on the outcome variable (Polit & Hungler, 1999).

Box 1. Example of a Vignette Used to Assess Physicians’ Perceptions on Withdrawal of Life Support.

EL is a 66-year-old patient of yours with a 15-year history of severe chronic pulmonary disease. One week ago, he was admitted to the ICU with pneumonia, hypotension, and respiratory failure. He required antibiotics, intravenous vasopressors, and mechanical ventilation to survive. He has now lapsed into a coma and shows no signs of clinical improvement. Consultant pulmonologists assert that his lung function is such that he will never be independent of the ventilator. After his most recent hospitalization, the patient had clearly expressed to his family and to you that he would never want to live by artificial means. In view of these wishes and his poor prognosis, the family asks you to withdraw life support. You are deciding whether to stop the intravenous vasopressors or the mechanical ventilation.
(Christakis & Asch, 1995)

Qualitative researchers use vignettes to explore meanings of particular issues by asking participants to discuss the situation presented in their own words in response to open-ended questions. For example, to better understand the ethical awareness among first-year medical, dental, and nursing students, Nolan and Smith (1995) asked students to respond to different vignettes that contained ethical dilemmas. In response to the vignettes, subjects were asked: “What course of action would you suggest, giving reasons?” Other authors (Kodadek & Feeg, 2002) have used similar open-ended questions based on vignettes to explore how parents approach end-of-life decisions for terminally ill infants by asking respondents (1) “What is your first reaction as you begin to think about this problem?”; (2) “What specific questions will you ask the physician when you have a chance to discuss this problem?”; or (3) “Name at least 5 aspects of the problem that you will consider in making your decision.” Open-ended response sets allow for in-depth probing and identification of issues deemed salient and emerging themes. Grady et al. (2006) presented four different hypothetical scenarios about financial disclosure to active research participants in NIH intramural sponsored protocols. Subjects were asked to openly discuss their views on each scenario, whether or what they would want to know about the financial interests of investigators, and how knowledge of financial disclosures would influence their research participation. Lastly, Berney et al. (2005) developed two clinical vignettes from semi-structured interviews related to allocation of scarce resources, followed by discussions of the vignettes in focus groups with general practitioners. Participants were asked, through open-ended queries, to describe the ethical issues they perceived when presented with the vignettes.
Usually, both qualitative and quantitative studies ask respondents to rank, rate, or sort particular aspects of vignettes into categories, or to choose what they think the hypothetical characters should or ought to do in the presented scenario(s) (Martin, 2004).

HOW TO DEVELOP A VIGNETTE

The complexity of many bioethical problems often necessitates constructing vignettes to meet a study’s particular purpose. If possible, however, using established valid and reliable vignettes is always preferable. Vignettes can be constructed from the literature as well as from other sources, and must appear realistic, relevant, and be easily understood (Hughes, 1998; Wason et al., 2002). Several approaches to the development and evaluation of vignettes can be used as described below (Box 2).

Box 2. How Can Vignettes be Constructed and Presented in Bioethics Research?
Constructed:
• Previous research findings/literature review
• Real life experiences with clinical and/or research cases
• Focus groups
• Cognitive interviewing
Presented:
• Narrative story
• Computer based, music videos
• Comic book style; flip book; cards; surveys; audio tapes
(Hughes, 1998)

1. Focus Groups

Focus groups provide an opportunity to gather information for vignettes from a population of interest related to the specific bioethical issue being studied, especially if limited information on the topic exists (Krueger & Casey, 2000). Focus groups generally range from 6 to 8 individuals who participate in an interview for a specified time period with the object of obtaining “high-quality data in a social context where people can consider their own views in the context of the views of others” (Patton, 2002, p. 386). Using focus groups to develop vignettes allows for the identification of the test variables (e.g., age, gender, level of education), how to structure and present content of interest in a vignette, and the number of vignettes needed.
Schigelone and Fitzgerald (2004) convened a focus group of experts in geriatrics, nursing, and social science to identify key variables for geriatric vignettes to assess the treatment of older and younger patients by first-year medical students. Based on the responses, the authors drafted initial versions of vignettes that examined age of patients in relation to levels of aggressive treatment (for more information, please see chapter on focus groups).

Focus groups help to:
• Refine, a priori, the objectives of the research utilizing vignettes.
• Clarify and provide in-depth understanding of how subjects think and interpret the topic of interest.
• Design and construct vignettes for a larger quantitative study.

2. Cognitive Interviewing

Cognitive interviewing (CI) is an important technique used to evaluate survey questionnaires and/or identify difficulties in vignettes. CI is used to “understand the thought processes used to answer survey questions and to use this knowledge to find better ways of constructing, formulating, and asking survey questions” (Forsyth & Lessler, 1991). Because vignettes often are presented in written surveys, CI explores respondents’ abilities to interpret vignettes presented within a survey format and assesses whether the wording of questionnaires accurately conveys the objective of the vignette(s). It also examines subjects’ techniques for retrieving information from memory, as well as their judgment formation on the material presented (Willis, 1994, 2006). CI is usually undertaken “between initial drafting of the survey questionnaire and administration in the field” (Willis, 2006, p. 6). Subjects are often asked to “think aloud,” that is, to verbalize their thoughts as they respond to each vignette (self-administered or interviewer-administered). One researcher usually conducts the interview while a second researcher observes the interview, tape records, takes notes, and transcribes responses. By asking respondents if the proposed vignettes are measuring what they claim
By asking respondents if the proposed vignettes are measuring what they claim Hypothetical Vignettes in Empirical Bioethics Research 167 to measure (content validity), an important means of pre-survey evaluation can be established (Polit & Hungler, 1999). Example of ‘‘Think Aloud’’ exercise and verbal probes: ‘‘While we are going through the questionnaire, I’m going to ask you to think aloud so that I can understand if there are problems with the questionnaire. By ‘‘think aloud’’ I mean repeating all the questions aloud and telling me what you are thinking as you hear the questions and as you pick the answers.’’ Verbal probing techniques can be used as an alternative to ‘‘think aloud exercises’’ allowing the interviewer to immediately ask follow-up questions based on the subject’s response to the vignette. Probes can be both spontaneous and scripted (prepared prior to the interview and used by all interviewers) to test subjects’ comprehension of questionnaire items, clarity, interpretation, and intent of responses and recall. Examples of verbal probes are given below (Grady et al., 2006; Willis, 2006).  General probe: o ‘‘Do you have any questions or want anything clarified about this study?’’ o ‘‘What are your thoughts about this study?’’ o ‘‘How did you arrive at that answer?’’  Paraphrasing probe: o ‘‘Can you repeat the question in your own words?’’  Recall probe: o ‘‘What do you need to remember or think about in order to answer this question?’’  Comprehension probe: o ‘‘What does the term ‘‘research subject’’ mean to you?’’  Confidence judgment: o ‘‘How sure are you that your health insurance covers your medications?’’  Specific probe: o ‘‘Would you consider individuals who consent to participate in this study vulnerable in any way?’’ ‘‘Vulnerable to what?’’ ‘‘Why?’’ 3. 
Field Pre-testing The purpose of a pretest is to simulate the actual data collection procedures that will be used in a study and to identify problem areas associated with the vignettes prior to administration with a sample (Fowler, 2002; Platek, Pierre-Pierre, & Stevens, 1985; Presser et al., 2004). Pre-testing is carried out with a small convenience sample of subjects, similar to the characteristics of the population planned for in the actual study. It is important to assess if the vignettes and related questions are consistently understood and believable so that researchers can improve on any reported 168 CONNIE M. ULRICH AND SARAH J. RATCLIFFE practical problem (s). Confusing and complex wording, misunderstandings, or misreading of vignettes can lead to missing data that ultimately biases sample estimates, underestimates correlation coefficients, and decreases statistical power (Platek et al., 1985; Kalton & Kasprzyk, 1986; Ulrich & Karlawish, 2006). Problems may also be related to length of completion time and burden or simply typographical errors. Therefore, evaluating response distributions through pre-testing can help the researcher to revise questions. For example, if a vignette with open-ended questions is planned, the pre-test can identify redundant participant responses and help to narrow the range of answers needed to respond. As a result, the researcher might deem fixed response categories as being more appropriate (Fowler, 1995). Revising and refining the instrument based on pre-testing improves the data collection procedures and data quality. Areas to assess in pre-testing:  Completion time: How long is the questionnaire? Are all items completed? If items were skipped, why?  Clarity of wording: Are any items or terms used in the vignettes and questionnaire confusing to respondents? Are questions unidimensional?  Complication of the instructions: Are the instructions easy to follow, understandable, and comprehensive? 
Does the questionnaire flow well; does the question order appear logical?  Is the questionnaire user- friendly? Are the vignettes easy to understand?  Is the information in the vignettes accurate and unambiguous?  Are important terms in the vignette defined?  Do respondents perceive any of the items as too sensitive to answer?  What was the response rate? Illustrations of Pre-Survey Testing Siminoff, Burant, and Younger (2004) conducted 12 focus groups of different ethnicities, including Hispanics, Muslims, and African Americans to guide their research on public attitudes surrounding death and organ procurement. The focus groups were the basis for the development of vignettes that presented varying patient conditions (i.e., brain death, severe neurological damage, and persistent vegetative state). In doing so, these groups provided clarification of the ethical terms in lay language with diverse perspectives, addressed concerns not readily apparent in the literature, and provided input into the final random digit dial version of a survey that included vignettes and was to be conducted with citizens in Hypothetical Vignettes in Empirical Bioethics Research 169 Ohio. Following the focus groups, a pre-test of the survey was randomly administered to 51 individuals to further assess the questionnaire and vignettes for clarity, completion time, and reliability of respondents’ classification of when death occurs. Curbow, Fogarty, McDonnell, Chill, and Scott (2006) developed eight video vignettes to measure the effects of three physician-related experimental characteristics (physician enthusiasm, physician affiliation, and type of patient–physician relationship) on clinical trial knowledge and acceptance, and beliefs, attitudes, information processing, and video knowledge. To evaluate the video vignettes, pretesting was conducted with eight focus groups of former breast cancer patients and patients without cancer. 
Alterations to the vignettes were made based on participants’ responses.

EVALUATING PSYCHOMETRIC PROPERTIES OF VIGNETTES

Before administering vignettes to a sample and analyzing the results, it is important that the vignettes are valid and reliable. Only reliable and valid vignettes should be used to describe phenomena or test hypotheses of interest in a study. A comprehensive review of reliability and validity issues can be found in Litwin (1995) and Streiner and Norman (2003).

Validity of Vignettes

Internal validity is important for quantitative and qualitative measures and expresses the extent to which an instrument adequately reflects the concept(s) under study. Because vignettes in bioethics research are often constructed solely for the particular topic under study, validity and reliability are essential to achieve meaningful analyses and interpretation of data. In other words, empirical bioethics researchers often have fewer pre-existing instruments to draw from and need to develop measurement tools de novo (Ulrich & Karlawish, 2006). Thus, vignettes must be internally consistent. Two issues related to the internal validity of vignettes are important: (1) the extent to which the vignette(s) adequately depict the phenomenon of interest and (2) the degree to which each question in response to the vignette(s) measures the same phenomenon (Flaskerud, 1979; Gould, 1996).

Table 1. Types of Reliability Important for Quantitative Vignettes.
Test-retest: Repeated measurements of the vignettes can determine the stability of the measure’s performance at two distinct time periods with the same group of subjects (generally within two weeks’ time). A Pearson’s product moment correlation coefficient is calculated; a coefficient closer to 1.00 generally represents stability of the measure. (Non-parametric measures of association are used for nominal and/or ordinal data.)
Internal Consistency: A measure of an instrument’s reliability is how consistently the items in the scale measure the designated attribute. Cronbach’s alpha is the most widely used measure of reliability and an alpha of 0.70 is considered acceptable.
Source: Waltz, Strickland, and Lenz (1991).

Content Validity

Content validity, constituting one type of internal validity, addresses the degree to which an instrument represents the domain of content being measured and is a function of how it was developed and/or constructed (Waltz, Strickland, & Lenz, 1991). Ways to assess content validity of vignettes include using a panel of experts, focus group interviews, and/or CI techniques. Using a panel of experts, at least two (to a maximum of 10) experts in the field are asked to quantify or judge the relevance of vignettes and corresponding questions based on the following criteria: (1) does the vignette adequately reflect the domain of interest?; (2) is the vignette plausible and easily understood?; (3) are the corresponding questions representative of the vignette’s content?; and (4) do the objectives that guided the construction of the vignette correspond with its content and its response set? (Lanza & Cariaeo, 1992; Lynn, 1986; Waltz et al., 1991). Relevancy is generally rated on a four-point scale, from totally irrelevant (1) to extremely relevant (4). A formal content validity index (CVI) can be calculated based on the proportion of experts who rate a vignette 3 or 4. CVI hence indicates the extent of agreement by expert raters on the relevancy of the vignettes. Generally, an index of 0.80 or higher represents good content validity. Haynes, Richard, and Kubany (1995) provide a thorough guide to assessing content validity (Table 1).
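
The CVI described above is a simple proportion, and a short calculation may make it concrete. The following is a minimal sketch only: the function name and the five expert ratings are hypothetical illustrations, not data from any study cited in this chapter.

```python
def content_validity_index(ratings):
    """Return the proportion of experts rating a vignette 3 or 4.

    `ratings` holds one relevance rating per expert on the four-point
    scale described in the text (1 = totally irrelevant, 4 = extremely
    relevant).
    """
    if not ratings:
        raise ValueError("at least one expert rating is required")
    if any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be 1, 2, 3, or 4")
    relevant = sum(1 for r in ratings if r >= 3)
    return relevant / len(ratings)


# Hypothetical ratings from five experts for a single vignette.
example_ratings = [4, 3, 4, 2, 3]
cvi = content_validity_index(example_ratings)

# Compare against the 0.80 benchmark mentioned in the text.
print(f"CVI = {cvi:.2f}; meets 0.80 benchmark: {cvi >= 0.80}")
```

In this illustration four of the five hypothetical experts rate the vignette 3 or 4, so the index sits exactly at the 0.80 benchmark. With small panels a single dissenting rating moves the index substantially, which is one reason to report the panel size alongside the CVI.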
Illustration of Content Validity

To compare adolescent and parental willingness to participate in minimal risk and above-minimal risk pediatric asthma research protocols, Brody, Annett, Scherer, Perryman, and Cofrin (2005) asked an expert panel of ethicists and pediatric pulmonary investigators to review 40 pediatric asthma protocol consent forms and choose those protocols that represented minimal risk and above-minimal risk. The researchers then developed standardized vignettes using key information from each of the selected protocols.

Reliability of Vignettes

A reliable instrument is one that is consistent, dependable, and stable on repeated measurements. Thus, it is relatively free of measurement error (Waltz et al., 1991). Test-retest, inter- and intra-rater, and internal consistency reliability are three different types of reliability that can be reported for quantitative instruments.

Test-retest reliability measures how stable study results are over time. Thus, if vignettes are given to respondents on two occasions using a time interval in which conditions for subjects have not changed, vignette results should be comparable. For continuous or ordinal responses (e.g., ratings or rankings), Pearson or Spearman correlation is often used to measure reliability. The kappa coefficient (Cohen, 1960) or a weighted kappa (Cohen, 1968) can be used to measure the agreement between dichotomous response categories.

Intra-rater reliability is similar to test-retest but measures one rater’s variation as a result of multiple exposures to the same stimulus. Inter-rater reliability refers to the stability of responses when rated by multiple raters. When multiple experts rate the same vignette, the intra-class correlation coefficient (ICC) is often used to measure the reliability.

Internal consistency refers to the homogeneity of the items used to measure an underlying trait or attribute via a scale.
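
As a brief aside on the test-retest statistics named above, the sketch below computes a Pearson correlation for ratings collected on two occasions and an unweighted Cohen's kappa for dichotomous responses. It is an illustration under stated assumptions: the function names and all response data are hypothetical, and the weighted kappa and ICC mentioned in the text are not covered.

```python
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either list has no variation.
    return sxy / sqrt(sxx * syy)


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    n = len(first)
    if n != len(second) or n == 0:
        raise ValueError("need two equal-length, non-empty lists")
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_1, counts_2 = Counter(first), Counter(second)
    # Chance agreement from the marginal proportions of each category.
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / n ** 2
    # Undefined if expected agreement is 1 (everyone in one category).
    return (observed - expected) / (1 - expected)


# Hypothetical Likert ratings of one vignette at time 1 and time 2.
ratings_t1 = [1, 2, 3, 4, 5]
ratings_t2 = [1, 3, 2, 5, 4]
print(f"test-retest r = {pearson_r(ratings_t1, ratings_t2):.2f}")

# Hypothetical yes (1) / no (0) answers from ten respondents, asked twice.
answers_t1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
answers_t2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(f"kappa = {cohen_kappa(answers_t1, answers_t2):.2f}")
```

Kappa is lower than the raw percentage of matching answers because it subtracts the agreement that would be expected by chance from the marginal response frequencies. In practice a statistical package would normally be used, not least because it also reports confidence intervals.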
For a scale to be internally consistent, items in the scale should be moderately correlated with each other and with the total scale score. One option is to use Cronbach’s alpha (Cronbach, 1951) to assess the homogeneity of the scale, with a value of 0.70 generally being considered acceptable.

Qualitative vignettes can also be measured for rigor and reliability by addressing the following questions (Lincoln & Guba, 1985):
• How credible are the vignettes?
• Are the findings transferable? How applicable are the findings to other areas of inquiry?
• Are the findings dependable? Was an audit trail or process of verification used to clarify each step of the research process?

Illustration of Reliability

Gump et al. (2000) developed vignettes depicting six ethical dilemmas (two justice-oriented situations, two care-oriented situations, and two mixed orientations), each with 8 response items to test a measure of moral
To avoid a limited content domain and maturation bias, several vignettes were randomly presented to participants. Attitudes were solicited on continuation or termination of costly medical care of critically ill patients. Manipulated variables in vignettes included patients’ age, contribution to medical condition, quality of life, type of insurance, and patients’ right to decide about treatment. Several authors have used vignettes in self-administered mailed questionnaires. Mazor et al. (2005) surveyed 115 primary care preceptors who were attending a faculty development conference to examine the factors that influence their responses to medical errors. The researchers developed two medical error vignettes and randomly varied nine trainee-related factors, including gender, trainee status, error history, and trainee response to error. In another study, Alexander et al. (2003) developed two clinical vignettes to study public support for physician deception of insurance companies by surveying 700 prospective jurors in Philadelphia. The vignettes depicted clinical situations in which a 55 year old individual (gender was changed for each vignette) with a known condition required further invasive and/or noninvasive procedures, determined by their physician, for which the insurance company would not pay. Respondents were asked to either accept or appeal the restriction or misrepresent the patient’s condition to receive desired service. Hypothetical Vignettes in Empirical Bioethics Research 173 Similarly, Freeman et al. (1999) developed six clinical vignettes to study perceptions of physician deception in a cross-sectional random sample of internists using a self-administered mailed questionnaire. The vignettes varied in terms of clinical severity and risks ranging from a life-threatening illness to the need for a psychiatric referral to a patient in need of cosmetic surgery (rhinoplasty). 
Based on the vignettes, respondents were asked to indicate whether a colleague should deceive third party payers and how they, their colleagues, and society would judge such behavior. Although these clinical vignettes were less threatening to respondents than they would have been if presented in interviews, they are limited in capturing actual misrepresentation and/or deception in clinical practice.

A few authors have used vignettes within a multi-method framework. Arguing that the use of vignettes with both qualitative and quantitative methods is a powerful tool, Rahman (1996) used long and complex case vignettes with both open-ended and fixed choice responses to understand coping and conflict in caring relationships of elderly individuals. Using an innovative internet survey design, Kim et al. (2005) combined qualitative and quantitative approaches to understand the views of clinical researchers on the science and ethics of sham surgery in novel gene transfer interventions for Parkinson disease patients. Researchers were asked to provide quantitative estimates on a number of issues, as well as open commentary on their responses (Kim et al., 2005).

METHODOLOGICAL CONSIDERATIONS

Sample Size Estimates for Studies using Vignettes

The required number of respondents (sample size) for a quantitative study using vignettes varies depending on the study aims and design. Factors that influence the design include the research questions; the type of measurements (e.g., dichotomous forced choice, Likert scale) for respondents to rate, rank, or sort vignettes; the number of vignettes given to each respondent; and the number of respondent and situational characteristic effects to be examined. Once these factors have been determined, sample size estimates can be calculated based upon the statistical analysis planned. For example, in the simplest case, suppose a study is designed to examine only the effect of respondents’ race (Caucasian and African American) on organ donation.
Each respondent would be given a single vignette and asked to rate, on a Likert scale, how willing they are for their organs to be donated in the particular situation. The differences in the ratings by race would then be analyzed using a Student’s t-test, assuming normally distributed responses. Sample size calculations are based on a meaningful minimum difference (or effect size = difference/standard deviation σ) between the mean response rates in the two groups, that is, the smallest difference that will be found to be statistically significant in the analyses (Table 2).

Table 2. Number of Subjects per Group for a Two-sided t-test with α = 0.05 and Power of at least 80%, Assuming Equal Group Sizes.

Difference in means     Number of subjects needed per group
0.10σ                   1571
0.20σ                   394
0.30σ                   176
0.50σ                   64
0.75σ                   29
1.00σ                   17

When multiple vignettes are given to each respondent, there is an inherent correlation between responses from the same individual. That is, the response to one vignette is assumed to be related to or affected by the responses given to the other vignettes. When only two vignettes are given to each respondent, analyses and, hence, sample size calculations, can be based on the differences between each individual’s responses. However, as the number of vignettes increases, the within-respondent variance needs to be explicitly taken into account. This is particularly important as the within-respondent variance can potentially be large, and may “swamp” the situational effects of interest. Larger sample sizes are needed in order to detect any situational effects.
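The values in Table 2 can be approximated with a short calculation. The sketch below is illustrative only and is not the authors’ procedure: it assumes the usual normal-approximation formula for comparing two means, plus a small-sample correction term (an assumption introduced here) to mimic the t distribution. The function name n_per_group is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per group for a two-sided, two-sample t-test.

    effect_size is the difference in means divided by the common standard
    deviation (the multiples of sigma in Table 2).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    n_normal = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    # Assumed small-sample correction (adds z_alpha^2 / 4) to approximate the
    # t distribution; this term is not taken from the chapter.
    return ceil(n_normal + z_alpha ** 2 / 4)


for d in (0.10, 0.20, 0.30, 0.50, 0.75, 1.00):
    print(f"{d:.2f} sigma -> {n_per_group(d)} per group")
```

Under these assumptions the loop gives the same per-group counts as Table 2. The formula covers only the single-vignette, two-group case; designs with several vignettes per respondent need methods that account for within-respondent correlation.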
For example, to test the hypothesis of no differences between two groups or four situations when four vignettes are given to each respondent and comparisons are to be made between two respondent groups, 16 respondents per group may be required when the within-respondent variation is 2, but 60 respondents per group would be required if the within-respondent variation were doubled. For the interested reader, sample size tables can be found in Rochon (1991).

Factorial Designs

When characteristics in vignettes are manipulated, a factorial design is often employed. If respondents are given every possible vignette, the study would be considered a full factorial design. When only a couple of situational characteristics are being studied, a full factorial design may not be burdensome to the respondents. For example, if only the gender (male/female) and race (Caucasian/African American) of the hypothetical person in the vignette were changed, each respondent would be given 2 × 2 = 4 vignettes to respond to. However, when there are a number of characteristics to be changed with multiple levels or categories, the number of vignettes can become excessive. For example, changing 5 characteristics with 2 options for each would result in 2⁵ = 32 vignettes being given to each respondent. In these cases, a fractional factorial design would be appropriate.

In fractional factorial designs, each respondent is given a fraction of all the possible situational combinations being studied. While this design does not affect the sample size, it does limit the hypotheses that can be tested. Generally, a study is designed so that main effects of characteristics can be estimated (e.g., differences between races) but higher order interactions cannot (e.g., age × race × gender), as they are assumed to be zero.
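The vignette counts above can be checked by enumeration. The following sketch, which is not drawn from any of the cited studies, lists the full factorial for five two-level characteristics and then keeps a balanced half. The characteristic names and the parity rule used to pick the half are assumptions for illustration; in practice the fraction should follow from the hypotheses of interest.

```python
from itertools import product

# Hypothetical two-level characteristics; names are for illustration only.
factors = {
    "age": ("younger", "older"),
    "race": ("Caucasian", "African American"),
    "gender": ("male", "female"),
    "insurance": ("private", "public"),
    "quality_of_life": ("good", "poor"),
}

# Full factorial: every combination of levels, 2**5 = 32 vignettes.
full = [dict(zip(factors, levels)) for levels in product(*factors.values())]


def in_half_fraction(vignette: dict) -> bool:
    """Assumed selection rule: keep vignettes in which an even number of
    characteristics take their second level. This parity rule yields a
    half fraction in which each level of each characteristic appears
    equally often."""
    second_levels = sum(
        vignette[name] == levels[1] for name, levels in factors.items()
    )
    return second_levels % 2 == 0


half = [v for v in full if in_half_fraction(v)]
print(len(full), len(half))  # 32 16
```

A parity-based half of this kind keeps every main effect estimable, at the cost of tying the highest-order interaction to the overall mean, which matches the chapter’s point that some hypotheses can no longer be tested.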
If the effect of one situational change does vary with the level of another change and is not taken into account in the design, effect estimates may become biased or confounded. Thus, the fraction to be used is based upon the hypotheses or inferences that are most important in the study as well as hypothesized or known relationships between the situational characteristics. For example, Battaglia, Ash, Prout, and Freund (2006) used a fractional factorial design to explore primary care providers’ willingness to recommend breast cancer chemoprevention trials to at-risk women. Five different dichotomous characteristics, including age, race, socioeconomic status, co-morbidity, and mobility, were manipulated in clinical vignettes to assess physician decision-making. Using all five characteristics would yield 32 possible vignettes in a complete factorial design (2⁵). To reduce this number, a balanced half fraction of all possible vignette combinations was used, and participants were asked to respond to one of the 16 versions of the vignette.

Hypothesis Testing

In quantitative hypothesis-testing studies, and when each respondent is only given one vignette, standard statistical methods such as t-tests, analysis of variance (ANOVA), and nonparametric tests can be used to analyze the data. In studies where multiple vignettes are considered by each respondent, statistical methods need to account for the inherent correlation between measurements from the same respondent, as noted above. The data are considered balanced if every pair of situational characteristics is presented to respondents an equal number of times, and complete if there are no missing data. For balanced and complete data, repeated measures ANOVA or analysis of covariance (ANCOVA) can test the hypothesis of interest. Issues of multiple comparisons need to be addressed before using any of these models.
If the data are not balanced or complete, more advanced statistical methods can be used, such as linear mixed effect models (Laird & Ware, 1982) or generalized estimating equations (GEE) (Zeger & Liang, 1986). These methods also allow for testing complex correlation structures that may emerge in some studies.

VALIDITY OF CONCLUSIONS

External validity, or the extent to which generalizations can be made from the study sample to the population, is limited in studies employing hypothetical vignettes since vignettes may not reflect the clinical nuances of real-life situations. Therefore, caution must be used in interpreting predictive relationships between what participants report “ought to be done” in constructed vignettes and actual behaviors. However, vignettes provide a means for understanding attitudes, opinions, and beliefs about moral dilemmas. Rahman (1996) notes that findings will be more generalizable when a vignette is closer to real-life situations.

STRENGTHS AND LIMITATIONS OF HYPOTHETICAL VIGNETTES

Vignettes in bioethics research offer many practical advantages. They are economical; gather large amounts of data at a single time; provide a means of assessing attitudes, beliefs, and practices on sensitive subject areas; are less personal and threatening than other methods; and avoid observer influences (Alexander & Becker, 1978; Finch, 1987; Flaskerud, 1979; Gould, 1996; Hughes, 1998; Hughes & Huby, 2002; Rahman, 1996; Schoenberg & Ravdal, 2000; Wilson & While, 1998) (see Table 3). Vignettes have been criticized, however, for their limited applicability to “real life.” Hughes (1998) argues that the characters, social context, and situation of vignettes must be presented in an authentic, relevant, and meaningful way for participants. Vignettes are also subject to measurement error, which may lead to “satisficing.” Stolte (1994) described satisficing “as a tendency for participants to process the vignettes less carefully than under real conditions” (p.
727). This may occur because of difficulties completing the interpretation of and/or response to a vignette, or because of insufficient motivation of participants to perform these tasks. In turn, participants’ responses may be biased or incomplete: they may simply choose the first presented response option that seems reasonable, acquiesce to common assertions, choose randomly among the offered responses, fail to differentiate responses on a particular measure, or report a “don’t know” answer (Krosnick, 1991). Attention to contextual factors, such as interview setting, participant compensation, instrument attributes, and mode of administration may help to control satisficing (Stolte, 1994).

Table 3. Strengths and Limitations of using Hypothetical Vignettes in Bioethics Research.

Strengths
• Flexible method, varying in length and style, of gathering sensitive information from participants; depersonalizes information and provides a distancing effect
• Easily adaptable for both quantitative and qualitative research; used individually or in focus groups and modified to “fit” the researcher’s population of interest and topical foci
• Complementary adjunct to other types of data collection methods (i.e., semi-structured interviews, observational data) or appropriately used in isolation
• Systematic manipulation of specific characteristics in vignettes (e.g., age or gender) can be done to assess changes in attitudes/judgments
• Cost effective and economical in terms of surveying a population sample
• Does not require respondents’ in-depth understanding of the subject matter
• Potential to reduce socially desirable answers

Limitations
• A lengthy vignette with complex wording may lead to misinterpretation or misunderstanding with resulting measurement error, especially in individuals with learning disabilities or cognitive impairments
• Limited external validity; cannot generalize findings of beliefs/perceptions or self-reported actions/behaviors from hypothetical scenarios to actual actions/behaviors
• Potential for psychological distress based on the presentation, sensitive nature, and context of the scenario and its interpretation/importance to participants’ life experiences
• Potential for unreliable measurement(s)
• Potential for satisficing: “a tendency for subjects to process vignette information less carefully and effectively than they would under ideal or real conditions” (Stolte, 1994).

CONCLUSION

Although the use of vignettes in bioethics research has largely focused on end-of-life care issues, this type of data collection method provides a practical and economical means of understanding complex, challenging, and burgeoning ethical concerns. Vignettes can be constructed from a variety of sources and can be presented in several formats. The method is flexible in allowing the researcher to manipulate experimental variables of interest for a sophisticated analytic design. It can be incorporated in mailed questionnaires (paper-and-pencil method or web-based approach) or be administered face-to-face. Caution must be exercised in generalizing research findings based on the use of vignettes since responses regarding hypothetical behaviors are not necessarily indicative of actual behaviors. Overall, however, reliable and internally valid vignettes provide an important means to empirically advance our understanding of bioethics in key arenas such as clinical practice, research, and policy.

REFERENCES

Alexander, C. S., & Becker, H. J. (1978). The use of vignettes in survey research. Public Opinion Quarterly, 42, 93–104.
Alexander, G. C., Werner, R. M., Fagerlin, A., & Ubel, P. (2003). Support for physician deception of insurance companies among a sample of Philadelphia residents. Annals of Internal Medicine, 138, 472–475.
Asai, A., Akiguchi, I., Miura, Y., Tanabe, N., & Fukuhara, S. (1999).
Survey of Japanese physicians’ attitudes towards the care of adult patients in persistent vegetative state. Journal of Medical Ethics, 25, 302–308.
Barter, C., & Renold, E. (1999). The use of vignettes in qualitative research. Social Research Update, 25. Retrieved October 2, 2007, from http://sru.soc.surrey.ac.uk/SRU25.html
Barter, C., & Renold, E. (2000). “I Want to Tell You a Story”: Exploring the application of vignettes in qualitative research with children and young people. International Journal of Social Research Methodology, 3, 307–323.
Battaglia, T. A., Ash, A., Prout, M. N., & Freund, K. M. (2006). Cancer prevention trials and primary care physicians: Factors associated with recommending trial enrollment. Cancer Detection and Prevention, 30, 34–37.
Berney, L., Kelly, M., Doyal, L., Feder, G., Griffiths, C., & Jones, I. R. (2005). Ethical principles and the rationing of health care: A qualitative study in general practice. British Journal of General Practice, 55, 620–625.
Brody, J. L., Annett, R. D., Scherer, D. G., Perryman, M. L., & Cofrin, K. M. W. (2005). Comparisons of adolescent and parent willingness to participate in minimal and above-minimal risk pediatric asthma research studies. Journal of Adolescent Health, 37, 229–235.
Cavanagh, G. F., & Fritzsche, D. J. (1985). Using vignettes in business ethics research. In: L. E. Preston (Ed.), Research in corporate social performance and policy (Vol. 7, pp. 279–293). Greenwich, CT: JAI Press.
Christakis, N. A., & Asch, D. A. (1995). Physician characteristics associated with decisions to withdraw life support. American Journal of Public Health, 85, 367–372.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
Cronbach, L. J. (1951).
Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Curbow, B., Fogarty, L. A., McDonnell, K. A., Chill, J., & Scott, L. B. (2006). The role of physician characteristics in clinical trial acceptance: Testing pathways of influence. Journal of Health Communication, 11, 199–218.
Denk, C. E., Fletcher, J. C., & Reigel, T. M. (1997). How do Americans want to die? A factorial vignette survey of public attitudes about end-of-life medical decision-making. Social Science Research, 26, 95–120.
Emanuel, E. J., Fairclough, D. L., Daniels, E. R., & Clarridge, B. R. (1996). Euthanasia and physician-assisted suicide: Attitudes and experiences of oncology patients, oncologists, and the public. Lancet, 347, 1805–1810.
Finch, J. (1987). The vignette technique in survey research. Sociology, 21, 105–114.
Flaskerud, J. H. (1979). Use of vignettes to elicit responses toward broad concepts. Nursing Research, 28, 210–212.
Forsyth, B., & Lessler, J. (1991). Cognitive laboratory methods: A taxonomy. In: P. Biemer, R. Groves, L. Lyberg, N. Mathiowetz & S. Sudman (Eds), Measurement errors in surveys (pp. 393–418). New York: Wiley.
Fowler, F. J. (1995). Improving survey questions: Design and evaluation. Thousand Oaks, CA: Sage.
Fowler, F. J. (2002). Survey research methods (3rd ed.). Thousand Oaks, CA: Sage.
Freeman, V. G., Rathore, S. S., Weinfurt, K. P., Schulman, K. A., & Sulmasy, D. P. (1999). Lying for patients: Physician deception of third-party payers. Archives of Internal Medicine, 159, 2263–2270.
Gould, D. (1996). Using vignettes to collect data for nursing research studies: How valid are the findings? Journal of Clinical Nursing, 5, 207–212.
Grady, C., Hampson, L., Wallen, G. R., Rivera-Goba, M. V., Carrington, K. L., & Mittleman, B. B. (2006). Exploring the ethics of clinical research in an urban community. American Journal of Public Health, 96, 1996–2001.
Gump, L. S., Baker, R. C., & Roll, S. (2000).
The moral justification scale: Reliability and validity of a new measure of care and justice orientations. Adolescence, 35, 67–76.
Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7, 238–247.
Hughes, R. (1998). Considering the vignette technique and its application to a study of drug injecting and HIV risk and safer behaviour. Sociology of Health and Illness, 20, 381–400.
Hughes, R., & Huby, M. (2002). The application of vignettes in social and nursing research. Journal of Advanced Nursing, 37, 382–386.
Kalton, G., & Kasprzyk, D. (1986). The treatment of missing survey data. Survey Methodology, 12(1), 1–16.
Kim, S. Y. H., Frank, S., Holloway, R., Zimmerman, C., Wilson, R., & Kieburtz, K. (2005). Science and ethics of sham surgery: A survey of Parkinson disease clinical researchers. Archives of Neurology, 62, 1357–1360.
Kodadek, M. P., & Feeg, V. D. (2002). Using vignettes to explore how parents approach end-of-life decision making for terminally ill infants. Pediatric Nursing, 28, 333–343.
Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5, 213–236.
Krueger, R. A., & Casey, M. A. (2000). Focus groups: A practical guide for applied research (3rd ed.). Thousand Oaks, CA: Sage.
Laird, N. M., & Ware, J. H. (1982). Random effects models for longitudinal data. Biometrics, 38, 963–974.
Lanza, M. L., & Carifio, J. (1992). Use of a panel of experts to establish validity for patient assault vignettes. Evaluation Review, 17, 82–92.
Lincoln, Y., & Guba, E. (1985). Naturalistic inquiry. Beverly Hills, CA: Sage.
Litwin, M. S. (1995). How to measure survey reliability and validity. The survey kit, 7. Thousand Oaks, London: Sage.
Lynn, M. R. (1986). Determination and quantification of content validity.
Nursing Research, 35, 382–385.
Martin, E. (2004). Vignettes and respondent debriefing for questionnaire design and evaluation. In: S. Presser, J. M. Rothgeb, M. P. Couper, J. T. Lessler, E. Martin, J. Martin & E. Singer (Eds), Methods for testing and evaluating survey questionnaires (pp. 149–171). New Jersey: Wiley.
Mazor, K. M., Fischer, M. A., Haley, H., Hatem, D., Rogers, H. J., & Quirk, M. E. (2005). Factors influencing preceptors’ responses to medical errors: A factorial study. Academic Medicine, 80, 588–592.
McAlpine, H., Kristjanson, L., & Poroch, D. (1997). Development and testing of the ethical reasoning tool (ERT): An instrument to measure the ethical reasoning of nurses. Journal of Advanced Nursing, 25, 1151–1161.
Nolan, P. W., & Smith, J. (1995). Ethical awareness among first year medical, dental, and nursing students. International Journal of Nursing Studies, 32, 506–517.
Patton, M. Q. (2002). Qualitative evaluation and research methods (3rd ed.). Thousand Oaks, CA: Sage Publications.
Platek, R., Pierre-Pierre, F. K., & Stevens, P. (1985). Development and design of survey questionnaires. Ottawa, Canada: Minister of Supply and Services Canada.
Polit, D. F., & Hungler, B. P. (1999). Nursing research: Principles and methods (6th ed.). Philadelphia: Lippincott.
Presser, S., Rothgeb, J. M., Couper, M. P., Lessler, J. T., Martin, E., Martin, J., & Singer, E. (2004). Methods for testing and evaluating survey questionnaires. New Jersey: Wiley.
Rahman, N. (1996). Caregivers’ sensitivity to conflict: The use of vignette methodology. Journal of Elder Abuse and Neglect, 8, 35–47.
Rochon, J. (1991). Sample size calculations for two-group repeated-measures experiments. Biometrics, 47, 1383–1398.
Schigelone, A. S., & Fitzgerald, J. T. (2004). Development and utilization of vignettes in assessing medical students’ support of older and younger patients’ medical decisions. Evaluation and The Health Professions, 27, 265–284.
Schoenberg, N. E., & Ravdal, H. (2000).
Using vignettes in awareness and attitudinal research. International Journal of Social Research Methodology, 3, 63–74.
Siminoff, L., Burant, C., & Youngner, S. J. (2004). Death and organ procurement: Public beliefs and attitudes. Social Science and Medicine, 59, 2325–2334.
Stolte, J. F. (1994). The context of satisficing in vignette research. The Journal of Social Psychology, 134, 727–733.
Streiner, D. L., & Norman, G. R. (2003). Health measurement scales: A practical guide to their development and use (3rd ed.). New York: Oxford University Press.
Ulrich, C., & Karlawish, J. T. (2006). Responsible conduct of research. In: L. A. Lipsitz, M. A. Bernard, R. Chernoff, C. M. Connelly, L. K. Evans, M. D. Foreman, J. R. Hanlon, & G. A. Kuchel (Eds), Multidisciplinary guidebook for clinical geriatric research (1st ed., pp. 52–62). Washington, DC: Gerontological Society of America.
Veloski, J., Tai, S., Evans, A. S., & Nash, D. B. (2005). Clinical vignette-based surveys: A tool for assessing physician practice variation. American Journal of Medical Quality, 20, 151–157.
Waltz, C., Strickland, O., & Lenz, E. (1991). Measurement in nursing research (2nd ed.). Philadelphia: F.A. Davis.
Wason, K. D., Polonsky, M. J., & Hyman, M. R. (2002). Designing vignette studies in marketing. Australasian Marketing Journal, 10(3), 41–58.
Wendler, D., & Emanuel, E. J. (2002). The debate over research on stored biological samples. Archives of Internal Medicine, 162, 1457–1462.
Willis, G. B. (1994). Cognitive interviewing and questionnaire design: A training manual. Cognitive Methods Staff Working Paper Series. Centers for Disease Control and Prevention, National Center for Health Statistics.
Willis, G. B. (2006). Cognitive interviewing as a tool for improving the informed consent process. Journal of Empirical Research on Human Research Ethics, 1, 9–24 (online issue).
Wilson, J., & While, A. (1998).
Methodological issues surrounding the use of vignettes in qualitative research. Journal of Interprofessional Care, 12, 79–87.
Wolfe, J., Fairclough, D. L., Clarridge, B. R., Daniels, E. R., & Emanuel, E. J. (1999). Stability of attitudes regarding physician-assisted suicide and euthanasia among oncology patients, physicians, and the general public. Journal of Clinical Oncology, 17, 1274–1279.
Zeger, S. L., & Liang, K. Y. (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics, 42, 121–130.

DELIBERATIVE PROCEDURES IN BIOETHICS

Susan Dorr Goold, Laura Damschroder and Nancy Baum

ABSTRACT

Deliberative procedures can be useful when researchers need (a) an informed opinion that is difficult to obtain using other methods, (b) individual opinions that will benefit from group discussion and insight, and/or (c) group judgments because the issue at hand affects groups, communities, or citizens qua citizens. Deliberations generally gather non-professional members of the public to discuss, deliberate, and learn about a topic, often forming a policy recommendation or casting an informed vote. Researchers can collect data on these recommendations, and/or individuals’ preexisting or post hoc knowledge or opinions. This chapter presents examples of deliberative methods and how they may inform bioethical perspectives, and reviews methodological issues deserving special attention.

In the face of scarcity, deliberation can help those who do not get what they want or even what they need come to accept legitimacy of a collective decision. –Gutmann & Thompson, 1997

Empirical Methods for Bioethics: A Primer
Advances in Bioethics, Volume 11, 183–201
Copyright © 2008 by Elsevier Ltd.
All rights of reproduction in any form reserved
ISSN: 1479-3709/doi:10.1016/S1479-3709(07)11010-4

INTRODUCTION

Bioethical issues are, by definition, morally challenging.
Topics in bioethics of interest to empirical researchers frequently carry enormous policy relevance, and would benefit from informed, reflective public input. Unfortunately, such topics as stem cell research, “rationing,” cloning, and organ transplantation tend to be technically and conceptually complex, intimidating and sometimes even frightening, making public input difficult. Deliberative procedures, based on deliberative democratic theory, may be an appropriate choice when: (a) an informed opinion is needed but difficult to obtain; (b) individual opinions will benefit from group discussion and insight; and/or (c) group judgments are relevant, usually because the issue affects groups, communities, or citizens. The development of health policy (including many bioethics issues) can benefit from several of these characteristics, and potentially from deliberative public input as well.

Theories of deliberative democracy, despite important differences, share an emphasis on political decision making that relies on a process in which political actors listen to each other with openness and respect, provide reasons and justifications for their opinion, and remain open to changing their point of view through a process of discourse and deliberation. Deliberation has been justified by appeals to develop a more informed public (Fishkin, 1995), create decisional legitimacy (Cohen, 1997), and/or claim that participants in deliberations and their constituents have consented to informed decisions (Fleck, 1994). Just as traditional bioethical principles have been invoked to ensure that individual patients have a voice in their own medical decision making, so might deliberation provide community members with a “voice” in communitywide decisions, for instance about health spending priorities (Goold & Baum, 2006) or research regulation.
Deliberative procedures offer an opportunity for individuals to assess their own needs and preferences in light of the needs and desires of others. Morally complex decisions may enjoy public legitimacy if they are the result of such fair and public processes. Beyond legitimacy, individuals involved in a community decision-making process about bioethical issues with policy implications may be more likely to accept even intensely difficult decisions if they feel they have had an opportunity to fully understand and consider the issues and to contribute to the final resolution.

WHAT ARE DELIBERATIVE PROCEDURES?

In general, deliberative procedures call for gathering non-professional (non-elite and lay) members of the public to discuss, deliberate, and learn about a particular topic with the intention of forming a policy recommendation or casting an informed vote. Some deliberative procedures aim primarily to inform policy. Occasionally, deliberative procedures include both research and policy aims. We will focus this chapter on de novo deliberative procedures used for research purposes, or combined policy and research purposes, where research aims are known and planned up front.

For researchers, deliberative procedures can be a valuable tool for gathering information about public views, preferences, and values, and provide some advantages over other methods. Although it is common practice to conduct public opinion polls or nationally representative surveys to gauge public views on particular policy questions, opinions measured in this way can be unstable, subject to manipulation, or poorly informed, and some individuals may not have formed opinions at all (Bartels, 2003). This may especially be the case when issues are morally and/or technically complex. For example, a public survey about a rare genetic test is likely to encounter ignorance about genetics, genetic testing, and how a particular condition may impact health.
Respondents may refuse to answer, respond despite their lack of knowledge, or respond based on flawed information. Survey responses can also be heavily influenced by the way the questions are framed. Furthermore, surveys often fail to capture the “public” aspect of public input, since the information collected is aggregated individual opinion. Deliberative methods prompt a discussion about what we should do as a political community; participants in deliberation are encouraged to reconsider their opinions in light of the interests of others. Like national polls, deliberative efforts can engage a representative (though smaller) sample from a constituency. Unlike national polls, deliberative procedures emphasize reasons and rationales for and against an issue or policy as a natural part of the deliberative process.

Individual and group interviews, while providing opportunities for reflection, generally do not aim to inform participants. Individual interviews would be poorly suited to discovering what people think as a community (rather than as individuals). “Town hall meetings” have been used to gather public input on policies, although they can also include time spent informing the audience. However, they may suffer from heavily biased attendance and often are structured more for political than research objectives.

OPERATIONALIZING DELIBERATIVE PROCEDURES

In this section, we emphasize methodological issues unique to, or particularly important for, deliberative methods. Similar issues and concerns can be found in other methods, for instance the role of group dynamics that arises in focus group research. We emphasize a limited number of methodological questions, and recommend that the reader turn to the many other excellent chapters in this volume and other resources related to specific portions of empirical research (e.g., survey question wording).
To illustrate some of the issues, we provide two examples of deliberative projects.

Representation, Recruitment and Sampling

Representation, recruitment, and sampling are key aspects of research using deliberative procedures. Randomized selection methods can be used to recruit participants into a deliberative study. However, since deliberation nearly always involves gathering participants into groups, assembling them inherently introduces bias, which has both research and policy implications. In deliberative projects, even more so than, for instance, focus group projects, equal participation and equal opportunity for participation on the part of citizens is vital. A sample that omits important voices of those affected by the issue at hand, or a deliberative group that stifles certain points of view because of a dominant point of view or experience, an imbalance in perceived power, or stigma or sensitivity, undermines the legitimacy of the process and can lead to dissatisfaction, distrust, and/or recommendations that are not truly reflective of the public. There is some evidence that heterogeneous groups deliberate more effectively than homogeneous groups (Schulz-Hardt, Jochims, & Frey, 2002). However, it may be important to have homogeneous groups when some participants might not otherwise speak freely about sensitive issues, for example, mental illness or racism.

Substantive representation presents an alternative to random (proportional) sampling, and entails selecting those most affected by a policy (Goold, 1996). It resembles the practice of convening “stakeholders,” although it distinguishes those most affected from those with the greatest interests at stake. For example, medical researchers have an interest in policies for human subjects research but the policies disproportionately affect the human subjects themselves.
In addition, the opinions of researchers and laypersons on the topic are likely to be quite different, researchers often already have a forum for input on the topic, and, finally, researchers’ expertise could stifle laypersons’ comfort and active participation in deliberations. In this example, one can justify the disproportionate inclusion of laypersons in a deliberative project about policies for human subjects research, and should ensure that researchers (if they are involved in the project) are in groups separate from laypersons.

Importantly, community-based research (of which deliberative methods represent one type) includes input from the community at the beginning of the project to guide priorities and needs; this input can identify the groups that would be most important to include and the types of groups where homogeneity can be especially advantageous. Early community involvement in the research process can help with recruitment of participants and add legitimacy to recruitment methods. Most projects with policy-making implications should include a component of public advertising in recruitment to ensure that important voices are identified and heard and that the project adheres to standards of openness. Careful screening of volunteers, including questions about motivation for participation, can help minimize the potential for bias.

Methods and Structure of Deliberation

Deliberative sessions can last as long as 4 days (Rawlins, 2005; Lenaghan, 1999), a weekend (Fishkin, 1995), or just a single day (Damschroder et al., 2007; Ackerman & Fishkin, 2002). A deliberative project may also consist of a series of discussions over time (Goold, Biddle, Klipp, Hall, & Danis, 2005). Sometimes a large group (several hundred individuals) breaks into smaller groups and then reconvenes.
Deliberative methods typically include: educating and informing, discussion and deliberation, and describing and/or measuring group (and often individual) views. Balanced education is one approach commonly used to ensure that the information provided to participants meets their needs and enhances credibility. In one arrangement, experts who represent a variety of perspectives on the topic present information from their respective perspectives, and then respond to questions from participants. The opportunity for participants to construct their own questions helps guard against undue influence by the research team in what or how information is provided. The use of “competing experts” allows participants, like a jury in a court case, to judge for themselves the credibility of particular experts based on responses to questions raised by participants. However, this type of approach can present a problem when an issue is of an adversarial nature. Participants may take sides rather than listen to experts and fellow deliberators with an open mind. It can also be frustrating if participants sense there are no “right answers.” Alternatively, printed or other materials rather than (or in addition to) experts can be provided to participants.

Discussion and deliberation should be led by professional facilitators, trained specifically for the deliberation at hand, whenever feasible. Trained facilitators avoid influencing participants (“leading”) and ensure that participation in the deliberation is as equally distributed as possible. Even more so in deliberations than in focus group research, dominant personalities need to be defused and the opinions and perspectives of quieter participants actively sought. A round robin method, nominal group technique, or other approaches should ensure that all participants have an opportunity to speak.
Equally important, participants must feel comfortable speaking; besides group composition, facilitator characteristics (men leading a group of women discussing sex, for instance) may be important considerations.

The structure of deliberative methods ranges from highly unstructured protocols, starting with an open-ended general question about a topic, to highly structured sequences of tasks. The structure of deliberations can profoundly influence the credibility and legitimacy of the method for public input on policy as well as the rigor of the research. Less structure in the outline for deliberations is advantageous because participants have more freedom to frame or emphasize issues from their own perspective. It is possible, however, that participants will not stay on task or obtain needed information, resulting in an unfocused discussion of less important aspects of the issue at hand. Maintaining the appropriate balance between greater structure and more openness depends on the topic, time available, resources, and other factors. Deliberative groups need enough structure to remain on task and cover important information domains, but participants also need enough flexibility to raise questions and process responses.

WHAT TO MEASURE AND WHEN?

Research aims determine what data to collect, as well as how and when to collect it. The following includes examples of data that might be collected in a deliberative project:

1. Individual participant viewpoints ex ante relevant to the topic
2. Relevant characteristics or experiences (e.g., participation in research or out-of-pocket health spending)
3. Political engagement, self-efficacy, judgment of social capital (before and/or after deliberation)
4. Individual participant viewpoints ex post relevant to the topic
5. Views on the deliberative process (e.g., others’ sincerity, chance to present views, group decision)
6. Group dialog and/or behavior
7. Group decisions or recommendations (or lack thereof)
8. Impact on policy

Typically, data collected in deliberative projects include a combination of survey responses, group dialog, and group recommendations, decisions, and statements. Researchers may want to know if individuals change with respect to knowledge or opinion as a result of the deliberative process, and so propose to measure a given variable before and after deliberation. Data collected about individuals, besides the usual demographic information, can include pre-deliberation opinions, knowledge of the topic, and measures of characteristics that are likely to influence views. For example, if you are measuring opinions about mental health parity, personal or family experience with mental illness would be a relevant variable to include. Other data that is often valuable to collect, particularly after group deliberation, includes judgments and views of the group process or the group’s final decision. In the project described below, for instance, we used measures of perceived fair processes and fair outcomes (Goold et al., 2005).

Group dialog can be audiotaped and, if needed, transcribed. Observation of group dialog and group behavior can include structured or open-ended options to document the group dynamic or process. For example, observation can document the distribution of participation in conversation, decision-making style, dominance of particular individuals in the discussion, judgments about the group’s cooperative or adversarial decision-making processes, and the like. Group dialog lends itself to a number of analytic possibilities. One can analyze dialog for the reasons, rationalizations, arguments, or experiences used to justify points of view. One can analyze the quality of reasoning, the persuasiveness of arguments, or characteristics of group dialog that influenced individual or group viewpoints.
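The observational measures described above, such as the distribution of participation and the dominance of particular individuals, could be tallied from a transcript along the following lines. This is a minimal sketch rather than a method used in the studies cited in this chapter: the function names, the speaker labels, the sample utterances, and the cut-off of twice an equal share are all hypothetical choices made for illustration.

```python
from collections import Counter


def participation_shares(turns, roster=()):
    """Return each participant's share of all words spoken in a session.

    turns  -- list of (speaker_id, utterance_text) pairs in spoken order
    roster -- optional iterable of all participant ids, so that silent
              participants appear in the result with a share of 0.0
    """
    words = Counter({speaker: 0 for speaker in roster})
    for speaker, text in turns:
        words[speaker] += len(text.split())
    total = sum(words.values())
    if total == 0:
        return {speaker: 0.0 for speaker in words}
    return {speaker: count / total for speaker, count in words.items()}


def dominant_speakers(shares, ratio=2.0):
    """List speakers whose share exceeds `ratio` times an equal share.

    The default of 2.0 is an arbitrary illustrative cut-off,
    not a published standard.
    """
    if not shares:
        return []
    equal_share = 1.0 / len(shares)
    return sorted(s for s, share in shares.items() if share > ratio * equal_share)


# Hypothetical transcript fragment from one small group.
turns = [
    ("P1", "I think privacy matters most here"),
    ("P2", "Agreed"),
    ("P1", "and researchers should ask permission first"),
    ("P3", "Maybe"),
    ("P4", "Yes"),
]
roster = ["P1", "P2", "P3", "P4", "P5"]  # P5 never spoke

shares = participation_shares(turns, roster)
flagged = dominant_speakers(shares)

for speaker in roster:
    print(f"{speaker}: {shares[speaker]:.2f}")
print("Possibly dominant:", flagged)
```

Word counts are only one proxy for participation; counts of speaking turns or speaking time from the audio recording would serve equally well, and a structured observer rating may capture dominance that raw counts miss.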
EXAMPLES OF STUDIES USING DELIBERATIVE METHODS

Veterans, Privacy, Trust, and Research

The Federal Privacy Rule was implemented in the United States in 2003, as part of the Health Insurance Portability and Accountability Act of 1996 (HIPAA), with the hope that it would address growing concerns people had about how personal medical information was being used in contexts outside of medical treatment. However, the Rule has affected research in unanticipated ways. Researchers generally can only access medical records if they have permission from each individual patient or they obtain a waiver of this consent requirement from an oversight board (an Institutional Review Board (IRB) or a privacy board). For researchers who need to review thousands of medical records going back a long period of time, obtaining individual authorization is difficult if not impossible (US HHS (United States Department of Health and Human Services), 2005). Requiring patient permission for each study can add significant monetary costs and result in selection biases that threaten the validity of findings (Armstrong et al., 2005; Ingelfinger & Drazen, 2004; Tu et al., 2004).

Study Aim

In the study described below, the investigators wanted to learn what patients thought about researchers’ access to medical records and what influenced those opinions.

Representation and Sampling

A sample of 217 patients from 4 Veterans Affairs (VA) health care facilities deliberated in small groups at each of 4 locations. They had the opportunity to question experts and inform themselves about privacy issues related to medical records research. Participants were recruited from a randomized sample of patients (stratified by age, race, and visit frequency) from four geographically diverse VA facilities to participate in baseline and follow-up phone surveys.
Ensuring balanced numbers of older and heavier users of the health care system was important in order to gain insight into whether these patients were more sensitive about researchers using their medical records or, conversely, whether they were more supportive of the need for research compared to those who have a lower burden of illness. Additionally, the sampling approach used in this study sought to ensure that the voices of people with racial minority status, particularly African Americans (since they represent the vast majority of non-white patients for the four sites), were adequately represented. In past studies, African-American patients have expressed a reluctance to participate in clinical research (Corbie-Smith, Thomas, Williams, & Moody-Ayers, 1999; Shavers, Lynch, & Burmeister, 2002) and have exhibited lower levels of trust in researchers than white patients (Corbie-Smith, Thomas, & St George, 2002).

Deliberation Procedure and Data Collection

Participants who completed a baseline survey were invited to an all-day deliberative session. Each participant was randomly assigned to a group of 4–6 individuals and each deliberative session included 9 or 10 of these smaller deliberative groups. A non-facilitated deliberative process was chosen to minimize researcher bias and to encourage a fresh approach to the complex issues at hand. Small groups used a detailed, written protocol starting with a review of background information about medical records, minimal risk research, and the HIPAA Privacy Rule.
Participants were asked to imagine that they were acting as an advisory committee for a “research review board ... [that] judges whether a research study will pose minimal risk and whether the study will adequately protect private information.” The protocol and background information explained that researchers cannot use personally identifiable medical records in a research study unless the IRB agrees that three waiver criteria have been met. Small group deliberations were interspersed with larger, plenary sessions led by presentations from experts in medical records research and privacy advocacy. Participants had the opportunity to pose questions to the experts and hear the answers as a plenary group. Baseline and follow-up surveys (including some completed on-site the day of the session) elicited the level of trust in various health care entities, attitudes about privacy, prior knowledge about research and privacy, and general demographic information. Analysis followed a mixed-methods approach, combining qualitative data from deliberations with quantitative data from baseline surveys.

Results

Baseline opinions of participants confirmed that many aspects of the topic for deliberation were unknown to participants; 75% did not know that sometimes researchers could access their medical records without their explicit consent and 39% had never heard of the HIPAA Privacy Rule. The issue was value-laden on the heels of much publicity surrounding implementation of the Rule; 75% of participants said they were very or somewhat concerned about privacy but 89% also said that conducting medical records research in the VA was critically or very important. When asked whether they were satisfied with provisions of the HIPAA Privacy Rule, 66% wanted a procedure in place that would cede more control to patients over who sees their medical records. However, no consensus was reached as to how much control should be in the hands of patients.
After deliberation, nearly everyone (96%) said they would be willing to share their medical records with VA researchers conducting a study about a serious medical condition. Participants’ trust in VA researchers was the most powerful determinant of the degree of control they recommended for patients over who uses their medical records. Mechanic and Schlesinger’s (1996) work on trust inspired a framework to describe trust between patients and a medical research enterprise: (1) Are medical records kept confidential? (2) Does the research being conducted demonstrate high priority on patient welfare? (3) Are researchers held accountable and responsible for protecting privacy? (4) Are systems to protect medical records sufficiently secure? (5) Do researchers fully disclose the research being conducted and how medical records are used to conduct that research? (Mechanic, 1998). Participants reported the need to see and understand how their records are kept private, to know that violators will be punished consistently and relatively severely (e.g., job loss, fines), and to have assurance that computerized systems are truly secure. Participants expressed the need to see that the institution’s research actually benefits patients and is not subject to conflicts of interest. They wanted transparency in the research process in terms of what research is being done and how their medical information may have contributed to new findings. Further analyses will explore reasons for the apparent contradiction that participants are willing to share their own information, yet united in their call for more control over how their medical records are used.

Deliberation changed opinions: the already large majority of participants who said they would be willing to share their medical records with VA researchers increased compared with baseline (89%).
When asked how important it was for researchers to obtain permission for each and every research study, 74% said at the time of the baseline survey that it was critically or very important to do so, whereas immediately after and 4–6 weeks after deliberation, only 48% and 50%, respectively, held this position.

Participants were satisfied with the deliberation process. When asked anonymously, 98% of participants thought the deliberation process was fair, 94% felt others in their group listened to what they had to say, and 89% said they would make the same recommendation as the one made by their group as a whole (through consensus or voting). One participant summarized his experience by saying, “... with more exposure and thought, my decisions are more in line with my moral values.” In this example, a single, one-day deliberative session generated informed, and possibly more public-spirited, points of view about an important, complex, and morally challenging health policy issue.

Health Spending Priorities

How to allocate and prioritize limited resources fairly and openly is perhaps the most pressing moral and practical concern in the health care arena today; certainly the issue is of importance to nearly all citizens. Setting health care priorities for the use of scarce public resources requires attention to both economics and justice; justice, in turn, is enhanced by the participation of those most affected by the decisions. According to this participatory conception, rationing decisions should incorporate the preferences and values of those affected (Eddy, 1990; Menzel, 1990; Fleck, 1992, 1994; Goold, 1996; Emanuel, 1997; Daniels & Sabin, 1998). Engaging and involving citizens in health care priority setting, however, confronts obstacles.
Because health concerns lack salience for most healthy citizens, decisions tend to be influenced heavily by health care experts or those, like the disabled or senior citizens, with specific interests (Kapiriri & Norheim, 2002; Goold, 1996; Jacobs, 1996). Discussions about health services and financing can be complex, making citizens frustrated or intimidated and reliant on others to decide for them, trusting that the services they need will be provided to them in the event of illness (Goold & Klipp, 2002). Talking about future health care needs requires that people think about illness and death; the emotionally laden tension when money appears to be pitted against health in rationing decisions has been a thorny problem for proponents of the “citizen involvement in rationing” model (Ham & Coulter, 2001).

Besides the complexity and value-laden nature of the problem, health care trade-offs involve pooled (often public) resources. Individual health and health care priorities must be balanced against the current or future needs of others in a community. The need for interpersonal trade-offs, and the balancing of individual with social or group needs, requires either procedural justice (i.e., fair processes for decision making) or distributive justice (i.e., fair distribution of benefits and burdens), or, ideally, both.

The topic of health care spending priorities lends itself well to deliberative procedures. Accordingly, Drs. Goold and Danis designed CHAT (“Choosing Healthplans All Together”), a simulation exercise based on deliberative democratic procedures in which laypersons in groups design health insurance benefit packages for themselves and for their communities. Participants make trade-offs between competing needs for limited resources. To test the choices they have made as individuals and as a group, participants encounter hypothetical “health events” that illustrate the consequences of their priorities, which inform subsequent group deliberations.

Study Aims

The project described below (for full details see Goold et al., 2005) aimed to describe public priorities for health benefits and evaluate the CHAT exercise as a deliberative procedure. We assessed aspects of feasibility, structure (including the content of the exercise), process, and outcomes.

Representation and Sampling

Participants were North Carolina residents without health care expertise, recruited from ambulatory care and community settings. Groups were homogeneous with regard to type of health insurance coverage (Medicare, Medicaid, private, uninsured) and heterogeneous with regard to other characteristics (gender, age, race). Low-income participants were oversampled to assess whether the exercise was accessible and acceptable to them, since these groups are typically less well represented in policy decisions.

Deliberation Procedure and Data Collection

The CHAT simulation exercise is a highly structured, iterative process, led by a trained facilitator using a script, and progresses from individual to group decision making. Data collection included pre-exercise questionnaires about demographics, health and health insurance status, health services utilization, out-of-pocket costs, and the importance of health insurance features (Mechanic, Ettel, & Davis, 1990). Post-exercise questionnaires rated participant enjoyment of CHAT, understanding, ease of use, and informativeness (Danis, Biddle, Henderson, Garrett, & DeVellis, 1997; Biddle, DeVellis, Henderson, Fasick, & Danis, 1998; Goold et al., 2005). Other items asked participants to rate their affective response to the exercise, perceptions of the group process, outcome of decision making, informational adequacy, and range of available choices.
Half the group discussions were tape-recorded to analyze the values, justifications, and reasons expressed by participants during deliberation.

Results

Five hundred sixty-two individuals took part in 50 sessions of CHAT. Transcripts of group deliberations were analyzed to understand the reasoning, values, and justifications participants emphasized during deliberation. Over 60 themes were identified and organized into four major categories: (1) Insurance as Protection against Loss or Harm; (2) Economics or Efficiency; (3) Preferences for the Process of Care; and (4) Equity or Fairness.

Insurance as Protection against Loss or Harm described the greatest proportion of the dialog (55% of coded text) and had the largest number of sub-themes. Dialog was coded under this theme when participants justified coverage selections on the basis of planning for future health care needs and avoiding future harms or losses through adequate insurance coverage. For example, participants discussed the likelihood that they or others would suffer health-related losses or harms in the future:

“I know in our group we had chosen medium coverage because depending on [heredity] and your age, you may have a lot more dental problems and it becomes certainly more important as you age. So that would be my choice.”

Participants also frequently discussed the types (financial, physical, social, and emotional) and the magnitude of the harms or losses that might be avoided, either because care would be prohibitively expensive without coverage, because not having coverage could lead to serious harm, or because coverage was essential for large numbers of people.

The second major category, Economics or Efficiency, represented participants’ concerns about costs and resources (20% of coded text). Such concerns included the use/abuse of insurance, the perceived value of care, and issues of supply and demand of health care services.
Participants were concerned with waste in the health care system, and recognized the potential for increased use related to insurance coverage – the concept of “moral hazard”:

“Looking in the long run, if you, specialty just staying at the basic is kind of like a gate keeper. If you’re thinking of group coverage, some people may, I don’t, but some people may jump to a specialist quicker than they need to if they have the free choice of going there.”

Dialog coded in the third theme, Preferences for the Process of Care, included concerns for quality of care, the ability to choose physicians and treatment facilities, system “hassles” such as the wait time for appointments and, in some instances, one’s own beliefs or attitudes about health care services. This example illustrates the value placed on timely access:

“This is very important, I think ... I mean, just the thought of waiting four weeks for a routine appointment. What do they consider routine?”

The fourth major theme, Equity or Fairness (8% of coded dialog), included discussions of justice, including dialog about equality, personal responsibility, and social responsibility, especially a responsibility to care for the worse off:

“... because there are people of course that just simply can’t afford $50 a day. It would be an extreme hardship on them ...”

Several policy and research projects have used CHAT to involve citizens in health benefit design. Evaluation data from the project reported above and several other CHAT projects show that participants, including low-income and poorly educated participants (Goold et al., 2005):

1. Find CHAT understandable, informative, and easy to do
2. Judge the fairness of the group process and decision favorably
3. Would be willing to abide by the decisions made by their groups
4. Gain an understanding of the reality of limited resources and trade-offs between competing needs
5. Alter the choices they make for themselves and their families after the exercise, for instance by more frequently choosing coverage for mental health services.

As is true for many deliberative methods, however, much more critical and rigorous evaluations of processes and outcomes are needed.

DATA MANAGEMENT, ANALYSIS, AND INTERPRETATION: SPECIAL ISSUES

Since deliberative procedures always involve gathering individuals into large or small groups for discussion, any data analysis using individuals as the unit of analysis needs to adjust for clustering effects. Because group membership, the discussion itself, and other events vary from group to group, individual responses on any post-deliberation measure may be affected by the particular group in which a participant deliberated.

Another important issue for deliberative procedures is the need, for many researchers, to compare individual responses before and after deliberations. Since merely measuring something once can work as an intervention (a well-documented effect in the educational research literature), it is better to either (a) use a control group that does not participate in deliberations, or (b) ensure the sample size is large enough to randomize a portion of participants to complete the measures only after group deliberations, and analyze the impact of completing the measure prior to deliberations on post-deliberation responses to the same measure.

A related concern inherent in deliberative methods relates to the effect of the group on the individual. Do individuals tend to change their point of view depending on the group’s overall point of view? What about particular events (e.g., knowledge statements) that occur in some groups but not others? The recognition or mention of a particular issue (e.g., discrimination) could sway whole groups and many individuals to a different opinion.

Finally, while group dialog is a valuable and rich source of information, it can be a challenge to manage.
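As a rough illustration of the clustering adjustment raised earlier in this section, the sketch below applies the standard design-effect approximation, 1 + (m − 1) × ICC, where m is the average group size and ICC is the intraclass correlation of responses within groups. The sample size of 217 is taken from the VA study described above; the average group size of 5 and the ICC of 0.05 are assumed values chosen purely for illustration and are not reported by either project. Mixed-effects models or cluster-robust standard errors are common alternatives to this simple correction.

```python
def design_effect(mean_group_size, icc):
    """Approximate variance inflation due to clustering: 1 + (m - 1) * ICC."""
    if mean_group_size < 1:
        raise ValueError("mean_group_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (mean_group_size - 1.0) * icc


def effective_sample_size(n, mean_group_size, icc):
    """Number of independent respondents the clustered sample is worth."""
    return n / design_effect(mean_group_size, icc)


n_participants = 217   # sample size reported for the VA study
assumed_group_size = 5  # assumption: midpoint of the 4-6 range
assumed_icc = 0.05      # assumption: illustrative value only

deff = design_effect(assumed_group_size, assumed_icc)
n_eff = effective_sample_size(n_participants, assumed_group_size, assumed_icc)

print(f"Design effect: {deff:.2f}")
print(f"Effective sample size: {n_eff:.1f}")
```

Under these assumed values, even a modest within-group correlation reduces the information in the sample by roughly a sixth, which is why individual-level analyses of post-deliberation measures should not treat participants as independent.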
Transcription can be error-prone and should be checked in the usual manner. Transcribers who have not been present at the group discussion are unlikely to be able to recognize the gender of speakers, much less consistently identify individual speakers.

Policy makers and other users of data from deliberative procedures often raise concerns about the representativeness of non-random sampling, inevitable whenever recruitment takes place into groups. When this occurs, it is important to acknowledge the limitations of non-random sampling but also to weigh them against the limitations of other methods. Proportional representative sampling for surveys, for example, may make generalization easier, but the responses may be less informed, less insightful, more vulnerable to framing, and have other profound limitations. It is also important to talk about alternatives to proportional representative sampling, such as substantive sampling or, often more familiar to policy makers, the inclusion of important stakeholders. Compare, for example, a randomly selected public opinion survey related to high-risk brain surgery for Parkinson’s Disease to information gained from deliberative groups that purposefully include family members of those with Parkinson’s, those with early Parkinson’s Disease, or those otherwise at high risk of having the condition.

A variety of analytic techniques may be used in deliberative studies. Analyzing individual responses to survey items, before and/or after deliberations, will typically include descriptive statistics, but also should, when possible, include analyses of the responses of important subgroups. For example, in a project about health care resource allocation, subgroup analyses might include low-income or chronically ill participants. The specific approach to analysis of group dialog will be guided by the study’s specific research aims and how much is known about public views on the issue.
For example, although group discussions about resource allocation might include many valuable insights, research might focus on dialog about the relative importance of mental health services. Some analysis will be straightforward (e.g., labeling comments about job discrimination in a discussion of privacy), while some may require more interpretation (e.g., expectations of beneficence as an element of trust in medical researchers).

APPLICATIONS AND IMPLICATIONS OF THE METHOD

Deliberative procedures have been designed for public input into policy making, and hence have special applications for that arena. For example, where policies and/or regulations are relatively silent about surrogate decision making for research participation, deliberative methods can help researchers learn how the public feels about degrees of risk, informed consent, and other issues. As such, these methods can be useful for the examination of the public’s views on bioethical issues that are often complex and benefit from public reflection, discourse and understanding.

SUMMARY

In this chapter, we have tried to review briefly the use and application of deliberative procedures to empirical research questions in bioethics, illustrated with two projects. Like many other methods, the use of deliberative procedures is varied and consequently the strengths, weaknesses, and implications of research results vary as well. Statistical (proportional) generalization is not a strength of deliberative procedures since convening groups will eliminate the ability to consider a sample truly random. However, deliberative procedures can gain public opinion that is informed, reflective, and more focused on the common good than on individual interests. Deliberative procedures hold a great deal of promise for research on relevant bioethical policy questions.
However, like any other method, there is a need for conceptual and empirical research to better define when deliberative procedures are most appropriate, describe the impact of particular methodological choices, and improve our ability to draw conclusions.

It may be tempting to regard deliberative procedures as simply the means to a better end – the end being “better” decisions and outcomes. Proponents of this outcome-oriented view evaluate only the products of deliberations (Abelson, Forest et al., 2003; Rowe & Frewer, 2000). Evaluating only outcomes, however, misses the normative argument that “good” deliberative democratic processes can be valued in and of themselves, and that the procedures can be justifiably criticized if they fail to meet normative procedural standards, for example, fair representation or transparency.

Research on deliberative methods should examine, in particular, the impact of choices of sampling methods, group composition (relatively heterogeneous or homogeneous), and the structure of deliberations. Researchers using deliberative procedures should be encouraged to include these and other sorts of “methods” questions, as survey researchers have included research aims that address issues of framing, question ordering, and the like. Recently, a number of scholars (Abelson, Eyles et al., 2003; Fishkin & Luskin, 2005; Neblo, 2005; Steenbergen, Bächtiger, Spörndli, & Steiner, 2003) have begun to examine and evaluate deliberative procedures, and a few (e.g., Fishkin) have used deliberative procedures directly to answer research as well as policy questions. Researchers should use and interpret the results of deliberations carefully until more is known about the influence of particular aspects of the method.

There are sound theoretical and philosophical reasons for involving the public directly in policy decisions, including policy decisions in bioethics.
Deliberative methods present a promising research approach to addressing morally, technically, and politically challenging policy questions, with the additional advantage that research and policy aims can, at times, be fruitfully combined.

REFERENCES

Abelson, J., Eyles, J., McLeod, C. B., Collins, P., McMullan, C., & Forest, P. G. (2003). Does deliberation make a difference? Results from a citizens panel study of health goals priority setting. Health Policy, 66(1), 95–106.
Abelson, J., Forest, P.-G., Eyles, J., Smith, P., Martin, E., & Gauvin, F.-P. (2003). Deliberations about deliberative methods: Issues in the design and evaluation of public participation process. Social Science and Medicine, 57(2), 239–251.
Ackerman, B., & Fishkin, J. S. (2002). Deliberation day. The Journal of Political Philosophy, 10(2), 129–152.
Armstrong, D., Kline-Rogers, E., Jani, S. M., Goldman, E. B., Fang, J., Mukherjee, D., Nallamothu, B. K., & Eagle, K. A. (2005). Potential impact of the HIPAA privacy rule on data collection in a registry of patients with acute coronary syndrome. Archives of Internal Medicine, 165(10), 1125–1129.
Bartels, L. M. (2003). Democracy with attitudes. In: M. B. MacKuen & G. Rabinowitz (Eds), Electoral democracy (pp. 48–82). Ann Arbor, MI: University of Michigan Press.
Biddle, A. K., DeVellis, R. F., Henderson, G., Fasick, S. B., & Danis, M. (1998). The health insurance puzzle: A new approach to assessing patient coverage preferences. Journal of Community Health, 23(3), 181–194.
Cohen, J. (1997). Deliberation and democratic legitimacy. In: J. Bohman (Ed.), Deliberative democracy. Cambridge: MIT Press.
Corbie-Smith, G., Thomas, S. B., & St George, D. M. (2002). Distrust, race, and research. Archives of Internal Medicine, 162, 2458–2463.
Corbie-Smith, G., Thomas, S. B., Williams, M. V., & Moody-Ayers, S. (1999). Attitudes and beliefs of African Americans toward participation in medical research.
Journal of General Internal Medicine, 14(9), 537–546.
Damschroder, L. J., Pritts, J. L., Neblo, M. A., Kalarickal, R. J., Creswell, J. W., & Hayward, R. A. (2007). Patients, privacy and trust: Patients’ willingness to allow researchers to access their medical records. Social Science and Medicine, 64(1), 223–235.
Daniels, N., & Sabin, J. (1998). The ethics of accountability in managed care reform. Health Affairs, 17(5), 50–64.
Danis, M., Biddle, A. K., Henderson, G., Garrett, J. M., & DeVellis, R. F. (1997). Older Medicare enrollees’ choices for insured services. Journal of the American Geriatrics Society, 45, 688–694.
Eddy, D. (1990). Connecting value and costs: Whom do we ask and what do we ask them? The Journal of the American Medical Association, 264, 1737–1739.
Emanuel, E. (1997). Preserving community in health care. Journal of Health Politics, Policy and Law, 22(1), 147–184.
Fishkin, J. (1995). The voice of the people: Public opinion and democracy. New Haven, CT: Yale University Press.
Fishkin, J., & Luskin, R. C. (2005). Experimenting with a democratic ideal: Deliberative polling and public opinion. Acta Politica, 40(3), 284–298.
Fleck, L. M. (1992). Just health care rationing: A democratic decision making approach. University of Pennsylvania Law Review, 140(5), 1597–1636.
Fleck, L. M. (1994). Just caring: Oregon, health care rationing, and informed democratic deliberation. The Journal of Medicine and Philosophy, 19, 367–388.
Goold, S. D. (1996). Allocating health care resources: Cost utility analysis, informed democratic decision making, or the veil of ignorance? Journal of Health Politics, Policy and Law, 21(1), 69–98.
Goold, S. D., & Baum, N. M. (2006). Define ‘Affordable.’ Hastings Center Report, 36(5), 22–24.
Goold, S. D., Biddle, A. K., Klipp, G., Hall, C., & Danis, M. (2005). Choosing healthplans all together: A deliberative exercise for allocating limited health care resources. Journal of Health Politics, Policy and Law, 30(4), 563–601.
Goold, S.
D., & Klipp, G. (2002). Managed care members talk about trust. Social Science and Medicine, 54(6), 879–888.
Gutmann, A., & Thompson, D. (1997). Deliberating about bioethics. The Hastings Center Report, 27(3), 38–41.
Ham, C., & Coulter, A. (2001). Explicit and implicit rationing: Taking responsibility and avoiding blame for health care choices. Journal of Health Services Research and Policy, 6(3), 163–169.
Ingelfinger, J. R., & Drazen, J. M. (2004). Registry research and medical privacy. New England Journal of Medicine, 350(14), 1452–1453.
Jacobs, L. R. (1996). Talking heads and sleeping citizens: Health policy making in a democracy. Journal of Health Politics, Policy and Law, 21(1), 129–135.
Kapiriri, L., & Norheim, O. F. (2002). Whose priorities count? Comparison of community-identified health problems and burden-of-disease-assessed health priorities in a district in Uganda. Health Expectations, 5(1), 55–62.
Lenaghan, J. (1999). Involving the public in rationing decisions. The experience of citizen juries. Health Policy, 49, 45–61.
Mechanic, D. (1998). The functions and limitations of trust in the provision of medical care. Journal of Health Politics, Policy and Law, 23(4), 661–686.
Mechanic, D., Ettel, T., & Davis, D. (1990). Choosing among health insurance options: A study of new employees. Inquiry, 27, 14–23.
Mechanic, D., & Schlesinger, M. (1996). Impact of managed care on patient’s trust. The Journal of the American Medical Association, 275, 1693–1697.
Menzel, P. T. (1990). Strong medicine: The ethical rationing of health care. New York, NY: Oxford University Press.
Neblo, M. (2005). Thinking through democracy: Between the theory and practice of deliberative politics. Acta Politica, 40(2), 169–181.
Rawlins, M. D. (2005). Pharmacopolitics and deliberative democracy. Clinical Medicine, 5(5), 471–475.
Rowe, G., & Frewer, L. J. (2000). Public participation methods: A framework for evaluation.
Science, Technology and Human Values, 25(1), 3–29.
Schulz-Hardt, S., Jochims, M., & Frey, D. (2002). Productive conflict in group decision making: Genuine and contrived dissent as strategies to counteract biased information seeking. Organizational Behavior and Human Decision Processes, 88(2), 563–586.
Shavers, V. L., Lynch, C. F., & Burmeister, L. F. (2002). Racial differences in factors that influence the willingness to participate in medical research studies. Annals of Epidemiology, 12, 248–256.
Steenbergen, M. R., Bächtiger, A., Spörndli, M., & Steiner, J. (2003). Measuring political deliberation: A discourse quality index. Comparative European Politics, 1, 21–48.
Tu, J. V., Willison, D. J., Silver, F. L., Fang, J., Richards, J. A., Laupacis, A., & Kapral, M. K. (2004). For the Investigators in the Registry of the Canadian Stroke Network. Impracticability of informed consent in the registry of the Canadian stroke network. New England Journal of Medicine, 350(14), 1414–1421.
US HHS (United States Department of Health and Human Services). (2005). Health Services Research and the HIPAA Privacy Rule. National Institutes of Health Publication, #05–5308.

INTERVENTION RESEARCH IN BIOETHICS

Marion E. Broome

ABSTRACT

This chapter discusses the role of intervention research in bioethical inquiry. Although many ethical questions of interest are not appropriate for intervention research, some questions can only be answered using experimental or quasi-experimental designs. The critical characteristics of intervention research are identified and strengths of this method are described. Threats to internal validity and external validity are discussed and applied to a case example in bioethical research. Several recent intervention studies that were federally funded in the area of informed consent are discussed, and recommendations for future intervention research are presented.
Empirical Methods for Bioethics: A Primer
Advances in Bioethics, Volume 11, 203–217
Copyright © 2008 by Elsevier Ltd. All rights of reproduction in any form reserved
ISSN: 1479-3709/doi:10.1016/S1479-3709(07)11009-8

INTRODUCTION

Empirical research in bioethics has been defined as “the application of research methods in the social sciences (i.e., anthropology, epidemiology, psychology, and sociology) to the direct examination of issues in medical ethics” (Sugarman & Sulmasy, 2001, p. 20). Empirical research methods have been successfully applied to several areas of bioethical inquiry, such as informed consent, education of health professionals in ethical reasoning, assessment of patient–provider communication preferences, end-of-life decision making, and assessment of the effectiveness of various interventions. The purpose of this chapter is to discuss intervention studies applied to the study of bioethical phenomena. The chapter will review topic areas to which intervention research has been successfully applied, describe critical design elements of such research, discuss the strengths and challenges of this approach, and provide an in-depth analysis of a specific intervention study on informed consent. Finally, recommendations will be made for future research in bioethics that may benefit from intervention methods.

INTERVENTION STUDIES IN BIOETHICS

The use of empirical methods in bioethical inquiry has increased over the past two decades. Sugarman, Faden, and Weinstein (2001) conducted an analysis of empirical studies posted in BIOETHICSLINE during the decade of the 1980s. At that time, 663 of the postings (3.4 percent) were reports of empirical research.
In a subsequent analysis of the reports of empirical studies in bioethics in MEDLINE in 1999 (which by then had subsumed the BIOETHICSLINE database), the share of postings had doubled from 0.6 percent of all MEDLINE postings in 1980–1984 to 1.2 percent in 1995–1999 (Sugarman, 2004). The most frequently studied topic, regardless of type of empirical approach, was informed consent, with physician–patient relationship and ethics education among the top 30 of 50 topics overall. Since 2000, studies have reported on the effectiveness of different types of materials (written, tailored, videotapes, etc.) in increasing individuals’ understanding of a health condition they have or are at risk for (Skinner et al., 2002; Rimer et al., 2002). Sugarman identified eight types of empirical research ranging from “purely descriptive studies” to “case reports.” Only one type, which Sugarman calls “demonstration projects,” is based on interventional research. This multitude of empirical approaches is very appropriate in the field of bioethics, as the overwhelming majority of topics in bioethics, such as medical care at end-of-life, confidentiality, or euthanasia, would not lend themselves to intervention studies or to manipulation of the independent variables (Sugarman & Sulmasy, 2001).

EXPERIMENTAL DESIGNS

Many descriptive studies in bioethics have examined issues surrounding clinical trials, including therapeutic misconception, informed consent, placebo groups, randomization, physician–scientist conflict, and the provision of drugs for individuals who are addicted (Friedman, Furberg, & DeMets, 1998; Sugarman, 2004). Such studies are designed to build a descriptive body of knowledge, generating hypotheses that can be tested in intervention studies (Sugarman, 2004), and by now some topics have been sufficiently well described to suggest that interventions be developed and tested.
The purpose of a well-designed clinical trial is to prospectively compare the efficacy of an intervention in one group of human participants to the effects in another group of individuals to whom no treatment was intentionally administered. The application of experimental designs, such as the randomized controlled trial (RCT), has been limited in bioethical inquiry for several reasons, including difficulties applying the stringent requirements for random selection, as well as difficulties with assignment, blinding, and achieving a sufficient sample size (Friedman et al., 1998). Therefore, instead of using the RCT, much of the intervention research conducted in bioethics follows the precepts of quasi-experimental designs, in which randomization is limited to random assignment to conditions and control groups are referred to as comparison groups (Rossi, Freeman, & Lipsey, 1999). Quasi-experimental designs range from the more rigorous two-group repeated measures or pre-/post-test designs to the one-group post-test-only design. The latter provides the least amount of control over extraneous factors. The nature of these designs allows for varying degrees of control for group differences in pre-existing characteristics (e.g., education, age, etc.) and for events that may occur during the study and that may influence outcomes (Cozby, 2007). The degree of control is determined in part by the topic under study, and as noted above, there are topics in bioethics where not all conditions can be met for a purely experimental design. In quasi-experimental designs, both the degree to which findings are generalizable (which depends on controls such as random selection and assignment) and the degree to which cause-and-effect conclusions can be drawn (which depends on manipulation of the independent variable and use of control groups) will be limited.
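To make the contrast between these designs concrete, the following minimal Python sketch compares a two-group pre-/post-test analysis with a post-test-only comparison. The scores and function names are hypothetical and were invented for illustration; they are not taken from any study cited in this chapter, and a real analysis would add an appropriate statistical test.

```python
from statistics import mean

# Hypothetical knowledge scores (illustrative only, not real data).
# Each group is a pair: (pre-test scores, post-test scores), in the same participant order.
intervention = ([10, 12, 11, 13], [14, 15, 15, 16])
comparison = ([12, 13, 11, 14], [13, 13, 12, 15])


def mean_change(pre, post):
    """Average within-person change from pre-test to post-test."""
    return mean(after - before for before, after in zip(pre, post))


def difference_in_change(treated, untreated):
    """Change in the treated group beyond the change seen in the comparison group."""
    return mean_change(*treated) - mean_change(*untreated)


def post_only_difference(treated, untreated):
    """Post-test-only contrast: ignores where each group started."""
    return mean(treated[1]) - mean(untreated[1])


print(difference_in_change(intervention, comparison))  # 2.75
print(post_only_difference(intervention, comparison))  # 1.75
```

In this invented example the comparison group starts one point higher at baseline, so the post-test-only contrast understates the change associated with the intervention. Measuring both groups before and after is what allows the pre-existing difference to be taken into account.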
Essential Attributes of Experimental Designs

Experimental designs use a variety of procedures to distribute pre-existing differences among participants equally across conditions (i.e., experimental and control), in order to maximize the similarity of circumstances under which participants receive an intervention and study groups are observed. These procedures can be categorized into three primary components: (1) random assignment of participants to the control and experimental groups, (2) manipulation and systematization of the intervention, and (3) exposing the control group to contextual experiences as similar as possible to those of the experimental group during the study (Shadish, Cook, & Campbell, 2002). Randomization to intervention and control groups is essential in order for the investigator to assure that any differences that individuals bring to the study, such as previous experiences and personal or socio-demographic characteristics that could interact with the intervention, will be equally distributed across groups and thus not influence the outcome (Shadish et al., 2002; Lipsey, 1990).

Manipulation of the independent variable, constituting an intervention in the experimental group only, is an essential attribute of experimental studies (Lipsey, 1990). In this respect experimental studies differ from field studies, in which naturally occurring phenomena that affect a group are observed as the study unfolds. For instance, in a hypothetical observational field study of informed consent in a research trial, the investigator would observe conversations between the researcher and study participants under different conditions to draw conclusions about how various factors may influence participants’ understanding of risk level. In this study the investigator does not manipulate an intervention.
In an experimental intervention study, the investigator would randomly assign researchers to different scripts, use vignettes with varied characteristics, or manipulate other variables in the experimental group in order to ascertain how different types of information delivery affect a participant’s understanding of risk.

Another important aspect of intervention studies is standardization of the intervention. This means that the investigator must develop and adhere to a protocol so that all participants in the experimental group are exposed to the intervention in the same way and for the same amount of time (Cozby, 2007). This will enable the investigator to interpret results with more confidence about the effect of the intervention. Additionally, others can then replicate the study by following the same protocol. Finally, the researcher must make efforts to ensure that external events or experiences that may have a bearing on the study outcome are not significantly different for participants in experimental and control conditions.

The RCT is the most rigorous and best-controlled experimental design. Adhering to prescribed guidelines (CONSORT, 2004), this design always employs a control group, randomization, often random selection, and an intervention that follows a strict, standardized protocol. The RCT is well established, highly reliable, and valid, allowing for the use of multivariate statistical procedures to make causal inferences about the effect of an intervention on an outcome. It requires as much systematic control over extraneous variables as is feasible. When this is not possible, the extent to which an effect can be attributed to the intervention is less certain.

Strengths of the Experimental Design

The overall aim of an experimental design is to control as many threats to internal and external validity as is possible, with the RCT being the design that provides the most rigorous application of control.
When a researcher examines the relationships between two or more variables, he or she must be concerned about minimizing threats to internal validity and external validity. Internal validity is defined as the extent to which the effects detected are a true reflection of reality rather than being the result of extraneous variables (Burns & Grove, 1997, p. 230). That is, the researcher wants to be assured that the relationship between variables of interest is not influenced by unmeasured variables. External validity refers to the extent to which study findings are considered generalizable to other persons, settings, or times (Shadish et al., 2002). The significance of a study is judged, at least in part, by whether the findings can be applied to individuals and groups separate from those in the sample studied (Burns & Grove, 1997).

THREATS TO INTERNAL VALIDITY

There are 12 commonly acknowledged threats to internal validity, or the ability to be confident that a proposed causal relationship reflects known, rather than unknown or unmeasured, variables. These threats are: history; maturation; testing; instrumentation; statistical regression; selection; mortality; ambiguity about direction of a causal relationship; interactions with selection; diffusion or imitation of intervention; compensatory equalization of treatments; and compensatory rivalry by or resentful demoralization of respondents receiving less desirable treatments (Cook & Campbell, 1979; Shadish et al., 2002). Of these, history, selection, testing, instrumentation, and diffusion of intervention are especially relevant in bioethical intervention studies. Each of these threats will be illustrated by an intervention study whose purpose was to examine the effectiveness of education about research integrity delivered to graduate students with the use of different teaching techniques. Outcome variables were knowledge and attitudes about ethical approaches to research and scientific misconduct. Content was delivered via standard in-person classes compared to self-paced, interactive web-based modules. Groups receiving these interventions were compared to a group of students not taking the course (see Table 1).

Table 1. Case Example: Illustration of Threats to Internal and External Validity.

Purpose: The purpose of this study was to examine the effectiveness of education about research integrity delivered using different teaching techniques. The outcome variables included knowledge and attitudes about ethical approaches to research and scientific misconduct. The content was delivered via standard in-person classes compared to self-paced, interactive modules via a web-based platform, and both were compared to a group of students not taking the course.

Methods: Ninety-six Ph.D. students in psychology from two different universities were randomly selected from a group of 200 volunteers to participate in the study. These 96 were then randomly assigned to one of two interventions or one control group. The first intervention consisted of a series of four on-line modules to be completed over a four-week period, the second intervention consisted of six two-hour in-class sessions, and the control group did not receive any instruction. The Scientific Misconduct Questionnaire-Revised (SMQ-R; Broome, Pryor, Habermann, Pulley, & Kincaid, 2005) was used to assess knowledge, attitudes, and experiences with scientific misconduct after completion of the intervention and one year later for all three groups. All participants were surveyed using the SMQ-R pre-intervention, and at eight weeks and one year after the start of the study.
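The two randomization steps in the case example, random selection of 96 students from 200 volunteers followed by random assignment to three conditions, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the volunteer identifiers, the condition labels, the seed, and the equal group sizes are hypothetical, since the case example does not report how many students were placed in each group.

```python
import random


def select_and_assign(volunteer_ids, n_selected, conditions, seed=None):
    """Randomly select participants, then randomly assign them to conditions.

    Assumes equal-sized groups (an assumption made for this sketch).
    """
    rng = random.Random(seed)  # seeded only so the illustration is repeatable

    # Random selection: every volunteer has the same chance of being chosen.
    selected = rng.sample(volunteer_ids, n_selected)

    # Random assignment: shuffle the selected people, then deal them out in turn.
    rng.shuffle(selected)
    groups = {condition: [] for condition in conditions}
    for position, participant in enumerate(selected):
        groups[conditions[position % len(conditions)]].append(participant)
    return groups


# Hypothetical identifiers for the 200 volunteers.
volunteers = [f"V{number:03d}" for number in range(1, 201)]
conditions = ["online_modules", "in_class_sessions", "control"]

groups = select_and_assign(volunteers, 96, conditions, seed=2008)
for condition, members in groups.items():
    print(condition, len(members))  # each condition receives 32 participants
```

Note that random selection here operates only within the pool of volunteers. It does nothing to make the volunteers themselves representative of all psychology graduate students, which is the limitation taken up under the selection threat.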
History

The threat of history refers to the possibility that participants in the experimental and the control group have different experiences while under observation in a study and that such experiences (extraneous variables) influence the outcomes differently in the groups. In the case study described in Table 1, the psychology students at one of the universities were all mandated to take a research ethics workshop provided by an official from the Office of Research Integrity at NIH. That is, the experience of taking a mandated research ethics course had the potential of affecting attitudes and knowledge about research ethics in that university but not in the other. This meant that the control group at one of the sites was no longer in the control condition after being exposed to the mandated class.

Selection

The threat of selection refers to bias that occurs when the recruitment and assignment of study subjects do not give every potential participant the same opportunity to become enrolled in the study (random selection) or the same opportunity to be assigned to the treatment or control condition (random assignment). In the case example, the investigators advertised the study widely to all psychology graduate students on both campuses. As one would expect, only those who were interested in participating volunteered, and those who volunteer to participate in studies may represent a specific subset of the population. The investigators addressed this bias by randomly selecting 96 of the 200 students who volunteered. Random selection involves selecting study subjects by chance (e.g., using a table of random numbers) to represent the population from which they are chosen (Shadish et al., 2002) and gives each volunteer an equal chance of being chosen.
The investigators then randomly assigned each individual to one of the three groups: one which completed four on-line modules over a four-week period; another which took part in six didactic two-hour sessions; and a control group that received no instruction. Thus, the threat of selection bias was minimized.

Testing

Testing is a threat that occurs when participants are asked to respond to the same measure on several occasions and, as a result, may remember some of the specific items. In the case study, after the first administration, some of the students may have become sensitized to the items on the instrument used to evaluate knowledge and attitudes (Broome et al., 2005) and remembered how they answered the questions when being administered the same instrument a second time. Hence, any change in responses may not be explained by exposure to the intervention. Some investigators handle this problem by using alternative but parallel forms of a measure, so that the questionnaire tests the same concepts but uses different wording for the items. Others measure both control and intervention groups pre- and post-intervention with the same instrument so that patterns of change can be statistically tested to control for initial differences between groups. That is, investigators can test the assumption that, due to random assignment, one would expect no differences in pretest scores and that the potential for testing problems would be the same for both groups. Thus, differences in outcome would be attributed to the intervention.

Instrumentation

This threat is related to a change in the method of measurement from pre- to post-intervention. An example from the case study is the use of a quantitative survey to measure the students’ knowledge about research integrity before the intervention and the use of open-ended interviews assessing this knowledge after the intervention.
Any change in knowledge (either via a score on the survey or a coded analysis of open-ended responses) cannot, with a reasonable level of certainty, be attributed to the intervention, but could likely be affected by the manner in which knowledge was measured at the two points in time. Another problem related to instrumentation can occur as a result of the response option format. For instance, when intervals on a scale are narrower on the ends than in the middle (e.g., extremely positive, very positive, positive, somewhat positive, somewhat negative, negative, very negative, extremely negative), responses on a second administration of an instrument may often tend to cluster around the middle rather than reflecting the full range of options (Cook & Campbell, 1979). This may be due to individuals becoming frustrated at attempting to differentiate between the outer options of “very” and “extremely,” rendering the measure less a reflection of subjects’ actual responses and more a result produced by the format of the instrument itself.

Diffusion of the Intervention

When an intervention study is designed to assess the acquisition of knowledge or skills, and when individuals in the intervention and control groups can interact about an intervention (e.g., discuss it outside the study), the control group may gain access to information that may reduce differences between the two groups on outcome measures. In the case example, this threat is especially salient, as psychology graduate students often take the same classes or participate in similar activities within a university and may thus discuss with each other particulars of the educational intervention. A method to control for this is to randomize the intervention at the university level (across settings) rather than at the individual level (within a given setting).
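Randomizing at the level of the setting, as suggested above for limiting diffusion, can be sketched as follows. The site names, condition labels, and function names are hypothetical. The sketch assumes six participating universities so that each of three conditions receives two sites, whereas the case example involved only two universities.

```python
import random


def randomize_by_site(site_ids, conditions, seed=None):
    """Assign whole sites, not individual students, to study conditions."""
    rng = random.Random(seed)  # seeded only so the illustration is repeatable
    shuffled = list(site_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled sites out to the conditions in turn.
    return {site: conditions[index % len(conditions)]
            for index, site in enumerate(shuffled)}


def condition_for(student_site, site_assignment):
    """Every student inherits the condition of his or her own university."""
    return site_assignment[student_site]


# Hypothetical sites and condition labels.
sites = ["University A", "University B", "University C",
         "University D", "University E", "University F"]
conditions = ["online_modules", "in_class_sessions", "control"]

site_assignment = randomize_by_site(sites, conditions, seed=11)
for site in sites:
    print(site, "->", site_assignment[site])
```

Because all students on a campus share one condition, classmates cannot pass intervention content to control participants at the same university. The trade-off is that the site, rather than the student, becomes the unit of randomization, so more sites are generally needed and the analysis should take account of the grouping of students within sites.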
THREATS TO EXTERNAL VALIDITY

Threats to external validity can limit the ability of the investigator to generalize the results of a study beyond the current sample, which, in turn, limits the usefulness of the study’s findings. The three major threats fall into the following categories: (1) interaction of selection and treatment; (2) interaction of setting and treatment; and (3) interaction of history and treatment. In the first situation, study participants possess a specific characteristic as a group that interacts with the intervention in such a way that change in the outcome is not generalizable to other groups of individuals. For example, a significant effect of an intervention to increase medical students’ skills in clinical decision making in morally ambivalent situations may not be generalizable to a group of nursing assistants. In this case, the significant difference in educational level between groups is such that it interacts with the intervention to produce different outcomes. In the second and third situations (i.e., setting and history), contextual events (e.g., pay raises, new institutional leadership climate, additional bioethics workshops) that may occur during an intervention study are not replicable in a subsequent study or in other settings and will therefore restrict generalizability of findings.

In summary, it is important that an investigator select a research design to test an intervention that will control for as many threats to internal and external validity as possible. Some threats (e.g., selection, testing, and history) can be controlled for by using randomization. Other threats (diffusion of treatment and instrumentation) can be planned for by randomizing across sites rather than within one site and using the same instrument for all assessments.
Choosing well-established measures that have been tested in other studies and that have demonstrated adequate reliability and validity is crucial to obtaining credible responses. Maintaining strict protocols regarding instrumentation and data collection helps to ensure that data are collected under conditions that are as similar as possible and that any differences in responses between groups are due to the intervention. Plans that include adequate time and rigor in cleaning and managing data and applying statistical tests best fit for the type of data collected will decrease threats to both types of validity and enhance the reliability and credibility of the findings (Cook & Campbell, 1977).

ADVANTAGES OF INTERVENTION RESEARCH IN BIOETHICAL INQUIRY

Many questions asked by bioethicists can be answered by any one of the many non-experimental designs available to researchers. However, there are several important questions that require the investigator to systematically and rigorously compare, in a controlled context, the effects of one or more treatments on selected outcomes.

Table 2. Selected Research Questions for Intervention Research.
1. Does the timing of information (72 h, 24 h, and immediately before an elective procedure) about a research study influence the understanding of the purpose of the study, the risks and benefits, and the refusal rate of participation?
2. Are nurses who are assigned to work with patient actors who are labeled as hospice patients more likely to discuss organ donation during a clinic visit than those assigned to patient actors who have cancer but are not enrolled in hospice?
3. Are tailored instructional materials for individuals at risk for genetic conditions more effective than standardized materials in increasing knowledge and satisfaction?
4. Do school-age children who are shown a DVD depicting a child engaged in a study demonstrate greater comprehension and a more positive affect in regard to study participation?
The experimental or quasi-experimental designs used to test the effectiveness of an intervention are most useful when assessing changes in knowledge, understanding, attitudes, or behaviors related to some aspect of ethical phenomena (Danis, Hanson, & Garrett, 2001). Examples of some questions that can only be answered by intervention designs are included in Table 2. There are at least five distinct advantages to intervention research in bioethical inquiry: (1) the ability to examine whether a certain action (intervention) can be safely used with a selected group of individuals (efficacy) under relatively controlled conditions; (2) the ability to examine how useful an intervention is when used in the real world with a variety of people (effectiveness); (3) the ability to use multiple measurement methods (e.g., behavioral observation and questionnaires) to assess the impact of an intervention; (4) the possibility to maximize confidence in relationships found between variables (i.e., cause and effect); and (5) the ability to reach greater acceptance of findings by the larger scientific research community.

CHALLENGES IN INTERVENTIONAL RESEARCH IN BIOETHICAL INQUIRY

Not all bioethical phenomena are appropriate for study using experimental or quasi-experimental designs. In general, these include naturally occurring phenomena that cannot be manipulated, the presence of conditions to which one cannot randomly assign individuals, and circumstances that cannot be scripted or protocolized. For instance, one cannot manipulate or randomize individuals to different conditions at the end-of-life to study how they cope. Another challenge is the ethics of studying ethical issues. For example, in some situations the imposition of a research study, no matter how well intended, can burden participants during difficult or stressful times (Sachs et al., 2003).
One example is studying, soon after a child's death, how the parents decided whether or not to enroll that child in an end-of-life research study. It is therefore critical to evaluate carefully, prior to approval or implementation, the study's potential for benefit (in relation to risk or harm) and for expanding knowledge.

Another challenge in intervention research is recruiting an adequate number of participants so that statistically significant differences between groups can be detected. Given the nature of bioethical phenomena, it can be difficult to attract a sample large enough to meet the requirements of rigorous statistical analyses. Multi-site studies that facilitate obtaining large samples can address this challenge.

ILLUSTRATIONS OF INTERVENTION RESEARCH – APPROACHES AND FINDINGS

In 1998, the National Institutes of Health funded 18 studies on the topic of informed consent. The purpose of this initiative was to produce (1) new and improved methods for the informed consent process; (2) methods that would address the challenges in obtaining consent from vulnerable populations; and (3) data to inform public policy (Sachs et al., 2003). The studies tested interventions designed to improve two different but related areas: (1) knowledge among potential research participants, and (2) decision-making abilities in vulnerable individuals (Agre et al., 2003). Six studies were RCTs and one used a quasi-experimental design in which patients were not randomized to groups. The studies had in common the testing of a variety of media interventions, such as videotapes, decision aids, and computer software, to convey information to potential research subjects. A selection of the seven funded projects that were intervention studies is reviewed below to illustrate the range of questions, designs, and analyses used in such research.
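The recruitment challenge noted above, enrolling enough participants to detect a statistically significant difference between groups, can be made concrete with a rough sample-size calculation. The sketch below is illustrative only and is not drawn from the studies reviewed here. It assumes a two-group comparison of mean scores, a two-sided test, and a normal approximation; the function name `n_per_group` and its default values (alpha of 0.05, power of 0.80) are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed in EACH of two groups.

    effect_size is the standardized mean difference (difference in means
    divided by the common standard deviation). Uses the normal
    approximation n = 2 * ((z_alpha + z_beta) / effect_size) ** 2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, chosen only to show how quickly n grows.
    for d in (0.8, 0.5, 0.2):
        print(f"standardized difference {d}: {n_per_group(d)} per group")
```

Under these assumptions, a medium standardized difference (0.5) calls for about 63 participants per group, whereas a small one (0.2) calls for nearly 400, which helps explain why multi-site recruitment is often needed. An exact calculation based on the t distribution gives slightly larger numbers.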
One of the studies was conducted with patients and families in a hospital waiting room who were going to make a decision about participating in a clinical trial (Agre et al., 2003). Subjects were randomly assigned to one of four modalities: standard verbal consent, a video, a computer program, or a booklet. A brief quiz was used to measure knowledge as the outcome. Findings revealed that the more complex the research protocol described to the potential participant, the higher the knowledge score. They also showed that better-educated patients had higher scores, while more distressed individuals and members of minority groups scored lower. There were no main effects for the four media, suggesting that none was superior to the others. However, the video was more effective for those deciding about complex protocols and for minority participants, while the booklet was more effective for those in poor health.

In a second study on informed consent, also focusing on knowledge outcomes, Campbell, Goldman, Boccia, and Skinner (2004) conducted a simulated recruitment for two pediatric studies, one high risk (e.g., insertion of a device in patients awaiting heart transplant) and one low risk (e.g., longitudinal assessment of low birth weight infants), with parents of children enrolled in Head Start. Four different interventions were tested: (1) a standard consent form; (2) a consent form with enlarged type and more white space; (3) a videotape; and (4) a PowerPoint presentation. None of the four methods of conveying information was superior. However, parents were significantly less likely to enroll their child in the high-risk protocol, regardless of the method tested.
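The first study above randomly assigned participants to one of four consent formats. One common way to do this while keeping the arms close to equal in size is permuted-block randomization, sketched below. The source does not report which allocation procedure the investigators used, so the block scheme, the function name `block_randomize`, and the arm labels (borrowed from the description of the first study) are illustrative assumptions.

```python
import random

# Arm labels taken from the description of the first study; the allocation
# scheme itself is an assumption made for illustration.
ARMS = ["standard verbal consent", "video", "computer program", "booklet"]


def block_randomize(n_participants, arms=ARMS, seed=None):
    """Return one arm per participant using permuted blocks.

    Each block contains every arm exactly once in random order, so arm
    sizes never differ by more than one participant.
    """
    if n_participants < 0:
        raise ValueError("n_participants must be non-negative")
    rng = random.Random(seed)  # a fixed seed makes the list reproducible
    assignments = []
    while len(assignments) < n_participants:
        block = list(arms)
        rng.shuffle(block)          # random order within the block
        assignments.extend(block)
    return assignments[:n_participants]  # trim a partially used final block


if __name__ == "__main__":
    # Hypothetical enrollment of 10 participants.
    for participant_id, arm in enumerate(block_randomize(10, seed=2024), start=1):
        print(participant_id, arm)
```

In practice the allocation list would be generated before enrollment begins and concealed from the staff who recruit participants, so that knowledge of the next assignment cannot influence who is enrolled.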
In one of the other studies (an early phase research trial, EPRT, with oncology patients), researchers first conducted a descriptive study in which interactions between patients and their physicians were audiotaped while the study was described and participation was offered (Agre et al., 2003). Based on this, an intervention was designed to increase patients' understanding of the EPRT, consisting of a 20-minute, self-paced, touch-screen, computer-based educational program. Patients were randomly assigned to the intervention or the control group, with the latter receiving a pamphlet that explained the EPRT. Results showed that the intervention had minimal impact on agreement to participate, with equal numbers in both groups deciding to join the trial, although patients in the intervention group were more likely to say that the intervention changed the way they made their decision.

Mintz, Wirshing, and colleagues (Agre et al., 2003) developed two videotapes preparing potential participants to consider enrollment in a medical study or in a psychiatric study. In addition to content and information, the intervention videotape encouraged individuals to be active participants during the informed consent process. The control video presented historical information and federal regulations about informed consent. Participants for both studies were divided into intervention and control conditions. Knowledge about consent processes among participants in both the medical and psychiatric studies improved as a result of the intervention video compared with those viewing the control tape.

In another study, Merz and Sankar recruited participants from a General Clinical Research Center (GCRC) to test the effect on participants' knowledge of a standard consent form compared with a series of vignettes (Agre et al., 2003).
At post-test, participants in both groups demonstrated relatively good comprehension, and no significant difference in knowledge emerged between the control and intervention groups.

Overall, these studies show that the medium used to deliver the message did not consistently make a difference in the outcome variables of interest. What several of the studies did show were some important subgroup differences as a result of the interventions. The strength of these efforts is that, for the first time, several studies with similar designs and interventions (albeit different populations) could be compared and some preliminary conclusions drawn that validate previous thinking about informed consent. These include assumptions that younger age, higher education, higher literacy, and stronger medical knowledge influence outcomes in positive directions. This suggests that investigators must give more attention to how consents are presented, the characteristics of subjects, and how comprehension is evaluated (Agre et al., 2003).

Limitations of these informed consent studies include a lack of diversity in the samples, use of patient surrogates, participants who were well educated, and the use of complex designs with multiple variables. The majority of the subjects were white and well educated, reflecting the ethnic and socio-demographic settings in which most clinical trials are undertaken and, thus, restricting the generalizability of findings. The use of patient vignettes as a method to inform potential research subjects, while not unusual given the sensitive nature of many of the clinical trials (e.g., blood donation for DNA banking), also limits the generalizability of results to researchers, patients, and families (for a detailed discussion of the use of vignettes, see chapter on hypothetical vignettes).
The testing of multiple variables in some of the studies reviewed, although realistic, also makes it difficult to determine how external variables or the particular medium used may have influenced outcomes. Yet findings from these studies provide preliminary evidence on which to build future research, with the aim of expanding our knowledge and offering ways to maximize the protection of human subjects by optimizing their comprehension and informed decision making related to consent to participate in research investigations.

FUTURE RESEARCH

The use of intervention designs, while a relatively recent phenomenon in bioethical inquiry, has a distinct and important role to play in advancing the field of bioethics. It is especially important to examine ethical practices that have widespread implications for patients, families, and health care professionals. This is particularly the case with practices that have been proposed by policy makers, such as advance directives, organ donation, and do-not-resuscitate orders. Although not all, or even most, bioethical questions are appropriate to study using empirically driven intervention designs, some are and, in fact, some questions must be addressed using intervention models. Without a systematic, controlled approach to examining the effectiveness of interventions designed to change and test various outcomes, we will never know which actions work and which do not. Clinicians depend on rigorously designed intervention studies that provide evidence for the establishment of guidelines for conflict resolution and decision making in the delivery of quality health care. Furthermore, findings generated by intervention studies in bioethics make data available to policy makers for formulating policies, funding programs, and enacting legislation that may assist clinicians and ethicists in resolving value conflicts and other ethical problems.
Ultimately, this will improve the lives of patients and families who experience suffering, not only from their illnesses, but also from vexing questions related to these illnesses.

REFERENCES

Agre, P., Campbell, F., Goldman, B. D., Boccia, M. L., Kass, N., McCullough, L. B., Merz, J., Miller, S., Mintz, J., Rapkin, B., Sugarman, J., Sorenson, J., & Wirshing, D. (2003). Improving informed consent: The medium is not the message (Suppl.). IRB: Ethics and Human Research, 25(5), 1–19.

Broome, M. E., Pryor, E., Habermann, B., Pulley, L., & Kincaid, H. (2005). The scientific misconduct questionnaire – revised (SMQ-R): Validation and psychometric testing. Accountability in Research, 12(4), 263–280.

Burns, N., & Grove, S. (1997). The practice of nursing research: Conduct, critique and utilization. Philadelphia, PA: W.B. Saunders Co.

Campbell, F., Goldman, B., Boccia, M., & Skinner, M. (2004). The effect of format modifications and reading comprehension on recall of informed consent information by low-income parents: A comparison of print, video, and computer-based presentations. Patient Education and Counseling, 53, 205–216.

CONSORT. (2004). http://www.consort-statement.org/. Last accessed on November 28, 2005.

Cook, T., & Campbell, D. (1977). Quasi-experimentation: Design and analysis issues for field settings. Boston: Houghton Mifflin.

Cook, T., & Campbell, D. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston: Houghton Mifflin.

Cozby, P. C. (2007). Methods in behavioral research (9th ed.). New York: McGraw-Hill.

Danis, M., Hanson, L. C., & Garrett, J. M. (2001). Experimental methods. In: J. Sugarman & D. P. Sulmasy (Eds), Methods in medical ethics (pp. 207–226). Washington, DC: Georgetown University Press.

Friedman, L., Furberg, C., & DeMets, D. (1998). Fundamentals of clinical trials. New York: Springer.

Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage.

Rimer, B. K., Halabi, S., Skinner, C. S., Lipkus, I. M., Strigo, T. S., Kaplan, E. B., & Samsa, G. P. (2002). Effects of mammography decision-making intervention at 12 & 24 months. American Journal of Preventive Medicine, 22, 247–257.

Rossi, P., Freeman, H., & Lipsey, M. (1999). Evaluation: A systematic approach (pp. 309–340). Thousand Oaks, CA: Sage Publications.

Sachs, G., Hougham, G., Sugarman, J., Agre, P., Broome, M., Geller, G., Kass, N., Kodish, E., Mintz, J., Roberts, L., Sankar, P., Siminoff, L., Sorenson, J., & Weiss, A. (2003). Conducting empirical research on informed consent: Challenges and questions (Suppl.). IRB: Ethics and Human Research, 25(5), 4–10.

Shadish, W. R., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.

Skinner, C. S., Schildkraut, J. M., Berry, D., Calingaert, B., Marcom, P. K., Sugarman, J., Winer, E. P., Iglehart, J. D., Futreal, P. A., & Rimer, B. K. (2002). Pre-counseling education materials for BRCA testing: Does tailoring make a difference? Genetic Testing, 6(2), 93–105.

Sugarman, J. (2004). The future of empirical research in bioethics. Journal of Law, Medicine and Ethics, 32, 226–231.

Sugarman, J., Faden, R. R., & Weinstein, J. (2001). A decade of empirical research in bioethics. In: J. Sugarman & D. P. Sulmasy (Eds), Methods in medical ethics (pp. 19–28). Washington, DC: Georgetown University Press.

Sugarman, J., & Sulmasy, D. P. (2001). Methods in medical ethics (p. 20). Washington, DC: Georgetown University Press.