"Virtual" Trials: Case Control Experiments Utilizing Health Services Research Workstation

"Virtual" Clinical Trials:
Case Control Experiments Utilizing a Health Services Research Workstation

Mark G. Weiner, M.D., Alan L. Hillman, M.D., M.B.A.
Division of General Internal Medicine
University of Pennsylvania School of Medicine
We created an interface to a growing repository of
clinical and administrative information to facilitate
the design and execution of case-control
experiments. The system enables knowledgeable
users to generate and test hypotheses regarding
associations among diseases and outcomes. The
intuitive interface allows the user to specify criteria
for selecting cases and defining putative risks. The
repository contains comprehensive administrative
and selected clinical information on all ambulatory
and emergency department visits as well as hospital
admissions since 1994. We tested the workstation's
ability to determine relationships between outpatient
diagnoses, including hypertension, osteoarthritis and
hypercholesterolemia, and the occurrence of
admissions for stroke and myocardial infarction, and
achieved results consistent with published studies.
Successful implementation of this Health Services
Research Workstation will allow "virtual" clinical
trials to validate the results of formal clinical trials
on a local population and may provide meaningful
analyses of data when formal clinical trials are not
feasible.
1091-8280/98/$5.00 © 1998 AMIA, Inc.
INTRODUCTION
The increasing focus on quality of care in medicine requires new systems
to recognize patients at risk for serious illness and to promote
interventions that demonstrably improve health status. Traditionally,
formal experimental methods have been required to prove an association
between a disease and its putative risk factors. Once a risk is
discovered, further clinical trials are required to determine whether
management of the risk factor can reduce the likelihood of disease
occurrence. Unfortunately, concerns about the generalizability of the
results often hamper their acceptance by practicing clinicians. Such
concerns are often well-founded since, in our experience, a physician's
patient population often has different characteristics than the formal
study population, and the effectiveness of therapy in the real world can
differ from the efficacy achieved within a tightly monitored, controlled
study.

In this paper, we describe the development and testing of a Health
Services Research Workstation, which is a specialized interface to an
integrated clinical practice database. It can be used to test the
relationships among diseases and perform "virtual" clinical trials to
examine the impact of different treatment modalities on the course of a
disease. Since the tool utilizes population-based information from
patients within a local community, results of these experiments should be
directly applicable to those populations and acceptable to physicians
practicing in the region.
BACKGROUND
A growing body of literature demonstrates the clinical and research value
of relational database technology that integrates medically related
information from disparate legacy systems. Electronic access to
comprehensive, longitudinal information on patients has improved the
process of patient care as measured by efficiency,1 reduction of adverse
drug events,2 and more appropriate medical decision-making.3

In addition to these direct clinical applications, the use of databases
to support clinical research has been debated and tested for years.
Notable successes include the Regenstrief Medical Records System, which
successfully demonstrated a variety of risk factors for the development
of renal insufficiency in hypertensives.4 The Health Evaluation through
Logical Processing (HELP) system at the Latter Day Saints (LDS) Hospital
was able to determine risk factors for nosocomial infections.5 Despite
these successes, criticism of this methodology focuses on the
observational, largely administrative nature of the information. Within
these databases, inaccurate data can be recorded and subtle differences
in patient characteristics can be overlooked. However, when the
administrative data are coupled with clinical data in the form of
practice databases, the impact of some of these errors can be
ameliorated.6 For example, the Duke Cardiovascular Disease Database
demonstrated the power of such integration of data by generating
prediction models of survival after coronary artery bypass surgery that
agreed with the results of randomized controlled trials.7 These examples
demonstrate that, while not a substitute for clinical trials, population
research using practice databases can generate hypotheses warranting
further study and can test the findings of clinical trials on local
populations. Importantly, clinical research using practice databases can
provide valuable analyses when a trial is unethical or infeasible.
Models are being developed to identify the characteristics that determine
whether database research or randomized clinical trials are more
appropriate.8
Whereas the use of databases for clinical practice and research9 has met
with great success, only a few notable efforts have attempted to automate
the methodology and structure interfaces to facilitate access to data
necessary for the experiment design and revision process.10 Therefore, an
additional goal of our work was to incorporate analysis tools with an
experiment design interface to help interpret population data. Our system
enables rapid design and execution of case-control studies. We
hypothesize that the interface, and the data supporting it, can be used
to identify known associations and reject implausible relationships.
METHODS
A. Database Development
We developed a comprehensive, integrated view of the diverse population
cared for by the University of Pennsylvania Health System. This group of
500,000 patients accounts for over 1 million ambulatory visits/year and
33,000 hospital admissions/year to our main tertiary care hospital. The
data elements accessible at this time include basic demographics and
information on all office visits and inpatient admissions, including
diagnoses assigned, procedures performed and charges assessed since
January 1994. This information had existed previously only within two
distinct databases, IDX for outpatients and PHS for inpatients.
Additionally, we have integrated laboratory results from Cerner and the
clinical findings of cardiac catheterization, nuclear cardiology and
echocardiography procedures that are currently stored within distinct
custom-built databases. Finally, we incorporated pharmacy data on a
subset of the Health System population. All the data exist centrally on a
DEC AlphaServer, on a UNIX platform running Oracle 7.3. The data are
accessible both through direct ODBC connections to Pentium workstations
running Microsoft Access and through web browsers, which interact with
the database via a Pentium II, 266 MHz machine with Microsoft Windows NT
running Internet Information Server 4.0 and Active Server Pages. Security
is achieved through password and firewall protection of the ODBC
connection and through password and Secure Sockets Layer encryption of
the Web interface.
B. Interface Development
We developed the interface using Microsoft Access and subsequently made
it available via web browsers using Active Server Pages constructed using
Microsoft's Visual InterDev. In addition to the traditional interface,
which enables the selection of patients with specified demographic and
clinical criteria, this interface is structured to facilitate
case-control experiments that allow the researcher to test for an actual
association between disease occurrence and reputed or hypothesized risk
factors. In these experiments, the researcher calculates an odds ratio,
which quantifies the relative occurrence of a putative risk in "case"
patients known to have a disease compared with "control" patients who do
not have the disease. Figure 1 shows a two-by-two table that illustrates
all the possible combinations of the presence or absence of a disease and
the presence or absence of a risk.

                 Case    Control
Risk Present      a         b
Risk Absent       c         d
Figure 1: A sample 2 x 2 table
The degree of association between a risk and a disease is related to the
magnitude of the odds ratio, which is calculated as
$(a \cdot d)/(b \cdot c)$. Values greater than 1 support a positive
association between the risk and the disease; values less than 1 support
a negative association with the disease, i.e., the presence of the
putative risk protects the individual from the disease. Statistical
significance of the association is determined by deriving the 95%
confidence interval around the odds ratio. The association is defined as
significant (p<0.05) if the range of values between the confidence limits
does not include 1. The formula used to calculate the confidence interval
(CI) is as follows:
$$\mathrm{CI} = \frac{a \cdot d}{b \cdot c}\,\exp\!\left(\pm z\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}\right)$$
where z = 1.96 to produce a 95% confidence interval.11
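
The odds ratio and confidence interval calculation described above can be sketched in a few lines of Python. This is only an illustration of the formula, not the workstation's own code (which was built with Microsoft Access and Active Server Pages); the function names and the example cell counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a 2 x 2 table.

    a: cases with the risk       b: controls with the risk
    c: cases without the risk    d: controls without the risk
    z: 1.96 gives a 95% confidence interval
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the natural log of the odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = odds_ratio * math.exp(-z * se_log_or)
    upper = odds_ratio * math.exp(z * se_log_or)
    return odds_ratio, lower, upper


def is_significant(lower, upper):
    """Significant when the interval between the limits excludes 1."""
    return not (lower <= 1.0 <= upper)


# Hypothetical counts, chosen only to exercise the functions
or_value, lower, upper = odds_ratio_ci(a=40, b=20, c=60, d=80)
print(f"OR = {or_value:.2f}, 95% CI {lower:.2f} to {upper:.2f}, "
      f"significant: {is_significant(lower, upper)}")
```

Computing the limits on the log scale and exponentiating keeps the lower limit positive, which is why the interval is asymmetric around the odds ratio.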
The user conducts the case-control experiment design in five simple
steps, which are illustrated in Figure 2. Cases and putative risks,
respectively, are first designated by entering the appropriate
International Classification of Diseases, Ninth Revision, Clinical
Modification (ICD-9-CM) diagnostic codes in the text boxes. The "*" is a
wildcard character to represent multiple related codes. The user can vary
the number of controls in a third text box. When the information is
entered, the user executes the experiment by pressing a button to locate
all cases and another to locate controls chosen at random from a similar
population. Retention of the case and control finding as two separate
steps stems from the frequent need to select the number of desired
controls AFTER the number of cases has been determined. Typically, the
number of controls is at least as great as the number of cases found.

Results are displayed as an odds ratio with confidence interval, as well
as the demographic characteristics of both cases and controls. Analysis
of the impact of different demographics should be the subject of further
investigation.
C. Selection of cases and putative risks
Ideal case diseases for study include, for example, hospital admissions
for stroke (ICD-9 = 434.*) and myocardial infarction (ICD-9 = 410.*)
because of their relatively high prevalence in the population and the
existence of well-defined criteria for diagnosis. We drew cases from the
inpatient database. To test the ability of the system to reach
conclusions that require integration of information, we chose risks from
within the distinct ambulatory care database. Ambulatory diagnoses of
hypertension (ICD-9 = 401.9), osteoarthritis (ICD-9 = 715.*) and
hypercholesterolemia (ICD-9 = 272.0) were chosen as putative risks of
stroke and myocardial infarction because of their relative frequency in
the population and because they illustrate a spectrum of possible risk
associations with the "case" diagnoses. For instance, the relationship
between hypertension and both myocardial infarction and stroke is very
well defined,12 and we expected to reaffirm that association. Conversely,
there is no known link between the presence of arthritis and either
stroke or myocardial infarction, so we did not expect to generate a
significant odds ratio. Finally, the relationship between elevated
cholesterol and myocardial infarction has been well established since the
original Framingham study,13 although a link between elevated cholesterol
and stroke has not been demonstrated.14 We expected the odds ratios
generated by our system to be consistent with these predictions from
peer-reviewed, published literature.
The choice of the population from which cases and controls are drawn
requires special consideration. Many of our health system's ambulatory
patients may be hospitalized at locations for which we have no data.
Similarly, many patients admitted to the Hospital of the University of
Pennsylvania are seen by primary care physicians outside our health
system. Therefore, to enable appropriate stratification of patients, the
interface queries a data mart consisting of the subset of Health System
patients for whom both inpatient and ambulatory data exist.
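
The selection logic described in this section (wildcard matching of ICD-9-CM codes, restriction to patients present in both data sources, and random selection of controls) can be sketched as follows. This is a hedged illustration only: the record layout, the patient identifiers, and every function name are hypothetical, and the workstation itself queried an Oracle data mart rather than in-memory dictionaries.

```python
import random
from fnmatch import fnmatchcase

# Hypothetical toy records: patient id -> ICD-9-CM codes in each source
inpatient = {
    1: ["410.1"], 2: ["410.9"], 3: ["486"],
    4: ["540.9"], 5: ["434.1"], 7: ["410.0"],
}
ambulatory = {
    1: ["401.9"], 2: ["715.0"], 3: ["401.9"],
    4: ["272.0"], 5: ["401.9"], 6: ["401.9"],
}


def has_code(codes, pattern):
    """True if any code matches the pattern; '*' is the wildcard."""
    return any(fnmatchcase(code, pattern) for code in codes)


def build_data_mart(inpatient, ambulatory):
    """Keep only patients with both inpatient and ambulatory data."""
    return sorted(set(inpatient) & set(ambulatory))


def find_cases(data_mart, inpatient, case_pattern):
    """Patients in the data mart with a matching inpatient diagnosis."""
    return [p for p in data_mart if has_code(inpatient[p], case_pattern)]


def find_controls(data_mart, cases, n_controls, seed=0):
    """Draw controls at random from data-mart patients who are not cases."""
    case_set = set(cases)
    pool = [p for p in data_mart if p not in case_set]
    return random.Random(seed).sample(pool, min(n_controls, len(pool)))


def two_by_two(cases, controls, ambulatory, risk_pattern):
    """Return the cells (a, b, c, d) of the 2 x 2 table."""
    a = sum(has_code(ambulatory[p], risk_pattern) for p in cases)
    b = sum(has_code(ambulatory[p], risk_pattern) for p in controls)
    return a, b, len(cases) - a, len(controls) - b


mart = build_data_mart(inpatient, ambulatory)
cases = find_cases(mart, inpatient, "410.*")  # myocardial infarction
controls = find_controls(mart, cases, n_controls=len(cases))
table = two_by_two(cases, controls, ambulatory, "401.9")  # hypertension
print(mart, cases, controls, table)
```

Keeping case finding and control finding as separate functions mirrors the two-button design described in the text, since the number of controls is usually chosen after the number of cases is known.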
[Figure 2 screenshot not reproduced. The form shows the numbered design
steps, including setting the number of controls and selecting controls,
together with a results panel giving the case and control counts, mean
age, age range, gender and race of each group, and the odds ratio with
its confidence interval. Legible values in the results panel include an
odds ratio of 1.46393557 with a confidence interval of 1.17593 to
1.8224694.]
Figure 2: The Case-Control design and results reporting form
Table 1: Odds ratios associated with putative risks and outcomes (OR = odds ratio; *p<0.05)
PUTATIVE RISK                                 OUTCOME                              OR    CONFIDENCE INTERVAL
Ambulatory Diagnosis of Hypertension          Admission for Myocardial Infarction  1.64  1.24-2.13*
Ambulatory Diagnosis of Hypertension          Admission for Stroke                 2.73  2.04-3.65*
Ambulatory Diagnosis of Arthritis             Admission for Myocardial Infarction  1.20  0.87-1.65
Ambulatory Diagnosis of Arthritis             Admission for Stroke                 1.41  0.95-2.04
Ambulatory Diagnosis of Hypercholesterolemia  Admission for Myocardial Infarction  3.47  2.69-4.48*
Ambulatory Diagnosis of Hypercholesterolemia  Admission for Stroke                 0.93  0.62-1.38
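
As a quick consistency check on Table 1, the sketch below applies the significance rule from the Methods (an interval that excludes 1) to each row and compares the result with the asterisks. It also checks that each odds ratio sits near the geometric mean of its limits, which is what a confidence interval built on the log scale implies. The values are transcribed from Table 1; the variable and function names are illustrative only.

```python
import math

# (risk, outcome, OR, lower limit, upper limit, asterisk in Table 1)
table_1 = [
    ("Hypertension", "Myocardial infarction", 1.64, 1.24, 2.13, True),
    ("Hypertension", "Stroke", 2.73, 2.04, 3.65, True),
    ("Arthritis", "Myocardial infarction", 1.20, 0.87, 1.65, False),
    ("Arthritis", "Stroke", 1.41, 0.95, 2.04, False),
    ("Hypercholesterolemia", "Myocardial infarction", 3.47, 2.69, 4.48, True),
    ("Hypercholesterolemia", "Stroke", 0.93, 0.62, 1.38, False),
]


def excludes_one(lower, upper):
    """Significance rule from the Methods: the interval must not include 1."""
    return lower > 1.0 or upper < 1.0


checks = []
for risk, outcome, odds_ratio, lower, upper, flagged in table_1:
    significant = excludes_one(lower, upper)
    # For a log-scale interval, OR is the geometric mean of the limits
    midpoint = math.sqrt(lower * upper)
    checks.append((significant == flagged, abs(midpoint - odds_ratio)))
    print(f"{risk} / {outcome}: significant={significant}, "
          f"geometric midpoint={midpoint:.2f}, reported OR={odds_ratio:.2f}")
```

Small gaps between the geometric midpoint and the reported odds ratio are expected because the published limits are rounded to two decimal places.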
RESULTS
Figure 2 displays the user interface screen, including the design and
results of one case-control experiment to examine the association between
hypertension and myocardial infarction. Table 1 summarizes the results of
the other experiments. As anticipated, the odds ratios generated by
evaluation of hypertension as a risk for both myocardial infarction and
stroke showed a statistically significant positive association. However,
no significant association was found between osteoarthritis and either of
these outcomes. We found a strong association between an ambulatory
diagnosis of hypercholesterolemia and an admission for myocardial
infarction. Finally, as expected, no association was demonstrated between
elevated cholesterol and stroke. All results were consistent with our
hypotheses.
DISCUSSION
This work demonstrates the face validity of a data model and interface of
a Health Services Research Workstation that can be used to design and
carry out "virtual" clinical trials. The statistical associations
discovered by the workstation are consistent with those of published
clinical trials. Slight differences in the actual odds ratios are
attributable to the nature of the study population. Since the population
analyzed, by definition, required at least one inpatient stay, the
patients, both cases and controls, are probably less healthy than the
general population. However, the impact of this bias would likely blunt
any apparent relationship between a disease and its risk; a sicker
population may be more likely to have the risk than the general
population, even if they did not have the "case" disease.

We recognize, and continue to address, statistical issues raised by the
results shown. For example, the difference in demographic characteristics
of the cases and controls may confound the results. Logistic regression
can determine the impact of these differences on the significance of the
odds ratios. We anticipate that improved sources of data and
incorporation of additional analytical methods will address such issues,
and we will incorporate these into the workstation in the near future.

The risk factors used in this project were dichotomous variables; the
risk factor was either present or absent. Future research should address
whether continuous variables can be identified as putative risks. For
example, instead of comparing patients with hypercholesterolemia to those
who do not carry that diagnosis, average cholesterol levels associated
with patients who develop myocardial infarction could be compared with
the average level of controls who have not had a myocardial infarction.
Furthermore, likelihood ratios can be derived which reflect the risk of
myocardial infarction for different ranges of cholesterol level. This
information will be particularly useful for patient assessment and
clinical decision-making.

As the breadth, depth and longitudinal scope of our data collection
continue to grow, it will become possible to determine the impact of risk
factor modification on the course of a disease. For instance, while we
have shown an association between hypertension and myocardial infarction,
we should also be able to show a reduced association as blood pressure
improves over time.

The goal of health services research extends beyond the integration and
analysis of traditional clinical and administrative parameters. Validated
measurements of health-related quality of life can provide important,
additional insights into the well-being of a population. Paul Ellwood, in
his 1988 Shattuck Lecture,15 envisioned a new era in outcomes research
wherein health services researchers could rapidly design and execute
clinical trials that could detect when different treatment strategies
result in meaningful improvements in the course of a given disease.
Although outcomes of interest have traditionally been assessed using
concrete measures such as mortality or health care utilization, these
outcomes do not necessarily reflect the impact of other non-fatal, but
life-altering effects of disease and its treatment. The Short Form 12
(SF-12),16 a brief 12-question survey that evaluates patients'
perceptions of their health status, is one of many instruments that
capture this essential clinical detail. Patients' ability to complete the
survey in under two minutes, and the existence of computerized methods of
administration, should enable data collection on a large scale, within
the workflow of a clinical encounter.
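
The continuous-variable extension proposed in the Discussion (comparing average cholesterol levels in cases and controls, and deriving likelihood ratios for ranges of cholesterol) might be sketched as below. This is a hypothetical illustration: the cholesterol values are invented, the range boundaries are arbitrary, and the likelihood ratio is taken here to be the proportion of cases in a range divided by the proportion of controls in that range.

```python
from statistics import mean

# Invented total cholesterol values (mg/dL); not data from this study
case_levels = [262, 241, 228, 275, 198, 250, 233, 287]
control_levels = [187, 214, 201, 243, 176, 229, 195, 260]


def likelihood_ratio(case_values, control_values, low, high):
    """P(low <= level < high | case) / P(low <= level < high | control)."""
    p_case = sum(low <= v < high for v in case_values) / len(case_values)
    p_control = sum(low <= v < high for v in control_values) / len(control_values)
    if p_control == 0:
        # No controls in the range: the ratio is unbounded or undefined
        return float("inf") if p_case > 0 else float("nan")
    return p_case / p_control


# Difference between the average level of cases and of controls
mean_difference = mean(case_levels) - mean(control_levels)

# Arbitrary illustrative ranges: below 200, 200 to 239, 240 and above
ranges = [(0, 200), (200, 240), (240, 1000)]
ratios = {r: likelihood_ratio(case_levels, control_levels, *r) for r in ranges}

print(f"mean difference: {mean_difference:.1f} mg/dL")
for (low, high), ratio in ratios.items():
    print(f"{low}-{high}: likelihood ratio {ratio:.2f}")
```

A ratio above 1 for a range means that cholesterol levels in that range are more common among cases than among controls.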
We have shown that a Health Services Research Workstation can provide
preliminary answers to important clinical questions in real-time, and at
far less cost than traditional studies. It may also validate results of
randomized controlled trials performed on a specific population. As a
result, this new integrated methodology represents an important advance
in the feasibility, applicability and acceptability of outcomes research.
Further enhancements, including the addition of more sophisticated
analytical techniques and health-related quality of life measures, will
provide a powerful mechanism for continuous monitoring and improvement of
the quality of care.
Acknowledgments
Dr. Weiner was supported by the National Library of Medicine Independent
Fellowship in Applied Informatics, grant number LM00051.
References
1 Garrett LE, Hammond WE, Stead WW. The effects of computerized medical
records on provider efficiency and quality of care. Meth Inform Med
1986;25:151-7.
2 McDonald CJ, Hui SL, Smith DM. Reminders to physicians from an
introspective computer medical record: A two year randomized trial. Ann
Intern Med 1984;100:130-8.
3 Evans RS, Burke JP, Pestotnik SL, Classen DC, Menlove RL, Gardner RM.
Prediction of hospital infections and selection of antibiotics using an
automated hospital database. Proceedings, SCAMC 1990;14:663-8.
4 Tierney WM, McDonald CJ, Luft FC. Renal disease in hypertensive adults:
Effect of race and type II diabetes. American Journal of Kidney Diseases
1989;13:485-493.
5 Evans R, Gardner RM, Bush AR, Burke JP, Jacobson JA, Larson RA, et al.
Development of a computerized infectious disease monitor. Computers in
Biomedical Research 1985;18:103-113.
6 Tierney WM, McDonald CJ. Practice databases and their uses in clinical
research. Statistics in Medicine 1991;10:541-57.
7 Hlatky MA, Califf RM, Harrell FE, Lee KL, Mark DB, Pryor DB. Comparison
of predictions based on observational data with results of randomized
controlled clinical trials of coronary artery bypass surgery. Journal of
the American College of Cardiology 1988;11:237-45.
8 Hornberger J, Wrone E. When to base clinical policies on observational
versus randomized trial data. Ann Intern Med 1997;127:697-703.
9 Scully KW, Pates RD, Desper GS, Connors AF, Harrell FE, Pieper KS, et
al. Development of an Enterprise-Wide Clinical Data Repository: Merging
Multiple Legacy Databases. Proceedings, SCAMC 1997;21:32-36.
10 Safran C, Chute CG. Exploration and exploitation of clinical
databases. International Journal of Biomedical Computing 1995;39:151-6.
11 Hennekens CH, Buring JE. Epidemiology in Medicine. Boston: Little
Brown and Co, 1987.
12 SHEP Cooperative Research Group. Prevention of stroke by
antihypertensive drug treatment in older persons with isolated systolic
hypertension. Final results of the Systolic Hypertension in the Elderly
Program. JAMA 1991;265:3255-64.
13 Kannel WB, Dawber TR, Kagan A, Revotskie N, Stokes J. Factors of risk
in the development of coronary heart disease - six-year follow-up
experience: the Framingham Study. Ann Intern Med 1961;55:33-50.
14 Atkins D, Psaty BM, Koepsell TD, Longstreth WT, Larson EB. Cholesterol
reduction and the risk for stroke in men: A meta-analysis of randomized
controlled trials. Ann Intern Med 1993;119:136-145.
15 Ellwood PM. Shattuck lecture--Outcomes management. A technology of
patient experience. New England Journal of Medicine 1988;318:1549-56.
16 Ware JE Jr, Kosinski M, Keller SD. A 12-item short-form health survey:
Construction of scales and preliminary tests of reliability and validity.
Medical Care 1996;34:220-33.