SDTM-ADaM Pilot Project
Project Report
Executive Summary
Background
CDISC is a non-profit, multidisciplinary, consensus-based standards development
organization, founded over a decade ago, that has established open, worldwide
biopharmaceutical data standards to advance the continued improvement of public health by
enabling efficiencies in medical research. In addition to enabling FDA submissions, many
other advantages come with the use of clinical data standards. Research studies have shown
that standards enhance the performance of clinical studies in several other key areas, such as
improved internal data warehousing, data integration, and data transport, as well as enabling
collaborative research through timely and efficient data sharing.
The collective power and borderless innovation provided by the CDISC constituency is well
represented by the performance of this CDISC SDTM/ADaM Pilot Project to generate an
ICH E3/eCTD clinical study report (CSR) using the CDISC data models.
Overview
This report documents the efforts made by the Pilot Project core team to successfully
accomplish the above stated objectives. The legacy data used in the Pilot Project was
provided by Eli Lilly and Company from a phase II clinical trial. Each step of the pilot
process and work completed are easily followed in this report beginning with the de-
identification of the pilot legacy data, application of CDISC Standards (including SDTM,
ADaM, and CRTDDS), and resulting in the creation of a CDISC-compliant electronic
clinical study report submission.
1. This Pilot Project is also referred to as “Pilot 1.” It was conducted during 2006 and 2007.
2. Disclaimer: All comments, statements, and opinions attributed in this document to
the regulatory (FDA) review team reflect views of those individuals conveyed as
informal feedback to the pilot project team, and must not be taken to represent
guidance, policy, or evaluation from the Food and Drug Administration.
This pilot project effort represented an unprecedented amount of work and collaboration
among CDISC3, industry, and FDA, and led to a number of valuable lessons learned. These
lessons are documented in Section 6 of this report and were presented at the 2006 and
2007 CDISC Interchange conferences.
Conclusion
All of the aforementioned goals were met by the CDISC SDTM/ADaM pilot project. The
project established that the package submitted using CDISC standards met the needs and the
expectations of both medical and statistical reviewers participating on the regulatory review
team. The regulatory review team noted the importance of having both data in SDTM format
to support the use of FDA review systems and interactive review, and data in ADaM format
to support analytic review. The project also demonstrated the importance of having
documentation of the data (e.g., the metadata provided in the data definition file) that
provides clear, unambiguous communication of the science and statistics of the trial.
The regulatory review team expressed a favorable impression of the pilot submission
package. They were optimistic about the impact that data standards will have on the work
associated with their review of new drug applications.
3. Disclaimer: As defined in the CDISC Core Principles, CDISC standards support the
scientific nature of research and allow for flexibility in scientific content; however,
CDISC does not make the scientific decisions nor drive scientific content; rather, our
primary purpose is to improve process efficiency and provide a means to ensure that
submissions are easily interpreted, understood, and navigated by medical and
regulatory reviewers.
Table of Contents
1. Introduction
1.1. Outline of this pilot project report
1.1.1. Additional pilot project materials available
1.2. Terms and phrases used in the report
1.3. Description of the project
1.4. Caveats
1.5. Orientation to the legacy study
2. Process
2.1. General description
2.2. Data and tools used
2.2.1. Legacy data
2.2.2. Standards / tools used
2.2.3. MedDRA coding of event data
2.2.4. Process for concomitant medication coding with WHODD
2.3. Annotating the CRF
2.4. Creation of SDTM datasets from the legacy data
2.5. Analysis datasets
2.5.1. Issues addressed as a result of review team comments
2.6. Derived data in SDTM
2.7. Analysis results
2.8. Writing the study report
2.9. Assembling and publishing the pilot submission package
2.10. Quality control
3. Metadata
4. The pilot project Define.xml
4.1. Overview
4.2. Appearance of the Define file
4.3. Internal structure and creation of the Define file
4.4. Metadata implementation issues
4.5. Issues addressed as a result of review team comments
4.6. Issues to be addressed regarding metadata
1. Introduction
Submission of data to the Food and Drug Administration (FDA) has been necessary for years
in order for the FDA to conduct a thorough review, and electronic submission of data will
likely become a regulatory requirement in the future. The Clinical Data Interchange Standards
Consortium (CDISC) is a non-profit, multidisciplinary, consensus-based standards
development organization, founded over a decade ago, that has established open, worldwide
biopharmaceutical data standards to advance the continued improvement of public health by
enabling efficiencies in medical research. During this 10-year period, CDISC has focused
considerable effort on developing standards to help FDA in its review and approval process
of safety and efficacy data. To this end, the CDISC data models have been successfully used
to help FDA better understand industry data, by providing a platform of standard data
content. This standard data minimizes programming and rework of the data during FDA
review, and greatly facilitates the integration and reuse of data from multiple submissions for
broader scientific and medical evaluation.
The development of CDISC standards has been informed by descriptions of FDA reviewers’
needs expressed by FDA Liaisons. Over time, the Study Data Tabulation Model
(SDTM) and the Analysis Data Model (ADaM) have matured to the point that references to
them in industry forums are now common. The standards have garnered the attention of the
mainstream of the pharmaceutical industry, which is working on ways to implement these
standards in hopes of streamlining submission and facilitating review of the data. CDISC
recognizes that the unity and interoperability of data standards is a necessity for both the
submission and the review and approval process.
This report describes the CDISC SDTM/ADaM Pilot Project, hereafter referred to as the
“pilot project.” The objective of the pilot project was to test the effectiveness of data
submitted to FDA using CDISC standards in meeting the needs and the expectations of both
medical and statistical FDA reviewers. In doing this, the project would also assess the data
structure/architecture, resources and processes needed to transform data from legacy datasets
into the SDTM and ADaM formats and to create the associated metadata.
• Section 5 describes the interactions and communications between the pilot project team
and the regulatory review team.
• Section 6 summarizes the key points and outstanding issues noted in the report.
• Section 7 contains the appendixes of the report.
o The first appendix summarizes key points about the management of the pilot project,
including a list of participants.
o The second appendix provides an overview of the repository used by the pilot project
team.
o The remaining appendixes supplement the information in the body of the report with
more detailed information.
o A list of abbreviations used in the project report is also included.
4. April 2006 FDA guidance regarding regulatory submissions in electronic format (“Guidance for Industry:
Providing Regulatory Submissions in Electronic Format – Human Pharmaceutical Product Applications and
Related Submissions Using the eCTD Specifications, April 2006, Electronic Submissions, Revision 1,” and the
associated document “Study Data Specifications”). Refer to the following website:
http://www.fda.gov/cder/regulatory/ersr/ectd.htm
The CDISC Define.xml team has written a document specifying the standard for providing
Case Report Tabulations Data Definitions in an XML format for submission to regulatory
authorities (e.g., FDA). The XML schema used to define the expected structure for these
XML files is an extension to the CDISC Operational Data Model (ODM).
The term “SAS transport files” refers to SAS® XPORT (version 5) transport files (XPT), i.e.,
data in the SAS XPORT Transport format5.
Tabulation datasets contain the data collected during a study, organized by clinical domain.
These datasets conform to the CDISC Submission Data Standards (SDS), as described in the
CDISC Study Data Tabulation Model. The SDTM was developed by the CDISC
Submission Data Standards (SDS) team, and precursors to the SDTM were called SDS
standards. The terms “tabulation dataset” and “SDTM dataset” are used interchangeably in
this document.
Analysis datasets contain the data used for statistical analysis and reporting by the sponsor.
The Analysis Data Model describes the general structure, metadata, content, and
accompanying documentation pertaining to analysis datasets. The terms “analysis dataset”
and “ADaM dataset” are used interchangeably in this document.
The term “pilot project team” refers to the group of individuals from industry who worked on
the pilot project. Refer to Appendix 7.1.1 for a list of pilot project team members.
The term “regulatory review team” refers to the group of FDA volunteers who participated
on this pilot project, providing input and feedback based on their areas of expertise and
interest. The views expressed by these volunteers are their own opinions and experience and
are not, necessarily, those of FDA. Refer to Appendix 7.1.1 for a list of regulatory review
team members and contributors.
Refer to Appendix 7.9 for a list of the abbreviations used in this report.
5. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
• Gather input, evaluation, and review of real clinical trial data based on the CDISC
standards from a group of FDA reviewers working in a collaborative software
environment.
• Assess the boundaries between SDTM and the parallel elements in ADaM. Understand
the requirements and working relationships between observed data, derived data, specific
analysis datasets, and program files.
Optional or “next step” objectives were to explore submission of data using an XML file
format versus the SAS® System XPORT format and to explore the use of ODM and the
CRT-DDS (also called Define.xml) in providing metadata for the submission package.
The proposal for the pilot project was presented at the CDISC Interchange
Meeting in September 2005. The pilot project team was identified, and the first team meeting
was held in November 2005. Table 1 provides highlights of the timeline between the CDISC
Interchange Meetings in 2005 and 2006 and the receipt of the regulatory review team’s final
comments on the pilot submission package.
Table 1 Timeline for CDISC SDTM/ADaM Pilot Project
November 18, 2005: First pilot project team teleconference
January 25, 2006: Planning meeting with CDISC Board representatives
February 17, 2006: Legacy study documents (redacted protocol, abbreviated study report, case report form) provided to pilot project team
February 28, 2006: Face-to-face kick-off meeting for the project, including a roundtable discussion with regulatory team members
April 10, 2006: Pre-submission encounter with FDA participants
April 19, 2006: De-identified legacy data provided to pilot project team
June 30, 2006: Submission package sent to the regulatory review team
August 28, 2006: Pilot project team received regulatory review team’s comments
September 26, 2006: Announcement of results at CDISC Interchange
February 13, 2007: Revised submission package sent to regulatory review team
April 4, 2007: Pilot project team received regulatory review team’s comments on revised submission package
The timelines for the project were driven by the early agreement (at the January 2006
planning meeting) that results would be reported at the CDISC Interchange 2006 conference.
To achieve this deadline, the pilot submission package needed to be sent to the regulatory
review team by the end of June 2006. All activities in producing the pilot submission
package were geared towards meeting that target date.
It was agreed at the January planning meeting that the primary focus of the pilot project
would be to produce a submission package as an example of the application of the CDISC
standards, and that FDA statistical and medical reviewers would evaluate the submitted
datasets (SDTM and ADaM), metadata and documentation. The phrase “pilot submission
package” will refer to this submission package in this report. Additionally, the team
identified a set of success criteria to help assess the overall efficacy of the pilot submission
package from the perspective of the regulatory review team. These criteria were: 1) is the
submission evaluable with current tools; 2) can the reviewers reproduce the analyses and
derivations; and 3) can the reviewers easily navigate through the pilot submission package.
The goal of the pilot project was not to prove or disprove efficacy and safety of a drug;
therefore not all components of the legacy study (referred to as Study CDISCPILOT01)
discussed in the legacy protocol were included in the pilot submission package. The pilot
submission package included one abbreviated study report that documented the pilot project
team’s analyses of the legacy data. The purpose of providing a study report was to test the
summarizing of results and the linking to the metadata, as well as providing results or
findings for the regulatory review team to review and/or reproduce. Accompanying the study
report were the tabulation datasets, analysis datasets, Define.xml files containing all
associated metadata, an annotated case report form (aCRF), and a reviewer’s guide.
With the objectives of the pilot project in mind, the completeness of the pilot submission
package was considered adequate for the purpose of this pilot project by the regulatory
review team; however, the pilot submission package falls far short of the standard
requirements for a complete application to market a new drug or biologic. The pilot
submission package is for illustration only; there is no intention to imply in any way that it
constitutes a complete submission package.
The regulatory review team had a favorable overall impression of the pilot submission
package. Through several meetings (teleconference and face-to-face), the individuals
participating on the review team provided constructive feedback and specific details of what
they considered best practices with regard to the content, structure, and format of clinical
study reports (CSRs), the clinical data, and the metadata that describe the clinical data.
Although the regulatory review team was generally pleased with the original pilot submission
package, they noted a few issues. The primary issues related to functionality available for
the Define.xml file and the format and structure of the analysis datasets. The pilot project
team and the regulatory review team agreed that a revised pilot submission package would be
created, to address these issues as much as possible. The pilot project team sent the revised
pilot submission package to the regulatory review team in February 2007 and received
comments back in April 2007. Based on a small survey among the regulatory review team,
the issues with functionality and navigation of the Define file appeared to have been
addressed. The feedback from the regulatory review team regarding the revised analysis
datasets was positive, stating that the revised versions are a good illustration of what
information is critical to understanding the lineage of the data from case report form (CRF)
to analysis.
1.4. Caveats
The pilot project team was primarily focused on the “What” (i.e., content) of a CDISC-
adherent submission, not the “How” (i.e., process). Although the “How” was addressed in the
efforts of the pilot project team, optimizing the process was not a focus of the project due to a
variety of factors (refer to Section 2), including the fact that the amount of time available to
produce the pilot submission package was shorter than envisioned. Tight timelines affected
the project in that certain ad hoc processes were often chosen because they were the fastest
“good” processes to implement, rather than the “preferred” processes. Difficulties with
process are not necessarily inherent in the standards; indeed, these issues might not exist with
better tools and more time to think about processes. Therefore,
one should not interpret the processes described in this report as the only, or the best, way to
proceed with the creation of a submission using the CDISC standards.
CDISC is moving towards having a harmonized set of standards. The experiences gained in
this pilot project, and in future projects, promise to be very helpful in furthering integration
of standards. Accordingly, some ad hoc decisions were required to facilitate integration for
the pilot package. While these decisions resulted in a “legitimate” submission using the
standards available at the time, the resulting product does not necessarily represent a future
version of the standards. For example, the pilot submission used extensions to the Define file
(as described in Appendix 7.5.5) that may not necessarily be incorporated into future
versions of the Define.xml standard. One of the purposes of this project report is to explain
the various decisions made by the pilot project team and the implications of those decisions.
Clearly, the pilot project differs from a “real-world” creation of a package for submission to
FDA. Wherever possible, the report highlights these differences so that readers will not
assume that CDISC or the pilot project team advocates real-world use of these processes.
For example, the use of MedDRA terms in the pilot submission was constrained under the
terms of an agreement with the MSSO, which controls licensing of MedDRA, as described in
Section 2.2.3 of this report.
Readers should note that this pilot project did not examine how the CDISC standards interact
with every aspect of clinical data processing and review. For example, the pilot project did
not test whether certain sets of required, expected, and permissible variables in SDTM were
more useful to the review process than other sets. In addition, since the pilot project used
only one clinical trial from one therapeutic area, it did not address the question of how well
the CDISC standards would apply to clinical trials in general. One of the benefits of standard
data is the possibility of combining data across different submissions. This pilot project did
not have the data or the resources necessary to test this benefit. By using only one team to
produce the submission, this pilot did not test the reproducibility of the CDISC standards
across multiple teams.
As noted throughout this report, all comments, statements, and opinions attributed in this
document to the regulatory (FDA) review team reflect views of those individuals conveyed
as informal feedback to the pilot project team, and must not be taken to represent guidance,
policy, or evaluation from the Food and Drug Administration.
2. Process
2.1. General description
Figure 1 illustrates the content and general structure of the pilot submission package
submitted to the regulatory review team. Note that the blue rounded rectangles represent
folders, with the text in the box providing information rather than the precise folder names
described in the eCTD specification; not all folders are illustrated.
[Figure 1: Content and general structure of the pilot submission package. The CDISCPILOT01 folder contains module folders M1 (Administrative) and M5 (Clinical Study Reports). Under M5 are the study report (PDF), patient narratives, the annotated CRF (PDF), and the Analysis and Tabulations dataset folders, each containing its own Define.xml.]
• It was understood that producing the pilot submission package might necessitate the use
of “coat hangers, duct tape, and bandages” to get everything to harmonize properly.
These patches would definitely not be part of a recommended process, but would
facilitate meeting the timelines.
• Future pilot projects will build on the work done for this pilot project.
Consequently, the process described here is only a basis for future development – both to
consolidate things that worked well and to avoid or improve on things that worked poorly.
To provide that basis, this report includes detailed descriptions of the processes used in this
pilot project, including the rationale for various decisions as appropriate.
Figure 2 illustrates a general outline of the process followed by the pilot project team. The
term “derived data” refers to data that involve calculations or manipulations of the CRF data.
At the onset of the project, it was agreed that the tabulation datasets (i.e., SDTM datasets)
would be created from the legacy data, with only a very minimum amount of derived data
included. These datasets, referred to as “SDTM-without-derived,” were the input for the
creation of the analysis datasets. With one exception, analysis results were based on analysis
datasets; the concomitant medications summary was based on SDTM datasets, as described
in Section 2.7. The pilot team wanted to test the utility of including derived data in SDTM, so
a set of potentially useful variables in the analysis datasets was selected for inclusion in
SDTM. The origins of these variables were to be identified as variables in the analysis
datasets and appropriate links provided. A separate step in the process added these derived
data to create the “SDTM-with-derived” tabulation datasets submitted to the regulatory
review team. Quality control conducted by the pilot project team verified that the derived
data incorporated in the SDTM datasets were consistent with the original data in the analysis
datasets. Refer to Section 2.6 for more details regarding including derived data in the
tabulation datasets.
[Figure 2: Building the CDISC Pilot Submission Package. General outline of the process followed by the pilot project team: legacy documents received; decisions made regarding data analysis; 0-observation SDTM and analysis datasets created; analysis datasets created; analyses generated; cover letter written.]
• All MedDRA terms except the lower level term, preferred term, and system organ class
were to be masked in the pilot submission package.
• CDISC was to identify and inform the MSSO of the fixed period of time during which this
pilot program would be in effect. This is simply an identification of a fixed period of time
for the pilot project and the use of MedDRA in the pilot project, not a limitation.
• The total number of lower level terms and preferred terms used in this pilot project
would not exceed 10,000.
MedDRA version 8.0 was the coding dictionary used for the adverse event data.
The regulatory review team requested that all five levels of MedDRA coding be included in
the tabulation datasets. The three levels not currently included in the SDTM adverse event
(AE) model [Higher Level Group Term (HLGT), Higher Level Term (HLT), Lower Level
Term (LLT)] were included in the supplemental qualifiers domain for AE (i.e., SUPPAE). To
protect the MedDRA copyright and comply with the licensing agreement, non-informative
terms were used to mask the actual values of HLGT and HLT (e.g., HLGT_0152, HLT_0617).
The pilot project team also chose to mask the AE verbatim text, replacing the actual text with
randomly generated coded text (e.g., “VERBATIM_0013”), with each unique verbatim term
corresponding to a unique coded text.
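The masking code itself is not part of this report; the following is a minimal SAS sketch of how such masking might be performed. The dataset and variable names (ae, aeterm, aeterm_masked) are illustrative assumptions, not the pilot team’s actual program.

    /* Hypothetical sketch: assign a non-informative code to each unique verbatim term. */
    proc sort data=ae(keep=aeterm) out=uniq_terms nodupkey;
      by aeterm;
    run;

    data term_map;                                   /* one masked code per unique term */
      set uniq_terms;
      aeterm_masked = cats('VERBATIM_', put(_n_, z4.));
    run;

    proc sql;                                        /* replace the verbatim text in AE */
      create table ae_masked as
      select a.*, m.aeterm_masked
        from ae as a
        left join term_map as m
        on a.aeterm = m.aeterm;
    quit;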
It is important to note that, due to the considerations outlined above, the coding of adverse
events for this project was NOT consistent with MedDRA coding rules and conventions. In
actual submissions, sponsors should adhere to the rules of the dictionary used in the
submission.
capability as well as providing the more traditional links to the appropriate page numbers in
the blank CRF. The pilot project team elected to do both so that the familiar method of
referring to the blank CRF was also available to the review team. The reviewer’s guide sent
with the pilot submission package explained that the Acrobat Advanced Search, using the
“Search Comments” option, would facilitate finding annotations more efficiently. By
combining “Search Comments” and “Whole Words”, a reviewer could find all variables for a
particular domain using the 2-letter domain prefix that was placed in the “Subject” field.
The comments could also be printed by using that option in Adobe Acrobat (i.e., select to
print the document and then select the comments option). A note explaining this additional
attribute of comments should probably have been included in the reviewer’s guide, to make
reviewers aware of the functionality.
The CRF was annotated with “Not Entered in Database” on those pages/panels/data entry
fields where data were not reported in the datasets due to data de-identification. (This was
not done in the original pilot submission package and the oversight was noted by the
regulatory review team and corrected in the revised pilot submission package.)
reassembly of the dataset, QSSEQ was made unique across the entire set of split QS domains
by the addition of a questionnaire-specific value to the sequence number. For example, by
adding a questionnaire-specific value of 5000 to the sequence numbers of the records in the
QSAD dataset, an original QSAD sequence number of 1 became 5001. The pilot submission
package contained the re-assembled QS domain.
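As a minimal illustration of the renumbering just described, a SAS step along the following lines could be used; the offset value and the QSAD dataset name come from the example in the text, while the remaining details are assumptions.

    /* Hypothetical sketch: make QSSEQ unique across split QS datasets before
       re-assembly by adding a questionnaire-specific offset (5000 for QSAD). */
    data qsad;
      set qsad;
      qsseq = qsseq + 5000;      /* original QSAD sequence number 1 becomes 5001 */
    run;

    data qs;                     /* re-assemble the full QS domain */
      set qsad
          /* ...the other split QS datasets... */ ;
    run;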
The pilot project team elected to order the variables in the SDTM datasets using the dataset’s
key (i.e., index) variables (as listed in the dataset metadata) and the order of variables used in
the SDTM Implementation guide. The key variables were placed first, followed by the
remaining variables. This variable ordering scheme was applied consistently for all domains
in the pilot submission package.
population indicators, clinical milestones, and completion status. This analysis dataset was
used as input to other analysis datasets and thus was pivotal to the work stream.
The principles specified in the published ADaM v2 were utilized in this pilot project.
However, in parallel to the work on the pilot project, the CDISC ADaM team was developing
the ADaM Implementation Guide, which presents standards for the structure and content of
analysis datasets, including standard variable names. Therefore, it should be kept in mind
that the analysis datasets submitted with the pilot project represent the concepts in ADaM v2
but do not necessarily reflect those included in the ADaM Implementation Guide.
According to ADaM v2, analysis datasets only need to be provided for key (i.e., important)
analyses, as defined and agreed upon by the sponsor and reviewers. The pilot project team
provided analysis datasets for each analysis included in the pilot submission package, with the
exception of the concomitant medication summary. The team decided to provide
analysis datasets for almost all analyses, key or not, because the number of analyses in
the pilot submission was relatively small and because they felt it was important to provide a
broad range of illustrative examples. The concomitant medications summary was thus the only
analysis for which an analysis dataset was not provided.
algorithm and by the windowing algorithm and those that were “as observed” (i.e., included
with no changes from the tabulation dataset). The datasets are described in Appendix 7.3.
Additional changes to analysis datasets in the revised pilot submission package as a result of
regulatory review team feedback included the following (a brief illustrative sketch of several
of these changes follows the list):
• Population flag variables were modified to contain either Y or N. No blank values were
allowed.
• Dates of the first and the last dose were included in all analysis datasets.
• All three variables containing treatment information were included in all analysis datasets
(as opposed to only one or two of the variables). Within the pilot submission, the three
treatment variables were TRTP, TRTPN, and TRTPCD (referring to the text, numeric,
and coded versions, respectively, of the planned treatment).
• A flag variable was added to all relevant analysis datasets (i.e., all except ADSL and
ADTTE) to indicate whether the observation occurred while the subject was
on-treatment.
• Variables within each analysis dataset were ordered in a logical pattern, rather than
alphabetically.
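A minimal sketch of how a few of these changes might look in an analysis dataset build step is shown below. The dataset names (adlbc_draft, adsl) and the date variables (ADT, TRTSDT, TRTEDT) are illustrative assumptions; only TRTP, TRTPN, and TRTPCD follow names given in the text.

    /* Hypothetical sketch: Y/N population flags, all three planned-treatment
       variables carried on each analysis dataset, and an on-treatment flag. */
    proc sort data=adlbc_draft; by usubjid; run;
    proc sort data=adsl;        by usubjid; run;

    data adlbc;
      merge adlbc_draft(in=inlb)
            adsl(keep=usubjid trtp trtpn trtpcd trtsdt trtedt ittfl);
      by usubjid;
      if inlb;

      if ittfl ne 'Y' then ittfl = 'N';              /* population flags are Y or N, never blank */

      if trtsdt <= adt <= trtedt then ontrtfl = 'Y'; /* observation occurred on treatment */
      else ontrtfl = 'N';
    run;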
Changes to the metadata associated with the analysis datasets included changing the
description of “structure” to be more consistent with that used in SDTM. For example, the
structure of the LB domain was described as “one record per lab test per time point per visit
per subject” in the metadata. The metadata description of structure for the lab analysis
datasets (ADLBC and ADLBH) was changed from “one record per subject per visit per lab
parameter” to “one record per lab test per visit per subject.”
• a derived variable defined as result divided by upper limit of normal (i.e., LBTMSHI) for
lab data (in SUPPLB)
While the SDTM supplemental qualifier datasets were not originally created with “numeric”
qualifiers in mind, the pilot team chose to test the use of the supplemental qualifiers structure
for the LBTMSHI variable.
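A minimal sketch of the LBTMSHI derivation and its placement as a supplemental qualifier is shown below; the source variable names (LBSTRESN, LBSTNRHI) and the SUPPLB structure follow standard SDTM conventions, but the step itself is an assumption rather than the pilot team’s actual code.

    /* Hypothetical sketch: derive result / upper limit of normal and store it
       as a supplemental qualifier record for the LB domain (SUPPLB). */
    data supplb;
      set lb;
      where lbstnrhi > 0;                           /* need a usable upper limit of normal */
      length rdomain $2 idvar $8 idvarval $20 qnam $8 qlabel $40 qval $200 qorig $20;
      rdomain  = 'LB';
      idvar    = 'LBSEQ';
      idvarval = strip(put(lbseq, best.));
      qnam     = 'LBTMSHI';
      qlabel   = 'Result as Multiple of Upper Limit of Normal';
      qval     = strip(put(lbstresn / lbstnrhi, best.));
      qorig    = 'DERIVED';
      keep studyid rdomain usubjid idvar idvarval qnam qlabel qval qorig;
    run;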
In addition, as described in Section 2.2.3, three coding levels for MedDRA were included in
the supplemental qualifiers domain for AE (i.e., SUPPAE).
There has been much debate over what process should be used to produce SDTM and ADaM
datasets, including whether and how derived variables should be incorporated into SDTM datasets.
The pilot project team decided to add derived data to the SDTM datasets after the analysis
datasets were created, using the same algorithms. Once the programming was complete, a
separate QC was performed to ensure that the derived values were consistently represented in
both the SDTM and the ADaM domains.
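The QC step itself is not reproduced in this report; a minimal sketch of such a cross-check in SAS might use PROC COMPARE, with the dataset, library, and key variable names below being illustrative assumptions.

    /* Hypothetical sketch: confirm that the derived total score carried into the
       SDTM QS domain matches the corresponding value in the analysis dataset. */
    proc sort data=sdtm.qs(where=(qsdrvfl='Y' and qstestcd='ACTOT'))
              out=qs_derived(keep=usubjid visit qsstresn);
      by usubjid visit;
    run;

    proc sort data=adam.adqsadas(where=(visit=avisitc and itype=' '))
              out=adqs(keep=usubjid visit actot rename=(actot=qsstresn));
      by usubjid visit;
    run;

    proc compare base=adqs compare=qs_derived criterion=1e-8;
      id usubjid visit;
    run;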
In adding the derived data to SDTM, some limitations of ODM with respect to providing a
linkage in the Define file between SDTM and ADaM were identified. The intention was to
provide a link in the metadata from the SDTM derived variable to the corresponding
(“original”) variable in the analysis dataset. The pilot project team found that the ability to
link derived data in the tabulation datasets back to the analysis datasets was not available in
the version of ODM being used. For example, in QS (the domain containing the
questionnaire data), QSSTRESN contains CRF data and contains the derived total score from
the corresponding analysis dataset. The “patch” for identifying this in the metadata was to
use “Computational Algorithm or Method” to provide text describing the various sources for
the value in the QS dataset. In this example, the text described that if the QSCAT variable is
XXX and the QSDRVFL is set to yes, then the value in QSSTRESN was from the record
containing the total score (computed using observed values) for the appropriate subject and
visit in the corresponding analysis dataset; otherwise the value for QSSTRESN came from
the CRF. The actual text used in the description was:
if QSDRVFL='Y' and the QS data pertain to ADAS-Cog or NPIX, then QSSTRESN is
from ADQSADAS.ACTOT or ADQSNPIX.NPTOT, respectively, using the windowed
data (i.e., where VISIT=AVISITC and ITYPE=' '), else if QSDRVFL = ' ' then
QSSTRESN is from the CRF Page
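For illustration only, a sketch of the kind of step that would populate the derived QSSTRESN records from the analysis dataset under the rule quoted above follows; the dataset and variable names follow the quoted text, but the code itself is an assumption, not the pilot team’s program, and only a few of the QS variables are shown.

    /* Hypothetical sketch: build the derived ADAS-Cog total-score records for QS
       from the windowed records of the analysis dataset ADQSADAS. */
    proc sql;
      create table qs_actot as
      select usubjid,
             'QS'    as domain   length=2,
             'ACTOT' as qstestcd length=8,
             visit,
             actot   as qsstresn,
             'Y'     as qsdrvfl  length=1
        from adam.adqsadas
        where visit = avisitc and itype = ' ';   /* windowed data, per the quoted rule */
    quit;

    data sdtm.qs;                                /* derived rows added to the CRF rows */
      set sdtm.qs qs_actot;
    run;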
Given this limitation, the pilot project team elected to add the computational method for a
minimum number of the SDTM variables. In addition, the content of the “Computational
Algorithm or Method” and “Comment” columns in the pilot project define file differ from
other examples in the public domain (where computational method is incorporated in the
comments column). As noted in Section 4.4, reconciliation of differences between the
elements recommended for the ADaM and SDTM metadata was still under discussion at the
time of this pilot project. Since the pilot project team wanted to have a consistent format for
the two sets of dataset metadata, with both using the same column headings, some
“tweaking” of the information contained in existing metadata columns was necessary.
Consequently, the pilot project team populated a minimum number of these fields within the
SDTM dataset metadata as an illustration. In constructing a “real” define file, many other
variables would include explanations of how they were derived (e.g., RFSTDTC,
RFEDNDTC, AGE).
In attempting to address issues regarding derived data in the tabulation datasets, the pilot
project team found that the meaning of the term “derived data” is not universally agreed upon.
There is confusion between the term “derived data” and the use of the term “derived” for the
origin in the SDTM metadata.
be consistent with ICH E3, incomplete sections and appendices were included in the study
report with a notation that text for that section or appendix was not included.
Raw statistical output from the primary efficacy analyses and from the repeated measures
analysis was included in a subsection to Appendix 9 of the CSR, “Documentation of
Statistical Methods,” as requested by the regulatory review team. It was noted that statistical
reviewers often expect raw statistical output from at least the primary efficacy analysis to be
provided, and the provision of such output should be discussed and agreed between the
sponsor and the reviewer. Such documentation is helpful for examining and understanding
discrepancies between a reviewer’s results and the results reported in the CSR.
3. Metadata
The specifications for the analysis datasets and the SDTM datasets were written in metadata
prescriptively, prior to developing the programs to create the analysis datasets. In contrast to
a descriptive approach, this prescriptive approach leveraged the value of metadata by making
the data specifications accessible to a suite of SAS macros that automated some processes
of building and validating the SDTM and analysis datasets, as well as the accompanying
Define.xml content. The analysis specifications and variable-level metadata were entered
into Excel spreadsheets. (Other options for collecting the information included data and
catalog editors.) Software programming was used to convert the Excel spreadsheets into the
following metadata elements:
• a dataset specifying dataset level attributes
• a dataset specifying variable level attributes
• a dataset specifying codes/decodes and valid values of variables
• a catalog containing entries that contain text descriptions and comments that could be
attached to datasets, variables and other parameters
• a dataset specifying value-level information about variables that contained multiple types
of data (e.g., vital signs result that might be blood pressure or heart rate)
These five metadata elements were then used to create an HTML file that included all the
details required by a programmer to write a program to create the datasets. If any
ambiguities or gaps in the data specification were identified by the programmers, the
metadata was updated appropriately, and the HTML file recreated from the revised metadata.
The metadata content was evaluated several times during the data build phases and kept
consistent with the desired derived datasets. A programming macro used the attributes
defined in the metadata to create 0-observation datasets. These 0-observation datasets thus
conformed to the data specification in dataset names, dataset labels, variable names, variable
labels, variable lengths, variable types, etc. As the last step of creating the final version of an
analysis dataset, the programmer would append the data file created by the analysis dataset
creation program to the appropriate 0-observation dataset, thus applying all pre-specified
variable labels, lengths, types and variable content to the dataset. This process ensured that
the Define file was consistent with the datasets described within it. The regulatory review
team identified lack of consistency between the Define file and the data as a problem in many
submissions. The process used by the pilot team addressed this regulatory concern, in
addition to adding efficiency.
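A minimal sketch of the 0-observation technique described above is shown below. The attribute values are illustrative assumptions; in the pilot, this information was generated by macro from the metadata rather than typed by hand.

    /* Hypothetical sketch: a 0-observation "shell" carrying the pre-specified
       attributes, then used to apply those attributes to a populated draft. */
    data adsl0;
      attrib usubjid length=$20 label='Unique Subject Identifier'
             trtp    length=$40 label='Planned Treatment'
             ittfl   length=$1  label='Intent-to-Treat Population Flag';
      call missing(of _all_);         /* avoid uninitialized-variable notes */
      stop;                           /* write no observations              */
    run;

    data adam.adsl;
      set adsl0                       /* attributes come from the shell          */
          work.adsl_draft;            /* data come from the creation program     */
    run;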
Other macros were used to help automate the many steps in the creation of analysis datasets.
These included:
• A macro that created a format catalog containing formats built from the code/decode
values defined in the data specification (a minimal sketch of this idea appears after this list).
• A macro that sorted the observations in the datasets by the key variables identified in the
metadata and re-ordered the variables within the dataset according to the variable order
defined in the metadata. (Regulatory review team members expressed a preference for
datasets whose variable order matched the order of variables in the Define file.)
• A macro that produced a report of the actual allocated lengths of all character variables
along with the minimum length required to contain the maximum text string length. This
report helped to ensure that character variables were only as long as needed to contain the
data values.
• A macro that compared the structure and attributes of the draft analysis datasets with the
data specifications and compared the actual values found in variables with lists of
allowed values in the metadata.
• A macro that used the metadata to generate the Define.xml file. The resultant XML file
was syntactically validated by using an XML parser that compared the XML file to the
CDISC ODM schema.
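As referenced earlier in the list, one of these macros built a format catalog from the code/decode metadata. The pilot’s actual macros are not reproduced here; the following is only a minimal sketch of the underlying PROC FORMAT CNTLIN technique, with assumed metadata dataset and column names (meta.codelists with codelist, codedvalue, decode).

    /* Hypothetical sketch: turn a code/decode metadata dataset into a format
       catalog using the CNTLIN= option of PROC FORMAT. */
    data cntlin;
      set meta.codelists(rename=(codelist=fmtname codedvalue=start decode=label));
      type = 'C';                      /* character formats */
    run;

    proc format cntlin=cntlin library=work;
    run;

The same metadata-driven pattern underlies the other macros in the list: the specification is entered once and every downstream artifact is generated from it.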
The SAS macros used in this process were developed by Gregory Steffens (Eli Lilly and
Company) and can be found at the same location as the published pilot submission package.
The Define.xml file was also reviewed by the pilot project team and CDISC ODM/XML
experts. A separate XML file was created for each of the two databases – the analysis
database and the SDTM database. These two XML files were subsequently combined with
each other and with the analysis results XML file to create a single XML file containing all
of the dataset metadata.
The XML file created in the above process is a valid Define.xml file. As described in the
next section, the pilot project Define.xml also includes some non-dataset metadata that were
added in a subsequent step.
Figure 4 illustrates the process described above.
[Figure 4: Metadata-driven build process. Metadata are populated in Excel and converted to SAS; the data specification is published in HTML from the metadata; 0-observation datasets are created from the metadata and the populated data are appended to them; attributes, the format catalog, the report of character lengths, and the Define.xml are all generated from the metadata; observations are sorted and variables ordered to match the specification; and the datasets are validated against the specification in the metadata.]
corrected by using a “framed” version of the style sheet (i.e., a version with a left-side
navigation pane). The framed version works only with Internet Explorer, but offers far
superior navigation capabilities. The non-framed version can be used with browsers other
than Internet Explorer, but can be difficult to use with Internet Explorer 6, because of the bug
mentioned earlier.
Some pilot project team members were unable to open the framed version of the Define file
even when using Internet Explorer. This issue was initially attributed to differences in
internet browser settings; the cause was ultimately traced to a reference, in the XML, to a
specific version of Microsoft XML Services. When this specific reference was removed (as it
has been in the public release of the pilot submission package), conflicts with users’ internet
browser settings were eliminated.
These issues illustrate the value of providing a sample of the data and define file to determine
that the rendering provides the functionality expected.
A major issue identified by the regulatory review team was the difficulty in printing the
Define file. The style sheet used in the pilot submission package was developed with the
primary target of web browser rendering, which is not readily suited to printing. Reviewers
who attempted to print the Define file found that the file did not fit on portrait pages, that
page breaks were not clean, and that printing only a portion of the file was difficult. Opening
the document in another application (e.g., Microsoft Word) provided a work-around, but was
not an option that was user friendly or efficient. Instead, the pilot project team created a PDF
file of the rendering that could be printed. This PDF file is not included in the public release
of the pilot submission package because this solution required some non-standard
procedures. As this shows, there is a need for XML standards evolution and accompanying
tools that accommodate the need for printing as well as screen rendering, without
imposing further development work at style-sheet creation time.
One key factor in the success of the pilot project was the unprecedented level of interest and
support by individuals at FDA. The regulatory review team participated in the
teleconferences and made time to meet with the pilot project team at several face-to-face
meetings. At one of the face-to-face interactions with the regulatory review team, someone
commented, “In order to get a standard we have to suffer.” This became the unofficial mantra
of the pilot project team.
project did not allow the pilot project team to send a briefing package to the regulatory
review team.
In addition, a real-world encounter would have involved only reviewers familiar with the
particular therapy area. For the pilot project, eleven volunteers from FDA attended the
meeting, representing multiple therapeutic areas and disciplines. This allowed the pilot
project team to get a broader view of expectations for the pilot submission package.
The meeting began with an overview of the pilot project goals followed by an overview of
the study. The analysis strategy was then presented, including what endpoints and analyses
would and would not be included in the pilot submission package. The proposed data
structures and descriptions of the contents of the SDTM and analysis datasets were presented.
A Define.xml example and the annotated CRF were demonstrated.
In addition to numerous agreements regarding the specific pilot submission package (listed in
Appendix 7.7), several key agreements were reached:
• Individual programs would not be included in the pilot submission package, though it
was strongly encouraged by at least one reviewer. Instead, the pilot project team hoped to
illustrate that the metadata, which would include sections of program code or pseudo-
code, would be sufficient without providing complete programs.
• All levels of the MedDRA coding would be included in the SDTM datasets. This would
provide a good opportunity to test the effectiveness of tools used at FDA with respect to
the handling of SDTM supplemental qualifiers.
• Individuals on the regulatory review team expressed a preference to avoid all listings,
since they thought they would not be needed. Even listings of subjects with serious
adverse events and deaths were thought to be unnecessary, since these subjects could be
identified easily.
6. Conclusion
The goals of the CDISC SDTM/ADaM pilot project were met. It was established that the
package submitted using CDISC standards met the needs and the expectations of both
medical and statistical reviewers participating on the regulatory review team. The regulatory
review team noted the importance of having both data in SDTM format to support the use
of FDA review systems and interactive review, and data in ADaM format to support analytic
review. The project demonstrated the importance of having documentation of the data (e.g.,
the metadata provided in the data definition file) that provides clear, unambiguous
communication of the science and statistics of the trial.
The regulatory review team expressed a favorable impression of the pilot submission
package. They were optimistic about the impact that data standards will have on the work
associated with their review of new drug applications.
• Attributes of the pilot submission package that addressed requests or expectations of the
regulatory review team
o Navigation made easier in the Define file through use of bookmark pane and table of
contents
o Reviewer’s guide provided to orient reviewers to various aspects of the pilot
submission package – link provided from annotated CRF and from Define file, as
well as within the PDF file
o Improved methods for annotating were used, which helped to facilitate search
o Links provided in Define file to PDF files (e.g. annotated CRF, SAP, study report)
• Using the metadata prescriptively in building the submission package resulted in
significant efficiencies; the specifications for the datasets were entered once and then used
not only as metadata but also to:
o automatically generate the Define file
o support automation of dataset creation
o support automation of variable ordering in datasets so that it matched the order in the
Define file
o maintain consistency with the datasets and support automation of dataset validation
• Issues to be aware of in creating a submission package
o Define file is crucial and must be accurate and consistent with the data
o Use a consistent method of identifying the appropriate records used in the analysis
o How to provide links between the derived data in SDTM and analysis datasets
o Use of the “comment” and “purpose” columns in the Define file
o Definition of the term “derived data”
o Design and implementation of style sheet
o Ordering of variables in the data is important, and must be consistent with ordering in
Define file
o Verify transparency regarding how data were derived and analyzed
o Analysis datasets should be structured in such a way that reviewers can perform
sensitivity analyses as well as verify analysis results
o Confirm hyperlinks in the Define file perform as expected
As stated previously, style sheets used for viewing of the Define file do not facilitate printing
the file in such a way as to produce a reasonably formatted document. Solutions to allow
both easy viewing and printing of Define files have not been identified. This problem could
be viewed as an implementation issue that sponsors will need to handle, after discussing the
issue with their FDA reviewers. For example, a sponsor might choose to provide two
versions of the style sheet – XML for viewing and PDF for printing. Ideally, a reminder of
the issue would be included somewhere in the CRT-DDS guidance (e.g., a note that
consideration be given to how the sponsor will respond to a request from reviewers for a
print-friendly version of the style sheet). It should be noted that the regulatory review team
for the pilot project emphasized that the ability to print the document would be essential for
the future use of XML files.
6.3. Acknowledgements
The CDISC SDTM/ADaM pilot project team would like to acknowledge the contributions
and support of the many people and organizations that helped to successfully complete this
project.
Whatever technology and solutions were needed to get the job done were shared openly
among FDA and industry, software and pharmaceutical companies, services groups, and
individual contributors. This openness was a key factor in the success of the project.
It is not possible to overstate the value provided by the regulatory review team’s interactions
with the pilot project team. The guidance, feedback, and enhanced understanding of each
other’s processes were invaluable to both teams.
The members of the pilot project team want to express their appreciation for the enthusiasm
and continuous support from others in CDISC, including the project sponsors, the CDISC
boards, and, of course, the SDTM, ODM, and ADaM teams.
The entire project would not have been possible without the support of the employers of the
various team members in allowing the participants to spend time and energy working on this
pilot project. This exemplifies the CDISC spirit of working together for the common and
greater good.
7. Appendixes
7.1. Appendix: Project management
The pilot project was managed by a team of co-leaders. Once the initial pilot project team
was established and work officially started, no new team members were added. There were
several face-to-face meetings: an initial planning meeting in January 2006 with sponsors and
the pilot project team co-leaders, a kick-off meeting in February 2006 to discuss plans and
work assignments for the pilot project, a working meeting in May 2006, and another meeting
in September 2006 to hear the regulatory review team’s comments. Regular teleconferences
were held, biweekly initially and weekly during the peak workload periods. Minutes
of all meetings were posted to the team’s document repository.
Initially the pilot project team was divided into three sub-teams: analysis, data, and research.
The data sub-team worked on mapping the CRF and creating the structure for the tabulation
datasets. The analysis sub-team worked on writing the statistical analysis plan and designing
the analysis datasets (including the writing of the metadata). As these tasks were completed,
these sub-teams merged to perform the programming and QC of the analysis datasets and
summary tables. The research sub-team focused on the creation of the Define file, including
producing an XML file that would accommodate the requirements for analysis dataset
metadata and analysis results metadata.
comments, shows the complete contents of the text of any text-box comments and can be
expanded to show additional details such as the “Author”, “Subject”, and modified “Date”
and “Time” attributes of each comment.
Figure 6 Illustration of annotation comment for the visit date collection field
The SDS draft Metadata Submission Guidelines recommended a process for annotating the
text (in these “inherited” variable instances) by using the wildcard “--” to indicate that more
than one 2-digit prefix may be applicable; this recommendation was followed in the pilot
project. Another recommendation of the SDS metadata team that was followed was that a
comma-separated list of applicable variables be included in square brackets, following the
“wildcard” entry. An example is --DTC [AEDTC, CODTC, CMDTC, DMDTC, SCDTC,
QSDTC, VSDTC, DSDTC, MHDTC] when VISITNUM=“1”. Note that a qualifying
statement was also included, indicating the instance or value of the VISITNUM associated
with data collected for this variable (see next paragraph for more about this). The text box
was then re-sized (as described earlier) to initially display (on the PDF page) only the text
“--DTC” even though the brackets and additional text were part of the text in the
comment/annotation.
In order to differentiate variables, many annotations had to have qualifying statements such
as VSTESTCD when VSTESTCD=“PULSE” so that this Vital Signs test code was
differentiated from other test codes collected, such as blood pressure values or temperature.
This became crucial in identifying different questionnaire questions and responses, especially
once the comments were resized to display only the variable name such as “VSTESTCD”. In
fact, it became evident that both the topic variable (such as QSTESTCD) and the result (such
as QSORRES) needed to be annotated, even though the topic variable was defined by the test
or sponsor and applied to the CRF (thus the CRF was not really the source or origin of the
topic variable). The result, however, was collected from the CRF and it was important to
annotate the result in such a way that the result was related to the topic variable. This was
achieved by using similar qualifying statements as described above, such as VSORRES when
VSTESTCD=“PULSE”. The result was this pairing of the topic variable annotation for the
printed question and the result variable annotation for the CRF data “input” field.
As noted in Section 2.3, all pages where data were collected and reported were annotated.
Instead of referencing other panels as an example of how a page should have been annotated,
comments were “cloned” using Acrobat 7, and similar or identical pages and panels were
annotated from previously annotated pages and panels. This was done via a process in which
one annotated page or panel was created as the only page or panel in a template PDF file.
The annotations/PDF comments were then exported via the Acrobat “Export Comments”
feature to an XFDF file, an Adobe XML format (XML Forms Data Format) that describes
forms data fields, of which comments are a subset. The resulting
XFDF files were then opened with a text editor and the page numbers were updated with the
correct “target” page in the complete blankCRF.pdf file. The “target” page would be the first
panel or page where the annotations needed to be applied. The blankCRF.pdf file was then
opened and the XFDF file was used for importing comments to the appropriate page. This
process was then repeated for the next and subsequent pages/panels until all were annotated.
Fully populating the 158-page PDF file from the individual templates (approximately 13
visits cloned) took approximately 3 hours.
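For illustration, the sketch below shows the approximate shape of one exported text-box comment in an XFDF file. The element and attribute names reflect the general XFDF format rather than the actual pilot files, and the annotation content is taken from the examples above; the page attribute (typically zero-based in exported XFDF) is the value that was edited in the text editor so that the imported comments landed on the correct “target” page of blankCRF.pdf:

<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <annots>
    <!-- Illustrative values only: page="37" targets the 38th page of blankCRF.pdf -->
    <freetext page="37" rect="72,650,200,680" title="Pilot Team" subject="VS">
      <contents>VSORRES when VSTESTCD="PULSE"</contents>
    </freetext>
  </annots>
</xfdf>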
As described in Section 2.3, the pilot project team set up the annotations to support the
Acrobat Advanced Search, so that a search would return all “hits” for the searched values.
Using the Advanced Search with the “Search Comments” option
would let reviewers find annotations more efficiently. By combining “Search Comments”
and “Whole Words”, reviewers could find all variables for a particular domain using the
2-letter domain prefix which was placed in the “Subject” fields. Figure 7 depicts a
screenshot of the PDF search comments option.
• There is one record per analysis parameter (i.e., outcome variable) per analysis visit per
subject. In the case of the pilot submission package, the only analysis parameter in
ADQSADAS is ACTOT. If there had been additional subtotals of interest for analysis,
they would have been incorporated on separate rows in the dataset. The pilot project
team elected not to include the individual item scores in the restructured analysis dataset.
• All ACTOT data found in the QS domain are included in the analysis dataset (i.e., no
observations were “dropped”).
• Because the ADaM Implementation Guide was still being developed at the time of the
pilot project, the variable names and definitions do not correspond to recommendations
by the ADaM team.
• In revising the dataset the pilot project team ensured that the identification of three types
of data (observed, observed-windowed, and LOCF) was possible.
o Observed data can be identified by rows where VISIT = AVISITC and ITYPE is
blank
o Data included in the analysis of observed-windowed data can be identified by ITTV =
“Y” and ITYPE ≠ “LOCF”
o Data included in the analysis of LOCF data can be identified by ITTV = “Y” (Note
that ITYPE=”LOCF” indicates the record added to hold the imputed value)
• The definition of “observed” data is different in the revised content because all data are
included in the analysis dataset, so observed data are as found in the tabulation dataset
regardless of whether they are eligible for analysis.
• Not shown in the illustration are four indicator variables included for the purposes of
communicating how each record relates to the analyses. The variables are:
o AFLNELIG (indicates whether record contains observed data eligible for analysis)
o AFLNLOCF (indicates whether record contains data used for the LOCF analysis)
o AFLNOBS (indicates whether record contains observed data from SDTM)
o AFLNWIN (indicates whether record contains data included in the analysis of
WINDOWED observations)
A detailed discussion of each variable added in the revision will not be included here, as the
information is provided in the illustrated metadata.
[Excerpt of the illustrated analysis-variable metadata (variable, definition, derivation):]
VAL: Numeric value of PARAM adjusted for missing values. Derivation: Sum(ACITM01:ACITM02, ACITM04:ACITM08, ACITM11:ACITM14); see SAP Section 14.1.1 for the detailed scoring algorithm. ACITMxx are the corresponding values of QS.QSSTRESN when QS.QSTESTCD=ACITMxx.
BASE: Baseline value of VAL. Derivation: VAL when AVISITCD='BL'.
[Process flow diagram: metadata maintained in Excel and published in HTML were converted to SAS metadata; zero-observation (0-obs) datasets were created from the metadata and populated by appending the data; attributes were added, and observations were sorted and variables ordered to match the data spec; a format catalog, a report of character lengths, and the Define.xml were generated from the metadata; the populated datasets were validated against the data spec.]
Figure 12 Spreadsheet used to collect metadata regarding variables within SDTM datasets
Figure 13 Spreadsheet used to collect metadata regarding valid values for SDTM variables
Figure 14 Example of the main-panel and left-hand pane tables of contents in the Define.xml file
Figure 18 Example of the analysis results metadata for two tables in the Define.xml file
Combining the two needs above, the pilot project team decided to place an identical
Define.xml in each of the tabulations and analysis locations, arranged so that each Define
file functioned the same way regardless of which one was opened. This also addressed a
preference expressed by the regulatory review team that they not have to open multiple Define files.
It turned out to be straightforward to devise a structure for the Define.xml such that not only did
the two Define files behave the same way, but they were also identical files that could
reside in two different locations. How this was achieved is discussed later in this section.
7.5.3. Placement of schema and style sheet files in the pilot submission
package
The supporting files for the Define.xml are located in a subdirectory UTIL present in both the
tabulations and analysis folders. This keeps the supporting files in a subordinate location to
each Define.xml, while keeping the tabulations and analysis directories relatively
uncluttered. Within UTIL there is a folder called Foundation for schema files, and one called
XSL for the style sheet. Figure 20 illustrates the location of the supporting files in the
directory structure.
Figure 20 Directory structure showing location of supporting files for the Define.xml file
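Given the layout shown in Figure 20, each Define.xml can point to its supporting files with relative paths into UTIL, which is one reason the two Define files can be identical: the same relative references resolve correctly from either the tabulations or the analysis directory. The fragment below is a minimal sketch of such a document header; the style sheet and schema file names and the namespace declarations are illustrative assumptions, not the names used in the pilot package:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="UTIL/XSL/define1-0-0.xsl"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.cdisc.org/ns/odm/v1.2 UTIL/Foundation/define1-0-0.xsd">
  <!-- Required ODM attributes and the Study/MetaDataVersion content are omitted for brevity -->
</ODM>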
The structure of the XML schemas envisioned for the CDISC Standards in the near
future and the structure of the XML schemas used in the pilot project are illustrated in Figure 21 and
Figure 22, respectively.
Figure 22 Illustration of structure of XML schemas used in the pilot submission package
That said, in the longer term it is necessary that an official, vetted syntax be available that
meets the need for analysis results metadata submitted as part of Define.xml. It is fully
expected that, while the functionality of analysis results metadata will be supported in some
future Define schema, the syntactic details will likely change. The example provided by the
pilot project should be treated as one possible way to approach this need, suitable only in the
short term, and not as a definition of the syntax to be used in future editions of the Define schema.
[Diagram: XML fragments, including fragments produced by a custom program for the analysis results metadata, were combined by a custom program into the final define.xml.]
Figure 23 Illustration of the process by which metadata were combined into a single Define.xml file
<crt:title>adae.xpt</crt:title>
</crt:leaf>
The href syntax works relative to the directory hierarchy, with “..” meaning “up one directory”. So, one
can refer to the study report PDF file (as a whole) from either Define.xml file as
<crt:leaf ID="Study-Report"
          xlink:href="../../../53-clin-stud-rep/535-rep-effic-safety-stud/5351-stud-rep-contr/cdiscpilot01/cdiscpilot01.pdf">
<crt:title>CDISC Pilot Study Report</crt:title>
</crt:leaf>
Extending this using PDF named destinations allows reference to individual sections and
analysis displays in the study report:
<crt:leaf ID="ARM-Leaf0001"
          xlink:href="../../../53-clin-stud-rep/535-rep-effic-safety-stud/5351-stud-rep-contr/cdiscpilot01/cdiscpilot01.pdf#nameddest=OUT_TBL_14.1.01">
<crt:title>Table 14-1.01</crt:title>
</crt:leaf>
analysis dataset. The fact that it is the very same variable can be made clear in the metadata
because the variables would share a unique object identifier (OID). Using the Origin
attribute, the horizontal structure (e.g. analysis dataset) can declare the vertical structure as
the place where the variable was originally created.
In the pilot project metadata, it would have been desirable to do something similar, but in the
other direction (i.e. state that some portion of a --STRESN variable was originally created on
an analysis dataset).
To elaborate, the pilot project team placed some derived analysis variables for reviewer
convenience on the vertical SDTM datasets, as additional rows with distinct --TESTCD.
What this means is that on some SDTM datasets, the values of --STRESN for some
--TESTCD are from the CRF, and for other --TESTCD are values selected and transposed
from a particular column on some analysis dataset. The structures available at the time of the
pilot project in ODM/Define could not readily make this relationship apparent. The best
available resolution was to declare --STRESN as having Origin “Derived”, to state that --STRESN
was produced by a particular computational algorithm or method, and then, in the
description of that computational method, to explain how the values are sometimes taken from the CRF and
sometimes based on the analysis dataset. The computational method
(COMP_QSAD_QSSTRESN) for variable QSSTRESN in dataset QS is an example of this.
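A minimal sketch of how this could look in the Define.xml metadata is shown below. The attribute and element names are indicative of the CRT-DDS v1.0 style and may not match the pilot files exactly, and the wording of the method description paraphrases the approach described above rather than quoting the pilot package:

<!-- Fragment assumed to sit within MetaDataVersion; names are illustrative -->
<ItemDef OID="QS.QSSTRESN" Name="QSSTRESN" DataType="float"
         Origin="Derived" def:ComputationMethodOID="COMP_QSAD_QSSTRESN"
         def:Label="Numeric Result/Finding in Standard Units"/>

<def:ComputationMethod OID="COMP_QSAD_QSSTRESN">
  For item-level QSTESTCD values, QSSTRESN is the standardized numeric result of the
  value collected on the CRF; for QSTESTCD="ACTOT", QSSTRESN is selected and
  transposed from the corresponding total-score column of the analysis dataset.
</def:ComputationMethod>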
While the pilot project team found a way to express the nature of --STRESN in these cases,
there may be some value in a more precise syntax for this operation; there is growing
agreement that at least major derived variables (e.g. questionnaire domain summary scores
that serve as efficacy analysis variables) should be present in the SDTM datasets for the convenience of
reviewers.
documentation page providing more depth about what was done, as well as anything
that would be critical to know when trying to understand the data file.
o Providing links for tables would be very helpful. (Reviewers receive many tables,
without really having a way to figure out where they came from.)
o More focus should be placed on the Define file being consistent.
• The descriptions of variables are sometimes not detailed enough, and the richness of the
descriptions is inconsistent. Examples:
o The list of controlled terms for the variable may be incomplete or may require
descriptions of the terms.
o A derived variable may contain the answer to a question (e.g. duration) but the
variables in the database do not necessarily match those expected for the derivation; a
clear identification of the variables was not included in the description of the
derivation.
o The description of a variable does not match the data populating the variable. For
example, a variable might be described as a visit date, but the actual field is populated
with the same one date for all subjects.
o Descriptions of variables should not be ambiguous. For example, a date variable
might be described as an “entry” date, without further clarification. This could be the
subject’s date of entry into the study, or it could be the date the data were entered into
the database.
• Identifying the mapping between the protocol-specified analysis plan, the data, and the
analyses performed will simplify the review process. Without this mapping, the reviewer
will try to construct this mapping for the primary outcome variable and maybe a
secondary variable. Many of the contacts with the sponsor are made to identify how to get
from the SAP to the variables, and then to reconstruct the analyses to see how
the SAP was implemented.
• Regarding the provision of program code: If program code is requested, it is usually not
in order to execute it, but rather to gain clarity about a variable’s derivation that is missing from the
define file. Because program code is not usually well documented, it takes a long time to
decipher.
• Regarding data issues:
o Two issues that present difficulties for reviewers are missing data and assumptions.
Flags that would indicate if a field has missing data or if a field has assumptions
applied to it would be very helpful.
o Need to state clearly whether or not a variable could have missing values, and if so
how to handle them.
o There also needs to be a distinction made between missing data and data not collected.
o One problem is that the variables are often named obscurely.
o Eighty percent of the data having consistent names and attributes is essential.
o Reviewers want to be able to manipulate data, if possible. For example, they might
want to explore the impact of changing a cutoff for a lab parameter. The data should
allow this flexibility (e.g., not include only the data above a specific cutoff, but
instead flag the values above the cutoff and be sure to include what the cutoff is).
• Regarding tables:
o Need to state clearly the different assumptions and rules used for a table.
o The process is to generate a table and then draw conclusions based on the table.
However, there is currently a deficiency in defining what the tables are. One person
noted a desire to see blank tables submitted with the statistical analysis plan.
o Footnotes could be added to tables to explain exceptions etc.
• Annotated CRFs:
o Very helpful for linking the protocol and the Define files. However, reviewers tend
not to look at annotated CRFs because they are often not accurate.
o Ideally, the Define file would be set up so that the CRF is transparent, minimizing or
eliminating references to the CRF. The CRF would be viewed more as a way to
collect data instead of facilitating the review.
• Regarding the application used for the Define file: Define.pdf is useful, but it is
inefficient and frustrating to have the data in one application and the metadata in another,
so reviewers would prefer using Define.xml.
• Regarding the types of data needed for the medical and safety reviewers:
o Both sets of reviewers need access to efficacy and safety analysis datasets, as well as
SDTM.
o For efficacy data, medical reviewers tend to get help from the statisticians. It would
be hard to standardize, since efficacy is different for every submission and tends to
have a different structure. There are predictable safety analyses asked for every time,
and these can be performed and the links set up the same for every submission. A
safety review guidance published in March 20056 gives a list of what reviewers are
supposed to be looking at regarding safety data.
o Flexibility is needed for the review of both efficacy and safety data. Though there is
a set of standard analyses for every review, there are also always additional ones to be
done. Safety data analyses are in general more exploratory than efficacy analyses. However,
some therapeutic areas (e.g. oncology) also consider many exploratory efficacy
analyses, because of the different focus in life-threatening diseases. Therefore,
both the efficacy and safety datasets need to be flexible enough to be useful for
additional analyses. It is also important to remember that a standard will cover the things
that every division needs to know, while some 20% of the data would
need to remain more flexible.
o The safety analysis datasets are paramount.
o Medical reviewers definitely need access to the analysis datasets as well as SDTM.
They also need access to the analysis plan.
o Statistical reviewers need access to both SDTM and analysis datasets, as they work to
recreate the analysis datasets from the raw data.
6
February 2005 FDA Reviewer Guidance: “Conducting a Clinical Safety Review of a New Product Application
and Preparing a Report on the Review,” a Good Review Practice by CDER. Refer to the following website:
http://www.fda.gov/cder/guidance/3580fnl.pdf
• Regarding analysis datasets: The value of the analysis dataset is to tie the conclusions
back to the raw data. In the past, analysis datasets have been supplied to some review
divisions, but are generally lacking in adequate documentation. If the analysis datasets
are not provided or are inadequate, the reviewer has to go back to the raw data and
reconstruct the analysis. Generally, the reviewer can come to the same conclusion as the
sponsor, but not the same numbers. With the analysis datasets, reviewers will be
able to get the same numbers, and they can also see whether the analysis
datasets follow the analysis plan.
• The data specification for SDTM provides a well-defined structure for data. The most
critical thing is to follow the naming structure defined in the specifications. If the data are
in consistent locations, reviewers know where to look. In addition, automation is going
to be very important in the future, so having a well-defined data structure will be crucial.
Consistency and adherence to the specifications are very important.
• The FDA volunteers were asked what topics they would want covered in a conversation
with an industry statistician to discuss plans for a submission.
o These conversations should probably occur at least by the end-of-phase-II meeting; the
pre-NDA meeting is too late. Sponsors can request a pre-submission
meeting specifically to address data issues. Ideally, a formal plan of the data
structure, variable naming, etc., and a mock-up of what the data will look like would be
presented, and all of the elements mentioned need to be specifically discussed.
o A phase II study could actually be used as the “mock-up”, so that there is something
substantive to look at.
o Reviewers would also appreciate having a sponsor’s statistician provide an
orientation when the data are submitted. The focus would NOT be on showing what
was done and why; instead it would be on how to find things in the data.
o A reviewer’s guide for the submission package would be very helpful. It could be put
under the cover letter heading part of the eCTD. The text of the actual cover letter
would say that a reviewer’s guide has been included under the cover letter heading.
Note that the reviewer’s guide would cover more than the data. A benefit of the
reviewer’s guide would be that if the reviewer changes there would be a document
available for orienting the new reviewer.
o The reviewer’s guide could also be linked to from multiple places in the submission,
such as in the define file, because not everyone reads the cover letter.
• A very big issue is the lack of consistency within a submission.
• Discussion was held as to whether or not programs would be included. One reviewer
strongly encouraged the provision of programs as part of the documentation. ADaM v2
allows for this possibility, in cases where it would help describe what had been done. It
was commented that the pilot project team hopes the metadata will be sufficient without
providing programs, but understands programs or pseudo-code may be needed.
• It was requested that all levels of the MedDRA coding be included in the SDTM datasets.
Currently, the only way to do this is to include the additional levels in SUPPQUAL. (The
AE domain only includes the body system and preferred term.) This is a good opportunity to see
how SDTM handles supplemental qualifiers and to see if this method works effectively.
If there are secondary mappings, those should also be included.
• It was agreed that no patient listings would be provided. This includes listings such as
SAEs and deaths. If the data are such that these subjects can easily be identified, there is
no need for patient listings to be produced. There was a strong preference that patient
listings be avoided.
• Regarding the analysis of lab data and Hy’s Law, it was agreed that a summary be
presented using modified Hy’s Law. In addition, a table, or at least a flag in the data,
would be included to indicate the full Hy’s Law criteria.
• In addition to the above agreements, the following agreements were reached:
o An additional efficacy analysis was requested.
o Definitions for specific terms (e.g. dermatological event, treatment-emergent adverse
event) were agreed.
o The MedDRA Preferred Term will be used in summaries rather than the Lower Level
Term.
o P-values would be included in lab analyses.
o It was requested that the pilot submission package include normal shift tables for labs.
o Both the normal and the reference ranges would be included in the data. SI units will
be used, but original units will also be provided.
o All population flags will be included in the analysis datasets.
o Dermatological adverse events will be flagged in the analysis dataset, as will the
subject’s first occurrence of a dermatological AE.
o Both the date of last dose and the date the patient was last observed would be included in all
analysis datasets.
• Use of a more logical ordering of variables in the analysis datasets (rather than alphabetical ordering)
Modifications to the tabulation datasets and relevant metadata
The changes made to the SDTM (tabulation) datasets and metadata include:
• Included the dictionary names and versions for the AE and CM coded fields in the TS
dataset
• Provided one QS domain rather than splitting it by questionnaire
• Included variable SESEQ in the SE dataset
Other modifications to the pilot submission package, as requested by the regulatory
review team
• Patient narratives provided in the CSR and in a separate ASCII text file
• Raw statistical output from the primary efficacy analyses and from the repeated measures
analysis provided as a subsection to Appendix 9 of the CSR (Documentation of Statistical
Methods)
Term Definition
0-obs Zero observation dataset
aCRF Annotated case report form
ADaM Analysis Data Model
ADaM v2 Analysis Data Model Version 2.0
ADAS-Cog Alzheimer’s Disease Assessment Scale - Cognitive Subscale
ADSL Subject level analysis dataset
AE Adverse Event
CBER Center for Biologics Evaluation and Research
CDER Center for Drug Evaluation and Research
CDISC Clinical Data Interchange Standards Consortium
CRF Case Report Form
CRO Contract Research Organization
CRT-DDS Case Report Tabulation Data Definition Specification, also known as the
Define.xml
CSR Clinical Study Report
DDT Data Definition Table
eCTD Electronic Common Technical Document
eNDA Electronic New Drug Application
ETL Extract, Transform, and Load
FDA Food and Drug Administration
HLGT Higher Level Group Term
HLT Higher Level Term
HTML Hypertext Markup Language
ICH International Conference on Harmonisation
LLT Lower Level Term
LOCF Last Observation Carried Forward
MedDRA Medical Dictionary for Regulatory Activities
MSSO Maintenance and Support Services Organization
OBPS Office of Business Process Support
ODM Operational Data Model
OIT Office of Information Technology
PDF Portable Document Format
QC Quality Control
SAP Statistical Analysis Plan
SDS Submission Data Standards
SDTM Study Data Tabulation Model
TDM Trial Design Model
TOC Table of Contents
WHODD WHO Drug Dictionary
XFDF XML Forms Data Format
XML Extensible Markup Language
XPT extension for a SAS Transport File