Epi Manual v2.2.1
by
Myo Minn Oo
Version 2.2.1
A Guide to Data Entry and Documentation in EpiData | Myo Minn Oo
The intention of this book is to give readers a rather practical approach to the EpiData
Manager and EntryClient for efficient data entry, documentation and data management. In
order to facilitate the learning process, the use of technical terms is minimized. Instructions are
also illustrated using a real-world project. It is my hope that this will enable readers to get
started using EpiData freeware with minimal problems.
One important thing to note is that this freeware collection was developed and
is maintained by volunteers with very limited funding. Without their dedication, EpiData would
not be accessible to many people. As a result, documentation and how-to guides are somewhat
limited. This is where I hope this book can fill the gap.
1.2 Features and Usage of the new EpiData
The first EpiData software was released in 1999. It has been around for more than 20
years, and many aspects have changed. The new EpiData provides several advantages
over the old entry version. Meta-data and records are stored in a single file with the extension
“.epx”, which replaces the previous triplet (three-file) system. The file is basically a text file written in
“eXtensible Markup Language” (XML), a markup language
that stores data as simple text. The software has become more graphically oriented. It also
supports Unicode (UTF-8), so non-Latin text can be displayed. Moreover, a lot of
effort was put into implementing the good clinical practice (GCP) principles required for many
medical data projects. This means data encryption, detailed logging of events and user access
control of data.
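To give a feel for what "data stored as simple text in XML" means, here is a minimal Python sketch that reads a tiny XML document containing non-Latin text. The element and attribute names (Project, Field, Record) are invented for illustration only; the real .epx schema is richer and is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified project file. The real .epx schema differs;
# these tag and attribute names are assumptions made for this sketch.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<Project title="Form 1">
  <Field name="ptName" type="string"/>
  <Field name="ptAge" type="integer"/>
  <Record ptName="Зоя" ptAge="30"/>
</Project>"""


def summarize(xml_text):
    """Parse the XML text and report its fields and records."""
    # Encode to UTF-8 bytes so the parser honours the encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    records = list(root.iter("Record"))
    return {
        "title": root.get("title"),
        "fields": [f.get("name") for f in root.iter("Field")],
        "records": len(records),
        "first_name": records[0].get("ptName"),
    }


info = summarize(SAMPLE)
print(info["title"], info["fields"], info["first_name"])
```

Because the file is plain UTF-8 text, meta-data (the fields) and data (the records) can live side by side in one file and still be read on any operating system.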
The EpiData Manager is a tool for the project manager. Its role is to define data
structures, add meta-data, and document and export data. The files it creates are independent of
the operating system: once created, a file can be opened on any computer that has the freeware installed.
The EntryClient serves data entry only. Data entry personnel cannot change
rules or structure while entering data.
1.3 Installing EpiData Manager and EntryClient
To download them, go to the EpiData Association’s official website,
http://www.epidata.dk. The download page lists options for Manager, EntryClient
and Analysis. Manager and EntryClient are available for both 32-bit and 64-bit
architectures on two operating systems: Mac OS and Linux. For Windows users, an all-in-one
installer that includes EpiData Analysis is available to download and install.
The version at the time of writing this book is 4.6.0 (as of 1st September 2019). There
may be substantial discrepancies between the book and future versions.
1.4 Terminology
Field refers to a variable with certain characteristics, such as numeric, decimal, text or
date. During data entry, values are put into these fields according to their pre-specified
data types.
Record refers to the combination of fields or variables for one subject or participant.
Dataset refers to a compilation of such records. In EpiData, it also refers to a dataform
which holds a number of such records.
Figure 1.4 illustrates these concepts.
[Figure 1.4: the fields Name: John, Age: 30 and Sex: Male make up one record; several such records make up a dataset.]
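As a rough sketch of how the three terms relate, the Python snippet below models a dataset as a set of typed fields plus a list of records. This is not how EpiData stores data internally; the class name Dataset and the method add_record are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset: a set of typed fields plus the records entered so far."""
    field_types: dict                       # field name -> expected Python type
    records: list = field(default_factory=list)

    def add_record(self, **values):
        """Store one record after checking every value against its field type."""
        if set(values) != set(self.field_types):
            raise KeyError("a record must supply exactly the defined fields")
        for name, value in values.items():
            if not isinstance(value, self.field_types[name]):
                raise TypeError(f"{name} expects {self.field_types[name].__name__}")
        self.records.append(values)


# The record from Figure 1.4: three fields describing one participant.
people = Dataset(field_types={"name": str, "age": int, "sex": str})
people.add_record(name="John", age=30, sex="Male")
print(len(people.records), people.records[0]["age"])
```

Each keyword argument is a field, each call to add_record adds one record, and the list of records is the dataset.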
2. EpiData EntryClient Short Introduction, Documentation and help file. Version 2.0
J. Lauritsen & T. Christiansen
http://www.epidata.dk/downloads/epidataentryclientintro.pdf
3. EpiData Software for Operations Research in Tuberculosis Control: A course
developed by the EpiData Association. Hans L. Rieder and J. Lauritsen
https://tbrieder.org/epidata/epidata.html
1.6 References
1. EpiData Software Freeware: EpiData Flyer General.
http://www.epidata.dk/downloads/epidataflyer_general.pdf
2. EpiData Course background by Hans L. Rieder.
https://tbrieder.org/epidata/course_0-2_background.pdf
3. Short Introduction to EpiData Manager v2.01. J. Lauritsen & T. Christiansen.
http://www.epidata.dk/downloads/epidatamanagerintro.pdf
4. EpiData EntryClient Short Introduction, Documentation and help file. v2.0.
J. Lauritsen & T. Christiansen.
http://www.epidata.dk/downloads/epidataentryclientintro.pdf
5. J.M. Lauritsen, T.B. Christiansen, H.L. Rieder, J. Hockin. EpiData Analysis
Introduction. http://www.epidata.dk (2018)
http://epidata.dk/downloads/EpiDataAnalysis_Introduction.pdf
As I mentioned earlier, its advantage is the graphical user interface, which is simple and
intuitive. Figure 2.1.2 shows the menu bar and the toolbar. The toolbar is also called the work process
toolbar; it provides a generic workflow from project creation and data documentation to
data entry and export.
Figure 2.2.1 Creating a new project from the menu bar versus from the toolbar
[Figure: main window with the Welcome tab, Project Tree, Study Information and status bar labelled]
Project Tree lets you navigate through all dataforms under the tree structure. You can
easily switch the main window from the project’s Study Information to any dataform by
clicking it in the tree. As you can see in Figure 2.3.1, the name of
the project is still “Untitled Project”. To change the name, double-click on
it. We will do that later.
Study Information is essentially meta-data, or data about data. The full set of Study
Information is known as the Dublin Core Collection. Read more about it here
[https://www.dublincore.org/]. In EpiData, there are seven categories of meta-data, presented as
tabs, which include the “Welcome” tab. You can close the Welcome tab by clicking on “Close Page”;
it is not that important. The other six tabs are
1. Title/Abstract
2. Coverage
3. Description
4. Ownership
5. Funding, and
6. Version Details.
Table 2.3.1 summarizes this information.
The status bar at the bottom of the screen currently gives three pieces of information: 1)
Last Saved, 2) MAIN, and 3) Records. “Last Saved” shows the time in hours, minutes and
seconds since you last saved the project. “MAIN” indicates that the canvas is currently
selected. We will cover this in a later chapter. “Records” shows the number of records currently
in the dataset. We have none yet since this is a new project.
Let’s move on to the dataform and see what happens. Click on the “Dataset 1” dataform
under the Project Tree. The screen on the right changes to a blank screen with small grid
layout. We call this a blank canvas because later on we will create our data entry form on it.
It’s like painting on a blank canvas. When you maximize the application window, you may
notice a red dashed line on the right edge of the blank canvas. This line indicates the margin of
the form when you print. Above the blank canvas, there is a set of tools to create data entry
fields. We will cover this more in the next chapter.
[Figure: blank canvas of the dataform, with the item to click in the Project Tree and the print margin labelled]
Let’s open the “Title/Abstract” tab. Change “Untitled Project” to “Form 1” and
press “Enter”. You may notice that the name under the Project Tree also gets changed.
Task 2.6. Fill in the Study Information of current project “Form 1” and save the project as
“form1.epx”.
Meta-data      Description
Name           Name of the entry field
Label          Description of the entry field
Type           Data type
Length         Length of the entry field
Range          Minimum and maximum values allowed for numeric and date data
Value Labels   Values and labels assigned to the levels of a category. A special number such as 9 or 99 can also be used to represent missing data.
Comments       Any instructions for data entry staff, additional notes, or statements about calculated (derived) variables
Table 3.1.1 Codebook
Name
Every variable in the project should have a unique name of its own. However, there is
no single standard rule for assigning names to variables. Conventions vary between
programming languages and between organizations such as Google, Facebook or Apple. Still,
there are some generic rules for naming.
1. Start the name with a letter.
Within the variable’s name, you can use letters and digits as well as the
underscore “_” character. For example, “weight1” is acceptable whereas “1weight”
is not.
2. Use a single word.
It means that the name should not contain spaces or special characters; names that do
are not acceptable. “age” is a simple example. For age at registration, something like
“age_reg”, and for age at death, “age_death” can be used.
3. Use an intuitive name.
For example, age at registration for TB can be “age_reg” and age at death
“age_death”. Likewise, date of registration could be “date_reg” while "dor"
may not be very readable. However, "dob" is a commonly used acronym for date of
birth.
4. Distinguish the words in a compound name.
For example, “age_reg” and “age_death” show that you can join words with an
underscore, which makes them easier to read. This style is descriptively called
snake_case.
Another style starts the second and each later word with an uppercase letter.
Examples would be “ageReg” or “ageDeath”. This is called camel case or, more
descriptively, camelCase. These two styles may be better than simply "agereg" or
"agedeath".
5. KISS: keep it short and simple.
Aim for about 8 to 10 characters per name.
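The naming rules above can be turned into a small checker. The following Python sketch is only an illustration of the rules in this guide; it is not EpiData's own validation, and the function name check_name and the default limit of 10 characters are assumptions taken from rule 5.

```python
import re


def check_name(name, max_length=10):
    """Return a list of problems with a proposed variable name (empty if fine)."""
    problems = []
    # Rule 1: the first character must be a letter.
    if not re.match(r"[A-Za-z]", name):
        problems.append("must start with a letter")
    # Rule 2: a single word made of letters, digits and underscores only.
    if re.search(r"[^A-Za-z0-9_]", name):
        problems.append("only letters, digits and underscores are allowed")
    # Rule 5: keep it short (a guideline from this book, not a hard limit).
    if len(name) > max_length:
        problems.append(f"longer than {max_length} characters")
    return problems


for candidate in ["weight1", "1weight", "age reg", "age_death", "ageDeath"]:
    print(candidate, check_name(candidate))
```

Rule 3 (use an intuitive name) cannot be checked by a program; it still needs human judgment.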
Label
Labels are straightforward, but keep the “KISS” principle in mind.
Type
In EpiData, there are three basic types of data: 1) String, 2) Number and 3) Date. Strings
can be a short text or a long string called a "memo". Numbers can be integers, floating-point
numbers (numbers with decimals), auto-incrementing numbers and times. Dates are usually in
“dd/mm/yyyy” format, but other formats are offered.
Two other special types are Boolean (1 as Yes and 0 as No) and UPPERCASE
STRING.
Sometimes numbers are used to represent categorical data. The reason is that humans
make fewer errors when they type less, and fewer errors when they type a number
rather than text. For example, the sex of a subject is categorical data and usually includes male and
female. Let’s take a moment and think here. We can create a string field to input either
“male” or “female”, or we can just enter “M” or “F”. However, keying “1” or “2” is much
easier. The numbers “1” and “2” carry no mathematical meaning here; they simply represent
being male or female.
For dates, a valid date must be entered, meaning that if you enter “30/02/9999”, it
will not be accepted by EpiData.
Type provides a very basic check for data validation. For example, you cannot input a
string into a numeric field.
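The idea of type as a basic validation check can be sketched in a few lines of Python. This only mimics the behaviour described above; the helper names parse_dmy and parse_integer are made up for the example and have nothing to do with EpiData's internals.

```python
from datetime import datetime


def parse_dmy(text):
    """Return a date for valid dd/mm/yyyy text, or None if the date is invalid."""
    try:
        return datetime.strptime(text, "%d/%m/%Y").date()
    except ValueError:
        return None


def parse_integer(text):
    """Return an int for numeric text, or None if the text is not a whole number."""
    try:
        return int(text)
    except ValueError:
        return None


print(parse_dmy("01/09/2019"))   # a real calendar date
print(parse_dmy("30/02/9999"))   # February has no 30th day, so this is rejected
print(parse_integer("abc"))      # text cannot go into a numeric field
```

Here a rejected value comes back as None; a data entry program would instead show a warning and ask for the value again.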
Range
Range is another built-in check that reduces data entry errors. For example, suppose
you are entering data on adult subjects aged 18 years or older. If a range is provided, entering
a value less than 18 gives a warning or an error during data entry.
Ranges are usually used for numerical data, either discrete or continuous. Dates can also be
given a range.
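A range check is simply a comparison against a minimum and a maximum. The sketch below shows the idea for an age field and a date field; the upper age limit of 120 and the date limits are arbitrary assumptions for the example, not values from the codebook.

```python
from datetime import date


def in_range(value, minimum, maximum):
    """Return True when minimum <= value <= maximum (both ends included)."""
    return minimum <= value <= maximum


# Age of adult subjects: 18 is the lower limit; 120 is an assumed upper limit.
print(in_range(17, 18, 120))
print(in_range(18, 18, 120))

# Dates can be range-checked in exactly the same way.
print(in_range(date(2019, 9, 1), date(2019, 1, 1), date(2019, 12, 31)))
```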
Value labels
This comes hand in hand with numbers representing categorical data. In our previous
example “sex”, the number “1” represents “male” and “2” represents “female”. Unless we provide labels
for the values, we will not know which number means what.
Another use of value labels is assigning codes for UNKNOWN or MISSING values.
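In programming terms, a value label set is a lookup table from codes to labels. The following Python sketch assumes the codes 1 and 2 for sex from the example above and uses 9 as a missing-value code; the function name label_for is made up for illustration.

```python
# Value labels for the "sex" example; 9 is used here as the code for missing data.
SEX_LABELS = {1: "male", 2: "female", 9: "missing"}


def label_for(code, labels):
    """Translate a keyed code into its label, rejecting undefined codes."""
    if code not in labels:
        raise ValueError(f"{code} is not a legal value; expected one of {sorted(labels)}")
    return labels[code]


print(label_for(1, SEX_LABELS))
print(label_for(9, SEX_LABELS))
```

Any code outside the table is refused, which is the same protection a set of legal values gives during data entry.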
Notes
As a general rule of data entry, a value should be entered for every variable. The
reason is that when a value is missing, you don’t know whether it is missing in the original
record or the data entry staff forgot to enter it. Codes for missing values should also be pre-defined in the
codebook. This ensures uniformity in the data entry process if you are collaborating with
several project sites or areas. However, this practice is controversial and open to debate among
data managers.
SOLUTION:
Table 3.1.2 Codebook for “Request for Sputum Smear Microscopy Examination” of Form 1
3 = saliva
9 = Missing
Table 3.1.2 Codebook for “Request for Sputum Smear Microscopy Examination” of Form 1
Additional Notes
If an entry field has sub-categories, it should be specified as a number type, which in this
case has no mathematical meaning. The integer codes just correspond to the labels that are
defined. The reason for not entering text is that humans make fewer errors when they type less,
and fewer errors when they type a number rather than text. For example, we could
define someone's sex as text and then type "male" or "female", or we could enter "M" or
"F" for simplicity, but ideally we should define integer codes and enter 1 or 2 instead.
Variables which contain a limited number of known categories, such as sex (male,
female) and marital status (single, married, separated, divorced, widowed), should be defined
as numbers and assigned appropriate integer codes and corresponding labels. Variables
which have a larger number of known categories, such as place of birth, or which have an
unknown number of categories, such as reason for not visiting a doctor, should be defined as
text. There are exceptions to these general rules, but they are beyond the scope of this book.
The variable “reason” (reason for sputum smear microscopy examination) is a good
example of assigning intuitive integer codes to labels. People can easily remember
that 0 means “Diagnosis”, 1 means “follow-up at 1 month” and so on. Other
good examples are the results of specimens, “res1”, “res2” and “res3”.
Always remember to save your project periodically as you might never know when your
computer will crash!
As we introduced earlier in Chapter 2.3, a toolbar appears at the top of the blank canvas
when you click on the dataform. The tools shown in Figure 4.1.1 are few, yet
powerful enough to create complex dataforms. Their functions are tabulated in
Table 4.1.1.
Print Dataform - Convenient when you want to print your dataform and
distribute it on paper or electronically as a PDF.
Point and select - Lets you point anywhere on the page and select
anything on it. It is selected by default when you click on a dataform under
the project tree.
Variable creators - A collection of tools for creating the different types of data
that we introduced in Chapter 3.1. Read that chapter if you are not sure
what data types EpiData offers.
Section - Groups variables together for visual aid and efficient entry.
Variable editing tool - Edits and deletes any or all variables on the page.
Alignment - Aligns variables for visual aid and an efficient data entry flow.
Let’s now add our first heading to the dataform “dsRequest”. (Figure 4.2.1)
Task 4.1.1. Add three more headings to the dataform “dsRequest” as shown in Figure 4.1.3.
This is a string type of length 3 for entering three-letter codes. So, we are going to select “New
String Variable” from the toolbar.
Figure 4.3.1 Variable Properties of “String Variable” [Follow the steps in black circle.]
Notes:
- “Legal values” means valid inputs. In the case of categorical data, this just means values
and value labels. We will set this up next.
- “Entry mode” determines whether you must input a value or not. In “Default” mode, you can
either input a value or skip to the next variable without giving one. In “Must Enter” mode, you
must specify a value, and in “No Enter” mode, the entry field is not active and data
entry is not possible. This is known as a “no-enter” field. It is commonly used for derived
variables, which are filled with values computed from other fields.
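The three entry modes can be summarized as a small decision rule. The Python sketch below is only a way to restate the note above; the names EntryMode and accepts are hypothetical and do not come from EpiData.

```python
from enum import Enum


class EntryMode(Enum):
    DEFAULT = "Default"
    MUST_ENTER = "Must Enter"
    NO_ENTER = "No Enter"


def accepts(mode, typed):
    """Return True if the keyed text is allowed ("" means the field was skipped)."""
    if mode is EntryMode.NO_ENTER:
        return typed == ""       # nothing may be keyed into a no-enter field
    if mode is EntryMode.MUST_ENTER:
        return typed != ""       # skipping is not allowed
    return True                  # Default: a value or a skip are both fine


print(accepts(EntryMode.DEFAULT, ""))
print(accepts(EntryMode.MUST_ENTER, ""))
print(accepts(EntryMode.NO_ENTER, "ABC"))
```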
We have not yet defined the values and value labels for “facility”. To do this, open
the “Variable Properties” window by right-clicking on the variable and choosing “Edit”,
or by pressing the “Enter” key.
Figure 4.3.2 Adding values and value labels for categorical variable
[Follow the steps in black circle.]
Now the last thing to do for a categorical variable is to turn on the picklist for data
entry, as shown in Figure 4.3.3. As the name suggests, it shows all available sub-categories
for users to choose from. As before, open the “Variable Properties” window. Go to the
“Extended” tab and tick “Always show picklist during entry”. (Figure 4.3.4)
DMY variable
The next variable is date of referral. This is a DMY type. Even though we specified its
length as 10 characters in our codebook, there is no need to specify it here. So, we are going to select
As the last step, open the “Note” tab and enter a note “Enter 01/01/1900 if Missing.” as
shown in Figure 4.3.6.
Task 4.3.1. Create the next variable, “Name of patient”, using a string variable.
As the last step, open the “Note” tab and enter a note “Enter 99 if Missing.”
Memo variable
The next variable is Complete Address. This is a long string type. So, we are going to
4.4 Alignment
If you completed task 4.3.3, you should have a dataform similar to Figure 4.4.1. The
fields in the figure are misaligned and messy, which makes efficient data entry
difficult. One way to organize our fields is to align them on the right side.
The Alignment tool in the toolbar has four main alignment functions. You can
explore them to get familiar. For now, we will select all entry fields (not including headings),
right-align them and set a fixed vertical distance of 10 (pixels).
Note:
For Windows users, do not include the “Memo” variable in the alignment, because keeping a fixed
(equal) distance distorts the height of the Memo box. (This had not yet been fixed at the time of
writing this book.) Aligning is not rocket science; it is quite easy, as EpiData provides an
auto-suggested alignment feature (red horizontal and/or vertical lines for alignment).
Figure 4.4.1 Fields in dataform “dsRequest” after right alignment of vertical fixed
distance at 10 pixels
4.5 Creating derived fields
Derived fields are variables that do not exist in a data source and are created from one
or more existing fields, even across different data sources (IBM Knowledge Center). In
EpiData, this is called a “Calculated field”. One commonly given example is deriving age
from date of birth.
Patient identifier
Keeping the next topic, “unique index”, in mind, we will add one more variable to the
dataform “dsRequest” we created earlier. As of now, no variable in our codebook
provides any uniqueness to the dataform, meaning that there can be duplicated records:
data staff may enter the same record twice and our database will still accept it.
To remedy this, we will create a variable to track the patient, namely “pid” for Patient
identifier. This will be an integer type with a length of 4 digits and no missing values allowed.
Task 4.5.1. Create a variable named “pid” with the information provided above. Turn on leading
zeros in the “Extended” tab. Place it between date of referral and name of patient.
Note: Leading zeros mean 0001 for 1 and 0023 for 23.
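For readers who like to see it in code, leading zeros are just a formatting rule applied to the number. A minimal Python sketch, assuming a 4-digit identifier as defined for “pid” above (the function name is made up for the example):

```python
def with_leading_zeros(pid, length=4):
    """Format an identifier with leading zeros, e.g. 23 becomes '0023'."""
    if not 0 <= pid < 10 ** length:
        raise ValueError(f"pid must fit in {length} digits")
    return f"{pid:0{length}d}"


print(with_leading_zeros(1))
print(with_leading_zeros(23))
```

The stored number is unchanged; only the way it is displayed gains the zeros.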
The next step in creating derived fields is figuring out where to implement the derivation.
It is quite simple and follows the data entry flow: it is implemented at the last variable
before the NO-ENTER variable. In our case, that is Patient identifier, as shown in Figure
4.5.4.
Open the “Variable Properties” of Patient identifier. If you forget how to do this,
recall Chapter 4.3.
Figure 4.5.4 Combine Fields in “Calculate” tab for derived field “uniqueID”
Notes
A NO-ENTER field cannot be set up as a key for a unique index. Hence, facility and
pid are used here instead of uniqueID.
No key field can be empty when a record is saved, because key fields are intrinsically MUST-ENTER.
After all key fields are keyed in, EpiData performs an implicit search,
so users do not need to make any effort to search for duplicates. When a matching key
is found, the user can either go to that record or edit the values to create a different
index value.
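The behaviour of a unique index can be pictured as a lookup table keyed on the key fields. The Python sketch below illustrates the idea only: the class KeyIndex is hypothetical, and the way unique_id joins facility and pid (code followed by a 4-digit number) is an assumption, since the exact format depends on what you set up in the “Calculate” tab.

```python
def unique_id(facility, pid):
    """Derived field: combine the two key fields (the format is an assumption)."""
    return f"{facility}{pid:04d}"


class KeyIndex:
    """Minimal unique index over the key fields facility and pid."""

    def __init__(self):
        self._rows = {}

    def save(self, record):
        """Store the record, or return the existing record with the same key."""
        facility, pid = record.get("facility"), record.get("pid")
        if not facility or pid is None:
            raise ValueError("key fields are must-enter and cannot be empty")
        key = (facility, pid)
        if key in self._rows:
            return self._rows[key]          # duplicate found: nothing is stored
        self._rows[key] = dict(record, uniqueID=unique_id(facility, pid))
        return None


index = KeyIndex()
index.save({"facility": "ABC", "pid": 23, "ptName": "John"})
duplicate = index.save({"facility": "ABC", "pid": 23, "ptName": "Jon"})
print(duplicate["uniqueID"], duplicate["ptName"])
```

The second save finds the first record through its key, which is the same implicit search that spares users from hunting for duplicates themselves.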
[Figure: EntryClient window with the menu bar, toolbar, status bar, field in focus, key fields, mark for deletion or verification, and record controller labelled]
Another quick way to navigate your records is through “Goto Record” in the dropdown
menu “Goto”. However, in this EpiData version, the function does not seem to work. I hope
it will be fixed in the next release.
Two other options are List Records (Ctrl + L) and All Data (Ctrl +
D). List Records shows the current record and All Data displays all records. You can
double-click on the record you want in order to open it.
5.4 Printing records
Printing a dataform can be handy. In EpiData EntryClient, two printing options are
available. The first is to print the dataform without data (Shift + Ctrl + P) and the
second is to print it with data (Ctrl + P). You can also find these options in the dropdown
menu “File”.
5.5 Deleting records
EpiData is all about good-quality data, of which data security is an important aspect.
That’s why deleting a record from EpiData takes some care. You cannot just press the “Delete” key on
your keyboard. There is a special two-step process:
1. Mark the record(s) you want removed as “DEL” in EntryClient and save.
2. Pack the data in Manager.
After you mark the record as DEL, just move to the next or previous record. A
window will appear asking you to save the modified record, as shown in Figure 5.5.1. Save it
and close the project in EntryClient.
In the second step, open Manager, go to the dropdown menu “Tools” and choose “Pack
Datafiles”. Then choose form1.epx (change directory if required). A window listing all
available dataforms in the project (in this case, Form 1) will appear. Tick the dataform you
wish to pack, as shown in Figure 5.5.2. Then click “OK”.
Figure 5.5.2 Window prompt displaying all available dataforms in the project
As you can imagine, this is tedious and may not be practical if you want to delete hundreds
of records.
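The two-step logic (mark first, remove later) can be sketched as follows. This is only a model of the idea in Python; the "deleted" flag and the function names are assumptions and do not reflect how EpiData stores the DEL mark.

```python
def mark_deleted(records, position):
    """Step 1 (EntryClient): flag a record; it stays in the dataset."""
    records[position]["deleted"] = True


def pack(records):
    """Step 2 (Manager): return only the records that are not flagged."""
    return [r for r in records if not r.get("deleted", False)]


dataset = [{"pid": 1}, {"pid": 2}, {"pid": 3}]
mark_deleted(dataset, 1)
print(len(dataset))                      # marking alone removes nothing
packed = pack(dataset)
print([r["pid"] for r in packed])
```

Separating the two steps means a record marked by mistake can still be recovered until the data are packed.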
Note
When a project is opened either in Manager or EntryClient, a temporary .lock file is created
in the same directory, indicating that the file is in use. So, if you open a project in Manager,
you cannot open that project in EntryClient at the same time. EpiData does not allow it.
Task 5.5. Delete the two records we just entered by marking and packing datafiles.
The process goes like this. Let’s name the two persons A and B.
1. A reads the field value out loud to B.
2. B enters the value as heard.
3. B then repeats it to A to verbally check or confirm it.
The same process should take place for both A’s and B’s turns. Although it may take a lot
of effort, it may prevent certain transcription errors such as transposition or mistyping.
Task 6.1. Let’s try this process in our example project Form 1. Table 6.1 shows the data of 15
patients for whom sputum smear microscopy examination was requested. Gather two persons (A and B)
and enter these data twice using the process described above.
Figure 6.1.1 All Data display after data entry (Ctrl + D for Windows users)
Notes
In pid 4200, reason is blank while regNum is not. In practice, this case is not
uncommon. What one should do in this situation is cross-check the record against another
registry such as the TB register.
In the case of pid 808, regNum has a value although reason is only
diagnosis. This kind of mistake can also occur in the real world. It should be corrected at the
time of data entry.
Hence, data entry staff should also be trained about the data and its importance, as well as
common errors during entry and the instructions to follow in such cases.
Validation
After double entry, we will check whether either of the two data entry staff made any
mistakes. For the purpose of demonstration, let’s introduce some errors into B’s project
file, form1_double.epx.
• For pid 3307, change ptSex to Female.
• For pid 2480, change reason to Follow-up at month 4.
• For pid 4200, change ptName to Minn.
To validate the two files, open Manager. Go to Documents and choose Compare
Duplicate Files. Click on Add Files and select the two files: form1.epx and
form1_double.epx. (To select both files, press Shift and click on the files, or
add them one by one.) You should see something similar to Figure 6.1.2.
[Figure 6.1.2: Compare Duplicate Files window. Step 1: click to add files; step 2: choose both files. The file manager, dataform manager and field manager areas are labelled.]
The long panel at the top is where input files for validation are managed,
so let’s call it the file manager. The panel on the left below it manages dataforms (dataform
manager) and the one on the right manages fields (field manager).
There are a few options to explore in the field manager. The default display is the Join
by tab, which tells EpiData which fields to use as key fields to match the two
files. In our case, we have already defined two keys (facility and pid). As you may notice,
EpiData detects them automatically.
The next two tabs are Compare and Options. In the Compare tab, you can select the fields
you want to compare between the two files. In the Options tab, you can
• Exclude deleted records,
• Ignore case in text variables,
• Ignore missing records in duplicate file
• Add result variable
The last choice (add result variable) is a handy tool for data validation. It creates a new
variable of integer type with seven categories specific to validation. These categories are
shown in Figure 6.1.3.
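Conceptually, comparing duplicate files means joining the two entries on the key fields and listing every field that differs. The Python sketch below illustrates that idea with made-up records; it is not EpiData's algorithm, it does not reproduce the seven result categories, and the patient values are invented for the example.

```python
def compare(first, second, keys, ignore_case=False):
    """Join two lists of records on `keys` and list every differing field."""
    def index(records):
        return {tuple(r[k] for k in keys): r for r in records}

    def same(a, b):
        if ignore_case and isinstance(a, str) and isinstance(b, str):
            return a.lower() == b.lower()
        return a == b

    a_rows, b_rows = index(first), index(second)
    differences = []
    for key in sorted(a_rows.keys() & b_rows.keys()):
        for name, value in a_rows[key].items():
            other = b_rows[key].get(name)
            if name not in keys and not same(value, other):
                differences.append((key, name, value, other))
    only_in_first = sorted(a_rows.keys() - b_rows.keys())
    only_in_second = sorted(b_rows.keys() - a_rows.keys())
    return differences, only_in_first, only_in_second


# Invented example data: B's entry differs from A's in two places.
entry_a = [
    {"facility": "ABC", "pid": 3307, "ptSex": "Male", "ptName": "Aung"},
    {"facility": "ABC", "pid": 4200, "ptSex": "Female", "ptName": "Min"},
]
entry_b = [
    {"facility": "ABC", "pid": 3307, "ptSex": "Female", "ptName": "Aung"},
    {"facility": "ABC", "pid": 4200, "ptSex": "Female", "ptName": "Minn"},
]
for key, name, a_value, b_value in compare(entry_a, entry_b, ["facility", "pid"])[0]:
    print(key, name, a_value, b_value)
```

The comparison can only say that the two entries disagree; deciding which one is right still requires going back to the paper record.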
Before we generate the report, the last thing is to choose whether you want the report
in a text file or more formatted and stylish HTML file. Both options are fine, but I recommend
using HTML because of its stylish formatting and relatively better readability.
Figure 6.1.4 Report of data validation: information on (1) datafile structure, (2) dataform
structure and (3) validation report
Figure 6.1.5 Datasets comparison, the place to look at for correcting the dataset
So, we’ve got our report. What’s next? We have to thoroughly cross-check with our
paper-based records or other registries. Mark the records of the datasets with actual error.
Finalization
What we usually do at this stage is pick one file and make the corrections directly in that file.
That is not good practice. What we should do instead is copy one file and make the changes in
the copy. Picking a file is straightforward, but remember that if you pick the file
with fewer errors, the correction effort will also be smaller. The copied version should
then be named xxx_final.epx. In our case, we will name it form1_final.epx.
To enumerate the steps,
1. Save the report.
2. Print it out and put it beside your computer.
3. Copy and paste one of the two files.
4. Rename the copied file to xxx_final.epx.
5. Make necessary changes.
6. Save it.
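Steps 3 and 4 (copy, then rename) can of course be done in your file manager. If you prefer a script, the Python sketch below does the same thing and refuses to overwrite an existing final file. The function name make_final_copy is made up, and the demonstration uses a temporary folder with a dummy file rather than a real project.

```python
import shutil
import tempfile
from pathlib import Path


def make_final_copy(source):
    """Copy e.g. form1.epx to form1_final.epx, leaving the original untouched."""
    source = Path(source)
    target = source.with_name(f"{source.stem}_final{source.suffix}")
    if target.exists():
        raise FileExistsError(f"{target.name} already exists; will not overwrite")
    shutil.copy2(source, target)
    return target


# Demonstration in a temporary folder with a dummy file standing in for a project.
with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "form1.epx"
    original.write_text("dummy project content", encoding="utf-8")
    final = make_final_copy(original)
    print(final.name, original.exists())
```

Corrections are then made only in the copy, so both original entry files remain available as evidence of what was keyed in.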
There are two tabs: (1) Export (the main interface, which does not vary with the data
type), and (2) Options (additional settings that depend on the data type). Generally, this is
a very clean and intuitive interface. On the upper left, we can change (1) the data type, (2) the export
folder and (3) the exported filename.
On the upper right, there are four options:
(1) No Data (Structure only) – this is useful when you want an empty copy of the project. This
function can be an alternative way to prepare for double data entry.
(2) Include Deleted Records – when a record is marked for deletion, you can exclude it from
or include it in the export, since the record is not yet physically deleted.
(3) Create export report
(4) Export to single file – this is handy when exporting relational dataforms. This will
be discussed in the next chapter.
In the lower part of the window, you can choose dataforms on the left side. On the
right, you can select the variables you want under Export Variables and specify the range of
records to export under Dataform Options.
Stata
By default, EpiData selects the Stata 8, 9 data type for export, meaning that the
exported file is compatible with old versions of Stata. EpiData now supports Stata data
versions up to 14.0.
Second, you can convert the names of variables using one of three options:
UPPERCASE, lowercase or Leave as it is. This becomes very handy in the data analysis process.
Finally, you can choose whether or not to export value labels. Value labels are a feature used in
data analysis with Stata.
CSV File
There is not much to change here, except some options to change the separator symbols,
which is not recommended. You can also remove the heading with the variables’ names, which
is usually the first row, but again it is not recommended to change anything for this type.
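One reason to leave the defaults alone is that most tools expect exactly this layout: a comma separator and variable names in the first row. The Python sketch below reads such a file with the standard csv module; the sample content is invented and is not actual EpiData output.

```python
import csv
import io

# Invented sample of a CSV export: first row holds the variable names.
EXPORTED = "facility,pid,ptSex\nABC,1,1\nABC,23,2\n"


def read_export(text):
    """Read CSV text into a list of dictionaries keyed by variable name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_export(EXPORTED)
print(list(rows[0].keys()))
print(rows[1]["pid"], rows[1]["ptSex"])
```

If the heading row were removed, the first record would be mistaken for the variable names, and a non-standard separator would need to be declared by hand in every program that reads the file.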
SPSS
Statistical Package for the Social Sciences (SPSS) is a commonly used software package
for data management and statistical analysis. There are fewer options here: the only one is
whether or not to export Value Labels.
DDI
This will export data and meta-data in eXtensible Markup Language (XML). The result
is a text-based file that can store complicated data structures such as relational data. However,
there is ongoing debate on whether XML is the best option for the storage and retrieval
of data. Basically, it uses tags to identify data that has been stored in an organized way.
EpiData also uses its own grammar of tags to structure and store data, which is a
more advanced topic and will be discussed in a later chapter.
Even though several options are available to experiment with, it is best to use the default
options if you ever need the data in XML format.
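As a rough illustration of how tags identify data, the Python sketch below parses a tiny XML text with the standard library. The tag names (dataset, record, age, sex) are made up for this example and are not the tags used by DDI or by EpiData.

```python
import xml.etree.ElementTree as ET

# Invented XML; these tags are NOT the real DDI or EpiData tags.
xml_text = """
<dataset>
  <record id="1"><age>34</age><sex>1</sex></record>
  <record id="2"><age>27</age><sex>2</sex></record>
</dataset>
""".strip()

root = ET.fromstring(xml_text)

# Each value is found through the tag that surrounds it.
ages = [int(record.findtext("age")) for record in root.iter("record")]
record_ids = [record.get("id") for record in root.iter("record")]

print(ages)
print(record_ids)
```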
EPX
Finally, you can simply export the data as an EpiData project file. Since less is more,
having fewer options here can make the export more efficient.
Task 6.2. Export form1.epx using the EPX file type into two files: (1) form1_A.epx, which
will contain records 1 to 8, and (2) form1_B.epx, which will contain records 9 to 15.
Note: We will use these two files for the exercise in the next chapter, Appending Records.
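If it helps to picture the record ranges in this task, here is a small Python sketch. It only imitates the idea of selecting a range of records; it does not read .epx files, and the function name select_range is hypothetical.

```python
def select_range(records, first, last):
    """Return records numbered first..last, counting from 1 and inclusive."""
    return records[first - 1:last]

# Fifteen placeholder records standing in for the contents of form1.epx.
records = [f"record {n}" for n in range(1, 16)]

part_a = select_range(records, 1, 8)   # what form1_A.epx should contain
part_b = select_range(records, 9, 15)  # what form1_B.epx should contain

print(len(part_a), len(part_b))
```

Together the two parts should cover every record exactly once, which is a quick check you can also make on the exported files by comparing record counts.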
Technical Notes
EpiData XML File Format Specification (EPX) is a simplified XML data file structure
adapted specifically for EpiData in 2009. It was based on the Data Documentation Initiative
(DDI) format and the ODF standard. The purpose is to have a uniform way of saving and
documenting data, since alphabets, numbers and character sets vary substantially
across platforms (Linux, Mac, Windows). (EpiData Wiki)
The essential requirements in developing the format narrowed down to the following
points:
• Speed of data retrieval and writing
• Cross-platform compatibility
• Support of Unicode and other character sets across different countries
• Minimal drawbacks from general data format specification requirements
• Support for export and import functionality.
The details of how the XML schema works are beyond the scope of this book. To learn
more, read about the usage of XML Schemas (also known as .xsd files) on W3Schools and the
specification for XML schema files on the W3C website. The full documentation for EpiData’s
schema file can be found here, which is an autogenerated list of HTML pages produced with the
<oXygen/> editor.
To append, go to Tools and choose Append. Then choose the base file. In our case,
we choose form1_A.epx, which we prepared in the previous chapter.
You will now see the window shown in Figure 6.3.2.
Next, click Add Files in the window to add more files. Choose form1_B.epx for
our example. Then make sure both files are included by checking the Include box, as shown in
Figure 6.3.3. Select the fields you want in the lower part of the window and click OK.
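The idea behind these steps can be sketched in Python as follows. This is not how EpiData implements appending; it is only a simplified illustration in which records are dictionaries, and the function name append_records and the sample fields are hypothetical.

```python
def append_records(base, extra, fields):
    """Return the base records followed by the added ones, keeping only `fields`."""
    for source, rows in (("base", base), ("added", extra)):
        for row in rows:
            missing = [name for name in fields if name not in row]
            if missing:
                raise ValueError(f"{source} file lacks field(s): {missing}")
    return [{name: row[name] for name in fields} for row in base + extra]

# Invented records standing in for form1_A.epx (1 to 8) and form1_B.epx (9 to 15).
form1_a = [{"id": n, "age": 20 + n} for n in range(1, 9)]
form1_b = [{"id": n, "age": 20 + n} for n in range(9, 16)]

combined = append_records(form1_a, form1_b, fields=["id", "age"])
print(len(combined))
```

The sketch refuses to append when a selected field is missing from either file, which mirrors the reason for choosing fields carefully before clicking OK.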
You will now see a message from EpiData confirming that the append was successful.
Task 6.3. Append form1_A.epx to form1.epx and observe the warning message.
Note that the file format here is not the same as before: it is in .zky
format. The principle, however, is the same. In order to get the data inside the file, you need
EpiData Manager, or at least the password, to decrypt it. The former backup file type can be
opened in EpiData Manager or EntryClient without the need for a password. Let’s try to open
the file in Notepad. See Figure 6.5.2 for the gibberish contents of .zky and .epz files opened in
Notepad.
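The Python sketch below hints at why a compressed file looks like gibberish in a text editor. It assumes that an unencrypted archive behaves like an ordinary zip file, which is an assumption of this example rather than something stated by the EpiData documentation; the file content is a placeholder, and encrypted .zky files are not covered.

```python
import io
import zipfile

# Placeholder text standing in for a project file; not real EpiData content.
project_text = "<project>placeholder content</project>"

# Build a small zip archive in memory instead of touching real files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("form1.epx", project_text)
raw_bytes = buffer.getvalue()

# Viewed as raw bytes, only the zip signature "PK" is recognisable.
print(raw_bytes[:4])

# Unzipping returns the original readable text.
with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
    restored = archive.read("form1.epx").decode("utf-8")
print(restored == project_text)
```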
Now let’s try extracting the file to get our original data back. Before doing this, let’s
rename our current form1.epx to form1_ORIGINAL.epx. Go to Tools and choose Extract
Archive. As shown in Figure 6.5.3, choose form1.zky, check both Decrypt and Unzip, and
key in our notorious password, 1234. You may select a different destination folder if
you want. Click OK.
Now we have our data back from the archived file. An alternative is to open archived
files directly from EpiData, but this has some disadvantages and is generally not recommended.