Session 10: Batch Edit and Export
Session 10: Batch Edit and Export
Session 10: Batch Edit and Export
To create a batch edit application you choose FileNew from CSPro and choose Batch Edit Application.
You then choose a dictionary. This is usually the same dictionary that you used for data entry.
The user interface for working with batch applications is similar to the one for working with data entry
applications except that there are no forms. Instead there is a tab for edits. Just like in data entry, you
add logic to PROCS. Instead of running interactively, all the error messages are written out to a log file
for review after the whole program has run.
Next we run the application but first we need a test data file. You can use the file Popstan2020Raw.csdb.
Run the application against this test data. After the application runs, we see the log file. The log file
reports that we have a case where this error exists.
To figure out what the problem is we can open up the problem case in data entry. The printout in the
listing file contains the case identifiers which we can use to find the case. You can copy the case id from
the listing file and use it with Find Case on the Edit menu in CSEntry.
Correcting Errors
In addition to using batch edit to find errors you can also use it to correct problems by modifying
variables in your logic. Let’s simply cap the age at first marriage to never be greater than the age.
When we run this time we will specify an output file: Popstan2020Edited.csdat. The changes we make
will only be made to the output file. We can then rerun the batch application on the output file and
make sure that you don’t have any error messages.
Instead of just assigning the value of AGE_FIRST_MARRIAGE we can use the impute command which
does the assignment just like “=” but also generates a nice report showing the values that were
imputed.
The imputation report will be opened in TextViewer after you run the batch application but to see it you
will need to go to the Window menu and choose the file that ends in “.frq.lst”.
________________________________________________________________________________
Imputed Item AGE_AT_FISRT_MARRIAGE: Age at first marriage - all occurrences
_____________________________ _____________
Categories Frequency CumFreq % Cum %
_______________________________ _____________________________ _____________
40 1 1 100.0 100.0
_______________________________ _____________________________ _____________
TOTAL 1 1 100.0 100.0
Run the program and look at the imputation report to see how many orphans are in our data set.
For each item in the checkbox field that is checked we want to set the corresponding item in the
repeating field to one and for each entry in the checkbox field that is not checked we set the
corresponding item to two. To find out if an item is checked we use the function pos() which finds the
position of one string in another and returns zero if the string is not found:
This works but it is a lot of code when the number of options is long. We can simplify it using a loop and
a clever trick.
PROC CROPS_PRODUCED
ENUMERATI HOUSEHOLD
PROVINCE DISTRICT ON_AREA AREA TYPE _NUMBER F01 F02 F03
2 14 214 1 1 3 1 3
3 27 301 1 1 9 9 1
1 1 345 1 5 1 1 1
3 26 222 2 2 1 4 1
4 37 422 2 1 1 5 1
2 2 214 3 1 1 2 4
1 1 101 1 1 2 5 1
1 row = 1
B1_ B1_ B1_ B1_ B1_ B1_ B1_ B1_ B1_ B1_ B1_ B1_ B1_ B1_ B1_ B1_
household
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16
1 2
1 2 3
2
1 2 3 4
1 2 3 4
1 2 3
1 2 3 4 5 6 7
1 row = 1
person
The solution is to export each record in a separate file by selecting Multiple Files (one for each record
type) under Number of Files Created. Doing this generates two files: PERSON.csv which contains the
household members and DEATHS.csv which contains the deaths.
PERSON.csv
DEATHS.csv
ENUMERA HOUSEHO
TION_ARE AREA_TYP LD_NUMB
PROVINCE DISTRICT A E ER E3 E4 E5
2 14 101 1 1 1 Erwin 20140211
2 14 101 1 1 2 Carmine 20161201
2 27 200 1 1 1 Arnold 20150113
The households in these two files can be linked together based on the id items.
PERSON.CSV
DEATHS.CSV
ENUMER
PROVINC ATION_A AREA_TY HOUSEHOLD_N
E DISTRICT REA PE UMBER F03 F05 E3 E4
2 14 101 1 1 1 1 1 Erwin
2 14 101 1 1 2 1 2 Carmine
2 27 200 1 1 1 3 1 Arnold
It is not possible to join multiple records to other multiple records from export.
Note that in some cases it may be easier to do the export of the single and multiple records without
joining and then do the join in the software that you have imported the data into.
1. Modify the batch edit application to add a check for someone with relationship of spouse but
marital status that is not married. Print a message for each case found. This should be done in
the batch edit application NOT in the data entry application.
a. The total number of rooms (F01) is greater than the number of bedrooms (F02)
b. The type of main dwelling (F03) is consistent with the total numbers of each dwelling
type in F04. For example, if the main dwelling is detached house then the total number
of detached houses must not be zero.
3. Modify the batch edit application to convert the disabilities (B10) from alpha (used for
checkboxes) to a set of numeric yes/no variables. Create the new variables in the dictionary and
write logic to set the value of each of the new variables based on the letters.
4. Using the test data, export the household members along with the ID items to the package of
your choice (Excel, SPSS, Stata, SAS or R). Use that package to determine the total number of
people by sex (total, males, females) and also the number of people by sex for district 01.
5. Export the deaths record and the person record along with the id items into the package of your
choice. You should export all the records at once, not one by one. How many households have
more than one death? How many households have children under 5 years of age?