D2.5 Recommended Practices and Final Public Report on Pilots
DOI: 10.5281/zenodo.1172058
Grant Agreement Number: 620998
Project Title: European Archival Records and Knowledge Preservation
Release Date: 12th February 2018
Contributors
Name                      Affiliation
István Alföldi            National Archives of Hungary
István Réthy              National Archives of Hungary
Andrew Wilson             University of Brighton
Clive Billenness          University of Brighton
Anders Bo Nielsen         Danish National Archive
Phillip Mike Tømmerholt   Danish National Archive
Alex Thirifays            Danish National Archive
Hans Fredrik Berg         National Archives of Norway
Terje Pettersen-Dahl      National Archives of Norway
Arne-Kristian Groven      National Archives of Norway
Tarvo Kärberg             National Archives of Estonia
Karin Oolu                National Archives of Estonia
Raivo Ruusalepp           Estonian Business Archive
Ats Rand                  Estonian Business Archive
Gregor Završnik           National Archives of Slovenia
Boris Domajnko            National Archives of Slovenia
Joze Skofljanec           National Archives of Slovenia
Miguel Ferreira           Keep Solutions
Zoltán Lux                National Archives of Hungary
Mezei József              National Archives of Hungary
David Anderson            University of Brighton
Janet Anderson            University of Brighton
D2.5 Recommended Practices and Final Public Report on Pilots
Table of Contents
EXECUTIVE SUMMARY
PLANNING AND EXECUTING THE E-ARK PILOTS
  PILOT PLANNING IN THE DESCRIPTION OF WORK (DOW)
  PILOT PLANNING DURING THE PROJECT
  PILOT PREPARATION
  PILOT EXECUTION
  PILOT EVALUATION
OVERVIEW OF THE E-ARK PILOTS
    Full-scale pilots and OAIS process
    Full-scale pilots and E-ARK uses-cases
    Pilots using E-ARK tools and format specifications
PILOT REPORT
  PILOTS 1 - SIP CREATION ON RELATIONAL DATABASES
    Scenarios
    Execution report
    Changes to the original plans
    Feedback report
    Recommended practices and further recommendations
  PILOTS 2 - SIP CREATION AND INGEST OF RECORDS
    Scenarios
    Execution report
    Changes to the original plans
    Feedback report
    Recommended practices and further recommendations
  PILOTS 3 - SIP CREATION AND INGEST OF RECORDS
    Scenarios
    Execution report
    Changes to the original plans
    Feedback report
    Recommended practices and further recommendations
  PILOTS 4 - BUSINESS ARCHIVES
    Scenarios
    Execution report
    Changes to the original plans
    Feedback report
    Recommended practices and further recommendations
  PILOTS 5 - PRESERVATION AND ACCESS TO RECORDS WITH GEODATA
    Scenarios
    Execution report
    Changes to the original plans
    Feedback report
    Recommended practices and further recommendations
  PILOTS 6 - INTEGRATION BETWEEN A LIVE DOCUMENT MANAGEMENT SYSTEM AND DIGITAL ARCHIVING AND PRESERVATION SERVICE
    Scenarios
    Execution report
    Changes to the original plans
    Feedback report
    Recommended practices and further recommendations
  PILOTS 7 – ACCESS TO DATABASES
    Scenarios
    Execution report
    Changes to the original plans
    Feedback report
    Recommended practices and further recommendations
  EXTERNAL EVALUATIONS
PILOT EVALUATION
  PROJECT LEVEL PILOT SUCCESS EVALUATION
  PILOT AND SCENARIO LEVEL SUCCESS EVALUATION
REFERENCED DOCUMENTS
APPENDIX 1 – EXTRACT FROM E-ARK DOW
Executive Summary
E-ARK project
The goal of the European Archival Records and Knowledge Preservation (E-ARK) Project is to pilot archival services to
keep records authentic and usable based on current best practices. These will address the three main endeavours of
an archive – acquiring, preserving and enabling re-use of information. E-ARK will demonstrate the potential benefits
for public administrations, public agencies, public services, citizens and business by providing easy and efficient access
to the archived records.
The project brings together a core group of European national archives, four leading research institutions, three
providers of archiving software solutions and services, two government agencies, and two international membership
organisations that represent the communities who stand to benefit from the project: data owners/providers,
archives, software vendors and solution providers.
E-ARK will, over a three-year period, harmonise archival processes at a pan-European level supported by guidelines
and recommended practices that will cater for a range of data from different types of source including record
management systems and databases.
Work Package 2 (description from DoW)
The E-ARK General Model definition is a public deliverable of Work Package 2.
The overall objective of this work package is to ensure that the scenarios implemented at 7 identified pilot sites are
both realistic and relevant, that they bring together a meaningful subset at each site of the use cases in order to
establish a general model of the E-ARK service.
WP2 will
• Identify specific use cases that will each be implemented in at least one pilot scenario, covering:
  o Export from business systems
  o Creation of SIPs from unstructured and structured data
  o Execution of the complete SIP -> AIP -> DIP data-flow to support migration and submission/access scenarios
  o Existing use cases for access to content in physical and virtual reading rooms (with appropriate access
    controls) and as web-applications
  o Additional use cases that augment the main pilot programme including short “stretch tests” and 3rd party
    validation
• Identify and mitigate legal and regulatory constraints.
• Provide support and advice about the operational environment of the pilot sites to the teams in WP3-6 during
  the planning phase (which corresponds to their main cycles of iterative (agile) design and development).
• Support the teams working at the pilot site in the planning and deployment phase.
• Ensure smooth execution of the pilots.
• Document the recommended practices and lessons learned in the project knowledge base.
T2.4 Future pilot deployment (M25-M27)
The objective of this task is to finalize the pilots in harmony with D2.1.
The Electronic Archiving Service consists of a series of activities covered by software tools and manual workflow
steps. Some of these tools already exist, some are being developed by the E-ARK project, and many more will be
added by the digital preservation community in the future. The role of this task is to identify the
most relevant scenarios for the E-ARK Service and to define, for each scenario, which level of activity is needed to
bridge the gaps in the existing solutions (e.g. integration, software development, interface definition).
In order for the E-ARK service to demonstrate the functionality of the service built on D2.1 as fully as possible, the
pilot will be finalized around the 7 pilot sites. In order to plan ahead for the pilots, the project previously identified
three activity levels:
1. Full scale project pilot activities – implementation, by consortium members, of one or more scenarios at one or
more locations for a period of six months or longer. Members of DLM Forum and DPC will receive details of the pilot
implementation and be invited to participate as observers. There are seven full scale pilots.
2. Additional project pilot activities – implementation, by consortium members, of shorter ‘stretch’ pilots that extend
the scenarios or apply them in different contexts. This may include the participation of members of DLM Forum and
DPC who are not directly members of the E-ARK consortium.
3. External validation activities – implementation of project results by members of DLM Forum and DPC as part of an
extended ‘Beta’ program with limited involvement from consortium members.
The outcome of this task is the high-level requirement specification of the full scale pilots, and also the scenarios,
sites and requirements of the 2nd and 3rd level pilots.
T2.5 Support and execution of pilots. (M7-M33)
The task is concerned with the implementation of the pilots defined in D2.3. The Task Leader contributes to providing
an appropriate methodological framework for all pilots for specifying the input/output points and the uniform
principles applied in the different areas, such as source data management, user training, user documentation, interim
reports and the final reports. In this way the results of the pilot sites are comparable and can be reliably proven in this
E-ARK-service pilot. There are seven 6-month pilot sites running concurrently and these are defined in Section B3.2a,
Approach.
This document corresponds to the deliverable:
D2.5 Recommended practices and Final public report on Pilots
Arising from the experiences acquired during the 7 pilot deployments, this report describes the achievements and
results of the pilot activities over the entire three-year period with emphasis on the final year of the project. The
report lists the resources used and provides an evaluation of progress and final result against the project objectives
and milestones and documents the remaining problems. It summarises the recommendations and lessons learned
from each pilot and provides input for the overall final report of the project. This report will also be included in the
final, publishable project report [month 36].
Structure of this deliverable
This document summarizes pilot activities, achievements and best practice recommendations using the
following chapter structure:
Chapter 1 - This introductory chapter.
Chapter 2 - Planning and executing the E-ARK pilots
    Summary of all pilot related activities in the 3 years of the project, from planning to
    evaluation.
Chapter 3 - Pilot overview
    A brief overview of the full-scale and additional pilots.
Chapter 4 - Pilot report
    Summary of the pilot execution and results with recommended practices and further
    development recommendations. The chapter consists of the following sections for each
    full-scale pilot:
    • Pilot scenario details
    • Execution report
    • Changes to previous plans
    • Feedback report, and
    • Recommended practices and lessons learnt.
    Chapter 4 ends with an overview of the external evaluations performed by non-E-ARK member
    organizations.
Chapter 5 - Pilot evaluation
    Evaluation of the full-scale pilot against project objectives and success criteria.
Chapter 6 - Referenced documents and web pages
Appendix 1 – Extract from E-ARK Description of Work
Planning and executing the E-ARK pilots
This chapter summarizes all the pilot related activities of the E-ARK project. The seven full-scale pilots were already
quite well planned in the Description of Work (DoW) document when the actual project work started at the
beginning of 2014. From that point until the very end, Work Package 2 (WP2) focused on pilot planning and, later,
on execution and evaluation.
Phases of pilot related activities coordinated by WP2:
• Pilot planning in the Description of Work (DoW) document
  The starting point of our work was the pilot descriptions in the DoW.
• Pilot planning during the project
  In the first year our main goal was to define the use-cases and processes to serve as the basis of tool development
  and format specification. The first version of the E-ARK General Model defined the use-cases and processes along
  with cross-reference tables between E-ARK processes, tools, work packages, and pilots. After the publication of the
  E-ARK General Model, colleagues at the pilot sites developed part of the requirement specification of the E-ARK tools.
• Pilot preparation
• Pilot execution
• Pilot evaluation
This chapter is organized according to the above phases.
Pilot planning in the Description of Work (DoW)
The starting point of our work was the pilot descriptions in the DoW. The Description of Work (DoW) document
defines the pilot related tasks and the role of Work Package 2. Appendix 1 is an extract of the relevant part of the
E-ARK DoW.
Pilot planning during the project
Pilots were planned to take place in the third year of the project, when all tools and format specifications would be
ready to be tested, but pilot related activities started at the very beginning and accompanied the tool development
and format specification work throughout the project.
General Model 1.0
One of the first deliverables was the D2.1 E-ARK General Model of Use-cases and Processes. In the General Model we
defined the use-cases and processes which were the basis for further project activities like planning and development
of the E-ARK tools, and specification of E-ARK information package and content types.
The General Model was a joint work by the tool developers of the partner IT companies and archivists from the pilot
sites. Along with the use-case definition we tried to reach a common understanding of the project. At that point – at
the very beginning of the work – every partner had some ideas about their own goals and tasks but hardly anyone
could see what the other partners would provide to the project. We found that an overall bird's-eye approach
would help people better see their place among the various activities planned, so we included some cross-reference
tables in the General Model as well. The cross-reference tables present relations between the different
project activities and products like work packages, tools, formats, and pilots.
[Table: Use Case View of the General Model. For each use case, the table marks (with x or ?) the work packages
involved (WP3, WP4, WP5, WP6) and the pilots involved (Pilot 1 (DNA), Pilot 2 (NAN), Pilot 3 (NAE), Pilot 4 (EBA),
Pilot 5 (NAS), Pilot 6 (KEEP), Pilot 7 (NAH)), and names the E-ARK tools concerned.]
Pre-Ingest use cases:
GM-PI-1 Define SIP content
GM-PI-2 Select data (with rules)
GM-PI-3 Select data (manual)
GM-PI-4 Extract data from DB
GM-PI-5 Extract data from DMS/RMS
GM-PI-6 Create SIP
GM-PI-7 Start transfer to archive
GM-PI-8 SIP reception
GM-PI-9 Validate SIP
GM-PI-10 Manipulate SIP
GM-PI-11 Create fond(s)
GM-PI-12 Start generating E-ARK SIP
Ingest use cases:
GM-I-1 Upload SIP
GM-I-2 Start AIP generation workflow
GM-I-3 Validate AIP
GM-I-4 Start AIP finalization workflow
Tools named in the table: DBExport tool; ESSArch Tools; Noark; Alfresco; RODA; SIP creation tools; RODA-in; UAM;
SIP to AIP conversion tools; SDB; Preservica; EPP; AIS.
The General Model helped us better understand every partner’s planned contribution to the overall objectives and
gave us a better picture of the whole project. As a result of this common approach the pilot representatives at the
meetings tried to think ahead about what they really needed and wanted to try out later in the third year, and tried to
gently lead tool developers towards solutions which better suited their demands.
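The cross-reference idea can be sketched in a few lines of Python. This is only an illustration, not tooling from the General Model: the use-case IDs and names are taken from the table, while the pilot names ("Pilot A", "Pilot B") and the marks linking them to use cases are hypothetical placeholders. The sketch shows how a table kept per use case can be inverted to answer the question each pilot site asked, namely which use cases it takes part in.

```python
from collections import defaultdict
from typing import Dict, Set

# Use-case IDs and names as listed in the General Model table.
USE_CASES: Dict[str, str] = {
    "GM-PI-4": "Extract data from DB",
    "GM-PI-6": "Create SIP",
    "GM-I-1": "Upload SIP",
}

# HYPOTHETICAL marks: "Pilot A" and "Pilot B" are placeholder names and these
# links are invented for the example; they are not the marks of the real table.
USE_CASE_TO_PILOTS: Dict[str, Set[str]] = {
    "GM-PI-4": {"Pilot A", "Pilot B"},
    "GM-PI-6": {"Pilot A"},
    "GM-I-1": {"Pilot B"},
}


def invert(matrix: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn a use-case -> pilots mapping into a pilot -> use-cases mapping."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for use_case, pilots in matrix.items():
        for pilot in pilots:
            inverted[pilot].add(use_case)
    return dict(inverted)


pilot_to_use_cases = invert(USE_CASE_TO_PILOTS)

# List the use cases of one pilot together with their names.
for use_case_id in sorted(pilot_to_use_cases["Pilot A"]):
    print(use_case_id, USE_CASES[use_case_id])
```

The same inversion works for the other axes of the table (work packages, tools), since each is just another mapping keyed by use-case ID.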
Requirement specification
After completing the General Model, the pilot site members took part in the next project phase, the requirement
specification work. On the basis of the General Model (and the discussions about it) they could articulate their
requirements better at the requirement specification meetings of the technical work packages (WP3-6). The results
of this work were the requirement specifications of the pre-ingest, ingest and access tools, along with the E-ARK
information package (SIP, AIP, DIP) and content type (SIARD 2.0, SMURF) specifications.
Tool development and format specification
Cooperation between archivists of the pilot sites and tool/specification developers continued during the development
and specification phase, keeping the pilots in mind.
Changes to the planned pilot activities
In this phase no major differences from the plans written in the Description of Work were identified.
Pilot preparation
Actual pilot preparation work started in the second year. WP2 and the pilot sites wanted to make sure that the tools
being developed and format specifications being defined were in line with their planned piloting activities. Therefore
we started to define the pilots very early.
Early pilot preparation works
At the 2015 Portsmouth and Lisbon meetings we held pilot preparation sessions. We agreed on the organization of
preparation activities and a schedule. In the summer of 2015 the structure of the pilot definition document was also
approved by project members.
Pilot Cards
In order to promote early understanding of the pilot activities and requirements and to provide a quick overview at a
central point of information, we developed the Pilot Cards. Pilot Cards were the first formalized appearance of the
pilot activities in the E-ARK community.
The Pilot Cards provide an overview of the pilot including scope and objective, contact information for the pilot
leader and contributors, OAIS relevance, usage of E-ARK tools and information packages, as well as status information
about the definition, installation and execution. Pilot Cards can also serve as a central information point for the
E-ARK community to find detailed pilot descriptions and corresponding documents.
Pilot Card example
Pilot #1: SIP Creation on relational databases
Task leader: Danish National Archives
Supported by: Magenta
Scope: The scope of this Pilot is to test the E-ARK SIP Creation tool with not less than 4 databases of different sizes
and complexities (one contains several million records).
Object: Creating SIPs for relational databases using the tool created in WP3, T3.3: SIP Creation Tools, for
further evaluation.
Short description: The goal of the pilot is to create SIPs in E-ARK SIP format of each selected database with the
DBextract tool. After quality assurance on each SIP, feedback will be given to WP3.
Timeframe: M28-M33
Preconditions: M03.3, M03.4 (DoW)
Contacts: Phillip Mike Tømmerholt (Contact Person), Anders Bo Nielsen (Contact Person);
Skype: philliptommerholt_rigsarkivet
Status fields: Pilot defined; Installation started; Installation ready; Pilot execution started; Pilot execution completed
OAIS relevance: Pre-Ingest; Ingest; Data Management; Preservation; Access
Information packages: E-ARK SIP; E-ARK AIP; E-ARK DIP
E-ARK tools listed on the card: ESSArch Tool Producer (ETP); RODA-In; Database Preservation Toolkit; Alfresco Export
Module; UAM; ESSArch Tools Archive (ETA); SIP creator (E-ARK Web); SIP2AIP (E-ARK Web); RODA Repository;
Catalogue (E-ARK Web); HDFS-Storage; ESSArch Preservation Platform; Lily - Ingest; OMT - Order Management Tool;
Order Submission Service; OMT - Search and Display GUI; E-ARK Web (Search); AIP2DIP (E-ARK Web); DBPTK; IP Viewer;
DB Viewer (Sofia); ERMS Viewer (Alfresco); Single file Viewer; QGIS; Geoserver; Peripleo; Oracle (OLAP Viewer);
CMIS portal/viewer
Pilot Scenarios:
Scenario 1: Extracting records from database (Data Set 1) - database with no documents
Scenario 2: Extracting records from database (Data Set 2) - database with no documents (large)
Scenario 3: Extracting records from database (Data Set 3) - database with documents
Scenario 4: Extracting records from database (Data Set 4) - database with documents (large)
Links: Process and use case information; Pilot definition; Test data specification; Pilot documentation
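A Pilot Card is essentially a small structured record with a status that advances through fixed milestones. The following Python sketch models it under stated assumptions: the class and field names are invented for illustration, the milestone order (defined, installation started, installation ready, execution started, execution completed) is assumed from the status labels on the card, and the status value given to the example card is hypothetical rather than read from the original.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class PilotStatus(IntEnum):
    """Status fields of a Pilot Card, in the milestone order assumed here."""
    PILOT_DEFINED = 1
    INSTALLATION_STARTED = 2
    INSTALLATION_READY = 3
    EXECUTION_STARTED = 4
    EXECUTION_COMPLETED = 5


@dataclass
class PilotCard:
    number: int
    title: str
    task_leader: str
    supported_by: str
    timeframe: str            # project months, e.g. "M28-M33"
    scenarios: List[str]
    status: PilotStatus

    def has_reached(self, milestone: PilotStatus) -> bool:
        """True if the pilot is at or beyond the given milestone."""
        return self.status >= milestone


card = PilotCard(
    number=1,
    title="SIP Creation on relational databases",
    task_leader="Danish National Archives",
    supported_by="Magenta",
    timeframe="M28-M33",
    scenarios=[
        "Extracting records from database (Data Set 1) - database with no documents",
        "Extracting records from database (Data Set 2) - database with no documents (large)",
        "Extracting records from database (Data Set 3) - database with documents",
        "Extracting records from database (Data Set 4) - database with documents (large)",
    ],
    status=PilotStatus.INSTALLATION_STARTED,  # hypothetical value for the example
)

print(card.has_reached(PilotStatus.PILOT_DEFINED))
print(card.has_reached(PilotStatus.EXECUTION_STARTED))
```

Using an ordered enumeration keeps the "how far along is this pilot" question a simple comparison, which suits a card meant to give a quick overview.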
Pilots Definition
In the autumn of 2015 we had the first draft of the document D2.3 Detailed pilot requirements. The most important
part of this document was the “Pilot Definition”.
Pilot definitions came in the form of Excel files and defined the pilot scenarios in detail. The sheets of the Excel file
are:
• Overview
• Scenario description
• Data description
• Pilot preparation checklist
• Step-by-step process description sheets for Pre-Ingest, Ingest and Access processes
The logical structure of the Pilot Definition description:
Pilot
    Scenario
        Business use-case (from General Model)
        Used Information package types
        Used E-ARK tools
    Data Set description
        Content description
        Metadata description
    Pilot preparation description and status information
    Process description
        Process step and low-level use-case (from General Model)
        Used E-ARK and local tools
        Preliminaries and start condition
        Input/output description
        E-ARK (and local) tools usage details
The scenarios, data and tool usage along with pilot preparation and step-by-step process activities are defined in
detail in the Pilot Definition Excel documents. The final version of the Pilot Definition Excel file of each pilot is part of
the deliverable D2.4 Pilot Documentation.
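The logical structure of a Pilot Definition can be sketched as nested record types. The Python below is an illustration only: the pilots kept their definitions in Excel files, the class and field names are invented here, and the nesting (data sets, preparation status and process steps attached to the pilot rather than to a scenario) is an assumption. The example instance borrows names from Pilot 5, but its values are abridged and its process step is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scenario:
    name: str
    business_use_case: str      # business use-case from the General Model
    package_types: List[str]    # information package types used
    tools: List[str]            # E-ARK tools used


@dataclass
class DataSet:
    content: str                # content description
    metadata: str               # metadata description


@dataclass
class ProcessStep:
    use_case: str               # low-level use-case from the General Model
    tools: List[str]            # E-ARK and local tools used in this step
    start_condition: str        # preliminaries and start condition
    inputs: List[str]
    outputs: List[str]


@dataclass
class PilotDefinition:
    pilot: str
    scenarios: List[Scenario] = field(default_factory=list)
    data_sets: List[DataSet] = field(default_factory=list)
    preparation_status: str = ""
    process: List[ProcessStep] = field(default_factory=list)

    def tools_used(self) -> List[str]:
        """Every tool named in a scenario or process step, sorted, no duplicates."""
        names = set()
        for scenario in self.scenarios:
            names.update(scenario.tools)
        for step in self.process:
            names.update(step.tools)
        return sorted(names)


# Illustrative instance; the process step and its tool selection are hypothetical.
pilot5 = PilotDefinition(
    pilot="Pilot 5 - Preservation and access to records with geodata",
    scenarios=[
        Scenario(
            name="Search and Access information using Geodata",
            business_use_case="Access geodata via QGIS",
            package_types=["E-ARK DIP"],
            tools=["QGIS"],
        )
    ],
    data_sets=[
        DataSet(
            content="Records and metadata of Natura 2000 areas, exported from ARSO",
            metadata="ISO 19115 (or INSPIRE)",
        )
    ],
    preparation_status="defined",
    process=[
        ProcessStep(
            use_case="Present geodata from the DIP",
            tools=["QGIS", "AIP2DIP (E-ARK Web)"],
            start_condition="DIP has been created from the AIP",
            inputs=["E-ARK DIP"],
            outputs=["Map view in QGIS"],
        )
    ],
)

print(pilot5.tools_used())
```

A typed structure like this makes it straightforward to derive overviews, such as the list of tools a pilot site has to install, from the detailed definition.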
Detailed Pilot Requirements
Besides the pilot definition Excel files, the D2.3 Detailed Pilot Requirements document defined the following
requirement types:
• Schedule
• Success criteria
• Support requirements
  Requirements for tool developers in regard to supporting pilot preparation and execution activities
• Feedback requirements
  Requirements for pilot staff members about how to provide feedback on tools and format specifications
• Documentation requirements
Here are some example pages of the pilot definition from the deliverable D2.3 Detailed Pilot Requirements:
Pilot 5: Preservation and access to records with geodata
Task leader: National Archives of Slovenia
Supported by: Danish National Archives
Scope: Pilot will prove that the SIP and DIP implementations fulfill specific requirements for the records containing GIS
data, test the instructions (for the producer and for the archive) regarding all phases of ingest, to prove that the
archival use of GIS data is possible (via open data method, direct access in the archives and use GIS data as search
criteria in the DIP contents).
Object: Pilot report with recommendations about urgent improvements and possible future improvements; support for
WP6 & WP7 setting up the work environment of selected E-ARK archival tools; provide real life examples how the
project deliverables can be used.
Short description: During the E-ARK project the standardized method for ingesting geodata will be developed. This
will allow the archives to offer geodata as a selection and display criteria of records by means of integration of
current state of the art tools.
Timeframe: M25-M27: setting up the pilot sites; M28-M31: running the pilots; M32-M33: testing and reporting
Preconditions: M03.3, M03.4, M04.2, M05.4, M05.6 (DoW)
Contacts: Gregor Završnik (Contact Person), Alenka Starman (Pilot staff member), Joze Skofljanec (Pilot staff member);
Skype: gregor.zavrsnik
OAIS Relevance: Pre-Ingest; Ingest - Storage; Storage - Access
Information packages: E-ARK SIP; E-ARK AIP; E-ARK DIP
E-ARK Tools listed on the card: SIP creator (E-ARK Web); SIP2AIP (E-ARK Web); Database Preservation Toolkit;
Alfresco Export Module; RODA-In; ESSArch Tool Producer (ETP); UAM; ESSArch Tools Archive (ETA); RODA Repository;
HDFS-Storage; ICA-AtoM Catalogue; ESSArch Preservation Platform; Lily - Ingest; OMT - Order Management Tool;
Order Submission Service; OMT - Search and Display GUI; E-ARK Web (Search); AIP2DIP (E-ARK Web); DBPTK; IP Viewer;
DB Viewer (Sofia); ERMS Viewer (Alfresco); Single file Viewer; QGIS; Geoserver; Peripleo; Oracle (OLAP Viewer)
Search and Access information using Geadota
Create DIP from AIP containing record with Geodata. Present Geodata information with QGIS along with
content and metadata from DIP.
A data object containing geodata can be identified by using search criteria as specified by the E-ARK Tool
requirement specification. Data objects are selected and an order is issued. The DIP is prepared according
to the order specification and the end user's credentials. The DIP file structure with file descriptions (MIME type, short
description) is presented to the end user. Geodata from the order can be accessed in the designated viewer
(QGIS). The user checks the authenticity of the DIP by accessing the PREMIS documentation. Access to the DIP is
documented and the captured metadata can be exported.
Use-case
Note
Storage - Access
Access geodata via QGIS
Access records with Geodata and present geodata with QGIS
Pilot 5
Pilot Data
Information Packages (IP)
E-ARK SIP: x. Note: Focusing on Geodata preservation
non E-ARK SIP:
E-ARK AIP: x. Note: Focusing on Geodata preservation
non E-ARK AIP:
E-ARK DIP: x. Note: Focusing on Geodata access
non E-ARK DIP:
Pilot data description
Data Set 1: Records and metadata of administrative units until 1994 exported from GURS (The Surveying and Mapping Authority of the Republic of Slovenia)
Description: Records and metadata of maps with Geodata
Data type: GML document with metadata in XML format, ESRI Shapefile, csv
Metadata format: ISO 19115 (INSPIRE)
Quantity: 62 records (approx. 3 MB)
Data Set 2: Records and metadata of Natura 2000 areas, exported from ARSO
Description: Records and metadata of maps with Geodata
Data type: GML document with metadata in XML format
Metadata format: ISO 19115 (or INSPIRE)
Quantity: 1209 records (approx. 10 MB)
[Table: OAIS relevance (Pre-Ingest, Ingest - Storage, Storage - Access) against E-ARK information packages and E-ARK tools for the scenario; the "X" selections are not recoverable from the extracted text.]
Scenario 2
Description
OAIS Process
Pre-Ingest
Main Process Steps
Scenario 1
Content definition
Technical feasibility
Legal issues etc.
Create/Review transfer
agreement
Select data
Manual compilation of
non ERMS content
Data Extraction
Metadata mapping
QGIS
Used local tools
Existing archival system
Producers tools
Producer tools, open
conversion tools
MS Excel, Inspire Metadata
Creator
Producer
Producer
Producer
Producer
Preliminaries and Start condition
Producer + Archivist +
Technical Specialist
Official archival records
definition
Input
Official archival records
definition and technical
documentation
Submission Agreement
Submission Agreement
Output
Submission Agreement
Data selection list
Extracted data
Submission Agreement
Additional Data and
documentation
INSPIRE.xml, Submission
Agreement, MS Excel
template for EAD
conversion
Inspire.xml, MS excel w.
metadata
Existing system
Producers tools
Producer tools
Producer GIS system, MS
Excel
Producer
Producer
Producer
Producer
Submission Agreement
Additional Data and
documentation
INSPIRE.xml, Submission
Agreement, MS Excel
template for EAD
conversion
Inspire.xml, MS excel w.
metadata
Performer (actor)
ESS Arch ETP
ESS Arch ETP
Producer
Producer
Extracted data
Additional Data and
documentation
Inspire.xml, MS excel w.
metadata
Submission Agreement, SIP
E-ARK SIP
Submitted SIP
ESS Arch ETP
ESS Arch ETP
Producer
Producer
SIP Creation and Ingest of geodata in GML format
Used E-ARK tool
Used local tools
Preliminaries and Start condition
Producer + Archivist +
Technical Specialist
Official archival records
definition
Input
Official archival records
definition and technical
documentation
Submission Agreement
Submission Agreement
Output
Submission Agreement
Data selection list
Extracted data
Performer (actor)
Pilot 5
Submit SIP
SIP Creation and Ingest of geodata in GML format
Used E-ARK tool
Scenario 3
Post-packaging quality
control
Create SIP
Extracted data
Additional Data and
documentation
Inspire.xml, MS excel w.
metadata
Submission Agreement, SIP
E-ARK SIP
Submitted SIP
Pilot Preparation
Preparation status: Software component
Preparation tasks related to the software components
Columns (with source or allowed values): Software component | Tool / Version number (from Software Component Matrix, for E-ARK tools) | Scenario (from Scenarios sheet) | Process (from Processes sheets) | Tool selected (Yes / No / (issue)) | Tool available for Pilot (Yes / (planned date of availability)) | Tool/Version installation (Installed / (issues)) | Tool configuration (No needed / Configured / (issues))
Component 1. | ESSArchive ETP | Scenario 1, 3 | Pre-ingest | Yes | Yes | Not installed | Need support from ESS
Component 2. | ESSArchive ETA | Scenario 1, 3 | Ingest | Yes | Yes | Not installed | Need support from ESS
Component 3. | ESSArchive EPP | Scenario 1, 3 | Ingest (Access?) | Yes | Yes | Not installed | Need support from ESS
Component 4. | Integrated Platform (E-ARK Web) | Scenario 1, 2, 3, 4 | Ingest, Access | Yes | No | Not installed | Need support from AIT
Component 5. | QGIS | Scenario 1, 2, 3, 4 | Pre-Ingest, Ingest, Access | Yes | Yes | Installed | None needed
Component 6. | Inspire metadata editor | Scenario 1 | Pre-ingest | Yes | Yes | Online | None needed
Component 7. | EAD metadata editor | Scenario 1, 3 | Ingest | No | No | Not installed | Need support from ESS
Component 8. | Search and display GUI | Scenario 2, 4 | Access | No | No | Not installed | Need support from AIT
Component 9. | Peripleo | Scenario 1, 2, 3, 4 | Ingest, Access | Yes | Yes / in 2/2 April | Not installed | Need support from AIT
Component 10. | OMT | Scenario 2, 4 | Access | No | No | Not installed | Need support from Magenta
Component 11. | Archival Catalogue (EAD based) | Scenario 1, 2, 3, 4 | Ingest, Access | No | No | Not installed | Need input
Component 12. | Lily | Scenario 1, 2, 3, 4 | Ingest, Access | Yes | Yes / in 2/2 April | Not installed | Need support from AIT
Component 13. | Geoserver | Scenario 2, 4 | Access | Yes | Yes | Installed | None needed
Knowledge overtaken (Yes / (issues)) and Tool ready for Pilot (Ready / (issues)), values as extracted:
Basic training completed No, local installation needed
Basic training completed No, local installation needed
Training in progress
EAD Support, Some validation features
Training required
???
Yes
Yes
Yes
Yes
Further knowledge transfer re
???
Further knowledge transfer re
NO
Further knowledge transfer re
Yes
Further knowledge transfer re
NO
Further knowledge transfer re
NO
Further knowledge transfer re
Yes
Yes
Yes
Preparation status: Pilot data
Preparation tasks related to pilot data (from Pilot Data sheet)
Columns (with source or allowed values): Pilot dataset | Dataset # | Scenario (from Scenarios sheet) | Data selected (Yes / (issues)) | Legal issues (None / (issue)) | Data available (Yes / (planned date) / (issue)) | Dataset ready for Pilot (Ready / (issue))
Slovenian Register of spatial units selected | Data set 1 | 1, 2 | Yes | None | Yes | Yes
Natura 2000 dataset | Data set 2 | 3, 4 | Yes | None | Yes | Yes
…
Preparation status: Infrastructure
Preparation tasks related to pilot infrastructure
Columns (with source or allowed values): Infrastructure element | Scenario (from Scenarios sheet) | Process (from Processes sheets) | Element selected (Yes / (issues)) | Issues (None / (issue)) | Element ready for Pilot (Ready / (issue))
Virtual server - Linux
For details please examine the complete D.2.3 Detailed Pilot Requirements document here:
http://eark-project.com/resources/project-deliverables/60-23pilotsspec
Weekly pilots meeting
From the beginning of 2016 weekly progress meetings were held via a Webex teleconference service. The pilot
representatives and staff members along with technical work package leads and some of the tool developers were
regular members of these meetings.
Changes to the planned pilot activities
Only minor changes were necessary in this phase. Some of the data providers were not ready with the planned input
data, so the archives needed to arrange different data sets. Some tools were not completed in accordance with the
original timetable, so we rescheduled some of the scenarios, but nothing fundamentally threatened the successful
execution of the pilots.
General Model 2.0
The creation of the General Model was originally planned as a one-time activity to provide the foundation for
tool development and format specification. No goals or requirements in the DoW called for any further
development work. However, after seeing how important a role it played in the common understanding of the various
goals and approaches of the E-ARK community, we decided to update the General Model and keep it
alive as a reference for the most important E-ARK elements such as tools, formats, use-cases and pilots.
Version 2.0 of the model was first published as an online PowerPoint presentation, but we soon found that an HTML
version would be more suitable for both project members and the wider public, so an online HTML presentation
followed.
The General Model in its present form is a perfect starting point to get acquainted with the E-ARK project. It includes
a complete general reference to present the relationship among tools, use-cases, formats and pilots along with
thematic overview chapters with links to more detailed documents and corresponding web pages.
The latest version of the General Model can be found in the E-ARK Knowledge Base and is also accessible from the E-ARK project web site: http://eark-project.com/resources/general-model
Pilot execution
The execution of the full-scale pilots was planned for a 6-month period between months 28 and 33 (May to
October 2016). All technical and organizational arrangements were in place in April 2016. The full-scale pilots started
on 1 May 2016 as planned. Not every scenario was planned to start in May, but every pilot site started with some
scenarios in that month.
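The project-month numbering used throughout this report (M25, M28, M33 and so on) can be converted to calendar months with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the project start of February 2014 is an assumption inferred from the statement above that months 28 to 33 correspond to May to October 2016.

```python
from datetime import date

# Assumption: project month 1 is February 2014, inferred from the report's
# statement that project months 28-33 run from May to October 2016.
PROJECT_START = date(2014, 2, 1)


def project_month_to_date(month: int, start: date = PROJECT_START) -> date:
    """Return the first day of the calendar month for a 1-based project month."""
    if month < 1:
        raise ValueError("project months are numbered from 1")
    # Count months from January of the start year, then split into year/month.
    offset = (start.month - 1) + (month - 1)
    return date(start.year + offset // 12, offset % 12 + 1, 1)


if __name__ == "__main__":
    for m in (25, 28, 33):
        d = project_month_to_date(m)
        print(f"M{m}: {d.year}-{d.month:02d}")
```

Under the same assumption, the set-up period M25-M27 mentioned in the pilot definitions falls in February to April 2016.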
Software deployment
The software components for the first scenarios were all deployed and configured. Pilot staff members gained
preliminary knowledge of the tools from the user manuals and on-demand consultations with the developers. The
interrelationships among the tools were not clear enough, so the pilots using many tools (Pilots 5 and 7) tried to
assemble an appropriate tool portfolio to cover all the steps and transitions being tried.
Feedback about tools and format specifications
The pilots were required to give feedback about the deployment, installation, execution and documentation of the
E-ARK tools and about the format specifications. The developers managed the issues, wish lists and comments on the
GitHub sites of the products, while feedback to format specification providers and information on recommended
practices was collected in Excel files provided by WP2 on the project’s Google Drive.
Feedback lists
Feedback list | Description | Provided by
For tool developers
- Bug list | Bugs (issues) found during product execution | Developer on GitHub
- Wish list | Tool extension or modification demands | Developer on GitHub
- Comments list | Comments on tool functioning (anything worth informing the developer about) | Developer on GitHub
- Installation recommendations | Comments or recommendations about the installation process, install kits or installation documentation | WP2 on Google Drive
- Feedback on documentation | Comments or suggestions on tool documentation | WP2 on Google Drive
- Recommended practices | Experiences with tool execution and recommended practices | WP2 on Google Drive
For specification providers (SIP: WP3, AIP: WP4, DIP: WP5)
- General comments and wishes | Issues, comments or wishes related to the specific IP | WP2 on Google Drive
- Recommended practices | Experiences with IP implementation (structure, mapping, etc.) and recommended practices | WP2 on Google Drive
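The routing of feedback described in the table above can be expressed as a small lookup. This is a minimal sketch for illustration, not a tool used in the project: the names `Route`, `TOOL_FEEDBACK`, `SPEC_OWNERS` and `route_feedback` are hypothetical, and the short keys such as "bug" or "installation" are shorthand for the feedback lists named in the table.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    channel: str    # where the feedback list is kept
    recipient: str  # who is expected to act on the feedback


# Feedback for tool developers: channel per feedback list (from the table).
TOOL_FEEDBACK = {
    "bug": "GitHub",
    "wish": "GitHub",
    "comment": "GitHub",
    "installation": "Google Drive",
    "documentation": "Google Drive",
    "recommended practice": "Google Drive",
}

# Work packages responsible for each information package specification.
SPEC_OWNERS = {"SIP": "WP3", "AIP": "WP4", "DIP": "WP5"}


def route_feedback(kind: str, ip_type: Optional[str] = None) -> Route:
    """Return where a piece of feedback is recorded and who receives it.

    If ip_type is given, the feedback concerns a format specification and is
    collected in the WP2 spreadsheets; otherwise it concerns a tool.
    """
    if ip_type is not None:
        return Route("Google Drive", SPEC_OWNERS[ip_type])
    return Route(TOOL_FEEDBACK[kind], "tool developer")


if __name__ == "__main__":
    print(route_feedback("bug"))
    print(route_feedback("recommended practice", ip_type="AIP"))
```

An unknown feedback kind or IP type raises `KeyError`, which keeps the sketch strict about the categories the table actually lists.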
Early progress
As with all large-scale projects, progress was very slow at the beginning. We had to accept that only a part (and
probably the smaller part) of the archives’ work is the actual technical ingest or dissemination of the information. The
creation and approval of the formal submission agreements with the data providers took months in some cases.
Some tools (such as export modules and some interfaces) also needed adjustments for the specific data types they
were to process. This was a normal procedure, but it could only start after the formal agreement with the
provider of the data. In some cases (the Estonian and Portuguese pilots) this activity required input from a local developer
who was not part of the E-ARK project. We also have to admit that the first versions of the new or modified tools
had bugs or incompatibility issues with each other and with the format specifications. New requirements
emerged, too, because despite all the discussions and consultations the archivists’ knowledge of the tools and the
developers’ knowledge of archival work were initially incomplete.
Before execution started, we had intended many scenarios to be complete by mid-summer, but at the end of July
only one scenario had been completed.
Weekly pilots meeting
At the weekly pilots meetings every pilot representative reported on progress. We were able to discuss the issues
with the tool developers, find solutions to problems, or formulate questions to other project members who were not
present. The weekly pilots meeting continued until the end of the project.
Half-time report
At the end of the third month of the pilot, WP2 created a (project-internal) Half-time Report. It
summarized the progress of each scenario, with status and progress overview information, and listed the most
important issues.
Completing the scenarios
Then things sped up. The tool developers responded very quickly: soon after an issue had been recorded on
GitHub, it was possible to tell when the bug would be corrected or when the new requirement could be implemented.
Archivists gained a better understanding of the tools. All legal issues with the submission agreements were resolved, and at
the end of August and in September work normalized. Pre-ingest and ingest scenarios were close to reaching their goals
and almost all access scenarios could be started. Only two persistent issues slowed two scenarios, at the
Estonian and the Slovenian National Archives. These were due to the late development of the required versions of the
ERMS Export Module and the Order Management Tool.
By the end of October – except for the two scenarios – all the full-scale pilots were completed according to the
workplans. These two scenarios were also completed later in 2016.
Monthly reports
Pilot progress was tracked in Monthly Pilot Reports produced by the pilot sites at the end of each month. Each
report summarized the activities of the previous month, any issues and possible solutions, other comments and
recommended practices.
The monthly pilot report contained:
Scenario overview
Tools overview
IP feedback overview
Scenario details per scenario
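The scenario-level part of the monthly report can be pictured as a small record with a status, a progress percentage and a comment. The sketch below is an illustration only: the class and field names are hypothetical, the six status values are those used in the report template, and the consistency rules (0 % for "Not started", 100 % for "Completed", a reason required for "Delayed", "Pending" and "Aborted") are an assumption read off the template rather than a documented validation procedure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    NOT_STARTED = "Not started"
    STARTED = "Started"
    DELAYED = "Delayed"
    PENDING = "Pending"
    ABORTED = "Aborted"
    COMPLETED = "Completed"


# Statuses for which the template asks for a reason in the comment field.
NEEDS_REASON = {Status.DELAYED, Status.PENDING, Status.ABORTED}


@dataclass
class ScenarioEntry:
    title: str
    status: Status
    percent: int
    comment: str = ""

    def problems(self) -> List[str]:
        """Return a list of inconsistencies; an empty list means the entry is consistent."""
        issues = []
        if not 0 <= self.percent <= 100:
            issues.append("percent must be between 0 and 100")
        if self.status is Status.NOT_STARTED and self.percent != 0:
            issues.append("Not started scenarios must be at 0 %")
        if self.status is Status.COMPLETED and self.percent != 100:
            issues.append("Completed scenarios must be at 100 %")
        if self.status in NEEDS_REASON and not self.comment.strip():
            issues.append(f"{self.status.value} status requires a comment giving the reason")
        return issues


if __name__ == "__main__":
    entry = ScenarioEntry("Example scenario", Status.DELAYED, 40)
    print(entry.problems())
```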
Scenario Overview
Scenario | Started | Completed | Status | Comment
Number and Title of pilot scenario | date | date | 0 %, Not started |
Number and Title of pilot scenario | date | date | 0-100 %, Delayed | reason for delayed status or any important comments at scenario level
Number and Title of pilot scenario | date | date | 0-100 %, Started |
Number and Title of pilot scenario | date | date | 100 %, Completed |
Number and Title of pilot scenario | date | date | 0-100 %, Pending | reason for pending status or any important comments at scenario level
Number and Title of pilot scenario | date | date | 0-100 %, Aborted | reason for pending / aborted / delayed status or any important comments at process step level
Tools Overview
E-ARK Tool – Version | Issues (bugs, wishes, comments) | Experiences / Recommended practices
Tool name – version
Used in tasks: list of process steps (or tasks)
Data (input / output): Input: summary of input data; Output: summary of output data
Performance: Excellent / OK / Poor
Issues: issues that were entered in the bug list provided by the tool developers
Wishes: wishes that were entered in the wish list provided by the tool developers
Comments: comments that were entered in the comment list provided by the tool developers
Experiences and recommended practices: any info on tool execution that could be important to tool developers
Scenario execution
Scenario
1. SIP Creation and Ingest of old (not normalized) database in SIARD 2.0 format
Started
date
Completed
date
Status
Not started, Started, Delayed, Pending, Aborted, Completed
Comment
reason for Pending / Aborted / Delayed status or any important comments at
process step level
Pre-Ingest / Ingest / Access steps
Process step*
name of the process step from Pilot Definition excel
Started*
date
Completed*
date
Status*
status at the end of the reporting period (Not started, Delayed,
Started, Pending, Aborted, Completed)
Duration*
duration of the process (only for Completed tasks)
Comment*
reason for Pending / Aborted / Delayed status or any important
comments at process step level
Task*
name of the task within the process step (each task must have a
separate process step table, see sample on Pilot 7)
Used tools*
empty if detail fields are filled
or summary of tools if detail fields are empty (Manual, Local tool
name)
Tool
tool name
(indicate with “local” if the tool was not developed by E-ARK)
Version
(mandatory for E-ARK tools)
Input
input summary
Output
output summary
Performed by
task actor (e.g. Archivist, IT specialist, Technical administrator,
etc.)
Performance
any performance related info
Issues
all bugs, wishes, comments (that were entered in any of the lists
provided by the tool developer)
Experiences / Recommended
practices
any important info on tool execution
Data
empty or “None” or “Not relevant”
Input data*
empty if detail fields are filled
or summary of input data if detail fields are deleted
Description
input data description
Content type
type of content
Metadata format
format of the metadata
Volume
volume of input data
Data manipulation tasks
further data manipulation activities (if any)
Output data*
empty if detail fields are filled
or summary of output data if detail fields are deleted
Description
output data description
Content type
type of content
Metadata format
format of the metadata
Volume
volume of output data
Data manipulation tasks
further data manipulation activities (if any)
Internal data manipulation tasks
further task-internal data manipulation (if any)
Task description
description of the data manipulation activities
Input
internal input
Output
internal output
IP usage*
empty if detail fields are filled
or summary of IPs implemented if detail fields are deleted
IP type
SIP, AIP, DIP
(indicate with “local” if not E-ARK specification compliant)
Description
IP description (structure, content)
Mapping concerns
any important metadata mapping related info
Content concerns
any important content related info
IP related issues, comments
important information for WPs responsible for the IP specification
Data related issues, comments
issues/comments worth mentioning (but not tool or IP related)
Data management experiences and
best practices
any important info on data handling
Used resources*
empty or “None”
Human resource
number of Archivists, IT specialists, Technical administrators, etc.
IT resource
(PCs, servers, architecture, OS, DB,
…)
IT environment, hardware and base software (any resources
important to reproduce the pilot)
Pilot documentation
At the end of October 2016 we published deliverable D2.4 Pilot Documentation. This document had two parallel
goals. On the one hand, it is the latest version of the documentation followed by the pilots. It contains an updated version
of the pilot definition Excel spreadsheet and the latest version of the actions to be performed with the latest tool versions
within the pilot period (months 28-33). It also provides the most up-to-date snapshot of
pilot execution as we performed it. On the other hand, this documentation is the most comprehensive set of
instructions and information that could be provided to archives outside the project. It is useful for archives and
archivists who would like to use our outputs and repeat, in whole or in part, the pilot activities. The documentation
includes an overview document by WP2, the updated pilot definition files and a detailed description of the scenario
execution by each of the pilot sites. These documents, created by the pilot representatives, lead the user through the
pilot process via a step-by-step explanation with user screen examples.
An updated version of the documentation was delivered in January 2017 along with updated documentation for
Pilot 3.
For details, please read the complete D.2.4 Pilot Documentation here:
Part 1: http://eark-project.com/resources/project-deliverables/87-d24docs-p1-1
Part 2: http://eark-project.com/resources/project-deliverables/88-d24docs-p2-1
Changes to the planned pilot activities
At the execution phase there were some changes compared to the original workplans. These mainly extended the
scope of the pilots and are shown below:
Pilot 1 – No changes
Pilot 2 – The National Archives of Norway (NAN) wanted to test the full spectrum of the ESSArch tool set. The
ESSArch Tool for Producers (ETP) is a component to help producers create SIP packages. The producer
partners of NAN on the other hand use a previous version of this tool which creates NOARK (the Norwegian
standard) output. NAN has therefore performed an additional scenario to test ETP. The ETP tool has also
been tested in Pilot 5.
Pilot 3 – Pilot 3 was supposed to perform pre-ingest scenarios with the ERMS Export Module but used the native
export functionality of their DELTA ERMS system because of the late deployment of the appropriate ERMS
Export Module version corresponding to the local producer’s requirements. The ERMS Export Module was
tested in 2 additional scenarios.
Pilot 4 – Pilot 4 had planned only 1 scenario with DBPTK but actually performed 3 more scenarios and all 4 were
extended by a DBVTK restore database step as well.
RODA-In was not used in this pilot because the native SIP creation tool was required to ingest into the
preservation system of the Business Archives. RODA-In, on the other hand, was tested in Pilot 5 and 7.
Pilot 5 – No changes
Pilot 6 – At the pilot planning phase the Porto Municipality in Portugal also showed great interest in participating in
an automatic ingest scenario. So a second scenario was planned with the same E-ARK component and
infrastructure. Subsequently, there were some resource planning problems with their local developer who
was needed to implement the producer-side infrastructure. The discussions and preparations continued
until August 2016, when the Porto Municipality finally decided to delay the project. It is still possible that in
the near future this scenario can be executed, but this will be beyond the timescales of this project.
Pilot 7 – No changes
Additional scenarios and External evaluation
Besides the 25 scenarios of the 7 full-scale pilots, we performed several additional scenarios. Additional scenarios,
according to the Description of Work, are other, simpler scenarios also performed by the E-ARK members. They
are either parts of the planned full-scale scenarios that, for various reasons (timing, insufficient support from the
producer, late development), could not be performed within the scope of the full-scale pilots, or additional steps the pilot
team wanted to try.
An external evaluation or validation, according to the Description of Work, is an evaluation or implementation of E-ARK products by members of DLM Forum and DPC or third parties outside the project with limited involvement from
consortium members. We supported 5 external evaluations by 5 different institutions from around the world.
Some scenarios are completed and highly successful; others are still in progress or in the preparation phase.
Additional scenarios and external evaluations, because they were outside the scope of the Description of Work, could
not be planned in the same manner and in the same detail as the full-scale pilots were. They were prepared according
to the results of other project activities and according to the needs and resources of the external partners.
Additional scenarios are presented along with the full-scale scenarios in this document because they were performed
by the same pilot team. External evaluations are detailed in a separate chapter (Chapter 4.8).
Pilot evaluation
Evaluating success criteria
In the D2.3 Detailed Pilot Requirements document we defined several success criteria at project, pilot and
scenario level for the 25 scenarios of the 7 full-scale pilots. The evaluation of the pilots against these criteria can be
found in Chapter 5 of this document.
E-ARK Final conference
At the E-ARK Final conference we held a session on the experiences with the pilots. After an overview of the
piloting activities, each full-scale pilot representative gave a presentation on pilot execution, results and lessons
learnt. The session ended with a panel discussion with all the pilot staff, during which the audience could give
their opinions and ask questions about the pilots.
Recommended practices and lessons learned
Collecting and publishing recommended practices along with other pilot results is one of the most important
objectives of the E-ARK project. Recommended practices and lessons learned are the essence of all the pilot
planning and execution activities.
With this in focus, we collected our experiences in the form of recommended practices and other
comments during both the planning and execution phases of the pilots. During and after the execution period of the
pilots, recommended practices and comments were registered at different levels:
Tool related notes – at the GitHub page of the tool developers
Format specification related notes – in a Google Drive Excel table
Other recommended practices – in a Google Drive Excel table
All kinds of comments on pilot experience – in the Monthly pilot report
Pilot level recommendations about the usage of the tools and specifications are presented as separate chapters in the
main chapter for each pilot in the Pilot report part of this document.
D2.5 Final public report (this deliverable)
This deliverable summarizes the pilot planning and execution activities of the project. It provides details on the pilot
execution and recommended practices when using E-ARK tools or format specifications.
Overview of the E-ARK Pilots
In the scope of the E-ARK project the format specification and tool development have been performed by the 4
technical work packages:
WP3
Submission Information Package (SIP) – information package format specification
SIARD 2.0 – content type standard for archiving databases,
SMURF (ERMS) and SMURF (SFSB) - content type defined by E-ARK to archive ERMS system or simple file
system based records,
Content type specification to store Geodata information during the archival and dissemination processes,
Data export and SIP creation tools supporting pre-ingest processes.
WP4
Archival Information Package (AIP) – information package format specification,
SIP validation and SIP to AIP conversion tools supporting ingest processes.
WP5
Dissemination Information Package (DIP) – information package format specification,
DIP creation and content viewer tools supporting access processes.
WP6
Integrated Prototype (E-ARK Web) – a complete reference implementation consisting of several stand-alone
tools supporting the full spectrum of OAIS processes.
In order to test the format specifications and tools developed by the project, several pilot scenarios were
planned and performed during the project. The pilots were organized into seven full-scale pilots, each performed by
one of the archival institution partners in E-ARK, except for one performed by an archival solution provider, KEEP Solutions.
In the scope of the seven full-scale pilots we defined 25 scenarios testing all the tools and formats developed
and specified by E-ARK in different combinations, in different business and IT environments, and according to different
archival strategies.
Some pilots focused on specific tools or processes of the OAIS model (1, 2, 4, 5, 6), others on the archiving of and
access to specific content types (4, 5, 7), one on automated ingest (6), and two pilots had scenarios to test the full
spectrum of the OAIS processes along with the reference implementation, E-ARK Web (5, 7). Some pilots followed a
business-as-usual strategy (1, 2, 4, 6); others piloted the tools in a combination of a test and a production
environment (3, 5, 7). We tested both deployment versions of the E-ARK Web toolset: the virtual deployment (5) and the full
deployment (7).
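The groupings of pilots given above can be captured in a small lookup structure, which makes it easy to see which categories a given pilot falls into. This is an illustrative sketch only: the dictionary keys paraphrase the categories in the text, and the names `FOCUS`, `STRATEGY` and `categories_for` are hypothetical.

```python
# Pilot numbers per focus area, as listed in the text above.
FOCUS = {
    "specific tools or OAIS processes": {1, 2, 4, 5, 6},
    "specific content types": {4, 5, 7},
    "automated ingest": {6},
    "full OAIS spectrum with E-ARK Web": {5, 7},
}

# Pilot numbers per piloting strategy, as listed in the text above.
STRATEGY = {
    "business-as-usual": {1, 2, 4, 6},
    "test and production environment": {3, 5, 7},
}


def categories_for(pilot: int) -> list:
    """Return the focus areas (sorted by name) that include the given pilot."""
    return sorted(name for name, pilots in FOCUS.items() if pilot in pilots)


if __name__ == "__main__":
    for pilot in range(1, 8):
        strategy = next(s for s, pilots in STRATEGY.items() if pilot in pilots)
        print(pilot, strategy, categories_for(pilot))
```

In this reading the two strategies split the seven pilots without overlap, while the focus areas overlap: Pilot 5, for example, appears in three of them.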
Beside the 25 full-scale pilot scenarios the project has performed some smaller-scope additional scenarios and
external evaluation scenarios, too. Additional scenarios are prepared and executed by the same pilot teams as the
full-scale pilots. External evaluations are performed by non-E-ARK member organizations.
The following tables and graphs present the pilots and their relationships to other E-ARK elements. They help to
position the pilot scenarios on the OAIS map and among the various E-ARK tools and format specifications.
(The figures are from the E-ARK General Model version 2.2.)
Full-scale pilots and OAIS process
Full-scale pilots and E-ARK use-cases
Pilots using E-ARK tools and format specifications
E-ARK Tools and Format Specifications
Pilot 1 – Danish National Archives
Pilot report
This section gives detailed information about the pilot scenarios performed in the scope of the E-ARK project.
Pilot 1 - SIP Creation on relational databases
Pilot 1
SIP Creation on relational databases
Task leader
Danish National Archives
Supported by
Magenta
Scope
The scope of this Pilot is to test the E-ARK SIP Creation tool with not less than 4 databases of different sizes
and complexities (one contains several million records)
Object
Creating SIPs for relational databases using the tool created in WP3, T3.3: SIP Creation Tools, for further
evaluation
Short description
The goal of the pilot is to make four successful data extractions from live authentic databases into the SIARD
2.0 format.
Contacts
Name (Title)
E-mail
Contact Person
Anders Bo Nielsen
[email protected]
Pilot staff member
Phillip Mike Tømmerholt
[email protected]
Scenario 1
Extracting records from database (Data Set 1) - database with no documents
Scenario 2
Extracting records from database (Data Set 2) - database with no documents (large)
Scenario 3
Extracting records from database (Data Set 3) - database with documents
Scenario 4
Extracting records from database (Data Set 4) - database with documents (large)
Additional scenario Experiments with Database Visualization Toolkit
Additional scenario Extract records with ERMS Export Module and ingest into Preservica (Joint scenario with NAE)
Page 24 of 100
CMIS portal/viewer
Oracle (OLAP Viewer)
Peripleo
Database Visualization Toolkit
AIP2DIP (E-ARK Web)
Geodata
E-ARK Web Search
QGIS
Geoserver
Order Management Tool
Search and Display GUI
SMURF SFSB
SOLR Index
HDFS-Storage
SIP2AIP (E-ARK Web)
ESSArch Tools for Archive (ETA)
SIP creator (E-ARK Web)
Universal Archiving Module
ESSArch Tool for Producer (ETP)
RODA-In
ERMS Export Module
Database Preservation Toolkit
E-ARK AIP
SMURF ERMS
X
Lily - Ingest
E-ARK SIP
SIARD 2.0
E-ARK Tools
Storage – Access
Ingest - Storage
ESSArch Preservation Platform
E-ARK Formats
philliptommerholt_rigsarki
vet
IP Viewer
Pre-Ingest
RODA Repository
OAIS Relevance
Skype
Scenarios

Scenario 1: Extracting records from database (Data Set 1)
Description: Extracting records from database containing no documents.
OAIS relevance: Pre-Ingest
Use-case: Extract and Ingest relational database based on SIARD 2.0
E-ARK specifications: SIARD 2.0
E-ARK Tools: Database Preservation Toolkit
Data: Health system from The Danish National Serum Institute
Data description: Database containing information from reported infectious diseases at a national level. 50-60 tables and about 90.000 records in the main table.
Data type: Microsoft SQL Server 2008
Metadata format: Not relevant
Quantity: small

Scenario 2: Extracting records from database (Data Set 2)
Description: Extracting records from database containing no documents.
OAIS relevance: Pre-Ingest
Use-case: Extract and Ingest relational database based on SIARD 2.0
E-ARK specifications: SIARD 2.0
E-ARK Tools: Database Preservation Toolkit
Data: Registry of Cultural Events from Kultunaut Aps
Data description: Database from the commercial company Kultunaut Aps, which holds information about cultural events at a national level, from events arranged by local communities to cultural events from the Danish cultural institutions. The database contains more than 5 million records.
Data type: MySQL
Metadata format: Not relevant
Quantity: large

Scenario 3: Extracting records from database (Data Set 3)
Description: Extracting records from database containing documents. The DNA will go to the producer’s site with the tool on a USB drive. Together with the producer, the DNA will use the tool to make extractions into two formats: SIARDDK and SIARD 2.0.
OAIS relevance: Pre-Ingest
Use-case: Extract and Ingest relational database based on SIARD 2.0
E-ARK specifications: SIARD 2.0
E-ARK Tools: Database Preservation Toolkit
Data: Administrative system from The Danish National Archives
Data description: Database containing information about all incoming scientific research data, and public deliveries of research data. Database containing BLOBs/documents. Size 131 gigabyte.
Data type: Microsoft SQL Server 2008
Metadata format: Not relevant
Quantity: small

Scenario 4: Extracting records from database (Data Set 4)
Description: Extracting records from database containing documents. The DNA will go to the producer’s site with the tool on a USB drive. Together with the producer, the DNA will use the tool to make extractions into two formats: SIARDDK and SIARD 2.0.
OAIS relevance: Pre-Ingest
Use-case: Extract and Ingest relational database based on SIARD 2.0
E-ARK specifications: SIARD 2.0
E-ARK Tools: Database Preservation Toolkit
Data: Administrative and health records system from Ministry of Higher Education and Science.
Data description: Studenterrådgivningen is an institution under Ministry of Higher Education and Science, whose purpose is to provide social, psychological, and psychiatric counselling, and treatment to students in their educational situation. The database contains about 100.000 BLOBs/documents.
Data type: MS SQL Server 2008
Metadata format: Not relevant
Quantity: large

Please note that you can find more details with screenshots on scenario execution in the previous deliverable D2.4
Pilot Documentation.
Additional scenarios

Additional scenario: Experiments with Database Visualization Toolkit
Description: The users search the database for information with real-life search scenarios.
OAIS relevance: Part of access
Use-case: none
E-ARK Tools: Database Visualization Toolkit
Data: Database containing film and related data
Data type: Microsoft SQL Server 2008
Metadata format: Not relevant
Quantity: small

Additional scenario: Extract records with ERMS Export Module and ingest into Preservica (Joint scenario with NAE)
Description: NAE was supposed to use the ERMS Export Module to export records from ERMS, but because of the late deployment of the tool NAE had to use a local export tool to complete the full-scale pilot. To test the ERMS Export Module, a joint additional scenario has been executed. DNA exported the records from Alfresco ERMS with the newly deployed ERMS Export Module and sent the SMURF ERMS file to NAE, where a SIP was created and ingested to Preservica. With this additional scenario every step that was originally planned to be tested in Pilot 3 has been successfully tested.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Extract and Ingest ERMS records based on MoReq2010
E-ARK specifications: SMURF ERMS
E-ARK Tools: ERMS Export Module
Data: ERMS system of The Danish School of Media and Journalism (Danmarks Medie- og Journalisthøjskole) (DMJX)
Data description: Different kinds of letters and documents
Data type: Records from Alfresco ERMS
Metadata format: EAD
Quantity: 121 files, 17 MB

Execution report
Please note that SIARDDK is a standard database preservation format in Denmark. This is the reason for creating (non-E-ARK) SIARDDK packages besides the SIARD 2.0 packages in Pilot 1. SIARDDK is a slight deviation from the SIARD 1.0 format (created by the Swiss Federal Archives / Enter AG). The deviation was introduced to support large numbers of files, a feature now supported by SIARD 2.0.

Scenario: 1. Extracting records from database (Data Set 1) - database with no documents
Started: May 2016
Completed: September 2016
Summary:
SIARD2.0: 100% extraction of all tables and their data. The DNA has manually validated the SIARD package against the “eCH-0165 SIARD Format Specification 2.0”. There is no automatic tool for this yet.
SIARDDK: 100% extraction of all tables and their data. The DNA has validated against “Executive Order on Submission Information Packages” and found no errors in the product.

Scenario: 2. Extracting records from database (Data Set 2) - database with no documents (large)
Started: June 2016
Completed: September 2016
Summary:
SIARD2.0: 100% extraction of all tables and their data. The DNA has manually validated the SIARD package against the “eCH-0165 SIARD Format Specification 2.0”. There is no automatic tool for this yet.
SQL Server: The SIARD file was successfully uploaded to a MS SQL Server. The first attempt failed due to differences in primary key names from PostgreSQL. The key names were altered manually, a new SIARD file was created, and it was then successfully exported to MS SQL Server.
SIARDDK: 100% extraction of all tables and their data. The DNA has validated against “Executive Order on Submission Information Packages” and found no errors in the product.

Scenario: 3. Extracting records from database (Data Set 3) - database with documents
Started: July 2016
Completed: September 2016
Summary:
SIARD2.0: 100% extraction of all tables and their data in one single SIARD file. The DNA still has to export with a split to a SIARD file and an external LOB folder. The DNA also needs to validate the SIARD package against the “eCH-0165 SIARD Format Specification 2.0”.
SIARDDK: 100% extraction of all tables and their data. The DNA has validated against “Executive Order on Submission Information Packages” and found no errors in the end product.

Scenario: 4. Extracting records from database (Data Set 4) - database with documents (large)
Started: August 2016
Completed: September 2016
Summary:
SIARD2.0: 100% extraction of all tables and their data. The DNA has manually validated the SIARD package against the “eCH-0165 SIARD Format Specification 2.0”. There is no automatic tool for this yet.
SIARDDK: 100% extraction of all tables and their data.

Additional scenarios

Scenario: Extract records with ERMS Export Module and ingest into Preservica (Joint scenario with NAE)
Started: December 2016
Completed: December 2016
Summary: Successful extraction of 120 files. The SMURF ERMS file was sent to NAE for SIP creation and ingest (for more details see the documentation of Pilot 3).

Scenario: Experiments with Database Visualization Toolkit
Started: November 2016
Completed: December 2016
Summary: 4 archivists tested the DBVTK application with real-life scenarios on a movie database, looking for answers to questions like “What language is used in this film?” or “Which stars play in the movie?” They compared DBVTK to the local search capabilities and screens of the database. The users were absolutely satisfied with the logic and design of the tool and mentioned several clever ideas compared to the search and display functions of Sofia. They had many recommendations for the tool developer (see Recommended practices later in this chapter).

Changes to the original plans
There were no changes. The scenarios have been performed according to plans in DoW and D2.3 Detailed Pilot
Requirements.
Feedback report
The following table summarizes the feedback communication between the pilot staff and tool developers or format
specification providers.
E-ARK Tool – Version: Database Preservation Toolkit (version 2.0.0-beta4.2)
For the complete issue history, please refer to the GitHub page: https://github.com/keeps/db-preservation-toolkit
Used in tasks: Data extraction – all scenarios
Data (input / output): Input: 4 databases from different producers. Output: 1 SIARD2.0 package + 1 SIARDDK package.
Performance: Excellent with SIARD 2.0 (OK with SIARD DK)
Issues: There have been several issues with DBPTK related to SIARD 2.0 output. KEEP Solutions has corrected all the bugs and the response time was excellent. After the completion of the scenarios no known issues remained.
Wishes: A tool or function for automatic validation of SIARD 2.0 would be nice to have.
Comments: None
Experiences and recommended practices: After correcting the early bugs the tool functioned properly.

E-ARK Tool – Version: Database Visualization Toolkit
Used in: Additional scenario, Experiments with Database Visualization Toolkit
Data (input / output): Movie database
Performance: Good
Issues: No issues found
Wishes: Users recommend showing technical information about the database on a separate page.

E-ARK Tool – Version: ERMS Export Module
Used in: Additional scenario, Extract records with ERMS Export Module and ingest into Preservica (Joint scenario with NAE)
Data (input / output): ERMS system of The Danish School of Media and Journalism (Danmarks Medie- og Journalisthøjskole) (DMJX)
Performance: Good
Issues: No issues found
Recommended practices and further recommendations
The following table contains the recommended practices and further development suggestions collected during pilot
execution and evaluation.
Category: Further requirement
Relates to: SIARD 2.0
Recommended practices / Further developments: A tool or function for automatic validation of SIARD 2.0 would be required.

Category: Further recommendation
Relates to: DBPTK documentation
Recommended practices / Further developments: It would be nice if there were more documentation on which user roles and privileges the tool works best under.

Category: Further recommendation
Relates to: DBVTK
Recommended practices / Further developments: Users made a very detailed analysis of the tool and have a lot of smaller recommendations and wishes (for details see documentation of the additional scenario).
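
The wish for automatic SIARD 2.0 validation can be illustrated with a minimal structural pre-check. The Python sketch below is not a validator for the eCH-0165 specification and is not part of DBPTK. It only assumes that a SIARD package is a ZIP archive holding header/metadata.xml and at least one entry under content/. The function name check_siard_layout is hypothetical.

```python
import zipfile
from pathlib import Path


def check_siard_layout(path):
    """Return a list of structural problems found in a SIARD package.

    Assumption: the package is a ZIP archive that holds header/metadata.xml
    and at least one entry under content/. This is a pre-check only, not a
    validation against the eCH-0165 specification.
    """
    path = Path(path)
    if not zipfile.is_zipfile(path):
        return [f"{path.name}: not a ZIP archive"]

    problems = []
    with zipfile.ZipFile(path) as package:
        names = package.namelist()
        if "header/metadata.xml" not in names:
            problems.append("missing header/metadata.xml")
        if not any(name.startswith("content/") for name in names):
            problems.append("no entries under content/")
        # testzip() returns the first member with a bad CRC, or None.
        corrupt = package.testzip()
        if corrupt is not None:
            problems.append(f"corrupt member: {corrupt}")
    return problems
```

An empty list means only that the basic layout looks plausible. Manual validation against the format specification, as performed by the DNA in the scenarios, would still be needed.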
Pilot 2 - SIP Creation and ingest of records
Pilot 2: SIP creation and ingest of records
Task leader: National Archives of Norway
Supported by: ES Solutions
Scope: Not less than 2 transfers of unstructured records with mixed restricted and unrestricted material, and not less than 1 transfer of structured records.
Object: Extract data from EDRMS and databases, create SIPs for structured and unstructured records using ESSArch Tools, ingest the SIPs to the repository using ESSArch Preservation Platform, for further evaluation.
Short description: The main part of the pilot includes the export of electronic records and their metadata from EDRM systems and databases of Norwegian public sector institutions, and their transfer and ingest to the NAN digital repository.

Contacts
Contact Person: Arne-Kristian Groven, [email protected]
Pilot staff member: Terje Pettersen-Dahl, [email protected]
Pilot staff member: Geir Haug, [email protected]
Pilot staff member: Jørgen Ø. Vik-Strandli, [email protected]

[Table: OAIS relevance, E-ARK formats and E-ARK tools covered by Pilot 2]

Scenario 1: SIP Creation and Ingest of unstructured records (Data Set 1)
Scenario 2: SIP Creation and Ingest of unstructured records (Data Set 2)
Scenario 3: SIP Creation and Ingest of structured records (Data Set 3)
Additional scenario: Creating SIP with ESSArch Tool for Producer
Additional scenario: Generating E-ARK DIP from ESSArch Preservation Platform
Scenarios

Scenario 1: SIP Creation and Ingest of unstructured records (Data Set 1)
Description: Extract unstructured records from EDRMS based on the Norwegian NOARK 4 standard. Create SIP using ESSArch Tools. Ingest the SIP to the repository using ESSArch Preservation Platform, for further evaluation.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Extract and Ingest ERMS records (similar to MoReq2010)
E-ARK specifications: E-ARK-SIP
E-ARK Tools: ESSArch Tool Producer (ETP), ESSArch Tool Archive (ETA), ESSArch Preservation Platform
Data: Noark 4 output from EDRMS
Data description: EDRMS data from public producer converted into Noark 4 output (real production data)
Data type: Noark 5 XML file, documents in PDF/A (or a few other specified formats), in TAR file
Metadata format: XML: METS, PREMIS, ADDML (local)
Quantity: 20 GB

Scenario 2: SIP Creation and Ingest of unstructured records (Data Set 2)
Description: Extract unstructured records from EDRMS based on the Norwegian NOARK 5 standard. Create SIP using ESSArch Tools. Ingest the SIP to the repository using ESSArch Preservation Platform, for further evaluation.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Extract and Ingest ERMS records (similar to MoReq2010)
E-ARK specifications: E-ARK-SIP
E-ARK Tools: ESSArch Tool Producer (ETP), ESSArch Tool Archive (ETA), ESSArch Preservation Platform (EPP)
Data: Noark 5 output from EDRMS
Data description: EDRMS data from public producer converted into Noark 5 output (real production data)
Data type: Noark 5 XML file, documents in PDF/A (or a few other specified formats), in TAR file
Metadata format: XML: METS, PREMIS, ADDML (local)
Quantity: 5 GB

Scenario 3: SIP Creation and Ingest of structured records (Data Set 3)
Description: Extract data from old database output, create SIPs for structured records using ESSArch Tools, ingest the SIPs to the repository using ESSArch Preservation Platform, for further evaluation.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Extract and Ingest ERMS records (similar to MoReq2010)
E-ARK specifications: E-ARK-SIP
E-ARK Tools: ESSArch Tool Producer (ETP), ESSArch Tool Archive (ETA), ESSArch Preservation Platform
Data: Old database (CSV)
Data description: The data set here is the national registry of licenced hunters containing data from the period 1985-1999.
Data type: CSV format (input), tar file
Metadata format: XML: METS, PREMIS, ADDML (local)
Quantity: Containing 338.500 registered persons. 105 MB

Please note that more details with screenshots on scenario execution are available in the deliverable D2.4 Pilot
Documentation.
Additional scenarios

Additional scenario: Creating SIP with ESSArch Tool for Producer
Description: NAN wanted to test the ESSArch Tool for Producer (ETP) in the full-scale pilot scenarios, but because of the “business as usual” full-scale pilot strategy they had to use the previous version of this tool. NAN therefore tested ETP in an additional SIP creation scenario in a virtual environment. The SIP was then ingested to EPP (as with full-scale scenarios) in the virtual environment.
OAIS relevance: Pre-Ingest
Use-case: Extract and Ingest ERMS records (similar to MoReq2010)
E-ARK specifications: E-ARK-SIP
E-ARK Tools: ESSArch Tool Producer (ETP)
Data: Local test data
Data type: Microsoft and pdf documents
Metadata format: Not relevant
Quantity: small

Additional scenario: Generating E-ARK DIP from ESSArch Preservation Platform
Description: The ESSArch Preservation Platform (EPP) is fully E-ARK compatible. In this additional scenario an E-ARK DIP is generated from EPP. The scenario could not yet be completed because the strict Norwegian data handling regulations make it very difficult to use archived data.
OAIS relevance: Access
Use-case: Access ERMS records
E-ARK specifications: SMURF ERMS
E-ARK Tools: ESSArch Preservation Platform (EPP)
Data: Selected archived data
Data description: Different kinds of letters and documents
Data type: Microsoft and pdf documents
Metadata format: Not relevant
Quantity: small

Execution report
Scenario: 1. SIP Creation and Ingest of unstructured records (Data Set 1)
Started: May 2016
Completed: September 2016
Summary: After a longer testing period the scenario has been performed as planned.

Scenario: 2. SIP Creation and Ingest of unstructured records (Data Set 2)
Started: June 2016
Completed: October 2016
Summary: After a longer testing period the scenario has been performed as planned.

Scenario: 3. SIP Creation and Ingest of structured records (Data Set 3)
Started: May 2016
Completed: October 2016
Summary: After a longer testing period the scenario has been performed as planned.

Additional scenarios

Scenario: Creating SIP with ESSArch Tool for Producer
Started: November 2016
Completed: January 2017
Summary: The scenario has been performed successfully. The overall impression is that the tool is useful for data providers/agencies.

Scenario: Generating E-ARK DIP from ESSArch Preservation Platform
Started: December 2016
Completed: Not yet finished
Summary: The scenario could not yet be completed because the strict Norwegian data handling regulations make it very difficult to use archived data.

Changes to the original plans
The E-ARK compatible version of ESSArch Tool for Producer (ETP) could not be tested in the “business as usual” full-scale pilot because of the data provider’s IT infrastructure. The tool has been tested in an additional scenario by NAN. The ETP tool has also been tested in Pilot 5.
Feedback report
The following table summarizes the feedback communication between the pilot staff and tool developers or format
specification providers.
E-ARK Tool – Version: ESSArch Tool for Producer (ETP) v0.95
For the complete issue history, please refer to the GitHub page: https://github.com/ESSolutions/ESSArch_Tools_Producer
Used in tasks: SIP Creation
Data (input / output): 3 different input sources at 3 data providers
Performance: Good
Issues: No issues left at scenario completion
Wishes: NAN would like to evaluate on even larger data sets to conclude about scalability.
Comments: The tool worked well.

E-ARK Tool – Version: ESSArch Tools Archive (ETA) v0.93.1
For the complete issue history, please refer to the GitHub page: https://github.com/ESSolutions/ESSArch_Tools_Archive
Used in tasks: Ingest preparations
Data (input / output): SIPs from 3 different input sources
Performance: Good
Issues: No issues left at scenario completion
Wishes: NAN would like to evaluate on even larger data sets to conclude about scalability.
Comments: The tool has been tested very thoroughly and all bugs were resolved before it was deployed in the production environment. The tool was able to produce satisfactory results.

E-ARK Tool – Version: ESSArch Preservation Platform v2.7.3
For the complete issue history, please refer to the GitHub page: https://github.com/ESSolutions/ESSArch_EPP
Used in tasks: Ingest, Long-term preservation
Data (input / output): SIPs from 3 different input sources
Performance: Good
Issues: No issues left at scenario completion
Wishes: NAN would like to evaluate on even larger data sets to conclude about scalability.
Comments: The tool has been tested very thoroughly and all bugs were resolved before it was deployed in the production environment. The tool was able to produce satisfactory results.
Recommended practices and further recommendations
The following table contains the recommended practices and further development suggestions collected during pilot
execution and evaluation.
Category: Recommended practices
Relates to: ETP
Recommended practices / Further developments: Submission Agreement (SA) profiles are configured in ETP, based on selecting sub-profiles of various categories such as “SIP profiles”, “Submit description profiles”, “Transfer project profiles” and more. The data providers/agencies using ETP should predefine their own sub-profiles according to their specific needs using the tool Profile maker, also developed by ES Solutions. Profiles must be locked before processing further; therefore metadata must be edited before locking the profiles. Various degrees of automation in ETP can be defined through the definition of profiles. EAD and EAC-CPF schemas have to be provided with the content.

Category: Recommended practices
Relates to: ETA
Recommended practices / Further developments: ETA is a part of the Ingest process step and can be easily compared to a reception desk where you receive packages, perform the first checks of the packages and then place them on the appropriate shelves behind the reception desk, ready to be picked up by the persons responsible for the next steps of the Ingest process.

Category: Recommended practices
Relates to: EPP
Recommended practices / Further developments: In EPP, AIPs are generated in an automatic manner using a queue-handling system. The AIPs can be stored on either tapes or disks.

Category: Recommended practices
Relates to: ETP, ETA, EPP
Recommended practices / Further developments: For installing the ESSArch ETP, ETA and EPP tools we recommend getting support from ES Solutions for installation and configuration of the application.

Category: Further recommendation
Relates to: Testing
Recommended practices / Further developments: Content size should also be tested a bit further, since the largest content of the original pilots was 20 GB.

Category: Further recommendation
Relates to: SIP Format
Recommended practices / Further developments: A more flexible format specification would perhaps be more suitable in the future.
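
The ETP rule that profile metadata must be edited before the profile is locked can be pictured with a small model. The Python sketch below is purely illustrative: the class SubmissionProfile, the exception ProfileLockedError and the function prepare_sip are hypothetical names and do not reflect the ETP code base or its API.

```python
class ProfileLockedError(Exception):
    """Raised when metadata of a locked profile is edited."""


class SubmissionProfile:
    """Hypothetical stand-in for a submission agreement sub-profile."""

    def __init__(self, name, metadata=None):
        self.name = name
        self._metadata = dict(metadata or {})
        self.locked = False

    def set_metadata(self, key, value):
        # Edits are only allowed while the profile is still unlocked.
        if self.locked:
            raise ProfileLockedError(f"profile '{self.name}' is locked")
        self._metadata[key] = value

    def lock(self):
        self.locked = True

    def metadata(self):
        # Return a copy so callers cannot bypass the lock.
        return dict(self._metadata)


def prepare_sip(profile):
    """Refuse to continue until the profile has been locked."""
    if not profile.locked:
        raise RuntimeError("lock the profile before processing further")
    return {"profile": profile.name, "metadata": profile.metadata()}
```

In this model the order of operations is enforced in both directions: processing fails on an unlocked profile, and editing fails on a locked one, which mirrors the recommendation to finish all metadata edits first.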
Pilot 3 - SIP Creation and ingest of records
Pilot 3: Ingest from government agencies
Task leader: National Archives of Estonia
Supported by:
Scope: Export public records from an EDRM system of a governmental agency to the National Archives of Estonia and make these available through our own catalogue (i.e. Archival Information System, AIS) as well as provide an API for accessing the records from other systems (the original EDRMS at the agency). The whole set will include about 5000 records (depending on the exact agency).
Object: Native EDRMS at a governmental agency (Alfresco DELTA), records preparation tool (UAM), digital preservation and access systems (Preservica, AIS)
Short description: The main part of the proposed pilot includes the export of electronic records and their metadata from EDRM systems of Estonian public sector institutions, transfer and ingest to the NAE digital repository. In addition, Estonian agencies have the responsibility to make public electronic records with no access restrictions available on their web sites, which means that the pilot will also enable this through standardized linking/access methods that are implemented in the agencies' digital infrastructure / web site.
Contacts:
- Contact person: Karin Oolu, e-mail [email protected], Skype karinoolu
- Pilot staff member: Tarvo Kärberg, e-mail [email protected], Skype tarvo.karberg
Formats and tools matrix (X = used in this pilot):
OAIS relevance columns: Pre-Ingest, Ingest - Storage, Storage - Access
E-ARK formats: E-ARK SIP (X), E-ARK AIP, E-ARK DIP (X), SIARD 2.0, SMURF ERMS, SMURF SFSB, Geodata
E-ARK tools: ERMS Export Module, Database Preservation Toolkit, ESSArch Tool for Producer (ETP), RODA-In, Universal Archiving Module, SIP creator (E-ARK Web), ESSArch Tools for Archive (ETA), SIP2AIP (E-ARK Web), RODA Repository, ESSArch Preservation Platform, HDFS-Storage, Lily - Ingest, SOLR Index, Search and Display GUI, Order Management Tool, IP Viewer, Geoserver, QGIS, E-ARK Web Search, AIP2DIP (E-ARK Web), Database Visualization Toolkit, Peripleo, Oracle (OLAP Viewer), CMIS portal/viewer
Scenario 1: Extract records from EDRM (of a governmental institution), create SIP and ingest to Preservica (Data set 1)
Scenario 2: Provide access to records from governmental institution through RESTful services (Data set 1)
Scenario 3: Extract records from EDRM (of a governmental institution), create SIP and ingest to Preservica (Data set 2)
Scenario 4: Provide access to records from governmental institution through RESTful services (Data set 2)
Additional scenario: Extract records with ERMS Export Module and ingest into Preservica (Joint scenario with NAE)
Additional scenario: ERMS Export Module scenario with local ERMS system DELTA
Scenarios
Scenario 1: Extract records from EDRM (of a governmental institution), create SIP and ingest to Preservica
Description: Export public records from an EDRM system of a governmental agency, create SIP, and ingest to the Preservica system at the National Archives of Estonia.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Extract and Ingest ERMS records based on MoReq2010 (Alfresco is not a MoReq-compliant system)
E-ARK specifications: E-ARK SIP, SMURF
E-ARK Tools: Universal Archiving Module (UAM)
Data: Records and metadata exported from native ERMS (DELTA) Export Module at Ministry of Justice of Estonia
Data description: Data set consists of different documents of the Ministry of Justice from 6 series with different retention periods.
Data type: DDOC, DOCX, PDF, TIFF
Metadata format: SMURF ERMS
Quantity: 15 files
Scenario 2: Provide access to records from governmental institution through RESTful services
Description: Estonian agencies have the responsibility to make public electronic records with no access restrictions available on their web sites, which means that the pilot will also enable this through standardized linking/access methods that are implemented in the agencies' digital infrastructure / web site.
OAIS relevance: Access
Use-case: Access single ERMS records via CMIS Browser (to be consolidated with a CMIS interface access solution)
E-ARK specifications: SMURF
E-ARK Tools: CMIS Browser
Data: Records and metadata exported from native ERMS (DELTA) Export Module at Ministry of Justice of Estonia
Data description: Data set consists of different documents of the Ministry of Justice from 6 series with different retention periods.
Data type: DDOC, DOCX, PDF, TIFF
Metadata format: SMURF ERMS
Quantity: 15 files
Scenario 3: Extract records from EDRM (of a governmental institution), create SIP and ingest to Preservica
Description: Export public records from an EDRM system of a governmental agency, create SIP, and ingest to the Preservica system at the National Archives of Estonia.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Extract and Ingest ERMS records based on MoReq2010 (Alfresco is not a MoReq-compliant system)
E-ARK specifications: E-ARK SIP, SMURF
E-ARK Tools: Universal Archiving Module (UAM)
Data: Records and metadata exported from native ERMS (via DELTA) at Ministry of Justice of Estonia
Data description: Data set consists of different documents of the Ministry of Justice from different series.
Data type: DDOC (a file format holding Estonian digital signature information), DOCX, PDF, TIFF
Metadata format: SMURF ERMS
Quantity: 200 files
Scenario 4: Provide access to records from governmental institution through RESTful services
Description: Estonian agencies have the responsibility to make public electronic records with no access restrictions available on their web sites, which means that the pilot will also enable this through standardized linking/access methods that are implemented in the agencies' digital infrastructure / web site.
OAIS relevance: Access
Use-case: Access single ERMS records via CMIS Browser (to be consolidated with a CMIS interface access solution)
E-ARK specifications: SMURF
E-ARK Tools: CMIS Browser
Data: Records and metadata exported from native ERMS (via DELTA) at Ministry of Justice of Estonia
Data description: Data set consists of different documents of the Ministry of Justice from different series.
Data type: DDOC (a file format holding Estonian digital signature information), DOCX, PDF, TIFF
Metadata format: SMURF ERMS
Quantity: 200 files
Please note that you can find more details with screenshots on scenario execution in the previous deliverable D2.4
Pilot Documentation.
Additional scenarios
Additional scenario: Extract records with ERMS Export Module and ingest into Preservica (Joint scenario with NAE)
Description: The National Archives of Estonia was supposed to use the ERMS Export Module to export records from the ERMS, but because of the late deployment of the tool NAE had to use a local export tool to complete the full-scale pilot. To test the ERMS Export Module, a joint additional scenario was executed. DNA exported the records from the Alfresco ERMS with the newly deployed ERMS Export Module and sent the SMURF ERMS file to NAE, where a SIP was created and ingested into Preservica. With this additional scenario, every step that was originally planned to be tested in Pilot 3 has been successfully tested.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Extract and Ingest ERMS records based on MoReq2010
E-ARK specifications: SMURF ERMS
E-ARK Tools: ERMS Export Module
Data: ERMS system of The Danish School of Media and Journalism (Danmarks Medie- og Journalisthøjskole, DMJX)
Data description: Different kinds of letters and documents
Data type: Records from Alfresco ERMS
Metadata format: EAD
Quantity: 121 files, 17 MB
Additional scenario: ERMS Export Module scenario with local ERMS system DELTA
Description: This additional pilot combines several tools and tests the E-ARK workflow in full, from beginning to end. Records from the local DELTA system were exported with the ERMS Export Module, then a SIP was created and ingested into Preservica. Finally, access was provided by the CMIS Portal Viewer.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Extract and Ingest ERMS records based on MoReq2010
E-ARK specifications: SMURF ERMS
E-ARK Tools: ERMS Export Module
Data: Selected records from the DELTA ERMS system from partner company Wisercat
Data description: Different kinds of documents
Data type: Records from DELTA ERMS
Metadata format: Not relevant
Quantity: A small number of records
Execution report
The focus of Pilot 3 was the export of electronic records and their metadata from EDRM systems of Estonian public
sector institutions, transfer and ingest to the NAE digital repository. In addition to that, Estonian agencies have the
responsibility to make public electronic records with no access restrictions available on their web sites, which means
that the pilot will also enable this through standardised linking/access methods that are implemented in the agencies'
digital infrastructure / web site.
In the first scenario, data was selected and extracted with the native Export Module of the ERMS (DELTA) at the Ministry of Justice of Estonia, transferred to the Universal Archiving Module (UAM) of the National Archives of Estonia (NAE) to create an E-ARK SIP, and ingested into Preservica at NAE.
NAE was supposed to use the ERMS Export Module to select and export records from the ERMS, but the version compatible with the local DELTA system could not be launched before November 2016. The half-year execution period of the full-scale pilots ended in October, so NAE decided to use the native export functionality of DELTA ERMS to create the E-ARK SMURF input for the SIP and to perform an additional scenario with the ERMS Export Module later. In the end, two complete additional scenarios were run, one in cooperation with the Danish National Archives.
Scenario 1: Extract records from EDRM, create SIP and ingest to Preservica (Data set 1)
Started: May 2016
Completed: November 2016
Summary: After a very long preparation and local development period, the scenario was successfully executed.
Scenario 2: Provide access to records through RESTful services (Data set 1)
Started: September 2016
Completed: November 2016
Summary: Access scenarios could start only after the ingest scenarios had been concluded. The scenario was successfully completed. The SMURF file content is accessible through the CMIS Portal Browser, linked from the producer's corresponding web page.
Scenario 3: Extract records from EDRM, create SIP and ingest to Preservica (Data set 2)
Started: May 2016
Completed: December 2016
Summary: After a very long preparation and local development period, the scenario was successfully executed.
Scenario 4: Provide access to records through RESTful services (Data set 2)
Started: September 2016
Completed: December 2016
Summary: Access scenarios could start only after the ingest scenarios had been concluded. The scenario was successfully completed. The SMURF file content is accessible through the CMIS Portal Browser, linked from the producer's corresponding web page.
Experience with the piloted tools and specifications within Pilot 3 was positive: they are compatible and widely usable.
Additional scenarios
Additional scenario: Extract records with ERMS Export Module and ingest into Preservica (Joint scenario with NAE)
Started: November 2016
Completed: December 2016
Summary: The joint scenario was a real success story. The preparations at both sites resulted in smooth cooperation to export the selected records at DNA and to perform the ingest and provide access to the data at NAE.
Additional scenario: ERMS Export Module scenario with local ERMS system DELTA
Started: November 2016
Completed: December 2016
Summary: This pilot was actually more than an additional scenario. The complete full-scale scenario that NAE planned to execute within the full-scale pilot has been performed. It is a wall-to-wall scenario from pre-ingest to access.
Changes to the original plans
NAE was supposed to use the ERMS Export Module to select and export records from the ERMS, but the version compatible with the local DELTA system could not be launched before November 2016. The half-year execution period of the full-scale pilots ended in October, so NAE decided to use the native export functionality of DELTA ERMS to create the E-ARK SMURF input for the SIP and to perform an additional scenario with the ERMS Export Module later. In the end, two complete additional scenarios were run, one in cooperation with the Danish National Archives.
Feedback report
The following table summarizes the feedback communication between the pilot staff and tool developers or format
specification providers.
E-ARK Tool – Version: ERMS Export Module
Used in additional scenario: Exporting ERMS records
Data (input / output): Tested with real ...
Performance: Good
Issues: No issues left at scenario completion
For the complete issue history, please refer to the GitHub page: https://github.com/magenta-aps/erms-export-ui-module
E-ARK Tool – Version: Universal Archiving Module (UAM)
Used in tasks: SIP creation
Data (input / output): Tested with two data sets of DELTA ERMS records
Performance: Good
Issues: No issues left at scenario completion
Wishes: None
Comments: None
Experiences and recommended practices: None
E-ARK Tool – Version: CMIS Portal Browser
Used in tasks: Access
Data (input / output): Tested with two data sets of DELTA ERMS records
Performance: Good
Issues: No issues left at scenario completion
Wishes: None
Comments: None
Experiences and recommended practices: None
Although the tools and specifications proved to be usable, we are still planning to look for more possibilities to reduce
the human factor and automate the workflow in the steps where it is possible in order to make the process even
more scalable in the future.
Recommended practices and further recommendations
The following table contains the recommended practices and further development suggestions collected during pilot
execution and evaluation.
Recommended practices (UAM):
Recommendations to data providers/agencies:
- Allocate enough time for the first transfer attempt, as UAM has plenty of useful functionality that takes time to get acquainted with;
- The quality of the data and metadata exported from the ERMS may not be sufficient for long-term preservation, so it is necessary to consider whether the data needs to be rearranged and enriched with additional descriptive metadata beforehand;
- Subsequent archival transfers will require less time.
Recommendations to archives:
- Continue UAM training in agencies;
- Look for possibilities to enhance the user-friendliness and intuitive usage of UAM.
Recommended practices (CMIS Portal Browser):
- A very useful and necessary tool which provides access to transferred data directly in the digital archive. It allows users to see the document in the latest archival format;
- The tool is easy to configure. A link to the external interface of the digital archive will be given to the agency to configure the tool;
- Users are easy to administer. One administrator role will be given to the agency, and that administrator can manage all other users;
- A search feature is crucial, but as long as it is not available there is a need to explain to data providers/agencies the differences between EDHS and archival classification;
- Security issues need to be solved for real production implementation (public network, first login).
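The access link mentioned above can be illustrated with a small sketch. It assumes that the digital archive exposes a CMIS 1.1 Browser Binding endpoint, which this report does not confirm; the root folder URL and the object identifier are hypothetical, and only the `objectId` and `cmisselector` query parameters come from the CMIS 1.1 specification.

```python
from urllib.parse import urlencode

# A subset of the selectors that CMIS 1.1 Browser Binding accepts on GET requests.
ALLOWED_SELECTORS = {"object", "properties", "children", "content"}


def cmis_object_url(root_folder_url: str, object_id: str, selector: str = "object") -> str:
    """Build a Browser Binding GET URL for one archived record.

    root_folder_url is the repository's root folder URL (hypothetical here);
    object_id is the CMIS object id of the record.
    """
    if selector not in ALLOWED_SELECTORS:
        raise ValueError(f"unsupported cmisselector: {selector!r}")
    query = urlencode({"objectId": object_id, "cmisselector": selector})
    return f"{root_folder_url.rstrip('/')}?{query}"


if __name__ == "__main__":
    # Hypothetical archive endpoint and record id, for illustration only.
    print(cmis_object_url("https://archive.example.org/cmis/browser/repo/root",
                          "record-0001", "content"))
```

An agency web page could embed such a URL as the link to a public record, leaving authentication and the security issues noted above to the production set-up.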
Pilots 4 - Business Archives
Pilot 4: Business Archives
Task leader: National Archives of Estonia
Supported by: Estonian Business Archives
Scope: Pre-ingest preparation and transfer of business records to a digital archive solution in a business archive
Object: Bespoke business system that contains database records
Short description: Estonian Business Archives, Llc. is a privately owned archiving services provider. The main client base of the company is comprised of private businesses in Estonia for archiving and preservation of both paper and digital records. The business archives pilot in the E-ARK project will focus on transfer of database records from a private company to the digital archive solution of the Estonian Business Archives.
Contacts:
- Contact person: Raivo Ruusalepp, e-mail [email protected], Skype raivoruu
- Pilot staff member: Ats Rand, e-mail [email protected], Skype atsrand
[Matrix: OAIS relevance, E-ARK formats and E-ARK tools used in Pilot 4]
Scenario 1: Migration and Ingest of business records from bespoke business system (Data set 1)
Scenario 2: Extracting records from database (Data set 1)
Scenario 3: Migration and Ingest of business records from bespoke business system (Data set 2)
Scenario 4: Extracting records from database (Data set 2)
Scenarios
Scenario 1: Migration and Ingest of business records from bespoke business system
Description: Export business records from bespoke business system. Ingest to local archival system of EBA.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Extract and Ingest relational database based on SIARD 2.0
E-ARK specifications: E-ARK SIP, SIARD 2.0
E-ARK Tools: Database Preservation Toolkit
Data: Records from bespoke business system
Data description: Business system with 14 tables. The database contains approximately 12 000 records.
Data type: MS-SQL as mdf
Metadata format: none
Quantity: more than 12 000 rows
Scenario 2: Extracting records from database
Description: Extracting records from database containing no documents.
OAIS relevance: Access (no DIPs involved, only restoring data from SIARD packages)
Use-case: Access databases via DBVTK (SQL)
E-ARK specifications: SIARD 2.0
E-ARK Tools: Database Preservation Toolkit
Data: Records from bespoke business system
Data description: Business system with 14 tables. The database contains approximately 12 000 records.
Data type: MS-SQL as mdf
Metadata format: none
Quantity: more than 12 000 rows
Scenario 3: Migration and Ingest of business records from bespoke business system
Description: Export business records from bespoke business system. Ingest to local archival system of EBA.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Extract and Ingest relational database based on SIARD 2.0
E-ARK specifications: E-ARK SIP, SIARD 2.0
E-ARK Tools: Database Preservation Toolkit
Data: Records from bespoke business system
Data description: Business system with 63 tables (plus several history and support tables that are not needed for a complete structure of the working database). The database contains approximately 200 000 records.
Data type: MS-SQL as mdf
Metadata format: none
Quantity: more than 200 000 rows
Scenario 4: Extracting records from database
Description: Extracting records from database containing no documents.
OAIS relevance: Access (no DIPs involved, only restoring data from SIARD packages)
Use-case: Access databases via DBVTK (SQL)
E-ARK specifications: SIARD 2.0
E-ARK Tools: Database Preservation Toolkit
Data: Records from bespoke business system
Data description: Business system with 63 tables (plus several history and support tables that are not needed for a complete structure of the working database). The database contains approximately 200 000 records.
Data type: MS-SQL as mdf
Metadata format: none
Quantity: more than 200 000 rows
Please note that more details with screenshots on scenario execution are provided in the deliverable D2.4 Pilot
Documentation.
Execution report
According to the plans in D2.3 Detailed Pilot Requirements, the Estonian Business Archives (EBA) wanted to perform only one pre-ingest scenario in a test environment, but as they worked with the tool they wished to extend their work substantially. EBA had good experience with the Database Preservation Toolkit and SIARD 2.0 and also wanted to try the Database Visualization Toolkit. In the end, EBA performed four scenarios in a “business-as-usual” manner, ingesting the SIARD files into their local preservation repository and accessing them through DBVTK.
Scenario 1: Migration and Ingest of business records from bespoke business system (Data set 1)
Started: April 2016
Completed: September 2016
Summary: Scenario performed successfully. Tools worked as required.
Scenario 2: Extracting records from database (Data set 1)
Started: August 2016
Completed: September 2016
Summary: Scenario performed successfully. Tools worked as required.
Scenario 3: Migration and Ingest of business records from bespoke business system (Data set 2)
Started: September 2016
Completed: October 2016
Summary: Scenario performed successfully. Tools worked as required.
Scenario 4: Extracting records from database (Data set 2)
Started: September 2016
Completed: October 2016
Summary: Scenario performed successfully. Tools worked as required.
Changes to the original plans
There were no changes. The scenarios have been performed according to plans in DoW and D2.3 Detailed Pilot
Requirements.
Feedback report
The following table summarizes the feedback communication between the pilot staff and tool developers or format
specification providers.
E-ARK Tool – Version: Database Preservation Toolkit (version 2.0.0-beta4.2)
Used in tasks: Data extraction, in scenarios 1 and 3
Data (input / output): Input: business system with 14 tables containing approximately 12 000 records, plus business system with 63 tables containing approximately 200 000 records. Output: SIARD 2.0 packages.
Performance: Very good
Issues: There were several issues with DBPTK-related SIARD 2.0 output. KEEP Solutions corrected all the bugs and the response time was excellent. After the completion of the scenarios no known issues remained.
Wishes: None
Comments: None
Experiences and recommended practices: After the early bugs were corrected, the tool functioned properly.
For the complete issue history, please refer to the GitHub page: https://github.com/keeps/db-preservation-toolkit
E-ARK Tool – Version: Database Visualization Toolkit
Used in tasks: Access, in scenarios 2 and 4
Data (input / output): Input: SIARD 2.0 packages. Output: restored DB tables.
Performance: Good
Issues: No issues found
Wishes: None
Comments: None
Experiences and recommended practices: None
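One simple way to confirm that tables restored from a SIARD package match the source is to compare row counts per table. The sketch below is not part of the pilot tooling: it uses Python's built-in `sqlite3` as a stand-in for whatever DBMS holds the restored tables, and the table names and expected counts are hypothetical.

```python
import sqlite3


def count_rows(conn: sqlite3.Connection, tables: list[str]) -> dict[str, int]:
    """Return the number of rows in each named table."""
    counts = {}
    for table in tables:
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        counts[table] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


def row_count_mismatches(expected: dict[str, int], actual: dict[str, int]) -> dict[str, tuple]:
    """Map each table whose count differs to (expected, actual)."""
    return {
        table: (expected[table], actual.get(table))
        for table in expected
        if expected[table] != actual.get(table)
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
    restored = count_rows(conn, ["customers"])
    # Expected counts would come from the source database before export.
    print(row_count_mismatches({"customers": 3}, restored))
```

Matching row counts do not prove that cell values survived the round trip, so this kind of check complements rather than replaces validation of the SIARD package itself.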
Recommended practices and further recommendations
The following table contains the recommended practices and further development suggestions collected during pilot
execution and evaluation.
Recommended practices (SIARD 2.0): Manual validation requires a lot of time without SIARD 2.0 validation tools.
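Part of the manual effort can be reduced with a basic structural pre-check before a package is inspected by hand. The sketch below is not a SIARD 2.0 validator: it only assumes that a SIARD file is a ZIP archive containing `header/metadata.xml` and table data under `content/`, and the function name is hypothetical.

```python
import zipfile

METADATA_ENTRY = "header/metadata.xml"


def siard_precheck(package) -> dict:
    """Run cheap structural checks on a SIARD package (path or file object)."""
    with zipfile.ZipFile(package) as archive:
        names = archive.namelist()
        table_files = [
            name for name in names
            if name.startswith("content/") and name.endswith(".xml")
        ]
        return {
            # The metadata file describes schemas, tables and columns.
            "has_metadata": METADATA_ENTRY in names,
            # Number of table data files found under content/.
            "table_files": len(table_files),
            # testzip() returns the first entry with a bad CRC, or None.
            "corrupt_entry": archive.testzip(),
        }


if __name__ == "__main__":
    import io

    # Build a tiny in-memory package that mimics the assumed layout.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(METADATA_ENTRY, "<siardArchive/>")
        archive.writestr("content/schema0/table0/table0.xml", "<table/>")
    print(siard_precheck(buffer))
```

A package that fails this pre-check can be returned to the producer immediately, while one that passes still needs schema-level validation of `metadata.xml` and the table files.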
Pilots 5 - Preservation and access to records with geodata
Pilot 5: Preservation and access to records with geodata
Task leader: National Archives of Slovenia
Supported by: Danish National Archives
Scope: The pilot will prove that the SIP and DIP implementations fulfil the specific requirements for records containing GIS data, test the instructions (for the producer and for the archive) regarding all phases of ingest, and prove that archival use of GIS data is possible (via the open data method, direct access in the archives and use of GIS data as search criteria in the DIP contents).
Object: Pilot report with recommendations about urgent improvements and possible future improvements; support for WP6 & WP7; setting up the work environment of selected E-ARK archival tools; provide real-life examples of how the project deliverables can be used
Short description: During the E-ARK project, a standardized method for ingesting geodata will be developed. This will allow the archives to offer geodata as a selection and display criterion for records by integrating current state-of-the-art tools.
Contacts:
- Contact person: Gregor Završnik, e-mail [email protected], Skype gregor.zavrsnik
- Pilot staff member: Alenka Starman, e-mail [email protected]
- Pilot staff member: Anja Paulič, e-mail [email protected]
- Pilot staff member: Joze Skofljanec, e-mail [email protected]
[Matrix: OAIS relevance, E-ARK formats and E-ARK tools used in Pilot 5; marked formats include E-ARK SIP and E-ARK DIP]
Scenario 1: SIP Creation and Ingest of records with Geodata (Data set 1-2)
Scenario 2: Search and Access information using Geodata (Data set 1-2)
Scenario 3: SIP Creation and Ingest of records with Geodata (Data set 3)
Scenario 4: Search and Access information using Geodata (Data set 3)
Additional scenario: Cross-country search with E-ARK Web (joint scenario with NAH)
Scenarios
OIAS relevance
Use-case
E-ARK specifications
E-ARK Tools
Data
E-ARK DIP X
X
X
X
CMIS portal/viewer
Oracle (OLAP Viewer)
IP Viewer
Database Visualization Toolkit
AIP2DIP (E-ARK Web)
Peripleo
Geodata X
X
E-ARK Web Search
QGIS
Geoserver
Order Management Tool
Search and Display GUI
SMURF SFSB
SOLR Index
X
HDFS-Storage
X
ESSArch Preservation Platform
SIP2AIP (E-ARK Web)
X
Scenario 2
Description
E-ARK AIP
SMURF ERMS
ESSArch Tools for Archive (ETA)
RODA-In
ERMS Export Module
Database Preservation Toolkit
E-ARK Tools
E-ARK SIP X
SIARD 2.0
RODA Repository
E-ARK Format
specifications
Storage – Access
Ingest - Storage
Lily - Ingest
Pre-Ingest
SIP creator (E-ARK Web)
Description
Data type
Metadata format
Quantity
OAIS Relevance
Universal Archiving Module
OAIS relevance
Use-case
E-ARK specifications
E-ARK Tools
Data
SIP Creation and Ingest of records with Geodata
Create SIP from records and metadata exported from GURS (The Surveying and Mapping Authority of the
Republic of Slovenia).
SIP creation and ingest of at least one small vector geodata set with fewer than 100 records and one with more
than 1000 records. The archivist creates a Submission agreement for SIP creation, according to the E-ARK guidelines for
geodata SIP creation. The producer creates a SIP containing geodata, according to the Submission agreement, based on
the E-ARK SIP specifications for geodata. The archivist technically validates the submitted SIP package, according to
the E-ARK guidelines for geodata SIP creation. The archivist confirms that content validation of the submitted SIP
package was performed. An AIP is generated from the SIP and is ingested into the archival repository.
Pre-Ingest, Ingest
Other (SIP Creation and Ingest of records with Geodata)
E-ARK SIP, E-ARK AIP (with GeoData)
RODA-In, ESSArch Tools Archive (ETA), SIP2AIP (E-ARK Web), ESSArch Preservation Platform, EAD Editor, QGIS
Two sets from the Surveying and Mapping Authority of the Republic of Slovenia:
1.) Records and metadata of municipalities as valid until 1994, exported from GURS, database
2.) Records and metadata of administrative units until 1994, exported from GURS
Records and metadata of maps with Geodata
GML document with metadata in XML format, ESRI Shapefile, csv
ISO 19115 (INSPIRE)
62 records (cca. 3MB) + 1204 records (cca. 12,4 MB)
ESSArch Tool for Producer (ETP)
Scenario 1
Description
X
Search and Access information using Geodata
Create a DIP from an AIP containing records with geodata. Present the geodata information with QGIS along with
content and metadata from the DIP.
Once the search index has been updated from an AIP, a data object containing geodata can be identified using the
search criteria specified in the E-ARK tool requirement specification. The required data objects are selected and an
order is issued. The DIP is prepared according to the order specification and the end user's credentials. The DIP file
structure with file descriptions (MIME type, short description) is presented to the end user. Geodata from the order
can be accessed in the designated viewer (QGIS). The user checks the authenticity of the DIP by accessing the PREMIS
documentation. Access to the DIP is documented and the captured metadata can be exported.
Access
Other (Access of records with Geodata)
E-ARK AIP, E-ARK DIP (with GeoData)
Search and Display GUI, Order Management Tool, Lily – Ingest, ESSArch Preservation Platform, E-ARK Web
(Search), AIP2DIP (E-ARK Web), IP Viewer, QGIS, Geoserver, Peripleo
Two sets from the Surveying and Mapping Authority of the Republic of Slovenia:
E-ARK DIP X
AIP2DIP (E-ARK Web)
X
X
X
X
X
X
CMIS portal/viewer
E-ARK Web Search
X
Peripleo
QGIS
X
IP Viewer
Geoserver
X
Database Visualization Toolkit
Lily - Ingest
X
Oracle (OLAP Viewer)
Geodata X
X
Order Management Tool
SMURF SFSB
Search and Display GUI
HDFS-Storage
ESSArch Preservation Platform
RODA Repository
SIP2AIP (E-ARK Web)
ESSArch Tools for Archive (ETA)
SIP creator (E-ARK Web)
Universal Archiving Module
ESSArch Tool for Producer (ETP)
RODA-In
ERMS Export Module
SMURF ERMS
Storage - Access
X
SIP Creation and Ingest of records with Geodata
Create SIP from records and metadata exported from ARSO (Environmental Agency of Republic of Slovenia).
SIP creation and ingest of at least one vector geodata set with at least 250 records. Data is exported directly from
the producer's own system into GML format. The same system also exports INSPIRE metadata.
The archivist creates a Submission agreement for SIP creation, according to the E-ARK guidelines for geodata SIP
creation. The producer creates a SIP containing geodata, according to the Submission agreement, based on the E-ARK SIP
specifications for geodata. The archivist technically validates the submitted SIP package, according to the E-ARK
guidelines for geodata SIP creation. The archivist confirms that content validation of the submitted SIP package
was performed. An AIP is generated from the SIP and is ingested into the archival repository.
Pre-Ingest, Ingest
Other (SIP Creation and Ingest of records with Geodata)
E-ARK SIP, E-ARK AIP (with GeoData)
ESSArch Tools Producer (ETP), ESSArch Tools Archive (ETA), ESSArch Preservation Platform, EAD Editor, QGIS
Records and metadata of Natura 2000 areas created in 2004, exported from ARSO database
Records and metadata of maps with Geodata
GML document with metadata in XML format, ESRI Shapefile
INSPIRE
286 records (cca. 9,6 MB)
E-ARK DIP
X
X
CMIS portal/viewer
Oracle (OLAP Viewer)
Peripleo
Database Visualization Toolkit
AIP2DIP (E-ARK Web)
Geodata
E-ARK Web Search
QGIS
Geoserver
Order Management Tool
Search and Display GUI
SMURF SFSB
SOLR Index
HDFS-Storage
X
ESSArch Preservation Platform
SIP2AIP (E-ARK Web)
X
RODA Repository
SMURF ERMS
X
ESSArch Tools for Archive (ETA)
RODA-In
ERMS Export Module
Database Preservation Toolkit
SIARD 2.0
E-ARK AIP
Lily - Ingest
E-ARK SIP
X
Storage – Access
Ingest - Storage
IP Viewer
Pre-Ingest
E-ARK Format
specifications
E-ARK Tools
SIARD 2.0
SIP creator (E-ARK Web)
OAIS relevance
Use-case
E-ARK specifications
E-ARK Tools
Data
Description
Data type
Metadata format
Quantity
OAIS Relevance
E-ARK AIP
Universal Archiving Module
Scenario 3
Description
Database Preservation Toolkit
E-ARK Tools
Ingest - Storage
E-ARK SIP
SOLR Index
Pre-Ingest
E-ARK Format
specifications
ESSArch Tool for Producer (ETP)
Description
Data type
Metadata format
Quantity
OAIS Relevance
1.) Records and metadata of municipalities as valid until 1994, exported from GURS, database
2.) Records and metadata of administrative units until 1994, exported from GURS
Records and metadata of maps with Geodata
GML document with metadata in XML format, ESRI Shapefile, csv
ISO 19115 (INSPIRE)
62 records (cca. 3MB) + 1204 records (cca. 12,4 MB)
Additional scenario
Description
OAIS relevance
Use-case
E-ARK specifications
E-ARK Tools
Data
Description
Data type
Metadata format
Quantity
OAIS Relevance
E-ARK Format
E-ARK DIP
X
Geodata
X
CMIS portal/viewer
E-ARK Web Search
AIP2DIP (E-ARK Web)
X
X
X
X
X
X
Peripleo
QGIS
X
IP Viewer
Geoserver
X
Database Visualization Toolkit
Lily - Ingest
X
Order Management Tool
SMURF SFSB
Oracle (OLAP Viewer)
X
Search and Display GUI
HDFS-Storage
ESSArch Preservation Platform
RODA Repository
E-ARK AIP
SMURF ERMS
SIP2AIP (E-ARK Web)
E-ARK SIP
SIARD 2.0
ESSArch Tools for Archive (ETA)
RODA-In
ERMS Export Module
E-ARK Tools
Database Preservation Toolkit
E-ARK Format
specifications
Storage – Access
Ingest - Storage
SOLR Index
Pre-Ingest
SIP creator (E-ARK Web)
Data
Description
Data type
Metadata format
Quantity
OAIS Relevance
Universal Archiving Module
OAIS relevance
Use-case
E-ARK specifications
E-ARK Tools
Search and Access information using Geodata
Create a DIP from an AIP containing records with geodata. Present the geodata information with QGIS along with
content and metadata from the DIP.
Once the search index has been updated from an AIP, a data object containing geodata can be identified using the
search criteria specified in the E-ARK tool requirement specification. The required data objects are selected and an
order is issued. The DIP is prepared according to the order specification and the end user's credentials. The DIP file
structure with file descriptions (MIME type, short description) is presented to the end user. Geodata from the order
can be accessed in the designated viewer (QGIS). The user checks the authenticity of the DIP by accessing the PREMIS
documentation. Access to the DIP is documented and the captured metadata can be exported.
Access
Other (Access of records with Geodata)
E-ARK AIP, E-ARK DIP (with GeoData)
Search and Display GUI, Order Management Tool, Lily – Ingest, ESSArch Preservation Platform, E-ARK Web
(Search), AIP2DIP (E-ARK Web), IP Viewer, QGIS, Geoserver, Peripleo
Records and metadata of Natura 2000 areas created in 2004, exported from ARSO database
Records and metadata of maps with Geodata
GML document with metadata in XML format, ESRI Shapefile
INSPIRE
286 records (cca. 9,6 MB)
ESSArch Tool for Producer (ETP)
Scenario 4
Description
X
X
Cross-country search with E-ARK Web (joint scenario with NAH)
The SOLR index and the E-ARK Web infrastructure make it theoretically possible to perform a federated search
over more than one archive. When the SOLR index of the other archival institution can be “seen” by the search
engine (e.g. one institution has access rights to the other's SOLR index), the search engine can produce a common
list of results. The National Archives of Slovenia and the National Archives of Hungary both have an E-ARK
implementation at their pilot sites. This scenario is a simple feasibility study of cross-country search.
Access
Search and Display
E-ARK Web
Test data in the SOLR index
The SOLR indexes of the two archives will be theoretically connected in this scenario
Not relevant
Not relevant
small
Pre-Ingest
E-ARK SIP
Ingest - Storage
E-ARK AIP
Storage - Access
E-ARK DIP
X
CMIS portal/viewer
Oracle (OLAP Viewer)
Peripleo
IP Viewer
Database Visualization Toolkit
AIP2DIP (E-ARK Web)
Geodata
E-ARK Web Search
QGIS
Geoserver
Lily - Ingest
Order Management Tool
Search and Display GUI
SMURF SFSB
SOLR Index
HDFS-Storage
ESSArch Preservation Platform
SIP2AIP (E-ARK Web)
ESSArch Tools for Archive (ETA)
SMURF ERMS
SIP creator (E-ARK Web)
Universal Archiving Module
ESSArch Tool for Producer (ETP)
RODA-In
ERMS Export Module
SIARD 2.0
Database Preservation Toolkit
E-ARK Tools
RODA Repository
specifications
X
Please note that more details with screenshots on scenario execution are provided in the deliverable D2.4 Pilot
Documentation.
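The record-count criteria stated in the scenarios above (Scenario 1: one set with fewer than 100 records and one with more than 1000; Scenario 3: at least 250 records) can be checked before a SIP is built. The following is a minimal sketch, not part of the E-ARK tools: it assumes each record is a `featureMember` element in the GML export, and the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on element tags."""
    return tag.rsplit("}", 1)[-1]


def count_gml_records(gml_text: str) -> int:
    """Count featureMember elements, regardless of which GML namespace version is used."""
    root = ET.fromstring(gml_text)
    return sum(1 for el in root.iter() if local_name(el.tag) == "featureMember")


def meets_scenario_1(record_counts: list[int]) -> bool:
    """Scenario 1: at least one set with < 100 records and one with > 1000 records."""
    return any(n < 100 for n in record_counts) and any(n > 1000 for n in record_counts)


def meets_scenario_3(record_count: int) -> bool:
    """Scenario 3: at least one vector geodata set with at least 250 records."""
    return record_count >= 250


# Hypothetical two-record GML document, used only to exercise the counter.
SAMPLE = """<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml"
                                   xmlns:ex="http://example.org/ex">
  <gml:featureMember><ex:Unit><ex:name>A</ex:name></ex:Unit></gml:featureMember>
  <gml:featureMember><ex:Unit><ex:name>B</ex:name></ex:Unit></gml:featureMember>
</gml:FeatureCollection>"""

if __name__ == "__main__":
    print(count_gml_records(SAMPLE))
    # Record counts reported for the pilot data sets: 62 and 1204 (GURS), 286 (ARSO).
    print(meets_scenario_1([62, 1204]))
    print(meets_scenario_3(286))
```

With the quantities reported for the pilot data sets (62 and 1204 records from GURS, 286 records from ARSO), both checks pass.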
Execution report
Two pilots (5 and 7) decided to use many tools, testing their mutual compatibility as well as their core functionality.
The pilot of the Slovenian National Archives (NAS) focused on geodata. NAS tested the ESSArch tools and the E-ARK Web
tools with SMURF Geodata specification, checking their compatibility with the E-ARK Geodata standard and with each
other, from SIP creation to accessing graphical geodata information. E-ARK Web has two deployment options: full
deployment and virtual environment. The virtual environment is a compact solution for electronic archiving and could
therefore be very useful for smaller archives. NAS used the virtual E-ARK Web deployment.
Scenario: 1. Migration and Ingest of business records from bespoke business system (Data set 1)
Started: April 2016
Completed: September 2016
Summary: After a longer period in which the incompatibility errors were corrected, the scenario performed
successfully. Tools basically worked as required.
Scenario: 2. Extracting records from database (Data set 1)
Started: July 2016
Completed: October 2016
Summary: The scenario could not be completed before the Search tool was ready, but once the tool was completed the
scenario performed successfully. Tools worked as required.
Scenario: 3. Migration and Ingest of business records from bespoke business system (Data set 2)
Started: April 2016
Completed: October 2016
Summary: After a longer period in which the incompatibility errors were corrected, the scenario performed
successfully. Tools basically worked as required.
Scenario: 4. Extracting records from database (Data set 2)
Started: July 2016
Completed: October 2016
Summary: The scenario could not be completed before the Search tool was ready, but once the tool was completed the
scenario performed successfully. Tools worked as required.
Additional scenarios: Cross-country search with E-ARK Web (joint scenario with NAH)
Started: December 2016
Completed: January 2017
Summary: The scenario execution was stopped because of security considerations by the archives. The cross-country
search is technically feasible, but from a security point of view it is risky. In the future, if the archives build the
infrastructure to implement a publicly accessible E-ARK Web solution outside their firewall, then it can be reached
from the search engine of another archive with E-ARK Web.
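The "common list of results" idea behind the cross-country search can be illustrated without any network access. The sketch below is an assumption-laden illustration, not the E-ARK Web implementation: it takes two ranked result lists that are presumed to have already been fetched from each archive's index and interleaves them round-robin, because relevance scores from separate indexes are generally not directly comparable. The `Hit` type, `merge_results` function and sample identifiers are hypothetical.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Hit:
    """One search hit; 'archive' names the institution whose index returned it."""
    archive: str
    identifier: str
    title: str


def merge_results(*result_lists):
    """Interleave ranked hit lists round-robin and drop duplicate (archive, identifier) pairs."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for hit in tier:
            if hit is None:  # a shorter list has run out of hits
                continue
            key = (hit.archive, hit.identifier)
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged


if __name__ == "__main__":
    # Hypothetical hits standing in for the test data in the two pilot indexes.
    nas_hits = [Hit("NAS", "a1", "Natura 2000 areas"), Hit("NAS", "a2", "Municipalities")]
    nah_hits = [Hit("NAH", "b1", "Test record")]
    for hit in merge_results(nas_hits, nah_hits):
        print(hit.archive, hit.identifier, hit.title)
```

Round-robin interleaving keeps each archive's own ranking intact; a production setup would also have to address the access-rights and firewall concerns noted in the summary above.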
Changes to the original plans
There were no major changes. The scenarios have been performed according to plans in DoW and D2.3 Detailed Pilot
Requirements.
Feedback report
The following table summarizes the feedback communication between the pilot staff and tool developers or format
specification providers.
E-ARK Tool – Version: ESSArch tools
Issues (bugs, wishes, comments) and experiences / recommended practices:
For the complete issue history, please refer to the GitHub pages:
https://github.com/ESSolutions/ESSArch_Tools_Producer
https://github.com/ESSolutions/ESSArch_Tools_Archive
https://github.com/ESSolutions/ESSArch_EPP
Used in tasks: In all scenarios
Data (input / output): SIP creation and ingest with 3 different data sets
Performance: Good
Issues: There were several issues at the beginning, mostly incompatibility problems between tools and between tools
and the SIP specification. After the completion of the scenarios no known issues remained.
Wishes: None
Comments: None
Experiences and recommended practices: After correcting the early bugs the tool functioned properly.
E-ARK Tool – Version: RODA-In (2.0.0 Alpha 7.4)
Issues (bugs, wishes, comments) and experiences / recommended practices:
For the complete issue history, please refer to the GitHub page:
https://github.com/keeps/roda-in
Used in tasks: Create SIP - Create an E-ARK SIP package
Data (input / output): Input: Unstructured data. Output: E-ARK SIP in a *.zip file
Performance: OK
Issues: No issues left at the end of the pilot
Wishes: None
Comments: The tool is being translated into Slovenian.
Experiences and best practices: None
E-ARK Tool – Version: E-ARK Web (Virtual deployment)
Issues (bugs, wishes, comments) and experiences / recommended practices:
For the complete issue history, please refer to the GitHub page:
https://github.com/eark-project/earkweb
Used in tasks: SIP to AIP conversion, Lily ingest, SOLR search, AIP to DIP conversion
Data (input / output): Input: 3 different data sets. Output: depending on component
Performance: OK
Issues: No issues left at the end of the pilot
Wishes: None
Comments: None
Experiences and best practices: None
E-ARK Tool – Version: Search & Display GUI, Order Management Tool
Used in tasks: Access
Data (input / output): Input: E-ARK AIP. Output: order
Performance: OK
Issues: No issues left at the end of the pilot
Wishes: None
Comments: None
Experiences and best practices: None
E-ARK Tool – Version: IP Viewer
Used in tasks: View DIP
Data (input / output): Input: DIP
Performance: Good
Issues: None
Wishes: None
Comments: None
Experiences and best practices: None
Recommended practices and further recommendations
Lessons learned
We addressed a real need with our users.
When we started talking to our producers, who were cooperating as pilot sites, they welcomed our proposals.
They have a real need to know how to archive all the spatial data that has been accumulating for some years.
The guidelines from this project gave them a way to finally structure geodata in a form suitable for the archives, as
well as input on how to adjust their current and future systems in order to automate this process.
Bridging the gap of limited network accesses
Since we used two different tools for packaging data, it became clear that a stand-alone tool such as RODA-In is more
usable than a web-based one (ESSArch ETP). We work with different organisations with different types of network security
policies, which often prevent us from accessing the web-based tool from within the organisation's network. It is also
more practical to physically move large quantities of data on a portable disk drive as opposed to streaming it via the
network.
Full text search brings the archival experience closer to our users
The E-ARK Web based SOLR index with the Magenta Search interface brought us a new experience: full text search.
Previously the only search option was using the catalogue. Full text search gives our users an experience similar to
the search engines they are already used to (Google, Bing…). It provides better search results and less work for
our archivists, but only if the data is well described. Therefore we need to ensure that we have good metadata
descriptions.
Interoperability between systems – better communication between archives
Our experience using the general E-ARK IP structure through different applications has proven that using a common
standard is a good way to ensure interoperability between different archives. This is important when using records
that are the same across different archives within a country or even between countries across Europe (like the Natura
2000 record).
Pilot 6 - Integration between a live document management system
and digital archiving and preservation service
Pilot 6
Task leader
Integration between a live document management system and digital archiving and preservation service
KEEP SOLUTIONS (KEEPS)
Supported by
Instituto Superior Técnico (IST)
Scope
The goal of this pilot is two-fold. On one hand, KEEP SOLUTIONS will demonstrate that the pan-European SIP
structure designed in the WP3 is adequate to support the media types found in today's Electronic Records
Management Systems (e.g. text documents, video, audio, images, etc) and, on the other hand, that the most
adequate and scalable form of ingest is to automate the SIP creation and delivery process to the preservation
service.
In order to achieve the goals of this pilot we will tap into two live Electronic Records Management Systems
(ERMS) and, based on the appraisal and selection strategies installed, extract, transform, aggregate and create
Submission Information Packages (SIP) that conform to the pan-European SIP format defined in WP3. The
pilot will also demonstrate the capabilities of the preservation services that follow the transfer of data to the
repository, namely ingest and access, by providing means to access Dissemination Information Packages from
the producers' Electronic Records Management Systems served by the preservation service.
The aim of pilot 6 is to assess the efficacy of the E-ARK Information Package Specifications, which define how
metadata and data should be packaged in order to move records between the three stages of record keeping
- active, semi-active and inactive.
In a typical setting, a record that needs to be archived usually falls into one of these three “ages”:
- Active - when the metadata and data are “live” being used and modified regularly.
- Semi-active - when the metadata and data are archived for a short period – say up to 5 years.
- Inactive - when the metadata and data are moved to a long-term repository for permanent conservation.
The pilot aims to ensure the seamless transfer of information between the semi-active and the
inactive stages in such a way that no relevant data or metadata is lost in the process. To accomplish this goal, a
special integration tool has been developed that implements the package specifications and orchestrates the
entire transfer process.
The pilot worked with data from a public institution whose “active” records have been initially produced and
managed in an electronic records management system and then transferred to the archival service of that
same institution for temporary conservation - semi-active stage.
The archival service is, however, not prepared to face the challenges of long-term digital preservation, so the
records that have been selected for permanent conservation need to be transferred to a long-term digital
repository (the third “age”). This is where this pilot comes in.
The overall goal of the pilot is to ensure that the information package specifications developed in E-ARK and
the integration procedures developed are appropriate to support the transfer of records between an active
or semi-active archival system and a long-term preservation repository.
Name (Title)
E-mail
Skype
Object
Short description
Contacts
Contact Person
Pilot staff member
Pilot staff member
Pilot staff member
Pilot staff member
Pilot staff member
Pilot staff member
OAIS Relevance
E-ARK Formats
Miguel Ferreira
Luís Faria
Hélder Silva
Sebastien Leroux
Rui Rodrigues
Ricardo Vieira
João Cardoso
Pre-Ingest
E-ARK SIP X
SIARD 2.0
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Storage – Access
Ingest - Storage
E-ARK AIP
jmaferreira
luis100
hsilva_keep
slerouxatkeep
rui.tiago.mr
ricardojoao.vieira
joao.m.f.cardoso
E-ARK DIP X
X
SMURF ERMS
SMURF SFSB
X
Geodata
CMIS portal/viewer
Oracle (OLAP Viewer)
Peripleo
IP Viewer
Database Visualization Toolkit
AIP2DIP (E-ARK Web)
E-ARK Web Search
QGIS
Geoserver
Lily - Ingest
Order Management Tool
Search and Display GUI
SOLR Index
HDFS-Storage
ESSArch Preservation Platform
RODA Repository
SIP2AIP (E-ARK Web)
ESSArch Tools for Archive (ETA)
SIP creator (E-ARK Web)
Universal Archiving Module
RODA-In
ERMS Export Module
Database Preservation Toolkit
E-ARK Tools
ESSArch Tool for Producer (ETP)
X
Scenario 1
Automatic ingest of records from a semi-active archival management system
Additional scenario Integration with OMT via E-ARK DIP
Additional scenario Repository succession via E-ARK AIP (E-ARK AIP exchange experiments)
Scenarios
Scenario 1
Description
OAIS relevance
Use-case
E-ARK specifications
E-ARK Tools
Data
Description
Data type
Metadata format
Quantity
OAIS Relevance
E-ARK Format
specifications
Automatic ingest of records from a semi-active archival management system
This scenario aims to demonstrate the ability to seamlessly transfer data from a semi-active records
management system to a long-term preservation repository with little or no human intervention.
The scenario is based on real-world operations already in place at a public organization since mid-2015. The
scenario enhances the established practice by adding a component to its architecture that will be
responsible for the long-term preservation of historical records once they reach their inactive age. The
long-term preservation repository runs as a back-end service of the Archival Management System and aims to
support its data curation activities.
Ingest
Other (Ingest of Archival Management Records using the SMURF profile.)
E-ARK SIP, E-ARK AIP
Repository Integration Pipeline (RIP), RODA Repository
Historical records
The data used in this pilot scenario comprised a collection of digitised books related to the Peninsular War
dating from 1778 to 1834. The collection is composed of 964 records stored in a relational database following
the semantic elements of EAD. The dataset also contains a total of 34,600 pages of documentation in
uncompressed TIFF files at 300 dpi. The total amount of data is around 1.2 TB. This collection can be inspected
at its original location at http://arquivo.cm-mafra.pt/details?id=173037.
300 dpi uncompressed TIFF files
EAD
964 records described in EAD containing a total of 34,600 pages of 300 dpi uncompressed TIFF files. The total
amount of data is around 1.19 TB.
Pre-Ingest
E-ARK SIP X
SIARD 2.0
Storage – Access
Ingest - Storage
E-ARK AIP
SMURF ERMS
E-ARK DIP
X
SMURF SFSB
X
Geodata
X
CMIS portal/viewer
Oracle (OLAP Viewer)
Peripleo
IP Viewer
Database Visualization Toolkit
AIP2DIP (E-ARK Web)
E-ARK Web Search
QGIS
Geoserver
Lily - Ingest
Order Management Tool
Search and Display GUI
SOLR Index
HDFS-Storage
ESSArch Preservation Platform
RODA Repository
SIP2AIP (E-ARK Web)
ESSArch Tools for Archive (ETA)
SIP creator (E-ARK Web)
Universal Archiving Module
ESSArch Tool for Producer (ETP)
RODA-In
ERMS Export Module
E-ARK Tools
Database Preservation Toolkit
X
The workflow consists of selecting an AIP and running a process that generates an E-ARK DIP. The resulting DIP
can be downloaded from the RODA user interface and then uploaded to the OMT to be delivered to the end-user.
The DIP can also be consulted using RODA's REST API, for example, to support a more advanced systems
integration approach.
Access
E-ARK DIP
RODA Repository, Order Management Tool
Test data
Different kinds of letters and documents
Not relevant
Not relevant
small
X
Additional scenario
Description
Storage - Access
E-ARK DIP
X
CMIS portal/viewer
Oracle (OLAP Viewer)
Peripleo
IP Viewer
Database Visualization Toolkit
AIP2DIP (E-ARK Web)
Geodata
E-ARK Web Search
QGIS
Geoserver
Lily - Ingest
Order Management Tool
SMURF SFSB
Search and Display GUI
HDFS-Storage
ESSArch Preservation Platform
SMURF ERMS
RODA Repository
SIARD 2.0
SIP2AIP (E-ARK Web)
E-ARK AIP
ESSArch Tools for Archive (ETA)
E-ARK SIP
SIP creator (E-ARK Web)
RODA-In
ERMS Export Module
E-ARK Tools
Database Preservation Toolkit
E-ARK Format
specifications
Ingest - Storage
SOLR Index
Pre-Ingest
Universal Archiving Module
OAIS relevance
Use-case
E-ARK specifications
E-ARK Tools
Data
Description
Data type
Metadata format
Quantity
OAIS Relevance
Integration with OMT via E-ARK DIP
An Archive uses a combination of the Order Management Tool (OMT) and E-ARK IP Viewer to provide access to
existing digital objects to its users. In order to articulate the RODA repository system with these tools, a new
process has been developed for RODA that enables an archivist to create E-ARK compliant DIPs. These files can
then be downloaded and added to the OMT workflows in order to be served to the end-user.
ESSArch Tool for Producer (ETP)
Additional scenario
Description
X
Repository succession via E-ARK AIP (E-ARK AIP exchange experiments)
A repository system has reached the end of its expected lifetime. The head of the Archive has decided to move
to a next-generation long-term digital repository system. This will unavoidably imply the migration of metadata
records, millions of files, and terabytes of data from the legacy repository system to the newly adopted one.
Because of the large scale of this operation, this procedure should entail careful planning, validation and
support. However, to simplify the migration of data between the two systems, the head of the Archive opted
for a repository system that is compliant with the E-ARK AIP specification. By doing so, the migration of data
E-ARK AIP
RODA Repository, E-ARK Web
Test data
Different kinds of letters and documents
Not relevant
Not relevant
small
Storage - Access
E-ARK DIP
X
CMIS portal/viewer
Oracle (OLAP Viewer)
Peripleo
IP Viewer
Database Visualization Toolkit
AIP2DIP (E-ARK Web)
Geodata
E-ARK Web Search
QGIS
Geoserver
X
Lily - Ingest
X
Order Management Tool
SOLR Index
SMURF SFSB
HDFS-Storage
X
ESSArch Preservation Platform
RODA Repository
SIP2AIP (E-ARK Web)
ESSArch Tools for Archive (ETA)
E-ARK AIP
SMURF ERMS
SIP creator (E-ARK Web)
E-ARK SIP
SIARD 2.0
Universal Archiving Module
RODA-In
ERMS Export Module
Database Preservation Toolkit
E-ARK Tools
Ingest - Storage
Search and Display GUI
Pre-Ingest
E-ARK Format
specifications
ESSArch Tool for Producer (ETP)
OAIS relevance
Use-case
E-ARK specifications
E-ARK Tools
Data
Description
Data type
Metadata format
Quantity
OAIS Relevance
was greatly simplified. Data and metadata do not need to be transformed, restructured or reshaped in any
way. AIPs just need to be copied to the storage area of the new repository (or linked to) and the new repository
needs to re-index the entire set of AIPs.
In order to implement the scenario, a selection of AIPs will be transferred from the RODA repository system to
the E-ARK Web reference implementation. Prior to the transfer, a process needs to be run over the
selected AIPs that will generate a manifest file in the root of the AIP folder (mets.xml). After receiving the AIPs,
E-ARK Web will re-index them, thus merging them with the rest of its managed data.
Archival Storage
Please note that more details with screenshots on scenario execution are provided in the deliverable D2.4 Pilot
Documentation.
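The repository succession scenario above relies on a process that writes a manifest file (mets.xml) into the root of each AIP folder before transfer. The sketch below illustrates the general idea only, under stated assumptions: it builds a heavily simplified METS-style file inventory with SHA-256 checksums, it is not the RODA process, and its output does not conform to the E-ARK AIP METS profile (which requires further sections such as a structural map). The function names are hypothetical.

```python
import hashlib
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large TIFF files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(aip_root: Path) -> Path:
    """Write a simplified file inventory to <aip_root>/mets.xml and return its path."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    manifest = aip_root / "mets.xml"
    mets = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "all"})
    # Inventory every file below the AIP root except the manifest itself.
    files = sorted(p for p in aip_root.rglob("*") if p.is_file() and p != manifest)
    for index, path in enumerate(files, start=1):
        file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", {
            "ID": f"file-{index}",
            "SIZE": str(path.stat().st_size),
            "CHECKSUM": sha256_of(path),
            "CHECKSUMTYPE": "SHA-256",
        })
        ET.SubElement(file_el, f"{{{METS_NS}}}FLocat", {
            "LOCTYPE": "URL",
            f"{{{XLINK_NS}}}href": path.relative_to(aip_root).as_posix(),
        })
    ET.ElementTree(mets).write(str(manifest), encoding="utf-8", xml_declaration=True)
    return manifest


if __name__ == "__main__":
    # Build a throwaway AIP-like folder to demonstrate the call.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "representations").mkdir()
        (root / "representations" / "page1.tif").write_bytes(b"not a real TIFF")
        print(write_manifest(root).read_text(encoding="utf-8"))
```

Recording checksums in the manifest lets the receiving repository verify each copied file before re-indexing, which fits the scenario's premise that AIPs are copied rather than transformed.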
Execution report
The aim of pilot 6 was to assess the efficacy of the E-ARK Information Package Specifications, which define how
metadata and data should be packaged in order to move records between the three stages of record keeping - active,
semi-active and inactive.
In a typical setting, a record that needs to be archived usually falls into one of these three “ages”:
1. Active - when the metadata and data are “live” being used and modified regularly.
2. Semi-active - when the metadata and data are archived for a short period – say up to 5 years.
3. Inactive - when the metadata and data are moved to a long-term repository for permanent conservation.
The pilot aimed to ensure the seamless transfer of information between the semi-active and the inactive stages in a
way that ensures that no relevant data or metadata is lost in the process. To accomplish this goal, a special
integration tool was developed that implemented the package specifications and orchestrated the entire transfer
process.
The pilot worked with data from a public institution whose “active” records have been initially produced and
managed in an electronic records management system and then transferred to the archival service of that same
institution for temporary conservation - semi-active stage. The archival service is, however, not prepared to face the
challenges of long-term digital preservation, so the records that have been selected for permanent conservation need
to be transferred to a long-term digital repository (the third “age”). This is where this pilot comes in.
The overall goal of the pilot was to ensure that the information package specifications developed in E-ARK and the
integration procedures developed are appropriate to support the transfer of records between an active or
semi-active archival system and a long-term preservation repository.
Scenario: 1. Migration and Ingest of business records from bespoke business system (Data set 1)
Started: May 2016
Completed: July 2016
Summary: Our initial claim was that a systems integration approach was one of the most effective ways to support
demanding archival workflows. In our view, this claim has largely been proven. In a short amount of time, an automatic
routine has been developed and implemented that is capable of moving millions of digital objects between the
semi-active and inactive stages of an archival workflow with little or no human intervention.
Additional scenarios: Integration with OMT via E-ARK DIP; Repository succession via E-ARK AIP (E-ARK AIP exchange
experiments)
Started: December 2016
Completed: January 2017
Summary: Until the very end of the project we didn’t know whether we would have time and resources to run these
scenarios. The E-ARK DIP has been generated and the E-ARK AIP exported but the evaluation of the integration could not
be finished. We are planning to finish the scenarios in the next couple of weeks.
Changes to the original plans
At the pilot planning phase the Porto Municipality also showed great interest in participating in an automatic ingest
scenario, so a second, additional scenario was planned with the same E-ARK component and infrastructure. Later
they had resource planning problems with the local developer who was needed to implement the producer-side
infrastructure. The discussions and preparations continued until August 2016, when the Porto Municipality finally
decided to delay the project. It is still possible that this additional scenario can be executed in the near future, but
definitely not within the time frame of the current project.
Feedback report
The following table summarizes the feedback communication between the pilot staff and tool developers or format
specification providers.
E-ARK Tool – Version: RODA Repository
For the complete issue history, please refer to the GitHub page: https://github.com/keeps/roda
Used in tasks: Ingest of records
Data (input / output): Historical records, 300 dpi uncompressed TIFF files, 1,2 TB
Performance: Good
Issues: None
Wishes: None
Comments: None
Experiences and recommended practices: Real world usage brought new requirements to the ingest process of the
repository but these have been solved by the RODA development team.
Recommended practices and further recommendations
This pilot taught us a few lessons, which are summarised below.
Requirements emerged from real-world use
Working with real-world data and workflows showed us that the repository system had to accommodate additional
requirements. For example, the ingest workflow had to be revised so that existing AIPs can be updated with
information included in SIPs (called Update SIPs), and full support for Update SIPs had to be added to the
specification and software libraries. Moreover, resilience is an important characteristic of an unattended systems
integration: retry mechanisms were added to the RIP application to cope with network failures and temporary
service unavailability.
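The retry behaviour mentioned above can be illustrated with a minimal sketch. It is not the RIP application's actual code: the function name call_with_retry, its parameters and the choice of exponential backoff with jitter are assumptions made purely for illustration.

```python
import random
import time


def call_with_retry(operation, max_attempts=5, base_delay=1.0,
                    retry_on=(ConnectionError, TimeoutError),
                    sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff.

    Hypothetical helper: all names and defaults are illustrative only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: re-raise so the caller can log and escalate.
                raise
            # Wait 1x, 2x, 4x ... the base delay, plus a little random jitter
            # so that several clients do not retry at exactly the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
```

A transfer step would be wrapped as call_with_retry(lambda: submit(sip)), where submit stands for whatever call delivers a package to the repository. Only errors listed in retry_on are retried, so genuine faults such as a malformed package still surface immediately.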
Well-established patterns proved to be a successful formula
The RIP application follows a well-established software design pattern called “Pipes and Filters”. This pattern uses
a sequence of tasks (called “filters”), each of which handles one part of the processing workflow. Each filter is
programmed to be simple and stateless. Data is streamed whenever possible, so that a filter can start processing
data before the previous filter has processed the entire data set. The most interesting aspect of this pattern is that
filters in the processing chain can be changed without breaking the workflow. The same workflow can therefore
process data from different data sources, which enables the reuse of the application in many different scenarios. For
example, we experimented with other scenarios that take a well-structured folder system as input; by merely
changing the data source filter we were able to ingest that data with very little effort.
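A small sketch may make the pattern more concrete. It uses Python generators as filters so that each record flows through the whole chain before the next one is read. The filter names (folder_source, list_source, add_size, skip_empty) and the record layout are hypothetical and are not taken from the RIP application.

```python
from pathlib import Path


def folder_source(root):
    """Data source filter: yield one record per file found below root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            yield {"path": path}


def list_source(paths):
    """Alternative data source: yield records from an explicit list of paths."""
    for path in paths:
        yield {"path": Path(path)}


def add_size(records):
    """Stateless filter: enrich each record with the file size in bytes."""
    for record in records:
        yield {**record, "size": record["path"].stat().st_size}


def skip_empty(records):
    """Stateless filter: drop records that describe empty files."""
    for record in records:
        if record["size"] > 0:
            yield record


def run_pipeline(source, *filters):
    """Connect the source to the filters; nothing runs until it is iterated."""
    stream = source
    for step in filters:
        stream = step(stream)
    return stream
```

Switching the input from a folder tree to an explicit list only changes the first argument, for example run_pipeline(list_source(paths), add_size, skip_empty); the remaining filters are reused unchanged, which is the property the pilot relied on.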
Systems integration is the way forward
Our initial claim was that a systems integration approach was one of the most effective ways to support demanding
archival workflows. In our view, this claim has largely been proven. In a short amount of time, an automatic routine
has been developed and implemented that is capable of moving millions of digital objects between the semi-active
and inactive stages of an archival workflow with little or no human intervention. Questions of accountability and
quality assurance of the entire process always remain; however, the repository side already supports a human
validation step at the end of its ingest workflow. This helps to mitigate these issues, because in the end a human
expert attests the quality of the entire process.
Pilot 7 – Access to Databases
Pilot 7: Access to Databases
Task leader: National Archives of Hungary
Supported by: Danish National Archives
Scope: Representation of not less than 2 databases of different sizes and complexities with restricted and open
content. Extract data from the EDRMS and the databases, create SIPs for structured and unstructured records using the
ESSArch Tools, ingest the SIPs to the repository using the ESSArch Preservation Platform. For further evaluation
NAH will extract structured content from an Oracle database with the tools developed by WP3. The pilot will
examine the applicability of data-warehouse concepts in an archival environment in order to maintain both
the original structure and intellectual interpretability of ingested data. The working prototype for access will
be a user-friendly web-based application based on the DIP specification of WP5.
Contacts
Contact person: Zoltan Lux, e-mail: [email protected], Skype: lux.zoltan1
Pilot staff member: József Mezei, e-mail: [email protected], Skype: jmezei_92
Scenarios (short description)
Scenario 1: SIP Creation and Ingest of old (not normalized) database in SIARD 2.0 format
Scenario 2: SIP Creation and Ingest of unstructured files
Scenario 3: Extract SIARD Package from Preservica/E-ARK AIP (APEX/Oracle BI access)
Scenario 4: Search and present SIARD based information with E-ARK access tools
Scenario 5: Access information from unstructured files
Additional scenario: Cross-country search with E-ARK Web (joint scenario with NAS)
Scenarios
Scenario 1: SIP Creation and Ingest of old (not normalized) database in SIARD 2.0 format
Description: Create SIP from old (not normalized) database B25. The data is in CSV exports of DBASE files. Create both
E-ARK and local SIPs and ingest them into E-ARK Web HDFS storage and Preservica archival repository. Both E-ARK
and local AIPs are generated during the ingest.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Relational database based on SIARD 2.0
E-ARK specifications: E-ARK SIP, E-ARK AIP
E-ARK Tools: DBPTK, RODA-In, SIP2AIP (E-ARK Web), HDFS-Storage
Data: Hungarian Prosecution Office database
Data description: Old (not normalized) database in CSV exports of DBASE files.
Data type: CSV files
Metadata format: none
Quantity: more than 300.000 cases and 500.000 names (1,6 GB)
Scenario 2: SIP Creation and Ingest of unstructured files
Description: Create SIP from scanned documents of the meeting minutes of the Central Committee of the Hungarian
Socialist Party. The image files are in PDF format with EAD metadata. Create both E-ARK and local SIPs and
ingest them into B27 and Preservica archival repository. Both E-ARK and local AIPs are generated during the
ingest.
OAIS relevance: Pre-Ingest, Ingest
Use-case: Other (Extract and Ingest computer files from simple file-system)
E-ARK specifications: E-ARK SIP, E-ARK AIP
E-ARK Tools: RODA-In, SIP2AIP (E-ARK Web), HDFS-Storage
Data: Scanned meeting minutes of the Central Committee of the Hungarian Socialist Party
Data description: Scanned documents in file systems in PDF file and corresponding metadata (EAD)
Data type: PDF/JPG files (representations)
Metadata format: EAD
Quantity: 123.225 files (101 GB)
Scenario 3: Extract SIARD Package from Preservica/E-ARK AIP
Description: Access database information of the Hungarian Prosecution Office in SIARD format using APEX and OWB
access. Both E-ARK and local DIPs are generated during access.
OAIS relevance: Access
Use-case: Other (Access database via APEX and Oracle BI)
E-ARK specifications: E-ARK AIP, E-ARK DIP
E-ARK Tools: HDFS-Storage, Lily – Ingest, E-ARK Web (Search), AIP2DIP (E-ARK Web)
Data: Hungarian Prosecution Office database
Data description: Old (not normalized) database in CSV exports of DBASE files.
Data type: CSV files
Metadata format: none
Quantity: more than 300.000 cases and 500.000 names (1,6 GB)
Scenario 4: Search and present SIARD based information with E-ARK access tools
Description: Access database information of the Hungarian Prosecution Office in SIARD format using HADOOP based
search and access with HIVE Lily Presentation in local environment.
OAIS relevance: Access
Use-case: Access data with OLAP via oracle
E-ARK specifications: E-ARK AIP, E-ARK DIP
E-ARK Tools: HDFS-Storage, Lily – Ingest, E-ARK Web (Search), AIP2DIP (E-ARK Web), DBVTK
Data: Hungarian Prosecution Office database
Data description: Old (not normalized) database in CSV exports of DBASE files.
Data type: CSV files
Metadata format: none
Quantity: more than 300.000 cases and 500.000 names (1,6 GB)
Scenario 5: Access information from unstructured files
Description: Create DIP from scanned documents of the meeting minutes of the Central Committee of the Hungarian
Socialist Party. The image files are in PDF format with EAD metadata in E-ARK Web HDFS storage and
Preservica. Create both E-ARK and local DIPs.
OAIS relevance: Access
Use-case: Access databases via SOLR (no-sql). Access data from E-ARK Web / HDFS storage and from the local system.
SOLR is used to search the full-text index generated from the documents.
E-ARK specifications: E-ARK AIP, E-ARK DIP
E-ARK Tools: HDFS-Storage, AIP2DIP (E-ARK Web), Lily – Ingest, E-ARK Web (Search), Single file Viewer
Data: Scanned meeting minutes of the Central Committee of the Hungarian Socialist Party
Data description: Scanned documents in file systems in PDF file and corresponding metadata (EAD)
Data type: PDF/JPG files (representations)
Metadata format: EAD
Quantity: 123.225 files (101 GB)
Additional scenario: Cross-country search with E-ARK Web (joint scenario with NAS)
Description: The SOLR index and E-ARK Web infrastructure theoretically make it possible to perform a federated search
over more than one archive. When the SOLR index of the other archival institution can be “seen” by the search
engine (e.g. one institution has access rights to the other's SOLR) then it can make a common list of the results.
The National Archives of Slovenia and the National Archives of Hungary both have an E-ARK implementation at
their pilot sites. This scenario is a simple feasibility study of cross-country search.
OAIS relevance: Access
Use-case: Search and Display
E-ARK Tools: E-ARK Web
Data: Test data in the SOLR index
Data description: The SOLR index of the two archives will be theoretically connected in this scenario
Data type: Not relevant
Metadata format: Not relevant
Quantity: small
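The “common list of the results” in the cross-country search scenario can be pictured with the following sketch. It does not call SOLR or E-ARK Web: each archive is represented by a hypothetical search function that returns (score, document id) pairs, and the merge simply ranks all hits together. Relevance scores from two independent indexes are not strictly comparable, so ranking by raw score is an assumption made only to keep the example short.

```python
import heapq


def federated_search(query, archives, limit=10):
    """Merge the hits of several archives into one ranked list.

    archives maps an archive name to a search callable (hypothetical
    interface) that returns a list of (score, doc_id) tuples for the query.
    """
    merged = []
    for name, search in archives.items():
        for score, doc_id in search(query):
            # Keep the archive name so each hit can be traced to its source.
            merged.append((score, name, doc_id))
    # Highest score first; ties are broken by archive name, then document id.
    return heapq.nlargest(limit, merged)
```

In a real deployment each callable would query an index that the other institution has agreed to expose, which is exactly the access-rights question that led to the suspension of this scenario.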
Execution report
Two pilots (5 and 7) decided to test the tools’ compatibility beyond their core functionality. The core of the Hungarian
pilot infrastructure was E-ARK Web. E-ARK Web has two deployment options; Hungary used the full deployment. In the
beginning it was necessary to create a common understanding between AIT (as developer) and NAH (as user) of a
very complex system: everyone had to understand how it works and what the idea behind some of the features is.
The AIT developers were eager to create a very usable set of components and helped in every way. In the end we
think that E-ARK Web is a very useful solution that can be combined well with other E-ARK tools.
Scenario: 1. SIP Creation and Ingest of old (not normalized) database in SIARD 2.0 format
Started: April 2016
Completed: September 2016
Summary: 283 SIARD 2.0 packages have been created and ingested to Preservica.
Scenario: 2. SIP Creation and Ingest of unstructured files
Started: May 2016
Completed: October 2016
Summary: 3703 SIPs have been created and ingested to Preservica.
Scenario: 3. Extract SIARD Package from Preservica/E-ARK AIP (APEX/Oracle BI access)
Started: June 2016
Completed: October 2016
Summary: Data Explorer (Oracle APEX) was used in this scenario for accessing the databases archived in SIARD 2.0
packages. Scenario has been successfully performed.
Scenario: 4. Search and present SIARD based information with E-ARK access tools
Started: October 2016
Completed: November 2016
Summary: Access to database information archived in SIARD 2.0 format was provided using HADOOP based search and
access with Lily Presentation in local environment. By OWB the original model can be converted into a Data
Warehouse model.
Scenario: 5. Access information from unstructured files
Started: September 2016
Completed: October 2016
Summary: DIP was successfully created for the archived scanned documents.
Additional scenarios: Cross-country search with E-ARK Web (joint scenario with NAS)
Started: December 2016
Completed: January 2017
Summary: The scenario execution was suspended because of security considerations by the archives. The cross-country
search is technically feasible but from a security point of view it is risky. If in the future the archives build the
infrastructure to implement a publicly accessible E-ARK Web solution outside their firewall, then it can be reached
from the search engine of another archive with E-ARK Web.
Changes to the original plans
There were no major changes. The scenarios have been performed according to plans in DoW and D2.3 Detailed Pilot
Requirements.
Feedback report
E-ARK Tool – Version: E-ARK Web (Virtual deployment)
For the complete issue history, please refer to the GitHub page: https://github.com/eark-project/earkweb
Used in tasks: SIP to AIP conversion, Lily ingest, SOLR search, AIP to DIP conversion
Data (input / output): Input: 2 different data sets. Output: depending on component
Performance: OK
Issues: At the beginning there were some issues, mostly with compatibility. No issues left at the end of the pilot
Wishes: None
Comments: None
Experiences and best practices: None
E-ARK Tool – Version: Database Preservation Toolkit (version 2.0.0-beta4.2)
For the complete issue history, please refer to the GitHub page: https://github.com/keeps/db-preservation-toolkit
Used in tasks: Data extraction – scenario 1
Data (input / output): Input: Hungarian prosecution office data. Output: SIARD 2.0 package
Performance: Excellent
Issues: There have been several issues with DBPTK related to SIARD 2.0 output. KEEP Solutions has corrected all the
bugs and the response time was excellent. After the completion of the scenarios no known issues remained.
Wishes: A tool or function for automatic validation of SIARD 2.0 would be nice to have.
Comments: None
Experiences and recommended practices: None
E-ARK Tool – Version: RODA-In (2.0.0 Alpha 7.4)
For the complete issue history, please refer to the GitHub page: https://github.com/keeps/roda-in
Used in tasks: Create SIP - Create an E-ARK SIP Package
Data (input / output): Input: Unstructured data. Output: E-ARK SIP in a *.zip file
Performance: OK
Issues: No issues left at the end of the pilot
Wishes: None
Comments: None
Experiences and best practices: None
E-ARK Tool – Version: IP Viewer
Used in tasks: View DIP
Data (input / output): Input: DIP
Performance: Good
Issues: None
Wishes: None
Comments: None
Experiences and best practices: None
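The wish for automatic SIARD 2.0 validation recorded for the Database Preservation Toolkit can be approximated, very roughly, by a structural pre-check. The sketch below is not a validator and is not part of DBPTK: it assumes only that a SIARD file is a ZIP container holding header/metadata.xml and a content/ folder, and it checks nothing about the XML itself. The function name quick_siard_check is hypothetical.

```python
import zipfile

# Assumption: the metadata entry a SIARD 2.0 package is expected to contain.
REQUIRED_ENTRY = "header/metadata.xml"


def quick_siard_check(path):
    """Return a list of structural problems; an empty list means none found."""
    if not zipfile.is_zipfile(path):
        return ["not a ZIP container"]
    problems = []
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        if REQUIRED_ENTRY not in names:
            problems.append("missing " + REQUIRED_ENTRY)
        if not any(name.startswith("content/") for name in names):
            problems.append("missing content/ folder")
        # testzip() re-reads every entry and returns the first corrupt one.
        corrupt = archive.testzip()
        if corrupt is not None:
            problems.append("corrupt entry: " + corrupt)
    return problems
```

A check of this kind only catches damaged or obviously incomplete packages after transfer; validating the metadata against the SIARD schema would still require a dedicated tool.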
Recommended practices and further recommendations
AIT – E-ARK WEB
E-ARK Web’s SIP creator is too simple an application for real-life scenarios. We therefore used the more
complex RODA-In instead.
Even when ingesting only one SIP we recommend using the Batch SIP ingest, because it runs almost every ingest
task automatically, so you do not have to start each task manually. To understand the workflow, however,
one should run the tasks manually once or twice.
Please note that with Batch SIP ingest the AIPs are not uploaded into Lily automatically. The AIPs can be loaded
into Lily in a later step.
RODA-In
RODA-In offers many features that make SIP creation easy and fast. Take your time and examine all the
possibilities.
If you select a folder tree, drop it in the centre, and want to fill in the metadata cells with similar data, hold
CTRL, select every SIP in the centre field, fill in the metadata cells on the right, and click OK. The selected SIPs
now share the same metadata. Some metadata cells cannot be the same across SIPs.
We had many folders in a root folder, and every folder had two subfolders. We dropped them into the
centre field and used the second option, which makes every folder an SIP. On the right side we created a
second representation and separated the two subfolders into rep1 and rep2. The files were JPG in the
first folder and PDF/A in the second.
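The folder layout described above (one SIP per folder, one representation per subfolder) can be previewed before the folders are dropped into RODA-In. The following sketch does not use RODA-In at all; plan_sips is a hypothetical helper that only lists which files would end up in rep1 and rep2, assuming the subfolders are taken in alphabetical order.

```python
from pathlib import Path


def plan_sips(root):
    """Map each folder under root to its representations and their files.

    Hypothetical preview helper: every direct subfolder of root becomes one
    SIP, and each of its subfolders becomes rep1, rep2, ... in sorted order.
    """
    plan = {}
    for sip_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        subfolders = sorted(p for p in sip_dir.iterdir() if p.is_dir())
        plan[sip_dir.name] = {
            f"rep{index}": sorted(f.name for f in sub.iterdir() if f.is_file())
            for index, sub in enumerate(subfolders, start=1)
        }
    return plan
```

Printing the plan for the root folder makes it easy to spot a folder that has one or three subfolders instead of two before thousands of SIPs are created.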
DBPTK/DBVTK
If you would like to use DBVTK and DBPTK together, make sure that the version of DBPTK is compatible with the
version of DBVTK that you would like to use; otherwise you might later have to recreate every single SIARD file.
When you export from an Oracle database with DBPTK and want to import the SIARD file into your own database, you
might have to recreate the same environment for the import, because there could be a problem with the
tablespace names.
Oracle Warehouse Builder and OLAP Viewer
This is a very nice and informative way of presenting data. It should be noted, however, that the whole procedure of
creating this result requires a lot of effort. This is not an automatic procedure of DIP creation.
External evaluations
We have encountered growing interest in the E-ARK project and its results in the archival community. At
DLM Forum meetings and at the E-ARK Final Conference we talked to people who not only showed general
interest in E-ARK tools and format specifications but also plan to try them in the near future and asked for
support with specific problems.
Promoting and supporting external evaluation of our products has been a primary task of WP2. An external evaluation
or validation, according to the Description of Work, is an evaluation or implementation of E-ARK products by
members of DLM Forum and DPC or third parties outside the project with limited involvement from consortium
members.
The following organisations have performed (or are performing) external evaluation activities during the project:
Organization: National Archives and Records Administration (NARA, USA)
Title: Testing SIARD 2.0
Scenario description: NARA has performed 1 pre-ingest, 1 pre-ingest/ingest and 1 access scenario, archiving 2
different databases as SIARD 2.0 files with the Database Preservation Toolkit. NARA has generated SIARD 2.0
files from databases, created SIPs in local format and ingested them to their local preservation system.
Status: Completed
Organization: Ministerio de Hacienda y Función Pública (MinHAP)
Title: Archiving complete databases
Scenario description: MinHAP plans to test DBPTK for archiving databases. They are generating SIARD 2.0 files from
MySQL and later from Oracle databases. They also plan to test E-ARK SIP creation tools for creating E-ARK SIP
information packages in the future, but today MinHAP uses the Spanish SIP standard.
Status: In progress
Organization: Swiss Federal Archive (SFA)
Title: SIARD 2.0 validation
Scenario description: Testing DBPTK and validating DBPTK's SIARD 2.0 output. The new version of SIARD has been
developed in cooperation by the E-ARK project and the Swiss Federal Archive. SFA plans to test DBPTK and validate
the created SIARD 2.0 files.
Status: Under preparation
Organization: Agenda Open Systems (AOS)
Title: Testing the possible use of ERMS Export Module
Scenario description: Agenda Open Systems is an Alfresco service provider in Slovenia. They are interested in the
product. The latest version with source code has recently been sent to AOS.
Status: Under preparation
Organization: National Archives of Chile (NACh)
Title: Piloting E-ARK toolset for electronic archiving
Scenario description: The NACh has no electronic archival solution so far. They had been planning to launch one when
they heard about the E-ARK project. We have had several conversations with their consultant on the subject, Daniel
Cáceres, about the possibilities of trying a subset of the E-ARK tool portfolio. They are really interested but
organizational and IT arrangements go very slowly. At the time of this report there is no official decision about
the project.
Status: Preliminary arrangements are in progress at the archive in order to test and launch their first electronic
archival solution.
The following slides are from the presentation by Brett Abrams of NARA at the E-ARK Final Conference, at Budapest.
Please note that at the moment of finishing this document some of the above external evaluation scenarios are still in
progress. Since they are outside of the project, E-ARK had no influence on resource planning or scheduling of these
activities.
We have found it very encouraging that major external organisations are already starting to work with our project
tools in preparation for deploying them operationally.
E-ARK project members are committed to promoting and supporting the above and later external evaluations after the
official end of the project.
Pilot evaluation
This chapter provides an evaluation of the pilots against their goals, given as detailed success criteria in the document
D2.3 Detailed Pilot Requirements.
Work Package 2 Objectives (according to the Description of Work):
The overall objective of this work package is to ensure that the scenarios implemented at 7 identified pilot sites
are both realistic and relevant. That is, that they bring together a meaningful subset at each site of the use cases that
establish a general model of the E-ARK service.
Project level pilot success evaluation
Pilot level success criteria as defined in D2.3 Detailed Pilot Requirements
Requirement 7.2 (MoSCoW: M): The whole E-ARK full-scale pilot is successful if all the high-level E-ARK
use cases are piloted in at least one of the pilots
Requirement 7.3 (MoSCoW: M): The whole E-ARK full-scale pilot is successful if all of the core E-ARK tools
are piloted in at least one of the pilots
Requirement 7.4 (MoSCoW: M): The whole E-ARK full-scale pilot is successful if most of the E-ARK web
(Integrated Prototype) tools are piloted in at least one of the pilots
E-ARK use cases
Use Case
Pre-Ingest
Ingest
Access
Pilot
Scenario
Successful?
Extract and Ingest relational database based on SIARD 2.0 Pilot 1
Pilot 4
Pilot 7
External
evaluation
Extract and Ingest ERMS records based on MoReq2010
Pilot 2
Pilot 3
Pilot 1,3
Extract and Ingest computer files from simple file-system Pilot 5
– GML
Extract and Ingest computer files from simple file-system Pilot 5
- Other (please specify)
Pilot 6
Pilot 7
Ingest E-ARK SIP (Generate E-ARK AIP)
Pilot 2
Pilot 5
Pilot 6
Pilot 7
Access databases via DBVTK (sql)
Pilot 4
Pilot 1
Access databases via SOLR (no-sql)
Pilot 5
Pilot 7
Scenario 1-4
Scenario 1-4
Scenario 1
NARA,
MinHAP, SFA
Scenario 1-3
Scenario 1,3
Additional sc.
Scenario 1,3
Access single ERMS records
Pilot 3
Pilot 2
Scenario 2,4
Additional sc.
Access geodata via qgis
Pilot 5
Scenario 2,4
Access data with OLAP via oracle
Pilot 7
Scenario 4
E-ARK tools and format specifications
Scenario 1,3
Scenario 1
Scenario 2
Scenario 1-3
Scenario 1,3
Scenario 1
Scenario 1-2
Scenario 1-4
Additional sc.
Scenario 3
Scenario 3-5
Pre-Ingest
Tools
Pilot
Scenario
Database Preservation Toolkit
Universal Archiving Module
Pilot 1
Pilot 4
Pilot 7
External
evaluation
Pilot 1
Pilot 3
Pilot 5
Pilot 7
Pilot 2
Pilot 2
Pilot 5
Pilot 3
Scenario 1-4
Scenario 1,2
Scenario 1
NARA,
MinHAP, SFA
Additional sc.
Additional sc.
Scenario 1
Scenario 1,2
Scenario 1-3
Additional sc.
Scenario 3
Scenario 1,3
SIP creator (E-ARK Web)
Pilot 7
Scenario 2
ESSArch Tools Archive (ETA)
RODA Repository
Pilot 3
Pilot 5
Pilot 5
Pilot 7
Pilot 6
Scenario 1,3
Scenario 2
Scenario 1,2
Scenario 1,2
Scenario 1
ESSArch Preservation Platform
Pilot 3
Scenario 1,3
HDFS-Storage
Pilot 7
Scenario 1-5
ERMS Export Module
RODA-In
ESSArch Tool Producer (ETP)
- Redesigned UI, E-ARK compatible version
Ingest
SIP2AIP (E-ARK Web)
Successful?
Access
Tools
Pilot
Scenario
SOLR Index
Search and Display GUI
Pilot 5
Pilot 7
Pilot 5
Scenario 1-4
Scenario 1-5
Scenario 2,4
Order Management Tool
Pilot 5
Scenario 2,4
Lily – Ingest
Pilot 5
Pilot 7
Scenario 2,4
Scenario 3-5
Geoserver
Pilot 5
Scenario 2,4
QGIS
Pilot 5
Scenario 1-4
E-ARK Web Search
Pilot 7
Scenario 3-5
AIP2DIP (E-ARK Web)
Pilot 5
Pilot 7
Scenario 2,4
Scenario 3-5
Database Visualization Toolkit
Pilot 4
Pilot 1
Pilot 5
Pilot 7
Scenario 2,4
Additional sc.
Scenario 2,4
Scenario 5
Peripleo
Pilot 5
Scenario 2,4
Oracle (OLAP Viewer)
Pilot 7
Scenario 4
CMIS portal/viewer
Pilot 3
Scenario 2,4
Pilot
Scenario
Pilot 2
Pilot 3
Pilot 5
Pilot 6
Pilot 7
Pilot 2
Pilot 5
Pilot 6
Pilot 7
Pilot 3
Pilot 5
Pilot 7
Pilot 1
Pilot 4
Pilot 7
External
evaluation
Scenario 1-3
Scenario 1,2
Scenario 1,2
Scenario 1
Scenario 1,2
Scenario 1-3
Scenario 1,2
Scenario 1
Scenario 1,2
Scenario 2,4
Scenario 2,4
Scenario 3-5
Scenario 1-4
Scenario 1-4
Scenario 1
NARA,
MinHAP, SFA
IP Viewer
Use Case
Information
Package format
specification
E-ARK SIP
(Supplier Information Package)
E-ARK AIP
(Archival Information Package)
E-ARK DIP
(Dissemination Information Package)
Content type
specification
SIARD 2.0
Successful?
Successful?
Pilot 2
Pilot 3
Pilot 1,3
Pilot 5
Pilot 6
Pilot 7
E-ARK SMURF ERMS
E-ARK SMURF SFSB
Pilot 5
E-ARK SMURF Geodata
Scenario 1-3
Scenario 1-4
Additional sc.
Scenario 1-4
Scenario 1
Scenario
2,5+D14
Scenario 1-4
Pilot and scenario level success evaluation
The full-scale pilots have pilot level and scenario level success criteria defined in D2.3 Detailed Pilot Requirements.
The following table provides the evaluation details at both levels.
Successful?
Pilot / Scenario
Success criteria
Pilot 1
The following E-ARK tools will be tested in a pilot environment:
Database Preservation Toolkit
Scenario 1
Extract records from MS SQL Server database containing 50-60 tables and about
90.000 records. (95% success rate)
Scenario 2
Extract records from a MySQL database with about 5 million records. (95% success rate)
Scenario 3
Extract records from MS SQL Server database containing documents. (95% success
rate)
Scenario 4
Extract records from MS SQL Server database containing documents. (95% success
rate)
Pilot 2
The following E-ARK tools will be tested in a pilot environment:
ESSArch Tools Producer (ETP), ESSArch Tools Archive (ETA), ESSArch Preservation
Platform (EPP).
This pilot will be considered a success if we are able to use and evaluate these tools in
all three scenarios, producing an output that can be stored in depot. The National
Archives of Norway have been using an earlier version of EPP in production for a
couple of years; ETP and ETA are newly developed software, from which user
experience will be gathered and disseminated during piloting.
The new version of ETP was tested in an additional scenario because of
incompatibilities in the producer’s IT infrastructure. The ETP tool has also been tested
in Pilot 5.
Scenario 1
Ingest around 20 GB of EDRMS data from a public producer, converted into Noark 4
output
Scenario 2
Ingest around 5 GB of EDRMS data from a public producer, converted into Noark 4
output
Scenario 3
Ingest around 335.000 registered persons (105 MB) from the national registry of
licenced hunters.
Pilot 3
The following E-ARK tools will be tested in a pilot environment:
ERMS Export Module (see Additional Scenario), UAM (Universal Archival Module), E-ARK CMIS Browser (Yes/No)
The ERMS Export Module was tested in 2 additional scenarios because of the late
deployment of the appropriate version corresponding to local producer’s
requirements.
Scenario 1
Extract records from EDRM, create and ingest SIP of different documents of Ministry
of Justice with different retention period (95% success rate)
Scenario 2
Provide access to archived records of Ministry of Justice (95% success rate)
Scenario 3
Extract records from EDRM, create and ingest SIP of different documents of Ministry
of Justice with different retention period (95% success rate)
Scenario 4
Provide access to archived records of Ministry of Justice (95% success rate)
Pilot 4
The following E-ARK tools were tested in a pilot environment:
Database Preservation Toolkit (Done), RODA-In (see note below)
RODA-In was not used in this pilot because the native SIP creation tool was required to
ingest into the preservation system of the Business Archives. RODA-In was, however,
tested in Pilots 5 and 7.
Scenario 1
Exporting records from database for more than 12 000 business records from bespoke
business system
Scenario 2
Importing records to database for more than 12 000 business records from bespoke
business system
Scenario 3
Exporting records from database with files for more than 200 000 business records
from bespoke business system
(success rate 85% due to the complicated database architecture)
Scenario 4
Importing records to database with files for more than 200 000 business records from
bespoke business system
(success rate 85% due to the complicated database architecture)
Pilot 5
The following E-ARK tools will be tested in a pilot environment:
ESSArch Tools Producer (ETP), ESSArch Tools Archive (ETA), ESSArch Preservation
Platform (EPP), Search and Display GUI, Order Management Tool, IP Viewer,
along with components of the Integrated Prototype (E-ARK Web):
Order Submission Service (see note below), Lily-Ingest, Geoserver, Peripleo, with the
integration of QGIS (Yes/No)
In the final order management solution of WP5, the Order Submission Service is no longer a
separate software component. The planned functionality has been
implemented in the Order Management Tool.
Scenario 1
SIP creation, verification and ingest of more than 1000 records with a vector geodata
layer.
(90% success rate)
Scenario 2
Finding, accessing, modifying and exporting a DIP containing a vector geodata layer of
more than 1000 records. (90% success rate)
Scenario 3
SIP creation, verification and ingest of more than 200 records with a vector geodata
layer.
(90% success rate)
Scenario 4
Finding, accessing, modifying and exporting a DIP containing a vector geodata layer of
more than 200 records. (90% success rate)
Pilot 6
Test the E-ARK compatible RODA Repository in a pilot environment. (Yes/No)
Scenario 1
Ingest of no fewer than 900 historical records in E-ARK SIP format automatically
generated by a specially developed integration tool (90% success rate)
Scenario 2
At the pilot planning phase the Porto Municipality also showed great interest in
participating in an automatic ingest scenario. So a second scenario was planned with
the same E-ARK component and infrastructure. Later they had some resource
planning problems with their local developer who was needed to implement the
producer-side infrastructure. The discussions and preparations continued until August
2016, when the Porto Municipality finally decided to delay the project. It is still
possible that in the near future this scenario can be executed, but definitely not
within the time frame of the current project, so we had to cancel this scenario and at
that time it was too late to start another.
Postponed (Outside scope of DoW)
Pilot 7
The following E-ARK tools will be tested in a pilot environment:
DBPTK, RODA-In and DB viewer (Sofia) using Oracle OLAP Viewer,
along with components of the Integrated Prototype (E-ARK Web):
SIP2AIP, HDFS-Storage, Lily-Ingest, Search, AIP2DIP (Yes/No)
Scenario 1
Create SIP and Ingest more than 300.000 cases of old (not normalized) database of
the Hungarian Prosecution Office. (90% success rate)
Scenario 2
Create SIP and Ingest more than 30.000 pages of scanned PDF images of meeting
minutes of the former Hungarian Socialist Party. (95% success rate)
Scenario 3
Provide access for more than 300.000 cases of old (not normalized) database of the
Hungarian Prosecution Office. (90% success rate)
Scenario 4
Provide access for more than 300.000 cases of old (not normalized) database of the
Hungarian Prosecution Office. (90% success rate)
Scenario 5
Provide access for more than 30.000 pages of scanned PDF images of meeting minutes
of the former Hungarian Socialist Party. (95% success rate)
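The percentage thresholds attached to the scenarios above lend themselves to a simple mechanical check. The following is a minimal sketch only: it assumes the success rate is counted per record (the report does not state how the rate was measured), and the class name, field names and figures are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Hypothetical record of one scenario run; not taken from the pilot documentation."""
    name: str
    attempted: int     # records the scenario tried to process
    succeeded: int     # records processed without error
    threshold: float   # required success rate, e.g. 0.95 for "95% success rate"

    @property
    def success_rate(self) -> float:
        # Assumption: the rate is measured per record.
        if self.attempted == 0:
            return 0.0
        return self.succeeded / self.attempted

    @property
    def passed(self) -> bool:
        return self.success_rate >= self.threshold


# Illustrative figures only; these are not results reported by the pilots.
example = ScenarioResult("Pilot 1 / Scenario 1", attempted=90_000, succeeded=86_400, threshold=0.95)
print(example.name, f"{example.success_rate:.2%}", "passed" if example.passed else "failed")
```

Keeping the threshold next to the counts makes it straightforward to apply different criteria per scenario, such as the 85% and 90% thresholds used in some pilots.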
Referenced Documents
In this document the following external document references have been used:
D2.1 General Model 1.0
http://eark-project.com/resources/project-deliverables/5-d21-e-ark-general-pilot-model-and-use-casedefinition
D2.3 Detailed Pilot Requirements
http://eark-project.com/resources/project-deliverables/60-23pilotsspec
D2.4 Pilot Documentation
Part 1: http://eark-project.com/resources/project-deliverables/87-d24docs-p1-1
Part 2: http://eark-project.com/resources/project-deliverables/88-d24docs-p2-1
The latest version of the General Model can be found in the E-ARK Knowledge Base and is also accessible from the
E-ARK project web site: http://eark-project.com/resources/general-model
Appendix 1 – Extract from E-ARK DoW
E-ARK will pilot an end-to-end OAIS-compliant e-archival service covering ingest and reuse of structured and
unstructured data addressing the needs of data subjects, data owners and data users. It will integrate tools
currently in use in partner organisations, and provide a framework for providers of these, and similar tools, to
ensure compatibility and interoperability. The project has three phases resulting in a set of tool instantiations, a
validated pilot platform and a set of recommended practices based on evaluation of the pilot. This approach
supports the planned three-tier piloting strategy (full-scale pilot, shorter ‘stretch’ pilots and external validation).
The work has been organised into six work packages, as shown in the diagram below. Specialist skills are associated
with each WP and this grouping of activities also reduces inter-dependences between work packages and localises
risk. The detailed definition of the work required in each work package includes a ‘product flow’
diagram. These diagrams express the flows and dependences within and between work packages.
Figure 1: E-ARK – Overall Approach
WP2 is concerned with ensuring that the needs of each pilot site are addressed in the work packages that actually
deploy the tools, and that the pilot scenarios are achievable and reflect any legal and logistical constraints. It also
supervises the acquisition of appropriate data from the data-owners working with each pilot site and, finally,
documents the knowledge gained from the pilot in the form of recommended practices.
WP3, WP4 and WP5 are responsible for the information packages that encapsulate the content and related
metadata that is being archived, respectively during the workflows for submission (SIP - the data structures used by
the data owner to enable ingestion of the content), archival (AIP - the data structures used by the repository
operator to enable preservation functions) and dissemination (DIP - the data structures used for extraction and
re-use of content). The mapping of SIP to AIP and AIP to DIP provides the mechanism for integration of
tools/services in the pilot, and compliance with these three data structures provides the mechanism for
interoperability between tools/services.
WP6 provides access to ingest and re-use tools/services to be deployed in the pilot, based on the implementation of
a repository supporting the open source AIP schema from WP4. Pilot sites can either use this open-source solution
or work with their platform-providers to implement SIP/AIP and AIP/DIP mappings of their own, supported through
their community of interest within the project.
Figure 2: E-ARK Technical Integration
WP7 is responsible for evaluating the pilot service from technical and commercial perspectives based on criteria
established for each scenario by WP2 and will utilise a maturity model developed in the TIMBUS project. Following
the pilot deployments, both technical and business evaluations will be carried out and stored in a knowledge base,
based on the indicators created for each pilot component. For example, a formal specification of the pilot ingest
workflow will include information about how it has been developed and tested.
Figure 3: Pilot Workflows
More specifically, there are two distinct work-streams orchestrating the work required to integrate the pilot service
and the work required to deploy, support and evaluate the pilot. This is summarised above, one leading to the WP6
deliverable for an “Integrated Platform Reference Implementation” (M24) and the other leading to the WP7
deliverable “Pilots Assessment – Final” (M36).
Piloting, which is the responsibility of WP2, consists of seven instances of parts of the E-ARK service.
The full scale pilots planned in the E-ARK Description of Work (DoW)
T2.5.1 Full scale pilot no. 1. – SIP creation of relational databases
Task leader: Danish National Archives.
Supported by: Magenta
Scope: Not less than 4 databases of different sizes and complexities (one contains several million records)
Object: Creating SIPs for relational databases using the tool created in WP3, T3.3: SIP Creation Tools, for
further evaluation.
Participants: Danish National Archives (digital archive), Magenta, the data provider institution creating the archival
records.
Resource plan: 8 person months for setting up the pilot (assisting the archivists and data provider in preparing the
transfer), carrying out the pilot (transfer, quality checking, metadata amendments), testing the results and
reporting.
Timeframe: M28-M33
Preconditions: M03.3 and M03.4
Position in the project: DNA will pilot SIP creation and ingest specified by WP3
Contribution to the project outcome: the pilot demonstrates the applicability of the project outcomes in creating
SIPs from relational databases
T2.5.2 Full scale pilot no. 2. – SIP creation and ingest of records
Task leader: National Archives of Norway
The main part of the pilot includes the export of electronic records and their metadata from EDRM systems and
databases of Norwegian public sector institutions, and their transfer and ingest into the NAN digital repository.
Scope: Not less than 2 transfers of unstructured records with mixed restricted and unrestricted material, and not
less than 1 transfer of structured records.
Object: Extract data from EDRMS and databases, create SIPs for structured and unstructured records using ESSArch
Tools, ingest the SIPs to the repository using ESSArch Preservation Platform, for further evaluation.
Participants: National Archives of Norway (digital archive), data provider
Resource plan: 6 person months for setting up the pilot (assisting the archivists and data provider in preparing the
transfer), carrying out the pilot (transfer, quality checking, metadata amendments), testing the results and reporting
Position in the project: NAN will pilot SIP creation and ingest specified by WP3
Timeframe: M28-M33
Preconditions: M03.3 and M03.4
Contribution to the project outcome: the pilot demonstrates the applicability of ESSArch Tools and the ingest
functions of ESSArch Preservation Platform.
Data owners: to be defined at the time of the pilot.
Platform: ESSArch Tools will be used to create the SIPs, and ESSArch Preservation Platform will be used to create
and manage the AIPs, both delivered by ES Solutions. NAN IT-department is responsible for the systems operation.
T2.5.3 Full scale pilot no. 3. – Ingest from government agencies
Task leader: National Archives of Estonia
The main part of the proposed pilot includes the export of electronic records and their metadata from EDRM
systems of Estonian public sector institutions, transfer and ingest to the NAE digital repository.
In addition Estonian agencies have the responsibility to make public electronic records with no access restrictions
available on their web sites, which means that the pilot will also enable this through standardised linking/access
methods that are implemented in the agencies' digital infrastructure / web site.
Scope: export public records from an EDRM system of a governmental agency to the National Archives of Estonia
and make these available through our own catalogue (i.e. Archival Information System, AIS) as well as provide an
API for accessing the records from other systems (the original EDRMS at the agency); The whole set will include
about 5000 records (but depends on the exact agency of course).
Objects: EDRMS at a governmental agency (Alfresco), records preparation tool (UAM), digital preservation and
access systems (SDB, AIS);
Participants: National Archives of Estonia (digital archive), one governmental agency (data provider), general public
(access to records);
Number of users: Archivists at NAE (dealing with the ingest and preservation, about 3 persons); archivists at the
agency (about 2-3 persons preparing the export/transfer and providing means for continuous in-house usage),
general public - we have around 1000 daily users at the archives virtual reading room / AIS but obviously we are not
able to predict how many of these will actually access and use the information ingested through the pilot;
Resource plan: about 4 person months (includes updates to the EDRMS installation at the agency, to UAM and
SDB/AIS, setting up and running the pilot).
Position in the project: NAE will implement and pilot the records export requirements, SIP format and
transfer-ingest workflow specified by WP3 and the access services specified by WP5;
Timeframe: setting up pilot sites through M25 – M27, running the pilot for six months through M28 – M33, which
means that the records are available for the general public for at least three months;
Preconditions: M03.3, M03.4, M04.2, M05.4, M05.6. Records are available at the agency in digital form and enriched
with metadata; it is possible to export the records; records export, preparation, transfer, ingest and access
functionalities have been updated according to project deliverables in Alfresco, UAM, SDB and AIS;
Contribution to the project outcome: the pilot demonstrates the applicability of the project outcomes inside the
framework of Estonian public sector legislation and the tools applied at NAE.
Platform and data owners: a specific data provider has not been selected for NAE. NAE has notified the Ministry of
Economics and Communication (in charge of co-ordinating e-Gov and electronic records management in Estonia),
which has promised its full support when it comes to selecting the specific agency. We are aiming to
use Alfresco as the commercial system from which we ingest data (about 10-20 agencies in Estonia
use it, so there are quite a few possibilities). SDB is the preservation platform which we employ to ingest data.
T2.5.4 Full scale pilot no. 4. – Business archives
Task leader: National Archives of Estonia
Supported by: Estonian Business Archives
Estonian Business Archives, Llc. is a privately owned archiving services provider. The main client base of the
company is comprised of private businesses in Estonia for archiving and preservation of both paper and digital
records. The business archives pilot in the E-ARK project will focus on transfer of electronic records from private
companies to the digital archive solution of the Estonian Business Archives and their subsequent description
required for archiving and preservation.
Scope: Transfer of business records to a digital archive solution in a business archive, quality control, enhancement
of description and AIP creation.
Object: bespoke business system that contains records (pilot will test an annual batch of ca 4,500 records); financial
and CRM systems that contain records (pilot will test an annual batch of ca 15,000 records).
Participants: Estonian Business Archives, Llc (digital archive), two private companies (data providers).
Number of users: The archived business records are for the sole use of their owner-company only.
Resource plan: 4 person months for setting up the pilot (assisting the companies' archivists in preparing the
transfer; setting up and configuring the IT infrastructure at EBA), carrying out the pilot (transfer, quality checking,
metadata amendments, AIP creation), testing the results and reporting.
Position in the project: The pilot will report on the suitability of the ES Tools and ES Preservation Platform for
processing electronic records from business systems.
Timeframe: M25-M27: setting up the pilot sites; M28-M31: running the pilots; M32-M33: testing and reporting.
Preconditions: M03.3, M03.4, M04.2, M05.4, M05.6.
Contribution to the project outcome: The business archives pilot will provide a view how the tools developed by the
project can be implemented in the private sector setting. The pilot will assess to what extent these tools add value
to the existing archiving services and workflows established in the corporate sector. The nature of objects used in
the pilot – business information systems that contain or manage records – is slightly different from the public sector
use cases that mostly rely on EDRM systems or databases of records.
Platform and data owners: The systems that records will be transferred from and the current digital archive solution
at the EBA are all bespoke solutions.
T2.5.5 Full scale pilot no. 5. – Preservation and access to records with geodata
Task leader: National Archives of Slovenia.
Supported by: Danish National Archives
During the E-ARK project a standardised method for ingesting geodata will be developed. This will allow the
archives to offer geodata as a selection and display criterion for records by integrating current
state-of-the-art tools.
Scope: Pilot will prove that the SIP and DIP implementations fulfil specific requirements for the records containing
GIS data, test the instructions (for the producer and for the archive) regarding all phases of ingest, to prove that the
archival use of GIS data is possible (via open data method, direct access in the archives and use GIS data as search
criteria in the DIP contents).
Object: pilot report with recommendations about urgent improvements and possible future improvements; support
for WP6 & WP7 in setting up the work environment of selected E-ARK archival tools; provide real-life examples of how
the project deliverables can be used
Position in the project: Pilot will prove usability of specification and tools for supporting ingest (WP3 D03.3) and
access (WP5 D5.3, D5.4) of archival records with specific data. Uses specifications and tools for supporting ingest
(WP3 D03.2, D03.3) and access (WP5 D5.2, D5.3, D5.4)
Participants: National Archives of Slovenia (digital archives), Danish National Archives (best practice exchange)
Resource plan: 7 person months (6 pm for National Archives of Slovenia, 1 pm for DNA)
Preconditions: M03.3, M03.4, M04.2, M05.4, M05.6.
Timeframe: M25-M27: setting up the pilot sites; M28-M31: running the pilots; M32-M33: testing and reporting.
Platform: DBExport Tool
T2.5.6 Full scale pilot no. 6. – Seamless integration between a live document management system and a long-term
digital archiving and preservation service
Task leader: KEEP SOLUTIONS
RODA (Repository of Authentic Digital Records) is a long-term digital repository system that implements an ingest
workflow that not only validates SIPs, but also checks their contents for viruses, performs format identification,
extracts technical metadata, and migrates file formats to more “preservable” surrogates. RODA also provides access to
digital information in several forms such as search/navigate over available metadata as well as online visualisation
and download of originals, preservation formats and dissemination derivatives. Administration interfaces allow
back-office users to manage fonds/collections and define rules for preservation actions. All interactions between
users (human and machines) and the repository are logged for security and accountability reasons. RODA ensures
that ingested data is authentic by recording PREMIS metadata on all actions performed by the repository, records
provenance in archival metadata standards such as ISAD(G), and ensures integrity and availability by frequently
monitoring data and making sure that it has not been tampered with. More recently, RODA has been enhanced to
support preservation plans developed in Plato, thus providing a full-cycle preservation environment for digital objects
ensuring usability and readability of ingested data.
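The staged ingest workflow described above can be pictured as a sequence of steps, each of which leaves an event record behind. The sketch below is illustrative only and is not RODA's implementation or API: the `Package` class, the step functions and the event dictionary layout are all hypothetical, the event records are only loosely inspired by PREMIS events, and virus scanning and format migration are omitted because they depend on external tools.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


@dataclass
class Package:
    """Hypothetical in-memory stand-in for a submitted SIP."""
    package_id: str
    files: Dict[str, bytes]
    metadata: Dict[str, dict] = field(default_factory=dict)
    events: List[dict] = field(default_factory=list)


def validate_structure(pkg: Package) -> str:
    # Minimal structural check: a package must contain at least one file.
    if not pkg.files:
        raise ValueError("package contains no files")
    return f"{len(pkg.files)} file(s) present"


def identify_formats(pkg: Package) -> str:
    # Toy identification by magic bytes; a real system would consult a format registry.
    for name, data in pkg.files.items():
        fmt = "application/pdf" if data.startswith(b"%PDF") else "application/octet-stream"
        pkg.metadata.setdefault(name, {})["format"] = fmt
    return "formats identified"


def extract_technical_metadata(pkg: Package) -> str:
    # Record size and a checksum that later integrity monitoring could compare against.
    for name, data in pkg.files.items():
        entry = pkg.metadata.setdefault(name, {})
        entry["size"] = len(data)
        entry["sha256"] = hashlib.sha256(data).hexdigest()
    return "size and checksum recorded"


STEPS: List[Tuple[str, Callable[[Package], str]]] = [
    ("validation", validate_structure),
    ("format identification", identify_formats),
    ("metadata extraction", extract_technical_metadata),
]


def ingest(pkg: Package, steps: List[Tuple[str, Callable[[Package], str]]] = STEPS) -> bool:
    """Run each step in order, log one event per step, and stop at the first failure."""
    for event_type, step in steps:
        try:
            detail, outcome = step(pkg), "success"
        except Exception as exc:
            detail, outcome = str(exc), "failure"
        pkg.events.append({
            "type": event_type,
            "outcome": outcome,
            "detail": detail,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        if outcome == "failure":
            return False
    return True
```

Logging an event for failed steps as well as successful ones mirrors the accountability goal stated above: the record shows not only what was ingested but also why a package was rejected.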
RODA currently supports the Digital Archiving and Preservation Service at the Portuguese National Archives. This
service allows public bodies to submit digital content to the archiving service for long-term preservation. The Digital
Archiving and Preservation Service takes care of the necessary procedures to keep data accessible for long periods
of time (in the scale of decades). Producers have special privileges in the system, allowing them to manage their
data and change the structure of their fonds/collections. Data is submitted via SIP files that need to be manually
prepared by producers using an offline tool called RODA-in.
Scope and objectives: The goal of this pilot is two-fold. On one hand, Keep Solutions demonstrates that the
pan-European SIP structure designed in the WP3 is adequate to support the media types currently supported by RODA
(i.e. relational databases, text documents, video, audio and images) and, on the other hand, that the most adequate
and scalable form of ingest is to automate the SIP creation process. In order to achieve this, we will tap into a
running Document Management System and, based on appraisal and selection strategy installed, we will extract,
transform, aggregate and create Submission Information Packages that conform to the pan-European SIP format
defined in WP3 that are ready to be ingested in RODA.
Participants: In this pilot we will make use of data produced by several bodies of the Portuguese public
administration. One already confirmed is a project partner, the IST. The IST is a Portuguese public university that
delivers top quality higher education and engages in research, development and innovation activities. In its
activities, several forms of content with high administrative, legal, financial and informational value are produced
every day. During the project lifetime the IST will engage in a parallel project to re-engineer a large part of the
technology that supports its administrative services, which will include the acquisition and deployment of an
integrated archival system. This makes this pilot an excellent example as information assets to be ingested from the
actual production systems are expected to be highly unstructured and in desperate need of preservation. Besides
the IST, the consortium will also take advantage of the role that AMA plays in the structure of the Portuguese Public
Administration to complement this case with more data providers.
Resource plan: 7 person months. 6 PM for KEEPS for development, testing and integration and 1 PM for IST for
consulting and liaison with the departments that will provide data to the pilot.
Position in the project: RODA already supports preservation actions and dissemination interfaces for 5 media types.
This pilot will focus on enhancing the ingest process by connecting the long-term repository to the Document
Management Systems active at the data producer’s location, thereby demonstrating SIP suitability for packaging
various content types, and scalability by providing a seamless ingest process that requires little or no human
intervention.
Timeframe: Between M25–M27 the pilot will be deployed. Between M28–M33 the ingest process will run in parallel
with the SIP creation process.
Preconditions: pan-European SIP format defined (WP3). RODA must be enhanced to support the new SIP format
(WP3). Automatic SIP creation tool/middleware must be developed to integrate the data provider DMS with the
long-term repository.
Contribution to the project outcome: The pilot will demonstrate that the pan-European SIP structure designed in
the WP3 is adequate to support the content types currently supported by RODA (i.e. relational databases, text
documents, video, audio and images). The pilot will also demonstrate and provide a
framework for automatic SIP creation and DMS-Repository interoperability, showing the scalability of the whole ingest
process.
Platform and data owners: The owner of the data in this pilot will be the IST. Multiple systems are currently in place
to support document management processes, e.g. an internally developed records management system called
“DOT”, a commercial workflow software called eDocLink, and an archival management system called ICA-Atom. In
this pilot a prioritization of existing platforms will be made to choose the ones that will be included in the pilot.
T2.5.7 Full scale pilot no. 7. – Access to databases
Task leader: National Archives of Hungary.
Supported by: Danish National Archives
NAH will extract structured content from an Oracle database with the tools developed by WP3. The pilot will
examine the applicability of data-warehouse concepts in an archival environment in order to maintain both the
original structure and intellectual interpretability of ingested data. The working prototype for access will be a
user-friendly web-based application based on the DIP specification of WP5.
Scope: Representation of not less than 2 databases of different sizes and complexities with restricted and open
content.
Objects: Extract data from the EDRMS and the databases, create SIPs for structured and unstructured records using
the ESSArch Tools, ingest the SIPs to the repository using the ESSArch Preservation Platform, for further evaluation.
Participants: National Archives of Hungary (digital archives), data provider
Resource plan: 6 person months for setting up the pilot (assisting the archivists and the data provider in preparing
the transfer; setting up and configuring the IT infrastructure at NAH), carrying out the pilot (transfer, quality
checking, metadata amendments, AIP creation), testing the results and reporting.
Position in the project: NAH will primarily implement and pilot the applicability of specifications and tools related to
access (WP5 D5.3, D5.4). The pilot will also prove usability of specifications and tools for supporting ingest (WP3
D03.3) of archival records.
Resource plan: 7 person months (6 pm for National Archives of Slovenia 1 pm for DNA)
Preconditions: M03.3, M03.4, M04.2, M05.4, M05.6.
Timeframe: M25-M27: setting up the pilot sites; M28-M31: running the pilot; M32-M33: testing and reporting.
Contribution to the project outcome
Data owner: Prosecution Service of Hungary
Platform: DBExport Tool, Oracle APEX, development in Java