Data Governance Framework for Statistical Interoperability (DAFI)
Acknowledgement
This document is the final deliverable of the UNECE High-Level Group on Modernisation of
Official Statistics (HLG-MOS) project “Data Governance Framework for Statistical
Interoperability”. The project was selected at the 2021 HLG-MOS Workshop on the
Modernisation of Official Statistics and conducted from 2022 to 2023.
The following team members kindly dedicated their time, and contributed their knowledge,
experience, and expertise:
• Juan Muñoz (Project Lead), Silvia Fraustro and Juan Eduardo Rioja – INEGI, Mexico
• Carlo Vaccari – UNECE Project Manager
• Flavio Rizzolo and Chantal Vaillancourt – Statistics Canada
• Zoltán Vereczkei – Hungarian Central Statistical Office
• Muriel Shafir and Debbie Soffer – Israel Central Bureau of Statistics
• Emanuela Reccini and Samanta Pietropaolo – Istat, Italy
• Munthir M. Alansari – Ministry of Tourism, Saudi Arabia
• Daniel Gillman – Bureau of Labor Statistics, USA
• Edgardo Greising – ILO
• David Barraclough and Barbara Ubaldi – OECD
• InKyung Choi – UNECE
Executive Summary
Statistical organisations deal with data coming from different sources and domains. While each
information (data and metadata) set possesses intrinsic value on its own, integrating it with other
information holds great potential to provide society with the knowledge and insights vital to
addressing an increasing number of multi-faceted challenges. Reusing data sets already collected
and produced for other statistical programmes, where relevant, could further amplify their value.
Yet, exchanging and making use of data sets across various sources requires a shared
understanding among involved parties on several aspects such as data semantics, representation,
formatting, and more. These difficulties exist not just for the exchange and sharing between
different organisations; they are significant challenges even within the same organisation.
Enhancing statistical interoperability, a capacity to exchange and make use of the statistical
information with minimal or no prior communication, is crucial for improving the efficiency and
quality from producers’ perspectives as well as the usability and value of products for users.
Furthermore, it is also important to maximise the potential of traditional and new data sources and
leverage new technologies such as data science.
Interoperability encompasses multiple facets – semantic, structural, syntactic and system - which
are closely related and important for the smooth exchange and utilisation of information. The
governance system needed to support and improve interoperability needs to consider various
factors, including organisational roles and legal and business policies, as well as the standards and
technologies that facilitate it.
Moving forward, it is recommended to develop an interoperability strategy within the
organisation and establish concrete metrics to evaluate the journey. Moreover, expanding the use
of open standards and cultivating a culture of change while supporting staff in acquiring necessary
skills and knowledge are pivotal in this endeavour.
Contents
Acknowledgement........................................................................................................................... 2
Executive Summary ........................................................................................................................ 3
Acronyms ........................................................................................................................................ 5
1. Introduction ................................................................................................................................ 6
1.1. Background ........................................................................................................................... 6
1.2. Problem Statement ............................................................................................................... 6
1.3. Core Terms............................................................................................................................ 9
1.4. Purpose and Scope...............................................................................................................10
2. Interoperability in Statistical Organisations ............................................................................. 12
2.1. Definition and Related Concepts ......................................................................................... 12
2.2. Facets of Interoperability .................................................................................................... 15
2.3. Benefits of Interoperability ................................................................................................. 17
2.4. Source of non-interoperability ............................................................................................18
3. DAFI Components .....................................................................................................................21
3.1. Roles and Governance Bodies ............................................................................................. 21
3.2. Legal and Business Policy .................................................................................................. 23
3.3. Standards, Tools, and Technologies................................................................................... 25
4. Recommendations .................................................................................................................... 35
4.1. Develop Interoperability Strategy and Monitor Implementation ...................................... 35
4.2. Expand Use of Standards ................................................................................................... 37
4.3. Foster Culture Change and Support Staff ........................................................................... 41
References ..................................................................................................................................... 43
Annex 1 - Standardised Vocabularies, Methods, Formats, Frameworks, Languages, Workflows
and Data models ........................................................................................................................... 44
Annex 2 - Applications that use standards. ................................................................................... 51
Annex 3 - Roles and responsibilities from ISO/IEC 11179 ........................................................... 53
Acronyms
APIs – Standardised Application Programming Interfaces
CDO – Chief Data Officer
CEMs – Common Exchange Models
CIO – Chief Information Officer
CSV – Comma-Separated Values
DCAT – Data Catalog Vocabulary
DDI – Data Documentation Initiative
FAIR – Findable, Accessible, Interoperable, Reusable
GAMSO – Generic Activity Model for Statistical Organizations
GSBPM – Generic Statistical Business Process Model
GSIM – Generic Statistical Information Model
INEGI – National Institute of Statistics and Geography of Mexico
ISO – International Organization for Standardization
JSON – JavaScript Object Notation
LOD – Linked Open Data
MAF – Machine Actionable Format
NSO – National Statistical Organisation
OWL – Web Ontology Language
RDF – Resource Description Framework
SEPs – Standardised Exchange Protocols
SDMX – Statistical Data and Metadata Exchange
SKOS – Simple Knowledge Organization System
XKOS – eXtended Knowledge Organization System
XML – eXtensible Markup Language
1. Introduction
1.1. Background
The primary purpose of national statistical organisations (NSOs) is to produce high-quality
information that portrays societal phenomena as accurately, completely, and timely as possible.
Statistical information describes different aspects of society such as demography, economy,
and environment, among others. It is used as input for the design, monitoring, and evaluation of
public policies, as well as for a wide range of other decisions made by the private sector and
individuals. To create a coherent picture of reality, we need an interoperable set of high-quality
statistics produced by a set of well-aligned information production processes.
Interoperability is gaining more and more attention due to the increasing complexity of the
phenomena that statistical organisations must measure. Multi-faceted policy issues such as
climate change adaptation and circular economy, among many others, involve numerous
interrelated variables or factors that interact with one another, which most of the time are
produced by different programmes in the statistical organisations or independent organisations
in the country. At the same time, statistical organisations have been increasingly exploring the
use of big data and new data sources such as satellite images, sensors, and other technologies to
meet society's expectations for improved and timely information products. Statistics derived from
surveys and censuses could offer an accurate and comprehensive portrayal of society, economy
and environment through their systematic data collection approach and rigorous survey
methodologies. Additionally, big data represents a great opportunity for statistical organisations
to generate information products in near real-time, which could help provide intercensal
information or information on topics where data is not available through a traditional survey or
census, such as environment statistics. The use of new data whose attributes, such as type and
source, differ from those of traditional sources poses additional challenges for statistical
organisations, as such data might not necessarily be interoperable with existing datasets in the
organisations. Therefore, data
interoperability is a necessary capability to provide a new generation of services and products that
meet the emerging demands of statistics users.
This document aims to provide a reference framework that contains the core elements to
implement a governance programme focused on achieving data interoperability and thus helping
statistical organisations improve their data management.
First, knowing the concept to which the data pertains is important. The number corresponding to
a statistical indicator represents the measurement of a concept that was defined for that indicator.
Logically, we need to be sure that when talking about a specific indicator, we are referring to the
same concept. For example, “work” and “occupation” may be used interchangeably in everyday
language, but in the labour market, they mean very different things - occupation is a specific form
of work. Other forms of work are own-account production work, volunteer work and unpaid
internship work. Therefore, if we want to make use of statistical information from different
sources, we need to know the concepts that the data obtained from those different sources
pertains to.
Once we are sure that the common understanding of concepts is established, the next step involves
determining how we will represent the numbers they refer to. To make sure that the number is
well interpreted, we need to accompany it with all the information needed to understand the
number correctly such as the period it is referring to, the geographic area that is covered, the units
of measurement, etc. For example, a population number “126,014” can lead to different
interpretations; only when it is accompanied by information indicating that it refers to the
population of Mexico in thousands, as counted by the National Census in 2020, can we establish
the proper context and accurately understand what the number refers to. Concept (variable),
period and geographic dimensions are often considered as the minimum essential information
that is needed to determine how the number (measure) can be used or compared to others.
Depending on the type of variable, however, other information might be needed to understand
the data. For example, if they are foreign trade variables, it will be necessary to know if they refer
to an import or an export, the country of origin and the country of destination, etc. On the other
hand, if the variables are about a sociodemographic subject, it would be useful to know if they
refer to the total population of women or men, and perhaps even the age group to which the data
refers. The information that can help us to provide a semantic context of the statistics, as in the
last examples described, is called structural metadata. It is needed to ensure that we correctly
interpret the numbers when we exchange and make use of them. Some of this metadata can be
coded using code lists or classifications to divide the variables into categories and to have a better
knowledge of the composition of each indicator. Having a common agreement on how this
structural metadata will be incorporated into the information set that is being exchanged will
improve the capability to interoperate with the information set.
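To make the idea concrete, the population figure above can be carried together with its structural metadata. The following is a minimal sketch in Python; the field names and codes are chosen for illustration only and do not come from any particular standard.

# A minimal sketch: an observation carried together with the structural
# metadata needed to interpret it (the field names and codes are illustrative).
observation = {
    "variable": "POP_TOTAL",            # concept being measured
    "ref_area": "MX",                   # geographic dimension
    "time_period": "2020",              # reference period
    "value": 126014,                    # the measure itself
    "unit_of_measure": "THOUSANDS",     # required to interpret the number
    "source": "National Census 2020",   # referential metadata
}

def describe(obs: dict) -> str:
    """Render the observation with enough context to interpret it correctly."""
    return (f"{obs['variable']} for {obs['ref_area']} in {obs['time_period']}: "
            f"{obs['value']} ({obs['unit_of_measure'].lower()})")

print(describe(observation))  # POP_TOTAL for MX in 2020: 126014 (thousands)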
In the current digital era, the exchange and integration of statistics is done primarily using
information technologies. It is easier to achieve technological interoperability if we share the same
syntax to form structures that can be easily interpreted by the different software systems and
tools used by the organisations. However, to achieve statistical interoperability it is necessary to
consider other aspects.
Based on the descriptions above, we can conclude that the core conditions for statistical
interoperability include a shared understanding of the concepts, a set of structural metadata to
provide context, a well-known and regular way of communicating them, and tools and rules for
accessing the information.
From these conditions for statistical interoperability, we can deduce the potential problems that
arise in their absence:
• We can put together indicators that refer to concepts that look similar. But if we have not
previously agreed on the meaning of those concepts, we cannot be sure that we are referring
to the same concepts. In this case, we may not be able to compare these indicators and
interoperability cannot be ensured.
• We can make mappings between classifications and transform units to put the statistics in
the same contexts. But during this process we may lose precision, or it may be impossible
to map data between classifications from different parties when they have different
granularities or even different conceptualisations.
• We can transform the structures of the statistical information using software. Several
organisations use these kinds of tools to integrate information, which reduces the effort
required to put all this data in the same format, but we must be aware that these
transformations can induce some errors that may be difficult to detect. In the end, we
cannot guarantee the quality of the product of this transformation that gathers data
coming from different sources, initially produced with different purposes and from
different points of view, and potentially with undetected errors introduced by our
integration processes.
• We can try to use different software tools to create an interoperable data environment, but
if they don’t share at least some technical specifications such as the capability to receive
requests and to answer them directly, the result will add unnecessary complexity and, in
many cases, it will require a lot of work to integrate or link the information from one
system into others.
The stability and the scope of the agreements that must be made to achieve statistical
interoperability are important for setting the right conditions to build a statistical information
ecosystem able to provide valuable statistical information to policymakers and, in general, to the
whole of society, fitted to satisfy their needs.
Providing stability means that the concepts, semantics, and structures will be kept across the
different cycles of each statistical programme. Under this condition, it will be possible to build
time series spanning different periods, providing the information needed to build models that
recognise trends and to construct scenarios for forecasting.
The scope of these agreements is fundamental for eliminating the information silos and building
the statistical information platform that could answer the complex multi-dimensional
information needs of society. When a unit in charge of a certain statistical programme makes all
the decisions related to the concepts, semantics, and structures in isolation, without considering
those used by other departments, organisations, or projects, the information produced from the
programme will not be interoperable, resulting in an information silo. When a division in charge
of several statistical programmes establishes interoperability for the statistics produced in its
programmes, then at least the statistics produced by the units within the division will be
interoperable. The integrated set of statistics will provide a better understanding of the different
concepts within those programmes and perhaps across different domains.
If the interoperability scope is extended to statistics from different domains, the value of the
information will be even greater. This can help answer complex questions about the interplay
among different domains, for example, how the evolution of a certain economic activity can affect
the demography and ecology of certain geographical areas. The scope can also be extended beyond
statistical organisations to encompass national, regional, international, or global systems. A wider
coverage will help society to understand the context of statistics and compare them with those of
other areas sharing similar statistics.
Building an interoperable platform of high-quality statistical data cannot be a result of
serendipity. It is necessary to establish a data governance programme to transform the data silos
into a connected network of harmonised data and metadata sets, with the structures, procedures,
rules, and policies needed to preserve the meaning and quality of the statistical information
datasets it contains. Such a programme involves:
• Agreeing on the concepts behind the data to be exchanged to ensure that all the parties
will have a common understanding of what is being exchanged.
• Establishing the process patterns and constraints that will be followed by the parties to
manage, send, and receive the data using the exchange channel in a way which avoids
losing or distorting the messages.
• Developing structures to arrange all the data and its related metadata in a way that the
statistical information that is exchanged can be easily identified, logically contextualised,
accurately integrated, and correctly analysed (see the sketch after this list).
• Providing formats to reduce the diversity and complexity of the tools needed to process
and publish different data and metadata sets. This includes defining the main features of
the software tools used to process and publish the contents of data and metadata sets
related to different domains, in such a way that we can reduce the complexity and cost
involved in developing them.
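As referenced in the list above, the following is a minimal sketch of such a structure, written in Python for illustration; the class and field names are invented and do not correspond to any specific standard.

# A minimal, hypothetical structure definition: it fixes which dimensions,
# attributes and measure every exchanged record must carry, so that all
# parties arrange the information set in the same way.
from dataclasses import dataclass

@dataclass
class DataStructure:
    name: str
    dimensions: list[str]      # e.g. variable, reference area, time period
    attributes: list[str]      # e.g. unit of measure, source
    measure: str = "value"

POPULATION_STRUCTURE = DataStructure(
    name="POPULATION",
    dimensions=["variable", "ref_area", "time_period"],
    attributes=["unit_of_measure", "source"],
)

def conforms(record: dict, structure: DataStructure) -> bool:
    """Check that a record carries every dimension, attribute and the measure."""
    required = set(structure.dimensions) | set(structure.attributes) | {structure.measure}
    return required.issubset(record.keys())

print(conforms({"variable": "POP_TOTAL"}, POPULATION_STRUCTURE))  # False: context is missing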
As one can see from above, interoperability can be seen from different points of view and Section
2 describes these different facets, namely, semantic, syntactic, structural, and system, in more
detail.
Following this line of reasoning, we can define statistical interoperability as the capacity to share
and make use of statistical information among different parties or electronic systems without
distorting its meaning and without needing to communicate to obtain additional specifications or
make ad hoc adjustments for each specific case. Statistical interoperability implies achieving
minimum compliance regarding the semantic, structural, syntactic, and technological aspects of
the statistical data and metadata.
An organisation can achieve interoperability only if it is in control of its information. Data
governance is defined as the exercise of authority and control (planning, monitoring, and
enforcement) over the management of data assets [1]. This concept is related to the decisions that
must be made to establish control and be able to manage the data. Its purpose is to ensure that
data is managed properly, and according to policies and best practices.
It is important to distinguish data governance from data management. While data governance is
about making decisions and establishing lines of authority and expected behaviours, the latter
refers to implementing and performing all the aspects of working with data. Data management
is defined as the development, execution, and supervision of plans, policies, programs, and
practices that deliver, control, protect, and enhance the value of data and information assets
throughout their lifecycles [1]. Data governance is about ruling data management.
Another related term is data stewardship, which is gaining importance as the role of national
statistical organisations expands beyond managing the data assets produced by the organisation
itself to data owned and shared by other government agencies or actors in the data ecosystem.
In [2], data stewardship is considered an “approach to data governance that formalises
accountability for managing information resources on behalf of others” which “is enabled through
good data governance and data management.”
A framework is a model that describes the structure underlying a system or concept. In this case,
a data governance framework is a model that identifies the elements, structure, interactions,
processes, and rules required to achieve data governance.
The Data Governance Framework for Statistical Interoperability (DAFI) can be
defined as a model and a set of guidelines and recommendations that identify the elements,
structure, interactions, processes, and rules required to establish the conditions of an information
governance environment focused on facilitating the decisions required to align efforts to achieve
statistical interoperability.
This document focuses on the statistical information (data and metadata) that statistical
organisations use for the development, production, and dissemination of their statistics. Statistical organisations
regularly use other types of information that play a significant role in their business and benefit
from improved interoperability (e.g., human resource, finance, legal). However, these types of
information are not in the scope of this document.
2. Interoperability in Statistical Organisations
2.1. Definition and Related Concepts
Definition
One of the early definitions of interoperability defines the concept as an “ability of a system (such
as a weapons system) to work with or use the parts or equipment of another system” (Merriam-
Webster) which originated from the needs of the military to make parts that can be used
interchangeably. As time went on, the term began to be used in information technology in much
the same way. The ISO standard ISO/IEC 2382 (Information Technology – Vocabulary) defines
interoperability as a “capability to communicate, execute programs, or transfer data among
various functional units in a manner that requires the user to have little or no knowledge of the
unique characteristics of those units.” These definitions have similarities in that they consider
interoperability as a capacity or capability. This means interoperability is a condition to be met,
not an activity. In other words, it is not an exchange or a function, but it promotes exchange or
functionality across systems (see Annex 1 for more definitions).
In the context of statistical organisations whose core business is the production and dissemination
of information (data and metadata), interoperability thus can be considered as a capacity to
exchange and make use of the information with minimal or no prior
communication.
It is important to note that each situation in which interoperability concerns arise calls for different
characteristics. For different classes of objects used in official statistics (e.g., variables, data
sets, questions, questionnaires, data structures, sampling), the elements required to describe each
of those classes are different. For example, a sample has a size, stages, frames, and a selection
method at each stage; a question has wording, response choices, and a skip pattern; and a variable
has a definition, a value domain, and the data it generates has a format and structure. If we define
a technical specification as a schema organising a set of elements, the interoperability of each class
(e.g., variable, data structure) depends on the requirements in a schema, for example, if
descriptions of variables are to be interoperable, one needs to know the schema used to organise
and format those descriptions. Similarly, if some process is interoperable, one needs to know the
schema used to organise and describe the steps of the process. Thus, conformance 1 to the
appropriate technical specification is a necessary condition for interoperability.
1 Technical specifications contain normative expressions, which can be divided into statement (expression
that conveys information), instruction (expression that conveys an action to be performed),
recommendation (expression that conveys advice or guidance) and requirement (expression that conveys
criteria to be fulfilled). Conformance to a technical specification means satisfying all its requirements.
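As an illustration of the point that each class of object needs its own technical specification (schema of elements), the following Python sketch defines two such schemas; the element names are illustrative and are not drawn from a specific standard.

# Each class of statistical object is described by its own set of elements;
# two parties can interoperate on such descriptions only if both conform to
# the same schema. The schemas below are hypothetical.
from dataclasses import dataclass, field

@dataclass
class VariableDescription:          # schema for the "variable" class
    name: str
    definition: str
    value_domain: str               # e.g. a code list or a numeric range
    data_format: str                # how the generated data is formatted

@dataclass
class QuestionDescription:          # schema for the "question" class
    wording: str
    response_choices: list[str] = field(default_factory=list)
    skip_pattern: str = ""

age = VariableDescription(
    name="AGE",
    definition="Age of the person in completed years at the reference date",
    value_domain="integers 0-120",
    data_format="integer",
)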
Standards include catalogues, models, methods, and procedures that are created and maintained
to share, exchange, and understand data.
There is a close relationship between standards and interoperability. In principle, two parties can
achieve interoperability through bilateral agreements once they agree on every aspect and
procedure involved in the exchange (e.g., concepts used in the data, data format, data structure).
However, this arrangement quickly becomes costly and inefficient when more parties are
included. Another way to achieve interoperability is by making all relevant information open, thus
allowing any other party to obtain and understand the data without a need to contact and
communicate. However, this also creates inefficiencies as it requires additional efforts if the
concepts, structure, or format used by the party is different from those used by other parties that
want to make use of the data.
Adopting standards can significantly facilitate interoperability, enabling seamless data exchange
not just between individual parties but automatically among any involved parties (see Figure 2.1).
Therefore, standards play a crucial role in achieving interoperability efficiently.
Figure 2.1. Interoperability through bilateral agreements vs. interoperability through
adopting standard (recreated based on Figure 5-3 from [3])
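The cost of the bilateral approach can be illustrated with simple arithmetic: every pair of parties needs its own agreement, whereas with a common standard each party only needs to map to the standard once. A small illustrative calculation in Python:

# Illustrative arithmetic only: the number of bilateral agreements grows
# quadratically with the number of parties, while mappings to a common
# standard grow linearly.
def bilateral_agreements(n_parties: int) -> int:
    return n_parties * (n_parties - 1) // 2

def mappings_to_standard(n_parties: int) -> int:
    return n_parties

for n in (3, 5, 10, 20):
    print(f"{n} parties: {bilateral_agreements(n)} bilateral agreements "
          f"vs {mappings_to_standard(n)} mappings to a standard")
# 20 parties already require 190 bilateral agreements but only 20 mappings.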
The importance of following standards is not new for statistical organisations. The
Fundamental Principles of Official Statistics states (Principle 9) that “The use ... of international
concepts, classifications and methods promotes the consistency and efficiency of statistical
systems at all official levels.” However, the significance of adopting standards has grown even
more in recent years due to several factors.
Firstly, the landscape of standards has become much more complex. Production processes have
become more granular, with each component more specialised – there are different rules,
classifications, concepts, models, methods and procedures for different sub-processes and tasks
within them. The types of data that statistical organisations deal with have become diverse (e.g.,
geospatial data, unstructured data). To ensure interoperability with other domains, sectors and
countries, statistical organisations should consider standards not just within their statistical field,
but beyond them.
Also, the use of standards enhances the potential for data to be reused. While a statistical product
may have been designed for a specific purpose, the underlying data (final, intermediate, raw)
holds value for potential reuse by other programmes. The use of standards also increases the
possibility of the data assets to be reused not just for current needs, but for future needs of the
organisation.
For examples of standards for interoperability, see Section 3.3.
redistribution. Open access to data is crucial to achieve the benefits of widespread data use, reuse,
and repurposing.
To get the most value of data, official statistics must ensure that data can be used more effectively
by integrating or linking datasets. Hence, there is a need to define governance rules for data and
metadata that ensure aspects such as quality, common structures, and the means by which data is
disseminated or exchanged. Interoperability, as stated before, can be supported by standards,
ideally open ones, which are usually determined collaboratively by sectoral or international
organisations with common needs. The adoption of common classifications, formats and tools facilitates
sharing, integrating, and linking data between stakeholders. Open data3 and interoperability
foster the flow of data between participants of national data systems and enable cross-border data
collaboration [5].
The use of open-source technologies contributes to reducing costs and facilitates adaptation to
different business needs. It is considered good practice to use open-source technologies whenever
possible because it supports the reusability of data and tools. Open data can be reused for
research, design, evaluation of public policies, innovation, and development by organisations in
different domains.
3For Open Data see “Open Data for Official Statistics: History, Principles, and Implementation a
review on the principles and implementations of open data in official statistics” at
https://opendatawatch.com/publications/open-data-for-official-statistics-history-principles-
and-implentation/
Example 2: in CSV
Country,Country code,Region,Population 2000,Population 2022,Average annual population growth,Currency unit
Canada,CA,North America,30.7,38.9,1.1,Canadian dollar
…

Example 3: in JSON
[
…
  {
    "Country": "Canada",
    "Country code": "CA",
    "Region": "North America",
    "Population 2000": 30.7,
    "Population 2022": 38.9,
    "Average annual population growth": 1.1,
    "Currency unit": "Canadian dollar"
  },
…
]
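A short Python check can make the point of the two examples explicit: the same record, serialised once as CSV and once as JSON, parses into the same structure. The snippet below embeds the content of the examples; note that the CSV parser returns all fields as strings, so a common representation rule for numbers is still part of the agreement.

# Parse the same record from its CSV and JSON serialisations and compare.
import csv
import io
import json

csv_text = (
    "Country,Country code,Region,Population 2000,Population 2022,"
    "Average annual population growth,Currency unit\n"
    "Canada,CA,North America,30.7,38.9,1.1,Canadian dollar\n"
)
json_text = """[{"Country": "Canada", "Country code": "CA", "Region": "North America",
  "Population 2000": 30.7, "Population 2022": 38.9,
  "Average annual population growth": 1.1, "Currency unit": "Canadian dollar"}]"""

from_csv = list(csv.DictReader(io.StringIO(csv_text)))[0]
from_json = json.loads(json_text)[0]

print(from_csv["Country"] == from_json["Country"])                       # True
print(from_csv["Population 2022"], type(from_csv["Population 2022"]))    # '38.9' <class 'str'>
print(from_json["Population 2022"], type(from_json["Population 2022"]))  # 38.9 <class 'float'>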
System interoperability concerns the technical aspects, such as communication and transport,
and the interfaces between components required to facilitate the
interaction between different systems, ensuring they can operate collaboratively. It covers the
applications and infrastructures linking systems and services. It includes interface specifications,
interconnection and data integration services, data presentation and exchange, and secure
communication protocols.
Theoretically, it is possible to have one facet of interoperability without the others, and one may
choose to focus solely on an individual facet. However, the four facets are closely related, and all
four are needed to exchange and make use of information smoothly.
For example, different survey programmes could agree to use the same definition and code list for
“economic activity” which leads to semantic interoperability, but if data sets across different
programmes are still structured, stored, and encoded in different ways, the exchange, sharing and
re-use of data sets would require additional mapping, transformation, and communication.
Frequently, a large part of analysts’ research time is spent searching for data and executing the
transformations required to integrate it with other sources. Conceptual and technical standards
improve the speed, efficiency, and consistency of the research process, facilitating the
comprehension of data and eliminating potential errors caused by non-compatible terms. Thus, interoperability
enables users to better understand terms and concepts in data obtained from different sources
and domains, allowing new ways of gaining insights to solve the ever-increasing data challenges
that society faces.
From the producer’s side, interoperability improves productivity and efficiency through the reuse
of data, methodologies and tools, and enables quick access to data and information. Also, the
establishment of a common language improves the quality of production processes. For example,
the automation of processes to collect, integrate, process, or classify statistical data reduces the
potential for human error, promoting the production of high-quality data and better
decision-making.
Interoperability also plays a critical role in reducing costs and improving the quality of statistics
for producers. For example, with increased data sharing and reuse of applications among
stakeholders, the logical integration using common identifiers reduces redundancy and
unnecessary storage expenses.
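As a small illustration of logical integration through common identifiers, the following Python sketch links two data sets on a shared unit identifier; the identifiers and field values are invented for the example.

# Hypothetical example: a register and a survey extract share a unit identifier,
# so they can be linked logically instead of copying and re-storing each other's content.
business_register = {
    "B001": {"name": "Acme Ltd", "economic_activity": "C10"},
    "B002": {"name": "Northern Farms", "economic_activity": "A01"},
}
turnover_survey = [
    {"unit_id": "B001", "year": 2022, "turnover": 1_250_000},
    {"unit_id": "B002", "year": 2022, "turnover": 480_000},
]

integrated = [
    {**business_register[row["unit_id"]], **row} for row in turnover_survey
]
for record in integrated:
    print(record)  # register attributes and survey measure joined via the identifier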
1. Specify Needs Phase: in this phase, needs for statistics are identified and existing data
sources that could meet them are reviewed. Not considering the concepts, classifications,
and tools used in these data resources may lead to missed opportunities to make the
statistics to be produced interoperable with them. Users of the statistics may prioritise
meeting their needs exactly as intended without considering the interoperability point of
view. Therefore, during consultation with users and stakeholders, the need for alignment
of concepts and outputs has to be communicated.
When this phase is initiated to review and update existing statistical programmes, it is
important to assess the impact of such changes with respect to interoperability.
2. Design Phase: “this phase includes the development and design activities, and any
associated practical research work needed to define the statistical outputs, concepts,
methodologies, collection instruments and operational processes”. Design Phase plays a
critical role not only in ensuring interoperability across the instance of production process
but also in facilitating the overall interoperability of final statistics and any artefacts
produced across the organisation. Creating variables, value domains or classifications only
slightly different from existing ones just to meet immediate needs (sub-process 2.2)
would negatively impact interoperability. Classes that should be checked against the
central repository or metadata system include conceptual classes (e.g., variable, value
domain, classification, unit types) as well as those related to the exchange (e.g.,
data format, questionnaire, question statements, legal agreements, license). Given that
metadata is critical to understanding and making use of any data set, a lack of
standardisation in the way metadata is captured and modelled at different stages will have
a detrimental impact on interoperability.
3. Build Phase: “this phase builds and tests the production solution to the point where it is
ready for use in the ‘live’ environment”. While many design decisions are made during the
Design Phase, there are several choices made at the implementation stage that could
impact interoperability. For example, specific data collection systems might use different
data formats or encoding, which could be exacerbated when multiple data collection
modes are involved in the process. It is imperative that data dissemination methods, such
as those involving APIs, are thoroughly documented to facilitate interoperability and
efficient data sharing.
4. Collect/Acquire Phase: “this collects or gathers all necessary information (e.g., data,
metadata and paradata), using different collection modes (e.g., acquisition, collection,
extraction, transfer), and loads them into the appropriate environment for further
processing”. With statistical organisations increasingly involved with sources that are not
under their direct control (e.g., administrative data, big data from the web), ensuring
interoperability becomes even more challenging. Without proper documentation of the
data and mapping (e.g., between code lists used by different sources), the risk of
introducing non-interoperable elements during this phase significantly increases.
5. Process Phase: this phase “describes the processing of input data and their preparation
for analysis”. In this phase, various processes are applied to data and lack of availability of
data provenance information would lead to non-interoperability. For example, data
integration from different sources would need more mapping and transformation
processes if the data sets do not share common concepts or classifications. Besides, if the
classification or code list associated with the collected variables is not common across
different programmes, validation and editing rules are required at each iteration, increasing
the risk of errors (see the code-list mapping sketch after this list).
6. Analyse Phase: “in this phase, statistical outputs are produced and examined in detail”.
If the processed data files are not interoperable, comparing statistics with previous cycles
of the same programme or other related data would be difficult. If the concepts are not
interoperable, comparisons may be even impossible. There will be additional efforts
needed for carrying out in-depth statistical analyses such as time-series analysis,
consistency, and comparability analysis when concepts, classifications and code lists are
different.
7. Disseminate Phase: “this phase manages the release of the statistical products to
users”. Non-interoperable data sets are more difficult to prepare and put into output
systems, because formatting the data and metadata in a manual or semi-automated way
is prone to error. The lack of a common classification for different domains hinders the
user’s information discovery. For example, the use of a common standard such as the
Classification of Statistical Activities (CSA) [8] could help classify information about
statistical activities, data, and products by providing a top-level structure to make it easier
to find information about different domains, such as demographic, economic and
environment statistics.
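As referenced in the Process Phase above, the following Python sketch shows a mapping between a source-specific code list and a common classification; the codes and the correspondence table are invented for illustration.

# Hypothetical correspondence table from a source code list to a common
# classification; in practice such mappings are documented and maintained as
# metadata rather than hard-coded.
SOURCE_TO_COMMON = {
    "AGRI": "A",    # agriculture, forestry and fishing
    "MANUF": "C",   # manufacturing
    "RETAIL": "G",  # wholesale and retail trade
}

def recode(records: list[dict], field: str = "economic_activity") -> list[dict]:
    """Recode a source field to the common classification, flagging gaps."""
    out = []
    for rec in records:
        mapped = SOURCE_TO_COMMON.get(rec[field])
        out.append({**rec, field: mapped, "mapping_ok": mapped is not None})
    return out

admin_extract = [{"unit_id": "B003", "economic_activity": "AGRI"},
                 {"unit_id": "B004", "economic_activity": "MINING"}]
print(recode(admin_extract))
# The unmapped "MINING" record is flagged, showing where code lists differ in
# granularity or coverage and manual review is needed.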
3. DAFI Components
This section lists key elements that are important in achieving interoperability in statistical
organisations, focusing on the factors that help achieve the desired interoperability. These
components encompass organisational roles, legal and business policies that influence
interoperability, and the standards and technologies that facilitate it.
• Chief Data Officer: A chief data officer (CDO) is the manager dedicated to the organisation’s
data strategy and is responsible for the utilisation and governance of data across the
organisation. A CDO is a senior executive who drives growth by following a data-driven
approach.
• Chief Information Officer: A chief information officer (CIO) is the high-ranking executive
responsible for the management, implementation, and usability of information and
computer technologies systems of an organisation. A CIO oversees the maintenance and
of the internal technology processes as a way of maximising organisation productivity and
making complex tasks more achievable through automation. To navigate through
continually changing landscapes, a CIO needs a diverse skillset in terms of leadership,
communication ability, etc.
• Data Governance Manager: Data governance managers are responsible for implementing
and managing the data governance framework, policies, and procedures. They work
closely with various departments to ensure compliance and adherence to data governance
principles.
• Data Stewards: Data stewards are responsible for specific sets of data and ensure the
quality, accuracy, and integrity of the data, as well as its compliance with data governance
policies. Internally in the organisations, data stewards are often subject matter experts in
specific domains (e.g., business statistics, health statistics).
• Data Architects: Data architects design and develop the organisation's data architecture
to support interoperability. They create data models and structures that facilitate seamless
data exchange between systems.
• IT Managers: IT managers play a crucial role in ensuring that technical systems and
infrastructure support data interoperability. They oversee the implementation of data
integration solutions and manage the data exchange processes.
• Privacy and Compliance Officers: These individuals are responsible for ensuring that data
governance practices comply with relevant privacy regulations and legal requirements.
They help manage data access, usage, and consent mechanisms to safeguard sensitive
information.
• Business Analysts: Business analysts bridge the gap between technical teams and business
users. They help define data requirements, identify data sources, and assess data quality
to support interoperability initiatives.
• Data Consumers: These are the end-users or departments that utilise the data for
decision-making and operational purposes. Data consumers play a vital role in providing
feedback on data quality and ensuring that data meets their specific needs.
• Statistical Standard Experts: These experts support statistical program areas in all
matters related to the development, use or implementation of statistical standards, which
are key to interoperability. This could be provided in the form of supporting the
development of standard concepts and value domains, e.g., classifications, definitions,
etc., following established principles. They could also provide support in the use of
standard models which allow the proper capture and management of metadata used to
describe data.
• Publishing Staff: Publishing staff validate data and metadata for publication readiness.
They may be involved in drafting or approving data governance guidelines. This may apply
to primary or secondary data.
Various governance bodies can manage interoperability issues to ensure the smooth exchange and
integration of data. These bodies often oversee the implementation of standards and protocols to
promote data consistency and coherence. Some governance bodies include:
to support data management and interoperability, including decisions on the adoption of
standardised technologies and platforms for data exchange and integration.
• Data Quality Assurance Board: with the task of monitoring and ensuring the quality of
data produced and disseminated by the NSO. It establishes protocols and procedures for
data validation, verification, and quality control to maintain high data standards and
promote interoperability among different datasets.
• Inter-agency Data Sharing Task Force: facilitates collaboration and data sharing among
various government agencies and departments. It works to establish data-sharing
agreements, protocols, and mechanisms that promote interoperability and seamless data
exchange between different entities.
In Annex 3, we list the roles and responsibilities taken from ISO/IEC 11179, an international
standard for representing, storing, and maintaining metadata in a metadata registry. One section
is specifically devoted to the roles associated with the metadata registry. While addressing the
semantics of data, the representation of data and the registration of the descriptions of that data,
ISO/IEC 11179 intends to promote the harmonisation and standardisation of data and metadata and
its re-use within an organisation and across organisations.
Legal and business policies play an important role in aligning data governance rules
and practices across diverse entities (both among different organisations and within the
organisation).
In each country, there exists a Statistics Law that outlines the roles of NSOs and mandates their
activities. Some of these laws may include provisions that imply a responsibility for NSOs to
actively engage in and contribute to interoperability efforts (see Box 3.2 and Box 3.3 for examples
from Canada and Mexico respectively).
With growing recognition of the importance of data in driving the economy and improving the
quality of public sector services, there is an increasing trend towards developing centralised
platforms for the provision of public sector data, often accompanied by legal decisions and data
policies at the national level. Initiatives such as open government data strategies and national
data strategies also push for better data governance and management across society. For example,
Mexico’s Strategic Program of the National System of Statistical and Geographical Information
2022 – 2046 establishes a specific goal and a general action related to the consolidation of
interoperability between statistical and geographical programmes
(www.snieg.mx/Documentos/Programas/PESNIEG_2022-2046.pdf).
In contrast to other public entities where data is often a secondary by-product, the core business
of NSOs is data. With their substantial methodological and technical expertise, coupled with a
long, proven history of managing data at scale, NSOs naturally find themselves taking on a crucial
role in enhancing interoperability in the public sector more broadly (see also Box 3.4 in the
following section for an example from Italy).
The Information Infrastructure is the set of data and methodologies that support the
information production process to facilitate its interoperability and it is made up of catalogues,
classifications, statistical and geographical registries, and methodologies. The use of a common
Information Infrastructure facilitates the integration or linkage of information from different
statistical and geographical production processes.
Sources: https://www.diputados.gob.mx/LeyesBiblio/ref/lsnieg.htm
www.snieg.mx/Documentos/Programas/PESNIEG_2022-2046.pdf
On top of laws and initiatives at the national level, here's how legal and business policies can
contribute to enhancing interoperability in NSOs:
1. Standardised data formats and protocols: Implementing policies that mandate the use of
standardised data formats and protocols promotes uniformity in data representation,
enabling smooth data exchange and integration within the NSO and among data
providers.
2. Data sharing agreements and contracts: Establishing clear agreements that govern data
sharing between various government agencies and external partners can facilitate the
secure and efficient sharing of data while ensuring compliance with data protection
regulations and confidentiality requirements.
3. Open data policies and guidelines: Implementing open data policies and guidelines encourages the
responsible sharing of non-sensitive data with the public and businesses, fostering
transparency, innovation, and economic development while safeguarding data privacy
and confidentiality.
4. Data governance frameworks and business process integration: Developing
comprehensive data governance frameworks and aligning business processes with
interoperability standards can streamline data management practices and improve the
compatibility and consistency of data across different systems and departments within the
NSO (see Box 3.5. for the example of data governance framework from Israel).
5. Compliance with industry standards and best practices: Aligning legal and business
policies with industry standards and best practices, such as those recommended by
international organisations and data governance authorities, promotes the adoption of
interoperable technologies and practices, facilitating data integration and exchange at
both national and international levels.
By incorporating these legal and business policies, NSOs can enhance their interoperability
capabilities, ultimately contributing to the overall development and advancement of national
statistics and data governance.
The use of standards is key to ensure all types of interoperability as described in Section 2.
Standards are a set of agreed-upon and documented guidelines, specifications, accepted
practices, technical requirements, or terminologies for diverse fields. They can be mandatory or
voluntary and are distinct from acts, regulations, and codes, although standards can be referenced
in those legal instruments.
In the world of statistics, we can think of statistical standards, which are standards about all
aspects of the statistical production, either processes/capabilities or the data/metadata they use.
A statistical data and metadata standard is a statistical standard about how data and
metadata are managed, organised, represented, or formatted. This includes information about
processes (designs and plans of statistical programmes and each step in the statistical process),
capabilities to produce statistics, data, and metadata itself, the meaning of data and the terms
used in relation to data and its structure. It enables consistent and repeatable description (e.g.,
definitions), representation (e.g., permitted values, format), structuring (e.g., logical model), and
sharing (e.g., exchange model) of data.
Examples of statistical data and metadata standards are listed in Annex 1.
For interoperability purposes, standards need to be applied to information that is required to
advance the statistical processes. The use of the GSIM groups (Concepts, Exchange, Business,
and Structures), tied together with (meta)data catalogues for management purposes, provides a
framework in which multiple information domains and the applicable standards can be described.
The following image provides this overview:
• The Concept Group is used to define the meaning of data, providing an understanding of
what the data are measuring;
• The Exchange Group is used to catalogue the information that comes in and out of a
statistical organisation via Exchange Instruments. It includes information objects that
describe the collection and dissemination of information;
• The Business Group is used to capture the designs and plans of Statistical Programs,
and the processes that are undertaken to deliver those programs. This includes the
identification of a Statistical Need, the Business Processes that comprise the Statistical
Program and the Assessment of them;
• Vocabularies are organised collections of terms and relationships used to describe one or
more domains pertinent to the production of statistics (see the sketch after this list).
• Methods are standard technical means of accessing and exchanging data and metadata.
• Languages are programming languages for the validation, analysis, processing and
transformation of data and metadata.
• Workflows are standard process models that capture data and metadata processing at
different levels of detail.
• Data models are standard structure specifications for the representation of data and
metadata.
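As referenced in the Vocabularies bullet above, the following sketch uses the rdflib Python library to express a tiny controlled vocabulary with SKOS; the namespace, codes and labels are invented for the example.

# A tiny, hypothetical SKOS vocabulary: a concept scheme with two concepts and
# a broader/narrower relationship, serialised as Turtle.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/vocab/economic-activity/")  # illustrative namespace

g = Graph()
g.bind("skos", SKOS)
g.add((EX.scheme, RDF.type, SKOS.ConceptScheme))
g.add((EX.A, RDF.type, SKOS.Concept))
g.add((EX.A, SKOS.prefLabel, Literal("Agriculture, forestry and fishing", lang="en")))
g.add((EX.A, SKOS.inScheme, EX.scheme))
g.add((EX.A01, RDF.type, SKOS.Concept))
g.add((EX.A01, SKOS.prefLabel, Literal("Crop and animal production", lang="en")))
g.add((EX.A01, SKOS.broader, EX.A))
g.add((EX.A01, SKOS.inScheme, EX.scheme))

print(g.serialize(format="turtle"))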
Enablers for interoperability
The following schema highlights the most important enablers that make it easier to achieve the
different facets (layers) of interoperability described earlier. Here we report both the image and a
brief description of the enablers, including some enablers absent from this scheme but generally
useful for achieving interoperability.
• Use Machine Actionable Formats (MAFs) - MAFs are designed to make it easier for
machines to access, share, and use data. Their benefits include:
o MAFs can help to increase automation by making it easier for machines to
perform tasks such as data extraction, cleaning, and analysis. This can free up NSO
staff to focus on more strategic and value-added tasks.
o MAFs can help to improve data governance by making it easier to track and
manage the use of data, helping to ensure that data is used in a responsible and
ethical manner.
o MAFs can help to enhance data security by making it easier to encrypt and
protect data, mitigating the risk of data breaches and other security threats.
• Use Standardised Exchange Protocols (SEPs) - SEPs are protocols that define how data should
be exchanged between different systems and technologies. In official statistics, the most
relevant example is SDMX, but JSON and CSV are also widely used. Their benefits include:
o SEPs can help to reduce the costs associated with data dissemination and use
by making it easier to share and reuse data.
o SEPs can help to improve data quality by reducing the risk of data duplication
and errors.
o SEPs can help to increase transparency and accountability by making it
easier to track the provenance and usage of data.
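As a small illustration of retrieving data through a standardised exchange protocol, the following Python sketch queries a REST endpoint and reads a CSV response; the endpoint URL, dataflow identifier and query parameters are hypothetical, not a real service.

# Hypothetical request to an SDMX-style REST endpoint returning CSV; only the
# general pattern (standard query, standard format, standard parsing) matters here.
import csv
import io
import urllib.request

BASE_URL = "https://stats.example.org/rest"                 # hypothetical endpoint
query = f"{BASE_URL}/data/POP_TOTAL/A.MX?startPeriod=2015"  # hypothetical dataflow and key

request = urllib.request.Request(query, headers={"Accept": "text/csv"})
with urllib.request.urlopen(request) as response:
    reader = csv.DictReader(io.TextIOWrapper(response, encoding="utf-8"))
    rows = list(reader)

# Because the structure of the response follows the agreed protocol, the same
# parsing code works for any provider exposing the same standard interface.
print(len(rows), "observations retrieved")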
For syntactic interoperability:
seamlessly shared and interpreted across various statistical agencies, facilitating the
integration and comparison of data from different sources.
• Complete APIs Documentation: complete API documentation is a comprehensive set of
documents that explains how to use an API. This is an essential tool to use APIs effectively:
by providing complete API documentation, NSOs can make it easier for their staff and
users to understand and use APIs. This can lead to improved data collection, processing,
and dissemination.
For semantic interoperability:
which will make it possible to develop and increase interoperability between data of national
interest.
The investment involves the creation of a National Data Catalogue, with the aim of providing
a common model and standard and promoting the exchange, harmonisation and
understanding of information between public administrations, within the context of the
National Digital Data Platform. The Catalogue will make available controlled vocabularies and
classifications capable of making access to different information bases more functional.
To manage the project, the establishment of an Implementation Committee for the governance
and direction of the agreement is envisaged, in which the Department for Digital
Transformation at the Presidency of the Council of Ministers and Istat participate, but which is
also open to other possible public entities, such as the Agency for Digital Italy (AgID) and
National Research Council (CNR). For the development of the project plan, which provides for
a budget of 10.7 million euros, an important commitment of highly skilled human resources is
required, to be recruited through new hires. For Istat, specifically, the selection of a contingent
of up to 25 full-time people is envisaged, with technical, thematic, methodological, and legal
skills.
METAstat: the new Istat Metadata System
The importance of interoperability, which can be pursued primarily using statistical models and
standards existing at an international level (GSBPM, GSIM, etc.), has clearly emerged in the
context of the National Data Catalogue. At Istat level, the contribution to interoperability can
be achieved by having two fundamental infrastructures available: a complete and transversal
metadata system together with ontologies and controlled vocabularies.
Istat is currently working on the creation of METAstat, the new institutional system for the
documentation of metadata, processes, and statistical products. It will consist of three core
modules (controlled terminology collection; structural metadata; referential metadata),
currently independent of one another. It will integrate the different Istat systems
containing data (and consequently metadata), with the aim not only of improving their
performance and aligning them in their common aspects but also of adding the necessary
functions to assist production processes to the current documentary aspects. Indeed, METAstat
is designed not to be a passive catalogue of metadata, to be fed ex post, but must have an active
role in providing production services with the concepts (represented by metadata) on which to
structure the data to be produced (metadata driven). It will enter production processes as early
as the design phase of a survey and will have to be integrated into them. METAstat is intended
to provide active support to simplify and automate
production processes, as well as to increase the reliability, consistency and timeliness of the
data produced (quality principles). In this way, the sharing (internal or external) of the data
produced will be simplified and facilitated, because the data are already structured on shared
and certified metadata from the outset.
The National Data Catalogue and METAstat need to manage the semantic heritage of
administrative procedures and of statistical production processes, respectively.
Consultation, reuse, and reporting will be the characterising functions of METAstat, also from a
metadata-driven and interoperability perspective.
It is clear how crucial the aspects of governance and shared rules are before the development of
the system: the definition of an appropriate Istat metadata governance, with impacts on
relations with and between data systems that make use of metadata, represents an essential
task of the project.
At the CBS we follow the principles of "Security by design" and the "Need-to-know" security
principle.
The security-by-design policy ensures that systems and all their components are created from
the very on-set with security in mind. It is about taking a proactive approach and integrating
security from the very start.
The Need-to-know principle states that a user shall only have access to the information that
their job function requires, regardless of their security clearance level or other approvals.
We also follow the Five Safes framework to help make decisions about the effective use of confidential or sensitive data.
The Five Safes proposes that data management decisions be considered as solving problems in five 'dimensions': projects, people, settings, data, and outputs. The combination of these controls leads to 'safe use' (a toy sketch of such a combined decision follows the list below).
• Safe projects - Is this use of the data appropriate?
• Safe people - Can the users be trusted to use it in an appropriate manner?
• Safe settings - Does the access facility limit unauthorized use?
• Safe data - Is there a disclosure risk in the data itself?
• Safe outputs - Are the statistical results non-disclosive?
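The following toy sketch illustrates how the five dimensions could be combined into a single access decision. It is not the CBS implementation: the assessments are hypothetical, and in practice weaker controls in one dimension may be compensated by stronger controls in another rather than treated as a strict checklist.

```python
# Toy illustration of the Five Safes as a combined decision: every dimension is
# assessed and access is granted only when all controls are judged adequate.
# The assessments below are hypothetical; real reviews are qualitative and documented.

FIVE_SAFES = ["safe_project", "safe_people", "safe_settings", "safe_data", "safe_outputs"]

def safe_use(assessment: dict) -> bool:
    """Return True only if every one of the five dimensions has been assessed as safe."""
    missing = [dim for dim in FIVE_SAFES if dim not in assessment]
    if missing:
        raise ValueError(f"unassessed dimensions: {missing}")
    return all(assessment[dim] for dim in FIVE_SAFES)

request = {
    "safe_project": True,   # appropriate use of the data
    "safe_people": True,    # trained, authorised researchers
    "safe_settings": True,  # secure access facility
    "safe_data": False,     # residual disclosure risk in the microdata
    "safe_outputs": True,   # outputs checked before release
}
print(safe_use(request))  # False: the data dimension needs further treatment
```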
Quality
The quality assurance framework is an important part of the CBS Governance Framework.
It consists of two parts.
The first part describes the organisation's quality assurance management protocols: the appointment of the commissioner of statistical quality and their duties, the definition of quality trustees in the CBS departments, and their roles in managing and examining the Quality Indicators generated in the CBS data file management and processing system.
The second, methodological part defines the development, execution, and examination of Leading Quality Control Indicators that monitor the quality of sample-based and administrative data files in the CBS.
These two parts, together with the "statistics work regulations" of the CBS, which are based on the European Statistics Code of Practice, constitute the quality section of the CBS Governance Framework.
4. Recommendations
This section contains a set of recommendations covering activities that help achieve a good level of statistical interoperability. Some recommendations concern specific techniques, others are more generic or "architectural", and some suggest "organisational" changes that can support interoperability. Following most of these recommendations would allow an NSO to achieve a form of "interoperability by design".
7 For example, in the UK the Open Standards Board works with the Cabinet Office and is accountable for the transparent selection and implementation of open standards (see https://www.gov.uk/government/groups/open-standards-board), also managing interoperability issues among public bodies and private companies. In Australia, a framework that also describes governance processes was implemented as the "Australian Government Technical Interoperability Framework", available at https://www.unapcict.org/sites/default/files/2019-01/Australian%20Government%20Technical%20Interoperability%20Framework.pdf
5. Develop and implement interoperability standards. These standards should define how
different systems and technologies will communicate with each other.
6. Monitor and evaluate interoperability initiatives. This is important to ensure that
interoperability is being achieved in a way that meets the needs of the NSO and its
stakeholders. Implementing an interoperability strategy can be time-consuming; hence, organisations might choose to start with a small-scale pilot project that allows for learning and adaptation, identifying successful aspects and areas for improvement. The approach can then be refined based on the insights gained.
Metrics available to evaluate the interoperability level
Establishing a set of clear metrics is important given that the journey toward interoperability involves multiple stakeholders and can span an extended period. These metrics serve as measurable indicators to assess progress and effectiveness in a concrete manner, providing a tangible means to evaluate how far the organisation has come. Here are some suggested metrics that can be used to evaluate the interoperability "level" inside statistical organisations (a minimal computational sketch follows the list):
• Percentage of statistical processes that use common data standards and definitions -
extent to which NSOs are using common standards to collect, process, and disseminate
data.
• Percentage of statistical processes that use the same software tools in specific sub-
processes - generalised software as opposed to ad-hoc software (e.g., starting from
standard tools for data exchange like SDMX).
• Percentage of statistical processes that use the standard metadata system – level of use of
standard metadata system.
• Percentage of statistical processes whose data are disseminated as Linked Open Data
(LOD) - extent to which the “statistical” data are semantically integrated with other data.
• Percentage of statistical processes that can share data with each other seamlessly - extent
to which different processes are able to share data with each other without the need for
manual intervention.
• Stakeholder and user satisfaction with the interoperability of NSOs' statistical processes -
collected through surveys or other feedback mechanisms to assess how satisfied users are
with the ability to access and use NSO data from multiple sources.
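Most of these metrics can be derived from a simple inventory of statistical processes. The following minimal sketch assumes a hypothetical inventory with illustrative field names; it is meant only to show how such percentages could be computed and tracked over time.

```python
# Minimal sketch: computing some of the suggested interoperability metrics from a
# hypothetical inventory of statistical processes. Field names are illustrative.

processes = [
    {"name": "CPI", "uses_common_standards": True,  "uses_std_metadata_system": True,  "disseminated_as_lod": False},
    {"name": "LFS", "uses_common_standards": True,  "uses_std_metadata_system": False, "disseminated_as_lod": False},
    {"name": "SBS", "uses_common_standards": False, "uses_std_metadata_system": False, "disseminated_as_lod": True},
]

def share(flag: str) -> float:
    """Percentage of processes for which the given boolean flag is True."""
    return 100.0 * sum(p[flag] for p in processes) / len(processes)

print(f"Common data standards and definitions: {share('uses_common_standards'):.0f}%")
print(f"Standard metadata system:              {share('uses_std_metadata_system'):.0f}%")
print(f"Disseminated as Linked Open Data:      {share('disseminated_as_lod'):.0f}%")
```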
Other assessment tools that can be useful to measure interoperability levels are maturity models.
These are “tools that set out criteria and steps that help organizations measure their ability and
continuous improvement in particular fields or disciplines” [9]. Maturity models define levels to
characterise the state of specific fields or areas. Box 4.1 shows a few examples of interoperability maturity models.
Box 4.1. Examples of Interoperability Maturity Models (IMMs)
• European Commission's ISA programme IMM: focused on measuring how a public administration interacts with external entities to organize the efficient provisioning of its public services to other public administrations, businesses and/or citizens. The model distinguishes three domains of interoperability (service delivery, service consumption and service management) and uses a five-stage model to indicate the interoperability maturity of the public service.
• DOE's Office of Scientific and Technical Information (OSTI) IMM: written for stakeholders in technology integration domains, it identifies interoperability criteria grouped into six main categories: configuration and evolution, safety and security, operation and performance, organisational, informational and technical. In addition, as several criteria focus more on the culture changes and collaboration activities required to help drive interoperability improvements in an ecosystem or community of stakeholders, an additional "Community" category was formed. The maturity levels in the IMM are based on the Capability Maturity Model Integration (CMMI).
• National Archives of Australia Data IMM (DIMM): can be applied to all data produced by an agency that have the potential to be integrated, exchanged, or shared. The DIMM helps to measure an agency's information and data governance across five key themes: business, security, legal, semantic, and technical. Each theme is split into categories, and each category has five steps that describe the common data interoperability behaviours, events, and processes for the corresponding level of maturity.
Interoperability Maturity Models present relevant concepts and help to specify a strategic vision for interoperability. They focus on the relationship between interoperability and other specific areas that could be improved based on organisational objectives, and they can be used to identify the current level of data interoperability maturity, to identify gaps between the actual and desired interoperability level, or to plan the improvements needed to reach the maturity levels required by an organisation. Besides covering the semantic, structural, syntactic and system (technological) interoperability facets, these maturity models also consider other aspects, such as legal, organisational, and human capabilities, that can help achieve statistical interoperability.
(fair), and decision making is determined by consensus (consensus). Such standards are called
open, and they are the best candidates for statistical offices from which to choose.
Introduce Open Standards
An open standard is a standard that is openly accessible and usable by anyone. Compliance with open standards can significantly enhance interoperability within National Statistical Offices (NSOs) in several key ways (a short sketch of consuming data through one such standard follows the list):
1. Consistency and compatibility: Open standards provide a common framework for data
representation and exchange. By adhering to these standards, NSOs ensure that their data
formats and structures are consistent and compatible with those of other systems,
enabling data integration and exchange between different entities.
2. Facilitated data sharing: Open standards create the basis for data sharing among NSOs
and external stakeholders. When data is formatted and documented according to open
standards, it becomes easier for different systems and organisations to share and access
data, fostering improved collaboration and information exchange.
3. Reduced integration efforts: Open standards streamline the process of integrating data
from disparate sources by providing a well-defined set of rules and protocols. NSOs can
use these standards to minimize the efforts required for data integration, allowing for
more efficient and cost-effective interoperability between systems and platforms.
4. Enhanced accessibility and transparency: Compliance with open standards promotes data
accessibility and transparency, as it ensures that data is accessible and comprehensible to
a wider audience. This accessibility fosters greater transparency in data sharing and
dissemination, enabling stakeholders and the public to access, analyse, and utilize
statistical information.
5. Long-term sustainability: By aligning with these standards, NSOs can ensure the long-
term sustainability of their data management systems, as they remain compatible with
evolving technological advancements and changing data requirements.
6. Promotion of innovation: Open standards encourage innovation and the development of
new tools and technologies. NSOs can leverage open standards to foster a culture of
innovation, enabling the integration of new technologies and methodologies for data
collection, processing, and dissemination.
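To make the benefit concrete, the sketch below retrieves data from an SDMX REST web service, one of the open standards discussed in this document, and parses the SDMX-CSV response with a generic CSV reader. The endpoint, dataflow identifier and series key are placeholders, and the exact base path and media types vary by provider, so this should be read as a pattern rather than a ready-to-run call.

```python
# Sketch of consuming data through an open standard (SDMX REST API).
# The endpoint, dataflow and key below are placeholders; consult the provider's
# documentation for the exact base path, identifiers and supported formats.
import csv
import io

import requests

BASE_URL = "https://example-nso.org/sdmx/rest"   # hypothetical SDMX 2.1 REST endpoint
FLOW = "DF_CPI"                                  # hypothetical dataflow identifier
KEY = "M.IT.CP00"                                # hypothetical series key (freq.area.item)

response = requests.get(
    f"{BASE_URL}/data/{FLOW}/{KEY}",
    params={"startPeriod": "2022"},
    headers={"Accept": "application/vnd.sdmx.data+csv;version=1.0.0"},  # SDMX-CSV
    timeout=30,
)
response.raise_for_status()

# SDMX-CSV is plain CSV with standard column names, so any CSV reader can parse it.
for row in csv.DictReader(io.StringIO(response.text)):
    print(row.get("TIME_PERIOD"), row.get("OBS_VALUE"))
```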
Overall, compliance with open standards in NSOs plays a vital role in fostering a more
interconnected and efficient data ecosystem, promoting collaboration, transparency, and
innovation within the statistical community. Box 4.2 gives an example of the use of open standards as enablers of interoperability at Statistics Canada.
Box 4.2. Enablers for interoperability: Statistics Canada “Enterprise Information
and Data Management” (EIDM) project
Under the EIDM project, data management, metadata management, standards and governance
were brought together in a unified vision that would enable interoperability.
As part of this 4-year project, DDI, SDMX, DCAT and associated DCAT application profiles are
standards that were officially adopted by StatCan governance bodies. Policy instruments were
updated to reflect the mandatory use of these standards and the enterprise tools that are based
on these standards.
The project led to the implementation of the following tools and standards:
• Colectica to manage metadata on instruments and conceptual/referential metadata,
e.g., variables, universes, and studies, using the DDI standard.
• Ariā, a classification management tool based on GSIM and the Neuchâtel model, with the possibility of mapping to SKOS and XKOS
Sometimes there are multiple standards that can meet the requirements. Decisions on which
standard to use can be based on the following: availability of open standards, availability of
open-source applications, wide use of a standard, fit within the current ecosystem, etc.
Consider which ones would be the easiest to implement and start with one or a few of those.
There are likely already processes and systems in place that can be used or expanded upon.
4. Investigate approach through experiments: Investigate your proposed approach through
small and focussed experiments. By doing so, it will be easier to learn through them and to
iterate.
Ensure that you involve business partners through all steps of the experiments; a user-centric
approach will ultimately lead to the success of the implementation.
Explore how this new initiative would fit within an existing ecosystem.
If applicable, consider how it would relate to existing metadata. This will require mapping the
legacy metadata to the standard being considered. You may choose to migrate existing
metadata or simply link it within the ecosystem.
Note that re-using existing applications based on the chosen standard will likely yield quicker
results than building new applications.
5. Validate results: It is important to validate the tools, the standards, and the business
processes. If the experiment doesn’t yield the expected results, reconsider the standard, tool
or business processes used. You may find that the business requirements were not properly
expressed.
6. Move towards deployment: Once an approach has been determined as optimal, ensure that the proper governance tools are in place to make the use of the standard and its related tools mandatory.
Complete all steps necessary to move the standard, tools, and business processes into active
use.
If your new initiative is part of an existing ecosystem, it will be necessary to integrate it within
the ecosystem. If your interoperability initiative is related to improved metadata, migrate or
link to existing metadata.
Note that the last two steps may be part of their own individual projects to ensure that
initiatives stay small and focussed.
7. Onboard new users: The final step is to onboard new users. This will require a change
management plan, where communications, training and user guides are key components.
Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. This is enabled by technologies such as RDF, SPARQL, OWL and SKOS, which are World Wide Web Consortium (W3C) standards, together with the Linked Open Data (LOD) approach built on them.
The semantic web can also help to improve the quality of official statistics by enabling data to be
more easily validated, integrated, and analysed. By providing a common framework for data
exchange, the semantic web can help to reduce the risk of errors and inconsistencies in data and
enable more accurate and reliable statistical analysis.
The Resource Description Framework (RDF) is a framework for expressing information about
resources. Resources can be anything, including documents, people, physical objects, and abstract
concepts.
RDF is intended for situations in which information on the Web needs to be processed by
applications, rather than being only displayed to people. RDF provides a common framework for
expressing this information so it can be exchanged between applications without loss of meaning.
Since it is a common framework, application designers can leverage the availability of common
RDF parsers and processing tools. The ability to exchange information between different
applications means that the information may be made available to applications other than those
for which it was originally created.
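As a small illustration using the rdflib Python library (the dataset URI and titles are invented), the sketch below builds a resource description as RDF triples and serialises it in Turtle, a standard syntax that any RDF-aware application can consume.

```python
# Sketch: describing a (hypothetical) statistical dataset as RDF triples with rdflib
# and serialising it as Turtle, so other RDF-aware applications can reuse it.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

dataset = URIRef("https://example-nso.org/dataset/cpi-2023")  # hypothetical identifier
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Consumer Price Index 2023", lang="en")))
g.add((dataset, DCTERMS.publisher, URIRef("https://example-nso.org")))
g.add((dataset, DCAT.keyword, Literal("prices")))

print(g.serialize(format="turtle"))
```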
Many of the standards listed in the annex of chapter 3 that are applicable to statistical data or metadata are both RDF-based and open: DCAT (Data Catalog Vocabulary) and its associated application profiles DCAT-AP, StatDCAT-AP and GeoDCAT-AP, and XKOS, to name a few.
Linked Open Data (LOD) is a basic component of Semantic Web techniques, as it provides a standardized, linked, and machine-readable framework for representing and exchanging statistical data. LOD can improve statistical interoperability in several ways (a small query sketch follows the list):
• Publish statistical data as LOD. This will make it easier for machines to understand
and use the data, and to link it to data from other sources.
• Use LOD to create a central repository for statistical metadata. This will make
it easier for users to find and understand the data that is available.
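Once data and metadata are exposed in this way they can be queried with SPARQL. The following minimal sketch, again using rdflib with an in-memory graph and invented URIs, shows the kind of query a user or application could run against such a repository; in practice the same query could be sent to a public SPARQL endpoint.

```python
# Sketch: querying Linked Open Data with SPARQL using rdflib's in-memory engine.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<https://example-nso.org/dataset/cpi-2023>
    a dcat:Dataset ;
    dcterms:title "Consumer Price Index 2023"@en ;
    dcat:keyword "prices" .
""", format="turtle")

query = """
PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
    ?dataset a dcat:Dataset ;
             dcterms:title ?title .
}
"""
for row in g.query(query):
    print(row.dataset, row.title)
```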
(ii) Organisational barriers, such as the need for workflow changes (breaking down silos) and a culture shift towards collaboration. Fragmentation within the organisation is not necessarily due to a lack of knowledge or skills to implement technical standards for data interoperability, but rather to a lack of time and resources, reflecting how difficult it is to change course in day-to-day operations when staff must deal with a continuous demand for new data products while keeping key legacy systems running.8
People who are affected by the adoption of standards, especially those who must alter their
established practices, tend to resist change. This is why involving stakeholders from the outset is
crucial. It helps them view the requirements not as impositions but as necessary steps to deliver
greater value to the organisation.
The vision and business case set out in the interoperability strategy (section 4.1) therefore play an important role in instilling a common sense of direction.
Sponsorship by high-level management can be valuable for overcoming resistance to the introduction of interoperability: it provides credibility and legitimacy, builds trust among staff and stakeholders, and helps build support for the initiative through advocacy.
References
[1] DAMA International; DAMA-DMBOK. Data Management Body of Knowledge. 2nd Edition;
Technics Publications; USA; 2017 (https://technicspub.com/dmbok/).
[2] CES Task Force on Data Stewardship (2023) Data stewardship and the role of national
statistical offices in the new data ecosystem (https://unece.org/sites/default/files/2023-
04/CES_02_Data_stewardship_for_consultation.pdf; accessed July 2023).
[3] National Academies of Sciences, Engineering, and Medicine 2022. Transparency in Statistical
Information for the National Center for Science and Engineering Statistics and All Federal
Statistical Agencies. Washington, DC: The National Academies Press.
https://doi.org/10.17226/26360.
[4] The Open Data Institute (2013). What makes data open?
https://theodi.org/insights/guides/what-makes-data-open/
[5] World Bank (2021). World Development Report 2021: Data for better lives.
https://www.worldbank.org/en/publication/wdr2021
[6] Kécia Souza, Larissa Barbosa, Rita Suzana Pitangueira; Interoperability Types Classifications:
A Tertiary Study; ACM Digital Library; USA; 2021; https://doi.org/10.1145/3466933.3466952
[7] GSBPM v5.1 (2019) The Generic Statistical Business Process Model
https://unece.org/statistics/documents/2019/01/standards/gsbpm-v51
[8] ECE/CES (2022). Classification of Statistical Activities (CSA) 2.0 and explanatory notes.
https://unece.org/sites/default/files/2022-05/ECE_CES_2022_8-2205369E.pdf
[9] González Morales Luis, Orell Tom; Introducing the Joined-Up Data Maturity Assessment;
UNSD-GPSDD (2020)
https://www.data4sdgs.org/sites/default/files/file_uploads/Joined_Up_Data_Maturity_Asses
sment_draft5.pdf
Annex 1 - Standardised Vocabularies, Methods, Formats,
Frameworks, Languages, Workflows and Data models
(Meta)Data catalogues
Vocabularies
• Schema.org: is a reference website that publishes documentation and guidelines for using
structured data mark-up on webpages (called microdata). It is a part of the semantic web
project.
• DCAT-AP: is the DCAT Application Profile for data portals in Europe (DCAT-AP). It is
a specification based on the Data Catalogue Vocabulary (DCAT) developed by W3C.
This application profile is a specification for metadata records to meet the specific
application needs of data portals in Europe while providing semantic interoperability
with other applications based on reuse of established controlled vocabularies (e.g.,
EuroVoc) and mappings to existing metadata vocabularies (e.g., Dublin Core, SDMX,
INSPIRE…).
• GeoDCAT-AP: is a geospatial extension for the DCAT application profile for data
portals in Europe.
• Dublin Core: also known as the Dublin Core Metadata Element Set (DCMES), is a set of fifteen
main metadata items for describing digital or physical resources. Dublin Core has been
formally standardized internationally as ISO 15836 and as IETF RFC 5013.
• DDI-RDF Discovery Vocabulary (Disco): defines an RDF Schema vocabulary that enables
discovery of research and survey data on the Web. It is based on DDI XML formats of DDI
Codebook and DDI Lifecycle.
Concepts
Vocabularies
• DDI (Data Documentation Initiative) is a free international standard for describing the data
produced by surveys and other observational methods in different sciences. DDI can
document and manage different stages in the research data lifecycle.
• SDMX (Statistical Data and Metadata eXchange) is an international initiative that aims at
standardising and modernising the mechanisms and processes for the exchange of statistical
data and metadata among international organisations and their member countries.
• NIEM (National Information Exchange Model) is a common vocabulary that enables efficient
information exchange across diverse private and public organisations.
• XBRL (eXtensible Business Reporting Language) is an open international standard for digital
business reporting, managed by a global not for profit consortium. It provides a language in
which reporting terms (mostly financial) can be authoritatively defined.
• FIBO (Financial Industry Business Ontology) is the industry standard resource for the definitions of business concepts in the financial services industry, providing authoritative terminology for that domain.
• RCC (Region Connection Calculus) is a method used in AI for representing and reasoning about space. It is based on the idea of dividing space into regions and representing the relationships between regions using a set of calculus rules.
• ORG is a core ontology for organisational structures, aimed at supporting linked data
publishing of organisational information across several domains.
Exchange
Vocabularies
• DDI (Data Documentation Initiative) is a free international standard for describing the data
produced by surveys and other observational methods in different sciences. DDI can
document and manage different stages in the research data lifecycle.
• SDMX (Statistical Data and Metadata eXchange) is an international initiative that aims at
standardising and modernising the mechanisms and processes for the exchange of statistical
data and metadata among international organisations and their member countries.
• NIEM (National Information Exchange Model) is a common vocabulary that enables efficient
information exchange across diverse private and public organizations.
• The OpenAPI Specification is a specification language for HTTP APIs that provides a
standardized means to define your API to others.
Methods
• REST (Representational state transfer) is a software architectural style that was created to
guide the design and development of the architecture for the World Wide Web.
• SOAP (Simple Object Access Protocol) is a messaging protocol specification for exchanging
structured information in the implementation of web services in computer networks.
• SPARQL (SPARQL Protocol and RDF Query Language) is the standard query language and protocol for Linked Open Data on the web and for RDF triplestores.
• SHACL (Shapes Constraint Language) is a W3C standard language for describing Resource
Description Framework (RDF) graphs.
• GraphQL is an open-source data query and manipulation language for APIs and a query
runtime engine.
• ODATA (Open Data Protocol) is an open protocol (ISO standard) that allows the creation and
consumption of queryable and interoperable REST APIs in a simple and standard way.
• XML (Extensible Markup Language) is a markup language for storing, transmitting, and
reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that
is both human-readable and machine-readable.
• JSON (JavaScript Object Notation) is a lightweight data-interchange format, easy both for
humans and for machines to parse and generate. JSON is a text format completely language
independent but uses conventions that are familiar to programmers of the C-family of
languages (C, C++, Java, Python …). These properties make JSON an ideal data-interchange
language.
• HTML (HyperText Markup Language) is the standard markup language for documents
designed to be displayed in a web browser.
• JSON-LD (JSON for Linking Data) is a lightweight Linked Data format. It is based on the already successful JSON format and provides a way to help JSON data interoperate at Web scale (a brief illustration follows this list). JSON-LD is an ideal data format for REST Web services and unstructured databases such as Apache CouchDB and MongoDB.
• Turtle (Terse RDF Triple Language) is a syntax and file format for expressing data in the
Resource Description Framework data model. Turtle syntax is like that of SPARQL, an RDF
query language.
• CSV (Comma-Separated Values) file is a delimited text file that uses a comma to separate
values. Each line of the file is a data record. Each record consists of one or more fields,
separated by commas.
• Text file (sometimes spelled textfile; an old alternative name is flatfile) is a kind of computer file structured as a sequence of lines of electronic text. A text file is stored as data within a computer file system.
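For illustration (the record content is invented), the essential difference between plain JSON and JSON-LD is the added @context, which maps the keys of an ordinary JSON record to shared vocabulary terms so that the same record can also be read as Linked Data:

```python
# Sketch: the same record as plain JSON and as JSON-LD. The @context maps local
# keys to shared vocabulary terms (here Dublin Core), making the record Linked Data.
import json

plain_record = {
    "id": "https://example-nso.org/dataset/cpi-2023",   # hypothetical identifier
    "title": "Consumer Price Index 2023",
}

jsonld_record = {
    "@context": {"title": "http://purl.org/dc/terms/title"},
    "@id": plain_record["id"],
    "title": plain_record["title"],
}

print(json.dumps(jsonld_record, indent=2))
```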
Frameworks
• GSBPM (Generic Statistical Business Process Model) is a model that describes statistics
production in a general and process-oriented way. It is used as a common basis for work with
statistics production in different ways, such as quality, efficiency, standardisation, and
process-orientation.
• GAMSO (Generic Activity Model for Statistical Organisations) describes and defines the
activities that take place within a typical organisation that produces official statistics. It
extends and complements the GSBPM by adding additional activities needed to support
statistical production.
• CSPA (Common Statistical Production Architecture) is a reference architecture for the official statistical community. CSPA provides a framework, including principles, processes, and guidelines, to help reduce the cost of developing and maintaining processes and systems.
• FAIR (Findable, Accessible, Interoperable, Reusable) Digital Objects provide a framework to develop
cross-disciplinary capabilities, deal with the increasing data volumes, build tools that help to
increase trust in data, create mechanisms to efficiently operate in the scientific domain, and
promote data interoperability.
• International Open Data Charter is a set of principles and best practices for the release of
governmental open data, formally adopted by many governments.
Business
Languages
• SAS is a statistical software suite developed by SAS Institute for data management and
statistical analysis.
• R is a free and open-source extensible language and environment for statistical computing
and graphics.
• VTL (Validation and Transformation Language) is a standard language for defining validation
and transformation rules (set of operators, their syntax, and semantics) for any kind of
statistical data.
Workflows
• DDI-CDI (DDI Cross Domain Integration) provides a model-based approach to describing a range of needed data formats: traditional wide/rectangular data, long [event] data, multi-dimensional data, and NoSQL/key-value data.
• BPMN (Business Process Model and Notation) is a standard set of diagramming conventions
for describing business processes. It visually depicts a detailed sequence of business activities
and information flows needed to complete a process.
• CMMN (Case Management Model and Notation) is a graphical notation used for capturing
work methods that are based on the handling of cases requiring various activities that may be
performed in an unpredictable order in response to evolving situations.
• DMN (Decision Model and Notation) is a modelling language and notation for the precise
specification of business decisions and business rules. DMN is easily readable by the different
types of people involved in decision management.
• CWL (Common Workflow Language) is an open standard for describing how to run command
line tools and connect them to create workflows. Tools and workflows described using CWL
are portable across a variety of platforms.
Structures
Data models
• DDI (Data Documentation Initiative) is a free international standard for describing the data
produced by surveys and other observational methods in different sciences. DDI can
document and manage different stages in the research data lifecycle.
• SDMX (Statistical Data and Metadata eXchange) is an international initiative that aims at
standardising and modernising the mechanisms and processes for the exchange of statistical
data and metadata among international organisations and their member countries.
• EDM (Entity Data Model) is a set of concepts that describe the structure of data, regardless of
its stored form. The EDM borrows from the Entity-Relationship Model described by Peter
Chen in 1976, but it extends its traditional uses.
• RDF DM (Data Model) is a standard model for data interchange on the Web. RDF has features
that facilitate data merging, and it specifically supports the evolution of schemas over time.
• A Labelled Directed Graph is, as the name suggests, a Directed Graph whose arrows have
labels on them. A Directed graph (or digraph) is a graph that is made up of a set of vertices
connected by directed edges, called arcs.
• HDF5 (Hierarchical Data Format version 5), is an open-source file format that supports large,
complex, heterogeneous data. HDF5 uses a "file directory" like structure that allows you to
organize data within the file in many different structured ways, as you might do with files on
your computer.
Vocabularies
• DDI (Data Documentation Initiative) is a free international standard for describing the data
produced by surveys and other observational methods in different sciences. DDI can
document and manage different stages in the research data lifecycle.
• SDMX (Statistical Data and Metadata eXchange) is an international initiative that aims at
standardising and modernising the mechanisms and processes for the exchange of statistical
data and metadata among international organisations and their member countries.
• RDF Data Cube provides a means to publish multi-dimensional data, such as statistics, on the
web in such a way that it can be linked to related data sets and concepts using the W3C RDF
(Resource Description Framework) standard.
• CSVW (CSV on the Web) is a standard method for publishing and sharing data held within
CSV files.
Formats
• XML (Extensible Markup Language) is a markup language for storing, transmitting, and
reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that
is both human-readable and machine-readable.
• JSON (JavaScript Object Notation) is a lightweight data-interchange format, easy both for
humans and for machines to parse and generate. JSON is a text format completely language
independent but uses conventions that are familiar to programmers of the C-family of
languages (C, C++, Java, Python …). These properties make JSON an ideal data-interchange
language.
• Turtle (Terse RDF Triple Language) is a syntax and file format for expressing data in the
Resource Description Framework data model. Turtle syntax is like that of SPARQL, an RDF
query language.
• JSON-LD (JSON for Linking Data) is a lightweight Linked Data format. It is based on the
already successful JSON format and provides a way to help JSON data interoperate at Web-
scale. JSON-LD is an ideal data format for REST Web services and unstructured databases
such as Apache CouchDB and MongoDB.
• CSV (Comma-Separated Values) file is a delimited text file that uses a comma to separate
values. Each line of the file is a data record. Each record consists of one or more fields,
separated by commas.
Annex 2 - Applications that use standards
This annex includes a non-exhaustive list of applications that use standards. This list does not in any way imply endorsement of these tools; it is simply meant as a starting point for finding applications for some of the standards.
RDF-based metadata management open-source tools
• Based on SKOS, iQvoc supports vocabularies that are common to many knowledge
organisation systems, such as thesauri, taxonomies, classification schemes and subject
heading systems.
• BluLab is a web based SKOS Editor developed by BluLab, Ohio. The web based SKOS editor
allows users to create, curate, version, manage, and visualise SKOS resources.
• Apache Jena is a Java framework for building Semantic Web and Linked Data applications.
• CKAN is an open-source DMS (data management system) for powering data hubs and data
portals. CKAN makes it easy to publish, share and use data.
DDI-based tools
A list of DDI-based tools, which cover various versions of DDI (e.g., codebook, lifecycle), as well
as a variety of functionalities from authoring and editing to data transformations and conversions,
can be found at:
DDI Tools | Data Documentation Initiative (ddialliance.org)
SDMX-based tools
A range of SDMX tools, which allow structural metadata management, reference metadata editing, data management, reporting, dissemination, and other functionalities, can be found at:
Tools | SDMX – Statistical Data and Metadata eXchange
Annex 3 - Roles and responsibilities from ISO/IEC 11179
ISO/IEC 11179 is an international standard for representing, storing, and maintaining metadata in a controlled environment (a metadata registry). The standard, consisting of six parts, is focused on the semantics, representation, and description of data. Its purpose is to promote:
• standard description of data;
• common understanding of data across organisational elements and between organisations;
• re-use and standardization of data over time, space, and applications;
• harmonization and standardization of data within an organisation and across organisations;
• management of the components of descriptions of data;
• re-use of the components of descriptions of data.
ISO/IEC 11179 is a general description framework for data of any kind, in any organisation and
for any purpose. ISO/IEC 11179 does not address other data management needs, such as data
models, application specifications, programming code, program plans, business plans and
business policies.
Part 6 of the standard provides registration guidelines, describing the procedure by which metadata items, or other registry items, required in various application areas can be assigned an internationally unique identifier and registered in a metadata registry maintained by one or more registration authorities. Part of its Annex B is specifically devoted to the roles associated with the metadata registry. A summary is provided below for the following bodies:
• registration authorities
• submitting organisations
• stewardship organisations
Each type of acting body should meet the relevant criteria, fulfil the corresponding roles, and assume the associated responsibilities. The figure below provides a high-level view of how these organisational roles are related within the context of a metadata registry.
Figure: Organisational roles in relation to the metadata registry and their relationships (Source: ISO/IEC 11179-6:2023 ed. 4)
Registration authorities (RA)

Metadata registry registration authority: organizational unit that establishes and publishes procedures for the operation of its metadata registry. A registration authority should receive and process proposals from submitting organizations for registration of administered items falling within its registration domain. A registration authority is responsible for maintaining the metadata register of administered items and issuing international registration data identifiers (IRDIs).
Responsibilities: To establish itself as a registration authority, an organization should complete the following.
- Secure a Registration Authority Identifier (RAI), namely a unique, internationally recognized organization code.
- Prescribe, amend, and interpret the procedures to be followed for the registration of administered items in accordance with this document.
- Determine any additional conditions specifically required by its domain of registration within its metadata registry.
- Specify the format for each attribute and specify the media by which an item for administration should be submitted for registration.
- Establish and publish the rules by which its metadata registry should be made available. The registration authority shall specify the allowable users, the accessible contents, the frequency of availability, and the language(s), media, and format in which the information is provided for the metadata registry.
Regarding applications for registering items for administration, a registration authority should fulfil the following responsibilities.
- Receive and process applications for the registration of items for administration from its submitting organizations.
- Assign international registration data identifier values, and maintain a metadata register in accordance with its procedures.
- Consult the appropriate stewardship organizations when requests affect the mandatory attributes of the administered items being registered.
- Handle all aspects of the registration process in accordance with good business practice and take all reasonable precautions to safeguard the metadata register.
- Review and facilitate the progression of the applications through the registration cycle.
- Assign an appropriate registration status.
- Notify submitting organizations of its decisions according to the procedure specified in its rules.
Registrar: organizational unit within the registration authority, expert in registration processes, responsible for facilitating the registration of administered items and making those administered items widely accessible and available to the community.
Responsibilities: The registrar provides a single point of contact responsible for managing and maintaining information about data in the metadata register, under the authority of the registration authority. The registrar should be responsible for:
a) monitoring and managing the metadata registry contents;
b) enforcing policies, procedures, and formats for populating and using the metadata registry;
c) proposing procedures and standard formats for the metadata registry to the control committee for consideration;
d) recording current registration status for administered items in the metadata register;
e) ensuring access for authorized users to contents in the metadata registry;
f) assisting in the progression of administered items through the registration status levels;
g) assisting in the identification and resolution of duplicate or overlapping semantics of administered items in the metadata register;
h) acting on direction from the registration authority;
i) effecting registration of administered items in external metadata registers or dictionaries;
j) enforcing data registration procedures for submitting administered items to the metadata registry, e.g.:
- how to prepare, submit, and process submissions of administered items;
- how the metadata registry is used to avoid duplicate administered item submissions to the metadata register;
- how the metadata registry is used to effect harmonization of data across metadata registers of participating organizations;
- how external metadata registers are used as a source of administered items for reuse in the metadata register;
k) maintaining a separate document recording the appropriate contact information for all members of the control committee and the executive committee;
l) adding new users or organizational entities that may become authorized to access the metadata register;
m) maintaining other controlled word lists of the metadata registry.
Executive committee: organizational unit responsible for administering responsibilities and authority delegated by the registration authority.
Responsibilities: The executive committee should be responsible for overall policy and business direction for the metadata registry, to include:
a) establishing overall metadata registry policies;
b) resolution of all business management issues pertaining to the metadata registry, e.g. copyrights, stewardship, executive committee membership, etc.;
c) ensuring the long-term success and performance of the metadata registry;
d) establishing and updating the metadata registry charter and strategic plans;
e) meeting periodically in face-to-face meetings, with additional meetings and/or teleconferences held as needed.
The executive committee will normally fulfil its responsibilities via consensus building. Intractable issues may be resolved by an established procedure.
Control committee: it provides technical direction and harmonization of administered items for the metadata register. The membership of the control committee may include registrars and stewards.
Responsibilities: The control committee provides overall technical direction of, and resolution of technical issues associated with, the metadata registry, its contents, and its technical operations. The control committee should be responsible for:
a) overall conduct of registration operations;
b) promoting the reuse and sharing of data in the metadata register within and across functional areas, and among external interested parties to the enterprise;
c) progressing administered items through "Qualified", "Standard", and "Preferred Standard" registration status levels;
d) resolving semantic issues associated with registered administered items, e.g. overlap, duplication, etc.;
e) approving updates to administered items previously placed in the metadata register with the "Qualified", "Standard", or "Preferred Standard" registration status levels;
f) proposing metadata registry policies to the executive committee for approval;
g) approving authorized submitters, read-only users, and types of users, of the metadata registry;
h) approving metadata registry content, procedures, and formats;
i) submitting management-related recommendations and issues to the executive committee;
j) acting on directions from the executive committee;
k) meeting periodically in face-to-face meetings, with additional meetings and teleconferences held as needed.
The control committee will normally fulfil its responsibilities via consensus building in accordance with an established procedure. Intractable issues may be resolved by an established procedure.
Stewardship organizations (StO)

Stewardship organizations: they are usually designated by an organizational unit to ensure consistency of related administered items managed by its submitting organizations. A stewardship organization is the organization, or part thereof, that is responsible for the integrity and accuracy of the attribute values of the administered item, e.g. the semantics of administered items maintained and controlled by a registration authority.
Responsibilities: A stewardship organization should:
- at the registration authority's request, advise on the semantics, name, and permissible values for the administered item's attribute values submitted for registration;
- notify the registration authority of any amendments to the administered items assigned to the stewardship organization;
- decide, in case of confusion and/or conflict, on the attribute values of the assigned administered items.
Steward: stewards should be responsible for the accuracy, reliability, and currency of descriptive metadata for administered items at a registration status level of "Qualified" or above within an assigned area. Stewards should be responsible for metadata within specific areas and may have responsibilities that cut across multiple areas (e.g. value domains such as date, time, location, codes for the countries of the world).
Responsibilities: Stewards provide specific expert points of contact responsible for coordinating the identification, organization, and establishment of registered data for use throughout the enterprise within an assigned functional area. Stewards should be responsible for:
a) coordinating the identification and documentation of administered items within their assigned functional area;
b) ensuring that appropriate administered items in their assigned functional area are properly registered;
c) coordinating with other stewards to attempt to prevent or resolve duplicated efforts in defining administered items;
d) reviewing all administered items once they are in the "Recorded" status to identify and attempt to resolve conflicts among administered items with other stewards' assigned functional areas;
e) ensuring the quality of metadata attribute values for administered items they propose for the "Qualified" registration status level, reusing standardized data from external metadata registers where applicable;
f) proposing "Standard" registration status level administered items in their assigned functional area;
g) proposing "Preferred Standard" registration status level administered items in their assigned functional area;
h) ensuring that data registration procedures and formats are followed within their assigned functional area;
i) recommending submitters to the registration authority.
Submitting organizations (SuO)

Submitting organization: any organization that submits items to a registration authority for entry into its metadata registry. Each registration authority may establish its own criteria for registration eligibility.
Responsibilities: A submitting organization is responsible to:
- provide the information specified as required by the registration authority;
- provide any additional information relevant to the item submitted for registration;
- ensure that when an administered item has been registered, the specification of the attribute values of the administered item is not changed without first advising the registration authority.
Submitter: organizational unit approved by a process defined by the registration authority. A submitter is authorized to identify and report administered items suitable for registration.
Responsibilities: Submitters are organization elements that are familiar with or engaged in development and operational environments. Submitters maintain current administered items and are engaged to describe and submit new administered items following the registration requirements. A submitter should be responsible for:
a) identifying themselves to the register;
b) identifying and documenting administered items appropriate for registration in the metadata register;
c) submitting administered items to the metadata register;
d) ensuring the completeness of mandatory metadata attributes for administered items proposed for the "Recorded" registration status level.
Others

All others: a registration authority may establish guidelines on the use of its metadata registry by other users. The general goal should be to provide an open area that anyone may use to obtain and explore the metadata that is managed within the metadata registry.

Read-only user: organizational unit or individual that is approved to review the contents of the metadata register. A "read-only" user has access to the contents in the metadata register, but is not permitted to submit, alter, or delete contents.