Benchmarking Open Data Automatically
As open data becomes more widespread and useful, so does the need for effective ways to analyse it. Benchmarking open data means evaluating and ranking countries, organisations and projects based on how well they use open data. The process can improve accountability and highlight best practices among open data projects. It also allows us to understand and communicate how best to use open data for solving problems. To match the rising demand for evidence, future research and benchmarking exercises will need to happen at larger scale, higher frequency and lower cost.
This paper explores the individual dimensions of open data research and assesses how feasible it would be to conduct automated assessments of each. The four dimensions examined are open data's context/environment, data, use and impact. They are taken from the Common Assessment Framework for Open Data (CAF) [1], a standardised methodology for rigorous open data analysis. The paper proposes a comprehensive set of ideal constructs and metrics that could be measured for benchmarking open data: from the existence of laws and licensing as a measure of context, to access to education as a measure of impact.
Recognising that not all of these suggestions are feasible, the paper goes on to make
practical recommendations for researchers, developers and policy-makers about how
to put automated assessment of open data into practice:
1. Introduce automated assessments of open data quality, e.g. on timeliness, where data
and metadata are available.
2. Integrate the automated use of global performance indicators, e.g. internet freedoms, to understand open data's context and environment.
3. When planning open data projects, consider how their design may allow for automated
assessments from the outset.
Improving automatic assessment methods for open data may increase its quality and
reach, and therefore help to enhance its social, environmental and economic value
around the world. For example, an emphasis on metadata may ensure that data publishers spend enough time preparing data before its release. This paper will help organisations apply benchmarking methods at larger scale, with lower cost and higher frequency.
This paper is part of a series produced by the Open Data Institute under the Partnership for Open Data (POD), funded by the World Bank.
What is open data?
Open data is data that is made available by governments, businesses and individuals for anyone
to access, use and share.
What is the Open Data Institute?
The Open Data Institute (ODI) is an independent, non-profit and non-partisan company based in
London, UK. The ODI convenes world-class experts from industry, government and academia
to collaborate, incubate, nurture and explore new ideas to promote innovation with open data.
It was founded by Sir Tim Berners-Lee and Professor Sir Nigel Shadbolt and offers training,
membership, research and strategic advice for organisations looking to explore the possibilities
of open data.
In its first two years, the ODI has helped to unlock over US$55m in value through the application
of open data. With 24 nodes around the world, the ODI has trained more than 500 people from
over 25 countries. In 2014, the ODI trained officials from countries including Botswana, Burkina
Faso, Chile, Malaysia, Mexico, Moldova, Kyrgyzstan and the UK on the publication and use of
open data.
What is the Partnership for Open Data?
The Open Data Institute has joined Open Knowledge and the World Bank in the Partnership
for Open Data (POD), a programme designed to help policy-makers and citizens in developing
countries to understand and exploit the benefits of open data. The partnership aims to: support
developing countries to plan, execute and run open data initiatives; increase reuse of open
data in developing countries; and grow the base of evidence on the impact of open data for
development. The initial funding comes from the World Bank's Development Grant Facility (WB
DGF). Under POD, the ODI has carried out open data readiness assessments, strategic advice,
training and technical assistance for low- and middle-income countries across four continents.
In 2015, POD will merge with the Open Data for Development (OD4D) network. As part of this
new, larger network, the ODI will continue to take a lead in supporting the world's government
leaders in implementing open data, and in doing so will continue to publish practical guides
and learning materials, such as this series of reports.
Examples of existing open data benchmarks and the organisations behind them:

Benchmark | Organisation
E-Gov Survey/Index [5] | United Nations Public Administration Network
Global Open Data Index [6] | Open Data Census/Open Knowledge
Open Data Barometer [7] | Web Foundation & Open Data Institute
Open Data Monitor [8] | European Union Consortium (inc. Open Data Institute)
Isolated research efforts, however, may lead to duplication, reduce comparability and stifle innovative research. Even case studies that are unique by design benefit from an overarching framework that embeds their results in the wider context of open data research.
The growing importance of open data means that future research and benchmarking exercises
will need to happen at larger scale, higher frequency and lower cost. Only a quantitative
and scalable solution can meet these requirements while factoring in subjective indicators and
case study research. This study explores the feasibility of conducting automated assessment
of open data, based on the Common Assessment Framework.
Table 3.1 provides a brief introduction to each of the dimensions, an overview of the
current approaches in each and their potential for automation. This concise analysis
allows us to moderate our expectations of the potential for automation in each of the
dimensions.
Table 3.1. The four dimensions of the Common Assessment Framework

Context/Environment: the context within which open data is being provided. This might be the national context in the case of central Open Government Data, or the context in a particular sector such as health, education or transport.

Data: the nature and qualities of open datasets, including the legal, technical and social openness of data, and issues of data relevance and quality. The framework also looks to identify core categories of data that might be evaluated in assessments.

Use: how data is being used and with what possible outcomes. The framework looks at the categories of users accessing data, the purposes for which the data will be used, and the activities being undertaken. This part of the framework addresses the who, what and why of open data in use.

Impact: the benefits gained from using specific open datasets, or from open data initiatives in general. Benefits can be studied according to social, environmental, political/governance and economic/commercial dimensions.
CAF subcomponent: Organisational (Political will/Leadership)

- Commitment to transparency: government transparency index; measure of the centrality of openness in policy
- Government data/technology context: measure of the centrality of technology/data to government policy; level of government online service provision; percentage of government documents that are digitised; existence and strength of information management policy; count of government data roles/positions (high-level and overall)
- Engagement of government with other actors around open data: existence of information/data consultations; measure of the responsiveness of policy to consultation processes; level of engagement between agencies and developers
- Government promotion of open data goals: textual analysis of government communications (speeches/press releases/publications) for key words; count/percentage of government departments/agencies releasing open data; extent/strength of promotion of PSI reuse
Corresponding tables set out constructs and metrics for the Technical, Social and Economic subcomponents.
CAF subcomponent: Classification/Sectors of datasets

- Sectors of datasets: comparison of published datasets in a sector against a list of key sector datasets, for example based on the Global Open Data Index [14]; cluster analysis of datasets released by sector [15]
Examples of key dataset categories that might be evaluated:

- Statistics: national statistics, census, infrastructure, wealth, skills
- Social mobility and welfare: housing, health insurance and unemployment benefits
- Transport and infrastructure: public transport timetables, access points, broadband penetration
The Data dimension also includes a Quality subcomponent.
4.3 Use: measuring how and why open data is being used

Measuring how open data is used requires an examination of users' purposes, their activities and the sectors involved:
CAF subcomponent: Purpose

- Perceived motives: percentage using open data in their current field versus percentage trying to enter a new field; observed behaviour (increased value, lowered cost, improved experience, disrupted or enhanced existing activities); type of project (business/social/environmental)
- Ambition and goals: scale of outputs (local, national, international); percentage of those who publish/report results; percentage of revenue types (premium, freemium etc.)

CAF subcomponent: Activities

- Uses/outputs: count/size of secondary open data; analysis of applications and related tools; type of project outputs (report, data, software etc.)

CAF subcomponent: Sectors

- Sector/type of datasets most published; sector/type of datasets most used; sector/type of actors most involved; sector/type of outputs most produced (apps, reports etc.)
4.4 Impact: measuring the benefits of open data

CAF subcomponent: Social. High-level constructs:

- Education: access to education; quality of education; lifelong learning and development opportunities
- Health: combating disease and increasing life expectancy; promotion of healthy lives and well-being; development of the healthcare system and healthcare delivery
- Human settlements: sustainable land use, building and infrastructure planning; ability to house citizens; ability to manage urbanisation
- Transportation: access to transportation; increased efficiency of transportation; transport infrastructure
- Social development: gender equality and empowerment of women; protection of vulnerable society members; social inequality; personal financial management; social and economic security

The framework defines similar high-level constructs for the Environmental subcomponent.
CAF subcomponent: Political/Governance. High-level constructs:

- Governmental efficiency: public services; reduced crime and violence
- Governmental accountability: reduced government corruption; attitudinal changes toward government agencies
- Civic engagement: political freedom; political participation

CAF subcomponent: Economic/Commercial. High-level constructs:

- Economic prosperity: innovation and entrepreneurship; wealth and inequality; employment and unemployment statistics; job creation; trade and investment
- Growth in the open data landscape: total number of open data businesses; size/profit of open data businesses; number of new jobs created in the (open) data sector; size of tax revenue generated from open data companies
The next section demonstrates how new or existing benchmarking organisations can create automated assessment methods for metrics within the CAF's four dimensions. These metrics should supplement and streamline existing processes in a viable and useful way.
Table 5.1. Examples of GPI-based metrics for CAF constructs [30]

Construct | Example metric
Government data/technology context | Importance of ICT to government vision (Variable 8.01)
Technical infrastructure | Firm-level technology absorption (Variable 9.02)
Table 5.2 shows examples of data sources that are based on government open data portals.
Table 5.2. Examples of potential sources for CAF constructs for different countries
Construct(s) | Metric(s) | Example countries | Sources
Legal and regulatory constructs | Textual analysis of laws | Kenya, Sweden | [33]
RTI laws | Measure of effectiveness | Brazil, USA | Access to information statistics [35]; freedom of information statistics [36]
Government promotion of open data goals | Textual analysis of government communications | Australia, South Africa | Government media releases [37]; Department of Communications subscriptions [38]
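To illustrate how a metric such as "textual analysis of government communications" might be automated, the sketch below counts open-data-related key phrases in a piece of text. The keyword list and sample text are assumptions for illustration; a real pipeline would fetch the media releases and subscription feeds listed as sources above.

```python
"""Sketch: a keyword-frequency take on 'textual analysis of government
communications'. Keywords and input text are illustrative assumptions."""
import re
from collections import Counter

# Assumed phrase list; a real study would justify and refine this set.
OPEN_DATA_KEYWORDS = {"open data", "transparency", "open government",
                      "data sharing", "freedom of information"}

def keyword_frequencies(text: str) -> Counter:
    """Count occurrences of each key phrase in the given text."""
    text = text.lower()
    counts = Counter()
    for phrase in OPEN_DATA_KEYWORDS:
        counts[phrase] = len(re.findall(re.escape(phrase), text))
    return counts

if __name__ == "__main__":
    release = ("The minister announced an open data portal to improve "
               "transparency and open government across agencies.")
    print(keyword_frequencies(release))
```

Frequencies like these could then be aggregated over time or across departments to approximate the "government promotion of open data goals" construct.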
The literature on impact evaluation is vast, and open data initiatives may be able to adapt many of its leading practices.

This is not to say that an automated assessment is never attainable. However, it is the researcher's or organisation's responsibility to justify why such metrics are a valid representation of open data's impact.
The following recommendations are for new and existing benchmarking organisations:
1. Introduce automated assessments of open data quality, where data and metadata
are available
The analysis of data's nature and quality has the highest feasibility for automation. Data are typically quantitative in some form and are associated with metadata, i.e. data about data. This means that if data is provided through a hosting solution such as CKAN, Socrata, DataPress or OpenDataSoft, researchers can build automated assessments on top of these standardised platforms. The OpenDataMonitor project offers examples of how this works, and the sketch below shows the basic pattern.
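As a minimal sketch of what such an assessment could look like, the Python below queries a CKAN 2.x portal's standard Action API (package_search) and derives two illustrative quality signals: the age of each dataset's metadata (timeliness) and a naive completeness score. The portal URL and the choice of "core" fields are assumptions for illustration, not part of the CAF.

```python
"""Sketch: automated quality checks against a CKAN portal's Action API.
Assumes a CKAN 2.x portal; field names follow the standard CKAN schema."""
import json
from datetime import datetime, timezone
from urllib.request import urlopen

PORTAL = "https://demo.ckan.org"  # assumption: any CKAN 2.x portal

def fetch_packages(rows=100):
    # package_search is part of the standard CKAN Action API.
    with urlopen(f"{PORTAL}/api/3/action/package_search?rows={rows}") as resp:
        payload = json.load(resp)
    return payload["result"]["results"]

def quality_report(pkg):
    # Timeliness: days since the metadata was last modified
    # (CKAN timestamps are UTC without an explicit offset).
    modified = datetime.fromisoformat(pkg["metadata_modified"]).replace(
        tzinfo=timezone.utc)
    age_days = (datetime.now(timezone.utc) - modified).days
    # Completeness: share of filled core fields (illustrative selection).
    core = ["title", "notes", "license_id", "author", "resources"]
    filled = sum(1 for field in core if pkg.get(field))
    return {"name": pkg["name"],
            "age_days": age_days,
            "completeness": filled / len(core)}

if __name__ == "__main__":
    for pkg in fetch_packages(rows=10):
        print(quality_report(pkg))
```

The same pattern should apply to Socrata or OpenDataSoft portals, which expose comparable catalogue APIs, though field names and endpoints differ.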
2. Integrate the automated use of Global Performance Indicators (GPIs)
In the last decade, the availability of GPIs has risen dramatically. While many are unrelated to open data, several may help in understanding the context and environment of open data initiatives. These indicators are usually available for free, updated regularly, cover many or all countries and rest on deliberate methodologies. On its own, a GPI may not be sufficient for a benchmarking approach, but, as part of a wider scope, there is potential for automation, as the sketch below illustrates.
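As an illustration, this sketch pulls one such GPI, the World Bank's "Internet users (per 100 people)" indicator cited in the endnotes, via the Bank's public API. The year queried and the way values are keyed by country are illustrative choices.

```python
"""Sketch: pulling a global performance indicator (GPI) automatically
from the World Bank's public API. Year and paging are illustrative."""
import json
from urllib.request import urlopen

INDICATOR = "IT.NET.USER.P2"  # Internet users per 100 people
URL = ("http://api.worldbank.org/v2/country/all/indicator/"
       f"{INDICATOR}?format=json&date=2013&per_page=400")

def fetch_indicator():
    with urlopen(URL) as resp:
        # The v2 API returns a two-element list: [paging metadata, records].
        _, records = json.load(resp)
    records = records or []
    # Keep only countries with a reported value for the chosen year.
    return {r["country"]["value"]: r["value"]
            for r in records if r["value"] is not None}

if __name__ == "__main__":
    scores = fetch_indicator()
    # A GPI like this could feed the context/environment dimension.
    for country, value in sorted(scores.items())[:5]:
        print(f"{country}: {value:.1f}")
```

Because such APIs are versioned and publicly documented, a benchmarking organisation could refresh these inputs on a schedule rather than collecting them by hand.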
3. Consider the automated assessment of open data early on when planning projects
In many cases, automation fails on the most basic of requirements: the availability of data. Without relevant and valid data sources, there can be no automated methods. It is therefore crucial for researchers, developers and policy-makers to consider automation at the design phase of their projects. Small changes, such as the collection of key metadata, can determine whether an automated assessment is feasible later on. These considerations also have wider benefits: for example, an emphasis on metadata may ensure that data publishers spend enough time preparing data before its release. [41]
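A minimal sketch of such a design-time check follows, assuming a DCAT-inspired but otherwise hypothetical set of required metadata fields; the point is that publishers can verify, before release, that a dataset ships with the metadata automated assessments need.

```python
"""Sketch: a design-time check that a dataset carries the metadata
automated assessments rely on. The required fields are an illustrative,
DCAT-inspired selection, not a formal standard."""
REQUIRED_FIELDS = {
    "title": str,
    "description": str,
    "license": str,        # enables automated legal-openness checks
    "modified": str,       # ISO 8601 date; enables timeliness metrics
    "publisher": str,
    "distribution": list,  # file formats; enables technical checks
}

def validate_metadata(record: dict) -> list:
    """Return a list of problems; an empty list means ready for automation."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or value == "" or value == []:
            problems.append(f"missing field: {field}")
        elif not isinstance(value, expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems

if __name__ == "__main__":
    example = {"title": "Public transport timetables",
               "license": "CC-BY-4.0",
               "modified": "2014-12-18"}
    print(validate_metadata(example))
    # -> ['missing field: description', 'missing field: publisher',
    #     'missing field: distribution']
```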
We invite researchers to share their approaches to data analysis and automation. As the open
data landscape evolves, established methods will improve, proposed methods will become
more feasible and new methods will emerge. Research in open data, similar to open data
itself, should therefore lead by example and stimulate the network effect of sharing leading
practices with the community.
Endnotes
1. The first draft of the framework was developed by the World Wide Web Foundation, the Governance Lab at NYU, the ODI and other organisations in a workshop held in June 2014. http://opendataresearch.org/sites/default/files/posts/Common%20Assessment%20Workshop%20Report.pdf
4. Open Data Research Network, Research Project: Open Data Barometer. http://www.opendataresearch.org/project/2013/odb, accessed 2014-12-18.
5. United Nations Public Administration Network (2014). UN e-Government Surveys. http://www.unpan.org/egovkb/global_reports/08report.htm, accessed 2014-12-18.
7. Open Data Research Network (2013). Open Data Barometer. http://www.opendataresearch.org/barometer, accessed 2014-12-18.
9. World Wide Web Foundation & GovLab (2014). Towards common methods for assessing open data: workshop report & draft framework. http://opendataresearch.org/sites/default/files/posts/Common%20Assessment%20Workshop%20Report.pdf, accessed 2014-12-18.
13. Some general indicator examples can be found at http://www.epsiplatform.eu/content/psi-scoreboard-indicator-list, accessed 2014-12-18.
14. E.g. Freedom House, 2013 Global Scores. https://freedomhouse.org/report/freedom-net-2013-global-scores, accessed 2014-12-18.
15. E.g. Transparency International, 2014 Corruption Perceptions Index. http://www.transparency.org/cpi2014/results, accessed 2014-12-18.
16. E.g. Reporters Without Borders, World Press Freedom Index 2014. http://rsf.org/index2014/en-index2014.php, accessed 2014-12-18.
20. Taken from the G8 Open Data Charter. https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex, accessed 2014-12-18.
22. A full list of indices is awaiting publication; figure drawn from Kelley, J. G., & Simmons, B. A. (2014). The Power of Performance Indicators: Rankings, Ratings and Reactivity in International Relations (SSRN Scholarly Paper No. ID 2451319). Rochester, NY: Social Science Research Network. http://papers.ssrn.com/abstract=2451319, retrieved 2014-12-18.
33. World Economic Forum (2012). The Global Information Technology Report 2013 Data Platform. http://www.weforum.org/global-information-technology-report-2013-data-platform, accessed 2014-12-18.
34. World Bank, Internet users (per 100 people). http://data.worldbank.org/indicator/IT.NET.USER.P2, accessed 2014-12-18.
36. Freedom House, Freedom in the World. https://freedomhouse.org/report-types/freedom-world, accessed 2014-12-18.
37. World Economic Forum (2014). The Global Competitiveness Report 2014-2015. http://reports.weforum.org/global-competitiveness-report-2014-2015, accessed 2014-12-18.
42. Australia.gov.au, Government Media Releases. http://www.australia.gov.au/news-and-media/government-media-releases, accessed 2014-12-18.
43. Department of Communications SA, Subscriptions. http://www.gcis.gov.za/content/newsroom/subscriptions, accessed 2014-12-18.
45. Atz, U., Heath, T., Heil, M., Hardinges, J., & Fawcett, J. (2014). Best practice visualisation, dashboard and key figures report. OpenDataMonitor. Open Data Institute, London, UK. http://project.opendatamonitor.eu/wp-content/uploads/deliverable/OpenDataMonitor_611988_D2.3-Best-practice-visualisation,-dashboard-and-key-figures-report.pdf, accessed 2014-12-18.