Blueprint XXIII Web PDF
Mapping competitiveness with European data
DAVIDE CASTELLANI AND ANDREAS KOCH
ISBN 978-90-78910-36-7
15
BRUEGEL BLUEPRINT 23
16/2/15
10:03
The project leader is László Halpern for CERS-HAS. The leaders of the six teams are:
Carlo Altomonte (Bocconi University) for Bruegel; Giorgio Barba Navaretti (University
of Milan) for LdA; Gábor Békés for CERS-HAS; Andreas Koch for IAW; Lionel Fontagné
for Paris School of Economics and Philippe Martin for Sciences Po.
Supporting institutions are the National Bank of Belgium, Banque de France, Banco
de España, Deutsche Bundesbank, Banca d'Italia, Magyar Nemzeti Bank and the
Italian National Institute of Statistics (ISTAT).
Contents
Acknowledgements
About the authors
Foreword
Executive summary
1 Introduction
3.2.2
3.3
Annex
6.1 Assessment of the indicators of competitiveness
6.1.1 Productivity
6.1.2 Trade competitiveness
6.1.3 Price and Cost Competitiveness
6.1.4 Innovation & Technology
6.1.5 Firm Dynamics
6.1.6 Global Value Chains (GVCs)
6.2 List of Sources for macro indicators
6.3 The MAPCOMPETE meta-database
6.4 Detailed tables and comments for Chapter 2.2
6.5 Detailed tables for Chapter 2.3
6.6 Synthesis of accessibility conditions for micro-data in EU
References
Acknowledgements
This report is truly the result of a collective effort by many senior and junior researchers
at the six institutions that are part of the MAPCOMPETE project (Bruegel, Centro Studi
Luca d'Agliano, CERS-HAS, IAW, Paris School of Economics and Sciences Po). The
authors would like to acknowledge that parts of this report were initially drafted by
other researchers. In particular:
Section 2.1 builds on the Competitiveness Indicators Report (deliverable 3.1 of the
MAPCOMPETE Project) by Carlo Altomonte, Marco Antonielli, Michael Blanga-Gubbay
and Silvia Carrieri (Bruegel);
Sections 2.2 and 2.3 build on two technical reports describing the state of the art at
the sectoral, regional and aggregate level and at the microeconomic level
(deliverables 2.1 and 2.2 of the MAPCOMPETE project) by Davide Castellani, Silvia
Cerisola, Giulia Felice, Emanuele Forlani and Veronica Lupi (LdA). The collection of
information on availability and computability of indicators used in these sections
also benefited from the contribution of Chiara Angeloni (LdA), Marco Antonielli
(Bruegel), Zsuzsa Holler (CERS-HAS) and Gianluca Santoni (PSE). Many officials and
researchers within national statistical institutes, national central banks and other
institutions were also a key part of this task. A specific acknowledgement of these
contributions is provided in Box 2.3;
Chapter 3 is largely based on a report on "General considerations of the
matchability of datasets within and across countries and regions" (deliverable 4.2
of the MAPCOMPETE project) by Andreas Koch and Katja Neugebauer (London
School of Economics, formerly IAW). Carlo Altomonte and Elena Zaurino (Bruegel)
contributed to section 3.3;
Chapter 4 was written by Gábor Békés with Zsuzsa Holler (CERS-HAS);
Giorgio Barba Navaretti (LdA) and Carlo Altomonte (Bruegel) provided valuable
insights into chapter 5.
Finally, the authors would like to thank Lauro Panella (European Commission) and Jan
Hagemejer (Central Bank of Poland) for acting as discussants on a previous version of
this report and for providing very fruitful insights.
Davide Castellani and Andreas Koch
Perugia and Tübingen, January 2015
Foreword
Reality for policymakers, and the decisions they make, are to a great extent products
of the statistics available to them. It is not just the coverage and harmonisation of data
that are important. It is also the type of data that matters. As Europe attempts to put
itself back on the path to growth, the need for clear data on competitiveness, for an
accurate statistical underpinning beyond some broad macroeconomic indicators
and for new insights from new ways of looking at economic data has arguably never
been greater.
This volume, a product of the MAPCOMPETE EU-funded project in which Bruegel
participates, provides an important service for researchers and policymakers by
examining the availability and usefulness in Europe of indicators of competitiveness.
At the country, sector and regional levels, the authors find that Europe is served rather
well. In addition, micro-data, which could be used to tell us about competitiveness at
the firm level, is generated by EU member states. A previous project initiated by
Bruegel, EFIGE, examined the characteristics of firms that succeed globally and
showed why firm-level information is needed (www.efige.org/). But it can be hard in
practice for researchers to access the micro-data and to use it to create bottom-up
indicators of competitiveness.
This is an area in which policymakers should intervene. The matchability and
accessibility of data should be improved. The authors of this Blueprint set out a number
of practical ways in which this can be done to some extent in the short term, but a
longer-term approach is also required to build an effective European statistics
framework that will support broad growth and competitiveness objectives. This volume
shows how it can be done.
Guntram B. Wolff, Director of Bruegel
Brussels, February 2015
Executive summary
1. A first good reference at the international policy level that goes in this direction is the Competitiveness Research
Network (CompNet), set up by the European System of Central Banks.
appraisal is carried out of the extent to which data for different countries can be
matched and/or bottom-up indicators of competitiveness can be compared for
different countries.
Our overall conclusions are:
1. Competitiveness indicators are available at the country, sector and regional level
(eg unit labour costs, price indices, REER, trade balance data, aggregate
productivity) and are generally computable for relatively long time series in most
EU28 countries. These macro indicators are also generally easily accessible via
Eurostat, national statistical institutes, national central banks or other data
providers, and can usually be compared across EU countries.
2. Availability of micro-data, and therefore computability of bottom-up indicators, is
also rather good for many countries. This implies that, within countries, it is
possible, in principle, to match different databases.
3. There is, however, a major problem in accessing both specific databases and, even
more so, matched data in many EU countries. The report highlights many legal, non-legal (such as unclear procedures and restrictions on the nationality of data users) and
technical barriers that severely limit access to data and consequently the ability
of researchers to construct bottom-up indicators that are not generally constructed
by statistical agencies.
4. Furthermore, if we consider building up cross-country statistics from micro-level
data, which should be the final aim of any meaningful assessment of European
competitiveness, the quality of European statistics is at the moment rather poor,
due to limited harmonisation, matchability and accessibility of data. The possibility
to build pan-European micro-level databases to assess the state and the dynamics
of competitiveness in the whole region is limited, notwithstanding the considerable
efforts of the European Statistical System (ESS) to coordinate national statistical
institutes (NSI) to harmonise the methodology, the scope and the legal framework
for data collection and processing.
Policy: what should be done?
This report shows that the information on measures of competitiveness currently
available to researchers is insufficient. Aggregate data, which is easily accessible and
widely available, does not allow researchers to provide the answers that policymakers
projects funded by the European Commission's Seventh Framework and Horizon 2020
programmes. Carefully crafted annual surveys will allow new measures of competitiveness to be constructed and, at the same time, provide a greater understanding
of its dynamics even in the short term.
1 Introduction
competitiveness in Europe, mainly on the basis of micro-level data. By doing so, the
Blueprint, and the associated meta-database2, which provides detailed information
on data accessibility and computability for more than 150 indicators, can be a starting
point for a researcher interested in measuring competitiveness, or for policymakers
interested in the feasibility and in the quality of alternative measures. It also aims to
identify the opportunities emerging from recent progress made by scientific research
and facilitated by different data providers who increasingly make their data available
to researchers. Finally, this inventory allows us to identify the main issues that need to
be addressed by policymakers in order to improve data accessibility for the economic
analysis of competitiveness in Europe.
The report is organised as follows. We first introduce, in chapter 2 (section 2.1), some
general considerations on the measurement of competitiveness by developing and
presenting a system of indicators organising the field into different areas. Chapter 2
also contains an extensive inventory of the available data both at the macro level (2.2)
and at the micro level (2.3), mainly produced by public data providers such as EU
national statistical institutes and national central banks. This inventory highlights
whether information on competitiveness is available in EU countries, whether,
especially for micro-level data, it can be combined to compute the relevant indicators
of competitiveness and to what extent an external researcher (ie not affiliated with
the data provider) can access the data. Chapter 3 then focuses on the availability of
micro-data comparable across countries. It briefly reviews issues related to the
matching of micro-level data within and between countries, illustrates the Eurostat
experience in facing an increasing demand for micro-data comparable across countries
(3.2), and contains both an inventory and some illustrative examples of datasets that
contain information on previously unconnected areas or that gather information from
different countries (section 3.3). Chapter 4 presents some final considerations on how
to improve access to micro-data related to the measurement of competitiveness in
the future. Chapter 5 offers some policy recommendations3.
2. The MAPCOMPETE meta-database, which allows searching for meta information on availability and accessibility
of data needed to build indicators of competitiveness for the 28 EU countries, is available at
http://www.mapcompete.eu/.
3. The Annex (section 6 of this Blueprint) provides the more technical details on the indicators of competitiveness
and detailed tables.
2 Mapping competitiveness indicators in the EU countries
2.1 Indicators of competitiveness
With the purpose of improving the toolbox of competitiveness indicators, the Mapping
European Competitiveness (MAPCOMPETE) project provides an assessment of data
opportunities and requirements for the comparative analysis of competitiveness in
European countries. Existing competitiveness indicators have been surveyed in order
to provide a critical assessment and a selection of indicators to be used in the data-mapping exercise. This section introduces the methodology, the assessment and the
results of this survey and serves as a manual to interpret the findings of sections 2.2
and 2.3.
2.1.1 Methodology
Competitiveness indicators cover almost all aspects of market performance. Price and
quality, the ability to innovate, the structure of the labour market, the level of
international integration of markets, and qualitative conditions of countries' business
environments are frequently evoked in discussions of competitiveness. In fact, there
is no shared definition of competitiveness or consensus on how to measure it. We
decided, in line with Altomonte et al (2011), to consider competitiveness as related to
the ability of firms in a given country (not the country itself) to mobilise and
efficiently employ (also outside the country's borders) the productive resources
required to offer those goods and services for which other goods and services can be
obtained (domestically or internationally) at favourable rates of substitution (or terms
of trade).
This definition was inspired by a large body of economic literature suggesting that the
performance of countries is greatly affected by the performance of firms. Understanding
firm competitiveness is thus central to the policy discussion: because firms are
heterogeneous in terms of their size, productivity, internationalisation strategies
and so forth, policy needs to be designed around diverse characteristics
and strategic responses rather than around an invariant representative firm.
In light of this definition, we conducted a systematic investigation of existing competitiveness indicators in the economic literature, policy papers and other sources.
Focusing on the performance of firms affected the search in two ways.
First, we focused in particular on indicators that aggregate information from firm-level
data, which we label as bottom-up indicators. These indicators can be useful
complements to the macro-indicators, constructed with aggregated data. Indeed, one
of the major contributions of MAPCOMPETE is to highlight where the existing standard
competitiveness indicator toolbox can be enriched with harmonised and complementary bottom-up indicators.
Second, recognising that firms compete not only on price, we gave special attention to
non-price competitiveness indicators. This led to a view of competitiveness that has
sustainable growth as the underlying concept.
Despite taking this direction, the lack of a common understanding of competitiveness
in the policy debate motivated us to further specify our analysis. In our conceptualisation, indicators of competitiveness are distinguished from drivers of
competitiveness. In theory, the difference is striking: indicators tell us if firms,
countries, sectors or regions perform well compared to each other; drivers tell us what
determines this performance. However, in practice this difference is less obvious:
indicators and drivers are sometimes used in the same context to denote different
aspects of competitiveness; in other cases, indicators are not used as outcomes but
rather as determinants.
In this chapter, we deal primarily with indicators rather than with drivers of
competitiveness. In a commentary published in the Financial Times4, Risto Penttilä,
chief executive of the Finnish Chamber of Commerce, made a very compelling
argument, which supports our choice:
"Either the World Economic Forum is wrong or Europe is in deep trouble. The latest
competitiveness rankings from the Swiss think-tank list Finland as the most
competitive country in the EU. At first, the country's business leaders thought
someone was pulling their leg. But the news was real. If Finland is the best the EU
can offer, we should all be very concerned. (...) The report's authors define
competitiveness as the set of institutions, policies, and factors that determine the
level of productivity of a country. But Finland's experience shows that having well-functioning institutions is not a cure-all. The country ticks all the boxes: well-protected property rights, good schools, reliable infrastructure, predictable
macroeconomic policies. It is one of the biggest spenders on research and
development in the world. Yet the productivity of Finnish industries has plummeted
since 2009."
4. 'If Finland is the best Europe can do, we should be worried', Financial Times, 24 June 2014.
2.1.2 Classification logic and selection of indicators
Organising competitiveness indicators around several concepts helped us to assess
them against their primary objective, compare similar indicators and find
complementarities. We use the following six concepts:
1. Productivity
2. Market share
3. Prices and costs
4. Innovation and technology
5. Firm dynamics
6. Global value chains
[Figure: number of indicators by area (productivity, trade competitiveness, firm dynamics, others)]
other words, one can highlight the availability of data for each indicator across
countries, or of each country across all indicators. We believe the former is more
informative for the aim of this report, which is to provide an overview of the availability
of comparable competitiveness indicators in different countries.
It is worth mentioning that some indicators can be computed from more than one
source and the different sources could imply different coverage in terms of countries,
time spans and/or sectors and regions. The results presented in this chapter are based
on the authors' a priori choices of the most appropriate source for each indicator. In
particular, we assigned a higher priority to data sources that were more exhaustive
in terms of the information they provide about countries (ie we assigned cross-country
comparability a higher priority). If two (or more) sources provide the same country
coverage, we preferred the one with the longer time series.
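The source-selection rule just described is effectively a two-level ranking: country coverage first, length of the time series as the tie-breaker. A minimal Python sketch under stated assumptions follows; the `Source` record, its fields and the example sources are hypothetical and are not taken from the MAPCOMPETE meta-database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A candidate data source for one indicator (hypothetical record type)."""
    name: str
    countries: frozenset  # EU countries for which the indicator can be computed
    first_year: int
    last_year: int

    @property
    def years(self) -> int:
        """Length of the available time series, in years."""
        return self.last_year - self.first_year + 1


def preferred_source(candidates):
    """Rank by country coverage first; among equal coverage, prefer the longer series."""
    return max(candidates, key=lambda s: (len(s.countries), s.years))


# Hypothetical sources: A and B cover the same countries, but B has a longer series;
# C has the longest series but covers fewer countries.
a = Source("A", frozenset({"AT", "BE", "DE"}), 2000, 2012)
b = Source("B", frozenset({"AT", "BE", "DE"}), 1995, 2012)
c = Source("C", frozenset({"AT", "BE"}), 1980, 2012)
print(preferred_source([a, b, c]).name)
```

Using a tuple as the sort key keeps the priority order explicit: the second element is only consulted when the first one ties.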
Detailed tables and comments are provided in the Annex (Section 6.4). Here, we
summarise the main conclusions from this task.
For the 89 indicators of competitiveness at country, sectoral and regional levels for
the EU28 countries, our analysis shows that the degree of computability for the macro-indicators is quite good. However, there are some exceptions, which fall into three main categories: i) by country, ie there are countries for which
data availability is particularly scarce for the majority of indicators; ii) by indicator, ie
there are indicators on which information is particularly scarce for the majority of
countries and levels of aggregation; iii) by level, ie there are levels of aggregation on
which information is particularly scarce for several indicators for the majority of
countries.
In terms of exceptions by country, most EU28 countries show a good level of
computability for a substantial number of indicators at different aggregation levels.
Information is scarcer for Croatia and Greece than for other countries.
In the second category of exceptions, information on indicators of firm dynamics (such
as entry and exit rates) is quite heterogeneous across countries and levels of
aggregation, but in general only half of the countries show the highest level of
computability. The indicators belonging to intangible assets and financial activity are
computable for only a few countries and/or quite short time intervals. The information
on R&D expenditure and output is in general quite good with the exception of licence
and patent revenues from abroad as percent of GDP and EU Summary Innovation Index
(SII), which are computed and comparable across countries for all EU countries since
scarcer and less homogeneous than at the aggregate level across countries and
indicators. In particular, the indicators' computability is high at the aggregate level, but
quite limited at both sectoral and regional levels for those indicators belonging to
labour productivity and Total Factor Productivity, innovation activity, SMEs and R&D
expenditure and output. The indicators' computability is high at the aggregate level
but quite limited at the sectoral level for those indicators belonging to trade
competitiveness (Group 4), while the indicators' computability is high at the aggregate
level but quite limited at the regional level for those indicators belonging to
innovation activity, all firms (Group 8).
2.3 Mapping the bottom-up indicators
2.3.1 Methodological issues
As we have illustrated, indicators of competitiveness can be calculated at national,
sectoral and regional level by aggregating firm-level data, ie by applying a bottom-up
approach. Firm-level data allows researchers and policymakers to define a multitude
of indicators that can be used to describe phenomena such as differences in regional
productivity, the entry and exit rate in a specific market or international competitiveness (eg the intensive and extensive margin of trade).
This section provides an overview of the availability and accessibility of data needed
to compute a series of bottom-up indicators of competitiveness for the EU28 countries.
We discuss both the degree of computability of different indices and the degree of
accessibility of firm-level data which is necessary to compute the related indicators.
While computability concerns the quality and time coverage of indicators,
accessibility concerns limitations on access to firm-level data5. This information is
extracted from the meta-DB (section 6.3) and will be fully searchable, jointly with the
other meta-data, via a webtool at www.mapcompete.eu.
It is worth mentioning that this section focuses on indicators that are well-established
in the literature on competitiveness, as reviewed in section 2.1, and that can be
computed from micro-level databases collected mainly by national central banks
(NCB) and national statistical institutes (NSI). Surveys, projects or commercial
databases can also offer internationally comparable indicators/data on competitive-
5. As mentioned above, at this stage we mainly rely on official firm-level data collected by central banks and national
statistical offices.
6. For example, we consider the possibility to define the average, the median, and the standard deviation of TFP for
exporting firms.
Labour productivity
This area includes information which is used to calculate the labour productivity index
as value added per worker. The index is defined for different types of firms such as
domestic firms, exporters, importers, multinationals, affiliates of foreign multinational
firms, and foreign and domestic-owned exporters. Moreover, in this category we also
consider firms' unit labour cost. Regional and sectoral dimensions are taken into
account, as well as the possibility to define different points of the index distribution.
Summary results are reported in Table 6.14.
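As an illustration of how a micro-aggregated (bottom-up) labour productivity indicator can be built, the following Python sketch computes value added per worker for each firm and then summarises its distribution (mean, median, standard deviation) for all firms and for a subgroup such as exporters. The record layout, field names and figures are hypothetical; in practice such indicators would be computed on confidential firm-level data held by the data provider.

```python
import statistics

# Hypothetical firm-level records: value added (thousand euro), employees, export status.
firms = [
    {"id": 1, "value_added": 500.0, "employees": 10, "exporter": True},
    {"id": 2, "value_added": 300.0, "employees": 10, "exporter": False},
    {"id": 3, "value_added": 900.0, "employees": 15, "exporter": True},
    {"id": 4, "value_added": 120.0, "employees": 4, "exporter": False},
]


def labour_productivity(firm):
    """Firm-level labour productivity: value added per worker."""
    return firm["value_added"] / firm["employees"]


def micro_aggregate(records, keep=lambda f: True):
    """Mean, median and standard deviation of labour productivity for a subgroup of firms."""
    values = [labour_productivity(f) for f in records if keep(f) and f["employees"] > 0]
    if not values:
        return None
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


print(micro_aggregate(firms))                                # all firms
print(micro_aggregate(firms, keep=lambda f: f["exporter"]))  # exporters only
```

The `keep` filter is where other firm types (importers, multinationals, foreign affiliates) would be selected, provided the relevant status variable can be linked to the balance-sheet data.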
Total Factor Productivity (TFP)
Similarly to labour productivity, for each country, we collected information on the
availability of the data that is necessary to calculate firm-level TFP. In addition, the
decomposition of TFP proposed by Olley and Pakes (1996) and the decomposition of
TFP growth proposed by Foster et al (2001) are also considered. Regional and sectoral
dimensions are included, as well as the possibility to define different statistical
moments of the index distribution. A full list of indices can be found in the annex. Summary
results are reported in Table 6.15.
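The two TFP decompositions mentioned above can be sketched in a few lines of Python. In the Olley and Pakes (1996) decomposition, share-weighted aggregate productivity equals the unweighted mean plus a covariance term, Σ_i (s_i − s̄)(ω_i − ω̄), which captures how far market shares are allocated to more productive firms. The Foster et al (2001) decomposition splits the change in aggregate productivity into within, between, cross, entry and exit components. This is a sketch assuming that firm-level productivity (eg log TFP) has already been estimated and that shares (eg of output or employment) sum to one in each period; the function names and data are hypothetical.

```python
def olley_pakes(shares, prod):
    """Olley-Pakes (1996) decomposition of share-weighted aggregate productivity.

    shares: firm -> market share (assumed to sum to 1); prod: firm -> productivity.
    Returns (aggregate, unweighted mean, covariance term); aggregate = mean + covariance.
    """
    n = len(prod)
    mean_prod = sum(prod.values()) / n
    mean_share = 1.0 / n
    aggregate = sum(shares[i] * prod[i] for i in prod)
    covariance = sum((shares[i] - mean_share) * (prod[i] - mean_prod) for i in prod)
    return aggregate, mean_prod, covariance


def foster_haltiwanger_krizan(s0, w0, s1, w1):
    """Decompose aggregate productivity growth between t-1 (s0, w0) and t (s1, w1).

    Firms present in both periods are continuers, firms only in t are entrants and
    firms only in t-1 are exiters. Shares are assumed to sum to 1 in each period.
    """
    p0 = sum(s0[i] * w0[i] for i in s0)  # aggregate productivity in t-1
    continuers = s0.keys() & s1.keys()
    entrants = s1.keys() - s0.keys()
    exiters = s0.keys() - s1.keys()
    return {
        "within": sum(s0[i] * (w1[i] - w0[i]) for i in continuers),
        "between": sum((s1[i] - s0[i]) * (w0[i] - p0) for i in continuers),
        "cross": sum((s1[i] - s0[i]) * (w1[i] - w0[i]) for i in continuers),
        "entry": sum(s1[i] * (w1[i] - p0) for i in entrants),
        "exit": -sum(s0[i] * (w0[i] - p0) for i in exiters),
    }


# Hypothetical data: firm "c" exits and firm "d" enters between the two years.
s0, w0 = {"a": 0.5, "b": 0.3, "c": 0.2}, {"a": 1.0, "b": 0.8, "c": 0.5}
s1, w1 = {"a": 0.6, "b": 0.3, "d": 0.1}, {"a": 1.1, "b": 0.9, "d": 1.2}
print(olley_pakes(s0, w0))
print(foster_haltiwanger_krizan(s0, w0, s1, w1))
```

Both are accounting identities: the Olley-Pakes terms add up to the aggregate level, and the five Foster et al components add up to the change in aggregate productivity, which is a useful check when implementing them on real data.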
Firm dynamics
Another source of competitiveness is the rate of turnover of firms (ie the entry and exit
rate), and the average growth rate of firms. Therefore, the data mapping includes
information on firms entering and exiting the market, survival rates after different time
periods, average firm size (relative to age), and dispersion of firms by size and growth rate.
Summary results are reported in Table 6.16.
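One common way to turn a firm-level panel into the turnover indicators listed above is sketched below in Python: entry rates relative to active firms in the current year, exit rates relative to active firms in the previous year, and survival rates for an entry cohort. These definitions are assumptions (conventions differ across statistical offices) and the panel is hypothetical. With sample-based data, a firm appearing in or disappearing from the sample is not necessarily a genuine market entry or exit.

```python
# Hypothetical panel: year -> set of identifiers of firms active in that year.
panel = {
    2010: {"a", "b", "c", "d"},
    2011: {"a", "b", "e"},
    2012: {"a", "e", "f"},
}


def entry_rate(panel, year):
    """Share of firms active in `year` that were not active in the previous year."""
    active, previous = panel[year], panel[year - 1]
    return len(active - previous) / len(active)


def exit_rate(panel, year):
    """Share of firms active in the previous year that are no longer active in `year`."""
    active, previous = panel[year], panel[year - 1]
    return len(previous - active) / len(previous)


def survival_rate(panel, cohort_year, horizon):
    """Share of firms entering in `cohort_year` still active `horizon` years later."""
    entrants = panel[cohort_year] - panel[cohort_year - 1]
    if not entrants:
        return float("nan")
    return len(entrants & panel[cohort_year + horizon]) / len(entrants)


for year in (2011, 2012):
    print(year, entry_rate(panel, year), exit_rate(panel, year))
print(survival_rate(panel, 2011, 1))
```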
International activities
In this area, we mapped the availability of information on trade activity at the firm level. This group includes data on the number of export destinations, the number of
exporting firms (total and by destination) and the number of products exported (total and by
destination). In addition, different measures of the intensive and extensive margins of
trade are included, as well as firm-level estimates of quality (unit value of exports).
Information on the number of foreign-owned firms as a share of all firms, and the share
of domestic multinational firms (MNFs) in total firms (by country, sector and region)
is also collected. Summary results are reported in Table 6.17.
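The extensive and intensive margins of trade can be computed from firm-level export records roughly as follows. In this hypothetical Python sketch, the extensive margin is the number of exporting firms, the intensive margin is the average export share of turnover among exporters, and the per-destination counts correspond to the number of exporting firms by destination; field names and values are illustrative only.

```python
# Hypothetical firm records: turnover and export values by destination (same currency unit).
firms = [
    {"id": 1, "turnover": 1000.0, "exports": {"DE": 200.0, "FR": 100.0}},
    {"id": 2, "turnover": 500.0, "exports": {}},
    {"id": 3, "turnover": 800.0, "exports": {"DE": 400.0}},
]


def total_exports(firm):
    return sum(firm["exports"].values())


def extensive_margin(firms):
    """Number of exporting firms."""
    return sum(1 for f in firms if total_exports(f) > 0)


def intensive_margin(firms):
    """Average export sales as a share of turnover, among exporters only."""
    shares = [total_exports(f) / f["turnover"] for f in firms if total_exports(f) > 0]
    return sum(shares) / len(shares) if shares else float("nan")


def exporters_by_destination(firms):
    """Number of exporting firms per destination market."""
    counts = {}
    for f in firms:
        for destination, value in f["exports"].items():
            if value > 0:
                counts[destination] = counts.get(destination, 0) + 1
    return counts


def mean_destinations(firms):
    """Average number of export destinations per exporting firm."""
    n = [sum(1 for v in f["exports"].values() if v > 0) for f in firms if total_exports(f) > 0]
    return sum(n) / len(n) if n else float("nan")


print(extensive_margin(firms), intensive_margin(firms))
print(exporters_by_destination(firms), mean_destinations(firms))
```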
Degree of computability: colour code
Green: Good time span and good matchability. Observations at least since the year 2000.
Yellow: Observations only after the year 2000; matching different datasets is basically possible, but associated with some problems.
Red: No matchability and/or only few years of data (from 2006).
Grey: With the available information it is not possible to assess the time span and/or the matchability.
Accessibility: colour code
Green, Yellow, Red, Grey
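The computability colour code can be read as a simple decision rule. The Python sketch below is one possible encoding, assuming that matchability is recorded as "good", "problematic" or "none" and that the first year of available data is known. How borderline cases are classified (eg data from 2003 with good matchability, treated here as Yellow) is our assumption rather than a rule stated in the meta-database.

```python
from typing import Optional

MATCHABILITY = {"good", "problematic", "none"}


def computability_colour(first_year: Optional[int], matchability: Optional[str]) -> str:
    """Assign a computability colour code from the first year of data and matchability.

    `first_year` is the first year with observations and `matchability` one of
    "good", "problematic" or "none"; None means the information is not available.
    """
    if first_year is None or matchability is None:
        return "Grey"    # time span and/or matchability cannot be assessed
    if matchability not in MATCHABILITY:
        raise ValueError(f"unknown matchability: {matchability!r}")
    if matchability == "none" or first_year >= 2006:
        return "Red"     # no matchability and/or only a few years of data
    if first_year <= 2000 and matchability == "good":
        return "Green"   # observations at least since 2000 and good matchability
    return "Yellow"      # data only after 2000 and/or matching possible with problems


for case in [(1995, "good"), (2003, "good"), (1998, "problematic"), (2008, "good"), (None, "good")]:
    print(case, computability_colour(*case))
```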
(http://www.ecb.europa.eu/home/html/researcher_compnet.en.html). CompNet is a
project to build bottom-up indicators exploiting data accessible to national central banks.
As a matter of fact, some of the indicators that MAPCOMPETE considers relevant
bottom-up indicators have actually been computed within CompNet.
With the help of Filippo di Mauro and Paloma Lopez-Garcia at the ECB we were able
to find contact persons in each of the 28 EU member states. In some cases, those
contact persons were able to help us fill the MAPCOMPETE MetaDB and in other cases
they referred us to people within the NSI. In cases in which we could not find a
personal contact, we compiled the information from publicly available
sources. After a first round of data collection, we drafted a first version of this
report and sent it to contact persons within NSIs and NCBs for validation. In cases in
which we had no direct contact, we sent the draft to a generic contact email within
the NSI. This prompted replies from NSIs and NCBs, which allowed us to further
integrate the information collected. At the end of this process, we were able to report
on 25 out of the 28 EU countries: Austria, Belgium, Bulgaria, Croatia, Czech Republic,
Denmark, Estonia, Finland, France, Germany, Hungary, Ireland, Italy, Latvia,
Lithuania, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain,
Sweden and the United Kingdom.
In Austria, Denmark and Spain the information could not be verified by the NSI.
From Cyprus, Greece and Luxembourg we were not able to gather enough information
from publicly available sources and the contact persons that we had identified were
not able to help us, so these countries are not included.
We would like to thank all the people that, within each country, helped us gather the
information needed to compile the MAPCOMPETE MetaDB.
Country: contact persons (institution)
Belgium: Catherine Fuss
Bulgaria: Svetoslava Filipovich
Croatia: Blaženka Vukelić
Czech Republic: Kamil Galuscak, Pavel Hájek, Zuzana Cabicarová
Estonia: Aavo Heinlo, Jaanika Merikyll
Finland: Satu Nurmi
France: Philippe Brion
Germany: Sven Blank
Hungary: Peter Harasztosi
Ireland: Keith McSweeney, Iulia Siedschlag (ESRI)
Italy: Stefania Rossetti
Latvia: Lilita Laganovska, Inga Malasenko, Sandra Vitola
Lithuania: Vera Bezviuk
Malta
Netherlands: Harry Goossens
Poland: Jan Hagemejer, Karolina Szlesinger
Portugal
Romania: Virginia Balea
Slovakia: Tibor Lalinsky
Slovenia: Urška Cede
Spain
Sweden: Eva Hagsten
United Kingdom: Daniele Bega
2.3.2
Austria7
The data needed to compute bottom-up indicators derive from two main data sources. The first source is a firm sample with detailed balance sheet data collected
by the statistics department of the Oesterreichische Nationalbank (OeNB). In recent
years, the sample has been approximately 8,000 firms per year, representing 35
percent of total employment. The sample is clearly biased towards larger enterprises.
The database starts in the early 2000s. The number of firms is rather low because
only larger corporations have to publish their balance sheets. The OeNB collects
additional balance sheet data from firms receiving larger loans from banks. This is
why the OeNB firm sample, small as it is, is larger than the one collected
by Bureau Van Dijk, which covers fewer than 3,000 firms per year for Austria (Sabina
database).
The second data source is OeNB-Statistics Austria micro-data on exports, imports and FDI.
Labour productivity: Labour productivity is computable only for the non-representative sample of firms for which balance sheet data is available at OeNB, and only
from the early 2000s. Under these conditions, micro-aggregated labour productivity
(average, median, other moments) is computable for all firms (I_001_04) and for
exporters (exploiting the matching with OeNB-Statistics Austria micro-data on exports,
imports and FDI). Micro-aggregated ULC (average, median, other moments) for all firms
(I_013_02) is also computable under the above-mentioned constraints. This
information, however, is not accessible.
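Computing exporter-level indicators in this setting requires linking the OeNB balance-sheet sample with the trade micro-data through a common firm identifier. A minimal Python sketch of that step follows, assuming such an identifier exists in both datasets (the identifiers, variables and values are hypothetical). Because only firms in the balance-sheet sample can be matched, it also reports the share of exporters that find a match, a quick check on how far the non-representative sample limits the indicator.

```python
import statistics

# Hypothetical extracts sharing a common firm identifier.
balance_sheets = {  # firm id -> balance-sheet variables (sample of larger firms)
    "AT001": {"value_added": 800.0, "employees": 10},
    "AT002": {"value_added": 450.0, "employees": 9},
    "AT003": {"value_added": 1200.0, "employees": 20},
}
trade_records = [  # (firm id, year, export value)
    ("AT001", 2012, 150.0),
    ("AT003", 2012, 300.0),
    ("AT999", 2012, 50.0),  # exporter with no balance-sheet record in the sample
]


def exporter_labour_productivity(balance_sheets, trade_records, year):
    """Match exporters to balance sheets; return mean and median productivity and the match rate."""
    exporters = {fid for fid, y, value in trade_records if y == year and value > 0}
    matched = sorted(exporters & set(balance_sheets))
    if not matched:
        return None
    lp = [balance_sheets[f]["value_added"] / balance_sheets[f]["employees"] for f in matched]
    return {
        "mean": statistics.mean(lp),
        "median": statistics.median(lp),
        "match_rate": len(matched) / len(exporters),
    }


print(exporter_labour_productivity(balance_sheets, trade_records, 2012))
```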
TFP: Under the conditions already explained for labour productivity, micro-aggregated
TFP (average, median, other moments) is computable for all firms (I_003_03), for
exporters (I_003_05) and for importers (I_003_06). Olley and Pakes TFP decomposition
(I_004_01) and Foster decomposition of TFP growth (I_005_01) are also computable.
This information, however, is not accessible.
Firm dynamics: As far as we have been able to reconstruct, only dispersion by firm
size (I_055_01) and share of fast-growing firms (which we refer to as gazelles8)
(I_056_01) are computable (the conditions mentioned above still apply). This
information, however, is not accessible.
7. Please note that the information provided was collected mostly from publicly available sources and has not been
verified by the National Statistical Office.
8. These are generally defined as firms displaying growth rates significantly above the average firm (see, for
instance, Henrekson and Johansson, 2010).
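The share of fast-growing firms can be sketched as follows. Footnote 8 defines gazelles relative to the average firm; this hypothetical Python sketch instead swaps in a fixed threshold on annualised employment growth, with an illustrative default of 20% a year over three years and a minimum starting size of 10 employees. These parameters, and the omission of any age restriction, are assumptions rather than the definition used in the meta-database.

```python
def annualised_growth(start, end, years):
    """Average annual growth rate between two employment levels."""
    return (end / start) ** (1.0 / years) - 1.0


def gazelle_share(employment, years=3, threshold=0.20, min_size=10):
    """Share of eligible firms whose annualised employment growth exceeds `threshold`.

    employment: firm -> (employees at start of period, employees at end of period).
    Firms below `min_size` at the start are excluded, since small absolute changes
    would otherwise produce very large growth rates.
    """
    eligible = {f: se for f, se in employment.items() if se[0] >= min_size}
    if not eligible:
        return float("nan")
    fast = [f for f, (start, end) in eligible.items()
            if annualised_growth(start, end, years) > threshold]
    return len(fast) / len(eligible)


# Hypothetical three-year employment changes; firm "d" is too small to be eligible.
employment = {"a": (10, 20), "b": (50, 55), "c": (12, 30), "d": (5, 40)}
print(gazelle_share(employment))
```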
Internationalisation Through the OeNB-Statistics Austria micro-data on exports, imports and FDI, possibly matched with balance-sheet data collected by the OeNB, the following indicators are computable: average, median and other moments of the value of exports per exporting firm (I_009_02); number of exporting firms (extensive margin) (I_045_01); average, median and other moments of export sales as a share of total turnover (intensive margin) (I_047_01); number of importing firms (extensive margin) (I_048_01); and average, median and other moments of imported intermediates as a share of total cost of materials (intensive margin) (I_050_01). This information, however, is not accessible.
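The extensive and intensive margins listed above can be sketched from a matched firm-level record. This is a minimal illustration, assuming each firm carries turnover, export sales, total cost of materials and imported intermediates; the field names are hypothetical and the precise indicator definitions may differ from those used in the report.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class FirmTrade:
    # Hypothetical matched record (balance sheet + trade micro-data).
    turnover: float
    exports: float
    material_costs: float
    imported_intermediates: float


def trade_margins(firms):
    exporters = [f for f in firms if f.exports > 0]
    importers = [f for f in firms if f.imported_intermediates > 0]
    # Intensive margin of exports: export sales as a share of turnover.
    export_share = [f.exports / f.turnover for f in exporters]
    # Intensive margin of imports: imported intermediates over material costs.
    import_share = [f.imported_intermediates / f.material_costs for f in importers]
    return {
        "n_exporters": len(exporters),  # extensive margin (count)
        "pct_exporters": 100 * len(exporters) / len(firms),
        "n_importers": len(importers),
        "pct_importers": 100 * len(importers) / len(firms),
        "exports_per_exporter_mean": mean(f.exports for f in exporters),
        "export_share_median": median(export_share),
        "import_share_mean": mean(import_share),
    }


# Made-up example data.
sample = [
    FirmTrade(turnover=1000.0, exports=400.0, material_costs=500.0, imported_intermediates=100.0),
    FirmTrade(turnover=800.0, exports=0.0, material_costs=300.0, imported_intermediates=150.0),
    FirmTrade(turnover=600.0, exports=60.0, material_costs=200.0, imported_intermediates=0.0),
    FirmTrade(turnover=500.0, exports=0.0, material_costs=250.0, imported_intermediates=0.0),
]
result = trade_margins(sample)
print(result)
```

The percentage-based extensive margins need the total number of firms as a denominator, so they depend on the coverage of the register, not only on the trade data.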
R&D and other activities Information on R&D expenditure is available in the balance
sheet data collected by the OeNB. Thus, R&D expenditure mean (I_023_04), R&D
expenditure (% of turnover) mean (I_023_05) and asset tangibility (I_059_03) are
computable under the above-mentioned restrictions. This information, however, is not
accessible.
Accessibility
The sources quoted above are not publicly available.
Belgium
The data needed to compute bottom-up indicators derive from two main sources. The first source is the BelFirst database compiled by Bureau van Dijk. BelFirst is publicly available (subject to payment of a fee) and includes information on firms' balance sheets. The sector is identified by a 5-digit NACE code. Most data series start in 1995.
The second data source is the National Bank of Belgium (NBB). The NBB collects data on balance sheets, firm-level trade (Transaction Trade dataset) and FDI (Survey on Foreign Direct Investments). The sector is identified by a 4-digit NACE code (both rev.1.1 and rev.2). Production data are disaggregated at CN8 product level.
Labour productivity Labour productivity is computable in all its versions. Existing data allows the calculation of micro-aggregated labour productivity (average, median, other moments) for all firms (I_001_04), domestic firms (I_001_05), exporters (I_001_06) and so on. Firm-level data at the NBB is confidential. Only labour productivity (I_001_04) and unit labour cost (I_013_02) are fully accessible, because the necessary data is available in BelFirst. Indicators by export status (eg, I_001_06) or ownership (eg, I_001_09) are computable but not accessible.
TFP As with labour productivity, TFP can easily be calculated over a long time span. Existing data allows the calculation of micro-aggregated TFP (average, median, other moments) for all firms (I_003_03), domestic firms (I_003_04), exporters (I_003_05) and so on. Again, firm-level data at the NBB is confidential. All the TFP indices are computable, but only those for all firms (I_003_03) and the TFP decomposition indices (I_004_01 and I_005_01) are accessible, because the necessary data is available in BelFirst.
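The report does not spell out which variant of the Foster decomposition (I_005_01) is used. The sketch below follows the common Foster, Haltiwanger and Krizan form, which splits the change in share-weighted productivity into within, between, cross, entry and exit terms; it assumes firm shares sum to one in each period, and the example data is made up.

```python
def foster_decomposition(prev, curr):
    """FHK-style decomposition of aggregate productivity growth.

    prev, curr: dicts mapping firm id -> (share, productivity); shares in
    each period are assumed to sum to one.
    """
    agg_prev = sum(s * p for s, p in prev.values())
    agg_curr = sum(s * p for s, p in curr.values())
    continuing = prev.keys() & curr.keys()
    entrants = curr.keys() - prev.keys()
    exiters = prev.keys() - curr.keys()

    within = between = cross = 0.0
    for i in continuing:
        s0, p0 = prev[i]
        s1, p1 = curr[i]
        within += s0 * (p1 - p0)               # productivity change at fixed shares
        between += (s1 - s0) * (p0 - agg_prev)  # reallocation of shares
        cross += (s1 - s0) * (p1 - p0)         # joint change in share and productivity
    entry = sum(curr[i][0] * (curr[i][1] - agg_prev) for i in entrants)
    exit_term = -sum(prev[i][0] * (prev[i][1] - agg_prev) for i in exiters)
    return {
        "total": agg_curr - agg_prev,
        "within": within, "between": between, "cross": cross,
        "entry": entry, "exit": exit_term,
    }


prev = {"a": (0.5, 1.0), "b": (0.3, 0.8), "c": (0.2, 0.6)}
curr = {"a": (0.6, 1.1), "b": (0.25, 0.9), "d": (0.15, 1.0)}
r = foster_decomposition(prev, curr)
print(r)
```

The decomposition needs reliable entry and exit flags, which is one reason why its computability depends on the quality of firm-demography information in each dataset.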
Firm dynamics The entry rate (I_051_03) is poorly computable because neither BelFirst nor the NBB database contains reliable entry and exit information: a firm appearing in these databases is not necessarily a newly created firm, and a firm disappearing from them has not necessarily exited. The only reliable source is the CompNet database, where this indicator is already computed. Similarly, the exit rate (I_052_03) and the survival rate (I_053_01) are not clearly computable. By contrast, indicators on firm growth are easily computable. The dispersion of firms by size (I_055_01) is computable and accessible through BelFirst, while average firm size by age (I_054_01) and the share of gazelles (I_056_01) are computable but not accessible (entry and exit data are in the NBB database).
Internationalisation The NBB has a rich dataset with information on firm-level trade activity. All the indices listed in Table 6.17, such as the average (median, variance and other moments) of the number of export destinations per exporting firm (I_043_01), are computable. The intensive (eg, I_047_01) and extensive (eg, I_045_01) margins of trade are also computable. However, NBB data is confidential and the indices are not accessible.
R&D and other activities R&D data is available neither at the NBB nor in BelFirst. Moreover, R&D expenditure is poorly reported in annual accounts, and only for the largest firms. However, R&D data for 1998-2011 is available at Belspo (Federal Public Planning Service Science Policy)9. With NBB data, it is possible to calculate the share of foreign-owned firms (I_041_03), and the share of domestic multinationals
9. This information has been retrieved from the website (http://www.belspo.be/belspo/index_en.stm). In principle, micro-level data at Belspo should be identifiable by VAT number and thus matchable with NBB data. R&D data is potentially accessible at Belspo (conditional on a project submission). However, we were not in a position to verify this information on matchability and accessibility.
Bulgaria
Labour productivity Not all indicators in this group are fully computable. Only labour productivity (I_001_04) and unit labour cost (I_013_02) for all firms can be measured from 2001, using survey data. For importers (I_001_07) and exporters (I_001_06), the degree of computability is lower (data from 2008). The indices can be calculated at both sectoral and regional level.
TFP The TFP index and its decompositions are computable from 2001 (I_003_03, I_004_01 and I_005_01) using survey data. The degree of computability of TFP by trade status is low (eg, I_003_05 from 2008).
Firm dynamics Information on firm dynamics in Bulgaria is available from 2005. In addition, indices on the dispersion of firms by size (I_055_01) and the share of gazelles (I_056_01) can be computed from 2001. Accessibility of the existing data is limited.
Internationalisation Bulgarian databases provide information on external trade from 2008, so all the internationalisation indicators are computable from 2008.
R&D and other activities R&D data has been collected since 2001, as has data on tangible assets. Information on the unit values of exports has been collected since 2008 (I_62_01), while information on firm ownership starts only in 2010.
Accessibility
All the sources mentioned above are restricted, and access is strictly regulated by the Protection of Secrecy provisions (chapter 6 of the Statistical Act).
Micro-data from different statistical fields is accessible where this is possible and does not conflict with existing regulations, following a decision of the Commission appointed under Art. 10 of the Rules for providing anonymised data for scientific and research purposes. These rules govern the provision of micro-data by the BNSI and the procedure for obtaining them. The rules are based on, and in accordance with, the requirements of national and relevant EU legislation. See
https://unstats.un.org/unsd/dnss/docViewer.aspx?docID=2772. See also indicator
15.4 in
http://www.nsi.bg/sites/default/files/files/pages/LegalBasis_e/BG_report_FINAL.pdf.
Croatia
Firm-level data for Croatia is derived from the Croatian Bureau of Statistics (CBS). The main sources are Structural Business Statistics and the Community Innovation Survey (CIS) compiled for Eurostat, complemented by data on international trade collected by the same office. Firm-level balance-sheet information is not available in Croatia, with the exception of turnover and R&D expenditure, which are collected for the CIS.
Sectoral disaggregation is at NACE 4-digit level, while regional disaggregation depends on the specific variables/datasets.
Labour productivity No indicator is computable, because firm-level information on value added and number of employees is not available.
TFP For the same reason, no TFP indicator is computable: firm-level information on value added and number of employees is not available.
Firm dynamics Entry rate (birth rate) (I_051_03) and exit rate (death rate) (I_052_03) are computable from 2008. However, in 2014 the CBS started to track firm survival more accurately: real births are available from 2010 onwards, and only survival over 1-3 years is observable. The other indicators are not computable because of the lack of information on firm age and number of employees12.
Internationalisation Average, median and other moments of the value of exports per exporting firm, total (I_009_02) and average, median and other moments of export sales as a share of total turnover (intensive margin) (I_047_01) are not computable, because information on the value of production sold abroad is not available. Average, median and other moments of imported intermediates as a share of total cost of materials (intensive margin) (I_050_01) are not computable because of the lack of firm-level information on material costs. The percentage of exporting firms in the total number of firms (extensive margin) (I_046_01) and the percentage of importing firms in the total number of firms (extensive margin) (I_049_01) are computable only from 2008, because information on the total number of firms is only available from that year. All the other indicators are computable from 1991.
12. For firms that started up in 2010 or later, there is information on firm age, and for all active companies there is information on the number of employees (for certain years). A breakdown by size is feasible.
R&D and other activities Asset tangibility (I_059_03) is not computable because information on tangible fixed assets and total assets is not available, while firm-level estimates of quality (I_070_01) are not computable because firm-level data on the value of production sold abroad is not available. R&D expenditure mean (I_023_04) and R&D expenditure (% of turnover) mean (I_023_05) are computable for 2006, 2008 and 2010 through the CIS. The share of foreign-owned firms in total firms (by country, sector, region) (I_041_03) is computable from 2008, and the share of domestic MNFs in total firms (by country, sector, region) (I_042_03) is only computable for 2013, since information on the multinational status of firms has only just started to be collected.
Accessibility
Access to most data is restricted. Data collected for the CIS (turnover and R&D expenditure) can be accessed under certain conditions (for scientific purposes, according to the Ordinance on the methods of statistical data protection and the Ordinance on Conditions and Terms of Using Confidential Data for Scientific Purposes).
Czech Republic
The main databases for the Czech Republic are the Business Register (named RES) and the External Trade Database. Both datasets are collected by the Statistical Office (CZSO), but are also available at the National Central Bank (NCB)13.
Up to 2007, the Business Register includes companies with 20 or more employees. From 2008, it covers only firms with 50 or more employees (a smaller sample). The External Trade Database available at the NCB is a smaller version of the full dataset at the CZSO (data on the 1,000 largest exporters and the 1,000 largest importers). According to the reported information, the Business Register starts in 2002, while the External Trade Database is available from 1999.
The NCB also collects firm-level data on FDI inflows (about 5,700 firms). Information on foreign ownership (50 percent or more of equity) is also available in the Business Register. In addition, statistics on outward foreign affiliates (about 500-600 Czech firms with significant foreign affiliates) are collected and available at the NCB, and the data has been harmonised since 2007. Indicators can be defined at NACE rev.2 (2-digit) level from 2005 (or 2007). Regional disaggregation is not reported.
13. The Business Register includes all companies (legal persons), self-employed persons (natural persons) and authorities, that is 2.8 million entities. The CZSO administers data on international trade in goods. Data on international trade in services is collected by the Czech National Bank.
The External Trade Database at the NCB can in principle be matched with the Business Register, because the national firm identifier ICO is available in both databases. However, the Czech National Bank is not authorised to provide micro-data originating from the CZSO. Note also that in the External Trade Database the main identifier is DIC (tax ID), while ICO (national firm ID) is only a secondary identifier, so some combinations are not feasible.
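The identifier problem can be illustrated with a minimal sketch. The record layouts and identifier values are hypothetical: trade records are keyed on DIC and carry ICO only as an optional secondary field, so a trade record without ICO cannot be linked to the Business Register, even when the underlying firm is registered there.

```python
def match_on_ico(register, trade_records):
    """Link trade records to Business Register rows via the ICO identifier.

    register: dict mapping ICO -> register attributes (hypothetical layout).
    trade_records: list of dicts keyed by 'dic' with an optional 'ico' field.
    Returns (matched, unmatched) lists.
    """
    matched, unmatched = [], []
    for rec in trade_records:
        ico = rec.get("ico")  # secondary identifier; may be absent
        if ico is not None and ico in register:
            matched.append({**rec, **register[ico]})
        else:
            unmatched.append(rec)
    return matched, unmatched


# Fictitious identifiers and values, for illustration only.
register = {
    "12345678": {"employees": 120, "nace": "C25"},
    "87654321": {"employees": 60, "nace": "C28"},
}
trade = [
    {"dic": "CZ00000001", "ico": "12345678", "exports": 5.0e6},
    {"dic": "CZ00000002", "exports": 1.2e6},  # no ICO: cannot be matched
]
matched, unmatched = match_on_ico(register, trade)
print(len(matched), len(unmatched))
```

Reporting the unmatched records, rather than silently dropping them, makes it possible to check whether the linked sample is biased towards particular types of traders.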
In conclusion, the main issue for the Czech Republic is not the availability of the underlying variables, but the unclear accessibility of customs data. It is also worth mentioning that some of the indicators can be retrieved from the CompNet database.
Labour productivity Labour productivity indicators are computable from 2002 (or 2005 for exporting firms) with a harmonised classification (NACE rev.2). Data on multinational status needed for indicators I_001_08 and I_001_09 (domestic and foreign multinationals) is available only from 2007 and only for a restricted sample of firms.
TFP The considerations for labour productivity indicators apply equally to TFP indicators.
Firm dynamics Firm dynamics indicators are computable through the Business Register. However, information on firm deaths is not reported, so indicators I_052_03 and I_053_01 are not computable.
Internationalisation All the indicators on internationalisation are computable through the Business Register and customs data. The Business Register allows I_009_02 to be computed directly, while for the other indicators the two sources must be merged.
R&D and other activities R&D indicators (I_023_04 and I_023_05) are computable using the CIS for 2000, 2001, 2004, 2006, 2008 and 2010. The share of foreign-owned firms is computable from 2005 (I_041_03), while the share of multinationals is computable from 2007 (I_042_03). The level of tangible assets is computable.
Accessibility
Business Register data can be accessed at both the NCB and the CZSO. For access, an external researcher has to submit a research project and pay a fee. Data can be accessed on-site or on CDs (depending on the agreement). According to the NCB, customs data is available only to NCB employees, and the NCB does not report the conditions for using FDI and outward FATS data. Access conditions for the External Trade Database at the CZSO are regulated by a special confidentiality contract, and access is granted only for research purposes (on payment of a fee).
More details are available at
http://www.czso.cz/eng/redakce.nsf/i/statistical_data_for_scientific_research_purposes.
Denmark14
Firm-level data in Denmark comes from Statistics Denmark (the central authority on Danish statistics). To assess the computability of the indicators, we collected information on different data sources, such as the Industrial Accounts Statistics, the External Trade in Goods and the FIDA database. The first of these includes balance-sheet information, the second contains trade statistics (Intrastat and Extrastat), while the FIDA database is an employer-employee database that includes labour costs and some balance-sheet items. In addition, we consider the Business Demographics and the Foreign Owned Enterprise databases.
All the databases report information on firms' industry that is compatible with the NACE classification. Regional location is collected only in the Industrial Accounts Statistics and in the FIDA database; computable indicators such as the internationalisation indices can nevertheless be defined at regional level by merging the different databases. In principle, it seems that all the mapped databases can be merged, given that several ID codes are reported for each firm, but we have not had confirmation from Statistics Denmark (see footnote 14).
Labour productivity The labour productivity indices are computable from 1995, from 1997 (by import/export status) and from 2004 (by ownership). Indicators I_001_08 and I_001_09 are not computable, because information on multinational status is missing.
TFP As with labour productivity, TFP indices are computable from 1995, from 1997 (by import/export status) and from 2004 (by ownership). Indicators I_003_07 and I_003_08 are not computable, because information on multinational status is missing.
14. Please note that the information provided was collected from publicly available sources. Despite several attempts to contact Statistics Denmark, we could not verify and supplement this information. In particular, we are not in a position to verify the details and the extent to which the different sources can actually be matched.
Firm dynamics Entry and exit rates (I_051_03 and I_052_03), the survival rate (I_053_01) and average firm size (I_054_01) can be calculated using different data sources (FIDA or Business Demography). Indices on the dispersion of firms by size and the share of gazelles (I_055_01 and I_056_01) are computable; for these two indices, Statistics Denmark has a specific database (Gazelles in Denmark).
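The firm-dynamics indicators can be sketched from a panel of firm identifiers observed over several years. The definitions here are assumptions for illustration: the entry rate is new identifiers over firms active in year t, the exit rate is disappearing identifiers over firms active in t-1, survival is tracked for a birth cohort, and gazelles are firms whose annualised employment growth exceeds a hypothetical threshold (the report only describes them as growing significantly faster than the average firm). As the Belgian case shows, a new identifier in a register is not necessarily a new firm, so real data needs proper birth and death flags.

```python
def entry_exit_rates(firms_prev, firms_curr):
    """Entry and exit rates between two consecutive years (sets of firm ids)."""
    entrants = firms_curr - firms_prev
    exiters = firms_prev - firms_curr
    return len(entrants) / len(firms_curr), len(exiters) / len(firms_prev)


def survival_rate(cohort, firms_later):
    """Share of a birth cohort that is still active in a later year."""
    return len(cohort & firms_later) / len(cohort)


def gazelle_share(emp_start, emp_end, years, threshold=0.20, min_size=10):
    """Share of eligible firms whose annualised employment growth exceeds threshold.

    threshold and min_size are hypothetical parameters chosen for illustration.
    """
    eligible = [i for i in emp_start if i in emp_end and emp_start[i] >= min_size]
    fast = [
        i for i in eligible
        if (emp_end[i] / emp_start[i]) ** (1 / years) - 1 > threshold
    ]
    return len(fast) / len(eligible)


# Made-up panel: sets of active firm ids per year.
active = {
    2010: {"a", "b", "c", "d"},
    2011: {"a", "b", "e", "f"},
    2014: {"a", "b", "e"},
}
entry_rate, exit_rate = entry_exit_rates(active[2010], active[2011])
births_2011 = active[2011] - active[2010]
survival_3y = survival_rate(births_2011, active[2014])
share_gazelles = gazelle_share(
    emp_start={"a": 10, "b": 50, "e": 12},
    emp_end={"a": 20, "b": 52, "e": 12},
    years=3,
)
print(entry_rate, exit_rate, survival_3y, share_gazelles)
```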
Internationalisation All the trade indicators can be computed from 1997.
R&D and other activities Indicators of R&D expenditure (I_023_04), R&D intensity (I_023_05) and firm ownership (I_041_03) are computable. Conversely, the share of domestic multinationals (I_042_03) is not computable. Finally, tangible assets and the firm-level index of quality are also computable.
Accessibility
Data is accessible to persons affiliated with Danish institutions recognised by Statistics Denmark, subject to approval of a project. In principle, foreign researchers can access data if they have an affiliation with a Danish institution. Affiliation can only take place if the authorised institution is willing to take responsibility for the foreign researcher, ensuring that all rules governing access to micro-data are observed. Data can be accessed on-site or remotely. More information is available at http://www.dst.dk/en/TilSalg/Forskningsservice.aspx
Estonia
Firm-level data can be obtained from three main data sources: (i) the Business Register merged with customs data, (ii) Central Bank data and (iii) the R&D database. While the first two databases are available at the Central Bank, the third is collected by (and available at) Statistics Estonia (SE). In addition, Statistics Estonia collects information on economically active enterprises in a database named the Statistical Profile, which is updated from the official Business Register and statistical surveys. Data in the Statistical Profile and in other surveys, such as the R&D survey (as with the CIS), can be linked for micro-analysis. The Statistical Profile database is also available to the Central Bank.
The main data source is the Business Register merged with customs data, which is available at both institutions.
Firms are classified according to NACE rev.2 at 3- or 4-digit level (only 2-digit for R&D surveys). Some time series start in 1995, while others start in 2003. Regional disaggregation is not reported (since Estonia itself is a NUTS 2 region); R&D is also estimated at NUTS 3 level15. It is important to underline that even if all the disaggregated groups can be formed with the available variables, the confidentiality rule requires that firm-level information must not be disclosed if fewer than three firms belong to a group (and one firm dominates the group). Given that Estonia is a small country, this situation is not unlikely.
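A minimal sketch of how such a confidentiality check might be applied before releasing a group-level aggregate. The three-firm threshold comes from the text; the dominance check, its 85% cut-off and the choice to suppress when either condition holds are hypothetical illustrations of a common form of the rule, and the exact criteria applied by Statistics Estonia may differ.

```python
def is_releasable(values, min_firms=3, dominance_share=0.85):
    """Return (releasable, reasons) for a group aggregate such as total turnover.

    min_firms reflects the three-firm threshold described in the text;
    dominance_share is a hypothetical cut-off for the dominance check.
    """
    reasons = []
    if len(values) < min_firms:
        reasons.append("too few firms")
    total = sum(values)
    if total > 0 and max(values) / total > dominance_share:
        reasons.append("one firm dominates")
    return not reasons, reasons


# Made-up turnover values by (hypothetical) sector cell.
groups = {
    "C10": [120.0, 80.0, 60.0, 40.0],
    "C20": [500.0, 20.0],
    "C30": [900.0, 50.0, 30.0],
}
results = {cell: is_releasable(values) for cell, values in groups.items()}
print(results)
```

In a small economy many narrow sector or size cells fail such checks, which is why fine breakdowns may be computable in principle but not releasable.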
Labour productivity All the labour productivity indices are computable from 1995, although the indicators by export/import status and foreign/domestic ownership are computable only from 2003 (see footnote 16).
TFP As with labour productivity, all the TFP indices are computable within the limits mentioned above. The decomposition indices are computable from 1995.
Firm dynamics All the firm dynamics indices are computable.
Internationalisation The competitiveness indices on trade activity are computable from 2003.
R&D and other activities R&D data (I_023_04) is available at the National Statistical Office from 1998. R&D intensity (I_023_05) is also computable by merging the R&D surveys with the Business Register. Information on foreign ownership is available from 2003. Finally, data on firms' tangible assets and export unit values is available.
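Computing R&D intensity requires linking the R&D survey to the Business Register on a common firm identifier. A minimal sketch, assuming both sources share a hypothetical firm id, that R&D intensity is R&D expenditure as a percentage of turnover, and that asset tangibility (I_059_03) is tangible fixed assets over total assets; surveyed firms without a register match are simply dropped here.

```python
from statistics import mean


def rd_and_tangibility(register, rd_survey):
    """Mean R&D expenditure, mean R&D intensity (% of turnover) and mean asset tangibility.

    register: firm id -> dict with 'turnover', 'tangible_assets', 'total_assets'.
    rd_survey: firm id -> R&D expenditure (only surveyed firms appear).
    """
    # Inner join on the firm identifier: keep surveyed firms found in the register.
    linked = [(register[i], rd) for i, rd in rd_survey.items() if i in register]
    rd_intensity = [100 * rd / firm["turnover"] for firm, rd in linked]
    tangibility = [f["tangible_assets"] / f["total_assets"] for f in register.values()]
    return {
        "rd_expenditure_mean": mean(rd for _, rd in linked),
        "rd_intensity_mean": mean(rd_intensity),
        "asset_tangibility_mean": mean(tangibility),
    }


# Made-up example data; "x9" has no match in the register.
register = {
    "f1": {"turnover": 1000.0, "tangible_assets": 300.0, "total_assets": 600.0},
    "f2": {"turnover": 500.0, "tangible_assets": 100.0, "total_assets": 400.0},
    "f3": {"turnover": 200.0, "tangible_assets": 50.0, "total_assets": 100.0},
}
rd_survey = {"f1": 30.0, "f2": 5.0, "x9": 12.0}
stats = rd_and_tangibility(register, rd_survey)
print(stats)
```

Because R&D indicators are averaged only over surveyed firms, their coverage follows the survey design (for instance, the biennial CIS waves mentioned for several countries), while tangibility uses the whole register.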
Accessibility
Data is held at SE; the availability of micro-data for scientific purposes is regulated by legal acts, and the data can be used in the safe centre (see http://www.stat.ee/legal-acts). In addition, all the sources mentioned above are highly confidential, so access rules are quite restrictive.
15. In the case of some large corporations, R&D expenditure is attributed to their headquarters, not to the unit performing the R&D.
16. The indicators by export/import status are in principle computable through the CompNet database from 1995; however, we were not in a position to verify the details and access conditions.
Finland
Finnish data is available from different sources. Most data is collected by the National Statistical Office, while the database on foreign trade statistics is compiled by the Finnish Customs Office. The different sources can be matched, so computability of the indices is ensured. Firms are classified according to NACE rev.2, and regional disaggregation is possible. Unit-level data is confidential; the total number of firms is publicly available.
Labour productivity All the labour productivity indices are computable. Access to the
data is limited.
TFP All the TFP indices are computable. Access to the data is limited.
Firm dynamics Entry and exit rates (I_051_03 and I_052_03), as well as the survival rate and average firm size (I_053_01 and I_054_01), can be calculated. However, mergers and acquisitions may reduce data quality (a merger can show up in the data as the exit of one firm and the entry of another), so the degree of computability is reduced. By contrast, indices of the dispersion of firms by size and the share of gazelles (I_055_01 and I_056_01) are computable. Access to the data is limited.
Internationalisation Trade indicators can be computed. However, the coverage of indicators differs according to the data source and the thresholds for registered transactions, so the degree of computability is reduced (I_009_02, I_043_01, I_043_02, I_044_01, I_047_01 and I_050_01). These issues do not arise with the overall numbers (and percentages in the total number of firms) of importers and exporters (I_045_01, I_046_01, I_047_01, I_048_01 and I_049_01). Access to the data is limited.
R&D and other activities R&D expenditure (I_023_04, I_023_05) and firm ownership (I_041_03, I_042_03) are computable, although computing tangible assets (I_059_03) and firm-level estimates of quality (I_070_01) may pose some problems. Access to the data is limited.
Accessibility
Data is accessible at the Research Laboratory or via the remote access system, subject to a user licence, access agreements and payment of a fee. More details are available at http://www.stat.fi/tup/mikroaineistot/index_en.html.
France
Micro-level data is available from three different databases. First, FICUS (Système Unifié de Statistique d'Entreprises, FICUS SESA), available up to 2007, contains balance-sheet data (from fiscal forms), together with other information and identification numbers from business registers. Second, ESANE (FARE), available since 2008, reports information of the same kind (balance-sheet data and other information from social data or business registers); ownership is available through a merge with specific surveys or administrative data (LIFI). Finally, the Déclarations Douanières administrative data collected by the DGDDI (a directorate of the ministry of economy) reports trade statistics at firm level. All three databases are available at the National Statistical Office (INSEE). Users should be careful about the meaning of the firm unit, which in these databases is defined by legal status.
As of July 2014, data up to 2012 is available.
Firms are classified according to NACE at 2-digit level (rev.1 from 1994 to 2007, and rev.2 from 2008 to 2012); geographical location can be identified with a NUTS 2 code. The historical series thus cover 1994-2007 and 2008-2012 (the latter under NACE rev.2). Data is partial for 2008 (the first year of the new system).
Labour productivity Almost all labour productivity indices, as well as unit labour cost, are highly computable. Labour productivity indices by ownership are computable from 2008. The data needed to calculate the competitiveness indices is highly confidential, but access is feasible.
TFP Almost all TFP indices are computable over a relatively long time series, with the exception of statistics by ownership (available from 2008). The underlying data is confidential, so the degree of accessibility is limited.
Firm dynamics All the competitiveness indices on firm dynamics are computable. Data is available in the FICUS or ESANE databases, but is confidential.
Internationalisation All the measures of trade activity are computable. Data is available through the Déclarations Douanières collected by the DGDDI.
R&D and other activities Indicators of R&D expenditure (I_023_04, I_023_05), tangible
assets (I_059_03) and export unit value (I_070_01) are computable. Ownership data
has been collected since 2008 (I_041_03, I_042_03).
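Export unit values are typically computed as the value of exports divided by the exported quantity within narrowly defined product-destination cells, and comparing a firm's unit value with those of other firms in the same cell is one simple way to proxy quality. The sketch below is an illustration only: it assumes transaction records with hypothetical fields (firm, product, destination, value, quantity) and is not the definition of I_070_01 used in the report. It also shows why the indicator cannot be built where exported quantities are missing.

```python
from collections import defaultdict


def unit_values(transactions):
    """Export unit values (value / quantity) per firm, product and destination."""
    value = defaultdict(float)
    quantity = defaultdict(float)
    for t in transactions:
        key = (t["firm"], t["product"], t["destination"])
        value[key] += t["value"]
        quantity[key] += t["quantity"]
    return {k: value[k] / quantity[k] for k in value if quantity[k] > 0}


def relative_unit_values(uv):
    """Firm unit value relative to the mean across firms in the same product-destination cell."""
    cells = defaultdict(list)
    for (firm, product, dest), v in uv.items():
        cells[(product, dest)].append(v)
    return {
        (firm, product, dest): v / (sum(cells[(product, dest)]) / len(cells[(product, dest)]))
        for (firm, product, dest), v in uv.items()
    }


# Made-up transactions; product and destination codes are placeholders.
transactions = [
    {"firm": "A", "product": "P1", "destination": "DE", "value": 1000.0, "quantity": 100.0},
    {"firm": "A", "product": "P1", "destination": "DE", "value": 500.0, "quantity": 50.0},
    {"firm": "B", "product": "P1", "destination": "DE", "value": 600.0, "quantity": 100.0},
    {"firm": "B", "product": "P2", "destination": "IT", "value": 300.0, "quantity": 10.0},
]
uv = unit_values(transactions)
rel = relative_unit_values(uv)
print(rel)
```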
Accessibility
All the sources mentioned are highly confidential, but micro-level data will be accessible under the new system upon submission of a research proposal, subject to approval by a committee. Details on accessibility can be found at http://www.casd.eu/.
Germany
German bottom-up indicators can be computed from several datasets, the most important of which are: (i) the Financial Statements Statistics; (ii) the Microdatabase Direct Investment (MiDi); (iii) Germany's International Trade in Services, from the Deutsche Bundesbank; (iv) a panel of manufacturing firms based on the Official Firm Data for Germany (AFiD) provided by the Federal Statistical Office (Destatis); and (v) data on employment at establishment level from the Federal Employment Office. Finally, some of the indicators can be retrieved directly from the CompNet database (at ZEW).
Data is classified with a NACE code (2- or 3-digit level), in both rev.1.1 and rev.2 (from 2008). In the mapped databases it is not possible to recover information on exported quantities or on the ownership of firms abroad (ie whether a German firm controls firms abroad).
Despite the generally good accessibility of micro-level data at each institution, matching data across those institutions is nearly impossible because of privacy protections. Within a specific project, KombiFiD (www.kombifid.de), data from the three above-named institutions was matched for a limited number of firms. However, all firms had to be asked for their written consent to the matching, and the data was matched for only one specific year. The matched dataset had to be deleted after three years. This restriction limits the computability of some of the indicators, despite the good availability of the original variables needed to calculate the indices.
For example, the AFiD panel can be merged with other firm-level databases from Destatis, but it is not easily matchable with the IAB Establishment Panel at the BA. This creates a trade-off between time coverage and the number of computable indicators: the AFiD panel is complete only from 2002, while BA data covers a longer time span (from 1975), but the AFiD panel allows more indicators to be identified because it is richer in information than BA data. In addition, we are not able (at the moment) to map a detailed dataset on international trade activities (of manufacturing firms) at the Deutsche Bundesbank.
In light of this, the report and the summary tables in the Annex describe the indicators that can be constructed with data at Destatis, in order to maximise the number of computable indicators.
Labour productivity The aggregate values of labour productivity and unit labour costs are computable in the mapped databases, with the exception of the indicators by import status (I_001_07) and multinational status (I_001_08, I_001_09). Some of the indicators are available in CompNet (by sector, NACE rev.2 2-digit).
TFP The considerations made for labour productivity also apply to TFP indicators. In addition, the Olley and Pakes and the Foster decompositions are computable from 2002.
Firm dynamics All the indicators on firm dynamics are computable, and the information is accessible at Destatis.
Internationalisation Using the information in the AFiD database, it is possible to calculate exports per firm (I_009_02) and the extensive and intensive margins of exports. However, indicators by destination and number of exported products are not computable, for two reasons. First, trade data by destination and number of products is available only at the Bundesbank, but merging is not allowed. Second, the Bundesbank collects data only on trade in services. For the same reasons, indicators on import activity for manufacturing firms are not computable.
R&D and other activities Indicators of R&D and tangible assets are computable with the mapped databases. Indicators on multinational status and the unit value of exports are not computable, because the necessary information is not available in the mapped databases.
Accessibility
Most of these datasets are generally available under certain conditions at the respective institutions. Destatis, the Federal Employment Office (Bundesagentur für Arbeit, BA) and the Bundesbank all have dedicated Research Data Centres17, which offer on-site or remote access (or direct access via Scientific Use Files) to many of their micro-level datasets, in accordance with German privacy protection law. Data is accessible to researchers, but only at the BA can foreign researchers access the data without cooperating with a German partner.
Data from the Deutsche Bundesbank is accessible only at the Research Centre (in Frankfurt am Main). The use of data from the Deutsche Bundesbank is subject to special confidentiality conditions. Because of legal requirements, individual data cannot be made generally available; it is, however, made available under strict conditions and for clearly defined academic research purposes. The Bundesbank has a visiting researcher programme at the Research Centre.
17. See www.forschungsdatenzentrum.de for the Destatis Centre, www.fdz.iab.de for the Federal Employment Office's Centre and
http://www.bundesbank.de/Navigation/DE/Bundesbank/Forschungszentrum/forschungszentrum.html for information about the Research Data Centre of the Deutsche Bundesbank.
In the case of the BA, the FDZ offers researchers three ways of accessing data, which differ according to the degree of anonymity of the data and the terms of data use: (i) on-site access, (ii) remote data access and (iii) Scientific Use Files (rare). In all three cases, researchers have to present a research project that must be approved by the FDZ. For on-site access, it is possible to apply for financial support18.
The research data centre of Destatis offers four different forms of access to selected micro-data of official statistics: (i) public use files, (ii) scientific use files, (iii) safe centres and (iv) remote execution. They differ with regard to both the anonymity of the data and the form of data provision. The scientific use files are well suited to a large share of scientific data analyses. Foreign users who are not employed by German institutions may work with the data both at the research centre and via remote execution. More details can be found at
http://www.forschungsdatenzentrum.de/en/datenzugang.asp.
18. More details are at http://fdz.iab.de/en.aspx.
Hungary
The data used to compute bottom-up indicators for Hungary is derived from six sources. First, company income tax return data of double-entry bookkeepers is collected by the National Tax and Customs Administration of Hungary (NAV)19. Tax return data includes information from balance sheets and profit and loss statements. Second, there is product-country-year level trade data based on survey data and on data collected in customs procedures. For years prior to EU accession, trade data covers all transactions and is based on customs declarations. Since 2004, trade data has consisted of Extrastat and Intrastat statistics. Extrastat is based on customs declarations, while Intrastat is based on a survey covering companies whose annual intra-EU trade turnover is above the exemption threshold set each year. Information on R&D is reported in the Innovation Database (based on the Community Innovation Survey) and the research and development database (based on R&D surveys of the HCSO) of the Hungarian Central Statistical Office. Finally, the Business Register records information on firms' year of creation/destruction20. All the databases are maintained and made available in a safe research room at the HCSO, subject to agreements with the HCSO.
19. NAV transmits the data to the Hungarian Central Statistical Office (HCSO), which makes it available for research purposes.
Labour productivity Almost all labour productivity indices are computable, with the
exception of aggregates for domestic multinationals (I_001_08) and affiliates of foreign
multinationals (I_001_09), given that data on multinational status is not available. In
addition, the unit labour cost can also be computed21. These indicators are
also accessible through CompNet. Data is available from 1992.
TFP It is possible to calculate all the TFP indices and the two decomposition terms.
Because information on multinational status is absent, it is not possible to
define TFP for domestic multinationals (I_004_01) or affiliates of foreign multinationals
(I_005_01). Computable indicators are also accessible through CompNet22. Data is
available from 1992.
Firm dynamics According to the mapping, firm dynamics indicators are all
computable from 1992. Note that there are caveats in calculating firm age,
especially for the early years, since the Business Register is truncated at 1992.
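The truncation caveat can be made concrete. The Python sketch below assumes that firms already operating when the Business Register begins in 1992 carry a recorded creation year of 1992 or earlier, so their computed age is only a lower bound; the function and field names are hypothetical and not part of the HCSO data.

```python
from dataclasses import dataclass

REGISTER_START = 1992  # first year covered by the Hungarian Business Register (from the text)


@dataclass
class FirmAge:
    age: int
    left_censored: bool  # True if the true creation year may predate the register


def firm_age(creation_year: int, year: int, register_start: int = REGISTER_START) -> FirmAge:
    """Age of a firm in `year`, flagging ages that rest on a truncated creation year."""
    if year < creation_year:
        raise ValueError("observation year precedes creation year")
    # Assumption: firms recorded as created on or before the register start may be older.
    return FirmAge(age=year - creation_year, left_censored=creation_year <= register_start)


def censored_share(creation_years: list[int], year: int) -> float:
    """Share of firms created by `year` (exits ignored for simplicity) whose age is a lower bound."""
    ages = [firm_age(c, year) for c in creation_years if c <= year]
    return sum(a.left_censored for a in ages) / len(ages)
```

With invented creation years [1992, 1992, 1995, 2000], two thirds of the firms observed in 1996 have censored ages, against one half in 2005, which illustrates why the caveat bites hardest in the early years.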
Internationalisation According to the mapping, indicators of internationalisation are
all computable from 1992.
R&D and other activities Indicators on R&D since 1999 can be retrieved from the
innovation and the research and development databases (CIS observations are
biennial). The data is merged with tax return data to obtain I_023_05. Information on firms'
multinational status is not available. Tangible assets and unit value of exports are
computable.
20. The sources of Business Register data are: own data collections of the HCSO, the database of the National Tax and
Customs Administration, the register of the Court of Registration, the Central Office for Administrative and Electronic Public
Services, the Hungarian State Treasury, etc. Firms are classified according to the NACE 4 digit code (rev1.1 from 2003, rev.2
from 2008), and geographical location is defined by NUTS3. In the trade database, location is not reported.
However, it can be retrieved from the balance sheet data. In addition, in the Business Register, location information
is not available for all firms.
21. Note that there are caveats in calculating value added for certain sectors (oil and tobacco industries).
22. The TFP indicator in CompNet is the Wooldridge-augmented Levinsohn-Petrin measure (GMM-estimated), and the TFP distribution
is shown for all firms, for exporters and non-exporters, and by firm size. There is no distinction according to ownership.
In CompNet, the decompositions are of the Olley-Pakes type and of the Foster type without entry and exit.
Accessibility
The Hungarian matched data was created by the HCSO by assigning an anonymised
identifier to each company, which is consistent across years and databases. Data
protection, required by law, is a key element in the operations of the HCSO. Therefore,
variables that would directly reveal the identity of a company (eg name
of the company, address of the headquarters or tax number) were deleted. Technically,
the data is stored on a server in separate files by topic. Merging the different
databases using the ID numbers assigned by the HCSO is performed by the researcher.
The matched database is accessible only to researchers from institutions with an
agreement with the HCSO, such as the Hungarian Academy of Sciences or some ministries.
Access is granted after registering the project at the HCSO. Access to the matched
database is restricted to a safe research room inside the HCSO building, where researchers
can work on the data on site and save their results. Note that access is still limited
and occasionally quite slow. The researcher who works with the data has to be in the
research room in Budapest and needs to be affiliated with a partner institution.
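Because the files are stored by topic and the merge is left to the researcher, a typical first step is a join on the anonymised firm identifier and year. The Python sketch below is illustrative only: the identifiers and field names (`value_added`, `employees`, `exports`) are hypothetical, and it assumes that a firm absent from the trade file in a given year did not trade, an assumption a researcher would need to check against the Intrastat exemption threshold.

```python
Key = tuple[str, int]  # (anonymised firm ID, year); hypothetical layout


def left_merge(base: dict[Key, dict], other: dict[Key, dict], fill: dict) -> dict[Key, dict]:
    """Attach fields from `other` to every record in `base`, using `fill` where no match exists."""
    merged = {}
    for key, record in base.items():
        combined = dict(record)  # copy so the input tables are not modified
        combined.update(other.get(key, fill))
        merged[key] = combined
    return merged


# Hypothetical extracts from two topic files sharing the anonymised ID.
tax_returns = {
    ("A17", 2010): {"value_added": 500.0, "employees": 10},
    ("B42", 2010): {"value_added": 120.0, "employees": 4},
}
trade = {("A17", 2010): {"exports": 80.0}}

panel = left_merge(tax_returns, trade, fill={"exports": 0.0})
```

A left join keeps every firm in the tax-return population; an inner join would instead restrict the sample to trading firms.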
Ireland
Different sources of data, all collected by the Central Statistics Office Ireland (CSO), are
taken into account: the Census of Industrial Production (CIP), the Annual Services
Inquiry (ASI), the Merchandise Trade Data (MTD) and the Business Expenditure on
Research and Development Survey (BERDS).
In these databases, firms are classified according to the NACE classification (4 digit, rev.1
and rev.2); geographical location is identified with a NUTS3 code. Historical series are
mostly available from the mid-1990s.
Labour productivity All the labour productivity indices are computable. The indices
by ownership are available from 1996, while all the other indicators are computable
from 1991.
TFP All the TFP indices are computable, although the capital stock data poses
some difficulties for the calculation23. There are restrictions on the use and publication
of results.
Firm dynamics All the indices on firm dynamics, such as the entry rate (I_051_03), exit
rate (I_052_03) or survival rate (I_053_01), are computable from 1991.
23. No capital stock data is available in CIP or ASI. Capital stock could be calculated from capital investments
and disposals using the perpetual inventory method. Starting stocks could be obtained by breaking down the previous
year's end-of-year industry-level capital stocks obtained from the CSO to the firm level, using the firm's share of
industry-level fuel use.
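Footnote 23 describes a perpetual inventory approach. A minimal Python sketch follows; the depreciation rate is our assumption (the footnote mentions only investments and disposals, so it defaults to zero), and any figures passed in would be illustrative.

```python
def starting_stock(industry_stock: float, firm_fuel: float, industry_fuel: float) -> float:
    """Allocate the previous year-end industry capital stock to a firm by its fuel-use share."""
    if industry_fuel <= 0:
        raise ValueError("industry fuel use must be positive")
    return industry_stock * firm_fuel / industry_fuel


def perpetual_inventory(k0: float, investments: list[float], disposals: list[float],
                        depreciation: float = 0.0) -> list[float]:
    """Roll the capital stock forward: K_t = (1 - delta) * K_{t-1} + I_t - D_t.

    `depreciation` (delta) is an assumption, not taken from the CSO data.
    """
    if len(investments) != len(disposals):
        raise ValueError("investments and disposals must cover the same years")
    stocks = []
    k = k0
    for inv, disp in zip(investments, disposals):
        k = (1.0 - depreciation) * k + inv - disp
        stocks.append(k)
    return stocks
```

For example, a firm using 5 of an industry's 50 units of fuel would start with a tenth of the industry stock, which the loop then updates year by year.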
data sources need to be merged). In addition, indicators can be retrieved from the
CompNet database.
TFP All TFP indicators are computable, but only the aggregate index (I_003_03) and
the TFP decompositions (I_004_01 and I_005_1) are accessible (all the information is in
the Surveys on firms' accounts). As for labour productivity, the indices by trade,
ownership and multinational status are not accessible, given that the data sources at the
ADELE laboratory are anonymised. However, indicators can be retrieved from the
CompNet database.
Firm dynamics Firm dynamics indicators are computable from 2001, but not
accessible, because the statistics are calculated with the Business Demography database (which
is not available at the ADELE laboratory). While it would be possible to compute and access
the firm dynamics statistics using the Business Register, ISTAT indicates that the more
reliable figures are those calculated with the Business Demography, according to
Eurostat guidelines.
Internationalisation All the indicators of internationalisation are computable, but
data is not accessible to researchers (elementary trade data is not available at the
ADELE laboratory).
R&D and other activities R&D data is available from 2001 from the R&D survey, and
the corresponding indicators are computable. Similarly, indicators on ownership and
tangible assets are computable, but accessible only for the period 2001-08 (more recent
data is not yet available at ADELE). Finally, the average unit value of exports is not
computable, given that exported quantities are not available at firm level.
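The unit value of exports is a ratio of export value to exported quantity, which is why it cannot be built without firm-level quantities. A minimal Python sketch, assuming customs-style records of (firm, product, destination, value, quantity) with hypothetical codes, that skips flows whose quantity is missing:

```python
from collections import defaultdict
from typing import Optional

Flow = tuple[str, str, str, float, Optional[float]]  # firm, product, destination, value, quantity


def unit_values(flows: list[Flow]) -> dict[tuple[str, str, str], float]:
    """Export unit value (total value / total quantity) per firm-product-destination."""
    value: defaultdict = defaultdict(float)
    quantity: defaultdict = defaultdict(float)
    for firm, product, dest, val, qty in flows:
        if qty is None or qty <= 0:
            continue  # without a quantity, the unit value is undefined
        key = (firm, product, dest)
        value[key] += val
        quantity[key] += qty
    return {key: value[key] / quantity[key] for key in value}
```

If every flow lacks a quantity, as in the Italian case, the function returns an empty result: the indicator simply cannot be formed.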
Accessibility
Firm-level data is confidential and restricted. The Business Register (except for
Business Demography) and micro-data stemming from surveys are available to
users at the ADELE Laboratory (Laboratory for Elementary Data Analysis). However, it
should be stressed that identification codes of single units are not available to external
researchers; thus it is not possible to merge data from different surveys without a
specific agreement with Istat (research protocol)27. Databases with the full population
are not accessible to researchers, but descriptive statistics from these databases are
available on request.
27. See for example the Istat Micro3 project. For further information about the ADELE laboratory, see
http://www.istat.it/en/information/researchers/analysis-of-individual-data.
Latvia
The firm-level data considered here is provided by the Central Bureau of Statistics of
Latvia (CBS). CBS collects firm-level data through different databases, among which
are the Annual Enterprise Survey, the Business Register and State Revenue Service
data (SRS)28.
The three databases can be merged through a unique identifier. The CBS of Latvia also
collects monthly data on exports and imports (customs data) from 2005, without
information on firms' location29. We were not in a position to verify whether detailed
trade data can be matched with other CBS databases. The Business Register reports
firms' import and export status.
Information on Latvian multinational firms is missing, while foreign ownership is
reported.
Firms are classified according to the NACE nomenclature at 4 digit level (rev.1 from 1997
to 2005, and rev.2 after 2005): because of the implementation of NACE rev.2, the data
series are comparable from 2005. Geographical location is identified with a NUTS 2
code (as mentioned above, this information is not available for customs data).
For each year, the preliminary data version is available around ten months later, while
final data is available 18 months later (eg for data for January 2014, the preliminary
version is available around October 2014 and the final version in June 2015).
Since the data is harmonised and comparable from 2005, we report in the summary
tables a degree of computability equal to one, even if the indicators can be computed
for previous years.
Labour productivity All the labour productivity indicators are computable from 2005.
The mapped data does not allow indicators for multinational firms to be computed
because this information is not available.
28. The SRS includes annual financial statements of enterprises and employers' declarations on salary tax.
29. Foreign trade data for EU member states is collected by the Intrastat system using monthly statistical surveys.
Foreign trade data for third countries is compiled on the basis of information taken from customs declarations.
Foreign trade data for the third countries is compiled on the basis of information taken from customs declarations.
TFP All the TFP indicators are computable from 2005. The mapped data does not allow
indicators for multinational firms to be computed because this information is not
available. The Business Register reports only information on statutory capital, so
it is difficult to retrieve information on tangible fixed assets.
Firm dynamics Indicators of firm dynamics are computable only through CompNet.
The mapped data allows computation of the entry rate (I_051_03), dispersion of firms
(I_055_01) and the share of gazelles (I_056_01).
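The share of gazelles (I_056_01) is not defined in this section. The Python sketch below uses thresholds in the spirit of the Eurostat-OECD convention for high-growth enterprises (annualised employment growth above 20% over three years, at least 10 employees at the start), restricted to firms up to five years old. These parameters, and the choice of denominator, are our assumptions; other vintages of the convention use a lower growth threshold, which is why it is a parameter here.

```python
from typing import NamedTuple


class Firm(NamedTuple):
    emp_start: int  # employees at the start of the growth period
    emp_end: int    # employees at the end of the growth period
    age_end: int    # firm age (years) at the end of the period


def is_gazelle(f: Firm, years: int = 3, threshold: float = 0.20,
               min_start: int = 10, max_age: int = 5) -> bool:
    """High-growth firm (annualised employment growth above `threshold`) that is still young."""
    if f.emp_start < min_start:
        return False
    annual_growth = (f.emp_end / f.emp_start) ** (1 / years) - 1
    return annual_growth > threshold and f.age_end <= max_age


def share_of_gazelles(firms: list[Firm], min_start: int = 10) -> float:
    """Gazelles as a share of firms large enough to qualify at the start of the period."""
    eligible = [f for f in firms if f.emp_start >= min_start]
    return sum(is_gazelle(f, min_start=min_start) for f in eligible) / len(eligible)
```

Note that a fast-growing but older firm counts as high-growth without being a gazelle, and very small firms drop out of both numerator and denominator.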
Internationalisation The entire set of internationalisation indices is computable.
R&D and other activities The variables R&D expenditure and turnover cannot be
matched, so the indicator on R&D intensity (I_023_05) is not computable. Similarly,
it is not possible to compute the indicator on multinational firms (I_042_03), because
this information is not available. As mentioned above, tangible assets are not available
(I_059_03). The unit value of exports is computable.
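The Latvian case shows why matching matters: R&D intensity, read here as R&D expenditure over turnover (our reading of I_023_05), needs both variables for the same firm. A minimal Python sketch with hypothetical firm IDs, in which unmatched firms simply drop out:

```python
def rd_intensity(rd_spending: dict[str, float], turnover: dict[str, float]) -> dict[str, float]:
    """R&D expenditure divided by turnover, for firms found in both sources."""
    result = {}
    for firm_id, rd in rd_spending.items():
        sales = turnover.get(firm_id)
        if sales is None or sales <= 0:
            continue  # unmatched or zero turnover: the ratio cannot be formed
        result[firm_id] = rd / sales
    return result


def match_rate(rd_spending: dict[str, float], turnover: dict[str, float]) -> float:
    """Share of R&D-survey firms that can be linked to a turnover record."""
    return sum(firm_id in turnover for firm_id in rd_spending) / len(rd_spending)
```

When the two sources share no identifier at all, the match rate is zero and the indicator is not computable, which is the situation described for Latvia.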
Accessibility
Information on the value of exports (imports) by destination and product is not
accessible because it is confidential. Other data is in principle available on request,
subject to payment of a fee.
Lithuania
The firm-level data considered here is collected by Statistics Lithuania and includes
several firm-level surveys, as well as balance-sheet data, tax declarations, the
Business Register and customs declarations.
Data is usually classified according to the NACE classification (4 digit), while international
trade data can also be classified according to the CN at 8 digits. As for regional
disaggregation, Lithuania is itself a NUTS 2 area; only value added, number of
employees, labour cost and turnover can be aggregated at NUTS 3 level.
Labour productivity All the indicators are computable. Micro-aggregated labour
productivity (average, median, other moments) for all firms (I_001_04) is available from
2000 to 2012, while the other indicators are available only since 2004-05.
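Micro-aggregated indicators summarise the firm-level distribution rather than a single ratio. The Python sketch below computes a few moments of value added per employee; which moments the mapping actually uses is not stated here, so the selection (mean, median, standard deviation, 10th and 90th percentiles) is illustrative. It also reports the employment-weighted aggregate, which generally differs from the unweighted firm average.

```python
import statistics


def productivity_moments(firms: list[tuple[float, int]]) -> dict[str, float]:
    """Labour productivity (value added per employee) summarised across firms.

    `firms` holds (value_added, employees) pairs; firms without employees are skipped.
    At least two usable firms are needed for the spread measures.
    """
    usable = [(va, emp) for va, emp in firms if emp > 0]
    lp = [va / emp for va, emp in usable]
    deciles = statistics.quantiles(lp, n=10)  # nine cut points: p10 ... p90
    return {
        "mean": statistics.fmean(lp),  # unweighted firm average
        "median": statistics.median(lp),
        "sd": statistics.stdev(lp),
        "p10": deciles[0],
        "p90": deciles[-1],
        # Employment-weighted aggregate: total value added over total employment.
        "aggregate": sum(va for va, _ in usable) / sum(emp for _, emp in usable),
    }
```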
TFP All the indicators are computable since 2005.
rev1.1 and rev.2) and regional (NUTS 3) aggregation. The only variable (at micro-level)
for which we did not find a source is total assets.
The main issue in the mapped databases is the matchability of data from
different sources. Based on the information reported, we are not able to assess whether
data collected in different databases can be merged. Even if most of the
underlying variables are collected, computability is uncertain.
Labour productivity According to the collected information, it is possible to compute
only the labour productivity index for all firms (I_001_04) and the unit labour cost
(I_013_02). For all the other indices we are not able to state computability, given that
we have no information on data merging. See Table 3.15.
TFP We can compute only the TFP index for all firms and the decomposition indices
(I_004_01 and I_005_01). As for labour productivity, we are not able to state
computability for the other indices, given that we have no information on data merging.
Firm dynamics All the indices are computable from 1993 or 2000, depending on the
data source (General Business Register or Annual Structural Survey, respectively).
Internationalisation Some of the internationalisation indices are computable, namely
those that involve only the Survey on International Trade in Goods. Conversely, we
cannot define the computability of indices by import status because we have no
information on the matchability of data from different sources.
R&D and other activities The R&D indices are computable from 2003, and the unit value
of exports (I_070_01) from 1990. Conversely, we cannot report the computability of
the indices I_37_07, I_38_09 and I_059_03.
Accessibility
In general, many indicators of competitiveness are available to both domestic and
foreign researchers. Access to micro-level data follows explicit rules, and specific
charges apply. According to CBS, all datasets in the Centre for Policy Related Statistics
micro-data catalogue are available to authorised external researchers for their own
research. The catalogue does not contain all the datasets
Statistics Netherlands uses to compile its statistics. CBS datasets not (yet) included
in the catalogue may be made suitable for use by external researchers as custom-made
datasets. The catalogue (classified by theme) includes documentation reports
on the most recent version of the datasets immediately available for use. This
documentation describes the contents and structure of each dataset.
The enclosures referred to in this documentation are available only in Dutch and on
request. More details can be found at: http://www.cbs.nl/NR/rdonlyres/50625EDE-3274-4D7C-B19B-5E5D0F239E2F/0/131112dienstencatalogusosra2014eng.pdf.
31. For an overview of existing data at Statistics Netherlands, see the catalogue at
http://www.cbs.nl/NR/rdonlyres/0C40DD86-7AF3-4179-B74C-1B476A6A5387/0/120119catalogusmicrodata.pdf.
Poland
Information on Polish firm-level data has been provided by the Central Bank of Poland
(NBP) and the Central Statistical Office of Poland (NSO). The main source is the NSO for
both balance sheet data and innovation data (NSO database in accordance with the
Frascati Manual). The balance sheet database reports total revenues, revenues from
exports (total) and all the cost variables, as well as assets and liabilities. Firm-level
data is collected quarterly for firms with over 50 employees, and annually for firms
with more than 10 employees. The sectoral classification has a break in 2009 (NACE
rev.1.1/rev.2 switch), but the NACE identifiers can be traced back at the firm level to
2007.
Balance sheet data covers the period 1995-2011 and includes the value of imports and
exports; however, detailed trade data (ie quantities, products, destinations) is available
as customs data from 2004 at CN8 level32. Customs data is available at both the
Ministry of Finance and the NSO33. Information on the year of firms' creation and death
can be retrieved from the Business Register (REGON). Finally, firm IDs are unique across
all databases at the NSO, but the information is anonymised, so the data cannot be
matched with other data sources at the NSO by external researchers. Moreover, the
customs data can in principle be merged with the balance sheet data, but not at the
NBP, because both sources provide anonymised data with incompatible ID codes.
Information about the conditions for access to the micro-level datasets has not been
reported.
32. Export status can be inferred using financial statements, but it would be less reliable than customs data.
33. The customs data at the NBP is the same data as held by the NSO and the Ministry of Finance (primary origin of the
data). Access to the customs data is limited to the NBP.
Labour productivity Almost all labour productivity indices are computable, with the
exception of aggregates for domestic multinationals (I_001_08) and affiliates of foreign
multinationals (I_001_09), because of the lack of information on ownership and
multinational status34. In addition, the unit labour cost can also be computed
from 2002. These indicators are also accessible through CompNet. Data is available
from 1995.
TFP Both the TFP indicators and the two decomposition terms are computable.
However, due to the absence of information on ownership and multinational activities,
it is not possible to define TFP for domestic multinationals (I_004_01) and affiliates of
foreign multinationals (I_005_01, see footnote 34). Computable indicators are
also accessible through CompNet. Data is available from 1995.
Firm dynamics All the indicators on firm dynamics are computable. For firms with
over 10 employees, they can be computed using balance sheet data. Otherwise, for the
indicators I_051_03, I_052_03, I_053_01 and I_054_01, the relevant information on firms'
entry and exit is imputed and reported in the regional register (REGON) at the NSO35, 36.
Dispersion of firms by size (I_055_01) and the share of gazelles (I_056_01) are computable
from 1995 using balance sheet data at the NSO.
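Entry, exit and survival rates are ratios over populations of firms. The Python sketch below follows the general logic of the business demography rates mentioned in footnote 36 (births or deaths in year t over firms active in t; survivors of a birth cohort over that cohort), but its conventions are simplifying assumptions of ours: a firm counts as active in both its birth and its death year, and it survives a horizon of k years if it has not died by the end of year t+k. Firm IDs and years are invented.

```python
from typing import Optional

Lifespan = tuple[int, Optional[int]]  # (birth year, death year or None if still active)


def is_active(life: Lifespan, year: int) -> bool:
    """Active in `year` if born by then and not dead before it (birth and death years count)."""
    birth, death = life
    return birth <= year and (death is None or death >= year)


def entry_rate(firms: dict[str, Lifespan], year: int) -> float:
    """Births in `year` over firms active in `year`."""
    active = [life for life in firms.values() if is_active(life, year)]
    return sum(birth == year for birth, _ in active) / len(active)


def exit_rate(firms: dict[str, Lifespan], year: int) -> float:
    """Deaths in `year` over firms active in `year`."""
    active = [life for life in firms.values() if is_active(life, year)]
    return sum(death == year for _, death in active) / len(active)


def survival_rate(firms: dict[str, Lifespan], cohort: int, horizon: int) -> float:
    """Share of firms born in `cohort` that are still alive after `horizon` years."""
    born = [death for birth, death in firms.values() if birth == cohort]
    survivors = sum(death is None or death > cohort + horizon for death in born)
    return survivors / len(born)
```

With imputed entry and exit years, as in REGON for small firms, the same functions apply, but the rates inherit whatever error the imputation introduces.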
Internationalisation All the internationalisation indicators are computable from 2002
or 2005 (eg I_009_02); however, it was not possible to collect information on data
accessibility.
R&D and other activities R&D indicators are computable, as are unit value and asset
tangibility. Ownership information is not collected (I_042_03).
Accessibility
According to the information that we were able to gather, we can only state that the
34. I_001_09 is computable if firms with foreign capital are considered affiliates of foreign multinationals.
35. The REGON database cannot be matched by external researchers with other data sources at the NSO. REGON is not
available at the NBP. Data for firms with more than 10 employees is available since 2002.
36. At the Central Statistical Office of Poland, data on business demography (birth rate, death rate, survivals, gazelles)
is computed in accordance with the rules contained in Annex IX of Regulation no 295/2008 of the European
Parliament and of the Council concerning structural business statistics. Data is prepared on the basis of the
statistical business register, which is updated on the basis of additional sources (not used by the REGON database)
and as such is appropriate for business demography. Data on business demography of Poland (according to
Annex IX) is available since 2008.
Business Statistics. The other indicators are only computable from 2007. Unit labour
cost is computable from 2002.
TFP The same caveats as for labour productivity apply to the computability of TFP
indicators. Indicators by export status can be recovered using information in SBS, while
indicators by import status cannot (import activity is only in FTS). Indicators by
ownership status and international activity are available from 2007. Both Olley-Pakes
(OP) and Foster decompositions can be computed from 2002.
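The Olley-Pakes (OP) decomposition splits share-weighted aggregate productivity into the unweighted mean and a covariance term that is positive when more productive firms hold larger shares. A minimal Python sketch, assuming cross-sectional firm shares (for instance of employment or value added) and firm productivity levels (often log TFP); the choice of share variable is an assumption, not taken from the mapping.

```python
def olley_pakes(shares: list[float], productivity: list[float]) -> dict[str, float]:
    """Olley-Pakes decomposition: aggregate = unweighted mean + covariance term.

    aggregate  = sum_i s_i * p_i        (shares normalised to sum to one)
    covariance = sum_i (s_i - s_bar) * (p_i - p_bar)
    """
    n = len(shares)
    if n == 0 or n != len(productivity):
        raise ValueError("need equal-length, non-empty inputs")
    total = sum(shares)
    s = [x / total for x in shares]
    s_bar, p_bar = 1.0 / n, sum(productivity) / n
    aggregate = sum(si * pi for si, pi in zip(s, productivity))
    covariance = sum((si - s_bar) * (pi - p_bar) for si, pi in zip(s, productivity))
    return {"aggregate": aggregate, "mean": p_bar, "covariance": covariance}
```

The identity holds exactly because the shares sum to one, so the two terms always add back to the aggregate.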
Firm dynamics All the indicators of firm dynamics are computable from 2002.
Internationalisation Some of the internationalisation indices are computable from
2002. However, the indices available from 2002 (I_009_02, I_045_01, I_041_02, and
I_041_02) rely on SBS and therefore are not representative of the population. From
2007, FTS includes trade data for most firms, with a detailed set of
information such as quantities, number of products exported and destinations
(similarly for imports). However, FTS is still in the phase of collecting and processing
the raw data. At the time of writing, FTS data is not available and not harmonised with SBS.
R&D and other activities Only asset tangibility and R&D indicators are computable
(from 2002). The indicators for ownership and multinational presence (I_041_03 and
I_042_03) are computable from 2007. The unit value index (I_070_01) is not computable,
given that data on exported quantities in FTS has yet to be validated.
Accessibility
Data is not accessible because a safe environment for data security is not yet in place.
Slovakia
For Slovakia, the databases considered here are collected by the Statistical Institute of the
Slovak Republic and the National Bank of Slovakia. The former compiles the
Annual Report on Production Industries, which targets non-financial corporations (ie firms
with 20 or more employees or turnover higher than 5 million), and the individual
trade data (from customs offices). The National Bank of Slovakia compiles the annual reports
on inward and outward foreign direct investment, and the register of organisations38.
38. Note that in this case too, balance sheet data such as value added is available only for companies with 20
or more employees or turnover higher than 5 million.
Firms are classified according to the NACE classification (4 digits). The historical series
are in principle collected from 2000 to 2011, although actual availability and
comparability may differ.
Labour productivity Aggregated indices of labour productivity and unit labour cost
(I_013_02) are computable both from the mapped databases (annual reports on production
industries) and from CompNet. Data to calculate labour productivity indices by export
status39 based on customs data is available from 2004. Labour productivity per
exporter (I_001_06) can be calculated from 2000 using balance sheet data on sales abroad
(collected within the reports on production industries). Data to calculate labour
productivity indices by domestic/foreign ownership40 is available from 2008.
TFP As for labour productivity, aggregated TFP indices are computable both
from the mapped databases (annual reports on production industries) and from CompNet. Data
to calculate TFP indices by export/import status and by
domestic/foreign ownership is available from 2004 and 2008, respectively. However,
TFP per exporter (I_003_05) can be calculated from 2000 using balance sheet
information on sales abroad. OP and Foster decompositions are available from 2000.
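Footnote 22 notes that CompNet uses a Foster-type decomposition without entry and exit. Restricted to continuing firms, and with shares normalised to sum to one in each year, the change in aggregate productivity splits exactly into within, between and cross terms. A minimal Python sketch with invented inputs:

```python
def foster_continuing(s0: list[float], p0: list[float],
                      s1: list[float], p1: list[float]) -> dict[str, float]:
    """Foster-type decomposition of aggregate productivity change for continuing firms.

    s0, s1: shares in the base and current year (normalised here to sum to one);
    p0, p1: productivity of the same firms, in the same order, in both years.
    """
    if not (len(s0) == len(p0) == len(s1) == len(p1)):
        raise ValueError("inputs must describe the same continuing firms")
    t0, t1 = sum(s0), sum(s1)
    s0 = [x / t0 for x in s0]
    s1 = [x / t1 for x in s1]
    agg0 = sum(a * q for a, q in zip(s0, p0))
    agg1 = sum(a * q for a, q in zip(s1, p1))
    rows = list(zip(s0, s1, p0, p1))
    within = sum(a0 * (q1 - q0) for a0, a1, q0, q1 in rows)               # productivity change at fixed shares
    between = sum((a1 - a0) * (q0 - agg0) for a0, a1, q0, q1 in rows)     # reallocation towards above-average firms
    cross = sum((a1 - a0) * (q1 - q0) for a0, a1, q0, q1 in rows)         # share gains by improving firms
    return {"change": agg1 - agg0, "within": within, "between": between, "cross": cross}
```

Because shares of continuing firms sum to one in both years, subtracting the base-year aggregate in the between term does not alter the total; it only changes how reallocation is attributed.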
Firm dynamics Data for firm dynamics has in principle been collected since 2000,
but availability has to be verified sector by sector41. Conversely, dispersion of firms
by size (I_055_01) and the share of gazelles (I_056_01) are computable from 2000.
Internationalisation The indices of internationalisation can be computed from 2004
with individual (rm) trade data42.
R&D and other activities R&D data is computable from 2000 only for firms with an
R&D unit. The ownership indicators (I_041_03 and I_042_03) can be computed by
merging annual reports on production industries with the register of organisations
(from 2008). Indices on tangible assets and unit value of exports can be computed
from 2000 and 2004, respectively.
39.
40.
41.
42.
Accessibility
The firm-level databases are not available online, and access is confidential: the rules
of access have not been specified.
Slovenia
The databases considered for Slovenia are the Slovenian Business Registry (SBR), the
Annual Reports of Direct Investments, the Intrastat and Extrastat databases, and the
Research and Development Activity database. It should be noted that all companies in
Slovenia, whether limited or unlimited liability companies (including listed companies),
economic interest groupings or main offices of foreign business entities,
are legally obliged to submit their annual reports to the Agency of the Republic of
Slovenia for Public Legal Records and Related Services (AJPES). An additional source is
the Slovenian companies' annual reports used for the CompNet project43. All databases
are available at the Statistical Office of the Republic of Slovenia (SURS). All the databases
mentioned share a unique ID, so it is possible to merge the micro-level databases.
Firms are classified according to the NACE classification (rev1 from 1995 to 2004, rev.2
from 2005 onwards), and location is identified by NUTS3 code44.
Labour productivity The aggregate values for labour productivity and unit labour
costs are computable in the mapped databases, and are also available through the
Slovenian companies' annual reports. Indices I_001_08 and I_001_09 are computable
only from 2008 because information on the multinational status of a firm (ie whether
a firm controls enterprises abroad) was not collected before.
TFP All the TFP indices are computable, although I_003_07 and I_003_08 only since
2008, because information on the multinational status of firms was not reported
before.
Firm dynamics All indices for firm dynamics are computable, although some, such as
the entry and exit rates, are computable only from 2004 because the year of firms'
deaths is reported only from 2004.
43. The AJPES data is regularly used for national statistical purposes by other institutions, and includes the Slovenian
companies' annual reports.
44. Trade data was collected according to NACE rev1 until 2007.
TFP As for labour productivity, all the TFP indicators and the corresponding decompositions are
computable. For indicators I_003_08 and I_003_09, it is not possible to define the
computability, given that it is not clear how to recover reliable information on the
multinational status of a firm.
Firm dynamics All the firm dynamics indicators are computable. However, the
computability of I_050_04 (the average firm size relative to entry, by age) cannot be
defined, because there is no reliable information on the year of a firm's creation.
Internationalisation Most of the internationalisation indices are computable from
1993. However indicators that require information on exported quantity, number of
products exported and destination markets (I_043_01, I_043_02, and I_040_1) cannot be
computed because such data has not been mapped.
R&D and other activities R&D indicators are computable, as well as asset tangibility.
For the other indicators, the computability has not been reported given that the
availability of the underlying data is still not properly mapped.
Accessibility
In the case of the Industrial Economics Survey, only other statistical institutions
(Statistical Institutes of Autonomous Communities) are provided with micro-data files.
For the CIS and the Pitec databases, anonymised firm-level data can be accessed
on the INE website through a specific procedure. Researchers must
submit a request by filling out the required fields in the tab Solicitud de descarga de
BBDD (database download request). Once the request has been evaluated and approved,
the researcher receives, within 72 hours, an email providing a username and password
valid for three months. Except for the anonymisation of a set of variables, the files
available on the website correspond to the original files.
Sweden
The databases we consider for Sweden are the Structural Business Statistics (SBS),
the International Trade Survey46, the R&D Survey and the Business Register. All the
databases are collected by Statistics Sweden (SCB). Firms are classified according to
the NACE classification; revisions 1 and 2 are both reported in the transition period
2006-10. Firms' location has not been mapped. Moreover, even where firms' location
is available, according to SCB this information is difficult to use because plants
might, for instance, report the addresses of their head offices.
Firm-level data can be merged through a firm ID, although in the case of sample surveys
the overlap can be smaller than the original surveys. All the indices are highly computable.
46. International trade statistics changed when Sweden joined the EU.
Labour productivity All the labour productivity indices are highly computable, as well
as unit labour cost. Almost all the indices are computable from 1980, while indices by
trade status are computable from 1995 (eg I_001_06 and I_001_07).
TFP All the TFP indices are highly computable. Similar to labour productivity, TFP
indices are computable from 1980 with SBS, while TFP indicators by trade status are
computable from 1995 using the international trade surveys.
Firm dynamics All the competitiveness indices on firm dynamics are computable.
Internationalisation All the measures on trade activity are computable. Data is
available from 1995.
R&D and other activities Indicators on R&D expenditure (I_023_04, I_023_05) are
reported in R&D surveys; tangible assets (I_059_03) and export unit value (I_070_01)
are computable too. Ownership data has been collected since 1980 (I_041_03,
I_042_03).
Accessibility
All firm-level data is restricted, but data can be accessed by European researchers via
remote access, subject to a confidentiality check and an administrative charge.
United Kingdom
The databases considered for the United Kingdom are the Annual Respondent
Database (ARD), the Annual Inquiry into Direct Investment in the UK (AFDI), the
Business Enterprise Research & Development (BERD) database and trade statistics
from HM Revenue and Customs (HMRC).
The first three databases are collected by the Office for National Statistics, but the first
two are available through the UK Data Service (UKDS). The ARD can be merged with AFDI
and BERD using the IDBR code47. The database resulting from merging ARD, BERD
and AFDI classifies firms according to the SIC industrial classification.
With the exception of export and import status, trade data can be retrieved from the trade
statistics at HMRC, which are customs data on firms' trade activities. Import and export
declarations from and to countries outside the EU are available for 1996-2012, while
trade with EU countries is available only from 2008 to 2012. Products are classified
according to the SITC2 and HS4 classifications (in addition, the CN8 nomenclature is reported).
In principle, HMRC data can be merged with external sources (such as the ARD). However,
researchers must describe the data they would like to obtain, and the HMRC
Datalab Team will consider each dataset on a case-by-case basis48.
Labour Productivity All the labour productivity indicators and the unit labour cost are
computable from 1995.
TFP All the TFP indicators and the relative measure of decomposition are computable
from 1995.
Firm Dynamics All the indicators on firm dynamics are computable from 1995, except the
exit rate (I_052_03), which is not computable given that data on firms' deaths is not available
in the mapped databases.
Internationalisation All the internationalisation indices are computable. However,
some critical aspects should be underlined. First, the indices for the extensive
margin of trade (both imports and exports) are computable from 1995 because the
ARD database reports all the necessary information. According to the mapped
databases, the other indicators (ie the intensive margins) are constructed with HMRC
data. This implies that foreign trade data within the EU is available from 2007, while
trade data outside the EU is available from 1996. We therefore chose to define the
computability of these indices as not perfect (in yellow).
R&D and other activities All the R&D indicators are computable, as are the indicators
on multinational status and ownership. Unit values can be calculated with HMRC data;
the caveats on the internationalisation indices also apply to the unit value index.
Accessibility
All the sources are available via the submission of a research project to the appropriate
institution (UKDS, ONS, and HMRC Datalab). In addition, the HMRC Datalab requires a
short training course, which includes legal issues as well as statistical disclosure
control of output. At the moment the Datalab is only open to UK-based institutions and
by law HMRC is only allowed to share the data if it serves one of HMRC's functions. Data
is available only on-site.
2.3.3 Concluding remarks
The picture differs remarkably from country to country when we analyse the computability
and the availability of a set of competitiveness indexes that can be calculated through
a bottom-up approach (ie using firm-level data). Table 2.4 provides a synthetic
overview of the computability and accessibility of selected bottom-up indicators,
which we use to summarise our main findings.
First, the degree of computability is rather good for a wide span of indicators for many
countries. In particular, Table 2.4 (left panel) shows that in Belgium, Denmark, Estonia,
Finland, France, Hungary, Ireland, Slovenia, Spain, Sweden and the UK, most of the
selected indicators are computable for a relatively large number of years. However,
computability is relatively low across the board in Croatia, the Czech Republic, Malta,
Portugal and Romania.
Second, indicators for labour productivity, TFP and international activities have the
highest degree of computability, given that they require basic items from
balance sheet/business register data and trade statistics, respectively. Merging
information from the balance sheet/business register with a foreign-ownership flag
seems more problematic, so productivity for affiliates of foreign multinationals
cannot be computed for Croatia, Denmark, Germany, Hungary, Latvia, Poland and
Portugal. Indicators of firm-level estimates of quality, which require information on
both the value and the quantity of exports by firm, are also not computable (or poorly
computable) for a relatively high number of countries. Finally, for indicators of firm
dynamics, computability is better for entry rates than for exit rates.
The mapping of computability of bottom-up indicators suggests that if scholars or
policymakers need to define a competitiveness indicator through a bottom-up
approach, they might face three main situations:
Table 2.4: Computability (left panel) and accessibility (right panel) of selected bottom-up indicators (micro-aggregated TFP for all firms; I_001_04, I_001_06, I_003_03, I_023_04, I_041_03, I_047_01, I_051_03, I_052_03, I_070_01) for 25 EU countries, Austria to the UK
1. The data to calculate the indicator is available and the indicator is computable;
2. The data to calculate the indicator is not available and the indicator is not
computable;
3. The data to calculate the indicator is available but the indicator is not computable.
Cases (1) and (2) are straightforward: an indicator is computable (or not) if the
underlying data is available (or not). The third case is the most interesting and
challenging. We observe that many indicators require matching of different databases.
In our assessment, it is not uncommon for researchers to face problems at this point,
since some data sources cannot be matched: either there is no unique identifier that
allows users to combine information from two or more datasets, or there are restrictions
limiting the possibility to combine different data sources.
In order to compute a competitiveness index bottom-up, it is not sufficient that
all the required variables are available. If data stems from different sources, these
sources must also be matched in a single database. This procedure is easier if the
same institution collects all the databases.
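The three situations can be expressed as a simple decision rule. The following is a hypothetical sketch. It assumes each data source is described only by the variables it contains and the firm identifiers it carries. It treats an indicator as computable if a single source covers all required variables, or if a combination of sources covering them shares a common identifier. Source names, variables and identifiers in the example are illustrative.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Source:
    name: str
    variables: FrozenSet[str]
    identifiers: FrozenSet[str]  # firm identifiers carried by this source


def classify(required: Set[str], sources: List[Source]) -> int:
    """Return 1 (available, computable), 2 (not available) or 3 (available, not computable)."""
    available = set().union(*(s.variables for s in sources))
    if not required <= available:
        return 2  # case 2: some required variable exists in no source
    for size in range(1, len(sources) + 1):
        for combo in combinations(sources, size):
            covered = set().union(*(s.variables for s in combo))
            if not required <= covered:
                continue
            shared = frozenset.intersection(*(s.identifiers for s in combo))
            if size == 1 or shared:
                return 1  # case 1: one source suffices or the sources share an identifier
    return 3  # case 3: the variables exist but the sources cannot be linked


balance_sheet = Source("balance_sheet", frozenset({"value_added", "employment"}), frozenset({"vat_id"}))
fdi_survey = Source("fdi_survey", frozenset({"foreign_owned"}), frozenset({"fdi_id"}))
fdi_linked = Source("fdi_survey", frozenset({"foreign_owned"}), frozenset({"fdi_id", "vat_id"}))
needed = {"value_added", "employment", "foreign_owned"}

print(classify(needed, [balance_sheet, fdi_survey]))
print(classify(needed, [balance_sheet, fdi_linked]))
print(classify(needed, [balance_sheet]))
```

The rule is deliberately simple. It ignores chains of links (A to B via one identifier, B to C via another) and any legal restrictions on merging, both of which matter in practice.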
Once it is ascertained that an indicator is computable, the researcher needs to assess
whether the data is actually accessible to someone who is not affiliated to the institution(s)
providing the data. Table 2.4 (right panel) highlights that access to micro-level
databases is not easy for researchers, because of confidentiality restrictions,
rules of access based on the nationality of the researcher (or the institutions to which
he/she is affiliated) or discretionary choices. In many cases, access is
granted to researchers under certain conditions and upon submission of a research
proposal. Bottom-up indicators are not accessible in Romania, because a safe
environment for ensuring secure access to the data is not yet in place. In countries
such as Ireland, access is subject to stringent conditions and is possible only on-site, and
publication of results is subject to the approval of the National Statistical Institute.
In some countries, such as Austria and Slovakia, it was not possible to ascertain the
access rules. In some countries, nationality rules apply. In Belgium, data is collected
by the National Bank of Belgium, and is accessible only to NBB members (or affiliates).
In Denmark, the procedure for accessing the data is clearly defined, which
would qualify Denmark as demonstrating best practice in terms of data accessibility,
but access is allowed only to researchers affiliated to Danish institutions. Similarly, in
Hungary data is easily accessible only to researchers who have an agreement with
the CSO. In the UK, access to the HMRC Datalab is open only to UK-based institutions.
The best practices in terms of data accessibility are those in which data can be
accessed remotely, with no constraints on affiliation or nationality, and with a clearly
formalised procedure that leaves no (or little) room for discretion over whom the data
provider grants access. From this perspective, Sweden appears to demonstrate best
practice, since data can be accessed remotely, conditional on a confidentiality check
and an administrative charge. Similarly, in Finland and France, there is a rather clear
procedure to allow external researchers access to micro-data, including via remote
connections. In France, however, access requests need to be approved by a committee, which
creates some room for discretion. In Slovenia, micro-data is accessible for research
purposes, but only at SURS. In the Netherlands, access to micro-data is also relatively
easy, although it was not possible to ascertain whether remote access is possible.
Germany also has some of the desirable features, such as the possibility of remote
access, but in this case there is a problem of computability, since data is often
provided by different institutions and cannot be merged. In some countries, access to
data varies according to the type of data. For example, in Italy, only the data from the
surveys can be made available to external researchers, while micro-data covering the full
population of firms is not accessible. In the Czech Republic, business register data can
be accessed relatively easily, while for other types of data, such as customs data and
FATS data, conditions are more stringent. Malta allows access to firm-level information
for research purposes, except for data on foreign ownership and capital. In Latvia, data
is available upon request, except for data on trade by destination and product, which
is confidential.
In conclusion, the availability of an indicator depends on different factors that influence
computability and accessibility. The computability of an indicator relies on factors
such as the existence of the right data and the possibility of merging data from
different sources if necessary. The accessibility of data depends on the rules of access
and their clarity. The existence of large datasets is not a sufficient condition to
guarantee the availability of an indicator. The best practices we observed rely on data
existence, ease of merging data from different sources, and clarity in the rules of
access.
3. Bottom-up competitiveness indicators comparable across EU countries: challenges and responses
Linking data from different sources containing different records or including information on different subjects and issues is valuable for policy-oriented, comparative
scientific research for several reasons (see, for instance, Borgman, 2010; Christen,
2012; Herzog et al, 2010; Winkler, 2006):
More complex research questions can be addressed: for example, linking data on
employers with data on their employees might permit conclusions to be drawn
about the role of certain groups of employees, or of employment stability, for the
productivity of firms (see Bender et al, 2008). On a more general level, the
integration of data from administrative sources (register data) and survey data
might significantly widen the scope and depth of potential analyses (see Bakker,
2010). Furthermore, longitudinal analysis might be made possible or facilitated.
Accuracy, reliability, and quality of existing data can be improved by cross-checking,
monitoring and validating information from different sources. Moreover, missing
information in one dataset might be completed using information from another
dataset. There is also the potential to address and understand the reasons for
survey non-response, and to identify and treat measurement and representation
errors in register data (Bakker, 2010).
The burden on respondents, the bureaucratic effort and the overall costs of data
collection and analysis can be vastly reduced without compromising quality, and
the hidden potential of administrative data can be leveraged.
However, there are also a number of challenges and limitations. These involve technical
aspects such as data quality within existing datasets and diverging data quality
between datasets. Data harmonisation is an important issue in this respect. The major
obstacles to free matching of data are often legal restrictions or ethical issues
preventing the linking of data. Privacy and non-disclosure are pivotal issues in this
respect. However, against the background of the increasing availability of micro-level
data, computer science and research in social science have developed a series of
techniques and workarounds that are able simultaneously to leverage the potential
of matched data and to guarantee the preservation of privacy.
3.1.1 What is data matching?
A series of concepts and definitions exists for the matching of datasets from different
sources. Data linkage or record linkage denotes simply 'the bringing together of
information from two records that are believed to relate to the same entity' (Herzog et
al, 2010, p. 1), such as the linking of information on addresses from a mailing list with
information on phone numbers from a telephone directory, or information on firms'
employment figures from labour statistics with information on the firms' balance
sheets. The terms data matching or statistical matching are used to refer to a series
of methods whose objective is the integration of two (or more) data sources referring
to the same target population. The data sources are characterised by the fact that they all
share a subset of variables (common variables) and, at the same time, each source
separately observes other subsets of variables. Moreover, there is a negligible chance
that the different sources observe the same units (disjoint sets of units) (Zio,
2012)49.
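A common family of statistical matching techniques is nearest-neighbour ('distance hot-deck') imputation. For each record in a recipient file, it finds the most similar record in a donor file using only the common variables, then attaches that donor's specific variable. The sketch below is a simplified illustration, not the method of Zio (2012). The variable names are hypothetical, and the common variables are assumed to be already on comparable scales (in practice they would usually be standardised first).

```python
import math
from typing import Dict, List, Sequence

Record = Dict[str, float]


def distance_hot_deck(recipients: List[Record], donors: List[Record],
                      common: Sequence[str], donated: str) -> List[Record]:
    """Attach `donated` from the nearest donor (Euclidean distance on common variables)."""
    fused = []
    for rec in recipients:
        point = [rec[v] for v in common]
        nearest = min(donors, key=lambda d: math.dist(point, [d[v] for v in common]))
        row = dict(rec)
        row[donated] = nearest[donated]
        fused.append(row)
    return fused


# Hypothetical survey A (export share) and survey B (R&D intensity) on disjoint firms,
# sharing the common variables log_employment and log_age.
survey_a = [
    {"log_employment": 2.0, "log_age": 2.3, "export_share": 0.3},
    {"log_employment": 4.8, "log_age": 3.3, "export_share": 0.6},
]
survey_b = [
    {"log_employment": 2.1, "log_age": 2.2, "rd_intensity": 0.05},
    {"log_employment": 5.0, "log_age": 3.4, "rd_intensity": 0.20},
]
fused = distance_hot_deck(survey_a, survey_b, ["log_employment", "log_age"], "rd_intensity")
print(fused)
```

Because the units are disjoint, the fused file is only informative about the joint distribution of export share and R&D intensity under an assumption. The standard one is conditional independence of the two variables given the common variables.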
Linking data from different sources is not a new idea. Theoretical contributions and
early applications of data matching and record linkage techniques date back to the
1940s and can be observed in large-scale census collections and in the health
sector. Newcombe et al (1959), for instance, relate differentials in family fertility to
hereditary diseases by linking data from health records and a register of handicapped
children to birth and marriage records. Subsequent developments were affected quite
substantially by the emerging discipline of computer science, with a special focus on
technical and methodological questions (eg Fellegi and Sunter, 1969). In recent years,
statistics and computer science have steadily converged in
this respect.
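Fellegi and Sunter (1969) formalised probabilistic record linkage. Each compared field contributes a weight based on m, the probability that the field agrees for a true match, and u, the probability that it agrees for a non-match. The following is a minimal sketch of the standard log2 agreement and disagreement weights and the two-threshold decision rule. The m and u values and the thresholds are illustrative assumptions; in practice they are estimated (for example with the EM algorithm) and tuned to acceptable error rates.

```python
import math
from typing import Dict


def match_weight(agreement: Dict[str, bool], m: Dict[str, float], u: Dict[str, float]) -> float:
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, agrees in agreement.items():
        if agrees:
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total


def classify_pair(weight: float, upper: float, lower: float) -> str:
    """Two thresholds split pairs into links, possible links (clerical review) and non-links."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Illustrative parameters for comparing firm records across two sources.
m = {"name": 0.95, "postcode": 0.90, "sector": 0.80}
u = {"name": 0.01, "postcode": 0.05, "sector": 0.10}

w = match_weight({"name": True, "postcode": False, "sector": True}, m, u)
print(round(w, 2), classify_pair(w, upper=10.0, lower=0.0))
```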
Important factors facilitating and supporting these recent developments at the
interface of data matching, statistics and social sciences are (1) the rapid and
exponential advancements in information technology, particularly with respect to
hardware capacity (processors, memory, storage); (2) the continuous discovery and
opening up of data and data repositories, particularly at official data providers like the
Statistical Offices, or as Big Data, and their activation for scientific research; and (3)
the development of techniques and methodologies enabling access to and the
processing of confidential data without violating the privacy and non-disclosure requirements
attached to the data (see, for instance, Schiller and Welpton, 2013).
3.1.2 Data quality as the basic precondition for data matching
Data quality is a crucial determinant of any effort to link data from different sources,
because it defines the potential and the limits of matching datasets. If the quality of a
dataset is poor with regard to potential identifiers, matching can be hampered or even
precluded, and the quality of a matched dataset based on this data is also likely to be
poor, although the process of matching can improve data quality in several respects:
'If data would be of perfect quality, then data matching could be accomplished through
straightforward database join operations [deterministic matching] and no sophisticated
indexing techniques or approximate comparison functions would be needed' (Christen
2012, p. 40). In some cases, matching of data is also used in order to improve,
complement or cross-check the content of data of poor quality on a specific subject.
49. For a more detailed discussion of these terminological issues see, for instance, Christen (2012).
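When identifiers are missing or unreliable, an 'approximate comparison function' can replace the exact join, for instance on firm names. The sketch below uses Python's standard `difflib.SequenceMatcher`, which is only one of many possible string similarity measures. The list of legal-form tokens, the 0.85 threshold and the example names are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional

# Illustrative list of legal-form tokens ignored when comparing names.
LEGAL_FORMS = {"ltd", "limited", "plc", "gmbh", "ag", "spa", "sa", "bv"}


def normalise(name: str) -> str:
    """Lower-case, keep alphanumeric tokens and drop legal-form tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1] between normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(name: str, candidates: List[str], threshold: float = 0.85) -> Optional[str]:
    """Return the most similar candidate if it clears the threshold, else None."""
    score, candidate = max((similarity(name, c), c) for c in candidates)
    return candidate if score >= threshold else None


register_names = ["Acme Widgets Limited", "Apex Tools plc"]
print(best_match("ACME Widgits Ltd.", register_names))  # misspelt name still found
print(best_match("Zenith Foods", register_names))       # no candidate above threshold
```

Normalising before comparing removes differences (case, punctuation, legal form) that say nothing about whether two records refer to the same firm. The threshold then trades missed links against false ones.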
Data quality is a complex and multi-dimensional concept and it is described by several
criteria (see Christen 2012, p. 39f; Eurostat, 2003; UNECE, 2007), the most important
of which are:
Accuracy, integrity and reliability: What is the origin of the data? By whom have they
been collected, surveyed, compiled and/or changed? What are the framework
conditions of the data collection and compilation? Are there any commercial
interests involved? Is the information contained in the data believable?
Completeness: This aspect concerns both records and the attributes of records
(variables). How many missing values are there in the data? Why are values or
attributes missing? Are there any thresholds with regard to the coverage of
statistical units?
Consistency, coherence and comparability: The issue is relevant both within and
between datasets used for matching. Have there been changes in the coding of
attributes over time? Are there duplicate records in the database? Is an original
database (to be matched) grounded in different sources? Are the data or published
results from the data comparable to similar data? Are the concepts comparable to
those of other datasets?
Timeliness and punctuality: At what exact point in time was the data recorded? How
great is the time lag between reference point and clearance of data? How old is the
data?
Relevance and interpretability: Are the data and the issues covered relevant to
economic analysis? Are the contents of the databases meaningful and can they be
used in a reasonable way?
Accessibility: Are there any restrictions on access to the data, eg for certain user
groups or for specific segments of the data? Do distinct regulations on data access
exist? In what respect is the data sensitive with regard to non-disclosure?
Clarity and documentation: Is precise and accessible documentation of the data
available? Are metadata available in a standardised format (eg SDMX, ESMS; see
SDMX, 2008; European Commission, 2009a)? Are test data or scientific use files
(SUF) available?
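Two of these criteria, completeness and duplicate records, lend themselves to simple automated checks before any matching is attempted. A minimal sketch follows, assuming records are dictionaries in which a missing value is either `None` or an absent field. The firm-year extract and its field names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def completeness(records: List[Record], variables: Sequence[str]) -> Dict[str, float]:
    """Share of records with a non-missing value (None or absent counts as missing)."""
    if not records:
        return {v: 0.0 for v in variables}
    n = len(records)
    return {v: sum(r.get(v) is not None for r in records) / n for v in variables}


def duplicate_keys(records: List[Record], key_fields: Sequence[str]) -> Dict[Tuple, int]:
    """Key combinations that occur more than once, with their counts."""
    counts = Counter(tuple(r.get(k) for k in key_fields) for r in records)
    return {k: c for k, c in counts.items() if c > 1}


# Hypothetical firm-year extract.
extract = [
    {"firm_id": "F1", "year": 2010, "employment": 12, "turnover": 1.5},
    {"firm_id": "F1", "year": 2010, "employment": 12, "turnover": None},
    {"firm_id": "F2", "year": 2010, "employment": None, "turnover": 0.8},
    {"firm_id": "F3", "year": 2010, "employment": 40},
]
print(completeness(extract, ["employment", "turnover"]))
print(duplicate_keys(extract, ["firm_id", "year"]))
```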
Several factors have an impact on the quality of data. The following are of particular
relevance with regard to matched data (Christen, 2012):
Origin of data from multiple sources: if data originate from different organisations
with different backgrounds (eg different disciplines), this will affect the consistency of
databases and has to be handled with caution.
Subjective judgement of data production: not all potentially relevant aspects are
recorded in the data to be matched, which might hamper the matching potential.
Data matching is a process consuming time, money and computing resources.
Particularly the latter have become much more easily available and tools have
become more and more powerful. But as many datasets grow simultaneously (eg
Big Data), more resources and novel techniques are always needed.
In linking data from different sources, a trade-off between security and accessibility
is frequently needed.
The inherent technical features of datasets are an important factor affecting the
consistency of data from different sources. This refers both to the coding of data
and to data representations (eg relational databases).
Input rules might be restrictive and/or bypassed, which might hamper data quality.
For example, in a register survey of firms, there might be a complex system for
allocating the firm to an industry sector. Thus, many respondents might revert to a
simple solution and fill in, for example, simply 'manufacturing' instead of
'manufacturing of chemical products'.
Last but not least, both data needs and the technical systems for data collection
and storage change over time. This might cause changes in the structure and
contents of datasets, with certain attributes disappearing and new ones being
added to the data.
In summary, linking data from different sources has plenty of potential, but the quality
of the original data influences the quality and the validity of the resulting matched
data. Thus, data harmonisation, which is described in the next section, is an important
feature of data matching and matchability.
3.1.3 Harmonisation of data
Harmonisation of existing data on different levels of aggregation is part of the technical
process of data matching, and is also a potential avenue towards the creation of
comparable cross-country data necessary for cross-country research50. The general
objective of data harmonisation is to improve data quality and to make the datasets to
be merged more comparable with respect to their central characteristics/variables
(Granda and Blasczyk, 2010).
Data harmonisation itself offers several benefits. It provides a common basis for
standardised data, it decreases data redundancy and costs of data exchange, and it
ensures data compatibility and comparability (TID, 2012). Generally, harmonisation and
standardisation of datasets can be performed at different stages of the matching
process, the two main forms being input harmonisation and
output harmonisation (CHINTEX, 2001; Kallas and Linardis, 2008; Burkhauser and
Lillard, 2005; see Figure 3.1)51.
The basic characteristic of input harmonisation is that standardisation starts before
any process of matching, ie the inputs of the matching process are harmonised right
from the beginning. Input harmonisation thus aims to achieve standardised
measurement processes and methods in all national or regional populations.
'Comparability is realised through standardisation of definitions, indicators,
classifications and technical requirements' (Granda and Blasczyk, 2010, p. 1). Input
harmonisation is always ex-ante harmonisation, which is implemented before data are
surveyed or compiled, whereas ex-post harmonisation refers to already existing records
(Kallas and Linardis, 2008).
50. For a more detailed discussion of these terminological issues see, for instance, Christen (2012).
51. Another way would be the creation of new cross-country data from scratch, eg through new cross-national
surveys. Several such datasets have been created in recent decades. Most,
however, focus on the micro-level of individuals, while firms or establishments have been rather neglected.
Notable exceptions are the Community Innovation Survey (CIS), the EFIGE Survey and the Continuing Vocational
Training Survey (CVTS). For a critical overview, see Burkhauser and Lillard (2005).
Figure 3.1: Input and output harmonisation (stages shown: original datasets, measurement procedures, matching process, matched datasets)
Statistical units,
Reference periods,
Populations (coverage),
Variables (in case of differences in definition),
Classifications,
Metadata.
For many of these issues, international standards already exist, for example for
classifications of industries or products. Concerning metadata, the SDMX framework
defines standards for the international exchange of metadata and is applied by several
international organisations such as Eurostat, the World Bank and the OECD (SDMX,
2009; Vale, 2009, p. 28). Particularly for Europe, the European Commission has set
up a recommendation on reference metadata for the European Statistical System (the
ESMS, see European Commission, 2009a), which refers to the European Statistics Code
of Practice (Eurostat, 2011) and is based on the SDMX framework.
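Ex-post harmonisation of classifications is often done with a concordance (correspondence) table that maps each national code to a code in the common classification. The sketch below assumes a hypothetical, purely many-to-one concordance with invented codes. Real correspondences between classification versions can be many-to-many and then require allocation weights, which this example does not cover. Records whose codes have no mapping are excluded from the harmonised output and their codes are reported separately, because they signal a harmonisation gap.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]

# Hypothetical many-to-one concordance from national codes to a common classification.
CONCORDANCE = {"N101": "C10", "N102": "C10", "N201": "C20", "N261": "C26"}


def harmonise(records: List[Record], concordance: Dict[str, str],
              code_field: str = "industry") -> Tuple[List[Record], Set[str]]:
    """Recode records to the common classification; collect codes with no mapping."""
    harmonised, unmapped = [], set()
    for rec in records:
        target = concordance.get(rec[code_field])
        if target is None:
            unmapped.add(rec[code_field])
            continue
        harmonised.append({**rec, code_field: target})
    return harmonised, unmapped


def totals_by_code(records: List[Record], value_field: str,
                   code_field: str = "industry") -> Dict[str, float]:
    """Aggregate a variable over the harmonised classification."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec[code_field]] += rec[value_field]
    return dict(totals)


national = [
    {"industry": "N101", "employment": 10},
    {"industry": "N102", "employment": 5},
    {"industry": "N201", "employment": 7},
    {"industry": "N999", "employment": 3},
]
harmonised, unmapped = harmonise(national, CONCORDANCE)
print(totals_by_code(harmonised, "employment"), unmapped)
```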
The limits of data harmonisation are mainly defined by national institutional
frameworks or by existing technical rules and standards, which are generally hard to
overcome. In particular, fundamental concepts of statistical units such as firms,
establishments or employees are often defined slightly differently in different
countries (see Broersma et al, 2010, for an example of employer-employee data in
the Netherlands and in Germany).
3.1.4 Privacy and non-disclosure
An important issue for the analysis of micro-level data is privacy and confidentiality of
information on single statistical units, particularly individuals, households, enterprises
or administrations (see UNECE, 2007a, for an overview). The legal conditions on non-disclosure
are generally a national matter and they differ widely between European
countries, although some harmonisation efforts have already been pursued, for
example Commission Regulation (EC) No 831/2002 on Community
Statistics, concerning access to confidential data for scientific purposes (see European
Commission, 2002), or more recently the European Statistics Code of Practice
(Eurostat, 2011).
Regulation 831/2002 applies to a series of pan-European micro-level
datasets, for which it sets out procedures for access to confidential data (see Santos
and Museux, 2005)52. Beyond the datasets covered by this regulation and its
amendment in Regulation 1000/2007, access to micro-level data at European level
is theoretically granted, but in practice it is rather restricted, as reflected in the European
Statistics Code of Practice (Eurostat, 2011, p. 8): 'Access to micro-data is allowed for
research purposes and is subject to specific rules or protocols.'
With regard to the scientific analysis of micro-level data, there is a trade-off between
privacy concerns and the risk of identification of sensitive information (such
as individuals' health complaints or firms' business strategies) on the one hand, and the interest
in and need for scientific research on the other (Santos and Museux, 2005). Matching data from
different sources might create additional challenges for privacy protection, as the
quality and the quantity of information on single observations (ie individuals or firms)
generally increase when data from different sources are linked.
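A simple uniqueness count makes the added disclosure risk from linking concrete. The more attributes are combined, the more records become unique on those attributes and hence potentially re-identifiable. The following is a minimal sketch in the spirit of k-anonymity checks, where a unique record corresponds to k = 1. The firm attributes are hypothetical, and the size class is assumed to come from a second, linked source.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def unique_share(records: List[Record], quasi_identifiers: Sequence[str]) -> float:
    """Share of records whose combination of quasi-identifiers occurs only once."""
    combos = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(combos)
    return sum(counts[c] == 1 for c in combos) / len(records)


# Hypothetical firms: sector and region from one source, size class from a linked one.
firms = [
    {"sector": "C10", "region": "North", "size": "small"},
    {"sector": "C10", "region": "North", "size": "large"},
    {"sector": "C20", "region": "South", "size": "small"},
    {"sector": "C20", "region": "South", "size": "small"},
    {"sector": "C26", "region": "North", "size": "medium"},
]
before = unique_share(firms, ["sector", "region"])
after = unique_share(firms, ["sector", "region", "size"])
print(before, after)
```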
Many data-holding institutions in European countries (and worldwide) have introduced
techniques allowing for the analysis of micro-level data without violating rules of
non-disclosure, thus guaranteeing the confidentiality of the respective data. Some of
these techniques will be discussed in the next section.
3.1.5 Potential solutions and workarounds for data and matching restrictions
One approach to overcoming at least some of the challenges of matching processes and
of matched datasets is the use of so-called matching architectures. These techniques are
primarily intended to prevent misuse of the data to be matched. For example, databases
to be matched can be sent to a trusted matching institution before being sent to
researchers for analysis (see Figure 3.2). The matching unit then matches only the
identifiers, whereas researchers later receive only the contents of the matched data,
not the identifiers (for an example, see Brook et al, 2008).
52. These datasets are the European Community Household Panel (ECHP), the Labour Force Survey (LFS), the
Community Innovation Survey (CIS) and the Continuing Vocational Training Survey (CVTS). More recently,
Regulation 831/2002 was amended by Commission Regulation 1000/2007, which includes further datasets,
namely the Structure of Earnings Survey (SES), the European Union Statistics on Income and Living Conditions
(EU-SILC) and the Adult Education Survey (AES).
Figure 3.2: A simple architecture for matching of confidential data (three-party protocol: data provider A, data provider B, matching unit, external data user; steps 1 to 3)
As the involvement of the third party (the matching unit) causes some disclosure and
security risks (eg collusion of the data provider with the matching unit), the process
can also be performed without a matching unit, and the data providers can
communicate directly with each other.
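In the three-party protocol, the matching unit can be kept from seeing clear-text identifiers. One way to do this is for the data providers to replace identifiers with keyed hashes (HMAC), computed with a secret key shared only between them. The sketch below uses Python's standard `hmac` and `hashlib` modules. The identifiers, key and content fields are hypothetical. Exact hashing only links records whose identifiers are recorded identically, so a typo prevents a match. For brevity, the final assembly of the linked contents is shown in one place.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def pseudonymise(ids: List[str], key: bytes) -> Dict[str, int]:
    """Run by each data provider: map the keyed hash of each identifier to its row number."""
    return {
        hmac.new(key, firm_id.encode("utf-8"), hashlib.sha256).hexdigest(): row
        for row, firm_id in enumerate(ids)
    }


def match_tokens(tokens_a: Dict[str, int], tokens_b: Dict[str, int]) -> List[Tuple[int, int]]:
    """Run by the matching unit: pair row numbers whose tokens coincide."""
    return sorted((tokens_a[t], tokens_b[t]) for t in tokens_a.keys() & tokens_b.keys())


# Provider side: the key is shared by providers A and B but not with the matching unit.
shared_key = b"hypothetical-secret-key"
ids_a = ["GB123", "GB456", "GB789"]
content_a = [{"employment": 50}, {"employment": 20}, {"employment": 75}]
ids_b = ["GB456", "GB123", "GB000"]
content_b = [{"exports": 1.2}, {"exports": 3.4}, {"exports": 0.5}]

pairs = match_tokens(pseudonymise(ids_a, shared_key), pseudonymise(ids_b, shared_key))

# The external data user receives only the linked contents, without identifiers.
linked = [{**content_a[i], **content_b[j]} for i, j in pairs]
print(linked)
```

A keyed hash is used instead of a plain hash because anyone without the key could otherwise hash a list of known identifiers and reverse the tokens. It does not remove the collusion risk mentioned above.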
Confidentiality issues can also be addressed at the level of data access. As many
micro-level datasets contain sensitive information, for example with regard to
individuals' or firms' characteristics, which can be directly linked to the respective
firms or individuals, issues of privacy and non-disclosure are pertinent. Most often,
country-specific legal restrictions govern the non-disclosure of the data.
Without direct access to micro-level data, however, a meaningful analysis of the data
is often not possible. Therefore, several solutions giving researchers access to
original or slightly anonymised data without the risk of de-anonymisation have been
developed in recent years (DWB, 2012).
Generally, these solutions range along a continuum from no access at all to restricted
access and full access. Whereas the first and the last alternatives are irrelevant in
the present context, various alternatives have been developed for the
provision of partial or restricted access to micro-level data.
Restrictions (and thus the necessary non-disclosure and confidentiality of data) can
be realised either by limiting the data to a restricted sample (eg a Scientific Use File),
through the anonymisation of sensitive parts of the data (eg identifiers, addresses,
names), or by restricting access to these sensitive attributes of records. In this context,
data providers have developed a series of techniques to regulate access to micro-level
data. One way is through on-site access to the original data: the researcher has to visit
a physical data storage environment (safe centre) in which the legal and technical
aspects of confidentiality can be taken into account (DWB, 2012; Brandt, 2012).
Another solution applied by several national statistical institutes and data archives is
the concept of remote access. The researcher sends the syntax of his or her programme for
data analysis53 to the data provider, which runs the programme on the real
data. Ultimately, the researcher has access only to the results (which are, moreover,
checked for potential disclosure and privacy issues) and does not see the micro-level
data itself (DWB, 2012).
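The output checks mentioned above are often based on simple rules applied to each table cell. The sketch below implements two common kinds of primary suppression rule: a minimum number of contributing units, and a dominance check on the largest contributor. The thresholds (3 units, 85%) are hypothetical parameters, since actual values differ between data providers. Secondary suppression (protecting suppressed cells from being recovered via row or column totals) is not handled.

```python
from typing import Dict, List, Optional


def cell_is_safe(values: List[float], min_units: int = 3, max_share: float = 0.85) -> bool:
    """Primary suppression checks for one table cell (thresholds are illustrative)."""
    if len(values) < min_units:
        return False  # too few contributing units
    total = sum(values)
    if total > 0 and max(values) / total > max_share:
        return False  # a single unit dominates the cell total
    return True


def release_totals(cells: Dict[str, List[float]], **rules) -> Dict[str, Optional[float]]:
    """Return cell totals, replacing unsafe cells with None (suppressed)."""
    return {k: (sum(v) if cell_is_safe(v, **rules) else None) for k, v in cells.items()}


# Hypothetical turnover contributions by sector, as produced by a researcher's programme.
cells = {"C10": [10.0, 12.0, 8.0, 9.0], "C20": [40.0, 35.0], "C26": [500.0, 5.0, 4.0]}
print(release_totals(cells))
```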
Some institutions are able to provide a more advanced form of remote access, allowing the
data user to access the (anonymised) data from anywhere without being able to
access sensitive characteristics. This is the case in the Netherlands and Sweden, for
instance, and is being assessed in Germany by the Morpheus Project (see
Höhne and Höninger, 2013). In this project, an anonymised dataset is stored on
a server located at a statistical institute (it is not possible to download the data). After
running their programmes, researchers receive the results of their analysis as well as
a corresponding quality assessment, which allows the validity of the
results to be evaluated.
To improve access to different micro-datasets, Eurostat has launched some projects
with international partners: the Decentralised Access to EU Micro-data Sets project
(completed 31 January 2010) and the Decentralised and Remote Access to
confidential data in the ESS (DARA) project (Brandt, 2012).
53. Most data holders also provide some type of dummy data which simulates most of the characteristics of the real
data and which helps the researcher to prepare operative programmes.
Schiller and Welpton (2013) present a solution for the current European Union Remote
Access Network (EU-RAN), established by the Data Without Borders project. This project
plans to give researchers access to detailed confidential data from around the EU
from within their own country of residence, which would eliminate travel time and
costs. Their proposal builds on five general principles (Schiller and Welpton, 2013):
To put it simply, the solution from Schiller and Welpton (2013) uses remote access
that only requires simple VPN (virtual private network) software54. Figure 3.3
illustrates the principle of EU-RAN. Data providers (usually from different member
states) make data available, which always remains within the institutions, or at least
within the country of origin, in order to comply with national legal requirements. On the
other side, researchers or other users have (restricted) access to the data via secure
connections either from anywhere, at the data-providing institution itself, or within a
specifically equipped safe centre.
The fact that researchers have access to the data does not necessarily imply that they
can download the data. Therefore it is necessary to provide a virtual working
environment which includes analytical software and applications that allow results
to be generated, prepared and presented. The purpose of the information platform with
metadata is to provide information and general support.
One possible option for the future is the MiCoCe (micro-data computation centre)
concept, whereby only small parts of the data are moved into the working memory of
the MiCoCe, and are later deleted. Secure connection systems are used (see Schiller,
2013).
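The MiCoCe idea of holding only a small part of the data in working memory at any time resembles ordinary chunked (streaming) processing. The sketch below assumes CSV input and a hypothetical column name, and computes a mean without ever holding more than `chunk_size` records at once; it illustrates the memory pattern only, not the security features of an actual MiCoCe.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def read_in_chunks(source: TextIO, chunk_size: int) -> Iterator[List[dict]]:
    """Yield the records of a CSV source in small chunks."""
    reader = csv.DictReader(source)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_mean(source: TextIO, column: str, chunk_size: int = 2) -> float:
    """Compute a column mean while keeping only one chunk in memory."""
    total, n = 0.0, 0
    for chunk in read_in_chunks(source, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        n += len(chunk)
        del chunk  # drop the reference so the chunk can be freed before the next is read
    if n == 0:
        raise ValueError("no records to summarise")
    return total / n


if __name__ == "__main__":
    data = io.StringIO("firm_id,employees\n1,10\n2,20\n3,30\n4,40\n5,50\n")
    print(streaming_mean(data, "employees"))  # 30.0
```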
54. This system provides a secure encrypted connection between the user and the server holding the data, as widely
used for financial or military services.
Figure 3.3: Principle of EU-RAN (access points: anywhere, institution, safe centre; secure connections to data storage at data providers A, B and C and to the micro-data computation centre, MiCoCe)
two or more countries. All these issues are of great relevance for the analysis of
competitiveness.
Examples of recent and ongoing projects making use of the distributed micro-data
approach are CompNet (see ECB, 2013) and EU KLEMS (O'Mahony et al, 2008;
O'Mahony and Timmer, 2009). Within Work Package 10 of EU KLEMS55, a series of
economic indicators, particularly relating to productivity, have been assembled from
micro-level data from different European countries.
In the light of the severe restrictions that still remain on the accessibility of micro-level
data, particularly when it comes to cross-country perspectives, the distributed micro-
55. Within the FP6-funded EU KLEMS project, both aggregate and micro-level data on various economic topics have
been collected and analysed using a cross-country comparative approach. For further information, see
www.euklems.net.
institutes (NSIs) were supervised by their governments and were free to decide
on the objectives and methods used to produce a variety of statistics. Statistics have
been harmonised gradually, in parallel with the enlargement of the European Union,
and the process is still far from complete.
NSIs now collect, edit and store micro-data from several sources to meet national
needs and EU requirements. While they have to provide detailed, high-quality statistics to
researchers and policymakers, they are also obliged to protect the confidentiality of
the information. Traditionally, NSIs publish aggregate information at the macro or sector
level, and most of the information currently transmitted to Eurostat is in the form of
aggregate numbers or simple frequency or magnitude tables. As a consequence, data
protection methods for aggregate, tabular data are well established in all EU member
states (Hundepool et al, 2010). In recent years, however, the demand for micro-data
for research purposes has gradually increased, posing new challenges for data protection.
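Two standard rules for protecting tabular data discussed in the statistical disclosure control literature (eg Hundepool et al, 2010) are a minimum-frequency rule and an (n, k) dominance rule. The sketch below applies both as primary suppression to a table of firm-level contributions. The parameter values (at least three contributors; the two largest firms not exceeding 85 percent of the cell total) are illustrative assumptions, and secondary suppression, which real tables also need so that suppressed cells cannot be recovered from margins, is not implemented.

```python
from typing import Dict, List, Optional


def is_sensitive(contributions: List[float], min_freq: int = 3,
                 n: int = 2, k: float = 0.85) -> bool:
    """Flag a cell under a minimum-frequency rule and an (n, k) dominance rule."""
    if len(contributions) < min_freq:
        return True  # too few contributors
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n > k * sum(contributions)  # a few firms dominate the total


def suppress_primary(table: Dict[str, List[float]]) -> Dict[str, Optional[float]]:
    """Publish cell totals, replacing sensitive cells with None."""
    return {cell: None if is_sensitive(values) else sum(values)
            for cell, values in table.items()}


if __name__ == "__main__":
    table = {
        "A": [100, 5, 5, 5],    # dominated by one firm
        "B": [30, 30, 30, 30],  # no dominance
        "C": [10, 10],          # too few contributors
    }
    print(suppress_primary(table))  # {'A': None, 'B': 120, 'C': None}
```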
The provision of statistics to Eurostat by the NSIs is a cost-effective solution for
Eurostat, but it puts a heavy burden on NSIs (Sverdrup, 2005). Balancing the available
resources between the needs of Eurostat and national providers is often problematic
because of the increasing demand for detailed, high-quality statistics at the EU level. All
NSIs dedicate a substantial part of their resources to meeting EU requirements. This
is especially true in small countries, where NSIs work mostly to serve the needs of the
EU.
Hence, we are at a new stage of data collection, one that has also been driven by the
widespread use of micro-data and by proposals from economists on how firm-level data
should be used to compare competitiveness, labour markets and other economic
features across countries. Ideally, in a European research area, scientists would be able to
access data from all countries, datasets would be matched while preserving confidentiality, and micro-data-based indicators would be constructed in a unified way to ensure
comparability.
Data harmonisation methods build on principles established by other international
organisations, especially the United Nations and the Organisation for Economic
Co-operation and Development. An important difference, however, is that while the standards
set by these organisations are generally authoritative but not obligatory,
the EU can impose legal obligations on member states (see Shearing, 2013), although
the EU system remains decentralised, with Eurostat in a coordinating role. This
decentralised structure is a plausible solution, since the system must be able to
incorporate national statistical systems which developed independently and have
administrative data, but practices vary widely. Differing practices in the use of micro-data go hand in hand with differences in national legislation governing the treatment
of micro-data. As a result, there are several comparability issues for the raw data (see
section 3.1). Furthermore, as the main objective is to serve Eurostat at aggregate level,
access to micro-data at EU level is not a priority. Consequently, confidentiality and
access regulations remain in national hands and vary greatly.
Since the current system is regarded as inflexible and unable to appropriately adapt
to changing user needs, there is an intention to move away from the separate
production of statistics towards a more integrated system60. For instance, the European
Commission decided to improve the accessibility, harmonisation and applicability of
European statistics (see, for instance, European Commission, 2001, and Lamel, 2002).
An important step towards this goal was the implementation of the European Statistical
System Networks of Excellence (ESSnet) addressing the need for synergies,
harmonisation and dissemination of best-practice methods within the ESS61.
Subsequently, ESSnet projects were designated as networks of several ESS
organisations aimed at providing results that benefit the whole ESS
(Eurostat, 2013). A central characteristic of an ESSnet project is that it connects
a wide range of expertise across ESS organisations in order to develop specific
actions that benefit the whole European system. With this approach, not all EU member
states need to participate in every ESSnet project; the results are shared with the other
EU countries (see Table 3.1 for a selection of
recent ESSnet projects).
60. Communication from the Commission to the European Parliament and the Council on the production method of
EU statistics: a vision for the next decade, COM(2009) 404 final. A related recent effort is the ESS VIP programme.
61. The initiative started with the implementation of the Centres and Networks of Excellence (CENEX) in 2005. The first
CENEX (pilot) project on Statistical Disclosure Control (SDC) started at the end of 2005, lasted twelve months and
involved statistical offices from eight European countries (Hundepool, 2007).
Organisation
Detail
Institute National
Statistics Sweden
GEOSTAT 1B
Statistics Norway
Statistics Denmark
MEMOBUST
Statistics Netherlands
NET-SILC2
CEPS/INSTEAD62
Source: Bruegel.
62. http://www.ceps.lu/?type=module&id=53.
Given these criteria, any ESSnet project can only play a supporting role
and cannot be a stand-alone venture.
3.2.2 The current modernisation of European business and trade statistics
One of the first ESSnet programmes was adopted in December 2008 with a term of five
years (2009-13). It was called Modernisation of European Enterprise and Trade
Statistics (MEETS, see European Economic Community, 2008). The aim of MEETS,
which included various projects, was to adapt business statistics to new
needs, including adjusting the statistical system's production processes
and reducing the burden on enterprises of collecting and providing internal
data. MEETS was intended to contribute to the following objectives (European Economic
Community, 2014):
To review priorities and develop indicators for new areas;
To achieve a streamlined framework for business-related statistics;
To support the implementation of a more efficient way of producing enterprise and
trade statistics;
To modernise INTRASTAT63.
To reach these targets, the European Commission spent €42.5 million. MEETS consists
of several smaller studies, including various ESSnet projects which directly or
indirectly contribute to it (European Commission, 2011a; see also Table 3.1)64.
In addition to MEETS, Eurostat has started the FRIBS project (Framework Regulation
Integrating Business Statistics) which aims to satisfy the need for the integration of
global business-related statistics into a single cross-cutting legal framework (European
Commission, 2012). The project started in 2011 with a five-year duration. It was
launched to meet the objectives of the European Statistical Programme 2013-17
(European Commission, 2011a).
Specifically, the European Commission plans to provide a common infrastructure tool
for the production and compilation of business statistics and to define consistent data
63. INTRASTAT is a database based on EU Regulation No. 3330/91, which regulates the collection of
information and the production of statistics on trade in goods between countries of the European Union (European
Commission, 1991).
64. In addition to ESSnet projects, a number of external studies conducted by national statistical institutes or external
experts have also been commissioned (European Commission, 2011).
requirements and a common data quality framework. This will make it possible to link and
match statistics obtained through the regular collection of global business
statistics, providing greater added value from the information collected.
To this end, FRIBS tackles several issues (European Commission, 2012; Statistikrat der
Bundesanstalt Statistik Österreich, 2013), such as:
The lack of full methodological consistency in different domains of business
statistics;
The differences in surveys on business statistics and their diverging periodicities
across Europe;
Non-harmonised use of administrative sources in EU countries;
Improvement in the exchange of micro-data between the member states of the ESS;
The high burden on enterprises in terms of reporting intra-EU trade statistics; and
Lack of data linking across business-statistical domains.
Along with the MEETS and FRIBS programmes, the European Commission has released
several additional recommendations and practical guidance. One of the first initiatives
in this respect was the establishment of the Foreign Affiliates Statistics System (FATS,
see European Economic Community, 2007). This database measures commercial
presence in foreign markets through affiliates and thus describes the overall
activity of foreign affiliates resident in a given target country (Eurostat, 2009).
Inward and outward FATS data is available on an annual basis. Although the rules for
uniform data collection were established only in 2007, data goes back to 1996 (see footnote 65). Data
collection is done by the statistical offices of the member states, and the data is then
aggregated by Eurostat. This system is also used for many other databases (eg ITSS or
ITGS, see below).
Another implementation of a common European database is the Single Market
Statistics System (SIMSTAT), which started in 2011 as a follow-up to the INTRASTAT
system (European Statistical Advisory Committee, 2012). This database is of
particular importance because the collection of INTRASTAT data generates around
50 percent of the administrative burden from official statistics (Radermacher, 2013).
SIMSTAT applies modern design principles to trade statistics, which opens up the
possibility of gradually replacing the import survey of, for example, ITSS by a combined
65. Between 1996 and 2006, data was collected on a voluntary basis and thus is not complete in terms of country
coverage or uniformity.
European Commission plans to launch several other projects (Museux et al, 2013, and
European Committee, 2013).
To sum up, there is an intention at the EU level to meet the increasing demand for micro-level data for research purposes, but there are many open questions about practical
implementation. Although collaborative projects provide guidance and
assistance to the member states, substantial differences between member states
remain. Most countries provide access to confidential micro-data for scientific
purposes, but both the set of available databases and the conditions of access vary
across countries.
3.3 Cross-country and matched datasets in Europe: overview and examples
3.3.1 Overview
Table 3.2 gives an overview of examples of cross-country and matched datasets in
Europe and beyond. Four types of matched datasets, projects or institutions providing
support and access for matched data can be distinguished:
Type 1: Multi-country harmonised micro-data collections
This type of cross-country dataset comprises collections of data from different
countries which are compiled on the basis of a harmonised methodology. This
is the case with, for example, systematic and regular collections of available
data (such as the firm-level data provided by Bureau van Dijk) or with cross-country surveys based on a harmonised methodology and harmonised
questionnaires.
Type 2: Micro-aggregated statistics
These are collections of aggregate data (eg on sectoral and/or regional levels)
which have been compiled from micro-level data on the basis of a harmonised
methodology, mainly distributed micro-data approaches. Examples are the
CompNet database or the OECD's DynEmp data.
Type 3: Specific projects dedicated to matching micro-level data
This type of matched micro-level data is based mostly on one-off projects
with a specific, mostly topical aim. Usually, the resulting datasets can be
replicated for the specific purpose of the project, but they cannot be used outside
the project because of technical and/or legal restrictions.
Type 4: Coordination actions and collections of meta-data
Type 4 is not about matched cross-country micro-level data itself, but
comprises initiatives which have the aim of organising, supporting and/or
facilitating access to and the matching of micro-level data from different
countries (sometimes, such initiatives also exist within countries). Examples
of such initiatives are the Data without Boundaries (DwB) and the German
KombiFiD projects.
In Section 3.3.2 below, illustrative best-practice examples for each of the above four
types of matched data/institutions will be described and discussed.
3.3.2 Examples of cross-country (and) matched datasets in Europe
To illustrate the types of recent data matching efforts, we briefly outline five examples.
The EFIGE dataset is an example of a multi-country harmonised micro-data collection
(Type 1); the dataset being compiled within the CompNet project is an example of
a micro-aggregated dataset (Type 2); the project 'Combined firm-level data for Germany'
(KombiFiD) illustrates what has been labelled 'specific projects
dedicated to matching micro-level data' (Type 3); and the Data without Boundaries
(DwB) project is an example of a coordination action aiming to facilitate data access in
general (Type 4). Finally, the Global Value Chain project is an example of a combination
of a multi-country survey (Type 1) and micro-data linking (Type 3).
3.3.2.1 EFIGE
The EFIGE dataset66 is a dataset generated within the EFIGE (European Firms in a Global
Economy: internal policies for external competitiveness) project, which was supported
by the European Commission's 7th Framework Programme, coordinated by Bruegel
and carried out from September 2008 to August 2012 by academic and
international institutions and national central banks in Europe67. The dataset provides
representative and comparable samples of manufacturing firms in seven European
countries. It includes about 3,000 firms for each of Germany, France, Italy and Spain,
more than 2,200 firms for the United Kingdom, and about 500 firms for each of Austria
and Hungary.
The EFIGE survey, for the first time in Europe, included a broad array of questions that
allow several crucial issues related to competitiveness to be addressed. The
questionnaire generated both qualitative and quantitative data on firms' characteristics
and activities, for a total of about 150 variables covering six broad areas:
66. The complete name is EU-EFIGE/Bruegel-UniCredit Dataset (Altomonte and Aquilante, 2012).
67. See http://www.efige.org/ for details of partners.
Community
Innovation
Survey (CIS)
The CompNet database is an outcome of the work
of the CompNet project, organised by the ECB with
the participation of the national central banks of EU
countries. The objective of CompNet is to develop a
more consistent analytical framework for assessing competitiveness, which allows for greater correspondence between determinants and
outcomes. The CompNet database contains various indicators of competitiveness resulting from
the analysis of (national) micro-level data based
on a harmonised methodology (DMD approach).
The Community Innovation Survey (CIS)-based innovation statistics are part of the EU science and
technology statistics. Surveys are carried out every
two years by EU member states and a
number of other ESS member countries. Participation in the CIS
is voluntary, which means
that different countries are involved in different survey years.
EU
2012-present
2000, 2004,
2006, 2008,
2010
EU countries
Time span
1990-present
Countries
43 countries
Annually
Biennially
Weekly updates
https://www.ecb.eur
opa.eu/secure/
comtrade/login.html
Access can be
acquired by
purchase
ECB
Eurostat
Amadeus
European
Company Data
CompNet
Provider
Name
Table 3.2: Examples of cross-country and matched datasets in Europe and beyond
is a set of annual directories
Dun &
(D&B). It allows the identification of relationships between companies, suppliers and
customers worldwide and provides detailed information about more than 3.5 million companies including their corporate structure, ownership, etc.
The information is divided into seven geographic
regions and facilitates the establishment of appropriate networks or the taking of profitable business decisions based on competitor analysis.
International
EU
Europe
Countries
1958-present
2001-2003
2011-15
Time span
Access can be
acquired by
purchase
http://statmath.wu.a
c.at/stat4/hackl/die
cofis/-
Completed
Annually
www.dwbproject.org
DIECOFIS
EU Commission
Development of a
System of
Indicators on
COmpetitiveness
FIScal Impact
on Enterprise
Performance
Provider
EU Commission
Name
Data without
Boundaries
(DwB)
Table 3.2: Examples of cross-country and matched datasets in Europe and beyond, continued
Enterprise Surveys collect fully comparable firm-level survey data on about 80,000 firms in 122
countries (with a focus on World Bank client countries). Including non-global surveys, the total
number of observations is about 110,000 in 135
countries. The ES are intended to become the
main source of comparable firm data across countries and over time, with the aim of building
comprehensive panel data sets. Currently the
panel data comprises 79 countries.
Emerging
countries
International
Countries
2005-present
2010
2001-11
Time span
http://www.efige.org
/efige-datareleased/
http://www.oecd.org/sti/dynemp.htm
Annually
World Bank
European
Commission
EFIGE
Enterprise
Surveys
Provider
OECD
Name
DYNEMP
Table 3.2: Examples of cross-country and matched datasets in Europe and beyond, continued
The EU KLEMS Database results from the corresponding EU KLEMS project, which aimed at creating a database on measures of economic growth,
productivity, employment creation, capital formation and technological change at the industry
level for all European Union member states from
1970 onwards. This work was intended to provide
an input to policy evaluation, in particular for the
assessment of the competitiveness and economic growth goals established at the Lisbon and Barcelona summits.
The database aimed at facilitating the sustainable
production of high-quality statistics using the
methodologies of national accounts and input-output analysis.
EU
EU
1970-2011
Project ran from
2003-08 and is
finished.
2010-12
2013
Time span
http://dragon155.startdedicated.com/ons_drupal/taxonomy/term/32
http://dragon155.startdedicated.com/ons_drupal/
Finished
Finished
European
Commission
ESSNet/Eurostat
ESSLimit
EU
Countries
EU KLEMS
Provider
ESSNet/Eurostat
Name
ESSLait
Table 3.2: Examples of cross-country and matched datasets in Europe and beyond, continued
FDi Markets provides information on companies
globalising through FDI. Part of the service is an
online database of cross-border greenfield investments across all sectors and countries worldwide.
The investment project database provides real-time monitoring of cross-border investment projects, which allows users to filter investment
opportunities and understand investment flows
and patterns. Also available is a company
database, comprising profiles of all companies investing overseas. In addition, an Investor
Signals Module functions as an early-warning
signal, indicating whether a company may be
considering investment.
International
2003-present
2008-present
EU
The EuroGroups Register (EGR) has been established as a network of registers in Member States
and on the EU level, the Business Registers of
NSIs (and in future the corresponding databases
at NCB/ECB) and the central EGR at Eurostat. When
the EGR network becomes fully operational it
should serve as a unique survey frame and form
the basic tool for improving many statistics related to globalisation.
Real-time
monitoring
http://www.fdimarkets.com/
Data is available in
anonymised form via
CD-ROM, or remotely
via the LEED-LISSY
system. However, the
data available through
LEED-LISSY is limited
with respect to
countries and years.
The entire data in
unanonymised form is
available only at the
Safe Centre at
Eurostat's premises in
Luxembourg.
Financial Times
EU Commission
EuroGroups
Register
Time span
2002, 2006,
2010
Countries
Provider
Eurostat, London
School of
Economics
Name
EU linked
employer/
employee data
(ESES)/SES
Table 3.2: Examples of cross-country and matched datasets in Europe and beyond, continued
Federal Reserve
Bank of New York
or the Federal
Reserve System
International
Wage Flexibility
Project (IWFP)
Once
Never again
Annually
IPUMS-International
makes these data
available to qualified
researchers free of
charge through a web
dissemination
system.
https://international.ipums.org/international/index.shtml
1850-present
Time span
International
EU 27 and
Norway
Countries
Minnesota
3, 4
Population
Center, National
Statistical Offices,
and international
data archives.
The aim of this FP 7 research project was to improve our understanding by providing new data on
intangible capital and new evidence on the contributions of intangible capital to economic growth.
The study intended to improve information about
the capital embodied in intellectual assets (eg
human capital, R&D, patents, software and organisational structures) and it aimed at uncovering the
growth potential associated with intangible capital
accumulation in manufacturing, service industries
and the rest of the economy.
Integrated Public
Use Microdata
Series
International
Provider
European
Commission
Name
Innodrive
Table 3.2: Examples of cross-country and matched datasets in Europe and beyond, continued
Cross National
Data Center
Luxembourg
Cross National
Data Center
Luxembourg
Luxembourg
Employment
Study Database
(LES)
Luxembourg
Income Study
Database (LIS)
International
International (12
countries)
Germany
Countries
1980-present
1990, 1995
Once
Time span
Remain on our
servers and, if you
wish to access
them, you may.
1. LISSY: A remote-execution system
that allows research
using the LIS or LWS
microdata. 2. Web
Tabulator: An online
table-maker. 3. LIS
Key Figures: Two
sets of national
indicators
Waves
Data available to
researchers until
31/12/2014
Finished
Never again
LIS focuses on income microdata and contains harmonised datasets collected from multiple countries over a period of decades. The LIS datasets
contain data on market income, public transfers
and taxes, private transfers, household characteristics, labour market outcomes and, in some
datasets, expenditures. The datasets include
household- and person-level microdata. LWS focuses on wealth microdata and contains a smaller
number of harmonised datasets. The LWS
datasets include variables on assets and debt,
market and government income, household characteristics, labour market outcomes and, in some
datasets, expenditures and behavioural indicators. The LWS datasets contain household-level
microdata.
KombiFiD was a feasibility study conducted between 2008 and 2011 to assess the potential, the obstacles and the benefits of
matching official firm-level micro data from different institutions in Germany. Administrative and
survey data from three official providers, which in principle cannot be matched because of legal restrictions,
was prepared for matching by obtaining the written consent of a sample of firms. The
result was a sample of more than 16,500 firms
which could be used for analyses of various research questions. Because of legal restrictions, use of the data
was limited to the period up to the end of 2014.
Provider
Federal Statistical
Office, Federal
Labour Office,
Deutsche
Bundesbank
Name
KombiFiD
(Combined Firm
data for
Germany)
Table 3.2: Examples of cross-country and matched datasets in Europe and beyond, continued
The MICRO-DYN centralised database is an attempt
to reconcile and combine aggregated firm-level
data from statistical offices in a number of European countries in one dataset. The final dataset
contains data (27 indicators on, e.g., firm characteristics, employment and productivity) from the
national statistical offices of 10 countries and was
supplemented with data from the Amadeus database for eight additional countries. The data generated from the Amadeus database was kept
separate, since it is in many ways not comparable and should be combined with data from statistical offices
only with caution.
18 European
countries
1985-2009
(partially)
2004-present
Time span
1994-present
Countries
International (12
countries)
Finished
http://www.microdyn.eu/files/wp7/MicroDyn_Database_Description.pdf
wiiw (Wien)
Centre for
Economic
Performance
(LSE)
World
Management
Survey
MicroDyn
Provider
Cross National
Data Center Luxembourg
Name
Luxembourg
Wealth Study
Database (LWS)
Table 3.2: Examples of cross-country and matched datasets in Europe and beyond, continued
Provider
The World Input-Output Database (WIOD) provides
time-series of world input-output tables for forty
countries and a model for the rest-of-the-world,
covering 1995 to 2011. These tables have been
constructed on the basis of officially published
input-output tables in conjunction with national
accounts and international trade statistics. It also
provides data on labour and capital inputs and
pollution indicators at the industry level.
WIOD
EU Commission
27 EU countries
and 13 other
major countries
20 countries,
worldwide
Scandinavia
Countries
1995-2011
2008-2012
Time span
finished
finished
finished
http://www.wiod.org/new_site/data.htm
Book is output
Open Access:
http://www.norden.org/en/publications/open-access
Name
Table 3.2: Examples of cross-country and matched datasets in Europe and beyond, continued
and welfare) also by building a bridge between micro and macro analysis, in order to
support the design of adequate policies.
On the micro level, the research conducted within the Network has confirmed the
importance of firm-level factors (such as size, ownership and technological capacity)
in understanding the drivers of aggregate performance. It has also developed a
centralised project to compute cross-country homogeneous indicators of labour and
total factor productivity, and to analyse the role of resource reallocation in increasing
aggregate productivity.
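The role of reallocation can be made concrete with the Olley-Pakes decomposition, one widely used way of splitting share-weighted aggregate productivity into the unweighted average productivity of firms and a covariance term between firms' shares and their productivity. The text does not specify CompNet's exact formula, so the sketch below illustrates the general technique only; using employment shares as weights and the three-firm example are assumptions.

```python
import math
from typing import List, Tuple


def olley_pakes(productivity: List[float], weights: List[float]) -> Tuple[float, float, float]:
    """Decompose aggregate productivity P = sum_i s_i * p_i into
    the unweighted mean of p_i plus sum_i (s_i - mean_s) * (p_i - mean_p).

    Returns (aggregate, unweighted_mean, covariance_term).
    """
    if len(productivity) != len(weights) or not productivity:
        raise ValueError("need equally long, non-empty inputs")
    total_weight = sum(weights)
    shares = [w / total_weight for w in weights]
    n = len(productivity)
    mean_p = sum(productivity) / n
    mean_s = 1.0 / n  # shares sum to one, so their mean is 1/n
    aggregate = sum(s * p for s, p in zip(shares, productivity))
    covariance = sum((s - mean_s) * (p - mean_p) for s, p in zip(shares, productivity))
    return aggregate, mean_p, covariance


if __name__ == "__main__":
    # Three firms: labour productivity and (hypothetical) employment.
    agg, mean_p, cov = olley_pakes([1.0, 2.0, 3.0], [1, 1, 2])
    print(agg, mean_p, round(cov, 6))  # 2.25 2.0 0.25
    assert math.isclose(agg, mean_p + cov)
```

A positive covariance term means that more productive firms account for a larger share of employment, which is the sense in which reallocating resources towards them raises aggregate productivity.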
CompNet is organised in three work streams related to:
1. Aggregate measures of competitiveness;
2. Firm-level studies;
3. Global value chains (GVCs).
One of the main policy questions addressed by CompNet is how aggregate productivity
can be enhanced. As discussed earlier, competitiveness in
different countries is best analysed using firm-level data because firms are very
heterogeneous; information on firm-level drivers of competitiveness is
lost when working with country- or sector-level aggregates. However, because of
confidentiality restrictions, the necessary firm-level datasets are not readily available
across countries. Nevertheless, in many European countries the micro-level data
can be accessed from within the respective countries. Exploiting this fact, CompNet
has opted to employ the Distributed Micro-data Approach (DMD) (see section 3.1.6) in
order to compute different indicators of competitiveness at the micro level.
As such, CompNet has created an active network of country teams that independently
run a common algorithm to compute a large number of competitiveness indicators.
The CompNet firm-level indicator database is superior to others available because of:
(i) coverage (58 two-digit NACE Rev. 2 manufacturing and non-manufacturing sectors
in 13 EU countries); (ii) time horizon (2002-2010), since it includes the recent boom-bust cycle; and (iii) cross-country comparability. The first round of the so-called Do-File
exercise has been completed and the second round is underway. Research output of
the network can be accessed via:
http://www.ecb.europa.eu/home/html/researcher_compnet.en.html.
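The distributed micro-data approach can be summarised as one common routine, run separately by each country team on its own confidential data, with only aggregates sent to the centre. In the sketch below the routine, field names and indicators are hypothetical (CompNet's actual routine, the so-called Do-File, computes a much larger set of indicators), and the loop over countries merely simulates what would in practice be separate runs inside each national institution.

```python
from statistics import median
from typing import Dict, List


def country_indicators(firms: List[dict]) -> Dict[str, float]:
    """Common routine every country team runs locally.
    Only these aggregates, never firm records, leave the country."""
    productivity = [f["value_added"] / f["employees"] for f in firms if f["employees"] > 0]
    if not productivity:
        raise ValueError("no firms with positive employment")
    return {
        "n_firms": len(productivity),
        "mean_lp": sum(productivity) / len(productivity),
        "median_lp": median(productivity),
    }


def run_distributed(local_datasets: Dict[str, List[dict]]) -> Dict[str, Dict[str, float]]:
    """Simulate separate national runs and collect their outputs centrally."""
    return {country: country_indicators(firms) for country, firms in local_datasets.items()}


if __name__ == "__main__":
    toy_data = {  # hypothetical firm records for two countries
        "country_A": [
            {"value_added": 100.0, "employees": 10},
            {"value_added": 300.0, "employees": 10},
            {"value_added": 200.0, "employees": 10},
        ],
        "country_B": [
            {"value_added": 50.0, "employees": 5},
            {"value_added": 150.0, "employees": 5},
        ],
    }
    for country, indicators in run_distributed(toy_data).items():
        print(country, indicators)
```

Because every team runs identical code, the resulting indicators are comparable across countries even though the underlying micro-data never leaves its country of origin.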
firms was then matched using the available common identifiers, and is used as the
KombiFiD dataset.
Technically, the linking of the information from the different datasets was realised via
common identifiers jointly available across the different sources and via record linkage
techniques. The basic dataset for linking data from the Statistical Offices and the
Federal Employment Office is the Business Register, which has been constructed since
the 1990s in Germany (and in other European countries due to EU legislation68). The
Business Register contains several firm identifiers: a unique Business Register ID, the
establishment numbers of all corresponding establishments and tax numbers (see
Gruhl et al, 2012, pp. 10-15, for a detailed assessment of this matching process).
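The identifier-based part of the linkage can be sketched as a join through the Business Register, which stores several identifiers for each firm. In the sketch below all identifiers and values are hypothetical: one source is keyed by tax number and another by establishment number, establishment-level employment is summed to the firm level, and only firms that gave consent are linked. The actual KombiFiD procedures are documented in Gruhl et al (2012).

```python
from typing import Dict, Set

# Hypothetical Business Register extract: one entry per firm, holding the
# identifiers under which the firm appears in other sources.
BUSINESS_REGISTER: Dict[str, dict] = {
    "BR001": {"tax_no": "T-11", "establishments": ["E-1", "E-2"]},
    "BR002": {"tax_no": "T-22", "establishments": ["E-3"]},
}
# Hypothetical source keyed by tax number.
TAX_DATA: Dict[str, dict] = {"T-11": {"turnover": 5.0}, "T-22": {"turnover": 2.0}}
# Hypothetical source keyed by establishment number.
EMPLOYMENT_DATA: Dict[str, dict] = {
    "E-1": {"employees": 40}, "E-2": {"employees": 10}, "E-3": {"employees": 7},
}
CONSENTED: Set[str] = {"BR001"}  # only firms that agreed to matching


def link_via_register(register: Dict[str, dict], tax_data: Dict[str, dict],
                      employment_data: Dict[str, dict],
                      consented: Set[str]) -> Dict[str, dict]:
    """Link two differently keyed sources through the register's identifiers."""
    linked: Dict[str, dict] = {}
    for br_id, ids in register.items():
        if br_id not in consented:
            continue  # firms without consent are not matched
        record = {"br_id": br_id}
        record.update(tax_data.get(ids["tax_no"], {}))
        # Sum establishment-level employment to the firm level.
        record["employees"] = sum(
            employment_data[e]["employees"] for e in ids["establishments"] if e in employment_data
        )
        linked[br_id] = record
    return linked


if __name__ == "__main__":
    print(link_via_register(BUSINESS_REGISTER, TAX_DATA, EMPLOYMENT_DATA, CONSENTED))
```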
Matching data from the Deutsche Bundesbank was less straightforward. As no common
identifiers are available between the datasets described above and the data to be used
from the Bundesbank, record linkage techniques based on the firms' names and
addresses were used (see Koch and Neugebauer, 2014, for a more thorough
description).
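Where no common identifier exists, record linkage compares names and addresses instead. The sketch below is a deliberately simple fuzzy-matching example using Python's standard `difflib.SequenceMatcher`, not the procedure described by Koch and Neugebauer (2014); the list of legal-form tokens, the 60/40 weighting of name and address and the 0.85 acceptance threshold are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Illustrative legal-form tokens to ignore when comparing German firm names.
LEGAL_FORMS = {"gmbh", "mbh", "ag", "kg", "co", "se"}


def normalise(text: str) -> str:
    """Lower-case, drop punctuation and legal-form tokens."""
    text = text.lower().replace("ß", "ss")
    tokens = re.findall(r"[a-z0-9äöü]+", text)
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] of two normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(record: dict, candidates: List[dict], threshold: float = 0.85,
               name_weight: float = 0.6) -> Tuple[Optional[dict], float]:
    """Return the best-scoring candidate, or None if it falls below the threshold."""
    best, best_score = None, 0.0
    for cand in candidates:
        score = (name_weight * similarity(record["name"], cand["name"])
                 + (1 - name_weight) * similarity(record["address"], cand["address"]))
        if score > best_score:
            best, best_score = cand, score
    return (best, best_score) if best_score >= threshold else (None, best_score)


if __name__ == "__main__":
    candidates = [  # hypothetical records from a second source
        {"id": "B1", "name": "SCHMIDT WERKZEUGE AG", "address": "Hauptstr 5 70173 Stuttgart"},
        {"id": "B2", "name": "Schneider Textil GmbH", "address": "Bahnhofstr. 1, 10115 Berlin"},
    ]
    query = {"name": "Schmidt Werkzeuge GmbH", "address": "Hauptstr. 5, 70173 Stuttgart"}
    match, score = best_match(query, candidates)
    print(match["id"] if match else None, round(score, 2))
```

In practice, candidate pairs near the threshold would typically be reviewed manually rather than accepted or rejected automatically.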
The resulting KombiFiD dataset contains all the information from its constituent
datasets for the firms which agreed to the matching of their data. A detailed description
and lists of variables are available in Gruhl et al (2012, pp. 21-85). The data is
accessible to external researchers in a weakly anonymised version69.
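Footnote 69 describes weak anonymisation as making some variables, such as regional and sectoral identifiers, available only in aggregate form. A minimal sketch of that kind of coarsening follows; the chosen levels (NACE classes cut to two-digit divisions, NUTS-3 region codes cut to NUTS-1) and the list of dropped direct identifiers are assumptions for illustration, not the documented KombiFiD rules.

```python
from typing import Dict

# Fields treated as direct identifiers in this sketch (hypothetical choice).
DIRECT_IDENTIFIERS = ("firm_name", "address")


def weakly_anonymise(record: Dict[str, object]) -> Dict[str, object]:
    """Return a copy with direct identifiers removed and
    sectoral/regional codes reported only at an aggregate level."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["nace"] = str(record["nace"]).split(".")[0]  # NACE class "25.62" -> division "25"
    out["nuts"] = str(record["nuts"])[:3]            # NUTS-3 "DE111" -> NUTS-1 "DE1"
    return out


if __name__ == "__main__":
    print(weakly_anonymise({"firm_name": "X GmbH", "nace": "25.62", "nuts": "DE111", "employees": 40}))
```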
In general, a broad range of issues can be examined using the KombiFiD data. With
regard to the analysis of competitiveness, the dataset contains a comprehensive set of
variables from the different sources, allowing evidence to be generated on, inter alia,
growth, productivity, trade or employment. Up to now, however, the dataset has been
only sparsely used in economic and statistical analyses. Exceptions are the papers by
Wagner (2012 and 2012a) and Vogel and Wagner (2012), of which only Wagner (2012a)
goes beyond methodological aspects. This relatively scant use of the potentially very
rich KombiFiD data can partly be attributed to the fact that the data has been made
available to the public only quite recently.
It may, however, also be attributed to the fact that the data has some major drawbacks:
first and foremost, it has to be pointed out that the use of the KombiFiD data was
68. Council Regulation No. 2186/93.
69. This type of anonymisation means that some variables, eg regional and sectoral identiers, are only available in
an aggregate form.
100
16/2/15
10:04
Page 101
restricted until 31 December 2014 which made the serious utilisation of the data very
dicult. To our knowledge, the data has to be erased completely from the servers of
the data providers after that date, thus making research projects or even working
papers nearly impossible as results cannot be veried after that date. Another serious
drawback of the data itself is that no information is available about the rms from the
original sample that refused consent for their data to be matched for the project. This
results in no information on a potential selection bias, making thorough analyses hard
to realise.
Wagner (2012) and Wagner and Vogel (2012) tested the quality of the
KombiFiD sample for the manufacturing and the service industries on the basis of data
from the Statistical Offices. They conclude that the quality of the
KombiFiD sample can be regarded as high only for the former West Germany, whereas
for the former East Germany an assessment of quality is not possible because of the
small sample size.
Ultimately, the KombiFiD project was a huge and ambitious effort with very meaningful
objectives, ie creating a new dataset building on existing information and thus sparing
firms from participating in further surveys. It was also expected to help evaluate the
potential of similar projects in the future.
The expectations have only partially been met, and the main drawbacks can be traced
back to existing legal regulations preventing deeper cooperation, or even the exchange
of data, between data providers. A relatively large sample was used for the survey and,
even taking into account the need to obtain consent from the selected firms, there was
a relatively high response rate and a high acceptance rate of more than 30 percent.
Nevertheless, strict regulations prevent reasonable use of the data: first, the window
of opportunity for using the data is limited and, second, the nature of the potential
selection bias is unknown.
In summary, the KombiFiD project generated much new knowledge on the technical
aspects of data matching, experience with regard to firm behaviour and practical
knowledge about cooperation between different data-providing institutions. Hopefully,
future projects will be set up in order to proceed in this promising direction.
3.3.2.4 Data without Boundaries
A very promising, large-scale programme, which is connected to the MAPCOMPETE
project in many ways, is Data without Boundaries (DwB). DwB is another European
FP7 project, which aims to enhance transnational access to official micro-data for
researchers70. The project will be finished in 2015. The motivation behind the project
is that official statistics (OS) micro-data repositories are currently underutilised in
research, eg in the social sciences, both nationally in many countries
and internationally71. Programme participants cooperate with NSIs and European data
archives to create an integrated model of transnational micro-data access. As part of
the project, a comprehensive, structured meta-database providing information on
official micro-data available for research purposes in Europe and on the procedures
for requesting access to these data is being built72.
3.3.2.5 The Global Value Chain project and the Eurostat International Sourcing Survey
The Global Value Chain73 project was coordinated by Statistics Denmark and carried
out from 2011 to 2013 within the ESSnet by Statistics Finland, Statistics Norway, CBS
Netherlands, Instituto Nacional de Estatística (Portugal), the National Institute of
Statistics (Romania) and the National Institute of Statistics and Economic Studies
(France). The aim of the project was to strengthen ESS capacity (conceptually and
methodologically) to measure economic globalisation and the globalisation of business,
and to establish concrete statistical evidence on the increasingly globalised ways in
which companies do business and organise themselves. The objectives were to help
policymakers make better-informed decisions and to monitor the globalisation of
economies by developing and providing indicators on economic globalisation.
The GVC project is intertwined with Eurostat's International Sourcing Surveys (ISS)74,
which were carried out in 2007 and in 2012. The latest survey gathered data on the
international organisation and sourcing of business functions in 15 European
countries, while in 2007 the coverage was 11 EU countries plus Norway. The surveys
cover nearly 40,000 businesses with more than 100 employees.
70. http://www.dwbproject.org/
71. Data without Boundaries, DELIVERABLE D7.1, Metadata Standards usage and needs in NSIs and Data Archives,
2013, http://www.dwbproject.org/export/sites/default/about/public_deliveraples/dwb_d7-1_metadata-standardsusage_report.pdf
72. Data without Boundaries, DELIVERABLE D5.2, Report and Databank Documenting OS Micro-data, 2013,
http://www.dwbproject.org/export/sites/default/about/public_deliveraples/dwb_d5-2_databank-nationalsurvey_report_final2.pdf
73. http://www.cros-portal.eu/content/global-value-chains-0
74. See http://epp.eurostat.ec.europa.eu/statistics_explained/index.php/International_sourcing_of_business_functions
(for the 2013 survey) and http://epp.eurostat.ec.europa.eu/statistics_explained/index.php/
International_sourcing_statistics (for the 2007 survey)
This Blueprint so far has investigated the extent to which a wide range of
competitiveness indicators, especially those that are built from micro-data and that
we have defined as bottom-up indicators, can be computed for EU countries and what
data is actually accessible for researchers. In chapter 2, we highlighted issues at the
level of individual countries, while in chapter 3, we focused on the challenges of using
micro-data to construct indicators of competitiveness across countries. In this chapter,
we pick up on the main conclusions emerging from chapters 2 and 3 (in sections 4.1
and 4.2, respectively). Building on these considerations, in the next chapter we offer
some policy recommendations.
4.1 Issues regarding the availability of data at country level
The availability of an indicator of competitiveness depends on different factors. In the
MAPCOMPETE data mapping exercise (see chapter 2), we distinguish between factors
that determine the computability of an indicator and factors that influence its
accessibility. By computability we mean the quality of the data and the length of time
coverage. Computability of an indicator relies mainly on the existence of data and the
possibility of merging data from different sources, if necessary. The accessibility of data
depends on the rules of access and their clarity. As part of the MAPCOMPETE data
mapping exercise, statistical institutes of EU member states were approached to
collect information on micro-data availability. Project participants surveyed several
bottom-up competitiveness indicators (firms' productivity, dynamics, international
activities, R&D activities and some other features) with respect to computability and
accessibility.
The mapping of micro-level information also highlights the fact that different types of
data are treated differently. In some EU member states, different regulations apply to
different databases. Databases covering the full population compiled by the National
Statistics Institute of Italy are not accessible to researchers, who can only access
descriptive statistics upon request, but micro-data stemming from surveys is available.
In the Czech Republic, business register data can be accessed relatively easily, while
for other types of data, such as customs data and FATS data, conditions are more
stringent. Malta allows access to firm-level information for research purposes, except
for data on foreign ownership and capital. In Latvia, data is available upon request,
except for data on trade by destination and product, which is confidential.
Our results show that in general there are stricter regulations on registry-type data and
on databases that have full coverage of the observed population. Survey-type data,
especially data from harmonised surveys like the CIS, is usually easier to access. Our
findings on individual-level trade data are mixed, since these databases include
information both from administrative sources (ExtraStat) and from a harmonised
survey (IntraStat).
A distinction in confidentiality restrictions is particularly important when we consider
the potential use of bottom-up indicators that are based on information obtained from
different sources in different countries. For instance, firm entry and exit information
and balance sheet data are obtained from administrative sources in some countries,
while others conduct surveys to collect the information. Consequently, the
computability and accessibility of bottom-up indicators based on these data is likely to
differ across countries, and a harmonised approach to confidentiality protection is
hard to achieve.
It is worth mentioning that Eurostat provides access for scientific purposes to certain
European survey data80, including the Labour Force Survey and the Community
Innovation Survey. Recognised research entities may, conditional on the approval of
their research proposal, access micro-data anonymised by Eurostat on electronic
devices, or non-anonymised data in Eurostat's safe centre. Currently, Eurostat
negotiates the possible dissemination of the micro-data on a case-by-case basis
and proposes a single anonymisation methodology to all member states. Member
states might refuse Eurostat's proposal if it conflicts with national legislation, and thus
micro-data will not be available for all member states81.
80. Commission Regulation 831/2002 specifies the surveys and the rules of access.
81. Ichim D., Franconi L. Strategies to achieve SDC harmonisation at European level: multiple countries, multiple files,
multiple surveys, http://neon.vb.cbs.nl/casc/..%5Ccasc%5CESSnet%5Ccomparable%20dissemination%20v-1.pdf
information on the available datasets, including the identity of the owner of the data,
the exact content, the quality of the data and the rules of access. These pieces of
information are necessary to decide whether the dataset is suitable for their needs and
whether to apply for access.
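The kind of entry such a researcher needs can be written down as a small record type. The Python sketch below is a hypothetical schema, not the structure of the MAPCOMPETE MetaDatabase or of SDMX: the field names, the access-mode labels and the filter function are assumptions chosen to mirror the items just listed (owner, content, quality and rules of access).

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """One entry in a hypothetical catalogue of national micro-data."""
    name: str
    country: str
    owner: str                      # data-providing institution, eg an NSI or central bank
    variables: list                 # exact content: names of the available variables
    years: tuple                    # (first year, last year) covered
    quality_notes: str = ""         # sampling design, coverage, known breaks
    access_mode: str = "none"       # hypothetical labels: "remote", "on-site", "none"
    open_to_foreign: bool = False   # whether non-national researchers may apply


def candidates(catalogue, needed_vars, start, end, foreign=True):
    """Return datasets covering all needed variables over [start, end]
    that the (foreign or national) researcher is allowed to apply for."""
    return [
        d for d in catalogue
        if set(needed_vars) <= set(d.variables)
        and d.years[0] <= start and d.years[1] >= end
        and d.access_mode != "none"
        and (d.open_to_foreign or not foreign)
    ]
```

A regularly updated catalogue of such records would let a researcher answer "which datasets can I use, and can I get access?" with one query instead of searching many websites.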
International standards already exist for the international exchange of metadata.
Statistical Data and Metadata Exchange (SDMX), an initiative sponsored by the Bank
for International Settlements, ECB, Eurostat, International Monetary Fund, OECD, United
Nations and the World Bank, aims to provide standards for the exchange of statistical
information (eg formats for data and metadata, content guidelines, IT standards)83.
For Europe specifically, the European Commission has issued a recommendation on
reference metadata for the European Statistical System84, which refers to the European
Statistics Code of Practice85 and is based on the SDMX framework.
While ESMS metadata files are provided for all of the statistics published by Eurostat,
and other international organisations also provide structured metadata on their
statistics, our experience shows that there is still a large gap in the information on
data. ESMS metadata files present useful information on methodologies, quality and
the statistical production processes in general, but usually provide very little
information on the link between an aggregate indicator and the micro-data used to
compute it. Also, country-specific information on survey and sampling
design is often sketchy. We made use of the information provided in ESMS metadata
files when mapping the readily-available aggregate indicators, but we found that in
order to assess the strengths and weaknesses of these indicators, to improve
their quality or to propose new ones, much more information on the available national
micro-data would be needed.
Gathering comprehensive information on micro-data available in EU member states
proved to be a challenging and time-consuming task. The amount and structure of
information available on the websites of NSIs and other national data providers varies
greatly across countries. It is usually insufficient to fill in the MAPCOMPETE
MetaDatabase, and it is definitely insufficient to plan a research project. In many cases,
researchers obtain information on given datasets from scientific publications or
through informal channels, which are burdensome and usually result in incomplete
information. Also, when conducting cross-country comparative research or research
that requires the use of information from more than one source, researchers have to
search through several websites and publications, each with a different metadata
structure and information content.
83. SDMX (2009), Content-Oriented Guidelines, Statistical Data and Metadata eXchange. Vale, S. (2009), Generic
Statistical Business Process Model, Version 4.0 April 2009, UNECE Secretariat.
84. See European Commission (2009), Commission recommendation of 23 June 2009 on reference metadata for
the European Statistical System, Official Journal of the European Union L 168/50, 50-55.
85. Eurostat (2011), European Statistics Code of Practice for the National and Community Statistical Authorities,
Eurostat, European Statistical System, Luxembourg.
Since in MAPCOMPETE we collected a huge amount of information in a systematic
manner, we initially tried to contact staff within the NSIs in all the EU28 countries
directly to gather the relevant information. After a few months of the project, it became
apparent that this was highly complicated, so we decided to gather information by
exploiting existing contacts built up in another international project (CompNet) and
other personal contacts. In some cases, these contact persons were able to help us fill
in the MAPCOMPETE MetaDatabase and in other cases they referred us to people within
the NSI. The fact that in most countries economic databases are collected and handled
by more than one institution (in most cases both the NSI and the national central bank,
and sometimes other institutions, collect data) made it even harder to obtain the
required information. Also, smaller countries and newer EU members tend to have less
experience in handling requests for micro-data access, and consequently are usually
less prepared to provide systematic information on existing data.
The experience we gained during the data-gathering process shows that the availability
of information on the data is at least as important as the availability of the data itself.
Performing EU-wide research projects on competitiveness or designing new indicators
is not feasible without easily available, comprehensive information on national
micro-data. This is why the MAPCOMPETE MetaDatabase is especially useful for future
research on measures of competitiveness. Furthermore, it serves as a basis for
suggestions for possible improvements to data sources, treatment of data, conditions
of access etc. It might promote quality research by providing detailed information on
the accessibility and availability of data related to the measurement of
competitiveness. However, the MAPCOMPETE MetaDatabase is only a snapshot of
competitiveness-related data. A regularly updated, structured, easily available and
comprehensive meta-database on national micro-data, which might include the
experience of other researchers working with the data, could substantially increase
the efficiency of international research projects.
Issues related to the nationality of the data user
As part of establishing the European research space, conducting research and analysis
on the basis of foreign data is becoming important. Several specific problems arise in
terms of foreign access to datasets located in countries other than the researcher's
own. First, in some countries, such as Belgium, Denmark, Hungary and the
United Kingdom, access to micro-data is allowed only to researchers who are citizens
of the country of the data provider or affiliated with a national institution. Second,
language barriers are obviously a serious burden, since in many countries information
is provided only in the national language; this burden can be removed simply by
offering data descriptions and variable lists in English. Several NSIs have made a great
deal of progress in this respect, including metadata provision in English. Third, the
provision of data on site might not be a burden for locals, but can be very costly for
foreign researchers. Hence, setting up secure remote access, such as that available in
Finland, France, Germany and Sweden, would be an important step. Finally, making
access by foreigners easier by appointing an English-speaking specialist could indeed
facilitate European research integration.
Unclear rules of access
When mapping the accessibility of data, we faced the obstacle that it is often
challenging to obtain precise information on the conditions of access to confidential
data. Information on the accreditation process, the statistical disclosure control
methods applied and the practical details of access is usually not clearly specified on
the website of the data provider or in any other publicly available source. We found
that one had to contact the data provider directly in order to clear up the details and to
find out whether access to the data is possible and under what conditions.
Our results show that there are substantial differences between countries in terms of
the clarity of rules of access. In many countries there is an established, formal
procedure for applying for access (eg Denmark, Finland, France, Netherlands, Slovenia
and Sweden), while other countries are less advanced in this respect and handle
requests on a case-by-case basis. However, regardless of the sophistication of the
application procedure, in most cases applicants must present a research project, which
needs to be approved. This approval creates room for discretionary decision-making
and informality, which might differ from country to country but is difficult to assess.
The approval procedure might be more problematic when the data provider does not
perform output checking itself, but makes it the researcher's responsibility to protect
the confidentiality of the data. If data protection is delegated to the researchers, then
cooperation relies strongly on trust between the data provider and the researcher, and
it might be hard to define exact criteria.
Truncated data
In many cases, micro-data is provided in truncated form; that is, it is made available
with less information than the original source, in order to reduce the risk of disclosure
(sensitivity) and for cost reasons. For the purposes of our discussion, this aspect is
related to accessibility, but it can affect computability when it prevents the merging of
different datasets.
Sensitivity truncation
Several statistical disclosure control methods used to protect the confidentiality of data
lead to a loss of information and might affect the quality of analysis carried out on the
data. Let us first present key obstacles and make suggestions for their treatment (for
details and a broad discussion, see Hundepool et al, 2010). According to statistical best
practice, protecting confidentiality involves, first, defining the possible situations at risk
(disclosure scenarios) and, second, defining the risk properly in order to quantify the
phenomenon (risk assessment) (Hundepool et al, 2010, p. 30).
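A simple way to make the risk-assessment step concrete is to count how many firms share each combination of variables that an intruder might already know (the disclosure scenario). The Python sketch below flags firms whose combination of key variables occurs fewer than k times. The choice of key variables (here sector, region and size class) and of k = 3 are assumptions for illustration; Hundepool et al (2010) discuss more refined risk measures.

```python
from collections import Counter


def risky_records(records, keys, k=3):
    """Flag records whose combination of key variables occurs fewer than k times.

    Assumed disclosure scenario: an intruder knows the values of `keys` for a
    target firm, so rare combinations make that firm easy to single out.
    """
    def combo(record):
        return tuple(record[key] for key in keys)

    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] < k]
```

For example, if three small food manufacturers share a region but there is only one large car-parts maker in another region, only the latter is flagged as being at risk.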
In this chapter, we identify four issues that matter for practitioners.
The first issue relates to the sensitivity of aggregated data. In some sectors,
size categories or regions, there are only very few firms. Aggregating data on them
would imply that in some categories only one or very few firms would feature and
hence their individual data would not be protected. To avoid this scenario, most
statistics institutions, central banks and research outlets protect confidentiality by
setting up compulsory aggregation rules. Typical rules include a minimum number of
firms per aggregated band (this ranges between 4 and 9, in our experience) and possibly
other controls, such as on the market share of the top five firms in the aggregate.
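The two kinds of rule can be combined into a single check on each cell of an aggregated table. In the Python sketch below, a cell is released only if it contains at least min_firms firms and its top_n largest contributors do not account for more than max_share of the cell total (a dominance rule). The default values are illustrative assumptions, not the parameters of any particular institution; the minimum counts of 4 to 9 and the top-five concentration control mentioned above correspond to setting min_firms and top_n = 5.

```python
def is_publishable(values, min_firms=5, top_n=2, max_share=0.85):
    """Return True if an aggregated cell may be released.

    values: non-negative contributions (eg turnover) of the firms in the cell.
    The parameter defaults are illustrative assumptions.
    """
    # Minimum-frequency rule: too few firms means individual data are exposed.
    if len(values) < min_firms:
        return False
    total = sum(values)
    # Conservatively suppress cells whose total is zero.
    if total <= 0:
        return False
    # Dominance rule: the top_n largest firms must not dominate the cell total.
    top = sum(sorted(values, reverse=True)[:top_n])
    return top / total <= max_share
```

A cell with five firms, one of which accounts for almost all turnover, passes the frequency rule but fails the dominance rule, which is exactly the case the additional market-share controls are meant to catch.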
The second issue is a more general way of making identification impossible. This
entails aggregating some existing firm categories, such as industry or location address,
to protect the identity of firms. This process is especially useful in smaller countries,
where some regions or industries might include only a few firms, even if they are not
large. Examples include merging four-digit industry codes into two-digit codes, merging
and the costs of these will depend on the size and complexity of the data at hand.
Saving resources and reducing administrative burdens are important in an era when
NSI budgets are often being cut. As a result, aggregation and truncation of raw data are
often carried out not for sensitivity reasons but for cost reasons.
One such practice is aggregating part of the dataset. Transaction-level data
might be aggregated into annual totals. For instance, foreign trade is often
registered at a very fine transaction level, but available data is mostly at annual
aggregate level. Some variables might also be deleted in order to avoid spending the
time that would be required to consider their sensitivity.
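Aggregating transaction-level records to annual totals is straightforward, and the sketch below also shows what is lost in the process. It assumes, hypothetically, that each trade transaction carries a firm identifier, a date, a partner country, a product code and a value; the output keeps one total per firm, year and partner country, so product detail and timing within the year disappear.

```python
from collections import defaultdict
from datetime import date


def annualise(transactions):
    """Collapse transaction records into firm x year x partner-country totals.

    Each transaction is a dict with keys firm, date, partner, product and value
    (hypothetical field names). Product codes and exact dates are dropped.
    """
    totals = defaultdict(float)
    for t in transactions:
        totals[(t["firm"], t["date"].year, t["partner"])] += t["value"]
    return dict(totals)
```

Two shipments of different products to the same partner in March and November of the same year, for example, end up as a single annual figure, so questions about product mix or seasonality can no longer be answered from the released data.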
Finally, another approach is to exclude small firms. Dropping firms with fewer than
five employees could reduce the size of a dataset by 80-90 percent, while retaining 95
percent of value added. However, such an exercise will limit the analysis and
understanding of important issues, such as entrepreneurship and firm dynamics.
An important aspect of dataset reduction for cost-saving reasons is European and
international harmonisation: statistics computed on the whole dataset and on firms
with more than 10 employees might yield rather different results (for an application to
exporters, see Békés et al, 2011).
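The consequences of a size threshold can be checked directly whenever the full population is available. The Python sketch below uses made-up toy data rather than any real dataset: it reports the share of firms and of value added that survive the threshold and compares the share of exporting firms before and after truncation. The field names are assumptions.

```python
def truncation_effect(firms, min_employees):
    """Compare a firm population with the subset at or above a size threshold.

    Each firm is a dict with keys employees, va (value added) and exports.
    """
    kept = [f for f in firms if f["employees"] >= min_employees]
    if not kept:
        raise ValueError("no firms survive the threshold")

    def exporter_share(group):
        # Fraction of firms in the group with positive exports.
        return sum(f["exports"] > 0 for f in group) / len(group)

    return {
        "firms_kept": len(kept) / len(firms),
        "value_added_kept": sum(f["va"] for f in kept) / sum(f["va"] for f in firms),
        "exporter_share_full": exporter_share(firms),
        "exporter_share_truncated": exporter_share(kept),
    }


# Toy data: three micro firms and one larger exporter.
toy = [
    {"employees": 2, "va": 1.0, "exports": 0},
    {"employees": 3, "va": 1.0, "exports": 0},
    {"employees": 1, "va": 2.0, "exports": 0},
    {"employees": 50, "va": 96.0, "exports": 10.0},
]
print(truncation_effect(toy, min_employees=5))
```

With this toy population, a threshold of five employees keeps a quarter of the firms but 96 percent of value added, while the measured share of exporters jumps from 25 percent to 100 percent. If countries apply different thresholds, such differences are easily mistaken for genuine cross-country differences.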
4.2 Accessibility and matching of data from different countries
As we argued in chapter 3, data matching opens up rich and novel research
opportunities, especially when micro-level datasets are concerned. Existing micro-level
data in European countries has significant potential in terms of record linkage
and matching, including also commercial data and Big Data. Data matching and issues
of matchability have gained considerably in importance in recent years. One reason for
this lies in the increased accessibility of micro-level datasets and in the desire of
researchers to merge these datasets within and between countries in order to increase
the research potential of the data. There has also been significant progress on technical
issues, not least driven by the rapid development of computer technology and data
storage.
The issue of data matching and matchability is of course not confined to the social
sciences, but the recent economic crisis has made clear that economists require
high-quality data, especially at the micro level, that is comparable across countries, in
order to examine cross-country differences in competitiveness. However, comparable
micro-data at the firm level in different EU countries is so far only available for some topics,
most of which are not directly relevant for competitiveness (notable exceptions are
the Community Innovation Survey, the International Sourcing Survey and the EFIGE
survey). These comparable micro-level datasets are, however, all based on sample
surveys.
The huge potential of administrative data, which is already leveraged in many
countries, is still waiting to be fully realised (see Agatei and Vaju, 2013, for instance).
There are, however, some serious endeavours in this direction, mainly based on the
ESSnet projects and on the Framework Regulation for Integrating Business Statistics
(FRIBS, see section 3.2). These projects are of special importance because they are
concerned with administrative data within the EU, which is of high quality. Any step
towards making these data more comparable and accessible would be very welcome to
researchers and policymakers. Therefore, ensuring the availability of such data should
be a priority for the European Commission, because it would substantially improve the
analysis of cross-country differences in competitiveness, of labour market issues
and of related fields.
The most serious obstacles to matching micro-level data from different countries are
still legal restrictions preventing data from being matched, because privacy and
confidentiality are at stake. However, there is some activity in this area, namely
projects to evaluate the potential of analysing micro-level data without directly
accessing the data.
There are also obstacles to data matching within countries (see the KombiFiD example
from Germany). This holds especially true if the datasets to be matched are held by
different data providers, eg statistical offices, central banks, employment agencies or
private data providers. However, progress has been made in this regard in recent years.
Important steps to overcome the problem of data comparability between countries,
particularly with regard to cross-country analyses of competitiveness, have been
taken, for instance by the EFIGE project, which provides comparable firm-level data for
15,000 firms from seven EU countries. The ECB's CompNet project is following suit.
However, these two projects can only be regarded as first tentative steps towards data
that can be used for cross-country analyses in the field of competitiveness and that
is highly useful for policymakers.
Overall, much has been achieved in the field of data matching within Europe in recent
years, but the universe of cross-country and matched datasets is still sparsely
populated and quite heterogeneous, with potential for improvement. Because of the
5 Policy recommendations: towards better access, computability and matchability of micro-level data
This Blueprint has shown that the information currently available to researchers on
comparable measures of competitiveness for different countries is insufficient.
Aggregate data, which is easily accessible and widely available, does not allow
researchers to provide the answers that policymakers need. Micro-data on individual
countries is mostly inaccessible to external researchers, and the situation is even
worse when one tries to compare figures based on micro-data across countries.
Only a few firm-level surveys are available, mostly for only one or a few years; there
are few examples of matched data from different countries, and internationally
comparable figures can be gathered only from a few micro-distributed data exercises.
This is very different from, for example, the United States, where micro-level data from
different states has been matchable and comparable since at least the mid-2000s. This
implies that Europe lacks proper information to assess the state of competitiveness at
European level, compared to the situation in the United States.
The first-best solution to overcome these bottlenecks would be to change the national
and EU-level rules of data content, data availability, data matching and data access. The
efforts undertaken by the ESS, with programmes such as MEETS, FRIBS, FATS, SIMSTAT
and ESS.VIP (see section 3.2.2), towards greater harmonisation of data and the
construction of pan-European data sets are useful initial steps in this direction. In
particular, these initiatives can contribute to:
• The reduction of the burden on enterprises in collecting and providing internal data;
• The provision of a common ESS infrastructure framework for the production and
compilation of business statistics with an appropriate legal background and new
administrative mechanisms allowing for the sharing of information, services and
costs among all ESS partners;
• The definition of consistent data requirements and a common data quality
framework, which will enable the linking and matching of statistics obtained as part
of the regular collection of global business statistics.
However, the timeline to complete this process, and for its effects to be felt by
researchers, is far too long, and the result might even prove almost useless, since by
the time it is complete the next generation of researchers may well have a different
set of needs.
Therefore, such long-term actions to change regulations need to be complemented
with more short-term workarounds.
The first workaround is to exploit the availability of improved methods and techniques,
such as matching after separate processing (eg the Distributed Micro-Data Approach)
or imputation. Projects such as CompNet (see Table 3.1 and section 3.3.2.2) or ESSLait
(see Tables 3.1 and 3.2) provided important insights into new aspects of
competitiveness by producing micro-aggregated statistics going beyond the first
moment of the distribution of firms' competitiveness indicators. However, if not
properly supported by policy, these initiatives might remain one-shot exercises,
whereas they need to be refined, constantly updated and carried out in a timely way
in order to provide the most up-to-date figures for policy decisions. Two examples we
have already mentioned clearly highlight these risks: the ESSLait exercise provided
figures up to 2010 (see http://www.cros-portal.eu/content/metadata-work), while the
more recent CompNet figures refer to the year 2011. Since these initiatives require
researchers within data-providing institutions to run the code prepared by external
researchers, proper policy support is needed to ensure that requests to run
micro-distributed exercises are met in as many countries as possible.
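The logic of a micro-distributed exercise can be illustrated in a few lines: the same code runs inside each data provider, only aggregated statistics leave the institution, and these are combined centrally. The Python sketch below is a simplified illustration of this idea, not the CompNet or ESSLait code; the minimum cell size of 10 and the chosen statistics (mean, standard deviation and three deciles of firm productivity) are assumptions.

```python
import statistics

MIN_CELL = 10  # assumed minimum number of firms before any statistic is released


def local_summary(productivity):
    """Runs inside a data provider: returns only micro-aggregated statistics."""
    n = len(productivity)
    if n < MIN_CELL:
        return {"n": n, "suppressed": True}
    deciles = statistics.quantiles(productivity, n=10)
    return {
        "n": n,
        "suppressed": False,
        "mean": statistics.fmean(productivity),
        "sd": statistics.stdev(productivity),
        "p10": deciles[0],
        "p50": deciles[4],
        "p90": deciles[8],
    }


def pooled_mean(summaries):
    """Runs centrally: n-weighted mean across the non-suppressed countries."""
    usable = [s for s in summaries if not s["suppressed"]]
    total_n = sum(s["n"] for s in usable)
    if total_n == 0:
        raise ValueError("no releasable summaries")
    return sum(s["mean"] * s["n"] for s in usable) / total_n
```

Means can be pooled exactly from country-level counts, but percentiles cannot; a cross-country percentile would need finer-grained distributional output from each country, which is one reason why the design of the common code matters so much in these exercises.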
The second workaround would be to improve techniques for matching and accessing
micro-level data, either by improving architectures for data matching (eg by involving
matching institutions) or for access to data by researchers (eg by improving
techniques of data anonymisation). Many NSIs have already developed or adopted
elaborate methods and organisational arrangements in these areas. For example, in
Germany, there is a well-established system of research data centres at several official
data providers. Other countries like the Netherlands or France have established
techniques of remote-data access. From a theoretical perspective there are several
additional ideas which could be rather easily adopted or, if necessary, adapted to
national systems and legislation (see section 3.1.5, and Koch and Neugebauer, 2014,
for an overview).
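As an example of the anonymisation techniques referred to above, univariate microaggregation replaces each value by the average of a small group of similar firms, so that no released value belongs to fewer than k firms. The Python sketch below implements the simple fixed-group-size variant; the group size k = 3 is an assumption, and methods used in practice (see Hundepool et al, 2010) often treat several variables jointly.

```python
def microaggregate(values, k=3):
    """Univariate fixed-size microaggregation.

    Sort the values, cut them into consecutive groups of at least k records and
    replace each value by its group mean. The last group absorbs any remainder,
    so every group has between k and 2k-1 members. Output order matches input.
    """
    if len(values) < k:
        raise ValueError("need at least k records")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        members = order[start:end]
        mean = sum(values[i] for i in members) / len(members)
        for i in members:
            result[i] = mean
    return result
```

Because each group mean preserves its group total, the sum of the variable is unchanged, so aggregates computed from the anonymised file remain consistent with published totals, while individual values can no longer be traced to a single firm.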
It is worth mentioning that, after speaking to officials in NSIs, national central banks
and other official data providers in many EU countries, we are fairly convinced that
in most countries access to micro-data would be feasible for external researchers, but
that it is easier for the data providers to restrict access. While the official reason is
often linked to legal issues about confidentiality, it seems that other factors might play
a role. We have described several approaches that allow researchers access to data
while maintaining confidentiality (such as various forms of anonymisation, or the
creation of matching institutions), but these solutions have costs, and require the data
provider to take some responsibility for the release of the data. Restricting access is
cost- and responsibility-efficient for the data providers, although very inefficient from
the researchers' perspective. To some extent, it is also a way to protect the monopoly of
the data provider in terms of use of the data. But if these are the real issues behind the
restrictions on data access, there are readily-available solutions.
Data access does not need to be free for all researchers. Instead, researchers can
contribute to covering the costs of setting up the infrastructure for data access using
their research funds. Since these costs are mainly fixed, related to setting up the
facilities for safe access (including remote connections) and to the anonymisation of
the data, while the marginal cost of an additional user is relatively low, data providers
could use a form of average incremental cost pricing to establish access. This pricing
structure is not new to economists: it is similar to what happens in network industries.
On top of this, since data providers are multi-product monopolies, they would gain
from allowing access to the greatest number of data sources, in order to
increase the number of users86.
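To make the pricing argument concrete, the sketch below computes a per-user access fee as the fixed set-up cost spread evenly over all users plus a constant marginal cost. The helper name and the cost figures are hypothetical and serve only as an illustration. Real cost structures will differ from one data provider to another.

```python
def access_fee_per_user(fixed_cost: float, marginal_cost: float, n_users: int) -> float:
    """Average incremental cost per user (hypothetical helper).

    The fixed set-up cost (secure facilities, remote access, anonymisation)
    is shared equally by all users; each user also pays the low marginal
    cost of being served.
    """
    if n_users <= 0:
        raise ValueError("n_users must be positive")
    return fixed_cost / n_users + marginal_cost


if __name__ == "__main__":
    # Hypothetical figures: 200,000 EUR set-up cost, 500 EUR per extra user.
    for n in (10, 50, 200):
        print(n, access_fee_per_user(200_000, 500, n))
```

The fee falls quickly as the user base grows. This is why a provider whose costs are mostly fixed has an incentive to open access to as many users, and datasets, as possible.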
Furthermore, when contacting national statistics institutes and national central banks,
we found a generally high level of competence. However, in order to foster co-operation
and build a truly European infrastructure for accessing micro-data, it is very important
to invest also in developing capabilities such as language skills and economics
knowledge. In this respect, EU support is crucial, especially for smaller
member states, which might not be able to afford the fixed costs of setting up
new infrastructures and developing the necessary capabilities.
The third workaround is to support multiscope cross-country surveys, which allow
researchers to gather information on a wide range of firms' activities and performance
indicators, in order to assess their contribution to overall competitiveness. The
Community Innovation Surveys and the International Sourcing Surveys (see Table 3.1
and section 3.3.2.5) are interesting examples of this, although they both focus on
specific aspects of competitiveness. The EFIGE survey (section 3.3.2.1) is another
example, which takes into consideration more aspects of competitiveness. However,
for this solution to be effective, greater harmonisation and coordination are needed.
Concentrating resources on fewer surveys could be more effective in covering many
aspects of competitiveness and in basing results on a larger number of firms followed
consistently over time. In this way, the dynamics of firm competitiveness could also be
accurately assessed. Such multiscope cross-country surveys could then be linked to
administrative and registry data, and to trade and foreign affiliate data, exploiting
protocols for micro-data linking, as tested, for example, within the GVC project
(section 3.3.2.5).
In summary, developing national capabilities to better service micro-level data
is the most cost-effective and sustainable way to generate new indicators of
competitiveness. Once these permanent structures are in place, access by individual
researchers to micro-level data, or projects based on the distributed micro-data
approach, could become more feasible. At the same time, given that setting up these
capabilities for all EU28 countries will take time and, in some cases, legislation, we
also recommend the unification and extension of the corporate surveys piloted under
various projects funded by the European Commission's Seventh Framework and Horizon
2020 programmes. Carefully crafted annual surveys will allow new measures of
competitiveness to be constructed and a greater understanding of its dynamics to be
gained, even in the short term.
6 Annex
three main factors: investment and saving in physical capital, new technology and
human capital.
Problems: The comparability of output measures can be negatively affected by the
use of different valuations (inclusion of taxes, different deflation indexes). Labour
input can be biased by different methods used to estimate average hours or to
estimate employed persons87,88.
Multi-factor productivity:
Description (1): Multi-factor productivity (MFP) relates output to a combined set of
inputs. KLEMS MFP is a productivity measure that relates gross output to primary
(capital (K) and labour (L)) and intermediate inputs (energy (E), other intermediate
goods (M), services (S)):
$$\mathrm{MFP}_{KLEMS} = \frac{\text{Gross output}}{\text{Combined KLEMS inputs}}$$
Description (2): the OECD MFP growth indicator is computed as the difference
between the rate of change of output and the rate of change of total inputs:
$$\Delta\ln \mathrm{MFP}_{it} = \Delta\ln Q_{it} - \alpha_{it}\,\Delta\ln L_{it} - (1-\alpha_{it})\,\Delta\ln K_{it}$$
where $\alpha_{it}$ is the share of labour in total costs in industry i, $(1-\alpha_{it})$ is the share
of capital in total costs, $Q_{it}$ is value added at constant prices, and $L_{it}$ and $K_{it}$ are
the labour and capital inputs respectively.
Rationale: In theory, it is a more comprehensive measure than labour productivity.
MFP shows the time profile of how productively combined inputs are used to
generate gross output. Conceptually, the KLEMS productivity measure captures
disembodied technical change. In practice, it also reflects efficiency change,
economies of scale, variations in capacity utilisation and measurement errors89.
The OECD Multi-factor Productivity index is a harmonised index that allows for country
and sectoral comparisons.
87. International comparisons of manufacturing productivity and unit labor costs trends. International Labor
Comparisons Program. Bureau of Labor Statistics. U.S. Department of Labor.
88. Fleck, S. E. International comparisons of hours worked: an assessment of the statistics. Monthly Labor Review,
May 2009.
89. OECD Manual Measuring Productivity: measurement of aggregate and industry-level productivity growth.
Problems: Significant data requirements, in particular timely availability of input-output tables that are consistent with national accounts.
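The OECD growth formula can be sketched as follows for a single industry and two periods. This minimal illustration takes the share of labour in total costs as a given number for the period. The function name and the input figures are hypothetical, and the OECD's own implementation may, for example, average shares across periods.

```python
import math


def mfp_growth(q0: float, q1: float, l0: float, l1: float,
               k0: float, k1: float, labour_share: float) -> float:
    """MFP growth = dln(Q) - alpha * dln(L) - (1 - alpha) * dln(K).

    q: value added at constant prices, l: labour input, k: capital input,
    labour_share: share of labour in total costs (alpha).
    """
    dq = math.log(q1 / q0)
    dl = math.log(l1 / l0)
    dk = math.log(k1 / k0)
    return dq - labour_share * dl - (1.0 - labour_share) * dk


if __name__ == "__main__":
    # Hypothetical industry: value added +3%, hours +1%, capital +2%, alpha = 0.6
    g = mfp_growth(100, 103, 50, 50.5, 200, 204, 0.6)
    print(f"MFP growth: {100 * g:.2f}%")
```

If output and both inputs grow at the same rate, the function returns (approximately) zero: all output growth is then explained by input growth.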
Total factor productivity growth:
Description: Total factor productivity (TFP) growth accounts for the changes in output
not caused by changes in labour and capital inputs. It is estimated as a residual, by
subtracting from the output growth rate the sum of input growth rates weighted by
two-period average compensation shares. Log differences of levels are used for growth
rates, and hence TFP growth rates are Törnqvist indexes (definition from The
Conference Board). In this definition, output is measured as gross value added. In the
EUKLEMS database, TFP growth is defined identically.
Rationale: TFP growth represents the effect of technological change, efficiency
improvements, and our inability to measure the contribution of all other inputs. It is
the closest approximation of productivity growth, which is the ultimate source of
growth.
Problems: As it is computed as the residual part of output growth that is not
accounted for by input growth, TFP growth measures the contribution of all other
possible factors.
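Below is a minimal sketch of the residual calculation described above, for one industry between two periods: output growth minus input growth rates weighted by two-period average compensation shares. The function name, the dictionary layout and the figures are assumptions made for illustration. They do not reproduce The Conference Board's or EUKLEMS's code.

```python
import math


def tornqvist_tfp_growth(output, inputs, shares):
    """TFP growth as a Tornqvist residual.

    output: (y_prev, y_curr), e.g. gross value added in volume terms
    inputs: {name: (x_prev, x_curr)}, e.g. labour and capital services
    shares: {name: (s_prev, s_curr)}, compensation shares of each input
    """
    growth = math.log(output[1] / output[0])
    for name, (x_prev, x_curr) in inputs.items():
        s_prev, s_curr = shares[name]
        avg_share = 0.5 * (s_prev + s_curr)  # two-period average share
        growth -= avg_share * math.log(x_curr / x_prev)
    return growth


if __name__ == "__main__":
    # Hypothetical data for one industry
    g = tornqvist_tfp_growth(
        output=(100.0, 104.0),
        inputs={"labour": (50.0, 51.0), "capital": (200.0, 206.0)},
        shares={"labour": (0.62, 0.60), "capital": (0.38, 0.40)},
    )
    print(f"TFP growth: {100 * g:.2f}%")
```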
Total factor productivity (using micro-data):
Description: TFP is calculated as the residual of a production function, where the
output variable is production value and the input variables are capital, labour and
materials costs. For firm-level productivity, the technique employed is borrowed
from Levinsohn and Petrin (2003), who use intermediate inputs to control for the
correlation between input levels and the unobserved firm-specific productivity
process.
Rationale: Accounts for all effects in total output not caused by traditional inputs
(labour, capital, materials etc.). Ready for cross-country and/or cross-sector
comparison. Overcomes the simultaneity bias that affects standard estimates of
firm-level productivity. Better measure of competitiveness than unit labour cost.
Changes in TFP capture technology catch-up and dynamism.
Problems: Computationally intensive to calculate, and suffers from potential
aggregation biases when calculated at the industry or country level.
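For intuition only, the sketch below recovers firm-level log TFP as the residual of a Cobb-Douglas production function in capital, labour and materials, estimated by ordinary least squares on simulated data. This is a deliberate simplification and not the Levinsohn and Petrin (2003) estimator. Plain OLS does not correct the simultaneity bias, which the LP method addresses by using intermediate inputs as a proxy for unobserved productivity. The function name and the simulated coefficients are hypothetical.

```python
import numpy as np


def ols_log_tfp(y, k, l, m):
    """Fit ln y = b0 + bk ln k + bl ln l + bm ln m by OLS.

    Returns the coefficient vector and firm-level log TFP, defined here as
    ln y minus the fitted contribution of the inputs (b0 + residual).
    """
    X = np.column_stack([np.ones(len(y)), np.log(k), np.log(l), np.log(m)])
    beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    log_tfp = np.log(y) - X[:, 1:] @ beta[1:]
    return beta, log_tfp


# Simulated firms: productivity is drawn independently of inputs, so OLS is
# unbiased here; with real data this independence typically fails.
rng = np.random.default_rng(0)
n = 5000
lnk, lnl, lnm = rng.normal(size=(3, n))
omega = rng.normal(scale=0.1, size=n)
lny = 0.5 + 0.3 * lnk + 0.4 * lnl + 0.25 * lnm + omega
beta, log_tfp = ols_log_tfp(np.exp(lny), np.exp(lnk), np.exp(lnl), np.exp(lnm))
print("estimated elasticities (k, l, m):", np.round(beta[1:], 3))
```

With real firm data, firms observe their own productivity when choosing inputs. This is exactly the correlation that a control-function estimator such as Levinsohn-Petrin is designed to handle.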
Olley and Pakes productivity decomposition90.
Description: Productivity, defined at the industry level and computed as a weighted
90. Olley, S. and Pakes, A. (1996) "The Dynamics of Productivity in the Telecommunications Industry." Econometrica,
64(6), pp. 1263-1298.
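average of firm-level productivity $\varphi_{it}$ with market shares $s_{it}$ as weights, can be
decomposed as:
$$\Phi_t = \sum_i s_{it}\varphi_{it} = \bar{\varphi}_t + \sum_i \Delta s_{it}\,\Delta\varphi_{it}$$
where
$$\bar{\varphi}_t = \frac{1}{N_t}\sum_i \varphi_{it}, \qquad \Delta s_{it} = s_{it}-\bar{s}_t, \qquad \Delta\varphi_{it} = \varphi_{it}-\bar{\varphi}_t$$
and $N_t$ is the number of firms in the industry.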
Rationale: the covariance term is a cross-country comparable measure of the extent
to which firms with higher-than-average productivity have a higher-than-average
share of activity, and thus indicates the degree of resource (mis)allocation. If
$\sum_i \Delta s_{it}\Delta\varphi_{it}$ is positive, firms with above-average productivity display
above-average market shares in a given year. It is a bottom-up approach to building
a cross-country comparable measure.
Problems: OP decomposition compares productivity allocation across firms in a
given year, and hence it does not give a comparison over time.
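The identity behind the OP decomposition can be checked numerically. The share-weighted aggregate equals the unweighted mean plus the covariance term, provided shares sum to one within the industry. The sketch below normalises shares to enforce this. The function name and the three-firm example are hypothetical.

```python
import numpy as np


def olley_pakes(shares, productivity):
    """Return (aggregate, unweighted mean, covariance term) for one industry-year."""
    s = np.asarray(shares, dtype=float)
    phi = np.asarray(productivity, dtype=float)
    s = s / s.sum()                      # normalise market shares to sum to one
    phi_bar = phi.mean()                 # unweighted average productivity
    cov_term = np.sum((s - s.mean()) * (phi - phi_bar))
    aggregate = np.sum(s * phi)          # share-weighted productivity
    return aggregate, phi_bar, cov_term


if __name__ == "__main__":
    agg, mean, cov = olley_pakes([0.5, 0.3, 0.2], [1.2, 1.0, 0.7])
    # A positive covariance term: larger firms are the more productive ones.
    print(f"aggregate={agg:.4f}, mean={mean:.4f}, covariance={cov:.4f}")
```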
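Foster, Haltiwanger and Krizan productivity decomposition91.
Description: the change in aggregate productivity between $t-k$ and $t$ is decomposed as:
$$\Delta\Phi_t = \sum_{i\in C} s_{i,t-k}\,\Delta\varphi_{it} + \sum_{i\in C}\Delta s_{it}\,(\varphi_{i,t-k}-\Phi_{t-k}) + \sum_{i\in C}\Delta s_{it}\,\Delta\varphi_{it} + \sum_{i\in E} s_{it}\,(\varphi_{it}-\Phi_{t-k}) - \sum_{i\in X} s_{i,t-k}\,(\varphi_{i,t-k}-\Phi_{t-k})$$
where C = plants that continue their business over time; E = plants that enter at a
given time; X = plants that exit; $\Delta$ denotes the change between $t-k$ and $t$; and
$\Phi_{t-k}$ is the weighted average productivity at the beginning of the period.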
Rationale: The first three terms of the decomposition are known as the within,
between and covariance components of firms' contribution to productivity, while
the last two terms account for net entry effects. This decomposition method has
two advantages: an integrated treatment of entering/exiting and continuing plants
(a measure of firm dynamics); and the separation of the within effect (based on
plant-level changes) and the between effect (which reflects changing shares) from
cross/covariance effects. Focusing on the covariance term $\sum_{i\in C}\Delta s_{it}\Delta\varphi_{it}$: if this
is positive, it means that
91. Foster, L., Haltiwanger, J. and Krizan, C. J. (2001) 'Aggregate Productivity Growth: Lessons from Microeconomic
Evidence', in New Developments in Productivity Analysis, pp. 303-372, National Bureau of Economic Research.
firms that are becoming more (less) productive over time are also able to attract
more (less) workers; if it is negative or non-significant, then the functioning of the
labour market (wage-setting mechanism) contributes negatively to productivity
growth.
Problems: While the OP decomposition compares productivity allocation across firms
in a given year, Foster-type decompositions compare productivity growth within firms
over time.
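Here is a minimal sketch of a Foster-type (Foster, Haltiwanger and Krizan) decomposition into within, between, cross (covariance), entry and exit terms for two periods, t-k and t. It assumes that each firm's share (for example of employment or output) and productivity are known in every period in which the firm is active. It also assumes that shares sum to one within each period. Under these assumptions the five components add up exactly to the change in aggregate productivity. Function and variable names and the example data are hypothetical.

```python
def fhk_decomposition(prev, curr):
    """Foster-Haltiwanger-Krizan decomposition between periods t-k and t.

    prev, curr: {firm_id: (share, productivity)} for t-k and t.
    Shares are assumed to sum to one within each period.
    Returns (change in aggregate productivity, dict of the five components).
    """
    agg_prev = sum(s * p for s, p in prev.values())   # Phi_{t-k}
    agg_curr = sum(s * p for s, p in curr.values())   # Phi_t
    continuing = prev.keys() & curr.keys()
    entrants = curr.keys() - prev.keys()
    exiters = prev.keys() - curr.keys()

    terms = dict.fromkeys(["within", "between", "cross", "entry", "exit"], 0.0)
    for i in continuing:
        s0, p0 = prev[i]
        s1, p1 = curr[i]
        terms["within"] += s0 * (p1 - p0)                  # productivity change at initial shares
        terms["between"] += (s1 - s0) * (p0 - agg_prev)    # reallocation of shares
        terms["cross"] += (s1 - s0) * (p1 - p0)            # covariance term
    for i in entrants:
        s1, p1 = curr[i]
        terms["entry"] += s1 * (p1 - agg_prev)
    for i in exiters:
        s0, p0 = prev[i]
        terms["exit"] -= s0 * (p0 - agg_prev)
    return agg_curr - agg_prev, terms


if __name__ == "__main__":
    prev = {"A": (0.5, 1.0), "B": (0.3, 0.8), "C": (0.2, 0.6)}   # C exits
    curr = {"A": (0.5, 1.1), "B": (0.3, 0.9), "D": (0.2, 1.0)}   # D enters
    total, terms = fhk_decomposition(prev, curr)
    print(round(total, 4), {k: round(v, 4) for k, v in terms.items()})
```

In this example the exiting firm is less productive than the initial average, so its exit makes a positive contribution to aggregate productivity growth.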
Description: Price and non-price determinants of the trade balance are identified at
sector/country level through the relative unit values of imports and exports, which
are computed from the values and quantities of imports and exports. The technique
is described in Dieppe et al (2012), which builds on Aiginger (1998), where X = value
of exports and M = value of imports.
Rationale: This decomposition analysis helps to disentangle the respective roles
of price and non-price factors in sectoral/country competitiveness, as identified
by the trade balance.
Revealed Comparative Advantages (RCA):
Description: The Revealed Comparative Advantage based on trade is obtained as
the ratio of the sector's share in the country's exports to the sector's share in EU
exports. Other country groups can be used as reference. Formally, for sector i and
country j, it is calculated as
$$RCA_{j,i} = \frac{X_{j,i}\,/\,\sum_i X_{j,i}}{X_{world,i}\,/\,\sum_i X_{world,i}}$$
where X is the value of exports and "world" denotes the reference group.
Rationale: Compares the share of a given sector's exports in the EU's total
manufacturing exports with the share of the same sector's exports in the total
manufacturing exports of a group of reference countries. Values higher (lower)
than 1 mean that a given industry performs better (worse) than the reference
group, and are interpreted as a sign of comparative advantage (disadvantage). The
RCA indicator is thus used to rank EU products by comparative advantage (from
International competitiveness of EU industry, DG ENTR95).
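The RCA ratio can be computed sector by sector as in the sketch below. It assumes that the reference group's export values cover every sector the country exports and that both dictionaries use the same sector classification. The function name, sector labels and export values are hypothetical.

```python
def revealed_comparative_advantage(country_exports: dict, reference_exports: dict) -> dict:
    """RCA per sector: the sector's share in the country's exports divided by
    the sector's share in the reference group's exports."""
    total_country = sum(country_exports.values())
    total_reference = sum(reference_exports.values())
    return {
        sector: (value / total_country) / (reference_exports[sector] / total_reference)
        for sector, value in country_exports.items()
    }


if __name__ == "__main__":
    # Hypothetical export values (same units) for three manufacturing sectors
    country = {"machinery": 60.0, "textiles": 10.0, "chemicals": 30.0}
    reference = {"machinery": 400.0, "textiles": 300.0, "chemicals": 300.0}
    for sector, value in revealed_comparative_advantage(country, reference).items():
        print(f"{sector}: {value:.2f}")
```

Here machinery would show a comparative advantage (above 1), textiles a disadvantage (below 1), and chemicals a neutral position (exactly 1).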
Current Account as % of GDP:
Description: The Current Account as Percentage of GDP is defined as the sum of net
income from abroad, net current transfers and the difference between national
exports and imports, divided by GDP.
Rationale: The current account balance determines the exposure of an economy to
the rest of the world, whereas the capital and financial account explains how it is
financed (Eurostat Balance of payment statistics). The indicator tracks imbalances
between national imports and exports and measures the realised competitiveness
of an economy.
Problems: The indicator is subject to endogeneity problems. It also includes
non-trade-related components.
95. European Commission, Enterprise and Industry, 'EU industrial structure 2011: Trends and Performance', chapter
IV, 'International competitiveness of EU industry'.
deflators, leading to different measures of the real exchange rate96,97,98. The two
suggested ones are the PPI-based REER and the ULCM-based REER.
The PPI-based REER index uses the producer price index as deflator:
Rationale: the PPI is closer to the production side of the economy (it includes
industrial products and intermediate goods that can be traded internationally) than
the CPI; the CPI-based index shows the dynamics of relative consumer prices, and
hence it can be a rather poor approximation of the dynamics in relative export prices.
Even though the PPI also covers production for the domestic market, PPIs are viewed
as a reasonable proxy for tradable goods prices.
Problems: data on export-oriented PPIs are usually very scarce, and their composition
and compilation vary considerably across countries. It is important to collect
comparable measures of PPI at the European level.
The ULCM-based REER index:
Rationale: Unit labour costs in the manufacturing sector (ULCM) are often used as
a proxy for unit labour costs in the tradable goods sector. The ULCM-based REER is
considered a better measure than the ULC-based index, which usually refers to the
total economy, including the services sector.
Problems: Unit labour costs do not cover all of the costs incurred by firms; factor
substitution may affect these indicators without necessarily resulting in a change
in productivity. Moreover, as for the ULC-based index, cost measures are typically
more affected by data quality issues than price measures. Finally, this popular
measure of competitiveness may be too narrow a concept, as it focuses only on one
sector of the economy.
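A REER index can be built as a geometric, trade-weighted average of bilateral real exchange rates, with the PPI or the ULCM as deflator; the sketch below does exactly that. It assumes that nominal rates are quoted as units of partner currency per unit of home currency, so a rise means a real appreciation, ie a loss of price or cost competitiveness. It also assumes fixed trade weights that sum to one. Official REER compilations use more elaborate weighting schemes. Function and argument names and the data are hypothetical.

```python
import math


def reer_index(nominal_rates, home_deflator, partner_deflators, weights, base=0):
    """Real effective exchange rate index (base period = 100).

    nominal_rates: {partner: [e_0, e_1, ...]}, partner currency per unit of home currency
    home_deflator: [P_0, P_1, ...], e.g. home PPI or ULCM
    partner_deflators: {partner: [P*_0, P*_1, ...]}, same deflator for each partner
    weights: {partner: w}, fixed trade weights summing to one
    """
    def real_rate(j, t):
        return nominal_rates[j][t] * home_deflator[t] / partner_deflators[j][t]

    index = []
    for t in range(len(home_deflator)):
        log_change = sum(w * math.log(real_rate(j, t) / real_rate(j, base))
                         for j, w in weights.items())
        index.append(100.0 * math.exp(log_change))
    return index


if __name__ == "__main__":
    # Hypothetical: constant nominal rates, home PPI rising 2% a year, partners flat
    reer = reer_index(
        nominal_rates={"A": [1.0, 1.0, 1.0], "B": [0.8, 0.8, 0.8]},
        home_deflator=[100.0, 102.0, 104.04],
        partner_deflators={"A": [100.0] * 3, "B": [100.0] * 3},
        weights={"A": 0.6, "B": 0.4},
    )
    print([round(x, 2) for x in reer])
```

Swapping the PPI series for ULCM series (for home and partners alike) turns the same function into a ULCM-based index.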
The percentage change over three years of the real effective exchange rate (REER)
based on consumer price index deflators:
Rationale: This measure captures the drivers of persistent changes in the price and
cost competitiveness of each member state relative to its major trading partners, and
thus illustrates the magnitude of developments in price and cost competitiveness.
The three-year span gives a more comprehensive picture of global price pressure
on domestic producers from a medium-term perspective.
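Given an annual series of REER index levels, the three-year percentage change is a one-line calculation, shown in the minimal sketch below. The function name and the index values are hypothetical. Official conventions on partner groups or data vintages are not reproduced here.

```python
def pct_change_over_years(series, years=3):
    """Percentage change of an annual index over `years` years, for every
    year with enough history: 100 * (x_t / x_{t-years} - 1)."""
    return [100.0 * (series[t] / series[t - years] - 1.0)
            for t in range(years, len(series))]


if __name__ == "__main__":
    # Hypothetical annual CPI-based REER index values
    reer = [100.0, 101.5, 103.0, 104.0, 102.0]
    print([round(x, 2) for x in pct_change_over_years(reer)])
```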
96. Turner, P. and Van 't dack, J. (1993) 'Measuring International Price and Cost Competitiveness', BIS Economic Papers
No. 39.
97. Benkovskis, K. and Wörz, J. (2012) 'Evaluation of Non-Price Competitiveness of Exports from CESEE
Countries in the EU Market', Bank of Latvia Working Paper 1/2012.
98. Schmitz, M., De Clercq, M., Fidora, M., Lauro, B. and Pinheiro, C. (2012) 'Revisiting the effective exchange rates of
the euro', ECB Occasional Paper Series No. 134, June 2012.
Other commonly used deflators are the Consumer Price Index, the GDP deflator,
export prices and Unit Labour Costs.
The Unit Labour Cost (ULC):
Description: ULC is calculated as the ratio of total labour costs to real output, or
equivalently, as the ratio of mean labour costs per hour to labour productivity
(output per hour).
Rationale: ULC represents a link between productivity and the cost of labour in
producing output. Unit Labour Costs are seen as one of the most relevant measures
of efficiency and aggregate competitiveness: for given labour costs, any increase in
value added translates into a higher level of firm competitiveness, while an increase
in the cost of employees reduces firms' competitiveness. ULC are easy to compute
and are typically used for country-level analysis.
Problems: This measure, however, has shortcomings at both the macro and the
micro level. At the macro level, ULC are not considered a comprehensive measure
of competitiveness (labour earnings represent just one component of total value
added). Moreover, the high heterogeneity across firms induces an aggregation
bias. The effect of the aggregation bias on the adequacy of standard aggregate cost
measures in capturing export capability can be shown with reference to the so-called
Spanish paradox99. At the micro level, the bias could derive from the fact that
high-quality firms might be associated with a higher total cost of employees and
thus, if this is not fully reflected in higher value added, with a higher (rather than
lower) ULC.
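The two definitions in the Description coincide because hours cancel: (labour costs / hours) / (output / hours) = labour costs / output. The sketch below checks this on hypothetical figures. The function names are illustrative only.

```python
def ulc(total_labour_cost: float, real_output: float) -> float:
    """Unit labour cost: total labour costs per unit of real output."""
    return total_labour_cost / real_output


def ulc_from_hourly(labour_cost_per_hour: float, output_per_hour: float) -> float:
    """Same quantity: mean hourly labour cost divided by labour productivity."""
    return labour_cost_per_hour / output_per_hour


if __name__ == "__main__":
    # Hypothetical firm: 3,000 in labour costs, 100 hours, 5,000 of real output
    cost, hours, output = 3_000.0, 100.0, 5_000.0
    print(ulc(cost, output), ulc_from_hourly(cost / hours, output / hours))
```

Raising real output for unchanged labour costs lowers the ULC, which is the sense in which higher value added improves cost competitiveness.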
6.1.4 Innovation & technology
The Innovation & technology category is fundamental to assessing non-price
competitiveness. Through non-price competitiveness, firms try to distinguish their
products or services from competitors on the basis of attributes such as quality,
design or any sustainable competitive advantage other than price. Several indicators
are used to determine the rate of firms' innovation.
Innovation & technology, on the other hand, can also affect prices: for example, a
process innovation could reduce the production costs, both fixed and variable, of a
given good.
99. Altomonte, C., di Mauro, F. and Osbat, C. (2013) 'Going beyond labour costs: How and why structural and micro-based factors can help explaining export performance?', CompNet Policy Brief No. 1, 15 January 2013.
GDP
(The GDP used in this indicator is corrected for the presence of intangibles104).
Rationale: Intangible investments are crucial drivers of knowledge creation. Recent
research has shown that this spending boosts productivity and growth and fosters
a sustainable comparative advantage in knowledge-intensive tasks/products. As
part of long-term strategies, this spending is therefore treated as investment. In
addition, high-wage economies are gradually increasing their investments in
intangibles relative to tangibles like buildings or machinery.
104. Corrado, C., Haskel, J., Jona-Lasinio, C. and Iommi, M. (2012) 'Intangible Capital and
Growth in Advanced Economies: Measurement Methods and Comparative Results', Working Paper, June, available
at http://www.intan-invest.net.
Extensive research projects in this field have been financed by the European
Commission:
INNODRIVE (Intangible capital and innovations: drivers of growth and location in
the EU, 2008-2011): the project tackles questions about intangibles from the
viewpoint of firms.
COINVEST (Competitiveness, innovation and intangible investment in Europe,
2008-10): the project contributes to the understanding of intangible investments
as drivers of innovation, competitiveness and growth, and supports the view
that they should be treated as investments rather than as intermediate inputs.
IAREG (Intangible assets and regional economic growth, 2008-10): while
developing new indicators, the project focused especially on a) the environment
affecting firms' location and b) regional externalities affecting the accumulation of
intangibles.
MERITUM (Intellectual capital guidelines for firms, 1998-2001): the project
elaborated a classification of intangibles and contributed to understanding how
companies manage and control intangibles and whether these are relevant for
equity valuation.
final demand; the measure in fact is computed as the imported intermediate shares
of gross production times exports.
VS1 - Share of exports sent indirectly through third countries:
Description: VS1 formula for a particular sector i and country k is:
$$VS1 = \sum_{j}^{n} \ldots \times \frac{j\text{'s exports}}{j\text{'s gross production}}$$
economy imports sum over industry imported inputs; in order to derive overall
figures sum over output column; in order to derive overall figures sum over export
column. All the measures are available at sector level.
Rationale: The indicator measures the value of imported inputs in the overall exports
of a country, and can be computed on the basis of national input-output tables. It
measures to what extent countries are involved in vertically fragmented production.
The VS indicator, proposed by Hummels et al (2001), provides a good measure of the
importance of international fragmentation in production processes. The OECD
indicator 'import content of exports', by using harmonised national input-output
tables, computes countries' degree of vertical specialisation. It measures the
contribution that imports make to the production of exports of goods and services.
It is a measure of the international fragmentation of production, mapping trade flows
in terms of value added and measuring the degree of participation in international
production chains. By using international I-O tables it is possible to overcome the
proportionality assumption on which the Hummels et al (2001) measure was based
(ie using the same coefficients for production sold in the domestic and in the
foreign market).
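Following Hummels et al (2001), the VS measure for a country can be written as $VS = u A^M (I - A^D)^{-1} X$. Here $A^M$ and $A^D$ are the imported and domestic input-coefficient matrices from a national input-output table, $X$ is the vector of sectoral exports and $u$ is a row vector of ones. The sketch below applies this formula with NumPy. It embodies the proportionality assumption discussed above, since the same coefficients are used for output sold at home and abroad. Function names and the example coefficients are hypothetical.

```python
import numpy as np


def vertical_specialisation(a_imp, a_dom, exports):
    """Imported input content of exports (VS), by exporting sector and in total.

    a_imp[i, j]: imported input i per unit of output of sector j
    a_dom[i, j]: domestic input i per unit of output of sector j
    exports[j]:  exports of sector j
    """
    a_imp = np.asarray(a_imp, dtype=float)
    a_dom = np.asarray(a_dom, dtype=float)
    x = np.asarray(exports, dtype=float)
    n = a_dom.shape[0]
    leontief = np.linalg.inv(np.eye(n) - a_dom)      # (I - A^D)^-1
    vs_by_sector = (np.ones(n) @ a_imp @ leontief) * x
    total = vs_by_sector.sum()
    return vs_by_sector, total, total / x.sum()      # share of imports in exports


if __name__ == "__main__":
    a_imp = [[0.10, 0.05], [0.05, 0.15]]
    a_dom = [[0.20, 0.10], [0.10, 0.25]]
    by_sector, vs, share = vertical_specialisation(a_imp, a_dom, [100.0, 50.0])
    print(np.round(by_sector, 2), round(vs, 2), round(share, 3))
```

The Leontief inverse captures imported inputs embodied indirectly, through domestic intermediate goods, as well as those used directly by the exporting sector.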
tends to be large. Both indices can be computed from input-output tables, for countries and sectors.
Fally (2011) proposes another indicator to measure the relative position in the GVC:
the distance to final demand, which can also be interpreted as the length of the value
chain when looking forward. The main drawback of this measure is that it is obtained
by solving a system of linear equations, since for each industry i in country k the value
of interest (D) is a function of D in all other industries and countries.
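In a standard closed-economy formulation, the linear system mentioned above reads $D_i = 1 + \sum_j \mu_{ij} D_j$. Here $\mu_{ij} = d_{ij} Y_j / Y_i$ is the share of industry i's output purchased by industry j as an input, $d_{ij}$ is the amount of i's output needed per unit of j's output, and $Y$ is gross output. The sketch below solves this system with NumPy. Extending it to the cross-country case would stack all country-industry pairs into one larger system. The function name and the example data are hypothetical.

```python
import numpy as np


def distance_to_final_demand(d, gross_output):
    """Solve D_i = 1 + sum_j mu_ij * D_j, with mu_ij = d_ij * Y_j / Y_i.

    d[i, j]: units of industry i output used per unit of industry j output
    gross_output[i]: gross output Y_i of industry i
    """
    d = np.asarray(d, dtype=float)
    y = np.asarray(gross_output, dtype=float)
    mu = d * y[np.newaxis, :] / y[:, np.newaxis]   # share of i's output sold to j
    n = len(y)
    return np.linalg.solve(np.eye(n) - mu, np.ones(n))


if __name__ == "__main__":
    # Hypothetical two-industry economy: industry 0 sells all its output to
    # industry 1 as an input; industry 1 sells only to final demand.
    d = [[0.0, 0.5],
         [0.0, 0.0]]
    print(distance_to_final_demand(d, gross_output=[50.0, 100.0]))
```

An industry selling only to final demand has D = 1. Each additional production stage between an industry and final consumers raises D by one.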
Table (continued), data sources: OECD; the WIOD database; Bruegel; EUKLEMS; World Bank; UN ComTrade
(exports); ECB (including Monetary and Financial Statistics - Bank Lending Survey Supply - Enterprises - Q1-Q6);
Eurostat Database; European Commission (AMECO database); ZEW; Amadeus; INTAN-Invest consortium.
Figure: structure of the database, consisting of the tables Indicator split, Indicators computability, People,
Variables, Sources and Institutions, linked through the keys IndicatorsID, SubindicatorID, VarID, SourceID and
InstitutionID.
Frequency.
Disaggregation (country, sector, region).
Degree of computability: a synthetic code indicating whether an indicator can be
computed for a given country. We have opted for a three-value scale (high, medium
and low) to allow for the fact that an indicator can be computed completely or
partially.
Time: the time span for which an indicator can be computed.
Notes: any useful information on a given indicator-country pair.
VarID_1-VarID_20: this is a key aspect of the structure of the dataset. Each indicator
needs to be computed from some underlying variables, which may or may not come
from the same source and cover the same time period. We allow each indicator to be
computed from up to 20 underlying variables. For example, in order to compute the
Unit Labour Cost in Manufacturing (ULCM)-based REER, we need bilateral trade flows,
exchange rates, compensation per employee and value added per employee. In order
to compute TFP, we need information on value added, tangible capital and number of
employees. Variables are identified as V_XX_z, where V stands for variable, XX is the
two-letter country code, and z is a numeric identifier for the variable.
VarID_1-VarID_20 can be linked with VarID in the Variables table.
One to one: in most cases an indicator uses information from two or more
variables. However, there are cases where an indicator is already available as it is.
This is the case for: TFP (total factor productivity) (Macro) (I_22), Number of hours
worked (I_37), Participation of adults aged 25-64 in education and training by NUTS
2 regions % (I_40), Participation in lifelong learning of employed persons by sector
(I_43), Training enterprises (%) (I_50), R&D as Percentage of GDP (I_58), EPO patent
applications per billion GDP (in PPP) (I_68), Non-R&D innovation expenditures (%
of turnover) (I_70), SMEs introducing product or process innovations (% of SMEs)
(I_73), SMEs introducing marketing or organisational innovations (% of SMEs) (I_75).
In such cases there is a one-to-one correspondence between indicators and
variables.
Variables: a table providing information on sources and availability for each variable
needed to construct a given indicator. Note that variables are country-specific. For
example, compensation per employee (needed to construct the ULC) appears C times,
once for each country (C being the number of countries).
VarID as above
Country
Description
Time: time coverage
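To illustrate how indicator rows link to the Variables table, the sketch below parses identifiers of the form V_XX_z and collects the underlying variables for one indicator-country pair. It assumes upper-case two-letter country codes and that unused VarID slots are left empty. The class, the function names and the example records are hypothetical and do not reproduce the actual database.

```python
import re
from dataclasses import dataclass
from typing import Optional

VAR_ID_PATTERN = re.compile(r"^V_([A-Z]{2})_(\d+)$")   # V_XX_z


@dataclass(frozen=True)
class VarKey:
    country: str   # two-letter country code (XX)
    number: int    # numeric identifier (z)


def parse_var_id(var_id: str) -> VarKey:
    match = VAR_ID_PATTERN.match(var_id)
    if match is None:
        raise ValueError(f"not a variable identifier: {var_id!r}")
    return VarKey(match.group(1), int(match.group(2)))


def underlying_variables(var_slots: list[Optional[str]], variables: dict) -> list:
    """Resolve the non-empty VarID_1..VarID_20 slots of one indicator row
    against the Variables table (keyed by VarID); raise if a link is broken."""
    resolved = []
    for var_id in var_slots:
        if not var_id:
            continue                      # unused slot
        parse_var_id(var_id)              # validate the identifier format
        if var_id not in variables:
            raise KeyError(f"{var_id} missing from Variables table")
        resolved.append(variables[var_id])
    return resolved


if __name__ == "__main__":
    variables = {
        "V_DE_1": {"Description": "Compensation per employee"},
        "V_DE_2": {"Description": "Value added per employee"},
    }
    row = ["V_DE_1", "V_DE_2"] + [None] * 18      # VarID_1..VarID_20
    for var in underlying_variables(row, variables):
        print(var["Description"])
```

Keeping variables country-specific, as the text describes, means each lookup key already encodes the country, so the same indicator definition can be resolved separately for every country.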
after 2000) for the rest of the countries and no information at all is available for Croatia,
Ireland and Malta.
As shown in Table 6.4, the indicators of productivity are generally available for all EU
countries over a long time span at the national level.
An exception is Croatia, for which none of these indicators is available at the sectoral
or the regional level, and data are missing even at the country level for aggregate
labour productivity based on hours worked.
It is also worth noting that for some countries, ie Estonia, Luxembourg, Latvia, Greece,
Malta and Slovenia, aggregate labour productivity based on hours worked is available
for a shorter interval, only since 2000 (2002 for Luxembourg).
Availability is worse and less systematic at the sectoral and the regional levels.
In particular, a large number of countries (Bulgaria, Croatia, Cyprus, Estonia, Greece,
Lithuania, Luxembourg, Latvia, Malta, Poland, Portugal, Romania, Slovakia) lack data on
the TFP growth rate at the sectoral level. A smaller but still relevant number of countries
lack information on aggregate labour productivity based on hours worked at the
sectoral level (Croatia, Cyprus, Ireland, Malta, Sweden), while data on aggregate labour
productivity based on number of employees at the sectoral level are missing for Croatia,
Cyprus, Hungary, Ireland, Malta, Poland and Sweden.
Data on aggregate labour productivity based on hours worked at the regional level
are missing for Croatia only. Data on aggregate labour productivity based on number
of employees at the regional level are missing for Italy, Estonia, Croatia and Belgium.
For some other countries, information on these indicators is available, but for a shorter
time interval than for the rest of the countries: aggregate labour productivity based on
hours worked at the sectoral level is available from 2000 for Bulgaria, Greece, Lithuania,
Latvia and Poland, from 2001 for Spain and from 2002 for Luxembourg, while at the
regional level it is available from 2000 for Austria, Finland, Greece, Hungary, Italy and
Romania, and from 2001 for the Netherlands.
Aggregate labour productivity based on number of employees at the sectoral level
is available from 2000 for Bulgaria, Estonia, Greece, Lithuania and Slovenia, from 2001
for Spain, from 2002 for Luxembourg, from 2003 for Latvia and from 2006 for Portugal,
while at the regional level it is available from 2000 for Spain, Finland, Greece, Hungary,
Latvia, Malta, Portugal, Romania and the UK, from 2001 for the Netherlands, from 2002
for Luxembourg and from 2008 for Poland and Slovenia.
The indicators belonging to the Trade Competitiveness group are homogeneously
computable across EU countries (see Table 6.5).
The 5-Year Change in Export Market Shares at the country level is provided over a long
time span (at least 1997-2012) for all EU countries. At the sectoral level, the same
indicator is available for all countries from at least 1999, with the exceptions of
Bulgaria (2001) and Luxembourg (2003). The Relative Trade Balance is available at the
4-digit level for all countries, at monthly frequency, but only from 2002. The
Decomposition of the Trade Balance in Price and non-Price Competitiveness is available
at both country and sectoral level over a long time span (from before 2000) for all EU
countries. Similar availability applies to the Current Account as a Percentage of GDP
(from before 1995, depending on the country). By contrast, the Revealed Comparative
Advantage (RCA) at the sectoral level (4 digits) is available for all EU countries, but only
from 2002.
Intangible Investments at the country level are available over a long time interval
for all EU countries, with only a few exceptions (see Table 6.6). Croatia has no
information, while Greece, Luxembourg and Portugal provide information from 2000
rather than from 1995 like most other countries.
The availability of the other two indicators considered, Loans to enterprises and Loan
application success/failure, is poor in most EU countries. For the former, no data are
available for Belgium, Bulgaria, Croatia, Czech Republic, Denmark, Finland, Greece,
Hungary, Lithuania, Latvia, Poland, Romania, Sweden and the UK; data are available
from 2003 for the rest of the countries.
The picture is worse for loan application success/failure: there is no information for
most countries, the only exceptions being Germany, Spain, France and Italy, for which
the indicator is available only for recent years (2009-12).
Tables 6.7 and 6.8 show that the coverage of EU countries is quite good for the
indicators providing comparable cross-country information on inward and outward FDI.
This holds in particular for the country-level indicators of both inward and outward
FDI, both flows and stocks, which are available over a long time interval both annually
and quarterly (from before the 1990s for most countries). The only exception
is Luxembourg, for which both flows are available only from 2002. The same
indicators are available at the sectoral level, the only difference being that for some
countries the time interval is shorter: inward and outward FDI flows at the sectoral
level for Belgium, Luxembourg, Slovenia, Slovakia, Malta and Romania are available
only after 2002, depending on the country (Ireland lacks data on outward FDI flows at
the sectoral level before 2002), while inward and outward FDI stocks at the sectoral
level are available only after 2001, depending on the country, for Belgium, Cyprus,
Ireland, Romania, Slovenia and Spain.
Information on both the number of foreign-owned firms (affiliates of foreign
multinationals) and the number of affiliates abroad controlled by domestic firms is
considerably worse in terms of time span for several countries. The number of
foreign-owned firms at both the country and sectoral level is available only from 2001
for Austria, from 2007 for Belgium, from 2003 for Bulgaria, Estonia, Lithuania, Latvia,
Romania, Slovenia and Slovakia, from 2004 (2007 at the sectoral level) for Cyprus and
from 2008 for Malta, while there are no data for Greece, and for Hungary data are
available from 2003 at the sectoral level. Coverage of the number of affiliates abroad
controlled by domestic firms at both the country and sectoral level is good for only a
few countries (Austria, Germany, Italy, Portugal and the Czech Republic), while it is
available only for recent years in Belgium, Bulgaria, Cyprus, Denmark, Estonia, Finland,
France, Hungary, Netherlands and Sweden (since 2007), Greece and Lithuania (since
2004), Slovakia (since 2005), Ireland (since 2010), Latvia (since 2006), Poland and
Romania (since 2008), Spain and the UK (since 2009), and Slovenia (2007-09 only).
Data for Luxembourg are available for 2005 and then from 2009.
Tables 6.9 and 6.10 show the availability of information on EU countries' involvement in global value chains, as computed from the OECD-TiVA international input-output tables and from the WIOT tables. In both cases, computability is high and comparable across all EU countries at both country and sectoral level. In particular, the domestic value added share of gross exports and the foreign value added share of gross exports are available for 1995, 2000, 2005, 2008 and 2009 from the OECD-TiVA tables. From the WIOT tables they are available continuously from 1995 to 2011 (Table 6.10).
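The two value-added indicators are shares of the same gross-export total. The minimal Python sketch below rests on one assumption: a simplified decomposition in which gross exports equal domestic value added plus foreign value added embodied in exports. The figures are hypothetical and are not TiVA or WIOT data.

```python
def value_added_shares(gross_exports: float, foreign_va: float) -> tuple[float, float]:
    """Return (domestic, foreign) value added shares of gross exports, in %.

    Assumes the simplified decomposition
    gross exports = domestic value added + foreign value added.
    """
    if gross_exports <= 0:
        raise ValueError("gross exports must be positive")
    if not 0 <= foreign_va <= gross_exports:
        raise ValueError("foreign value added must lie between 0 and gross exports")
    domestic_va = gross_exports - foreign_va
    return 100 * domestic_va / gross_exports, 100 * foreign_va / gross_exports


# Hypothetical country: gross exports of 200, of which 50 is foreign value added
dva_share, fva_share = value_added_shares(200.0, 50.0)
print(f"DVA share: {dva_share:.1f}%, FVA share: {fva_share:.1f}%")
# DVA share: 75.0%, FVA share: 25.0%
```

Under this assumption the two shares are complements. The two indicators therefore describe the same split of gross exports from opposite sides.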
The innovation indicators, for both all firms and SMEs, can be computed from Eurostat data based on the Community Innovation Survey (CIS), which is carried out in most EU countries. However, both the number of available waves and the data publicly released through Eurostat vary across countries.
As shown in Table 6.12, the picture worsens significantly when the same indicators are considered for the subsample of small and medium-sized firms (SMEs). As in the all-firms sample, information is unavailable at the regional level for all EU countries. In addition, information is missing at the sectoral level for all countries. For some indicators, information at the 2-digit level is available only for 2000.
At the country level, the availability of information on innovation indicators for SMEs is quite good (ie four or more waves are available) for all countries, with only a few exceptions. Information on SMEs introducing product and/or process innovations is available only for 2006-2008-2010 for Croatia and for 2000-2004-2006 for Greece. For SMEs introducing marketing and/or organisational innovations, information is available for 2004-2008-2010 for Belgium, France, Ireland, Italy, Slovakia and Spain; for 2006-2008-2010 for Croatia; for 2008-2010 for Finland, Latvia, Slovenia, Sweden and the UK; and for 2004-2006 for Greece.
The availability of information on the share of SMEs innovating in-house is less homogeneous across EU countries. Data are available for 2004-2006-2008 for Cyprus; 2006-2008-2010 for Croatia and Luxembourg; 2000-2004-2006 for Denmark and Greece; 2000-2004-2008 for France; only 2004 and 2006 for Ireland; 2008-2010 for Latvia, Malta and Slovenia; and 2000-2008 for Spain. No data are available for the UK.
The indicator on innovative SMEs collaborating with others is widely and comparably computable across EU countries over a long time span (at least four waves). The only exceptions are Croatia (2006-2008-2010 only), France (2004-2008-2010) and Greece (2004-2006).
Four indicators show a fairly good degree of computability across EU countries: patent applications to the European Patent Office (EPO), EPO patent applications per billion GDP (in PPP), licence and patent revenues from abroad as % of GDP, and the EU Summary Innovation Index (SII). The picture is more varied for the other two indicators, R&D as a percentage of GDP and R&D expenditure.
As shown in Table 6.13, at the country level the availability of information on patent applications to the EPO and on EPO patent applications per billion GDP (in PPP) is good for all EU countries. Patent applications to the EPO are computable for most countries since the late 1970s, and in any case for all countries since before 2000. EPO patent applications per billion GDP (in PPP) are computable since the second half of the 1990s.
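"EPO patent applications per billion GDP (in PPP)" is a simple normalisation. The minimal sketch below uses hypothetical figures and assumes GDP is supplied in absolute PPP units. The source does not say which PPP series underlies the indicator (for example, Eurostat purchasing power standards).

```python
def patents_per_billion_gdp(epo_applications: int, gdp_ppp: float) -> float:
    """EPO patent applications per billion units of PPP-adjusted GDP.

    gdp_ppp is GDP in absolute PPP units (not already in billions).
    """
    if gdp_ppp <= 0:
        raise ValueError("GDP must be positive")
    return epo_applications / (gdp_ppp / 1e9)


# Hypothetical: 1,200 applications and a GDP of 300 billion PPP units
print(patents_per_billion_gdp(1200, 300e9))  # 4.0
```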
At the regional level, the time span is again shorter for all countries. Data on patent applications to the EPO cover 2000-2009, and data on EPO patent applications per billion GDP run from 2000 to the present. There are no regional data for Croatia for either indicator.
Licence and patent revenues from abroad as % of GDP and the EU Summary Innovation Index (SII), on the other hand, are computable and comparable across all EU countries. The former is available since 2004 (2006 for Spain and Greece) and the latter since 2008.
Turning to R&D as a percentage of GDP and R&D expenditure at the country level, information is available for most countries since before 2000, with only a few exceptions. Croatia and Malta have data only since 2002, and Greece, Luxembourg and Sweden show large gaps in data availability.
Sectoral data on R&D as a percentage of GDP are continuously available since before 2000 only for Belgium, Bulgaria, Cyprus, Estonia, Ireland, the Netherlands, Poland, the Czech Republic, Romania, Slovakia, Slovenia, Sweden and Hungary. For the remaining countries, the information covers a shorter time interval, is quite discontinuous over time, or both. The shorter intervals are Latvia, Lithuania and Portugal (since 2000), Finland, Germany, Greece, Italy and Spain (since 2001), Croatia (since 2002), and Denmark, France and the UK (since 2007). Austria and Malta have discontinuous series with several missing observations.
At the regional level, information is continuously available over a long time span (since before 2000) only for Cyprus, Estonia, Finland, Latvia, Lithuania, Portugal, Spain and Hungary. For the remaining countries, the information covers a shorter time interval, is quite discontinuous over time, or both. The shorter intervals are Slovakia and Poland (since 2000), the Czech Republic and Romania (since 2001), Bulgaria, Ireland and Malta (since 2002), Slovenia (since 2003), the UK (since 2005) and Belgium (since 2006). The discontinuous series are Austria, Croatia, France, Germany, Greece, Italy, Luxembourg, the Netherlands and Sweden.
Sectoral data on R&D expenditure are continuously available since before 2000 only for Austria, Belgium, Cyprus, Estonia, Ireland, the Netherlands, Poland, the Czech Republic, Romania, Slovakia, Slovenia, Sweden and Hungary. For the remaining countries, the information covers only a shorter time interval, is quite discontinuous over time, or both. The shorter intervals are Latvia, Lithuania and Portugal (since 2000), Finland, Germany, Greece, Italy and Spain (since 2001), Croatia (since 2002), Denmark, France and the UK (since 2007) and Luxembourg (since 2009). Bulgaria and Malta have discontinuous series.
At the regional level, information on R&D expenditure is continuously available since before 2000 only for Cyprus, Estonia, Finland, France, Latvia, Lithuania, Portugal, Spain and Hungary. For the remaining countries it is limited to a shorter time interval, is quite discontinuous over time, or both. The shorter intervals are Poland and Slovakia (since 2000), the Czech Republic and Romania (since 2001), Belgium, Ireland and Malta (since 2002), Slovenia (since 2003) and Croatia (since 2008). The discontinuous series are Austria, Bulgaria, Denmark, Germany, Greece, Italy, Luxembourg, the Netherlands, Sweden and the UK.
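The preceding paragraphs sort each country's series into three coverage categories:

- continuously available since before 2000;
- available only from a later year;
- discontinuous.

A simple rule can make these categories operational. The sketch below is a hypothetical illustration, not the authors' procedure. A series counts as discontinuous if any year is missing between its first observation and the end of the observation window (here an assumed 2011).

```python
def classify_coverage(years: set[int], last_year: int, cutoff: int = 2000) -> str:
    """Classify the time coverage of one indicator for one country.

    Hypothetical rule: any missing year between the first observation and
    last_year makes the series 'discontinuous'; otherwise the series is
    continuous, either since before `cutoff` or since its first year.
    """
    if not years:
        return "no data"
    first = min(years)
    missing = set(range(first, last_year + 1)) - years
    if missing:
        return "discontinuous"
    if first < cutoff:
        return f"continuous since before {cutoff}"
    return f"continuous since {first}"


LAST_YEAR = 2011  # assumed end of the observation window
series = {
    "A": set(range(1995, 2012)),                # complete from 1995
    "B": set(range(2002, 2012)),                # complete from 2002
    "C": {1998, 1999, 2003, 2004, 2010, 2011},  # gaps
}
for country, years in series.items():
    print(country, classify_coverage(years, LAST_YEAR))
# A continuous since before 2000
# B continuous since 2002
# C discontinuous
```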
Table 6.2: Macro-level indicators: price and cost exchange rate and ULC
Index/level: I_010_01 (country), I_011_01 (country), I_011_02 (sector), I_012_01 (country), I_013_01 (country)
Countries: AT, BE, BG, CY, CZ, DE, DK, EE, EL, ES, FI, FR, HR, HU, IE, IT, LT, LU, LV, MT, NL, PL, PT, RO, SE, SI, SK, UK
Table 6.4: Macro-level indicators: labour productivity and Total Factor Productivity
Index/level: I_001a_01 (country), I_001a_02 (sector), I_001a_03 (region), I_001b_01 (country), I_001b_02 (sector), I_001b_03 (region), I_002_01 (country), I_002_02 (sector)
I_006_02 Sector 2; I_007_02 Sector 1; I_008_02 Sector 2; I_064_02 Sector 1; I_008_01 Country; I_065_01 Country
I_057_01 Country; I_058_01 Country; I_066_01 Country
Countries: AT, BE, BG, CY, CZ, DE, DK, EE, EL, ES, FI, FR, HR, HU, IE, IT, LT, LU, LV, MT, NL, PL, PT, RO, SE, SI, SK, UK
Indicators: I_039a_01 (country), I_039a_02 (sector), I_040a_01 (country), I_040a_02 (sector)
Countries: AT, BE, BG, CY, CZ, DE, DK, EE, EL, ES, FI, FR, HR, HU, IE, IT, LT, LU, LV, MT, NL, PL, PT, RO, SE, SI, SK, UK
I_039a: Value Added Export Ratio - domestic value added share of gross exports, % - OECD TiVA
I_040a: Value Added Export Ratio - foreign value added share of gross exports, % - OECD TiVA
IT
LT LU LV MT NL PL PT RO SE SI SK UK
I_039b_02
Sector
I_040b_01
Country
I_040b_02
Sector
SI
1
1
0
2
2
0
0
0
0
1
2
0
2
2
0
2
2
0
SK UK
2 0
2 0
0 0
2 2
2 2
0 0
1 0
1 0
0 0
2 0
2 0
0 0
2 2
2 2
0 0
2 1
2 2
0 0
I_039b: Value Added Export Ratio - domestic value added share of gross exports, % - WIOT
I_040b: Value Added Export Ratio - foreign value added share of gross exports, % - WIOT
AT BE BG CY
0 2 2 2
0 2 2 2
0 0 0 0
2 2 2 2
2 2 2 2
0 0 0 0
2 1 2 2
2 1 2 2
0 0 0 0
2 2 2 2
2 2 2 2
0 0 0 0
2 2 2 2
2 2 2 1
0 0 0 0
2 2 2 2
2 2 2 2
0 0 0 0
CZ DE DK EE EL ES
2 0 2 2 1 2
2 1 2 2 1 2
0 0 0 0 0 0
2 2 2 2 1 2
2 2 2 2 1 2
0 0 0 0 0 0
2 2 2 2 0 1
2 2 2 2 0 1
0 0 0 0 0 0
2 2 1 2 1 1
2 2 0 2 1 2
0 0 0 0 0 0
2 2 2 2 0 2
2 2 2 2 0 2
0 0 0 0 0 0
2 2 2 2 0 2
2 2 2 2 0 2
0 0 0 0 0 0
FI
1
1
0
2
2
0
0
0
0
2
2
0
2
2
0
1
1
0
FR HR HU IE
0 1 2 2
2 1 2 2
0 0 0 0
2 1 2 2
2 1 2 2
0 0 0 0
1 1 2 1
1 1 2 1
0 0 0 0
1 1 2 0
1 1 2 0
0 0 0 0
1 1 2 2
1 2 2 1
0 0 0 0
1 1 2 2
1 1 2 1
0 0 0 0
IT
1
2
0
2
2
0
1
1
0
2
2
0
2
2
0
2
2
0
LT LU LV MT NL PL PT RO SE
2 2 0 2 2 2 2 2 1
2 2 1 2 2 2 2 2 1
0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2
0 0 0 0 0 0 0 0 0
2 2 0 2 2 2 2 2 0
2 2 0 2 2 2 2 2 0
0 0 0 0 0 0 0 0 0
2 2 0 2 2 2 2 2 2
2 2 0 1 2 2 2 2 2
0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 1
0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 1
2 2 2 2 2 2 2 2 1
0 0 0 0 0 0 0 0 0
Indicators: I_029_01 (country), I_029_02 (sector), I_029_03 (region), I_031_01 (country), I_031_02 (sector), I_031_03 (region), I_033_01 (country), I_033_02 (sector), I_033_03 (region), I_034_01 (country), I_034_02 (sector), I_034_03 (region)
Countries: AT, BE, BG, CY, CZ, DE, DK, EE, EL, ES, FI, FR, HR, HU, IE, IT, LT, LU, LV, MT, NL, PL, PT, RO, SE, SI, SK, UK
I_001_06
I_001_07
I_001_08
I_001_09
I_001_10
I_001_11
I_013_02
I_001_04
I_001_05
I_001_06
I_001_07
I_001_08
I_001_09
I_001_10
I_001_11
I_013_02
Austria
Belgium
Bulgaria
Croatia
Czech Rep.
Denmark
Estonia
Finland
France
Germany
Hungary
Ireland
Italy
Latvia
Lithuania
Malta
Netherlands
Poland
Portugal
Romania
Slovakia
Slovenia
Spain
Sweden
UK
Accessibility
I_001_04
Computability
1
2
2
0
1
2
2
2
2
1
2
2
1
1
2
1
2
2
1
1
2
2
2
2
2
9
2
1
0
1
1
2
2
1
1
2
2
1
1
1
1
9
2
1
1
2
2
2
2
2
1
2
1
0
1
2
1
2
2
1
2
2
1
1
1
1
9
2
1
1
2
2
2
2
2
1
2
1
0
1
2
1
2
2
0
2
2
1
1
1
1
9
2
1
0
1
2
2
2
2
9
2
1
0
1
0
1
2
1
0
0
2
1
0
1
1
9
0
0
1
1
1
9
2
2
9
2
1
0
1
0
1
2
1
0
0
2
1
0
1
1
9
0
0
1
1
1
2
2
2
9
2
1
0
1
1
1
2
1
1
2
2
1
1
1
1
9
2
0
1
1
1
2
2
2
9
2
1
0
1
1
1
2
1
1
2
2
1
1
1
1
9
2
0
1
1
1
2
2
2
1
2
2
0
1
2
2
2
2
1
2
2
1
1
1
1
2
2
1
1
2
2
2
2
2
0
2
1
9
1
1
1
1
1
1
1
1
1
1
0
1
1
1
1
0
0
1
0
1
1
9
0
1
9
1
1
1
1
1
1
1
1
0
1
0
1
9
1
1
0
0
1
0
1
1
0
0
1
9
1
1
1
1
1
1
1
1
1
1
0
1
9
1
1
0
0
1
0
1
1
0
0
1
9
1
1
1
1
1
9
1
1
1
1
0
1
9
1
1
9
0
1
0
1
1
9
0
1
9
1
9
1
1
1
9
9
1
0
9
0
0
9
9
9
0
0
1
9
1
1
9
0
1
9
1
9
1
1
1
9
9
1
0
9
0
0
9
9
9
0
0
1
0
1
1
9
0
1
9
1
1
1
1
1
1
1
1
0
1
0
0
9
1
9
0
0
1
0
1
1
9
0
1
9
1
1
1
1
1
1
1
1
0
1
0
0
9
1
9
0
0
1
0
1
1
0
2
1
9
1
1
1
1
1
1
1
1
0
1
0
1
1
1
1
0
0
1
0
1
1
- all firms
- domestic firms
- exporters
- importers
- domestic multinationals
- affiliates of foreign multinationals
- foreign owned exporter
- domestic owned exporters
I_003_05
I_003_06
I_003_07
I_003_08
I_003_09
I_003_10
I_004_01
I_005_01
9
2
1
0
1
1
2
2
1
1
2
1
1
1
1
0
9
2
0
1
1
1
2
2
2
I_003_03
9
2
1
0
1
1
2
2
1
1
0
1
1
1
1
0
9
2
0
1
1
1
2
2
2
I_004_01
9
2
1
0
1
0
1
2
1
0
0
1
1
0
1
0
9
0
0
1
1
1
9
2
2
I_005_01
9
2
1
0
1
0
1
2
1
0
2
1
1
0
1
0
9
0
0
1
1
1
9
2
2
I_003_10
1
2
1
0
1
2
1
2
2
0
2
1
1
1
1
0
9
2
1
0
1
2
2
2
2
I_003_08
1
2
1
0
1
2
1
2
2
1
2
1
1
1
1
0
9
2
1
1
2
2
2
2
2
Accessibility
I_003_09
I_003_07
9
2
1
0
1
1
2
2
1
1
2
1
1
1
1
0
9
2
1
1
2
2
2
2
2
I_003_05
1
2
2
0
1
2
2
2
2
1
2
1
1
1
1
0
2
2
1
1
2
2
2
2
2
I_003_06
I_003_03
Austria
Belgium
Bulgaria
Croatia
Czech Rep.
Denmark
Estonia
Finland
France
Germany
Hungary
Ireland
Italy
Latvia
Lithuania
Malta
Netherlands
Poland
Portugal
Romania
Slovakia
Slovenia
Spain
Sweden
UK
I_003_04
Computability
1
2
2
0
1
2
2
2
2
1
2
1
1
1
1
0
2
2
1
1
2
2
2
2
2
1
2
2
0
1
2
2
2
2
1
2
1
1
1
1
0
2
2
1
1
2
2
2
2
2
0
2
1
9
1
1
1
1
1
1
1
1
1
1
0
9
1
1
1
0
0
1
0
1
1
9
0
1
9
1
1
1
1
1
1
1
1
0
1
0
9
9
1
1
0
0
1
0
1
1
0
0
1
9
1
1
1
1
1
1
1
1
0
1
0
9
9
1
1
0
0
1
0
1
1
0
0
1
9
1
1
1
1
1
9
1
1
0
1
0
9
9
1
1
9
0
1
0
1
1
9
0
1
9
1
9
1
1
1
9
1
1
0
9
0
9
9
9
9
0
0
1
9
1
1
9
0
1
9
1
9
1
1
1
9
9
1
0
9
0
9
9
9
9
0
0
1
9
1
1
9
0
1
9
1
1
1
1
1
1
9
1
0
1
0
9
9
1
9
0
0
1
0
1
1
9
0
1
9
1
1
1
1
1
1
1
1
0
1
0
9
9
1
9
0
0
1
0
1
1
0
2
1
9
1
1
1
1
1
1
1
1
1
1
0
9
1
1
1
0
0
1
0
1
1
0
2
1
9
1
1
1
1
1
1
1
1
1
1
0
9
1
1
1
0
0
1
0
1
1
I_053_01
I_054_01
I_055_01
I_056_01
I_051_03
I_052_03
I_053_01
I_054_01
I_055_01
I_056_01
Austria
Belgium
Bulgaria
Croatia
Czech Republic
Denmark
Estonia
Finland
France
Germany
Hungary
Ireland
Italy
Latvia
Lithuania
Malta
Netherlands
Poland
Portugal
Romania
Slovakia
Slovenia
Spain
Sweden
UK
Accessibility
I_051_03
Computability
9
9
1
1
1
2
2
1
2
2
2
2
1
1
1
1
2
1
1
1
2
2
2
2
2
9
9
1
1
0
2
2
1
2
2
2
2
1
0
1
1
2
1
1
1
2
1
2
2
0
9
9
1
0
0
2
2
1
2
2
2
2
1
0
1
1
2
1
1
1
2
1
2
2
2
9
2
1
0
1
2
2
1
2
2
2
2
1
0
1
1
2
1
1
1
2
2
9
2
2
1
2
2
0
1
2
2
2
2
1
2
2
1
1
1
1
2
2
1
1
2
2
2
2
2
1
2
2
0
1
2
2
2
2
1
2
2
1
1
1
1
2
2
1
1
2
2
2
2
2
9
0
1
0
1
1
1
1
0
1
1
1
0
1
0
1
1
1
1
0
0
1
9
1
1
9
0
1
0
9
1
1
1
0
1
1
1
0
9
0
1
1
1
1
0
0
1
9
1
0
9
0
1
9
9
1
1
1
0
1
1
1
0
9
0
1
1
1
1
0
0
1
2
1
1
9
0
1
9
1
1
1
1
0
1
1
1
0
9
0
1
1
1
1
0
0
1
9
1
1
0
2
1
9
1
1
1
1
0
1
1
1
0
1
0
1
1
1
1
0
0
1
0
1
1
0
0
1
9
1
1
1
1
0
1
1
1
0
1
0
1
1
1
1
0
0
1
0
1
1
I_043_02
I_044_01
I_045_01
I_046_01
I_047_01
I_048_01
I_049_01
I_050_01
I_009_02
I_043_01
I_043_02
I_044_01
I_045_01
I_046_01
I_047_01
I_048_01
I_049_01
I_050_01
Austria
Belgium
Bulgaria
Croatia
Czech Rep.
Denmark
Estonia
Finland
France
Germany
Hungary
Ireland
Italy
Latvia
Lithuania
Malta
Netherlands
Poland
Portugal
Romania
Slovakia
Slovenia
Spain
Sweden
UK
Accessibility
I_009_02
Computability
1
2
1
0
1
2
2
1
2
2
2
2
2
1
1
2
2
1
1
1
1
2
2
2
1
9
2
1
2
1
2
2
1
2
0
2
2
2
2
1
2
2
1
1
0
1
2
9
2
1
9
2
1
2
1
2
2
1
2
0
2
2
2
2
1
2
2
1
1
0
1
2
9
2
1
9
2
1
2
1
2
2
1
2
0
2
2
2
2
1
2
2
1
1
0
1
2
9
2
1
2
2
1
2
1
2
2
2
2
2
2
2
2
2
1
2
2
1
1
1
1
2
2
2
2
9
2
1
1
1
2
2
2
2
2
2
2
2
2
1
1
9
1
1
1
1
2
2
2
2
1
2
1
0
1
2
2
1
2
2
2
2
2
2
1
1
9
1
1
1
1
2
2
2
1
2
2
1
2
1
2
2
2
2
0
2
2
2
2
1
2
2
1
1
0
1
2
2
2
2
9
2
1
1
1
2
2
2
2
0
2
2
2
2
1
1
9
1
1
0
1
2
2
2
2
1
2
1
0
1
2
2
1
2
0
2
2
2
2
0
1
9
1
1
0
1
2
2
2
1
0
0
1
9
1
1
1
1
1
1
1
1
0
0
0
1
1
1
1
0
0
1
0
1
1
9
0
1
0
1
1
1
1
1
9
1
1
0
0
0
1
1
1
1
9
0
1
9
1
1
9
0
1
0
1
1
1
1
1
9
1
1
0
0
0
1
1
1
1
9
0
1
9
1
1
9
0
1
0
1
1
1
1
1
9
1
1
0
0
0
1
1
1
1
9
0
1
9
1
1
0
0
1
0
1
1
1
1
1
1
1
1
0
0
0
1
1
1
1
0
0
1
9
1
1
9
0
1
0
1
1
1
1
1
1
1
1
0
0
0
1
9
1
1
0
0
1
9
1
1
0
0
1
9
1
1
1
1
1
1
1
1
0
0
0
1
9
1
1
0
0
1
0
1
1
0
0
1
0
1
1
1
1
1
9
1
1
0
0
0
1
1
1
1
9
0
1
9
1
1
9
0
1
0
1
1
1
1
1
9
1
1
0
0
0
1
9
1
1
9
0
1
9
1
1
0
0
1
9
1
1
1
1
1
9
1
1
0
0
9
1
9
1
1
9
0
1
0
1
1
I_009_02: Average, median and other moments of value of exports per exporting firm, total
I_043_01: Average, median, variance, other moments of number of export destination countries per exporting firm
I_043_02: Number of exporting firms by number of export destination countries
I_044_01: Average, median, variance, other moments of number of export destination countries * number of products exported per exporting firm
I_045_01: Number of exporting firms (extensive margin)
I_046_01: % of exporting firms in total number of firms (extensive margin)
I_047_01: Average, median, other moments of export sales as a share of total turnover (intensive margin)
I_048_01: Number of importing firms (extensive margin)
I_049_01: % of importing firms in total number of firms (extensive margin)
I_050_01: Average, median, other moments of imported intermediates as a share of total cost of materials (intensive margin)
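Several of the extensive- and intensive-margin indicators listed above can be computed from a single firm-level table. The minimal sketch below uses hypothetical records; the `Firm` fields are illustrative and are not the variable names of any national dataset. It uses the population variance for the number of destinations, although a sample variance would be an equally defensible choice.

```python
from dataclasses import dataclass
from statistics import mean, median, pvariance


@dataclass
class Firm:
    """Hypothetical firm-year record; field names are illustrative only."""
    firm_id: str
    turnover: float
    exports: float
    n_destinations: int


def export_margins(firms: list[Firm]) -> dict[str, float]:
    """Compute a few extensive- and intensive-margin statistics."""
    exporters = [f for f in firms if f.exports > 0]
    if not exporters:
        raise ValueError("at least one exporting firm is required")
    if any(f.turnover <= 0 for f in exporters):
        raise ValueError("exporters must have positive turnover")
    export_shares = [f.exports / f.turnover for f in exporters]
    destinations = [f.n_destinations for f in exporters]
    return {
        "n_exporters": len(exporters),                       # cf I_045_01
        "pct_exporters": 100 * len(exporters) / len(firms),  # cf I_046_01
        "mean_export_share": mean(export_shares),            # cf I_047_01
        "median_export_share": median(export_shares),        # cf I_047_01
        "mean_destinations": mean(destinations),             # cf I_043_01
        "var_destinations": pvariance(destinations),         # cf I_043_01
    }


firms = [
    Firm("f1", turnover=100.0, exports=0.0, n_destinations=0),
    Firm("f2", turnover=200.0, exports=50.0, n_destinations=2),
    Firm("f3", turnover=400.0, exports=200.0, n_destinations=5),
    Firm("f4", turnover=50.0, exports=10.0, n_destinations=1),
]
for name, value in export_margins(firms).items():
    print(f"{name}: {value:.2f}")
```

The extensive-margin statistics use every firm as the denominator. The intensive-margin and destination statistics are computed over exporters only, which matches the "per exporting firm" wording of the definitions.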
I_023_04
I_023_05
I_041_03
I_042_03
I_059_03
I_070_01
I_023_04
I_023_05
I_041_03
I_042_03
I_059_03
Austria
Belgium
Bulgaria
Croatia
Czech Republic
Denmark
Estonia
Finland
France
Germany
Hungary
Ireland
Italy
Latvia
Lithuania
Malta
Netherlands
Poland
Portugal
Romania
Slovakia
Slovenia
Spain
Sweden
UK
Accessibility
1
1
2
1
1
2
2
2
2
1
2
1
1
2
2
1
1
2
1
1
2
2
2
2
2
1
1
2
1
1
2
2
2
2
1
2
1
1
0
1
1
1
2
1
1
2
2
2
2
2
9
2
1
1
1
1
2
2
1
1
2
2
1
1
1
1
9
1
0
1
1
1
2
2
2
9
2
1
1
1
0
2
2
1
0
0
2
1
0
1
1
9
0
1
1
1
1
9
2
2
1
2
2
0
1
2
2
1
2
1
2
2
1
1
2
0
9
1
1
1
2
2
2
2
2
9
2
1
0
1
2
2
1
2
0
2
1
0
2
1
2
1
1
1
0
1
2
9
2
1
0
0
1
1
1
1
1
1
1
1
2
1
1
1
0
1
1
1
1
0
0
1
2
1
1
0
0
1
1
1
1
1
1
1
1
1
1
0
9
0
1
1
1
1
0
0
1
2
1
1
9
0
1
0
1
1
1
1
1
1
1
1
0
1
0
0
9
1
9
0
0
1
9
1
1
9
0
1
0
1
9
1
1
1
9
9
1
0
9
0
0
9
9
1
0
0
1
9
1
1
0
2
1
9
1
1
1
1
1
1
1
1
1
1
0
9
9
1
1
0
0
1
9
1
1
Accessibility conditions
Austria
Belgium
NBB data are confidential and restricted, and their use is allowed only to NBB members (or affiliated researchers). NBB data on firms' balance sheets are the same as the data provided by Belfirst, which is available upon payment of a fee.
Bulgaria
All the sources mentioned above are restricted, and access is strictly regulated by the Protection of Secrecy provisions (Chapter 6 of the Statistical Act).
Micro-data from different statistical fields are accessible if this does not conflict with existing regulations. Access follows a decision of the Commission appointed under Art. 10 of the Rules for providing anonymised data for scientific and research purposes. These rules govern the provision of micro-data by the BNSI and the procedure for obtaining them. They are based on, and in accordance with, the requirements of national and relevant EU legislation. See
https://unstats.un.org/unsd/dnss/docViewer.aspx?docID=2772. See also indicator 15.4 in
http://www.nsi.bg/sites/default/files/files/pages/LegalBasis_e/BG_report_FINAL.pdf.
Croatia
Access to most data is restricted. Data collected for the CIS (turnover and R&D expenditure) can be accessed under certain conditions. Access is for scientific purposes, according to the Ordinance on the methods of statistical data protection and the Ordinance on Conditions and Terms of Using Confidential Data for Scientific Purposes.
Czech Republic
Business register data can be accessed at both the NCB and the CZSO. To gain access, an external researcher has to submit a research project and pay a fee. Data can be accessed either on-site or on CDs (depending on the agreement). According to the NCB, custom data are available only to NCB employees, and the NCB does not report the conditions for using FDI and outward FATS data. Access to the External Trade Database at the CZSO is regulated by a special confidentiality contract and is granted only for research purposes (upon payment of a fee).
More details are available at
http://www.czso.cz/eng/redakce.nsf/i/statistical_data_for_scientific_research_purposes
Denmark
Data are accessible to persons affiliated with Danish institutions recognised by Statistics Denmark, subject to the approval of a project. In principle, foreign researchers can access the data if they are affiliated with a Danish institution. Affiliation is only possible if the authorised environment is willing to take responsibility for the foreign researcher. That environment must make sure that all existing rules governing access to micro-data are observed. Data can be accessed on site or remotely. More information is available at
http://www.dst.dk/en/TilSalg/Forskningsservice.aspx
Estonia
Data are held by SE, and the availability of micro-data for scientific purposes is regulated by legal
Data are accessible at the Research Laboratory or via the remote access system, subject to a user licence, access agreements and payment of a fee.
More details are available at
http://www.stat.fi/tup/mikroaineistot/index_en.html.
France
All the sources mentioned are highly confidential. Micro-level data will, however, be accessible through the new system upon submission of a research proposal, subject to committee approval. Details on accessibility can be found at http://www.casd.eu/
Germany
Most datasets are available under certain conditions at the respective institutions. Destatis, the Federal Employment Office (Bundesagentur für Arbeit, BA) and the Bundesbank all have dedicated Research Data Centres. These offer on-site or remote access (or direct access via Scientific Use Files) to many of their micro-level datasets, in accordance with German privacy protection law. Data are accessible to researchers, but only at the BA can foreign researchers access the data without cooperating with a partner from Germany.
Data from the Deutsche Bundesbank are accessible only at the Research Centre (in Frankfurt am Main). The use of Deutsche Bundesbank data is subject to special confidentiality conditions. Due to legal requirements, individual data cannot be made generally available. They are, however, made available under strict conditions and for clearly defined academic research purposes. The Bundesbank runs a visiting researcher programme at the Research Centre.
In the case of the BA, the FDZ offers researchers three ways of accessing data: (i) on-site access, (ii) remote data access and (iii) Scientific Use Files (rare). These differ according to the degree of anonymity of the data and the terms of data use. In all three cases, researchers have to present a research project that must be approved by the FDZ. For on-site access, it is possible to apply for financial support. More details are at
http://fdz.iab.de/en.aspx.
The research data centre of Destatis offers four forms of access to selected micro-data of official statistics: (i) public use files, (ii) scientific use files, (iii) safe centres and (iv) remote execution. They differ in both the anonymity of the data and the form of data provision. The scientific use files are well suited to a large part of scientific data analyses. Foreign users not employed by German institutions may work with the data both at the research centre and via remote execution. More details are at
http://www.forschungsdatenzentrum.de/en/datenzugang.asp
Hungary
The Hungarian matched data were created by the CSO by assigning each company an anonymised identifier that is consistent across years and databases. Data protection, required by law, is a key element of the CSO's operations. Therefore, variables that could directly reveal the identity of a company (eg company name, address of the headquarters or tax number) were deleted. Technically, the data are stored on a server in separate files by topic, and the researcher merges the different databases using the id numbers assigned by the CSO.
The matched database is accessible only to researchers from institutions that have an agreement with the CSO, such as the Hungarian Academy of Sciences or some ministries. Access is granted after the project has been registered at the CSO. It is restricted to a safe research room inside the CSO building, where researchers can work on the data on site and save their results. Note that access is still limited, burdensome and occasionally quite slow. Researchers working with the data have to be present in the research room in Budapest and need to be affiliated with a partner.
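In the Hungarian set-up the researcher performs the linkage, so the merge step reduces to an exact join on the anonymised identifier. The minimal sketch below assumes hypothetical topic files held as lists of rows with the columns `firm_id` and `year`. The actual layout of the CSO files is not described here.

```python
def merge_topic_files(tables: list[list[dict]], key: tuple[str, ...] = ("firm_id", "year")) -> list[dict]:
    """Inner-join topic tables (lists of row dicts) on a shared key.

    Assumes the key is unique within each table; rows present in every
    table are combined, all others are dropped.
    """
    if not tables:
        return []
    indexed = []
    for table in tables:
        index = {}
        for row in table:
            k = tuple(row[col] for col in key)
            if k in index:
                raise ValueError(f"duplicate key {k}")
            index[k] = row
        indexed.append(index)
    # Keys present in every table (dicts iterate over their keys)
    common = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for k in sorted(common):
        combined = {}
        for index in indexed:
            combined.update(index[k])
        merged.append(combined)
    return merged


# Hypothetical topic files sharing the anonymised identifier
balance_sheets = [
    {"firm_id": "A17", "year": 2010, "sales": 120.0},
    {"firm_id": "A17", "year": 2011, "sales": 130.0},
    {"firm_id": "B42", "year": 2011, "sales": 80.0},
]
customs = [
    {"firm_id": "A17", "year": 2011, "exports": 40.0},
    {"firm_id": "B42", "year": 2011, "exports": 0.0},
]
for row in merge_topic_files([balance_sheets, customs]):
    print(row)
```

An inner join keeps only the firm-years observed in every file. A left join on the balance-sheet file would instead retain unmatched firms, at the cost of handling the missing trade variables explicitly.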
Ireland
Access to the data is possible in principle, but subject to stringent conditions. Firm-level data can be accessed on-site only, and the use and publication of results are subject to approval by the statistical office.
Italy
Firm-level data are confidential and restricted. The Business Register (except for Business Demography) and micro-data from surveys are available to users at the ADELE Laboratory (Laboratory for Elementary Data Analysis). However, identification codes of individual units are not available to external researchers. It is therefore not possible to merge data from different surveys without a specific agreement with Istat (research protocol). Databases covering the full population are not accessible to researchers, but descriptive statistics from these databases are available upon request.
See for example the Istat Micro3 project. For further information about the ADELE laboratory
see http://www.istat.it/en/information/researchers/analysis-of-individual-data.
Latvia
Information on the value of exports (imports) by destination and product is not accessible because it is confidential. Other data are in principle available upon request, subject to payment of a fee.
Lithuania
Firm-level data are confidential. Under the Law on Statistics, micro-level data can be used for research purposes. Confidential statistical data may be provided for scientific purposes on two conditions. They must be used in a manner that makes it impossible to directly identify the respondents from the data, and the research establishments must ensure the protection of these data.
Malta
All the information is accessible upon request for research purposes, except data on foreign/domestic ownership.
Netherlands
In general, data on many aspects of competitiveness are available to both domestic and foreign researchers. Access to micro-level data follows explicit rules, and specific charges apply. According to CBS, all datasets in the Centre for Policy Related Statistics micro-data catalogue are available to authorised external researchers for their own research. The catalogue does not contain all the datasets Statistics Netherlands uses to compile its statistics. CBS datasets not (yet) included in the catalogue may be made suitable for use by external researchers as custom-made datasets. The catalogue (classified by theme) includes documentation reports on the most recent version of the datasets immediately available for use. This documentation describes the contents and structure of each dataset. The enclosures referred to in the documentation are available only in Dutch and on request. More details can be found at
http://www.cbs.nl/NR/rdonlyres/50625EDE-3274-4D7C-B19B5E5D0F239E2F/0/131112dienstencatalogusosra2014eng.pdf
Poland
On the basis of the information we were able to gather, we can only state that the rules of statistical confidentiality are laid down in the law on official statistics of 29 June 1995. In theory, access to micro-data is possible under specific conditions. In practice, however, access to individual data outside the CSO and NBP is nearly impossible.
Portugal
We are not in a position to describe the accessibility conditions in detail. In principle, however, data seem to be accessible.
Romania
Slovakia
Data are not accessible because a safe environment for data security is not yet in place.
The firm-level databases are not available online, and access is confidential: the rules of access have not been specified.
Slovenia
All micro-data are accessible at SURS and are restricted to research purposes only.
See http://www.stat.si/eng/drz_stat_mikro.asp
Spain
In the case of the Industrial Economics Survey, micro-data files are provided only to other statistical institutions (the Statistical Institutes of the Autonomous Communities). For the CIS and PITEC databases, anonymised firm-level data can be accessed on the INE website through a specific procedure. Researchers must submit a request by filling out the required fields in the tab "Solicitud de descarga de BBDD" (database download request). Once the request has been evaluated and approved, the researcher receives an email within 72 hours with a username and password valid for three months. Apart from the anonymisation of a set of variables, the files available on the website correspond to the original files.
Sweden
All firm-level data are restricted, but European researchers can access them remotely, subject to a confidentiality check and payment of an administrative cost.
United Kingdom
All the sources are available upon submission of a research project to the corresponding institutions (UKDS, ONS and HMRC Datalab). In addition, the HMRC Datalab requires a short training course covering legal issues as well as statistical disclosure control of output. At present the Datalab is only open to UK-based institutions. By law, HMRC is only allowed to share the data if doing so serves one of HMRC's functions. Data are available only on site.
References
8-9 April
Bartelsman, E.J., Haltiwanger, J.C. & Scarpetta, S. (2009) Cross-Country Differences in Productivity: The Role of Allocation and Selection, NBER Working Paper No. 15490, National Bureau of Economic Research (NBER), Cambridge (MA)
Bartelsman, E.J., Haltiwanger, J.C. & Scarpetta, S. (2009) Measuring and analyzing cross-country differences in firm dynamics, in Dunne, T., Jensen, J.B. & Roberts, M.J. (eds) Producer Dynamics: New Evidence from Micro-data, University of Chicago Press, pp15-76
Bartelsman, E.J. & Hamilton, A. (2004) The analysis of micro-data from an international perspective, OECD Statistics Directorate Committee on Statistics (STD/CSTAT(2004)12), OECD, Paris
Bartelsman, E.J., Haskel, J. & Martin, R. (2008) Distance to which frontier? Evidence on productivity convergence from international firm-level data, Discussion Paper No. 7032, CEPR, London
Bartelsman, E.J., Scarpetta, S. & Schivardi, F. (2005) Comparative analysis of firm demographics and survival: evidence from micro-level sources in OECD countries, Industrial and Corporate Change 14(3): 365-391
Békés, G., Harasztosi, P. & Muraközy, B. (2011) Firms and products in international trade: Evidence from Hungary, Economic Systems 35
Bender, S., Lane, J., Shaw, K.L., Andersson, F. & Wachter, T.v. (eds) (2008) The Analysis of Firms and Employees. Quantitative and qualitative approaches, University of Chicago Press
Benkovskis, K. & Wörz, J. (2012) Evaluation of Non-Price Competitiveness of Exports from CESEE Countries in the EU Market, Working Paper 1/2012, Bank of Latvia
Biewen, E., Gruhl, A., Gürke, C., Hethey-Maier, T. & Weiß, E. (2012) Combined firm data for Germany: possibilities and consequences of merging firm data from different data producers, Schmollers Jahrbuch. Zeitschrift für Wirtschafts- und Sozialwissenschaften 132(3): 361-377
Borgman, C.L. (2010) Research Data: Who will share what, with whom, when, and
why? Working Paper 161, Berlin: RatSWD
Boyd, D. & Crawford, K. (2011) Six provocations for big data, paper presented at Oxford Internet Institute's "A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society", 21 September
Brandt, M. (2012) Decentralised and Remote Access to Confidential Data in the ESS (DARA), 4th Workshop on Data Access (WDA), Luxembourg
Broersma, L., Koch, A. & Rekveldt, B. (2010) Hiring by skill in innovative and non-innovative firms. An explorative comparison using German and Dutch matched employer-employee data bases, Micro-Dyn Working Paper No. 5/10, Vienna
Brook, E. L., Rosman, D. L. & Holman, C. J. (2008) Public good through data linkage:
measuring research outputs from the Western Australian Data Linkage System,
Australian and New Zealand Journal of Public Health 32(1): 19-23
Brown, B., Chui, M. & Manyika, J. (2011) Are you ready for the era of big data?
McKinsey Global Institute, San Francisco
Burkhauser, R. V. & Lillard, D. R. (2005) The contribution and potential of data
harmonization for cross-national comparative research, Journal of Comparative
Policy Analysis 7(4): 313-330
Büttner, T. & Rässler, S. (2008) Multiple Imputation of Right-Censored Wages in the
German IAB Employment Sample Considering Heteroscedasticity, Discussion Paper
44/2008, Nuremberg, IAB
Chesher, A. & Nesheim, L. (2006) Review of the Literature on the Statistical Properties
of Linked Datasets, DTI Occasional Paper 3, Department of Trade and Industry,
London
CHINTEX (2001) The change from input harmonization to ex-post harmonisation in
national samples of the European Community Household Panel. Implications on
data quality (Synopsis), technical report, CHINTEX
Christen, P. (2012) Data Matching: Concepts and Techniques for Record Linkage, Entity
Resolution and Duplicate Detection, Springer, Berlin, Heidelberg
Christen, P. (2012a) A survey of indexing techniques for scalable record linkage and
deduplication, IEEE Transactions on Knowledge and Data Engineering 24(9): 1537-1555
Cihak, M., Demirgüç-Kunt, A., Feyen, E., & Levine, R. (2012) Benchmarking financial
systems around the world, World Bank Policy Research Working Paper, 6175
Clark, D. E. (2004) Practical introduction to record linkage for injury research, Injury
Prevention 10(3): 186-191
CompNet Task Force (2014) Micro-based evidence of EU competitiveness: The
CompNet database, Working Paper Research 253, National Bank of Belgium
Crawford, K. (2013) The Hidden Biases in Big Data, Harvard Business Review Blog
Network
Cukier, K. & Mayer-Schoenberger, V. (2013) The Rise of Big Data: How It's Changing the
Way We Think about the World, Foreign Affairs 2013(May/June): 28-40
Daas, P. & van der Loo, M. (2013) Big Data (and official statistics), UNECE/
OECD/EUROSTAT/ESCAP Working Paper
Daas, P., Puts, M., Buelens, B. & van den Hurk, P. (2013) Big Data and Official Statistics,
manuscript, Statistics Netherlands
Data without Boundaries (DWB, 2012) Report on the State of the Art of Current SC in
Europe, European Community, Work Package 4, Improving Access to OS Micro-data
De Backer, K. and Yamano N. (2011) International Comparative Evidence on Global
concerning access to confidential data for scientific purposes, Official Journal of the
European Communities L133/7
European Commission (2006) Communication from the Commission to the European
Parliament and the Council on The Reduction of the Response Burden,
Simplification and Priority-setting in the Field of Community Statistics, Commission
of the European Communities, Brussels
European Commission (2009) The production method of EU statistics: a vision for the
next decade, Communication from the Commission to the European Parliament and
the Council, COM(2009) 404 final, Commission of the European Communities,
Brussels
European Commission (2009a) Commission recommendation of 23 June 2009 on
reference metadata for the European Statistical System, Official Journal of the
European Union L 168/50: 50-55
European Commission (2010) Investing in Europe's future, Fifth report on economic,
social and territorial cohesion
European Commission (2011) EU industrial structure 2011: Trends and Performance,
Chapter IV: International competitiveness of EU industry
European Commission (2011) Regulation of the European Parliament and of the
Council on the European Statistical programme 2013-2017, European Commission,
Brussels
European Commission (2011a) Programme for the Modernisation of European
Enterprise and Trade Statistics (MEETS), European Commission, Brussels
European Commission (2012) Roadmap: Framework Regulation Integrating Business
Statistics (FRIBS), European Commission, Brussels
European Commission (2012) Macroeconomic Imbalances Procedure Scoreboard
Headline Indicators, 1 November 2012 Statistical information
European Commission (2012) European Competitiveness Report, 15th edition
European Commission (2012) Regional Innovation Scoreboard (2012 and previous
editions).
European Commission (2013) Innovation Union Scoreboard (2013 and previous
editions)
European Commission (2013) Towards knowledge driven reindustrialisation.
European Competitiveness Report 2013, Commission Staff Working Document SWD
(2013) 347 final, Luxembourg: EC
European Commission (2014) Report from the Commission to the European Parliament
and the Council on the implementation of Decision No 1297/2008/EC of the
European Parliament and of the Council of 16 December 2008 on a Programme for
the Modernisation of European Enterprise and Trade Statistics (MEETS), COM (2014)
444 final, Brussels
European Economic Community (2007) Regulation (EC) No 716/2007 of the European
Parliament and of the Council of 20 June 2007 on Community Statistics on the
Structure and Activity of Foreign Affiliates, Official Journal of the European Union
L171/17
European Economic Community (2008) Decision No. 1297/2008/EC of the European
Parliament and of the Council on 16 December 2008 on a Programme for the
Modernisation of European Enterprise and Trade Statistics (MEETS), Official Journal
of the European Union L 340/76
European Economic Community (2009) Regulation (EC) No 223/2009 of the European
Parliament and of the Council of 11 March 2009 on European statistics and repealing
Regulation (EC, Euratom) No 1101/2008 of the European Parliament and of the
Council on the transmission of data subject to statistical confidentiality to the
Statistical Office of the European Communities, Council Regulation (EC) No 322/97
on Community Statistics, and Council Decision 89/382/EEC, Euratom establishing
a Committee on the Statistical Programmes of the European Communities, Official
Journal of the European Union L 87/164 (223/2009)
European Statistical Advisory Committee (2012) Opinion on the further development
of statistics on international trade in goods and services in the European Union
(SIMSTAT), ESAC Doc. 2012/1115, European Statistical Advisory Committee
European Statistical System Committee (ESSC, 2013) 17th Meeting of the European
Statistical System Committee, European Statistical System Committee.
Eurostat (2003) Definition of Quality in Statistics, Eurostat, Luxembourg
Eurostat (2009) Foreign Affiliates Statistics (FATS) Recommendations Manual (19770375), Eurostat, Luxembourg
Eurostat (2011) European Statistics Code of Practice for the National and Community
Statistical Authorities, Eurostat, European Statistical System, Luxembourg
Eurostat (2013) ESSnet projects 2013 assessment report, Eurostat
Fally, T. (2011) On the Fragmentation of Production in the US, University of Colorado-Boulder, July
Fellegi, I. P. & Sunter, A. B. (1969) A Theory for Record Linkage, Journal of the American
Statistical Association 64(328): 1183-1210
Figueira, M.H. (2013) FRIBS: The EU framework for business statistics, presentation,
Statistiktag Austria, 22/10/2013, Vienna
Fleck, S. E. (2009) International comparisons of hours worked: an assessment of the
statistics, Monthly Labor Review, May
Foster, L., Haltiwanger, J. and C. J. Krizan (2001) Aggregate Productivity Growth.
Lessons from Microeconomic Evidence, in New Developments in Productivity
Hollanders, H., Tarantola, S., Garda Porras, B. (2013) Innovation Union Scoreboard
2013, Directorate General for Enterprise and Industry, European Commission
Horvath, S. (2013) Big Data, Aktueller Begriff 37/13, Deutscher Bundestag,
Wissenschaftliche Dienste, Berlin
Hummels, D., Ishii, J. and Yi, K. (2001) The nature and growth of vertical specialization
in world trade, Journal of International Economics, Elsevier, vol. 54(1): 75-96
Hundepool, A. (2007) CENEX summary report, manuscript
Hundepool, A., J. Domingo-Ferrer, L. Franconi, S. Giessing, R. Lenz, J. Naylor, E. Schulte
Nordholt, G. Seri, P.-P. De Wolf (2010) Handbook on Statistical Disclosure Control,
ESSNet SDC, Luxembourg
Kallas, J. & Linardis, A. (2008) A Documentation Model for Comparative Research
Based on Harmonization Strategies, IASSIST Quarterly 2008: 12-25
Kang, L., O'Mahony, M. & Peng, F. (2012) New Measures of Workforce Skills in the EU,
National Institute Economic Review 220(1), R17-R28
Karlberg, M. and Skaliotis, M. (2013) Big Data for Official Statistics: Strategies and
some Initial European Applications, Working Paper 30, UNECE, Geneva
Karmel, R. (2005) Data linkage protocols using a statistical linkage key, Data Linkage
Series 1, Australian Institute of Health and Welfare, Canberra
Khandelwal, A., Schott, P. & Wei, S. (forthcoming) Trade Liberalization and Embedded
Institutional Reform: Evidence from Chinese Exporters, American Economic Review
Kim, J.K. and Fuller, W. (2004) Fractional hot deck imputation, Biometrika 91(3): 559-578
Koch, A. (2008) How to analyse firm dynamics in European countries? Methodology
and results of a comparative study, Micro-Dyn Working Paper 15/08, MicroDyn,
Vienna
Koch, Andreas and Neugebauer, Katja (2014) Technical report on the general
considerations of the matchability of datasets within and across countries and
regions, Technical Report, Tübingen
Koopman, Robert, Powers, William M., Wang, Zhi and Wei, Shang-Jin, (2010) Give Credit
Where Credit is Due: Tracing Value Added in Global Production Chains, NBER Working
Paper No. w16426
Lamel, J. (2002) The Future of the European Statistical System. Theme 2: New ideas
for ESS development, Wirtschaftskammer Österreich, Vienna
Levinsohn, J. and Petrin, A. (2003) Estimating Production Functions Using Inputs to
Control for Unobservables, Review of Economic Studies, Wiley Blackwell, vol. 70(2):
317-341
Liotti, A. (2013) Interoperability of business registers in the European Statistical
System: the Eurostat VIP.ESBRs project, manuscript, Eurostat
Little, R. & Rubin, D. (1987, 2002) Statistical Analysis with Missing Data, Wiley
Lohr, S. (2012) The Age of Big Data, The New York Times
Mayer, T. & Ottaviano G. (2008) The Happy Few: The Internationalisation of European
Firms, Intereconomics: Review of European Economic Policy 43(3), 135-148
Mills, S. (2013) Demystifying Big Data. A practical Guide to Transforming the Business
of Government, Tech America Foundation, Washington DC
Miroudot, S., Lanz, R., Ragoussis, A. (Nov 2009) Trade in Intermediate Goods and
Services, OECD Trade Policy Working Paper No. 93, OECD Publishing
Museux, J.-M., Hilbert, N. & Barcellan, R. (2013) Architecture the ESS.VIP Programme,
Eurostat
Narayanan, A. & Shmatikov, V. (2010) Myths and fallacies of personally identifiable
information, Communications of the ACM 53(6): 24-26
Neff, G. (2013) Why Big Data Won't Cure Us, Big Data 1(3): 117-123
Newcombe, H. B. & Kennedy, J. M. (1962) Record linkage: making maximum use of the
discriminating power of identifying information, Communications of the ACM 5(11):
563-566
Newcombe, H. B., Kennedy, J. M., Axford, S. & James, A. (1959) Automatic linkage of
vital records, Science, New Series 130(3381), 954-959
O'Mahony, M. & Timmer, M.P. (2009) Output, Input and Productivity Measures at the
Industry Level: The EU KLEMS Database, The Economic Journal 119(538): F374-F403
O'Mahony, M., Castaldi, C., Los, B., Bartelsman, E., Maimaiti, Y. & Peng, F. (2008)
EUKLEMS Linked Data: Sources and Methods, manuscript, University of
Birmingham
OECD (2001) Measuring Productivity: measurement of aggregate and industry-level
productivity growth, OECD Manual
OECD (2011) Import content of exports, in OECD Science, Technology and Industry
Scoreboard 2011, OECD Publishing
OECD (2013) Exploring Data-Driven Innovation as a New Source of Growth: Mapping the
Policy Issues Raised by Big Data, OECD Digital Economy Papers 222, OECD, Paris
OECD (2013) Calculating Summary Indicators of Employment Protection Strictness:
Methodology http://www.oecd.org/els/emp/EPL-Methodology.pdf
OECD (2013a) OECD Health Data 2013,
http://stats.oecd.org/index.aspx?DataSetCode=HEALTH_STAT
OECD, STAN Input-Output Database:
http://stats.oecd.org/Index.aspx?DatasetCode=STAN_IO_M_X
OECD, STAN Input-Output Database:
http://stats.oecd.org/Index.aspx?DataSetCode=STAN_IO_INTERM_M
Okner, B. (1972) Constructing a New Data Base from Existing Micro-data Sets: The
1966 Merge File, Annals of Economic and Social Measurement 1(3): 326-361