ECONOMIC STUDIES
DEPARTMENT OF ECONOMICS
SCHOOL OF ECONOMICS AND COMMERCIAL LAW
GÖTEBORG UNIVERSITY
141

FOUR ESSAYS ON THE MEASUREMENT OF PRODUCTIVE EFFICIENCY

Dag Fjeld Edvardsen

ISBN 91-85169-00-5
ISSN 1651-4289 print
ISSN 1651-4297 online

Four Essays on the Measurement of Productive Efficiency

Doctoral thesis by Dag Fjeld Edvardsen ([email protected])

Preface

After graduating in economics from the University of Oslo I started working as a research assistant at the Frisch Centre in January 1999. My first task was trying to understand a method I had never heard of before, referred to as Data Envelopment Analysis (DEA). Soon I was working with Finn R. Førsund and Sverre A.C. Kittelsen on applied projects where DEA was used to measure technical efficiency, in areas such as nursing homes and home care, employment offices, colleges, electricity distribution utilities, and physical therapists.

In 2000 the Norwegian Building Research Institute (NBI), in cooperation with the Frisch Centre, wrote an application to the Norwegian Research Council (NFR). The topic to be investigated was the efficiency of the Norwegian construction industry. When the application was accepted I was hired at NBI as a doctoral student. I would like to thank NFR for financing the three years it has taken to write the four essays in this thesis. I am deeply grateful to NBI for offering me the opportunity to be part of the project "Productivity in Construction" (and for providing a very large amount of coffee). Frank Henning Holm (now head of NBI) and Grethe Bergly (now at Multiconsult) deserve thanks for hiring me. In 2001 Thorbjørn Ingvaldsen became leader of the project when Grethe Bergly went to Multiconsult. His encouragement, humour, and patience have been enormous, as is his knowledge of the Norwegian construction industry. Jon Rønning became head of PROS (the department this project is located at) when Frank Henning Holm left to become head of NBI. Jon's support has been without parallel. I would also like to thank my patient and understanding colleagues at NBI for their encouragement.

My thesis advisors Lennart Hjalmarsson and Finn R. Førsund have been the best advisors a doctoral student could ever wish for. Lennart has been very supportive and helped me every time things seemed very difficult. Finn has always been there for me, and his advice and support have been a necessary condition for this thesis to exist. Finn and Lennart's knowledge of microeconomics and efficiency analysis is without doubt world class. Sverre A.C. Kittelsen (Frisch Centre) has been an enormous resource for me. His knowledge of the subtle and difficult parts of efficiency analysis is more than impressive. He has also been invaluable when it comes to the development of the software used for the bootstrap calculations in this thesis.

The members of the reference group for "Productivity in Construction" deserve thanks for understanding that some tasks are worth doing even if they require time: Rolf Albriktsen (Veidekke ASA), Finn R. Førsund (University of Oslo/Frisch Centre), Frank Henning Holm (NBI), Sverre Larsen (BNL), Knut Samset (NTNU), Arild Thommasen (Statistics Norway at Kongsvinger), and Grethe Bergly (Multiconsult).

Last, but not least, I would like to thank my family: my late mother Laila (who died last year), my father Johny, my sister Janne, my aunt Unni, and my uncle Hugo. Their support has been invaluable, and without it this thesis would not exist.
Oslo, November 2004
Dag Fjeld Edvardsen

Contents

1. Abstract
2. Introduction
3. Essay I. International benchmarking of electricity distribution utilities
4. Essay II. Far out or alone in the crowd: classification of self-evaluators in DEA
5. Essay III. Climbing the efficiency stepladder: robustness of efficiency scores in DEA
6. Essay IV. Efficiency of Norwegian construction firms

Abstract

This collection of essays contains two kinds of contributions. All four essays include applications of the existing DEA (Data Envelopment Analysis) toolbox to real-world datasets. But the main contribution is that they also offer new and useful tools for practitioners doing efficiency and productivity analysis.

Essay I is about benchmarking by means of applying the DEA model to electricity distributors. A sample of large electricity distribution utilities from Denmark, Finland, Norway, Sweden and the Netherlands for the year 1997 is studied by assuming a common production frontier for all countries. The peers supporting the benchmark frontier come from all countries. New indices describing cross-country connections at the level of individual peers and their inefficient units as well as between countries are developed, and novel applications of Malmquist productivity indices comparing units from different countries are performed.

The contribution of Essay II is to develop a method for classifying self-evaluators, based on the additive DEA model, into interior and exterior ones. The exterior self-evaluators are efficient "by default"; there is no firm evidence from observations for the classification. These units should therefore not be regarded as efficient, and should be removed from the observations of efficiency scores when performing a two-stage analysis explaining the distribution of the scores. The application to municipal nursing- and home care services of Norway shows significant effects of removing exterior self-evaluators from the data when doing a two-stage analysis.

The robustness of the efficiency scores in DEA is addressed in Essay III. Such robustness is of crucial importance for the practical use of efficiency scores. The purpose is to demonstrate the usefulness of a new way of getting an indication of the sensitivity of each of the efficiency scores to measurement error. The main idea is to investigate a DMU's (Decision Making Unit) sensitivity to sequential removal of its most influential peer (with new peer identification as part of each of the iterations). The Efficiency stepladder approach is shown to provide relevant and useful information when applied to a dataset of Nordic and Dutch electricity distribution utilities. Some of the empirical efficiency estimates are shown to be very sensitive to the validity and existence of one or a low number of other observations in the sample. The main competing method is Peeling, which consists of removing all the frontier units in each step. The new method has some strengths and some weaknesses in comparison. All in all, the Efficiency stepladder measure is simple and crude, but it is shown that it can provide useful information for practitioners about the robustness of the efficiency scores in DEA.

Essay IV is an attempt to perform an efficiency study of the construction industry at the micro level. In this essay information on multiple outputs is utilized by applying DEA to a cross section dataset of Norwegian construction firms. Bootstrapping is applied to select the scale specification of the model.
Constant returns to scale was rejected. Furthermore, bootstrapping was used to estimate and correct for the sampling bias in the DEA efficiency scores. One important lesson that can be learned from this application is the danger of taking the efficiency scores from uncorrected DEA calculations at face value. A new contribution is to use the inverse of the standard errors (from the bias correction of the efficiency scores) as weights in a regression to explain the efficiency scores. Several of the hypotheses investigated concerning the latter are found to have statistically significant empirical relevance.

Introduction

A key paradigm in neo-classical production theory is that firms operate on the production frontier. However, even a superficial observation of real production units indicates that this is most often not the case. It is then rather odd that economists continue to believe in this paradigm, and that so little effort is spent on revealing inefficiencies and their causes. Most of the old tricks learned in microeconomics become invalid, since they adopt the assumption from neo-classical economics that firms behave as if they were technically efficient. A natural starting point for developing methods for the study of productive efficiency is the seminal 1957 paper by Michael J. Farrell with the appropriate title "The measurement of productive efficiency." Farrell's key contribution was introducing a non-parametric method for estimating the efficient production frontier as a reference for his efficiency measures, based on enveloping the data "from above." This approach generalizes naturally to multiple inputs and multiple outputs. The four essays in this thesis are modest attempts to follow up the Farrell tradition as it has been developed both within economics and operations research, where the term DEA was coined in Charnes et al. (1978).

Essay I. International benchmarking of electricity distribution utilities¹

Improvement of efficiency in electricity distribution utilities has come on the agenda, as an increasing number of countries moved towards deregulation of the sector in the last decade. A key element in assessing potentials for efficiency improvement is to establish benchmarks for efficient operation. A standard definition of benchmarking is a comparison of some measure of actual performance against a reference performance. One way of obtaining a comprehensive benchmarking, as opposed to partial key ratios, is to establish a frontier production function for utilities, and then calculate efficiency scores relative to the frontier. In this study a piecewise linear frontier is used, and technical efficiency measures (Farrell, 1957) and Malmquist productivity measures (Caves et al., 1982a) are calculated by employing the DEA model (Charnes et al., 1978). The DEA model has been used in several studies of the utilities sector recently. A special feature of the present cross section study is that the data (for 1997) is based on a sample of utilities from five countries: Denmark, Finland, The Netherlands, Norway and Sweden. Most of the efficiency studies of utilities have focused on utilities within a single country (Førsund and Kittelsen, 1998), but a few studies have also compared utilities from different countries (Jamasb and Pollitt, 2001). In some cases an international basis for benchmarking is a necessity due to the limited number of similar firms within a country.
When the number of units is not the key motivation for an international sample for benchmarking, the motivation may be to ensure that the national best practice utilities are also benchmarked. There are some additional problems with using an international data set for benchmarking. The main problem is that of comparability of data. One is forced to use the strategy of the least common denominator. A special issue is the correct handling of currency exchange rates. There are really only two practical alternatives: the average rates of exchange and the Purchasing Power Parity (PPP) as measured by the OECD. The latter approach is chosen here. Relative differences in input prices like wage rates and rates of return on capital may also create problems in distinguishing between substitution effects and inefficiency.

¹ This essay was published in Resource and Energy Economics in 2003.

According to the findings in Jamasb and Pollitt (2001), international comparisons are often restricted to comparison of operating costs because of the heterogeneity of capital. As a precondition for international comparisons they focus on improving the quality of the data collection process, auditing, and standardization within and across countries. Our data have been collected specifically for this study by national regulators, and special attention has been paid to standardizing the capital input as a replacement cost concept. When doing international benchmarking for the same type of production activity in several countries, applying a common frontier technology seems to yield the most satisfactory environment for identifying multinational peers and assessing the extent of inefficiency. In our exercise for a sample of large electricity distribution utilities from Denmark, Finland, Norway, Sweden and the Netherlands it is remarkable that peers come from all countries. The importance of exposing national units, and especially units that would have been peers within a national technology, to international benchmarking is clearly demonstrated. The multinational setting has called for the development of new indices to capture the cross-country pattern of the nationality of peers and the nationality of units in their referencing sets. Bilateral Malmquist productivity comparisons can be performed between units of particular interest in addition to country origin, e.g. sorting by size or location of utility (urban-rural), etc. We have focused on a single unit against the (geometric) average performance of all units, as well as bilateral comparisons of (geometric) averages of each country. Our results point to Finland as the most productive country within the common technology. This result reflects the more even distribution of the Finnish units and the high share of units above the total sample mean of efficiency scores.

Essay II. Far out or alone in the crowd: classification of self-evaluators in DEA

The DEA method classifies units as efficient or inefficient. The units found strongly efficient in DEA studies of efficiency can be divided into self-evaluators and active peers, depending on whether the peers are referencing any inefficient units or not. The concept of self-evaluators was introduced by Charnes et al. (1985). The contribution of the paper starts with subdividing the self-evaluators into interior and exterior ones. The exterior self-evaluators are efficient "by default"; there is no firm evidence from observations for the classification.
Self-evaluators may most naturally appear at the "edges" of the technology, but it is also possible that self-evaluators appear in the interior. It may be of importance to distinguish between the self-evaluators being exterior or interior. Finding the influence of some variables on the level of efficiency by running regressions of efficiency scores on a set of potential explanatory variables is an approach often followed in actual investigations. Using exterior self-evaluators with an efficiency score of 1 in such a "two-stage" procedure may then distort the results, because assigning the value of 1 to these self-evaluators is arbitrary. Interior self-evaluators, on the other hand, may have peers that are fairly similar. They should then not be dropped when applying the two-stage approach. A method for classifying self-evaluators based on the additive DEA model, either CRS or VRS, is developed. The exterior strongly efficient units are found by running the enveloping procedure "from below", i.e. reversing the signs of the slack variables in the additive model, after removing all the inefficient units from the data set. Which of the strongly efficient units from the additive model turn out to be self-evaluators or active peers will depend on the orientation of the efficiency analysis, i.e. whether input or output orientation is adopted. The classification into exterior and interior peers is determined by which strongly efficient units turn out to be exterior ones when running the "reversed" additive model. The exterior self-evaluators should be removed from the observations on efficiency scores when performing a two-stage analysis explaining the distribution of the scores. The application to municipal nursing- and home care services of Norway shows significant effects of removing exterior self-evaluators from the data when doing a two-stage analysis. Thus the conclusions as to explanations of the efficiency score distribution will be qualified by taking our new taxonomy into use.
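To make the classification idea concrete, here is a minimal sketch of the reversed-envelopment step as we read it: after discarding the inefficient units, each strongly efficient unit is tested against CRS combinations of the efficient units with the slack signs reversed (the reference point uses more input and yields less output), and units for which no positive slacks can be found are flagged as exterior. This is an illustrative reading under stated assumptions, not the author's implementation; all names are hypothetical.

```python
# Minimal sketch of a reversed additive envelopment ("from below").
# Illustrative reading of the procedure, not the author's code.
import numpy as np
from scipy.optimize import linprog

def exterior_self_evaluators(X, Y, tol=1e-6):
    """X: (n, S) inputs, Y: (n, M) outputs of the strongly efficient units.
    Returns a boolean array, True where a unit is classified as exterior."""
    n, S = X.shape
    M = Y.shape[1]
    exterior = np.zeros(n, dtype=bool)
    for i in range(n):
        # Variables: [lambda (n), s_x (S), s_y (M)], all >= 0; maximize slacks.
        c = np.concatenate([np.zeros(n), -np.ones(S + M)])
        # Reversed slack signs: X'lambda - s_x = x_i and Y'lambda + s_y = y_i,
        # i.e. the reference uses more input and produces less output.
        A_eq = np.block([
            [X.T, -np.eye(S), np.zeros((S, M))],
            [Y.T, np.zeros((M, S)), np.eye(M)],
        ])
        b_eq = np.concatenate([X[i], Y[i]])
        res = linprog(c, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (n + S + M), method="highs")
        # Zero maximal slack: no dominating combination from below -> exterior.
        exterior[i] = res.success and -res.fun < tol
    return exterior
```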
Essay III. Climbing the efficiency stepladder: robustness of efficiency scores in DEA

The robustness of the efficiency scores in DEA has been addressed in a number of research papers. There are several potential problems that can disturb precise efficiency estimation, such as sampling error, specification error, and measurement error. It is almost exclusively the latter that is dealt with in this paper. It has been proven analytically that the DEA efficiency estimators are asymptotically consistent, given that a set of assumptions is satisfied. The most critical assumption might be that there are no measurement errors. The DEA method estimates the production possibility set by enveloping the data as closely as possible, in the sense that the frontier consists of convex combinations of actual observations, given that the frontier estimate can never be "below" an observed value. If the assumption of no measurement error is broken we might observe input-output vectors that are outside the true production possibility set, and the DEA frontier estimate will be too optimistic. Calculating the efficiency of a correctly measured observation against this optimistic frontier will lead to efficiency scores that are biased downwards. In other words, even symmetric measurement errors can produce efficiency estimates that are too pessimistic. It is of crucial importance for the practical use of the efficiency scores that information about their sensitivity is available.

The reason why measuring sensitivity is a challenge is in a sense related to the difficulty of looking at n-dimensional space. In two dimensions, and possibly three, one can get an idea of the sensitivity of one observation's efficiency score by visually inspecting a scatter diagram. But when the number of dimensions is higher than three, help is needed. The Efficiency Stepladder method introduced in this paper is offered to empirically oriented DEA applications. This paper is not about detecting outliers; it is about investigating the robustness of each DMU's efficiency score. The main inspiration is Timmer (1971), and the intention is to offer a crude and simple method that works relatively quickly and is available to practitioners as a freely downloadable software package. In the following only DEA-related approaches are considered. There are mainly two ways sensitivity to measurement error in DEA has been examined: (1) perturbations of the observations, often with a strong focus on the underlying LP model, and (2) exclusion of one or more of the observations in the dataset. The Efficiency Stepladder is based on the latter alternative. The main idea is to examine how the efficiency score of a given inefficient DMU develops as the most influential other DMU is removed in each of the iterative steps, as sketched below. The first step is to determine the peer whose removal is associated with the largest increase in the efficiency score. This peer is permanently removed, and the DEA model is recalculated, giving a new efficiency score and a new set of peers. The removal continues in this fashion until the DMU in question is fully efficient. This series of iterative DMU exclusions provides an "efficiency curve" of the increasing efficiency values connected with each step.
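The iteration just described lends itself to a short sketch. The following is a minimal illustration, not the author's downloadable package; it assumes a helper dea_score(X, Y, i) returning the DEA efficiency score of unit i together with the positions of its peers (for instance built on a standard DEA LP solver), and all names are hypothetical.

```python
# Minimal sketch of the Efficiency Stepladder: repeatedly remove the peer
# whose deletion raises the evaluated DMU's score the most, recomputing
# the DEA model each step until the DMU is fully efficient.
import numpy as np

def efficiency_stepladder(X, Y, i, dea_score, tol=1e-6):
    """Scores for unit i (global index) after each iterative peer removal."""
    keep = list(range(X.shape[0]))          # global indices still in the sample
    score, peers = dea_score(X[keep], Y[keep], keep.index(i))
    curve = [score]
    while score < 1.0 - tol and peers:
        best_score, best_peer = -1.0, None
        for p in peers:                     # p is a position within `keep`
            trial = [u for u in keep if u != keep[p]]
            s, _ = dea_score(X[trial], Y[trial], trial.index(i))
            if s > best_score:              # most influential peer so far
                best_score, best_peer = s, keep[p]
        keep.remove(best_peer)              # permanent removal, then recalculate
        score, peers = dea_score(X[keep], Y[keep], keep.index(i))
        curve.append(score)
    return curve
```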
There are few alternative approaches available that provide information about the sensitivity of efficiency scores. Related methods in the literature are Peeling (Barr et al., 1994), Efficiency Order (Sinuany-Stern et al., 1994) and Efficiency Depth (Cherchye et al., 2000). Peeling consists of removing all the frontier units in each step. There are also similarities between the Efficiency stepladder and the Efficiency Order/Efficiency Depth methods. The main difference is that the Efficiency stepladder approach is concerned with the stepwise increase in the efficiency scores after each iterative peer removal, while the Efficiency Order/Efficiency Depth methods are more concerned with the number of observation removals that are required for the DMU in question to reach full efficiency.

The empirical application is mainly used as an illustration of how the Efficiency stepladder method works on real-world data. The application is used to show what kind of analysis can be performed using this method. Carrying out a full-scale empirical analysis is an extensive undertaking, and is outside the scope of this paper. Ideally, sensitivity analysis, detection of potential outliers, and estimation of sampling bias should be carried out simultaneously. It is easier to detect outliers if we have some information about the sampling bias, and it is easier to estimate sampling bias if we have first identified the outliers. There have been developments in all these areas in the last few years, but at the time of writing no single method offers a solution to all the mentioned challenges. The Efficiency stepladder method is simple and crude, but it can still be useful for applied DEA investigations.

It should be thought of as one-way safe: an Efficiency stepladder that is very steep is a clear indication that the DEA-estimated efficiency is strongly dependent on the correctness of a low number of other observations. A slow increase, on the other hand, should not be interpreted as a strong indication that the efficiency is at least this low. The reason is that the method is only one-step-optimal. In addition to measuring the sensitivity of the efficiency scores for efficient and inefficient units, it might be used in combination with bootstrapping to identify possible outliers. The necessary software for carrying out the Efficiency stepladder calculations will be made available from the author's website.

The purpose of the ESL method is to examine the sensitivity of the efficiency scores to measurement errors. Bootstrapping, on the other hand, is in the DEA context (primarily) used to measure sensitivity to sampling errors. We would expect that a DMU with a large ESL(1) value would also have a large standard error of the bias-corrected efficiency score. The reason is that we expect the part of the (input, output) space where the DMU is located to be sparsely populated. Tentative runs have shown statistically significant and positive correlation between the ESL(1) values and the standard errors of the bootstrapped bias-corrected efficiency scores. Furthermore, there is a strong empirical association between the ESL(1) values for the fully efficient DMUs (= superefficiency) and the sampling bias estimated using bootstrapping. This is a promising topic for further research.

Essay IV. Efficiency of Norwegian construction firms

Low productivity growth of the construction industry in the nineties (based on national accounting figures) is causing substantial concern in Norway. To identify the underlying causes, investigations at the micro level are needed. However, efficiency studies at the micro level of the construction industry are very rare. The objective of this study is to analyze productive efficiency in the Norwegian construction industry. A piecewise linear frontier is used, and technical efficiency measures (Farrell, 1957) are calculated on cross section data following a DEA (data envelopment analysis) approach (Charnes et al., 1978). The DEA efficiency scores are bias corrected by bootstrapping (Simar and Wilson, 1998, 2000), and a bootstrapped scale specification test is performed (Simar and Wilson, 2002). A new contribution is to use weights based on the standard errors from the bootstrapped bias correction in the two-stage model when searching for explanations for the efficiency scores.

One reason for the small number of efficiency analyses of the construction industry may be the problem of "identifying" the activities in terms of technology, inputs and outputs in this industry. It is well known that there are large organizational and technological differences between building firms. Even when the products are seemingly similar there are large differences in the way projects are carried out. For instance, some building projects use a large share of prefabricated elements, while other projects produce almost everything on the building site. This often happens even when the resulting construction is seemingly similar. It is interesting to note that projects with such large differences in the technological approach can exist at the same time. Moreover, the composition of output varies a lot between different construction companies, so the definition of the output vector may also be a problem.
Thus to capture such industry characteristics, a multiple input, multiple output approach is required. Large differences in the efficiency and productivity scores were discovered. One important lesson that can be learned from this application is the danger of taking the efficiency scores from uncorrected DEA calculations at face value. If one decided to learn from a few DMUs based on their uncorrected efficiency scores, one might get into trouble. It is not unreasonable to think that similar things have happened in the last few years as DEA has been embraced by a very large number of practitioners (researchers and consultants). It would be interesting if the large number of empirical DEA papers were recalculated using the bootstrap methodology. Anecdotal observations indicate that very few practitioners use bootstrapping. The reason for this might be that bootstrapping is not yet available in the standard DEA software packages.

Based on a scale specification test, a variable returns to scale specification was selected. A scale chart indicated that firms with total production values lower than NOK 100 million might be operating at a suboptimal scale level. The differences in the efficiency scores may be explained by environmental and managerial variables. Such variables have been tried in a two-stage approach. A new contribution is the demonstration of how one can use the standard errors from the bias correction in stage one to improve the power of the regression model in stage two. Five possible explanations were examined for empirical relevance, and four of them were found to be statistically significant in a multivariate weighted regression setting. More detailed data would be necessary before strong conclusions can be made, but there are indications that the most efficient building firms are characterized by high average wages, low numbers of apprentices, diversified product mixes and high numbers of hours worked per employee.
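As a rough illustration of the two-stage idea, the sketch below regresses bias-corrected scores on candidate explanatory variables with weights equal to the inverse standard errors, as described above. It is a hedged sketch with hypothetical names, not the author's code; note also that statsmodels' WLS interprets weights as proportional to the inverse of the error variance, so squaring the standard errors would be the textbook alternative.

```python
# Minimal sketch of the second-stage weighted regression described above.
import numpy as np
import statsmodels.api as sm

def second_stage_wls(eff_corrected, Z, se):
    """eff_corrected: bias-corrected scores (n,); Z: (n, k) explanatory
    variables; se: bootstrap standard errors of the corrected scores."""
    Zc = sm.add_constant(np.asarray(Z))          # intercept term
    weights = 1.0 / np.asarray(se)               # inverse standard errors, as in the essay
    return sm.WLS(np.asarray(eff_corrected), Zc, weights=weights).fit()

# Usage: res = second_stage_wls(eff, Z, se); print(res.summary())
```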
References

Barr, R.S., Durchholz, M.L. and Seiford, L., 1994, Peeling the DEA onion: layering and rank-ordering DMUs using tiered DEA, Southern Methodist University technical report, Dallas, Texas.

Caves, D.W., Christensen, L.R. and Diewert, W.E., 1982a, The economic theory of index numbers and the measurement of input, output, and productivity, Econometrica 50, 1393-1414.

Caves, D.W., Christensen, L.R. and Diewert, W.E., 1982b, Multilateral comparisons of output, input, and productivity using superlative index numbers, Economic Journal 92, 73-86.

Charnes, A., Cooper, W.W. and Rhodes, E., 1978, Measuring the efficiency of decision making units, European Journal of Operational Research 2, 429-444.

Charnes, A., Cooper, W.W., Lewin, A.Y., Morey, R.C. and Rousseau, J.J., 1985, Sensitivity and stability analysis in DEA, Annals of Operations Research 2, 139-150.

Cherchye, L., Kuosmanen, T. and Post, G.T., 2000, New tools for dealing with errors-in-variables in DEA, Katholieke Universiteit Leuven, Center for Economic Studies, Discussion Paper Series DPS 00.06.

Farrell, M.J., 1957, The measurement of productive efficiency, Journal of the Royal Statistical Society, Series A 120, 253-281.

Førsund, F.R. and Kittelsen, S.A.C., 1998, Productivity development of Norwegian electricity distribution utilities, Resource and Energy Economics 20(3), 207-224.

Jamasb, T. and Pollitt, M., 2001, Benchmarking and regulation: international electricity experience, Utilities Policy 9(3), 107-130.

Simar, L. and Wilson, P.W., 1998, Sensitivity analysis of efficiency scores: How to bootstrap in nonparametric frontier models, Management Science 44, 49-61.

Simar, L. and Wilson, P.W., 2000, A general methodology for bootstrapping in nonparametric frontier models, Journal of Applied Statistics 27, 779-802.

Simar, L. and Wilson, P.W., 2002, Nonparametric tests of returns to scale, European Journal of Operational Research 139, 115-132.

Sinuany-Stern, Z., Mehrez, A. and Barboy, A., 1994, Academic departments efficiency via DEA, Computers & Operations Research 21(5), 543-556.

Timmer, C.P., 1971, Using a probabilistic frontier production function to measure technical efficiency, Journal of Political Economy 79(4), 776-794.

INTERNATIONAL BENCHMARKING OF ELECTRICITY DISTRIBUTION UTILITIES∗

by

Dag Fjeld Edvardsen
The Norwegian Building Research Institute, Forskningsvn. 3 b, P.O. Box 123, Blindern, 0314 Oslo, Norway

and

Finn R. Førsund±
Department of Economics, University of Oslo, and the Frisch Centre, P.O. Box 1095, Blindern, 0317 Oslo, Norway

Abstract: Benchmarking by means of applying the DEA model is appearing as an interesting alternative for regulators under the new regimes for electricity distributors. A sample of large electricity distribution utilities from Denmark, Finland, Norway, Sweden and the Netherlands for the year 1997 is studied by assuming a common production frontier for all countries. The peers supporting the benchmark frontier are from all countries. New indices describing cross-country connections at the level of individual peers and their inefficient units as well as between countries are developed, and novel applications of Malmquist productivity indices comparing units from different countries are performed.

Key words: Electricity distribution utility, benchmarking, efficiency, DEA, Malmquist productivity index

JEL classification: C43, C61, D24, L94.

∗ The study is done within the research project "Efficiency in Nordic Electricity Distribution" at the Frisch Centre, financed by the Nordic Economic Research Council. Finn R. Førsund was visiting fellow at ICER during the fall 2001 and spring 2002 when completing the paper. We are indebted to a group of Danish, Dutch, Finnish, Norwegian and Swedish electricity regulators for cooperation and comments on earlier drafts at project meetings in Denmark, Norway, Finland and the Netherlands. We will especially thank Susanne Hansen, Kari Lavaste and Victoria Shestalova for written comments. We are indebted to Sverre A.C. Kittelsen for valuable comments on the last draft, and a referee for stimulating further improvements. The electricity regulators, headed by Arne Martin Torgersen and Eva Næss Karlsen from NVE, have done extensive work on data collection. However, notice that the responsibility for the final model choice and focus of the study rests with the authors. Furthermore, the analysis is only addressing technical efficiency measurement, and in particular not cost efficiency. The study is not intended for regulatory purposes.

± Corresponding author. Tel.: +47-2285-5132; fax: +47-2285-5035. Email address: [email protected] (F.R. Førsund).

1. Introduction

Improvement of efficiency in electricity distribution utilities has come on the agenda, as an increasing number of countries moved towards deregulation of the sector in the last decade. A key element in assessing potentials for efficiency improvement is to establish benchmarks for efficient operation. A standard definition of benchmarking is a comparison of some measure of actual performance against a reference performance. One way of obtaining a comprehensive benchmarking, as opposed to partial key ratios, is to establish a frontier production function for utilities, and then calculate efficiency scores relative to the frontier.
In this study a piecewise linear frontier is used, and technical efficiency measures (Farrell, 1957) and Malmquist productivity measures (Caves et al., 1982a) are calculated by employing the DEA model (Charnes et al., 1978). The DEA model has been used in several studies of the utilities sector recently (see a review in Jamasb and Pollitt, 2001). A special feature of the present cross section study is that the data (for 1997) is based on a sample of utilities from five countries: Denmark, Finland, The Netherlands, Norway and Sweden. Most of the efficiency studies of utilities have focused on utilities within a single country (Førsund and Kittelsen, 1998), but a few studies have also compared utilities from different countries (Jamasb and Pollitt, 2001). In some cases an international basis for benchmarking is a necessity due to the limited number of similar firms within a country. When the number of units is not the key motivation for an international sample for benchmarking, the motivation may be to ensure that the national best practice utilities are also benchmarked.¹

There are some additional problems with using an international data set for benchmarking. The main problem is that of comparability of data. One is forced to use the strategy of the least common denominator. A special issue is the correct handling of currency exchange rates. There are really only two practical alternatives: the average rates of exchange and the Purchasing Power Parity (PPP) as measured by the OECD. The latter approach is chosen here. Relative differences in input prices like wage rates and rates of return on capital may also create problems in distinguishing between substitution effects and inefficiency.

¹ An alternative is to use hypothetical units based on engineering information, as mentioned already in Farrell (1957). In Chile and Spain hypothetical model best practice units are used for benchmarking (Jamasb and Pollitt, 2001).

According to the findings in Jamasb and Pollitt (2001), international comparisons are often restricted to comparison of operating costs because of the heterogeneity of capital. As a precondition for international comparisons they focus on improving the quality of the data collection process, auditing, and standardization within and across countries. Our data have been collected specifically for this study by national regulators, and special attention has been paid to standardizing the capital input as a replacement cost concept. Regarding the extent of international studies, Jamasb and Pollitt (2001) found that 10 of the countries covered in the survey (OECD and some non-OECD countries) have used some form of benchmarking, and about half of these use the frontier-oriented methods DEA, Corrected Ordinary Least Squares (COLS) and the Stochastic Frontier Approach (SFA). They predict that benchmarking is likely to become more common as more countries implement power sector reforms. (For an opposing view, see Shuttleworth, 1999.)

The rest of the paper is organized in the following way: In Section 2 the DEA model is introduced and new indices are developed to capture the cross-country pattern of the nationality of peers and the nationality of units in their sets of associated inefficient units. Malmquist productivity approaches are developed for cross section international comparisons. In Section 3 the theory of distribution of electricity as production is briefly reviewed with regard to the choice of variable specification.
Structural differences between the countries revealed by the data are illustrated. The results on efficiency distributions and inter-country productivity differences using Malmquist indices are presented in Section 4. Conclusions and further research options are offered in Section 5.

2. The methodological approach

2.1. The DEA model

As a basis for benchmarking we will employ a piecewise linear frontier production function exhibiting the transformations between outputs, $y_m$ ($m = 1,\dots,M$), and the substitutions between inputs, $x_s$ ($s = 1,\dots,S$). We will assume constant returns to scale (CRS). The frontier envelops the data as tightly as possible, and observed utilities, termed best practice, will form the benchmarking technology. The Farrell technical efficiency measures are calculated simultaneously with determining the nature of the envelopment, subject to basic properties of the general transformation of inputs into outputs (Färe and Primont, 1995). The efficiency scores for the input-oriented DEA model, $E_i$ for utility no. $i$ ($i \in N$ = set of units), are found by solving the following linear program:

$$
\begin{aligned}
E_i = \min \theta_i \quad \text{s.t.} \quad
& \sum_{j \in N} \lambda_{ij} y_{mj} - y_{mi} \geq 0, && m = 1,\dots,M \\
& \theta_i x_{si} - \sum_{j \in N} \lambda_{ij} x_{sj} \geq 0, && s = 1,\dots,S \\
& \lambda_{ij} \geq 0, && j \in N
\end{aligned}
\tag{1}
$$

The point $\left(\sum_{j \in N} \lambda_{ij} x_{1j},\dots,\sum_{j \in N} \lambda_{ij} x_{Sj},\; \sum_{j \in N} \lambda_{ij} y_{1j},\dots,\sum_{j \in N} \lambda_{ij} y_{Mj}\right)$ is on the frontier and is termed the reference point. In the CRS case the input- and output-oriented scores are identical. However, we may need to keep non-discretionary variables fixed when calculating the efficiency scores. Then, in the case of an output being fixed, the input-oriented model (1) and the scores remain the same. But if one of the inputs is fixed, the efficiency correction of that input constraint in (1) is dropped and the numerical results for the efficiency scores may be different.²

² Correspondingly, an output-oriented model will be different if one of the outputs is fixed (but not if one of the inputs is fixed), since the constraint involving this variable will be reformulated to hold without the efficiency correction of the output variable for the unit being investigated.

2.2. The Peers

The efficient units identified by solving the problem (1) are defined as peers if the efficiency score is 1 and all the output and input constraints in (1) are binding. Each inefficient unit will be related to one or more benchmark or peer units. Let $P$ be the set of peers and $I$ the set of inefficient units, $P \cup I = N$. A Reference set or Peer group set for an inefficient unit $i$ (Cooper, Seiford and Tone, 2000) is defined as:

$$P_i = \{p \in P : \lambda_{ip} > 0\}, \quad i \in I \tag{2}$$

Each inefficient unit, $i$, has a positive weight, $\lambda_{ip}$, associated with each of its peers, $p$, from the solution of the DEA model (1). The weights, $\lambda_{ip}$, are zero for inefficient units not having unit $p$ as a peer. Since all peers have the efficiency score of one, there is a need to discriminate between peers as to their importance as role models. Measures used in the literature are a pure count measure based on the number of peer group sets (2) that a peer is a member of, a Super-Efficiency measure for a peer calculated against a frontier recalculated without this peer in the data set supporting the frontier (Andersen and Petersen, 1993), and a Peer index (Torgersen et al., 1996) showing the importance of a peer as a role model, based on the share of the input savings of the inefficient units referenced by a peer, weighted by the weights $\lambda_{ip}$ found by solving (1).
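For concreteness, the input-oriented CRS program (1) can be solved unit by unit with an off-the-shelf LP solver, and the optimal weights then also identify the peers of Section 2.2. The following is a minimal sketch under those assumptions; the function name and data layout are illustrative, not code from the study.

```python
# Minimal sketch of the input-oriented CRS DEA program in Eq. (1).
# Decision variables are [theta, lambda_1, ..., lambda_n].
import numpy as np
from scipy.optimize import linprog

def dea_input_crs(X, Y, i):
    """X: (n, S) inputs, Y: (n, M) outputs, rows = units.
    Returns (E_i, lambda vector) for unit i."""
    n, S = X.shape
    M = Y.shape[1]
    c = np.zeros(n + 1)
    c[0] = 1.0                                # minimize theta
    A_ub, b_ub = [], []
    for m in range(M):                        # sum_j lambda_j y_mj >= y_mi
        row = np.zeros(n + 1)
        row[1:] = -Y[:, m]
        A_ub.append(row)
        b_ub.append(-Y[i, m])
    for s in range(S):                        # theta x_si - sum_j lambda_j x_sj >= 0
        row = np.zeros(n + 1)
        row[0] = -X[i, s]
        row[1:] = X[:, s]
        A_ub.append(row)
        b_ub.append(0.0)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.x[0], res.x[1:]

# The peers of an inefficient unit are the units with positive weights:
# E, lam = dea_input_crs(X, Y, i); peers = np.flatnonzero(lam > 1e-9)
```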
2.3. Cross-group influence of peers

For our situation with units from different countries we are more interested in developing measures that show the interconnections between peers and inefficient units from different countries. We will need to consider a peer and the set of inefficient units that are referenced by the peer. We will term this apparently new set in the literature, $I_p$, the Referencing set for a peer, $p$:

$$I_p = \{i \in I : \lambda_{ip} > 0\}, \quad p \in P \tag{3}$$

One approach is to focus on the country distribution of the inefficient units in a peer's referencing set. Units must now be identified by country. Let $L$ be the set of countries and $I^q$ the set of inefficient units of country $q$ ($\cup_{q \in L} I^q = I$). Partitioning the Referencing set (3) by grouping the inefficient units according to country yields:

$$I_p^q = \{i \in I^q : \lambda_{ip} > 0\}, \quad p \in P, \; q \in L, \qquad \cup_{q \in L} I_p^q = I_p \tag{4}$$

Let the number of units in the Referencing set (3) be $\#I_p$, the number of units in the set (4) be $\#I_p^q$, and the set of peers from country $q$ be $P^q$ ($\cup_{q \in L} P^q = P$). The Degree of peer localness index, $DL_p^q$, for peer $p$ in country $q$ is then defined as:

$$DL_p^q = \frac{\#I_p^q}{\#I_p}, \quad p \in P^q, \; q \in L \tag{5}$$

The index varies between zero and one. Zero means that the peer is "extreme-international", only referencing inefficient units from other countries, and one means that the peer is "extreme-national", only referencing inefficient units from its own country.

In Schaffnit et al. (1997) a count measure was developed describing the number of inefficient units belonging to a group referenced by peers from another group, relative to the total number of units of the first group (maybe the number of inefficient units would be more appropriate). In order to obtain more detailed information we will instead develop an index for Cross-country peer importance by using characteristics of the inefficient units, analogous to the Peer index mentioned above. In the case of input orientation³ the index, $\rho_{qr}^s$, can be established by weighting the saving potential of an input, $s$, for the inefficient units from a country, $q$ ($= x_{ks}(1 - E_k), \; k \in I^q$), with the relevant $\lambda_{kp}$-weights associated with peers from another country, $r$ ($p \in P^r$), being in the peer group set of the inefficient units from country $q$, and then comparing with the total saving potential of all inefficient units in country $q$:

$$\rho_{qr}^s = \frac{\sum_{k \in I^q} \sum_{p \in P^r} \left(\lambda_{kp} \big/ \sum_{p' \in P} \lambda_{kp'}\right) x_{ks}(1 - E_k)}{\sum_{k \in I^q} x_{ks}(1 - E_k)}, \quad s = 1,\dots,S, \; q, r \in L \tag{6}$$

The weights in the numerator are normalized with the sum of weights over all peers for the inefficient unit $k$ from country $q$. In the variable returns to scale case this sum is restricted to be one, but not in the CRS case we are working with. This index will be input (output) variable specific, as is the case for the Peer index. The maximal value of the index is 1. This will be the case if peers belonging to country $r$ reference all the inefficient units of country $q$, and these units are not referenced by peers from any other country. The minimal index value of zero is obtained if peers from country $r$ do not reference any inefficient unit from country $q$.

³ An output-oriented Cross-country peer index can be formulated analogously, following the definition of the Peer index in Torgersen et al. (1996) for output orientation.
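A minimal sketch of how the Degree of peer localness index (5) can be computed from the weights of model (1) follows; the data layout and names are illustrative assumptions, not code from the study.

```python
# Minimal sketch of the Degree of peer localness index in Eq. (5),
# computed from the lambda weights of model (1).
import numpy as np

def degree_of_localness(lam, country, peers, inefficient, tol=1e-9):
    """lam[i, p]: weight of unit p in the reference point of unit i;
    country: country code per unit; peers/inefficient: index lists."""
    dl = {}
    for p in peers:
        # Referencing set I_p of Eq. (3): inefficient units using peer p
        I_p = [i for i in inefficient if lam[i, p] > tol]
        if not I_p:
            continue                      # a self-evaluator references no one
        # I_p^q of Eq. (4): members from the peer's own country
        own = sum(1 for i in I_p if country[i] == country[p])
        dl[p] = own / len(I_p)            # Eq. (5): 1 = extreme-national
    return dl
```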
The output- and input vectors of a unit, j, are written q∈L y j = ( y j 1 ,.., y jM ) , x j = ( x j 1 ,.., x jS ) , j ∈ N . The Malmquist productivity index, M kq,l , for the two units k and l from country q and r respectively, is: Elq ( yl , xl ) M ( yk , xk , yl , xl ) = q , k ∈ Nq, l ∈ Nr, q∈ L Ek ( yk , xk ) q k ,l (7) The Malmquist index is the ratio of the Farrell technical efficiency measures for the two units, as calculated by solving the program (1). The superscript on the indexes shows the 3 An output-oriented Cross-country peer index can be formulated analogously following the definition of the Peer index in Torgersen et al. (1996) for output orientation. 7 reference technology base (relevant for one of the units being compared, i.e. q means that the efficiency measures are calculated with respect to the frontier for country q). We follow the convention of having the unit indicated first in the subscript of the Malmquist index on the lhs of (7) in the denominator and the second in the numerator, thus unit l is more productive than unit k if M kq,l > 1, and vice versa. If it is appropriate to operate with different reference technologies for countries, following Färe et al. (1994) the Malmquist index can be decomposed multiplicatively into a term reflecting each unit catching up with its reference technology, and a term reflecting the distance between the two reference technologies.4 Since we are dealing with countries it may also be of interest to compare productivity levels between countries. The crucial point concerning how to construct indices for comparisons is the assumption about production technologies. There are two basic alternatives: i) A common frontier technology may be assumed, allowing utilities from different countries to support the DEA envelope. ii) The technologies are national, i.e. only own country units may be best practice ones. 2.5. Common inter- country technology As pointed out in Caves et al. (1982b) it is an advantage to use a circular index when comparing productivities of two countries (units). Berg et al. (1992), (1993), and Førsund (1993) demonstrate that the Malmquist index (7) is not circular (see also the general discussion in Førsund, 2002). In the case of the same frontier technology being valid for all countries, corresponding to assumption i) above, the index is then circular. The calculation of the Malmquist productivity index is greatly simplified, since the benchmark technology will be common for all productivity calculations. The notation of the expressions below is simplified by removing the technology index. A useful characterization of the productivity of a unit, k, may be obtained by comparing the efficiency score for this unit with the geometric mean of all the other scores, following up Caves et al. (1982b), (p. 81, Eq. (34)), where the productivity of one unit was measured against the geometric mean of the productivities of all units: 4 An application of such decomposition in a study of Norwegian electricity distributors is found in Førsund and Kittelsen (1998). 8 Mk = [Π l∈N El ( yl , xl )] Ek ( yk , xk ) 1/# N , k∈N (8) where #N is the total number of all utilities. This geometric mean-based Malmquist index is a function of all observations. 
To focus on bilateral productivity comparisons between countries, one way of formulating this is to compare the geometric means of efficiencies over units for each country, $q$ and $r$, symbolized by the sub-index $g(r,q)$:

$$M_{g(r,q)} = \frac{\left[\prod_{k \in N^q} E_k(y_k, x_k)\right]^{1/\#N^q}}{\left[\prod_{l \in N^r} E_l(y_l, x_l)\right]^{1/\#N^r}}, \quad q, r \in L \tag{9}$$

where $\#N^q$ and $\#N^r$ are the total numbers of utilities within country $q$ and $r$ respectively. This geometric mean-based Malmquist index is a function of all the observations in countries $r$ and $q$. The index may be termed the bilateral country productivity index, and is circular, in the sense that the index is invariant with respect to which third country's efficiency score average we may wish to compare with countries $q$ and $r$. If we want to express how, on average, the units within a country, $q$, are doing compared with the average over all units, the country-$r$-specific index in the denominator of (9) can be substituted with the geometric average of the efficiency scores of all the utilities, i.e. the denominator in (8). The geometric mean of efficiencies for units within a country, symbolized by the sub-index $g(q)$, is compared with the geometric mean over all units:

$$M_{g(q)} = \frac{\left[\prod_{k \in N^q} E_k(y_k, x_k)\right]^{1/\#N^q}}{\left[\prod_{l \in N} E_l(y_l, x_l)\right]^{1/\#N}}, \quad q \in L \tag{10}$$
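The geometric-mean-based comparisons (8)-(10) reduce to simple ratio computations on the vector of efficiency scores. A minimal sketch under the common-frontier assumption follows; names are illustrative.

```python
# Minimal sketch of the geometric-mean-based comparisons in Eqs. (8)-(10),
# computed from the efficiency scores under the common frontier.
import numpy as np

def gmean(v):
    """Geometric mean of a vector of efficiency scores."""
    return np.exp(np.mean(np.log(np.asarray(v))))

def unit_vs_sample(E):
    """Eq. (8): each unit's score over the geometric mean of all scores."""
    return np.asarray(E) / gmean(E)

def bilateral_country_index(E, country, q, r):
    """Eq. (9): country q's geometric mean over country r's."""
    E = np.asarray(E)
    in_q = np.array([c == q for c in country])
    in_r = np.array([c == r for c in country])
    return gmean(E[in_q]) / gmean(E[in_r])

def country_vs_sample(E, country, q):
    """Eq. (10): country q's geometric mean over the whole-sample mean."""
    E = np.asarray(E)
    in_q = np.array([c == q for c in country])
    return gmean(E[in_q]) / gmean(E)
```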
3. Model specification and data

3.1. Distribution as production

In their review of transmission and distribution efficiency studies, Jamasb and Pollitt (2001) point to the variety of variables that have been used as an indication that there is no firm consensus on how the basic functions of electric utilities are to be modeled as production activities. However, they mention that this may, to some extent, be explained by the lack of data. Modeling the production activity of transportation of electricity has old traditions within engineering economics (see e.g. Førsund (1999) for a review). On a general abstract level the outputs of distribution utilities are the energy delivered through a network of lines and transformers to the consumption nodes of the network, and losses in lines and transformers. The inputs are the energy received by the utility, real capital in the form of lines and transformers, and labor and materials used for general distribution activities. Due to the high number of customers for a standard utility it is impossible to implement the conceptualization of a multi-output production function to the full extent. The usual approximation is to operate with total energy delivered and number of customers separately as outputs (Salvanes and Tjøtta, 1994). The latter variable is also often used in engineering studies as the key dimensioning output variable, and taken as the absolute size of a utility (Weiss, 1975). In engineering studies the load density may be a characterization of capital. Load density is the product of customer density and coincident peak load per customer (kWh per square mile). The maximum peak load may also describe capital as a quality attribute, or be used as an output attribute characterizing energy delivered.

In the short run the utilities take the existing lines, transformer capacity and the geographical distribution and number of customers as given. But, as pointed out in Neuberg (1977), this is not the same as saying that these variables must be regarded as constants in our analysis. Past decisions reflected in configurations of lines and transformers may give rise to current differences in efficiency. These variables, which are exogenous for the firm, may be seen as endogenous from the point of view of society. Even distribution jurisdictions can be rearranged, making the number of customers endogenous. The role of lines varies. Lines can be regarded as a capital input, but they are also used as a proxy for the geographical extent of the service area. For a fixed geographical distribution of customers the miles of distribution line would be approximately set (but note the possibilities of inefficient configurations), thus line length may serve as a proxy for service area. The service area can be measured in different ways. The problem is to find a measure the utility cannot influence (see Kittelsen (1993) and Langset and Kittelsen (1997)). Due to the probability of wire outage and the cost of servicing, the extent of the customer area will influence distribution costs. Non-traditional variables such as size of service area may also be used to specify differences in the production system or technology from utility to utility. According to the extensive review in Jamasb and Pollitt (2001) the most frequently used inputs are operating costs, number of employees, transformer capacity, and network length. The most widely used outputs are units of energy delivered, number of customers, and size of service area.

3.2. Choice of model specification

Concerning our choice of input variables, it has not been possible to use a volume measure of labor due to the lack of this information for one country (Denmark). Instead a cost measure has been adopted. Labor cost, other operating costs and maintenance have been aggregated to total operating and maintenance costs (TOM). We then face the problem mentioned in the introduction about national differences in wages for labor. It has been chosen to measure TOM in Swedish (SEK) prices.

A measure for real capital volume has been established for 1997 by the involved regulators by first creating, for the sample utilities, a physical inventory of existing real capital in the form of length of types of lines (air, underground and sea) distributed in three classes according to voltage, categories of transformers according to type (distribution, main) and capacity in kV, transformer kiosks for distribution, and transformer stations for main transformers. The number of capital items for each country has been in the range of 60 to 100. As a measure of real capital the replacement value (RV) is the theoretically correct measure (Johansen and Sørsveen, 1967). To obtain such a measure, aggregation over the categories has been necessary due to the large number of items. The same weights should be used for all countries, i.e. using national prices will not yield a correct picture if prices differ. It has been chosen to use Norwegian prices for all countries. A more preferred set of weights may be average prices for all countries, but it has not been feasible to establish such a database for this study. Although lines and transformers have been used separately as inputs in the literature (see e.g. Hjalmarsson and Veiderpass (1992a), (1992b) and Jamasb and Pollitt, 2001), the groups have been aggregated into a single capital volume measure in this study, partly due to different classification systems used by the countries.

We will simplify on the energy input side and only use the loss in MWh in the system as a proxy for the energy input. This variable will also capture a quality component of the distribution system.
A problem is that data on losses may be measured with less precision due to measuring periods not coinciding with the calendar year. For some countries an average loss for the last three years is used, while the loss for the last year or its estimate is used for other countries. On the output side energy delivered and the number of customers are used as outputs. The countries have information on low and high voltage, but since the classification of high and low voltage differs, we had to use the aggregate figures.

Some measure of the geographical configuration of the distribution networks should also be included for a relevant analysis of efficiency. In this study the total length of distribution lines is the only available measure for service area. In addition to service area the density of customers of a distributor is usually considered to influence efficiency. But when using the absolute number of customers and energy delivered as separate outputs there is no room for an additional density variable of the type energy per customer. By the nature of the radial efficiency measure, the reference point on the frontier has the same energy-per-customer density as the observation in question. The countries involved have very different population densities. But it is not so obvious how this will influence efficiency in distribution. A rural distributor in Norway may serve a community located along a valley bottom with people living fairly close to each other, while the geographical area of the municipality may include vast uninhabited areas of mountains and forests above the valley floor. A densely populated area in the Netherlands may not necessarily save on lines per unit of area if low-rise housing dominates.

3.3. The data structure

An overview of key characteristics of the data is presented in Table 1. The difference in size between utilities is large, as revealed by the last two columns.

Table 1. Summary statistics. Cross-section 1997. Number of units: 122.

|              | Average | Median  | Standard deviation | Minimum | Maximum   |
|--------------|---------|---------|--------------------|---------|-----------|
| TOM (kSEK)   | 152388  | 97026   | 182923             | 11274   | 981538    |
| LossMWh      | 91449   | 52318   | 104777             | 7020    | 615281    |
| RV (kSEK)    | 2826609 | 1907286 | 3288382            | 211789  | 22035846  |
| NumCust      | 109260  | 55980   | 163422             | 20035   | 1052096   |
| TotLines     | 7640    | 4948    | 8824               | 450     | 54166     |
| MWhDelivered | 2110064 | 1003472 | 2815025            | 166015  | 178054730 |

A summary of the structure of the data of the individual countries is shown in the radar diagram in Figure 1, where country averages relative to the total sample averages (the 100% contour) are portrayed. By using the contour curves for percentages, relative comparisons can also be done between countries. The domination in size of the Netherlands is obvious in all dimensions except for energy delivered. The Netherlands is especially large in number of customers, but also in replacement value. It is relatively smaller in length of lines. Norway is largest with respect to energy delivered and also correspondingly large in energy loss, although with a smaller value than the Netherlands. Sweden stands out with relatively high operating and maintenance costs (TOM), while Finland stands out with a high number for length of lines. Denmark has the smallest number for length of lines and energy loss, and has a relatively high number of customers. The combinations of number of customers and length of line show the highest customer density in the Netherlands, with Denmark second, and the lowest density in Finland.
Figure 1. The average structure of the countries (radar diagram over Opex/TOM, LossMWh, RV, NumCust, TotLines and MWhDelivered, relative to the 100% sample-average contour; Denmark 24 units, Finland 25 units, Sweden 42 units, Norway 16 units, Netherlands 15 units).

4. The results

4.1. Efficiency scores

The distribution of efficiency scores⁵ for model (1) is shown in Figure 2. The units for each country are grouped together and sorted according to ascending values of the efficiency score. Each bar represents a unit, an electricity distribution utility company. The size of each unit, measured as total operating and maintenance costs (TOM) (including labor costs), is proportional to the width of each bar.⁶ The efficiency score is measured on the vertical axis and the TOM values measured in SEK (in 1000) are accumulated on the horizontal axis.

Figure 2. Country distribution of efficiency scores (bar chart per country with efficiency score E on the vertical axis and accumulated size in TOM on the horizontal axis; legend: Common, Local, Geometric mean).

⁵ The efficiency score values are given in Edvardsen and Førsund (2002). One Dutch unit has been removed from the original data set after performing a sensitivity test and considering the atypical structure of the unit. Notice that service area may be regarded as a fixed non-discretionary variable without any consequence for the values of the efficiency scores since input orientation is adopted, cf. the discussion below the model (1).

⁶ The regulators chose this input-based size measure as being most relevant to them. Other candidates for size variables are mentioned in Section 3. It does not matter much which one is chosen for the purpose of getting information about the location of units according to size.

As a general characterization the units are distributed in the interval from 0.44 to 1, and the share of TOM of fully efficient units is rather small, representing about 5 percent of accumulated TOM. When looking at the country distributions it is remarkable that all countries have fully efficient units. This supports the use of a common technology, in the sense that no country is completely dominated by another, and all countries contribute to spanning the frontier.

There are two aspects that the figure sheds light on: the size of the efficient units, measured by the input total operating costs, and how the efficient units stand out in the country-specific distributions. For the three countries Denmark, the Netherlands and Sweden, the efficient units are quite small compared to the average size within each country. This is especially striking for the Netherlands, which has the most pronounced dichotomy in size, with one group of large units and another of considerably smaller ones. The units within the group of large units have about equal efficiency levels, while the group of small units has units both at the least efficient part and the most efficient part of the distribution. The least efficient units have an efficiency score only half that of the average. For Finland and Norway the efficient units are closer to medium size (disregarding the large Norwegian self-evaluator). The Swedish distribution is characterized by an even distribution of efficiency scores, with large units at the upper end of the inefficiency distribution, and medium- and small-sized units evenly located over the entire distribution. The inefficient units with the highest efficiency scores have values quite a bit lower than 1 for Denmark, the Netherlands and Norway, while the values are much closer to the fully efficient ones in Finland and Sweden. The Norwegian distribution has no marked size pattern, but has a much narrower range of efficiency scores for the inefficient units than Sweden. The range of the distribution for Finland is the narrowest, without one or two extremely inefficient units as is the case for the Netherlands, Norway and Sweden. Both for Finland and Denmark the largest units are located centrally in the distributions.
The inefficient units with the highest efficiency scores have values quite a bit lower than 1 for Denmark, the Netherlands and Norway, while the values are much closer to the fully efficient ones in Finland and Sweden. The Norwegian distribution has no marked size pattern, but has a much narrower range of efficiency scores for the inefficient units than Sweden. The range of the distribution for Finland is the narrowest, without one or two extremely inefficient units as is the case for the Netherlands, Norway and Sweden. Both for Finland and Denmark the largest units are located centrally in the distributions. A rough measure of the total potential improvement for each country may be read off graphically in Figure 2 as the area between the value 1 for the efficiency score and the top of the bars representing the individual units for each country. The total savings potential for operating and maintenance costs is about 20 percent (the potential for the other two inputs cannot be seen so accurately since TOM is used as the size variable). Finland has the smallest potential, while Sweden and the Netherlands have the highest. As a summary expression for the different shapes of the efficiency distributions, the different number of units and absolute size between units, and the location of size classes within country distributions, the country shares of the savings potential for each of the three inputs,

$$\sum_{i \in I_q} x_{is}(1-E_i) \Big/ \sum_{j \in I} x_{js}(1-E_j), \quad q \in L,$$

where I_q denotes the inefficient units of country q and s the input in question, are set out in Table 2, using the radial projections.[7]

Table 2. Country distribution of savings potential shares

              TOM    LossMWh    RV
Denmark       0.19     0.14    0.22
Finland       0.08     0.14    0.10
Netherlands   0.29     0.28    0.33
Norway        0.16     0.25    0.18
Sweden        0.28     0.19    0.17

[7] If the reference points on the frontier had been used, some differences in shares may occur if slacks on the input constraints in (1) are present and unevenly distributed on countries.

Due to the large, inefficient Dutch units that we see in Figure 2, the Netherlands has a higher savings potential than the other countries, especially for replacement value of capital. Sweden has a high potential for total operating and maintenance cost savings, and Norway for savings in energy loss. Denmark comes second to the Netherlands in saving potential for replacement value of capital, and has the smallest share for energy loss, roughly on the same level as Finland. Finland has significantly lower savings potential for total operating and maintenance costs and replacement value of capital than the other countries. In order to assess the efficiency of countries, the measure comparing an individual unit against the total (geometric) mean was introduced in Equation (8). The line of this geometric mean is inserted in Figure 2 (Ē = 0.82). The figure gives a visual impression of such comparisons. As overall characterizations we may note that the median efficiency scores of Denmark and Norway are below the total mean, while the median values of Finland, the Netherlands and Sweden are higher. The Netherlands is a special case since all the large units are less productive than the sample (geometric) average.

4.2. Structural features of best- and worst practice units

From the efficiency distribution shown in Figure 2 we identify the 12 active peers (excluding the self-evaluator) and the 12 worst practice units and calculate the average input and output values. Since we have 121 units this number represents the upper and lower deciles of the distribution. The comparison is shown in Figure 3.
[Figure 3. Structural comparison of best- and worst practice units: radar diagram of the averages of the 12 best practice and the 12 worst practice units relative to the sample average (100% contour) over TOM, LossMWh, RV, NumCust, TotLines and MWhDelivered.]

It is the relative position in the radar diagram that reveals the structure. Both the best practice units (BP) and the worst practice units (WP) are smaller than the sample average (the 100% contour), except for the input RV for the WP units. The BP units have, on average, higher values for all outputs than the WP units, and a relatively lower number of customers compared with the WP units. Concerning inputs, the WP units have a significant over-use of capital (measured by the replacement value), leading to a much higher use of this input than for BP units, and also higher total operating and maintenance costs (TOM), while energy loss is actually a little lower than for BP units.

4.3. The degree of localness of peers

We have already seen in Figure 2 that peers are found in all countries. An interesting question is the nature of the peers: are they multinational or purely national peers? If all the peers turn out to be national, the common technology is partitioned into country parts, and there is no foundation for international benchmarking. We will use the Degree of peer localness index (5) to describe the connection between a peer and its associated inefficient units, and the scope for international benchmarking. The information is found by partitioning the Referencing sets (3) on countries according to (4). This is done in the columns of Table 3 for each peer, entered according to nationality. An inefficient unit may appear in one or more of the peer columns.[8] All the active peers are referencing one or more inefficient units from their own country. We use as a criterion for a national peer that 50 percent or more of the inefficient units in its Referencing set are from its own country. The Degree of localness index in the last row of Table 3 shows that three peers are national. Both the two Swedish units (5022 and 5047) and one Danish unit (1023) have national roles as peers. The two Swedish peers have the highest Degree of peer localness index values of all peers, 1.00 and 0.73. A Finnish unit (2026) is close to being national, with an index value of 0.48. There are five truly multinational peers in the group of 13 efficient units, in the sense that they are referencing inefficient units from all five countries. Three of these stand out as referencing a considerably higher number of inefficient units, as seen from the second row from below. This is the pure count measure of peer importance. Only one peer (Swedish unit 5047) is truly national in the sense that it is only referencing inefficient units from its own country. Based on the pattern of country origin of peers and referenced units, Sweden has the most national peers, with only one of its two peers referencing a few inefficient units from Norway, Finland and Denmark. Denmark and Sweden seem to be furthest apart with reference to the common technology frontier, since two of Denmark's peers have only a single Swedish inefficient unit in their Referencing sets, and only one Danish inefficient unit has a Swedish peer. Two of the four Finnish peers have no Swedish units in their referencing sets. Three peers, one each from Finland, the Netherlands and Norway, have the maximal number of inefficient Swedish units in their Referencing sets. Actually, the Finnish and Norwegian
peers are referencing more Swedish units than the Swedish peers themselves, and the Dutch peer just a few Swedish units less than the Swedish peer with the highest number of Swedish inefficient units in its Referencing set. We interpret the obtained values of the Degree of peer localness index as empirical support for the importance of international benchmarking. Furthermore, the frontier seems to be well supported by the data in the sense that there is only one self-evaluator among the peers (Norwegian peer 4462).

Table 3. The degree of localness index (5). Country partitioning (4) of Referencing sets (3)

                  Denmark      Finland                     the Netherlands    Norway        Sweden
Peer              1009  1023   2014  2016  2026  2124      3005  3010  3017   4192  4462    5022  5047
Denmark             10    21     13     4     4     8         5    12     4      6     0       1     0
Finland              8     3     15     3    13     2         2    12     0      9     0       2     0
Netherlands          6    11      6     0     7     0         2     6     1      7     0       0     0
Norway               2     3     12     4     3     1         0     5     0     15     0       8     0
Sweden               1     1     33     0     0     8         0    28     2     34     0      30     6
Total count         27    39     79    11    27    19         9    63     7     71     0      41     6
Localness index   0.37  0.54   0.19  0.27  0.48  0.11      0.22  0.10  0.14   0.21     -    0.73  1.00

[8] The maximal number of peers for each inefficient unit is five, since there are six constraints in (1) and the solution for the efficiency score is always positive.

4.4. Cross-country peer patterns

While the Degree of localness index is peer-specific, there may also be a need for a description of how countries are interconnected. The results for the Cross-country peer importance index, ρ_qrs, defined in (6), are set out in Table 4 (a-c). As explained in Section 2 the index is based on combining the numbers in Table 3 on the occurrence of inefficient units in the referencing sets, and the weights, λ_ip, which are part of the solution to model (1).[9] The origin of the inefficient units is given by the rows, and the columns give the origin of peers. The interpretation of a cell number, e.g. 17.5 in the second cell in the first row of Panel a, is the relative share of the weighted input saving (in percent) of replacement value of capital of inefficient Danish units referenced by Finnish peers. If we look at the most influential peer country for the inefficient units we see that for two of the three inputs Dutch peers are more important than Danish ones for inefficient units in Denmark, while for Dutch inefficient units Danish peers are the most important for one input and Dutch peers for two inputs. For Finnish inefficient units Finnish peers are the most important by a large margin, and this national influence is also the case for Norway. For Swedish inefficient units Finnish peers are the most important for all inputs, and then the Dutch peers, the Swedish peers coming third.

[9] The numbers are reported in Edvardsen and Førsund (2002).

Table 4. Cross-country peer importance index (6) in percent (rows: origin of inefficient units; columns: origin of peers)

Panel a. Replacement value of capital

              Denmark  Finland  the Netherlands  Norway  Sweden
Denmark          39.8     17.5             37.8     0.3     4.7
Finland           2.1     72.4             20.3     0.7     4.5
Netherlands      45.4     10.9             40.6     3.1     0.0
Norway           15.5     23.7              5.7    43.1    12.0
Sweden            0.2     46.5             26.1     4.9    22.3

Panel b. Total operating and maintenance costs

              Denmark  Finland  the Netherlands  Norway  Sweden
Denmark          34.6     15.2             40.1     0.4     9.8
Finland           3.6     57.5             32.2     1.2     5.4
Netherlands      38.4     14.0             45.2     2.4     0.0
Norway           10.2     22.9              8.1    40.6    18.2
Sweden            0.6     36.7             33.0     4.5    25.3
Panel c. Energy loss

              Denmark  Finland  the Netherlands  Norway  Sweden
Denmark          36.7     16.1             38.4     0.3     8.4
Finland           3.1     60.1             31.3     1.2     4.3
Netherlands      38.3     14.9             44.4     2.5     0.0
Norway            9.1     22.6              8.7    44.0    15.5
Sweden            0.4     35.4             30.3     4.4    29.5

Inspecting the peer groups we see that the Danish peers are more important for Dutch inefficient units than for Danish ones, the latter coming second for all inputs. The Finnish peers are most important for Finnish inefficient units and then for Swedish units for all inputs. The Dutch peers are most important for Dutch units, and then come Danish inefficient units. Norway and Sweden are most strongly connected, with Norwegian peers having Swedish inefficient units as the second most important group of inefficient units after their own. Swedish peers also have Norwegian inefficient units as the second most important group after their own inefficient units. The location of small and high values of the index shows the pattern of cross-country connections. The Norwegian peers have a very low importance for inefficient units in all countries other than Norway itself and Sweden. As seen also from Table 3 there is no connection between Swedish peers and Dutch inefficient units. The impact of Swedish peers on Danish and Finnish inefficient units is small compared with their impact on Norwegian units and their own inefficient units. The connections between Denmark and the Netherlands work symmetrically both ways, while Finnish peers influence Swedish inefficient units much more than Swedish peers are of importance for Finnish inefficient units, and Dutch peers are more important for Finnish inefficient units than Finnish peers are for Dutch inefficient units.

4.5. Local versus common technology

We have investigated the possibility of operating with individual country technologies by running the DEA model with the three output and three input variables on each country separately. However, we may have a problem of dimensionality for Denmark, Finland, Norway and the Netherlands, since these samples include 24, 25, 16 and 14 units respectively. The ad hoc rule (Cooper et al., 2000) that there are dimensionality problems if the number of dimensions multiplied by three is higher than the number of observations applies to the Netherlands and Norway. A run of country-specific technologies is presented together with the common frontier in Figure 2. The ordering of units within the countries from the common technology run is kept, and the scores for country-specific technologies shown by the step curve above the bars are ordered identically. As expected the number of efficient units in the Netherlands and Norway increases drastically, and also for Denmark. The individual changes for the units can be large, illustrating the dimensionality problem for all countries except Sweden. The distribution for Sweden, with 42 observations, is much more stable, and we see a more or less parallel shift upwards of the whole distribution. Of the 11 Swedish units being efficient within the local frontier, only two remain so within the common frontier, and only one peer has other countries' inefficient units in its referencing set. Other countries' peers setting a higher standard for Swedish units cause the downward shift of the efficiency distribution. The importance of exposing national peers to international benchmarking is clearly demonstrated.

4.6. Productivity comparisons of countries

Table 5 shows the ratios of the geometric average of the efficiency scores for each country relative to all other countries and also to the total geometric mean (cf. Equations (9) and (10)).
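Although Equations (9) and (10) are given earlier in the paper and not reproduced here, the computation behind Table 5 reduces to ratios of geometric means of efficiency scores. A minimal sketch in Python, under our own naming conventions:

```python
# Bilateral country productivity comparison as a ratio of geometric means of
# efficiency scores; scores_q and scores_r are arrays of scores for the units
# of countries q and r (variable and function names are ours).
import numpy as np

def geometric_mean(scores):
    return float(np.exp(np.log(scores).mean()))

def bilateral_index(scores_q, scores_r):
    """> 1 means country q is on average more productive than country r."""
    return geometric_mean(scores_q) / geometric_mean(scores_r)

def index_vs_total(scores_q, scores_all):
    """Country q measured against the geometric mean of the whole sample."""
    return geometric_mean(scores_q) / geometric_mean(scores_all)
```

Applied to every ordered pair of countries this fills the matrix in Table 5; the last row uses the whole sample as the comparator.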
Finland seems to be the most productive country within the common technology, having a bilateral index value higher than one compared with all the other countries. Sweden comes closest, while Norway and the Netherlands are on about the same level, and Denmark is the least productive country.

Table 5. Productivity comparisons of countries. Malmquist productivity indices (9), (10) calculated as ratios of geometric means (each column country measured relative to each row country; last row: against the average of all units)

                   Denmark  Finland  the Netherlands  Norway  Sweden
Denmark               1.00     1.16             1.06    1.04    1.12
Finland               0.86     1.00             0.91    0.90    0.97
Netherlands           0.95     1.10             1.00    0.99    1.06
Norway                0.96     1.11             1.01    1.00    1.08
Sweden                0.89     1.04             0.94    0.93    1.00
Average all units     0.92     1.07             0.97    0.96    1.03

Starting with Denmark: Finland and Sweden are the most productive countries relative to it, while the Netherlands and Norway are ahead by 4 to 6 percentage points. Norway's performance is closest to the Netherlands', lagging it by about 1 percentage point. It is interesting to note, in view of the special situation for Sweden revealed earlier, that Sweden, after all, on average, is in front of all countries with the exception of Finland. We can use the performance against the total sample average as a final ranking. The last row shows that the ranking has Finland at the top, then Sweden, the Netherlands, Norway and Denmark, the first two countries being in front of the total (geometric) average and the other three behind. The use of the Malmquist indices to rank countries corresponds closely to the form of the country efficiency distributions discussed above in connection with Figure 2, where it was pointed out that Finland and Sweden had the most even distributions and the highest share of units above the total sample mean.

5. Conclusions

When doing international benchmarking for the same type of production activity in several countries, applying a common frontier technology seems to yield the most satisfactory environment for identifying multinational peers and assessing the extent of inefficiency. In our exercise for a sample of large electricity distribution utilities from Denmark, Finland, Norway, Sweden and the Netherlands it is remarkable that peers come from all countries. The importance of exposing national units, and especially units that would have been peers within a national technology, to international benchmarking is clearly demonstrated. The multinational setting has called for the development of new indices to capture the cross-country pattern of the nationality of peers and the nationality of units in their referencing sets. Bilateral Malmquist productivity comparisons can be performed between units of particular interest in addition to country origin, e.g. sorting by size, or location of utility (urban - rural), etc. We have focused on a single unit against the (geometric) average performance of all units, as well as bilateral comparisons of (geometric) averages of each country. Our results point to Finland as the most productive country within the common technology. This result reflects the more even distribution of the Finnish units and the high share of units above the total sample mean of efficiency scores. The advantage of working with the DEA model is the richness of details available from the model solutions and the concrete connections to actual units. However, this may also be a problem, because it is not always so easy to find explanations for specific findings such as why some units are efficient.
The main practical purpose of the paper is to serve as a pilot study for the Nordic electricity regulators and the Dutch regulator at the start of a process of finding tools for regulation. The quality of the data and the acceptance of the model framework are of crucial importance for regulation, since the units are regulated based on their individual performance as portrayed by the model results. There is at present some disagreement about the possibility of basing regulation of utilities on the approach used here (Shuttleworth, 1999; Nillesen and Telling, 2001). In order to improve upon the model approach as a benchmarking tool we would like to follow up the developments and results of the present study with the following research agenda:

i) Find explanations for the cross-country peer and inefficient unit patterns revealed by the novel cross-country peer importance indices
ii) Improve the comparability of data between countries by harmonizing definitions of variables and extending collection to cover environmental variables
iii) Define financial variables and collect data for cost efficiency exercises
iv) Investigate the scale properties by specifying a variable returns to scale technology
v) Increase the number of cross-section observations to cover all units within a country, enabling country-specific technologies also to be studied (if the total number of national units allows)
vi) Establish time series of cross sections, enabling productivity developments to be studied
vii) Develop a more general transitive Malmquist index for the latter two cases

References

Andersen, P. and N. C. Petersen, 1993, A procedure for ranking efficient units in data envelopment analysis, Management Science 39, 1261-1264.

Berg, S. A., F. R. Førsund and E. S. Jansen, 1992, Malmquist indices of productivity growth during the deregulation of Norwegian banking, Scandinavian Journal of Economics 94, Supplement, 211-228.

Berg, S. A., F. R. Førsund, L. Hjalmarsson and M. Suominen, 1993, Banking efficiency in the Nordic countries, Journal of Banking and Finance 17, 371-388.

Caves, D. W., L. R. Christensen and W. E. Diewert, 1982a, The economic theory of index numbers and the measurement of input, output, and productivity, Econometrica 50, 1393-1414.

Caves, D. W., L. R. Christensen and W. E. Diewert, 1982b, Multilateral comparisons of output, input, and productivity using superlative index numbers, Economic Journal 92 (March), 73-86.

Charnes, A., W. W. Cooper and E. Rhodes, 1978, Measuring the efficiency of decision making units, European Journal of Operational Research 2(6), 429-444.

Cooper, W. W., L. M. Seiford and K. Tone, 2000, Data envelopment analysis. A comprehensive text with models, applications, references and DEA-solver software (Kluwer Academic Publishers, Boston/Dordrecht/London).

Edvardsen, D. F. and F. R. Førsund, 2002, International benchmarking of electricity distribution utilities, Working Paper No 08/02, ICER [http://www.icer.it/docs/wp2002/forsund08-02.pdf].

Farrell, M. J., 1957, The measurement of productive efficiency, Journal of the Royal Statistical Society, Series A, 120, III, 253-281.

Färe, R. and D. Primont, 1995, Multi-output production and duality: theory and applications (Kluwer Academic Publishers, Boston).

Färe, R., S. Grosskopf, B. Lindgren and P. Roos, 1994, Productivity developments in Swedish hospitals: a Malmquist output index approach, in: A. Charnes, W. W. Cooper, A. Y. Lewin and L. M.
Seiford, eds., Data envelopment analysis: theory, methodology, and applications (Kluwer Academic Publishers, Boston/Dordrecht/London), 253-272.

Førsund, F. R., 1993, Productivity growth in Norwegian ferries, in: H. Fried, C. A. K. Lovell and S. Schmidt, eds., The measurement of productive efficiency: techniques and applications (Oxford University Press, Oxford), 352-373.

Førsund, F. R., 1999, On the contribution of Ragnar Frisch to production theory, Rivista Internazionale di Scienze Economiche e Commerciali (International Review of Economics and Business) XLVI (1), 1-34.

Førsund, F. R., 2002, On the circularity of the Malmquist productivity index, Working Paper No 29/02, ICER [http://www.icer.it/docs/wp2002/forsund29-02.pdf].

Førsund, F. R. and S. A. C. Kittelsen, 1998, Productivity development of Norwegian electricity distribution utilities, Resource and Energy Economics 20(3), 207-224.

Hjalmarsson, L. and A. Veiderpass, 1992a, Efficiency and ownership in Swedish electricity retail distribution, Journal of Productivity Analysis 3, 7-23.

Hjalmarsson, L. and A. Veiderpass, 1992b, Productivity in Swedish electricity retail distribution, Scandinavian Journal of Economics 94, Supplement, 193-205.

Jamasb, T. and M. Pollitt, 2001, Benchmarking and regulation: international electricity experience, Utilities Policy 9(3), 107-130.

Johansen, L. and Å. Sørsveen, 1967, Notes on the measurement of real capital in relation to economic planning models, The Review of Income and Wealth, Series 13, 175-197.

Kittelsen, S. A. C., 1993, Stepwise DEA; choosing variables for measuring technical efficiency in Norwegian electricity distribution, Memorandum No. 6/1993, Department of Economics, University of Oslo.

Langset, T. and S. A. C. Kittelsen, 1997, Forsyningsareal og metodevalg ved beregning av effektivitet i elektrisitetsfordeling [Service area and choice of method when calculating efficiency in electricity distribution], Rapport 85, Stiftelsen for samfunns- og næringslivsforskning, Oslo.

Neuberg, L. G., 1977, Two issues in the municipal ownership of electric power distribution systems, Bell Journal of Economics 8(1), 303-323.

Nillesen, P. and J. Telling, 2001, Benchmarking distribution companies, EPRM Electricity March 2001, 10-12 [http://www.icfconsulting.com/Publications/doc_files/BenchmarkingDistributionCompanies.pdf].

Salvanes, K. G. and S. Tjøtta, 1994, Productivity differences in multiple output industries: an empirical application to electricity distribution, Journal of Productivity Analysis 5, 23-43.

Schaffnit, C., D. Rosen and J. C. Paradi, 1997, Best practice analysis of bank branches: an application of DEA in a large Canadian bank, European Journal of Operational Research 98, 269-289.

Shuttleworth, G., 1999, Energy regulation brief, National Economic Research Associates, n/e/r/a, London, 1-4 [http://www.nera.com/wwt/newsletter_issues/4030.pdf].

Torgersen, A. M., F. R. Førsund and S. A. C. Kittelsen, 1996, Slack adjusted efficiency measures and ranking of efficient units, Journal of Productivity Analysis 7, 379-398.

Weiss, L. W., 1975, Antitrust in the electric power industry, in: A. Phillips, ed., Promoting competition in regulated markets (Brookings Institute, Washington, DC).

FAR OUT OR ALONE IN THE CROWD: CLASSIFICATION OF SELF-EVALUATORS IN DEA[*]

by

Dag Fjeld Edvardsen, The Norwegian Building Research Institute

Finn R. Førsund[†], Department of Economics, University of Oslo / The Frisch Centre

and Sverre A. C.
Kittelsen, The Frisch Centre

Abstract: The units found strongly efficient in DEA studies on efficiency can be divided into self-evaluators and active peers, depending on whether the peers are referencing any inefficient units or not. The contribution of the paper is to develop a method, based on the additive DEA model, for classifying self-evaluators into interior and exterior ones. The exterior self-evaluators are efficient "by default"; there is no firm evidence from observations for the classification. These units should therefore not be regarded as efficient, and should be removed from the observations of efficiency scores when performing a two-stage analysis explaining the distribution of the scores. The application to municipal nursing and home care services of Norway shows significant effects of removing exterior self-evaluators from the data when doing a two-stage analysis.

Keywords: Self-evaluator, interior and exterior self-evaluator, DEA, efficiency, referencing zone, nursing homes

JEL classification: C44, C61, D24, I19, L32

[*] The paper is based on results from the NFR-financed Frisch Centre project "Better and Cheaper?" and written within the project "Efficiency analyses of the nursing and home care sector of Norway" at the Health Economics Research Programme at the University of Oslo (HERO) and the Frisch Centre.

[†] Corresponding author. Email: [email protected], postal address: Department of Economics, University of Oslo, Box 1095, 0317 Blindern, Oslo, Norway.

1. Introduction

The calculation of efficiency scores for production units based on a non-parametric piecewise linear frontier production function has become well established over the last two decades. Originally introduced by Farrell (1957), the method was further developed in Charnes, Cooper and Rhodes (1978), where the term Data Envelopment Analysis (DEA) was coined. The efficient units span the frontier, but the classification of some of these units as efficient is not based on other observations being similar; it is due to the method itself. We are referring to units classified as self-evaluators in the literature, a concept introduced by Charnes et al. (1985a). Self-evaluators may most naturally appear at the "edges" of the technology, but it is also possible that self-evaluators appear in the interior. It may be of importance to distinguish between those self-evaluators that are exterior and those that are interior. Finding the influence of some variables on the level of efficiency by running regressions of efficiency scores on a set of potential explanatory variables is an approach often followed in actual investigations.[1] Using exterior self-evaluators with an efficiency score of 1 may then distort the results, because assigning the value of 1 to these self-evaluators is arbitrary. Interior self-evaluators, on the other hand, may have peers that are fairly similar. They should therefore not necessarily be dropped when applying the two-stage approach. The plan of the paper is to review the DEA models in Section 2 and define the new concepts of interior and exterior self-evaluators. In Section 3 the method for classifying the self-evaluators is introduced. Actual data are presented in Section 4 and the method for classifying self-evaluators is applied. The effect of removing exterior self-evaluators is shown. Section 5 concludes.

[1] The approach was originally introduced in Seitz (1967), inspired by Nerlove (1965); see Førsund and Sarafoglou (2002).
Simar and Wilson (2003) review the approach and find it at fault in general due to serial correlation between the efficiency scores, and provide a new, statistically sound procedure based on explicitly specifying the data generating process and bootstrapping to obtain confidence intervals.

2. Self-evaluators

DEA models

Consider a set, J, of production units transforming multiple inputs into multiple outputs. Let y_mj be an output (m ∈ M, j ∈ J) and x_nj an input (n ∈ N, j ∈ J). As the reference for the units in efficiency analyses we want to calculate a piecewise linear frontier based on observations, fitting as closely as possible and obeying some fundamental assumptions, like free disposal and the technology set being convex and closed, as usually entertained (Banker et al., 1984; Färe and Primont, 1995). This frontier can be found by solving the following LP problem, termed the additive model in the DEA literature (Charnes et al., 1985b):

$$\begin{aligned}
\max\ & \sum_{m \in M} s_{mi}^{+} + \sum_{n \in N} s_{ni}^{-} \\
\text{s.t.}\quad & \sum_{j \in J} \lambda_{ij} y_{mj} - y_{mi} - s_{mi}^{+} = 0, \quad m \in M \\
& x_{ni} - \sum_{j \in J} \lambda_{ij} x_{nj} - s_{ni}^{-} = 0, \quad n \in N \\
& s_{mi}^{+},\, s_{ni}^{-} \geq 0, \quad \lambda_{ij} \geq 0, \quad \sum_{j \in J} \lambda_{ij} = 1
\end{aligned} \qquad (1)$$

The last equality constraint in (1) imposes variable returns to scale (VRS) on the frontier, while dropping this constraint imposes constant returns to scale (CRS). Our analysis will be valid for both scale assumptions. The frontier is found by maximising the sum of the slacks on the output constraints, s⁺_mi, and the input constraints, s⁻_ni. The strongly efficient units (using the terminology of Charnes et al., 1985b) are identified by the sum of the slacks, and therefore all the slack variables, being zero. All weights, λ_ij, must be zero except the weight for the unit itself, which will be one (i.e. λ_ij = 0 for i ≠ j, λ_ii = 1 if i is an efficient unit).[2] The efficient points will appear as vertex points on the frontier function surface, or corner points of facets. The sets of strongly efficient units, P, and inefficient units, I, are:

$$P = \Big\{ i \in J : \sum_{m \in M} s_{mi}^{+} + \sum_{n \in N} s_{ni}^{-} = 0 \Big\}, \qquad
I = \Big\{ i \in J : \sum_{m \in M} s_{mi}^{+} + \sum_{n \in N} s_{ni}^{-} > 0 \Big\}, \qquad P \cup I = J \qquad (2)$$

[2] A strongly efficient unit, i, may end up being located exactly on a facet. We may then have multiple solutions for the weights, although the maximal sum of slacks is still zero. One of the solutions will be λ_ij = 0 for j ≠ i, and λ_ii = 1.

So far we only have slacks as measures of inefficiency. If we want only one measure for each unit, and a measure that is independent of units of measurement, the Farrell (1957) measure of technical inefficiency is the natural choice. The standard DEA model on primal (enveloping) form is set up as a problem of determining the Farrell technical efficiency score, E_oi (o = 1, 2), either in the input (o = 1) or the output (o = 2) direction for an observation, i. The following LP model is formulated for each observation in the case of input orientation:

$$\begin{aligned}
E_{1i} \equiv \min\ & \theta_i \\
\text{s.t.}\quad & \sum_{j \in P} \lambda_{ij} y_{mj} - y_{mi} \geq 0, \quad m \in M \\
& \theta_i x_{ni} - \sum_{j \in P} \lambda_{ij} x_{nj} \geq 0, \quad n \in N \\
& \lambda_{ij} \geq 0, \quad \sum_{j \in P} \lambda_{ij} = 1
\end{aligned} \qquad (3)$$

In the case of output orientation we have the following LP program:

$$\begin{aligned}
1/E_{2i} \equiv \max\ & \phi_i \\
\text{s.t.}\quad & \phi_i y_{mi} - \sum_{j \in P} \lambda_{ij} y_{mj} \leq 0, \quad m \in M \\
& \sum_{j \in P} \lambda_{ij} x_{nj} - x_{ni} \leq 0, \quad n \in N \\
& \lambda_{ij} \geq 0, \quad \sum_{j \in P} \lambda_{ij} = 1
\end{aligned} \qquad (4)$$

For notational ease the same symbols have been used for the weights in (1), (3) and (4). The proportionality factor, θ_i or φ_i, and the weights, λ_ij, are the endogenous variables.
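For concreteness, a minimal sketch of solving the input-oriented VRS program (3) with an off-the-shelf LP solver; the data layout and function name are our own assumptions, not part of the paper:

```python
# A sketch of model (3): input-oriented Farrell efficiency under VRS.
# X is a (#N x J) input matrix and Y a (#M x J) output matrix; column i is unit i.
import numpy as np
from scipy.optimize import linprog

def farrell_input_efficiency(X, Y, i):
    """Return E_1i, the optimal theta of model (3), for unit i."""
    n_in, J = X.shape
    n_out = Y.shape[0]
    c = np.zeros(1 + J)
    c[0] = 1.0                                    # minimise theta
    # Output constraints: -sum_j lambda_j * y_mj <= -y_mi
    A_out = np.hstack([np.zeros((n_out, 1)), -Y])
    b_out = -Y[:, i]
    # Input constraints: -theta * x_ni + sum_j lambda_j * x_nj <= 0
    A_in = np.hstack([-X[:, [i]], X])
    b_in = np.zeros(n_in)
    A_eq = np.concatenate([[0.0], np.ones(J)]).reshape(1, -1)  # VRS: sum lambda = 1
    res = linprog(c, A_ub=np.vstack([A_out, A_in]),
                  b_ub=np.concatenate([b_out, b_in]),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * J, method="highs")
    return float(res.x[0])
```

Dropping the convexity row gives the CRS variant, in line with the remark below (1).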
Adopting the notation #N and #M for the number of inputs and outputs respectively, the point

$$\Big( \sum_{j \in P} \lambda_{ij} x_{1j},\ \ldots,\ \sum_{j \in P} \lambda_{ij} x_{\#N,j},\ \sum_{j \in P} \lambda_{ij} y_{1j},\ \ldots,\ \sum_{j \in P} \lambda_{ij} y_{\#M,j} \Big) \qquad (5)$$

is per construction on the frontier surface, and is defined as the reference point for unit i. If there are no slacks on the output and input constraints in (3) or (4), then the reference point coincides with the radial projection point, using either θ_i or φ_i when adjusting an inefficient observation. These points will normally be interior points on facets (but may fall on border lines). With one or more slacks positive, the reference point and the radial projection point differ. The reference points will again appear as vertex points on the frontier function surface, or corner points of facets. It is well known that the radial Farrell efficiency measure E_oi may be one, but that the unit may still improve its performance by either using less inputs or producing more outputs. All units with a radial efficiency score of one are by definition located on the frontier, but it is only for the strongly efficient units that the reference points coincide with the observation. A unit may have E_oi = 1 but with one or more of the constraints in (3) or (4) non-binding (i.e. one or more slacks positive and zero shadow prices on the constraints in question). Although the model (3) or (4) can be solved directly by letting the index j run over all observations in J, a two-stage procedure of solving (1) first is often followed. By using the information on strongly efficient units when solving (3) or (4), the LP computations are done more efficiently, and one will only identify reference points by (5) that are in the strongly efficient subset of the frontier. In the context of the DEA models (3) and (4), the strongly efficient units are termed peers. For each inefficient unit, i, a Peer group set, P_i (Cooper, Seiford and Tone, 2000), may be formed:

$$P_i = \{ p \in P : \lambda_{ip} > 0 \}, \quad i \in I \qquad (6)$$

where λ_ip are the solution values of the weights in either (3) or (4), depending on the orientation in question. If the Peer group sets are empty, then all the units are efficient. The solutions to (1), (3) or (4) do not identify facets systematically, but by using (6) we can identify the corner points of facets where one or more radial projection points of inefficient units are located. It will also turn out useful to look at the group of inefficient units referenced by a peer. Such a set is defined for each peer, p, as the Referencing set in Edvardsen and Førsund (2001), with reference to the solutions of (3) or (4):

$$I_p = \{ i \in I : \lambda_{ip} > 0 \}, \quad p \in P \qquad (7)$$

The self-evaluators

The Referencing set (7) may be empty, in which case the unit is called a self-evaluator:

Definition 1: A peer p ∈ P, where the set P is defined in (2), is a self-evaluator if I_p = Ø, where I_p is defined in (7).[3]

[3] An alternative definition could be in terms of the reference shares defined in Torgersen, Førsund and Kittelsen (1996), where a self-evaluator has a reference share of zero.

The set of peers may thus be partitioned into a set of self-evaluators, P^S, and a set, P^A, of active peers, i.e. peers with non-empty referencing sets:

$$P^S = \{ p \in P : I_p = \emptyset \}, \qquad P^A = \{ p \in P : I_p \neq \emptyset \}, \qquad P^S \cup P^A = P \qquad (8)$$

The self-evaluators are vertex points of facets without any reference points defined as the radial projection points of inefficient observations located on these facets. The LP solutions to (3) or (4) do not give us any information as to which efficient units constitute the vertex points of such a facet without reference points. An efficient unit may be a vertex point for many facets.
Our definition of a self-evaluator implies that there are no reference points on any of its facets.

3. The determination of the type of self-evaluator

There are two possibilities as to the location of facets formed by self-evaluators on the frontier surface. Such facets may be part of the extreme areas of the frontier, i.e. facets closest to the axes in the case of CRS or, in the case of VRS, also facets furthest away from the origin or closest to the origin (the VRS frontier will in general not contain the origin). In the case of CRS only mixes of inputs or outputs may be extreme, while in the case of VRS we in addition have the scale dimension. Such self-evaluators will be termed exterior self-evaluators. In the case of CRS, facets without any reference points may also be found in the interior of the frontier surface with respect to mixes, while for VRS interior also means interior regarding scale. Such self-evaluators will be termed interior self-evaluators. Figure 1 shows the two different cases in the simplest setting of two dimensions. The observations represented by points A, B, C, D, F and G are efficient, while O1 is inefficient. The radial reference or projection point for unit O1 is a in the case of input orientation. The reference point (5) in this simple case coincides with the peer A. Considering output orientation, the peers are D and F, and the reference point is d. To illustrate the referencing set of a peer, the shaded area in Figure 1 shows the referencing zone for the efficient unit D in the case of output orientation. All the inefficient units in unit D's referencing set must be located here (such inefficient units may also appear in the referencing sets of other peers; here unit F's). If the referencing zone is empty then the peer is a self-evaluator. Removal of such a self-evaluator will not change the efficiency scores of any other units. We would expect the self-evaluators to be extreme points in one or more of the mix or scale dimensions, but if the referencing zone is narrow a self-evaluator may also be centrally placed within the set of observations. A narrow zone means that other peers are close to the self-evaluator.

[Figure 1: DEA and the two types of self-evaluators]

Notice that the classification as a self-evaluator is dependent on the orientation of the efficiency measure. Considering output orientation, we have that both B and C are interior self-evaluators, while A and G are exterior self-evaluators. Considering input orientation, we have that B, C, D and F are interior self-evaluators, while G is an exterior one. In both cases the unit G could have been observed anywhere between the line g' (the continuation of the line DF) and the line g'' (referenced by F), without any unit changing its estimated efficiency or its status as peer. The efficiency score of 1 assigned to unit G therefore contains little information. In e.g. the output-oriented case we see that there is a considerable scope for output variation for a given input yielding the efficiency score of 1. Our purpose is to develop a method for classification into exterior or interior self-evaluators using only the standard DEA format.
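As a building block for that method, the partition (8) itself is immediate once the weights from (3) or (4) are available. A minimal sketch, assuming the solved weights are collected in a matrix lam over the inefficient units and the peers (function name and tolerance are ours):

```python
# A sketch of definitions (6)-(8): build the referencing sets I_p and split
# the peers into active peers P^A and self-evaluators P^S.
import numpy as np

def partition_peers(lam, peer_ids, tol=1e-9):
    """lam[i, k]: solved weight of inefficient unit i on peer peer_ids[k].
    Returns (active_peers, self_evaluators) as sets of peer ids."""
    referencing = {p: set() for p in peer_ids}            # I_p in (7)
    for i in range(lam.shape[0]):
        for k, p in enumerate(peer_ids):
            if lam[i, k] > tol:                           # unit i references peer p
                referencing[p].add(i)
    active = {p for p in peer_ids if referencing[p]}      # P^A in (8)
    self_evaluators = set(peer_ids) - active              # P^S in (8)
    return active, self_evaluators
```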
Enveloping from below

The production set is by construction convex. If all inefficient units are removed from the data set, and a new run is done with only the efficient units, we will find the exterior peers by reversing the enveloping of the data from "above" to be from "below". All that needs to be done is to reverse the inequalities in the LP program (1) by adding the slack variables instead of subtracting them:

$$\begin{aligned}
\max\ & \sum_{m \in M} s_{mi}^{+} + \sum_{n \in N} s_{ni}^{-} \quad (i \in P) \\
\text{s.t.}\quad & \sum_{j \in P} \lambda_{ij} y_{mj} - y_{mi} + s_{mi}^{+} = 0, \quad m \in M \\
& x_{ni} - \sum_{j \in P} \lambda_{ij} x_{nj} + s_{ni}^{-} = 0, \quad n \in N \\
& s_{mi}^{+},\, s_{ni}^{-} \geq 0, \quad m \in M,\ n \in N, \quad \lambda_{ij} \geq 0, \quad \sum_{j \in P} \lambda_{ij} = 1
\end{aligned} \qquad (9)$$

Notice that we are only considering observations belonging to the set of strongly efficient units P determined by solving (1). This envelopment of the data is by construction concave. The units that turn out as "efficient" in solving (9), in the sense that all slacks are zero, must be units belonging to the exterior facets in the solution to the original model (1). We will use this result to define exterior and interior strongly efficient units:

Definition 2: A strongly efficient unit belonging to the set P defined by (2) is exterior if it belongs to the set P^E:

$$P^E = \Big\{ p \in P : \sum_{m \in M} s_{mp}^{+} + \sum_{n \in N} s_{np}^{-} = 0 \Big\} \qquad (10)$$

where the slack variables s⁺_mp, s⁻_np are solutions to the problem (9). A strongly efficient unit belonging to the set P defined by (2) is interior if it belongs to the set

$$P^I = \Big\{ p \in P : \sum_{m \in M} s_{mp}^{+} + \sum_{n \in N} s_{np}^{-} > 0 \Big\}, \qquad (P^E \cup P^I = P) \qquad (11)$$

where the set P^E is defined in (10).[4]

[4] Note that in the special case where two units have identical input-output vectors, both could be classified as exterior by this criterion, but would not have unique intensity weights in (9). In this situation, which is likely to be rare in empirical applications, it seems natural to classify the units as interior.

To determine the nature of a self-evaluator, an orientation for the calculation of the Farrell efficiency measures has to be chosen, i.e. either input or output orientation. The following definition can then be made as to the classification of self-evaluators:

Definition 3: Consider a peer p ∈ P, where the set P is defined in (2), that is a self-evaluator, p ∈ P^S, where the set P^S is defined in (8) and found by running either the input-oriented program (3) or the output-oriented program (4). If p ∈ P^E, where the set P^E is defined in (10), then p is an exterior self-evaluator. If p ∉ P^E then p is an interior self-evaluator:

$$P^{SE} = P^S \cap P^E, \qquad P^{SI} = P^S \cap P^I, \qquad (P^{SE} \cup P^{SI} = P^S) \qquad (12)$$

where P^SE and P^SI are the sets of exterior and interior self-evaluators respectively. Illustrating the approach using Figure 1, we have that the new "from below" frontier will be the line from A to G; thus these units are the only ones on the "from below" frontier and are therefore exterior points in P^E. This classification is independent of orientation, both units being located on exterior facets in the original problem (1). In the case of output orientation, the self-evaluators B and C, according to the solution to problem (4), will not appear on the new frontier, and they are therefore interior according to Definition 3. The self-evaluators A and G appear on the new frontier and are therefore exterior. In the case of input orientation, solving problem (3) gives B, C, D, F and G as self-evaluators, and we have that B, C, D and F are interior self-evaluators and G an exterior one. While A is an exterior peer in input orientation, it is not a self-evaluator. Figure 2 provides another illustration. In a two-dimensional input space an isoquant is shown through the efficient units A, B, C and D. Consider input orientation and CRS. Assuming inefficient units are only located northeast of the isoquant segment AB, in the cone delimited by the rays going through the points A and B, we have that C is an interior self-evaluator and D is an exterior self-evaluator. Running the "reverse" program (9) we will envelope the four peers from "behind" by the broken line from A to D. We then know that units A and D are exterior, and using the information from running the DEA model (1) we then have that unit C is an interior self-evaluator and unit D an exterior one.

[Figure 2. Determining the type of self-evaluator]
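A minimal computational sketch of Definition 2, solving the reverse envelopment (9) with the same LP solver as in the earlier snippet; Xe and Ye are assumed to hold only the strongly efficient units, and the function name and tolerance are our own:

```python
# A strongly efficient unit is exterior, per (10), iff the maximal slack sum
# in (9) is (numerically) zero.
import numpy as np
from scipy.optimize import linprog

def is_exterior(Xe, Ye, i, tol=1e-6):
    n_in, P = Xe.shape
    n_out = Ye.shape[0]
    nvar = P + n_out + n_in                  # lambdas, s+ (outputs), s- (inputs)
    c = np.zeros(nvar)
    c[P:] = -1.0                             # maximise the slack sum
    A_eq = np.zeros((n_out + n_in + 1, nvar))
    b_eq = np.zeros(n_out + n_in + 1)
    A_eq[:n_out, :P] = Ye                    # sum_j lam_j y_mj + s+_m = y_mi
    A_eq[:n_out, P:P + n_out] = np.eye(n_out)
    b_eq[:n_out] = Ye[:, i]
    A_eq[n_out:-1, :P] = Xe                  # sum_j lam_j x_nj - s-_n = x_ni
    A_eq[n_out:-1, P + n_out:] = -np.eye(n_in)
    b_eq[n_out:-1] = Xe[:, i]
    A_eq[-1, :P] = 1.0                       # VRS convexity: sum lam = 1
    b_eq[-1] = 1.0
    res = linprog(c, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * nvar, method="highs")
    return -res.fun < tol                    # zero slack sum => exterior, (10)
```

As noted below for program (14), the same classification can also be obtained from standard additive-model software by exchanging inputs and outputs.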
In a two-dimensional input space an isoquant is shown in the efficient units A, B, C and D. Consider input orientation and CRS. Assuming inefficient units are only located northeast of the isoquant segment AB in the cone delimited by the rays going through the points A and B, we have that C is an interior self-evaluator, and D is an exterior self-evaluator. Running the “reverse” program (9) we will envelope the four peers from “behind” by the broken line from A to D. We then know that units A and D are exterior, and using the information from running the DEA model (1) we then have that unit C is an interior self evaluator, and unit D an exterior one. Figure 2. Determining the type of self-evaluator 10 It may also be of interest to classify the active peers according to the type exterior and interior. Building on definition 3 we have. Definition 4. The active peers defined in (8) belong to the subsets PAE and PAI: P AE = P A ∩ P E P AI = P A ∩ P I ( P AE ∪ P AI = P A ) (13) where PE and PI are defined in (10) and (11) respectively. The program (9) is not the standard DEA additive formulation, since the sign of the slacks in the restrictions on inputs and outputs have been changed. However, by negating these equalities, (9) can be rewritten as: ⎧ ⎫ − Max ⎨ ∑ smi + ∑ sni+ ⎬ (i ∈ P ) n∈N ⎩ m∈M ⎭ s.t. ∑λ x − xni − sni+ = 0 , n ∈ N − ymi − ∑ λij ymj − smi = 0 , m∈M j∈P ij nj j∈P (14) + smi , sni− ≥ 0 , m ∈ M , n ∈ N λij ≥ 0 ∑λ j∈P ij =1 Comparing (1) and (14) we see that these are identical except that inputs and outputs are exchanged. Since existing DEA software often will solve the additive model (1), we may as well for convenience find the set of exterior self-evaluators PSE by exchanging inputs and outputs and running (14) on the strongly efficient units, rather than running (9) on these units. 11 4. An empirical application The data We will apply the method for determining interior and exterior self-evaluators on a cross section data set of the nursing and home care sector of Norwegian municipalities. The data is found in Edvardsen et al. (2000). The primary data source is the official yearly statistics for municipal activities published by Statistics Norway. Resource usage is measured by financial data and number of man-years of different categories. Production data contains mainly the number of clients dealt with by institutionalised nursing, home based nursing, and practical assistance. Quality information is lacking, but the clients are split into some age groups that may be of significance for resource use. In cooperation with representatives form the municipalities and the ministries of Finance, Municipal and regional affairs, and Social and health affairs we have chosen to split the clients on two major age groups, 0-66 and above 66 (67+), and use institutions and home care as separate outputs. Within institutions there are also a number of short-stay clients, either coming on a day care basis or on limited stay of convalescence. These usually require fewer resources than the permanent clients. As indicators of quality of institutions we have information of number of single person rooms and on clients staying in closed wards. The separation is regarded both as a quality factor for the clients taken care of (demented cases), and for the other clients. In home-based care mentally disabled may be quite resource demanding. They may also be found in the 0-66 age group within institutions. There is no information on how long time a home visit may last or how often it is received. 
Such information would obviously have given us some quality indicators. We also run the risk that municipalities cutting down on both the length and the number of visits show the same number of clients as municipalities providing more generous support. To ensure that the data quality was good enough we entered a phase of quality control. We strongly feel that one should not automatically remove outliers, but if possible contact the municipality in question and ask if the data is correct. This is especially important if the methodology is frontier based (such as DEA), because the units defining the frontier are outliers by definition. This led to many changes in the dataset and required quite a lot of work, but as a result we could be much more confident in the quality of the data (see Aas (2000) for details).

Table 1: Primary variables used in the DEA model, cross-section 1997 of 469 municipalities

Inputs                                Average   Std. dev.     Min        Max
x1  Trained nurses                       31.1        41.4     1.5      410.4
x2  Other employees                     137.4       169.4     5.3     1821.5
x3  Other expenses                     9066.2     13449.5   190.0   108990.0

Outputs: No. of clients
y1  Institutions, age 0-66                3.4         4.9     0.0       50.0
y2  Institutions, age 67+                87.7       108.6     0.0     1024.0
y3  Short-term stay                     113.8       163.3     0.0     1614.0
y4  Closed wards                         11.8        19.3     0.0      195.0
y5  Single person room                   65.7        82.2     0.0      747.0
y6  Mentally disabled                    48.7        79.5     0.0      857.0
y7  Practical assistance, 0-66           51.3        66.3     0.0      597.0
y8  Practical assistance, 67+           212.7       272.4     1.0     2190.0
y9  Home based nursing, 0-66             34.1        45.3     0.0      407.0
y10 Home based nursing, 67+             125.8       153.3     1.0     1480.0

Table 1 shows descriptive statistics for the variables used in the DEA model. The first three rows measure the inputs in the model. Trained nurses and Other employees show us that about 18% of the employees (measured in man-years) are trained nurses. Other expenses are measured in 1000 NOK (Norwegian currency). The last 10 rows in Table 1 measure the outputs. Institutions, age 0-66 and Institutions, age 67+ are the numbers of institutionalized clients in the age groups 0-66 and above 67 respectively. Short-term stay shows how many visits the institutions in the municipality have received from clients who are not residents, while Closed wards shows how many of the residents are in a special ward for dementia clients. Mentally disabled shows how many of the clients are mentally disabled (almost all of these clients get home care). Practical assistance, 0-66 and Practical assistance, 67+ count how many clients get practical assistance (such as cleaning and making food) in the indicated age groups, while Home based nursing, 0-66 and Home based nursing, 67+ count the same for clients getting nursing services in their own homes.

The Farrell output-oriented efficiency scores

Figure 3 shows E2 (output-increasing efficiency assuming variable returns to scale). Each bar in the diagram represents one of the 469 municipalities, sorted by increasing efficiency.

[Figure 3: Sorted output-oriented efficiency scores, with E2 on the vertical axis and size measured in man-years accumulated on the horizontal axis.]

The height of the bars represents the efficiency of the DMU, while the width of the bar shows the size measured in man-years (the sum of trained nurses and other employees). Both large and small DMUs can be found in all parts of the diagram, with the exception that no large municipalities are located in the (very inefficient) leftmost part of the diagram. The average efficiency is 86 percent, while the efficiency of the average unit is 67 percent.
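The two summary figures are different statistics: the first is the mean of the individual scores, the second the score of a hypothetical unit having the sample-average input-output vector. A small illustration on toy data, reusing the input-oriented sketch from Section 2 for brevity (Figure 3 itself is output oriented, and all names here are ours):

```python
# Mean of the scores vs. the score of the mean unit. Under VRS the mean unit
# lies in the convex hull of the data, so appending it leaves the frontier
# unchanged and its score is well defined.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(3, 30))     # toy data: 3 inputs, 30 units
Y = rng.uniform(1.0, 10.0, size=(10, 30))    # 10 outputs

scores = np.array([farrell_input_efficiency(X, Y, i) for i in range(X.shape[1])])
print("average efficiency:", scores.mean())

Xa = np.hstack([X, X.mean(axis=1, keepdims=True)])
Ya = np.hstack([Y, Y.mean(axis=1, keepdims=True)])
print("efficiency of the average unit:",
      farrell_input_efficiency(Xa, Ya, Xa.shape[1] - 1))
```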
[Figure 4: The taxonomy of units in DEA efficiency analyses. Total (J): 469 units, split into Inefficient (I): 340 and Efficient (P): 129. The additive model splits the efficient units into Efficient Exterior (P^E): 97 and Efficient Interior (P^I): 32. The Farrell efficiency model splits them into Self-evaluators (P^S): 29, of which Exterior (P^SE): 25 and Interior (P^SI): 4, and Active peers (P^A): 100, of which Exterior (P^AE): 72 and Interior (P^AI): 28.]

An overview of the taxonomy developed in Sections 2 and 3 for classification of units is given in Figure 4, together with the actual decomposition for the data set at hand. In view of the relatively large number of observations it may be surprising that as many as 28 percent of the units are efficient. This may be due to the unusually high number of dimensions, 13 variables in all. Since the efficient units span the frontier technology it is to be expected that the number of exterior ones is higher than the number of interior ones, 75 and 25 percent respectively. Turning to the Farrell efficiency model (4), the self-evaluators represent 23 percent of the efficient units. As expected, the relative share of exterior peers is larger in the group of self-evaluators than in the group of active peers, 86 versus 72 percent. Among the active peers the share of interior units is higher, 28 percent. This distribution is of importance for the empirical support of the frontier and the associated efficiency distribution.

Table 2. Relative size of interior and exterior self-evaluators, measured as percentage deviation from the sample average (TrNur = trained nurses; OthEmp = other employees; OthExp = other expenses; Short = short-term stay; Closed = closed ward; Single = single person room; Ment = mentally disabled; PA = practical assistance; HS = home-based nursing)

Municipality                 TrNur OthEmp OthExp Inst0-66 Inst67+ Short Closed Single  Ment PA0-66 PA67+ HS0-66 HS67+
Interior self-evaluators
425 Åsnes                       20      3     -1      -12      68    15     61     25     7     -2    27      6    54
616 Nes                        -56    -63    -58      -41     -44   -38    -49    -25   -71    -40   -36    -38   -41
807 Notodden                    14     36     22       17      83   167     53    106    15    -34    42     17    84
1567 Rindal                    -51    -68    -64      -71     -41   -32    -41    -45   -84    -73   -54    -82   -39
Exterior self-evaluators
101 Halden                     126    194     62       17      32   -32   -100     14   198    167   319    193   402
213 Ski                         63     43     66      311     -12   -38     69     11    54    257    58    -30    -3
217 Oppegård                    92      6     29       76      -9    22   -100     28    42     15    43    390   184
219 Bærum                      929    736   1102      825     748   551   1112   1038   781    659   538    577   415
430 Stor-Elvdal                -68    -48    -49     -100     -52   -49     -7    -50   -77    -67   -37     26   -42
615 Flå                        -90    -73    -73     -100     -71   -77   -100    -70   -90    -84   -81    -97   -77
632 Rollag                     -81    -72    -74      -71     -69   -89   -100    -63   -65    -80   -80    -24   -81
709 Larvik                     345    296    205      517     283   303    273    241   393    356   390    170   280
806 Skien                      353    356    346      252     297    93     19    304   235    349   551    322   562
904 Grimstad                    92     21     -2      -12      27   -23    188     -7    54    159    44    225     9
941 Bykle                      -86    -76    -77     -100     -73   -82    -58    -63   -98    -94   -96    -91   -94
1144 Kvitsøy                   -85    -95    -94     -100     -95   -89   -100    -91  -100    -94   -94    -94   -90
1222 Fitjar                    -61    -64    -73      -71     -70   -58    -32    -27   -77    -75   -73    -77   -79
1411 Gulen                     -68    -49    -57      -41     -24   -84   -100    -57   -75    -84   -59    -65   -45
1612 Hemne                     -51    -57    -58      -71     -65   -16     -7    -82   -88    -49   -41     38     7
1632 Roan                      -88    -82    -86     -100     -82   -77   -100    -71   -84    -94   -79    -91   -74
1702 Steinkjer                 201    116     27      135      85    87    358     68    60    157   172    126   115
1714 Stjørdal                  128     96     29      487      97    99     36    128   153    126    60      3    18
1723 Mosvik                    -72    -87    -89     -100     -76   -82   -100    -76   -77    -88   -85    -88   -75
1839 Beiarn                    -78    -75    -74      -41     -75   -75    -49    -62   -90    -90   -71    -77   -65
1868 Øksnes                    -48    -25    -48      -12     -37   -57    -41    -19   -42    -40   -56    143   -26
1920 Lavangen                  -87    -82    -83      -71     -77   -82    -66    -73   -79    -96   -85    -97   -84
3001 Bygdøy-Frogner             41     10    150     -100     -37  -100   -100    -16   -32     15   194     99   188
3003 St.Hanshaugen-Ullevål     262    271    695       47     392   167    222    441    33     37   347     -9   231
3004 Sagene-Torshov            241    339    808      311     305   338    171    252    23    372   554    176   351
Far out or alone in the crowd?

In Table 2 the relative distance from the average unit is illustrated by measuring each of its variables against the average for the sample (J). The first four units are the interior self-evaluators in P^SI. These units are on both sides of the average, and one of the four units is quite close to the sample average. No unit is close to either the small or the large exterior units. It seems appropriate to use the expression "alone in the crowd." The 25 exterior self-evaluators are distributed with half above and half below the sample average. One unit has maximal sample values for two of the variables. There are several output variables with zero as the lower limit. The variable Institutions, 0-66 has seven exterior units at the minimum value of zero, while for Closed wards there are eight exterior units at the minimum value of zero. So given that "far out" means both small and large units, the exterior units well deserve this classification. The influence of extreme mixes may also be investigated, but due to all the possible comparisons we leave this exercise out.

The idea behind the two-stage approach is based on the distinction between pure inputs and outputs on the one hand, and environmental variables on the other. By the assumptions of the DEA method, the input-output vectors must belong to a deterministic technology set bounded by the frontier. However, environmental variables may be relevant for the performance of the units, but their influence may be regarded to be of a stochastic nature that is most appropriately revealed by studying the statistical association between some measure of performance and the environmental variables. Since the crucial point of being concerned with environmental variables is that they must have some influence on either the discretionary inputs or the outputs, there is a good case for advocating a single-stage approach and incorporating all relevant variables in one single model. One reason for treating environmental variables differently from standard outputs and inputs is that the way the variables interact with the standard production variables may be difficult to model. It may, for example, not be so clear-cut whether the variable is an input or an output. The formulation of the second stage is to establish an association between the efficiency score and the environmental variables, z_k:

$$E_{oi} = f(z_1, \ldots, z_K) + \varepsilon_i, \quad i \in J,\ o = 1, 2 \qquad (15)$$

where ε_i is a random variable. There have been several approaches to estimating (15). The first approach was to specify f(.) as a linear function and apply OLS (Seitz 1967, 1971). But there are two special features of the model (15). By definition the efficiency scores are restricted to be between zero and one,

$$0 \leq E_{oi} = f(z_1, \ldots, z_K) + \varepsilon_i \leq 1, \quad i \in J,\ o = 1, 2 \qquad (16)$$

and using the DEA model (3) or (4) to generate the efficiency scores usually leads to a concentration of values at 1. As shown in Figure 4 we have 28 percent of the efficiency scores at the upper limit of one. This has led researchers to apply a censored regression like the Tobit model or truncated regressions. These approaches are strongly criticized in Simar and Wilson (2003). The fundamental point is made that the efficiency scores in (15) are estimates of the unknown efficiencies, and that these scores are serially correlated. Therefore, neither a Tobit nor a truncated regression will solve this problem.
A sequence of bootstrapping techniques is proposed that will yield proper confidence intervals for the parameters of f(.). However, since the purpose of our paper is to demonstrate the importance of the role of exterior peers, the relation (15) is here interpreted just to represent an investigation of association and not to be a causality model. Therefore, OLS is used to estimate a linear function (15). An advantage of OLS is that better diagnostics to characterize the covariations are available, like the multiple regression coefficient.

Table 3: Stage 2 regression results applying OLS to a linear model

                                          All units included*)     Excluding exterior self-evaluators
                                          Coeff.     p-value       Coeff.     p-value
R2                                        0.1737                   0.2082
Climate indicator                         -0.007     0.035         -0.006     0.056
Share of private institutions              0.098     0.019          0.099     0.015
Free disposable income, 1996              -0.020     0.054         -0.028     0.010
Share of users in home care               -1.019     0.000         -1.089     0.000
Share in home care of age group 0-66       1.823     0.170          1.437     0.282
Share in home care of age group 67-79      0.574     0.006          0.665     0.001
Share in home care of age group 80-89      0.270     0.004          0.261     0.004
Share in home care of age group 90+        0.100     0.019          0.109     0.011
Share in inst. care of age group 0-66     24.785     0.019         27.615     0.009
Share in inst. care of age group 67-79    -1.072     0.011         -0.926     0.026
Share in inst. care of age group 80-89    -0.101     0.524         -0.152     0.331
Share in inst. care of age group 90+       0.026     0.561          0.053     0.235
Constant term                              1.527     0.000          1.562     0.000

*) Communities within the two major cities Bergen and Oslo are aggregated and one unit is removed from the data set.

Table 3 shows the result of an OLS regression using a linear model in (15). The p-values are also given, although they should not be taken at face value due to the inherent statistical problems with the approach, as mentioned above. We perform regressions firstly with the complete data set, and secondly excluding the exterior self-evaluators. The environmental variables represent background variables that experts have suggested may influence the efficiencies of municipalities. Climate indicator is a measure of the average temperature measured over the year in the municipality. It can also be seen as a proxy for the amount of snow, altitude and distance from the coast. We note that removing the exterior self-evaluators changes both the regression coefficient and the p-value, indicating a weaker connection between efficiency scores and this variable. Share of private institutions is measured by how large a share of the total number of institutions is in the private sector (most often NGOs). It would be better to measure this by the number of clients, but such data was not available. Possible interpretations of a positive parameter estimate (and low p-values in both regression models) are that the municipalities' own care providers get a learning effect from the presence of private service providers, or that private presence reduces inefficiency because it increases the fear of privatization in the municipal nursing sector. Free disposable income, 1996 is a measure of the relative wealth of the municipality (per inhabitant). It is calculated by finding the difference between the actual income in the municipality and the "required expenses" in the municipality in other sectors than care for the elderly (i.e. schools, roads etc.). Required expenses are calculated from demographic variables and other factors exogenous to the municipality. (See Aaberge and Langørgen (2003) for the details behind the construction of this indicator.) Data for 1997 (the year all the other data is from) was also available, but we reasoned that the municipality's decision on how it wants to provide care for the elderly is more strongly based on income in the previous than
Data for 1997 (the year all the other data is from) was also available, but we reasoned that the municipality's decision on how it wants to provide care for the elderly is more strongly based on income in the previous year than in the current year. This has some statistical support in that the '96 variable has larger explanatory power as measured by the R2 of the model and the t-value of the parameter estimate. The p-value for the parameter estimate for this variable improves when the exterior self-evaluators are removed from the regression model. One possible explanation of the negative parameter estimate is that a “rich” municipality might use the extra resources on higher quality (not picked up by the DEA model) and/or allow inefficiency in the production of services.

Share of users in home care measures the share of home care clients among all the clients getting nursing services. This coefficient has a negative parameter estimate. This is an indication that technical efficiency tends to be lower when a larger part of the municipality's clients is in home care. This is interesting, because it is a measure of the product mix in the municipality. The DEA method takes the case mix into account when estimating the frontier. However, the distance between the frontier and the average unit behind the frontier might vary with case mix. It is important to remember that since we have no price information on the products (home care and institutionalized care), we do not know which group has the highest total efficiency. Without price information we can only estimate technical efficiency and scale efficiency, not allocative efficiency, which is also a component of total efficiency. Thus, we can make no recommendation of what is better, only point out that the variation of technical efficiency seems to grow with the share of home based nursing.

Share in home care of users in age group … (four age groups) measures how large a share of the total population in an age group gets home based nursing services. With the exception of the lowest age group (0-66) all of the parameter estimates are statistically significant and positive. This supports our hypothesis that the higher the coverage of home based nursing, the lower the required resource usage per client. The reasoning is that the nursing sector behaves as if it ranks its potential clients from the ones that require the most nursing to the ones that require the least, and that it uses this ranking as a prioritized list of which clients to accept first. If the municipality has a larger share of the population in an age group as its clients, we expect the average required resource usage per client to be lower because the average client is healthier.

Share in inst. care of users in age group … (four age groups) is similar to the variables described above, but for institutionalized care. The parameter estimate for the youngest age group (0-66) is positive and statistically significant. It is a priori known that some of these clients require a lot of resource usage, but remember that the number of users in this group (inst. 0-66) is included in the DEA model. It might be that the municipalities that have a relatively large share of these users compared to their total population have healthier clients on average. The only other age group in inst. care that gets a statistically significant parameter estimate is 67-79, where the sign is negative. This is an indication that the “youngest of the oldest” require more resource usage in inst.
care than the other age groups above 67. It might be that it is more difficult for the clients in this relatively young age group to get inst. care, and that the clients who actually get it require more resources on average than in the older age groups.

Removing the exterior self-evaluators can make a difference. In this case the explained share of the total variance in the model increased as R2 rose from 17% to 21%. Both coefficient estimates and p-values change, sharpening the estimates of seven coefficients while only three had increased p-values.5 While the numerical changes are small, they are still sizeable considering that only 25 out of 469 observations (5%) were removed. Essentially, we have removed the units that are most likely not to contain any information, i.e. to be pure noise.6 This is of course not conclusive evidence that one approach is better than the other. The point we want to make is that it may make a difference. We have already argued that it makes theoretical sense to remove the exterior self-evaluators. It may be added that in Simar and Wilson (2003) it is conjectured that the bootstrap works better the denser the data. Since we have removed data points in regions that by definition are as “thin” as possible, the bootstrap should also work better. In sum, we feel that we have made a solid case for the advantages of identifying and removing the exterior self-evaluators when doing a two-stage analysis in a DEA setting.

5 In contrast, excluding all self-evaluators, both interior and exterior, would have lowered R2 and decreased p-values only for three coefficients and increased them for seven.

6 Preliminary results from using the homogenous bootstrap suggested by Simar and Wilson (1998) show a standard error of the bias-corrected estimates that is consistently twice as large for the exterior than for the interior self-evaluators, supporting the lack of information content in the efficiency estimates of the former.

5. Conclusions

The units found strongly efficient in DEA studies on efficiency can be divided into self-evaluators and active peers, depending on whether or not they serve as a reference for any inefficient units. The contribution of the paper starts with subdividing the self-evaluators into interior and exterior ones. The exterior self-evaluators are efficient “by default”; there is no firm evidence from observations for the classification. Self-evaluators may most naturally appear at the “edges” of the technology, but it is also possible that self-evaluators appear in the interior. It may be of importance to distinguish between the self-evaluators being exterior or interior. Finding the influence of some variables on the level of efficiency by running regressions of efficiency scores on a set of potential explanatory variables is an approach often followed in actual investigations. Using exterior self-evaluators with an efficiency score of 1 in such a “two-stage” procedure may then distort the results, because assigning the value of 1 to these self-evaluators is arbitrary. Interior self-evaluators, on the other hand, may have peers that are fairly similar. They should then not be dropped when applying the two-stage approach. A method for classifying self-evaluators based on the additive DEA model, either CRS or VRS, is developed. The exterior strongly efficient units are found by running the enveloping procedure “from below”, i.e. reversing the signs of the slack variables in the additive model (1), after removing all the inefficient units from the data set.
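As an illustration of this classification step, the following is a hedged sketch of the “reversed” additive test: on the subset of strongly efficient units, the slack signs of the additive model are flipped so that the data are enveloped from below, and a unit whose maximal total slack is zero is read as exterior. This is one possible reading of the idea behind model (9), not a verbatim implementation; X and Y hold the inputs and outputs of the strongly efficient units only.

```python
# Hedged sketch of the "reversed" additive model: envelop the strongly
# efficient units from below by flipping the slack signs of the additive
# model, and flag a unit as exterior when the maximal total slack is zero.
# One possible reading of model (9), not a verbatim implementation.
import numpy as np
from scipy.optimize import linprog

def reversed_additive_slack(X: np.ndarray, Y: np.ndarray, o: int) -> float:
    n, S = X.shape
    M = Y.shape[1]
    # variables: lambda (n), input slacks (S), output slacks (M)
    c = np.concatenate([np.zeros(n), -np.ones(S + M)])   # maximize total slack
    A_eq = np.zeros((S + M + 1, n + S + M))
    A_eq[:S, :n] = X.T                    # sum_j lambda_j x_sj ...
    A_eq[:S, n:n + S] = -np.eye(S)        # ... minus input slack (sign reversed)
    A_eq[S:S + M, :n] = Y.T               # sum_j lambda_j y_mj ...
    A_eq[S:S + M, n + S:] = np.eye(M)     # ... plus output slack (sign reversed)
    A_eq[-1, :n] = 1.0                    # VRS convexity constraint
    b_eq = np.concatenate([X[o], Y[o], [1.0]])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (n + S + M))
    return -res.fun                       # maximal total slack for unit o

# Units that cannot be enveloped from below are classified as exterior.
exterior = [o for o in range(len(X)) if reversed_additive_slack(X, Y, o) < 1e-6]
```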
Which of the strongly efficient units from the additive model (1) turn out to be self-evaluators or active peers will depend on the orientation of the efficiency analysis, i.e. whether input or output orientation is adopted. The classification into exterior and interior peers is determined by whether the strongly efficient units turn out to be exterior ones when running the “reversed” additive model (9). The exterior self-evaluator units should be removed from the observations on efficiency scores when performing a two-stage analysis of explaining the distribution of the scores. The application to Norwegian municipal nursing and home care services shows significant effects of removing exterior self-evaluators from the data when doing a two-stage analysis. Thus, conclusions as to explanations of the efficiency score distribution will be qualified when our new taxonomy is taken into use.

References

Banker, R.D., A. Charnes and W.W. Cooper (1984): "Some Models for Estimating Technical and Scale Inefficiencies," Management Science, 30, 1078-1092.

Charnes, A., C.T. Clark, W.W. Cooper, and B. Golany (1985a): "A Developmental Study of Data Envelopment Analysis in Measuring the Efficiency of Maintenance Units in the U.S. Air Forces," Annals of Operations Research, 2, 95-112.

Charnes, A., W.W. Cooper, B. Golany, L. Seiford, and J. Stutz (1985b): "Foundations of Data Envelopment Analysis for Pareto-Koopmans Efficient Empirical Production Functions," Journal of Econometrics, 30, 91-107.

Charnes, A., W.W. Cooper and E. Rhodes (1978): "Measuring the efficiency of decision making units," European Journal of Operations Research, 2, 429-444.

Cooper, W.W., L.M. Seiford, and K. Tone (2000): Data Envelopment Analysis. A comprehensive text with models, applications, references and DEA-solver software, Boston/Dordrecht/London: Kluwer Academic Publishers.

Edvardsen, D.F. and F.R. Førsund (2001): "International benchmarking of electricity distribution utilities," Memorandum 35/2001, Department of Economics, University of Oslo.

Edvardsen, D.F., F.R. Førsund and E. Aas (2000): "Effektivitet i pleie- og omsorgssektoren" [Efficiency in the nursing and home care sector], Rapport 2/2000, Oslo: Frischsenteret.

Erlandsen, E. and F.R. Førsund (2002): "Efficiency in the Provision of Municipal Nursing and Home Care Services: The Norwegian Experience," in K.J. Fox (ed.): Efficiency in the Public Sector, Boston/Dordrecht/London: Kluwer Academic Publishers, x-y.

Färe, R. and D. Primont (1995): Multi Output Production and Duality: Theory and Applications, Southern Illinois University at Carbondale.

Farrell, M.J. (1957): "The measurement of productive efficiency," Journal of the Royal Statistical Society, Series A, 120 (III), 253-281.

Førsund, F.R. and N. Sarafoglou (2002): "On the origins of data envelopment analysis," Journal of Productivity Analysis, 17, 23-40.

Nerlove, M. (1965): Estimation and identification of Cobb-Douglas production functions, Amsterdam: North-Holland Publishing Company.

Seitz, W.D. (1967): "Efficiency measures for steam-electric generating plants," Western Farm Economic Association, Proceedings 1966, Pullman, Washington, 143-151.

Seitz, W.D. (1971): "Productive efficiency in the steam-electric generating industry," Journal of Political Economy, 79, 878-886.

Simar, L. and P.W. Wilson (1998): "Sensitivity Analysis of Efficiency Scores: How to Bootstrap in Nonparametric Frontier Models," Management Science, 44, 49-61.

Simar, L. and P.W.
Wilson (2003): "Estimation and inference in two-stage, semi-parametric models of production processes," Technical report 0310, IAP statistics network (http://www.stat.ucl.ac.be/Iapdp/tr2003/TR0310.ps).

Torgersen, A.M., F.R. Førsund, and S.A.C. Kittelsen (1996): "Slack-Adjusted Efficiency Measures and Ranking of Efficient Units," Journal of Productivity Analysis, 7, 379-398.

Aas, E. (2000): "På leting etter målefeil – en studie av pleie- og omsorgssektoren" [Looking for measurement errors – a study of the nursing and care sector], Notater 2000:10, Statistics Norway, Oslo.

CLIMBING THE EFFICIENCY STEPLADDER: ROBUSTNESS OF EFFICIENCY SCORES IN DEA∗

by

Dag Fjeld Edvardsen
Norwegian Building Research Institute, Forskningsveien 3b, NO-0314 Oslo, Norway. Email: [email protected]

Abstract: The robustness of the efficiency scores in DEA (Data Envelopment Analysis) has been addressed on a number of occasions. It is of crucial importance for the practical use of efficiency scores. The purpose of this paper is to demonstrate the usefulness of a new way of getting an indication of the sensitivity of each of the efficiency scores to measurement error. The main idea is to investigate a DMU's (Decision Making Unit) sensitivity to sequential removal of its most influential peer (with new peer identification as part of each of the iterations). The Efficiency stepladder approach is shown to provide relevant and useful information when applied to a dataset of Nordic and Dutch electricity distribution utilities. Some of the empirical efficiency estimates are shown to be very sensitive to the validity and existence of one or a small number of other observations in the sample. The main competing method is Peeling, which consists of removing all the frontier units in each step. The new method has some strengths and some weaknesses in comparison. All in all, the Efficiency stepladder measure is simple and crude, but it is shown that it can provide useful information for practitioners about the robustness of the efficiency scores in DEA.

Keywords: DEA, Sensitivity, Robustness, Efficiency stepladder, Peeling.

∗ This study is part of the methodological development within the research project “Productivity in Construction” at the Norwegian Building Research Institute (NBI) financed by the Norwegian Research Council. Finn R. Førsund has followed the whole research process and offered detailed comments. Hans Bjurek, Håkan Eggert, Lennart Hjalmarsson, and Sverre A.C. Kittelsen have also given valuable comments. Any remaining misunderstandings are solely this author's responsibility.

1. Introduction

The robustness of the efficiency scores in DEA has been addressed in a number of research papers. There are several potential problems that can disturb precise efficiency estimation, such as sampling error, specification error, and measurement error. It is almost exclusively the latter that is dealt with in this paper. It has been proven analytically that the DEA efficiency estimators are asymptotically consistent given that a set of assumptions is satisfied.1 The most critical assumption might be that there are no measurement errors. The DEA method estimates the production possibility set by enveloping the data as closely as possible, in the sense that the frontier consists of convex combinations of actual observations, and the frontier estimate can never be “below” an observed value. If the assumption of no measurement error is broken we might observe input-output vectors that are outside the true production possibility set, and the DEA frontier estimate will be too optimistic.
Calculating the efficiency of a correctly measured observation against this optimistic frontier will lead to efficiency scores that are biased downwards. In other words, even symmetric measurement errors can produce efficiency estimates that are too pessimistic. It is of crucial importance for the practical use of the efficiency scores that information about their sensitivity is available. The reason why measuring sensitivity is a challenge is in a sense related to the difficulty of visualizing n-dimensional space. In two dimensions, and possibly three, one can get an idea of the sensitivity of one observation's efficiency score by visually inspecting a scatter diagram. But when the number of dimensions is higher than three, help is needed. The Efficiency stepladder method introduced in this paper is offered as a tool for empirically oriented DEA applications. This paper is not about detecting outliers; it is about investigating the robustness of each DMU's efficiency score. The main inspiration is Timmer (1971), and the intention is to offer a crude and simple method that works relatively quickly and is available to practitioners as a freely downloadable software package. In the following only DEA related approaches are considered. There are mainly two ways sensitivity to measurement error in DEA has been examined: (1) perturbations of the observations, often with strong focus on the underlying LP model, and (2) exclusion of one or more of the observations of the dataset. The Efficiency stepladder is based on the latter alternative.

1 See Banker (1993) and Kneip et al. (1998) for details.

The main idea is to examine how the efficiency score of a given inefficient DMU develops as the most influential other DMU is removed in each of the iterative steps. The first step is to determine the peer whose removal is associated with the largest increase in the efficiency score. This peer is permanently removed, and the DEA model is recalculated, giving a new efficiency score and a new set of peers. The removal continues in this fashion until the DMU in question is fully efficient. This series of iterative DMU exclusions provides an “efficiency curve” of the increasing efficiency values connected with each step. There are few alternative approaches available that provide information about the sensitivity of efficiency scores. Related methods in the literature are Peeling (Barr et al., 1994), Efficiency Order (Sinuany-Stern et al., 1994) and Efficiency Depth (Cherchye et al., 2000). Peeling consists of removing all the frontier units in each step. There are also similarities between the Efficiency stepladder and the Efficiency Order/Efficiency Depth methods. The main difference is that the Efficiency stepladder approach is concerned with the stepwise increase in the efficiency scores after each iterative peer removal, while the Efficiency Order/Efficiency Depth methods are more concerned with the number of observation removals required for the DMU in question to reach full efficiency. The empirical application is mainly used as an illustration of how the Efficiency stepladder method works on real world data. The application is used to show what kind of analysis can be performed using this method. Carrying out a full scale empirical analysis is an extensive undertaking, and is outside the scope of this paper. The layout of the rest of the paper is according to the following plan. Section 2 gives a brief survey of some of the literature related to the sensitivity of the efficiency scores in DEA.
Section 3 explains the basic properties of the DEA method. The introduction of the Efficiency stepladder approach is the topic of Section 4. In Section 5, the model specification and the basic facts about the dataset are presented. The empirical results and how the Efficiency stepladder method can provide insight about the sensitivity of the dataset used are found in Section 6. Section 7 rounds off the paper with the conclusions.

2. Sensitivity in DEA – a brief survey

The topic of this paper is the sensitivity of the efficiency scores in DEA. Other nonparametric approaches are claimed to be more robust to noisy data. One example is the Order-M frontier method, described in Cazals et al. (2002). One application of this method (on U.S. commercial banks) is Wheelock and Wilson (2003). Instead of measuring performance relative to the unknown (and difficult-to-estimate) boundary of the production set, performance for a given DMU is measured relative to the expected maximum output among banks using no more of each input than the given DMU. The authors claim that this approach permits a fully non-parametric estimation with a much better rate of convergence than DEA, avoiding the usual curse of dimensionality that plagues traditional non-parametric efficiency estimators. In the following, only DEA related approaches are considered. There are mainly two ways in which sensitivity to measurement error in DEA has been examined: (1) perturbations of the observations, often with strong focus on the underlying LP model, and (2) exclusion of one or more of the dataset observations. Other alternatives have been used when more information about the uncertainty of one or a few of the dimensions is available.2

2.1 Investigations based on perturbations of the data in the LP model

Charnes et al. (1985) examined the consequences of varying one of the output variables. The intention was to identify the efficient DMUs that have wide ranging effects and distinguish them from others whose effects are more limited. In the conclusion they state that “More work needs to be done to extend this for studying the consequences of altering several outputs simultaneously. Input variations and also simultaneous variations of inputs and outputs need to be addressed in other research that should be of value for sensitivity analysis in general.” One of the papers that picked up that challenge was Charnes et al. (1992), who used “distance” (the norm of a vector) in order to determine the “radii of stability” for a DMU. Within this region, data variations do not alter a DMU's status from inefficient to efficient (or vice versa). This is done by centring a box on the original observation for the DMU in question. This box (they refer to it as a “unit ball”, even when it is not round in any possible sense) is defined by the Chebyshev norm, which is given by the smallest distance from the centre of the box to any of its sides. For an inefficient DMU the radius defining this box is increased from zero until an observation within this box can be reclassified from inefficient to efficient. The sensitivity of the efficient units is estimated in a similar way.

2 See Kittelsen et al. (2001).

Thompson et al. (1994) wanted to determine the magnitudes of data variations that can cause changes in status for the DMUs classified as fully efficient. Their method is based on studying the effects of small increments and decrements in the inputs and outputs with regard to the DMU's classification as efficient or inefficient.
They applied this method to two real world datasets (Kansas farming and Illinois coal mining). In the latter they found that within the data variations considered (+/-20% or less in absolute value), 98% of the DMUs in the subset originally classified as 100% efficient were insensitive to potential data errors. The authors claim that their sensitivity analysis shows that DEA results tend to be robust for extreme efficient DMUs.

Zhu (1996) examines how to identify the sensitivity or robustness of efficient DMUs in DEA. His approach is based on linear programming problems whose optimal values yield particular regions of stability. Sufficient and necessary conditions for variations in inputs and outputs of an efficient DMU to maintain full efficiency are provided.

2.2 Investigations based on exclusion of observations from the dataset

An early and influential contribution was Timmer (1971). This paper drew heavily on Farrell (1957), but used deterministic frontiers (estimated with linear programming) instead of DEA. Though not mentioned in its abstract, the paper was a pioneering contribution when it comes to measuring the sensitivity of the efficiency scores when removing selected units from the dataset. Timmer showed two ways to do this. The first alternative he suggested was to remove observations from the dataset until a given percentage of the dataset is outside the probabilistic frontier. The other alternative he suggested was to remove efficient observations one by one until the resulting frontier stabilizes. Timmer claimed that either of these approaches may overcome the objections to estimating a frontier function because of data problems.

Superefficiency was introduced in Andersen and Petersen (1993). It was introduced to rank efficient units, but as pointed out in Banker and Chang (2000), it is probably more useful for detecting outliers when there is reason to believe that the data might be noisy. Superefficiency is a measure of the relative radial distance from the origin to the DMU in question, when the frontier is estimated without this DMU included in the dataset. Superefficiency is by construction greater than (or equal to) one. A superefficiency value of 1.2 implies that the DMU is positioned “20% outside” where the frontier would have been without this DMU (in a radial sense).

Peeling is described in Barr et al. (1994). This approach measures how much the efficiency of a DMU would change if the whole frontier was removed. They used the allegory that peeling in DEA is like removing layers from an onion. The DEA dataset can be seen as a series of frontiers inside other frontiers. If we remove all the observations in the first frontier, a new frontier is generated when the LP model is recalculated for the remaining units. This continues until there are no more observations left. With peeling one is typically mostly concerned with which frontier a DMU belongs to, i.e. the layer in which it becomes efficient. A weakness is that the frontiers in DEA typically consist of different numbers of units. For the individual DMU, removing one single unit can be sufficient for it to reach the frontier. Removing the entire frontier is measured as one operation, independently of the number of units this particular frontier consisted of. One attractive aspect of Peeling is that it is very fast to compute. Peeling is well known in the DEA research community, but surprisingly few empirical DEA applications take advantage of this method. One possible explanation is that none of the mainstream commercial and freeware DEA software packages offer automatic generation of the layer number of each DMU. A sketch of the peeling loop is given below.
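The peeling loop itself is short. The following is a hedged sketch assuming a helper dea_vrs_scores(X, Y) that returns the input-saving VRS efficiency of every row; that helper name is an assumption for this sketch, not an existing package routine, and a concrete linprog-based version of it is sketched in Section 3.2 below.

```python
# Hedged sketch of peeling (Barr et al., 1994): repeatedly compute DEA
# scores and strip the current frontier, recording the layer number of
# each DMU. Assumes a helper dea_vrs_scores(X, Y); see Section 3.2 below.
import numpy as np

def peel(X: np.ndarray, Y: np.ndarray, tol: float = 1e-6) -> np.ndarray:
    layer = np.zeros(len(X), dtype=int)   # layer number per DMU, 1-based
    remaining = np.arange(len(X))
    level = 1
    while remaining.size > 0:
        scores = dea_vrs_scores(X[remaining], Y[remaining])
        on_frontier = scores >= 1.0 - tol
        layer[remaining[on_frontier]] = level  # this layer is now peeled off
        remaining = remaining[~on_frontier]
        level += 1
    return layer
```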
Peeling in DEA is in spirit very close to Timmer (1971), but since Timmer did not use DEA the selection of which and how many DMUs to remove is a little different. Timmer suggested removing a given number or a given percentage of the DMUs, while Barr et al. suggest removing the entire frontier – independently of whether the frontier is made up of 1 or 20 DMUs.

Sinuany-Stern et al. (1994) introduced Efficiency Order as “the number of units we need to delete in order to reach efficiency.” Which algorithm one should use to identify the number of units that need to be deleted is not explained in detail. A similar approach can be found in Cherchye et al. (2000),3 who used a mixed integer algorithm to identify the Efficiency Order. Further information on how the Efficiency Order relates to the Efficiency stepladder is given in Section 4.

Wilson (1995) investigated the consequences of removing observations from the dataset. If removing an observation makes big differences in the efficiency scores of the other DMUs, then the area of the dataset where this input/output combination was found is not densely populated, and convex combinations of other DMUs offer little help. This is an indication that the observation in question is a possible outlier and should be investigated for measurement error. By definition this approach works only on the fully efficient units.

3 They use the term “efficiency depth”, and do not refer to Sinuany-Stern et al. (1994).

3. Data Envelopment Analysis

3.1 The origins of DEA

The original idea behind DEA was introduced in Farrell (1957). It was further developed in a very influential paper by Charnes, Cooper and Rhodes (1978). The term Data Envelopment Analysis (DEA) was coined in their paper. However, the first use of Linear Programming (LP) in the calculation of the DEA efficiency scores was made by Farrell and Fieldhouse (1962). The DEA model with variable returns to scale is often referred to as the BCC-model (Banker, Charnes and Cooper, 1984), but it was introduced in Afriat (1972) in the single output case, and empirically implemented in the case of multiple outputs in Färe, Grosskopf and Logan (1983).4 Banker (1993) proved that the output oriented efficiency score is consistent in the case of a single output, while Kneip et al. (1998) showed statistical consistency and the rate of convergence in the general multiple-input and multiple-output case. Unfortunately the rate of convergence is low, leading to sampling bias. The expected size of this bias increases exponentially in the number of inputs and outputs for a given sample size. The bias can be estimated and the efficiency estimate bias adjusted with a statistical technique referred to as bootstrapping (Efron, 1979). If the required number of inputs and outputs is large compared to the number of DMUs available, the standard errors for the (bootstrapped) bias corrected efficiency scores will be very large, and the discrimination of the efficiency scores will have little statistical significance. Including too few inputs and outputs will reduce the curse of dimensionality, but will lead to a wrongly specified efficiency model.
In a sense this is worse, because the confidence intervals will misleadingly tend to be smaller the fewer inputs and outputs we include.5 Including too many inputs or outputs (as long as the correct ones are included) will not make the efficiency estimator inconsistent (in an asymptotic sense), but with finite samples it will make the efficiency estimate more noisy and biased. Statistical tools for choosing the model specification have been developed, but they do (of course) require that observations of the important inputs and outputs are available, and that a sufficiently large number of DMUs is available for the tests to give significant results (depending on the power of the tests). This line of thought leads back to Banker (1993, 1996) and Kittelsen (1993).

4 For the history of the development of DEA, see Førsund and Sarafoglou (2002).

5 This is a complicated mechanism, and will not be covered in further detail in this paper.

3.2 The LP formulation of the DEA model

Førsund and Hjalmarsson (1979) define the measures E1 to E5, where E1 is radial efficiency assuming variable returns to scale. This is the same as the Banker et al. (1984) model, formulated as:

$$
\begin{aligned}
E_1^i \equiv \min \; & \theta_i \\
\text{s.t.} \quad & \sum_{j \in N} \lambda_{ij} y_{mj} - y_{mi} \ge 0, \quad m = 1,\ldots,M \\
& \theta_i x_{si} - \sum_{j \in N} \lambda_{ij} x_{sj} \ge 0, \quad s = 1,\ldots,S \\
& \sum_{j \in N} \lambda_{ij} = 1 \\
& \lambda_{ij} \ge 0, \quad j \in N
\end{aligned}
\qquad (1)
$$

The usage of symbols in model (1) is as follows: E1i is the input saving VRS efficiency for DMU i, θi is a scalar, S is the number of input dimensions, M is the number of output dimensions, and N is the set of DMUs. The indices i and j belong to the set N, ymj is the level of output m for DMU j, xsj is the level of input s for DMU j, and the λij are reference weights. The peers for DMU i in problem (1) are the ones for which λij is strictly positive. If DMU i is strongly efficient then λii has the value of 1. In other words, a strongly efficient DMU is its own peer.6 Notice that not all units with radial efficiency equal to 1 are Pareto efficient. They might have slack in one or more dimensions. To identify which of the DMUs are Pareto efficient, the “additive” DEA model can be used. Here the sum of slacks for each DMU is maximized, and only the Pareto efficient units have zero slack (see Charnes et al., 1985).

6 If two (or more) DMUs have identical input-output vectors, the choice of peer(s) is not unique.

Figure 1 is an illustration of how the DEA model works in the VRS case with two inputs and one output. The DMUs A, B and C are efficient and define the boundary of the Production Possibility Set (PPS), while DMU D is inefficient and is positioned strictly in the interior of the PPS. The input saving radial efficiency of DMU D is equivalent to the proportional radial contraction of all inputs possible while staying within the PPS. The radial contraction is stopped at point F. The radial input saving efficiency of DMU D (E1) is equal to the ratio GF/GD. However, point F is not a Pareto efficient point. In addition to reducing both inputs by the same percentage, it would also be possible to further reduce the usage of input x1 by an amount equivalent to the distance FH. The existence of this extra slack is not captured by the E1 measure.

[Figure 1: Illustration of DEA with two inputs and one output (VRS) using the radial Farrell input reducing efficiency measure. Axes: input x1 and input x2.]
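As an illustration, model (1) can be solved numerically as follows, using scipy's linprog with the decision variables θi followed by the n reference weights. This is a hedged sketch under the notation above, not the solver actually used in this paper (the numbers here were produced with DagEA/DEAP, introduced in Section 4).

```python
# Hedged sketch of model (1): the input-saving VRS score of DMU i solved
# with scipy's linprog. X is (n, S) inputs, Y is (n, M) outputs. For
# illustration only; the paper's own numbers come from DagEA/DEAP.
import numpy as np
from scipy.optimize import linprog

def e1_score(X: np.ndarray, Y: np.ndarray, i: int) -> float:
    n, S = X.shape
    M = Y.shape[1]
    c = np.concatenate([[1.0], np.zeros(n)])      # minimize theta
    A_ub = np.zeros((M + S, 1 + n))
    A_ub[:M, 1:] = -Y.T                           # sum_j lambda_j y_mj >= y_mi
    A_ub[M:, 0] = -X[i]                           # theta * x_si ...
    A_ub[M:, 1:] = X.T                            # ... >= sum_j lambda_j x_sj
    b_ub = np.concatenate([-Y[i], np.zeros(S)])
    A_eq = np.concatenate([[0.0], np.ones(n)]).reshape(1, -1)  # sum lambda = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (1 + n))
    return res.fun                                # optimal theta = E1 for DMU i

def dea_vrs_scores(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    # the helper assumed in the peeling sketch of Section 2.2
    return np.array([e1_score(X, Y, i) for i in range(len(X))])
```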
4. The Efficiency stepladder – a method for measuring sensitivity in DEA

The basic idea behind the Efficiency stepladder approach is quite similar to the Efficiency Order (Section 2.2). The robustness of the efficiency score of the unit under investigation is examined in light of the exclusion of other observations in the sample. In both cases one is interested in the lowest possible number of observations that have to be removed for the DMU in question to reach the frontier, but with the Efficiency stepladder approach there is greater focus on the entire development, from the original efficiency score and then step by step until the DMU is fully efficient. The exact algorithm used is presented in Section 4.1, but the basic idea used in the computer program is in each step to determine the peer whose removal is associated with the largest efficiency increase. This peer is then removed, and the DEA model is recalculated, leading to a new peer group set. This is repeated until the DMU has an efficiency score of 100%.

One alternative algorithm could be to iterate over all alternatives and then determine which sequence of observation removals moves the DMU to the frontier most quickly. This approach would however be extremely time consuming even with medium sized datasets because of the very high number of possible sequences to consider. A natural implementation when an unlimited number of CPU cycles is not available is a one step optimal algorithm. It can easily be shown that this algorithm can choose poor paths when digging out peers in order to get the unit to the frontier with as few steps as possible (one example is provided further down in the text), but the results are still useful as long as one remembers that a high number of steps to the frontier should not be taken as very strong evidence that the DMU is inefficient. On the other hand, a low number of Efficiency stepladder iterations before we reach the frontier means with certainty that the efficiency of the DMU in question is very sensitive to the quality of the observations for the DMUs removed in the Efficiency stepladder sequence. In other words, be concerned if the slope of the Efficiency stepladder is steep, but don't be too calm if the increase is slow.

The computer program used to calculate the numbers in this paper is “DagEA”, which has been developed for this exact purpose. In the current version it is a front end7 that uses DEAP (Coelli, 1996) as its DEA solver. Both DagEA and DEAP will be freely available on the Internet. In the front end of DagEA there are routines for automatic calculation of the Efficiency stepladder for a dataset.8 For middle-sized datasets, calculation is relatively fast, but calculation time increases exponentially with the dimensionality.9

7 A front end is a computer program that provides the visual interface that the user interacts with, but it uses another computer program as its calculation engine.

8 DagEA is developed by Dag Fjeld Edvardsen. It will eventually be downloadable for free from http://home.broadpark.no/~dfedvard.

9 Calculating all the Efficiency stepladder values on the dataset used in this paper took 37 minutes on a 1.2 GHz Pentium M notebook, but there are still possibilities for optimizing the source code. Increasing the speed by a factor of 10 on the same hardware might be realistic.

4.1 The Efficiency stepladder approach illustrated

The one step optimal algorithm is very simple:

1. Calculate the DEA efficiency score for the unit of interest (“DMU P1”) and write down which units serve as peers for DMU P1. The peers are characterized by having a strictly positive λ in the optimal solution of the LP model formulated in (1).
2. For each of the peers identified in the step above: Calculate the efficiency score of DMU P1 if that peer is removed from the dataset, write down the efficiency score, and put the peer back in the dataset before the efficiency of DMU P1 is calculated with another peer temporarily removed.

3. Permanently remove the most influential peer identified in the step above. This is the peer whose removal is associated with the largest change in efficiency (equivalently, the peer whose removal makes the efficiency score of DMU P1 the largest). This efficiency score and the most influential peer's identity are added to the Efficiency stepladder table.

4. Repeat (1)-(3) while permanently removing the peers identified in (3), and for each iteration add the id-number of the most influential peer and the efficiency score associated with its removal. Stop the repetitions when the efficiency of DMU P1 reaches 1.

4.2 Possible problems with the one step optimal algorithm

Figure 2 is an example of how a one step optimal routine can potentially choose a route towards the frontier that takes a higher number of steps than necessary. The challenge is to find the shortest sequence of peer exclusions that will make DMU J fully efficient. Looking at Figure 2, it is easy to see that excluding DMU H and then DMU I results in DMU J reaching the frontier in two steps. However, the one step optimal algorithm by definition only compares the alternatives one step further down the road. Because of this it will choose to remove DMU A instead of removing DMU H, since this will result in the greatest increase in the efficiency score for DMU J. Next, with DMU A out of the way, it will choose to remove DMU B instead of DMU H, for the same reason. This continues as the one step optimal algorithm chooses to remove the DMUs C, D, E, F, and G. Only after this does it eliminate DMU H and DMU I. In this example the algorithm used nine steps to accomplish what really needed only two steps.

[Figure 2: Illustration of how the one step optimal algorithm can choose the wrong path.]

However, this example is constructed to show the one step optimal algorithm in the worst possible light. There is no indication that such behaviour is common when real world data is used. At the same time it is a demonstration of why it is important to think of the Efficiency stepladder approach as “one way safe”. If it reports that it only takes a low number of sequential peer removals to move from large inefficiency to the frontier, one can be certain that the sensitivity of the efficiency score is high. But as demonstrated by Figure 2, one should not be too calm if the algorithm indicates that a high number of peer removals is necessary. It could be tempting to use brute force and compare all possible peer removals until the path with the lowest number is found, but this may not be a practical alternative because it would exhaust the capacity of today's generation of PCs. The reason is that the number of alternatives to compare will easily be extremely high. There are some ways to reduce this problem. One way is to cluster two observations close to each other into a single entity, and possibly minimize a penalty function where removing this entity counts as double the removal of only one DMU. Another possibility is to combine the Efficiency stepladder approach with peeling, and notice the cases where there are large differences between the results of these two methods. Peeling has its own weaknesses, but it is simpler, faster and in some cases more robust in difficult situations such as the one presented in Figure 2.
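The one step optimal loop of Section 4.1 can be sketched as follows, reusing e1_score from the sketch in Section 3.2. For brevity the sketch re-solves the model without each remaining candidate instead of reading the λ values to find the peers; removing a non-peer never raises the score, so the chosen peer is the same, only the run time differs. This is an illustration, not DagEA's actual implementation.

```python
# Hedged sketch of the one step optimal Efficiency stepladder loop,
# reusing e1_score from the Section 3.2 sketch. Not DagEA's actual code.
import numpy as np

def stepladder(X: np.ndarray, Y: np.ndarray, p: int, tol: float = 1e-6):
    keep = list(range(len(X)))         # DMUs still in the data set
    path = []                          # (removed peer, new score) per step
    score = e1_score(X[keep], Y[keep], keep.index(p))
    while score < 1.0 - tol:
        best = None
        for cand in keep:              # candidates for most influential peer
            if cand == p:
                continue
            trial = [j for j in keep if j != cand]
            s = e1_score(X[trial], Y[trial], trial.index(p))
            if best is None or s > best[1]:
                best = (cand, s)
        keep.remove(best[0])           # permanently drop most influential peer
        score = best[1]
        path.append(best)              # one rung of the Efficiency stepladder
    return path
```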
4.3 Efficiency stepladder for fully efficient units

The Efficiency stepladder can also be calculated for fully efficient DMUs, or DMUs which become fully efficient after having gone through a number of iterations from their original position below the frontier. In these cases we measure using the “superefficiency” concept (Andersen and Petersen, 1993). We continue to do Efficiency stepladder iterations, and stop when the superefficiency is undefined. The higher the number of steps from the original position to undefined, the more units are involved in calculating the efficiency of the DMU. The reason why the sensitivity of the fully efficient DMUs is interesting is the same as for the inefficient units: it is relevant to know how robust to measurement error the part of the frontier that this unit is compared with is. If the efficiency of the unit becomes undefined after a low number of Efficiency stepladder iterations, this is an indication that the part of the frontier that this unit is compared with is not very robust. Calculating the Efficiency stepladder involves removing the most influential peer for a DMU in each step. The fully efficient DMUs are their own peers, and when we remove them the value is by definition greater than or equal to 1 (they are no longer allowed to be part of the convex combinations that define the frontier, but remain as ghost units that we measure against). For this reason ESL(1) for the frontier units is the same as superefficiency, while ESL(2) and later take the superefficiency concept further.

5. Model specification and data

The empirical part of this paper is mainly intended as an illustration of the ESL method. The dataset is a quite typical example of the datasets used in empirical applications of the DEA method when it comes to the number of observations and the number of inputs and outputs. The dataset is cross section data on the Nordic and Dutch electricity distributors in 1997 (see Edvardsen and Førsund, 2003). The data was collected by the national regulators. The key characteristics of the data are presented in Table 1. The difference in size between the DMUs is large, as revealed by the last two columns. TOM is Total Operating and Maintenance cost (including labor costs) measured in thousands of Swedish kronor. LossMWh is energy loss in megawatt hours. RV is Replacement Value measured in thousands of Swedish kronor. NumCust is the number of customers. TotLines is the total length of lines. MWhDelivered is the sum of megawatt hours delivered. See Edvardsen and Førsund (2003) for further details on the content and history of the dataset.

Table 1. Summary statistics. Cross-section 1997. Number of units 122.

Variable        Average     Median      Std. Dev.   Minimum    Maximum
TOM (kSEK)      152388      97026       182923      11274      981538
LossMWh         91449       52318       104777      7020       615281
RV (kSEK)       2826609     1907286     3288382     211789     22035846
NumCust         109260      55980       163422      20035      1052096
TotLines        7640        4948        8824        450        54166
MWhDelivered    2110064     1003472     2815025     166015     178054730

6. The results

6.1 The basic DEA efficiency results

Figure 3 is an Efficiency Diagram10 showing the results of the efficiency calculations assuming Variable Returns to Scale (VRS). Each of the efficiency scores is calculated by solving the linear programming problem in (1). The DEA calculations shown in Figure 3 assume no measurement error, but what if this assumption does not hold?
The purpose of the Efficiency stepladder approach is to examine the sensitivity of efficiency scores to measurement errors.

[Figure 3: Input saving efficiency scores when assuming VRS (E1). Vertical axis: E1; horizontal axis: size in TOM.]

10 One interesting feature of Efficiency diagrams is that both the heights and the widths of the bars can contain information – unlike a bar chart where only the heights of the bars are actively used. This is especially useful when illustrating the results of efficiency analysis. The efficiency of each DMU is shown by the height of the bar, while its economic size (TOM in Figure 3) is shown by the width of the bars. This means that it is possible to examine whether there are any systematic correlations between the sizes of the units and their efficiencies. Another interesting geometric aspect of these figures is that they are sorted according to increasing efficiency from left to right. The distance from the top of each bar to 1.00 is a measure of that DMU's inefficiency, and the width of the bar is a measure of its economic size. For this reason the area above each of the bars is proportional to the economic cost of that DMU not being 100% efficient. This means that there will typically be a “white triangle” above the inefficient units, and that the size of that area is proportional to the economic cost of the total inefficiency in the sample.

The changes in the efficiency scores from the first step in the ESL algorithm are shown in Figure 4. For more than half of the inefficient DMUs the change after removing their originally most influential peer, ESL(1), is larger than 5 percentage points, and for a fifth of the DMUs the changes are larger than 10 percentage points. Two DMUs experience increases in their efficiency scores larger than 20 percentage points. This suggests that the individual efficiency scores in DEA applications strongly depend on the assumption of no measurement error. If the most influential of its peers is outside the true production possibility set, one can get a very large negative measurement-error bias in the estimated efficiency scores.

[Figure 4: First step ESL values for all the inefficient DMUs, sorted in increasing order.]

Figure 5 is similar to Figure 4, but now the ESL(1) values for the inefficient units are shown together with their DMUs' original efficiencies (sorted pairwise). A visual inspection of Figure 5 confirms that a number of the inefficient units move from being quite inefficient to being quite efficient (if we place the border between these two conditions at the ad hoc value of 0.85). It is also interesting to notice that there does not seem to be any strong pattern concerning the correlation between the original value and the ESL(1) value, especially bearing in mind that the DMUs that are originally assigned a high efficiency are limited in how large the ESL(1) value can be, since the efficiency number cannot be larger than 1.

[Figure 5: The original efficiency scores and the ESL(1) values for all the inefficient DMUs, sorted pairwise.]
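First-step values like those summarized in Figures 4 and 5 could be produced as in the following sketch, reusing e1_score and stepladder from the earlier sketches; X and Y are the dataset arrays assumed there. Computing the whole ladder just to read its first rung is crude but keeps the sketch short.

```python
# Hedged sketch: compute ESL(1) for every inefficient DMU, as in Figures 4
# and 5, reusing e1_score and stepladder from the earlier sketches.
import numpy as np

orig = np.array([e1_score(X, Y, i) for i in range(len(X))])
ineff = np.where(orig < 1.0 - 1e-6)[0]
esl1 = {i: stepladder(X, Y, i)[0][1] for i in ineff}  # score after first removal
gains = np.array([esl1[i] - orig[i] for i in ineff])  # first-step changes
print(np.mean(gains > 0.05), np.mean(gains > 0.10))   # shares above 5 and 10 pp
```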
Figure 6 is similar to Figure 5, but shows the Efficiency stepladder values for the first six steps of the sequential Efficiency stepladder iterations for all the inefficient DMUs. The changes in the efficiency score with ESL(2) tend to be smaller than with ESL(1), but there are examples of the opposite. A few of the DMUs have low sensitivity to the validity of their peers, but the general picture is that most of the DMUs experience large changes in their efficiency scores after two or three Efficiency stepladder iterations.

[Figure 6: Stacked bar chart showing original efficiency and ESL(1) to ESL(6).]

[Figure 7: The Efficiency stepladder for a few selected inefficient DMUs. The horizontal axis is the Efficiency stepladder number; the legend identifies each curve and its required number of steps: A(1), B(4), C(10), D(19), E(33), F(39), G(42), H(56).]

Figure 7 shows the ESL curves for eight of the inefficient DMUs in the dataset (referred to as A-H). They are selected because their curves show some of the different developments. By construction all of the curves are non-decreasing. The identity of each of the curves is indicated at the top of the figure, together with the required number of Efficiency stepladder iterations for that DMU to reach full efficiency. DMU A has an original efficiency of 0.85, but it becomes fully efficient after excluding only one of its peers from the sample. DMU B has an original DEA efficiency score of 0.66, but after four steps it reaches the frontier. DMU C starts at 0.57, but has a very steep increase in efficiency. The other DMUs (E-G) have slower efficiency increases. One natural summary measure of the steepness of the Efficiency stepladder is the average increase per step, i.e. the original inefficiency divided by the number of steps needed to reach the frontier (for DMU B, 0.34/4 = 0.085). This is presented in Table 2. It is worth noticing that DMU B starts at a much lower efficiency than DMU D, but because of DMU B's higher average increase per step it reaches the frontier in only four steps while DMU D needs 19 steps. DMU H is a bit different from the other DMUs in that it experiences a mostly convex (broadly speaking) development, while the other DMUs that need a large number of ESL steps before they reach the frontier (D-G) follow a mostly concave pattern.

Table 2: Average increase per step for the selected DMUs

DMU   Original value   Inefficiency   Steps   Average increase per step
A     0.84             0.16           1       0.16
B     0.66             0.34           4       0.085
C     0.57             0.43           10      0.043
D     0.78             0.22           19      0.012
E     0.79             0.21           33      0.006
F     0.47             0.53           39      0.013
G     0.77             0.23           42      0.005
H     0.58             0.42           56      0.008

6.2 Efficiency stepladder (ESL) versus Efficiency Order

The Efficiency Order approach (Sinuany-Stern et al., 1994) is mainly concerned with the minimum number of DMUs that need to be deleted for the DMU in question to reach the frontier. In relation to Figure 7, the Efficiency Order approach would be interested in the number of steps each of the DMUs required to get to the frontier, while the ESL approach is more focused on the steepness of the Efficiency stepladder. Another difference is that the Efficiency Order approach does not seem to be interested in the first few steps in the stepladder, but only in the total number of iterative DMU exclusions that leads to full efficiency.
This might be a weakness of the Efficiency Order approach, since the likelihood that a small number of observations lie outside the true production possibility set need not be low, while the likelihood that a large number of observations are infected with serious measurement errors may be quite low. In other words, one of the most interesting indications of a largely inefficient DMU's robustness is its sensitivity to one, two, or maybe three sequential peer removals. The Efficiency Order approach would not capture the fact that the first peer removal might move an inefficient unit from 40% to 90% efficiency if the rest of its way to full efficiency takes 10 more steps. The ESL approach proposed in this paper is concerned with the changes in efficiency in each step, and not only with the number of removals necessary before the frontier is reached. Another difference is that the ESL approach is also relevant for the fully efficient units. The first step is then identical to superefficiency (Andersen and Petersen, 1993).

Thirdly, to calculate the Efficiency Order using the algorithm proposed in Cherchye et al. (2000), specialised software and some knowledge of computer programming are required. The authors claim that the calculation of the Efficiency Order (they refer to it as “efficiency depth”) with their approach “should not involve substantial computational burden” and “require only minimal effort using an ordinary PC desktop (sic).” The exact approach for calculating the efficiency depth is unclear, and they formulate a Mixed Integer Linear Programming (MILP) problem without explaining how much CPU time is required on a desktop PC. Identifying which of the DMUs should be removed and in what sequence is left to the CPLEX11 MILP optimizer. It is not certain that a different MILP optimizer would choose the same path towards the frontier. The simpler algorithm used in this paper is more accessible to practitioners. A computer program that calculates the Efficiency stepladder has been developed to calculate the numbers presented. It will be freely available on the Internet so that practitioners can use it to get some crude but useful information on the sensitivity of the efficiency scores in DEA. Since the algorithm always chooses the one-step optimal solution, it is predictable how the Efficiency stepladder is constructed.12

7. Conclusions

Ideally, sensitivity analysis, detection of potential outliers, and estimation of sampling bias should be carried out simultaneously. It is easier to detect outliers if we have some information about the sampling bias, and it is easier to estimate sampling bias if we have first identified the outliers. There have been developments in all these areas in the last few years, but at the time of writing no single method offers a solution to all the mentioned challenges. The Efficiency stepladder method is simple and crude, but it can still be useful for applied DEA investigations. It should be thought of as one way safe: an Efficiency stepladder that is very steep is a clear indication that the DEA estimated efficiency is strongly dependent on the correctness of a low number of other observations. A slow increase, on the other hand, should not be interpreted as a strong indication that the efficiency is at least this low. The reason is that the method is only one step optimal.

11 CPLEX is a commercial optimizer capable of solving Mixed Integer Linear Programming problems. See http://www.ilog.com/products/cplex/ for more information.
12 But even with the open algorithm used in the ESL approach there can be situations where the computer program has to choose between two or more equally good alternatives. However, in most of these cases this is the last step before the DMU in question reaches the frontier, and the “problem” is that removing any of several alternative peers will lead to full efficiency. In this case the Efficiency stepladder curve and all the efficiency values remain the same; only the identity of the last peer removed will differ.

In addition to measuring the sensitivity of the e-scores for efficient and inefficient units, the method might be used in combination with bootstrapping to identify possible outliers. The necessary software for carrying out the Efficiency stepladder calculations will be made available from the author's website.

The purpose of the ESL method is to examine the sensitivity of the efficiency scores to measurement errors. Bootstrapping, on the other hand, is in the DEA context (primarily) used to measure sensitivity to sampling errors. We would expect that a DMU with a large ESL(1) value would also have a large standard error of the bias corrected efficiency score. The reason is that we expect the part of the (input, output) space where the DMU is located to be sparsely populated. Tentative runs have shown a statistically significant and positive correlation between the ESL(1) values and the standard errors of the bootstrapped bias corrected efficiency scores. Furthermore, there is a strong empirical association between the ESL(1) values for the fully efficient DMUs (= superefficiency) and the sampling bias estimated using bootstrapping. This is a promising topic for further research.

References

Afriat, S., 1972, Efficiency estimation of production functions, International Economic Review, 13(3), 568-598.

Andersen, P. and Petersen, N.C., 1993, A Procedure for Ranking Efficient Units in Data Envelopment Analysis, Management Science, 39(10), 1261-1264.

Banker, R.D., 1993, Maximum Likelihood, Consistency and Data Envelopment Analysis: A Statistical Foundation, Management Science, 39, 1265-1273.

Banker, R.D., 1996, Hypothesis Tests Using Data Envelopment Analysis, Journal of Productivity Analysis, 7, 139-159.

Banker, R.D., A. Charnes and W.W. Cooper, 1984, Some Models for Estimating Technical and Scale Inefficiencies in Data Envelopment Analysis, Management Science, 30, 1078-1092.

Banker, R.D. and Chang, H., 2000, Evaluating the Super-Efficiency Procedure in Data Envelopment Analysis for Outlier Identification and for Ranking Efficient Units, Working paper, University of Texas at Dallas.

Barr, R.S., M.L. Durchholz and L. Seiford, 1994, Peeling the DEA Onion: Layering and Rank-Ordering DMUs Using Tiered DEA, Southern Methodist University technical report, Dallas, Texas.

Cazals, C., J.-P. Florens, and L. Simar, 2002, Nonparametric frontier estimation: a robust approach, Journal of Econometrics, 106, 1-25.

Charnes, A., Haag, S., Jaska, P., and Semple, J., 1992, Sensitivity of Efficiency Calculations in the Additive Model of Data Envelopment Analysis, International Journal of Systems Science, 23, 789-798.

Charnes, A., Cooper, W.W. and Rhodes, E., 1978, Measuring the efficiency of decision making units, European Journal of Operations Research, 2, 429-444.

Charnes, A., Cooper, W.W., Lewin, A.Y., Morey, R.C., and Rousseau, J.J., 1985,
Sensitivity and Stability Analysis in DEA, Annals of Operations Research, 2, 139-150.

Cherchye, L., Kuosmanen, T. and Post, G.T., 2000, New Tools for Dealing with Errors-in-Variables in DEA, Katholieke Universiteit Leuven, Center for Economic Studies, Discussion Paper Series DPS 00.06.

Coelli, T.J., 1996, A Guide to DEAP Version 2.1: A Data Envelopment Analysis (Computer) Program, CEPA Working Paper 96/8, Department of Econometrics, University of New England, Armidale NSW, Australia.

Edvardsen, D.F. and Førsund, F.R., 2003, International benchmarking of electricity distribution utilities, Resource and Energy Economics, 25, 353-371.

Farrell, M.J., 1957, The measurement of productive efficiency, Journal of the Royal Statistical Society, Series A, 120, 253-281.

Farrell, M.J. and Fieldhouse, M., 1962, Estimating efficient production functions under increasing returns to scale, Journal of the Royal Statistical Society, Series A, 125, 252-267.

Førsund, F.R. and N. Sarafoglou, 2002, On the origins of Data Envelopment Analysis, Journal of Productivity Analysis, 17, 23-40.

Färe, R., Grosskopf, S. and Logan, J., 1983, The relative efficiency of Illinois electric utilities, Resources and Energy, 5, 349-367.

Kittelsen, S.A.C., 1993, Stepwise DEA: Choosing Variables for Measuring Technical Efficiency in Norwegian Electricity Distribution, Memorandum 06/1993, Department of Economics, University of Oslo.

Kittelsen, S.A.C., G.G. Kjæserud and O.J. Kvamme, 2001, Errors in Survey Based Quality Evaluation Variables in Efficiency Models of Primary Care Physicians, HERO Memorandum 24/2001, Oslo. [http://www.oekonomi.uio.no/memo/memopdf/memo2401.pdf]

Kneip, A., Park, B.U. and Simar, L., 1998, A note on the convergence of nonparametric DEA estimators for production efficiency scores, Econometric Theory, 14, 783-793.

Sinuany-Stern, Z., A. Mehrez and A. Barboy, 1994, Academic Departments Efficiency via DEA, Computers & Operations Research, 21(5), 543-556.

Timmer, C.P., 1971, Using a Probabilistic Frontier Production Function to Measure Technical Efficiency, Journal of Political Economy, 79(4), 776-794.

Thompson, R., Dharmapala, P.S. and Thrall, R.M., 1994, Sensitivity analysis of efficiency measures with applications to Kansas farming and Illinois coal mining, in: Charnes, A., W.W. Cooper, A.Y. Lewin and L.M. Seiford (eds.), Data Envelopment Analysis: Theory, Methodology and Applications, Kluwer.

Wilson, P.W., 1995, Detecting Influential Observations in Data Envelopment Analysis, Journal of Productivity Analysis, 6, 27-46.

Zhu, J., 1996, Robustness of the efficient DMUs in data envelopment analysis, European Journal of Operations Research, 90, 451-460.

EFFICIENCY OF NORWEGIAN CONSTRUCTION FIRMS∗

by

Dag Fjeld Edvardsen
Norwegian Building Research Institute, Forskningsveien 3b, NO-0314 Oslo, Norway. Email: [email protected]

Abstract: Efficiency studies of the construction industry at the micro level are few and far between. In this paper information on multiple outputs is utilized by applying Data Envelopment Analysis (DEA) to a cross section dataset of Norwegian construction firms. Bootstrapping is applied to select the scale specification of the model. Constant returns to scale was rejected. Furthermore, bootstrapping was used to estimate and correct for the sampling bias in the DEA efficiency scores. One important lesson that can be learned from this application is the danger of taking the efficiency scores from uncorrected DEA calculations at face value.
A new contribution is to use the inverse of the standard errors (from the bias correction of the efficiency scores) as weights in a regression to explain the efficiency scores. Several of the hypotheses investigated are found to have statistically significant empirical relevance.

Keywords: Construction industry, DEA, efficiency, bootstrapping, weighted two stage.

∗ This study is part of the research project "Productivity in Construction" at the Norwegian Building Research Institute (NBI) financed by the Norwegian Research Council. I would like to thank Thorbjørn Ingvaldsen (NBI) for insightful comments about the nature of construction activities. Finn R. Førsund (University of Oslo) has followed the whole research process and offered detailed comments. Sverre A.C. Kittelsen (Frisch Centre) has been invaluable in the development of the software for bootstrapping, and for giving detailed comments. Comments from Lennart Hjalmarsson, Håkan Eggert and Hans Bjurek have greatly improved the presentation of this paper. Roger Jensen (Statistics Norway at Kongsvinger) provided the dataset and offered help in the data selection process. Any remaining errors are solely this author's responsibility.

1. Introduction

Low productivity growth of the construction industry in the nineties (based on national accounting figures) is causing substantial concern in Norway. To identify the underlying causes, investigations at the micro level are needed. However, efficiency studies of the construction industry at the micro level are very rare.1 The objective of this study is to analyze productive efficiency in the Norwegian construction industry. A piecewise linear frontier is used, and technical efficiency measures (Farrell, 1957) are calculated on cross section data following a DEA (data envelopment analysis) approach (Charnes et al., 1978). The DEA efficiency scores are bias corrected by bootstrapping (Simar and Wilson, 1998, 2000), and a bootstrapped scale specification test is performed (Simar and Wilson, 2002). A new contribution is to use weights based on the standard errors from the bootstrapped bias correction in the two stage model when searching for explanations for the efficiency scores.

One reason for the small number of efficiency analyses of the construction industry may be the difficulty of "identifying" the activities in terms of technology, inputs and outputs in this industry. It is well known that there are large organizational and technological differences between building firms. Even when the products are seemingly similar there are large differences in the way projects are carried out. For instance, some building projects use a large share of prefabricated elements, while other projects produce almost everything on the building site. This often happens even when the resulting construction is seemingly similar. It is interesting to note that projects with such large differences in the technological approach can exist at the same time. Moreover, the composition of output varies a lot between different construction companies, so the definition of the output vector may also be a problem. Thus, to capture such industry characteristics, a multiple input multiple output approach is required.

A debated issue is whether an efficiency analysis should be carried out at the project level or at the firm level. In many ways it is more natural to think of the project level as the decision making unit (DMU) in this industry. In addition it might be easier to find relatively homogenous projects than firms.
A third aspect is that when one tries to explain any efficiency differences it is likely that there are bigger differences between the projects than between the firms when it comes to choice of construction technology. However, the required data for studying productivity at the project level is not (yet) available, so the firm level is the only available level for an efficiency study of the construction industry at the micro level in Norway.2 It should be noted that the firm level should not necessarily be seen as a higher aggregation than the project level. It is not unusual that a project in this industry is larger than any of the participating firms, and quite often a large project can span two or three accounting years.

1 Two Scandinavian studies are Jonsson (1996), which looked at construction productivity at the project level, and Albriktsen and Førsund (1991), which investigated the efficiency of Norwegian construction firms. The latter was based on a parametric frontier approach specifying only one output.
2 Collecting data at the project level is part of the research within "Productivity in Construction."

The layout of the rest of the paper is according to the following plan. Section 2 gives a brief overview of the methods used in this paper. The main ideas are explained, notation is introduced, and the most central references are listed. In Section 2.4 a new approach is developed that explains the possible benefits of using weighted regression in a two stage DEA setting. Section 3 deals with how the data used in this paper was collected and processed. Selection of the scale specification in the DEA model is the topic of Section 4. In Section 5 results of the DEA efficiency calculations are reported, and the effects of correcting the efficiency scores for bias are shown. Some interesting hypotheses that might explain some of the differences in the firms' efficiency scores are investigated in Section 6. Section 7 rounds off the paper with a summary.

2. The methods

The efficiency scores in this paper are calculated with DEA and then bias corrected with bootstrapping. The model selection is also done with the help of bootstrapping, while the statistical power of the stage two regression is increased by taking advantage of the standard errors of the bias corrected efficiency estimates.

2.1 Data Envelopment Analysis (DEA)

The idea behind DEA is to use the closest possible piecewise linear envelope of the actual data as an estimate of the border of the production possibility area. A more detailed explanation than is given here can be found in e.g. Cooper et al. (2000).

Figure 1: The DEA method illustrated in two dimensions (output y against input x; the CRS frontier and the VRS frontier through units A, B, C, D, F and G; the input saving and output increasing reference points and peers for the inefficient unit P1; interior and exterior self-evaluators; and the reference zone for unit D, shown as a shaded area).

The efficiency of an observation (often referred to as a "DMU," Decision Making Unit, in the DEA literature) is calculated as the relative distance to the frontier. The efficiency score is a number between 0 and 1, and the units positioned on the frontier are assigned the efficiency score of 1. Input saving efficiency is a measure of how much it is possible to simultaneously reduce all inputs, while the outputs are at least the same. Banker et al.
(1984) formalized the axioms that an envelopment should satisfy, and showed that the DEA production possibility set is the smallest set that satisfies the following assumptions (x1, x2 are vectors of inputs; y1, y2 are vectors of outputs):

1) All observations are possible: If we observe (x1, y1), then it is possible to produce y1 with the use of x1.
2) Convexity: If (x1, y1) and (x2, y2) are observed, then $(a x_1 + (1-a) x_2,\; a y_1 + (1-a) y_2)$ is possible for all $a$ in [0,1] (this is true when assuming Variable Returns to Scale (VRS); when assuming Constant Returns to Scale (CRS) any positive $a$ is allowed).
3) Free disposal: Higher usage of resources always means that it is possible to produce the same or more of the products. It is also always possible to produce fewer products with the same amount of resources.

In Figure 1 the most important concepts in DEA are illustrated. A, B, C, D, F, and G are DMUs that in DEA would be calculated as technically efficient when we assume VRS technology, while P1 is technically inefficient. With CRS only the DMU with the highest output/input ratio would be considered technically efficient, because in this case the productivity of all units is compared – independently of the size of the DMUs. In the following I will concentrate on the VRS production frontier in Figure 1. The reason is that some interesting aspects of the DEA method apply to CRS only if we have more than two dimensions. This again is because under CRS in two dimensions all units are compared to the same facet (= the part of the efficiency frontier that the inefficient units can be compared to; each linear part of the frontier), but with VRS we typically have more than one facet.

The efficiency measures can be set up mathematically as Linear Programming (LP) problems in the following way:3

E1: Input oriented VRS efficiency can be calculated by solving the following LP problem for each DMU. For unit P1 in Figure 1 this equals xA/x1.

$$
E1_i \equiv \min_{\theta_i,\,\lambda_{ij}} \theta_i
\quad \text{s.t.} \quad
\sum_{j \in P} \lambda_{ij} y_{mj} - y_{mi} \ge 0,\ m \in M; \quad
\theta_i x_{ni} - \sum_{j \in P} \lambda_{ij} x_{nj} \ge 0,\ n \in N; \quad
\lambda_{ij} \ge 0; \quad
\sum_{j \in P} \lambda_{ij} = 1
\tag{1}
$$

3 See Førsund and Hjalmarsson (1979) where these measures are defined in the general case, independently of the choice of frontier estimation methodology.

The reference point for DMU i is $(\sum_{j \in P} \lambda_{ij} x_{nj}, \sum_{j \in P} \lambda_{ij} y_{mj})$. DMU A in Figure 1 is the input saving reference point for DMU P1. Point "m" is the radial projection point on the VRS frontier, and does not take advantage of the possibility to increase output in addition to the reduction of input.

E2: Output oriented VRS efficiency can be calculated for each DMU by solving the following LP problem. For unit P1 in Figure 1 this equals y1/yn.

$$
1/E2_i \equiv \max_{\phi_i,\,\lambda_{ij}} \phi_i
\quad \text{s.t.} \quad
\phi_i y_{mi} - \sum_{j \in P} \lambda_{ij} y_{mj} \le 0,\ m \in M; \quad
\sum_{j \in P} \lambda_{ij} x_{nj} - x_{ni} \le 0,\ n \in N; \quad
\lambda_{ij} \ge 0; \quad
\sum_{j \in P} \lambda_{ij} = 1
\tag{2}
$$

E3: Efficiency assuming CRS can be calculated based on either (1) or (2) if we remove the constraint in the last line, which demands that the sum of the weights equal one. For unit P1 in Figure 1 this equals either x1/xh or y1/yk; input and output orientation both return the same number when CRS is assumed.

E4: Input reducing scale efficiency equals E3/E1.

E5: Output increasing scale efficiency equals E3/E2.

2.2 Estimating sampling bias using bootstrapping

It is well known that empirical estimations of the DEA models defined in the formulas above are influenced by sampling bias (Simar and Wilson, 1998). The reason is that the DEA estimate of the production frontier is based on a convex combination of best practice observations.
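To make the LP problem (1) concrete, the sketch below solves it for a single DMU with SciPy's linear programming routine. This is only an illustration under the stated assumptions (the data layout and the function name are invented for the example; this is not the software used in the thesis).

```python
import numpy as np
from scipy.optimize import linprog

def vrs_input_efficiency(X, Y, i):
    """Input oriented VRS DEA score (E1) for DMU i, following problem (1).

    X: (n_dmus, n_inputs) input matrix; Y: (n_dmus, n_outputs) output matrix.
    Decision variables are [theta, lambda_1, ..., lambda_J].
    """
    J = X.shape[0]
    c = np.r_[1.0, np.zeros(J)]                  # minimise theta
    # Outputs: sum_j lambda_j y_mj >= y_mi   ->   -Y' lambda <= -y_i
    A_out = np.c_[np.zeros((Y.shape[1], 1)), -Y.T]
    b_out = -Y[i]
    # Inputs:  sum_j lambda_j x_nj <= theta * x_ni
    A_in = np.c_[-X[i][:, None], X.T]
    b_in = np.zeros(X.shape[1])
    A_eq = np.r_[0.0, np.ones(J)][None, :]       # VRS: sum of lambdas equals 1
    res = linprog(c, A_ub=np.vstack([A_out, A_in]), b_ub=np.r_[b_out, b_in],
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * J, method="highs")
    return res.fun

# Tiny made-up example: the third DMU is dominated by a convex
# combination of the first two, so its score is 3/5 = 0.6.
X = np.array([[2.0, 4.0], [4.0, 2.0], [5.0, 5.0]])
Y = np.array([[1.0], [1.0], [1.0]])
print(vrs_input_efficiency(X, Y, 2))
```

Dropping the equality constraint `A_eq`/`b_eq` from the call gives the CRS score E3, exactly as described for the LP formulations above.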
If we had sampled all possible DMUs generated by the same underlying Data Generating Process (DGP), we would expect to get a new production possibility area that is strictly outside the DEA estimate. The sampling bias for a given DMU can be expected to be higher the lower the number of other observations in the sample. The DEA frontier estimate is based on the best observed practice. But this is a biased estimate of the best possible practice in any real world (finite sample) situation.

The following DGP is assumed (Simar and Wilson, 1998): observations are randomly drawn from the true production possibility area. There is a strictly positive probability of drawing observations close to all parts of the true production frontier, and the DEA assumptions (no measurement error, convexity, free disposability) hold. A homogenous efficiency distribution4 is assumed in the following, but this can be relaxed with a more complicated DEA bootstrap methodology (Simar and Wilson, 2000). Banker (1993) proved that as the number of draws goes towards infinity, the distance between the DEA estimate and the true efficiency score goes towards zero.5 In other words, the DEA estimator is consistent. But it is biased in finite samples. The reason is that there is zero probability that a finite number of samples will span the entire outer edge of a continuous production possibility area.

4 An efficiency distribution is denoted as homogenous when it is independent of input mix, output mix, and scale.
5 Banker's paper showed this formally in the single-output multiple-input case. This has later been generalized to the multiple-inputs multiple-outputs case (Kneip et al., 2003).

The true efficiency of a DMU is the relative radial distance from the DMU to the true production frontier. The DEA estimated efficiency of the same DMU is the relative radial distance from the DMU to the estimated production frontier. The difference between these two distances is the sampling bias. One thing we know is that it is strictly positive, in the sense that the DEA estimated efficiency is higher than the true efficiency. Simar and Wilson (1998) showed how to estimate the sampling bias in DEA with a method referred to as "bootstrapping" (Efron, 1979). Bootstrapping is in general a way of testing the reliability of the dataset, and works by creating pseudo-replicate datasets using resampling. Bias correction in DEA using bootstrapping is based on the following assumption:

$$(E1 - E1^{*}) \approx (E1^{*} - E1^{**}) \tag{3}$$

where E1 is the true unknown (input oriented VRS) efficiency, E1* is the original DEA efficiency estimate, and E1** is the bootstrapped efficiency estimate. We cannot directly calculate the left hand side of equation (3) since the true production frontier is unknown. However, it can be approximated by running computer simulations based on the right hand side of the same equation. This is possible since both the DEA efficiency scores and the linear programs that created them are known. The homogenous bootstrap used in this paper can be calculated with the following algorithm (inspired by Simar and Wilson, 1998):

a) Use the original dataset and calculate the DEA efficiency scores.
b) Create a Kernel Density Estimate6 (KDE) of the efficiency scores from (a).
c) Move all the DMUs to their comparison point on the frontier.
d) Create a pseudo-dataset by dividing the input values from (c) with values obtained by drawing randomly from the KDE in (b) with reflection (Silverman, 1986).
e) Calculate a new series of efficiency scores on the pseudo-dataset in (d).
f) Repeat (d)-(e) a large number of times (2000 is recommended by Simar and Wilson, 1998).

For E1** in (3) the average value of the efficiency scores in (f) is used. KDE is used to smooth the empirical distribution of the original efficiency scores (bootstrapping without smoothing is referred to as "naïve bootstrapping"). Reflection is used to deal with the boundary condition that is problematic for nonparametric density estimation. The reason is that the KDE smoothing typically places part of the smoothed density at values greater than 1. Denote the difference between E1* and E1** by Bias*. Based on this we can create a bias corrected efficiency estimate. Silverman (1993) gives a warning against using bias correction carelessly. The danger is that the bias corrected estimator might have a substantially greater standard error than the original estimator. The result is that we might end up with a new estimator that is unbiased, but at the same time "more wrong on average" than the original biased estimator (larger MRSE). See Simar and Wilson (1998) for a more detailed description.

One difficulty with the algorithm above is that the kernel density estimation requires two parameters: the kernel function (e.g. Gaussian) and the bandwidth parameter (determining the length of the tails of the kernel function). In practice, the choice of the kernel is not nearly as important as the choice of the bandwidth. The theoretical background of this observation is that kernel functions can be rescaled such that the difference between two kernel density estimates using two different kernels is almost negligible (Marron and Nolan, 1988). In the kernel literature it is documented that the standard formulas for choosing bandwidths pick too large a bandwidth if the distribution is multi-modal or highly skewed (Silverman, 1986). Since the latter may well be the case for efficiency distributions, we avoid using the normal reference rule. Applying leave-one-out cross validation7 would probably have been the best alternative (Efron and Tibshirani, 1993). The main reason is that selecting bandwidth based on a predetermined mathematical formula would be less subjective.8 Bandwidth in this paper is selected using visual inspection of the kernel density estimate. This is done using an interactive tool9 created in Object Pascal for visually inspecting the effects of different bandwidths. The bandwidth selected was 1.0. The effects of choosing other bandwidths (0.5 and 1.5) have also been examined. It made quite a large difference for a low number of units, but the difference in the overall distribution of the efficiency scores was relatively small.

6 Kernel Density Estimation is a way of getting a smoother estimate of an empirical distribution (when the true shape of the distribution is unknown). See Silverman (1986) for details.
7 Leave-one-out cross validation is a technique to investigate the probability that a certain observation was drawn from the same underlying population as the rest of the sample. See Silverman (1986) for details.
8 As far as is known, there is not any generally available tool for bandwidth selection based on cross validation.
9 This is a computer program (for the Win32 platform, or Linux under "Wine") developed by Dag Fjeld Edvardsen.

2.3 Testing the scale specification using bootstrapping

As pointed out in Simar and Wilson (1998), the question of whether the production possibility set exhibits CRS has not only economic but also statistical importance. If the true technology is globally CRS then both E3* and E1* are consistent estimators of the true E3, but E1* might be less efficient than E3* in a statistical sense due to slower convergence. Simar and Wilson (1998) suggest several tests of scale specification using a bootstrapped test.
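To make steps (a)–(f) concrete, a minimal sketch of the homogeneous bootstrap is given below. It is an illustration under the stated assumptions only: `dea_scores(X_eval, Y_eval, X_ref, Y_ref)` is an assumed helper returning input oriented scores of the evaluated units against a reference technology (it can be built from the LP sketch in Section 2.1), a Gaussian kernel with a user-chosen bandwidth `h` stands in for the KDE, and the variance rescaling in Simar and Wilson (1998) is omitted for brevity. Following Simar and Wilson (1998), the original units are evaluated against the pseudo reference set in each replication. The same loop, run with CRS scores imposed, produces the reference distribution used for the scale tests discussed next.

```python
import numpy as np

def smoothed_draw(scores, h, rng):
    """Steps (b) and (d): draw from a Gaussian KDE of the scores, reflecting
    mass that spills over the boundary back below 1 (Silverman, 1986)."""
    base = rng.choice(scores, size=scores.size, replace=True)
    draw = base + h * rng.standard_normal(scores.size)
    return np.where(draw > 1.0, 2.0 - draw, draw)

def bias_correct(X, Y, dea_scores, h=0.05, B=2000, seed=0):
    """Homogeneous bootstrap bias correction (after Simar and Wilson, 1998)."""
    rng = np.random.default_rng(seed)
    e_hat = dea_scores(X, Y, X, Y)            # step (a): original DEA scores
    X_front = e_hat[:, None] * X              # step (c): project inputs onto the frontier
    boot = np.empty((B, e_hat.size))
    for b in range(B):                        # steps (d)-(f)
        e_star = smoothed_draw(e_hat, h, rng)
        X_pseudo = X_front / e_star[:, None]  # pseudo-data with known inefficiency
        boot[b] = dea_scores(X, Y, X_pseudo, Y)
    bias = boot.mean(axis=0) - e_hat          # estimate of E1** - E1*
    return e_hat - bias, boot.std(axis=0)     # corrected scores and standard errors
```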
One alternative is the mean of the ratios (with their notation):10

$$\hat{S}_1^{crs} = n^{-1} \sum_{i=1}^{n} \hat{D}_i^{crs}(x_i, y_i) \big/ \hat{D}_i^{vrs}(x_i, y_i) \tag{4a}$$

Using the Førsund and Hjalmarsson (1979) notation (Section 2.1 in this paper):

$$\hat{S}_1^{crs} = n^{-1} \sum_{i=1}^{n} \hat{E3}_i(x_i, y_i) \big/ \hat{E1}_i(x_i, y_i) = n^{-1} \sum_{i=1}^{n} \hat{E4}_i(x_i, y_i) \tag{4b}$$

10 It might be worth mentioning that the notation in Simar and Wilson's (1998) formula 4.1 is a bit unclear, since that paper uses a "hat" symbol on top of both the numerator and the denominator. However, if one reads their paper carefully, one will see in the text that they clearly state that it is the ratio that should be estimated for each of the iterations, not the numerator and the denominator in separate iterations. This is to ensure simultaneous estimation.

The question is whether the average scale efficiency we observed (using uncorrected DEA; E4*) could have been generated by a CRS technology. An attempt to answer this is made by running a bootstrap simulation where we assume that the true technology is CRS. In each of the iterations we record the average value of E4. If the average E4* that we originally calculated using DEA is outside the given density range, e.g. 95%, then we choose to discard the H0 that "The true technology exhibits CRS" and use VRS instead. In addition to the test above, Simar and Wilson (1998) suggest several other tests; among these was the ratio of the means:

$$\hat{S}_2^{crs} = \sum_{i=1}^{n} \hat{E3}_i(x_i, y_i) \Big/ \sum_{i=1}^{n} \hat{E1}_i(x_i, y_i) \tag{5}$$

They end up recommending the ratio of the means (S2) since it performs best in the Monte Carlo tests. But the mean of the ratios (S1) performs almost as well, and has an intuitive geometric interpretation. In this paper both S1 and S2 will be calculated in the scale specification test.

2.4 Weighted regression in stage two

In the empirical DEA literature it is common to use a "two stage" approach11 when efficiency estimates are to be both measured and "explained." The first stage refers to the DEA calculation of efficiency scores, based on the data on inputs and outputs. In the second stage it is investigated whether the efficiency scores from stage one are empirically correlated with other variables we believe may "explain" the efficiency scores. The variables used in stage two are typically environmental or managerial variables (both discretionary and non-discretionary variables are commonly used). This possible "empirical correlation" is investigated using multivariate regression models with the efficiency score on the left side of the equation, though other approaches can also be used.

11 See Førsund and Sarafoglou (2002) for a historical account of the origin of the two stage approach.

It is often argued that one should do the estimation of both the efficiency explanatory variables and the efficiency itself in the same stage. This argument for improving statistical efficiency is frequently put forward in "standard econometrics." However, there might be situations where that is not possible or desirable.
If it can be assumed that the explanatory variables affect the production in a different way than the regular inputs, i.e. that the explanatory variables do not influence the rate of substitution between the latter, then one might not lose statistical efficiency by using the two stage approach. The explanatory variables might have the character of general shift factors. Another reason for choosing a two-stage approach is if the explanatory variables are too "rough" for DEA. The assumptions behind the DEA model do not allow measurement error, even in those cases where the measurement errors can be assumed to be symmetrically distributed. A second stage regression model will in such a situation be more robust than DEA. In addition there are more tools available for doing diagnostics and for correcting possible problems. A reason that has been mentioned when these discussions arise is that the explanatory variables are non-discretionary. However, this is not a good motivation to avoid a single stage approach. Several of the generally available DEA software packages allow making one or more of the included variables "fixed" (non-discretionary). An additional argument is that we might not know if the variable is an input or an output (this has been addressed in Simar and Wilson, 2001). Lastly, it might be difficult to include the variable in the single-stage DEA calculation if we have reasons to believe that the relationship between this variable and the efficiency score is not monotonic. It might be partly dealt with either by transformation of data, or possibly by relaxing the assumption of free disposability (allowing for congestion).

Given that a two stage approach is chosen, it will be more efficient, statistically speaking, to bring the inverse of the standard errors over to stage two. See for instance Carroll and Ruppert (1988) for a more detailed description of using weighted regression from an econometric perspective. The motivation for using weighted regression is that we have different degrees of certainty when it comes to the precision of the estimation of the efficiency scores in stage 1. For this reason we want to put a larger weight on the more precise observations when we fit the regression hyperplane to the data. This means that there is a greater penalty if the hyperplane is far away from observations with a high weight than for those with a low weight. When we do the bootstrapped bias correction as described in Section 2.2 we get not only bias corrected point estimates, but also standard errors. These standard errors are (given the assumptions) good measures of how certain we are of the value of each of these point estimates. Since higher standard errors mean lower certainty, the inverse of the standard errors will be used as weights in the stage two regression.

Simar and Wilson (2003) describe two possible approaches for how estimation can be done in a statistically consistent way in DEA in a two-stage setting. They suggest using either a single or a double bootstrap approach, and then argue that the latter is preferable since it has a more rapidly declining MRSE of the intercept and the slope in the regression. Comparing these methods (analytically or using Monte Carlo simulations) with using weighted regression and discovering which of them performs best is a task for further research. One of the motivations behind this paper is to investigate causes for the efficiency differences among firms in the Norwegian building industry.
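As a concrete illustration of the weighted second stage, the following sketch fits the regression with statsmodels, using the inverse squared bootstrap standard errors as weights (the precision weighting described in Section 6.2). All variable names and numbers are made up for the example; in practice `corr_eff` and `se` would come from the bias correction step.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 342
Z = rng.normal(size=(n, 3))                     # stand-ins for the explanatory variables
se = rng.uniform(0.02, 0.15, size=n)            # bootstrap standard errors (made up)
corr_eff = np.clip(0.7 + Z @ [0.05, -0.03, 0.01] + rng.normal(0, se), 0.2, 1.0)

X = sm.add_constant(Z)                          # intercept plus regressors
wls = sm.WLS(corr_eff, X, weights=1.0 / se**2)  # weights proportional to 1/variance
print(wls.fit().summary())
```

statsmodels interprets the weights as proportional to the inverse of the error variance, which matches putting low weight on imprecisely estimated efficiency scores.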
As described later in this paper (Section 2.4), weighted regression in a two stage setting will be used, and the reason is to reduce the influence of the bias corrected efficiency scores with a large estimated standard error. The view of the current paper is that an unbiased estimator might be useful even if it has an estimated MRSE larger than the original biased one. One example is to use it in a second stage regression (with weights based on the inverse of the standard errors), which is done in Section 6.2.

3. The data

In 2001 the Norwegian building industry consisted of about 34 500 enterprises, and employed about 132 500 persons (about 10 percent of the Norwegian labour force). From 2000 to 2001 there was an 8.7% growth in turnover and 7.4% growth in employee compensation. The efficiency calculations in this paper must be seen in the light of the fact that the industry experienced strong growth in the year under investigation.

The primary data on the building enterprises is collected on a yearly basis by Statistics Norway. All the firms in the dataset used in this paper have a NACE code of 45.211. This means that at least 50% of their production value is in the category "construction of buildings." The sample collected by Statistics Norway consists of all enterprises with more than 100 employees, and a sample of the smaller enterprises. The sample contains at least 30% of the total employment in the NACE 45.211 subgroup. Based on data for each building enterprise we have created a cross section database on production and resource usage for the most recent year available (2001). A rather extensive set of input and output data were available based on annual company accounts and the structural survey conducted by Statistics Norway. After extensive discussions with Statistics Norway and sector experts the input-output specification was selected. Output is measured as value split on three different categories: Residential Buildings, Non-Residential Buildings, and Civil Engineering. The three inputs are External Expenditure, Labor, and Real Capital. Details are laid out in Section 3.2.

3.1 Data quality filters

Statistics Norway has several routines for detecting and correcting erroneous data. This should help improve the quality of the data. However, the data collected in the yearly surveys is for general purposes, and the definition of which observations we believe to have good enough quality depends on what they are to be used for. Productivity measurement with frontier models is especially sensitive to outliers. When we use the DEA model we formally assume that there are no measurement errors in the data we feed into the efficiency estimation model. However, it is important to avoid shaping the results to confirm a priori suspicions. This is especially important in a frontier setting, because the frontier-defining units are by definition outliers. But experience with empirical DEA applications strongly suggests that not cleaning the data for suspicious units can lead to very questionable and sometimes absurd results. Very influential units should be checked extra carefully, since errors in these DMUs can strongly influence the efficiency estimate for a large number of other DMUs.12

It was required that all the observations used in the DEA model should be able to meet the following three requirements. (1): At least 90% of the production (measured in total value) has to be from the construction industry.13 15.8% of the companies did not meet this requirement.
(2): All three inputs must be greater than zero. 23.6% of the companies did not meet this requirement. (3): The observed usage of labor in man-years must be larger than or equal to one.14 4.9% of the dataset did not meet this requirement. 3.2% of the companies failed to meet more than one of the three requirements. After this automatic cleaning, five more units were removed from the dataset after test runs of the DEA model (using a VRS specification). They showed up as strongly influential (large Peer index, see Torgersen et al., 1996) and with high superefficiency15 (see Andersen and Petersen, 1993). Originally superefficiency was used as a way to rank among the efficient units, but recently it has more often been seen as a way of detecting strange units. Removing strongly influential units if they are radial outliers might be questionable, but is probably the least evil choice.

12 In Torgersen et al. (1996) the "Peer index" was introduced. This measures the influence of each of the peers in the DEA estimation relative to how large a share of the improvement potential (for each of the dimensions) this peer is referencing. The calculation of the Peer index is based on the optimal weights in the DEA calculation. The maximum value is 100%, and can only be attained if this peer refers all potential improvements in this dimension. The Peer index is a useful measure of the influence of each of the peers in the dataset.
13 The number 90% is ad hoc, but is selected to make sure that the building firms are homogenous in the sense that none of them are allowed to have a large share of their sales outside the building industry.
14 The reason is that only real production firms are included. Firms not meeting this demand may be pure accounting units, or may be newly started, closing down or in hibernation.
15 Superefficiency is a measure of the relative radial distance from the origin to the DMU in question, when the frontier is estimated without this DMU included in the dataset. Superefficiency is by construction greater than (or equal to) one. A superefficiency value of 1.2 implies that the DMU is positioned "20% outside" where the frontier would have been without this DMU (in a radial sense).
16 CV is the Coefficient of Variation. It is defined as the ratio of the standard deviation and the average.

3.2 Descriptive statistics for the primary dataset

The resource usage of the building entrepreneurs is captured by three inputs (the three first columns in Table 1): External Expenditure includes materials, subcontractors, energy, transportation etc. Labor in Man-Years is a measure of the labor usage. Real Capital is a measure of capital services based on the use of production equipment, machines, etc. It is calculated from rental expenditures and depreciation. The last three columns of Table 1 contain summary statistics on the production of the building entrepreneurs. Residential is a measure of the sales value of the residential and recreational buildings. Non-Residential is a measure of the sales value of other buildings, such as office buildings and institutional buildings (schools, prisons, hospitals etc). Civil Engineering measures the sales value of constructions such as roads, tunnels, harbors, etc.

Table 1: Descriptive statistics for the primary variables (342 observations after the data cleaning).

            External Exp.   Labor in Man-years   Real Capital   Residential   Non-Residential   Civil Engineering
Minimum     18.0            1.0                  2.0            0.0           0.0               0.0
Maximum     4 083 634.0     2 950.0              239 847.0      1 597 609.0   2 556 175.0       1 170 239.0
Average     44 968.6        38.3                 1 970.1        27 141.4      32 160.2          5 688.4
St.dev      233 063.7       167.2                13 655.7       113 635.8     147 942.9         66 574.4
Count >0    342             342                  342            301           218               31
CV16        5.2             4.4                  6.9            4.2           4.6               11.7

Because of the data filters all three inputs have strictly positive values, and the lowest value for Labor in Man-Years is 1. The lowest value for all three product variables is 0, but all firms have a strictly positive sum of output values. Civil Engineering is clearly the output with the lowest number of strictly positive output values, and is also the variable with the largest CV number. Concerning the size distribution, 39% of the firms use less than 10 man-years, 47% use between 10 and 50 man-years, 11% use between 50 and 100 man-years, while 5% use more than 100 man-years. The average firm has close to 41 employees and uses 38 man-years.

4. Choosing model specification

In Section 2.3 it is explained how one can use the bootstrapping methodology to help select the scale specification of the DEA model. If the scale efficiency (E4) in the original DEA model is outside the (95%) one sided lower confidence interval, we reject the H0 that the technology is CRS and apply VRS instead. In Figure 2 the bootstrapped simulations required are plotted in a histogram. If the null hypothesis were true we would expect the observed E4 from the uncorrected DEA estimation to be located inside the 95% confidence interval. The observed E4 is 0.777 and we get a strong rejection of H0. The histogram of S2 is practically identical. In both cases (S1 and S2) we get a very solid rejection of the null hypothesis ("The true production technology is globally CRS"). Based on this result we will in the following assume that the technology exhibits variable returns to scale.

Figure 2: Histogram of the bootstrapped distribution of the average scale efficiency (S1) assuming CRS (vertical axis: fraction; horizontal axis: S1, with the observed value 0.777 marked).

In this paper the only statistical tools for choosing the correct model are the ones designed for tests of scale specification. A similar set of bootstrapped tests for model specification could also be used for selecting which variables should be included. This line of thought is based on Banker (1993, 1996) and Kittelsen (1993), but it would be better from a statistical perspective to perform these tests using the bootstrap methodology. However, it is important to select the model based on economic theory and the knowledge of the sector we are investigating – not purely on statistical tests.

5. Estimating the efficiency scores

5.1 DEA efficiency scores

The figures showing the uncorrected DEA efficiency scores (E1 and E3) will only be commented on briefly since the numbers change greatly when correcting for sampling error using the bootstrap method. However, it is important to point out that almost all of the published DEA papers stop with calculating only the DEA efficiency scores, and do not estimate and correct for sampling bias. It will be shown that this makes a big difference for the interpretation of the results.
Refer to Section 2.2 for an explanation of the bias correction.

Figure 3: Uncorrected efficiency scores (E1) assuming variable returns to scale (Efficiency diagram; E1 on the vertical axis against size in man-years on the horizontal axis).

Figure 3 shows the uncorrected efficiency scores, assuming VRS, in an Efficiency diagram.17 One interesting feature of Efficiency diagrams is that both the height and the width of the bars can contain information – unlike a bar chart where only the heights of the bars are actively used. This is especially useful when illustrating the results of efficiency analysis. The efficiency of each of the DMUs is shown by the height of the bar, while its economic size (man-years in Fig. 3) is shown by the width of the bar. This means that it is possible to examine whether there are any systematic correlations between the sizes of the units and their efficiencies. Another interesting geometric aspect of these figures is that they are sorted according to increasing efficiency from left to right. The distance from the top of each bar to 1.00 is a measure of that particular DMU's inefficiency, and the width of the bar is a measure of its economic size. For this reason the area above each of the bars is proportional to the economic cost of that DMU not being 100% efficient. This means that there will typically be a "white triangle" above the inefficient units, and that the size of this area is proportional to the economic cost of the total inefficiency in the sample. The software used to construct these graphs is an "add-in" for Microsoft Excel.18

Figure 4: Technical productivity (E3) against size in man-years.

17 This diagram was first introduced in Førsund and Hjalmarsson (1979), and is based on the input coefficient diagram in Salter (1960).
18 This add-in (MS Excel under Win32) is authored by Dag Fjeld Edvardsen, and will be made available for free from http://home.broadpark.no/~dfedvard.

When it comes to interpreting Figure 3 there are two dominating aspects. The first is that the average efficiency is quite high (83.44% to be exact), and the second is that all of the largest units have been considered fully efficient. Figure 4 shows efficiency under CRS (E3). E3 is still interesting even when we have chosen VRS as the correct scale assumption. The reason for this is that E3 is also a measure of "technical productivity," so it is useful even when we don't believe in it as a measure of efficiency. The efficiency is much lower with E3, and the difference is quite striking for the largest units. However, when comparing Figures 3 and 4, it is important to remember that sampling error has not been taken account of. Since the difference between the LP formulations for E1 and E3 is that the former implies restrictions on the multiplier weights, we have reason to believe that the efficiency measure E1 is more affected by sampling error than the productivity measure E3. A rough explanation is that with E3 all units are potentially compared independent of size, so each DMU has a higher number of units to be compared with. Figure 5 shows the histogram of the uncorrected efficiency scores from DEA, while Figure 6 shows the same for the bias corrected efficiency scores. It is obvious to the eye that the change of the distribution is dramatic.
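An Efficiency diagram of this kind is easy to reproduce. The sketch below is an illustration only, with made-up data (it is not the Excel add-in mentioned above): bars are placed side by side with widths equal to the units' sizes and heights equal to their efficiency scores, sorted by increasing efficiency.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: efficiency scores and sizes (e.g. man-years) for 8 DMUs.
eff = np.array([0.55, 0.62, 0.71, 0.78, 0.86, 0.93, 1.0, 1.0])
size = np.array([120, 40, 300, 25, 80, 500, 60, 900])

order = np.argsort(eff)                             # sort by increasing efficiency
eff, size = eff[order], size[order]
left = np.concatenate(([0], np.cumsum(size)[:-1]))  # bars placed side by side

plt.bar(left, eff, width=size, align="edge", edgecolor="black")
plt.axhline(1.0, linestyle="--")                    # full efficiency reference line
plt.xlabel("Cumulative size (man-years)")
plt.ylabel("Efficiency score")
plt.title("Efficiency diagram (Salter-type)")
plt.show()
```

The white area between the bars and the reference line at 1.00 is then directly proportional to the economic cost of the inefficiency, as described above.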
The most obvious difference is that the strong concentration of fully efficient units at the right of the histogram disappears when we correct for the estimated sampling bias. In fact, only three DMUs are assigned a unit efficiency score after the bias correction.

Figure 5: Histogram of uncorrected efficiency scores (E1).

Figure 6: Histogram of bias corrected efficiency scores (E1).

Figure 7 contains both the bias corrected and the uncorrected efficiency scores in an Efficiency diagram. The bias corrected values are the lower bars, while the uncorrected values are plotted in the upper curve. Both series are sorted independently of each other. It is obvious to the eye that the estimated inefficiency is much larger with the bias corrected values.

Figure 7: Efficiency diagram of bias corrected and uncorrected efficiency scores (E1), sorted separately (against size in man-years).

Figure 8 is based on Figure 7, but the bias corrected efficiency scores are sorted pairwise with the uncorrected efficiency scores. This allows us to compare the efficiency score for each individual DMU before and after the bias correction, and also to examine whether there is any systematic difference when it comes to how the sampling error influences firms of different sizes. Inspection of Figure 8 confirms that all of the large construction firms have a large estimated bias. This is often the case, because the sampling error typically hits the large firms harder in a VRS model. The tendency is relatively strong, as shown by the regression of estimated bias on production volume (and its square) below. Figure 8 is quite instructive because it shows the big difference that the bias correction makes for the very large units. This strongly suggests that analyzing scale economies without checking for sampling bias in DEA can give misleading results. In addition, since the large firms very often contribute a large share of the production and resource usage of an industry, measures of efficiency at the aggregated level will tend to be more distorted. The same problem is present for the smallest firms, but this is much more difficult to point out in Figure 8 since the widths of the bars are proportional to the size of the firm. An OLS regression between the estimated bias and the size of the firm (measured by the sum of sales and its square) obtains statistical significance for both parameter estimates (and the intercept), and the R-squared is 0.1755.

Figure 8: Efficiency diagram of bias corrected and uncorrected efficiency scores (E1), pairwise sorted (against size in man-years).

The suggestion that bias tends to be correlated with scale is a little alarming. One reason is that we might get distortions when we investigate the economies of scale. Another reason is that it might not be unusual that the explanatory variables are correlated with the size of the DMU. A statistically significant correlation between uncorrected DEA scores and an explanatory variable might be the result, and misinterpretations are in these cases easily made. The effect might be strongest in VRS, where the effects of scale are supposedly removed from the equation.
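The regression just mentioned is straightforward to reproduce. The sketch below uses made-up numbers; in practice `bias` would come from the bootstrap step and `sales` from the firm accounts.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
sales = rng.lognormal(10, 1, size=342)           # made-up firm sizes
bias = 0.02 + 1e-6 * sales - 1e-13 * sales**2 + rng.normal(0, 0.03, 342)

X = sm.add_constant(np.column_stack([sales, sales**2]))
fit = sm.OLS(bias, X).fit()                      # estimated bias on size and its square
print(fit.params, fit.rsquared)                  # compare with the reported R-squared of 0.1755
```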
Practitioners that have carried out regression analysis on the uncorrected VRS efficiency scores without checking for the effects of scale (because it was supposedly already taken care of in VRS) might want to revisit old results and see if this happened in their case.

Figure 9 shows almost the same information as Figure 8, but in a scatter diagram. The information about the size of the units is removed, but on the other hand it is easier to compare the individual changes with and without bias correction. The units with an uncorrected efficiency score of 1.00 are the ones that are changed the most by the bias correction.

Figure 9: Scatter diagram of bias corrected (vertical axis) and uncorrected (horizontal axis) efficiency scores (E1).

Table 2: Original efficiency scores and bias corrected efficiency scores.

Uncorrected E1 range   Avg. bias corrected E1   Obs.
1                      0.855                    77
0.9-1.0                0.891                    69
0.8-0.9                0.809                    76
0.7-0.8                0.723                    71
0.6-0.7                0.634                    28
0.5-0.6                0.513                    19
0.4-0.5                0.457                    1
0.3-0.4                -                        0
0.2-0.3                0.277                    1
Sum Obs.                                        342

Table 2 shows the original e-scores (not corrected for sampling bias) in its left column. In the middle column are the average bias corrected efficiency scores for the same observations. It is interesting to note that the average bias corrected e-score for the units with a unit uncorrected e-score is lower than the e-scores for the group with uncorrected e-scores in the range between 0.9 and 1.0. This is a reminder of the large sampling bias associated with the DMUs that were assumed to be fully efficient in the uncorrected DEA calculations. The units we want to learn from are the ones with the highest corrected efficiency scores and the smallest confidence intervals. When choosing between combinations of these two attractive aspects it would probably be wise to focus on the DMUs whose efficiency scores have low standard errors among the DMUs with quite high corrected e-scores, in contrast to being totally focused on choosing the DMU(s) with the very highest corrected efficiency scores.

Table 3 displays the descriptive statistics for the VRS (E1) and CRS (E3) efficiency scores, with and without bias correction. Before examining the numbers, one might expect the difference between the uncorrected and corrected average efficiency score to be largest in the VRS case. The reason is that the problem with sampling error might be expected to be largest in the VRS case, since the LP formulation (2) for the DEA problem requires the sum of reference weights to equal one. This is not the case in the CRS model, and for this reason there is a higher number of potential units that a given DMU can be compared with under the CRS assumption.

Table 3: Descriptive statistics for VRS and CRS efficiency scores, with and without bias correction.

                     Obs   Average   St.dev   Min.    Max
E1 uncorrected       342   0.848     0.136    0.291   1
E3 uncorrected       342   0.687     0.167    0.255   1
E1 bias corrected    342   0.785     0.116    0.277   1
E3 bias corrected    342   0.615     0.137    0.235   1

But the difference between the uncorrected and bias corrected average efficiency score is slightly larger under CRS than under VRS in Table 3. R2 for a regression between the uncorrected and the corrected efficiency scores is 0.801 if we allow for an endogenous constant term in the fitted regression equation. The fit is quite good, but as mentioned the difference is large for a high number of units. There is a tendency (as seen in Figure 10) that the lower the original efficiency score, the lower the estimated sampling bias.
There might be several reasons for this, but one possible explanation is that the more centrally positioned a DMU is in the dataset (with regard to size and input/output mix), the higher the number of units it will typically be compared with. In other words, we expect a unit with a low observed uncorrected DEA efficiency score to be closer to its real value than one with a high observed efficiency score. The reason is that an observed uncorrected efficiency score is expected to be correlated with centrality in the dataset. The bias can be expected to be largest for the units with the highest scores in the uncorrected DEA model, and the lower the uncorrected efficiency score the lower the expected bias. But the decrease in bias seems to be slower the lower the efficiency score of the DMU under investigation (a smooth fitted parametric function would have a positive first and second derivative).

Figure 10: Scatter diagram of estimated bias (vertical axis) vs the original uncorrected efficiency scores (E1, horizontal axis).

This is closely related to the problem with the curse of dimensionality and the speed of convergence of the DEA model relative to sample size. It might be that the empirically strong relation discovered in the dataset can be investigated in light of a theoretical relationship. Such a formula might show the relation between the true efficiency and the size of the estimated bias. It seems that the lower the true efficiency, the lower the size of the bias (Bias = f(E1), f' > 0, f'' > 0). However, this task will have to remain for further research.

Figure 11 shows the strong empirical correlation between the estimated bias and the standard error of the bias corrected efficiency estimate. The relation is so strong that one might want to examine if this can be established formally: the higher the estimated bias, the higher one can expect the standard errors to be. A possible explanation is that a high estimated bias occurs for samples that one has little information about. If the area of input/output space is sparsely populated, we have little information about the location of the frontier, especially if the area is outside the center of the sample (we get little help from the convexity assumption and surrounding observations). At the same time, this lack of information is also captured in a large standard error. In other words, there might not be two separate phenomena, but rather two manifestations of the lack of information about the area in the input/output space where the DMU is located. This strong empirical correlation has probably not been shown before in the literature, and an extensive search on the Internet for similar findings turned up no relevant results.19

19 A search was carried out on the Internet search engine Google.com with the terms: DEA bootstrap correlation bias standard error (and confidence interval). It did not return any relevant results.

Figure 11: Scatter diagram of estimated bias (horizontal axis) versus standard error of the corrected efficiency scores (vertical axis).

As mentioned in Section 2.2 one should be careful when using the bias corrected efficiency estimates without evaluating the standard errors. The reason is that the bias corrected efficiency estimates might get a higher MRSE than the original DEA estimates. Only 202 of the 371 DMUs (54.4% of the sample) in the dataset get bias corrected efficiency scores with lower estimated mean square error than the original DEA efficiency estimates.

6. Efficiency and productivity explained

The efficiency estimation in Section 5 revealed large differences among the firms when it comes to technical efficiency and productivity.
In this section, different hypotheses that might "explain" these differences are developed and tested in a stage two setting for empirical relevance. In addition to including the efficiency scores from stage one, it is argued in Section 2.4 that the certainty level for each of the observations should also be taken into account. It should however be noted that the available data is not sufficiently detailed to give a clear indication of why some firms appear to be much more efficient than others. Some hypotheses are associated with statistically significant parameter estimates, but these should mainly be viewed as indications of interesting topics for further research.

6.1 Constructing hypotheses based on existing theory and knowledge of the industry

The hypotheses developed below were generated based on knowledge about the industry and given the limited availability of data.

a) Wage cost per hour. Higher wages can attract the best workers. In addition, piecework contracts can lead to higher average wages. One might suspect that this factor is closely related to hypothesis (d) since overtime is also associated with higher hourly pay. However, a regression with Wage Cost per Hour explained by Hours Worked per Employee gets an R2 of only 0.03, so there should be no problem with multicollinearity.

b) Apprentices. This is defined as the number of apprentices relative to the number of employees. The idea is that the most efficient companies have low shares of apprentices. The reason is that we expect the companies with high shares of apprentices to have higher costs and lower production since apprentices are under training, which should imply lower productivity of the apprentices and also man-hours used by other employees to offer them guidance. On the other hand, a history of having a high number of apprentices could give good access to high quality human capital in the long run. The hypothesis is nevertheless that the total effect of a high number of apprentices is reduced efficiency.

c) Product Mix. This is measured, using the Herfindahl index, by the quadratic shares of the sales of the companies divided between the seven underlying markets (H = Σi si², where si is the share of the company's sales in market i). The expectation is that the most diversified firms are more efficient since they have the option of using their resources in the most attractive market depending on short term business cycles. It should be mentioned that the business cycles in the construction industry can change very fast. In addition, there is a possible selection effect since this variable might pick up that the "best" firms get contracts outside of their key markets. Notice that testing for economies of scope using a DEA model can be problematic. The reason is that DEA assumes global convexity. If we find that the most diversified companies are the least efficient, this might be a warning that we have a serious breach of one of the assumptions underlying the DEA model used in stage one.

d) Hours Worked per Employee. The hypothesis is that the firms with high numbers of hours per employee are more efficient. The reasoning is that these firms get more efficient production by the use of overtime. It might be that some employees work better under a certain degree of pressure. It could also be that the best workers choose their employer based on the opportunity to work overtime.
An additional possibility is that the repetition effect is positive and that the use of overtime allows for longer repetition sequences.

e) Located in Oslo. Oslo is the capital and the largest city of Norway, and a pressure area. Housing prices are usually higher in the Oslo area, and the efficiency measure in this paper may be influenced by this price effect.

It would have been interesting to follow up Albriktsen and Førsund (1991) and examine if the amount paid to subcontractors is correlated with efficiency. However, the quality of the data at hand is too low (a large number of firms have reported a value of zero even when this is not believable).

Table 4: Correlation table for the explanatory variables (pairwise correlations that are statistically significant at the 5% level are marked with an asterisk).

                            Wage cost    Share of      Product   Hours worked   Location
                            per hour     apprentices   mix       per employee   Oslo
Wage cost per hour          1
Share of apprentices        -0.2188*     1
Product mix                 0.0197       -0.1546*      1
Hours worked per employee   -0.1446*     0.0208        0.0842    1
Location Oslo               0.2242*      -0.1          -0.0298   0.068          1

Table 4 shows the correlation table for the explanatory variables. The pairwise statistically significant correlations (at the 5% significance level) are marked with an asterisk, and only those will be commented on. Wage Cost per Hour is negatively related to Share of Apprentices and Hours Worked per Employee, and positively related to Location Oslo. In addition, Share of Apprentices is negatively related to Product Mix. That Wage Cost is negatively correlated with Share of Apprentices and positively correlated with Location Oslo is not surprising, since the pay to apprentices is less than for trained labor, and the wages in Oslo are known to be higher than in other parts of Norway. It is surprising that Wage Cost per Hour is negatively correlated with Hours Worked per Employee, but it might be that the employees with low hourly salaries choose to compensate by working extra hours. It might be that the substitution effect, known from Labor Economics, dominates. It is difficult to explain why Share of Apprentices is negatively correlated with Product Mix.

6.2 Do the suggested hypotheses have empirical relevance?

The regression models used below are weighted least squares and OLS (for comparison). The weights in the weighted regression are the inverse of the squares of the estimated standard errors from the bootstrap simulations. The motivation is to put low weight on an observation when there is low certainty about its real value. The regression calculation was carried out in Stata 7. This statistics package (and many others) has built-in support for assigning a priori (in a sense) known weights to the observations. Table 5 shows the results from an ordinary OLS regression (left part of the table) and a weighted least squares regression. A truncated regression was also computed (with right truncation at 1) but the results were as good as identical to those laid out in Table 5. The coefficients with statistically significant parameter estimates (at the 5% significance level) are
A higher number of the parameter estimates are statistically significant in the weighted regression compared to the OLS regression. The p-values are also lower in the weighted regression, with the exception of the Intercept estimate which has a slightly lower p-value in the OLS regression. If we believe in the statistically significant parameters from the weighted regression i(Table 5), then the most efficient construction companies are characterized by: – High average wage per hour – Low numbers of apprentices – Low concentrations in product mix – High numbers of hours worked per employee Not statistically significant: – Located in Oslo Earlier in this paper it has been shown (Fig. 4) that the bias correction and the standard error of the bias corrected efficiency score have a strong and positive correlation. Remember that the inverse of the latter were used as weights in the main regression model. The implication is empirically that low weights are put on the units that have gotten a strong bias correction (because they very often get large confidence intervals). It was noted above that the units with the highest bias correction tend to be found among the units with efficiency scores equal to 1 from the uncorrected DEA calculations. Many applied DEA papers have used tobit regression in a second stage. The reason is probably that the authors have observed a concentration of DMUs with uncorrected efficiency scores of 100%, and that they based on this have thought of the DEA efficiency scores as 29 being truncated. This is wrong since (as seen in the LP formulation) they are not truncated20 -they are serially correlated. Simar and Wilson (2003) shows that using a tobit regression in stage two is both theoretically and empirically (using Monte Carlo simulations) wrong. 6.3 Productivity and scale Earlier in this paper a bootstrapped scale specification test rejected the null hypothesis that the correct model specification was CRS. However, even when we choose to believe that the true production function exhibits VRS we can find use for the CRS measure, and interpret it as productivity. This can be used as a measure of to what degree the sector has an efficient structure. In Figure 12 the maximal value plotted on the horizontal axis is 1,200,000. The intention is to zoom in on the range where there seems to be most interesting systematic tendencies in the simultaneous distribution of average productivity and scale. It seems that the average productivity of the construction firms increases until the size is about 100 millions (NOK). There does not seem to be any systematic change after this value is reached. However, there are not many construction firms with production values much higher than 100 millions NOK in the sample, nor in the population of all Norwegian construction firms. 1.2 1 0.8 E3 corrected 0.6 0.4 0.2 0 0 200000 400000 600000 800000 1000000 1200000 Size in production value Figure 12: Scale chart showing Production Value and E3 (range 0-1’200’000). 20 The efficiency scores can never be above 1. The reason for the concentration at 1 is that the efficiency scores of each unit depend on the input-output vectors of the other units (leading to serial correlation in the bestpractice calculation). They are not truncated at 1 as such. 30 7. Conclusions This paper concerns using DEA to investigate the efficiency of Norwegian building firms. Large differences in the efficiency and productivity scores were discovered. 
7. Conclusions

This paper uses DEA to investigate the efficiency of Norwegian building firms. Large differences in the efficiency and productivity scores were discovered.

One important lesson that can be learned from this application is the danger of taking the efficiency scores from uncorrected DEA calculations at face value. If one decided to learn from a few DMUs based on their uncorrected efficiency scores, one might get into trouble. It is not unreasonable to think that similar things have happened in the last few years, as DEA has been embraced by a very large number of practitioners (researchers and consultants). It would be interesting to see the large number of empirical DEA papers recalculated using the bootstrap methodology. Anecdotal observations indicate that very few practitioners use bootstrapping; the reason might be that bootstrapping is not yet available in the standard DEA software packages.

Based on a scale specification test, a variable returns to scale specification was selected. A scale chart indicated that firms with total production values lower than NOK 100 million might be operating at a suboptimal scale.

The differences in the efficiency scores may be explained by environmental and managerial variables. Such variables have been examined in a two-stage approach. A new contribution is the demonstration of how the standard errors from the bias correction in stage one can be used to improve the power of the regression model in stage two. Five possible explanations were examined for empirical relevance, and four of them were found to be statistically significant in a multivariate weighted regression setting. More detailed data would be necessary before strong conclusions can be drawn, but there are indications that the most efficient building firms are characterized by high average wages, low numbers of apprentices, diversified product mixes and high numbers of hours worked per employee.

One possible problem in interpreting these results is that of unbalanced selection. The firms that were removed from the dataset may belong to a different population with respect to the inefficiency distribution. There might be a positive correlation between entering correct data and the true technical efficiency of the units included. If the units included in the dataset are on average more efficient than the average of the population, then the overall picture of the efficiency of the industry is too optimistic.

Concerning further research, a possible extension is to study time series by including data for other years. The Malmquist index could then be used to decompose the productivity development of each firm into frontier shift and catching up; the standard decomposition is recalled below. Relating productivity change to entry/exit analysis could provide additional insights. In the current paper a bootstrapped model specification test is used to select the scale specification, but a similar approach can also be used to help select which inputs and outputs should be included. It could be rewarding to examine, in a Monte Carlo setting, how the weighted regression method suggested in this paper performs compared to bootstrapping in both stage one and stage two. If data on the project level became available, it could be investigated whether the findings in this paper hold on project-level data. It would also be interesting to investigate further the theoretical relationship between the estimated bias and the original uncorrected DEA efficiency score, as well as the relationship between the estimated bias and the standard error of the bias-corrected efficiency score.
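For concreteness, the decomposition referred to above is the standard one from the Malmquist index literature; it is stated here in generic notation and is not taken from this thesis. Writing E^s(x^t, y^t) for the efficiency of a period-t observation measured against the period-s frontier, the (geometric mean) Malmquist index between periods t and t+1 decomposes as

\[
M \;=\; \underbrace{\frac{E^{t+1}(x^{t+1}, y^{t+1})}{E^{t}(x^{t}, y^{t})}}_{\text{catching up}}
\;\times\;
\underbrace{\left[ \frac{E^{t}(x^{t+1}, y^{t+1})}{E^{t+1}(x^{t+1}, y^{t+1})} \cdot \frac{E^{t}(x^{t}, y^{t})}{E^{t+1}(x^{t}, y^{t})} \right]^{1/2}}_{\text{frontier shift}},
\]

so that a first factor above 1 indicates that the firm has moved closer to the frontier, while a second factor above 1 indicates that the frontier itself has shifted outwards.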
References

Albriktsen, R., 1989, Produktivitet i byggebransjen i Norden [Productivity in the building industry in the Nordic countries], NBI Project Report no. 40, Norwegian Building Research Institute.

Albriktsen, R. and Førsund, F.R., 1990, A productivity study of the Norwegian building industry, Journal of Productivity Analysis 2, 53-66.

Andersen, P. and Petersen, N.C., 1993, A procedure for ranking efficient units in data envelopment analysis, Management Science 39(10), 1261-1264.

Banker, R.D., 1993, Maximum likelihood, consistency and data envelopment analysis: a statistical foundation, Management Science 39(10), 1265-1273.

Banker, R.D., 1996, Hypothesis tests using data envelopment analysis, Journal of Productivity Analysis 7, 139-159.

Carroll, R.J. and Ruppert, D., 1988, Transformation and Weighting in Regression, Chapman and Hall, New York.

Charnes, A., Cooper, W.W. and Rhodes, E., 1978, Measuring the efficiency of decision making units, European Journal of Operational Research 2, 429-444.

Cooper, W.W., Seiford, L.M. and Tone, K., 2000, Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software, Kluwer Academic Publishers, Boston/Dordrecht/London.

Edvardsen, D.F., Førsund, F.R. and Kittelsen, S.A.C., 2003, Far out or alone in the crowd: classification of self-evaluators in DEA, Working Paper 2003:7, Health Economics Research Programme, University of Oslo.

Efron, B., 1979, Bootstrap methods: another look at the jackknife, Annals of Statistics 7, 1-26.

Farrell, M.J., 1957, The measurement of productive efficiency, Journal of the Royal Statistical Society, Series A 120, 253-281.

Førsund, F.R. and Hjalmarsson, L., 1979, Generalized Farrell measures of efficiency: an application to milk processing in Swedish dairy plants, Economic Journal 89, 294-315.

Groak, S., 1994, Is construction an industry?, Construction Management and Economics 12(4), 187-193.

Jonsson, J., 1996, Construction Site Productivity Measurement: Selection, Application and Evaluation of Methods and Measures, doctoral thesis, Luleå University of Technology.

Kittelsen, S.A.C., 1993, Stepwise DEA: choosing variables for measuring technical efficiency in Norwegian electricity distribution, Memorandum 06/1993, Department of Economics, University of Oslo.

Kneip, A., Simar, L. and Wilson, P.W., 2003, Asymptotics for DEA estimators in nonparametric frontier models, Discussion Paper 317, Institut de Statistique, Université Catholique de Louvain.

Marron, J.S. and Nolan, D., 1988, Canonical kernels for density estimation, Statistics & Probability Letters 7(3), 195-199.

Ofori, G., 1994, Establishing construction economics as an academic discipline, Construction Management and Economics 12(4), 295-306.

Salter, W.E.G., 1960, Productivity and Technical Change, Cambridge University Press, Cambridge.

Silverman, B.W., 1986, Density Estimation for Statistics and Data Analysis, Chapman and Hall, London.

Silverman, B.W. and Young, G.A., 1987, The bootstrap: to smooth or not to smooth?, Biometrika 74, 469-479.

Simar, L. and Wilson, P.W., 1998, Sensitivity analysis of efficiency scores: how to bootstrap in nonparametric frontier models, Management Science 44, 49-61.

Simar, L. and Wilson, P.W., 2000, A general methodology for bootstrapping in nonparametric frontier models, Journal of Applied Statistics 27, 779-802.

Simar, L. and Wilson, P.W., 2001, Testing restrictions in nonparametric efficiency models, Communications in Statistics 30, 159-184.

Simar, L. and Wilson, P.W., 2002, Nonparametric tests of returns to scale, European Journal of Operational Research 139, 115-132.

Simar, L. and Wilson, P.W., 2003, Estimation and inference in two-stage, semi-parametric models of production processes, Discussion Paper 307, Institut de Statistique, Université Catholique de Louvain.

Torgersen, A.M., Førsund, F.R. and Kittelsen, S.A.C., 1996, Slack-adjusted efficiency measures and ranking of efficient units, Journal of Productivity Analysis 7, 379-398.