
Payment by Outcome: A Commissioner's Toolkit

2011

With emerging interest in contracting for outcomes, this study reviewed the lessons to be learned from a number of sectors where some form of payment-by-outcome had been employed in the past: welfare-to-work, offender management, the management of long-term health conditions, pharmaceutical pricing and foster care.

Gary L. Sturgess and Lauren M. Cumming
with James Dicker, Alexis Sotiropoulos and Nadiya Sultan

2020 Public Services Trust

About the 2020 Public Services Trust

The 2020 Public Services Trust is a registered charity (No. 1124095), based at the RSA. It is not aligned with any political party and operates with independence and impartiality. The Trust exists to stimulate deeper understanding of the challenges facing public services in the medium term. Through research, inquiry and discourse, it aims to develop rigorous and practical solutions, capable of sustaining support across all political parties.

In December 2008, the Trust launched a major new Commission on 2020 Public Services, chaired by Sir Andrew Foster, to recommend the characteristics of a new public services settlement appropriate for the future needs and aspirations of citizens, and the best practical arrangements for its implementation.

For more information on the Trust and its Commission, please visit www.2020pst.org.

The views expressed in this report are those of the authors and do not represent the opinion of the Trust or the Commission.
Published by the 2020 Public Services Trust, January 2011

2020 Public Services Trust at the RSA
8 John Adam Street
London WC2N 6EZ

© 2020 Public Services Trust, 2011

ISBN 978-1-907815-25-6 Payment by Outcome: A Commissioner’s Toolkit Paperback
ISBN 978-1-907815-26-3 Payment by Outcome: A Commissioner’s Toolkit PDF

Contents

Authors
Foreword
Executive Summary
1 Introduction
2 Why Use Payment-by-Outcome?
    What Is It?
    Why Is It Used?
3 Outcomes
    Which Outcomes?
    Whose Outcomes?
    Level of Outcome
4 The Primary Tools: Measures and Standards
    Measures, Standards and Incentives
    Measures That Work
    Designing Standards
    How Measures Are Used
5 The Primary Tools: Incentives
    Adjusting Intensity
    Adjusting Diversity
    Managing Gaming
6 Other Design Tools
    Population Segmentation
    Dynamic Design
    Contract Duration
7 System Design
    Ownership of the Residual
    Building a Learning System
    Designing Procurements
8 Case Study 1: Welfare to Work
    Population Segmentation
    Designing Incentives
    The Procurement Process
9 Case Study 2: Offender Management
    Background
    Payment-by-Outcome Pilots
    Alternative Service Models
    Selecting Measures
    Controlled Innovation
10 Case Study 3: Long-Term Condition Management
    Background
    Why Pay for Outcomes?
    Tools
11 Other Case Studies
    Pharmaceutical Pricing
    Foster Care
    Foreign Aid
    Streetscene Management
12 Conclusion
Acknowledgements
Endnotes

Authors

This study was undertaken by Gary L. Sturgess and Lauren M. Cumming with case studies by James Dicker, Alexis Sotiropoulos and Nadiya Sultan. The time of the senior authors was a donation to 2020 PST from The Serco Institute, a think tank specialising in the study of public service markets. Two research assistants were employed using donations to 2020 PST by Local Partnerships, Partnerships UK and Serco Civil Government.
Gary has been Executive Director of The Serco Institute since January 2003, driving its research and publication agenda. He is former Cabinet Secretary in the New South Wales state government in Sydney, Australia.

Lauren is a Policy Analyst at The Serco Institute. Previously, Lauren was a Researcher to the Commission on 2020 Public Services. Lauren has a Masters in International Security from Sciences Po Paris and an MSc with Distinction in International Political Economy from the London School of Economics.

James was a Research Assistant at the Serco Institute. Previously, he worked at the Economic and Social Research Council and interned at the Commission on 2020 Public Services and the New Policy Institute. James has an MSc with Distinction from the London School of Economics and was the Adam Smith Prize winner for his BA at the University of Exeter.

Alexis is a Policy Analyst at The Serco Institute, having joined in August 2006. Previously, he worked for the National Offender Management Service and, prior to that, the Department of Trade and Industry. He has a law degree from the University of Cambridge and an MA in legal and political theory from University College London, where he studied at the School of Public Policy.

Nadiya is a graduate intern with Adam Smith International (ASI), working primarily on public financial management within ASI’s government reform practice. Previously, Nadiya was a Research Assistant with the Serco Institute, a position she took up after completing her Master in Public Policy and Management from the London School of Economics with High Merit standing.
Foreword

In the Foreword to this report’s predecessor, Better Outcomes, I reflected that ‘Reducing the costs of inputs has value, but if the problem is how better to convert inputs to outcomes, then competition around the costs of the inputs will not solve it.’ Given the fiscal climate in which we find ourselves, that statement is worth reiterating. This report is not about cost cutting and it is not about the benefits of privatisation. Rather, it is dedicated to understanding how government can use payment-by-outcome to extract better results from public services, promote innovation, increase accountability and encourage co-production from service users.

Having produced one report on what we then called outcome commissioning, we felt that there was more work to be done. Better Outcomes made the case for payment-by-outcome and explored areas where it could be applied, but because we wanted to keep the report brief and make it accessible to readers unfamiliar with these ideas, discussion of implementation was kept to a minimum.

Events have moved on, and policymakers across government are now grappling with the practicalities of how to make payment-by-outcome actually work to reduce unemployment, rehabilitate offenders more effectively, improve clinical outcomes for patients with mental health problems and improve the life chances of the poorest children. This report, therefore, is very much a toolkit, designed to help commissioners think through the challenges of using payment-by-outcome and how these can be overcome. Through a number of real-world case studies, the authors have identified common problems and solutions that have been implemented or attempted. It does not claim to have all the answers. Rather, it is intended to stimulate further investigation into how payment-by-outcome, with all its obvious benefits and evident difficulties, can be made to work.
The tone of the report is optimistic, but cautiously so, which drives the authors to focus on the practical issues of implementation. While recognising the advantages of payment-by-outcome, the authors pay serious attention to the criticism that has come from respected thinkers such as Aaron Wildavsky, Allen Schick and Henry Mintzberg. Paying for outcomes is not a panacea and needs to be implemented with due consideration to the goals of the commissioning body and the particularities and context of each service. However, the authors are clear that, done well, it can deliver substantial improvements to public services. I hope that policymakers, commissioners and public service providers will engage with the report and its authors, so we can continue to learn from experience, share our knowledge and improve our public services.

Lord Geoffrey Filkin
Chair, 2020 Public Services Trust

Executive Summary

Payment-by-outcome is a form of performance management where providers are paid on the basis of outcomes rather than effort. It combines a high-stake form of performance contracting that has come to be known as payment-by-results, with intense focus on the primary outcomes for which government programmes have been introduced.

Payment by Outcome is a toolbox. It explores the challenges involved in using this complex form of performance contracting. It seeks to understand the tools that have been employed in the past to cope with these challenges. And it imagines ways in which these instruments might be used differently in the future. It does this by examining the use of payment-by-outcome in welfare to work, offender management and long-term condition management, and drawing on additional insights from several other areas, including pharmaceutical pricing and foster care.

In the public sector, programme objectives are often ambiguous, with primary outcomes surrounded by a variety of contextual goals.
Having established the need for early resolution of these ambiguities, Payment by Outcome turns to the tools that lie at the heart of this particular form of performance contracting (measures, standards and incentives), arguing that in the design and application of these instruments, commissioners must give full regard to the human dimensions of performance regimes. Successful performance contracting draws on a wide range of non-financial incentives, so that in designing a system based on payment-by-outcome, commissioners can adjust both the diversity and the intensity of incentives. They can also adjust the definition of the service population and the duration of contracts to further improve the alignment of the providers’ and commissioners’ interests.

However, commissioners also have more strategic instruments at their disposal, enabling them to adjust the design of the system within which contracts are made.

The most complex payment-by-outcome systems currently in operation – those assisting the long-term unemployed into work – have been under development for more than 60 years. While there are obvious benefits to be gained from studying past experience, policymakers must also be patient, recognising that any move to high-stake performance incentives and outcome specification will inevitably be a process of discovery, with initial mistakes and misunderstandings. It follows that the ideal set of performance incentives and the most effective segmentation of the population cannot be known in advance, but will only be discovered through systemic learning. The report argues that commissioners should deliberately create adaptive systems from the outset, where exploration is encouraged and learning is embraced.

The third high-level conclusion of this report is that payment-by-outcome works best in public services where there are ‘known unknowns’. Where the linkages between inputs and outcomes are well understood and tightly connected, there is little point in specifying outcomes. Commissioners might as well purchase the key inputs or processes that they know will deliver the desired outcomes. On the other hand, where these linkages are so poorly understood that there is very little agreement about the relationship between effort and outcome, it will be virtually impossible to write an outcome-based contract that effectively transfers risk. Under such conditions, providers might just as easily be penalised for failings over which they had no control, or rewarded for successes to which they made only a small contribution.

Payment-by-outcome seems to work best in circumstances where commissioners already have some confidence about the service models that are likely to work, but lack confidence about the capacity of existing delivery chains to deliver significantly better outcomes. Much of the interest in payment-by-outcome seems to relate to certain kinds of innovation: (i) identification of those beneficiaries for whom particular service models will work best; (ii) creation of effective management processes (for example, through joining up fragmented supply chains) enabling services to be tailored to different classes of beneficiary; and (iii) encouragement of much greater co-production on the part of beneficiaries.

The final high-level conclusion from the report is that payment-by-outcome is not always appropriate. Policymakers must be clear about the policy challenge they are grappling with and the nature of the interventions that are most likely to work. If payment-by-outcome is the default option, social problems for which it is not the appropriate solution may be overlooked. When the only available tool is a hammer, there is a danger that every problem looks like a nail.

In some cases, there will be such a long delay between intervention and impact that payment-by-outcome will simply not work.
Since the need for long-term investment in intractable social problems is one of the principal reasons why governments become involved, this may be a significant constraint on the use of payment-by-outcome. At other times, the use of a term contract will create a threshold problem at contract termination, with the commissioner and not the provider owning the residual benefits of successful intervention. In such cases, choice-based markets rather than competitively-tendered ones may be more appropriate.

Having been written as a toolkit, much of the report’s value is to be found in the detail, which cannot easily be written up into an executive summary. Its real worth will lie with commissioners who rummage through it and come back to it from time to time for new insights as they grapple with real-world challenges of policy design.

1 Introduction

Lots of animals, particularly apes, use objects; but what sets us apart from them is that we make tools before we need them, and once we have used them we keep them to use again. This chipped stone from Olduvai Gorge [manufactured around 2 million years ago] is the beginning of the toolbox.
– Neil MacGregor, A History of the World in 100 Objects

This report is the beginning of a toolbox for policymakers charged with commissioning public services from providers who are paid on the basis of outcome rather than effort. It does not seek to solve specific policy problems. It does not offer blueprints for the implementation of payment-by-outcome in particular public services. Rather, by reflecting on the experience of others, it seeks to explore the challenges involved in using this highly complex form of performance management, to understand some of the tools that have been used to overcome these challenges in the past, and to imagine ways in which these instruments might be used differently in the future.
Payment-by-outcome has many names – cash on delivery, payment-by-results, commissioning for outcomes, payment for progress, no cure-no pay – and it is being explored as the possible solution to a wide range of social reforms – finding jobs for the long-term unemployed; raising lifetime earnings of job training participants; breaking the cycle of re-offending; placing children more promptly in stable foster care; rolling out expensive new pharmaceuticals; providing greater stability and accountability in international aid. The challenge lies in knowing how to use performance targets and financial incentives to focus management on a limited number of high-level policy priorities and to promote innovation in service delivery, while avoiding some of the distortions associated with traditional performance regimes.

It is regarded as one of the hottest new ideas in public service management today, and yet the principles underlying payment-by-outcome are not new. Over the past 40 or 50 years, performance management with a focus on programme objectives has been attempted in various ways under a variety of different titles – ‘Management by Objectives’, ‘Program Budgeting’, ‘Performance Budgeting’ and ‘Management by Results’. For the most part, these initiatives have not been a great success, and some of the most highly respected thinkers in the field of management and public administration have warned against them. In 1969, Aaron Wildavsky concluded that:

[Program budgeting] resembles nothing so much as a Rube Goldberg apparatus in which the operations performed bear little relation to the output achieved.1

In 1986, W.
Edwards Deming wrote: ‘Focus on outcome (management by numbers, MBO, work standards, met specifications, zero defects, appraisal of performance) must be abolished…’ And in 1996, Henry Mintzberg asked: ‘How many times do we have to come back to this one until we give up?’2

And yet, governments around the world continue to explore payment-by-outcome, and in the United Kingdom it has become the subject of intense interest amongst policymakers and practitioners in the fields of welfare to work and offender management, among others. It is not entirely clear what has stimulated this rebirth, but there are a number of possible explanations:

• Performance contracting has been employed by governments across the English-speaking world for several decades, with increasing complexity. Over time, contracting has developed from the specification of inputs and processes, to the incentivisation of outputs and, in some cases, outcomes. In the UK, for example, contracting for refuse collection has developed into the commissioning for maintenance of local streets and the neighbourhood spaces (including litter collection, verge maintenance and graffiti removal), with some contractors paid in part, based on the perceptions of local residents. Of course, there have been failures in performance contracting, but there have also been notable successes, and policymakers have become sufficiently confident in their ability to commission complex services to experiment with payment-by-outcome. For example, there is a widespread perception that competition and contracting in prison management in the UK and Australia have delivered a comparable level of quality at significantly lower cost, so that there is now bipartisan support for exploring payment-by-outcome in reducing re-offending.

• In the United States, a number of state and federal initiatives launched in the 1980s made payment-by-results (for outputs if not outcomes) a condition of funding, the most notable examples being job training and foster care. These programmes have been extensively studied, and while they have not been without their difficulties, the findings have in general been encouraging. Building on these programmes, governments in Australia, the United Kingdom, France, Germany and the Netherlands, to mention a few, have experimented with payment-by-results contracting in job placement based on the delivery of intermediate outcomes. And while adjustments are still being made to the incentive regimes, policymakers remain optimistic about the progress being made.

• Over the same period, governments in Europe and North America have used performance management to drive service improvements in health and education (although not usually in conjunction with contracting). These initiatives are widely known and controversial, but continue to attract a high level of interest in the policy community.

• Payment-by-outcome has also been employed, with varying degrees of success, in the introduction of expensive new pharmaceuticals. Large corporations have been prepared to accept significant levels of risk in the final stages of research and development, as new drugs are brought onto the market.

• And in the treatment of chronic health conditions, management techniques developed by managed care organisations in the United States in the 1970s have attracted significant international attention. Drawing on the principles of case management and co-production, coordinated care seems to offer valuable insights into how payment-by-outcome might work.

So, despite the pessimistic conclusions of an earlier generation of scholars, payment-by-outcome continues to lie at the cutting edge of public service reform across the industrialised world. It appears that policymakers and service commissioners have had enough success to believe that they can do more with it.

A Performance Contracting Framework

Payment-by-outcome is a form of performance management, and there is no reason to believe that it could not be implemented within an individual government department or agency using performance bonuses paid to senior managers. However, this report assumes that it is employed in conjunction with performance contracting; that is, it assumes that payment-by-results is implemented using a contractual model, with a public sector commissioner purchasing specified outcomes or outputs from an independent or semi-independent provider from the public, private or voluntary sectors.

This approach was initially adopted as an aid to clear thinking, particularly on issues such as risk transfer, but it is possible that payment-by-outcome may also work better in a contractual environment. Performance budgeting (which usually has a focus on outcomes or high-level outputs) has been explored by a multitude of governments over recent decades, and it has rarely been a success. In part this is because of the inherent tensions of the budgetary process, in part because of the challenges that a focus on outcomes or outputs brings for existing organisational structures, and in part because of the incremental nature of public sector budgeting.3

However, if we look at payment-by-outcome through a contractual lens, the magnitude of these obstacles is significantly reduced. Contracting can be undertaken outside of the budgetary cycle. It is multi-year rather than annual, resulting in lower transaction costs. It can be used selectively, applied only to those programmes presently capable of being managed for results. It can be utilised for all of the funding allocated for a particular function, rather than just the increment of an existing budget; in this way it is possible to stimulate zero-base budgeting each time the function is re-competed.
The provider can be held to account financially for the failure to deliver the contracted results. A great deal more risk can be transferred to providers under a contractual framework than is possible through informal ‘service level agreements’ between government agencies. And the negotiation of a contract focuses debate on operative goals rather than abstract high-level outcomes, forcing policymakers to deal with the problem of multiple and inconsistent objectives.

Thus, performance contracting appears to create a much stronger link between resources and results. It may be possible to employ other forms of performance budgeting to accomplish this, but robust examples are difficult to find. For the purposes of this report, it was decided to use the contractual model as a way of framing the issues in the exploration of payment-by-outcome.

Using the Whole Toolkit

This report is not only concerned with performance measures and financial incentives. One of the clearest insights to have emerged from this project is that policymakers must draw on the full range of instruments in the toolchest. Successful performance contracting has never relied exclusively on targets and incentives. This is true of individual employment contracts used by corporations to motivate their internal workforce as much as it is of public services commissioned from external providers:

Firms use a variety of incentive instruments… Perhaps the most direct is to pay the agent based on measured performance in a given task or set of tasks. But monitoring is imperfect and costly, enabling only a narrow set of activities to be rewarded effectively this way. Asset ownership is often a broader, more powerful instrument. When an agent owns a set of productive assets, she maintains those assets more effectively… A third major incentive is the design of the job: the tasks included in the job description, the activities that are expressly excluded… and the specification of work rules, working hours, and similar policies that restrict the freedom of the worker.4

The following chapters explore this wider range of instruments – the selection of market models and the design of procurements, the use of non-contractual instruments and non-financial incentives within the contractual framework, as well as other elements of task design, such as contract duration and population segmentation.

Methodology

Payment by Outcome builds on an earlier publication by the 2020 Public Services Trust entitled Better Outcomes, which explored the potential gains from payment-by-outcome and where it might be applied, as well as examining some of the challenges involved in its implementation. This report builds on the first, but goes further in seeking to understand how payment-by-outcome models work, investigating in detail the challenges that practitioners have faced and the tools that have been used to overcome them.

The academic literature on performance measurement and management is vast, and the authors have drawn on it to a certain degree, deliberately focusing on those articles that studied real-world cases and paying less attention to those focused exclusively on theory. This is consistent with the general approach of this report, which is to understand how to make payment-by-outcome work in practice.

The principal methodology employed in the research, however, was a case study approach.
Three case studies in particular, where payment-by-outcome or other complex performance incentives have been employed in recent decades or are currently being explored, formed the foundation from which insights were extracted and applied to other areas.

a. Job placement and job training – one of the earliest experiments in payment-by-results, and the service where the most advanced examples of paying for outcomes are to be found;
b. Offender management – where the UK government has a firm commitment to introducing payment-by-outcome, where significant design work has already been undertaken by public officials and research institutions, and where several pilots have been launched;
c. Management of long-term health conditions – where there is a history of initiatives involving performance measurement, case management and co-production, and there are insights to be gained from closer analysis.

Several other examples were also briefly reviewed: pharmaceutical pricing, foster care, international aid and streetscene management. All case studies were themselves a combination of desk research, interviews with practitioners involved in the delivery of payment-by-outcome models in the United Kingdom, and conversations with policymakers concerned with the development of new applications of this approach in the UK.

Payment by Outcome is not concerned with answers; rather the cases were studied with the hope of designing a better set of questions. The overriding objective was to understand the challenges involved in developing and implementing payment-by-outcome systems, and to identify the tools that commissioners have employed as they have sought to resolve them. Each case study is different, and comparison has served to highlight some of the reasons why some tools work in some situations and not in others.

2 Why Use Payment-by-Outcome?

Payment-by-outcome will not always be the best solution.
There are some public services for which high-intensity performance incentives are not appropriate, for example, where neither commissioners nor providers understand a great deal about the best linkages between inputs, outputs and outcomes. And there are some public services for which the identification and specification of clear and consistent outcomes is extraordinarily difficult. Since high-intensity payment-by-outcome is not always appropriate, policymakers need to understand the defining characteristics of this particular set of tools, and engage in honest and open debate about why they are being used.

What Is It?

At the time of publication, the term widely used in the United Kingdom for this form of performance management was ‘Payment by Results’. This phrase had come into use with the election of the Liberal-Conservative Coalition, and largely replaced the term ‘Outcome-based Commissioning’, which had been employed by the previous Labour Government. At the same time, policymakers were exploring the use of invest-to-save as a way of shifting significant risk to providers, requiring them to finance a substantial proportion of the up-front costs of intervention, with reimbursement being made later out of identified savings. Thus, the policy dialogue about performance contracting at the time of publication involved three distinct but related ideas:

a. Payment-by-results, the basic concept, is performance management with teeth, performance budgeting with effective transfer of risk. The term implies that providers have sufficient control over the resources that impact on outputs or outcomes to be financially accountable if the desired results do not eventuate. It is often assumed that the performance incentives will be high-intensity, with significant financial risk for the provider; however, this is not an essential feature of the model, and in some cases may be highly undesirable.
b. Outcome commissioning focuses on the specification and incentivisation of the ultimate policy or programme objectives. While it may not be possible to design performance measures that directly drive the delivery of primary outcomes, outcome commissioning seeks to clarify the ultimate ends for which a programme was established, and if necessary, to incentivise the delivery of intermediate outcomes or high-level outputs that serve as surrogates for these primary goals.
c. Invest-to-save requires providers to finance the up-front investment necessary for service improvement, with compensation being made out of savings that are directly attributable to that investment. In the case of job placement schemes, providers finance the costs of assisting clients into sustainable jobs, with reimbursement being funded out of reductions in spending on unemployment benefits. Much of the interest in invest-to-save is being driven by current budgetary constraints; however, there is also interest in the concept of ‘service PFIs’, a reference to the Private Finance Initiative, where private providers designed, built, financed and operated public infrastructure such as schools and hospitals, with payment being deferred until the buildings were operational.

There are numerous examples in recent years where these three approaches have been pursued in isolation. For example, ‘Payment by Results’ (PbR) has been employed in the National Health Service to refer to a fixed price tariff introduced for the payment of certain medical procedures in hospitals. These are standardised across the hospital system and are based on national averages. However, this was primarily intended as a system of cost control, without particular regard to ultimate health outcomes or concern with invest-to-save. Under the previous government, Public Service Agreements were grounded in the principal objectives of programmes, but there was no thought that these would be associated with performance incentives.
And ‘Invest to Save’ grant funding was used by UK central government for some years (for example, in an attempt to break the cycle of child poverty), but there was no expectation that providers would assume the risk of delivering demonstrable savings before they were paid. In the UK, these concepts have developed separately from one another.

Over the past year or two, however, politicians and policymakers have begun to bring these concepts together into an integrated framework that would see providers paid, at least in part, based on their success in delivering agreed outcomes. In the two programmes where thinking about this approach is most advanced – welfare to work and offender management – there is also a firm commitment to combining payment-by-outcome with an element of invest-to-save. This is most evident in the Invitation to Tender for the UK’s proposed new Work Programme, released in late December 2010.5

This report is primarily concerned with the first two of these concepts, payment-by-results and outcome commissioning, which explains the use of the term payment-by-outcome. Invest-to-save is not essential to this kind of performance contracting, and at this stage, it is unclear how extensively this high-stake form of service contracting can be applied. Placing such a large proportion of revenue at risk is certainly not necessary in order to align providers’ interests with those of commissioners, and for the sake of clarity and brevity, invest-to-save has not been considered at any length in this report.

Why Is It Used?

If payment-by-outcome is an instrument rather than an ideology, then commissioners must understand how it works, the conditions under which it works best, and what they are hoping to accomplish through its implementation. This last question matters since it will rarely be possible to use payment-by-outcome in an undiluted form, and as compromises are made, commissioners must keep firmly in mind what they are hoping to accomplish.

For example, if what commissioners most want is to stimulate greater innovation in the range of service models, exploring alternative linkages between outputs and outcomes, then it may be necessary to limit the amount of financial risk transferred to providers. If, on the other hand, there is a desire to have providers fund much of the investment in service improvement up front, then commissioners may have to be satisfied with service models that are more familiar and thus more easily priced and financed. Similar compromises will have to be made in the selection of outcomes, the identification of workable measures and the design of incentives – there is no one model that will work well in every case and commissioners must understand which aspects matter most.

This study has concluded that payment-by-outcome is being deployed for a variety of reasons:

• Transferring performance risk to providers: There is no point in attempting to contract for outcomes unless commissioners believe that it is possible to transfer performance risk from policymakers to providers, and that there will be sufficient social and economic benefits from so doing. This assumes that there is something yet to be gained, either through the discovery of more effective linkages between inputs and outcomes, or through more effective implementation of what is already known. Payment-by-outcome is being deployed in the realm of ‘known unknowns’, where commissioners believe that there are significant gains to be made, but where traditional managerial approaches to reform are regarded as inadequate.

• Innovation: In some cases, government agencies focus on outcome commissioning for the explicit purpose of encouraging much greater innovation in the service models. The effect is to shift responsibility for the linkages between inputs and outputs, and between outputs and outcomes, from policymakers to practitioners.
This is most evident in recent discussions of its use in offender management where a great deal remains to be learned about desistance and its practical application. Invest-to-save may have more limited application in such cases, at least in the short term, when commissioners will wish to increase the amount of outcome risk transferred to providers.

• Devolving authority to the front line: Even under traditional delivery models, front-line public servants enjoy significant discretion in how policy is interpreted and applied, and where improved service outcomes depend on much greater personalisation of services, the local knowledge possessed by ‘street-level bureaucrats’ acquires even greater value. Payment-by-outcome provides a vehicle for radical devolution to providers, within a framework of enhanced accountability.6 Performance budgeting and performance contracting are accountability reforms that have often been driven by funding agencies, and officials from these agencies have sometimes clashed with policymakers wanting to give local authorities, non-profit organisations and front-line delivery agents greater freedom in operational decision-making. Payment-by-outcome is of interest in some quarters because it promises to respect the values of autonomy and accountability simultaneously. This is one of the main reasons why it is being discussed in the context of foreign aid and non-profit funding.7

• Secured funding: Because it implies such a strong accountability framework, payment-by-outcome may provide funders with greater confidence in the capacity of providers to deliver measurable outcomes, and thus attract additional funding in the short term or a firmer commitment to financial support over the longer term. Security of this kind will be of interest to commissioners seeking to make strategic reforms. This appears to be one of the benefits of invest-to-save, where Treasury may be encouraged to release additional funding in future years once the evidence of cashable savings is already clear.

• Joining up funding streams: In some cases, payment-by-outcome might be used as a way of integrating funding streams. Sharper focus on primary outcomes enables commissioners to ask challenging questions about the potential benefits of joint funding of shared objectives, and agencies with a strong case for joining-up on their terms can also be expected to support payment-by-outcome for this reason. Invest-to-save may also facilitate integration, since agencies will not need to draw on existing budgets in order to fund such an initiative.

• Integrated delivery: There are also those who argue that outcome commissioning will facilitate the joining up of service provision. This may be assisted by the joining up of funding streams, but even where that cannot be accomplished, the transfer of outcome risk to providers should create stronger incentives to bring together those services that necessarily must be interconnected to ensure that key outcomes are delivered.

• Overcoming distortions in performance management: Finally, there is some support for payment-by-outcome among providers and commissioners who recognise the benefits of payment-by-results, but also acknowledge some of the difficulties in designing effective performance regimes. In this case, it is the focus on outcomes that offers the greatest scope for improvement.

3 Outcomes

The challenges involved in designing and operating effective performance regimes are well understood and can be said to lie in ‘the folly of rewarding A, while hoping for B’.8 Payment-by-outcome seems to avoid this folly by defining what is required (and thus, what will be rewarded) in terms of social impact, rather than the inputs or outputs that contribute to those ends. However, the use of outcomes as performance measures is also problematic, for a variety of different reasons. As James Q.
Wilson noted some years ago:

Outcomes – results – may be hard to observe because the organization lacks a method for gathering information about the consequences of its actions (for example, a suicide-prevention agency may actually prevent suicides but it has no way of counting the number of potential suicides that did not occur); because the operator lacks a proven means to produce an outcome (for example, prison psychologists do not know how to rehabilitate criminals); because the outcome results from an unknown combination of operator behavior and other factors (for example, a child’s score on a test reflects some mix of pupil intelligence, parental influence, and teacher skill); or because the outcome appears after a long delay (for example, the penalty imposed on a criminal may lead to a reduction – or even an increase – in the offender’s behavior five years later).9

After studying the measurement of clinical outcomes in the health sector, Professor Peter Smith of Imperial College London arrived at similar conclusions, arguing that outcome measurement works best where:

the nature of the outcome is relatively uncontested; it can be captured relatively easily in operational performance measures; an indicator of the outcome can be secured reasonably soon after the intervention; the outcome is readily attributable to clinical performance rather than external factors; and there is need for considerable clinical judgement as to the most appropriate intervention to offer.10

Thus outcomes are not always suitable for use as performance indicators and it is often necessary to use outputs as proxy measures to give the commissioner some confidence that the primary objective is being delivered. In cases where measuring outputs is too complex, it may be necessary to specify processes or inputs. The challenge of translating primary goals into workable measures brings back the risk of creating byzantine rules and perverse incentives that fail to deliver what policymakers ultimately want – hoping for B whilst incentivising A.

Thus, while outcome commissioning may ameliorate the challenges involved in designing effective performance regimes, by no means does it eliminate them entirely. It follows that commissioners must be absolutely clear about what they want to achieve from a particular programme.

Which Outcomes?

Identifying the primary outcome of a particular programme or agency can itself be problematic. Wilson argued that public agencies often have contextual goals, ‘descriptions of desired states of affairs other than the one the agency was brought into being to create’.11 In part, this is because policymakers need to garner support from broad constituencies, and this helps to explain the widespread existence of outcomes that are stated in ambiguous and inconsistent terms.

This is sometimes evident in the establishment of new programmes, even where they rely on payment-by-results. In the United States, the Job Training Partnership Act, a performance-based scheme legislated by the US federal government in 1982, focused training on ‘those who can benefit from, and are most in need of, such opportunities’. Service providers were simultaneously directed to focus on equity (those most in need of assistance) and efficiency (those who would most readily benefit from assistance), and since there was no reason to believe that these would be the same individuals, conflict was introduced from the outset as to how resources should be prioritised.12

With long-established programmes, contextual goals have often accreted over time through incremental policy adjustments. One of the distinct advantages of payment-by-outcome is that it introduces greater discipline into thinking about objectives, since providers will be reluctant to accept the financial risk of delivering results that have been expressed in ambiguous language or remain the subject of internal conflict.
Where a department or agency has been delivering services over many years, this complexity may not be obvious, and it is unsurprising that a move to performance contracting sometimes results in tension, as commissioners stumble across implicit outcomes for which there are entrenched but previously unrecognised constituencies.

Where services are commissioned across organisational boundaries, different commissioners and/or providers will often have conflicting missions and cultures. However, ambiguity or inconsistency may also arise from differences within an agency. Fifty years ago, the North American sociologist Charles Perrow noted that:

Official goals are purposely vague and general and do not indicate two major factors which influence organizational behavior: the host of decisions that must be made among alternative ways of achieving official goals and the priority of multiple goals, and the many unofficial goals pursued by groups within the organization.13

Perrow described these unofficial objectives as ‘operative goals’, and argued that they could only be discerned through careful observation. One of the risks involved in assuming that primary outcomes are unambiguous and that delivery is simply a matter of identifying the appropriate measures and incentives is the temptation to conclude that the only real obstacle to success lies in manoeuvring by middle management for status, resources and power.14

Moreover, attempts to use payment-by-outcome as a vehicle for joining up fragmented supply chains may reinforce the temptation to specify outcomes at a highly abstract level, and encourage commissioners to overlook the brute reality that, in order to be effective, payment-by-outcome must exclude as well as include.
As Pressman and Wildavsky observed:

If we relax the assumption that a common purpose is involved… and admit the possibility (indeed, the likelihood) of conflict over goals, then coordination becomes another term for coercion.15

One of the defining features of performance management systems is that they demand a focus on specific results that are regarded by policymakers as having a higher priority, and by definition, this must involve downgrading or excluding other objectives. This has been particularly evident, for example, in the controversy over education reform, where performance measures have directed schools and teachers to concentrate on literacy and numeracy at the expense of other important aspects of learning such as creativity and higher-order problem-solving, which are not as easily measured. This may be what policymakers intended, but if they are to avoid unintended consequences, commissioners must be explicit about the objectives that will be relegated to secondary status, or what compensating measures will be deployed.

One of the core responsibilities of commissioners must be to identify where different individual and organisational missions and objectives are not aligned, and find ways of managing the associated tensions.

That we focus our attention on a particular [goal], singling it out as our objective, does not mean there are not others within which we must also operate or, at least, find ways to relax or overcome. Knowing only the avowed programmatic objective without being aware of other constraints is insufficient for predicting or controlling outcomes.16

Where it is not possible to reduce performance measures to a relatively small number of clear and consistent objectives, it will be difficult to shift the risk of delivery down to operational managers, whether they are employed directly by government or indirectly under contract.

Wilson concluded that:

… the more contextual goals and constraints that must be served, the more discretionary authority in an agency is pushed upward to the top… The greater the number and complexity of those goals, the riskier it is to give authority to operators.17

Thus, if the responsibility for delivering programme objectives is to be devolved to front-line service managers (which appears to be one of the principal benefits of using payment-by-outcome), then it is essential that commissioners acknowledge the complexity of organisational or programme outcomes, and seek to reduce them to a set of unambiguous deliverables.

The pursuit of clarity does not mean that payment-by-outcome models cannot also incorporate complexity and flexibility. However, it is unlikely that high-stake incentive schemes, where significant financial risk is shifted to providers, will be suited to such models. In public service contracting, contextual goals are often addressed through non-financial performance incentives, including corporate reputation, professional norms and governance arrangements, and these tend to result in a less highly leveraged financial incentive regime.

Whose Outcomes?

Another source of ambiguity is created when multiple layers of commissioners are involved in the administration of a performance regime. This is most evident in federal systems, where the responsibility for delivery is sometimes cascaded down from a national government, which supplies funding and policy oversight, to subordinate provincial and local departments and agencies.
Each may have priorities that differ from those of the ultimate commissioner, resulting in a proliferation of decision-points, with much greater scope for misunderstanding and misalignment of interests.18

While this may be less of a problem in unitary systems such as the United Kingdom, there are still sufficient layers of administration in, say, the National Health Service, for ambiguity of this kind to emerge. And the deliberate use of payment-by-outcome as a way of joining up public services will create new sources of uncertainty around the ownership of outcomes.

If payment-by-outcome is to be effective, providers must know whose outcomes are being commissioned. One way of resolving this ambiguity lies in identifying a lead agency. But where multiple stakeholders continue to play a role in the allocation of priorities, providers must have a way of discovering well in advance how multiple objectives will be prioritised, and thus how measures will be interpreted and financial rewards and penalties allocated. This implies that policy adjustments will be punctuated, with change being made periodically, rather than evolving organically over time. While this has some obvious downsides for policymakers, it has major advantages for providers, and makes it possible to transfer performance risk successfully.

Level of Outcome

Outcomes must be sufficiently abstract to allow for innovation in service delivery, including the integration of previously ‘siloed’ services, yet specific enough to enable independent observation and objective measurement. They must be sufficiently ambitious to encourage innovation, but realistic enough that providers will be willing to assume the risk of delivering the outcomes.19

In an ideal world it would be possible to contract for primary outcomes, with success or failure ascertained using associated performance measures and appropriate financial incentives. In the real world, this will hardly ever be possible. In some cases, primary outcomes may simply be unmeasurable, or they may capture so many extraneous variables that it is impossible to attribute improved outcomes to the interventions of individual providers. In other cases, outcome improvements may only be observable after such a long time period that a performance contract may be unsuitable because of high monitoring costs and the difficulty of attribution.

• Measurability is the principal reason why the primary objective of reduced re-offending is useless as a performance measure. In this case, it is difficult to obtain reliable statistics on the number of offences that are being committed, and impossible to measure the rate of re-offending, since that would require the attribution of known crimes to known offenders. Here it is difficult to imagine what kind of system might generate the required information, since offenders have a strong interest in concealing their own rate of re-offending. In other cases, the methodology might be understood, but the costs of collecting the data might be prohibitive.

• Duration: The primary outcome in many welfare to work programmes is sustainable jobs for the long-term unemployed. In the UK, the performance measures used in existing schemes are heavily based on retained employment at 13 and 26 weeks, which is not necessarily a strong indicator of sustained participation over the longer term. It is for this reason that the Department for Work and Pensions has proposed that under the Work Programme, performance periods be significantly extended. By way of example, for claimants of the Jobseeker’s Allowance (JSA) aged 25 and over, providers will be paid around one quarter of the potential total only if clients are kept in employment for six months, with another two-thirds, paid in four-weekly instalments, for a further period of up to twelve months. For some of those claiming the Employment and Support Allowance (ESA), the maximum period of sustainment will be two years. These measures have been published in the Invitation to Tender issued in December 2010, and it has yet to be established whether a payment-by-outcome and invest-to-save scheme based on such long performance periods is bankable.20

• Attribution: Ideally, outcomes should also be set at a level that minimises the contribution of external variables. The contribution of one particular prison or probation manager to lower reconviction rates within a defined population may be small. Commissioners may compensate for this by reducing the intensity of the targets and/or the associated financial incentives, but the challenges of attribution might also be overcome by reducing the level at which performance is measured, from primary or intermediate outcomes, to high-level outputs. Providers will have much greater control over outputs such as successful completion of a drug rehabilitation course, or placement and sustainment in employment.

Regardless of whether commissioners choose to measure and reward performance at the level of primary or intermediate outcomes, high- or low-level outputs, processes or inputs, under outcome-based commissioning, they should commence the design process with a close study of the ultimate objectives for which the programme in question has been established, and ensure that the measures selected are aligned as closely as possible with these goals.

4 The Primary Tools: Measures and Standards

It has been argued that the dominant philosophy underlying performance incentive models such as payment-by-outcome is the notion of ‘managerial cybernetics’:

In its crudest form this envisages the managerial process as follows. Organizational objectives are identified. Performance indicators are developed to reflect these objectives. Targets are set in terms of the performance indicators.
Management then chooses action and effort to achieve the targets. Progress towards targets is monitored using (performance indicators), and – if there is a divergence from targets – new targets are set and appropriate remedial action is taken.21

This passage was written in the early 1990s, drawing on research from the 1960s, and yet it captures almost in its entirety the underlying structure of payment-by-outcome – identification of outcomes, choice of measures to reflect those outcomes, selection of standards and associated performance incentives.

The principal flaw with the cybernetic model, as the author of this passage recognised, is that organisations are made up of complex networks of human beings, all able to (and inclined to) exercise their own will. Performance management has contributed a great deal to organisational efficiency in the public and private sectors over recent decades, but it has been much less effective where it has failed to take into account this human dimension.

This report has been written on the assumption that commissioners understand this unavoidable reality, and build it into the design of their performance regimes. It is not a matter of designing the perfect cybernetic system, and then correcting for gaming once the people are added, but recognising from the outset that the design of measures, standards and incentives must acknowledge the existence of human will. This does not mean that commissioners must anticipate all of these human reactions in advance (although they should certainly try), but rather that they should build adaptive systems that are capable of learning from these responses and being reinvented.

Moreover, in taking the human dimension into account, commissioners will discover that they have available a much wider range of tools than might otherwise be suggested by the cybernetic model, including reputational incentives, organisational culture and professional ethos.
These aspects are addressed in a later chapter, but first it is necessary to focus on the primary instruments of payment-by-outcome.

Measures, Standards and Incentives

Measures, standards and incentives are the primary, though by no means the only, tools of a performance measurement regime, and it is with these particular instruments that this chapter is concerned.

The selection of measures will be closely informed by decisions about the level at which outcomes or outputs are to be commissioned. Thus, in the case of offender management where measurement of the primary outcome is not possible, commissioners must decide which intermediate outcome (reconviction or re-imprisonment) and/or which high-level outputs (education, employment, drug rehabilitation and so on) will be assessed.

Standards are the targets or levels of acceptable performance that providers are expected to achieve before they are rewarded or sanctioned, and these may be expected to change more frequently than the measurement regime.

Incentives are sometimes described as the rewards or penalties applied for meeting or failing to meet these standards, but this narrows the range of potential motivations to monetary ones.22 On the principle that ‘what gets measured gets done’, it is probably impossible to conceive of a measurement regime that does not generate behavioural incentives of some kind, but commissioners have a wide range of choices about how measures will be used and thus what incentives will be created. A provider’s comparative performance may be published amongst its peers, or league tables may be broadcast to the public at large, causing considerable harm to its reputation. By comparison, a small financial penalty may not serve as a particularly powerful incentive, but if a substantial part of the provider’s revenue is at risk based on performance, then financial incentives will operate with much higher intensity than reputational ones.

However, measures and incentives are only some of the tools to be found in the commissioner’s toolkit, and how they are employed and with what intensity will inevitably be influenced by the way in which other instruments are used. Contract duration and population segmentation are two such factors, but commissioners are also able to influence the behaviour of providers through the ways in which systems as a whole are designed.

Creating a coherent payment-by-outcome regime demands that commissioners understand how measures and incentives operate, how they are related to the primary outcomes, how they work in conjunction with other tools, how providers may respond when confronted with the full range of incentives, and how to make changes in the face of unintended consequences. Since it is unfair to expect that commissioners can have a full appreciation of all these dimensions from the outset, they should consciously design an adaptive system, so that lessons are learned, alternative tools are employed, and adjustments are made over time in order that the performance regime delivers the outcomes for which the programme was established.

Measures That Work

One of the greatest potential benefits of payment-by-outcome is that it allows commissioners to specify, measure and reward the ultimate objective, thereby avoiding the distortions associated with the specification of multiple tasks. However, commissioners cannot avoid the onerous responsibility they bear for designing measures that work. In part, this is because primary outcomes are not often directly measurable, but even where they are, commissioners must have a mature understanding of the linkages between effort and outcome to be confident that the measures are appropriate.

Selection of Measures

The story has been told of the perverse incentives created by an incentive contract signed by Ken O’Brien, one of the most highly-ranked quarterbacks in North American football:

Early in his career, he had a tendency to throw interceptions. As a result, he received a contract that penalized him every time he threw a ball to a member of the opposition. However, while it was the case that he subsequently threw fewer interceptions, this was largely because he refused to throw the ball, even in cases where he should have done so. As Joe Namath put it, ‘I see him hold onto the ball more than he should… I don’t like incentive contracts that pertain to numbers.’23

As a former quarterback, Namath instinctively recognised that a performance regime that focused on only one of the dimensions of this complex role would inevitably result in perverse incentives. Where the delivery of the desired outcome involves multiple tasks or where a single task has several dimensions, but measurement and compensation are based on a subset thereof, providers will reallocate resources to the sub-set for which they are compensated, rather than delivering the full range of desired activities.24

In general, where there are multiple tasks, incentive pay serves not only to allocate risks and to motivate hard work, it also serves to direct the allocation of the agents’ attention among their various duties.25

The complexity and multi-dimensionality of public services is less problematic if commissioners measure several aspects of performance instead of attempting to capture the outcome in a single measure. Of course, using too many measures can also create difficulties since it increases the burden of data collection and reporting and could ultimately constrain providers’ ability to innovate and personalise services.
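
The reallocation effect described above, in which providers shift effort towards the subset of tasks that is measured and paid, can be illustrated with a toy calculation. The sketch below is not drawn from the report. It assumes a hypothetical provider with 100 hours to divide between two tasks, diminishing returns on each task (output modelled as the square root of hours), and a commissioner who pays a fixed rate per unit of output. The function name and all figures are illustrative assumptions.

```python
import math


def best_allocation(rate_task_a: float, rate_task_b: float, total_hours: int = 100) -> int:
    """Return the hours a revenue-maximising provider gives to task A.

    Assumption (hypothetical): output on each task equals sqrt(hours),
    and payment is rate * output. The remaining hours go to task B.
    """
    def payment(hours_a: int) -> float:
        hours_b = total_hours - hours_a
        return rate_task_a * math.sqrt(hours_a) + rate_task_b * math.sqrt(hours_b)

    # Try every whole-hour split and keep the one with the highest payment.
    return max(range(total_hours + 1), key=payment)


# Both tasks rewarded equally: the provider splits effort evenly.
hours_when_both_paid = best_allocation(1.0, 1.0)

# Only task A is measured and paid: all effort moves to task A,
# even though the commissioner still wants task B delivered.
hours_when_only_a_paid = best_allocation(1.0, 0.0)

print(hours_when_both_paid, hours_when_only_a_paid)
```

Under these assumptions the provider moves from an even split to spending every hour on the paid task. Real providers are also influenced by professional norms and reputation, which this sketch deliberately leaves out.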
Even where the full range of tasks is incentivised, some may be more observable or easier to measure, and there will be a tendency for these to be prioritised. James Q. Wilson argued that there is a kind of Gresham’s Law at work in many government programmes: ‘Work that produces measurable outcomes tends to drive out work that produces unmeasurable outcomes’.26

Understanding the Service Model

Where the linkages between outputs and outcomes are well established and widely agreed, it is easier to design an effective system of performance measurement. This is more likely to be the case in the pharmaceuticals industry where the causal relationships between taking a particular drug and a given outcome must be well-researched before the drug can be approved. Streetscene management provides an example of a relatively straightforward service in which the correlation between outputs, such as the prompt collection of litter and the timely mowing of verges, and the outcome of resident satisfaction, is relatively simple and easily understood.

However, there are many public services in which outputs are not closely correlated with outcomes, due to such complexities as human agency, the willingness of clients to co-produce, and external interventions such as recessions or pandemics, and in these cases, Schick may be correct when he argues that:

There is no inherent causal link between [outputs and outcomes]. Some outcomes may derive from specified governmental outputs, many do not. Alternatively, producing the right outputs does not ensure that the desired outcomes will materialise.27

The ultimate objective of human happiness is no doubt one example of such an outcome. But the ambition of significantly reducing re-offending may well be another. The leading US criminologist, Joan Petersilia, pointed to this kind of complexity in offender management when she declared: ‘There is nothing in our history of over 100 years of reform that says that we know how to reduce recidivism by more than 15 or 20 percent. And to achieve those rather modest outcomes, you have to get everything right.’28

This makes the selection of good measures inherently difficult. In some cases, this problem can be overcome by specifying intermediate outcomes; for example, in measuring the success of providers in reducing re-offending, commissioners might monitor the reconviction rate. This would be more closely tied to reductions in re-offending than, say, employment outcomes, and would save commissioners the difficult task of attempting to specify all the outputs that may contribute to desistance.

Measuring Impact

While policymakers must track progress towards the ultimate outcomes, what commissioners most need to understand when making decisions as to whether or not to reward providers, is the value providers have added or the impact they have made on the achievement of those ends. The difference between outcome and impact can be explained by external factors, such as the robustness of the general economy or the presence of a pandemic, or the ability of clients to solve their own problems without the assistance of state-financed service providers.

Where the commissioner has not specified the measures and standards so that it is the providers’ impact that is being rewarded, there is scope for intentional or unintentional cream-skimming. Thus, if providers are paid a fee for each person who does not suffer an emergency admission to hospital in a given year, and they have some discretion over which clients they serve, providers will be inclined to select patients with a low risk of admission. However, if the payment is adjusted so that providers serving high-risk patients are paid more than those focusing on those with a lower risk profile, this perverse incentive will be removed.
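The contrast between a flat fee and a risk-adjusted payment in the hospital admission example can be illustrated with a small sketch. It is not a formula taken from any actual scheme: the `Client` record, the `predicted_risk` field (standing in for the output of some predictive model), the fee of 100 and both caseloads are assumptions made purely for illustration. One possible adjustment, assumed here, is to pay for admissions avoided relative to the number predicted for the caseload.

```python
from dataclasses import dataclass


@dataclass
class Client:
    predicted_risk: float  # hypothetical model estimate of admission probability, 0..1
    admitted: bool         # True if an emergency admission occurred in the year


def flat_payment(clients, fee):
    """Pay a fixed fee for every client who avoided an emergency admission."""
    return fee * sum(1 for c in clients if not c.admitted)


def impact_payment(clients, fee):
    """Pay for admissions avoided relative to the number predicted for this caseload."""
    expected = sum(c.predicted_risk for c in clients)
    actual = sum(1 for c in clients if c.admitted)
    return fee * max(0.0, expected - actual)


# Two illustrative caseloads of ten clients each (all figures invented).
low_risk = [Client(0.1, admitted=(i < 1)) for i in range(10)]   # 1 admission, 1 predicted
high_risk = [Client(0.6, admitted=(i < 4)) for i in range(10)]  # 4 admissions, 6 predicted

for name, caseload in [("low-risk", low_risk), ("high-risk", high_risk)]:
    print(name, flat_payment(caseload, 100), round(impact_payment(caseload, 100), 2))
```

Under these invented figures the flat fee pays more for the low-risk caseload (900 against 600), which is the incentive to cream-skim. The adjusted payment gives the low-risk caseload nothing, because its single admission is exactly what was predicted, and rewards the high-risk caseload for roughly two avoided admissions.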
These kinds of adjustments are usually made based on regression analyses using predictive data, although it is possible to make retrospective adjustments based on actual data, with some or all of the payments delayed until the end of the measurement period. Under the American Job Training Partnership Act, providers had discretion over which clients they enrolled because inadequate funding meant they were only able to serve less than 5% of the eligible population.29 In that case, researchers found only modest evidence of creaming. In part, this was because payments were adjusted using a statistical model that took into account the predicted client profile (based on data from previous years) as well as regional economic conditions. Moreover, at one site studied, the ‘social worker mentality’ of front-line staff was so strong that it appears to have trumped any incentives to cream.30

Yardstick competition can also be used to measure the impact of providers. If two or more providers operate under the same conditions and serve the same types of clients, the results of their interventions can be compared. The act of comparison eliminates the impact of extraneous variables, so that if one provider delivers fewer outcomes than another, it is likely that they are underperforming. If outcomes are similar but fewer than expected, then in the absence of collusion, this would amount to prima facie evidence that some external factor has hampered performance.

Randomised controlled trials are the most scientifically rigorous way of measuring value-added, but for ethical and practical reasons they are seldom used in human services.

Individual or Population Outcomes

Individual outcomes are obviously important since they are indicative of concrete, positive change in a person’s life; however, measuring the outcomes of a whole cohort of individuals is usually the better design choice. For example, commissioners have the option of measuring and rewarding providers based on each individual that has entered employment, or measuring the percentage of the assigned population that providers have got into work. Measuring population outcomes is a precondition for identifying and isolating the impact of extraneous variables, and thus increasing confidence that providers are being paid for the value they have added. The disadvantage of this form of measurement is that it requires monitoring of an entire cohort over time, with providers receiving outcome payments only at the end of that period.

However, in Flexible New Deal, a programme for the long-term unemployed, commissioners were able to rely on the measurement of individual outcomes. Because of the long period of unemployment of these jobseekers, any outcome that occurred while jobseekers were on the Flexible New Deal could be assumed with some confidence to be a result of the programme.

Taking Other Tools into Account

The design of performance measures does not occur in a vacuum. The measures selected will have an impact not only on the incentive regime, but also on other aspects of the entire payment-by-outcome system. In particular, population segmentation should be kept in mind when designing the measurement regime. Measures which are appropriate for one sub-group may not function well for another. For example, for patients with long-term conditions who are at high risk of hospitalisation, measuring the number of emergency hospital admissions may well be appropriate. For lower-risk patients, a more informative measure may be the rate of healthcare utilisation, including GP, specialist and hospital visits. (The subject of population segmentation is further discussed in section 6.1.)

Designing Standards

Standards establish the level of required achievement, and it is not unusual for multiple standards to be used for a single programme. In the UK Employment Zones, for example, providers had to ensure jobseekers retained work for 13 weeks in order to be eligible for outcome payments. In addition, they were required to obtain work for a certain percentage of jobseekers referred to them in order to be eligible for a bonus award (or prize as these awards are sometimes known).

Ideally, standard-setting should be based on a solid understanding of the status quo so that targets are realistic but sufficiently challenging to stimulate innovation on the part of providers. There is a risk that competitive tenders launched without any real understanding of the baseline will result in a ‘winner’s curse’, with the successful competitors bidding a price and/or level of performance risk that is unattainable.

This risk is greater where providers compete in setting the standards. In the UK’s Pathways to Work, for example, the commissioner established the standard for clients remaining in work at 26 weeks, while providers competed to set their own monthly targets for numbers of job and sustained job outcomes. As noted in Chapter 8, Pathways to Work resulted in a winner’s curse, with contracts being signed at levels of performance that were unachievable. This is not to say that providers cannot play a role in standard-setting, and this is one issue that could be addressed in consultation with the industry and through competitive dialogue. Inevitably there will be tension, since providers would prefer standards to be ‘realistic’, while commissioners would like them to be stretching so as to encourage innovation.

Types of Standards

The type of performance standard selected can have a large impact on the incentives that providers face. There are at least three different kinds.
Thresholds are strict standards in which providers receive no reward until a certain level of achievement has been reached. For example, in one streetscene management contract in the UK, the provider did not receive any of the outcome payment unless 60% of the area’s residents were satisfied with the services they received (although the outcome payment was a small proportion of total revenue). Thresholds have the advantage of setting high standards and giving providers a clear focus, but can also create large distortions. As Propper and Wilson warn, ‘target indicators introduce an arbitrary dichotomy into continuous data and will therefore focus agents’ attention on the borderline’.31 In the education sector, thresholds have been blamed for focusing teachers’ attention on pupils who, without extra help, may have just missed the target, at the expense of pupils who would have had greater difficulty in meeting it, and thus were the kind of student the policy was probably meant to assist.

With distance-travelled standards, the aim is to reward incremental improvement, rather than setting the bar at a particular height. Proposals for payment-by-outcome models in offender management often advocate the use of distance-travelled standards, so providers would be rewarded for improvements on the current re-offending rate, rather than for lowering re-offending to a certain level. This approach is adopted in part because desistance is not often binary, and offenders typically reduce their rate of re-offending over time. Distance-travelled payments often relate to the time period over which outcomes are sustained. The performance regime for the UK’s proposed new Work Programme is based in part on monthly payments for a defined period of time over which employment is sustained. These standards are especially useful where the status quo is not well understood. The disadvantage of using such standards is that they may not be sufficiently challenging to incentivise providers to innovate and achieve far better outcomes.

Milestones can be used with intermediate outcomes, in which case they signal progress against a distance-travelled measure, or with outputs, where they represent stepping stones on the road to intermediate or primary outcomes. They measure progress that commissioners value in its own right and choose to reward. This may be because the milestones signal the achievement of a second outcome (such as lower costs), or because they reduce performance risk for providers or assist them in dealing with cash flow problems, or because they provide early evidence of part-performance and thus help to overcome problems with creaming.

In the UK, welfare to work providers receive payments when jobseekers are placed in a job and then when they retain work for 13 weeks, even though the primary focus is on ensuring that clients remain in employment for 26 weeks. These milestones are considered valuable enough to be worth rewarding, perhaps because they provide evidence of cost savings, although they have the additional advantage of addressing cash flow requirements on the part of providers (particularly important for small to medium-sized enterprises and not-for-profit providers). Milestone payments can also serve to motivate providers to work with disadvantaged clients who may not meet the final standard but will nevertheless achieve some of the benchmarks.

The problem with milestones lies in selection. In some cases, payment-by-outcome may be used where commissioners are unsure of the linkages between outputs and outcomes, so it is difficult to know how to reward progress. In other instances, personalisation is required and rewarding the achievement of particular outputs can detract from this.

Of course, these various types of performance standards can be combined, and often are.
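The three kinds of standard described above differ mainly in the shape of the payment rule, which a short sketch may make clearer. The function names, payment amounts and milestone schedule below are all hypothetical, and the performance measure is assumed to be oriented so that a higher value is better; only the 60% satisfaction threshold and the 13- and 26-week retention points echo the examples in the text.

```python
def threshold_payment(achieved, threshold, reward):
    """Pay the full reward only once the threshold is met; otherwise nothing."""
    return reward if achieved >= threshold else 0.0


def distance_travelled_payment(baseline, achieved, rate_per_point):
    """Pay for each point of improvement on the baseline, with no fixed bar."""
    return max(0.0, achieved - baseline) * rate_per_point


def milestone_payment(reached, schedule):
    """Pay a fixed amount for each milestone reached on the way to the final outcome."""
    return sum(amount for name, amount in schedule.items() if name in reached)


# Hypothetical milestone schedule for one jobseeker (amounts invented).
SCHEDULE = {"job_start": 300.0, "retained_13_weeks": 500.0, "retained_26_weeks": 1200.0}

# 58% satisfaction against a 60% threshold earns nothing.
print(threshold_payment(achieved=0.58, threshold=0.60, reward=10_000.0))
# Four points of improvement on the baseline are each rewarded.
print(distance_travelled_payment(baseline=50.0, achieved=54.0, rate_per_point=250.0))
# A client who reaches 13 weeks but not 26 still triggers two payments.
print(milestone_payment({"job_start", "retained_13_weeks"}, SCHEDULE))
```

The threshold rule is discontinuous at the bar, which is the source of the borderline effect Propper and Wilson describe. The other two rules pay for partial progress, and a blended standard can be formed by combining them.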
In the payment-by-outcome pilot for Peterborough Prison, a blend of threshold and distance-travelled measures is being used. The investors only begin to make a return when re-offending has decreased by 7.5% on the current rate; after this point they make increasing returns, up to a maximum reduction in re-offending of 13%.

In all cases, rewarding a provider’s contribution to achieving the standard will create fewer distortions than simply rewarding the provider because the standard was reached. The challenge lies in knowing how to measure providers’ impact. Consider a situation in which providers have discretion over which jobseekers they accept onto their programmes. If providers are simply rewarded for each jobseeker obtaining work, they may attempt to screen potential clients based on their probability of finding jobs, leading to inequality of access. If, however, providers’ rewards for job placement are adjusted based on the characteristics of the clients they serve (clients’ distance from the labour market), they will not be incentivised to restrict programme access.

Duration of Performance Period

The length of time over which providers must sustain their performance in order to qualify for payment is another variable that commissioners must take into account when setting performance standards. In some cases, the outcomes public services aim to achieve are only deliverable over the very long term. Indeed, there are many services that are commissioned by government rather than private individuals or families for the very reason that benefits will only be realised over the course of a lifetime.

Payment-by-outcome does not work well with policies where there is a long delay between action and outcome. Early intervention schemes are a classic example, where it is hoped that support services provided to children and parents in the early years of life will improve health and education outcomes in later life.
Time lags may also pose a problem for the use of payment-by-outcome with certain categories of criminal offender, although this remains the subject of ongoing research and debate.

There are several reasons why a long delay between intervention and measurement matters. First, extraneous variables will become more problematic as the performance period is extended; it will become more difficult to determine the impact of the initial intervention on the status of the individual, as indicated by the outcome measure. Second, outcome payments cannot be withheld for very long periods if commissioners wish to sustain a diverse and commercially viable support market. Finally, over extended periods, monitoring costs will outweigh the benefit of ensuring the outcome has truly been achieved.

This means that where the results of interventions are not visible for many years, as in early intervention services, paying for outcomes will not be suitable. Where payment-by-outcome is implemented, commissioners must strike a balance between a long monitoring period, which may increase the certainty that the primary outcome has actually been delivered, and the practicalities of extraneous variables, provider cash flow and monitoring costs. To do this well requires a solid understanding of the intermediate and primary (or in this case, short- and long-term) outcomes.

How Measures Are Used

Measures and standards are capable of being used in a number of different ways, and it is not necessary that they should always be expressed as mathematical formulae. For example, in thinking about performance budgeting, the American academic, Allen Schick, has drawn a distinction between their use as analytic tools and as decision rules:

… analytic tools empower budget makers, whereas decision rules constrain them. The former allow full scope for judgment and subjectivity, the latter made budgeting less judgmental and more objective.32

Another way of making this distinction is to contrast the use of measures for weighing performance as opposed to just counting it. The difference between the two lies in the interposition of human discretion, and relatively crude performance measures can be highly effective as long as they are balanced by human judgement. Freelance journalists are typically paid by the word, but this does not operate as a perverse incentive, since payment is based on the words published, not on the number submitted. The editor has complete discretion over which words are finally used, so that what appears to be a crude quantitative measure operates in practice as a highly-effective form of quality control. Thus, measures and standards can operate either as an input to professional decision-making by contract monitors, or as formulae with which monitors are invariably required to comply.

Measures also work differently when they are used as decision rules, with severe financial or reputational penalties attached, leading some observers to conclude that ‘when a measure becomes a target, it ceases to be a good measure’.33 Courty and Marschke give the following example from an automotive repair chain in North America.

Before Sears implemented its incentive scheme, Auto Center profits may have been positively correlated with the number of repair jobs. This statistical relationship may have prompted Sears officials to compensate Auto Centers on the basis of the number of repairs completed. Once Sears began paying managers bonuses for meeting service quotas, however, those service quotas became the managers’ objective. It was not long before the managers had found easy ways to boost sales volumes that did not also result in higher store profits. By charging customers for un-needed and unperformed repairs, store staff uncoupled the performance measure from the store’s long-term profits.
Their response to the incentives drove up the value of the performance measures while driving down profits. Thus, repairs and long-term profits would not be positively correlated after Sears based pay on the number of repairs performed.34

While the publication of numerical performance standards can create high-stake incentives for providers, in general, reputational incentives will provide commissioners with much greater flexibility. Australia’s Job Network ‘star rating’ system is not used to determine how much providers are paid, but to decide whether to extend or automatically renew existing contracts. In this way, star ratings operate as a particularly powerful reputational incentive. While they are quantitative in nature, they do give attention to certain qualitative dimensions, such as providers’ performance in serving the most disadvantaged.

It is also possible for commissioners to construct complex assessment frameworks with relatively few numerical measures. For example, HM Inspectorate of Prisons for England and Wales employs robust and long-established assessment criteria when it inspects prisons and young offender institutions, but these are subjective in nature, and laid down in a 248-page guidance document entitled ‘Expectations’, now in its third edition. While the inspections are based on a large number of assessments about processes or low-level outputs, the Chief Inspector’s published reports provide an overall judgment about the quality of the prison, based on a simple four-part ranking.

5 The Primary Tools: Incentives

In selecting the performance incentives to be used in conjunction with the measures and standards, commissioners are able to adjust their intensity – the level of risk assumed by providers – and to select among a range of different incentive mechanisms that operate in somewhat different ways.
Contrary to what is often assumed, commissioners have a great deal of choice in the design of the incentive regime and they are not forced to rely exclusively on financial rewards and penalties.

Adjusting Intensity

Any distortions in the design of a measurement regime will be magnified if the associated incentives are high-stake, with the potential for large gains or losses on the part of providers. Invest-to-save schemes are high-stake, since providers are expected to finance a substantial proportion of the investment in improved outcomes (and thus reduced costs) before any payment is received. However, any system of performance incentives can be constructed so that it is high-intensity, if a sufficient proportion of revenue is at risk, or there is the potential for a large enough impact on reputation.

Research into the impact of high-stake incentives on teaching in the US has found (unsurprisingly) that:

… as stakes increase, so does the influence of the test… teachers in high-stakes situations… reported feeling more pressure to have their students do well on the test, to align their instruction with the test, to engage in more test preparation, and so forth.35

Given that the objective of state-mandated testing was to focus school districts, schools and teachers on specific outcomes, this might indicate that the policy has been a success.
However, increasing the severity of the consequences also creates stronger incentives to ‘teach to the test’:

Teachers in states with high-stake tests are much more apt than their counterparts in states with lower-stake tests to engage in test preparation earlier in the school year; spend more time on such initiatives; target special groups of students for more intense preparation; use materials that closely resemble the test; use commercially or state-developed test-specific preparation materials; use released items from the state test; and try to motivate their students to do well on the state test.36

Using previous tests to drill students helps to improve test results, but research suggests that it is unlikely to contribute to meaningful learning. This comes as no surprise, but serves as a reminder that as the stakes are increased through higher financial rewards and sanctions, any distortions in the measurement regime are magnified and the incentive to game is strengthened.

High-stake incentive regimes are problematic where the nature of the service performed is complex and quality is difficult to measure. Thus, with the planting of seedlings in forestry plantations, firms in British Columbia generally pay their workers piece rates rather than salaries since these are associated with significantly higher productivity. However, where planting conditions are difficult, so that workers are required to exercise more care and discretion to improve the seedling’s chances of survival, salaries seem to be used more often.37

As tasks become more complex, greater collaboration is required among different providers, outcomes become more difficult to monitor and greater judgement is expected of providers, commissioners tend to reduce the intensity of their performance regimes and sometimes withdraw from the use of performance incentives altogether.38

Adjusting Diversity

Incentives must be considered in their totality: ‘the range of instruments that can be used to control an agent’s performance in one activity is much wider than just deciding how to pay for performance’.39 Much of the academic literature seems to assume that financial penalties are the only tools in the commissioner’s toolkit. Were this assumption to be built into the design of a payment-by-outcome system, commissioners would have little alternative but to ratchet up the intensity of incentives whenever they wished to focus providers more closely on desired outcomes.

In fact, successful performance contracting usually relies on a blend of incentives to align the interests of providers and commissioners. Thus, in a mature payment-by-outcome system, we should expect to find a reliance on financial rewards as well as financial penalties, regulation and certification, individual and corporate reputation, professional norms and organisational culture, and structure and governance.

• Financial rewards: Psychologists maintain that fear is a more powerful source of motivation than hope, and this, combined with the fact that in political debate it is easier to be seen as punishing failure than rewarding success, results in a systematic bias in favour of using penalties.
However, it is difficult for providers to encourage their employees to strive for excellence when success is never acknowledged, and the only form of public recognition is the shame associated with the failure to meet performance targets. Ideally, financial incentives should consist of a mixture of rewards and penalties.

Prizes are a form of positive reinforcement where providers know that there will be a reward for excellent performance, but they do not necessarily know in advance what standard will attract the reward, or what the value of the prize will be. Where commissioners are obliged to allocate a specific sum of money for a programme, the total value of the prize may be stated in advance, but how it will be shared among providers may only be known at the end of the assessment period, once relative performance has been ascertained. Prizes are widely used in the academic and scientific communities, but they were also used in Employment Zones, one of the earliest stages in the development of the UK government’s contracts for welfare to work. Since neither the performance target nor the amount of the bonus was disclosed by the commissioner, the prize acted as a ‘last mile incentive’, encouraging providers to find employment for as many jobseekers as possible.

• Reputation: Commissioners get a great deal of additional effort for free when providers believe that good performance will be recognised when contracts come up for renegotiation. While not originally designed with this in mind, prison inspections in the UK provide an outstanding example of reputational incentives at work and there is strong anecdotal evidence that prison management firms worry about the tone as well as the content of these reports.

Reputational incentives involve the intermediation of a public official who is capable of exercising professional judgement in the interpretation of results. As a result, they are less vulnerable to gaming or distortion through extraneous influences than exclusive reliance on a set of mathematical formulae. Reputational incentives involve weighing and not just counting.

• Regulation and certification: In a modern society, there is a great deal of generic regulation, covering such issues as employee relations, health and safety, professional conduct and environmental protection, that contributes to the mix of incentives surrounding public service providers. The medical profession and the pharmaceutical industry, for example, are highly regulated, so that any use of payment-by-outcome in long-term condition management must have cognisance of this wider framework of constraints and incentives. This also applies in the field of criminal justice, where human rights legislation serves to limit the range of interventions that might be used. In the UK, a group of experts, organised as the Correctional Services Accreditation Panel, also assists the Ministry of Justice in the certification of new programmes for offenders.

In a deep market where there are opportunities for repeat business, and where providers believe that reputation matters, companies will invest in securing third-party certification as a way of signalling to potential customers that they are different from their competitors. It is for this reason that public service providers pursue RoSPA (Royal Society for the Prevention of Accidents), BiTC (Business in the Community) or Public Servant of the Year awards and other ‘kitemarks’ of corporate social responsibility.

• Norms and Culture: Professional norms will also reduce the amount of gaming on the part of front-line service providers such as doctors, teachers and social workers. North American research found that public servants responsible for screening potential beneficiaries of job training schemes did not engage in creaming even where that would have improved the financial rewards to their employer. Professional culture was an important factor in this case, although there was also a prohibition under the scheme on financial incentives being cascaded down to front-line staff.40

However, professional norms may constrain gaming even where individual providers do have the potential to capture some of the financial gains. It is generally considered that cultural incentives constrained self-seeking behaviour under GP fundholding in the UK during the early 1990s where, on the face of it, general practitioners might have been expected to restrict access to secondary care to generate savings.

• Governance: In some complex public-private partnerships, control is exercised not only through contractual instruments, but also through governance arrangements that are similar to the controls employed within traditional hierarchical organisations. Partnership boards (in the case of joint ventures), contract boards, and memoranda of understanding supplement the financial and other hard incentives built into the contract.

Thus, financial incentives are only one of the tools available to commissioners, and this report concludes that policymakers should look to the much wider range of instruments in the policymaker’s toolchest. This conclusion is similar to that arrived at by James C. Robinson in a 2001 article on the different incentive regimes in American medicine:

There are many mechanisms for paying physicians; some are good and some are bad. The three worst are fee-for-service, capitation, and salary. Fee-for-service rewards the provision of inappropriate services, the fraudulent upcoding of visits and procedures, and the churning of ‘ping-pong’ referrals among specialists. Capitation rewards the denial of appropriate services, the dumping of the chronically ill, and a narrow scope of practice that refers out every time-consuming patient.
Salary undermines productivity, condones on-the-job leisure, and fosters a bureaucratic mentality in which every procedure is someone else’s problem. But American medicine exhibits numerous interesting compensation schemes that blend elements of retrospective and prospective payment, of fee-for-service, salary, and capitation. These innovations seek a middle ground between high- and low-intensity incentives, between piece rates and straight salary. Payment mechanisms also are embedded in and supported by nonprice mechanisms – i.e., by methods of monitoring and motivating appropriate behaviour that may have financial consequences but rely more directly on screening, socialisation, profiling, promotion, and practice ownership.41

Managing Gaming

Gaming has already been mentioned a number of times in this report, partly because it is such a pervasive problem in payment-by-results and partly because many of the tools discussed in this report are deployed to reduce the incidence or impact of harmful gaming. A fuller discussion is justified because, while the academic literature has recognised that some forms of gaming are beneficial, the policy debate has not yet caught up.

Adapting Carolyn Heinrich’s definition, gaming can be said to occur when providers increase their measured performance in ways that do not increase their performance in relation to the primary outcome sought.42 It can be difficult to avoid creating the perverse incentives that lead to gaming because primary outcomes can rarely be incentivised directly. Rewarding the achievement of a range of intermediate outcomes or high-level outputs, even where commissioners put an enormous amount of effort into selecting the right ones, still has the risk that providers can perform well on these while failing to deliver the primary outcome.

However, the risk of ‘gaming’ increases when commissioners have failed to identify all of the contextual outcomes of a programme.
In so doing, they therefore fail to design a system that incentivises all of what they want the programme to achieve, and are then disappointed when providers fail to deliver results that are not specified and incentivised. While commissioners may perceive this as gaming, the term is inappropriate when applied to the actions of providers, since they could not have been expected to deliver what was not communicated to them in their contracts or the design of the system overall. This kind of misunderstanding can only be avoided if commissioners take seriously the initial task of selecting outcomes and agreeing detailed programme objectives.

Barnow and Smith have argued that different kinds of innovation are likely to result from the introduction of performance incentives.43 Depending on the goals, some types of innovation that are sometimes labelled as ‘gaming’ could be beneficial to the programme as a whole.

• Change in the (technical) efficiency of service provision: This refers to improvements in the quantity and quality of the effort put forward by providers and is rarely, if ever, unwelcome. In fact, one of the reasons commissioners may choose to implement payment-by-outcome is that it encourages providers to make managerial changes that improve both efficiency and effectiveness.

• Change in the persons served: This kind of innovation is often labelled as gaming and attempts are made to regulate provision or incentivise providers so that they work with the same range of clients that were served prior to the introduction of payment-by-outcome. However, providers’ search for those individuals they judge to be most likely to respond to intervention may increase efficiency.
Indeed, commissioners should exploit the information about the kinds of individuals providers are creaming and parking to segment the population differently, adjust incentive payments, introduce new measures and standards, and make other changes so that the system produces outcomes that are closer to those originally intended. For example, creaming (where providers focus on those individuals closest to the labour market) and parking (where the most difficult-to-help clients are provided only a minimal service) are relatively common in welfare to work programmes. They occur because commissioners have not sufficiently incentivised providers to work with the most disadvantaged, for example by paying them more when they achieve outcomes for this group. Clearly, this may be a problem for reasons of equity, but creaming and parking also provide commissioners with valuable information about the actual cost of helping those jobseekers furthest from the labour market, empowering them to make the changes necessary to ensure this group receives the services they require.

• Change in the mix of services provided to clients: Again, changing the range of services clients are able to access could be considered gaming since commissioners may feel citizens are receiving a poorer quality service as a result. However, if payment-by-outcome has the effect of stimulating greater efficiency in service delivery, then we should expect to see a change in the mix of services. These changes are only problematic insofar as commissioners have not adequately specified what they would like providers to achieve. For example, with the introduction of payment-by-outcome in welfare to work, many providers stopped sending jobseekers on lengthy skills courses and became much more work-focused. While less investment in human capital development might be considered undesirable, an approach that moves people into work more quickly and less expensively is more efficient, certainly in the short term.
There is some evidence to suggest that a ‘work first’ approach such as this may also be more efficient over the long term.

• Outright gaming: Sometimes providers exploit design weaknesses, undertaking activities which allow them to perform better on the performance measures in ways that have little if any bearing on the primary outcome. This is outright gaming and commissioners should seek to reduce or eliminate the design features that allow it since it means providers are able to earn payments for non-welfare-improving activities. Some of the most outrageous examples of this come from the health sector, where studies have found that in order to perform well on waiting time targets, staff in some hospitals removed wheels from stretchers so they could be designated as beds and the patients redefined as having been admitted.44 In extreme cases, gaming may amount to fraud, and thus can be dealt with through the criminal law.

For the aforementioned reasons, it is important for commissioners to assess individual cases of gaming to determine whether or not they are truly undesirable. In general, gaming will be considered most harmful where it leaves certain groups of individuals in a worse position than prior to the introduction of payment-by-outcome. Where services under a payment-by-outcome scheme are additional, there can be fewer objections on these grounds. For this reason, it may be easier to pilot payment-by-outcome in additional services to allow learning to occur, before applying the model to core services. Where gaming is undesirable, commissioners will need tools that help reduce or eliminate the perverse incentives that cause it. Adjusting the mix of performance measures so they accurately reflect the primary outcome sought; using measures appropriately, assessing impact and interposing discretion where necessary; changing the intensity and diversity of incentives; and segmenting the population differently (discussed in chapter 6) are among the tools commissioners can use to reduce harmful gaming.

6 Other Design Tools

In designing performance incentives in the workplace, firms do not rely solely on employee remuneration. To the contrary, a substantial proportion of workplace incentives are embedded in the design of different jobs. Much the same applies to the performance regimes constructed for motivating and controlling independent providers. Apart from the design of the performance measures and the mix of financial and non-financial incentives, commissioners have several other instruments through which they can exercise control. These include the boundaries of the population whose outcomes providers are expected to improve, and the duration of the contract, elements that are similar in effect to job design within the firm.

Population Segmentation

It is inherent in the process of policy implementation, and particularly in programmes relying on payment-by-outcome, that commissioners will focus on a specific population whose welfare the policy is intended to improve. Thus, a job placement programme might focus on finding sustainable work for the long-term unemployed, or an offender management policy on accelerating desistance among prolific offenders. However, population segmentation can also be used as a tool for improving the efficiency and effectiveness of payment-by-outcome regimes, and it is with these applications that this chapter is concerned. Policymakers appear to segment their populations for several different reasons.
• Different levels of need: While all members of a particular population (say, the long-term unemployed) may have the same ultimate need (in that case, sustainable employment), the reasons why they are not able to realise this ambition may fundamentally differ, and some groups of individuals may be much further from achieving it than others. Different kinds of jobseeker will have different needs, and it is possible that for some individuals, the best policy intervention may not be as suited to resolution through payment-by-outcome. For jobseekers close to the labour market, performance measures based on entry into the workforce and holding down a sustainable job may be entirely appropriate; however, for more disadvantaged jobseekers with complex needs, solutions based on building human capital over time may need to be designed in a different way. In the case of offender management, policymakers might focus a particular programme on sex offenders, or tailor services to prisoners with mental health problems. While payment-by-outcome may still be appropriate in these cases, commissioners may decide that the needs of different groups are better addressed through separate schemes with different architectures.

• Costs and benefits: In some cases, policymakers may choose to focus a policy intervention on a particular sub-group for the explicit reason that the costs of delivering the outcome to these clients are expected to be lower, deliberately targeting the low-hanging fruit. For example, commissioners may concentrate on older male prisoners who are more inclined to desist from criminality for reasons of maturation and rising opportunity costs. Alternatively, they might focus on prolific offenders because of the greater magnitude of the economic gains expected from increased desistance among this cohort.

• Reduced gaming: One of the ways that commissioners can reduce the scope for creaming is to segment the population into several different classes of beneficiary that are internally homogeneous, and to pay providers different fees based on the difficulty of delivering outcomes to these groups. Job Services Australia has used the Job Seeker Classification Instrument to separate clients into three (sometimes two) different groups based on disadvantage, each attracting a different level of payment. In the UK, the population of jobseekers used to be segmented according to benefit type, presumably on the assumption that this might indicate fundamentally different classes of need. However, benefit types were not sufficiently useful in categorising clients for the purposes of job placement, and the Department for Work and Pensions is moving to a classification tool similar to that used in Australia.

• Measuring performance: Where they expect to rely on statistical comparisons or yardstick competition to assess performance, commissioners will need to consider the homogeneity of populations to ensure that comparisons are meaningful.

• Increasing price competition: Where contracts are being awarded through competitive tender, homogeneity also makes it easier for commissioners to intensify price competition. Much the same occurs at auctions where auctioneers bring together lots of broadly similar items – competition tends to be more vigorous than when a miscellaneous assortment of items is put to auction as a job lot.

• Extraneous variables: The recently-published Criminal Justice Green Paper recognises that population design is also influenced in part by the provider’s control of extraneous variables. In such cases, population selection will have an impact on the kind of provider that is invited to participate in the scheme.
For example, in addressing offenders who are sentenced to more than 12 months in prison, the Green Paper notes that they will typically spend time in a number of different prisons, receiving rehabilitative interventions from a variety of different providers. As a result, ‘It can be difficult to identify which organisation deserves the most credit for a reduction in re-offending.’ The recommended solution is to target the proposed payment-by-results system at the providers of probation services, since they ‘have the most continuous contact with the offender, from the sentence to completion’. By contrast, more than half of prisoners serving sentences of less than 12 months spend their time in a single prison, and the Green Paper argues for incentives aimed at reducing re-offending to be directed to prison providers.45

Dynamic Design

It is sometimes assumed that population design must take place in advance of procurement, with boundaries remaining relatively static throughout the life of the contract. However, with a learning system, it is possible for segmentation to be dynamic, with adjustment and adaptation over time. On this view, the ideal segmentation of the population cannot be known in advance, but will be discovered through exploration and systemic learning. Thus cream-skimming in performance contracting is seen not only as inevitable, but – assuming systems are designed well and there is scope for adaptation on the part of commissioners – as a necessary part of the learning process. This is often difficult for policymakers to accept, since cream-skimming has traditionally been viewed as opportunistic behaviour to avoid delivering services to the population for which the contracts were let. It is politically embarrassing, since providers profit from delivering something less than what was contractually committed.
However, if commissioners do not fully understand the boundaries of different population segments, and if there is previously unidentified ‘low-hanging fruit’, for whom outcomes can be delivered at significantly lower cost, then providers make an important contribution to the efficiency of the programme overall by identifying these individuals or classes of individual. A political problem arises when commissioners are locked into long-term contracts so that they are paying more than necessary for an extended period of time. On the other hand, where commissioners create an adaptive system, so that exploration is encouraged and lessons are learned, the political risks can be significantly reduced. An obsession with not paying for ‘deadweight costs’ – the costs associated with assisting people who would have found a job or given up re-offending on their own – reflects a static view of population design, since it assumes that this group can be known in advance. The prospectus for the UK government’s proposed new ‘Work Programme’, released in November 2010, deals with the question of differential pricing skewed in favour of more difficult customers. The prospectus was not supportive of the so-called accelerator funding model, which would allocate funding at an increasing rate as more clients were placed in employment. This model was rejected on the basis that ‘it encourages providers to support the easiest to help into work first’ and the paper opted for a system of classifying job seekers up front, which it assumed was more equitable.46 Whilst not endorsing the accelerator funding model, this report argues that there are strong public benefits in identifying the low-hanging fruit, and assisting this segment of the job-seeking population as soon as possible. A problem only arises if commissioners are obliged to pay unacceptably high prices for identifying this group.
Static classification models may not be sufficiently responsive to alternative (and possibly more effective) ways of classifying beneficiaries. Providers may well have access to local knowledge that will enable them to target resources much more effectively on the relevant segment of the population. There is some evidence that alternative classification systems have emerged under Flexible New Deal, as providers have organised themselves and their supply chains in fundamentally different ways.47 Dynamic design is evident in some of the risk-sharing schemes being employed in the roll-out of innovative new pharmaceuticals. For example, in 1999, the North Staffordshire Health Authority, in collaboration with Pfizer, introduced a pilot scheme based on an outcome guarantee for a new statin (a drug used in lowering cholesterol). Pfizer agreed to refund all costs of the drug if the expected reduction in cholesterol levels was not delivered. This gave Pfizer an incentive to identify, as early as possible, those patients for whom the drug was unlikely to make a difference, so as to exclude them from the population being served and thereby reduce its financial exposure, and the company cooperated with medical practitioners to identify the target group. Other outcome-based agreements in the pharmaceutical sector monitor results on an ongoing basis and rely on ‘continuation criteria’ to ascertain whether treatment should be extended. In Australia, a registry was maintained for the administration of bosentan to patients enrolled for the treatment of pulmonary arterial hypertension, with strict criteria for excluding those who were assessed to be not responding. In both these cases, the system was designed from the outset to identify as soon as possible those patients for whom the interventions were not appropriate.
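The settlement logic behind an outcome guarantee of this kind can be illustrated with a short calculation. The sketch below is not drawn from the North Staffordshire scheme itself: it assumes that the guarantee is assessed patient by patient and that the full drug cost is refunded for anyone who misses the target reduction. The `Patient` record, the function name and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    drug_cost: float       # hypothetical cost of the drug supplied to this patient
    reduction_pct: float   # observed percentage reduction in cholesterol


def settle_outcome_guarantee(patients, target_reduction_pct):
    """Split total drug spend into the amount the payer bears and the
    amount refunded by the manufacturer.

    Assumption (not from the report): the guarantee is applied per
    patient, with a full refund for anyone below the target reduction.
    """
    paid = 0.0
    refunded = 0.0
    for p in patients:
        if p.reduction_pct >= target_reduction_pct:
            paid += p.drug_cost       # outcome delivered: payer bears the cost
        else:
            refunded += p.drug_cost   # outcome missed: manufacturer refunds
    return paid, refunded


# Illustrative cohort of three patients, one of whom does not respond.
cohort = [Patient(300.0, 35.0), Patient(300.0, 10.0), Patient(300.0, 40.0)]
paid, refunded = settle_outcome_guarantee(cohort, target_reduction_pct=30.0)
print(paid, refunded)
```

Under these assumptions the refunded amount is the manufacturer's financial exposure, which is why it gains from identifying likely non-responders early and excluding them from the population served.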
There was no suggestion that the drugs should be withheld until a solution had been found that worked for all sufferers of the diseases in question. Payment-by-outcome was deliberately employed to segment the population, and delivered targeted interventions to those for whom they would work best. Much the same approach should be adopted in payment-by-outcome for public services such as offender management and welfare to work. Commissioners should not expect that in the beginning, they will understand all the salient characteristics of the population of potential beneficiaries. Because of their detailed knowledge of local conditions, front-line public servants always play a significant role in the making of policy, and the highly devolved and adaptive nature of decision-making under payment-by-outcome means that the influence of providers will be even greater.48 Rather than trying to ignore these qualities of payment-by-outcome, commissioners should make them work to their advantage.

Contract Duration

Commissioners’ ability to adjust the terms of the performance regime as they learn will depend in part on the length of the contracts. Shorter contracts will enable commissioners to review the terms frequently, reducing prices to capture efficiency savings, segmenting the population differently, and generally adapting the contract to suit changing circumstances. Longer contracts make adjustment of this kind more difficult, but they have other benefits that may outweigh the loss of flexibility. Thus the length of contract ultimately selected will depend on several variables. One is the length of time it takes to deliver the outcome or the evidence of major milestones and this will differ from case to case depending on the nature of the service in question. Of course, contract length will also be determined by the duration of the performance period.
For example, it takes a certain amount of time to prepare jobseekers to return to work, and commissioners must take this into account in the design of contracts. In addition, the Department for Work and Pensions imposes a 13- or 26-week measurement period on providers once jobseekers are in work, and under the Work Programme it is proposing measurement periods of up to two and a half years. Contract duration must be at least as long as it takes to produce and measure one outcome or proxy output, although commissioners may wish to allow several cycles of outcomes to be delivered before re-competing the service. Second, some services may require greater initial investment than others and therefore the supply side will demand longer contracts to ensure they reap the returns. Thus, we should expect contract duration to be longer under invest-to-save schemes and wherever greater performance risk is shifted to providers. Commissioners may also wish to let longer contracts in order to stimulate innovation in service delivery. In public service markets, where competitive tendering is highly transparent and where commissioners are reluctant to recognise intellectual property rights in major innovations, the duration of the contract may be one of the few protections that providers have to secure a return on transformational initiatives that require some investment in research and development. On the other hand, very long contracts may weaken the competitive pressure on providers to deliver better outcomes, thereby reducing the level of innovation and service quality. Longer contracts have the tendency to thin the market, since fewer tenders mean less frequent competition and fewer opportunities for new providers to enter the market. Commissioners need to consider how to keep the threat of contestability real in order to maintain a healthy market.
Shorter-term contracts and more frequent competitions are one way of doing this, although as noted above, they bring with them their own disadvantages. Contract scale – the size of the population being served – is another variable that commissioners can adjust to increase market depth and encourage new entrants. They can also stagger the dates when contracts are thrown open to tender, so that providers face an ongoing series of opportunities to tender their services and demonstrate the value of their reputation. Where longer contracts are preferable, commissioners can impose minimum performance standards which allow them to terminate contracts in the case of unsatisfactory performance. The Department for Work and Pensions has made this a feature of the Work Programme: providers will be given five-year contracts but must deliver a minimum level of employment outcomes for certain categories of jobseeker; failing to do so will result in contractual action, including contract termination if performance does not improve.49 Another interesting innovation from Australia is the rolling local area tender. At the end of the third Job Network contract in 2006, all contracts of providers performing to a satisfactory standard were renewed for another three years. However, six-monthly reviews allowed commissioners to replace poor performers throughout the period of the extension, thereby maintaining pressure on providers to continue to deliver high quality services.50 Minimum standards, automatic contract renewal and rolling tenders allow commissioners to avoid re-competing services where performance is good and to replace poor performers where necessary. The administered contract may be another option. In this case, an independent arbiter or regulator is appointed whose role is to re-set prices at pre-determined intervals throughout the contract term.
This arrangement was used for the London Underground PPPs, where commissioner and contractors were obliged to commit themselves to long-term arrangements with only limited understanding of the challenges the contractors would be required to address. The Department for Work and Pensions has contemplated, although not yet implemented, the use of such a model in the welfare to work market in the UK. Of course, there are limitations with such arrangements, but they have proved relatively robust in the utility sector for more than a century in North America and the UK.51

7 System Design

If the term ‘commissioning’ means something more than just ‘procurement’, then it must include a contribution to the selection and design of the overall systems within which implementation takes place, as well as the design of the policy framework within which procurement occurs.

Ownership of the Residual

After the decision is taken to commission services from independent or semi-independent providers, policymakers must still decide whether they will be procured through choice-based markets, where individuals or their agents negotiate directly with providers, or competitively-tendered markets where government agencies procure services on behalf of their clients. This is not just a matter of ideological preference, but will be heavily influenced by the nature of the service in question and the challenges associated with different design characteristics. This report is concerned with public services that are competitively-tendered, rather than choice-based systems. However, as noted in the previous chapter, one of the challenges associated with commissioned services is the perverse incentives that arise at the end of the contract period.
In the case of long-term condition management in the UK, the health of the population at the end of the contract period is a valuable asset, but it is owned by the commissioner and not the provider, and unless this asset can be valued and the provider compensated, there will be an incentive to under-invest in the long-term health of service beneficiaries. This perverse incentive can be ameliorated to some extent by extending the contract period, and/or by confining the population to patients with particularly severe conditions, but as long as there is a commissioner-provider split, this problem will remain. Choice-based selection is capable of overcoming this problem, particularly where patients (or agents on their behalf) exercise that choice in selecting their health insurer, rather than just selecting their health provider. In this case, the insurer owns the residual asset, and thus it is in their interest to invest in the management of long-term conditions. That it does operate as a powerful incentive for health insurance firms is evident from the fact that the great pioneers in long-term condition management have been the managed care organisations of North America. The other benefit of a choice-based system is that service contracts are renegotiated periodically, thereby avoiding the threshold effect that accompanies the termination and re-tendering of a single contract for a large population of beneficiaries. This distortion might be ameliorated to some extent if there was the prospect of the contract being rolled over in the case of good performance, but the competition norm in public contracting is so strong that governments are generally unwilling to extend contracts in this way. It is possible that choice-based markets may also stimulate a greater investment in research and development.
One of the disadvantages of commissioning public services through competitive tendering is that intellectual capital is undervalued. The public nature of the tendering process means that new ideas are quickly disseminated across the system as a whole. Of course, this also has significant benefits, but it does have the effect of discouraging significant investment in long-term research and development. Choice-based systems are not as transparent as those based on competitive tendering, and this might also help to explain the significant advancements made in recent decades in the managed care organisations of North America. Of course, individual choice is not possible in certain public services. Offender management is the obvious example – it is difficult to imagine how one might construct a corrections system based on prison vouchers. And there are many other challenges associated with relying on individual choice in public services that may tip the balance in favour of commissioning services through a public agency. It is beyond the scope of this report to resolve this particular issue; however, it is important to recognise that alternative market models may overcome some of the difficulties associated with contracting for outcomes.

Building a Learning System

The most complex payment-by-outcome systems currently in operation – those employed in assisting the long-term unemployed into jobs – have been under development for more than 60 years. The earliest example of performance incentives being used for staff involved in job placement dates to 1948: this was a system for the management of full-time employees, and the incentives were reputational rather than financial, but it bore many of the design elements of a modern payment-by-results contract.52 Financial incentives and performance contracts were first introduced in some of the US states in the 1970s, and the federal government passed the Job Training Partnership Act in 1982, a system that has been amended and improved in the decades since.
In Australia and in the United Kingdom, jurisdictions that have explored payment-by-outcome models since the mid-1990s, there has been a succession of schemes as lessons have been learned and confidence has developed in moving to higher-level objectives and transferring more risk. Policymakers in other service areas are now studying these welfare to work models in an attempt to shorten the development process; however, there are significant differences between public services, and only rarely can designs be borrowed directly. One of the major conclusions to be drawn from this project is the desirability of payment-by-outcome systems being developed over time. Any move to strong performance incentives and outcome specification will inevitably be a process of discovery, with initial mistakes and misunderstandings. Academic economists have argued the advantages of ‘adaptive contracting’, where commissioners deliberately leave contracts incomplete, using contract renegotiation to learn from the opportunistic behaviour of providers.53 By adopting this approach in payment-by-outcome contracting, commissioners should be able to exploit gaming behaviour such as parking and creaming, to identify previously unidentified segments of the beneficiary population and redesign the contracts so that they better serve the public interest.

Designing Procurements

Since this report is based on the assumption that payment-by-outcome is implemented through a system of performance contracting, it follows that some consideration must be given to the design of the procurement regime through which providers are selected. Competitive tenders are intended to reveal the providers who can deliver the best value for money – the greatest improvement in service quality for a fixed unit of cost; the greatest reduction in spending for a fixed unit of service; and/or the greatest capacity for managing performance risk in the outcome in question.
This is challenging enough where commissioners and providers understand the nature of the service and how it should be priced, but there is a grave risk of generating a ‘winner’s curse’ in a price-based competition where commissioners and providers do not agree upon the fundamental characteristics of the service being put to tender. Economists have defined a ‘winner’s curse’ as a procurement where the winner inevitably bids a price/service package that it is not able to deliver or assumes risk that it is not capable of managing. Particularly in the early stages, payment-by-outcome procurements seem to bear many of the characteristics of the winner’s curse, and as discussed in the following chapter, this is what appears to have happened in the UK in the competition for Pathways to Work. One way of overcoming this difficulty is to organise the competition so that it is focused on quality rather than price, although the risk in this case is that commissioners may pay too much for services, resulting in large profits and poor value-for-money for taxpayers. This appears to be what happened when the Australian government procured its first outcome-based contract for the Working Nation programme in 1994. In an attempt to resolve the problem without creating a winner’s curse, Australia set a minimum price which potential providers could bid above but not below. This did not appear to work, however, since providers’ bids did not vary much from the floor price. One proposal that may enable commissioners to be confident of both service quality and value-for-money is the imposition of contracts with periodic price re-sets that maintain pressure on providers to deliver efficiency savings and allow these to be captured by commissioners.
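One way periodic price re-sets might operate can be sketched numerically. The rule used below is an assumption made for illustration only and is not specified in this report: at each re-set, the price per outcome for the next period is set to the unit cost observed in the period just ended, plus a fixed margin. The function name, the margin and all figures are hypothetical.

```python
def reset_prices(initial_price, observed_unit_costs, margin=0.10):
    """Return the price paid per outcome in each contract period.

    Assumption (illustrative only): the first period is paid at the
    tendered price; at every re-set the next period's price becomes the
    unit cost observed in the period just ended, plus `margin`.
    """
    prices = [initial_price]
    # The last observed cost would only affect a period beyond the contract.
    for cost in observed_unit_costs[:-1]:
        prices.append(round(cost * (1 + margin), 2))
    return prices


costs = [900.0, 800.0, 750.0]   # hypothetical provider unit costs per period
prices = reset_prices(1000.0, costs)

# Provider surplus per outcome in each period: price received minus cost.
surplus = [round(p - c, 2) for p, c in zip(prices, costs)]
print(prices)
print(surplus)
```

Under this assumed rule the provider keeps any saving it makes within a period, which preserves the incentive to become more efficient, while each re-set passes the saving already achieved on to the commissioner through a lower price.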
Ultimately, decisions about how to procure services must take into account a large number of factors, including the level of innovation commissioners expect to see at the bidding stage and thus how easy it will be to compare the quality and cost of bids; the relative importance of service quality versus efficiency savings; providers’ level of experience in delivering services; and assessments of providers’ ability to cost services accurately. It is possible that where payment-by-outcome is implemented for the first time, fixed-price competition based entirely on quality may be more appropriate, whereas in mature markets commissioners can be more confident that price-based competition will not lead to a winner’s curse.

8 Case Study 1: Welfare to Work

Performance management systems have been used by US governments in their welfare to work programmes since the late 1940s, although the first large-scale, federally-funded programme that used performance standards, the Job Training Partnership Act (JTPA), was rolled out in 1982.54 Since then, two other federally-funded initiatives and a variety of state programmes have been implemented. Australia began to experiment with payment for outcomes in its Working Nation programme in 1994. Working Nation was replaced by the Job Network in 1996, and in 2009 the Job Network evolved into Job Services Australia. The UK’s first payment-by-outcome programme, Employment Zones, was rolled out in 2000. Subsequent programmes have built on the New Deal, another welfare to work programme, rather than the Employment Zones model, as government has sought to increase the levels of risk transferred to providers. Although these programmes all paid for outcomes, the differences among them are striking.
In the American JTPA programme, the primary outcome specified in the legislation was to maximise the ‘return on investment in human capital’, where human capital was defined as a ‘set of skills and knowledge’ acquired through education and training.55 The population targeted for the intervention suggests poverty reduction was another implicit objective. In Australia and the UK, the focus has been more explicitly on ensuring benefits claimants find employment, promoting social inclusion and reducing the cost of unemployment to government. The measures selected to determine whether these objectives were achieved reflected the variation in primary outcomes. Under JTPA, clients’ average wage on entering employment was one of four measures used to determine an agency’s eligibility for an outcome award.56 The US is the only one of the countries studied here that placed emphasis on increasing jobseekers’ earnings, used as a proxy for increased human capital. Australian and UK programmes have focused on measures of sustainable employment, namely the number of jobseekers achieving 13 and 26 weeks continuously in work. Incentive systems also differ. In the American JTPA, states had discretion over the ways in which eligible agencies were rewarded. In most states, agencies simply had to meet a certain standard of outcome achievement to receive an award. However, some states gave the entire bonus to the best-performing agency while some divided it among all agencies performing above a certain threshold. Other states varied the award according to performance relative to the standards, so agencies which far surpassed their targets received more than those which simply complied.57 The Employment Zones programme in the UK was quite different. Providers received some funding upfront, a proportion of which they were entitled to retain if jobseekers found work within a certain period of time.
Providers also received standard payments for jobseekers placed in employment and those who retained work for 13 weeks. Subsequent UK programmes such as Pathways to Work and the Flexible New Deal have similar incentive systems to Australia’s Job Network, with providers receiving service fees of varying levels for accepting jobseekers onto their programmes and then earning outcome payments when jobseekers retain jobs for 13 and 26 weeks. A key difference between these two countries is that Australian programmes have also tended to measure and reward education and training outcomes, while in the UK the incentives have steered providers to focus much more heavily on employment outcomes.58

Finally, the client groups targeted and the ways in which outsourcing has been used to serve different groups of jobseekers varied quite dramatically. Certain American programmes have been aimed at individuals living below the poverty line who do not have to be unemployed to qualify. In Australia, jobseekers are classified according to their level of disadvantage, and this determines their eligibility for services. Those categorised as closest to the labour market are eligible only for job matching services, while the most disadvantaged qualify for intensive assistance.59 The private and voluntary sectors play a role in providing services for all clients. In the UK, eligibility for programmes has historically been determined by length of unemployment and the type of benefit claimed.
In general, those unemployed for less than twelve months qualify for services from Jobcentre Plus, the public employment service, while the long-term unemployed are served by private and voluntary sector providers specialising in personalised case management, with a separate programme for those with disabilities. The Coalition Government has plans to establish a Work Programme for most jobseekers regardless of benefit type, and certain jobseekers (for example those classified as ‘facing significant disadvantage’) will have access to this programme after only three months of unemployment.60

This brief chronology demonstrates that payment-by-outcome can be applied to welfare to work programmes in different ways. Each programme has a specific context which affects its design and level of success. The following sections analyse those tools that are most critical to the successful use of payment-by-outcome in welfare to work schemes.

Population Segmentation

The population of jobseekers in any country is diverse. They face different types of barriers to work and some are further from the labour market than others. Working with a diverse population has implications for the design of a payment-by-outcome system. First, although the primary outcome for all jobseekers may be finding sustainable employment, the measures used to assess progress may depend on how far a group is from the labour market. Second, providers may need to be paid more to help highly disadvantaged jobseekers who will cost more to get into work. Finally, the different kinds of jobseekers with which providers work may make it difficult to compare their performance, so commissioners may find it necessary to segment the population into groups to make measuring providers’ impacts feasible.

Selecting Measures

The UK, the US and Australia all desire one or two primary outcomes for all clients. However, applying the same measures of progress to all jobseekers could encourage gaming and reduce providers’ flexibility in delivering services tailored to their clients’ needs. Therefore commissioners may need to create more than one welfare to work programme, so that providers of different programmes are incentivised to achieve different measures, reflecting the group of jobseekers with which they work.

If all jobseekers are served by the same programme, and that programme measures only employment outcomes, there is a risk that providers will park disadvantaged jobseekers who have complex, non-vocational barriers to work. For jobseekers closest to the labour market, measures of entry into the workforce and retention of employment are appropriate. However, for more disadvantaged jobseekers, for whom employment outcomes are unlikely in the short to medium term, a separate programme that incentivises providers to improve education and skills outcomes may well be necessary.

Measuring employment outcomes also narrows the scope of interventions providers are likely to offer. Providers may feel obliged to adopt a ‘work first’ approach which may not successfully address the needs of very disadvantaged clients. Where education and skills outcomes are rewarded, providers are much more likely to invest in developing the human capital of their clients.

Commissioners can segment the population based on jobseekers’ distance from the labour market, referring groups to different programmes that define and measure outcomes in different ways. Australia has adopted this approach. Thus, providers under the Personal Support Programme operated separately from the Job Network and under a different set of incentives which, while still rewarding employment outcomes, focused providers much more on developing human capital through education and skills courses.61

There is some evidence that highly disadvantaged jobseekers benefit more from employment-focused than human capital-focused programmes.
However, there is also some evidence that the most successful approach is one in which providers personalise clients’ back-to-work plans so that some are immediately work-focused while others begin with basic skills training or other activities. Programmes that exclusively measure and reward employment outcomes may prevent providers from being flexible about which activities clients undertake first. Therefore it may be appropriate to refer to other programmes those jobseekers who are likely to benefit from non-vocational support first.62

Setting Prices

To reduce incentives to cream and park, commissioners may pay providers more for securing outcomes for highly disadvantaged jobseekers; however, this demands a reliable and trusted means of classification. In Australia, the Job Seeker Classification Instrument is used to categorise clients into three different groups, each attracting a different level of payment. Providers have the option of petitioning to reclassify jobseekers they feel have been miscategorised.63 The instrument is only accurate enough to classify jobseekers into broad categories, so that the groups are still quite heterogeneous. While imperfect, it is likely this does reduce the incidence of parking.

Commissioners may also choose to establish the proportion of jobseekers that fall into each price category and increase the level of payment to providers as more clients are served. For example, commissioners might not pay anything for the first 10% of employment outcomes, on the assumption that these jobseekers would have found work without intervention, £600 per outcome for the next 15% of jobseekers, £1,000 for the following 15% and so on.64 A target accelerator model such as this would eliminate the risk of misclassifying individual jobseekers, but it would still rely on commissioners correctly identifying the proportion of jobseekers in each category and the approximate cost of getting them into work.65

Measuring Impact

Because the population of jobseekers is diverse, it is likely that providers will serve different proportions of easy and difficult clients. This makes comparing the performance of different providers, a common way of distinguishing the impact of the service from that of external variables, very difficult.

Measuring raw outputs – the numbers of jobseekers retaining employment for 13 and 26 weeks – is relatively simple. However, in order to measure the impact of providers on those outcomes, it is necessary to control for external factors that may have affected outcomes. This can be achieved in various ways. In general, commissioners of welfare to work programmes have not been able to compare programme results with those of a control group. An alternative way of attributing impact lies in using a statistically derived comparator, based on the results that commissioners would expect providers to achieve, taking into account jobseekers’ characteristics and labour market conditions. However, this relies on the statistical model being robust.

One common way of attributing impact is therefore to compare the results of two providers operating in the same conditions, referred to as yardstick competition. Providers operate in the same labour market and serve similar populations of jobseekers, which is achieved by referring similar proportions of easy- and difficult-to-help jobseekers to each. Yardstick competition reveals information about the impact of external variables since, if two or more providers are underperforming in relation to expectations, there is a prima facie case that difficult economic conditions have affected outcome achievement. Alternatively, a large discrepancy between the results of providers is an indication of underperformance by one.

Collusion, often raised as a problem with this model, does not seem to have been an issue in the UK, where several Employment Zones and Flexible New Deal districts operated with yardstick competition. This is possibly due to the shadow of the future, and the importance providers place on protecting their brands.

Another challenge lies in classifying jobseekers accurately so that providers serve similar client mixes. The tool used to classify jobseekers must be robust, taking into account the key variables that may affect an individual’s ability to obtain and retain work. Alternatively, providers can serve different types of jobseekers and commissioners can adjust for this statistically before making comparisons. However, statistically controlling for client mix still requires agreement on the characteristics that make jobseekers more difficult to help. Australia makes no attempt to ensure providers serve similar mixes of clients, but rather uses information about the jobseekers served over a certain period to make statistical adjustments to the number of outcomes achieved. This forms the basis for ‘star ratings’, the principal method of comparing providers.66

Designing Incentives

The design of incentives is critical in the creation of a payment-by-outcome system since they shape providers’ behaviour, sometimes in unanticipated and unacceptable ways. The principal rationale for paying for outcomes is that it aligns providers’ interests with those of the commissioner. However, since many primary outcomes are not directly measurable, commissioners must incentivise the achievement of intermediate outcomes or outputs instead.
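As a rough illustration of one pricing mechanism available to commissioners when designing incentives, the target accelerator example given under Setting Prices (no payment for the first 10% of outcomes, £600 per outcome for the next 15% and £1,000 for the following 15%) can be sketched in Python. The function name, the cohort size and the reading of the percentages as shares of the referred cohort are assumptions made for illustration. The text leaves the bands beyond 40% unspecified, so the sketch reports any further outcomes as unpriced rather than guessing a rate.

```python
# Hypothetical sketch of a 'target accelerator' payment schedule.
# Bands are (share of the referred cohort in percent, price per outcome in GBP),
# taken from the worked example in the text; later bands are not specified there.
EXAMPLE_BANDS = [(10, 0), (15, 600), (15, 1000)]


def target_accelerator_payment(outcomes, cohort_size, bands=EXAMPLE_BANDS):
    """Return (total payment in GBP, outcomes falling beyond the defined bands)."""
    if outcomes < 0 or outcomes > cohort_size:
        raise ValueError("outcomes must be between 0 and cohort_size")
    total = 0
    remaining = outcomes
    for share_percent, price in bands:
        # Number of outcomes that can be paid at this band's price.
        band_capacity = cohort_size * share_percent // 100
        in_band = min(remaining, band_capacity)
        total += in_band * price
        remaining -= in_band
    return total, remaining


# 300 outcomes from 1,000 jobseekers: 100 unpaid, 150 at £600, 50 at £1,000.
payment, unpriced = target_accelerator_payment(300, 1000)
print(payment, unpriced)  # 140000 0
```

Because payment depends only on the cumulative number of outcomes, no individual jobseeker has to be classified, but the result is only as good as the assumed band shares and prices.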
Problematically, intermediate outcomes and outputs capture only some aspects of the primary outcome, which means that providers may have a perverse incentive to increase their measured performance rather than their actual performance in relation to the ultimate outcome. Commissioners have experimented with scaled payments, retrospective adjustments, and prizes, all of which are intended to limit the perverse incentives created by paying for outputs.

Paying for Outputs

Thirteen- and 26-week employment measures do not capture the totality of what commissioners want providers to achieve. These outputs do not, for example, specify which kinds of jobseekers commissioners want providers to return to work or give a detailed indication of the quality of employment providers must help jobseekers find.67 These omissions create opportunities for providers to game the system, for example by focusing on jobseekers with few barriers to work (creaming), providing minimal services to very disadvantaged jobseekers (parking) or by encouraging jobseekers to accept jobs with little opportunity for advancement (an example of shading service quality). Commissioners can reduce the likelihood that providers will game by employing other incentives.

Scaled Payments

Increasing the amount of payment providers can earn for helping highly disadvantaged jobseekers into work should decrease the incidence of parking. Under Australia’s Job Network, jobseekers were categorised according to their distance from the labour market so that providers earned more for achieving outcomes for disadvantaged jobseekers.68 Since 2006, experts in the UK have been advocating a form of ‘target accelerator’ in which commissioners pay progressively more the more jobseekers providers return to work. But such a system has not yet been implemented and the Department for Work and Pensions has indicated that it is not supportive of this approach.69

Retrospective Adjustment

Rather than estimating in advance the proportions of different types of jobseekers providers will serve and the likely labour market conditions in which they will operate, retrospective adjustment involves paying providers based on actual clients served and conditions experienced. While scaled payments reduce incentives to park, retrospective adjustment also takes into account the impact of labour market conditions on outcomes. As the adjustment must be made retrospectively, commissioners can either pay providers an expected rate and then request a refund or arrange additional payment to providers, or wait until the end of a measurement period to make payments. To the authors’ knowledge, no payment-by-outcome systems are currently using retrospective adjustment, although programmes such as the JTPA have adjusted based on predictions of expected performance.

Prizes

Prizes can act as ‘last mile incentives’ for providers to continue to achieve outcomes beyond the level where the cost of helping each jobseeker begins to outweigh the amount of the outcome payments. In Employment Zones, providers could earn a bonus for placing a certain proportion of their jobseekers in employment. The amount of the bonus and the proportion of jobseekers providers were required to place in order to earn the prize were not disclosed.70 This ensured that providers had an incentive to find employment for as many jobseekers as possible, rather than simply finding work for those whom it was profitable to place based on the amount of the outcome payments.

The Procurement Process

The way in which services are procured can have a large and lasting impact on their quality and the outcomes they achieve.
The long history of contracting welfare to work programmes enables analysis of the effects of the procurement process on payment-by-outcome systems.

An important lesson from the procurement of Provider-led Pathways to Work in the UK is that competition based heavily on price or risk transfer can lead to a winner’s curse where the winner inevitably bids an uneconomic solution. This occurs where price-based competition is fierce and there is uncertainty about the population to be served and the exact nature of the service to be delivered. Pathways bids were in large part assessed based on the numbers of outcomes providers pledged to achieve. Many potential providers based their estimates of the outcomes they could achieve on their experience delivering the voluntary New Deal for Disabled People programme. However, jobseekers on the mandatory Pathways programme faced complex barriers to work and were far less job-ready than providers had anticipated. Thus the cost to providers of helping these individuals was higher than expected.71

This problem was compounded by the highly competitive bidding process. Providers were required to bid based on the number of jobseekers they would get into work, and this determined the unit price per outcome achieved.72 Partly because they had misjudged the population and partly because the process was so competitive, providers were overly optimistic about the numbers of jobseekers they would be able to get into work, which resulted in their unit prices being very low. Subsequently, providers were unable to supply the numbers of outcomes anticipated, and thus faced severe cash-flow problems. These two issues combined to help create an unviable supply market providing a ‘bare bones’ service to clients.

Australia provides alternative examples of how to procure services under payment-by-outcome.
For the Working Nation programme (1994), the government opted to conduct a competition based on quality, since the market was new and providers inexperienced.73 Running a competition based on quality is likely to encourage more innovation by providers who cannot make their proposals stand out based on a very low-priced bid, but must ensure instead that their service delivery model is original. Moreover, by fixing the price, rather than allowing it to emerge through the procurement process, the commissioner assumed the risk of setting it too low so that service quality suffered or setting it too high, so that providers made large profits. This may have been a sensible risk, since at this early stage the Department of Employment and Workplace Relations, which was procuring the service, probably possessed more knowledge about costs than the inexperienced providers. When the next government scrapped the programme, it argued Working Nation had been expensive and ineffective at placing jobseekers in permanent employment.74 It is possible that the high programme cost was partly a result of the government setting the fixed price too high, and the generous profits made by some providers seem to confirm this.

The three Job Network contracts that followed were procured slightly differently. For the first contract, Job Network providers could bid to deliver up to three different services: job matching, job search training and intensive assistance.75 Providers could compete on price for job matching and job search training functions, which were fairly standard services.76 Bids were assessed for quality and then ranked by price.77 The Department was not required to accept the lowest bid for any service, but could instead trade off aspects of quality and price. However, prices for intensive assistance were fixed and providers competed solely on quality for this particular service. This was due to ‘concerns that the bidders would initially lack the expertise to cost the new service’.78

For the second round of contracts, the Department decided to allow some price competition for the intensive assistance service. A 75% weighting was given to the quality of the services in the bid and a 25% weighting to price. The Department set a floor price which potential providers could bid above if they pledged to deliver ‘greater outcomes than the average’.79 This kind of competition was intended ‘to ease the transition to a fully competitive market for Intensive Assistance, and as a safeguard to protect service quality and reduce the risk of market failure’.80 In practice the floor set the price, as potential providers bid down to that level: ‘the difference between the average and minimum price for upfront payments for level B clients in [intensive assistance] was less than 7 per cent, compared with nearly 90 per cent for [job search training] (where no floor price was set)’.81

The third round of procurement saw a return to fixed-price competition as the Department attempted to weaken the incentive to park by ensuring a sufficiently high level of compensation for helping the most disadvantaged.82

Thus commissioners can use the procurement process as a tool to promote innovation by selecting providers offering different models of service delivery, and as a means to secure efficiencies through price-based competition. However, where providers are inexperienced and there are uncertainties about the cost of delivering outcomes and the level of outcome achievement possible, using competition to lower the price could lead to a winner’s curse. In immature markets, fixed-price competition may be the most appropriate procurement method.
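The 75%/25% weighting used in the second round of Job Network contracts can be illustrated with a short Python sketch. Only the two weights come from the text; the 0 to 100 scoring scale, the function name and the two example bids are hypothetical, and the Department's actual scoring method is not described here.

```python
# Weights reported for the second round of intensive assistance bids.
QUALITY_WEIGHT = 0.75
PRICE_WEIGHT = 0.25


def weighted_bid_score(quality_score, price_score):
    """Combine quality and price scores (assumed to be on a 0-100 scale)."""
    for score in (quality_score, price_score):
        if not 0 <= score <= 100:
            raise ValueError("scores are assumed to lie between 0 and 100")
    return QUALITY_WEIGHT * quality_score + PRICE_WEIGHT * price_score


# Hypothetical bids: A is stronger on quality, B is stronger on price.
bid_a = weighted_bid_score(quality_score=80, price_score=60)
bid_b = weighted_bid_score(quality_score=60, price_score=100)
print(bid_a, bid_b)  # 75.0 70.0
```

Under these assumed scores the higher-quality bid wins despite its weaker price score, which is in keeping with the stated aim of protecting service quality while admitting some price competition.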
To secure efficiencies with fixed-price competition, commissioners can implement periodic price re-sets to ensure they capture the efficiency savings made by providers.

Case Study 2: Offender Management

Despite recent decreases in re-offending rates, around half of adult prisoners in England and Wales are reconvicted within 12 months of release.83 The economic costs are considerable: in 2002 the Social Exclusion Unit estimated that imposing a custodial sentence at a crown court cost the criminal justice system £30,500 and incarcerating an offender cost an additional £37,500 per year.84 Reducing the rate of re-offending would therefore help the Ministry of Justice deliver the substantial cost savings required by its budget settlement in the 2010 Comprehensive Spending Review.

This case study considers the tools that commissioners of offender management services might use to implement a payment-by-outcome system successfully. In particular, it examines how performance can be measured and how using different measures may affect the scope for innovation.

Background

Following the Carter Review in 2003, the National Offender Management Service (NOMS) was established to bring together the headquarters of the prison and probation services. The intention was to give each offender a case manager who would follow the offender throughout imprisonment and probation to ensure continuity of relationships and service provision. NOMS, now an executive agency of the Ministry of Justice, is responsible for commissioning and delivering adult offender management services including the safe incarceration of offenders and rehabilitation services in custody and the community.85

After the Social Exclusion Unit’s seminal report and the Carter Review, the government released its Reducing Re-offending National Action Plan (2004).
Its core focus was on the resettlement of ex-prisoners through the provision of key services, including accommodation; education, training and employment; mental and physical health; drugs and alcohol; finance, benefit and debt; children and families of offenders; and attitudes, thinking and behaviour. These are known as the seven ‘pathways’ to reducing re-offending.86 A variety of programmes designed to assist offenders to progress along these seven pathways are provided in prisons and in the community. The specific needs of each offender are assessed and offenders are referred to specialist services for support to meet their needs.

More recently, the government published its Green Paper, Breaking the Cycle, detailing further reforms to the criminal justice system, including plans for more effective rehabilitation and the use of payment-by-results to reduce re-offending.87

Payment-by-Outcome Pilots

Since the Carter Review there has been growing interest in the use of contracting to drive a greater focus on rehabilitation and resettlement, with a variety of pilots directed to learning how payment-by-results might be implemented in this field. Some of the pilots focus on getting offenders into stable employment. Given the aim of reducing re-offending and the evidence that improving the employment prospects of offenders contributes to a reduction in re-offending, this is an output; however, the Department for Work and Pensions considers it an outcome.

While not yet based on payment-by-outcome, Path2Work was an early pilot directed to the improvement of employment outcomes that achieved comparatively good results. Managed by an alliance of private and voluntary sector providers, the scheme provided services to increase the employability of ex-offenders in the East of England from 2006 to 2009. It was voluntary and additional to other welfare to work services, including New Deal and Pathways to Work.
Path2Work provided a variety of services including improving basic skills, job matching and advice on disclosing information about criminal convictions to employers. Offenders were also offered in-work support once they had found employment. Four- and 12-week employment outcomes of participants were monitored, and the results were promising. An evaluation carried out by Deloitte indicated that while Path2Work did not meet its targets, 30% of participants were placed in work, compared with 6% to 11% of offenders participating in similar programmes.88

In September 2010, the Ministry of Justice announced a six-year pilot scheme catering for 3,000 male prisoners at Peterborough Prison serving sentences of less than 12 months (a class of offender who would not normally receive post-release support). The contract with not-for-profit providers is based on payment-by-outcome, with the initial investment financed by a ‘social impact bond’ where third-party investors are compensated if social outcomes are achieved. Investors will be paid for reductions in conviction events compared with a control group.89 Payments start when the reconviction rate of the intervention group is 7.5% less than that of the control group, with increasing returns up to a maximum rate of 13%.90 The Peterborough pilot is the first in the world where private investors have assumed financial risk for reducing re-offending.91

Job Deal aims to help young people not in education, employment or training, prisoners with less than three years to serve, and offenders on community sentences into employment. It is part of a larger scheme funded by the European Social Fund and the Department for Work and Pensions, and is managed by NOMS. Phase One of the pilot began in 2010 and will run for two years. The provider assigns each offender a case manager. Together they develop a tailored action plan and identify any specialist support the offender may need. The scheme is voluntary, although participants must commit to meeting with their designated case managers at least once every three weeks. The provider receives 70% of its funding in the form of a monthly service fee; the remaining 30% is contingent on achieving set targets. One third of the conditional payment is for successfully enrolling offenders on the programme. Another third is for achieving ‘hard outcomes’, such as clients entering employment or enrolling in further learning. The remainder is for meeting what the programme refers to as ‘soft outcomes’, but which are in reality a combination of outputs and processes, such as helping clients open bank accounts, organising mentoring and providing in-work support.

In early 2010, the Social Market Foundation proposed a model in which ten regional prime providers would operate small prisons dedicated to the incarceration of short-term prisoners. Providers would manage offenders both in custody and the community. They would receive a payment for the secure and humane incarceration of prisoners and outcome payments every six months for two years after a cohort of offenders had been released, based on the number who had not been reconvicted during each six-month period.92

Finally, in December 2010, the government announced it would be commissioning six pilots to test payment-by-results in offender management. Two of these will be aimed at offenders on community sentences and those released on licence while another two will target offenders sentenced to less than 12 months in custody. It seems likely that at least one of these pilots will be jointly commissioned and will focus on an output such as drug use cessation or employment as well as reduced re-offending.93 A further two pilots will involve local partners working together to reduce re-offending.
These partners will be able to retain a share of any savings made, to ‘be reinvested in further crime prevention activity at the local level.’94

Alternative Service Models

Even where providers are encouraged to explore alternative service models with the intention of accelerating the rate of reduction in re-offending, commissioners will have their own views about the linkages between outputs and outcomes, for the purposes of regulation and certification as well as the more effective negotiation of contracts and the management of providers.

Compared to services such as healthcare, where some linkages between drugs and clinical outcomes are well established, there is a lack of widespread agreement about what works to reduce re-offending. While evaluations of rehabilitation programmes no longer conclude that ‘nothing works’, one well-respected criminologist has observed that the evidence now supports the conclusion that ‘some things work for some people, some of the time, in some settings’.95 This lack of agreement about what works underlines the potential benefits that might arise from stimulating greater innovation, but it also helps to explain why commissioners cannot just specify high-level outcomes and allow delivery models to remain a ‘black box’.

Two different literatures inform the debate about how best to reduce recidivism. One is theoretical, seeking to understand why offenders desist. The contributors to this literature are psychologists, sociologists and criminologists who test their theories using longitudinal studies. Practitioners can draw on these theories and the accompanying evidence to design interventions to speed the process of desistance. The other body of evidence has been called the ‘what works’ literature. Rather than theorising the reasons for desistance, these authors evaluate programmes to identify those that have been most effective. There is some convergence in the findings of these two literatures.
Subscribing to one theory does not necessarily lead to the exclusion of certain types of programmes, although it is likely to lead to some being given more weight. As one source has described it, all programmes carry within them ‘implicit criminologies’ – assumptions about why offenders start and cease committing crime and understandings of how programmes aid the process of desistance.96 The commissioner’s theoretical framework will have a powerful influence on the design of any payment-by-outcome system, and the specification of a particular measurement regime may lock out alternative approaches to rehabilitation, thereby narrowing the scope for innovation.

Resettlement approaches: Voluntary organisations have provided services to prisoners leaving custody since the 19th century, but in England and Wales, resettlement gained new importance with the publication of the Social Exclusion Unit report in 2002, which argued that prisoners’ needs upon release were not being met and that this was contributing to the high rate of recidivism.97 The government’s response was the National Action Plan, which identified seven distinct pathways, six of which were based on public services, while only one focused on ‘attitudes, thinking and behaviour’. The emphasis was on services that helped offenders manage daily life in the community, and the continuity of those services across the boundary from prison to the community. It has been argued that ‘through the gate’ interventions of this kind are, at their root, determinist in nature, reflecting a view that ‘offenders are largely the victims of their social circumstances and problems beyond their control’.98 Even if this description is unfair, it is true that characterising the problem in this way has resulted in a significant emphasis on structural and managerial reforms.

Motivational approaches: There is a multitude of competing theories that can be categorised as motivational, but they differ from resettlement theory in their focus on individual agency, rather than structure, in the process of desistance. Change is seen as ‘a difficult and often lengthy process’, with numerous relapses and reversals. Thus programmes based on motivational theory tend to place much greater focus on the development of human capital.99

Motivational approaches to desistance can live harmoniously alongside a resettlement approach, recognising the importance of helping offenders to cope with the practicalities of daily life. Indeed, it has been argued that the two approaches are mutually reinforcing; solving the practical problems without addressing thinking and attitudes or vice versa is unlikely to be effective.100

The Ministry of Justice recognises the importance of a mixed approach. The 2010 Green Paper Evidence Report cites ‘good evidence that cognitive/motivational programmes… can reduce re-offending; and there is promising evidence about the impact of drug treatment programmes [and] education, training and employment’.101 Moreover, the report emphasises the importance of the supervisory relationship between offender and case manager to rehabilitation and reduced re-offending.102 However, the foundation of NOMS’s approach to desistance has been the more effective coordination of services ‘through the gate’: ‘the [regional] plans outlined under most of the Pathways seem to take it for granted that good service provision will result in less re-offending…’ Given limited time and scarce resources, there is an even greater danger that ambitious plans for one-to-one supervision of offenders, as foreseen in the Green Paper, may be compromised by more instrumental approaches.103

Selecting Measures

Accurately Measuring Outcomes

Implementing payment-by-outcome requires commissioners to choose outcomes and one or more measures to assess and reward attainment.
While criminal justice programmes often measure reductions in reconviction rates, interventions can also target outputs such as increasing employment rates.

The primary outcome sought by criminal justice programmes is desistance from crime. Theoretically, this could be measured by a reduction in the rate of re-offending; however, it is impossible in practice to measure this. Indeed, it is extremely difficult to arrive at reliable statistics on the rates of crime overall, which means that many individuals classed as first-time offenders by the system will in fact be re-offenders. Moreover, even if all crimes were detected, it would still be necessary to attribute them to known offenders in order to measure rates of re-offending. As a result, payment-by-outcome schemes must rely on intermediate outcomes which are broadly indicative of the level of re-offending and can be measured, such as reductions in the rates of reconviction or re-imprisonment.

Alternatively, commissioners might specify a cluster of outputs that they believe contribute to the reduction of re-offending. Reductions in drug misuse, improvements in the stability of relationships, and success in becoming debt-free, although not themselves indicative of desistance, are considered to be important factors in reducing re-offending. Intermediate outcomes are likely to be more reliable measures than outputs, as they are more strongly correlated with reductions in re-offending.

In selecting outputs or intermediate outcomes to measure, commissioners must be satisfied that they are reliable proxies for the primary outcome. For example, the length of time over which outcomes or outputs are measured will affect their utility as proxies. Of offenders who completed sentences between January and March 2000, 43% were reconvicted within one year of their release; by the end of nine years, this figure was 75%.104 Clearly, measures of non-reconviction taken at the end of nine years would more accurately reflect actual desistance than measures taken at the end of the first year. However, longer measurement periods increase transaction costs, so that increased reliability of proxies must be weighed against monitoring costs. Ministry of Justice statistics show that of offenders who completed sentences between January and March 2000, more than three-quarters of those who re-offended in the two-year follow-up period did so within the first year of measurement. This suggests that the advantages of a longer measurement period decline substantially after the first year.

Clear, Assessable and Continuous Measures

Ideally, proxies should also be clear and easy to measure. They should also be continuous; that is, where commissioners value progress towards an outcome, measures should reflect such progress.105 However, it is very difficult to find a measure that has all of these characteristics.

There are two main ways of measuring reconviction rates. The first is known as a binary measure, and simply records whether or not, in a certain period, offenders have been reconvicted. The second is referred to as a ‘distance-travelled’ measure and captures more detailed elements of reconviction, such as how many reconvictions have occurred over a defined period of time and how severe the offences were. Both approaches have advantages and shortcomings that commissioners will need to weigh when making their selection.

In offender management, binary measures assess absolute desistance by recording whether or not offenders have been reconvicted. One advantage of such measures is that they give providers a clear indication of what they must achieve: they send a strong signal that anything less than complete desistance is a failure. Binary measures are also more amenable to statistical analysis. In order to measure a provider’s impact on outcomes, commissioners must be able to compare their results with those of a control group or a statistically-derived level of expected performance.106 The use of a control group is considered to be more rigorous. Where for logistical reasons this is not possible, for example where all offenders are receiving post-release services, commissioners will need to compare actual outcomes to statistically-predicted results. The Ministry of Justice generates predictions of the percentage of offenders released in a particular year who will re-offend within 12 months of their release.

As binary measures of offending simply inform commissioners whether or not offenders have been reconvicted, they provide limited information about changes in offending behaviour. Distance-travelled measures, such as reductions in the frequency or severity of offences, allow commissioners to assess the progress providers have made towards achieving outcomes. Criminologists argue that offenders do not abruptly desist from crime but rather gradually reduce the frequency and severity of their offending.107 Given the nature of desistance, it may be considered unfair to penalise providers who have reduced the frequency and severity of offences where some of their offenders were nevertheless reconvicted. One advantage of distance-travelled measures is that they may incentivise providers to engage with high-risk offenders who are unlikely to achieve absolute desistance and whom providers might not otherwise wish to serve.108

On the other hand, distance-travelled measures may require more complex measurement systems. For example, offenders who are re-incarcerated are unable to commit any further recorded offences, which may create challenges for commissioners attempting to measure frequency. Commissioners could solve this problem by ‘pausing’ the monitoring of offenders while they were imprisoned and restarting once they were released, but this could make tracking a cohort complicated as there would be multiple programme termination dates.
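The contrast between the two kinds of measure can be illustrated with a short sketch. The Python below is not drawn from any Ministry of Justice method: the cohort records, the 12-month follow-up window and the predicted rate are hypothetical figures chosen for illustration, and the names `Record`, `binary_rate` and `frequency_per_year_at_liberty` are invented for the example. Rather than ‘pausing’ the monitoring clock, it uses a simpler stand-in, dividing reconvictions by the time offenders actually spent in the community. Severity is left out, since weighting offences would require further assumptions.

```python
from dataclasses import dataclass

FOLLOW_UP_MONTHS = 12  # assumed follow-up window for every offender


@dataclass
class Record:
    reconvictions: int      # reconvictions recorded during follow-up
    months_in_custody: int  # months of the follow-up window spent back in prison


def binary_rate(cohort):
    """Binary measure: share of offenders reconvicted at least once."""
    return sum(r.reconvictions > 0 for r in cohort) / len(cohort)


def frequency_per_year_at_liberty(cohort):
    """Frequency measure: reconvictions per person-year spent in the community."""
    total = sum(r.reconvictions for r in cohort)
    months_at_liberty = sum(FOLLOW_UP_MONTHS - r.months_in_custody for r in cohort)
    return total / (months_at_liberty / 12)


# Hypothetical cohort of four offenders; one spent six months back in custody.
cohort = [Record(0, 0), Record(1, 0), Record(3, 6), Record(0, 0)]
predicted_rate = 0.60  # hypothetical statistically-predicted reconviction rate

actual = binary_rate(cohort)
print(f"binary reconviction rate: {actual:.2f}")
print(f"difference from predicted: {actual - predicted_rate:+.2f}")
print(f"reconvictions per year at liberty: {frequency_per_year_at_liberty(cohort):.2f}")
```

The binary measure treats the offender with one reconviction and the offender with three identically, while the frequency measure distinguishes them but needs extra data on time in custody, which is the added complexity described above.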
While there may be solutions to the problem of tracking re-incarcerated offenders, it does suggest that distance-travelled measures require much more complex measurement systems.

Finally, it is more difficult to attribute impact to providers with distance-travelled measures. Because there are currently no predictions of the frequency or severity of re-offending, owing to the complexity of constructing such measures, control groups would be needed to attribute providers’ impact on distance-travelled measures.109

Controlled Innovation

One of the advantages of paying for outcomes rather than processes is that providers can experiment with different ways of delivering programme objectives. Commissioning for a reduction in re-offending would allow providers to test alternative service models, including resettlement and motivational approaches, in the search for a more successful way of reducing re-offending. However, as it is not possible to observe the primary outcome directly, commissioners must measure intermediate outcomes, such as reductions in the rate of reconviction or re-imprisonment of a cohort, or outputs, such as success in finding stable employment and accommodation, in order to measure achievement.

Where commissioners use intermediate outcomes as measures, the scope for innovation by providers is still likely to be considerable since providers may take quite different approaches to reducing reconviction or re-imprisonment rates. Where commissioners decide to measure and reward outputs, however, innovation is likely to be constrained as some outputs will not be compatible with some alternative service models. For example, Job Deal is clearly based on resettlement theory, as finding employment is part of coping with daily life outside of prison. Providers of this programme are not free to experiment with motivational approaches since they have been contracted to provide employment-related services, and part of their payment depends on their achieving ‘soft’ and ‘hard’ employment and education targets.

Commissioners may deliberately choose to limit the extent to which providers are able to operate programmes based on very different service models, either because they consider the evidence for those approaches to be weak or because the programmes would be politically unacceptable. For example, paying offenders to desist could be controversial.

Thus, different kinds of measures can be used to control the level of innovation by providers. Commissioners should take care to use this tool consciously, ensuring they are fully aware of how using a particular measure could narrow or broaden the scope for research and development.

10 Case Study 3: Long-Term Condition Management

Background

Long-term conditions, the most common of which worldwide are heart disease, stroke, diabetes, asthma, cancer and chronic obstructive pulmonary disease, are diseases that cannot currently be cured but can be controlled with the use of medication and/or other therapies.110 As of January 2010, 15.4 million people in England were living with a long-term condition and the number of people with at least one such condition is expected to rise to 18 million by 2025. The costs associated with treating these conditions are large, accounting for 70% of the total health and social care spend in England. By 2022 public expenditure on long-term care is expected to rise by 94% to £15.9 billion, and given the growing prevalence of these conditions and the concomitant rise in healthcare costs, there is a strong case for exploring new ways of ensuring that they are systematically and proactively managed.

Most chronic care programmes in the United States, Australia and the United Kingdom are evaluated in part based on the outcomes they achieve.
UnitedHealthcare’s Evercare programme, for instance, defines programme success by the number of hospital admissions avoided through shifting care for frail elderly patients from the hospital to the nursing home. In the United Kingdom, the Evercare model has been adapted to work in a community setting. In the Newham pilot, patients at high risk of suffering another hospital admission in the 12 months following discharge were assigned a community matron who was responsible for ensuring that the patient did not relapse. In East Lincolnshire, an integrated approach to managing chronic obstructive pulmonary disease (COPD) was taken, and a specialised COPD intermediate care team called ‘Inspire’ was established, spanning primary and secondary care. Outcomes measured included reductions in admissions, re-admission rates and length of stay, improvements in quality of life indicators, and mortality data.

However, while outcomes of chronic care programmes have been monitored and published, providers have so far not been paid by outcomes. Instead, physicians in the US, UK and Australia are being paid for outputs and processes thought to improve quality of care.

In Australia, GPs have received blended payments since 1999 in an effort to move away from a fee-for-service model. In 2003, outcome payments were introduced, providing additional remuneration for doctors completing certain treatments or tests for a percentage of the population in each disease area. With diabetes, for example, the main indicator of quality is whether a test of blood sugar levels is conducted during the consultation.111

In the United Kingdom, payment-by-results was built into GPs’ General Medical Services (GMS) contract in 2004 in the form of the Quality and Outcomes Framework (QOF). This offers financial rewards to practices that demonstrate their achievement on 128 quality indicators in four domains. Points for clinical quality are awarded where practices can demonstrate that they have fulfilled a number of key stages in the management of chronic disease for a proportion of the relevant population.

Since 2001 hospitals in the UK have also been placed under a payment-by-results regime. All providers of hospital care are paid nationally-determined fees based on the number of predefined activities (called Healthcare Resource Groups) carried out.112

In 2000 the US Congress mandated the Centers for Medicare & Medicaid Services (CMS) to test a hybrid payment methodology for physician groups that combines Medicare fee-for-service payments with incentive payments. The participants were eligible to earn annual incentive payments by achieving cost savings and meeting quality of care targets. In the first year, physicians were assessed against six quality targets set by the CMS, such as whether a beneficiary’s blood pressure was at the recommended level.

In 2001, the California Pay-for-Performance Program became the largest non-governmental physician incentive program in the United States. Performance is assessed based on clinical process measures and, since 2005, physicians have also been remunerated for achieving targets, such as reducing diabetes patients’ blood sugar levels to a certain threshold.

The insights derived from the outcome evaluations of long-term condition management and result-based payment schemes for physicians, such as the Quality and Outcomes Framework, can be used to analyse how payment-by-outcome might apply in long-term condition management and the issues that might arise in the process. The next sections examine why payment-by-outcome is desirable and what tools are available to improve the chances of success.

Why Pay for Outcomes?

Payment-by-outcome offers two particular advantages in the field of long-term condition management.
It fosters innovation by giving providers the flexibility to experiment with the linkages between inputs and outputs, and outputs and outcomes, and it may improve efficiency if providers are allowed to retain cost savings associated with treating patients proactively.

Of course, much innovation in the medical treatment of long-term conditions has already occurred. The scientific evidence on the effectiveness of treatments is strong and the National Institute for Health and Clinical Excellence (NICE) already issues best practice guidelines for diagnosing and treating certain conditions. For example, there is compelling data on the role of exercise, diet, blood sugar control (HbA1c) and insulin injections in diabetes management. Providers are therefore unlikely to deviate from these treatments, especially since new ones have to be approved by NICE if they are to be funded on the NHS.

However, payment-by-outcome may foster two other forms of innovation. First, providers may experiment with different ways of encouraging patients to cooperate with treatments. Often the effectiveness of treatment depends not only on the biochemical effects on patients’ bodies, but also on whether patients cooperate in the management of their diseases. In 2005, the UK government recognised the importance of patient involvement in their own care when it pledged to triple investment in the Expert Patients Programme, which delivers ‘free courses aimed at helping people who are living with a long-term health condition manage their condition better on a daily basis.’113 In the United States, healthcare organisations contact patients when they are more likely to be receptive to medical advice, such as when they receive a new diagnosis, experience changes in medication or are discharged from hospital.114 The timing of advice and treatments is thought to have a significant impact on patients’ willingness to co-produce.
Second, payment-by-outcome is likely to encourage providers to create functional service delivery systems. While there is scientific evidence of which treatments work for certain conditions, these treatments are not always administered to all patients who need them. Where providers take on the risk for health outcomes, they are incentivised to ensure that best practice is systematically carried out. This could be through standardised processes that ensure health indicators are monitored and recorded and treatments are changed or emergency care is given where needed, or through information technology that provides early warning of patient deterioration. Providers are likely to invest in finding new ways to ensure all patients receive the best possible care, as this will contribute to the achievement of outcomes.

In addition, payment-by-outcome may encourage providers to become more efficient, if providers are allowed to retain a share of the cost savings they produce. There is evidence that paying doctors for performing certain procedures, as is the case under payment-by-results, ‘encourages resource consumption’.115 Although not yet tested, the theory would suggest that paying physicians based on patient health outcomes and healthcare utilisation, and allowing them to retain a proportion of the cost savings generated, would encourage them to be more cost-conscious.

Tools

Paying physicians by outcomes has clear benefits but is also challenging. This section analyses some of the tools available to commissioners that may make payment-by-outcome work successfully in long-term condition management.

Measures

The existence of measures that simultaneously proxy for outcomes and cost savings facilitates the introduction of payment-by-outcome since savings can be used to remunerate high-performing providers. In long-term condition management, a reduction in hospital utilisation (a function of frequency of hospitalisation and length of stay) has a direct impact on the cost associated with treating a particular patient and also proxies for health outcomes. Managed care programmes in Australia, the UK and the US are already evaluated on the basis of reduced patient hospital utilisation.

However, reduced patient hospital utilisation may not be a suitable measure for all types of patients. While a physician’s success with patients who have a prior history of hospitalisation can be judged by reductions in utilisation, the same cannot be said for patients with mild conditions. Such patients often have no prior history of hospital episodes and may not be at imminent risk of admission. Reductions in GP visits or nurse consultations are alternative outcome measures for lower-risk patients. The number of GP visits is already factored into patient risk assessment tools such as the Combined Predictive Model used to predict patients’ risk of hospitalisation, and it is also being tracked by the UK Department of Health as part of the ‘Whole Systems Demonstrators’ evaluation of telecare and telehealth pilots.

Furthermore, reduced patient hospital utilisation or service use need not be the only intermediate outcome that commissioners measure. In fact, unitary performance measures often do not adequately capture all aspects of primary outcomes and may induce providers to game the performance measure by improving measured performance without delivering better health. Therefore contracting for improvements in a number of outcome measures, such as mortality and morbidity rates and service quality, in addition to reductions in service use may be preferable.
Monitoring a number of indicators gives the commissioner a more accurate picture of a provider’s contribution to the primary outcome and should reduce the incidence of creaming and parking.116 The Quality and Outcomes Framework, for instance, rewards GPs for achieving a number of clinical and organisational targets as well as for providing a good treatment experience to patients. Treatment experience is assessed using standardised surveys that ask patients a set of questions to reveal how they feel about their treatment.

In 2009, Patient Reported Outcome Measures (PROMs) were introduced into the UK National Health Service (NHS). These are outcome- rather than process-based measures and are derived in two stages. First, patients are asked to report on various dimensions of their health (for example, mobility and pain/discomfort) in order to compute their overall health state. A preference weight or utility is then attached to that state by asking patients or a sample of the general population how many years of life in the current health state they would be willing to give up for a year of better health.117 Insofar as the trade-off forces patients to measure quality of life in a common currency, namely in terms of years of life they are willing to give up for better health, the scores of different patients can be compared. Theoretically, this means that commissioners could compare the scores of providers’ patients to evaluate their performance.

However, a major problem with both Patient Reported Experience and Outcome Measures is that providers may not have much control over key variables that determine how much patients value a treatment or outcome. For instance, two patients both undergoing the same arthritis treatment could report different experiences because one is more sensitive to pain than the other. Furthermore, two patients with the same mobility after treatment could report different PROM scores because mobility is more important for one patient’s job. Awarding outcome payments to providers for achieving improvements in patient experience or patient reported outcomes may be unreasonable, since variables outside of providers’ control could influence the measures. Instead, monitoring and publishing these intermediate outcomes may act as a softer incentive for physicians to improve patient experience and quality of life.

In summary, there are a number of measures available to assess providers’ performance on primary outcomes in chronic disease management. While measuring patient hospital utilisation has the advantage of revealing information about a patient’s healthcare costs, it may be desirable to monitor other measures as well in order to better capture all aspects of primary outcomes and deter gaming.

Ownership- and Integration-Related Incentives

Payment-by-outcome is one means of aligning the interests of providers with those of the commissioner through a contract, but the termination of contracts may generate perverse incentives. When contracting a provider to manage a high-risk population nearing the end of their lives, this may not be a problem. However, as commissioners seek to serve lower-risk patients who may have 20 to 30 years left to live, the perverse incentives associated with a contract terminating at the end of, say, ten years, could be large.

These perverse incentives are best explained through the concept of ownership: who owns the benefit of a particular intervention. For example, if a provider is given a contract to manage the care of a population for ten years, and at the end of that period the contract is to be retendered, the subsequent provider will own the benefit of inheriting a population that has better health than at the beginning of the first contract. This means the original provider will have an incentive to innovate and manage the health of its patients proactively at the beginning of the contract period, because it is likely to benefit, but these incentives will dwindle towards the end of the contract. To ensure the original provider always has an incentive to manage patient health in the most long-term cost-effective way, the original provider must own the benefit of the intervention.

The integrated Managed Care Organisation (MCO) model in the US, which combines indemnity insurance with the provision of managed care to control healthcare utilisation and therefore costs, is one way of aligning incentives without the perversities associated with contract termination in a payment-by-outcome system.118 Most Americans are now enrolled in an MCO. Patients who are unsatisfied with the care they are receiving may be able to switch to another health plan, so the level of competition among health insurance companies is relatively high.119 Patients who are satisfied with the service, however, are likely to remain with their insurer throughout their lives. Competition gives MCOs an incentive to deliver very good quality of care, while the possibility of a client remaining with the insurer indefinitely ensures it owns the benefit of delivering cost-effective care over the long term.

If doctors working on the frontline are self-employed or work for a number of different insurance companies, then they will not have the same incentives as the insurers to deliver proactive care. To better align the incentives of physicians, a number of MCOs operate as integrated care organisations, in which the insurer, commissioners and providers of care are organisationally or contractually integrated. Kaiser Permanente, for example, has a largely contractually integrated structure.
It contracts with the Permanente Medical Group of physicians on an exclusive basis, and, depending on the state, either owns and runs hospitals or contracts with non-Kaiser hospitals with which it has a long-term relationship.120 In this structure, Kaiser Permanente acts as both the insurer and the commissioner of care, and physicians act as both providers and commissioners of secondary and tertiary care. This structure means commissioners can exert more influence over doctors to deliver outcomes,121 since physicians who contract with more than one insurer have the freedom to stop doing business with one if they choose, whereas doctors contracting exclusively with one MCO must either adopt the model of care of the organisation or leave their jobs. The high level of contractual integration means that Kaiser physicians ‘share a common destiny’122 with the organisation as a whole and are therefore more likely to act in ways that increase its competitiveness, including delivering outcomes that improve the insurer’s financial position overall.

Furthermore, Kaiser doctors have financial incentives to ensure the company performs well. The doctors own shares in the MCO, which ties part of their remuneration to its performance. As a result, physicians have a financial motivation to generate cost savings that improve the MCO’s profitability, for example through managing long-term conditions in primary care settings.123 In fact, Kaiser Permanente performs 3.5 times better than the NHS in terms of the number of bed days it uses to treat 11 medical conditions for those aged 65 and above.124

While the American system has been a reference point for some British policymakers, it is important to recognise that the UK system is fundamentally different since there is a purchaser-provider split. When considering how incentives will operate, this aspect of the UK context needs to be taken into account.

Professional Norms and Regulation

Payment-by-outcome creates a financial incentive for providers of long-term condition management to improve or prevent the deterioration of patient health. Depending on the design of the incentive system used to reward provider performance, providers may be inclined to game the system and increase measured performance without actually improving patient health.

However, commissioners can harness professional norms to counteract gaming. In the UK, healthcare providers are not only accountable for the quality of the services they deliver, as monitored by the Care Quality Commission, but they are also personally answerable for their professional conduct to the General Medical Council. The Council enforces professional standards as published in its general guidance, case studies and ethical standards. These clarify the meaning of the good medical practice expected of physicians. Where physicians do not adhere to these professional norms, the Council has the legal mandate to sanction doctors and, in cases of severe misconduct, revoke their licence to practise.125

Professional norms and regulation reduce the likelihood of gaming, as achieving measures without delivering outcomes violates professional standards to which physicians have legally committed themselves. For instance, under GP fundholding, doctors had a financial incentive to restrict access to secondary care (hospitals and specialists), as a reduction in referrals to secondary care promised to generate budget savings which doctors could reinvest in services in subsequent years.126 While such a financial incentive may have been expected to encourage doctors to postpone referring patients to secondary care even when they needed the treatment, this did not occur. Insofar as they are personally liable for negligence and misconduct, doctors are unlikely to withhold a referral where it is in a patient’s best interests to receive secondary care.

11 Other Case Studies

Pharmaceutical Pricing

In the health sector payment-by-outcome has been used most extensively for pharmaceutical products. As governments and insurers have sought to contain rising health expenditures through greater scrutiny of cost-effectiveness, drug manufacturers have assumed some of the risks around the performance of new products.

The transfer of outcome risks was first developed in the US in the 1990s between insurers and manufacturers. Early examples included ‘no cure, no pay’ strategies for male pattern baldness drugs, schizophrenia treatments and cholesterol-lowering statins.127 This approach proved unsustainable as it became clear that the schemes benefited insurers far more than manufacturers, and they were either not renewed or manufacturers attempted to renegotiate the agreements, which generated mistrust.128

In 1999 an outcomes guarantee was piloted in North Staffordshire (UK), with the trial of a new branded statin. The manufacturer promised to refund all costs of its drug if it failed to reduce patients’ LDL cholesterol to safe levels. This proved successful and allowed the makers to differentiate their product from older statins that were due to go off-patent and decrease markedly in price.

Recent outcome-based schemes in the UK have been applied to more complex products. The first national scheme arose amidst controversy following a NICE decision not to recommend drugs which slowed the progression of Multiple Sclerosis.
As a result, in 2002, the Department of Health negotiated a scheme whereby list prices would be reviewed so as to meet a maximum cost-effectiveness ratio of £36,000 per Quality Adjusted Life Year (QALY, a measure of the quantity and quality of life generated by a healthcare intervention).129 In Australia a similar scheme was developed to adjust the price of a drug for a rare pulmonary disease based on mortality rates.130

Simpler rebate schemes have been used for two cancer drugs, Velcade and Erbitux. The manufacturers offered to refund costs after a pre-agreed course of treatment if the outcomes fell below the expected levels.131 Another drug company offered to bear the cost of treatment for patients with macular degeneration after 14 doses of their Lucentis product if there was no improvement in visual acuity compared to standard care.132

In the US, insurers have turned once again to risk-sharing schemes. The manufacturers of an anti-osteoporosis drug have reached an agreement with a medium-sized insurer to reimburse costs associated with fractures for patients taking the treatment as prescribed. The manufacturer Merck agreed with a major insurer, Cigna, to discount the cost of its anti-diabetes drugs following decreases in blood-sugar levels of patients taking any diabetes drug, and to offer further discounts if patients were taking Merck drugs as prescribed.

There have also been movements to extend value-based purchasing to all branded drugs. The UK Pharmaceutical Price Regulation Scheme was revised in 2009 to reflect this concept, and a pay-for-performance model has been proposed by the Centers for Medicare & Medicaid Services.133

Outcomes and Measures

Research and development of novel drugs is rooted in a scientific process of discovery and measurement which gives purchasers and manufacturers a certain amount of confidence to implement payment-by-outcome schemes. However, much of the evidence on pharmaceutical performance is produced under clinically controlled trial conditions, which addresses the drug’s efficacy in targeting a particular process, whereas what matters most to purchasers is its effectiveness with real-world patients.

Surrogates

The performance of some products is more easily measurable where a clear biomarker exists. This is an objective and measurable biochemical feature that can be used as a surrogate for hard outcomes. Widely used biomarkers such as LDL cholesterol for heart disease or glycated haemoglobin for diabetes are based on well-evidenced links between use of the drug and the surrogate and between the surrogate and disease progression. A clear example of this is the Velcade Response Scheme for patients with multiple myeloma (a cancer of the blood), where response was measured by a reduction of abnormal cells in the blood known as M-proteins.

Nevertheless, there can be uncertainty around interpreting biomarkers. There are concerns that M-protein is not a good surrogate for life expectancy, and in 10-15% of cases patients do not have measurable M-protein levels.

Functional Measures

Where clear evidence-based surrogates do not exist, functional measures based on physical improvement or harm have been used in fairly simple rebate/discount agreements. The Lucentis scheme for patients with macular degeneration passes on the costs of treatment to the manufacturer if there has been no improvement in eyesight compared to standard care based on visual acuity scores. Alternatively, the Actonel Fracture Protection Programme reimburses insurers for the costs of non-spinal fractures based on average medical expenses. The Erbitux scheme for colorectal cancer addresses the size of the tumour as interpreted by doctors.
Effectiveness

More sophisticated measurement systems have tried to scrutinise more intensely the effectiveness of pharmaceutical products in the real world rather than simply their efficacy in altering biochemical features. An important feature of these schemes is that they measure the deviation of actual from expected performance rather than from a placebo or a control group and therefore assess whether the product lives up to the claims made based on clinical trial data.

In Australia a registry for patients diagnosed with pulmonary arterial hypertension and taking the drug bosentan was used to compare actual annual mortality rates to a benchmark rate derived from a predictive model based on evidence from clinical trials. An increase in mortality rates would lead to a decrease in price.

The UK MS risk sharing scheme uses a ten-year study to monitor disease progression in patients compared to an expected progression derived from a predictive model. Disease progression was measured using the expanded disability status scale (EDSS), specifically developed to measure MS patient health on a scale from zero (perfect health) to ten (death). Although the study began in 2002, it has not yet provided conclusive evidence concerning the actual effect of the drug. This is largely due to the uncertainty of the outcome measure, which does not fully capture the complexity and long-run nature of the disease and suffers from variation and measurement error.134

Cost-Effectiveness

Incremental cost-effectiveness ratios based on quality-adjusted life-years are a useful measure for capturing efficiency and are commonly used when negotiating prices. However, establishing thresholds can be highly sensitive since it requires a maximum price to be put on patient health and life expectancy. NICE has established a notional upper limit of £20-30,000 per QALY, above which a drug will generally not be recommended. The Australian Pharmaceutical Benefits Scheme has set a similar ceiling at A$60,000. In the US, insurers have reported that state regulations and market pressures make it virtually impossible for them to refuse a drug.135

Establishing cost-effectiveness is heavily dependent on the quality of data available regarding quality of life and life expectancy. Initial calculations of the cost-effectiveness of beta-interferons for MS based on randomised control trials prior to the risk-sharing scheme produced wildly divergent results ranging from £20,000 to £1 million per QALY.136 The interim results of the risk-sharing scheme’s 10-year monitoring study have failed to provide any further clarity.

Financial Incentives

Governments and health insurers must strike a balance between asserting control over high pharmaceutical costs and continuing to foster innovation in new products. Voluntary agreements with industry to regulate gross profits, such as the UK’s Pharmaceutical Price Regulation Scheme, and taxation aimed specifically at manufacturers, as used in the US, form a background to pricing incentives but are inevitably imprecise in their application.

The aim of outcome-based pricing schemes is to link manufacturers’ remuneration more closely to the performance of their pharmaceutical products and the actual value to the patient rather than covering the costs of research and development, marketing and manufacturing. There are three identifiable models:

a. rebate based on price of drug
b. rebate based on costs of harm
c. price adjustment (up or down) based on observed outcomes

In practice, prices tend to start high and gradually decrease over time. Increases are rare and usually arise in the renegotiation of prices as a result of higher manufacturing costs. Price-adjustment schemes usually only adjust downwards, thus manufacturers can mitigate their revenue risk if they can justify a sufficiently high entry price to cover input costs.
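The arithmetic behind a downward-only price adjustment tied to a cost-effectiveness ceiling can be sketched as follows. This is a simplified illustration rather than the method used in any of the schemes described: the £36,000 per QALY ceiling is the figure quoted earlier for the 2002 Department of Health scheme, but the function names, the treatment of the drug price as a per-patient cost and all example figures are assumptions.

```python
# Ceiling quoted in the report for the 2002 scheme; other values are hypothetical.
NEGOTIATED_CEILING_GBP_PER_QALY = 36000.0


def icer(incremental_cost: float, qaly_gain: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY
    relative to the comparator treatment."""
    if qaly_gain <= 0:
        raise ValueError("ICER is not meaningful without a positive QALY gain")
    return incremental_cost / qaly_gain


def adjusted_price(current_price: float,
                   observed_qaly_gain: float,
                   other_incremental_cost: float = 0.0,
                   ceiling: float = NEGOTIATED_CEILING_GBP_PER_QALY) -> float:
    """Per-patient price after a downward-only review.

    The highest price that keeps the ICER at or below the ceiling is
    ceiling * QALY gain, less any non-drug incremental costs. The price is
    never raised above its current level and never falls below zero.
    """
    max_price = max(0.0, ceiling * observed_qaly_gain - other_incremental_cost)
    return min(current_price, max_price)


if __name__ == "__main__":
    # Hypothetical drug priced at £8,000 per patient.
    print(adjusted_price(8000.0, observed_qaly_gain=0.125))  # gain too small: price is cut
    print(adjusted_price(8000.0, observed_qaly_gain=0.25))   # ceiling not binding: price unchanged
```

The `min` in the last line of `adjusted_price` is what makes the scheme one-directional: better-than-expected outcomes leave the price where it is, which is why the entry price matters so much to the manufacturer.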
Rebate schemes can be high-stakes agreements since they depend on an all-or-nothing evaluation of patient response to trigger the refund. This makes the level at which the threshold response is set very important. The Velcade Response Scheme involved hard bargaining around the level of response required (reduction in serum M-protein biomarkers in the blood). Whilst NICE managed to negotiate higher standards of performance, raising the minimum response from 25% to 50%, patients who fell below this mark could lose out on further treatment despite experiencing some benefit. To mitigate this risk an extra cycle of treatment was allowed. However, given each cycle of treatment costs around £3000 this also increases the potential refund from the manufacturer.137 The burden of making claims tends to fall on the purchaser (in this case hospital pharmacy departments) and evidence from NHS patient access schemes for oncology drugs (such as Velcade and Erbitux) suggests that administrative costs are high and a significant proportion of rebates are lost as they are not claimed within the necessary time limits.138

A lower-stakes approach is used under the Lucentis dose-capping scheme where the manufacturer provides the treatment at no further cost after the first 14 rounds if an adequate response has not yet been achieved. This reduces the manufacturer’s potential exposure (since there is a guaranteed revenue stream), but caps the profit per patient. Moreover, it relieves the purchaser from some of the costs involved in assessing performance, since, in contrast to the all-or-nothing nature of the rebate schemes above, a claim only has to be made if further treatment is thought to be worthwhile.

Rebates related to the costs of harm can increase manufacturers’ exposure. The concept has so far been applied to only one US scheme for anti-osteoporosis tablets whereby the average medical expenses associated with a non-spinal osteoporotic fracture are reimbursed to the insurer.
Depending on the type of the fracture this can cost up to $30,000 per patient compared to the cost of the drugs which is around $1000 per year. Successful claims are required to show that the patient has been taking the drug as prescribed for at least six months, whilst a maximum number of reimbursable fractures was also set as part of the one-year agreement. Data from the first nine months showed that the reimbursement rate was well below the maximum for the year.139

An alternative model of discounting prices based on improved performance has also been used by a US drug manufacturer. The agreement between the US insurer Cigna and Merck for the oral diabetes drugs, Januvia and Janumet, drives adherence and greater volume sales by providing one set of discounts if blood sugar levels of patients decrease and a second set if patients have been taking the drugs as prescribed. The insurer was chosen largely because they already offered their own diet and life-style programmes to help manage the condition, suggesting that manufacturers are willing to take greater risks with their money where they feel that purchasers and their patients will co-operate.

Manufacturers have also been willing to assume the costs of monitoring outcomes for the purposes of complex price-adjustment schemes. However, there is little evidence from the beta-interferon and bosentan price-adjustment schemes of changes in pricing so it is not yet possible to observe an effect on incentives.

Identifying the Patient Population

Pharmaceutical products are highly differentiated and specialised. They are therefore explicitly targeted at particular types of patients from a very early stage of development but even at launch there will be disagreement about who might benefit. Risk-sharing schemes have helped to overcome some of the tensions involved in price negotiations between manufacturers and purchasers by setting out clear inclusion criteria whilst also facilitating an ongoing refinement of the population as the scheme is rolled out and effectiveness is assessed.

Inclusion Criteria

Under randomised control trial conditions, manufacturers usually recruit a narrowly defined sample that is most likely to achieve optimal performance against a control group. Following successful clinical trials, however, manufacturers are keen for their treatments to be applied as widely as possible to increase profits, whilst purchasers will seek to narrow application to contain costs. However, under a risk-sharing scheme the purchaser may be more inclined to open up the inclusion criteria whilst the manufacturer will be more careful to identify patients who are closer to the population that took part in the randomised control trial. NICE decided to expand inclusion criteria for the age-related macular degeneration drug Lucentis to include patients with deterioration in one eye after Novartis agreed to refund the cost of doses after the 14th injection.

Clinical and genetic markers are also used to predict the likelihood of response and therefore influence the selection of the appropriate population. Many new oncology drugs now have an accompanying test to identify the most likely responders. In 2007 UnitedHealthcare entered an 18-month risk-sharing trial with the maker of a US$3,500 genetic test which determined whether a woman with early-stage breast cancer would benefit from chemotherapy. The insurer paid for the test during the trial in the expectation that a lower price would be negotiated if the costs of chemotherapy had not reduced in line with test results.140

Continuation Criteria

Once risk-sharing schemes are in progress, it will become apparent that some patients fail to respond as expected and do not benefit from the drug, necessitating further refinement.
Whilst a randomised control trial cannot exclude participants based on sub-optimal performance, in real-world conditions, physicians will use their judgement in line with professional guidelines when observing patients on a course of medication and decide whether to alter dosage, switch to different medication or stop treatment altogether. Targeting the most responsive patients also activates a virtuous circle of improvement since patients will be more likely to keep taking a drug as prescribed where they experience benefits.

Continuation criteria are designed to increase the cost-effectiveness of risk-sharing schemes and are generally stricter than original guidelines. In the Australian bosentan scheme, patients were rigorously assessed every six months and 20% of patients were excluded if their conditions did not stabilise or improve, since they were unlikely to benefit from continuing to take the drug.

Foster Care

Foster care is a short-term measure that is widely regarded as inadequate in meeting the physical, social and emotional needs of children over the medium- to long-term. Those children who do not find a permanent home encounter more problems in the future: they commit significantly more crime, spend more time in jail, and receive disproportionately high welfare assistance as adults.141

In the United States, the management of child welfare is divided: while states continue to have primary responsibility for provision, the federal government has intervened to impose standards and provide technical assistance. Not-for-profit organizations have assumed responsibility for actual delivery funded through grants and contracts.

In some states, adoption and foster care services have been contracted together, since fostered children often find permanent homes with their foster families and both services require similar skills. In other cases, only adoption services are outsourced, so agencies arrange permanent homes for those already in foster care. Other states have created different markets for foster care and adoption. However, the identification and prevention of abuse in the original family environment remains a separate part of the service and is typically performed by a public sector agency.

A payment-by-outcome approach was employed both in agreements between the federal and state governments, and in the contracts states signed with voluntary sector agencies. Child and Family Services Reviews were introduced in the year 2000 as part of a federal performance budgeting initiative. Some states, including Illinois, Michigan and Kansas, also implemented payment-by-outcome in their contracts with agencies. The design of these schemes reveals important insights into how the difficulties associated with payment-by-outcome have been managed.

One of the principal challenges lay in selecting effective measures of performance. Child and Family Services Reviews monitor the performance of states according to safety, permanency, and child and family well-being. These categories are broken down into seven intermediate outcomes and 23 indicators such as the percentage of children re-entering foster care after being reunified with their families and the percentage of children placed with siblings.142 States that do not achieve the required outcomes may accrue financial penalties which can run into millions of dollars, although as of January 2004 no financial penalties had been applied.143

While these reviews are acknowledged as having instigated a revolution in performance management, there were problems with the measures used to assess performance.144 It has been argued that the indicators ‘fail to capture experiences of children and families that adequately reflect safety, permanency and well-being outcomes.’145 In addition, the methodology underpinning data collection has been criticised for skewing the measures. Measures are based either on a cross-section of children in care or on those who exited care during a given review period. The former measure results in the over-representation of children with long lengths of stay, while the latter results in the ‘underestimation of length of time to permanency outcomes because of the bias of exit cohort samples towards children with shorter lengths of stay.’146 If information were being used purely for improvement purposes, then it would still be useful, even if biased, as long as the methodology remained the same over time. However, when states face financial penalties for poor performance, accurate, unbiased data is essential.

Another important challenge of payment-by-outcome lies in determining the amount of the incentive payment providers should receive. Public sector agencies often do not have good quality data on costs, and setting a price that will motivate providers to perform is difficult. Kansas experienced problems with its first payment-by-outcome contract due to insufficient understanding of financial issues. Costs for foster care were 65% higher than estimated, creating fiscal problems for providers: the state paid foster care providers US$105 million in unexpected costs above the US$179 million contracted for, while the adoption provider received an additional US$31.4 million above the contract amount of US$37.4 million and yet was close to bankruptcy by the end of the four-year contract.147

Designing incentive systems that encourage appropriate behaviour can be difficult. When Kansas first outsourced its foster care and adoption services in 1997, the state paid providers a flat outcome payment (US$12,860-15,504 in 1997) per case.
For this fee, providers were expected to deliver all traditional foster care services, and in addition, provide services for 12 months after a child was placed, with no additional funding if the child re-entered care during that time. Moreover, providers had to maintain standards in relation to child safety.148 This incentive system was intended to encourage providers to place children quickly and deliver permanent and safe outcomes, but it was problematic since it did not pay providers more for existing foster care cases which were more difficult and costly to manage than new referrals. The second set of contracts let in 2000 changed the payment mechanism so that providers were remunerated based on a fixed price per child per month, which meant that providers earned more for managing cases which were more time-consuming. Despite fears that the payment structure would create a perverse incentive for providers to keep children in foster care longer than appropriate or necessary, there is still a positive incentive for providers since annual contract renewal depends on performance.149

Michigan may have experienced similar problems. While the focus of foster care and adoption is to find children safe, permanent homes, the amount of time children spend moving among temporary placements also has an impact on their later lives. Since 1992, Michigan has incentivised agencies to find children good homes more quickly. Once a child is removed from her original home she is placed with the state or private agency with the best available foster care home at that time. This agency then has a six-month window to secure an adoption (proposed adoptive parents have to be found within three months to ensure a stable, well-planned transition). If after six months the child has not been placed, her details are made available to all licensed agencies which can then compete for placement.150 Agencies are paid primarily according to the speed with which adoptions are finalised. Since providers are paid less for children not placed after one year than for newly-available children, concerns that providers will not be as strongly motivated to place such children may be legitimate. Available data neither confirms nor denies this: the total number of adoptions between 1991 and 1999 increased by 83%, but the number of children available for adoption increased 116% over this period, suggesting that providers did not manage to place even all the newly-available children; however, a breakdown of the proportion placed by length of time in foster care, which would confirm the hypothesis, is not available.151

Finally, extraneous variables appear to be a significant issue in payment-by-outcome in foster care and adoption. Agencies rewarded for placing children in safe, permanent homes cannot control the number of children who require foster care or adoption (which depend on, among other variables, the success of preventive services and court decisions), and yet this is one of the main variables that will impact on their results. Because of this, contracts that shift small amounts of risk to providers may be more suitable in this sector than high-stakes contracts. For example, in Cook County, Illinois, agencies were paid a monthly fee per child and were expected to place 24% of children into permanent homes each year. Failing to do so would result in the state considering not referring any additional children to the agency in the future. In other parts of Illinois, providers received US$2000 for each child placed after the 24% rate had been achieved.152 This relatively soft form of incentivisation nevertheless achieved dramatic results. Adoptions rose by 94% from 1997 to 2003 and the permanency rate rose from 2-4% to 12-23%, while costs fell.153

Foreign Aid

Significant change has taken place over the past three decades in the conditionality of international aid, with a shift from payment for promises, with grants and loans linked to commitments to policy change, to payment for processes, where recipient governments demonstrate their commitment by making institutional change aimed (say) at reducing corruption and increasing local involvement. Among other things, process conditionality has been criticised for being intrusive and undermining local accountability. The difficulties with conditionality serve to highlight the fundamental problem that outsiders cannot be effective and can actually do harm if they try to design the ‘software’ of an economy.154

Starting around the year 2000, the international community began to focus on performance-based funding, with the Paris Declaration in 2005 committing donors to increasing the ownership of initiatives by recipient countries, improving accountability and focusing on results and outcomes as measures of performance.155

A number of foreign aid schemes in recent years have adopted payment-by-results. The Global Alliance for Vaccines and Immunization has made new grants conditional on countries’ past performance, with payments linked to the number of children vaccinated. In a programme sponsored by the UK Department for International Development, the World Bank has introduced ‘output-based aid’ in some of its water projects, with concessionaires paid for the number of water connections.
Conditional cash transfers make payments dependent on recipients’ performance of some key activity, such as ensuring that children attend school or visit health clinics on a regular basis.156 More recently, Nancy Birdsall, President of the Center for Global Development in Washington D.C., has proposed a comprehensive system of cash-on-delivery incentives in foreign aid, with particular application to completion rates in primary schooling: ‘The core of COD Aid is a contract for funders and recipients to agree on a mutually desired outcome and a fixed payment for each unit of confirmed progress.’157

Streetscene Management

An integrated approach to improving the quality of public spaces has been a notable trend in local government commissioning. Local councils have sought to bring together services such as waste management, recycling, street-cleaning and parks maintenance which enhance their residents’ shared experience of local areas. Councils have also begun to consult more actively with their local residents in order to understand preferences and local priorities. Some councils have gone further and negotiated agreements where a portion of the contractor’s remuneration is linked to user satisfaction with the services as well as increased recycling rates, removing graffiti and tackling fly-tipping.

In 2003 Woking Borough Council concluded a ten-year contract with a private provider for street-cleaning and landscaping services. The contract does not stipulate any inputs and the provider is free to adjust schedules and redeploy the workforce in the most effective manner. The provider is paid an annual service fee, which is supplemented by a performance payment of up to 8% of the total annual contract value which represents the provider’s profit margin. The performance fee is calculated using a sliding scale of overall customer satisfaction with the cleanliness of streets and the appearance of parks, flowerbeds and grass verges measured with a quarterly telephone survey of 350 local residents and administered by an independent third party. The scale runs from 60-100% overall satisfaction, so any outcome below 60% elicits no extra payment beyond the service fee, whilst three continuous quarters of sub-threshold performance permit the council to terminate the contract.

These performance incentives have caused the provider to experiment with ways to increase satisfaction with services. For example, cards are delivered to residents to alert them to the fact that their streets have been cleaned and seek feedback. In Charnwood, the same provider of ‘street scene’ services has introduced a ‘Community Champions’ scheme which equips volunteers with digital cameras fitted with GPS devices to take photographs of local ‘grot spots’ blighted by fly-tipping, graffiti and littering and send them to a rapid response team which can very quickly arrive and cleanse the affected area.

In Sandwell, a deduction-based model has been adopted using a blended suite of indicators which includes customer satisfaction targets, recycling rates and environmental cleanliness. Deductions are accrued for every performance failure which are then subtracted from monthly payments to the provider. Failure to rectify performance failures can lead to a multiplier being used to motivate the provider to act rather than absorb the cost. To mitigate the downside risk for the provider, during the first year a quarterly allowance is in place which allows them to accrue a certain number of points without a deduction being made, similar to an insurance excess. However, after the 12-month ‘bedding in’ period, this allowance will no longer be available.

Conclusion

Since it was designed as a toolkit, this report contains a number of detailed insights into how payment-by-outcome can be made to work more effectively. However, three over-arching design principles have emerged as particularly important.
Two of these (the warning that payment-by-outcome has its limitations and must be understood if it is to be used appropriately, and the need to build systems over time using the full range of tools in the toolkit) have already been discussed in some detail in the body of the report, so it is with the third that this report concludes.

Where the linkages between outputs and outcomes are already well understood, and where they are tightly-coupled so that the successful completion of a process almost invariably leads to the desired outcome, then there is little purpose in attempting to transfer outcome risk from commissioner to provider. On the other hand, where there is very little understanding of or agreement about the linkages between effort and outcome, it will be virtually impossible to write an outcomes-based contract that effectively transfers performance risk.

Payment-by-outcome seems to work best in situations where commissioners are confronted by ‘known unknowns’. It is in these situations where commissioners are able to transfer performance risk, and there are social and economic benefits from so doing. These gains may come from the discovery of more effective linkages between outputs and outcomes, so that, for example, commissioners may wish to use payment-by-outcome to encourage providers to explore innovative drug rehabilitation techniques that significantly reduce recidivism rates.

Or these gains may come from the better implementation of what is already known. For example, it may already be widely agreed that a certain pharmaceutical will improve the quality of life of a particular group of patients, but still necessary to seek out and identify that sub-set of the population for whom the drug will make the greatest difference, and to motivate those patients to take their medication as prescribed.
Much of the current interest in payment-by-outcome seems to relate to this second kind of innovation, with particular emphasis on: (i) identification of those beneficiaries for whom particular service models will work best; (ii) creation of effective management processes (for example, through joining up fragmented supply chains) enabling services to be tailored to different classes of beneficiary; and (iii) encouragement of much greater co-production on the part of beneficiaries. It is no coincidence that the sectors policymakers have earmarked for the implementation of payment-by-outcome are those that bear these characteristics.

Acknowledgements

The 2020 Public Services Trust and the authors would like to thank the many people that participated in the preparation of this report, generously giving of their time and offering encouragement, information and insights.

Project sponsors
Local Partnerships, Partnerships UK, Serco Group plc

Commissioner project lead
Lord Geoffrey Filkin

Advisory board members
Lauren M. Cumming, Alastair Dick, Lord Geoffrey Filkin, Gary L. Sturgess

Foster care case study
Peter May, Online Editor, The Serco Institute

Long-term condition management case study
Miles Ayling, Director of Service Design, Commissioning and System Management, Department of Health
Conor Burke, CEO, Redbridge PCT
Tim Ellis, Programme Manager for Whole System Demonstrators, Department of Health
Peter Forrester, Director, Serco Consulting
Pam Garside, Fellow in Health Management, Judge Business School, University of Cambridge
Nick Goodwin, Senior Fellow, King’s Fund
Jeff James, CEO, Wiltshire PCT
John Myatt, Director, Serco Consulting
Andrew Prince, Director, Serco Consulting
Mike Sadler, Chief Operating Officer & Medical Director, Serco Health
Peter Smith, Professor of Health Policy, Imperial College London

Offender management case study
John Biggin, Contract Director, HMP Doncaster
Luke Edwards, Head of Strategy and Change, Ministry of Justice
Elizabeth Fells, Head of Public Services Reform, Confederation of British Industry
Steve Hall, Contract Director, Business Development, Serco
Chris Harrison, Contract Manager, Serco Welfare to Work
Andy Homer, Assistant Director Operations Support, Serco Civil Government
Stephen Hornby, Senior Partnerships Manager, Serco FND Manchester
Wyn Jones, Contract Director, HMP Dovegate
Richard Judge, Finance Director, Serco Welfare to Work
Martin McClellan, Senior Manager, Offender Management, HMP Doncaster
Verena Menne, Researcher, SMF
Phil Oliver, Senior Assistant Director, Security and Operations, HMP and YOI Doncaster
Andrew Templeman, Director, Serco Consulting
Nigel Thacker, Market Development Director, Reliance
Tom Thackray, Policy Adviser, Confederation of British Industry
Trevor Williams, Assistant Director, HMP Dovegate

Streetscene management case study
Robin Davies, Marketing Director, Serco Local Government and Commercial

Welfare to work case study
Mike Hope, Delivery Directorate Senior Analyst, Department for Work and
Pensions
Richard Johnson, Managing Director, Serco Welfare to Work
Khusbu Patel, Bid Manager, Serco Welfare to Work

Report preparation
Heidi Hauf, 2020 Public Services Trust; Jeanette Thompson, The Serco Institute (administrative support)
Peter May, The Serco Institute (proofreading)
SoapBox, www.soapboxcommunications.co.uk (design and printing)

Endnotes

1. Aaron Wildavsky, ‘Rescuing Policy Analysis from PPBS,’ [1969] in Aaron Wildavsky, The Revolt Against the Masses (New Brunswick: Transaction Publishers, 2003): 407.
2. Cited in Burt Perrin, ‘Effective Use and Misuse of Performance Measurement,’ American Journal of Evaluation 3 (1998): 368.
3. See, for example, Allen Schick, ‘Performance Budgeting and Accrual Budgeting: Decision Rules or Analytic Tools?’ OECD Journal on Budgeting 2 (2007): 109-138.
4. Bengt Holmstrom and Paul Milgrom, ‘The Firm as an Incentive System,’ American Economic Review 4 (1994): 972.
5. See Department for Work and Pensions, ‘The Work Programme: Invitation to Tender. Specification and Supporting Information,’ Version 5.0, December 2010.
6. Michael Lipsky, Street-Level Bureaucracy: Dilemmas of the Individual in Public Services (New York: Russell Sage Foundation, 1980).
7. Peter Frumkin, ‘Managing for Outcomes: Milestone Contracting in Oklahoma,’ PricewaterhouseCoopers Endowment for The Business of Government, January 2001; Nancy Birdsall and William D. Savedoff with Ayah Mahgoub and Katherine Vyborny, Cash on Delivery (Washington, D.C.: Center for Global Development, 2010).
8. Steven Kerr, ‘On the folly of rewarding A, while hoping for B,’ Academy of Management Journal 4 (1975): 769-783.
9. James Q. Wilson, Bureaucracy: What Government Agencies Do and Why They Do It (New York: BasicBooks, 1989): 159.
10. Addressed in Smith’s chapters in Peter C. Smith, et al. (eds.), Performance Measurement for Health System Improvement (Cambridge University Press, 2009).
11. James Q. Wilson, op. cit., 129-134.
12. Burt S. Barnow and Jeffrey A. Smith, ‘Performance Management of U.S. Job Training Programs,’ in Christopher J. O’Leary, Robert A. Straits and Stephen A. Wandner (eds.), Job Training Policy in the United States (Kalamazoo: Upjohn Institute, 2004): 22-23.
13. Charles Perrow, ‘The Analysis of Goals in Complex Organizations,’ American Sociological Review 6 (1961): 854.
14. Ibid., 855.
15. Jeffrey L. Pressman and Aaron Wildavsky, Implementation, 3rd Edition (Berkeley: University of California Press, 1984): 133.
16. Ibid., 171.
17. James Q. Wilson, op. cit., 133.
18. See Jeffrey L. Pressman and Aaron Wildavsky, op. cit.
19. Lauren M. Cumming et al., Better Outcomes (London: 2020 Public Services Trust, 2009): 39.
20. See Department for Work and Pensions (2010), ‘The Work Programme: Invitation to Tender,’ op. cit.
21. Peter Smith, ‘On the Unintended Consequences of Publishing Performance Data in the Public Sector,’ International Journal of Public Administration 2 & 3 (1995): 279-280.
22. Burt S. Barnow and Jeffrey A. Smith, op. cit., 23.
23. Charles Brown, ‘Firms’ Choice of Method of Pay,’ Industrial and Labor Relations Review 3 (1990): 165-182, quoted in Canice Prendergast, ‘The Provision of Incentives in Firms,’ Journal of Economic Literature 1 (1999): 21.
24. Bengt Holmstrom and Paul Milgrom, ‘Multitask Principal-Agent Analyses: Incentive Contracts, Asset Ownership, and Job Design,’ Journal of Law, Economics & Organization Special Issue (1991): 25.
25. Ibid.
26. James Q. Wilson, op. cit., 161.
27. Allen Schick, The Spirit of Reform: Managing the New Zealand State Sector in a Time of Change (Wellington: State Services Commission, 1996): 61. Also Allen Schick (2007), op. cit., 126.
28. Joan Petersilia, quoted in Greg Berman and Aubrey Fox, Trial and Error in Criminal Justice Reform: Learning from Failure (Washington: The Urban Institute Press, 2010): 7.
29. Burt S. Barnow and Jeffrey A. Smith, op. cit., 31.
30. Ibid., 32.
31. Carol Propper and Deborah Wilson, ‘The Use and Usefulness of Performance Measures in the Public Sector,’ Oxford Review of Economic Policy 2 (2003): 259.
32. Allen Schick (2007), op. cit., 111.
33. This was Professor Marilyn Strathern’s reformulation of Goodhart’s Law.
34. Pascal Courty and Gerald Marschke, ‘Dynamics of Performance-Management Systems,’ Oxford Review of Economic Policy 2 (2003): 277.
35. Joseph J. Pedulla et al., ‘Perceived Effects of State-Mandated Testing Programs on Teaching and Learning: Findings of a National Survey of Teachers,’ National Board on Educational Testing and Public Policy, Lynch School of Education, Boston College, (2003): 5.
36. Ibid., 9.
37. Harry J. Paarsch and Bruce Shearer, ‘Piece Rates, Fixed Wages and Incentive Effects: Statistical Evidence from Payroll Records,’ CIRANO Scientific Series, 96s-31, (Montreal: 1996).
38. Bengt Holmstrom and Paul Milgrom (1991), op. cit., 28.
39. Ibid., 50.
40. James J. Heckman, Jeffrey A. Smith and Christopher Taber, ‘What Do Bureaucrats Do? The Effects of Performance Standards and Bureaucratic Preferences on Acceptance into the JTPA Program,’ Working Paper No.5535 (Cambridge, MA: National Bureau of Economic Research, 1996): 4.
41. James C. Robinson, ‘Theory and Practice in the Design of Physician Payment Incentives,’ The Milbank Quarterly 2 (2001): 149.
42. See Carolyn J. Heinrich, ‘Outcomes-Based Performance Management in the Public Sector: Implications for Government Accountability and Effectiveness,’ Public Administration Review 6 (2002): 714.
43. Burt S. Barnow and Jeffrey A. Smith, op. cit., 29.
44. House of Commons Public Administration Select Committee, On Target? Government By Measurement: Fifth Report of Session 2002-2003 Volume 1 (London: The Stationery Office Limited, 2003): 19-20.
45. Ministry of Justice, Breaking the Cycle: Effective Punishment, Rehabilitation and Sentencing of Offenders (London: Ministry of Justice, 2010): 41.
46. Department for Work and Pensions, The Work Programme Prospectus – November 2010, accessed online at <http://www.dwp.gov.uk/docs/work-prog-prospectus-v2.pdf>: 14.
47. Sandra Vegeris et al., Jobseekers Regime and Flexible New Deal Evaluation: A report on qualitative research findings, (London: Department for Work and Pensions, 2010): 59-60.
48. Michael Lipsky, op. cit.
49. Department for Work and Pensions (2010), ‘The Work Programme: Invitation to Tender,’ op. cit., 14.
50. Peter Saunders, ‘The experience of contracting out employment services in Australia,’ Paying for Success: How to make contracting out work in employment services (London: Policy Exchange, 2008): 20.
51. See Victor P. Goldberg, ‘Regulation and Administered Contracts,’ The Bell Journal of Economics 2 (1976): 426-448.
52. Peter M. Blau, The Dynamics of Bureaucracy: A Study of Interpersonal Relations in Two Government Agencies (Chicago: The University of Chicago Press, 1955): 37-41.
53. See Morten Bennedsen and Christian Schultz, ‘Adaptive contracting: the trial and error approach to outsourcing,’ Economic Theory 1 (2005): 35-50.
54. Peter M. Blau, op. cit., 38-39.
55. Pascal Courty and Gerald Marschke, ‘Making Government Accountable: Lessons from a Federal Job Training Program,’ Public Administration Review 5 (2007): 906, 913.
56. Burt S. Barnow and Jeffrey A. Smith, op. cit., 26.
57. Pascal Courty and Gerald Marschke, ‘Performance Funding in Federal Agencies: A Case Study of a Federal Job Training Program,’ Public Budgeting & Finance 3 (2003): 39-40.
58. Australian National Audit Office, Administration of Job Network Outcome Payments (Department of Education, Employment and Workplace Relations, 2009): 46.
59.
Helen Morrell and Natalie Branosky (eds.), The use of contestability and flexibility in the delivery of welfare services in Australia and the Netherlands (Norwich: Department of Work and Pensions, 2005): 25-26. 60. Department for Work and Pensions, The Work Programme Prospectus, op. cit., 2, 4. 61. Daniel Perkins, Personal Support Programme evaluation: Interim report (Fitzroy: Brotherhood of St Laurence, 2005): 44. 62. Charles Michalopoulos and Christine Schwartz with Diana Adams-Ciardullo, NEWWS: What Works Best for Whom: Impacts of 20 Welfare to work Programs by Subgroup (U.S. Departments of Health and Human Services and of Education, 2001): ES-4. 63. Department of Education, Employment and Workplace Relations, Review of the Job Seeker Classification Instrument (2009): 9. 64. Please note these numbers are used for illustrative purposes only. 65. Jane Mansour and Richard Johnson, Buying quality performance: Procuring effective employment services (London: WorkDirections, 2006): 13. 66. Helen Morrell and Natalie Branosky, op. cit., 26, 39-40. 67. Most commissioners do specify a certain number of hours per week that clients must be in work in order for providers to be able to claim the outcome payment. 68. Australian National Audit Ofice, op. cit., 47-48. 69. Department for Work and Pensions, The Work Programme Prospectus, op. cit., 14. 70. Oliver Bruttel, ‘Are Employment Zones Successful? Evidence From the First Four Years,’ Local Economy 4 (2005): 392. 71. Maria Hudson, Joan Phillips, Kathryn Ray, Sandra Vegeris and Rosemary Davidson, The influence of outcomebased contracting on Provider-led Pathways to Work (Norwich: Department of Work and Pensions, 2010): 24. 107 Payment by Outcome 72. Please note these numbers are used for illustrative purposes only. 73. 
Mark Considine, ‘The Reform that Never Ends: Quasi-Markets and Employment Services in Australia,’ in Els Sol and Mies Westerveld (eds.), Contractualism in Employment Services: A New Form of Welfare State Governance (The Hague: Kluwer Law International, 2005): 46. 74. Peter Saunders, op. cit., 16 75. Ibid, 18. 76. Mark Considine, op. cit., 51. 77. Peter Saunders, op. cit., 18. 78. Productivity Commission, Independent Review of the Job Network (Melbourne: Productivity Commission, 2002): 10.4. 79. Ibid. 80. Quoted in ibid. 81. Ibid. 82. Peter Saunders, op. cit., 21. 83. Ministry of Justice, Re-offending of adults: Results from the 2008 cohort (London: Ministry of Justice, 2010): 35. 84. Social Exclusion Unit, Reducing re-offending by ex-prisoners (London: Social Exclusion Unit, 2002): 5. 85. Ministry of Justice, ‘National Offender Management Service,’ accessed online at <http://www.justice.gov.uk/ about/noms.htm>. 86. Mike Maguire and Peter Raynor, ‘How the resettlement of prisoners promotes desistance from crime: Or does it?’ Criminology and Criminal Justice 6 (2006): 22. 87. Ministry of Justice (2010), Breaking the Cycle, op. cit., 10-11. 88. Deloitte, Path2Work Evaluation- Evaluation of the Path2Work employment pathfinder (Deloitte 2008); PS Plus, ‘Statistics,’ accessed online at <http://www.psplus.org/Statistics.html>; Jobtrack: An evaluation of NIACRO’s Jobtrack programme 2004-2006 (Belfast: NIACRO, 1996): 42. 89. A reconviction event occurs when an offender is reconvicted for any number of offences at a single court appearance in the 12 months following release. As the measure counts each court appearance it is tightly linked to cashable savings. 90. Owen Bowcott, ‘Rich to invest in scheme to cut prisoner re-offending rates,‘ The Guardian 10 September 2010, accessed online at <http://www.guardian.co.uk/society/2010/sep/10/rich-invest-scheme-cut-prisonre-offending?utm_source=twitterfeed&utm_medium=twitter>. 91. 
Tom Whitehead, ‘World irst in rehabilitation scheme,’ Telegraph 10 September 2010, accessed online at <http://www.telegraph.co.uk/news/uknews/law-and-order/7991948/World-irst-in-rehabilitation-scheme. html>. 92. Ian Mulheirn, Barney Gough and Verena Menne, Prison Break: Tackling recidivism, reducing costs (London: Social Market Foundation, 2010): 55-56. 93. Ministry of Justice (2010), Breaking the Cycle, op. cit., 42. 94. Ibid., 43. 95. Quoted in Greg Berman and Aubrey Fox, ‘Embracing Failure: Lessons for Court Manager,’ The Court Manager 4 (2008): 20-26. 96. Mike Maguire and Peter Raynor, op. cit., 27. 97. Ibid., 21-22. 98. Ibid., 27. 99. Ibid., 24. 100. Ibid., 24-25. 101. Ministry of Justice, Green Paper Evidence Report: Breaking the Cycle: Effective Punishment, Rehabilitation and Sentencing of Offenders (London: Ministry of Justice, 2010): 57. 102. Ibid., 59. 103. Mike Maguire and Peter Raynor, op. cit., 27-29. 104. Ministry of Justice, Compendium of re-offending statistics and analysis (London: Ministry of Justice, 2010): 91. 105. The authors have drawn on insights from Nancy Birdsall and William D. Savedoff et. al., op. cit., 38, 46. 106. It is also possible to attribute outcomes using before-and-after measures, which compare offending behaviour before and after a sentence. However, this is considered the least scientiically rigorous method of assessing impact, so should be avoided. 107. Marc LeBlanc and Rolf Loeber, ‘Developmental Criminology Updated’ in Michael Tonry (ed.) Crime and Justice Vol. 23. (Chicago: University of Chicago Press, 1998): 115-198. 108. Ministry of Justice (2010), Breaking the Cycle, op. cit., 45. 109. Ministry of Justice (2010), Re-offending of adults, op. cit., 45. 110. World Health Organisation, Preventing Chronic Diseases: A vital investment (Geneva: WHO, 2005), accessed online at <www.who.int/chp/chronic_disease_report/full_report.pdf>. 111. Anthony Scott, Stefanie Schurer, Paul H. 
Jensen and Peter Sivey, ‘The Effects of Financial Incentives on Quality of Care: The Case of Diabetes,’ HEDG Working Paper 08/15 (2008), accessed online at <http://ideas. repec.org/p/yor/hectdg/08-15.html>. 112. Pauline Allen, ‘’Payment by Results’ in the English NHS: the continuing challenges,’ Public Money and Management 3 (2009): 161. 113. Expert Patients Programme Community Interest Company, ‘What we do,’ About us, accessed online at <http://www.expertpatients.co.uk/about-us/what-we-do>. 2020 Public Services Trust 108 114. Rebecca Rosen, Perviz Asaria and Anna Dixon, Improving Chronic Disease Management: An AngloAmerican exchange (London: King’s Fund, 2007): 9. 115. James C. Robinson, ‘Theory and Practice in the Design of Physician Payment Incentives,’ The Milbank Quarterly 2 (2001): 157. 116. Robert D. Behn and Peter A. Kant, ‘Avoiding the Pitfalls of Performance Contracting,’ Public Productivity & Management Review 4 (1999): 480. 117. Alternatively, patients or the general population can be asked to rate their overall health state on a Visual Analogue Scale (VAS), with the end points labelled best imaginable health state and worst imaginable health state. VAS scores are of most value when looking at change within individuals, and are of less value for comparing across a group of individuals at one time point. For more details see: Dr. Mary Ellen Wewers and Nancy K. Lowe, ‘A critical review of visual analogue scales in the measurement of clinical phenomena,’ Research in Nursing and Health 4 (1990): 227-236. 118. Eric R. Wagner, ‘Types of Managed Care Organizations,’ in Peter R. Kongsvedt (ed.), The Managed Health care Handbook: Fourth Edition (Gaithersburg: Aspen Publishers, Inc., 2001): 28, 30. 119. Individuals who are insured through their employers are usually not able to switch, and those with preexisting conditions may not be accepted onto new health plans. 120. 
Natasha Curry and Chris Ham, Clinical and service integration: The route to improved outcomes (London: King’s Fund, 2010): 9. 121. Jennifer Dixon et al., Managing Chronic Disease: What Can We Learn from the US Experience? (London: King’s Fund, 2004): 22. 122. Ibid., 22. 123. Arguably, US healthcare spending is growing as a percentage of GDP, but this trend does not imply an inability of HMOs to contain costs. The major drivers of higher spending are the decline of managed care and the growth of consumer-driven health plans. For more information see Ronald Lagoe, Deborah L. Aspling and Gert P. Westert, ‘Current and future developments in managed care in the United States and implications for Europe,’ Health Research Policy and Systems 3 (2005): 5. 124. Chris Ham, Nick York, Steve Sutch and Rob Shaw, ‘Hospital bed utilisation in the NHS, Kaiser Permanente, and the US Medicare programme: analysis of routine data,’ British Medical Journal 7426 (2003): 1257. 125. The Medical Act of 1983 gives the GMC the authority to foster good medical practice and deal irmly and fairly with doctors whose itness to practice is in doubt. 126. King’s Fund, GP commissioning: what can we learn from previous commissioning models, 1 October 2010, accessed online at <http://www.kingsfund.org.uk/current_projects/the_nhs_white_paper/gp_commissioning. html>. 127. Claus Moldrup, ‘No Cure, No Pay,’ British Medical Journal 7502 (2005): 1262-1264. 128. Bob Carlson, ‘Satisfaction Guaranteed: Or Your Money Back,’ Biotechnology Healthcare Journal October/ November (2009): 14-22. 129. Mike Boggild et al., ‘Multiple Sclerosis risk sharing scheme: two year results of clinical cohorts study with historical comparator,’ British Medical Journal 4677 (2009): 1-9. 130. Anne Keogh et al., ‘The Bosentan Patient Registry: Long-Term Survival in Pulmonary Arterial Hypertension,’ Internal Medicine Journal, (Accepted Article – 20 August 2009). 131. 
‘More Velcade-Style Risk Sharing in the UK?’ EuroPharma Today, 21 January 2009, accessed online at <http://www.europharmatoday.com/2009/01/more-velcadestyle-risksharing-in-the-uk.html>. 132. National Institute for Health and Clinical Excellence, ‘Ranibizumab and pegaptanib for the treatment of agerelated macular degeneration,’ NICE Technology Appraisal 155 (2008), accessed online at <http://www.nice. org.uk/nicemedia/pdf/TA155guidance.pdf>. 133. Fred Pane, ‘Get ready for changes in drug contracting from P4P,’ Drug Topics, 21 May 2007, accessed online at <http://drugtopics.modernmedicine.com/drugtopics/HospitalHealthSystemPharmacy/ ArticleStandard/article/detail/426504>. 134. George C. Ebers, ‘Commentary: Outcome measures were lawed,’ British Medical Journal 2693 (2010). 135. Andrew Pollack, ‘Pricing Pills by the Results,’ New York Times, 14 July 2007, accessed online at <http:// www.nytimes.com/2007/07/14/business/14drugprice.html>. 136. Jim Chilcott et al, ‘Modelling the cost effectiveness of interferon beta and glatiramer acetate in the management of multiple sclerosis’, British Medical Journal, 2003, 326, 522. 137. National Institute for Health and Clinical Excellence, ‘Bortezomib monotherapy for relapsed multiple myeloma,’ NICE technology appraisal guidance 129, (2007): 17-21, accessed online at <http://www.nice. org.uk/nicemedia/pdf/TA129Guidance.pdf>. 138. Steve Williamson, A Report into the Uptake of Patient Access Schemes in the NHS, Cancer Network Pharmacist Forum, November 2009, accessed online at <http://www.nice.org.uk/nicemedia/pdf/ TA129Guidance.pdf>. (In NICE’s guidance, see n.137 above, the committee noted that the Department of Health ‘considered that the scheme would not impose a disproportionate organisational burden on NHS organisations in England.’) 139. 
‘Health Alliance Announces Promising Nine-Month Results from First Ever Outcome- Based Reimbursement Program fro Actonel Tablets,’ PR Newswire, 29 October 2010, accessed online at <http://www.prnewswire. com/news-releases/health-alliance-announces-promising-nine-month-results-from-irst-ever-outcome-basedreimbursement-program-for-actonelr-risedronate-sodium-tablets-67198367.html>. 140. ‘Genomic Health Announces National Payor Agreement with United Healthcare Company,’ Genomic Health, 10 January 2007, accessed online at <http://investor.genomichealth.com/ReleaseDetail. cfm?ReleaseID=225085>; and Andrew Pollack, op. cit. 109 Payment by Outcome 141. Erwin A. Blackstone et al., ‘Privatizing adoption and foster care: Applying auction and market solutions,’ Child and Youth Services Review 11 (2004): 1034. 142. Children’s Bureau, ‘Section III – Narrative Assessment of Child and Family Outcomes,’ accessed online at <http://www.acf.hhs.gov/programs/cb/cwmonitoring/tools_guide/statewidethree.htm#Toc140565118>. 143. United States General Accounting Ofice, Child and Family Services Reviews: Better Use of Data and Improved Guidance Could Enhance HSS’s Oversight of State Performance (Washington, D.C.: United States General Accounting Ofice, 2004): 7. 144. Children’s Bureau, ‘Child Welfare Final Rule Executive Summary,’ accessed online at <http://www.acf.hhs. gov/programs/cb/cwmonitoring/legislation/exsum.htm>. 145. Albert R. Roberts and Kenneth R. Yeager, Evidence-Based Practice Manual: Research and Outcome Measures in Health and Human Services (Oxford: Oxford University Press, 2004): 426. 146. Roberts and Yeager, Evidence-Based Practice Manual (2004): 426. 147. Blackstone et al., op. cit., 1038. 148. Ibid. 149. Ibid. 150. Ibid., 1036. 151. Ibid., 1037. 152. Ibid., 1039. 153. Ibid., 1039-40. 154. Owen Barder and Nancy Birdsall, ‘Payments for Progress: A Hands-Off Approach to Foreign Aid,’ Working Paper No.102 (Washington, DC: Center for Global Development, 2006): 6. 155. 
Nancy Birdsall and William D. Savedoff et. al., op. cit., 8. 156. Ibid., 36-37. 157. Ibid., 17. aspirations 2020 Public Services Trust gaming efficiency personalisation at the 2020 Public Services Trust, RSA, 8 John Adam Street, London, WC2N 6EZ telephone: 020 7451 6962 | charity no: 1124095 | www.2020pst.org baseline theory 2020 Public Services Trust results leadership responsibility Supported by Local Partnerships, Partnerships UK and Serco: motivation service measure