
Complexity Science Challenges in Cybersecurity

2000

Computers and the Internet are indispensable to our modern society, but by the standards of critical infrastructure, they are notably unreliable. Existing analysis and design approaches have failed to curb the frequency and scope of malicious cyber exploits. A new approach based on complexity science holds promise for addressing the underlying causes of the cybersecurity problem. The application of complexity science to cybersecurity presents key research challenges in the areas of network dynamics, fault tolerance, and large-scale modeling and simulation. We believe that the cybersecurity problem is urgent enough, the limits of traditional reductive analysis are clear enough, and the possible benefits of reducing cyber exploits are great enough, that the further development of cybersecurity-targeted complexity-science tools is a major research need.

SANDIA REPORT SAND2009-2007
Unlimited Release
Printed March 2009

Robert C. Armstrong
Jackson R. Mayo
Scalable Computing R&D; Visualization & Scientific Computing
Sandia National Laboratories, P.O. Box 969, Livermore, CA 94551

Frank Siebenlist
Mathematics & Computer Science
Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439

Prepared by Sandia National Laboratories, Albuquerque, New Mexico 87185 and Livermore, California 94550. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under Contract DE-AC04-94AL85000. Approved for public release; further dissemination unlimited.

1 Introduction

Computers and networks of computers are indispensable to our everyday work because they are so powerful as information processing and communication tools.
Partly as a result of this immense capability to do the things we want, computers will also do many things that we do not want. In fact, computers are increasingly doing the bidding of attackers, to the detriment of their owners. Complexity is not an incidental feature of cyber systems but an inescapable consequence of the complex things they are required to do. Complexity allows serious vulnerabilities to exist in apparently benign code, and allows those who would exploit them to attack invisibly. To understand the cybersecurity problem, we must understand cyber complexity and model it with predictive simulation. To solve the cybersecurity problem, we must find ways to live with the inherent unpredictability of modern cyber systems.

In the face of the essential unpredictability of software and the ability of attackers to remain unseen, there are three major approaches to mitigation, all of which will benefit from a more scientific approach:

1. Model checking, formal methods [12], and software analysis [22] detect errors and, in the case of very simple systems, rigorously verify behavior as long as the foundational assumptions are correct. Most realistic cyber systems are too complex for rigorous verification, but can benefit from non-exhaustive analysis that finds some of the more straightforward vulnerabilities. Applications are common in supervisory control and data acquisition (SCADA) and medical devices [14], where the systems are less complex and the consequences of faults more severe.

2. Encapsulation, sandboxing [5], and virtual machines [25] provide a way to “surround” otherwise unpredictable software, hardware, and networks with software or hardware that is more trusted. A common but often ineffective example is a network firewall. More effective examples are being developed [17], and this technology holds particular promise for environments unique to DOE (Section 4).

3. Complexity science [27], drawing on biological [11, 15] and other analogues [8], is the least exploited but possibly the most promising approach. Biological metaphors are already part of the cyber lexicon: virus, worm, etc. Models of complex cyber systems and their emergent behavior are needed to understand the cybersecurity problem.

In this paper we concentrate mainly on item 3, complexity science, and secondarily on item 2, the encapsulation of complexity. Within the complexity arena, we recommend two basic research thrusts:

1. Confronting unpredictability in programs, machines, and networks, and its impact on the cybersecurity problem. Whether it is considered a theoretical certainty or a pragmatic heuristic, it is impossible to “formally” ensure [16] that a realistic program or operating system is vulnerability-free. Research is needed to understand how this complexity asymmetrically favors the attacker in current systems, and how it can instead be leveraged against the attacker to the defender’s advantage. Theories and algorithms are needed that use complexity to increase the effort required of attackers and reduce their likelihood of success. Here existing work from the fields of fault tolerance and high-reliability systems can be helpful.

2. Modeling the emergent behavior of programs, machines, and networks to understand what they are doing and predict what they will do, to the extent permitted by the underlying complexity. This type of modeling is largely unfamiliar to scientific computing but enjoys wide acceptance in systems analysis (e.g., war gaming, social simulation, business operations). Agent-based and discrete-event simulation methods are commonly used in this area and have already been applied to network and computer system modeling on a small scale.
   Large-scale modeling and simulation at Internet proportions are needed to understand how the emergent behavior of large numbers of computers, routers, and network links can either mitigate or exacerbate the cybersecurity problem. Understanding the emergent behavior of cyber systems can also enable a broader approach to reliability and fault tolerance.

2 Confronting unpredictability in programs, machines, and networks

The complexity of computers and software is an artifact of the complex things we require them to do. Their capacity for computation is inextricably connected to the fact that they are also unpredictable, or rather, capable of unforeseen emergent behavior. Vulnerabilities¹ are one such behavior. A complex system’s emergent behavior cannot be predicted from even a perfect knowledge of the constituent parts from which it is composed, a consequence of undecidability [4] and Turing completeness. This means that a sufficiently complex system has emergent behavior that cannot be predicted without “running” it or, alternatively, simulating it with sufficient fidelity (and complexity) to see the emergent behavior [27] (Section 3). Concretely, this is why cyber systems composed of elements like programs, processors, and routers, each of which we presumably understand, nonetheless constantly surprise us with the consequences of previously unknown vulnerabilities.

¹ By “vulnerability” we mean some flaw in the system that can be exploited by a malicious actor to cause an effect not desired by the operator of the system. Note that the malicious actor could actually be a programmer or designer of the system with the intent of subverting a user.

2.1 Origins of the cybersecurity problem in complexity

Although most realistic applications are too complex, there are programs and systems simple enough to be analyzed by formal methods and other reductive tools. In those cases, boundedness properties of the code can be asserted and proved, making the behavior well understood under a wide variety of circumstances. Formal verification [12] is accomplished by automatically following all salient execution paths to understand their consequences. However, the vast majority of codes are probably too complex for this analysis: the number of paths soon grows beyond the capacity of even the largest machines (a sequence of b independent two-way branches alone yields 2^b paths). In this section we consider systems that are undecidable and whose vulnerabilities can be discovered only anecdotally, not exhaustively. Because the programs containing these vulnerabilities cannot be fully analyzed, no means can guarantee that the vulnerabilities will be found. Because the defender cannot be certain to find all of the vulnerabilities in a system, and because only one vulnerability is needed for it to be compromised, the cards are asymmetrically stacked in the attacker’s favor.

2.2 Fault tolerance and high-reliability systems

A vital emergent behavior of many real-world complex systems, particularly biological ones, is robustness to disturbances. The concept of highly optimized tolerance [11] describes highly engineered systems that, after many design cycles promoting robustness, take on an organic structure, in a process similar to biological evolution. Fault-tolerant systems, usually confined to medical devices [14] and aerospace avionics [26], use diverse redundant systems and “vote” on the answers to common input, detecting faults² as outliers. An example is the set of identical computers on board the Space Shuttle [26]. If we recognize that a vulnerability is just a fault that is exploited, it is clear that cybersecurity too has much to gain from employing diversity [24]. Identical replicas would share the same vulnerabilities, so to achieve robustness to exploits through redundancy, the replicated components can employ diverse implementations of the same functionality.
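The voting scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a design from this report: the three port-number parsers (`port_a`, `port_b`, `port_c`) and the `vote` helper are hypothetical names, and `port_c` carries a deliberately planted flaw (a missing upper-bound check). The variants run in-process only for brevity; a real N-version system would isolate them on separate processes, machines, or hardware. An output on which a strict majority agrees is accepted, and any variant that disagrees, including one that crashes, is flagged as a suspected fault.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

Variant = Callable[[str], Hashable]


def port_a(text: str) -> int:
    """Hypothetical variant A: relies on int() and then range-checks."""
    value = int(text)
    if not 0 <= value <= 65535:
        raise ValueError("port out of range")
    return value


def port_b(text: str) -> int:
    """Hypothetical variant B: an independently written digit parser."""
    if not text or not (text.isascii() and text.isdigit()):
        raise ValueError("not a decimal number")
    value = 0
    for ch in text:
        value = value * 10 + (ord(ch) - ord("0"))
    if value > 65535:
        raise ValueError("port out of range")
    return value


def port_c(text: str) -> int:
    """Hypothetical variant C: deliberately flawed, it forgets the upper bound."""
    value = int(text)
    if value < 0:
        raise ValueError("negative port")
    return value


def vote(variants: Sequence[Variant], request: str) -> tuple[Hashable, list[int]]:
    """Run every variant on the same input; return (majority output, dissenter indices)."""
    outputs: list[Hashable] = []
    for variant in variants:
        try:
            outputs.append(variant(request))
        except Exception as exc:  # a crash is treated as one more (deviating) output
            outputs.append(("error", type(exc).__name__))
    winner, count = Counter(outputs).most_common(1)[0]
    if count <= len(variants) // 2:
        raise RuntimeError("no strict majority; cannot decide")
    dissenters = [i for i, out in enumerate(outputs) if out != winner]
    return winner, dissenters


VARIANTS = [port_a, port_b, port_c]

if __name__ == "__main__":
    for request in ["8080", "70000", "-1"]:
        print(request, vote(VARIANTS, request))
```

For the input "70000", variants A and B both reject the request with a ValueError, so that is the majority output, and variant C, which returned 70000, is flagged as the dissenter. The sketch also exposes the central caveat: if two of the three variants shared the same flaw, the vote would accept the faulty output, which is why the degree of independence among variants matters.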
With diverse implementations, exploits that trigger deviations from the intended behavior are unlikely to affect more than a small fraction of components at a time, and can be detected by voting on the outputs [10, 18, 21]. Diversity need not involve complete replication of cyber systems but may be applied creatively, possibly in analogy to the way RAID encoding [2] allows any one element of a disk array to fail while still maintaining the integrity of the data. If a program’s vulnerabilities are unknowable, then some form of diversity is likely necessary to detect, deter, and otherwise leverage complexity against an attacker. Research is needed to formulate measures of diversity and to model diversity in cyber systems to assess its effectiveness. Key challenges include

• Developing quantifiable models of fault-tolerant and vulnerability-tolerant architectures
  – Quantifying the benefits of implementation diversity
  – Understanding the scaling of attacker and defender effort in fault-tolerant systems
• Developing theories and metrics for diversity among variant implementations of the same software, particularly with respect to vulnerabilities (there is the potential to apply this to hardware and networks in modified form)
• Using agent-based models to engineer or discover emergent robustness as a topological property of the graph or network of large collections of cyber subsystems
• Using the above to suggest programming models and network/computer architectures that are inherently more secure

² By fault we mean any anomalous output of the system. All vulnerabilities are faults, but not all faults are vulnerabilities.

3 Modeling the behavior of programs, machines, and networks

Complex systems pose challenges for a reductionist approach to modeling, as a system’s global behavior frequently cannot be readily deduced from even a detailed understanding of its constituent parts.
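A standard toy illustration of this point is an elementary cellular automaton, one of the idealized complexity models listed among the challenges in Section 3.1. The minimal sketch below implements Rule 30, which Wolfram [27] discusses at length; the function names are our own, and the wrap-around boundary is an assumption made for brevity. Each cell’s update is completely specified by an eight-entry lookup table encoded in the bits of the rule number, yet the pattern grown from a single live cell is irregular, and Wolfram argues that such systems can in general be predicted only by running them.

```python
def eca_step(cells: list[int], rule: int) -> list[int]:
    """One synchronous update of an elementary cellular automaton (periodic boundary)."""
    n = len(cells)
    new = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right  # integer 0..7
        new.append((rule >> neighborhood) & 1)  # that bit of the rule number is the next state
    return new


def run(width: int, steps: int, rule: int = 30) -> list[list[int]]:
    """Evolve from a single live cell in the middle; return every row, initial row first."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = eca_step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    for row in run(width=63, steps=31):
        print("".join("#" if c else "." for c in row))
```

Setting `rule` to other values from 0 to 255 yields the other elementary automata; many of them produce simple, repetitive patterns from the same starting row, even though every rule table is equally small.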
An additional confounding feature of complex systems is their adaptive nature: such systems can evolve, often in a coordinated manner, to accomplish system-level imperatives. Such behaviors are said to be “emergent” (they emerge as a consequence of the interactions of the constituents), and reproducing them with appropriate constituent models has often been a matter of trial and error.

3.1 Complex systems modeling

The connections among computers that constitute the Internet, the interactions among virtual components in software systems, and the wiring of logic circuits in microchips can all be represented at an abstract level by graphs and networks. Large numbers of entities with discrete interaction patterns, resulting in unpredictable emergent behavior, form what are known as entity-based complex systems, which can be studied with agent-based or discrete-event algorithms [9, 13]. The emergent behaviors of such systems can, however, be understood and predicted in certain global aspects. Specifically, if we understand how these networks and graphs shape the emergent behavior of the system, they can potentially be crafted to produce desired emergent behavior such as robustness to attack. Because cybersecurity issues are rooted in complexity, modeling that complexity with fidelity is paramount. Key challenges include

• Understanding the global network dynamics of prototypical complex systems
  – Investigating the emergent behavior of idealized complexity models, such as cellular automata [8, 27] and Boolean networks [15]
  – Extending the results to classify the emergent behavior of more general agent-based models
• Relating realistic software architectures to the results of complexity science, in particular robustness to attack, resilience after attack, etc.
• Designing more general and efficient fault-tolerant architectures using distributed redundancy with emergent robustness, rather than more resource-intensive replication with a single voter
• Predicting the dynamics and spread of malicious software on physical computer networks
• Developing agent-based monitoring and cyber forensic capabilities to detect intrusions based on anomalous dynamics

3.2 Large-scale modeling and simulation

The application of complexity science to real-world cyber systems requires the ability to model and simulate these systems in a rigorous and validated manner. Just as in traditional science and engineering, coarse-grained models that describe a tractable subset of degrees of freedom are crucial for predictive understanding. But due to the undecidability of complex systems, such reduced models can in general describe only overall features, rather than details, of emergent behavior. The validation of such models thus requires particular care.

An important avenue for controlled cyber experiments, enabling exploration of emergent behavior and validation of models, is provided by large-scale emulation, an especially realistic form of simulation. Emulation allows a physical computer cluster, through the creation of numerous virtual machines and virtual network connections, to trace efficiently and with high realism the behavior of a much larger network of computers.

In addition, we need virtual machine managers (VMMs) that allow us to monitor, ad hoc and in real time, the applications’ detailed resource access inside the virtual machines, such as network connections, message communication, and CPU/disk/network usage [6]. The advantage of this approach is that the monitoring can be applied transparently to the applications. The monitored data will become part of the emulation or simulation result set and may be fed back into our emulation models.
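To make the coarse-graining idea of this subsection concrete, the toy sketch below (our illustration, not a model from this report) pairs a fine-grained, agent-based model of malware spreading over a random contact graph with a one-variable mean-field model. The agent-based run stands in for the detailed output that an emulation or monitored system might provide; the coarse model keeps only the infected fraction i and assumes every host has the average degree k and sees a well-mixed population, giving the logistic update i ← i + βk·i(1 − i). The Erdős–Rényi graph, the per-contact infection probability β, and all function names are assumptions chosen for brevity.

```python
import random


def random_graph(n: int, p: float, rng: random.Random) -> list[list[int]]:
    """Erdős–Rényi G(n, p) contact graph as adjacency lists."""
    adj: list[list[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def agent_based_si(adj: list[list[int]], beta: float, steps: int,
                   seeds: list[int], rng: random.Random) -> list[float]:
    """Fine-grained SI model: each step, every infected host infects each
    susceptible neighbor independently with probability beta."""
    n = len(adj)
    infected = set(seeds)
    fractions = [len(infected) / n]
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        infected |= newly  # synchronous update: new infections spread from the next step
        fractions.append(len(infected) / n)
    return fractions


def mean_field_si(i0: float, beta: float, mean_degree: float, steps: int) -> list[float]:
    """Coarse-grained SI model: one state variable (infected fraction) with
    logistic growth i <- i + beta*k*i*(1 - i); stays in [0, 1] when beta*k <= 1."""
    i = i0
    fractions = [i]
    for _ in range(steps):
        i = i + beta * mean_degree * i * (1.0 - i)
        fractions.append(i)
    return fractions


if __name__ == "__main__":
    rng = random.Random(1)
    n, p, beta, steps = 500, 0.02, 0.05, 40
    adj = random_graph(n, p, rng)
    k = sum(len(nbrs) for nbrs in adj) / n  # observed mean degree
    detailed = agent_based_si(adj, beta, steps, seeds=[0], rng=rng)
    coarse = mean_field_si(1 / n, beta, k, steps)
    for t in range(0, steps + 1, 5):
        print(f"t={t:2d}  agent-based={detailed[t]:.3f}  mean-field={coarse[t]:.3f}")
```

The two curves can be expected to share the same overall S shape (slow start, rapid growth, saturation) while differing in detail, in line with the point above that reduced models capture overall features rather than details of emergent behavior. Fitting β or k of the coarse model to the detailed curve would be one simple form of feeding observed data back into a model.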
Through the same VMMs, we can also enforce fine-grained policies on resource access by the applications, so that we can change and fine-tune our model parameter values in real time, again transparently to the applications themselves. Key challenges include

• Developing techniques to derive coarse-grained models of complex systems
  – Using renormalization methods from theoretical physics to construct predictive models that preserve key emergent behaviors
  – Applying coarse-grained models to characterize robustness to attack
• Understanding the relations among different levels of modeling and their ability to predict emergent behavior
  – Quantifying tradeoffs between cost and accuracy of models to achieve effective simulations of complex systems
  – Using emergent behavior classifications to guide the choice of appropriate agent-based models
• Extending network emulation/simulation capabilities to Internet scale
• Combining insights from modeling and simulation to offer potential improvements to existing software and network protocols
• Enhancing virtual machine manager implementations to accommodate the detailed monitoring and fine-grained policy enforcement needed for real-time simulation of our models

4 Role of the Department of Energy

The Department of Energy has particular needs not currently being answered by the commercial or research cybersecurity community, as well as unique capabilities to advance the state of the art in cybersecurity research.

4.1 Need for securing DOE’s unique open resources

High performance computing (HPC) platforms and attendant software are geared for performance, and any added controls, such as cybersecurity measures, that inhibit performance are unlikely to be adopted. In the Office of Science, these platforms and their computational results need to be given the widest access to authorized researchers that can be safely achieved.
Yet these platforms and attendant simulations are as complex as any in the commercial world and likely heir to the same frailties. Add to this the fact that some university researchers will be using these HPC resources from (unknowingly) compromised platforms, and the result is a particularly challenging problem. Because neither the outside (the university researcher’s platform) nor the inside (the DOE HPC simulation) can be trusted, some form of virtual machine with a sandboxed proxy for the simulation must be considered. The proxy may take the form of a remote method invocation system [3] or even a web-based protocol. The virtual machine within which it executes must enforce a security policy that disallows unsafe operations originating from either side [23]. This policy and its enforcement code will benefit from software analysis and formal methods to ensure correctness, so that the virtual machine sandbox itself cannot be compromised. More advanced research that allows more flexible and adaptable allocation of potentially dangerous capabilities should be considered as well. Recent, promising research on creating more secure and safer execution environments for JavaScript and Java, using capability-based techniques and enforced best practices [17, 19, 20], may be applicable to a virtual machine sandbox environment with fine-grained access control policy enforcement on resource usage and message passing. A wide range of ideas should be considered that enable the DOE-unique environment, in which openness is permitted and performance is uninhibited.

4.2 Use of DOE’s unique capabilities for cybersecurity research

As discussed previously, large-scale emulation and simulation hold great promise for understanding cybersecurity (Section 3.2). While this is not a new concept [1], DOE is in a unique position to take advantage of it.
Within high-performance clusters and Leadership Class Computing (LCC), a virtual Internet can be constructed for emulating and simulating routers, web farms, terminal nodes, etc., and the malefactors that prey on them. In both scale and speed, LCC machines present an opportunity not found elsewhere for the emulation and simulation of the Internet at a nation-state scale, enabling computer experiments on phenomena, such as botnets [7], that can be understood only from a global view. Key challenges include

• Researching novel architectures for preserving an open computing and collaboration framework while maintaining a secure environment for DOE-held assets
• Investigating simulation and emulation environments that are enabled by current DOE HPC resources

References

[1] DARPA BAA for Cyber Test Range. http://www.darpa.mil/STO/ia/pdfs/NCR_Qs_and_As.pdf.
[2] RAID. http://en.wikipedia.org/wiki/RAID.
[3] Remote method invocation. http://java.sun.com/javase/technologies/core/basic/rmi.
[4] Rice’s theorem. http://en.wikipedia.org/wiki/Rice’s_theorem.
[5] Sandbox for computer security. http://en.wikipedia.org/wiki/Sandbox_(computer_security).
[6] sHype: Hypervisor security architecture. http://www.research.ibm.com/secure_systems_department/projects/hypervisor/.
[7] Storm botnet. http://en.wikipedia.org/wiki/Storm_botnet.
[8] P. Bak, C. Tang, and K. Wiesenfeld. Self-organized criticality: An explanation of 1/f noise. Physical Review Letters, 59:381–384, 1987.
[9] P. Bratley, B. L. Fox, and L. E. Schrage. A Guide to Simulation. Springer-Verlag, New York, 1986.
[10] S. S. Brilliant, J. C. Knight, and N. G. Leveson. Analysis of faults in an N-version software experiment. IEEE Transactions on Software Engineering, 16:238–247, 1990.
[11] J. M. Carlson and J. Doyle. Highly optimized tolerance: A mechanism for power laws in designed systems. Physical Review E, 60:1412–1427, 1999.
[12] E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, 1999.
[13] W. J. Dally and B. Towles. Principles and Practices of Interconnection Networks, pages 473–509. Elsevier, San Francisco, 2004.
[14] High Confidence Software and Systems Coordinating Group. High-Confidence Medical Devices: Cyber-Physical Systems for 21st Century Health Care, pages 23–24. Networking and Information Technology Research and Development Program (NITRD), 2009. http://nitrd.gov/About/MedDevice-FINAL1-web.pdf.
[15] S. A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, 1993.
[16] T. W. Körner. The Pleasures of Counting, pages 298–318. Cambridge University Press, Cambridge, UK, 1996.
[17] B. Laurie. Access control. http://www.links.org/files/capabilities.pdf.
[18] B. Littlewood, P. Popov, and L. Strigini. Modeling software design diversity. ACM Computing Surveys, 33:177–208, June 2001.
[19] A. Mettler and D. Wagner. The Joe-E language specification (draft). Technical Report UCB/EECS-2006-26, U.C. Berkeley, May 2006. http://www.truststc.org/pubs/246.html.
[20] M. S. Miller, M. Samuel, B. Laurie, I. Awad, and M. Stay. Caja: Safe active content in sanitized JavaScript. Technical report, June 2008. http://google-caja.googlecode.com/files/caja-spec-2008-06-07.pdf.
[21] J. Oberheide, E. Cooke, and F. Jahanian. CloudAV: N-version antivirus in the network cloud. In Proceedings of the 17th USENIX Security Symposium, San Jose, CA, July 2008.
[22] D. Quinlan. ROSE: Compiler support for object-oriented frameworks. In Proceedings of Conference on Parallel Compilers (CPC2000), Aussois, France, January 2000.
[23] R. Sahita and D. Kolar. Beyond Ring-3: Fine grained application sandboxing. In W3C Workshop on Security for Access to Device APIs from the Web, London, December 2008.
[24] B. Salamat, T. Jackson, A. Gal, and M. Franz. Intrusion detection using parallel execution and monitoring of program variants in user-space. In Proceedings of EuroSys’09, Nürnberg, Germany, April 2009.
[25] S. Santhanam, P. Elango, A. Arpaci-Dusseau, and M. Livny. Deploying virtual machines as sandboxes for the grid. In Proceedings of Second Workshop on Real, Large Distributed Systems, 2005.
[26] J. R. Sklaroff. Redundancy management technique for Space Shuttle computers. IBM Journal of Research and Development, 20:20–28, 1976.
[27] S. Wolfram. A New Kind of Science. Wolfram Media, Champaign, IL, 2002.