Fueled by the success of Rust, many programming languages are adding substructural features to their type systems. The promise of tracking properties such as lifetimes and sharing is tremendous, not just for low-level memory management, but also for controlling higher-level resources and capabilities. But so are the difficulties in adapting successful techniques from Rust to higher-level languages, where they need to interact with other advanced features, especially various flavors of functional and type-level abstraction. Hence, recent proposals such as Scala's Capture Types target far narrower domains than Rust. But what would it take to bring full-fidelity reasoning about lifetimes and sharing to mainstream languages? Reachability types are a recent proposal that has shown promise in scaling to higher-order but monomorphic settings, tracking aliasing and separation on top of a substrate inspired by separation logic. The λ* reachability type system qualifies types with sets of reachable variables and guarantees separation if two terms have disjoint qualifiers. However, naive extensions with type polymorphism and/or precise reachability polymorphism are unsound, making λ* unsuitable for adoption in real languages. Combining reachability and type polymorphism that is precise, sound, and parametric remains an open challenge. This paper presents a rethinking of the design of reachability tracking and proposes a solution to the key challenge of reachability polymorphism. Instead of always tracking the transitive closure of reachable variables as in the original design, we only track variables reachable in a single step and compute transitive closures only when necessary, thus preserving chains of reachability over known variables that can be refined using substitution. To enable this property, we introduce a new freshness qualifier, which indicates variables whose reachability sets may grow during evaluation steps. These ideas yield the simply-typed λ-calculus with precise lightweight, i.e., quantifier-free, reachability polymorphism, and the F<:-calculus with bounded parametric polymorphism over types and reachability qualifiers. We prove type soundness and a preservation of separation property in Coq. We show that our system subsumes both previous reachability type systems as well as the essence of Scala's capture types, making true tracking of lifetimes and sharing practical for mainstream languages.
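The idea of tracking only one-step reachability and computing transitive closures on demand can be illustrated with a small sketch. This is a simplified model written for illustration, not the calculus from the paper: the names `OneStep`, `transitive_closure`, and `separate` are hypothetical, the environment is a plain dictionary, and the freshness qualifier is left out entirely.

```python
from typing import Dict, Set

# Hypothetical environment: each variable maps to the variables it
# reaches in a single step (not the transitive closure).
OneStep = Dict[str, Set[str]]


def transitive_closure(qualifier: Set[str], env: OneStep) -> Set[str]:
    """Saturate a qualifier by following one-step edges until a fixpoint."""
    seen: Set[str] = set()
    worklist = list(qualifier)
    while worklist:
        var = worklist.pop()
        if var not in seen:
            seen.add(var)
            worklist.extend(env.get(var, set()))
    return seen


def separate(q1: Set[str], q2: Set[str], env: OneStep) -> bool:
    """Assumed separation check: saturated qualifiers must be disjoint."""
    return transitive_closure(q1, env).isdisjoint(transitive_closure(q2, env))


# c reaches b in one step, b reaches a in one step, d reaches nothing.
env: OneStep = {"a": set(), "b": {"a"}, "c": {"b"}, "d": set()}

closure_c = transitive_closure({"c"}, env)
c_and_d_separate = separate({"c"}, {"d"}, env)
c_and_a_separate = separate({"c"}, {"a"}, env)
```

In this sketch the qualifier of `c` stays the small set `{"b"}` in the environment, and the chain `c → b → a` is only expanded when a separation check asks for it.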
Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security
Cache side-channel attacks pose severe threats to software security and privacy, especially for cryptosystems. In this paper, we propose CaType, a novel refinement type-based tool for detecting cache side channels in crypto software. Compared to previous works, CaType provides the following advantages: (1) For the first time, CaType analyzes cache side channels using refinement types over x86 assembly code. It reveals several significant and effective enhancements with refined types, including bit-level granularity tracking, distinguishing different effects of variables, precise type inferences, and high scalability. (2) CaType is the first static analyzer for crypto libraries that takes blinding-based defenses into consideration. (3) From the perspective of implementation, CaType uses cache layouts of potentially vulnerable control-flow branches rather than cache states to suppress false positives. We evaluate CaType in identifying side channel vulnerabilities in real-world crypto software, including RSA, ElGamal, and (EC)DSA from OpenSSL and Libgcrypt. CaType captures all known defects, detects previously unknown vulnerabilities, and reveals several false positives of previous tools. In terms of performance, CaType is 16× faster than CacheD and 131× faster than CacheS when analyzing the same libraries. These evaluation results confirm the capability of CaType in identifying side channel defects with great precision, efficiency, and scalability. CCS Concepts • Security and privacy → Cryptanalysis and other attacks; Formal methods and theory of security; Hardware attacks and countermeasures.
The Java Modeling Language (JML) is a specification language for describing the functional behavior of sequential Java program modules. The object-oriented features of Java make specifying invariants and framing difficult in the presence of subtyping. Using regions as a basis for a methodology, we precisely describe a technique for specifying invariants and framing in the presence of subtyping. We also extend JML by adding separating conjunction (from separation logic) for certain kinds of assertions.
Proceedings of the ACM on Programming Languages, 2021
Ownership type systems, based on the idea of enforcing unique access paths, have been primarily focused on objects and top-level classes. However, existing models do not as readily reflect the finer aspects of nested lexical scopes, capturing, or escaping closures in higher-order functional programming patterns, which are increasingly adopted even in mainstream object-oriented languages. We present a new type system, λ*, which enables expressive ownership-style reasoning across higher-order functions. It tracks sharing and separation through reachability sets, and layers additional mechanisms for selectively enforcing uniqueness on top of it. Based on reachability sets, we extend the type system with an expressive flow-sensitive effect system, which enables flavors of move semantics and ownership transfer. In addition, we present several case studies and extensions, including applications to capabilities for algebraic effects, one-shot continuations, and safe parallelization.
Cryptographic techniques have the potential to enable distrusting parties to collaborate in fundamentally new ways, but their practical implementation poses numerous challenges. An important class of such cryptographic techniques is known as secure multi-party computation (MPC). In an effort to provide an ecosystem for building secure MPC applications using higher degrees of automation, we present the HACCLE (High Assurance Compositional Cryptography: Languages and Environments) toolchain. The HACCLE toolchain contains an embedded domain-specific language (Harpoon) for software developers without cryptographic expertise to write MPC-based programs. Harpoon programs are compiled into acyclic circuits represented in HACCLE's Intermediate Representation (HIR) that serves as an abstraction for implementing a computation using different cryptographic protocols such as secret sharing, homomorphic encryption, or garbled circuits. Implementations of different cryptographic protocols ser...
Proceedings of the 20th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, 2021
Cryptographic techniques have the potential to enable distrusting parties to collaborate in fundamentally new ways, but their practical implementation poses numerous challenges. An important class of such cryptographic techniques is known as Secure Multi-Party Computation (MPC). Developing Secure MPC applications in realistic scenarios requires extensive knowledge spanning multiple areas of cryptography and systems. And while the steps to arrive at a solution for a particular application are often straightforward, it remains difficult to make the implementation efficient, and tedious to apply those same steps to a slightly different application from scratch. Hence, it is an important problem to design platforms for implementing Secure MPC applications with minimum effort and using techniques accessible to non-experts in cryptography. In this paper, we present the HACCLE (High Assurance Compositional Cryptography: Languages and Environments) toolchain, specifically targeted to MPC applications.
Cache-based side channels enable a dedicated attacker to reveal program secrets by measuring the cache access patterns. Practical attacks have been shown against real-world crypto algorithm implementations such as RSA, AES, and ElGamal. To date, identifying information leaks due to cache-based side channels, either in a static or dynamic manner, remains a challenge: the existing approaches fail to offer high precision, full coverage, and good scalability simultaneously, thus impeding their practical use in real-world scenarios. In this paper, we propose a novel static analysis method on binaries to detect cache-based side channels. We use abstract interpretation to reason on program states with respect to abstract values at each program point. To make such abstract interpretation scalable to real-world cryptosystems while offering high precision and full coverage, we propose a novel abstract domain called the Secret-Augmented Symbolic domain (SAS). SAS tracks program secrets and depe...
Framing is important for specification and verification of object-oriented programs. This dissertation develops the local reasoning approach for framing in the presence of data structures with unrestricted sharing and subtyping. It can verify shared data structures specified in a concise way by unifying fine-grained region logic and separation logic. Then the fine-grained region logic is extended to reason about subtyping. First, fine-grained region logic is adapted from region logic to express regions at the granularity of individual fields. Conditional region expressions are introduced; not only does this allow one to specify more precise frame conditions, it also has the ability to express footprints of separation logic assertions. Second, fine-grained region logic is generalized to a new logic called unified fine-grained region logic by allowing the logic to restrict the heap in which a program runs. This feature allows one to express specifications in separation logic. Third, b...
Automated Technology for Verification and Analysis, 2021
A recent case study from AWS by Chong et al. proposes an effective methodology for Bounded Model Checking in industry. In this paper, we report on a follow-up case study that explores the methodology from the perspective of three research questions: (a) can proof artifacts be used across verification tools; (b) are there bugs in verified code; and (c) can specifications be improved. To study these questions, we port the verification tasks for the aws-c-common library to SeaHorn and KLEE. We show the benefits of using compiler semantics and crosschecking specifications with different verification techniques, and call for standardizing proof library extensions to increase specification reuse. The verification tasks discussed are publicly available online. This research was supported by grants from WHJIL and NSERC CRDPJ 543583-19.
Framing is important for specification and verification, especially in programs that mutate data structures with shared data, such as DAGs. Both separation logic and region logic are successful approaches to framing, with separation logic providing a concise way to reason about data structures that are disjoint, and region logic providing the ability to reason about framing for shared mutable data. In order to obtain the benefits of both logics for programs with shared mutable data, this paper unifies them into a single logic, which can encode both of them and allows them to interoperate. The new logic thus provides a way to reason about program modules specified in a mix of styles.
Sclerostin and Dickkopf-1 (Dkk-1) are potent antagonists of Wnt signalling and might therefore play important roles in cardiovascular disease. We investigated whether serum sclerostin and Dkk-1 levels are associated with acute ischaemic stroke and specific stroke subtypes. Serum levels of sclerostin and Dkk-1 were measured by ELISA on day 1 and on day 6 after stroke in 62 patients with large artery atherosclerotic (LAA) stroke, on day 1 after stroke in 62 age- and gender-matched patients with small-artery occlusion (SAO) stroke and on admission in 62 healthy controls. Stroke severity was determined based on the National Institutes of Health Stroke Scale (NIHSS) and by measuring stroke volume on diffusion-weighted imaging. Outcome was measured by the modified Rankin Scale (mRS) on day 90. Compared with controls, serum sclerostin and Dkk-1 levels were significantly higher in both patients with LAA stroke and with SAO stroke, and no difference was detected between the stroke subtypes. ...
Proceedings of the 17th Workshop on Formal Techniques for Java-like Programs, 2015
Specification languages have long featured ways to describe what does not change when an imperative procedure is executed: the so-called frame problem. Solutions to the frame problem are needed for formal verification in imperative programming, as otherwise a verification would not be able to accumulate information from one statement to the next. Region logic is one of the approaches to solving the frame problem. We present a modified version of region logic with fine granularity and introduce conditional effects that allow one to specify more precise frame conditions.
Several techniques have been proposed for specification and verification of frame conditions, making it difficult for specification language designers to know which to pick. Ideally there would be a single mechanism that could be used to express specifications written in all techniques. In this paper we provide a single mechanism that can be used to write specifications in the style of both separation logic and dynamic frames. This mechanism shows common characteristics between the two methodologies.
Papers by Yuyan Bao