Conference Papers by Thomas Pasquier
Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018
Identifying the root cause and impact of a system intrusion remains a foundational challenge in computer security. Digital provenance provides a detailed history of the flow of information within a computing system, connecting suspicious events to their root causes. Although existing provenance-based auditing techniques provide value in forensic analysis, they assume that such analysis takes place only retrospectively. Such post-hoc analysis is insufficient for real-time security applications; moreover, even for forensic tasks, prior provenance collection systems exhibited poor performance and scalability, jeopardizing the timeliness of query responses. We present CamQuery, which provides inline, real-time provenance analysis, making it suitable for implementing security applications. CamQuery is a Linux Security Module that offers support for both userspace and in-kernel execution of analysis applications. We demonstrate the applicability of CamQuery to a variety of runtime security applications, including data loss prevention, intrusion detection, and regulatory compliance. In evaluation, we demonstrate that CamQuery reduces the latency of real-time query mechanisms, while imposing minimal overheads on system execution. CamQuery thus enables the further deployment of provenance-based technologies to address central challenges in computer security.
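As a toy illustration of the kind of inline analysis described above, the sketch below propagates a "sensitive" taint over a stream of provenance edges and raises an alert when tainted data reaches a network sink, i.e. a minimal data-loss-prevention policy. The edge format, node names and policy are hypothetical; CamQuery's actual interface is an in-kernel graph-processing API, not this.

```python
# Minimal sketch of inline provenance analysis for data loss prevention.
# Node names and the (src, dst) edge format are illustrative assumptions,
# not CamQuery's real schema.

def process_stream(edges, sensitive, sinks):
    """Propagate a 'sensitive' taint along provenance edges as they
    arrive, alerting when tainted data reaches a sink (e.g. a socket)."""
    tainted = set(sensitive)   # entities known to carry sensitive data
    alerts = []
    for src, dst in edges:     # each edge: information flows src -> dst
        if src in tainted:
            tainted.add(dst)
            if dst in sinks:
                alerts.append((src, dst))
    return alerts

# Example: a secret file is read by a shell, which writes to a socket.
edges = [("/etc/shadow", "bash"), ("bash", "socket:1234")]
alerts = process_stream(edges, sensitive={"/etc/shadow"}, sinks={"socket:1234"})
```

Because the analysis runs per-edge as events arrive, it never needs to materialise the whole graph, which is what makes inline, real-time operation plausible.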
Proceedings 2020 Network and Distributed System Security Symposium, 2020
Advanced Persistent Threats (APTs) are difficult to detect due to their "low-and-slow" attack patterns and frequent use of zero-day exploits. We present UNICORN, an anomaly-based APT detector that effectively leverages data provenance analysis. From modeling to detection, UNICORN tailors its design specifically for the unique characteristics of APTs. Through extensive yet time-efficient graph analysis, UNICORN explores provenance graphs that provide rich contextual and historical information to identify stealthy anomalous activities without predefined attack signatures. Using a graph sketching technique, it summarizes long-running system execution with space efficiency to combat slow-acting attacks that take place over a long time span. UNICORN further improves its detection capability using a novel modeling approach to understand long-term behavior as the system evolves. Our evaluation shows that UNICORN outperforms an existing state-of-the-art APT detection system and detects real-life APT scenarios with high accuracy.
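The graph-sketching idea can be sketched in miniature: relabel nodes with one Weisfeiler-Lehman round (so labels encode local structure), then hash the labels into a fixed-size histogram, so graphs of any size compare in constant space. This mirrors the spirit of UNICORN's fixed-size summaries only; its actual pipeline streams WL features into a similarity-preserving sketch, which this toy does not attempt.

```python
# Toy graph sketch: one Weisfeiler-Lehman relabelling round plus a
# fixed-size label histogram. Illustrative only, not UNICORN's algorithm.
import hashlib

def wl_relabel(nodes, edges):
    """One WL round: a node's new label combines its own label with the
    sorted labels of its in-neighbours (edges are (src, dst) flows)."""
    return {
        n: lab + "(" + ",".join(sorted(nodes[s] for s, d in edges if d == n)) + ")"
        for n, lab in nodes.items()
    }

def sketch(nodes, edges, size=8):
    """Hash WL labels into a fixed-size histogram, so a long-running
    system's growing graph still summarises into `size` counters."""
    hist = [0] * size
    for lab in wl_relabel(nodes, edges).values():
        hist[int(hashlib.md5(lab.encode()).hexdigest(), 16) % size] += 1
    return hist
```

Two structurally identical graphs always produce identical sketches, which is the property an anomaly detector then exploits by clustering sketches of normal behaviour.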
Proceedings of the 17th International Middleware Conference, 2016
Internet of Things (IoT) applications, systems and services are subject to law. We argue that for the IoT to develop lawfully, there must be technical mechanisms that allow the enforcement of specified policy, such that systems align with legal realities. The audit of policy enforcement must assist the apportionment of liability, demonstrate compliance with regulation, and indicate whether policy correctly captures legal responsibilities. As both systems and obligations evolve dynamically, this cycle must be continuously maintained. This poses a huge challenge given the global scale of the IoT vision. The IoT entails dynamically creating new services through managed and flexible data exchange. Data management is complex in this dynamic environment, given the need to both control and share information, often across federated domains of administration. We see middleware playing a key role in managing the IoT. Our vision is for a middleware-enforced, unified policy model that applies end-to-end, throughout the IoT. This is because policy cannot be bound to things, applications, or administrative domains, since functionality is the result of composition, with dynamically formed chains of data flows. We have investigated the use of Information Flow Control (IFC) to manage and audit data flows in cloud computing; a domain where trust can be well-founded, regulations are more mature and associated responsibilities clearer. We feel that IFC has great potential in the broader IoT context. However, the sheer scale and the dynamic, federated nature of the IoT pose a number of significant research challenges.
Proceedings of the 20th International Middleware Conference, 2019
System-level provenance is of widespread interest for applications such as security enforcement and information protection. However, testing the correctness or completeness of provenance capture tools is challenging and currently done manually. In some cases there is not even a clear consensus about what behavior is correct. We present an automated tool, ProvMark, that uses an existing provenance system as a black box and reliably identifies the provenance graph structure recorded for a given activity, by a reduction to subgraph isomorphism problems handled by an external solver. ProvMark is a first step in the much-needed area of testing and comparing the expressiveness of provenance systems. We demonstrate ProvMark's usefulness in comparing three capture systems with different architectures and distinct design philosophies.
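The reduction at the heart of the approach can be illustrated with a brute-force subgraph isomorphism check: does the small "template" graph recorded for an activity embed in the larger captured graph? ProvMark itself hands this to an external solver over richer property graphs; the version below uses plain edge sets and exhaustive search, which is only feasible for tiny graphs.

```python
# Brute-force subgraph isomorphism over edge sets -- an illustrative
# stand-in for the solver-backed reduction ProvMark uses.
from itertools import permutations

def subgraph_isomorphic(small, big):
    """True if `small`'s edges appear in `big` under some injective
    mapping of node names. Exhaustive, so only usable on tiny graphs."""
    s_nodes = sorted({n for e in small for n in e})
    b_nodes = sorted({n for e in big for n in e})
    for perm in permutations(b_nodes, len(s_nodes)):
        m = dict(zip(s_nodes, perm))          # candidate node mapping
        if all((m[a], m[b]) in big for a, b in small):
            return True
    return False
```

A template edge process-writes-file, for instance, should be found inside any capture of a real write, whatever the capture system named its nodes.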
Proceedings of the ACM Symposium on Cloud Computing, 2021
Despite the wide usage of container-based cloud computing, container auditing for security analysis relies mostly on built-in host audit systems, which often lack the ability to capture high-fidelity container logs. State-of-the-art reference-monitor-based audit techniques greatly improve the quality of audit logs, but their system-wide architecture is too costly to be adapted for individual containers. Moreover, these techniques typically require extensive kernel modifications, making them difficult to deploy in practical settings. In this paper, we present saBPF (secure audit BPF), an extension of the eBPF framework capable of deploying secure system-level audit mechanisms at container granularity. We demonstrate the practicality of saBPF in Kubernetes by designing an audit framework, an intrusion detection system, and a lightweight access control mechanism. We evaluate saBPF and show that it is comparable in performance and security guarantees to audit systems from the literature that are implemented directly in the kernel.
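Container-granularity auditing ultimately means partitioning a host-wide event stream by the container each event came from. The sketch below does this in userspace over already-collected records; the field names ("cgroup", "syscall") are assumptions for illustration, not saBPF's record layout, and saBPF performs the equivalent attribution in-kernel.

```python
# Toy per-container partitioning of a host audit stream. Record fields
# are hypothetical, not saBPF's actual schema.

def per_container(events):
    """Split a host-wide audit stream into per-container logs, keyed by
    the cgroup the event originated from."""
    logs = {}
    for ev in events:
        logs.setdefault(ev["cgroup"], []).append(ev["syscall"])
    return logs

events = [
    {"cgroup": "web", "syscall": "open"},
    {"cgroup": "db", "syscall": "read"},
    {"cgroup": "web", "syscall": "connect"},
]
logs = per_container(events)
```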
Many users implicitly assume that software can only be exploited after it is installed. However, recent supply-chain attacks demonstrate that application integrity must be ensured during installation itself. We introduce SIGL, a new tool for detecting malicious behavior during software installation. SIGL collects traces of system call activity, building a data provenance graph that it analyzes using a novel autoencoder architecture with a graph long short-term memory network (graph LSTM) for the encoder and a standard multilayer perceptron for the decoder. SIGL flags suspicious installations as well as the specific installation-time processes that are likely to be malicious. Using a test corpus of 625 malicious installers containing real-world malware, we demonstrate that SIGL has a detection accuracy of 96%, outperforming similar systems from industry and academia by up to 87% in precision and recall and 45% in accuracy. We also demonstrate that SIGL can pinpoint the processes most...
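The detection principle, stripped of the neural machinery, is reconstruction-error scoring: learn a model of benign installations, then flag traces the model reconstructs poorly. The "model" below is just a mean feature vector, a deliberately minimal stand-in for SIGL's graph-LSTM autoencoder, and the feature vectors are hypothetical.

```python
# Reconstruction-error anomaly scoring in miniature. The averaged
# profile stands in for a trained autoencoder; features are invented.

def fit(benign):
    """'Train' by averaging benign feature vectors."""
    dim = len(benign[0])
    return [sum(v[i] for v in benign) / len(benign) for i in range(dim)]

def score(model, v):
    """Reconstruction error: squared distance from the normal profile.
    High scores indicate behaviour the model cannot explain."""
    return sum((a - b) ** 2 for a, b in zip(model, v))

benign = [[1.0, 0.0], [1.0, 0.2]]      # hypothetical benign traces
model = fit(benign)
normal_score = score(model, [1.0, 0.1])
anomaly_score = score(model, [5.0, 3.0])
```

An installer would be flagged when its score exceeds a threshold calibrated on held-out benign installations.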
2017 IEEE International Conference on Cloud Engineering (IC2E), 2017
Unikernels are a rapidly emerging technology in the world of cloud computing. Unikernels build on research into library operating systems to deliver smaller, faster and more secure virtual machines, specifically optimised for a single application service. These features are especially useful in cost- or resource-constrained environments. However, as with any new technology, early adopters need to master many technical details, and understand many aspects of the mechanisms used to build and deploy unikernels. Both of these factors may slow adoption rates. In this paper, we present our initial experiments into the use of an approach for building unikernels that is accessible to those whose technical expertise is focused on web development. We present PHP2Uni: a tool chain that takes a website built from PHP files (PHP remains the most widely used web language) and builds a resource-efficient unikernel image from them, while requiring little knowledge of the underlying operating system software complexity.
Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems
Host-based anomaly detectors generate alarms by inspecting audit logs for suspicious behavior. Unfortunately, evaluating these anomaly detectors is hard. There are few high-quality, publicly available audit logs, and there are no pre-existing frameworks that enable push-button creation of realistic system traces. To make trace generation easier, we created Xanthus, an automated tool that orchestrates virtual machines to generate realistic audit logs. Using Xanthus' simple management interface, administrators select a base VM image, configure a particular tracing framework to use within that VM, and define post-launch scripts that collect and save trace data. Once data collection is finished, Xanthus creates a self-describing archive, which contains the VM, its configuration parameters, and the collected trace data. We demonstrate that Xanthus hides many of the tedious (yet subtle) orchestration tasks that humans often get wrong; Xanthus avoids mistakes that lead to non-replicable experiments.
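The orchestration the abstract describes (pick an image, enable a tracer, run scripts, collect, archive) is essentially a fixed plan whose steps must happen in order. A minimal sketch of such a plan builder follows; the step names and structure are illustrative assumptions, not Xanthus' actual interface.

```python
# Hypothetical orchestration plan for Xanthus-style trace generation.
# Step names are invented for illustration.

def build_plan(image, tracer, scripts):
    """Return the ordered steps a reproducible trace-generation run
    needs: boot the VM, enable tracing, run scenario scripts, then
    collect and archive everything self-describingly."""
    return (
        [f"boot {image}", f"enable {tracer}"]
        + [f"run {s}" for s in scripts]
        + ["collect traces", f"archive {image}+traces"]
    )

plan = build_plan("ubuntu-20.04.qcow2", "camflow", ["setup.sh", "attack.sh"])
```

Encoding the sequence as data rather than as manual steps is precisely what removes the "subtle tasks humans get wrong".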
2020 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2020
The rise of IoT has led to large volumes of personal data being produced at the network's edge. Most IoT applications process data in the cloud, raising concerns over privacy and security. As many IoT applications are event-based and are implemented on cloud-based, serverless platforms, we have seen a number of proposals to deploy serverless solutions at the edge to address concerns over data transfer. However, conventional serverless platforms use container technology to run user-defined functions. Containers introduce their own issues regarding security (due to a large trusted computing base) and performance, including long initialisation times. Additionally, OpenWhisk, a popular and widely used container-based serverless platform available for edge devices, performs relatively poorly, as we demonstrate in our evaluation. In this paper, we propose to investigate unikernels as a solution for building serverless platforms at the edge, addressing in particular performance and security concerns. We present UniFaaS, a prototype edge-serverless platform that leverages unikernels (tiny, single-address-space library operating systems that contain only the parts of the OS needed to run a given application) to execute functions. The result is a serverless platform with extremely low memory and CPU footprints, and excellent performance. UniFaaS has been designed to be deployed on low-powered single-board computer devices, such as the Raspberry Pi or Arduino, without compromising on performance.
Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020
This experimental study presents a number of issues that pose a challenge for practical configuration tuning and its deployment in data analytics frameworks. These issues include: 1) the assumption of a static workload or environment, ignoring the dynamic characteristics of the analytics environment (e.g., increase in input data size, changes in allocation of resources); 2) the amortization of tuning costs and how this influences which workloads can be tuned in practice in a cost-effective manner; and 3) the need for a comprehensive incremental tuning solution for a diverse set of workloads. We adapt different ML techniques in order to obtain efficient incremental tuning in our problem domain, and propose Tuneful, a configuration tuning framework. We show how it is designed to overcome the above issues and illustrate its applicability by running a wide array of experiments in cloud environments provided by two different service providers.
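The cost-amortization issue can be made concrete with a toy tuning loop: each configuration evaluation costs a real workload run, so the search must stay within a budget and keep the best configuration seen so far. This greedy sketch is a stand-in for Tuneful's ML-guided search (which uses techniques such as Gaussian processes); the cost function and the `executor_memory` parameter are invented for illustration.

```python
# Budget-limited configuration tuning sketch. The cost function and
# parameter names are hypothetical.

def tune(cost, candidates, budget):
    """Evaluate at most `budget` candidate configurations, returning
    the cheapest one seen. Each cost() call models a real workload run,
    which is why the budget matters for amortization."""
    best, best_cost = None, float("inf")
    for conf in candidates[:budget]:
        c = cost(conf)
        if c < best_cost:
            best, best_cost = conf, c
    return best, best_cost

# Hypothetical workload whose runtime is minimised near 4 GB executors.
cost = lambda conf: (conf["executor_memory"] - 4) ** 2 + 1
cands = [{"executor_memory": m} for m in (1, 2, 4, 8)]
best, c = tune(cost, cands, budget=3)
```

An incremental tuner would additionally carry `best` forward as the workload or environment drifts, rather than restarting the search from scratch.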
Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, 2021
Edge computing is rapidly changing the IoT-Cloud landscape. Various testbeds are now able to run multiple Docker-like containers developed and deployed by end-users on edge devices. However, this capability may allow an attacker to deploy a malicious container on the host and compromise it. This paper presents a dataset based on the Linux Auditing System, which contains malicious and benign container activity. We developed two malicious scenarios, a denial-of-service and a privilege-escalation attack, where an adversary uses a container to compromise the edge device. Furthermore, we deployed benign user containers to run in parallel with the malicious containers. Container activity can be captured through the host system via system calls. Our time-series auditd dataset contains partial labels for the benign and malicious system calls. Generating the dataset is largely automated using the provided AutoCES framework. We also present a semi-supervised machine learning use case with the collected data to demonstrate its utility. The dataset and framework code are open-source and publicly available.
With the rapid increase in uptake of cloud services, issues of data management are becoming increasingly prominent. There is a clear, outstanding need for specified policy to control and track data as it flows throughout cloud infrastructure, to ensure that those responsible for data are meeting their obligations. This paper introduces Information Flow Audit, an approach for tracking information flows within cloud infrastructure. It builds upon CamFlow (Cambridge Flow Control Architecture), a prototype implementation of our model for data-centric security in PaaS clouds. CamFlow enforces Information Flow Control policy both intra-machine, at the kernel level, and inter-machine, on message exchange. Here we demonstrate how CamFlow can be extended to provide data-centric audit logs, akin to provenance metadata, in a format in which analyses can easily be automated through the use of standard graph-processing tools. This allows detailed understanding of the overall system. Combining a continuously enforced data-centric security mechanism with meaningful audit empowers tenants and providers to both meet and demonstrate compliance with their data management obligations.
Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While there have been several prior whole-system provenance systems that captured a comprehensive, systemic and ubiquitous record of a system's behavior, none have been widely adopted. They either A) impose too much overhead, B) are designed for long-outdated kernel releases and are hard to port to current systems, C) generate too much data, or D) are designed for a single system. CamFlow addresses these shortcomings by: 1) leveraging the latest kernel design advances to achieve efficiency; 2) using a self-contained, easily maintainable implementation relying on a Linux Security Module, NetFilter, and other existing kernel facilities; 3) providing a mechanism to tailor the captured provenance data to the needs of the application; and 4) making it easy to integrate provenance across distributed systems. The provenance we capture is streamed and consumed by tenant-built auditor applications. We illustrate the usability of our implementation by describing three such applications: demonstrating compliance with data regulations; performing fault/intrusion detection; and implementing data loss prevention. We also show how CamFlow can be leveraged to capture meaningful provenance without modifying existing applications.
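A tenant-built auditor over captured provenance typically answers ancestry queries: given a suspicious object, what could have influenced it? The backward traversal below sketches that core query over a plain edge list; the (src, dst) record format is an illustrative simplification of CamFlow's actual W3C-PROV-style output.

```python
# Root-cause ancestry query over a provenance graph. The flat edge-list
# input is an assumption, simplified from real provenance records.
from collections import defaultdict, deque

def ancestors(edges, node):
    """Return every entity that could have influenced `node`, by
    walking provenance edges backwards (edges are src -> dst flows)."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    seen, queue = set(), deque([node])
    while queue:
        for p in parents[queue.popleft()] - seen:
            seen.add(p)
            queue.append(p)
    return seen
```

Because the result is exactly the influence set, a compliance auditor can, for example, check that no ancestor of exported data carries a regulated classification.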
The usual approach to security for cloud-hosted applications is strong separation. However, it is often the case that the same data is used by different applications, particularly given the increase in data-driven ('big data' and IoT) applications. We argue that access control for the cloud should no longer be application-specific but should be data-centric, associated with the data that can flow between applications. Indeed, the data may originate outside cloud services from diverse sources such as medical monitoring, environmental sensing, etc. Information Flow Control (IFC) potentially offers data-centric, system-wide data access control. It has been shown that IFC can be provided at operating system level as part of a PaaS offering, with an acceptable overhead. In this paper we consider how IFC can be integrated with application-specific access control, transparently to application developers, while building, from simple IFC primitives, access control policies that align with the data management obligations of cloud providers and tenants.
International Conference on Cloud Computing Technology and Science
There is a clear, outstanding need for new security mechanisms that allow data to be managed and controlled within the cloud-enabled Internet of Things. Towards this, we propose an approach based on Information Flow Control (IFC) that allows: (1) the continuous, end-to-end enforcement of data flow policy, and (2) the generation of provenance-like audit logs to demonstrate policy adherence and contractual/regulatory compliance. Further, we discuss the role of Trusted Platform Modules (TPMs) in supporting such a system, by providing hardware roots of trust. TPMs can be leveraged to validate software configurations, including the IFC enforcement mechanism, both in the cloud and externally via remote attestation.
Concern about data leakage is holding back more widespread adoption of cloud computing by companies and public institutions alike. To address this, cloud tenants/applications are traditionally isolated in virtual machines or containers. But an emerging requirement is for cross-application sharing of data, for example, when cloud services form part of an IoT architecture. Information Flow Control (IFC) is ideally suited to achieving both isolation and data sharing as required. IFC enhances traditional Access Control by providing continuous, data-centric, cross-application, end-to-end control of data flows. However, large-scale data processing is a major requirement of cloud computing and is infeasible under standard IFC. We present a novel, enhanced IFC model that subsumes standard models. Our IFC model supports 'Big Data' processing, while retaining the simplicity of standard IFC and enabling more concise, accurate and maintainable expression of policy.
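The core IFC check that these abstracts rely on can be stated in a few lines: a flow from one entity to another is safe only if the destination is at least as secret and the source at least as trustworthy. The sketch below uses tag-set labels, a common formulation in tag-based IFC models; the entities and tags are hypothetical examples, not this paper's enhanced model.

```python
# Classic tag-based IFC flow check. Entities carry a secrecy tag set
# and an integrity tag set; example labels are invented.

def can_flow(src, dst):
    """Data may flow src -> dst only if dst's secrecy dominates src's
    (no declassification) and src's integrity dominates dst's
    requirements (no endorsement)."""
    return (src["secrecy"] <= dst["secrecy"]
            and dst["integrity"] <= src["integrity"])

sensor = {"secrecy": {"medical"}, "integrity": {"hospital"}}
app = {"secrecy": {"medical", "research"}, "integrity": set()}
```

So the sensor's readings may flow into the research app (which is labelled at least as secret), but the untrusted app's output may not flow back and masquerade as sensor data, which is exactly the continuous, data-centric control the abstract describes.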
To realise the full potential of the Internet of Things (IoT), IoT architectures are moving towards open and dynamic interoperability, as opposed to closed application silos. This is because functionality is realised through the interactions, i.e. the exchange of data, between a wide range of 'things'. Data sharing requires management. Towards this, we are exploring distributed, decentralised Information Flow Control (IFC) to enable controlled data flows, end-to-end, according to policy. In this paper we make the case for IFC, as a data-centric control mechanism, for securing IoT architectures. Previous research on IFC focuses on a particular system or application, e.g. within an operating system, with little concern for wide-scale, dynamic systems. To render IFC applicable to IoT, we present a certificate-based model for secure, trustworthy policy specification, one that also reflects real-world IoT concerns such as 'thing' ownership. This approach enables decentralised, distributed, verifiable policy specification, crucial for securing the wide-ranging, dynamic interactions of future IoT applications.
Security concerns are widely seen as an obstacle to the adoption of cloud computing solutions, and although a wealth of law and regulation has emerged, the technical basis for enforcing and demonstrating compliance lags behind. Our CloudSafetyNet project aims to show that Information Flow Control (IFC) can augment existing security mechanisms and provide continuous enforcement of extended, finer-grained application-level security policy in the cloud.
We present FlowK, a loadable kernel module for Linux, as part of a proof of concept that IFC can be provided for cloud computing. Following the principle of policy-mechanism separation, IFC policy is assumed to be expressed at application level and FlowK provides mechanisms to enforce IFC policy at runtime. FlowK’s design minimises the changes required to existing software when IFC is provided. To show how FlowK can be integrated with cloud software we have designed and evaluated a framework for deploying IFC-aware web applications, suitable for use in a PaaS cloud.
given to the security, privacy and personal safety risks that arise beyond these subsystems; that is, from the wide-scale, cross-platform openness that cloud services bring to IoT.
In this paper we focus on security considerations for IoT from the perspectives of cloud tenants, end-users and cloud providers, in the context of wide-scale IoT proliferation, working across the range of IoT technologies (be they things or entire IoT subsystems). Our contribution is to analyse the current state of cloud-supported IoT to make explicit the security considerations that require further work.
In this paper we describe the properties of cloud computing—Platform-as-a-Service clouds in particular—and review a range of IFC models and implementations to identify opportunities for using IFC within a cloud computing context. Since IFC security is linked to the data that it protects, both tenants and providers of cloud services can agree on security policy, in a manner that does not require them to understand and rely on the particulars of the cloud software stack in order to effect enforcement.