IBM Research Castor, a cloud-native system for managing and deploying large numbers of AI time-series models in IoT applications, is described. Modelling code templates in Python and R, following a typical machine-learning workflow, are supported. A knowledge-based approach to managing model and time-series data allows general semantic concepts to be used for expressing feature-engineering tasks. Model templates can be programmatically deployed against specific instances of semantic concepts, thus supporting model reuse and automated replication as the IoT application grows. Deployed models are automatically executed in parallel, leveraging a serverless cloud-computing framework. The complete history of trained model versions and rolling-horizon predictions is persisted, enabling full model lineage and traceability. Results from deployments in real-world smart-grid live forecasting applications are reported. Scalability in executing up to tens of thousands of AI modelling tasks is also evaluated.
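To illustrate the template-and-concept idea, the following sketch shows how a generic forecasting template might be bound to concrete instances of a semantic concept; the class, concept name and method are hypothetical and are not the actual Castor API.

```python
# Hypothetical sketch of the template-reuse idea described above; names are
# illustrative only, not the Castor interface.
class ForecastTemplate:
    def __init__(self, target_concept, horizon_hours=24):
        self.target_concept = target_concept   # e.g. "energy:SubstationLoad"
        self.horizon_hours = horizon_hours

    def deploy(self, entity_id):
        """Bind the generic template to one concrete instance of the concept."""
        return {"concept": self.target_concept,
                "entity": entity_id,
                "horizon_h": self.horizon_hours}

template = ForecastTemplate("energy:SubstationLoad")
# As new substations are onboarded, the same template is deployed again,
# which is how model reuse and automated replication scale with the IoT estate.
deployments = [template.deploy(sub) for sub in ["substation-01", "substation-02"]]
print(deployments)
```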
2022 IEEE International Conference on Big Data (Big Data)
Advances in federated learning (FL) algorithms, along with technologies like differential privacy and homomorphic encryption, have led to FL being increasingly adopted in many application domains. This increasing adoption has led to rapid growth in the number, size (number of participants/parties) and diversity (intermittent vs. active parties) of FL jobs. Many existing FL systems, based on centralized (often single) model aggregators, are unable to scale to handle large FL jobs and adapt to parties' behavior. In this paper, we present a new scalable and adaptive architecture for FL aggregation. First, we demonstrate how traditional tree-overlay-based aggregation techniques (from P2P, publish-subscribe and stream-processing research) can help FL aggregation scale, but are ineffective from a resource-utilization and cost standpoint. Next, we present the design and implementation of AdaFed, which uses serverless/cloud functions to adaptively scale aggregation in a resource-efficient and fault-tolerant manner. We describe how AdaFed enables FL aggregation to be dynamically deployed only when necessary, elastically scaled to handle participant joins/leaves, and made fault tolerant with minimal effort required on the (aggregation) programmer's side. We also demonstrate that our prototype, based on Ray [1], scales to thousands of participants and achieves a reduction of more than 90% in resource requirements and cost, with minimal impact on aggregation latency.
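The tree-overlay aggregation discussed above can be pictured with a minimal, hypothetical sketch: leaf aggregators compute a weighted average of their parties' updates, and a root aggregator fuses the partial results. This is a toy FedAvg over a two-level tree, not the AdaFed implementation.

```python
# Illustrative two-level tree aggregation: weighted FedAvg at the leaves,
# then a weighted fusion at the root.
import numpy as np

def fedavg(updates, counts):
    """Weighted average of model-parameter vectors by sample count."""
    counts = np.asarray(counts, dtype=float)
    stacked = np.stack(updates)
    return (stacked * counts[:, None]).sum(axis=0) / counts.sum()

# Two leaf aggregators serving a handful of parties (toy 3-parameter model).
leaf_a = fedavg([np.array([1.0, 0.0, 2.0]), np.array([3.0, 1.0, 0.0])], [100, 300])
leaf_b = fedavg([np.array([0.5, 0.5, 0.5])], [200])
global_model = fedavg([leaf_a, leaf_b], [400, 200])   # root fusion
print(global_model)
```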
Electricity is a major cost in running a data centre, and servers are responsible for a significant percentage of the power consumption. Given the widespread use of HTTP, both as a service and as a component of other services, it is worthwhile reducing the power consumption of web servers. In this paper we consider how reverse proxies, commonly used to improve the performance of web servers, might also be used to improve energy efficiency. We suggest that when demand on a server is low, it may be possible to switch off servers. In their absence, an embedded system with a small energy footprint could act as a reverse proxy serving commonly requested content. When new content is required, the reverse proxy can power on the servers to meet this new load. Our results indicate that even with a modest server, we can obtain a 25% power saving while maintaining acceptable performance.
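A minimal sketch of the proxy's decision path follows: cached content is served directly by the low-power device, and the backend is woken (here via a standard Wake-on-LAN magic packet) only on a cache miss. The cache contents, MAC address and response strings are illustrative placeholders, not the system evaluated in the paper.

```python
# Sketch only: serve from cache on the embedded proxy, wake servers on a miss.
import socket

cache = {"/index.html": b"<html>cached copy</html>"}

def wake_backend(mac="aa:bb:cc:dd:ee:ff"):
    """Send a standard Wake-on-LAN magic packet (MAC is a placeholder)."""
    payload = bytes.fromhex("ff" * 6 + mac.replace(":", "") * 16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(payload, ("255.255.255.255", 9))

def handle_request(path):
    if path in cache:
        return cache[path]          # low-power path: proxy answers alone
    wake_backend()                  # cache miss: power the servers back on
    return b"503: backend waking, retry shortly"

print(handle_request("/index.html"))
```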
This deliverable, D6.3 – Security of federated machine learning algorithms – is the only deliverable for task T6.3 (Assessing the security of machine learning algorithms under the different privacy operation modes) in WP6. It comprises a report with a comprehensive evaluation of the robustness of the different algorithms developed in the MUSKETEER Machine Learning Library (MMLL) against attacks both at training time (poisoning attacks) and at test time (evasion attacks). The assessment is performed for both supervised and unsupervised learning tasks across the different Privacy Operation Modes (POMs) considered in the project. The defensive mechanisms evaluated in this deliverable have already been described in D5.4 and D5.5.
2018 IEEE International Conference on Data Mining Workshops (ICDMW), 2018
We demonstrate Castor, a cloud-based system for contextual IoT time-series data and model management at scale. Castor is designed to assist data scientists in (a) exploring and retrieving all relevant time series and contextual information required for their predictive-modelling tasks; (b) seamlessly storing and deploying their predictive models in a cloud production environment; and (c) monitoring the performance of all predictive models in production and (semi-)automatically retraining them in case of performance deterioration. The main features of Castor are: (1) an efficient pipeline for ingesting IoT time-series data in real time; (2) a scalable, hybrid data-management service for both time-series and contextual data; (3) a versatile semantic model for contextual information which can be easily adapted to different application domains; (4) an abstract framework for developing and storing predictive models in R or Python; and (5) deployment services which automatically train and/or score predictive models upon user-defined conditions. We demonstrate Castor for a real-world smart-grid use case and discuss how it can be adapted to other application domains such as smart buildings, telecommunications, retail or manufacturing.
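Feature (5) above can be pictured with a small, hypothetical condition check: retraining is scheduled when the rolling forecast error exceeds a user-chosen tolerance over a baseline. The function name and threshold are illustrative, not Castor's API.

```python
# Toy user-defined retraining condition, assuming a rolling-error window.
import statistics

def should_retrain(recent_abs_errors, baseline_mae, tolerance=1.25):
    """Retrain when the rolling MAE exceeds the baseline by a chosen factor."""
    rolling_mae = statistics.mean(recent_abs_errors)
    return rolling_mae > tolerance * baseline_mae

print(should_retrain([4.1, 3.8, 5.2], baseline_mae=3.0))  # True -> schedule retraining
```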
In Industrie 4.0 (I4.0), the rigid structures and architectures applied in manufacturing and industrial information technologies today will be replaced by highly dynamic and self-organizing networks. Today's proprietary technical systems lead to strictly defined engineering processes and value chains. Interacting Digital Twins (DTs) are considered an enabling technology that could help increase flexibility based on semantically enriched information. Nevertheless, for interacting DTs to become a reality, their implementation should be based on open standards for information modeling and application programming interfaces, such as the Asset Administration Shell (AAS). Additionally, DT platforms could accelerate the development and deployment of DTs and ensure their resilient operation. This chapter develops a suitable architecture for such a DT platform for I4.0 based on user stories, requirements, and a time-series messaging experiment. An architecture based on microservices patterns is identified ...
Federated Learning (FL) is an approach to conducting machine learning without centralizing training data in a single place, for reasons of privacy, confidentiality or data volume. However, solving federated machine-learning problems raises issues above and beyond those of centralized machine learning. These issues include setting up communication infrastructure between parties, coordinating the learning process, integrating party results, understanding the characteristics of the training data sets of the different participating parties, handling data heterogeneity, and operating in the absence of a verification data set. IBM Federated Learning provides infrastructure and coordination for federated learning. Data scientists can design and run federated learning jobs based on existing, centralized machine-learning models and can provide high-level instructions on how to run the federation. The framework applies both to deep neural networks and to "traditional" approaches for ...
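The coordination and result-integration steps above can be sketched as a single federated round in which the aggregator queries each party and fuses the replies. The toy local update and unweighted fusion below are illustrative only and do not reflect the IBM Federated Learning API.

```python
# Minimal sketch of one federated round: query parties, fuse replies, repeat.
def local_train(party_data, global_model):
    # Stand-in for a party's local update: nudge the model toward its data mean.
    return global_model + 0.1 * (sum(party_data) / len(party_data) - global_model)

def run_round(global_model, parties):
    replies = [local_train(data, global_model) for data in parties.values()]
    return sum(replies) / len(replies)   # simple unweighted fusion

parties = {"party-a": [2.0, 3.0], "party-b": [8.0, 9.0]}   # hypothetical data
model = 0.0
for _ in range(3):
    model = run_round(model, parties)
print(round(model, 3))
```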
This deliverable (D3.4 "Final Prototype of the MUSKETEER Platform") is a document describing the demonstration of the final prototype. It is the culmination of milestone 3 and builds upon documents D3.1/D3.2/D3.3, providing feature updates as well as highlighting how these features complete the platform requirements. Functionally, the platform provides the infrastructure and implements the services required to enable the federated ML algorithms developed in WP4 and WP5 in end-to-end applications. It also supports the assessments to be carried out in WP6 and provides interfaces which allow for the development of client connectors and end-to-end demonstrations of the industrial use cases in WP7.
Despite the need for data in a time of general digitization of organizations, many challenges still hamper its shared use. Technical, organizational, legal, and commercial issues remain before data can be leveraged satisfactorily, especially when the data is distributed among different locations and confidentiality must be preserved. Data platforms can offer "ad hoc" solutions to tackle specific matters within a data space. MUSKETEER develops an Industrial Data Platform (IDP) including algorithms for federated and privacy-preserving machine learning on a distributed setup, detection and mitigation of adversarial attacks, and a rewarding model capable of monetizing datasets according to their real data value. The platform can offer an adequate response for organizations demanding high security standards, such as industrial companies with sensitive data or hospitals with personal data. From the architectural point of view, trust is enforced in such a way that, thanks to federated learning, data never has to leave its provider's premises. This approach can also help organizations better comply with European regulation.
Proceedings of the Thirteenth ACM International Conference on Future Energy Systems
A demand-response scheme that uses direct device control to actively exploit prosumer flexibility has been identified as a key remedy to meet the challenge of increased integration of renewable energy sources. Although a number of direct-control-based demand-response solutions exist and have been successfully deployed and demonstrated in the real world, they are typically designed for, and effective only at, small scale and/or target specific types of loads, leading to a relatively high cost of entry. This prohibits deploying scalable solutions. The H2020 GOFLEX project has addressed this issue and developed the scalable, general, and replicable GOFLEX system, which offers a market-driven approach to solving congestion problems in distribution grids based on aggregated individual flexibilities from a wide range of prosumers, both small (incl. electric ...
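The market-driven aggregation idea can be illustrated with a toy selection rule: the cheapest prosumer flexibility offers are accepted until a forecast overload is covered. The offer format and greedy policy are assumptions for illustration, not the GOFLEX market mechanism.

```python
# Illustrative greedy selection of flexibility offers to cover a congestion.
def select_offers(offers, required_kw):
    """offers: list of (prosumer_id, kW available, price per kWh)."""
    chosen, covered = [], 0.0
    for prosumer, kw, price in sorted(offers, key=lambda o: o[2]):
        if covered >= required_kw:
            break
        chosen.append((prosumer, kw, price))
        covered += kw
    return chosen, covered

offers = [("pv-17", 3.0, 0.12), ("ev-04", 7.0, 0.08), ("hp-22", 2.0, 0.15)]
print(select_offers(offers, required_kw=8.0))   # EV offer first, then PV
```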
The quality of a machine-learning model depends on the volume of data used during the training process. To prevent low-accuracy models, one needs to generate more training data or add external data sources of the same kind. If the first option is not feasible, the second requires the adoption of a federated learning approach, where different devices can collaboratively learn a shared prediction model. However, access to data can be hindered by privacy restrictions. Training machine-learning algorithms using data collected from different data providers while mitigating privacy concerns is a challenging problem. In this chapter, we first introduce the general approach of federated machine learning and the H2020 MUSKETEER project, which aims to create a federated, privacy-preserving machine learning Industrial Data Platform. Then, we describe the Privacy Operation Modes designed in MUSKETEER as an answer for stronger privacy, before looking at the platform and its operation using these ...
This deliverable (D3.2 "Architecture Design – Final Version") is a document describing the architecture of the MUSKETEER centralized server platform. It is the culmination of task T3.1 and builds upon the initial architecture document D3.1, providing architecture/design updates as well as reporting progress in relation to the platform requirements. The document describes the final version of the MUSKETEER platform architecture: how it meets the final requirements of the federated and privacy-preserving machine-learning services, how it addresses the final user stories, how it supports incorporating active security measures against adversarial attacks (data poisoning, evasion), and how it aligns with existing Industrial Data Platform standards.
This deliverable (D3.1 "Architecture Design") is a document describing the initial version of the MUSKETEER platform architecture. It addresses the previously delivered technical requirements and key performance indicators, takes into account legal and ethical requirements, and aligns with the algorithm library architecture and assessment framework. It informs the MUSKETEER platform development work and acts as the counterpart of the client connectors' architecture, which describes the customization and end-to-end integration of the core platform capabilities for the industrial use cases.
2020 IEEE International Conference on Big Data (Big Data), 2020
The transition away from carbon-based energy sources poses several challenges for the operation of electricity distribution systems. Increasing shares of distributed energy resources (e.g. renewable energy generators, electric vehicles) and internet-connected sensing and control devices (e.g. smart heating and cooling) require new tools to support accurate, data-driven decision making. Modelling the effect of such growing complexity in the electrical grid is possible in principle using state-of-the-art power-flow models. In practice, the detailed information needed for these physical simulations may be unknown or prohibitively expensive to obtain. Hence, data-driven approaches to power-systems modelling, including feedforward neural networks and auto-encoders, have been studied to leverage the increasing availability of sensor data, but have seen limited practical adoption due to lack of transparency and inefficiency on large-scale problems. Our work addresses this gap by proposing a data- and knowledge-driven probabilistic graphical model for energy systems based on the framework of graph neural networks (GNNs). The model can explicitly factor in domain knowledge, in the form of grid topology or physics constraints, resulting in sparser architectures and much smaller parameter dimensionality when compared with traditional machine-learning models of similar accuracy. Results obtained from a real-world smart-grid demonstration project show how the GNN was used to inform grid congestion predictions and market bidding services for a distribution system operator participating in an energy flexibility market.
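A minimal sketch of the topology-aware idea, assuming a standard graph-convolution update: node features are propagated only along grid edges, and the trainable weights are shared across buses, which is why the parameter count stays small as the grid grows. The toy feeder and feature dimensions are illustrative, not the paper's model.

```python
# One graph-convolution step whose connectivity is fixed by the grid topology.
import numpy as np

def normalised_adjacency(A):
    """Symmetrically normalise A + I, as in common GCN formulations."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, H, W):
    """Propagate node features H along grid edges and apply a shared weight W."""
    return np.maximum(A_norm @ H @ W, 0.0)   # ReLU

# Toy 4-bus feeder: buses 0-1-2-3 connected in a line (not real grid data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.rand(4, 3)          # e.g. load, generation, voltage per bus
W = np.random.rand(3, 8)          # shared weights, independent of grid size
H_next = gcn_layer(normalised_adjacency(A), H, W)
print(H_next.shape)               # (4, 8)
```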
Devices and technologies to measure and report water consumption at sub-daily intervals are growing in popularity. Data from these devices are creating new opportunities to manage the supply and demand of water in near real time. To this end, the EU FP7 iWIDGET (Improved Water efficiency through ICT for integrated supply-Demand side manaGEmenT) project is developing a state-of-the-art analytics platform for the integrated management of urban water. Key challenges include extracting useful insights from high-resolution consumption data and exploring a range of decision-support tools for water utilities and consumers. To overcome these challenges, iWIDGET is developing a distributed, open, robust, collaborative architecture that allows partners and utilities to collect and process data from a large number of sensors in parallel and to analyze data on demand. We present a distributed system that enables flexible, near-real-time monitoring of water networks by providing four critical mechanisms. First, a means to regularly poll water utilities' raw data systems. Second, assimilation of fresh data into a purpose-built, high-performance database. Third, a database-polling interface through which geographically local or remote analytic systems incorporate the latest consumption information into their analysis. Lastly, an online portal-based platform used to trigger analysis and review results. A key architectural feature of this system is the loose coupling between central storage and analytic systems. Communication between the central storage and processing components uses standard techniques, including WaterML over RESTful web services. This arrangement avoids restrictions on the underlying technologies in analytical components and allows analytic systems to execute on different operating systems and run-times. The system is under active development and will enable a wide variety of tools for water utilities and individual consumers. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 318272.
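The first two mechanisms (regular polling and assimilation of fresh data) might look like the following sketch; the endpoint URL, payload fields and 15-minute cadence are hypothetical placeholders rather than the iWIDGET interfaces.

```python
# Sketch of a periodic poll-and-assimilate loop over a REST endpoint.
import json
import time
import urllib.request

ENDPOINT = "https://utility.example.org/api/consumption/latest"  # placeholder

def poll_once(store):
    with urllib.request.urlopen(ENDPOINT, timeout=10) as resp:
        readings = json.loads(resp.read())
    for r in readings:               # assumed shape: {"meter": ..., "ts": ..., "litres": ...}
        store.setdefault(r["meter"], []).append((r["ts"], r["litres"]))

def run_poller(store, interval_s=900, cycles=4):
    for _ in range(cycles):          # regular polling loop (15-minute cadence)
        poll_once(store)
        time.sleep(interval_s)
```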
2012 IEEE/ACM 16th International Symposium on Distributed Simulation and Real Time Applications, 2012
Understanding the baseline underwater acoustic signature of an offshore location is a necessary, early step in formulating an environmental impact assessment of wave-energy conversion devices. But before this understanding can even begin, infrastructure must be deployed to capture raw acoustic signals over an extended period of time. This infrastructure comprises at least four distinct components: firstly, a hydrophone, deployed underwater, capable of operating at a high sampling rate of 500,000 16-bit samples per second; secondly, an analog/digital converter (ADC), to which the hydrophone transmits raw voltages; thirdly, a communications infrastructure for bridging the gap from the ADC to shore; and finally, an onshore base station for receiving the signals and presenting them to a remote analytic or simulation infrastructure for further processing. Attempting this signal capture in real time poses many problems. On a practical level, deploying cabled infrastructure to deliver power and communications to the offshore components may be prohibitively expensive, yet reliance on solar power may result in interruptions to real-time wireless transmission. Additionally, a high sampling rate requires significant base-station memory, storage and processing capabilities, as well as potentially high costs of delivery to a remote infrastructure, part of which could be alleviated by real-time signal compression. This paper discusses our attempts at implementing such a system that would reliably acquire real-time data and scale with growing demands.
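The storage and transmission burden mentioned above can be made concrete with a back-of-the-envelope calculation and a simple lossless compression pass over one capture window; the synthetic signal and zlib settings are illustrative, and real acoustic data will compress differently.

```python
# Data-rate arithmetic for the stated sampling regime, plus a toy compression pass.
import zlib
import numpy as np

SAMPLE_RATE = 500_000          # samples per second
BYTES_PER_SAMPLE = 2           # 16-bit samples
raw_rate_mb_s = SAMPLE_RATE * BYTES_PER_SAMPLE / 1e6
print(f"raw stream: {raw_rate_mb_s:.1f} MB/s, {raw_rate_mb_s * 86400 / 1000:.1f} GB/day")

# One second of synthetic low-amplitude signal; ambient noise will change the ratio.
window = (100 * np.sin(np.linspace(0, 2000 * np.pi, SAMPLE_RATE))).astype(np.int16)
compressed = zlib.compress(window.tobytes(), level=6)
print(f"compression ratio: {window.nbytes / len(compressed):.1f}x")
```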
2011 22nd IEEE International Symposium on Rapid System Prototyping, 2011
Software and hardware simulators and emulators have traditionally been used for chip-level analysis and verification. However, prototyping and bring-up requirements often demand system- or platform-level integration and analysis, requiring new uses of these traditional pre-silicon methods along with novel interpretations of existing hardware to prototype some functions matching the behaviours of future systems. In order to ...