IBM Research Castor, a cloud-native system for managing and deploying large numbers of AI time-series models in IoT applications, is described. Modelling code templates in Python and R, following a typical machine-learning workflow, are supported. A knowledge-based approach to managing model and time-series data allows general semantic concepts to be used for expressing feature-engineering tasks. Model templates can be programmatically deployed against specific instances of semantic concepts, thus supporting model reuse and automated replication as the IoT application grows. Deployed models are automatically executed in parallel, leveraging a serverless cloud-computing framework. The complete history of trained model versions and rolling-horizon predictions is persisted, enabling full model lineage and traceability. Results from deployments in real-world smart-grid live forecasting applications are reported. Scalability in executing up to tens of thousands of AI modelling tasks is also evaluated.
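To illustrate the template-and-concept idea, the following sketch shows how a generic forecasting template might be bound to concrete instances of a semantic concept; the class, concept name and method are hypothetical and are not the actual Castor API.

```python
# Hypothetical sketch of the template-reuse idea described above; names are
# illustrative only, not the Castor interface.
class ForecastTemplate:
    def __init__(self, target_concept, horizon_hours=24):
        self.target_concept = target_concept   # e.g. "energy:SubstationLoad"
        self.horizon_hours = horizon_hours

    def deploy(self, entity_id):
        """Bind the generic template to one concrete instance of the concept."""
        return {"concept": self.target_concept,
                "entity": entity_id,
                "horizon_h": self.horizon_hours}

template = ForecastTemplate("energy:SubstationLoad")
# As new substations are onboarded, the same template is deployed again,
# which is how model reuse and automated replication scale with the IoT estate.
deployments = [template.deploy(sub) for sub in ["substation-01", "substation-02"]]
print(deployments)
```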
2022 IEEE International Conference on Big Data (Big Data)
Advances in federated learning (FL) algorithms, along with technologies like differential privacy and homomorphic encryption, have led to FL being increasingly adopted in many application domains. This increasing adoption has led to rapid growth in the number, size (number of participants/parties) and diversity (intermittent vs. active parties) of FL jobs. Many existing FL systems, based on centralized (often single) model aggregators, are unable to scale to handle large FL jobs and adapt to parties' behavior. In this paper, we present a new scalable and adaptive architecture for FL aggregation. First, we demonstrate how traditional tree-overlay-based aggregation techniques (from P2P, publish-subscribe and stream-processing research) can help FL aggregation scale, but are ineffective from a resource-utilization and cost standpoint. Next, we present the design and implementation of AdaFed, which uses serverless/cloud functions to adaptively scale aggregation in a resource-efficient and fault-tolerant manner. We describe how AdaFed enables FL aggregation to be dynamically deployed only when necessary, elastically scaled to handle participant joins/leaves, and made fault tolerant with minimal effort required on the (aggregation) programmer's side. We also demonstrate that our prototype, based on Ray [1], scales to thousands of participants and achieves a reduction of more than 90% in resource requirements and cost, with minimal impact on aggregation latency.
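The tree-overlay aggregation discussed above can be pictured with a minimal, hypothetical sketch: leaf aggregators compute a weighted average of their parties' updates, and a root aggregator fuses the partial results. This is a toy FedAvg over a two-level tree, not the AdaFed implementation.

```python
# Illustrative two-level tree aggregation: weighted FedAvg at the leaves,
# then a weighted fusion at the root.
import numpy as np

def fedavg(updates, counts):
    """Weighted average of model-parameter vectors by sample count."""
    counts = np.asarray(counts, dtype=float)
    stacked = np.stack(updates)
    return (stacked * counts[:, None]).sum(axis=0) / counts.sum()

# Two leaf aggregators serving a handful of parties (toy 3-parameter model).
leaf_a = fedavg([np.array([1.0, 0.0, 2.0]), np.array([3.0, 1.0, 0.0])], [100, 300])
leaf_b = fedavg([np.array([0.5, 0.5, 0.5])], [200])
global_model = fedavg([leaf_a, leaf_b], [400, 200])   # root fusion
print(global_model)
```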
Electricity is a major cost in running a data centre, and servers are responsible for a significant percentage of the power consumption. Given the widespread use of HTTP, both as a service and as a component of other services, it is worthwhile reducing the power consumption of web servers. In this paper we consider how reverse proxies, commonly used to improve the performance of web servers, might also be used to improve energy efficiency. We suggest that when demand on a server is low, it may be possible to switch off servers. In their absence, an embedded system with a small energy footprint could act as a reverse proxy serving commonly requested content. When new content is required, the reverse proxy can power on the servers to meet this new load. Our results indicate that even with a modest server, we can obtain a 25% power saving while maintaining acceptable performance.
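A minimal sketch of the proxy's decision path follows: cached content is served directly by the low-power device, and the backend is woken (here via a standard Wake-on-LAN magic packet) only on a cache miss. The cache contents, MAC address and response strings are illustrative placeholders, not the system evaluated in the paper.

```python
# Sketch only: serve from cache on the embedded proxy, wake servers on a miss.
import socket

cache = {"/index.html": b"<html>cached copy</html>"}

def wake_backend(mac="aa:bb:cc:dd:ee:ff"):
    """Send a standard Wake-on-LAN magic packet (MAC is a placeholder)."""
    payload = bytes.fromhex("ff" * 6 + mac.replace(":", "") * 16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(payload, ("255.255.255.255", 9))

def handle_request(path):
    if path in cache:
        return cache[path]          # low-power path: proxy answers alone
    wake_backend()                  # cache miss: power the servers back on
    return b"503: backend waking, retry shortly"

print(handle_request("/index.html"))
```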
This deliverable, D6.3 – Security of federated machine learning algorithms – is the only deliverable for task T6.3 (Assessing the security of machine learning algorithms under the different privacy operation modes) in WP6. It comprises a report with a comprehensive evaluation of the robustness of the different algorithms developed in the MUSKETEER Machine Learning Library (MMLL) against attacks both at training time (poisoning attacks) and at test time (evasion attacks). The assessment is performed for both supervised and unsupervised learning tasks across the different Privacy Operation Modes (POMs) considered in the project. The defensive mechanisms evaluated in this deliverable have already been described in D5.4 and D5.5.
2018 IEEE International Conference on Data Mining Workshops (ICDMW), 2018
We demonstrate Castor, a cloud-based system for contextual IoT time-series data and model management at scale. Castor is designed to assist data scientists in (a) exploring and retrieving all relevant time series and contextual information required for their predictive-modelling tasks; (b) seamlessly storing and deploying their predictive models in a cloud production environment; and (c) monitoring the performance of all predictive models in production and (semi-)automatically retraining them in case of performance deterioration. The main features of Castor are: (1) an efficient pipeline for ingesting IoT time-series data in real time; (2) a scalable, hybrid data-management service for both time-series and contextual data; (3) a versatile semantic model for contextual information which can be easily adapted to different application domains; (4) an abstract framework for developing and storing predictive models in R or Python; and (5) deployment services which automatically train and/or score predictive models upon user-defined conditions. We demonstrate Castor for a real-world smart-grid use case and discuss how it can be adapted to other application domains such as smart buildings, telecommunications, retail or manufacturing.
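Feature (5) above can be pictured with a small, hypothetical condition check: retraining is scheduled when the rolling forecast error exceeds a user-chosen tolerance over a baseline. The function name and threshold are illustrative, not Castor's API.

```python
# Toy user-defined retraining condition, assuming a rolling-error window.
import statistics

def should_retrain(recent_abs_errors, baseline_mae, tolerance=1.25):
    """Retrain when the rolling MAE exceeds the baseline by a chosen factor."""
    rolling_mae = statistics.mean(recent_abs_errors)
    return rolling_mae > tolerance * baseline_mae

print(should_retrain([4.1, 3.8, 5.2], baseline_mae=3.0))  # True -> schedule retraining
```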
In Industrie 4.0 (I4.0), the rigid structures and architectures applied in manufacturing and industrial information technologies today will be replaced by highly dynamic and self-organizing networks. Today's proprietary technical systems lead to strictly defined engineering processes and value chains. Interacting Digital Twins (DTs) are considered an enabling technology that could help increase flexibility based on semantically enriched information. Nevertheless, for interacting DTs to become a reality, their implementation should be based on open standards for information modeling and application programming interfaces, such as the Asset Administration Shell (AAS). Additionally, DT platforms could accelerate the development and deployment of DTs and ensure their resilient operation. This chapter develops a suitable architecture for such a DT platform for I4.0 based on user stories, requirements, and a time-series messaging experiment. An architecture based on microservices patterns is identified ...
Federated Learning (FL) is an approach to conducting machine learning without centralizing training data in a single place, for reasons of privacy, confidentiality or data volume. However, solving federated machine-learning problems raises issues above and beyond those of centralized machine learning. These issues include setting up communication infrastructure between parties, coordinating the learning process, integrating party results, understanding the characteristics of the training data sets of the different participating parties, handling data heterogeneity, and operating in the absence of a verification data set. IBM Federated Learning provides infrastructure and coordination for federated learning. Data scientists can design and run federated learning jobs based on existing, centralized machine-learning models and can provide high-level instructions on how to run the federation. The framework applies both to deep neural networks and to "traditional" approaches for ...
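The coordination and result-integration steps above can be sketched as a single federated round in which the aggregator queries each party and fuses the replies. The toy local update and unweighted fusion below are illustrative only and do not reflect the IBM Federated Learning API.

```python
# Minimal sketch of one federated round: query parties, fuse replies, repeat.
def local_train(party_data, global_model):
    # Stand-in for a party's local update: nudge the model toward its data mean.
    return global_model + 0.1 * (sum(party_data) / len(party_data) - global_model)

def run_round(global_model, parties):
    replies = [local_train(data, global_model) for data in parties.values()]
    return sum(replies) / len(replies)   # simple unweighted fusion

parties = {"party-a": [2.0, 3.0], "party-b": [8.0, 9.0]}   # hypothetical data
model = 0.0
for _ in range(3):
    model = run_round(model, parties)
print(round(model, 3))
```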
This deliverable (D3.4 "Final Prototype of the MUSKETEER Platform") is a document describing the demonstration of the final prototype. It is the culmination of milestone 3 and builds upon documents D3.1/D3.2/D3.3, providing feature updates as well as highlighting how these features complete the platform requirements. Functionally, the platform provides the infrastructure and implements the services required to enable the federated ML algorithms developed in WP4 and WP5 in end-to-end applications. It also supports the assessments to be carried out in WP6 and provides interfaces which allow for the development of client connectors and end-to-end demonstrations of the industrial use cases in WP7.
Despite the need for data in a time of general digitization of organizations, many challenges still hamper its shared use. Technical, organizational, legal, and commercial issues remain before data can be leveraged satisfactorily, especially when the data is distributed among different locations and confidentiality must be preserved. Data platforms can offer "ad hoc" solutions to tackle specific matters within a data space. MUSKETEER develops an Industrial Data Platform (IDP) including algorithms for federated and privacy-preserving machine learning on a distributed setup, detection and mitigation of adversarial attacks, and a rewarding model capable of monetizing datasets according to their real data value. The platform can offer an adequate response for organizations demanding high security standards, such as industrial companies with sensitive data or hospitals with personal data. From the architectural point of view, trust is enforced in such a way that, thanks to federated learning, data never has to leave its provider's premises. This approach can also help organizations better comply with European regulation.
Proceedings of the Thirteenth ACM International Conference on Future Energy Systems
A demand-response scheme that uses direct device control to actively exploit prosumer flexibility has been identified as a key remedy to meet the challenge of increased integration of renewable energy sources. Although a number of direct-control-based demand-response solutions exist and have been successfully deployed and demonstrated in the real world, they are typically designed for, and effective only at, small scale and/or target specific types of loads, leading to a relatively high cost of entry. This prohibits deploying scalable solutions. The H2020 GOFLEX project has addressed this issue and developed the scalable, general, and replicable GOFLEX system, which offers a market-driven approach to solving congestion problems in distribution grids based on aggregated individual flexibilities from a wide range of prosumers, both small (incl. electric ...
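The market-driven aggregation idea can be illustrated with a toy selection rule: the cheapest prosumer flexibility offers are accepted until a forecast overload is covered. The offer format and greedy policy are assumptions for illustration, not the GOFLEX market mechanism.

```python
# Illustrative greedy selection of flexibility offers to cover a congestion.
def select_offers(offers, required_kw):
    """offers: list of (prosumer_id, kW available, price per kWh)."""
    chosen, covered = [], 0.0
    for prosumer, kw, price in sorted(offers, key=lambda o: o[2]):
        if covered >= required_kw:
            break
        chosen.append((prosumer, kw, price))
        covered += kw
    return chosen, covered

offers = [("pv-17", 3.0, 0.12), ("ev-04", 7.0, 0.08), ("hp-22", 2.0, 0.15)]
print(select_offers(offers, required_kw=8.0))   # EV offer first, then PV
```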
The quality of a machine-learning model depends on the volume of data used during the training process. To prevent low-accuracy models, one needs to generate more training data or add external data sources of the same kind. If the first option is not feasible, the second requires the adoption of a federated learning approach, where different devices can collaboratively learn a shared prediction model. However, access to data can be hindered by privacy restrictions. Training machine-learning algorithms using data collected from different data providers while mitigating privacy concerns is a challenging problem. In this chapter, we first introduce the general approach of federated machine learning and the H2020 MUSKETEER project, which aims to create a federated, privacy-preserving machine learning Industrial Data Platform. Then, we describe the Privacy Operation Modes designed in MUSKETEER as an answer for stronger privacy, before looking at the platform and its operation using these ...
This deliverable (D3.2 "Architecture Design – Final Version") is a document describing the architecture of the MUSKETEER centralized server platform. It is the culmination of task T3.1 and builds upon the initial architecture document D3.1, providing architecture/design updates as well as reporting progress in relation to the platform requirements. The document describes the final version of the MUSKETEER platform architecture: how it meets the final requirements of the federated and privacy-preserving machine-learning services, how it addresses the final user stories, how it supports incorporating active security measures against adversarial attacks (data poisoning, evasion), and how it aligns with existing Industrial Data Platform standards.
This deliverable (D3.1 "Architecture Design") is a document describing the initial version of the MUSKETEER platform architecture. It addresses the previously delivered technical requirements and key performance indicators, takes into account legal and ethical requirements, and aligns with the algorithm library architecture and assessment framework. It informs the MUSKETEER platform development work and acts as the counterpart of the client connectors' architecture, which describes the customization and end-to-end integration of the core platform capabilities for the industrial use cases.
2020 IEEE International Conference on Big Data (Big Data), 2020
The transition away from carbon-based energy sources poses several challenges for the operation of electricity distribution systems. Increasing shares of distributed energy resources (e.g. renewable energy generators, electric vehicles) and internet-connected sensing and control devices (e.g. smart heating and cooling) require new tools to support accurate, data-driven decision making. Modelling the effect of such growing complexity in the electrical grid is possible in principle using state-of-the-art power-flow models. In practice, the detailed information needed for these physical simulations may be unknown or prohibitively expensive to obtain. Hence, data-driven approaches to power-systems modelling, including feedforward neural networks and auto-encoders, have been studied to leverage the increasing availability of sensor data, but have seen limited practical adoption due to lack of transparency and inefficiency on large-scale problems. Our work addresses this gap by proposing a data- and knowledge-driven probabilistic graphical model for energy systems based on the framework of graph neural networks (GNNs). The model can explicitly factor in domain knowledge, in the form of grid topology or physics constraints, resulting in sparser architectures and much smaller parameter dimensionality when compared with traditional machine-learning models of similar accuracy. Results obtained from a real-world smart-grid demonstration project show how the GNN was used to inform grid congestion predictions and market bidding services for a distribution system operator participating in an energy flexibility market.
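A minimal sketch of the topology-aware idea, assuming a standard graph-convolution update: node features are propagated only along grid edges, and the trainable weights are shared across buses, which is why the parameter count stays small as the grid grows. The toy feeder and feature dimensions are illustrative, not the paper's model.

```python
# One graph-convolution step whose connectivity is fixed by the grid topology.
import numpy as np

def normalised_adjacency(A):
    """Symmetrically normalise A + I, as in common GCN formulations."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, H, W):
    """Propagate node features H along grid edges and apply a shared weight W."""
    return np.maximum(A_norm @ H @ W, 0.0)   # ReLU

# Toy 4-bus feeder: buses 0-1-2-3 connected in a line (not real grid data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.rand(4, 3)          # e.g. load, generation, voltage per bus
W = np.random.rand(3, 8)          # shared weights, independent of grid size
H_next = gcn_layer(normalised_adjacency(A), H, W)
print(H_next.shape)               # (4, 8)
```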
Devices and technologies to measure and report water consumption at sub-daily intervals are growing in popularity. Data from these devices are creating new opportunities to manage the supply and demand of water in near real time. To this end, the EU FP7 iWIDGET (Improved Water efficiency through ICT for integrated supply-Demand side manaGEmenT) project is developing a state-of-the-art analytics platform for the integrated management of urban water. Key challenges include extracting useful insights from high-resolution consumption data and exploring a range of decision-support tools for water utilities and consumers. To overcome these challenges, iWIDGET is developing a distributed, open, robust, collaborative architecture that allows partners and utilities to collect and process data from a large number of sensors in parallel and to analyze data on demand. We present a distributed system that enables flexible, near-real-time monitoring of water networks by providing four critical mechanisms. First, a means to regularly poll water utilities' raw data systems. Second, assimilation of fresh data into a purpose-built, high-performance database. Third, a database-polling interface through which geographically local or remote analytic systems incorporate the latest consumption information into their analysis. Lastly, an online portal-based platform used to trigger analysis and review results. A key architectural feature of this system is the loose coupling between central storage and analytic systems. Communication between the central storage and processing components uses standard techniques, including WaterML over RESTful web services. This arrangement avoids restrictions on the underlying technologies in analytical components and allows analytic systems to execute on different operating systems and run-times. The system is under active development and will enable a wide variety of tools for water utilities and individual consumers. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 318272.
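The first two mechanisms (regular polling and assimilation of fresh data) might look like the following sketch; the endpoint URL, payload fields and 15-minute cadence are hypothetical placeholders rather than the iWIDGET interfaces.

```python
# Sketch of a periodic poll-and-assimilate loop over a REST endpoint.
import json
import time
import urllib.request

ENDPOINT = "https://utility.example.org/api/consumption/latest"  # placeholder

def poll_once(store):
    with urllib.request.urlopen(ENDPOINT, timeout=10) as resp:
        readings = json.loads(resp.read())
    for r in readings:               # assumed shape: {"meter": ..., "ts": ..., "litres": ...}
        store.setdefault(r["meter"], []).append((r["ts"], r["litres"]))

def run_poller(store, interval_s=900, cycles=4):
    for _ in range(cycles):          # regular polling loop (15-minute cadence)
        poll_once(store)
        time.sleep(interval_s)
```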
2012 IEEE/ACM 16th International Symposium on Distributed Simulation and Real Time Applications, 2012
Understanding the baseline underwater acoustic signature of an offshore location is a necessary, early step in formulating an environmental impact assessment of wave-energy conversion devices. But before this understanding can even begin, infrastructure must be deployed to capture raw acoustic signals over an extended period of time. This infrastructure comprises at least four distinct components: firstly, a hydrophone, deployed underwater, capable of operating at a high sampling rate of 500,000 16-bit samples per second; secondly, an analog/digital converter (ADC), to which the hydrophone transmits raw voltages; thirdly, a communications infrastructure for bridging the gap from the ADC to shore; and finally, an onshore base station for receiving the signals and presenting them to a remote analytic or simulation infrastructure for further processing. Attempting this signal capture in real time poses many problems. On a practical level, deploying cabled infrastructure to deliver power and communications to the offshore components may be prohibitively expensive, yet reliance on solar power may result in interruptions to real-time wireless transmission. Additionally, a high sampling rate requires significant base-station memory, storage and processing capabilities, as well as potentially high costs of delivery to a remote infrastructure, part of which could be alleviated by real-time signal compression. This paper discusses our attempts at implementing such a system that would reliably acquire real-time data and scale with growing demands.
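The storage and transmission burden mentioned above can be made concrete with a back-of-the-envelope calculation and a simple lossless compression pass over one capture window; the synthetic signal and zlib settings are illustrative, and real acoustic data will compress differently.

```python
# Data-rate arithmetic for the stated sampling regime, plus a toy compression pass.
import zlib
import numpy as np

SAMPLE_RATE = 500_000          # samples per second
BYTES_PER_SAMPLE = 2           # 16-bit samples
raw_rate_mb_s = SAMPLE_RATE * BYTES_PER_SAMPLE / 1e6
print(f"raw stream: {raw_rate_mb_s:.1f} MB/s, {raw_rate_mb_s * 86400 / 1000:.1f} GB/day")

# One second of synthetic low-amplitude signal; ambient noise will change the ratio.
window = (100 * np.sin(np.linspace(0, 2000 * np.pi, SAMPLE_RATE))).astype(np.int16)
compressed = zlib.compress(window.tobytes(), level=6)
print(f"compression ratio: {window.nbytes / len(compressed):.1f}x")
```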
2011 22nd IEEE International Symposium on Rapid System Prototyping, 2011
Software and hardware simulators and emulators have traditionally been used for chip-level analysis and verification. However, prototyping and bring-up requirements often demand system- or platform-level integration and analysis, requiring new uses of these traditional pre-silicon methods along with novel interpretations of existing hardware to prototype some functions matching the behaviours of future systems. In order to ...