2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2019
In the exascale era, HPC systems are expected to operate under different system-wide power constraints. For such power-constrained systems, improving per-job flops-per-watt may not be sufficient to improve total HPC productivity, as a growing number of scientific applications with different compute intensities migrate to HPC systems. To measure HPC productivity for such applications, we associate a monotonically decreasing, time-dependent value function, called job-value, with each application. A job-value function represents the value of completing a job for an organization. We begin by exploring the trade-off between two commonly used static power allocation strategies (uniform and greedy) in a power-constrained, oversubscribed system. We simulate a large-scale system and demonstrate that, at the tightest power constraint, the greedy allocation can lead to 30% higher productivity than the uniform allocation, whereas the uniform allocation can gain up to 6% higher productivity at the relaxed power constraint. We then propose a new dynamic power allocation strategy that utilizes power-performance models derived from offline data. We use these models to reallocate power from running jobs to newly arrived jobs, increasing overall system utilization and productivity. In our simulation study, we show that, compared to static allocation, the dynamic power allocation policy improves node utilization and job completion rates by 20% and 9%, respectively, at the tightest power constraint. Our dynamic approach consistently earns up to 8% higher productivity than the best-performing static strategy under different power constraints.
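To make the terms above concrete, the following minimal Python sketch shows a job-value function (here assumed to decay exponentially, although the abstract only requires a monotonically decreasing function) and simplified versions of the uniform and greedy static power allocation strategies. The function names, the decay form, and the single peak-power parameter are illustrative assumptions, not the paper's implementation.

```python
import math

def job_value(initial_value, decay_rate, elapsed_time):
    """Monotonically decreasing, time-dependent job-value (assumed exponential decay)."""
    return initial_value * math.exp(-decay_rate * elapsed_time)

def uniform_allocation(power_budget, jobs):
    """Split the system-wide power budget equally among all queued jobs.

    `jobs` maps job IDs to their current job-value."""
    share = power_budget / len(jobs)
    return {job_id: share for job_id in jobs}

def greedy_allocation(power_budget, jobs, peak_power):
    """Grant each job its peak power demand in decreasing job-value order
    until the budget is exhausted; jobs that do not fit receive no power."""
    allocation, remaining = {}, power_budget
    for job_id in sorted(jobs, key=jobs.get, reverse=True):
        if remaining >= peak_power:
            allocation[job_id] = peak_power
            remaining -= peak_power
    return allocation
```

Under this sketch, for example, a 1 MW budget and a uniform 50 kW peak demand would let the greedy policy power the 20 highest-value jobs fully, while the uniform policy spreads the budget thinly across every queued job.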
High performance computing (HPC) systems are confronting the challenge of improving their productivity under a system-wide power constraint in the exascale era. To measure the productivity of an HPC job, researchers have proposed assigning a monotonically decreasing, time-dependent value function, called job-value, to that job. These job-value functions are used by value-based scheduling algorithms to maximize system productivity, where system productivity is the accumulated job-value of completed jobs. In this study, we first show that the relative performance of competing state-of-the-art static power allocation strategies interchanges with the level of the power constraint when they are applied to the value-based algorithms. We then investigate the limitations of these static strategies by relating the job completion rate to resource utilization, and show that there is a non-negligible amount of unused resources available to the scheduler. Even though the system is oversubscribed, these unused resources are insufficient to schedule new high-value jobs. Based on this observation, we propose a novel dynamic power management strategy for the value-based algorithms. Our dynamic allocation policy maximizes system productivity, resource utilization, and job completion rate by utilizing application power-performance models to reallocate power from running jobs to newly arrived jobs. We simulate a large-scale system that uses job arrival traces from a real HPC system. We demonstrate that the dynamic variant of each value-based algorithm earns up to 16% higher productivity and completes 13% more jobs than its static variants when power becomes a highly constrained resource in the system.
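The dynamic reallocation idea can be sketched as follows: power is shaved from running jobs as long as an application power-performance model predicts an acceptable slowdown, and the freed budget is offered to newly arrived jobs. The model form, slowdown threshold, and job fields below are assumptions for illustration only.

```python
def predicted_runtime(job, power):
    """Assumed power-performance model: runtime grows as allocated power shrinks.
    The paper derives such models from offline data; this power-law form is a stand-in."""
    return job["base_runtime"] * (job["peak_power"] / power) ** job["sensitivity"]

def reclaim_power(running_jobs, max_slowdown=1.10, step=1.0):
    """Lower each running job's power toward its minimum while the predicted
    slowdown stays below max_slowdown; return the total power reclaimed,
    which the scheduler can then offer to newly arrived jobs."""
    reclaimed = 0.0
    for job in running_jobs:
        while (job["power"] - step >= job["min_power"] and
               predicted_runtime(job, job["power"] - step)
               <= max_slowdown * job["base_runtime"]):
            job["power"] -= step
            reclaimed += step
    return reclaimed
```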
IEEE Transactions on Parallel and Distributed Systems, 2020
In this study, we investigate limitations of the traditional value-based algorithms for a power-constrained HPC system and evaluate their impact on HPC productivity. We expose the trade-off between allocating the system-wide power budget uniformly and greedily under different system-wide power constraints in an oversubscribed system. We experimentally demonstrate that, under the tightest power constraint, the mean productivity of the greedy allocation is 38% higher than that of the uniform allocation, whereas, under the intermediate power constraint, the mean productivity of the uniform allocation is 6% higher than that of the greedy allocation. We then propose a new algorithm that adapts its behavior to deliver the combined benefits of the two allocation strategies. We design a methodology with online retraining capability to create application-specific power-execution time models for a class of HPC applications. These models are used to predict the execution time of an application on the available resources when the power-aware algorithms make scheduling decisions. We evaluate the proposed algorithm using emulation and simulation environments, and show that our adaptive strategy improves HPC resource utilization while delivering a mean productivity that is almost the same as that of the best-performing algorithm across various system-wide power constraints.
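As a concrete, hedged sketch of the modeling methodology described above, the snippet below fits an application-specific power-to-execution-time model and retrains it online as new runs complete. The log-log linear (power-law) form and the refit-every-k-samples policy are assumptions, not the paper's exact methodology.

```python
import numpy as np

class PowerTimeModel:
    def __init__(self, refit_every=5):
        self.samples = []          # (power_cap_watts, exec_time_seconds)
        self.coeffs = None         # log-log linear fit: log t = a * log p + b
        self.refit_every = refit_every

    def observe(self, power, exec_time):
        """Record a completed run and periodically refit the model online."""
        self.samples.append((power, exec_time))
        if len(self.samples) >= 2 and len(self.samples) % self.refit_every == 0:
            p, t = np.array(self.samples).T
            self.coeffs = np.polyfit(np.log(p), np.log(t), 1)

    def predict(self, power):
        """Predict execution time at a given power cap (None until fitted)."""
        if self.coeffs is None:
            return None
        a, b = self.coeffs
        return float(np.exp(a * np.log(power) + b))
```

A scheduler could call `predict()` at decision time to estimate how long a queued job would run at the power it can currently be offered.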
The Earth System Grid Federation (ESGF) is driven by a collection of independently funded national and international projects that develop, deploy, and maintain the necessary open-source software infrastructure to empower geoscience collaboration and the study of climate science. This successful international collaboration manages the first-ever decentralized database for handling climate science data, with multiple petabytes (PBs) of data at dozens of federated sites worldwide. ESGF's widespread adoption, federation capabilities, broad developer base, and focus on climate science data distinguish it from other collaborative knowledge systems. The ESGF distributed archive holds the premier collection of simulations, observations, and reanalysis data to support climate research. It is the leading archive for today's climate model data holdings, including the most important and largest datasets of global climate model simulations. For this longstanding commitment to collaboration through innovative technology, the ESGF has been recognized by R&D Magazine with a 2017 R&D 100 Award. The ESGF's mission is to facilitate scientific research and discovery on a global scale and to maintain a robust, international federated data grid for climate research. The ESGF architecture federates a geographically distributed network of climate modeling and data centers that are independently administered yet united by common protocols and APIs. The cornerstone of its interoperability is peer-to-peer messaging, which continuously exchanges information among all nodes through a shared, secure architecture for search and discovery. The ESGF integrates popular open-source application engines with custom components for data publishing, searching, user interface (UI), security, metrics, and messaging to provide PBs of geophysical data to roughly 25,000 users from over 1,400 sites on six continents. It contains output from the Coupled Model Intercomparison Project (CMIP), used by authors of the Intergovernmental Panel on Climate Change (IPCC) Third, Fifth, and Sixth Assessment Reports, and output from the U.S. Department of Energy's (DOE's) Energy Exascale Earth System Model (E3SM) and the European Union's (EU's) Coordinated Regional Climate Downscaling Experiment (CORDEX) projects, to name only a few. These efforts will support a data-sharing ecosystem and, ultimately, provide a predictive understanding of couplings and feedbacks among natural-system and anthropogenic processes across a wide range of geophysical spatial scales. They will also help expand access to relevant data and information, integrated with tools for analysis and visualization and supported by the necessary hardware and network capabilities to make sense of peta-/exascale scientific data. ESGF continuously adds new datasets based on community requests and needs, and in the future it intends to widen its scope to include other climate-related datasets such as downscaled model data, climate predictions from both operational and experimental systems, and other derived datasets. Over the next few years, we propose to sustain and enhance a resilient data infrastructure with friendlier tools for the expanding global scientific community, and to prototype new tools that fill important capability gaps in scientific data archiving, access, and analysis.
Information emerging from interagency partner meetings influences ESGF requirements, development, and operations. In December 2018, representatives from a significant fraction of the projects that use ESGF to disseminate and analyze data attended the eighth annual ESGF Face-to-Face (F2F) Conference (https://esgf.llnl.gov/esgfmedia/pdf/2018_8th_Annual_ESGF_Conference_Report_final.pdf). Attendees provided important feedback regarding current and future community data use.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
In this paper, we address the challenge of analyzing simulation data on HPC systems using Apache Spark, a Big Data framework. One of the main problems we encountered with Spark on HPC systems is ephemeral data explosion, brought about by the curse of persistence in the Spark framework. Data persistence is essential for reducing I/O, but it comes at the cost of storage space. We show that in some cases, Spark scratch data can consume an order of magnitude more space than the input data being analyzed, leading to fatal out-of-disk errors. We investigate the real-world application of scaling machine learning algorithms to predict and analyze failures in multi-physics simulations on 76 TB of data (over one trillion training examples). This problem is 2-3 orders of magnitude larger than prior work. Based on extensive experiments at scale, we provide several concrete state-of-the-practice recommendations, and demonstrate a 7x reduction in disk utilization with negligible increases, or even decreases, in runtime.
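To illustrate the persistence trade-off discussed above (this is a generic PySpark example, not the paper's configuration), the snippet below caches an intermediate DataFrame with an explicit storage level and releases it as soon as downstream stages no longer need it; the paths and column names are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("persistence-sketch").getOrCreate()

raw = spark.read.parquet("/path/to/simulation_features")   # placeholder path
features = raw.filter(raw["status"].isNotNull())           # placeholder column

# Cache in memory, spilling to disk only when needed, rather than relying on
# disk-heavy defaults that inflate scratch usage on shared HPC file systems.
features.persist(StorageLevel.MEMORY_AND_DISK)

train, test = features.randomSplit([0.8, 0.2], seed=42)
print(train.count(), test.count())   # actions force materialization of the cache

# Release the cached data as soon as it is no longer needed.
features.unpersist()
spark.stop()
```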
As datasets grow beyond the gigabyte scale, there is an increasing demand for techniques for interacting with them. To this end, the DataFoundry team at Lawrence Livermore National Laboratory has developed a software prototype called the Approximate Adhoc Query Engine for Simulation Data (AQSim). The goal of AQSim is to provide a framework that allows scientists to interactively perform ad hoc queries over terabyte-scale datasets using numerical models as proxies for the original data. This system has several advantages. First, by storing only the model parameters, each dataset occupies a smaller footprint than the original, increasing the shelf life of such datasets before they are sent to archival storage. Second, the models are geared towards approximate querying, as they are built at different resolutions, allowing the user to trade off model accuracy against query response time. This gives the user greater opportunities for exploratory data analysis. Lastly, several different models are supported, each focusing on a different characteristic of the data, thereby enhancing the interpretability of the data compared to the original. The focus of this paper is on the modeling aspects of the AQSim framework.
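A toy sketch of the idea described above: store compact model parameters (here, simple per-block means at several resolutions) instead of the raw data, and answer range queries approximately from the chosen resolution. AQSim's actual models are richer than this; the block structure and function names below are illustrative assumptions.

```python
import numpy as np

def build_block_means(data, block_sizes=(1024, 256, 64)):
    """Precompute per-block means at several resolutions (coarse to fine)."""
    levels = {}
    for size in block_sizes:
        n_blocks = int(np.ceil(len(data) / size))
        levels[size] = np.array([data[i * size:(i + 1) * size].mean()
                                 for i in range(n_blocks)])
    return levels

def approx_range_mean(levels, block_size, lo, hi):
    """Approximate the mean of data[lo:hi] from block means at one resolution.
    Coarser blocks answer faster (fewer parameters) but less accurately."""
    means = levels[block_size]
    first, last = lo // block_size, (hi - 1) // block_size
    return float(means[first:last + 1].mean())

data = np.random.default_rng(0).normal(size=100_000)
levels = build_block_means(data)
print(approx_range_mean(levels, 1024, 10_000, 30_000), data[10_000:30_000].mean())
```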
Data management is the organization of information to support efficient access and analysis. For data-intensive computing applications, the speed at which relevant data can be accessed is a limiting factor in the size and complexity of the computation that can be performed. Data access speed is impacted by the size of the relevant subset of the data, the complexity of the query used to define it, and the layout of the data relative to the query. As the underlying datasets become increasingly complex, the questions asked of them become more involved as well. For example, geospatial data associated with a city is no longer limited to the map data representing its streets, but now also includes layers identifying utility lines, key points, locations and types of businesses within the city limits, tax information for each land parcel, satellite imagery, and possibly even street-level views. As a result, queries have gone from simple questions, such as “how long is Main Street?”, to ...
Scientific experiments typically produce a plethora of files in the form of intermediate data or experimental results. As a project grows in scale, there is an increased need for tools and techniques that link together relevant experimental artifacts, especially when the files are heterogeneous and distributed across multiple locations. Current provenance and search techniques, however, fall short in efficiently retrieving experiment-related files, presumably because they are not tailored to the common use cases of researchers. In this position paper, we propose Experiment Explorer, a lightweight and efficient approach that takes advantage of metadata to retrieve and visualize relevant experiment-related files.
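A lightweight illustration of the metadata-driven retrieval idea behind Experiment Explorer as described above; the metadata schema, field names, and matching rules are assumptions for the sake of the example, not the paper's design.

```python
def index_files(file_records):
    """Build an inverted index from (key, value) metadata pairs to file paths."""
    index = {}
    for record in file_records:
        for key, value in record["metadata"].items():
            index.setdefault((key, value), set()).add(record["path"])
    return index

def find_files(index, **criteria):
    """Return paths whose metadata matches every given key=value criterion."""
    matches = None
    for key, value in criteria.items():
        paths = index.get((key, value), set())
        matches = paths if matches is None else matches & paths
    return sorted(matches or [])

records = [
    {"path": "run01/output.h5", "metadata": {"experiment": "exp-42", "stage": "final"}},
    {"path": "run01/log.txt",   "metadata": {"experiment": "exp-42", "stage": "debug"}},
]
index = index_files(records)
print(find_files(index, experiment="exp-42", stage="final"))
```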