Proceedings of the 13th ACM Workshop on Hot Topics in Storage and File Systems
Storage systems and their OS components are designed to accommodate a wide variety of applications and dynamic workloads. Storage components inside the OS contain various heuristic algorithms to provide high performance and adaptability for different workloads. These heuristics may be tunable via parameters, and some system calls allow users to optimize their system performance. These parameters are often predetermined based on experiments with limited applications and hardware. Thus, storage systems often run with these predetermined and possibly suboptimal values. Tuning these parameters manually is impractical: one needs an adaptive, intelligent system to handle dynamic and complex workloads. Machine learning (ML) techniques are capable of recognizing patterns, abstracting them, and making predictions on new data. ML can be a key component to optimize and adapt storage systems. In this position paper, we propose KML, an ML framework for storage systems. We implemented a prototype and demonstrated its capabilities on the well-known problem of tuning optimal readahead values. Our results show that KML has a small memory footprint, introduces negligible overhead, and yet enhances throughput by as much as 2.3×.
CCS Concepts: • Software and its engineering → Operating systems; File systems management; • Computing methodologies → Machine learning.
Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers resorted to exposing numerous tunable parameters to users, thus burdening them with continually optimizing their own storage systems and applications. Storage systems are usually responsible for most latency in I/O-heavy applications, so even a small latency improvement can be significant. Machine learning (ML) techniques promise to learn patterns, generalize from them, and enable optimal solutions that adapt to changing workloads. We propose that ML solutions become a first-class component in OSs and replace manual heuristics to optimize storage systems dynamically. In this paper, we describe our proposed ML architecture, called KML. We developed a prototype KML architecture and applied it to two case studies: optimizing readahead and NFS read-size values. Our experiments show ...
Optical density (OD) is a fast, cheap, and high-throughput measurement widely used to estimate the density of cells in liquid culture. These measurements, however, cannot be compared between instruments without a standardized calibration protocol and are challenging to relate to actual cell count. We address these shortcomings with an interlaboratory study comparing three OD calibration protocols, as applied to eight strains of E. coli engineered to constitutively express varying levels of GFP. These three protocols, namely comparison with colloidal silica (LUDOX), serial dilution of silica microspheres, and a reference colony-forming unit (CFU) assay, are all simple, low-cost, and highly accessible. Based on the results produced by the 244 teams completing this interlaboratory study, we recommend calibrating OD using serial dilution of silica microspheres, which readily produces highly precise calibration (95.5% of teams having residuals less than 1.2-fold), is easily assessed for quality c...
Papers by Lukas Velikov